2023-01-11T21:14:16.8543400Z Requested labels: linux.g5.4xlarge.nvidia.gpu
2023-01-11T21:14:16.8543511Z Job defined at: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/tags/ciflow/trunk/91627
2023-01-11T21:14:16.8543690Z Reusable workflow chain:
2023-01-11T21:14:16.8543725Z pytorch/pytorch/.github/workflows/trunk.yml@refs/tags/ciflow/trunk/91627 (8419ddda87c8a47eacc63b54bc7ec98c1f27c26e)
2023-01-11T21:14:16.8543773Z -> pytorch/pytorch/.github/workflows/_linux-test.yml@refs/tags/ciflow/trunk/91627 (8419ddda87c8a47eacc63b54bc7ec98c1f27c26e)
2023-01-11T21:14:16.8543803Z Waiting for a runner to pick up this job...
2023-01-11T21:14:17.1266156Z Job is about to start running on the runner: i-016718a172a944ca0 (organization)
2023-01-11T21:14:21.5010212Z Current runner version: '2.300.2'
2023-01-11T21:14:21.5015494Z Runner name: 'i-016718a172a944ca0'
2023-01-11T21:14:21.5016078Z Runner group name: 'Default'
2023-01-11T21:14:21.5016758Z Machine name: 'ip-10-0-2-196'
2023-01-11T21:14:21.5018981Z ##[group]GITHUB_TOKEN Permissions
2023-01-11T21:14:21.5019738Z Actions: write
2023-01-11T21:14:21.5020092Z Checks: write
2023-01-11T21:14:21.5020463Z Contents: write
2023-01-11T21:14:21.5020874Z Deployments: write
2023-01-11T21:14:21.5021235Z Discussions: write
2023-01-11T21:14:21.5021599Z Issues: write
2023-01-11T21:14:21.5021999Z Metadata: read
2023-01-11T21:14:21.5022357Z Packages: write
2023-01-11T21:14:21.5022737Z Pages: write
2023-01-11T21:14:21.5023116Z PullRequests: write
2023-01-11T21:14:21.5023520Z RepositoryProjects: write
2023-01-11T21:14:21.5023943Z SecurityEvents: write
2023-01-11T21:14:21.5024340Z Statuses: write
2023-01-11T21:14:21.5024688Z ##[endgroup]
2023-01-11T21:14:21.5027649Z Secret source: Actions
2023-01-11T21:14:21.5028355Z Prepare workflow directory
2023-01-11T21:14:21.7575859Z Prepare all required actions
2023-01-11T21:14:21.7752585Z Getting action download info
2023-01-11T21:14:22.0855347Z Download action repository 'pytorch/test-infra@main' (SHA:2c225610d00fb13c04fcd60389d3e4d8326167c3)
2023-01-11T21:14:22.3833061Z Download action repository 'pytorch/pytorch@master' (SHA:c5836153f5332ca83d5cacde38f2829a4d54793e)
2023-01-11T21:14:24.8242351Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2023-01-11T21:14:25.1588624Z Getting action download info
2023-01-11T21:14:25.5985061Z Download action repository 'malfet/checkout@silent-checkout' (SHA:c7b8fef48edfe1bca0044a44b1f7f7c4318a3076)
2023-01-11T21:14:27.0832770Z Getting action download info
2023-01-11T21:14:27.4893178Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482)
2023-01-11T21:14:27.6264289Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml
2023-01-11T21:14:27.6265701Z ##[group] Inputs
2023-01-11T21:14:27.6266007Z build-environment: linux-bionic-cuda11.6-py3.10-gcc7-sm86
2023-01-11T21:14:27.6266854Z test-matrix: { include: [ { config: "default", shard: 1, num_shards: 4, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "default", shard: 2, num_shards: 4, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "default", shard: 3, num_shards: 4, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "default", shard: 4, num_shards: 4, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "slow", shard: 1, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "slow", shard: 2, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "functorch", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" }, ]}
2023-01-11T21:14:27.6267762Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd
2023-01-11T21:14:27.6268143Z sync-tag:
2023-01-11T21:14:27.6268787Z timeout-minutes: 240
2023-01-11T21:14:27.6268995Z use-gha:
2023-01-11T21:14:27.6269195Z ##[endgroup]
2023-01-11T21:14:27.6269706Z Complete job name: cuda11.6-py3.10-gcc7-sm86 / test (default, 2, 4, linux.g5.4xlarge.nvidia.gpu)
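Note: the test-matrix input above is a JSON matrix that the reusable workflow expands into one job per include entry; this job is the (default, 2, 4) entry, i.e. shard 2 of 4 of the "default" config. As a rough illustration only (the workflow itself expands the matrix with GitHub Actions' fromJSON expression), this job's entry could be pulled out with jq, assuming the matrix were saved as strict JSON in a file matrix.json:

    # Illustrative only: select this job's entry from the matrix.
    jq '.include[] | select(.config == "default" and .shard == 2)' matrix.json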
2023-01-11T21:14:27.6926993Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main
2023-01-11T21:14:27.6927275Z with:
2023-01-11T21:14:27.6927726Z github-secret: ***
2023-01-11T21:14:27.6928076Z instructions: All testing is done inside the container, to start an interactive session run: docker exec -it $(docker container ps --format '{{.ID}}') bash
2023-01-11T21:14:27.6928433Z activate-with-label: false
2023-01-11T21:14:27.6928644Z label: with-ssh
2023-01-11T21:14:27.6928851Z remove-existing-keys: true
2023-01-11T21:14:27.6929046Z env:
2023-01-11T21:14:27.6929232Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:14:27.6929436Z ##[endgroup]
2023-01-11T21:14:27.7688201Z ciflow reference detected, attempting to extract PR number
2023-01-11T21:14:28.3871962Z Grabbing public ssh keys from https://github.com/pytorch-bot[bot].keys
2023-01-11T21:14:28.4638580Z No SSH keys found for user pytorch-bot[bot]
2023-01-11T21:14:28.4639305Z Grabbing public ssh keys from https://github.com/LucaLumetti.keys
2023-01-11T21:14:28.5427293Z ~/.ssh/authorized_keys file found on node, removing ~/.ssh and starting fresh
2023-01-11T21:14:28.5441034Z Public keys pulled and installed to /home/ec2-user/.ssh/authorized_keys
2023-01-11T21:14:28.5464501Z Login using: ssh ec2-user@ec2-18-208-138-83.compute-1.amazonaws.com
2023-01-11T21:14:28.5464865Z All testing is done inside the container, to start an interactive session run:
2023-01-11T21:14:28.5465240Z docker exec -it $(docker container ps --format '{{.ID}}') bash
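Note: the setup-ssh step above works by fetching each authorized user's public keys from GitHub's .keys endpoint and installing them on the runner. A minimal sketch of that provisioning, with USER as a placeholder (the real action also handles [bot] accounts, which have no keys, and the remove-existing-keys option seen above):

    USER=LucaLumetti                       # placeholder; any GitHub username
    mkdir -p ~/.ssh && chmod 700 ~/.ssh
    curl -fsSL "https://github.com/${USER}.keys" >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys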
2023-01-11T21:14:28.5682039Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@master
2023-01-11T21:14:28.5682322Z with:
2023-01-11T21:14:28.5682506Z submodules: recursive
2023-01-11T21:14:28.5682714Z fetch-depth: 0
2023-01-11T21:14:28.5682907Z env:
2023-01-11T21:14:28.5683089Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:14:28.5683298Z ##[endgroup]
2023-01-11T21:14:28.5887396Z ##[group]Run retry () {
2023-01-11T21:14:28.5887657Z retry () {
2023-01-11T21:14:28.5887901Z  $* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
2023-01-11T21:14:28.5888123Z }
2023-01-11T21:14:28.5888325Z echo "${GITHUB_WORKSPACE}"
2023-01-11T21:14:28.5888559Z if [ -z "${NO_SUDO}" ]; then
2023-01-11T21:14:28.5888801Z  retry sudo rm -rf "${GITHUB_WORKSPACE}"
2023-01-11T21:14:28.5889029Z else
2023-01-11T21:14:28.5889248Z  retry rm -rf "${GITHUB_WORKSPACE}"
2023-01-11T21:14:28.5889455Z fi
2023-01-11T21:14:28.5889706Z mkdir "${GITHUB_WORKSPACE}"
2023-01-11T21:14:28.5904334Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2023-01-11T21:14:28.5904583Z env:
2023-01-11T21:14:28.5904777Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:14:28.5904986Z NO_SUDO:
2023-01-11T21:14:28.5905164Z ##[endgroup]
2023-01-11T21:14:28.5995605Z /home/ec2-user/actions-runner/_work/pytorch/pytorch
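Note: the retry helper in the step above runs a command and, on failure, retries it up to four more times with exponentially growing sleeps (1, 2, 4, 8 seconds) before giving up. A standalone sketch of the same pattern; "$@" is substituted for the workflow's $* so that quoted arguments survive word splitting (an illustrative refinement, not the workflow's exact code):

    retry () {
      "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") || (sleep 4 && "$@") || (sleep 8 && "$@")
    }
    retry rm -rf "/path/with spaces"       # example invocation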
2023-01-11T21:14:30.5287398Z ##[group]Run malfet/checkout@silent-checkout
2023-01-11T21:14:30.5287667Z with:
2023-01-11T21:14:30.5287882Z ref: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e
2023-01-11T21:14:30.5288103Z fetch-depth: 0
2023-01-11T21:14:30.5288299Z submodules: recursive
2023-01-11T21:14:30.5288500Z quiet-checkout: true
2023-01-11T21:14:30.5288706Z repository: pytorch/pytorch
2023-01-11T21:14:30.5289021Z token: ***
2023-01-11T21:14:30.5289210Z ssh-strict: true
2023-01-11T21:14:30.5289414Z persist-credentials: true
2023-01-11T21:14:30.5289626Z clean: true
2023-01-11T21:14:30.5289820Z lfs: false
2023-01-11T21:14:30.5290008Z set-safe-directory: true
2023-01-11T21:14:30.5290203Z env:
2023-01-11T21:14:30.5290389Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:14:30.5290581Z ##[endgroup]
2023-01-11T21:14:30.6382665Z Syncing repository: pytorch/pytorch
2023-01-11T21:14:30.6383967Z ##[group]Getting Git version info
2023-01-11T21:14:30.6384431Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2023-01-11T21:14:30.6384876Z [command]/usr/bin/git version
2023-01-11T21:14:30.6385087Z git version 2.38.1
2023-01-11T21:14:30.6385646Z ##[endgroup]
2023-01-11T21:14:30.6396410Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/72d2e0db-dd5b-43c1-a658-5ef95a1e4abf' before making global git config changes
2023-01-11T21:14:30.6396842Z Adding repository directory to the temporary git global config as a safe directory
2023-01-11T21:14:30.6399052Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2023-01-11T21:14:30.6434701Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2023-01-11T21:14:30.6439274Z ##[group]Initializing the repository
2023-01-11T21:14:30.6441825Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch
2023-01-11T21:14:30.6463266Z hint: Using 'master' as the name for the initial branch. This default branch name
2023-01-11T21:14:30.6463631Z hint: is subject to change. To configure the initial branch name to use in all
2023-01-11T21:14:30.6464000Z hint: of your new repositories, which will suppress this warning, call:
2023-01-11T21:14:30.6464248Z hint:
2023-01-11T21:14:30.6464541Z hint: git config --global init.defaultBranch <name>
2023-01-11T21:14:30.6464783Z hint:
2023-01-11T21:14:30.6465086Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2023-01-11T21:14:30.6465464Z hint: 'development'. The just-created branch can be renamed via this command:
2023-01-11T21:14:30.6465715Z hint:
2023-01-11T21:14:30.6466036Z hint: git branch -m <name>
2023-01-11T21:14:30.6466415Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/
2023-01-11T21:14:30.6471634Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2023-01-11T21:14:30.6491580Z ##[endgroup]
2023-01-11T21:14:30.6491976Z ##[group]Disabling automatic garbage collection
2023-01-11T21:14:30.6495060Z [command]/usr/bin/git config --local gc.auto 0
2023-01-11T21:14:30.6513309Z ##[endgroup]
2023-01-11T21:14:30.6513652Z ##[group]Setting up auth
2023-01-11T21:14:30.6519393Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2023-01-11T21:14:30.6540889Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :
2023-01-11T21:14:30.6765500Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2023-01-11T21:14:30.6788288Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :
2023-01-11T21:14:30.6993426Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
2023-01-11T21:14:30.7032940Z ##[endgroup]
2023-01-11T21:14:30.7033311Z ##[group]Fetching the repository
2023-01-11T21:14:30.7038728Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --quiet --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
2023-01-11T21:15:22.4358794Z [command]/usr/bin/git rev-parse --verify --quiet 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e^{object}
2023-01-11T21:15:22.4379435Z 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e
2023-01-11T21:15:22.4383202Z ##[endgroup]
2023-01-11T21:15:22.4384143Z ##[group]Determining the checkout info
2023-01-11T21:15:22.4384835Z ##[endgroup]
2023-01-11T21:15:22.4385241Z ##[group]Checking out the ref
2023-01-11T21:15:22.4388055Z [command]/usr/bin/git checkout --quiet --force 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e
2023-01-11T21:15:23.6204460Z ##[endgroup]
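Note: the checkout above is a pinned, detached checkout: initialize an empty repository, add the remote, fetch all branch and tag refs (fetch-depth: 0), then force-checkout the exact commit SHA. A minimal sketch of the same sequence, with the auth headers and submodule wiring omitted:

    git init pytorch && cd pytorch
    git remote add origin https://github.com/pytorch/pytorch
    git config --local gc.auto 0           # no automatic gc during CI
    git -c protocol.version=2 fetch --prune --no-recurse-submodules origin \
      '+refs/heads/*:refs/remotes/origin/*' '+refs/tags/*:refs/tags/*'
    git checkout --quiet --force 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e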
2023-01-11T21:15:23.6205773Z ##[group]Setting up auth for fetching submodules
2023-01-11T21:15:23.6211245Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic ***
2023-01-11T21:15:23.6259196Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf
2023-01-11T21:15:23.6280030Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com:
2023-01-11T21:15:23.6309561Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com:
2023-01-11T21:15:23.6339410Z ##[endgroup]
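Note: the two url.<base>.insteadOf entries above make git transparently rewrite SSH-style submodule URLs (git@github.com:owner/repo) to HTTPS, so the AUTHORIZATION extraheader set just before can authenticate every submodule fetch with the same token. A sketch of the rewrite plus a way to inspect it (the inspection command is illustrative, not from this log):

    git config --global --add url.https://github.com/.insteadOf git@github.com:
    git config --get-regexp 'url\..*\.insteadof'   # show active URL rewrites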
2023-01-11T21:15:23.6340414Z ##[group]Fetching submodules
2023-01-11T21:15:23.6342664Z [command]/usr/bin/git submodule sync --recursive
2023-01-11T21:15:23.6586105Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive
2023-01-11T21:15:23.6796782Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni'
2023-01-11T21:15:23.6798441Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16'
2023-01-11T21:15:23.6799122Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv'
2023-01-11T21:15:23.6800071Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK'
2023-01-11T21:15:23.6802014Z Submodule 'third_party/QNNPACK' (https://github.com/pytorch/QNNPACK) registered for path 'third_party/QNNPACK'
2023-01-11T21:15:23.6804922Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator'
2023-01-11T21:15:23.6806373Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK'
2023-01-11T21:15:23.6807953Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark'
2023-01-11T21:15:23.6809966Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo'
2023-01-11T21:15:23.6812391Z Submodule 'third_party/cub' (https://github.com/NVlabs/cub.git) registered for path 'third_party/cub'
2023-01-11T21:15:23.6816034Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend'
2023-01-11T21:15:23.6816838Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass'
2023-01-11T21:15:23.6819020Z Submodule 'third_party/eigen' (https://gitlab.com/libeigen/eigen.git) registered for path 'third_party/eigen'
2023-01-11T21:15:23.6821463Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm'
2023-01-11T21:15:23.6823753Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 'third_party/flatbuffers'
2023-01-11T21:15:23.6826131Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt'
2023-01-11T21:15:23.6828758Z Submodule 'third_party/foxi' (https://github.com/houseroad/foxi.git) registered for path 'third_party/foxi'
2023-01-11T21:15:23.6831282Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp'
2023-01-11T21:15:23.6833861Z Submodule 'third_party/gloo' (https://github.com/facebookincubator/gloo) registered for path 'third_party/gloo'
2023-01-11T21:15:23.6836643Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest'
2023-01-11T21:15:23.6839447Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep'
2023-01-11T21:15:23.6842330Z Submodule 'third_party/ios-cmake' (https://github.com/Yangqing/ios-cmake.git) registered for path 'third_party/ios-cmake'
2023-01-11T21:15:23.6845216Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi'
2023-01-11T21:15:23.6848159Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto'
2023-01-11T21:15:23.6851108Z Submodule 'third_party/nccl/nccl' (https://github.com/NVIDIA/nccl) registered for path 'third_party/nccl/nccl'
2023-01-11T21:15:23.6855358Z Submodule 'third_party/neon2sse' (https://github.com/intel/ARM_NEON_2_x86_SSE.git) registered for path 'third_party/neon2sse'
2023-01-11T21:15:23.6858544Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann'
2023-01-11T21:15:23.6861673Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx'
2023-01-11T21:15:23.6864955Z Submodule 'third_party/onnx-tensorrt' (https://github.com/onnx/onnx-tensorrt) registered for path 'third_party/onnx-tensorrt'
2023-01-11T21:15:23.6868301Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft'
2023-01-11T21:15:23.6871683Z Submodule 'third_party/protobuf' (https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf'
2023-01-11T21:15:23.6875071Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd'
2023-01-11T21:15:23.6878747Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool'
2023-01-11T21:15:23.6882216Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11'
2023-01-11T21:15:23.6885926Z Submodule 'third_party/python-enum' (https://github.com/PeachPy/enum34.git) registered for path 'third_party/python-enum'
2023-01-11T21:15:23.6889613Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy'
2023-01-11T21:15:23.6893543Z Submodule 'third_party/python-six' (https://github.com/benjaminp/six.git) registered for path 'third_party/python-six'
2023-01-11T21:15:23.6897617Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef'
2023-01-11T21:15:23.6901434Z Submodule 'third_party/tbb' (https://github.com/01org/tbb) registered for path 'third_party/tbb'
2023-01-11T21:15:23.6905863Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe'
2023-01-11T21:15:23.6909839Z Submodule 'third_party/zstd' (https://github.com/facebook/zstd.git) registered for path 'third_party/zstd'
2023-01-11T21:15:23.6928781Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'...
2023-01-11T21:15:23.9752383Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'...
2023-01-11T21:15:24.2009871Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'...
2023-01-11T21:15:24.5309618Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'...
2023-01-11T21:15:24.8230216Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/QNNPACK'...
2023-01-11T21:15:25.0948223Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'...
2023-01-11T21:15:27.1538539Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'...
2023-01-11T21:15:32.6805819Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'...
2023-01-11T21:15:33.0813352Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'...
2023-01-11T21:15:33.5801525Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cub'...
2023-01-11T21:15:35.0340374Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'...
2023-01-11T21:15:36.1281197Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cutlass'...
2023-01-11T21:15:37.2759124Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/eigen'...
2023-01-11T21:15:42.8814259Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'...
2023-01-11T21:15:43.6265726Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'...
2023-01-11T21:15:45.0714241Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'...
2023-01-11T21:15:46.2243279Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/foxi'...
2023-01-11T21:15:46.4328511Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'...
2023-01-11T21:15:47.1412260Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'...
2023-01-11T21:15:47.4848878Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'...
2023-01-11T21:15:48.3635735Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'...
2023-01-11T21:15:48.7759329Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ios-cmake'...
2023-01-11T21:15:48.9891807Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ittapi'...
2023-01-11T21:15:49.2689774Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'...
2023-01-11T21:15:50.5765284Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nccl/nccl'...
2023-01-11T21:15:50.9170666Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/neon2sse'...
2023-01-11T21:15:51.2838485Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nlohmann'...
2023-01-11T21:15:57.2794826Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'...
2023-01-11T21:15:58.9512136Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt'...
2023-01-11T21:15:59.4773267Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'...
2023-01-11T21:15:59.7145657Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'...
2023-01-11T21:16:05.0217394Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'...
2023-01-11T21:16:06.9220841Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'...
2023-01-11T21:16:07.2444530Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'...
2023-01-11T21:16:08.8290383Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-enum'...
2023-01-11T21:16:09.4136793Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'...
2023-01-11T21:16:09.7346831Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-six'...
2023-01-11T21:16:10.1870774Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'...
2023-01-11T21:16:11.2655769Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tbb'...
2023-01-11T21:16:13.0874356Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'...
2023-01-11T21:16:13.5666565Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/zstd'...
2023-01-11T21:16:15.6246257Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f'
2023-01-11T21:16:15.6320287Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3'
2023-01-11T21:16:15.6376911Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1'
2023-01-11T21:16:15.6554233Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73'
2023-01-11T21:16:15.6727244Z Submodule path 'third_party/QNNPACK': checked out '7d2a4e9931a82adc3814275b6219a03e24e36b4c'
2023-01-11T21:16:15.7002886Z Submodule path 'third_party/VulkanMemoryAllocator': checked out 'a6bfc237255a6bac1513f7c1ebde6d8aed6b5191'
2023-01-11T21:16:16.1929576Z Submodule path 'third_party/XNNPACK': checked out 'ae108ef49aa5623b896fc93d4298c49d1750d9ba'
2023-01-11T21:16:16.8923166Z Submodule path 'third_party/benchmark': checked out '0d98dba29d66e93259db7daa53a9327df767a415'
2023-01-11T21:16:16.9769618Z Submodule path 'third_party/cpuinfo': checked out '8ec7bd91ad0470e61cf38f618cc1f270dede599c'
2023-01-11T21:16:17.0053773Z Submodule path 'third_party/cub': checked out 'd106ddb991a56c3df1b6d51b2409e36ba8181ce4'
2023-01-11T21:16:17.2387339Z Submodule path 'third_party/cudnn_frontend': checked out '171a7a986f7fbd9ed71bd0cf3c7ad4f55843d6b3'
2023-01-11T21:16:17.5744422Z Submodule path 'third_party/cutlass': checked out 'b72cbf957df8cf84a6d0ff91c190ad51a9c1d24a'
2023-01-11T21:16:17.7768696Z Submodule path 'third_party/eigen': checked out '3147391d946bb4b6c68edd901f2add6ac1f31f8c'
2023-01-11T21:16:17.8133415Z Submodule path 'third_party/fbgemm': checked out '80d64206c07879fd4683be66873de7cefa1a0a71'
2023-01-11T21:16:17.8144783Z Submodule 'third_party/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/third_party/asmjit'
2023-01-11T21:16:17.8147013Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/third_party/cpuinfo'
2023-01-11T21:16:17.8149954Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/third_party/googletest'
2023-01-11T21:16:17.8152793Z Submodule 'third_party/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 'third_party/fbgemm/third_party/hipify_torch'
2023-01-11T21:16:17.8179735Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/asmjit'...
2023-01-11T21:16:18.7758155Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/cpuinfo'...
2023-01-11T21:16:19.3122006Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/googletest'...
2023-01-11T21:16:20.2260036Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/hipify_torch'...
2023-01-11T21:16:20.6023525Z Submodule path 'third_party/fbgemm/third_party/asmjit': checked out 'd3fbf7c9bc7c1d1365a94a45614b91c5a3706b81'
2023-01-11T21:16:20.6879811Z Submodule path 'third_party/fbgemm/third_party/cpuinfo': checked out 'ed8b86a253800bafdb7b25c5c399f91bff9cb1f3'
2023-01-11T21:16:20.7375893Z Submodule path 'third_party/fbgemm/third_party/googletest': checked out 'cbf019de22c8dd37b2108da35b2748fd702d1796'
2023-01-11T21:16:20.7458268Z Submodule path 'third_party/fbgemm/third_party/hipify_torch': checked out '1840658c184f3eeba787dae0f06c45756c1daaf5'
2023-01-11T21:16:20.8154152Z Submodule path 'third_party/flatbuffers': checked out 'd0cede9c90c5257537c293517a21376408b549fa'
2023-01-11T21:16:20.8455251Z Submodule path 'third_party/fmt': checked out '7bdf0628b1276379886c7f6dda2cef2b3b374f0b'
2023-01-11T21:16:20.8520589Z Submodule path 'third_party/foxi': checked out 'c278588e34e535f0bb8f00df3880d26928038cad'
2023-01-11T21:16:20.8827236Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350'
2023-01-11T21:16:20.9017704Z Submodule path 'third_party/gloo': checked out '4a5e339b764261d20fc409071dc7a8b8989aa195'
2023-01-11T21:16:20.9386456Z Submodule path 'third_party/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929'
2023-01-11T21:16:20.9477515Z Submodule path 'third_party/ideep': checked out 'e533c771a1e75a1c225c14b2261eefa62681d9e6'
2023-01-11T21:16:20.9488230Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn'
2023-01-11T21:16:20.9505938Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'...
2023-01-11T21:16:29.0382523Z Submodule path 'third_party/ideep/mkl-dnn': checked out '404ad76ee633c939d705eb583ffe50a806969d5e'
2023-01-11T21:16:29.0397569Z Submodule 'third_party/oneDNN' (https://github.com/oneapi-src/oneDNN.git) registered for path 'third_party/ideep/mkl-dnn/third_party/oneDNN'
2023-01-11T21:16:29.0421171Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN'...
2023-01-11T21:16:37.0542495Z Submodule path 'third_party/ideep/mkl-dnn/third_party/oneDNN': checked out 'fbec3e25a559ee252022ae066817b204e106a6ba'
2023-01-11T21:16:37.0626089Z Submodule path 'third_party/ios-cmake': checked out '8abaed637d56f1337d6e1d2c4026e25c1eade724'
2023-01-11T21:16:37.0741591Z Submodule path 'third_party/ittapi': checked out '5b8a7d7422611c3a0d799fb5fc5dd4abfae35b42'
2023-01-11T21:16:37.1518534Z Submodule path 'third_party/kineto': checked out '6c1629809068efd78a8d56b4aa479c7ec49ae562'
2023-01-11T21:16:37.1529987Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt'
2023-01-11T21:16:37.1531079Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest'
2023-01-11T21:16:37.1552163Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'...
2023-01-11T21:16:38.2297343Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'...
2023-01-11T21:16:39.1691767Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '2591ab91c3898c9f6544fff04660276537d32ffd'
2023-01-11T21:16:39.2139328Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347'
2023-01-11T21:16:39.2308033Z Submodule path 'third_party/nccl/nccl': checked out 'f89fd4777d2ef9229c039ff750ae21da01626f52'
2023-01-11T21:16:39.2416403Z Submodule path 'third_party/neon2sse': checked out '97a126f08ce318023be604d03f88bf0820a9464a'
2023-01-11T21:16:39.3259864Z Submodule path 'third_party/nlohmann': checked out '87cda1d6646592ac5866dc703c8e1839046a6806'
2023-01-11T21:16:39.5228244Z Submodule path 'third_party/onnx': checked out 'f7ee1ac60d06abe8e26c9b6bbe1e3db5286b614b'
2023-01-11T21:16:39.5250380Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/onnx/third_party/benchmark'
2023-01-11T21:16:39.5251444Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11'
2023-01-11T21:16:39.5270820Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/benchmark'...
2023-01-11T21:16:39.9297544Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'...
2023-01-11T21:16:40.7851843Z Submodule path 'third_party/onnx/third_party/benchmark': checked out '0d98dba29d66e93259db7daa53a9327df767a415'
2023-01-11T21:16:40.8109083Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'ffa346860b306c9bbfb341aed9c14c067751feb8'
2023-01-11T21:16:40.8231559Z Submodule path 'third_party/onnx-tensorrt': checked out 'c153211418a7c57ce071d9ce2a41f8d1c85a878f'
2023-01-11T21:16:40.8242000Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx'
2023-01-11T21:16:40.8258137Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx'...
2023-01-11T21:16:42.5039262Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx': checked out '765f5ee823a67a866f4bd28a9860e81f3c811ce8'
2023-01-11T21:16:42.5052394Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'
2023-01-11T21:16:42.5055600Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'
2023-01-11T21:16:42.5072632Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'...
2023-01-11T21:16:42.9010409Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'...
2023-01-11T21:16:43.7220276Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark': checked out 'e776aa0275e293707b6a0901e0e8d8a8a3679508'
2023-01-11T21:16:43.7810738Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11': checked out 'a1041190c8b8ff0cd9e2f0752248ad5e3789ea0c'
2023-01-11T21:16:43.7822244Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'
2023-01-11T21:16:43.7840021Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'...
2023-01-11T21:16:43.9979410Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5'
2023-01-11T21:16:44.0045463Z Submodule path 'third_party/pocketfft': checked out 'ea778e37710c07723435b1be58235996d1d43a5a'
2023-01-11T21:16:44.2159850Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a'
2023-01-11T21:16:44.2177634Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark'
2023-01-11T21:16:44.2179273Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest'
2023-01-11T21:16:44.2196058Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'...
2023-01-11T21:16:44.5980605Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'...
2023-01-11T21:16:45.4999493Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8'
2023-01-11T21:16:45.5566207Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081'
2023-01-11T21:16:45.5622594Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900'
2023-01-11T21:16:45.5697391Z Submodule path 'third_party/pthreadpool': checked out 'a134dd5d4cee80cce15db81a72e7f929d71dd413'
2023-01-11T21:16:45.5961624Z Submodule path 'third_party/pybind11': checked out '80dc998efced8ceb2be59756668a7e90e8bef917'
2023-01-11T21:16:45.6021257Z Submodule path 'third_party/python-enum': checked out '4cfedc426c4e2fc52e3f5c2b4297e15ed8d6b8c7'
2023-01-11T21:16:45.6234421Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67'
2023-01-11T21:16:45.6298239Z Submodule path 'third_party/python-six': checked out '15e31431af97e5e64b80af0a3f598d382bcdd49a'
2023-01-11T21:16:45.6640872Z Submodule path 'third_party/sleef': checked out 'e0a003ee838b75d11763aa9c3ef17bf71a725bff'
2023-01-11T21:16:45.7531369Z Submodule path 'third_party/tbb': checked out 'a51a90bc609bb73db8ea13841b5cf7aa4344d4a9'
2023-01-11T21:16:45.7727572Z Submodule path 'third_party/tensorpipe': checked out '52791a2fd214b2a9dc5759d36725909c1daa7f2e'
2023-01-11T21:16:45.7737897Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest'
2023-01-11T21:16:45.7738869Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop'
2023-01-11T21:16:45.7739944Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv'
2023-01-11T21:16:45.7742080Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11'
2023-01-11T21:16:45.7758031Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'...
2023-01-11T21:16:46.8790898Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'...
2023-01-11T21:16:47.1627435Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'...
2023-01-11T21:16:48.2546743Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'...
2023-01-11T21:16:49.2448386Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e'
2023-01-11T21:16:49.2559008Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281'
2023-01-11T21:16:49.3080362Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '1dff88e5161cba5c59276d2070d2e304e4dcb242'
2023-01-11T21:16:49.3323395Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef'
2023-01-11T21:16:49.3336297Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2023-01-11T21:16:49.3353642Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'...
2023-01-11T21:16:49.6150955Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5'
2023-01-11T21:16:49.7332728Z Submodule path 'third_party/zstd': checked out 'aec56a52fbab207fc639a1937d1e708a282edca8'
2023-01-11T21:16:49.7358580Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0
2023-01-11T21:16:49.7558175Z Entering 'android/libs/fbjni'
2023-01-11T21:16:49.7584202Z Entering 'third_party/FP16'
2023-01-11T21:16:49.7609691Z Entering 'third_party/FXdiv'
2023-01-11T21:16:49.7636243Z Entering 'third_party/NNPACK'
2023-01-11T21:16:49.7667340Z Entering 'third_party/QNNPACK'
2023-01-11T21:16:49.7693118Z Entering 'third_party/VulkanMemoryAllocator'
2023-01-11T21:16:49.7723853Z Entering 'third_party/XNNPACK'
2023-01-11T21:16:49.7762474Z Entering 'third_party/benchmark'
2023-01-11T21:16:49.7791265Z Entering 'third_party/cpuinfo'
2023-01-11T21:16:49.7826473Z Entering 'third_party/cub'
2023-01-11T21:16:49.7854986Z Entering 'third_party/cudnn_frontend'
2023-01-11T21:16:49.7884711Z Entering 'third_party/cutlass'
2023-01-11T21:16:49.7915500Z Entering 'third_party/eigen'
2023-01-11T21:16:49.7950127Z Entering 'third_party/fbgemm'
2023-01-11T21:16:49.7982160Z Entering 'third_party/fbgemm/third_party/asmjit'
2023-01-11T21:16:49.8009725Z Entering 'third_party/fbgemm/third_party/cpuinfo'
2023-01-11T21:16:49.8038431Z Entering 'third_party/fbgemm/third_party/googletest'
2023-01-11T21:16:49.8068300Z Entering 'third_party/fbgemm/third_party/hipify_torch'
2023-01-11T21:16:49.8096590Z Entering 'third_party/flatbuffers'
2023-01-11T21:16:49.8126282Z Entering 'third_party/fmt'
2023-01-11T21:16:49.8154495Z Entering 'third_party/foxi'
2023-01-11T21:16:49.8183489Z Entering 'third_party/gemmlowp/gemmlowp'
2023-01-11T21:16:49.8210424Z Entering 'third_party/gloo'
2023-01-11T21:16:49.8237224Z Entering 'third_party/googletest'
2023-01-11T21:16:49.8265011Z Entering 'third_party/ideep'
2023-01-11T21:16:49.8289980Z Entering 'third_party/ideep/mkl-dnn'
2023-01-11T21:16:49.8317847Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN'
2023-01-11T21:16:49.8349464Z Entering 'third_party/ios-cmake'
2023-01-11T21:16:49.8376240Z Entering 'third_party/ittapi'
2023-01-11T21:16:49.8401282Z Entering 'third_party/kineto'
2023-01-11T21:16:49.8427102Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2023-01-11T21:16:49.8455228Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2023-01-11T21:16:49.8481348Z Entering 'third_party/nccl/nccl'
2023-01-11T21:16:49.8506721Z Entering 'third_party/neon2sse'
2023-01-11T21:16:49.8532506Z Entering 'third_party/nlohmann'
2023-01-11T21:16:49.8558984Z Entering 'third_party/onnx'
2023-01-11T21:16:49.8594246Z Entering 'third_party/onnx/third_party/benchmark'
2023-01-11T21:16:49.8620887Z Entering 'third_party/onnx/third_party/pybind11'
2023-01-11T21:16:49.8648135Z Entering 'third_party/onnx-tensorrt'
2023-01-11T21:16:49.8674321Z Entering 'third_party/onnx-tensorrt/third_party/onnx'
2023-01-11T21:16:49.8709458Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'
2023-01-11T21:16:49.8736066Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'
2023-01-11T21:16:49.8762013Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'
2023-01-11T21:16:49.8795874Z Entering 'third_party/pocketfft'
2023-01-11T21:16:49.8823241Z Entering 'third_party/protobuf'
2023-01-11T21:16:49.8850819Z Entering 'third_party/protobuf/third_party/benchmark'
2023-01-11T21:16:49.8877830Z Entering 'third_party/protobuf/third_party/googletest'
2023-01-11T21:16:49.8905982Z Entering 'third_party/psimd'
2023-01-11T21:16:49.8931128Z Entering 'third_party/pthreadpool'
2023-01-11T21:16:49.8958191Z Entering 'third_party/pybind11'
2023-01-11T21:16:49.8985138Z Entering 'third_party/python-enum'
2023-01-11T21:16:49.9010639Z Entering 'third_party/python-peachpy'
2023-01-11T21:16:49.9037522Z Entering 'third_party/python-six'
2023-01-11T21:16:49.9063469Z Entering 'third_party/sleef'
2023-01-11T21:16:49.9088238Z Entering 'third_party/tbb'
2023-01-11T21:16:49.9116590Z Entering 'third_party/tensorpipe'
2023-01-11T21:16:49.9142438Z Entering 'third_party/tensorpipe/third_party/googletest'
2023-01-11T21:16:49.9167193Z Entering 'third_party/tensorpipe/third_party/libnop'
2023-01-11T21:16:49.9193195Z Entering 'third_party/tensorpipe/third_party/libuv'
2023-01-11T21:16:49.9219420Z Entering 'third_party/tensorpipe/third_party/pybind11'
2023-01-11T21:16:49.9244682Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2023-01-11T21:16:49.9272630Z Entering 'third_party/zstd'
2023-01-11T21:16:49.9306988Z ##[endgroup]
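Note: git submodule foreach --recursive, used above to set gc.auto 0 in every submodule (and repeatedly below for credential handling), runs an arbitrary shell command in each submodule working tree, printing the Entering lines seen here. The same pattern works for any per-submodule maintenance, e.g. (illustrative, not from this log):

    # Show the checked-out commit of every submodule, recursively.
    git submodule foreach --recursive 'git log -1 --oneline'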
2023-01-11T21:16:49.9307426Z ##[group]Persisting credentials for submodules
2023-01-11T21:16:49.9312790Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :
2023-01-11T21:16:49.9524322Z Entering 'android/libs/fbjni'
2023-01-11T21:16:49.9554405Z Entering 'third_party/FP16'
2023-01-11T21:16:49.9581071Z Entering 'third_party/FXdiv'
2023-01-11T21:16:49.9605638Z Entering 'third_party/NNPACK'
2023-01-11T21:16:49.9633026Z Entering 'third_party/QNNPACK'
2023-01-11T21:16:49.9661091Z Entering 'third_party/VulkanMemoryAllocator'
2023-01-11T21:16:49.9694019Z Entering 'third_party/XNNPACK'
2023-01-11T21:16:49.9729957Z Entering 'third_party/benchmark'
2023-01-11T21:16:49.9756555Z Entering 'third_party/cpuinfo'
2023-01-11T21:16:49.9785822Z Entering 'third_party/cub'
2023-01-11T21:16:49.9811968Z Entering 'third_party/cudnn_frontend'
2023-01-11T21:16:49.9846800Z Entering 'third_party/cutlass'
2023-01-11T21:16:49.9882157Z Entering 'third_party/eigen'
2023-01-11T21:16:49.9909795Z Entering 'third_party/fbgemm'
2023-01-11T21:16:49.9938752Z Entering 'third_party/fbgemm/third_party/asmjit'
2023-01-11T21:16:49.9965470Z Entering 'third_party/fbgemm/third_party/cpuinfo'
2023-01-11T21:16:49.9994006Z Entering 'third_party/fbgemm/third_party/googletest'
2023-01-11T21:16:50.0024567Z Entering 'third_party/fbgemm/third_party/hipify_torch'
2023-01-11T21:16:50.0052912Z Entering 'third_party/flatbuffers'
2023-01-11T21:16:50.0081095Z Entering 'third_party/fmt'
2023-01-11T21:16:50.0105902Z Entering 'third_party/foxi'
2023-01-11T21:16:50.0131223Z Entering 'third_party/gemmlowp/gemmlowp'
2023-01-11T21:16:50.0156075Z Entering 'third_party/gloo'
2023-01-11T21:16:50.0183455Z Entering 'third_party/googletest'
2023-01-11T21:16:50.0207943Z Entering 'third_party/ideep'
2023-01-11T21:16:50.0233477Z Entering 'third_party/ideep/mkl-dnn'
2023-01-11T21:16:50.0263889Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN'
2023-01-11T21:16:50.0295616Z Entering 'third_party/ios-cmake'
2023-01-11T21:16:50.0321401Z Entering 'third_party/ittapi'
2023-01-11T21:16:50.0347495Z Entering 'third_party/kineto'
2023-01-11T21:16:50.0372916Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2023-01-11T21:16:50.0398976Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2023-01-11T21:16:50.0426032Z Entering 'third_party/nccl/nccl'
2023-01-11T21:16:50.0451060Z Entering 'third_party/neon2sse'
2023-01-11T21:16:50.0476548Z Entering 'third_party/nlohmann'
2023-01-11T21:16:50.0502436Z Entering 'third_party/onnx'
2023-01-11T21:16:50.0536574Z Entering 'third_party/onnx/third_party/benchmark'
2023-01-11T21:16:50.0561757Z Entering 'third_party/onnx/third_party/pybind11'
2023-01-11T21:16:50.0589115Z Entering 'third_party/onnx-tensorrt'
2023-01-11T21:16:50.0616046Z Entering 'third_party/onnx-tensorrt/third_party/onnx'
2023-01-11T21:16:50.0645602Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'
2023-01-11T21:16:50.0672325Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'
2023-01-11T21:16:50.0697879Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'
2023-01-11T21:16:50.0726826Z Entering 'third_party/pocketfft'
2023-01-11T21:16:50.0753286Z Entering 'third_party/protobuf'
2023-01-11T21:16:50.0783302Z Entering 'third_party/protobuf/third_party/benchmark'
2023-01-11T21:16:50.0809823Z Entering 'third_party/protobuf/third_party/googletest'
2023-01-11T21:16:50.0838202Z Entering 'third_party/psimd'
2023-01-11T21:16:50.0864768Z Entering 'third_party/pthreadpool'
2023-01-11T21:16:50.0895668Z Entering 'third_party/pybind11'
2023-01-11T21:16:50.0924518Z Entering 'third_party/python-enum'
2023-01-11T21:16:50.0951419Z Entering 'third_party/python-peachpy'
2023-01-11T21:16:50.0977859Z Entering 'third_party/python-six'
2023-01-11T21:16:50.1003643Z Entering 'third_party/sleef'
2023-01-11T21:16:50.1030062Z Entering 'third_party/tbb'
2023-01-11T21:16:50.1057024Z Entering 'third_party/tensorpipe'
2023-01-11T21:16:50.1082602Z Entering 'third_party/tensorpipe/third_party/googletest'
2023-01-11T21:16:50.1108382Z Entering 'third_party/tensorpipe/third_party/libnop'
2023-01-11T21:16:50.1138512Z Entering 'third_party/tensorpipe/third_party/libuv'
2023-01-11T21:16:50.1166627Z Entering 'third_party/tensorpipe/third_party/pybind11'
2023-01-11T21:16:50.1192717Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2023-01-11T21:16:50.1220764Z Entering 'third_party/zstd'
2023-01-11T21:16:50.1256752Z [command]/usr/bin/git submodule foreach --recursive git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url
2023-01-11T21:16:50.1465988Z Entering 'android/libs/fbjni'
2023-01-11T21:16:50.1489153Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url
2023-01-11T21:16:50.1500948Z Entering 'third_party/FP16'
2023-01-11T21:16:50.1524037Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url
2023-01-11T21:16:50.1535154Z Entering 'third_party/FXdiv'
2023-01-11T21:16:50.1558492Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url
2023-01-11T21:16:50.1568712Z Entering 'third_party/NNPACK'
2023-01-11T21:16:50.1592476Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url
2023-01-11T21:16:50.1603337Z Entering 'third_party/QNNPACK'
2023-01-11T21:16:50.1627250Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/QNNPACK/config remote.origin.url
2023-01-11T21:16:50.1637913Z Entering 'third_party/VulkanMemoryAllocator'
2023-01-11T21:16:50.1662678Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url
2023-01-11T21:16:50.1673315Z Entering 'third_party/XNNPACK'
2023-01-11T21:16:50.1696803Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url
2023-01-11T21:16:50.1713905Z Entering 'third_party/benchmark'
2023-01-11T21:16:50.1739519Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url
2023-01-11T21:16:50.1752686Z Entering 'third_party/cpuinfo'
2023-01-11T21:16:50.1776674Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url
2023-01-11T21:16:50.1787589Z Entering 'third_party/cub'
2023-01-11T21:16:50.1811461Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cub/config remote.origin.url
2023-01-11T21:16:50.1824650Z Entering 'third_party/cudnn_frontend'
2023-01-11T21:16:50.1848923Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url
2023-01-11T21:16:50.1863675Z Entering 'third_party/cutlass'
2023-01-11T21:16:50.1887348Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url
2023-01-11T21:16:50.1902865Z Entering 'third_party/eigen'
2023-01-11T21:16:50.1926461Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/eigen/config remote.origin.url
2023-01-11T21:16:50.1939091Z Entering 'third_party/fbgemm'
2023-01-11T21:16:50.1962920Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url
2023-01-11T21:16:50.1975767Z Entering 'third_party/fbgemm/third_party/asmjit'
2023-01-11T21:16:50.2000632Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/asmjit/config remote.origin.url
2023-01-11T21:16:50.2013466Z Entering 'third_party/fbgemm/third_party/cpuinfo'
2023-01-11T21:16:50.2040289Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/cpuinfo/config remote.origin.url
2023-01-11T21:16:50.2054918Z Entering 'third_party/fbgemm/third_party/googletest'
2023-01-11T21:16:50.2079574Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/googletest/config remote.origin.url
2023-01-11T21:16:50.2091239Z Entering 'third_party/fbgemm/third_party/hipify_torch'
2023-01-11T21:16:50.2119894Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/hipify_torch/config remote.origin.url
2023-01-11T21:16:50.2132725Z Entering 'third_party/flatbuffers'
2023-01-11T21:16:50.2158803Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url
2023-01-11T21:16:50.2171672Z Entering 'third_party/fmt'
2023-01-11T21:16:50.2195956Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url
2023-01-11T21:16:50.2206864Z Entering 'third_party/foxi'
2023-01-11T21:16:50.2231398Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/foxi/config remote.origin.url
2023-01-11T21:16:50.2242171Z Entering 'third_party/gemmlowp/gemmlowp'
2023-01-11T21:16:50.2267371Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url
2023-01-11T21:16:50.2277994Z Entering 'third_party/gloo'
2023-01-11T21:16:50.2303523Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url
2023-01-11T21:16:50.2314110Z Entering 'third_party/googletest'
2023-01-11T21:16:50.2338208Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url
2023-01-11T21:16:50.2349326Z Entering 'third_party/ideep'
2023-01-11T21:16:50.2376132Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url
2023-01-11T21:16:50.2387432Z Entering 'third_party/ideep/mkl-dnn'
2023-01-11T21:16:50.2413142Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url
2023-01-11T21:16:50.2428572Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN'
2023-01-11T21:16:50.2456533Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/modules/third_party/oneDNN/config remote.origin.url
2023-01-11T21:16:50.2474157Z Entering 'third_party/ios-cmake'
2023-01-11T21:16:50.2500687Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ios-cmake/config remote.origin.url
2023-01-11T21:16:50.2512871Z Entering 'third_party/ittapi'
2023-01-11T21:16:50.2542102Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url
2023-01-11T21:16:50.2554343Z Entering 'third_party/kineto'
2023-01-11T21:16:50.2580495Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url
2023-01-11T21:16:50.2591008Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2023-01-11T21:16:50.2618653Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url
2023-01-11T21:16:50.2631175Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2023-01-11T21:16:50.2656323Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url
2023-01-11T21:16:50.2668375Z Entering 'third_party/nccl/nccl'
2023-01-11T21:16:50.2692167Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nccl/nccl/config remote.origin.url
2023-01-11T21:16:50.2704345Z Entering 'third_party/neon2sse'
2023-01-11T21:16:50.2727949Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/neon2sse/config remote.origin.url
2023-01-11T21:16:50.2739095Z Entering 'third_party/nlohmann'
2023-01-11T21:16:50.2763197Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url
2023-01-11T21:16:50.2777785Z Entering 'third_party/onnx'
2023-01-11T21:16:50.2802502Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url
2023-01-11T21:16:50.2823794Z Entering 'third_party/onnx/third_party/benchmark'
2023-01-11T21:16:50.2848575Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/benchmark/config remote.origin.url
2023-01-11T21:16:50.2860874Z Entering 'third_party/onnx/third_party/pybind11'
2023-01-11T21:16:50.2888348Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url
2023-01-11T21:16:50.2903214Z Entering 'third_party/onnx-tensorrt'
2023-01-11T21:16:50.2928398Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/config remote.origin.url
2023-01-11T21:16:50.2939049Z Entering 'third_party/onnx-tensorrt/third_party/onnx'
2023-01-11T21:16:50.2963449Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/config remote.origin.url
2023-01-11T21:16:50.2979344Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'
2023-01-11T21:16:50.3006357Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/benchmark/config remote.origin.url
2023-01-11T21:16:50.3019662Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'
2023-01-11T21:16:50.3044745Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url
2023-01-11T21:16:50.3056213Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'
2023-01-11T21:16:50.3080728Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url
2023-01-11T21:16:50.3096294Z Entering 'third_party/pocketfft'
2023-01-11T21:16:50.3120178Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2023-01-11T21:16:50.3130554Z Entering 'third_party/protobuf' 2023-01-11T21:16:50.3154657Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2023-01-11T21:16:50.3167434Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:16:50.3191846Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2023-01-11T21:16:50.3203186Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T21:16:50.3227018Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2023-01-11T21:16:50.3239235Z Entering 'third_party/psimd' 2023-01-11T21:16:50.3263259Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2023-01-11T21:16:50.3274416Z Entering 'third_party/pthreadpool' 2023-01-11T21:16:50.3299294Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2023-01-11T21:16:50.3310202Z Entering 'third_party/pybind11' 2023-01-11T21:16:50.3335428Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2023-01-11T21:16:50.3345416Z Entering 'third_party/python-enum' 2023-01-11T21:16:50.3368894Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-enum/config remote.origin.url 2023-01-11T21:16:50.3380321Z Entering 'third_party/python-peachpy' 2023-01-11T21:16:50.3404392Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2023-01-11T21:16:50.3416035Z Entering 'third_party/python-six' 2023-01-11T21:16:50.3439591Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-six/config remote.origin.url 2023-01-11T21:16:50.3450564Z Entering 'third_party/sleef' 2023-01-11T21:16:50.3474616Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2023-01-11T21:16:50.3485104Z Entering 'third_party/tbb' 2023-01-11T21:16:50.3508297Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tbb/config remote.origin.url 2023-01-11T21:16:50.3521056Z Entering 'third_party/tensorpipe' 2023-01-11T21:16:50.3549264Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2023-01-11T21:16:50.3561112Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:16:50.3588137Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2023-01-11T21:16:50.3599037Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:16:50.3622979Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2023-01-11T21:16:50.3633990Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:16:50.3659751Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2023-01-11T21:16:50.3672844Z Entering 
'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:16:50.3698100Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2023-01-11T21:16:50.3708066Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:16:50.3731694Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2023-01-11T21:16:50.3745317Z Entering 'third_party/zstd' 2023-01-11T21:16:50.3768728Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/zstd/config remote.origin.url 2023-01-11T21:16:50.4961820Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2023-01-11T21:16:50.5167849Z Entering 'android/libs/fbjni' 2023-01-11T21:16:50.5194687Z Entering 'third_party/FP16' 2023-01-11T21:16:50.5224065Z Entering 'third_party/FXdiv' 2023-01-11T21:16:50.5252851Z Entering 'third_party/NNPACK' 2023-01-11T21:16:50.5280959Z Entering 'third_party/QNNPACK' 2023-01-11T21:16:50.5311816Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T21:16:50.5340737Z Entering 'third_party/XNNPACK' 2023-01-11T21:16:50.5378416Z Entering 'third_party/benchmark' 2023-01-11T21:16:50.5406656Z Entering 'third_party/cpuinfo' 2023-01-11T21:16:50.5444384Z Entering 'third_party/cub' 2023-01-11T21:16:50.5473299Z Entering 'third_party/cudnn_frontend' 2023-01-11T21:16:50.5505187Z Entering 'third_party/cutlass' 2023-01-11T21:16:50.5539057Z Entering 'third_party/eigen' 2023-01-11T21:16:50.5568347Z Entering 'third_party/fbgemm' 2023-01-11T21:16:50.5598341Z Entering 'third_party/fbgemm/third_party/asmjit' 2023-01-11T21:16:50.5628446Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T21:16:50.5655774Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T21:16:50.5684150Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T21:16:50.5712692Z Entering 'third_party/flatbuffers' 2023-01-11T21:16:50.5742401Z Entering 'third_party/fmt' 2023-01-11T21:16:50.5770480Z Entering 'third_party/foxi' 2023-01-11T21:16:50.5797848Z Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T21:16:50.5827012Z Entering 'third_party/gloo' 2023-01-11T21:16:50.5855835Z Entering 'third_party/googletest' 2023-01-11T21:16:50.5886605Z Entering 'third_party/ideep' 2023-01-11T21:16:50.5914639Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T21:16:50.6034387Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2023-01-11T21:16:50.6066317Z Entering 'third_party/ios-cmake' 2023-01-11T21:16:50.6096791Z Entering 'third_party/ittapi' 2023-01-11T21:16:50.6122403Z Entering 'third_party/kineto' 2023-01-11T21:16:50.6150592Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T21:16:50.6177643Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T21:16:50.6203779Z Entering 'third_party/nccl/nccl' 2023-01-11T21:16:50.6234659Z Entering 'third_party/neon2sse' 2023-01-11T21:16:50.6261299Z Entering 'third_party/nlohmann' 2023-01-11T21:16:50.6287390Z Entering 'third_party/onnx' 2023-01-11T21:16:50.6325934Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T21:16:50.6353398Z Entering 'third_party/onnx/third_party/pybind11' 2023-01-11T21:16:50.6383930Z Entering 'third_party/onnx-tensorrt' 2023-01-11T21:16:50.6411080Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2023-01-11T21:16:50.6441032Z 
Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T21:16:50.6468160Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2023-01-11T21:16:50.6495532Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T21:16:50.6526176Z Entering 'third_party/pocketfft' 2023-01-11T21:16:50.6552118Z Entering 'third_party/protobuf' 2023-01-11T21:16:50.6581083Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:16:50.6606654Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T21:16:50.6638207Z Entering 'third_party/psimd' 2023-01-11T21:16:50.6664424Z Entering 'third_party/pthreadpool' 2023-01-11T21:16:50.6691979Z Entering 'third_party/pybind11' 2023-01-11T21:16:50.6721368Z Entering 'third_party/python-enum' 2023-01-11T21:16:50.6747345Z Entering 'third_party/python-peachpy' 2023-01-11T21:16:50.6775772Z Entering 'third_party/python-six' 2023-01-11T21:16:50.6801515Z Entering 'third_party/sleef' 2023-01-11T21:16:50.6831325Z Entering 'third_party/tbb' 2023-01-11T21:16:50.6858839Z Entering 'third_party/tensorpipe' 2023-01-11T21:16:50.6884917Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:16:50.6913394Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:16:50.6939841Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:16:50.6968100Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:16:50.6993870Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:16:50.7024666Z Entering 'third_party/zstd' 2023-01-11T21:16:50.7060614Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2023-01-11T21:16:50.7269625Z Entering 'android/libs/fbjni' 2023-01-11T21:16:50.7298664Z Entering 'third_party/FP16' 2023-01-11T21:16:50.7327201Z Entering 'third_party/FXdiv' 2023-01-11T21:16:50.7356240Z Entering 'third_party/NNPACK' 2023-01-11T21:16:50.7385244Z Entering 'third_party/QNNPACK' 2023-01-11T21:16:50.7413931Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T21:16:50.7445351Z Entering 'third_party/XNNPACK' 2023-01-11T21:16:50.7479147Z Entering 'third_party/benchmark' 2023-01-11T21:16:50.7508680Z Entering 'third_party/cpuinfo' 2023-01-11T21:16:50.7537664Z Entering 'third_party/cub' 2023-01-11T21:16:50.7565020Z Entering 'third_party/cudnn_frontend' 2023-01-11T21:16:50.7598869Z Entering 'third_party/cutlass' 2023-01-11T21:16:50.7633409Z Entering 'third_party/eigen' 2023-01-11T21:16:50.7664941Z Entering 'third_party/fbgemm' 2023-01-11T21:16:50.7692960Z Entering 'third_party/fbgemm/third_party/asmjit' 2023-01-11T21:16:50.7721622Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T21:16:50.7749743Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T21:16:50.7776072Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T21:16:50.7805658Z Entering 'third_party/flatbuffers' 2023-01-11T21:16:50.7838582Z Entering 'third_party/fmt' 2023-01-11T21:16:50.7866577Z Entering 'third_party/foxi' 2023-01-11T21:16:50.7895915Z Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T21:16:50.7924604Z Entering 'third_party/gloo' 2023-01-11T21:16:50.7954518Z Entering 'third_party/googletest' 2023-01-11T21:16:50.7983163Z Entering 'third_party/ideep' 2023-01-11T21:16:50.8010564Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T21:16:50.8040656Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 
2023-01-11T21:16:50.8073796Z Entering 'third_party/ios-cmake' 2023-01-11T21:16:50.8101117Z Entering 'third_party/ittapi' 2023-01-11T21:16:50.8130154Z Entering 'third_party/kineto' 2023-01-11T21:16:50.8160127Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T21:16:50.8187580Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T21:16:50.8216935Z Entering 'third_party/nccl/nccl' 2023-01-11T21:16:50.8247322Z Entering 'third_party/neon2sse' 2023-01-11T21:16:50.8277120Z Entering 'third_party/nlohmann' 2023-01-11T21:16:50.8309023Z Entering 'third_party/onnx' 2023-01-11T21:16:50.8345382Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T21:16:50.8375904Z Entering 'third_party/onnx/third_party/pybind11' 2023-01-11T21:16:50.8405273Z Entering 'third_party/onnx-tensorrt' 2023-01-11T21:16:50.8433322Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2023-01-11T21:16:50.8466152Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T21:16:50.8496053Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2023-01-11T21:16:50.8524715Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T21:16:50.8557051Z Entering 'third_party/pocketfft' 2023-01-11T21:16:50.8583714Z Entering 'third_party/protobuf' 2023-01-11T21:16:50.8614852Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:16:50.8643913Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T21:16:50.8673922Z Entering 'third_party/psimd' 2023-01-11T21:16:50.8702621Z Entering 'third_party/pthreadpool' 2023-01-11T21:16:50.8730614Z Entering 'third_party/pybind11' 2023-01-11T21:16:50.8761015Z Entering 'third_party/python-enum' 2023-01-11T21:16:50.8787742Z Entering 'third_party/python-peachpy' 2023-01-11T21:16:50.8815437Z Entering 'third_party/python-six' 2023-01-11T21:16:50.8844054Z Entering 'third_party/sleef' 2023-01-11T21:16:50.8872032Z Entering 'third_party/tbb' 2023-01-11T21:16:50.8900878Z Entering 'third_party/tensorpipe' 2023-01-11T21:16:50.8930041Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:16:50.8960001Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:16:50.8989616Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:16:50.9016250Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:16:50.9043704Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:16:50.9075801Z Entering 'third_party/zstd' 2023-01-11T21:16:50.9109560Z ##[endgroup] 2023-01-11T21:16:50.9139535Z [command]/usr/bin/git log -1 --format='%H' 2023-01-11T21:16:50.9161649Z '8419ddda87c8a47eacc63b54bc7ec98c1f27c26e' 2023-01-11T21:16:50.9274811Z Prepare all required actions 2023-01-11T21:16:50.9300769Z ##[group]Run ./.github/actions/setup-linux 2023-01-11T21:16:50.9301018Z env: 2023-01-11T21:16:50.9301245Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:16:50.9301456Z ##[endgroup] 2023-01-11T21:16:50.9337541Z ##[group]Run set -euo pipefail 2023-01-11T21:16:50.9337802Z set -euo pipefail 2023-01-11T21:16:50.9338020Z function get_ec2_metadata() { 2023-01-11T21:16:50.9338286Z  # Pulled from instance metadata endpoint for EC2 2023-01-11T21:16:50.9338650Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2023-01-11T21:16:50.9338959Z  category=$1 2023-01-11T21:16:50.9339218Z  curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2023-01-11T21:16:50.9339488Z } 
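# NOTE (editorial sketch, not part of the logged step): the get_ec2_metadata helper
# defined above queries the EC2 Instance Metadata Service at the fixed link-local
# address 169.254.169.254; each category argument maps to one metadata path, e.g.:
#   get_ec2_metadata instance-type   # prints "g5.4xlarge" on this runner (see output below)
#   get_ec2_metadata ami-id          # prints the AMI ID, "ami-096198a0bccc6bad4" below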
2023-01-11T21:16:50.9339735Z echo "ami-id: $(get_ec2_metadata ami-id)" 2023-01-11T21:16:50.9340200Z echo "instance-id: $(get_ec2_metadata instance-id)" 2023-01-11T21:16:50.9340504Z echo "instance-type: $(get_ec2_metadata instance-type)" 2023-01-11T21:16:50.9340766Z echo "system info $(uname -a)" 2023-01-11T21:16:50.9351589Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:16:50.9351826Z env: 2023-01-11T21:16:50.9352010Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:16:50.9352212Z ##[endgroup] 2023-01-11T21:16:50.9421223Z ami-id: ami-096198a0bccc6bad4 2023-01-11T21:16:50.9462306Z instance-id: i-016718a172a944ca0 2023-01-11T21:16:50.9502944Z instance-type: g5.4xlarge 2023-01-11T21:16:50.9508475Z system info Linux ip-10-0-2-196.ec2.internal 4.14.252-195.483.amzn2.x86_64 #1 SMP Mon Nov 1 20:58:46 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux 2023-01-11T21:16:50.9523866Z ##[group]Run if systemctl is-active --quiet docker; then 2023-01-11T21:16:50.9524165Z if systemctl is-active --quiet docker; then 2023-01-11T21:16:50.9524440Z  echo "Docker daemon is running..."; 2023-01-11T21:16:50.9524678Z else 2023-01-11T21:16:50.9524925Z  echo "Starting docker daemon..." && sudo systemctl start docker; 2023-01-11T21:16:50.9525174Z fi 2023-01-11T21:16:50.9536476Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:16:50.9536715Z env: 2023-01-11T21:16:50.9536918Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:16:50.9537130Z ##[endgroup] 2023-01-11T21:16:50.9574676Z Docker daemon is running... 2023-01-11T21:16:50.9588207Z ##[group]Run AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") 2023-01-11T21:16:50.9588580Z AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") 2023-01-11T21:16:50.9588875Z retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2023-01-11T21:16:50.9589278Z retry aws ecr get-login*** "$AWS_DEFAULT_REGION" | docker login --username AWS \ 2023-01-11T21:16:50.9589646Z  --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" 2023-01-11T21:16:50.9598308Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:16:50.9598538Z env: 2023-01-11T21:16:50.9598730Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:16:50.9598946Z AWS_RETRY_MODE: standard 2023-01-11T21:16:50.9599147Z AWS_MAX_ATTEMPTS: 5 2023-01-11T21:16:50.9599440Z AWS_DEFAULT_REGION: us-east-1 2023-01-11T21:16:50.9599653Z ##[endgroup] 2023-01-11T21:16:51.7735285Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2023-01-11T21:16:51.7736208Z Configure a credential helper to remove this warning.
See 2023-01-11T21:16:51.7737203Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2023-01-11T21:16:51.7737735Z 2023-01-11T21:16:51.7737967Z Login Succeeded 2023-01-11T21:16:51.7769173Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2023-01-11T21:16:51.7769726Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2023-01-11T21:16:51.7770131Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2023-01-11T21:16:51.7781354Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:16:51.7781585Z env: 2023-01-11T21:16:51.7781772Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:16:51.7781975Z ##[endgroup] 2023-01-11T21:16:51.7848687Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2023-01-11T21:16:51.7848951Z with: 2023-01-11T21:16:51.7849330Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:16:51.7849694Z env: 2023-01-11T21:16:51.7849882Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:16:51.7850087Z ##[endgroup] 2023-01-11T21:16:51.7861466Z ##[group]Run retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2023-01-11T21:16:51.7861750Z retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2023-01-11T21:16:51.7862034Z # ignore output since only exit code is used for conditional 2023-01-11T21:16:51.7862336Z # only pull docker image if it's not available locally 2023-01-11T21:16:51.7862650Z if ! docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2023-01-11T21:16:51.7862958Z  retry docker pull "${DOCKER_IMAGE}" 2023-01-11T21:16:51.7863176Z fi 2023-01-11T21:16:51.7871835Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:16:51.7872067Z env: 2023-01-11T21:16:51.7872259Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:16:51.7872657Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:16:51.7873028Z ##[endgroup] 2023-01-11T21:16:52.0040315Z fd224c2e6c79d7fdec6408da598bf52bc5b201dd: Pulling from pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7 2023-01-11T21:16:52.0040819Z fb668870d8a7: Pulling fs layer 2023-01-11T21:16:52.0041161Z 3dc32ed140fb: Pulling fs layer 2023-01-11T21:16:52.0041479Z 54a1df240516: Pulling fs layer 2023-01-11T21:16:52.0041693Z cf378b3cb3c7: Pulling fs layer 2023-01-11T21:16:52.0041909Z 9b4412378859: Pulling fs layer 2023-01-11T21:16:52.0045820Z 502253a1be21: Pulling fs layer 2023-01-11T21:16:52.0046226Z 5c7dd67e5809: Pulling fs layer 2023-01-11T21:16:52.0046684Z bdfd23ed3f48: Pulling fs layer 2023-01-11T21:16:52.0047049Z aee1dd761bdd: Pulling fs layer 2023-01-11T21:16:52.0047311Z 5feda9af2542: Pulling fs layer 2023-01-11T21:16:52.0047671Z f8371ecb849a: Pulling fs layer 2023-01-11T21:16:52.0048102Z ce4a87d45645: Pulling fs layer 2023-01-11T21:16:52.0048513Z 39629f7269f9: Pulling fs layer 2023-01-11T21:16:52.0048894Z cf378b3cb3c7: Waiting 2023-01-11T21:16:52.0049285Z 87d0ffa55850: Pulling fs layer 2023-01-11T21:16:52.0063658Z 70702f8b5bc4: Pulling fs layer 2023-01-11T21:16:52.0064075Z 0c06be5c20e0: Pulling fs layer 2023-01-11T21:16:52.0064374Z b372c2a3bc3f: Pulling fs layer 2023-01-11T21:16:52.0064663Z 582d081a59fa: Pulling fs layer 2023-01-11T21:16:52.0064883Z e1c655e7ec0e: Pulling fs layer 2023-01-11T21:16:52.0065108Z c7726d39d806: Pulling fs layer 2023-01-11T21:16:52.0065314Z 9b4412378859: Waiting 
2023-01-11T21:16:52.0065512Z 1c22f2f8c01b: Pulling fs layer 2023-01-11T21:16:52.0065722Z bdfd23ed3f48: Waiting 2023-01-11T21:16:52.0070259Z b8f759fd0191: Pulling fs layer 2023-01-11T21:16:52.0070826Z 502253a1be21: Waiting 2023-01-11T21:16:52.0071110Z e28e73a4bddd: Pulling fs layer 2023-01-11T21:16:52.0071395Z 90d8f9bbe048: Pulling fs layer 2023-01-11T21:16:52.0071663Z b34bd39d0461: Pulling fs layer 2023-01-11T21:16:52.0071937Z 2f2308643d60: Pulling fs layer 2023-01-11T21:16:52.0072209Z 8e3432e5a569: Pulling fs layer 2023-01-11T21:16:52.0072457Z aee1dd761bdd: Waiting 2023-01-11T21:16:52.0073707Z 9ea746919509: Pulling fs layer 2023-01-11T21:16:52.0074068Z 5c7dd67e5809: Waiting 2023-01-11T21:16:52.0074301Z 39629f7269f9: Waiting 2023-01-11T21:16:52.0074543Z 70702f8b5bc4: Waiting 2023-01-11T21:16:52.0074785Z 0c06be5c20e0: Waiting 2023-01-11T21:16:52.0075033Z 1a2fd7b216d7: Pulling fs layer 2023-01-11T21:16:52.0075297Z 5feda9af2542: Waiting 2023-01-11T21:16:52.0075542Z f8371ecb849a: Waiting 2023-01-11T21:16:52.0075828Z 19fde6a723a0: Pulling fs layer 2023-01-11T21:16:52.0076100Z 06369252d749: Pulling fs layer 2023-01-11T21:16:52.0076483Z c7726d39d806: Waiting 2023-01-11T21:16:52.0076746Z b372c2a3bc3f: Waiting 2023-01-11T21:16:52.0077004Z ea4bfeaa0fc7: Pulling fs layer 2023-01-11T21:16:52.0077246Z a1d16b6a5070: Pulling fs layer 2023-01-11T21:16:52.0077463Z f550b7ff2470: Pulling fs layer 2023-01-11T21:16:52.0077657Z e1c655e7ec0e: Waiting 2023-01-11T21:16:52.0077858Z 12ddc57b99eb: Pulling fs layer 2023-01-11T21:16:52.0078070Z 8345085fb0a0: Pulling fs layer 2023-01-11T21:16:52.0078264Z b34bd39d0461: Waiting 2023-01-11T21:16:52.0078465Z 4cc94dbec031: Pulling fs layer 2023-01-11T21:16:52.0078665Z 90d8f9bbe048: Waiting 2023-01-11T21:16:52.0078860Z 29a7c0d5fa4c: Pulling fs layer 2023-01-11T21:16:52.0079069Z 25571655d0e1: Pulling fs layer 2023-01-11T21:16:52.0079278Z bdf297d7f88c: Pulling fs layer 2023-01-11T21:16:52.0079478Z 1c22f2f8c01b: Waiting 2023-01-11T21:16:52.0079712Z 0b3950af8ae1: Pulling fs layer 2023-01-11T21:16:52.0079944Z 6d68f7da8baa: Pulling fs layer 2023-01-11T21:16:52.0080139Z 9ea746919509: Waiting 2023-01-11T21:16:52.0080331Z 4cc94dbec031: Waiting 2023-01-11T21:16:52.0080526Z b8f759fd0191: Waiting 2023-01-11T21:16:52.0080719Z cca768f96df4: Pulling fs layer 2023-01-11T21:16:52.0080923Z f550b7ff2470: Waiting 2023-01-11T21:16:52.0081123Z 8c3cf3d5e1c5: Pulling fs layer 2023-01-11T21:16:52.0081331Z 61eecfa8b34e: Pulling fs layer 2023-01-11T21:16:52.0081531Z 06369252d749: Waiting 2023-01-11T21:16:52.0081715Z 8345085fb0a0: Waiting 2023-01-11T21:16:52.0081895Z 29a7c0d5fa4c: Waiting 2023-01-11T21:16:52.0082083Z 0b3950af8ae1: Waiting 2023-01-11T21:16:52.0082264Z 8e3432e5a569: Waiting 2023-01-11T21:16:52.0082441Z 12ddc57b99eb: Waiting 2023-01-11T21:16:52.0082629Z 582d081a59fa: Waiting 2023-01-11T21:16:52.0082818Z e28e73a4bddd: Waiting 2023-01-11T21:16:52.0082998Z a1d16b6a5070: Waiting 2023-01-11T21:16:52.0083191Z 6d68f7da8baa: Waiting 2023-01-11T21:16:52.0083374Z 25571655d0e1: Waiting 2023-01-11T21:16:52.0083558Z ea4bfeaa0fc7: Waiting 2023-01-11T21:16:52.0083749Z 8c3cf3d5e1c5: Waiting 2023-01-11T21:16:52.0083938Z bdf297d7f88c: Waiting 2023-01-11T21:16:52.0084124Z cca768f96df4: Waiting 2023-01-11T21:16:52.0084326Z 95c1ac011645: Pulling fs layer 2023-01-11T21:16:52.0084525Z 61eecfa8b34e: Waiting 2023-01-11T21:16:52.0084717Z 3046cc00c4ca: Pulling fs layer 2023-01-11T21:16:52.0084930Z 195d560d8cf6: Pulling fs layer 2023-01-11T21:16:52.0085141Z 77250abd5ca4: Pulling fs layer 2023-01-11T21:16:52.0085355Z 
881b24daf9c5: Pulling fs layer 2023-01-11T21:16:52.0085559Z 9fbf0a18619e: Pulling fs layer 2023-01-11T21:16:52.0085769Z 02048a597c22: Pulling fs layer 2023-01-11T21:16:52.0085963Z 95c1ac011645: Waiting 2023-01-11T21:16:52.0086152Z 859052a25d95: Pulling fs layer 2023-01-11T21:16:52.0086350Z 195d560d8cf6: Waiting 2023-01-11T21:16:52.0086538Z 3046cc00c4ca: Waiting 2023-01-11T21:16:52.0086719Z 77250abd5ca4: Waiting 2023-01-11T21:16:52.0086920Z 3e03143da3c2: Pulling fs layer 2023-01-11T21:16:52.0087125Z 881b24daf9c5: Waiting 2023-01-11T21:16:52.0087307Z 9fbf0a18619e: Waiting 2023-01-11T21:16:52.0087490Z 859052a25d95: Waiting 2023-01-11T21:16:52.0087676Z 3e03143da3c2: Waiting 2023-01-11T21:16:52.0087906Z 02048a597c22: Waiting 2023-01-11T21:16:52.1265075Z 3dc32ed140fb: Verifying Checksum 2023-01-11T21:16:52.1265478Z 3dc32ed140fb: Download complete 2023-01-11T21:16:52.2135529Z cf378b3cb3c7: Verifying Checksum 2023-01-11T21:16:52.2136049Z cf378b3cb3c7: Download complete 2023-01-11T21:16:52.2715343Z 9b4412378859: Download complete 2023-01-11T21:16:52.3085283Z 54a1df240516: Verifying Checksum 2023-01-11T21:16:52.3085585Z 54a1df240516: Download complete 2023-01-11T21:16:52.3372911Z fb668870d8a7: Download complete 2023-01-11T21:16:52.4196997Z 5c7dd67e5809: Verifying Checksum 2023-01-11T21:16:52.4197353Z 5c7dd67e5809: Download complete 2023-01-11T21:16:52.4585949Z bdfd23ed3f48: Download complete 2023-01-11T21:16:52.5073386Z aee1dd761bdd: Verifying Checksum 2023-01-11T21:16:52.5073725Z aee1dd761bdd: Download complete 2023-01-11T21:16:52.6015657Z f8371ecb849a: Download complete 2023-01-11T21:16:52.6929645Z ce4a87d45645: Download complete 2023-01-11T21:16:52.9619150Z fb668870d8a7: Pull complete 2023-01-11T21:16:53.2066467Z 3dc32ed140fb: Pull complete 2023-01-11T21:16:53.6638793Z 54a1df240516: Pull complete 2023-01-11T21:16:53.7736960Z cf378b3cb3c7: Pull complete 2023-01-11T21:16:53.8705145Z 9b4412378859: Pull complete 2023-01-11T21:16:54.8283262Z 39629f7269f9: Verifying Checksum 2023-01-11T21:16:54.8283603Z 39629f7269f9: Download complete 2023-01-11T21:16:54.9209092Z 87d0ffa55850: Verifying Checksum 2023-01-11T21:16:54.9209403Z 87d0ffa55850: Download complete 2023-01-11T21:16:54.9989753Z 70702f8b5bc4: Verifying Checksum 2023-01-11T21:16:54.9990144Z 70702f8b5bc4: Download complete 2023-01-11T21:16:55.0959173Z 0c06be5c20e0: Download complete 2023-01-11T21:16:57.0643542Z b372c2a3bc3f: Verifying Checksum 2023-01-11T21:16:57.0643940Z b372c2a3bc3f: Download complete 2023-01-11T21:16:57.1319682Z 582d081a59fa: Verifying Checksum 2023-01-11T21:16:57.1319930Z 582d081a59fa: Download complete 2023-01-11T21:16:57.2164460Z e1c655e7ec0e: Download complete 2023-01-11T21:17:03.4948380Z 502253a1be21: Download complete 2023-01-11T21:17:03.5609436Z 1c22f2f8c01b: Verifying Checksum 2023-01-11T21:17:03.5609718Z 1c22f2f8c01b: Download complete 2023-01-11T21:17:03.6409552Z b8f759fd0191: Verifying Checksum 2023-01-11T21:17:03.6409795Z b8f759fd0191: Download complete 2023-01-11T21:17:03.7388181Z e28e73a4bddd: Verifying Checksum 2023-01-11T21:17:03.7388513Z e28e73a4bddd: Download complete 2023-01-11T21:17:03.8498961Z 90d8f9bbe048: Verifying Checksum 2023-01-11T21:17:03.8499326Z 90d8f9bbe048: Download complete 2023-01-11T21:17:03.9535712Z b34bd39d0461: Download complete 2023-01-11T21:17:04.0445951Z 2f2308643d60: Download complete 2023-01-11T21:17:05.0018394Z 8e3432e5a569: Verifying Checksum 2023-01-11T21:17:05.0018797Z 8e3432e5a569: Download complete 2023-01-11T21:17:05.0930538Z 9ea746919509: Verifying Checksum 
2023-01-11T21:17:05.0930864Z 9ea746919509: Download complete 2023-01-11T21:17:05.1789841Z 1a2fd7b216d7: Download complete 2023-01-11T21:17:05.2745643Z 19fde6a723a0: Verifying Checksum 2023-01-11T21:17:05.2745982Z 19fde6a723a0: Download complete 2023-01-11T21:17:05.4105633Z 06369252d749: Verifying Checksum 2023-01-11T21:17:05.4105961Z 06369252d749: Download complete 2023-01-11T21:17:05.4979334Z ea4bfeaa0fc7: Verifying Checksum 2023-01-11T21:17:05.4979672Z ea4bfeaa0fc7: Download complete 2023-01-11T21:17:06.8218954Z 5feda9af2542: Verifying Checksum 2023-01-11T21:17:06.8219240Z 5feda9af2542: Download complete 2023-01-11T21:17:06.8935626Z f550b7ff2470: Download complete 2023-01-11T21:17:06.9923285Z 12ddc57b99eb: Download complete 2023-01-11T21:17:07.5484597Z 8345085fb0a0: Verifying Checksum 2023-01-11T21:17:07.5485014Z 8345085fb0a0: Download complete 2023-01-11T21:17:07.6299910Z 4cc94dbec031: Verifying Checksum 2023-01-11T21:17:07.6300265Z 4cc94dbec031: Download complete 2023-01-11T21:17:07.7165028Z 29a7c0d5fa4c: Verifying Checksum 2023-01-11T21:17:07.7165465Z 29a7c0d5fa4c: Download complete 2023-01-11T21:17:07.9740510Z 25571655d0e1: Verifying Checksum 2023-01-11T21:17:07.9740870Z 25571655d0e1: Download complete 2023-01-11T21:17:08.0771718Z bdf297d7f88c: Verifying Checksum 2023-01-11T21:17:08.0772544Z bdf297d7f88c: Download complete 2023-01-11T21:17:08.5529328Z 0b3950af8ae1: Verifying Checksum 2023-01-11T21:17:08.5529614Z 0b3950af8ae1: Download complete 2023-01-11T21:17:08.6380071Z 6d68f7da8baa: Verifying Checksum 2023-01-11T21:17:08.6380346Z 6d68f7da8baa: Download complete 2023-01-11T21:17:08.7241864Z cca768f96df4: Verifying Checksum 2023-01-11T21:17:08.7242246Z cca768f96df4: Download complete 2023-01-11T21:17:08.8119701Z a1d16b6a5070: Verifying Checksum 2023-01-11T21:17:08.8120085Z a1d16b6a5070: Download complete 2023-01-11T21:17:08.8903387Z 61eecfa8b34e: Download complete 2023-01-11T21:17:08.9962720Z 95c1ac011645: Verifying Checksum 2023-01-11T21:17:08.9963131Z 95c1ac011645: Download complete 2023-01-11T21:17:09.0869092Z 3046cc00c4ca: Verifying Checksum 2023-01-11T21:17:09.0869446Z 3046cc00c4ca: Download complete 2023-01-11T21:17:09.1828423Z 195d560d8cf6: Verifying Checksum 2023-01-11T21:17:09.1828909Z 195d560d8cf6: Download complete 2023-01-11T21:17:09.7412014Z 77250abd5ca4: Download complete 2023-01-11T21:17:09.8175811Z 881b24daf9c5: Verifying Checksum 2023-01-11T21:17:09.8176099Z 881b24daf9c5: Download complete 2023-01-11T21:17:11.6815169Z 9fbf0a18619e: Verifying Checksum 2023-01-11T21:17:11.6815892Z 9fbf0a18619e: Download complete 2023-01-11T21:17:11.7740456Z 02048a597c22: Verifying Checksum 2023-01-11T21:17:11.7740750Z 02048a597c22: Download complete 2023-01-11T21:17:14.5226181Z 8c3cf3d5e1c5: Verifying Checksum 2023-01-11T21:17:14.5226623Z 8c3cf3d5e1c5: Download complete 2023-01-11T21:17:14.6376493Z 3e03143da3c2: Verifying Checksum 2023-01-11T21:17:14.6376861Z 3e03143da3c2: Download complete 2023-01-11T21:17:16.9096376Z 502253a1be21: Pull complete 2023-01-11T21:17:17.0151345Z 5c7dd67e5809: Pull complete 2023-01-11T21:17:17.4191688Z bdfd23ed3f48: Pull complete 2023-01-11T21:17:17.5058748Z aee1dd761bdd: Pull complete 2023-01-11T21:17:34.2100689Z 5feda9af2542: Pull complete 2023-01-11T21:17:34.4356769Z f8371ecb849a: Pull complete 2023-01-11T21:17:34.6477249Z ce4a87d45645: Pull complete 2023-01-11T21:17:41.3057440Z 39629f7269f9: Pull complete 2023-01-11T21:17:41.5398081Z 87d0ffa55850: Pull complete 2023-01-11T21:17:41.7727691Z 70702f8b5bc4: Pull complete 2023-01-11T21:17:42.0103081Z 
0c06be5c20e0: Pull complete 2023-01-11T21:17:43.6598807Z b372c2a3bc3f: Pull complete 2023-01-11T21:17:43.8978189Z 582d081a59fa: Pull complete 2023-01-11T21:17:44.1283032Z e1c655e7ec0e: Pull complete 2023-01-11T21:17:50.5485327Z c7726d39d806: Verifying Checksum 2023-01-11T21:17:50.5485689Z c7726d39d806: Download complete 2023-01-11T21:18:17.8189679Z c7726d39d806: Pull complete 2023-01-11T21:18:18.0420798Z 1c22f2f8c01b: Pull complete 2023-01-11T21:18:18.2671148Z b8f759fd0191: Pull complete 2023-01-11T21:18:18.5010183Z e28e73a4bddd: Pull complete 2023-01-11T21:18:18.7377991Z 90d8f9bbe048: Pull complete 2023-01-11T21:18:18.9452208Z b34bd39d0461: Pull complete 2023-01-11T21:18:19.1798187Z 2f2308643d60: Pull complete 2023-01-11T21:18:21.2984376Z 8e3432e5a569: Pull complete 2023-01-11T21:18:21.5271471Z 9ea746919509: Pull complete 2023-01-11T21:18:21.7629329Z 1a2fd7b216d7: Pull complete 2023-01-11T21:18:22.0478763Z 19fde6a723a0: Pull complete 2023-01-11T21:18:22.2905751Z 06369252d749: Pull complete 2023-01-11T21:18:22.5065710Z ea4bfeaa0fc7: Pull complete 2023-01-11T21:18:27.5597403Z a1d16b6a5070: Pull complete 2023-01-11T21:18:27.7702298Z f550b7ff2470: Pull complete 2023-01-11T21:18:27.9964697Z 12ddc57b99eb: Pull complete 2023-01-11T21:18:28.9288887Z 8345085fb0a0: Pull complete 2023-01-11T21:18:29.1704693Z 4cc94dbec031: Pull complete 2023-01-11T21:18:29.3908414Z 29a7c0d5fa4c: Pull complete 2023-01-11T21:18:29.8058272Z 25571655d0e1: Pull complete 2023-01-11T21:18:30.0037983Z bdf297d7f88c: Pull complete 2023-01-11T21:18:31.3530941Z 0b3950af8ae1: Pull complete 2023-01-11T21:18:31.5620912Z 6d68f7da8baa: Pull complete 2023-01-11T21:18:31.7921054Z cca768f96df4: Pull complete 2023-01-11T21:18:37.0105557Z 8c3cf3d5e1c5: Pull complete 2023-01-11T21:18:37.1395298Z 61eecfa8b34e: Pull complete 2023-01-11T21:18:37.5365422Z 95c1ac011645: Pull complete 2023-01-11T21:18:37.7201520Z 3046cc00c4ca: Pull complete 2023-01-11T21:18:37.9430793Z 195d560d8cf6: Pull complete 2023-01-11T21:18:38.7511629Z 77250abd5ca4: Pull complete 2023-01-11T21:18:38.9897698Z 881b24daf9c5: Pull complete 2023-01-11T21:18:40.7336732Z 9fbf0a18619e: Pull complete 2023-01-11T21:18:40.9960786Z 02048a597c22: Pull complete 2023-01-11T21:18:42.9949275Z 859052a25d95: Verifying Checksum 2023-01-11T21:18:42.9951467Z 859052a25d95: Download complete 2023-01-11T21:19:16.9733211Z 859052a25d95: Pull complete 2023-01-11T21:19:17.0700095Z 3e03143da3c2: Pull complete 2023-01-11T21:19:17.0808014Z Digest: sha256:866df6c1171dbe014496717cf2080d6cc72ca611a4e8146525c9ef09640c8ba4 2023-01-11T21:19:17.0837386Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:19:17.0861012Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:19:17.0938699Z ##[group]Run pytorch/test-infra/.github/actions/setup-nvidia@main 2023-01-11T21:19:17.0938952Z with: 2023-01-11T21:19:17.0939143Z driver-version: 515.76 2023-01-11T21:19:17.0939337Z env: 2023-01-11T21:19:17.0939515Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:19:17.0939718Z ##[endgroup] 2023-01-11T21:19:17.0964127Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 2023-01-11T21:19:17.0964367Z with: 2023-01-11T21:19:17.0964549Z timeout_minutes: 10 2023-01-11T21:19:17.0964743Z max_attempts: 3 2023-01-11T21:19:17.0969789Z command: # Is it disgusting to have a full shell script 
here in this github action? Sure
# But is it the best way to make it so that this action relies on nothing else? Absolutely
set -eou pipefail

DISTRIBUTION=$(. /etc/os-release;echo $ID$VERSION_ID)
DRIVER_FN="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"
YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo"

install_nvidia_docker2_amzn2() {
  (
    set -x
    # Needed for yum-config-manager
    sudo yum install -y yum-utils
    sudo yum-config-manager --add-repo "${YUM_REPO_URL}"
    sudo yum install -y nvidia-docker2
    sudo systemctl restart docker
  )
}

install_nvidia_docker2_ubuntu20() {
  (
    set -x
    sudo apt-get install -y nvidia-docker2
    sudo systemctl restart docker
  )
}

pre_install_nvidia_driver_amzn2() {
  (
    # Purge any nvidia driver installed from RHEL repo
    sudo yum remove -y nvidia-driver-latest-dkms
  )
}

install_nvidia_driver_common() {
  (
    # Try to gather more information about the runner and its existing NVIDIA driver if any
    echo "Before installing NVIDIA driver"
    lspci
    lsmod
    modinfo nvidia || true

    HAS_NVIDIA_DRIVER=0
    # Check if NVIDIA driver has already been installed
    if [ -x "$(command -v nvidia-smi)" ]; then
      set +e
      # The driver exists, check its version next. Also check only the first GPU if there are more than one of them
      # so that the same driver version is not printed over multiple lines
      INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0)
      NVIDIA_SMI_STATUS=$?
      if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then
        echo "Failed to get NVIDIA driver version ($INSTALLED_DRIVER_VERSION). Continuing"
      elif [ "$INSTALLED_DRIVER_VERSION" != "$DRIVER_VERSION" ]; then
        echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has been installed, but we expect to have $DRIVER_VERSION instead. Continuing"
      else
        HAS_NVIDIA_DRIVER=1
        echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has already been installed. Skipping NVIDIA driver installation"
      fi
      set -e
    fi

    if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then
      # CAUTION: this may need to be updated in future
      if [ "${DISTRIBUTION}" != ubuntu20.04 ]; then
        sudo yum groupinstall -y "Development Tools"
        # ensure our kernel install is the same as our underlying kernel,
        # groupinstall "Development Tools" has a habit of mismatching kernel headers
        sudo yum install -y "kernel-devel-uname-r == $(uname -r)"
        sudo modprobe backlight
      fi
      sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"

      set +e
      sudo /bin/bash /tmp/nvidia_driver -s --no-drm
      NVIDIA_INSTALLATION_STATUS=$?

      RESET_GPU=0
      if [ "$NVIDIA_INSTALLATION_STATUS" -ne 0 ]; then
        sudo cat /var/log/nvidia-installer.log
        # Failed to install NVIDIA driver, try to reset the GPU
        RESET_GPU=1
      elif [ -x "$(command -v nvidia-smi)" ]; then
        # Check again if nvidia-smi works even if the driver installation completes successfully
        INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0)
        NVIDIA_SMI_STATUS=$?
        if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then
          RESET_GPU=1
        fi
      fi

      if [ "$RESET_GPU" -eq 1 ]; then
        NVIDIA_DEVICES=$(lspci -D | grep -i NVIDIA | cut -d' ' -f1)
        # The GPU can get stuck in a failure state if somehow the test crashes the GPU microcode. When this
        # happens, we'll try to reset all NVIDIA devices https://github.com/pytorch/pytorch/issues/88388
        for PCI_ID in $NVIDIA_DEVICES; do
          DEVICE_ENABLED=$(cat /sys/bus/pci/devices/$PCI_ID/enable)
          echo "Resetting $PCI_ID (enabled state: $DEVICE_ENABLED)"
          # This requires sudo permission of course
          echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset
          sleep 1
        done
      fi

      sudo rm -fv /tmp/nvidia_driver
      set -e
    fi
  )
}

post_install_nvidia_driver_common() {
  (
    sudo modprobe nvidia || true
    echo "After installing NVIDIA driver"
    lspci
    lsmod
    modinfo nvidia || true
    (
      set +e
      nvidia-smi
      NVIDIA_SMI_STATUS=$?
      # Allowable exit statuses for nvidia-smi, see: https://github.com/NVIDIA/gpu-operator/issues/285
      if [ "$NVIDIA_SMI_STATUS" -eq 0 ] || [ "$NVIDIA_SMI_STATUS" -eq 14 ]; then
        echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}"
      else
        echo "ERROR: nvidia-smi exited with unresolved status ${NVIDIA_SMI_STATUS}"
        exit ${NVIDIA_SMI_STATUS}
      fi
      set -e
    )
  )
}

install_nvidia_driver_amzn2() {
  (
    set -x
    pre_install_nvidia_driver_amzn2
    install_nvidia_driver_common
    post_install_nvidia_driver_common
  )
}

install_nvidia_driver_ubuntu20() {
  (
    set -x
    install_nvidia_driver_common
    post_install_nvidia_driver_common
  )
}

echo "== Installing nvidia driver ${DRIVER_FN} =="
case "${DISTRIBUTION}" in
  amzn*)
    install_nvidia_driver_amzn2
    ;;
  ubuntu20.04)
    install_nvidia_driver_ubuntu20
    ;;
  *)
    echo "ERROR: Unknown distribution ${DISTRIBUTION}"
    exit 1
    ;;
esac

# Install container toolkit based on distribution
echo "== Installing nvidia container toolkit for ${DISTRIBUTION} =="
case "${DISTRIBUTION}" in
  amzn*)
    install_nvidia_docker2_amzn2
    ;;
  ubuntu20.04)
    install_nvidia_docker2_ubuntu20
    ;;
  *)
    echo "ERROR: Unknown distribution ${DISTRIBUTION}"
    exit 1
    ;;
esac

echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
2023-01-11T21:19:17.0975437Z retry_wait_seconds: 10 2023-01-11T21:19:17.0975664Z polling_interval_seconds: 1 2023-01-11T21:19:17.0975883Z warning_on_retry: true 2023-01-11T21:19:17.0976082Z continue_on_error: false 2023-01-11T21:19:17.0976277Z env: 2023-01-11T21:19:17.0976465Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:19:17.0976668Z DRIVER_VERSION: 515.76 2023-01-11T21:19:17.0976866Z ##[endgroup] 2023-01-11T21:19:17.1445560Z == Installing nvidia driver NVIDIA-Linux-x86_64-515.76.run == 2023-01-11T21:19:17.1446685Z + pre_install_nvidia_driver_amzn2 2023-01-11T21:19:17.1447389Z + sudo yum remove -y nvidia-driver-latest-dkms 2023-01-11T21:19:17.4230335Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2023-01-11T21:19:17.4619120Z No Match for argument: nvidia-driver-latest-dkms 2023-01-11T21:19:17.4862616Z No Packages marked for removal 2023-01-11T21:19:17.4993762Z + install_nvidia_driver_common 2023-01-11T21:19:17.4994441Z + echo 'Before installing NVIDIA driver' 2023-01-11T21:19:17.4994682Z + lspci 2023-01-11T21:19:17.4999119Z Before installing NVIDIA driver 2023-01-11T21:19:17.5082312Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2023-01-11T21:19:17.5083299Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2023-01-11T21:19:17.5084244Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2023-01-11T21:19:17.5084875Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2023-01-11T21:19:17.5086304Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. Device 8061 2023-01-11T21:19:17.5086931Z 00:05.0 Ethernet controller: Amazon.com, Inc.
Elastic Network Adapter (ENA) 2023-01-11T21:19:17.5087512Z 00:1e.0 3D controller: NVIDIA Corporation Device 2237 (rev a1) 2023-01-11T21:19:17.5088228Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller 2023-01-11T21:19:17.5088708Z + lsmod 2023-01-11T21:19:17.5098621Z Module Size Used by 2023-01-11T21:19:17.5098983Z nvidia_modeset 1142784 0 2023-01-11T21:19:17.5099298Z nvidia_uvm 1269760 0 2023-01-11T21:19:17.5099534Z veth 16384 0 2023-01-11T21:19:17.5099759Z nvidia 40808448 2 nvidia_uvm,nvidia_modeset 2023-01-11T21:19:17.5102169Z drm 425984 1 nvidia 2023-01-11T21:19:17.5102688Z i2c_core 77824 2 nvidia,drm 2023-01-11T21:19:17.5103079Z backlight 16384 1 nvidia_modeset 2023-01-11T21:19:17.5103405Z xt_conntrack 16384 1 2023-01-11T21:19:17.5103709Z ipt_MASQUERADE 16384 1 2023-01-11T21:19:17.5104035Z nf_nat_masquerade_ipv4 16384 1 ipt_MASQUERADE 2023-01-11T21:19:17.5104382Z nf_conntrack_netlink 49152 0 2023-01-11T21:19:17.5104731Z nfnetlink 16384 2 nf_conntrack_netlink 2023-01-11T21:19:17.5105039Z xfrm_user 45056 1 2023-01-11T21:19:17.5105344Z xfrm_algo 16384 1 xfrm_user 2023-01-11T21:19:17.5105648Z xt_addrtype 16384 2 2023-01-11T21:19:17.5105931Z iptable_filter 16384 1 2023-01-11T21:19:17.5106171Z iptable_nat 16384 1 2023-01-11T21:19:17.5106486Z nf_conntrack_ipv4 16384 3 2023-01-11T21:19:17.5106847Z nf_defrag_ipv4 16384 1 nf_conntrack_ipv4 2023-01-11T21:19:17.5107254Z nf_nat_ipv4 16384 1 iptable_nat 2023-01-11T21:19:17.5107732Z nf_nat 36864 2 nf_nat_masquerade_ipv4,nf_nat_ipv4 2023-01-11T21:19:17.5108269Z nf_conntrack 155648 7 xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv4,nf_nat,ipt_MASQUERADE,nf_nat_ipv4,nf_conntrack_netlink 2023-01-11T21:19:17.5108668Z br_netfilter 24576 0 2023-01-11T21:19:17.5109048Z bridge 172032 1 br_netfilter 2023-01-11T21:19:17.5109520Z stp 16384 1 bridge 2023-01-11T21:19:17.5109883Z llc 16384 2 bridge,stp 2023-01-11T21:19:17.5110386Z overlay 86016 0 2023-01-11T21:19:17.5110746Z sunrpc 393216 1 2023-01-11T21:19:17.5111664Z dm_mirror 28672 0 2023-01-11T21:19:17.5112220Z dm_region_hash 20480 1 dm_mirror 2023-01-11T21:19:17.5112547Z dm_log 20480 2 dm_region_hash,dm_mirror 2023-01-11T21:19:17.5112938Z dm_mod 143360 2 dm_log,dm_mirror 2023-01-11T21:19:17.5113269Z dax 69632 1 dm_mod 2023-01-11T21:19:17.5113552Z pcc_cpufreq 16384 0 2023-01-11T21:19:17.5113925Z crc32_pclmul 16384 0 2023-01-11T21:19:17.5114277Z ghash_clmulni_intel 16384 0 2023-01-11T21:19:17.5114614Z pcbc 16384 0 2023-01-11T21:19:17.5114934Z aesni_intel 188416 0 2023-01-11T21:19:17.5115260Z aes_x86_64 20480 1 aesni_intel 2023-01-11T21:19:17.5115603Z crypto_simd 16384 1 aesni_intel 2023-01-11T21:19:17.5115989Z glue_helper 16384 1 aesni_intel 2023-01-11T21:19:17.5116324Z cryptd 28672 3 crypto_simd,ghash_clmulni_intel,aesni_intel 2023-01-11T21:19:17.5116625Z mousedev 24576 0 2023-01-11T21:19:17.5116916Z psmouse 32768 0 2023-01-11T21:19:17.5117156Z evdev 20480 3 2023-01-11T21:19:17.5117489Z button 16384 0 2023-01-11T21:19:17.5117732Z ena 114688 0 2023-01-11T21:19:17.5117945Z crc32c_intel 24576 0 2023-01-11T21:19:17.5118201Z autofs4 49152 2 2023-01-11T21:19:17.5118462Z + modinfo nvidia 2023-01-11T21:19:17.5118860Z filename: /lib/modules/4.14.252-195.483.amzn2.x86_64/kernel/drivers/video/nvidia.ko 2023-01-11T21:19:17.5119179Z firmware: nvidia/515.76/gsp.bin 2023-01-11T21:19:17.5119537Z alias: char-major-195-* 2023-01-11T21:19:17.5119772Z version: 515.76 2023-01-11T21:19:17.5120032Z supported: external 2023-01-11T21:19:17.5120286Z license: NVIDIA 2023-01-11T21:19:17.5120530Z srcversion: 
51FD9DD90150B35351AFFBB 2023-01-11T21:19:17.5120826Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2023-01-11T21:19:17.5121123Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2023-01-11T21:19:17.5121433Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2023-01-11T21:19:17.5121725Z depends: i2c-core,drm 2023-01-11T21:19:17.5122022Z retpoline: Y 2023-01-11T21:19:17.5122269Z name: nvidia 2023-01-11T21:19:17.5122601Z vermagic: 4.14.252-195.483.amzn2.x86_64 SMP mod_unload modversions 2023-01-11T21:19:17.5157210Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 2023-01-11T21:19:17.5157657Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2023-01-11T21:19:17.5158037Z parm: NVreg_ResmanDebugLevel:int 2023-01-11T21:19:17.5158330Z parm: NVreg_RmLogonRC:int 2023-01-11T21:19:17.5158629Z parm: NVreg_ModifyDeviceFiles:int 2023-01-11T21:19:17.5158873Z parm: NVreg_DeviceFileUID:int 2023-01-11T21:19:17.5159093Z parm: NVreg_DeviceFileGID:int 2023-01-11T21:19:17.5159322Z parm: NVreg_DeviceFileMode:int 2023-01-11T21:19:17.5159599Z parm: NVreg_InitializeSystemMemoryAllocations:int 2023-01-11T21:19:17.5159885Z parm: NVreg_UsePageAttributeTable:int 2023-01-11T21:19:17.5160131Z parm: NVreg_EnablePCIeGen3:int 2023-01-11T21:19:17.5160352Z parm: NVreg_EnableMSI:int 2023-01-11T21:19:17.5160568Z parm: NVreg_TCEBypassMode:int 2023-01-11T21:19:17.5160805Z parm: NVreg_EnableStreamMemOPs:int 2023-01-11T21:19:17.5161076Z parm: NVreg_RestrictProfilingToAdminUsers:int 2023-01-11T21:19:17.5161368Z parm: NVreg_PreserveVideoMemoryAllocations:int 2023-01-11T21:19:17.5161856Z parm: NVreg_EnableS0ixPowerManagement:int 2023-01-11T21:19:17.5162166Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2023-01-11T21:19:17.5162464Z parm: NVreg_DynamicPowerManagement:int 2023-01-11T21:19:17.5162776Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2023-01-11T21:19:17.5163076Z parm: NVreg_EnableGpuFirmware:int 2023-01-11T21:19:17.5163327Z parm: NVreg_EnableGpuFirmwareLogs:int 2023-01-11T21:19:17.5163711Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2023-01-11T21:19:17.5163993Z parm: NVreg_EnableUserNUMAManagement:int 2023-01-11T21:19:17.5164254Z parm: NVreg_MemoryPoolSize:int 2023-01-11T21:19:17.5164491Z parm: NVreg_KMallocHeapMaxSize:int 2023-01-11T21:19:17.5164746Z parm: NVreg_VMallocHeapMaxSize:int 2023-01-11T21:19:17.5164986Z parm: NVreg_IgnoreMMIOCheck:int 2023-01-11T21:19:17.5165215Z parm: NVreg_NvLinkDisable:int 2023-01-11T21:19:17.5165486Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2023-01-11T21:19:17.5165764Z parm: NVreg_RegisterPCIDriver:int 2023-01-11T21:19:17.5166018Z parm: NVreg_EnableDbgBreakpoint:int 2023-01-11T21:19:17.5166262Z parm: NVreg_RegistryDwords:charp 2023-01-11T21:19:17.5166521Z parm: NVreg_RegistryDwordsPerDevice:charp 2023-01-11T21:19:17.5166772Z parm: NVreg_RmMsg:charp 2023-01-11T21:19:17.5166999Z parm: NVreg_GpuBlacklist:charp 2023-01-11T21:19:17.5167306Z parm: NVreg_TemporaryFilePath:charp 2023-01-11T21:19:17.5167560Z parm: NVreg_ExcludedGpus:charp 2023-01-11T21:19:17.5167801Z parm: NVreg_DmaRemapPeerMmio:int 2023-01-11T21:19:17.5168046Z parm: rm_firmware_active:charp 2023-01-11T21:19:17.5168267Z + HAS_NVIDIA_DRIVER=0 2023-01-11T21:19:17.5168534Z ++ command -v nvidia-smi 2023-01-11T21:19:17.5168788Z + '[' -x /usr/bin/nvidia-smi ']' 2023-01-11T21:19:17.5168994Z + set +e 2023-01-11T21:19:17.5169304Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0 2023-01-11T21:19:20.2800290Z + INSTALLED_DRIVER_VERSION=515.76 2023-01-11T21:19:20.2800965Z + NVIDIA_SMI_STATUS=0 
2023-01-11T21:19:20.2801478Z + '[' 0 -ne 0 ']' 2023-01-11T21:19:20.2801760Z + '[' 515.76 '!=' 515.76 ']' 2023-01-11T21:19:20.2802002Z + HAS_NVIDIA_DRIVER=1 2023-01-11T21:19:20.2802364Z + echo 'NVIDIA driver (515.76) has already been installed. Skipping NVIDIA driver installation' 2023-01-11T21:19:20.2802665Z + set -e 2023-01-11T21:19:20.2802873Z + '[' 1 -eq 0 ']' 2023-01-11T21:19:20.2803148Z NVIDIA driver (515.76) has already been installed. Skipping NVIDIA driver installation 2023-01-11T21:19:20.2803437Z + post_install_nvidia_driver_common 2023-01-11T21:19:20.2805138Z + sudo modprobe nvidia 2023-01-11T21:19:20.2913770Z + echo 'After installing NVIDIA driver' 2023-01-11T21:19:20.2914149Z + lspci 2023-01-11T21:19:20.2914426Z After installing NVIDIA driver 2023-01-11T21:19:20.2999784Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2023-01-11T21:19:20.3000697Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2023-01-11T21:19:20.3001905Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2023-01-11T21:19:20.3002886Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2023-01-11T21:19:20.3003948Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. Device 8061 2023-01-11T21:19:20.3004836Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2023-01-11T21:19:20.3005780Z 00:1e.0 3D controller: NVIDIA Corporation Device 2237 (rev a1) 2023-01-11T21:19:20.3006985Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller 2023-01-11T21:19:20.3007756Z + lsmod 2023-01-11T21:19:20.3019348Z Module Size Used by 2023-01-11T21:19:20.3019799Z nvidia_modeset 1142784 0 2023-01-11T21:19:20.3020173Z nvidia_uvm 1269760 0 2023-01-11T21:19:20.3020431Z veth 16384 0 2023-01-11T21:19:20.3020730Z nvidia 40808448 2 nvidia_uvm,nvidia_modeset 2023-01-11T21:19:20.3021020Z drm 425984 1 nvidia 2023-01-11T21:19:20.3021330Z i2c_core 77824 2 nvidia,drm 2023-01-11T21:19:20.3021823Z backlight 16384 1 nvidia_modeset 2023-01-11T21:19:20.3022899Z xt_conntrack 16384 1 2023-01-11T21:19:20.3023245Z ipt_MASQUERADE 16384 1 2023-01-11T21:19:20.3023608Z nf_nat_masquerade_ipv4 16384 1 ipt_MASQUERADE 2023-01-11T21:19:20.3024117Z nf_conntrack_netlink 49152 0 2023-01-11T21:19:20.3024359Z nfnetlink 16384 2 nf_conntrack_netlink 2023-01-11T21:19:20.3024580Z xfrm_user 45056 1 2023-01-11T21:19:20.3024783Z xfrm_algo 16384 1 xfrm_user 2023-01-11T21:19:20.3024994Z xt_addrtype 16384 2 2023-01-11T21:19:20.3025201Z iptable_filter 16384 1 2023-01-11T21:19:20.3025396Z iptable_nat 16384 1 2023-01-11T21:19:20.3025604Z nf_conntrack_ipv4 16384 3 2023-01-11T21:19:20.3025832Z nf_defrag_ipv4 16384 1 nf_conntrack_ipv4 2023-01-11T21:19:20.3026073Z nf_nat_ipv4 16384 1 iptable_nat 2023-01-11T21:19:20.3026316Z nf_nat 36864 2 nf_nat_masquerade_ipv4,nf_nat_ipv4 2023-01-11T21:19:20.3026675Z nf_conntrack 155648 7 xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv4,nf_nat,ipt_MASQUERADE,nf_nat_ipv4,nf_conntrack_netlink 2023-01-11T21:19:20.3027110Z br_netfilter 24576 0 2023-01-11T21:19:20.3027424Z bridge 172032 1 br_netfilter 2023-01-11T21:19:20.3027855Z stp 16384 1 bridge 2023-01-11T21:19:20.3028170Z llc 16384 2 bridge,stp 2023-01-11T21:19:20.3028482Z overlay 86016 0 2023-01-11T21:19:20.3028796Z sunrpc 393216 1 2023-01-11T21:19:20.3029103Z dm_mirror 28672 0 2023-01-11T21:19:20.3029411Z dm_region_hash 20480 1 dm_mirror 2023-01-11T21:19:20.3029658Z dm_log 20480 2 dm_region_hash,dm_mirror 2023-01-11T21:19:20.3029899Z dm_mod 143360 2 
dm_log,dm_mirror
2023-01-11T21:19:20.3030115Z dax 69632 1 dm_mod
2023-01-11T21:19:20.3030314Z pcc_cpufreq 16384 0
2023-01-11T21:19:20.3030520Z crc32_pclmul 16384 0
2023-01-11T21:19:20.3030729Z ghash_clmulni_intel 16384 0
2023-01-11T21:19:20.3030952Z pcbc 16384 0
2023-01-11T21:19:20.3031171Z aesni_intel 188416 0
2023-01-11T21:19:20.3031383Z aes_x86_64 20480 1 aesni_intel
2023-01-11T21:19:20.3031601Z crypto_simd 16384 1 aesni_intel
2023-01-11T21:19:20.3031830Z glue_helper 16384 1 aesni_intel
2023-01-11T21:19:20.3032091Z cryptd 28672 3 crypto_simd,ghash_clmulni_intel,aesni_intel
2023-01-11T21:19:20.3032333Z mousedev 24576 0
2023-01-11T21:19:20.3032531Z psmouse 32768 0
2023-01-11T21:19:20.3032732Z evdev 20480 3
2023-01-11T21:19:20.3032916Z button 16384 0
2023-01-11T21:19:20.3033105Z ena 114688 0
2023-01-11T21:19:20.3033301Z crc32c_intel 24576 0
2023-01-11T21:19:20.3033489Z autofs4 49152 2
2023-01-11T21:19:20.3033678Z + modinfo nvidia
2023-01-11T21:19:20.3034076Z filename: /lib/modules/4.14.252-195.483.amzn2.x86_64/kernel/drivers/video/nvidia.ko
2023-01-11T21:19:20.3034350Z firmware: nvidia/515.76/gsp.bin
2023-01-11T21:19:20.3034602Z alias: char-major-195-*
2023-01-11T21:19:20.3034817Z version: 515.76
2023-01-11T21:19:20.3035022Z supported: external
2023-01-11T21:19:20.3035213Z license: NVIDIA
2023-01-11T21:19:20.3035430Z srcversion: 51FD9DD90150B35351AFFBB
2023-01-11T21:19:20.3035671Z alias: pci:v000010DEd*sv*sd*bc06sc80i00*
2023-01-11T21:19:20.3035906Z alias: pci:v000010DEd*sv*sd*bc03sc02i00*
2023-01-11T21:19:20.3036146Z alias: pci:v000010DEd*sv*sd*bc03sc00i00*
2023-01-11T21:19:20.3036407Z depends: i2c-core,drm
2023-01-11T21:19:20.3036605Z retpoline: Y
2023-01-11T21:19:20.3036797Z name: nvidia
2023-01-11T21:19:20.3037103Z vermagic: 4.14.252-195.483.amzn2.x86_64 SMP mod_unload modversions
2023-01-11T21:19:20.3037396Z parm: NvSwitchRegDwords:NvSwitch regkey (charp)
2023-01-11T21:19:20.3037695Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp)
2023-01-11T21:19:20.3037980Z parm: NVreg_ResmanDebugLevel:int
2023-01-11T21:19:20.3038213Z parm: NVreg_RmLogonRC:int
2023-01-11T21:19:20.3038539Z parm: NVreg_ModifyDeviceFiles:int
2023-01-11T21:19:20.3038783Z parm: NVreg_DeviceFileUID:int
2023-01-11T21:19:20.3039019Z parm: NVreg_DeviceFileGID:int
2023-01-11T21:19:20.3039250Z parm: NVreg_DeviceFileMode:int
2023-01-11T21:19:20.3039535Z parm: NVreg_InitializeSystemMemoryAllocations:int
2023-01-11T21:19:20.3039830Z parm: NVreg_UsePageAttributeTable:int
2023-01-11T21:19:20.3040075Z parm: NVreg_EnablePCIeGen3:int
2023-01-11T21:19:20.3040309Z parm: NVreg_EnableMSI:int
2023-01-11T21:19:20.3040539Z parm: NVreg_TCEBypassMode:int
2023-01-11T21:19:20.3040779Z parm: NVreg_EnableStreamMemOPs:int
2023-01-11T21:19:20.3041061Z parm: NVreg_RestrictProfilingToAdminUsers:int
2023-01-11T21:19:20.3041365Z parm: NVreg_PreserveVideoMemoryAllocations:int
2023-01-11T21:19:20.3041660Z parm: NVreg_EnableS0ixPowerManagement:int
2023-01-11T21:19:20.3041972Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int
2023-01-11T21:19:20.3042324Z parm: NVreg_DynamicPowerManagement:int
2023-01-11T21:19:20.3042647Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int
2023-01-11T21:19:20.3042948Z parm: NVreg_EnableGpuFirmware:int
2023-01-11T21:19:20.3043208Z parm: NVreg_EnableGpuFirmwareLogs:int
2023-01-11T21:19:20.3043490Z parm: NVreg_OpenRmEnableUnsupportedGpus:int
2023-01-11T21:19:20.3043770Z parm: NVreg_EnableUserNUMAManagement:int
2023-01-11T21:19:20.3044027Z parm: NVreg_MemoryPoolSize:int
2023-01-11T21:19:20.3044277Z parm: NVreg_KMallocHeapMaxSize:int
2023-01-11T21:19:20.3044523Z parm: NVreg_VMallocHeapMaxSize:int
2023-01-11T21:19:20.3044766Z parm: NVreg_IgnoreMMIOCheck:int
2023-01-11T21:19:20.3045008Z parm: NVreg_NvLinkDisable:int
2023-01-11T21:19:20.3045280Z parm: NVreg_EnablePCIERelaxedOrderingMode:int
2023-01-11T21:19:20.3045549Z parm: NVreg_RegisterPCIDriver:int
2023-01-11T21:19:20.3045807Z parm: NVreg_EnableDbgBreakpoint:int
2023-01-11T21:19:20.3046058Z parm: NVreg_RegistryDwords:charp
2023-01-11T21:19:20.3046315Z parm: NVreg_RegistryDwordsPerDevice:charp
2023-01-11T21:19:20.3046565Z parm: NVreg_RmMsg:charp
2023-01-11T21:19:20.3046793Z parm: NVreg_GpuBlacklist:charp
2023-01-11T21:19:20.3047038Z parm: NVreg_TemporaryFilePath:charp
2023-01-11T21:19:20.3047283Z parm: NVreg_ExcludedGpus:charp
2023-01-11T21:19:20.3047526Z parm: NVreg_DmaRemapPeerMmio:int
2023-01-11T21:19:20.3047759Z parm: rm_firmware_active:charp
2023-01-11T21:19:20.3047966Z + set +e
2023-01-11T21:19:20.3048171Z + nvidia-smi
2023-01-11T21:19:22.7497174Z Wed Jan 11 21:19:22 2023
2023-01-11T21:19:22.7497866Z +-----------------------------------------------------------------------------+
2023-01-11T21:19:22.7498364Z | NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.7     |
2023-01-11T21:19:22.7498832Z |-------------------------------+----------------------+----------------------+
2023-01-11T21:19:22.7499416Z | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
2023-01-11T21:19:22.7499817Z | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
2023-01-11T21:19:22.7500106Z |                               |                      |               MIG M. |
2023-01-11T21:19:22.7500342Z |===============================+======================+======================|
2023-01-11T21:19:22.7727814Z |   0  NVIDIA A10G         Off  | 00000000:00:1E.0 Off |                    0 |
2023-01-11T21:19:22.7728391Z |  0%   27C    P0    58W / 300W |      0MiB / 23028MiB |      2%      Default |
2023-01-11T21:19:22.7728869Z |                               |                      |                  N/A |
2023-01-11T21:19:22.7729627Z +-------------------------------+----------------------+----------------------+
2023-01-11T21:19:22.7730461Z
2023-01-11T21:19:22.7731094Z +-----------------------------------------------------------------------------+
2023-01-11T21:19:22.7731626Z | Processes:                                                                  |
2023-01-11T21:19:22.7732100Z |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
2023-01-11T21:19:22.7732496Z |        ID   ID                                                   Usage      |
2023-01-11T21:19:22.7732738Z |=============================================================================|
2023-01-11T21:19:22.7733058Z |  No running processes found                                                 |
2023-01-11T21:19:22.7733418Z +-----------------------------------------------------------------------------+
2023-01-11T21:19:23.2077874Z + NVIDIA_SMI_STATUS=0
2023-01-11T21:19:23.2078648Z + '[' 0 -eq 0 ']'
2023-01-11T21:19:23.2079343Z + echo 'INFO: Ignoring allowed status 0'
2023-01-11T21:19:23.2079859Z + set -e
2023-01-11T21:19:23.2080550Z INFO: Ignoring allowed status 0
2023-01-11T21:19:23.2083105Z == Installing nvidia container toolkit for amzn2 ==
2023-01-11T21:19:23.2085300Z + sudo yum install -y yum-utils
2023-01-11T21:19:23.4794043Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
2023-01-11T21:19:24.6724949Z Package yum-utils-1.1.31-46.amzn2.0.1.noarch already installed and latest version
2023-01-11T21:19:24.6725765Z Nothing to do
2023-01-11T21:19:24.7600171Z + sudo yum-config-manager --add-repo https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo
2023-01-11T21:19:27.1796107Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
2023-01-11T21:19:27.2119373Z adding repo from: https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo
2023-01-11T21:19:27.2120417Z grabbing file https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo to /etc/yum.repos.d/nvidia-docker.repo
2023-01-11T21:19:27.2121216Z repo saved to /etc/yum.repos.d/nvidia-docker.repo
2023-01-11T21:19:27.2268416Z + sudo yum install -y nvidia-docker2
2023-01-11T21:19:27.7173281Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
2023-01-11T21:19:28.9089518Z Package nvidia-docker2-2.11.0-1.noarch already installed and latest version
2023-01-11T21:19:28.9090113Z Nothing to do
2023-01-11T21:19:28.9926948Z + sudo systemctl restart docker
2023-01-11T21:20:08.1967807Z Command completed after 1 attempt(s).
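The trace above shows the driver check tolerating an expected exit status: errexit is switched off just for nvidia-smi, the status is captured, compared against an allowed value, and errexit is restored. A minimal sketch of that idiom, reconstructed from the trace (only status 0 appears in this log, so the failure branch shown here is an assumption):

    set +e                    # allow nvidia-smi to fail without killing the script
    nvidia-smi
    NVIDIA_SMI_STATUS=$?      # capture the exit status immediately
    set -e                    # restore errexit for the rest of the script

    if [ "$NVIDIA_SMI_STATUS" -eq 0 ]; then
        echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}"
    else
        # assumed handling: surface any other status as a hard failure
        echo "ERROR: nvidia-smi returned unexpected status ${NVIDIA_SMI_STATUS}"
        exit "$NVIDIA_SMI_STATUS"
    fi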
2023-01-11T21:20:08.2015757Z ##[group]Run python3 -m pip install psutil==5.9.1
2023-01-11T21:20:08.2016050Z python3 -m pip install psutil==5.9.1
2023-01-11T21:20:08.2016303Z python3 -m pip install pynvml==11.4.1
2023-01-11T21:20:08.2016591Z python3 -m tools.stats.monitor > usage_log.txt 2>&1 &
2023-01-11T21:20:08.2016895Z echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}"
2023-01-11T21:20:08.2027518Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2023-01-11T21:20:08.2027755Z env:
2023-01-11T21:20:08.2027949Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:20:08.2028163Z GPU_FLAG: --gpus all
2023-01-11T21:20:08.2028352Z ##[endgroup]
2023-01-11T21:20:08.4160563Z Defaulting to user installation because normal site-packages is not writeable
2023-01-11T21:20:08.4345308Z Requirement already satisfied: psutil==5.9.1 in /home/ec2-user/.local/lib/python3.7/site-packages (5.9.1)
2023-01-11T21:20:08.8605360Z Defaulting to user installation because normal site-packages is not writeable
2023-01-11T21:20:08.8776688Z Requirement already satisfied: pynvml==11.4.1 in /home/ec2-user/.local/lib/python3.7/site-packages (11.4.1)
2023-01-11T21:20:09.0943401Z Prepare all required actions
2023-01-11T21:20:09.0943670Z Getting action download info
2023-01-11T21:20:09.2781202Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:4a8bfae15cc25cc0785c1603ee87a9da8fd442ea)
2023-01-11T21:20:09.4579393Z Download action repository 'actions/download-artifact@v3' (SHA:9bc31d5ccc31df68ecc42ccf4149144866c47d8a)
2023-01-11T21:20:09.7752947Z ##[group]Run ./.github/actions/download-build-artifacts
2023-01-11T21:20:09.7753193Z with:
2023-01-11T21:20:09.7753416Z name: linux-bionic-cuda11.6-py3.10-gcc7-sm86
2023-01-11T21:20:09.7753642Z env:
2023-01-11T21:20:09.7753831Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:20:09.7754034Z GPU_FLAG: --gpus all
2023-01-11T21:20:09.7754230Z ##[endgroup]
2023-01-11T21:20:09.7777521Z ##[group]Run seemethere/download-artifact-s3@v4
2023-01-11T21:20:09.7777746Z with:
2023-01-11T21:20:09.7777978Z name: linux-bionic-cuda11.6-py3.10-gcc7-sm86
2023-01-11T21:20:09.7778234Z s3-bucket: gha-artifacts
2023-01-11T21:20:09.7778482Z region: us-east-1
2023-01-11T21:20:09.7778673Z env:
2023-01-11T21:20:09.7778870Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:20:09.7779076Z GPU_FLAG: --gpus all
2023-01-11T21:20:09.7779275Z ##[endgroup]
2023-01-11T21:20:10.1734048Z Found 1 objects with prefix pytorch/pytorch/3896346758/linux-bionic-cuda11.6-py3.10-gcc7-sm86/
2023-01-11T21:20:10.1734970Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2023-01-11T21:20:18.0544660Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2023-01-11T21:20:18.0545229Z
2023-01-11T21:20:18.0557928Z ##[warning]The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/
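The ##[warning] above comes from a step that still emits the retired ::set-output workflow command; the replacement is the environment-file protocol already used by the monitor step, which appends key=value lines to the file named by GITHUB_OUTPUT. A minimal before/after sketch (the output name my-value is hypothetical):

    # deprecated: workflow command printed to stdout
    echo "::set-output name=my-value::42"

    # current: append to the step-output environment file, exactly as the
    # monitor step does above with monitor-script-pid
    echo "my-value=42" >> "${GITHUB_OUTPUT}"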
2023-01-11T21:20:18.0565204Z Artifact download has finished successfully
2023-01-11T21:20:18.0676138Z ##[group]Run unzip -o artifacts.zip
2023-01-11T21:20:18.0676394Z unzip -o artifacts.zip
2023-01-11T21:20:18.0687106Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2023-01-11T21:20:18.0687362Z env:
2023-01-11T21:20:18.0687586Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:20:18.0687795Z GPU_FLAG: --gpus all
2023-01-11T21:20:18.0687994Z ##[endgroup]
2023-01-11T21:20:18.0720534Z Archive: artifacts.zip
2023-01-11T21:20:18.0722549Z creating: dist/
2023-01-11T21:20:19.6675485Z inflating: dist/torch-2.0.0a0+git8419ddd-cp310-cp310-linux_x86_64.whl
2023-01-11T21:20:19.6675826Z creating: build/custom_test_artifacts/
2023-01-11T21:20:19.6676164Z creating: build/custom_test_artifacts/custom-op-build/
2023-01-11T21:20:19.6676530Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/
2023-01-11T21:20:19.6682367Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeOutput.log
2023-01-11T21:20:19.6683593Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/
2023-01-11T21:20:19.6684439Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeSystem.cmake
2023-01-11T21:20:19.6685296Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/
2023-01-11T21:20:19.6686139Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/tmp/
2023-01-11T21:20:19.6687060Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c
2023-01-11T21:20:19.6687960Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/a.out
2023-01-11T21:20:19.6688549Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/
2023-01-11T21:20:19.6688988Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/
2023-01-11T21:20:19.6689458Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp
2023-01-11T21:20:19.6690077Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out
2023-01-11T21:20:19.6690556Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin
2023-01-11T21:20:19.6691028Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake
2023-01-11T21:20:19.6691608Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin
2023-01-11T21:20:19.6692080Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake
2023-01-11T21:20:19.6692524Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/
2023-01-11T21:20:19.6692957Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/
2023-01-11T21:20:19.6736586Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii
2023-01-11T21:20:19.6737906Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c
2023-01-11T21:20:19.6738802Z inflating:
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2023-01-11T21:20:19.6739624Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2023-01-11T21:20:19.6740192Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2023-01-11T21:20:19.6740711Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2023-01-11T21:20:19.6741239Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2023-01-11T21:20:19.6741766Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2023-01-11T21:20:19.6742307Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2023-01-11T21:20:19.6774300Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2023-01-11T21:20:19.6806736Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2023-01-11T21:20:19.6808300Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2023-01-11T21:20:19.6809578Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2023-01-11T21:20:19.6810558Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2023-01-11T21:20:19.6811519Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 2023-01-11T21:20:19.6812481Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2023-01-11T21:20:19.6813568Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2023-01-11T21:20:19.6814992Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2023-01-11T21:20:19.6866520Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2023-01-11T21:20:19.6936504Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2023-01-11T21:20:19.6937851Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2023-01-11T21:20:19.6938582Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2023-01-11T21:20:19.6939409Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeError.log 2023-01-11T21:20:19.6939996Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2023-01-11T21:20:19.6940605Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2023-01-11T21:20:19.6941342Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2023-01-11T21:20:19.6942143Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2023-01-11T21:20:19.6942905Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 2023-01-11T21:20:19.6943371Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2023-01-11T21:20:19.6943809Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2023-01-11T21:20:19.6944259Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2023-01-11T21:20:19.6944705Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2023-01-11T21:20:19.6945154Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2023-01-11T21:20:19.6945597Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2023-01-11T21:20:19.6963787Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2023-01-11T21:20:19.7068823Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2023-01-11T21:20:19.7069559Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2023-01-11T21:20:19.7070308Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2023-01-11T21:20:19.7070842Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2023-01-11T21:20:19.7071664Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2023-01-11T21:20:19.7072277Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2023-01-11T21:20:19.7072739Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2023-01-11T21:20:19.7073204Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2023-01-11T21:20:19.7073676Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2023-01-11T21:20:19.7074146Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2023-01-11T21:20:19.7074605Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2023-01-11T21:20:19.7089797Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2023-01-11T21:20:19.7157805Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2023-01-11T21:20:19.7158983Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2023-01-11T21:20:19.7160212Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2023-01-11T21:20:19.7161380Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2023-01-11T21:20:19.7162479Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2023-01-11T21:20:19.7163571Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2023-01-11T21:20:19.7164453Z inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc 2023-01-11T21:20:19.7165410Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2023-01-11T21:20:19.7166158Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2023-01-11T21:20:19.7166906Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2023-01-11T21:20:19.7238517Z inflating: 
build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2023-01-11T21:20:19.7289662Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2023-01-11T21:20:19.7290195Z creating: build/custom_test_artifacts/jit-hook-build/ 2023-01-11T21:20:19.7290596Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 2023-01-11T21:20:19.7297284Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeOutput.log 2023-01-11T21:20:19.7297830Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/ 2023-01-11T21:20:19.7298257Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeSystem.cmake 2023-01-11T21:20:19.7298686Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/ 2023-01-11T21:20:19.7299119Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/tmp/ 2023-01-11T21:20:19.7299575Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c 2023-01-11T21:20:19.7300008Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/a.out 2023-01-11T21:20:19.7300437Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/ 2023-01-11T21:20:19.7300857Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/ 2023-01-11T21:20:19.7301604Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp 2023-01-11T21:20:19.7302242Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out 2023-01-11T21:20:19.7303463Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin 2023-01-11T21:20:19.7303937Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake 2023-01-11T21:20:19.7325725Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin 2023-01-11T21:20:19.7326652Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake 2023-01-11T21:20:19.7327136Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/ 2023-01-11T21:20:19.7327575Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/ 2023-01-11T21:20:19.7350110Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2023-01-11T21:20:19.7351022Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2023-01-11T21:20:19.7351819Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2023-01-11T21:20:19.7352677Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2023-01-11T21:20:19.7353244Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2023-01-11T21:20:19.7353776Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2023-01-11T21:20:19.7354312Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2023-01-11T21:20:19.7354836Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2023-01-11T21:20:19.7355495Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2023-01-11T21:20:19.7386528Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2023-01-11T21:20:19.7419769Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2023-01-11T21:20:19.7420680Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2023-01-11T21:20:19.7421417Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2023-01-11T21:20:19.7422040Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2023-01-11T21:20:19.7422515Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 2023-01-11T21:20:19.7423198Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2023-01-11T21:20:19.7423759Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2023-01-11T21:20:19.7424254Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2023-01-11T21:20:19.7481003Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2023-01-11T21:20:19.7539057Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2023-01-11T21:20:19.7539684Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2023-01-11T21:20:19.7540165Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2023-01-11T21:20:19.7540574Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeError.log 2023-01-11T21:20:19.7540985Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2023-01-11T21:20:19.7541400Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2023-01-11T21:20:19.7541856Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2023-01-11T21:20:19.7542334Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2023-01-11T21:20:19.7542789Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2023-01-11T21:20:19.7543235Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2023-01-11T21:20:19.7543692Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2023-01-11T21:20:19.7544142Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2023-01-11T21:20:19.7544602Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2023-01-11T21:20:19.7545063Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2023-01-11T21:20:19.7545514Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2023-01-11T21:20:19.7560698Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2023-01-11T21:20:19.7616195Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2023-01-11T21:20:19.7617192Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2023-01-11T21:20:19.7617868Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2023-01-11T21:20:19.7618302Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2023-01-11T21:20:19.7618832Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2023-01-11T21:20:19.7619248Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2023-01-11T21:20:19.7619656Z inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc 2023-01-11T21:20:19.7620039Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2023-01-11T21:20:19.7620448Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2023-01-11T21:20:19.7620830Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2023-01-11T21:20:19.7659713Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2023-01-11T21:20:19.7660098Z creating: build/custom_test_artifacts/custom-backend-build/ 2023-01-11T21:20:19.7660486Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2023-01-11T21:20:19.7664953Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeOutput.log 2023-01-11T21:20:19.7665491Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/ 2023-01-11T21:20:19.7665938Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeSystem.cmake 2023-01-11T21:20:19.7666388Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/ 2023-01-11T21:20:19.7666830Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/tmp/ 2023-01-11T21:20:19.7667557Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c 2023-01-11T21:20:19.7668760Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/a.out 2023-01-11T21:20:19.7669299Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/ 2023-01-11T21:20:19.7669751Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/ 2023-01-11T21:20:19.7672207Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp 2023-01-11T21:20:19.7672737Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out 2023-01-11T21:20:19.7673361Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin 2023-01-11T21:20:19.7673862Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake 2023-01-11T21:20:19.7675319Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin 2023-01-11T21:20:19.7675909Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake 2023-01-11T21:20:19.7676380Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/ 2023-01-11T21:20:19.7676838Z creating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/ 2023-01-11T21:20:19.7720174Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2023-01-11T21:20:19.7721869Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2023-01-11T21:20:19.7723366Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2023-01-11T21:20:19.7724548Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2023-01-11T21:20:19.7725667Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2023-01-11T21:20:19.7726753Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2023-01-11T21:20:19.7728013Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2023-01-11T21:20:19.7728609Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2023-01-11T21:20:19.7729149Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2023-01-11T21:20:19.7756612Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2023-01-11T21:20:19.7789449Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2023-01-11T21:20:19.7790286Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2023-01-11T21:20:19.7790953Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2023-01-11T21:20:19.7791635Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2023-01-11T21:20:19.7792339Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 2023-01-11T21:20:19.7792995Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2023-01-11T21:20:19.7793577Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2023-01-11T21:20:19.7794077Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2023-01-11T21:20:19.7851118Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2023-01-11T21:20:19.7909314Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2023-01-11T21:20:19.7909943Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2023-01-11T21:20:19.7910469Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2023-01-11T21:20:19.7910957Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeError.log 2023-01-11T21:20:19.7911462Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 
2023-01-11T21:20:19.7911969Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2023-01-11T21:20:19.7912519Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2023-01-11T21:20:19.7913097Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2023-01-11T21:20:19.7913650Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2023-01-11T21:20:19.7914191Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2023-01-11T21:20:19.7914744Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2023-01-11T21:20:19.7915300Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2023-01-11T21:20:19.7915778Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2023-01-11T21:20:19.7916257Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2023-01-11T21:20:19.7916742Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2023-01-11T21:20:19.7917495Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2023-01-11T21:20:19.8039258Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2023-01-11T21:20:19.8040275Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2023-01-11T21:20:19.8041257Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2023-01-11T21:20:19.8042356Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2023-01-11T21:20:19.8043327Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2023-01-11T21:20:19.8044283Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2023-01-11T21:20:19.8045269Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2023-01-11T21:20:19.8046249Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2023-01-11T21:20:19.8047218Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2023-01-11T21:20:19.8047943Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2023-01-11T21:20:19.8048461Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2023-01-11T21:20:19.8058032Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2023-01-11T21:20:19.8105777Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2023-01-11T21:20:19.8106301Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2023-01-11T21:20:19.8106789Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 
2023-01-11T21:20:19.8107241Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2023-01-11T21:20:19.8107672Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2023-01-11T21:20:19.8108102Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2023-01-11T21:20:19.8108734Z inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc 2023-01-11T21:20:19.8110327Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2023-01-11T21:20:19.8110741Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2023-01-11T21:20:19.8111202Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2023-01-11T21:20:19.8208400Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2023-01-11T21:20:19.8245876Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2023-01-11T21:20:19.8246467Z creating: build/lib/ 2023-01-11T21:20:19.8246902Z inflating: build/lib/libclog.a 2023-01-11T21:20:19.8300662Z inflating: build/lib/libgtest.a 2023-01-11T21:20:19.8308587Z inflating: build/lib/libpthreadpool.a 2023-01-11T21:20:19.8392315Z inflating: build/lib/libprotobuf-lite.a 2023-01-11T21:20:19.8399083Z inflating: build/lib/libittnotify.a 2023-01-11T21:20:19.8474915Z inflating: build/lib/libbenchmark.a 2023-01-11T21:20:19.8500547Z inflating: build/lib/libtensorpipe_uv.a 2023-01-11T21:20:19.8560381Z inflating: build/lib/libasmjit.a 2023-01-11T21:20:19.8984325Z inflating: build/lib/libprotobuf.a 2023-01-11T21:20:19.9096303Z inflating: build/lib/libgloo.a 2023-01-11T21:20:19.9122015Z inflating: build/lib/libfmt.a 2023-01-11T21:20:19.9122763Z inflating: build/lib/libfoxi_loader.a 2023-01-11T21:20:19.9123273Z inflating: build/lib/libcaffe2_nvrtc.so 2023-01-11T21:20:19.9188009Z inflating: build/lib/libc10.so 2023-01-11T21:20:19.9188304Z inflating: build/lib/libtorch_global_deps.so 2023-01-11T21:20:19.9196331Z inflating: build/lib/libcpuinfo.a 2023-01-11T21:20:19.9202737Z inflating: build/lib/libcpuinfo_internals.a 2023-01-11T21:20:19.9215850Z inflating: build/lib/libqnnpack.a 2023-01-11T21:20:19.9234612Z inflating: build/lib/libpytorch_qnnpack.a 2023-01-11T21:20:19.9686503Z inflating: build/lib/libprotoc.a 2023-01-11T21:20:19.9687305Z inflating: build/lib/libnnpack_reference_layers.a 2023-01-11T21:20:19.9701364Z inflating: build/lib/libgmock.a 2023-01-11T21:20:19.9701866Z inflating: build/lib/libgtest_main.a 2023-01-11T21:20:19.9702213Z inflating: build/lib/libbenchmark_main.a 2023-01-11T21:20:19.9719670Z inflating: build/lib/libnnpack.a 2023-01-11T21:20:20.0239882Z inflating: build/lib/libtensorpipe.a 2023-01-11T21:20:20.8002153Z inflating: build/lib/libdnnl.a 2023-01-11T21:20:20.8112797Z inflating: build/lib/libXNNPACK.a 2023-01-11T21:20:20.8155878Z inflating: build/lib/libc10_cuda.so 2023-01-11T21:20:20.8156369Z inflating: build/lib/libgmock_main.a 2023-01-11T21:20:20.9383742Z inflating: build/lib/libfbgemm.a 2023-01-11T21:20:21.0298368Z inflating: build/lib/libdnnl_graph.a 2023-01-11T21:20:21.0705882Z inflating: build/lib/libkineto.a 2023-01-11T21:20:21.0935900Z inflating: build/lib/libtensorpipe_cuda.a 2023-01-11T21:20:21.0970841Z inflating: build/lib/libcaffe2_protos.a 2023-01-11T21:20:21.1009185Z inflating: build/lib/libonnx_proto.a 2023-01-11T21:20:21.1543929Z inflating: build/lib/libonnx.a 2023-01-11T21:20:21.1883703Z inflating: build/lib/libgloo_cuda.a 2023-01-11T21:20:23.0706738Z inflating: 
build/lib/libtorch_cpu.so 2023-01-11T21:20:23.0715507Z inflating: build/lib/libunbox_lib.a 2023-01-11T21:20:24.4922402Z inflating: build/lib/libtorch_cuda.so 2023-01-11T21:20:24.4922760Z inflating: build/lib/libtorch.so 2023-01-11T21:20:25.2609609Z inflating: build/lib/libtorch_cuda_linalg.so 2023-01-11T21:20:25.2610184Z inflating: build/lib/libc10d_cuda_test.so 2023-01-11T21:20:25.2659915Z inflating: build/lib/libtorchbind_test.so 2023-01-11T21:20:25.2677879Z inflating: build/lib/libjitbackend_test.so 2023-01-11T21:20:25.2702322Z inflating: build/lib/libbackend_with_compiler.so 2023-01-11T21:20:25.2705793Z inflating: build/lib/libshm.so 2023-01-11T21:20:25.4167718Z inflating: build/lib/libtorch_python.so 2023-01-11T21:20:25.4197959Z inflating: build/lib/libnnapi_backend.so 2023-01-11T21:20:25.4198437Z creating: build/bin/ 2023-01-11T21:20:25.4242313Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2023-01-11T21:20:25.4287369Z inflating: build/bin/c10_DeviceGuard_test 2023-01-11T21:20:25.4330039Z inflating: build/bin/c10_Device_test 2023-01-11T21:20:25.4380806Z inflating: build/bin/c10_DispatchKeySet_test 2023-01-11T21:20:25.4422698Z inflating: build/bin/c10_StreamGuard_test 2023-01-11T21:20:25.4465948Z inflating: build/bin/c10_SymInt_test 2023-01-11T21:20:25.4515493Z inflating: build/bin/c10_InlineDeviceGuard_test 2023-01-11T21:20:25.4564024Z inflating: build/bin/c10_InlineStreamGuard_test 2023-01-11T21:20:25.4613657Z inflating: build/bin/c10_SizesAndStrides_test 2023-01-11T21:20:25.4656521Z inflating: build/bin/c10_Array_test 2023-01-11T21:20:25.4700915Z inflating: build/bin/c10_Bitset_test 2023-01-11T21:20:25.4762807Z inflating: build/bin/c10_C++17_test 2023-01-11T21:20:25.4804932Z inflating: build/bin/c10_ConstexprCrc_test 2023-01-11T21:20:25.4848024Z inflating: build/bin/c10_DeadlockDetection_test 2023-01-11T21:20:25.4891285Z inflating: build/bin/c10_Half_test 2023-01-11T21:20:25.4941443Z inflating: build/bin/c10_LeftRight_test 2023-01-11T21:20:25.4995972Z inflating: build/bin/c10_Metaprogramming_test 2023-01-11T21:20:25.5123146Z inflating: build/bin/c10_SmallVectorTest 2023-01-11T21:20:25.5167351Z inflating: build/bin/c10_Synchronized_test 2023-01-11T21:20:25.5213725Z inflating: build/bin/c10_TypeIndex_test 2023-01-11T21:20:25.5264237Z inflating: build/bin/c10_ThreadLocal_test 2023-01-11T21:20:25.5308458Z inflating: build/bin/c10_TypeList_test 2023-01-11T21:20:25.5350943Z inflating: build/bin/c10_TypeTraits_test 2023-01-11T21:20:25.5397517Z inflating: build/bin/c10_accumulate_test 2023-01-11T21:20:25.5446660Z inflating: build/bin/c10_bfloat16_test 2023-01-11T21:20:25.5493957Z inflating: build/bin/c10_complex_math_test 2023-01-11T21:20:25.5542721Z inflating: build/bin/c10_complex_test 2023-01-11T21:20:25.5637044Z inflating: build/bin/c10_either_test 2023-01-11T21:20:25.5684367Z inflating: build/bin/c10_exception_test 2023-01-11T21:20:25.5728093Z inflating: build/bin/c10_flags_test 2023-01-11T21:20:25.5873662Z inflating: build/bin/c10_intrusive_ptr_test 2023-01-11T21:20:25.5918824Z inflating: build/bin/c10_irange_test 2023-01-11T21:20:25.5968812Z inflating: build/bin/c10_logging_test 2023-01-11T21:20:25.6032023Z inflating: build/bin/c10_optional_test 2023-01-11T21:20:25.6087196Z inflating: build/bin/c10_ordered_preserving_dict_test 2023-01-11T21:20:25.6134150Z inflating: build/bin/c10_registry_test 2023-01-11T21:20:25.6185311Z inflating: build/bin/c10_string_view_test 2023-01-11T21:20:25.6232117Z inflating: build/bin/c10_tempfile_test 2023-01-11T21:20:25.6281588Z inflating: 
build/bin/c10_typeid_test 2023-01-11T21:20:25.6329823Z inflating: build/bin/c10_intrusive_ptr_benchmark 2023-01-11T21:20:25.6740571Z inflating: build/bin/protoc-3.13.0.0 2023-01-11T21:20:25.7152961Z inflating: build/bin/protoc 2023-01-11T21:20:25.7199792Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_1_var_test 2023-01-11T21:20:25.7247889Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_stream 2023-01-11T21:20:25.7293243Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device 2023-01-11T21:20:25.7339361Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_from_2_processes 2023-01-11T21:20:25.7385925Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads 2023-01-11T21:20:25.7434098Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks 2023-01-11T21:20:25.7476406Z inflating: build/bin/c10_cuda_CUDATest 2023-01-11T21:20:25.7522929Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block 2023-01-11T21:20:25.7779427Z inflating: build/bin/vec_test_all_types_DEFAULT 2023-01-11T21:20:25.8066438Z inflating: build/bin/vec_test_all_types_AVX2 2023-01-11T21:20:25.8114248Z inflating: build/bin/HashStoreTest 2023-01-11T21:20:25.8161496Z inflating: build/bin/FileStoreTest 2023-01-11T21:20:25.8213871Z inflating: build/bin/TCPStoreTest 2023-01-11T21:20:25.8226656Z inflating: build/bin/ProcessGroupMPITest 2023-01-11T21:20:25.8276920Z inflating: build/bin/test_edge_op_registration 2023-01-11T21:20:25.8279634Z inflating: build/bin/example_allreduce 2023-01-11T21:20:25.8327249Z inflating: build/bin/Dimname_test 2023-01-11T21:20:25.8390780Z inflating: build/bin/Dict_test 2023-01-11T21:20:25.8447404Z inflating: build/bin/MaybeOwned_test 2023-01-11T21:20:25.8497363Z inflating: build/bin/NamedTensor_test 2023-01-11T21:20:25.8549827Z inflating: build/bin/apply_utils_test 2023-01-11T21:20:25.8601760Z inflating: build/bin/atest 2023-01-11T21:20:25.8655720Z inflating: build/bin/basic 2023-01-11T21:20:25.8702584Z inflating: build/bin/broadcast_test 2023-01-11T21:20:25.8753254Z inflating: build/bin/cpu_generator_test 2023-01-11T21:20:25.8799753Z inflating: build/bin/cpu_profiling_allocator_test 2023-01-11T21:20:25.8843617Z inflating: build/bin/dispatch_key_set_test 2023-01-11T21:20:25.8920191Z inflating: build/bin/cpu_rng_test 2023-01-11T21:20:25.8963642Z inflating: build/bin/dlconvertor_test 2023-01-11T21:20:25.9015803Z inflating: build/bin/extension_backend_test 2023-01-11T21:20:25.9063395Z inflating: build/bin/half_test 2023-01-11T21:20:25.9106571Z inflating: build/bin/lazy_tensor_test 2023-01-11T21:20:25.9188644Z inflating: build/bin/ivalue_test 2023-01-11T21:20:25.9236652Z inflating: build/bin/math_kernel_test 2023-01-11T21:20:25.9285644Z inflating: build/bin/memory_format_test 2023-01-11T21:20:25.9331844Z inflating: build/bin/memory_overlapping_test 2023-01-11T21:20:25.9376584Z inflating: build/bin/operator_name_test 2023-01-11T21:20:25.9424867Z inflating: build/bin/native_test 2023-01-11T21:20:25.9471799Z inflating: build/bin/mobile_memory_cleanup 2023-01-11T21:20:25.9515687Z inflating: build/bin/operators_test 2023-01-11T21:20:25.9561938Z inflating: build/bin/packedtensoraccessor_test 2023-01-11T21:20:25.9612118Z inflating: build/bin/quantized_test 2023-01-11T21:20:25.9668708Z inflating: build/bin/pow_test 2023-01-11T21:20:25.9712590Z inflating: build/bin/reduce_ops_test 2023-01-11T21:20:25.9757660Z inflating: build/bin/reportMemoryUsage_test 2023-01-11T21:20:25.9806769Z inflating: 
build/bin/scalar_tensor_test 2023-01-11T21:20:25.9855595Z inflating: build/bin/scalar_test 2023-01-11T21:20:25.9901586Z inflating: build/bin/stride_properties_test 2023-01-11T21:20:25.9970334Z inflating: build/bin/tensor_iterator_test 2023-01-11T21:20:25.9972639Z inflating: build/bin/thread_init_test 2023-01-11T21:20:26.0020746Z inflating: build/bin/type_ptr_test 2023-01-11T21:20:26.0068890Z inflating: build/bin/test_parallel 2023-01-11T21:20:26.0113490Z inflating: build/bin/variant_test 2023-01-11T21:20:26.0166739Z inflating: build/bin/type_test 2023-01-11T21:20:26.0211521Z inflating: build/bin/undefined_tensor_test 2023-01-11T21:20:26.0212652Z inflating: build/bin/verify_api_visibility 2023-01-11T21:20:26.0274066Z inflating: build/bin/legacy_vmap_test 2023-01-11T21:20:26.0318877Z inflating: build/bin/weakref_test 2023-01-11T21:20:26.0371640Z inflating: build/bin/IListRef_test 2023-01-11T21:20:26.0465253Z inflating: build/bin/List_test 2023-01-11T21:20:26.0510005Z inflating: build/bin/wrapdim_test 2023-01-11T21:20:26.0553634Z inflating: build/bin/xla_tensor_test 2023-01-11T21:20:26.0658875Z inflating: build/bin/kernel_function_legacy_test 2023-01-11T21:20:26.0715416Z inflating: build/bin/KernelFunction_test 2023-01-11T21:20:26.0798394Z inflating: build/bin/kernel_function_test 2023-01-11T21:20:26.0908904Z inflating: build/bin/kernel_lambda_legacy_test 2023-01-11T21:20:26.0999333Z inflating: build/bin/kernel_lambda_test 2023-01-11T21:20:26.1051856Z inflating: build/bin/kernel_stackbased_test 2023-01-11T21:20:26.1135879Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2023-01-11T21:20:26.1179622Z inflating: build/bin/CppSignature_test 2023-01-11T21:20:26.1220359Z inflating: build/bin/op_allowlist_test 2023-01-11T21:20:26.1267360Z inflating: build/bin/inline_container_test 2023-01-11T21:20:26.1317918Z inflating: build/bin/backend_fallback_test 2023-01-11T21:20:26.1566870Z inflating: build/bin/op_registration_test 2023-01-11T21:20:26.1611978Z inflating: build/bin/cuda_apply_test 2023-01-11T21:20:26.1659559Z inflating: build/bin/cuda_caching_host_allocator_test 2023-01-11T21:20:26.1701956Z inflating: build/bin/cuda_device_test 2023-01-11T21:20:26.1764314Z inflating: build/bin/cuda_complex_math_test 2023-01-11T21:20:26.1807877Z inflating: build/bin/cuda_dlconvertor_test 2023-01-11T21:20:26.1857582Z inflating: build/bin/cuda_complex_test 2023-01-11T21:20:26.1909826Z inflating: build/bin/cuda_atomic_ops_test 2023-01-11T21:20:26.1962971Z inflating: build/bin/cuda_cub_test 2023-01-11T21:20:26.2007082Z inflating: build/bin/cuda_integer_divider_test 2023-01-11T21:20:26.2052055Z inflating: build/bin/cuda_reportMemoryUsage_test 2023-01-11T21:20:26.2109622Z inflating: build/bin/cuda_distributions_test 2023-01-11T21:20:26.2163255Z inflating: build/bin/cuda_stream_test 2023-01-11T21:20:26.2206239Z inflating: build/bin/cuda_half_test 2023-01-11T21:20:26.2247553Z inflating: build/bin/cuda_cudnn_test 2023-01-11T21:20:26.2298551Z inflating: build/bin/cuda_generator_test 2023-01-11T21:20:26.2312467Z inflating: build/bin/tutorial_tensorexpr 2023-01-11T21:20:26.2369393Z inflating: build/bin/ProcessGroupGlooTest 2023-01-11T21:20:26.2420033Z inflating: build/bin/ProcessGroupGlooAsyncTest 2023-01-11T21:20:26.2474660Z inflating: build/bin/ProcessGroupNCCLTest 2023-01-11T21:20:26.2525238Z inflating: build/bin/ProcessGroupNCCLErrorsTest 2023-01-11T21:20:26.2567648Z inflating: build/bin/cuda_optional_test 2023-01-11T21:20:26.2613281Z inflating: build/bin/ProcessGroupUCCTest 2023-01-11T21:20:26.2660352Z 
inflating: build/bin/test_dist_autograd
2023-01-11T21:20:26.2705097Z inflating: build/bin/cuda_packedtensoraccessor_test
2023-01-11T21:20:26.2766711Z inflating: build/bin/test_cpp_rpc
2023-01-11T21:20:26.2775127Z inflating: build/bin/aot_model_compiler_test
2023-01-11T21:20:26.2776497Z inflating: build/bin/parallel_benchmark
2023-01-11T21:20:26.2837619Z inflating: build/bin/test_mobile_nnc
2023-01-11T21:20:26.3552462Z inflating: build/bin/test_tensorexpr
2023-01-11T21:20:26.3858065Z inflating: build/bin/test_lazy
2023-01-11T21:20:26.3903049Z inflating: build/bin/cuda_vectorized_test
2023-01-11T21:20:26.3907098Z inflating: build/bin/torch_shm_manager
2023-01-11T21:20:26.4943987Z inflating: build/bin/test_api
2023-01-11T21:20:26.5891923Z inflating: build/bin/test_jit
2023-01-11T21:20:26.5916575Z ##[group]Run df -H
2023-01-11T21:20:26.5916777Z df -H
2023-01-11T21:20:26.5927711Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2023-01-11T21:20:26.5927943Z env:
2023-01-11T21:20:26.5928134Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:20:26.5928351Z GPU_FLAG: --gpus all
2023-01-11T21:20:26.5928541Z ##[endgroup]
2023-01-11T21:20:26.5955889Z Filesystem      Size  Used Avail Use% Mounted on
2023-01-11T21:20:26.5956161Z devtmpfs         34G     0   34G   0% /dev
2023-01-11T21:20:26.5956443Z tmpfs            34G     0   34G   0% /dev/shm
2023-01-11T21:20:26.5956938Z tmpfs            34G  476k   34G   1% /run
2023-01-11T21:20:26.5957185Z tmpfs            34G     0   34G   0% /sys/fs/cgroup
2023-01-11T21:20:26.5957419Z /dev/nvme0n1p1  162G   29G  133G  18% /
2023-01-11T21:20:26.5974020Z ##[group]Run .github/scripts/parse_ref.py
2023-01-11T21:20:26.5974279Z .github/scripts/parse_ref.py
2023-01-11T21:20:26.5983079Z shell: /usr/bin/bash -e {0}
2023-01-11T21:20:26.5983279Z env:
2023-01-11T21:20:26.5983463Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:20:26.5983676Z GPU_FLAG: --gpus all
2023-01-11T21:20:26.5983874Z ##[endgroup]
2023-01-11T21:20:26.6198136Z ##[group]Run set -x
2023-01-11T21:20:26.6198464Z set -x
2023-01-11T21:20:26.6198703Z 
2023-01-11T21:20:26.6198993Z if [[ $TEST_CONFIG == 'multigpu' ]]; then
2023-01-11T21:20:26.6199274Z  TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
2023-01-11T21:20:26.6199565Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then
2023-01-11T21:20:26.6199861Z  TEST_COMMAND=.jenkins/onnx/test.sh
2023-01-11T21:20:26.6200083Z else
2023-01-11T21:20:26.6200300Z  TEST_COMMAND=.jenkins/pytorch/test.sh
2023-01-11T21:20:26.6200521Z fi
2023-01-11T21:20:26.6200707Z 
2023-01-11T21:20:26.6200958Z COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}")
2023-01-11T21:20:26.6201232Z 
2023-01-11T21:20:26.6201484Z # sanitize the input commit message and PR body here:
2023-01-11T21:20:26.6201734Z #
2023-01-11T21:20:26.6202053Z # trim all new lines from commit messages + PR_BODY to avoid issues with batch environment
2023-01-11T21:20:26.6202439Z # variable copying. see https://github.com/pytorch/pytorch/pull/80043#issuecomment-1167796028
2023-01-11T21:20:26.6202773Z COMMIT_MESSAGES="${COMMIT_MESSAGES//[$'\n\r']}"
2023-01-11T21:20:26.6203027Z PR_BODY="${PR_BODY//[$'\n\r']}"
2023-01-11T21:20:26.6203238Z 
2023-01-11T21:20:26.6203517Z # then trim all special characters like single and double quotes to avoid unescaped inputs to
2023-01-11T21:20:26.6203811Z # wreak havoc internally
2023-01-11T21:20:26.6204072Z export COMMIT_MESSAGES="${COMMIT_MESSAGES//[\'\"]}"
2023-01-11T21:20:26.6204414Z export PR_BODY="${PR_BODY//[\'\"]}"
2023-01-11T21:20:26.6204625Z 
2023-01-11T21:20:26.6204869Z # detached container should get cleaned up by teardown_ec2_linux
2023-01-11T21:20:26.6205175Z # TODO: Stop building test binaries as part of the build phase
2023-01-11T21:20:26.6205474Z # Used for GPU_FLAG since that doesn't play nice
2023-01-11T21:20:26.6205734Z # shellcheck disable=SC2086,SC2090
2023-01-11T21:20:26.6205972Z container_name=$(docker run \
2023-01-11T21:20:26.6206185Z  ${GPU_FLAG:-} \
2023-01-11T21:20:26.6206402Z  -e BUILD_ENVIRONMENT \
2023-01-11T21:20:26.6206624Z  -e PR_NUMBER \
2023-01-11T21:20:26.6206830Z  -e GITHUB_ACTIONS \
2023-01-11T21:20:26.6207040Z  -e BASE_SHA \
2023-01-11T21:20:26.6207239Z  -e BRANCH \
2023-01-11T21:20:26.6207428Z  -e SHA1 \
2023-01-11T21:20:26.6207639Z  -e AWS_DEFAULT_REGION \
2023-01-11T21:20:26.6207854Z  -e IN_WHEEL_TEST \
2023-01-11T21:20:26.6208054Z  -e SHARD_NUMBER \
2023-01-11T21:20:26.6208262Z  -e TEST_CONFIG \
2023-01-11T21:20:26.6208476Z  -e NUM_TEST_SHARDS \
2023-01-11T21:20:26.6208683Z  -e PR_BODY \
2023-01-11T21:20:26.6208887Z  -e COMMIT_MESSAGES \
2023-01-11T21:20:26.6209111Z  -e CONTINUE_THROUGH_ERROR \
2023-01-11T21:20:26.6209349Z  -e PYTORCH_RETRY_TEST_CASES \
2023-01-11T21:20:26.6209591Z  -e PYTORCH_OVERRIDE_FLAKY_SIGNAL \
2023-01-11T21:20:26.6209822Z  -e PR_LABELS \
2023-01-11T21:20:26.6210053Z  -e MAX_JOBS="$(nproc --ignore=2)" \
2023-01-11T21:20:26.6210280Z  -e SCCACHE_BUCKET \
2023-01-11T21:20:26.6210502Z  -e SCCACHE_S3_KEY_PREFIX \
2023-01-11T21:20:26.6210717Z  -e XLA_CUDA \
2023-01-11T21:20:26.6210934Z  -e XLA_CLANG_CACHE_S3_BUCKET_NAME \
2023-01-11T21:20:26.6211190Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \
2023-01-11T21:20:26.6211451Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \
2023-01-11T21:20:26.6211729Z  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
2023-01-11T21:20:26.6211976Z  --ulimit stack=10485760:83886080 \
2023-01-11T21:20:26.6212261Z  --security-opt seccomp=unconfined \
2023-01-11T21:20:26.6212503Z  --cap-add=SYS_PTRACE \
2023-01-11T21:20:26.6212711Z  --ipc=host \
2023-01-11T21:20:26.6212929Z  --shm-size="${SHM_SIZE}" \
2023-01-11T21:20:26.6213212Z  --tty \
2023-01-11T21:20:26.6213398Z  --detach \
2023-01-11T21:20:26.6213612Z  --name="${container_name}" \
2023-01-11T21:20:26.6213830Z  --user jenkins \
2023-01-11T21:20:26.6214077Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
2023-01-11T21:20:26.6214344Z  -w /var/lib/jenkins/workspace \
2023-01-11T21:20:26.6214698Z  "${DOCKER_IMAGE}"
2023-01-11T21:20:26.6214891Z )
2023-01-11T21:20:26.6215125Z echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}"
2023-01-11T21:20:26.6215476Z docker exec -t "${container_name}" sh -c "pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}"
2023-01-11T21:20:26.6226107Z shell: /usr/bin/bash -e {0}
2023-01-11T21:20:26.6226304Z env:
2023-01-11T21:20:26.6226501Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:20:26.6226719Z GPU_FLAG: --gpus all
2023-01-11T21:20:26.6226989Z BUILD_ENVIRONMENT: linux-bionic-cuda11.6-py3.10-gcc7-sm86
2023-01-11T21:20:26.6227245Z PR_NUMBER: 
2023-01-11T21:20:26.6227430Z BRANCH: 
2023-01-11T21:20:26.6227645Z SHA1: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e
2023-01-11T21:20:26.6227916Z BASE_SHA: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e
2023-01-11T21:20:26.6228166Z PYTORCH_RETRY_TEST_CASES: 1
2023-01-11T21:20:26.6228386Z PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1
2023-01-11T21:20:26.6228608Z TEST_CONFIG: default
2023-01-11T21:20:26.6228852Z SHARD_NUMBER: 2
2023-01-11T21:20:26.6229040Z NUM_TEST_SHARDS: 4
2023-01-11T21:20:26.6229249Z PR_BODY: 
2023-01-11T21:20:26.6229483Z CONTINUE_THROUGH_ERROR: False
2023-01-11T21:20:26.6229756Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
2023-01-11T21:20:26.6230008Z SCCACHE_S3_KEY_PREFIX: trunk
2023-01-11T21:20:26.6230212Z SHM_SIZE: 2g
2023-01-11T21:20:26.6230594Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd
2023-01-11T21:20:26.6230955Z XLA_CUDA: 
2023-01-11T21:20:26.6231227Z XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
2023-01-11T21:20:26.6231521Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 0
2023-01-11T21:20:26.6231753Z PYTORCH_TEST_RERUN_DISABLED_TESTS: 0
2023-01-11T21:20:26.6231965Z ##[endgroup]
2023-01-11T21:20:26.6255899Z + [[ default == \m\u\l\t\i\g\p\u ]]
2023-01-11T21:20:26.6257096Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *onnx* ]]
2023-01-11T21:20:26.6257694Z + TEST_COMMAND=.jenkins/pytorch/test.sh
2023-01-11T21:20:26.6258559Z ++ git cherry -v origin/master
2023-01-11T21:20:26.6669375Z + COMMIT_MESSAGES='+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into '\''input'\''
2023-01-11T21:20:26.6669843Z + 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch
2023-01-11T21:20:26.6670301Z + 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e '\''other'\'' instead of '\''output'\'' in documentation'
2023-01-11T21:20:26.6672086Z + COMMIT_MESSAGES='+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into '\''input'\''+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e '\''other'\'' instead of '\''output'\'' in documentation'
2023-01-11T21:20:26.6673034Z + PR_BODY=
2023-01-11T21:20:26.6675987Z + export 'COMMIT_MESSAGES=+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into input+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e other instead of output in documentation'
2023-01-11T21:20:26.6678082Z + COMMIT_MESSAGES='+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into input+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e other instead of output in documentation'
2023-01-11T21:20:26.6679021Z + export PR_BODY=
2023-01-11T21:20:26.6679295Z + PR_BODY=
2023-01-11T21:20:26.6683916Z +++ nproc --ignore=2
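The trace above shows the sanitization doing its job: the three-line git cherry output collapses to a single line once the newlines are stripped, and the escaped quotes around 'input' and 'output' disappear after the quote-stripping expansion. A standalone sketch of those two bash substitutions, using the same parameter expansions as the script (the sample value is invented for illustration):

    # sample multi-line value with quotes (invented)
    COMMIT_MESSAGES=$'+ abc123 first subject\n+ def456 second "quoted" subject'

    # drop newlines and carriage returns so the value survives env-file copying
    COMMIT_MESSAGES="${COMMIT_MESSAGES//[$'\n\r']}"

    # drop single and double quotes so the value cannot break later shell quoting
    export COMMIT_MESSAGES="${COMMIT_MESSAGES//[\'\"]}"

    echo "$COMMIT_MESSAGES"
    # -> + abc123 first subject+ def456 second quoted subject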
XLA_CLANG_CACHE_S3_BUCKET_NAME -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS --env-file=/tmp/github_env_3896346758 --ulimit stack=10485760:83886080 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:20:40.4350268Z + container_name=b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T21:20:40.4350706Z + echo DOCKER_CONTAINER_ID=b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T21:20:40.4353337Z ++ echo dist/torch-2.0.0a0+git8419ddd-cp310-cp310-linux_x86_64.whl 2023-01-11T21:20:40.4354475Z + docker exec -t b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 sh -c 'pip install dist/torch-2.0.0a0+git8419ddd-cp310-cp310-linux_x86_64.whl[opt-einsum] && .jenkins/pytorch/test.sh' 2023-01-11T21:20:40.8166662Z Processing ./dist/torch-2.0.0a0+git8419ddd-cp310-cp310-linux_x86_64.whl 2023-01-11T21:20:41.0777824Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.10/site-packages (from torch==2.0.0a0+git8419ddd) (4.4.0) 2023-01-11T21:20:41.0779647Z Requirement already satisfied: sympy in /opt/conda/lib/python3.10/site-packages (from torch==2.0.0a0+git8419ddd) (1.11.1) 2023-01-11T21:20:41.0780648Z Requirement already satisfied: networkx in /opt/conda/lib/python3.10/site-packages (from torch==2.0.0a0+git8419ddd) (2.6.3) 2023-01-11T21:20:41.0790397Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/lib/python3.10/site-packages (from torch==2.0.0a0+git8419ddd) (3.3.0) 2023-01-11T21:20:41.0845204Z Requirement already satisfied: numpy>=1.7 in /opt/conda/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==2.0.0a0+git8419ddd) (1.21.2) 2023-01-11T21:20:41.0990474Z Requirement already satisfied: mpmath>=0.19 in /opt/conda/lib/python3.10/site-packages (from sympy->torch==2.0.0a0+git8419ddd) (1.2.1) 2023-01-11T21:20:41.7507138Z Installing collected packages: torch 2023-01-11T21:20:48.2537998Z Successfully installed torch-2.0.0a0+git8419ddd 2023-01-11T21:20:48.4025660Z + echo 'Environment variables:' 2023-01-11T21:20:48.4025928Z Environment variables: 2023-01-11T21:20:48.4026124Z + env 2023-01-11T21:20:48.4031541Z SHARD_NUMBER=2 2023-01-11T21:20:48.4032062Z NV_LIBCUBLAS_DEV_VERSION=11.9.2.110-1 2023-01-11T21:20:48.4032395Z NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-6 2023-01-11T21:20:48.4032683Z LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64 2023-01-11T21:20:48.4033009Z NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.12.10-1+cuda11.6 2023-01-11T21:20:48.4034351Z UCC_HOME=/usr 2023-01-11T21:20:48.4034884Z BUILD_ENVIRONMENT=linux-bionic-cuda11.6-py3.10-gcc7-sm86 2023-01-11T21:20:48.4035190Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2023-01-11T21:20:48.4035601Z NV_LIBNPP_DEV_PACKAGE=libnpp-dev-11-6=11.6.3.124-1 2023-01-11T21:20:48.4035979Z INSTALLED_DB=yes 2023-01-11T21:20:48.4036391Z HOSTNAME=b465a1e11c77 2023-01-11T21:20:48.4036619Z GITHUB_REF_NAME=ciflow/trunk/91627 2023-01-11T21:20:48.4036892Z GITHUB_API_URL=https://api.github.com 2023-01-11T21:20:48.4037235Z GITHUB_REPOSITORY_OWNER_ID=21003710 2023-01-11T21:20:48.4037717Z OPENSSL_DIR=/opt/openssl 2023-01-11T21:20:48.4038043Z UCC_COMMIT=1c7a7127186e7836f73aafbd7697bbc274a77eee 2023-01-11T21:20:48.4044501Z 
GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4045177Z CUDA_PATH=/usr/local/cuda 2023-01-11T21:20:48.4045756Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2023-01-11T21:20:48.4046082Z GITHUB_RUN_ATTEMPT=1 2023-01-11T21:20:48.4046287Z TEST_CONFIG=default 2023-01-11T21:20:48.4046548Z NV_LIBNPP_VERSION=11.6.3.124-1 2023-01-11T21:20:48.4046917Z NV_NVPROF_DEV_PACKAGE=cuda-nvprof-11-6=11.6.124-1 2023-01-11T21:20:48.4047174Z GITHUB_REPOSITORY_OWNER=pytorch 2023-01-11T21:20:48.4047428Z GITHUB_ACTIONS=true 2023-01-11T21:20:48.4047686Z NVIDIA_VISIBLE_DEVICES=all 2023-01-11T21:20:48.4047927Z NV_NVPROF_VERSION=11.6.124-1 2023-01-11T21:20:48.4048169Z NV_LIBCUSPARSE_VERSION=11.7.2.124-1 2023-01-11T21:20:48.4048573Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk.yml@refs/tags/ciflow/trunk/91627 2023-01-11T21:20:48.4048861Z NVIDIA_PRODUCT_NAME=CUDA 2023-01-11T21:20:48.4049071Z CI=true 2023-01-11T21:20:48.4049337Z PYTORCH_OVERRIDE_FLAKY_SIGNAL=1 2023-01-11T21:20:48.4049653Z NV_LIBCUBLAS_DEV_PACKAGE=libcublas-dev-11-6=11.9.2.110-1 2023-01-11T21:20:48.4049886Z BRANCH= 2023-01-11T21:20:48.4050146Z GITHUB_HEAD_REF= 2023-01-11T21:20:48.4050393Z UCX_COMMIT=31e74cac7bee0ef66bef2af72e7d86d9c282e5ab 2023-01-11T21:20:48.4050673Z GITHUB_ACTOR=pytorch-bot[bot] 2023-01-11T21:20:48.4051011Z CMAKE_CUDA_COMPILER_LAUNCHER=/opt/cache/bin/sccache 2023-01-11T21:20:48.4051384Z GITHUB_ACTION_REF= 2023-01-11T21:20:48.4051628Z NCCL_VERSION=2.12.10-1 2023-01-11T21:20:48.4051897Z GITHUB_ACTION=__self 2023-01-11T21:20:48.4052109Z GITHUB_REF_PROTECTED=false 2023-01-11T21:20:48.4052469Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2023-01-11T21:20:48.4052860Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2023-01-11T21:20:48.4056417Z *** 2023-01-11T21:20:48.4056728Z INSTALLED_VISION=yes 2023-01-11T21:20:48.4056927Z NVARCH=x86_64 2023-01-11T21:20:48.4057297Z NV_LIBCUSPARSE_DEV_VERSION=11.7.2.124-1 2023-01-11T21:20:48.4057533Z HOME=/var/lib/jenkins 2023-01-11T21:20:48.4057953Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4058288Z CARGO_NET_GIT_FETCH_WITH_CLI=true 2023-01-11T21:20:48.4058514Z GITHUB_ACTION_REPOSITORY= 2023-01-11T21:20:48.4058714Z GITHUB_REF_TYPE=tag 2023-01-11T21:20:48.4058952Z NV_LIBNCCL_PACKAGE_VERSION=2.12.10-1 2023-01-11T21:20:48.4059189Z GITHUB_RETENTION_DAYS=90 2023-01-11T21:20:48.4059491Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2023-01-11T21:20:48.4059808Z NV_LIBNCCL_PACKAGE=libnccl2=2.12.10-1+cuda11.6 2023-01-11T21:20:48.4060236Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4060560Z DEBIAN_FRONTEND=noninteractive 2023-01-11T21:20:48.4060827Z NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev 2023-01-11T21:20:48.4061073Z GITHUB_REF=refs/tags/ciflow/trunk/91627 2023-01-11T21:20:48.4061322Z NV_CUDA_LIB_VERSION=11.6.2-1 2023-01-11T21:20:48.4061566Z GITHUB_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:20:48.4061810Z INSTALLED_PROTOBUF=yes 2023-01-11T21:20:48.4062025Z GITHUB_REPOSITORY_ID=65600975 2023-01-11T21:20:48.4062228Z GITHUB_RUN_ID=3896346758 2023-01-11T21:20:48.4062503Z NV_LIBNPP_PACKAGE=libnpp-11-6=11.6.3.124-1 2023-01-11T21:20:48.4062740Z NV_LIBNCCL_PACKAGE_NAME=libnccl2 2023-01-11T21:20:48.4062970Z 
LIBRARY_PATH=/usr/local/cuda/lib64/stubs 2023-01-11T21:20:48.4063217Z NV_NVTX_VERSION=11.6.124-1 2023-01-11T21:20:48.4063432Z CONTINUE_THROUGH_ERROR=False 2023-01-11T21:20:48.4063668Z GITHUB_SERVER_URL=https://github.com 2023-01-11T21:20:48.4063887Z MAX_JOBS=14 2023-01-11T21:20:48.4064080Z GITHUB_ACTOR_ID=54816060 2023-01-11T21:20:48.4064306Z NV_LIBCUBLAS_VERSION=11.9.2.110-1 2023-01-11T21:20:48.4064801Z NV_LIBCUBLAS_PACKAGE=libcublas-11-6=11.9.2.110-1 2023-01-11T21:20:48.4065183Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2023-01-11T21:20:48.4065453Z UCX_HOME=/usr 2023-01-11T21:20:48.4065643Z PYTORCH_RETRY_TEST_CASES=1 2023-01-11T21:20:48.4065900Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2023-01-11T21:20:48.4066179Z BASE_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:20:48.4066441Z NV_CUDA_CUDART_DEV_VERSION=11.6.55-1 2023-01-11T21:20:48.4066652Z PR_BODY= 2023-01-11T21:20:48.4066832Z GITHUB_BASE_REF= 2023-01-11T21:20:48.4067008Z TERM=xterm 2023-01-11T21:20:48.4067185Z XLA_CUDA= 2023-01-11T21:20:48.4067396Z NV_NVML_DEV_VERSION=11.6.55-1 2023-01-11T21:20:48.4067604Z TORCH_CUDA_ARCH_LIST=Maxwell 2023-01-11T21:20:48.4067809Z CUDA_VERSION=11.6.2 2023-01-11T21:20:48.4068073Z NV_LIBCUBLAS_PACKAGE_NAME=libcublas-11-6 2023-01-11T21:20:48.4068303Z OPENSSL_ROOT_DIR=/opt/openssl 2023-01-11T21:20:48.4068725Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4069027Z GITHUB_JOB=test 2023-01-11T21:20:48.4069221Z SCCACHE_S3_KEY_PREFIX=trunk 2023-01-11T21:20:48.4069687Z COMMIT_MESSAGES=+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into input+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e other instead of output in documentation 2023-01-11T21:20:48.4070231Z NVIDIA_DRIVER_CAPABILITIES=compute,utility 2023-01-11T21:20:48.4070463Z NUM_TEST_SHARDS=4 2023-01-11T21:20:48.4070641Z PR_NUMBER= 2023-01-11T21:20:48.4071164Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4071464Z SHLVL=1 2023-01-11T21:20:48.4071737Z NV_LIBCUBLAS_DEV_PACKAGE_NAME=libcublas-dev-11-6 2023-01-11T21:20:48.4071998Z GITHUB_REPOSITORY=pytorch/pytorch 2023-01-11T21:20:48.4072614Z NVIDIA_REQUIRE_CUDA=cuda>=11.6 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 2023-01-11T21:20:48.4073190Z NV_LIBNPP_DEV_VERSION=11.6.3.124-1 2023-01-11T21:20:48.4073443Z SHA1=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:20:48.4073672Z GITHUB_EVENT_NAME=push 2023-01-11T21:20:48.4073916Z NV_CUDA_CUDART_VERSION=11.6.55-1 2023-01-11T21:20:48.4074197Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2023-01-11T21:20:48.4074421Z GITHUB_RUN_NUMBER=22986 2023-01-11T21:20:48.4074627Z GITHUB_WORKFLOW=trunk 2023-01-11T21:20:48.4074964Z PATH=/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2023-01-11T21:20:48.4075310Z 
NV_LIBNCCL_DEV_PACKAGE_VERSION=2.12.10-1 2023-01-11T21:20:48.4075589Z GITHUB_WORKFLOW_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:20:48.4075964Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2023-01-11T21:20:48.4076296Z GITHUB_TRIGGERING_ACTOR=pytorch-bot[bot] 2023-01-11T21:20:48.4076513Z _=/usr/bin/env 2023-01-11T21:20:48.4076810Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2023-01-11T21:20:48.4178294Z + TORCH_INSTALL_DIR=/opt/conda/lib/python3.10/site-packages/torch 2023-01-11T21:20:48.4178756Z + TORCH_BIN_DIR=/opt/conda/lib/python3.10/site-packages/torch/bin 2023-01-11T21:20:48.4179147Z + TORCH_LIB_DIR=/opt/conda/lib/python3.10/site-packages/torch/lib 2023-01-11T21:20:48.4180787Z + TORCH_TEST_DIR=/opt/conda/lib/python3.10/site-packages/torch/test 2023-01-11T21:20:48.4181309Z + BUILD_DIR=build 2023-01-11T21:20:48.4181721Z + BUILD_RENAMED_DIR=build_renamed 2023-01-11T21:20:48.4182352Z + BUILD_BIN_DIR=build/bin 2023-01-11T21:20:48.4182753Z + export VALGRIND=ON 2023-01-11T21:20:48.4183151Z + VALGRIND=ON 2023-01-11T21:20:48.4183568Z + export TORCH_INDUCTOR_INSTALL_GXX=ON 2023-01-11T21:20:48.4184017Z + TORCH_INDUCTOR_INSTALL_GXX=ON 2023-01-11T21:20:48.4184671Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *clang9* ]] 2023-01-11T21:20:48.4185360Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *bazel* ]] 2023-01-11T21:20:48.4185625Z ++ realpath build/custom_test_artifacts 2023-01-11T21:20:48.4187748Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts 2023-01-11T21:20:48.4190481Z ++ dirname .jenkins/pytorch/test.sh 2023-01-11T21:20:48.4196340Z + source .jenkins/pytorch/common.sh 2023-01-11T21:20:48.4199076Z +++ dirname .jenkins/pytorch/common.sh 2023-01-11T21:20:48.4205377Z ++ source .jenkins/pytorch/common_utils.sh 2023-01-11T21:20:48.4207783Z +++ declare -f -t trap_add 2023-01-11T21:20:48.4212939Z ++ set -ex 2023-01-11T21:20:48.4213592Z ++ [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *rocm* ]] 2023-01-11T21:20:48.4213905Z ++ BUILD_TEST_LIBTORCH=0 2023-01-11T21:20:48.4214272Z + echo 'Environment variables' 2023-01-11T21:20:48.4214679Z Environment variables 2023-01-11T21:20:48.4214887Z + env 2023-01-11T21:20:48.4218852Z SHARD_NUMBER=2 2023-01-11T21:20:48.4219195Z NV_LIBCUBLAS_DEV_VERSION=11.9.2.110-1 2023-01-11T21:20:48.4219564Z NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-6 2023-01-11T21:20:48.4221545Z LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64 2023-01-11T21:20:48.4222062Z NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.12.10-1+cuda11.6 2023-01-11T21:20:48.4222377Z UCC_HOME=/usr 2023-01-11T21:20:48.4222824Z BUILD_ENVIRONMENT=linux-bionic-cuda11.6-py3.10-gcc7-sm86 2023-01-11T21:20:48.4223430Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2023-01-11T21:20:48.4223876Z NV_LIBNPP_DEV_PACKAGE=libnpp-dev-11-6=11.6.3.124-1 2023-01-11T21:20:48.4224167Z INSTALLED_DB=yes 2023-01-11T21:20:48.4224375Z HOSTNAME=b465a1e11c77 2023-01-11T21:20:48.4224596Z GITHUB_REF_NAME=ciflow/trunk/91627 2023-01-11T21:20:48.4224861Z GITHUB_API_URL=https://api.github.com 2023-01-11T21:20:48.4225107Z GITHUB_REPOSITORY_OWNER_ID=21003710 2023-01-11T21:20:48.4225340Z OPENSSL_DIR=/opt/openssl 2023-01-11T21:20:48.4225588Z UCC_COMMIT=1c7a7127186e7836f73aafbd7697bbc274a77eee 2023-01-11T21:20:48.4226054Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4226386Z CUDA_PATH=/usr/local/cuda 2023-01-11T21:20:48.4226776Z 
GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2023-01-11T21:20:48.4227067Z GITHUB_RUN_ATTEMPT=1 2023-01-11T21:20:48.4227274Z TEST_CONFIG=default 2023-01-11T21:20:48.4227503Z NV_LIBNPP_VERSION=11.6.3.124-1 2023-01-11T21:20:48.4227796Z NV_NVPROF_DEV_PACKAGE=cuda-nvprof-11-6=11.6.124-1 2023-01-11T21:20:48.4228059Z GITHUB_REPOSITORY_OWNER=pytorch 2023-01-11T21:20:48.4228278Z GITHUB_ACTIONS=true 2023-01-11T21:20:48.4228503Z NVIDIA_VISIBLE_DEVICES=all 2023-01-11T21:20:48.4228782Z NV_NVPROF_VERSION=11.6.124-1 2023-01-11T21:20:48.4229096Z NV_LIBCUSPARSE_VERSION=11.7.2.124-1 2023-01-11T21:20:48.4229500Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk.yml@refs/tags/ciflow/trunk/91627 2023-01-11T21:20:48.4229872Z NVIDIA_PRODUCT_NAME=CUDA 2023-01-11T21:20:48.4230118Z CI=true 2023-01-11T21:20:48.4230319Z PYTORCH_OVERRIDE_FLAKY_SIGNAL=1 2023-01-11T21:20:48.4230634Z NV_LIBCUBLAS_DEV_PACKAGE=libcublas-dev-11-6=11.9.2.110-1 2023-01-11T21:20:48.4230870Z BRANCH= 2023-01-11T21:20:48.4231046Z GITHUB_HEAD_REF= 2023-01-11T21:20:48.4231283Z UCX_COMMIT=31e74cac7bee0ef66bef2af72e7d86d9c282e5ab 2023-01-11T21:20:48.4231577Z GITHUB_ACTOR=pytorch-bot[bot] 2023-01-11T21:20:48.4231832Z CMAKE_CUDA_COMPILER_LAUNCHER=/opt/cache/bin/sccache 2023-01-11T21:20:48.4232055Z GITHUB_ACTION_REF= 2023-01-11T21:20:48.4232271Z NCCL_VERSION=2.12.10-1 2023-01-11T21:20:48.4232475Z GITHUB_ACTION=__self 2023-01-11T21:20:48.4232660Z VALGRIND=ON 2023-01-11T21:20:48.4232858Z GITHUB_REF_PROTECTED=false 2023-01-11T21:20:48.4233328Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2023-01-11T21:20:48.4233638Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2023-01-11T21:20:48.4233970Z *** 2023-01-11T21:20:48.4234148Z INSTALLED_VISION=yes 2023-01-11T21:20:48.4234350Z NVARCH=x86_64 2023-01-11T21:20:48.4234590Z NV_LIBCUSPARSE_DEV_VERSION=11.7.2.124-1 2023-01-11T21:20:48.4234808Z HOME=/var/lib/jenkins 2023-01-11T21:20:48.4235225Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4235602Z CARGO_NET_GIT_FETCH_WITH_CLI=true 2023-01-11T21:20:48.4235822Z GITHUB_ACTION_REPOSITORY= 2023-01-11T21:20:48.4236038Z GITHUB_REF_TYPE=tag 2023-01-11T21:20:48.4236279Z NV_LIBNCCL_PACKAGE_VERSION=2.12.10-1 2023-01-11T21:20:48.4236509Z GITHUB_RETENTION_DAYS=90 2023-01-11T21:20:48.4236805Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2023-01-11T21:20:48.4237131Z NV_LIBNCCL_PACKAGE=libnccl2=2.12.10-1+cuda11.6 2023-01-11T21:20:48.4237564Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4237875Z DEBIAN_FRONTEND=noninteractive 2023-01-11T21:20:48.4238155Z NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev 2023-01-11T21:20:48.4238402Z GITHUB_REF=refs/tags/ciflow/trunk/91627 2023-01-11T21:20:48.4238643Z NV_CUDA_LIB_VERSION=11.6.2-1 2023-01-11T21:20:48.4238900Z GITHUB_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:20:48.4239146Z INSTALLED_PROTOBUF=yes 2023-01-11T21:20:48.4239357Z GITHUB_REPOSITORY_ID=65600975 2023-01-11T21:20:48.4239576Z GITHUB_RUN_ID=3896346758 2023-01-11T21:20:48.4239850Z NV_LIBNPP_PACKAGE=libnpp-11-6=11.6.3.124-1 2023-01-11T21:20:48.4240132Z NV_LIBNCCL_PACKAGE_NAME=libnccl2 2023-01-11T21:20:48.4240374Z LIBRARY_PATH=/usr/local/cuda/lib64/stubs 2023-01-11T21:20:48.4240627Z NV_NVTX_VERSION=11.6.124-1 2023-01-11T21:20:48.4240840Z CONTINUE_THROUGH_ERROR=False 2023-01-11T21:20:48.4241083Z 
GITHUB_SERVER_URL=https://github.com 2023-01-11T21:20:48.4241313Z MAX_JOBS=14 2023-01-11T21:20:48.4241499Z GITHUB_ACTOR_ID=54816060 2023-01-11T21:20:48.4241738Z NV_LIBCUBLAS_VERSION=11.9.2.110-1 2023-01-11T21:20:48.4242039Z NV_LIBCUBLAS_PACKAGE=libcublas-11-6=11.9.2.110-1 2023-01-11T21:20:48.4242419Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2023-01-11T21:20:48.4242691Z UCX_HOME=/usr 2023-01-11T21:20:48.4242897Z PYTORCH_RETRY_TEST_CASES=1 2023-01-11T21:20:48.4243161Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2023-01-11T21:20:48.4243441Z BASE_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:20:48.4243720Z NV_CUDA_CUDART_DEV_VERSION=11.6.55-1 2023-01-11T21:20:48.4243936Z PR_BODY= 2023-01-11T21:20:48.4244114Z GITHUB_BASE_REF= 2023-01-11T21:20:48.4244305Z TERM=xterm 2023-01-11T21:20:48.4244511Z TORCH_INDUCTOR_INSTALL_GXX=ON 2023-01-11T21:20:48.4244709Z XLA_CUDA= 2023-01-11T21:20:48.4244923Z NV_NVML_DEV_VERSION=11.6.55-1 2023-01-11T21:20:48.4245144Z TORCH_CUDA_ARCH_LIST=Maxwell 2023-01-11T21:20:48.4245347Z CUDA_VERSION=11.6.2 2023-01-11T21:20:48.4245614Z NV_LIBCUBLAS_PACKAGE_NAME=libcublas-11-6 2023-01-11T21:20:48.4245854Z OPENSSL_ROOT_DIR=/opt/openssl 2023-01-11T21:20:48.4246303Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4246620Z GITHUB_JOB=test 2023-01-11T21:20:48.4246826Z SCCACHE_S3_KEY_PREFIX=trunk 2023-01-11T21:20:48.4247296Z COMMIT_MESSAGES=+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into input+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e other instead of output in documentation 2023-01-11T21:20:48.4247766Z NVIDIA_DRIVER_CAPABILITIES=compute,utility 2023-01-11T21:20:48.4247996Z NUM_TEST_SHARDS=4 2023-01-11T21:20:48.4248187Z PR_NUMBER= 2023-01-11T21:20:48.4248584Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4248924Z SHLVL=1 2023-01-11T21:20:48.4249199Z NV_LIBCUBLAS_DEV_PACKAGE_NAME=libcublas-dev-11-6 2023-01-11T21:20:48.4249455Z GITHUB_REPOSITORY=pytorch/pytorch 2023-01-11T21:20:48.4250060Z NVIDIA_REQUIRE_CUDA=cuda>=11.6 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 2023-01-11T21:20:48.4250651Z NV_LIBNPP_DEV_VERSION=11.6.3.124-1 2023-01-11T21:20:48.4250906Z SHA1=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:20:48.4251137Z GITHUB_EVENT_NAME=push 2023-01-11T21:20:48.4251379Z NV_CUDA_CUDART_VERSION=11.6.55-1 2023-01-11T21:20:48.4251654Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2023-01-11T21:20:48.4251919Z GITHUB_RUN_NUMBER=22986 2023-01-11T21:20:48.4252121Z GITHUB_WORKFLOW=trunk 2023-01-11T21:20:48.4252451Z PATH=/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2023-01-11T21:20:48.4252806Z NV_LIBNCCL_DEV_PACKAGE_VERSION=2.12.10-1 2023-01-11T21:20:48.4253076Z GITHUB_WORKFLOW_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 
2023-01-11T21:20:48.4253547Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2023-01-11T21:20:48.4253878Z GITHUB_TRIGGERING_ACTOR=pytorch-bot[bot] 2023-01-11T21:20:48.4254091Z _=/usr/bin/env 2023-01-11T21:20:48.4254317Z + echo 'Testing pytorch' 2023-01-11T21:20:48.4254764Z Testing pytorch 2023-01-11T21:20:48.4255002Z + export LANG=C.UTF-8 2023-01-11T21:20:48.4255247Z + LANG=C.UTF-8 2023-01-11T21:20:48.4255493Z + PR_NUMBER= 2023-01-11T21:20:48.4255696Z + [[ default == \d\e\f\a\u\l\t ]] 2023-01-11T21:20:48.4255907Z + export CUDA_VISIBLE_DEVICES=0 2023-01-11T21:20:48.4256126Z + CUDA_VISIBLE_DEVICES=0 2023-01-11T21:20:48.4256338Z + export HIP_VISIBLE_DEVICES=0 2023-01-11T21:20:48.4256543Z + HIP_VISIBLE_DEVICES=0 2023-01-11T21:20:48.4256759Z + [[ default == \d\i\s\t\r\i\b\u\t\e\d ]] 2023-01-11T21:20:48.4256976Z + [[ default == \s\l\o\w ]] 2023-01-11T21:20:48.4257304Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *slow-gradcheck* ]] 2023-01-11T21:20:48.4257679Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *cuda* ]] 2023-01-11T21:20:48.4257960Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2023-01-11T21:20:48.4258202Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2023-01-11T21:20:48.4258429Z + [[ default == *crossref* ]] 2023-01-11T21:20:48.4258640Z + [[ default == *dynamo* ]] 2023-01-11T21:20:48.4258838Z + [[ default == *inductor* ]] 2023-01-11T21:20:48.4259150Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *rocm* ]] 2023-01-11T21:20:48.4259513Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *-bazel-* ]] 2023-01-11T21:20:48.4259812Z + pip_install --user ninja==1.10.2 2023-01-11T21:20:48.4260126Z + pip install --progress-bar off --user ninja==1.10.2 2023-01-11T21:20:48.7914126Z Collecting ninja==1.10.2 2023-01-11T21:20:48.8073931Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2023-01-11T21:20:49.4199468Z Installing collected packages: ninja 2023-01-11T21:20:49.4267972Z  WARNING: The script ninja is installed in '/var/lib/jenkins/.local/bin' which is not on PATH. 2023-01-11T21:20:49.4268500Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 
2023-01-11T21:20:49.4317182Z Successfully installed ninja-1.10.2 2023-01-11T21:20:49.4966198Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2023-01-11T21:20:49.4967045Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2023-01-11T21:20:49.4967636Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *asan* ]] 2023-01-11T21:20:49.4968001Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *-tsan* ]] 2023-01-11T21:20:49.4968272Z + [[ default == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2023-01-11T21:20:49.4968519Z + [[ default == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2023-01-11T21:20:49.4975788Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *tbb* ]] 2023-01-11T21:20:49.4985745Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *libtorch* ]] 2023-01-11T21:20:49.4986138Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *-bazel-* ]] 2023-01-11T21:20:49.4986500Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *-tsan* ]] 2023-01-11T21:20:49.4988293Z + cd test 2023-01-11T21:20:49.4988727Z + python -c 'import torch; print(torch.__config__.show())' 2023-01-11T21:20:50.7017088Z PyTorch built with: 2023-01-11T21:20:50.7017594Z - GCC 7.5 2023-01-11T21:20:50.7017933Z - C++ Version: 201703 2023-01-11T21:20:50.7018510Z - Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications 2023-01-11T21:20:50.7019103Z - Intel(R) MKL-DNN v2.7.2 (Git Hash fbec3e25a559ee252022ae066817b204e106a6ba) 2023-01-11T21:20:50.7019535Z - OpenMP 201511 (a.k.a. OpenMP 4.5) 2023-01-11T21:20:50.7019934Z - LAPACK is enabled (usually provided by MKL) 2023-01-11T21:20:50.7020265Z - NNPACK is enabled 2023-01-11T21:20:50.7020582Z - CPU capability usage: AVX2 2023-01-11T21:20:50.7020825Z - CUDA Runtime 11.6 2023-01-11T21:20:50.7021134Z - NVCC architecture flags: -gencode;arch=compute_86,code=sm_86 2023-01-11T21:20:50.7021437Z - CuDNN 8.3.2 (built against CUDA 11.5) 2023-01-11T21:20:50.7021998Z - Magma 2.6.1 2023-01-11T21:20:50.7024325Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Werror -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 2023-01-11T21:20:50.7026038Z 
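The python -c probe above (and the parallel_info probe that follows) is the job's quick sanity check that the freshly installed wheel was built the way the job name promises: GCC 7, CUDA 11.6 runtime, sm_86 NVCC flags, MKL, and so on. A minimal sketch of running the same check against any local PyTorch install, mirroring the commands traced in this log:

    # Print the compile-time build configuration (compiler, CUDA/cuDNN
    # versions, NVCC arch flags, CMake build settings).
    python -c 'import torch; print(torch.__config__.show())'
    # Print the threading backends and thread counts (shown further below).
    python -c 'import torch; print(torch.__config__.parallel_info())'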
2023-01-11T21:20:50.9726584Z + cd test 2023-01-11T21:20:50.9727132Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2023-01-11T21:20:52.0695883Z ATen/Parallel: 2023-01-11T21:20:52.0707702Z at::get_num_threads() : 8 2023-01-11T21:20:52.0707954Z at::get_num_interop_threads() : 8 2023-01-11T21:20:52.0708189Z OpenMP 201511 (a.k.a. OpenMP 4.5) 2023-01-11T21:20:52.0708400Z omp_get_max_threads() : 8 2023-01-11T21:20:52.0708933Z Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications 2023-01-11T21:20:52.0709252Z mkl_get_max_threads() : 8 2023-01-11T21:20:52.0709603Z Intel(R) MKL-DNN v2.7.2 (Git Hash fbec3e25a559ee252022ae066817b204e106a6ba) 2023-01-11T21:20:52.0709893Z std::thread::hardware_concurrency() : 16 2023-01-11T21:20:52.0710124Z Environment variables: 2023-01-11T21:20:52.0710328Z OMP_NUM_THREADS : [not set] 2023-01-11T21:20:52.0710538Z MKL_NUM_THREADS : [not set] 2023-01-11T21:20:52.0711099Z ATen parallel backend: OpenMP 2023-01-11T21:20:52.0711251Z 2023-01-11T21:20:52.3144835Z + [[ default == *backward* ]] 2023-01-11T21:20:52.3145485Z + [[ default == *xla* ]] 2023-01-11T21:20:52.3146062Z + [[ default == \j\i\t\_\l\e\g\a\c\y ]] 2023-01-11T21:20:52.3147072Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *libtorch* ]] 2023-01-11T21:20:52.3147464Z + [[ default == distributed ]] 2023-01-11T21:20:52.3147858Z + [[ default == deploy ]] 2023-01-11T21:20:52.3148875Z + [[ default == *inductor_distributed* ]] 2023-01-11T21:20:52.3149708Z + [[ default == *dynamo* ]] 2023-01-11T21:20:52.3150165Z + [[ default == *dynamo* ]] 2023-01-11T21:20:52.3150618Z + [[ default == *inductor_huggingface* ]] 2023-01-11T21:20:52.3150983Z + [[ default == *inductor_timm* ]] 2023-01-11T21:20:52.3151319Z + [[ default == *inductor_torchbench* ]] 2023-01-11T21:20:52.3151626Z + [[ default == *inductor* ]] 2023-01-11T21:20:52.3151834Z + [[ 2 == 1 ]] 2023-01-11T21:20:52.3152051Z + [[ 2 == 2 ]] 2023-01-11T21:20:52.3152330Z + [[ 4 -gt 1 ]] 2023-01-11T21:20:52.3152537Z + install_torchvision 2023-01-11T21:20:52.3152773Z + local commit 2023-01-11T21:20:52.3153057Z ++ get_pinned_commit vision 2023-01-11T21:20:52.3153319Z ++ cat .github/ci_commit_pins/vision.txt 2023-01-11T21:20:52.3159820Z + commit=32d254bbfcf14975f846765775584e61ef25a5bc 2023-01-11T21:20:52.3160604Z + pip_install --no-use-pep517 --user git+https://github.com/pytorch/vision.git@32d254bbfcf14975f846765775584e61ef25a5bc 2023-01-11T21:20:52.3161435Z + pip install --progress-bar off --no-use-pep517 --user git+https://github.com/pytorch/vision.git@32d254bbfcf14975f846765775584e61ef25a5bc 2023-01-11T21:20:52.6303510Z Collecting git+https://github.com/pytorch/vision.git@32d254bbfcf14975f846765775584e61ef25a5bc 2023-01-11T21:20:52.6306352Z Cloning https://github.com/pytorch/vision.git (to revision 32d254bbfcf14975f846765775584e61ef25a5bc) to /tmp/pip-req-build-2lgu94ct 2023-01-11T21:20:52.6321892Z Running command git clone --filter=blob:none --quiet https://github.com/pytorch/vision.git /tmp/pip-req-build-2lgu94ct 2023-01-11T21:20:54.8556672Z Running command git rev-parse -q --verify 'sha^32d254bbfcf14975f846765775584e61ef25a5bc' 2023-01-11T21:20:54.8576357Z Running command git fetch -q https://github.com/pytorch/vision.git 32d254bbfcf14975f846765775584e61ef25a5bc 2023-01-11T21:20:55.7900132Z Running command git checkout -q 32d254bbfcf14975f846765775584e61ef25a5bc 2023-01-11T21:20:56.3710976Z Resolved https://github.com/pytorch/vision.git to commit 32d254bbfcf14975f846765775584e61ef25a5bc 
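The install_torchvision trace above shows the pinned-commit pattern the CI scripts use for sibling repos: the exact revision is read from a file under .github/ci_commit_pins/ and handed to pip as a git URL, so every shard tests against the same torchvision commit. A sketch of what the get_pinned_commit and pip_install helpers sourced from .jenkins/pytorch/common_utils.sh reduce to here (an assumed simplification for illustration, not their actual definitions):

    # Read the pinned torchvision revision and install exactly that commit.
    commit="$(cat .github/ci_commit_pins/vision.txt)"
    pip install --progress-bar off --no-use-pep517 --user \
      "git+https://github.com/pytorch/vision.git@${commit}"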
2023-01-11T21:20:58.3718106Z Preparing metadata (setup.py) ... done 2023-01-11T21:20:58.3773502Z Requirement already satisfied: typing_extensions in /opt/conda/lib/python3.10/site-packages (from torchvision==0.15.0a0+32d254b) (4.4.0) 2023-01-11T21:20:58.3776158Z Requirement already satisfied: numpy in /opt/conda/lib/python3.10/site-packages (from torchvision==0.15.0a0+32d254b) (1.21.2) 2023-01-11T21:20:58.3779229Z Requirement already satisfied: requests in /opt/conda/lib/python3.10/site-packages (from torchvision==0.15.0a0+32d254b) (2.28.1) 2023-01-11T21:20:58.3781945Z Requirement already satisfied: torch in /opt/conda/lib/python3.10/site-packages (from torchvision==0.15.0a0+32d254b) (2.0.0a0+git8419ddd) 2023-01-11T21:20:58.3787893Z Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /opt/conda/lib/python3.10/site-packages (from torchvision==0.15.0a0+32d254b) (9.3.0) 2023-01-11T21:20:58.3946762Z Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision==0.15.0a0+32d254b) (2022.12.7) 2023-01-11T21:20:58.3952445Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision==0.15.0a0+32d254b) (1.26.13) 2023-01-11T21:20:58.3957478Z Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision==0.15.0a0+32d254b) (3.4) 2023-01-11T21:20:58.3964292Z Requirement already satisfied: charset-normalizer<3,>=2 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision==0.15.0a0+32d254b) (2.0.4) 2023-01-11T21:20:58.4005417Z Requirement already satisfied: networkx in /opt/conda/lib/python3.10/site-packages (from torch->torchvision==0.15.0a0+32d254b) (2.6.3) 2023-01-11T21:20:58.4007053Z Requirement already satisfied: sympy in /opt/conda/lib/python3.10/site-packages (from torch->torchvision==0.15.0a0+32d254b) (1.11.1) 2023-01-11T21:20:58.4304796Z Requirement already satisfied: mpmath>=0.19 in /opt/conda/lib/python3.10/site-packages (from sympy->torch->torchvision==0.15.0a0+32d254b) (1.2.1) 2023-01-11T21:20:58.4358703Z Building wheels for collected packages: torchvision 2023-01-11T21:21:55.7272141Z Building wheel for torchvision (setup.py) ... 
done 2023-01-11T21:21:55.7299683Z Created wheel for torchvision: filename=torchvision-0.15.0a0+32d254b-cp310-cp310-linux_x86_64.whl size=1856205 sha256=7af400b8a3a64568380d22371ca34ed36a606a82d5c8659da96b47376d3eb0f5 2023-01-11T21:21:55.7300251Z Stored in directory: /var/lib/jenkins/.cache/pip/wheels/ca/33/ae/1f7c8972d058d079236e7ca0a30b53b050afb405820b9ed787 2023-01-11T21:21:55.7331282Z Successfully built torchvision 2023-01-11T21:21:56.2912231Z Installing collected packages: torchvision 2023-01-11T21:21:56.6628056Z Successfully installed torchvision-0.15.0a0+32d254b 2023-01-11T21:21:56.7589618Z + install_triton 2023-01-11T21:21:56.7589918Z + local commit 2023-01-11T21:21:56.7590152Z + [[ default == *rocm* ]] 2023-01-11T21:21:56.7591678Z ++ get_pinned_commit triton 2023-01-11T21:21:56.7592262Z ++ cat .github/ci_commit_pins/triton.txt 2023-01-11T21:21:56.7602806Z + commit=0d7e7532279e45672555e344646f5c19c3972331 2023-01-11T21:21:56.7603571Z + pip_install --user git+https://github.com/openai/triton@0d7e7532279e45672555e344646f5c19c3972331#subdirectory=python 2023-01-11T21:21:56.7604137Z + pip install --progress-bar off --user git+https://github.com/openai/triton@0d7e7532279e45672555e344646f5c19c3972331#subdirectory=python 2023-01-11T21:21:57.0746437Z Collecting git+https://github.com/openai/triton@0d7e7532279e45672555e344646f5c19c3972331#subdirectory=python 2023-01-11T21:21:57.0747654Z Cloning https://github.com/openai/triton (to revision 0d7e7532279e45672555e344646f5c19c3972331) to /tmp/pip-req-build-phj8u5v_ 2023-01-11T21:21:57.0764019Z Running command git clone --filter=blob:none --quiet https://github.com/openai/triton /tmp/pip-req-build-phj8u5v_ 2023-01-11T21:21:57.7144111Z Running command git rev-parse -q --verify 'sha^0d7e7532279e45672555e344646f5c19c3972331' 2023-01-11T21:21:57.7157676Z Running command git fetch -q https://github.com/openai/triton 0d7e7532279e45672555e344646f5c19c3972331 2023-01-11T21:21:58.0417228Z Running command git checkout -q 0d7e7532279e45672555e344646f5c19c3972331 2023-01-11T21:21:58.3784551Z Resolved https://github.com/openai/triton to commit 0d7e7532279e45672555e344646f5c19c3972331 2023-01-11T21:21:58.3785406Z Running command git submodule update --init --recursive -q 2023-01-11T21:21:58.8241585Z Preparing metadata (setup.py) ... 
done 2023-01-11T21:21:58.9990950Z Collecting cmake 2023-01-11T21:21:59.0181191Z Downloading cmake-3.25.0-py2.py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23.7 MB) 2023-01-11T21:21:59.2816510Z Collecting filelock 2023-01-11T21:21:59.2901667Z Downloading filelock-3.9.0-py3-none-any.whl (9.7 kB) 2023-01-11T21:21:59.2934271Z Requirement already satisfied: torch in /opt/conda/lib/python3.10/site-packages (from triton==2.0.0) (2.0.0a0+git8419ddd) 2023-01-11T21:21:59.3114806Z Requirement already satisfied: sympy in /opt/conda/lib/python3.10/site-packages (from torch->triton==2.0.0) (1.11.1) 2023-01-11T21:21:59.3118622Z Requirement already satisfied: networkx in /opt/conda/lib/python3.10/site-packages (from torch->triton==2.0.0) (2.6.3) 2023-01-11T21:21:59.3122833Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.10/site-packages (from torch->triton==2.0.0) (4.4.0) 2023-01-11T21:21:59.3267540Z Requirement already satisfied: mpmath>=0.19 in /opt/conda/lib/python3.10/site-packages (from sympy->torch->triton==2.0.0) (1.2.1) 2023-01-11T21:21:59.3316824Z Building wheels for collected packages: triton 2023-01-11T21:22:39.5302469Z Building wheel for triton (setup.py) ... done 2023-01-11T21:22:39.5424816Z Created wheel for triton: filename=triton-2.0.0-cp310-cp310-linux_x86_64.whl size=15377935 sha256=fd8e6bb61136a085b66646350c4b898753e24c3d426a3caab7b52ea9562bb401 2023-01-11T21:22:39.5425759Z Stored in directory: /var/lib/jenkins/.cache/pip/wheels/3f/1d/23/1c2bc47d618a44f9c949aea4b7e355e737a1f1ed208f009295 2023-01-11T21:22:39.5436473Z Successfully built triton 2023-01-11T21:22:40.1617777Z Installing collected packages: cmake, filelock, triton 2023-01-11T21:22:41.1710154Z Successfully installed cmake-3.25.0 filelock-3.9.0 triton-2.0.0 2023-01-11T21:22:41.2616768Z + pip_install --user jinja2 2023-01-11T21:22:41.2617155Z + pip install --progress-bar off --user jinja2 2023-01-11T21:22:41.6130468Z Collecting jinja2 2023-01-11T21:22:41.6302107Z Downloading Jinja2-3.1.2-py3-none-any.whl (133 kB) 2023-01-11T21:22:41.6423896Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.10/site-packages (from jinja2) (2.1.1) 2023-01-11T21:22:42.2619233Z Installing collected packages: jinja2 2023-01-11T21:22:42.3386537Z Successfully installed jinja2-3.1.2 2023-01-11T21:22:42.4030984Z + test_python_shard 2 2023-01-11T21:22:42.4031370Z + [[ -z 4 ]] 2023-01-11T21:22:42.4031926Z + python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --shard 2 4 --verbose 2023-01-11T21:22:44.0298710Z Ignoring disabled issues: [] 2023-01-11T21:22:44.0549325Z /var/lib/jenkins/workspace/test/run_test.py:1169: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. 2023-01-11T21:22:44.0549869Z if torch.version.cuda is not None and LooseVersion(torch.version.cuda) >= "11.6": 2023-01-11T21:22:44.0615344Z ##[warning] Gathered no stats from artifacts. Proceeding with default sharding plan. 
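The run_test.py invocation above is where this job becomes "shard 2 of 4": test_python_shard passes the shard index and shard count straight through, and because no per-test timing stats could be fetched (the ##[warning] line), the runner falls back to its default plan for splitting the selected files across shards. A sketch of the equivalent manual invocation, using the SHARD_NUMBER and NUM_TEST_SHARDS values from the env dump earlier in this log; the files selected for this shard are listed right below:

    # Run this job's slice of the test suite; the same command with
    # --shard 1 4 through --shard 4 4 reproduces the other "default" shards.
    python test/run_test.py \
      --exclude-jit-executor \
      --exclude-distributed-tests \
      --shard "$SHARD_NUMBER" "$NUM_TEST_SHARDS" \
      --verbose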
2023-01-11T21:22:44.0616080Z Selected tests: 2023-01-11T21:22:44.0616294Z benchmark_utils/test_benchmark_utils 2023-01-11T21:22:44.0616516Z dynamo/test_comptime 2023-01-11T21:22:44.0616716Z dynamo/test_functions 2023-01-11T21:22:44.0616905Z dynamo/test_misc 2023-01-11T21:22:44.0617106Z dynamo/test_optimizations 2023-01-11T21:22:44.0617328Z dynamo/test_replay_record 2023-01-11T21:22:44.0617567Z dynamo/test_torchxla_integration 2023-01-11T21:22:44.0617816Z dynamo/test_verify_correctness 2023-01-11T21:22:44.0618046Z inductor/test_torchinductor 2023-01-11T21:22:44.0618264Z lazy/test_extract_compiled_graph 2023-01-11T21:22:44.0618478Z lazy/test_ts_opinfo 2023-01-11T21:22:44.0618669Z nn/test_init 2023-01-11T21:22:44.0618854Z nn/test_packed_sequence 2023-01-11T21:22:44.0619074Z profiler/test_memory_profiler 2023-01-11T21:22:44.0619284Z test_autocast 2023-01-11T21:22:44.0619469Z test_comparison_utils 2023-01-11T21:22:44.0619687Z test_cpp_extensions_aot_no_ninja 2023-01-11T21:22:44.0619908Z test_cuda_nvml_based_avail 2023-01-11T21:22:44.0620102Z test_dataloader 2023-01-11T21:22:44.0620290Z test_dispatch 2023-01-11T21:22:44.0620480Z test_fake_tensor 2023-01-11T21:22:44.0620670Z test_functional_optim 2023-01-11T21:22:44.0620876Z test_fx_experimental 2023-01-11T21:22:44.0621073Z test_import_stats 2023-01-11T21:22:44.0621256Z test_jit_autocast 2023-01-11T21:22:44.0621450Z test_jit_llga_fuser 2023-01-11T21:22:44.0621639Z test_linalg 2023-01-11T21:22:44.0621817Z test_matmul_cuda 2023-01-11T21:22:44.0622014Z test_mkldnn_fusion 2023-01-11T21:22:44.0622206Z test_module_init 2023-01-11T21:22:44.0622403Z test_multiprocessing_spawn 2023-01-11T21:22:44.0622613Z test_native_mha 2023-01-11T21:22:44.0622807Z test_numpy_interop 2023-01-11T21:22:44.0622986Z test_ops 2023-01-11T21:22:44.0623162Z test_optim 2023-01-11T21:22:44.0623345Z test_prims 2023-01-11T21:22:44.0623785Z test_python_dispatch 2023-01-11T21:22:44.0624001Z test_scatter_gather_ops 2023-01-11T21:22:44.0624209Z test_shape_ops 2023-01-11T21:22:44.0624398Z test_sparse_csr 2023-01-11T21:22:44.0624603Z test_tensor_creation_ops 2023-01-11T21:22:44.0624808Z test_testing 2023-01-11T21:22:44.0624989Z test_type_info 2023-01-11T21:22:44.0625182Z test_view_ops 2023-01-11T21:22:44.0625370Z doctests 2023-01-11T21:22:44.0725716Z Prioritized test from test file changes. 
2023-01-11T21:22:44.0727020Z reordering tests for PR: 2023-01-11T21:22:44.0728267Z prioritized: ['dynamo/test_misc', 'dynamo/test_optimizations', 'dynamo/test_torchxla_integration', 'inductor/test_torchinductor', 'test_fake_tensor', 'test_python_dispatch', 'test_scatter_gather_ops', 'test_sparse_csr', 'test_testing'] 2023-01-11T21:22:44.0729921Z the rest: ['benchmark_utils/test_benchmark_utils', 'dynamo/test_comptime', 'dynamo/test_functions', 'dynamo/test_replay_record', 'dynamo/test_verify_correctness', 'lazy/test_extract_compiled_graph', 'lazy/test_ts_opinfo', 'nn/test_init', 'nn/test_packed_sequence', 'profiler/test_memory_profiler', 'test_autocast', 'test_comparison_utils', 'test_cpp_extensions_aot_no_ninja', 'test_cuda_nvml_based_avail', 'test_dataloader', 'test_dispatch', 'test_functional_optim', 'test_fx_experimental', 'test_import_stats', 'test_jit_autocast', 'test_jit_llga_fuser', 'test_linalg', 'test_matmul_cuda', 'test_mkldnn_fusion', 'test_module_init', 'test_multiprocessing_spawn', 'test_native_mha', 'test_numpy_interop', 'test_ops', 'test_optim', 'test_prims', 'test_shape_ops', 'test_tensor_creation_ops', 'test_type_info', 'test_view_ops', 'doctests'] 2023-01-11T21:22:44.0730956Z 2023-01-11T21:22:44.0731359Z Downloading https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/slow-tests.json to /var/lib/jenkins/workspace/test/.pytorch-slow-tests.json 2023-01-11T21:22:44.1070547Z Downloading https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json 2023-01-11T21:22:44.1285685Z parallel (file granularity) tests: 2023-01-11T21:22:44.1286552Z dynamo/test_misc 2023-01-11T21:22:44.1287150Z dynamo/test_optimizations 2023-01-11T21:22:44.1287646Z dynamo/test_torchxla_integration 2023-01-11T21:22:44.1288094Z inductor/test_torchinductor 2023-01-11T21:22:44.1288434Z test_python_dispatch 2023-01-11T21:22:44.1288826Z test_scatter_gather_ops 2023-01-11T21:22:44.1289116Z test_testing 2023-01-11T21:22:44.1289321Z benchmark_utils/test_benchmark_utils 2023-01-11T21:22:44.1289545Z dynamo/test_comptime 2023-01-11T21:22:44.1289750Z dynamo/test_functions 2023-01-11T21:22:44.1289962Z dynamo/test_replay_record 2023-01-11T21:22:44.1290198Z dynamo/test_verify_correctness 2023-01-11T21:22:44.1290431Z lazy/test_extract_compiled_graph 2023-01-11T21:22:44.1290646Z lazy/test_ts_opinfo 2023-01-11T21:22:44.1290831Z nn/test_init 2023-01-11T21:22:44.1291028Z nn/test_packed_sequence 2023-01-11T21:22:44.1291261Z profiler/test_memory_profiler 2023-01-11T21:22:44.1291462Z test_autocast 2023-01-11T21:22:44.1291659Z test_comparison_utils 2023-01-11T21:22:44.1291859Z test_dataloader 2023-01-11T21:22:44.1292049Z test_functional_optim 2023-01-11T21:22:44.1292252Z test_fx_experimental 2023-01-11T21:22:44.1292451Z test_import_stats 2023-01-11T21:22:44.1292634Z test_jit_autocast 2023-01-11T21:22:44.1292826Z test_jit_llga_fuser 2023-01-11T21:22:44.1293018Z test_matmul_cuda 2023-01-11T21:22:44.1293200Z test_mkldnn_fusion 2023-01-11T21:22:44.1293391Z test_module_init 2023-01-11T21:22:44.1293661Z test_native_mha 2023-01-11T21:22:44.1293875Z test_numpy_interop 2023-01-11T21:22:44.1294080Z test_optim 2023-01-11T21:22:44.1294267Z test_shape_ops 2023-01-11T21:22:44.1294449Z test_type_info 2023-01-11T21:22:44.1294787Z test_view_ops 2023-01-11T21:22:44.1295006Z serial (file granularity) tests: 2023-01-11T21:22:44.1295213Z test_fake_tensor 2023-01-11T21:22:44.1295406Z test_sparse_csr 
2023-01-11T21:22:44.1295749Z test_cpp_extensions_aot_no_ninja 2023-01-11T21:22:44.1295968Z test_cuda_nvml_based_avail 2023-01-11T21:22:44.1296171Z test_dispatch 2023-01-11T21:22:44.1296358Z test_linalg 2023-01-11T21:22:44.1296557Z test_multiprocessing_spawn 2023-01-11T21:22:44.1296761Z test_ops 2023-01-11T21:22:44.1296939Z test_prims 2023-01-11T21:22:44.1297127Z test_tensor_creation_ops 2023-01-11T21:22:44.1297324Z doctests 2023-01-11T21:22:45.7012141Z Ignoring disabled issues: [] 2023-01-11T21:22:45.7241718Z Running dynamo/test_misc ... [2023-01-11 21:22:45.723623] 2023-01-11T21:22:45.7244408Z Executing ['/opt/conda/bin/python', '-bb', 'dynamo/test_misc.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:22:45.723969] 2023-01-11T21:22:45.7299912Z Ignoring disabled issues: [] 2023-01-11T21:22:45.7532702Z Running dynamo/test_optimizations ... [2023-01-11 21:22:45.752655] 2023-01-11T21:22:45.7537581Z Executing ['/opt/conda/bin/python', '-bb', 'dynamo/test_optimizations.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:22:45.753017] 2023-01-11T21:22:49.1701897Z 2023-01-11T21:22:49.1703476Z Expand the folded group to see the log file of dynamo/test_optimizations 2023-01-11T21:22:49.1711296Z ##[group]PRINTING LOG FILE of dynamo/test_optimizations (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_optimizations__p8vut7u) 2023-01-11T21:22:49.1711568Z 2023-01-11T21:22:49.1711657Z Running tests... 2023-01-11T21:22:49.1712089Z ---------------------------------------------------------------------- 2023-01-11T21:22:49.1712498Z Test results will be stored in test-reports/python-unittest/dynamo.test_optimizations 2023-01-11T21:22:49.1713081Z test_inplace_normalize (__main__.NormalizeIRTests) ... ok (1.197s) 2023-01-11T21:22:49.1713422Z frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:49.1713746Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:49.1714052Z aot_autograd [('total', 1), ('ok', 1)] 2023-01-11T21:22:49.1714479Z test_example_inputs (__main__.TestOptimizations) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:49.1714763Z ok (0.091s) 2023-01-11T21:22:49.1715191Z test_example_inputs_runtime_use (__main__.TestOptimizations) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:49.1715496Z ok (0.007s) 2023-01-11T21:22:49.1715743Z test_has_mutation (__main__.TestOptimizations) ... ok (0.015s) 2023-01-11T21:22:49.1716125Z test_has_mutation_factory (__main__.TestOptimizations) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:49.1716513Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:49.1716764Z ok (0.014s) 2023-01-11T21:22:49.1717096Z test_inplacifier (__main__.TestOptimizations) ... optimizations [('out', 1), ('inplace', 1)] 2023-01-11T21:22:49.1717365Z ok (0.014s) 2023-01-11T21:22:49.1717625Z test_ipex_bf16 (__main__.TestOptimizations) ... skip: requires ipex (0.001s) 2023-01-11T21:22:49.1717955Z test_ipex_fp32 (__main__.TestOptimizations) ... skip: requires ipex (0.001s) 2023-01-11T21:22:49.1718339Z test_log_conv_args (__main__.TestOptimizations) ... 
frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:49.1718719Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:49.1718964Z ok (0.140s) 2023-01-11T21:22:49.1719075Z 2023-01-11T21:22:49.1719267Z ---------------------------------------------------------------------- 2023-01-11T21:22:49.1719527Z Ran 9 tests in 1.479s 2023-01-11T21:22:49.1719651Z 2023-01-11T21:22:49.1719732Z OK (skipped=2) 2023-01-11T21:22:49.1719847Z 2023-01-11T21:22:49.1719943Z Generating XML reports... 2023-01-11T21:22:49.1720376Z Generated XML report: test-reports/python-unittest/dynamo.test_optimizations/TEST-NormalizeIRTests-20230111212247.xml 2023-01-11T21:22:49.1720936Z Generated XML report: test-reports/python-unittest/dynamo.test_optimizations/TEST-TestOptimizations-20230111212247.xml 2023-01-11T21:22:49.1721189Z 2023-01-11T21:22:49.1721548Z ##[endgroup] 2023-01-11T21:22:49.1721975Z FINISHED PRINTING LOG FILE of dynamo/test_optimizations (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_optimizations__p8vut7u) 2023-01-11T21:22:49.1722224Z 2023-01-11T21:22:51.1020398Z Ignoring disabled issues: [] 2023-01-11T21:22:51.1249002Z Running dynamo/test_torchxla_integration ... [2023-01-11 21:22:51.124462] 2023-01-11T21:22:51.1250925Z Executing ['/opt/conda/bin/python', '-bb', 'dynamo/test_torchxla_integration.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:22:51.124786] 2023-01-11T21:22:52.7102858Z 2023-01-11T21:22:52.7103388Z Expand the folded group to see the log file of dynamo/test_misc 2023-01-11T21:22:52.7104937Z ##[group]PRINTING LOG FILE of dynamo/test_misc (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_misc_mzyasd93) 2023-01-11T21:22:52.7105494Z 2023-01-11T21:22:52.7105620Z Running tests... 2023-01-11T21:22:52.7106360Z ---------------------------------------------------------------------- 2023-01-11T21:22:52.7109828Z Test results will be stored in test-reports/python-unittest/dynamo.test_misc 2023-01-11T21:22:52.7110305Z test_allow_in_graph (__main__.MiscTests) ... ok (1.192s) 2023-01-11T21:22:52.7110838Z test_autocast (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7111365Z stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:22:52.7111914Z stats [('calls_captured', 7), ('fusions_possible', 6), ('unique_graphs', 1)] 2023-01-11T21:22:52.7112246Z ok (0.668s) 2023-01-11T21:22:52.7112734Z test_autocast_cpu (__main__.MiscTests) ... stats [('calls_captured', 7), ('fusions_possible', 6), ('unique_graphs', 1)] 2023-01-11T21:22:52.7113331Z ok (0.027s) 2023-01-11T21:22:52.7113858Z test_autocast_device (__main__.MiscTests) ... stats [('calls_captured', 7), ('fusions_possible', 6), ('unique_graphs', 1)] 2023-01-11T21:22:52.7114234Z ok (0.026s) 2023-01-11T21:22:52.7116521Z test_autocast_float64 (__main__.MiscTests) ... stats [('calls_captured', 7), ('fusions_possible', 6), ('unique_graphs', 1)] 2023-01-11T21:22:52.7116886Z ok (0.025s) 2023-01-11T21:22:52.7117220Z test_autograd_function_equivalence (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7117753Z stats [('calls_captured', 4), ('unique_graphs', 4), ('fusions_possible', 0)] 2023-01-11T21:22:52.7118094Z ok (0.100s) 2023-01-11T21:22:52.7118674Z test_autograd_profiler (__main__.MiscTests) ... 
STAGE:2023-01-11 21:22:49 2193:2193 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:22:52.7119341Z STAGE:2023-01-11 21:22:49 2193:2193 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:22:52.7124357Z STAGE:2023-01-11 21:22:49 2193:2193 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:22:52.7124952Z [2023-01-11 21:22:49,540] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored 2023-01-11T21:22:52.7125361Z frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7125626Z unimplemented [] 2023-01-11T21:22:52.7125946Z graph_break [('Tensor.tolist', 1)] 2023-01-11T21:22:52.7126330Z stats [('calls_captured', 4), ('fusions_possible', 2), ('unique_graphs', 2)] 2023-01-11T21:22:52.7126636Z ok (0.296s) 2023-01-11T21:22:52.7127149Z test_autograd_profiler_enabled (__main__.MiscTests) ... STAGE:2023-01-11 21:22:49 2193:2193 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:22:52.7127726Z STAGE:2023-01-11 21:22:49 2193:2193 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:22:52.7128340Z STAGE:2023-01-11 21:22:49 2193:2193 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:22:52.7128760Z frames [('total', 3), ('ok', 3)] 2023-01-11T21:22:52.7128963Z unimplemented [] 2023-01-11T21:22:52.7129289Z graph_break [('torch.autograd._profiler_enabled not supported yet', 1)] 2023-01-11T21:22:52.7129668Z stats [('calls_captured', 2), ('unique_graphs', 2), ('fusions_possible', 0)] 2023-01-11T21:22:52.7130051Z ok (0.014s) 2023-01-11T21:22:52.7130434Z test_boolarg (__main__.MiscTests) ... stats [('calls_captured', 3), ('unique_graphs', 3), ('fusions_possible', 0)] 2023-01-11T21:22:52.7130736Z ok (0.010s) 2023-01-11T21:22:52.7130973Z test_build_tuple_unpack (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7131324Z stats [('calls_captured', 4), ('fusions_possible', 2), ('unique_graphs', 2)] 2023-01-11T21:22:52.7131566Z ok (0.013s) 2023-01-11T21:22:52.7131896Z test_builder_for_class_with_metaclass (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7132268Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7132509Z ok (0.005s) 2023-01-11T21:22:52.7132888Z test_builtin_isinstance (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7133171Z ok (0.011s) 2023-01-11T21:22:52.7133424Z test_builtin_subclasses_as_method_on_class_type (__main__.MiscTests) ... ok (0.002s) 2023-01-11T21:22:52.7133876Z test_builtin_subclasses_as_method_on_var (__main__.MiscTests) ... ok (0.004s) 2023-01-11T21:22:52.7134209Z test_call_parent_non_class_methods_from_child (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7134801Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7135048Z ok (0.008s) 2023-01-11T21:22:52.7135415Z test_callpacked (__main__.MiscTests) ... stats [('calls_captured', 4), ('fusions_possible', 2), ('unique_graphs', 2)] 2023-01-11T21:22:52.7135684Z ok (0.010s) 2023-01-11T21:22:52.7135990Z test_cell_output1 (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7136432Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7136672Z ok (0.005s) 2023-01-11T21:22:52.7136965Z test_cell_output2 (__main__.MiscTests) ... 
frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7137218Z unimplemented [] 2023-01-11T21:22:52.7137618Z graph_break [('call_function UserDefinedObjectVariable(unsupported) [TensorVariable(), TensorVariable()] {}', 1)] 2023-01-11T21:22:52.7138040Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7138284Z ok (0.008s) 2023-01-11T21:22:52.7138986Z test_change_backends (__main__.MiscTests) ... /opt/conda/lib/python3.10/site-packages/torch/jit/_check.py:181: UserWarning: The TorchScript type system doesn't support instance-level annotations on empty non-base types in `__init__`. Instead, either 1) use a type annotation in the class body, or 2) wrap the type in `torch.jit.Attribute`. 2023-01-11T21:22:52.7139599Z warnings.warn("The TorchScript type system doesn't support " 2023-01-11T21:22:52.7139965Z stats [('calls_captured', 3), ('unique_graphs', 3), ('fusions_possible', 0)] 2023-01-11T21:22:52.7140262Z frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7140462Z ok (0.058s) 2023-01-11T21:22:52.7140753Z test_cond (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7140985Z inline_call [] 2023-01-11T21:22:52.7141300Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7141548Z ok (0.012s) 2023-01-11T21:22:52.7141772Z test_cond_export (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7142122Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7142365Z ok (0.016s) 2023-01-11T21:22:52.7142600Z test_cond_export_single_arg (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7142967Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7143209Z ok (0.009s) 2023-01-11T21:22:52.7143501Z test_cond_nested (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7143741Z inline_call [] 2023-01-11T21:22:52.7144047Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7144280Z ok (0.014s) 2023-01-11T21:22:52.7144589Z test_cond_side_effects (__main__.MiscTests) ... expected failure (0.001s) 2023-01-11T21:22:52.7144970Z test_config_getattr_default (__main__.MiscTests) ... frames [('total', 3), ('ok', 3)] 2023-01-11T21:22:52.7145345Z stats [('calls_captured', 21), ('fusions_possible', 18), ('unique_graphs', 3)] 2023-01-11T21:22:52.7145582Z ok (0.033s) 2023-01-11T21:22:52.7145888Z test_config_log_level (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7146255Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7146487Z ok (0.005s) 2023-01-11T21:22:52.7146778Z test_config_obj (__main__.MiscTests) ... frames [('total', 4), ('ok', 4)] 2023-01-11T21:22:52.7147140Z stats [('calls_captured', 8), ('fusions_possible', 4), ('unique_graphs', 4)] 2023-01-11T21:22:52.7147368Z ok (0.022s) 2023-01-11T21:22:52.7147613Z test_const_dict_variable_python_type (__main__.MiscTests) ... ok (0.001s) 2023-01-11T21:22:52.7148212Z test_cross_entropy_loss_fancy_ctor (__main__.MiscTests) ... /opt/conda/lib/python3.10/site-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead. 2023-01-11T21:22:52.7148624Z warnings.warn(warning.format(ret)) 2023-01-11T21:22:52.7148826Z ok (0.002s) 2023-01-11T21:22:52.7149069Z test_cross_entropy_loss_simple_ctor (__main__.MiscTests) ... 
ok (0.001s) 2023-01-11T21:22:52.7149441Z test_dataclass_fields (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7149679Z inline_call [] 2023-01-11T21:22:52.7150010Z stats [('calls_captured', 3), ('unique_graphs', 2), ('fusions_possible', 1)] 2023-01-11T21:22:52.7150279Z ok (0.034s) 2023-01-11T21:22:52.7150673Z test_dict_mutation_side_effect (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7151046Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7151286Z ok (0.005s) 2023-01-11T21:22:52.7151631Z test_dict_reconstruct_keeps_original_order (__main__.MiscTests) ... frames [('total', 13), ('ok', 12)] 2023-01-11T21:22:52.7152089Z unimplemented [("Guard setup for uninitialized class ", 1)] 2023-01-11T21:22:52.7152990Z graph_break [('UnspecializedNNModuleVariable missing add_module', 3), ('construct nn.Module: ReLU', 1), ('call_function in skip_files /opt/conda/lib/python3.10/collections/__init__.py', 1), ('construct nn.Module: ModuleDict', 1), ('Patched init cannot be inlined.', 1), ('construct nn.Module: Linear', 1), ('construct nn.Module: Sigmoid', 1), ('call_method ConstDictVariable() update [TupleVariable()] {}', 1)] 2023-01-11T21:22:52.7153654Z inline_call [('inline __setitem__', 2), ('Patched init cannot be inlined.', 1)] 2023-01-11T21:22:52.7153906Z ok (0.036s) 2023-01-11T21:22:52.7154194Z test_dictcomp (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7154435Z inline_call [] 2023-01-11T21:22:52.7154744Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7154978Z ok (0.007s) 2023-01-11T21:22:52.7155208Z test_disable_flag (__main__.MiscTests) ... ok (0.002s) 2023-01-11T21:22:52.7155479Z test_disable_optimize (__main__.MiscTests) ... ok (0.002s) 2023-01-11T21:22:52.7155837Z test_disallow_in_graph (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7156080Z unimplemented [] 2023-01-11T21:22:52.7156466Z graph_break [('call_function UserDefinedObjectVariable(sub) [TensorVariable(), ConstantVariable(int)] {}', 1)] 2023-01-11T21:22:52.7156888Z stats [('calls_captured', 4), ('fusions_possible', 2), ('unique_graphs', 2)] 2023-01-11T21:22:52.7157117Z ok (0.010s) 2023-01-11T21:22:52.7157353Z test_dunder_methods (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7157704Z stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)] 2023-01-11T21:22:52.7157935Z ok (0.020s) 2023-01-11T21:22:52.7158177Z test_duplicate_graph_break_warning (__main__.MiscTests) ... break 2023-01-11T21:22:52.7158479Z break 2023-01-11T21:22:52.7158693Z frames [('total', 9), ('ok', 9)] 2023-01-11T21:22:52.7159038Z inline_call [('call_function BuiltinVariable(print) [ConstantVariable(str)] {}', 2)] 2023-01-11T21:22:52.7159309Z unimplemented [] 2023-01-11T21:22:52.7159647Z graph_break [('call_function BuiltinVariable(print) [ConstantVariable(str)] {}', 4)] 2023-01-11T21:22:52.7160025Z stats [('calls_captured', 6), ('unique_graphs', 4), ('fusions_possible', 2)] 2023-01-11T21:22:52.7160262Z ok (0.026s) 2023-01-11T21:22:52.7160507Z test_dynamo_min_operator_with_shape (__main__.MiscTests) ... ok (0.002s) 2023-01-11T21:22:52.7160859Z test_empty_list (__main__.MiscTests) ... 
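
The size_average/reduce deprecation printed above under test_cross_entropy_loss_fancy_ctor folds the two legacy flags into the single reduction argument. A quick sketch of the migration (this particular legacy combination maps to "sum"):

    import torch
    import torch.nn.functional as F

    x, y = torch.randn(3, 5), torch.tensor([0, 2, 4])
    # legacy style flagged by the warning:
    #   F.cross_entropy(x, y, size_average=False, reduce=True)
    loss = F.cross_entropy(x, y, reduction="sum")  # current equivalent
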
frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7161218Z stats [('calls_captured', 2), ('unique_graphs', 2), ('fusions_possible', 0)] 2023-01-11T21:22:52.7161461Z ok (0.008s) 2023-01-11T21:22:52.7161830Z test_enum_no_graphbreaks (__main__.MiscTests) ... stats [('calls_captured', 3), ('unique_graphs', 2), ('fusions_possible', 1)] 2023-01-11T21:22:52.7162114Z ok (0.010s) 2023-01-11T21:22:52.7162428Z test_error_on_nested_fx_trace (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7162802Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7163032Z ok (0.004s) 2023-01-11T21:22:52.7163384Z test_fold (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7163649Z ok (0.004s) 2023-01-11T21:22:52.7164032Z test_frozenset_torch_func_contains (__main__.MiscTests) ... stats [('calls_captured', 3), ('unique_graphs', 2), ('fusions_possible', 1)] 2023-01-11T21:22:52.7164363Z ok (0.009s) 2023-01-11T21:22:52.7164746Z test_function_annotation (__main__.MiscTests) ... stats [('calls_captured', 2), ('unique_graphs', 2), ('fusions_possible', 0)] 2023-01-11T21:22:52.7165019Z ok (0.008s) 2023-01-11T21:22:52.7165378Z test_generate_tensor_from_list_of_numpy_primitive_type (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7165665Z unimplemented [] 2023-01-11T21:22:52.7165904Z graph_break [('numpy', 1)] 2023-01-11T21:22:52.7166218Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7166457Z ok (0.007s) 2023-01-11T21:22:52.7166821Z test_get_device (__main__.MiscTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7167086Z ok (0.005s) 2023-01-11T21:22:52.7167372Z test_grad (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7167613Z unimplemented [] 2023-01-11T21:22:52.7167859Z graph_break [('Tensor.backward', 1)] 2023-01-11T21:22:52.7168200Z stats [('calls_captured', 4), ('fusions_possible', 2), ('unique_graphs', 2)] 2023-01-11T21:22:52.7168442Z ok (0.012s) 2023-01-11T21:22:52.7168736Z test_grad_mode_guard (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7168991Z unimplemented [] 2023-01-11T21:22:52.7169243Z graph_break [('Tensor.tolist', 1)] 2023-01-11T21:22:52.7169570Z stats [('calls_captured', 4), ('fusions_possible', 2), ('unique_graphs', 2)] 2023-01-11T21:22:52.7169808Z ok (0.011s) 2023-01-11T21:22:52.7170108Z test_graph_break (__main__.MiscTests) ... frames [('total', 3), ('ok', 3)] 2023-01-11T21:22:52.7170419Z unimplemented [] 2023-01-11T21:22:52.7170799Z graph_break [('call_function in skip_files /opt/conda/lib/python3.10/site-packages/torch/_dynamo/__init__.py', 2)] 2023-01-11T21:22:52.7171198Z stats [('calls_captured', 6), ('fusions_possible', 3), ('unique_graphs', 3)] 2023-01-11T21:22:52.7171436Z ok (0.015s) 2023-01-11T21:22:52.7171809Z test_guard_failure_fn (__main__.MiscTests) ... stats [('calls_captured', 8), ('fusions_possible', 6), ('unique_graphs', 2)] 2023-01-11T21:22:52.7172082Z ok (0.012s) 2023-01-11T21:22:52.7172456Z test_guard_failure_fn2 (__main__.MiscTests) ... stats [('calls_captured', 6), ('fusions_possible', 4), ('unique_graphs', 2)] 2023-01-11T21:22:52.7172737Z ok (0.011s) 2023-01-11T21:22:52.7173138Z test_id_of_nn_module (__main__.MiscTests) ... 
stats [('calls_captured', 3), ('unique_graphs', 2), ('fusions_possible', 1)] 2023-01-11T21:22:52.7173421Z ok (0.012s) 2023-01-11T21:22:52.7173895Z test_if_cond_nn_mod (__main__.MiscTests) ... stats [('calls_captured', 3), ('unique_graphs', 2), ('fusions_possible', 1)] 2023-01-11T21:22:52.7174171Z ok (0.013s) 2023-01-11T21:22:52.7174468Z test_inference_mode (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7175134Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7175376Z ok (0.006s) 2023-01-11T21:22:52.7175678Z test_inline_dict_mutation (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7175930Z inline_call [] 2023-01-11T21:22:52.7176255Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7176567Z ok (0.020s) 2023-01-11T21:22:52.7177030Z test_inline_func_jump_on_tensor_condition (__main__.MiscTests) ... frames [('total', 4), ('ok', 4)] 2023-01-11T21:22:52.7177514Z inline_call [('generic_jump TensorVariable()', 1)] 2023-01-11T21:22:52.7177829Z unimplemented [] 2023-01-11T21:22:52.7178195Z graph_break [('generic_jump TensorVariable()', 1)] 2023-01-11T21:22:52.7178659Z stats [('calls_captured', 3), ('unique_graphs', 3), ('fusions_possible', 0)] 2023-01-11T21:22:52.7178970Z ok (0.016s) 2023-01-11T21:22:52.7179371Z test_inline_list_mutation (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7179689Z inline_call [] 2023-01-11T21:22:52.7181557Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7181840Z ok (0.018s) 2023-01-11T21:22:52.7182345Z test_inplace (__main__.MiscTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7182623Z ok (0.012s) 2023-01-11T21:22:52.7183063Z test_inplace_param_update (__main__.MiscTests) ... stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:22:52.7183452Z ok (0.006s) 2023-01-11T21:22:52.7183847Z test_is_compiling (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7184339Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7184641Z ok (0.009s) 2023-01-11T21:22:52.7185139Z test_is_floating_point (__main__.MiscTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7185513Z ok (0.006s) 2023-01-11T21:22:52.7185958Z test_is_floating_point2 (__main__.MiscTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7186244Z ok (0.006s) 2023-01-11T21:22:52.7186632Z test_is_tensor (__main__.MiscTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7186973Z ok (0.006s) 2023-01-11T21:22:52.7187320Z test_is_tensor2 (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7187731Z stats [('calls_captured', 2), ('unique_graphs', 2), ('fusions_possible', 0)] 2023-01-11T21:22:52.7188018Z ok (0.012s) 2023-01-11T21:22:52.7188384Z test_is_tensor_like (__main__.MiscTests) ... stats [('calls_captured', 3), ('unique_graphs', 2), ('fusions_possible', 1)] 2023-01-11T21:22:52.7188707Z ok (0.012s) 2023-01-11T21:22:52.7189016Z test_is_tensor_like2 (__main__.MiscTests) ... 
frames [('total', 3), ('ok', 3)] 2023-01-11T21:22:52.7189261Z unimplemented [] 2023-01-11T21:22:52.7189647Z graph_break [('call_function args: UserDefinedObjectVariable(MyTensor) ', 1)] 2023-01-11T21:22:52.7190067Z stats [('calls_captured', 2), ('unique_graphs', 2), ('fusions_possible', 0)] 2023-01-11T21:22:52.7190369Z ok (0.015s) 2023-01-11T21:22:52.7190845Z test_item (__main__.MiscTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7191164Z ok (0.016s) 2023-01-11T21:22:52.7191832Z test_item_changes (__main__.MiscTests) ... stats [('calls_captured', 6), ('fusions_possible', 4), ('unique_graphs', 2)] 2023-01-11T21:22:52.7192110Z ok (0.031s) 2023-01-11T21:22:52.7192498Z test_item_changes_new_shape (__main__.MiscTests) ... stats [('calls_captured', 6), ('fusions_possible', 4), ('unique_graphs', 2)] 2023-01-11T21:22:52.7192784Z ok (0.031s) 2023-01-11T21:22:52.7193010Z test_large_reduction_list (__main__.MiscTests) ... ok (0.008s) 2023-01-11T21:22:52.7193299Z test_linetable_writer (__main__.MiscTests) ... ok (0.001s) 2023-01-11T21:22:52.7193666Z test_list_append_return_none (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7194034Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7194280Z ok (0.005s) 2023-01-11T21:22:52.7194569Z test_list_mul (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7194800Z ok (0.002s) 2023-01-11T21:22:52.7195084Z test_listcomp (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7195317Z inline_call [] 2023-01-11T21:22:52.7195631Z stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)] 2023-01-11T21:22:52.7195865Z ok (0.009s) 2023-01-11T21:22:52.7196128Z test_lnotab_writer (__main__.MiscTests) ... skip: use lnotab when python < 3.10 (0.000s) 2023-01-11T21:22:52.7196571Z test_manual_seed (__main__.MiscTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7196838Z ok (0.005s) 2023-01-11T21:22:52.7197197Z test_matmul1 (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7197468Z ok (0.004s) 2023-01-11T21:22:52.7197701Z test_module_complex_iter (__main__.MiscTests) ... ok (0.007s) 2023-01-11T21:22:52.7198110Z test_module_deepcopy (__main__.MiscTests) ... frames [('total', 6), ('ok', 6)] 2023-01-11T21:22:52.7198366Z unimplemented [] 2023-01-11T21:22:52.7198709Z graph_break [('call_function in skip_files /opt/conda/lib/python3.10/copy.py', 2)] 2023-01-11T21:22:52.7198964Z inline_call [] 2023-01-11T21:22:52.7199282Z stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)] 2023-01-11T21:22:52.7199532Z ok (0.043s) 2023-01-11T21:22:52.7199762Z test_named_parameters (__main__.MiscTests) ... ok (0.017s) 2023-01-11T21:22:52.7200150Z test_namedtuple1 (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7200541Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7200776Z ok (0.006s) 2023-01-11T21:22:52.7201081Z test_namedtuple2 (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7201446Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7201692Z ok (0.007s) 2023-01-11T21:22:52.7202064Z test_namedtuple3 (__main__.MiscTests) ... 
frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7202429Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7202669Z ok (0.005s) 2023-01-11T21:22:52.7202962Z test_nan (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7203323Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7203560Z ok (0.004s) 2023-01-11T21:22:52.7203799Z test_nested_closure (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7204165Z stats [('calls_captured', 9), ('fusions_possible', 7), ('unique_graphs', 2)] 2023-01-11T21:22:52.7204396Z ok (0.031s) 2023-01-11T21:22:52.7204643Z test_nested_closure_mutation (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7205017Z stats [('calls_captured', 11), ('fusions_possible', 9), ('unique_graphs', 2)] 2023-01-11T21:22:52.7205257Z ok (0.023s) 2023-01-11T21:22:52.7205751Z test_nested_disable_decorator (__main__.MiscTests) ... [2023-01-11 21:22:50,520] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT fn3 /var/lib/jenkins/workspace/test/dynamo/test_misc.py line 1197 2023-01-11T21:22:52.7206101Z due to: 2023-01-11T21:22:52.7206351Z Traceback (most recent call last): 2023-01-11T21:22:52.7206730Z File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/exc.py", line 67, in unimplemented 2023-01-11T21:22:52.7207025Z raise Unsupported(msg) 2023-01-11T21:22:52.7207406Z torch._dynamo.exc.Unsupported: call torch._dynamo.disable() wrapped function .fn1 at 0x7f78e333fb50> 2023-01-11T21:22:52.7207674Z 2023-01-11T21:22:52.7207746Z from user code: 2023-01-11T21:22:52.7208011Z File "/var/lib/jenkins/workspace/test/dynamo/test_misc.py", line 1199, in fn3 2023-01-11T21:22:52.7208270Z return fn2(x) 2023-01-11T21:22:52.7208527Z File "/var/lib/jenkins/workspace/test/dynamo/test_misc.py", line 1192, in fn2 2023-01-11T21:22:52.7208796Z x = fn1(x) # graph break 2023-01-11T21:22:52.7208925Z 2023-01-11T21:22:52.7209067Z Set torch._dynamo.config.verbose=True for more information 2023-01-11T21:22:52.7209238Z 2023-01-11T21:22:52.7209243Z 2023-01-11T21:22:52.7209372Z frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7209580Z unimplemented [] 2023-01-11T21:22:52.7210020Z graph_break [('call torch._dynamo.disable() wrapped function .fn1 at 0x7f78e333fb50>', 1)] 2023-01-11T21:22:52.7210522Z stats [('calls_captured', 4), ('fusions_possible', 2), ('unique_graphs', 2)] 2023-01-11T21:22:52.7211005Z inline_call [('call torch._dynamo.disable() wrapped function .fn1 at 0x7f78e333fb50>', 1)] 2023-01-11T21:22:52.7211315Z ok (0.012s) 2023-01-11T21:22:52.7211701Z test_nested_optimize (__main__.MiscTests) ... stats [('calls_captured', 8), ('fusions_possible', 6), ('unique_graphs', 2)] 2023-01-11T21:22:52.7212028Z ok (0.014s) 2023-01-11T21:22:52.7212274Z test_nested_optimize_decorator (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7212648Z stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)] 2023-01-11T21:22:52.7212896Z ok (0.008s) 2023-01-11T21:22:52.7213274Z test_nested_optimize_run (__main__.MiscTests) ... stats [('calls_captured', 8), ('fusions_possible', 6), ('unique_graphs', 2)] 2023-01-11T21:22:52.7213559Z ok (0.012s) 2023-01-11T21:22:52.7214065Z test_nn_functional_reduction (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7214360Z ok (0.005s) 2023-01-11T21:22:52.7214767Z test_nn_sequential_invocation (__main__.MiscTests) ... 
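
The WON'T CONVERT trace above is Dynamo refusing to inline a callee wrapped with torch._dynamo.disable, which surfaces as a graph break in the caller. A minimal sketch of the pattern, assuming the eager backend (fn1/fn2 mirror the names in the trace, not the test's exact bodies):

    import torch
    import torch._dynamo

    @torch._dynamo.disable
    def fn1(x):
        return torch.sin(x)  # never traced; Dynamo falls back to eager here

    def fn2(x):
        x = fn1(x)  # graph break, matching the inline_call/graph_break entries above
        return torch.cos(x)

    torch.compile(fn2, backend="eager")(torch.randn(3))
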
inline_call [] 2023-01-11T21:22:52.7215165Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7215502Z ok (0.021s) 2023-01-11T21:22:52.7215810Z test_nn_sequential_invocation_reposition_indices (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7216192Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7216431Z ok (0.016s) 2023-01-11T21:22:52.7216749Z test_no_error_on_nested_fx_trace (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7217116Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7217357Z ok (0.005s) 2023-01-11T21:22:52.7217724Z test_no_grad (__main__.MiscTests) ... stats [('calls_captured', 40), ('fusions_possible', 32), ('unique_graphs', 8)] 2023-01-11T21:22:52.7217996Z ok (0.048s) 2023-01-11T21:22:52.7218297Z test_not_dynamic_scope (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7218547Z inline_call [] 2023-01-11T21:22:52.7218852Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7219085Z ok (0.005s) 2023-01-11T21:22:52.7219440Z test_numel (__main__.MiscTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7219712Z ok (0.006s) 2023-01-11T21:22:52.7220014Z test_numpy_int_constant (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7220385Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7220760Z ok (0.006s) 2023-01-11T21:22:52.7221074Z test_numpy_variable_isinstance (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7221450Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7221689Z ok (0.004s) 2023-01-11T21:22:52.7221926Z test_object_classmethod (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7222275Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7222515Z ok (0.010s) 2023-01-11T21:22:52.7222751Z test_object_staticmethod (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7223106Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7223346Z ok (0.010s) 2023-01-11T21:22:52.7223728Z test_onnx_shape_as_tensor (__main__.MiscTests) ... stats [('calls_captured', 15), ('fusions_possible', 10), ('unique_graphs', 5)] 2023-01-11T21:22:52.7224005Z ok (0.071s) 2023-01-11T21:22:52.7224389Z test_optimize_on_module (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7224671Z ok (0.005s) 2023-01-11T21:22:52.7225026Z test_pair (__main__.MiscTests) ... stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:22:52.7225287Z ok (0.019s) 2023-01-11T21:22:52.7225587Z test_python_slice (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7225823Z ok (0.006s) 2023-01-11T21:22:52.7226118Z test_raise_on_backend_error (__main__.MiscTests) ... frames [('total', 1)] 2023-01-11T21:22:52.7226450Z stats [('calls_captured', 3), ('fusions_possible', 2)] 2023-01-11T21:22:52.7226734Z ok (0.005s) 2023-01-11T21:22:52.7227018Z test_raises (__main__.MiscTests) ... 
frames [('total', 3), ('ok', 3)] 2023-01-11T21:22:52.7227260Z unimplemented [] 2023-01-11T21:22:52.7227587Z graph_break [('call_function BuiltinVariable(str) [TensorVariable()] {}', 1)] 2023-01-11T21:22:52.7227961Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7228197Z ok (0.008s) 2023-01-11T21:22:52.7228550Z test_rand (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7228818Z ok (0.009s) 2023-01-11T21:22:52.7229038Z test_range_input (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7229388Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7229629Z ok (0.007s) 2023-01-11T21:22:52.7229958Z test_recursive_inline_list_mutation (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7230260Z inline_call [] 2023-01-11T21:22:52.7230569Z stats [('calls_captured', 7), ('fusions_possible', 6), ('unique_graphs', 1)] 2023-01-11T21:22:52.7230800Z ok (0.013s) 2023-01-11T21:22:52.7231112Z test_release_input_memory (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7231486Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7231718Z ok (0.003s) 2023-01-11T21:22:52.7232029Z test_release_module_memory (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7232401Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7232639Z ok (0.008s) 2023-01-11T21:22:52.7233039Z test_repro_graph_breaks_in__get_item_by_idx (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7233338Z ok (0.009s) 2023-01-11T21:22:52.7233649Z test_restore_graphstate (__main__.MiscTests) ... frames [('total', 4), ('ok', 4)] 2023-01-11T21:22:52.7233984Z inline_call [('generic_jump TensorVariable()', 1)] 2023-01-11T21:22:52.7234218Z unimplemented [] 2023-01-11T21:22:52.7234495Z graph_break [('generic_jump TensorVariable()', 1)] 2023-01-11T21:22:52.7234839Z stats [('calls_captured', 6), ('unique_graphs', 4), ('fusions_possible', 2)] 2023-01-11T21:22:52.7235079Z ok (0.019s) 2023-01-11T21:22:52.7235513Z test_restore_graphstate_internals (__main__.MiscTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7235809Z ok (0.007s) 2023-01-11T21:22:52.7236118Z test_return_nested_function (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7236491Z stats [('calls_captured', 7), ('fusions_possible', 5), ('unique_graphs', 2)] 2023-01-11T21:22:52.7236729Z ok (0.013s) 2023-01-11T21:22:52.7236947Z test_sample_input (__main__.MiscTests) ... ok (0.516s) 2023-01-11T21:22:52.7237297Z test_setattr_mutation1 (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7237754Z unimplemented [('call_method UserDefinedObjectVariable(member_descriptor) __mul__ [ConstantVariable(int)] {}', 1)] 2023-01-11T21:22:52.7238276Z graph_break [("isinstance called on UserDefinedClass UserDefinedObjectVariable(member_descriptor) ", 1)] 2023-01-11T21:22:52.7238641Z frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7238970Z stats [('calls_captured', 12), ('fusions_possible', 11), ('unique_graphs', 1)] 2023-01-11T21:22:52.7239216Z ok (0.017s) 2023-01-11T21:22:52.7239516Z test_setattr_mutation2 (__main__.MiscTests) ... 
frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7239763Z inline_call [] 2023-01-11T21:22:52.7240073Z stats [('calls_captured', 9), ('fusions_possible', 8), ('unique_graphs', 1)] 2023-01-11T21:22:52.7240325Z ok (0.013s) 2023-01-11T21:22:52.7240663Z test_setattr_mutation3 (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7240908Z inline_call [] 2023-01-11T21:22:52.7241211Z stats [('calls_captured', 9), ('fusions_possible', 8), ('unique_graphs', 1)] 2023-01-11T21:22:52.7241503Z ok (0.014s) 2023-01-11T21:22:52.7241808Z test_shape_unpack (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7242167Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7242414Z ok (0.004s) 2023-01-11T21:22:52.7242760Z test_side_effects_codegen_update_mutated (__main__.MiscTests) ... frames [('total', 6), ('ok', 6)] 2023-01-11T21:22:52.7243040Z unimplemented [] 2023-01-11T21:22:52.7243281Z graph_break [('Tensor.item', 4)] 2023-01-11T21:22:52.7243612Z stats [('calls_captured', 8), ('fusions_possible', 4), ('unique_graphs', 4)] 2023-01-11T21:22:52.7243855Z ok (0.026s) 2023-01-11T21:22:52.7244147Z test_size_input (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7244513Z stats [('calls_captured', 2), ('unique_graphs', 2), ('fusions_possible', 0)] 2023-01-11T21:22:52.7244757Z ok (0.007s) 2023-01-11T21:22:52.7245121Z test_slice_input (__main__.MiscTests) ... stats [('calls_captured', 3), ('unique_graphs', 3), ('fusions_possible', 0)] 2023-01-11T21:22:52.7245399Z ok (0.031s) 2023-01-11T21:22:52.7245724Z test_tensor_build_list_unpack (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7246104Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7246340Z ok (0.011s) 2023-01-11T21:22:52.7246712Z test_tensor_data (__main__.MiscTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7246992Z ok (0.014s) 2023-01-11T21:22:52.7247288Z test_tensor_dict1 (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7247655Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7247901Z ok (0.005s) 2023-01-11T21:22:52.7248195Z test_tensor_dict2 (__main__.MiscTests) ... frames [('total', 3), ('ok', 3)] 2023-01-11T21:22:52.7248565Z stats [('calls_captured', 9), ('fusions_possible', 6), ('unique_graphs', 3)] 2023-01-11T21:22:52.7248812Z ok (0.031s) 2023-01-11T21:22:52.7249135Z test_tensor_dot_grad_no_graph_break (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7249400Z unimplemented [] 2023-01-11T21:22:52.7249663Z graph_break [('Tensor.backward', 1)] 2023-01-11T21:22:52.7250049Z stats [('calls_captured', 6), ('fusions_possible', 4), ('unique_graphs', 2)] 2023-01-11T21:22:52.7250285Z ok (0.014s) 2023-01-11T21:22:52.7250602Z test_tensor_is_contiguous (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7250978Z stats [('calls_captured', 8), ('fusions_possible', 6), ('unique_graphs', 2)] 2023-01-11T21:22:52.7251216Z ok (0.166s) 2023-01-11T21:22:52.7251531Z test_tensor_item_capture (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7251907Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7252140Z ok (0.006s) 2023-01-11T21:22:52.7252464Z test_tensor_item_no_capture (__main__.MiscTests) ... 
frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7252724Z unimplemented [] 2023-01-11T21:22:52.7252963Z graph_break [('Tensor.item', 1)] 2023-01-11T21:22:52.7253295Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7253537Z ok (0.005s) 2023-01-11T21:22:52.7254035Z test_tensor_layout (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7254303Z ok (0.010s) 2023-01-11T21:22:52.7254886Z test_tensor_types (__main__.MiscTests) ... frames [('total', 10), ('ok', 10)] 2023-01-11T21:22:52.7255299Z stats [('calls_captured', 10), ('unique_graphs', 10), ('fusions_possible', 0)] 2023-01-11T21:22:52.7255660Z ok (0.076s) 2023-01-11T21:22:52.7256055Z test_top_package_import (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7256343Z ok (0.005s) 2023-01-11T21:22:52.7256926Z test_torch_cuda_is_available (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7257219Z ok (0.004s) 2023-01-11T21:22:52.7257613Z test_torch_cudnn_is_acceptable (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7257913Z ok (0.005s) 2023-01-11T21:22:52.7258168Z test_torch_cudnn_is_acceptable_bad_inputs (__main__.MiscTests) ... ok (0.005s) 2023-01-11T21:22:52.7258573Z test_torch_nn_parameter_isinstance (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7258960Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7259199Z ok (0.011s) 2023-01-11T21:22:52.7259937Z test_torch_profiler (__main__.MiscTests) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/91868 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T21:22:52.7260980Z test_torch_seed (__main__.MiscTests) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/91867 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T21:22:52.7261973Z test_torch_size (__main__.MiscTests) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/91866 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T21:22:52.7262574Z test_type_copy (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7262942Z stats [('calls_captured', 6), ('fusions_possible', 4), ('unique_graphs', 2)] 2023-01-11T21:22:52.7263181Z ok (0.013s) 2023-01-11T21:22:52.7263510Z test_typing_variable_isinstance (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7263894Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7264181Z ok (0.005s) 2023-01-11T21:22:52.7264549Z test_unpack4 (__main__.MiscTests) ... 
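
test_tensor_item_capture and test_tensor_item_no_capture above differ in whether Dynamo may trace Tensor.item(); by default a data-dependent scalar forces a graph break. A sketch, assuming torch._dynamo.config.capture_scalar_outputs is the switch those tests flip:

    import torch
    import torch._dynamo

    def f(x):
        return x + x.sum().item()  # graph break by default, as in the log above

    torch.compile(f, backend="eager")(torch.randn(4))

    torch._dynamo.config.capture_scalar_outputs = True  # now .item() can be captured
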
stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:22:52.7264827Z ok (0.015s) 2023-01-11T21:22:52.7265177Z test_unpack5 (__main__.MiscTests) ... stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:22:52.7265451Z ok (0.015s) 2023-01-11T21:22:52.7265802Z test_update_locals_and_stack_uses_shared_cache (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7266076Z inline_call [] 2023-01-11T21:22:52.7266263Z unimplemented [] 2023-01-11T21:22:52.7266601Z graph_break [('call_method ListVariable() extend [ListIteratorVariable()] {}', 1)] 2023-01-11T21:22:52.7266867Z ok (0.007s) 2023-01-11T21:22:52.7267245Z test_user_defined_class_name (__main__.MiscTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7267537Z ok (0.008s) 2023-01-11T21:22:52.7267812Z test_user_function_variable_supports_enum_argument (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7268067Z ok (0.004s) 2023-01-11T21:22:52.7268343Z test_user_function_variable_supports_function_argument (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7268734Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7268973Z ok (0.005s) 2023-01-11T21:22:52.7269253Z test_user_function_variable_supports_type_abcmeta_argument (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7269650Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7269899Z ok (0.005s) 2023-01-11T21:22:52.7270283Z test_user_getattr1 (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7270528Z inline_call [] 2023-01-11T21:22:52.7270841Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7271078Z ok (0.006s) 2023-01-11T21:22:52.7271391Z test_user_getattr2 (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7271643Z inline_call [] 2023-01-11T21:22:52.7271948Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7272189Z ok (0.007s) 2023-01-11T21:22:52.7272493Z test_user_property (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7272734Z inline_call [] 2023-01-11T21:22:52.7273044Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7273288Z ok (0.005s) 2023-01-11T21:22:52.7273521Z test_usr_cls_classmethod (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7273888Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7274132Z ok (0.007s) 2023-01-11T21:22:52.7274376Z test_usr_cls_staticmethod (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7274732Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7274978Z ok (0.007s) 2023-01-11T21:22:52.7275205Z test_version_ci (__main__.MiscTests) ... ok (0.000s) 2023-01-11T21:22:52.7275487Z test_write_to_closures_in_inlining (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7275863Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7276103Z ok (0.008s) 2023-01-11T21:22:52.7276318Z test_jit_save (__main__.TestTracer) ... 
ok (0.039s) 2023-01-11T21:22:52.7276472Z 2023-01-11T21:22:52.7276678Z ---------------------------------------------------------------------- 2023-01-11T21:22:52.7276947Z Ran 161 tests in 4.834s 2023-01-11T21:22:52.7277072Z 2023-01-11T21:22:52.7277182Z OK (skipped=4, expected failures=1) 2023-01-11T21:22:52.7277317Z 2023-01-11T21:22:52.7277411Z Generating XML reports... 2023-01-11T21:22:52.7277818Z Generated XML report: test-reports/python-unittest/dynamo.test_misc/TEST-MiscTests-20230111212247.xml 2023-01-11T21:22:52.7278363Z Generated XML report: test-reports/python-unittest/dynamo.test_misc/TEST-TestTracer-20230111212247.xml 2023-01-11T21:22:52.7278593Z 2023-01-11T21:22:52.7279003Z ##[endgroup] 2023-01-11T21:22:52.7279402Z FINISHED PRINTING LOG FILE of dynamo/test_misc (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_misc_mzyasd93) 2023-01-11T21:22:52.7279631Z 2023-01-11T21:22:52.8191671Z 2023-01-11T21:22:52.8192521Z Expand the folded group to see the log file of dynamo/test_torchxla_integration 2023-01-11T21:22:52.8193555Z ##[group]PRINTING LOG FILE of dynamo/test_torchxla_integration (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_torchxla_integration_xt1a9m96) 2023-01-11T21:22:52.8193901Z 2023-01-11T21:22:52.8194007Z Running tests... 2023-01-11T21:22:52.8194553Z ---------------------------------------------------------------------- 2023-01-11T21:22:52.8195091Z Test results will be stored in test-reports/python-unittest/dynamo.test_torchxla_integration 2023-01-11T21:22:52.8195637Z test_basic (__main__.TorchXLAReuseGraphTest) ... skip: Skip the tests since torch_xla is not available or XLA devices are not specified (0.001s) 2023-01-11T21:22:52.8196231Z test_inplace_update (__main__.TorchXLAReuseGraphTest) ... skip: Skip the tests since torch_xla is not available or XLA devices are not specified (0.001s) 2023-01-11T21:22:52.8196838Z test_linear (__main__.TorchXLAReuseGraphTest) ... skip: Skip the tests since torch_xla is not available or XLA devices are not specified (0.001s) 2023-01-11T21:22:52.8197463Z test_matmul (__main__.TorchXLAReuseGraphTest) ... skip: Skip the tests since torch_xla is not available or XLA devices are not specified (0.001s) 2023-01-11T21:22:52.8197813Z 2023-01-11T21:22:52.8198088Z ---------------------------------------------------------------------- 2023-01-11T21:22:52.8198808Z Ran 4 tests in 0.003s 2023-01-11T21:22:52.8198976Z 2023-01-11T21:22:52.8199086Z OK (skipped=4) 2023-01-11T21:22:52.8199251Z 2023-01-11T21:22:52.8199378Z Generating XML reports... 2023-01-11T21:22:52.8200048Z Generated XML report: test-reports/python-unittest/dynamo.test_torchxla_integration/TEST-TorchXLAReuseGraphTest-20230111212252.xml 2023-01-11T21:22:52.8200423Z 2023-01-11T21:22:52.8200761Z ##[endgroup] 2023-01-11T21:22:52.8201403Z FINISHED PRINTING LOG FILE of dynamo/test_torchxla_integration (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_torchxla_integration_xt1a9m96) 2023-01-11T21:22:52.8201752Z 2023-01-11T21:22:54.6923975Z Ignoring disabled issues: [] 2023-01-11T21:22:54.7169966Z Running inductor/test_torchinductor ... [2023-01-11 21:22:54.716491] 2023-01-11T21:22:54.7172674Z Executing ['/opt/conda/bin/python', '-bb', 'inductor/test_torchinductor.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:22:54.716859] 2023-01-11T21:22:54.7352883Z Ignoring disabled issues: [] 2023-01-11T21:22:54.7601159Z Running test_python_dispatch ... 
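
In the dynamo/test_torchxla_integration group above, every test skipped because torch_xla was unavailable. A sketch of the sort of availability probe behind that skip message (the exact check in the test file may differ):

    try:
        import torch_xla.core.xla_model as xm
        xm.xla_device()      # raises if no XLA device is configured
        HAVE_XLA = True
    except (ImportError, RuntimeError):
        HAVE_XLA = False
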
[2023-01-11 21:22:54.759617] 2023-01-11T21:22:54.7603099Z Executing ['/opt/conda/bin/python', '-bb', 'test_python_dispatch.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:22:54.759963] 2023-01-11T21:22:59.4492670Z 2023-01-11T21:22:59.4493947Z Expand the folded group to see the log file of test_python_dispatch 2023-01-11T21:22:59.4495178Z ##[group]PRINTING LOG FILE of test_python_dispatch (/var/lib/jenkins/workspace/test/test-reports/test_python_dispatch_fh1knau0) 2023-01-11T21:22:59.4495523Z 2023-01-11T21:22:59.4495633Z Running tests... 2023-01-11T21:22:59.4496203Z ---------------------------------------------------------------------- 2023-01-11T21:22:59.4496770Z Test results will be stored in test-reports/python-unittest/test_python_dispatch 2023-01-11T21:22:59.4497246Z test_all_same_mode (__main__.TestPythonDispatch) ... ok (1.070s) 2023-01-11T21:22:59.4497685Z test_autograd_in_attr (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4498105Z test_basic (__main__.TestPythonDispatch) ... ok (0.005s) 2023-01-11T21:22:59.4498548Z test_capture_logs_with_torch_dispatch_mode (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4499286Z test_construct_int_tensor (__main__.TestPythonDispatch) ... ok (0.000s) 2023-01-11T21:22:59.4499776Z test_custom_autograd (__main__.TestPythonDispatch) ... ok (0.005s) 2023-01-11T21:22:59.4500264Z test_deepcopy_non_wrapper_subclass (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4500766Z test_deepcopy_wrapper_subclass (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4501292Z test_deepcopy_wrapper_subclass_with_clone_returning_different_type (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4501848Z test_detach_appears_twice_when_called_once (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4502341Z test_device_slowpath (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4502749Z test_dim_slowpath (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4503165Z test_dispatch_super_call (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4503638Z test_dispatch_super_call_list_arg (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4504143Z test_dispatch_super_dont_autograd (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4504608Z test_error_using_class_method_on_mode (__main__.TestPythonDispatch) ... ok (0.005s) 2023-01-11T21:22:59.4505069Z test_exception_handling (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4505493Z test_fancy_strides (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4505883Z test_format (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4506282Z test_get_cur_mode (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4506789Z test_get_mode_stack (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4507255Z test_index_put_where_only_index_is_subclass (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4507887Z test_invalid_ret (__main__.TestPythonDispatch) ... /var/lib/jenkins/workspace/test/test_python_dispatch.py:447: DeprecationWarning: Please use assertRaisesRegex instead. 2023-01-11T21:22:59.4508409Z self.assertRaisesRegexp( 2023-01-11T21:22:59.4508692Z ok (0.001s) 2023-01-11T21:22:59.4509030Z test_is_contiguous_slow_path (__main__.TestPythonDispatch) ... ok (0.003s) 2023-01-11T21:22:59.4509446Z test_kwarg_only (__main__.TestPythonDispatch) ... 
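
The DeprecationWarning above flags unittest's old assertRaisesRegexp alias; only the spelling changed. For example:

    import unittest

    class Example(unittest.TestCase):
        def test_raises(self):
            with self.assertRaisesRegex(ValueError, "bad"):  # drop the trailing "p"
                raise ValueError("bad value")
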
ok (0.002s) 2023-01-11T21:22:59.4509886Z test_kwarg_only_and_positional_default (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4510324Z test_layout_slow_path (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4510733Z test_like (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4511147Z test_list_ret (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4511598Z test_make_subclass_with_modes (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4512067Z test_make_wrapper_subclass_noalloc (__main__.TestPythonDispatch) ... ok (0.000s) 2023-01-11T21:22:59.4512563Z test_make_wrapper_subclass_propagates_metadata (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4513036Z test_maybe_tuple_bug (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4513466Z test_mode_with_make_subclass (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4513954Z test_multiple_ops_subclass (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4514450Z test_nested_push_logging_tensor_mode (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4514900Z test_nesting_same_mode (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4515304Z test_new_ones (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4515721Z test_none_wrapping (__main__.TestPythonDispatch) ... ok (0.008s) 2023-01-11T21:22:59.4516161Z test_notimplemented_mode (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4516584Z test_optional_tensor_list (__main__.TestPythonDispatch) ... woof 2023-01-11T21:22:59.4516907Z ok (0.002s) 2023-01-11T21:22:59.4517294Z test_out (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4517711Z test_produce_real_type (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4518123Z test_set_data (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4518547Z test_shallow_copy_and_detach (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4518972Z test_sizes_slow_path (__main__.TestPythonDispatch) ... ok (0.003s) 2023-01-11T21:22:59.4519392Z test_standard_is_not_subclass (__main__.TestPythonDispatch) ... ok (0.000s) 2023-01-11T21:22:59.4520316Z test_storage (__main__.TestPythonDispatch) ... /var/lib/jenkins/workspace/test/test_python_dispatch.py:469: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:22:59.4521170Z self.assertRaises(RuntimeError, lambda: x.storage()) 2023-01-11T21:22:59.4521488Z ok (0.001s) 2023-01-11T21:22:59.4522334Z test_storage_can_be_converted_to_python_object (__main__.TestPythonDispatch) ... /var/lib/jenkins/workspace/test/test_python_dispatch.py:1197: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:22:59.4523147Z s = torch.Storage() 2023-01-11T21:22:59.4523408Z ok (0.001s) 2023-01-11T21:22:59.4523746Z test_strides_slow_path (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4524280Z test_subclass_autograd_device_check (__main__.TestPythonDispatch) ... 
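
The TypedStorage deprecation above names tensor.untyped_storage() as the replacement accessor. A quick sketch:

    import torch

    t = torch.arange(4.0)
    buf = t.untyped_storage()  # preferred over the deprecated t.storage()
    print(buf.nbytes())        # 16: four float32 elements
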
ok (0.002s) 2023-01-11T21:22:59.4524734Z test_subclass_creation (__main__.TestPythonDispatch) ... ok (0.004s) 2023-01-11T21:22:59.4525179Z test_subclass_priority (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4525638Z test_tolist_numpy_with_torch_dispatch_mode (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4526115Z test_torch_dispatch_mode_basic (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4526588Z test_torch_dispatch_mode_respects_no_dispatch (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4527082Z test_torch_dispatch_mode_subclass_priority (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4527578Z test_torch_dispatch_mode_unrelated_tensors (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4528005Z test_version (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4528445Z test_with_mode_created_separately (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4528894Z test_with_nested_modes (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4529339Z test_wrapper_subclass_serializes (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4529767Z test_basic (__main__.TestPythonDispatcher) ... ok (0.001s) 2023-01-11T21:22:59.4530163Z test_lstsq (__main__.TestPythonDispatcher) ... ok (0.003s) 2023-01-11T21:22:59.4530613Z test_alias_analysis (__main__.TestPythonRegistration) ... ok (0.006s) 2023-01-11T21:22:59.4531057Z test_create_new_library (__main__.TestPythonRegistration) ... ok (0.002s) 2023-01-11T21:22:59.4531517Z test_error_for_unsupported_ns_or_kind (__main__.TestPythonRegistration) ... ok (0.001s) 2023-01-11T21:22:59.4531995Z test_error_if_fn_not_callable (__main__.TestPythonRegistration) ... ok (0.001s) 2023-01-11T21:22:59.4532483Z test_extend_library_with_dispatch_key_arg (__main__.TestPythonRegistration) ... ok (0.002s) 2023-01-11T21:22:59.4533503Z test_override_aten_ops_with_multiple_libraries (__main__.TestPythonRegistration) ... /opt/conda/lib/python3.10/site-packages/torch/library.py:126: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key 2023-01-11T21:22:59.4534459Z operator: aten::mul.Tensor(Tensor self, Tensor other) -> Tensor 2023-01-11T21:22:59.4535067Z registered at /var/lib/jenkins/workspace/build/aten/src/ATen/RegisterSchema.cpp:6 2023-01-11T21:22:59.4535446Z dispatch key: ZeroTensor 2023-01-11T21:22:59.4535909Z previous kernel: registered at /var/lib/jenkins/workspace/aten/src/ATen/LegacyBatchingRegistrations.cpp:1070 2023-01-11T21:22:59.4536543Z new kernel: registered at /dev/null:549 (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/core/dispatch/OperatorEntry.cpp:159.) 2023-01-11T21:22:59.4537150Z self.m.impl(name, dispatch_key if dispatch_key != "" else "CompositeImplicitAutograd", fn) 2023-01-11T21:22:59.4537536Z ok (0.003s) 2023-01-11T21:22:59.4537901Z test_override_cpu_sum (__main__.TestPythonRegistration) ... ok (0.001s) 2023-01-11T21:22:59.4538806Z test_override_cuda_with_jiterator (__main__.TestPythonRegistration) ... 
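
The torch_dispatch_mode tests above exercise the __torch_dispatch__ protocol. A minimal sketch of a dispatch mode that logs every aten op it intercepts (LoggingMode is an illustrative name, not the class the tests use; inside the handler the mode is popped, so calling func directly does not recurse):

    import torch
    from torch.utils._python_dispatch import TorchDispatchMode

    class LoggingMode(TorchDispatchMode):
        def __torch_dispatch__(self, func, types, args=(), kwargs=None):
            print("dispatch:", func)             # one line per aten op
            return func(*args, **(kwargs or {}))

    with LoggingMode():
        torch.ones(2) + torch.ones(2)
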
/opt/conda/lib/python3.10/site-packages/torch/library.py:126: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key 2023-01-11T21:22:59.4539599Z operator: aten::where.self(Tensor condition, Tensor self, Tensor other) -> Tensor 2023-01-11T21:22:59.4540068Z registered at /var/lib/jenkins/workspace/build/aten/src/ATen/RegisterSchema.cpp:6 2023-01-11T21:22:59.4540441Z dispatch key: CUDA 2023-01-11T21:22:59.4540878Z previous kernel: registered at /var/lib/jenkins/workspace/aten/src/ATen/functorch/BatchRulesBinaryOps.cpp:322 2023-01-11T21:22:59.4541481Z new kernel: registered at /dev/null:209 (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/core/dispatch/OperatorEntry.cpp:159.) 2023-01-11T21:22:59.4542286Z self.m.impl(name, dispatch_key if dispatch_key != "" else "CompositeImplicitAutograd", fn) 2023-01-11T21:22:59.4543062Z /opt/conda/lib/python3.10/site-packages/torch/library.py:126: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key 2023-01-11T21:22:59.4543713Z operator: aten::gelu(Tensor self, *, str approximate="none") -> Tensor 2023-01-11T21:22:59.4544160Z registered at /var/lib/jenkins/workspace/build/aten/src/ATen/RegisterSchema.cpp:6 2023-01-11T21:22:59.4544538Z dispatch key: CUDA 2023-01-11T21:22:59.4544967Z previous kernel: registered at /var/lib/jenkins/workspace/aten/src/ATen/functorch/BatchRulesUnaryOps.cpp:82 2023-01-11T21:22:59.4545594Z new kernel: registered at /dev/null:210 (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/core/dispatch/OperatorEntry.cpp:159.) 2023-01-11T21:22:59.4546165Z self.m.impl(name, dispatch_key if dispatch_key != "" else "CompositeImplicitAutograd", fn) 2023-01-11T21:22:59.4546964Z /opt/conda/lib/python3.10/site-packages/torch/library.py:126: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key 2023-01-11T21:22:59.4547557Z operator: aten::exp(Tensor self) -> Tensor 2023-01-11T21:22:59.4547967Z registered at /var/lib/jenkins/workspace/build/aten/src/ATen/RegisterSchema.cpp:6 2023-01-11T21:22:59.4548338Z dispatch key: CUDA 2023-01-11T21:22:59.4548794Z previous kernel: registered at /var/lib/jenkins/workspace/aten/src/ATen/LegacyBatchingRegistrations.cpp:1070 2023-01-11T21:22:59.4549440Z new kernel: registered at /dev/null:211 (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/core/dispatch/OperatorEntry.cpp:159.) 2023-01-11T21:22:59.4549995Z self.m.impl(name, dispatch_key if dispatch_key != "" else "CompositeImplicitAutograd", fn) 2023-01-11T21:22:59.4550767Z /opt/conda/lib/python3.10/site-packages/torch/library.py:126: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key 2023-01-11T21:22:59.4551445Z operator: aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor 2023-01-11T21:22:59.4551921Z registered at /var/lib/jenkins/workspace/build/aten/src/ATen/RegisterSchema.cpp:6 2023-01-11T21:22:59.4552296Z dispatch key: CUDA 2023-01-11T21:22:59.4552858Z previous kernel: registered at /var/lib/jenkins/workspace/aten/src/ATen/LegacyBatchingRegistrations.cpp:1070 2023-01-11T21:22:59.4553503Z new kernel: registered at /dev/null:212 (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/core/dispatch/OperatorEntry.cpp:159.) 
2023-01-11T21:22:59.4554058Z self.m.impl(name, dispatch_key if dispatch_key != "" else "CompositeImplicitAutograd", fn) 2023-01-11T21:22:59.4554408Z ok (1.703s) 2023-01-11T21:22:59.4554565Z 2023-01-11T21:22:59.4554847Z ---------------------------------------------------------------------- 2023-01-11T21:22:59.4555189Z Ran 72 tests in 2.894s 2023-01-11T21:22:59.4555352Z 2023-01-11T21:22:59.4555432Z OK 2023-01-11T21:22:59.4555573Z 2023-01-11T21:22:59.4555716Z Generating XML reports... 2023-01-11T21:22:59.4556448Z Generated XML report: test-reports/python-unittest/test_python_dispatch/TEST-TestPythonDispatch-20230111212256.xml 2023-01-11T21:22:59.4557424Z Generated XML report: test-reports/python-unittest/test_python_dispatch/TEST-TestPythonDispatcher-20230111212256.xml 2023-01-11T21:22:59.4558336Z Generated XML report: test-reports/python-unittest/test_python_dispatch/TEST-TestPythonRegistration-20230111212256.xml 2023-01-11T21:22:59.4558691Z 2023-01-11T21:22:59.4559117Z ##[endgroup] 2023-01-11T21:22:59.4559677Z FINISHED PRINTING LOG FILE of test_python_dispatch (/var/lib/jenkins/workspace/test/test-reports/test_python_dispatch_fh1knau0) 2023-01-11T21:22:59.4560008Z 2023-01-11T21:23:01.3731793Z Ignoring disabled issues: [] 2023-01-11T21:23:01.3966722Z Running test_scatter_gather_ops ... [2023-01-11 21:23:01.395949] 2023-01-11T21:23:01.3967364Z Executing ['/opt/conda/bin/python', '-bb', 'test_scatter_gather_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:23:01.396273] 2023-01-11T21:23:08.6247006Z 2023-01-11T21:23:08.6247471Z Expand the folded group to see the log file of test_scatter_gather_ops 2023-01-11T21:23:08.6248737Z ##[group]PRINTING LOG FILE of test_scatter_gather_ops (/var/lib/jenkins/workspace/test/test-reports/test_scatter_gather_ops_339th585) 2023-01-11T21:23:08.6249138Z 2023-01-11T21:23:08.6249235Z Running tests... 2023-01-11T21:23:08.6249676Z ---------------------------------------------------------------------- 2023-01-11T21:23:08.6250088Z Test results will be stored in test-reports/python-unittest/test_scatter_gather_ops 2023-01-11T21:23:08.6250500Z test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.004s) 2023-01-11T21:23:08.6250928Z test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.001s) 2023-01-11T21:23:08.6251353Z test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.001s) 2023-01-11T21:23:08.6251779Z test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.001s) 2023-01-11T21:23:08.6252168Z test_gather_bool_cuda_bool (__main__.TestScatterGatherCUDA) ... ok (0.001s) 2023-01-11T21:23:08.6252502Z test_gather_cuda_complex64 (__main__.TestScatterGatherCUDA) ... ok (0.019s) 2023-01-11T21:23:08.6252826Z test_gather_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.019s) 2023-01-11T21:23:08.6253161Z test_scatter__cuda_complex64 (__main__.TestScatterGatherCUDA) ... ok (0.037s) 2023-01-11T21:23:08.6253498Z test_scatter__cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.036s) 2023-01-11T21:23:08.6253908Z test_scatter__cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.036s) 2023-01-11T21:23:08.6254256Z test_scatter__reductions_cuda_float16 (__main__.TestScatterGatherCUDA) ... 
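
The repeated "Overriding a previously registered kernel" warnings above come from torch.library re-registering kernels on dispatch keys that already have one. A minimal sketch of the pattern, assuming a toy kernel (fake_exp) and the CPU key so it runs anywhere, unlike the CUDA jiterator kernels the test installs:

    import torch

    lib = torch.library.Library("aten", "IMPL")

    def fake_exp(x):
        return torch.ones_like(x)  # illustration only, not a real exp

    # Re-registering aten::exp on a key that already has a kernel
    # triggers the same UserWarning seen in the log.
    lib.impl("exp", fake_exp, "CPU")
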
2023-01-11T21:23:01.3731793Z Ignoring disabled issues: []
2023-01-11T21:23:01.3966722Z Running test_scatter_gather_ops ... [2023-01-11 21:23:01.395949]
2023-01-11T21:23:01.3967364Z Executing ['/opt/conda/bin/python', '-bb', 'test_scatter_gather_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:23:01.396273]
2023-01-11T21:23:08.6247006Z 
2023-01-11T21:23:08.6247471Z Expand the folded group to see the log file of test_scatter_gather_ops
2023-01-11T21:23:08.6248737Z ##[group]PRINTING LOG FILE of test_scatter_gather_ops (/var/lib/jenkins/workspace/test/test-reports/test_scatter_gather_ops_339th585)
2023-01-11T21:23:08.6249138Z 
2023-01-11T21:23:08.6249235Z Running tests...
2023-01-11T21:23:08.6249676Z ----------------------------------------------------------------------
2023-01-11T21:23:08.6250088Z Test results will be stored in test-reports/python-unittest/test_scatter_gather_ops
2023-01-11T21:23:08.6250500Z test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.004s)
2023-01-11T21:23:08.6250928Z test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.001s)
2023-01-11T21:23:08.6251353Z test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.001s)
2023-01-11T21:23:08.6251779Z test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.001s)
2023-01-11T21:23:08.6252168Z test_gather_bool_cuda_bool (__main__.TestScatterGatherCUDA) ... ok (0.001s)
2023-01-11T21:23:08.6252502Z test_gather_cuda_complex64 (__main__.TestScatterGatherCUDA) ... ok (0.019s)
2023-01-11T21:23:08.6252826Z test_gather_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.019s)
2023-01-11T21:23:08.6253161Z test_scatter__cuda_complex64 (__main__.TestScatterGatherCUDA) ... ok (0.037s)
2023-01-11T21:23:08.6253498Z test_scatter__cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.036s)
2023-01-11T21:23:08.6253908Z test_scatter__cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.036s)
2023-01-11T21:23:08.6254256Z test_scatter__reductions_cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.450s)
2023-01-11T21:23:08.6254868Z test_scatter__reductions_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.450s)
2023-01-11T21:23:08.6255225Z test_scatter__scalar_cuda_complex64 (__main__.TestScatterGatherCUDA) ... ok (0.035s)
2023-01-11T21:23:08.6255692Z test_scatter__scalar_cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.035s)
2023-01-11T21:23:08.6256040Z test_scatter__scalar_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.036s)
2023-01-11T21:23:08.6256387Z test_scatter_add__cuda_complex64 (__main__.TestScatterGatherCUDA) ... ok (0.037s)
2023-01-11T21:23:08.6256731Z test_scatter_add__cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.037s)
2023-01-11T21:23:08.6257058Z test_scatter_add__cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.036s)
2023-01-11T21:23:08.6257427Z test_scatter_add_mult_index_base_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.002s)
2023-01-11T21:23:08.6258030Z test_scatter_reduce_amax_cuda_bfloat16 (__main__.TestScatterGatherCUDA) ... /var/lib/jenkins/workspace/test/test_scatter_gather_ops.py:107: UserWarning: scatter_reduce() is in beta and the API may change at any time. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/TensorAdvancedIndexing.cpp:1739.)
2023-01-11T21:23:08.6258612Z actual = fn(base.clone(), dim, idx, src, reduce=reduction, include_self=include_self)
2023-01-11T21:23:08.6258862Z ok (0.090s)
2023-01-11T21:23:08.6259130Z test_scatter_reduce_amax_cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.072s)
2023-01-11T21:23:08.6259481Z test_scatter_reduce_amax_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.071s)
2023-01-11T21:23:08.6259834Z test_scatter_reduce_amax_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.072s)
2023-01-11T21:23:08.6260170Z test_scatter_reduce_amax_cuda_int16 (__main__.TestScatterGatherCUDA) ... ok (0.069s)
2023-01-11T21:23:08.6260513Z test_scatter_reduce_amax_cuda_int32 (__main__.TestScatterGatherCUDA) ... ok (0.071s)
2023-01-11T21:23:08.6260917Z test_scatter_reduce_amax_cuda_int64 (__main__.TestScatterGatherCUDA) ... ok (0.071s)
2023-01-11T21:23:08.6261255Z test_scatter_reduce_amax_cuda_int8 (__main__.TestScatterGatherCUDA) ... ok (0.071s)
2023-01-11T21:23:08.6261600Z test_scatter_reduce_amax_cuda_uint8 (__main__.TestScatterGatherCUDA) ... ok (0.069s)
2023-01-11T21:23:08.6261945Z test_scatter_reduce_amin_cuda_bfloat16 (__main__.TestScatterGatherCUDA) ... ok (0.072s)
2023-01-11T21:23:08.6262301Z test_scatter_reduce_amin_cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.072s)
2023-01-11T21:23:08.6262638Z test_scatter_reduce_amin_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.071s)
2023-01-11T21:23:08.6262975Z test_scatter_reduce_amin_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.072s)
2023-01-11T21:23:08.6263323Z test_scatter_reduce_amin_cuda_int16 (__main__.TestScatterGatherCUDA) ... ok (0.070s)
2023-01-11T21:23:08.6263652Z test_scatter_reduce_amin_cuda_int32 (__main__.TestScatterGatherCUDA) ... ok (0.071s)
2023-01-11T21:23:08.6263994Z test_scatter_reduce_amin_cuda_int64 (__main__.TestScatterGatherCUDA) ... ok (0.071s)
2023-01-11T21:23:08.6264337Z test_scatter_reduce_amin_cuda_int8 (__main__.TestScatterGatherCUDA) ... ok (0.070s)
2023-01-11T21:23:08.6264679Z test_scatter_reduce_amin_cuda_uint8 (__main__.TestScatterGatherCUDA) ... ok (0.071s)
2023-01-11T21:23:08.6265017Z test_scatter_reduce_mean_cuda_bfloat16 (__main__.TestScatterGatherCUDA) ... ok (0.061s)
2023-01-11T21:23:08.6265364Z test_scatter_reduce_mean_cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6265715Z test_scatter_reduce_mean_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.063s)
2023-01-11T21:23:08.6266048Z test_scatter_reduce_mean_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.063s)
2023-01-11T21:23:08.6266391Z test_scatter_reduce_mean_cuda_int16 (__main__.TestScatterGatherCUDA) ... ok (0.061s)
2023-01-11T21:23:08.6266732Z test_scatter_reduce_mean_cuda_int32 (__main__.TestScatterGatherCUDA) ... ok (0.063s)
2023-01-11T21:23:08.6267079Z test_scatter_reduce_mean_cuda_int64 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6267410Z test_scatter_reduce_mean_cuda_int8 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6267778Z test_scatter_reduce_mean_cuda_uint8 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6268125Z test_scatter_reduce_prod_cuda_bfloat16 (__main__.TestScatterGatherCUDA) ... ok (0.063s)
2023-01-11T21:23:08.6268475Z test_scatter_reduce_prod_cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6268813Z test_scatter_reduce_prod_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.063s)
2023-01-11T21:23:08.6269157Z test_scatter_reduce_prod_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.063s)
2023-01-11T21:23:08.6269503Z test_scatter_reduce_prod_cuda_int16 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6269836Z test_scatter_reduce_prod_cuda_int32 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6270173Z test_scatter_reduce_prod_cuda_int64 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6270517Z test_scatter_reduce_prod_cuda_int8 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6270860Z test_scatter_reduce_prod_cuda_uint8 (__main__.TestScatterGatherCUDA) ... ok (0.063s)
2023-01-11T21:23:08.6271191Z test_scatter_reduce_sum_cuda_bfloat16 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6271539Z test_scatter_reduce_sum_cuda_complex128 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6271894Z test_scatter_reduce_sum_cuda_complex64 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6272236Z test_scatter_reduce_sum_cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.061s)
2023-01-11T21:23:08.6272582Z test_scatter_reduce_sum_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.060s)
2023-01-11T21:23:08.6272955Z test_scatter_reduce_sum_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6273296Z test_scatter_reduce_sum_cuda_int16 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6273633Z test_scatter_reduce_sum_cuda_int32 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6273971Z test_scatter_reduce_sum_cuda_int64 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6274311Z test_scatter_reduce_sum_cuda_int8 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6274690Z test_scatter_reduce_sum_cuda_uint8 (__main__.TestScatterGatherCUDA) ... ok (0.063s)
2023-01-11T21:23:08.6274880Z 
2023-01-11T21:23:08.6275095Z ----------------------------------------------------------------------
2023-01-11T21:23:08.6275358Z Ran 66 tests in 4.376s
2023-01-11T21:23:08.6275482Z 
2023-01-11T21:23:08.6275549Z OK
2023-01-11T21:23:08.6275643Z 
2023-01-11T21:23:08.6275740Z Generating XML reports...
2023-01-11T21:23:08.6276208Z Generated XML report: test-reports/python-unittest/test_scatter_gather_ops/TEST-TestScatterGatherCUDA-20230111212303.xml
2023-01-11T21:23:08.6276472Z 
2023-01-11T21:23:08.6294189Z ##[endgroup]
2023-01-11T21:23:08.6294798Z FINISHED PRINTING LOG FILE of test_scatter_gather_ops (/var/lib/jenkins/workspace/test/test-reports/test_scatter_gather_ops_339th585)
2023-01-11T21:23:08.6295062Z 
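The UserWarning in the group above flags `scatter_reduce()` as a beta API. A minimal sketch of the call that the TestScatterGatherCUDA reduction tests drive, with hypothetical toy values (editor's illustration, not part of the log):

```python
import torch

base = torch.zeros(3)
idx = torch.tensor([0, 0, 1, 2])
src = torch.tensor([1.0, 2.0, 3.0, 4.0])

# Reduce src into base at the slots named by idx along dim 0.
# include_self=False leaves base's initial values out of the reduction,
# matching the include_self argument in the warning's source line.
out = base.scatter_reduce(0, idx, src, reduce="amax", include_self=False)
print(out)  # tensor([2., 3., 4.])
```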
2023-01-11T21:23:10.5351943Z Ignoring disabled issues: []
2023-01-11T21:23:10.5584696Z Running test_testing ... [2023-01-11 21:23:10.558055]
2023-01-11T21:23:10.5587071Z Executing ['/opt/conda/bin/python', '-bb', 'test_testing.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:23:10.558396]
2023-01-11T21:23:47.4350023Z 
2023-01-11T21:23:47.4350541Z Expand the folded group to see the log file of test_testing
2023-01-11T21:23:47.4351592Z ##[group]PRINTING LOG FILE of test_testing (/var/lib/jenkins/workspace/test/test-reports/test_testing_qs5jmj7o)
2023-01-11T21:23:47.4352018Z 
2023-01-11T21:23:47.4352150Z Running tests...
2023-01-11T21:23:47.4352771Z ----------------------------------------------------------------------
2023-01-11T21:23:47.4357891Z Test results will be stored in test-reports/python-unittest/test_testing
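The TestAssertClose* cases that follow exercise `torch.testing.assert_close`. A minimal sketch of the behaviors they probe — tolerances, NaN handling, and dtype checks — with hypothetical values (editor's illustration, not part of the log):

```python
import torch

actual = torch.tensor([1.0, float("nan")])
expected = torch.tensor([1.0 + 1e-7, float("nan")])

# Passes: values agree within the default (rtol, atol) for float32, and
# NaNs compare equal when equal_nan=True is requested.
torch.testing.assert_close(actual, expected, equal_nan=True)

# Raises AssertionError (as in the mismatch-message tests below) because
# the dtypes differ and check_dtype defaults to True.
try:
    torch.testing.assert_close(torch.ones(2), torch.ones(2, dtype=torch.float64))
except AssertionError as e:
    print(type(e).__name__)  # AssertionError
```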
2023-01-11T21:23:47.4358590Z test_bool (__main__.TestAssertClose) ... ok (0.002s)
2023-01-11T21:23:47.4359070Z test_default_tolerance_selection_mismatching_dtypes (__main__.TestAssertClose) ... ok (0.002s)
2023-01-11T21:23:47.4359531Z test_docstring_examples (__main__.TestAssertClose) ... ok (0.007s)
2023-01-11T21:23:47.4359887Z test_matching (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4360965Z test_matching_atol (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4361471Z test_matching_conjugate_bit (__main__.TestAssertClose) ... ok (0.002s)
2023-01-11T21:23:47.4362296Z test_matching_nan (__main__.TestAssertClose) ... ok (0.003s)
2023-01-11T21:23:47.4362949Z test_matching_nan_with_equal_nan (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4363410Z test_matching_rtol (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4363833Z test_meta (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4364294Z test_mismatching_dtype (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4364769Z test_mismatching_dtype_no_check (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4365840Z test_mismatching_layout (__main__.TestAssertClose) ... /var/lib/jenkins/workspace/test/test_testing.py:619: UserWarning: Sparse CSR tensor support is in beta state. If you miss a functionality in the sparse tensor support, please submit a feature request to https://github.com/pytorch/pytorch/issues. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/SparseCsrTensorImpl.cpp:56.)
2023-01-11T21:23:47.4366733Z sparse_csr = strided.to_sparse_csr()
2023-01-11T21:23:47.4367014Z ok (0.002s)
2023-01-11T21:23:47.4367292Z test_mismatching_layout_no_check (__main__.TestAssertClose) ... ok (0.004s)
2023-01-11T21:23:47.4367782Z test_mismatching_shape (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4368091Z test_mismatching_stride (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4376898Z test_mismatching_stride_no_check (__main__.TestAssertClose) ... ok (0.002s)
2023-01-11T21:23:47.4377351Z test_mismatching_types (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4377782Z test_mismatching_types_subclasses (__main__.TestAssertClose) ... ok (0.002s)
2023-01-11T21:23:47.4378239Z test_mismatching_types_type_equality (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4378639Z test_mismatching_values (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4379044Z test_mismatching_values_atol (__main__.TestAssertClose) ... ok (0.002s)
2023-01-11T21:23:47.4379451Z test_mismatching_values_rtol (__main__.TestAssertClose) ... ok (0.002s)
2023-01-11T21:23:47.4379807Z test_none (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4380154Z test_none_mismatch (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4380514Z test_numpy (__main__.TestAssertClose) ... ok (0.002s)
2023-01-11T21:23:47.4380867Z test_only_atol (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4381204Z test_only_rtol (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4381550Z test_scalar (__main__.TestAssertClose) ... ok (0.003s)
2023-01-11T21:23:47.4381920Z test_unexpected_error_compare (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4382329Z test_unexpected_error_originate (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4382703Z test_unknown_layout (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4383069Z test_unknown_type (__main__.TestAssertClose) ... ok (0.008s)
2023-01-11T21:23:47.4383655Z test_mapping_mismatching_keys (__main__.TestAssertCloseContainer) ... ok (0.000s)
2023-01-11T21:23:47.4384133Z test_mapping_mismatching_values_msg (__main__.TestAssertCloseContainer) ... ok (0.001s)
2023-01-11T21:23:47.4384550Z test_sequence_mismatching_len (__main__.TestAssertCloseContainer) ... ok (0.000s)
2023-01-11T21:23:47.4384903Z test_sequence_mismatching_values_msg (__main__.TestAssertCloseContainer) ... ok (0.001s)
2023-01-11T21:23:47.4385449Z test_abs_diff (__main__.TestAssertCloseErrorMessage) ... ok (0.003s)
2023-01-11T21:23:47.4385777Z test_abs_diff_scalar (__main__.TestAssertCloseErrorMessage) ... ok (0.001s)
2023-01-11T21:23:47.4386104Z test_atol (__main__.TestAssertCloseErrorMessage) ... ok (0.003s)
2023-01-11T21:23:47.4386435Z test_identifier_scalars (__main__.TestAssertCloseErrorMessage) ... ok (0.001s)
2023-01-11T21:23:47.4386777Z test_identifier_tensor_likes (__main__.TestAssertCloseErrorMessage) ... ok (0.002s)
2023-01-11T21:23:47.4387129Z test_mismatched_elements (__main__.TestAssertCloseErrorMessage) ... ok (0.002s)
2023-01-11T21:23:47.4387464Z test_msg_callable (__main__.TestAssertCloseErrorMessage) ... ok (0.001s)
2023-01-11T21:23:47.4387789Z test_msg_str (__main__.TestAssertCloseErrorMessage) ... ok (0.001s)
2023-01-11T21:23:47.4388101Z test_not_close (__main__.TestAssertCloseErrorMessage) ... ok (0.007s)
2023-01-11T21:23:47.4388419Z test_not_equal (__main__.TestAssertCloseErrorMessage) ... ok (0.002s)
2023-01-11T21:23:47.4388742Z test_rel_diff (__main__.TestAssertCloseErrorMessage) ... ok (0.003s)
2023-01-11T21:23:47.4389067Z test_rel_diff_scalar (__main__.TestAssertCloseErrorMessage) ... ok (0.001s)
2023-01-11T21:23:47.4389378Z test_rtol (__main__.TestAssertCloseErrorMessage) ... ok (0.003s)
2023-01-11T21:23:47.4389758Z test_zero_div_zero (__main__.TestAssertCloseErrorMessage) ... ok (0.003s)
2023-01-11T21:23:47.4390214Z test_mismatching_device_cuda (__main__.TestAssertCloseMultiDeviceCUDA) ... ok (0.002s)
2023-01-11T21:23:47.4390692Z test_mismatching_device_no_check_cuda (__main__.TestAssertCloseMultiDeviceCUDA) ... ok (0.003s)
2023-01-11T21:23:47.4391154Z test_matching_per_channel (__main__.TestAssertCloseQuantized) ... 
ok (0.002s) 2023-01-11T21:23:47.4391673Z test_matching_per_tensor (__main__.TestAssertCloseQuantized) ... ok (0.002s) 2023-01-11T21:23:47.4392122Z test_mismatching_is_quantized (__main__.TestAssertCloseQuantized) ... ok (0.001s) 2023-01-11T21:23:47.4392548Z test_mismatching_qscheme (__main__.TestAssertCloseQuantized) ... ok (0.001s) 2023-01-11T21:23:47.4392944Z test_matching (__main__.TestAssertCloseSparseBSC) ... ok (0.002s) 2023-01-11T21:23:47.4393281Z test_mismatching_ccol_indices_msg (__main__.TestAssertCloseSparseBSC) ... ok (0.003s) 2023-01-11T21:23:47.4393627Z test_mismatching_row_indices_msg (__main__.TestAssertCloseSparseBSC) ... ok (0.003s) 2023-01-11T21:23:47.4393980Z test_mismatching_values_msg (__main__.TestAssertCloseSparseBSC) ... ok (0.003s) 2023-01-11T21:23:47.4394312Z test_matching (__main__.TestAssertCloseSparseBSR) ... ok (0.002s) 2023-01-11T21:23:47.4394643Z test_mismatching_col_indices_msg (__main__.TestAssertCloseSparseBSR) ... ok (0.003s) 2023-01-11T21:23:47.4394990Z test_mismatching_crow_indices_msg (__main__.TestAssertCloseSparseBSR) ... ok (0.003s) 2023-01-11T21:23:47.4395331Z test_mismatching_values_msg (__main__.TestAssertCloseSparseBSR) ... ok (0.003s) 2023-01-11T21:23:47.4395668Z test_matching_coalesced (__main__.TestAssertCloseSparseCOO) ... ok (0.001s) 2023-01-11T21:23:47.4396001Z test_matching_uncoalesced (__main__.TestAssertCloseSparseCOO) ... ok (0.001s) 2023-01-11T21:23:47.4396349Z test_mismatching_indices_msg (__main__.TestAssertCloseSparseCOO) ... ok (0.003s) 2023-01-11T21:23:47.4396682Z test_mismatching_nnz (__main__.TestAssertCloseSparseCOO) ... ok (0.001s) 2023-01-11T21:23:47.4397015Z test_mismatching_sparse_dims (__main__.TestAssertCloseSparseCOO) ... ok (0.001s) 2023-01-11T21:23:47.4397347Z test_mismatching_values_msg (__main__.TestAssertCloseSparseCOO) ... ok (0.003s) 2023-01-11T21:23:47.4397674Z test_matching (__main__.TestAssertCloseSparseCSC) ... ok (0.001s) 2023-01-11T21:23:47.4398014Z test_mismatching_ccol_indices_msg (__main__.TestAssertCloseSparseCSC) ... ok (0.003s) 2023-01-11T21:23:47.4398355Z test_mismatching_row_indices_msg (__main__.TestAssertCloseSparseCSC) ... ok (0.003s) 2023-01-11T21:23:47.4398697Z test_mismatching_values_msg (__main__.TestAssertCloseSparseCSC) ... ok (0.003s) 2023-01-11T21:23:47.4399130Z test_hybrid_support (__main__.TestAssertCloseSparseCSR) ... expected failure (0.006s) 2023-01-11T21:23:47.4399462Z test_matching (__main__.TestAssertCloseSparseCSR) ... ok (0.002s) 2023-01-11T21:23:47.4399788Z test_mismatching_col_indices_msg (__main__.TestAssertCloseSparseCSR) ... ok (0.003s) 2023-01-11T21:23:47.4400137Z test_mismatching_crow_indices_msg (__main__.TestAssertCloseSparseCSR) ... ok (0.003s) 2023-01-11T21:23:47.4400482Z test_mismatching_values_msg (__main__.TestAssertCloseSparseCSR) ... ok (0.003s) 2023-01-11T21:23:47.4400798Z test_filtering_env_var (__main__.TestFrameworkUtils) ... ok (11.351s) 2023-01-11T21:23:47.4401088Z test_circular_dependencies (__main__.TestImports) 2023-01-11T21:23:47.4401726Z Checks that all modules inside torch can be imported ... 2023-01-11 21:23:39,302 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmpec57be6h 2023-01-11T21:23:47.4402324Z 2023-01-11 21:23:39,302 - torch.distributed.nn.jit.instantiator - INFO - Writing /tmp/tmpec57be6h/_remote_module_non_scriptable.py 2023-01-11T21:23:47.4402632Z ok (15.258s) 2023-01-11T21:23:47.4402910Z test_no_mutate_global_logging_on_import_path_functorch (__main__.TestImports) ... 
ok (1.382s) 2023-01-11T21:23:47.4403256Z test_no_mutate_global_logging_on_import_path_torch (__main__.TestImports) ... ok (1.367s) 2023-01-11T21:23:47.4403676Z test_no_warning_on_import (__main__.TestImports) ... /var/lib/jenkins/workspace/test/test_testing.py:1836: DeprecationWarning: Please use assertEqual instead. 2023-01-11T21:23:47.4404027Z self.assertEquals(out, "") 2023-01-11T21:23:47.4404234Z ok (1.384s) 2023-01-11T21:23:47.4404572Z test_opinfo_error_generators_T_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4404956Z test_opinfo_error_generators___radd___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4405354Z test_opinfo_error_generators___rand___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4405744Z test_opinfo_error_generators___rdiv___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4406124Z test_opinfo_error_generators___rmod___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4406499Z test_opinfo_error_generators___rmul___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4406891Z test_opinfo_error_generators___ror___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4407280Z test_opinfo_error_generators___rpow___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4407663Z test_opinfo_error_generators___rsub___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4408046Z test_opinfo_error_generators___rxor___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4408427Z test_opinfo_error_generators_add_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4408813Z test_opinfo_error_generators_amax_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4409188Z test_opinfo_error_generators_amin_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4409579Z test_opinfo_error_generators_aminmax_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4409976Z test_opinfo_error_generators_arange_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4410385Z test_opinfo_error_generators_as_strided_scatter_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4410780Z test_opinfo_error_generators_atan2_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4411174Z test_opinfo_error_generators_bernoulli_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4411607Z test_opinfo_error_generators_bitwise_and_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4412008Z test_opinfo_error_generators_bitwise_left_shift_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4412411Z test_opinfo_error_generators_bitwise_or_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4412828Z test_opinfo_error_generators_bitwise_right_shift_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4413238Z test_opinfo_error_generators_bitwise_xor_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4413622Z test_opinfo_error_generators_cat_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4414143Z test_opinfo_error_generators_clamp_max_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4414768Z test_opinfo_error_generators_clamp_min_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4415171Z test_opinfo_error_generators_complex_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4415552Z test_opinfo_error_generators_copysign_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4415960Z test_opinfo_error_generators_cov_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4416365Z test_opinfo_error_generators_diag_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4416743Z test_opinfo_error_generators_diag_embed_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4417149Z test_opinfo_error_generators_diagonal_copy_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4417627Z test_opinfo_error_generators_diagonal_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4418032Z test_opinfo_error_generators_div_floor_rounding_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4418440Z test_opinfo_error_generators_div_no_rounding_mode_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4418856Z test_opinfo_error_generators_div_trunc_rounding_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4419256Z test_opinfo_error_generators_dsplit_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4419648Z test_opinfo_error_generators_dstack_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4420024Z test_opinfo_error_generators_eq_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4420410Z test_opinfo_error_generators_eye_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4420799Z test_opinfo_error_generators_fft_fft2_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4421182Z test_opinfo_error_generators_fft_fft_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4421572Z test_opinfo_error_generators_fft_fftn_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4421966Z test_opinfo_error_generators_fft_hfft2_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4422358Z test_opinfo_error_generators_fft_hfft_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4422739Z test_opinfo_error_generators_fft_hfftn_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4423128Z test_opinfo_error_generators_fft_ifft2_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4423525Z test_opinfo_error_generators_fft_ifft_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4423914Z test_opinfo_error_generators_fft_ifftn_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4424339Z test_opinfo_error_generators_fft_ihfft2_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4424734Z test_opinfo_error_generators_fft_ihfft_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4425122Z test_opinfo_error_generators_fft_ihfftn_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4425518Z test_opinfo_error_generators_fft_irfft2_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4425900Z test_opinfo_error_generators_fft_irfft_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4426285Z test_opinfo_error_generators_fft_irfftn_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4426672Z test_opinfo_error_generators_fft_rfft2_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4427057Z test_opinfo_error_generators_fft_rfft_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4427452Z test_opinfo_error_generators_fft_rfftn_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4427841Z test_opinfo_error_generators_fliplr_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4428229Z test_opinfo_error_generators_flipud_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4428614Z test_opinfo_error_generators_float_power_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4429017Z test_opinfo_error_generators_floor_divide_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4429411Z test_opinfo_error_generators_fmax_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4429828Z test_opinfo_error_generators_fmin_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4430204Z test_opinfo_error_generators_fmod_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4430598Z test_opinfo_error_generators_gather_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4430990Z test_opinfo_error_generators_gcd_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4431362Z test_opinfo_error_generators_ge_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4431749Z test_opinfo_error_generators_gradient_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4432135Z test_opinfo_error_generators_gt_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4432523Z test_opinfo_error_generators_heaviside_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4432910Z test_opinfo_error_generators_hsplit_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4433299Z test_opinfo_error_generators_hstack_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4433687Z test_opinfo_error_generators_hypot_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4434074Z test_opinfo_error_generators_igamma_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4434452Z test_opinfo_error_generators_igammac_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4434853Z test_opinfo_error_generators_index_select_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4435248Z test_opinfo_error_generators_isclose_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4435639Z test_opinfo_error_generators_jiterator_binary_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4436064Z test_opinfo_error_generators_jiterator_binary_return_by_ref_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4436507Z test_opinfo_error_generators_kthvalue_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4436903Z test_opinfo_error_generators_lcm_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4437279Z test_opinfo_error_generators_ldexp_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4437662Z test_opinfo_error_generators_le_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4438053Z test_opinfo_error_generators_linalg_cross_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4438457Z test_opinfo_error_generators_linalg_lstsq_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4438866Z test_opinfo_error_generators_linalg_lstsq_grad_oriented_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4439286Z test_opinfo_error_generators_linspace_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4439689Z test_opinfo_error_generators_logcumsumexp_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4440088Z test_opinfo_error_generators_logical_and_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4440481Z test_opinfo_error_generators_logical_or_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4440874Z test_opinfo_error_generators_logical_xor_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4441276Z test_opinfo_error_generators_logspace_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4441654Z test_opinfo_error_generators_lt_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4442091Z test_opinfo_error_generators_masked_fill_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4442491Z test_opinfo_error_generators_masked_select_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4442893Z test_opinfo_error_generators_max_binary_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4443280Z test_opinfo_error_generators_maximum_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4443664Z test_opinfo_error_generators_mean_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4444052Z test_opinfo_error_generators_median_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4444442Z test_opinfo_error_generators_min_binary_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4444824Z test_opinfo_error_generators_minimum_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4445219Z test_opinfo_error_generators_movedim_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4445608Z test_opinfo_error_generators_mul_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4445994Z test_opinfo_error_generators_multinomial_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4446396Z test_opinfo_error_generators_narrow_copy_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4446792Z test_opinfo_error_generators_narrow_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4447197Z test_opinfo_error_generators_native_layer_norm_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4447585Z test_opinfo_error_generators_ne_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4447970Z test_opinfo_error_generators_neg_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4448357Z test_opinfo_error_generators_nextafter_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4448807Z test_opinfo_error_generators_nn_functional_avg_pool1d_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4449230Z test_opinfo_error_generators_nn_functional_avg_pool2d_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4449657Z test_opinfo_error_generators_nn_functional_avg_pool3d_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4450078Z test_opinfo_error_generators_nn_functional_conv1d_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4450489Z test_opinfo_error_generators_nn_functional_conv2d_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4450904Z test_opinfo_error_generators_nn_functional_embedding_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4451340Z test_opinfo_error_generators_nn_functional_gaussian_nll_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4451768Z test_opinfo_error_generators_nn_functional_gelu_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4452178Z test_opinfo_error_generators_nn_functional_group_norm_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4452612Z test_opinfo_error_generators_nn_functional_hinge_embedding_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4453046Z test_opinfo_error_generators_nn_functional_huber_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4453464Z test_opinfo_error_generators_nn_functional_l1_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4454005Z test_opinfo_error_generators_nn_functional_margin_ranking_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4454651Z test_opinfo_error_generators_nn_functional_max_pool1d_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4455075Z test_opinfo_error_generators_nn_functional_max_pool2d_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4455514Z test_opinfo_error_generators_nn_functional_max_pool3d_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4455947Z test_opinfo_error_generators_nn_functional_poisson_nll_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4456374Z test_opinfo_error_generators_nn_functional_prelu_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4456780Z test_opinfo_error_generators_nn_functional_rrelu_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4457204Z test_opinfo_error_generators_nn_functional_soft_margin_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4457620Z test_opinfo_error_generators_nn_functional_softshrink_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4458048Z test_opinfo_error_generators_nn_functional_triplet_margin_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4458499Z test_opinfo_error_generators_nn_functional_triplet_margin_with_distance_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4458918Z test_opinfo_error_generators_ormqr_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4459295Z test_opinfo_error_generators_polar_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4459678Z test_opinfo_error_generators_pow_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4460066Z test_opinfo_error_generators_remainder_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4460457Z test_opinfo_error_generators_renorm_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4460839Z test_opinfo_error_generators_reshape_as_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4461308Z test_opinfo_error_generators_reshape_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4461693Z test_opinfo_error_generators_roll_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4462066Z test_opinfo_error_generators_rot90_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4462445Z test_opinfo_error_generators_rsub_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4462832Z test_opinfo_error_generators_scatter_add_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4463227Z test_opinfo_error_generators_scatter_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4463629Z test_opinfo_error_generators_signal_windows_bartlett_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4464050Z test_opinfo_error_generators_signal_windows_blackman_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4464467Z test_opinfo_error_generators_signal_windows_cosine_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4464887Z test_opinfo_error_generators_signal_windows_exponential_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4465296Z test_opinfo_error_generators_signal_windows_gaussian_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4465716Z test_opinfo_error_generators_signal_windows_general_cosine_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4466145Z test_opinfo_error_generators_signal_windows_general_hamming_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4466672Z test_opinfo_error_generators_signal_windows_hamming_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4467086Z test_opinfo_error_generators_signal_windows_hann_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4467503Z test_opinfo_error_generators_signal_windows_kaiser_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4467924Z test_opinfo_error_generators_signal_windows_nuttall_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4468352Z test_opinfo_error_generators_special_chebyshev_polynomial_t_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4468782Z test_opinfo_error_generators_special_chebyshev_polynomial_u_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4469499Z test_opinfo_error_generators_special_chebyshev_polynomial_v_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4470170Z test_opinfo_error_generators_special_chebyshev_polynomial_w_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4470664Z test_opinfo_error_generators_special_hermite_polynomial_h_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4471094Z test_opinfo_error_generators_special_hermite_polynomial_he_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4471534Z test_opinfo_error_generators_special_laguerre_polynomial_l_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4472144Z test_opinfo_error_generators_special_legendre_polynomial_p_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4472820Z test_opinfo_error_generators_special_shifted_chebyshev_polynomial_t_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4473531Z test_opinfo_error_generators_special_shifted_chebyshev_polynomial_u_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4474206Z test_opinfo_error_generators_special_shifted_chebyshev_polynomial_v_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4474872Z test_opinfo_error_generators_special_shifted_chebyshev_polynomial_w_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4475352Z test_opinfo_error_generators_special_xlog1py_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4475786Z test_opinfo_error_generators_special_zeta_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4476200Z test_opinfo_error_generators_sub_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4476597Z test_opinfo_error_generators_sum_to_size_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4476986Z test_opinfo_error_generators_t_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4477360Z test_opinfo_error_generators_take_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4477750Z test_opinfo_error_generators_trace_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4478135Z test_opinfo_error_generators_tril_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4478509Z test_opinfo_error_generators_triu_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4478992Z test_opinfo_error_generators_true_divide_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4479387Z test_opinfo_error_generators_unbind_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4479782Z test_opinfo_error_generators_uniform_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4480175Z test_opinfo_error_generators_view_as_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4480560Z test_opinfo_error_generators_view_copy_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4480949Z test_opinfo_error_generators_view_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4481329Z test_opinfo_error_generators_vsplit_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4481710Z test_opinfo_error_generators_vstack_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4482106Z test_opinfo_error_generators_where_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4482495Z test_opinfo_error_generators_xlogy_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4482902Z test_opinfo_reference_generators___radd___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4483309Z test_opinfo_reference_generators___rand___cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4483715Z test_opinfo_reference_generators___rdiv___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4484131Z test_opinfo_reference_generators___rmod___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4484537Z test_opinfo_reference_generators___rmul___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4484946Z test_opinfo_reference_generators___ror___cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4485350Z test_opinfo_reference_generators___rpow___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4485812Z test_opinfo_reference_generators___rsub___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4486223Z test_opinfo_reference_generators___rxor___cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4486620Z test_opinfo_reference_generators_abs_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4487030Z test_opinfo_reference_generators_acos_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4487438Z test_opinfo_reference_generators_acosh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4487841Z test_opinfo_reference_generators_add_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4488259Z test_opinfo_reference_generators_addcdiv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.001s) 2023-01-11T21:23:47.4488673Z test_opinfo_reference_generators_addcmul_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4489087Z test_opinfo_reference_generators_angle_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4489486Z test_opinfo_reference_generators_asin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4489897Z test_opinfo_reference_generators_asinh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4490299Z test_opinfo_reference_generators_atan2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4490708Z test_opinfo_reference_generators_atan_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4491144Z test_opinfo_reference_generators_atanh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4491560Z test_opinfo_reference_generators_bfloat16_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4491977Z test_opinfo_reference_generators_bitwise_and_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4492405Z test_opinfo_reference_generators_bitwise_left_shift_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4492820Z test_opinfo_reference_generators_bitwise_not_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4493231Z test_opinfo_reference_generators_bitwise_or_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4493655Z test_opinfo_reference_generators_bitwise_right_shift_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4494215Z test_opinfo_reference_generators_bitwise_xor_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4494883Z test_opinfo_reference_generators_bool_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4495309Z test_opinfo_reference_generators_broadcast_tensors_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4495733Z test_opinfo_reference_generators_bucketize_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4496140Z test_opinfo_reference_generators_byte_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4496534Z test_opinfo_reference_generators_cat_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4496942Z test_opinfo_reference_generators_cdouble_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4497349Z test_opinfo_reference_generators_ceil_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4497746Z test_opinfo_reference_generators_cfloat_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4498238Z test_opinfo_reference_generators_chalf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4498642Z test_opinfo_reference_generators_char_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4499046Z test_opinfo_reference_generators_chunk_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4499440Z test_opinfo_reference_generators_clamp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4499847Z test_opinfo_reference_generators_clamp_max_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4500266Z test_opinfo_reference_generators_clamp_min_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4500669Z test_opinfo_reference_generators_clone_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4501068Z test_opinfo_reference_generators_complex_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4501473Z test_opinfo_reference_generators_conj_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4501883Z test_opinfo_reference_generators_conj_physical_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4502310Z test_opinfo_reference_generators_contiguous_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4502720Z test_opinfo_reference_generators_copysign_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4503127Z test_opinfo_reference_generators_cos_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4503583Z test_opinfo_reference_generators_cosh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4504000Z test_opinfo_reference_generators_deg2rad_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4504413Z test_opinfo_reference_generators_diag_embed_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4504838Z test_opinfo_reference_generators_diagonal_copy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4505259Z test_opinfo_reference_generators_diagonal_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4505666Z test_opinfo_reference_generators_digamma_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4506141Z test_opinfo_reference_generators_div_floor_rounding_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4506585Z test_opinfo_reference_generators_div_no_rounding_mode_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4507022Z test_opinfo_reference_generators_div_trunc_rounding_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4507437Z test_opinfo_reference_generators_double_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4507855Z test_opinfo_reference_generators_empty_like_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4508268Z test_opinfo_reference_generators_eq_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4508674Z test_opinfo_reference_generators_erf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4509075Z test_opinfo_reference_generators_erfc_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4509492Z test_opinfo_reference_generators_erfinv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4509900Z test_opinfo_reference_generators_exp2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4510339Z test_opinfo_reference_generators_exp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4510739Z test_opinfo_reference_generators_expm1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4511152Z test_opinfo_reference_generators_fill_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4511563Z test_opinfo_reference_generators_flatten_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4511977Z test_opinfo_reference_generators_float_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4512387Z test_opinfo_reference_generators_float_power_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4512800Z test_opinfo_reference_generators_floor_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4513216Z test_opinfo_reference_generators_floor_divide_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4513632Z test_opinfo_reference_generators_fmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4514026Z test_opinfo_reference_generators_fmin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4514430Z test_opinfo_reference_generators_fmod_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4514837Z test_opinfo_reference_generators_frac_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4515270Z test_opinfo_reference_generators_frexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4515675Z test_opinfo_reference_generators_gcd_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4516088Z test_opinfo_reference_generators_ge_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4516489Z test_opinfo_reference_generators_gt_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4516883Z test_opinfo_reference_generators_half_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4517297Z test_opinfo_reference_generators_heaviside_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4517716Z test_opinfo_reference_generators_hypot_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4518124Z test_opinfo_reference_generators_i0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4518521Z test_opinfo_reference_generators_igamma_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4518935Z test_opinfo_reference_generators_igammac_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4519352Z test_opinfo_reference_generators_imag_cuda_complex64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4519772Z test_opinfo_reference_generators_index_add_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s)
2023-01-11T21:23:47.4520183Z test_opinfo_reference_generators_index_copy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4520603Z test_opinfo_reference_generators_index_fill_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4521024Z test_opinfo_reference_generators_index_select_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4521436Z test_opinfo_reference_generators_int_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4521846Z test_opinfo_reference_generators_isclose_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4522290Z test_opinfo_reference_generators_isfinite_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4522705Z test_opinfo_reference_generators_isinf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4523099Z test_opinfo_reference_generators_isnan_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4523510Z test_opinfo_reference_generators_isneginf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4523919Z test_opinfo_reference_generators_isposinf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4524338Z test_opinfo_reference_generators_isreal_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4524750Z test_opinfo_reference_generators_jiterator_binary_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4525198Z test_opinfo_reference_generators_jiterator_binary_return_by_ref_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4525639Z test_opinfo_reference_generators_jiterator_unary_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4526057Z test_opinfo_reference_generators_lcm_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4526457Z test_opinfo_reference_generators_ldexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4526864Z test_opinfo_reference_generators_le_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4527306Z test_opinfo_reference_generators_lgamma_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4527713Z test_opinfo_reference_generators_log10_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4528111Z test_opinfo_reference_generators_log1p_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4528521Z test_opinfo_reference_generators_log2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4528929Z test_opinfo_reference_generators_log_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4529345Z test_opinfo_reference_generators_logical_and_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4529761Z test_opinfo_reference_generators_logical_not_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4530183Z test_opinfo_reference_generators_logical_or_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4530602Z test_opinfo_reference_generators_logical_xor_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4531011Z test_opinfo_reference_generators_logit_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4531418Z test_opinfo_reference_generators_long_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4531822Z test_opinfo_reference_generators_lt_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4532233Z test_opinfo_reference_generators_max_binary_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4532640Z test_opinfo_reference_generators_maximum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4533059Z test_opinfo_reference_generators_min_binary_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4533476Z test_opinfo_reference_generators_minimum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4534068Z test_opinfo_reference_generators_movedim_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4534619Z test_opinfo_reference_generators_mul_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4535045Z test_opinfo_reference_generators_mvlgamma_mvlgamma_p_1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4535493Z test_opinfo_reference_generators_mvlgamma_mvlgamma_p_3_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4535971Z test_opinfo_reference_generators_mvlgamma_mvlgamma_p_5_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4536394Z test_opinfo_reference_generators_nan_to_num_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4536818Z test_opinfo_reference_generators_narrow_copy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4537237Z test_opinfo_reference_generators_narrow_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4537645Z test_opinfo_reference_generators_ne_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4538044Z test_opinfo_reference_generators_neg_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4538457Z test_opinfo_reference_generators_nextafter_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4538883Z test_opinfo_reference_generators_nn_functional_celu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4539318Z test_opinfo_reference_generators_nn_functional_elu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4539851Z test_opinfo_reference_generators_nn_functional_group_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4540301Z test_opinfo_reference_generators_nn_functional_hardshrink_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4540754Z test_opinfo_reference_generators_nn_functional_hardsigmoid_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4541209Z test_opinfo_reference_generators_nn_functional_hardtanh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4541658Z test_opinfo_reference_generators_nn_functional_hinge_embedding_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4542116Z test_opinfo_reference_generators_nn_functional_logsigmoid_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4542577Z test_opinfo_reference_generators_nn_functional_margin_ranking_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4543024Z test_opinfo_reference_generators_nn_functional_mish_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4543459Z test_opinfo_reference_generators_nn_functional_prelu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4543895Z test_opinfo_reference_generators_nn_functional_relu6_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4544323Z test_opinfo_reference_generators_nn_functional_relu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4544756Z test_opinfo_reference_generators_nn_functional_rrelu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4545176Z test_opinfo_reference_generators_nn_functional_selu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4545614Z test_opinfo_reference_generators_nn_functional_silu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4546138Z test_opinfo_reference_generators_nn_functional_softplus_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4546573Z test_opinfo_reference_generators_nn_functional_softshrink_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4547013Z test_opinfo_reference_generators_nn_functional_softsign_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4547452Z test_opinfo_reference_generators_nn_functional_tanhshrink_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4547894Z test_opinfo_reference_generators_nn_functional_threshold_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4548318Z test_opinfo_reference_generators_permute_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4548729Z test_opinfo_reference_generators_polar_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4549158Z test_opinfo_reference_generators_polygamma_polygamma_n_0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4549601Z test_opinfo_reference_generators_polygamma_polygamma_n_1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4550028Z test_opinfo_reference_generators_polygamma_polygamma_n_2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4550465Z test_opinfo_reference_generators_polygamma_polygamma_n_3_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4550891Z test_opinfo_reference_generators_polygamma_polygamma_n_4_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4551355Z test_opinfo_reference_generators_positive_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4551761Z test_opinfo_reference_generators_pow_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4552172Z test_opinfo_reference_generators_rad2deg_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4552583Z test_opinfo_reference_generators_real_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4552996Z test_opinfo_reference_generators_reciprocal_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4553408Z test_opinfo_reference_generators_remainder_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4553830Z test_opinfo_reference_generators_reshape_as_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4554254Z test_opinfo_reference_generators_reshape_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4554663Z test_opinfo_reference_generators_round_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4555078Z test_opinfo_reference_generators_round_decimals_0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4555509Z test_opinfo_reference_generators_round_decimals_3_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4555940Z test_opinfo_reference_generators_round_decimals_neg_3_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4556358Z test_opinfo_reference_generators_rsqrt_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4556752Z test_opinfo_reference_generators_rsub_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4557164Z test_opinfo_reference_generators_sgn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4557567Z test_opinfo_reference_generators_short_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4558006Z test_opinfo_reference_generators_sigmoid_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4558407Z test_opinfo_reference_generators_sign_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4558836Z test_opinfo_reference_generators_signal_windows_bartlett_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4559275Z test_opinfo_reference_generators_signal_windows_blackman_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4559709Z test_opinfo_reference_generators_signal_windows_cosine_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4560159Z test_opinfo_reference_generators_signal_windows_exponential_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4560609Z test_opinfo_reference_generators_signal_windows_gaussian_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4561060Z test_opinfo_reference_generators_signal_windows_general_cosine_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4561513Z test_opinfo_reference_generators_signal_windows_general_hamming_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4561958Z test_opinfo_reference_generators_signal_windows_hamming_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4562394Z test_opinfo_reference_generators_signal_windows_hann_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4562875Z test_opinfo_reference_generators_signal_windows_kaiser_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4563305Z test_opinfo_reference_generators_signal_windows_nuttall_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4563736Z test_opinfo_reference_generators_signbit_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4564145Z test_opinfo_reference_generators_sin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4564554Z test_opinfo_reference_generators_sinc_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4564952Z test_opinfo_reference_generators_sinh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4565373Z test_opinfo_reference_generators_special_airy_ai_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4565810Z test_opinfo_reference_generators_special_bessel_j0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4566240Z test_opinfo_reference_generators_special_bessel_j1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4566660Z test_opinfo_reference_generators_special_bessel_y0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4567083Z test_opinfo_reference_generators_special_bessel_y1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4567531Z test_opinfo_reference_generators_special_chebyshev_polynomial_t_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4567994Z test_opinfo_reference_generators_special_chebyshev_polynomial_u_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4568727Z test_opinfo_reference_generators_special_chebyshev_polynomial_v_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s)
2023-01-11T21:23:47.4569419Z test_opinfo_reference_generators_special_chebyshev_polynomial_w_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s)
2023-01-11T21:23:47.4569980Z test_opinfo_reference_generators_special_entr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4570407Z test_opinfo_reference_generators_special_erfcx_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4570843Z test_opinfo_reference_generators_special_hermite_polynomial_h_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4571304Z test_opinfo_reference_generators_special_hermite_polynomial_he_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4571744Z test_opinfo_reference_generators_special_i0e_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4572175Z test_opinfo_reference_generators_special_i1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4572591Z test_opinfo_reference_generators_special_i1e_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4573029Z test_opinfo_reference_generators_special_laguerre_polynomial_l_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4574819Z test_opinfo_reference_generators_special_legendre_polynomial_p_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s)
2023-01-11T21:23:47.4575321Z test_opinfo_reference_generators_special_log_ndtr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4575764Z test_opinfo_reference_generators_special_modified_bessel_i0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4576293Z test_opinfo_reference_generators_special_modified_bessel_i1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4576746Z test_opinfo_reference_generators_special_modified_bessel_k0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4577198Z test_opinfo_reference_generators_special_modified_bessel_k1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4577633Z test_opinfo_reference_generators_special_ndtr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4578052Z test_opinfo_reference_generators_special_ndtri_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4578505Z test_opinfo_reference_generators_special_polygamma_special_polygamma_n_0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4578981Z test_opinfo_reference_generators_special_scaled_modified_bessel_k0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4579443Z test_opinfo_reference_generators_special_scaled_modified_bessel_k1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4580084Z test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_t_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s)
2023-01-11T21:23:47.4580790Z test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_u_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s)
2023-01-11T21:23:47.4581495Z test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_v_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s)
2023-01-11T21:23:47.4582204Z test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_w_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s)
2023-01-11T21:23:47.4582758Z test_opinfo_reference_generators_special_spherical_bessel_j0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
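[Editor's note] The "skip:" entries above for the Chebyshev/Legendre polynomial ops are not failures: they are deliberately disabled pending issue #79528 because their inputs take unreasonably long to test. In unittest terms this kind of gate is just a skip decorator whose reason string the runner echoes after "skip:". The sketch below is a minimal, hypothetical illustration of that pattern (class and test names are illustrative, not the actual PyTorch source; PyTorch's own runner also appends the elapsed time):

    import unittest

    # Reason string mirrors the log lines above; verbosity=2 prints one
    # "test_name (module.Class) ... <result>" line per test, as in this log.
    REASON = "Skipping - testing takes an unreasonably long time, #79528"

    class TestOpInfoSampleFunctionsExample(unittest.TestCase):
        @unittest.skip(REASON)
        def test_special_chebyshev_polynomial_v(self):
            pass  # never executes; reported as a skip, not a failure

        def test_special_entr(self):
            pass  # executes normally and is reported as "ok"

    if __name__ == "__main__":
        unittest.main(verbosity=2)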
2023-01-11T21:23:47.4583197Z test_opinfo_reference_generators_special_xlog1py_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4583670Z test_opinfo_reference_generators_special_zeta_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4584083Z test_opinfo_reference_generators_sqrt_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4584493Z test_opinfo_reference_generators_square_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4584893Z test_opinfo_reference_generators_sub_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4585303Z test_opinfo_reference_generators_tan_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4585704Z test_opinfo_reference_generators_tanh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4586113Z test_opinfo_reference_generators_true_divide_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4586524Z test_opinfo_reference_generators_trunc_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4586932Z test_opinfo_reference_generators_view_as_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4587340Z test_opinfo_reference_generators_view_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4587742Z test_opinfo_reference_generators_where_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4588182Z test_opinfo_reference_generators_xlogy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4588584Z test_opinfo_sample_generators_H_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4588987Z test_opinfo_sample_generators_T_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4589384Z test_opinfo_sample_generators___getitem___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4589793Z test_opinfo_sample_generators___radd___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4590198Z test_opinfo_sample_generators___rand___cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4590598Z test_opinfo_sample_generators___rdiv___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4591007Z test_opinfo_sample_generators___rmatmul___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4591408Z test_opinfo_sample_generators___rmod___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4591810Z test_opinfo_sample_generators___rmul___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4592208Z test_opinfo_sample_generators___ror___cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4592597Z test_opinfo_sample_generators___rpow___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4592991Z test_opinfo_sample_generators___rsub___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4593394Z test_opinfo_sample_generators___rxor___cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4593807Z test_opinfo_sample_generators__native_batch_norm_legit_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4594251Z test_opinfo_sample_generators__softmax_backward_data_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4594700Z test_opinfo_sample_generators_abs_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4595105Z test_opinfo_sample_generators_acos_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4595498Z test_opinfo_sample_generators_acosh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4595901Z test_opinfo_sample_generators_add_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4596305Z test_opinfo_sample_generators_addbmm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4596714Z test_opinfo_sample_generators_addcdiv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4597118Z test_opinfo_sample_generators_addcmul_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4597523Z test_opinfo_sample_generators_addmm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4597942Z test_opinfo_sample_generators_addmm_decomposed_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4598363Z test_opinfo_sample_generators_addmv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4598759Z test_opinfo_sample_generators_addr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4599159Z test_opinfo_sample_generators_all_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4599563Z test_opinfo_sample_generators_allclose_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4600003Z test_opinfo_sample_generators_amax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4600398Z test_opinfo_sample_generators_amin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4600802Z test_opinfo_sample_generators_aminmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4601205Z test_opinfo_sample_generators_angle_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4601597Z test_opinfo_sample_generators_any_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4601998Z test_opinfo_sample_generators_arange_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4602401Z test_opinfo_sample_generators_argmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4602801Z test_opinfo_sample_generators_argmin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4603196Z test_opinfo_sample_generators_argsort_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4603607Z test_opinfo_sample_generators_argwhere_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4604024Z test_opinfo_sample_generators_as_strided_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4604453Z test_opinfo_sample_generators_as_strided_partial_views_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4604881Z test_opinfo_sample_generators_as_strided_scatter_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4605299Z test_opinfo_sample_generators_asin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4605698Z test_opinfo_sample_generators_asinh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4606105Z test_opinfo_sample_generators_atan2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4606491Z test_opinfo_sample_generators_atan_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4606922Z test_opinfo_sample_generators_atanh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4607334Z test_opinfo_sample_generators_atleast_1d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4607740Z test_opinfo_sample_generators_atleast_2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4608145Z test_opinfo_sample_generators_atleast_3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4608545Z test_opinfo_sample_generators_baddbmm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4608959Z test_opinfo_sample_generators_bernoulli_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4609362Z test_opinfo_sample_generators_bfloat16_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4609773Z test_opinfo_sample_generators_bincount_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4610178Z test_opinfo_sample_generators_bitwise_and_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4610594Z test_opinfo_sample_generators_bitwise_left_shift_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4611001Z test_opinfo_sample_generators_bitwise_not_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4611408Z test_opinfo_sample_generators_bitwise_or_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4611824Z test_opinfo_sample_generators_bitwise_right_shift_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4612272Z test_opinfo_sample_generators_bitwise_xor_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4612676Z test_opinfo_sample_generators_block_diag_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4613081Z test_opinfo_sample_generators_bmm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4613479Z test_opinfo_sample_generators_bool_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4614014Z test_opinfo_sample_generators_broadcast_shapes_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4614656Z test_opinfo_sample_generators_broadcast_tensors_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4615092Z test_opinfo_sample_generators_broadcast_to_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4615558Z test_opinfo_sample_generators_bucketize_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4615951Z test_opinfo_sample_generators_byte_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4616361Z test_opinfo_sample_generators_cartesian_prod_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4616771Z test_opinfo_sample_generators_cat_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4617167Z test_opinfo_sample_generators_cdist_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4617555Z test_opinfo_sample_generators_cdouble_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4617951Z test_opinfo_sample_generators_ceil_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4618351Z test_opinfo_sample_generators_cfloat_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4618748Z test_opinfo_sample_generators_chalf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4619211Z test_opinfo_sample_generators_char_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4619612Z test_opinfo_sample_generators_cholesky_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4620022Z test_opinfo_sample_generators_cholesky_inverse_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4620440Z test_opinfo_sample_generators_cholesky_solve_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4620836Z test_opinfo_sample_generators_chunk_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4621233Z test_opinfo_sample_generators_clamp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4621631Z test_opinfo_sample_generators_clamp_max_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4622036Z test_opinfo_sample_generators_clamp_min_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4622431Z test_opinfo_sample_generators_clone_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4622835Z test_opinfo_sample_generators_column_stack_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4623249Z test_opinfo_sample_generators_combinations_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4623649Z test_opinfo_sample_generators_complex_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4624046Z test_opinfo_sample_generators_conj_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4624514Z test_opinfo_sample_generators_conj_physical_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4624938Z test_opinfo_sample_generators_constant_pad_nd_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4625350Z test_opinfo_sample_generators_contiguous_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4625767Z test_opinfo_sample_generators_copysign_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4626224Z test_opinfo_sample_generators_corrcoef_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4626627Z test_opinfo_sample_generators_cos_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4627018Z test_opinfo_sample_generators_cosh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4627434Z test_opinfo_sample_generators_count_nonzero_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4627839Z test_opinfo_sample_generators_cov_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4628242Z test_opinfo_sample_generators_cross_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4628642Z test_opinfo_sample_generators_cummax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4629048Z test_opinfo_sample_generators_cummin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4629454Z test_opinfo_sample_generators_cumprod_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4629855Z test_opinfo_sample_generators_cumsum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4630271Z test_opinfo_sample_generators_cumulative_trapezoid_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4630693Z test_opinfo_sample_generators_deg2rad_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4631138Z test_opinfo_sample_generators_diag_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4631540Z test_opinfo_sample_generators_diag_embed_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4631949Z test_opinfo_sample_generators_diagflat_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4632362Z test_opinfo_sample_generators_diagonal_copy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4632775Z test_opinfo_sample_generators_diagonal_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4633181Z test_opinfo_sample_generators_diagonal_scatter_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4633594Z test_opinfo_sample_generators_diff_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4634001Z test_opinfo_sample_generators_digamma_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4634409Z test_opinfo_sample_generators_dist_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4634814Z test_opinfo_sample_generators_div_floor_rounding_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4635240Z test_opinfo_sample_generators_div_no_rounding_mode_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4635671Z test_opinfo_sample_generators_div_trunc_rounding_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4636124Z test_opinfo_sample_generators_dot_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4636563Z test_opinfo_sample_generators_double_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4636963Z test_opinfo_sample_generators_dsplit_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4637365Z test_opinfo_sample_generators_dstack_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4637750Z test_opinfo_sample_generators_einsum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4638145Z test_opinfo_sample_generators_empty_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4638561Z test_opinfo_sample_generators_empty_like_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4638965Z test_opinfo_sample_generators_eq_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4639360Z test_opinfo_sample_generators_equal_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4639759Z test_opinfo_sample_generators_erf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4640158Z test_opinfo_sample_generators_erfc_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4640558Z test_opinfo_sample_generators_erfinv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4640953Z test_opinfo_sample_generators_exp2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4641349Z test_opinfo_sample_generators_exp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4641751Z test_opinfo_sample_generators_expand_as_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4642151Z test_opinfo_sample_generators_expand_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4642550Z test_opinfo_sample_generators_expm1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4642946Z test_opinfo_sample_generators_eye_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4643378Z test_opinfo_sample_generators_fft_fft2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4643774Z test_opinfo_sample_generators_fft_fft_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4644180Z test_opinfo_sample_generators_fft_fftn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4644592Z test_opinfo_sample_generators_fft_fftshift_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4645005Z test_opinfo_sample_generators_fft_hfft2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4645405Z test_opinfo_sample_generators_fft_hfft_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4645804Z test_opinfo_sample_generators_fft_hfftn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4646207Z test_opinfo_sample_generators_fft_ifft2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4646613Z test_opinfo_sample_generators_fft_ifft_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4647002Z test_opinfo_sample_generators_fft_ifftn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4647414Z test_opinfo_sample_generators_fft_ifftshift_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4647830Z test_opinfo_sample_generators_fft_ihfft2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4648270Z test_opinfo_sample_generators_fft_ihfft_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4648666Z test_opinfo_sample_generators_fft_ihfftn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4649070Z test_opinfo_sample_generators_fft_irfft2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4649479Z test_opinfo_sample_generators_fft_irfft_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4649874Z test_opinfo_sample_generators_fft_irfftn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4650276Z test_opinfo_sample_generators_fft_rfft2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4650676Z test_opinfo_sample_generators_fft_rfft_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4651075Z test_opinfo_sample_generators_fft_rfftn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4651471Z test_opinfo_sample_generators_fill_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4651874Z test_opinfo_sample_generators_flatten_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4652276Z test_opinfo_sample_generators_flip_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4652677Z test_opinfo_sample_generators_fliplr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4653072Z test_opinfo_sample_generators_flipud_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4653475Z test_opinfo_sample_generators_float_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4653987Z test_opinfo_sample_generators_float_power_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4654436Z test_opinfo_sample_generators_floor_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4655076Z test_opinfo_sample_generators_floor_divide_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4655581Z test_opinfo_sample_generators_fmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4655976Z test_opinfo_sample_generators_fmin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4656365Z test_opinfo_sample_generators_fmod_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4656746Z test_opinfo_sample_generators_frac_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4657141Z test_opinfo_sample_generators_frexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4657537Z test_opinfo_sample_generators_full_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4657923Z test_opinfo_sample_generators_full_like_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4658330Z test_opinfo_sample_generators_gather_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4658726Z test_opinfo_sample_generators_gcd_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4659118Z test_opinfo_sample_generators_ge_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4659504Z test_opinfo_sample_generators_geqrf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4659901Z test_opinfo_sample_generators_gradient_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4660313Z test_opinfo_sample_generators_grid_sampler_2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4660768Z test_opinfo_sample_generators_gt_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4661158Z test_opinfo_sample_generators_half_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4661566Z test_opinfo_sample_generators_heaviside_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4661969Z test_opinfo_sample_generators_histc_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4662370Z test_opinfo_sample_generators_hsplit_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4662766Z test_opinfo_sample_generators_hstack_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4663166Z test_opinfo_sample_generators_hypot_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4663564Z test_opinfo_sample_generators_i0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4663955Z test_opinfo_sample_generators_igamma_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4664362Z test_opinfo_sample_generators_igammac_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4664769Z test_opinfo_sample_generators_imag_cuda_complex64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4665175Z test_opinfo_sample_generators_index_add_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4665578Z test_opinfo_sample_generators_index_copy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4665982Z test_opinfo_sample_generators_index_fill_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4666389Z test_opinfo_sample_generators_index_put_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4666808Z test_opinfo_sample_generators_index_reduce_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4667315Z test_opinfo_sample_generators_index_select_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4667722Z test_opinfo_sample_generators_inner_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4668126Z test_opinfo_sample_generators_int_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4668531Z test_opinfo_sample_generators_isclose_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4668929Z test_opinfo_sample_generators_isfinite_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4669328Z test_opinfo_sample_generators_isin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4669728Z test_opinfo_sample_generators_isinf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4670116Z test_opinfo_sample_generators_isnan_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4670518Z test_opinfo_sample_generators_isneginf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4670922Z test_opinfo_sample_generators_isposinf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4671330Z test_opinfo_sample_generators_isreal_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4671734Z test_opinfo_sample_generators_istft_cuda_complex64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4672162Z test_opinfo_sample_generators_jiterator_2inputs_2outputs_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4672644Z test_opinfo_sample_generators_jiterator_4inputs_with_extra_args_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4673080Z test_opinfo_sample_generators_jiterator_binary_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4673510Z test_opinfo_sample_generators_jiterator_binary_return_by_ref_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4673945Z test_opinfo_sample_generators_jiterator_unary_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4674352Z test_opinfo_sample_generators_kron_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4674755Z test_opinfo_sample_generators_kthvalue_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4675153Z test_opinfo_sample_generators_lcm_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4675597Z test_opinfo_sample_generators_ldexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4675995Z test_opinfo_sample_generators_le_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4676398Z test_opinfo_sample_generators_lerp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4676792Z test_opinfo_sample_generators_lgamma_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4677209Z test_opinfo_sample_generators_linalg_cholesky_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4677633Z test_opinfo_sample_generators_linalg_cholesky_ex_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4678054Z test_opinfo_sample_generators_linalg_cond_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4678464Z test_opinfo_sample_generators_linalg_cross_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4678879Z test_opinfo_sample_generators_linalg_det_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4679329Z test_opinfo_sample_generators_linalg_det_singular_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4679744Z test_opinfo_sample_generators_linalg_eig_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4680158Z test_opinfo_sample_generators_linalg_eigh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4680576Z test_opinfo_sample_generators_linalg_eigvals_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4680999Z test_opinfo_sample_generators_linalg_eigvalsh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4681433Z test_opinfo_sample_generators_linalg_householder_product_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4681858Z test_opinfo_sample_generators_linalg_inv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4682273Z test_opinfo_sample_generators_linalg_inv_ex_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4682692Z test_opinfo_sample_generators_linalg_ldl_factor_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4683111Z test_opinfo_sample_generators_linalg_ldl_factor_ex_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4683533Z test_opinfo_sample_generators_linalg_ldl_solve_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4683949Z test_opinfo_sample_generators_linalg_lstsq_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4684415Z test_opinfo_sample_generators_linalg_lstsq_grad_oriented_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4684836Z test_opinfo_sample_generators_linalg_lu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4685255Z test_opinfo_sample_generators_linalg_lu_factor_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4685681Z test_opinfo_sample_generators_linalg_lu_factor_ex_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4686105Z test_opinfo_sample_generators_linalg_lu_solve_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4686515Z test_opinfo_sample_generators_linalg_matrix_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4686940Z test_opinfo_sample_generators_linalg_matrix_power_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4687362Z test_opinfo_sample_generators_linalg_matrix_rank_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4687792Z test_opinfo_sample_generators_linalg_matrix_rank_hermitian_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4688223Z test_opinfo_sample_generators_linalg_multi_dot_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4688642Z test_opinfo_sample_generators_linalg_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4689075Z test_opinfo_sample_generators_linalg_norm_subgradients_at_zero_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4689511Z test_opinfo_sample_generators_linalg_pinv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4689930Z test_opinfo_sample_generators_linalg_pinv_hermitian_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4690425Z test_opinfo_sample_generators_linalg_pinv_singular_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
2023-01-11T21:23:47.4690928Z test_opinfo_sample_generators_linalg_qr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4691353Z test_opinfo_sample_generators_linalg_slogdet_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4691762Z test_opinfo_sample_generators_linalg_solve_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4692178Z test_opinfo_sample_generators_linalg_solve_ex_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4692602Z test_opinfo_sample_generators_linalg_solve_triangular_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4693029Z test_opinfo_sample_generators_linalg_svd_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4693437Z test_opinfo_sample_generators_linalg_svdvals_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4693998Z test_opinfo_sample_generators_linalg_tensorinv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4694459Z test_opinfo_sample_generators_linalg_tensorsolve_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4695129Z test_opinfo_sample_generators_linalg_vander_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4695545Z test_opinfo_sample_generators_linalg_vecdot_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4695957Z test_opinfo_sample_generators_linalg_vector_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4696457Z test_opinfo_sample_generators_linspace_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
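[Editor's note] The linalg_pinv_singular skip above is the standard slow-test gate: PyTorch marks such tests with the slowTest decorator from torch.testing._internal.common_utils, and they only run when PYTORCH_TEST_WITH_SLOW=1 is set in the environment. Below is a minimal sketch of that mechanism, assuming only the env-var convention visible in the log (the helper and test names are illustrative re-implementations, not the PyTorch source):

    import os
    import unittest

    # Slow tests run only when explicitly requested via the environment,
    # matching the skip reason printed in the log above.
    TEST_WITH_SLOW = os.getenv("PYTORCH_TEST_WITH_SLOW", "0") == "1"

    def slow_test(fn):
        # Skip unless slow tests were opted into with PYTORCH_TEST_WITH_SLOW=1.
        return unittest.skipUnless(
            TEST_WITH_SLOW,
            "test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test",
        )(fn)

    class ExampleSlowTests(unittest.TestCase):
        @slow_test
        def test_linalg_pinv_singular_like(self):
            pass  # executes only under PYTORCH_TEST_WITH_SLOW=1

To re-run such a test locally with slow tests enabled, something along the lines of: PYTORCH_TEST_WITH_SLOW=1 python <test_file.py> -k linalg_pinv_singular (where <test_file.py> stands for whichever test module produced this log).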
2023-01-11T21:23:47.4696853Z test_opinfo_sample_generators_log10_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4697262Z test_opinfo_sample_generators_log1p_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4697661Z test_opinfo_sample_generators_log2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4698060Z test_opinfo_sample_generators_log_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4698459Z test_opinfo_sample_generators_log_softmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4698878Z test_opinfo_sample_generators_log_softmax_with_dtype_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4699301Z test_opinfo_sample_generators_logaddexp2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4699718Z test_opinfo_sample_generators_logaddexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4700128Z test_opinfo_sample_generators_logcumsumexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4700545Z test_opinfo_sample_generators_logdet_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4700952Z test_opinfo_sample_generators_logical_and_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4701359Z test_opinfo_sample_generators_logical_not_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4701762Z test_opinfo_sample_generators_logical_or_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4702167Z test_opinfo_sample_generators_logical_xor_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4702576Z test_opinfo_sample_generators_logit_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4702986Z test_opinfo_sample_generators_logspace_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4703452Z test_opinfo_sample_generators_logsumexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4703866Z test_opinfo_sample_generators_long_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4704261Z test_opinfo_sample_generators_lt_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4704649Z test_opinfo_sample_generators_lu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4705051Z test_opinfo_sample_generators_lu_solve_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4705459Z test_opinfo_sample_generators_lu_unpack_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4705862Z test_opinfo_sample_generators_mH_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4706250Z test_opinfo_sample_generators_mT_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4706655Z test_opinfo_sample_generators_masked_amax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4707057Z test_opinfo_sample_generators_masked_amin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4707474Z test_opinfo_sample_generators_masked_argmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4707885Z test_opinfo_sample_generators_masked_argmin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4708302Z test_opinfo_sample_generators_masked_cumprod_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4708755Z test_opinfo_sample_generators_masked_cumsum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4709168Z test_opinfo_sample_generators_masked_fill_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4709579Z test_opinfo_sample_generators_masked_log_softmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4710006Z test_opinfo_sample_generators_masked_logaddexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4710430Z test_opinfo_sample_generators_masked_logsumexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4710848Z test_opinfo_sample_generators_masked_mean_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4711252Z test_opinfo_sample_generators_masked_median_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4711666Z test_opinfo_sample_generators_masked_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4712077Z test_opinfo_sample_generators_masked_normalize_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4712488Z test_opinfo_sample_generators_masked_prod_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4712905Z test_opinfo_sample_generators_masked_scatter_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4713320Z test_opinfo_sample_generators_masked_select_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4713731Z test_opinfo_sample_generators_masked_softmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4714142Z test_opinfo_sample_generators_masked_softmin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4714556Z test_opinfo_sample_generators_masked_std_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4714986Z test_opinfo_sample_generators_masked_sum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4715395Z test_opinfo_sample_generators_masked_var_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4715797Z test_opinfo_sample_generators_matmul_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4716200Z test_opinfo_sample_generators_matrix_exp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4716599Z test_opinfo_sample_generators_max_binary_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4717035Z test_opinfo_sample_generators_max_pool2d_with_indices_backward_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4717475Z test_opinfo_sample_generators_max_reduction_no_dim_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4717906Z test_opinfo_sample_generators_max_reduction_with_dim_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4718323Z test_opinfo_sample_generators_maximum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4718725Z test_opinfo_sample_generators_mean_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4719119Z test_opinfo_sample_generators_median_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4719536Z test_opinfo_sample_generators_meshgrid_list_of_tensors_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4719973Z test_opinfo_sample_generators_meshgrid_variadic_tensors_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4720430Z test_opinfo_sample_generators_min_binary_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4720843Z test_opinfo_sample_generators_min_reduction_no_dim_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4721273Z test_opinfo_sample_generators_min_reduction_with_dim_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4721686Z test_opinfo_sample_generators_minimum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4722078Z test_opinfo_sample_generators_mm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4722473Z test_opinfo_sample_generators_mode_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4722871Z test_opinfo_sample_generators_movedim_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4723274Z test_opinfo_sample_generators_msort_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4723665Z test_opinfo_sample_generators_mul_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4724068Z test_opinfo_sample_generators_multinomial_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4724467Z test_opinfo_sample_generators_mv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4724882Z test_opinfo_sample_generators_mvlgamma_mvlgamma_p_1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4725308Z test_opinfo_sample_generators_mvlgamma_mvlgamma_p_3_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4725739Z test_opinfo_sample_generators_mvlgamma_mvlgamma_p_5_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4726157Z test_opinfo_sample_generators_nan_to_num_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4726565Z test_opinfo_sample_generators_nanmean_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4726997Z test_opinfo_sample_generators_nanmedian_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4727411Z test_opinfo_sample_generators_nanquantile_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4727815Z test_opinfo_sample_generators_nansum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4728224Z test_opinfo_sample_generators_narrow_copy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4728624Z test_opinfo_sample_generators_narrow_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4729041Z test_opinfo_sample_generators_native_batch_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4729476Z test_opinfo_sample_generators_native_dropout_backward_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4729913Z test_opinfo_sample_generators_native_layer_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4730318Z test_opinfo_sample_generators_ne_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4730715Z test_opinfo_sample_generators_neg_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4731121Z test_opinfo_sample_generators_new_empty_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4731527Z test_opinfo_sample_generators_new_empty_strided_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4731981Z test_opinfo_sample_generators_new_full_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4732382Z test_opinfo_sample_generators_new_ones_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4732788Z test_opinfo_sample_generators_new_zeros_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4733184Z test_opinfo_sample_generators_nextafter_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4733624Z test_opinfo_sample_generators_nn_functional__scaled_dot_product_attention_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4734227Z test_opinfo_sample_generators_nn_functional_adaptive_avg_pool1d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4734895Z test_opinfo_sample_generators_nn_functional_adaptive_avg_pool2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4735339Z test_opinfo_sample_generators_nn_functional_adaptive_avg_pool3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4735787Z test_opinfo_sample_generators_nn_functional_adaptive_max_pool1d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4736232Z test_opinfo_sample_generators_nn_functional_adaptive_max_pool2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4736672Z test_opinfo_sample_generators_nn_functional_adaptive_max_pool3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4737108Z test_opinfo_sample_generators_nn_functional_alpha_dropout_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ...
ok (0.000s) 2023-01-11T21:23:47.4737544Z test_opinfo_sample_generators_nn_functional_avg_pool1d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4737973Z test_opinfo_sample_generators_nn_functional_avg_pool2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4738399Z test_opinfo_sample_generators_nn_functional_avg_pool3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4738915Z test_opinfo_sample_generators_nn_functional_batch_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4739364Z test_opinfo_sample_generators_nn_functional_batch_norm_without_cudnn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4739810Z test_opinfo_sample_generators_nn_functional_bilinear_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4740248Z test_opinfo_sample_generators_nn_functional_binary_cross_entropy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4740706Z test_opinfo_sample_generators_nn_functional_binary_cross_entropy_with_logits_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4741154Z test_opinfo_sample_generators_nn_functional_celu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4741579Z test_opinfo_sample_generators_nn_functional_conv1d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4742003Z test_opinfo_sample_generators_nn_functional_conv2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4742426Z test_opinfo_sample_generators_nn_functional_conv_transpose1d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4742870Z test_opinfo_sample_generators_nn_functional_conv_transpose2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4743316Z test_opinfo_sample_generators_nn_functional_conv_transpose3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4743818Z test_opinfo_sample_generators_nn_functional_cosine_embedding_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4744266Z test_opinfo_sample_generators_nn_functional_cosine_similarity_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4744715Z test_opinfo_sample_generators_nn_functional_cross_entropy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4745152Z test_opinfo_sample_generators_nn_functional_ctc_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4745582Z test_opinfo_sample_generators_nn_functional_dropout2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4746010Z test_opinfo_sample_generators_nn_functional_dropout3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4746439Z test_opinfo_sample_generators_nn_functional_dropout_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4746872Z test_opinfo_sample_generators_nn_functional_elu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4747312Z test_opinfo_sample_generators_nn_functional_embedding_bag_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4747744Z test_opinfo_sample_generators_nn_functional_embedding_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4748203Z test_opinfo_sample_generators_nn_functional_feature_alpha_dropout_with_train_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4748683Z test_opinfo_sample_generators_nn_functional_feature_alpha_dropout_without_train_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4749151Z test_opinfo_sample_generators_nn_functional_fractional_max_pool2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4749606Z test_opinfo_sample_generators_nn_functional_fractional_max_pool3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4750090Z test_opinfo_sample_generators_nn_functional_gaussian_nll_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4750526Z test_opinfo_sample_generators_nn_functional_gelu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4750954Z test_opinfo_sample_generators_nn_functional_glu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4751379Z test_opinfo_sample_generators_nn_functional_grid_sample_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4751816Z test_opinfo_sample_generators_nn_functional_group_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4752250Z test_opinfo_sample_generators_nn_functional_hardshrink_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4752688Z test_opinfo_sample_generators_nn_functional_hardsigmoid_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4753125Z test_opinfo_sample_generators_nn_functional_hardswish_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4753557Z test_opinfo_sample_generators_nn_functional_hardtanh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4754005Z test_opinfo_sample_generators_nn_functional_hinge_embedding_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4754446Z test_opinfo_sample_generators_nn_functional_huber_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4754879Z test_opinfo_sample_generators_nn_functional_instance_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4755361Z test_opinfo_sample_generators_nn_functional_interpolate_area_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4755854Z test_opinfo_sample_generators_nn_functional_interpolate_bicubic_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4756322Z test_opinfo_sample_generators_nn_functional_interpolate_bilinear_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4756770Z test_opinfo_sample_generators_nn_functional_interpolate_linear_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4757222Z test_opinfo_sample_generators_nn_functional_interpolate_nearest_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4757673Z test_opinfo_sample_generators_nn_functional_interpolate_trilinear_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4758120Z test_opinfo_sample_generators_nn_functional_kl_div_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4758538Z test_opinfo_sample_generators_nn_functional_l1_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4758968Z test_opinfo_sample_generators_nn_functional_layer_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4759395Z test_opinfo_sample_generators_nn_functional_leaky_relu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4759822Z test_opinfo_sample_generators_nn_functional_linear_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4760256Z test_opinfo_sample_generators_nn_functional_local_response_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4760698Z test_opinfo_sample_generators_nn_functional_logsigmoid_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4761144Z test_opinfo_sample_generators_nn_functional_margin_ranking_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4761611Z test_opinfo_sample_generators_nn_functional_max_pool1d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4762029Z test_opinfo_sample_generators_nn_functional_max_pool2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4762452Z test_opinfo_sample_generators_nn_functional_max_pool3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4762885Z test_opinfo_sample_generators_nn_functional_max_unpool1d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4763330Z test_opinfo_sample_generators_nn_functional_max_unpool1d_grad_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4763771Z test_opinfo_sample_generators_nn_functional_max_unpool2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4764216Z test_opinfo_sample_generators_nn_functional_max_unpool2d_grad_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4764657Z test_opinfo_sample_generators_nn_functional_max_unpool3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4765110Z test_opinfo_sample_generators_nn_functional_max_unpool3d_grad_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4765577Z test_opinfo_sample_generators_nn_functional_mish_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4766001Z test_opinfo_sample_generators_nn_functional_mse_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4766438Z test_opinfo_sample_generators_nn_functional_multi_margin_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4766922Z test_opinfo_sample_generators_nn_functional_multilabel_margin_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4767381Z test_opinfo_sample_generators_nn_functional_multilabel_soft_margin_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4767829Z test_opinfo_sample_generators_nn_functional_nll_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4768257Z test_opinfo_sample_generators_nn_functional_normalize_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4768684Z test_opinfo_sample_generators_nn_functional_one_hot_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4769110Z test_opinfo_sample_generators_nn_functional_pad_circular_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4769552Z test_opinfo_sample_generators_nn_functional_pad_constant_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4769987Z test_opinfo_sample_generators_nn_functional_pad_reflect_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4770426Z test_opinfo_sample_generators_nn_functional_pad_replicate_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4770864Z test_opinfo_sample_generators_nn_functional_pairwise_distance_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4771301Z test_opinfo_sample_generators_nn_functional_pdist_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4771736Z test_opinfo_sample_generators_nn_functional_pixel_shuffle_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4772180Z test_opinfo_sample_generators_nn_functional_pixel_unshuffle_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4772620Z test_opinfo_sample_generators_nn_functional_poisson_nll_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4773145Z test_opinfo_sample_generators_nn_functional_prelu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4773573Z test_opinfo_sample_generators_nn_functional_relu6_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4774132Z test_opinfo_sample_generators_nn_functional_relu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4774767Z test_opinfo_sample_generators_nn_functional_rrelu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4775182Z test_opinfo_sample_generators_nn_functional_selu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4775607Z test_opinfo_sample_generators_nn_functional_silu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4776036Z test_opinfo_sample_generators_nn_functional_smooth_l1_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4776471Z test_opinfo_sample_generators_nn_functional_soft_margin_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4776904Z test_opinfo_sample_generators_nn_functional_softmin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4777343Z test_opinfo_sample_generators_nn_functional_softmin_with_dtype_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4777780Z test_opinfo_sample_generators_nn_functional_softplus_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4778203Z test_opinfo_sample_generators_nn_functional_softshrink_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4778726Z test_opinfo_sample_generators_nn_functional_softsign_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4779156Z test_opinfo_sample_generators_nn_functional_tanhshrink_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4779588Z test_opinfo_sample_generators_nn_functional_threshold_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4780026Z test_opinfo_sample_generators_nn_functional_triplet_margin_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4780489Z test_opinfo_sample_generators_nn_functional_triplet_margin_with_distance_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4780941Z test_opinfo_sample_generators_nn_functional_unfold_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4781386Z test_opinfo_sample_generators_nn_functional_upsample_bilinear_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4781829Z test_opinfo_sample_generators_nn_functional_upsample_nearest_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4782258Z test_opinfo_sample_generators_nonzero_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4782662Z test_opinfo_sample_generators_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4783064Z test_opinfo_sample_generators_norm_fro_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4783455Z test_opinfo_sample_generators_norm_inf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4783845Z test_opinfo_sample_generators_norm_nuc_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4784252Z test_opinfo_sample_generators_normal_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4784669Z test_opinfo_sample_generators_normal_number_mean_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4785115Z test_opinfo_sample_generators_ones_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4785522Z test_opinfo_sample_generators_ones_like_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4785923Z test_opinfo_sample_generators_ormqr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4786321Z test_opinfo_sample_generators_outer_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4786728Z test_opinfo_sample_generators_pca_lowrank_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.001s) 2023-01-11T21:23:47.4787142Z test_opinfo_sample_generators_permute_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4787553Z test_opinfo_sample_generators_pinverse_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4787951Z test_opinfo_sample_generators_polar_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4788370Z test_opinfo_sample_generators_polygamma_polygamma_n_0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4788808Z test_opinfo_sample_generators_polygamma_polygamma_n_1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4789243Z test_opinfo_sample_generators_polygamma_polygamma_n_2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4789664Z test_opinfo_sample_generators_polygamma_polygamma_n_3_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4790124Z test_opinfo_sample_generators_polygamma_polygamma_n_4_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4790542Z test_opinfo_sample_generators_positive_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4790950Z test_opinfo_sample_generators_pow_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4791347Z test_opinfo_sample_generators_prod_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4791746Z test_opinfo_sample_generators_put_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4792142Z test_opinfo_sample_generators_qr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4792542Z test_opinfo_sample_generators_quantile_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4792941Z test_opinfo_sample_generators_rad2deg_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4793352Z test_opinfo_sample_generators_rand_like_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4793760Z test_opinfo_sample_generators_randint_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4794173Z test_opinfo_sample_generators_randint_like_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4794571Z test_opinfo_sample_generators_randn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4794976Z test_opinfo_sample_generators_randn_like_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4795383Z test_opinfo_sample_generators_ravel_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4795779Z test_opinfo_sample_generators_real_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4796190Z test_opinfo_sample_generators_reciprocal_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4796605Z test_opinfo_sample_generators_remainder_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4797041Z test_opinfo_sample_generators_renorm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4797439Z test_opinfo_sample_generators_repeat_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4797856Z test_opinfo_sample_generators_repeat_interleave_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4798276Z test_opinfo_sample_generators_reshape_as_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4798681Z test_opinfo_sample_generators_reshape_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4799079Z test_opinfo_sample_generators_resize__cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4799481Z test_opinfo_sample_generators_resize_as__cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4799689Z test_opinfo_sample_generators_resolve_conj_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4799886Z test_opinfo_sample_generators_resolve_neg_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4800077Z test_opinfo_sample_generators_roll_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4800268Z test_opinfo_sample_generators_rot90_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4800458Z test_opinfo_sample_generators_round_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4800655Z test_opinfo_sample_generators_round_decimals_0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4800890Z test_opinfo_sample_generators_round_decimals_3_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4801100Z test_opinfo_sample_generators_round_decimals_neg_3_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4801293Z test_opinfo_sample_generators_rsqrt_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4801488Z test_opinfo_sample_generators_rsub_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4801689Z test_opinfo_sample_generators_scalar_tensor_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4801886Z test_opinfo_sample_generators_scatter_add_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4802081Z test_opinfo_sample_generators_scatter_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4802292Z test_opinfo_sample_generators_scatter_reduce_amax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4802494Z test_opinfo_sample_generators_scatter_reduce_amin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4802694Z test_opinfo_sample_generators_scatter_reduce_mean_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4802893Z test_opinfo_sample_generators_scatter_reduce_prod_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4803096Z test_opinfo_sample_generators_scatter_reduce_sum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4803298Z test_opinfo_sample_generators_searchsorted_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4803509Z test_opinfo_sample_generators_segment_reduce_lengths_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4803718Z test_opinfo_sample_generators_segment_reduce_offsets_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4803944Z test_opinfo_sample_generators_select_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4804149Z test_opinfo_sample_generators_select_scatter_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4804336Z test_opinfo_sample_generators_sgn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4804521Z test_opinfo_sample_generators_short_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4804711Z test_opinfo_sample_generators_sigmoid_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4804901Z test_opinfo_sample_generators_sign_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4805115Z test_opinfo_sample_generators_signal_windows_bartlett_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4805368Z test_opinfo_sample_generators_signal_windows_blackman_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4805582Z test_opinfo_sample_generators_signal_windows_cosine_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4805798Z test_opinfo_sample_generators_signal_windows_exponential_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4806006Z test_opinfo_sample_generators_signal_windows_gaussian_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4806220Z test_opinfo_sample_generators_signal_windows_general_cosine_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4806471Z test_opinfo_sample_generators_signal_windows_general_hamming_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4806670Z test_opinfo_sample_generators_signal_windows_hamming_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4806875Z test_opinfo_sample_generators_signal_windows_hann_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4807080Z test_opinfo_sample_generators_signal_windows_kaiser_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4807281Z test_opinfo_sample_generators_signal_windows_nuttall_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4807472Z test_opinfo_sample_generators_signbit_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4807662Z test_opinfo_sample_generators_sin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4807855Z test_opinfo_sample_generators_sinc_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4808043Z test_opinfo_sample_generators_sinh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4808234Z test_opinfo_sample_generators_slice_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4808425Z test_opinfo_sample_generators_slice_scatter_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4808614Z test_opinfo_sample_generators_softmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4808817Z test_opinfo_sample_generators_softmax_with_dtype_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4809003Z test_opinfo_sample_generators_sort_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4809212Z test_opinfo_sample_generators_sparse_sampled_addmm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4809416Z test_opinfo_sample_generators_special_airy_ai_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4809673Z test_opinfo_sample_generators_special_bessel_j0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4809873Z test_opinfo_sample_generators_special_bessel_j1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4810071Z test_opinfo_sample_generators_special_bessel_y0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4810263Z test_opinfo_sample_generators_special_bessel_y1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4810474Z test_opinfo_sample_generators_special_chebyshev_polynomial_t_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4810697Z test_opinfo_sample_generators_special_chebyshev_polynomial_u_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4811199Z test_opinfo_sample_generators_special_chebyshev_polynomial_v_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4811580Z test_opinfo_sample_generators_special_chebyshev_polynomial_w_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4811780Z test_opinfo_sample_generators_special_entr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4811980Z test_opinfo_sample_generators_special_erfcx_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4812199Z test_opinfo_sample_generators_special_hermite_polynomial_h_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4812451Z test_opinfo_sample_generators_special_hermite_polynomial_he_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4812648Z test_opinfo_sample_generators_special_i0e_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4812847Z test_opinfo_sample_generators_special_i1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4813031Z test_opinfo_sample_generators_special_i1e_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4813250Z test_opinfo_sample_generators_special_laguerre_polynomial_l_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4813644Z test_opinfo_sample_generators_special_legendre_polynomial_p_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4814003Z test_opinfo_sample_generators_special_log_ndtr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4814232Z test_opinfo_sample_generators_special_modified_bessel_i0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4814460Z test_opinfo_sample_generators_special_modified_bessel_i1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4814935Z test_opinfo_sample_generators_special_modified_bessel_k0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4815137Z test_opinfo_sample_generators_special_modified_bessel_k1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4815340Z test_opinfo_sample_generators_special_ndtr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4815570Z test_opinfo_sample_generators_special_ndtri_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4815816Z test_opinfo_sample_generators_special_polygamma_special_polygamma_n_0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4816036Z test_opinfo_sample_generators_special_scaled_modified_bessel_k0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4816346Z test_opinfo_sample_generators_special_scaled_modified_bessel_k1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4816761Z test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_t_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4817165Z test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_u_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4817570Z test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_v_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4817973Z test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_w_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4818190Z test_opinfo_sample_generators_special_spherical_bessel_j0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4818391Z test_opinfo_sample_generators_special_xlog1py_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4818588Z test_opinfo_sample_generators_special_zeta_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4818779Z test_opinfo_sample_generators_split_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4819026Z test_opinfo_sample_generators_split_list_args_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4819220Z test_opinfo_sample_generators_split_with_sizes_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4819412Z test_opinfo_sample_generators_sqrt_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4819604Z test_opinfo_sample_generators_square_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4819797Z test_opinfo_sample_generators_squeeze_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4819986Z test_opinfo_sample_generators_stack_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4820174Z test_opinfo_sample_generators_std_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4820367Z test_opinfo_sample_generators_std_mean_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4820571Z test_opinfo_sample_generators_std_mean_unbiased_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4820769Z test_opinfo_sample_generators_std_unbiased_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4820949Z test_opinfo_sample_generators_stft_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4821134Z test_opinfo_sample_generators_sub_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4821320Z test_opinfo_sample_generators_sum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4821513Z test_opinfo_sample_generators_sum_to_size_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4821696Z test_opinfo_sample_generators_svd_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4821892Z test_opinfo_sample_generators_svd_lowrank_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4822082Z test_opinfo_sample_generators_symeig_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4822293Z test_opinfo_sample_generators_t_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4822497Z test_opinfo_sample_generators_take_along_dim_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4822678Z test_opinfo_sample_generators_take_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4822866Z test_opinfo_sample_generators_tan_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4823051Z test_opinfo_sample_generators_tanh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4823248Z test_opinfo_sample_generators_tensor_split_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4823446Z test_opinfo_sample_generators_tensordot_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4823633Z test_opinfo_sample_generators_tile_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4823818Z test_opinfo_sample_generators_to_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4824012Z test_opinfo_sample_generators_to_sparse_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4824197Z test_opinfo_sample_generators_topk_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4824378Z test_opinfo_sample_generators_trace_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4824571Z test_opinfo_sample_generators_transpose_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4824861Z test_opinfo_sample_generators_trapezoid_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4825049Z test_opinfo_sample_generators_trapz_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4825256Z test_opinfo_sample_generators_triangular_solve_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4825441Z test_opinfo_sample_generators_tril_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4825641Z test_opinfo_sample_generators_tril_indices_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4825855Z test_opinfo_sample_generators_triu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4826064Z test_opinfo_sample_generators_triu_indices_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4826264Z test_opinfo_sample_generators_true_divide_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4826445Z test_opinfo_sample_generators_trunc_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4826639Z test_opinfo_sample_generators_unbind_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4826834Z test_opinfo_sample_generators_unflatten_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4827028Z test_opinfo_sample_generators_unfold_copy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4827222Z test_opinfo_sample_generators_unfold_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4827413Z test_opinfo_sample_generators_uniform_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4827621Z test_opinfo_sample_generators_unique_consecutive_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4827811Z test_opinfo_sample_generators_unique_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4828029Z test_opinfo_sample_generators_unsqueeze_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4828210Z test_opinfo_sample_generators_var_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4828401Z test_opinfo_sample_generators_var_mean_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4828602Z test_opinfo_sample_generators_var_mean_unbiased_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4828799Z test_opinfo_sample_generators_var_unbiased_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4828987Z test_opinfo_sample_generators_vdot_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4829188Z test_opinfo_sample_generators_view_as_complex_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4829384Z test_opinfo_sample_generators_view_as_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4829582Z test_opinfo_sample_generators_view_as_real_cuda_complex64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4829779Z test_opinfo_sample_generators_view_copy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4829958Z test_opinfo_sample_generators_view_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4830146Z test_opinfo_sample_generators_vsplit_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4830333Z test_opinfo_sample_generators_vstack_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4830552Z test_opinfo_sample_generators_where_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4830740Z test_opinfo_sample_generators_xlogy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4830929Z test_opinfo_sample_generators_zero__cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4831118Z test_opinfo_sample_generators_zeros_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4831311Z test_opinfo_sample_generators_zeros_like_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4831435Z test_sample_input (__main__.TestOpInfos) ... ok (0.001s) 2023-01-11T21:23:47.4831564Z test_sample_input_metadata (__main__.TestOpInfos) ... ok (0.001s) 2023-01-11T21:23:47.4831711Z test_default_names (__main__.TestTestParametrization) ... ok (0.001s) 2023-01-11T21:23:47.4831877Z test_modules_decorator_misuse_error (__main__.TestTestParametrization) ... ok (0.001s) 2023-01-11T21:23:47.4852237Z test_multiple_handling_of_same_param_error (__main__.TestTestParametrization) ... ok (0.001s) 2023-01-11T21:23:47.4852415Z test_name_fn (__main__.TestTestParametrization) ... ok (0.002s) 2023-01-11T21:23:47.4852589Z test_ops_decorator_misuse_error (__main__.TestTestParametrization) ... ok (0.001s) 2023-01-11T21:23:47.4852766Z test_subtest_expected_failure_x_1 (__main__.TestTestParametrization) ... ok (0.000s) 2023-01-11T21:23:47.4852944Z test_subtest_expected_failure_x_2 (__main__.TestTestParametrization) ... expected failure (0.000s) 2023-01-11T21:23:47.4853115Z test_subtest_expected_failure_x_3 (__main__.TestTestParametrization) ... ok (0.000s) 2023-01-11T21:23:47.4853262Z test_subtest_names (__main__.TestTestParametrization) ... ok (0.001s) 2023-01-11T21:23:47.4853456Z test_two_things_subtest_expected_failure_x_1_y_4 (__main__.TestTestParametrization) ... expected failure (0.001s) 2023-01-11T21:23:47.4853647Z test_two_things_subtest_expected_failure_x_1_y_5 (__main__.TestTestParametrization) ... expected failure (0.001s) 2023-01-11T21:23:47.4853928Z test_two_things_subtest_expected_failure_x_1_y_6 (__main__.TestTestParametrization) ... expected failure (0.001s) 2023-01-11T21:23:47.4854208Z test_two_things_subtest_expected_failure_x_2_y_4 (__main__.TestTestParametrization) ... ok (0.000s) 2023-01-11T21:23:47.4854395Z test_two_things_subtest_expected_failure_x_2_y_5 (__main__.TestTestParametrization) ... ok (0.000s) 2023-01-11T21:23:47.4854804Z test_two_things_subtest_expected_failure_x_2_y_6 (__main__.TestTestParametrization) ... expected failure (0.001s) 2023-01-11T21:23:47.4854978Z test_two_things_subtest_expected_failure_x_3_y_4 (__main__.TestTestParametrization) ... 
ok (0.000s) 2023-01-11T21:23:47.4855175Z test_two_things_subtest_expected_failure_x_3_y_5 (__main__.TestTestParametrization) ... ok (0.000s) 2023-01-11T21:23:47.4855392Z test_two_things_subtest_expected_failure_x_3_y_6 (__main__.TestTestParametrization) ... expected failure (0.001s) 2023-01-11T21:23:47.4855583Z test_default_names_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4855787Z test_dtypes_composition_invalid_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4855987Z test_dtypes_composition_valid_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4856200Z test_multiple_handling_of_same_param_error_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4856377Z test_name_fn_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.002s) 2023-01-11T21:23:47.4856571Z test_ops_composition_names_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.523s) 2023-01-11T21:23:47.4856765Z test_subtest_expected_failure_x_1_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4857080Z test_subtest_expected_failure_x_2_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... expected failure (0.001s) 2023-01-11T21:23:47.4857281Z test_subtest_expected_failure_x_3_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4857467Z test_subtest_names_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4857696Z test_two_things_subtest_expected_failure_x_1_y_4_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... expected failure (0.001s) 2023-01-11T21:23:47.4857924Z test_two_things_subtest_expected_failure_x_1_y_5_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... expected failure (0.001s) 2023-01-11T21:23:47.4858152Z test_two_things_subtest_expected_failure_x_1_y_6_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... expected failure (0.001s) 2023-01-11T21:23:47.4858363Z test_two_things_subtest_expected_failure_x_2_y_4_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4858579Z test_two_things_subtest_expected_failure_x_2_y_5_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4858796Z test_two_things_subtest_expected_failure_x_2_y_6_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... expected failure (0.001s) 2023-01-11T21:23:47.4859012Z test_two_things_subtest_expected_failure_x_3_y_4_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4859217Z test_two_things_subtest_expected_failure_x_3_y_5_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4859439Z test_two_things_subtest_expected_failure_x_3_y_6_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... expected failure (0.001s) 2023-01-11T21:23:47.4859635Z test_unparametrized_names_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4859786Z test_assertEqual_longMessage_cuda (__main__.TestTestingCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4860372Z test_assertEqual_numpy_cuda_bool (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,582 - torch.cuda._sanitizer - INFO - Found Stream with id: 1003079584, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 
2023-01-11T21:23:47.4860489Z ok (0.004s) 2023-01-11T21:23:47.4861017Z test_assertEqual_numpy_cuda_complex128 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,585 - torch.cuda._sanitizer - INFO - Found Stream with id: 1002190448, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4861091Z ok (0.003s) 2023-01-11T21:23:47.4861613Z test_assertEqual_numpy_cuda_complex64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,589 - torch.cuda._sanitizer - INFO - Found Stream with id: 1009198656, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4861684Z ok (0.003s) 2023-01-11T21:23:47.4862199Z test_assertEqual_numpy_cuda_float16 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,592 - torch.cuda._sanitizer - INFO - Found Stream with id: 1001152576, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4862273Z ok (0.003s) 2023-01-11T21:23:47.4862782Z test_assertEqual_numpy_cuda_float32 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,596 - torch.cuda._sanitizer - INFO - Found Stream with id: 1003437184, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4862855Z ok (0.004s) 2023-01-11T21:23:47.4863363Z test_assertEqual_numpy_cuda_float64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,599 - torch.cuda._sanitizer - INFO - Found Stream with id: 1001806720, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4863466Z ok (0.003s) 2023-01-11T21:23:47.4863976Z test_assertEqual_numpy_cuda_int16 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,603 - torch.cuda._sanitizer - INFO - Found Stream with id: 1003544512, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4864049Z ok (0.003s) 2023-01-11T21:23:47.4864551Z test_assertEqual_numpy_cuda_int32 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,606 - torch.cuda._sanitizer - INFO - Found Stream with id: 1004346976, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4864626Z ok (0.003s) 2023-01-11T21:23:47.4865121Z test_assertEqual_numpy_cuda_int64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,609 - torch.cuda._sanitizer - INFO - Found Stream with id: 1007361680, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4865197Z ok (0.003s) 2023-01-11T21:23:47.4865754Z test_assertEqual_numpy_cuda_int8 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,612 - torch.cuda._sanitizer - INFO - Found Stream with id: 1002159456, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4865828Z ok (0.003s) 2023-01-11T21:23:47.4866329Z test_assertEqual_numpy_cuda_uint8 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,616 - torch.cuda._sanitizer - INFO - Found Stream with id: 113169504, but no matching stream creation in the trace. 
Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4866401Z ok (0.003s) 2023-01-11T21:23:47.4866648Z test_cuda_assert_should_not_stop_common_distributed_test_suite_cuda (__main__.TestTestingCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:23:47.4866892Z test_cuda_assert_should_stop_common_device_type_test_suite_cuda (__main__.TestTestingCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:23:47.4867181Z test_cuda_assert_should_stop_common_utils_test_suite_cuda (__main__.TestTestingCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:23:47.4867331Z test_get_supported_dtypes_cuda (__main__.TestTestingCUDA) ... ok (0.111s) 2023-01-11T21:23:47.4867860Z test_isclose_atol_rtol_greater_than_zero_cuda_bool (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,732 - torch.cuda._sanitizer - INFO - Found Stream with id: 1003039408, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4867933Z ok (0.007s) 2023-01-11T21:23:47.4868455Z test_isclose_atol_rtol_greater_than_zero_cuda_float16 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,740 - torch.cuda._sanitizer - INFO - Found Stream with id: 996997792, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4868530Z ok (0.007s) 2023-01-11T21:23:47.4869064Z test_isclose_atol_rtol_greater_than_zero_cuda_float32 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,747 - torch.cuda._sanitizer - INFO - Found Stream with id: 997146432, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4869140Z ok (0.007s) 2023-01-11T21:23:47.4869668Z test_isclose_atol_rtol_greater_than_zero_cuda_float64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,754 - torch.cuda._sanitizer - INFO - Found Stream with id: 996981072, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4869776Z ok (0.007s) 2023-01-11T21:23:47.4870307Z test_isclose_atol_rtol_greater_than_zero_cuda_int16 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,761 - torch.cuda._sanitizer - INFO - Found Stream with id: 1004551360, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4870381Z ok (0.007s) 2023-01-11T21:23:47.4870903Z test_isclose_atol_rtol_greater_than_zero_cuda_int32 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,768 - torch.cuda._sanitizer - INFO - Found Stream with id: 996942240, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4870976Z ok (0.007s) 2023-01-11T21:23:47.4871490Z test_isclose_atol_rtol_greater_than_zero_cuda_int64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,775 - torch.cuda._sanitizer - INFO - Found Stream with id: 1005064944, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 
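The repeated torch.cuda._sanitizer INFO lines come from CSAN, the CUDA stream sanitizer, which backfills its trace when it is switched on after CUDA work has already run. A hedged sketch of enabling it (enable_cuda_sanitizer() is a private API and may change; the TORCH_CUDA_SANITIZER=1 environment variable is the usual switch, and a CUDA device is required):

import torch
import torch.cuda._sanitizer as csan

csan.enable_cuda_sanitizer()  # start tracing kernel launches and stream events

s = torch.cuda.Stream()
x = torch.zeros(8, device="cuda")
with torch.cuda.stream(s):
    y = x + 1  # an unsynchronized cross-stream access like this is what CSAN reports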
2023-01-11T21:23:47.4871566Z ok (0.007s) 2023-01-11T21:23:47.4872078Z test_isclose_atol_rtol_greater_than_zero_cuda_int8 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,782 - torch.cuda._sanitizer - INFO - Found Stream with id: 1006226560, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4872153Z ok (0.007s) 2023-01-11T21:23:47.4872674Z test_isclose_atol_rtol_greater_than_zero_cuda_uint8 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,789 - torch.cuda._sanitizer - INFO - Found Stream with id: 1000930384, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4872751Z ok (0.007s) 2023-01-11T21:23:47.4873238Z test_isclose_bool_cuda (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,796 - torch.cuda._sanitizer - INFO - Found Stream with id: 1004529328, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4873310Z ok (0.007s) 2023-01-11T21:23:47.4873854Z test_isclose_complex_cuda_complex128 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,806 - torch.cuda._sanitizer - INFO - Found Stream with id: 996944224, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4873932Z ok (0.240s) 2023-01-11T21:23:47.4874436Z test_isclose_complex_cuda_complex64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:46,046 - torch.cuda._sanitizer - INFO - Found Stream with id: 1008427296, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4874512Z ok (0.235s) 2023-01-11T21:23:47.4874665Z test_isclose_equality_shortcut_cuda (__main__.TestTestingCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4875159Z test_isclose_float_cuda_float16 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:46,280 - torch.cuda._sanitizer - INFO - Found Stream with id: 1004815696, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4875233Z ok (0.030s) 2023-01-11T21:23:47.4875732Z test_isclose_float_cuda_float32 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:46,311 - torch.cuda._sanitizer - INFO - Found Stream with id: 1005532160, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4875804Z ok (0.031s) 2023-01-11T21:23:47.4876297Z test_isclose_float_cuda_float64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:46,342 - torch.cuda._sanitizer - INFO - Found Stream with id: 1004110048, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4876401Z ok (0.031s) 2023-01-11T21:23:47.4876902Z test_isclose_integer_cuda_int16 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:46,373 - torch.cuda._sanitizer - INFO - Found Stream with id: 1003385264, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4876974Z ok (0.014s) 2023-01-11T21:23:47.4877121Z test_isclose_integer_cuda_int32 (__main__.TestTestingCUDA) ... 
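The test_isclose_* cases above exercise torch.isclose across dtypes. A small illustration of its tolerance and NaN semantics (CPU tensors for brevity; the tests run the same checks on CUDA):

import torch

a = torch.tensor([1.0, float("nan")])
b = torch.tensor([1.000001, float("nan")])

print(torch.isclose(a, b))                      # tensor([ True, False]); NaN != NaN by default
print(torch.isclose(a, b, equal_nan=True))      # tensor([True, True])
print(torch.isclose(a, b, rtol=0.0, atol=0.0))  # tensor([False, False]); exact match only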
ok (0.014s) 2023-01-11T21:23:47.4877262Z test_isclose_integer_cuda_int64 (__main__.TestTestingCUDA) ... ok (0.014s) 2023-01-11T21:23:47.4877400Z test_isclose_integer_cuda_int8 (__main__.TestTestingCUDA) ... ok (0.014s) 2023-01-11T21:23:47.4877542Z test_isclose_integer_cuda_uint8 (__main__.TestTestingCUDA) ... ok (0.014s) 2023-01-11T21:23:47.4877712Z test_isclose_nan_equality_shortcut_cuda_complex128 (__main__.TestTestingCUDA) ... ok (0.002s) 2023-01-11T21:23:47.4877877Z test_isclose_nan_equality_shortcut_cuda_complex64 (__main__.TestTestingCUDA) ... ok (0.002s) 2023-01-11T21:23:47.4878037Z test_isclose_nan_equality_shortcut_cuda_float16 (__main__.TestTestingCUDA) ... ok (0.002s) 2023-01-11T21:23:47.4878200Z test_isclose_nan_equality_shortcut_cuda_float32 (__main__.TestTestingCUDA) ... ok (0.002s) 2023-01-11T21:23:47.4878358Z test_isclose_nan_equality_shortcut_cuda_float64 (__main__.TestTestingCUDA) ... ok (0.002s) 2023-01-11T21:23:47.4878917Z test_make_tensor_complex32_cuda (__main__.TestTestingCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_creation.py:167: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/EmptyTensor.cpp:32.) 2023-01-11T21:23:47.4879046Z result = torch.empty(shape, device=device, dtype=dtype) 2023-01-11T21:23:47.4879116Z ok (0.001s) 2023-01-11T21:23:47.4879251Z test_make_tensor_cuda_bool (__main__.TestTestingCUDA) ... ok (0.009s) 2023-01-11T21:23:47.4879776Z test_make_tensor_cuda_complex64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:46,466 - torch.cuda._sanitizer - INFO - Found Stream with id: 1004253456, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4879847Z ok (0.009s) 2023-01-11T21:23:47.4880342Z test_make_tensor_cuda_float32 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:46,472 - torch.cuda._sanitizer - INFO - Found Stream with id: 1008557520, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4880414Z ok (0.015s) 2023-01-11T21:23:47.4880899Z test_make_tensor_cuda_int64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:46,488 - torch.cuda._sanitizer - INFO - Found Stream with id: 1002658816, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4880975Z ok (0.015s) 2023-01-11T21:23:47.4880982Z 2023-01-11T21:23:47.4881181Z ---------------------------------------------------------------------- 2023-01-11T21:23:47.4881268Z Ran 1248 tests in 32.945s 2023-01-11T21:23:47.4881274Z 2023-01-11T21:23:47.4881369Z OK (skipped=25, expected failures=13) 2023-01-11T21:23:47.4881375Z 2023-01-11T21:23:47.4881464Z Generating XML reports... 
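The test_make_tensor_* cases cover torch.testing.make_tensor, the factory the test suite uses to build inputs; the ComplexHalf UserWarning above is triggered on the chalf path. A short sketch (whether the warning fires may depend on the build):

import torch
from torch.testing import make_tensor

t = make_tensor((2, 3), dtype=torch.float32, device="cpu", low=-1, high=1)
c = make_tensor((2, 3), dtype=torch.complex32, device="cpu")  # may emit the ComplexHalf warning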
2023-01-11T21:23:47.4881748Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertClose-20230111212313.xml 2023-01-11T21:23:47.4882054Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseContainer-20230111212313.xml 2023-01-11T21:23:47.4882368Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseErrorMessage-20230111212313.xml 2023-01-11T21:23:47.4882689Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseMultiDeviceCUDA-20230111212313.xml 2023-01-11T21:23:47.4883022Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseQuantized-20230111212313.xml 2023-01-11T21:23:47.4883324Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseBSC-20230111212313.xml 2023-01-11T21:23:47.4883621Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseBSR-20230111212313.xml 2023-01-11T21:23:47.4883909Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseCOO-20230111212313.xml 2023-01-11T21:23:47.4884201Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseCSC-20230111212313.xml 2023-01-11T21:23:47.4884507Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseCSR-20230111212313.xml 2023-01-11T21:23:47.4884786Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestFrameworkUtils-20230111212313.xml 2023-01-11T21:23:47.4885056Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestImports-20230111212313.xml 2023-01-11T21:23:47.4885371Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestOpInfoSampleFunctionsCUDA-20230111212313.xml 2023-01-11T21:23:47.4885631Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestOpInfos-20230111212313.xml 2023-01-11T21:23:47.4885926Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestTestParametrization-20230111212313.xml 2023-01-11T21:23:47.4886269Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestTestParametrizationDeviceTypeCUDA-20230111212313.xml 2023-01-11T21:23:47.4886533Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestTestingCUDA-20230111212313.xml 2023-01-11T21:23:47.4886546Z 2023-01-11T21:23:47.4887047Z ##[endgroup] 2023-01-11T21:23:47.4887318Z FINISHED PRINTING LOG FILE of test_testing (/var/lib/jenkins/workspace/test/test-reports/test_testing_qs5jmj7o) 2023-01-11T21:23:47.4887327Z 2023-01-11T21:23:49.3658983Z Ignoring disabled issues: [] 2023-01-11T21:23:49.3885241Z Running benchmark_utils/test_benchmark_utils ... [2023-01-11 21:23:49.387825] 2023-01-11T21:23:49.3886159Z Executing ['/opt/conda/bin/python', '-bb', 'benchmark_utils/test_benchmark_utils.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:23:49.388178] 2023-01-11T21:23:53.8296155Z 2023-01-11T21:23:53.8297833Z Expand the folded group to see the log file of benchmark_utils/test_benchmark_utils 2023-01-11T21:23:53.8299028Z ##[group]PRINTING LOG FILE of benchmark_utils/test_benchmark_utils (/var/lib/jenkins/workspace/test/test-reports/benchmark_utils-test_benchmark_utils_qeotpt96) 2023-01-11T21:23:53.8299417Z 2023-01-11T21:23:53.8299538Z Running tests... 
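benchmark_utils/test_benchmark_utils exercises torch.utils.benchmark. A minimal sketch of the adaptive timing that test_adaptive_timer checks, using the public Timer API (the stmt/setup strings are illustrative):

from torch.utils.benchmark import Timer

t = Timer(stmt="x @ x", setup="import torch; x = torch.ones((64, 64))")
m = t.blocked_autorange(min_run_time=0.2)  # picks the number of runs adaptively
print(m)  # a Measurement with median, IQR, and run counts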
2023-01-11T21:23:53.8300131Z ---------------------------------------------------------------------- 2023-01-11T21:23:53.8300773Z Test results will be stored in test-reports/python-unittest/benchmark_utils.test_benchmark_utils 2023-01-11T21:23:53.8301257Z test_adaptive_timer (__main__.TestBenchmarkUtils) ... ok (1.277s) 2023-01-11T21:23:53.8301779Z test_collect_callgrind (__main__.TestBenchmarkUtils) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:23:53.8302392Z test_collect_cpp_callgrind (__main__.TestBenchmarkUtils) ... skip: Failing on clang, see 74398 (0.001s) 2023-01-11T21:23:53.8302875Z test_compare (__main__.TestBenchmarkUtils) ... ok (0.152s) 2023-01-11T21:23:53.8303339Z test_cpp_timer (__main__.TestBenchmarkUtils) ... skip: Failing on clang, see 74398 (0.000s) 2023-01-11T21:23:53.8303748Z test_fuzzer (__main__.TestBenchmarkUtils) ... ok (0.002s) 2023-01-11T21:23:53.8304176Z test_manipulate_callgrind_stats (__main__.TestBenchmarkUtils) ... ok (0.037s) 2023-01-11T21:23:53.8304594Z test_timer (__main__.TestBenchmarkUtils) ... ok (0.910s) 2023-01-11T21:23:53.8305054Z test_timer_tiny_fast_snippet (__main__.TestBenchmarkUtils) ... skip: Failing on clang, see 74398 (0.000s) 2023-01-11T21:23:53.8305600Z 2023-01-11T21:23:53.8305897Z ---------------------------------------------------------------------- 2023-01-11T21:23:53.8306270Z Ran 9 tests in 2.381s 2023-01-11T21:23:53.8306438Z 2023-01-11T21:23:53.8306561Z OK (skipped=4) 2023-01-11T21:23:53.8306738Z 2023-01-11T21:23:53.8306869Z Generating XML reports... 2023-01-11T21:23:53.8307551Z Generated XML report: test-reports/python-unittest/benchmark_utils.test_benchmark_utils/TEST-TestBenchmarkUtils-20230111212350.xml 2023-01-11T21:23:53.8307926Z 2023-01-11T21:23:53.8308273Z ##[endgroup] 2023-01-11T21:23:53.8308924Z FINISHED PRINTING LOG FILE of benchmark_utils/test_benchmark_utils (/var/lib/jenkins/workspace/test/test-reports/benchmark_utils-test_benchmark_utils_qeotpt96) 2023-01-11T21:23:53.8309331Z 2023-01-11T21:23:55.7315948Z Ignoring disabled issues: [] 2023-01-11T21:23:55.7550646Z Running dynamo/test_comptime ... [2023-01-11 21:23:55.754431] 2023-01-11T21:23:55.7551468Z Executing ['/opt/conda/bin/python', '-bb', 'dynamo/test_comptime.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:23:55.754777] 2023-01-11T21:23:58.8797244Z 2023-01-11T21:23:58.8798254Z Expand the folded group to see the log file of dynamo/test_comptime 2023-01-11T21:23:58.8799403Z ##[group]PRINTING LOG FILE of dynamo/test_comptime (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_comptime_q0eg521l) 2023-01-11T21:23:58.8799723Z 2023-01-11T21:23:58.8799843Z Running tests... 2023-01-11T21:23:58.8800399Z ---------------------------------------------------------------------- 2023-01-11T21:23:58.8800958Z Test results will be stored in test-reports/python-unittest/dynamo.test_comptime 2023-01-11T21:23:58.8801310Z test_get_local (__main__.ComptimeTests) ... ok (1.108s) 2023-01-11T21:23:58.8801704Z test_graph_break (__main__.ComptimeTests) ... 
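dynamo/test_comptime drives torch._dynamo's comptime hooks, which execute at compile time inside a traced frame; the frames/stats lines in its output are dynamo's per-run counters. A hedged sketch, assuming the private torch._dynamo.comptime API of this branch:

import torch
import torch._dynamo as dynamo
from torch._dynamo.comptime import comptime

@dynamo.optimize("eager")  # era-appropriate spelling; newer releases use torch.compile
def f(x):
    y = x * 2
    comptime.print_graph()   # prints the FX graph captured so far, as in test_print_graph
    comptime.print_locals()  # e.g. "y = TensorVariable()", as in test_print_locals
    return y + 3

f(torch.ones(3))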
frames [('total', 1), ('ok', 1)]
2023-01-11T21:23:58.8802199Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)]
2023-01-11T21:23:58.8802615Z frames [('total', 6), ('ok', 6)]
2023-01-11T21:23:58.8802958Z stats [('calls_captured', 5), ('unique_graphs', 4), ('fusions_possible', 1)]
2023-01-11T21:23:58.8803300Z unimplemented []
2023-01-11T21:23:58.8803691Z graph_break [('ComptimeContext.graph_break', 2)]
2023-01-11T21:23:58.8804209Z inline_call [('ComptimeContext.graph_break', 1)]
2023-01-11T21:23:58.8804443Z ok (0.016s)
2023-01-11T21:23:58.8804757Z test_print_bt (__main__.ComptimeTests) ... File "/var/lib/jenkins/workspace/test/dynamo/test_comptime.py", line 152, in f
2023-01-11T21:23:58.8805040Z y = g(y)
2023-01-11T21:23:58.8805301Z File "/var/lib/jenkins/workspace/test/dynamo/test_comptime.py", line 145, in g
2023-01-11T21:23:58.8805570Z comptime.print_bt()
2023-01-11T21:23:58.8805696Z
2023-01-11T21:23:58.8805817Z frames [('total', 1), ('ok', 1)]
2023-01-11T21:23:58.8806026Z inline_call []
2023-01-11T21:23:58.8806396Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)]
2023-01-11T21:23:58.8806735Z ok (0.090s)
2023-01-11T21:23:58.8807089Z test_print_disas (__main__.ComptimeTests) ... 54 0 LOAD_FAST 0 (x)
2023-01-11T21:23:58.8807455Z 2 LOAD_CONST 1 (2)
2023-01-11T21:23:58.8807732Z 4 BINARY_MULTIPLY
2023-01-11T21:23:58.8808026Z 6 STORE_FAST 1 (y)
2023-01-11T21:23:58.8808195Z
2023-01-11T21:23:58.8808316Z 56 8 LOAD_GLOBAL 0 (comptime)
2023-01-11T21:23:58.8808457Z
2023-01-11T21:23:58.8808683Z 57 10 LOAD_CONST 2 ()
2023-01-11T21:23:58.8809317Z 12 LOAD_CONST 3 ('ComptimeTests.test_print_disas..f.._')
2023-01-11T21:23:58.8809668Z 14 MAKE_FUNCTION 0
2023-01-11T21:23:58.8809935Z 16 CALL_FUNCTION 1
2023-01-11T21:23:58.8810271Z 18 STORE_FAST 2 (_)
2023-01-11T21:23:58.8810402Z
2023-01-11T21:23:58.8810503Z 60 20 LOAD_GLOBAL 0 (comptime)
2023-01-11T21:23:58.8810749Z 22 LOAD_METHOD 1 (print_disas)
2023-01-11T21:23:58.8810982Z 24 CALL_METHOD 0
2023-01-11T21:23:58.8811233Z --> 26 POP_TOP
2023-01-11T21:23:58.8811346Z
2023-01-11T21:23:58.8811449Z 62 28 LOAD_FAST 1 (y)
2023-01-11T21:23:58.8811675Z 30 LOAD_CONST 4 (3)
2023-01-11T21:23:58.8811884Z 32 BINARY_ADD
2023-01-11T21:23:58.8812077Z 34 RETURN_VALUE
2023-01-11T21:23:58.8812199Z
2023-01-11T21:23:58.8812327Z frames [('total', 1), ('ok', 1)]
2023-01-11T21:23:58.8812555Z inline_call []
2023-01-11T21:23:58.8812864Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)]
2023-01-11T21:23:58.8813102Z ok (0.006s)
2023-01-11T21:23:58.8813327Z test_print_graph (__main__.ComptimeTests) ...
2023-01-11T21:23:58.8813485Z
2023-01-11T21:23:58.8813490Z
2023-01-11T21:23:58.8813595Z def forward(self, x : torch.Tensor):
2023-01-11T21:23:58.8813975Z # File: /var/lib/jenkins/workspace/test/dynamo/test_comptime.py:26, code: y = x * 2
2023-01-11T21:23:58.8814246Z mul = x * 2; x = None
2023-01-11T21:23:58.8814433Z
2023-01-11T21:23:58.8815099Z frames [('total', 1), ('ok', 1)]
2023-01-11T21:23:58.8815376Z inline_call []
2023-01-11T21:23:58.8815737Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)]
2023-01-11T21:23:58.8815970Z ok (0.006s)
2023-01-11T21:23:58.8816199Z test_print_guards (__main__.ComptimeTests) ...
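The test_print_disas dump above is ordinary CPython bytecode; the stdlib dis module renders the same table (source line, offset, opname, argument) for any function:

import dis

def f(x):
    y = x * 2
    return y + 3

dis.dis(f)  # LOAD_FAST/BINARY_MULTIPLY/... rows like those in the log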
-
2023-01-11T21:23:58.8816479Z local 'x' TENSOR_MATCH
2023-01-11T21:23:58.8816674Z {
2023-01-11T21:23:58.8816896Z 'guard_types': None,
2023-01-11T21:23:58.8817135Z 'code': None,
2023-01-11T21:23:58.8817376Z 'obj_weakref': None
2023-01-11T21:23:58.8817621Z 'guarded_class': None
2023-01-11T21:23:58.8817816Z }
2023-01-11T21:23:58.8817991Z
2023-01-11T21:23:58.8818208Z frames [('total', 1), ('ok', 1)]
2023-01-11T21:23:58.8818407Z inline_call []
2023-01-11T21:23:58.8818715Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)]
2023-01-11T21:23:58.8819014Z ok (0.006s)
2023-01-11T21:23:58.8819269Z test_print_locals (__main__.ComptimeTests) ... x = TensorVariable()
2023-01-11T21:23:58.8819523Z y = TensorVariable()
2023-01-11T21:23:58.8819730Z _ = ConstantVariable(NoneType)
2023-01-11T21:23:58.8819983Z frames [('total', 1), ('ok', 1)]
2023-01-11T21:23:58.8820180Z inline_call []
2023-01-11T21:23:58.8820475Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)]
2023-01-11T21:23:58.8820719Z ok (0.006s)
2023-01-11T21:23:58.8821019Z test_print_value_stack (__main__.ComptimeTests) ... - TensorVariable()
2023-01-11T21:23:58.8821304Z frames [('total', 1), ('ok', 1)]
2023-01-11T21:23:58.8821503Z inline_call []
2023-01-11T21:23:58.8821804Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)]
2023-01-11T21:23:58.8822212Z ok (0.007s)
2023-01-11T21:23:58.8822321Z
2023-01-11T21:23:58.8822522Z ----------------------------------------------------------------------
2023-01-11T21:23:58.8822781Z Ran 8 tests in 1.246s
2023-01-11T21:23:58.8822908Z
2023-01-11T21:23:58.8822980Z OK
2023-01-11T21:23:58.8823072Z
2023-01-11T21:23:58.8823166Z Generating XML reports...
2023-01-11T21:23:58.8823585Z Generated XML report: test-reports/python-unittest/dynamo.test_comptime/TEST-ComptimeTests-20230111212357.xml
2023-01-11T21:23:58.8823829Z
2023-01-11T21:23:58.8824143Z ##[endgroup]
2023-01-11T21:23:58.8824548Z FINISHED PRINTING LOG FILE of dynamo/test_comptime (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_comptime_q0eg521l)
2023-01-11T21:23:58.8824779Z
2023-01-11T21:24:00.7868078Z Ignoring disabled issues: []
2023-01-11T21:24:00.8098487Z Running dynamo/test_functions ... [2023-01-11 21:24:00.809254]
2023-01-11T21:24:00.8099867Z Executing ['/opt/conda/bin/python', '-bb', 'dynamo/test_functions.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:00.809617]
2023-01-11T21:24:04.6833605Z
2023-01-11T21:24:04.6834226Z Expand the folded group to see the log file of dynamo/test_functions
2023-01-11T21:24:04.6835796Z ##[group]PRINTING LOG FILE of dynamo/test_functions (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_functions_ovkhnnj2)
2023-01-11T21:24:04.6836068Z
2023-01-11T21:24:04.6836188Z Running tests...
2023-01-11T21:24:04.6837464Z ----------------------------------------------------------------------
2023-01-11T21:24:04.6838026Z Test results will be stored in test-reports/python-unittest/dynamo.test_functions
2023-01-11T21:24:04.6838429Z test_T (__main__.FunctionTests) ... ok (1.198s)
2023-01-11T21:24:04.6838916Z test_add (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)]
2023-01-11T21:24:04.6839457Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)]
2023-01-11T21:24:04.6839759Z ok (0.006s)
2023-01-11T21:24:04.6840373Z test_add_ (__main__.FunctionTests) ...
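The stats [('calls_captured', ...)] lines under each FunctionTests case are dynamo's compile counters. A sketch using the private torch._dynamo.testing.CompileCounter backend, which exposes the same idea programmatically (attribute names as in this era's source):

import torch
import torch._dynamo as dynamo
from torch._dynamo.testing import CompileCounter

cnt = CompileCounter()  # a backend that only counts compiled frames and captured ops

@dynamo.optimize(cnt)
def add(a, b):
    return a + b

add(torch.ones(2), torch.ones(2))
print(cnt.frame_count, cnt.op_count)  # cf. 'unique_graphs' and 'calls_captured' above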
/var/lib/jenkins/workspace/test/dynamo/test_functions.py:73: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
2023-01-11T21:24:04.6841060Z a_copy = torch.tensor(a)
2023-01-11T21:24:04.6841944Z /opt/conda/lib/python3.10/site-packages/torch/_dynamo/utils.py:1052: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
2023-01-11T21:24:04.6842536Z return node.target(*args, **kwargs)
2023-01-11T21:24:04.6843109Z .7:5: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
2023-01-11T21:24:04.6843673Z tensor = torch.tensor(a); a = None
2023-01-11T21:24:04.6844130Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)]
2023-01-11T21:24:04.6844437Z ok (0.015s)
2023-01-11T21:24:04.6845233Z test_addcdiv (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)]
2023-01-11T21:24:04.6845604Z ok (0.008s)
2023-01-11T21:24:04.6846071Z test_addcdiv_ (__main__.FunctionTests) ... /var/lib/jenkins/workspace/test/dynamo/test_functions.py:84: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
2023-01-11T21:24:04.6846609Z a_copy = torch.tensor(a)
2023-01-11T21:24:04.6847293Z /opt/conda/lib/python3.10/site-packages/torch/_dynamo/utils.py:1052: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
2023-01-11T21:24:04.6847741Z return node.target(*args, **kwargs)
2023-01-11T21:24:04.6848228Z .12:5: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
2023-01-11T21:24:04.6848631Z tensor = torch.tensor(a); a = None
2023-01-11T21:24:04.6849041Z stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)]
2023-01-11T21:24:04.6849400Z ok (0.015s)
2023-01-11T21:24:04.6849757Z test_build_list_unpack (__main__.FunctionTests) ... inline_call []
2023-01-11T21:24:04.6850303Z stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)]
2023-01-11T21:24:04.6850660Z ok (0.012s)
2023-01-11T21:24:04.6851075Z test_chunks1 (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)]
2023-01-11T21:24:04.6851584Z ok (0.008s)
2023-01-11T21:24:04.6852125Z test_const_tuple_add1 (__main__.FunctionTests) ... stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)]
2023-01-11T21:24:04.6852513Z ok (0.008s)
2023-01-11T21:24:04.6853035Z test_const_tuple_add2 (__main__.FunctionTests) ... stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)]
2023-01-11T21:24:04.6853400Z ok (0.008s)
2023-01-11T21:24:04.6854009Z test_constant1 (__main__.FunctionTests) ...
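The UserWarning repeated above is PyTorch's standard guidance for copy-constructing tensors; the fix it suggests looks like this:

import torch

a = torch.ones(3, requires_grad=True)

bad = torch.tensor(a)      # works, but raises the UserWarning seen in the log
good = a.clone().detach()  # recommended copy construction
tracked = a.clone().detach().requires_grad_(True)  # copy that participates in autograd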
stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6854381Z ok (0.007s) 2023-01-11T21:24:04.6855057Z test_constant2 (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6855397Z ok (0.007s) 2023-01-11T21:24:04.6855817Z test_constant3 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6856159Z ok (0.005s) 2023-01-11T21:24:04.6856683Z test_constant4 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6857052Z ok (0.005s) 2023-01-11T21:24:04.6857544Z test_default_dict (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6857878Z ok (0.010s) 2023-01-11T21:24:04.6858321Z test_del (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6858644Z ok (0.007s) 2023-01-11T21:24:04.6859123Z test_device (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6859485Z ok (0.005s) 2023-01-11T21:24:04.6860012Z test_device_constant (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6860381Z ok (0.011s) 2023-01-11T21:24:04.6860860Z test_dict_copy (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6861224Z ok (0.005s) 2023-01-11T21:24:04.6861717Z test_dict_ops (__main__.FunctionTests) ... stats [('calls_captured', 8), ('fusions_possible', 7), ('unique_graphs', 1)] 2023-01-11T21:24:04.6862156Z ok (0.013s) 2023-01-11T21:24:04.6862546Z test_dict_param_keys (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6862830Z ok (0.006s) 2023-01-11T21:24:04.6863215Z test_distributed_is_available (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6863508Z ok (0.005s) 2023-01-11T21:24:04.6863902Z test_distributed_is_initialized (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6864197Z ok (0.005s) 2023-01-11T21:24:04.6864556Z test_dtype (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6864827Z ok (0.005s) 2023-01-11T21:24:04.6865203Z test_dtype_compare (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6865479Z ok (0.007s) 2023-01-11T21:24:04.6865839Z test_finfo (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6866109Z ok (0.007s) 2023-01-11T21:24:04.6866472Z test_float (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6866736Z ok (0.005s) 2023-01-11T21:24:04.6867115Z test_fn_with_self_set (__main__.FunctionTests) ... stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)] 2023-01-11T21:24:04.6867400Z ok (0.010s) 2023-01-11T21:24:04.6867758Z test_fstrings1 (__main__.FunctionTests) ... 
stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6868094Z ok (0.005s) 2023-01-11T21:24:04.6868466Z test_fstrings2 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6868731Z ok (0.005s) 2023-01-11T21:24:04.6869100Z test_fstrings3 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6869381Z ok (0.005s) 2023-01-11T21:24:04.6869625Z test_funcdef_closure (__main__.FunctionTests) ... inline_call [] 2023-01-11T21:24:04.6869985Z stats [('calls_captured', 10), ('fusions_possible', 9), ('unique_graphs', 1)] 2023-01-11T21:24:04.6870232Z ok (0.014s) 2023-01-11T21:24:04.6870615Z test_get_default_dtype (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6870897Z ok (0.005s) 2023-01-11T21:24:04.6871270Z test_globalfn (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6871552Z ok (0.005s) 2023-01-11T21:24:04.6871922Z test_globalmodule (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6872206Z ok (0.009s) 2023-01-11T21:24:04.6872577Z test_globalvar (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6872860Z ok (0.006s) 2023-01-11T21:24:04.6873221Z test_import1 (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6873502Z ok (0.006s) 2023-01-11T21:24:04.6873875Z test_indirect1 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6874146Z ok (0.005s) 2023-01-11T21:24:04.6874515Z test_indirect2 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6874798Z ok (0.005s) 2023-01-11T21:24:04.6875154Z test_indirect3 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6875429Z ok (0.005s) 2023-01-11T21:24:04.6875682Z test_inline_jit_annotations (__main__.FunctionTests) ... inline_call [] 2023-01-11T21:24:04.6876089Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6876332Z ok (0.007s) 2023-01-11T21:24:04.6876713Z test_inline_softmax (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6876996Z ok (0.009s) 2023-01-11T21:24:04.6877236Z test_inline_with_default (__main__.FunctionTests) ... inline_call [] 2023-01-11T21:24:04.6877605Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6877853Z ok (0.006s) 2023-01-11T21:24:04.6878087Z test_inner_function (__main__.FunctionTests) ... inline_call [] 2023-01-11T21:24:04.6878453Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6878696Z ok (0.005s) 2023-01-11T21:24:04.6879096Z test_is_contiguous_memory_format (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6879385Z ok (0.005s) 2023-01-11T21:24:04.6879842Z test_is_fx_tracing (__main__.FunctionTests) ... 
stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6880117Z ok (0.005s) 2023-01-11T21:24:04.6880539Z test_is_in_onnx_export (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6880839Z ok (0.005s) 2023-01-11T21:24:04.6881204Z test_is_not_null (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6881484Z ok (0.005s) 2023-01-11T21:24:04.6881862Z test_is_quantized (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6882178Z ok (0.005s) 2023-01-11T21:24:04.6882551Z test_is_sparse (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6882832Z ok (0.004s) 2023-01-11T21:24:04.6883216Z test_islice_chain (__main__.FunctionTests) ... stats [('calls_captured', 6), ('fusions_possible', 5), ('unique_graphs', 1)] 2023-01-11T21:24:04.6883493Z ok (0.011s) 2023-01-11T21:24:04.6883866Z test_jit_annotate (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6884149Z ok (0.006s) 2023-01-11T21:24:04.6884524Z test_len_constant_dict (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6884810Z ok (0.005s) 2023-01-11T21:24:04.6885186Z test_len_constant_list (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6885476Z ok (0.005s) 2023-01-11T21:24:04.6885867Z test_len_constant_misc_iterables (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6886163Z ok (0.005s) 2023-01-11T21:24:04.6886541Z test_len_tensor (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6886816Z ok (0.005s) 2023-01-11T21:24:04.6887186Z test_list_add (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6887464Z ok (0.005s) 2023-01-11T21:24:04.6887829Z test_list_clear (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6888107Z ok (0.007s) 2023-01-11T21:24:04.6888484Z test_list_convert (__main__.FunctionTests) ... stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)] 2023-01-11T21:24:04.6888770Z ok (0.008s) 2023-01-11T21:24:04.6889143Z test_list_reversed (__main__.FunctionTests) ... stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:24:04.6889425Z ok (0.009s) 2023-01-11T21:24:04.6889846Z test_list_slice_assignment (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6890134Z ok (0.005s) 2023-01-11T21:24:04.6890503Z test_list_truth (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6890788Z ok (0.005s) 2023-01-11T21:24:04.6891159Z test_listarg1 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6891431Z ok (0.005s) 2023-01-11T21:24:04.6891803Z test_listarg2 (__main__.FunctionTests) ... 
stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6892080Z ok (0.006s) 2023-01-11T21:24:04.6892443Z test_listarg3 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6892715Z ok (0.005s) 2023-01-11T21:24:04.6893078Z test_listarg4 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6893351Z ok (0.005s) 2023-01-11T21:24:04.6893716Z test_listarg5 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6894076Z ok (0.005s) 2023-01-11T21:24:04.6894458Z test_load_global_bool (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6894989Z ok (0.005s) 2023-01-11T21:24:04.6895244Z test_map_sum (__main__.FunctionTests) ... inline_call [] 2023-01-11T21:24:04.6895721Z stats [('calls_captured', 8), ('fusions_possible', 7), ('unique_graphs', 1)] 2023-01-11T21:24:04.6895957Z ok (0.013s) 2023-01-11T21:24:04.6896329Z test_methodcall1 (__main__.FunctionTests) ... inline_call [] 2023-01-11T21:24:04.6896692Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6896921Z ok (0.007s) 2023-01-11T21:24:04.6897154Z test_methodcall2 (__main__.FunctionTests) ... inline_call [] 2023-01-11T21:24:04.6897505Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6897733Z ok (0.007s) 2023-01-11T21:24:04.6897966Z test_methodcall3 (__main__.FunctionTests) ... inline_call [] 2023-01-11T21:24:04.6898311Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6898546Z ok (0.007s) 2023-01-11T21:24:04.6898907Z test_min_max (__main__.FunctionTests) ... stats [('calls_captured', 11), ('fusions_possible', 10), ('unique_graphs', 1)] 2023-01-11T21:24:04.6899187Z ok (0.017s) 2023-01-11T21:24:04.6899563Z test_module_constant (__main__.FunctionTests) ... stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)] 2023-01-11T21:24:04.6899838Z ok (0.009s) 2023-01-11T21:24:04.6900219Z test_ndim (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6900524Z ok (0.005s) 2023-01-11T21:24:04.6900877Z test_pop (__main__.FunctionTests) ... stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)] 2023-01-11T21:24:04.6901149Z ok (0.009s) 2023-01-11T21:24:04.6901512Z test_range1 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6901786Z ok (0.005s) 2023-01-11T21:24:04.6902142Z test_range2 (__main__.FunctionTests) ... stats [('calls_captured', 13), ('fusions_possible', 12), ('unique_graphs', 1)] 2023-01-11T21:24:04.6902420Z ok (0.015s) 2023-01-11T21:24:04.6902782Z test_reduce (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6903048Z ok (0.008s) 2023-01-11T21:24:04.6903425Z test_return_dict (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6903701Z ok (0.006s) 2023-01-11T21:24:04.6904072Z test_return_dict2 (__main__.FunctionTests) ... 
stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6904394Z ok (0.006s) 2023-01-11T21:24:04.6904772Z test_return_tuple1 (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6905051Z ok (0.006s) 2023-01-11T21:24:04.6905413Z test_return_tuple2 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6905684Z ok (0.005s) 2023-01-11T21:24:04.6906044Z test_shape1 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6906311Z ok (0.005s) 2023-01-11T21:24:04.6906674Z test_shape2 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6906952Z ok (0.004s) 2023-01-11T21:24:04.6907310Z test_slice1 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6907574Z ok (0.004s) 2023-01-11T21:24:04.6907939Z test_slice2 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6908213Z ok (0.004s) 2023-01-11T21:24:04.6908563Z test_slice3 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6908831Z ok (0.005s) 2023-01-11T21:24:04.6909190Z test_slice4 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6909448Z ok (0.004s) 2023-01-11T21:24:04.6909807Z test_slice5 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6910143Z ok (0.005s) 2023-01-11T21:24:04.6910526Z test_slice6 (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6910787Z ok (0.006s) 2023-01-11T21:24:04.6911156Z test_startswith (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6911434Z ok (0.006s) 2023-01-11T21:24:04.6911791Z test_tensor_len (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6912069Z ok (0.006s) 2023-01-11T21:24:04.6912451Z test_tensor_new_with_shape (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6912739Z ok (0.017s) 2023-01-11T21:24:04.6913113Z test_tensor_new_with_size (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6913398Z ok (0.017s) 2023-01-11T21:24:04.6913767Z test_tensor_type (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6914033Z ok (0.018s) 2023-01-11T21:24:04.6914403Z test_tensor_type2 (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6914681Z ok (0.030s) 2023-01-11T21:24:04.6915054Z test_transpose_for_scores (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6915337Z ok (0.007s) 2023-01-11T21:24:04.6915694Z test_tuple1 (__main__.FunctionTests) ... 
stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6915965Z ok (0.005s) 2023-01-11T21:24:04.6916313Z test_tuple2 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6916581Z ok (0.005s) 2023-01-11T21:24:04.6916954Z test_tuple_contains (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6917225Z ok (0.005s) 2023-01-11T21:24:04.6917594Z test_tuple_iadd (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6917906Z ok (0.006s) 2023-01-11T21:24:04.6918272Z test_unpack1 (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6918538Z ok (0.006s) 2023-01-11T21:24:04.6918901Z test_unpack2 (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6919178Z ok (0.006s) 2023-01-11T21:24:04.6919529Z test_unpack3 (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6919798Z ok (0.006s) 2023-01-11T21:24:04.6920163Z test_unpack_ex1 (__main__.FunctionTests) ... stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:24:04.6920433Z ok (0.008s) 2023-01-11T21:24:04.6920799Z test_unpack_ex2 (__main__.FunctionTests) ... stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:24:04.6921074Z ok (0.008s) 2023-01-11T21:24:04.6921440Z test_unpack_ex3 (__main__.FunctionTests) ... stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:24:04.6921701Z ok (0.008s) 2023-01-11T21:24:04.6922068Z test_viamethod (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6922345Z ok (0.005s) 2023-01-11T21:24:04.6922702Z test_viatorch (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6922974Z ok (0.005s) 2023-01-11T21:24:04.6923082Z 2023-01-11T21:24:04.6923292Z ---------------------------------------------------------------------- 2023-01-11T21:24:04.6923592Z Ran 109 tests in 1.981s 2023-01-11T21:24:04.6923719Z 2023-01-11T21:24:04.6923792Z OK 2023-01-11T21:24:04.6923893Z 2023-01-11T21:24:04.6923987Z Generating XML reports... 2023-01-11T21:24:04.6924415Z Generated XML report: test-reports/python-unittest/dynamo.test_functions/TEST-FunctionTests-20230111212402.xml 2023-01-11T21:24:04.6924666Z 2023-01-11T21:24:04.6926836Z ##[endgroup] 2023-01-11T21:24:04.6927276Z FINISHED PRINTING LOG FILE of dynamo/test_functions (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_functions_ovkhnnj2) 2023-01-11T21:24:04.6927514Z 2023-01-11T21:24:06.6019519Z Ignoring disabled issues: [] 2023-01-11T21:24:06.6250321Z Running dynamo/test_replay_record ... [2023-01-11 21:24:06.624355] 2023-01-11T21:24:06.6251789Z Executing ['/opt/conda/bin/python', '-bb', 'dynamo/test_replay_record.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 21:24:06.624700] 2023-01-11T21:24:08.3281024Z 2023-01-11T21:24:08.3281727Z Expand the folded group to see the log file of dynamo/test_replay_record 2023-01-11T21:24:08.3282863Z ##[group]PRINTING LOG FILE of dynamo/test_replay_record (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_replay_record_vdt6sddc) 2023-01-11T21:24:08.3283123Z 2023-01-11T21:24:08.3283210Z Running tests... 2023-01-11T21:24:08.3283652Z ---------------------------------------------------------------------- 2023-01-11T21:24:08.3284066Z Test results will be stored in test-reports/python-unittest/dynamo.test_replay_record 2023-01-11T21:24:08.3284425Z test_fn_call_args (__main__.ReplayRecordTests) ... skip: requires dill (0.000s) 2023-01-11T21:24:08.3284758Z test_local_module (__main__.ReplayRecordTests) ... skip: requires dill (0.001s) 2023-01-11T21:24:08.3285090Z test_nonlocal_fn_call (__main__.ReplayRecordTests) ... skip: requires dill (0.000s) 2023-01-11T21:24:08.3285625Z test_nonlocal_module_class (__main__.ReplayRecordTests) ... skip: requires dill (0.000s) 2023-01-11T21:24:08.3285980Z test_nonlocal_module_fn_call (__main__.ReplayRecordTests) ... skip: requires dill (0.000s) 2023-01-11T21:24:08.3286334Z test_successful_inline (__main__.ReplayRecordTests) ... skip: requires dill (0.000s) 2023-01-11T21:24:08.3286672Z test_unsuccessful_inline (__main__.ReplayRecordTests) ... skip: requires dill (0.000s) 2023-01-11T21:24:08.3286862Z 2023-01-11T21:24:08.3287339Z ---------------------------------------------------------------------- 2023-01-11T21:24:08.3287599Z Ran 7 tests in 0.003s 2023-01-11T21:24:08.3287722Z 2023-01-11T21:24:08.3287795Z OK (skipped=7) 2023-01-11T21:24:08.3287912Z 2023-01-11T21:24:08.3288005Z Generating XML reports... 2023-01-11T21:24:08.3288459Z Generated XML report: test-reports/python-unittest/dynamo.test_replay_record/TEST-ReplayRecordTests-20230111212408.xml 2023-01-11T21:24:08.3288720Z 2023-01-11T21:24:08.3288953Z ##[endgroup] 2023-01-11T21:24:08.3289374Z FINISHED PRINTING LOG FILE of dynamo/test_replay_record (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_replay_record_vdt6sddc) 2023-01-11T21:24:08.3289620Z 2023-01-11T21:24:10.2671119Z Ignoring disabled issues: [] 2023-01-11T21:24:10.2906542Z Running dynamo/test_verify_correctness ... [2023-01-11 21:24:10.290147] 2023-01-11T21:24:10.2909298Z Executing ['/opt/conda/bin/python', '-bb', 'dynamo/test_verify_correctness.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:10.290490] 2023-01-11T21:24:13.4470715Z 2023-01-11T21:24:13.4471318Z Expand the folded group to see the log file of dynamo/test_verify_correctness 2023-01-11T21:24:13.4472118Z ##[group]PRINTING LOG FILE of dynamo/test_verify_correctness (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_verify_correctness_dts3b7im) 2023-01-11T21:24:13.4472424Z 2023-01-11T21:24:13.4472517Z Running tests... 2023-01-11T21:24:13.4472960Z ---------------------------------------------------------------------- 2023-01-11T21:24:13.4473386Z Test results will be stored in test-reports/python-unittest/dynamo.test_verify_correctness 2023-01-11T21:24:13.4473965Z test_example_inputs (__main__.TestVerifyCorrectness) ... ok (1.178s) 2023-01-11T21:24:13.4474277Z test_incorrect_verify_false (__main__.TestVerifyCorrectness) 2023-01-11T21:24:13.4474730Z The bad optimization return a graph that ... 
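TestVerifyCorrectness covers dynamo's cross-checking mode: with torch._dynamo.config.verify_correctness = True, each compiled graph's output is compared against eager execution, and a mismatch raises the "incorrect results of backend" RuntimeError shown in the traceback below. A hedged sketch using a trivially correct backend (the config flag and optimize() spelling are this branch's private API):

import torch
import torch._dynamo as dynamo

dynamo.config.verify_correctness = True

@dynamo.optimize("eager")  # output matches eager by construction, so verification passes
def f(x):
    return torch.sin(x) + torch.cos(x)

f(torch.randn(8))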
stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)]
2023-01-11T21:24:13.4475077Z frames [('total', 2), ('ok', 2)]
2023-01-11T21:24:13.4475427Z stats [('calls_captured', 7), ('fusions_possible', 5), ('unique_graphs', 2)]
2023-01-11T21:24:13.4475667Z ok (0.016s)
2023-01-11T21:24:13.4475914Z test_incorrect_verify_true (__main__.TestVerifyCorrectness)
2023-01-11T21:24:13.4476377Z If a bad optimization return a graph that ... [2023-01-11 21:24:12,911] torch._dynamo.output_graph: [ERROR] error in verify_correctness
2023-01-11T21:24:13.4476691Z Traceback (most recent call last):
2023-01-11T21:24:13.4477089Z File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 173, in __call__
2023-01-11T21:24:13.4477435Z raise RuntimeError(f"incorrect results of backend {self}")
2023-01-11T21:24:13.4477804Z RuntimeError: incorrect results of backend
2023-01-11T21:24:13.4478143Z frames [('total', 2), ('ok', 1)]
2023-01-11T21:24:13.4478472Z stats [('calls_captured', 7), ('fusions_possible', 5), ('unique_graphs', 1)]
2023-01-11T21:24:13.4478723Z ok (0.016s)
2023-01-11T21:24:13.4478986Z test_ipex_fp32 (__main__.TestVerifyCorrectness) ... skip: requires ipex (0.001s)
2023-01-11T21:24:13.4479382Z test_nnc (__main__.TestVerifyCorrectness) ... frames [('total', 1), ('ok', 1)]
2023-01-11T21:24:13.4479756Z stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)]
2023-01-11T21:24:13.4479999Z ok (0.087s)
2023-01-11T21:24:13.4480112Z
2023-01-11T21:24:13.4480318Z ----------------------------------------------------------------------
2023-01-11T21:24:13.4480579Z Ran 5 tests in 1.299s
2023-01-11T21:24:13.4480702Z
2023-01-11T21:24:13.4480782Z OK (skipped=1)
2023-01-11T21:24:13.4480895Z
2023-01-11T21:24:13.4480988Z Generating XML reports...
2023-01-11T21:24:13.4481449Z Generated XML report: test-reports/python-unittest/dynamo.test_verify_correctness/TEST-TestVerifyCorrectness-20230111212411.xml
2023-01-11T21:24:13.4481721Z
2023-01-11T21:24:13.4481961Z ##[endgroup]
2023-01-11T21:24:13.4482468Z FINISHED PRINTING LOG FILE of dynamo/test_verify_correctness (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_verify_correctness_dts3b7im)
2023-01-11T21:24:13.4482726Z
2023-01-11T21:24:15.3606009Z Ignoring disabled issues: []
2023-01-11T21:24:15.3834641Z Running lazy/test_extract_compiled_graph ... [2023-01-11 21:24:15.382872]
2023-01-11T21:24:15.3836383Z Executing ['/opt/conda/bin/python', '-bb', 'lazy/test_extract_compiled_graph.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:15.383234]
2023-01-11T21:24:16.7529621Z
2023-01-11T21:24:16.7530807Z Expand the folded group to see the log file of lazy/test_extract_compiled_graph
2023-01-11T21:24:16.7532184Z ##[group]PRINTING LOG FILE of lazy/test_extract_compiled_graph (/var/lib/jenkins/workspace/test/test-reports/lazy-test_extract_compiled_graph_rb0dxqv0)
2023-01-11T21:24:16.7532701Z
2023-01-11T21:24:16.7533137Z ##[endgroup]
2023-01-11T21:24:16.7534035Z FINISHED PRINTING LOG FILE of lazy/test_extract_compiled_graph (/var/lib/jenkins/workspace/test/test-reports/lazy-test_extract_compiled_graph_rb0dxqv0)
2023-01-11T21:24:16.7534330Z
2023-01-11T21:24:18.6500809Z Ignoring disabled issues: []
2023-01-11T21:24:18.6728573Z Running lazy/test_ts_opinfo ... [2023-01-11 21:24:18.672074]
2023-01-11T21:24:18.6730294Z Executing ['/opt/conda/bin/python', '-bb', 'lazy/test_ts_opinfo.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ...
[2023-01-11 21:24:18.672408] 2023-01-11T21:24:22.3675189Z 2023-01-11T21:24:22.3675647Z Expand the folded group to see the log file of lazy/test_ts_opinfo 2023-01-11T21:24:22.3676509Z ##[group]PRINTING LOG FILE of lazy/test_ts_opinfo (/var/lib/jenkins/workspace/test/test-reports/lazy-test_ts_opinfo_ro8y0r1x) 2023-01-11T21:24:22.3677096Z 2023-01-11T21:24:22.3677201Z Running tests... 2023-01-11T21:24:22.3677752Z ---------------------------------------------------------------------- 2023-01-11T21:24:22.3678193Z Test results will be stored in test-reports/python-unittest/lazy.test_ts_opinfo 2023-01-11T21:24:22.3678539Z test_nonzero_dynamic (__main__.TestLazyDynamicOps) ... ok (0.160s) 2023-01-11T21:24:22.3678904Z testConvolutionBackward (__main__.TestLazyTensor) ... skip: Disable until autograd supports symints (0.001s) 2023-01-11T21:24:22.3679248Z test_tensor_ctr (__main__.TestLazyTensor) ... ok (0.001s) 2023-01-11T21:24:22.3679554Z test_view_mark_step_preserved (__main__.TestLazyTensor) ... ok (0.002s) 2023-01-11T21:24:22.3679733Z 2023-01-11T21:24:22.3679935Z ---------------------------------------------------------------------- 2023-01-11T21:24:22.3680186Z Ran 4 tests in 0.166s 2023-01-11T21:24:22.3680313Z 2023-01-11T21:24:22.3680391Z OK (skipped=1) 2023-01-11T21:24:22.3680512Z 2023-01-11T21:24:22.3680611Z Generating XML reports... 2023-01-11T21:24:22.3681056Z Generated XML report: test-reports/python-unittest/lazy.test_ts_opinfo/TEST-TestLazyDynamicOps-20230111212421.xml 2023-01-11T21:24:22.3681573Z Generated XML report: test-reports/python-unittest/lazy.test_ts_opinfo/TEST-TestLazyTensor-20230111212421.xml 2023-01-11T21:24:22.3681802Z 2023-01-11T21:24:22.3682045Z ##[endgroup] 2023-01-11T21:24:22.3682446Z FINISHED PRINTING LOG FILE of lazy/test_ts_opinfo (/var/lib/jenkins/workspace/test/test-reports/lazy-test_ts_opinfo_ro8y0r1x) 2023-01-11T21:24:22.3682677Z 2023-01-11T21:24:24.2938583Z Ignoring disabled issues: [] 2023-01-11T21:24:24.3176704Z Running nn/test_init ... [2023-01-11 21:24:24.316959] 2023-01-11T21:24:24.3177577Z Executing ['/opt/conda/bin/python', '-bb', 'nn/test_init.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:24.317321] 2023-01-11T21:24:26.2064978Z 2023-01-11T21:24:26.2066002Z Expand the folded group to see the log file of nn/test_init 2023-01-11T21:24:26.2067078Z ##[group]PRINTING LOG FILE of nn/test_init (/var/lib/jenkins/workspace/test/test-reports/nn-test_init__s30kh7v) 2023-01-11T21:24:26.2067312Z 2023-01-11T21:24:26.2067542Z ##[endgroup] 2023-01-11T21:24:26.2068353Z FINISHED PRINTING LOG FILE of nn/test_init (/var/lib/jenkins/workspace/test/test-reports/nn-test_init__s30kh7v) 2023-01-11T21:24:26.2068584Z 2023-01-11T21:24:28.1452466Z Ignoring disabled issues: [] 2023-01-11T21:24:28.1681181Z Running nn/test_packed_sequence ... [2023-01-11 21:24:28.167512] 2023-01-11T21:24:28.1681865Z Executing ['/opt/conda/bin/python', '-bb', 'nn/test_packed_sequence.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:28.167859] 2023-01-11T21:24:31.1616828Z 2023-01-11T21:24:31.1617408Z Expand the folded group to see the log file of nn/test_packed_sequence 2023-01-11T21:24:31.1618333Z ##[group]PRINTING LOG FILE of nn/test_packed_sequence (/var/lib/jenkins/workspace/test/test-reports/nn-test_packed_sequence_ctas8rzn) 2023-01-11T21:24:31.1618684Z 2023-01-11T21:24:31.1618805Z Running tests... 
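nn/test_packed_sequence exercises the torch.nn.utils.rnn helpers. A compact round-trip of the public APIs named in the results that follow (pad_sequence, pack_sequence, unpad_sequence); shapes are illustrative:

import torch
from torch.nn.utils.rnn import pack_sequence, pad_sequence, unpad_sequence

seqs = [torch.randn(3, 5), torch.randn(2, 5), torch.randn(1, 5)]  # descending lengths

padded = pad_sequence(seqs, batch_first=True)      # (3, 3, 5), zero padded at the tail
packed = pack_sequence(seqs, enforce_sorted=True)  # PackedSequence for RNN consumption

lengths = torch.tensor([3, 2, 1])
restored = unpad_sequence(padded, lengths, batch_first=True)  # recovers the original list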
2023-01-11T21:24:31.1619380Z ---------------------------------------------------------------------- 2023-01-11T21:24:31.1619955Z Test results will be stored in test-reports/python-unittest/nn.test_packed_sequence 2023-01-11T21:24:31.1620446Z test_pack_padded_sequence (__main__.PackedSequenceTest) ... ok (1.129s) 2023-01-11T21:24:31.1620876Z test_pack_sequence (__main__.PackedSequenceTest) ... ok (0.055s) 2023-01-11T21:24:31.1621196Z test_pad_sequence (__main__.PackedSequenceTest) ... ok (0.009s) 2023-01-11T21:24:31.1621657Z test_pad_sequence_with_non_iterable_sequences (__main__.PackedSequenceTest) ... ok (0.001s) 2023-01-11T21:24:31.1622095Z test_pad_sequence_with_tensor_sequences (__main__.PackedSequenceTest) ... ok (0.001s) 2023-01-11T21:24:31.1622412Z test_to (__main__.PackedSequenceTest) ... ok (0.007s) 2023-01-11T21:24:31.1622710Z test_to_memory_format (__main__.PackedSequenceTest) ... ok (0.001s) 2023-01-11T21:24:31.1623224Z test_total_length (__main__.PackedSequenceTest) ... ok (0.003s) 2023-01-11T21:24:31.1623509Z test_type_casts (__main__.PackedSequenceTest) 2023-01-11T21:24:31.1623824Z Test type casting of `PackedSequence` against type casting of tensor ... ok (0.024s) 2023-01-11T21:24:31.1624149Z test_unpack_sequence (__main__.PackedSequenceTest) ... ok (0.009s) 2023-01-11T21:24:31.1624459Z test_unpad_sequence (__main__.PackedSequenceTest) ... ok (0.008s) 2023-01-11T21:24:31.1624760Z test_wrong_order (__main__.PackedSequenceTest) ... ok (0.004s) 2023-01-11T21:24:31.1624927Z 2023-01-11T21:24:31.1625136Z ---------------------------------------------------------------------- 2023-01-11T21:24:31.1625400Z Ran 12 tests in 1.253s 2023-01-11T21:24:31.1625525Z 2023-01-11T21:24:31.1625594Z OK 2023-01-11T21:24:31.1625693Z 2023-01-11T21:24:31.1625786Z Generating XML reports... 2023-01-11T21:24:31.1626230Z Generated XML report: test-reports/python-unittest/nn.test_packed_sequence/TEST-PackedSequenceTest-20230111212429.xml 2023-01-11T21:24:31.1626492Z 2023-01-11T21:24:31.1626761Z ##[endgroup] 2023-01-11T21:24:31.1627179Z FINISHED PRINTING LOG FILE of nn/test_packed_sequence (/var/lib/jenkins/workspace/test/test-reports/nn-test_packed_sequence_ctas8rzn) 2023-01-11T21:24:31.1627431Z 2023-01-11T21:24:33.0811293Z Ignoring disabled issues: [] 2023-01-11T21:24:33.1045499Z Running profiler/test_memory_profiler ... [2023-01-11 21:24:33.103677] 2023-01-11T21:24:33.1046593Z Executing ['/opt/conda/bin/python', '-bb', 'profiler/test_memory_profiler.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:33.104010] 2023-01-11T21:24:39.4227989Z 2023-01-11T21:24:39.4228530Z Expand the folded group to see the log file of profiler/test_memory_profiler 2023-01-11T21:24:39.4229921Z ##[group]PRINTING LOG FILE of profiler/test_memory_profiler (/var/lib/jenkins/workspace/test/test-reports/profiler-test_memory_profiler_ksquozlp) 2023-01-11T21:24:39.4230388Z 2023-01-11T21:24:39.4230545Z Running tests... 2023-01-11T21:24:39.4231128Z ---------------------------------------------------------------------- 2023-01-11T21:24:39.4231727Z Test results will be stored in test-reports/python-unittest/profiler.test_memory_profiler 2023-01-11T21:24:39.4232876Z test_data_flow_graph_complicated (__main__.TestDataFlow) ... 
STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4233714Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4234460Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4235491Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:336: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4236516Z lines.append(f"{name + ':':<8} T{storage_to_id[t.storage().data_ptr()]}") 2023-01-11T21:24:39.4237138Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4237704Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4238175Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4238459Z ok (1.327s) 2023-01-11T21:24:39.4238916Z test_data_flow_graph_non_op_allocations (__main__.TestDataFlow) ... STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4239426Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4239938Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4240320Z ok (0.068s) 2023-01-11T21:24:39.4240750Z test_data_flow_graph_simple (__main__.TestDataFlow) ... STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4241249Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4241708Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4242855Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:337: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:485.) 2023-01-11T21:24:39.4243590Z if t.grad is not None: 2023-01-11T21:24:39.4243963Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4244406Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4244869Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4245150Z ok (0.131s) 2023-01-11T21:24:39.4245596Z test_data_flow_graph_simple_backward (__main__.TestDataFlow) ... 
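The UserWarning above from test_memory_profiler.py:337 (accessing .grad on a non-leaf tensor) names the fix itself: call .retain_grad() on the non-leaf tensor before backward. A minimal sketch of that behavior, with illustrative tensors not taken from the test:

import torch

x = torch.ones(3, requires_grad=True)  # leaf tensor
y = x * 2                              # non-leaf: produced by an op
y.retain_grad()                        # without this, y.grad stays None and
                                       # reading it emits the warning above
y.sum().backward()
print(y.grad)                          # tensor([1., 1., 1.])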
STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4246101Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4246561Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4247233Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:338: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4247772Z grad_id = storage_to_id[t.grad.storage().data_ptr()] 2023-01-11T21:24:39.4248005Z ok (0.071s) 2023-01-11T21:24:39.4248468Z test_data_flow_graph_simple_inplace (__main__.TestDataFlow) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4248969Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4249425Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4249880Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4250330Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4250787Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4251058Z ok (0.136s) 2023-01-11T21:24:39.4251503Z test_data_flow_graph_stacked (__main__.TestDataFlow) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4251999Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4252445Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4252893Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4253341Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4253838Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4254423Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4255072Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4255531Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4255806Z ok (0.242s) 2023-01-11T21:24:39.4256252Z test_data_flow_graph_with_annotations (__main__.TestDataFlow) ... 
STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4256754Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4257206Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4257474Z ok (0.077s) 2023-01-11T21:24:39.4257899Z test_match_schemas (__main__.TestDataFlow) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4258384Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4258842Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4259112Z ok (0.008s) 2023-01-11T21:24:39.4259549Z test_match_schemas_backward (__main__.TestDataFlow) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4260039Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4260483Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4260762Z ok (0.006s) 2023-01-11T21:24:39.4261201Z test_match_schemas_tensorlist (__main__.TestDataFlow) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4261792Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4262240Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4262516Z ok (0.003s) 2023-01-11T21:24:39.4262981Z test_extract_gradients_from_module (__main__.TestIdentifyGradients) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4263494Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4263941Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4264575Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:117: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4265131Z return tensor.storage().data_ptr() == key.storage.ptr 2023-01-11T21:24:39.4265701Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:147: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4266244Z allowed_set = {t.storage().data_ptr() for t in tensors} 2023-01-11T21:24:39.4266636Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4267134Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4267595Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4268046Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4268499Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4268960Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4269413Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4269902Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4270368Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4270651Z ok (0.045s) 2023-01-11T21:24:39.4271138Z test_extract_gradients_from_module_and_optimizer (__main__.TestIdentifyGradients) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4271669Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4272128Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4272405Z ok (0.009s) 2023-01-11T21:24:39.4272874Z test_extract_gradients_from_optimizer (__main__.TestIdentifyGradients) ... 
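The repeated TypedStorage deprecation warnings above spell out the migration path: call tensor.untyped_storage() instead of tensor.storage(). A minimal before/after sketch (the tensor is illustrative); both calls resolve to the same underlying allocation, which is why the tests' pointer comparisons keep working:

import torch

t = torch.arange(4.0)
old_ptr = t.storage().data_ptr()          # deprecated TypedStorage path (warns)
new_ptr = t.untyped_storage().data_ptr()  # recommended UntypedStorage path
assert old_ptr == new_ptr                 # same allocation either way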
STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4273395Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4273853Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4274303Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4274746Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4275244Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4275698Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4276137Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4276596Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4277049Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4277501Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4277955Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4278237Z ok (0.046s) 2023-01-11T21:24:39.4278736Z test_extract_gradients_from_optimizer_set_to_none (__main__.TestIdentifyGradients) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4279260Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4279713Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4280163Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4280611Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4281075Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4281554Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4282004Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4282465Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4282915Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4283363Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4283817Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4284093Z ok (0.053s) 2023-01-11T21:24:39.4284554Z test_extract_gradients_low_level (__main__.TestIdentifyGradients) ... 
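The TestIdentifyGradients cases above locate gradient storages created by module and optimizer steps, including the set_to_none variants. A generic sketch of where those gradients live across a backward/step/zero_grad cycle; the model and optimizer here are illustrative, not the test's:

import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(8, 4)).sum()
loss.backward()                   # populates p.grad for every parameter
opt.step()                        # consumes the gradients
opt.zero_grad(set_to_none=True)   # the "set_to_none" behavior tested above
print(all(p.grad is None for p in model.parameters()))  # True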
STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4285072Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4285528Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4285972Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4286419Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4286879Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4287156Z ok (0.014s) 2023-01-11T21:24:39.4287591Z test_config_check (__main__.TestMemoryProfiler) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4288089Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4288549Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4288996Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4289542Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4290002Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4290451Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4290893Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4291355Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4291638Z ok (0.008s) 2023-01-11T21:24:39.4292123Z test_categories_e2e_sequential_fwd (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4292809Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:81: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4293360Z if isinstance(t, torch.Tensor) and t.storage() 2023-01-11T21:24:39.4293925Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:78: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4294666Z return tuple( 2023-01-11T21:24:39.4295190Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:79: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4295698Z (t._cdata, t.storage().data_ptr()) 2023-01-11T21:24:39.4296098Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4296563Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4296848Z ok (0.078s) 2023-01-11T21:24:39.4297321Z test_categories_e2e_sequential_fwd_bwd (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4297846Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4298304Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4298586Z ok (0.331s) 2023-01-11T21:24:39.4299045Z test_categories_e2e_simple_fwd (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4299555Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4300060Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4300331Z ok (0.041s) 2023-01-11T21:24:39.4300796Z test_categories_e2e_simple_fwd_bwd (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4301312Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4301770Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4302040Z ok (0.197s) 2023-01-11T21:24:39.4302576Z test_categories_e2e_simple_fwd_bwd_step (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4303103Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4303556Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4303835Z ok (0.245s) 2023-01-11T21:24:39.4304307Z test_categories_e2e_simple_module_fwd (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4304824Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4305286Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4305570Z ok (0.032s) 2023-01-11T21:24:39.4306053Z test_categories_e2e_simple_module_fwd_bwd (__main__.TestMemoryProfilerE2E) ... 
STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4306580Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4307036Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4307315Z ok (0.126s) 2023-01-11T21:24:39.4307802Z test_categories_e2e_simple_module_fwd_bwd_step (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4308320Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4308826Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4309110Z ok (0.206s) 2023-01-11T21:24:39.4309559Z test_inputs_fwd (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4310047Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4310504Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4311137Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:828: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4311657Z storage = t.storage() 2023-01-11T21:24:39.4312195Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:836: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4312777Z if key.storage.ptr == storage.data_ptr() and key.device == storage.device 2023-01-11T21:24:39.4313038Z ok (0.028s) 2023-01-11T21:24:39.4313491Z test_inputs_fwd_bwd (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4313988Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4314454Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4314739Z ok (0.157s) 2023-01-11T21:24:39.4315190Z test_inputs_fwd_lazy (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4315726Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4316189Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4316467Z ok (0.033s) 2023-01-11T21:24:39.4316922Z test_lazily_initialized (__main__.TestMemoryProfilerE2E) ... 
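test_inputs_fwd_lazy and test_lazily_initialized above involve parameters that only materialize on first use. A generic sketch of that pattern with torch.nn.LazyLinear; the module choice is an assumption, since the test's model is not shown in this log:

import torch

layer = torch.nn.LazyLinear(out_features=2)  # in_features deferred
print(layer.weight)          # <UninitializedParameter> before any input
layer(torch.randn(3, 5))     # first forward fixes in_features = 5
print(layer.weight.shape)    # torch.Size([2, 5])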
STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4317426Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4317887Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4318175Z ok (0.082s) 2023-01-11T21:24:39.4318626Z test_manual_optimizer_step (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4319136Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4319604Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4319887Z ok (0.044s) 2023-01-11T21:24:39.4320332Z test_memory_timeline (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4320836Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4321296Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4321631Z ok (0.191s) 2023-01-11T21:24:39.4322101Z test_parameters_and_gradients (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4322619Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4323085Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4323528Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4323976Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4324440Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4324895Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4325339Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4325801Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4326258Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4326707Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4327156Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4327437Z ok (0.180s) 2023-01-11T21:24:39.4327917Z test_parameters_and_gradients_set_to_none (__main__.TestMemoryProfilerE2E) ... 
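The interleaved STAGE: ... Warm Up / Collection / Post Processing lines come from the Kineto activity profiler that each of these tests starts and stops. A minimal sketch of the kind of CPU memory-profiling session profiler/test_memory_profiler drives; the workload and sort key are illustrative, not taken from the test file:

import torch
from torch.profiler import ProfilerActivity, profile

with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    x = torch.randn(64, 64)
    y = (x @ x).relu().sum()

# Per-op aggregates, including allocations recorded during Collection.
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=5))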
STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4328425Z [W CPUAllocator.cpp:231] Memory block of unknown size was allocated before the profiling started, profiler results will not include the deallocation event 2023-01-11T21:24:39.4328935Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4329394Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4329889Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4330385Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4330849Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4331127Z ok (0.246s) 2023-01-11T21:24:39.4331243Z 2023-01-11T21:24:39.4331451Z ---------------------------------------------------------------------- 2023-01-11T21:24:39.4331710Z Ran 32 tests in 4.464s 2023-01-11T21:24:39.4331836Z 2023-01-11T21:24:39.4331908Z OK 2023-01-11T21:24:39.4332013Z 2023-01-11T21:24:39.4332112Z Generating XML reports... 2023-01-11T21:24:39.4332543Z Generated XML report: test-reports/python-unittest/profiler.test_memory_profiler/TEST-TestDataFlow-20230111212434.xml 2023-01-11T21:24:39.4333121Z Generated XML report: test-reports/python-unittest/profiler.test_memory_profiler/TEST-TestIdentifyGradients-20230111212434.xml 2023-01-11T21:24:39.4333706Z Generated XML report: test-reports/python-unittest/profiler.test_memory_profiler/TEST-TestMemoryProfiler-20230111212434.xml 2023-01-11T21:24:39.4334392Z Generated XML report: test-reports/python-unittest/profiler.test_memory_profiler/TEST-TestMemoryProfilerE2E-20230111212434.xml 2023-01-11T21:24:39.4334869Z 2023-01-11T21:24:39.4335220Z ##[endgroup] 2023-01-11T21:24:39.4335669Z FINISHED PRINTING LOG FILE of profiler/test_memory_profiler (/var/lib/jenkins/workspace/test/test-reports/profiler-test_memory_profiler_ksquozlp) 2023-01-11T21:24:39.4335929Z 2023-01-11T21:24:41.3616610Z Ignoring disabled issues: [] 2023-01-11T21:24:41.3850091Z Running test_autocast ... [2023-01-11 21:24:41.384380] 2023-01-11T21:24:41.3851709Z Executing ['/opt/conda/bin/python', '-bb', 'test_autocast.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:41.384754] 2023-01-11T21:24:45.0413086Z 2023-01-11T21:24:45.0413846Z Expand the folded group to see the log file of test_autocast 2023-01-11T21:24:45.0415132Z ##[group]PRINTING LOG FILE of test_autocast (/var/lib/jenkins/workspace/test/test-reports/test_autocast_9mc1y8u0) 2023-01-11T21:24:45.0415452Z 2023-01-11T21:24:45.0415573Z Running tests... 2023-01-11T21:24:45.0416036Z ---------------------------------------------------------------------- 2023-01-11T21:24:45.0416504Z Test results will be stored in test-reports/python-unittest/test_autocast 2023-01-11T21:24:45.0416859Z test_autocast_methods_expect_builtin_promote (__main__.TestAutocastCPU) ... ok (1.110s) 2023-01-11T21:24:45.0417187Z test_autocast_nn_bf16 (__main__.TestAutocastCPU) ... ok (0.005s) 2023-01-11T21:24:45.0417484Z test_autocast_nn_fp32 (__main__.TestAutocastCPU) ... ok (0.007s) 2023-01-11T21:24:45.0417787Z test_autocast_torch_bf16 (__main__.TestAutocastCPU) ... ok (0.010s) 2023-01-11T21:24:45.0418114Z test_autocast_torch_expect_builtin_promote (__main__.TestAutocastCPU) ... 
ok (0.005s) 2023-01-11T21:24:45.0418439Z test_autocast_torch_fp32 (__main__.TestAutocastCPU) ... ok (0.064s) 2023-01-11T21:24:45.0418757Z test_autocast_torch_need_autocast_promote (__main__.TestAutocastCPU) ... ok (0.005s) 2023-01-11T21:24:45.0419067Z test_cast_cache_is_global (__main__.TestAutocastGPU) 2023-01-11T21:24:45.0419368Z Verifies that the autocast cache is global. This is done by ... ok (0.656s) 2023-01-11T21:24:45.0419676Z test_autocast_fast_dtype (__main__.TestTorchAutocast) ... ok (0.001s) 2023-01-11T21:24:45.0419850Z 2023-01-11T21:24:45.0420054Z ---------------------------------------------------------------------- 2023-01-11T21:24:45.0420316Z Ran 9 tests in 1.864s 2023-01-11T21:24:45.0420448Z 2023-01-11T21:24:45.0420523Z OK 2023-01-11T21:24:45.0420627Z 2023-01-11T21:24:45.0420714Z Generating XML reports... 2023-01-11T21:24:45.0421145Z Generated XML report: test-reports/python-unittest/test_autocast/TEST-TestAutocastCPU-20230111212442.xml 2023-01-11T21:24:45.0421662Z Generated XML report: test-reports/python-unittest/test_autocast/TEST-TestAutocastGPU-20230111212442.xml 2023-01-11T21:24:45.0422522Z Generated XML report: test-reports/python-unittest/test_autocast/TEST-TestTorchAutocast-20230111212442.xml 2023-01-11T21:24:45.0422756Z 2023-01-11T21:24:45.0423021Z ##[endgroup] 2023-01-11T21:24:45.0423418Z FINISHED PRINTING LOG FILE of test_autocast (/var/lib/jenkins/workspace/test/test-reports/test_autocast_9mc1y8u0) 2023-01-11T21:24:45.0423643Z 2023-01-11T21:24:46.9620368Z Ignoring disabled issues: [] 2023-01-11T21:24:46.9850142Z Running test_comparison_utils ... [2023-01-11 21:24:46.984506] 2023-01-11T21:24:46.9853547Z Executing ['/opt/conda/bin/python', '-bb', 'test_comparison_utils.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:46.984853] 2023-01-11T21:24:48.5794519Z 2023-01-11T21:24:48.5795094Z Expand the folded group to see the log file of test_comparison_utils 2023-01-11T21:24:48.5796693Z ##[group]PRINTING LOG FILE of test_comparison_utils (/var/lib/jenkins/workspace/test/test-reports/test_comparison_utils_lzsvj7yi) 2023-01-11T21:24:48.5797259Z 2023-01-11T21:24:48.5797726Z ##[endgroup] 2023-01-11T21:24:48.5798668Z FINISHED PRINTING LOG FILE of test_comparison_utils (/var/lib/jenkins/workspace/test/test-reports/test_comparison_utils_lzsvj7yi) 2023-01-11T21:24:48.5799134Z 2023-01-11T21:24:50.4712456Z Ignoring disabled issues: [] 2023-01-11T21:24:50.4945141Z Running test_dataloader ... [2023-01-11 21:24:50.493920] 2023-01-11T21:24:50.4945931Z Executing ['/opt/conda/bin/python', '-bb', 'test_dataloader.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:50.494249] 2023-01-11T21:27:03.4835434Z 2023-01-11T21:27:03.4837196Z Expand the folded group to see the log file of test_dataloader 2023-01-11T21:27:03.4852707Z ##[group]PRINTING LOG FILE of test_dataloader (/var/lib/jenkins/workspace/test/test-reports/test_dataloader_rmt2ijzu) 2023-01-11T21:27:03.4853311Z 2023-01-11T21:27:03.4853441Z Running tests... 2023-01-11T21:27:03.4854154Z ---------------------------------------------------------------------- 2023-01-11T21:27:03.4855059Z Test results will be stored in test-reports/python-unittest/test_dataloader 2023-01-11T21:27:03.4856008Z test_shuffler_iterdatapipe (__main__.IntegrationTestDataLoaderDataPipe) 2023-01-11T21:27:03.4896218Z Verify ``IterDataPipe.shuffle`` is controlled by ``DataLoader`` ... 
skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:27:03.4897198Z test_add_dataset (__main__.TestConcatDataset) ... ok (0.226s) 2023-01-11T21:27:03.4898096Z test_concat_raises_index_error (__main__.TestConcatDataset) ... ok (0.001s) 2023-01-11T21:27:03.4898744Z test_concat_two_non_singletons (__main__.TestConcatDataset) ... ok (0.001s) 2023-01-11T21:27:03.4899621Z test_concat_two_non_singletons_with_empty (__main__.TestConcatDataset) ... ok (0.001s) 2023-01-11T21:27:03.4900400Z test_concat_two_singletons (__main__.TestConcatDataset) ... ok (0.001s) 2023-01-11T21:27:03.4900890Z test_iterable_dataset_err (__main__.TestConcatDataset) ... ok (0.001s) 2023-01-11T21:27:03.4901949Z test_conv_after_fork (__main__.TestConvAfterFork) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/75492 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.000s) 2023-01-11T21:27:03.4902599Z test_custom_batch_pin (__main__.TestCustomPinFn) ... ok (0.004s) 2023-01-11T21:27:03.4990334Z test_custom_batch_pin_worker (__main__.TestCustomPinFn) ... ok (0.138s) 2023-01-11T21:27:03.4991370Z test_batch_sampler (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.4991987Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.4992696Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.4993637Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.4994127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.4994526Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.4995091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.4995522Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.4996225Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.4996782Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.4997529Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.4998206Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.4998969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5017431Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5018171Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5018662Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5018981Z ok (1.970s) 2023-01-11T21:27:03.5019302Z test_builtin_collection_conversion (__main__.TestDataLoader) ... ok (0.274s) 2023-01-11T21:27:03.5020061Z test_bulk_loading_nobatch (__main__.TestDataLoader) ... 
ok (0.082s) 2023-01-11T21:27:03.5020462Z test_chain_iterable_style_dataset (__main__.TestDataLoader) ... ok (0.127s) 2023-01-11T21:27:03.5020873Z test_default_collate_bad_numpy_types (__main__.TestDataLoader) ... ok (0.002s) 2023-01-11T21:27:03.5021297Z test_default_collate_bad_sequence_type (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5021705Z test_default_collate_dtype (__main__.TestDataLoader) ... ok (0.002s) 2023-01-11T21:27:03.5022116Z test_default_collate_mapping_keep_type (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5023591Z test_default_collate_numpy_memmap (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py:172: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/utils/tensor_numpy.cpp:210.) 2023-01-11T21:27:03.5024714Z return collate([torch.as_tensor(b) for b in batch], collate_fn_map=collate_fn_map) 2023-01-11T21:27:03.5025052Z ok (0.013s) 2023-01-11T21:27:03.5025391Z test_default_collate_sequence_dont_keep_type (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5025828Z test_default_collate_sequence_keep_type (__main__.TestDataLoader) ... ok (0.002s) 2023-01-11T21:27:03.5026256Z test_default_collate_shared_tensor (__main__.TestDataLoader) ... ok (0.003s) 2023-01-11T21:27:03.5026907Z test_default_convert_mapping_keep_type (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5027339Z test_default_convert_sequence_dont_keep_type (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5027777Z test_default_convert_sequence_keep_type (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5028128Z test_distributed_sampler_invalid_rank (__main__.TestDataLoader) ... ok (0.002s) 2023-01-11T21:27:03.5028454Z test_duplicating_data_with_drop_last (__main__.TestDataLoader) ... ok (0.003s) 2023-01-11T21:27:03.5028753Z test_error (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5029126Z test_error_in_init (__main__.TestDataLoader) ... ok (0.093s) 2023-01-11T21:27:03.5029412Z test_error_workers (__main__.TestDataLoader) ... ok (0.083s) 2023-01-11T21:27:03.5029710Z test_excessive_thread_creation_warning (__main__.TestDataLoader) ... ok (0.006s) 2023-01-11T21:27:03.5030021Z test_fd_limit_exceeded (__main__.TestDataLoader) ... ok (1.521s) 2023-01-11T21:27:03.5030607Z test_get_worker_info (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5030993Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5031438Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5031804Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5032045Z ok (3.357s) 2023-01-11T21:27:03.5032278Z test_growing_dataset (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5032611Z test_invalid_assign_after_init (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5032961Z test_invalid_ctor_args_combinations (__main__.TestDataLoader) ... 
ok (0.003s) 2023-01-11T21:27:03.5033296Z test_iterable_style_dataset (__main__.TestDataLoader) ... ok (1.104s) 2023-01-11T21:27:03.5033627Z test_iterabledataset_len (__main__.TestDataLoader) ... ok (0.002s) 2023-01-11T21:27:03.5034234Z test_large_sampler_indices (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5034719Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5035211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5035613Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5035875Z ok (3.103s) 2023-01-11T21:27:03.5036111Z test_len (__main__.TestDataLoader) ... ok (0.007s) 2023-01-11T21:27:03.5036436Z test_multi_epochs_reproducibility (__main__.TestDataLoader) ... ok (0.150s) 2023-01-11T21:27:03.5037053Z test_multiple_dataloaders (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5037494Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5037988Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5038391Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5038894Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5039266Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5039767Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5040163Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5040662Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5041034Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5041528Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5041922Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5042412Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5042782Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5043309Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5043705Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5043953Z ok (3.406s) 2023-01-11T21:27:03.5044496Z test_multiprocessing_contexts (__main__.TestDataLoader) ... 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5044943Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5045442Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5045837Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5046337Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5046716Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5047205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5047601Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5048097Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5048474Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5048961Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5049356Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5049823Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors] 2023-01-11T21:27:03.5050341Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors] 2023-01-11T21:27:03.5050877Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors] 2023-01-11T21:27:03.5051453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5051838Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5052324Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5052719Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5053222Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5053599Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5054088Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5054798Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5055444Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5055849Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5056280Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5056635Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5057018Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. 
See Note [Sharing CUDA tensors] 2023-01-11T21:27:03.5057449Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors] 2023-01-11T21:27:03.5057949Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors] 2023-01-11T21:27:03.5058453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5058797Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5059217Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5059569Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5060002Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5060374Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5060815Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5061168Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5061604Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5061932Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5062359Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5062707Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5063141Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5063555Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5063984Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5064340Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5064771Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5065100Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5065530Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5065881Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5066308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5066657Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5067085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5067435Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5067667Z ok (12.336s) 2023-01-11T21:27:03.5068354Z test_multiprocessing_iterdatapipe (__main__.TestDataLoader) ... 
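test_multiprocessing_contexts above runs the loader under several worker start methods, and the [W CudaIPCTypes.cpp:15] lines are the known warning for a producer process exiting while shared CUDA tensors are still alive. A minimal sketch of pinning the start method through DataLoader's multiprocessing_context argument; the dataset and sizes are illustrative:

import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    ds = TensorDataset(torch.arange(8.0))
    # "spawn" starts clean worker processes (required when workers touch CUDA).
    loader = DataLoader(ds, batch_size=2, num_workers=2,
                        multiprocessing_context="spawn")
    for (batch,) in loader:
        print(batch)

if __name__ == "__main__":   # required under spawn
    main()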
/opt/conda/lib/python3.10/site-packages/torch/utils/data/graph_settings.py:88: UserWarning: `shuffle=True` was set, but the datapipe does not contain a `Shuffler`. Adding one at the end. Be aware that the default buffer size might not be sufficient for your task. 2023-01-11T21:27:03.5068839Z warnings.warn( 2023-01-11T21:27:03.5069226Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5069566Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5070005Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5070363Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5070838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5071174Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5071606Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5071965Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5072393Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5072732Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5073163Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5073520Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5073946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5074291Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5074721Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5075065Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5075496Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5075831Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5076250Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5076622Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5077052Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5077410Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5077849Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5078195Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5078627Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5078965Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5079377Z 
2023-01-11T21:27:03.5081526Z ok (6.467s)
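The graph_settings.py warning above fires because DataLoader(shuffle=True) over an IterDataPipe expects a Shuffler node in the pipe and appends one when it is missing. A minimal sketch, not taken from the test file, of silencing it by declaring the Shuffler explicitly:

    from torch.utils.data import DataLoader
    from torch.utils.data.datapipes.iter import IterableWrapper

    dp = IterableWrapper(range(10))

    # Warns as above: no Shuffler in the graph, so one is added implicitly.
    implicit = DataLoader(dp, shuffle=True)

    # Silent: the Shuffler is part of the pipe, with an explicit buffer size.
    explicit = DataLoader(dp.shuffle(buffer_size=10), shuffle=True)

Declaring the Shuffler also lets you size its buffer, which is what the warning's note about the default buffer size is pointing at.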
2023-01-11T21:27:03.5081970Z test_no_segfault (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5082353Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5082781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5083140Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5083369Z ok (1.695s)
2023-01-11T21:27:03.5083594Z test_numpy (__main__.TestDataLoader) ... ok (0.002s)
2023-01-11T21:27:03.5083875Z test_numpy_gen_state (__main__.TestDataLoader) ... ok (0.003s)
2023-01-11T21:27:03.5084193Z test_numpy_scalars (__main__.TestDataLoader) ... ok (0.002s)
2023-01-11T21:27:03.5084469Z test_partial_workers (__main__.TestDataLoader)
2023-01-11T21:27:03.5084766Z Check that workers exit even if the iterator is not exhausted. ... ok (0.167s)
2023-01-11T21:27:03.5085045Z test_proper_exit (__main__.TestDataLoader)
2023-01-11T21:27:03.5085470Z There might be ConnectionResetError or leaked semaphore warning (due to dirty process exit), but they are all safe to ignore ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.004s)
2023-01-11T21:27:03.5085895Z test_random_sampler (__main__.TestDataLoader) ... ok (0.003s)
2023-01-11T21:27:03.5086212Z test_random_sampler_len_with_replacement (__main__.TestDataLoader) ... ok (0.001s)
2023-01-11T21:27:03.5086543Z test_random_sampler_len_without_replacement (__main__.TestDataLoader) ... ok (0.001s)
2023-01-11T21:27:03.5087068Z test_sampler (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5087452Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5087888Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5088241Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5093184Z ok (1.950s)
2023-01-11T21:27:03.5093429Z test_sampler_reproducibility (__main__.TestDataLoader) ... ok (0.014s)
2023-01-11T21:27:03.5093943Z test_segfault (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5094326Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5096668Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5097019Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5097256Z ok (3.254s)
2023-01-11T21:27:03.5097498Z test_seqential_batch_workers (__main__.TestDataLoader) ... ok (0.132s)
2023-01-11T21:27:03.5097810Z test_seqential_batch_workers_prefetch (__main__.TestDataLoader) ... ok (0.134s)
2023-01-11T21:27:03.5098112Z test_sequential_batch (__main__.TestDataLoader) ... ok (0.038s)
2023-01-11T21:27:03.5098405Z test_sequential_nonbatch (__main__.TestDataLoader) ... ok (0.021s)
2023-01-11T21:27:03.5098778Z test_sequential_pin_memory (__main__.TestDataLoader) ... ok (0.004s)
2023-01-11T21:27:03.5099068Z test_sequential_workers (__main__.TestDataLoader) ... ok (0.185s)
2023-01-11T21:27:03.5099345Z test_shuffle (__main__.TestDataLoader) ... ok (0.058s)
2023-01-11T21:27:03.5099618Z test_shuffle_batch (__main__.TestDataLoader) ... ok (0.052s)
2023-01-11T21:27:03.5099891Z test_shuffle_batch_none (__main__.TestDataLoader) ... ok (0.054s)
2023-01-11T21:27:03.5100182Z test_shuffle_batch_workers (__main__.TestDataLoader) ... ok (0.163s)
2023-01-11T21:27:03.5100486Z test_shuffle_batch_workers_prefetch (__main__.TestDataLoader) ... ok (0.163s)
2023-01-11T21:27:03.5100780Z test_shuffle_pin_memory (__main__.TestDataLoader) ... ok (0.122s)
2023-01-11T21:27:03.5101088Z test_shuffle_reproducibility (__main__.TestDataLoader) ... ok (0.406s)
2023-01-11T21:27:03.5101385Z test_shuffle_workers (__main__.TestDataLoader) ... ok (0.213s)
2023-01-11T21:27:03.5101880Z test_timeout (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5102252Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5102682Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5103034Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5104867Z ok (5.314s)
2023-01-11T21:27:03.5105096Z test_typing (__main__.TestDataLoader) ... ok (0.001s)
2023-01-11T21:27:03.5105378Z test_worker_init_fn (__main__.TestDataLoader) ... ok (0.051s)
2023-01-11T21:27:03.5105648Z test_worker_seed (__main__.TestDataLoader) ... ok (0.105s)
2023-01-11T21:27:03.5105944Z test_worker_seed_reproducibility (__main__.TestDataLoader) ... ok (0.208s)
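The suite that follows re-runs the same DataLoader tests with persistent_workers=True, which keeps worker processes alive across epochs instead of tearing them down whenever an iterator is exhausted. A minimal sketch of the flag:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    ds = TensorDataset(torch.arange(8.0))
    loader = DataLoader(ds, batch_size=2, num_workers=2,
                        persistent_workers=True)

    for epoch in range(2):   # both epochs are served by the same two workers
        for (batch,) in loader:
            pass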
2023-01-11T21:27:03.5106508Z test_batch_sampler (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5106924Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5107350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5107713Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5112692Z ok (2.008s)
2023-01-11T21:27:03.5112983Z test_builtin_collection_conversion (__main__.TestDataLoaderPersistentWorkers) ... ok (0.291s)
2023-01-11T21:27:03.5113361Z test_bulk_loading_nobatch (__main__.TestDataLoaderPersistentWorkers) ... ok (0.081s)
2023-01-11T21:27:03.5113739Z test_chain_iterable_style_dataset (__main__.TestDataLoaderPersistentWorkers) ... ok (0.110s)
2023-01-11T21:27:03.5114105Z test_dataset_not_reset (__main__.TestDataLoaderPersistentWorkers) ... ok (0.226s)
2023-01-11T21:27:03.5114482Z test_default_collate_bad_numpy_types (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5114877Z test_default_collate_bad_sequence_type (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5115251Z test_default_collate_dtype (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5115639Z test_default_collate_mapping_keep_type (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5116027Z test_default_collate_numpy_memmap (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5116422Z test_default_collate_sequence_dont_keep_type (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5116815Z test_default_collate_sequence_keep_type (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5117237Z test_default_collate_shared_tensor (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
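The test_default_collate_* cases above exercise the DataLoader's default batching function, which stacks tensors and recurses into mappings and sequences. A minimal sketch of that behavior:

    import torch
    from torch.utils.data import default_collate

    samples = [
        {"x": torch.tensor([1.0, 2.0]), "y": 0},
        {"x": torch.tensor([3.0, 4.0]), "y": 1},
    ]
    batch = default_collate(samples)
    print(batch["x"].shape)  # torch.Size([2, 2]): tensors are stacked
    print(batch["y"])        # tensor([0, 1]): Python ints become a tensor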
2023-01-11T21:27:03.5117623Z test_default_convert_mapping_keep_type (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5118028Z test_default_convert_sequence_dont_keep_type (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5118422Z test_default_convert_sequence_keep_type (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5118818Z test_distributed_sampler_invalid_rank (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5119209Z test_duplicating_data_with_drop_last (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5119567Z test_early_exit (__main__.TestDataLoaderPersistentWorkers) ... ok (12.399s)
2023-01-11T21:27:03.5119906Z test_error (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5120248Z test_error_in_init (__main__.TestDataLoaderPersistentWorkers) ... ok (0.100s)
2023-01-11T21:27:03.5120596Z test_error_workers (__main__.TestDataLoaderPersistentWorkers) ... ok (0.058s)
2023-01-11T21:27:03.5120965Z test_excessive_thread_creation_warning (__main__.TestDataLoaderPersistentWorkers) ... ok (0.008s)
2023-01-11T21:27:03.5121344Z test_fd_limit_exceeded (__main__.TestDataLoaderPersistentWorkers) ... ok (1.501s)
2023-01-11T21:27:03.5121932Z test_get_worker_info (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5122355Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5122784Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5123145Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5123385Z ok (3.342s)
2023-01-11T21:27:03.5123653Z test_growing_dataset (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5124020Z test_invalid_assign_after_init (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5124403Z test_invalid_ctor_args_combinations (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5124820Z test_iterable_style_dataset (__main__.TestDataLoaderPersistentWorkers) ... ok (20.994s)
2023-01-11T21:27:03.5125185Z test_iterabledataset_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5125783Z test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5126208Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5126638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5127005Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5127244Z ok (3.116s)
2023-01-11T21:27:03.5127502Z test_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.007s)
2023-01-11T21:27:03.5127862Z test_multi_epochs_reproducibility (__main__.TestDataLoaderPersistentWorkers) ... ok (0.057s)
2023-01-11T21:27:03.5128468Z test_multiple_dataloaders (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5128896Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5129325Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5129686Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5134955Z ok (3.481s)
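test_multiprocessing_contexts, which runs next, exercises the DataLoader under each multiprocessing start method. A minimal sketch of the parameter it iterates over (the start methods listed are the standard Linux ones; fork is unsafe once CUDA has been initialized, which is why spawn appears throughout this log):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    if __name__ == "__main__":
        ds = TensorDataset(torch.arange(8.0))
        for ctx in ("fork", "spawn", "forkserver"):
            loader = DataLoader(ds, batch_size=2, num_workers=2,
                                multiprocessing_context=ctx)
            assert sum(len(b) for (b,) in loader) == 8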
2023-01-11T21:27:03.5135461Z test_multiprocessing_contexts (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5135883Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5136312Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5136656Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5140171Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2023-01-11T21:27:03.5156732Z ok (12.296s)
2023-01-11T21:27:03.5157450Z test_multiprocessing_iterdatapipe (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/utils/data/graph_settings.py:88: UserWarning: `shuffle=True` was set, but the datapipe does not contain a `Shuffler`. Adding one at the end. Be aware that the default buffer size might not be sufficient for your task.
2023-01-11T21:27:03.5157964Z warnings.warn(
2023-01-11T21:27:03.5158351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5158683Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5159116Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5159469Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5170587Z ok (6.424s)
2023-01-11T21:27:03.5171076Z test_no_segfault (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5171485Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5171917Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5172273Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5172512Z ok (1.675s)
2023-01-11T21:27:03.5172775Z test_numpy (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5173124Z test_numpy_gen_state (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2023-01-11T21:27:03.5173484Z test_numpy_scalars (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5173818Z test_partial_workers (__main__.TestDataLoaderPersistentWorkers)
2023-01-11T21:27:03.5174149Z Check that workers exit even if the iterator is not exhausted. ... ok (0.225s)
2023-01-11T21:27:03.5174637Z test_proper_exit (__main__.TestDataLoaderPersistentWorkers)
2023-01-11T21:27:03.5175130Z There might be ConnectionResetError or leaked semaphore warning (due to dirty process exit), but they are all safe to ignore ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.003s)
2023-01-11T21:27:03.5175681Z test_random_sampler (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5176070Z test_random_sampler_len_with_replacement (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5176471Z test_random_sampler_len_without_replacement (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5177121Z test_sampler (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5177534Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5177964Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5178317Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5183232Z ok (1.965s)
2023-01-11T21:27:03.5183525Z test_sampler_reproducibility (__main__.TestDataLoaderPersistentWorkers) ... ok (0.013s)
2023-01-11T21:27:03.5184102Z test_segfault (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5184515Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5184949Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5185310Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5185541Z ok (3.227s)
2023-01-11T21:27:03.5185823Z test_seqential_batch_workers (__main__.TestDataLoaderPersistentWorkers) ... ok (0.141s)
2023-01-11T21:27:03.5186213Z test_seqential_batch_workers_prefetch (__main__.TestDataLoaderPersistentWorkers) ... ok (0.151s)
2023-01-11T21:27:03.5186585Z test_sequential_batch (__main__.TestDataLoaderPersistentWorkers) ... ok (0.037s)
2023-01-11T21:27:03.5186952Z test_sequential_nonbatch (__main__.TestDataLoaderPersistentWorkers) ... ok (0.021s)
2023-01-11T21:27:03.5187320Z test_sequential_pin_memory (__main__.TestDataLoaderPersistentWorkers) ... ok (0.004s)
2023-01-11T21:27:03.5187682Z test_sequential_workers (__main__.TestDataLoaderPersistentWorkers) ... ok (0.197s)
2023-01-11T21:27:03.5188027Z test_shuffle (__main__.TestDataLoaderPersistentWorkers) ... ok (0.058s)
2023-01-11T21:27:03.5188374Z test_shuffle_batch (__main__.TestDataLoaderPersistentWorkers) ... ok (0.052s)
2023-01-11T21:27:03.5188729Z test_shuffle_batch_none (__main__.TestDataLoaderPersistentWorkers) ... ok (0.054s)
2023-01-11T21:27:03.5189088Z test_shuffle_batch_workers (__main__.TestDataLoaderPersistentWorkers) ... ok (0.182s)
2023-01-11T21:27:03.5189468Z test_shuffle_batch_workers_prefetch (__main__.TestDataLoaderPersistentWorkers) ... ok (0.176s)
2023-01-11T21:27:03.5189840Z test_shuffle_pin_memory (__main__.TestDataLoaderPersistentWorkers) ... ok (0.133s)
2023-01-11T21:27:03.5190238Z test_shuffle_reproducibility (__main__.TestDataLoaderPersistentWorkers) ... ok (0.302s)
2023-01-11T21:27:03.5190599Z test_shuffle_workers (__main__.TestDataLoaderPersistentWorkers) ... ok (0.208s)
2023-01-11T21:27:03.5191170Z test_timeout (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5191582Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5192016Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5192370Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5194178Z ok (5.317s)
2023-01-11T21:27:03.5194433Z test_typing (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5194779Z test_worker_init_fn (__main__.TestDataLoaderPersistentWorkers) ... ok (0.052s)
2023-01-11T21:27:03.5195125Z test_worker_seed (__main__.TestDataLoaderPersistentWorkers) ... ok (0.110s)
2023-01-11T21:27:03.5195492Z test_worker_seed_reproducibility (__main__.TestDataLoaderPersistentWorkers) ... ok (0.208s)
2023-01-11T21:27:03.5195917Z test_incomplete_fractional_splits (__main__.TestDatasetRandomSplit) ... ok (0.001s)
2023-01-11T21:27:03.5196272Z test_lengths_must_equal_dataset_size (__main__.TestDatasetRandomSplit) ... ok (0.001s)
2023-01-11T21:27:03.5196621Z test_slicing_of_subset_of_dataset (__main__.TestDatasetRandomSplit) ... ok (0.003s)
2023-01-11T21:27:03.5196959Z test_slicing_of_subset_of_subset (__main__.TestDatasetRandomSplit) ... ok (0.002s)
2023-01-11T21:27:03.5197295Z test_splits_are_mutually_exclusive (__main__.TestDatasetRandomSplit) ... ok (0.002s)
2023-01-11T21:27:03.5197628Z test_splits_generator (__main__.TestDatasetRandomSplit) ... ok (0.002s)
2023-01-11T21:27:03.5197955Z test_splits_have_correct_size (__main__.TestDatasetRandomSplit) ... ok (0.001s)
2023-01-11T21:27:03.5198271Z test_splits_indexing_type (__main__.TestDatasetRandomSplit)
2023-01-11T21:27:03.5198555Z Indices generated by random_split ... ok (0.002s)
2023-01-11T21:27:03.5198866Z test_splits_reproducibility (__main__.TestDatasetRandomSplit) ... ok (0.007s)
2023-01-11T21:27:03.5199173Z test_pin_memory (__main__.TestDictDataLoader) ... ok (0.001s)
2023-01-11T21:27:03.5199474Z test_pin_memory_device (__main__.TestDictDataLoader) ... ok (0.001s)
2023-01-11T21:27:03.5200213Z test_pin_memory_with_only_device (__main__.TestDictDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py:599: UserWarning: pin memory device is set and pin_memory flag is not used then device pinned memory won't be usedplease set pin_memory to true, if you need to use the device pin memory
2023-01-11T21:27:03.5200701Z warnings.warn(warn_msg)
2023-01-11T21:27:03.5200892Z ok (0.001s)
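The dataloader.py:599 warning above means pin_memory_device alone does nothing: the pin_memory flag must also be set. A minimal sketch, assuming a CUDA machine:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    ds = TensorDataset(torch.arange(4.0))

    warned = DataLoader(ds, pin_memory_device="cuda")  # warns; nothing is pinned
    pinned = DataLoader(ds, pin_memory=True, pin_memory_device="cuda")

    (batch,) = next(iter(pinned))
    print(batch.is_pinned())  # True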
2023-01-11T21:27:03.5201137Z test_sequential_batch (__main__.TestDictDataLoader) ... ok (0.034s)
2023-01-11T21:27:03.5201972Z test_ind_worker_queue (__main__.TestIndividualWorkerQueue) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/68643 for platform(s) macos, linux, win. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s)
2023-01-11T21:27:03.5202593Z test_dataloader_with_namedtuple (__main__.TestNamedTupleDataLoader) ... ok (0.002s)
2023-01-11T21:27:03.5202971Z test_set_affinity_in_worker_init (__main__.TestSetAffinity) ... ok (0.057s)
2023-01-11T21:27:03.5203287Z test_shuffle_pin_memory (__main__.TestStringDataLoader) ... ok (0.075s)
2023-01-11T21:27:03.5203586Z test_getitem (__main__.TestTensorDataset) ... ok (0.005s)
2023-01-11T21:27:03.5203869Z test_getitem_1d (__main__.TestTensorDataset) ... ok (0.004s)
2023-01-11T21:27:03.5204139Z test_len (__main__.TestTensorDataset) ... ok (0.001s)
2023-01-11T21:27:03.5204420Z test_many_tensors (__main__.TestTensorDataset) ... ok (0.003s)
2023-01-11T21:27:03.5204712Z test_single_tensor (__main__.TestTensorDataset) ... ok (0.001s)
2023-01-11T21:27:03.5204887Z 
2023-01-11T21:27:03.5205084Z ----------------------------------------------------------------------
2023-01-11T21:27:03.5205346Z Ran 164 tests in 131.135s
2023-01-11T21:27:03.5205469Z 
2023-01-11T21:27:03.5205549Z OK (skipped=5)
2023-01-11T21:27:03.5205664Z 
2023-01-11T21:27:03.5205755Z Generating XML reports...
2023-01-11T21:27:03.5206171Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestConcatDataset-20230111212451.xml
2023-01-11T21:27:03.5206701Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestCustomPinFn-20230111212451.xml
2023-01-11T21:27:03.5207214Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestDataLoader-20230111212451.xml
2023-01-11T21:27:03.5207760Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestDataLoaderPersistentWorkers-20230111212451.xml
2023-01-11T21:27:03.5208334Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestDatasetRandomSplit-20230111212451.xml
2023-01-11T21:27:03.5208912Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestDictDataLoader-20230111212451.xml
2023-01-11T21:27:03.5209454Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestNamedTupleDataLoader-20230111212451.xml
2023-01-11T21:27:03.5209979Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestSetAffinity-20230111212451.xml
2023-01-11T21:27:03.5210505Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestStringDataLoader-20230111212451.xml
2023-01-11T21:27:03.5211027Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestTensorDataset-20230111212451.xml
2023-01-11T21:27:03.5211601Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-IntegrationTestDataLoaderDataPipe-20230111212451.xml
2023-01-11T21:27:03.5212151Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestConvAfterFork-20230111212451.xml
2023-01-11T21:27:03.5212699Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestIndividualWorkerQueue-20230111212451.xml
2023-01-11T21:27:03.5212963Z 
2023-01-11T21:27:03.5213353Z ##[endgroup]
2023-01-11T21:27:03.5213741Z FINISHED PRINTING LOG FILE of test_dataloader (/var/lib/jenkins/workspace/test/test-reports/test_dataloader_rmt2ijzu)
2023-01-11T21:27:03.5213963Z 
2023-01-11T21:27:05.4220663Z Ignoring disabled issues: []
2023-01-11T21:27:05.4460874Z Running test_functional_optim ... [2023-01-11 21:27:05.445528]
2023-01-11T21:27:05.4462281Z Executing ['/opt/conda/bin/python', '-bb', 'test_functional_optim.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:27:05.445902]
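The Executing line above shows how the harness launches each test file: the interpreter's -bb flag turns bytes/str comparisons into errors, and the two --import-* flags are what produce the "loaded 76 slow tests" / "loaded 210 disabled tests" warnings throughout this log. A sketch of reproducing one such invocation locally; the working directory is an assumption, not something the log states:

    import subprocess
    import sys

    subprocess.run(
        [sys.executable, "-bb", "test_functional_optim.py", "-v",
         "--import-slow-tests", "--import-disabled-tests"],
        cwd="pytorch/test",  # assumed location of the checkout's test dir
        check=True,
    )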
2023-01-11T21:27:08.5022684Z 
2023-01-11T21:27:08.5023166Z Expand the folded group to see the log file of test_functional_optim
2023-01-11T21:27:08.5024152Z ##[group]PRINTING LOG FILE of test_functional_optim (/var/lib/jenkins/workspace/test/test-reports/test_functional_optim_xasvohng)
2023-01-11T21:27:08.5024517Z 
2023-01-11T21:27:08.5024653Z Running tests...
2023-01-11T21:27:08.5025166Z ----------------------------------------------------------------------
2023-01-11T21:27:08.5025568Z Test results will be stored in test-reports/python-unittest/test_functional_optim
2023-01-11T21:27:08.5025938Z test_functional_optim_parity_adam (__main__.TestFunctionalOptimParity) ... ok (1.129s)
2023-01-11T21:27:08.5026518Z test_functional_optim_parity_adam_w (__main__.TestFunctionalOptimParity) ... ok (0.020s)
2023-01-11T21:27:08.5026890Z test_functional_optim_parity_sgd (__main__.TestFunctionalOptimParity) ... ok (0.018s)
2023-01-11T21:27:08.5027246Z test_functional_optim_registration (__main__.TestFunctionalOptimParity) ... ok (0.000s)
2023-01-11T21:27:08.5027452Z 
2023-01-11T21:27:08.5027652Z ----------------------------------------------------------------------
2023-01-11T21:27:08.5027912Z Ran 4 tests in 1.168s
2023-01-11T21:27:08.5028033Z 
2023-01-11T21:27:08.5028094Z OK
2023-01-11T21:27:08.5028194Z 
2023-01-11T21:27:08.5028294Z Generating XML reports...
2023-01-11T21:27:08.5028766Z Generated XML report: test-reports/python-unittest/test_functional_optim/TEST-TestFunctionalOptimParity-20230111212706.xml
2023-01-11T21:27:08.5029034Z 
2023-01-11T21:27:08.5029263Z ##[endgroup]
2023-01-11T21:27:08.5029672Z FINISHED PRINTING LOG FILE of test_functional_optim (/var/lib/jenkins/workspace/test/test-reports/test_functional_optim_xasvohng)
2023-01-11T21:27:08.5029908Z 
2023-01-11T21:27:10.4149796Z Ignoring disabled issues: []
2023-01-11T21:27:10.4383344Z Running test_fx_experimental ... [2023-01-11 21:27:10.437525]
2023-01-11T21:27:10.4383939Z Executing ['/opt/conda/bin/python', '-bb', 'test_fx_experimental.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:27:10.437870]
2023-01-11T21:27:27.6753342Z 
2023-01-11T21:27:27.6754112Z Expand the folded group to see the log file of test_fx_experimental
2023-01-11T21:27:27.6755158Z ##[group]PRINTING LOG FILE of test_fx_experimental (/var/lib/jenkins/workspace/test/test-reports/test_fx_experimental_bv6m_9hq)
2023-01-11T21:27:27.6755905Z 
2023-01-11T21:27:27.6756043Z Running tests...
2023-01-11T21:27:27.6756645Z ----------------------------------------------------------------------
2023-01-11T21:27:27.6757288Z Test results will be stored in test-reports/python-unittest/test_fx_experimental
2023-01-11T21:27:27.6757806Z test_annotate_getitem_node (__main__.TestFXExperimental) ... ok (0.008s)
2023-01-11T21:27:27.6759027Z test_annotate_returns_with_schema (__main__.TestFXExperimental) ... /opt/conda/lib/python3.10/site-packages/torch/jit/_check.py:181: UserWarning: The TorchScript type system doesn't support instance-level annotations on empty non-base types in `__init__`. Instead, either 1) use a type annotation in the class body, or 2) wrap the type in `torch.jit.Attribute`.
2023-01-11T21:27:27.6759948Z warnings.warn("The TorchScript type system doesn't support "
2023-01-11T21:27:27.6760289Z ok (1.169s)
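The jit/_check.py warning above concerns attributes initialized as empty containers in __init__, whose element type TorchScript cannot infer. A minimal sketch of the two remedies the warning itself names; the Good class is illustrative, not from the test file:

    from typing import List

    import torch

    class Good(torch.nn.Module):
        words: List[str]  # 1) class-body annotation

        def __init__(self):
            super().__init__()
            self.words = []
            # 2) or wrap the empty value so the scripter sees its type
            self.counts = torch.jit.Attribute([], List[int])

        def forward(self, w: str) -> int:
            self.words.append(w)
            return len(self.words)

    scripted = torch.jit.script(Good())  # compiles without the warning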
2023-01-11T21:27:27.6760635Z test_aot_based_partition (__main__.TestFXExperimental) ... ok (0.005s)
2023-01-11T21:27:27.6761097Z test_call_to_assert_no_msg (__main__.TestFXExperimental) ... ok (0.003s)
2023-01-11T21:27:27.6761557Z test_call_to_assert_with_empty_msg (__main__.TestFXExperimental) ... ok (0.003s)
2023-01-11T21:27:27.6762024Z test_call_to_assert_with_msg (__main__.TestFXExperimental) ... ok (0.003s)
2023-01-11T21:27:27.6762487Z test_call_to_assert_with_multiline_message (__main__.TestFXExperimental) ... ok (0.003s)
2023-01-11T21:27:27.6763185Z test_conv_bn_fusion (__main__.TestFXExperimental) ... ok (0.727s)
2023-01-11T21:27:27.6763680Z test_conv_bn_fusion_not_running_state (__main__.TestFXExperimental) ... ok (0.008s)
2023-01-11T21:27:27.6764158Z test_cost_aware_partition (__main__.TestFXExperimental) ... ok (0.011s)
2023-01-11T21:27:27.6764576Z test_fetch (__main__.TestFXExperimental) ... ok (0.003s)
2023-01-11T21:27:27.6765012Z test_find_single_partition (__main__.TestFXExperimental) ... ok (0.003s)
2023-01-11T21:27:27.6765456Z test_lack_of_devices (__main__.TestFXExperimental) ... ok (0.002s)
2023-01-11T21:27:27.6765904Z test_large_node_error (__main__.TestFXExperimental) ... ok (0.003s)
2023-01-11T21:27:27.6766348Z test_merge_matmuls (__main__.TestFXExperimental)
2023-01-11T21:27:27.6766717Z A collection of test cases for torch.fx.experimental.merge_matmul, ... ok (0.025s)
2023-01-11T21:27:27.6767148Z test_meta_tracer (__main__.TestFXExperimental) ... ok (0.017s)
2023-01-11T21:27:27.6767474Z test_normalize_args (__main__.TestFXExperimental) ... ok (0.466s)
2023-01-11T21:27:27.6768139Z test_normalize_args_perserve_type (__main__.TestFXExperimental) ... /opt/conda/lib/python3.10/site-packages/torch/fx/operator_schemas.py:207: UserWarning: Does not support nested parametric types, got typing.List[~t]. Please file a bug.
2023-01-11T21:27:27.6768553Z warnings.warn(
2023-01-11T21:27:27.6768745Z ok (0.005s)
2023-01-11T21:27:27.6768999Z test_normalize_args_preserve_meta (__main__.TestFXExperimental) ... ok (0.004s)
2023-01-11T21:27:27.6769341Z test_normalize_binary_operators (__main__.TestFXExperimental) ... ok (0.047s)
2023-01-11T21:27:27.6769667Z test_normalize_modules_exhaustive (__main__.TestFXExperimental)
2023-01-11T21:27:27.6770468Z Exhaustively test `Node.normalized_arguments` on all standard ... /opt/conda/lib/python3.10/site-packages/torch/nn/modules/conv.py:309: UserWarning: Using padding='same' with even kernel lengths and odd dilation may require a zero-padded copy of the input be created (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Convolution.cpp:997.)
2023-01-11T21:27:27.6771035Z return F.conv1d(input, weight, bias, self.stride,
2023-01-11T21:27:27.6771258Z ok (1.404s)
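The conv.py warning above is about padding='same' with an even kernel length: the required padding is then asymmetric, so a zero-padded copy of the input may be created. A minimal sketch of the triggering and non-triggering cases:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 16)

    even = nn.Conv1d(3, 8, kernel_size=4, padding="same")  # warns; may copy input
    odd = nn.Conv1d(3, 8, kernel_size=3, padding="same")   # pads symmetrically

    print(even(x).shape, odd(x).shape)  # both preserve the length of 16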
2023-01-11T21:27:27.6771517Z test_optimize_for_inference_cpu (__main__.TestFXExperimental) ... ok (0.229s)
2023-01-11T21:27:27.6771854Z test_optimize_for_inference_cpu_torchvision (__main__.TestFXExperimental) ... ok (7.288s)
2023-01-11T21:27:27.6772201Z test_partition_device_mapping (__main__.TestFXExperimental) ... ok (0.007s)
2023-01-11T21:27:27.6772562Z test_partition_latency (__main__.TestFXExperimental) ... ok (0.006s)
2023-01-11T21:27:27.6772882Z test_partition_node_manipulation (__main__.TestFXExperimental) ... ok (0.004s)
2023-01-11T21:27:27.6773223Z test_replace_target_nodes_with (__main__.TestFXExperimental) ... ok (0.002s)
2023-01-11T21:27:27.6773537Z test_saturate_host (__main__.TestFXExperimental) ... [0, 4]
2023-01-11T21:27:27.6773765Z [1, 2]
2023-01-11T21:27:27.6773929Z ok (0.005s)
2023-01-11T21:27:27.6774179Z test_size_based_partition (__main__.TestFXExperimental) ... ok (0.005s)
2023-01-11T21:27:27.6774881Z test_sparse_nn_partition (__main__.TestFXExperimental) ... ok (0.103s)
2023-01-11T21:27:27.6775205Z test_split_module_default_arg (__main__.TestFXExperimental) ... ok (0.006s)
2023-01-11T21:27:27.6775544Z test_split_module_kwargs_expansion (__main__.TestFXExperimental) ... ok (0.003s)
2023-01-11T21:27:27.6775881Z test_split_qualname_mapping (__main__.TestFXExperimental) ... ok (0.005s)
2023-01-11T21:27:27.6776198Z test_subgraph_creation (__main__.TestFXExperimental) ... ok (0.005s)
2023-01-11T21:27:27.6776523Z test_subgraph_trivial_resnet (__main__.TestFXExperimental) ... ok (0.152s)
2023-01-11T21:27:27.6776848Z test_subgraph_uniquename (__main__.TestFXExperimental) ... ok (0.005s)
2023-01-11T21:27:27.6777481Z test_to_folder (__main__.TestFXExperimental) ... /opt/conda/lib/python3.10/site-packages/torch/fx/graph_module.py:476: UserWarning: Was not able to save the following children modules as reprs -saved as pickled files instead: ['seq']
2023-01-11T21:27:27.6777984Z warnings.warn("Was not able to save the following children modules as reprs -"
2023-01-11T21:27:27.6778240Z ok (0.008s)
2023-01-11T21:27:27.6778518Z test_traceable_function_with_nonstandard_name (__main__.TestFXExperimental) ... ok (0.002s)
2023-01-11T21:27:27.6778840Z test_type_matches (__main__.TestFXExperimental) ... ok (0.002s)
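The graph_module.py warning above comes from GraphModule.to_folder, which dumps a traced module as Python source but falls back to pickle files for submodules it cannot reproduce as a constructor call (here a child named 'seq'). A minimal sketch, with hypothetical module and folder names:

    import torch.nn as nn
    from torch.fx import symbolic_trace

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.seq = nn.Sequential(nn.Linear(4, 4), nn.ReLU())

        def forward(self, x):
            return self.seq(x)

    gm = symbolic_trace(Net())
    gm.to_folder("exported_net", "ExportedNet")  # warns: ['seq'] is pickled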
2023-01-11T21:27:27.6779179Z test_normalize_args_op_overload_cuda (__main__.TestNormalizeOperatorsCUDA) ... ok (0.002s)
2023-01-11T21:27:27.6779592Z test_normalize_operator_exhaustive_H_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6780022Z test_normalize_operator_exhaustive_T_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6780527Z test_normalize_operator_exhaustive___getitem___cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6780979Z test_normalize_operator_exhaustive___radd___cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6781418Z test_normalize_operator_exhaustive___rdiv___cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6781858Z test_normalize_operator_exhaustive___rmatmul___cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6782290Z test_normalize_operator_exhaustive___rmod___cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6782727Z test_normalize_operator_exhaustive___rmul___cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6783167Z test_normalize_operator_exhaustive___rpow___cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6783605Z test_normalize_operator_exhaustive___rsub___cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6784049Z test_normalize_operator_exhaustive__native_batch_norm_legit_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6784519Z test_normalize_operator_exhaustive__softmax_backward_data_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6784969Z test_normalize_operator_exhaustive_abs_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6785456Z test_normalize_operator_exhaustive_acos_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6785892Z test_normalize_operator_exhaustive_acosh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6786334Z test_normalize_operator_exhaustive_add_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6786822Z test_normalize_operator_exhaustive_addbmm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6787270Z test_normalize_operator_exhaustive_addcdiv_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6787710Z test_normalize_operator_exhaustive_addcmul_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6788159Z test_normalize_operator_exhaustive_addmm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6788613Z test_normalize_operator_exhaustive_addmm_decomposed_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6789062Z test_normalize_operator_exhaustive_addmv_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6789492Z test_normalize_operator_exhaustive_addr_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6789929Z test_normalize_operator_exhaustive_all_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6790372Z test_normalize_operator_exhaustive_allclose_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6790817Z test_normalize_operator_exhaustive_amax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6791244Z test_normalize_operator_exhaustive_amin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6791715Z test_normalize_operator_exhaustive_aminmax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6792161Z test_normalize_operator_exhaustive_angle_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6792598Z test_normalize_operator_exhaustive_any_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6793031Z test_normalize_operator_exhaustive_arange_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6793476Z test_normalize_operator_exhaustive_argmax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6793929Z test_normalize_operator_exhaustive_argmin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6794372Z test_normalize_operator_exhaustive_argsort_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6794810Z test_normalize_operator_exhaustive_argwhere_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6795256Z test_normalize_operator_exhaustive_as_strided_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6795723Z test_normalize_operator_exhaustive_as_strided_partial_views_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6796198Z test_normalize_operator_exhaustive_as_strided_scatter_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6796683Z test_normalize_operator_exhaustive_asin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6797112Z test_normalize_operator_exhaustive_asinh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6797556Z test_normalize_operator_exhaustive_atan2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6797995Z test_normalize_operator_exhaustive_atan_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6798439Z test_normalize_operator_exhaustive_atanh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6798875Z test_normalize_operator_exhaustive_atleast_1d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6799328Z test_normalize_operator_exhaustive_atleast_2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6799774Z test_normalize_operator_exhaustive_atleast_3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6800228Z test_normalize_operator_exhaustive_baddbmm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6800667Z test_normalize_operator_exhaustive_bernoulli_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6801111Z test_normalize_operator_exhaustive_bfloat16_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6801558Z test_normalize_operator_exhaustive_block_diag_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6801997Z test_normalize_operator_exhaustive_bmm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6802427Z test_normalize_operator_exhaustive_bool_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6802908Z test_normalize_operator_exhaustive_broadcast_shapes_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6803374Z test_normalize_operator_exhaustive_broadcast_tensors_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6803832Z test_normalize_operator_exhaustive_broadcast_to_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6804276Z test_normalize_operator_exhaustive_bucketize_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6804719Z test_normalize_operator_exhaustive_byte_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6805173Z test_normalize_operator_exhaustive_cartesian_prod_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6805624Z test_normalize_operator_exhaustive_cat_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6806058Z test_normalize_operator_exhaustive_cdist_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6806501Z test_normalize_operator_exhaustive_cdouble_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6807390Z test_normalize_operator_exhaustive_cfloat_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6807850Z test_normalize_operator_exhaustive_chalf_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6808288Z test_normalize_operator_exhaustive_char_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6808734Z test_normalize_operator_exhaustive_cholesky_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6809188Z test_normalize_operator_exhaustive_cholesky_inverse_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6809642Z test_normalize_operator_exhaustive_cholesky_solve_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6810089Z test_normalize_operator_exhaustive_chunk_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6810523Z test_normalize_operator_exhaustive_clamp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6810973Z test_normalize_operator_exhaustive_clamp_max_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6811417Z test_normalize_operator_exhaustive_clamp_min_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6811860Z test_normalize_operator_exhaustive_clone_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6812307Z test_normalize_operator_exhaustive_column_stack_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6812760Z test_normalize_operator_exhaustive_combinations_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6813201Z test_normalize_operator_exhaustive_complex_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6813647Z test_normalize_operator_exhaustive_conj_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6814123Z test_normalize_operator_exhaustive_conj_physical_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6814855Z test_normalize_operator_exhaustive_constant_pad_nd_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6815305Z test_normalize_operator_exhaustive_contiguous_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6815757Z test_normalize_operator_exhaustive_copysign_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6816205Z test_normalize_operator_exhaustive_corrcoef_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6816649Z test_normalize_operator_exhaustive_cos_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6817083Z test_normalize_operator_exhaustive_cosh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6817528Z test_normalize_operator_exhaustive_count_nonzero_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6817973Z test_normalize_operator_exhaustive_cov_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6818407Z test_normalize_operator_exhaustive_cross_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6818842Z test_normalize_operator_exhaustive_cummax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6819284Z test_normalize_operator_exhaustive_cummin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6819782Z test_normalize_operator_exhaustive_cumprod_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6820231Z test_normalize_operator_exhaustive_cumsum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6820681Z test_normalize_operator_exhaustive_cumulative_trapezoid_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6821138Z test_normalize_operator_exhaustive_deg2rad_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6821577Z test_normalize_operator_exhaustive_diag_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6822024Z test_normalize_operator_exhaustive_diag_embed_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.003s) 2023-01-11T21:27:27.6822471Z test_normalize_operator_exhaustive_diagflat_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6822926Z test_normalize_operator_exhaustive_diagonal_copy_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6823378Z test_normalize_operator_exhaustive_diagonal_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6823836Z test_normalize_operator_exhaustive_diagonal_scatter_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6824281Z test_normalize_operator_exhaustive_diff_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6824719Z test_normalize_operator_exhaustive_digamma_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6825163Z test_normalize_operator_exhaustive_dist_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6825612Z test_normalize_operator_exhaustive_div_floor_rounding_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6826103Z test_normalize_operator_exhaustive_div_no_rounding_mode_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6826573Z test_normalize_operator_exhaustive_div_trunc_rounding_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6827075Z test_normalize_operator_exhaustive_dot_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6827515Z test_normalize_operator_exhaustive_double_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6827955Z test_normalize_operator_exhaustive_dsplit_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6828399Z test_normalize_operator_exhaustive_dstack_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6828843Z test_normalize_operator_exhaustive_einsum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6829284Z test_normalize_operator_exhaustive_empty_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6829718Z test_normalize_operator_exhaustive_empty_like_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s) 2023-01-11T21:27:27.6830154Z test_normalize_operator_exhaustive_eq_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6830584Z test_normalize_operator_exhaustive_equal_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6831058Z test_normalize_operator_exhaustive_erf_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6831490Z test_normalize_operator_exhaustive_erfc_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6831933Z test_normalize_operator_exhaustive_erfinv_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6832373Z test_normalize_operator_exhaustive_exp2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6832812Z test_normalize_operator_exhaustive_exp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6833247Z test_normalize_operator_exhaustive_expand_as_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6833698Z test_normalize_operator_exhaustive_expand_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6834140Z test_normalize_operator_exhaustive_expm1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6834582Z test_normalize_operator_exhaustive_eye_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6835013Z test_normalize_operator_exhaustive_fft_fft2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6835457Z test_normalize_operator_exhaustive_fft_fft_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6835900Z test_normalize_operator_exhaustive_fft_fftn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6836350Z test_normalize_operator_exhaustive_fft_fftshift_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6836794Z test_normalize_operator_exhaustive_fft_hfft2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6837281Z test_normalize_operator_exhaustive_fft_hfft_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6837728Z test_normalize_operator_exhaustive_fft_hfftn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6838173Z test_normalize_operator_exhaustive_fft_ifft2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6838605Z test_normalize_operator_exhaustive_fft_ifft_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6839046Z test_normalize_operator_exhaustive_fft_ifftn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6839500Z test_normalize_operator_exhaustive_fft_ifftshift_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6839953Z test_normalize_operator_exhaustive_fft_ihfft2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6840391Z test_normalize_operator_exhaustive_fft_ihfft_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6840838Z test_normalize_operator_exhaustive_fft_ihfftn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6841284Z test_normalize_operator_exhaustive_fft_irfft2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6841731Z test_normalize_operator_exhaustive_fft_irfft_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6842226Z test_normalize_operator_exhaustive_fft_irfftn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6842671Z test_normalize_operator_exhaustive_fft_rfft2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6843113Z test_normalize_operator_exhaustive_fft_rfft_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6843554Z test_normalize_operator_exhaustive_fft_rfftn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6843983Z test_normalize_operator_exhaustive_fill_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6844426Z test_normalize_operator_exhaustive_flatten_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6844870Z test_normalize_operator_exhaustive_flip_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6845317Z test_normalize_operator_exhaustive_fliplr_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6845752Z test_normalize_operator_exhaustive_flipud_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6846195Z test_normalize_operator_exhaustive_float_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6846646Z test_normalize_operator_exhaustive_float_power_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6847092Z test_normalize_operator_exhaustive_floor_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6847576Z test_normalize_operator_exhaustive_floor_divide_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6848028Z test_normalize_operator_exhaustive_fmax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6848491Z test_normalize_operator_exhaustive_fmin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6848923Z test_normalize_operator_exhaustive_fmod_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6849349Z test_normalize_operator_exhaustive_frac_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6849787Z test_normalize_operator_exhaustive_frexp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6850217Z test_normalize_operator_exhaustive_full_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6850662Z test_normalize_operator_exhaustive_full_like_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6851099Z test_normalize_operator_exhaustive_gather_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6851543Z test_normalize_operator_exhaustive_ge_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6851982Z test_normalize_operator_exhaustive_geqrf_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6852427Z test_normalize_operator_exhaustive_gradient_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6852874Z test_normalize_operator_exhaustive_grid_sampler_2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6853320Z test_normalize_operator_exhaustive_gt_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6853792Z test_normalize_operator_exhaustive_half_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6854238Z test_normalize_operator_exhaustive_heaviside_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6854879Z test_normalize_operator_exhaustive_histc_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6855325Z test_normalize_operator_exhaustive_hsplit_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6855764Z test_normalize_operator_exhaustive_hstack_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6856199Z test_normalize_operator_exhaustive_hypot_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6856625Z test_normalize_operator_exhaustive_i0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6857082Z test_normalize_operator_exhaustive_igamma_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6857551Z test_normalize_operator_exhaustive_igammac_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6857995Z test_normalize_operator_exhaustive_index_add_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6858437Z test_normalize_operator_exhaustive_index_copy_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6858885Z test_normalize_operator_exhaustive_index_fill_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6859332Z test_normalize_operator_exhaustive_index_put_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6859782Z test_normalize_operator_exhaustive_index_reduce_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6860290Z test_normalize_operator_exhaustive_index_select_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6860728Z test_normalize_operator_exhaustive_inner_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6861159Z test_normalize_operator_exhaustive_int_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6861592Z test_normalize_operator_exhaustive_isclose_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6862041Z test_normalize_operator_exhaustive_isfinite_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6862474Z test_normalize_operator_exhaustive_isin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6862909Z test_normalize_operator_exhaustive_isinf_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6863344Z test_normalize_operator_exhaustive_isnan_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6863780Z test_normalize_operator_exhaustive_isneginf_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6864216Z test_normalize_operator_exhaustive_isposinf_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6864656Z test_normalize_operator_exhaustive_isreal_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6865164Z test_normalize_operator_exhaustive_jiterator_2inputs_2outputs_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6865654Z test_normalize_operator_exhaustive_jiterator_4inputs_with_extra_args_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6866127Z test_normalize_operator_exhaustive_jiterator_binary_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6866601Z test_normalize_operator_exhaustive_jiterator_binary_return_by_ref_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6867075Z test_normalize_operator_exhaustive_jiterator_unary_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6867522Z test_normalize_operator_exhaustive_kron_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6867964Z test_normalize_operator_exhaustive_kthvalue_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6868407Z test_normalize_operator_exhaustive_ldexp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6868858Z test_normalize_operator_exhaustive_le_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6869298Z test_normalize_operator_exhaustive_lerp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6869731Z test_normalize_operator_exhaustive_lgamma_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6870183Z test_normalize_operator_exhaustive_linalg_cholesky_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6870648Z test_normalize_operator_exhaustive_linalg_cholesky_ex_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6871106Z test_normalize_operator_exhaustive_linalg_cond_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.003s) 2023-01-11T21:27:27.6871581Z test_normalize_operator_exhaustive_linalg_cross_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6872033Z test_normalize_operator_exhaustive_linalg_det_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6872493Z test_normalize_operator_exhaustive_linalg_det_singular_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6872951Z test_normalize_operator_exhaustive_linalg_eig_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6873393Z test_normalize_operator_exhaustive_linalg_eigh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6873851Z test_normalize_operator_exhaustive_linalg_eigvals_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6874315Z test_normalize_operator_exhaustive_linalg_eigvalsh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6874786Z test_normalize_operator_exhaustive_linalg_householder_product_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6875245Z test_normalize_operator_exhaustive_linalg_inv_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6875692Z test_normalize_operator_exhaustive_linalg_inv_ex_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6876149Z test_normalize_operator_exhaustive_linalg_ldl_factor_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6876646Z test_normalize_operator_exhaustive_linalg_ldl_factor_ex_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6877103Z test_normalize_operator_exhaustive_linalg_ldl_solve_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6877561Z test_normalize_operator_exhaustive_linalg_lstsq_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6878025Z test_normalize_operator_exhaustive_linalg_lstsq_grad_oriented_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6878486Z test_normalize_operator_exhaustive_linalg_lu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6878927Z test_normalize_operator_exhaustive_linalg_lu_factor_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6879394Z test_normalize_operator_exhaustive_linalg_lu_factor_ex_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6879855Z test_normalize_operator_exhaustive_linalg_lu_solve_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6880318Z test_normalize_operator_exhaustive_linalg_matrix_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6880779Z test_normalize_operator_exhaustive_linalg_matrix_power_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6881230Z test_normalize_operator_exhaustive_linalg_matrix_rank_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6881698Z test_normalize_operator_exhaustive_linalg_matrix_rank_hermitian_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6882167Z test_normalize_operator_exhaustive_linalg_multi_dot_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6882648Z test_normalize_operator_exhaustive_linalg_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6883114Z test_normalize_operator_exhaustive_linalg_norm_subgradients_at_zero_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6883584Z test_normalize_operator_exhaustive_linalg_pinv_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6884045Z test_normalize_operator_exhaustive_linalg_pinv_hermitian_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6884560Z test_normalize_operator_exhaustive_linalg_pinv_singular_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.002s) 2023-01-11T21:27:27.6885051Z test_normalize_operator_exhaustive_linalg_qr_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6885503Z test_normalize_operator_exhaustive_linalg_slogdet_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6885960Z test_normalize_operator_exhaustive_linalg_solve_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6886419Z test_normalize_operator_exhaustive_linalg_solve_ex_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6886925Z test_normalize_operator_exhaustive_linalg_solve_triangular_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6887417Z test_normalize_operator_exhaustive_linalg_svd_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6887870Z test_normalize_operator_exhaustive_linalg_svdvals_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6888340Z test_normalize_operator_exhaustive_linalg_tensorinv_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6888795Z test_normalize_operator_exhaustive_linalg_tensorsolve_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6889254Z test_normalize_operator_exhaustive_linalg_vander_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6889708Z test_normalize_operator_exhaustive_linalg_vecdot_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6890175Z test_normalize_operator_exhaustive_linalg_vector_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6890625Z test_normalize_operator_exhaustive_linspace_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6891076Z test_normalize_operator_exhaustive_log10_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6891513Z test_normalize_operator_exhaustive_log1p_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6891954Z test_normalize_operator_exhaustive_log2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6892382Z test_normalize_operator_exhaustive_log_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6892828Z test_normalize_operator_exhaustive_log_softmax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6893294Z test_normalize_operator_exhaustive_log_softmax_with_dtype_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6893788Z test_normalize_operator_exhaustive_logaddexp2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6894232Z test_normalize_operator_exhaustive_logaddexp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6894932Z test_normalize_operator_exhaustive_logcumsumexp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6895381Z test_normalize_operator_exhaustive_logdet_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6895825Z test_normalize_operator_exhaustive_logical_and_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6896267Z test_normalize_operator_exhaustive_logical_not_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6896719Z test_normalize_operator_exhaustive_logical_or_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6897165Z test_normalize_operator_exhaustive_logical_xor_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6897602Z test_normalize_operator_exhaustive_logit_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6898043Z test_normalize_operator_exhaustive_logspace_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6898479Z test_normalize_operator_exhaustive_logsumexp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6898978Z test_normalize_operator_exhaustive_long_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6899415Z test_normalize_operator_exhaustive_lt_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6899846Z test_normalize_operator_exhaustive_lu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6900283Z test_normalize_operator_exhaustive_lu_solve_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6900733Z test_normalize_operator_exhaustive_lu_unpack_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6901175Z test_normalize_operator_exhaustive_mH_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6901612Z test_normalize_operator_exhaustive_mT_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6902054Z test_normalize_operator_exhaustive_masked_amax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6902506Z test_normalize_operator_exhaustive_masked_amin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6902963Z test_normalize_operator_exhaustive_masked_argmax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6903419Z test_normalize_operator_exhaustive_masked_argmin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6903871Z test_normalize_operator_exhaustive_masked_cumprod_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6904324Z test_normalize_operator_exhaustive_masked_cumsum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6904774Z test_normalize_operator_exhaustive_masked_fill_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6905266Z test_normalize_operator_exhaustive_masked_log_softmax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6905727Z test_normalize_operator_exhaustive_masked_logaddexp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6906185Z test_normalize_operator_exhaustive_masked_logsumexp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s) 2023-01-11T21:27:27.6906642Z test_normalize_operator_exhaustive_masked_mean_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6907096Z test_normalize_operator_exhaustive_masked_median_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6907541Z test_normalize_operator_exhaustive_masked_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6908002Z test_normalize_operator_exhaustive_masked_normalize_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6908456Z test_normalize_operator_exhaustive_masked_prod_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6908912Z test_normalize_operator_exhaustive_masked_scatter_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6909360Z test_normalize_operator_exhaustive_masked_select_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6909816Z test_normalize_operator_exhaustive_masked_softmax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6910301Z test_normalize_operator_exhaustive_masked_softmin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6910759Z test_normalize_operator_exhaustive_masked_std_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6911202Z test_normalize_operator_exhaustive_masked_sum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6911648Z test_normalize_operator_exhaustive_masked_var_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6912092Z test_normalize_operator_exhaustive_matmul_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6912537Z test_normalize_operator_exhaustive_matrix_exp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6912979Z test_normalize_operator_exhaustive_max_binary_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6913449Z test_normalize_operator_exhaustive_max_pool2d_with_indices_backward_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6913935Z test_normalize_operator_exhaustive_max_reduction_no_dim_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6914404Z test_normalize_operator_exhaustive_max_reduction_with_dim_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6914858Z test_normalize_operator_exhaustive_maximum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6915307Z test_normalize_operator_exhaustive_mean_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6915749Z test_normalize_operator_exhaustive_median_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6916207Z test_normalize_operator_exhaustive_meshgrid_list_of_tensors_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6916717Z test_normalize_operator_exhaustive_meshgrid_variadic_tensors_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s) 2023-01-11T21:27:27.6917176Z test_normalize_operator_exhaustive_min_binary_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6917633Z test_normalize_operator_exhaustive_min_reduction_no_dim_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6918103Z test_normalize_operator_exhaustive_min_reduction_with_dim_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6918553Z test_normalize_operator_exhaustive_minimum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6918992Z test_normalize_operator_exhaustive_mm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6919432Z test_normalize_operator_exhaustive_mode_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6919874Z test_normalize_operator_exhaustive_movedim_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6920309Z test_normalize_operator_exhaustive_msort_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6920748Z test_normalize_operator_exhaustive_mul_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6921191Z test_normalize_operator_exhaustive_multinomial_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6921667Z test_normalize_operator_exhaustive_mv_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6922114Z test_normalize_operator_exhaustive_mvlgamma_mvlgamma_p_1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6922582Z test_normalize_operator_exhaustive_mvlgamma_mvlgamma_p_3_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6923048Z test_normalize_operator_exhaustive_mvlgamma_mvlgamma_p_5_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6923508Z test_normalize_operator_exhaustive_nan_to_num_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6923961Z test_normalize_operator_exhaustive_nanmean_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6924404Z test_normalize_operator_exhaustive_nanmedian_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6924857Z test_normalize_operator_exhaustive_nanquantile_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6925303Z test_normalize_operator_exhaustive_nansum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6925752Z test_normalize_operator_exhaustive_narrow_copy_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6926192Z test_normalize_operator_exhaustive_narrow_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6926642Z test_normalize_operator_exhaustive_native_batch_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6927115Z test_normalize_operator_exhaustive_native_dropout_backward_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6927662Z test_normalize_operator_exhaustive_native_layer_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6928101Z test_normalize_operator_exhaustive_ne_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6928542Z test_normalize_operator_exhaustive_neg_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6928985Z test_normalize_operator_exhaustive_new_empty_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6929440Z test_normalize_operator_exhaustive_new_empty_strided_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6929891Z test_normalize_operator_exhaustive_new_full_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6930341Z test_normalize_operator_exhaustive_new_ones_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6930791Z test_normalize_operator_exhaustive_new_zeros_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6931241Z test_normalize_operator_exhaustive_nextafter_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6931706Z test_normalize_operator_exhaustive_nn_functional__scaled_dot_product_attention_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s) 2023-01-11T21:27:27.6932205Z test_normalize_operator_exhaustive_nn_functional_adaptive_avg_pool1d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6932736Z test_normalize_operator_exhaustive_nn_functional_adaptive_avg_pool2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6933233Z test_normalize_operator_exhaustive_nn_functional_adaptive_avg_pool3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6933711Z test_normalize_operator_exhaustive_nn_functional_adaptive_max_pool1d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6934194Z test_normalize_operator_exhaustive_nn_functional_adaptive_max_pool2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6934834Z test_normalize_operator_exhaustive_nn_functional_adaptive_max_pool3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6935318Z test_normalize_operator_exhaustive_nn_functional_alpha_dropout_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6935800Z test_normalize_operator_exhaustive_nn_functional_avg_pool1d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6936265Z test_normalize_operator_exhaustive_nn_functional_avg_pool2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6936735Z test_normalize_operator_exhaustive_nn_functional_avg_pool3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6937205Z test_normalize_operator_exhaustive_nn_functional_batch_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6937685Z test_normalize_operator_exhaustive_nn_functional_batch_norm_without_cudnn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6938160Z test_normalize_operator_exhaustive_nn_functional_bilinear_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6938637Z test_normalize_operator_exhaustive_nn_functional_binary_cross_entropy_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6939255Z test_normalize_operator_exhaustive_nn_functional_binary_cross_entropy_with_logits_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6939812Z test_normalize_operator_exhaustive_nn_functional_celu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6940266Z test_normalize_operator_exhaustive_nn_functional_conv1d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6940727Z test_normalize_operator_exhaustive_nn_functional_conv2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6941201Z test_normalize_operator_exhaustive_nn_functional_conv_transpose1d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6941688Z test_normalize_operator_exhaustive_nn_functional_conv_transpose2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6942158Z test_normalize_operator_exhaustive_nn_functional_conv_transpose3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6942642Z test_normalize_operator_exhaustive_nn_functional_cosine_embedding_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6943130Z test_normalize_operator_exhaustive_nn_functional_cosine_similarity_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6943610Z test_normalize_operator_exhaustive_nn_functional_cross_entropy_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6944138Z test_normalize_operator_exhaustive_nn_functional_ctc_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6944609Z test_normalize_operator_exhaustive_nn_functional_dropout2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6945080Z test_normalize_operator_exhaustive_nn_functional_dropout3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6945553Z test_normalize_operator_exhaustive_nn_functional_dropout_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6946022Z test_normalize_operator_exhaustive_nn_functional_elu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6946489Z test_normalize_operator_exhaustive_nn_functional_embedding_bag_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6946966Z test_normalize_operator_exhaustive_nn_functional_embedding_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6965725Z test_normalize_operator_exhaustive_nn_functional_feature_alpha_dropout_with_train_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6966401Z test_normalize_operator_exhaustive_nn_functional_feature_alpha_dropout_without_train_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6967076Z test_normalize_operator_exhaustive_nn_functional_fractional_max_pool2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6967767Z test_normalize_operator_exhaustive_nn_functional_fractional_max_pool3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6968466Z test_normalize_operator_exhaustive_nn_functional_gaussian_nll_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6969241Z test_normalize_operator_exhaustive_nn_functional_gelu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6969898Z test_normalize_operator_exhaustive_nn_functional_glu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6970555Z test_normalize_operator_exhaustive_nn_functional_grid_sample_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6971230Z test_normalize_operator_exhaustive_nn_functional_group_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6971897Z test_normalize_operator_exhaustive_nn_functional_hardshrink_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6972577Z test_normalize_operator_exhaustive_nn_functional_hardsigmoid_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6973257Z test_normalize_operator_exhaustive_nn_functional_hardswish_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6973915Z test_normalize_operator_exhaustive_nn_functional_hardtanh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6974734Z test_normalize_operator_exhaustive_nn_functional_hinge_embedding_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6975251Z test_normalize_operator_exhaustive_nn_functional_huber_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6975828Z test_normalize_operator_exhaustive_nn_functional_instance_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6976297Z test_normalize_operator_exhaustive_nn_functional_interpolate_area_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6976837Z test_normalize_operator_exhaustive_nn_functional_interpolate_bicubic_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6977328Z test_normalize_operator_exhaustive_nn_functional_interpolate_bilinear_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6977814Z test_normalize_operator_exhaustive_nn_functional_interpolate_linear_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6978291Z test_normalize_operator_exhaustive_nn_functional_interpolate_nearest_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6978782Z test_normalize_operator_exhaustive_nn_functional_interpolate_trilinear_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6979257Z test_normalize_operator_exhaustive_nn_functional_kl_div_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6979713Z test_normalize_operator_exhaustive_nn_functional_l1_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6980172Z test_normalize_operator_exhaustive_nn_functional_layer_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6980626Z test_normalize_operator_exhaustive_nn_functional_leaky_relu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6981086Z test_normalize_operator_exhaustive_nn_functional_linear_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6981557Z test_normalize_operator_exhaustive_nn_functional_local_response_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6982067Z test_normalize_operator_exhaustive_nn_functional_logsigmoid_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6982535Z test_normalize_operator_exhaustive_nn_functional_margin_ranking_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6983007Z test_normalize_operator_exhaustive_nn_functional_max_pool1d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6983465Z test_normalize_operator_exhaustive_nn_functional_max_pool2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6983927Z test_normalize_operator_exhaustive_nn_functional_max_pool3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6984384Z test_normalize_operator_exhaustive_nn_functional_max_unpool1d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6984865Z test_normalize_operator_exhaustive_nn_functional_max_unpool1d_grad_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6985336Z test_normalize_operator_exhaustive_nn_functional_max_unpool2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6985807Z test_normalize_operator_exhaustive_nn_functional_max_unpool2d_grad_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6986269Z test_normalize_operator_exhaustive_nn_functional_max_unpool3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6986773Z test_normalize_operator_exhaustive_nn_functional_max_unpool3d_grad_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6987242Z test_normalize_operator_exhaustive_nn_functional_mish_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6987702Z test_normalize_operator_exhaustive_nn_functional_mse_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6988167Z test_normalize_operator_exhaustive_nn_functional_multi_margin_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6988640Z test_normalize_operator_exhaustive_nn_functional_multilabel_margin_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6989131Z test_normalize_operator_exhaustive_nn_functional_multilabel_soft_margin_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6989610Z test_normalize_operator_exhaustive_nn_functional_nll_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6990072Z test_normalize_operator_exhaustive_nn_functional_normalize_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6990529Z test_normalize_operator_exhaustive_nn_functional_pad_circular_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6991002Z test_normalize_operator_exhaustive_nn_functional_pad_constant_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6991470Z test_normalize_operator_exhaustive_nn_functional_pad_reflect_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6991946Z test_normalize_operator_exhaustive_nn_functional_pad_replicate_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6992446Z test_normalize_operator_exhaustive_nn_functional_pairwise_distance_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6992915Z test_normalize_operator_exhaustive_nn_functional_pdist_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6993387Z test_normalize_operator_exhaustive_nn_functional_pixel_shuffle_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6993860Z test_normalize_operator_exhaustive_nn_functional_pixel_unshuffle_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6994326Z test_normalize_operator_exhaustive_nn_functional_poisson_nll_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6994795Z test_normalize_operator_exhaustive_nn_functional_prelu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6995251Z test_normalize_operator_exhaustive_nn_functional_relu6_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6995707Z test_normalize_operator_exhaustive_nn_functional_relu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6996165Z test_normalize_operator_exhaustive_nn_functional_rrelu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6996611Z test_normalize_operator_exhaustive_nn_functional_selu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6997060Z test_normalize_operator_exhaustive_nn_functional_silu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6997557Z test_normalize_operator_exhaustive_nn_functional_smooth_l1_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6998033Z test_normalize_operator_exhaustive_nn_functional_soft_margin_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6998493Z test_normalize_operator_exhaustive_nn_functional_softmin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6998963Z test_normalize_operator_exhaustive_nn_functional_softmin_with_dtype_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6999435Z test_normalize_operator_exhaustive_nn_functional_softplus_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6999903Z test_normalize_operator_exhaustive_nn_functional_softshrink_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7000359Z test_normalize_operator_exhaustive_nn_functional_softsign_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7000824Z test_normalize_operator_exhaustive_nn_functional_tanhshrink_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7001289Z test_normalize_operator_exhaustive_nn_functional_threshold_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7001760Z test_normalize_operator_exhaustive_nn_functional_triplet_margin_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7002247Z test_normalize_operator_exhaustive_nn_functional_triplet_margin_with_distance_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7002729Z test_normalize_operator_exhaustive_nn_functional_unfold_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7003224Z test_normalize_operator_exhaustive_nn_functional_upsample_bilinear_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7003707Z test_normalize_operator_exhaustive_nn_functional_upsample_nearest_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7004170Z test_normalize_operator_exhaustive_nonzero_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7004597Z test_normalize_operator_exhaustive_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7005034Z test_normalize_operator_exhaustive_norm_fro_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7005473Z test_normalize_operator_exhaustive_norm_inf_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7005909Z test_normalize_operator_exhaustive_norm_nuc_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7006333Z test_normalize_operator_exhaustive_normal_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7006776Z test_normalize_operator_exhaustive_normal_number_mean_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7007215Z test_normalize_operator_exhaustive_ones_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7007654Z test_normalize_operator_exhaustive_ones_like_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7008115Z test_normalize_operator_exhaustive_ormqr_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7008546Z test_normalize_operator_exhaustive_outer_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7008992Z test_normalize_operator_exhaustive_pca_lowrank_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7009434Z test_normalize_operator_exhaustive_permute_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7009870Z test_normalize_operator_exhaustive_pinverse_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7010310Z test_normalize_operator_exhaustive_polar_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7010762Z test_normalize_operator_exhaustive_polygamma_polygamma_n_0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7011232Z test_normalize_operator_exhaustive_polygamma_polygamma_n_1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s)
2023-01-11T21:27:27.7011685Z test_normalize_operator_exhaustive_polygamma_polygamma_n_2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s)
2023-01-11T21:27:27.7012138Z test_normalize_operator_exhaustive_polygamma_polygamma_n_3_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s)
2023-01-11T21:27:27.7012587Z test_normalize_operator_exhaustive_polygamma_polygamma_n_4_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s)
2023-01-11T21:27:27.7013033Z test_normalize_operator_exhaustive_positive_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7013462Z test_normalize_operator_exhaustive_pow_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7013897Z test_normalize_operator_exhaustive_prod_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7014374Z test_normalize_operator_exhaustive_put_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7015040Z test_normalize_operator_exhaustive_qr_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7015471Z test_normalize_operator_exhaustive_quantile_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7015910Z test_normalize_operator_exhaustive_rad2deg_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7016347Z test_normalize_operator_exhaustive_rand_like_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7016790Z test_normalize_operator_exhaustive_randint_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7017273Z test_normalize_operator_exhaustive_randint_like_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7017714Z test_normalize_operator_exhaustive_randn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7018152Z test_normalize_operator_exhaustive_randn_like_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7018589Z test_normalize_operator_exhaustive_ravel_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7019014Z test_normalize_operator_exhaustive_real_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7019522Z test_normalize_operator_exhaustive_reciprocal_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7019972Z test_normalize_operator_exhaustive_remainder_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7020416Z test_normalize_operator_exhaustive_renorm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7020844Z test_normalize_operator_exhaustive_repeat_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7021300Z test_normalize_operator_exhaustive_repeat_interleave_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7021753Z test_normalize_operator_exhaustive_reshape_as_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7022200Z test_normalize_operator_exhaustive_reshape_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7022633Z test_normalize_operator_exhaustive_resize__cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7023077Z test_normalize_operator_exhaustive_resize_as__cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7023524Z test_normalize_operator_exhaustive_resolve_conj_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7023974Z test_normalize_operator_exhaustive_resolve_neg_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7024409Z test_normalize_operator_exhaustive_roll_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7024843Z test_normalize_operator_exhaustive_rot90_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7025277Z test_normalize_operator_exhaustive_round_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7025806Z test_normalize_operator_exhaustive_round_decimals_0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7026257Z test_normalize_operator_exhaustive_round_decimals_3_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7026713Z test_normalize_operator_exhaustive_round_decimals_neg_3_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7027224Z test_normalize_operator_exhaustive_rsqrt_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7027714Z test_normalize_operator_exhaustive_rsub_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7028164Z test_normalize_operator_exhaustive_scalar_tensor_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7028617Z test_normalize_operator_exhaustive_scatter_add_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7029058Z test_normalize_operator_exhaustive_scatter_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7029512Z test_normalize_operator_exhaustive_scatter_reduce_amax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7029978Z test_normalize_operator_exhaustive_scatter_reduce_amin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7030443Z test_normalize_operator_exhaustive_scatter_reduce_mean_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7030940Z test_normalize_operator_exhaustive_scatter_reduce_prod_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7031400Z test_normalize_operator_exhaustive_scatter_reduce_sum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7031860Z test_normalize_operator_exhaustive_searchsorted_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7032327Z test_normalize_operator_exhaustive_segment_reduce_lengths_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7032799Z test_normalize_operator_exhaustive_segment_reduce_offsets_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7033250Z test_normalize_operator_exhaustive_select_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7033701Z test_normalize_operator_exhaustive_select_scatter_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7034149Z test_normalize_operator_exhaustive_sgn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7034586Z test_normalize_operator_exhaustive_short_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7035020Z test_normalize_operator_exhaustive_sigmoid_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7035462Z test_normalize_operator_exhaustive_sign_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7035916Z test_normalize_operator_exhaustive_signal_windows_bartlett_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7036387Z test_normalize_operator_exhaustive_signal_windows_blackman_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7036855Z test_normalize_operator_exhaustive_signal_windows_cosine_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7037360Z test_normalize_operator_exhaustive_signal_windows_exponential_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7037887Z test_normalize_operator_exhaustive_signal_windows_gaussian_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7038364Z test_normalize_operator_exhaustive_signal_windows_general_cosine_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7038841Z test_normalize_operator_exhaustive_signal_windows_general_hamming_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7039320Z test_normalize_operator_exhaustive_signal_windows_hamming_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7039780Z test_normalize_operator_exhaustive_signal_windows_hann_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7040247Z test_normalize_operator_exhaustive_signal_windows_kaiser_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7040706Z test_normalize_operator_exhaustive_signal_windows_nuttall_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7041158Z test_normalize_operator_exhaustive_signbit_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7041595Z test_normalize_operator_exhaustive_sin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7042062Z test_normalize_operator_exhaustive_sinc_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7042495Z test_normalize_operator_exhaustive_sinh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7042937Z test_normalize_operator_exhaustive_slice_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7043381Z test_normalize_operator_exhaustive_slice_scatter_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7043828Z test_normalize_operator_exhaustive_softmax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7044270Z test_normalize_operator_exhaustive_softmax_with_dtype_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7044718Z test_normalize_operator_exhaustive_sort_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7045173Z test_normalize_operator_exhaustive_sparse_sampled_addmm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7045637Z test_normalize_operator_exhaustive_special_airy_ai_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7046086Z test_normalize_operator_exhaustive_special_bessel_j0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7046545Z test_normalize_operator_exhaustive_special_bessel_j1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7047003Z test_normalize_operator_exhaustive_special_bessel_y0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7047493Z test_normalize_operator_exhaustive_special_bessel_y1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7047985Z test_normalize_operator_exhaustive_special_chebyshev_polynomial_t_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7048492Z test_normalize_operator_exhaustive_special_chebyshev_polynomial_u_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7049271Z test_normalize_operator_exhaustive_special_chebyshev_polynomial_v_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.002s)
2023-01-11T21:27:27.7049955Z test_normalize_operator_exhaustive_special_chebyshev_polynomial_w_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.002s)
2023-01-11T21:27:27.7050468Z test_normalize_operator_exhaustive_special_entr_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7050919Z test_normalize_operator_exhaustive_special_erfcx_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7051394Z test_normalize_operator_exhaustive_special_hermite_polynomial_h_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7051881Z test_normalize_operator_exhaustive_special_hermite_polynomial_he_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7052350Z test_normalize_operator_exhaustive_special_i0e_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7052792Z test_normalize_operator_exhaustive_special_i1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7053240Z test_normalize_operator_exhaustive_special_i1e_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7053743Z test_normalize_operator_exhaustive_special_laguerre_polynomial_l_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7054399Z test_normalize_operator_exhaustive_special_legendre_polynomial_p_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.002s)
2023-01-11T21:27:27.7055147Z test_normalize_operator_exhaustive_special_log_ndtr_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7055613Z test_normalize_operator_exhaustive_special_modified_bessel_i0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7056087Z test_normalize_operator_exhaustive_special_modified_bessel_i1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7056571Z test_normalize_operator_exhaustive_special_modified_bessel_k0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7057048Z test_normalize_operator_exhaustive_special_modified_bessel_k1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7057509Z test_normalize_operator_exhaustive_special_ndtr_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7057964Z test_normalize_operator_exhaustive_special_ndtri_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7058446Z test_normalize_operator_exhaustive_special_polygamma_special_polygamma_n_0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7058941Z test_normalize_operator_exhaustive_special_scaled_modified_bessel_k0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7059427Z test_normalize_operator_exhaustive_special_scaled_modified_bessel_k1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7060163Z test_normalize_operator_exhaustive_special_shifted_chebyshev_polynomial_t_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.002s)
2023-01-11T21:27:27.7060875Z test_normalize_operator_exhaustive_special_shifted_chebyshev_polynomial_u_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.002s)
2023-01-11T21:27:27.7061573Z test_normalize_operator_exhaustive_special_shifted_chebyshev_polynomial_v_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.002s)
2023-01-11T21:27:27.7062266Z test_normalize_operator_exhaustive_special_shifted_chebyshev_polynomial_w_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.002s)
2023-01-11T21:27:27.7062784Z test_normalize_operator_exhaustive_special_spherical_bessel_j0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7063256Z test_normalize_operator_exhaustive_special_xlog1py_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7063718Z test_normalize_operator_exhaustive_special_zeta_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7064165Z test_normalize_operator_exhaustive_split_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7064606Z test_normalize_operator_exhaustive_split_list_args_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7065063Z test_normalize_operator_exhaustive_split_with_sizes_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7065557Z test_normalize_operator_exhaustive_sqrt_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7066004Z test_normalize_operator_exhaustive_square_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7066441Z test_normalize_operator_exhaustive_squeeze_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7066882Z test_normalize_operator_exhaustive_stack_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7067321Z test_normalize_operator_exhaustive_std_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7067762Z test_normalize_operator_exhaustive_std_mean_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7068213Z test_normalize_operator_exhaustive_std_mean_unbiased_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7068671Z test_normalize_operator_exhaustive_std_unbiased_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7069113Z test_normalize_operator_exhaustive_stft_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7069546Z test_normalize_operator_exhaustive_sub_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7069969Z test_normalize_operator_exhaustive_sum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7070409Z test_normalize_operator_exhaustive_sum_to_size_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7070855Z test_normalize_operator_exhaustive_svd_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7071292Z test_normalize_operator_exhaustive_svd_lowrank_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7071758Z test_normalize_operator_exhaustive_symeig_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7072196Z test_normalize_operator_exhaustive_t_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7072642Z test_normalize_operator_exhaustive_take_along_dim_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7073088Z test_normalize_operator_exhaustive_take_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7073516Z test_normalize_operator_exhaustive_tan_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7073950Z test_normalize_operator_exhaustive_tanh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7074397Z test_normalize_operator_exhaustive_tensor_split_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7074851Z test_normalize_operator_exhaustive_tensordot_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7075283Z test_normalize_operator_exhaustive_tile_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7075717Z test_normalize_operator_exhaustive_to_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s)
2023-01-11T21:27:27.7076161Z test_normalize_operator_exhaustive_to_sparse_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7076637Z test_normalize_operator_exhaustive_topk_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7077065Z test_normalize_operator_exhaustive_trace_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7077561Z test_normalize_operator_exhaustive_transpose_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7078004Z test_normalize_operator_exhaustive_trapezoid_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7078443Z test_normalize_operator_exhaustive_trapz_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7078887Z test_normalize_operator_exhaustive_triangular_solve_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7079337Z test_normalize_operator_exhaustive_tril_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7079767Z test_normalize_operator_exhaustive_triu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7080217Z test_normalize_operator_exhaustive_true_divide_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7080653Z test_normalize_operator_exhaustive_trunc_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7081093Z test_normalize_operator_exhaustive_unbind_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7081540Z test_normalize_operator_exhaustive_unflatten_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7081990Z test_normalize_operator_exhaustive_unfold_copy_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7082429Z test_normalize_operator_exhaustive_unfold_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7082915Z test_normalize_operator_exhaustive_uniform_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7083374Z test_normalize_operator_exhaustive_unique_consecutive_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7083827Z test_normalize_operator_exhaustive_unique_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7084261Z test_normalize_operator_exhaustive_unsqueeze_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7084700Z test_normalize_operator_exhaustive_var_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7085144Z test_normalize_operator_exhaustive_var_mean_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7085600Z test_normalize_operator_exhaustive_var_mean_unbiased_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7086060Z test_normalize_operator_exhaustive_var_unbiased_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7086497Z test_normalize_operator_exhaustive_vdot_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7086944Z test_normalize_operator_exhaustive_view_as_complex_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7087392Z test_normalize_operator_exhaustive_view_as_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7087864Z test_normalize_operator_exhaustive_view_copy_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7088296Z test_normalize_operator_exhaustive_view_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7088739Z test_normalize_operator_exhaustive_vsplit_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7089177Z test_normalize_operator_exhaustive_vstack_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7089618Z test_normalize_operator_exhaustive_where_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7090049Z test_normalize_operator_exhaustive_xlogy_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7090484Z test_normalize_operator_exhaustive_zero__cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7090921Z test_normalize_operator_exhaustive_zeros_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7091365Z test_normalize_operator_exhaustive_zeros_like_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7091760Z test_normalize_quantized_eb_cuda (__main__.TestNormalizeOperatorsCUDA) ... ok (0.001s)
2023-01-11T21:27:27.7091961Z 
2023-01-11T21:27:27.7092176Z ----------------------------------------------------------------------
2023-01-11T21:27:27.7092443Z Ran 671 tests in 13.191s
2023-01-11T21:27:27.7092574Z 
2023-01-11T21:27:27.7092649Z OK (skipped=630)
2023-01-11T21:27:27.7092769Z 
2023-01-11T21:27:27.7092862Z Generating XML reports...
2023-01-11T21:27:27.7093308Z Generated XML report: test-reports/python-unittest/test_fx_experimental/TEST-TestFXExperimental-20230111212713.xml
2023-01-11T21:27:27.7093891Z Generated XML report: test-reports/python-unittest/test_fx_experimental/TEST-TestNormalizeOperatorsCUDA-20230111212713.xml
2023-01-11T21:27:27.7094154Z 
2023-01-11T21:27:27.7094830Z ##[endgroup]
2023-01-11T21:27:27.7095336Z FINISHED PRINTING LOG FILE of test_fx_experimental (/var/lib/jenkins/workspace/test/test-reports/test_fx_experimental_bv6m_9hq)
2023-01-11T21:27:27.7095578Z 
2023-01-11T21:27:29.6011878Z Ignoring disabled issues: []
2023-01-11T21:27:29.6242437Z Running test_import_stats ... [2023-01-11 21:27:29.623471]
2023-01-11T21:27:29.6243557Z Executing ['/opt/conda/bin/python', '-bb', 'test_import_stats.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:27:29.623788]
2023-01-11T21:27:35.1669313Z 
2023-01-11T21:27:35.1670012Z Expand the folded group to see the log file of test_import_stats
2023-01-11T21:27:35.1671145Z ##[group]PRINTING LOG FILE of test_import_stats (/var/lib/jenkins/workspace/test/test-reports/test_import_stats_pq1awtgl)
2023-01-11T21:27:35.1671494Z 
2023-01-11T21:27:35.1671634Z Running tests...
2023-01-11T21:27:35.1672159Z ----------------------------------------------------------------------
2023-01-11T21:27:35.1672727Z Test results will be stored in test-reports/python-unittest/test_import_stats
2023-01-11T21:27:35.1673331Z test_time_cuda_device_count (__main__.TestImportTime) ... ok (2.465s)
2023-01-11T21:27:35.1673802Z test_time_import_torch (__main__.TestImportTime) ... ok (1.365s)
2023-01-11T21:27:35.1673986Z 
2023-01-11T21:27:35.1674198Z ----------------------------------------------------------------------
2023-01-11T21:27:35.1674451Z Ran 2 tests in 3.831s
2023-01-11T21:27:35.1674573Z 
2023-01-11T21:27:35.1674646Z OK
2023-01-11T21:27:35.1674749Z 
2023-01-11T21:27:35.1674843Z Generating XML reports...
2023-01-11T21:27:35.1675273Z Generated XML report: test-reports/python-unittest/test_import_stats/TEST-TestImportTime-20230111212730.xml
2023-01-11T21:27:35.1675513Z 
2023-01-11T21:27:35.1675762Z ##[endgroup]
2023-01-11T21:27:35.1676483Z FINISHED PRINTING LOG FILE of test_import_stats (/var/lib/jenkins/workspace/test/test-reports/test_import_stats_pq1awtgl)
2023-01-11T21:27:35.1676719Z 
2023-01-11T21:27:37.0800885Z Ignoring disabled issues: []
2023-01-11T21:27:37.1055181Z Running test_jit_autocast ... [2023-01-11 21:27:37.104913]
2023-01-11T21:27:37.1057214Z Executing ['/opt/conda/bin/python', '-bb', 'test_jit_autocast.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:27:37.105265]
2023-01-11T21:28:46.2066825Z 
2023-01-11T21:28:46.2067409Z Expand the folded group to see the log file of test_jit_autocast
2023-01-11T21:28:46.2071308Z ##[group]PRINTING LOG FILE of test_jit_autocast (/var/lib/jenkins/workspace/test/test-reports/test_jit_autocast_kac_6jpc)
2023-01-11T21:28:46.2072121Z monkeytype is not installed. Skipping tests for Profile-Directed Typing
2023-01-11T21:28:46.2072324Z 
2023-01-11T21:28:46.2072411Z Running tests...
2023-01-11T21:28:46.2072804Z ----------------------------------------------------------------------
2023-01-11T21:28:46.2073419Z Test results will be stored in test-reports/python-unittest/test_jit_autocast
2023-01-11T21:28:46.2073832Z test_autocast_api (__main__.TestAutocast) ... ok (0.766s)
2023-01-11T21:28:46.2074189Z test_autocast_api_not_supported (__main__.TestAutocast) ... skip: we need to provide dtype argument at this moment (0.001s)
2023-01-11T21:28:46.2074544Z test_autocast_autodiff (__main__.TestAutocast) ... ok (0.334s)
2023-01-11T21:28:46.2074888Z test_autocast_decorator (__main__.TestAutocast) ... skip: autocast decorators not supported (0.000s)
2023-01-11T21:28:46.2075257Z test_autocast_decorator_outside_jit (__main__.TestAutocast) ... ok (0.003s)
2023-01-11T21:28:46.2075563Z test_autocast_mixed_dtypes (__main__.TestAutocast) ... ok (0.033s)
2023-01-11T21:28:46.2075868Z test_callees (__main__.TestAutocast) ... ok (0.013s)
2023-01-11T21:28:46.2076277Z test_callees_with_autocast_off (__main__.TestAutocast) ... ok (0.013s)
2023-01-11T21:28:46.2076633Z test_callees_with_autocast_on (__main__.TestAutocast) ... ok (0.013s)
2023-01-11T21:28:46.2078270Z test_conditional_autocast (__main__.TestAutocast) ... ok (0.013s)
2023-01-11T21:28:46.2078855Z test_control_flow (__main__.TestAutocast) ... skip: broken due to lack of type propagation (0.001s)
2023-01-11T21:28:46.2079490Z test_divergent_autocast (__main__.TestAutocast) ... ok (0.011s)
2023-01-11T21:28:46.2079895Z test_divergent_types (__main__.TestAutocast) ... ok (0.012s)
2023-01-11T21:28:46.2080274Z test_duplicate_inputs (__main__.TestAutocast) ... ok (0.010s)
2023-01-11T21:28:46.2080652Z test_eager_and_script (__main__.TestAutocast) ... ok (0.004s)
2023-01-11T21:28:46.2081011Z test_explicit_casts (__main__.TestAutocast) ... ok (0.011s)
2023-01-11T21:28:46.2081295Z test_fp32_policy (__main__.TestAutocast) ... ok (0.009s)
2023-01-11T21:28:46.2081579Z test_fp32_policy_with_fp64 (__main__.TestAutocast) ... ok (0.009s)
2023-01-11T21:28:46.2081884Z test_fp32_set_opt_dtype_policy (__main__.TestAutocast) ... ok (0.011s)
2023-01-11T21:28:46.2083437Z test_fp32_set_opt_dtype_policy_fp64 (__main__.TestAutocast) ... ok (0.011s)
2023-01-11T21:28:46.2083783Z test_ignore_amp (__main__.TestAutocast) ... ok (0.003s)
2023-01-11T21:28:46.2084095Z test_implicitly_nested_autocast (__main__.TestAutocast) ... ok (0.010s)
2023-01-11T21:28:46.2084421Z test_inplace (__main__.TestAutocast) ... ok (0.012s)
2023-01-11T21:28:46.2085004Z test_jit_autocast_softmax_cpu (__main__.TestAutocast) ... skip: CPU-only test (0.000s)
2023-01-11T21:28:46.2085480Z test_jit_autocast_softmax_gpu (__main__.TestAutocast) ... ok (0.307s)
2023-01-11T21:28:46.2085913Z test_jit_call_method_under_autocast (__main__.TestAutocast) ... ok (0.038s)
2023-01-11T21:28:46.2086346Z test_jit_executor_under_autocast (__main__.TestAutocast) ... ok (0.015s)
2023-01-11T21:28:46.2086648Z test_jit_freeze_autocast_basic (__main__.TestAutocast) ... ok (0.022s)
2023-01-11T21:28:46.2086953Z test_jit_freeze_autocast_constants (__main__.TestAutocast) ... ok (0.017s)
2023-01-11T21:28:46.2087553Z test_linear_bf16 (__main__.TestAutocast) ... ok (0.011s)
2023-01-11T21:28:46.2087810Z test_minimal (__main__.TestAutocast) ... ok (0.010s)
2023-01-11T21:28:46.2088079Z test_minimal_cpu (__main__.TestAutocast) ... ok (0.009s)
2023-01-11T21:28:46.2088353Z test_minimal_off (__main__.TestAutocast) ... ok (0.009s)
2023-01-11T21:28:46.2088622Z test_nested_autocast (__main__.TestAutocast) ... ok (0.012s)
2023-01-11T21:28:46.2088893Z test_promote_policy (__main__.TestAutocast) ... ok (0.010s)
2023-01-11T21:28:46.2089202Z test_promote_policy_fp64 (__main__.TestAutocast) ... ok (0.010s)
2023-01-11T21:28:46.2089508Z test_reused_autocast (__main__.TestAutocast) ... ok (0.011s)
2023-01-11T21:28:46.2089823Z test_reused_autocast_expr (__main__.TestAutocast) ... skip: unsupported autocast syntax (0.001s)
2023-01-11T21:28:46.2090150Z test_runtime_autocast_state (__main__.TestAutocast) ... ok (0.012s)
2023-01-11T21:28:46.2090454Z test_runtime_autocast_state_expr (__main__.TestAutocast) ... ok (0.013s)
2023-01-11T21:28:46.2090805Z test_script_and_tracing (__main__.TestAutocast) ... ok (0.014s)
2023-01-11T21:28:46.2091343Z test_script_and_tracing_with_autocast (__main__.TestAutocast) ... skip: autocast(False) is ignored inside traced functions (0.001s)
2023-01-11T21:28:46.2091907Z test_script_module (__main__.TestAutocast) ... ok (0.017s)
2023-01-11T21:28:46.2092330Z test_tracing_and_script (__main__.TestAutocast) ... ok (0.019s)
2023-01-11T21:28:46.2092891Z test_tracing_with_autocast_and_script (__main__.TestAutocast) ... skip: scripted called from traced TorchScript is not yet working (0.001s)
2023-01-11T21:28:46.2093435Z test_cat_promote (__main__.TestJitTraceAutocast) ... ok (0.121s)
2023-01-11T21:28:46.2093867Z test_generate_autocast_jit_trace_model (__main__.TestJitTraceAutocast) ... ok (21.547s)
2023-01-11T21:28:46.2094343Z test_nchw_autocast_jit_trace_model (__main__.TestJitTraceAutocast) ... ok (30.489s)
2023-01-11T21:28:46.2094955Z test_nhwc_autocast_jit_trace_model (__main__.TestJitTraceAutocast) ... ok (10.596s)
2023-01-11T21:28:46.2095294Z test_script_autocast_cpu (__main__.TestJitTraceAutocast) ... ok (0.057s)
2023-01-11T21:28:46.2095615Z test_script_autocast_cuda (__main__.TestJitTraceAutocast) ... ok (0.057s)
2023-01-11T21:28:46.2096053Z test_script_autocast_enable_and_check (__main__.TestJitTraceAutocast) ... ok (0.068s)
2023-01-11T21:28:46.2096390Z test_scripted_aliasing (__main__.TestJitTraceAutocast) ... ok (0.064s)
ok (0.064s) 2023-01-11T21:28:46.2096571Z 2023-01-11T21:28:46.2096793Z ---------------------------------------------------------------------- 2023-01-11T21:28:46.2097055Z Ran 53 tests in 64.875s 2023-01-11T21:28:46.2097169Z 2023-01-11T21:28:46.2097250Z OK (skipped=7) 2023-01-11T21:28:46.2097364Z 2023-01-11T21:28:46.2097457Z Generating XML reports... 2023-01-11T21:28:46.2097881Z Generated XML report: test-reports/python-unittest/test_jit_autocast/TEST-TestAutocast-20230111212740.xml 2023-01-11T21:28:46.2098406Z Generated XML report: test-reports/python-unittest/test_jit_autocast/TEST-TestJitTraceAutocast-20230111212740.xml 2023-01-11T21:28:46.2098660Z 2023-01-11T21:28:46.2099057Z ##[endgroup] 2023-01-11T21:28:46.2099460Z FINISHED PRINTING LOG FILE of test_jit_autocast (/var/lib/jenkins/workspace/test/test-reports/test_jit_autocast_kac_6jpc) 2023-01-11T21:28:46.2099686Z 2023-01-11T21:28:48.1126774Z Ignoring disabled issues: [] 2023-01-11T21:28:48.1364567Z Running test_jit_llga_fuser ... [2023-01-11 21:28:48.135974] 2023-01-11T21:28:48.1367356Z Executing ['/opt/conda/bin/python', '-bb', 'test_jit_llga_fuser.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:28:48.136302] 2023-01-11T21:28:51.3026799Z 2023-01-11T21:28:51.3027546Z Expand the folded group to see the log file of test_jit_llga_fuser 2023-01-11T21:28:51.3028344Z ##[group]PRINTING LOG FILE of test_jit_llga_fuser (/var/lib/jenkins/workspace/test/test-reports/test_jit_llga_fuser_05vvhmu3) 2023-01-11T21:28:51.3028578Z 2023-01-11T21:28:51.3028928Z Running tests... 2023-01-11T21:28:51.3029367Z ---------------------------------------------------------------------- 2023-01-11T21:28:51.3029858Z test_dynamo_aot_ts_onednn (__main__.TestDynamoAOT) ... Test results will be stored in test-reports/python-unittest/test_jit_llga_fuser 2023-01-11T21:28:51.3030243Z skip: Enable when integration with dynamo aot_autograd is more stable (0.001s) 2023-01-11T21:28:51.3030590Z test_context_manager (__main__.TestEnableDisableLlgaFuser) ... ok (0.066s) 2023-01-11T21:28:51.3030989Z test_bn2d_eltwise_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3031428Z test_bn2d_eltwise_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3031850Z test_conv2d_bn_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3032280Z test_conv2d_bn_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3032712Z test_conv2d_bn_relu_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3033144Z test_conv2d_bn_relu_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3033564Z test_conv2d_clamp_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3033990Z test_conv2d_clamp_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3034428Z test_conv2d_eltwise_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. 
(0.001s) 2023-01-11T21:28:51.3034862Z test_conv2d_eltwise_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3035273Z test_conv2d_silu_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3035769Z test_conv2d_silu_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3036192Z test_conv2d_sum_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3036613Z test_conv2d_sum_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3037043Z test_ensure_tensor_is_rewrapped_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3037494Z test_ensure_tensor_is_rewrapped_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3037945Z test_linear_eltwise_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3038376Z test_linear_eltwise_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3038818Z test_rewrap_tensor_input_to_pytorch_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3039262Z test_rewrap_tensor_input_to_pytorch_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3039702Z test_wildcard_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3040154Z test_wildcard_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3040592Z test_wildcard_unsupported_dtype_cuda_int32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3041006Z test_vision_alexnet_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3041393Z test_vision_alexnet_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3041780Z test_vision_densenet121_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3042167Z test_vision_densenet121_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3042540Z test_vision_densenet161_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3042924Z test_vision_densenet161_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3043307Z test_vision_densenet169_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3043688Z test_vision_densenet169_float32 (__main__.TestModel) ... 
skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3044058Z test_vision_densenet201_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3044436Z test_vision_densenet201_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3044824Z test_vision_efficientnet_b0_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3045222Z test_vision_efficientnet_b0_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3045614Z test_vision_efficientnet_b1_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3046039Z test_vision_efficientnet_b1_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3046435Z test_vision_efficientnet_b2_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3046824Z test_vision_efficientnet_b2_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3047202Z test_vision_efficientnet_b3_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3047592Z test_vision_efficientnet_b3_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3047988Z test_vision_efficientnet_b4_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3048374Z test_vision_efficientnet_b4_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3048752Z test_vision_efficientnet_b5_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3049141Z test_vision_efficientnet_b5_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3049527Z test_vision_efficientnet_b6_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3049909Z test_vision_efficientnet_b6_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3050320Z test_vision_efficientnet_b7_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3050707Z test_vision_efficientnet_b7_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3051096Z test_vision_googlenet_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3051481Z test_vision_googlenet_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3051858Z test_vision_mnasnet1_0_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3052235Z test_vision_mnasnet1_0_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. 
(0.000s) 2023-01-11T21:28:51.3052663Z test_vision_mobilenet_v2_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3053046Z test_vision_mobilenet_v2_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3053443Z test_vision_mobilenet_v3_large_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3053837Z test_vision_mobilenet_v3_large_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3054226Z test_vision_regnet_y_400mf_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3054938Z test_vision_regnet_y_400mf_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3055351Z test_vision_resnet50_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3055731Z test_vision_resnet50_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3056124Z test_vision_resnext101_32x8d_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3056562Z test_vision_resnext101_32x8d_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3056952Z test_vision_resnext50_32x4d_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3057329Z test_vision_resnext50_32x4d_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3057721Z test_vision_shufflenet_v2_x1_0_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3058119Z test_vision_shufflenet_v2_x1_0_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3058508Z test_vision_squeezenet1_0_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3058892Z test_vision_squeezenet1_0_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3059270Z test_vision_vgg16_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3059646Z test_vision_vgg16_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3060026Z test_vision_wide_resnet50_2_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3060412Z test_vision_wide_resnet50_2_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3060829Z test_add_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3061200Z test_add_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3061585Z test_add_scalar_cuda_bfloat16 (__main__.TestOpCUDA) ... 
skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3061963Z test_add_scalar_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3062346Z test_addmm_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3062726Z test_addmm_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3063107Z test_avg_pool2d_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3063484Z test_avg_pool2d_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3063869Z test_bn2d_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3064240Z test_bn2d_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3064612Z test_cat_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3064977Z test_cat_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3065349Z test_conv2d_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3065732Z test_conv2d_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3066107Z test_eltwise_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3066514Z test_eltwise_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3066903Z test_identity_binary_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3067299Z test_identity_binary_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3067690Z test_layer_norm_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3068066Z test_layer_norm_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3068448Z test_linear_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3068822Z test_linear_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3069199Z test_max_pool2d_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3069586Z test_max_pool2d_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3069959Z test_mul_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3070325Z test_mul_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. 
(0.000s)
2023-01-11T21:28:51.3070692Z test_softmax_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s)
2023-01-11T21:28:51.3071107Z test_softmax_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s)
2023-01-11T21:28:51.3071490Z test_typecheck_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s)
2023-01-11T21:28:51.3071887Z test_typecheck_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s)
2023-01-11T21:28:51.3072287Z test_variable_kernel_avg_pool2d_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s)
2023-01-11T21:28:51.3072705Z test_variable_kernel_avg_pool2d_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s)
2023-01-11T21:28:51.3072928Z
2023-01-11T21:28:51.3073178Z ----------------------------------------------------------------------
2023-01-11T21:28:51.3073438Z Ran 107 tests in 0.119s
2023-01-11T21:28:51.3073569Z
2023-01-11T21:28:51.3073644Z OK (skipped=106)
2023-01-11T21:28:51.3073763Z
2023-01-11T21:28:51.3073854Z Generating XML reports...
2023-01-11T21:28:51.3074328Z Generated XML report: test-reports/python-unittest/test_jit_llga_fuser/TEST-TestEnableDisableLlgaFuser-20230111212850.xml
2023-01-11T21:28:51.3074877Z Generated XML report: test-reports/python-unittest/test_jit_llga_fuser/TEST-TestDynamoAOT-20230111212850.xml
2023-01-11T21:28:51.3075402Z Generated XML report: test-reports/python-unittest/test_jit_llga_fuser/TEST-TestFusionPatternCUDA-20230111212850.xml
2023-01-11T21:28:51.3075913Z Generated XML report: test-reports/python-unittest/test_jit_llga_fuser/TEST-TestModel-20230111212850.xml
2023-01-11T21:28:51.3076398Z Generated XML report: test-reports/python-unittest/test_jit_llga_fuser/TEST-TestOpCUDA-20230111212850.xml
2023-01-11T21:28:51.3076621Z
2023-01-11T21:28:51.3076917Z ##[endgroup]
2023-01-11T21:28:51.3077318Z FINISHED PRINTING LOG FILE of test_jit_llga_fuser (/var/lib/jenkins/workspace/test/test-reports/test_jit_llga_fuser_05vvhmu3)
2023-01-11T21:28:51.3077548Z
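Note: the summary above reads OK (skipped=106), i.e. essentially the whole test_jit_llga_fuser file was skipped with one shared reason rather than failing 106 times. A guard of this shape is typically a single unittest.skipIf applied at class level; the sketch below is illustrative only, and the cpu_supports_avx512() helper with its /proc/cpuinfo probe (Linux-only) is a hypothetical stand-in for however the suite actually detects AVX512.

    import unittest

    def cpu_supports_avx512():
        # Hypothetical Linux-only probe: look for an AVX-512 feature flag
        # in /proc/cpuinfo; the suite's real detection may differ.
        try:
            with open("/proc/cpuinfo") as f:
                return "avx512f" in f.read()
        except OSError:
            return False

    @unittest.skipIf(not cpu_supports_avx512(),
                     "This test fails for BF16 on machines without AVX512.")
    class TestModel(unittest.TestCase):
        def test_vision_resnet50_bfloat16(self):
            ...  # the model comparison would go here

    if __name__ == "__main__":
        unittest.main()

With a class-level skipIf, every generated test in the class is reported individually as skipped with the same message, which matches the pattern in the log.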
2023-01-11T21:28:53.2526735Z Ignoring disabled issues: []
2023-01-11T21:28:53.2765035Z Running test_matmul_cuda ... [2023-01-11 21:28:53.276086]
2023-01-11T21:28:53.2767687Z Executing ['/opt/conda/bin/python', '-bb', 'test_matmul_cuda.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:28:53.276441]
2023-01-11T21:30:26.0607249Z
2023-01-11T21:30:26.0608194Z Expand the folded group to see the log file of test_matmul_cuda
2023-01-11T21:30:26.0608895Z ##[group]PRINTING LOG FILE of test_matmul_cuda (/var/lib/jenkins/workspace/test/test-reports/test_matmul_cuda__5okok14)
2023-01-11T21:30:26.0610463Z
2023-01-11T21:30:26.0610837Z Running tests...
2023-01-11T21:30:26.0611314Z ----------------------------------------------------------------------
2023-01-11T21:30:26.0613229Z Test results will be stored in test-reports/python-unittest/test_matmul_cuda
2023-01-11T21:30:26.0613641Z test_cublas_addmm_size_10000_cuda_bfloat16 (__main__.TestMatmulCudaCUDA) ... ok (9.429s)
2023-01-11T21:30:26.0614112Z test_cublas_addmm_size_10000_cuda_float16 (__main__.TestMatmulCudaCUDA) ... ok (13.119s)
2023-01-11T21:30:26.0614737Z test_cublas_addmm_size_10000_cuda_float32 (__main__.TestMatmulCudaCUDA) ... ok (7.847s)
2023-01-11T21:30:26.0615304Z test_cublas_addmm_size_1000_cuda_bfloat16 (__main__.TestMatmulCudaCUDA) ... ok (0.043s)
2023-01-11T21:30:26.0619624Z test_cublas_addmm_size_1000_cuda_float16 (__main__.TestMatmulCudaCUDA) ... ok (0.088s)
2023-01-11T21:30:26.0620108Z test_cublas_addmm_size_1000_cuda_float32 (__main__.TestMatmulCudaCUDA) ... ok (0.033s)
2023-01-11T21:30:26.0620596Z test_cublas_addmm_size_100_cuda_bfloat16 (__main__.TestMatmulCudaCUDA) ... ok (0.003s)
2023-01-11T21:30:26.0621085Z test_cublas_addmm_size_100_cuda_float16 (__main__.TestMatmulCudaCUDA) ... ok (0.003s)
2023-01-11T21:30:26.0621552Z test_cublas_addmm_size_100_cuda_float32 (__main__.TestMatmulCudaCUDA) ... ok (0.003s)
2023-01-11T21:30:26.0622299Z test_cublas_baddbmm_large_input_1_10000_10000_10000_cuda_bfloat16 (__main__.TestMatmulCudaCUDA) ... ok (15.688s)
2023-01-11T21:30:26.0622674Z test_cublas_baddbmm_large_input_1_10000_10000_10000_cuda_float16 (__main__.TestMatmulCudaCUDA) ... ok (14.463s)
2023-01-11T21:30:26.0623040Z test_cublas_baddbmm_large_input_1_10000_10000_10000_cuda_float32 (__main__.TestMatmulCudaCUDA) ... ok (15.260s)
2023-01-11T21:30:26.0623518Z test_cublas_baddbmm_large_input_1_10000_1000_10000_cuda_bfloat16 (__main__.TestMatmulCudaCUDA) ... ok (4.352s)
2023-01-11T21:30:26.0623965Z test_cublas_baddbmm_large_input_1_10000_1000_10000_cuda_float16 (__main__.TestMatmulCudaCUDA) ... ok (4.600s)
2023-01-11T21:30:26.0624454Z test_cublas_baddbmm_large_input_1_10000_1000_10000_cuda_float32 (__main__.TestMatmulCudaCUDA) ... ok (4.673s)
2023-01-11T21:30:26.0624901Z test_cublas_baddbmm_large_input_2_1000_1000_1000_cuda_bfloat16 (__main__.TestMatmulCudaCUDA) ... ok (0.069s)
2023-01-11T21:30:26.0625264Z test_cublas_baddbmm_large_input_2_1000_1000_1000_cuda_float16 (__main__.TestMatmulCudaCUDA) ... ok (0.067s)
2023-01-11T21:30:26.0625631Z test_cublas_baddbmm_large_input_2_1000_1000_1000_cuda_float32 (__main__.TestMatmulCudaCUDA) ... ok (0.074s)
2023-01-11T21:30:26.0625999Z test_cublas_baddbmm_large_input_2_100_100_100_cuda_bfloat16 (__main__.TestMatmulCudaCUDA) ... ok (0.010s)
2023-01-11T21:30:26.0626355Z test_cublas_baddbmm_large_input_2_100_100_100_cuda_float16 (__main__.TestMatmulCudaCUDA) ... ok (0.004s)
2023-01-11T21:30:26.0626718Z test_cublas_baddbmm_large_input_2_100_100_100_cuda_float32 (__main__.TestMatmulCudaCUDA) ... ok (0.003s)
2023-01-11T21:30:26.0626917Z
2023-01-11T21:30:26.0627159Z ----------------------------------------------------------------------
2023-01-11T21:30:26.0627427Z Ran 21 tests in 89.835s
2023-01-11T21:30:26.0627554Z
2023-01-11T21:30:26.0627617Z OK
2023-01-11T21:30:26.0627720Z
2023-01-11T21:30:26.0627815Z Generating XML reports...
2023-01-11T21:30:26.0632671Z Generated XML report: test-reports/python-unittest/test_matmul_cuda/TEST-TestMatmulCudaCUDA-20230111212855.xml
2023-01-11T21:30:26.0632937Z
2023-01-11T21:30:26.0633314Z ##[endgroup]
2023-01-11T21:30:26.0633725Z FINISHED PRINTING LOG FILE of test_matmul_cuda (/var/lib/jenkins/workspace/test/test-reports/test_matmul_cuda__5okok14)
2023-01-11T21:30:26.0634095Z
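Note: judging by the names, each test_cublas_addmm_size_* case exercises torch.addmm on the GPU (where it is backed by cuBLAS) at a given size and dtype. A common way to write such a check, shown here as a sketch with illustrative tolerances and a float64 CPU reference (the suite's actual inputs and thresholds are not visible in this log), requires a CUDA device:

    import torch

    def check_cublas_addmm(n, dtype, rtol, atol):
        # fp32 inputs on CPU; fp64 reference; CUDA result in the reduced dtype.
        m, a, b = (torch.randn(n, n) for _ in range(3))
        ref = torch.addmm(m.double(), a.double(), b.double())
        out = torch.addmm(m.to("cuda", dtype),
                          a.to("cuda", dtype),
                          b.to("cuda", dtype))
        torch.testing.assert_close(out.cpu().double(), ref, rtol=rtol, atol=atol)

    # Loose tolerances chosen for bfloat16's ~8-bit mantissa; illustrative only.
    check_cublas_addmm(100, torch.bfloat16, rtol=5e-2, atol=5e-1)

The timings above also show why such suites parametrize by size: the 10000x10000 cases dominate the 89.835s wall time while the size-100 cases are essentially free.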
2023-01-11T21:30:28.0102741Z Ignoring disabled issues: []
2023-01-11T21:30:28.0340496Z Running test_mkldnn_fusion ... [2023-01-11 21:30:28.033291]
2023-01-11T21:30:28.0341623Z Executing ['/opt/conda/bin/python', '-bb', 'test_mkldnn_fusion.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:30:28.033650]
2023-01-11T21:31:07.6482941Z
2023-01-11T21:31:07.6483498Z Expand the folded group to see the log file of test_mkldnn_fusion
2023-01-11T21:31:07.6484243Z ##[group]PRINTING LOG FILE of test_mkldnn_fusion (/var/lib/jenkins/workspace/test/test-reports/test_mkldnn_fusion_8b5he92q)
2023-01-11T21:31:07.6484473Z
2023-01-11T21:31:07.6486036Z Running tests...
2023-01-11T21:31:07.6486652Z ----------------------------------------------------------------------
2023-01-11T21:31:07.6487231Z Test results will be stored in test-reports/python-unittest/test_mkldnn_fusion
2023-01-11T21:31:07.6487822Z test_conv_binary_fusion_ops (__main__.TestMkldnnFusion) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.002s)
2023-01-11T21:31:07.6488322Z test_conv_unary_fusion_nnc (__main__.TestMkldnnFusion) ... ok (0.870s)
2023-01-11T21:31:07.6488741Z test_conv_unary_fusion_ops (__main__.TestMkldnnFusion) ... ok (27.080s)
2023-01-11T21:31:07.6490615Z test_linear_binary_fusion_ops (__main__.TestMkldnnFusion) ... ok (0.015s)
2023-01-11T21:31:07.6490990Z test_linear_unary_fusion_ops (__main__.TestMkldnnFusion) ... ok (0.019s)
2023-01-11T21:31:07.6491353Z test_single_conv (__main__.TestMkldnnFusion) ... ok (1.273s)
2023-01-11T21:31:07.6491674Z test_unsupported_conv (__main__.TestMkldnnFusion) ... ok (7.621s)
2023-01-11T21:31:07.6492138Z
2023-01-11T21:31:07.6492387Z ----------------------------------------------------------------------
2023-01-11T21:31:07.6492662Z Ran 7 tests in 37.741s
2023-01-11T21:31:07.6492827Z
2023-01-11T21:31:07.6492933Z OK (skipped=1)
2023-01-11T21:31:07.6501323Z
2023-01-11T21:31:07.6501637Z Generating XML reports...
2023-01-11T21:31:07.6502301Z Generated XML report: test-reports/python-unittest/test_mkldnn_fusion/TEST-TestMkldnnFusion-20230111213029.xml
2023-01-11T21:31:07.6502649Z
2023-01-11T21:31:07.6503052Z ##[endgroup]
2023-01-11T21:31:07.6503473Z FINISHED PRINTING LOG FILE of test_mkldnn_fusion (/var/lib/jenkins/workspace/test/test-reports/test_mkldnn_fusion_8b5he92q)
2023-01-11T21:31:07.6503703Z
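Note: the one skip above is a speed gate, not a failure; the message itself names the switch. To rerun just that case with slow tests enabled, something like the sketch below should work from the checkout's test/ directory (selecting a single test by name relies on unittest's standard CLI; using sys.executable instead of the CI interpreter path is an assumption for portability):

    import os
    import subprocess
    import sys

    # CI invoked '/opt/conda/bin/python -bb test_mkldnn_fusion.py -v ...';
    # PYTORCH_TEST_WITH_SLOW=1 enables cases skipped as "test is slow".
    env = dict(os.environ, PYTORCH_TEST_WITH_SLOW="1")
    subprocess.run(
        [sys.executable, "-bb", "test_mkldnn_fusion.py", "-v",
         "TestMkldnnFusion.test_conv_binary_fusion_ops"],
        env=env,
        check=True,
    )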
2023-01-11T21:31:09.5978476Z Ignoring disabled issues: []
2023-01-11T21:31:09.6213428Z Running test_module_init ... [2023-01-11 21:31:09.620784]
2023-01-11T21:31:09.6215751Z Executing ['/opt/conda/bin/python', '-bb', 'test_module_init.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:31:09.621148]
2023-01-11T21:31:15.5358604Z
2023-01-11T21:31:15.5359264Z Expand the folded group to see the log file of test_module_init
2023-01-11T21:31:15.5360218Z ##[group]PRINTING LOG FILE of test_module_init (/var/lib/jenkins/workspace/test/test-reports/test_module_init___9kb0fz)
2023-01-11T21:31:15.5360514Z
2023-01-11T21:31:15.5360634Z Running tests...
2023-01-11T21:31:15.5361278Z ----------------------------------------------------------------------
2023-01-11T21:31:15.5361866Z Test results will be stored in test-reports/python-unittest/test_module_init
2023-01-11T21:31:15.5362283Z test_nn_AdaptiveAvgPool1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5364149Z test_nn_AdaptiveAvgPool1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5364639Z test_nn_AdaptiveAvgPool2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5365071Z test_nn_AdaptiveAvgPool2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5365479Z test_nn_AdaptiveAvgPool3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5365830Z test_nn_AdaptiveAvgPool3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5366411Z test_nn_AdaptiveLogSoftmaxWithLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.005s)
2023-01-11T21:31:15.5366803Z test_nn_AdaptiveLogSoftmaxWithLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s)
2023-01-11T21:31:15.5367161Z test_nn_AdaptiveMaxPool1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5367510Z test_nn_AdaptiveMaxPool1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5367854Z test_nn_AdaptiveMaxPool2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5368195Z test_nn_AdaptiveMaxPool2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5368527Z test_nn_AdaptiveMaxPool3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5368871Z test_nn_AdaptiveMaxPool3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5369215Z test_nn_AlphaDropout_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5369543Z test_nn_AlphaDropout_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5369875Z test_nn_AvgPool1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5370197Z test_nn_AvgPool1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5370522Z test_nn_AvgPool2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5370833Z test_nn_AvgPool2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5371174Z test_nn_AvgPool3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5371519Z test_nn_AvgPool3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5371900Z test_nn_BCELoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5372224Z test_nn_BCELoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5372559Z test_nn_BCEWithLogitsLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5372911Z test_nn_BCEWithLogitsLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5373246Z test_nn_BatchNorm1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s)
2023-01-11T21:31:15.5373576Z test_nn_BatchNorm1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5373900Z test_nn_BatchNorm2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5374216Z test_nn_BatchNorm2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s)
2023-01-11T21:31:15.5374882Z test_nn_BatchNorm3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5375217Z test_nn_BatchNorm3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5375611Z test_nn_Bilinear_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5375930Z test_nn_Bilinear_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5376252Z test_nn_CELU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5376566Z test_nn_CELU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5376879Z test_nn_CTCLoss_cuda_float32 (__main__.TestModuleInitCUDA) ...
ok (0.002s) 2023-01-11T21:31:15.5377204Z test_nn_CTCLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5377537Z test_nn_ChannelShuffle_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5377876Z test_nn_ChannelShuffle_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5378208Z test_nn_ConstantPad1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5378550Z test_nn_ConstantPad1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5378889Z test_nn_ConstantPad2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5379284Z test_nn_ConstantPad2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5379618Z test_nn_ConstantPad3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5379953Z test_nn_ConstantPad3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5380287Z test_nn_Conv1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5380595Z test_nn_Conv1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5380947Z test_nn_Conv2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5381276Z test_nn_Conv2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5381582Z test_nn_Conv3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5381890Z test_nn_Conv3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5382220Z test_nn_ConvTranspose1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5382563Z test_nn_ConvTranspose1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5382903Z test_nn_ConvTranspose2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5383244Z test_nn_ConvTranspose2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5383590Z test_nn_ConvTranspose3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5383917Z test_nn_ConvTranspose3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5384274Z test_nn_CosineEmbeddingLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5384680Z test_nn_CosineEmbeddingLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5385039Z test_nn_CosineSimilarity_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5385381Z test_nn_CosineSimilarity_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5385733Z test_nn_CrossEntropyLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5386080Z test_nn_CrossEntropyLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5386416Z test_nn_CrossMapLRN2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5386764Z test_nn_CrossMapLRN2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5387100Z test_nn_Dropout1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5387429Z test_nn_Dropout1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5387752Z test_nn_Dropout2d_cuda_float32 (__main__.TestModuleInitCUDA) ... 
ok (0.002s) 2023-01-11T21:31:15.5388076Z test_nn_Dropout2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5388400Z test_nn_Dropout3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5388723Z test_nn_Dropout3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5389052Z test_nn_Dropout_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5389377Z test_nn_Dropout_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5389693Z test_nn_ELU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5390000Z test_nn_ELU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5390332Z test_nn_EmbeddingBag_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5390673Z test_nn_EmbeddingBag_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5391014Z test_nn_Embedding_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5391379Z test_nn_Embedding_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5391756Z test_nn_FeatureAlphaDropout_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5392129Z test_nn_FeatureAlphaDropout_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5392464Z test_nn_Flatten_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5392791Z test_nn_Flatten_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5393112Z test_nn_Fold_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5393422Z test_nn_Fold_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5393765Z test_nn_FractionalMaxPool2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5394132Z test_nn_FractionalMaxPool2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5394490Z test_nn_FractionalMaxPool3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5394838Z test_nn_FractionalMaxPool3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5395174Z test_nn_GELU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5395492Z test_nn_GELU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5395801Z test_nn_GLU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5396116Z test_nn_GLU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5396435Z test_nn_GRUCell_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5396757Z test_nn_GRUCell_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5397064Z test_nn_GRU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.771s) 2023-01-11T21:31:15.5397407Z test_nn_GRU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5397738Z test_nn_GaussianNLLLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5398081Z test_nn_GaussianNLLLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5398421Z test_nn_GroupNorm_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5398750Z test_nn_GroupNorm_cuda_float64 (__main__.TestModuleInitCUDA) ... 
ok (0.003s) 2023-01-11T21:31:15.5399086Z test_nn_Hardshrink_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5399408Z test_nn_Hardshrink_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5399737Z test_nn_Hardsigmoid_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5400071Z test_nn_Hardsigmoid_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5400423Z test_nn_Hardswish_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5400770Z test_nn_Hardswish_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5401098Z test_nn_Hardtanh_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5401431Z test_nn_Hardtanh_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5401767Z test_nn_HingeEmbeddingLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5402126Z test_nn_HingeEmbeddingLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5402474Z test_nn_HuberLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5402792Z test_nn_HuberLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5403117Z test_nn_Identity_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5403444Z test_nn_Identity_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5403780Z test_nn_InstanceNorm1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5404115Z test_nn_InstanceNorm1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5404491Z test_nn_InstanceNorm2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5404836Z test_nn_InstanceNorm2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5405164Z test_nn_InstanceNorm3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5405507Z test_nn_InstanceNorm3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5405845Z test_nn_KLDivLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5406183Z test_nn_KLDivLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5406502Z test_nn_L1Loss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5406827Z test_nn_L1Loss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5407153Z test_nn_LPPool1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5407477Z test_nn_LPPool1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5407808Z test_nn_LPPool2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5408137Z test_nn_LPPool2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5408463Z test_nn_LSTMCell_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5408776Z test_nn_LSTMCell_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5409099Z test_nn_LSTM_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5409420Z test_nn_LSTM_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5409769Z test_nn_LayerNorm_cuda_float32 (__main__.TestModuleInitCUDA) ... 
ok (0.003s) 2023-01-11T21:31:15.5410094Z test_nn_LayerNorm_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5410887Z test_nn_LazyBatchNorm1d_cuda_float32 (__main__.TestModuleInitCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/nn/modules/lazy.py:180: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment. 2023-01-11T21:31:15.5411534Z warnings.warn('Lazy modules are a new feature under heavy development ' 2023-01-11T21:31:15.5411789Z ok (0.004s) 2023-01-11T21:31:15.5412061Z test_nn_LazyBatchNorm1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5412414Z test_nn_LazyBatchNorm2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5412762Z test_nn_LazyBatchNorm2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5413106Z test_nn_LazyBatchNorm3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5413456Z test_nn_LazyBatchNorm3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5413799Z test_nn_LazyConv1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5414125Z test_nn_LazyConv1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5414454Z test_nn_LazyConv2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5415306Z test_nn_LazyConv2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5415734Z test_nn_LazyConv3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5416045Z test_nn_LazyConv3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5416383Z test_nn_LazyConvTranspose1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5416734Z test_nn_LazyConvTranspose1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5417082Z test_nn_LazyConvTranspose2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5417432Z test_nn_LazyConvTranspose2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5417855Z test_nn_LazyConvTranspose3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5418204Z test_nn_LazyConvTranspose3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5418543Z test_nn_LazyInstanceNorm1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5418889Z test_nn_LazyInstanceNorm1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5419232Z test_nn_LazyInstanceNorm2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5419567Z test_nn_LazyInstanceNorm2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5419917Z test_nn_LazyInstanceNorm3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5420254Z test_nn_LazyInstanceNorm3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5420584Z test_nn_LazyLinear_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5420897Z test_nn_LazyLinear_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5421217Z test_nn_LeakyReLU_cuda_float32 (__main__.TestModuleInitCUDA) ... 
ok (0.002s) 2023-01-11T21:31:15.5421534Z test_nn_LeakyReLU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5421851Z test_nn_Linear_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5422152Z test_nn_Linear_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5422486Z test_nn_LocalResponseNorm_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5422834Z test_nn_LocalResponseNorm_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5423220Z test_nn_LogSigmoid_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5423547Z test_nn_LogSigmoid_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5423871Z test_nn_LogSoftmax_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5424185Z test_nn_LogSoftmax_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5424509Z test_nn_MSELoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5424836Z test_nn_MSELoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5425168Z test_nn_MarginRankingLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5425512Z test_nn_MarginRankingLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5425849Z test_nn_MaxPool1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5426178Z test_nn_MaxPool1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5426493Z test_nn_MaxPool2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5426813Z test_nn_MaxPool2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5427140Z test_nn_MaxPool3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5427464Z test_nn_MaxPool3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5427783Z test_nn_MaxUnpool1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5428117Z test_nn_MaxUnpool1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5428448Z test_nn_MaxUnpool2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5428765Z test_nn_MaxUnpool2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5429095Z test_nn_MaxUnpool3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5429418Z test_nn_MaxUnpool3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5429878Z test_nn_Mish_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5430244Z test_nn_Mish_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5430600Z test_nn_ModuleDict_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5430916Z test_nn_ModuleDict_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5431236Z test_nn_ModuleList_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5431557Z test_nn_ModuleList_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5431898Z test_nn_MultiLabelMarginLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5432258Z test_nn_MultiLabelMarginLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... 
ok (0.002s) 2023-01-11T21:31:15.5432627Z test_nn_MultiLabelSoftMarginLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5433003Z test_nn_MultiLabelSoftMarginLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5433356Z test_nn_MultiMarginLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5433706Z test_nn_MultiMarginLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5434055Z test_nn_MultiheadAttention_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5434399Z test_nn_MultiheadAttention_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5434735Z test_nn_NLLLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5435057Z test_nn_NLLLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5435379Z test_nn_PReLU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5435726Z test_nn_PReLU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5436062Z test_nn_PairwiseDistance_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5436416Z test_nn_PairwiseDistance_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5436752Z test_nn_ParameterDict_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5437091Z test_nn_ParameterDict_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5437431Z test_nn_ParameterList_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5437775Z test_nn_ParameterList_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5438107Z test_nn_PixelShuffle_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5438441Z test_nn_PixelShuffle_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5438787Z test_nn_PixelUnshuffle_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5439127Z test_nn_PixelUnshuffle_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5439468Z test_nn_PoissonNLLLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5439812Z test_nn_PoissonNLLLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5440149Z test_nn_RNNBase_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5440464Z test_nn_RNNBase_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5440788Z test_nn_RNNCellBase_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5441117Z test_nn_RNNCellBase_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5441435Z test_nn_RNNCell_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5441758Z test_nn_RNNCell_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5442073Z test_nn_RNN_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5442386Z test_nn_RNN_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5442727Z test_nn_RReLU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5443045Z test_nn_RReLU_cuda_float64 (__main__.TestModuleInitCUDA) ... 
ok (0.002s) 2023-01-11T21:31:15.5443358Z test_nn_ReLU6_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5443655Z test_nn_ReLU6_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5443962Z test_nn_ReLU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5444274Z test_nn_ReLU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5444604Z test_nn_ReflectionPad1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5444949Z test_nn_ReflectionPad1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5445294Z test_nn_ReflectionPad2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5445643Z test_nn_ReflectionPad2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5445977Z test_nn_ReflectionPad3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5446319Z test_nn_ReflectionPad3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5446669Z test_nn_ReplicationPad1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5447019Z test_nn_ReplicationPad1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5447356Z test_nn_ReplicationPad2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5447703Z test_nn_ReplicationPad2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5448098Z test_nn_ReplicationPad3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5448432Z test_nn_ReplicationPad3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5448755Z test_nn_SELU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5449069Z test_nn_SELU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5449392Z test_nn_Sequential_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5449714Z test_nn_Sequential_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5450033Z test_nn_SiLU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5450344Z test_nn_SiLU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5450653Z test_nn_Sigmoid_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5450978Z test_nn_Sigmoid_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5451306Z test_nn_SmoothL1Loss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5451637Z test_nn_SmoothL1Loss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5451967Z test_nn_SoftMarginLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5452310Z test_nn_SoftMarginLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5452641Z test_nn_Softmax2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5452954Z test_nn_Softmax2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5453280Z test_nn_Softmax_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5453600Z test_nn_Softmax_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5453917Z test_nn_Softmin_cuda_float32 (__main__.TestModuleInitCUDA) ... 
ok (0.002s) 2023-01-11T21:31:15.5454230Z test_nn_Softmin_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5454877Z test_nn_Softplus_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5475495Z test_nn_Softplus_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5476170Z test_nn_Softshrink_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5476505Z test_nn_Softshrink_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5476818Z test_nn_Softsign_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5477139Z test_nn_Softsign_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5477465Z test_nn_SyncBatchNorm_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5477802Z test_nn_SyncBatchNorm_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5478122Z test_nn_Tanh_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5478430Z test_nn_Tanh_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5478749Z test_nn_Tanhshrink_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5479063Z test_nn_Tanhshrink_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5479388Z test_nn_Threshold_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5479707Z test_nn_Threshold_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5480052Z test_nn_TransformerDecoderLayer_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.027s) 2023-01-11T21:31:15.5480440Z test_nn_TransformerDecoderLayer_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.006s) 2023-01-11T21:31:15.5480820Z test_nn_TransformerDecoder_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.007s) 2023-01-11T21:31:15.5481230Z test_nn_TransformerDecoder_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.007s) 2023-01-11T21:31:15.5481576Z test_nn_TransformerEncoderLayer_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.005s) 2023-01-11T21:31:15.5481941Z test_nn_TransformerEncoderLayer_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.005s) 2023-01-11T21:31:15.5482295Z test_nn_TransformerEncoder_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.005s) 2023-01-11T21:31:15.5482644Z test_nn_TransformerEncoder_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.006s) 2023-01-11T21:31:15.5482972Z test_nn_Transformer_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.307s) 2023-01-11T21:31:15.5483295Z test_nn_Transformer_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.376s) 2023-01-11T21:31:15.5483638Z test_nn_TripletMarginLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5483984Z test_nn_TripletMarginLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5484355Z test_nn_TripletMarginWithDistanceLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5484743Z test_nn_TripletMarginWithDistanceLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5485105Z test_nn_Unflatten_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5485420Z test_nn_Unflatten_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5485732Z test_nn_Unfold_cuda_float32 (__main__.TestModuleInitCUDA) ... 
ok (0.002s) 2023-01-11T21:31:15.5486045Z test_nn_Unfold_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5486360Z test_nn_Upsample_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5486679Z test_nn_Upsample_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5487018Z test_nn_UpsamplingBilinear2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5487376Z test_nn_UpsamplingBilinear2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5487724Z test_nn_UpsamplingNearest2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5488081Z test_nn_UpsamplingNearest2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5488461Z test_nn_ZeroPad2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5488789Z test_nn_ZeroPad2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5489099Z test_qat_Conv1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5489415Z test_qat_Conv1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5489721Z test_qat_Conv2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5490018Z test_qat_Conv2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5490328Z test_qat_Conv3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5490642Z test_qat_Conv3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5490959Z test_qat_EmbeddingBag_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5491285Z test_qat_EmbeddingBag_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5491614Z test_qat_Embedding_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5491936Z test_qat_Embedding_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5492246Z test_qat_Linear_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.005s) 2023-01-11T21:31:15.5492557Z test_qat_Linear_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5492879Z test_quantizable_LSTMCell_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5493227Z test_quantizable_LSTMCell_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5493588Z test_quantizable_LSTM_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5493918Z test_quantizable_LSTM_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5494276Z test_quantizable_MultiheadAttention_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.005s) 2023-01-11T21:31:15.5495010Z test_quantizable_MultiheadAttention_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.005s) 2023-01-11T21:31:15.5495398Z test_quantized_BatchNorm2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5495838Z test_quantized_BatchNorm2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5496189Z test_quantized_BatchNorm3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5496523Z test_quantized_BatchNorm3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5496855Z test_quantized_Conv1d_cuda_float32 (__main__.TestModuleInitCUDA) ... 
ok (0.003s) 2023-01-11T21:31:15.5497183Z test_quantized_Conv1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5497499Z test_quantized_Conv2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5497823Z test_quantized_Conv2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5498148Z test_quantized_Conv3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5498471Z test_quantized_Conv3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5498808Z test_quantized_ConvTranspose1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5499161Z test_quantized_ConvTranspose1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5499509Z test_quantized_ConvTranspose2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5499849Z test_quantized_ConvTranspose2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5500197Z test_quantized_ConvTranspose3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5500552Z test_quantized_ConvTranspose3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5500989Z test_quantized_DeQuantize_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5501327Z test_quantized_DeQuantize_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5501664Z test_quantized_Dropout_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5501996Z test_quantized_Dropout_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5502321Z test_quantized_ELU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5502643Z test_quantized_ELU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5502989Z test_quantized_FXFloatFunctional_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5503357Z test_quantized_FXFloatFunctional_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5503710Z test_quantized_FloatFunctional_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5504061Z test_quantized_FloatFunctional_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5504401Z test_quantized_GroupNorm_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5504741Z test_quantized_GroupNorm_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5505062Z test_quantized_Hardswish_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5505394Z test_quantized_Hardswish_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5505734Z test_quantized_InstanceNorm1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5506134Z test_quantized_InstanceNorm1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5506479Z test_quantized_InstanceNorm2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5506821Z test_quantized_InstanceNorm2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5507159Z test_quantized_InstanceNorm3d_cuda_float32 (__main__.TestModuleInitCUDA) ... 
ok (0.003s)
2023-01-11T21:31:15.5507493Z test_quantized_InstanceNorm3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5507833Z test_quantized_LayerNorm_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5508167Z test_quantized_LayerNorm_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5508487Z test_quantized_LeakyReLU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5508815Z test_quantized_LeakyReLU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5509155Z test_quantized_Linear_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5509485Z test_quantized_Linear_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5509809Z test_quantized_PReLU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.009s)
2023-01-11T21:31:15.5510143Z test_quantized_PReLU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.012s)
2023-01-11T21:31:15.5510507Z test_quantized_QFunctional_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s)
2023-01-11T21:31:15.5510866Z test_quantized_QFunctional_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5511207Z test_quantized_Quantize_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5511540Z test_quantized_Quantize_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5511872Z test_quantized_ReLU6_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5512191Z test_quantized_ReLU6_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5512520Z test_quantized_Sigmoid_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5512892Z test_quantized_Sigmoid_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5513223Z test_quantized_Softmax_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5513550Z test_quantized_Softmax_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5513739Z
2023-01-11T21:31:15.5514060Z ----------------------------------------------------------------------
2023-01-11T21:31:15.5514325Z Ran 374 tests in 2.499s
2023-01-11T21:31:15.5514449Z
2023-01-11T21:31:15.5514513Z OK
2023-01-11T21:31:15.5514617Z
2023-01-11T21:31:15.5514711Z Generating XML reports...
2023-01-11T21:31:15.5515160Z Generated XML report: test-reports/python-unittest/test_module_init/TEST-TestModuleInitCUDA-20230111213112.xml
2023-01-11T21:31:15.5515416Z
2023-01-11T21:31:15.5515799Z ##[endgroup]
2023-01-11T21:31:15.5516196Z FINISHED PRINTING LOG FILE of test_module_init (/var/lib/jenkins/workspace/test/test-reports/test_module_init___9kb0fz)
2023-01-11T21:31:15.5516421Z
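Note: the 374 cases above are generated, one per (module, dtype) pair, and the test names suggest each checks that the module can be constructed directly on the device with the given dtype, i.e. that the device=/dtype= factory kwargs work across torch.nn. A minimal sketch of that kind of check, using nn.Linear as an illustrative example (the suite's exact assertions are not visible in this log; requires a CUDA device):

    import torch

    # Construct directly on CUDA in float32 via the factory kwargs, then
    # verify every parameter landed on that device with that dtype.
    m = torch.nn.Linear(8, 4, device="cuda", dtype=torch.float32)
    assert all(p.device.type == "cuda" and p.dtype == torch.float32
               for p in m.parameters())

Constructing on-device avoids a CPU-then-.to() round trip, which is why these checks are cheap: the whole 374-case suite ran in 2.499s.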
2023-01-11T21:31:17.4472376Z Ignoring disabled issues: []
2023-01-11T21:31:17.4710251Z Running test_native_mha ... [2023-01-11 21:31:17.470488]
2023-01-11T21:31:17.4710966Z Executing ['/opt/conda/bin/python', '-bb', 'test_native_mha.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:31:17.470826]
2023-01-11T21:31:21.3459174Z
2023-01-11T21:31:21.3459885Z Expand the folded group to see the log file of test_native_mha
2023-01-11T21:31:21.3460936Z ##[group]PRINTING LOG FILE of test_native_mha (/var/lib/jenkins/workspace/test/test-reports/test_native_mha_y7qjc5i2)
2023-01-11T21:31:21.3461290Z
2023-01-11T21:31:21.3461419Z Running tests...
2023-01-11T21:31:21.3461991Z ----------------------------------------------------------------------
2023-01-11T21:31:21.3462686Z Test results will be stored in test-reports/python-unittest/test_native_mha
2023-01-11T21:31:21.3463050Z test_native_multihead_attention_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.670s)
2023-01-11T21:31:21.3463449Z test_native_multihead_attention_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.003s)
2023-01-11T21:31:21.3463837Z test_native_multihead_encoder_decoder_attention_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.002s)
2023-01-11T21:31:21.3464225Z test_native_multihead_encoder_decoder_attention_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.002s)
2023-01-11T21:31:21.3464720Z test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s)
2023-01-11T21:31:21.3465288Z test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.004s)
2023-01-11T21:31:21.3465850Z test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s)
2023-01-11T21:31:21.3466410Z test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.004s)
2023-01-11T21:31:21.3466952Z test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s)
2023-01-11T21:31:21.3467494Z test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s)
2023-01-11T21:31:21.3468054Z test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s)
2023-01-11T21:31:21.3468665Z test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.004s)
2023-01-11T21:31:21.3469216Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s)
2023-01-11T21:31:21.3469770Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s)
2023-01-11T21:31:21.3470322Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s)
2023-01-11T21:31:21.3470884Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ...
ok (0.005s) 2023-01-11T21:31:21.3471428Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3471985Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3472527Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3473111Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3473663Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3474208Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3474755Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3475293Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3475843Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3476383Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3476925Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3477471Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3478299Z test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... /var/lib/jenkins/workspace/test/test_native_mha.py:207: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/NestedTensorImpl.cpp:179.) 
2023-01-11T21:31:21.3478951Z q = torch.nested.nested_tensor(qs, device=device, dtype=dtype) 2023-01-11T21:31:21.3479198Z ok (0.006s) 2023-01-11T21:31:21.3479577Z test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3480126Z test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3480668Z test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3481210Z test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3481745Z test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3482291Z test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3482865Z test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3483404Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3483942Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3484476Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3485021Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3485568Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3486103Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3486642Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3487182Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... 
ok (0.005s) 2023-01-11T21:31:21.3487728Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.008s) 2023-01-11T21:31:21.3488304Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.008s) 2023-01-11T21:31:21.3488842Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.008s) 2023-01-11T21:31:21.3489381Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.008s) 2023-01-11T21:31:21.3489915Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.008s) 2023-01-11T21:31:21.3490457Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3490994Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.008s) 2023-01-11T21:31:21.3491528Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3491972Z test_transform_bias_rescale_qkv_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.011s) 2023-01-11T21:31:21.3492418Z test_transform_bias_rescale_qkv_nested_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.013s) 2023-01-11T21:31:21.3492629Z 2023-01-11T21:31:21.3492840Z ---------------------------------------------------------------------- 2023-01-11T21:31:21.3493098Z Ran 54 tests in 0.982s 2023-01-11T21:31:21.3493226Z 2023-01-11T21:31:21.3493298Z OK 2023-01-11T21:31:21.3493402Z 2023-01-11T21:31:21.3493504Z Generating XML reports... 2023-01-11T21:31:21.3493961Z Generated XML report: test-reports/python-unittest/test_native_mha/TEST-TestMHADeviceTypeCUDA-20230111213119.xml 2023-01-11T21:31:21.3494207Z 2023-01-11T21:31:21.3494455Z ##[endgroup] 2023-01-11T21:31:21.3494989Z FINISHED PRINTING LOG FILE of test_native_mha (/var/lib/jenkins/workspace/test/test-reports/test_native_mha_y7qjc5i2) 2023-01-11T21:31:21.3495210Z 2023-01-11T21:31:23.2848983Z Ignoring disabled issues: [] 2023-01-11T21:31:23.3083115Z Running test_numpy_interop ... [2023-01-11 21:31:23.307528] 2023-01-11T21:31:23.3083761Z Executing ['/opt/conda/bin/python', '-bb', 'test_numpy_interop.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:31:23.307887] 2023-01-11T21:31:26.2523103Z 2023-01-11T21:31:26.2523727Z Expand the folded group to see the log file of test_numpy_interop 2023-01-11T21:31:26.2524967Z ##[group]PRINTING LOG FILE of test_numpy_interop (/var/lib/jenkins/workspace/test/test-reports/test_numpy_interop_ei77a9xw) 2023-01-11T21:31:26.2525332Z 2023-01-11T21:31:26.2525456Z Running tests... 
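Note on the test_native_mha log above: the UserWarning about nested tensors being "in prototype stage" is triggered by constructing a nested tensor from a list of variable-length tensors, as the quoted source line (q = torch.nested.nested_tensor(...)) shows. A minimal sketch of that construction, assuming a PyTorch build that, like the one under test here, ships the prototype torch.nested API:

    import torch

    # Ragged batch: two sequences of different lengths, same feature width.
    qs = [torch.randn(3, 8), torch.randn(5, 8)]

    # Emits "The PyTorch API of nested tensors is in prototype stage and will
    # change in the near future." on first use, matching the log above.
    q = torch.nested.nested_tensor(qs, dtype=torch.float32)
    print(q.is_nested)  # True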
2023-01-11T21:31:26.2526057Z ---------------------------------------------------------------------- 2023-01-11T21:31:26.2526621Z Test results will be stored in test-reports/python-unittest/test_numpy_interop 2023-01-11T21:31:26.2527634Z test_ctor_with_invalid_numpy_array_sequence_cuda (__main__.TestNumPyInteropCUDA) ... /var/lib/jenkins/workspace/test/test_numpy_interop.py:265: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/utils/tensor_new.cpp:259.) 2023-01-11T21:31:26.2528586Z torch.tensor([np.random.random(size=(3, 3)), np.random.random(size=(3, 0))], device=device) 2023-01-11T21:31:26.2528969Z ok (0.006s) 2023-01-11T21:31:26.2529621Z test_ctor_with_numpy_scalar_ctor_cuda (__main__.TestNumPyInteropCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:31:26.2530141Z test_from_list_of_ndarray_warning_cuda (__main__.TestNumPyInteropCUDA) ... ok (0.001s) 2023-01-11T21:31:26.2530600Z test_from_numpy_cuda (__main__.TestNumPyInteropCUDA) ... ok (0.009s) 2023-01-11T21:31:26.2531519Z test_has_storage_numpy_cuda (__main__.TestNumPyInteropCUDA) ... /var/lib/jenkins/workspace/test/test_numpy_interop.py:441: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:31:26.2532472Z self.assertIsNotNone(torch.tensor(arr, device=device, dtype=torch.float32).storage()) 2023-01-11T21:31:26.2533340Z /var/lib/jenkins/workspace/test/test_numpy_interop.py:442: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:31:26.2534172Z self.assertIsNotNone(torch.tensor(arr, device=device, dtype=torch.double).storage()) 2023-01-11T21:31:26.2535353Z /var/lib/jenkins/workspace/test/test_numpy_interop.py:443: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:31:26.2536384Z self.assertIsNotNone(torch.tensor(arr, device=device, dtype=torch.int).storage()) 2023-01-11T21:31:26.2537231Z /var/lib/jenkins/workspace/test/test_numpy_interop.py:444: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:31:26.2538039Z self.assertIsNotNone(torch.tensor(arr, device=device, dtype=torch.long).storage()) 2023-01-11T21:31:26.2538903Z /var/lib/jenkins/workspace/test/test_numpy_interop.py:445: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:31:26.2539719Z self.assertIsNotNone(torch.tensor(arr, device=device, dtype=torch.uint8).storage()) 2023-01-11T21:31:26.2540102Z ok (0.002s) 2023-01-11T21:31:26.2540510Z test_multiplication_numpy_scalar_cuda (__main__.TestNumPyInteropCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:31:26.2541054Z test_numpy_array_interface_cuda (__main__.TestNumPyInteropCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:31:26.2541579Z test_numpy_index_cuda (__main__.TestNumPyInteropCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:31:26.2542090Z test_numpy_non_writeable_cuda (__main__.TestNumPyInteropCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T21:31:26.2542592Z test_numpy_scalar_cmp_cuda_bfloat16 (__main__.TestNumPyInteropCUDA) ... ok (0.004s) 2023-01-11T21:31:26.2543064Z test_numpy_scalar_cmp_cuda_bool (__main__.TestNumPyInteropCUDA) ... ok (0.003s) 2023-01-11T21:31:26.2543596Z test_numpy_scalar_cmp_cuda_complex128 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2544290Z test_numpy_scalar_cmp_cuda_complex64 (__main__.TestNumPyInteropCUDA) ... /var/lib/jenkins/workspace/test/test_numpy_interop.py:471: ComplexWarning: Casting complex values to real discards the imaginary part 2023-01-11T21:31:26.2544916Z self.assertFalse(t == a) 2023-01-11T21:31:26.2545176Z ok (0.002s) 2023-01-11T21:31:26.2545636Z test_numpy_scalar_cmp_cuda_float16 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2546105Z test_numpy_scalar_cmp_cuda_float32 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2546582Z test_numpy_scalar_cmp_cuda_float64 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2547047Z test_numpy_scalar_cmp_cuda_int16 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2547531Z test_numpy_scalar_cmp_cuda_int32 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2550285Z test_numpy_scalar_cmp_cuda_int64 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2550769Z test_numpy_scalar_cmp_cuda_int8 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2551243Z test_numpy_scalar_cmp_cuda_uint8 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2551732Z test_numpy_unresizable_cuda (__main__.TestNumPyInteropCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:31:26.2552252Z test_parse_numpy_int_cuda (__main__.TestNumPyInteropCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:31:26.2552745Z test_to_numpy_bool_cuda (__main__.TestNumPyInteropCUDA) ... ok (0.001s) 2023-01-11T21:31:26.2553230Z test_to_numpy_cuda (__main__.TestNumPyInteropCUDA) ... skip: Only runs on cpu (0.003s) 2023-01-11T21:31:26.2553707Z test_to_numpy_force_argument_cuda (__main__.TestNumPyInteropCUDA) ... ok (0.014s) 2023-01-11T21:31:26.2553961Z 2023-01-11T21:31:26.2554331Z ---------------------------------------------------------------------- 2023-01-11T21:31:26.2554692Z Ran 26 tests in 0.075s 2023-01-11T21:31:26.2554861Z 2023-01-11T21:31:26.2554980Z OK (skipped=8) 2023-01-11T21:31:26.2555280Z 2023-01-11T21:31:26.2555408Z Generating XML reports... 
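Note on the test_numpy_interop log above: two of its recurring warnings are actionable API guidance. Building a tensor from a Python list of numpy.ndarrays goes through a slow element-wise path, and Tensor.storage() (TypedStorage) is deprecated in favor of Tensor.untyped_storage(), exactly as the warning text says. A minimal sketch of both, assuming a PyTorch recent enough to provide untyped_storage():

    import numpy as np
    import torch

    arrs = [np.random.random(size=(3, 3)), np.random.random(size=(3, 3))]
    t_slow = torch.tensor(arrs)            # warns: "extremely slow" list-of-ndarrays path
    t_fast = torch.tensor(np.array(arrs))  # preferred: one ndarray, then one conversion

    x = torch.zeros(4)
    s_old = x.storage()          # warns: TypedStorage is deprecated
    s_new = x.untyped_storage()  # replacement suggested by the warning text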
2023-01-11T21:31:26.2556009Z Generated XML report: test-reports/python-unittest/test_numpy_interop/TEST-TestNumPyInteropCUDA-20230111213125.xml 2023-01-11T21:31:26.2556366Z 2023-01-11T21:31:26.2556846Z ##[endgroup] 2023-01-11T21:31:26.2557411Z FINISHED PRINTING LOG FILE of test_numpy_interop (/var/lib/jenkins/workspace/test/test-reports/test_numpy_interop_ei77a9xw) 2023-01-11T21:31:26.2557742Z 2023-01-11T21:31:28.1987381Z Ignoring disabled issues: [] 2023-01-11T21:31:28.2226940Z Running test_optim ... [2023-01-11 21:31:28.222120] 2023-01-11T21:31:28.2227458Z Executing ['/opt/conda/bin/python', '-bb', 'test_optim.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:31:28.222464] 2023-01-11T21:33:08.0026761Z 2023-01-11T21:33:08.0027354Z Expand the folded group to see the log file of test_optim 2023-01-11T21:33:08.0028256Z ##[group]PRINTING LOG FILE of test_optim (/var/lib/jenkins/workspace/test/test-reports/test_optim_mznt_uo2) 2023-01-11T21:33:08.0029697Z 2023-01-11T21:33:08.0029923Z Running tests... 2023-01-11T21:33:08.0030437Z ---------------------------------------------------------------------- 2023-01-11T21:33:08.0058882Z Test results will be stored in test-reports/python-unittest/test_optim 2023-01-11T21:33:08.0059282Z test_adadelta (__main__.TestDifferentiableOptimizer) ... ok (1.090s) 2023-01-11T21:33:08.0059760Z test_adagrad (__main__.TestDifferentiableOptimizer) ... ok (0.005s) 2023-01-11T21:33:08.0060219Z test_adam (__main__.TestDifferentiableOptimizer) ... ok (0.012s) 2023-01-11T21:33:08.0060596Z test_adamax (__main__.TestDifferentiableOptimizer) ... ok (0.009s) 2023-01-11T21:33:08.0060978Z test_adamw (__main__.TestDifferentiableOptimizer) ... ok (0.012s) 2023-01-11T21:33:08.0062069Z test_asgd (__main__.TestDifferentiableOptimizer) ... ok (0.005s) 2023-01-11T21:33:08.0062544Z test_nadam (__main__.TestDifferentiableOptimizer) ... ok (0.012s) 2023-01-11T21:33:08.0063018Z test_radam (__main__.TestDifferentiableOptimizer) ... ok (0.008s) 2023-01-11T21:33:08.0063471Z test_rmsprop (__main__.TestDifferentiableOptimizer) ... ok (0.011s) 2023-01-11T21:33:08.0063867Z test_rprop (__main__.TestDifferentiableOptimizer) ... ok (0.009s) 2023-01-11T21:33:08.0064180Z test_sgd (__main__.TestDifferentiableOptimizer) ... ok (0.004s) 2023-01-11T21:33:08.0064871Z test_CosineAnnealingWarmRestarts_lr1_T_mult_1 (__main__.TestLRScheduler) ... ok (0.008s) 2023-01-11T21:33:08.0065316Z test_CosineAnnealingWarmRestarts_lr1_T_mult_2 (__main__.TestLRScheduler) ... ok (0.008s) 2023-01-11T21:33:08.0065819Z test_CosineAnnealingWarmRestarts_lr1_T_mult_4 (__main__.TestLRScheduler) ... ok (0.007s) 2023-01-11T21:33:08.0066309Z test_CosineAnnealingWarmRestarts_lr2 (__main__.TestLRScheduler) ... ok (0.063s) 2023-01-11T21:33:08.0066680Z test_CosineAnnealingWarmRestarts_lr3 (__main__.TestLRScheduler) ... ok (0.004s) 2023-01-11T21:33:08.0067041Z test_CosineAnnealingWarmRestarts_lr_state_dict (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0068161Z test_chained_lr1 (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. 
See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0070423Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0070820Z ok (0.001s) 2023-01-11T21:33:08.0071165Z test_chained_lr2 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0071564Z test_chained_lr2_get_last_lr_before_step (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0071895Z test_chained_lr3 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0072310Z test_chained_lr4 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0072590Z test_chained_lr5 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0072875Z test_closed_form_constantlr (__main__.TestLRScheduler) ... ok (0.006s) 2023-01-11T21:33:08.0073249Z test_closed_form_cos_anneal_lr (__main__.TestLRScheduler) ... ok (0.006s) 2023-01-11T21:33:08.0073694Z test_closed_form_exp_lr (__main__.TestLRScheduler) ... ok (0.006s) 2023-01-11T21:33:08.0074099Z test_closed_form_linearlr (__main__.TestLRScheduler) ... ok (0.006s) 2023-01-11T21:33:08.0074443Z test_closed_form_multi_step_lr (__main__.TestLRScheduler) ... ok (0.006s) 2023-01-11T21:33:08.0074755Z test_closed_form_poly_lr (__main__.TestLRScheduler) ... ok (0.006s) 2023-01-11T21:33:08.0075057Z test_closed_form_step_lr (__main__.TestLRScheduler) ... ok (0.006s) 2023-01-11T21:33:08.0076360Z test_compound_cosanneal_and_exp_lr (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0077435Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0077796Z ok (0.002s) 2023-01-11T21:33:08.0078122Z test_compound_cosanneal_and_linearlr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0078567Z test_compound_cosanneal_and_multistep_lr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0078986Z test_compound_cosanneal_and_step_lr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0079383Z test_compound_exp_and_linearlr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0079783Z test_compound_exp_and_multistep_lr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0080118Z test_compound_linearlr_and_multistep_lr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0080450Z test_compound_reduce_lr_on_plateau1 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0080838Z test_compound_reduce_lr_on_plateau2 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0081162Z test_compound_reduce_lr_on_plateau3 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0081481Z test_compound_reduce_lr_on_plateau4 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0081795Z test_compound_reduce_lr_on_plateau5 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0082170Z test_compound_step_and_constantlr (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0082489Z test_compound_step_and_exp_lr (__main__.TestLRScheduler) ... 
ok (0.002s) 2023-01-11T21:33:08.0082885Z test_compound_step_and_multistep_lr (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0083190Z test_constantlr (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0083508Z test_constantlr_is_constant_for_constant_epoch (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0083832Z test_constantlr_with_epoch (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0084868Z test_cos_anneal_lr (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0085612Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0085919Z ok (0.001s) 2023-01-11T21:33:08.0086159Z test_cos_anneal_lr_continue (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0086463Z test_cosine_lr_state_dict (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0086764Z test_cosine_then_cyclic (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0087110Z test_cycle_lr_cycle_momentum_fail_with_momentumless_optimizer (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0087446Z test_cycle_lr_exp_range_mode (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0087760Z test_cycle_lr_exp_range_mode_one_lr (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0088095Z test_cycle_lr_exp_range_mode_step_size_up_down (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0088410Z test_cycle_lr_invalid_mode (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0088730Z test_cycle_lr_removed_after_out_of_scope (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0089068Z test_cycle_lr_scale_fn_restored_from_state_dict (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0089401Z test_cycle_lr_state_dict_picklable (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0089712Z test_cycle_lr_triangular2_mode (__main__.TestLRScheduler) ... ok (0.004s) 2023-01-11T21:33:08.0090037Z test_cycle_lr_triangular2_mode_one_lr (__main__.TestLRScheduler) ... ok (0.004s) 2023-01-11T21:33:08.0090378Z test_cycle_lr_triangular2_mode_step_size_up_down (__main__.TestLRScheduler) ... ok (0.005s) 2023-01-11T21:33:08.0090698Z test_cycle_lr_triangular_mode (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0091019Z test_cycle_lr_triangular_mode_one_lr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0091353Z test_cycle_lr_triangular_mode_one_lr_no_momentum (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0091695Z test_cycle_lr_triangular_mode_step_size_up_down (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0092010Z test_cycle_lr_with_adam (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0092334Z test_cycle_lr_with_momentumless_optimizer (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0092662Z test_error_when_getlr_has_epoch (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0092990Z test_exp_lr (__main__.TestLRScheduler) ... 
ok (0.001s) 2023-01-11T21:33:08.0093284Z test_exp_step_lr_state_dict (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0093624Z test_exponential_lr_is_constant_for_constant_epoch (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0093955Z test_get_last_lr_constantlr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0094250Z test_get_last_lr_linearlr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0095219Z test_get_last_lr_multi_step_lr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0096492Z test_get_last_lr_sequentiallr (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:152: UserWarning: The epoch parameter in `scheduler.step()` was not necessary and is being deprecated where possible. Please use `scheduler.step()` to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose. 2023-01-11T21:33:08.0097247Z warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning) 2023-01-11T21:33:08.0097479Z ok (0.002s) 2023-01-11T21:33:08.0097725Z test_get_last_lr_step_lr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0098743Z test_lambda_lr (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0099621Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0099887Z ok (0.001s) 2023-01-11T21:33:08.0100132Z test_lambda_lr_state_dict_fn (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0100443Z test_lambda_lr_state_dict_obj (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0100777Z test_linear_linearlr_is_constant_for_constant_epoch (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0101819Z test_linearlr (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0102607Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0102869Z ok (0.001s) 2023-01-11T21:33:08.0103125Z test_linearlr_start_factor_limits1 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0103434Z test_linearlr_start_factor_limits2 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0103746Z test_linearlr_with_epoch (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0104752Z test_multi_step_lr (__main__.TestLRScheduler) ... 
/opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0105543Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0105809Z ok (0.001s) 2023-01-11T21:33:08.0106049Z test_multi_step_lr_state_dict (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0106363Z test_multi_step_lr_with_epoch (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0107376Z test_multiplicative_lr (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0108100Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0108350Z ok (0.001s) 2023-01-11T21:33:08.0108600Z test_new_pattern_no_warning (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0108918Z test_new_pattern_no_warning_with_arg (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0109265Z test_new_pattern_no_warning_with_overridden_optim_step (__main__.TestLRScheduler) ... ok (0.005s) 2023-01-11T21:33:08.0109593Z test_no_cyclic_references (__main__.TestLRScheduler) ... ok (0.138s) 2023-01-11T21:33:08.0109906Z test_no_cyclic_references_in_step (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0110229Z test_old_pattern_warning (__main__.TestLRScheduler) ... ok (0.004s) 2023-01-11T21:33:08.0110534Z test_old_pattern_warning_resuming (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0110911Z test_old_pattern_warning_resuming_with_arg (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0111251Z test_old_pattern_warning_with_arg (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0111599Z test_old_pattern_warning_with_overridden_optim_step (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0111956Z test_onecycle_lr_cannot_calculate_total_steps (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0112294Z test_onecycle_lr_cosine_annealing (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0112626Z test_onecycle_lr_invalid_anneal_strategy (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0112950Z test_onecycle_lr_invalid_pct_start (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0113275Z test_onecycle_lr_linear_annealing (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0113614Z test_onecycle_lr_linear_annealing_three_phases (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0114650Z test_poly_lr (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. 
In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0115378Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0115635Z ok (0.001s) 2023-01-11T21:33:08.0115910Z test_polynomial_lr_is_constant_for_constant_epoch (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0116246Z test_reduce_lr_on_plateau1 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0116546Z test_reduce_lr_on_plateau2 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0116855Z test_reduce_lr_on_plateau3 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0117160Z test_reduce_lr_on_plateau4 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0117461Z test_reduce_lr_on_plateau5 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0117795Z test_reduce_lr_on_plateau6 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0118095Z test_reduce_lr_on_plateau7 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0118389Z test_reduce_lr_on_plateau8 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0118696Z test_reduce_lr_on_plateau_state_dict (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0119721Z test_sequentiallr1 (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0120472Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0121487Z /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:152: UserWarning: The epoch parameter in `scheduler.step()` was not necessary and is being deprecated where possible. Please use `scheduler.step()` to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose. 2023-01-11T21:33:08.0122174Z warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning) 2023-01-11T21:33:08.0122456Z ok (0.002s) 2023-01-11T21:33:08.0122691Z test_sequentiallr2 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0122982Z test_sequentiallr3 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0123273Z test_sequentiallr4 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0123551Z test_step_lr (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0123870Z test_step_lr_is_constant_for_constant_epoch (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0124194Z test_step_lr_state_dict (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0124488Z test_swa_lr_state_dict (__main__.TestLRScheduler) ... 
ok (0.001s) 2023-01-11T21:33:08.0124815Z test_swalr_cosine_anneal_after_multiplicative (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0125137Z test_swalr_hypers (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0125460Z test_swalr_linear_anneal_after_multiplicative (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0125780Z test_swalr_no_anneal (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0126782Z test_adadelta (__main__.TestOptim) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0127515Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0127785Z ok (5.248s) 2023-01-11T21:33:08.0128016Z test_adadelta_complex (__main__.TestOptim) ... ok (0.004s) 2023-01-11T21:33:08.0128289Z test_adagrad (__main__.TestOptim) ... ok (6.156s) 2023-01-11T21:33:08.0128563Z test_adagrad_complex (__main__.TestOptim) ... ok (0.005s) 2023-01-11T21:33:08.0128838Z test_adagrad_sparse (__main__.TestOptim) ... ok (8.841s) 2023-01-11T21:33:08.0129105Z test_adam (__main__.TestOptim) ... ok (15.067s) 2023-01-11T21:33:08.0129368Z test_adamax (__main__.TestOptim) ... ok (3.625s) 2023-01-11T21:33:08.0129631Z test_adamw (__main__.TestOptim) ... ok (4.546s) 2023-01-11T21:33:08.0129929Z test_asgd (__main__.TestOptim) ... ok (6.321s) 2023-01-11T21:33:08.0130218Z test_duplicate_params_in_param_group (__main__.TestOptim) ... ok (0.001s) 2023-01-11T21:33:08.0130508Z test_empty_grad (__main__.TestOptim) ... ok (0.007s) 2023-01-11T21:33:08.0130797Z test_functional_fused_adam_with_foundinf (__main__.TestOptim) ... ok (0.005s) 2023-01-11T21:33:08.0131097Z test_fused_optimizers (__main__.TestOptim) ... ok (0.043s) 2023-01-11T21:33:08.0131382Z test_invalid_param_type (__main__.TestOptim) ... ok (0.000s) 2023-01-11T21:33:08.0131640Z test_lbfgs (__main__.TestOptim) ... ok (0.708s) 2023-01-11T21:33:08.0131911Z test_lbfgs_return_type (__main__.TestOptim) ... ok (0.001s) 2023-01-11T21:33:08.0132203Z test_multi_tensor_optimizers (__main__.TestOptim) ... ok (0.303s) 2023-01-11T21:33:08.0133191Z test_nadam (__main__.TestOptim) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0133923Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0134180Z ok (2.351s) 2023-01-11T21:33:08.0134420Z test_no_grad_for_all_params (__main__.TestOptim) ... ok (0.001s) 2023-01-11T21:33:08.0135231Z test_post_hook (__main__.TestOptim) ... ok (0.001s) 2023-01-11T21:33:08.0135627Z test_pre_and_post_hook (__main__.TestOptim) ... ok (0.002s) 2023-01-11T21:33:08.0135904Z test_pre_hook (__main__.TestOptim) ... 
ok (0.001s) 2023-01-11T21:33:08.0136267Z test_radam (__main__.TestOptim) ... ok (2.638s) 2023-01-11T21:33:08.0136522Z test_rmsprop (__main__.TestOptim) ... ok (11.664s) 2023-01-11T21:33:08.0136784Z test_rprop (__main__.TestOptim) ... ok (5.183s) 2023-01-11T21:33:08.0137040Z test_sgd (__main__.TestOptim) ... ok (12.021s) 2023-01-11T21:33:08.0137295Z test_sgd_complex (__main__.TestOptim) ... ok (0.009s) 2023-01-11T21:33:08.0137574Z test_sgd_sparse (__main__.TestOptim) ... ok (7.405s) 2023-01-11T21:33:08.0137850Z test_sparse_adam (__main__.TestOptim) ... ok (2.217s) 2023-01-11T21:33:08.0138145Z test_averaged_model_all_devices (__main__.TestSWAUtils) ... ok (0.058s) 2023-01-11T21:33:08.0138456Z test_averaged_model_exponential (__main__.TestSWAUtils) ... ok (0.008s) 2023-01-11T21:33:08.0138779Z test_averaged_model_exponential_buffers (__main__.TestSWAUtils) ... ok (0.005s) 2023-01-11T21:33:08.0139108Z test_averaged_model_mixed_device (__main__.TestSWAUtils) ... ok (0.006s) 2023-01-11T21:33:08.0139414Z test_averaged_model_state_dict (__main__.TestSWAUtils) ... ok (0.004s) 2023-01-11T21:33:08.0139725Z test_bn_update_eval_momentum (__main__.TestSWAUtils) ... ok (0.052s) 2023-01-11T21:33:08.0140020Z test_update_bn_cnn (__main__.TestSWAUtils) ... ok (1.391s) 2023-01-11T21:33:08.0140305Z test_update_bn_dnn (__main__.TestSWAUtils) ... ok (0.038s) 2023-01-11T21:33:08.0140457Z 2023-01-11T21:33:08.0140681Z ---------------------------------------------------------------------- 2023-01-11T21:33:08.0140951Z Ran 166 tests in 97.587s 2023-01-11T21:33:08.0141076Z 2023-01-11T21:33:08.0141150Z OK 2023-01-11T21:33:08.0141254Z 2023-01-11T21:33:08.0141342Z Generating XML reports... 2023-01-11T21:33:08.0141822Z Generated XML report: test-reports/python-unittest/test_optim/TEST-TestDifferentiableOptimizer-20230111213129.xml 2023-01-11T21:33:08.0142391Z Generated XML report: test-reports/python-unittest/test_optim/TEST-TestLRScheduler-20230111213129.xml 2023-01-11T21:33:08.0142888Z Generated XML report: test-reports/python-unittest/test_optim/TEST-TestOptim-20230111213129.xml 2023-01-11T21:33:08.0143365Z Generated XML report: test-reports/python-unittest/test_optim/TEST-TestSWAUtils-20230111213129.xml 2023-01-11T21:33:08.0143656Z 2023-01-11T21:33:08.0144087Z ##[endgroup] 2023-01-11T21:33:08.0144476Z FINISHED PRINTING LOG FILE of test_optim (/var/lib/jenkins/workspace/test/test-reports/test_optim_mznt_uo2) 2023-01-11T21:33:08.0144691Z 2023-01-11T21:33:09.9330521Z Ignoring disabled issues: [] 2023-01-11T21:33:09.9563903Z Running test_shape_ops ... [2023-01-11 21:33:09.955699] 2023-01-11T21:33:09.9564974Z Executing ['/opt/conda/bin/python', '-bb', 'test_shape_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:33:09.956032] 2023-01-11T21:33:14.5124223Z 2023-01-11T21:33:14.5124668Z Expand the folded group to see the log file of test_shape_ops 2023-01-11T21:33:14.5125712Z ##[group]PRINTING LOG FILE of test_shape_ops (/var/lib/jenkins/workspace/test/test-reports/test_shape_ops_sg7hohkm) 2023-01-11T21:33:14.5126017Z 2023-01-11T21:33:14.5126136Z Running tests... 2023-01-11T21:33:14.5126686Z ---------------------------------------------------------------------- 2023-01-11T21:33:14.5127234Z Test results will be stored in test-reports/python-unittest/test_shape_ops 2023-01-11T21:33:14.5127667Z test_clamp_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.018s) 2023-01-11T21:33:14.5127985Z test_clamp_cuda_int64 (__main__.TestShapeOpsCUDA) ... 
ok (0.002s) 2023-01-11T21:33:14.5128296Z test_clamp_propagates_nans_cuda (__main__.TestShapeOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:14.5128656Z test_clamp_raises_arg_errors_cuda (__main__.TestShapeOpsCUDA) ... ok (0.011s) 2023-01-11T21:33:14.5128987Z test_complex_rot90_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:14.5129315Z test_complex_rot90_cuda_complex64 (__main__.TestShapeOpsCUDA) ... ok (0.004s) 2023-01-11T21:33:14.5129823Z test_diag_cuda_bool (__main__.TestShapeOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:14.5130124Z test_diag_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:14.5130427Z test_diagonal_cuda (__main__.TestShapeOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:14.5130760Z test_diagonal_multidim_cuda_float32 (__main__.TestShapeOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:14.5131097Z test_flip_cuda_bfloat16 (__main__.TestShapeOpsCUDA) ... ok (0.025s) 2023-01-11T21:33:14.5131393Z test_flip_cuda_bool (__main__.TestShapeOpsCUDA) ... ok (0.015s) 2023-01-11T21:33:14.5131700Z test_flip_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.028s) 2023-01-11T21:33:14.5132008Z test_flip_cuda_complex64 (__main__.TestShapeOpsCUDA) ... ok (0.027s) 2023-01-11T21:33:14.5132313Z test_flip_cuda_float16 (__main__.TestShapeOpsCUDA) ... ok (0.025s) 2023-01-11T21:33:14.5132614Z test_flip_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.091s) 2023-01-11T21:33:14.5132912Z test_flip_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.023s) 2023-01-11T21:33:14.5133209Z test_flip_cuda_int16 (__main__.TestShapeOpsCUDA) ... ok (0.014s) 2023-01-11T21:33:14.5133504Z test_flip_cuda_int32 (__main__.TestShapeOpsCUDA) ... ok (0.014s) 2023-01-11T21:33:14.5133802Z test_flip_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5134083Z test_flip_cuda_int8 (__main__.TestShapeOpsCUDA) ... ok (0.014s) 2023-01-11T21:33:14.5134378Z test_flip_cuda_uint8 (__main__.TestShapeOpsCUDA) ... ok (0.014s) 2023-01-11T21:33:14.5134880Z test_flip_errors_cuda_bfloat16 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5135190Z test_flip_errors_cuda_bool (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5135508Z test_flip_errors_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5135832Z test_flip_errors_cuda_complex64 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5136250Z test_flip_errors_cuda_float16 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5136566Z test_flip_errors_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5136875Z test_flip_errors_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5137260Z test_flip_errors_cuda_int16 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5137562Z test_flip_errors_cuda_int32 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5137862Z test_flip_errors_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5138172Z test_flip_errors_cuda_int8 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5138471Z test_flip_errors_cuda_uint8 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5138817Z test_flip_large_tensor_cuda (__main__.TestShapeOpsCUDA) ... skip: Insufficient cpu memory (0.121s) 2023-01-11T21:33:14.5139162Z test_flip_numpy_cuda_bfloat16 (__main__.TestShapeOpsCUDA) ... ok (0.024s) 2023-01-11T21:33:14.5139477Z test_flip_numpy_cuda_bool (__main__.TestShapeOpsCUDA) ... 
ok (0.021s) 2023-01-11T21:33:14.5139785Z test_flip_numpy_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.023s) 2023-01-11T21:33:14.5140107Z test_flip_numpy_cuda_complex64 (__main__.TestShapeOpsCUDA) ... ok (0.023s) 2023-01-11T21:33:14.5140426Z test_flip_numpy_cuda_float16 (__main__.TestShapeOpsCUDA) ... ok (0.023s) 2023-01-11T21:33:14.5140730Z test_flip_numpy_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.023s) 2023-01-11T21:33:14.5141038Z test_flip_numpy_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.022s) 2023-01-11T21:33:14.5141350Z test_flip_numpy_cuda_int16 (__main__.TestShapeOpsCUDA) ... ok (0.021s) 2023-01-11T21:33:14.5141658Z test_flip_numpy_cuda_int32 (__main__.TestShapeOpsCUDA) ... ok (0.021s) 2023-01-11T21:33:14.5141954Z test_flip_numpy_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.021s) 2023-01-11T21:33:14.5142261Z test_flip_numpy_cuda_int8 (__main__.TestShapeOpsCUDA) ... ok (0.021s) 2023-01-11T21:33:14.5142608Z test_flip_numpy_cuda_uint8 (__main__.TestShapeOpsCUDA) ... ok (0.021s) 2023-01-11T21:33:14.5142911Z test_fliplr_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.004s) 2023-01-11T21:33:14.5143219Z test_fliplr_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.004s) 2023-01-11T21:33:14.5143528Z test_fliplr_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:14.5143846Z test_fliplr_invalid_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:14.5144165Z test_fliplr_invalid_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:14.5144484Z test_fliplr_invalid_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:14.5144797Z test_flipud_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:14.5145097Z test_flipud_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.004s) 2023-01-11T21:33:14.5145401Z test_flipud_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.004s) 2023-01-11T21:33:14.5145717Z test_flipud_invalid_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:14.5146034Z test_flipud_invalid_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:14.5146355Z test_flipud_invalid_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:14.5146670Z test_movedim_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.055s) 2023-01-11T21:33:14.5146986Z test_movedim_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.049s) 2023-01-11T21:33:14.5147281Z test_movedim_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.031s) 2023-01-11T21:33:14.5147597Z test_movedim_invalid_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.041s) 2023-01-11T21:33:14.5147922Z test_movedim_invalid_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.040s) 2023-01-11T21:33:14.5148234Z test_movedim_invalid_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.040s) 2023-01-11T21:33:14.5148560Z test_nonzero_astuple_out_cuda (__main__.TestShapeOpsCUDA) ... ok (0.053s) 2023-01-11T21:33:14.5148873Z test_nonzero_cuda_bfloat16 (__main__.TestShapeOpsCUDA) ... ok (0.042s) 2023-01-11T21:33:14.5149178Z test_nonzero_cuda_bool (__main__.TestShapeOpsCUDA) ... ok (0.042s) 2023-01-11T21:33:14.5149514Z test_nonzero_cuda_float16 (__main__.TestShapeOpsCUDA) ... ok (0.041s) 2023-01-11T21:33:14.5149820Z test_nonzero_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.041s) 2023-01-11T21:33:14.5150122Z test_nonzero_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.041s) 2023-01-11T21:33:14.5150416Z test_nonzero_cuda_int16 (__main__.TestShapeOpsCUDA) ... 
ok (0.041s) 2023-01-11T21:33:14.5150714Z test_nonzero_cuda_int32 (__main__.TestShapeOpsCUDA) ... ok (0.041s) 2023-01-11T21:33:14.5151008Z test_nonzero_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.041s) 2023-01-11T21:33:14.5151302Z test_nonzero_cuda_int8 (__main__.TestShapeOpsCUDA) ... ok (0.042s) 2023-01-11T21:33:14.5151593Z test_nonzero_cuda_uint8 (__main__.TestShapeOpsCUDA) ... ok (0.042s) 2023-01-11T21:33:14.5151915Z test_nonzero_discontiguous_cuda (__main__.TestShapeOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:14.5152233Z test_nonzero_no_warning_cuda (__main__.TestShapeOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:14.5152540Z test_nonzero_non_diff_cuda (__main__.TestShapeOpsCUDA) ... ok (0.000s) 2023-01-11T21:33:14.5152836Z test_rot90_cuda (__main__.TestShapeOpsCUDA) ... ok (0.023s) 2023-01-11T21:33:14.5153143Z test_sparse_dense_dim_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:14.5153474Z test_sparse_dense_dim_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:14.5153788Z test_sparse_dense_dim_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:14.5154113Z test_tolist_cuda (__main__.TestShapeOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:14.5154428Z test_trace_cuda_float16 (__main__.TestShapeOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:14.5154754Z test_trace_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:14.5155052Z test_trace_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:14.5155348Z test_trace_cuda_int16 (__main__.TestShapeOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:14.5155640Z test_trace_cuda_int32 (__main__.TestShapeOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:14.5155936Z test_trace_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:14.5156229Z test_trace_cuda_int8 (__main__.TestShapeOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:14.5156525Z test_trace_cuda_uint8 (__main__.TestShapeOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:14.5156830Z test_unbind_cuda (__main__.TestShapeOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:14.5157012Z 2023-01-11T21:33:14.5157252Z ---------------------------------------------------------------------- 2023-01-11T21:33:14.5157510Z Ran 93 tests in 1.715s 2023-01-11T21:33:14.5157629Z 2023-01-11T21:33:14.5157706Z OK (skipped=4) 2023-01-11T21:33:14.5157822Z 2023-01-11T21:33:14.5157913Z Generating XML reports... 2023-01-11T21:33:14.5158348Z Generated XML report: test-reports/python-unittest/test_shape_ops/TEST-TestShapeOpsCUDA-20230111213312.xml 2023-01-11T21:33:14.5158592Z 2023-01-11T21:33:14.5158880Z ##[endgroup] 2023-01-11T21:33:14.5159276Z FINISHED PRINTING LOG FILE of test_shape_ops (/var/lib/jenkins/workspace/test/test-reports/test_shape_ops_sg7hohkm) 2023-01-11T21:33:14.5159496Z 2023-01-11T21:33:16.4386065Z Ignoring disabled issues: [] 2023-01-11T21:33:16.4623807Z Running test_type_info ... [2023-01-11 21:33:16.461571] 2023-01-11T21:33:16.4624650Z Executing ['/opt/conda/bin/python', '-bb', 'test_type_info.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:33:16.461911] 2023-01-11T21:33:19.3085183Z 2023-01-11T21:33:19.3085951Z Expand the folded group to see the log file of test_type_info 2023-01-11T21:33:19.3086806Z ##[group]PRINTING LOG FILE of test_type_info (/var/lib/jenkins/workspace/test/test-reports/test_type_info_vq_q4qi7) 2023-01-11T21:33:19.3087053Z 2023-01-11T21:33:19.3087140Z Running tests... 
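Note on the test_optim log further above: it repeats two scheduler warnings, one about calling lr_scheduler.step() before optimizer.step() and one about the deprecated epoch argument to scheduler.step(epoch). A minimal sketch of the usage the warnings ask for, per the warning text itself (the model, optimizer, and scheduler choices here are illustrative):

    import torch

    model = torch.nn.Linear(4, 2)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=10)

    for _ in range(3):
        opt.zero_grad()
        model(torch.randn(8, 4)).sum().backward()
        opt.step()    # optimizer first, as required since PyTorch 1.1.0 ...
        sched.step()  # ... then the scheduler, with no epoch argument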
2023-01-11T21:33:19.3087573Z ---------------------------------------------------------------------- 2023-01-11T21:33:19.3087975Z Test results will be stored in test-reports/python-unittest/test_type_info 2023-01-11T21:33:19.3088495Z test_finfo (__main__.TestDTypeInfo) ... ok (1.096s) 2023-01-11T21:33:19.3088763Z test_iinfo (__main__.TestDTypeInfo) ... ok (0.001s) 2023-01-11T21:33:19.3089043Z test_invalid_input (__main__.TestDTypeInfo) ... ok (0.001s) 2023-01-11T21:33:19.3089205Z 2023-01-11T21:33:19.3089406Z ---------------------------------------------------------------------- 2023-01-11T21:33:19.3089662Z Ran 3 tests in 1.098s 2023-01-11T21:33:19.3089785Z 2023-01-11T21:33:19.3089859Z OK 2023-01-11T21:33:19.3089960Z 2023-01-11T21:33:19.3090056Z Generating XML reports... 2023-01-11T21:33:19.3090493Z Generated XML report: test-reports/python-unittest/test_type_info/TEST-TestDTypeInfo-20230111213317.xml 2023-01-11T21:33:19.3090732Z 2023-01-11T21:33:19.3090960Z ##[endgroup] 2023-01-11T21:33:19.3091351Z FINISHED PRINTING LOG FILE of test_type_info (/var/lib/jenkins/workspace/test/test-reports/test_type_info_vq_q4qi7) 2023-01-11T21:33:19.3091572Z 2023-01-11T21:33:21.2182644Z Ignoring disabled issues: [] 2023-01-11T21:33:21.2420429Z Running test_view_ops ... [2023-01-11 21:33:21.241339] 2023-01-11T21:33:21.2421461Z Executing ['/opt/conda/bin/python', '-bb', 'test_view_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:33:21.241713] 2023-01-11T21:33:40.0382656Z 2023-01-11T21:33:40.0383292Z Expand the folded group to see the log file of test_view_ops 2023-01-11T21:33:40.0384014Z ##[group]PRINTING LOG FILE of test_view_ops (/var/lib/jenkins/workspace/test/test-reports/test_view_ops_yux6z2dq) 2023-01-11T21:33:40.0384241Z 2023-01-11T21:33:40.0387633Z Running tests... 2023-01-11T21:33:40.0388305Z ---------------------------------------------------------------------- 2023-01-11T21:33:40.0389162Z Test results will be stored in test-reports/python-unittest/test_view_ops 2023-01-11T21:33:40.0390717Z test_T_cuda (__main__.TestOldViewOpsCUDA) ... /var/lib/jenkins/workspace/test/test_view_ops.py:1305: UserWarning: The use of `x.T` on tensors of dimension other than 2 to reverse their shape is deprecated and it will throw an error in a future release. Consider `x.mT` to transpose batches of matrices or `x.permute(*torch.arange(x.ndim - 1, -1, -1))` to reverse the dimensions of a tensor. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/TensorShape.cpp:3542.) 2023-01-11T21:33:40.0391736Z t1 = a.T 2023-01-11T21:33:40.0391986Z ok (0.004s) 2023-01-11T21:33:40.0392324Z test_atleast_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.296s) 2023-01-11T21:33:40.0392780Z test_atleast_cuda_complex64 (__main__.TestOldViewOpsCUDA) ... ok (0.299s) 2023-01-11T21:33:40.0393242Z test_atleast_cuda_float16 (__main__.TestOldViewOpsCUDA) ... ok (0.271s) 2023-01-11T21:33:40.0393687Z test_atleast_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.266s) 2023-01-11T21:33:40.0394791Z test_atleast_cuda_float64 (__main__.TestOldViewOpsCUDA) ... ok (0.259s) 2023-01-11T21:33:40.0395216Z test_atleast_cuda_int16 (__main__.TestOldViewOpsCUDA) ... ok (0.137s) 2023-01-11T21:33:40.0395647Z test_atleast_cuda_int32 (__main__.TestOldViewOpsCUDA) ... ok (0.139s) 2023-01-11T21:33:40.0396064Z test_atleast_cuda_int64 (__main__.TestOldViewOpsCUDA) ... ok (0.129s) 2023-01-11T21:33:40.0396490Z test_atleast_cuda_int8 (__main__.TestOldViewOpsCUDA) ... 
ok (0.136s) 2023-01-11T21:33:40.0396922Z test_atleast_cuda_uint8 (__main__.TestOldViewOpsCUDA) ... ok (0.136s) 2023-01-11T21:33:40.0397351Z test_atleast_gradient_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.076s) 2023-01-11T21:33:40.0397760Z test_big_transpose_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.006s) 2023-01-11T21:33:40.0398250Z test_broadcast_shapes_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:33:40.0398811Z test_broadcast_tensors_cuda_float32 (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:40.0453547Z test_broadcast_to_cuda_bool (__main__.TestOldViewOpsCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py:679: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/utils/tensor_numpy.cpp:210.) 2023-01-11T21:33:40.0455100Z return torch.as_tensor(tensor_like) 2023-01-11T21:33:40.0455448Z ok (0.042s) 2023-01-11T21:33:40.0455805Z test_broadcast_to_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.043s) 2023-01-11T21:33:40.0456401Z test_broadcast_to_cuda_complex64 (__main__.TestOldViewOpsCUDA) ... ok (0.043s) 2023-01-11T21:33:40.0456874Z test_broadcast_to_cuda_float16 (__main__.TestOldViewOpsCUDA) ... ok (0.043s) 2023-01-11T21:33:40.0457343Z test_broadcast_to_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.043s) 2023-01-11T21:33:40.0457802Z test_broadcast_to_cuda_float64 (__main__.TestOldViewOpsCUDA) ... ok (0.042s) 2023-01-11T21:33:40.0458262Z test_broadcast_to_cuda_int16 (__main__.TestOldViewOpsCUDA) ... ok (0.041s) 2023-01-11T21:33:40.0458738Z test_broadcast_to_cuda_int32 (__main__.TestOldViewOpsCUDA) ... ok (0.042s) 2023-01-11T21:33:40.0459197Z test_broadcast_to_cuda_int64 (__main__.TestOldViewOpsCUDA) ... ok (0.041s) 2023-01-11T21:33:40.0459654Z test_broadcast_to_cuda_int8 (__main__.TestOldViewOpsCUDA) ... ok (0.042s) 2023-01-11T21:33:40.0460078Z test_broadcast_to_cuda_uint8 (__main__.TestOldViewOpsCUDA) ... ok (0.042s) 2023-01-11T21:33:40.0460407Z test_chunk_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:40.0460880Z test_conj_neg_view_numpy_error_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:40.0461535Z test_contiguous_cuda (__main__.TestOldViewOpsCUDA) ... /var/lib/jenkins/workspace/test/test_view_ops.py:1683: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0462088Z x.set_(x.storage(), 0, x.size(), stride) 2023-01-11T21:33:40.0462295Z ok (0.001s) 2023-01-11T21:33:40.0462564Z test_crow_col_indices_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:40.0462889Z test_empty_reshape_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0463192Z test_expand_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.010s) 2023-01-11T21:33:40.0463493Z test_flatten_cuda (__main__.TestOldViewOpsCUDA) ... 
ok (0.016s) 2023-01-11T21:33:40.0463797Z test_memory_format_resize__cuda (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0464123Z test_memory_format_resize_as_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0464457Z test_narrow_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:40.0464800Z test_narrow_tensor_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:40.0465122Z test_python_types_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0465421Z test_ravel_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.006s) 2023-01-11T21:33:40.0465717Z test_reshape_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.004s) 2023-01-11T21:33:40.0466354Z test_reshape_view_semantics_cuda_bfloat16 (__main__.TestOldViewOpsCUDA) ... /var/lib/jenkins/workspace/test/test_view_ops.py:1669: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0466988Z self.assertEqual(tensor.storage().data_ptr(), view_tensor.storage().data_ptr()) 2023-01-11T21:33:40.0467616Z /var/lib/jenkins/workspace/test/test_view_ops.py:1675: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0468193Z self.assertNotEqual(tensor.storage().data_ptr(), copy_tensor.storage().data_ptr()) 2023-01-11T21:33:40.0468461Z ok (0.001s) 2023-01-11T21:33:40.0468720Z test_reshape_view_semantics_cuda_bool (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0469067Z test_reshape_view_semantics_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0469431Z test_reshape_view_semantics_cuda_complex64 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0469782Z test_reshape_view_semantics_cuda_float16 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0470122Z test_reshape_view_semantics_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0470465Z test_reshape_view_semantics_cuda_float64 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0470855Z test_reshape_view_semantics_cuda_int16 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0471188Z test_reshape_view_semantics_cuda_int32 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0471526Z test_reshape_view_semantics_cuda_int64 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0471866Z test_reshape_view_semantics_cuda_int8 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0472240Z test_reshape_view_semantics_cuda_uint8 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0472571Z test_resize_all_dtypes_and_devices_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0472917Z test_resize_as_all_dtypes_and_devices_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0473263Z test_resize_as_preserves_strides_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0473584Z test_resize_overflow_cuda (__main__.TestOldViewOpsCUDA) ... 
ok (0.006s) 2023-01-11T21:33:40.0473912Z test_split_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:33:40.0474246Z test_t_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:40.0474570Z test_tensor_split_errors_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.040s) 2023-01-11T21:33:40.0474892Z test_tensor_split_indices_cuda_bool (__main__.TestOldViewOpsCUDA) ... ok (0.169s) 2023-01-11T21:33:40.0475236Z test_tensor_split_indices_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.224s) 2023-01-11T21:33:40.0475586Z test_tensor_split_indices_cuda_complex64 (__main__.TestOldViewOpsCUDA) ... ok (0.229s) 2023-01-11T21:33:40.0475920Z test_tensor_split_indices_cuda_float16 (__main__.TestOldViewOpsCUDA) ... ok (0.213s) 2023-01-11T21:33:40.0476260Z test_tensor_split_indices_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.214s) 2023-01-11T21:33:40.0476596Z test_tensor_split_indices_cuda_float64 (__main__.TestOldViewOpsCUDA) ... ok (0.211s) 2023-01-11T21:33:40.0476933Z test_tensor_split_indices_cuda_int16 (__main__.TestOldViewOpsCUDA) ... ok (0.171s) 2023-01-11T21:33:40.0477261Z test_tensor_split_indices_cuda_int32 (__main__.TestOldViewOpsCUDA) ... ok (0.175s) 2023-01-11T21:33:40.0477591Z test_tensor_split_indices_cuda_int64 (__main__.TestOldViewOpsCUDA) ... ok (0.170s) 2023-01-11T21:33:40.0477926Z test_tensor_split_indices_cuda_int8 (__main__.TestOldViewOpsCUDA) ... ok (0.176s) 2023-01-11T21:33:40.0478253Z test_tensor_split_indices_cuda_uint8 (__main__.TestOldViewOpsCUDA) ... ok (0.174s) 2023-01-11T21:33:40.0478588Z test_tensor_split_sections_cuda_bool (__main__.TestOldViewOpsCUDA) ... ok (0.605s) 2023-01-11T21:33:40.0478929Z test_tensor_split_sections_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.798s) 2023-01-11T21:33:40.0479307Z test_tensor_split_sections_cuda_complex64 (__main__.TestOldViewOpsCUDA) ... ok (0.820s) 2023-01-11T21:33:40.0479648Z test_tensor_split_sections_cuda_float16 (__main__.TestOldViewOpsCUDA) ... ok (0.754s) 2023-01-11T21:33:40.0479989Z test_tensor_split_sections_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.765s) 2023-01-11T21:33:40.0480328Z test_tensor_split_sections_cuda_float64 (__main__.TestOldViewOpsCUDA) ... ok (0.740s) 2023-01-11T21:33:40.0480671Z test_tensor_split_sections_cuda_int16 (__main__.TestOldViewOpsCUDA) ... ok (0.599s) 2023-01-11T21:33:40.0480998Z test_tensor_split_sections_cuda_int32 (__main__.TestOldViewOpsCUDA) ... ok (0.601s) 2023-01-11T21:33:40.0481339Z test_tensor_split_sections_cuda_int64 (__main__.TestOldViewOpsCUDA) ... ok (0.576s) 2023-01-11T21:33:40.0481669Z test_tensor_split_sections_cuda_int8 (__main__.TestOldViewOpsCUDA) ... ok (0.603s) 2023-01-11T21:33:40.0481994Z test_tensor_split_sections_cuda_uint8 (__main__.TestOldViewOpsCUDA) ... ok (0.597s) 2023-01-11T21:33:40.0482338Z test_transpose_invalid_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.036s) 2023-01-11T21:33:40.0482678Z test_transpose_invalid_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.036s) 2023-01-11T21:33:40.0483013Z test_transpose_invalid_cuda_int64 (__main__.TestOldViewOpsCUDA) ... ok (0.036s) 2023-01-11T21:33:40.0483343Z test_transpose_vs_numpy_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.033s) 2023-01-11T21:33:40.0483684Z test_transpose_vs_numpy_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.029s) 2023-01-11T21:33:40.0484021Z test_transpose_vs_numpy_cuda_int64 (__main__.TestOldViewOpsCUDA) ... 
ok (0.018s) 2023-01-11T21:33:40.0484381Z test_transposes_cuda_bfloat16 (__main__.TestOldViewOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:40.0484708Z test_transposes_cuda_bool (__main__.TestOldViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0485031Z test_transposes_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.008s) 2023-01-11T21:33:40.0485371Z test_transposes_cuda_complex64 (__main__.TestOldViewOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:40.0485690Z test_transposes_cuda_float16 (__main__.TestOldViewOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:40.0486014Z test_transposes_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:40.0486338Z test_transposes_cuda_float64 (__main__.TestOldViewOpsCUDA) ... ok (0.004s) 2023-01-11T21:33:40.0486643Z test_transposes_cuda_int16 (__main__.TestOldViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0486960Z test_transposes_cuda_int32 (__main__.TestOldViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0487271Z test_transposes_cuda_int64 (__main__.TestOldViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0487587Z test_transposes_cuda_int8 (__main__.TestOldViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0487892Z test_transposes_cuda_uint8 (__main__.TestOldViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0488217Z test_transposes_errors_cuda_bfloat16 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0488560Z test_transposes_errors_cuda_bool (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0488887Z test_transposes_errors_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0489359Z test_transposes_errors_cuda_complex64 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0489694Z test_transposes_errors_cuda_float16 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0490026Z test_transposes_errors_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0490364Z test_transposes_errors_cuda_float64 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0490697Z test_transposes_errors_cuda_int16 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0491015Z test_transposes_errors_cuda_int32 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0491337Z test_transposes_errors_cuda_int64 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0491698Z test_transposes_errors_cuda_int8 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0492015Z test_transposes_errors_cuda_uint8 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0492348Z test_unsqueeze_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:40.0492693Z test_view_all_dtypes_and_devices_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0493014Z test_view_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.038s) 2023-01-11T21:33:40.0493304Z test_view_empty_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0493609Z test_T_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0493922Z test_advanced_indexing_assignment_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0494247Z test_advanced_indexing_nonview_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0495280Z test_as_strided_gradients_cuda (__main__.TestViewOpsCUDA) ... /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0495896Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0496565Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0497195Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0497744Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0498270Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0498818Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0499347Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0499894Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0500408Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0500997Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0501537Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0502113Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0502644Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0503179Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0503699Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0504249Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0504779Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0505326Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0505842Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0506073Z ok (0.096s) 2023-01-11T21:33:40.0506332Z test_as_strided_inplace_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0506683Z test_as_strided_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0506998Z test_basic_indexing_ellipses_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0507335Z test_basic_indexing_newaxis_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0507667Z test_basic_indexing_slice_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0508054Z test_chunk_view_cuda (__main__.TestViewOpsCUDA) ... skip: See https://github.com/pytorch/pytorch/pull/32720 (0.000s) 2023-01-11T21:33:40.0508426Z test_conj_imag_view_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0508749Z test_conj_imag_view_cuda_complex64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0509067Z test_conj_self_cuda_bfloat16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0509369Z test_conj_self_cuda_float16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0509683Z test_conj_self_cuda_float32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0509987Z test_conj_self_cuda_float64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0510282Z test_conj_self_cuda_int16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0510588Z test_conj_self_cuda_int32 (__main__.TestViewOpsCUDA) ... ok (0.000s) 2023-01-11T21:33:40.0510886Z test_conj_self_cuda_int64 (__main__.TestViewOpsCUDA) ... ok (0.000s) 2023-01-11T21:33:40.0511185Z test_conj_self_cuda_int8 (__main__.TestViewOpsCUDA) ... ok (0.000s) 2023-01-11T21:33:40.0511477Z test_conj_self_cuda_uint8 (__main__.TestViewOpsCUDA) ... ok (0.000s) 2023-01-11T21:33:40.0511788Z test_conj_view_with_shared_memory_cuda (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0512111Z test_contiguous_nonview_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0512412Z test_contiguous_self_cuda (__main__.TestViewOpsCUDA) ... ok (0.000s) 2023-01-11T21:33:40.0512719Z test_diagonal_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0513021Z test_expand_as_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0513312Z test_expand_view_cuda (__main__.TestViewOpsCUDA) ... 
ok (0.001s) 2023-01-11T21:33:40.0513662Z test_flatten_nonview_cuda (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0513966Z test_flatten_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0514279Z test_imag_noncomplex_cuda_bfloat16 (__main__.TestViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0514594Z test_imag_noncomplex_cuda_float16 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0514916Z test_imag_noncomplex_cuda_float32 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0515234Z test_imag_noncomplex_cuda_float64 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0515547Z test_imag_noncomplex_cuda_int16 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0515868Z test_imag_noncomplex_cuda_int32 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0516187Z test_imag_noncomplex_cuda_int64 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0516497Z test_imag_noncomplex_cuda_int8 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0516807Z test_imag_noncomplex_cuda_uint8 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0517116Z test_movedim_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.004s) 2023-01-11T21:33:40.0517421Z test_narrow_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0517710Z test_permute_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0518024Z test_real_imag_view_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0518343Z test_real_imag_view_cuda_complex64 (__main__.TestViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0518654Z test_reshape_as_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0518990Z test_reshape_nonview_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0519290Z test_reshape_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0519590Z test_select_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0519901Z test_set_real_imag_cuda_complex128_bfloat16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0520242Z test_set_real_imag_cuda_complex128_bool (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0520581Z test_set_real_imag_cuda_complex128_complex128 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0520977Z test_set_real_imag_cuda_complex128_complex64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0521305Z test_set_real_imag_cuda_complex128_float16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0521638Z test_set_real_imag_cuda_complex128_float32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0521972Z test_set_real_imag_cuda_complex128_float64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0522293Z test_set_real_imag_cuda_complex128_int16 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0522625Z test_set_real_imag_cuda_complex128_int32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0522951Z test_set_real_imag_cuda_complex128_int64 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0523278Z test_set_real_imag_cuda_complex128_int8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0523600Z test_set_real_imag_cuda_complex128_uint8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0523930Z test_set_real_imag_cuda_complex64_bfloat16 (__main__.TestViewOpsCUDA) ... 
ok (0.002s) 2023-01-11T21:33:40.0524262Z test_set_real_imag_cuda_complex64_bool (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0524590Z test_set_real_imag_cuda_complex64_complex128 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0524933Z test_set_real_imag_cuda_complex64_complex64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0525269Z test_set_real_imag_cuda_complex64_float16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0525634Z test_set_real_imag_cuda_complex64_float32 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0525957Z test_set_real_imag_cuda_complex64_float64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0526291Z test_set_real_imag_cuda_complex64_int16 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0526622Z test_set_real_imag_cuda_complex64_int32 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0526942Z test_set_real_imag_cuda_complex64_int64 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0527269Z test_set_real_imag_cuda_complex64_int8 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0527594Z test_set_real_imag_cuda_complex64_uint8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0527977Z test_split_view_cuda (__main__.TestViewOpsCUDA) ... skip: See https://github.com/pytorch/pytorch/pull/32720 (0.000s) 2023-01-11T21:33:40.0528337Z test_squeeze_inplace_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0528647Z test_squeeze_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0528945Z test_t_inplace_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0529233Z test_t_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0529531Z test_transpose_inplace_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0529845Z test_transpose_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0530138Z test_unbind_cuda (__main__.TestViewOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:40.0530417Z test_unbind_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0530718Z test_unfold_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0531105Z test_unsqueeze_inplace_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0531405Z test_unsqueeze_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0531708Z test_view_as_complex_cuda (__main__.TestViewOpsCUDA) ... ok (0.030s) 2023-01-11T21:33:40.0532022Z test_view_as_real_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0532871Z test_view_as_real_cuda_complex32 (__main__.TestViewOpsCUDA) ... /var/lib/jenkins/workspace/test/test_view_ops.py:308: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/EmptyTensor.cpp:32.) 2023-01-11T21:33:40.0533351Z t = torch.randn(3, 4, dtype=dtype, device=device) 2023-01-11T21:33:40.0533570Z ok (0.002s) 2023-01-11T21:33:40.0533821Z test_view_as_real_cuda_complex64 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0534119Z test_view_as_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0534412Z test_view_copy_cuda (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0538973Z test_view_copy_out_cuda (__main__.TestViewOpsCUDA) ... 
ok (0.002s) 2023-01-11T21:33:40.0539302Z test_view_copy_output_contiguous_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0539613Z test_view_dtype_new_cuda_bool (__main__.TestViewOpsCUDA) ... ok (0.179s) 2023-01-11T21:33:40.0539930Z test_view_dtype_new_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.198s) 2023-01-11T21:33:40.0540250Z test_view_dtype_new_cuda_complex64 (__main__.TestViewOpsCUDA) ... ok (0.166s) 2023-01-11T21:33:40.0560637Z test_view_dtype_new_cuda_float16 (__main__.TestViewOpsCUDA) ... ok (0.180s) 2023-01-11T21:33:40.0560969Z test_view_dtype_new_cuda_float32 (__main__.TestViewOpsCUDA) ... ok (0.180s) 2023-01-11T21:33:40.0561288Z test_view_dtype_new_cuda_float64 (__main__.TestViewOpsCUDA) ... ok (0.162s) 2023-01-11T21:33:40.0561618Z test_view_dtype_new_cuda_int16 (__main__.TestViewOpsCUDA) ... ok (0.179s) 2023-01-11T21:33:40.0561923Z test_view_dtype_new_cuda_int32 (__main__.TestViewOpsCUDA) ... ok (0.179s) 2023-01-11T21:33:40.0562233Z test_view_dtype_new_cuda_int64 (__main__.TestViewOpsCUDA) ... ok (0.162s) 2023-01-11T21:33:40.0562819Z test_view_dtype_new_cuda_int8 (__main__.TestViewOpsCUDA) ... ok (0.180s) 2023-01-11T21:33:40.0563135Z test_view_dtype_new_cuda_uint8 (__main__.TestViewOpsCUDA) ... ok (0.178s) 2023-01-11T21:33:40.0563451Z test_view_dtype_upsize_errors_cuda_bfloat16 (__main__.TestViewOpsCUDA) ... ok (0.091s) 2023-01-11T21:33:40.0563792Z test_view_dtype_upsize_errors_cuda_bool (__main__.TestViewOpsCUDA) ... ok (0.135s) 2023-01-11T21:33:40.0564138Z test_view_dtype_upsize_errors_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0564477Z test_view_dtype_upsize_errors_cuda_complex64 (__main__.TestViewOpsCUDA) ... ok (0.016s) 2023-01-11T21:33:40.0564825Z test_view_dtype_upsize_errors_cuda_float16 (__main__.TestViewOpsCUDA) ... ok (0.090s) 2023-01-11T21:33:40.0565165Z test_view_dtype_upsize_errors_cuda_float32 (__main__.TestViewOpsCUDA) ... ok (0.060s) 2023-01-11T21:33:40.0565509Z test_view_dtype_upsize_errors_cuda_float64 (__main__.TestViewOpsCUDA) ... ok (0.016s) 2023-01-11T21:33:40.0565842Z test_view_dtype_upsize_errors_cuda_int16 (__main__.TestViewOpsCUDA) ... ok (0.090s) 2023-01-11T21:33:40.0566171Z test_view_dtype_upsize_errors_cuda_int32 (__main__.TestViewOpsCUDA) ... ok (0.060s) 2023-01-11T21:33:40.0566500Z test_view_dtype_upsize_errors_cuda_int64 (__main__.TestViewOpsCUDA) ... ok (0.016s) 2023-01-11T21:33:40.0566820Z test_view_dtype_upsize_errors_cuda_int8 (__main__.TestViewOpsCUDA) ... ok (0.135s) 2023-01-11T21:33:40.0567153Z test_view_dtype_upsize_errors_cuda_uint8 (__main__.TestViewOpsCUDA) ... ok (0.137s) 2023-01-11T21:33:40.0567487Z test_view_tensor_dsplit_cuda_bfloat16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0567903Z test_view_tensor_dsplit_cuda_bool (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0568230Z test_view_tensor_dsplit_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0568565Z test_view_tensor_dsplit_cuda_complex64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0568899Z test_view_tensor_dsplit_cuda_float16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0569217Z test_view_tensor_dsplit_cuda_float32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0569542Z test_view_tensor_dsplit_cuda_float64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0569867Z test_view_tensor_dsplit_cuda_int16 (__main__.TestViewOpsCUDA) ... 
ok (0.001s) 2023-01-11T21:33:40.0570188Z test_view_tensor_dsplit_cuda_int32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0570505Z test_view_tensor_dsplit_cuda_int64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0570827Z test_view_tensor_dsplit_cuda_int8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0571158Z test_view_tensor_dsplit_cuda_uint8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0571476Z test_view_tensor_hsplit_cuda_bfloat16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0571809Z test_view_tensor_hsplit_cuda_bool (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0572141Z test_view_tensor_hsplit_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0572478Z test_view_tensor_hsplit_cuda_complex64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0572800Z test_view_tensor_hsplit_cuda_float16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0573126Z test_view_tensor_hsplit_cuda_float32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0573456Z test_view_tensor_hsplit_cuda_float64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0573769Z test_view_tensor_hsplit_cuda_int16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0574094Z test_view_tensor_hsplit_cuda_int32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0574414Z test_view_tensor_hsplit_cuda_int64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0575122Z test_view_tensor_hsplit_cuda_int8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0575526Z test_view_tensor_hsplit_cuda_uint8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0575845Z test_view_tensor_split_cuda_bfloat16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0576273Z test_view_tensor_split_cuda_bool (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0576591Z test_view_tensor_split_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0576920Z test_view_tensor_split_cuda_complex64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0577247Z test_view_tensor_split_cuda_float16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0577577Z test_view_tensor_split_cuda_float32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0577887Z test_view_tensor_split_cuda_float64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0578207Z test_view_tensor_split_cuda_int16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0578528Z test_view_tensor_split_cuda_int32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0578834Z test_view_tensor_split_cuda_int64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0579144Z test_view_tensor_split_cuda_int8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0579459Z test_view_tensor_split_cuda_uint8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0579778Z test_view_tensor_vsplit_cuda_bfloat16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0580093Z test_view_tensor_vsplit_cuda_bool (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0580419Z test_view_tensor_vsplit_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0580862Z test_view_tensor_vsplit_cuda_complex64 (__main__.TestViewOpsCUDA) ... 
ok (0.001s) 2023-01-11T21:33:40.0581181Z test_view_tensor_vsplit_cuda_float16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0581506Z test_view_tensor_vsplit_cuda_float32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0581826Z test_view_tensor_vsplit_cuda_float64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0582149Z test_view_tensor_vsplit_cuda_int16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0582455Z test_view_tensor_vsplit_cuda_int32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0582772Z test_view_tensor_vsplit_cuda_int64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0583088Z test_view_tensor_vsplit_cuda_int8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0583392Z test_view_tensor_vsplit_cuda_uint8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0583699Z test_view_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0583867Z 2023-01-11T21:33:40.0584184Z ---------------------------------------------------------------------- 2023-01-11T21:33:40.0584452Z Ran 278 tests in 15.850s 2023-01-11T21:33:40.0584568Z 2023-01-11T21:33:40.0584656Z OK (skipped=12) 2023-01-11T21:33:40.0584774Z 2023-01-11T21:33:40.0584866Z Generating XML reports... 2023-01-11T21:33:40.0585311Z Generated XML report: test-reports/python-unittest/test_view_ops/TEST-TestOldViewOpsCUDA-20230111213323.xml 2023-01-11T21:33:40.0585821Z Generated XML report: test-reports/python-unittest/test_view_ops/TEST-TestViewOpsCUDA-20230111213323.xml 2023-01-11T21:33:40.0586059Z 2023-01-11T21:33:40.0586478Z ##[endgroup] 2023-01-11T21:33:40.0586868Z FINISHED PRINTING LOG FILE of test_view_ops (/var/lib/jenkins/workspace/test/test-reports/test_view_ops_yux6z2dq) 2023-01-11T21:33:40.0587088Z 2023-01-11T21:33:42.0001283Z Ignoring disabled issues: [] 2023-01-11T21:38:05.6660668Z 2023-01-11T21:38:05.6663080Z Expand the folded group to see the log file of inductor/test_torchinductor 2023-01-11T21:38:05.6663865Z ##[group]PRINTING LOG FILE of inductor/test_torchinductor (/var/lib/jenkins/workspace/test/test-reports/inductor-test_torchinductor_y59pervs) 2023-01-11T21:38:05.6709623Z 2023-01-11T21:38:05.6710254Z Running tests... 2023-01-11T21:38:05.6714453Z ---------------------------------------------------------------------- 2023-01-11T21:38:05.6715199Z Test results will be stored in test-reports/python-unittest/inductor.test_torchinductor 2023-01-11T21:38:05.6715723Z test_auto_simd (__main__.CPUReproTests) ... ok (2.951s) 2023-01-11T21:38:05.6716161Z test_complex_memory_overlap (__main__.CPUReproTests) ... ok (0.001s) 2023-01-11T21:38:05.6717157Z test_conv_stride_constraints (__main__.CPUReproTests) ... 
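The dump that follows is TorchInductor's compile log for test_conv_stride_constraints: the "Step 1" messages come from torch._inductor.compile_fx, and the code printed after them is the generated Python wrapper plus the C++ kernel it calls. As a rough reproduction sketch (the debug flag and its exact effect are assumptions about this era of the codebase, not something this log states), a similar dump can be obtained by compiling a small convolution with the inductor backend:

import torch
import torch._dynamo as dynamo
import torch._inductor.config as inductor_config

# Assumption: with config.debug enabled, inductor prints the generated
# wrapper/kernel source, which is what appears in the log below.
inductor_config.debug = True

def conv(inp, weight):
    # Same shapes as the rand_strided inputs in the dump below.
    return torch.nn.functional.conv2d(inp, weight)

compiled = dynamo.optimize("inductor")(conv)
compiled(torch.randn(2, 5, 16, 16), torch.randn(6, 5, 3, 3))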
[2023-01-11 21:23:39,071] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:05.6717830Z [2023-01-11 21:23:40,797] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:05.6718312Z [2023-01-11 21:23:40,870] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:05.6718781Z [2023-01-11 21:23:40,887] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:05.6718995Z 2023-01-11T21:38:05.6719101Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.6719501Z import torch 2023-01-11T21:38:05.6720075Z import random 2023-01-11T21:38:05.6720371Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.6720714Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.6720922Z 2023-01-11T21:38:05.6721044Z aten = torch.ops.aten 2023-01-11T21:38:05.6721391Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.6723174Z async_compile = AsyncCompile() 2023-01-11T21:38:05.6723584Z 2023-01-11T21:38:05.6723705Z import triton 2023-01-11T21:38:05.6723991Z import triton.language as tl 2023-01-11T21:38:05.6724339Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.6724719Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.6724955Z 2023-01-11T21:38:05.6724962Z 2023-01-11T21:38:05.6725201Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.6725652Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.6726081Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.6726341Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.6726546Z { 2023-01-11T21:38:05.6726751Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.6726948Z { 2023-01-11T21:38:05.6727140Z #pragma omp for collapse(3) 2023-01-11T21:38:05.6727360Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.6727545Z { 2023-01-11T21:38:05.6727740Z for(long i1=0; i1<5; i1+=1) 2023-01-11T21:38:05.6727930Z { 2023-01-11T21:38:05.6728120Z for(long i2=0; i2<256; i2+=1) 2023-01-11T21:38:05.6728315Z { 2023-01-11T21:38:05.6728493Z { 2023-01-11T21:38:05.6729182Z { 2023-01-11T21:38:05.6729452Z auto tmp0 = in_ptr0[i2 + (256*i1) + (1280*i0)]; 2023-01-11T21:38:05.6729797Z out_ptr0[i1 + (5*i2) + (1280*i0)] = tmp0; 2023-01-11T21:38:05.6730074Z } 2023-01-11T21:38:05.6730313Z } 2023-01-11T21:38:05.6730557Z } 2023-01-11T21:38:05.6730788Z } 2023-01-11T21:38:05.6731014Z } 2023-01-11T21:38:05.6731201Z } 2023-01-11T21:38:05.6731404Z } 2023-01-11T21:38:05.6731654Z ''') 2023-01-11T21:38:05.6731784Z 2023-01-11T21:38:05.6731789Z 2023-01-11T21:38:05.6731916Z async_compile.wait(globals()) 2023-01-11T21:38:05.6732182Z del async_compile 2023-01-11T21:38:05.6732335Z 2023-01-11T21:38:05.6732432Z def call(args): 2023-01-11T21:38:05.6732701Z inp_1, weight_1 = args 2023-01-11T21:38:05.6732973Z args.clear() 2023-01-11T21:38:05.6733438Z buf0 = empty_strided((2, 5, 16, 16), (1280, 1, 80, 5), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6733930Z kernel_cpp_0(c_void_p(inp_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.6734253Z del inp_1 2023-01-11T21:38:05.6734909Z buf1 = aten.convolution(buf0, weight_1, None, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:05.6735324Z assert_size_stride(buf1, (2, 6, 14, 14), (1176, 1, 84, 6)) 
2023-01-11T21:38:05.6735613Z del buf0 2023-01-11T21:38:05.6735853Z del weight_1 2023-01-11T21:38:05.6736069Z return (buf1, ) 2023-01-11T21:38:05.6736190Z 2023-01-11T21:38:05.6736195Z 2023-01-11T21:38:05.6736280Z if __name__ == "__main__": 2023-01-11T21:38:05.6736516Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.6736800Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.6737278Z inp_1 = rand_strided((2, 5, 16, 16), (1280, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6737672Z weight_1 = rand_strided((6, 5, 3, 3), (45, 1, 15, 5), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6737976Z print_performance(lambda: call([inp_1, weight_1])) 2023-01-11T21:38:05.6738136Z 2023-01-11T21:38:05.6738140Z 2023-01-11T21:38:05.6738235Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.6738448Z import torch 2023-01-11T21:38:05.6738635Z import random 2023-01-11T21:38:05.6738861Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.6739140Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.6742705Z 2023-01-11T21:38:05.6742916Z aten = torch.ops.aten 2023-01-11T21:38:05.6743303Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.6743677Z async_compile = AsyncCompile() 2023-01-11T21:38:05.6743977Z 2023-01-11T21:38:05.6744061Z import triton 2023-01-11T21:38:05.6744267Z import triton.language as tl 2023-01-11T21:38:05.6744528Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.6744825Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.6745000Z 2023-01-11T21:38:05.6745005Z 2023-01-11T21:38:05.6745104Z async_compile.wait(globals()) 2023-01-11T21:38:05.6745306Z del async_compile 2023-01-11T21:38:05.6745427Z 2023-01-11T21:38:05.6745507Z def call(args): 2023-01-11T21:38:05.6745702Z inp_1, weight_1 = args 2023-01-11T21:38:05.6745893Z args.clear() 2023-01-11T21:38:05.6746151Z buf0 = aten.convolution(inp_1, weight_1, None, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:05.6746447Z assert_size_stride(buf0, (2, 6, 14, 14), (1176, 196, 14, 1)) 2023-01-11T21:38:05.6746657Z del inp_1 2023-01-11T21:38:05.6746866Z del weight_1 2023-01-11T21:38:05.6747168Z return (buf0, ) 2023-01-11T21:38:05.6747348Z 2023-01-11T21:38:05.6747360Z 2023-01-11T21:38:05.6747451Z if __name__ == "__main__": 2023-01-11T21:38:05.6747688Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.6748713Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.6749341Z inp_1 = rand_strided((2, 5, 16, 16), (1280, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6749957Z weight_1 = rand_strided((6, 5, 3, 3), (45, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6750262Z print_performance(lambda: call([inp_1, weight_1])) 2023-01-11T21:38:05.6750423Z 2023-01-11T21:38:05.6750500Z ok (1.862s) 2023-01-11T21:38:05.6751072Z test_cpp_kernel_profile (__main__.CPUReproTests) ... 
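The two generated modules above differ only in memory format: in the first, the weight arrives with channels-last strides (45, 1, 15, 5), so inductor emits kernel_cpp_0 to repack the input into matching channels-last strides (1280, 1, 80, 5) before calling aten.convolution, and asserts a channels-last output (1176, 1, 84, 6); in the second, the contiguous weight (45, 9, 3, 1) goes straight to aten.convolution and the output stays contiguous (1176, 196, 14, 1). A small illustration of the two weight layouts (not from the log):

import torch

w = torch.randn(6, 5, 3, 3)
print(w.stride())  # (45, 9, 3, 1): contiguous, as in the second dump
print(w.to(memory_format=torch.channels_last).stride())  # (45, 1, 15, 5), as in the first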
/opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py:179: UserWarning: CUDA is not available, disabling CUDA profiling 2023-01-11T21:38:05.6751771Z warn("CUDA is not available, disabling CUDA profiling") 2023-01-11T21:38:05.6752481Z STAGE:2023-01-11 21:23:41 2346:2346 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:38:05.6753244Z [2023-01-11 21:23:41,065] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 0 2023-01-11T21:38:05.6754024Z [2023-01-11 21:23:45,080] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 0 2023-01-11T21:38:05.6754690Z STAGE:2023-01-11 21:23:45 2346:2346 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:38:05.6755223Z STAGE:2023-01-11 21:23:45 2346:2346 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:38:05.6755571Z 2023-01-11T21:38:05.6755732Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.6756059Z import torch 2023-01-11T21:38:05.6756377Z import random 2023-01-11T21:38:05.6756687Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.6757053Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.6757264Z 2023-01-11T21:38:05.6757378Z aten = torch.ops.aten 2023-01-11T21:38:05.6757710Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.6758056Z async_compile = AsyncCompile() 2023-01-11T21:38:05.6758225Z 2023-01-11T21:38:05.6758305Z import triton 2023-01-11T21:38:05.6758507Z import triton.language as tl 2023-01-11T21:38:05.6758786Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.6759118Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.6759308Z 2023-01-11T21:38:05.6759313Z 2023-01-11T21:38:05.6759473Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.6759729Z #include <ATen/record_function.h> 2023-01-11T21:38:05.6760102Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.6760490Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.6760762Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.6761003Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.6761264Z { 2023-01-11T21:38:05.6761512Z RECORD_FUNCTION("graph_0_kernel_cpp_0", c10::ArrayRef<c10::IValue>({})); 2023-01-11T21:38:05.6761787Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.6761991Z { 2023-01-11T21:38:05.6762163Z #pragma omp for 2023-01-11T21:38:05.6762373Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:05.6762567Z { 2023-01-11T21:38:05.6762799Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.6763100Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.6763355Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.6763582Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.6763776Z } 2023-01-11T21:38:05.6763976Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.6764199Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:05.6764383Z { 2023-01-11T21:38:05.6764576Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.6764799Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.6765030Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.6765270Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.6765456Z } 2023-01-11T21:38:05.6765614Z } 2023-01-11T21:38:05.6765775Z } 2023-01-11T21:38:05.6765955Z ''') 2023-01-11T21:38:05.6766047Z 2023-01-11T21:38:05.6766054Z 2023-01-11T21:38:05.6766149Z async_compile.wait(globals()) 
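# Note (editorial comment, not emitted by inductor; the semantics below are an
# assumption about torch._inductor.codecache in this era, not stated by the log):
# async_compile.cpp() above queues the C++ source for background compilation and
# returns a handle, and async_compile.wait(globals()) blocks until every queued
# kernel has finished building, rebinding names like kernel_cpp_0 to the loaded,
# callable kernels before call() runs.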
2023-01-11T21:38:05.6766357Z del async_compile 2023-01-11T21:38:05.6766473Z 2023-01-11T21:38:05.6766551Z def call(args): 2023-01-11T21:38:05.6766734Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.6766923Z args.clear() 2023-01-11T21:38:05.6767231Z buf0 = empty_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6767555Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.6767815Z del arg0_1 2023-01-11T21:38:05.6767992Z del arg1_1 2023-01-11T21:38:05.6768171Z return (buf0, ) 2023-01-11T21:38:05.6768287Z 2023-01-11T21:38:05.6768291Z 2023-01-11T21:38:05.6768375Z if __name__ == "__main__": 2023-01-11T21:38:05.6768613Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.6768890Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.6769274Z arg0_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6769660Z arg1_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6769960Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.6770128Z 2023-01-11T21:38:05.6770194Z ok (4.174s) 2023-01-11T21:38:05.6770449Z test_cpu_vec_cosim (__main__.CPUReproTests) ... ok (0.001s) 2023-01-11T21:38:05.6771012Z test_inplace_add_alpha (__main__.CPUReproTests) ... [2023-01-11 21:23:45,102] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:05.6771598Z [2023-01-11 21:23:46,792] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:05.6771832Z 2023-01-11T21:38:05.6771932Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.6772154Z import torch 2023-01-11T21:38:05.6772352Z import random 2023-01-11T21:38:05.6772594Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.6772901Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.6773079Z 2023-01-11T21:38:05.6773169Z aten = torch.ops.aten 2023-01-11T21:38:05.6773453Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.6773735Z async_compile = AsyncCompile() 2023-01-11T21:38:05.6773881Z 2023-01-11T21:38:05.6773960Z import triton 2023-01-11T21:38:05.6774176Z import triton.language as tl 2023-01-11T21:38:05.6774448Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.6775055Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.6775255Z 2023-01-11T21:38:05.6775259Z 2023-01-11T21:38:05.6775494Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.6775836Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.6776294Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.6776568Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.6776817Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.6777012Z { 2023-01-11T21:38:05.6777324Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.6777539Z { 2023-01-11T21:38:05.6777713Z #pragma omp for 2023-01-11T21:38:05.6777920Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.6778112Z { 2023-01-11T21:38:05.6778348Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.6778650Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.6778956Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(0.55)); 2023-01-11T21:38:05.6779209Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.6779429Z 
auto tmp4 = tmp0 + tmp3; 2023-01-11T21:38:05.6779652Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.6779845Z } 2023-01-11T21:38:05.6780051Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.6780280Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.6780473Z { 2023-01-11T21:38:05.6780660Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.6780879Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.6781115Z auto tmp2 = static_cast<float>(0.55); 2023-01-11T21:38:05.6782834Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.6783127Z auto tmp4 = tmp0 + tmp3; 2023-01-11T21:38:05.6783424Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:05.6783687Z } 2023-01-11T21:38:05.6783911Z } 2023-01-11T21:38:05.6784144Z } 2023-01-11T21:38:05.6784423Z ''') 2023-01-11T21:38:05.6784562Z 2023-01-11T21:38:05.6784574Z 2023-01-11T21:38:05.6784700Z async_compile.wait(globals()) 2023-01-11T21:38:05.6784999Z del async_compile 2023-01-11T21:38:05.6785165Z 2023-01-11T21:38:05.6785279Z def call(args): 2023-01-11T21:38:05.6785514Z x_1, y_1 = args 2023-01-11T21:38:05.6785763Z args.clear() 2023-01-11T21:38:05.6786162Z kernel_cpp_0(c_void_p(x_1.data_ptr()), c_void_p(y_1.data_ptr()), c_void_p(x_1.data_ptr())) 2023-01-11T21:38:05.6786422Z del y_1 2023-01-11T21:38:05.6786599Z return (x_1, ) 2023-01-11T21:38:05.6786718Z 2023-01-11T21:38:05.6786722Z 2023-01-11T21:38:05.6786808Z if __name__ == "__main__": 2023-01-11T21:38:05.6787049Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.6787323Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.6787697Z x_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6788053Z y_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6788328Z print_performance(lambda: call([x_1, y_1])) 2023-01-11T21:38:05.6788477Z 2023-01-11T21:38:05.6788553Z ok (1.707s) 2023-01-11T21:38:05.6789204Z test_inplace_squeeze_needed (__main__.CPUReproTests) ... 
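For reference, the test_inplace_add_alpha kernel above is a single fused in-place update: an 8-lane Vectorized<float> main loop followed by a scalar tail for elements 8 and 9 (note the tail reads out_ptr0, which aliases x_1). A minimal eager-mode equivalent of the compiled graph (a sketch; the sizes follow the rand_strided((10, ), (1, )) inputs in the dump):

import torch

x = torch.randn(10)
y = torch.randn(10)
x.add_(y, alpha=0.55)  # same arithmetic as tmp0 + tmp1 * 0.55, written back into x's buffer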
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:05.6789717Z warnings.warn( 2023-01-11T21:38:05.6790095Z [2023-01-11 21:23:46,937] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1 2023-01-11T21:38:05.6790551Z [2023-01-11 21:23:48,735] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1 2023-01-11T21:38:05.6790763Z 2023-01-11T21:38:05.6790869Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.6791085Z import torch 2023-01-11T21:38:05.6791268Z import random 2023-01-11T21:38:05.6791503Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.6791822Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.6792003Z 2023-01-11T21:38:05.6792092Z aten = torch.ops.aten 2023-01-11T21:38:05.6792349Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.6792618Z async_compile = AsyncCompile() 2023-01-11T21:38:05.6792756Z 2023-01-11T21:38:05.6792838Z import triton 2023-01-11T21:38:05.6793039Z import triton.language as tl 2023-01-11T21:38:05.6793298Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.6793606Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.6793786Z 2023-01-11T21:38:05.6793790Z 2023-01-11T21:38:05.6793934Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.6794275Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.6794630Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.6794894Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:05.6795142Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.6795398Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.6795648Z const float* __restrict__ in_ptr3, 2023-01-11T21:38:05.6795891Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.6796137Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.6796381Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.6796623Z bool* __restrict__ out_ptr4) 2023-01-11T21:38:05.6796821Z { 2023-01-11T21:38:05.6797009Z auto in_ptr0 = in_out_ptr0; 2023-01-11T21:38:05.6797233Z auto out_ptr1 = in_out_ptr1; 2023-01-11T21:38:05.6797423Z { 2023-01-11T21:38:05.6797719Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:05.6798012Z float tmp3 = 0; 2023-01-11T21:38:05.6798251Z auto tmp3_vec = at::vec::Vectorized<float>(tmp3); 2023-01-11T21:38:05.6798510Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.6798724Z { 2023-01-11T21:38:05.6798933Z #pragma omp for reduction(+:tmp3_vec) 2023-01-11T21:38:05.6799172Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.6799411Z { 2023-01-11T21:38:05.6799657Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.6799966Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.6800230Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.6800451Z tmp3_vec += tmp2; 2023-01-11T21:38:05.6800636Z } 2023-01-11T21:38:05.6800943Z tmp3 = at::vec::vec_reduce_all([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp3_vec); 2023-01-11T21:38:05.6801290Z #pragma omp for simd simdlen(4) reduction(+:tmp3) 2023-01-11T21:38:05.6801535Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.6801731Z { 
2023-01-11T21:38:05.6801934Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.6802155Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.6802375Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.6802588Z tmp3 += tmp2; 2023-01-11T21:38:05.6802784Z } 2023-01-11T21:38:05.6802953Z } 2023-01-11T21:38:05.6803142Z out_ptr0[0] = tmp3; 2023-01-11T21:38:05.6803334Z } 2023-01-11T21:38:05.6803492Z { 2023-01-11T21:38:05.6803788Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:05.6804083Z float tmp8 = 0; 2023-01-11T21:38:05.6804322Z auto tmp8_vec = at::vec::Vectorized<float>(tmp8); 2023-01-11T21:38:05.6808441Z float tmp9 = 0; 2023-01-11T21:38:05.6808758Z auto tmp9_vec = at::vec::Vectorized<float>(tmp9); 2023-01-11T21:38:05.6810737Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.6811040Z { 2023-01-11T21:38:05.6811367Z #pragma omp for reduction(+:tmp8_vec) reduction(+:tmp9_vec) 2023-01-11T21:38:05.6811744Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.6811977Z { 2023-01-11T21:38:05.6812229Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.6812535Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.6812840Z auto tmp3 = at::vec::Vectorized<float>(out_ptr0[0]); 2023-01-11T21:38:05.6813096Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.6813430Z auto tmp4 = at::vec::Vectorized<float>(static_cast<float>(10)); 2023-01-11T21:38:05.6813817Z auto tmp5 = tmp3 / tmp4; 2023-01-11T21:38:05.6814209Z auto tmp6 = tmp2 - tmp5; 2023-01-11T21:38:05.6814717Z auto tmp7 = tmp6.pow(2); 2023-01-11T21:38:05.6815017Z tmp8_vec += tmp7; 2023-01-11T21:38:05.6815446Z tmp9_vec += tmp2; 2023-01-11T21:38:05.6817820Z } 2023-01-11T21:38:05.6818329Z tmp8 = at::vec::vec_reduce_all([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp8_vec); 2023-01-11T21:38:05.6818924Z tmp9 = at::vec::vec_reduce_all([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp9_vec); 2023-01-11T21:38:05.6819426Z #pragma omp for simd simdlen(4) reduction(+:tmp8) reduction(+:tmp9) 2023-01-11T21:38:05.6819772Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.6820032Z { 2023-01-11T21:38:05.6820304Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.6820530Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.6820937Z auto tmp3 = out_ptr0[0]; 2023-01-11T21:38:05.6821164Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.6821419Z auto tmp4 = static_cast<float>(10); 2023-01-11T21:38:05.6821747Z auto tmp5 = tmp3 / tmp4; 2023-01-11T21:38:05.6824378Z auto tmp6 = tmp2 - tmp5; 2023-01-11T21:38:05.6824607Z auto tmp7 = tmp6 * tmp6; 2023-01-11T21:38:05.6824813Z tmp8 += tmp7; 2023-01-11T21:38:05.6825112Z tmp9 += tmp2; 2023-01-11T21:38:05.6825340Z } 2023-01-11T21:38:05.6825513Z } 2023-01-11T21:38:05.6825690Z out_ptr1[0] = tmp8; 2023-01-11T21:38:05.6825889Z out_ptr2[0] = tmp9; 2023-01-11T21:38:05.6826069Z } 2023-01-11T21:38:05.6826262Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.6826524Z { 2023-01-11T21:38:05.6826768Z #pragma omp for 2023-01-11T21:38:05.6827026Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:05.6827250Z { 2023-01-11T21:38:05.6827424Z { 2023-01-11T21:38:05.6827590Z { 2023-01-11T21:38:05.6827834Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.6828142Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.6828423Z auto tmp3 = out_ptr2[0]; 2023-01-11T21:38:05.6828676Z auto tmp7 = out_ptr1[0]; 2023-01-11T21:38:05.6828907Z auto tmp13 = in_ptr2[i0]; 2023-01-11T21:38:05.6829145Z auto tmp15 = in_ptr3[i0]; 2023-01-11T21:38:05.6829441Z auto tmp2 = tmp0 + tmp1; 
2023-01-11T21:38:05.6829765Z auto tmp4 = static_cast<float>(10); 2023-01-11T21:38:05.6912288Z auto tmp5 = tmp3 / tmp4; 2023-01-11T21:38:05.6912732Z auto tmp6 = tmp2 - tmp5; 2023-01-11T21:38:05.6913030Z auto tmp8 = tmp7 / tmp4; 2023-01-11T21:38:05.6913404Z auto tmp9 = static_cast<float>(1e-05); 2023-01-11T21:38:05.6915586Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:05.6915905Z auto tmp11 = 1 / std::sqrt(tmp10); 2023-01-11T21:38:05.6916337Z auto tmp12 = tmp6 * tmp11; 2023-01-11T21:38:05.6916638Z auto tmp14 = tmp12 * tmp13; 2023-01-11T21:38:05.6925125Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:05.6925465Z auto tmp17 = tmp16 * (tmp16>0); 2023-01-11T21:38:05.6925759Z auto tmp18 = static_cast<float>(0); 2023-01-11T21:38:05.6926056Z auto tmp19 = tmp17 <= tmp18; 2023-01-11T21:38:05.6926381Z in_out_ptr0[i0] = tmp12; 2023-01-11T21:38:05.6926668Z out_ptr3[i0] = tmp17; 2023-01-11T21:38:05.6926951Z out_ptr4[i0] = tmp19; 2023-01-11T21:38:05.6927203Z } 2023-01-11T21:38:05.6927417Z } 2023-01-11T21:38:05.6927595Z } 2023-01-11T21:38:05.6927786Z #pragma omp single 2023-01-11T21:38:05.6928005Z { 2023-01-11T21:38:05.6928178Z { 2023-01-11T21:38:05.6928346Z { 2023-01-11T21:38:05.6928540Z auto tmp0 = out_ptr1[0]; 2023-01-11T21:38:05.6928776Z auto tmp1 = static_cast<float>(10); 2023-01-11T21:38:05.6929008Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.6929362Z auto tmp3 = static_cast<float>(1e-05); 2023-01-11T21:38:05.6929657Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.6929971Z auto tmp5 = 1 / std::sqrt(tmp4); 2023-01-11T21:38:05.6930268Z auto tmp6 = tmp5 / tmp1; 2023-01-11T21:38:05.6930538Z in_out_ptr1[0] = tmp6; 2023-01-11T21:38:05.6930797Z } 2023-01-11T21:38:05.6931020Z } 2023-01-11T21:38:05.6931231Z } 2023-01-11T21:38:05.6931445Z } 2023-01-11T21:38:05.6931659Z } 2023-01-11T21:38:05.6931905Z ''') 2023-01-11T21:38:05.6932044Z 2023-01-11T21:38:05.6932051Z 2023-01-11T21:38:05.6932189Z async_compile.wait(globals()) 2023-01-11T21:38:05.6932471Z del async_compile 2023-01-11T21:38:05.6932628Z 2023-01-11T21:38:05.6932723Z def call(args): 2023-01-11T21:38:05.6933044Z primals_1, primals_2, primals_3, primals_4, primals_5 = args 2023-01-11T21:38:05.6933340Z args.clear() 2023-01-11T21:38:05.6933767Z buf0 = empty_strided((1, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6934296Z aten.mm.out(as_strided(primals_5, (1, 10), (10, 1)), as_strided(primals_1, (10, 10), (1, 10)), out=buf0) 2023-01-11T21:38:05.6934802Z del primals_1 2023-01-11T21:38:05.6935222Z buf1 = empty_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6935681Z buf2 = empty_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6936116Z buf3 = empty_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6936449Z buf4 = as_strided(buf0, (10, ), (1, )); del buf0 # reuse 2023-01-11T21:38:05.6936876Z buf5 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6937435Z buf6 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.6937783Z buf7 = buf2; del buf2 # reuse 2023-01-11T21:38:05.6938386Z kernel_cpp_0(c_void_p(buf4.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(primals_2.data_ptr()), c_void_p(primals_3.data_ptr()), c_void_p(primals_4.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf6.data_ptr())) 2023-01-11T21:38:05.6938979Z del buf1 2023-01-11T21:38:05.6939210Z del buf3 2023-01-11T21:38:05.6939456Z del primals_2 2023-01-11T21:38:05.6939706Z del primals_4 2023-01-11T21:38:05.6944690Z 
return (buf5, primals_3, as_strided(primals_5, (1, 10), (10, 1)), buf4, buf6, buf7, ) 2023-01-11T21:38:05.6944941Z 2023-01-11T21:38:05.6944948Z 2023-01-11T21:38:05.6945055Z if __name__ == "__main__": 2023-01-11T21:38:05.6945365Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.6945728Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.6946376Z primals_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6946879Z primals_2 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6947389Z primals_3 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6947883Z primals_4 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6948367Z primals_5 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6948798Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4, primals_5])) 2023-01-11T21:38:05.6949055Z 2023-01-11T21:38:05.6949148Z ok (2.058s) 2023-01-11T21:38:05.6949770Z test_load_same_bool_tensor_twice (__main__.CPUReproTests) ... [2023-01-11 21:23:48,868] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 2 2023-01-11T21:38:05.6950444Z [2023-01-11 21:23:50,596] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 2 2023-01-11T21:38:05.6950727Z 2023-01-11T21:38:05.6950863Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.6951132Z import torch 2023-01-11T21:38:05.6951389Z import random 2023-01-11T21:38:05.6951702Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.6952054Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.6952264Z 2023-01-11T21:38:05.6952377Z aten = torch.ops.aten 2023-01-11T21:38:05.6952712Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.6953053Z async_compile = AsyncCompile() 2023-01-11T21:38:05.6953220Z 2023-01-11T21:38:05.6953330Z import triton 2023-01-11T21:38:05.6953608Z import triton.language as tl 2023-01-11T21:38:05.6953945Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.6954310Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.6954538Z 2023-01-11T21:38:05.6954546Z 2023-01-11T21:38:05.6954752Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.6955260Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.6955701Z extern "C" void kernel(const bool* __restrict__ in_ptr0, 2023-01-11T21:38:05.6956048Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.6956440Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.6956760Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.6957000Z { 2023-01-11T21:38:05.6957271Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.6957544Z { 2023-01-11T21:38:05.6957771Z #pragma omp for 2023-01-11T21:38:05.6958052Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.6958300Z { 2023-01-11T21:38:05.6958568Z float g_tmp_buffer_in_ptr0[8] = {0}; 2023-01-11T21:38:05.6958901Z flag_to_float(in_ptr0 + 8*i0, g_tmp_buffer_in_ptr0, 8); 2023-01-11T21:38:05.6959297Z auto tmp0 = at::vec::Vectorized::loadu(g_tmp_buffer_in_ptr0); 2023-01-11T21:38:05.6959692Z auto tmp2 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.6960044Z flag_to_float(in_ptr0 + 8*i0, g_tmp_buffer_in_ptr0, 8); 2023-01-11T21:38:05.6960524Z auto tmp1 = 
at::vec::Vectorized(static_cast(-33.0)); 2023-01-11T21:38:05.6960907Z auto tmp3 = decltype(tmp1)::blendv(tmp2, tmp1, tmp0); 2023-01-11T21:38:05.6961233Z tmp3.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.6961531Z tmp3.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.6961799Z } 2023-01-11T21:38:05.6962058Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.6962353Z for(long i0=32; i0<34; i0+=1) 2023-01-11T21:38:05.6962603Z { 2023-01-11T21:38:05.6962842Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.6963130Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:05.6963507Z auto tmp1 = static_cast(-33.0); 2023-01-11T21:38:05.6963881Z auto tmp3 = tmp0 ? tmp1 : tmp2; 2023-01-11T21:38:05.6964174Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.6964444Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.6964679Z } 2023-01-11T21:38:05.6964930Z } 2023-01-11T21:38:05.6965172Z } 2023-01-11T21:38:05.6965408Z ''') 2023-01-11T21:38:05.6965539Z 2023-01-11T21:38:05.6965550Z 2023-01-11T21:38:05.6965679Z async_compile.wait(globals()) 2023-01-11T21:38:05.6965962Z del async_compile 2023-01-11T21:38:05.6966118Z 2023-01-11T21:38:05.6966220Z def call(args): 2023-01-11T21:38:05.6966468Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.6966720Z args.clear() 2023-01-11T21:38:05.6967149Z buf0 = empty_strided((2, 17), (17, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6967633Z buf1 = empty_strided((2, 17), (17, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6968093Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.6968479Z del arg0_1 2023-01-11T21:38:05.6968713Z del arg1_1 2023-01-11T21:38:05.6968968Z return (buf0, buf1, ) 2023-01-11T21:38:05.6969121Z 2023-01-11T21:38:05.6969128Z 2023-01-11T21:38:05.6969244Z if __name__ == "__main__": 2023-01-11T21:38:05.6969559Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.6969916Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.6970399Z arg0_1 = rand_strided((2, 17), (17, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6970876Z arg1_1 = rand_strided((2, 17), (17, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.6971236Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.6971431Z 2023-01-11T21:38:05.6971533Z ok (1.747s) 2023-01-11T21:38:05.6972141Z test_masked_fill_softmax (__main__.CPUReproTests) ... 
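The kernel logged next fuses a masked_fill with a softmax over the last dimension into a single C++ kernel (a max-reduce, an exp/sum-reduce, then a normalize pass). A minimal sketch of the kind of eager code that lowers to it; the function name and shapes are illustrative, not the actual test body:

import torch

def masked_softmax(x, mask):
    # the fill value (-33.0, matching the constant embedded in the
    # generated kernel) gets folded into all three softmax loops
    return torch.softmax(x.masked_fill(mask.to(torch.bool), -33.0), dim=-1)

x = torch.randn(2, 17)
mask = torch.randint(0, 2, (2, 17), dtype=torch.uint8)
torch.testing.assert_close(torch.compile(masked_softmax)(x, mask),
                           masked_softmax(x, mask))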
[2023-01-11 21:23:50,639] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 3 2023-01-11T21:38:05.6972830Z [2023-01-11 21:23:52,421] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 3 2023-01-11T21:38:05.6973088Z 2023-01-11T21:38:05.6973222Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.6973513Z import torch 2023-01-11T21:38:05.6973761Z import random 2023-01-11T21:38:05.6974122Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.6974703Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.6974938Z 2023-01-11T21:38:05.6975051Z aten = torch.ops.aten 2023-01-11T21:38:05.6975382Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.6975736Z async_compile = AsyncCompile() 2023-01-11T21:38:05.6975911Z 2023-01-11T21:38:05.6976014Z import triton 2023-01-11T21:38:05.6976298Z import triton.language as tl 2023-01-11T21:38:05.6976620Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.6977014Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.6977354Z 2023-01-11T21:38:05.6977360Z 2023-01-11T21:38:05.6977564Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.6978008Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.6978467Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.6978828Z const unsigned char* __restrict__ in_ptr0, 2023-01-11T21:38:05.6979163Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.6979480Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.6979792Z float* __restrict__ out_ptr2) 2023-01-11T21:38:05.6980056Z { 2023-01-11T21:38:05.6980298Z auto out_ptr1 = in_out_ptr0; 2023-01-11T21:38:05.6980600Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.6980876Z { 2023-01-11T21:38:05.6981106Z #pragma omp for 2023-01-11T21:38:05.6981379Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.6981631Z { 2023-01-11T21:38:05.6981937Z { 2023-01-11T21:38:05.6982593Z #pragma omp declare reduction(max:at::vec::Vectorized:omp_out = at::vec::maximum(omp_out, omp_in)) initializer(omp_priv={{-std::numeric_limits::infinity()}}) 2023-01-11T21:38:05.6983261Z float tmp5 = -std::numeric_limits::infinity(); 2023-01-11T21:38:05.6983642Z auto tmp5_vec = at::vec::Vectorized(tmp5); 2023-01-11T21:38:05.6983957Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.6984218Z { 2023-01-11T21:38:05.6984509Z float g_tmp_buffer_in_ptr0[8] = {0}; 2023-01-11T21:38:05.6984874Z flag_to_float(in_ptr0 + (8*i1) + (17*i0), g_tmp_buffer_in_ptr0, 8); 2023-01-11T21:38:05.6985296Z auto tmp0 = at::vec::Vectorized::loadu(g_tmp_buffer_in_ptr0); 2023-01-11T21:38:05.6985709Z auto tmp3 = at::vec::Vectorized::loadu(in_ptr1 + (8*i1) + (17*i0)); 2023-01-11T21:38:05.6986049Z auto tmp1 = (tmp0); 2023-01-11T21:38:05.6986517Z auto tmp2 = at::vec::Vectorized(static_cast(-33.0)); 2023-01-11T21:38:05.6986914Z auto tmp4 = decltype(tmp2)::blendv(tmp3, tmp2, tmp1); 2023-01-11T21:38:05.6987291Z tmp5_vec = at::vec::maximum(tmp5_vec, tmp4); 2023-01-11T21:38:05.6987583Z } 2023-01-11T21:38:05.6987996Z tmp5 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return at::vec::maximum(x, y);}, tmp5_vec); 2023-01-11T21:38:05.6988468Z #pragma omp simd simdlen(4) reduction(max:tmp5) 2023-01-11T21:38:05.6988793Z for(long i1=16; i1<17; i1+=1) 2023-01-11T21:38:05.6989046Z { 2023-01-11T21:38:05.6989326Z auto tmp0 = in_ptr0[i1 + (17*i0)]; 
2023-01-11T21:38:05.6989646Z auto tmp3 = in_ptr1[i1 + (17*i0)]; 2023-01-11T21:38:05.6989972Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:05.6990381Z auto tmp2 = static_cast(-33.0); 2023-01-11T21:38:05.6990714Z auto tmp4 = tmp1 ? tmp2 : tmp3; 2023-01-11T21:38:05.6991020Z tmp5 = std::max(tmp5, tmp4); 2023-01-11T21:38:05.6991284Z } 2023-01-11T21:38:05.6991623Z out_ptr0[i0] = tmp5; 2023-01-11T21:38:05.6991877Z } 2023-01-11T21:38:05.6992101Z } 2023-01-11T21:38:05.6992338Z #pragma omp for 2023-01-11T21:38:05.6992605Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.6992860Z { 2023-01-11T21:38:05.6993086Z { 2023-01-11T21:38:05.6993470Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:05.6993860Z float tmp8 = 0; 2023-01-11T21:38:05.6994195Z auto tmp8_vec = at::vec::Vectorized(tmp8); 2023-01-11T21:38:05.6994517Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.6994764Z { 2023-01-11T21:38:05.6995055Z float g_tmp_buffer_in_ptr0[8] = {0}; 2023-01-11T21:38:05.6995419Z flag_to_float(in_ptr0 + (8*i1) + (17*i0), g_tmp_buffer_in_ptr0, 8); 2023-01-11T21:38:05.6995829Z auto tmp0 = at::vec::Vectorized::loadu(g_tmp_buffer_in_ptr0); 2023-01-11T21:38:05.6996250Z auto tmp3 = at::vec::Vectorized::loadu(in_ptr1 + (8*i1) + (17*i0)); 2023-01-11T21:38:05.6996647Z auto tmp5 = at::vec::Vectorized(out_ptr0[i0]); 2023-01-11T21:38:05.6996978Z auto tmp1 = (tmp0); 2023-01-11T21:38:05.6997446Z auto tmp2 = at::vec::Vectorized(static_cast(-33.0)); 2023-01-11T21:38:05.6997844Z auto tmp4 = decltype(tmp2)::blendv(tmp3, tmp2, tmp1); 2023-01-11T21:38:05.6998238Z auto tmp6 = tmp4 - tmp5; 2023-01-11T21:38:05.6998593Z auto tmp7 = tmp6.exp(); 2023-01-11T21:38:05.6998914Z tmp7.store(out_ptr1 + (8*i1) + (17*i0)); 2023-01-11T21:38:05.6999212Z tmp8_vec += tmp7; 2023-01-11T21:38:05.6999463Z } 2023-01-11T21:38:05.6999873Z tmp8 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp8_vec); 2023-01-11T21:38:05.7000323Z #pragma omp simd simdlen(4) reduction(+:tmp8) 2023-01-11T21:38:05.7000649Z for(long i1=16; i1<17; i1+=1) 2023-01-11T21:38:05.7000904Z { 2023-01-11T21:38:05.7001179Z auto tmp0 = in_ptr0[i1 + (17*i0)]; 2023-01-11T21:38:05.7001502Z auto tmp3 = in_ptr1[i1 + (17*i0)]; 2023-01-11T21:38:05.7001795Z auto tmp5 = out_ptr0[i0]; 2023-01-11T21:38:05.7002122Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:05.7002542Z auto tmp2 = static_cast(-33.0); 2023-01-11T21:38:05.7002874Z auto tmp4 = tmp1 ? 
tmp2 : tmp3; 2023-01-11T21:38:05.7003240Z auto tmp6 = tmp4 - tmp5; 2023-01-11T21:38:05.7003560Z auto tmp7 = std::exp(tmp6); 2023-01-11T21:38:05.7003873Z out_ptr1[i1 + (17*i0)] = tmp7; 2023-01-11T21:38:05.7004140Z tmp8 += tmp7; 2023-01-11T21:38:05.7004386Z } 2023-01-11T21:38:05.7004650Z out_ptr2[i0] = tmp8; 2023-01-11T21:38:05.7004890Z } 2023-01-11T21:38:05.7005118Z } 2023-01-11T21:38:05.7005360Z #pragma omp for 2023-01-11T21:38:05.7005626Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.7005872Z { 2023-01-11T21:38:05.7006129Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.7006369Z { 2023-01-11T21:38:05.7006697Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr1 + (8*i1) + (17*i0)); 2023-01-11T21:38:05.7007098Z auto tmp1 = at::vec::Vectorized(out_ptr2[i0]); 2023-01-11T21:38:05.7007417Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.7007736Z tmp2.store(in_out_ptr0 + (8*i1) + (17*i0)); 2023-01-11T21:38:05.7008016Z } 2023-01-11T21:38:05.7008277Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.7008623Z for(long i1=16; i1<17; i1+=1) 2023-01-11T21:38:05.7008876Z { 2023-01-11T21:38:05.7009144Z auto tmp0 = out_ptr1[i1 + (17*i0)]; 2023-01-11T21:38:05.7009436Z auto tmp1 = out_ptr2[i0]; 2023-01-11T21:38:05.7009733Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.7010039Z in_out_ptr0[i1 + (17*i0)] = tmp2; 2023-01-11T21:38:05.7010289Z } 2023-01-11T21:38:05.7010518Z } 2023-01-11T21:38:05.7010741Z } 2023-01-11T21:38:05.7010945Z } 2023-01-11T21:38:05.7011187Z ''') 2023-01-11T21:38:05.7011317Z 2023-01-11T21:38:05.7011324Z 2023-01-11T21:38:05.7011462Z async_compile.wait(globals()) 2023-01-11T21:38:05.7011729Z del async_compile 2023-01-11T21:38:05.7011892Z 2023-01-11T21:38:05.7011994Z def call(args): 2023-01-11T21:38:05.7012256Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.7012507Z args.clear() 2023-01-11T21:38:05.7012930Z buf0 = empty_strided((2, 1), (1, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7013411Z buf1 = empty_strided((2, 17), (17, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7013896Z buf2 = empty_strided((2, 1), (1, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7014220Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:05.7014887Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.7015303Z del arg0_1 2023-01-11T21:38:05.7015527Z del arg1_1 2023-01-11T21:38:05.7015767Z return (buf3, ) 2023-01-11T21:38:05.7015925Z 2023-01-11T21:38:05.7016003Z 2023-01-11T21:38:05.7016118Z if __name__ == "__main__": 2023-01-11T21:38:05.7016417Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7016786Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7017376Z arg0_1 = rand_strided((2, 17), (17, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7017863Z arg1_1 = rand_strided((2, 17), (17, 1), device='cpu', dtype=torch.uint8) 2023-01-11T21:38:05.7018231Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.7018436Z 2023-01-11T21:38:05.7018538Z ok (1.825s) 2023-01-11T21:38:05.7019143Z test_new_vec_op_cpu_only (__main__.CPUReproTests) ... 
[2023-01-11 21:23:52,449] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:05.7019812Z [2023-01-11 21:23:54,130] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:05.7020084Z 2023-01-11T21:38:05.7020219Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7020500Z import torch 2023-01-11T21:38:05.7020744Z import random 2023-01-11T21:38:05.7021031Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7021390Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7021598Z 2023-01-11T21:38:05.7021710Z aten = torch.ops.aten 2023-01-11T21:38:05.7022052Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7022400Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7022569Z 2023-01-11T21:38:05.7022672Z import triton 2023-01-11T21:38:05.7022935Z import triton.language as tl 2023-01-11T21:38:05.7023276Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7023672Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7023900Z 2023-01-11T21:38:05.7023907Z 2023-01-11T21:38:05.7024109Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7024544Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7025020Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7025384Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7025668Z { 2023-01-11T21:38:05.7025938Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7026204Z { 2023-01-11T21:38:05.7026538Z #pragma omp for 2023-01-11T21:38:05.7026812Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.7027060Z { 2023-01-11T21:38:05.7027370Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7027699Z auto tmp1 = tmp0.erf(); 2023-01-11T21:38:05.7027996Z auto tmp2 = tmp1.expm1(); 2023-01-11T21:38:05.7028286Z auto tmp3 = tmp2.log1p(); 2023-01-11T21:38:05.7028556Z tmp3.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7028824Z } 2023-01-11T21:38:05.7029090Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7029374Z for(long i0=16; i0<18; i0+=1) 2023-01-11T21:38:05.7029636Z { 2023-01-11T21:38:05.7029895Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7030173Z auto tmp1 = std::erf(tmp0); 2023-01-11T21:38:05.7030473Z auto tmp2 = std::expm1(tmp1); 2023-01-11T21:38:05.7030768Z auto tmp3 = std::log1p(tmp2); 2023-01-11T21:38:05.7031054Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.7031306Z } 2023-01-11T21:38:05.7031522Z } 2023-01-11T21:38:05.7031724Z } 2023-01-11T21:38:05.7031979Z ''') 2023-01-11T21:38:05.7032117Z 2023-01-11T21:38:05.7032124Z 2023-01-11T21:38:05.7032248Z async_compile.wait(globals()) 2023-01-11T21:38:05.7032514Z del async_compile 2023-01-11T21:38:05.7032668Z 2023-01-11T21:38:05.7032780Z def call(args): 2023-01-11T21:38:05.7033022Z x_1, = args 2023-01-11T21:38:05.7033258Z args.clear() 2023-01-11T21:38:05.7033672Z buf0 = empty_strided((2, 9), (9, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7034062Z kernel_cpp_0(c_void_p(x_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.7103722Z del x_1 2023-01-11T21:38:05.7103967Z return (buf0, ) 2023-01-11T21:38:05.7104124Z 2023-01-11T21:38:05.7104131Z 2023-01-11T21:38:05.7104250Z if __name__ == "__main__": 2023-01-11T21:38:05.7104573Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7105018Z from torch._inductor.utils import 
print_performance 2023-01-11T21:38:05.7105593Z x_1 = rand_strided((2, 9), (9, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7105967Z print_performance(lambda: call([x_1])) 2023-01-11T21:38:05.7106173Z 2023-01-11T21:38:05.7106279Z ok (1.708s) 2023-01-11T21:38:05.7106887Z test_no_op_squeeze (__main__.CPUReproTests) ... [2023-01-11 21:23:54,146] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 4 2023-01-11T21:38:05.7107592Z [2023-01-11 21:23:54,148] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 4 2023-01-11T21:38:05.7107875Z 2023-01-11T21:38:05.7108020Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7108323Z import torch 2023-01-11T21:38:05.7108576Z import random 2023-01-11T21:38:05.7108891Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7158673Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7158908Z 2023-01-11T21:38:05.7159038Z aten = torch.ops.aten 2023-01-11T21:38:05.7159388Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7159751Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7159943Z 2023-01-11T21:38:05.7160046Z import triton 2023-01-11T21:38:05.7160326Z import triton.language as tl 2023-01-11T21:38:05.7160659Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7161065Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7161292Z 2023-01-11T21:38:05.7161299Z 2023-01-11T21:38:05.7161427Z async_compile.wait(globals()) 2023-01-11T21:38:05.7161692Z del async_compile 2023-01-11T21:38:05.7161855Z 2023-01-11T21:38:05.7161960Z def call(args): 2023-01-11T21:38:05.7162215Z arg0_1, = args 2023-01-11T21:38:05.7162462Z args.clear() 2023-01-11T21:38:05.7162715Z return (arg0_1, ) 2023-01-11T21:38:05.7162879Z 2023-01-11T21:38:05.7162886Z 2023-01-11T21:38:05.7163001Z if __name__ == "__main__": 2023-01-11T21:38:05.7163498Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7163885Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7164409Z arg0_1 = rand_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7164804Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.7164989Z 2023-01-11T21:38:05.7165085Z ok (0.018s) 2023-01-11T21:38:05.7165723Z test_parallel_num_threads (__main__.CPUReproTests) ... 
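The kernel that follows is emitted without the usual #pragma omp parallel wrapper, presumably because the test pins the thread count before compiling. A sketch of that setup (the exact mechanism the real test uses may differ):

import torch

def add(a, b):
    return a + b

# Inductor reads the active thread count at compile time; with a single
# thread the OpenMP parallel region is omitted, leaving a bare loop.
torch.set_num_threads(1)
out = torch.compile(add)(torch.randn(10, 20), torch.randn(10, 20))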
[2023-01-11 21:23:54,179] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 5 2023-01-11T21:38:05.7166441Z [2023-01-11 21:23:55,873] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 5 2023-01-11T21:38:05.7166734Z 2023-01-11T21:38:05.7166882Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7167154Z import torch 2023-01-11T21:38:05.7167399Z import random 2023-01-11T21:38:05.7167713Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7168078Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7168300Z 2023-01-11T21:38:05.7168414Z aten = torch.ops.aten 2023-01-11T21:38:05.7168760Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7169101Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7169296Z 2023-01-11T21:38:05.7169395Z import triton 2023-01-11T21:38:05.7169667Z import triton.language as tl 2023-01-11T21:38:05.7170005Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7170412Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7170632Z 2023-01-11T21:38:05.7170639Z 2023-01-11T21:38:05.7170963Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7171422Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7171868Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7172215Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.7172542Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7172798Z { 2023-01-11T21:38:05.7173053Z for(long i0=0; i0<25; i0+=1) 2023-01-11T21:38:05.7173301Z { 2023-01-11T21:38:05.7173633Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7174007Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.7174343Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.7174804Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7175068Z } 2023-01-11T21:38:05.7175321Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.7175617Z for(long i0=200; i0<200; i0+=1) 2023-01-11T21:38:05.7175854Z { 2023-01-11T21:38:05.7176096Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7176373Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.7176637Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.7176913Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.7177262Z } 2023-01-11T21:38:05.7177472Z } 2023-01-11T21:38:05.7177731Z ''') 2023-01-11T21:38:05.7177867Z 2023-01-11T21:38:05.7177874Z 2023-01-11T21:38:05.7178011Z async_compile.wait(globals()) 2023-01-11T21:38:05.7178277Z del async_compile 2023-01-11T21:38:05.7178437Z 2023-01-11T21:38:05.7178546Z def call(args): 2023-01-11T21:38:05.7178815Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.7179057Z args.clear() 2023-01-11T21:38:05.7179506Z buf0 = empty_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7179942Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.7180298Z del arg0_1 2023-01-11T21:38:05.7180539Z del arg1_1 2023-01-11T21:38:05.7180787Z return (buf0, ) 2023-01-11T21:38:05.7180945Z 2023-01-11T21:38:05.7180952Z 2023-01-11T21:38:05.7181073Z if __name__ == "__main__": 2023-01-11T21:38:05.7181379Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7181850Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7182376Z arg0_1 = rand_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 
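# rand_strided builds each input with an explicit (size, stride) layout,
# so this standalone repro exercises exactly the strides the graph was
# compiled for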
2023-01-11T21:38:05.7182877Z arg1_1 = rand_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7183279Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.7183491Z 2023-01-11T21:38:05.7183594Z ok (1.763s) 2023-01-11T21:38:05.7184197Z test_sign_cpu_only (__main__.CPUReproTests) ... [2023-01-11 21:23:55,926] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:05.7184873Z [2023-01-11 21:23:57,652] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:05.7185158Z 2023-01-11T21:38:05.7185299Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7185584Z import torch 2023-01-11T21:38:05.7185837Z import random 2023-01-11T21:38:05.7186132Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7186497Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7186720Z 2023-01-11T21:38:05.7186839Z aten = torch.ops.aten 2023-01-11T21:38:05.7187174Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7187516Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7187693Z 2023-01-11T21:38:05.7187796Z import triton 2023-01-11T21:38:05.7188057Z import triton.language as tl 2023-01-11T21:38:05.7188396Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7188780Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7189003Z 2023-01-11T21:38:05.7189009Z 2023-01-11T21:38:05.7189304Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7189739Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7190200Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7190537Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7190791Z { 2023-01-11T21:38:05.7191049Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7191315Z { 2023-01-11T21:38:05.7191540Z #pragma omp for 2023-01-11T21:38:05.7191811Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.7192060Z { 2023-01-11T21:38:05.7192366Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7192807Z auto tmp1 = decltype(tmp0)::blendv(decltype(tmp0)(0), decltype(tmp0)(1), decltype(tmp0)(0) < tmp0); 2023-01-11T21:38:05.7193282Z auto tmp2 = decltype(tmp0)::blendv(decltype(tmp0)(0), decltype(tmp0)(1), tmp0 < decltype(tmp0)(0)); 2023-01-11T21:38:05.7193712Z auto tmp3 = tmp1 - tmp2; 2023-01-11T21:38:05.7194000Z tmp3.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7194253Z } 2023-01-11T21:38:05.7194519Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7194816Z for(long i0=16; i0<18; i0+=1) 2023-01-11T21:38:05.7195102Z { 2023-01-11T21:38:05.7195348Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7195618Z auto tmp1 = tmp0 > 0 ? 1 : 0; 2023-01-11T21:38:05.7195905Z auto tmp2 = tmp0 < 0 ? 
1 : 0; 2023-01-11T21:38:05.7196236Z auto tmp3 = tmp1 - tmp2; 2023-01-11T21:38:05.7196501Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.7196743Z } 2023-01-11T21:38:05.7196956Z } 2023-01-11T21:38:05.7197158Z } 2023-01-11T21:38:05.7197392Z ''') 2023-01-11T21:38:05.7197523Z 2023-01-11T21:38:05.7197529Z 2023-01-11T21:38:05.7197654Z async_compile.wait(globals()) 2023-01-11T21:38:05.7197915Z del async_compile 2023-01-11T21:38:05.7198070Z 2023-01-11T21:38:05.7198174Z def call(args): 2023-01-11T21:38:05.7198408Z x_1, = args 2023-01-11T21:38:05.7198640Z args.clear() 2023-01-11T21:38:05.7199042Z buf0 = empty_strided((2, 9), (9, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7199431Z kernel_cpp_0(c_void_p(x_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.7199785Z del x_1 2023-01-11T21:38:05.7200012Z return (buf0, ) 2023-01-11T21:38:05.7200161Z 2023-01-11T21:38:05.7200167Z 2023-01-11T21:38:05.7200274Z if __name__ == "__main__": 2023-01-11T21:38:05.7200577Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7200924Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7201387Z x_1 = rand_strided((2, 9), (9, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7201732Z print_performance(lambda: call([x_1])) 2023-01-11T21:38:05.7201914Z 2023-01-11T21:38:05.7202007Z ok (1.741s) 2023-01-11T21:38:05.7202301Z test_timed_cpu_only (__main__.CPUReproTests) ... ok (0.001s) 2023-01-11T21:38:05.7202968Z test_vec_dynamic_shapes (__main__.CPUReproTests) ... [2023-01-11 21:23:57,802] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 6 2023-01-11T21:38:05.7203647Z [2023-01-11 21:23:59,535] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 6 2023-01-11T21:38:05.7203914Z 2023-01-11T21:38:05.7204045Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7204309Z import torch 2023-01-11T21:38:05.7204545Z import random 2023-01-11T21:38:05.7204844Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7205191Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7205399Z 2023-01-11T21:38:05.7205508Z aten = torch.ops.aten 2023-01-11T21:38:05.7205839Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7206172Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7206344Z 2023-01-11T21:38:05.7206449Z import triton 2023-01-11T21:38:05.7206778Z import triton.language as tl 2023-01-11T21:38:05.7207110Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7207496Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7207725Z 2023-01-11T21:38:05.7207731Z 2023-01-11T21:38:05.7207924Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7208379Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7208821Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.7209162Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7209471Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.7209768Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.7210203Z const long ks0, 2023-01-11T21:38:05.7210470Z const long ks1) 2023-01-11T21:38:05.7210705Z { 2023-01-11T21:38:05.7210949Z auto out_ptr1 = in_out_ptr0; 2023-01-11T21:38:05.7211249Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7211505Z { 2023-01-11T21:38:05.7211742Z #pragma omp for 2023-01-11T21:38:05.7212006Z for(long i0=0; i0::infinity(); 
2023-01-11T21:38:05.7213477Z for(long i1=0; i10); 2023-01-11T21:38:05.7439365Z auto tmp8 = std::cos(tmp7); 2023-01-11T21:38:05.7439658Z auto tmp9 = std::exp(tmp8); 2023-01-11T21:38:05.7439957Z auto tmp10 = std::sqrt(tmp9); 2023-01-11T21:38:05.7440238Z auto tmp11 = tmp10 + tmp0; 2023-01-11T21:38:05.7440601Z auto tmp13 = tmp11 - tmp12; 2023-01-11T21:38:05.7440888Z auto tmp14 = tmp13 * tmp0; 2023-01-11T21:38:05.7441169Z auto tmp15 = tmp14 / tmp0; 2023-01-11T21:38:05.7441457Z auto tmp16 = tmp15 * tmp15; 2023-01-11T21:38:05.7441734Z auto tmp17 = tmp16 * tmp16; 2023-01-11T21:38:05.7442008Z auto tmp18 = tmp17 * tmp15; 2023-01-11T21:38:05.7442302Z auto tmp19 = tmp18 * tmp18; 2023-01-11T21:38:05.7442600Z auto tmp20 = std::log(tmp19); 2023-01-11T21:38:05.7442898Z auto tmp21 = std::floor(tmp20); 2023-01-11T21:38:05.7443211Z auto tmp22 = std::ceil(tmp21); 2023-01-11T21:38:05.7443508Z auto tmp23 = std::trunc(tmp22); 2023-01-11T21:38:05.7443897Z auto tmp24 = std::lgamma(tmp23); 2023-01-11T21:38:05.7444221Z auto tmp25 = std::fmod(tmp24, tmp12); 2023-01-11T21:38:05.7444529Z auto tmp26 = tmp25 > 0 ? 1 : 0; 2023-01-11T21:38:05.7444827Z auto tmp27 = tmp25 < 0 ? 1 : 0; 2023-01-11T21:38:05.7445199Z auto tmp28 = tmp26 - tmp27; 2023-01-11T21:38:05.7445488Z auto tmp29 = tmp28 + tmp12; 2023-01-11T21:38:05.7445772Z out_ptr0[i0] = tmp29; 2023-01-11T21:38:05.7446016Z } 2023-01-11T21:38:05.7446242Z } 2023-01-11T21:38:05.7446467Z } 2023-01-11T21:38:05.7446679Z } 2023-01-11T21:38:05.7446880Z } 2023-01-11T21:38:05.7447112Z ''') 2023-01-11T21:38:05.7447248Z 2023-01-11T21:38:05.7447255Z 2023-01-11T21:38:05.7447385Z async_compile.wait(globals()) 2023-01-11T21:38:05.7447650Z del async_compile 2023-01-11T21:38:05.7447802Z 2023-01-11T21:38:05.7447903Z def call(args): 2023-01-11T21:38:05.7448151Z x1_1, x2_1 = args 2023-01-11T21:38:05.7448378Z args.clear() 2023-01-11T21:38:05.7448793Z buf0 = empty_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7449210Z kernel_cpp_0(c_void_p(x1_1.data_ptr()), c_void_p(x2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.7449529Z del x1_1 2023-01-11T21:38:05.7449755Z del x2_1 2023-01-11T21:38:05.7449987Z return (buf0, ) 2023-01-11T21:38:05.7450133Z 2023-01-11T21:38:05.7450139Z 2023-01-11T21:38:05.7450239Z if __name__ == "__main__": 2023-01-11T21:38:05.7450538Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7450980Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7451457Z x1_1 = rand_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7451924Z x2_1 = rand_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7452273Z print_performance(lambda: call([x1_1, x2_1])) 2023-01-11T21:38:05.7452475Z 2023-01-11T21:38:05.7452482Z 2023-01-11T21:38:05.7452613Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7452880Z import torch 2023-01-11T21:38:05.7453129Z import random 2023-01-11T21:38:05.7453423Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7453769Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7453974Z 2023-01-11T21:38:05.7454085Z aten = torch.ops.aten 2023-01-11T21:38:05.7454417Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7454928Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7455098Z 2023-01-11T21:38:05.7455200Z import triton 2023-01-11T21:38:05.7455463Z import triton.language as tl 2023-01-11T21:38:05.7455807Z from torch._inductor.triton_ops.autotune 
import grid 2023-01-11T21:38:05.7456182Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7456405Z 2023-01-11T21:38:05.7456411Z 2023-01-11T21:38:05.7456607Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7457042Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7457600Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7457931Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.7458247Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7458515Z { 2023-01-11T21:38:05.7458761Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7459025Z { 2023-01-11T21:38:05.7459263Z #pragma omp for 2023-01-11T21:38:05.7459520Z for(long i0=0; i0<25; i0+=1) 2023-01-11T21:38:05.7459773Z { 2023-01-11T21:38:05.7460085Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7460468Z auto tmp11 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.7460803Z auto tmp1 = tmp0.abs(); 2023-01-11T21:38:05.7461085Z auto tmp2 = tmp1.sin(); 2023-01-11T21:38:05.7461437Z auto tmp3 = tmp2.neg(); 2023-01-11T21:38:05.7461728Z auto tmp4 = tmp3 * tmp3; 2023-01-11T21:38:05.7462080Z auto tmp5 = decltype(tmp4)(1)/(decltype(tmp4)(1) + tmp4.neg().exp()); 2023-01-11T21:38:05.7462461Z auto tmp6 = at::vec::clamp_min(tmp5, decltype(tmp5)(0)); 2023-01-11T21:38:05.7462780Z auto tmp7 = tmp6.cos(); 2023-01-11T21:38:05.7463064Z auto tmp8 = tmp7.exp(); 2023-01-11T21:38:05.7463341Z auto tmp9 = tmp8.sqrt(); 2023-01-11T21:38:05.7463617Z auto tmp10 = tmp9 + tmp0; 2023-01-11T21:38:05.7463817Z auto tmp12 = tmp10 - tmp11; 2023-01-11T21:38:05.7463946Z auto tmp13 = tmp12 * tmp0; 2023-01-11T21:38:05.7464065Z auto tmp14 = tmp13 / tmp0; 2023-01-11T21:38:05.7464178Z auto tmp15 = tmp14 * tmp14; 2023-01-11T21:38:05.7464293Z auto tmp16 = tmp15 * tmp15; 2023-01-11T21:38:05.7464411Z auto tmp17 = tmp16 * tmp14; 2023-01-11T21:38:05.7464531Z auto tmp18 = tmp17 * tmp17; 2023-01-11T21:38:05.7464651Z auto tmp19 = tmp18.log(); 2023-01-11T21:38:05.7464776Z auto tmp20 = tmp19.floor(); 2023-01-11T21:38:05.7464896Z auto tmp21 = tmp20.ceil(); 2023-01-11T21:38:05.7465017Z auto tmp22 = tmp21.trunc(); 2023-01-11T21:38:05.7465142Z auto tmp23 = tmp22.lgamma(); 2023-01-11T21:38:05.7465276Z auto tmp24 = tmp23.fmod(tmp11); 2023-01-11T21:38:05.7465488Z auto tmp25 = decltype(tmp24)::blendv(decltype(tmp24)(0), decltype(tmp24)(1), decltype(tmp24)(0) < tmp24); 2023-01-11T21:38:05.7465717Z auto tmp26 = decltype(tmp24)::blendv(decltype(tmp24)(0), decltype(tmp24)(1), tmp24 < decltype(tmp24)(0)); 2023-01-11T21:38:05.7465986Z auto tmp27 = tmp25 - tmp26; 2023-01-11T21:38:05.7466109Z auto tmp28 = tmp27 + tmp11; 2023-01-11T21:38:05.7466239Z tmp28.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7466334Z } 2023-01-11T21:38:05.7466475Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7466596Z for(long i0=200; i0<200; i0+=1) 2023-01-11T21:38:05.7466681Z { 2023-01-11T21:38:05.7466803Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7466930Z auto tmp12 = in_ptr1[i0]; 2023-01-11T21:38:05.7467064Z auto tmp1 = std::abs(tmp0); 2023-01-11T21:38:05.7467186Z auto tmp2 = std::sin(tmp1); 2023-01-11T21:38:05.7467362Z auto tmp3 = -tmp2; 2023-01-11T21:38:05.7467483Z auto tmp4 = tmp3 * tmp3; 2023-01-11T21:38:05.7467659Z auto tmp5 = std::exp(-tmp4); 2023-01-11T21:38:05.7467783Z auto tmp6 = 1 / (1 + tmp5); 2023-01-11T21:38:05.7467904Z auto tmp7 = tmp6 * (tmp6>0); 2023-01-11T21:38:05.7468028Z auto tmp8 = std::cos(tmp7); 
2023-01-11T21:38:05.7468150Z auto tmp9 = std::exp(tmp8); 2023-01-11T21:38:05.7468285Z auto tmp10 = std::sqrt(tmp9); 2023-01-11T21:38:05.7468405Z auto tmp11 = tmp10 + tmp0; 2023-01-11T21:38:05.7468574Z auto tmp13 = tmp11 - tmp12; 2023-01-11T21:38:05.7468692Z auto tmp14 = tmp13 * tmp0; 2023-01-11T21:38:05.7468809Z auto tmp15 = tmp14 / tmp0; 2023-01-11T21:38:05.7468926Z auto tmp16 = tmp15 * tmp15; 2023-01-11T21:38:05.7469046Z auto tmp17 = tmp16 * tmp16; 2023-01-11T21:38:05.7469163Z auto tmp18 = tmp17 * tmp15; 2023-01-11T21:38:05.7469284Z auto tmp19 = tmp18 * tmp18; 2023-01-11T21:38:05.7469404Z auto tmp20 = std::log(tmp19); 2023-01-11T21:38:05.7469536Z auto tmp21 = std::floor(tmp20); 2023-01-11T21:38:05.7469675Z auto tmp22 = std::ceil(tmp21); 2023-01-11T21:38:05.7469807Z auto tmp23 = std::trunc(tmp22); 2023-01-11T21:38:05.7469945Z auto tmp24 = std::lgamma(tmp23); 2023-01-11T21:38:05.7470084Z auto tmp25 = std::fmod(tmp24, tmp12); 2023-01-11T21:38:05.7470218Z auto tmp26 = tmp25 > 0 ? 1 : 0; 2023-01-11T21:38:05.7470397Z auto tmp27 = tmp25 < 0 ? 1 : 0; 2023-01-11T21:38:05.7470580Z auto tmp28 = tmp26 - tmp27; 2023-01-11T21:38:05.7470699Z auto tmp29 = tmp28 + tmp12; 2023-01-11T21:38:05.7470814Z out_ptr0[i0] = tmp29; 2023-01-11T21:38:05.7470903Z } 2023-01-11T21:38:05.7470987Z } 2023-01-11T21:38:05.7471077Z } 2023-01-11T21:38:05.7471189Z ''') 2023-01-11T21:38:05.7471208Z 2023-01-11T21:38:05.7471213Z 2023-01-11T21:38:05.7471325Z async_compile.wait(globals()) 2023-01-11T21:38:05.7471433Z del async_compile 2023-01-11T21:38:05.7471441Z 2023-01-11T21:38:05.7471540Z def call(args): 2023-01-11T21:38:05.7471647Z x1_1, x2_1 = args 2023-01-11T21:38:05.7471755Z args.clear() 2023-01-11T21:38:05.7472017Z buf0 = empty_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7472235Z kernel_cpp_0(c_void_p(x1_1.data_ptr()), c_void_p(x2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.7472324Z del x1_1 2023-01-11T21:38:05.7472422Z del x2_1 2023-01-11T21:38:05.7472514Z return (buf0, ) 2023-01-11T21:38:05.7472521Z 2023-01-11T21:38:05.7472526Z 2023-01-11T21:38:05.7472635Z if __name__ == "__main__": 2023-01-11T21:38:05.7472788Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7472959Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7473230Z x1_1 = rand_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7473498Z x2_1 = rand_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7473641Z print_performance(lambda: call([x1_1, x2_1])) 2023-01-11T21:38:05.7473704Z 2023-01-11T21:38:05.7474059Z [2023-01-11 21:24:04,210] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:05.7474404Z [2023-01-11 21:24:04,366] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:05.7474755Z [2023-01-11 21:24:06,125] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:05.7474763Z 2023-01-11T21:38:05.7474905Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7475003Z import torch 2023-01-11T21:38:05.7475105Z import random 2023-01-11T21:38:05.7475268Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7475422Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7475443Z 2023-01-11T21:38:05.7475547Z aten = torch.ops.aten 2023-01-11T21:38:05.7475737Z assert_size_stride = 
torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7475868Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7475878Z 2023-01-11T21:38:05.7475975Z import triton 2023-01-11T21:38:05.7476099Z import triton.language as tl 2023-01-11T21:38:05.7476264Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7476456Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7476464Z 2023-01-11T21:38:05.7476474Z 2023-01-11T21:38:05.7476647Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7476916Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7477073Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7477208Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.7477342Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7477429Z { 2023-01-11T21:38:05.7477566Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7477658Z { 2023-01-11T21:38:05.7477759Z #pragma omp for 2023-01-11T21:38:05.7477869Z for(long i0=0; i0<20; i0+=1) 2023-01-11T21:38:05.7477960Z { 2023-01-11T21:38:05.7478078Z #pragma GCC ivdep 2023-01-11T21:38:05.7478191Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:05.7478283Z { 2023-01-11T21:38:05.7478364Z { 2023-01-11T21:38:05.7478519Z { 2023-01-11T21:38:05.7478673Z auto tmp0 = in_ptr0[i0 + (20*i1)]; 2023-01-11T21:38:05.7478815Z auto tmp12 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:05.7478953Z auto tmp1 = std::abs(tmp0); 2023-01-11T21:38:05.7479083Z auto tmp2 = std::sin(tmp1); 2023-01-11T21:38:05.7479269Z auto tmp3 = -tmp2; 2023-01-11T21:38:05.7479404Z auto tmp4 = tmp3 * tmp3; 2023-01-11T21:38:05.7479599Z auto tmp5 = std::exp(-tmp4); 2023-01-11T21:38:05.7479729Z auto tmp6 = 1 / (1 + tmp5); 2023-01-11T21:38:05.7479865Z auto tmp7 = tmp6 * (tmp6>0); 2023-01-11T21:38:05.7480001Z auto tmp8 = std::cos(tmp7); 2023-01-11T21:38:05.7480133Z auto tmp9 = std::exp(tmp8); 2023-01-11T21:38:05.7480279Z auto tmp10 = std::sqrt(tmp9); 2023-01-11T21:38:05.7480418Z auto tmp11 = tmp10 + tmp0; 2023-01-11T21:38:05.7480611Z auto tmp13 = tmp11 - tmp12; 2023-01-11T21:38:05.7480736Z auto tmp14 = tmp13 * tmp0; 2023-01-11T21:38:05.7480872Z auto tmp15 = tmp14 / tmp0; 2023-01-11T21:38:05.7481004Z auto tmp16 = tmp15 * tmp15; 2023-01-11T21:38:05.7481128Z auto tmp17 = tmp16 * tmp16; 2023-01-11T21:38:05.7481262Z auto tmp18 = tmp17 * tmp15; 2023-01-11T21:38:05.7481389Z auto tmp19 = tmp18 * tmp18; 2023-01-11T21:38:05.7481525Z auto tmp20 = std::log(tmp19); 2023-01-11T21:38:05.7481715Z auto tmp21 = std::floor(tmp20); 2023-01-11T21:38:05.7481864Z auto tmp22 = std::ceil(tmp21); 2023-01-11T21:38:05.7482008Z auto tmp23 = std::trunc(tmp22); 2023-01-11T21:38:05.7482156Z auto tmp24 = std::lgamma(tmp23); 2023-01-11T21:38:05.7482310Z auto tmp25 = std::fmod(tmp24, tmp12); 2023-01-11T21:38:05.7482442Z auto tmp26 = tmp25 > 0 ? 1 : 0; 2023-01-11T21:38:05.7482575Z auto tmp27 = tmp25 < 0 ? 
1 : 0; 2023-01-11T21:38:05.7482780Z auto tmp28 = tmp26 - tmp27; 2023-01-11T21:38:05.7482900Z auto tmp29 = tmp28 + tmp12; 2023-01-11T21:38:05.7483027Z out_ptr0[i0 + (20*i1)] = tmp29; 2023-01-11T21:38:05.7483122Z } 2023-01-11T21:38:05.7483217Z } 2023-01-11T21:38:05.7483307Z } 2023-01-11T21:38:05.7483400Z } 2023-01-11T21:38:05.7483490Z } 2023-01-11T21:38:05.7483570Z } 2023-01-11T21:38:05.7483687Z ''') 2023-01-11T21:38:05.7483695Z 2023-01-11T21:38:05.7483702Z 2023-01-11T21:38:05.7483829Z async_compile.wait(globals()) 2023-01-11T21:38:05.7483938Z del async_compile 2023-01-11T21:38:05.7483944Z 2023-01-11T21:38:05.7484049Z def call(args): 2023-01-11T21:38:05.7484148Z x1_1, x2_1 = args 2023-01-11T21:38:05.7484250Z args.clear() 2023-01-11T21:38:05.7484508Z buf0 = empty_strided((20, 10), (1, 20), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7484721Z kernel_cpp_0(c_void_p(x1_1.data_ptr()), c_void_p(x2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.7484819Z del x1_1 2023-01-11T21:38:05.7484929Z del x2_1 2023-01-11T21:38:05.7485044Z return (buf0, ) 2023-01-11T21:38:05.7485052Z 2023-01-11T21:38:05.7485060Z 2023-01-11T21:38:05.7485186Z if __name__ == "__main__": 2023-01-11T21:38:05.7485347Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7485526Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7485794Z x1_1 = rand_strided((20, 10), (1, 20), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7486067Z x2_1 = rand_strided((20, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7486276Z print_performance(lambda: call([x1_1, x2_1])) 2023-01-11T21:38:05.7486283Z 2023-01-11T21:38:05.7486288Z 2023-01-11T21:38:05.7486418Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7486517Z import torch 2023-01-11T21:38:05.7486624Z import random 2023-01-11T21:38:05.7486788Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7486950Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7486956Z 2023-01-11T21:38:05.7487050Z aten = torch.ops.aten 2023-01-11T21:38:05.7487227Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7487351Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7487362Z 2023-01-11T21:38:05.7487463Z import triton 2023-01-11T21:38:05.7487586Z import triton.language as tl 2023-01-11T21:38:05.7487741Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7487925Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7487932Z 2023-01-11T21:38:05.7487938Z 2023-01-11T21:38:05.7488140Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7488397Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7488564Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7488715Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.7488853Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7488946Z { 2023-01-11T21:38:05.7489083Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7489175Z { 2023-01-11T21:38:05.7489278Z #pragma omp for 2023-01-11T21:38:05.7489394Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.7489546Z { 2023-01-11T21:38:05.7489733Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7489914Z auto tmp11 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.7490032Z auto tmp1 = tmp0.abs(); 2023-01-11T21:38:05.7490151Z auto tmp2 = tmp1.sin(); 
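// sigmoid is lowered to 1/(1+exp(-x)) and relu to clamp_min(x, 0) on the
// vector type -- see tmp5 and tmp6 just below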
2023-01-11T21:38:05.7490257Z auto tmp3 = tmp2.neg(); 2023-01-11T21:38:05.7490382Z auto tmp4 = tmp3 * tmp3; 2023-01-11T21:38:05.7490563Z auto tmp5 = decltype(tmp4)(1)/(decltype(tmp4)(1) + tmp4.neg().exp()); 2023-01-11T21:38:05.7490740Z auto tmp6 = at::vec::clamp_min(tmp5, decltype(tmp5)(0)); 2023-01-11T21:38:05.7490861Z auto tmp7 = tmp6.cos(); 2023-01-11T21:38:05.7490975Z auto tmp8 = tmp7.exp(); 2023-01-11T21:38:05.7491093Z auto tmp9 = tmp8.sqrt(); 2023-01-11T21:38:05.7491216Z auto tmp10 = tmp9 + tmp0; 2023-01-11T21:38:05.7491399Z auto tmp12 = tmp10 - tmp11; 2023-01-11T21:38:05.7491525Z auto tmp13 = tmp12 * tmp0; 2023-01-11T21:38:05.7491646Z auto tmp14 = tmp13 / tmp0; 2023-01-11T21:38:05.7491772Z auto tmp15 = tmp14 * tmp14; 2023-01-11T21:38:05.7491890Z auto tmp16 = tmp15 * tmp15; 2023-01-11T21:38:05.7492010Z auto tmp17 = tmp16 * tmp14; 2023-01-11T21:38:05.7492131Z auto tmp18 = tmp17 * tmp17; 2023-01-11T21:38:05.7492241Z auto tmp19 = tmp18.log(); 2023-01-11T21:38:05.7492362Z auto tmp20 = tmp19.floor(); 2023-01-11T21:38:05.7492481Z auto tmp21 = tmp20.ceil(); 2023-01-11T21:38:05.7492600Z auto tmp22 = tmp21.trunc(); 2023-01-11T21:38:05.7492730Z auto tmp23 = tmp22.lgamma(); 2023-01-11T21:38:05.7492858Z auto tmp24 = tmp23.fmod(tmp11); 2023-01-11T21:38:05.7493084Z auto tmp25 = decltype(tmp24)::blendv(decltype(tmp24)(0), decltype(tmp24)(1), decltype(tmp24)(0) < tmp24); 2023-01-11T21:38:05.7493317Z auto tmp26 = decltype(tmp24)::blendv(decltype(tmp24)(0), decltype(tmp24)(1), tmp24 < decltype(tmp24)(0)); 2023-01-11T21:38:05.7493493Z auto tmp27 = tmp25 - tmp26; 2023-01-11T21:38:05.7493617Z auto tmp28 = tmp27 + tmp11; 2023-01-11T21:38:05.7493737Z tmp28.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7493881Z } 2023-01-11T21:38:05.7494018Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7494132Z for(long i0=64; i0<70; i0+=1) 2023-01-11T21:38:05.7494225Z { 2023-01-11T21:38:05.7494338Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7494456Z auto tmp12 = in_ptr1[i0]; 2023-01-11T21:38:05.7655216Z auto tmp1 = std::abs(tmp0); 2023-01-11T21:38:05.7655720Z auto tmp2 = std::sin(tmp1); 2023-01-11T21:38:05.7655975Z auto tmp3 = -tmp2; 2023-01-11T21:38:05.7656127Z auto tmp4 = tmp3 * tmp3; 2023-01-11T21:38:05.7686671Z auto tmp5 = std::exp(-tmp4); 2023-01-11T21:38:05.7686804Z auto tmp6 = 1 / (1 + tmp5); 2023-01-11T21:38:05.7686952Z auto tmp7 = tmp6 * (tmp6>0); 2023-01-11T21:38:05.7687108Z auto tmp8 = std::cos(tmp7); 2023-01-11T21:38:05.7687728Z auto tmp9 = std::exp(tmp8); 2023-01-11T21:38:05.7688116Z auto tmp10 = std::sqrt(tmp9); 2023-01-11T21:38:05.7688264Z auto tmp11 = tmp10 + tmp0; 2023-01-11T21:38:05.7688437Z auto tmp13 = tmp11 - tmp12; 2023-01-11T21:38:05.7688531Z auto tmp14 = tmp13 * tmp0; 2023-01-11T21:38:05.7688644Z auto tmp15 = tmp14 / tmp0; 2023-01-11T21:38:05.7688755Z auto tmp16 = tmp15 * tmp15; 2023-01-11T21:38:05.7688867Z auto tmp17 = tmp16 * tmp16; 2023-01-11T21:38:05.7688981Z auto tmp18 = tmp17 * tmp15; 2023-01-11T21:38:05.7689200Z auto tmp19 = tmp18 * tmp18; 2023-01-11T21:38:05.7689393Z auto tmp20 = std::log(tmp19); 2023-01-11T21:38:05.7689530Z auto tmp21 = std::floor(tmp20); 2023-01-11T21:38:05.7689810Z auto tmp22 = std::ceil(tmp21); 2023-01-11T21:38:05.7689973Z auto tmp23 = std::trunc(tmp22); 2023-01-11T21:38:05.7690136Z auto tmp24 = std::lgamma(tmp23); 2023-01-11T21:38:05.7690334Z auto tmp25 = std::fmod(tmp24, tmp12); 2023-01-11T21:38:05.7690499Z auto tmp26 = tmp25 > 0 ? 1 : 0; 2023-01-11T21:38:05.7690652Z auto tmp27 = tmp25 < 0 ? 
1 : 0; 2023-01-11T21:38:05.7690894Z auto tmp28 = tmp26 - tmp27; 2023-01-11T21:38:05.7691014Z auto tmp29 = tmp28 + tmp12; 2023-01-11T21:38:05.7691184Z out_ptr0[i0] = tmp29; 2023-01-11T21:38:05.7691307Z } 2023-01-11T21:38:05.7795398Z } 2023-01-11T21:38:05.7795607Z } 2023-01-11T21:38:05.7795826Z ''') 2023-01-11T21:38:05.7795836Z 2023-01-11T21:38:05.7795842Z 2023-01-11T21:38:05.7796002Z async_compile.wait(globals()) 2023-01-11T21:38:05.7796121Z del async_compile 2023-01-11T21:38:05.7796128Z 2023-01-11T21:38:05.7796254Z def call(args): 2023-01-11T21:38:05.7796362Z x1_1, x2_1 = args 2023-01-11T21:38:05.7796470Z args.clear() 2023-01-11T21:38:05.7796795Z buf0 = empty_strided((10, 7), (7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7797029Z kernel_cpp_0(c_void_p(x1_1.data_ptr()), c_void_p(x2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.7797137Z del x1_1 2023-01-11T21:38:05.7797225Z del x2_1 2023-01-11T21:38:05.7797325Z return (buf0, ) 2023-01-11T21:38:05.7797333Z 2023-01-11T21:38:05.7797339Z 2023-01-11T21:38:05.7797451Z if __name__ == "__main__": 2023-01-11T21:38:05.7797617Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7797795Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7798067Z x1_1 = rand_strided((10, 7), (7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7798331Z x2_1 = rand_strided((10, 7), (7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7798478Z print_performance(lambda: call([x1_1, x2_1])) 2023-01-11T21:38:05.7798489Z 2023-01-11T21:38:05.7798578Z ok (6.587s) 2023-01-11T21:38:05.7799478Z test_abs_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.7799664Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.7800025Z [2023-01-11 21:24:06,145] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 7 2023-01-11T21:38:05.7800400Z [2023-01-11 21:24:07,866] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 7 2023-01-11T21:38:05.7800408Z 2023-01-11T21:38:05.7800535Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7800642Z import torch 2023-01-11T21:38:05.7800746Z import random 2023-01-11T21:38:05.7800910Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7801074Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7801081Z 2023-01-11T21:38:05.7801194Z aten = torch.ops.aten 2023-01-11T21:38:05.7801380Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7801513Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7801520Z 2023-01-11T21:38:05.7801622Z import triton 2023-01-11T21:38:05.7801749Z import triton.language as tl 2023-01-11T21:38:05.7801922Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7802098Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7802106Z 2023-01-11T21:38:05.7802112Z 2023-01-11T21:38:05.7802299Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7802569Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7802821Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7802965Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7803059Z { 2023-01-11T21:38:05.7803204Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7803300Z { 2023-01-11T21:38:05.7803405Z #pragma omp for 2023-01-11T21:38:05.7803530Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.7803620Z { 2023-01-11T21:38:05.7803832Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7803961Z auto tmp1 = tmp0.abs(); 2023-01-11T21:38:05.7804152Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.7804281Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.7804397Z auto tmp4 = tmp0 / tmp3; 2023-01-11T21:38:05.7804518Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7804611Z } 2023-01-11T21:38:05.7804746Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7804874Z for(long i0=16; i0<17; i0+=1) 2023-01-11T21:38:05.7804968Z { 2023-01-11T21:38:05.7805085Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7805208Z auto tmp1 = std::abs(tmp0); 2023-01-11T21:38:05.7805343Z auto tmp2 = static_cast<float>(1); 2023-01-11T21:38:05.7805472Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.7805593Z auto tmp4 = tmp0 / tmp3; 2023-01-11T21:38:05.7805709Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:05.7805797Z } 2023-01-11T21:38:05.7805886Z } 2023-01-11T21:38:05.7805970Z } 2023-01-11T21:38:05.7806102Z ''') 2023-01-11T21:38:05.7806110Z 2023-01-11T21:38:05.7806116Z 2023-01-11T21:38:05.7806246Z async_compile.wait(globals()) 2023-01-11T21:38:05.7806356Z del async_compile 2023-01-11T21:38:05.7806363Z 2023-01-11T21:38:05.7806464Z def call(args): 2023-01-11T21:38:05.7806567Z arg0_1, = args 2023-01-11T21:38:05.7806666Z args.clear() 2023-01-11T21:38:05.7806944Z buf0 = empty_strided((17, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7807127Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
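# Annotation: the generated wrapper drives the compiled kernel through ctypes -- each tensor
# is handed over as a raw c_void_p data pointer, and shape, stride, and dtype were frozen at
# compile time, so the only Python work left in call() is buffer allocation and pointer plumbing.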
2023-01-11T21:38:05.7807229Z del arg0_1 2023-01-11T21:38:05.7807328Z return (buf0, ) 2023-01-11T21:38:05.7807336Z 2023-01-11T21:38:05.7807342Z 2023-01-11T21:38:05.7807508Z if __name__ == "__main__": 2023-01-11T21:38:05.7807680Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7807861Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7808130Z arg0_1 = rand_strided((17, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7808286Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.7808293Z 2023-01-11T21:38:05.7808383Z ok (1.743s) 2023-01-11T21:38:05.7809010Z test_adaptive_avg_pool2d1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.7809196Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.7809552Z [2023-01-11 21:24:07,895] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 8 2023-01-11T21:38:05.7809880Z [2023-01-11 21:24:07,904] torch._inductor.ir: [WARNING] Using FallbackKernel: aten._adaptive_avg_pool2d 2023-01-11T21:38:05.7809888Z 2023-01-11T21:38:05.7810038Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7810141Z import torch 2023-01-11T21:38:05.7810245Z import random 2023-01-11T21:38:05.7810409Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7810573Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7810581Z 2023-01-11T21:38:05.7810699Z aten = torch.ops.aten 2023-01-11T21:38:05.7810932Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7811062Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7811069Z 2023-01-11T21:38:05.7811171Z import triton 2023-01-11T21:38:05.7811299Z import triton.language as tl 2023-01-11T21:38:05.7811471Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7811656Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7811664Z 2023-01-11T21:38:05.7811671Z 2023-01-11T21:38:05.7811855Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7812141Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7812310Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7812452Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.7812592Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.7812681Z { 2023-01-11T21:38:05.7812822Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7812915Z { 2023-01-11T21:38:05.7813019Z #pragma omp for 2023-01-11T21:38:05.7813132Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.7813223Z { 2023-01-11T21:38:05.7813342Z #pragma GCC ivdep 2023-01-11T21:38:05.7813464Z for(long i1=0; i1<6; i1+=1) 2023-01-11T21:38:05.7813563Z { 2023-01-11T21:38:05.7813674Z #pragma GCC ivdep 2023-01-11T21:38:05.7813803Z for(long i2=0; i2<6; i2+=1) 2023-01-11T21:38:05.7813896Z { 2023-01-11T21:38:05.7813991Z { 2023-01-11T21:38:05.7814093Z { 2023-01-11T21:38:05.7814260Z auto tmp0 = static_cast(((8*i1) / 3)); 2023-01-11T21:38:05.7814427Z auto tmp1 = static_cast(((21 + (16*i1)) / 6)); 2023-01-11T21:38:05.7814794Z auto tmp2 = tmp0 < tmp1; 
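// Annotation: adaptive-pool window bounds for the 16x16 -> 6x6 case. tmp0 = floor(16*i1/6)
// is the first input row feeding output row i1, tmp1 = ceil(16*(i1+1)/6) is one past its last
// row, and tmp2 guards row validity; tmp3..tmp5 repeat the same computation for columns.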
2023-01-11T21:38:05.7814971Z auto tmp3 = static_cast(((8*i2) / 3)); 2023-01-11T21:38:05.7815166Z auto tmp4 = static_cast(((21 + (16*i2)) / 6)); 2023-01-11T21:38:05.7815316Z auto tmp5 = tmp3 < tmp4; 2023-01-11T21:38:05.7815449Z auto tmp6 = tmp2 & tmp5; 2023-01-11T21:38:05.7815642Z float tmp7 = 0.0; 2023-01-11T21:38:05.7815757Z if(tmp6) 2023-01-11T21:38:05.7815859Z { 2023-01-11T21:38:05.7816017Z auto tmp8 = in_ptr0[(16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7816145Z tmp7 = tmp8; 2023-01-11T21:38:05.7816245Z } 2023-01-11T21:38:05.7816404Z auto tmp9 = static_cast(1 + (((8*i2) / 3))); 2023-01-11T21:38:05.7816542Z auto tmp10 = tmp9 < tmp4; 2023-01-11T21:38:05.7816681Z auto tmp11 = tmp2 & tmp10; 2023-01-11T21:38:05.7816808Z float tmp12 = 0.0; 2023-01-11T21:38:05.7816912Z if(tmp11) 2023-01-11T21:38:05.7817019Z { 2023-01-11T21:38:05.7817280Z auto tmp13 = in_ptr0[1 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7817412Z tmp12 = tmp13; 2023-01-11T21:38:05.7817517Z } 2023-01-11T21:38:05.7817662Z auto tmp14 = tmp12 + tmp7; 2023-01-11T21:38:05.7817833Z auto tmp15 = static_cast(2 + (((8*i2) / 3))); 2023-01-11T21:38:05.7817969Z auto tmp16 = tmp15 < tmp4; 2023-01-11T21:38:05.7818098Z auto tmp17 = tmp2 & tmp16; 2023-01-11T21:38:05.7818228Z float tmp18 = 0.0; 2023-01-11T21:38:05.7818343Z if(tmp17) 2023-01-11T21:38:05.7818534Z { 2023-01-11T21:38:05.7818705Z auto tmp19 = in_ptr0[2 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7818836Z tmp18 = tmp19; 2023-01-11T21:38:05.7818940Z } 2023-01-11T21:38:05.7819077Z auto tmp20 = tmp18 + tmp14; 2023-01-11T21:38:05.7819235Z auto tmp21 = static_cast(3 + (((8*i2) / 3))); 2023-01-11T21:38:05.7819372Z auto tmp22 = tmp21 < tmp4; 2023-01-11T21:38:05.7819508Z auto tmp23 = tmp2 & tmp22; 2023-01-11T21:38:05.7819634Z float tmp24 = 0.0; 2023-01-11T21:38:05.7819750Z if(tmp23) 2023-01-11T21:38:05.7819853Z { 2023-01-11T21:38:05.7820025Z auto tmp25 = in_ptr0[3 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7820142Z tmp24 = tmp25; 2023-01-11T21:38:05.7820250Z } 2023-01-11T21:38:05.7820391Z auto tmp26 = tmp24 + tmp20; 2023-01-11T21:38:05.7820556Z auto tmp27 = static_cast(1 + (((8*i1) / 3))); 2023-01-11T21:38:05.7820690Z auto tmp28 = tmp27 < tmp1; 2023-01-11T21:38:05.7820828Z auto tmp29 = tmp28 & tmp5; 2023-01-11T21:38:05.7820951Z float tmp30 = 0.0; 2023-01-11T21:38:05.7821053Z if(tmp29) 2023-01-11T21:38:05.7821152Z { 2023-01-11T21:38:05.7821318Z auto tmp31 = in_ptr0[16 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7821443Z tmp30 = tmp31; 2023-01-11T21:38:05.7821549Z } 2023-01-11T21:38:05.7821678Z auto tmp32 = tmp30 + tmp26; 2023-01-11T21:38:05.7821818Z auto tmp33 = tmp28 & tmp10; 2023-01-11T21:38:05.7821944Z float tmp34 = 0.0; 2023-01-11T21:38:05.7822047Z if(tmp33) 2023-01-11T21:38:05.7822153Z { 2023-01-11T21:38:05.7822420Z auto tmp35 = in_ptr0[17 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7822548Z tmp34 = tmp35; 2023-01-11T21:38:05.7822651Z } 2023-01-11T21:38:05.7822786Z auto tmp36 = tmp34 + tmp32; 2023-01-11T21:38:05.7822917Z auto tmp37 = tmp28 & tmp16; 2023-01-11T21:38:05.7823033Z float tmp38 = 0.0; 2023-01-11T21:38:05.7823145Z if(tmp37) 2023-01-11T21:38:05.7823243Z { 2023-01-11T21:38:05.7823414Z auto tmp39 = in_ptr0[18 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7823548Z tmp38 = tmp39; 2023-01-11T21:38:05.7823652Z } 2023-01-11T21:38:05.7823785Z auto tmp40 = tmp38 + tmp36; 2023-01-11T21:38:05.7823923Z 
auto tmp41 = tmp28 & tmp22; 2023-01-11T21:38:05.7824039Z float tmp42 = 0.0; 2023-01-11T21:38:05.7824146Z if(tmp41) 2023-01-11T21:38:05.7824248Z { 2023-01-11T21:38:05.7824423Z auto tmp43 = in_ptr0[19 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7824542Z tmp42 = tmp43; 2023-01-11T21:38:05.7824640Z } 2023-01-11T21:38:05.7824773Z auto tmp44 = tmp42 + tmp40; 2023-01-11T21:38:05.7824928Z auto tmp45 = static_cast(2 + (((8*i1) / 3))); 2023-01-11T21:38:05.7825100Z auto tmp46 = tmp45 < tmp1; 2023-01-11T21:38:05.7825240Z auto tmp47 = tmp46 & tmp5; 2023-01-11T21:38:05.7825366Z float tmp48 = 0.0; 2023-01-11T21:38:05.7825479Z if(tmp47) 2023-01-11T21:38:05.7825592Z { 2023-01-11T21:38:05.7825769Z auto tmp49 = in_ptr0[32 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7825891Z tmp48 = tmp49; 2023-01-11T21:38:05.7825986Z } 2023-01-11T21:38:05.7826128Z auto tmp50 = tmp48 + tmp44; 2023-01-11T21:38:05.7826268Z auto tmp51 = tmp46 & tmp10; 2023-01-11T21:38:05.7826396Z float tmp52 = 0.0; 2023-01-11T21:38:05.7826515Z if(tmp51) 2023-01-11T21:38:05.7826618Z { 2023-01-11T21:38:05.7826792Z auto tmp53 = in_ptr0[33 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7826916Z tmp52 = tmp53; 2023-01-11T21:38:05.7827011Z } 2023-01-11T21:38:05.7827147Z auto tmp54 = tmp52 + tmp50; 2023-01-11T21:38:05.7827283Z auto tmp55 = tmp46 & tmp16; 2023-01-11T21:38:05.7827399Z float tmp56 = 0.0; 2023-01-11T21:38:05.7827508Z if(tmp55) 2023-01-11T21:38:05.7827601Z { 2023-01-11T21:38:05.7827770Z auto tmp57 = in_ptr0[34 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7827892Z tmp56 = tmp57; 2023-01-11T21:38:05.7827990Z } 2023-01-11T21:38:05.7828127Z auto tmp58 = tmp56 + tmp54; 2023-01-11T21:38:05.7828268Z auto tmp59 = tmp46 & tmp22; 2023-01-11T21:38:05.7828396Z float tmp60 = 0.0; 2023-01-11T21:38:05.7828508Z if(tmp59) 2023-01-11T21:38:05.7828598Z { 2023-01-11T21:38:05.7828810Z auto tmp61 = in_ptr0[35 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7828935Z tmp60 = tmp61; 2023-01-11T21:38:05.7829031Z } 2023-01-11T21:38:05.7829164Z auto tmp62 = tmp60 + tmp58; 2023-01-11T21:38:05.7829315Z auto tmp63 = static_cast(3 + (((8*i1) / 3))); 2023-01-11T21:38:05.7829446Z auto tmp64 = tmp63 < tmp1; 2023-01-11T21:38:05.7829569Z auto tmp65 = tmp64 & tmp5; 2023-01-11T21:38:05.7829696Z float tmp66 = 0.0; 2023-01-11T21:38:05.7829815Z if(tmp65) 2023-01-11T21:38:05.7829922Z { 2023-01-11T21:38:05.7830093Z auto tmp67 = in_ptr0[48 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7830210Z tmp66 = tmp67; 2023-01-11T21:38:05.7830313Z } 2023-01-11T21:38:05.7830450Z auto tmp68 = tmp66 + tmp62; 2023-01-11T21:38:05.7830577Z auto tmp69 = tmp64 & tmp10; 2023-01-11T21:38:05.7830706Z float tmp70 = 0.0; 2023-01-11T21:38:05.7830812Z if(tmp69) 2023-01-11T21:38:05.7830914Z { 2023-01-11T21:38:05.7831087Z auto tmp71 = in_ptr0[49 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7831209Z tmp70 = tmp71; 2023-01-11T21:38:05.7831314Z } 2023-01-11T21:38:05.7831482Z auto tmp72 = tmp70 + tmp68; 2023-01-11T21:38:05.7831616Z auto tmp73 = tmp64 & tmp16; 2023-01-11T21:38:05.7831746Z float tmp74 = 0.0; 2023-01-11T21:38:05.7831858Z if(tmp73) 2023-01-11T21:38:05.7831961Z { 2023-01-11T21:38:05.7832137Z auto tmp75 = in_ptr0[50 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7832257Z tmp74 = tmp75; 2023-01-11T21:38:05.7832350Z } 2023-01-11T21:38:05.7832487Z auto tmp76 = tmp74 + tmp72; 
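// Annotation: the pooling window is fully unrolled here -- up to 4x4 candidate input cells per
// output cell, each load guarded by the conjunction of its row flag (tmp2/tmp28/tmp46/tmp64)
// and column flag (tmp5/tmp10/tmp16/tmp22), so edge windows accumulate only in-bounds elements.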
2023-01-11T21:38:05.7832624Z auto tmp77 = tmp64 & tmp22; 2023-01-11T21:38:05.7832751Z float tmp78 = 0.0; 2023-01-11T21:38:05.7832861Z if(tmp77) 2023-01-11T21:38:05.7832964Z { 2023-01-11T21:38:05.7833143Z auto tmp79 = in_ptr0[51 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7833267Z tmp78 = tmp79; 2023-01-11T21:38:05.7833361Z } 2023-01-11T21:38:05.7833492Z auto tmp80 = tmp78 + tmp76; 2023-01-11T21:38:05.7833621Z float tmp81 = 0.0; 2023-01-11T21:38:05.7833734Z if(tmp6) 2023-01-11T21:38:05.7833837Z { 2023-01-11T21:38:05.7833994Z auto tmp82 = static_cast(1); 2023-01-11T21:38:05.7834115Z tmp81 = tmp82; 2023-01-11T21:38:05.7834204Z } 2023-01-11T21:38:05.7834329Z float tmp83 = 0.0; 2023-01-11T21:38:05.7834438Z if(tmp11) 2023-01-11T21:38:05.7834540Z { 2023-01-11T21:38:05.7834695Z auto tmp84 = static_cast(1); 2023-01-11T21:38:05.7834822Z tmp83 = tmp84; 2023-01-11T21:38:05.7834945Z } 2023-01-11T21:38:05.7835088Z auto tmp85 = tmp83 + tmp81; 2023-01-11T21:38:05.7835224Z float tmp86 = 0.0; 2023-01-11T21:38:05.7835381Z if(tmp17) 2023-01-11T21:38:05.7835486Z { 2023-01-11T21:38:05.7835631Z auto tmp87 = static_cast(1); 2023-01-11T21:38:05.7835754Z tmp86 = tmp87; 2023-01-11T21:38:05.7835854Z } 2023-01-11T21:38:05.7835981Z auto tmp88 = tmp86 + tmp85; 2023-01-11T21:38:05.7836113Z float tmp89 = 0.0; 2023-01-11T21:38:05.7836221Z if(tmp23) 2023-01-11T21:38:05.7836323Z { 2023-01-11T21:38:05.7836478Z auto tmp90 = static_cast(1); 2023-01-11T21:38:05.7836607Z tmp89 = tmp90; 2023-01-11T21:38:05.7836708Z } 2023-01-11T21:38:05.7836839Z auto tmp91 = tmp89 + tmp88; 2023-01-11T21:38:05.7836955Z float tmp92 = 0.0; 2023-01-11T21:38:05.7837071Z if(tmp29) 2023-01-11T21:38:05.7837173Z { 2023-01-11T21:38:05.7837328Z auto tmp93 = static_cast(1); 2023-01-11T21:38:05.7837453Z tmp92 = tmp93; 2023-01-11T21:38:05.7837551Z } 2023-01-11T21:38:05.7837692Z auto tmp94 = tmp92 + tmp91; 2023-01-11T21:38:05.7837810Z float tmp95 = 0.0; 2023-01-11T21:38:05.7837922Z if(tmp33) 2023-01-11T21:38:05.7838022Z { 2023-01-11T21:38:05.7838253Z auto tmp96 = static_cast(1); 2023-01-11T21:38:05.7838379Z tmp95 = tmp96; 2023-01-11T21:38:05.7838483Z } 2023-01-11T21:38:05.7838625Z auto tmp97 = tmp95 + tmp94; 2023-01-11T21:38:05.7838745Z float tmp98 = 0.0; 2023-01-11T21:38:05.7838857Z if(tmp37) 2023-01-11T21:38:05.7838961Z { 2023-01-11T21:38:05.7839118Z auto tmp99 = static_cast(1); 2023-01-11T21:38:05.7839241Z tmp98 = tmp99; 2023-01-11T21:38:05.7839346Z } 2023-01-11T21:38:05.7839480Z auto tmp100 = tmp98 + tmp97; 2023-01-11T21:38:05.7839603Z float tmp101 = 0.0; 2023-01-11T21:38:05.7839716Z if(tmp41) 2023-01-11T21:38:05.7839817Z { 2023-01-11T21:38:05.7839984Z auto tmp102 = static_cast(1); 2023-01-11T21:38:05.7840111Z tmp101 = tmp102; 2023-01-11T21:38:05.7840213Z } 2023-01-11T21:38:05.7840356Z auto tmp103 = tmp101 + tmp100; 2023-01-11T21:38:05.7840481Z float tmp104 = 0.0; 2023-01-11T21:38:05.7840599Z if(tmp47) 2023-01-11T21:38:05.7840710Z { 2023-01-11T21:38:05.7840861Z auto tmp105 = static_cast(1); 2023-01-11T21:38:05.7840987Z tmp104 = tmp105; 2023-01-11T21:38:05.7841090Z } 2023-01-11T21:38:05.7841240Z auto tmp106 = tmp104 + tmp103; 2023-01-11T21:38:05.7841367Z float tmp107 = 0.0; 2023-01-11T21:38:05.7841472Z if(tmp51) 2023-01-11T21:38:05.7841573Z { 2023-01-11T21:38:05.7841737Z auto tmp108 = static_cast(1); 2023-01-11T21:38:05.7841867Z tmp107 = tmp108; 2023-01-11T21:38:05.7841974Z } 2023-01-11T21:38:05.7842116Z auto tmp109 = tmp107 + tmp106; 2023-01-11T21:38:05.7842296Z float tmp110 = 0.0; 2023-01-11T21:38:05.7842398Z if(tmp55) 
2023-01-11T21:38:05.7842503Z { 2023-01-11T21:38:05.7842656Z auto tmp111 = static_cast(1); 2023-01-11T21:38:05.7842781Z tmp110 = tmp111; 2023-01-11T21:38:05.7842885Z } 2023-01-11T21:38:05.7843027Z auto tmp112 = tmp110 + tmp109; 2023-01-11T21:38:05.7843158Z float tmp113 = 0.0; 2023-01-11T21:38:05.7843264Z if(tmp59) 2023-01-11T21:38:05.7843371Z { 2023-01-11T21:38:05.7843522Z auto tmp114 = static_cast(1); 2023-01-11T21:38:05.7843643Z tmp113 = tmp114; 2023-01-11T21:38:05.7843745Z } 2023-01-11T21:38:05.7843889Z auto tmp115 = tmp113 + tmp112; 2023-01-11T21:38:05.7844024Z float tmp116 = 0.0; 2023-01-11T21:38:05.7844123Z if(tmp65) 2023-01-11T21:38:05.7844225Z { 2023-01-11T21:38:05.7844390Z auto tmp117 = static_cast(1); 2023-01-11T21:38:05.7844517Z tmp116 = tmp117; 2023-01-11T21:38:05.7844621Z } 2023-01-11T21:38:05.7844762Z auto tmp118 = tmp116 + tmp115; 2023-01-11T21:38:05.7844889Z float tmp119 = 0.0; 2023-01-11T21:38:05.7844999Z if(tmp69) 2023-01-11T21:38:05.7845148Z { 2023-01-11T21:38:05.7845303Z auto tmp120 = static_cast(1); 2023-01-11T21:38:05.7845430Z tmp119 = tmp120; 2023-01-11T21:38:05.7845529Z } 2023-01-11T21:38:05.7845667Z auto tmp121 = tmp119 + tmp118; 2023-01-11T21:38:05.7845795Z float tmp122 = 0.0; 2023-01-11T21:38:05.7845909Z if(tmp73) 2023-01-11T21:38:05.7845998Z { 2023-01-11T21:38:05.7846154Z auto tmp123 = static_cast(1); 2023-01-11T21:38:05.7846283Z tmp122 = tmp123; 2023-01-11T21:38:05.7846382Z } 2023-01-11T21:38:05.7846527Z auto tmp124 = tmp122 + tmp121; 2023-01-11T21:38:05.7846658Z float tmp125 = 0.0; 2023-01-11T21:38:05.7846778Z if(tmp77) 2023-01-11T21:38:05.7846870Z { 2023-01-11T21:38:05.7847023Z auto tmp126 = static_cast(1); 2023-01-11T21:38:05.7847147Z tmp125 = tmp126; 2023-01-11T21:38:05.7847251Z } 2023-01-11T21:38:05.7847394Z auto tmp127 = tmp125 + tmp124; 2023-01-11T21:38:05.7847535Z auto tmp128 = tmp80 / tmp127; 2023-01-11T21:38:05.7847681Z out_ptr0[i2 + (6*i1) + (36*i0)] = tmp128; 2023-01-11T21:38:05.7847769Z } 2023-01-11T21:38:05.7847866Z } 2023-01-11T21:38:05.7847963Z } 2023-01-11T21:38:05.7848059Z } 2023-01-11T21:38:05.7848153Z } 2023-01-11T21:38:05.7848273Z #pragma omp for 2023-01-11T21:38:05.7848400Z for(long i0=0; i0<256; i0+=1) 2023-01-11T21:38:05.7848484Z { 2023-01-11T21:38:05.7848679Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7848871Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.7848999Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.7849134Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.7849269Z } 2023-01-11T21:38:05.7849413Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7849528Z for(long i0=2048; i0<2048; i0+=1) 2023-01-11T21:38:05.7849622Z { 2023-01-11T21:38:05.7849750Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7849893Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.7850013Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.7850132Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.7850227Z } 2023-01-11T21:38:05.7850312Z } 2023-01-11T21:38:05.7850402Z } 2023-01-11T21:38:05.7850540Z ''') 2023-01-11T21:38:05.7850547Z 2023-01-11T21:38:05.7850552Z 2023-01-11T21:38:05.7850683Z async_compile.wait(globals()) 2023-01-11T21:38:05.7850794Z del async_compile 2023-01-11T21:38:05.7850802Z 2023-01-11T21:38:05.7850908Z def call(args): 2023-01-11T21:38:05.7851012Z arg0_1, = args 2023-01-11T21:38:05.7851108Z args.clear() 2023-01-11T21:38:05.7851401Z buf0 = empty_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7851704Z buf1 = empty_strided((2, 4, 16, 16), (1024, 256, 16, 1), 
device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7851935Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.7852035Z del arg0_1 2023-01-11T21:38:05.7852186Z buf2 = aten._adaptive_avg_pool2d(buf1, [2, 5]) 2023-01-11T21:38:05.7852281Z del buf1 2023-01-11T21:38:05.7852384Z buf3 = buf2 2023-01-11T21:38:05.7852526Z assert_size_stride(buf3, (2, 4, 2, 5), (40, 10, 5, 1)) 2023-01-11T21:38:05.7852622Z del buf2 2023-01-11T21:38:05.7852732Z return (buf0, buf3, ) 2023-01-11T21:38:05.7852786Z 2023-01-11T21:38:05.7852792Z 2023-01-11T21:38:05.7852905Z if __name__ == "__main__": 2023-01-11T21:38:05.7853074Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7853240Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7853568Z arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7853722Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.7854078Z [2023-01-11 21:24:10,096] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 8 2023-01-11T21:38:05.7854812Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.7855004Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.7855359Z [2023-01-11 21:24:10,123] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 9 2023-01-11T21:38:05.7855367Z 2023-01-11T21:38:05.7855373Z 2023-01-11T21:38:05.7855515Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7855628Z import torch 2023-01-11T21:38:05.7855729Z import random 2023-01-11T21:38:05.7855897Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7856069Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7856077Z 2023-01-11T21:38:05.7856182Z aten = torch.ops.aten 2023-01-11T21:38:05.7856364Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7856499Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7856507Z 2023-01-11T21:38:05.7856609Z import triton 2023-01-11T21:38:05.7856740Z import triton.language as tl 2023-01-11T21:38:05.7856911Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7857106Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7857113Z 2023-01-11T21:38:05.7857118Z 2023-01-11T21:38:05.7857436Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7857785Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7857961Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7858106Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.7858248Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.7858344Z { 2023-01-11T21:38:05.7858489Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7858583Z { 2023-01-11T21:38:05.7858684Z #pragma omp for 2023-01-11T21:38:05.7858805Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.7858901Z { 2023-01-11T21:38:05.7859021Z #pragma GCC ivdep 2023-01-11T21:38:05.7859144Z for(long i1=0; i1<6; i1+=1) 2023-01-11T21:38:05.7859241Z { 2023-01-11T21:38:05.7859358Z #pragma GCC 
ivdep 2023-01-11T21:38:05.7859478Z for(long i2=0; i2<6; i2+=1) 2023-01-11T21:38:05.7859574Z { 2023-01-11T21:38:05.7859673Z { 2023-01-11T21:38:05.7859778Z { 2023-01-11T21:38:05.7859937Z auto tmp0 = static_cast((i1 / 2)); 2023-01-11T21:38:05.7860104Z auto tmp1 = static_cast(((8 + (3*i1)) / 6)); 2023-01-11T21:38:05.7860243Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:05.7860403Z auto tmp3 = static_cast((i2 / 2)); 2023-01-11T21:38:05.7860556Z auto tmp4 = static_cast(((8 + (3*i2)) / 6)); 2023-01-11T21:38:05.7860691Z auto tmp5 = tmp3 < tmp4; 2023-01-11T21:38:05.7860827Z auto tmp6 = tmp2 & tmp5; 2023-01-11T21:38:05.7861014Z float tmp7 = 0.0; 2023-01-11T21:38:05.7861130Z if(tmp6) 2023-01-11T21:38:05.7861234Z { 2023-01-11T21:38:05.7861394Z auto tmp8 = in_ptr0[(3*(i1 / 2)) + (9*i0) + (i2 / 2)]; 2023-01-11T21:38:05.7861508Z tmp7 = tmp8; 2023-01-11T21:38:05.7861612Z } 2023-01-11T21:38:05.7861776Z auto tmp9 = static_cast(1 + (i2 / 2)); 2023-01-11T21:38:05.7861913Z auto tmp10 = tmp9 < tmp4; 2023-01-11T21:38:05.7862047Z auto tmp11 = tmp2 & tmp10; 2023-01-11T21:38:05.7862179Z float tmp12 = 0.0; 2023-01-11T21:38:05.7862294Z if(tmp11) 2023-01-11T21:38:05.7862388Z { 2023-01-11T21:38:05.7862556Z auto tmp13 = in_ptr0[1 + (3*(i1 / 2)) + (9*i0) + (i2 / 2)]; 2023-01-11T21:38:05.7862687Z tmp12 = tmp13; 2023-01-11T21:38:05.7862788Z } 2023-01-11T21:38:05.7862927Z auto tmp14 = tmp12 + tmp7; 2023-01-11T21:38:05.7863089Z auto tmp15 = static_cast(1 + (i1 / 2)); 2023-01-11T21:38:05.7863237Z auto tmp16 = tmp15 < tmp1; 2023-01-11T21:38:05.7863369Z auto tmp17 = tmp16 & tmp5; 2023-01-11T21:38:05.7863486Z float tmp18 = 0.0; 2023-01-11T21:38:05.7863599Z if(tmp17) 2023-01-11T21:38:05.7863704Z { 2023-01-11T21:38:05.7863871Z auto tmp19 = in_ptr0[3 + (3*(i1 / 2)) + (9*i0) + (i2 / 2)]; 2023-01-11T21:38:05.7863991Z tmp18 = tmp19; 2023-01-11T21:38:05.7864096Z } 2023-01-11T21:38:05.7864236Z auto tmp20 = tmp18 + tmp14; 2023-01-11T21:38:05.7864372Z auto tmp21 = tmp16 & tmp10; 2023-01-11T21:38:05.7864503Z float tmp22 = 0.0; 2023-01-11T21:38:05.7864617Z if(tmp21) 2023-01-11T21:38:05.7864708Z { 2023-01-11T21:38:05.7864912Z auto tmp23 = in_ptr0[4 + (3*(i1 / 2)) + (9*i0) + (i2 / 2)]; 2023-01-11T21:38:05.7865042Z tmp22 = tmp23; 2023-01-11T21:38:05.7865146Z } 2023-01-11T21:38:05.7865282Z auto tmp24 = tmp22 + tmp20; 2023-01-11T21:38:05.7865397Z float tmp25 = 0.0; 2023-01-11T21:38:05.7865513Z if(tmp6) 2023-01-11T21:38:05.7865617Z { 2023-01-11T21:38:05.7865779Z auto tmp26 = static_cast(1); 2023-01-11T21:38:05.7865904Z tmp25 = tmp26; 2023-01-11T21:38:05.7866010Z } 2023-01-11T21:38:05.7866133Z float tmp27 = 0.0; 2023-01-11T21:38:05.7866237Z if(tmp11) 2023-01-11T21:38:05.7866343Z { 2023-01-11T21:38:05.7866502Z auto tmp28 = static_cast(1); 2023-01-11T21:38:05.7866630Z tmp27 = tmp28; 2023-01-11T21:38:05.7866731Z } 2023-01-11T21:38:05.7866863Z auto tmp29 = tmp27 + tmp25; 2023-01-11T21:38:05.7866986Z float tmp30 = 0.0; 2023-01-11T21:38:05.7867086Z if(tmp17) 2023-01-11T21:38:05.7867188Z { 2023-01-11T21:38:05.7867347Z auto tmp31 = static_cast(1); 2023-01-11T21:38:05.7867471Z tmp30 = tmp31; 2023-01-11T21:38:05.7867575Z } 2023-01-11T21:38:05.7867766Z auto tmp32 = tmp30 + tmp29; 2023-01-11T21:38:05.7867894Z float tmp33 = 0.0; 2023-01-11T21:38:05.7867997Z if(tmp21) 2023-01-11T21:38:05.7868103Z { 2023-01-11T21:38:05.7868271Z auto tmp34 = static_cast(1); 2023-01-11T21:38:05.7868393Z tmp33 = tmp34; 2023-01-11T21:38:05.7868494Z } 2023-01-11T21:38:05.7868629Z auto tmp35 = tmp33 + tmp32; 2023-01-11T21:38:05.7868771Z auto tmp36 = tmp24 / tmp35; 
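// Annotation: tmp24 is the guarded sum over the (at most 2x2) window for this 3x3 -> 6x6 case
// and tmp35 is the matching element count built from the same flags, so the division yields a
// true adaptive average over however many elements were actually in bounds, not a fixed window size.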
2023-01-11T21:38:05.7868918Z out_ptr0[i2 + (6*i1) + (36*i0)] = tmp36; 2023-01-11T21:38:05.7869012Z } 2023-01-11T21:38:05.7869109Z } 2023-01-11T21:38:05.7869202Z } 2023-01-11T21:38:05.7869294Z } 2023-01-11T21:38:05.7869393Z } 2023-01-11T21:38:05.7869517Z #pragma omp for 2023-01-11T21:38:05.7869630Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.7869729Z { 2023-01-11T21:38:05.7869842Z #pragma GCC ivdep 2023-01-11T21:38:05.7869966Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.7870066Z { 2023-01-11T21:38:05.7870191Z #pragma GCC ivdep 2023-01-11T21:38:05.7870330Z for(long i2=0; i2<5; i2+=1) 2023-01-11T21:38:05.7870418Z { 2023-01-11T21:38:05.7870514Z { 2023-01-11T21:38:05.7870614Z { 2023-01-11T21:38:05.7870772Z auto tmp0 = static_cast(((3*i1) / 2)); 2023-01-11T21:38:05.7870940Z auto tmp1 = static_cast(2 + (((3*i1) / 2))); 2023-01-11T21:38:05.7871085Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:05.7871245Z auto tmp3 = static_cast(((3*i2) / 5)); 2023-01-11T21:38:05.7871401Z auto tmp4 = static_cast(((7 + (3*i2)) / 5)); 2023-01-11T21:38:05.7871537Z auto tmp5 = tmp3 < tmp4; 2023-01-11T21:38:05.7871679Z auto tmp6 = tmp2 & tmp5; 2023-01-11T21:38:05.7871809Z float tmp7 = 0.0; 2023-01-11T21:38:05.7871966Z if(tmp6) 2023-01-11T21:38:05.7872069Z { 2023-01-11T21:38:05.7872242Z auto tmp8 = in_ptr0[(3*(((3*i1) / 2))) + (9*i0) + (((3*i2) / 5))]; 2023-01-11T21:38:05.7872403Z auto tmp9 = static_cast(1); 2023-01-11T21:38:05.7872536Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:05.7872658Z tmp7 = tmp10; 2023-01-11T21:38:05.7872755Z } 2023-01-11T21:38:05.7872916Z auto tmp11 = static_cast(1 + (((3*i2) / 5))); 2023-01-11T21:38:05.7873057Z auto tmp12 = tmp11 < tmp4; 2023-01-11T21:38:05.7873191Z auto tmp13 = tmp2 & tmp12; 2023-01-11T21:38:05.7873323Z float tmp14 = 0.0; 2023-01-11T21:38:05.7873421Z if(tmp13) 2023-01-11T21:38:05.7873523Z { 2023-01-11T21:38:05.7873705Z auto tmp15 = in_ptr0[1 + (3*(((3*i1) / 2))) + (9*i0) + (((3*i2) / 5))]; 2023-01-11T21:38:05.7873864Z auto tmp16 = static_cast(1); 2023-01-11T21:38:05.7874011Z auto tmp17 = tmp15 + tmp16; 2023-01-11T21:38:05.7874130Z tmp14 = tmp17; 2023-01-11T21:38:05.7874233Z } 2023-01-11T21:38:05.7874374Z auto tmp18 = tmp14 + tmp7; 2023-01-11T21:38:05.7874532Z auto tmp19 = static_cast(1 + (((3*i1) / 2))); 2023-01-11T21:38:05.7874671Z auto tmp20 = tmp19 < tmp1; 2023-01-11T21:38:05.7874851Z auto tmp21 = tmp20 & tmp5; 2023-01-11T21:38:05.7874981Z float tmp22 = 0.0; 2023-01-11T21:38:05.7875095Z if(tmp21) 2023-01-11T21:38:05.7875206Z { 2023-01-11T21:38:05.7875388Z auto tmp23 = in_ptr0[3 + (3*(((3*i1) / 2))) + (9*i0) + (((3*i2) / 5))]; 2023-01-11T21:38:05.7875527Z auto tmp24 = static_cast(1); 2023-01-11T21:38:05.7875671Z auto tmp25 = tmp23 + tmp24; 2023-01-11T21:38:05.7875796Z tmp22 = tmp25; 2023-01-11T21:38:05.7875905Z } 2023-01-11T21:38:05.7876044Z auto tmp26 = tmp22 + tmp18; 2023-01-11T21:38:05.7876182Z auto tmp27 = tmp20 & tmp12; 2023-01-11T21:38:05.7876310Z float tmp28 = 0.0; 2023-01-11T21:38:05.7876427Z if(tmp27) 2023-01-11T21:38:05.7876522Z { 2023-01-11T21:38:05.7876694Z auto tmp29 = in_ptr0[4 + (3*(((3*i1) / 2))) + (9*i0) + (((3*i2) / 5))]; 2023-01-11T21:38:05.7876858Z auto tmp30 = static_cast(1); 2023-01-11T21:38:05.7876997Z auto tmp31 = tmp29 + tmp30; 2023-01-11T21:38:05.7877119Z tmp28 = tmp31; 2023-01-11T21:38:05.7877223Z } 2023-01-11T21:38:05.7877364Z auto tmp32 = tmp28 + tmp26; 2023-01-11T21:38:05.7877483Z float tmp33 = 0.0; 2023-01-11T21:38:05.7877598Z if(tmp6) 2023-01-11T21:38:05.7877698Z { 2023-01-11T21:38:05.7877851Z auto tmp34 = static_cast(1); 
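// Annotation: the divisor mirrors the guarded loads above -- each window flag (tmp6, tmp13,
// tmp21, tmp27) that admitted an element contributes a count of 1 to the running total below.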
2023-01-11T21:38:05.7877974Z tmp33 = tmp34; 2023-01-11T21:38:05.7878077Z } 2023-01-11T21:38:05.7878200Z float tmp35 = 0.0; 2023-01-11T21:38:05.7878304Z if(tmp13) 2023-01-11T21:38:05.7878411Z { 2023-01-11T21:38:05.7878614Z auto tmp36 = static_cast(1); 2023-01-11T21:38:05.7878739Z tmp35 = tmp36; 2023-01-11T21:38:05.7878839Z } 2023-01-11T21:38:05.7878972Z auto tmp37 = tmp35 + tmp33; 2023-01-11T21:38:05.7879099Z float tmp38 = 0.0; 2023-01-11T21:38:05.7879211Z if(tmp21) 2023-01-11T21:38:05.7879304Z { 2023-01-11T21:38:05.7879454Z auto tmp39 = static_cast(1); 2023-01-11T21:38:05.7879581Z tmp38 = tmp39; 2023-01-11T21:38:05.7879684Z } 2023-01-11T21:38:05.7879824Z auto tmp40 = tmp38 + tmp37; 2023-01-11T21:38:05.7879953Z float tmp41 = 0.0; 2023-01-11T21:38:05.7880065Z if(tmp27) 2023-01-11T21:38:05.7880155Z { 2023-01-11T21:38:05.7880314Z auto tmp42 = static_cast(1); 2023-01-11T21:38:05.7880440Z tmp41 = tmp42; 2023-01-11T21:38:05.7880542Z } 2023-01-11T21:38:05.7880676Z auto tmp43 = tmp41 + tmp40; 2023-01-11T21:38:05.7880812Z auto tmp44 = tmp32 / tmp43; 2023-01-11T21:38:05.7880953Z out_ptr1[i2 + (5*i1) + (10*i0)] = tmp44; 2023-01-11T21:38:05.7881044Z } 2023-01-11T21:38:05.7881148Z } 2023-01-11T21:38:05.7881245Z } 2023-01-11T21:38:05.7881338Z } 2023-01-11T21:38:05.7881472Z } 2023-01-11T21:38:05.7881562Z } 2023-01-11T21:38:05.7881654Z } 2023-01-11T21:38:05.7881775Z ''') 2023-01-11T21:38:05.7881784Z 2023-01-11T21:38:05.7881790Z 2023-01-11T21:38:05.7881930Z async_compile.wait(globals()) 2023-01-11T21:38:05.7882041Z del async_compile 2023-01-11T21:38:05.7882048Z 2023-01-11T21:38:05.7882152Z def call(args): 2023-01-11T21:38:05.7882260Z arg0_1, = args 2023-01-11T21:38:05.7882367Z args.clear() 2023-01-11T21:38:05.7882667Z buf0 = empty_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7882939Z buf1 = empty_strided((2, 4, 2, 5), (40, 10, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7883168Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.7883271Z del arg0_1 2023-01-11T21:38:05.7883386Z return (buf0, buf1, ) 2023-01-11T21:38:05.7883394Z 2023-01-11T21:38:05.7883400Z 2023-01-11T21:38:05.7883506Z if __name__ == "__main__": 2023-01-11T21:38:05.7883673Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7883849Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7884147Z arg0_1 = rand_strided((2, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7884289Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.7884657Z [2023-01-11 21:24:12,158] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 9 2023-01-11T21:38:05.7885242Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.7885417Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.7885782Z [2023-01-11 21:24:12,185] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 10 2023-01-11T21:38:05.7885791Z 2023-01-11T21:38:05.7885797Z 2023-01-11T21:38:05.7885934Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7886036Z import torch 2023-01-11T21:38:05.7886140Z import random 2023-01-11T21:38:05.7886363Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7886539Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7886547Z 2023-01-11T21:38:05.7886647Z aten = torch.ops.aten 2023-01-11T21:38:05.7886840Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7886975Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7886982Z 2023-01-11T21:38:05.7887087Z import triton 2023-01-11T21:38:05.7887215Z import triton.language as tl 2023-01-11T21:38:05.7887382Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7887577Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7887589Z 2023-01-11T21:38:05.7887596Z 2023-01-11T21:38:05.7887792Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7888059Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7888232Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7888382Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.7888530Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.7888622Z { 2023-01-11T21:38:05.7888756Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7888852Z { 2023-01-11T21:38:05.7888956Z #pragma omp for 2023-01-11T21:38:05.7889083Z for(long i0=0; i0<36; i0+=1) 2023-01-11T21:38:05.7889178Z { 2023-01-11T21:38:05.7889363Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7889496Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7889591Z } 2023-01-11T21:38:05.7889781Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7889893Z for(long i0=288; i0<288; i0+=1) 2023-01-11T21:38:05.7889987Z { 2023-01-11T21:38:05.7890107Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7890218Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.7890314Z } 2023-01-11T21:38:05.7890430Z #pragma omp for 2023-01-11T21:38:05.7890553Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.7890636Z { 2023-01-11T21:38:05.7890756Z #pragma GCC ivdep 2023-01-11T21:38:05.7890873Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.7890966Z { 2023-01-11T21:38:05.7891086Z #pragma GCC ivdep 2023-01-11T21:38:05.7891210Z for(long i2=0; i2<5; i2+=1) 2023-01-11T21:38:05.7891303Z { 2023-01-11T21:38:05.7891392Z { 2023-01-11T21:38:05.7891490Z { 2023-01-11T21:38:05.7891652Z auto tmp0 = static_cast(3*i1); 2023-01-11T21:38:05.7891812Z auto tmp1 = static_cast(3 + (3*i1)); 2023-01-11T21:38:05.7891949Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:05.7892111Z auto tmp3 = static_cast(((6*i2) / 5)); 2023-01-11T21:38:05.7892284Z auto tmp4 = static_cast(2 + (((6*i2) / 5))); 2023-01-11T21:38:05.7892404Z auto tmp5 = tmp3 < tmp4; 2023-01-11T21:38:05.7892545Z auto tmp6 = tmp2 & tmp5; 2023-01-11T21:38:05.7892673Z float tmp7 = 0.0; 2023-01-11T21:38:05.7892786Z if(tmp6) 2023-01-11T21:38:05.7892891Z { 2023-01-11T21:38:05.7893057Z auto tmp8 = in_ptr0[(18*i1) + (36*i0) + (((6*i2) / 5))]; 2023-01-11T21:38:05.7893214Z auto tmp9 = 
static_cast(1); 2023-01-11T21:38:05.7893362Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:05.7893474Z tmp7 = tmp10; 2023-01-11T21:38:05.7893575Z } 2023-01-11T21:38:05.7893739Z auto tmp11 = static_cast(1 + (((6*i2) / 5))); 2023-01-11T21:38:05.7893916Z auto tmp12 = tmp11 < tmp4; 2023-01-11T21:38:05.7894057Z auto tmp13 = tmp2 & tmp12; 2023-01-11T21:38:05.7894184Z float tmp14 = 0.0; 2023-01-11T21:38:05.7894301Z if(tmp13) 2023-01-11T21:38:05.7894392Z { 2023-01-11T21:38:05.7932113Z auto tmp15 = in_ptr0[1 + (18*i1) + (36*i0) + (((6*i2) / 5))]; 2023-01-11T21:38:05.7932292Z auto tmp16 = static_cast(1); 2023-01-11T21:38:05.7932442Z auto tmp17 = tmp15 + tmp16; 2023-01-11T21:38:05.7932567Z tmp14 = tmp17; 2023-01-11T21:38:05.7932679Z } 2023-01-11T21:38:05.7932825Z auto tmp18 = tmp14 + tmp7; 2023-01-11T21:38:05.7932991Z auto tmp19 = static_cast(1 + (3*i1)); 2023-01-11T21:38:05.7933124Z auto tmp20 = tmp19 < tmp1; 2023-01-11T21:38:05.7933263Z auto tmp21 = tmp20 & tmp5; 2023-01-11T21:38:05.7933391Z float tmp22 = 0.0; 2023-01-11T21:38:05.7933505Z if(tmp21) 2023-01-11T21:38:05.7933611Z { 2023-01-11T21:38:05.7933775Z auto tmp23 = in_ptr0[6 + (18*i1) + (36*i0) + (((6*i2) / 5))]; 2023-01-11T21:38:05.7933932Z auto tmp24 = static_cast(1); 2023-01-11T21:38:05.7934066Z auto tmp25 = tmp23 + tmp24; 2023-01-11T21:38:05.7934191Z tmp22 = tmp25; 2023-01-11T21:38:05.7934387Z } 2023-01-11T21:38:05.7934665Z auto tmp26 = tmp22 + tmp18; 2023-01-11T21:38:05.7934812Z auto tmp27 = tmp20 & tmp12; 2023-01-11T21:38:05.7934942Z float tmp28 = 0.0; 2023-01-11T21:38:05.7935060Z if(tmp27) 2023-01-11T21:38:05.7935155Z { 2023-01-11T21:38:05.7935352Z auto tmp29 = in_ptr0[7 + (18*i1) + (36*i0) + (((6*i2) / 5))]; 2023-01-11T21:38:05.7935528Z auto tmp30 = static_cast(1); 2023-01-11T21:38:05.7935669Z auto tmp31 = tmp29 + tmp30; 2023-01-11T21:38:05.7935794Z tmp28 = tmp31; 2023-01-11T21:38:05.7935899Z } 2023-01-11T21:38:05.7936035Z auto tmp32 = tmp28 + tmp26; 2023-01-11T21:38:05.7936195Z auto tmp33 = static_cast(2 + (3*i1)); 2023-01-11T21:38:05.7936327Z auto tmp34 = tmp33 < tmp1; 2023-01-11T21:38:05.7936465Z auto tmp35 = tmp34 & tmp5; 2023-01-11T21:38:05.7936596Z float tmp36 = 0.0; 2023-01-11T21:38:05.7936708Z if(tmp35) 2023-01-11T21:38:05.7936809Z { 2023-01-11T21:38:05.7936980Z auto tmp37 = in_ptr0[12 + (18*i1) + (36*i0) + (((6*i2) / 5))]; 2023-01-11T21:38:05.7937204Z auto tmp38 = static_cast(1); 2023-01-11T21:38:05.7937333Z auto tmp39 = tmp37 + tmp38; 2023-01-11T21:38:05.7937459Z tmp36 = tmp39; 2023-01-11T21:38:05.7937566Z } 2023-01-11T21:38:05.7937705Z auto tmp40 = tmp36 + tmp32; 2023-01-11T21:38:05.7937842Z auto tmp41 = tmp34 & tmp12; 2023-01-11T21:38:05.7937969Z float tmp42 = 0.0; 2023-01-11T21:38:05.7938080Z if(tmp41) 2023-01-11T21:38:05.7938185Z { 2023-01-11T21:38:05.7938343Z auto tmp43 = in_ptr0[13 + (18*i1) + (36*i0) + (((6*i2) / 5))]; 2023-01-11T21:38:05.7938565Z auto tmp44 = static_cast(1); 2023-01-11T21:38:05.7938708Z auto tmp45 = tmp43 + tmp44; 2023-01-11T21:38:05.7938831Z tmp42 = tmp45; 2023-01-11T21:38:05.7938934Z } 2023-01-11T21:38:05.7939071Z auto tmp46 = tmp42 + tmp40; 2023-01-11T21:38:05.7939202Z auto tmp47 = tmp1 < tmp1; 2023-01-11T21:38:05.7939327Z auto tmp48 = tmp47 & tmp5; 2023-01-11T21:38:05.7939453Z float tmp49 = 0.0; 2023-01-11T21:38:05.7939566Z if(tmp48) 2023-01-11T21:38:05.7939674Z { 2023-01-11T21:38:05.7939835Z auto tmp50 = in_ptr0[18 + (18*i1) + (36*i0) + (((6*i2) / 5))]; 2023-01-11T21:38:05.7939983Z auto tmp51 = static_cast(1); 2023-01-11T21:38:05.7940131Z auto tmp52 = tmp50 + tmp51; 2023-01-11T21:38:05.7940254Z 
tmp49 = tmp52; 2023-01-11T21:38:05.7940347Z } 2023-01-11T21:38:05.7940486Z auto tmp53 = tmp49 + tmp46; 2023-01-11T21:38:05.7940616Z auto tmp54 = tmp47 & tmp12; 2023-01-11T21:38:05.7940750Z float tmp55 = 0.0; 2023-01-11T21:38:05.7940863Z if(tmp54) 2023-01-11T21:38:05.7940969Z { 2023-01-11T21:38:05.7941135Z auto tmp56 = in_ptr0[19 + (18*i1) + (36*i0) + (((6*i2) / 5))]; 2023-01-11T21:38:05.7941333Z auto tmp57 = static_cast(1); 2023-01-11T21:38:05.7941471Z auto tmp58 = tmp56 + tmp57; 2023-01-11T21:38:05.7941597Z tmp55 = tmp58; 2023-01-11T21:38:05.7941698Z } 2023-01-11T21:38:05.7941832Z auto tmp59 = tmp55 + tmp53; 2023-01-11T21:38:05.7941958Z float tmp60 = 0.0; 2023-01-11T21:38:05.7942071Z if(tmp6) 2023-01-11T21:38:05.7942165Z { 2023-01-11T21:38:05.7942321Z auto tmp61 = static_cast(1); 2023-01-11T21:38:05.7942444Z tmp60 = tmp61; 2023-01-11T21:38:05.7942544Z } 2023-01-11T21:38:05.7942669Z float tmp62 = 0.0; 2023-01-11T21:38:05.7942782Z if(tmp13) 2023-01-11T21:38:05.7942882Z { 2023-01-11T21:38:05.7943039Z auto tmp63 = static_cast(1); 2023-01-11T21:38:05.7943151Z tmp62 = tmp63; 2023-01-11T21:38:05.7943253Z } 2023-01-11T21:38:05.7943393Z auto tmp64 = tmp62 + tmp60; 2023-01-11T21:38:05.7943526Z float tmp65 = 0.0; 2023-01-11T21:38:05.7943640Z if(tmp21) 2023-01-11T21:38:05.7943740Z { 2023-01-11T21:38:05.7943891Z auto tmp66 = static_cast(1); 2023-01-11T21:38:05.7944004Z tmp65 = tmp66; 2023-01-11T21:38:05.7944106Z } 2023-01-11T21:38:05.7944249Z auto tmp67 = tmp65 + tmp64; 2023-01-11T21:38:05.7944374Z float tmp68 = 0.0; 2023-01-11T21:38:05.7944490Z if(tmp27) 2023-01-11T21:38:05.7944595Z { 2023-01-11T21:38:05.7944758Z auto tmp69 = static_cast(1); 2023-01-11T21:38:05.7944870Z tmp68 = tmp69; 2023-01-11T21:38:05.7944974Z } 2023-01-11T21:38:05.7945109Z auto tmp70 = tmp68 + tmp67; 2023-01-11T21:38:05.7945294Z float tmp71 = 0.0; 2023-01-11T21:38:05.7945418Z if(tmp35) 2023-01-11T21:38:05.7945529Z { 2023-01-11T21:38:05.7945697Z auto tmp72 = static_cast(1); 2023-01-11T21:38:05.7945830Z tmp71 = tmp72; 2023-01-11T21:38:05.7945935Z } 2023-01-11T21:38:05.7946072Z auto tmp73 = tmp71 + tmp70; 2023-01-11T21:38:05.7946197Z float tmp74 = 0.0; 2023-01-11T21:38:05.7946312Z if(tmp41) 2023-01-11T21:38:05.7946415Z { 2023-01-11T21:38:05.7946571Z auto tmp75 = static_cast(1); 2023-01-11T21:38:05.7946688Z tmp74 = tmp75; 2023-01-11T21:38:05.7946789Z } 2023-01-11T21:38:05.7946929Z auto tmp76 = tmp74 + tmp73; 2023-01-11T21:38:05.7947061Z float tmp77 = 0.0; 2023-01-11T21:38:05.7947172Z if(tmp48) 2023-01-11T21:38:05.7947273Z { 2023-01-11T21:38:05.7947429Z auto tmp78 = static_cast(1); 2023-01-11T21:38:05.7947553Z tmp77 = tmp78; 2023-01-11T21:38:05.7947646Z } 2023-01-11T21:38:05.7947785Z auto tmp79 = tmp77 + tmp76; 2023-01-11T21:38:05.7947906Z float tmp80 = 0.0; 2023-01-11T21:38:05.7948019Z if(tmp54) 2023-01-11T21:38:05.7948171Z { 2023-01-11T21:38:05.7948332Z auto tmp81 = static_cast(1); 2023-01-11T21:38:05.7948460Z tmp80 = tmp81; 2023-01-11T21:38:05.7948555Z } 2023-01-11T21:38:05.7948687Z auto tmp82 = tmp80 + tmp79; 2023-01-11T21:38:05.7948818Z auto tmp83 = tmp59 / tmp82; 2023-01-11T21:38:05.7948970Z out_ptr1[i2 + (5*i1) + (10*i0)] = tmp83; 2023-01-11T21:38:05.7949073Z } 2023-01-11T21:38:05.7949177Z } 2023-01-11T21:38:05.7949268Z } 2023-01-11T21:38:05.7949354Z } 2023-01-11T21:38:05.7949451Z } 2023-01-11T21:38:05.7949544Z } 2023-01-11T21:38:05.7949635Z } 2023-01-11T21:38:05.7949778Z ''') 2023-01-11T21:38:05.7949785Z 2023-01-11T21:38:05.7949792Z 2023-01-11T21:38:05.7949921Z async_compile.wait(globals()) 2023-01-11T21:38:05.7950037Z del 
async_compile 2023-01-11T21:38:05.7950044Z 2023-01-11T21:38:05.7950140Z def call(args): 2023-01-11T21:38:05.7950244Z arg0_1, = args 2023-01-11T21:38:05.7950352Z args.clear() 2023-01-11T21:38:05.7950647Z buf0 = empty_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7950947Z buf1 = empty_strided((2, 4, 2, 5), (40, 10, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7951181Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.7951279Z del arg0_1 2023-01-11T21:38:05.7951389Z return (buf0, buf1, ) 2023-01-11T21:38:05.7951397Z 2023-01-11T21:38:05.7951403Z 2023-01-11T21:38:05.7951506Z if __name__ == "__main__": 2023-01-11T21:38:05.7951666Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7951832Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7973791Z arg0_1 = rand_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7973956Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.7974330Z [2023-01-11 21:24:14,053] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 10 2023-01-11T21:38:05.7974340Z 2023-01-11T21:38:05.7974443Z ok (6.186s) 2023-01-11T21:38:05.7975302Z test_adaptive_avg_pool2d2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.7975489Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.7975842Z [2023-01-11 21:24:14,071] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 11 2023-01-11T21:38:05.7976183Z [2023-01-11 21:24:14,077] torch._inductor.ir: [WARNING] Using FallbackKernel: aten._adaptive_avg_pool2d 2023-01-11T21:38:05.7976543Z [2023-01-11 21:24:14,079] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 11 2023-01-11T21:38:05.7976551Z 2023-01-11T21:38:05.7976684Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7976792Z import torch 2023-01-11T21:38:05.7976896Z import random 2023-01-11T21:38:05.7977054Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7977312Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7977320Z 2023-01-11T21:38:05.7977437Z aten = torch.ops.aten 2023-01-11T21:38:05.7977618Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7977744Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7977751Z 2023-01-11T21:38:05.7977856Z import triton 2023-01-11T21:38:05.7977982Z import triton.language as tl 2023-01-11T21:38:05.7978157Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7978427Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7978435Z 2023-01-11T21:38:05.7978441Z 2023-01-11T21:38:05.7978574Z async_compile.wait(globals()) 2023-01-11T21:38:05.7978682Z del async_compile 2023-01-11T21:38:05.7978689Z 2023-01-11T21:38:05.7978784Z def call(args): 2023-01-11T21:38:05.7978888Z arg0_1, = args 2023-01-11T21:38:05.7978991Z args.clear() 2023-01-11T21:38:05.7979153Z buf0 = aten._adaptive_avg_pool2d(arg0_1, [4, 4]) 
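# Annotation: no kernel_cpp_0 is emitted for this graph -- as the FallbackKernel warning above
# notes, the (21, 21) -> (4, 4) adaptive pool is dispatched straight to the eager aten op, and
# the wrapper only asserts the expected output size and strides afterwards.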
2023-01-11T21:38:05.7979254Z del arg0_1 2023-01-11T21:38:05.7979353Z buf1 = buf0 2023-01-11T21:38:05.7979498Z assert_size_stride(buf1, (2, 4, 4, 4), (64, 16, 4, 1)) 2023-01-11T21:38:05.7979586Z del buf0 2023-01-11T21:38:05.7979690Z return (buf1, ) 2023-01-11T21:38:05.7979697Z 2023-01-11T21:38:05.7979704Z 2023-01-11T21:38:05.7979814Z if __name__ == "__main__": 2023-01-11T21:38:05.7979971Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7980151Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7980470Z arg0_1 = rand_strided((2, 4, 21, 21), (1764, 441, 21, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7980622Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.7980629Z 2023-01-11T21:38:05.7980729Z ok (0.026s) 2023-01-11T21:38:05.7981350Z test_add_const_float_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.7981534Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.7981889Z [2023-01-11 21:24:14,091] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 12 2023-01-11T21:38:05.7982254Z [2023-01-11 21:24:15,796] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 12 2023-01-11T21:38:05.7982261Z 2023-01-11T21:38:05.7982401Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7982507Z import torch 2023-01-11T21:38:05.7982608Z import random 2023-01-11T21:38:05.7982813Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7983016Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7983024Z 2023-01-11T21:38:05.7983132Z aten = torch.ops.aten 2023-01-11T21:38:05.7983394Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7983547Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7983554Z 2023-01-11T21:38:05.7983824Z import triton 2023-01-11T21:38:05.7983979Z import triton.language as tl 2023-01-11T21:38:05.7984181Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7984400Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7984413Z 2023-01-11T21:38:05.7984420Z 2023-01-11T21:38:05.7984642Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7984908Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7985098Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7985265Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7985381Z { 2023-01-11T21:38:05.7985547Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7985660Z { 2023-01-11T21:38:05.7985834Z #pragma omp for 2023-01-11T21:38:05.7985947Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.7986061Z { 2023-01-11T21:38:05.7986267Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7986522Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1.5)); 2023-01-11T21:38:05.7986663Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.7986814Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7986996Z } 2023-01-11T21:38:05.7987156Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7987270Z for(long i0=32; i0<32; i0+=1) 2023-01-11T21:38:05.7987385Z { 2023-01-11T21:38:05.7987531Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7987704Z auto tmp1 = static_cast<float>(1.5); 2023-01-11T21:38:05.7987878Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.7988019Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.7988149Z } 2023-01-11T21:38:05.7988231Z } 2023-01-11T21:38:05.7988340Z } 2023-01-11T21:38:05.7988494Z ''') 2023-01-11T21:38:05.7988502Z 2023-01-11T21:38:05.7988508Z 2023-01-11T21:38:05.7988649Z async_compile.wait(globals()) 2023-01-11T21:38:05.7988806Z del async_compile 2023-01-11T21:38:05.7988813Z 2023-01-11T21:38:05.7989021Z def call(args): 2023-01-11T21:38:05.7989147Z arg0_1, = args 2023-01-11T21:38:05.7989244Z args.clear() 2023-01-11T21:38:05.7989543Z buf0 = empty_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7989773Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.7989919Z del arg0_1 2023-01-11T21:38:05.7990169Z return (buf0, ) 2023-01-11T21:38:05.7990176Z 2023-01-11T21:38:05.7990182Z 2023-01-11T21:38:05.7990327Z if __name__ == "__main__": 2023-01-11T21:38:05.7990503Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7990698Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7990971Z arg0_1 = rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7991181Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.7991189Z 2023-01-11T21:38:05.7991320Z ok (1.718s) 2023-01-11T21:38:05.7991965Z test_add_const_int_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.7992173Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.7992627Z [2023-01-11 21:24:15,809] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 13 2023-01-11T21:38:05.7993013Z [2023-01-11 21:24:17,519] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 13 2023-01-11T21:38:05.7993021Z 2023-01-11T21:38:05.7993245Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7993472Z import torch 2023-01-11T21:38:05.7993572Z import random 2023-01-11T21:38:05.7993926Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7994113Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7994121Z 2023-01-11T21:38:05.7994255Z aten = torch.ops.aten 2023-01-11T21:38:05.7994468Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7994615Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7994622Z 2023-01-11T21:38:05.7994751Z import triton 2023-01-11T21:38:05.7994935Z import triton.language as tl 2023-01-11T21:38:05.7995101Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7995311Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7995319Z 2023-01-11T21:38:05.7995325Z 2023-01-11T21:38:05.7995545Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7995838Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7996026Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7996213Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7996338Z { 2023-01-11T21:38:05.7996536Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7996621Z { 2023-01-11T21:38:05.7996810Z #pragma omp for 2023-01-11T21:38:05.7996950Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.7997067Z { 2023-01-11T21:38:05.7997278Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7997481Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.7997641Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.7997765Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7997878Z } 2023-01-11T21:38:05.7998035Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7998201Z for(long i0=32; i0<32; i0+=1) 2023-01-11T21:38:05.7998314Z { 2023-01-11T21:38:05.7998458Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7998624Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.7998742Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.7998884Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.7998994Z } 2023-01-11T21:38:05.7999154Z } 2023-01-11T21:38:05.7999267Z } 2023-01-11T21:38:05.7999416Z ''') 2023-01-11T21:38:05.7999424Z 2023-01-11T21:38:05.7999430Z 2023-01-11T21:38:05.7999583Z async_compile.wait(globals()) 2023-01-11T21:38:05.7999679Z del async_compile 2023-01-11T21:38:05.7999687Z 2023-01-11T21:38:05.7999809Z def call(args): 2023-01-11T21:38:05.7999971Z arg0_1, = args 2023-01-11T21:38:05.8000103Z args.clear() 2023-01-11T21:38:05.8000402Z buf0 = empty_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8000605Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8000725Z del arg0_1 2023-01-11T21:38:05.8000821Z return (buf0, ) 2023-01-11T21:38:05.8001158Z 2023-01-11T21:38:05.8001166Z 2023-01-11T21:38:05.8001277Z if __name__ == "__main__":
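# Annotation: this benchmark harness is appended to every generated module -- rand_strided
# rebuilds an input with exactly the shape and strides the graph was compiled against, and
# print_performance times the call.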
2023-01-11T21:38:05.8001468Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8001694Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8001995Z arg0_1 = rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8002171Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8002179Z 2023-01-11T21:38:05.8002347Z ok (1.722s) 2023-01-11T21:38:05.8003043Z test_add_inplace_permuted_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8003248Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8003633Z [2023-01-11 21:24:17,533] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 14 2023-01-11T21:38:05.8003984Z [2023-01-11 21:24:19,304] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 14 2023-01-11T21:38:05.8004592Z 2023-01-11T21:38:05.8004738Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8004861Z import torch 2023-01-11T21:38:05.8004992Z import random 2023-01-11T21:38:05.8005206Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8005423Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8005437Z 2023-01-11T21:38:05.8005576Z aten = torch.ops.aten 2023-01-11T21:38:05.8005811Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8005940Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8005947Z 2023-01-11T21:38:05.8006074Z import triton 2023-01-11T21:38:05.8006232Z import triton.language as tl 2023-01-11T21:38:05.8006435Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8006635Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8006643Z 2023-01-11T21:38:05.8006648Z 2023-01-11T21:38:05.8006939Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8007310Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8007489Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8007623Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8007791Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8007907Z { 2023-01-11T21:38:05.8008109Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8008225Z { 2023-01-11T21:38:05.8008375Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8008542Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.8008624Z { 2023-01-11T21:38:05.8008774Z for(long i1=0; i1<12; i1+=1) 2023-01-11T21:38:05.8009324Z { 2023-01-11T21:38:05.8009482Z for(long i2=0; i2<27; i2+=1) 2023-01-11T21:38:05.8009603Z { 2023-01-11T21:38:05.8009849Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i2) + (221*i1) + (2652*i0)); 2023-01-11T21:38:05.8010084Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + (8*i2) + (221*i0)); 2023-01-11T21:38:05.8010203Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8010835Z tmp2.store(out_ptr0 + (8*i2) + (221*i1) + (2652*i0)); 2023-01-11T21:38:05.8010965Z } 2023-01-11T21:38:05.8011128Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8011270Z for(long i2=216; i2<221; i2+=1) 2023-01-11T21:38:05.8011386Z { 
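// Scalar tail of the vectorized loop above: 27 iterations x 8 float lanes
// cover 216 of the 221 elements in each contiguous run, so i2 = 216..220
// is handled element-wise here. Note the tail loads from out_ptr0 rather
// than in_ptr0: this is an in-place add (call() below passes arg0_1 as
// both the first input and the output), so the two pointers alias the
// same storage and either name reads the same unprocessed elements.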
2023-01-11T21:38:05.8011556Z auto tmp0 = out_ptr0[i2 + (221*i1) + (2652*i0)]; 2023-01-11T21:38:05.8011718Z auto tmp1 = in_ptr1[i2 + (221*i0)]; 2023-01-11T21:38:05.8011843Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8012017Z out_ptr0[i2 + (221*i1) + (2652*i0)] = tmp2; 2023-01-11T21:38:05.8012130Z } 2023-01-11T21:38:05.8012246Z } 2023-01-11T21:38:05.8012367Z } 2023-01-11T21:38:05.8012518Z } 2023-01-11T21:38:05.8012609Z } 2023-01-11T21:38:05.8012762Z ''') 2023-01-11T21:38:05.8012770Z 2023-01-11T21:38:05.8012776Z 2023-01-11T21:38:05.8012929Z async_compile.wait(globals()) 2023-01-11T21:38:05.8013070Z del async_compile 2023-01-11T21:38:05.8013077Z 2023-01-11T21:38:05.8013291Z def call(args): 2023-01-11T21:38:05.8013442Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8013572Z args.clear() 2023-01-11T21:38:05.8013809Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr())) 2023-01-11T21:38:05.8013903Z del arg1_1 2023-01-11T21:38:05.8014039Z return (arg0_1, ) 2023-01-11T21:38:05.8014047Z 2023-01-11T21:38:05.8014054Z 2023-01-11T21:38:05.8014199Z if __name__ == "__main__": 2023-01-11T21:38:05.8014387Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8014851Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8015198Z arg0_1 = rand_strided((2, 13, 12, 17), (2652, 17, 221, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8015523Z arg1_1 = rand_strided((2, 13, 1, 17), (221, 17, 17, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8015698Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8015706Z 2023-01-11T21:38:05.8015832Z ok (1.785s) 2023-01-11T21:38:05.8016438Z test_addmm_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8016655Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8017033Z [2023-01-11 21:24:19,333] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 15 2023-01-11T21:38:05.8017626Z [2023-01-11 21:24:21,126] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 15 2023-01-11T21:38:05.8017634Z 2023-01-11T21:38:05.8017876Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8018005Z import torch 2023-01-11T21:38:05.8018133Z import random 2023-01-11T21:38:05.8018328Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8018540Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8018548Z 2023-01-11T21:38:05.8018656Z aten = torch.ops.aten 2023-01-11T21:38:05.8018861Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8019023Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8019030Z 2023-01-11T21:38:05.8019160Z import triton 2023-01-11T21:38:05.8019312Z import triton.language as tl 2023-01-11T21:38:05.8019505Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8019723Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8019735Z 2023-01-11T21:38:05.8019741Z 2023-01-11T21:38:05.8019969Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8020237Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8020423Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8020605Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8020777Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.8020999Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8021161Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8021309Z float* __restrict__ out_ptr2) 2023-01-11T21:38:05.8021432Z { 2023-01-11T21:38:05.8021570Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8021686Z { 2023-01-11T21:38:05.8021823Z #pragma omp for 2023-01-11T21:38:05.8021993Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8022111Z { 2023-01-11T21:38:05.8022330Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8022540Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.8022657Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8022884Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8023027Z } 2023-01-11T21:38:05.8023185Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8023330Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.8023445Z { 2023-01-11T21:38:05.8023643Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8023780Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.8023950Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8024087Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8024200Z } 2023-01-11T21:38:05.8024336Z #pragma omp for 2023-01-11T21:38:05.8024476Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8024801Z { 2023-01-11T21:38:05.8024980Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.8025193Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.8025456Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8025613Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8025724Z } 2023-01-11T21:38:05.8025888Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8026055Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.8026143Z {
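// Dead remainder loop: each 8x8 operand holds 64 floats, which the
// 8-wide vectorized loop above covers exactly (8 iterations x 8 lanes),
// so this tail runs from 64 to 64, i.e. zero iterations. Inductor emits
// the scalar tail unconditionally; presumably the C++ compiler is
// expected to eliminate it.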
2023-01-11T21:38:05.8026290Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:05.8026455Z auto tmp1 = static_cast<float>(2); 2023-01-11T21:38:05.8026611Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8026796Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.8027482Z } 2023-01-11T21:38:05.8027624Z #pragma omp for 2023-01-11T21:38:05.8027738Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8027929Z { 2023-01-11T21:38:05.8028139Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr2 + 8*i0); 2023-01-11T21:38:05.8028732Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3)); 2023-01-11T21:38:05.8028904Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8029063Z tmp2.store(out_ptr2 + 8*i0); 2023-01-11T21:38:05.8029179Z } 2023-01-11T21:38:05.8029313Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8029457Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.8029566Z { 2023-01-11T21:38:05.8029713Z auto tmp0 = in_ptr2[i0]; 2023-01-11T21:38:05.8029872Z auto tmp1 = static_cast<float>(3); 2023-01-11T21:38:05.8030030Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8030163Z out_ptr2[i0] = tmp2; 2023-01-11T21:38:05.8030249Z } 2023-01-11T21:38:05.8030431Z } 2023-01-11T21:38:05.8030544Z } 2023-01-11T21:38:05.8030706Z ''') 2023-01-11T21:38:05.8030719Z 2023-01-11T21:38:05.8030724Z 2023-01-11T21:38:05.8030941Z kernel_cpp_1 = async_compile.cpp(''' 2023-01-11T21:38:05.8031247Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8031453Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:05.8031577Z { 2023-01-11T21:38:05.8031708Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8031828Z { 2023-01-11T21:38:05.8031969Z #pragma omp for 2023-01-11T21:38:05.8032156Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8032270Z { 2023-01-11T21:38:05.8032499Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.8032715Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(4)); 2023-01-11T21:38:05.8032835Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8032998Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.8033111Z } 2023-01-11T21:38:05.8033329Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8033473Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.8033586Z { 2023-01-11T21:38:05.8033763Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.8033905Z auto tmp1 = static_cast<float>(4); 2023-01-11T21:38:05.8034109Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8034257Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8034377Z } 2023-01-11T21:38:05.8034489Z } 2023-01-11T21:38:05.8034633Z } 2023-01-11T21:38:05.8034799Z ''') 2023-01-11T21:38:05.8034808Z 2023-01-11T21:38:05.8034814Z 2023-01-11T21:38:05.8034956Z async_compile.wait(globals()) 2023-01-11T21:38:05.8035120Z del async_compile 2023-01-11T21:38:05.8035128Z 2023-01-11T21:38:05.8035254Z def call(args): 2023-01-11T21:38:05.8035394Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.8035537Z args.clear() 2023-01-11T21:38:05.8035833Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8036187Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8036515Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8036836Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.8036972Z del arg0_1 2023-01-11T21:38:05.8037095Z del arg1_1 2023-01-11T21:38:05.8037207Z del
arg2_1 2023-01-11T21:38:05.8037534Z buf3 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8037729Z aten.addmm.out(buf0, buf1, buf2, beta=1, alpha=1, out=buf3) 2023-01-11T21:38:05.8037853Z del buf0 2023-01-11T21:38:05.8037945Z del buf1 2023-01-11T21:38:05.8038065Z del buf2 2023-01-11T21:38:05.8038213Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:05.8038404Z kernel_cpp_1(c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8038592Z return (buf4, ) 2023-01-11T21:38:05.8038600Z 2023-01-11T21:38:05.8038607Z 2023-01-11T21:38:05.8038734Z if __name__ == "__main__": 2023-01-11T21:38:05.8038926Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8039144Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8039431Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8039780Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8040080Z arg2_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8040271Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.8040279Z 2023-01-11T21:38:05.8040399Z ok (1.901s) 2023-01-11T21:38:05.8041069Z test_alexnet_prefix_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8041267Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8041656Z [2023-01-11 21:24:21,321] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 16 2023-01-11T21:38:05.8041998Z [2023-01-11 21:24:21,408] torch._inductor.scheduler: [DEBUG] removed dead node: buf2 2023-01-11T21:38:05.8042356Z [2023-01-11 21:24:23,118] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 16 2023-01-11T21:38:05.8042395Z 2023-01-11T21:38:05.8042529Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8042681Z import torch 2023-01-11T21:38:05.8042808Z import random 2023-01-11T21:38:05.8042999Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8043188Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8043200Z 2023-01-11T21:38:05.8043336Z aten = torch.ops.aten 2023-01-11T21:38:05.8043572Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8043693Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8043776Z 2023-01-11T21:38:05.8043875Z import triton 2023-01-11T21:38:05.8044070Z import triton.language as tl 2023-01-11T21:38:05.8044265Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8044481Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8044490Z 2023-01-11T21:38:05.8044496Z 2023-01-11T21:38:05.8044707Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8045003Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8045199Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8045358Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8045446Z { 2023-01-11T21:38:05.8045604Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8045712Z { 
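// Fused ReLU + 3x3/stride-2 max pooling over the convolution output buf0
// of shape (16, 64, 55, 55): the loop nest below walks all 1024
// batch-channel planes, gathers a 3x3 window per output position
// (offsets 0/1/2, 55/56/57, 110/111/112 within the 55-wide rows),
// applies the branch-free ReLU tmp * (tmp > 0), and reduces with
// (x != x) ? x : std::max(y, x) -- a max that propagates NaN instead of
// discarding it -- to produce the (16, 64, 27, 27) pooled result.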
2023-01-11T21:38:05.8045845Z #pragma omp for 2023-01-11T21:38:05.8046172Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.8046298Z { 2023-01-11T21:38:05.8046410Z #pragma GCC ivdep 2023-01-11T21:38:05.8046566Z for(long i1=0; i1<27; i1+=1) 2023-01-11T21:38:05.8046682Z { 2023-01-11T21:38:05.8046882Z #pragma GCC ivdep 2023-01-11T21:38:05.8047059Z for(long i2=0; i2<27; i2+=1) 2023-01-11T21:38:05.8047177Z { 2023-01-11T21:38:05.8047299Z { 2023-01-11T21:38:05.8047395Z { 2023-01-11T21:38:05.8047584Z auto tmp0 = in_ptr0[(2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8047814Z auto tmp2 = in_ptr0[1 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8048007Z auto tmp5 = in_ptr0[2 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8048254Z auto tmp8 = in_ptr0[55 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8048450Z auto tmp11 = in_ptr0[56 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8048663Z auto tmp14 = in_ptr0[57 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8048852Z auto tmp17 = in_ptr0[110 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8049013Z auto tmp20 = in_ptr0[111 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8049205Z auto tmp23 = in_ptr0[112 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8049372Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:05.8049535Z auto tmp3 = tmp2 * (tmp2>0); 2023-01-11T21:38:05.8049742Z auto tmp4 = (tmp1 != tmp1) ? tmp1 : std::max(tmp3, tmp1); 2023-01-11T21:38:05.8049953Z auto tmp6 = tmp5 * (tmp5>0); 2023-01-11T21:38:05.8050175Z auto tmp7 = (tmp4 != tmp4) ? tmp4 : std::max(tmp6, tmp4); 2023-01-11T21:38:05.8050339Z auto tmp9 = tmp8 * (tmp8>0); 2023-01-11T21:38:05.8050552Z auto tmp10 = (tmp7 != tmp7) ? tmp7 : std::max(tmp9, tmp7); 2023-01-11T21:38:05.8050698Z auto tmp12 = tmp11 * (tmp11>0); 2023-01-11T21:38:05.8050909Z auto tmp13 = (tmp10 != tmp10) ? tmp10 : std::max(tmp12, tmp10); 2023-01-11T21:38:05.8051099Z auto tmp15 = tmp14 * (tmp14>0); 2023-01-11T21:38:05.8051312Z auto tmp16 = (tmp13 != tmp13) ? tmp13 : std::max(tmp15, tmp13); 2023-01-11T21:38:05.8051489Z auto tmp18 = tmp17 * (tmp17>0); 2023-01-11T21:38:05.8051705Z auto tmp19 = (tmp16 != tmp16) ? tmp16 : std::max(tmp18, tmp16); 2023-01-11T21:38:05.8051877Z auto tmp21 = tmp20 * (tmp20>0); 2023-01-11T21:38:05.8052098Z auto tmp22 = (tmp19 != tmp19) ? tmp19 : std::max(tmp21, tmp19); 2023-01-11T21:38:05.8052274Z auto tmp24 = tmp23 * (tmp23>0); 2023-01-11T21:38:05.8052494Z auto tmp25 = (tmp22 != tmp22) ? 
tmp22 : std::max(tmp24, tmp22); 2023-01-11T21:38:05.8052665Z out_ptr0[i2 + (27*i1) + (729*i0)] = tmp25; 2023-01-11T21:38:05.8052787Z } 2023-01-11T21:38:05.8052931Z } 2023-01-11T21:38:05.8053088Z } 2023-01-11T21:38:05.8053199Z } 2023-01-11T21:38:05.8053320Z } 2023-01-11T21:38:05.8053402Z } 2023-01-11T21:38:05.8053507Z } 2023-01-11T21:38:05.8053655Z ''') 2023-01-11T21:38:05.8053663Z 2023-01-11T21:38:05.8053670Z 2023-01-11T21:38:05.8053820Z async_compile.wait(globals()) 2023-01-11T21:38:05.8053944Z del async_compile 2023-01-11T21:38:05.8053957Z 2023-01-11T21:38:05.8054105Z def call(args): 2023-01-11T21:38:05.8054249Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.8054343Z args.clear() 2023-01-11T21:38:05.8054744Z buf0 = aten.convolution(arg2_1, arg1_1, arg0_1, (4, 4), (2, 2), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:05.8054890Z assert_size_stride(buf0, (16, 64, 55, 55), (193600, 3025, 55, 1)) 2023-01-11T21:38:05.8054979Z del arg0_1 2023-01-11T21:38:05.8055067Z del arg1_1 2023-01-11T21:38:05.8055155Z del arg2_1 2023-01-11T21:38:05.8055410Z buf1 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8055557Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8055682Z return (buf1, ) 2023-01-11T21:38:05.8055688Z 2023-01-11T21:38:05.8055692Z 2023-01-11T21:38:05.8055837Z if __name__ == "__main__": 2023-01-11T21:38:05.8055971Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8056194Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8056409Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8056645Z arg1_1 = rand_strided((64, 3, 11, 11), (363, 121, 11, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8056896Z arg2_1 = rand_strided((16, 3, 224, 224), (150528, 50176, 224, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8057051Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.8057056Z 2023-01-11T21:38:05.8057195Z ok (1.955s) 2023-01-11T21:38:05.8057745Z test_any_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8057902Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8058182Z [2023-01-11 21:24:23,198] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 17 2023-01-11T21:38:05.8058465Z [2023-01-11 21:24:24,949] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 17 2023-01-11T21:38:05.8058901Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8059050Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8059330Z [2023-01-11 21:24:24,991] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 18 2023-01-11T21:38:05.8059613Z [2023-01-11 21:24:26,864] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 18 2023-01-11T21:38:05.8059619Z 2023-01-11T21:38:05.8059798Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8059869Z import torch 2023-01-11T21:38:05.8059967Z import random 2023-01-11T21:38:05.8060144Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8060288Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8060294Z 2023-01-11T21:38:05.8060392Z aten = torch.ops.aten 2023-01-11T21:38:05.8060556Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8060669Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8060674Z 2023-01-11T21:38:05.8060767Z import triton 2023-01-11T21:38:05.8060857Z import triton.language as tl 2023-01-11T21:38:05.8061016Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8061177Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8061186Z 2023-01-11T21:38:05.8061190Z 2023-01-11T21:38:05.8061350Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8061575Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8061721Z extern "C" void kernel(bool* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.8061847Z bool* __restrict__ in_out_ptr1, 2023-01-11T21:38:05.8061980Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8062081Z bool* __restrict__ out_ptr0, 2023-01-11T21:38:05.8062240Z bool* __restrict__ out_ptr1) 2023-01-11T21:38:05.8062336Z { 2023-01-11T21:38:05.8062443Z auto out_ptr2 = in_out_ptr0; 2023-01-11T21:38:05.8062547Z auto out_ptr3 = in_out_ptr1; 2023-01-11T21:38:05.8062637Z { 2023-01-11T21:38:05.8062719Z { 2023-01-11T21:38:05.8062795Z bool tmp2 = 0; 2023-01-11T21:38:05.8062888Z bool tmp4 = 0; 2023-01-11T21:38:05.8062981Z bool tmp8 = 0; 2023-01-11T21:38:05.8063107Z bool tmp10 = 0; 2023-01-11T21:38:05.8063249Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8063339Z { 2023-01-11T21:38:05.8063530Z #pragma omp for reduction(||:tmp2) reduction(||:tmp4) reduction(||:tmp8) reduction(||:tmp10) 2023-01-11T21:38:05.8063625Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8063712Z { 2023-01-11T21:38:05.8063800Z { 2023-01-11T21:38:05.8063921Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8064095Z auto tmp1 = static_cast<bool>(tmp0); 2023-01-11T21:38:05.8064227Z auto tmp3 = std::isinf(tmp0); 2023-01-11T21:38:05.8064360Z auto tmp5 = tmp3 == 0; 2023-01-11T21:38:05.8064472Z auto tmp6 = static_cast<float>(tmp5); 2023-01-11T21:38:05.8064602Z auto tmp7 = static_cast<bool>(tmp6); 2023-01-11T21:38:05.8064716Z auto tmp9 = tmp5 == 0; 2023-01-11T21:38:05.8064828Z tmp2 = tmp2 || tmp1; 2023-01-11T21:38:05.8064938Z tmp4 = tmp4 || tmp3; 2023-01-11T21:38:05.8065047Z tmp8 = tmp8 || tmp7; 2023-01-11T21:38:05.8065161Z tmp10 = tmp10 || tmp9; 2023-01-11T21:38:05.8065234Z } 2023-01-11T21:38:05.8065342Z } 2023-01-11T21:38:05.8065451Z } 2023-01-11T21:38:05.8065564Z out_ptr0[0] = tmp2; 2023-01-11T21:38:05.8065663Z out_ptr1[0] = tmp4; 2023-01-11T21:38:05.8065760Z out_ptr2[0] = tmp8; 2023-01-11T21:38:05.8065858Z out_ptr3[0] = tmp10;
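// A single pass over the 64 inputs computes four ||-reductions at once
// via the OpenMP reduction clauses above: tmp2 = any(x != 0),
// tmp4 = any(isinf(x)), tmp8 = any(!isinf(x)) (through a bool->float->bool
// round trip), and tmp10 = any(isinf(x)) again via the double negation
// tmp9. The scalar blocks below then negate tmp8 and tmp10 in place,
// using !any(p) == all(!p) to produce the all()-style outputs.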
2023-01-11T21:38:05.8065921Z } 2023-01-11T21:38:05.8066036Z } 2023-01-11T21:38:05.8066123Z { 2023-01-11T21:38:05.8066205Z { 2023-01-11T21:38:05.8066324Z auto tmp0 = out_ptr2[0]; 2023-01-11T21:38:05.8066429Z auto tmp1 = tmp0 == 0; 2023-01-11T21:38:05.8066514Z in_out_ptr0[0] = tmp1; 2023-01-11T21:38:05.8066596Z } 2023-01-11T21:38:05.8066674Z } 2023-01-11T21:38:05.8066756Z { 2023-01-11T21:38:05.8066845Z { 2023-01-11T21:38:05.8066949Z auto tmp0 = out_ptr3[0]; 2023-01-11T21:38:05.8067052Z auto tmp1 = tmp0 == 0; 2023-01-11T21:38:05.8067161Z in_out_ptr1[0] = tmp1; 2023-01-11T21:38:05.8067260Z } 2023-01-11T21:38:05.8067343Z } 2023-01-11T21:38:05.8067420Z } 2023-01-11T21:38:05.8067524Z ''') 2023-01-11T21:38:05.8067530Z 2023-01-11T21:38:05.8067534Z 2023-01-11T21:38:05.8067688Z async_compile.wait(globals()) 2023-01-11T21:38:05.8067780Z del async_compile 2023-01-11T21:38:05.8067785Z 2023-01-11T21:38:05.8067855Z def call(args): 2023-01-11T21:38:05.8067943Z arg0_1, = args 2023-01-11T21:38:05.8068034Z args.clear() 2023-01-11T21:38:05.8068248Z buf0 = empty_strided((), (), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8068447Z buf1 = empty_strided((), (), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8068644Z buf2 = empty_strided((), (), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8068846Z buf3 = empty_strided((), (), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8068931Z buf4 = buf2; del buf2 # reuse 2023-01-11T21:38:05.8069040Z buf5 = buf3; del buf3 # reuse 2023-01-11T21:38:05.8069275Z kernel_cpp_0(c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8069364Z del arg0_1 2023-01-11T21:38:05.8069475Z return (buf0, buf1, buf4, buf5, ) 2023-01-11T21:38:05.8069480Z 2023-01-11T21:38:05.8069485Z 2023-01-11T21:38:05.8069594Z if __name__ == "__main__": 2023-01-11T21:38:05.8069727Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8069878Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8070073Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8070276Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8070281Z 2023-01-11T21:38:05.8070286Z 2023-01-11T21:38:05.8070398Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8070486Z import torch 2023-01-11T21:38:05.8070576Z import random 2023-01-11T21:38:05.8070714Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8070879Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8070884Z 2023-01-11T21:38:05.8070983Z aten = torch.ops.aten 2023-01-11T21:38:05.8071118Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8071227Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8071232Z 2023-01-11T21:38:05.8071321Z import triton 2023-01-11T21:38:05.8071429Z import triton.language as tl 2023-01-11T21:38:05.8071570Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8071726Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8071735Z 2023-01-11T21:38:05.8071739Z 2023-01-11T21:38:05.8071902Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8072144Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8072259Z extern "C" void kernel(bool* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.8072385Z bool* __restrict__ in_out_ptr1, 2023-01-11T21:38:05.8072511Z 
const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8072666Z bool* __restrict__ out_ptr0, 2023-01-11T21:38:05.8072783Z bool* __restrict__ out_ptr1) 2023-01-11T21:38:05.8072863Z { 2023-01-11T21:38:05.8072979Z auto out_ptr3 = in_out_ptr0; 2023-01-11T21:38:05.8073064Z auto out_ptr2 = in_out_ptr1; 2023-01-11T21:38:05.8073182Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8073276Z { 2023-01-11T21:38:05.8073373Z #pragma omp for 2023-01-11T21:38:05.8073475Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.8073562Z { 2023-01-11T21:38:05.8073644Z { 2023-01-11T21:38:05.8073708Z { 2023-01-11T21:38:05.8073815Z bool tmp2 = 0; 2023-01-11T21:38:05.8073929Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:05.8074015Z { 2023-01-11T21:38:05.8074145Z { 2023-01-11T21:38:05.8074276Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:05.8074446Z auto tmp1 = static_cast<bool>(tmp0); 2023-01-11T21:38:05.8074540Z tmp2 = tmp2 || tmp1; 2023-01-11T21:38:05.8074628Z } 2023-01-11T21:38:05.8074722Z } 2023-01-11T21:38:05.8074829Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8074915Z } 2023-01-11T21:38:05.8074998Z } 2023-01-11T21:38:05.8075093Z } 2023-01-11T21:38:05.8075156Z } 2023-01-11T21:38:05.8075238Z { 2023-01-11T21:38:05.8075323Z { 2023-01-11T21:38:05.8075428Z bool tmp2 = 0; 2023-01-11T21:38:05.8075526Z bool tmp5 = 0; 2023-01-11T21:38:05.8075655Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8075718Z { 2023-01-11T21:38:05.8075872Z #pragma omp for reduction(||:tmp2) reduction(||:tmp5) 2023-01-11T21:38:05.8075988Z for(long i0=0; i0<128; i0+=1) 2023-01-11T21:38:05.8076086Z { 2023-01-11T21:38:05.8076205Z { 2023-01-11T21:38:05.8076332Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8076460Z auto tmp1 = std::isinf(tmp0); 2023-01-11T21:38:05.8076554Z auto tmp3 = tmp1 == 0; 2023-01-11T21:38:05.8076669Z auto tmp4 = tmp3 == 0; 2023-01-11T21:38:05.8076779Z tmp2 = tmp2 || tmp1; 2023-01-11T21:38:05.8076892Z tmp5 = tmp5 || tmp4; 2023-01-11T21:38:05.8077012Z } 2023-01-11T21:38:05.8077117Z } 2023-01-11T21:38:05.8077206Z } 2023-01-11T21:38:05.8077287Z out_ptr1[0] = tmp2; 2023-01-11T21:38:05.8077387Z out_ptr2[0] = tmp5; 2023-01-11T21:38:05.8077470Z } 2023-01-11T21:38:05.8077554Z } 2023-01-11T21:38:05.8077677Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8077757Z { 2023-01-11T21:38:05.8077863Z #pragma omp for 2023-01-11T21:38:05.8077945Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8078076Z { 2023-01-11T21:38:05.8078160Z { 2023-01-11T21:38:05.8078242Z { 2023-01-11T21:38:05.8078344Z bool tmp5 = 0; 2023-01-11T21:38:05.8078458Z for(long i1=0; i1<16; i1+=1) 2023-01-11T21:38:05.8078524Z { 2023-01-11T21:38:05.8078617Z { 2023-01-11T21:38:05.8078752Z auto tmp0 = in_ptr0[i0 + (8*i1)]; 2023-01-11T21:38:05.8078887Z auto tmp1 = std::isinf(tmp0); 2023-01-11T21:38:05.8079018Z auto tmp2 = tmp1 == 0; 2023-01-11T21:38:05.8079153Z auto tmp3 = static_cast<float>(tmp2); 2023-01-11T21:38:05.8079286Z auto tmp4 = static_cast<bool>(tmp3); 2023-01-11T21:38:05.8079403Z tmp5 = tmp5 || tmp4; 2023-01-11T21:38:05.8079474Z } 2023-01-11T21:38:05.8079563Z } 2023-01-11T21:38:05.8079677Z out_ptr3[i0] = tmp5; 2023-01-11T21:38:05.8079762Z } 2023-01-11T21:38:05.8079846Z } 2023-01-11T21:38:05.8079979Z } 2023-01-11T21:38:05.8080058Z #pragma omp for 2023-01-11T21:38:05.8080165Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8080247Z { 2023-01-11T21:38:05.8080332Z { 2023-01-11T21:38:05.8241079Z { 2023-01-11T21:38:05.8241365Z auto tmp0 = out_ptr3[i0]; 2023-01-11T21:38:05.8241478Z auto tmp1 = tmp0 == 0; 2023-01-11T21:38:05.8241579Z in_out_ptr0[i0] = tmp1;
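// This negation finishes the per-column reduction computed just above:
// out_ptr3[i0] (aliased to in_out_ptr0 at the top of the kernel) held
// any(!isinf) over column i0, gathered with stride-8 loads
// in_ptr0[i0 + (8*i1)]; since !any(!p) == all(p), in_out_ptr0 now holds
// the per-column all()-style result, mirroring the scalar negation of
// the global reduction in the #pragma omp single block below.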
2023-01-11T21:38:05.8241656Z } 2023-01-11T21:38:05.8241732Z } 2023-01-11T21:38:05.8241804Z } 2023-01-11T21:38:05.8241897Z #pragma omp single 2023-01-11T21:38:05.8242263Z { 2023-01-11T21:38:05.8242338Z { 2023-01-11T21:38:05.8242409Z { 2023-01-11T21:38:05.8242510Z auto tmp0 = out_ptr2[0]; 2023-01-11T21:38:05.8242606Z auto tmp1 = tmp0 == 0; 2023-01-11T21:38:05.8242699Z in_out_ptr1[0] = tmp1; 2023-01-11T21:38:05.8242768Z } 2023-01-11T21:38:05.8242831Z } 2023-01-11T21:38:05.8242900Z } 2023-01-11T21:38:05.8242965Z } 2023-01-11T21:38:05.8243034Z } 2023-01-11T21:38:05.8243233Z ''') 2023-01-11T21:38:05.8243242Z 2023-01-11T21:38:05.8243246Z 2023-01-11T21:38:05.8243348Z async_compile.wait(globals()) 2023-01-11T21:38:05.8243429Z del async_compile 2023-01-11T21:38:05.8243434Z 2023-01-11T21:38:05.8243505Z def call(args): 2023-01-11T21:38:05.8243585Z arg0_1, = args 2023-01-11T21:38:05.8243661Z args.clear() 2023-01-11T21:38:05.8243869Z buf0 = empty_strided((16, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8244054Z buf1 = empty_strided((), (), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8244234Z buf4 = empty_strided((), (), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8244423Z buf2 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8244516Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:05.8244600Z buf5 = buf4; del buf4 # reuse 2023-01-11T21:38:05.8244822Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8244897Z del arg0_1 2023-01-11T21:38:05.8244991Z return (buf0, buf1, buf3, buf5, ) 2023-01-11T21:38:05.8245068Z 2023-01-11T21:38:05.8245073Z 2023-01-11T21:38:05.8245155Z if __name__ == "__main__": 2023-01-11T21:38:05.8245280Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8245408Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8245642Z arg0_1 = rand_strided((16, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8245768Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8245774Z 2023-01-11T21:38:05.8245845Z ok (3.704s) 2023-01-11T21:38:05.8246299Z test_arange1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8246432Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8246697Z [2023-01-11 21:24:26,920] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 19 2023-01-11T21:38:05.8246962Z [2023-01-11 21:24:28,649] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 19 2023-01-11T21:38:05.8246971Z 2023-01-11T21:38:05.8247072Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8247148Z import torch 2023-01-11T21:38:05.8247222Z import random 2023-01-11T21:38:05.8247337Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8247463Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8247468Z 2023-01-11T21:38:05.8247550Z aten = torch.ops.aten 2023-01-11T21:38:05.8247687Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8247784Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8247790Z 2023-01-11T21:38:05.8247866Z import triton 2023-01-11T21:38:05.8247963Z import triton.language as tl 2023-01-11T21:38:05.8248083Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8248225Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8248231Z 2023-01-11T21:38:05.8248235Z 2023-01-11T21:38:05.8248373Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8248613Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8248737Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8248843Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8248945Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8249009Z { 2023-01-11T21:38:05.8249106Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8249174Z { 2023-01-11T21:38:05.8249259Z #pragma omp for 2023-01-11T21:38:05.8249347Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8249415Z { 2023-01-11T21:38:05.8249486Z { 2023-01-11T21:38:05.8249553Z { 2023-01-11T21:38:05.8249645Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8249756Z auto tmp1 = static_cast<float>(i0); 2023-01-11T21:38:05.8249853Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.8249945Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8250014Z } 2023-01-11T21:38:05.8250084Z } 2023-01-11T21:38:05.8250154Z } 2023-01-11T21:38:05.8250229Z #pragma omp for 2023-01-11T21:38:05.8250315Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8250383Z { 2023-01-11T21:38:05.8250469Z #pragma GCC ivdep 2023-01-11T21:38:05.8250557Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:05.8250625Z { 2023-01-11T21:38:05.8250688Z { 2023-01-11T21:38:05.8250759Z { 2023-01-11T21:38:05.8250870Z auto tmp0 = out_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:05.8251015Z auto tmp1 = static_cast<long>(10 + i1); 2023-01-11T21:38:05.8251133Z auto tmp2 = static_cast<float>(tmp1); 2023-01-11T21:38:05.8251232Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:05.8251333Z out_ptr1[i1 + (8*i0)] = tmp3; 2023-01-11T21:38:05.8251408Z } 2023-01-11T21:38:05.8251472Z } 2023-01-11T21:38:05.8251539Z } 2023-01-11T21:38:05.8251605Z } 2023-01-11T21:38:05.8251673Z } 2023-01-11T21:38:05.8251741Z } 2023-01-11T21:38:05.8251831Z ''') 2023-01-11T21:38:05.8251837Z 2023-01-11T21:38:05.8251841Z 2023-01-11T21:38:05.8251933Z async_compile.wait(globals()) 2023-01-11T21:38:05.8252004Z del async_compile 2023-01-11T21:38:05.8252009Z 2023-01-11T21:38:05.8252084Z def call(args):
2023-01-11T21:38:05.8252160Z arg0_1, = args 2023-01-11T21:38:05.8252233Z args.clear() 2023-01-11T21:38:05.8252432Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8252629Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8252799Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8252867Z del arg0_1 2023-01-11T21:38:05.8252949Z return (buf0, buf1, ) 2023-01-11T21:38:05.8252957Z 2023-01-11T21:38:05.8252962Z 2023-01-11T21:38:05.8253044Z if __name__ == "__main__": 2023-01-11T21:38:05.8253160Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8253288Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8253486Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8253599Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8253604Z 2023-01-11T21:38:05.8253675Z ok (1.841s) 2023-01-11T21:38:05.8254120Z test_arange2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8254295Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8254712Z [2023-01-11 21:24:28,737] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 20 2023-01-11T21:38:05.8254981Z [2023-01-11 21:24:30,441] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 20 2023-01-11T21:38:05.8254987Z 2023-01-11T21:38:05.8255086Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8255163Z import torch 2023-01-11T21:38:05.8255242Z import random 2023-01-11T21:38:05.8255360Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8255485Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8255494Z 2023-01-11T21:38:05.8255571Z aten = torch.ops.aten 2023-01-11T21:38:05.8255707Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8255803Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8255808Z 2023-01-11T21:38:05.8255883Z import triton 2023-01-11T21:38:05.8255977Z import triton.language as tl 2023-01-11T21:38:05.8256101Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8256243Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8256248Z 2023-01-11T21:38:05.8256253Z 2023-01-11T21:38:05.8256392Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8256593Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8256716Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.8256818Z long* __restrict__ out_ptr0) 2023-01-11T21:38:05.8256885Z { 2023-01-11T21:38:05.8257032Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8257098Z { 2023-01-11T21:38:05.8257248Z #pragma omp for 2023-01-11T21:38:05.8257330Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8257401Z { 2023-01-11T21:38:05.8257486Z #pragma GCC ivdep 2023-01-11T21:38:05.8257574Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:05.8257646Z { 2023-01-11T21:38:05.8257715Z { 2023-01-11T21:38:05.8257785Z { 
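// No arange buffer is materialized in this kernel: the iota term is
// folded straight into the index arithmetic (consistent with an
// x + torch.arange(8) pattern in the test), so each output is computed
// below as in_ptr0[i1 + (8*i0)] + static_cast<long>(i1), reading only
// the int64 input, with #pragma GCC ivdep asserting the inner
// iterations are independent.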
2023-01-11T21:38:05.8257887Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:05.8258001Z auto tmp1 = static_cast(i1); 2023-01-11T21:38:05.8258102Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8258202Z out_ptr0[i1 + (8*i0)] = tmp2; 2023-01-11T21:38:05.8258275Z } 2023-01-11T21:38:05.8258345Z } 2023-01-11T21:38:05.8258412Z } 2023-01-11T21:38:05.8258474Z } 2023-01-11T21:38:05.8258544Z } 2023-01-11T21:38:05.8258609Z } 2023-01-11T21:38:05.8258694Z ''') 2023-01-11T21:38:05.8258699Z 2023-01-11T21:38:05.8258704Z 2023-01-11T21:38:05.8258800Z async_compile.wait(globals()) 2023-01-11T21:38:05.8258877Z del async_compile 2023-01-11T21:38:05.8258882Z 2023-01-11T21:38:05.8258957Z def call(args): 2023-01-11T21:38:05.8259028Z arg0_1, = args 2023-01-11T21:38:05.8259104Z args.clear() 2023-01-11T21:38:05.8259297Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8259438Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8259512Z del arg0_1 2023-01-11T21:38:05.8259587Z return (buf0, ) 2023-01-11T21:38:05.8259592Z 2023-01-11T21:38:05.8259596Z 2023-01-11T21:38:05.8259677Z if __name__ == "__main__": 2023-01-11T21:38:05.8259789Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8259915Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8260107Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8260221Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8260226Z 2023-01-11T21:38:05.8260298Z ok (1.735s) 2023-01-11T21:38:05.8260823Z test_arange3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8260957Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8261214Z [2023-01-11 21:24:30,478] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 21 2023-01-11T21:38:05.8261474Z [2023-01-11 21:24:32,182] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 21 2023-01-11T21:38:05.8261483Z 2023-01-11T21:38:05.8261582Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8261651Z import torch 2023-01-11T21:38:05.8261727Z import random 2023-01-11T21:38:05.8261847Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8261969Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8261976Z 2023-01-11T21:38:05.8262060Z aten = torch.ops.aten 2023-01-11T21:38:05.8262196Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8262292Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8262297Z 2023-01-11T21:38:05.8262365Z import triton 2023-01-11T21:38:05.8262458Z import triton.language as tl 2023-01-11T21:38:05.8262585Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8262724Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8262730Z 2023-01-11T21:38:05.8262735Z 2023-01-11T21:38:05.8262871Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8263113Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8263239Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8263343Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8263403Z { 2023-01-11T21:38:05.8263507Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8263574Z { 2023-01-11T21:38:05.8263655Z #pragma omp for 2023-01-11T21:38:05.8263746Z for(long i0=0; i0<14; i0+=1) 2023-01-11T21:38:05.8263815Z { 2023-01-11T21:38:05.8263882Z { 2023-01-11T21:38:05.8263945Z { 2023-01-11T21:38:05.8264042Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8264155Z auto tmp1 = static_cast(4*i0); 2023-01-11T21:38:05.8264266Z auto tmp2 = static_cast(tmp1); 2023-01-11T21:38:05.8264368Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:05.8264461Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.8264531Z } 2023-01-11T21:38:05.8264592Z } 2023-01-11T21:38:05.8264662Z } 2023-01-11T21:38:05.8264728Z } 2023-01-11T21:38:05.8264792Z } 2023-01-11T21:38:05.8264876Z ''') 2023-01-11T21:38:05.8264882Z 2023-01-11T21:38:05.8264886Z 2023-01-11T21:38:05.8264982Z async_compile.wait(globals()) 2023-01-11T21:38:05.8265059Z del async_compile 2023-01-11T21:38:05.8265064Z 2023-01-11T21:38:05.8265132Z def call(args): 2023-01-11T21:38:05.8265209Z arg0_1, = args 2023-01-11T21:38:05.8265285Z args.clear() 2023-01-11T21:38:05.8265481Z buf0 = empty_strided((14, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8265619Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8265692Z del arg0_1 2023-01-11T21:38:05.8265768Z return (buf0, ) 2023-01-11T21:38:05.8265773Z 2023-01-11T21:38:05.8265777Z 2023-01-11T21:38:05.8265851Z if __name__ == "__main__": 2023-01-11T21:38:05.8265971Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8266102Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8266296Z arg0_1 = rand_strided((14, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8266408Z 
print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8266442Z 2023-01-11T21:38:05.8266516Z ok (1.750s) 2023-01-11T21:38:05.8266960Z test_arange4_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8267090Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8267346Z [2023-01-11 21:24:32,224] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 22 2023-01-11T21:38:05.8267609Z [2023-01-11 21:24:33,924] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 22 2023-01-11T21:38:05.8267614Z 2023-01-11T21:38:05.8267706Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8267779Z import torch 2023-01-11T21:38:05.8267857Z import random 2023-01-11T21:38:05.8267976Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8268099Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8268104Z 2023-01-11T21:38:05.8268185Z aten = torch.ops.aten 2023-01-11T21:38:05.8268322Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8268411Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8268422Z 2023-01-11T21:38:05.8268491Z import triton 2023-01-11T21:38:05.8268583Z import triton.language as tl 2023-01-11T21:38:05.8268709Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8268881Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8268887Z 2023-01-11T21:38:05.8268891Z 2023-01-11T21:38:05.8269028Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8269233Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8269357Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8269455Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8269521Z { 2023-01-11T21:38:05.8269623Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8269689Z { 2023-01-11T21:38:05.8269771Z #pragma omp for 2023-01-11T21:38:05.8269859Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.8269926Z { 2023-01-11T21:38:05.8269988Z { 2023-01-11T21:38:05.8270055Z { 2023-01-11T21:38:05.8270153Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8270326Z auto tmp1 = static_cast(512 + ((-1)*i0)); 2023-01-11T21:38:05.8270438Z auto tmp2 = static_cast(tmp1); 2023-01-11T21:38:05.8270575Z auto tmp3 = tmp0 - tmp2; 2023-01-11T21:38:05.8270665Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.8270728Z } 2023-01-11T21:38:05.8270795Z } 2023-01-11T21:38:05.8270863Z } 2023-01-11T21:38:05.8270930Z } 2023-01-11T21:38:05.8270993Z } 2023-01-11T21:38:05.8271077Z ''') 2023-01-11T21:38:05.8271082Z 2023-01-11T21:38:05.8271086Z 2023-01-11T21:38:05.8271179Z async_compile.wait(globals()) 2023-01-11T21:38:05.8271249Z del async_compile 2023-01-11T21:38:05.8271254Z 2023-01-11T21:38:05.8271330Z def call(args): 2023-01-11T21:38:05.8271404Z arg0_1, = args 2023-01-11T21:38:05.8271478Z args.clear() 2023-01-11T21:38:05.8271676Z buf0 = empty_strided((1024, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8271813Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8271888Z del 
arg0_1 2023-01-11T21:38:05.8271957Z return (buf0, ) 2023-01-11T21:38:05.8271962Z 2023-01-11T21:38:05.8271973Z 2023-01-11T21:38:05.8272048Z if __name__ == "__main__": 2023-01-11T21:38:05.8272167Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8272329Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8272553Z arg0_1 = rand_strided((1024, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8272676Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8272681Z 2023-01-11T21:38:05.8272756Z ok (1.733s) 2023-01-11T21:38:05.8273299Z test_argmax_argmin1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8273445Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8273737Z [2023-01-11 21:24:33,957] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 23 2023-01-11T21:38:05.8274034Z [2023-01-11 21:24:35,652] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 23 2023-01-11T21:38:05.8274039Z 2023-01-11T21:38:05.8274142Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8274217Z import torch 2023-01-11T21:38:05.8274292Z import random 2023-01-11T21:38:05.8274422Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8274557Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8274562Z 2023-01-11T21:38:05.8274648Z aten = torch.ops.aten 2023-01-11T21:38:05.8274791Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8274891Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8274896Z 2023-01-11T21:38:05.8275010Z import triton 2023-01-11T21:38:05.8275126Z import triton.language as tl 2023-01-11T21:38:05.8275277Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8275443Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8275448Z 2023-01-11T21:38:05.8275452Z 2023-01-11T21:38:05.8275607Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8275834Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8275961Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8276069Z long* __restrict__ out_ptr0, 2023-01-11T21:38:05.8276177Z long* __restrict__ out_ptr1) 2023-01-11T21:38:05.8276243Z { 2023-01-11T21:38:05.8276309Z { 2023-01-11T21:38:05.8276379Z { 2023-01-11T21:38:05.8276511Z struct IndexValue_1 {size_t index; float value;}; 2023-01-11T21:38:05.8276802Z IndexValue_1 tmp1{0, -std::numeric_limits::infinity()}; 2023-01-11T21:38:05.8276942Z #pragma omp declare reduction(argmax : struct IndexValue_1 :\ 2023-01-11T21:38:05.8277103Z omp_out.value = omp_in.value < omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:05.8277260Z omp_out.index = omp_in.value < omp_out.value ? 
omp_out.index : omp_in.index)\ 2023-01-11T21:38:05.8277495Z initializer(omp_priv = {0, -std::numeric_limits::infinity()}) 2023-01-11T21:38:05.8277618Z struct IndexValue_2 {size_t index; float value;}; 2023-01-11T21:38:05.8277756Z IndexValue_2 tmp2{0, std::numeric_limits::infinity()}; 2023-01-11T21:38:05.8277898Z #pragma omp declare reduction(argmin : struct IndexValue_2 :\ 2023-01-11T21:38:05.8278048Z omp_out.value = omp_in.value > omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:05.8278191Z omp_out.index = omp_in.value > omp_out.value ? omp_out.index : omp_in.index)\ 2023-01-11T21:38:05.8278342Z initializer(omp_priv = {0, std::numeric_limits::infinity()}) 2023-01-11T21:38:05.8278451Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8278520Z { 2023-01-11T21:38:05.8278709Z #pragma omp for reduction(argmax:tmp1) reduction(argmin:tmp2) 2023-01-11T21:38:05.8278810Z for(long i0=0; i0<524288; i0+=1) 2023-01-11T21:38:05.8278880Z { 2023-01-11T21:38:05.8278955Z { 2023-01-11T21:38:05.8279050Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8279150Z if (tmp1.value < tmp0) { 2023-01-11T21:38:05.8279269Z tmp1.index = i0; tmp1.value = tmp0; 2023-01-11T21:38:05.8279343Z } 2023-01-11T21:38:05.8279441Z if (tmp2.value > tmp0) { 2023-01-11T21:38:05.8279554Z tmp2.index = i0; tmp2.value = tmp0; 2023-01-11T21:38:05.8279630Z } 2023-01-11T21:38:05.8279695Z } 2023-01-11T21:38:05.8279764Z } 2023-01-11T21:38:05.8279829Z } 2023-01-11T21:38:05.8279921Z out_ptr0[0] = tmp1.index; 2023-01-11T21:38:05.8280010Z out_ptr1[0] = tmp2.index; 2023-01-11T21:38:05.8280078Z } 2023-01-11T21:38:05.8280141Z } 2023-01-11T21:38:05.8280205Z } 2023-01-11T21:38:05.8280292Z ''') 2023-01-11T21:38:05.8280298Z 2023-01-11T21:38:05.8280302Z 2023-01-11T21:38:05.8280398Z async_compile.wait(globals()) 2023-01-11T21:38:05.8280474Z del async_compile 2023-01-11T21:38:05.8280479Z 2023-01-11T21:38:05.8280555Z def call(args): 2023-01-11T21:38:05.8280627Z arg0_1, = args 2023-01-11T21:38:05.8280703Z args.clear() 2023-01-11T21:38:05.8280879Z buf0 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8281060Z buf1 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8281229Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8281336Z del arg0_1 2023-01-11T21:38:05.8281418Z return (buf0, buf1, ) 2023-01-11T21:38:05.8281423Z 2023-01-11T21:38:05.8281428Z 2023-01-11T21:38:05.8281510Z if __name__ == "__main__": 2023-01-11T21:38:05.8281630Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8281750Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8281968Z arg0_1 = rand_strided((8, 256, 256), (65536, 256, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8282082Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8282087Z 2023-01-11T21:38:05.8282158Z ok (1.732s) 2023-01-11T21:38:05.8282618Z test_argmax_argmin2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
2023-01-11T21:38:05.8297014Z test_as_strided_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.8297201Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.8297471Z [2023-01-11 21:24:37,403] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 25
2023-01-11T21:38:05.8297736Z [2023-01-11 21:24:39,091] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 25
2023-01-11T21:38:05.8297742Z 
2023-01-11T21:38:05.8297837Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.8297911Z import torch
2023-01-11T21:38:05.8297985Z import random
2023-01-11T21:38:05.8298103Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.8298232Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.8298237Z 
2023-01-11T21:38:05.8298313Z aten = torch.ops.aten
2023-01-11T21:38:05.8298452Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.8298549Z async_compile = AsyncCompile()
2023-01-11T21:38:05.8298555Z 
2023-01-11T21:38:05.8298629Z import triton
2023-01-11T21:38:05.8298762Z import triton.language as tl
2023-01-11T21:38:05.8298885Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.8299024Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.8299030Z 
2023-01-11T21:38:05.8299034Z 
2023-01-11T21:38:05.8299177Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.8299378Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.8299503Z extern "C" void kernel(float* __restrict__ in_out_ptr0,
2023-01-11T21:38:05.8299614Z                        const float* __restrict__ in_ptr0)
2023-01-11T21:38:05.8299681Z {
2023-01-11T21:38:05.8299783Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.8299847Z     {
2023-01-11T21:38:05.8299929Z         #pragma omp for
2023-01-11T21:38:05.8300011Z         for(long i0=0; i0<512; i0+=1)
2023-01-11T21:38:05.8300080Z         {
2023-01-11T21:38:05.8300220Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:05.8300361Z             auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:05.8300451Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.8300584Z             auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(2));
2023-01-11T21:38:05.8300677Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:05.8300780Z             tmp4.store(in_out_ptr0 + 8*i0);
2023-01-11T21:38:05.8300842Z         }
2023-01-11T21:38:05.8300942Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:05.8301035Z         for(long i0=4096; i0<4096; i0+=1)
2023-01-11T21:38:05.8301104Z         {
2023-01-11T21:38:05.8301194Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:05.8301297Z             auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:05.8301380Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.8301483Z             auto tmp3 = static_cast<float>(2);
2023-01-11T21:38:05.8301571Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:05.8301662Z             in_out_ptr0[i0] = tmp4;
2023-01-11T21:38:05.8301729Z         }
2023-01-11T21:38:05.8301795Z     }
2023-01-11T21:38:05.8301862Z }
2023-01-11T21:38:05.8301941Z ''')
2023-01-11T21:38:05.8301946Z 
2023-01-11T21:38:05.8301957Z 
2023-01-11T21:38:05.8302044Z async_compile.wait(globals())
2023-01-11T21:38:05.8302153Z del async_compile
2023-01-11T21:38:05.8302160Z 
2023-01-11T21:38:05.8302238Z def call(args):
2023-01-11T21:38:05.8302313Z     arg0_1, = args
2023-01-11T21:38:05.8302388Z     args.clear()
2023-01-11T21:38:05.8302591Z     buf0 = empty_strided((64, 64), (64, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.8302718Z     buf1 = as_strided(buf0, (8, 8, 64), (512, 64, 1)); del buf0 # reuse
2023-01-11T21:38:05.8302849Z     kernel_cpp_0(c_void_p(buf1.data_ptr()), c_void_p(arg0_1.data_ptr()))
2023-01-11T21:38:05.8302964Z     return (as_strided(arg0_1, (8, 8, 64), (512, 64, 1)), buf1, )
2023-01-11T21:38:05.8302969Z 
2023-01-11T21:38:05.8302974Z 
2023-01-11T21:38:05.8303057Z if __name__ == "__main__":
2023-01-11T21:38:05.8303174Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.8303301Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.8303501Z     arg0_1 = rand_strided((64, 64), (64, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.8303616Z     print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:05.8303621Z 
2023-01-11T21:38:05.8303695Z ok (1.719s)
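The kernel above shows the two-loop shape inductor's CPU backend gives vectorizable pointwise ops: a main loop over full 8-float Vectorized<float> chunks, then a scalar remainder loop; with 4096 elements the remainder range is empty, hence the degenerate for(long i0=4096; i0<4096; i0+=1). A plain-C++ sketch of that split, with a scalar inner loop standing in for the SIMD ops (hypothetical helper, not the ATen API):

    #include <cstdio>

    void add_one_then_two(const float* in, float* out, long n) {
        const long kWidth = 8;                           // lanes per vector register
        const long main_end = (n / kWidth) * kWidth;
        for (long i0 = 0; i0 < main_end; i0 += kWidth) {
            for (long lane = 0; lane < kWidth; lane += 1)  // one simulated vector op
                out[i0 + lane] = (in[i0 + lane] + 1.0f) + 2.0f;
        }
        for (long i0 = main_end; i0 < n; i0 += 1)        // scalar tail
            out[i0] = (in[i0] + 1.0f) + 2.0f;
    }

    int main() {
        float in[13], out[13];
        for (int i = 0; i < 13; i += 1) in[i] = static_cast<float>(i);
        add_one_then_two(in, out, 13);
        std::printf("%g %g\n", out[0], out[12]);         // 3 15
        return 0;
    }

When n is a multiple of the vector width, main_end equals n and the tail loop body never runs, matching the empty-range tail loops in the generated code.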
2023-01-11T21:38:05.8304155Z test_as_strided_scatter_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.8304281Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.8304564Z [2023-01-11 21:24:39,121] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 26
2023-01-11T21:38:05.8304825Z [2023-01-11 21:24:40,876] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 26
2023-01-11T21:38:05.8304831Z 
2023-01-11T21:38:05.8304929Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.8305009Z import torch
2023-01-11T21:38:05.8305086Z import random
2023-01-11T21:38:05.8305205Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.8305339Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.8305345Z 
2023-01-11T21:38:05.8305436Z aten = torch.ops.aten
2023-01-11T21:38:05.8305593Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.8305692Z async_compile = AsyncCompile()
2023-01-11T21:38:05.8305698Z 
2023-01-11T21:38:05.8305773Z import triton
2023-01-11T21:38:05.8305867Z import triton.language as tl
2023-01-11T21:38:05.8305993Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.8306135Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.8306140Z 
2023-01-11T21:38:05.8306145Z 
2023-01-11T21:38:05.8306279Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.8306478Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.8306606Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:05.8306716Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:05.8306819Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:05.8306885Z {
2023-01-11T21:38:05.8306986Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.8307052Z     {
2023-01-11T21:38:05.8307128Z         #pragma omp for
2023-01-11T21:38:05.8307219Z         for(long i0=0; i0<1280; i0+=1)
2023-01-11T21:38:05.8307286Z         {
2023-01-11T21:38:05.8307425Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:05.8307568Z             auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(8));
2023-01-11T21:38:05.8307656Z             auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:05.8307797Z             auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(10));
2023-01-11T21:38:05.8307885Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:05.8308007Z             tmp4.store(out_ptr0 + 8*i0);
2023-01-11T21:38:05.8308076Z         }
2023-01-11T21:38:05.8308177Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:05.8308273Z         for(long i0=10240; i0<10240; i0+=1)
2023-01-11T21:38:05.8308340Z         {
2023-01-11T21:38:05.8308428Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:05.8308532Z             auto tmp1 = static_cast<float>(8);
2023-01-11T21:38:05.8308615Z             auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:05.8308720Z             auto tmp3 = static_cast<float>(10);
2023-01-11T21:38:05.8308810Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:05.8308896Z             out_ptr0[i0] = tmp4;
2023-01-11T21:38:05.8308965Z         }
2023-01-11T21:38:05.8309047Z         #pragma omp for
2023-01-11T21:38:05.8309134Z         for(long i0=0; i0<5120; i0+=1)
2023-01-11T21:38:05.8309194Z         {
2023-01-11T21:38:05.8309262Z             {
2023-01-11T21:38:05.8309331Z                 {
2023-01-11T21:38:05.8309431Z                     auto tmp0 = in_ptr1[i0];
2023-01-11T21:38:05.8309542Z                     auto tmp1 = static_cast<float>(2);
2023-01-11T21:38:05.8309637Z                     auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:05.8309739Z                     auto tmp3 = static_cast<float>(4);
2023-01-11T21:38:05.8309880Z                     auto tmp4 = tmp2 - tmp3;
2023-01-11T21:38:05.8309975Z                     out_ptr0[2*i0] = tmp4;
2023-01-11T21:38:05.8310045Z                 }
2023-01-11T21:38:05.8310111Z             }
2023-01-11T21:38:05.8310179Z         }
2023-01-11T21:38:05.8310248Z     }
2023-01-11T21:38:05.8310306Z }
2023-01-11T21:38:05.8310388Z ''')
2023-01-11T21:38:05.8310424Z 
2023-01-11T21:38:05.8310428Z 
2023-01-11T21:38:05.8310523Z async_compile.wait(globals())
2023-01-11T21:38:05.8310602Z del async_compile
2023-01-11T21:38:05.8310607Z 
2023-01-11T21:38:05.8310681Z def call(args):
2023-01-11T21:38:05.8310767Z     arg0_1, arg1_1 = args
2023-01-11T21:38:05.8310843Z     args.clear()
2023-01-11T21:38:05.8311053Z     buf0 = empty_strided((10, 1024), (1024, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.8311214Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:05.8311288Z     del arg0_1
2023-01-11T21:38:05.8311359Z     del arg1_1
2023-01-11T21:38:05.8311441Z     return (buf0, )
2023-01-11T21:38:05.8311446Z 
2023-01-11T21:38:05.8311451Z 
2023-01-11T21:38:05.8311532Z if __name__ == "__main__":
2023-01-11T21:38:05.8311651Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.8311777Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.8311976Z     arg0_1 = rand_strided((10, 1024), (1024, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.8312181Z     arg1_1 = rand_strided((10, 512), (512, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.8312302Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:05.8312309Z 
2023-01-11T21:38:05.8312381Z ok (1.785s)
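The fused as_strided_scatter kernel above first writes the elementwise-transformed base tensor to the output, then overwrites a stride-2 view of the same buffer with values computed from the source tensor. A 1-D sketch of that pattern (hypothetical function; the 8/10 and 2/4 constants mirror the kernel above):

    #include <cstdio>

    void as_strided_scatter_1d(const float* base, const float* src,
                               float* out, long n_base, long n_src) {
        for (long i = 0; i < n_base; i += 1)
            out[i] = base[i] * 8.0f + 10.0f;    // elementwise op on the base tensor
        for (long i = 0; i < n_src; i += 1)
            out[2 * i] = src[i] * 2.0f - 4.0f;  // scatter into every other element
    }

    int main() {
        float base[8] = {0, 1, 2, 3, 4, 5, 6, 7};
        float src[4] = {0, 1, 2, 3};
        float out[8];
        as_strided_scatter_1d(base, src, out, 8, 4);
        for (int i = 0; i < 8; i += 1) std::printf("%g ", out[i]);
        std::printf("\n");  // -4 18 -2 34 0 50 2 66
        return 0;
    }

The order of the two loops matters: the scatter must run after the base write, since both target the same buffer and the strided slots take the source's values.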
omp_out.index : omp_in.index)\ 2023-01-11T21:38:05.8291857Z initializer(omp_priv = {0, std::numeric_limits::infinity()}) 2023-01-11T21:38:05.8291958Z for(long i1=0; i1<144; i1+=1) 2023-01-11T21:38:05.8292029Z { 2023-01-11T21:38:05.8292101Z { 2023-01-11T21:38:05.8292211Z auto tmp0 = in_ptr0[i1 + (144*i0)]; 2023-01-11T21:38:05.8292315Z if (tmp1.value < tmp0) { 2023-01-11T21:38:05.8292425Z tmp1.index = i1; tmp1.value = tmp0; 2023-01-11T21:38:05.8292501Z } 2023-01-11T21:38:05.8292633Z if (tmp2.value > tmp0) { 2023-01-11T21:38:05.8292751Z tmp2.index = i1; tmp2.value = tmp0; 2023-01-11T21:38:05.8292824Z } 2023-01-11T21:38:05.8292899Z } 2023-01-11T21:38:05.8292971Z } 2023-01-11T21:38:05.8293068Z out_ptr2[i0] = tmp1.index; 2023-01-11T21:38:05.8293164Z out_ptr3[i0] = tmp2.index; 2023-01-11T21:38:05.8293235Z } 2023-01-11T21:38:05.8293303Z } 2023-01-11T21:38:05.8293369Z } 2023-01-11T21:38:05.8293436Z } 2023-01-11T21:38:05.8293494Z } 2023-01-11T21:38:05.8293579Z ''') 2023-01-11T21:38:05.8293585Z 2023-01-11T21:38:05.8293589Z 2023-01-11T21:38:05.8293684Z async_compile.wait(globals()) 2023-01-11T21:38:05.8293762Z del async_compile 2023-01-11T21:38:05.8293767Z 2023-01-11T21:38:05.8293843Z def call(args): 2023-01-11T21:38:05.8293917Z arg0_1, = args 2023-01-11T21:38:05.8293996Z args.clear() 2023-01-11T21:38:05.8294190Z buf0 = empty_strided((144, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8294374Z buf1 = empty_strided((144, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8294679Z buf2 = empty_strided((144, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8294875Z buf3 = empty_strided((144, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8295095Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:05.8295170Z del arg0_1 2023-01-11T21:38:05.8295261Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:05.8295267Z 2023-01-11T21:38:05.8295271Z 2023-01-11T21:38:05.8295352Z if __name__ == "__main__": 2023-01-11T21:38:05.8295472Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8295618Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8295852Z arg0_1 = rand_strided((144, 144), (144, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8295966Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8295971Z 2023-01-11T21:38:05.8296044Z ok (1.715s) 2023-01-11T21:38:05.8296168Z test_argmax_argmin3_cpu (__main__.CpuTests) ... skip: 2023-01-11T21:38:05.8296376Z FIXME: In the case of having equally max/min elements, our implementation returns 2023-01-11T21:38:05.8296491Z the last index instead of the first one 2023-01-11T21:38:05.8296563Z (0.001s) 2023-01-11T21:38:05.8297014Z test_as_strided_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8297201Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8297471Z [2023-01-11 21:24:37,403] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 25 2023-01-11T21:38:05.8297736Z [2023-01-11 21:24:39,091] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 25 2023-01-11T21:38:05.8297742Z 2023-01-11T21:38:05.8297837Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8297911Z import torch 2023-01-11T21:38:05.8297985Z import random 2023-01-11T21:38:05.8298103Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8298232Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8298237Z 2023-01-11T21:38:05.8298313Z aten = torch.ops.aten 2023-01-11T21:38:05.8298452Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8298549Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8298555Z 2023-01-11T21:38:05.8298629Z import triton 2023-01-11T21:38:05.8298762Z import triton.language as tl 2023-01-11T21:38:05.8298885Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8299024Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8299030Z 2023-01-11T21:38:05.8299034Z 2023-01-11T21:38:05.8299177Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8299378Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8299503Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.8299614Z const float* __restrict__ in_ptr0) 2023-01-11T21:38:05.8299681Z { 2023-01-11T21:38:05.8299783Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8299847Z { 2023-01-11T21:38:05.8299929Z #pragma omp for 2023-01-11T21:38:05.8300011Z for(long i0=0; i0<512; i0+=1) 2023-01-11T21:38:05.8300080Z { 2023-01-11T21:38:05.8300220Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8300361Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.8300451Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8300584Z auto tmp3 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.8300677Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.8300780Z tmp4.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.8300842Z } 2023-01-11T21:38:05.8300942Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8301035Z for(long i0=4096; i0<4096; i0+=1) 2023-01-11T21:38:05.8301104Z { 2023-01-11T21:38:05.8301194Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8301297Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.8301380Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8301483Z auto tmp3 = static_cast(2); 2023-01-11T21:38:05.8301571Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.8301662Z in_out_ptr0[i0] = tmp4; 2023-01-11T21:38:05.8301729Z } 2023-01-11T21:38:05.8301795Z } 2023-01-11T21:38:05.8301862Z } 2023-01-11T21:38:05.8301941Z ''') 2023-01-11T21:38:05.8301946Z 2023-01-11T21:38:05.8301957Z 2023-01-11T21:38:05.8302044Z async_compile.wait(globals()) 2023-01-11T21:38:05.8302153Z del async_compile 2023-01-11T21:38:05.8302160Z 2023-01-11T21:38:05.8302238Z def call(args): 2023-01-11T21:38:05.8302313Z arg0_1, = args 2023-01-11T21:38:05.8302388Z args.clear() 2023-01-11T21:38:05.8302591Z buf0 = empty_strided((64, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8302718Z buf1 = as_strided(buf0, (8, 8, 
64), (512, 64, 1)); del buf0 # reuse 2023-01-11T21:38:05.8302849Z kernel_cpp_0(c_void_p(buf1.data_ptr()), c_void_p(arg0_1.data_ptr())) 2023-01-11T21:38:05.8302964Z return (as_strided(arg0_1, (8, 8, 64), (512, 64, 1)), buf1, ) 2023-01-11T21:38:05.8302969Z 2023-01-11T21:38:05.8302974Z 2023-01-11T21:38:05.8303057Z if __name__ == "__main__": 2023-01-11T21:38:05.8303174Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8303301Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8303501Z arg0_1 = rand_strided((64, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8303616Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8303621Z 2023-01-11T21:38:05.8303695Z ok (1.719s) 2023-01-11T21:38:05.8304155Z test_as_strided_scatter_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8304281Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8304564Z [2023-01-11 21:24:39,121] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 26 2023-01-11T21:38:05.8304825Z [2023-01-11 21:24:40,876] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 26 2023-01-11T21:38:05.8304831Z 2023-01-11T21:38:05.8304929Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8305009Z import torch 2023-01-11T21:38:05.8305086Z import random 2023-01-11T21:38:05.8305205Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8305339Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8305345Z 2023-01-11T21:38:05.8305436Z aten = torch.ops.aten 2023-01-11T21:38:05.8305593Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8305692Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8305698Z 2023-01-11T21:38:05.8305773Z import triton 2023-01-11T21:38:05.8305867Z import triton.language as tl 2023-01-11T21:38:05.8305993Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8306135Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8306140Z 2023-01-11T21:38:05.8306145Z 2023-01-11T21:38:05.8306279Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8306478Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8306606Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8306716Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8306819Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8306885Z { 2023-01-11T21:38:05.8306986Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8307052Z { 2023-01-11T21:38:05.8307128Z #pragma omp for 2023-01-11T21:38:05.8307219Z for(long i0=0; i0<1280; i0+=1) 2023-01-11T21:38:05.8307286Z { 2023-01-11T21:38:05.8307425Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8307568Z auto tmp1 = at::vec::Vectorized(static_cast(8)); 2023-01-11T21:38:05.8307656Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.8307797Z auto tmp3 = at::vec::Vectorized(static_cast(10)); 2023-01-11T21:38:05.8307885Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.8308007Z 
tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8308076Z } 2023-01-11T21:38:05.8308177Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8308273Z for(long i0=10240; i0<10240; i0+=1) 2023-01-11T21:38:05.8308340Z { 2023-01-11T21:38:05.8308428Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8308532Z auto tmp1 = static_cast(8); 2023-01-11T21:38:05.8308615Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.8308720Z auto tmp3 = static_cast(10); 2023-01-11T21:38:05.8308810Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.8308896Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:05.8308965Z } 2023-01-11T21:38:05.8309047Z #pragma omp for 2023-01-11T21:38:05.8309134Z for(long i0=0; i0<5120; i0+=1) 2023-01-11T21:38:05.8309194Z { 2023-01-11T21:38:05.8309262Z { 2023-01-11T21:38:05.8309331Z { 2023-01-11T21:38:05.8309431Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:05.8309542Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.8309637Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.8309739Z auto tmp3 = static_cast(4); 2023-01-11T21:38:05.8309880Z auto tmp4 = tmp2 - tmp3; 2023-01-11T21:38:05.8309975Z out_ptr0[2*i0] = tmp4; 2023-01-11T21:38:05.8310045Z } 2023-01-11T21:38:05.8310111Z } 2023-01-11T21:38:05.8310179Z } 2023-01-11T21:38:05.8310248Z } 2023-01-11T21:38:05.8310306Z } 2023-01-11T21:38:05.8310388Z ''') 2023-01-11T21:38:05.8310424Z 2023-01-11T21:38:05.8310428Z 2023-01-11T21:38:05.8310523Z async_compile.wait(globals()) 2023-01-11T21:38:05.8310602Z del async_compile 2023-01-11T21:38:05.8310607Z 2023-01-11T21:38:05.8310681Z def call(args): 2023-01-11T21:38:05.8310767Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8310843Z args.clear() 2023-01-11T21:38:05.8311053Z buf0 = empty_strided((10, 1024), (1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8311214Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8311288Z del arg0_1 2023-01-11T21:38:05.8311359Z del arg1_1 2023-01-11T21:38:05.8311441Z return (buf0, ) 2023-01-11T21:38:05.8311446Z 2023-01-11T21:38:05.8311451Z 2023-01-11T21:38:05.8311532Z if __name__ == "__main__": 2023-01-11T21:38:05.8311651Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8311777Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8311976Z arg0_1 = rand_strided((10, 1024), (1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8312181Z arg1_1 = rand_strided((10, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8312302Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8312309Z 2023-01-11T21:38:05.8312381Z ok (1.785s) 2023-01-11T21:38:05.8312836Z test_avg_pool2d1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8312971Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8313227Z [2023-01-11 21:24:40,896] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 27 2023-01-11T21:38:05.8313493Z [2023-01-11 21:24:42,653] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 27 2023-01-11T21:38:05.8313499Z 2023-01-11T21:38:05.8313597Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8313674Z import torch 2023-01-11T21:38:05.8313743Z import random 2023-01-11T21:38:05.8313892Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8314017Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8314022Z 2023-01-11T21:38:05.8314104Z aten = torch.ops.aten 2023-01-11T21:38:05.8314244Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8314341Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8314347Z 2023-01-11T21:38:05.8314420Z import triton 2023-01-11T21:38:05.8314507Z import triton.language as tl 2023-01-11T21:38:05.8314634Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8314773Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8314782Z 2023-01-11T21:38:05.8314786Z 2023-01-11T21:38:05.8314924Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8315173Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8315311Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8315419Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8315484Z { 2023-01-11T21:38:05.8315579Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8315646Z { 2023-01-11T21:38:05.8315728Z #pragma omp for 2023-01-11T21:38:05.8315816Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8315883Z { 2023-01-11T21:38:05.8315970Z #pragma GCC ivdep 2023-01-11T21:38:05.8316058Z for(long i1=0; i1<7; i1+=1) 2023-01-11T21:38:05.8316120Z { 2023-01-11T21:38:05.8316206Z #pragma GCC ivdep 2023-01-11T21:38:05.8316300Z for(long i2=0; i2<7; i2+=1) 2023-01-11T21:38:05.8316400Z { 2023-01-11T21:38:05.8316473Z { 2023-01-11T21:38:05.8316547Z { 2023-01-11T21:38:05.8316665Z auto tmp0 = in_ptr0[(2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8316778Z auto tmp1 = in_ptr0[1 + (2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8316902Z auto tmp3 = in_ptr0[2 + (2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8317021Z auto tmp5 = in_ptr0[16 + (2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8317134Z auto tmp7 = in_ptr0[17 + (2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8317248Z auto tmp9 = in_ptr0[18 + (2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8317367Z auto tmp11 = in_ptr0[32 + (2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8317481Z auto tmp13 = in_ptr0[33 + (2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8317598Z auto tmp15 = in_ptr0[34 + (2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8317693Z auto tmp2 = tmp1 + tmp0; 2023-01-11T21:38:05.8317789Z auto tmp4 = tmp3 + tmp2; 2023-01-11T21:38:05.8317889Z auto tmp6 = tmp5 + tmp4; 2023-01-11T21:38:05.8317987Z auto tmp8 = tmp7 + tmp6; 2023-01-11T21:38:05.8318087Z auto tmp10 = tmp9 + tmp8; 2023-01-11T21:38:05.8318187Z auto tmp12 = tmp11 + tmp10; 2023-01-11T21:38:05.8318291Z auto tmp14 = tmp13 + tmp12; 2023-01-11T21:38:05.8318387Z auto tmp16 = tmp15 + tmp14; 
2023-01-11T21:38:05.8318512Z auto tmp17 = static_cast<float>(0.1111111111111111); 2023-01-11T21:38:05.8318612Z auto tmp18 = tmp16 * tmp17; 2023-01-11T21:38:05.8318722Z out_ptr0[i2 + (7*i1) + (49*i0)] = tmp18; 2023-01-11T21:38:05.8318797Z } 2023-01-11T21:38:05.8318868Z } 2023-01-11T21:38:05.8318936Z } 2023-01-11T21:38:05.8318998Z } 2023-01-11T21:38:05.8319063Z } 2023-01-11T21:38:05.8319129Z } 2023-01-11T21:38:05.8319191Z } 2023-01-11T21:38:05.8319317Z ''') 2023-01-11T21:38:05.8319323Z 2023-01-11T21:38:05.8319328Z 2023-01-11T21:38:05.8319421Z async_compile.wait(globals()) 2023-01-11T21:38:05.8319500Z del async_compile 2023-01-11T21:38:05.8319505Z 2023-01-11T21:38:05.8319579Z def call(args): 2023-01-11T21:38:05.8319647Z arg0_1, = args 2023-01-11T21:38:05.8319722Z args.clear() 2023-01-11T21:38:05.8319935Z buf0 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8320074Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8320146Z del arg0_1 2023-01-11T21:38:05.8320221Z return (buf0, ) 2023-01-11T21:38:05.8320229Z 2023-01-11T21:38:05.8320233Z 2023-01-11T21:38:05.8320313Z if __name__ == "__main__": 2023-01-11T21:38:05.8320424Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8320548Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8320770Z arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8320883Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8320889Z 2023-01-11T21:38:05.8320958Z ok (1.777s) 2023-01-11T21:38:05.8321408Z test_avg_pool2d2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8321541Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8321825Z [2023-01-11 21:24:42,694] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 28 2023-01-11T21:38:05.8322087Z [2023-01-11 21:24:44,436] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 28 2023-01-11T21:38:05.8322096Z 2023-01-11T21:38:05.8322196Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8322266Z import torch 2023-01-11T21:38:05.8322340Z import random 2023-01-11T21:38:05.8322458Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8322582Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8322587Z 2023-01-11T21:38:05.8322669Z aten = torch.ops.aten 2023-01-11T21:38:05.8322806Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8322902Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8322907Z 2023-01-11T21:38:05.8322975Z import triton 2023-01-11T21:38:05.8323072Z import triton.language as tl 2023-01-11T21:38:05.8323200Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8323339Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8323345Z 2023-01-11T21:38:05.8323349Z 2023-01-11T21:38:05.8323482Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8323691Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8323815Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8323918Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8323978Z { 2023-01-11T21:38:05.8324078Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8324143Z { 2023-01-11T21:38:05.8324222Z #pragma omp for 2023-01-11T21:38:05.8324309Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.8324377Z { 2023-01-11T21:38:05.8324462Z #pragma GCC ivdep 2023-01-11T21:38:05.8324546Z for(long i1=0; i1<27; i1+=1) 2023-01-11T21:38:05.8324616Z { 2023-01-11T21:38:05.8324701Z #pragma GCC ivdep 2023-01-11T21:38:05.8324797Z for(long i2=0; i2<27; i2+=1) 2023-01-11T21:38:05.8324864Z { 2023-01-11T21:38:05.8324933Z { 2023-01-11T21:38:05.8325000Z { 2023-01-11T21:38:05.8325147Z auto tmp0 = in_ptr0[(2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8325268Z auto tmp1 = in_ptr0[1 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8325384Z auto tmp3 = in_ptr0[2 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8325502Z auto tmp5 = in_ptr0[55 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8325618Z auto tmp7 = in_ptr0[56 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8325734Z auto tmp9 = in_ptr0[57 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8325857Z auto tmp11 = in_ptr0[110 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8325978Z auto tmp13 = in_ptr0[111 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8326093Z auto tmp15 = in_ptr0[112 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8326197Z auto tmp2 = tmp1 + tmp0; 2023-01-11T21:38:05.8326294Z auto tmp4 = tmp3 + tmp2; 2023-01-11T21:38:05.8326392Z auto tmp6 = tmp5 + tmp4; 2023-01-11T21:38:05.8326489Z auto tmp8 = tmp7 + tmp6; 2023-01-11T21:38:05.8326591Z auto tmp10 = tmp9 + tmp8; 2023-01-11T21:38:05.8326693Z auto tmp12 = tmp11 + tmp10; 2023-01-11T21:38:05.8326791Z auto tmp14 = tmp13 + tmp12; 2023-01-11T21:38:05.8326894Z auto tmp16 = tmp15 + 
tmp14; 2023-01-11T21:38:05.8327050Z auto tmp17 = static_cast<float>(0.1111111111111111); 2023-01-11T21:38:05.8327151Z auto tmp18 = tmp16 * tmp17; 2023-01-11T21:38:05.8327264Z out_ptr0[i2 + (27*i1) + (729*i0)] = tmp18; 2023-01-11T21:38:05.8327339Z } 2023-01-11T21:38:05.8327411Z } 2023-01-11T21:38:05.8327474Z } 2023-01-11T21:38:05.8327542Z } 2023-01-11T21:38:05.8327611Z } 2023-01-11T21:38:05.8327677Z } 2023-01-11T21:38:05.8327741Z } 2023-01-11T21:38:05.8327826Z ''') 2023-01-11T21:38:05.8327831Z 2023-01-11T21:38:05.8327836Z 2023-01-11T21:38:05.8327931Z async_compile.wait(globals()) 2023-01-11T21:38:05.8328002Z del async_compile 2023-01-11T21:38:05.8328014Z 2023-01-11T21:38:05.8328084Z def call(args): 2023-01-11T21:38:05.8328156Z arg0_1, = args 2023-01-11T21:38:05.8328229Z args.clear() 2023-01-11T21:38:05.8328452Z buf0 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8328593Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8328664Z del arg0_1 2023-01-11T21:38:05.8328737Z return (buf0, ) 2023-01-11T21:38:05.8328742Z 2023-01-11T21:38:05.8328746Z 2023-01-11T21:38:05.8328823Z if __name__ == "__main__": 2023-01-11T21:38:05.8328939Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8329065Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8329291Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8329405Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8329412Z 2023-01-11T21:38:05.8329484Z ok (1.826s) 2023-01-11T21:38:05.8329935Z test_avg_pool2d3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8330069Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8330353Z [2023-01-11 21:24:44,500] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 29 2023-01-11T21:38:05.8330611Z [2023-01-11 21:24:46,312] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 29 2023-01-11T21:38:05.8330622Z 2023-01-11T21:38:05.8330715Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8330790Z import torch 2023-01-11T21:38:05.8330863Z import random 2023-01-11T21:38:05.8330983Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8331106Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8331111Z 2023-01-11T21:38:05.8331194Z aten = torch.ops.aten 2023-01-11T21:38:05.8331334Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8331424Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8331429Z 2023-01-11T21:38:05.8331503Z import triton 2023-01-11T21:38:05.8331596Z import triton.language as tl 2023-01-11T21:38:05.8331725Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8331863Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8331869Z 2023-01-11T21:38:05.8331873Z 2023-01-11T21:38:05.8332010Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8332216Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8332338Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8332435Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8332501Z { 2023-01-11T21:38:05.8332604Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8332699Z { 2023-01-11T21:38:05.8332781Z #pragma omp for 2023-01-11T21:38:05.8332868Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.8332936Z { 2023-01-11T21:38:05.8333016Z #pragma GCC ivdep 2023-01-11T21:38:05.8333105Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.8333173Z { 2023-01-11T21:38:05.8333245Z { 2023-01-11T21:38:05.8333316Z { 2023-01-11T21:38:05.8333495Z auto tmp0 = static_cast((-1) + (2*i0)); 2023-01-11T21:38:05.8333606Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8333701Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:05.8333808Z auto tmp3 = static_cast(8); 2023-01-11T21:38:05.8333908Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:05.8334005Z auto tmp5 = tmp2 & tmp4; 2023-01-11T21:38:05.8334177Z auto tmp6 = static_cast((-1) + (2*i1)); 2023-01-11T21:38:05.8334279Z auto tmp7 = tmp6 >= tmp1; 2023-01-11T21:38:05.8334379Z auto tmp8 = tmp6 < tmp3; 2023-01-11T21:38:05.8334469Z auto tmp9 = tmp7 & tmp8; 2023-01-11T21:38:05.8334770Z auto tmp10 = tmp5 & tmp9; 2023-01-11T21:38:05.8334869Z float tmp11 = 0.0; 2023-01-11T21:38:05.8334954Z if(tmp10) 2023-01-11T21:38:05.8335026Z { 2023-01-11T21:38:05.8335212Z auto tmp12 = in_ptr0[(-9) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8335302Z tmp11 = tmp12; 2023-01-11T21:38:05.8335370Z } 2023-01-11T21:38:05.8335484Z auto tmp13 = static_cast(2*i1); 2023-01-11T21:38:05.8335587Z auto tmp14 = tmp13 >= tmp1; 2023-01-11T21:38:05.8335686Z auto tmp15 = tmp13 < tmp3; 2023-01-11T21:38:05.8335789Z auto tmp16 = tmp14 & tmp15; 2023-01-11T21:38:05.8335887Z auto tmp17 = tmp5 & tmp16; 2023-01-11T21:38:05.8335978Z float tmp18 = 0.0; 2023-01-11T21:38:05.8336053Z if(tmp17) 2023-01-11T21:38:05.8336126Z { 2023-01-11T21:38:05.8336532Z auto tmp19 
= in_ptr0[(-8) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8336627Z tmp18 = tmp19; 2023-01-11T21:38:05.8336704Z } 2023-01-11T21:38:05.8336807Z auto tmp20 = tmp18 + tmp11; 2023-01-11T21:38:05.8336924Z auto tmp21 = static_cast(1 + (2*i1)); 2023-01-11T21:38:05.8337023Z auto tmp22 = tmp21 >= tmp1; 2023-01-11T21:38:05.8337115Z auto tmp23 = tmp21 < tmp3; 2023-01-11T21:38:05.8337274Z auto tmp24 = tmp22 & tmp23; 2023-01-11T21:38:05.8337383Z auto tmp25 = tmp5 & tmp24; 2023-01-11T21:38:05.8337488Z float tmp26 = 0.0; 2023-01-11T21:38:05.8337580Z if(tmp25) 2023-01-11T21:38:05.8337671Z { 2023-01-11T21:38:05.8337849Z auto tmp27 = in_ptr0[(-7) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8337933Z tmp26 = tmp27; 2023-01-11T21:38:05.8338004Z } 2023-01-11T21:38:05.8338102Z auto tmp28 = tmp26 + tmp20; 2023-01-11T21:38:05.8338215Z auto tmp29 = static_cast(2*i0); 2023-01-11T21:38:05.8338313Z auto tmp30 = tmp29 >= tmp1; 2023-01-11T21:38:05.8338408Z auto tmp31 = tmp29 < tmp3; 2023-01-11T21:38:05.8338504Z auto tmp32 = tmp30 & tmp31; 2023-01-11T21:38:05.8338593Z auto tmp33 = tmp32 & tmp9; 2023-01-11T21:38:05.8338682Z float tmp34 = 0.0; 2023-01-11T21:38:05.8338803Z if(tmp33) 2023-01-11T21:38:05.8338874Z { 2023-01-11T21:38:05.8339050Z auto tmp35 = in_ptr0[(-1) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8339138Z tmp34 = tmp35; 2023-01-11T21:38:05.8339209Z } 2023-01-11T21:38:05.8339304Z auto tmp36 = tmp34 + tmp28; 2023-01-11T21:38:05.8339398Z auto tmp37 = tmp32 & tmp16; 2023-01-11T21:38:05.8339488Z float tmp38 = 0.0; 2023-01-11T21:38:05.8339567Z if(tmp37) 2023-01-11T21:38:05.8339638Z { 2023-01-11T21:38:05.8339749Z auto tmp39 = in_ptr0[(2*i1) + (16*i0)]; 2023-01-11T21:38:05.8339839Z tmp38 = tmp39; 2023-01-11T21:38:05.8339905Z } 2023-01-11T21:38:05.8340002Z auto tmp40 = tmp38 + tmp36; 2023-01-11T21:38:05.8340103Z auto tmp41 = tmp32 & tmp24; 2023-01-11T21:38:05.8340191Z float tmp42 = 0.0; 2023-01-11T21:38:05.8340271Z if(tmp41) 2023-01-11T21:38:05.8340343Z { 2023-01-11T21:38:05.8340456Z auto tmp43 = in_ptr0[1 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8340540Z tmp42 = tmp43; 2023-01-11T21:38:05.8340612Z } 2023-01-11T21:38:05.8340710Z auto tmp44 = tmp42 + tmp40; 2023-01-11T21:38:05.8340825Z auto tmp45 = static_cast(1 + (2*i0)); 2023-01-11T21:38:05.8340921Z auto tmp46 = tmp45 >= tmp1; 2023-01-11T21:38:05.8341020Z auto tmp47 = tmp45 < tmp3; 2023-01-11T21:38:05.8341116Z auto tmp48 = tmp46 & tmp47; 2023-01-11T21:38:05.8341211Z auto tmp49 = tmp48 & tmp9; 2023-01-11T21:38:05.8341294Z float tmp50 = 0.0; 2023-01-11T21:38:05.8341377Z if(tmp49) 2023-01-11T21:38:05.8341447Z { 2023-01-11T21:38:05.8341560Z auto tmp51 = in_ptr0[7 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8341647Z tmp50 = tmp51; 2023-01-11T21:38:05.8341754Z } 2023-01-11T21:38:05.8341854Z auto tmp52 = tmp50 + tmp44; 2023-01-11T21:38:05.8341943Z auto tmp53 = tmp48 & tmp16; 2023-01-11T21:38:05.8342032Z float tmp54 = 0.0; 2023-01-11T21:38:05.8342114Z if(tmp53) 2023-01-11T21:38:05.8342184Z { 2023-01-11T21:38:05.8342296Z auto tmp55 = in_ptr0[8 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8342385Z tmp54 = tmp55; 2023-01-11T21:38:05.8342456Z } 2023-01-11T21:38:05.8342546Z auto tmp56 = tmp54 + tmp52; 2023-01-11T21:38:05.8342643Z auto tmp57 = tmp48 & tmp24; 2023-01-11T21:38:05.8342733Z float tmp58 = 0.0; 2023-01-11T21:38:05.8342814Z if(tmp57) 2023-01-11T21:38:05.8342886Z { 2023-01-11T21:38:05.8343000Z auto tmp59 = in_ptr0[9 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8343088Z tmp58 = tmp59; 2023-01-11T21:38:05.8343154Z } 2023-01-11T21:38:05.8343248Z auto tmp60 = tmp58 + tmp56; 
2023-01-11T21:38:05.8343372Z auto tmp61 = static_cast<float>(0.1111111111111111); 2023-01-11T21:38:05.8343469Z auto tmp62 = tmp60 * tmp61; 2023-01-11T21:38:05.8343567Z out_ptr0[i1 + (4*i0)] = tmp62; 2023-01-11T21:38:05.8343637Z } 2023-01-11T21:38:05.8343705Z } 2023-01-11T21:38:05.8343765Z } 2023-01-11T21:38:05.8343864Z } 2023-01-11T21:38:05.8343928Z } 2023-01-11T21:38:05.8343994Z } 2023-01-11T21:38:05.8344081Z ''') 2023-01-11T21:38:05.8344087Z 2023-01-11T21:38:05.8344091Z 2023-01-11T21:38:05.8344184Z async_compile.wait(globals()) 2023-01-11T21:38:05.8344260Z del async_compile 2023-01-11T21:38:05.8344265Z 2023-01-11T21:38:05.8344333Z def call(args): 2023-01-11T21:38:05.8344409Z arg0_1, = args 2023-01-11T21:38:05.8344483Z args.clear() 2023-01-11T21:38:05.8344696Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8344835Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8344904Z del arg0_1 2023-01-11T21:38:05.8344978Z return (buf0, ) 2023-01-11T21:38:05.8344983Z 2023-01-11T21:38:05.8344988Z 2023-01-11T21:38:05.8345067Z if __name__ == "__main__": 2023-01-11T21:38:05.8345178Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8345303Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8345519Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8345631Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8345636Z 2023-01-11T21:38:05.8345706Z ok (1.832s) 2023-01-11T21:38:05.8346169Z test_avg_pool2d4_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8346300Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8346556Z [2023-01-11 21:24:46,332] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 30 2023-01-11T21:38:05.8346821Z [2023-01-11 21:24:48,104] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 30 2023-01-11T21:38:05.8346829Z 2023-01-11T21:38:05.8346920Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8346997Z import torch 2023-01-11T21:38:05.8347072Z import random 2023-01-11T21:38:05.8347189Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8347338Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8347343Z 2023-01-11T21:38:05.8347425Z aten = torch.ops.aten 2023-01-11T21:38:05.8347561Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8347657Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8347663Z 2023-01-11T21:38:05.8347730Z import triton 2023-01-11T21:38:05.8347820Z import triton.language as tl 2023-01-11T21:38:05.8347943Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8348085Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8348091Z 2023-01-11T21:38:05.8348098Z 2023-01-11T21:38:05.8348236Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8348444Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8348567Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8348671Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8348731Z { 2023-01-11T21:38:05.8348832Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8348896Z { 2023-01-11T21:38:05.8348977Z #pragma omp for 2023-01-11T21:38:05.8349063Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.8349127Z { 2023-01-11T21:38:05.8349205Z #pragma GCC ivdep 2023-01-11T21:38:05.8349295Z for(long i1=0; i1<55; i1+=1) 2023-01-11T21:38:05.8349365Z { 2023-01-11T21:38:05.8349449Z #pragma GCC ivdep 2023-01-11T21:38:05.8349543Z for(long i2=0; i2<55; i2+=1) 2023-01-11T21:38:05.8349612Z { 2023-01-11T21:38:05.8349711Z { 2023-01-11T21:38:05.8349777Z { 2023-01-11T21:38:05.8349897Z auto tmp0 = in_ptr0[(2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350021Z auto tmp1 = in_ptr0[1 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350140Z auto tmp3 = in_ptr0[2 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350261Z auto tmp5 = in_ptr0[111 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350380Z auto tmp7 = in_ptr0[112 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350500Z auto tmp9 = in_ptr0[113 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350622Z auto tmp11 = in_ptr0[222 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350735Z auto tmp13 = in_ptr0[223 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350856Z auto tmp15 = in_ptr0[224 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350961Z auto tmp2 = tmp1 + tmp0; 2023-01-11T21:38:05.8351060Z auto tmp4 = tmp3 + tmp2; 2023-01-11T21:38:05.8351155Z auto tmp6 = tmp5 + tmp4; 2023-01-11T21:38:05.8351256Z auto tmp8 = tmp7 + tmp6; 2023-01-11T21:38:05.8351357Z auto tmp10 = tmp9 + tmp8; 2023-01-11T21:38:05.8351460Z auto tmp12 = tmp11 + tmp10; 2023-01-11T21:38:05.8351555Z auto tmp14 = tmp13 + tmp12; 2023-01-11T21:38:05.8351656Z auto tmp16 
= tmp15 + tmp14; 2023-01-11T21:38:05.8351781Z auto tmp17 = static_cast<float>(0.1111111111111111); 2023-01-11T21:38:05.8351880Z auto tmp18 = tmp16 * tmp17; 2023-01-11T21:38:05.8351990Z out_ptr0[i2 + (55*i1) + (3025*i0)] = tmp18; 2023-01-11T21:38:05.8352065Z } 2023-01-11T21:38:05.8352136Z } 2023-01-11T21:38:05.8352198Z } 2023-01-11T21:38:05.8352263Z } 2023-01-11T21:38:05.8352329Z } 2023-01-11T21:38:05.8352395Z } 2023-01-11T21:38:05.8352456Z } 2023-01-11T21:38:05.8352586Z ''') 2023-01-11T21:38:05.8352592Z 2023-01-11T21:38:05.8352597Z 2023-01-11T21:38:05.8352690Z async_compile.wait(globals()) 2023-01-11T21:38:05.8352760Z del async_compile 2023-01-11T21:38:05.8352765Z 2023-01-11T21:38:05.8352842Z def call(args): 2023-01-11T21:38:05.8352914Z arg0_1, = args 2023-01-11T21:38:05.8352987Z args.clear() 2023-01-11T21:38:05.8353212Z buf0 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8353350Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8353422Z del arg0_1 2023-01-11T21:38:05.8353490Z return (buf0, ) 2023-01-11T21:38:05.8353497Z 2023-01-11T21:38:05.8353509Z 2023-01-11T21:38:05.8353583Z if __name__ == "__main__": 2023-01-11T21:38:05.8353699Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8353823Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8354051Z arg0_1 = rand_strided((2, 8, 111, 111), (98568, 12321, 111, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8354163Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8354168Z 2023-01-11T21:38:05.8354239Z ok (1.794s) 2023-01-11T21:38:05.8354689Z test_avg_pool2d5_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8354848Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8355106Z [2023-01-11 21:24:48,126] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 31 2023-01-11T21:38:05.8355111Z 2023-01-11T21:38:05.8355203Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8355274Z import torch 2023-01-11T21:38:05.8355350Z import random 2023-01-11T21:38:05.8355467Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8355589Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8355595Z 2023-01-11T21:38:05.8355675Z aten = torch.ops.aten 2023-01-11T21:38:05.8355808Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8355897Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8355907Z 2023-01-11T21:38:05.8355973Z import triton 2023-01-11T21:38:05.8356064Z import triton.language as tl 2023-01-11T21:38:05.8356188Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8356330Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8356335Z 2023-01-11T21:38:05.8356340Z 2023-01-11T21:38:05.8356475Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8356681Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8356806Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8356903Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8356965Z { 2023-01-11T21:38:05.8357065Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8357130Z { 2023-01-11T21:38:05.8357210Z #pragma omp for 2023-01-11T21:38:05.8357293Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.8357360Z { 2023-01-11T21:38:05.8357439Z #pragma GCC ivdep 2023-01-11T21:38:05.8357527Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.8357593Z { 2023-01-11T21:38:05.8357659Z { 2023-01-11T21:38:05.8357733Z { 2023-01-11T21:38:05.8357907Z auto tmp0 = static_cast((-1) + (2*i0)); 2023-01-11T21:38:05.8358015Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8358108Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:05.8358257Z auto tmp3 = static_cast(8); 2023-01-11T21:38:05.8358356Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:05.8358455Z auto tmp5 = tmp2 & tmp4; 2023-01-11T21:38:05.8358626Z auto tmp6 = static_cast((-1) + (2*i1)); 2023-01-11T21:38:05.8358725Z auto tmp7 = tmp6 >= tmp1; 2023-01-11T21:38:05.8358822Z auto tmp8 = tmp6 < tmp3; 2023-01-11T21:38:05.8358911Z auto tmp9 = tmp7 & tmp8; 2023-01-11T21:38:05.8359008Z auto tmp10 = tmp5 & tmp9; 2023-01-11T21:38:05.8359096Z float tmp11 = 0.0; 2023-01-11T21:38:05.8359180Z if(tmp10) 2023-01-11T21:38:05.8359254Z { 2023-01-11T21:38:05.8359431Z auto tmp12 = in_ptr0[(-9) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8359518Z tmp11 = tmp12; 2023-01-11T21:38:05.8359584Z } 2023-01-11T21:38:05.8359703Z auto tmp13 = static_cast(2*i1); 2023-01-11T21:38:05.8359805Z auto tmp14 = tmp13 >= tmp1; 2023-01-11T21:38:05.8359903Z auto tmp15 = tmp13 < tmp3; 2023-01-11T21:38:05.8359998Z auto tmp16 = tmp14 & tmp15; 2023-01-11T21:38:05.8360093Z auto tmp17 = tmp5 & tmp16; 2023-01-11T21:38:05.8360185Z float tmp18 = 0.0; 2023-01-11T21:38:05.8360269Z if(tmp17) 2023-01-11T21:38:05.8360335Z { 2023-01-11T21:38:05.8360508Z auto tmp19 = in_ptr0[(-8) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8360625Z tmp18 = tmp19; 2023-01-11T21:38:05.8360699Z } 2023-01-11T21:38:05.8360796Z auto 
tmp20 = tmp18 + tmp11; 2023-01-11T21:38:05.8360911Z auto tmp21 = static_cast(1 + (2*i1)); 2023-01-11T21:38:05.8361011Z auto tmp22 = tmp21 >= tmp1; 2023-01-11T21:38:05.8361102Z auto tmp23 = tmp21 < tmp3; 2023-01-11T21:38:05.8361196Z auto tmp24 = tmp22 & tmp23; 2023-01-11T21:38:05.8361292Z auto tmp25 = tmp5 & tmp24; 2023-01-11T21:38:05.8361382Z float tmp26 = 0.0; 2023-01-11T21:38:05.8361463Z if(tmp25) 2023-01-11T21:38:05.8361536Z { 2023-01-11T21:38:05.8361708Z auto tmp27 = in_ptr0[(-7) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8361789Z tmp26 = tmp27; 2023-01-11T21:38:05.8361865Z } 2023-01-11T21:38:05.8361963Z auto tmp28 = tmp26 + tmp20; 2023-01-11T21:38:05.8362076Z auto tmp29 = static_cast(2*i0); 2023-01-11T21:38:05.8362175Z auto tmp30 = tmp29 >= tmp1; 2023-01-11T21:38:05.8362276Z auto tmp31 = tmp29 < tmp3; 2023-01-11T21:38:05.8362371Z auto tmp32 = tmp30 & tmp31; 2023-01-11T21:38:05.8362460Z auto tmp33 = tmp32 & tmp9; 2023-01-11T21:38:05.8362550Z float tmp34 = 0.0; 2023-01-11T21:38:05.8362630Z if(tmp33) 2023-01-11T21:38:05.8362701Z { 2023-01-11T21:38:05.8362873Z auto tmp35 = in_ptr0[(-1) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8362961Z tmp34 = tmp35; 2023-01-11T21:38:05.8363031Z } 2023-01-11T21:38:05.8363122Z auto tmp36 = tmp34 + tmp28; 2023-01-11T21:38:05.8363220Z auto tmp37 = tmp32 & tmp16; 2023-01-11T21:38:05.8363309Z float tmp38 = 0.0; 2023-01-11T21:38:05.8363387Z if(tmp37) 2023-01-11T21:38:05.8363459Z { 2023-01-11T21:38:05.8363597Z auto tmp39 = in_ptr0[(2*i1) + (16*i0)]; 2023-01-11T21:38:05.8363685Z tmp38 = tmp39; 2023-01-11T21:38:05.8363750Z } 2023-01-11T21:38:05.8363846Z auto tmp40 = tmp38 + tmp36; 2023-01-11T21:38:05.8363944Z auto tmp41 = tmp32 & tmp24; 2023-01-11T21:38:05.8364032Z float tmp42 = 0.0; 2023-01-11T21:38:05.8364112Z if(tmp41) 2023-01-11T21:38:05.8364187Z { 2023-01-11T21:38:05.8364300Z auto tmp43 = in_ptr0[1 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8364387Z tmp42 = tmp43; 2023-01-11T21:38:05.8364455Z } 2023-01-11T21:38:05.8364552Z auto tmp44 = tmp42 + tmp40; 2023-01-11T21:38:05.8364666Z auto tmp45 = static_cast(1 + (2*i0)); 2023-01-11T21:38:05.8364763Z auto tmp46 = tmp45 >= tmp1; 2023-01-11T21:38:05.8364865Z auto tmp47 = tmp45 < tmp3; 2023-01-11T21:38:05.8364962Z auto tmp48 = tmp46 & tmp47; 2023-01-11T21:38:05.8365057Z auto tmp49 = tmp48 & tmp9; 2023-01-11T21:38:05.8365140Z float tmp50 = 0.0; 2023-01-11T21:38:05.8365218Z if(tmp49) 2023-01-11T21:38:05.8365290Z { 2023-01-11T21:38:05.8365403Z auto tmp51 = in_ptr0[7 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8365489Z tmp50 = tmp51; 2023-01-11T21:38:05.8365562Z } 2023-01-11T21:38:05.8365693Z auto tmp52 = tmp50 + tmp44; 2023-01-11T21:38:05.8365782Z auto tmp53 = tmp48 & tmp16; 2023-01-11T21:38:05.8365873Z float tmp54 = 0.0; 2023-01-11T21:38:05.8365952Z if(tmp53) 2023-01-11T21:38:05.8366023Z { 2023-01-11T21:38:05.8366137Z auto tmp55 = in_ptr0[8 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8366225Z tmp54 = tmp55; 2023-01-11T21:38:05.8366294Z } 2023-01-11T21:38:05.8366384Z auto tmp56 = tmp54 + tmp52; 2023-01-11T21:38:05.8366479Z auto tmp57 = tmp48 & tmp24; 2023-01-11T21:38:05.8366568Z float tmp58 = 0.0; 2023-01-11T21:38:05.8366646Z if(tmp57) 2023-01-11T21:38:05.8366718Z { 2023-01-11T21:38:05.8366831Z auto tmp59 = in_ptr0[9 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8366921Z tmp58 = tmp59; 2023-01-11T21:38:05.8366987Z } 2023-01-11T21:38:05.8367083Z auto tmp60 = tmp58 + tmp56; 2023-01-11T21:38:05.8367172Z float tmp61 = 0.0; 2023-01-11T21:38:05.8367251Z if(tmp10) 2023-01-11T21:38:05.8367325Z { 2023-01-11T21:38:05.8367439Z auto tmp62 = 
static_cast(1); 2023-01-11T21:38:05.8367528Z tmp61 = tmp62; 2023-01-11T21:38:05.8367592Z } 2023-01-11T21:38:05.8367681Z float tmp63 = 0.0; 2023-01-11T21:38:05.8367760Z if(tmp17) 2023-01-11T21:38:05.8367830Z { 2023-01-11T21:38:05.8367942Z auto tmp64 = static_cast(1); 2023-01-11T21:38:05.8368028Z tmp63 = tmp64; 2023-01-11T21:38:05.8368099Z } 2023-01-11T21:38:05.8368191Z auto tmp65 = tmp63 + tmp61; 2023-01-11T21:38:05.8368281Z float tmp66 = 0.0; 2023-01-11T21:38:05.8368361Z if(tmp25) 2023-01-11T21:38:05.8368432Z { 2023-01-11T21:38:05.8368543Z auto tmp67 = static_cast(1); 2023-01-11T21:38:05.8368667Z tmp66 = tmp67; 2023-01-11T21:38:05.8368740Z } 2023-01-11T21:38:05.8368831Z auto tmp68 = tmp66 + tmp65; 2023-01-11T21:38:05.8368919Z float tmp69 = 0.0; 2023-01-11T21:38:05.8368998Z if(tmp33) 2023-01-11T21:38:05.8369069Z { 2023-01-11T21:38:05.8369179Z auto tmp70 = static_cast(1); 2023-01-11T21:38:05.8369264Z tmp69 = tmp70; 2023-01-11T21:38:05.8369336Z } 2023-01-11T21:38:05.8369427Z auto tmp71 = tmp69 + tmp68; 2023-01-11T21:38:05.8369519Z float tmp72 = 0.0; 2023-01-11T21:38:05.8369600Z if(tmp37) 2023-01-11T21:38:05.8369674Z { 2023-01-11T21:38:05.8369782Z auto tmp73 = static_cast(1); 2023-01-11T21:38:05.8369870Z tmp72 = tmp73; 2023-01-11T21:38:05.8369945Z } 2023-01-11T21:38:05.8370036Z auto tmp74 = tmp72 + tmp71; 2023-01-11T21:38:05.8370125Z float tmp75 = 0.0; 2023-01-11T21:38:05.8370206Z if(tmp41) 2023-01-11T21:38:05.8370279Z { 2023-01-11T21:38:05.8370387Z auto tmp76 = static_cast(1); 2023-01-11T21:38:05.8370473Z tmp75 = tmp76; 2023-01-11T21:38:05.8370544Z } 2023-01-11T21:38:05.8370635Z auto tmp77 = tmp75 + tmp74; 2023-01-11T21:38:05.8370750Z float tmp78 = 0.0; 2023-01-11T21:38:05.8370829Z if(tmp49) 2023-01-11T21:38:05.8370901Z { 2023-01-11T21:38:05.8371010Z auto tmp79 = static_cast(1); 2023-01-11T21:38:05.8371097Z tmp78 = tmp79; 2023-01-11T21:38:05.8371176Z } 2023-01-11T21:38:05.8371267Z auto tmp80 = tmp78 + tmp77; 2023-01-11T21:38:05.8371357Z float tmp81 = 0.0; 2023-01-11T21:38:05.8371437Z if(tmp53) 2023-01-11T21:38:05.8371509Z { 2023-01-11T21:38:05.8371619Z auto tmp82 = static_cast(1); 2023-01-11T21:38:05.8371706Z tmp81 = tmp82; 2023-01-11T21:38:05.8371781Z } 2023-01-11T21:38:05.8371878Z auto tmp83 = tmp81 + tmp80; 2023-01-11T21:38:05.8371961Z float tmp84 = 0.0; 2023-01-11T21:38:05.8372043Z if(tmp57) 2023-01-11T21:38:05.8372118Z { 2023-01-11T21:38:05.8372227Z auto tmp85 = static_cast(1); 2023-01-11T21:38:05.8372313Z tmp84 = tmp85; 2023-01-11T21:38:05.8372384Z } 2023-01-11T21:38:05.8372484Z auto tmp86 = tmp84 + tmp83; 2023-01-11T21:38:05.8372575Z auto tmp87 = tmp60 / tmp86; 2023-01-11T21:38:05.8372675Z out_ptr0[i1 + (4*i0)] = tmp87; 2023-01-11T21:38:05.8372746Z } 2023-01-11T21:38:05.8372813Z } 2023-01-11T21:38:05.8372882Z } 2023-01-11T21:38:05.8372949Z } 2023-01-11T21:38:05.8373009Z } 2023-01-11T21:38:05.8373072Z } 2023-01-11T21:38:05.8373159Z ''') 2023-01-11T21:38:05.8373165Z 2023-01-11T21:38:05.8373169Z 2023-01-11T21:38:05.8373264Z async_compile.wait(globals()) 2023-01-11T21:38:05.8373342Z del async_compile 2023-01-11T21:38:05.8373347Z 2023-01-11T21:38:05.8373421Z def call(args): 2023-01-11T21:38:05.8373492Z arg0_1, = args 2023-01-11T21:38:05.8373567Z args.clear() 2023-01-11T21:38:05.8373773Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8373991Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8374067Z del arg0_1 2023-01-11T21:38:05.8374143Z return (buf0, ) 2023-01-11T21:38:05.8374148Z 
2023-01-11T21:38:05.8374152Z 2023-01-11T21:38:05.8374232Z if __name__ == "__main__": 2023-01-11T21:38:05.8374348Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8374587Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8374795Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8374906Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8375169Z [2023-01-11 21:24:49,929] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 31 2023-01-11T21:38:05.8375179Z 2023-01-11T21:38:05.8375249Z ok (1.822s) 2023-01-11T21:38:05.8375708Z test_avg_pool2d6_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8375840Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8376093Z [2023-01-11 21:24:49,948] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 32 2023-01-11T21:38:05.8376352Z [2023-01-11 21:24:51,727] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 32 2023-01-11T21:38:05.8376358Z 2023-01-11T21:38:05.8376502Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8376576Z import torch 2023-01-11T21:38:05.8376644Z import random 2023-01-11T21:38:05.8376760Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8376882Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8376887Z 2023-01-11T21:38:05.8376970Z aten = torch.ops.aten 2023-01-11T21:38:05.8377105Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8377256Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8377262Z 2023-01-11T21:38:05.8377338Z import triton 2023-01-11T21:38:05.8377423Z import triton.language as tl 2023-01-11T21:38:05.8377549Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8377689Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8377695Z 2023-01-11T21:38:05.8377700Z 2023-01-11T21:38:05.8377838Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8378042Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8378169Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8378273Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8378336Z { 2023-01-11T21:38:05.8378431Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8378499Z { 2023-01-11T21:38:05.8378580Z #pragma omp for 2023-01-11T21:38:05.8378669Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.8378734Z { 2023-01-11T21:38:05.8378819Z #pragma GCC ivdep 2023-01-11T21:38:05.8378906Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.8378968Z { 2023-01-11T21:38:05.8379040Z { 2023-01-11T21:38:05.8379112Z { 2023-01-11T21:38:05.8379285Z auto tmp0 = static_cast((-1) + (2*i0)); 2023-01-11T21:38:05.8379395Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8379495Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:05.8379604Z auto tmp3 = static_cast(8); 2023-01-11T21:38:05.8379697Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:05.8379795Z auto tmp5 = tmp2 & tmp4; 
2023-01-11T21:38:05.8380015Z auto tmp6 = static_cast((-1) + (2*i1)); 2023-01-11T21:38:05.8380116Z auto tmp7 = tmp6 >= tmp1; 2023-01-11T21:38:05.8380214Z auto tmp8 = tmp6 < tmp3; 2023-01-11T21:38:05.8380310Z auto tmp9 = tmp7 & tmp8; 2023-01-11T21:38:05.8380411Z auto tmp10 = tmp5 & tmp9; 2023-01-11T21:38:05.8380495Z float tmp11 = 0.0; 2023-01-11T21:38:05.8380576Z if(tmp10) 2023-01-11T21:38:05.8380649Z { 2023-01-11T21:38:05.8380824Z auto tmp12 = in_ptr0[(-9) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8380917Z tmp11 = tmp12; 2023-01-11T21:38:05.8380992Z } 2023-01-11T21:38:05.8381107Z auto tmp13 = static_cast(2*i1); 2023-01-11T21:38:05.8381207Z auto tmp14 = tmp13 >= tmp1; 2023-01-11T21:38:05.8381299Z auto tmp15 = tmp13 < tmp3; 2023-01-11T21:38:05.8381398Z auto tmp16 = tmp14 & tmp15; 2023-01-11T21:38:05.8381498Z auto tmp17 = tmp5 & tmp16; 2023-01-11T21:38:05.8381589Z float tmp18 = 0.0; 2023-01-11T21:38:05.8381670Z if(tmp17) 2023-01-11T21:38:05.8381740Z { 2023-01-11T21:38:05.8381914Z auto tmp19 = in_ptr0[(-8) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8381996Z tmp18 = tmp19; 2023-01-11T21:38:05.8382068Z } 2023-01-11T21:38:05.8382166Z auto tmp20 = tmp18 + tmp11; 2023-01-11T21:38:05.8382313Z auto tmp21 = static_cast(1 + (2*i1)); 2023-01-11T21:38:05.8382411Z auto tmp22 = tmp21 >= tmp1; 2023-01-11T21:38:05.8382510Z auto tmp23 = tmp21 < tmp3; 2023-01-11T21:38:05.8382605Z auto tmp24 = tmp22 & tmp23; 2023-01-11T21:38:05.8382698Z auto tmp25 = tmp5 & tmp24; 2023-01-11T21:38:05.8382787Z float tmp26 = 0.0; 2023-01-11T21:38:05.8382869Z if(tmp25) 2023-01-11T21:38:05.8382940Z { 2023-01-11T21:38:05.8383112Z auto tmp27 = in_ptr0[(-7) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8383200Z tmp26 = tmp27; 2023-01-11T21:38:05.8383274Z } 2023-01-11T21:38:05.8383367Z auto tmp28 = tmp26 + tmp20; 2023-01-11T21:38:05.8383478Z auto tmp29 = static_cast(2*i0); 2023-01-11T21:38:05.8383579Z auto tmp30 = tmp29 >= tmp1; 2023-01-11T21:38:05.8383675Z auto tmp31 = tmp29 < tmp3; 2023-01-11T21:38:05.8383771Z auto tmp32 = tmp30 & tmp31; 2023-01-11T21:38:05.8383867Z auto tmp33 = tmp32 & tmp9; 2023-01-11T21:38:05.8383958Z float tmp34 = 0.0; 2023-01-11T21:38:05.8384033Z if(tmp33) 2023-01-11T21:38:05.8384106Z { 2023-01-11T21:38:05.8384278Z auto tmp35 = in_ptr0[(-1) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8384368Z tmp34 = tmp35; 2023-01-11T21:38:05.8384439Z } 2023-01-11T21:38:05.8384539Z auto tmp36 = tmp34 + tmp28; 2023-01-11T21:38:05.8384636Z auto tmp37 = tmp32 & tmp16; 2023-01-11T21:38:05.8384725Z float tmp38 = 0.0; 2023-01-11T21:38:05.8384800Z if(tmp37) 2023-01-11T21:38:05.8384880Z { 2023-01-11T21:38:05.8384994Z auto tmp39 = in_ptr0[(2*i1) + (16*i0)]; 2023-01-11T21:38:05.8385081Z tmp38 = tmp39; 2023-01-11T21:38:05.8385153Z } 2023-01-11T21:38:05.8385251Z auto tmp40 = tmp38 + tmp36; 2023-01-11T21:38:05.8385375Z auto tmp41 = tmp32 & tmp24; 2023-01-11T21:38:05.8385460Z float tmp42 = 0.0; 2023-01-11T21:38:05.8385551Z if(tmp41) 2023-01-11T21:38:05.8385637Z { 2023-01-11T21:38:05.8385765Z auto tmp43 = in_ptr0[1 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8385861Z tmp42 = tmp43; 2023-01-11T21:38:05.8385931Z } 2023-01-11T21:38:05.8386030Z auto tmp44 = tmp42 + tmp40; 2023-01-11T21:38:05.8386140Z auto tmp45 = static_cast(1 + (2*i0)); 2023-01-11T21:38:05.8386239Z auto tmp46 = tmp45 >= tmp1; 2023-01-11T21:38:05.8386337Z auto tmp47 = tmp45 < tmp3; 2023-01-11T21:38:05.8386434Z auto tmp48 = tmp46 & tmp47; 2023-01-11T21:38:05.8386529Z auto tmp49 = tmp48 & tmp9; 2023-01-11T21:38:05.8386624Z float tmp50 = 0.0; 2023-01-11T21:38:05.8386704Z if(tmp49) 
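// The same pattern repeats for each of the nine 3x3 taps: build an
// in-bounds mask, conditionally load the element, and fold it into the
// running sum (tmp20, tmp28, tmp36, ...).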
2023-01-11T21:38:05.8386770Z { 2023-01-11T21:38:05.8386883Z auto tmp51 = in_ptr0[7 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8386971Z tmp50 = tmp51; 2023-01-11T21:38:05.8387041Z } 2023-01-11T21:38:05.8387139Z auto tmp52 = tmp50 + tmp44; 2023-01-11T21:38:05.8387235Z auto tmp53 = tmp48 & tmp16; 2023-01-11T21:38:05.8387323Z float tmp54 = 0.0; 2023-01-11T21:38:05.8387396Z if(tmp53) 2023-01-11T21:38:05.8387499Z { 2023-01-11T21:38:05.8387610Z auto tmp55 = in_ptr0[8 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8387697Z tmp54 = tmp55; 2023-01-11T21:38:05.8387769Z } 2023-01-11T21:38:05.8387869Z auto tmp56 = tmp54 + tmp52; 2023-01-11T21:38:05.8387967Z auto tmp57 = tmp48 & tmp24; 2023-01-11T21:38:05.8388050Z float tmp58 = 0.0; 2023-01-11T21:38:05.8388130Z if(tmp57) 2023-01-11T21:38:05.8388201Z { 2023-01-11T21:38:05.8388314Z auto tmp59 = in_ptr0[9 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8388403Z tmp58 = tmp59; 2023-01-11T21:38:05.8388476Z } 2023-01-11T21:38:05.8388573Z auto tmp60 = tmp58 + tmp56; 2023-01-11T21:38:05.8388690Z auto tmp61 = static_cast(0.3333333333333333); 2023-01-11T21:38:05.8388791Z auto tmp62 = tmp60 * tmp61; 2023-01-11T21:38:05.8388891Z out_ptr0[i1 + (4*i0)] = tmp62; 2023-01-11T21:38:05.8388963Z } 2023-01-11T21:38:05.8389033Z } 2023-01-11T21:38:05.8389097Z } 2023-01-11T21:38:05.8389167Z } 2023-01-11T21:38:05.8389226Z } 2023-01-11T21:38:05.8389290Z } 2023-01-11T21:38:05.8389376Z ''') 2023-01-11T21:38:05.8389381Z 2023-01-11T21:38:05.8389385Z 2023-01-11T21:38:05.8389481Z async_compile.wait(globals()) 2023-01-11T21:38:05.8389556Z del async_compile 2023-01-11T21:38:05.8389561Z 2023-01-11T21:38:05.8389634Z def call(args): 2023-01-11T21:38:05.8389706Z arg0_1, = args 2023-01-11T21:38:05.8389774Z args.clear() 2023-01-11T21:38:05.8389987Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8390124Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8390200Z del arg0_1 2023-01-11T21:38:05.8390274Z return (buf0, ) 2023-01-11T21:38:05.8390279Z 2023-01-11T21:38:05.8390283Z 2023-01-11T21:38:05.8390363Z if __name__ == "__main__": 2023-01-11T21:38:05.8390479Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8390634Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8390840Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8390955Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8390960Z 2023-01-11T21:38:05.8391032Z ok (1.798s) 2023-01-11T21:38:05.8391481Z test_avg_pool2d7_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8391616Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8391872Z [2023-01-11 21:24:51,746] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 33 2023-01-11T21:38:05.8392100Z [2023-01-11 21:24:51,751] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.avg_pool2d 2023-01-11T21:38:05.8392361Z [2023-01-11 21:24:51,753] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 33 2023-01-11T21:38:05.8392368Z 2023-01-11T21:38:05.8392468Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8392542Z import torch 2023-01-11T21:38:05.8392610Z import random 2023-01-11T21:38:05.8392729Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8392851Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8392857Z 2023-01-11T21:38:05.8392942Z aten = torch.ops.aten 2023-01-11T21:38:05.8393108Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8393204Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8393210Z 2023-01-11T21:38:05.8393282Z import triton 2023-01-11T21:38:05.8393368Z import triton.language as tl 2023-01-11T21:38:05.8393492Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8393634Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8393640Z 2023-01-11T21:38:05.8393644Z 2023-01-11T21:38:05.8393736Z async_compile.wait(globals()) 2023-01-11T21:38:05.8393811Z del async_compile 2023-01-11T21:38:05.8393817Z 2023-01-11T21:38:05.8393890Z def call(args): 2023-01-11T21:38:05.8393964Z arg0_1, = args 2023-01-11T21:38:05.8394042Z args.clear() 2023-01-11T21:38:05.8394168Z buf0 = aten.avg_pool2d(arg0_1, [13, 13], [1, 1], [0, 0], False, True, None) 2023-01-11T21:38:05.8394240Z del arg0_1 2023-01-11T21:38:05.8394312Z buf1 = buf0 2023-01-11T21:38:05.8394425Z assert_size_stride(buf1, (1, 1, 12, 12), (144, 144, 12, 1)) 2023-01-11T21:38:05.8394498Z del buf0 2023-01-11T21:38:05.8394574Z return (buf1, ) 2023-01-11T21:38:05.8394579Z 2023-01-11T21:38:05.8394583Z 2023-01-11T21:38:05.8394662Z if __name__ == "__main__": 2023-01-11T21:38:05.8394770Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8394898Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8395119Z arg0_1 = rand_strided((1, 1, 24, 24), (576, 576, 24, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8395232Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8395237Z 2023-01-11T21:38:05.8395307Z ok (0.026s) 2023-01-11T21:38:05.8395771Z test_avg_pool2d_backward2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8395906Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8396167Z [2023-01-11 21:24:51,773] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 34 2023-01-11T21:38:05.8396172Z 2023-01-11T21:38:05.8396310Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8396389Z import torch 2023-01-11T21:38:05.8396458Z import random 2023-01-11T21:38:05.8396575Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8396698Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8396703Z 2023-01-11T21:38:05.8396785Z aten = torch.ops.aten 2023-01-11T21:38:05.8396922Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8397017Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8397022Z 2023-01-11T21:38:05.8397095Z import triton 2023-01-11T21:38:05.8397180Z import triton.language as tl 2023-01-11T21:38:05.8397306Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8397446Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8397452Z 2023-01-11T21:38:05.8397456Z 2023-01-11T21:38:05.8397593Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8397806Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8397930Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8398034Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8398103Z { 2023-01-11T21:38:05.8398197Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8398265Z { 2023-01-11T21:38:05.8398350Z #pragma omp for 2023-01-11T21:38:05.8398438Z for(long i0=0; i0<20; i0+=1) 2023-01-11T21:38:05.8398505Z { 2023-01-11T21:38:05.8398588Z #pragma GCC ivdep 2023-01-11T21:38:05.8398679Z for(long i1=0; i1<15; i1+=1) 2023-01-11T21:38:05.8398770Z { 2023-01-11T21:38:05.8398838Z { 2023-01-11T21:38:05.8398909Z { 2023-01-11T21:38:05.8399078Z auto tmp0 = static_cast((-1) + i0); 2023-01-11T21:38:05.8399246Z auto tmp1 = static_cast((-1) + i1); 2023-01-11T21:38:05.8399362Z auto tmp2 = static_cast(2 + i0); 2023-01-11T21:38:05.8399473Z auto tmp3 = static_cast(2 + i1); 2023-01-11T21:38:05.8399575Z auto tmp4 = static_cast(0); 2023-01-11T21:38:05.8399713Z auto tmp5 = (tmp4 != tmp4) ? tmp4 : std::max(tmp0, tmp4); 2023-01-11T21:38:05.8399847Z auto tmp6 = (tmp4 != tmp4) ? tmp4 : std::max(tmp1, tmp4); 2023-01-11T21:38:05.8399956Z auto tmp7 = static_cast(20); 2023-01-11T21:38:05.8400084Z auto tmp8 = (tmp7 != tmp7) ? tmp7 : std::min(tmp2, tmp7); 2023-01-11T21:38:05.8400195Z auto tmp9 = static_cast(15); 2023-01-11T21:38:05.8400330Z auto tmp10 = (tmp9 != tmp9) ? tmp9 : std::min(tmp3, tmp9); 2023-01-11T21:38:05.8400429Z auto tmp11 = tmp5 + tmp4; 2023-01-11T21:38:05.8400522Z auto tmp12 = tmp6 + tmp4; 2023-01-11T21:38:05.8400633Z auto tmp13 = static_cast(1); 2023-01-11T21:38:05.8400738Z auto tmp14 = static_cast(3); 2023-01-11T21:38:05.8400838Z auto tmp15 = tmp11 * tmp13; 2023-01-11T21:38:05.8400983Z auto tmp16 = tmp15 - tmp13; 2023-01-11T21:38:05.8401079Z auto tmp17 = tmp12 * tmp13; 2023-01-11T21:38:05.8401220Z auto tmp18 = tmp17 - tmp13; 2023-01-11T21:38:05.8401310Z auto tmp19 = tmp16 + tmp14; 2023-01-11T21:38:05.8401408Z auto tmp20 = tmp7 + tmp13; 2023-01-11T21:38:05.8401547Z auto tmp21 = (tmp20 != tmp20) ? 
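// (x != x) holds only for a floating-point NaN, so this ternary is
// TorchInductor's NaN-propagating minimum/maximum idiom; for these
// integer window bounds the check is vacuously false and it reduces
// to a plain std::min/std::max clamp.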
tmp20 : std::min(tmp19, tmp20); 2023-01-11T21:38:05.8401646Z auto tmp22 = tmp18 + tmp14; 2023-01-11T21:38:05.8401744Z auto tmp23 = tmp9 + tmp13; 2023-01-11T21:38:05.8401915Z auto tmp24 = (tmp23 != tmp23) ? tmp23 : std::min(tmp22, tmp23); 2023-01-11T21:38:05.8402051Z auto tmp25 = (tmp4 != tmp4) ? tmp4 : std::max(tmp16, tmp4); 2023-01-11T21:38:05.8402182Z auto tmp26 = (tmp4 != tmp4) ? tmp4 : std::max(tmp18, tmp4); 2023-01-11T21:38:05.8402309Z auto tmp27 = (tmp7 != tmp7) ? tmp7 : std::min(tmp21, tmp7); 2023-01-11T21:38:05.8402428Z auto tmp28 = (tmp9 != tmp9) ? tmp9 : std::min(tmp24, tmp9); 2023-01-11T21:38:05.8402574Z auto tmp29 = tmp27 - tmp25; 2023-01-11T21:38:05.8402718Z auto tmp30 = tmp28 - tmp26; 2023-01-11T21:38:05.8402817Z auto tmp31 = tmp29 * tmp30; 2023-01-11T21:38:05.8402959Z auto tmp32 = tmp8 - tmp13; 2023-01-11T21:38:05.8403095Z auto tmp33 = (tmp32 != tmp32) ? tmp32 : std::min(tmp11, tmp32); 2023-01-11T21:38:05.8403239Z auto tmp34 = tmp10 - tmp13; 2023-01-11T21:38:05.8403375Z auto tmp35 = (tmp34 != tmp34) ? tmp34 : std::min(tmp12, tmp34); 2023-01-11T21:38:05.8403483Z auto tmp36 = in_ptr0[tmp35 + (15*tmp33)]; 2023-01-11T21:38:05.8403581Z auto tmp37 = tmp36 / tmp31; 2023-01-11T21:38:05.8403679Z auto tmp38 = tmp11 < tmp8; 2023-01-11T21:38:05.8403774Z auto tmp39 = tmp12 < tmp10; 2023-01-11T21:38:05.8403870Z auto tmp40 = tmp38 & tmp39; 2023-01-11T21:38:05.8403986Z auto tmp41 = static_cast(0.0); 2023-01-11T21:38:05.8404094Z auto tmp42 = tmp40 ? tmp37 : tmp41; 2023-01-11T21:38:05.8404216Z auto tmp43 = tmp6 + tmp13; 2023-01-11T21:38:05.8404315Z auto tmp44 = tmp43 * tmp13; 2023-01-11T21:38:05.8404458Z auto tmp45 = tmp44 - tmp13; 2023-01-11T21:38:05.8404554Z auto tmp46 = tmp45 + tmp14; 2023-01-11T21:38:05.8404692Z auto tmp47 = (tmp23 != tmp23) ? tmp23 : std::min(tmp46, tmp23); 2023-01-11T21:38:05.8404827Z auto tmp48 = (tmp4 != tmp4) ? tmp4 : std::max(tmp45, tmp4); 2023-01-11T21:38:05.8404956Z auto tmp49 = (tmp9 != tmp9) ? tmp9 : std::min(tmp47, tmp9); 2023-01-11T21:38:05.8405099Z auto tmp50 = tmp49 - tmp48; 2023-01-11T21:38:05.8405197Z auto tmp51 = tmp29 * tmp50; 2023-01-11T21:38:05.8405322Z auto tmp52 = (tmp34 != tmp34) ? tmp34 : std::min(tmp43, tmp34); 2023-01-11T21:38:05.8405438Z auto tmp53 = in_ptr0[tmp52 + (15*tmp33)]; 2023-01-11T21:38:05.8405540Z auto tmp54 = tmp53 / tmp51; 2023-01-11T21:38:05.8405636Z auto tmp55 = tmp43 < tmp10; 2023-01-11T21:38:05.8405733Z auto tmp56 = tmp38 & tmp55; 2023-01-11T21:38:05.8405837Z auto tmp57 = tmp42 + tmp54; 2023-01-11T21:38:05.8405945Z auto tmp58 = tmp56 ? tmp57 : tmp42; 2023-01-11T21:38:05.8406047Z auto tmp59 = static_cast(2); 2023-01-11T21:38:05.8406146Z auto tmp60 = tmp6 + tmp59; 2023-01-11T21:38:05.8406242Z auto tmp61 = tmp60 * tmp13; 2023-01-11T21:38:05.8406384Z auto tmp62 = tmp61 - tmp13; 2023-01-11T21:38:05.8406482Z auto tmp63 = tmp62 + tmp14; 2023-01-11T21:38:05.8406614Z auto tmp64 = (tmp23 != tmp23) ? tmp23 : std::min(tmp63, tmp23); 2023-01-11T21:38:05.8406744Z auto tmp65 = (tmp4 != tmp4) ? tmp4 : std::max(tmp62, tmp4); 2023-01-11T21:38:05.8406874Z auto tmp66 = (tmp9 != tmp9) ? tmp9 : std::min(tmp64, tmp9); 2023-01-11T21:38:05.8407010Z auto tmp67 = tmp66 - tmp65; 2023-01-11T21:38:05.8407106Z auto tmp68 = tmp29 * tmp67; 2023-01-11T21:38:05.8407272Z auto tmp69 = (tmp34 != tmp34) ? 
tmp34 : std::min(tmp60, tmp34); 2023-01-11T21:38:05.8407387Z auto tmp70 = in_ptr0[tmp69 + (15*tmp33)]; 2023-01-11T21:38:05.8407482Z auto tmp71 = tmp70 / tmp68; 2023-01-11T21:38:05.8407580Z auto tmp72 = tmp60 < tmp10; 2023-01-11T21:38:05.8407678Z auto tmp73 = tmp38 & tmp72; 2023-01-11T21:38:05.8407773Z auto tmp74 = tmp58 + tmp71; 2023-01-11T21:38:05.8407874Z auto tmp75 = tmp73 ? tmp74 : tmp58; 2023-01-11T21:38:05.8407971Z auto tmp76 = tmp5 + tmp13; 2023-01-11T21:38:05.8408069Z auto tmp77 = tmp76 * tmp13; 2023-01-11T21:38:05.8408214Z auto tmp78 = tmp77 - tmp13; 2023-01-11T21:38:05.8408309Z auto tmp79 = tmp78 + tmp14; 2023-01-11T21:38:05.8408445Z auto tmp80 = (tmp20 != tmp20) ? tmp20 : std::min(tmp79, tmp20); 2023-01-11T21:38:05.8408580Z auto tmp81 = (tmp4 != tmp4) ? tmp4 : std::max(tmp78, tmp4); 2023-01-11T21:38:05.8408710Z auto tmp82 = (tmp7 != tmp7) ? tmp7 : std::min(tmp80, tmp7); 2023-01-11T21:38:05.8408847Z auto tmp83 = tmp82 - tmp81; 2023-01-11T21:38:05.8408944Z auto tmp84 = tmp83 * tmp30; 2023-01-11T21:38:05.8409075Z auto tmp85 = (tmp32 != tmp32) ? tmp32 : std::min(tmp76, tmp32); 2023-01-11T21:38:05.8409188Z auto tmp86 = in_ptr0[tmp35 + (15*tmp85)]; 2023-01-11T21:38:05.8409313Z auto tmp87 = tmp86 / tmp84; 2023-01-11T21:38:05.8409409Z auto tmp88 = tmp76 < tmp8; 2023-01-11T21:38:05.8409505Z auto tmp89 = tmp88 & tmp39; 2023-01-11T21:38:05.8409604Z auto tmp90 = tmp75 + tmp87; 2023-01-11T21:38:05.8409710Z auto tmp91 = tmp89 ? tmp90 : tmp75; 2023-01-11T21:38:05.8409811Z auto tmp92 = tmp83 * tmp50; 2023-01-11T21:38:05.8409924Z auto tmp93 = in_ptr0[tmp52 + (15*tmp85)]; 2023-01-11T21:38:05.8410022Z auto tmp94 = tmp93 / tmp92; 2023-01-11T21:38:05.8410116Z auto tmp95 = tmp88 & tmp55; 2023-01-11T21:38:05.8410211Z auto tmp96 = tmp91 + tmp94; 2023-01-11T21:38:05.8410319Z auto tmp97 = tmp95 ? tmp96 : tmp91; 2023-01-11T21:38:05.8410409Z auto tmp98 = tmp83 * tmp67; 2023-01-11T21:38:05.8410521Z auto tmp99 = in_ptr0[tmp69 + (15*tmp85)]; 2023-01-11T21:38:05.8410628Z auto tmp100 = tmp99 / tmp98; 2023-01-11T21:38:05.8410728Z auto tmp101 = tmp88 & tmp72; 2023-01-11T21:38:05.8410831Z auto tmp102 = tmp97 + tmp100; 2023-01-11T21:38:05.8410948Z auto tmp103 = tmp101 ? tmp102 : tmp97; 2023-01-11T21:38:05.8411048Z auto tmp104 = tmp5 + tmp59; 2023-01-11T21:38:05.8411143Z auto tmp105 = tmp104 * tmp13; 2023-01-11T21:38:05.8411293Z auto tmp106 = tmp105 - tmp13; 2023-01-11T21:38:05.8411393Z auto tmp107 = tmp106 + tmp14; 2023-01-11T21:38:05.8411530Z auto tmp108 = (tmp20 != tmp20) ? tmp20 : std::min(tmp107, tmp20); 2023-01-11T21:38:05.8411668Z auto tmp109 = (tmp4 != tmp4) ? tmp4 : std::max(tmp106, tmp4); 2023-01-11T21:38:05.8411800Z auto tmp110 = (tmp7 != tmp7) ? tmp7 : std::min(tmp108, tmp7); 2023-01-11T21:38:05.8411953Z auto tmp111 = tmp110 - tmp109; 2023-01-11T21:38:05.8412053Z auto tmp112 = tmp111 * tmp30; 2023-01-11T21:38:05.8412189Z auto tmp113 = (tmp32 != tmp32) ? tmp32 : std::min(tmp104, tmp32); 2023-01-11T21:38:05.8412335Z auto tmp114 = in_ptr0[tmp35 + (15*tmp113)]; 2023-01-11T21:38:05.8412439Z auto tmp115 = tmp114 / tmp112; 2023-01-11T21:38:05.8412541Z auto tmp116 = tmp104 < tmp8; 2023-01-11T21:38:05.8412641Z auto tmp117 = tmp116 & tmp39; 2023-01-11T21:38:05.8412743Z auto tmp118 = tmp103 + tmp115; 2023-01-11T21:38:05.8412854Z auto tmp119 = tmp117 ? 
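// Every guarded step here has the shape acc' = cond ? acc + g/area : acc,
// folding in one of the up-to-nine pooled cells whose (clamped) window
// covers this grad_input position.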
tmp118 : tmp103; 2023-01-11T21:38:05.8412957Z auto tmp120 = tmp111 * tmp50; 2023-01-11T21:38:05.8413065Z auto tmp121 = in_ptr0[tmp52 + (15*tmp113)]; 2023-01-11T21:38:05.8413169Z auto tmp122 = tmp121 / tmp120; 2023-01-11T21:38:05.8413271Z auto tmp123 = tmp116 & tmp55; 2023-01-11T21:38:05.8413371Z auto tmp124 = tmp119 + tmp122; 2023-01-11T21:38:05.8413481Z auto tmp125 = tmp123 ? tmp124 : tmp119; 2023-01-11T21:38:05.8413584Z auto tmp126 = tmp111 * tmp67; 2023-01-11T21:38:05.8413701Z auto tmp127 = in_ptr0[tmp69 + (15*tmp113)]; 2023-01-11T21:38:05.8413802Z auto tmp128 = tmp127 / tmp126; 2023-01-11T21:38:05.8413896Z auto tmp129 = tmp116 & tmp72; 2023-01-11T21:38:05.8413997Z auto tmp130 = tmp125 + tmp128; 2023-01-11T21:38:05.8414111Z auto tmp131 = tmp129 ? tmp130 : tmp125; 2023-01-11T21:38:05.8414215Z out_ptr0[i1 + (15*i0)] = tmp131; 2023-01-11T21:38:05.8414286Z } 2023-01-11T21:38:05.8414385Z } 2023-01-11T21:38:05.8414451Z } 2023-01-11T21:38:05.8414692Z } 2023-01-11T21:38:05.8414762Z } 2023-01-11T21:38:05.8414824Z } 2023-01-11T21:38:05.8414912Z ''') 2023-01-11T21:38:05.8414918Z 2023-01-11T21:38:05.8414922Z 2023-01-11T21:38:05.8415020Z async_compile.wait(globals()) 2023-01-11T21:38:05.8415102Z del async_compile 2023-01-11T21:38:05.8415107Z 2023-01-11T21:38:05.8415182Z def call(args): 2023-01-11T21:38:05.8415273Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8415352Z args.clear() 2023-01-11T21:38:05.8415594Z buf0 = empty_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8415733Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8415804Z del arg0_1 2023-01-11T21:38:05.8415879Z return (buf0, ) 2023-01-11T21:38:05.8415884Z 2023-01-11T21:38:05.8415889Z 2023-01-11T21:38:05.8415968Z if __name__ == "__main__": 2023-01-11T21:38:05.8416088Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8416211Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8416428Z arg0_1 = rand_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8416646Z arg1_1 = rand_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8416766Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8417030Z [2023-01-11 21:24:53,617] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 34 2023-01-11T21:38:05.8417036Z 2023-01-11T21:38:05.8417108Z ok (1.870s) 2023-01-11T21:38:05.8417625Z test_avg_pool2d_backward3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8417762Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8418018Z [2023-01-11 21:24:53,658] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 35 2023-01-11T21:38:05.8418331Z [2023-01-11 21:24:55,564] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 35 2023-01-11T21:38:05.8418346Z 2023-01-11T21:38:05.8418438Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8418512Z import torch 2023-01-11T21:38:05.8418584Z import random 2023-01-11T21:38:05.8418702Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8418827Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8418832Z 2023-01-11T21:38:05.8418912Z aten = torch.ops.aten 2023-01-11T21:38:05.8419048Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8419138Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8419146Z 2023-01-11T21:38:05.8419220Z import triton 2023-01-11T21:38:05.8419312Z import triton.language as tl 2023-01-11T21:38:05.8419438Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8419578Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8419584Z 2023-01-11T21:38:05.8419590Z 2023-01-11T21:38:05.8419728Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8419933Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8420055Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8420152Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8420219Z { 2023-01-11T21:38:05.8420320Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8420387Z { 2023-01-11T21:38:05.8420468Z #pragma omp for 2023-01-11T21:38:05.8420556Z for(long i0=0; i0<2016; i0+=1) 2023-01-11T21:38:05.8420662Z { 2023-01-11T21:38:05.8420740Z #pragma GCC ivdep 2023-01-11T21:38:05.8420829Z for(long i1=0; i1<21; i1+=1) 2023-01-11T21:38:05.8420896Z { 2023-01-11T21:38:05.8420981Z #pragma GCC ivdep 2023-01-11T21:38:05.8421082Z for(long i2=0; i2<21; i2+=1) 2023-01-11T21:38:05.8421153Z { 2023-01-11T21:38:05.8421222Z { 2023-01-11T21:38:05.8421288Z { 2023-01-11T21:38:05.8421411Z auto tmp0 = static_cast(((1 + i1) / 2)); 2023-01-11T21:38:05.8421529Z auto tmp1 = static_cast(((1 + i2) / 2)); 2023-01-11T21:38:05.8421648Z auto tmp2 = static_cast(1 + (i1 / 2)); 2023-01-11T21:38:05.8421765Z auto tmp3 = static_cast(1 + (i2 / 2)); 2023-01-11T21:38:05.8421878Z auto tmp4 = static_cast(0); 2023-01-11T21:38:05.8422015Z auto tmp5 = (tmp4 != tmp4) ? tmp4 : std::max(tmp0, tmp4); 2023-01-11T21:38:05.8422142Z auto tmp6 = (tmp4 != tmp4) ? tmp4 : std::max(tmp1, tmp4); 2023-01-11T21:38:05.8422256Z auto tmp7 = static_cast(11); 2023-01-11T21:38:05.8422392Z auto tmp8 = (tmp7 != tmp7) ? tmp7 : std::min(tmp2, tmp7); 2023-01-11T21:38:05.8422519Z auto tmp9 = (tmp7 != tmp7) ? tmp7 : std::min(tmp3, tmp7); 2023-01-11T21:38:05.8422620Z auto tmp10 = tmp5 + tmp4; 2023-01-11T21:38:05.8422718Z auto tmp11 = tmp6 + tmp4; 2023-01-11T21:38:05.8422833Z auto tmp12 = static_cast(1); 2023-01-11T21:38:05.8422983Z auto tmp13 = tmp8 - tmp12; 2023-01-11T21:38:05.8423121Z auto tmp14 = (tmp13 != tmp13) ? tmp13 : std::min(tmp10, tmp13); 2023-01-11T21:38:05.8423263Z auto tmp15 = tmp9 - tmp12; 2023-01-11T21:38:05.8423404Z auto tmp16 = (tmp15 != tmp15) ? 
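// Clamp to the last valid pooled cell, then read the single grad_output
// element whose window covers this grad_input position; the divisor has
// been constant-folded for this configuration (the "/ 1" just below).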
tmp15 : std::min(tmp11, tmp15); 2023-01-11T21:38:05.8423528Z auto tmp17 = in_ptr0[tmp16 + (11*tmp14) + (121*i0)]; 2023-01-11T21:38:05.8423628Z auto tmp18 = tmp17 / 1; 2023-01-11T21:38:05.8423761Z auto tmp19 = tmp10 < tmp8; 2023-01-11T21:38:05.8423864Z auto tmp20 = tmp11 < tmp9; 2023-01-11T21:38:05.8423968Z auto tmp21 = tmp19 & tmp20; 2023-01-11T21:38:05.8424077Z auto tmp22 = static_cast(0.0); 2023-01-11T21:38:05.8424188Z auto tmp23 = tmp21 ? tmp18 : tmp22; 2023-01-11T21:38:05.8424300Z out_ptr0[i2 + (21*i1) + (441*i0)] = tmp23; 2023-01-11T21:38:05.8424374Z } 2023-01-11T21:38:05.8424445Z } 2023-01-11T21:38:05.8424517Z } 2023-01-11T21:38:05.8424583Z } 2023-01-11T21:38:05.8424643Z } 2023-01-11T21:38:05.8424710Z } 2023-01-11T21:38:05.8424773Z } 2023-01-11T21:38:05.8424856Z ''') 2023-01-11T21:38:05.8424861Z 2023-01-11T21:38:05.8424866Z 2023-01-11T21:38:05.8424959Z async_compile.wait(globals()) 2023-01-11T21:38:05.8425035Z del async_compile 2023-01-11T21:38:05.8425042Z 2023-01-11T21:38:05.8425117Z def call(args): 2023-01-11T21:38:05.8425195Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8425263Z args.clear() 2023-01-11T21:38:05.8425494Z buf0 = empty_strided((1, 2016, 21, 21), (889056, 441, 21, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8425633Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8425705Z del arg0_1 2023-01-11T21:38:05.8425779Z return (buf0, ) 2023-01-11T21:38:05.8425786Z 2023-01-11T21:38:05.8425790Z 2023-01-11T21:38:05.8425870Z if __name__ == "__main__": 2023-01-11T21:38:05.8425987Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8426136Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8426362Z arg0_1 = rand_strided((1, 2016, 11, 11), (243936, 121, 11, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8426582Z arg1_1 = rand_strided((1, 2016, 21, 21), (889056, 441, 21, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8426704Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8426709Z 2023-01-11T21:38:05.8426779Z ok (1.954s) 2023-01-11T21:38:05.8427242Z test_avg_pool2d_backward4_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8427374Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8427635Z [2023-01-11 21:24:55,600] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 36 2023-01-11T21:38:05.8427875Z [2023-01-11 21:24:55,611] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.avg_pool2d_backward 2023-01-11T21:38:05.8428137Z [2023-01-11 21:24:55,613] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 36 2023-01-11T21:38:05.8428143Z 2023-01-11T21:38:05.8428235Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8428308Z import torch 2023-01-11T21:38:05.8428381Z import random 2023-01-11T21:38:05.8428500Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8428624Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8428629Z 2023-01-11T21:38:05.8428712Z aten = torch.ops.aten 2023-01-11T21:38:05.8428849Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8428945Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8428953Z 2023-01-11T21:38:05.8429020Z import triton 2023-01-11T21:38:05.8429114Z import triton.language as tl 2023-01-11T21:38:05.8429240Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8429380Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8429386Z 2023-01-11T21:38:05.8429419Z 2023-01-11T21:38:05.8429515Z async_compile.wait(globals()) 2023-01-11T21:38:05.8429591Z del async_compile 2023-01-11T21:38:05.8429596Z 2023-01-11T21:38:05.8429671Z def call(args): 2023-01-11T21:38:05.8429750Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8429818Z args.clear() 2023-01-11T21:38:05.8429968Z buf0 = aten.avg_pool2d_backward(arg0_1, arg1_1, [13, 13], [1, 1], [0, 0], True, False, None) 2023-01-11T21:38:05.8430040Z del arg0_1 2023-01-11T21:38:05.8430110Z del arg1_1 2023-01-11T21:38:05.8430183Z buf1 = buf0 2023-01-11T21:38:05.8430297Z assert_size_stride(buf1, (1, 16, 24, 24), (9216, 576, 24, 1)) 2023-01-11T21:38:05.8430370Z del buf0 2023-01-11T21:38:05.8430439Z return (buf1, ) 2023-01-11T21:38:05.8430444Z 2023-01-11T21:38:05.8430449Z 2023-01-11T21:38:05.8430528Z if __name__ == "__main__": 2023-01-11T21:38:05.8430645Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8430774Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8430999Z arg0_1 = rand_strided((1, 16, 12, 12), (2304, 144, 12, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8431217Z arg1_1 = rand_strided((1, 16, 24, 24), (9216, 576, 24, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8431338Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8431343Z 2023-01-11T21:38:05.8431412Z ok (0.036s) 2023-01-11T21:38:05.8431866Z test_avg_pool2d_backward_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8432025Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8432283Z [2023-01-11 21:24:55,634] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 37 2023-01-11T21:38:05.8432546Z [2023-01-11 21:24:57,391] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 37 2023-01-11T21:38:05.8432552Z 2023-01-11T21:38:05.8432650Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8432725Z import torch 2023-01-11T21:38:05.8432799Z import random 2023-01-11T21:38:05.8432917Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8433042Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8433047Z 2023-01-11T21:38:05.8433121Z aten = torch.ops.aten 2023-01-11T21:38:05.8433260Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8433354Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8433359Z 2023-01-11T21:38:05.8433431Z import triton 2023-01-11T21:38:05.8433523Z import triton.language as tl 2023-01-11T21:38:05.8433646Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8433788Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8433793Z 2023-01-11T21:38:05.8433798Z 2023-01-11T21:38:05.8433935Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8434134Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8434257Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8434362Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8434428Z { 2023-01-11T21:38:05.8434528Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8434593Z { 2023-01-11T21:38:05.8434677Z #pragma omp for 2023-01-11T21:38:05.8434757Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8434823Z { 2023-01-11T21:38:05.8434907Z #pragma GCC ivdep 2023-01-11T21:38:05.8434997Z for(long i1=0; i1<14; i1+=1) 2023-01-11T21:38:05.8435064Z { 2023-01-11T21:38:05.8435149Z #pragma GCC ivdep 2023-01-11T21:38:05.8435286Z for(long i2=0; i2<14; i2+=1) 2023-01-11T21:38:05.8435350Z { 2023-01-11T21:38:05.8435421Z { 2023-01-11T21:38:05.8435495Z { 2023-01-11T21:38:05.8435612Z auto tmp0 = static_cast((i1 / 2)); 2023-01-11T21:38:05.8435727Z auto tmp1 = static_cast((i2 / 2)); 2023-01-11T21:38:05.8435845Z auto tmp2 = static_cast(1 + (i1 / 2)); 2023-01-11T21:38:05.8435961Z auto tmp3 = static_cast(1 + (i2 / 2)); 2023-01-11T21:38:05.8436067Z auto tmp4 = static_cast(0); 2023-01-11T21:38:05.8436207Z auto tmp5 = (tmp4 != tmp4) ? tmp4 : std::max(tmp0, tmp4); 2023-01-11T21:38:05.8436338Z auto tmp6 = (tmp4 != tmp4) ? tmp4 : std::max(tmp1, tmp4); 2023-01-11T21:38:05.8436452Z auto tmp7 = static_cast(7); 2023-01-11T21:38:05.8436584Z auto tmp8 = (tmp7 != tmp7) ? tmp7 : std::min(tmp2, tmp7); 2023-01-11T21:38:05.8436712Z auto tmp9 = (tmp7 != tmp7) ? tmp7 : std::min(tmp3, tmp7); 2023-01-11T21:38:05.8436812Z auto tmp10 = tmp5 + tmp4; 2023-01-11T21:38:05.8436911Z auto tmp11 = tmp6 + tmp4; 2023-01-11T21:38:05.8437016Z auto tmp12 = static_cast(1); 2023-01-11T21:38:05.8437168Z auto tmp13 = tmp8 - tmp12; 2023-01-11T21:38:05.8437308Z auto tmp14 = (tmp13 != tmp13) ? tmp13 : std::min(tmp10, tmp13); 2023-01-11T21:38:05.8437487Z auto tmp15 = tmp9 - tmp12; 2023-01-11T21:38:05.8437625Z auto tmp16 = (tmp15 != tmp15) ? 
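// Same single-covering-window scatter as graph 35 above, but with uniform
// 2x2 windows the divisor is the constant window area, hence the "/ 4"
// just below.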
tmp15 : std::min(tmp11, tmp15); 2023-01-11T21:38:05.8437749Z auto tmp17 = in_ptr0[tmp16 + (7*tmp14) + (49*i0)]; 2023-01-11T21:38:05.8437850Z auto tmp18 = tmp17 / 4; 2023-01-11T21:38:05.8437950Z auto tmp19 = tmp10 < tmp8; 2023-01-11T21:38:05.8438045Z auto tmp20 = tmp11 < tmp9; 2023-01-11T21:38:05.8438150Z auto tmp21 = tmp19 & tmp20; 2023-01-11T21:38:05.8438267Z auto tmp22 = static_cast(0.0); 2023-01-11T21:38:05.8438378Z auto tmp23 = tmp21 ? tmp18 : tmp22; 2023-01-11T21:38:05.8438491Z out_ptr0[i2 + (14*i1) + (196*i0)] = tmp23; 2023-01-11T21:38:05.8438565Z } 2023-01-11T21:38:05.8438640Z } 2023-01-11T21:38:05.8438708Z } 2023-01-11T21:38:05.8438768Z } 2023-01-11T21:38:05.8438834Z } 2023-01-11T21:38:05.8438899Z } 2023-01-11T21:38:05.8438964Z } 2023-01-11T21:38:05.8439049Z ''') 2023-01-11T21:38:05.8439054Z 2023-01-11T21:38:05.8439059Z 2023-01-11T21:38:05.8439157Z async_compile.wait(globals()) 2023-01-11T21:38:05.8439234Z del async_compile 2023-01-11T21:38:05.8439239Z 2023-01-11T21:38:05.8439307Z def call(args): 2023-01-11T21:38:05.8439387Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8439461Z args.clear() 2023-01-11T21:38:05.8439683Z buf0 = empty_strided((2, 4, 14, 14), (784, 196, 14, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8439824Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8439898Z del arg0_1 2023-01-11T21:38:05.8439973Z return (buf0, ) 2023-01-11T21:38:05.8439978Z 2023-01-11T21:38:05.8439983Z 2023-01-11T21:38:05.8440059Z if __name__ == "__main__": 2023-01-11T21:38:05.8440176Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8440305Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8440519Z arg0_1 = rand_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8440766Z arg1_1 = rand_strided((2, 4, 14, 14), (784, 196, 14, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8440892Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8440898Z 2023-01-11T21:38:05.8440970Z ok (1.781s) 2023-01-11T21:38:05.8441422Z test_baddbmm_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8441556Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8441805Z [2023-01-11 21:24:57,417] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 38 2023-01-11T21:38:05.8442068Z [2023-01-11 21:24:59,153] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 38 2023-01-11T21:38:05.8442074Z 2023-01-11T21:38:05.8442173Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8442247Z import torch 2023-01-11T21:38:05.8442322Z import random 2023-01-11T21:38:05.8442441Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8442565Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8442571Z 2023-01-11T21:38:05.8442652Z aten = torch.ops.aten 2023-01-11T21:38:05.8442781Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8442877Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8442882Z 2023-01-11T21:38:05.8442988Z import triton 2023-01-11T21:38:05.8443081Z import triton.language as tl 2023-01-11T21:38:05.8443209Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8443353Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8443359Z 2023-01-11T21:38:05.8443364Z 2023-01-11T21:38:05.8443506Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8443714Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8443828Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.8443938Z const float* __restrict__ in_ptr0) 2023-01-11T21:38:05.8444004Z { 2023-01-11T21:38:05.8444106Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8444173Z { 2023-01-11T21:38:05.8444267Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8444354Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:05.8444414Z { 2023-01-11T21:38:05.8444505Z for(long i1=0; i1<128; i1+=1) 2023-01-11T21:38:05.8444576Z { 2023-01-11T21:38:05.8444671Z for(long i2=0; i2<12; i2+=1) 2023-01-11T21:38:05.8444741Z { 2023-01-11T21:38:05.8444892Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i2) + (100*i0)); 2023-01-11T21:38:05.8445055Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + (8*i2) + (100*i1) + (12800*i0)); 2023-01-11T21:38:05.8445153Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8445273Z tmp2.store(in_out_ptr0 + (8*i2) + (100*i1) + (12800*i0)); 2023-01-11T21:38:05.8445360Z } 2023-01-11T21:38:05.8445472Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8445590Z for(long i2=96; i2<100; i2+=1) 2023-01-11T21:38:05.8445659Z { 2023-01-11T21:38:05.8445764Z auto tmp0 = in_ptr0[i2 + (100*i0)]; 2023-01-11T21:38:05.8445882Z auto tmp1 = in_out_ptr0[i2 + (100*i1) + (12800*i0)]; 2023-01-11T21:38:05.8445974Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8446086Z in_out_ptr0[i2 + (100*i1) + (12800*i0)] = tmp2; 2023-01-11T21:38:05.8446155Z } 2023-01-11T21:38:05.8446223Z } 2023-01-11T21:38:05.8446290Z } 2023-01-11T21:38:05.8446386Z } 2023-01-11T21:38:05.8446454Z } 2023-01-11T21:38:05.8446532Z ''') 2023-01-11T21:38:05.8446537Z 2023-01-11T21:38:05.8446542Z 2023-01-11T21:38:05.8446637Z async_compile.wait(globals()) 2023-01-11T21:38:05.8446712Z del async_compile 2023-01-11T21:38:05.8446717Z 2023-01-11T21:38:05.8446792Z def call(args): 2023-01-11T21:38:05.8446879Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.8446954Z args.clear() 
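    # baddbmm is decomposed here: aten.bmm.out writes the batched matmul into
    # buf0, then kernel_cpp_0 adds the (6, 1, 100) bias in place, broadcasting
    # it across the 128 rows (buf1 aliases buf0, the "reuse" noted below).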
2023-01-11T21:38:05.8447170Z buf0 = empty_strided((6, 128, 100), (12800, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8447264Z aten.bmm.out(arg1_1, arg2_1, out=buf0) 2023-01-11T21:38:05.8447336Z del arg1_1 2023-01-11T21:38:05.8447411Z del arg2_1 2023-01-11T21:38:05.8447501Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:05.8447638Z kernel_cpp_0(c_void_p(buf1.data_ptr()), c_void_p(arg0_1.data_ptr())) 2023-01-11T21:38:05.8447710Z del arg0_1 2023-01-11T21:38:05.8447784Z return (buf1, ) 2023-01-11T21:38:05.8447789Z 2023-01-11T21:38:05.8447796Z 2023-01-11T21:38:05.8447877Z if __name__ == "__main__": 2023-01-11T21:38:05.8447988Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8448115Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8448328Z arg0_1 = rand_strided((6, 1, 100), (100, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8448538Z arg1_1 = rand_strided((6, 128, 64), (8192, 64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8448749Z arg2_1 = rand_strided((6, 64, 100), (6400, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8448879Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.8448919Z 2023-01-11T21:38:05.8448992Z ok (1.796s) 2023-01-11T21:38:05.8449452Z test_batch_norm_2d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8449587Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8449836Z [2023-01-11 21:24:59,463] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 39 2023-01-11T21:38:05.8450099Z [2023-01-11 21:25:01,325] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 39 2023-01-11T21:38:05.8450520Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8450655Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8450912Z [2023-01-11 21:25:01,606] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 40 2023-01-11T21:38:05.8450918Z 2023-01-11T21:38:05.8451016Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8451090Z import torch 2023-01-11T21:38:05.8451164Z import random 2023-01-11T21:38:05.8451286Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8451403Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8451408Z 2023-01-11T21:38:05.8451490Z aten = torch.ops.aten 2023-01-11T21:38:05.8451626Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8451724Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8451729Z 2023-01-11T21:38:05.8451803Z import triton 2023-01-11T21:38:05.8451896Z import triton.language as tl 2023-01-11T21:38:05.8452020Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8452159Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8452193Z 2023-01-11T21:38:05.8452199Z 2023-01-11T21:38:05.8452329Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8452536Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8452660Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8452769Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8452877Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.8452984Z const float* __restrict__ in_ptr3, 2023-01-11T21:38:05.8453089Z const float* __restrict__ in_ptr4, 2023-01-11T21:38:05.8453195Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8453291Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8453390Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.8453493Z bool* __restrict__ out_ptr3) 2023-01-11T21:38:05.8453561Z { 2023-01-11T21:38:05.8453664Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8453730Z { 2023-01-11T21:38:05.8453812Z #pragma omp for 2023-01-11T21:38:05.8453892Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.8453958Z { 2023-01-11T21:38:05.8454099Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8454195Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8454263Z } 2023-01-11T21:38:05.8454363Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8454448Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.8454647Z { 2023-01-11T21:38:05.8454782Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8454869Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.8454934Z } 2023-01-11T21:38:05.8455015Z #pragma omp for 2023-01-11T21:38:05.8455101Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.8455160Z { 2023-01-11T21:38:05.8455299Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.8455407Z tmp0.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8455489Z } 2023-01-11T21:38:05.8455600Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8455698Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.8455763Z { 2023-01-11T21:38:05.8455845Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:05.8455930Z out_ptr1[i0] = tmp0; 2023-01-11T21:38:05.8455999Z } 2023-01-11T21:38:05.8456091Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8456176Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.8456248Z { 2023-01-11T21:38:05.8456337Z for(long i1=0; i1<10; i1+=1) 
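// i1 walks the 10 channels: the channel's mean (out_ptr0), variance
// (out_ptr1), weight (in_ptr3) and bias (in_ptr4) are splatted into 8-wide
// vectors, then (x - mean) * rsqrt(var + 1e-05) is scaled, shifted, and
// clamped at zero (the fused ReLU).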
2023-01-11T21:38:05.8456398Z { 2023-01-11T21:38:05.8456492Z for(long i2=0; i2<8; i2+=1) 2023-01-11T21:38:05.8456562Z { 2023-01-11T21:38:05.8456721Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr2 + (8*i2) + (64*i1) + (640*i0)); 2023-01-11T21:38:05.8456856Z auto tmp1 = at::vec::Vectorized(out_ptr0[i1]); 2023-01-11T21:38:05.8456988Z auto tmp3 = at::vec::Vectorized(out_ptr1[i1]); 2023-01-11T21:38:05.8457120Z auto tmp11 = at::vec::Vectorized(in_ptr3[i1]); 2023-01-11T21:38:05.8457309Z auto tmp13 = at::vec::Vectorized(in_ptr4[i1]); 2023-01-11T21:38:05.8457447Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.8457657Z auto tmp4 = at::vec::Vectorized(static_cast(1e-05)); 2023-01-11T21:38:05.8457758Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:05.8457853Z auto tmp6 = tmp5.sqrt(); 2023-01-11T21:38:05.8457960Z auto tmp7 = tmp6.reciprocal(); 2023-01-11T21:38:05.8458104Z auto tmp8 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.8458243Z auto tmp9 = tmp7 * tmp8; 2023-01-11T21:38:05.8458341Z auto tmp10 = tmp2 * tmp9; 2023-01-11T21:38:05.8458433Z auto tmp12 = tmp10 * tmp11; 2023-01-11T21:38:05.8458533Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:05.8458671Z auto tmp15 = at::vec::clamp_min(tmp14, decltype(tmp14)(0)); 2023-01-11T21:38:05.8458792Z tmp15.store(out_ptr2 + (8*i2) + (64*i1) + (640*i0)); 2023-01-11T21:38:05.8458863Z } 2023-01-11T21:38:05.8458962Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8459057Z for(long i2=64; i2<64; i2+=1) 2023-01-11T21:38:05.8459123Z { 2023-01-11T21:38:05.8459234Z auto tmp0 = in_ptr2[i2 + (64*i1) + (640*i0)]; 2023-01-11T21:38:05.8459331Z auto tmp1 = out_ptr0[i1]; 2023-01-11T21:38:05.8459433Z auto tmp3 = out_ptr1[i1]; 2023-01-11T21:38:05.8459531Z auto tmp11 = in_ptr3[i1]; 2023-01-11T21:38:05.8459626Z auto tmp13 = in_ptr4[i1]; 2023-01-11T21:38:05.8459764Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.8459921Z auto tmp4 = static_cast(1e-05); 2023-01-11T21:38:05.8460016Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:05.8460120Z auto tmp6 = std::sqrt(tmp5); 2023-01-11T21:38:05.8460212Z auto tmp7 = 1 / tmp6; 2023-01-11T21:38:05.8460323Z auto tmp8 = static_cast(1); 2023-01-11T21:38:05.8460419Z auto tmp9 = tmp7 * tmp8; 2023-01-11T21:38:05.8460514Z auto tmp10 = tmp2 * tmp9; 2023-01-11T21:38:05.8460691Z auto tmp12 = tmp10 * tmp11; 2023-01-11T21:38:05.8460783Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:05.8460884Z auto tmp15 = tmp14 * (tmp14>0); 2023-01-11T21:38:05.8460991Z out_ptr2[i2 + (64*i1) + (640*i0)] = tmp15; 2023-01-11T21:38:05.8461064Z } 2023-01-11T21:38:05.8461131Z } 2023-01-11T21:38:05.8461196Z } 2023-01-11T21:38:05.8461280Z #pragma omp for 2023-01-11T21:38:05.8461361Z for(long i0=0; i0<1280; i0+=1) 2023-01-11T21:38:05.8461428Z { 2023-01-11T21:38:05.8461494Z { 2023-01-11T21:38:05.8461563Z { 2023-01-11T21:38:05.8461660Z auto tmp0 = out_ptr2[i0]; 2023-01-11T21:38:05.8461769Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8461857Z auto tmp2 = tmp0 <= tmp1; 2023-01-11T21:38:05.8461950Z out_ptr3[i0] = tmp2; 2023-01-11T21:38:05.8462021Z } 2023-01-11T21:38:05.8462158Z } 2023-01-11T21:38:05.8462240Z } 2023-01-11T21:38:05.8462334Z } 2023-01-11T21:38:05.8462422Z } 2023-01-11T21:38:05.8462668Z ''') 2023-01-11T21:38:05.8462674Z 2023-01-11T21:38:05.8462679Z 2023-01-11T21:38:05.8462793Z async_compile.wait(globals()) 2023-01-11T21:38:05.8462888Z del async_compile 2023-01-11T21:38:05.8462893Z 2023-01-11T21:38:05.8462986Z def call(args): 2023-01-11T21:38:05.8463124Z primals_1, primals_2, primals_3, primals_4, primals_5, primals_6 = args 2023-01-11T21:38:05.8463216Z args.clear() 
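    # Buffers allocated below: buf0/buf1 receive the copied per-channel stats
    # (they appear to be the running mean/var), buf2 holds the normalized and
    # ReLU'd activation, and buf3 a bool mask (activation <= 0) saved for the
    # ReLU backward.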
2023-01-11T21:38:05.8463441Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8463676Z buf1 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8463936Z buf2 = empty_strided((2, 10, 8, 8), (640, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8464205Z buf3 = empty_strided((2, 10, 8, 8), (640, 64, 8, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8464583Z kernel_cpp_0(c_void_p(primals_3.data_ptr()), c_void_p(primals_4.data_ptr()), c_void_p(primals_6.data_ptr()), c_void_p(primals_1.data_ptr()), c_void_p(primals_2.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:05.8464680Z del primals_1 2023-01-11T21:38:05.8464749Z del primals_2 2023-01-11T21:38:05.8464887Z del primals_3 2023-01-11T21:38:05.8464983Z del primals_4 2023-01-11T21:38:05.8465125Z return (buf0, buf1, buf2, primals_6, buf0, buf1, buf3, ) 2023-01-11T21:38:05.8465131Z 2023-01-11T21:38:05.8465135Z 2023-01-11T21:38:05.8465236Z if __name__ == "__main__": 2023-01-11T21:38:05.8465372Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8465542Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8465798Z primals_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8465995Z primals_2 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8466212Z primals_3 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8466435Z primals_4 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8466642Z primals_5 = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8466880Z primals_6 = rand_strided((2, 10, 8, 8), (640, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8467076Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4, primals_5, primals_6])) 2023-01-11T21:38:05.8467082Z 2023-01-11T21:38:05.8467362Z [2023-01-11 21:25:03,417] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 40 2023-01-11T21:38:05.8467368Z 2023-01-11T21:38:05.8467502Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8467595Z import torch 2023-01-11T21:38:05.8467703Z import random 2023-01-11T21:38:05.8467884Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8468023Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8468028Z 2023-01-11T21:38:05.8468127Z aten = torch.ops.aten 2023-01-11T21:38:05.8468281Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8468395Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8468400Z 2023-01-11T21:38:05.8468486Z import triton 2023-01-11T21:38:05.8468571Z import triton.language as tl 2023-01-11T21:38:05.8468735Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8468891Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8468896Z 2023-01-11T21:38:05.8468900Z 2023-01-11T21:38:05.8469058Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8469282Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8469422Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8469551Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8469674Z const float* __restrict__ in_ptr2, 
2023-01-11T21:38:05.8469781Z const float* __restrict__ in_ptr3, 2023-01-11T21:38:05.8469913Z const float* __restrict__ in_ptr4, 2023-01-11T21:38:05.8470046Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8470210Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8470323Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.8470437Z bool* __restrict__ out_ptr3) 2023-01-11T21:38:05.8470517Z { 2023-01-11T21:38:05.8470614Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8470696Z { 2023-01-11T21:38:05.8470802Z #pragma omp for 2023-01-11T21:38:05.8470907Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.8471003Z { 2023-01-11T21:38:05.8471164Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8471277Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8471337Z } 2023-01-11T21:38:05.8471452Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8471554Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.8471674Z { 2023-01-11T21:38:05.8471781Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8471886Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.8472027Z } 2023-01-11T21:38:05.8472101Z #pragma omp for 2023-01-11T21:38:05.8472203Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.8472285Z { 2023-01-11T21:38:05.8472435Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.8472551Z tmp0.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8472637Z } 2023-01-11T21:38:05.8472750Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8472833Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.8472914Z { 2023-01-11T21:38:05.8473030Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:05.8473132Z out_ptr1[i0] = tmp0; 2023-01-11T21:38:05.8473217Z } 2023-01-11T21:38:05.8473333Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8473436Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:05.8473496Z { 2023-01-11T21:38:05.8473600Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:05.8473681Z { 2023-01-11T21:38:05.8473794Z for(long i2=0; i2<32; i2+=1) 2023-01-11T21:38:05.8473935Z { 2023-01-11T21:38:05.8474112Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr2 + (8*i2) + (256*i1) + (2560*i0)); 2023-01-11T21:38:05.8474269Z auto tmp1 = at::vec::Vectorized<float>(out_ptr0[i1]); 2023-01-11T21:38:05.8474394Z auto tmp3 = at::vec::Vectorized<float>(out_ptr1[i1]); 2023-01-11T21:38:05.8474540Z auto tmp11 = at::vec::Vectorized<float>(in_ptr3[i1]); 2023-01-11T21:38:05.8474717Z auto tmp13 = at::vec::Vectorized<float>(in_ptr4[i1]); 2023-01-11T21:38:05.8474880Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.8475113Z auto tmp4 = at::vec::Vectorized<float>(static_cast<float>(1e-05)); 2023-01-11T21:38:05.8475248Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:05.8475404Z auto tmp6 = tmp5.sqrt(); 2023-01-11T21:38:05.8475527Z auto tmp7 = tmp6.reciprocal(); 2023-01-11T21:38:05.8475663Z auto tmp8 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.8475773Z auto tmp9 = tmp7 * tmp8; 2023-01-11T21:38:05.8475883Z auto tmp10 = tmp2 * tmp9; 2023-01-11T21:38:05.8475996Z auto tmp12 = tmp10 * tmp11; 2023-01-11T21:38:05.8476115Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:05.8476271Z auto tmp15 = at::vec::clamp_min(tmp14, decltype(tmp14)(0)); 2023-01-11T21:38:05.8476460Z tmp15.store(out_ptr2 + (8*i2) + (256*i1) + (2560*i0)); 2023-01-11T21:38:05.8476556Z } 2023-01-11T21:38:05.8476651Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8476765Z for(long i2=256; i2<256; i2+=1) 2023-01-11T21:38:05.8476848Z { 2023-01-11T21:38:05.8476975Z auto tmp0 = in_ptr2[i2 + (256*i1) + (2560*i0)]; 2023-01-11T21:38:05.8477090Z auto tmp1 = out_ptr0[i1]; 2023-01-11T21:38:05.8477202Z auto tmp3 = out_ptr1[i1]; 2023-01-11T21:38:05.8477317Z auto tmp11 = in_ptr3[i1]; 2023-01-11T21:38:05.8477405Z auto tmp13 = in_ptr4[i1]; 2023-01-11T21:38:05.8477560Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.8477751Z auto tmp4 = static_cast<float>(1e-05); 2023-01-11T21:38:05.8477863Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:05.8477983Z auto tmp6 = std::sqrt(tmp5); 2023-01-11T21:38:05.8478088Z auto tmp7 = 1 / tmp6; 2023-01-11T21:38:05.8478212Z auto tmp8 = static_cast<float>(1); 2023-01-11T21:38:05.8478301Z auto tmp9 = tmp7 * tmp8; 2023-01-11T21:38:05.8478450Z auto tmp10 = tmp2 * tmp9; 2023-01-11T21:38:05.8478604Z auto tmp12 = tmp10 * tmp11; 2023-01-11T21:38:05.8478715Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:05.8478847Z auto tmp15 = tmp14 * (tmp14>0); 2023-01-11T21:38:05.8478971Z out_ptr2[i2 + (256*i1) + (2560*i0)] = tmp15; 2023-01-11T21:38:05.8479056Z } 2023-01-11T21:38:05.8479116Z } 2023-01-11T21:38:05.8479196Z } 2023-01-11T21:38:05.8479300Z #pragma omp for 2023-01-11T21:38:05.8479410Z for(long i0=0; i0<7680; i0+=1) 2023-01-11T21:38:05.8479497Z { 2023-01-11T21:38:05.8479578Z { 2023-01-11T21:38:05.8479675Z { 2023-01-11T21:38:05.8479765Z auto tmp0 = out_ptr2[i0]; 2023-01-11T21:38:05.8479890Z auto tmp1 = static_cast<float>(0); 2023-01-11T21:38:05.8480000Z auto tmp2 = tmp0 <= tmp1; 2023-01-11T21:38:05.8480116Z out_ptr3[i0] = tmp2; 2023-01-11T21:38:05.8480200Z } 2023-01-11T21:38:05.8480322Z } 2023-01-11T21:38:05.8480404Z } 2023-01-11T21:38:05.8480463Z } 2023-01-11T21:38:05.8480541Z } 2023-01-11T21:38:05.8480656Z ''') 2023-01-11T21:38:05.8480662Z 2023-01-11T21:38:05.8480666Z 2023-01-11T21:38:05.8480775Z async_compile.wait(globals()) 2023-01-11T21:38:05.8480874Z del async_compile 2023-01-11T21:38:05.8480879Z 2023-01-11T21:38:05.8480972Z def call(args): 2023-01-11T21:38:05.8481131Z primals_1, primals_2, primals_3, primals_4, primals_5, primals_6 = args 2023-01-11T21:38:05.8481235Z args.clear() 2023-01-11T21:38:05.8481444Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8481650Z buf1 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8481884Z buf2 = empty_strided((3, 10, 16, 16), (2560, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8482138Z buf3 = empty_strided((3, 10, 16, 16), (2560, 256, 16, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8482482Z kernel_cpp_0(c_void_p(primals_3.data_ptr()), c_void_p(primals_4.data_ptr()), c_void_p(primals_6.data_ptr()), c_void_p(primals_1.data_ptr()), c_void_p(primals_2.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:05.8482580Z del primals_1 2023-01-11T21:38:05.8482671Z del primals_2 2023-01-11T21:38:05.8482739Z del primals_3 2023-01-11T21:38:05.8482869Z del primals_4 2023-01-11T21:38:05.8483009Z return (buf0, buf1, buf2, primals_6, buf0, buf1, buf3, ) 2023-01-11T21:38:05.8483021Z 2023-01-11T21:38:05.8483026Z 2023-01-11T21:38:05.8483120Z if __name__ == "__main__": 2023-01-11T21:38:05.8483261Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8483417Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8483640Z primals_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8483853Z primals_2 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8484042Z primals_3 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8484252Z primals_4 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8484458Z primals_5 = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8484702Z primals_6 = rand_strided((3, 10, 16, 16), (2560, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8484904Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4, primals_5, primals_6])) 2023-01-11T21:38:05.8484912Z 2023-01-11T21:38:05.8485003Z ok (4.235s) 2023-01-11T21:38:05.8485527Z test_bernoulli1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8485677Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8485953Z [2023-01-11 21:25:03,461] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 41 2023-01-11T21:38:05.8486270Z [2023-01-11 21:25:05,220] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 41 2023-01-11T21:38:05.8486276Z 2023-01-11T21:38:05.8486372Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8486463Z import torch 2023-01-11T21:38:05.8486560Z import random 2023-01-11T21:38:05.8486696Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8486835Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8486840Z 2023-01-11T21:38:05.8486959Z aten = torch.ops.aten 2023-01-11T21:38:05.8487111Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8487225Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8487230Z 2023-01-11T21:38:05.8487297Z import triton 2023-01-11T21:38:05.8487404Z import triton.language as tl 2023-01-11T21:38:05.8487560Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8487718Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8487723Z 2023-01-11T21:38:05.8487727Z 2023-01-11T21:38:05.8487882Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8488109Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8488291Z extern "C" void kernel(float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8488371Z { 2023-01-11T21:38:05.8488467Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8488596Z { 2023-01-11T21:38:05.8488692Z #pragma omp for 2023-01-11T21:38:05.8488796Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:05.8488879Z { 2023-01-11T21:38:05.8489036Z auto tmp0 = at::vec::Vectorized<float>(static_cast<float>(0)); 2023-01-11T21:38:05.8489147Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8489208Z } 2023-01-11T21:38:05.8489334Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8489445Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:05.8489529Z { 2023-01-11T21:38:05.8489646Z auto tmp0 = static_cast<float>(0); 2023-01-11T21:38:05.8489747Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.8489807Z } 2023-01-11T21:38:05.8489890Z } 2023-01-11T21:38:05.8489968Z } 2023-01-11T21:38:05.8490067Z ''') 2023-01-11T21:38:05.8490072Z 2023-01-11T21:38:05.8490076Z 2023-01-11T21:38:05.8490207Z async_compile.wait(globals()) 2023-01-11T21:38:05.8490300Z del async_compile 2023-01-11T21:38:05.8490305Z 2023-01-11T21:38:05.8490435Z def call(args):
2023-01-11T21:38:05.8490527Z arg0_1, = args 2023-01-11T21:38:05.8490595Z args.clear() 2023-01-11T21:38:05.8490803Z buf0 = empty_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8490926Z kernel_cpp_0(c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8491029Z aten.bernoulli_(buf0, ) 2023-01-11T21:38:05.8491132Z return (buf0, buf0, ) 2023-01-11T21:38:05.8491137Z 2023-01-11T21:38:05.8491141Z 2023-01-11T21:38:05.8491248Z if __name__ == "__main__": 2023-01-11T21:38:05.8491381Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8491502Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8491715Z arg0_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8491845Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8491850Z 2023-01-11T21:38:05.8491934Z ok (1.795s) 2023-01-11T21:38:05.8492434Z test_bernoulli2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8492593Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8492868Z [2023-01-11 21:25:05,245] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 42 2023-01-11T21:38:05.8493155Z [2023-01-11 21:25:05,246] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:05.8493477Z [2023-01-11 21:25:06,981] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 42 2023-01-11T21:38:05.8493483Z 2023-01-11T21:38:05.8493600Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8493670Z import torch 2023-01-11T21:38:05.8493765Z import random 2023-01-11T21:38:05.8493905Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8494054Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8494060Z 2023-01-11T21:38:05.8494158Z aten = torch.ops.aten 2023-01-11T21:38:05.8494310Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8494433Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8494439Z 2023-01-11T21:38:05.8494633Z import triton 2023-01-11T21:38:05.8494747Z import triton.language as tl 2023-01-11T21:38:05.8494894Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8495052Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8495295Z seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce 2023-01-11T21:38:05.8495301Z 2023-01-11T21:38:05.8495305Z 2023-01-11T21:38:05.8495459Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8495683Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8495822Z extern "C" void kernel(const long* __restrict__ seed0, 2023-01-11T21:38:05.8495926Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8496104Z bool* __restrict__ out_ptr0) 2023-01-11T21:38:05.8496183Z { 2023-01-11T21:38:05.8496305Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8496389Z { 2023-01-11T21:38:05.8496485Z #pragma omp for 2023-01-11T21:38:05.8496587Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8496647Z { 
2023-01-11T21:38:05.8496727Z { 2023-01-11T21:38:05.8496811Z { 2023-01-11T21:38:05.8496932Z auto tmp0 = seed0[0]; 2023-01-11T21:38:05.8497054Z auto tmp5 = in_ptr1[i0]; 2023-01-11T21:38:05.8497304Z auto tmp1 = static_cast<long>(65535); 2023-01-11T21:38:05.8497419Z auto tmp2 = tmp0 ^ tmp1; 2023-01-11T21:38:05.8497523Z auto tmp3 = static_cast<long>(i0); 2023-01-11T21:38:05.8497683Z auto tmp4 = static_cast<float>(normalized_rand_cpu(tmp2, tmp3)); 2023-01-11T21:38:05.8497798Z auto tmp6 = tmp4 < tmp5; 2023-01-11T21:38:05.8497902Z out_ptr0[i0] = tmp6; 2023-01-11T21:38:05.8498027Z } 2023-01-11T21:38:05.8498127Z } 2023-01-11T21:38:05.8498208Z } 2023-01-11T21:38:05.8498270Z } 2023-01-11T21:38:05.8498349Z } 2023-01-11T21:38:05.8498452Z ''') 2023-01-11T21:38:05.8498458Z 2023-01-11T21:38:05.8498462Z 2023-01-11T21:38:05.8498572Z async_compile.wait(globals()) 2023-01-11T21:38:05.8498662Z del async_compile 2023-01-11T21:38:05.8498672Z 2023-01-11T21:38:05.8498759Z def call(args): 2023-01-11T21:38:05.8498855Z arg0_1, = args 2023-01-11T21:38:05.8498926Z args.clear() 2023-01-11T21:38:05.8499095Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cpu_None) 2023-01-11T21:38:05.8499339Z buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8499539Z kernel_cpp_0(c_void_p(seed_cpu_None.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8499627Z del arg0_1 2023-01-11T21:38:05.8499718Z return (buf0, ) 2023-01-11T21:38:05.8499724Z 2023-01-11T21:38:05.8499728Z 2023-01-11T21:38:05.8499822Z if __name__ == "__main__": 2023-01-11T21:38:05.8500031Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8500152Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8500366Z seed_cpu_None = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8500590Z arg0_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8500721Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8500726Z 2023-01-11T21:38:05.8500810Z ok (1.761s) 2023-01-11T21:38:05.8501279Z test_bitwise2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8501434Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8501711Z [2023-01-11 21:25:07,003] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 43 2023-01-11T21:38:05.8501988Z [2023-01-11 21:25:08,748] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 43 2023-01-11T21:38:05.8502025Z 2023-01-11T21:38:05.8502139Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8502206Z import torch 2023-01-11T21:38:05.8502317Z import random 2023-01-11T21:38:05.8502452Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8502589Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8502597Z 2023-01-11T21:38:05.8502703Z aten = torch.ops.aten 2023-01-11T21:38:05.8502855Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8503006Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8503012Z 2023-01-11T21:38:05.8503079Z import triton 2023-01-11T21:38:05.8503186Z import triton.language as tl 2023-01-11T21:38:05.8503335Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8503502Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8503508Z 2023-01-11T21:38:05.8503512Z 2023-01-11T21:38:05.8503667Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8503901Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8504041Z extern "C" void kernel(const bool* __restrict__ in_ptr0, 2023-01-11T21:38:05.8504167Z const bool* __restrict__ in_ptr1, 2023-01-11T21:38:05.8504264Z bool* __restrict__ out_ptr0, 2023-01-11T21:38:05.8504387Z bool* __restrict__ out_ptr1, 2023-01-11T21:38:05.8504505Z bool* __restrict__ out_ptr2, 2023-01-11T21:38:05.8504618Z bool* __restrict__ out_ptr3) 2023-01-11T21:38:05.8504714Z { 2023-01-11T21:38:05.8504839Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8504919Z { 2023-01-11T21:38:05.8504995Z #pragma omp for 2023-01-11T21:38:05.8505098Z for(long i0=0; i0<40; i0+=1) 2023-01-11T21:38:05.8505221Z { 2023-01-11T21:38:05.8505307Z { 2023-01-11T21:38:05.8505393Z { 2023-01-11T21:38:05.8505510Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8505645Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:05.8505733Z auto tmp1 = tmp0 == 0; 2023-01-11T21:38:05.8505844Z auto tmp3 = tmp0 | tmp2; 2023-01-11T21:38:05.8505957Z auto tmp4 = tmp0 ^ tmp2; 2023-01-11T21:38:05.8506108Z auto tmp5 = tmp0 & tmp2; 2023-01-11T21:38:05.8506216Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.8506320Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.8506435Z out_ptr2[i0] = tmp4; 2023-01-11T21:38:05.8506517Z out_ptr3[i0] = tmp5; 2023-01-11T21:38:05.8506601Z } 2023-01-11T21:38:05.8506696Z } 2023-01-11T21:38:05.8506782Z } 2023-01-11T21:38:05.8506866Z } 2023-01-11T21:38:05.8506993Z } 2023-01-11T21:38:05.8507073Z ''') 2023-01-11T21:38:05.8507100Z 2023-01-11T21:38:05.8507105Z 2023-01-11T21:38:05.8507192Z async_compile.wait(globals()) 2023-01-11T21:38:05.8507299Z del async_compile 2023-01-11T21:38:05.8507304Z 2023-01-11T21:38:05.8507396Z def call(args): 2023-01-11T21:38:05.8507492Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8507596Z args.clear() 2023-01-11T21:38:05.8507809Z buf0 = empty_strided((2, 20), (20, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8508019Z buf1 = empty_strided((2, 20), (20, 1), device='cpu', dtype=torch.bool) 
2023-01-11T21:38:05.8508198Z buf2 = empty_strided((2, 20), (20, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8508400Z buf3 = empty_strided((2, 20), (20, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8508661Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:05.8508751Z del arg0_1 2023-01-11T21:38:05.8508838Z del arg1_1 2023-01-11T21:38:05.8508948Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:05.8508982Z 2023-01-11T21:38:05.8508987Z 2023-01-11T21:38:05.8509097Z if __name__ == "__main__": 2023-01-11T21:38:05.8509230Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8509351Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8509601Z arg0_1 = rand_strided((2, 20), (20, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8509817Z arg1_1 = rand_strided((2, 20), (20, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8509954Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8509959Z 2023-01-11T21:38:05.8510046Z ok (1.767s) 2023-01-11T21:38:05.8510516Z test_bitwise_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8510670Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8510962Z [2023-01-11 21:25:08,776] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 44 2023-01-11T21:38:05.8511246Z [2023-01-11 21:25:10,632] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 44 2023-01-11T21:38:05.8511252Z 2023-01-11T21:38:05.8511378Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8511448Z import torch 2023-01-11T21:38:05.8511539Z import random 2023-01-11T21:38:05.8511675Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8511819Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8511824Z 2023-01-11T21:38:05.8511923Z aten = torch.ops.aten 2023-01-11T21:38:05.8512076Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8512205Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8512214Z 2023-01-11T21:38:05.8512284Z import triton 2023-01-11T21:38:05.8512400Z import triton.language as tl 2023-01-11T21:38:05.8512586Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8512742Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8512747Z 2023-01-11T21:38:05.8512752Z 2023-01-11T21:38:05.8512939Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8513163Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8513300Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:05.8513429Z const int* __restrict__ in_ptr1, 2023-01-11T21:38:05.8513526Z int* __restrict__ out_ptr0, 2023-01-11T21:38:05.8513655Z int* __restrict__ out_ptr1, 2023-01-11T21:38:05.8513771Z int* __restrict__ out_ptr2, 2023-01-11T21:38:05.8513886Z int* __restrict__ out_ptr3) 2023-01-11T21:38:05.8513973Z { 2023-01-11T21:38:05.8514093Z #pragma omp 
parallel num_threads(8) 2023-01-11T21:38:05.8514176Z { 2023-01-11T21:38:05.8514253Z #pragma omp for 2023-01-11T21:38:05.8514362Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8514445Z { 2023-01-11T21:38:05.8514543Z { 2023-01-11T21:38:05.8514669Z { 2023-01-11T21:38:05.8514784Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8514899Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:05.8514983Z auto tmp1 = ~tmp0; 2023-01-11T21:38:05.8515097Z auto tmp3 = tmp0 | tmp2; 2023-01-11T21:38:05.8515216Z auto tmp4 = tmp0 ^ tmp2; 2023-01-11T21:38:05.8515327Z auto tmp5 = tmp0 & tmp2; 2023-01-11T21:38:05.8515433Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.8515550Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.8515645Z out_ptr2[i0] = tmp4; 2023-01-11T21:38:05.8535206Z out_ptr3[i0] = tmp5; 2023-01-11T21:38:05.8535308Z } 2023-01-11T21:38:05.8535384Z } 2023-01-11T21:38:05.8535449Z } 2023-01-11T21:38:05.8535522Z } 2023-01-11T21:38:05.8535590Z } 2023-01-11T21:38:05.8535705Z ''') 2023-01-11T21:38:05.8535711Z 2023-01-11T21:38:05.8535720Z 2023-01-11T21:38:05.8535833Z async_compile.wait(globals()) 2023-01-11T21:38:05.8535916Z del async_compile 2023-01-11T21:38:05.8535922Z 2023-01-11T21:38:05.8536001Z def call(args): 2023-01-11T21:38:05.8536078Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8536159Z args.clear() 2023-01-11T21:38:05.8536388Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.8536583Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.8536776Z buf2 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.8536965Z buf3 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.8537272Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:05.8537362Z del arg0_1 2023-01-11T21:38:05.8537445Z del arg1_1 2023-01-11T21:38:05.8537543Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:05.8537549Z 2023-01-11T21:38:05.8537553Z 2023-01-11T21:38:05.8537637Z if __name__ == "__main__": 2023-01-11T21:38:05.8537757Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8537885Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8538079Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.8538270Z arg1_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.8538391Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8538397Z 2023-01-11T21:38:05.8538464Z ok (1.884s) 2023-01-11T21:38:05.8539010Z test_bmm1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8539147Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8539413Z [2023-01-11 21:25:10,657] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 45 2023-01-11T21:38:05.8539675Z [2023-01-11 21:25:12,563] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 45 2023-01-11T21:38:05.8540090Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8540224Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8540479Z [2023-01-11 21:25:12,586] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 46 2023-01-11T21:38:05.8540485Z 2023-01-11T21:38:05.8540582Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8540657Z import torch 2023-01-11T21:38:05.8540734Z import random 2023-01-11T21:38:05.8540847Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8540969Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8540974Z 2023-01-11T21:38:05.8541056Z aten = torch.ops.aten 2023-01-11T21:38:05.8541197Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8541293Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8541341Z 2023-01-11T21:38:05.8541419Z import triton 2023-01-11T21:38:05.8541515Z import triton.language as tl 2023-01-11T21:38:05.8541633Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8541773Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8541778Z 2023-01-11T21:38:05.8541783Z 2023-01-11T21:38:05.8541925Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8542132Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8542259Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8542368Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8542472Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8542575Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8542634Z { 2023-01-11T21:38:05.8542737Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8542805Z { 2023-01-11T21:38:05.8542891Z #pragma omp for 2023-01-11T21:38:05.8542980Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.8543048Z { 2023-01-11T21:38:05.8543199Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8543331Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.8543428Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8543524Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8543593Z } 2023-01-11T21:38:05.8543695Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8543785Z for(long i0=128; i0<128; i0+=1) 2023-01-11T21:38:05.8543852Z { 2023-01-11T21:38:05.8543934Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8544038Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.8544127Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8544215Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8544283Z } 2023-01-11T21:38:05.8544371Z #pragma omp for 2023-01-11T21:38:05.8544457Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.8544516Z { 2023-01-11T21:38:05.8544650Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.8544786Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.8544914Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8545010Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8545078Z } 2023-01-11T21:38:05.8545180Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8545261Z for(long i0=128; i0<128; i0+=1) 2023-01-11T21:38:05.8545329Z { 2023-01-11T21:38:05.8545420Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:05.8545524Z auto tmp1 = static_cast<float>(2); 2023-01-11T21:38:05.8545614Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8545701Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.8545770Z } 2023-01-11T21:38:05.8545831Z } 2023-01-11T21:38:05.8545897Z } 2023-01-11T21:38:05.8545984Z ''') 2023-01-11T21:38:05.8545990Z 2023-01-11T21:38:05.8545994Z 2023-01-11T21:38:05.8546131Z kernel_cpp_1 = async_compile.cpp(''' 2023-01-11T21:38:05.8546336Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8546462Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:05.8546528Z { 2023-01-11T21:38:05.8546622Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8546690Z { 2023-01-11T21:38:05.8546771Z #pragma omp for 2023-01-11T21:38:05.8546858Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.8546927Z { 2023-01-11T21:38:05.8547072Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.8547206Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3)); 2023-01-11T21:38:05.8547296Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8547423Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.8547493Z } 2023-01-11T21:38:05.8547595Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8547685Z for(long i0=128; i0<128; i0+=1) 2023-01-11T21:38:05.8547754Z { 2023-01-11T21:38:05.8547851Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.8547953Z auto tmp1 = static_cast<float>(3); 2023-01-11T21:38:05.8548045Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8548136Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8548206Z } 2023-01-11T21:38:05.8548275Z } 2023-01-11T21:38:05.8548341Z } 2023-01-11T21:38:05.8548428Z ''') 2023-01-11T21:38:05.8548434Z 2023-01-11T21:38:05.8548438Z 2023-01-11T21:38:05.8548535Z async_compile.wait(globals()) 2023-01-11T21:38:05.8548609Z del async_compile 2023-01-11T21:38:05.8548614Z 2023-01-11T21:38:05.8548692Z def call(args): 2023-01-11T21:38:05.8548774Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8548853Z args.clear() 2023-01-11T21:38:05.8549063Z buf0 = empty_strided((2, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8549171Z aten.bmm.out(arg0_1, arg1_1, out=buf0) 2023-01-11T21:38:05.8549377Z buf1 = empty_strided((2, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8549571Z buf2 = empty_strided((2, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8549771Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.8549848Z del arg0_1 2023-01-11T21:38:05.8549926Z del arg1_1 2023-01-11T21:38:05.8550128Z buf3 = empty_strided((2, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8550226Z aten.bmm.out(buf1, buf2, out=buf3) 2023-01-11T21:38:05.8550298Z del buf1 2023-01-11T21:38:05.8550363Z del buf2 2023-01-11T21:38:05.8550457Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:05.8550571Z kernel_cpp_1(c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8550659Z return (buf0, buf4, ) 2023-01-11T21:38:05.8550664Z 2023-01-11T21:38:05.8550668Z 2023-01-11T21:38:05.8550753Z if __name__ == "__main__": 2023-01-11T21:38:05.8550873Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8551003Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8551255Z arg0_1 = rand_strided((2, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8551446Z arg1_1 = rand_strided((2, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8551568Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8551574Z 2023-01-11T21:38:05.8551578Z 2023-01-11T21:38:05.8551676Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8551750Z import torch 2023-01-11T21:38:05.8551829Z import random 2023-01-11T21:38:05.8551949Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8552073Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8552081Z 2023-01-11T21:38:05.8552164Z aten = torch.ops.aten 2023-01-11T21:38:05.8552293Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8552389Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8552394Z 2023-01-11T21:38:05.8552469Z import triton 2023-01-11T21:38:05.8552564Z import triton.language as tl 2023-01-11T21:38:05.8552691Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8552830Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8552836Z 2023-01-11T21:38:05.8552840Z 2023-01-11T21:38:05.8552974Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8553175Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8553292Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8553401Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8553505Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8553638Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8553706Z { 2023-01-11T21:38:05.8553809Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8553879Z { 2023-01-11T21:38:05.8553954Z #pragma omp for 2023-01-11T21:38:05.8554044Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.8554112Z { 2023-01-11T21:38:05.8554251Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8554388Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.8554479Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8554576Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8554635Z } 2023-01-11T21:38:05.8554735Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8554823Z for(long i0=128; i0<128; i0+=1) 2023-01-11T21:38:05.8554890Z { 2023-01-11T21:38:05.8554978Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8555087Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.8555177Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8555255Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8555324Z } 2023-01-11T21:38:05.8555405Z #pragma omp for 2023-01-11T21:38:05.8555495Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:05.8555562Z { 2023-01-11T21:38:05.8555699Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.8555836Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.8555918Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8556015Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8556084Z } 2023-01-11T21:38:05.8556182Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8556274Z for(long i0=80; i0<80; i0+=1) 2023-01-11T21:38:05.8556345Z { 2023-01-11T21:38:05.8556434Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:05.8556534Z auto tmp1 = static_cast<float>(2); 2023-01-11T21:38:05.8556622Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8556708Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.8556775Z } 2023-01-11T21:38:05.8556842Z } 2023-01-11T21:38:05.8556908Z } 2023-01-11T21:38:05.8557025Z ''') 2023-01-11T21:38:05.8557031Z 2023-01-11T21:38:05.8557035Z 2023-01-11T21:38:05.8557162Z kernel_cpp_1 = async_compile.cpp(''' 2023-01-11T21:38:05.8557370Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8557489Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:05.8557554Z { 2023-01-11T21:38:05.8557658Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8557726Z { 2023-01-11T21:38:05.8557807Z #pragma omp for 2023-01-11T21:38:05.8557886Z for(long i0=0; i0<20; i0+=1) 2023-01-11T21:38:05.8557954Z { 2023-01-11T21:38:05.8558102Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.8558240Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3)); 2023-01-11T21:38:05.8558329Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8558432Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.8558503Z } 2023-01-11T21:38:05.8558595Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8558683Z for(long i0=160; i0<160; i0+=1) 2023-01-11T21:38:05.8558749Z { 2023-01-11T21:38:05.8558843Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.8558948Z auto tmp1 = static_cast<float>(3); 2023-01-11T21:38:05.8559036Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8559123Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8559183Z } 2023-01-11T21:38:05.8559250Z } 2023-01-11T21:38:05.8559315Z } 2023-01-11T21:38:05.8559405Z ''') 2023-01-11T21:38:05.8559410Z 2023-01-11T21:38:05.8559444Z 2023-01-11T21:38:05.8559539Z async_compile.wait(globals()) 2023-01-11T21:38:05.8559617Z del async_compile 2023-01-11T21:38:05.8559623Z 2023-01-11T21:38:05.8559698Z def call(args): 2023-01-11T21:38:05.8559770Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8559846Z args.clear() 2023-01-11T21:38:05.8560055Z buf0 = empty_strided((1, 16, 10), (160, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8560231Z aten.mm.out(as_strided(arg0_1, (16, 8), (8, 1)), as_strided(arg1_1, (8, 10), (10, 1)), out=as_strided(buf0, (16, 10), (10, 1))) 2023-01-11T21:38:05.8560438Z buf1 = empty_strided((1, 16, 8), (128, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8560638Z buf2 = empty_strided((1, 8, 10), (80, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8560834Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.8560906Z del arg0_1 2023-01-11T21:38:05.8560971Z del arg1_1 2023-01-11T21:38:05.8561181Z buf3 = empty_strided((1, 16, 10), (160, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8561350Z aten.mm.out(as_strided(buf1, (16, 8), (8, 1)), as_strided(buf2, (8, 10), (10, 1)), out=as_strided(buf3, (16, 10), (10, 1))) 2023-01-11T21:38:05.8561422Z del buf1 2023-01-11T21:38:05.8561493Z del buf2 2023-01-11T21:38:05.8561585Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:05.8561692Z kernel_cpp_1(c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8561766Z return (buf0, buf4, ) 2023-01-11T21:38:05.8561779Z 2023-01-11T21:38:05.8561783Z 2023-01-11T21:38:05.8561857Z if __name__ == "__main__": 
2023-01-11T21:38:05.8561978Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8562110Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8562315Z arg0_1 = rand_strided((1, 16, 8), (128, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8562516Z arg1_1 = rand_strided((1, 8, 10), (80, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8562638Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8562643Z 2023-01-11T21:38:05.8562908Z [2023-01-11 21:25:14,448] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 46 2023-01-11T21:38:05.8562984Z ok (3.816s) 2023-01-11T21:38:05.8563460Z test_bmm2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8563587Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8563841Z [2023-01-11 21:25:14,467] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 47 2023-01-11T21:38:05.8564107Z [2023-01-11 21:25:14,470] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 47 2023-01-11T21:38:05.8564112Z 2023-01-11T21:38:05.8564215Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8564289Z import torch 2023-01-11T21:38:05.8564363Z import random 2023-01-11T21:38:05.8564484Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8564609Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8564615Z 2023-01-11T21:38:05.8564690Z aten = torch.ops.aten 2023-01-11T21:38:05.8564827Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8564925Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8564930Z 2023-01-11T21:38:05.8565004Z import triton 2023-01-11T21:38:05.8565096Z import triton.language as tl 2023-01-11T21:38:05.8565225Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8565367Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8565401Z 2023-01-11T21:38:05.8565406Z 2023-01-11T21:38:05.8565502Z async_compile.wait(globals()) 2023-01-11T21:38:05.8565572Z del async_compile 2023-01-11T21:38:05.8565577Z 2023-01-11T21:38:05.8565652Z def call(args): 2023-01-11T21:38:05.8565733Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8565808Z args.clear() 2023-01-11T21:38:05.8566017Z buf0 = empty_strided((1, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8566184Z aten.mm.out(as_strided(arg0_1, (8, 8), (1, 8)), as_strided(arg1_1, (8, 8), (8, 1)), out=as_strided(buf0, (8, 8), (8, 1))) 2023-01-11T21:38:05.8566258Z del arg0_1 2023-01-11T21:38:05.8566322Z del arg1_1 2023-01-11T21:38:05.8566398Z return (buf0, ) 2023-01-11T21:38:05.8566403Z 2023-01-11T21:38:05.8566408Z 2023-01-11T21:38:05.8566490Z if __name__ == "__main__": 2023-01-11T21:38:05.8566607Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8566736Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8566944Z arg0_1 = rand_strided((1, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8567142Z arg1_1 = rand_strided((1, 8, 8), (64, 8, 1), device='cpu', 
dtype=torch.float32) 2023-01-11T21:38:05.8567262Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8567267Z 2023-01-11T21:38:05.8567340Z ok (0.021s) 2023-01-11T21:38:05.8567782Z test_bool_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8567917Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8568172Z [2023-01-11 21:25:14,504] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 48 2023-01-11T21:38:05.8568433Z [2023-01-11 21:25:16,270] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 48 2023-01-11T21:38:05.8568439Z 2023-01-11T21:38:05.8568537Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8568611Z import torch 2023-01-11T21:38:05.8568686Z import random 2023-01-11T21:38:05.8568834Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8568961Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8568966Z 2023-01-11T21:38:05.8569042Z aten = torch.ops.aten 2023-01-11T21:38:05.8569177Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8569275Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8569280Z 2023-01-11T21:38:05.8569354Z import triton 2023-01-11T21:38:05.8569446Z import triton.language as tl 2023-01-11T21:38:05.8569569Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8569711Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8569719Z 2023-01-11T21:38:05.8569724Z 2023-01-11T21:38:05.8569864Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8570063Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8570190Z extern "C" void kernel(const bool* __restrict__ in_ptr0, 2023-01-11T21:38:05.8570303Z const bool* __restrict__ in_ptr1, 2023-01-11T21:38:05.8570408Z bool* __restrict__ out_ptr0, 2023-01-11T21:38:05.8570508Z bool* __restrict__ out_ptr1, 2023-01-11T21:38:05.8570607Z bool* __restrict__ out_ptr2, 2023-01-11T21:38:05.8570704Z bool* __restrict__ out_ptr3, 2023-01-11T21:38:05.8570793Z bool* __restrict__ out_ptr4, 2023-01-11T21:38:05.8570890Z bool* __restrict__ out_ptr5, 2023-01-11T21:38:05.8570987Z bool* __restrict__ out_ptr6, 2023-01-11T21:38:05.8571117Z bool* __restrict__ out_ptr7, 2023-01-11T21:38:05.8571219Z bool* __restrict__ out_ptr8) 2023-01-11T21:38:05.8571286Z { 2023-01-11T21:38:05.8571369Z #pragma GCC ivdep 2023-01-11T21:38:05.8571448Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.8571515Z { 2023-01-11T21:38:05.8571586Z { 2023-01-11T21:38:05.8571659Z { 2023-01-11T21:38:05.8571755Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8571848Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.8571941Z auto tmp2 = tmp0 || tmp1; 2023-01-11T21:38:05.8572028Z auto tmp3 = tmp0 && tmp1; 2023-01-11T21:38:05.8572118Z auto tmp4 = tmp0 & tmp1; 2023-01-11T21:38:05.8572210Z auto tmp5 = tmp0 | tmp1; 2023-01-11T21:38:05.8572302Z auto tmp6 = tmp0 ^ tmp1; 2023-01-11T21:38:05.8572393Z auto tmp7 = tmp0 == 0; 2023-01-11T21:38:05.8572483Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8572572Z out_ptr1[i0] = tmp3; 
2023-01-11T21:38:05.8572650Z out_ptr2[i0] = tmp4; 2023-01-11T21:38:05.8572734Z out_ptr3[i0] = tmp5; 2023-01-11T21:38:05.8572821Z out_ptr4[i0] = tmp6; 2023-01-11T21:38:05.8572905Z out_ptr5[i0] = tmp3; 2023-01-11T21:38:05.8572993Z out_ptr6[i0] = tmp2; 2023-01-11T21:38:05.8573077Z out_ptr7[i0] = tmp7; 2023-01-11T21:38:05.8573160Z out_ptr8[i0] = tmp1; 2023-01-11T21:38:05.8573221Z } 2023-01-11T21:38:05.8573288Z } 2023-01-11T21:38:05.8573354Z } 2023-01-11T21:38:05.8573420Z } 2023-01-11T21:38:05.8573509Z ''') 2023-01-11T21:38:05.8573514Z 2023-01-11T21:38:05.8573519Z 2023-01-11T21:38:05.8573620Z async_compile.wait(globals()) 2023-01-11T21:38:05.8573700Z del async_compile 2023-01-11T21:38:05.8573705Z 2023-01-11T21:38:05.8573772Z def call(args): 2023-01-11T21:38:05.8573853Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8573932Z args.clear() 2023-01-11T21:38:05.8574121Z buf0 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8574306Z buf1 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8574620Z buf2 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8574897Z buf3 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8575074Z buf4 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8575255Z buf5 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8575436Z buf6 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8575614Z buf7 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8575792Z buf8 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8576140Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf6.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(buf8.data_ptr())) 2023-01-11T21:38:05.8576220Z del arg0_1 2023-01-11T21:38:05.8576293Z del arg1_1 2023-01-11T21:38:05.8576420Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, ) 2023-01-11T21:38:05.8576426Z 2023-01-11T21:38:05.8576430Z 2023-01-11T21:38:05.8576504Z if __name__ == "__main__": 2023-01-11T21:38:05.8576625Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8576752Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8576939Z arg0_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8577176Z arg1_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8577307Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8577350Z 2023-01-11T21:38:05.8577423Z ok (1.800s) 2023-01-11T21:38:05.8577748Z test_both_scalars_cpu (__main__.CpuTests) ... 
[2023-01-11 21:25:16,327] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 49 2023-01-11T21:38:05.8578008Z [2023-01-11 21:25:18,108] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 49 2023-01-11T21:38:05.8578020Z 2023-01-11T21:38:05.8578111Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8578187Z import torch 2023-01-11T21:38:05.8578261Z import random 2023-01-11T21:38:05.8578380Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8578505Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8578510Z 2023-01-11T21:38:05.8578592Z aten = torch.ops.aten 2023-01-11T21:38:05.8578727Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8578815Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8578822Z 2023-01-11T21:38:05.8578896Z import triton 2023-01-11T21:38:05.8578993Z import triton.language as tl 2023-01-11T21:38:05.8579118Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8579258Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8579263Z 2023-01-11T21:38:05.8579267Z 2023-01-11T21:38:05.8579405Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8579615Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8579732Z extern "C" void kernel(float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8579826Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8579927Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.8580027Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8580126Z float* __restrict__ out_ptr4, 2023-01-11T21:38:05.8580225Z float* __restrict__ out_ptr5) 2023-01-11T21:38:05.8580291Z { 2023-01-11T21:38:05.8580360Z { 2023-01-11T21:38:05.8580420Z { 2023-01-11T21:38:05.8580525Z auto tmp0 = static_cast(4); 2023-01-11T21:38:05.8580634Z auto tmp1 = static_cast(3.3); 2023-01-11T21:38:05.8580723Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8580808Z out_ptr0[0] = tmp2; 2023-01-11T21:38:05.8580907Z } 2023-01-11T21:38:05.8580978Z } 2023-01-11T21:38:05.8581036Z { 2023-01-11T21:38:05.8581102Z { 2023-01-11T21:38:05.8581210Z auto tmp0 = static_cast(3.3); 2023-01-11T21:38:05.8581316Z auto tmp1 = static_cast(4); 2023-01-11T21:38:05.8581405Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8581488Z out_ptr1[0] = tmp2; 2023-01-11T21:38:05.8581548Z } 2023-01-11T21:38:05.8581614Z } 2023-01-11T21:38:05.8581680Z { 2023-01-11T21:38:05.8581749Z { 2023-01-11T21:38:05.8581855Z auto tmp0 = static_cast(4); 2023-01-11T21:38:05.8581965Z auto tmp1 = static_cast(3.3); 2023-01-11T21:38:05.8582095Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.8582171Z out_ptr2[0] = tmp2; 2023-01-11T21:38:05.8582238Z } 2023-01-11T21:38:05.8582304Z } 2023-01-11T21:38:05.8582371Z { 2023-01-11T21:38:05.8582439Z { 2023-01-11T21:38:05.8582550Z auto tmp0 = static_cast(3.3); 2023-01-11T21:38:05.8582654Z auto tmp1 = static_cast(4); 2023-01-11T21:38:05.8582774Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.8582858Z out_ptr3[0] = tmp2; 2023-01-11T21:38:05.8582923Z } 2023-01-11T21:38:05.8582989Z } 2023-01-11T21:38:05.8583060Z { 2023-01-11T21:38:05.8583129Z { 2023-01-11T21:38:05.8583224Z auto tmp0 = static_cast(4); 2023-01-11T21:38:05.8583331Z auto tmp1 = static_cast(3.3); 2023-01-11T21:38:05.8583422Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.8583504Z out_ptr4[0] = tmp2; 2023-01-11T21:38:05.8583608Z } 2023-01-11T21:38:05.8583678Z } 2023-01-11T21:38:05.8583747Z { 
2023-01-11T21:38:05.8583806Z { 2023-01-11T21:38:05.8583910Z auto tmp0 = static_cast(3.3); 2023-01-11T21:38:05.8584012Z auto tmp1 = static_cast(4); 2023-01-11T21:38:05.8584104Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.8584190Z out_ptr5[0] = tmp2; 2023-01-11T21:38:05.8584256Z } 2023-01-11T21:38:05.8584322Z } 2023-01-11T21:38:05.8584381Z } 2023-01-11T21:38:05.8584466Z ''') 2023-01-11T21:38:05.8584472Z 2023-01-11T21:38:05.8584476Z 2023-01-11T21:38:05.8584572Z async_compile.wait(globals()) 2023-01-11T21:38:05.8584649Z del async_compile 2023-01-11T21:38:05.8584654Z 2023-01-11T21:38:05.8584729Z def call(args): 2023-01-11T21:38:05.8584914Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8585096Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8585271Z buf2 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8585448Z buf3 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8585622Z buf4 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8585799Z buf5 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8586033Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr())) 2023-01-11T21:38:05.8586144Z return (buf0, buf1, buf2, buf3, buf4, buf5, ) 2023-01-11T21:38:05.8586150Z 2023-01-11T21:38:05.8586155Z 2023-01-11T21:38:05.8586236Z if __name__ == "__main__": 2023-01-11T21:38:05.8586355Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8586483Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8586578Z print_performance(lambda: call([])) 2023-01-11T21:38:05.8586585Z 2023-01-11T21:38:05.8586659Z ok (1.839s) 2023-01-11T21:38:05.8587139Z test_cat_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8587275Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8587534Z [2023-01-11 21:25:18,159] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 50 2023-01-11T21:38:05.8587794Z [2023-01-11 21:25:20,058] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 50 2023-01-11T21:38:05.8588205Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8588340Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8588598Z [2023-01-11 21:25:20,126] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 51 2023-01-11T21:38:05.8588603Z 2023-01-11T21:38:05.8588701Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8588777Z import torch 2023-01-11T21:38:05.8588845Z import random 2023-01-11T21:38:05.8588964Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8589088Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8589093Z 2023-01-11T21:38:05.8589176Z aten = torch.ops.aten 2023-01-11T21:38:05.8589311Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8589409Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8589446Z 2023-01-11T21:38:05.8589522Z import triton 2023-01-11T21:38:05.8589607Z import triton.language as tl 2023-01-11T21:38:05.8589734Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8589873Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8589878Z 2023-01-11T21:38:05.8589885Z 2023-01-11T21:38:05.8590023Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8590230Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8590356Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8590463Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8590563Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8590655Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.8590757Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8590858Z float* __restrict__ out_ptr4, 2023-01-11T21:38:05.8590962Z double* __restrict__ out_ptr5, 2023-01-11T21:38:05.8591063Z double* __restrict__ out_ptr6) 2023-01-11T21:38:05.8591130Z { 2023-01-11T21:38:05.8591233Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8591292Z { 2023-01-11T21:38:05.8591384Z #pragma omp for 2023-01-11T21:38:05.8591471Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8591537Z { 2023-01-11T21:38:05.8591628Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.8591697Z { 2023-01-11T21:38:05.8591851Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (16*i0)); 2023-01-11T21:38:05.8591986Z auto tmp1 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.8592081Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8592193Z tmp0.store(out_ptr0 + (8*i1) + (36*i0)); 2023-01-11T21:38:05.8592303Z tmp2.store(out_ptr1 + (8*i1) + (36*i0)); 2023-01-11T21:38:05.8592371Z } 2023-01-11T21:38:05.8592471Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8592563Z for(long i1=16; i1<16; i1+=1) 2023-01-11T21:38:05.8592623Z { 2023-01-11T21:38:05.8592767Z auto tmp0 = in_ptr0[i1 + (16*i0)]; 2023-01-11T21:38:05.8592876Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.8592970Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8593067Z out_ptr0[i1 + (36*i0)] = tmp0; 2023-01-11T21:38:05.8593163Z out_ptr1[i1 + (36*i0)] = tmp2; 2023-01-11T21:38:05.8593231Z } 2023-01-11T21:38:05.8593291Z } 2023-01-11T21:38:05.8593373Z #pragma omp for 2023-01-11T21:38:05.8593461Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8593529Z { 2023-01-11T21:38:05.8593614Z #pragma GCC ivdep 2023-01-11T21:38:05.8593701Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.8593773Z { 2023-01-11T21:38:05.8593835Z { 2023-01-11T21:38:05.8593909Z { 
2023-01-11T21:38:05.8594021Z auto tmp0 = in_ptr0[i1 + (16*i0)]; 2023-01-11T21:38:05.8594136Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.8594241Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8594345Z out_ptr2[i1 + (36*i0)] = tmp2; 2023-01-11T21:38:05.8594416Z } 2023-01-11T21:38:05.8594477Z } 2023-01-11T21:38:05.8594546Z } 2023-01-11T21:38:05.8594613Z } 2023-01-11T21:38:05.8594697Z #pragma omp for 2023-01-11T21:38:05.8594785Z for(long i0=0; i0<128; i0+=1) 2023-01-11T21:38:05.8594851Z { 2023-01-11T21:38:05.8594919Z { 2023-01-11T21:38:05.8594980Z { 2023-01-11T21:38:05.8595078Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8595215Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.8595313Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.8595427Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:05.8595518Z out_ptr3[i0] = tmp2; 2023-01-11T21:38:05.8595608Z out_ptr4[i0] = tmp2; 2023-01-11T21:38:05.8595694Z out_ptr5[i0] = tmp3; 2023-01-11T21:38:05.8595783Z out_ptr6[i0] = tmp3; 2023-01-11T21:38:05.8595852Z } 2023-01-11T21:38:05.8595922Z } 2023-01-11T21:38:05.8595990Z } 2023-01-11T21:38:05.8596056Z } 2023-01-11T21:38:05.8596113Z } 2023-01-11T21:38:05.8596202Z ''') 2023-01-11T21:38:05.8596208Z 2023-01-11T21:38:05.8596212Z 2023-01-11T21:38:05.8596305Z async_compile.wait(globals()) 2023-01-11T21:38:05.8596381Z del async_compile 2023-01-11T21:38:05.8596386Z 2023-01-11T21:38:05.8596462Z def call(args): 2023-01-11T21:38:05.8596534Z arg0_1, = args 2023-01-11T21:38:05.8596614Z args.clear() 2023-01-11T21:38:05.8596818Z buf3 = empty_strided((8, 36), (36, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8596918Z buf0 = as_strided(buf3, (8, 16), (36, 1)) # alias 2023-01-11T21:38:05.8597027Z buf2 = as_strided(buf3, (8, 16), (36, 1), 20) # alias 2023-01-11T21:38:05.8597138Z buf1 = as_strided(buf3, (8, 4), (36, 1), 16) # alias 2023-01-11T21:38:05.8597337Z buf6 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8597442Z buf4 = as_strided(buf6, (8, 16), (16, 1)) # alias 2023-01-11T21:38:05.8597551Z buf5 = as_strided(buf6, (8, 16), (16, 1), 128) # alias 2023-01-11T21:38:05.8597751Z buf9 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.8597848Z buf7 = as_strided(buf9, (8, 16), (16, 1)) # alias 2023-01-11T21:38:05.8597956Z buf8 = as_strided(buf9, (8, 16), (16, 1), 128) # alias 2023-01-11T21:38:05.8598235Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(buf8.data_ptr())) 2023-01-11T21:38:05.8598312Z del arg0_1 2023-01-11T21:38:05.8598398Z return (buf3, buf6, buf9, ) 2023-01-11T21:38:05.8598404Z 2023-01-11T21:38:05.8598435Z 2023-01-11T21:38:05.8598518Z if __name__ == "__main__": 2023-01-11T21:38:05.8598636Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8598766Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8598965Z arg0_1 = rand_strided((8, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8599071Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8599076Z 2023-01-11T21:38:05.8599341Z [2023-01-11 21:25:22,075] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 51 2023-01-11T21:38:05.8599347Z 2023-01-11T21:38:05.8599445Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8599523Z import torch 2023-01-11T21:38:05.8599599Z import random 
2023-01-11T21:38:05.8599718Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8599843Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8599848Z 2023-01-11T21:38:05.8599933Z aten = torch.ops.aten 2023-01-11T21:38:05.8600066Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8600161Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8600166Z 2023-01-11T21:38:05.8600241Z import triton 2023-01-11T21:38:05.8600334Z import triton.language as tl 2023-01-11T21:38:05.8600460Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8600599Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8600605Z 2023-01-11T21:38:05.8600610Z 2023-01-11T21:38:05.8600746Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8600953Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8601102Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8601214Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8601324Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.8601439Z const double* __restrict__ in_ptr3, 2023-01-11T21:38:05.8601546Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8601648Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8601748Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.8601840Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8601940Z float* __restrict__ out_ptr4, 2023-01-11T21:38:05.8602038Z float* __restrict__ out_ptr5, 2023-01-11T21:38:05.8602142Z double* __restrict__ out_ptr6, 2023-01-11T21:38:05.8602246Z double* __restrict__ out_ptr7, 2023-01-11T21:38:05.8602350Z float* __restrict__ out_ptr8, 2023-01-11T21:38:05.8602452Z double* __restrict__ out_ptr9) 2023-01-11T21:38:05.8602510Z { 2023-01-11T21:38:05.8602614Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8602681Z { 2023-01-11T21:38:05.8602780Z #pragma omp for collapse(3) 2023-01-11T21:38:05.8602868Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:05.8602934Z { 2023-01-11T21:38:05.8603023Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:05.8603084Z { 2023-01-11T21:38:05.8603181Z for(long i2=0; i2<16; i2+=1) 2023-01-11T21:38:05.8603250Z { 2023-01-11T21:38:05.8603321Z { 2023-01-11T21:38:05.8603396Z { 2023-01-11T21:38:05.8603516Z auto tmp0 = in_ptr0[i0 + (3*i2) + (48*i1)]; 2023-01-11T21:38:05.8603633Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.8603730Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8603848Z auto tmp3 = static_cast(2); 2023-01-11T21:38:05.8603951Z auto tmp4 = tmp0 + tmp3; 2023-01-11T21:38:05.8604064Z out_ptr0[i2 + (48*i1) + (144*i0)] = tmp0; 2023-01-11T21:38:05.8604201Z out_ptr1[i2 + (48*i1) + (144*i0)] = tmp2; 2023-01-11T21:38:05.8604309Z out_ptr2[i2 + (48*i1) + (144*i0)] = tmp4; 2023-01-11T21:38:05.8604383Z } 2023-01-11T21:38:05.8604454Z } 2023-01-11T21:38:05.8604516Z } 2023-01-11T21:38:05.8604584Z } 2023-01-11T21:38:05.8604652Z } 2023-01-11T21:38:05.8604749Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8604836Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:05.8604907Z { 2023-01-11T21:38:05.8605012Z for(long i1=0; i1<144; i1+=1) 2023-01-11T21:38:05.8605087Z { 2023-01-11T21:38:05.8605173Z { 2023-01-11T21:38:05.8605250Z { 2023-01-11T21:38:05.8605359Z auto tmp0 = in_ptr1[i1 + (144*i0)]; 2023-01-11T21:38:05.8605461Z out_ptr3[i0 + (3*i1)] = tmp0; 2023-01-11T21:38:05.8605535Z } 2023-01-11T21:38:05.8605596Z } 2023-01-11T21:38:05.8605668Z } 2023-01-11T21:38:05.8605734Z } 
2023-01-11T21:38:05.8605833Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8605919Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:05.8605988Z { 2023-01-11T21:38:05.8606078Z for(long i1=0; i1<48; i1+=1) 2023-01-11T21:38:05.8606138Z { 2023-01-11T21:38:05.8606208Z { 2023-01-11T21:38:05.8606279Z { 2023-01-11T21:38:05.8606390Z auto tmp0 = in_ptr0[i0 + (3*i1)]; 2023-01-11T21:38:05.8606504Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.8606635Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.8606739Z out_ptr4[i1 + (48*i0)] = tmp2; 2023-01-11T21:38:05.8606803Z } 2023-01-11T21:38:05.8606872Z } 2023-01-11T21:38:05.8606942Z } 2023-01-11T21:38:05.8607018Z } 2023-01-11T21:38:05.8607100Z #pragma omp for 2023-01-11T21:38:05.8607190Z for(long i0=0; i0<144; i0+=1) 2023-01-11T21:38:05.8607250Z { 2023-01-11T21:38:05.8607316Z { 2023-01-11T21:38:05.8607386Z { 2023-01-11T21:38:05.8607483Z auto tmp0 = out_ptr4[i0]; 2023-01-11T21:38:05.8607596Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:05.8607686Z out_ptr5[i0] = tmp0; 2023-01-11T21:38:05.8607776Z out_ptr6[i0] = tmp1; 2023-01-11T21:38:05.8607857Z out_ptr7[i0] = tmp1; 2023-01-11T21:38:05.8607929Z } 2023-01-11T21:38:05.8607996Z } 2023-01-11T21:38:05.8608067Z } 2023-01-11T21:38:05.8608161Z #pragma omp for collapse(3) 2023-01-11T21:38:05.8608248Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.8608316Z { 2023-01-11T21:38:05.8608396Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:05.8608470Z { 2023-01-11T21:38:05.8608564Z for(long i2=0; i2<48; i2+=1) 2023-01-11T21:38:05.8608633Z { 2023-01-11T21:38:05.8608703Z { 2023-01-11T21:38:05.8608776Z { 2023-01-11T21:38:05.8608895Z auto tmp0 = in_ptr2[i2 + (48*i1) + (144*i0)]; 2023-01-11T21:38:05.8608998Z out_ptr8[i1 + (3*i2) + (144*i0)] = tmp0; 2023-01-11T21:38:05.8609071Z } 2023-01-11T21:38:05.8609141Z } 2023-01-11T21:38:05.8609209Z } 2023-01-11T21:38:05.8609277Z } 2023-01-11T21:38:05.8609346Z } 2023-01-11T21:38:05.8609433Z #pragma omp for collapse(3) 2023-01-11T21:38:05.8609520Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.8609586Z { 2023-01-11T21:38:05.8609675Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:05.8609744Z { 2023-01-11T21:38:05.8609865Z for(long i2=0; i2<48; i2+=1) 2023-01-11T21:38:05.8609936Z { 2023-01-11T21:38:05.8609999Z { 2023-01-11T21:38:05.8610072Z { 2023-01-11T21:38:05.8610190Z auto tmp0 = in_ptr3[i2 + (48*i1) + (144*i0)]; 2023-01-11T21:38:05.8610301Z out_ptr9[i1 + (3*i2) + (144*i0)] = tmp0; 2023-01-11T21:38:05.8610373Z } 2023-01-11T21:38:05.8610444Z } 2023-01-11T21:38:05.8610511Z } 2023-01-11T21:38:05.8610571Z } 2023-01-11T21:38:05.8610638Z } 2023-01-11T21:38:05.8610705Z } 2023-01-11T21:38:05.8610775Z } 2023-01-11T21:38:05.8610865Z ''') 2023-01-11T21:38:05.8610870Z 2023-01-11T21:38:05.8610875Z 2023-01-11T21:38:05.8610969Z async_compile.wait(globals()) 2023-01-11T21:38:05.8611047Z del async_compile 2023-01-11T21:38:05.8611052Z 2023-01-11T21:38:05.8611120Z def call(args): 2023-01-11T21:38:05.8611195Z arg0_1, = args 2023-01-11T21:38:05.8611277Z args.clear() 2023-01-11T21:38:05.8611500Z buf3 = empty_strided((1, 3, 3, 48), (432, 144, 48, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8611617Z buf0 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1)) # alias 2023-01-11T21:38:05.8611735Z buf1 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1), 16) # alias 2023-01-11T21:38:05.8611853Z buf2 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1), 32) # alias 2023-01-11T21:38:05.8612066Z buf4 = empty_strided((1, 3, 3, 48), (432, 1, 144, 3), device='cpu', dtype=torch.float32) 
2023-01-11T21:38:05.8612269Z buf7 = empty_strided((2, 3, 3, 16), (144, 48, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8612416Z buf5 = as_strided(buf7, (1, 3, 3, 16), (144, 48, 16, 1)) # alias 2023-01-11T21:38:05.8612532Z buf6 = as_strided(buf7, (1, 3, 3, 16), (144, 48, 16, 1), 144) # alias 2023-01-11T21:38:05.8612747Z buf11 = empty_strided((2, 3, 3, 16), (144, 48, 16, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.8612869Z buf9 = as_strided(buf11, (1, 3, 3, 16), (144, 48, 16, 1)) # alias 2023-01-11T21:38:05.8612990Z buf10 = as_strided(buf11, (1, 3, 3, 16), (144, 48, 16, 1), 144) # alias 2023-01-11T21:38:05.8613201Z buf8 = empty_strided((2, 3, 3, 16), (144, 1, 48, 3), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8613413Z buf12 = empty_strided((2, 3, 3, 16), (144, 1, 48, 3), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.8613820Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(buf11.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf6.data_ptr()), c_void_p(buf9.data_ptr()), c_void_p(buf10.data_ptr()), c_void_p(buf8.data_ptr()), c_void_p(buf12.data_ptr())) 2023-01-11T21:38:05.8613891Z del arg0_1 2023-01-11T21:38:05.8613960Z del buf0 2023-01-11T21:38:05.8614029Z del buf1 2023-01-11T21:38:05.8614104Z del buf10 2023-01-11T21:38:05.8614174Z del buf11 2023-01-11T21:38:05.8614243Z del buf2 2023-01-11T21:38:05.8614310Z del buf3 2023-01-11T21:38:05.8614371Z del buf5 2023-01-11T21:38:05.8614441Z del buf6 2023-01-11T21:38:05.8614617Z del buf7 2023-01-11T21:38:05.8614687Z del buf9 2023-01-11T21:38:05.8614778Z return (buf4, buf8, buf12, ) 2023-01-11T21:38:05.8614783Z 2023-01-11T21:38:05.8614788Z 2023-01-11T21:38:05.8614869Z if __name__ == "__main__": 2023-01-11T21:38:05.8614989Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8615111Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8615332Z arg0_1 = rand_strided((1, 3, 3, 16), (144, 1, 48, 3), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8615449Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8615454Z 2023-01-11T21:38:05.8615527Z ok (3.967s) 2023-01-11T21:38:05.8616027Z test_cat_extern_kernel_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8616161Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8616418Z [2023-01-11 21:25:22,119] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 52 2023-01-11T21:38:05.8616682Z [2023-01-11 21:25:23,879] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 52 2023-01-11T21:38:05.8616690Z 2023-01-11T21:38:05.8616789Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8616856Z import torch 2023-01-11T21:38:05.8616932Z import random 2023-01-11T21:38:05.8617051Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8617230Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8617238Z 2023-01-11T21:38:05.8617329Z aten = torch.ops.aten 2023-01-11T21:38:05.8617468Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8617564Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8617569Z 2023-01-11T21:38:05.8617644Z import triton 2023-01-11T21:38:05.8617729Z import triton.language as tl 2023-01-11T21:38:05.8617855Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8617995Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8618000Z 2023-01-11T21:38:05.8618004Z 2023-01-11T21:38:05.8618183Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8618390Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8618515Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8618621Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8618688Z { 2023-01-11T21:38:05.8618782Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8618847Z { 2023-01-11T21:38:05.8618931Z #pragma omp for 2023-01-11T21:38:05.8619021Z for(long i0=0; i0<256; i0+=1) 2023-01-11T21:38:05.8619087Z { 2023-01-11T21:38:05.8619178Z for(long i1=0; i1<32; i1+=1) 2023-01-11T21:38:05.8619239Z { 2023-01-11T21:38:05.8619389Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (256*i0)); 2023-01-11T21:38:05.8619498Z tmp0.store(out_ptr0 + (8*i1) + (512*i0)); 2023-01-11T21:38:05.8619567Z } 2023-01-11T21:38:05.8619672Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8619766Z for(long i1=256; i1<256; i1+=1) 2023-01-11T21:38:05.8619835Z { 2023-01-11T21:38:05.8619928Z auto tmp0 = in_ptr0[i1 + (256*i0)]; 2023-01-11T21:38:05.8620025Z out_ptr0[i1 + (512*i0)] = tmp0; 2023-01-11T21:38:05.8620094Z } 2023-01-11T21:38:05.8620165Z } 2023-01-11T21:38:05.8620232Z } 2023-01-11T21:38:05.8620300Z } 2023-01-11T21:38:05.8620385Z ''') 2023-01-11T21:38:05.8620391Z 2023-01-11T21:38:05.8620395Z 2023-01-11T21:38:05.8620487Z async_compile.wait(globals()) 2023-01-11T21:38:05.8620557Z del async_compile 2023-01-11T21:38:05.8620562Z 2023-01-11T21:38:05.8620637Z def call(args): 2023-01-11T21:38:05.8620733Z arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:05.8620810Z args.clear() 2023-01-11T21:38:05.8621016Z buf0 = empty_strided((256, 1600), (1600, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8621115Z aten.mm.out(arg1_1, arg2_1, out=buf0) 2023-01-11T21:38:05.8621191Z del arg1_1 2023-01-11T21:38:05.8621255Z del arg2_1 2023-01-11T21:38:05.8621457Z buf3 = empty_strided((256, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8621572Z buf1 = as_strided(buf3, (256, 256), (512, 1)) # alias 
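    # buf1 aliases the left (256, 256) half of buf3's 512-wide rows, so the
    # extern aten.mm.out call below writes its result straight into the cat
    # output; only the right half (buf2, element offset 256) is filled by
    # the generated C++ copy kernel. The mm input is itself an as_strided
    # view selecting the first 100 columns of buf0.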
2023-01-11T21:38:05.8621727Z aten.mm.out(as_strided(buf0, (256, 100), (1600, 1)), arg3_1, out=buf1) 2023-01-11T21:38:05.8621801Z del arg3_1 2023-01-11T21:38:05.8621873Z del buf0 2023-01-11T21:38:05.8621983Z buf2 = as_strided(buf3, (256, 256), (512, 1), 256) # alias 2023-01-11T21:38:05.8622114Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.8622188Z del arg0_1 2023-01-11T21:38:05.8622264Z return (buf3, ) 2023-01-11T21:38:05.8622269Z 2023-01-11T21:38:05.8622274Z 2023-01-11T21:38:05.8622356Z if __name__ == "__main__": 2023-01-11T21:38:05.8622473Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8622599Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8622807Z arg0_1 = rand_strided((256, 256), (256, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8623012Z arg1_1 = rand_strided((256, 1024), (1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8623211Z arg2_1 = rand_strided((1024, 1600), (1600, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8623409Z arg3_1 = rand_strided((100, 256), (256, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8623545Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:05.8623551Z 2023-01-11T21:38:05.8623625Z ok (1.836s) 2023-01-11T21:38:05.8624088Z test_cat_upcasting_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8624254Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8624511Z [2023-01-11 21:25:23,934] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 53 2023-01-11T21:38:05.8624776Z [2023-01-11 21:25:25,679] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 53 2023-01-11T21:38:05.8624782Z 2023-01-11T21:38:05.8624880Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8624947Z import torch 2023-01-11T21:38:05.8625024Z import random 2023-01-11T21:38:05.8625142Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8625265Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8625271Z 2023-01-11T21:38:05.8625354Z aten = torch.ops.aten 2023-01-11T21:38:05.8625491Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8625590Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8625599Z 2023-01-11T21:38:05.8625672Z import triton 2023-01-11T21:38:05.8625758Z import triton.language as tl 2023-01-11T21:38:05.8625883Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8626023Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8626029Z 2023-01-11T21:38:05.8626035Z 2023-01-11T21:38:05.8626172Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8626380Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8626506Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8626618Z const half* __restrict__ in_ptr1, 2023-01-11T21:38:05.8626724Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8626817Z float* __restrict__ out_ptr1) 
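// Mixed-dtype cat: in_ptr1 is the float16 operand. The vectorized loop
// copies the float32 operand as-is, while the second (scalar) loop upcasts
// in_ptr1 element-by-element from half to float before storing into the
// shared float32 output rows of width 36 (16 float + 20 upcast elements).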
2023-01-11T21:38:05.8626883Z { 2023-01-11T21:38:05.8626990Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8627061Z { 2023-01-11T21:38:05.8627145Z #pragma omp for 2023-01-11T21:38:05.8627235Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8627302Z { 2023-01-11T21:38:05.8627384Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.8627456Z { 2023-01-11T21:38:05.8627633Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (16*i0)); 2023-01-11T21:38:05.8627747Z tmp0.store(out_ptr0 + (8*i1) + (36*i0)); 2023-01-11T21:38:05.8627812Z } 2023-01-11T21:38:05.8627911Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8628004Z for(long i1=16; i1<16; i1+=1) 2023-01-11T21:38:05.8628065Z { 2023-01-11T21:38:05.8628166Z auto tmp0 = in_ptr0[i1 + (16*i0)]; 2023-01-11T21:38:05.8628264Z out_ptr0[i1 + (36*i0)] = tmp0; 2023-01-11T21:38:05.8628333Z } 2023-01-11T21:38:05.8628401Z } 2023-01-11T21:38:05.8628484Z #pragma omp for 2023-01-11T21:38:05.8628576Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8628635Z { 2023-01-11T21:38:05.8628722Z #pragma GCC ivdep 2023-01-11T21:38:05.8628814Z for(long i1=0; i1<20; i1+=1) 2023-01-11T21:38:05.8628884Z { 2023-01-11T21:38:05.8628956Z { 2023-01-11T21:38:05.8629027Z { 2023-01-11T21:38:05.8629152Z auto tmp0 = static_cast(in_ptr1[i1 + (20*i0)]); 2023-01-11T21:38:05.8629271Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:05.8629373Z out_ptr1[i1 + (36*i0)] = tmp1; 2023-01-11T21:38:05.8629446Z } 2023-01-11T21:38:05.8629513Z } 2023-01-11T21:38:05.8629582Z } 2023-01-11T21:38:05.8629650Z } 2023-01-11T21:38:05.8629709Z } 2023-01-11T21:38:05.8629778Z } 2023-01-11T21:38:05.8629864Z ''') 2023-01-11T21:38:05.8629869Z 2023-01-11T21:38:05.8629873Z 2023-01-11T21:38:05.8629966Z async_compile.wait(globals()) 2023-01-11T21:38:05.8630089Z del async_compile 2023-01-11T21:38:05.8630094Z 2023-01-11T21:38:05.8630169Z def call(args): 2023-01-11T21:38:05.8630249Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8630317Z args.clear() 2023-01-11T21:38:05.8630518Z buf2 = empty_strided((8, 36), (36, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8630631Z buf0 = as_strided(buf2, (8, 16), (36, 1)) # alias 2023-01-11T21:38:05.8630740Z buf1 = as_strided(buf2, (8, 20), (36, 1), 16) # alias 2023-01-11T21:38:05.8630934Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8631005Z del arg0_1 2023-01-11T21:38:05.8631078Z del arg1_1 2023-01-11T21:38:05.8631156Z return (buf2, ) 2023-01-11T21:38:05.8631161Z 2023-01-11T21:38:05.8631166Z 2023-01-11T21:38:05.8631239Z if __name__ == "__main__": 2023-01-11T21:38:05.8631358Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8631490Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8631690Z arg0_1 = rand_strided((8, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8631885Z arg1_1 = rand_strided((8, 20), (20, 1), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.8632008Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8632014Z 2023-01-11T21:38:05.8632085Z ok (1.767s) 2023-01-11T21:38:05.8632538Z test_cauchy_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8632669Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8632917Z [2023-01-11 21:25:25,700] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 54 2023-01-11T21:38:05.8633180Z [2023-01-11 21:25:27,431] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 54 2023-01-11T21:38:05.8633186Z 2023-01-11T21:38:05.8633285Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8633358Z import torch 2023-01-11T21:38:05.8633474Z import random 2023-01-11T21:38:05.8633594Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8633720Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8633726Z 2023-01-11T21:38:05.8633810Z aten = torch.ops.aten 2023-01-11T21:38:05.8633939Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8634036Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8634041Z 2023-01-11T21:38:05.8634115Z import triton 2023-01-11T21:38:05.8634208Z import triton.language as tl 2023-01-11T21:38:05.8634334Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8634478Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8634483Z 2023-01-11T21:38:05.8634488Z 2023-01-11T21:38:05.8634625Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8634834Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8634977Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8635099Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8635217Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8635285Z { 2023-01-11T21:38:05.8635351Z { 2023-01-11T21:38:05.8635418Z { 2023-01-11T21:38:05.8635502Z float tmp6 = 0; 2023-01-11T21:38:05.8635604Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8635672Z { 2023-01-11T21:38:05.8635783Z #pragma omp for reduction(+:tmp6) 2023-01-11T21:38:05.8635879Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.8635982Z { 2023-01-11T21:38:05.8636082Z for(long i1=0; i1<32; i1+=1) 2023-01-11T21:38:05.8636155Z { 2023-01-11T21:38:05.8636220Z { 2023-01-11T21:38:05.8636322Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8636423Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:05.8636567Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.8636663Z auto tmp3 = 1 / tmp2; 2023-01-11T21:38:05.8636778Z auto tmp4 = static_cast(1); 2023-01-11T21:38:05.8636879Z auto tmp5 = tmp3 * tmp4; 2023-01-11T21:38:05.8636959Z tmp6 += tmp5; 2023-01-11T21:38:05.8637037Z } 2023-01-11T21:38:05.8637109Z } 2023-01-11T21:38:05.8637179Z } 2023-01-11T21:38:05.8637247Z } 2023-01-11T21:38:05.8637336Z out_ptr0[0] = tmp6; 2023-01-11T21:38:05.8637405Z } 2023-01-11T21:38:05.8637463Z } 2023-01-11T21:38:05.8637531Z } 2023-01-11T21:38:05.8637615Z ''') 2023-01-11T21:38:05.8637621Z 2023-01-11T21:38:05.8637625Z 2023-01-11T21:38:05.8637719Z async_compile.wait(globals()) 2023-01-11T21:38:05.8637798Z del async_compile 2023-01-11T21:38:05.8637805Z 2023-01-11T21:38:05.8637883Z def call(args): 2023-01-11T21:38:05.8637962Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8638031Z args.clear() 2023-01-11T21:38:05.8638219Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8638386Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 
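    # buf0 is a 0-d scalar tensor: the kernel reduces the full 32x32 grid of
    # 1/(x_i - y_j) terms via `#pragma omp for reduction(+:tmp6)` and stores
    # the single sum to out_ptr0[0]. Eager sketch (illustrative only):
    #   (1.0 / (arg0_1[:, None] - arg1_1[None, :])).sum()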
2023-01-11T21:38:05.8638458Z del arg0_1 2023-01-11T21:38:05.8638529Z del arg1_1 2023-01-11T21:38:05.8638605Z return (buf0, ) 2023-01-11T21:38:05.8638610Z 2023-01-11T21:38:05.8638615Z 2023-01-11T21:38:05.8638698Z if __name__ == "__main__": 2023-01-11T21:38:05.8638818Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8638936Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8639134Z arg0_1 = rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8639327Z arg1_1 = rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8639476Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8639482Z 2023-01-11T21:38:05.8639556Z ok (1.752s) 2023-01-11T21:38:05.8640005Z test_clamp_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8640137Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8640397Z [2023-01-11 21:25:27,461] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 55 2023-01-11T21:38:05.8640659Z [2023-01-11 21:25:29,210] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 55 2023-01-11T21:38:05.8640665Z 2023-01-11T21:38:05.8640757Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8640833Z import torch 2023-01-11T21:38:05.8640909Z import random 2023-01-11T21:38:05.8641028Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8641153Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8641158Z 2023-01-11T21:38:05.8641240Z aten = torch.ops.aten 2023-01-11T21:38:05.8641376Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8641475Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8641481Z 2023-01-11T21:38:05.8641547Z import triton 2023-01-11T21:38:05.8641642Z import triton.language as tl 2023-01-11T21:38:05.8641796Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8641936Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8641941Z 2023-01-11T21:38:05.8641946Z 2023-01-11T21:38:05.8642084Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8642291Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8642415Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8642527Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8642622Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8642725Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8642825Z float* __restrict__ out_ptr2) 2023-01-11T21:38:05.8642893Z { 2023-01-11T21:38:05.8642995Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8643065Z { 2023-01-11T21:38:05.8643148Z #pragma omp for 2023-01-11T21:38:05.8643232Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8643301Z { 2023-01-11T21:38:05.8643441Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8643578Z auto tmp5 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.8643803Z auto tmp1 = at::vec::Vectorized(static_cast(-0.10000000149011612)); 
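// clamp(x, -0.1, 0.1) lowers to a vec maximum with the lower bound followed
// by a vec minimum with the upper bound (tmp2/tmp4 below); the other two
// outputs reuse the same loads: clamp_min(y, 0.0) and clamp_max(x + y, 0.0).
// -0.10000000149011612 is simply 0.1 rounded to float32 precision.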
2023-01-11T21:38:05.8643919Z auto tmp2 = at::vec::maximum(tmp0, tmp1); 2023-01-11T21:38:05.8644067Z auto tmp3 = at::vec::Vectorized(static_cast(0.10000000149011612)); 2023-01-11T21:38:05.8644180Z auto tmp4 = at::vec::minimum(tmp2, tmp3); 2023-01-11T21:38:05.8644313Z auto tmp6 = at::vec::Vectorized(static_cast(0.0)); 2023-01-11T21:38:05.8644428Z auto tmp7 = at::vec::maximum(tmp5, tmp6); 2023-01-11T21:38:05.8644519Z auto tmp8 = tmp0 + tmp5; 2023-01-11T21:38:05.8644630Z auto tmp9 = at::vec::minimum(tmp8, tmp6); 2023-01-11T21:38:05.8644728Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8644826Z tmp7.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8644919Z tmp9.store(out_ptr2 + 8*i0); 2023-01-11T21:38:05.8644979Z } 2023-01-11T21:38:05.8645079Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8645199Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.8645271Z { 2023-01-11T21:38:05.8645360Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8645449Z auto tmp5 = in_ptr1[i0]; 2023-01-11T21:38:05.8645625Z auto tmp1 = static_cast(-0.10000000149011612); 2023-01-11T21:38:05.8645756Z auto tmp2 = (tmp1 != tmp1) ? tmp1 : std::max(tmp0, tmp1); 2023-01-11T21:38:05.8645865Z auto tmp3 = static_cast(0.10000000149011612); 2023-01-11T21:38:05.8645991Z auto tmp4 = (tmp3 != tmp3) ? tmp3 : std::min(tmp2, tmp3); 2023-01-11T21:38:05.8646098Z auto tmp6 = static_cast(0.0); 2023-01-11T21:38:05.8646228Z auto tmp7 = (tmp6 != tmp6) ? tmp6 : std::max(tmp5, tmp6); 2023-01-11T21:38:05.8646319Z auto tmp8 = tmp0 + tmp5; 2023-01-11T21:38:05.8646444Z auto tmp9 = (tmp6 != tmp6) ? tmp6 : std::min(tmp8, tmp6); 2023-01-11T21:38:05.8646532Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:05.8646613Z out_ptr1[i0] = tmp7; 2023-01-11T21:38:05.8646702Z out_ptr2[i0] = tmp9; 2023-01-11T21:38:05.8646768Z } 2023-01-11T21:38:05.8646835Z } 2023-01-11T21:38:05.8646899Z } 2023-01-11T21:38:05.8646984Z ''') 2023-01-11T21:38:05.8646990Z 2023-01-11T21:38:05.8646994Z 2023-01-11T21:38:05.8647089Z async_compile.wait(globals()) 2023-01-11T21:38:05.8647159Z del async_compile 2023-01-11T21:38:05.8647173Z 2023-01-11T21:38:05.8647240Z def call(args): 2023-01-11T21:38:05.8647319Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8647394Z args.clear() 2023-01-11T21:38:05.8647589Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8647812Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8648000Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8648222Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.8648288Z del arg0_1 2023-01-11T21:38:05.8648360Z del arg1_1 2023-01-11T21:38:05.8648447Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:05.8648452Z 2023-01-11T21:38:05.8648457Z 2023-01-11T21:38:05.8648536Z if __name__ == "__main__": 2023-01-11T21:38:05.8648657Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8648783Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8648979Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8649178Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8649292Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8649297Z 2023-01-11T21:38:05.8649368Z ok (1.792s) 2023-01-11T21:38:05.8649818Z test_clone_cpu (__main__.CpuTests) ... 
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8649950Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8650207Z [2023-01-11 21:25:29,257] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 56 2023-01-11T21:38:05.8650470Z [2023-01-11 21:25:31,127] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 56 2023-01-11T21:38:05.8650479Z 2023-01-11T21:38:05.8650579Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8650652Z import torch 2023-01-11T21:38:05.8650727Z import random 2023-01-11T21:38:05.8650839Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8650963Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8650968Z 2023-01-11T21:38:05.8651077Z aten = torch.ops.aten 2023-01-11T21:38:05.8651216Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8651312Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8651317Z 2023-01-11T21:38:05.8651392Z import triton 2023-01-11T21:38:05.8651485Z import triton.language as tl 2023-01-11T21:38:05.8651610Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8651742Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8651748Z 2023-01-11T21:38:05.8651753Z 2023-01-11T21:38:05.8651890Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8652099Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8652225Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8652330Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8652434Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8652502Z { 2023-01-11T21:38:05.8652596Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8652665Z { 2023-01-11T21:38:05.8652747Z #pragma omp for 2023-01-11T21:38:05.8652834Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.8652902Z { 2023-01-11T21:38:05.8653042Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8653179Z auto tmp1 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.8653267Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8653395Z auto tmp3 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.8653511Z auto tmp4 = tmp0 + tmp3; 2023-01-11T21:38:05.8653607Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8653701Z tmp4.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8653769Z } 2023-01-11T21:38:05.8653869Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8653961Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:05.8654021Z { 2023-01-11T21:38:05.8654111Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8654216Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.8654305Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8654409Z auto tmp3 = static_cast(1); 2023-01-11T21:38:05.8654623Z auto tmp4 = tmp0 + tmp3; 2023-01-11T21:38:05.8654714Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8654792Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.8654859Z } 2023-01-11T21:38:05.8654926Z } 2023-01-11T21:38:05.8654989Z } 2023-01-11T21:38:05.8655087Z ''') 2023-01-11T21:38:05.8655092Z 2023-01-11T21:38:05.8655097Z 
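# The scalar tail loop in the kernel above, `for(long i0=256; i0<256; ...)`,
# has an empty range: all 16*16 = 256 elements are covered by 32 vectorized
# iterations of 8 float lanes each, so the tail survives only as a codegen
# template artifact and compiles away.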
2023-01-11T21:38:05.8655194Z async_compile.wait(globals()) 2023-01-11T21:38:05.8655271Z del async_compile 2023-01-11T21:38:05.8655276Z 2023-01-11T21:38:05.8655343Z def call(args): 2023-01-11T21:38:05.8655417Z arg0_1, = args 2023-01-11T21:38:05.8655492Z args.clear() 2023-01-11T21:38:05.8655698Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8655896Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8656065Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8656140Z del arg0_1 2023-01-11T21:38:05.8656214Z return (buf0, buf1, ) 2023-01-11T21:38:05.8656232Z 2023-01-11T21:38:05.8656236Z 2023-01-11T21:38:05.8656309Z if __name__ == "__main__": 2023-01-11T21:38:05.8656428Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8656556Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8656756Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8656869Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8656874Z 2023-01-11T21:38:05.8656946Z ok (1.913s) 2023-01-11T21:38:05.8657557Z test_constant_pad_1d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8657695Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8657956Z [2023-01-11 21:25:31,164] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 57 2023-01-11T21:38:05.8658215Z [2023-01-11 21:25:32,904] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 57 2023-01-11T21:38:05.8658224Z 2023-01-11T21:38:05.8658329Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8658406Z import torch 2023-01-11T21:38:05.8658486Z import random 2023-01-11T21:38:05.8658607Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8658735Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8658740Z 2023-01-11T21:38:05.8658825Z aten = torch.ops.aten 2023-01-11T21:38:05.8658958Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8659059Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8659064Z 2023-01-11T21:38:05.8659143Z import triton 2023-01-11T21:38:05.8659238Z import triton.language as tl 2023-01-11T21:38:05.8659370Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8659512Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8659518Z 2023-01-11T21:38:05.8659555Z 2023-01-11T21:38:05.8659696Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8659901Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8660029Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8660131Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8660237Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8660306Z { 2023-01-11T21:38:05.8660412Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8660482Z { 2023-01-11T21:38:05.8660567Z #pragma omp for 2023-01-11T21:38:05.8660650Z 
for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.8660720Z { 2023-01-11T21:38:05.8660808Z #pragma GCC ivdep 2023-01-11T21:38:05.8660902Z for(long i1=0; i1<32; i1+=1) 2023-01-11T21:38:05.8660975Z { 2023-01-11T21:38:05.8661048Z { 2023-01-11T21:38:05.8661121Z { 2023-01-11T21:38:05.8661232Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:05.8661343Z auto tmp1 = static_cast(31); 2023-01-11T21:38:05.8661447Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:05.8661542Z float tmp3 = 6.0; 2023-01-11T21:38:05.8661631Z if(tmp2) 2023-01-11T21:38:05.8661712Z { 2023-01-11T21:38:05.8661825Z auto tmp4 = in_ptr0[i1 + (31*i0)]; 2023-01-11T21:38:05.8661907Z tmp3 = tmp4; 2023-01-11T21:38:05.8661987Z } 2023-01-11T21:38:05.8662092Z out_ptr0[i1 + (32*i0)] = tmp3; 2023-01-11T21:38:05.8662164Z } 2023-01-11T21:38:05.8662236Z } 2023-01-11T21:38:05.8662308Z } 2023-01-11T21:38:05.8662378Z } 2023-01-11T21:38:05.8662455Z #pragma omp for 2023-01-11T21:38:05.8662545Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.8662619Z { 2023-01-11T21:38:05.8662706Z #pragma GCC ivdep 2023-01-11T21:38:05.8662798Z for(long i1=0; i1<36; i1+=1) 2023-01-11T21:38:05.8662874Z { 2023-01-11T21:38:05.8662946Z { 2023-01-11T21:38:05.8663012Z { 2023-01-11T21:38:05.8663216Z auto tmp0 = static_cast((-2) + i1); 2023-01-11T21:38:05.8663334Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8663441Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:05.8663554Z auto tmp3 = static_cast(31); 2023-01-11T21:38:05.8663659Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:05.8663764Z auto tmp5 = tmp2 & tmp4; 2023-01-11T21:38:05.8663851Z float tmp6 = 99.0; 2023-01-11T21:38:05.8663941Z if(tmp5) 2023-01-11T21:38:05.8664018Z { 2023-01-11T21:38:05.8664199Z auto tmp7 = in_ptr0[(-2) + i1 + (31*i0)]; 2023-01-11T21:38:05.8664292Z tmp6 = tmp7; 2023-01-11T21:38:05.8664369Z } 2023-01-11T21:38:05.8664473Z out_ptr1[i1 + (36*i0)] = tmp6; 2023-01-11T21:38:05.8664538Z } 2023-01-11T21:38:05.8664612Z } 2023-01-11T21:38:05.8664681Z } 2023-01-11T21:38:05.8664750Z } 2023-01-11T21:38:05.8664820Z } 2023-01-11T21:38:05.8664888Z } 2023-01-11T21:38:05.8664970Z ''') 2023-01-11T21:38:05.8664986Z 2023-01-11T21:38:05.8664991Z 2023-01-11T21:38:05.8665087Z async_compile.wait(globals()) 2023-01-11T21:38:05.8665186Z del async_compile 2023-01-11T21:38:05.8665191Z 2023-01-11T21:38:05.8665278Z def call(args): 2023-01-11T21:38:05.8665371Z arg0_1, = args 2023-01-11T21:38:05.8665451Z args.clear() 2023-01-11T21:38:05.8665662Z buf0 = empty_strided((2, 16, 32), (512, 32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8665943Z buf1 = empty_strided((2, 16, 36), (576, 36, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8666105Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8666180Z del arg0_1 2023-01-11T21:38:05.8666261Z return (buf0, buf1, ) 2023-01-11T21:38:05.8666270Z 2023-01-11T21:38:05.8666274Z 2023-01-11T21:38:05.8666354Z if __name__ == "__main__": 2023-01-11T21:38:05.8666473Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8666600Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8666807Z arg0_1 = rand_strided((2, 16, 31), (496, 31, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8666921Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8666926Z 2023-01-11T21:38:05.8666989Z ok (1.817s) 2023-01-11T21:38:05.8667447Z test_constant_pad_2d_cpu (__main__.CpuTests) ... 
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8667591Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8667849Z [2023-01-11 21:25:32,980] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 58 2023-01-11T21:38:05.8668113Z [2023-01-11 21:25:34,739] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 58 2023-01-11T21:38:05.8668118Z 2023-01-11T21:38:05.8668218Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8668292Z import torch 2023-01-11T21:38:05.8668370Z import random 2023-01-11T21:38:05.8668491Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8668608Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8668626Z 2023-01-11T21:38:05.8668700Z aten = torch.ops.aten 2023-01-11T21:38:05.8668834Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8668932Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8668938Z 2023-01-11T21:38:05.8669012Z import triton 2023-01-11T21:38:05.8669133Z import triton.language as tl 2023-01-11T21:38:05.8669257Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8669397Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8669403Z 2023-01-11T21:38:05.8669407Z 2023-01-11T21:38:05.8669536Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8669745Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8669868Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8669973Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8670075Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8670145Z { 2023-01-11T21:38:05.8670247Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8670313Z { 2023-01-11T21:38:05.8670401Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8670488Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:05.8670554Z { 2023-01-11T21:38:05.8670652Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:05.8670720Z { 2023-01-11T21:38:05.8670790Z { 2023-01-11T21:38:05.8670854Z { 2023-01-11T21:38:05.8671023Z auto tmp0 = static_cast((-1) + i0); 2023-01-11T21:38:05.8671136Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8671236Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:05.8671347Z auto tmp3 = static_cast(8); 2023-01-11T21:38:05.8671444Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:05.8671612Z auto tmp5 = static_cast((-1) + i1); 2023-01-11T21:38:05.8671745Z auto tmp6 = tmp5 >= tmp1; 2023-01-11T21:38:05.8671836Z auto tmp7 = tmp5 < tmp3; 2023-01-11T21:38:05.8671934Z auto tmp8 = tmp2 & tmp4; 2023-01-11T21:38:05.8672033Z auto tmp9 = tmp8 & tmp6; 2023-01-11T21:38:05.8672133Z auto tmp10 = tmp9 & tmp7; 2023-01-11T21:38:05.8672227Z float tmp11 = 6.0; 2023-01-11T21:38:05.8672310Z if(tmp10) 2023-01-11T21:38:05.8672387Z { 2023-01-11T21:38:05.8672554Z auto tmp12 = in_ptr0[(-9) + i1 + (8*i0)]; 2023-01-11T21:38:05.8672645Z tmp11 = tmp12; 2023-01-11T21:38:05.8672720Z } 2023-01-11T21:38:05.8672823Z out_ptr0[i1 + (10*i0)] = tmp11; 2023-01-11T21:38:05.8672893Z } 2023-01-11T21:38:05.8672963Z } 2023-01-11T21:38:05.8673034Z } 
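// constant_pad_2d lowers to a bounds mask: the comparisons above test
// whether the shifted indices (-1 + i0, -1 + i1) land inside the original
// 8x8 input; in-bounds positions load from in_ptr0 at the negatively
// offset base (-9 = -8 - 1), and everything else keeps the preloaded fill
// value (6.0 for out_ptr0 here, 99.0 for out_ptr1 in the next loop nest).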
2023-01-11T21:38:05.8673094Z } 2023-01-11T21:38:05.8673191Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8673276Z for(long i0=0; i0<15; i0+=1) 2023-01-11T21:38:05.8673344Z { 2023-01-11T21:38:05.8673434Z for(long i1=0; i1<11; i1+=1) 2023-01-11T21:38:05.8673502Z { 2023-01-11T21:38:05.8673564Z { 2023-01-11T21:38:05.8673635Z { 2023-01-11T21:38:05.8673803Z auto tmp0 = static_cast((-3) + i0); 2023-01-11T21:38:05.8673913Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8674011Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:05.8674118Z auto tmp3 = static_cast(8); 2023-01-11T21:38:05.8674216Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:05.8674384Z auto tmp5 = static_cast((-1) + i1); 2023-01-11T21:38:05.8674475Z auto tmp6 = tmp5 >= tmp1; 2023-01-11T21:38:05.8674580Z auto tmp7 = tmp5 < tmp3; 2023-01-11T21:38:05.8674681Z auto tmp8 = tmp2 & tmp4; 2023-01-11T21:38:05.8674778Z auto tmp9 = tmp8 & tmp6; 2023-01-11T21:38:05.8674875Z auto tmp10 = tmp9 & tmp7; 2023-01-11T21:38:05.8675008Z float tmp11 = 99.0; 2023-01-11T21:38:05.8675093Z if(tmp10) 2023-01-11T21:38:05.8675159Z { 2023-01-11T21:38:05.8675334Z auto tmp12 = in_ptr0[(-25) + i1 + (8*i0)]; 2023-01-11T21:38:05.8675424Z tmp11 = tmp12; 2023-01-11T21:38:05.8675499Z } 2023-01-11T21:38:05.8675603Z out_ptr1[i1 + (11*i0)] = tmp11; 2023-01-11T21:38:05.8675676Z } 2023-01-11T21:38:05.8675749Z } 2023-01-11T21:38:05.8675809Z } 2023-01-11T21:38:05.8675881Z } 2023-01-11T21:38:05.8675948Z } 2023-01-11T21:38:05.8676014Z } 2023-01-11T21:38:05.8676098Z ''') 2023-01-11T21:38:05.8676105Z 2023-01-11T21:38:05.8676109Z 2023-01-11T21:38:05.8676206Z async_compile.wait(globals()) 2023-01-11T21:38:05.8676284Z del async_compile 2023-01-11T21:38:05.8676289Z 2023-01-11T21:38:05.8676357Z def call(args): 2023-01-11T21:38:05.8676436Z arg0_1, = args 2023-01-11T21:38:05.8676511Z args.clear() 2023-01-11T21:38:05.8676733Z buf0 = empty_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8676952Z buf1 = empty_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8677121Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8677196Z del arg0_1 2023-01-11T21:38:05.8677270Z return (buf0, buf1, ) 2023-01-11T21:38:05.8677288Z 2023-01-11T21:38:05.8677293Z 2023-01-11T21:38:05.8677366Z if __name__ == "__main__": 2023-01-11T21:38:05.8677517Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8677647Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8677860Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8677981Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8677986Z 2023-01-11T21:38:05.8678057Z ok (1.787s) 2023-01-11T21:38:05.8678516Z test_constant_pad_3d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8678649Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8678905Z [2023-01-11 21:25:34,768] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 59 2023-01-11T21:38:05.8679164Z [2023-01-11 21:25:36,533] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 59 2023-01-11T21:38:05.8679170Z 2023-01-11T21:38:05.8679269Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8679348Z import torch 2023-01-11T21:38:05.8679424Z import random 2023-01-11T21:38:05.8679544Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8679668Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8679675Z 2023-01-11T21:38:05.8679758Z aten = torch.ops.aten 2023-01-11T21:38:05.8679898Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8679987Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8679992Z 2023-01-11T21:38:05.8680071Z import triton 2023-01-11T21:38:05.8680161Z import triton.language as tl 2023-01-11T21:38:05.8680288Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8680433Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8680438Z 2023-01-11T21:38:05.8680442Z 2023-01-11T21:38:05.8680579Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8680783Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8680941Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8681040Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8681146Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8681213Z { 2023-01-11T21:38:05.8681317Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8681385Z { 2023-01-11T21:38:05.8681479Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8681565Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.8681627Z { 2023-01-11T21:38:05.8681718Z for(long i1=0; i1<15; i1+=1) 2023-01-11T21:38:05.8681788Z { 2023-01-11T21:38:05.8681879Z #pragma GCC ivdep 2023-01-11T21:38:05.8681975Z for(long i2=0; i2<11; i2+=1) 2023-01-11T21:38:05.8682044Z { 2023-01-11T21:38:05.8682126Z #pragma GCC ivdep 2023-01-11T21:38:05.8682223Z for(long i3=0; i3<7; i3+=1) 2023-01-11T21:38:05.8682297Z { 2023-01-11T21:38:05.8682371Z { 2023-01-11T21:38:05.8682445Z { 2023-01-11T21:38:05.8682623Z auto tmp0 = static_cast((-5) + i1); 2023-01-11T21:38:05.8682738Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8682836Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:05.8682948Z auto tmp3 = static_cast(4); 2023-01-11T21:38:05.8683051Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:05.8683228Z auto tmp5 = static_cast((-3) + i2); 2023-01-11T21:38:05.8683364Z auto tmp6 = tmp5 >= tmp1; 2023-01-11T21:38:05.8683470Z auto tmp7 = tmp5 < tmp3; 2023-01-11T21:38:05.8683645Z auto tmp8 = static_cast((-1) + i3); 2023-01-11T21:38:05.8683753Z auto tmp9 = tmp8 >= tmp1; 2023-01-11T21:38:05.8683848Z auto tmp10 = tmp8 < tmp3; 2023-01-11T21:38:05.8683953Z auto tmp11 = tmp2 & tmp4; 2023-01-11T21:38:05.8684063Z auto tmp12 = tmp11 & tmp6; 2023-01-11T21:38:05.8684168Z auto tmp13 = tmp12 & tmp7; 2023-01-11T21:38:05.8684272Z auto tmp14 = tmp13 & tmp9; 2023-01-11T21:38:05.8684379Z auto tmp15 = tmp14 & tmp10; 2023-01-11T21:38:05.8684477Z float tmp16 = 6.0; 2023-01-11T21:38:05.8684568Z if(tmp15) 2023-01-11T21:38:05.8684639Z { 
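// Same masking in three dimensions: tmp15 AND-combines six range checks on
// the shifted (i1, i2, i3) indices, and -93 = -(5*16 + 3*4 + 1) is the
// combined base offset into the (2, 4, 4, 4) input.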
2023-01-11T21:38:05.8684844Z auto tmp17 = in_ptr0[(-93) + i3 + (4*i2) + (16*i1) + (64*i0)]; 2023-01-11T21:38:05.8684938Z tmp16 = tmp17; 2023-01-11T21:38:05.8685018Z } 2023-01-11T21:38:05.8685139Z out_ptr0[i3 + (7*i2) + (77*i1) + (1155*i0)] = tmp16; 2023-01-11T21:38:05.8685214Z } 2023-01-11T21:38:05.8685288Z } 2023-01-11T21:38:05.8685351Z } 2023-01-11T21:38:05.8685421Z } 2023-01-11T21:38:05.8685485Z } 2023-01-11T21:38:05.8685552Z } 2023-01-11T21:38:05.8685634Z #pragma omp for 2023-01-11T21:38:05.8685723Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8685783Z { 2023-01-11T21:38:05.8685868Z #pragma GCC ivdep 2023-01-11T21:38:05.8685959Z for(long i1=0; i1<11; i1+=1) 2023-01-11T21:38:05.8686028Z { 2023-01-11T21:38:05.8686114Z #pragma GCC ivdep 2023-01-11T21:38:05.8686210Z for(long i2=0; i2<4; i2+=1) 2023-01-11T21:38:05.8686280Z { 2023-01-11T21:38:05.8686343Z { 2023-01-11T21:38:05.8686447Z { 2023-01-11T21:38:05.8686622Z auto tmp0 = static_cast((-3) + i1); 2023-01-11T21:38:05.8686734Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8686837Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:05.8686949Z auto tmp3 = static_cast(4); 2023-01-11T21:38:05.8687050Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:05.8687142Z auto tmp5 = tmp2 & tmp4; 2023-01-11T21:38:05.8687235Z float tmp6 = 6.0; 2023-01-11T21:38:05.8687322Z if(tmp5) 2023-01-11T21:38:05.8687397Z { 2023-01-11T21:38:05.8687581Z auto tmp7 = in_ptr0[(-12) + i2 + (4*i1) + (16*i0)]; 2023-01-11T21:38:05.8687673Z tmp6 = tmp7; 2023-01-11T21:38:05.8687748Z } 2023-01-11T21:38:05.8687859Z out_ptr1[i2 + (4*i1) + (44*i0)] = tmp6; 2023-01-11T21:38:05.8687925Z } 2023-01-11T21:38:05.8687996Z } 2023-01-11T21:38:05.8688065Z } 2023-01-11T21:38:05.8688133Z } 2023-01-11T21:38:05.8688200Z } 2023-01-11T21:38:05.8688267Z } 2023-01-11T21:38:05.8688324Z } 2023-01-11T21:38:05.8688410Z ''') 2023-01-11T21:38:05.8688415Z 2023-01-11T21:38:05.8688420Z 2023-01-11T21:38:05.8688514Z async_compile.wait(globals()) 2023-01-11T21:38:05.8688593Z del async_compile 2023-01-11T21:38:05.8688598Z 2023-01-11T21:38:05.8688673Z def call(args): 2023-01-11T21:38:05.8688779Z arg0_1, = args 2023-01-11T21:38:05.8688857Z args.clear() 2023-01-11T21:38:05.8689079Z buf0 = empty_strided((2, 15, 11, 7), (1155, 77, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8689286Z buf1 = empty_strided((2, 4, 11, 4), (176, 44, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8689459Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8689536Z del arg0_1 2023-01-11T21:38:05.8689619Z return (buf0, buf1, ) 2023-01-11T21:38:05.8689624Z 2023-01-11T21:38:05.8689628Z 2023-01-11T21:38:05.8689711Z if __name__ == "__main__": 2023-01-11T21:38:05.8689831Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8689959Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8690170Z arg0_1 = rand_strided((2, 4, 4, 4), (64, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8690275Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8690283Z 2023-01-11T21:38:05.8690354Z ok (1.796s) 2023-01-11T21:38:05.8690833Z test_conv2d_backward_channels_last_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8690965Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8691219Z [2023-01-11 21:25:36,667] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 60 2023-01-11T21:38:05.8691480Z [2023-01-11 21:25:36,686] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 60 2023-01-11T21:38:05.8691485Z 2023-01-11T21:38:05.8691587Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8691665Z import torch 2023-01-11T21:38:05.8691741Z import random 2023-01-11T21:38:05.8691854Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8691978Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8691984Z 2023-01-11T21:38:05.8692070Z aten = torch.ops.aten 2023-01-11T21:38:05.8692235Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8692334Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8692339Z 2023-01-11T21:38:05.8692415Z import triton 2023-01-11T21:38:05.8692512Z import triton.language as tl 2023-01-11T21:38:05.8692639Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8692771Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8692777Z 2023-01-11T21:38:05.8692781Z 2023-01-11T21:38:05.8692874Z async_compile.wait(globals()) 2023-01-11T21:38:05.8692952Z del async_compile 2023-01-11T21:38:05.8692957Z 2023-01-11T21:38:05.8693034Z def call(args): 2023-01-11T21:38:05.8693125Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.8693201Z args.clear() 2023-01-11T21:38:05.8693376Z buf0 = aten.convolution_backward(arg0_1, arg1_1, arg2_1, [320], [1, 1], [0, 0], [1, 1], False, [0, 0], 1, [True, True, True]) 2023-01-11T21:38:05.8693451Z del arg0_1 2023-01-11T21:38:05.8693518Z del arg1_1 2023-01-11T21:38:05.8693594Z del arg2_1 2023-01-11T21:38:05.8693667Z buf1 = buf0[0] 2023-01-11T21:38:05.8693787Z assert_size_stride(buf1, (2, 2048, 8, 8), (131072, 1, 16384, 2048)) 2023-01-11T21:38:05.8693862Z buf2 = buf0[1] 2023-01-11T21:38:05.8693979Z assert_size_stride(buf2, (320, 2048, 1, 1), (2048, 1, 2048, 2048)) 2023-01-11T21:38:05.8694044Z buf3 = buf0[2] 2023-01-11T21:38:05.8694147Z assert_size_stride(buf3, (320, ), (1, )) 2023-01-11T21:38:05.8694216Z del buf0 2023-01-11T21:38:05.8694304Z return (buf1, buf2, buf3, ) 2023-01-11T21:38:05.8694309Z 2023-01-11T21:38:05.8694314Z 2023-01-11T21:38:05.8694394Z if __name__ == "__main__": 2023-01-11T21:38:05.8694655Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8694783Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8695012Z arg0_1 = rand_strided((2, 320, 8, 8), (20480, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8695227Z arg1_1 = rand_strided((2, 2048, 8, 8), (131072, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8695456Z arg2_1 = rand_strided((320, 2048, 1, 1), (2048, 1, 2048, 2048), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8695584Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.8695589Z 2023-01-11T21:38:05.8695663Z ok (0.167s) 2023-01-11T21:38:05.8695833Z test_conv2d_binary_cpu (__main__.CpuTests) ... skip: only support cpu conv2d binary test (0.002s) 2023-01-11T21:38:05.8696004Z test_conv2d_channels_last_cpu (__main__.CpuTests) ... 
skip: only support cpu channels_last (0.001s) 2023-01-11T21:38:05.8696169Z test_conv2d_packed_cpu (__main__.CpuTests) ... skip: only support cpu conv2d unary test (0.000s) 2023-01-11T21:38:05.8696334Z test_conv2d_unary_cpu (__main__.CpuTests) ... skip: only support cpu conv2d unary test (0.001s) 2023-01-11T21:38:05.8696506Z test_conv3d_channels_last_cpu (__main__.CpuTests) ... skip: only support cpu channels_last (0.001s) 2023-01-11T21:38:05.8697011Z test_conv_autotune_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:05.8697095Z warnings.warn( 2023-01-11T21:38:05.8697417Z [2023-01-11 21:25:36,773] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 61 2023-01-11T21:38:05.8697678Z [2023-01-11 21:25:36,795] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 61 2023-01-11T21:38:05.8697684Z 2023-01-11T21:38:05.8697786Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8697862Z import torch 2023-01-11T21:38:05.8697938Z import random 2023-01-11T21:38:05.8698058Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8698174Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8698188Z 2023-01-11T21:38:05.8698309Z aten = torch.ops.aten 2023-01-11T21:38:05.8698449Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8698546Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8698551Z 2023-01-11T21:38:05.8698626Z import triton 2023-01-11T21:38:05.8698720Z import triton.language as tl 2023-01-11T21:38:05.8698846Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8698985Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8698990Z 2023-01-11T21:38:05.8699131Z from torch._inductor.triton_ops.conv_perf_model import early_config_prune 2023-01-11T21:38:05.8699280Z from torch._inductor.triton_ops.conv_perf_model import estimate_conv_time 2023-01-11T21:38:05.8699427Z from torch._inductor.triton_ops.autotune import conv_heuristics 2023-01-11T21:38:05.8699432Z 2023-01-11T21:38:05.8699437Z 2023-01-11T21:38:05.8699528Z async_compile.wait(globals()) 2023-01-11T21:38:05.8699604Z del async_compile 2023-01-11T21:38:05.8699609Z 2023-01-11T21:38:05.8699686Z def call(args): 2023-01-11T21:38:05.8699776Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.8699851Z args.clear() 2023-01-11T21:38:05.8699987Z buf0 = aten.convolution(arg0_1, arg1_1, arg2_1, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:05.8700104Z assert_size_stride(buf0, (32, 32, 32, 32), (32768, 1024, 32, 1)) 2023-01-11T21:38:05.8700179Z del arg0_1 2023-01-11T21:38:05.8700251Z del arg1_1 2023-01-11T21:38:05.8700322Z del arg2_1 2023-01-11T21:38:05.8700398Z return (buf0, ) 2023-01-11T21:38:05.8700403Z 2023-01-11T21:38:05.8700409Z 2023-01-11T21:38:05.8700489Z if __name__ == "__main__": 2023-01-11T21:38:05.8700646Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8700765Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8700997Z arg0_1 = rand_strided((32, 128, 32, 32), (131072, 1024, 32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8701218Z arg1_1 = rand_strided((32, 128, 1, 1), (128, 1, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8701409Z arg2_1 = 
rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8701537Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.8701542Z 2023-01-11T21:38:05.8701614Z ok (0.099s) 2023-01-11T21:38:05.8701939Z test_conv_backward_cpu (__main__.CpuTests) ... [2023-01-11 21:25:36,895] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 62 2023-01-11T21:38:05.8702202Z [2023-01-11 21:25:36,963] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 62 2023-01-11T21:38:05.8702211Z 2023-01-11T21:38:05.8702301Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8702377Z import torch 2023-01-11T21:38:05.8702453Z import random 2023-01-11T21:38:05.8702571Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8702695Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8702701Z 2023-01-11T21:38:05.8702786Z aten = torch.ops.aten 2023-01-11T21:38:05.8702923Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8703018Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8703023Z 2023-01-11T21:38:05.8703090Z import triton 2023-01-11T21:38:05.8703184Z import triton.language as tl 2023-01-11T21:38:05.8703309Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8703453Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8703459Z 2023-01-11T21:38:05.8703463Z 2023-01-11T21:38:05.8703558Z async_compile.wait(globals()) 2023-01-11T21:38:05.8703635Z del async_compile 2023-01-11T21:38:05.8703643Z 2023-01-11T21:38:05.8703719Z def call(args): 2023-01-11T21:38:05.8703849Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1 = args 2023-01-11T21:38:05.8703918Z args.clear() 2023-01-11T21:38:05.8704124Z buf0 = aten.convolution_backward(arg0_1, arg1_1, arg2_1, [4], [1, 1], [0, 0], [1, 1], False, [0, 0], 1, [True, True, True]) 2023-01-11T21:38:05.8704202Z buf1 = buf0[0] 2023-01-11T21:38:05.8704316Z assert_size_stride(buf1, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:05.8704389Z buf2 = buf0[1] 2023-01-11T21:38:05.8704498Z assert_size_stride(buf2, (4, 4, 3, 3), (36, 9, 3, 1)) 2023-01-11T21:38:05.8704570Z buf3 = buf0[2] 2023-01-11T21:38:05.8704662Z assert_size_stride(buf3, (4, ), (1, )) 2023-01-11T21:38:05.8704732Z del buf0 2023-01-11T21:38:05.8704900Z buf4 = aten.convolution_backward(arg0_1, arg1_1, arg2_1, [4], [1, 1], [0, 0], [1, 1], False, [0, 0], 1, [True, False, False]) 2023-01-11T21:38:05.8704974Z del arg0_1 2023-01-11T21:38:05.8705061Z del arg1_1 2023-01-11T21:38:05.8705148Z del arg2_1 2023-01-11T21:38:05.8705227Z buf5 = buf4[0] 2023-01-11T21:38:05.8705347Z assert_size_stride(buf5, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:05.8705420Z del buf4 2023-01-11T21:38:05.8705585Z buf6 = aten.convolution_backward(arg3_1, arg4_1, arg5_1, [4], [1], [0], [1], False, [0], 1, [True, True, True]) 2023-01-11T21:38:05.8705659Z del arg3_1 2023-01-11T21:38:05.8705730Z del arg4_1 2023-01-11T21:38:05.8705802Z del arg5_1 2023-01-11T21:38:05.8705875Z buf7 = buf6[0] 2023-01-11T21:38:05.8705977Z assert_size_stride(buf7, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:05.8706052Z buf8 = buf6[1] 2023-01-11T21:38:05.8706160Z assert_size_stride(buf8, (4, 4, 3, 3), (36, 9, 3, 1)) 2023-01-11T21:38:05.8706232Z buf9 = buf6[2] 2023-01-11T21:38:05.8706330Z assert_size_stride(buf9, (4, ), (1, )) 2023-01-11T21:38:05.8706402Z del buf6 2023-01-11T21:38:05.8706576Z buf10 = 
aten.convolution_backward(arg6_1, arg7_1, arg8_1, [4], [1, 1, 1], [0, 0, 0], [1, 1, 1], False, [0, 0, 0], 1, [True, True, True]) 2023-01-11T21:38:05.8706671Z del arg6_1 2023-01-11T21:38:05.8706746Z del arg7_1 2023-01-11T21:38:05.8706817Z del arg8_1 2023-01-11T21:38:05.8706893Z buf11 = buf10[0] 2023-01-11T21:38:05.8707014Z assert_size_stride(buf11, (3, 4, 5, 5, 5), (500, 125, 25, 5, 1)) 2023-01-11T21:38:05.8707091Z buf12 = buf10[1] 2023-01-11T21:38:05.8707206Z assert_size_stride(buf12, (4, 4, 3, 3, 3), (108, 27, 9, 3, 1)) 2023-01-11T21:38:05.8707273Z buf13 = buf10[2] 2023-01-11T21:38:05.8707374Z assert_size_stride(buf13, (4, ), (1, )) 2023-01-11T21:38:05.8707444Z del buf10 2023-01-11T21:38:05.8707577Z return (buf1, buf2, buf3, buf5, buf7, buf8, buf9, buf11, buf12, buf13, ) 2023-01-11T21:38:05.8707583Z 2023-01-11T21:38:05.8707588Z 2023-01-11T21:38:05.8707670Z if __name__ == "__main__": 2023-01-11T21:38:05.8707789Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8707920Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8708136Z arg0_1 = rand_strided((3, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8708341Z arg1_1 = rand_strided((3, 4, 5, 5), (100, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8708554Z arg2_1 = rand_strided((4, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8708755Z arg3_1 = rand_strided((3, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8708963Z arg4_1 = rand_strided((3, 4, 5, 5), (100, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8709164Z arg5_1 = rand_strided((4, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8709379Z arg6_1 = rand_strided((3, 4, 3, 3, 3), (108, 27, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8709600Z arg7_1 = rand_strided((3, 4, 5, 5, 5), (500, 125, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8709818Z arg8_1 = rand_strided((4, 4, 3, 3, 3), (108, 27, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8709976Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1])) 2023-01-11T21:38:05.8709990Z 2023-01-11T21:38:05.8710083Z ok (0.168s) 2023-01-11T21:38:05.8710245Z test_conv_bn_fuse_cpu (__main__.CpuTests) ... skip: only support cpu conv bn test (0.001s) 2023-01-11T21:38:05.8710418Z test_conv_functional_bn_fuse_cpu (__main__.CpuTests) ... skip: only support cpu conv bn test (0.001s) 2023-01-11T21:38:05.8710876Z test_convolution1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8711010Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8711268Z [2023-01-11 21:25:37,066] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 63 2023-01-11T21:38:05.8711530Z [2023-01-11 21:25:38,841] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 63 2023-01-11T21:38:05.8711536Z 2023-01-11T21:38:05.8711636Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8711712Z import torch 2023-01-11T21:38:05.8711780Z import random 2023-01-11T21:38:05.8711900Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8712024Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8712029Z 2023-01-11T21:38:05.8712110Z aten = torch.ops.aten 2023-01-11T21:38:05.8712247Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8712345Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8712377Z 2023-01-11T21:38:05.8712453Z import triton 2023-01-11T21:38:05.8712546Z import triton.language as tl 2023-01-11T21:38:05.8712664Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8712805Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8712811Z 2023-01-11T21:38:05.8712816Z 2023-01-11T21:38:05.8712959Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8713167Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8713289Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.8713396Z bool* __restrict__ out_ptr0) 2023-01-11T21:38:05.8713465Z { 2023-01-11T21:38:05.8713559Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8713625Z { 2023-01-11T21:38:05.8713709Z #pragma omp for 2023-01-11T21:38:05.8713797Z for(long i0=0; i0<2352; i0+=1) 2023-01-11T21:38:05.8713866Z { 2023-01-11T21:38:05.8713939Z { 2023-01-11T21:38:05.8714008Z { 2023-01-11T21:38:05.8714104Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.8714205Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:05.8714316Z auto tmp2 = static_cast<float>(0); 2023-01-11T21:38:05.8714418Z auto tmp3 = tmp1 <= tmp2; 2023-01-11T21:38:05.8714514Z in_out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.8714606Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.8714675Z } 2023-01-11T21:38:05.8714735Z } 2023-01-11T21:38:05.8714801Z } 2023-01-11T21:38:05.8714867Z } 2023-01-11T21:38:05.8714931Z } 2023-01-11T21:38:05.8715016Z ''') 2023-01-11T21:38:05.8715022Z 2023-01-11T21:38:05.8715026Z 2023-01-11T21:38:05.8715121Z async_compile.wait(globals()) 2023-01-11T21:38:05.8715198Z del async_compile 2023-01-11T21:38:05.8715203Z 2023-01-11T21:38:05.8715270Z def call(args): 2023-01-11T21:38:05.8715378Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:05.8715459Z args.clear() 2023-01-11T21:38:05.8715617Z buf0 = aten.convolution(primals_3, primals_1, primals_2, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:05.8715734Z assert_size_stride(buf0, (2, 6, 14, 14), (1176, 196, 14, 1)) 2023-01-11T21:38:05.8715812Z del primals_2 2023-01-11T21:38:05.8715973Z buf1 = as_strided(buf0, (2, 6, 14, 14), (1176, 196, 14, 1)); del buf0 # reuse 2023-01-11T21:38:05.8716187Z buf2 = empty_strided((2, 6, 14, 14), (1176, 196, 14, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8716318Z kernel_cpp_0(c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.8716430Z
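    # buf1 reinterprets buf0 in place (the "del buf0  # reuse" above marks the
    # aliasing), so the ReLU kernel writes into the convolution's own output
    # buffer; buf2 keeps the <= 0 mask that the backward pass will need.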
return (buf1, primals_1, primals_3, buf2, ) 2023-01-11T21:38:05.8716435Z 2023-01-11T21:38:05.8716439Z 2023-01-11T21:38:05.8716520Z if __name__ == "__main__": 2023-01-11T21:38:05.8716638Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8716764Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8716989Z primals_1 = rand_strided((6, 5, 3, 3), (45, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8717188Z primals_2 = rand_strided((6, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8717412Z primals_3 = rand_strided((2, 5, 16, 16), (1280, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8717548Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:05.8717553Z 2023-01-11T21:38:05.8717626Z ok (1.867s) 2023-01-11T21:38:05.8718080Z test_convolution2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8718209Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8718493Z [2023-01-11 21:25:38,886] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 64 2023-01-11T21:38:05.8718754Z [2023-01-11 21:25:38,907] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 64 2023-01-11T21:38:05.8718762Z 2023-01-11T21:38:05.8718862Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8718938Z import torch 2023-01-11T21:38:05.8719012Z import random 2023-01-11T21:38:05.8719124Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8719253Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8719259Z 2023-01-11T21:38:05.8719343Z aten = torch.ops.aten 2023-01-11T21:38:05.8719480Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8719582Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8719587Z 2023-01-11T21:38:05.8719664Z import triton 2023-01-11T21:38:05.8719757Z import triton.language as tl 2023-01-11T21:38:05.8719888Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8720020Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8720026Z 2023-01-11T21:38:05.8720038Z 2023-01-11T21:38:05.8720124Z async_compile.wait(globals()) 2023-01-11T21:38:05.8720204Z del async_compile 2023-01-11T21:38:05.8720209Z 2023-01-11T21:38:05.8720284Z def call(args): 2023-01-11T21:38:05.8720371Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.8720449Z args.clear() 2023-01-11T21:38:05.8720585Z buf0 = aten.convolution(arg0_1, arg1_1, arg2_1, (4,), (0,), (1,), True, (0,), 1) 2023-01-11T21:38:05.8720697Z assert_size_stride(buf0, (2, 16, 364), (5824, 364, 1)) 2023-01-11T21:38:05.8720763Z del arg0_1 2023-01-11T21:38:05.8720835Z del arg1_1 2023-01-11T21:38:05.8720908Z del arg2_1 2023-01-11T21:38:05.8720988Z return (buf0, ) 2023-01-11T21:38:05.8720993Z 2023-01-11T21:38:05.8720997Z 2023-01-11T21:38:05.8721083Z if __name__ == "__main__": 2023-01-11T21:38:05.8721200Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8721326Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8721531Z arg0_1 = rand_strided((2, 32, 90), 
(2880, 90, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8721766Z arg1_1 = rand_strided((32, 16, 8), (128, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8721961Z arg2_1 = rand_strided((16, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8722091Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.8722096Z 2023-01-11T21:38:05.8722167Z ok (0.064s) 2023-01-11T21:38:05.8722615Z test_cos_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8722752Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8723010Z [2023-01-11 21:25:38,939] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 65 2023-01-11T21:38:05.8723275Z [2023-01-11 21:25:40,740] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 65 2023-01-11T21:38:05.8723280Z 2023-01-11T21:38:05.8723380Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8723446Z import torch 2023-01-11T21:38:05.8723521Z import random 2023-01-11T21:38:05.8723638Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8723765Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8723770Z 2023-01-11T21:38:05.8723852Z aten = torch.ops.aten 2023-01-11T21:38:05.8723989Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8724089Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8724122Z 2023-01-11T21:38:05.8724190Z import triton 2023-01-11T21:38:05.8724284Z import triton.language as tl 2023-01-11T21:38:05.8724409Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8724551Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8724556Z 2023-01-11T21:38:05.8724563Z 2023-01-11T21:38:05.8724704Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8724912Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8725040Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8725146Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8725240Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8725310Z { 2023-01-11T21:38:05.8725413Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8725482Z { 2023-01-11T21:38:05.8725564Z #pragma omp for 2023-01-11T21:38:05.8725655Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.8725724Z { 2023-01-11T21:38:05.8725859Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8725950Z auto tmp1 = tmp0.cos(); 2023-01-11T21:38:05.8726092Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.8726184Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.8726320Z auto tmp4 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.8726412Z auto tmp5 = tmp0 + tmp4; 2023-01-11T21:38:05.8726503Z auto tmp6 = tmp5.cos(); 2023-01-11T21:38:05.8726600Z tmp3.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8726688Z tmp6.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8726756Z } 2023-01-11T21:38:05.8726857Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8726946Z for(long i0=256; i0<256; i0+=1)
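// scalar tail loop for elements left over after the 8-wide vectorized loop;
// 16*16 = 256 elements fill exactly 32 vectors, so this loop body never runs
// (its bounds are 256..256).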
2023-01-11T21:38:05.8727015Z { 2023-01-11T21:38:05.8727104Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8727191Z auto tmp1 = std::cos(tmp0); 2023-01-11T21:38:05.8727303Z auto tmp2 = static_cast<float>(2); 2023-01-11T21:38:05.8727392Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.8727495Z auto tmp4 = static_cast<float>(1); 2023-01-11T21:38:05.8727622Z auto tmp5 = tmp0 + tmp4; 2023-01-11T21:38:05.8727717Z auto tmp6 = std::cos(tmp5); 2023-01-11T21:38:05.8727804Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.8727881Z out_ptr1[i0] = tmp6; 2023-01-11T21:38:05.8727948Z } 2023-01-11T21:38:05.8728015Z } 2023-01-11T21:38:05.8728081Z } 2023-01-11T21:38:05.8728166Z ''') 2023-01-11T21:38:05.8728172Z 2023-01-11T21:38:05.8728176Z 2023-01-11T21:38:05.8728274Z async_compile.wait(globals()) 2023-01-11T21:38:05.8728354Z del async_compile 2023-01-11T21:38:05.8728360Z 2023-01-11T21:38:05.8728435Z def call(args): 2023-01-11T21:38:05.8728507Z arg0_1, = args 2023-01-11T21:38:05.8728583Z args.clear() 2023-01-11T21:38:05.8728786Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8728987Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8729157Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8729232Z del arg0_1 2023-01-11T21:38:05.8729313Z return (buf0, buf1, ) 2023-01-11T21:38:05.8729318Z 2023-01-11T21:38:05.8729323Z 2023-01-11T21:38:05.8729396Z if __name__ == "__main__": 2023-01-11T21:38:05.8729516Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8729642Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8729841Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8729959Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8729964Z 2023-01-11T21:38:05.8730071Z ok (1.833s) 2023-01-11T21:38:05.8730238Z test_cpp_wrapper_cpu (__main__.CpuTests) ... skip: cpp_wrapper only supports cpu (0.001s) 2023-01-11T21:38:05.8730379Z test_cudnn_rnn_cpu (__main__.CpuTests) ... skip: requires CUDA (0.003s) 2023-01-11T21:38:05.8730589Z test_dense_mask_index_cpu (__main__.CpuTests) ... skip: https://github.com/pytorch/torchdynamo/issues/1697 (0.001s) 2023-01-11T21:38:05.8731037Z test_div1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8731170Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8731436Z [2023-01-11 21:25:40,809] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 66 2023-01-11T21:38:05.8731699Z [2023-01-11 21:25:42,730] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 66 2023-01-11T21:38:05.8731705Z 2023-01-11T21:38:05.8731804Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8731880Z import torch 2023-01-11T21:38:05.8731955Z import random 2023-01-11T21:38:05.8732076Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8732192Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8732204Z 2023-01-11T21:38:05.8732280Z aten = torch.ops.aten 2023-01-11T21:38:05.8732417Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8732514Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8732521Z 2023-01-11T21:38:05.8732596Z import triton 2023-01-11T21:38:05.8732691Z import triton.language as tl 2023-01-11T21:38:05.8732817Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8732957Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8732965Z 2023-01-11T21:38:05.8732969Z 2023-01-11T21:38:05.8733107Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8733305Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8733457Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8733570Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8733674Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8733777Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8733876Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.8733974Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8734065Z float* __restrict__ out_ptr4) 2023-01-11T21:38:05.8734133Z { 2023-01-11T21:38:05.8734235Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8734302Z { 2023-01-11T21:38:05.8734387Z #pragma omp for 2023-01-11T21:38:05.8734581Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8734651Z { 2023-01-11T21:38:05.8734786Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8734923Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.8735025Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.8735135Z auto tmp3 = tmp2.floor(); 2023-01-11T21:38:05.8735237Z auto tmp4 = tmp2.trunc(); 2023-01-11T21:38:05.8735346Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8735440Z tmp3.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8735527Z tmp4.store(out_ptr2 + 8*i0); 2023-01-11T21:38:05.8735622Z tmp2.store(out_ptr3 + 8*i0); 2023-01-11T21:38:05.8735714Z tmp3.store(out_ptr4 + 8*i0); 2023-01-11T21:38:05.8735782Z } 2023-01-11T21:38:05.8735880Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8736018Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.8736086Z { 2023-01-11T21:38:05.8736169Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8736259Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.8736346Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.8736447Z auto tmp3 = std::floor(tmp2); 2023-01-11T21:38:05.8736550Z auto tmp4 = std::trunc(tmp2); 2023-01-11T21:38:05.8736634Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8736720Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.8736796Z out_ptr2[i0] = tmp4;
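// one fused pass materializes all five requested outputs: true division
// (tmp2), floor (tmp3), trunc (tmp4), and the first two again for the
// aliased buffers out_ptr3/out_ptr4.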
2023-01-11T21:38:05.8736882Z out_ptr3[i0] = tmp2; 2023-01-11T21:38:05.8736967Z out_ptr4[i0] = tmp3; 2023-01-11T21:38:05.8737037Z } 2023-01-11T21:38:05.8737105Z } 2023-01-11T21:38:05.8737225Z } 2023-01-11T21:38:05.8737330Z ''') 2023-01-11T21:38:05.8737336Z 2023-01-11T21:38:05.8737340Z 2023-01-11T21:38:05.8737426Z async_compile.wait(globals()) 2023-01-11T21:38:05.8737508Z del async_compile 2023-01-11T21:38:05.8737513Z 2023-01-11T21:38:05.8737588Z def call(args): 2023-01-11T21:38:05.8737669Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8737744Z args.clear() 2023-01-11T21:38:05.8737943Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8738140Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8738322Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8738513Z buf3 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8738700Z buf4 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8738961Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8739037Z del arg0_1 2023-01-11T21:38:05.8739114Z del arg1_1 2023-01-11T21:38:05.8739216Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:05.8739222Z 2023-01-11T21:38:05.8739226Z 2023-01-11T21:38:05.8739306Z if __name__ == "__main__": 2023-01-11T21:38:05.8739425Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8739588Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8739787Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8739983Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8740104Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8740109Z 2023-01-11T21:38:05.8740179Z ok (1.985s) 2023-01-11T21:38:05.8740623Z test_div2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8740760Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8741021Z [2023-01-11 21:25:42,790] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 67 2023-01-11T21:38:05.8741285Z [2023-01-11 21:25:44,576] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 67 2023-01-11T21:38:05.8741291Z 2023-01-11T21:38:05.8741381Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8741459Z import torch 2023-01-11T21:38:05.8741534Z import random 2023-01-11T21:38:05.8741654Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8741777Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8741782Z 2023-01-11T21:38:05.8741867Z aten = torch.ops.aten 2023-01-11T21:38:05.8742002Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8742132Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8742137Z 2023-01-11T21:38:05.8742204Z import triton 2023-01-11T21:38:05.8742298Z import triton.language as tl 2023-01-11T21:38:05.8742422Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8742568Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8742574Z 2023-01-11T21:38:05.8742579Z 2023-01-11T21:38:05.8742717Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8742926Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8743053Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.8743162Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8743257Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8743356Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8743457Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.8743560Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8743658Z float* __restrict__ out_ptr4) 2023-01-11T21:38:05.8743724Z { 2023-01-11T21:38:05.8743825Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8743884Z { 2023-01-11T21:38:05.8743969Z #pragma omp for 2023-01-11T21:38:05.8744055Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8744123Z { 2023-01-11T21:38:05.8744191Z { 2023-01-11T21:38:05.8744262Z { 2023-01-11T21:38:05.8744353Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8744453Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:05.8744570Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.8744667Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:05.8744774Z auto tmp4 = std::floor(tmp3); 2023-01-11T21:38:05.8744878Z auto tmp5 = std::trunc(tmp3); 2023-01-11T21:38:05.8744971Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.8745061Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.8745142Z out_ptr2[i0] = tmp5; 2023-01-11T21:38:05.8745229Z out_ptr3[i0] = tmp3; 2023-01-11T21:38:05.8745345Z out_ptr4[i0] = tmp4; 2023-01-11T21:38:05.8745417Z } 2023-01-11T21:38:05.8745484Z } 2023-01-11T21:38:05.8745553Z } 2023-01-11T21:38:05.8745613Z } 2023-01-11T21:38:05.8745678Z } 2023-01-11T21:38:05.8745764Z ''') 2023-01-11T21:38:05.8745770Z 2023-01-11T21:38:05.8745774Z 2023-01-11T21:38:05.8745869Z async_compile.wait(globals()) 2023-01-11T21:38:05.8745948Z del async_compile 2023-01-11T21:38:05.8745953Z 2023-01-11T21:38:05.8746028Z def call(args): 2023-01-11T21:38:05.8746110Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8746185Z args.clear() 2023-01-11T21:38:05.8746374Z buf0 = empty_strided((8, 8), (8, 1),
device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8746568Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8746756Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8746945Z buf3 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8747133Z buf4 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8747392Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8747467Z del arg0_1 2023-01-11T21:38:05.8747539Z del arg1_1 2023-01-11T21:38:05.8747633Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:05.8747639Z 2023-01-11T21:38:05.8747643Z 2023-01-11T21:38:05.8747724Z if __name__ == "__main__": 2023-01-11T21:38:05.8747843Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8748002Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8748194Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8748387Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8748511Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8748516Z 2023-01-11T21:38:05.8748587Z ok (1.847s) 2023-01-11T21:38:05.8749036Z test_div3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8749162Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8749420Z [2023-01-11 21:25:44,624] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 68 2023-01-11T21:38:05.8749683Z [2023-01-11 21:25:46,350] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 68 2023-01-11T21:38:05.8749689Z 2023-01-11T21:38:05.8749788Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8749865Z import torch 2023-01-11T21:38:05.8749939Z import random 2023-01-11T21:38:05.8750059Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8750184Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8750189Z 2023-01-11T21:38:05.8750264Z aten = torch.ops.aten 2023-01-11T21:38:05.8750401Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8750498Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8750504Z 2023-01-11T21:38:05.8750580Z import triton 2023-01-11T21:38:05.8750673Z import triton.language as tl 2023-01-11T21:38:05.8750797Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8750938Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8750944Z 2023-01-11T21:38:05.8750948Z 2023-01-11T21:38:05.8751087Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8751286Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8751449Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.8751563Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:05.8751669Z 
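// the output buffers mix dtypes: float pointers receive the true-division
// results, long pointers the floor- and trunc-division quotients.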
float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8751772Z long* __restrict__ out_ptr1, 2023-01-11T21:38:05.8751872Z long* __restrict__ out_ptr2, 2023-01-11T21:38:05.8751974Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8752072Z long* __restrict__ out_ptr4) 2023-01-11T21:38:05.8752130Z { 2023-01-11T21:38:05.8752234Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8752304Z { 2023-01-11T21:38:05.8752386Z #pragma omp for 2023-01-11T21:38:05.8752475Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8752544Z { 2023-01-11T21:38:05.8752605Z { 2023-01-11T21:38:05.8752674Z { 2023-01-11T21:38:05.8752775Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8752873Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:05.8752986Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.8753100Z auto tmp3 = static_cast<float>(tmp2); 2023-01-11T21:38:05.8753195Z auto tmp4 = tmp1 / tmp3; 2023-01-11T21:38:05.8753433Z auto tmp5 = ((tmp0 < 0) != (tmp2 < 0) ? (tmp0 % tmp2 != 0 ? tmp0 / tmp2 - 1 : tmp0 / tmp2) : tmp0 / tmp2); 2023-01-11T21:38:05.8753530Z auto tmp6 = tmp0 / tmp2; 2023-01-11T21:38:05.8753622Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:05.8753743Z out_ptr1[i0] = tmp5; 2023-01-11T21:38:05.8753832Z out_ptr2[i0] = tmp6; 2023-01-11T21:38:05.8753921Z out_ptr3[i0] = tmp4; 2023-01-11T21:38:05.8754009Z out_ptr4[i0] = tmp5; 2023-01-11T21:38:05.8754079Z } 2023-01-11T21:38:05.8754140Z } 2023-01-11T21:38:05.8754210Z } 2023-01-11T21:38:05.8754279Z } 2023-01-11T21:38:05.8754342Z } 2023-01-11T21:38:05.8754428Z ''') 2023-01-11T21:38:05.8754434Z 2023-01-11T21:38:05.8754439Z 2023-01-11T21:38:05.8754533Z async_compile.wait(globals()) 2023-01-11T21:38:05.8754603Z del async_compile 2023-01-11T21:38:05.8754619Z 2023-01-11T21:38:05.8754687Z def call(args): 2023-01-11T21:38:05.8754770Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8754850Z args.clear() 2023-01-11T21:38:05.8755047Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8755238Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8755435Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8755630Z buf3 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8755808Z buf4 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8756066Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8756142Z del arg0_1 2023-01-11T21:38:05.8756216Z del arg1_1 2023-01-11T21:38:05.8756321Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:05.8756328Z 2023-01-11T21:38:05.8756332Z 2023-01-11T21:38:05.8756413Z if __name__ == "__main__": 2023-01-11T21:38:05.8756532Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8756661Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8756847Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8757039Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8757162Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8757167Z 2023-01-11T21:38:05.8757238Z ok (1.773s) 2023-01-11T21:38:05.8757768Z test_div4_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated.
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8757904Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8758160Z [2023-01-11 21:25:46,396] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 69 2023-01-11T21:38:05.8758427Z [2023-01-11 21:25:46,417] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 69 2023-01-11T21:38:05.8758433Z 2023-01-11T21:38:05.8758536Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8758613Z import torch 2023-01-11T21:38:05.8758681Z import random 2023-01-11T21:38:05.8758806Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8758931Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8758936Z 2023-01-11T21:38:05.8759022Z aten = torch.ops.aten 2023-01-11T21:38:05.8759164Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8759261Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8759266Z 2023-01-11T21:38:05.8759342Z import triton 2023-01-11T21:38:05.8759427Z import triton.language as tl 2023-01-11T21:38:05.8759558Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8759697Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8759732Z 2023-01-11T21:38:05.8759737Z 2023-01-11T21:38:05.8759878Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8760084Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8760207Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.8760320Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:05.8760424Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8760516Z long* __restrict__ out_ptr1, 2023-01-11T21:38:05.8760617Z long* __restrict__ out_ptr2, 2023-01-11T21:38:05.8760719Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8760817Z long* __restrict__ out_ptr4) 2023-01-11T21:38:05.8760883Z { 2023-01-11T21:38:05.8760986Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8761052Z { 2023-01-11T21:38:05.8761126Z #pragma omp for 2023-01-11T21:38:05.8761218Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8761286Z { 2023-01-11T21:38:05.8761355Z { 2023-01-11T21:38:05.8761424Z { 2023-01-11T21:38:05.8761523Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8761621Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:05.8761730Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.8761844Z auto tmp3 = static_cast<float>(tmp2); 2023-01-11T21:38:05.8761940Z auto tmp4 = tmp1 / tmp3; 2023-01-11T21:38:05.8762188Z auto tmp5 = ((tmp0 < 0) != (tmp2 < 0) ? (tmp0 % tmp2 != 0 ?
tmp0 / tmp2 - 1 : tmp0 / tmp2) : tmp0 / tmp2); 2023-01-11T21:38:05.8762283Z auto tmp6 = tmp0 / tmp2; 2023-01-11T21:38:05.8762374Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:05.8762465Z out_ptr1[i0] = tmp5; 2023-01-11T21:38:05.8762546Z out_ptr2[i0] = tmp6; 2023-01-11T21:38:05.8762637Z out_ptr3[i0] = tmp4; 2023-01-11T21:38:05.8762726Z out_ptr4[i0] = tmp5; 2023-01-11T21:38:05.8762803Z } 2023-01-11T21:38:05.8762870Z } 2023-01-11T21:38:05.8762937Z } 2023-01-11T21:38:05.8763005Z } 2023-01-11T21:38:05.8763062Z } 2023-01-11T21:38:05.8763146Z ''') 2023-01-11T21:38:05.8763182Z 2023-01-11T21:38:05.8763187Z 2023-01-11T21:38:05.8763284Z async_compile.wait(globals()) 2023-01-11T21:38:05.8763360Z del async_compile 2023-01-11T21:38:05.8763365Z 2023-01-11T21:38:05.8763440Z def call(args): 2023-01-11T21:38:05.8763526Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8763602Z args.clear() 2023-01-11T21:38:05.8763788Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8763980Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8764170Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8764365Z buf3 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8764553Z buf4 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8764811Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8764904Z del arg0_1 2023-01-11T21:38:05.8764984Z del arg1_1 2023-01-11T21:38:05.8765098Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:05.8765115Z 2023-01-11T21:38:05.8765120Z 2023-01-11T21:38:05.8765194Z if __name__ == "__main__": 2023-01-11T21:38:05.8765314Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8765440Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8765632Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8765822Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8765981Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8765986Z 2023-01-11T21:38:05.8766057Z ok (0.067s) 2023-01-11T21:38:05.8766501Z test_div5_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8766635Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8766881Z [2023-01-11 21:25:46,588] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 70 2023-01-11T21:38:05.8767141Z [2023-01-11 21:25:48,251] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 70 2023-01-11T21:38:05.8767150Z 2023-01-11T21:38:05.8767250Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8767326Z import torch 2023-01-11T21:38:05.8767402Z import random 2023-01-11T21:38:05.8767521Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8767649Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8767654Z 2023-01-11T21:38:05.8767741Z aten = torch.ops.aten 2023-01-11T21:38:05.8767871Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8767966Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8767972Z 2023-01-11T21:38:05.8768046Z import triton 2023-01-11T21:38:05.8768141Z import triton.language as tl 2023-01-11T21:38:05.8768265Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8768405Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8768410Z 2023-01-11T21:38:05.8768415Z 2023-01-11T21:38:05.8768552Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8768759Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8768873Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.8768978Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8769082Z long* __restrict__ out_ptr1, 2023-01-11T21:38:05.8769213Z long* __restrict__ out_ptr2, 2023-01-11T21:38:05.8769320Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8769421Z long* __restrict__ out_ptr4) 2023-01-11T21:38:05.8769489Z { 2023-01-11T21:38:05.8769584Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8769650Z { 2023-01-11T21:38:05.8769731Z #pragma omp for 2023-01-11T21:38:05.8769822Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8769892Z { 2023-01-11T21:38:05.8769960Z { 2023-01-11T21:38:05.8770030Z { 2023-01-11T21:38:05.8770120Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8770238Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.8770352Z auto tmp2 = static_cast<float>(16); 2023-01-11T21:38:05.8770451Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:05.8770560Z auto tmp4 = static_cast<long>(16); 2023-01-11T21:38:05.8770813Z auto tmp5 = ((tmp0 < 0) != (tmp4 < 0) ? (tmp0 % tmp4 != 0 ?
tmp0 / tmp4 - 1 : tmp0 / tmp4) : tmp0 / tmp4); 2023-01-11T21:38:05.8770914Z auto tmp6 = tmp0 / tmp4; 2023-01-11T21:38:05.8770997Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.8771085Z out_ptr1[i0] = tmp5; 2023-01-11T21:38:05.8771174Z out_ptr2[i0] = tmp6; 2023-01-11T21:38:05.8771264Z out_ptr3[i0] = tmp3; 2023-01-11T21:38:05.8771352Z out_ptr4[i0] = tmp5; 2023-01-11T21:38:05.8771421Z } 2023-01-11T21:38:05.8771490Z } 2023-01-11T21:38:05.8771582Z } 2023-01-11T21:38:05.8771650Z } 2023-01-11T21:38:05.8771715Z } 2023-01-11T21:38:05.8771799Z ''') 2023-01-11T21:38:05.8771805Z 2023-01-11T21:38:05.8771809Z 2023-01-11T21:38:05.8771903Z async_compile.wait(globals()) 2023-01-11T21:38:05.8771980Z del async_compile 2023-01-11T21:38:05.8771986Z 2023-01-11T21:38:05.8772063Z def call(args): 2023-01-11T21:38:05.8772131Z arg0_1, = args 2023-01-11T21:38:05.8772207Z args.clear() 2023-01-11T21:38:05.8772401Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8772590Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8772779Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8772969Z buf3 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8773153Z buf4 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8773399Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8773466Z del arg0_1 2023-01-11T21:38:05.8773569Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:05.8773574Z 2023-01-11T21:38:05.8773579Z 2023-01-11T21:38:05.8773662Z if __name__ == "__main__": 2023-01-11T21:38:05.8773780Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8773910Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8774103Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8774217Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8774222Z 2023-01-11T21:38:05.8774294Z ok (1.835s) 2023-01-11T21:38:05.8774845Z test_div6_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8774976Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8775288Z [2023-01-11 21:25:48,296] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 71 2023-01-11T21:38:05.8775553Z [2023-01-11 21:25:49,958] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 71 2023-01-11T21:38:05.8775559Z 2023-01-11T21:38:05.8775659Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8775734Z import torch 2023-01-11T21:38:05.8775812Z import random 2023-01-11T21:38:05.8775933Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8776056Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8776061Z 2023-01-11T21:38:05.8776136Z aten = torch.ops.aten 2023-01-11T21:38:05.8776278Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8776380Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8776385Z 2023-01-11T21:38:05.8776460Z import triton 2023-01-11T21:38:05.8776554Z import triton.language as tl 2023-01-11T21:38:05.8776681Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8776820Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8776826Z 2023-01-11T21:38:05.8776830Z 2023-01-11T21:38:05.8776968Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8777222Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8777354Z extern "C" void kernel(const bool* __restrict__ in_ptr0, 2023-01-11T21:38:05.8777465Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:05.8777569Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8777669Z long* __restrict__ out_ptr1, 2023-01-11T21:38:05.8777809Z long* __restrict__ out_ptr2, 2023-01-11T21:38:05.8777911Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8778008Z long* __restrict__ out_ptr4) 2023-01-11T21:38:05.8778066Z { 2023-01-11T21:38:05.8778171Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8778238Z { 2023-01-11T21:38:05.8778320Z #pragma omp for 2023-01-11T21:38:05.8778409Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8778477Z { 2023-01-11T21:38:05.8778538Z { 2023-01-11T21:38:05.8778610Z { 2023-01-11T21:38:05.8778708Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8778805Z auto tmp3 = in_ptr1[i0]; 2023-01-11T21:38:05.8778914Z auto tmp1 = static_cast<long>(tmp0); 2023-01-11T21:38:05.8779028Z auto tmp2 = static_cast<float>(tmp1); 2023-01-11T21:38:05.8779138Z auto tmp4 = static_cast<float>(tmp3); 2023-01-11T21:38:05.8779231Z auto tmp5 = tmp2 / tmp4; 2023-01-11T21:38:05.8779477Z auto tmp6 = ((tmp1 < 0) != (tmp3 < 0) ? (tmp1 % tmp3 != 0 ?
tmp1 / tmp3 - 1 : tmp1 / tmp3) : tmp1 / tmp3); 2023-01-11T21:38:05.8779573Z auto tmp7 = tmp1 / tmp3; 2023-01-11T21:38:05.8779686Z auto tmp8 = static_cast<float>(tmp0); 2023-01-11T21:38:05.8779781Z auto tmp9 = tmp8 / tmp4; 2023-01-11T21:38:05.8779870Z out_ptr0[i0] = tmp5; 2023-01-11T21:38:05.8779960Z out_ptr1[i0] = tmp6; 2023-01-11T21:38:05.8780051Z out_ptr2[i0] = tmp7; 2023-01-11T21:38:05.8780133Z out_ptr3[i0] = tmp9; 2023-01-11T21:38:05.8780221Z out_ptr4[i0] = tmp6; 2023-01-11T21:38:05.8780292Z } 2023-01-11T21:38:05.8780361Z } 2023-01-11T21:38:05.8780428Z } 2023-01-11T21:38:05.8780496Z } 2023-01-11T21:38:05.8780553Z } 2023-01-11T21:38:05.8780638Z ''') 2023-01-11T21:38:05.8780644Z 2023-01-11T21:38:05.8780649Z 2023-01-11T21:38:05.8780742Z async_compile.wait(globals()) 2023-01-11T21:38:05.8780821Z del async_compile 2023-01-11T21:38:05.8780826Z 2023-01-11T21:38:05.8780902Z def call(args): 2023-01-11T21:38:05.8780982Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8781092Z args.clear() 2023-01-11T21:38:05.8781289Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8781471Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8781663Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8781854Z buf3 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8782039Z buf4 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8782298Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8782377Z del arg0_1 2023-01-11T21:38:05.8782449Z del arg1_1 2023-01-11T21:38:05.8782551Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:05.8782556Z 2023-01-11T21:38:05.8782563Z 2023-01-11T21:38:05.8782637Z if __name__ == "__main__": 2023-01-11T21:38:05.8782755Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8782880Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8783068Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8783259Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8783377Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8783382Z 2023-01-11T21:38:05.8783454Z ok (1.707s) 2023-01-11T21:38:05.8783902Z test_div7_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
test_div7_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:25:50,004] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 72
[2023-01-11 21:25:51,670] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 72

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0,
                       long* __restrict__ out_ptr1,
                       long* __restrict__ out_ptr2,
                       float* __restrict__ out_ptr3,
                       long* __restrict__ out_ptr4)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<10000; i0+=1)
        {
            {
                {
                    auto tmp0 = in_ptr0[i0];
                    auto tmp2 = in_ptr1[i0];
                    auto tmp1 = static_cast<float>(tmp0);
                    auto tmp3 = static_cast<float>(tmp2);
                    auto tmp4 = tmp1 / tmp3;
                    auto tmp5 = ((tmp0 < 0) != (tmp2 < 0) ? (tmp0 % tmp2 != 0 ? tmp0 / tmp2 - 1 : tmp0 / tmp2) : tmp0 / tmp2);
                    auto tmp6 = tmp0 / tmp2;
                    out_ptr0[i0] = tmp4;
                    out_ptr1[i0] = tmp5;
                    out_ptr2[i0] = tmp6;
                    out_ptr3[i0] = tmp4;
                    out_ptr4[i0] = tmp5;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    buf0 = empty_strided((100, 100), (100, 1), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((100, 100), (100, 1), device='cpu', dtype=torch.int64)
    buf2 = empty_strided((100, 100), (100, 1), device='cpu', dtype=torch.int64)
    buf3 = empty_strided((100, 100), (100, 1), device='cpu', dtype=torch.float32)
    buf4 = empty_strided((100, 100), (100, 1), device='cpu', dtype=torch.int64)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()))
    del arg0_1
    del arg1_1
    return (buf0, buf1, buf2, buf3, buf4, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((100, 100), (100, 1), device='cpu', dtype=torch.int64)
    arg1_1 = rand_strided((100, 100), (100, 1), device='cpu', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (1.713s)
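As in graph 71, the int64 ÷ int64 case shows why call() mixes buffer dtypes: true division promotes integer inputs to float32, while the floor and trunc variants keep int64, so out_ptr0/out_ptr3 are float and out_ptr1/out_ptr2/out_ptr4 stay long. The same promotion is easy to confirm in eager mode (illustrative check, not from the log):

import torch

a = torch.randint(1, 100, (100, 100), dtype=torch.int64)
b = torch.randint(1, 100, (100, 100), dtype=torch.int64)
print((a / b).dtype)                                 # torch.float32
print(torch.div(a, b, rounding_mode="floor").dtype)  # torch.int64
print(torch.div(a, b, rounding_mode="trunc").dtype)  # torch.int64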
test_div8_cpu (__main__.CpuTests) ... [2023-01-11 21:25:51,707] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 73
[2023-01-11 21:25:53,354] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 73

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(long* __restrict__ out_ptr0,
                       long* __restrict__ out_ptr1,
                       long* __restrict__ out_ptr2)
{
    {
        {
            auto tmp0 = static_cast<long>(1024);
            auto tmp1 = static_cast<long>(100);
            auto tmp2 = tmp0 / tmp1;
            out_ptr0[0] = tmp2;
        }
    }
    {
        {
            auto tmp0 = static_cast<long>(1024);
            auto tmp1 = static_cast<long>(100);
            auto tmp2 = ((tmp0 < 0) != (tmp1 < 0) ? (tmp0 % tmp1 != 0 ? tmp0 / tmp1 - 1 : tmp0 / tmp1) : tmp0 / tmp1);
            out_ptr1[0] = tmp2;
        }
    }
    {
        {
            auto tmp0 = static_cast<long>(1024);
            auto tmp1 = static_cast<long>(100);
            auto tmp2 = tmp0 / tmp1;
            out_ptr2[0] = tmp2;
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    buf0 = empty_strided((), (), device='cpu', dtype=torch.int64)
    buf1 = empty_strided((), (), device='cpu', dtype=torch.int64)
    buf2 = empty_strided((), (), device='cpu', dtype=torch.int64)
    kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()))
    return (buf0, buf1, buf2, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    print_performance(lambda: call([]))

ok (1.683s)
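Graph 73 takes no tensor inputs at all: the operands 1024 and 100 were constants in the traced graph, so they are baked into the kernel and call([]) merely allocates three 0-dim int64 outputs. A 0-dim tensor is declared with empty shape and stride tuples; a small illustration of what the kernel's single-element stores amount to (the fill value is inferred from the constants, for illustration only):

import torch

buf = torch.empty_strided((), (), dtype=torch.int64)  # 0-dim, one element
buf.fill_(1024 // 100)                                # what out_ptr1[0] receives
print(buf.item(), buf.dim(), buf.numel())             # 10 0 1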
test_div_prim_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:25:53,393] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 74
[2023-01-11 21:25:55,060] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 74
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:25:55,097] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 75
[2023-01-11 21:25:56,747] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 75

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<12; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
            auto tmp2 = tmp0 / tmp1;
            tmp2.store(out_ptr0 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=96; i0<100; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp1 = in_ptr1[i0];
            auto tmp2 = tmp0 / tmp1;
            out_ptr0[i0] = tmp2;
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    buf0 = empty_strided((100, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg0_1
    del arg1_1
    return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       long* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<100; i0+=1)
        {
            {
                {
                    auto tmp0 = in_ptr0[i0];
                    auto tmp1 = in_ptr1[i0];
                    auto tmp2 = tmp0 / tmp1;
                    out_ptr0[i0] = tmp2;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    buf0 = empty_strided((100, ), (1, ), device='cpu', dtype=torch.int64)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg0_1
    del arg1_1
    return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.int64)
    arg1_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (3.392s)
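The float kernel in the div_prim dump shows the CPU backend's standard loop split: a main loop over 8-lane at::vec::Vectorized<float> values covers 96 of the 100 elements in 12 iterations, and a scalar tail loop annotated #pragma omp for simd simdlen(4) handles the remaining 4; the int64 kernel, which has no vector path here, stays fully scalar. The trip counts fall out of simple lane arithmetic (illustrative):

n, lanes = 100, 8
main_iters, tail = n // lanes, n % lanes
print(main_iters, tail)    # 12 4 -> for(i0=0; i0<12) vector loop
print(main_iters * lanes)  # 96  -> tail loop runs i0 = 96..99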
test_div_zero_dim_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:25:56,800] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 76
[2023-01-11 21:25:58,554] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 76
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:25:58,606] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 77
[2023-01-11 21:26:00,368] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 77
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:26:00,481] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 78

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1,
                       float* __restrict__ out_ptr2,
                       float* __restrict__ out_ptr3,
                       float* __restrict__ out_ptr4)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<1; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(in_ptr1[0]);
            auto tmp2 = tmp0 / tmp1;
            auto tmp3 = tmp2.floor();
            auto tmp4 = tmp2.trunc();
            tmp2.store(out_ptr0 + 8*i0);
            tmp3.store(out_ptr1 + 8*i0);
            tmp4.store(out_ptr2 + 8*i0);
            tmp2.store(out_ptr3 + 8*i0);
            tmp3.store(out_ptr4 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=8; i0<10; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp1 = in_ptr1[0];
            auto tmp2 = tmp0 / tmp1;
            auto tmp3 = std::floor(tmp2);
            auto tmp4 = std::trunc(tmp2);
            out_ptr0[i0] = tmp2;
            out_ptr1[i0] = tmp3;
            out_ptr2[i0] = tmp4;
            out_ptr3[i0] = tmp2;
            out_ptr4[i0] = tmp3;
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf2 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf3 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf4 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()))
    del arg0_1
    del arg1_1
    return (buf0, buf1, buf2, buf3, buf4, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((), (), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1,
                       float* __restrict__ out_ptr2,
                       float* __restrict__ out_ptr3,
                       float* __restrict__ out_ptr4)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<1; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>(in_ptr0[0]);
            auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
            auto tmp2 = tmp0 / tmp1;
            auto tmp3 = tmp2.floor();
            auto tmp4 = tmp2.trunc();
            tmp2.store(out_ptr0 + 8*i0);
            tmp3.store(out_ptr1 + 8*i0);
            tmp4.store(out_ptr2 + 8*i0);
            tmp2.store(out_ptr3 + 8*i0);
            tmp3.store(out_ptr4 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=8; i0<10; i0+=1)
        {
            auto tmp0 = in_ptr0[0];
            auto tmp1 = in_ptr1[i0];
            auto tmp2 = tmp0 / tmp1;
            auto tmp3 = std::floor(tmp2);
            auto tmp4 = std::trunc(tmp2);
            out_ptr0[i0] = tmp2;
            out_ptr1[i0] = tmp3;
            out_ptr2[i0] = tmp4;
            out_ptr3[i0] = tmp2;
            out_ptr4[i0] = tmp3;
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf2 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf3 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf4 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()))
    del arg0_1
    del arg1_1
    return (buf0, buf1, buf2, buf3, buf4, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

[2023-01-11 21:26:02,196] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 78
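In both graph-78 kernels the 0-dim operand is splatted across all lanes once via the Vectorized<float> broadcast constructor (e.g. Vectorized<float>(in_ptr1[0])) rather than loadu, and for floating-point inputs the rounding modes reduce to per-lane floor()/trunc() of the true quotient. The eager-mode equivalent of the five outputs (illustrative):

import torch

x = torch.randn(10)
s = torch.tensor(2.0)   # 0-dim divisor, broadcast over x
out0 = x / s
out1 = torch.div(x, s, rounding_mode="floor")  # tmp2.floor()
out2 = torch.div(x, s, rounding_mode="trunc")  # tmp2.trunc()
torch.testing.assert_close(out1, (x / s).floor())
torch.testing.assert_close(out2, (x / s).trunc())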
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:26:02,241] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 79
[2023-01-11 21:26:03,923] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 79

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0,
                       long* __restrict__ out_ptr1,
                       long* __restrict__ out_ptr2,
                       float* __restrict__ out_ptr3,
                       long* __restrict__ out_ptr4)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<10; i0+=1)
        {
            {
                {
                    auto tmp0 = in_ptr0[i0];
                    auto tmp2 = in_ptr1[0];
                    auto tmp1 = static_cast<float>(tmp0);
                    auto tmp3 = static_cast<float>(tmp2);
                    auto tmp4 = tmp1 / tmp3;
                    auto tmp5 = ((tmp0 < 0) != (tmp2 < 0) ? (tmp0 % tmp2 != 0 ? tmp0 / tmp2 - 1 : tmp0 / tmp2) : tmp0 / tmp2);
                    auto tmp6 = tmp0 / tmp2;
                    out_ptr0[i0] = tmp4;
                    out_ptr1[i0] = tmp5;
                    out_ptr2[i0] = tmp6;
                    out_ptr3[i0] = tmp4;
                    out_ptr4[i0] = tmp5;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.int64)
    buf2 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.int64)
    buf3 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf4 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.int64)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()))
    del arg0_1
    del arg1_1
    return (buf0, buf1, buf2, buf3, buf4, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int64)
    arg1_1 = rand_strided((), (), device='cpu', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0,
                       long* __restrict__ out_ptr1,
                       long* __restrict__ out_ptr2,
                       float* __restrict__ out_ptr3,
                       long* __restrict__ out_ptr4)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<10; i0+=1)
        {
            {
                {
                    auto tmp0 = in_ptr0[0];
                    auto tmp2 = in_ptr1[i0];
                    auto tmp1 = static_cast<float>(tmp0);
                    auto tmp3 = static_cast<float>(tmp2);
                    auto tmp4 = tmp1 / tmp3;
                    auto tmp5 = ((tmp0 < 0) != (tmp2 < 0) ? (tmp0 % tmp2 != 0 ? tmp0 / tmp2 - 1 : tmp0 / tmp2) : tmp0 / tmp2);
                    auto tmp6 = tmp0 / tmp2;
                    out_ptr0[i0] = tmp4;
                    out_ptr1[i0] = tmp5;
                    out_ptr2[i0] = tmp6;
                    out_ptr3[i0] = tmp4;
                    out_ptr4[i0] = tmp5;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.int64)
    buf2 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.int64)
    buf3 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf4 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.int64)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()))
    del arg0_1
    del arg1_1
    return (buf0, buf1, buf2, buf3, buf4, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((), (), device='cpu', dtype=torch.int64)
    arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (7.180s)
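Every call() in these dumps follows the same memory discipline: args.clear() drops the argument list's references on entry, outputs are allocated with empty_strided, raw data_ptr()s are passed to the C kernel, and del argN_1 releases each input as soon as the kernel has consumed it. A stripped-down version of the pattern (simplified sketch, not the generated code):

import torch
from ctypes import c_void_p

def call(args):
    arg0_1, = args
    args.clear()                       # caller's list no longer pins the input
    buf0 = torch.empty_strided((10,), (1,), dtype=torch.float32)
    ptr = c_void_p(arg0_1.data_ptr())  # raw pointer handed to the C kernel
    # ... compiled kernel would run here ...
    del arg0_1                         # input is dead; its storage can be reused
    return (buf0,)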
test_dropout_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')`
  warnings.warn(
[2023-01-11 21:26:03,993] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 80
[2023-01-11 21:26:03,994] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager
[2023-01-11 21:26:05,666] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 80
[2023-01-11 21:26:05,731] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 81
[2023-01-11 21:26:05,731] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager
[2023-01-11 21:26:05,740] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 81

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ seed0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<1000; i0+=1)
        {
            {
                {
                    auto tmp0 = seed0[0];
                    auto tmp6 = in_ptr1[i0];
                    auto tmp1 = static_cast<long>(i0);
                    auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));;
                    auto tmp3 = static_cast<float>(0.5);
                    auto tmp4 = tmp2 > tmp3;
                    auto tmp5 = static_cast<float>(tmp4);
                    auto tmp7 = tmp5 * tmp6;
                    auto tmp8 = static_cast<float>(2.0);
                    auto tmp9 = tmp7 * tmp8;
                    out_ptr0[i0] = tmp9;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cpu_None)
    buf0 = empty_strided((1000, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(seed_cpu_None.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg0_1
    return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    seed_cpu_None = rand_strided((), (), device='cpu', dtype=torch.int64)
    arg0_1 = rand_strided((1000, ), (1, ), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ seed0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<1000; i0+=1)
        {
            {
                {
                    auto tmp0 = seed0[0];
                    auto tmp6 = in_ptr1[i0];
                    auto tmp1 = static_cast<long>(i0);
                    auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));;
                    auto tmp3 = static_cast<float>(0.5);
                    auto tmp4 = tmp2 > tmp3;
                    auto tmp5 = static_cast<float>(tmp4);
                    auto tmp7 = tmp5 * tmp6;
                    auto tmp8 = static_cast<float>(2.0);
                    auto tmp9 = tmp7 * tmp8;
                    out_ptr0[i0] = tmp9;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cpu_None)
    buf0 = empty_strided((1000, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(seed_cpu_None.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg0_1
    return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    seed_cpu_None = rand_strided((), (), device='cpu', dtype=torch.int64)
    arg0_1 = rand_strided((1000, ), (1, ), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

ok (1.811s)
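The dropout kernels are fully deterministic given the seed tensor: each element draws normalized_rand_cpu(seed, i0), keeps the value when the draw exceeds p, and rescales survivors by 1/(1 - p), which for p = 0.5 is the 2.0 constant in the kernel. A minimal eager sketch of the same inverted-dropout arithmetic (using ordinary torch.rand in place of the generated RNG helper):

import torch

def inverted_dropout(x: torch.Tensor, p: float, seed: int) -> torch.Tensor:
    g = torch.Generator().manual_seed(seed)      # fixed seed -> reproducible mask
    keep = torch.rand(x.shape, generator=g) > p  # tmp4 = tmp2 > tmp3
    return keep.to(x.dtype) * x * (1.0 / (1.0 - p))

y = inverted_dropout(torch.randn(1000), p=0.5, seed=0)  # scale = 2.0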
test_dropout_deterministic_cpu (__main__.CpuTests) ... [2023-01-11 21:26:05,804] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 82
[2023-01-11 21:26:05,804] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager
[2023-01-11 21:26:07,483] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 82
[2023-01-11 21:26:07,546] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 83
[2023-01-11 21:26:07,546] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager
[2023-01-11 21:26:07,554] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 83

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ seed0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<1024; i0+=1)
        {
            {
                {
                    auto tmp0 = seed0[0];
                    auto tmp6 = in_ptr1[i0];
                    auto tmp1 = static_cast<long>(i0);
                    auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));;
                    auto tmp3 = static_cast<float>(0.55);
                    auto tmp4 = tmp2 > tmp3;
                    auto tmp5 = static_cast<float>(tmp4);
                    auto tmp7 = tmp5 * tmp6;
                    auto tmp8 = static_cast<float>(2.2222222222222223);
                    auto tmp9 = tmp7 * tmp8;
                    out_ptr0[i0] = tmp9;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cpu_None)
    buf0 = empty_strided((1024, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(seed_cpu_None.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg0_1
    return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    seed_cpu_None = rand_strided((), (), device='cpu', dtype=torch.int64)
    arg0_1 = rand_strided((1024, ), (1, ), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ seed0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<1024; i0+=1)
        {
            {
                {
                    auto tmp0 = seed0[0];
                    auto tmp6 = in_ptr1[i0];
                    auto tmp1 = static_cast<long>(i0);
                    auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));;
                    auto tmp3 = static_cast<float>(0.55);
                    auto tmp4 = tmp2 > tmp3;
                    auto tmp5 = static_cast<float>(tmp4);
                    auto tmp7 = tmp5 * tmp6;
                    auto tmp8 = static_cast<float>(2.2222222222222223);
                    auto tmp9 = tmp7 * tmp8;
                    out_ptr0[i0] = tmp9;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cpu_None)
    buf0 = empty_strided((1024, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(seed_cpu_None.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg0_1
    return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    seed_cpu_None = rand_strided((), (), device='cpu', dtype=torch.int64)
    arg0_1 = rand_strided((1024, ), (1, ), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

ok (1.815s)
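The deterministic variant differs only in the threshold and the scale: with p = 0.55 the survivor rescale 1/(1 - p) prints at double precision as exactly the constant embedded in the kernel:

p = 0.55
print(1.0 / (1.0 - p))  # 2.2222222222222223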
parallel num_threads(8) 2023-01-11T21:38:05.8937366Z { 2023-01-11T21:38:05.8937449Z #pragma omp for 2023-01-11T21:38:05.8937530Z for(long i0=0; i0<4096; i0+=1) 2023-01-11T21:38:05.8937598Z { 2023-01-11T21:38:05.8937667Z { 2023-01-11T21:38:05.8937736Z { 2023-01-11T21:38:05.8937999Z float tmp5 = -std::numeric_limits::infinity(); 2023-01-11T21:38:05.8938098Z for(long i1=0; i1<64; i1+=1) 2023-01-11T21:38:05.8938171Z { 2023-01-11T21:38:05.8938238Z { 2023-01-11T21:38:05.8938353Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:05.8938473Z auto tmp1 = static_cast(63); 2023-01-11T21:38:05.8938575Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:05.8938668Z float tmp3 = 0.0; 2023-01-11T21:38:05.8938750Z if(tmp2) 2023-01-11T21:38:05.8938829Z { 2023-01-11T21:38:05.8938943Z auto tmp4 = in_ptr0[i1 + (63*i0)]; 2023-01-11T21:38:05.8939027Z tmp3 = tmp4; 2023-01-11T21:38:05.8939105Z } 2023-01-11T21:38:05.8939216Z tmp5 = std::max(tmp5, tmp3); 2023-01-11T21:38:05.8939289Z } 2023-01-11T21:38:05.8939359Z } 2023-01-11T21:38:05.8939451Z out_ptr0[i0] = tmp5; 2023-01-11T21:38:05.8939524Z } 2023-01-11T21:38:05.8939585Z } 2023-01-11T21:38:05.8939654Z } 2023-01-11T21:38:05.8939739Z #pragma omp for 2023-01-11T21:38:05.8939827Z for(long i0=0; i0<4096; i0+=1) 2023-01-11T21:38:05.8939895Z { 2023-01-11T21:38:05.8939962Z { 2023-01-11T21:38:05.8940024Z { 2023-01-11T21:38:05.8940111Z float tmp8 = 0; 2023-01-11T21:38:05.8940257Z for(long i1=0; i1<64; i1+=1) 2023-01-11T21:38:05.8940330Z { 2023-01-11T21:38:05.8940404Z { 2023-01-11T21:38:05.8940510Z auto tmp5 = out_ptr0[i0]; 2023-01-11T21:38:05.8940624Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:05.8940728Z auto tmp1 = static_cast(63); 2023-01-11T21:38:05.8940829Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:05.8940923Z float tmp3 = 0.0; 2023-01-11T21:38:05.8941006Z if(tmp2) 2023-01-11T21:38:05.8941082Z { 2023-01-11T21:38:05.8941198Z auto tmp4 = in_ptr0[i1 + (63*i0)]; 2023-01-11T21:38:05.8941291Z tmp3 = tmp4; 2023-01-11T21:38:05.8941359Z } 2023-01-11T21:38:05.8941507Z auto tmp6 = tmp3 - tmp5; 2023-01-11T21:38:05.8941621Z auto tmp7 = std::exp(tmp6); 2023-01-11T21:38:05.8941728Z out_ptr1[i1 + (64*i0)] = tmp7; 2023-01-11T21:38:05.8941818Z tmp8 += tmp7; 2023-01-11T21:38:05.8941892Z } 2023-01-11T21:38:05.8941965Z } 2023-01-11T21:38:05.8942047Z out_ptr2[i0] = tmp8; 2023-01-11T21:38:05.8942121Z } 2023-01-11T21:38:05.8942188Z } 2023-01-11T21:38:05.8942256Z } 2023-01-11T21:38:05.8942338Z #pragma omp for 2023-01-11T21:38:05.8942427Z for(long i0=0; i0<4096; i0+=1) 2023-01-11T21:38:05.8942495Z { 2023-01-11T21:38:05.8942621Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:05.8942690Z { 2023-01-11T21:38:05.8942842Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr1 + (8*i1) + (64*i0)); 2023-01-11T21:38:05.8942976Z auto tmp1 = at::vec::Vectorized(out_ptr2[i0]); 2023-01-11T21:38:05.8943073Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.8943189Z tmp2.store(in_out_ptr0 + (8*i1) + (64*i0)); 2023-01-11T21:38:05.8943259Z } 2023-01-11T21:38:05.8943347Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8943441Z for(long i1=64; i1<64; i1+=1) 2023-01-11T21:38:05.8943525Z { 2023-01-11T21:38:05.8943655Z auto tmp0 = out_ptr1[i1 + (64*i0)]; 2023-01-11T21:38:05.8943777Z auto tmp1 = out_ptr2[i0]; 2023-01-11T21:38:05.8943895Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.8944021Z in_out_ptr0[i1 + (64*i0)] = tmp2; 2023-01-11T21:38:05.8944089Z } 2023-01-11T21:38:05.8944159Z } 2023-01-11T21:38:05.8944228Z } 2023-01-11T21:38:05.8944297Z } 2023-01-11T21:38:05.8944392Z ''') 2023-01-11T21:38:05.8944398Z 2023-01-11T21:38:05.8944403Z 
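# Editor's note: the kernel above is a three-pass softmax in the numerically
# stable max/exp/sum form; the 63-wide input rows are zero-padded to a 64-wide
# buffer via the tmp2 = tmp0 < tmp1 bounds check. A rough eager-mode sketch of
# the same computation, ignoring the padding lane (hypothetical reference code,
# not part of the generated module):
def _softmax_reference(x):
    m = x.max(dim=-1, keepdim=True).values   # pass 1: per-row max
    e = (x - m).exp()                        # pass 2: exp(x - max) and row sums
    return e / e.sum(dim=-1, keepdim=True)   # pass 3: normalize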
2023-01-11T21:38:05.8944500Z async_compile.wait(globals()) 2023-01-11T21:38:05.8944581Z del async_compile 2023-01-11T21:38:05.8944591Z 2023-01-11T21:38:05.8944670Z def call(args): 2023-01-11T21:38:05.8944742Z arg0_1, = args 2023-01-11T21:38:05.8944821Z args.clear() 2023-01-11T21:38:05.8945037Z buf0 = empty_strided((128, 32, 1), (32, 1, 4096), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8945256Z buf1 = empty_strided((128, 32, 64), (2048, 64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8945468Z buf2 = empty_strided((128, 32, 1), (32, 1, 4096), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8945564Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:05.8945760Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.8945831Z del arg0_1 2023-01-11T21:38:05.8945912Z return (buf3, ) 2023-01-11T21:38:05.8945917Z 2023-01-11T21:38:05.8945921Z 2023-01-11T21:38:05.8946016Z if __name__ == "__main__": 2023-01-11T21:38:05.8946139Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8946319Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8946533Z arg0_1 = rand_strided((128, 32, 63), (2016, 63, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8946653Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8946658Z 2023-01-11T21:38:05.8946731Z ok (1.750s) 2023-01-11T21:38:05.8947263Z test_elu_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8947403Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8947699Z [2023-01-11 21:26:09,347] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 85 2023-01-11T21:38:05.8948000Z [2023-01-11 21:26:11,027] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 85 2023-01-11T21:38:05.8948006Z 2023-01-11T21:38:05.8948110Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8948188Z import torch 2023-01-11T21:38:05.8948266Z import random 2023-01-11T21:38:05.8948394Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8948528Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8948533Z 2023-01-11T21:38:05.8948610Z aten = torch.ops.aten 2023-01-11T21:38:05.8948761Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8948896Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8948901Z 2023-01-11T21:38:05.8948981Z import triton 2023-01-11T21:38:05.8949080Z import triton.language as tl 2023-01-11T21:38:05.8949225Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8949368Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8949375Z 2023-01-11T21:38:05.8949380Z 2023-01-11T21:38:05.8949519Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8949719Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8949845Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8949951Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8950061Z float* 
__restrict__ out_ptr1)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<256; i0+=1)
        {
            {
                {
                    auto tmp0 = in_ptr0[i0];
                    auto tmp1 = static_cast<float>(0);
                    auto tmp2 = tmp0 > tmp1;
                    auto tmp3 = static_cast<float>(1.0507009873554805);
                    auto tmp4 = tmp0 * tmp3;
                    auto tmp5 = static_cast<float>(1.0);
                    auto tmp6 = tmp0 * tmp5;
                    auto tmp7 = std::expm1(tmp6);
                    auto tmp8 = static_cast<float>(1.7580993408473766);
                    auto tmp9 = tmp7 * tmp8;
                    auto tmp10 = tmp2 ? tmp4 : tmp9;
                    auto tmp11 = static_cast<float>(2);
                    auto tmp12 = tmp10 + tmp11;
                    auto tmp13 = static_cast<float>(1);
                    auto tmp14 = tmp0 + tmp13;
                    auto tmp15 = tmp14 > tmp1;
                    auto tmp16 = static_cast<float>(3);
                    auto tmp17 = tmp14 * tmp16;
                    auto tmp18 = static_cast<float>(4);
                    auto tmp19 = tmp14 * tmp18;
                    auto tmp20 = std::expm1(tmp19);
                    auto tmp21 = static_cast<float>(6);
                    auto tmp22 = tmp20 * tmp21;
                    auto tmp23 = tmp15 ? tmp17 : tmp22;
                    out_ptr0[i0] = tmp12;
                    out_ptr1[i0] = tmp23;
                }
            }
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
    del arg0_1
    return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

ok (1.723s)
test_embedding_bag_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated.
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8956474Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8956736Z [2023-01-11 21:26:11,052] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 86 2023-01-11T21:38:05.8956971Z [2023-01-11 21:26:11,066] torch._inductor.ir: [WARNING] Using FallbackKernel: aten._embedding_bag 2023-01-11T21:38:05.8957236Z [2023-01-11 21:26:11,069] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 86 2023-01-11T21:38:05.8957242Z 2023-01-11T21:38:05.8957345Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8957424Z import torch 2023-01-11T21:38:05.8957506Z import random 2023-01-11T21:38:05.8957621Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8957750Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8957758Z 2023-01-11T21:38:05.8957844Z aten = torch.ops.aten 2023-01-11T21:38:05.8957983Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8958082Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8958087Z 2023-01-11T21:38:05.8958164Z import triton 2023-01-11T21:38:05.8958291Z import triton.language as tl 2023-01-11T21:38:05.8958411Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8958552Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8958558Z 2023-01-11T21:38:05.8958562Z 2023-01-11T21:38:05.8958657Z async_compile.wait(globals()) 2023-01-11T21:38:05.8958735Z del async_compile 2023-01-11T21:38:05.8958740Z 2023-01-11T21:38:05.8958816Z def call(args): 2023-01-11T21:38:05.8958905Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.8958982Z args.clear() 2023-01-11T21:38:05.8959095Z buf0 = aten._embedding_bag(arg0_1, arg1_1, arg2_1) 2023-01-11T21:38:05.8959160Z del arg0_1 2023-01-11T21:38:05.8959236Z del arg1_1 2023-01-11T21:38:05.8959307Z del arg2_1 2023-01-11T21:38:05.8959381Z buf1 = buf0[0] 2023-01-11T21:38:05.8959481Z assert_size_stride(buf1, (3, 4), (4, 1)) 2023-01-11T21:38:05.8959556Z buf2 = buf0[1] 2023-01-11T21:38:05.8959656Z assert_size_stride(buf2, (0, ), (1, )) 2023-01-11T21:38:05.8959724Z buf3 = buf0[2] 2023-01-11T21:38:05.8959824Z assert_size_stride(buf3, (3, ), (1, )) 2023-01-11T21:38:05.8959899Z buf4 = buf0[3] 2023-01-11T21:38:05.8959994Z assert_size_stride(buf4, (3, ), (1, )) 2023-01-11T21:38:05.8960066Z del buf0 2023-01-11T21:38:05.8960159Z return (buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:05.8960165Z 2023-01-11T21:38:05.8960169Z 2023-01-11T21:38:05.8960250Z if __name__ == "__main__": 2023-01-11T21:38:05.8960361Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8960488Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8960689Z arg0_1 = rand_strided((10, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8960915Z arg1_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8961106Z arg2_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8961235Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.8961243Z 2023-01-11T21:38:05.8961317Z ok (0.041s) 2023-01-11T21:38:05.8961769Z test_embedding_cpu (__main__.CpuTests) ... 
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8961899Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8962149Z [2023-01-11 21:26:11,172] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 87 2023-01-11T21:38:05.8962412Z [2023-01-11 21:26:12,886] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 87 2023-01-11T21:38:05.8962418Z 2023-01-11T21:38:05.8962519Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8962594Z import torch 2023-01-11T21:38:05.8962673Z import random 2023-01-11T21:38:05.8962794Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8962919Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8962924Z 2023-01-11T21:38:05.8963008Z aten = torch.ops.aten 2023-01-11T21:38:05.8963137Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8963234Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8963239Z 2023-01-11T21:38:05.8963314Z import triton 2023-01-11T21:38:05.8963409Z import triton.language as tl 2023-01-11T21:38:05.8963534Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8963675Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8963680Z 2023-01-11T21:38:05.8963684Z 2023-01-11T21:38:05.8963822Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8964029Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8964176Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.8964289Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8964394Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8964498Z bool* __restrict__ out_ptr1, 2023-01-11T21:38:05.8964599Z long* __restrict__ out_ptr2) 2023-01-11T21:38:05.8964666Z { 2023-01-11T21:38:05.8964770Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8964830Z { 2023-01-11T21:38:05.8964913Z #pragma omp for 2023-01-11T21:38:05.8965002Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.8965071Z { 2023-01-11T21:38:05.8965164Z #pragma GCC ivdep 2023-01-11T21:38:05.8965252Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.8965319Z { 2023-01-11T21:38:05.8965381Z { 2023-01-11T21:38:05.8965453Z { 2023-01-11T21:38:05.8965554Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8965669Z auto tmp1 = in_ptr1[i1 + (4*tmp0)]; 2023-01-11T21:38:05.8965772Z auto tmp2 = tmp1 * (tmp1>0); 2023-01-11T21:38:05.8965873Z out_ptr0[i1 + (4*i0)] = tmp2; 2023-01-11T21:38:05.8965946Z } 2023-01-11T21:38:05.8966009Z } 2023-01-11T21:38:05.8966075Z } 2023-01-11T21:38:05.8966142Z } 2023-01-11T21:38:05.8966223Z #pragma omp for 2023-01-11T21:38:05.8966311Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8966376Z { 2023-01-11T21:38:05.8966437Z { 2023-01-11T21:38:05.8966506Z { 2023-01-11T21:38:05.8966638Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.8966749Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8966848Z auto tmp2 = tmp0 <= tmp1; 2023-01-11T21:38:05.8966940Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.8967009Z } 2023-01-11T21:38:05.8967072Z } 2023-01-11T21:38:05.8967142Z } 2023-01-11T21:38:05.8967224Z #pragma omp 
for 2023-01-11T21:38:05.8967309Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.8967377Z { 2023-01-11T21:38:05.8967444Z { 2023-01-11T21:38:05.8967513Z { 2023-01-11T21:38:05.8967602Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8967718Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:05.8967828Z auto tmp2 = static_cast(tmp1); 2023-01-11T21:38:05.8967918Z out_ptr2[i0] = tmp2; 2023-01-11T21:38:05.8967990Z } 2023-01-11T21:38:05.8968057Z } 2023-01-11T21:38:05.8968126Z } 2023-01-11T21:38:05.8968185Z } 2023-01-11T21:38:05.8968250Z } 2023-01-11T21:38:05.8968338Z ''') 2023-01-11T21:38:05.8968344Z 2023-01-11T21:38:05.8968348Z 2023-01-11T21:38:05.8968444Z async_compile.wait(globals()) 2023-01-11T21:38:05.8968530Z del async_compile 2023-01-11T21:38:05.8968535Z 2023-01-11T21:38:05.8968610Z def call(args): 2023-01-11T21:38:05.8968704Z primals_1, primals_2 = args 2023-01-11T21:38:05.8968772Z args.clear() 2023-01-11T21:38:05.8968981Z buf0 = empty_strided((2, 8, 4), (32, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8969181Z buf1 = empty_strided((2, 8, 4), (32, 4, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8969374Z buf2 = empty_strided((2, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8969602Z kernel_cpp_0(c_void_p(primals_2.data_ptr()), c_void_p(primals_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.8969686Z del primals_1 2023-01-11T21:38:05.8969762Z del primals_2 2023-01-11T21:38:05.8969850Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:05.8969855Z 2023-01-11T21:38:05.8969860Z 2023-01-11T21:38:05.8969933Z if __name__ == "__main__": 2023-01-11T21:38:05.8970095Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8970226Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8970431Z primals_1 = rand_strided((10, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8970629Z primals_2 = rand_strided((2, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8970762Z print_performance(lambda: call([primals_1, primals_2])) 2023-01-11T21:38:05.8970767Z 2023-01-11T21:38:05.8970837Z ok (1.818s) 2023-01-11T21:38:05.8971286Z test_exp_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:26:12,905] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 88
[2023-01-11 21:26:14,603] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 88

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<8; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            auto tmp2 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
            auto tmp1 = tmp0.exp();
            auto tmp3 = tmp0 + tmp2;
            auto tmp4 = tmp3.exp();
            tmp1.store(out_ptr0 + 8*i0);
            tmp4.store(out_ptr1 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=64; i0<64; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp2 = in_ptr1[i0];
            auto tmp1 = std::exp(tmp0);
            auto tmp3 = tmp0 + tmp2;
            auto tmp4 = std::exp(tmp3);
            out_ptr0[i0] = tmp1;
            out_ptr1[i0] = tmp4;
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
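    # Editor's note: buf0 = exp(arg0_1) and buf1 = exp(arg0_1 + arg1_1), both
    # (8, 8) float32; the vectorized loop in the kernel above covers all 64
    # elements, so the scalar tail loop (i0 from 64 to 64) never runs.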
2023-01-11T21:38:05.8977262Z args.clear() 2023-01-11T21:38:05.8977470Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8977665Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8977865Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8977939Z del arg0_1 2023-01-11T21:38:05.8978011Z del arg1_1 2023-01-11T21:38:05.8978094Z return (buf0, buf1, ) 2023-01-11T21:38:05.8978099Z 2023-01-11T21:38:05.8978104Z 2023-01-11T21:38:05.8978177Z if __name__ == "__main__": 2023-01-11T21:38:05.8978298Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8978425Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8978622Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8978815Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8978990Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8978995Z 2023-01-11T21:38:05.8979074Z ok (1.723s) 2023-01-11T21:38:05.8979532Z test_expand_as_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8979665Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8979914Z [2023-01-11 21:26:14,672] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 89 2023-01-11T21:38:05.8980174Z [2023-01-11 21:26:16,362] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 89 2023-01-11T21:38:05.8980179Z 2023-01-11T21:38:05.8980281Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8980358Z import torch 2023-01-11T21:38:05.8980435Z import random 2023-01-11T21:38:05.8980554Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8980682Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8980687Z 2023-01-11T21:38:05.8980770Z aten = torch.ops.aten 2023-01-11T21:38:05.8980900Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8980996Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8981001Z 2023-01-11T21:38:05.8981076Z import triton 2023-01-11T21:38:05.8981169Z import triton.language as tl 2023-01-11T21:38:05.8981294Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8981435Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8981441Z 2023-01-11T21:38:05.8981445Z 2023-01-11T21:38:05.8981584Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8981791Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8981911Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8982018Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8982085Z { 2023-01-11T21:38:05.8982186Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8982254Z { 2023-01-11T21:38:05.8982382Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8982474Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:05.8982535Z { 2023-01-11T21:38:05.8982626Z for(long i1=0; 
i1<128; i1+=1) 2023-01-11T21:38:05.8982697Z { 2023-01-11T21:38:05.8982790Z for(long i2=0; i2<12; i2+=1) 2023-01-11T21:38:05.8982863Z { 2023-01-11T21:38:05.8983014Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i2) + (100*i0)); 2023-01-11T21:38:05.8983160Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.8983261Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8983349Z auto tmp3 = tmp2 + tmp1; 2023-01-11T21:38:05.8983470Z tmp3.store(out_ptr0 + (8*i2) + (100*i1) + (12800*i0)); 2023-01-11T21:38:05.8983541Z } 2023-01-11T21:38:05.8983643Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8983742Z for(long i2=96; i2<100; i2+=1) 2023-01-11T21:38:05.8983813Z { 2023-01-11T21:38:05.8983920Z auto tmp0 = in_ptr0[i2 + (100*i0)]; 2023-01-11T21:38:05.8984021Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.8984118Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8984215Z auto tmp3 = tmp2 + tmp1; 2023-01-11T21:38:05.8984327Z out_ptr0[i2 + (100*i1) + (12800*i0)] = tmp3; 2023-01-11T21:38:05.8984398Z } 2023-01-11T21:38:05.8984467Z } 2023-01-11T21:38:05.8984536Z } 2023-01-11T21:38:05.8984626Z } 2023-01-11T21:38:05.8984689Z } 2023-01-11T21:38:05.8984775Z ''') 2023-01-11T21:38:05.8984780Z 2023-01-11T21:38:05.8984785Z 2023-01-11T21:38:05.8984883Z async_compile.wait(globals()) 2023-01-11T21:38:05.8984980Z del async_compile 2023-01-11T21:38:05.8984987Z 2023-01-11T21:38:05.8985065Z def call(args): 2023-01-11T21:38:05.8985169Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8985240Z args.clear() 2023-01-11T21:38:05.8985458Z buf0 = empty_strided((6, 128, 100), (12800, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8985598Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8985714Z return (as_strided(arg0_1, (6, 128, 100), (100, 0, 1)), buf0, ) 2023-01-11T21:38:05.8985719Z 2023-01-11T21:38:05.8985723Z 2023-01-11T21:38:05.8985808Z if __name__ == "__main__": 2023-01-11T21:38:05.8985925Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8986053Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8986268Z arg0_1 = rand_strided((6, 1, 100), (100, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8986475Z arg1_1 = rand_strided((6, 128, 100), (12800, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8986593Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8986599Z 2023-01-11T21:38:05.8986676Z ok (1.755s) 2023-01-11T21:38:05.8987130Z test_expand_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8987264Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8987518Z [2023-01-11 21:26:16,389] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 90 2023-01-11T21:38:05.8987782Z [2023-01-11 21:26:18,090] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 90 2023-01-11T21:38:05.8987787Z 2023-01-11T21:38:05.8987888Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8987963Z import torch 2023-01-11T21:38:05.8988031Z import random 2023-01-11T21:38:05.8988222Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8988350Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8988356Z 2023-01-11T21:38:05.8988438Z aten = torch.ops.aten 2023-01-11T21:38:05.8988574Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8988671Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8988676Z 2023-01-11T21:38:05.8988752Z import triton 2023-01-11T21:38:05.8988846Z import triton.language as tl 2023-01-11T21:38:05.8988965Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8989107Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8989115Z 2023-01-11T21:38:05.8989120Z 2023-01-11T21:38:05.8989259Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8989469Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8989596Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8989702Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8989806Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8989873Z { 2023-01-11T21:38:05.8989967Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8990034Z { 2023-01-11T21:38:05.8990129Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8990217Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:05.8990284Z { 2023-01-11T21:38:05.8990372Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.8990433Z { 2023-01-11T21:38:05.8990522Z #pragma GCC ivdep 2023-01-11T21:38:05.8990644Z for(long i2=0; i2<3; i2+=1) 2023-01-11T21:38:05.8990718Z { 2023-01-11T21:38:05.8990809Z #pragma GCC ivdep 2023-01-11T21:38:05.8990907Z for(long i3=0; i3<2; i3+=1) 2023-01-11T21:38:05.8990977Z { 2023-01-11T21:38:05.8991045Z { 2023-01-11T21:38:05.8991122Z { 2023-01-11T21:38:05.8991234Z auto tmp0 = in_ptr0[i3 + (2*i1)]; 2023-01-11T21:38:05.8991353Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.8991457Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8991571Z auto tmp3 = static_cast(2); 2023-01-11T21:38:05.8991674Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.8991792Z out_ptr0[i3 + (2*i2) + (6*i1) + (12*i0)] = tmp4; 2023-01-11T21:38:05.8991863Z } 2023-01-11T21:38:05.8991937Z } 2023-01-11T21:38:05.8992010Z } 2023-01-11T21:38:05.8992078Z } 2023-01-11T21:38:05.8992146Z } 2023-01-11T21:38:05.8992213Z } 2023-01-11T21:38:05.8992301Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8992394Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.8992460Z { 2023-01-11T21:38:05.8992548Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.8992615Z { 2023-01-11T21:38:05.8992704Z #pragma GCC ivdep 2023-01-11T21:38:05.8992797Z for(long i2=0; i2<3; i2+=1) 2023-01-11T21:38:05.8992859Z { 2023-01-11T21:38:05.8992947Z #pragma GCC ivdep 2023-01-11T21:38:05.8993042Z for(long i3=0; i3<2; i3+=1) 2023-01-11T21:38:05.8993114Z { 
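// Editor's note: i0 and i2 never index in_ptr0 in this loop nest; those dims
// are stride-0 broadcasts of the (2, 1, 2) input, so each output element is
// simply in_ptr0[i3 + 2*i1] + 2 written into out_ptr1.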
2023-01-11T21:38:05.8993188Z { 2023-01-11T21:38:05.8993267Z { 2023-01-11T21:38:05.8993380Z auto tmp0 = in_ptr0[i3 + (2*i1)]; 2023-01-11T21:38:05.8993489Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.8993593Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8993739Z out_ptr1[i3 + (2*i2) + (6*i1) + (12*i0)] = tmp2; 2023-01-11T21:38:05.8993817Z } 2023-01-11T21:38:05.8993891Z } 2023-01-11T21:38:05.8993962Z } 2023-01-11T21:38:05.8994031Z } 2023-01-11T21:38:05.8994091Z } 2023-01-11T21:38:05.8994162Z } 2023-01-11T21:38:05.8994232Z } 2023-01-11T21:38:05.8994300Z } 2023-01-11T21:38:05.8994386Z ''') 2023-01-11T21:38:05.8994392Z 2023-01-11T21:38:05.8994396Z 2023-01-11T21:38:05.8994491Z async_compile.wait(globals()) 2023-01-11T21:38:05.8994569Z del async_compile 2023-01-11T21:38:05.8994575Z 2023-01-11T21:38:05.8994644Z def call(args): 2023-01-11T21:38:05.8994723Z arg0_1, = args 2023-01-11T21:38:05.8994801Z args.clear() 2023-01-11T21:38:05.8995019Z buf0 = empty_strided((3, 4, 2, 3, 2), (48, 12, 6, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8995238Z buf1 = empty_strided((2, 1, 2, 3, 2), (12, 12, 6, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8995410Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8995535Z return (buf0, buf1, as_strided(arg0_1, (2, 2, 5, 2), (0, 2, 0, 1)), ) 2023-01-11T21:38:05.8995541Z 2023-01-11T21:38:05.8995545Z 2023-01-11T21:38:05.8995625Z if __name__ == "__main__": 2023-01-11T21:38:05.8995736Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8995863Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8996065Z arg0_1 = rand_strided((2, 1, 2), (2, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8996179Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8996213Z 2023-01-11T21:38:05.8996288Z ok (1.726s) 2023-01-11T21:38:05.8996924Z test_expanded_reduction_cpu (__main__.CpuTests) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/87157 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T21:38:05.8997376Z test_expm1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8997509Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8997769Z [2023-01-11 21:26:18,110] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 91 2023-01-11T21:38:05.8998032Z [2023-01-11 21:26:19,787] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 91 2023-01-11T21:38:05.8998451Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8998584Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8998829Z [2023-01-11 21:26:19,804] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 92 2023-01-11T21:38:05.8999087Z [2023-01-11 21:26:21,484] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 92 2023-01-11T21:38:05.8999534Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8999668Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8999920Z [2023-01-11 21:26:21,500] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 93 2023-01-11T21:38:05.9000179Z [2023-01-11 21:26:23,231] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 93 2023-01-11T21:38:05.9000593Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9000727Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9000984Z [2023-01-11 21:26:23,247] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 94 2023-01-11T21:38:05.9001243Z [2023-01-11 21:26:24,952] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 94 2023-01-11T21:38:05.9001655Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9001784Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9002051Z [2023-01-11 21:26:24,968] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 95 2023-01-11T21:38:05.9002069Z 2023-01-11T21:38:05.9002161Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9002237Z import torch 2023-01-11T21:38:05.9002312Z import random 2023-01-11T21:38:05.9002433Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9002558Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9002564Z 2023-01-11T21:38:05.9002647Z aten = torch.ops.aten 2023-01-11T21:38:05.9002788Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9002876Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9002881Z 2023-01-11T21:38:05.9002956Z import triton 2023-01-11T21:38:05.9003050Z import triton.language as tl 2023-01-11T21:38:05.9003177Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9003318Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9003327Z 2023-01-11T21:38:05.9003331Z 2023-01-11T21:38:05.9003469Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9003678Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9003801Z extern "C" void kernel(const half* __restrict__ in_ptr0, 2023-01-11T21:38:05.9003899Z half* __restrict__ out_ptr0, 2023-01-11T21:38:05.9004001Z half* __restrict__ out_ptr1) 2023-01-11T21:38:05.9004067Z { 2023-01-11T21:38:05.9004171Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9004237Z { 2023-01-11T21:38:05.9004319Z #pragma omp for 2023-01-11T21:38:05.9004406Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9004466Z { 2023-01-11T21:38:05.9004535Z { 2023-01-11T21:38:05.9004605Z { 2023-01-11T21:38:05.9004727Z auto tmp0 = static_cast(in_ptr0[i0]); 2023-01-11T21:38:05.9004837Z auto tmp1 = std::expm1(tmp0); 2023-01-11T21:38:05.9004949Z auto tmp2 = static_cast(2); 2023-01-11T21:38:05.9005046Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9005129Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9005219Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9005321Z } 2023-01-11T21:38:05.9005391Z } 2023-01-11T21:38:05.9005458Z } 2023-01-11T21:38:05.9005527Z } 2023-01-11T21:38:05.9005584Z } 2023-01-11T21:38:05.9005672Z ''') 2023-01-11T21:38:05.9005677Z 2023-01-11T21:38:05.9005682Z 2023-01-11T21:38:05.9005775Z async_compile.wait(globals()) 2023-01-11T21:38:05.9005852Z del async_compile 2023-01-11T21:38:05.9005857Z 2023-01-11T21:38:05.9005934Z def call(args): 2023-01-11T21:38:05.9006009Z arg0_1, = args 2023-01-11T21:38:05.9006086Z args.clear() 2023-01-11T21:38:05.9006283Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9006469Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9006638Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9006712Z del arg0_1 2023-01-11T21:38:05.9006793Z return (buf0, buf1, ) 2023-01-11T21:38:05.9006798Z 2023-01-11T21:38:05.9006802Z 2023-01-11T21:38:05.9006886Z if __name__ == "__main__": 2023-01-11T21:38:05.9007004Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9007132Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9007325Z 
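    # Editor's note: in the kernel above the half input is upcast to float for
    # std::expm1 and the results are narrowed back to half on store; the
    # static_cast calls in these dumps appear to have lost their template
    # arguments (e.g. static_cast<float>(...)) in the log rendering.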
arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9007430Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9007435Z 2023-01-11T21:38:05.9007439Z 2023-01-11T21:38:05.9007540Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9007614Z import torch 2023-01-11T21:38:05.9007690Z import random 2023-01-11T21:38:05.9007809Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9007961Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9007967Z 2023-01-11T21:38:05.9008049Z aten = torch.ops.aten 2023-01-11T21:38:05.9008184Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9008272Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9008280Z 2023-01-11T21:38:05.9008353Z import triton 2023-01-11T21:38:05.9008446Z import triton.language as tl 2023-01-11T21:38:05.9008573Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9008716Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9008721Z 2023-01-11T21:38:05.9008725Z 2023-01-11T21:38:05.9008859Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9009064Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9009188Z extern "C" void kernel(const half* __restrict__ in_ptr0, 2023-01-11T21:38:05.9009282Z half* __restrict__ out_ptr0, 2023-01-11T21:38:05.9009389Z half* __restrict__ out_ptr1) 2023-01-11T21:38:05.9009457Z { 2023-01-11T21:38:05.9009559Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9009625Z { 2023-01-11T21:38:05.9009707Z #pragma omp for 2023-01-11T21:38:05.9009791Z for(long i0=0; i0<201; i0+=1) 2023-01-11T21:38:05.9009859Z { 2023-01-11T21:38:05.9009928Z { 2023-01-11T21:38:05.9009998Z { 2023-01-11T21:38:05.9010117Z auto tmp0 = static_cast(in_ptr0[i0]); 2023-01-11T21:38:05.9010224Z auto tmp1 = std::expm1(tmp0); 2023-01-11T21:38:05.9010331Z auto tmp2 = static_cast(2); 2023-01-11T21:38:05.9010420Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9010511Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9010602Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9010672Z } 2023-01-11T21:38:05.9010742Z } 2023-01-11T21:38:05.9010810Z } 2023-01-11T21:38:05.9010877Z } 2023-01-11T21:38:05.9010934Z } 2023-01-11T21:38:05.9011020Z ''') 2023-01-11T21:38:05.9011025Z 2023-01-11T21:38:05.9011029Z 2023-01-11T21:38:05.9011123Z async_compile.wait(globals()) 2023-01-11T21:38:05.9011200Z del async_compile 2023-01-11T21:38:05.9011247Z 2023-01-11T21:38:05.9011325Z def call(args): 2023-01-11T21:38:05.9011399Z arg0_1, = args 2023-01-11T21:38:05.9011474Z args.clear() 2023-01-11T21:38:05.9011660Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9011853Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9012020Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9012095Z del arg0_1 2023-01-11T21:38:05.9012176Z return (buf0, buf1, ) 2023-01-11T21:38:05.9012181Z 2023-01-11T21:38:05.9012185Z 2023-01-11T21:38:05.9012268Z if __name__ == "__main__": 2023-01-11T21:38:05.9012386Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9012512Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9012701Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9012818Z print_performance(lambda: call([arg0_1])) 
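Editor's note: test_expm1_cpu sweeps dtypes (float16, float32, float64, int32) and two sizes (64 and 201), so the dumps that follow differ only in dtype, trip count, and whether the main loop is vectorized; size 201 exercises the scalar tail loop after the 8-wide vectorized body. A minimal eager-mode equivalent of what every variant computes (hypothetical reference code, not Inductor output):

import torch

def expm1_times_two(x):
    # matches the generated kernels: out_ptr0 = expm1(x), out_ptr1 = expm1(x) * 2
    y = torch.expm1(x)
    return y, y * 2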
2023-01-11T21:38:05.9012824Z 2023-01-11T21:38:05.9012828Z 2023-01-11T21:38:05.9012926Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9013000Z import torch 2023-01-11T21:38:05.9013074Z import random 2023-01-11T21:38:05.9013192Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9013314Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9013319Z 2023-01-11T21:38:05.9013401Z aten = torch.ops.aten 2023-01-11T21:38:05.9013529Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9013625Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9013659Z 2023-01-11T21:38:05.9013733Z import triton 2023-01-11T21:38:05.9013826Z import triton.language as tl 2023-01-11T21:38:05.9013954Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9014096Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9014101Z 2023-01-11T21:38:05.9014106Z 2023-01-11T21:38:05.9014244Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9014450Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9014683Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9014792Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9014895Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9014961Z { 2023-01-11T21:38:05.9015063Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9015129Z { 2023-01-11T21:38:05.9015210Z #pragma omp for 2023-01-11T21:38:05.9015290Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9015361Z { 2023-01-11T21:38:05.9015505Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9015600Z auto tmp1 = tmp0.expm1(); 2023-01-11T21:38:05.9015738Z auto tmp2 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.9015829Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9015927Z tmp1.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9016028Z tmp3.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9016088Z } 2023-01-11T21:38:05.9016188Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9016276Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9016343Z { 2023-01-11T21:38:05.9016438Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9016537Z auto tmp1 = std::expm1(tmp0); 2023-01-11T21:38:05.9016634Z auto tmp2 = static_cast(2); 2023-01-11T21:38:05.9016724Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9016813Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9016898Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9016968Z } 2023-01-11T21:38:05.9017036Z } 2023-01-11T21:38:05.9017102Z } 2023-01-11T21:38:05.9017264Z ''') 2023-01-11T21:38:05.9017270Z 2023-01-11T21:38:05.9017283Z 2023-01-11T21:38:05.9017431Z async_compile.wait(globals()) 2023-01-11T21:38:05.9017526Z del async_compile 2023-01-11T21:38:05.9017532Z 2023-01-11T21:38:05.9017620Z def call(args): 2023-01-11T21:38:05.9017708Z arg0_1, = args 2023-01-11T21:38:05.9017799Z args.clear() 2023-01-11T21:38:05.9017996Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9018185Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9018342Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9018420Z del arg0_1 2023-01-11T21:38:05.9018501Z return (buf0, buf1, ) 2023-01-11T21:38:05.9018509Z 2023-01-11T21:38:05.9018514Z 2023-01-11T21:38:05.9018598Z if __name__ == "__main__": 2023-01-11T21:38:05.9018715Z from torch._dynamo.testing 
import rand_strided 2023-01-11T21:38:05.9018842Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9019035Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9019150Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9019155Z 2023-01-11T21:38:05.9019159Z 2023-01-11T21:38:05.9019249Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9019323Z import torch 2023-01-11T21:38:05.9019397Z import random 2023-01-11T21:38:05.9019517Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9019641Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9019646Z 2023-01-11T21:38:05.9019728Z aten = torch.ops.aten 2023-01-11T21:38:05.9019862Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9019992Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9020005Z 2023-01-11T21:38:05.9020072Z import triton 2023-01-11T21:38:05.9020166Z import triton.language as tl 2023-01-11T21:38:05.9020290Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9020432Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9020440Z 2023-01-11T21:38:05.9020444Z 2023-01-11T21:38:05.9020581Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9020784Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9020908Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9021011Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9021105Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9021170Z { 2023-01-11T21:38:05.9021271Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9021338Z { 2023-01-11T21:38:05.9021423Z #pragma omp for 2023-01-11T21:38:05.9021510Z for(long i0=0; i0<25; i0+=1) 2023-01-11T21:38:05.9021570Z { 2023-01-11T21:38:05.9021709Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9021800Z auto tmp1 = tmp0.expm1(); 2023-01-11T21:38:05.9021938Z auto tmp2 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.9022029Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9022129Z tmp1.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9022224Z tmp3.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9022292Z } 2023-01-11T21:38:05.9022384Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9022473Z for(long i0=200; i0<201; i0+=1) 2023-01-11T21:38:05.9022540Z { 2023-01-11T21:38:05.9022629Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9022727Z auto tmp1 = std::expm1(tmp0); 2023-01-11T21:38:05.9022831Z auto tmp2 = static_cast(2); 2023-01-11T21:38:05.9022923Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9023002Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9023089Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9023160Z } 2023-01-11T21:38:05.9023229Z } 2023-01-11T21:38:05.9023295Z } 2023-01-11T21:38:05.9023380Z ''') 2023-01-11T21:38:05.9023422Z 2023-01-11T21:38:05.9023427Z 2023-01-11T21:38:05.9023520Z async_compile.wait(globals()) 2023-01-11T21:38:05.9023592Z del async_compile 2023-01-11T21:38:05.9023597Z 2023-01-11T21:38:05.9023673Z def call(args): 2023-01-11T21:38:05.9023750Z arg0_1, = args 2023-01-11T21:38:05.9023825Z args.clear() 2023-01-11T21:38:05.9024020Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9024212Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9024379Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), 
c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9024448Z del arg0_1 2023-01-11T21:38:05.9024529Z return (buf0, buf1, ) 2023-01-11T21:38:05.9024534Z 2023-01-11T21:38:05.9024538Z 2023-01-11T21:38:05.9024620Z if __name__ == "__main__": 2023-01-11T21:38:05.9024738Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9024871Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9025066Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9025179Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9025184Z 2023-01-11T21:38:05.9025443Z [2023-01-11 21:26:26,670] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 95 2023-01-11T21:38:05.9025861Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9026017Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9026272Z [2023-01-11 21:26:26,688] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 96 2023-01-11T21:38:05.9026539Z [2023-01-11 21:26:28,446] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 96 2023-01-11T21:38:05.9026953Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9027086Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9027340Z [2023-01-11 21:26:28,463] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 97 2023-01-11T21:38:05.9027603Z [2023-01-11 21:26:30,305] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 97 2023-01-11T21:38:05.9028018Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9028149Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9028400Z [2023-01-11 21:26:30,335] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 98 2023-01-11T21:38:05.9028660Z [2023-01-11 21:26:32,127] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 98 2023-01-11T21:38:05.9029072Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9029228Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9029481Z [2023-01-11 21:26:32,144] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 99 2023-01-11T21:38:05.9029487Z 2023-01-11T21:38:05.9029587Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9029661Z import torch 2023-01-11T21:38:05.9029737Z import random 2023-01-11T21:38:05.9029858Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9029983Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9029989Z 2023-01-11T21:38:05.9030076Z aten = torch.ops.aten 2023-01-11T21:38:05.9030207Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9030304Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9030310Z 2023-01-11T21:38:05.9030386Z import triton 2023-01-11T21:38:05.9030479Z import triton.language as tl 2023-01-11T21:38:05.9030607Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9030750Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9030755Z 2023-01-11T21:38:05.9030760Z 2023-01-11T21:38:05.9030897Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9031105Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9031224Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:05.9031332Z double* __restrict__ out_ptr0, 2023-01-11T21:38:05.9031435Z double* __restrict__ out_ptr1) 2023-01-11T21:38:05.9031504Z { 2023-01-11T21:38:05.9031606Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9031706Z { 2023-01-11T21:38:05.9031791Z #pragma omp for 2023-01-11T21:38:05.9031872Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9031941Z { 2023-01-11T21:38:05.9032009Z { 2023-01-11T21:38:05.9032078Z { 2023-01-11T21:38:05.9032181Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9032288Z auto tmp1 = std::expm1(tmp0); 2023-01-11T21:38:05.9032398Z auto tmp2 = static_cast(2); 2023-01-11T21:38:05.9032488Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9032579Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9032670Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9032741Z } 2023-01-11T21:38:05.9032808Z } 2023-01-11T21:38:05.9032877Z } 2023-01-11T21:38:05.9032943Z } 2023-01-11T21:38:05.9033000Z } 2023-01-11T21:38:05.9033086Z ''') 2023-01-11T21:38:05.9033095Z 2023-01-11T21:38:05.9033099Z 2023-01-11T21:38:05.9033194Z async_compile.wait(globals()) 2023-01-11T21:38:05.9033272Z del async_compile 2023-01-11T21:38:05.9033277Z 2023-01-11T21:38:05.9033351Z def call(args): 2023-01-11T21:38:05.9033426Z arg0_1, = args 2023-01-11T21:38:05.9033502Z args.clear() 2023-01-11T21:38:05.9033692Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9033887Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9034055Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9034130Z del arg0_1 2023-01-11T21:38:05.9034212Z return (buf0, buf1, ) 2023-01-11T21:38:05.9034217Z 2023-01-11T21:38:05.9034222Z 2023-01-11T21:38:05.9034303Z if __name__ == "__main__": 2023-01-11T21:38:05.9034422Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9034551Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9034738Z arg0_1 = 
rand_strided((64, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9034853Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9034858Z 2023-01-11T21:38:05.9034862Z 2023-01-11T21:38:05.9034961Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9035036Z import torch 2023-01-11T21:38:05.9035141Z import random 2023-01-11T21:38:05.9035262Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9035386Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9035391Z 2023-01-11T21:38:05.9035472Z aten = torch.ops.aten 2023-01-11T21:38:05.9035600Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9035696Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9035701Z 2023-01-11T21:38:05.9035775Z import triton 2023-01-11T21:38:05.9035866Z import triton.language as tl 2023-01-11T21:38:05.9035989Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9036132Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9036138Z 2023-01-11T21:38:05.9036142Z 2023-01-11T21:38:05.9036279Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9036484Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9036605Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:05.9036711Z double* __restrict__ out_ptr0, 2023-01-11T21:38:05.9036814Z double* __restrict__ out_ptr1) 2023-01-11T21:38:05.9036881Z { 2023-01-11T21:38:05.9036984Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9037050Z { 2023-01-11T21:38:05.9037131Z #pragma omp for 2023-01-11T21:38:05.9037212Z for(long i0=0; i0<201; i0+=1) 2023-01-11T21:38:05.9037279Z { 2023-01-11T21:38:05.9037346Z { 2023-01-11T21:38:05.9037414Z { 2023-01-11T21:38:05.9037514Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9037647Z auto tmp1 = std::expm1(tmp0); 2023-01-11T21:38:05.9037758Z auto tmp2 = static_cast(2); 2023-01-11T21:38:05.9037847Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9037939Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9038032Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9038104Z } 2023-01-11T21:38:05.9038174Z } 2023-01-11T21:38:05.9038243Z } 2023-01-11T21:38:05.9038310Z } 2023-01-11T21:38:05.9038367Z } 2023-01-11T21:38:05.9038452Z ''') 2023-01-11T21:38:05.9038457Z 2023-01-11T21:38:05.9038461Z 2023-01-11T21:38:05.9038556Z async_compile.wait(globals()) 2023-01-11T21:38:05.9038634Z del async_compile 2023-01-11T21:38:05.9038639Z 2023-01-11T21:38:05.9038716Z def call(args): 2023-01-11T21:38:05.9038791Z arg0_1, = args 2023-01-11T21:38:05.9038868Z args.clear() 2023-01-11T21:38:05.9039054Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9039254Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9039420Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9039495Z del arg0_1 2023-01-11T21:38:05.9039578Z return (buf0, buf1, ) 2023-01-11T21:38:05.9039586Z 2023-01-11T21:38:05.9039591Z 2023-01-11T21:38:05.9039673Z if __name__ == "__main__": 2023-01-11T21:38:05.9039790Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9039914Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9040100Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9040211Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9040216Z 
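[NOTE] The block above is Inductor's generated module for the float64, 201-element variant of this graph; every dump in this run has the same shape (a C++ kernel registered via async_compile.cpp, a call() wrapper, and a standalone __main__ benchmark). A minimal sketch of reproducing this kind of dump outside the test suite, assuming the torch._dynamo / torch._inductor APIs of this era and assuming torch._inductor.config.debug is the switch that prints the generated module (both are guesses about this build, and fn below is reconstructed from the kernels, not the test's actual source):

import torch
import torch._dynamo
import torch._inductor.config as inductor_config

inductor_config.debug = True  # assumption: makes Inductor print the generated wrapper/kernel code

def fn(x):
    y = torch.expm1(x)  # shows up as std::expm1(...) in the C++ kernel
    return y, y * 2     # shows up as the second output buffer (tmp * 2)

opt_fn = torch._dynamo.optimize("inductor")(fn)
opt_fn(torch.randn(201, dtype=torch.float64))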
2023-01-11T21:38:05.9040221Z 2023-01-11T21:38:05.9040317Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9040393Z import torch 2023-01-11T21:38:05.9040473Z import random 2023-01-11T21:38:05.9040591Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9040716Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9040721Z 2023-01-11T21:38:05.9040804Z aten = torch.ops.aten 2023-01-11T21:38:05.9040931Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9041058Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9041064Z 2023-01-11T21:38:05.9041142Z import triton 2023-01-11T21:38:05.9041236Z import triton.language as tl 2023-01-11T21:38:05.9041361Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9041503Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9041508Z 2023-01-11T21:38:05.9041513Z 2023-01-11T21:38:05.9041648Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9041854Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9041967Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:05.9042077Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9042181Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9042248Z { 2023-01-11T21:38:05.9042352Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9042419Z { 2023-01-11T21:38:05.9042504Z #pragma omp for 2023-01-11T21:38:05.9042585Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9042655Z { 2023-01-11T21:38:05.9042724Z { 2023-01-11T21:38:05.9042794Z { 2023-01-11T21:38:05.9042890Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9043003Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.9043103Z auto tmp2 = std::expm1(tmp1); 2023-01-11T21:38:05.9043212Z auto tmp3 = static_cast<float>(2); 2023-01-11T21:38:05.9043309Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9043400Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9043518Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9043588Z } 2023-01-11T21:38:05.9043657Z } 2023-01-11T21:38:05.9043717Z } 2023-01-11T21:38:05.9043784Z } 2023-01-11T21:38:05.9043849Z } 2023-01-11T21:38:05.9043937Z ''') 2023-01-11T21:38:05.9043943Z 2023-01-11T21:38:05.9043950Z 2023-01-11T21:38:05.9044044Z async_compile.wait(globals()) 2023-01-11T21:38:05.9044125Z del async_compile 2023-01-11T21:38:05.9044130Z 2023-01-11T21:38:05.9044205Z def call(args): 2023-01-11T21:38:05.9044279Z arg0_1, = args 2023-01-11T21:38:05.9044347Z args.clear() 2023-01-11T21:38:05.9044541Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9044732Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9044900Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9044974Z del arg0_1 2023-01-11T21:38:05.9045060Z return (buf0, buf1, ) 2023-01-11T21:38:05.9045065Z 2023-01-11T21:38:05.9045070Z 2023-01-11T21:38:05.9045153Z if __name__ == "__main__": 2023-01-11T21:38:05.9045262Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9045386Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9045581Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.9045695Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9045700Z 2023-01-11T21:38:05.9045704Z 2023-01-11T21:38:05.9045803Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9045878Z import torch 2023-01-11T21:38:05.9045956Z import random 2023-01-11T21:38:05.9046075Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9046190Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9046195Z 2023-01-11T21:38:05.9046282Z aten = torch.ops.aten 2023-01-11T21:38:05.9046422Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9046525Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9046530Z 2023-01-11T21:38:05.9046603Z import triton 2023-01-11T21:38:05.9046698Z import triton.language as tl 2023-01-11T21:38:05.9046827Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9046998Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9047005Z 2023-01-11T21:38:05.9047010Z 2023-01-11T21:38:05.9047138Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9047345Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9047465Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:05.9047569Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9047672Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9047739Z { 2023-01-11T21:38:05.9047841Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9047903Z { 2023-01-11T21:38:05.9047987Z #pragma omp for 2023-01-11T21:38:05.9048076Z for(long i0=0; i0<201; i0+=1) 2023-01-11T21:38:05.9048144Z { 2023-01-11T21:38:05.9048211Z { 2023-01-11T21:38:05.9048281Z { 2023-01-11T21:38:05.9048378Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9048487Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:05.9048597Z auto tmp2 = std::expm1(tmp1); 2023-01-11T21:38:05.9048706Z auto tmp3 = static_cast(2); 2023-01-11T21:38:05.9048803Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9048893Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9048982Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9049053Z } 2023-01-11T21:38:05.9049113Z } 2023-01-11T21:38:05.9049181Z } 2023-01-11T21:38:05.9049248Z } 2023-01-11T21:38:05.9049313Z } 2023-01-11T21:38:05.9049427Z ''') 2023-01-11T21:38:05.9049433Z 2023-01-11T21:38:05.9049437Z 2023-01-11T21:38:05.9049531Z async_compile.wait(globals()) 2023-01-11T21:38:05.9049611Z del async_compile 2023-01-11T21:38:05.9049617Z 2023-01-11T21:38:05.9049685Z def call(args): 2023-01-11T21:38:05.9049759Z arg0_1, = args 2023-01-11T21:38:05.9049834Z args.clear() 2023-01-11T21:38:05.9050031Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9050223Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9050389Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9050463Z del arg0_1 2023-01-11T21:38:05.9050546Z return (buf0, buf1, ) 2023-01-11T21:38:05.9050551Z 2023-01-11T21:38:05.9050555Z 2023-01-11T21:38:05.9050627Z if __name__ == "__main__": 2023-01-11T21:38:05.9050746Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9050872Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9051068Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.9051182Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9051186Z 2023-01-11T21:38:05.9051453Z [2023-01-11 21:26:33,977] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 99 2023-01-11T21:38:05.9051868Z 
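[NOTE] The dumps for graphs 95-100 around this point are the same computation specialized per input dtype and length (float32/float64/int32/int64, each at 64 and 201 elements). For the integer variants the kernel widens each element first (auto tmp1 = static_cast<float>(tmp0);) because expm1 promotes integer inputs to the default float dtype, which is why the output buffers are allocated as torch.float32. A quick eager-mode check of that promotion rule:

import torch
assert torch.expm1(torch.arange(64, dtype=torch.int32)).dtype == torch.float32
assert torch.expm1(torch.arange(64, dtype=torch.int64)).dtype == torch.float32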
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9051999Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9052256Z [2023-01-11 21:26:33,994] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 100 2023-01-11T21:38:05.9052511Z [2023-01-11 21:26:35,767] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 100 2023-01-11T21:38:05.9052531Z 2023-01-11T21:38:05.9052622Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9052699Z import torch 2023-01-11T21:38:05.9052773Z import random 2023-01-11T21:38:05.9052932Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9053059Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9053064Z 2023-01-11T21:38:05.9053147Z aten = torch.ops.aten 2023-01-11T21:38:05.9053285Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9053373Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9053378Z 2023-01-11T21:38:05.9053454Z import triton 2023-01-11T21:38:05.9053550Z import triton.language as tl 2023-01-11T21:38:05.9053676Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9053817Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9053825Z 2023-01-11T21:38:05.9053829Z 2023-01-11T21:38:05.9053966Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9054172Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9054294Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9054393Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9054606Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9054673Z { 2023-01-11T21:38:05.9054777Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9054846Z { 2023-01-11T21:38:05.9054928Z #pragma omp for 2023-01-11T21:38:05.9055017Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9055077Z { 2023-01-11T21:38:05.9055148Z { 2023-01-11T21:38:05.9055216Z { 2023-01-11T21:38:05.9055314Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9055430Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:05.9055586Z auto tmp2 = std::expm1(tmp1); 2023-01-11T21:38:05.9055696Z auto tmp3 = static_cast(2); 2023-01-11T21:38:05.9055786Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9055877Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9055970Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9056041Z } 2023-01-11T21:38:05.9056107Z } 2023-01-11T21:38:05.9056174Z } 2023-01-11T21:38:05.9056239Z } 2023-01-11T21:38:05.9056296Z } 2023-01-11T21:38:05.9056382Z ''') 2023-01-11T21:38:05.9056388Z 2023-01-11T21:38:05.9056392Z 2023-01-11T21:38:05.9056484Z async_compile.wait(globals()) 2023-01-11T21:38:05.9056561Z del async_compile 2023-01-11T21:38:05.9056566Z 2023-01-11T21:38:05.9056647Z def call(args): 2023-01-11T21:38:05.9056721Z arg0_1, = args 2023-01-11T21:38:05.9056797Z args.clear() 2023-01-11T21:38:05.9056984Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9057266Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9057438Z 
kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9057516Z del arg0_1 2023-01-11T21:38:05.9057601Z return (buf0, buf1, ) 2023-01-11T21:38:05.9057606Z 2023-01-11T21:38:05.9057611Z 2023-01-11T21:38:05.9057692Z if __name__ == "__main__": 2023-01-11T21:38:05.9057811Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9057936Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9058119Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9058235Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9058240Z 2023-01-11T21:38:05.9058244Z 2023-01-11T21:38:05.9058342Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9058416Z import torch 2023-01-11T21:38:05.9058492Z import random 2023-01-11T21:38:05.9058612Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9058737Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9058742Z 2023-01-11T21:38:05.9058821Z aten = torch.ops.aten 2023-01-11T21:38:05.9058949Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9059088Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9059094Z 2023-01-11T21:38:05.9059171Z import triton 2023-01-11T21:38:05.9059269Z import triton.language as tl 2023-01-11T21:38:05.9059395Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9059537Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9059542Z 2023-01-11T21:38:05.9059547Z 2023-01-11T21:38:05.9059684Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9059890Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9060006Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9060114Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9060221Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9060287Z { 2023-01-11T21:38:05.9060388Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9060458Z { 2023-01-11T21:38:05.9060544Z #pragma omp for 2023-01-11T21:38:05.9060625Z for(long i0=0; i0<201; i0+=1) 2023-01-11T21:38:05.9060695Z { 2023-01-11T21:38:05.9060763Z { 2023-01-11T21:38:05.9060832Z { 2023-01-11T21:38:05.9060929Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9061044Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:05.9061152Z auto tmp2 = std::expm1(tmp1); 2023-01-11T21:38:05.9061253Z auto tmp3 = static_cast(2); 2023-01-11T21:38:05.9061349Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9061439Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9061556Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9061626Z } 2023-01-11T21:38:05.9061694Z } 2023-01-11T21:38:05.9061753Z } 2023-01-11T21:38:05.9061824Z } 2023-01-11T21:38:05.9061894Z } 2023-01-11T21:38:05.9061980Z ''') 2023-01-11T21:38:05.9061986Z 2023-01-11T21:38:05.9061993Z 2023-01-11T21:38:05.9062089Z async_compile.wait(globals()) 2023-01-11T21:38:05.9062166Z del async_compile 2023-01-11T21:38:05.9062171Z 2023-01-11T21:38:05.9062250Z def call(args): 2023-01-11T21:38:05.9062324Z arg0_1, = args 2023-01-11T21:38:05.9062392Z args.clear() 2023-01-11T21:38:05.9062586Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9062780Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9062948Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 
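    # [NOTE] As in the int32 variants above, the int64 input is widened inside the
    # C++ kernel (one static_cast per element) before std::expm1, so both outputs
    # land in the float32 buffers allocated just above.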
2023-01-11T21:38:05.9063026Z del arg0_1 2023-01-11T21:38:05.9063112Z return (buf0, buf1, ) 2023-01-11T21:38:05.9063117Z 2023-01-11T21:38:05.9063122Z 2023-01-11T21:38:05.9063203Z if __name__ == "__main__": 2023-01-11T21:38:05.9063322Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9063441Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9063636Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9063748Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9063753Z 2023-01-11T21:38:05.9063828Z ok (17.682s) 2023-01-11T21:38:05.9064279Z test_fill1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9064413Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9064675Z [2023-01-11 21:26:35,819] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 101 2023-01-11T21:38:05.9064969Z [2023-01-11 21:26:37,612] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 101 2023-01-11T21:38:05.9064976Z 2023-01-11T21:38:05.9065077Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9065144Z import torch 2023-01-11T21:38:05.9065220Z import random 2023-01-11T21:38:05.9065342Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9065470Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9065475Z 2023-01-11T21:38:05.9065559Z aten = torch.ops.aten 2023-01-11T21:38:05.9065695Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9065799Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9065804Z 2023-01-11T21:38:05.9065883Z import triton 2023-01-11T21:38:05.9065968Z import triton.language as tl 2023-01-11T21:38:05.9066094Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9066235Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9066240Z 2023-01-11T21:38:05.9066244Z 2023-01-11T21:38:05.9066384Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9066590Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9066709Z extern "C" void kernel(float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9066813Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9066871Z { 2023-01-11T21:38:05.9066975Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9067042Z { 2023-01-11T21:38:05.9067125Z #pragma omp for 2023-01-11T21:38:05.9067216Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.9067286Z { 2023-01-11T21:38:05.9067429Z auto tmp0 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.9067546Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9067614Z } 2023-01-11T21:38:05.9067715Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9067806Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:05.9067874Z { 2023-01-11T21:38:05.9067983Z auto tmp0 = static_cast(1); 2023-01-11T21:38:05.9068071Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.9068131Z } 2023-01-11T21:38:05.9068211Z #pragma omp for 2023-01-11T21:38:05.9068300Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.9068366Z { 
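                // [NOTE] Same fill pattern as the first loop: broadcast the constant
                // into an 8-lane float vector and store 8 elements per iteration; 32
                // iterations cover all 256 elements of the 16x16 output, so the scalar
                // tail loop below (i0 = 256..256) never executes.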
2023-01-11T21:38:05.9068502Z auto tmp0 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.9068600Z tmp0.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9068671Z } 2023-01-11T21:38:05.9068762Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9068850Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:05.9068922Z { 2023-01-11T21:38:05.9069027Z auto tmp0 = static_cast(2); 2023-01-11T21:38:05.9069113Z out_ptr1[i0] = tmp0; 2023-01-11T21:38:05.9069181Z } 2023-01-11T21:38:05.9069248Z } 2023-01-11T21:38:05.9069306Z } 2023-01-11T21:38:05.9069391Z ''') 2023-01-11T21:38:05.9069397Z 2023-01-11T21:38:05.9069404Z 2023-01-11T21:38:05.9069501Z async_compile.wait(globals()) 2023-01-11T21:38:05.9069581Z del async_compile 2023-01-11T21:38:05.9069586Z 2023-01-11T21:38:05.9069665Z def call(args): 2023-01-11T21:38:05.9069742Z arg0_1, = args 2023-01-11T21:38:05.9069819Z args.clear() 2023-01-11T21:38:05.9070012Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9070212Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9070351Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9070434Z return (buf0, buf1, ) 2023-01-11T21:38:05.9070442Z 2023-01-11T21:38:05.9070446Z 2023-01-11T21:38:05.9070528Z if __name__ == "__main__": 2023-01-11T21:38:05.9070648Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9070775Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9071004Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9071110Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9071115Z 2023-01-11T21:38:05.9071191Z ok (1.846s) 2023-01-11T21:38:05.9071638Z test_fill2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9071772Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9072035Z [2023-01-11 21:26:37,665] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 102 2023-01-11T21:38:05.9072300Z [2023-01-11 21:26:39,533] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 102 2023-01-11T21:38:05.9072306Z 2023-01-11T21:38:05.9072406Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9072481Z import torch 2023-01-11T21:38:05.9072556Z import random 2023-01-11T21:38:05.9072667Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9072794Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9072799Z 2023-01-11T21:38:05.9072884Z aten = torch.ops.aten 2023-01-11T21:38:05.9073020Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9073117Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9073122Z 2023-01-11T21:38:05.9073195Z import triton 2023-01-11T21:38:05.9073289Z import triton.language as tl 2023-01-11T21:38:05.9073450Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9073583Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9073589Z 2023-01-11T21:38:05.9073602Z 2023-01-11T21:38:05.9073731Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9073938Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9074059Z extern "C" void kernel(float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9074164Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9074229Z { 2023-01-11T21:38:05.9074335Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9074403Z { 2023-01-11T21:38:05.9074477Z #pragma omp for 2023-01-11T21:38:05.9074569Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.9074636Z { 2023-01-11T21:38:05.9074780Z auto tmp0 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.9074877Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9074947Z } 2023-01-11T21:38:05.9075039Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9075126Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:05.9075194Z { 2023-01-11T21:38:05.9075301Z auto tmp0 = static_cast(1); 2023-01-11T21:38:05.9075391Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.9075458Z } 2023-01-11T21:38:05.9075543Z #pragma omp for 2023-01-11T21:38:05.9075621Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.9075689Z { 2023-01-11T21:38:05.9075831Z auto tmp0 = at::vec::Vectorized(static_cast(3.0)); 2023-01-11T21:38:05.9075928Z tmp0.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9075997Z } 2023-01-11T21:38:05.9076098Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9076186Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:05.9076245Z { 2023-01-11T21:38:05.9076351Z auto tmp0 = static_cast(3.0); 2023-01-11T21:38:05.9076438Z out_ptr1[i0] = tmp0; 2023-01-11T21:38:05.9076507Z } 2023-01-11T21:38:05.9076574Z } 2023-01-11T21:38:05.9076638Z } 2023-01-11T21:38:05.9076724Z ''') 2023-01-11T21:38:05.9076729Z 2023-01-11T21:38:05.9076734Z 2023-01-11T21:38:05.9076819Z async_compile.wait(globals()) 2023-01-11T21:38:05.9076926Z del async_compile 2023-01-11T21:38:05.9076932Z 2023-01-11T21:38:05.9077007Z def call(args): 2023-01-11T21:38:05.9077084Z arg0_1, = args 2023-01-11T21:38:05.9077161Z args.clear() 2023-01-11T21:38:05.9077364Z buf0 = empty_strided((16, 16), (16, 1), 
device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9077561Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9077698Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9077772Z return (buf0, buf1, ) 2023-01-11T21:38:05.9077777Z 2023-01-11T21:38:05.9077781Z 2023-01-11T21:38:05.9077866Z if __name__ == "__main__": 2023-01-11T21:38:05.9077985Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9078114Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9078312Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9078428Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9078433Z 2023-01-11T21:38:05.9078506Z ok (1.914s) 2023-01-11T21:38:05.9078954Z test_flip_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9079086Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9079335Z [2023-01-11 21:26:39,573] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 103 2023-01-11T21:38:05.9079672Z [2023-01-11 21:26:41,414] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 103 2023-01-11T21:38:05.9079677Z 2023-01-11T21:38:05.9079779Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9079861Z import torch 2023-01-11T21:38:05.9079936Z import random 2023-01-11T21:38:05.9080055Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9080178Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9080183Z 2023-01-11T21:38:05.9080265Z aten = torch.ops.aten 2023-01-11T21:38:05.9080393Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9080491Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9080496Z 2023-01-11T21:38:05.9080570Z import triton 2023-01-11T21:38:05.9080661Z import triton.language as tl 2023-01-11T21:38:05.9080786Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9080932Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9080938Z 2023-01-11T21:38:05.9080943Z 2023-01-11T21:38:05.9081081Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9081287Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9081406Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9081513Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9081616Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9081683Z { 2023-01-11T21:38:05.9081785Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9081853Z { 2023-01-11T21:38:05.9081932Z #pragma omp for 2023-01-11T21:38:05.9082013Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:05.9082081Z { 2023-01-11T21:38:05.9082167Z #pragma GCC ivdep 2023-01-11T21:38:05.9082256Z for(long i1=0; i1<6; i1+=1) 2023-01-11T21:38:05.9082328Z { 2023-01-11T21:38:05.9082398Z { 2023-01-11T21:38:05.9082473Z { 2023-01-11T21:38:05.9082640Z auto tmp0 = in_ptr0[5 + ((-1)*i1) + (6*i0)]; 2023-01-11T21:38:05.9082742Z out_ptr0[i1 + (6*i0)] = tmp0; 
2023-01-11T21:38:05.9082844Z } 2023-01-11T21:38:05.9082914Z } 2023-01-11T21:38:05.9082983Z } 2023-01-11T21:38:05.9083050Z } 2023-01-11T21:38:05.9083138Z #pragma omp for collapse(2) 2023-01-11T21:38:05.9083226Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.9083296Z { 2023-01-11T21:38:05.9083385Z for(long i1=0; i1<6; i1+=1) 2023-01-11T21:38:05.9083454Z { 2023-01-11T21:38:05.9083542Z #pragma GCC ivdep 2023-01-11T21:38:05.9083638Z for(long i2=0; i2<6; i2+=1) 2023-01-11T21:38:05.9083700Z { 2023-01-11T21:38:05.9083772Z { 2023-01-11T21:38:05.9083846Z { 2023-01-11T21:38:05.9084031Z auto tmp0 = in_ptr0[30 + i2 + ((-6)*i1) + (36*i0)]; 2023-01-11T21:38:05.9084147Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.9084297Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9084409Z out_ptr1[i2 + (6*i1) + (36*i0)] = tmp2; 2023-01-11T21:38:05.9084475Z } 2023-01-11T21:38:05.9084545Z } 2023-01-11T21:38:05.9084618Z } 2023-01-11T21:38:05.9084689Z } 2023-01-11T21:38:05.9084757Z } 2023-01-11T21:38:05.9084823Z } 2023-01-11T21:38:05.9084889Z } 2023-01-11T21:38:05.9084966Z ''') 2023-01-11T21:38:05.9084971Z 2023-01-11T21:38:05.9084976Z 2023-01-11T21:38:05.9085070Z async_compile.wait(globals()) 2023-01-11T21:38:05.9085148Z del async_compile 2023-01-11T21:38:05.9085153Z 2023-01-11T21:38:05.9085230Z def call(args): 2023-01-11T21:38:05.9085339Z arg0_1, = args 2023-01-11T21:38:05.9085418Z args.clear() 2023-01-11T21:38:05.9085631Z buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9085832Z buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9086005Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9086079Z del arg0_1 2023-01-11T21:38:05.9086160Z return (buf0, buf1, ) 2023-01-11T21:38:05.9086165Z 2023-01-11T21:38:05.9086170Z 2023-01-11T21:38:05.9086254Z if __name__ == "__main__": 2023-01-11T21:38:05.9086375Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9086504Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9086717Z arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9086821Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9086841Z 2023-01-11T21:38:05.9086904Z ok (1.893s) 2023-01-11T21:38:05.9087358Z test_fmod_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9087493Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9087752Z [2023-01-11 21:26:41,450] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 104 2023-01-11T21:38:05.9088017Z [2023-01-11 21:26:43,260] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 104 2023-01-11T21:38:05.9088023Z 2023-01-11T21:38:05.9088122Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9088197Z import torch 2023-01-11T21:38:05.9088275Z import random 2023-01-11T21:38:05.9088397Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9088514Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9088519Z 2023-01-11T21:38:05.9088600Z aten = torch.ops.aten 2023-01-11T21:38:05.9088737Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9088861Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9088868Z 2023-01-11T21:38:05.9088945Z import triton 2023-01-11T21:38:05.9089038Z import triton.language as tl 2023-01-11T21:38:05.9089164Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9089297Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9089312Z 2023-01-11T21:38:05.9089316Z 2023-01-11T21:38:05.9089445Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9089652Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9089777Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9089890Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9089994Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9090095Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9090165Z { 2023-01-11T21:38:05.9090261Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9090328Z { 2023-01-11T21:38:05.9090412Z #pragma omp for 2023-01-11T21:38:05.9090502Z for(long i0=0; i0<9; i0+=1) 2023-01-11T21:38:05.9090570Z { 2023-01-11T21:38:05.9090713Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9090849Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.9090938Z auto tmp2 = tmp0.fmod(tmp1); 2023-01-11T21:38:05.9091081Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(3.0)); 2023-01-11T21:38:05.9091173Z auto tmp4 = tmp0 * tmp3; 2023-01-11T21:38:05.9091296Z auto tmp5 = tmp4.fmod(tmp1); 2023-01-11T21:38:05.9091436Z auto tmp6 = at::vec::Vectorized<float>(static_cast<float>(2.0)); 2023-01-11T21:38:05.9091563Z auto tmp7 = tmp5 - tmp6; 2023-01-11T21:38:05.9091657Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9091754Z tmp7.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9091814Z } 2023-01-11T21:38:05.9091914Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9092003Z for(long i0=72; i0<72; i0+=1) 2023-01-11T21:38:05.9092070Z { 2023-01-11T21:38:05.9092158Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9092246Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9092353Z auto tmp2 = std::fmod(tmp0, tmp1); 2023-01-11T21:38:05.9092452Z auto tmp3 = static_cast<float>(3.0); 2023-01-11T21:38:05.9092546Z auto tmp4 = tmp0 * tmp3; 2023-01-11T21:38:05.9092650Z auto tmp5 = std::fmod(tmp4, tmp1); 2023-01-11T21:38:05.9092761Z auto tmp6 = static_cast<float>(2.0); 2023-01-11T21:38:05.9092886Z auto tmp7 = tmp5 - tmp6; 2023-01-11T21:38:05.9092976Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9093062Z out_ptr1[i0] = tmp7;
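                        // [NOTE] This scalar tail mirrors the vectorized loop above:
                        // out_ptr0 = fmod(a, b) and out_ptr1 = fmod(a * 3, b) - 2. Since
                        // 9 iterations of 8 lanes already cover all 72 elements, the
                        // range (72..72) is empty and this body never runs.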
2023-01-11T21:38:05.9093122Z } 2023-01-11T21:38:05.9093190Z } 2023-01-11T21:38:05.9093258Z } 2023-01-11T21:38:05.9093343Z ''') 2023-01-11T21:38:05.9093348Z 2023-01-11T21:38:05.9093353Z 2023-01-11T21:38:05.9093447Z async_compile.wait(globals()) 2023-01-11T21:38:05.9093526Z del async_compile 2023-01-11T21:38:05.9093531Z 2023-01-11T21:38:05.9093608Z def call(args): 2023-01-11T21:38:05.9093681Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9093761Z args.clear() 2023-01-11T21:38:05.9093974Z buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9094189Z buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9094387Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9094464Z del arg0_1 2023-01-11T21:38:05.9094650Z del arg1_1 2023-01-11T21:38:05.9094725Z return (buf0, buf1, ) 2023-01-11T21:38:05.9094739Z 2023-01-11T21:38:05.9094743Z 2023-01-11T21:38:05.9094873Z if __name__ == "__main__": 2023-01-11T21:38:05.9094993Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9095125Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9095338Z arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9095549Z arg1_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9095669Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9095675Z 2023-01-11T21:38:05.9095747Z ok (1.846s) 2023-01-11T21:38:05.9096205Z test_fmod_zero_dim_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9096343Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9096594Z [2023-01-11 21:26:43,287] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 105 2023-01-11T21:38:05.9096857Z [2023-01-11 21:26:45,058] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 105 2023-01-11T21:38:05.9097334Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9097503Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9097757Z [2023-01-11 21:26:45,074] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 106 2023-01-11T21:38:05.9098018Z [2023-01-11 21:26:46,826] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 106 2023-01-11T21:38:05.9098024Z 2023-01-11T21:38:05.9098122Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9098199Z import torch 2023-01-11T21:38:05.9098275Z import random 2023-01-11T21:38:05.9098387Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9098514Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9098519Z 2023-01-11T21:38:05.9098603Z aten = torch.ops.aten 2023-01-11T21:38:05.9098743Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9098844Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9098849Z 2023-01-11T21:38:05.9098923Z import triton 2023-01-11T21:38:05.9099015Z import triton.language as tl 2023-01-11T21:38:05.9099142Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9099274Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9099282Z 2023-01-11T21:38:05.9099296Z 2023-01-11T21:38:05.9099426Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9099634Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9099759Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9099870Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9099980Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9100046Z { 2023-01-11T21:38:05.9100149Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9100208Z { 2023-01-11T21:38:05.9100295Z #pragma omp for 2023-01-11T21:38:05.9100384Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.9100452Z { 2023-01-11T21:38:05.9100594Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9100720Z auto tmp1 = at::vec::Vectorized<float>(in_ptr1[0]); 2023-01-11T21:38:05.9100852Z auto tmp2 = tmp0.fmod(tmp1); 2023-01-11T21:38:05.9100942Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9101009Z } 2023-01-11T21:38:05.9101115Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9101204Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.9101273Z { 2023-01-11T21:38:05.9101363Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9101451Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:05.9101551Z auto tmp2 = std::fmod(tmp0, tmp1); 2023-01-11T21:38:05.9101646Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9101715Z } 2023-01-11T21:38:05.9101783Z } 2023-01-11T21:38:05.9101853Z } 2023-01-11T21:38:05.9101940Z ''') 2023-01-11T21:38:05.9101945Z 2023-01-11T21:38:05.9101949Z 2023-01-11T21:38:05.9102042Z async_compile.wait(globals()) 2023-01-11T21:38:05.9102115Z del async_compile 2023-01-11T21:38:05.9102120Z 2023-01-11T21:38:05.9102194Z def call(args): 2023-01-11T21:38:05.9102274Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9102356Z args.clear() 2023-01-11T21:38:05.9102549Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9102716Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9102790Z del arg0_1 2023-01-11T21:38:05.9102855Z del arg1_1 2023-01-11T21:38:05.9102933Z return
(buf0, ) 2023-01-11T21:38:05.9102938Z 2023-01-11T21:38:05.9102942Z 2023-01-11T21:38:05.9103028Z if __name__ == "__main__": 2023-01-11T21:38:05.9103148Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9103274Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9103503Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9103690Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9103810Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9103815Z 2023-01-11T21:38:05.9103821Z 2023-01-11T21:38:05.9103912Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9103989Z import torch 2023-01-11T21:38:05.9104065Z import random 2023-01-11T21:38:05.9104184Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9104305Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9104310Z 2023-01-11T21:38:05.9104394Z aten = torch.ops.aten 2023-01-11T21:38:05.9104529Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9104628Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9104633Z 2023-01-11T21:38:05.9104699Z import triton 2023-01-11T21:38:05.9104796Z import triton.language as tl 2023-01-11T21:38:05.9104920Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9105059Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9105064Z 2023-01-11T21:38:05.9105069Z 2023-01-11T21:38:05.9105204Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9105413Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9105539Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9105650Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9105746Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9105813Z { 2023-01-11T21:38:05.9105915Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9105982Z { 2023-01-11T21:38:05.9106066Z #pragma omp for 2023-01-11T21:38:05.9106155Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.9106215Z { 2023-01-11T21:38:05.9106345Z auto tmp0 = at::vec::Vectorized(in_ptr0[0]); 2023-01-11T21:38:05.9106482Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.9106578Z auto tmp2 = tmp0.fmod(tmp1); 2023-01-11T21:38:05.9106677Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9106744Z } 2023-01-11T21:38:05.9106875Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9106965Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.9107024Z { 2023-01-11T21:38:05.9107116Z auto tmp0 = in_ptr0[0]; 2023-01-11T21:38:05.9107206Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9107313Z auto tmp2 = std::fmod(tmp0, tmp1); 2023-01-11T21:38:05.9107399Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9107469Z } 2023-01-11T21:38:05.9107535Z } 2023-01-11T21:38:05.9107593Z } 2023-01-11T21:38:05.9107679Z ''') 2023-01-11T21:38:05.9107685Z 2023-01-11T21:38:05.9107689Z 2023-01-11T21:38:05.9107789Z async_compile.wait(globals()) 2023-01-11T21:38:05.9107869Z del async_compile 2023-01-11T21:38:05.9107874Z 2023-01-11T21:38:05.9107950Z def call(args): 2023-01-11T21:38:05.9108031Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9108108Z args.clear() 2023-01-11T21:38:05.9108294Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9108463Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 
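    # [NOTE] In this second variant the zero-dim tensor is the left operand: the
    # kernel reads it once as in_ptr0[0] and broadcasts it across the vector lanes,
    # while the 10-element operand is loaded contiguously (loadu).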
2023-01-11T21:38:05.9108536Z del arg0_1 2023-01-11T21:38:05.9108607Z del arg1_1 2023-01-11T21:38:05.9108686Z return (buf0, ) 2023-01-11T21:38:05.9108691Z 2023-01-11T21:38:05.9108696Z 2023-01-11T21:38:05.9108777Z if __name__ == "__main__": 2023-01-11T21:38:05.9108898Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9109029Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9109208Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9109430Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9109551Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9109556Z 2023-01-11T21:38:05.9109628Z ok (3.563s) 2023-01-11T21:38:05.9110092Z test_forced_buffer_realize_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9110231Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9110494Z [2023-01-11 21:26:46,861] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 107 2023-01-11T21:38:05.9110755Z [2023-01-11 21:26:48,603] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 107 2023-01-11T21:38:05.9110764Z 2023-01-11T21:38:05.9110866Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9110933Z import torch 2023-01-11T21:38:05.9111009Z import random 2023-01-11T21:38:05.9111131Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9111259Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9111264Z 2023-01-11T21:38:05.9111348Z aten = torch.ops.aten 2023-01-11T21:38:05.9111484Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9111581Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9111586Z 2023-01-11T21:38:05.9111661Z import triton 2023-01-11T21:38:05.9111746Z import triton.language as tl 2023-01-11T21:38:05.9111871Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9112010Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9112015Z 2023-01-11T21:38:05.9112019Z 2023-01-11T21:38:05.9112159Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9112366Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9112487Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.9112605Z const float* __restrict__ in_ptr0) 2023-01-11T21:38:05.9112672Z { 2023-01-11T21:38:05.9112795Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9112866Z { 2023-01-11T21:38:05.9112948Z #pragma omp for 2023-01-11T21:38:05.9113036Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.9113104Z { 2023-01-11T21:38:05.9113244Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9113383Z auto tmp1 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.9113466Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9113554Z auto tmp3 = tmp2 * tmp1; 2023-01-11T21:38:05.9113658Z tmp3.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9113729Z } 2023-01-11T21:38:05.9113830Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9113920Z 
for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.9113988Z { 2023-01-11T21:38:05.9114070Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9114180Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.9114267Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9114358Z auto tmp3 = tmp2 * tmp1; 2023-01-11T21:38:05.9114444Z in_out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.9114513Z } 2023-01-11T21:38:05.9114572Z } 2023-01-11T21:38:05.9114639Z } 2023-01-11T21:38:05.9114724Z ''') 2023-01-11T21:38:05.9114730Z 2023-01-11T21:38:05.9114734Z 2023-01-11T21:38:05.9114829Z async_compile.wait(globals()) 2023-01-11T21:38:05.9114908Z del async_compile 2023-01-11T21:38:05.9114913Z 2023-01-11T21:38:05.9114988Z def call(args): 2023-01-11T21:38:05.9115063Z arg0_1, = args 2023-01-11T21:38:05.9115165Z args.clear() 2023-01-11T21:38:05.9115353Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9115443Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:05.9115584Z kernel_cpp_0(c_void_p(buf1.data_ptr()), c_void_p(arg0_1.data_ptr())) 2023-01-11T21:38:05.9115659Z del arg0_1 2023-01-11T21:38:05.9115737Z return (buf1, ) 2023-01-11T21:38:05.9115742Z 2023-01-11T21:38:05.9115746Z 2023-01-11T21:38:05.9115829Z if __name__ == "__main__": 2023-01-11T21:38:05.9115947Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9116067Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9116260Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9116374Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9116379Z 2023-01-11T21:38:05.9116451Z ok (1.766s) 2023-01-11T21:38:05.9116896Z test_full_like_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9117037Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9117296Z [2023-01-11 21:26:48,634] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 108 2023-01-11T21:38:05.9117559Z [2023-01-11 21:26:50,374] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 108 2023-01-11T21:38:05.9117564Z 2023-01-11T21:38:05.9117661Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9117736Z import torch 2023-01-11T21:38:05.9117803Z import random 2023-01-11T21:38:05.9117924Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9118049Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9118056Z 2023-01-11T21:38:05.9118141Z aten = torch.ops.aten 2023-01-11T21:38:05.9118276Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9118374Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9118379Z 2023-01-11T21:38:05.9118456Z import triton 2023-01-11T21:38:05.9118568Z import triton.language as tl 2023-01-11T21:38:05.9118696Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9118837Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9118842Z 2023-01-11T21:38:05.9118847Z 2023-01-11T21:38:05.9118984Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9119191Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9119310Z extern "C" void kernel(float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9119377Z { 2023-01-11T21:38:05.9119481Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9119544Z { 2023-01-11T21:38:05.9119627Z #pragma omp for 2023-01-11T21:38:05.9119716Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.9119784Z { 2023-01-11T21:38:05.9119928Z auto tmp0 = at::vec::Vectorized<float>(static_cast<float>(7.777)); 2023-01-11T21:38:05.9120070Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.9120199Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9120288Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9120357Z } 2023-01-11T21:38:05.9120459Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9120544Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:05.9120615Z { 2023-01-11T21:38:05.9120723Z auto tmp0 = static_cast<float>(7.777); 2023-01-11T21:38:05.9120825Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.9120944Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9121031Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9121127Z } 2023-01-11T21:38:05.9121198Z } 2023-01-11T21:38:05.9121263Z } 2023-01-11T21:38:05.9121349Z ''') 2023-01-11T21:38:05.9121354Z 2023-01-11T21:38:05.9121359Z 2023-01-11T21:38:05.9121453Z async_compile.wait(globals()) 2023-01-11T21:38:05.9121524Z del async_compile 2023-01-11T21:38:05.9121530Z 2023-01-11T21:38:05.9121609Z def call(args): 2023-01-11T21:38:05.9121685Z arg0_1, = args 2023-01-11T21:38:05.9121760Z args.clear() 2023-01-11T21:38:05.9121953Z buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9122061Z kernel_cpp_0(c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9122140Z return (buf0, ) 2023-01-11T21:38:05.9122145Z 2023-01-11T21:38:05.9122149Z 2023-01-11T21:38:05.9122231Z if __name__ == "__main__": 2023-01-11T21:38:05.9122341Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9122468Z from torch._inductor.utils import
print_performance 2023-01-11T21:38:05.9122660Z arg0_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9122780Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9122785Z 2023-01-11T21:38:05.9122859Z ok (1.784s) 2023-01-11T21:38:05.9123320Z test_fuse_tiled_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9123453Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9123713Z [2023-01-11 21:26:50,405] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 109 2023-01-11T21:38:05.9123977Z [2023-01-11 21:26:52,258] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 109 2023-01-11T21:38:05.9123985Z 2023-01-11T21:38:05.9124076Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9124150Z import torch 2023-01-11T21:38:05.9124226Z import random 2023-01-11T21:38:05.9124347Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9124472Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9124477Z 2023-01-11T21:38:05.9124591Z aten = torch.ops.aten 2023-01-11T21:38:05.9124730Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9124819Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9124833Z 2023-01-11T21:38:05.9124900Z import triton 2023-01-11T21:38:05.9124994Z import triton.language as tl 2023-01-11T21:38:05.9125120Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9125259Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9125264Z 2023-01-11T21:38:05.9125269Z 2023-01-11T21:38:05.9125407Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9125612Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9125740Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9125851Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9125953Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.9126060Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9126164Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9126228Z { 2023-01-11T21:38:05.9126332Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9126400Z { 2023-01-11T21:38:05.9126475Z #pragma omp for 2023-01-11T21:38:05.9126566Z for(long i0=0; i0<128; i0+=1) 2023-01-11T21:38:05.9126633Z { 2023-01-11T21:38:05.9126724Z for(long i1=0; i1<16; i1+=1) 2023-01-11T21:38:05.9126794Z { 2023-01-11T21:38:05.9126929Z auto tmp0 = at::vec::Vectorized(in_ptr0[i0]); 2023-01-11T21:38:05.9127102Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i1); 2023-01-11T21:38:05.9127197Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9127299Z tmp2.store(out_ptr0 + (8*i1) + (128*i0)); 2023-01-11T21:38:05.9127369Z } 2023-01-11T21:38:05.9127474Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.9127571Z for(long i1=128; i1<128; i1+=1) 2023-01-11T21:38:05.9127640Z { 2023-01-11T21:38:05.9127732Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9127824Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:05.9127908Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9128008Z out_ptr0[i1 + 
(128*i0)] = tmp2; 2023-01-11T21:38:05.9128080Z } 2023-01-11T21:38:05.9128147Z } 2023-01-11T21:38:05.9128233Z #pragma omp for 2023-01-11T21:38:05.9128322Z for(long i0=0; i0<2048; i0+=1) 2023-01-11T21:38:05.9128382Z { 2023-01-11T21:38:05.9128523Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr2 + 8*i0); 2023-01-11T21:38:05.9128662Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.9128752Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9128848Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9128919Z } 2023-01-11T21:38:05.9129018Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9129112Z for(long i0=16384; i0<16384; i0+=1) 2023-01-11T21:38:05.9129172Z { 2023-01-11T21:38:05.9129261Z auto tmp0 = in_ptr2[i0]; 2023-01-11T21:38:05.9129364Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.9129455Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9129542Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.9129610Z } 2023-01-11T21:38:05.9129669Z } 2023-01-11T21:38:05.9129735Z } 2023-01-11T21:38:05.9129820Z ''') 2023-01-11T21:38:05.9129825Z 2023-01-11T21:38:05.9129832Z 2023-01-11T21:38:05.9129927Z async_compile.wait(globals()) 2023-01-11T21:38:05.9130004Z del async_compile 2023-01-11T21:38:05.9130009Z 2023-01-11T21:38:05.9130085Z def call(args): 2023-01-11T21:38:05.9130176Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9130252Z args.clear() 2023-01-11T21:38:05.9130477Z buf0 = empty_strided((128, 128), (128, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9130681Z buf1 = empty_strided((128, 128), (128, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9130897Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9130975Z del arg0_1 2023-01-11T21:38:05.9131051Z del arg1_1 2023-01-11T21:38:05.9131124Z del arg2_1 2023-01-11T21:38:05.9131204Z return (buf0, buf1, ) 2023-01-11T21:38:05.9131209Z 2023-01-11T21:38:05.9131213Z 2023-01-11T21:38:05.9131295Z if __name__ == "__main__": 2023-01-11T21:38:05.9131410Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9131540Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9131745Z arg0_1 = rand_strided((128, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9131953Z arg1_1 = rand_strided((1, 128), (128, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9132158Z arg2_1 = rand_strided((128, 128), (128, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9132291Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9132296Z 2023-01-11T21:38:05.9132369Z ok (1.873s) 2023-01-11T21:38:05.9132823Z test_gather1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9132984Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9133235Z [2023-01-11 21:26:52,286] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 110 2023-01-11T21:38:05.9133499Z [2023-01-11 21:26:54,036] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 110 2023-01-11T21:38:05.9133505Z 2023-01-11T21:38:05.9133604Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9133681Z import torch 2023-01-11T21:38:05.9133756Z import random 2023-01-11T21:38:05.9133877Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9134002Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9134007Z 2023-01-11T21:38:05.9134090Z aten = torch.ops.aten 2023-01-11T21:38:05.9134219Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9134317Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9134326Z 2023-01-11T21:38:05.9134400Z import triton 2023-01-11T21:38:05.9134598Z import triton.language as tl 2023-01-11T21:38:05.9134725Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9134864Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9134870Z 2023-01-11T21:38:05.9134874Z 2023-01-11T21:38:05.9135015Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9135223Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9135338Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9135451Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9135558Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9135662Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9135727Z { 2023-01-11T21:38:05.9135829Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9135901Z { 2023-01-11T21:38:05.9135978Z #pragma omp for 2023-01-11T21:38:05.9136067Z for(long i0=0; i0<20; i0+=1) 2023-01-11T21:38:05.9136135Z { 2023-01-11T21:38:05.9136220Z #pragma GCC ivdep 2023-01-11T21:38:05.9136311Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:05.9136379Z { 2023-01-11T21:38:05.9136507Z { 2023-01-11T21:38:05.9136572Z { 2023-01-11T21:38:05.9136684Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:05.9136795Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.9136898Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9137008Z auto tmp3 = in_ptr1[tmp2 + (6*i1)]; 2023-01-11T21:38:05.9137111Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:05.9137281Z out_ptr1[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:05.9137345Z } 2023-01-11T21:38:05.9137416Z } 2023-01-11T21:38:05.9137491Z } 2023-01-11T21:38:05.9166692Z } 2023-01-11T21:38:05.9166771Z } 2023-01-11T21:38:05.9166831Z } 2023-01-11T21:38:05.9166943Z ''') 2023-01-11T21:38:05.9166949Z 2023-01-11T21:38:05.9166954Z 2023-01-11T21:38:05.9167057Z async_compile.wait(globals()) 2023-01-11T21:38:05.9167132Z del async_compile 2023-01-11T21:38:05.9167142Z 2023-01-11T21:38:05.9167214Z def call(args): 2023-01-11T21:38:05.9167290Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9167361Z args.clear() 2023-01-11T21:38:05.9167582Z buf0 = empty_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9167790Z buf1 = empty_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9167983Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), 
c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:05.9168053Z del arg0_1
2023-01-11T21:38:05.9168120Z del arg1_1
2023-01-11T21:38:05.9168306Z return (buf0, buf1, )
2023-01-11T21:38:05.9168312Z 
2023-01-11T21:38:05.9168317Z 
2023-01-11T21:38:05.9168393Z if __name__ == "__main__":
2023-01-11T21:38:05.9168510Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9168634Z from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9168846Z arg0_1 = rand_strided((1, 1, 10, 6), (60, 60, 6, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9169051Z arg1_1 = rand_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9169167Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:05.9169173Z 
2023-01-11T21:38:05.9169240Z ok (1.826s)
2023-01-11T21:38:05.9169354Z test_gather2_cpu (__main__.CpuTests) ... ok (0.001s)
2023-01-11T21:38:05.9169813Z test_gather_scatter_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9169944Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9170205Z [2023-01-11 21:26:54,184] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 111
2023-01-11T21:38:05.9170465Z [2023-01-11 21:26:55,935] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 111
2023-01-11T21:38:05.9170471Z 
2023-01-11T21:38:05.9170566Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9170633Z import torch
2023-01-11T21:38:05.9170704Z import random
2023-01-11T21:38:05.9170819Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9170941Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9170946Z 
2023-01-11T21:38:05.9171025Z aten = torch.ops.aten
2023-01-11T21:38:05.9171161Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9171253Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9171258Z 
2023-01-11T21:38:05.9171326Z import triton
2023-01-11T21:38:05.9171415Z import triton.language as tl
2023-01-11T21:38:05.9171540Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9171710Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9171716Z 
2023-01-11T21:38:05.9171720Z 
2023-01-11T21:38:05.9171852Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9172056Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9172175Z extern "C" void kernel(const long* __restrict__ in_ptr0,
2023-01-11T21:38:05.9172283Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:05.9172384Z float* __restrict__ out_ptr0)
2023-01-11T21:38:05.9172446Z {
2023-01-11T21:38:05.9172547Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9172607Z {
2023-01-11T21:38:05.9172685Z #pragma omp for
2023-01-11T21:38:05.9172769Z for(long i0=0; i0<64; i0+=1)
2023-01-11T21:38:05.9172832Z {
2023-01-11T21:38:05.9172975Z auto tmp0 = at::vec::Vectorized<float>(static_cast<float>(0));
2023-01-11T21:38:05.9173071Z tmp0.store(out_ptr0 + 8*i0);
2023-01-11T21:38:05.9173134Z }
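// Annotation (not part of the generated kernel): the vectorized loop above
// zero-initializes out_ptr0 in 8-float chunks; the scalar tail loop that
// follows covers leftover elements and is empty here because all 512 outputs
// (16*32) are handled by the 64 vector iterations. The gather/scatter loop
// further down then accumulates into out_ptr0 with atomic_add, since
// different (i0, i1) iterations may target the same output row tmp0.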
2023-01-11T21:38:05.9173227Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:05.9173311Z for(long i0=512; i0<512; i0+=1)
2023-01-11T21:38:05.9173374Z {
2023-01-11T21:38:05.9173475Z auto tmp0 = static_cast<float>(0);
2023-01-11T21:38:05.9173557Z out_ptr0[i0] = tmp0;
2023-01-11T21:38:05.9173620Z }
2023-01-11T21:38:05.9173697Z #pragma omp for
2023-01-11T21:38:05.9173776Z for(long i0=0; i0<80; i0+=1)
2023-01-11T21:38:05.9173838Z {
2023-01-11T21:38:05.9173920Z #pragma GCC ivdep
2023-01-11T21:38:05.9174043Z for(long i1=0; i1<32; i1+=1)
2023-01-11T21:38:05.9174107Z {
2023-01-11T21:38:05.9174172Z {
2023-01-11T21:38:05.9174239Z {
2023-01-11T21:38:05.9174337Z auto tmp0 = in_ptr0[80 + i0];
2023-01-11T21:38:05.9174435Z auto tmp1 = in_ptr0[i0];
2023-01-11T21:38:05.9174827Z auto tmp2 = in_ptr1[i1 + (32*tmp1)];
2023-01-11T21:38:05.9174935Z auto tmp3 = in_ptr1[i1 + (32*tmp0)];
2023-01-11T21:38:05.9175082Z auto tmp4 = tmp2 - tmp3;
2023-01-11T21:38:05.9175190Z auto tmp5 = static_cast<float>(1);
2023-01-11T21:38:05.9175284Z auto tmp6 = tmp4 + tmp5;
2023-01-11T21:38:05.9175398Z atomic_add(&out_ptr0[i1 + (32*tmp0)], tmp6);
2023-01-11T21:38:05.9175466Z }
2023-01-11T21:38:05.9175531Z }
2023-01-11T21:38:05.9175594Z }
2023-01-11T21:38:05.9175661Z }
2023-01-11T21:38:05.9175723Z }
2023-01-11T21:38:05.9175783Z }
2023-01-11T21:38:05.9175860Z ''')
2023-01-11T21:38:05.9175865Z 
2023-01-11T21:38:05.9175870Z 
2023-01-11T21:38:05.9175960Z async_compile.wait(globals())
2023-01-11T21:38:05.9176033Z del async_compile
2023-01-11T21:38:05.9176039Z 
2023-01-11T21:38:05.9176112Z def call(args):
2023-01-11T21:38:05.9176188Z arg0_1, arg1_1 = args
2023-01-11T21:38:05.9176260Z args.clear()
2023-01-11T21:38:05.9176457Z buf0 = empty_strided((16, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9176618Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:05.9176687Z del arg0_1
2023-01-11T21:38:05.9176754Z del arg1_1
2023-01-11T21:38:05.9176829Z return (buf0, )
2023-01-11T21:38:05.9176834Z 
2023-01-11T21:38:05.9176838Z 
2023-01-11T21:38:05.9176915Z if __name__ == "__main__":
2023-01-11T21:38:05.9177029Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9177231Z from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9177451Z arg0_1 = rand_strided((16, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9177642Z arg1_1 = rand_strided((2, 80), (80, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9177809Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:05.9177815Z 
2023-01-11T21:38:05.9177884Z ok (1.848s)
2023-01-11T21:38:05.9178330Z test_gelu_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9178458Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9178717Z [2023-01-11 21:26:55,974] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 112 2023-01-11T21:38:05.9178976Z [2023-01-11 21:26:57,765] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 112 2023-01-11T21:38:05.9178982Z 2023-01-11T21:38:05.9179080Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9179152Z import torch 2023-01-11T21:38:05.9179219Z import random 2023-01-11T21:38:05.9179335Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9179453Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9179459Z 2023-01-11T21:38:05.9179537Z aten = torch.ops.aten 2023-01-11T21:38:05.9179670Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9179761Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9179766Z 2023-01-11T21:38:05.9179836Z import triton 2023-01-11T21:38:05.9179924Z import triton.language as tl 2023-01-11T21:38:05.9180041Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9180216Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9180222Z 2023-01-11T21:38:05.9180226Z 2023-01-11T21:38:05.9180363Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9180570Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9180693Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9180797Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9180898Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9180966Z { 2023-01-11T21:38:05.9181063Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9181132Z { 2023-01-11T21:38:05.9181216Z #pragma omp for 2023-01-11T21:38:05.9181306Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.9181376Z { 2023-01-11T21:38:05.9181520Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9181668Z auto tmp1 = at::vec::Vectorized(static_cast(0.5)); 2023-01-11T21:38:05.9181753Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9181903Z auto tmp3 = at::vec::Vectorized(static_cast(0.7071067811865476)); 2023-01-11T21:38:05.9181998Z auto tmp4 = tmp0 * tmp3; 2023-01-11T21:38:05.9182091Z auto tmp5 = tmp4.erf(); 2023-01-11T21:38:05.9182231Z auto tmp6 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.9182323Z auto tmp7 = tmp5 + tmp6; 2023-01-11T21:38:05.9182413Z auto tmp8 = tmp2 * tmp7; 2023-01-11T21:38:05.9182552Z auto tmp9 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.9182639Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:05.9182731Z auto tmp11 = tmp0 + tmp6; 2023-01-11T21:38:05.9182824Z auto tmp12 = tmp11 * tmp1; 2023-01-11T21:38:05.9182916Z auto tmp13 = tmp11 * tmp3; 2023-01-11T21:38:05.9183011Z auto tmp14 = tmp13.erf(); 2023-01-11T21:38:05.9183103Z auto tmp15 = tmp14 + tmp6; 2023-01-11T21:38:05.9183198Z auto tmp16 = tmp12 * tmp15; 2023-01-11T21:38:05.9183291Z tmp10.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9183390Z tmp16.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9183490Z } 2023-01-11T21:38:05.9183597Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9183690Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:05.9183760Z { 2023-01-11T21:38:05.9183851Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9183954Z auto tmp1 = 
static_cast(0.5); 2023-01-11T21:38:05.9184045Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9184162Z auto tmp3 = static_cast(0.7071067811865476); 2023-01-11T21:38:05.9184255Z auto tmp4 = tmp0 * tmp3; 2023-01-11T21:38:05.9184351Z auto tmp5 = std::erf(tmp4); 2023-01-11T21:38:05.9184460Z auto tmp6 = static_cast(1); 2023-01-11T21:38:05.9184551Z auto tmp7 = tmp5 + tmp6; 2023-01-11T21:38:05.9184634Z auto tmp8 = tmp2 * tmp7; 2023-01-11T21:38:05.9184740Z auto tmp9 = static_cast(2); 2023-01-11T21:38:05.9184835Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:05.9184927Z auto tmp11 = tmp0 + tmp6; 2023-01-11T21:38:05.9185020Z auto tmp12 = tmp11 * tmp1; 2023-01-11T21:38:05.9185115Z auto tmp13 = tmp11 * tmp3; 2023-01-11T21:38:05.9185214Z auto tmp14 = std::erf(tmp13); 2023-01-11T21:38:05.9185299Z auto tmp15 = tmp14 + tmp6; 2023-01-11T21:38:05.9185392Z auto tmp16 = tmp12 * tmp15; 2023-01-11T21:38:05.9185482Z out_ptr0[i0] = tmp10; 2023-01-11T21:38:05.9185570Z out_ptr1[i0] = tmp16; 2023-01-11T21:38:05.9185640Z } 2023-01-11T21:38:05.9185709Z } 2023-01-11T21:38:05.9185769Z } 2023-01-11T21:38:05.9185857Z ''') 2023-01-11T21:38:05.9185896Z 2023-01-11T21:38:05.9185901Z 2023-01-11T21:38:05.9186002Z async_compile.wait(globals()) 2023-01-11T21:38:05.9186082Z del async_compile 2023-01-11T21:38:05.9186088Z 2023-01-11T21:38:05.9186166Z def call(args): 2023-01-11T21:38:05.9186243Z arg0_1, = args 2023-01-11T21:38:05.9186321Z args.clear() 2023-01-11T21:38:05.9186530Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9186726Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9186901Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9186978Z del arg0_1 2023-01-11T21:38:05.9187062Z return (buf0, buf1, ) 2023-01-11T21:38:05.9187067Z 2023-01-11T21:38:05.9187072Z 2023-01-11T21:38:05.9187154Z if __name__ == "__main__": 2023-01-11T21:38:05.9187276Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9187405Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9188926Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9189033Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9189038Z 2023-01-11T21:38:05.9189112Z ok (1.830s) 2023-01-11T21:38:05.9189567Z test_glu_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9189703Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9189964Z [2023-01-11 21:26:57,805] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 113 2023-01-11T21:38:05.9190232Z [2023-01-11 21:26:59,603] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 113 2023-01-11T21:38:05.9190240Z 2023-01-11T21:38:05.9190341Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9190417Z import torch 2023-01-11T21:38:05.9190495Z import random 2023-01-11T21:38:05.9190610Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9190766Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9190771Z 2023-01-11T21:38:05.9190858Z aten = torch.ops.aten 2023-01-11T21:38:05.9190999Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9191100Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9191105Z 2023-01-11T21:38:05.9191180Z import triton 2023-01-11T21:38:05.9191277Z import triton.language as tl 2023-01-11T21:38:05.9191405Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9191543Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9191549Z 2023-01-11T21:38:05.9191553Z 2023-01-11T21:38:05.9191698Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9191907Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9192033Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9192141Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9192249Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.9192353Z float* __restrict__ out_ptr2) 2023-01-11T21:38:05.9192420Z { 2023-01-11T21:38:05.9192518Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9192588Z { 2023-01-11T21:38:05.9192673Z #pragma omp for 2023-01-11T21:38:05.9192763Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.9192836Z { 2023-01-11T21:38:05.9192923Z #pragma GCC ivdep 2023-01-11T21:38:05.9193006Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9193077Z { 2023-01-11T21:38:05.9193148Z { 2023-01-11T21:38:05.9193255Z { 2023-01-11T21:38:05.9193365Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:05.9193478Z auto tmp1 = in_ptr0[4 + i1 + (8*i0)]; 2023-01-11T21:38:05.9193636Z auto tmp2 = std::exp(-tmp1); 2023-01-11T21:38:05.9193732Z auto tmp3 = 1 / (1 + tmp2); 2023-01-11T21:38:05.9193835Z auto tmp4 = tmp0 * tmp3; 2023-01-11T21:38:05.9193938Z out_ptr0[i1 + (4*i0)] = tmp4; 2023-01-11T21:38:05.9194013Z } 2023-01-11T21:38:05.9194086Z } 2023-01-11T21:38:05.9194158Z } 2023-01-11T21:38:05.9194228Z } 2023-01-11T21:38:05.9194305Z #pragma omp for 2023-01-11T21:38:05.9194395Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9194467Z { 2023-01-11T21:38:05.9194561Z for(long i1=0; i1<64; i1+=1) 2023-01-11T21:38:05.9194631Z { 2023-01-11T21:38:05.9194784Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (1024*i0)); 2023-01-11T21:38:05.9194937Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr0 + 512 + (8*i1) + (1024*i0)); 2023-01-11T21:38:05.9195083Z auto tmp2 = decltype(tmp1)(1)/(decltype(tmp1)(1) + tmp1.neg().exp()); 2023-01-11T21:38:05.9195175Z auto tmp3 = tmp0 * tmp2; 2023-01-11T21:38:05.9195285Z tmp3.store(out_ptr1 + (8*i1) + (512*i0)); 2023-01-11T21:38:05.9195355Z } 2023-01-11T21:38:05.9195454Z #pragma omp simd simdlen(4) 
2023-01-11T21:38:05.9195551Z for(long i1=512; i1<512; i1+=1) 2023-01-11T21:38:05.9195621Z { 2023-01-11T21:38:05.9195727Z auto tmp0 = in_ptr0[i1 + (1024*i0)]; 2023-01-11T21:38:05.9195831Z auto tmp1 = in_ptr0[512 + i1 + (1024*i0)]; 2023-01-11T21:38:05.9195977Z auto tmp2 = std::exp(-tmp1); 2023-01-11T21:38:05.9196075Z auto tmp3 = 1 / (1 + tmp2); 2023-01-11T21:38:05.9196175Z auto tmp4 = tmp0 * tmp3; 2023-01-11T21:38:05.9196274Z out_ptr1[i1 + (512*i0)] = tmp4; 2023-01-11T21:38:05.9196346Z } 2023-01-11T21:38:05.9196422Z } 2023-01-11T21:38:05.9196499Z #pragma omp for 2023-01-11T21:38:05.9196623Z for(long i0=0; i0<128; i0+=1) 2023-01-11T21:38:05.9196695Z { 2023-01-11T21:38:05.9196785Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9196854Z { 2023-01-11T21:38:05.9197001Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (64*i0)); 2023-01-11T21:38:05.9197152Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr0 + 32 + (8*i1) + (64*i0)); 2023-01-11T21:38:05.9197288Z auto tmp2 = decltype(tmp1)(1)/(decltype(tmp1)(1) + tmp1.neg().exp()); 2023-01-11T21:38:05.9197383Z auto tmp3 = tmp0 * tmp2; 2023-01-11T21:38:05.9197496Z tmp3.store(out_ptr2 + (8*i1) + (32*i0)); 2023-01-11T21:38:05.9197569Z } 2023-01-11T21:38:05.9197668Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.9197762Z for(long i1=32; i1<32; i1+=1) 2023-01-11T21:38:05.9197833Z { 2023-01-11T21:38:05.9197929Z auto tmp0 = in_ptr0[i1 + (64*i0)]; 2023-01-11T21:38:05.9198039Z auto tmp1 = in_ptr0[32 + i1 + (64*i0)]; 2023-01-11T21:38:05.9198188Z auto tmp2 = std::exp(-tmp1); 2023-01-11T21:38:05.9198284Z auto tmp3 = 1 / (1 + tmp2); 2023-01-11T21:38:05.9198380Z auto tmp4 = tmp0 * tmp3; 2023-01-11T21:38:05.9198480Z out_ptr2[i1 + (32*i0)] = tmp4; 2023-01-11T21:38:05.9198549Z } 2023-01-11T21:38:05.9198612Z } 2023-01-11T21:38:05.9198681Z } 2023-01-11T21:38:05.9198748Z } 2023-01-11T21:38:05.9198833Z ''') 2023-01-11T21:38:05.9198839Z 2023-01-11T21:38:05.9198843Z 2023-01-11T21:38:05.9198940Z async_compile.wait(globals()) 2023-01-11T21:38:05.9199048Z del async_compile 2023-01-11T21:38:05.9199054Z 2023-01-11T21:38:05.9199132Z def call(args): 2023-01-11T21:38:05.9199209Z arg0_1, = args 2023-01-11T21:38:05.9199280Z args.clear() 2023-01-11T21:38:05.9199500Z buf0 = empty_strided((8, 16, 8, 4), (512, 32, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9199717Z buf1 = empty_strided((8, 8, 8, 8), (512, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9199927Z buf2 = empty_strided((8, 16, 4, 8), (512, 32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9200125Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.9200200Z del arg0_1 2023-01-11T21:38:05.9200292Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:05.9200297Z 2023-01-11T21:38:05.9200301Z 2023-01-11T21:38:05.9200385Z if __name__ == "__main__": 2023-01-11T21:38:05.9200499Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9200633Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9200852Z arg0_1 = rand_strided((8, 16, 8, 8), (1024, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9200969Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9200974Z 2023-01-11T21:38:05.9201047Z ok (1.838s) 2023-01-11T21:38:05.9201511Z test_grid_sampler_2d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9201646Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9201909Z [2023-01-11 21:27:00,987] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 114 2023-01-11T21:38:05.9202123Z [2023-01-11 21:27:01,595] torch._inductor.scheduler: [DEBUG] remove_buffer('buf3') 2023-01-11T21:38:05.9202321Z [2023-01-11 21:27:01,596] torch._inductor.scheduler: [DEBUG] remove_buffer('buf7') 2023-01-11T21:38:05.9202524Z [2023-01-11 21:27:01,596] torch._inductor.scheduler: [DEBUG] remove_buffer('buf5') 2023-01-11T21:38:05.9202608Z 2023-01-11T21:38:05.9202711Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9202788Z import torch 2023-01-11T21:38:05.9202867Z import random 2023-01-11T21:38:05.9202990Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9203117Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9203122Z 2023-01-11T21:38:05.9203208Z aten = torch.ops.aten 2023-01-11T21:38:05.9203340Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9203440Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9203445Z 2023-01-11T21:38:05.9203521Z import triton 2023-01-11T21:38:05.9203618Z import triton.language as tl 2023-01-11T21:38:05.9203748Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9203892Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9203898Z 2023-01-11T21:38:05.9203903Z 2023-01-11T21:38:05.9204043Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9204257Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9204374Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.9204488Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:05.9204603Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9204716Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9204825Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9204930Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.9205036Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.9205162Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.9205266Z long* __restrict__ out_ptr4, 2023-01-11T21:38:05.9205366Z long* __restrict__ out_ptr5, 2023-01-11T21:38:05.9205469Z float* __restrict__ out_ptr6, 2023-01-11T21:38:05.9205571Z long* __restrict__ out_ptr7, 2023-01-11T21:38:05.9205673Z long* __restrict__ out_ptr8, 2023-01-11T21:38:05.9205775Z float* __restrict__ out_ptr9, 2023-01-11T21:38:05.9205878Z long* __restrict__ out_ptr10, 2023-01-11T21:38:05.9205973Z long* __restrict__ out_ptr11, 2023-01-11T21:38:05.9206080Z float* __restrict__ out_ptr12, 2023-01-11T21:38:05.9206184Z long* __restrict__ out_ptr13, 2023-01-11T21:38:05.9206285Z long* __restrict__ out_ptr14, 2023-01-11T21:38:05.9206394Z float* __restrict__ out_ptr15) 2023-01-11T21:38:05.9206462Z { 2023-01-11T21:38:05.9206568Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9206629Z { 2023-01-11T21:38:05.9206713Z #pragma omp for 2023-01-11T21:38:05.9206807Z for(long i0=0; i0<495616; i0+=1) 2023-01-11T21:38:05.9206880Z { 2023-01-11T21:38:05.9206953Z { 2023-01-11T21:38:05.9207028Z { 2023-01-11T21:38:05.9207124Z auto tmp0 = in_ptr0[2*i0]; 
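// Annotation (not part of the generated kernel): tmp0 here and tmp9 on the
// next line are the normalized (x, y) grid coordinates; the arithmetic below
// (coord * 175.5 + 175.5, then floor and floor + 1) appears to unnormalize
// [-1, 1] onto the 352-pixel range and select the two nearest grid lines per
// axis, with tmp24/tmp31/tmp39/tmp43 the four bilinear corner weights
// (zeroed whenever a corner falls outside the image).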
2023-01-11T21:38:05.9207230Z auto tmp9 = in_ptr0[1 + (2*i0)]; 2023-01-11T21:38:05.9207345Z auto tmp1 = static_cast(175.5); 2023-01-11T21:38:05.9207446Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9207544Z auto tmp3 = tmp2 + tmp1; 2023-01-11T21:38:05.9207654Z auto tmp4 = std::floor(tmp3); 2023-01-11T21:38:05.9207765Z auto tmp5 = static_cast(0); 2023-01-11T21:38:05.9207868Z auto tmp6 = tmp4 >= tmp5; 2023-01-11T21:38:05.9207975Z auto tmp7 = static_cast(352); 2023-01-11T21:38:05.9208073Z auto tmp8 = tmp4 < tmp7; 2023-01-11T21:38:05.9208172Z auto tmp10 = tmp9 * tmp1; 2023-01-11T21:38:05.9208273Z auto tmp11 = tmp10 + tmp1; 2023-01-11T21:38:05.9208425Z auto tmp12 = std::floor(tmp11); 2023-01-11T21:38:05.9208528Z auto tmp13 = tmp12 >= tmp5; 2023-01-11T21:38:05.9208628Z auto tmp14 = tmp12 < tmp7; 2023-01-11T21:38:05.9208724Z auto tmp15 = tmp13 && tmp14; 2023-01-11T21:38:05.9208826Z auto tmp16 = tmp8 && tmp15; 2023-01-11T21:38:05.9208928Z auto tmp17 = tmp6 && tmp16; 2023-01-11T21:38:05.9209044Z auto tmp18 = static_cast(1); 2023-01-11T21:38:05.9209145Z auto tmp19 = tmp4 + tmp18; 2023-01-11T21:38:05.9209292Z auto tmp20 = tmp19 - tmp3; 2023-01-11T21:38:05.9209398Z auto tmp21 = tmp12 + tmp18; 2023-01-11T21:38:05.9209534Z auto tmp22 = tmp21 - tmp11; 2023-01-11T21:38:05.9209635Z auto tmp23 = tmp20 * tmp22; 2023-01-11T21:38:05.9209743Z auto tmp24 = tmp17 ? tmp23 : tmp5; 2023-01-11T21:38:05.9209846Z auto tmp25 = tmp19 >= tmp5; 2023-01-11T21:38:05.9209947Z auto tmp26 = tmp19 < tmp7; 2023-01-11T21:38:05.9210049Z auto tmp27 = tmp26 && tmp15; 2023-01-11T21:38:05.9210151Z auto tmp28 = tmp25 && tmp27; 2023-01-11T21:38:05.9210287Z auto tmp29 = tmp3 - tmp4; 2023-01-11T21:38:05.9210389Z auto tmp30 = tmp29 * tmp22; 2023-01-11T21:38:05.9210499Z auto tmp31 = tmp28 ? tmp30 : tmp5; 2023-01-11T21:38:05.9210596Z auto tmp32 = tmp21 >= tmp5; 2023-01-11T21:38:05.9210696Z auto tmp33 = tmp21 < tmp7; 2023-01-11T21:38:05.9210828Z auto tmp34 = tmp32 && tmp33; 2023-01-11T21:38:05.9210930Z auto tmp35 = tmp8 && tmp34; 2023-01-11T21:38:05.9211028Z auto tmp36 = tmp6 && tmp35; 2023-01-11T21:38:05.9211164Z auto tmp37 = tmp11 - tmp12; 2023-01-11T21:38:05.9211266Z auto tmp38 = tmp20 * tmp37; 2023-01-11T21:38:05.9211375Z auto tmp39 = tmp36 ? tmp38 : tmp5; 2023-01-11T21:38:05.9211476Z auto tmp40 = tmp26 && tmp34; 2023-01-11T21:38:05.9211577Z auto tmp41 = tmp25 && tmp40; 2023-01-11T21:38:05.9211677Z auto tmp42 = tmp29 * tmp37; 2023-01-11T21:38:05.9211785Z auto tmp43 = tmp41 ? tmp42 : tmp5; 2023-01-11T21:38:05.9211895Z auto tmp44 = static_cast(176.0); 2023-01-11T21:38:05.9211996Z auto tmp45 = tmp0 * tmp44; 2023-01-11T21:38:05.9212094Z auto tmp46 = tmp45 + tmp1; 2023-01-11T21:38:05.9212212Z auto tmp47 = static_cast(0.0); 2023-01-11T21:38:05.9212357Z auto tmp48 = (tmp47 != tmp47) ? tmp47 : std::max(tmp46, tmp47); 2023-01-11T21:38:05.9212473Z auto tmp49 = static_cast(351.0); 2023-01-11T21:38:05.9212614Z auto tmp50 = (tmp49 != tmp49) ? tmp49 : std::min(tmp48, tmp49); 2023-01-11T21:38:05.9212722Z auto tmp51 = std::floor(tmp50); 2023-01-11T21:38:05.9212814Z auto tmp52 = tmp51 >= tmp5; 2023-01-11T21:38:05.9212914Z auto tmp53 = tmp51 < tmp7; 2023-01-11T21:38:05.9213013Z auto tmp54 = tmp9 * tmp44; 2023-01-11T21:38:05.9213114Z auto tmp55 = tmp54 + tmp1; 2023-01-11T21:38:05.9213250Z auto tmp56 = (tmp47 != tmp47) ? tmp47 : std::max(tmp55, tmp47); 2023-01-11T21:38:05.9213383Z auto tmp57 = (tmp49 != tmp49) ? 
tmp49 : std::min(tmp56, tmp49); 2023-01-11T21:38:05.9213496Z auto tmp58 = std::floor(tmp57); 2023-01-11T21:38:05.9213587Z auto tmp59 = tmp58 >= tmp5; 2023-01-11T21:38:05.9213687Z auto tmp60 = tmp58 < tmp7; 2023-01-11T21:38:05.9213788Z auto tmp61 = tmp59 && tmp60; 2023-01-11T21:38:05.9213919Z auto tmp62 = tmp53 && tmp61; 2023-01-11T21:38:05.9214022Z auto tmp63 = tmp52 && tmp62; 2023-01-11T21:38:05.9214140Z auto tmp64 = static_cast(tmp51); 2023-01-11T21:38:05.9214253Z auto tmp65 = static_cast(0); 2023-01-11T21:38:05.9214364Z auto tmp66 = tmp63 ? tmp64 : tmp65; 2023-01-11T21:38:05.9214471Z auto tmp67 = static_cast(tmp58); 2023-01-11T21:38:05.9214693Z auto tmp68 = tmp63 ? tmp67 : tmp65; 2023-01-11T21:38:05.9214793Z auto tmp69 = tmp51 + tmp18; 2023-01-11T21:38:05.9214943Z auto tmp70 = tmp69 - tmp50; 2023-01-11T21:38:05.9215043Z auto tmp71 = tmp58 + tmp18; 2023-01-11T21:38:05.9215187Z auto tmp72 = tmp71 - tmp57; 2023-01-11T21:38:05.9215284Z auto tmp73 = tmp70 * tmp72; 2023-01-11T21:38:05.9215384Z auto tmp74 = tmp63 ? tmp73 : tmp5; 2023-01-11T21:38:05.9215497Z auto tmp75 = tmp69 >= tmp5; 2023-01-11T21:38:05.9215604Z auto tmp76 = tmp69 < tmp7; 2023-01-11T21:38:05.9215725Z auto tmp77 = tmp76 && tmp61; 2023-01-11T21:38:05.9215826Z auto tmp78 = tmp75 && tmp77; 2023-01-11T21:38:05.9215939Z auto tmp79 = static_cast(tmp69); 2023-01-11T21:38:05.9216049Z auto tmp80 = tmp78 ? tmp79 : tmp65; 2023-01-11T21:38:05.9216146Z auto tmp81 = tmp78 ? tmp67 : tmp65; 2023-01-11T21:38:05.9216287Z auto tmp82 = tmp50 - tmp51; 2023-01-11T21:38:05.9216435Z auto tmp83 = tmp82 * tmp72; 2023-01-11T21:38:05.9216539Z auto tmp84 = tmp78 ? tmp83 : tmp5; 2023-01-11T21:38:05.9216635Z auto tmp85 = tmp71 >= tmp5; 2023-01-11T21:38:05.9216735Z auto tmp86 = tmp71 < tmp7; 2023-01-11T21:38:05.9216837Z auto tmp87 = tmp85 && tmp86; 2023-01-11T21:38:05.9216935Z auto tmp88 = tmp53 && tmp87; 2023-01-11T21:38:05.9217025Z auto tmp89 = tmp52 && tmp88; 2023-01-11T21:38:05.9217180Z auto tmp90 = tmp89 ? tmp64 : tmp65; 2023-01-11T21:38:05.9217303Z auto tmp91 = static_cast(tmp71); 2023-01-11T21:38:05.9217407Z auto tmp92 = tmp89 ? tmp91 : tmp65; 2023-01-11T21:38:05.9217550Z auto tmp93 = tmp57 - tmp58; 2023-01-11T21:38:05.9217649Z auto tmp94 = tmp70 * tmp93; 2023-01-11T21:38:05.9217753Z auto tmp95 = tmp89 ? tmp94 : tmp5; 2023-01-11T21:38:05.9217846Z auto tmp96 = tmp76 && tmp87; 2023-01-11T21:38:05.9217943Z auto tmp97 = tmp75 && tmp96; 2023-01-11T21:38:05.9218046Z auto tmp98 = tmp97 ? tmp79 : tmp65; 2023-01-11T21:38:05.9218150Z auto tmp99 = tmp97 ? tmp91 : tmp65; 2023-01-11T21:38:05.9218250Z auto tmp100 = tmp82 * tmp93; 2023-01-11T21:38:05.9218357Z auto tmp101 = tmp97 ? 
tmp100 : tmp5; 2023-01-11T21:38:05.9218449Z out_ptr0[i0] = tmp24; 2023-01-11T21:38:05.9218532Z out_ptr1[i0] = tmp31; 2023-01-11T21:38:05.9218621Z out_ptr2[i0] = tmp39; 2023-01-11T21:38:05.9218710Z out_ptr3[i0] = tmp43; 2023-01-11T21:38:05.9218799Z out_ptr4[i0] = tmp66; 2023-01-11T21:38:05.9218886Z out_ptr5[i0] = tmp68; 2023-01-11T21:38:05.9218974Z out_ptr6[i0] = tmp74; 2023-01-11T21:38:05.9219066Z out_ptr7[i0] = tmp80; 2023-01-11T21:38:05.9219146Z out_ptr8[i0] = tmp81; 2023-01-11T21:38:05.9219233Z out_ptr9[i0] = tmp84; 2023-01-11T21:38:05.9219327Z out_ptr10[i0] = tmp90; 2023-01-11T21:38:05.9219422Z out_ptr11[i0] = tmp92; 2023-01-11T21:38:05.9219565Z out_ptr12[i0] = tmp95; 2023-01-11T21:38:05.9219657Z out_ptr13[i0] = tmp98; 2023-01-11T21:38:05.9219746Z out_ptr14[i0] = tmp99; 2023-01-11T21:38:05.9219832Z out_ptr15[i0] = tmp101; 2023-01-11T21:38:05.9219903Z } 2023-01-11T21:38:05.9219972Z } 2023-01-11T21:38:05.9220044Z } 2023-01-11T21:38:05.9220141Z #pragma omp for collapse(3) 2023-01-11T21:38:05.9220229Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.9220296Z { 2023-01-11T21:38:05.9220377Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:05.9220446Z { 2023-01-11T21:38:05.9220547Z for(long i2=0; i2<123904; i2+=1) 2023-01-11T21:38:05.9220616Z { 2023-01-11T21:38:05.9220694Z { 2023-01-11T21:38:05.9220766Z { 2023-01-11T21:38:05.9220882Z auto tmp2 = in_ptr0[(2*i2) + (247808*i0)]; 2023-01-11T21:38:05.9220996Z auto tmp11 = in_ptr0[1 + (2*i2) + (247808*i0)]; 2023-01-11T21:38:05.9221116Z auto tmp51 = out_ptr0[i2 + (123904*i0)]; 2023-01-11T21:38:05.9221232Z auto tmp53 = out_ptr1[i2 + (123904*i0)]; 2023-01-11T21:38:05.9221345Z auto tmp56 = out_ptr2[i2 + (123904*i0)]; 2023-01-11T21:38:05.9221454Z auto tmp59 = out_ptr3[i2 + (123904*i0)]; 2023-01-11T21:38:05.9221569Z auto tmp0 = static_cast(i0); 2023-01-11T21:38:05.9221683Z auto tmp1 = static_cast(i1); 2023-01-11T21:38:05.9221831Z auto tmp3 = static_cast(175.5); 2023-01-11T21:38:05.9221928Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9222031Z auto tmp5 = tmp4 + tmp3; 2023-01-11T21:38:05.9222148Z auto tmp6 = std::floor(tmp5); 2023-01-11T21:38:05.9222266Z auto tmp7 = static_cast(0); 2023-01-11T21:38:05.9222371Z auto tmp8 = tmp6 >= tmp7; 2023-01-11T21:38:05.9222488Z auto tmp9 = static_cast(352); 2023-01-11T21:38:05.9222595Z auto tmp10 = tmp6 < tmp9; 2023-01-11T21:38:05.9222692Z auto tmp12 = tmp11 * tmp3; 2023-01-11T21:38:05.9222795Z auto tmp13 = tmp12 + tmp3; 2023-01-11T21:38:05.9222912Z auto tmp14 = std::floor(tmp13); 2023-01-11T21:38:05.9223016Z auto tmp15 = tmp14 >= tmp7; 2023-01-11T21:38:05.9223123Z auto tmp16 = tmp14 < tmp9; 2023-01-11T21:38:05.9223230Z auto tmp17 = tmp15 && tmp16; 2023-01-11T21:38:05.9223338Z auto tmp18 = tmp10 && tmp17; 2023-01-11T21:38:05.9223441Z auto tmp19 = tmp8 && tmp18; 2023-01-11T21:38:05.9223557Z auto tmp20 = static_cast(tmp14); 2023-01-11T21:38:05.9223676Z auto tmp21 = static_cast(0); 2023-01-11T21:38:05.9223789Z auto tmp22 = tmp19 ? tmp20 : tmp21; 2023-01-11T21:38:05.9223907Z auto tmp23 = static_cast(tmp6); 2023-01-11T21:38:05.9224021Z auto tmp24 = tmp19 ? 
tmp23 : tmp21; 2023-01-11T21:38:05.9224164Z auto tmp25 = in_ptr1[tmp24 + (352*tmp22) + (123904*tmp1) + (371712*tmp0)]; 2023-01-11T21:38:05.9224280Z auto tmp26 = static_cast(1); 2023-01-11T21:38:05.9224379Z auto tmp27 = tmp6 + tmp26; 2023-01-11T21:38:05.9224483Z auto tmp28 = tmp27 >= tmp7; 2023-01-11T21:38:05.9224589Z auto tmp29 = tmp27 < tmp9; 2023-01-11T21:38:05.9224698Z auto tmp30 = tmp29 && tmp17; 2023-01-11T21:38:05.9224838Z auto tmp31 = tmp28 && tmp30; 2023-01-11T21:38:05.9224947Z auto tmp32 = tmp31 ? tmp20 : tmp21; 2023-01-11T21:38:05.9225064Z auto tmp33 = static_cast(tmp27); 2023-01-11T21:38:05.9225178Z auto tmp34 = tmp31 ? tmp33 : tmp21; 2023-01-11T21:38:05.9225309Z auto tmp35 = in_ptr1[tmp34 + (352*tmp32) + (123904*tmp1) + (371712*tmp0)]; 2023-01-11T21:38:05.9225413Z auto tmp36 = tmp14 + tmp26; 2023-01-11T21:38:05.9225517Z auto tmp37 = tmp36 >= tmp7; 2023-01-11T21:38:05.9225623Z auto tmp38 = tmp36 < tmp9; 2023-01-11T21:38:05.9225729Z auto tmp39 = tmp37 && tmp38; 2023-01-11T21:38:05.9225836Z auto tmp40 = tmp10 && tmp39; 2023-01-11T21:38:05.9225941Z auto tmp41 = tmp8 && tmp40; 2023-01-11T21:38:05.9226052Z auto tmp42 = static_cast(tmp36); 2023-01-11T21:38:05.9226162Z auto tmp43 = tmp41 ? tmp42 : tmp21; 2023-01-11T21:38:05.9226270Z auto tmp44 = tmp41 ? tmp23 : tmp21; 2023-01-11T21:38:05.9226409Z auto tmp45 = in_ptr1[tmp44 + (352*tmp43) + (123904*tmp1) + (371712*tmp0)]; 2023-01-11T21:38:05.9226515Z auto tmp46 = tmp29 && tmp39; 2023-01-11T21:38:05.9226623Z auto tmp47 = tmp28 && tmp46; 2023-01-11T21:38:05.9226734Z auto tmp48 = tmp47 ? tmp42 : tmp21; 2023-01-11T21:38:05.9226874Z auto tmp49 = tmp47 ? tmp33 : tmp21; 2023-01-11T21:38:05.9227007Z auto tmp50 = in_ptr1[tmp49 + (352*tmp48) + (123904*tmp1) + (371712*tmp0)]; 2023-01-11T21:38:05.9227115Z auto tmp52 = tmp25 * tmp51; 2023-01-11T21:38:05.9227221Z auto tmp54 = tmp35 * tmp53; 2023-01-11T21:38:05.9227324Z auto tmp55 = tmp52 + tmp54; 2023-01-11T21:38:05.9227425Z auto tmp57 = tmp45 * tmp56; 2023-01-11T21:38:05.9227527Z auto tmp58 = tmp55 + tmp57; 2023-01-11T21:38:05.9227629Z auto tmp60 = tmp50 * tmp59; 2023-01-11T21:38:05.9227732Z auto tmp61 = tmp58 + tmp60; 2023-01-11T21:38:05.9227849Z in_out_ptr0[i2 + (123904*i1) + (371712*i0)] = tmp61; 2023-01-11T21:38:05.9227925Z } 2023-01-11T21:38:05.9227999Z } 2023-01-11T21:38:05.9228073Z } 2023-01-11T21:38:05.9228143Z } 2023-01-11T21:38:05.9228212Z } 2023-01-11T21:38:05.9228312Z #pragma omp for collapse(3) 2023-01-11T21:38:05.9228397Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.9228467Z { 2023-01-11T21:38:05.9228562Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:05.9228633Z { 2023-01-11T21:38:05.9228736Z for(long i2=0; i2<123904; i2+=1) 2023-01-11T21:38:05.9228809Z { 2023-01-11T21:38:05.9228875Z { 2023-01-11T21:38:05.9228955Z { 2023-01-11T21:38:05.9229074Z auto tmp2 = out_ptr5[i2 + (123904*i0)]; 2023-01-11T21:38:05.9229187Z auto tmp3 = out_ptr4[i2 + (123904*i0)]; 2023-01-11T21:38:05.9229298Z auto tmp5 = out_ptr6[i2 + (123904*i0)]; 2023-01-11T21:38:05.9229410Z auto tmp7 = out_ptr8[i2 + (123904*i0)]; 2023-01-11T21:38:05.9229521Z auto tmp8 = out_ptr7[i2 + (123904*i0)]; 2023-01-11T21:38:05.9229639Z auto tmp10 = out_ptr9[i2 + (123904*i0)]; 2023-01-11T21:38:05.9229749Z auto tmp13 = out_ptr11[i2 + (123904*i0)]; 2023-01-11T21:38:05.9229887Z auto tmp14 = out_ptr10[i2 + (123904*i0)]; 2023-01-11T21:38:05.9230001Z auto tmp16 = out_ptr12[i2 + (123904*i0)]; 2023-01-11T21:38:05.9230112Z auto tmp19 = out_ptr14[i2 + (123904*i0)]; 2023-01-11T21:38:05.9230222Z auto tmp20 = out_ptr13[i2 + (123904*i0)]; 
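// Annotation (not part of the generated kernel): this second pass reloads the
// corner indices and bilinear weights precomputed into out_ptr4..out_ptr15 by
// the first loop, gathers the four neighboring input pixels from in_ptr1, and
// blends them into in_out_ptr1 (buf22 in the wrapper below).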
2023-01-11T21:38:05.9230331Z auto tmp22 = out_ptr15[i2 + (123904*i0)]; 2023-01-11T21:38:05.9230449Z auto tmp0 = static_cast(i0); 2023-01-11T21:38:05.9230566Z auto tmp1 = static_cast(i1); 2023-01-11T21:38:05.9230699Z auto tmp4 = in_ptr1[tmp3 + (352*tmp2) + (123904*tmp1) + (371712*tmp0)]; 2023-01-11T21:38:05.9230808Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:05.9230945Z auto tmp9 = in_ptr1[tmp8 + (352*tmp7) + (123904*tmp1) + (371712*tmp0)]; 2023-01-11T21:38:05.9231055Z auto tmp11 = tmp9 * tmp10; 2023-01-11T21:38:05.9231160Z auto tmp12 = tmp6 + tmp11; 2023-01-11T21:38:05.9231304Z auto tmp15 = in_ptr1[tmp14 + (352*tmp13) + (123904*tmp1) + (371712*tmp0)]; 2023-01-11T21:38:05.9231410Z auto tmp17 = tmp15 * tmp16; 2023-01-11T21:38:05.9231507Z auto tmp18 = tmp12 + tmp17; 2023-01-11T21:38:05.9231645Z auto tmp21 = in_ptr1[tmp20 + (352*tmp19) + (123904*tmp1) + (371712*tmp0)]; 2023-01-11T21:38:05.9231750Z auto tmp23 = tmp21 * tmp22; 2023-01-11T21:38:05.9231878Z auto tmp24 = tmp18 + tmp23; 2023-01-11T21:38:05.9231998Z in_out_ptr1[i2 + (123904*i1) + (371712*i0)] = tmp24; 2023-01-11T21:38:05.9232071Z } 2023-01-11T21:38:05.9232145Z } 2023-01-11T21:38:05.9232213Z } 2023-01-11T21:38:05.9232273Z } 2023-01-11T21:38:05.9232344Z } 2023-01-11T21:38:05.9232410Z } 2023-01-11T21:38:05.9232476Z } 2023-01-11T21:38:05.9232565Z ''') 2023-01-11T21:38:05.9232570Z 2023-01-11T21:38:05.9232575Z 2023-01-11T21:38:05.9232670Z async_compile.wait(globals()) 2023-01-11T21:38:05.9232750Z del async_compile 2023-01-11T21:38:05.9232755Z 2023-01-11T21:38:05.9232823Z def call(args): 2023-01-11T21:38:05.9232902Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9232976Z args.clear() 2023-01-11T21:38:05.9233195Z buf0 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9233410Z buf2 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9233622Z buf4 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9233827Z buf6 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9234041Z buf9 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9234245Z buf10 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9234460Z buf11 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9234665Z buf12 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9234875Z buf13 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9235091Z buf14 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9235299Z buf15 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9235503Z buf16 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9235745Z buf17 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9235942Z buf19 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9236150Z buf20 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9236359Z buf21 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.float32) 
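# Annotation (not part of the generated wrapper): the two (4, 3, 352, 352)
# output buffers below are allocated once and then aliased (buf8 = buf1,
# buf22 = buf18) so the kernel can write its results in place; Inductor marks
# these aliases with "# reuse".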
2023-01-11T21:38:05.9236588Z buf1 = empty_strided((4, 3, 352, 352), (371712, 123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9236679Z buf8 = buf1; del buf1 # reuse 2023-01-11T21:38:05.9236906Z buf18 = empty_strided((4, 3, 352, 352), (371712, 123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9237003Z buf22 = buf18; del buf18 # reuse 2023-01-11T21:38:05.9237541Z kernel_cpp_0(c_void_p(buf8.data_ptr()), c_void_p(buf22.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf6.data_ptr()), c_void_p(buf9.data_ptr()), c_void_p(buf10.data_ptr()), c_void_p(buf11.data_ptr()), c_void_p(buf12.data_ptr()), c_void_p(buf13.data_ptr()), c_void_p(buf14.data_ptr()), c_void_p(buf15.data_ptr()), c_void_p(buf16.data_ptr()), c_void_p(buf17.data_ptr()), c_void_p(buf19.data_ptr()), c_void_p(buf20.data_ptr()), c_void_p(buf21.data_ptr())) 2023-01-11T21:38:05.9237618Z del arg0_1 2023-01-11T21:38:05.9237691Z del arg1_1 2023-01-11T21:38:05.9237769Z return (buf8, buf22, ) 2023-01-11T21:38:05.9237774Z 2023-01-11T21:38:05.9237779Z 2023-01-11T21:38:05.9237862Z if __name__ == "__main__": 2023-01-11T21:38:05.9237981Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9238140Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9238373Z arg0_1 = rand_strided((4, 3, 352, 352), (371712, 123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9238600Z arg1_1 = rand_strided((4, 352, 352, 2), (247808, 704, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9238719Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9238984Z [2023-01-11 21:27:03,441] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 114 2023-01-11T21:38:05.9238989Z 2023-01-11T21:38:05.9239054Z ok (3.888s) 2023-01-11T21:38:05.9239513Z test_hardsigmoid_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9239649Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9239910Z [2023-01-11 21:27:03,542] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 115
2023-01-11T21:38:05.9240176Z [2023-01-11 21:27:05,302] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 115
2023-01-11T21:38:05.9240182Z 
2023-01-11T21:38:05.9240279Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9240355Z import torch
2023-01-11T21:38:05.9240430Z import random
2023-01-11T21:38:05.9240549Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9240666Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9240679Z 
2023-01-11T21:38:05.9240755Z aten = torch.ops.aten
2023-01-11T21:38:05.9240892Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9240987Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9240995Z 
2023-01-11T21:38:05.9241069Z import triton
2023-01-11T21:38:05.9241162Z import triton.language as tl
2023-01-11T21:38:05.9241288Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9241429Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9241434Z 
2023-01-11T21:38:05.9241467Z 
2023-01-11T21:38:05.9241607Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9241806Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9241930Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:05.9242037Z float* __restrict__ out_ptr0,
2023-01-11T21:38:05.9242139Z float* __restrict__ out_ptr1,
2023-01-11T21:38:05.9242239Z float* __restrict__ out_ptr2)
2023-01-11T21:38:05.9242303Z {
2023-01-11T21:38:05.9242407Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9242470Z {
2023-01-11T21:38:05.9242555Z #pragma omp for
2023-01-11T21:38:05.9242643Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:05.9242710Z {
2023-01-11T21:38:05.9242851Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:05.9242994Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3));
2023-01-11T21:38:05.9243086Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.9243221Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(0.0));
2023-01-11T21:38:05.9243335Z auto tmp4 = at::vec::maximum(tmp2, tmp3);
2023-01-11T21:38:05.9243473Z auto tmp5 = at::vec::Vectorized<float>(static_cast<float>(6.0));
2023-01-11T21:38:05.9243587Z auto tmp6 = at::vec::minimum(tmp4, tmp5);
2023-01-11T21:38:05.9243724Z auto tmp7 = at::vec::Vectorized<float>(static_cast<float>(6));
2023-01-11T21:38:05.9243815Z auto tmp8 = tmp6 / tmp7;
2023-01-11T21:38:05.9243934Z auto tmp9 = tmp2 + tmp1;
2023-01-11T21:38:05.9244049Z auto tmp10 = at::vec::maximum(tmp9, tmp3);
2023-01-11T21:38:05.9244157Z auto tmp11 = at::vec::minimum(tmp10, tmp5);
2023-01-11T21:38:05.9244248Z auto tmp12 = tmp11 / tmp7;
2023-01-11T21:38:05.9244381Z auto tmp13 = tmp0 - tmp1;
2023-01-11T21:38:05.9244473Z auto tmp14 = tmp13 + tmp1;
2023-01-11T21:38:05.9244587Z auto tmp15 = at::vec::maximum(tmp14, tmp3);
2023-01-11T21:38:05.9244698Z auto tmp16 = at::vec::minimum(tmp15, tmp5);
2023-01-11T21:38:05.9244789Z auto tmp17 = tmp16 / tmp7;
2023-01-11T21:38:05.9244886Z tmp8.store(out_ptr0 + 8*i0);
2023-01-11T21:38:05.9244976Z tmp12.store(out_ptr1 + 8*i0);
2023-01-11T21:38:05.9245072Z tmp17.store(out_ptr2 + 8*i0);
2023-01-11T21:38:05.9245138Z }
2023-01-11T21:38:05.9245238Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:05.9245334Z for(long i0=64; i0<64; i0+=1)
2023-01-11T21:38:05.9245419Z {
2023-01-11T21:38:05.9245508Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:05.9245630Z auto tmp1 = static_cast<float>(3);
2023-01-11T21:38:05.9245719Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.9245825Z auto tmp3 = static_cast<float>(0.0);
2023-01-11T21:38:05.9245959Z auto tmp4 = (tmp3 != tmp3) ? tmp3 : std::max(tmp2, tmp3);
2023-01-11T21:38:05.9246066Z auto tmp5 = static_cast<float>(6.0);
2023-01-11T21:38:05.9246192Z auto tmp6 = (tmp5 != tmp5) ? tmp5 : std::min(tmp4, tmp5);
2023-01-11T21:38:05.9246296Z auto tmp7 = static_cast<float>(6);
2023-01-11T21:38:05.9246379Z auto tmp8 = tmp6 / tmp7;
2023-01-11T21:38:05.9246466Z auto tmp9 = tmp2 + tmp1;
2023-01-11T21:38:05.9246596Z auto tmp10 = (tmp3 != tmp3) ? tmp3 : std::max(tmp9, tmp3);
2023-01-11T21:38:05.9246724Z auto tmp11 = (tmp5 != tmp5) ? tmp5 : std::min(tmp10, tmp5);
2023-01-11T21:38:05.9246820Z auto tmp12 = tmp11 / tmp7;
2023-01-11T21:38:05.9246950Z auto tmp13 = tmp0 - tmp1;
2023-01-11T21:38:05.9247042Z auto tmp14 = tmp13 + tmp1;
2023-01-11T21:38:05.9247170Z auto tmp15 = (tmp3 != tmp3) ? tmp3 : std::max(tmp14, tmp3);
2023-01-11T21:38:05.9247331Z auto tmp16 = (tmp5 != tmp5) ? tmp5 : std::min(tmp15, tmp5);
2023-01-11T21:38:05.9247425Z auto tmp17 = tmp16 / tmp7;
2023-01-11T21:38:05.9247511Z out_ptr0[i0] = tmp8;
2023-01-11T21:38:05.9247596Z out_ptr1[i0] = tmp12;
2023-01-11T21:38:05.9247681Z out_ptr2[i0] = tmp17;
2023-01-11T21:38:05.9247748Z }
2023-01-11T21:38:05.9247814Z }
2023-01-11T21:38:05.9247872Z }
2023-01-11T21:38:05.9247958Z ''')
2023-01-11T21:38:05.9247963Z 
2023-01-11T21:38:05.9247968Z 
2023-01-11T21:38:05.9248062Z async_compile.wait(globals())
2023-01-11T21:38:05.9248138Z del async_compile
2023-01-11T21:38:05.9248143Z 
2023-01-11T21:38:05.9248219Z def call(args):
2023-01-11T21:38:05.9248294Z arg0_1, = args
2023-01-11T21:38:05.9248369Z args.clear()
2023-01-11T21:38:05.9248557Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9248750Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9248941Z buf2 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9249136Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()))
2023-01-11T21:38:05.9249210Z del arg0_1
2023-01-11T21:38:05.9249297Z return (buf0, buf1, buf2, )
2023-01-11T21:38:05.9249302Z 
2023-01-11T21:38:05.9249307Z 
2023-01-11T21:38:05.9249388Z if __name__ == "__main__":
2023-01-11T21:38:05.9249507Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9249627Z from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9249819Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9249965Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:05.9249970Z 
2023-01-11T21:38:05.9250043Z ok (1.812s)
2023-01-11T21:38:05.9250499Z test_hardswish_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9250631Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9250891Z [2023-01-11 21:27:05,370] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 116 2023-01-11T21:38:05.9251153Z [2023-01-11 21:27:07,130] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 116 2023-01-11T21:38:05.9251162Z 2023-01-11T21:38:05.9251261Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9251329Z import torch 2023-01-11T21:38:05.9251404Z import random 2023-01-11T21:38:05.9251527Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9251651Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9251656Z 2023-01-11T21:38:05.9251742Z aten = torch.ops.aten 2023-01-11T21:38:05.9251879Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9251975Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9251980Z 2023-01-11T21:38:05.9252055Z import triton 2023-01-11T21:38:05.9252140Z import triton.language as tl 2023-01-11T21:38:05.9252265Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9252406Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9252412Z 2023-01-11T21:38:05.9252416Z 2023-01-11T21:38:05.9252554Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9252762Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9252887Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9252992Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9253095Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.9253217Z float* __restrict__ out_ptr2) 2023-01-11T21:38:05.9253285Z { 2023-01-11T21:38:05.9253391Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9253458Z { 2023-01-11T21:38:05.9253541Z #pragma omp for 2023-01-11T21:38:05.9253630Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9253700Z { 2023-01-11T21:38:05.9253832Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9253972Z auto tmp1 = at::vec::Vectorized(static_cast(3)); 2023-01-11T21:38:05.9254063Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9254212Z auto tmp3 = at::vec::Vectorized(static_cast(0.0)); 2023-01-11T21:38:05.9254325Z auto tmp4 = at::vec::maximum(tmp2, tmp3); 2023-01-11T21:38:05.9254466Z auto tmp5 = at::vec::Vectorized(static_cast(6.0)); 2023-01-11T21:38:05.9254700Z auto tmp6 = at::vec::minimum(tmp4, tmp5); 2023-01-11T21:38:05.9254793Z auto tmp7 = tmp0 * tmp6; 2023-01-11T21:38:05.9254922Z auto tmp8 = at::vec::Vectorized(static_cast(6)); 2023-01-11T21:38:05.9255015Z auto tmp9 = tmp7 / tmp8; 2023-01-11T21:38:05.9255106Z auto tmp10 = tmp2 + tmp1; 2023-01-11T21:38:05.9255221Z auto tmp11 = at::vec::maximum(tmp10, tmp3); 2023-01-11T21:38:05.9255333Z auto tmp12 = at::vec::minimum(tmp11, tmp5); 2023-01-11T21:38:05.9255427Z auto tmp13 = tmp2 * tmp12; 2023-01-11T21:38:05.9255518Z auto tmp14 = tmp13 / tmp8; 2023-01-11T21:38:05.9255641Z auto tmp15 = tmp0 - tmp1; 2023-01-11T21:38:05.9255784Z auto tmp16 = tmp15 + tmp1; 2023-01-11T21:38:05.9255896Z auto tmp17 = at::vec::maximum(tmp16, tmp3); 2023-01-11T21:38:05.9256009Z auto tmp18 = at::vec::minimum(tmp17, tmp5); 2023-01-11T21:38:05.9256099Z auto tmp19 = tmp15 * tmp18; 2023-01-11T21:38:05.9256194Z auto tmp20 = tmp19 / tmp8; 2023-01-11T21:38:05.9256294Z 
            tmp9.store(out_ptr0 + 8*i0);
            tmp14.store(out_ptr1 + 8*i0);
            tmp20.store(out_ptr2 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=64; i0<64; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp1 = static_cast<float>(3);
            auto tmp2 = tmp0 + tmp1;
            auto tmp3 = static_cast<float>(0.0);
            auto tmp4 = (tmp3 != tmp3) ? tmp3 : std::max(tmp2, tmp3);
            auto tmp5 = static_cast<float>(6.0);
            auto tmp6 = (tmp5 != tmp5) ? tmp5 : std::min(tmp4, tmp5);
            auto tmp7 = tmp0 * tmp6;
            auto tmp8 = static_cast<float>(6);
            auto tmp9 = tmp7 / tmp8;
            auto tmp10 = tmp2 + tmp1;
            auto tmp11 = (tmp3 != tmp3) ? tmp3 : std::max(tmp10, tmp3);
            auto tmp12 = (tmp5 != tmp5) ? tmp5 : std::min(tmp11, tmp5);
            auto tmp13 = tmp2 * tmp12;
            auto tmp14 = tmp13 / tmp8;
            auto tmp15 = tmp0 - tmp1;
            auto tmp16 = tmp15 + tmp1;
            auto tmp17 = (tmp3 != tmp3) ? tmp3 : std::max(tmp16, tmp3);
            auto tmp18 = (tmp5 != tmp5) ? tmp5 : std::min(tmp17, tmp5);
            auto tmp19 = tmp15 * tmp18;
            auto tmp20 = tmp19 / tmp8;
            out_ptr0[i0] = tmp9;
            out_ptr1[i0] = tmp14;
            out_ptr2[i0] = tmp20;
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
    buf2 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()))
    del arg0_1
    return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

ok (1.827s)
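The hardswish kernel above follows the same clamp template but multiplies by its argument before the division: tmp9 = x * clamp(x + 3, 0, 6) / 6, and likewise for x + 3 and x - 3. Equivalent eager code (a sketch, assuming the test evaluates the op at those three offsets, as the three fused outputs suggest):

    import torch
    import torch.nn.functional as F

    x = torch.randn(64)
    # hardswish(v) = v * clamp(v + 3, 0, 6) / 6
    out = (F.hardswish(x), F.hardswish(x + 3), F.hardswish(x - 3))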
test_hardtanh_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:07,170] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 117
[2023-01-11 21:27:08,905] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 117

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1,
                       float* __restrict__ out_ptr2)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<8; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(-1.0));
            auto tmp2 = at::vec::maximum(tmp0, tmp1);
            auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(1.0));
            auto tmp4 = at::vec::minimum(tmp2, tmp3);
            auto tmp5 = at::vec::Vectorized<float>(static_cast<float>(1));
            auto tmp6 = tmp0 + tmp5;
            auto tmp7 = at::vec::maximum(tmp6, tmp1);
            auto tmp8 = at::vec::minimum(tmp7, tmp3);
            auto tmp9 = tmp0 - tmp5;
            auto tmp10 = at::vec::maximum(tmp9, tmp1);
            auto tmp11 = at::vec::minimum(tmp10, tmp3);
            tmp4.store(out_ptr0 + 8*i0);
            tmp8.store(out_ptr1 + 8*i0);
            tmp11.store(out_ptr2 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=64; i0<64; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp1 = static_cast<float>(-1.0);
            auto tmp2 = (tmp1 != tmp1) ? tmp1 : std::max(tmp0, tmp1);
            auto tmp3 = static_cast<float>(1.0);
            auto tmp4 = (tmp3 != tmp3) ? tmp3 : std::min(tmp2, tmp3);
            auto tmp5 = static_cast<float>(1);
            auto tmp6 = tmp0 + tmp5;
            auto tmp7 = (tmp1 != tmp1) ? tmp1 : std::max(tmp6, tmp1);
            auto tmp8 = (tmp3 != tmp3) ? tmp3 : std::min(tmp7, tmp3);
            auto tmp9 = tmp0 - tmp5;
            auto tmp10 = (tmp1 != tmp1) ? tmp1 : std::max(tmp9, tmp1);
            auto tmp11 = (tmp3 != tmp3) ? tmp3 : std::min(tmp10, tmp3);
            out_ptr0[i0] = tmp4;
            out_ptr1[i0] = tmp8;
            out_ptr2[i0] = tmp11;
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
    buf2 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()))
    del arg0_1
    return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

ok (1.775s)
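hardtanh needs no division, so each output reduces to a single maximum/minimum pair clamping to [-1, 1]. A hedged sketch of the apparent computation:

    import torch
    import torch.nn.functional as F

    x = torch.randn(64)
    # hardtanh(v) = clamp(v, -1, 1)
    out = (F.hardtanh(x), F.hardtanh(x + 1), F.hardtanh(x - 1))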
test_horizonal_fusion1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:08,925] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 118
[2023-01-11 21:27:10,674] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 118

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const float* __restrict__ in_ptr1,
                       const float* __restrict__ in_ptr2,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1,
                       float* __restrict__ out_ptr2)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<256; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
            auto tmp2 = tmp0 + tmp1;
            tmp2.store(out_ptr0 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=2048; i0<2048; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp1 = in_ptr1[i0];
            auto tmp2 = tmp0 + tmp1;
            out_ptr0[i0] = tmp2;
        }
        #pragma omp for
        for(long i0=0; i0<8; i0+=1)
        {
            #pragma GCC ivdep
            for(long i1=0; i1<16; i1+=1)
            {
                for(long i2=0; i2<2; i2+=1)
                {
                    auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i2) + (16*i1) + (256*i0));
                    auto tmp1 = at::vec::Vectorized<float>(in_ptr2[i1]);
                    auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i2) + (16*i1) + (256*i0));
                    auto tmp2 = tmp0 - tmp1;
                    auto tmp4 = tmp3 * tmp1;
                    tmp2.store(out_ptr1 + (8*i2) + (16*i1) + (256*i0));
                    tmp4.store(out_ptr2 + (8*i2) + (16*i1) + (256*i0));
                }
                #pragma omp simd simdlen(4)
                for(long i2=16; i2<16; i2+=1)
                {
                    auto tmp0 = in_ptr0[i2 + (16*i1) + (256*i0)];
                    auto tmp1 = in_ptr2[i1];
                    auto tmp3 = in_ptr1[i2 + (16*i1) + (256*i0)];
                    auto tmp2 = tmp0 - tmp1;
                    auto tmp4 = tmp3 * tmp1;
                    out_ptr1[i2 + (16*i1) + (256*i0)] = tmp2;
                    out_ptr2[i2 + (16*i1) + (256*i0)] = tmp4;
                }
            }
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    buf0 = empty_strided((8, 16, 16), (256, 16, 1), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((8, 16, 16), (256, 16, 1), device='cpu', dtype=torch.float32)
    buf2 = empty_strided((8, 16, 16), (256, 16, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()))
    del arg0_1
    del arg1_1
    del arg2_1
    return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 16, 16), (256, 16, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((8, 16, 16), (256, 16, 1), device='cpu', dtype=torch.float32)
    arg2_1 = rand_strided((1, 16, 1), (16, 1, 1), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (1.768s)
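"Horizontal fusion" here means that three independent elementwise results sharing inputs (x + y, x - z and y * z, with z broadcast from shape (1, 16, 1)) are emitted as a single C++ kernel instead of three, so the inputs are swept once per loop nest rather than once per op. A hedged eager-mode equivalent:

    import torch

    x = torch.randn(8, 16, 16)
    y = torch.randn(8, 16, 16)
    z = torch.randn(1, 16, 1)
    out = (x + y, x - z, y * z)  # all three produced by kernel_cpp_0 above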
test_horizonal_fusion2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:10,693] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 119
[2023-01-11 21:27:12,431] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 119

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const float* __restrict__ in_ptr1,
                       const float* __restrict__ in_ptr2,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1,
                       float* __restrict__ out_ptr2)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<128; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
            auto tmp2 = tmp0 + tmp1;
            tmp2.store(out_ptr0 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=1024; i0<1024; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp1 = static_cast<float>(1);
            auto tmp2 = tmp0 + tmp1;
            out_ptr0[i0] = tmp2;
        }
        #pragma omp for
        for(long i0=0; i0<16; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(2));
            auto tmp2 = tmp0 + tmp1;
            tmp2.store(out_ptr1 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=128; i0<128; i0+=1)
        {
            auto tmp0 = in_ptr1[i0];
            auto tmp1 = static_cast<float>(2);
            auto tmp2 = tmp0 + tmp1;
            out_ptr1[i0] = tmp2;
        }
        #pragma omp for
        for(long i0=0; i0<16; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr2 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3));
            auto tmp2 = tmp0 + tmp1;
            tmp2.store(out_ptr2 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=128; i0<128; i0+=1)
        {
            auto tmp0 = in_ptr2[i0];
            auto tmp1 = static_cast<float>(3);
            auto tmp2 = tmp0 + tmp1;
            out_ptr2[i0] = tmp2;
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    buf0 = empty_strided((8, 16, 8), (128, 8, 1), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((8, 16), (16, 1), device='cpu', dtype=torch.float32)
    buf2 = empty_strided((16, 8), (8, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()))
    del arg0_1
    del arg1_1
    del arg2_1
    return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 16, 8), (128, 8, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((8, 16), (16, 1), device='cpu', dtype=torch.float32)
    arg2_1 = rand_strided((16, 8), (8, 1), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (1.757s)
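The second fusion test makes the point more starkly: the three additions share neither inputs nor shapes, yet they still land in one kernel as three separate #pragma omp for loops inside a single parallel region, amortizing the thread-pool fork/join. Hedged equivalent:

    import torch

    a = torch.randn(8, 16, 8)
    b = torch.randn(8, 16)
    c = torch.randn(16, 8)
    out = (a + 1, b + 2, c + 3)  # one fused kernel, three output buffers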
test_index1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:12,472] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 120
[2023-01-11 21:27:14,170] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 120
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:14,210] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 121
[2023-01-11 21:27:16,036] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 121

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       const float* __restrict__ in_ptr2,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for collapse(2)
        for(long i0=0; i0<4; i0+=1)
        {
            for(long i1=0; i1<12; i1+=1)
            {
                {
                    {
                        auto tmp0 = in_ptr0[i0];
                        auto tmp1 = in_ptr1[i0];
                        auto tmp2 = in_ptr2[i1 + (12*tmp1) + (96*tmp0)];
                        out_ptr0[i1 + (12*i0)] = tmp2;
                    }
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    buf0 = empty_strided((4, 12), (12, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg0_1
    del arg1_1
    del arg2_1
    return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8, 12), (96, 12, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.int64)
    arg2_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       const float* __restrict__ in_ptr2,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for collapse(2)
        for(long i0=0; i0<4; i0+=1)
        {
            for(long i1=0; i1<4; i1+=1)
            {
                #pragma GCC ivdep
                for(long i2=0; i2<12; i2+=1)
                {
                    {
                        {
                            auto tmp0 = in_ptr0[i1];
                            auto tmp1 = in_ptr1[i0];
                            auto tmp2 = in_ptr2[i2 + (12*tmp1) + (96*tmp0)];
                            out_ptr0[i2 + (12*i1) + (48*i0)] = tmp2;
                        }
                    }
                }
            }
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    buf0 = empty_strided((4, 4, 12), (48, 12, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg0_1
    del arg1_1
    del arg2_1
    return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8, 12), (96, 12, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((1, 4), (4, 1), device='cpu', dtype=torch.int64)
    arg2_1 = rand_strided((4, 1), (1, 1), device='cpu', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (3.619s)
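Both graphs above lower advanced integer indexing to plain gather loops: the index tensors are read into tmp0/tmp1 and folded into the input offset, e.g. in_ptr2[i1 + (12*tmp1) + (96*tmp0)] for the (96, 12, 1)-strided source. A hedged eager reading of the two compiled graphs (index values are illustrative):

    import torch

    x = torch.randn(8, 8, 12)
    i = torch.randint(0, 8, (4,))
    j = torch.randint(0, 8, (4,))
    out_120 = x[i, j]                        # graph 120 -> shape (4, 12)
    out_121 = x[i.view(1, 4), j.view(4, 1)]  # graph 121 -> broadcast to (4, 4, 12)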
test_index2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:16,124] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 122
[2023-01-11 21:27:17,893] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 122

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ in_ptr0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for collapse(2)
        for(long i0=0; i0<4; i0+=1)
        {
            for(long i1=0; i1<64; i1+=1)
            {
                {
                    {
                        auto tmp0 = in_ptr0[i0];
                        auto tmp1 = in_ptr1[i1 + (64*tmp0)];
                        out_ptr0[i1 + (64*i0)] = tmp1;
                    }
                }
            }
        }
        #pragma omp for
        for(long i0=0; i0<8; i0+=1)
        {
            #pragma GCC ivdep
            for(long i1=0; i1<4; i1+=1)
            {
                #pragma GCC ivdep
                for(long i2=0; i2<8; i2+=1)
                {
                    {
                        {
                            auto tmp0 = in_ptr0[i1];
                            auto tmp1 = in_ptr1[i2 + (8*tmp0) + (64*i0)];
                            out_ptr1[i2 + (8*i1) + (32*i0)] = tmp1;
                        }
                    }
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    buf0 = empty_strided((1, 4, 8, 8), (256, 64, 8, 1), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((8, 1, 4, 8), (32, 32, 8, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
    del arg0_1
    del arg1_1
    return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((1, 4), (4, 1), device='cpu', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (1.852s)
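test_index2 exercises gathering along different dimensions: the first loop nest copies whole 64-element slices selected by the index (an x[idx] gather hoisted to the leading dimension), while the second keeps the batch stride 64*i0 in place and gathers along dim 1. Hedged reading:

    import torch

    x = torch.randn(8, 8, 8)
    idx = torch.randint(0, 8, (1, 4))
    out = (x[idx], x[:, idx])  # shapes (1, 4, 8, 8) and (8, 1, 4, 8), as in the wrapper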
test_index3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:17,953] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 123
[2023-01-11 21:27:17,978] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.index
[2023-01-11 21:27:17,982] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 123

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    buf0 = aten.index(as_strided(arg0_1, (3, 4, 1, 4, 3), (192, 48, 0, 12, 1)), [None, arg1_1, None, arg2_1])
    del arg0_1
    del arg1_1
    del arg2_1
    buf1 = buf0
    assert_size_stride(buf1, (3, 3, 1, 3), (9, 3, 3, 1))
    del buf0
    return (buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((3, 4, 4, 4, 3), (192, 48, 12, 3, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.int64)
    arg2_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (0.079s)
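No kernel is generated for test_index3: the [WARNING] line shows inductor falling back to the eager aten.index kernel for this pattern, which mixes None (slice) entries with integer index tensors on a zero-strided broadcast view, so the wrapper just builds the view, calls the op, and asserts the output layout. A hedged sketch of what the fallback computes:

    import torch

    x = torch.randn(3, 4, 4, 4, 3)
    i = torch.randint(0, 4, (3,))
    j = torch.randint(0, 4, (3,))
    base = torch.as_strided(x, (3, 4, 1, 4, 3), (192, 48, 0, 12, 1))
    out = base[:, i, :, j]  # non-adjacent advanced indices -> result shape (3, 3, 1, 3)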
test_index_put1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:18,199] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 124
[2023-01-11 21:27:19,987] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 124
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:20,542] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 125
[2023-01-11 21:27:22,291] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 125

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       const float* __restrict__ in_ptr2,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1,
                       float* __restrict__ out_ptr2)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<1254400; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
            auto tmp2 = tmp0 + tmp1;
            tmp0.store(out_ptr0 + 8*i0);
            tmp2.store(out_ptr1 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=10035200; i0<10035200; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp1 = static_cast<float>(1);
            auto tmp2 = tmp0 + tmp1;
            out_ptr0[i0] = tmp0;
            out_ptr1[i0] = tmp2;
        }
        #pragma omp for
        for(long i0=0; i0<601; i0+=1)
        {
            #pragma GCC ivdep
            for(long i1=0; i1<12544; i1+=1)
            {
                {
                    {
                        auto tmp0 = in_ptr1[i0];
                        auto tmp1 = in_ptr2[i1 + (12544*i0)];
                        auto tmp2 = static_cast<long>(1);
                        auto tmp3 = tmp0 + tmp2;
                        auto tmp4 = static_cast<float>(1);
                        auto tmp5 = tmp1 + tmp4;
                        out_ptr0[i1 + (12544*tmp0)] = tmp1;
                        out_ptr1[i1 + (12544*tmp3)] = tmp5;
                    }
                }
            }
        }
        #pragma omp for
        for(long i0=0; i0<1254400; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr1 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
            auto tmp2 = tmp0 + tmp1;
            tmp2.store(out_ptr2 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=10035200; i0<10035200; i0+=1)
        {
            auto tmp0 = out_ptr1[i0];
            auto tmp1 = static_cast<float>(1);
            auto tmp2 = tmp0 + tmp1;
            out_ptr2[i0] = tmp2;
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    buf0 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cpu', dtype=torch.float32)
    buf2 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cpu', dtype=torch.float32)
    buf4 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf4.data_ptr()))
    del arg0_1
    del arg1_1
    del arg2_1
    return (buf0, buf4, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((601, ), (1, ), device='cpu', dtype=torch.int64)
    arg2_1 = rand_strided((601, 256, 7, 7), (12544, 49, 7, 1), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       const float* __restrict__ in_ptr2,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1,
                       float* __restrict__ out_ptr2)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<1024; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
            auto tmp2 = tmp0 + tmp1;
            tmp0.store(out_ptr0 + 8*i0);
            tmp2.store(out_ptr1 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=8192; i0<8192; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp1 = static_cast<float>(1);
            auto tmp2 = tmp0 + tmp1;
            out_ptr0[i0] = tmp0;
            out_ptr1[i0] = tmp2;
        }
        #pragma omp for collapse(2)
        for(long i0=0; i0<4; i0+=1)
        {
            for(long i1=0; i1<8; i1+=1)
            {
                {
                    {
                        auto tmp0 = in_ptr1[i0];
                        auto tmp1 = in_ptr2[i0];
                        auto tmp2 = static_cast<long>(1);
                        auto tmp3 = tmp0 + tmp2;
                        auto tmp4 = static_cast<float>(1);
                        auto tmp5 = tmp1 + tmp4;
                        out_ptr0[i1 + (8*tmp0)] = tmp1;
                        out_ptr1[i1 + (8*tmp3)] = tmp5;
                    }
                }
            }
        }
        #pragma omp for
        for(long i0=0; i0<1024; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr1 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
            auto tmp2 = tmp0 + tmp1;
            tmp2.store(out_ptr2 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=8192; i0<8192; i0+=1)
        {
            auto tmp0 = out_ptr1[i0];
            auto tmp1 = static_cast<float>(1);
            auto tmp2 = tmp0 + tmp1;
            out_ptr2[i0] = tmp2;
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32)
    buf2 = empty_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32)
    buf4 = empty_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf4.data_ptr()))
    del arg0_1
    del arg1_1
    del arg2_1
    return (buf0, buf4, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.int64)
    arg2_1 = rand_strided((4, 1, 1), (1, 1, 1), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (4.310s)
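index_put with accumulate=False compiles to a bulk copy of the base tensor followed by a scatter loop that overwrites the rows addressed by the index tensor (last writer wins on duplicates); the surrounding +1 arithmetic from the test is fused into the same kernel, which is why out_ptr1 scatters v + 1 at index idx + 1. Hedged eager equivalent of the first graph (shapes taken from the wrapper; values illustrative, assuming idx + 1 stays in range as the test's indices evidently do):

    import torch

    x = torch.randn(800, 256, 7, 7)
    idx = torch.randint(0, 799, (601,))
    v = torch.randn(601, 256, 7, 7)
    buf0 = torch.index_put(x, (idx,), v)                  # plain scatter
    buf4 = torch.index_put(x + 1, (idx + 1,), v + 1) + 1  # with the fused +1s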
test_index_put2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:22,421] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 126
[2023-01-11 21:27:24,305] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 126

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       const float* __restrict__ in_ptr2,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<156800; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            tmp0.store(out_ptr0 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=1254400; i0<1254400; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            out_ptr0[i0] = tmp0;
        }
        #pragma omp for
        for(long i0=0; i0<600; i0+=1)
        {
            #pragma GCC ivdep
            for(long i1=0; i1<12544; i1+=1)
            {
                {
                    {
                        auto tmp0 = in_ptr1[i0];
                        auto tmp1 = in_ptr2[i1 + (12544*i0)];
                        atomic_add(&out_ptr0[i1 + (12544*tmp0)], tmp1);
                    }
                }
            }
        }
    }
}
''')


async_compile.wait(globals())
del async_compile
2023-01-11T21:38:05.9350873Z 2023-01-11T21:38:05.9350948Z def call(args): 2023-01-11T21:38:05.9351031Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9351100Z args.clear() 2023-01-11T21:38:05.9351324Z buf0 = empty_strided((100, 256, 7, 7), (12544, 49, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9351517Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9351591Z del arg0_1 2023-01-11T21:38:05.9351665Z del arg1_1 2023-01-11T21:38:05.9351735Z del arg2_1 2023-01-11T21:38:05.9351811Z return (buf0, ) 2023-01-11T21:38:05.9351816Z 2023-01-11T21:38:05.9351820Z 2023-01-11T21:38:05.9351901Z if __name__ == "__main__": 2023-01-11T21:38:05.9352014Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9352140Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9352360Z arg0_1 = rand_strided((100, 256, 7, 7), (12544, 49, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9352553Z arg1_1 = rand_strided((600, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9352772Z arg2_1 = rand_strided((600, 256, 7, 7), (12544, 49, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9352904Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9352909Z 2023-01-11T21:38:05.9352983Z ok (2.114s) 2023-01-11T21:38:05.9353461Z test_index_put3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9353596Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9353846Z [2023-01-11 21:27:24,500] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 127 2023-01-11T21:38:05.9354109Z [2023-01-11 21:27:26,315] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 127 2023-01-11T21:38:05.9354117Z 2023-01-11T21:38:05.9354215Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9354289Z import torch 2023-01-11T21:38:05.9354361Z import random 2023-01-11T21:38:05.9354481Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9354609Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9354614Z 2023-01-11T21:38:05.9354696Z aten = torch.ops.aten 2023-01-11T21:38:05.9354825Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9354920Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9354925Z 2023-01-11T21:38:05.9354999Z import triton 2023-01-11T21:38:05.9355091Z import triton.language as tl 2023-01-11T21:38:05.9355215Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9355356Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9355361Z 2023-01-11T21:38:05.9355366Z 2023-01-11T21:38:05.9355502Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9355737Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9355853Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9355962Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9356069Z float* __restrict__ out_ptr0, 
2023-01-11T21:38:05.9356168Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9356233Z { 2023-01-11T21:38:05.9356334Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9356401Z { 2023-01-11T21:38:05.9356475Z #pragma omp for 2023-01-11T21:38:05.9356562Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.9356630Z { 2023-01-11T21:38:05.9356713Z #pragma GCC ivdep 2023-01-11T21:38:05.9356801Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:05.9356870Z { 2023-01-11T21:38:05.9356957Z #pragma GCC ivdep 2023-01-11T21:38:05.9357047Z for(long i2=0; i2<2; i2+=1) 2023-01-11T21:38:05.9357116Z { 2023-01-11T21:38:05.9357188Z { 2023-01-11T21:38:05.9357262Z { 2023-01-11T21:38:05.9357364Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:05.9357477Z auto tmp1 = in_ptr1[i2 + (2*i0)]; 2023-01-11T21:38:05.9357590Z out_ptr0[i2 + (2*tmp0) + (8*i0)] = tmp1; 2023-01-11T21:38:05.9357657Z } 2023-01-11T21:38:05.9357726Z } 2023-01-11T21:38:05.9357796Z } 2023-01-11T21:38:05.9357864Z } 2023-01-11T21:38:05.9357932Z } 2023-01-11T21:38:05.9358012Z #pragma omp for 2023-01-11T21:38:05.9358093Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.9358160Z { 2023-01-11T21:38:05.9358304Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9358447Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.9358538Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9358634Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9358702Z } 2023-01-11T21:38:05.9358802Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9358913Z for(long i0=8192; i0<8192; i0+=1) 2023-01-11T21:38:05.9358986Z { 2023-01-11T21:38:05.9359077Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9359182Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.9359273Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9359360Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.9359419Z } 2023-01-11T21:38:05.9359502Z #pragma omp for 2023-01-11T21:38:05.9359588Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.9359654Z { 2023-01-11T21:38:05.9359739Z #pragma GCC ivdep 2023-01-11T21:38:05.9359826Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:05.9359897Z { 2023-01-11T21:38:05.9359976Z #pragma GCC ivdep 2023-01-11T21:38:05.9360071Z for(long i2=0; i2<2; i2+=1) 2023-01-11T21:38:05.9360139Z { 2023-01-11T21:38:05.9360210Z { 2023-01-11T21:38:05.9360282Z { 2023-01-11T21:38:05.9360387Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:05.9360496Z auto tmp3 = in_ptr1[i2 + (2*i0)]; 2023-01-11T21:38:05.9360600Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.9360700Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9360813Z auto tmp4 = static_cast(1); 2023-01-11T21:38:05.9360916Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:05.9361029Z out_ptr1[i2 + (2*tmp2) + (8*i0)] = tmp5; 2023-01-11T21:38:05.9361102Z } 2023-01-11T21:38:05.9361205Z } 2023-01-11T21:38:05.9361267Z } 2023-01-11T21:38:05.9361337Z } 2023-01-11T21:38:05.9361408Z } 2023-01-11T21:38:05.9361475Z } 2023-01-11T21:38:05.9361542Z } 2023-01-11T21:38:05.9361630Z ''') 2023-01-11T21:38:05.9361636Z 2023-01-11T21:38:05.9361641Z 2023-01-11T21:38:05.9361737Z async_compile.wait(globals()) 2023-01-11T21:38:05.9361807Z del async_compile 2023-01-11T21:38:05.9361821Z 2023-01-11T21:38:05.9361889Z def call(args): 2023-01-11T21:38:05.9361975Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9362051Z args.clear() 2023-01-11T21:38:05.9362259Z buf1 = empty_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9362454Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(arg0_1.data_ptr()), 
c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9362528Z del arg0_1 2023-01-11T21:38:05.9362602Z del arg1_1 2023-01-11T21:38:05.9362666Z del arg2_1 2023-01-11T21:38:05.9362747Z return (buf1, ) 2023-01-11T21:38:05.9362752Z 2023-01-11T21:38:05.9362756Z 2023-01-11T21:38:05.9362840Z if __name__ == "__main__": 2023-01-11T21:38:05.9362958Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9363085Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9363295Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9363486Z arg1_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9363684Z arg2_1 = rand_strided((1024, 1, 2), (2, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9363810Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9363816Z 2023-01-11T21:38:05.9363887Z ok (1.910s) 2023-01-11T21:38:05.9364353Z test_index_put_as_masked_fill_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9364487Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9364779Z [2023-01-11 21:27:26,363] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 128 2023-01-11T21:38:05.9365043Z [2023-01-11 21:27:28,060] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 128 2023-01-11T21:38:05.9365458Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9365596Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9365854Z [2023-01-11 21:27:28,107] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 129 2023-01-11T21:38:05.9366121Z [2023-01-11 21:27:29,845] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 129 2023-01-11T21:38:05.9366127Z 2023-01-11T21:38:05.9366226Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9366294Z import torch 2023-01-11T21:38:05.9366368Z import random 2023-01-11T21:38:05.9366488Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9366615Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9366620Z 2023-01-11T21:38:05.9366702Z aten = torch.ops.aten 2023-01-11T21:38:05.9366841Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9366940Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9366945Z 2023-01-11T21:38:05.9367012Z import triton 2023-01-11T21:38:05.9367136Z import triton.language as tl 2023-01-11T21:38:05.9367262Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9367404Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9367409Z 2023-01-11T21:38:05.9367413Z 2023-01-11T21:38:05.9367550Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9367758Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9367883Z extern "C" void kernel(const bool* __restrict__ in_ptr0, 2023-01-11T21:38:05.9367994Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9368095Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.9368202Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9368268Z { 2023-01-11T21:38:05.9368371Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9368440Z { 2023-01-11T21:38:05.9368525Z #pragma omp for 2023-01-11T21:38:05.9368619Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.9368680Z { 2023-01-11T21:38:05.9368787Z float g_tmp_buffer_in_ptr0[8] = {0}; 2023-01-11T21:38:05.9368915Z flag_to_float(in_ptr0 + 8*i0, g_tmp_buffer_in_ptr0, 8); 2023-01-11T21:38:05.9369067Z auto tmp0 = at::vec::Vectorized::loadu(g_tmp_buffer_in_ptr0); 2023-01-11T21:38:05.9369192Z auto tmp1 = at::vec::Vectorized(in_ptr1[0]); 2023-01-11T21:38:05.9369329Z auto tmp2 = at::vec::Vectorized::loadu(in_ptr2 + 8*i0); 2023-01-11T21:38:05.9369456Z auto tmp3 = decltype(tmp1)::blendv(tmp2, tmp1, tmp0); 2023-01-11T21:38:05.9369555Z tmp3.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9369616Z } 2023-01-11T21:38:05.9369713Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9369804Z for(long i0=8192; i0<8192; i0+=1) 2023-01-11T21:38:05.9369872Z { 2023-01-11T21:38:05.9369962Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9370055Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:05.9370143Z auto tmp2 = in_ptr2[i0]; 2023-01-11T21:38:05.9370235Z auto tmp3 = tmp0 ? 
tmp1 : tmp2; 2023-01-11T21:38:05.9370323Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.9370390Z } 2023-01-11T21:38:05.9370456Z } 2023-01-11T21:38:05.9370556Z } 2023-01-11T21:38:05.9370645Z ''') 2023-01-11T21:38:05.9370651Z 2023-01-11T21:38:05.9370655Z 2023-01-11T21:38:05.9370751Z async_compile.wait(globals()) 2023-01-11T21:38:05.9370822Z del async_compile 2023-01-11T21:38:05.9370827Z 2023-01-11T21:38:05.9370902Z def call(args): 2023-01-11T21:38:05.9370995Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9371073Z args.clear() 2023-01-11T21:38:05.9371280Z buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9371472Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9371547Z del arg0_1 2023-01-11T21:38:05.9371612Z del arg1_1 2023-01-11T21:38:05.9371684Z del arg2_1 2023-01-11T21:38:05.9371759Z return (buf0, ) 2023-01-11T21:38:05.9371764Z 2023-01-11T21:38:05.9371769Z 2023-01-11T21:38:05.9371850Z if __name__ == "__main__": 2023-01-11T21:38:05.9371970Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9372099Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9372304Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9372509Z arg1_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9372688Z arg2_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9372815Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9372821Z 2023-01-11T21:38:05.9372825Z 2023-01-11T21:38:05.9372922Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9373026Z import torch 2023-01-11T21:38:05.9373102Z import random 2023-01-11T21:38:05.9373223Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9373347Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9373352Z 2023-01-11T21:38:05.9373435Z aten = torch.ops.aten 2023-01-11T21:38:05.9373567Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9373666Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9373671Z 2023-01-11T21:38:05.9373745Z import triton 2023-01-11T21:38:05.9373841Z import triton.language as tl 2023-01-11T21:38:05.9373966Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9374109Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9374115Z 2023-01-11T21:38:05.9374119Z 2023-01-11T21:38:05.9374255Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9374463Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9374697Z extern "C" void kernel(const bool* __restrict__ in_ptr0, 2023-01-11T21:38:05.9374809Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9374919Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.9375022Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9375089Z { 2023-01-11T21:38:05.9375191Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9375257Z { 2023-01-11T21:38:05.9375331Z #pragma omp for 2023-01-11T21:38:05.9375418Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.9375486Z { 2023-01-11T21:38:05.9375593Z float g_tmp_buffer_in_ptr0[8] = {0}; 2023-01-11T21:38:05.9375721Z flag_to_float(in_ptr0 + 8*i0, g_tmp_buffer_in_ptr0, 8); 2023-01-11T21:38:05.9375869Z auto tmp0 = 
at::vec::Vectorized::loadu(g_tmp_buffer_in_ptr0); 2023-01-11T21:38:05.9376006Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.9376128Z auto tmp2 = at::vec::Vectorized(in_ptr2[0]); 2023-01-11T21:38:05.9376220Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.9376350Z auto tmp4 = decltype(tmp3)::blendv(tmp1, tmp3, tmp0); 2023-01-11T21:38:05.9376450Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9376562Z } 2023-01-11T21:38:05.9376667Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9376756Z for(long i0=8192; i0<8192; i0+=1) 2023-01-11T21:38:05.9376824Z { 2023-01-11T21:38:05.9376907Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9376993Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9377083Z auto tmp2 = in_ptr2[0]; 2023-01-11T21:38:05.9377226Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.9377340Z auto tmp4 = tmp0 ? tmp3 : tmp1; 2023-01-11T21:38:05.9377439Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:05.9377499Z } 2023-01-11T21:38:05.9377566Z } 2023-01-11T21:38:05.9377633Z } 2023-01-11T21:38:05.9377720Z ''') 2023-01-11T21:38:05.9377726Z 2023-01-11T21:38:05.9377730Z 2023-01-11T21:38:05.9377823Z async_compile.wait(globals()) 2023-01-11T21:38:05.9377901Z del async_compile 2023-01-11T21:38:05.9377906Z 2023-01-11T21:38:05.9377982Z def call(args): 2023-01-11T21:38:05.9378070Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9378138Z args.clear() 2023-01-11T21:38:05.9378347Z buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9378539Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9378612Z del arg0_1 2023-01-11T21:38:05.9378683Z del arg1_1 2023-01-11T21:38:05.9378754Z del arg2_1 2023-01-11T21:38:05.9378831Z return (buf0, ) 2023-01-11T21:38:05.9378836Z 2023-01-11T21:38:05.9378840Z 2023-01-11T21:38:05.9378913Z if __name__ == "__main__": 2023-01-11T21:38:05.9379032Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9379198Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9379404Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9379605Z arg1_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9379793Z arg2_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9379922Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9379927Z 2023-01-11T21:38:05.9380001Z ok (3.530s) 2023-01-11T21:38:05.9380469Z test_index_put_fallback1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9380597Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9380855Z [2023-01-11 21:27:29,893] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 130 2023-01-11T21:38:05.9381120Z [2023-01-11 21:27:31,582] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 130 2023-01-11T21:38:05.9381536Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9381667Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9381922Z [2023-01-11 21:27:31,629] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 131 2023-01-11T21:38:05.9382189Z [2023-01-11 21:27:31,636] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 131 2023-01-11T21:38:05.9382194Z 2023-01-11T21:38:05.9382292Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9382366Z import torch 2023-01-11T21:38:05.9382441Z import random 2023-01-11T21:38:05.9382627Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9382753Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9382758Z 2023-01-11T21:38:05.9382841Z aten = torch.ops.aten 2023-01-11T21:38:05.9382976Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9383074Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9383080Z 2023-01-11T21:38:05.9383154Z import triton 2023-01-11T21:38:05.9383247Z import triton.language as tl 2023-01-11T21:38:05.9383365Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9383506Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9383514Z 2023-01-11T21:38:05.9383518Z 2023-01-11T21:38:05.9383657Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9383862Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9383986Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9384093Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9384160Z { 2023-01-11T21:38:05.9384239Z #pragma GCC ivdep 2023-01-11T21:38:05.9384318Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:05.9384384Z { 2023-01-11T21:38:05.9384451Z { 2023-01-11T21:38:05.9384519Z { 2023-01-11T21:38:05.9384614Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9384703Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.9384770Z } 2023-01-11T21:38:05.9384830Z } 2023-01-11T21:38:05.9384897Z } 2023-01-11T21:38:05.9384961Z } 2023-01-11T21:38:05.9385046Z ''') 2023-01-11T21:38:05.9385083Z 2023-01-11T21:38:05.9385088Z 2023-01-11T21:38:05.9385181Z async_compile.wait(globals()) 2023-01-11T21:38:05.9385259Z del async_compile 2023-01-11T21:38:05.9385264Z 2023-01-11T21:38:05.9385344Z def call(args): 2023-01-11T21:38:05.9385423Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9385499Z args.clear() 2023-01-11T21:38:05.9385694Z buf0 = empty_strided((3, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9385835Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9385910Z del arg0_1 2023-01-11T21:38:05.9386019Z aten.index_put_(buf0, [arg1_1], arg2_1, False) 2023-01-11T21:38:05.9386093Z del arg1_1 2023-01-11T21:38:05.9386157Z del arg2_1 2023-01-11T21:38:05.9386231Z return (buf0, ) 2023-01-11T21:38:05.9386237Z 2023-01-11T21:38:05.9386241Z 2023-01-11T21:38:05.9386321Z if __name__ == "__main__": 2023-01-11T21:38:05.9386438Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9386569Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9386766Z arg0_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.float32) 
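# arg1_1 below is the (3,) bool mask that forced the aten.index_put_ fallback
# in call() above; the second compiled variant that follows differs only in
# passing accumulate=True. A standalone sketch of both calls appears after
# that second module.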
2023-01-11T21:38:05.9386953Z arg1_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9387146Z arg2_1 = rand_strided((2, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9387267Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9387272Z 2023-01-11T21:38:05.9387277Z 2023-01-11T21:38:05.9387375Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9387450Z import torch 2023-01-11T21:38:05.9387525Z import random 2023-01-11T21:38:05.9387644Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9387767Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9387771Z 2023-01-11T21:38:05.9387855Z aten = torch.ops.aten 2023-01-11T21:38:05.9387993Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9388084Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9388089Z 2023-01-11T21:38:05.9388162Z import triton 2023-01-11T21:38:05.9388258Z import triton.language as tl 2023-01-11T21:38:05.9388384Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9388523Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9388555Z 2023-01-11T21:38:05.9388560Z 2023-01-11T21:38:05.9388696Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9388901Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9389026Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9389123Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9389188Z { 2023-01-11T21:38:05.9389269Z #pragma GCC ivdep 2023-01-11T21:38:05.9389354Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:05.9389420Z { 2023-01-11T21:38:05.9389486Z { 2023-01-11T21:38:05.9389546Z { 2023-01-11T21:38:05.9389641Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9389729Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.9389799Z } 2023-01-11T21:38:05.9389865Z } 2023-01-11T21:38:05.9389930Z } 2023-01-11T21:38:05.9389994Z } 2023-01-11T21:38:05.9390070Z ''') 2023-01-11T21:38:05.9390075Z 2023-01-11T21:38:05.9390082Z 2023-01-11T21:38:05.9390173Z async_compile.wait(globals()) 2023-01-11T21:38:05.9390248Z del async_compile 2023-01-11T21:38:05.9390253Z 2023-01-11T21:38:05.9390326Z def call(args): 2023-01-11T21:38:05.9390411Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9390487Z args.clear() 2023-01-11T21:38:05.9390677Z buf0 = empty_strided((3, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9390808Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9390880Z del arg0_1 2023-01-11T21:38:05.9390987Z aten.index_put_(buf0, [arg1_1], arg2_1, True) 2023-01-11T21:38:05.9391092Z del arg1_1 2023-01-11T21:38:05.9391162Z del arg2_1 2023-01-11T21:38:05.9391236Z return (buf0, ) 2023-01-11T21:38:05.9391241Z 2023-01-11T21:38:05.9391246Z 2023-01-11T21:38:05.9391326Z if __name__ == "__main__": 2023-01-11T21:38:05.9391443Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9391563Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9391757Z arg0_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9391943Z arg1_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9392134Z arg2_1 = rand_strided((2, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9392263Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9392269Z 
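Both test_index_put_fallback1 wrappers above clone the input and then hand the
masked update to the eager ATen op, since a boolean index makes the number of
written elements data-dependent; the two variants differ only in the
accumulate flag. A minimal standalone sketch of those two calls (tensor
values hypothetical):

    import torch
    aten = torch.ops.aten
    buf0 = torch.zeros(3)
    mask = torch.tensor([True, False, True])
    vals = torch.tensor([1.0, 2.0])
    aten.index_put_(buf0, [mask], vals, False)  # overwrite masked slots
    aten.index_put_(buf0, [mask], vals, True)   # accumulate into masked slots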
2023-01-11T21:38:05.9392340Z ok (1.791s) 2023-01-11T21:38:05.9392805Z test_index_put_fallback2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9392943Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9393204Z [2023-01-11 21:27:31,685] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 132 2023-01-11T21:38:05.9393460Z [2023-01-11 21:27:33,365] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 132 2023-01-11T21:38:05.9393878Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9394010Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9394267Z [2023-01-11 21:27:33,416] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 133 2023-01-11T21:38:05.9394555Z [2023-01-11 21:27:33,424] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 133 2023-01-11T21:38:05.9394561Z 2023-01-11T21:38:05.9394664Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9394739Z import torch 2023-01-11T21:38:05.9394814Z import random 2023-01-11T21:38:05.9394935Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9395052Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9395062Z 2023-01-11T21:38:05.9395138Z aten = torch.ops.aten 2023-01-11T21:38:05.9395275Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9395371Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9395376Z 2023-01-11T21:38:05.9395455Z import triton 2023-01-11T21:38:05.9395547Z import triton.language as tl 2023-01-11T21:38:05.9395672Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9395813Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9395819Z 2023-01-11T21:38:05.9395823Z 2023-01-11T21:38:05.9395956Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9396162Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9396285Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9396388Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9396455Z { 2023-01-11T21:38:05.9396536Z #pragma GCC ivdep 2023-01-11T21:38:05.9396618Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:05.9396677Z { 2023-01-11T21:38:05.9396745Z { 2023-01-11T21:38:05.9396813Z { 2023-01-11T21:38:05.9396907Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9397024Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.9397094Z } 2023-01-11T21:38:05.9397159Z } 2023-01-11T21:38:05.9397218Z } 2023-01-11T21:38:05.9397282Z } 2023-01-11T21:38:05.9397366Z ''') 2023-01-11T21:38:05.9397371Z 2023-01-11T21:38:05.9397376Z 2023-01-11T21:38:05.9397471Z async_compile.wait(globals()) 2023-01-11T21:38:05.9397549Z del async_compile 
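# The copy kernel above only clones the 6-element input into a fresh buffer;
# the indexed update is again delegated to ATen in call() below, because the
# index list mixes None, an int64 tensor and a boolean mask, which this
# inductor build does not codegen. A rough eager equivalent of the fallback
# call (tensor values hypothetical):
#
#   import torch
#   x = torch.zeros(1, 2, 3)
#   rows = torch.tensor([0, 1])
#   mask = torch.tensor([True, False, True])
#   torch.ops.aten.index_put_(x, [None, rows, mask], torch.tensor(5.0), False)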
2023-01-11T21:38:05.9397554Z 2023-01-11T21:38:05.9397627Z def call(args): 2023-01-11T21:38:05.9397722Z arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:05.9397790Z args.clear() 2023-01-11T21:38:05.9397991Z buf0 = empty_strided((1, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9398129Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9398203Z del arg0_1 2023-01-11T21:38:05.9398326Z aten.index_put_(buf0, [None,arg1_1,arg2_1], arg3_1, False) 2023-01-11T21:38:05.9398398Z del arg1_1 2023-01-11T21:38:05.9398473Z del arg2_1 2023-01-11T21:38:05.9398537Z del arg3_1 2023-01-11T21:38:05.9398614Z return (buf0, ) 2023-01-11T21:38:05.9398619Z 2023-01-11T21:38:05.9398624Z 2023-01-11T21:38:05.9398703Z if __name__ == "__main__": 2023-01-11T21:38:05.9398820Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9398952Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9399154Z arg0_1 = rand_strided((1, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9399342Z arg1_1 = rand_strided((2, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9399527Z arg2_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9399704Z arg3_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9399838Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:05.9399843Z 2023-01-11T21:38:05.9399848Z 2023-01-11T21:38:05.9399948Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9400025Z import torch 2023-01-11T21:38:05.9400100Z import random 2023-01-11T21:38:05.9400220Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9400343Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9400348Z 2023-01-11T21:38:05.9400430Z aten = torch.ops.aten 2023-01-11T21:38:05.9400586Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9400684Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9400690Z 2023-01-11T21:38:05.9400764Z import triton 2023-01-11T21:38:05.9400859Z import triton.language as tl 2023-01-11T21:38:05.9400985Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9401127Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9401132Z 2023-01-11T21:38:05.9401136Z 2023-01-11T21:38:05.9401271Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9401476Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9401595Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9401700Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9401765Z { 2023-01-11T21:38:05.9401846Z #pragma GCC ivdep 2023-01-11T21:38:05.9401932Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:05.9402001Z { 2023-01-11T21:38:05.9402072Z { 2023-01-11T21:38:05.9402132Z { 2023-01-11T21:38:05.9402226Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9402314Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.9402381Z } 2023-01-11T21:38:05.9402448Z } 2023-01-11T21:38:05.9402516Z } 2023-01-11T21:38:05.9402581Z } 2023-01-11T21:38:05.9402657Z ''') 2023-01-11T21:38:05.9402663Z 2023-01-11T21:38:05.9402668Z 2023-01-11T21:38:05.9402760Z async_compile.wait(globals()) 2023-01-11T21:38:05.9402838Z del async_compile 2023-01-11T21:38:05.9402843Z 2023-01-11T21:38:05.9402920Z def call(args): 2023-01-11T21:38:05.9403042Z arg0_1, arg1_1, arg2_1, arg3_1 = args 
2023-01-11T21:38:05.9403118Z args.clear() 2023-01-11T21:38:05.9403317Z buf0 = empty_strided((1, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9403449Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9403520Z del arg0_1 2023-01-11T21:38:05.9403650Z aten.index_put_(buf0, [None,arg1_1,arg2_1], arg3_1, True) 2023-01-11T21:38:05.9403722Z del arg1_1 2023-01-11T21:38:05.9403793Z del arg2_1 2023-01-11T21:38:05.9403865Z del arg3_1 2023-01-11T21:38:05.9403941Z return (buf0, ) 2023-01-11T21:38:05.9403946Z 2023-01-11T21:38:05.9403950Z 2023-01-11T21:38:05.9404023Z if __name__ == "__main__": 2023-01-11T21:38:05.9404142Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9404270Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9404472Z arg0_1 = rand_strided((1, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9404663Z arg1_1 = rand_strided((2, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9404850Z arg2_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9405034Z arg3_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9405171Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:05.9405176Z 2023-01-11T21:38:05.9405240Z ok (1.788s) 2023-01-11T21:38:05.9405694Z test_index_select_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9405827Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9406088Z [2023-01-11 21:27:33,466] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 134 2023-01-11T21:38:05.9406350Z [2023-01-11 21:27:35,237] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 134 2023-01-11T21:38:05.9406795Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9406929Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9407185Z [2023-01-11 21:27:35,279] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 135 2023-01-11T21:38:05.9407447Z [2023-01-11 21:27:37,083] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 135 2023-01-11T21:38:05.9407455Z 2023-01-11T21:38:05.9407554Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9407629Z import torch 2023-01-11T21:38:05.9407697Z import random 2023-01-11T21:38:05.9407821Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9407944Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9407952Z 2023-01-11T21:38:05.9408037Z aten = torch.ops.aten 2023-01-11T21:38:05.9408173Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9408269Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9408274Z 2023-01-11T21:38:05.9408348Z import triton 2023-01-11T21:38:05.9408443Z import triton.language as tl 2023-01-11T21:38:05.9408562Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9408703Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9408708Z 2023-01-11T21:38:05.9408713Z 2023-01-11T21:38:05.9408850Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9409082Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9409207Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:05.9409317Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9409422Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9409527Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.9409621Z float* __restrict__ out_ptr2) 2023-01-11T21:38:05.9409690Z { 2023-01-11T21:38:05.9409793Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9409860Z { 2023-01-11T21:38:05.9409955Z #pragma omp for collapse(2) 2023-01-11T21:38:05.9410041Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.9410101Z { 2023-01-11T21:38:05.9410193Z for(long i1=0; i1<64; i1+=1) 2023-01-11T21:38:05.9410261Z { 2023-01-11T21:38:05.9410330Z { 2023-01-11T21:38:05.9410406Z { 2023-01-11T21:38:05.9410506Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9410618Z auto tmp1 = in_ptr1[i1 + (64*tmp0)]; 2023-01-11T21:38:05.9410714Z out_ptr0[i1 + (64*i0)] = tmp1; 2023-01-11T21:38:05.9410784Z } 2023-01-11T21:38:05.9410856Z } 2023-01-11T21:38:05.9410924Z } 2023-01-11T21:38:05.9410990Z } 2023-01-11T21:38:05.9411072Z #pragma omp for 2023-01-11T21:38:05.9411160Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9411221Z { 2023-01-11T21:38:05.9411306Z #pragma GCC ivdep 2023-01-11T21:38:05.9411394Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9411463Z { 2023-01-11T21:38:05.9411551Z #pragma GCC ivdep 2023-01-11T21:38:05.9411646Z for(long i2=0; i2<8; i2+=1) 2023-01-11T21:38:05.9411716Z { 2023-01-11T21:38:05.9411780Z { 2023-01-11T21:38:05.9411856Z { 2023-01-11T21:38:05.9411959Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:05.9412077Z auto tmp1 = in_ptr1[i2 + (8*tmp0) + (64*i0)]; 2023-01-11T21:38:05.9412190Z out_ptr1[i2 + (8*i1) + (32*i0)] = tmp1; 2023-01-11T21:38:05.9412301Z } 2023-01-11T21:38:05.9412374Z } 2023-01-11T21:38:05.9412435Z } 2023-01-11T21:38:05.9412502Z } 2023-01-11T21:38:05.9412569Z } 2023-01-11T21:38:05.9412652Z #pragma omp for 2023-01-11T21:38:05.9412740Z 
for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9412807Z { 2023-01-11T21:38:05.9412884Z #pragma GCC ivdep 2023-01-11T21:38:05.9412970Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9413038Z { 2023-01-11T21:38:05.9413123Z #pragma GCC ivdep 2023-01-11T21:38:05.9413218Z for(long i2=0; i2<4; i2+=1) 2023-01-11T21:38:05.9413293Z { 2023-01-11T21:38:05.9413364Z { 2023-01-11T21:38:05.9413430Z { 2023-01-11T21:38:05.9413531Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:05.9413629Z auto tmp1 = in_ptr0[i2]; 2023-01-11T21:38:05.9413752Z auto tmp2 = in_ptr1[tmp1 + (8*tmp0) + (64*i0)]; 2023-01-11T21:38:05.9413863Z out_ptr2[i2 + (4*i1) + (16*i0)] = tmp2; 2023-01-11T21:38:05.9413938Z } 2023-01-11T21:38:05.9414009Z } 2023-01-11T21:38:05.9414071Z } 2023-01-11T21:38:05.9414138Z } 2023-01-11T21:38:05.9414204Z } 2023-01-11T21:38:05.9414272Z } 2023-01-11T21:38:05.9414337Z } 2023-01-11T21:38:05.9414424Z ''') 2023-01-11T21:38:05.9414429Z 2023-01-11T21:38:05.9414433Z 2023-01-11T21:38:05.9414638Z async_compile.wait(globals()) 2023-01-11T21:38:05.9414708Z del async_compile 2023-01-11T21:38:05.9414756Z 2023-01-11T21:38:05.9414835Z def call(args): 2023-01-11T21:38:05.9414914Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9414989Z args.clear() 2023-01-11T21:38:05.9415196Z buf0 = empty_strided((4, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9415399Z buf1 = empty_strided((8, 4, 8), (32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9415593Z buf2 = empty_strided((8, 4, 4), (16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9415810Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.9415877Z del arg0_1 2023-01-11T21:38:05.9415948Z del arg1_1 2023-01-11T21:38:05.9416036Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:05.9416041Z 2023-01-11T21:38:05.9416046Z 2023-01-11T21:38:05.9416126Z if __name__ == "__main__": 2023-01-11T21:38:05.9416245Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9416373Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9416576Z arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9416767Z arg1_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.9416884Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9416889Z 2023-01-11T21:38:05.9416893Z 2023-01-11T21:38:05.9416992Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9417067Z import torch 2023-01-11T21:38:05.9417192Z import random 2023-01-11T21:38:05.9417321Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9417449Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9417454Z 2023-01-11T21:38:05.9417536Z aten = torch.ops.aten 2023-01-11T21:38:05.9417665Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9417760Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9417768Z 2023-01-11T21:38:05.9417842Z import triton 2023-01-11T21:38:05.9417938Z import triton.language as tl 2023-01-11T21:38:05.9418061Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9418201Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9418206Z 2023-01-11T21:38:05.9418210Z 2023-01-11T21:38:05.9418389Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9418596Z #include 
"/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9418717Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9418820Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9418924Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9419024Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.9419123Z float* __restrict__ out_ptr2) 2023-01-11T21:38:05.9419190Z { 2023-01-11T21:38:05.9419294Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9419362Z { 2023-01-11T21:38:05.9419449Z #pragma omp for collapse(2) 2023-01-11T21:38:05.9419539Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.9419605Z { 2023-01-11T21:38:05.9419699Z for(long i1=0; i1<64; i1+=1) 2023-01-11T21:38:05.9419768Z { 2023-01-11T21:38:05.9419840Z { 2023-01-11T21:38:05.9419903Z { 2023-01-11T21:38:05.9420007Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9420121Z auto tmp1 = in_ptr1[i1 + (64*tmp0)]; 2023-01-11T21:38:05.9420224Z out_ptr0[i1 + (64*i0)] = tmp1; 2023-01-11T21:38:05.9420294Z } 2023-01-11T21:38:05.9420363Z } 2023-01-11T21:38:05.9420433Z } 2023-01-11T21:38:05.9420495Z } 2023-01-11T21:38:05.9420578Z #pragma omp for 2023-01-11T21:38:05.9420664Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9420760Z { 2023-01-11T21:38:05.9420849Z #pragma GCC ivdep 2023-01-11T21:38:05.9420936Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9421005Z { 2023-01-11T21:38:05.9421084Z #pragma GCC ivdep 2023-01-11T21:38:05.9421179Z for(long i2=0; i2<8; i2+=1) 2023-01-11T21:38:05.9421255Z { 2023-01-11T21:38:05.9421326Z { 2023-01-11T21:38:05.9421398Z { 2023-01-11T21:38:05.9421498Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:05.9421616Z auto tmp1 = in_ptr1[i2 + (8*tmp0) + (64*i0)]; 2023-01-11T21:38:05.9421719Z out_ptr1[i2 + (8*i1) + (32*i0)] = tmp1; 2023-01-11T21:38:05.9421792Z } 2023-01-11T21:38:05.9421861Z } 2023-01-11T21:38:05.9421929Z } 2023-01-11T21:38:05.9421997Z } 2023-01-11T21:38:05.9422064Z } 2023-01-11T21:38:05.9422141Z #pragma omp for 2023-01-11T21:38:05.9422230Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9422296Z { 2023-01-11T21:38:05.9422383Z #pragma GCC ivdep 2023-01-11T21:38:05.9422470Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9422537Z { 2023-01-11T21:38:05.9422629Z #pragma GCC ivdep 2023-01-11T21:38:05.9422715Z for(long i2=0; i2<4; i2+=1) 2023-01-11T21:38:05.9422785Z { 2023-01-11T21:38:05.9422856Z { 2023-01-11T21:38:05.9422928Z { 2023-01-11T21:38:05.9423027Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:05.9423126Z auto tmp1 = in_ptr0[i2]; 2023-01-11T21:38:05.9423245Z auto tmp2 = in_ptr1[tmp1 + (8*tmp0) + (64*i0)]; 2023-01-11T21:38:05.9423347Z out_ptr2[i2 + (4*i1) + (16*i0)] = tmp2; 2023-01-11T21:38:05.9423423Z } 2023-01-11T21:38:05.9423496Z } 2023-01-11T21:38:05.9423566Z } 2023-01-11T21:38:05.9423633Z } 2023-01-11T21:38:05.9423700Z } 2023-01-11T21:38:05.9423765Z } 2023-01-11T21:38:05.9423822Z } 2023-01-11T21:38:05.9423910Z ''') 2023-01-11T21:38:05.9423916Z 2023-01-11T21:38:05.9423920Z 2023-01-11T21:38:05.9424044Z async_compile.wait(globals()) 2023-01-11T21:38:05.9424122Z del async_compile 2023-01-11T21:38:05.9424128Z 2023-01-11T21:38:05.9424204Z def call(args): 2023-01-11T21:38:05.9424283Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9424359Z args.clear() 2023-01-11T21:38:05.9424554Z buf0 = empty_strided((4, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9424750Z buf1 = empty_strided((8, 4, 8), (32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9424946Z buf2 = empty_strided((8, 4, 4), (16, 4, 1), 
device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9425181Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.9425270Z del arg0_1 2023-01-11T21:38:05.9425358Z del arg1_1 2023-01-11T21:38:05.9425445Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:05.9425450Z 2023-01-11T21:38:05.9425454Z 2023-01-11T21:38:05.9425538Z if __name__ == "__main__": 2023-01-11T21:38:05.9425651Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9425780Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9425983Z arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9426173Z arg1_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9426296Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9426303Z 2023-01-11T21:38:05.9426373Z ok (3.659s) 2023-01-11T21:38:05.9426842Z test_indirect_load_broadcast_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9427004Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9427263Z [2023-01-11 21:27:37,130] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 136 2023-01-11T21:38:05.9427526Z [2023-01-11 21:27:38,882] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 136 2023-01-11T21:38:05.9427532Z 2023-01-11T21:38:05.9427624Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9427698Z import torch 2023-01-11T21:38:05.9427773Z import random 2023-01-11T21:38:05.9427894Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9428018Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9428026Z 2023-01-11T21:38:05.9428108Z aten = torch.ops.aten 2023-01-11T21:38:05.9428245Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9428333Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9428348Z 2023-01-11T21:38:05.9428415Z import triton 2023-01-11T21:38:05.9428510Z import triton.language as tl 2023-01-11T21:38:05.9428634Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9428775Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9428780Z 2023-01-11T21:38:05.9428785Z 2023-01-11T21:38:05.9428922Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9429125Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9429247Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9429357Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9429462Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.9429567Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9429636Z { 2023-01-11T21:38:05.9429736Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9429803Z { 2023-01-11T21:38:05.9429886Z #pragma omp for 2023-01-11T21:38:05.9430001Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.9430069Z { 2023-01-11T21:38:05.9430157Z #pragma GCC ivdep 2023-01-11T21:38:05.9430246Z for(long 
i1=0; i1<21; i1+=1) 2023-01-11T21:38:05.9430314Z { 2023-01-11T21:38:05.9430383Z { 2023-01-11T21:38:05.9430455Z { 2023-01-11T21:38:05.9430558Z auto tmp0 = in_ptr0[i0 + (32*i1)]; 2023-01-11T21:38:05.9430659Z auto tmp2 = in_ptr2[i0]; 2023-01-11T21:38:05.9430771Z auto tmp1 = in_ptr1[i1 + (512*tmp0)]; 2023-01-11T21:38:05.9430872Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.9430977Z out_ptr0[i1 + (21*i0)] = tmp3; 2023-01-11T21:38:05.9431047Z } 2023-01-11T21:38:05.9431117Z } 2023-01-11T21:38:05.9431177Z } 2023-01-11T21:38:05.9431245Z } 2023-01-11T21:38:05.9431315Z } 2023-01-11T21:38:05.9431385Z } 2023-01-11T21:38:05.9431472Z ''') 2023-01-11T21:38:05.9431478Z 2023-01-11T21:38:05.9431482Z 2023-01-11T21:38:05.9431575Z async_compile.wait(globals()) 2023-01-11T21:38:05.9431653Z del async_compile 2023-01-11T21:38:05.9431658Z 2023-01-11T21:38:05.9431725Z def call(args): 2023-01-11T21:38:05.9431813Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9431890Z args.clear() 2023-01-11T21:38:05.9432091Z buf0 = empty_strided((32, 21), (21, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9432283Z kernel_cpp_0(c_void_p(arg2_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9432388Z del arg0_1 2023-01-11T21:38:05.9432460Z del arg1_1 2023-01-11T21:38:05.9432525Z del arg2_1 2023-01-11T21:38:05.9432601Z return (buf0, ) 2023-01-11T21:38:05.9432607Z 2023-01-11T21:38:05.9432611Z 2023-01-11T21:38:05.9432692Z if __name__ == "__main__": 2023-01-11T21:38:05.9432812Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9432941Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9433140Z arg0_1 = rand_strided((32, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9433344Z arg1_1 = rand_strided((9521, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9433541Z arg2_1 = rand_strided((32, 21), (1, 32), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9433660Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9433673Z 2023-01-11T21:38:05.9433737Z ok (1.853s) 2023-01-11T21:38:05.9434183Z test_inf_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9434324Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9434581Z [2023-01-11 21:27:38,955] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 137
2023-01-11T21:38:05.9434845Z [2023-01-11 21:27:40,639] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 137
2023-01-11T21:38:05.9434851Z
2023-01-11T21:38:05.9434950Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9435025Z import torch
2023-01-11T21:38:05.9435100Z import random
2023-01-11T21:38:05.9435219Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9435335Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9435342Z
2023-01-11T21:38:05.9435424Z aten = torch.ops.aten
2023-01-11T21:38:05.9435559Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9435656Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9435661Z
2023-01-11T21:38:05.9435736Z import triton
2023-01-11T21:38:05.9435858Z import triton.language as tl
2023-01-11T21:38:05.9435985Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9436117Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9436131Z
2023-01-11T21:38:05.9436136Z
2023-01-11T21:38:05.9436265Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9436470Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9436593Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:05.9436699Z float* __restrict__ out_ptr0,
2023-01-11T21:38:05.9436805Z float* __restrict__ out_ptr1,
2023-01-11T21:38:05.9436905Z float* __restrict__ out_ptr2)
2023-01-11T21:38:05.9436969Z {
2023-01-11T21:38:05.9443344Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9443427Z {
2023-01-11T21:38:05.9443517Z #pragma omp for
2023-01-11T21:38:05.9443610Z for(long i0=0; i0<1; i0+=1)
2023-01-11T21:38:05.9443673Z {
2023-01-11T21:38:05.9443827Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:05.9443990Z auto tmp1 = at::vec::Vectorized<float>(std::numeric_limits<float>::infinity());
2023-01-11T21:38:05.9444082Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.9444331Z auto tmp3 = at::vec::Vectorized<float>(-std::numeric_limits<float>::infinity());
2023-01-11T21:38:05.9444422Z auto tmp4 = tmp0 + tmp3;
2023-01-11T21:38:05.9444509Z auto tmp5 = tmp0 * tmp3;
2023-01-11T21:38:05.9444600Z tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:05.9444756Z tmp4.store(out_ptr1 + 8*i0);
2023-01-11T21:38:05.9444858Z tmp5.store(out_ptr2 + 8*i0);
2023-01-11T21:38:05.9444928Z }
2023-01-11T21:38:05.9445029Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:05.9445120Z for(long i0=8; i0<8; i0+=1)
2023-01-11T21:38:05.9445191Z {
2023-01-11T21:38:05.9445277Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:05.9445409Z auto tmp1 = std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9445502Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.9445753Z auto tmp3 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9445847Z auto tmp4 = tmp0 + tmp3;
2023-01-11T21:38:05.9445937Z auto tmp5 = tmp0 * tmp3;
2023-01-11T21:38:05.9446026Z out_ptr0[i0] = tmp2;
2023-01-11T21:38:05.9446107Z out_ptr1[i0] = tmp4;
2023-01-11T21:38:05.9446193Z out_ptr2[i0] = tmp5;
2023-01-11T21:38:05.9446268Z }
2023-01-11T21:38:05.9446336Z }
2023-01-11T21:38:05.9446402Z }
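// Note on the loops above: the vectorized loop (i0 < 1) already covers all
// 8 floats, so the scalar tail loop (i0 = 8; i0 < 8) is intentionally empty.
// The three outputs exercise IEEE-754 infinity arithmetic: for finite x,
// x + inf == inf, x + -inf == -inf, and x * -inf is an infinity with the
// sign of x flipped (NaN when x == 0).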
2023-01-11T21:38:05.9446489Z ''') 2023-01-11T21:38:05.9446496Z 2023-01-11T21:38:05.9446500Z 2023-01-11T21:38:05.9446597Z async_compile.wait(globals()) 2023-01-11T21:38:05.9446670Z del async_compile 2023-01-11T21:38:05.9446682Z 2023-01-11T21:38:05.9446752Z def call(args): 2023-01-11T21:38:05.9446833Z arg0_1, = args 2023-01-11T21:38:05.9446911Z args.clear() 2023-01-11T21:38:05.9447112Z buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9447302Z buf1 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9447498Z buf2 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9447694Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.9447769Z del arg0_1 2023-01-11T21:38:05.9447858Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:05.9447866Z 2023-01-11T21:38:05.9447870Z 2023-01-11T21:38:05.9447954Z if __name__ == "__main__": 2023-01-11T21:38:05.9448078Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9448201Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9448427Z arg0_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9448543Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9448548Z 2023-01-11T21:38:05.9448623Z ok (1.702s) 2023-01-11T21:38:05.9449088Z test_inplace_activations_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9449221Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9449482Z [2023-01-11 21:27:40,783] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 138 2023-01-11T21:38:05.9449747Z [2023-01-11 21:27:42,498] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 138 2023-01-11T21:38:05.9449753Z 2023-01-11T21:38:05.9449853Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9449929Z import torch 2023-01-11T21:38:05.9449998Z import random 2023-01-11T21:38:05.9450115Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9450240Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9450245Z 2023-01-11T21:38:05.9450327Z aten = torch.ops.aten 2023-01-11T21:38:05.9450465Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9450561Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9450566Z 2023-01-11T21:38:05.9450640Z import triton 2023-01-11T21:38:05.9450725Z import triton.language as tl 2023-01-11T21:38:05.9450878Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9451021Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9451027Z 2023-01-11T21:38:05.9451031Z 2023-01-11T21:38:05.9451169Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9451376Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9451502Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9451607Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9451709Z float* 
__restrict__ out_ptr1, 2023-01-11T21:38:05.9451802Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.9451900Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.9451999Z float* __restrict__ out_ptr4, 2023-01-11T21:38:05.9452098Z float* __restrict__ out_ptr5, 2023-01-11T21:38:05.9452202Z float* __restrict__ out_ptr6) 2023-01-11T21:38:05.9452268Z { 2023-01-11T21:38:05.9452370Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9452429Z { 2023-01-11T21:38:05.9452510Z #pragma omp for 2023-01-11T21:38:05.9452597Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9452664Z { 2023-01-11T21:38:05.9452730Z { 2023-01-11T21:38:05.9452798Z { 2023-01-11T21:38:05.9452896Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9453000Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.9453100Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9453206Z auto tmp3 = static_cast<float>(3); 2023-01-11T21:38:05.9453303Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.9453414Z auto tmp5 = static_cast<float>(0.0); 2023-01-11T21:38:05.9453549Z auto tmp6 = (tmp5 != tmp5) ? tmp5 : std::max(tmp4, tmp5); 2023-01-11T21:38:05.9453663Z auto tmp7 = static_cast<float>(6.0); 2023-01-11T21:38:05.9453788Z auto tmp8 = (tmp7 != tmp7) ? tmp7 : std::min(tmp6, tmp7); 2023-01-11T21:38:05.9453884Z auto tmp9 = tmp2 * tmp8; 2023-01-11T21:38:05.9454018Z auto tmp10 = static_cast<float>(6); 2023-01-11T21:38:05.9454119Z auto tmp11 = tmp9 / tmp10; 2023-01-11T21:38:05.9454284Z auto tmp12 = static_cast<float>(-1.0); 2023-01-11T21:38:05.9454422Z auto tmp13 = (tmp12 != tmp12) ? tmp12 : std::max(tmp2, tmp12); 2023-01-11T21:38:05.9454674Z auto tmp14 = static_cast<float>(1.0); 2023-01-11T21:38:05.9454812Z auto tmp15 = (tmp14 != tmp14) ? tmp14 : std::min(tmp13, tmp14); 2023-01-11T21:38:05.9454913Z auto tmp16 = static_cast<float>(0); 2023-01-11T21:38:05.9455013Z auto tmp17 = tmp2 > tmp16; 2023-01-11T21:38:05.9455131Z auto tmp18 = static_cast<float>(0.01); 2023-01-11T21:38:05.9455240Z auto tmp19 = tmp2 * tmp18; 2023-01-11T21:38:05.9455365Z auto tmp20 = tmp17 ? tmp2 : tmp19; 2023-01-11T21:38:05.9455542Z auto tmp21 = std::exp(-tmp2); 2023-01-11T21:38:05.9455642Z auto tmp22 = 1 / (1 + tmp21); 2023-01-11T21:38:05.9455738Z auto tmp23 = tmp2 * tmp22; 2023-01-11T21:38:05.9455838Z auto tmp24 = std::log1p(tmp2); 2023-01-11T21:38:05.9455943Z auto tmp25 = static_cast<bool>(0); 2023-01-11T21:38:05.9456055Z auto tmp26 = static_cast<float>(99.0); 2023-01-11T21:38:05.9456160Z auto tmp27 = tmp25 ? tmp26 : tmp2; 2023-01-11T21:38:05.9456265Z auto tmp28 = static_cast<bool>(1); 2023-01-11T21:38:05.9456368Z auto tmp29 = tmp28 ? 
tmp26 : tmp2; 2023-01-11T21:38:05.9456458Z out_ptr0[i0] = tmp11; 2023-01-11T21:38:05.9456599Z out_ptr1[i0] = tmp15; 2023-01-11T21:38:05.9456693Z out_ptr2[i0] = tmp20; 2023-01-11T21:38:05.9456784Z out_ptr3[i0] = tmp23; 2023-01-11T21:38:05.9456875Z out_ptr4[i0] = tmp24; 2023-01-11T21:38:05.9456971Z out_ptr5[i0] = tmp27; 2023-01-11T21:38:05.9457064Z out_ptr6[i0] = tmp29; 2023-01-11T21:38:05.9457188Z } 2023-01-11T21:38:05.9457271Z } 2023-01-11T21:38:05.9457343Z } 2023-01-11T21:38:05.9457412Z } 2023-01-11T21:38:05.9457481Z } 2023-01-11T21:38:05.9457571Z ''') 2023-01-11T21:38:05.9457576Z 2023-01-11T21:38:05.9457581Z 2023-01-11T21:38:05.9457680Z async_compile.wait(globals()) 2023-01-11T21:38:05.9457761Z del async_compile 2023-01-11T21:38:05.9457766Z 2023-01-11T21:38:05.9457836Z def call(args): 2023-01-11T21:38:05.9457913Z arg0_1, = args 2023-01-11T21:38:05.9457990Z args.clear() 2023-01-11T21:38:05.9458194Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9458391Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9458582Z buf2 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9458778Z buf3 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9458967Z buf4 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9459147Z buf5 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9459334Z buf6 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9459617Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf6.data_ptr())) 2023-01-11T21:38:05.9459692Z del arg0_1 2023-01-11T21:38:05.9459812Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, ) 2023-01-11T21:38:05.9459817Z 2023-01-11T21:38:05.9459822Z 2023-01-11T21:38:05.9459905Z if __name__ == "__main__": 2023-01-11T21:38:05.9460027Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9460156Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9460399Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9460514Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9460519Z 2023-01-11T21:38:05.9460589Z ok (1.860s) 2023-01-11T21:38:05.9461103Z test_inplace_add_cpu (__main__.CpuTests) ... 
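The TypedStorage deprecation warnings repeated throughout this suite all come from one harness line, `buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()`. A minimal sketch of the migration the warning suggests, assuming a PyTorch build where Tensor.untyped_storage() is available; note the unit change, since an untyped storage reports its size in bytes rather than elements:

    import torch

    x = torch.randn(4, 4)
    # Deprecated: x.storage().size() counts elements of a TypedStorage.
    # untyped_storage().size() counts bytes, so divide by the element size.
    n = x.untyped_storage().size() // x.element_size()
    buffer = torch.as_strided(x, (n,), (1,), 0).clone()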
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:05.9461182Z warnings.warn( 2023-01-11T21:38:05.9461440Z [2023-01-11 21:27:42,512] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 139 2023-01-11T21:38:05.9461708Z [2023-01-11 21:27:44,237] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 139 2023-01-11T21:38:05.9461715Z 2023-01-11T21:38:05.9461812Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9461889Z import torch 2023-01-11T21:38:05.9461957Z import random 2023-01-11T21:38:05.9462077Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9462201Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9462206Z 2023-01-11T21:38:05.9462289Z aten = torch.ops.aten 2023-01-11T21:38:05.9462427Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9462523Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9462528Z 2023-01-11T21:38:05.9462602Z import triton 2023-01-11T21:38:05.9462688Z import triton.language as tl 2023-01-11T21:38:05.9462811Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9462981Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9462986Z 2023-01-11T21:38:05.9462992Z 2023-01-11T21:38:05.9463132Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9463342Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9463471Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9463583Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9463689Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9463750Z { 2023-01-11T21:38:05.9463853Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9463921Z { 2023-01-11T21:38:05.9464005Z #pragma omp for 2023-01-11T21:38:05.9464098Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.9464169Z { 2023-01-11T21:38:05.9464312Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9464457Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.9464543Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9464641Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9464710Z } 2023-01-11T21:38:05.9464814Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9464910Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:05.9464978Z { 2023-01-11T21:38:05.9465066Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9465156Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9465247Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9465336Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9465405Z } 2023-01-11T21:38:05.9465472Z } 2023-01-11T21:38:05.9465539Z } 2023-01-11T21:38:05.9465619Z ''') 2023-01-11T21:38:05.9465624Z 2023-01-11T21:38:05.9465635Z 2023-01-11T21:38:05.9465724Z async_compile.wait(globals()) 2023-01-11T21:38:05.9465803Z del async_compile 2023-01-11T21:38:05.9465810Z 2023-01-11T21:38:05.9465886Z def call(args): 2023-01-11T21:38:05.9465973Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9466051Z args.clear() 2023-01-11T21:38:05.9466221Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr())) 2023-01-11T21:38:05.9466297Z del arg1_1 2023-01-11T21:38:05.9466400Z return (arg0_1, ) 2023-01-11T21:38:05.9466406Z 
2023-01-11T21:38:05.9466410Z 2023-01-11T21:38:05.9466496Z if __name__ == "__main__": 2023-01-11T21:38:05.9466616Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9466744Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9466943Z arg0_1 = rand_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9467138Z arg1_1 = rand_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9467261Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9467266Z 2023-01-11T21:38:05.9467340Z ok (1.750s) 2023-01-11T21:38:05.9467681Z test_inplace_mixed_dtype_ops_cpu (__main__.CpuTests) ... [2023-01-11 21:27:44,305] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 140 2023-01-11T21:38:05.9467949Z [2023-01-11 21:27:46,266] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 140 2023-01-11T21:38:05.9467957Z 2023-01-11T21:38:05.9468056Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9468133Z import torch 2023-01-11T21:38:05.9468211Z import random 2023-01-11T21:38:05.9468336Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9468462Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9468467Z 2023-01-11T21:38:05.9468554Z aten = torch.ops.aten 2023-01-11T21:38:05.9468685Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9468783Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9468788Z 2023-01-11T21:38:05.9468865Z import triton 2023-01-11T21:38:05.9468988Z import triton.language as tl 2023-01-11T21:38:05.9469114Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9469253Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9469259Z 2023-01-11T21:38:05.9469263Z 2023-01-11T21:38:05.9469400Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9469610Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9469723Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.9469832Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9469944Z const double* __restrict__ in_ptr1) 2023-01-11T21:38:05.9470009Z { 2023-01-11T21:38:05.9470111Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9470176Z { 2023-01-11T21:38:05.9470258Z #pragma omp for 2023-01-11T21:38:05.9470337Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.9470403Z { 2023-01-11T21:38:05.9470475Z { 2023-01-11T21:38:05.9470544Z { 2023-01-11T21:38:05.9470643Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9470744Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9470857Z auto tmp2 = static_cast<float>(tmp1); 2023-01-11T21:38:05.9470950Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:05.9471064Z auto tmp4 = static_cast<double>(tmp3); 2023-01-11T21:38:05.9471161Z auto tmp5 = tmp4 + tmp1; 2023-01-11T21:38:05.9471272Z auto tmp6 = static_cast<float>(tmp5); 2023-01-11T21:38:05.9471385Z auto tmp7 = static_cast<double>(tmp6); 2023-01-11T21:38:05.9471480Z auto tmp8 = tmp7 * tmp1; 2023-01-11T21:38:05.9471588Z auto tmp9 = static_cast<float>(tmp8); 2023-01-11T21:38:05.9471676Z in_out_ptr0[i0] = tmp9; 2023-01-11T21:38:05.9471746Z } 2023-01-11T21:38:05.9471816Z } 2023-01-11T21:38:05.9471882Z } 2023-01-11T21:38:05.9471948Z } 2023-01-11T21:38:05.9472013Z } 2023-01-11T21:38:05.9472090Z ''') 2023-01-11T21:38:05.9472104Z 2023-01-11T21:38:05.9472108Z 2023-01-11T21:38:05.9472194Z async_compile.wait(globals()) 
2023-01-11T21:38:05.9472269Z del async_compile 2023-01-11T21:38:05.9472275Z 2023-01-11T21:38:05.9472374Z def call(args): 2023-01-11T21:38:05.9472454Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9472529Z args.clear() 2023-01-11T21:38:05.9472727Z buf0 = empty_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9472817Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:05.9472977Z kernel_cpp_0(c_void_p(buf1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr())) 2023-01-11T21:38:05.9473050Z del arg0_1 2023-01-11T21:38:05.9473120Z del arg1_1 2023-01-11T21:38:05.9473195Z return (buf1, ) 2023-01-11T21:38:05.9473201Z 2023-01-11T21:38:05.9473205Z 2023-01-11T21:38:05.9473289Z if __name__ == "__main__": 2023-01-11T21:38:05.9473409Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9473537Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9473731Z arg0_1 = rand_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9473920Z arg1_1 = rand_strided((4, 4), (4, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9474038Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9474045Z 2023-01-11T21:38:05.9474115Z ok (2.029s) 2023-01-11T21:38:05.9474445Z test_input_mutation1_cpu (__main__.CpuTests) ... [2023-01-11 21:27:46,303] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 141 2023-01-11T21:38:05.9474657Z [2023-01-11 21:27:46,318] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:05.9474861Z [2023-01-11 21:27:46,320] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:05.9475122Z [2023-01-11 21:27:48,251] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 141 2023-01-11T21:38:05.9475160Z 2023-01-11T21:38:05.9475260Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9475330Z import torch 2023-01-11T21:38:05.9475404Z import random 2023-01-11T21:38:05.9475522Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9475646Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9475651Z 2023-01-11T21:38:05.9475733Z aten = torch.ops.aten 2023-01-11T21:38:05.9475868Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9475964Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9475971Z 2023-01-11T21:38:05.9476045Z import triton 2023-01-11T21:38:05.9476130Z import triton.language as tl 2023-01-11T21:38:05.9476254Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9476394Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9476399Z 2023-01-11T21:38:05.9476407Z 2023-01-11T21:38:05.9476547Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9476751Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9476874Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9476982Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.9477084Z float* __restrict__ out_ptr2) 2023-01-11T21:38:05.9477142Z { 2023-01-11T21:38:05.9477244Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9477311Z { 2023-01-11T21:38:05.9477395Z #pragma omp for 2023-01-11T21:38:05.9477481Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9477548Z { 2023-01-11T21:38:05.9477689Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9477821Z auto tmp1 = 
at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.9477911Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9478002Z auto tmp3 = tmp2 * tmp2; 2023-01-11T21:38:05.9478136Z auto tmp4 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.9478226Z auto tmp5 = tmp2 + tmp4; 2023-01-11T21:38:05.9478315Z auto tmp6 = tmp3 / tmp5; 2023-01-11T21:38:05.9478440Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9478529Z tmp6.store(out_ptr2 + 8*i0); 2023-01-11T21:38:05.9478597Z } 2023-01-11T21:38:05.9478699Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9478786Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9478852Z { 2023-01-11T21:38:05.9478942Z auto tmp0 = out_ptr1[i0]; 2023-01-11T21:38:05.9479047Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.9479128Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9479218Z auto tmp3 = tmp2 * tmp2; 2023-01-11T21:38:05.9479323Z auto tmp4 = static_cast<float>(2); 2023-01-11T21:38:05.9479415Z auto tmp5 = tmp2 + tmp4; 2023-01-11T21:38:05.9479504Z auto tmp6 = tmp3 / tmp5; 2023-01-11T21:38:05.9479589Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.9479673Z out_ptr2[i0] = tmp6; 2023-01-11T21:38:05.9479733Z } 2023-01-11T21:38:05.9479799Z } 2023-01-11T21:38:05.9479864Z } 2023-01-11T21:38:05.9479954Z ''') 2023-01-11T21:38:05.9479960Z 2023-01-11T21:38:05.9479964Z 2023-01-11T21:38:05.9480055Z async_compile.wait(globals()) 2023-01-11T21:38:05.9480132Z del async_compile 2023-01-11T21:38:05.9480139Z 2023-01-11T21:38:05.9480216Z def call(args): 2023-01-11T21:38:05.9480283Z arg0_1, = args 2023-01-11T21:38:05.9480357Z args.clear() 2023-01-11T21:38:05.9480552Z buf2 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9480722Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.9480795Z del arg0_1 2023-01-11T21:38:05.9480937Z return (buf2, ) 2023-01-11T21:38:05.9480942Z 2023-01-11T21:38:05.9480946Z 2023-01-11T21:38:05.9481025Z if __name__ == "__main__": 2023-01-11T21:38:05.9481143Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9481263Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9481461Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9481574Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9481579Z 2023-01-11T21:38:05.9481650Z ok (1.991s) 2023-01-11T21:38:05.9481980Z test_input_mutation2_cpu (__main__.CpuTests) ... 
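The test_input_mutation* dumps above and below show how Inductor handles graphs that mutate their inputs: the generated call() passes arg0_1.data_ptr() as both an input and an output pointer, so the mutation lands directly in the caller's buffer. A minimal sketch of the pattern these tests exercise; the function body is an illustrative assumption, not the test's exact source:

    import torch

    def fn(x):
        x.add_(1)        # in-place mutation of a graph input
        return x * x / (x + 2)

    compiled = torch.compile(fn)   # TorchDynamo + Inductor
    x = torch.randn(64)
    out = compiled(x)              # x itself is updated, matching eager semantics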
[2023-01-11 21:27:48,357] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 142 2023-01-11T21:38:05.9482202Z [2023-01-11 21:27:48,360] torch._inductor.graph: [WARNING] Creating implicit fallback for: 2023-01-11T21:38:05.9482301Z target: aten.expand_copy.default 2023-01-11T21:38:05.9482397Z args[0]: TensorBox(StorageBox( 2023-01-11T21:38:05.9482470Z Pointwise( 2023-01-11T21:38:05.9482564Z 'cpu', 2023-01-11T21:38:05.9482643Z torch.float32, 2023-01-11T21:38:05.9482746Z tmp0 = constant(66.0, torch.float32) 2023-01-11T21:38:05.9482820Z return tmp0 2023-01-11T21:38:05.9482886Z , 2023-01-11T21:38:05.9482952Z ranges=[1], 2023-01-11T21:38:05.9483070Z origins={lift_fresh_copy, _tensor_constant0} 2023-01-11T21:38:05.9483136Z ) 2023-01-11T21:38:05.9483200Z )) 2023-01-11T21:38:05.9483272Z args[1]: [64] 2023-01-11T21:38:05.9483539Z [2023-01-11 21:27:48,370] torch._inductor.ir: [WARNING] Using FallbackKernel: torch.ops.aten.expand_copy.default 2023-01-11T21:38:05.9483799Z [2023-01-11 21:27:50,445] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 142 2023-01-11T21:38:05.9483805Z 2023-01-11T21:38:05.9483902Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9483969Z import torch 2023-01-11T21:38:05.9484041Z import random 2023-01-11T21:38:05.9484161Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9484287Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9484292Z 2023-01-11T21:38:05.9484375Z aten = torch.ops.aten 2023-01-11T21:38:05.9484516Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9484613Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9484647Z 2023-01-11T21:38:05.9484715Z import triton 2023-01-11T21:38:05.9484812Z import triton.language as tl 2023-01-11T21:38:05.9484938Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9485078Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9485084Z 2023-01-11T21:38:05.9485088Z 2023-01-11T21:38:05.9485224Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9485427Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9485552Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9485658Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9485757Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9485823Z { 2023-01-11T21:38:05.9485926Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9485995Z { 2023-01-11T21:38:05.9486077Z #pragma omp for 2023-01-11T21:38:05.9486167Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9486234Z { 2023-01-11T21:38:05.9486365Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9486501Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.9486591Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9486689Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9486757Z } 2023-01-11T21:38:05.9486856Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9486943Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9487002Z { 2023-01-11T21:38:05.9487090Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9487225Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.9487314Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9487399Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9487467Z } 2023-01-11T21:38:05.9487554Z #pragma omp single 2023-01-11T21:38:05.9487614Z { 2023-01-11T21:38:05.9487682Z { 
2023-01-11T21:38:05.9487749Z { 2023-01-11T21:38:05.9487862Z auto tmp0 = static_cast<float>(66.0); 2023-01-11T21:38:05.9487954Z out_ptr1[0] = tmp0; 2023-01-11T21:38:05.9488026Z } 2023-01-11T21:38:05.9488093Z } 2023-01-11T21:38:05.9488152Z } 2023-01-11T21:38:05.9488218Z } 2023-01-11T21:38:05.9488282Z } 2023-01-11T21:38:05.9488367Z ''') 2023-01-11T21:38:05.9488372Z 2023-01-11T21:38:05.9488377Z 2023-01-11T21:38:05.9488512Z kernel_cpp_1 = async_compile.cpp(''' 2023-01-11T21:38:05.9488717Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9488844Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9488940Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9489005Z { 2023-01-11T21:38:05.9489105Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9489172Z { 2023-01-11T21:38:05.9489256Z #pragma omp for 2023-01-11T21:38:05.9489343Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9489410Z { 2023-01-11T21:38:05.9489540Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9489678Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.9489771Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9489866Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9489933Z } 2023-01-11T21:38:05.9490033Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9490122Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9490185Z { 2023-01-11T21:38:05.9490274Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9490376Z auto tmp1 = static_cast<float>(2); 2023-01-11T21:38:05.9490466Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9490551Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9490647Z } 2023-01-11T21:38:05.9490715Z } 2023-01-11T21:38:05.9490772Z } 2023-01-11T21:38:05.9490858Z ''') 2023-01-11T21:38:05.9490864Z 2023-01-11T21:38:05.9490868Z 2023-01-11T21:38:05.9490960Z async_compile.wait(globals()) 2023-01-11T21:38:05.9491037Z del async_compile 2023-01-11T21:38:05.9491042Z 2023-01-11T21:38:05.9491116Z def call(args): 2023-01-11T21:38:05.9491197Z primals_1, = args 2023-01-11T21:38:05.9491273Z args.clear() 2023-01-11T21:38:05.9491465Z buf0 = empty_strided((1, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9491654Z buf1 = empty_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9491831Z kernel_cpp_0(c_void_p(primals_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9491910Z del primals_1 2023-01-11T21:38:05.9492041Z buf2 = torch.ops.aten.expand_copy.default(buf1, [64]) 2023-01-11T21:38:05.9492111Z del buf1 2023-01-11T21:38:05.9492183Z buf3 = buf2 2023-01-11T21:38:05.9492279Z assert_size_stride(buf3, (64, ), (1, )) 2023-01-11T21:38:05.9492349Z del buf2 2023-01-11T21:38:05.9492546Z buf4 = empty_strided((1, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9492684Z kernel_cpp_1(c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.9492800Z return (as_strided(buf3, (1, 64), (64, 1)), buf0, buf4, ) 2023-01-11T21:38:05.9492805Z 2023-01-11T21:38:05.9492809Z 2023-01-11T21:38:05.9492889Z if __name__ == "__main__": 2023-01-11T21:38:05.9493007Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9493128Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9493354Z primals_1 = rand_strided((1, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9493471Z print_performance(lambda: call([primals_1])) 2023-01-11T21:38:05.9493476Z 2023-01-11T21:38:05.9493547Z ok 
(2.182s) 2023-01-11T21:38:05.9493881Z test_input_mutation3_cpu (__main__.CpuTests) ... [2023-01-11 21:27:50,496] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 143 2023-01-11T21:38:05.9494147Z [2023-01-11 21:27:52,721] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 143 2023-01-11T21:38:05.9494153Z 2023-01-11T21:38:05.9494249Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9494325Z import torch 2023-01-11T21:38:05.9494400Z import random 2023-01-11T21:38:05.9494635Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9494761Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9494767Z 2023-01-11T21:38:05.9494852Z aten = torch.ops.aten 2023-01-11T21:38:05.9494991Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9495087Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9495092Z 2023-01-11T21:38:05.9495168Z import triton 2023-01-11T21:38:05.9495261Z import triton.language as tl 2023-01-11T21:38:05.9495390Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9495523Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9495529Z 2023-01-11T21:38:05.9495540Z 2023-01-11T21:38:05.9495671Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9495875Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9495998Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9496102Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9496166Z { 2023-01-11T21:38:05.9496266Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9496332Z { 2023-01-11T21:38:05.9496409Z #pragma omp for 2023-01-11T21:38:05.9496496Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9496562Z { 2023-01-11T21:38:05.9496706Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9496886Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.9496979Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9497074Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9497187Z } 2023-01-11T21:38:05.9497302Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9497402Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9497485Z { 2023-01-11T21:38:05.9497591Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9497708Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.9497812Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9497904Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9498078Z } 2023-01-11T21:38:05.9498167Z #pragma omp for 2023-01-11T21:38:05.9498253Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9498325Z { 2023-01-11T21:38:05.9498463Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9498604Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.9498686Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9498779Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9498846Z } 2023-01-11T21:38:05.9498947Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9499033Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9499102Z { 2023-01-11T21:38:05.9499191Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9499288Z auto tmp1 = static_cast<float>(2); 2023-01-11T21:38:05.9499377Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9499464Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9499567Z } 2023-01-11T21:38:05.9499650Z #pragma omp for 2023-01-11T21:38:05.9499736Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9499801Z { 
2023-01-11T21:38:05.9499932Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9500076Z auto tmp1 = decltype(tmp0)(1)/(decltype(tmp0)(1) + tmp0.neg().exp()); 2023-01-11T21:38:05.9500173Z tmp1.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9500239Z } 2023-01-11T21:38:05.9500337Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9500425Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9500491Z { 2023-01-11T21:38:05.9500575Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9500716Z auto tmp1 = std::exp(-tmp0); 2023-01-11T21:38:05.9500804Z auto tmp2 = 1 / (1 + tmp1); 2023-01-11T21:38:05.9500891Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9500958Z } 2023-01-11T21:38:05.9501039Z #pragma omp for 2023-01-11T21:38:05.9501128Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9501188Z { 2023-01-11T21:38:05.9501325Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9501462Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3)); 2023-01-11T21:38:05.9501554Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9501648Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9501717Z } 2023-01-11T21:38:05.9501815Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9501894Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9501963Z { 2023-01-11T21:38:05.9502055Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9502159Z auto tmp1 = static_cast<float>(3); 2023-01-11T21:38:05.9502246Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9502332Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9502400Z } 2023-01-11T21:38:05.9502474Z #pragma omp for 2023-01-11T21:38:05.9502562Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9502631Z { 2023-01-11T21:38:05.9502769Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9502904Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(4)); 2023-01-11T21:38:05.9503033Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9503128Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9503188Z } 2023-01-11T21:38:05.9503286Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9503376Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9503443Z { 2023-01-11T21:38:05.9503533Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9503638Z auto tmp1 = static_cast<float>(4); 2023-01-11T21:38:05.9503730Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9503808Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9503873Z } 2023-01-11T21:38:05.9503955Z #pragma omp for 2023-01-11T21:38:05.9504043Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9504109Z { 2023-01-11T21:38:05.9504245Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9504380Z auto tmp1 = at::vec::clamp_min(tmp0, decltype(tmp0)(0)); 2023-01-11T21:38:05.9504469Z tmp1.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9504536Z } 2023-01-11T21:38:05.9504634Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9504722Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9504787Z { 2023-01-11T21:38:05.9504878Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9504971Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:05.9505049Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9505114Z } 2023-01-11T21:38:05.9505179Z } 2023-01-11T21:38:05.9505243Z } 2023-01-11T21:38:05.9505330Z ''') 2023-01-11T21:38:05.9505336Z 2023-01-11T21:38:05.9505340Z 2023-01-11T21:38:05.9505473Z async_compile.wait(globals()) 2023-01-11T21:38:05.9505549Z del async_compile 2023-01-11T21:38:05.9505555Z 2023-01-11T21:38:05.9505622Z def call(args): 2023-01-11T21:38:05.9505700Z arg0_1, = args 2023-01-11T21:38:05.9505776Z args.clear() 2023-01-11T21:38:05.9505912Z 
kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg0_1.data_ptr())) 2023-01-11T21:38:05.9506017Z return (as_strided(arg0_1, (64, ), (1, )), ) 2023-01-11T21:38:05.9506022Z 2023-01-11T21:38:05.9506026Z 2023-01-11T21:38:05.9506107Z if __name__ == "__main__": 2023-01-11T21:38:05.9506225Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9506351Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9506546Z arg0_1 = rand_strided((1, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9506659Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9506664Z 2023-01-11T21:38:05.9506736Z ok (2.289s) 2023-01-11T21:38:05.9507065Z test_input_mutation4_cpu (__main__.CpuTests) ... [2023-01-11 21:27:52,753] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 144 2023-01-11T21:38:05.9507338Z [2023-01-11 21:27:54,532] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 144 2023-01-11T21:38:05.9507343Z 2023-01-11T21:38:05.9507446Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9507522Z import torch 2023-01-11T21:38:05.9507599Z import random 2023-01-11T21:38:05.9507711Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9507835Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9507840Z 2023-01-11T21:38:05.9507922Z aten = torch.ops.aten 2023-01-11T21:38:05.9508060Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9508159Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9508164Z 2023-01-11T21:38:05.9508237Z import triton 2023-01-11T21:38:05.9508328Z import triton.language as tl 2023-01-11T21:38:05.9508446Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9508587Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9508592Z 2023-01-11T21:38:05.9508596Z 2023-01-11T21:38:05.9508735Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9508999Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9509123Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9509229Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9509295Z { 2023-01-11T21:38:05.9509394Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9509453Z { 2023-01-11T21:38:05.9509534Z #pragma omp for 2023-01-11T21:38:05.9509620Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9509685Z { 2023-01-11T21:38:05.9509823Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9509957Z auto tmp1 = at::vec::clamp_min(tmp0, decltype(tmp0)(0)); 2023-01-11T21:38:05.9510058Z tmp1.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9510117Z } 2023-01-11T21:38:05.9510217Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9510302Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9510368Z { 2023-01-11T21:38:05.9510461Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9510553Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:05.9510639Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9510699Z } 2023-01-11T21:38:05.9510763Z } 2023-01-11T21:38:05.9510827Z } 2023-01-11T21:38:05.9510910Z ''') 2023-01-11T21:38:05.9510916Z 2023-01-11T21:38:05.9510920Z 2023-01-11T21:38:05.9511012Z async_compile.wait(globals()) 2023-01-11T21:38:05.9511089Z del async_compile 2023-01-11T21:38:05.9511094Z 2023-01-11T21:38:05.9511174Z def call(args): 2023-01-11T21:38:05.9511240Z arg0_1, = args 2023-01-11T21:38:05.9511318Z 
args.clear() 2023-01-11T21:38:05.9511487Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg0_1.data_ptr())) 2023-01-11T21:38:05.9511566Z return (arg0_1, ) 2023-01-11T21:38:05.9511571Z 2023-01-11T21:38:05.9511575Z 2023-01-11T21:38:05.9511654Z if __name__ == "__main__": 2023-01-11T21:38:05.9511773Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9511903Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9512102Z arg0_1 = rand_strided((1, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9512207Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9512212Z 2023-01-11T21:38:05.9512288Z ok (1.809s) 2023-01-11T21:38:05.9512759Z test_invalid_operand_issue1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9512895Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9513154Z [2023-01-11 21:27:54,908] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 145 2023-01-11T21:38:05.9513419Z [2023-01-11 21:27:56,648] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 145 2023-01-11T21:38:05.9513425Z 2023-01-11T21:38:05.9513524Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9513597Z import torch 2023-01-11T21:38:05.9513672Z import random 2023-01-11T21:38:05.9513785Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9513909Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9513914Z 2023-01-11T21:38:05.9513995Z aten = torch.ops.aten 2023-01-11T21:38:05.9514130Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9514225Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9514233Z 2023-01-11T21:38:05.9514306Z import triton 2023-01-11T21:38:05.9514399Z import triton.language as tl 2023-01-11T21:38:05.9514524Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9514658Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9514664Z 2023-01-11T21:38:05.9514702Z 2023-01-11T21:38:05.9514833Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9515038Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9515161Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9515269Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:05.9515376Z const long* __restrict__ in_ptr2, 2023-01-11T21:38:05.9515486Z const float* __restrict__ in_ptr3, 2023-01-11T21:38:05.9515589Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9515650Z { 2023-01-11T21:38:05.9515751Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9515816Z { 2023-01-11T21:38:05.9515897Z #pragma omp for 2023-01-11T21:38:05.9515983Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9516050Z { 2023-01-11T21:38:05.9516135Z #pragma GCC ivdep 2023-01-11T21:38:05.9516222Z for(long i1=0; i1<128; i1+=1) 2023-01-11T21:38:05.9516290Z { 2023-01-11T21:38:05.9516377Z #pragma GCC ivdep 2023-01-11T21:38:05.9516472Z for(long i2=0; i2<768; i2+=1) 2023-01-11T21:38:05.9516541Z { 2023-01-11T21:38:05.9516611Z { 
2023-01-11T21:38:05.9516678Z { 2023-01-11T21:38:05.9516780Z auto tmp3 = in_ptr0[i0]; 2023-01-11T21:38:05.9516891Z auto tmp8 = in_ptr2[i1 + (128*i0)]; 2023-01-11T21:38:05.9517004Z auto tmp0 = static_cast<long>(i1); 2023-01-11T21:38:05.9517141Z auto tmp1 = static_cast<long>(0); 2023-01-11T21:38:05.9517242Z auto tmp2 = tmp0 == tmp1; 2023-01-11T21:38:05.9517352Z auto tmp4 = static_cast<long>(1); 2023-01-11T21:38:05.9517456Z auto tmp5 = tmp0 >= tmp4; 2023-01-11T21:38:05.9517541Z auto tmp6 = 0; 2023-01-11T21:38:05.9517624Z if(tmp5) 2023-01-11T21:38:05.9517698Z { 2023-01-11T21:38:05.9517877Z auto tmp7 = in_ptr1[(-1) + i1 + (127*i0)]; 2023-01-11T21:38:05.9517970Z tmp6 = tmp7; 2023-01-11T21:38:05.9518050Z } 2023-01-11T21:38:05.9518160Z auto tmp9 = tmp5 ? tmp6 : tmp8; 2023-01-11T21:38:05.9518261Z auto tmp10 = tmp2 ? tmp3 : tmp9; 2023-01-11T21:38:05.9518375Z auto tmp11 = in_ptr3[i2 + (768*tmp10)]; 2023-01-11T21:38:05.9518492Z out_ptr0[i2 + (768*i1) + (98304*i0)] = tmp11; 2023-01-11T21:38:05.9518565Z } 2023-01-11T21:38:05.9518635Z } 2023-01-11T21:38:05.9518703Z } 2023-01-11T21:38:05.9518770Z } 2023-01-11T21:38:05.9518829Z } 2023-01-11T21:38:05.9518898Z } 2023-01-11T21:38:05.9518965Z } 2023-01-11T21:38:05.9519048Z ''') 2023-01-11T21:38:05.9519053Z 2023-01-11T21:38:05.9519057Z 2023-01-11T21:38:05.9519151Z async_compile.wait(globals()) 2023-01-11T21:38:05.9519225Z del async_compile 2023-01-11T21:38:05.9519230Z 2023-01-11T21:38:05.9519304Z def call(args): 2023-01-11T21:38:05.9519401Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1 = args 2023-01-11T21:38:05.9519477Z args.clear() 2023-01-11T21:38:05.9519692Z buf0 = empty_strided((8, 128, 768), (98304, 768, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9519910Z kernel_cpp_0(c_void_p(arg3_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(arg4_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9519987Z del arg0_1 2023-01-11T21:38:05.9520059Z del arg2_1 2023-01-11T21:38:05.9520131Z del arg3_1 2023-01-11T21:38:05.9520195Z del arg4_1 2023-01-11T21:38:05.9520271Z return (buf0, ) 2023-01-11T21:38:05.9520277Z 2023-01-11T21:38:05.9520310Z 2023-01-11T21:38:05.9520392Z if __name__ == "__main__": 2023-01-11T21:38:05.9520513Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9520642Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9520850Z arg0_1 = rand_strided((50005, 768), (768, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9521050Z arg1_1 = rand_strided((8, 128), (128, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9521247Z arg2_1 = rand_strided((8, 127), (127, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9521428Z arg3_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9521620Z arg4_1 = rand_strided((8, 128), (128, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9521761Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1])) 2023-01-11T21:38:05.9521767Z 2023-01-11T21:38:05.9521837Z ok (2.607s) 2023-01-11T21:38:05.9522287Z test_isinf2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9522419Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9522678Z [2023-01-11 21:27:57,174] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 146 2023-01-11T21:38:05.9522941Z [2023-01-11 21:27:58,876] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 146 2023-01-11T21:38:05.9522975Z 2023-01-11T21:38:05.9523073Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9523148Z import torch 2023-01-11T21:38:05.9523215Z import random 2023-01-11T21:38:05.9523336Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9523461Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9523466Z 2023-01-11T21:38:05.9523549Z aten = torch.ops.aten 2023-01-11T21:38:05.9523685Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9523779Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9523785Z 2023-01-11T21:38:05.9523858Z import triton 2023-01-11T21:38:05.9523944Z import triton.language as tl 2023-01-11T21:38:05.9524068Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9524208Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9524213Z 2023-01-11T21:38:05.9524221Z 2023-01-11T21:38:05.9524363Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9524568Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9524691Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9524797Z bool* __restrict__ out_ptr0) 2023-01-11T21:38:05.9524865Z { 2023-01-11T21:38:05.9524939Z #pragma GCC ivdep 2023-01-11T21:38:05.9525024Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:05.9525089Z { 2023-01-11T21:38:05.9525155Z { 2023-01-11T21:38:05.9525224Z { 2023-01-11T21:38:05.9525316Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9525435Z auto tmp1 = static_cast<long>(i0); 2023-01-11T21:38:05.9525541Z auto tmp2 = static_cast<long>(2); 2023-01-11T21:38:05.9525651Z auto tmp3 = tmp1 < tmp2; 2023-01-11T21:38:05.9525757Z auto tmp4 = static_cast<long>(1); 2023-01-11T21:38:05.9525850Z auto tmp5 = tmp1 < tmp4; 2023-01-11T21:38:05.9525960Z auto tmp6 = static_cast<float>(1.0); 2023-01-11T21:38:05.9526088Z auto tmp7 = std::numeric_limits<float>::infinity(); 2023-01-11T21:38:05.9526191Z auto tmp8 = tmp5 ? tmp6 : tmp7; 2023-01-11T21:38:05.9526312Z auto tmp9 = static_cast<long>(3); 2023-01-11T21:38:05.9526407Z auto tmp10 = tmp1 < tmp9; 2023-01-11T21:38:05.9526515Z auto tmp11 = static_cast<float>(2.0); 2023-01-11T21:38:05.9526620Z auto tmp12 = static_cast<long>(4); 2023-01-11T21:38:05.9526717Z auto tmp13 = tmp1 < tmp12; 2023-01-11T21:38:05.9526932Z auto tmp14 = -std::numeric_limits<float>::infinity(); 2023-01-11T21:38:05.9527062Z auto tmp15 = std::numeric_limits<float>::quiet_NaN(); 2023-01-11T21:38:05.9527165Z auto tmp16 = tmp13 ? tmp14 : tmp15; 2023-01-11T21:38:05.9527259Z auto tmp17 = tmp10 ? tmp11 : tmp16; 2023-01-11T21:38:05.9527363Z auto tmp18 = tmp3 ? 
tmp8 : tmp17; 2023-01-11T21:38:05.9527458Z auto tmp19 = tmp0 == tmp18; 2023-01-11T21:38:05.9527547Z out_ptr0[i0] = tmp19; 2023-01-11T21:38:05.9527616Z } 2023-01-11T21:38:05.9527683Z } 2023-01-11T21:38:05.9527749Z } 2023-01-11T21:38:05.9527808Z } 2023-01-11T21:38:05.9527894Z ''') 2023-01-11T21:38:05.9527899Z 2023-01-11T21:38:05.9527903Z 2023-01-11T21:38:05.9527997Z async_compile.wait(globals()) 2023-01-11T21:38:05.9528073Z del async_compile 2023-01-11T21:38:05.9528077Z 2023-01-11T21:38:05.9528152Z def call(args): 2023-01-11T21:38:05.9528226Z arg0_1, = args 2023-01-11T21:38:05.9528301Z args.clear() 2023-01-11T21:38:05.9528480Z buf0 = empty_strided((5, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9528618Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9528691Z del arg0_1 2023-01-11T21:38:05.9528797Z return (buf0, ) 2023-01-11T21:38:05.9528802Z 2023-01-11T21:38:05.9528806Z 2023-01-11T21:38:05.9528886Z if __name__ == "__main__": 2023-01-11T21:38:05.9529007Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9529134Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9529329Z arg0_1 = rand_strided((5, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9529435Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9529440Z 2023-01-11T21:38:05.9529510Z ok (1.719s) 2023-01-11T21:38:05.9529960Z test_isinf_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9530089Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9530352Z [2023-01-11 21:27:58,892] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 147 2023-01-11T21:38:05.9530612Z [2023-01-11 21:28:00,667] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 147 2023-01-11T21:38:05.9531032Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9531164Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9531416Z [2023-01-11 21:28:00,683] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 148 2023-01-11T21:38:05.9531676Z [2023-01-11 21:28:02,781] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 148 2023-01-11T21:38:05.9531685Z 2023-01-11T21:38:05.9531782Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9531849Z import torch 2023-01-11T21:38:05.9531922Z import random 2023-01-11T21:38:05.9532040Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9532192Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9532197Z 2023-01-11T21:38:05.9532284Z aten = torch.ops.aten 2023-01-11T21:38:05.9532420Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9532516Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9532521Z 2023-01-11T21:38:05.9532588Z import triton 2023-01-11T21:38:05.9532682Z import triton.language as tl 2023-01-11T21:38:05.9532810Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9532951Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9532957Z 2023-01-11T21:38:05.9532965Z 2023-01-11T21:38:05.9533103Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9533316Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9533439Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9533544Z bool* __restrict__ out_ptr0, 2023-01-11T21:38:05.9533637Z bool* __restrict__ out_ptr1) 2023-01-11T21:38:05.9533702Z { 2023-01-11T21:38:05.9533782Z #pragma GCC ivdep 2023-01-11T21:38:05.9533868Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:05.9533932Z { 2023-01-11T21:38:05.9534002Z { 2023-01-11T21:38:05.9534072Z { 2023-01-11T21:38:05.9534159Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9534265Z auto tmp1 = std::isinf(tmp0); 2023-01-11T21:38:05.9534367Z auto tmp2 = std::isnan(tmp0); 2023-01-11T21:38:05.9534456Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9534704Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.9534772Z } 2023-01-11T21:38:05.9534838Z } 2023-01-11T21:38:05.9534896Z } 2023-01-11T21:38:05.9534963Z } 2023-01-11T21:38:05.9535048Z ''') 2023-01-11T21:38:05.9535053Z 2023-01-11T21:38:05.9535057Z 2023-01-11T21:38:05.9535151Z async_compile.wait(globals()) 2023-01-11T21:38:05.9535231Z del async_compile 2023-01-11T21:38:05.9535236Z 2023-01-11T21:38:05.9535310Z def call(args): 2023-01-11T21:38:05.9535382Z arg0_1, = args 2023-01-11T21:38:05.9535455Z args.clear() 2023-01-11T21:38:05.9535675Z buf0 = empty_strided((5, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9535875Z buf1 = empty_strided((5, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9536043Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9536115Z del arg0_1 2023-01-11T21:38:05.9536197Z return (buf0, buf1, ) 2023-01-11T21:38:05.9536202Z 2023-01-11T21:38:05.9536210Z 2023-01-11T21:38:05.9536289Z if __name__ == "__main__": 2023-01-11T21:38:05.9536406Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9536526Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9536718Z arg0_1 = rand_strided((5, ), (1, ), device='cpu', 
dtype=torch.float32) 2023-01-11T21:38:05.9536833Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9536839Z 2023-01-11T21:38:05.9536843Z 2023-01-11T21:38:05.9536940Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9537016Z import torch 2023-01-11T21:38:05.9537089Z import random 2023-01-11T21:38:05.9537271Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9537402Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9537407Z 2023-01-11T21:38:05.9537482Z aten = torch.ops.aten 2023-01-11T21:38:05.9537618Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9537713Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9537721Z 2023-01-11T21:38:05.9537796Z import triton 2023-01-11T21:38:05.9537889Z import triton.language as tl 2023-01-11T21:38:05.9538014Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9538154Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9538159Z 2023-01-11T21:38:05.9538211Z 2023-01-11T21:38:05.9538354Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9538553Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9538680Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:05.9538782Z bool* __restrict__ out_ptr0, 2023-01-11T21:38:05.9538881Z bool* __restrict__ out_ptr1) 2023-01-11T21:38:05.9538945Z { 2023-01-11T21:38:05.9539029Z #pragma GCC ivdep 2023-01-11T21:38:05.9539114Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:05.9539172Z { 2023-01-11T21:38:05.9539239Z { 2023-01-11T21:38:05.9539311Z { 2023-01-11T21:38:05.9539404Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9539509Z auto tmp1 = std::isinf(tmp0); 2023-01-11T21:38:05.9539610Z auto tmp2 = std::isnan(tmp0); 2023-01-11T21:38:05.9539696Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9539776Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.9539844Z } 2023-01-11T21:38:05.9539910Z } 2023-01-11T21:38:05.9539978Z } 2023-01-11T21:38:05.9540042Z } 2023-01-11T21:38:05.9540127Z ''') 2023-01-11T21:38:05.9540132Z 2023-01-11T21:38:05.9540136Z 2023-01-11T21:38:05.9540228Z async_compile.wait(globals()) 2023-01-11T21:38:05.9540298Z del async_compile 2023-01-11T21:38:05.9540303Z 2023-01-11T21:38:05.9540376Z def call(args): 2023-01-11T21:38:05.9540449Z arg0_1, = args 2023-01-11T21:38:05.9540525Z args.clear() 2023-01-11T21:38:05.9540715Z buf0 = empty_strided((5, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9540932Z buf1 = empty_strided((5, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9541101Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9541166Z del arg0_1 2023-01-11T21:38:05.9541247Z return (buf0, buf1, ) 2023-01-11T21:38:05.9541252Z 2023-01-11T21:38:05.9541260Z 2023-01-11T21:38:05.9541338Z if __name__ == "__main__": 2023-01-11T21:38:05.9541456Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9541583Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9541776Z arg0_1 = rand_strided((5, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9541889Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9541894Z 2023-01-11T21:38:05.9541964Z ok (3.905s) 2023-01-11T21:38:05.9542468Z test_kernel_names_cpu (__main__.CpuTests) ... 
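The recurring eval_frame.py warning (above, and again below) is advisory: on Ampere-or-newer GPUs, float32 matmuls can use TensorFloat32 tensor cores once the matmul precision is lowered. A minimal sketch of the opt-in the warning suggests; the precision trade-off noted in the comment is an assumption stated for illustration:

    import torch

    # 'high' lets float32 matmuls run on TF32 tensor cores (faster,
    # roughly 10 mantissa bits) instead of strict IEEE float32.
    torch.set_float32_matmul_precision('high')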
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:05.9542554Z warnings.warn( 2023-01-11T21:38:05.9542813Z [2023-01-11 21:28:02,796] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 149 2023-01-11T21:38:05.9543078Z [2023-01-11 21:28:04,773] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 149 2023-01-11T21:38:05.9543083Z 2023-01-11T21:38:05.9543180Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9543253Z import torch 2023-01-11T21:38:05.9543327Z import random 2023-01-11T21:38:05.9543446Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9543563Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9543576Z 2023-01-11T21:38:05.9543650Z aten = torch.ops.aten 2023-01-11T21:38:05.9543786Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9543881Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9543889Z 2023-01-11T21:38:05.9543962Z import triton 2023-01-11T21:38:05.9544054Z import triton.language as tl 2023-01-11T21:38:05.9544183Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9544328Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9544333Z 2023-01-11T21:38:05.9544374Z 2023-01-11T21:38:05.9544515Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9544713Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9544839Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9544945Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9545012Z { 2023-01-11T21:38:05.9545113Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9545180Z { 2023-01-11T21:38:05.9545262Z #pragma omp for 2023-01-11T21:38:05.9545343Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.9545414Z { 2023-01-11T21:38:05.9545553Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9545690Z auto tmp1 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.9545782Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9545878Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9545946Z } 2023-01-11T21:38:05.9546039Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9546126Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:05.9546191Z { 2023-01-11T21:38:05.9546280Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9546384Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.9546474Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9546559Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9546619Z } 2023-01-11T21:38:05.9546683Z } 2023-01-11T21:38:05.9546748Z } 2023-01-11T21:38:05.9546832Z ''') 2023-01-11T21:38:05.9546837Z 2023-01-11T21:38:05.9546877Z 2023-01-11T21:38:05.9546973Z async_compile.wait(globals()) 2023-01-11T21:38:05.9547051Z del async_compile 2023-01-11T21:38:05.9547056Z 2023-01-11T21:38:05.9547130Z def call(args): 2023-01-11T21:38:05.9547197Z arg0_1, = args 2023-01-11T21:38:05.9547271Z args.clear() 2023-01-11T21:38:05.9547466Z buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9547605Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9547677Z del arg0_1 2023-01-11T21:38:05.9547752Z return (buf0, ) 2023-01-11T21:38:05.9547757Z 
2023-01-11T21:38:05.9547762Z 2023-01-11T21:38:05.9547843Z if __name__ == "__main__": 2023-01-11T21:38:05.9547959Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9548079Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9548271Z arg0_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9548384Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9548393Z 2023-01-11T21:38:05.9548462Z ok (2.003s) 2023-01-11T21:38:05.9548623Z test_kwargs_cpu (__main__.CpuTests) ... skip: histogramdd only supports cpu (0.001s) 2023-01-11T21:38:05.9549077Z test_l1_loss_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9549209Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9549467Z [2023-01-11 21:28:04,820] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 150 2023-01-11T21:38:05.9549730Z [2023-01-11 21:28:06,943] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 150 2023-01-11T21:38:05.9549738Z 2023-01-11T21:38:05.9549828Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9549902Z import torch 2023-01-11T21:38:05.9549978Z import random 2023-01-11T21:38:05.9550097Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9550218Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9550252Z 2023-01-11T21:38:05.9550337Z aten = torch.ops.aten 2023-01-11T21:38:05.9550471Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9550559Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9550570Z 2023-01-11T21:38:05.9550637Z import triton 2023-01-11T21:38:05.9550729Z import triton.language as tl 2023-01-11T21:38:05.9550853Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9550990Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9550996Z 2023-01-11T21:38:05.9551001Z 2023-01-11T21:38:05.9551137Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9551346Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9551466Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.9551572Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:05.9551677Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9551787Z const float* __restrict__ in_ptr1) 2023-01-11T21:38:05.9551853Z { 2023-01-11T21:38:05.9551944Z auto out_ptr0 = in_out_ptr0; 2023-01-11T21:38:05.9552035Z auto out_ptr1 = in_out_ptr1; 2023-01-11T21:38:05.9552101Z { 2023-01-11T21:38:05.9552290Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:05.9552363Z float tmp4 = 0; 2023-01-11T21:38:05.9552487Z auto tmp4_vec = at::vec::Vectorized<float>(tmp4); 2023-01-11T21:38:05.9552566Z float tmp6 = 0; 2023-01-11T21:38:05.9552688Z auto tmp6_vec = at::vec::Vectorized<float>(tmp6); 2023-01-11T21:38:05.9552825Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9552892Z { 2023-01-11T21:38:05.9553030Z #pragma omp for reduction(+:tmp4_vec) reduction(+:tmp6_vec) 2023-01-11T21:38:05.9553115Z for(long i0=0; i0<192; i0+=1) 2023-01-11T21:38:05.9553187Z { 2023-01-11T21:38:05.9553329Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9553468Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.9553603Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9553696Z auto tmp3 = tmp2.abs(); 2023-01-11T21:38:05.9553786Z auto tmp5 = tmp2 * tmp2; 2023-01-11T21:38:05.9553873Z tmp4_vec += tmp3; 2023-01-11T21:38:05.9553949Z tmp6_vec += tmp5; 2023-01-11T21:38:05.9554017Z } 2023-01-11T21:38:05.9554218Z tmp4 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp4_vec); 2023-01-11T21:38:05.9554419Z tmp6 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:05.9554570Z #pragma omp for simd simdlen(4) reduction(+:tmp4) reduction(+:tmp6) 2023-01-11T21:38:05.9554665Z for(long i0=1536; i0<1536; i0+=1) 2023-01-11T21:38:05.9554732Z { 2023-01-11T21:38:05.9554821Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9554906Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9555037Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9555140Z auto tmp3 = std::abs(tmp2); 2023-01-11T21:38:05.9555229Z auto tmp5 = tmp2 * tmp2; 2023-01-11T21:38:05.9555309Z tmp4 += tmp3; 2023-01-11T21:38:05.9555389Z tmp6 += tmp5; 2023-01-11T21:38:05.9555454Z } 2023-01-11T21:38:05.9555516Z } 2023-01-11T21:38:05.9555594Z out_ptr0[0] = tmp4; 2023-01-11T21:38:05.9555672Z out_ptr1[0] = tmp6; 2023-01-11T21:38:05.9555738Z } 2023-01-11T21:38:05.9555802Z { 2023-01-11T21:38:05.9555868Z { 2023-01-11T21:38:05.9555952Z auto tmp0 = out_ptr0[0]; 2023-01-11T21:38:05.9556086Z auto tmp1 = static_cast<float>(1536); 2023-01-11T21:38:05.9556180Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.9556267Z in_out_ptr0[0] = tmp2; 2023-01-11T21:38:05.9556334Z } 2023-01-11T21:38:05.9556397Z } 2023-01-11T21:38:05.9556462Z { 2023-01-11T21:38:05.9556522Z { 2023-01-11T21:38:05.9556611Z auto tmp0 = out_ptr1[0]; 2023-01-11T21:38:05.9556717Z auto tmp1 = static_cast<float>(1536); 2023-01-11T21:38:05.9556806Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.9556890Z in_out_ptr1[0] = tmp2; 2023-01-11T21:38:05.9556955Z } 2023-01-11T21:38:05.9557025Z } 2023-01-11T21:38:05.9557083Z } 2023-01-11T21:38:05.9557168Z ''') 2023-01-11T21:38:05.9557173Z 2023-01-11T21:38:05.9557178Z 2023-01-11T21:38:05.9557272Z async_compile.wait(globals()) 2023-01-11T21:38:05.9557348Z del async_compile 2023-01-11T21:38:05.9557353Z 2023-01-11T21:38:05.9557427Z def call(args): 2023-01-11T21:38:05.9557511Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9557586Z args.clear() 2023-01-11T21:38:05.9557766Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9557947Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9558035Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:05.9558123Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:05.9558318Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr())) 2023-01-11T21:38:05.9558390Z del arg0_1 2023-01-11T21:38:05.9558461Z del arg1_1 2023-01-11T21:38:05.9558535Z return (buf2, buf3, ) 2023-01-11T21:38:05.9558577Z 2023-01-11T21:38:05.9558581Z 2023-01-11T21:38:05.9558656Z if __name__ == "__main__": 2023-01-11T21:38:05.9558773Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9558900Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9559121Z arg0_1 =
rand_strided((2, 3, 16, 16), (768, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9559336Z arg1_1 = rand_strided((2, 3, 16, 16), (768, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9559456Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9559462Z 2023-01-11T21:38:05.9559533Z ok (2.177s) 2023-01-11T21:38:05.9559986Z test_layer_norm_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9560121Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9560373Z [2023-01-11 21:28:07,056] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 151 2023-01-11T21:38:05.9560640Z [2023-01-11 21:28:08,972] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 151 2023-01-11T21:38:05.9560645Z 2023-01-11T21:38:05.9560746Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9560820Z import torch 2023-01-11T21:38:05.9560892Z import random 2023-01-11T21:38:05.9561012Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9561133Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9561138Z 2023-01-11T21:38:05.9561221Z aten = torch.ops.aten 2023-01-11T21:38:05.9561350Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9561448Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9561454Z 2023-01-11T21:38:05.9561526Z import triton 2023-01-11T21:38:05.9561622Z import triton.language as tl 2023-01-11T21:38:05.9561747Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9561888Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9561921Z 2023-01-11T21:38:05.9561926Z 2023-01-11T21:38:05.9562064Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9562269Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9562382Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.9562491Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:05.9562602Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9562710Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9562818Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.9562924Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9563025Z float* __restrict__ out_ptr3) 2023-01-11T21:38:05.9563086Z { 2023-01-11T21:38:05.9563177Z auto out_ptr2 = in_out_ptr0; 2023-01-11T21:38:05.9563266Z auto out_ptr1 = in_out_ptr1; 2023-01-11T21:38:05.9563369Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9563436Z { 2023-01-11T21:38:05.9563517Z #pragma omp for 2023-01-11T21:38:05.9563604Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.9563664Z { 2023-01-11T21:38:05.9563731Z { 2023-01-11T21:38:05.9563928Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:05.9564013Z float tmp1 = 0; 2023-01-11T21:38:05.9564140Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:05.9564234Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9564334Z { 2023-01-11T21:38:05.9564485Z auto tmp0 = 
at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (32*i0)); 2023-01-11T21:38:05.9564565Z tmp1_vec += tmp0; 2023-01-11T21:38:05.9564635Z } 2023-01-11T21:38:05.9564841Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:05.9564967Z #pragma omp simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:05.9565061Z for(long i1=32; i1<32; i1+=1) 2023-01-11T21:38:05.9565129Z { 2023-01-11T21:38:05.9565234Z auto tmp0 = in_ptr0[i1 + (32*i0)]; 2023-01-11T21:38:05.9565309Z tmp1 += tmp0; 2023-01-11T21:38:05.9565389Z } 2023-01-11T21:38:05.9565489Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9565566Z } 2023-01-11T21:38:05.9565646Z } 2023-01-11T21:38:05.9565731Z #pragma omp for 2023-01-11T21:38:05.9565817Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.9565877Z { 2023-01-11T21:38:05.9565943Z { 2023-01-11T21:38:05.9566135Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:05.9566223Z float tmp6 = 0; 2023-01-11T21:38:05.9566349Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:05.9566433Z float tmp7 = 0; 2023-01-11T21:38:05.9566558Z auto tmp7_vec = at::vec::Vectorized(tmp7); 2023-01-11T21:38:05.9566651Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9566714Z { 2023-01-11T21:38:05.9566862Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (32*i0)); 2023-01-11T21:38:05.9566998Z auto tmp1 = at::vec::Vectorized(out_ptr0[i0]); 2023-01-11T21:38:05.9567143Z auto tmp2 = at::vec::Vectorized(static_cast(32)); 2023-01-11T21:38:05.9567240Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:05.9567381Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:05.9567480Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:05.9567590Z tmp6_vec += tmp5; 2023-01-11T21:38:05.9567677Z tmp7_vec += tmp0; 2023-01-11T21:38:05.9567745Z } 2023-01-11T21:38:05.9567946Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:05.9568145Z tmp7 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp7_vec); 2023-01-11T21:38:05.9568291Z #pragma omp simd simdlen(4) reduction(+:tmp6) reduction(+:tmp7) 2023-01-11T21:38:05.9568387Z for(long i1=32; i1<32; i1+=1) 2023-01-11T21:38:05.9568463Z { 2023-01-11T21:38:05.9568570Z auto tmp0 = in_ptr0[i1 + (32*i0)]; 2023-01-11T21:38:05.9568661Z auto tmp1 = out_ptr0[i0]; 2023-01-11T21:38:05.9568769Z auto tmp2 = static_cast(32); 2023-01-11T21:38:05.9568867Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:05.9569005Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:05.9569100Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:05.9569183Z tmp6 += tmp5; 2023-01-11T21:38:05.9569264Z tmp7 += tmp0; 2023-01-11T21:38:05.9569325Z } 2023-01-11T21:38:05.9569413Z out_ptr1[i0] = tmp6; 2023-01-11T21:38:05.9569496Z out_ptr2[i0] = tmp7; 2023-01-11T21:38:05.9569565Z } 2023-01-11T21:38:05.9569630Z } 2023-01-11T21:38:05.9569714Z #pragma omp for 2023-01-11T21:38:05.9569794Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.9569896Z { 2023-01-11T21:38:05.9570035Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr2 + 8*i0); 2023-01-11T21:38:05.9570174Z auto tmp1 = at::vec::Vectorized(static_cast(32)); 2023-01-11T21:38:05.9570263Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.9570367Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9570433Z } 2023-01-11T21:38:05.9570535Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9570615Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:05.9570686Z { 2023-01-11T21:38:05.9570774Z auto tmp0 = 
out_ptr2[i0]; 2023-01-11T21:38:05.9570880Z auto tmp1 = static_cast(32); 2023-01-11T21:38:05.9570969Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.9571057Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9571125Z } 2023-01-11T21:38:05.9571199Z #pragma omp for 2023-01-11T21:38:05.9571289Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.9571359Z { 2023-01-11T21:38:05.9571498Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9571637Z auto tmp1 = at::vec::Vectorized(static_cast(32)); 2023-01-11T21:38:05.9571725Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.9571934Z auto tmp3 = at::vec::Vectorized(static_cast(1e-05)); 2023-01-11T21:38:05.9572017Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.9572106Z auto tmp5 = tmp4.rsqrt(); 2023-01-11T21:38:05.9572207Z tmp5.store(in_out_ptr1 + 8*i0); 2023-01-11T21:38:05.9572272Z } 2023-01-11T21:38:05.9572373Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9572459Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:05.9572525Z { 2023-01-11T21:38:05.9572607Z auto tmp0 = out_ptr1[i0]; 2023-01-11T21:38:05.9572714Z auto tmp1 = static_cast(32); 2023-01-11T21:38:05.9572806Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.9572957Z auto tmp3 = static_cast(1e-05); 2023-01-11T21:38:05.9573047Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.9573152Z auto tmp5 = 1 / std::sqrt(tmp4); 2023-01-11T21:38:05.9573239Z in_out_ptr1[i0] = tmp5; 2023-01-11T21:38:05.9573299Z } 2023-01-11T21:38:05.9573465Z #pragma omp for 2023-01-11T21:38:05.9573551Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.9573616Z { 2023-01-11T21:38:05.9573702Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9573771Z { 2023-01-11T21:38:05.9573919Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (32*i0)); 2023-01-11T21:38:05.9574047Z auto tmp1 = at::vec::Vectorized(in_out_ptr0[i0]); 2023-01-11T21:38:05.9574180Z auto tmp3 = at::vec::Vectorized(in_out_ptr1[i0]); 2023-01-11T21:38:05.9574318Z auto tmp5 = at::vec::Vectorized::loadu(in_ptr1 + 8*i1); 2023-01-11T21:38:05.9574459Z auto tmp7 = at::vec::Vectorized::loadu(in_ptr2 + 8*i1); 2023-01-11T21:38:05.9574709Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9574801Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9574891Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:05.9574985Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:05.9575112Z auto tmp9 = at::vec::clamp_min(tmp8, decltype(tmp8)(0)); 2023-01-11T21:38:05.9575223Z tmp9.store(out_ptr3 + (8*i1) + (32*i0)); 2023-01-11T21:38:05.9575291Z } 2023-01-11T21:38:05.9575392Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.9575483Z for(long i1=32; i1<32; i1+=1) 2023-01-11T21:38:05.9575551Z { 2023-01-11T21:38:05.9575654Z auto tmp0 = in_ptr0[i1 + (32*i0)]; 2023-01-11T21:38:05.9575745Z auto tmp1 = in_out_ptr0[i0]; 2023-01-11T21:38:05.9575893Z auto tmp3 = in_out_ptr1[i0]; 2023-01-11T21:38:05.9575985Z auto tmp5 = in_ptr1[i1]; 2023-01-11T21:38:05.9576076Z auto tmp7 = in_ptr2[i1]; 2023-01-11T21:38:05.9576209Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9576300Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9576390Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:05.9576473Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:05.9576569Z auto tmp9 = tmp8 * (tmp8>0); 2023-01-11T21:38:05.9576666Z out_ptr3[i1 + (32*i0)] = tmp9; 2023-01-11T21:38:05.9576734Z } 2023-01-11T21:38:05.9576802Z } 2023-01-11T21:38:05.9576868Z } 2023-01-11T21:38:05.9576932Z } 2023-01-11T21:38:05.9577009Z ''') 2023-01-11T21:38:05.9577014Z 2023-01-11T21:38:05.9577019Z 2023-01-11T21:38:05.9577113Z async_compile.wait(globals()) 2023-01-11T21:38:05.9577248Z del 
async_compile 2023-01-11T21:38:05.9577253Z 2023-01-11T21:38:05.9577333Z def call(args): 2023-01-11T21:38:05.9577441Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:05.9577517Z args.clear() 2023-01-11T21:38:05.9577713Z buf0 = empty_strided((16, 1), (1, 16), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9577910Z buf1 = empty_strided((16, 1), (1, 16), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9578099Z buf2 = empty_strided((16, 1), (1, 16), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9578215Z buf3 = as_strided(buf2, (16, 1), (1, 1)); del buf2 # reuse 2023-01-11T21:38:05.9578333Z buf4 = as_strided(buf1, (16, 1), (1, 1)); del buf1 # reuse 2023-01-11T21:38:05.9578532Z buf5 = empty_strided((16, 32), (32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9578808Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(primals_3.data_ptr()), c_void_p(primals_1.data_ptr()), c_void_p(primals_2.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf5.data_ptr())) 2023-01-11T21:38:05.9578940Z return (buf5, primals_1, primals_2, primals_3, buf3, buf4, ) 2023-01-11T21:38:05.9578946Z 2023-01-11T21:38:05.9578950Z 2023-01-11T21:38:05.9579031Z if __name__ == "__main__": 2023-01-11T21:38:05.9579149Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9579268Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9579509Z primals_1 = rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9579709Z primals_2 = rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9579914Z primals_3 = rand_strided((16, 32), (32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9580056Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:05.9580061Z 2023-01-11T21:38:05.9580130Z ok (2.060s) 2023-01-11T21:38:05.9580583Z test_leaky_relu_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9580716Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9580977Z [2023-01-11 21:28:09,067] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 152 2023-01-11T21:38:05.9581240Z [2023-01-11 21:28:11,006] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 152 2023-01-11T21:38:05.9581246Z 2023-01-11T21:38:05.9581336Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9581411Z import torch 2023-01-11T21:38:05.9581485Z import random 2023-01-11T21:38:05.9581602Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9581725Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9581730Z 2023-01-11T21:38:05.9581845Z aten = torch.ops.aten 2023-01-11T21:38:05.9581984Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9582075Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9582089Z 2023-01-11T21:38:05.9582158Z import triton 2023-01-11T21:38:05.9582253Z import triton.language as tl 2023-01-11T21:38:05.9582381Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9582527Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9582532Z 2023-01-11T21:38:05.9582536Z 2023-01-11T21:38:05.9582674Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9582881Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9583011Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9583121Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9583218Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9583289Z { 2023-01-11T21:38:05.9583394Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9583462Z { 2023-01-11T21:38:05.9583546Z #pragma omp for 2023-01-11T21:38:05.9583638Z for(long i0=0; i0<256; i0+=1) 2023-01-11T21:38:05.9583700Z { 2023-01-11T21:38:05.9583769Z { 2023-01-11T21:38:05.9583841Z { 2023-01-11T21:38:05.9583945Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9584057Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.9584156Z auto tmp2 = tmp0 > tmp1; 2023-01-11T21:38:05.9584269Z auto tmp3 = static_cast(0.2); 2023-01-11T21:38:05.9584359Z auto tmp4 = tmp0 * tmp3; 2023-01-11T21:38:05.9584464Z auto tmp5 = tmp2 ? tmp0 : tmp4; 2023-01-11T21:38:05.9584578Z auto tmp6 = static_cast(2); 2023-01-11T21:38:05.9584681Z auto tmp7 = tmp5 + tmp6; 2023-01-11T21:38:05.9584794Z auto tmp8 = static_cast(1); 2023-01-11T21:38:05.9584892Z auto tmp9 = tmp0 + tmp8; 2023-01-11T21:38:05.9584992Z auto tmp10 = tmp9 > tmp1; 2023-01-11T21:38:05.9585100Z auto tmp11 = static_cast(0.01); 2023-01-11T21:38:05.9585243Z auto tmp12 = tmp9 * tmp11; 2023-01-11T21:38:05.9585371Z auto tmp13 = tmp10 ? 
tmp9 : tmp12; 2023-01-11T21:38:05.9585480Z out_ptr0[i0] = tmp7; 2023-01-11T21:38:05.9585583Z out_ptr1[i0] = tmp13; 2023-01-11T21:38:05.9585653Z } 2023-01-11T21:38:05.9585722Z } 2023-01-11T21:38:05.9585783Z } 2023-01-11T21:38:05.9585852Z } 2023-01-11T21:38:05.9585919Z } 2023-01-11T21:38:05.9586004Z ''') 2023-01-11T21:38:05.9586010Z 2023-01-11T21:38:05.9586014Z 2023-01-11T21:38:05.9586112Z async_compile.wait(globals()) 2023-01-11T21:38:05.9586192Z del async_compile 2023-01-11T21:38:05.9586198Z 2023-01-11T21:38:05.9586279Z def call(args): 2023-01-11T21:38:05.9586355Z arg0_1, = args 2023-01-11T21:38:05.9586426Z args.clear() 2023-01-11T21:38:05.9586627Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9586827Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9587000Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9587082Z del arg0_1 2023-01-11T21:38:05.9587169Z return (buf0, buf1, ) 2023-01-11T21:38:05.9587175Z 2023-01-11T21:38:05.9587179Z 2023-01-11T21:38:05.9587262Z if __name__ == "__main__": 2023-01-11T21:38:05.9587375Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9587504Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9587705Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9587820Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9587854Z 2023-01-11T21:38:05.9587927Z ok (1.990s) 2023-01-11T21:38:05.9588381Z test_lgamma_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9588515Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9588775Z [2023-01-11 21:28:11,063] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 153 2023-01-11T21:38:05.9589040Z [2023-01-11 21:28:12,993] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 153 2023-01-11T21:38:05.9589047Z 2023-01-11T21:38:05.9589148Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9589221Z import torch 2023-01-11T21:38:05.9589298Z import random 2023-01-11T21:38:05.9589419Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9589547Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9589552Z 2023-01-11T21:38:05.9589638Z aten = torch.ops.aten 2023-01-11T21:38:05.9589781Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9589882Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9589887Z 2023-01-11T21:38:05.9589957Z import triton 2023-01-11T21:38:05.9590051Z import triton.language as tl 2023-01-11T21:38:05.9590180Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9590321Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9590327Z 2023-01-11T21:38:05.9590332Z 2023-01-11T21:38:05.9590471Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9590677Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9590806Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9590913Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9591009Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9591078Z { 2023-01-11T21:38:05.9591182Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9591279Z { 2023-01-11T21:38:05.9591364Z #pragma omp for 2023-01-11T21:38:05.9591456Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.9591525Z { 2023-01-11T21:38:05.9591664Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9591759Z auto tmp1 = tmp0.lgamma(); 2023-01-11T21:38:05.9591898Z auto tmp2 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.9591987Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.9592126Z auto tmp4 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.9592223Z auto tmp5 = tmp0 + tmp4; 2023-01-11T21:38:05.9592315Z auto tmp6 = tmp5.cos(); 2023-01-11T21:38:05.9592414Z tmp3.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9592505Z tmp6.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9592574Z } 2023-01-11T21:38:05.9592677Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9592770Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:05.9592838Z { 2023-01-11T21:38:05.9592929Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9593033Z auto tmp1 = std::lgamma(tmp0); 2023-01-11T21:38:05.9593133Z auto tmp2 = static_cast(2); 2023-01-11T21:38:05.9593224Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.9593334Z auto tmp4 = static_cast(1); 2023-01-11T21:38:05.9593424Z auto tmp5 = tmp0 + tmp4; 2023-01-11T21:38:05.9593520Z auto tmp6 = std::cos(tmp5); 2023-01-11T21:38:05.9593608Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.9593721Z out_ptr1[i0] = tmp6; 2023-01-11T21:38:05.9593782Z } 2023-01-11T21:38:05.9593848Z } 2023-01-11T21:38:05.9593911Z } 2023-01-11T21:38:05.9593997Z ''') 2023-01-11T21:38:05.9594003Z 2023-01-11T21:38:05.9594007Z 2023-01-11T21:38:05.9594098Z 
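# kernel_cpp_0 above fuses lgamma(x) + 2 and cos(x + 1) into a single pass over the
# 16x16 input: 32 vectorized iterations of width 8 cover all 256 elements, so the
# simdlen(4) tail loop (i0=256..256) never runs.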
async_compile.wait(globals()) 2023-01-11T21:38:05.9594175Z del async_compile 2023-01-11T21:38:05.9594182Z 2023-01-11T21:38:05.9594258Z def call(args): 2023-01-11T21:38:05.9594325Z arg0_1, = args 2023-01-11T21:38:05.9594397Z args.clear() 2023-01-11T21:38:05.9594598Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9594795Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9594965Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9595035Z del arg0_1 2023-01-11T21:38:05.9595112Z return (buf0, buf1, ) 2023-01-11T21:38:05.9595117Z 2023-01-11T21:38:05.9595122Z 2023-01-11T21:38:05.9595209Z if __name__ == "__main__": 2023-01-11T21:38:05.9595320Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9595446Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9595645Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9595762Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9595767Z 2023-01-11T21:38:05.9595837Z ok (1.992s) 2023-01-11T21:38:05.9596285Z test_linear1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9596416Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9596672Z [2023-01-11 21:28:13,058] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 154 2023-01-11T21:38:05.9596935Z [2023-01-11 21:28:14,964] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 154 2023-01-11T21:38:05.9596941Z 2023-01-11T21:38:05.9597031Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9597136Z import torch 2023-01-11T21:38:05.9597211Z import random 2023-01-11T21:38:05.9597332Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9597455Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9597460Z 2023-01-11T21:38:05.9597543Z aten = torch.ops.aten 2023-01-11T21:38:05.9597679Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9597768Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9597782Z 2023-01-11T21:38:05.9597848Z import triton 2023-01-11T21:38:05.9597940Z import triton.language as tl 2023-01-11T21:38:05.9598062Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9598204Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9598210Z 2023-01-11T21:38:05.9598214Z 2023-01-11T21:38:05.9598353Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9598559Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9598681Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:05.9598739Z { 2023-01-11T21:38:05.9598842Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9598908Z { 2023-01-11T21:38:05.9598989Z #pragma omp for 2023-01-11T21:38:05.9599075Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.9599142Z { 2023-01-11T21:38:05.9599288Z auto tmp0 = at::vec::Vectorized::loadu(in_out_ptr0 + 8*i0); 
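// vectorized sigmoid epilogue applied in place to the addmm output: tmp1 = 1 / (1 + exp(-tmp0))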
2023-01-11T21:38:05.9599422Z auto tmp1 = decltype(tmp0)(1)/(decltype(tmp0)(1) + tmp0.neg().exp()); 2023-01-11T21:38:05.9599522Z tmp1.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9599621Z } 2023-01-11T21:38:05.9599719Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9599808Z for(long i0=32; i0<32; i0+=1) 2023-01-11T21:38:05.9599874Z { 2023-01-11T21:38:05.9599971Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.9600103Z auto tmp1 = std::exp(-tmp0); 2023-01-11T21:38:05.9600195Z auto tmp2 = 1 / (1 + tmp1); 2023-01-11T21:38:05.9600284Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9600352Z } 2023-01-11T21:38:05.9600416Z } 2023-01-11T21:38:05.9600478Z } 2023-01-11T21:38:05.9600561Z ''') 2023-01-11T21:38:05.9600567Z 2023-01-11T21:38:05.9600571Z 2023-01-11T21:38:05.9600666Z async_compile.wait(globals()) 2023-01-11T21:38:05.9600737Z del async_compile 2023-01-11T21:38:05.9600742Z 2023-01-11T21:38:05.9600817Z def call(args): 2023-01-11T21:38:05.9600924Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:05.9601001Z args.clear() 2023-01-11T21:38:05.9601201Z buf0 = empty_strided((2, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9601371Z aten.addmm.out(primals_2, primals_3, as_strided(primals_1, (8, 16), (1, 8)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:05.9601447Z del primals_1 2023-01-11T21:38:05.9601516Z del primals_2 2023-01-11T21:38:05.9601608Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:05.9601715Z kernel_cpp_0(c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9601810Z return (buf1, primals_3, buf1, ) 2023-01-11T21:38:05.9601815Z 2023-01-11T21:38:05.9601820Z 2023-01-11T21:38:05.9601901Z if __name__ == "__main__": 2023-01-11T21:38:05.9602016Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9602140Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9602336Z primals_1 = rand_strided((16, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9602533Z primals_2 = rand_strided((16, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9602734Z primals_3 = rand_strided((2, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9602878Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:05.9602883Z 2023-01-11T21:38:05.9602953Z ok (1.958s) 2023-01-11T21:38:05.9603433Z test_linear2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9603566Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9603825Z [2023-01-11 21:28:15,265] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 155 2023-01-11T21:38:05.9604087Z [2023-01-11 21:28:17,260] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 155 2023-01-11T21:38:05.9604097Z 2023-01-11T21:38:05.9604194Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9604261Z import torch 2023-01-11T21:38:05.9604336Z import random 2023-01-11T21:38:05.9604457Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9604584Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9604589Z 2023-01-11T21:38:05.9604672Z aten = torch.ops.aten 2023-01-11T21:38:05.9604809Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9604907Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9604913Z 2023-01-11T21:38:05.9604987Z import triton 2023-01-11T21:38:05.9605072Z import triton.language as tl 2023-01-11T21:38:05.9605197Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9605336Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9605341Z 2023-01-11T21:38:05.9605345Z 2023-01-11T21:38:05.9605510Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9605718Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9605838Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:05.9605903Z { 2023-01-11T21:38:05.9605997Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9606067Z { 2023-01-11T21:38:05.9606149Z #pragma omp for 2023-01-11T21:38:05.9606237Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.9606305Z { 2023-01-11T21:38:05.9606450Z auto tmp0 = at::vec::Vectorized::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9606583Z auto tmp1 = at::vec::clamp_min(tmp0, decltype(tmp0)(0)); 2023-01-11T21:38:05.9606684Z tmp1.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9606744Z } 2023-01-11T21:38:05.9606842Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9606930Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:05.9607003Z { 2023-01-11T21:38:05.9607096Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.9607187Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:05.9607268Z in_out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9607337Z } 2023-01-11T21:38:05.9607401Z } 2023-01-11T21:38:05.9607465Z } 2023-01-11T21:38:05.9607550Z ''') 2023-01-11T21:38:05.9607559Z 2023-01-11T21:38:05.9607564Z 2023-01-11T21:38:05.9607701Z kernel_cpp_1 = async_compile.cpp(''' 2023-01-11T21:38:05.9607900Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9608018Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:05.9608076Z { 2023-01-11T21:38:05.9608176Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9608242Z { 2023-01-11T21:38:05.9608320Z #pragma omp for 2023-01-11T21:38:05.9608406Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.9608473Z { 2023-01-11T21:38:05.9608615Z auto tmp0 = at::vec::Vectorized::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9608745Z auto tmp1 = at::vec::clamp_min(tmp0, decltype(tmp0)(0)); 2023-01-11T21:38:05.9608844Z tmp1.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9608910Z } 2023-01-11T21:38:05.9609007Z 
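// (same clamp_min ReLU epilogue as kernel_cpp_0: each of the four Linear+ReLU layers in
// test_linear2 gets its own copy of this kernel, and kernel_cpp_3 below additionally
// records the <= 0 mask used by the backward pass)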
#pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9609122Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:05.9609190Z { 2023-01-11T21:38:05.9609281Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.9609367Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:05.9609454Z in_out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9609522Z } 2023-01-11T21:38:05.9609589Z } 2023-01-11T21:38:05.9609652Z } 2023-01-11T21:38:05.9609735Z ''') 2023-01-11T21:38:05.9609740Z 2023-01-11T21:38:05.9609745Z 2023-01-11T21:38:05.9609876Z kernel_cpp_2 = async_compile.cpp(''' 2023-01-11T21:38:05.9610066Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9610188Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:05.9610256Z { 2023-01-11T21:38:05.9610354Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9610421Z { 2023-01-11T21:38:05.9610503Z #pragma omp for 2023-01-11T21:38:05.9610590Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.9610652Z { 2023-01-11T21:38:05.9610790Z auto tmp0 = at::vec::Vectorized::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9610921Z auto tmp1 = at::vec::clamp_min(tmp0, decltype(tmp0)(0)); 2023-01-11T21:38:05.9611020Z tmp1.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9611086Z } 2023-01-11T21:38:05.9611183Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9611272Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:05.9611331Z { 2023-01-11T21:38:05.9611423Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.9611516Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:05.9611631Z in_out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9611698Z } 2023-01-11T21:38:05.9611764Z } 2023-01-11T21:38:05.9611829Z } 2023-01-11T21:38:05.9611907Z ''') 2023-01-11T21:38:05.9611913Z 2023-01-11T21:38:05.9611917Z 2023-01-11T21:38:05.9612053Z kernel_cpp_3 = async_compile.cpp(''' 2023-01-11T21:38:05.9612252Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9612371Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.9612472Z bool* __restrict__ out_ptr0) 2023-01-11T21:38:05.9612537Z { 2023-01-11T21:38:05.9612642Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9612700Z { 2023-01-11T21:38:05.9612780Z #pragma omp for 2023-01-11T21:38:05.9612866Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.9612932Z { 2023-01-11T21:38:05.9612999Z { 2023-01-11T21:38:05.9613068Z { 2023-01-11T21:38:05.9613174Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.9613269Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:05.9613382Z auto tmp2 = static_cast(0); 2023-01-11T21:38:05.9613480Z auto tmp3 = tmp1 <= tmp2; 2023-01-11T21:38:05.9613577Z in_out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9613666Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.9613736Z } 2023-01-11T21:38:05.9613801Z } 2023-01-11T21:38:05.9613860Z } 2023-01-11T21:38:05.9613927Z } 2023-01-11T21:38:05.9613992Z } 2023-01-11T21:38:05.9614076Z ''') 2023-01-11T21:38:05.9614081Z 2023-01-11T21:38:05.9614085Z 2023-01-11T21:38:05.9614179Z async_compile.wait(globals()) 2023-01-11T21:38:05.9614256Z del async_compile 2023-01-11T21:38:05.9614261Z 2023-01-11T21:38:05.9614336Z def call(args): 2023-01-11T21:38:05.9614630Z primals_1, primals_2, primals_3, primals_4, primals_5, primals_6, primals_7, primals_8, primals_9 = args 2023-01-11T21:38:05.9614713Z args.clear() 2023-01-11T21:38:05.9614908Z buf0 = empty_strided((2, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9615082Z aten.addmm.out(primals_2, primals_9, 
as_strided(primals_1, (8, 8), (1, 8)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:05.9615223Z del primals_1 2023-01-11T21:38:05.9615314Z del primals_2 2023-01-11T21:38:05.9615415Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:05.9615515Z kernel_cpp_0(c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9615707Z buf2 = empty_strided((2, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9615871Z aten.addmm.out(primals_4, buf1, as_strided(primals_3, (8, 8), (1, 8)), beta=1, alpha=1, out=buf2) 2023-01-11T21:38:05.9615948Z del primals_4 2023-01-11T21:38:05.9616038Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:05.9616143Z kernel_cpp_1(c_void_p(buf3.data_ptr())) 2023-01-11T21:38:05.9616335Z buf4 = empty_strided((2, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9616499Z aten.addmm.out(primals_6, buf3, as_strided(primals_5, (8, 8), (1, 8)), beta=1, alpha=1, out=buf4) 2023-01-11T21:38:05.9616568Z del primals_6 2023-01-11T21:38:05.9616656Z buf5 = buf4; del buf4 # reuse 2023-01-11T21:38:05.9616762Z kernel_cpp_2(c_void_p(buf5.data_ptr())) 2023-01-11T21:38:05.9638871Z buf6 = empty_strided((2, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9639039Z aten.addmm.out(primals_8, buf5, as_strided(primals_7, (8, 8), (1, 8)), beta=1, alpha=1, out=buf6) 2023-01-11T21:38:05.9639118Z del primals_8 2023-01-11T21:38:05.9639207Z buf7 = buf6; del buf6 # reuse 2023-01-11T21:38:05.9639396Z buf8 = empty_strided((2, 8), (8, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9639527Z kernel_cpp_3(c_void_p(buf7.data_ptr()), c_void_p(buf8.data_ptr())) 2023-01-11T21:38:05.9639727Z return (buf7, primals_9, buf1, buf3, buf5, buf8, as_strided(primals_7, (8, 8), (8, 1)), as_strided(primals_5, (8, 8), (8, 1)), as_strided(primals_3, (8, 8), (8, 1)), ) 2023-01-11T21:38:05.9639902Z 2023-01-11T21:38:05.9639907Z 2023-01-11T21:38:05.9639993Z if __name__ == "__main__": 2023-01-11T21:38:05.9640110Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9640244Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9640446Z primals_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9640642Z primals_2 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9640837Z primals_3 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9641024Z primals_4 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9641219Z primals_5 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9641410Z primals_6 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9641610Z primals_7 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9641804Z primals_8 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9641999Z primals_9 = rand_strided((2, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9642212Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4, primals_5, primals_6, primals_7, primals_8, primals_9])) 2023-01-11T21:38:05.9642218Z 2023-01-11T21:38:05.9642291Z ok (2.372s) 2023-01-11T21:38:05.9642416Z test_linear_binary_cpu (__main__.CpuTests) ... ok (0.020s) 2023-01-11T21:38:05.9642869Z test_linear_packed_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9643004Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9643260Z [2023-01-11 21:28:17,407] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 156 2023-01-11T21:38:05.9643559Z [2023-01-11 21:28:17,431] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 156 2023-01-11T21:38:05.9643975Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9644106Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9644360Z [2023-01-11 21:28:17,465] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 157 2023-01-11T21:38:05.9644627Z [2023-01-11 21:28:17,481] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 157 2023-01-11T21:38:05.9645093Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9645225Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9645478Z [2023-01-11 21:28:17,523] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 158 2023-01-11T21:38:05.9645739Z [2023-01-11 21:28:17,546] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 158 2023-01-11T21:38:05.9646145Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9646307Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9646564Z [2023-01-11 21:28:17,577] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 159 2023-01-11T21:38:05.9646828Z [2023-01-11 21:28:17,592] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 159 2023-01-11T21:38:05.9646834Z 2023-01-11T21:38:05.9646938Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9647020Z import torch 2023-01-11T21:38:05.9647098Z import random 2023-01-11T21:38:05.9647220Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9647349Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9647356Z 2023-01-11T21:38:05.9647434Z aten = torch.ops.aten 2023-01-11T21:38:05.9647575Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9647673Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9647679Z 2023-01-11T21:38:05.9647760Z import triton 2023-01-11T21:38:05.9647857Z import triton.language as tl 2023-01-11T21:38:05.9647986Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9648128Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9648133Z 2023-01-11T21:38:05.9648138Z 2023-01-11T21:38:05.9648232Z async_compile.wait(globals()) 2023-01-11T21:38:05.9648305Z del async_compile 2023-01-11T21:38:05.9648310Z 2023-01-11T21:38:05.9648388Z def call(args): 2023-01-11T21:38:05.9648487Z arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:05.9648565Z args.clear() 2023-01-11T21:38:05.9648700Z buf0 = torch.ops.mkl._mkl_linear(arg3_1, arg2_1, arg0_1, arg1_1, 6) 2023-01-11T21:38:05.9648777Z del arg0_1 2023-01-11T21:38:05.9648852Z del arg1_1 2023-01-11T21:38:05.9648919Z del arg2_1 2023-01-11T21:38:05.9648993Z del arg3_1 2023-01-11T21:38:05.9649072Z return (buf0, ) 2023-01-11T21:38:05.9649077Z 2023-01-11T21:38:05.9649081Z 2023-01-11T21:38:05.9649166Z if __name__ == "__main__": 2023-01-11T21:38:05.9649318Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9649449Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9649653Z arg0_1 = rand_strided((30, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9649850Z arg1_1 = rand_strided((30, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9650047Z arg2_1 = rand_strided((1459233, 1), (1, 0), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9650255Z arg3_1 = rand_strided((2, 3, 10), (30, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9650390Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:05.9650400Z 2023-01-11T21:38:05.9650404Z 2023-01-11T21:38:05.9650507Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9650584Z import torch 2023-01-11T21:38:05.9650661Z import random 2023-01-11T21:38:05.9650782Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9650903Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9650916Z 2023-01-11T21:38:05.9650994Z aten = torch.ops.aten 2023-01-11T21:38:05.9651133Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9651232Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9651237Z 2023-01-11T21:38:05.9651314Z import triton 2023-01-11T21:38:05.9651410Z import triton.language as tl 
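# These test_linear_packed graphs lower to torch.ops.mkl._mkl_linear with an MKL-prepacked
# weight: the (1459233, 1)-shaped, (1, 0)-strided argument is the opaque packed-weight blob,
# and the trailing integer (6 for the (2, 3, 10) input flattened to 6x10, 2 for the 2D case)
# appears to be the batch size the weight was prepacked for -- an inference from the shapes
# in these dumps, not something the log states.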
2023-01-11T21:38:05.9651539Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9651679Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9651684Z 2023-01-11T21:38:05.9651689Z 2023-01-11T21:38:05.9651818Z async_compile.wait(globals()) 2023-01-11T21:38:05.9651888Z del async_compile 2023-01-11T21:38:05.9651892Z 2023-01-11T21:38:05.9651967Z def call(args): 2023-01-11T21:38:05.9652058Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9652135Z args.clear() 2023-01-11T21:38:05.9652266Z buf0 = torch.ops.mkl._mkl_linear(arg2_1, arg1_1, arg0_1, None, 6) 2023-01-11T21:38:05.9652341Z del arg0_1 2023-01-11T21:38:05.9652416Z del arg1_1 2023-01-11T21:38:05.9652480Z del arg2_1 2023-01-11T21:38:05.9652555Z return (buf0, ) 2023-01-11T21:38:05.9652560Z 2023-01-11T21:38:05.9652565Z 2023-01-11T21:38:05.9652647Z if __name__ == "__main__": 2023-01-11T21:38:05.9652764Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9652892Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9653092Z arg0_1 = rand_strided((30, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9653294Z arg1_1 = rand_strided((1459233, 1), (1, 0), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9653491Z arg2_1 = rand_strided((2, 3, 10), (30, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9653619Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9653624Z 2023-01-11T21:38:05.9653628Z 2023-01-11T21:38:05.9653727Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9653806Z import torch 2023-01-11T21:38:05.9653879Z import random 2023-01-11T21:38:05.9653996Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9654118Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9654123Z 2023-01-11T21:38:05.9654205Z aten = torch.ops.aten 2023-01-11T21:38:05.9654332Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9654427Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9654432Z 2023-01-11T21:38:05.9654657Z import triton 2023-01-11T21:38:05.9654755Z import triton.language as tl 2023-01-11T21:38:05.9654895Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9655056Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9655062Z 2023-01-11T21:38:05.9655068Z 2023-01-11T21:38:05.9655170Z async_compile.wait(globals()) 2023-01-11T21:38:05.9655249Z del async_compile 2023-01-11T21:38:05.9655254Z 2023-01-11T21:38:05.9655321Z def call(args): 2023-01-11T21:38:05.9655460Z arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:05.9655539Z args.clear() 2023-01-11T21:38:05.9655670Z buf0 = torch.ops.mkl._mkl_linear(arg3_1, arg2_1, arg0_1, arg1_1, 2) 2023-01-11T21:38:05.9655744Z del arg0_1 2023-01-11T21:38:05.9655818Z del arg1_1 2023-01-11T21:38:05.9655889Z del arg2_1 2023-01-11T21:38:05.9655952Z del arg3_1 2023-01-11T21:38:05.9656028Z return (buf0, ) 2023-01-11T21:38:05.9656033Z 2023-01-11T21:38:05.9656038Z 2023-01-11T21:38:05.9656118Z if __name__ == "__main__": 2023-01-11T21:38:05.9656236Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9656367Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9656569Z arg0_1 = rand_strided((30, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9656760Z arg1_1 = rand_strided((30, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9656962Z arg2_1 = rand_strided((1459233, 
1), (1, 0), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9657212Z arg3_1 = rand_strided((2, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9657346Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:05.9657352Z 2023-01-11T21:38:05.9657356Z 2023-01-11T21:38:05.9657453Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9657528Z import torch 2023-01-11T21:38:05.9657603Z import random 2023-01-11T21:38:05.9657722Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9657844Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9657849Z 2023-01-11T21:38:05.9657931Z aten = torch.ops.aten 2023-01-11T21:38:05.9658100Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9658196Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9658201Z 2023-01-11T21:38:05.9658276Z import triton 2023-01-11T21:38:05.9658374Z import triton.language as tl 2023-01-11T21:38:05.9658502Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9658641Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9658647Z 2023-01-11T21:38:05.9658651Z 2023-01-11T21:38:05.9658745Z async_compile.wait(globals()) 2023-01-11T21:38:05.9658821Z del async_compile 2023-01-11T21:38:05.9658826Z 2023-01-11T21:38:05.9658894Z def call(args): 2023-01-11T21:38:05.9658983Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9659057Z args.clear() 2023-01-11T21:38:05.9659185Z buf0 = torch.ops.mkl._mkl_linear(arg2_1, arg1_1, arg0_1, None, 2) 2023-01-11T21:38:05.9659258Z del arg0_1 2023-01-11T21:38:05.9659330Z del arg1_1 2023-01-11T21:38:05.9659405Z del arg2_1 2023-01-11T21:38:05.9659473Z return (buf0, ) 2023-01-11T21:38:05.9659478Z 2023-01-11T21:38:05.9659482Z 2023-01-11T21:38:05.9659562Z if __name__ == "__main__": 2023-01-11T21:38:05.9659678Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9659804Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9660013Z arg0_1 = rand_strided((30, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9660211Z arg1_1 = rand_strided((1459233, 1), (1, 0), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9660408Z arg2_1 = rand_strided((2, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9660535Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9660540Z 2023-01-11T21:38:05.9660604Z ok (0.237s) 2023-01-11T21:38:05.9660727Z test_linear_unary_cpu (__main__.CpuTests) ... ok (0.001s) 2023-01-11T21:38:05.9661182Z test_linspace1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9661348Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9661610Z [2023-01-11 21:28:17,637] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 160 2023-01-11T21:38:05.9661875Z [2023-01-11 21:28:19,454] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 160 2023-01-11T21:38:05.9661880Z 2023-01-11T21:38:05.9661979Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9662059Z import torch 2023-01-11T21:38:05.9662137Z import random 2023-01-11T21:38:05.9662250Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9662378Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9662383Z 2023-01-11T21:38:05.9662464Z aten = torch.ops.aten 2023-01-11T21:38:05.9662601Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9662697Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9662702Z 2023-01-11T21:38:05.9662776Z import triton 2023-01-11T21:38:05.9662870Z import triton.language as tl 2023-01-11T21:38:05.9662995Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9663128Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9663133Z 2023-01-11T21:38:05.9663145Z 2023-01-11T21:38:05.9663274Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9663481Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9663609Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9663716Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9663821Z { 2023-01-11T21:38:05.9663903Z #pragma GCC ivdep 2023-01-11T21:38:05.9663994Z for(long i0=0; i0<7; i0+=1) 2023-01-11T21:38:05.9664053Z { 2023-01-11T21:38:05.9664121Z { 2023-01-11T21:38:05.9664190Z { 2023-01-11T21:38:05.9664285Z auto tmp4 = in_ptr0[i0]; 2023-01-11T21:38:05.9664400Z auto tmp0 = static_cast<float>(0.125); 2023-01-11T21:38:05.9664506Z auto tmp1 = static_cast<float>(i0); 2023-01-11T21:38:05.9664591Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9664682Z auto tmp3 = tmp2 + tmp0; 2023-01-11T21:38:05.9664772Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:05.9664879Z out_ptr0[i0] = tmp5; 2023-01-11T21:38:05.9664953Z } 2023-01-11T21:38:05.9665032Z } 2023-01-11T21:38:05.9665111Z } 2023-01-11T21:38:05.9665168Z } 2023-01-11T21:38:05.9665253Z ''') 2023-01-11T21:38:05.9665258Z 2023-01-11T21:38:05.9665262Z 2023-01-11T21:38:05.9665360Z async_compile.wait(globals()) 2023-01-11T21:38:05.9665437Z del async_compile 2023-01-11T21:38:05.9665442Z 2023-01-11T21:38:05.9665517Z def call(args): 2023-01-11T21:38:05.9665591Z arg0_1, = args 2023-01-11T21:38:05.9665666Z args.clear() 2023-01-11T21:38:05.9665855Z buf0 = empty_strided((1, 7), (7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9666000Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9666073Z del arg0_1 2023-01-11T21:38:05.9666149Z return (buf0, ) 2023-01-11T21:38:05.9666154Z 2023-01-11T21:38:05.9666158Z 2023-01-11T21:38:05.9666239Z if __name__ == "__main__": 2023-01-11T21:38:05.9666356Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9666482Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9666678Z arg0_1 = rand_strided((1, 7), (7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9666782Z
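# Note (annotation, hedged -- not part of the generated dump above): rand_strided, from
# torch._dynamo.testing, allocates a random tensor with exactly the size/stride the graph was
# specialized on, and print_performance below times the compiled call(); saving one of these
# dumps to a file and running it directly should reproduce the kernel standalone.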
print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9666797Z 2023-01-11T21:38:05.9666861Z ok (1.861s) 2023-01-11T21:38:05.9667346Z test_linspace2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9667479Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9667734Z [2023-01-11 21:28:19,493] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 161 2023-01-11T21:38:05.9667994Z [2023-01-11 21:28:21,203] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 161 2023-01-11T21:38:05.9668000Z 2023-01-11T21:38:05.9668099Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9668173Z import torch 2023-01-11T21:38:05.9668250Z import random 2023-01-11T21:38:05.9668362Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9668485Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9668490Z 2023-01-11T21:38:05.9668574Z aten = torch.ops.aten 2023-01-11T21:38:05.9668712Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9668809Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9668814Z 2023-01-11T21:38:05.9668890Z import triton 2023-01-11T21:38:05.9668984Z import triton.language as tl 2023-01-11T21:38:05.9669109Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9669241Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9669246Z 2023-01-11T21:38:05.9669259Z 2023-01-11T21:38:05.9669388Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9669593Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9669717Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9669852Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9669920Z { 2023-01-11T21:38:05.9669989Z { 2023-01-11T21:38:05.9670058Z { 2023-01-11T21:38:05.9670143Z auto tmp4 = in_ptr0[0]; 2023-01-11T21:38:05.9670255Z auto tmp0 = static_cast<float>(0.0); 2023-01-11T21:38:05.9670361Z auto tmp1 = static_cast<float>(0); 2023-01-11T21:38:05.9670453Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9670545Z auto tmp3 = tmp2 + tmp1; 2023-01-11T21:38:05.9670635Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:05.9670721Z out_ptr0[0] = tmp5; 2023-01-11T21:38:05.9670783Z } 2023-01-11T21:38:05.9670850Z } 2023-01-11T21:38:05.9670917Z } 2023-01-11T21:38:05.9671004Z ''') 2023-01-11T21:38:05.9671010Z 2023-01-11T21:38:05.9671014Z 2023-01-11T21:38:05.9671111Z async_compile.wait(globals()) 2023-01-11T21:38:05.9671190Z del async_compile 2023-01-11T21:38:05.9671199Z 2023-01-11T21:38:05.9671276Z def call(args): 2023-01-11T21:38:05.9671345Z arg0_1, = args 2023-01-11T21:38:05.9671421Z args.clear() 2023-01-11T21:38:05.9671619Z buf0 = empty_strided((1, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9671764Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9671839Z del arg0_1 2023-01-11T21:38:05.9671916Z return (buf0, ) 2023-01-11T21:38:05.9671922Z 2023-01-11T21:38:05.9671926Z 2023-01-11T21:38:05.9672008Z if __name__ == "__main__": 2023-01-11T21:38:05.9672121Z from
torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9672251Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9672449Z arg0_1 = rand_strided((1, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9672563Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9672568Z 2023-01-11T21:38:05.9672642Z ok (1.748s) 2023-01-11T21:38:05.9673126Z test_linspace3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9673262Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9673526Z [2023-01-11 21:28:21,231] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 162 2023-01-11T21:38:05.9673789Z [2023-01-11 21:28:21,233] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 162 2023-01-11T21:38:05.9673795Z 2023-01-11T21:38:05.9673895Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9673964Z import torch 2023-01-11T21:38:05.9674041Z import random 2023-01-11T21:38:05.9674164Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9674292Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9674297Z 2023-01-11T21:38:05.9674384Z aten = torch.ops.aten 2023-01-11T21:38:05.9674523Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9674620Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9674628Z 2023-01-11T21:38:05.9674700Z import triton 2023-01-11T21:38:05.9674795Z import triton.language as tl 2023-01-11T21:38:05.9674925Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9675070Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9675076Z 2023-01-11T21:38:05.9675080Z 2023-01-11T21:38:05.9675178Z async_compile.wait(globals()) 2023-01-11T21:38:05.9675257Z del async_compile 2023-01-11T21:38:05.9675262Z 2023-01-11T21:38:05.9675339Z def call(args): 2023-01-11T21:38:05.9675533Z buf0 = empty_strided((0, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9675605Z return (buf0, ) 2023-01-11T21:38:05.9675638Z 2023-01-11T21:38:05.9675653Z 2023-01-11T21:38:05.9675728Z if __name__ == "__main__": 2023-01-11T21:38:05.9675846Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9675974Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9676078Z print_performance(lambda: call([])) 2023-01-11T21:38:05.9676086Z 2023-01-11T21:38:05.9676160Z ok (0.029s) 2023-01-11T21:38:05.9676497Z test_list_clearing_cpu (__main__.CpuTests) ... 
[2023-01-11 21:28:21,260] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:05.9676766Z [2023-01-11 21:28:22,995] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:05.9676772Z 2023-01-11T21:38:05.9676874Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9676944Z import torch 2023-01-11T21:38:05.9677021Z import random 2023-01-11T21:38:05.9677143Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9677272Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9677277Z 2023-01-11T21:38:05.9677363Z aten = torch.ops.aten 2023-01-11T21:38:05.9677500Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9677597Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9677602Z 2023-01-11T21:38:05.9677674Z import triton 2023-01-11T21:38:05.9677768Z import triton.language as tl 2023-01-11T21:38:05.9677895Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9678036Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9678041Z 2023-01-11T21:38:05.9678046Z 2023-01-11T21:38:05.9678185Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9678392Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9678518Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9678630Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9678731Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9678801Z { 2023-01-11T21:38:05.9678906Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9678974Z { 2023-01-11T21:38:05.9679059Z #pragma omp for 2023-01-11T21:38:05.9679178Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:05.9679249Z { 2023-01-11T21:38:05.9679394Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9679532Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.9679623Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9679721Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9679791Z } 2023-01-11T21:38:05.9679894Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9679989Z for(long i0=24; i0<25; i0+=1) 2023-01-11T21:38:05.9680052Z { 2023-01-11T21:38:05.9680143Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9680236Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9680326Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9680414Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9680483Z } 2023-01-11T21:38:05.9680551Z } 2023-01-11T21:38:05.9680610Z } 2023-01-11T21:38:05.9680699Z ''') 2023-01-11T21:38:05.9680706Z 2023-01-11T21:38:05.9680711Z 2023-01-11T21:38:05.9680806Z async_compile.wait(globals()) 2023-01-11T21:38:05.9680886Z del async_compile 2023-01-11T21:38:05.9680891Z 2023-01-11T21:38:05.9680968Z def call(args): 2023-01-11T21:38:05.9681045Z x_1, y_1 = args 2023-01-11T21:38:05.9681123Z args.clear() 2023-01-11T21:38:05.9681312Z buf0 = empty_strided((5, 5), (5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9681475Z kernel_cpp_0(c_void_p(x_1.data_ptr()), c_void_p(y_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9681545Z del x_1 2023-01-11T21:38:05.9681619Z del y_1 2023-01-11T21:38:05.9681814Z buf1 = empty_strided((5, 5), (5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9681938Z aten.mm.out(buf0, buf0, out=buf1) 2023-01-11T21:38:05.9682015Z return (buf1, ) 2023-01-11T21:38:05.9682020Z 2023-01-11T21:38:05.9682025Z
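# Note (annotation): this wrapper is the pattern test_list_clearing exercises -- call()
# empties the incoming args list (args.clear()) and dels x_1/y_1 right after the fused
# add kernel, so the caller's tensors can be freed before aten.mm.out runs; only buf0
# stays live for the matmul.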
2023-01-11T21:38:05.9682104Z if __name__ == "__main__": 2023-01-11T21:38:05.9682216Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9682344Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9682534Z x_1 = rand_strided((5, 5), (5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9682724Z y_1 = rand_strided((5, 5), (5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9682839Z print_performance(lambda: call([x_1, y_1])) 2023-01-11T21:38:05.9682844Z 2023-01-11T21:38:05.9682916Z ok (1.761s) 2023-01-11T21:38:05.9683364Z test_log1p_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9683500Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9683757Z [2023-01-11 21:28:23,012] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 163 2023-01-11T21:38:05.9684014Z [2023-01-11 21:28:25,010] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 163 2023-01-11T21:38:05.9684428Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9684559Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9684824Z [2023-01-11 21:28:25,026] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 164 2023-01-11T21:38:05.9685124Z [2023-01-11 21:28:26,758] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 164 2023-01-11T21:38:05.9685567Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9685699Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9685954Z [2023-01-11 21:28:26,787] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 165 2023-01-11T21:38:05.9686217Z [2023-01-11 21:28:28,640] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 165 2023-01-11T21:38:05.9686633Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9686763Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9687016Z [2023-01-11 21:28:28,697] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 166 2023-01-11T21:38:05.9687267Z [2023-01-11 21:28:30,508] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 166 2023-01-11T21:38:05.9687679Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9687881Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9688133Z [2023-01-11 21:28:30,545] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 167 2023-01-11T21:38:05.9688139Z 2023-01-11T21:38:05.9688237Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9688314Z import torch 2023-01-11T21:38:05.9688390Z import random 2023-01-11T21:38:05.9688509Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9688633Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9688638Z 2023-01-11T21:38:05.9688712Z aten = torch.ops.aten 2023-01-11T21:38:05.9688850Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9688947Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9688956Z 2023-01-11T21:38:05.9689030Z import triton 2023-01-11T21:38:05.9689123Z import triton.language as tl 2023-01-11T21:38:05.9689250Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9689388Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9689394Z 2023-01-11T21:38:05.9689401Z 2023-01-11T21:38:05.9689539Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9689738Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9689862Z extern "C" void kernel(const half* __restrict__ in_ptr0, 2023-01-11T21:38:05.9689966Z half* __restrict__ out_ptr0, 2023-01-11T21:38:05.9690066Z half* __restrict__ out_ptr1) 2023-01-11T21:38:05.9690133Z { 2023-01-11T21:38:05.9690236Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9690302Z { 2023-01-11T21:38:05.9690376Z #pragma omp for 2023-01-11T21:38:05.9690467Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9690534Z { 2023-01-11T21:38:05.9690601Z { 2023-01-11T21:38:05.9690672Z { 2023-01-11T21:38:05.9690796Z auto tmp0 = static_cast<float>(in_ptr0[i0]); 2023-01-11T21:38:05.9690906Z auto tmp1 = std::log1p(tmp0); 2023-01-11T21:38:05.9691034Z auto tmp2 = static_cast<float>(2); 2023-01-11T21:38:05.9691134Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9691223Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9691313Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9691382Z } 2023-01-11T21:38:05.9691450Z } 2023-01-11T21:38:05.9691515Z } 2023-01-11T21:38:05.9691574Z } 2023-01-11T21:38:05.9691640Z } 2023-01-11T21:38:05.9691725Z ''') 2023-01-11T21:38:05.9691730Z 2023-01-11T21:38:05.9691735Z 2023-01-11T21:38:05.9691829Z async_compile.wait(globals()) 2023-01-11T21:38:05.9691910Z del async_compile 2023-01-11T21:38:05.9691915Z 2023-01-11T21:38:05.9691990Z def call(args): 2023-01-11T21:38:05.9692064Z arg0_1, =
args 2023-01-11T21:38:05.9692132Z args.clear() 2023-01-11T21:38:05.9692326Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9692521Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9692692Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9692765Z del arg0_1 2023-01-11T21:38:05.9692848Z return (buf0, buf1, ) 2023-01-11T21:38:05.9692853Z 2023-01-11T21:38:05.9692857Z 2023-01-11T21:38:05.9692939Z if __name__ == "__main__": 2023-01-11T21:38:05.9693056Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9693175Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9693368Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9693508Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9693513Z 2023-01-11T21:38:05.9693518Z 2023-01-11T21:38:05.9693615Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9693691Z import torch 2023-01-11T21:38:05.9693765Z import random 2023-01-11T21:38:05.9693883Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9694010Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9694015Z 2023-01-11T21:38:05.9694089Z aten = torch.ops.aten 2023-01-11T21:38:05.9694222Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9694318Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9694323Z 2023-01-11T21:38:05.9694397Z import triton 2023-01-11T21:38:05.9694602Z import triton.language as tl 2023-01-11T21:38:05.9694729Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9694868Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9694874Z 2023-01-11T21:38:05.9694882Z 2023-01-11T21:38:05.9695020Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9695217Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9695339Z extern "C" void kernel(const half* __restrict__ in_ptr0, 2023-01-11T21:38:05.9695447Z half* __restrict__ out_ptr0, 2023-01-11T21:38:05.9695546Z half* __restrict__ out_ptr1) 2023-01-11T21:38:05.9695610Z { 2023-01-11T21:38:05.9695712Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9695779Z { 2023-01-11T21:38:05.9695852Z #pragma omp for 2023-01-11T21:38:05.9695940Z for(long i0=0; i0<201; i0+=1) 2023-01-11T21:38:05.9696006Z { 2023-01-11T21:38:05.9696075Z { 2023-01-11T21:38:05.9696144Z { 2023-01-11T21:38:05.9696264Z auto tmp0 = static_cast<float>(in_ptr0[i0]); 2023-01-11T21:38:05.9696374Z auto tmp1 = std::log1p(tmp0); 2023-01-11T21:38:05.9696476Z auto tmp2 = static_cast<float>(2); 2023-01-11T21:38:05.9696571Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9696660Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9696750Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9696819Z } 2023-01-11T21:38:05.9696931Z } 2023-01-11T21:38:05.9696999Z } 2023-01-11T21:38:05.9697061Z } 2023-01-11T21:38:05.9697188Z } 2023-01-11T21:38:05.9697279Z ''') 2023-01-11T21:38:05.9697285Z 2023-01-11T21:38:05.9697289Z 2023-01-11T21:38:05.9697385Z async_compile.wait(globals()) 2023-01-11T21:38:05.9697462Z del async_compile 2023-01-11T21:38:05.9697467Z 2023-01-11T21:38:05.9697542Z def call(args): 2023-01-11T21:38:05.9697616Z arg0_1, = args 2023-01-11T21:38:05.9697685Z args.clear() 2023-01-11T21:38:05.9697879Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float16)
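# Note (annotation): both outputs are preallocated with empty_strided before the kernel
# call; for this float16 graph the C++ kernel above appears to compute in float
# (std::log1p on the upcast value) and convert back to half on store.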
2023-01-11T21:38:05.9698069Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9698236Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9698308Z del arg0_1 2023-01-11T21:38:05.9698389Z return (buf0, buf1, ) 2023-01-11T21:38:05.9698394Z 2023-01-11T21:38:05.9698398Z 2023-01-11T21:38:05.9698480Z if __name__ == "__main__": 2023-01-11T21:38:05.9698598Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9698716Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9698914Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9699027Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9699032Z 2023-01-11T21:38:05.9699036Z 2023-01-11T21:38:05.9699134Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9699209Z import torch 2023-01-11T21:38:05.9699283Z import random 2023-01-11T21:38:05.9699402Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9699562Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9699575Z 2023-01-11T21:38:05.9699650Z aten = torch.ops.aten 2023-01-11T21:38:05.9699786Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9699880Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9699886Z 2023-01-11T21:38:05.9699964Z import triton 2023-01-11T21:38:05.9700057Z import triton.language as tl 2023-01-11T21:38:05.9700182Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9700320Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9700326Z 2023-01-11T21:38:05.9700330Z 2023-01-11T21:38:05.9700469Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9700665Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9700788Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9700892Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9700998Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9701064Z { 2023-01-11T21:38:05.9701168Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9701236Z { 2023-01-11T21:38:05.9701309Z #pragma omp for 2023-01-11T21:38:05.9701397Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9701466Z { 2023-01-11T21:38:05.9701608Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9701699Z auto tmp1 = tmp0.log1p(); 2023-01-11T21:38:05.9701837Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.9701927Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9702015Z tmp1.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9702110Z tmp3.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9702178Z } 2023-01-11T21:38:05.9702280Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9702369Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9702440Z { 2023-01-11T21:38:05.9702530Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9702621Z auto tmp1 = std::log1p(tmp0); 2023-01-11T21:38:05.9702728Z auto tmp2 = static_cast<float>(2); 2023-01-11T21:38:05.9702818Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9702937Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9703022Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9703090Z } 2023-01-11T21:38:05.9703156Z } 2023-01-11T21:38:05.9703213Z } 2023-01-11T21:38:05.9703304Z ''') 2023-01-11T21:38:05.9703309Z 2023-01-11T21:38:05.9703313Z 2023-01-11T21:38:05.9703406Z async_compile.wait(globals()) 2023-01-11T21:38:05.9703484Z del
async_compile 2023-01-11T21:38:05.9703489Z 2023-01-11T21:38:05.9703566Z def call(args): 2023-01-11T21:38:05.9703641Z arg0_1, = args 2023-01-11T21:38:05.9703717Z args.clear() 2023-01-11T21:38:05.9703900Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9704093Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9704264Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9704336Z del arg0_1 2023-01-11T21:38:05.9704416Z return (buf0, buf1, ) 2023-01-11T21:38:05.9704424Z 2023-01-11T21:38:05.9704429Z 2023-01-11T21:38:05.9704509Z if __name__ == "__main__": 2023-01-11T21:38:05.9704633Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9704762Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9704948Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9705078Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9705084Z 2023-01-11T21:38:05.9705090Z 2023-01-11T21:38:05.9705197Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9705294Z import torch 2023-01-11T21:38:05.9705374Z import random 2023-01-11T21:38:05.9705533Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9705657Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9705662Z 2023-01-11T21:38:05.9705744Z aten = torch.ops.aten 2023-01-11T21:38:05.9705872Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9705971Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9705976Z 2023-01-11T21:38:05.9706053Z import triton 2023-01-11T21:38:05.9706147Z import triton.language as tl 2023-01-11T21:38:05.9706274Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9706414Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9706419Z 2023-01-11T21:38:05.9706424Z 2023-01-11T21:38:05.9706560Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9706764Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9706880Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9706989Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9707092Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9707159Z { 2023-01-11T21:38:05.9707261Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9707329Z { 2023-01-11T21:38:05.9707410Z #pragma omp for 2023-01-11T21:38:05.9707493Z for(long i0=0; i0<25; i0+=1) 2023-01-11T21:38:05.9707559Z { 2023-01-11T21:38:05.9707699Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9707790Z auto tmp1 = tmp0.log1p(); 2023-01-11T21:38:05.9707928Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.9708017Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9708111Z tmp1.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9708199Z tmp3.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9708267Z } 2023-01-11T21:38:05.9708366Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9708456Z for(long i0=200; i0<201; i0+=1) 2023-01-11T21:38:05.9708523Z { 2023-01-11T21:38:05.9708614Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9708713Z auto tmp1 = std::log1p(tmp0); 2023-01-11T21:38:05.9708810Z auto tmp2 = static_cast<float>(2); 2023-01-11T21:38:05.9708930Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9709017Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9709103Z out_ptr1[i0] = tmp3;
2023-01-11T21:38:05.9709169Z } 2023-01-11T21:38:05.9709235Z } 2023-01-11T21:38:05.9709303Z } 2023-01-11T21:38:05.9709380Z ''') 2023-01-11T21:38:05.9709386Z 2023-01-11T21:38:05.9709390Z 2023-01-11T21:38:05.9709483Z async_compile.wait(globals()) 2023-01-11T21:38:05.9709561Z del async_compile 2023-01-11T21:38:05.9709566Z 2023-01-11T21:38:05.9709641Z def call(args): 2023-01-11T21:38:05.9709716Z arg0_1, = args 2023-01-11T21:38:05.9709792Z args.clear() 2023-01-11T21:38:05.9709989Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9710173Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9710337Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9710413Z del arg0_1 2023-01-11T21:38:05.9710496Z return (buf0, buf1, ) 2023-01-11T21:38:05.9710501Z 2023-01-11T21:38:05.9710506Z 2023-01-11T21:38:05.9710588Z if __name__ == "__main__": 2023-01-11T21:38:05.9710707Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9710837Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9711032Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9711138Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9711150Z 2023-01-11T21:38:05.9711406Z [2023-01-11 21:28:32,592] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 167 2023-01-11T21:38:05.9711860Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9711991Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9712248Z [2023-01-11 21:28:32,634] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 168 2023-01-11T21:38:05.9712511Z [2023-01-11 21:28:34,381] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 168 2023-01-11T21:38:05.9712924Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9713059Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9713318Z [2023-01-11 21:28:34,396] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 169 2023-01-11T21:38:05.9713578Z [2023-01-11 21:28:36,262] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 169 2023-01-11T21:38:05.9713988Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9714119Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9714366Z [2023-01-11 21:28:36,308] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 170 2023-01-11T21:38:05.9714632Z [2023-01-11 21:28:38,222] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 170 2023-01-11T21:38:05.9715070Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9715203Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9715455Z [2023-01-11 21:28:38,240] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 171 2023-01-11T21:38:05.9715461Z 2023-01-11T21:38:05.9715559Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9715636Z import torch 2023-01-11T21:38:05.9715710Z import random 2023-01-11T21:38:05.9715829Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9715946Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9715959Z 2023-01-11T21:38:05.9716034Z aten = torch.ops.aten 2023-01-11T21:38:05.9716171Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9716267Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9716272Z 2023-01-11T21:38:05.9716347Z import triton 2023-01-11T21:38:05.9716440Z import triton.language as tl 2023-01-11T21:38:05.9716566Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9716706Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9716711Z 2023-01-11T21:38:05.9716716Z 2023-01-11T21:38:05.9716852Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9717050Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9717205Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:05.9717311Z double* __restrict__ out_ptr0, 2023-01-11T21:38:05.9717413Z double* __restrict__ out_ptr1) 2023-01-11T21:38:05.9717479Z { 2023-01-11T21:38:05.9717584Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9717651Z { 2023-01-11T21:38:05.9717725Z #pragma omp for 2023-01-11T21:38:05.9717815Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9717883Z { 2023-01-11T21:38:05.9717950Z { 2023-01-11T21:38:05.9718019Z { 2023-01-11T21:38:05.9718120Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9718221Z auto tmp1 = std::log1p(tmp0); 2023-01-11T21:38:05.9718331Z auto tmp2 = static_cast<double>(2); 2023-01-11T21:38:05.9718428Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9718517Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9718611Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9718680Z } 2023-01-11T21:38:05.9718748Z } 2023-01-11T21:38:05.9718808Z } 2023-01-11T21:38:05.9718874Z } 2023-01-11T21:38:05.9718936Z } 2023-01-11T21:38:05.9719021Z ''') 2023-01-11T21:38:05.9719026Z 2023-01-11T21:38:05.9719034Z 2023-01-11T21:38:05.9719127Z async_compile.wait(globals()) 2023-01-11T21:38:05.9719205Z del async_compile 2023-01-11T21:38:05.9719210Z 2023-01-11T21:38:05.9719286Z def call(args): 2023-01-11T21:38:05.9719363Z arg0_1, = args
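# Note (annotation): same wrapper skeleton as the float16/float32 variants above, now
# specialized for float64 -- unpack args, clear the list so the caller drops its
# references, run the kernel, del the input.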
2023-01-11T21:38:05.9719432Z args.clear() 2023-01-11T21:38:05.9719628Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9719820Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9719989Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9720061Z del arg0_1 2023-01-11T21:38:05.9720145Z return (buf0, buf1, ) 2023-01-11T21:38:05.9720150Z 2023-01-11T21:38:05.9720155Z 2023-01-11T21:38:05.9720234Z if __name__ == "__main__": 2023-01-11T21:38:05.9720345Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9720470Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9720688Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9720802Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9720807Z 2023-01-11T21:38:05.9720812Z 2023-01-11T21:38:05.9720909Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9720985Z import torch 2023-01-11T21:38:05.9721058Z import random 2023-01-11T21:38:05.9721177Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9721292Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9721297Z 2023-01-11T21:38:05.9721378Z aten = torch.ops.aten 2023-01-11T21:38:05.9721513Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9721613Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9721618Z 2023-01-11T21:38:05.9721693Z import triton 2023-01-11T21:38:05.9721785Z import triton.language as tl 2023-01-11T21:38:05.9721910Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9722054Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9722060Z 2023-01-11T21:38:05.9722064Z 2023-01-11T21:38:05.9722192Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9722397Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9722522Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:05.9722628Z double* __restrict__ out_ptr0, 2023-01-11T21:38:05.9722732Z double* __restrict__ out_ptr1) 2023-01-11T21:38:05.9722799Z { 2023-01-11T21:38:05.9722900Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9722990Z { 2023-01-11T21:38:05.9723073Z #pragma omp for 2023-01-11T21:38:05.9723160Z for(long i0=0; i0<201; i0+=1) 2023-01-11T21:38:05.9723228Z { 2023-01-11T21:38:05.9723294Z { 2023-01-11T21:38:05.9723363Z { 2023-01-11T21:38:05.9723462Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9723564Z auto tmp1 = std::log1p(tmp0); 2023-01-11T21:38:05.9723673Z auto tmp2 = static_cast<double>(2); 2023-01-11T21:38:05.9723769Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9723858Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9723947Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9724018Z } 2023-01-11T21:38:05.9724087Z } 2023-01-11T21:38:05.9724147Z } 2023-01-11T21:38:05.9724213Z } 2023-01-11T21:38:05.9724276Z } 2023-01-11T21:38:05.9724361Z ''') 2023-01-11T21:38:05.9724367Z 2023-01-11T21:38:05.9724372Z 2023-01-11T21:38:05.9724468Z async_compile.wait(globals()) 2023-01-11T21:38:05.9724545Z del async_compile 2023-01-11T21:38:05.9724550Z 2023-01-11T21:38:05.9724623Z def call(args): 2023-01-11T21:38:05.9724689Z arg0_1, = args 2023-01-11T21:38:05.9724767Z args.clear() 2023-01-11T21:38:05.9724962Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float64)
2023-01-11T21:38:05.9725153Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9725316Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9725389Z del arg0_1 2023-01-11T21:38:05.9725469Z return (buf0, buf1, ) 2023-01-11T21:38:05.9725475Z 2023-01-11T21:38:05.9725479Z 2023-01-11T21:38:05.9725558Z if __name__ == "__main__": 2023-01-11T21:38:05.9725667Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9725791Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9725986Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9726100Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9726105Z 2023-01-11T21:38:05.9726110Z 2023-01-11T21:38:05.9726207Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9726285Z import torch 2023-01-11T21:38:05.9726359Z import random 2023-01-11T21:38:05.9726506Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9726623Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9726628Z 2023-01-11T21:38:05.9726709Z aten = torch.ops.aten 2023-01-11T21:38:05.9726844Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9726941Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9726946Z 2023-01-11T21:38:05.9727020Z import triton 2023-01-11T21:38:05.9727113Z import triton.language as tl 2023-01-11T21:38:05.9727236Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9727367Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9727382Z 2023-01-11T21:38:05.9727387Z 2023-01-11T21:38:05.9727515Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9727721Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9727842Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:05.9727946Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9728049Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9728114Z { 2023-01-11T21:38:05.9728214Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9728273Z { 2023-01-11T21:38:05.9728354Z #pragma omp for 2023-01-11T21:38:05.9728441Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9728509Z { 2023-01-11T21:38:05.9728576Z { 2023-01-11T21:38:05.9728644Z { 2023-01-11T21:38:05.9728740Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9728846Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.9728989Z auto tmp2 = std::log1p(tmp1); 2023-01-11T21:38:05.9729098Z auto tmp3 = static_cast<float>(2); 2023-01-11T21:38:05.9729194Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9729288Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9729379Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9729448Z } 2023-01-11T21:38:05.9729508Z } 2023-01-11T21:38:05.9729575Z } 2023-01-11T21:38:05.9729641Z } 2023-01-11T21:38:05.9729705Z } 2023-01-11T21:38:05.9729790Z ''') 2023-01-11T21:38:05.9729796Z 2023-01-11T21:38:05.9729800Z 2023-01-11T21:38:05.9729892Z async_compile.wait(globals()) 2023-01-11T21:38:05.9729972Z del async_compile 2023-01-11T21:38:05.9729977Z 2023-01-11T21:38:05.9730044Z def call(args): 2023-01-11T21:38:05.9730117Z arg0_1, = args 2023-01-11T21:38:05.9730191Z args.clear() 2023-01-11T21:38:05.9730385Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9730576Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
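# Note (annotation): the repro here feeds an int32 input while both outputs are float32 --
# log1p type-promotes integer inputs, which is why the kernel above reads const int* and
# writes float*.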
2023-01-11T21:38:05.9730738Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9730810Z del arg0_1 2023-01-11T21:38:05.9730887Z return (buf0, buf1, ) 2023-01-11T21:38:05.9730901Z 2023-01-11T21:38:05.9730906Z 2023-01-11T21:38:05.9730978Z if __name__ == "__main__": 2023-01-11T21:38:05.9731095Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9731220Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9731411Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.9731522Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9731527Z 2023-01-11T21:38:05.9731532Z 2023-01-11T21:38:05.9731629Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9731702Z import torch 2023-01-11T21:38:05.9731772Z import random 2023-01-11T21:38:05.9731890Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9732012Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9732017Z 2023-01-11T21:38:05.9732100Z aten = torch.ops.aten 2023-01-11T21:38:05.9732260Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9732358Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9732363Z 2023-01-11T21:38:05.9732438Z import triton 2023-01-11T21:38:05.9732531Z import triton.language as tl 2023-01-11T21:38:05.9732647Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9732784Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9732790Z 2023-01-11T21:38:05.9732794Z 2023-01-11T21:38:05.9732930Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9733133Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9733255Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:05.9733359Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9733460Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9733526Z { 2023-01-11T21:38:05.9733620Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9733686Z { 2023-01-11T21:38:05.9733769Z #pragma omp for 2023-01-11T21:38:05.9733859Z for(long i0=0; i0<201; i0+=1) 2023-01-11T21:38:05.9733926Z { 2023-01-11T21:38:05.9733993Z { 2023-01-11T21:38:05.9734063Z { 2023-01-11T21:38:05.9734153Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9734264Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.9734370Z auto tmp2 = std::log1p(tmp1); 2023-01-11T21:38:05.9734590Z auto tmp3 = static_cast<float>(2); 2023-01-11T21:38:05.9734686Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9743465Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9743579Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9743652Z } 2023-01-11T21:38:05.9743723Z } 2023-01-11T21:38:05.9743792Z } 2023-01-11T21:38:05.9743851Z } 2023-01-11T21:38:05.9743920Z } 2023-01-11T21:38:05.9744026Z ''') 2023-01-11T21:38:05.9744037Z 2023-01-11T21:38:05.9744042Z 2023-01-11T21:38:05.9744140Z async_compile.wait(globals()) 2023-01-11T21:38:05.9744221Z del async_compile 2023-01-11T21:38:05.9744226Z 2023-01-11T21:38:05.9744310Z def call(args): 2023-01-11T21:38:05.9744387Z arg0_1, = args 2023-01-11T21:38:05.9744455Z args.clear() 2023-01-11T21:38:05.9744658Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9744855Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9745044Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()),
c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9745127Z del arg0_1 2023-01-11T21:38:05.9745229Z return (buf0, buf1, ) 2023-01-11T21:38:05.9745235Z 2023-01-11T21:38:05.9745241Z 2023-01-11T21:38:05.9745322Z if __name__ == "__main__": 2023-01-11T21:38:05.9745441Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9745563Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9745757Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.9745869Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9745874Z 2023-01-11T21:38:05.9746143Z [2023-01-11 21:28:40,196] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 171 2023-01-11T21:38:05.9746561Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9746697Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9746954Z [2023-01-11 21:28:40,254] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 172 2023-01-11T21:38:05.9747294Z [2023-01-11 21:28:42,144] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 172 2023-01-11T21:38:05.9747300Z 2023-01-11T21:38:05.9747401Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9747476Z import torch 2023-01-11T21:38:05.9747544Z import random 2023-01-11T21:38:05.9747663Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9747788Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9747793Z 2023-01-11T21:38:05.9747874Z aten = torch.ops.aten 2023-01-11T21:38:05.9748006Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9748107Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9748112Z 2023-01-11T21:38:05.9748185Z import triton 2023-01-11T21:38:05.9748276Z import triton.language as tl 2023-01-11T21:38:05.9748400Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9748541Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9748549Z 2023-01-11T21:38:05.9748553Z 2023-01-11T21:38:05.9748694Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9748899Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9749015Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9749121Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9749223Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9749288Z { 2023-01-11T21:38:05.9749391Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9749459Z { 2023-01-11T21:38:05.9749543Z #pragma omp for 2023-01-11T21:38:05.9749656Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9749724Z { 2023-01-11T21:38:05.9749792Z { 2023-01-11T21:38:05.9749860Z { 2023-01-11T21:38:05.9749957Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9750072Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.9750185Z auto tmp2 = std::log1p(tmp1); 2023-01-11T21:38:05.9750286Z auto tmp3 = static_cast<float>(2); 2023-01-11T21:38:05.9750383Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9750473Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9750562Z out_ptr1[i0] = tmp4;
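// Note (annotation): same kernel shape for the int64 variant -- elements appear to be
// cast to floating point before std::log1p, mirroring eager-mode integer-to-float
// promotion.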
2023-01-11T21:38:05.9750633Z } 2023-01-11T21:38:05.9750700Z } 2023-01-11T21:38:05.9750766Z } 2023-01-11T21:38:05.9750826Z } 2023-01-11T21:38:05.9750889Z } 2023-01-11T21:38:05.9750977Z ''') 2023-01-11T21:38:05.9750983Z 2023-01-11T21:38:05.9750991Z 2023-01-11T21:38:05.9751086Z async_compile.wait(globals()) 2023-01-11T21:38:05.9751164Z del async_compile 2023-01-11T21:38:05.9751169Z 2023-01-11T21:38:05.9751244Z def call(args): 2023-01-11T21:38:05.9751320Z arg0_1, = args 2023-01-11T21:38:05.9751387Z args.clear() 2023-01-11T21:38:05.9751586Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9751782Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9751950Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9752023Z del arg0_1 2023-01-11T21:38:05.9752105Z return (buf0, buf1, ) 2023-01-11T21:38:05.9752110Z 2023-01-11T21:38:05.9752114Z 2023-01-11T21:38:05.9752194Z if __name__ == "__main__": 2023-01-11T21:38:05.9752314Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9752433Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9752629Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9752744Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9752749Z 2023-01-11T21:38:05.9752754Z 2023-01-11T21:38:05.9752851Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9752923Z import torch 2023-01-11T21:38:05.9753026Z import random 2023-01-11T21:38:05.9753146Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9753261Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9753273Z 2023-01-11T21:38:05.9753348Z aten = torch.ops.aten 2023-01-11T21:38:05.9753483Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9753579Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9753584Z 2023-01-11T21:38:05.9753657Z import triton 2023-01-11T21:38:05.9753750Z import triton.language as tl 2023-01-11T21:38:05.9753874Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9754012Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9754020Z 2023-01-11T21:38:05.9754024Z 2023-01-11T21:38:05.9754162Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9754361Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9754483Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9754589Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9754692Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9754757Z { 2023-01-11T21:38:05.9754858Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9754925Z { 2023-01-11T21:38:05.9754999Z #pragma omp for 2023-01-11T21:38:05.9755088Z for(long i0=0; i0<201; i0+=1) 2023-01-11T21:38:05.9755155Z { 2023-01-11T21:38:05.9755222Z { 2023-01-11T21:38:05.9755293Z { 2023-01-11T21:38:05.9755390Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9755539Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.9755645Z auto tmp2 = std::log1p(tmp1); 2023-01-11T21:38:05.9755753Z auto tmp3 = static_cast<float>(2); 2023-01-11T21:38:05.9755849Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9755941Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9756031Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9756099Z } 2023-01-11T21:38:05.9756160Z } 2023-01-11T21:38:05.9756231Z }
2023-01-11T21:38:05.9756298Z } 2023-01-11T21:38:05.9756362Z } 2023-01-11T21:38:05.9756447Z ''') 2023-01-11T21:38:05.9756452Z 2023-01-11T21:38:05.9756457Z 2023-01-11T21:38:05.9756550Z async_compile.wait(globals()) 2023-01-11T21:38:05.9756627Z del async_compile 2023-01-11T21:38:05.9756632Z 2023-01-11T21:38:05.9756707Z def call(args): 2023-01-11T21:38:05.9756773Z arg0_1, = args 2023-01-11T21:38:05.9756849Z args.clear() 2023-01-11T21:38:05.9757047Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9757240Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9757409Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9757484Z del arg0_1 2023-01-11T21:38:05.9757566Z return (buf0, buf1, ) 2023-01-11T21:38:05.9757571Z 2023-01-11T21:38:05.9757575Z 2023-01-11T21:38:05.9757648Z if __name__ == "__main__": 2023-01-11T21:38:05.9757763Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9757889Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9758080Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9758191Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9758196Z 2023-01-11T21:38:05.9758268Z ok (19.158s) 2023-01-11T21:38:05.9758713Z test_log2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9758874Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9759135Z [2023-01-11 21:28:42,184] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 173 2023-01-11T21:38:05.9759397Z [2023-01-11 21:28:44,219] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 173 2023-01-11T21:38:05.9759403Z 2023-01-11T21:38:05.9759494Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9759569Z import torch 2023-01-11T21:38:05.9759646Z import random 2023-01-11T21:38:05.9759764Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9759889Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9759894Z 2023-01-11T21:38:05.9759976Z aten = torch.ops.aten 2023-01-11T21:38:05.9760120Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9760208Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9760214Z 2023-01-11T21:38:05.9760289Z import triton 2023-01-11T21:38:05.9760385Z import triton.language as tl 2023-01-11T21:38:05.9760512Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9760652Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9760658Z 2023-01-11T21:38:05.9760662Z 2023-01-11T21:38:05.9760797Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9761003Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9761125Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9761222Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9761351Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9761418Z { 2023-01-11T21:38:05.9761518Z 
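// Note (annotation): log2 appears to be lowered as log(x) * 1.4426950408889634 (= 1/ln 2),
// so out_ptr0 below holds log2(x) and out_ptr1 holds log2(x + 1) - 2 computed the same way.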
#pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9761585Z {
2023-01-11T21:38:05.9761666Z #pragma omp for
2023-01-11T21:38:05.9761755Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:05.9761817Z {
2023-01-11T21:38:05.9761958Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:05.9762048Z auto tmp1 = tmp0.log();
2023-01-11T21:38:05.9762198Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(1.4426950408889634));
2023-01-11T21:38:05.9762288Z auto tmp3 = tmp1 * tmp2;
2023-01-11T21:38:05.9762424Z auto tmp4 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:05.9762514Z auto tmp5 = tmp0 + tmp4;
2023-01-11T21:38:05.9762603Z auto tmp6 = tmp5.log();
2023-01-11T21:38:05.9762683Z auto tmp7 = tmp6 * tmp2;
2023-01-11T21:38:05.9762824Z auto tmp8 = at::vec::Vectorized<float>(static_cast<float>(2));
2023-01-11T21:38:05.9762953Z auto tmp9 = tmp7 - tmp8;
2023-01-11T21:38:05.9763051Z tmp3.store(out_ptr0 + 8*i0);
2023-01-11T21:38:05.9763147Z tmp9.store(out_ptr1 + 8*i0);
2023-01-11T21:38:05.9763215Z }
2023-01-11T21:38:05.9763318Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:05.9763399Z for(long i0=64; i0<64; i0+=1)
2023-01-11T21:38:05.9763466Z {
2023-01-11T21:38:05.9763554Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:05.9763650Z auto tmp1 = std::log(tmp0);
2023-01-11T21:38:05.9763767Z auto tmp2 = static_cast<float>(1.4426950408889634);
2023-01-11T21:38:05.9763856Z auto tmp3 = tmp1 * tmp2;
2023-01-11T21:38:05.9763959Z auto tmp4 = static_cast<float>(1);
2023-01-11T21:38:05.9764041Z auto tmp5 = tmp0 + tmp4;
2023-01-11T21:38:05.9764135Z auto tmp6 = std::log(tmp5);
2023-01-11T21:38:05.9764226Z auto tmp7 = tmp6 * tmp2;
2023-01-11T21:38:05.9764326Z auto tmp8 = static_cast<float>(2);
2023-01-11T21:38:05.9764452Z auto tmp9 = tmp7 - tmp8;
2023-01-11T21:38:05.9764538Z out_ptr0[i0] = tmp3;
2023-01-11T21:38:05.9764623Z out_ptr1[i0] = tmp9;
2023-01-11T21:38:05.9764716Z }
2023-01-11T21:38:05.9764786Z }
2023-01-11T21:38:05.9764851Z }
2023-01-11T21:38:05.9764934Z ''')
2023-01-11T21:38:05.9764939Z 
2023-01-11T21:38:05.9764944Z 
2023-01-11T21:38:05.9765039Z async_compile.wait(globals())
2023-01-11T21:38:05.9765115Z del async_compile
2023-01-11T21:38:05.9765120Z 
2023-01-11T21:38:05.9765195Z def call(args):
2023-01-11T21:38:05.9765262Z arg0_1, = args
2023-01-11T21:38:05.9765335Z args.clear()
2023-01-11T21:38:05.9765529Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9765721Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9765893Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:05.9765966Z del arg0_1
2023-01-11T21:38:05.9766048Z return (buf0, buf1, )
2023-01-11T21:38:05.9766054Z 
2023-01-11T21:38:05.9766058Z 
2023-01-11T21:38:05.9766138Z if __name__ == "__main__":
2023-01-11T21:38:05.9766252Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9766379Z from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9766571Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9766684Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:05.9766689Z 
2023-01-11T21:38:05.9766764Z ok (2.068s)
2023-01-11T21:38:05.9767212Z test_log_fp64_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9767380Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9767639Z [2023-01-11 21:28:44,235] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 174 2023-01-11T21:38:05.9767902Z [2023-01-11 21:28:46,004] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 174 2023-01-11T21:38:05.9767908Z 2023-01-11T21:38:05.9767998Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9768074Z import torch 2023-01-11T21:38:05.9768148Z import random 2023-01-11T21:38:05.9768267Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9768390Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9768395Z 2023-01-11T21:38:05.9768476Z aten = torch.ops.aten 2023-01-11T21:38:05.9768610Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9768709Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9768715Z 2023-01-11T21:38:05.9768781Z import triton 2023-01-11T21:38:05.9768874Z import triton.language as tl 2023-01-11T21:38:05.9768998Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9769140Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9769146Z 2023-01-11T21:38:05.9769150Z 2023-01-11T21:38:05.9769288Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9769493Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9769618Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:05.9769724Z double* __restrict__ out_ptr0, 2023-01-11T21:38:05.9769821Z double* __restrict__ out_ptr1) 2023-01-11T21:38:05.9769887Z { 2023-01-11T21:38:05.9769989Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9770057Z { 2023-01-11T21:38:05.9770138Z #pragma omp for 2023-01-11T21:38:05.9770229Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.9770296Z { 2023-01-11T21:38:05.9770356Z { 2023-01-11T21:38:05.9770425Z { 2023-01-11T21:38:05.9770523Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9770658Z auto tmp1 = std::log(tmp0); 2023-01-11T21:38:05.9770783Z auto tmp2 = static_cast(1.4426950408889634); 2023-01-11T21:38:05.9770883Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9770973Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9771055Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9771125Z } 2023-01-11T21:38:05.9771191Z } 2023-01-11T21:38:05.9771257Z } 2023-01-11T21:38:05.9771323Z } 2023-01-11T21:38:05.9771387Z } 2023-01-11T21:38:05.9771464Z ''') 2023-01-11T21:38:05.9771470Z 2023-01-11T21:38:05.9771484Z 2023-01-11T21:38:05.9771571Z async_compile.wait(globals()) 2023-01-11T21:38:05.9771648Z del async_compile 2023-01-11T21:38:05.9771653Z 2023-01-11T21:38:05.9771727Z def call(args): 2023-01-11T21:38:05.9771802Z arg0_1, = args 2023-01-11T21:38:05.9771876Z args.clear() 2023-01-11T21:38:05.9772078Z buf0 = empty_strided((1024, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9772274Z buf1 = empty_strided((1024, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9772434Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9772507Z del arg0_1 2023-01-11T21:38:05.9772590Z return (buf0, buf1, ) 2023-01-11T21:38:05.9772595Z 2023-01-11T21:38:05.9772599Z 2023-01-11T21:38:05.9772679Z if __name__ == "__main__": 2023-01-11T21:38:05.9772796Z 
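# Benchmark harness emitted with every generated module: build random inputs matching the traced shapes and strides, then time the compiled call.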
from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9772923Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9773150Z arg0_1 = rand_strided((1024, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9773261Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9773267Z 2023-01-11T21:38:05.9773331Z ok (1.784s) 2023-01-11T21:38:05.9773789Z test_log_softmax_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9773923Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9774179Z [2023-01-11 21:28:46,049] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 175 2023-01-11T21:38:05.9774185Z 2023-01-11T21:38:05.9774283Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9774358Z import torch 2023-01-11T21:38:05.9774430Z import random 2023-01-11T21:38:05.9774684Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9774814Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9774820Z 2023-01-11T21:38:05.9774913Z aten = torch.ops.aten 2023-01-11T21:38:05.9775071Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9775179Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9775185Z 2023-01-11T21:38:05.9775258Z import triton 2023-01-11T21:38:05.9775351Z import triton.language as tl 2023-01-11T21:38:05.9775474Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9775613Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9775618Z 2023-01-11T21:38:05.9775623Z 2023-01-11T21:38:05.9775761Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9775958Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9776088Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9776198Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9776301Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9776402Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.9776549Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.9776650Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.9776742Z float* __restrict__ out_ptr4, 2023-01-11T21:38:05.9776838Z float* __restrict__ out_ptr5, 2023-01-11T21:38:05.9776934Z float* __restrict__ out_ptr6, 2023-01-11T21:38:05.9777031Z float* __restrict__ out_ptr7, 2023-01-11T21:38:05.9777180Z float* __restrict__ out_ptr8) 2023-01-11T21:38:05.9777255Z { 2023-01-11T21:38:05.9777359Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9777422Z { 2023-01-11T21:38:05.9777504Z #pragma omp for 2023-01-11T21:38:05.9777590Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9777658Z { 2023-01-11T21:38:05.9777726Z { 2023-01-11T21:38:05.9778095Z #pragma omp declare reduction(max:at::vec::Vectorized:omp_out = at::vec::maximum(omp_out, omp_in)) initializer(omp_priv={{-std::numeric_limits::infinity()}}) 2023-01-11T21:38:05.9778334Z float tmp3 = -std::numeric_limits::infinity(); 2023-01-11T21:38:05.9778462Z auto tmp3_vec = at::vec::Vectorized(tmp3); 2023-01-11T21:38:05.9778550Z for(long i1=0; i1<1; i1+=1) 
2023-01-11T21:38:05.9778620Z {
2023-01-11T21:38:05.9778768Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i0) + (8*i1));
2023-01-11T21:38:05.9778917Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i0) + (8*i1));
2023-01-11T21:38:05.9779014Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.9779178Z tmp3_vec = at::vec::maximum(tmp3_vec, tmp2);
2023-01-11T21:38:05.9779249Z }
2023-01-11T21:38:05.9779468Z tmp3 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return at::vec::maximum(x, y);}, tmp3_vec);
2023-01-11T21:38:05.9779597Z #pragma omp simd simdlen(4) reduction(max:tmp3)
2023-01-11T21:38:05.9779684Z for(long i1=8; i1<8; i1+=1)
2023-01-11T21:38:05.9779753Z {
2023-01-11T21:38:05.9779857Z auto tmp0 = in_ptr0[i1 + (8*i0)];
2023-01-11T21:38:05.9779959Z auto tmp1 = in_ptr1[i1 + (8*i0)];
2023-01-11T21:38:05.9780062Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.9780166Z tmp3 = std::max(tmp3, tmp2);
2023-01-11T21:38:05.9780235Z }
2023-01-11T21:38:05.9780315Z out_ptr0[i0] = tmp3;
2023-01-11T21:38:05.9780386Z }
2023-01-11T21:38:05.9780452Z }
2023-01-11T21:38:05.9780534Z #pragma omp for
2023-01-11T21:38:05.9780619Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:05.9780686Z {
2023-01-11T21:38:05.9780746Z {
2023-01-11T21:38:05.9780944Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
2023-01-11T21:38:05.9781033Z float tmp6 = 0;
2023-01-11T21:38:05.9781158Z auto tmp6_vec = at::vec::Vectorized<float>(tmp6);
2023-01-11T21:38:05.9781515Z #pragma omp declare reduction(max:at::vec::Vectorized<float>:omp_out = at::vec::maximum(omp_out, omp_in)) initializer(omp_priv={{-std::numeric_limits<float>::infinity()}})
2023-01-11T21:38:05.9781725Z float tmp7 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9781853Z auto tmp7_vec = at::vec::Vectorized<float>(tmp7);
2023-01-11T21:38:05.9781950Z for(long i1=0; i1<1; i1+=1)
2023-01-11T21:38:05.9782021Z {
2023-01-11T21:38:05.9782159Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i0) + (8*i1));
2023-01-11T21:38:05.9782304Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i0) + (8*i1));
2023-01-11T21:38:05.9782463Z auto tmp3 = at::vec::Vectorized<float>(out_ptr0[i0]);
2023-01-11T21:38:05.9782563Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.9782702Z auto tmp4 = tmp2 - tmp3;
2023-01-11T21:38:05.9782797Z auto tmp5 = tmp4.exp();
2023-01-11T21:38:05.9782884Z tmp6_vec += tmp5;
2023-01-11T21:38:05.9783008Z tmp7_vec = at::vec::maximum(tmp7_vec, tmp1);
2023-01-11T21:38:05.9783071Z }
2023-01-11T21:38:05.9783271Z tmp6 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp6_vec);
2023-01-11T21:38:05.9783490Z tmp7 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return at::vec::maximum(x, y);}, tmp7_vec);
2023-01-11T21:38:05.9783639Z #pragma omp simd simdlen(4) reduction(+:tmp6) reduction(max:tmp7)
2023-01-11T21:38:05.9783735Z for(long i1=8; i1<8; i1+=1)
2023-01-11T21:38:05.9783807Z {
2023-01-11T21:38:05.9783914Z auto tmp0 = in_ptr0[i1 + (8*i0)];
2023-01-11T21:38:05.9784015Z auto tmp1 = in_ptr1[i1 + (8*i0)];
2023-01-11T21:38:05.9784114Z auto tmp3 = out_ptr0[i0];
2023-01-11T21:38:05.9784203Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.9784342Z auto tmp4 = tmp2 - tmp3;
2023-01-11T21:38:05.9784450Z auto tmp5 = std::exp(tmp4);
2023-01-11T21:38:05.9784534Z tmp6 += tmp5;
2023-01-11T21:38:05.9784639Z tmp7 = std::max(tmp7, tmp1);
2023-01-11T21:38:05.9784742Z }
2023-01-11T21:38:05.9784833Z out_ptr1[i0] = tmp6;
2023-01-11T21:38:05.9784925Z out_ptr2[i0] = tmp7;
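// The two row-wise passes above reduce along the contiguous last dimension, so they can use 8-wide vector reductions; the column-wise max and exp-sum passes below access in_ptr0[i0 + 8*i1] with stride 8 and fall back to plain scalar loops.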
2023-01-11T21:38:05.9785005Z } 2023-01-11T21:38:05.9785078Z } 2023-01-11T21:38:05.9785181Z #pragma omp for 2023-01-11T21:38:05.9785272Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9785339Z { 2023-01-11T21:38:05.9785400Z { 2023-01-11T21:38:05.9785468Z { 2023-01-11T21:38:05.9785684Z float tmp1 = -std::numeric_limits::infinity(); 2023-01-11T21:38:05.9785781Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:05.9785852Z { 2023-01-11T21:38:05.9785925Z { 2023-01-11T21:38:05.9786037Z auto tmp0 = in_ptr0[i0 + (8*i1)]; 2023-01-11T21:38:05.9786146Z tmp1 = std::max(tmp1, tmp0); 2023-01-11T21:38:05.9786214Z } 2023-01-11T21:38:05.9786285Z } 2023-01-11T21:38:05.9786374Z out_ptr3[i0] = tmp1; 2023-01-11T21:38:05.9786443Z } 2023-01-11T21:38:05.9786512Z } 2023-01-11T21:38:05.9786578Z } 2023-01-11T21:38:05.9786652Z #pragma omp for 2023-01-11T21:38:05.9786740Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9786808Z { 2023-01-11T21:38:05.9786875Z { 2023-01-11T21:38:05.9786944Z { 2023-01-11T21:38:05.9787030Z float tmp4 = 0; 2023-01-11T21:38:05.9787126Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:05.9787189Z { 2023-01-11T21:38:05.9787266Z { 2023-01-11T21:38:05.9787378Z auto tmp0 = in_ptr0[i0 + (8*i1)]; 2023-01-11T21:38:05.9787480Z auto tmp1 = out_ptr3[i0]; 2023-01-11T21:38:05.9787627Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9787737Z auto tmp3 = std::exp(tmp2); 2023-01-11T21:38:05.9787825Z tmp4 += tmp3; 2023-01-11T21:38:05.9787891Z } 2023-01-11T21:38:05.9787962Z } 2023-01-11T21:38:05.9788052Z out_ptr4[i0] = tmp4; 2023-01-11T21:38:05.9788194Z } 2023-01-11T21:38:05.9788263Z } 2023-01-11T21:38:05.9788330Z } 2023-01-11T21:38:05.9788411Z #pragma omp for 2023-01-11T21:38:05.9788490Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9788556Z { 2023-01-11T21:38:05.9788624Z { 2023-01-11T21:38:05.9788819Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:05.9788908Z float tmp4 = 0; 2023-01-11T21:38:05.9789034Z auto tmp4_vec = at::vec::Vectorized(tmp4); 2023-01-11T21:38:05.9789131Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:05.9789198Z { 2023-01-11T21:38:05.9789345Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr1 + (8*i0) + (8*i1)); 2023-01-11T21:38:05.9789478Z auto tmp1 = at::vec::Vectorized(out_ptr2[i0]); 2023-01-11T21:38:05.9789621Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9789718Z auto tmp3 = tmp2.exp(); 2023-01-11T21:38:05.9789807Z tmp4_vec += tmp3; 2023-01-11T21:38:05.9789876Z } 2023-01-11T21:38:05.9790076Z tmp4 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp4_vec); 2023-01-11T21:38:05.9790195Z #pragma omp simd simdlen(4) reduction(+:tmp4) 2023-01-11T21:38:05.9790290Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:05.9790358Z { 2023-01-11T21:38:05.9790463Z auto tmp0 = in_ptr1[i1 + (8*i0)]; 2023-01-11T21:38:05.9790594Z auto tmp1 = out_ptr2[i0]; 2023-01-11T21:38:05.9790731Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9790835Z auto tmp3 = std::exp(tmp2); 2023-01-11T21:38:05.9790920Z tmp4 += tmp3; 2023-01-11T21:38:05.9790982Z } 2023-01-11T21:38:05.9791072Z out_ptr5[i0] = tmp4; 2023-01-11T21:38:05.9791140Z } 2023-01-11T21:38:05.9791206Z } 2023-01-11T21:38:05.9791288Z #pragma omp for 2023-01-11T21:38:05.9791373Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9791433Z { 2023-01-11T21:38:05.9791520Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:05.9791588Z { 2023-01-11T21:38:05.9791734Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:05.9791878Z auto tmp1 = 
at::vec::Vectorized::loadu(in_ptr1 + (8*i0) + (8*i1)); 2023-01-11T21:38:05.9792015Z auto tmp3 = at::vec::Vectorized(out_ptr0[i0]); 2023-01-11T21:38:05.9792143Z auto tmp5 = at::vec::Vectorized(out_ptr1[i0]); 2023-01-11T21:38:05.9792286Z auto tmp8 = at::vec::Vectorized::loadu(out_ptr3 + 8*i1); 2023-01-11T21:38:05.9792428Z auto tmp10 = at::vec::Vectorized::loadu(out_ptr4 + 8*i1); 2023-01-11T21:38:05.9792553Z auto tmp13 = at::vec::Vectorized(out_ptr2[i0]); 2023-01-11T21:38:05.9792685Z auto tmp15 = at::vec::Vectorized(out_ptr5[i0]); 2023-01-11T21:38:05.9792778Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9792911Z auto tmp4 = tmp2 - tmp3; 2023-01-11T21:38:05.9793003Z auto tmp6 = tmp5.log(); 2023-01-11T21:38:05.9793134Z auto tmp7 = tmp4 - tmp6; 2023-01-11T21:38:05.9793265Z auto tmp9 = tmp0 - tmp8; 2023-01-11T21:38:05.9793353Z auto tmp11 = tmp10.log(); 2023-01-11T21:38:05.9793491Z auto tmp12 = tmp9 - tmp11; 2023-01-11T21:38:05.9793626Z auto tmp14 = tmp1 - tmp13; 2023-01-11T21:38:05.9793721Z auto tmp16 = tmp15.log(); 2023-01-11T21:38:05.9793858Z auto tmp17 = tmp14 - tmp16; 2023-01-11T21:38:05.9793966Z tmp7.store(out_ptr6 + (8*i0) + (8*i1)); 2023-01-11T21:38:05.9794106Z tmp12.store(out_ptr7 + (8*i0) + (8*i1)); 2023-01-11T21:38:05.9794208Z tmp17.store(out_ptr8 + (8*i0) + (8*i1)); 2023-01-11T21:38:05.9794277Z } 2023-01-11T21:38:05.9794374Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.9794463Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:05.9794532Z { 2023-01-11T21:38:05.9794636Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:05.9794740Z auto tmp1 = in_ptr1[i1 + (8*i0)]; 2023-01-11T21:38:05.9794834Z auto tmp3 = out_ptr0[i0]; 2023-01-11T21:38:05.9794944Z auto tmp5 = out_ptr1[i0]; 2023-01-11T21:38:05.9795052Z auto tmp8 = out_ptr3[i1]; 2023-01-11T21:38:05.9795160Z auto tmp10 = out_ptr4[i1]; 2023-01-11T21:38:05.9795254Z auto tmp13 = out_ptr2[i0]; 2023-01-11T21:38:05.9795346Z auto tmp15 = out_ptr5[i0]; 2023-01-11T21:38:05.9795442Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9795567Z auto tmp4 = tmp2 - tmp3; 2023-01-11T21:38:05.9795667Z auto tmp6 = std::log(tmp5); 2023-01-11T21:38:05.9795798Z auto tmp7 = tmp4 - tmp6; 2023-01-11T21:38:05.9795930Z auto tmp9 = tmp0 - tmp8; 2023-01-11T21:38:05.9796030Z auto tmp11 = std::log(tmp10); 2023-01-11T21:38:05.9796164Z auto tmp12 = tmp9 - tmp11; 2023-01-11T21:38:05.9796297Z auto tmp14 = tmp1 - tmp13; 2023-01-11T21:38:05.9796398Z auto tmp16 = std::log(tmp15); 2023-01-11T21:38:05.9796528Z auto tmp17 = tmp14 - tmp16; 2023-01-11T21:38:05.9796669Z out_ptr6[i1 + (8*i0)] = tmp7; 2023-01-11T21:38:05.9796769Z out_ptr7[i1 + (8*i0)] = tmp12; 2023-01-11T21:38:05.9796865Z out_ptr8[i1 + (8*i0)] = tmp17; 2023-01-11T21:38:05.9796934Z } 2023-01-11T21:38:05.9797002Z } 2023-01-11T21:38:05.9797070Z } 2023-01-11T21:38:05.9797130Z } 2023-01-11T21:38:05.9797214Z ''') 2023-01-11T21:38:05.9797220Z 2023-01-11T21:38:05.9797224Z 2023-01-11T21:38:05.9797319Z async_compile.wait(globals()) 2023-01-11T21:38:05.9797395Z del async_compile 2023-01-11T21:38:05.9797400Z 2023-01-11T21:38:05.9797476Z def call(args): 2023-01-11T21:38:05.9797559Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9797634Z args.clear() 2023-01-11T21:38:05.9797823Z buf0 = empty_strided((8, 1), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9798018Z buf1 = empty_strided((8, 1), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9798206Z buf6 = empty_strided((8, 1), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9798395Z buf3 = empty_strided((1, 8), (8, 1), device='cpu', 
dtype=torch.float32) 2023-01-11T21:38:05.9798580Z buf4 = empty_strided((1, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9798770Z buf7 = empty_strided((8, 1), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9798957Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9799142Z buf5 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9799319Z buf8 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9799665Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf6.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf8.data_ptr())) 2023-01-11T21:38:05.9799741Z del arg0_1 2023-01-11T21:38:05.9799813Z del arg1_1 2023-01-11T21:38:05.9799901Z return (buf2, buf5, buf8, ) 2023-01-11T21:38:05.9799907Z 2023-01-11T21:38:05.9799911Z 2023-01-11T21:38:05.9799992Z if __name__ == "__main__": 2023-01-11T21:38:05.9800115Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9800272Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9800461Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9800657Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9800778Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9801044Z [2023-01-11 21:28:47,919] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 175 2023-01-11T21:38:05.9801050Z 2023-01-11T21:38:05.9801122Z ok (1.915s) 2023-01-11T21:38:05.9801577Z test_logsumexp_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9801715Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9801977Z [2023-01-11 21:28:47,965] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 176 2023-01-11T21:38:05.9802236Z [2023-01-11 21:28:49,765] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 176 2023-01-11T21:38:05.9802242Z 2023-01-11T21:38:05.9802343Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9802411Z import torch 2023-01-11T21:38:05.9802488Z import random 2023-01-11T21:38:05.9802607Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9802759Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9802765Z 2023-01-11T21:38:05.9802848Z aten = torch.ops.aten 2023-01-11T21:38:05.9802984Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9803081Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9803087Z 2023-01-11T21:38:05.9803161Z import triton 2023-01-11T21:38:05.9803248Z import triton.language as tl 2023-01-11T21:38:05.9803374Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9803518Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9803524Z 2023-01-11T21:38:05.9803528Z 2023-01-11T21:38:05.9803666Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9803874Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9803997Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.9804107Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:05.9804225Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9804322Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.9804424Z float* __restrict__ out_ptr3) 2023-01-11T21:38:05.9804491Z { 2023-01-11T21:38:05.9804584Z auto out_ptr0 = in_out_ptr0; 2023-01-11T21:38:05.9804676Z auto out_ptr2 = in_out_ptr1; 2023-01-11T21:38:05.9804780Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9804839Z { 2023-01-11T21:38:05.9804942Z #pragma omp for 2023-01-11T21:38:05.9805036Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9805122Z { 2023-01-11T21:38:05.9805198Z { 2023-01-11T21:38:05.9805562Z #pragma omp declare reduction(max:at::vec::Vectorized:omp_out = at::vec::maximum(omp_out, omp_in)) initializer(omp_priv={{-std::numeric_limits::infinity()}}) 2023-01-11T21:38:05.9805776Z float tmp1 = -std::numeric_limits::infinity(); 2023-01-11T21:38:05.9805904Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:05.9805992Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:05.9806062Z { 2023-01-11T21:38:05.9806210Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:05.9806362Z tmp1_vec = at::vec::maximum(tmp1_vec, tmp0); 2023-01-11T21:38:05.9806435Z } 2023-01-11T21:38:05.9806650Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return at::vec::maximum(x, y);}, tmp1_vec); 2023-01-11T21:38:05.9806777Z #pragma omp simd simdlen(4) reduction(max:tmp1) 2023-01-11T21:38:05.9806870Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:05.9806944Z { 2023-01-11T21:38:05.9807042Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:05.9807145Z tmp1 = std::max(tmp1, tmp0); 2023-01-11T21:38:05.9807217Z } 2023-01-11T21:38:05.9807304Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9807374Z } 2023-01-11T21:38:05.9807441Z } 
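// Stable logsumexp: with the row max m stored in out_ptr0 above, the next loop accumulates sum(exp(x - m)); the final combine adds log(sum) back to m, replacing an infinite m with 0 first, which matches ATen's guard for all-infinite rows.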
2023-01-11T21:38:05.9807517Z #pragma omp for 2023-01-11T21:38:05.9807605Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9807674Z { 2023-01-11T21:38:05.9807742Z { 2023-01-11T21:38:05.9807935Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:05.9808021Z float tmp4 = 0; 2023-01-11T21:38:05.9808147Z auto tmp4_vec = at::vec::Vectorized(tmp4); 2023-01-11T21:38:05.9808243Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:05.9808305Z { 2023-01-11T21:38:05.9808458Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:05.9808593Z auto tmp1 = at::vec::Vectorized(out_ptr0[i0]); 2023-01-11T21:38:05.9808762Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9808858Z auto tmp3 = tmp2.exp(); 2023-01-11T21:38:05.9808949Z tmp4_vec += tmp3; 2023-01-11T21:38:05.9809018Z } 2023-01-11T21:38:05.9809221Z tmp4 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp4_vec); 2023-01-11T21:38:05.9809340Z #pragma omp simd simdlen(4) reduction(+:tmp4) 2023-01-11T21:38:05.9809435Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:05.9809504Z { 2023-01-11T21:38:05.9809607Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:05.9809704Z auto tmp1 = out_ptr0[i0]; 2023-01-11T21:38:05.9809842Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9809943Z auto tmp3 = std::exp(tmp2); 2023-01-11T21:38:05.9810022Z tmp4 += tmp3; 2023-01-11T21:38:05.9810091Z } 2023-01-11T21:38:05.9810179Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9810249Z } 2023-01-11T21:38:05.9810317Z } 2023-01-11T21:38:05.9810398Z #pragma omp for 2023-01-11T21:38:05.9810483Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9810546Z { 2023-01-11T21:38:05.9810613Z { 2023-01-11T21:38:05.9810683Z { 2023-01-11T21:38:05.9810778Z auto tmp0 = out_ptr1[i0]; 2023-01-11T21:38:05.9810875Z auto tmp2 = out_ptr0[i0]; 2023-01-11T21:38:05.9810978Z auto tmp1 = std::log(tmp0); 2023-01-11T21:38:05.9811079Z auto tmp3 = std::abs(tmp2); 2023-01-11T21:38:05.9811204Z auto tmp4 = std::numeric_limits::infinity(); 2023-01-11T21:38:05.9811299Z auto tmp5 = tmp3 == tmp4; 2023-01-11T21:38:05.9811411Z auto tmp6 = static_cast(0.0); 2023-01-11T21:38:05.9811517Z auto tmp7 = tmp5 ? 
tmp6 : tmp2; 2023-01-11T21:38:05.9811613Z auto tmp8 = tmp1 + tmp7; 2023-01-11T21:38:05.9811707Z in_out_ptr0[i0] = tmp8; 2023-01-11T21:38:05.9811775Z } 2023-01-11T21:38:05.9811836Z } 2023-01-11T21:38:05.9811936Z } 2023-01-11T21:38:05.9812019Z #pragma omp for 2023-01-11T21:38:05.9812105Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9812172Z { 2023-01-11T21:38:05.9812239Z { 2023-01-11T21:38:05.9812308Z { 2023-01-11T21:38:05.9812517Z float tmp1 = -std::numeric_limits::infinity(); 2023-01-11T21:38:05.9812613Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:05.9812686Z { 2023-01-11T21:38:05.9812761Z { 2023-01-11T21:38:05.9812873Z auto tmp0 = in_ptr0[i0 + (8*i1)]; 2023-01-11T21:38:05.9812986Z tmp1 = std::max(tmp1, tmp0); 2023-01-11T21:38:05.9813059Z } 2023-01-11T21:38:05.9813123Z } 2023-01-11T21:38:05.9813216Z out_ptr2[i0] = tmp1; 2023-01-11T21:38:05.9813287Z } 2023-01-11T21:38:05.9813355Z } 2023-01-11T21:38:05.9813424Z } 2023-01-11T21:38:05.9813508Z #pragma omp for 2023-01-11T21:38:05.9813587Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9813654Z { 2023-01-11T21:38:05.9813721Z { 2023-01-11T21:38:05.9813790Z { 2023-01-11T21:38:05.9813875Z float tmp4 = 0; 2023-01-11T21:38:05.9813970Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:05.9814044Z { 2023-01-11T21:38:05.9814109Z { 2023-01-11T21:38:05.9814218Z auto tmp0 = in_ptr0[i0 + (8*i1)]; 2023-01-11T21:38:05.9814323Z auto tmp1 = out_ptr2[i0]; 2023-01-11T21:38:05.9814625Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9814736Z auto tmp3 = std::exp(tmp2); 2023-01-11T21:38:05.9814825Z tmp4 += tmp3; 2023-01-11T21:38:05.9814898Z } 2023-01-11T21:38:05.9814964Z } 2023-01-11T21:38:05.9815056Z out_ptr3[i0] = tmp4; 2023-01-11T21:38:05.9815125Z } 2023-01-11T21:38:05.9815192Z } 2023-01-11T21:38:05.9815259Z } 2023-01-11T21:38:05.9815340Z #pragma omp for 2023-01-11T21:38:05.9815425Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9815485Z { 2023-01-11T21:38:05.9815554Z { 2023-01-11T21:38:05.9815623Z { 2023-01-11T21:38:05.9815719Z auto tmp0 = out_ptr3[i0]; 2023-01-11T21:38:05.9815816Z auto tmp2 = out_ptr2[i0]; 2023-01-11T21:38:05.9815920Z auto tmp1 = std::log(tmp0); 2023-01-11T21:38:05.9816025Z auto tmp3 = std::abs(tmp2); 2023-01-11T21:38:05.9816148Z auto tmp4 = std::numeric_limits::infinity(); 2023-01-11T21:38:05.9816245Z auto tmp5 = tmp3 == tmp4; 2023-01-11T21:38:05.9816355Z auto tmp6 = static_cast(0.0); 2023-01-11T21:38:05.9816462Z auto tmp7 = tmp5 ? 
tmp6 : tmp2; 2023-01-11T21:38:05.9816561Z auto tmp8 = tmp1 + tmp7; 2023-01-11T21:38:05.9816670Z auto tmp9 = static_cast(2); 2023-01-11T21:38:05.9816812Z auto tmp10 = tmp8 - tmp9; 2023-01-11T21:38:05.9816899Z in_out_ptr1[i0] = tmp10; 2023-01-11T21:38:05.9816970Z } 2023-01-11T21:38:05.9817038Z } 2023-01-11T21:38:05.9817105Z } 2023-01-11T21:38:05.9817241Z } 2023-01-11T21:38:05.9817309Z } 2023-01-11T21:38:05.9817395Z ''') 2023-01-11T21:38:05.9817400Z 2023-01-11T21:38:05.9817408Z 2023-01-11T21:38:05.9817496Z async_compile.wait(globals()) 2023-01-11T21:38:05.9817577Z del async_compile 2023-01-11T21:38:05.9817582Z 2023-01-11T21:38:05.9817656Z def call(args): 2023-01-11T21:38:05.9817730Z arg0_1, = args 2023-01-11T21:38:05.9817808Z args.clear() 2023-01-11T21:38:05.9818059Z buf0 = empty_strided((8, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9818253Z buf1 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9818369Z buf2 = as_strided(buf0, (8, ), (1, )); del buf0 # reuse 2023-01-11T21:38:05.9818554Z buf3 = empty_strided((1, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9818743Z buf4 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9818859Z buf5 = as_strided(buf3, (8, ), (1, )); del buf3 # reuse 2023-01-11T21:38:05.9819077Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.9819155Z del arg0_1 2023-01-11T21:38:05.9819237Z return (buf2, buf5, ) 2023-01-11T21:38:05.9819243Z 2023-01-11T21:38:05.9819247Z 2023-01-11T21:38:05.9819328Z if __name__ == "__main__": 2023-01-11T21:38:05.9819446Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9819567Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9819765Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9819879Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9819885Z 2023-01-11T21:38:05.9819955Z ok (1.846s) 2023-01-11T21:38:05.9820410Z test_long_tensor_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9820578Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9820838Z [2023-01-11 21:28:49,787] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 177 2023-01-11T21:38:05.9821105Z [2023-01-11 21:28:51,471] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 177 2023-01-11T21:38:05.9821111Z 2023-01-11T21:38:05.9821209Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9821276Z import torch 2023-01-11T21:38:05.9821352Z import random 2023-01-11T21:38:05.9821472Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9821601Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9821606Z 2023-01-11T21:38:05.9821688Z aten = torch.ops.aten 2023-01-11T21:38:05.9821826Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9821926Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9821935Z 2023-01-11T21:38:05.9822009Z import triton 2023-01-11T21:38:05.9822093Z import triton.language as tl 2023-01-11T21:38:05.9822218Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9822358Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9822363Z 2023-01-11T21:38:05.9822371Z 2023-01-11T21:38:05.9822508Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9822720Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9822845Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9822948Z long* __restrict__ out_ptr0, 2023-01-11T21:38:05.9823049Z long* __restrict__ out_ptr1) 2023-01-11T21:38:05.9823107Z { 2023-01-11T21:38:05.9823209Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9823276Z { 2023-01-11T21:38:05.9823358Z #pragma omp for 2023-01-11T21:38:05.9823449Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9823516Z { 2023-01-11T21:38:05.9823577Z { 2023-01-11T21:38:05.9823646Z { 2023-01-11T21:38:05.9823743Z auto tmp1 = in_ptr0[i0]; 2023-01-11T21:38:05.9823851Z auto tmp0 = static_cast(294); 2023-01-11T21:38:05.9824020Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9824131Z auto tmp3 = static_cast(295); 2023-01-11T21:38:05.9824231Z auto tmp4 = tmp3 + tmp1; 2023-01-11T21:38:05.9824314Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9824407Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9824475Z } 2023-01-11T21:38:05.9824543Z } 2023-01-11T21:38:05.9824609Z } 2023-01-11T21:38:05.9824675Z } 2023-01-11T21:38:05.9824740Z } 2023-01-11T21:38:05.9824817Z ''') 2023-01-11T21:38:05.9824823Z 2023-01-11T21:38:05.9824827Z 2023-01-11T21:38:05.9824925Z async_compile.wait(globals()) 2023-01-11T21:38:05.9825001Z del async_compile 2023-01-11T21:38:05.9825006Z 2023-01-11T21:38:05.9825083Z def call(args): 2023-01-11T21:38:05.9825157Z arg0_1, = args 2023-01-11T21:38:05.9825232Z args.clear() 2023-01-11T21:38:05.9825424Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9825619Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9825779Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9825852Z del arg0_1 2023-01-11T21:38:05.9825933Z return (buf0, buf1, ) 2023-01-11T21:38:05.9825938Z 2023-01-11T21:38:05.9825942Z 2023-01-11T21:38:05.9826024Z if __name__ == "__main__": 
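# Harness for the int64 kernel above: 294 - x and 295 + x stay in long arithmetic end to end, so no float casts appear in the generated code.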
2023-01-11T21:38:05.9826143Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9826268Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9826458Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9826593Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9826606Z 2023-01-11T21:38:05.9826670Z ok (1.705s) 2023-01-11T21:38:05.9827189Z test_lowmem_dropout1_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:05.9827270Z warnings.warn( 2023-01-11T21:38:05.9827527Z [2023-01-11 21:28:51,493] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 178 2023-01-11T21:38:05.9827788Z [2023-01-11 21:28:53,200] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 178 2023-01-11T21:38:05.9828043Z [2023-01-11 21:28:53,202] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling BACKWARDS graph 178 2023-01-11T21:38:05.9828309Z [2023-01-11 21:28:53,210] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 178 2023-01-11T21:38:05.9828565Z [2023-01-11 21:28:53,327] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 179 2023-01-11T21:38:05.9828822Z [2023-01-11 21:28:53,328] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:05.9829075Z [2023-01-11 21:28:55,028] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 179 2023-01-11T21:38:05.9829327Z [2023-01-11 21:28:55,030] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling BACKWARDS graph 179 2023-01-11T21:38:05.9829333Z 2023-01-11T21:38:05.9829436Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9829513Z import torch 2023-01-11T21:38:05.9829589Z import random 2023-01-11T21:38:05.9829710Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9829834Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9829842Z 2023-01-11T21:38:05.9829925Z aten = torch.ops.aten 2023-01-11T21:38:05.9830053Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9830149Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9830154Z 2023-01-11T21:38:05.9830227Z import triton 2023-01-11T21:38:05.9830319Z import triton.language as tl 2023-01-11T21:38:05.9830473Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9830614Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9830619Z 2023-01-11T21:38:05.9830623Z 2023-01-11T21:38:05.9830760Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9830967Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9831083Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9831193Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9831296Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9831365Z { 2023-01-11T21:38:05.9831466Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9831532Z { 2023-01-11T21:38:05.9831615Z #pragma omp for 2023-01-11T21:38:05.9831697Z for(long i0=0; i0<12500; i0+=1) 2023-01-11T21:38:05.9831763Z { 2023-01-11T21:38:05.9831906Z auto tmp0 = 
at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9832043Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.9832133Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9832230Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9832297Z } 2023-01-11T21:38:05.9832389Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9832485Z for(long i0=100000; i0<100000; i0+=1) 2023-01-11T21:38:05.9832551Z { 2023-01-11T21:38:05.9832639Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9832726Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9832844Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9832929Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9832989Z } 2023-01-11T21:38:05.9833055Z } 2023-01-11T21:38:05.9833119Z } 2023-01-11T21:38:05.9833204Z ''') 2023-01-11T21:38:05.9833209Z 2023-01-11T21:38:05.9833214Z 2023-01-11T21:38:05.9833309Z async_compile.wait(globals()) 2023-01-11T21:38:05.9833387Z del async_compile 2023-01-11T21:38:05.9833392Z 2023-01-11T21:38:05.9833467Z def call(args): 2023-01-11T21:38:05.9833553Z primals_1, primals_2 = args 2023-01-11T21:38:05.9833629Z args.clear() 2023-01-11T21:38:05.9833832Z buf0 = empty_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9834008Z kernel_cpp_0(c_void_p(primals_1.data_ptr()), c_void_p(primals_2.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9834085Z del primals_2 2023-01-11T21:38:05.9834177Z return (buf0, primals_1, ) 2023-01-11T21:38:05.9834182Z 2023-01-11T21:38:05.9834186Z 2023-01-11T21:38:05.9834268Z if __name__ == "__main__": 2023-01-11T21:38:05.9834385Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9834504Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9834713Z primals_1 = rand_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9834917Z primals_2 = rand_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9835050Z print_performance(lambda: call([primals_1, primals_2])) 2023-01-11T21:38:05.9835056Z 2023-01-11T21:38:05.9835061Z 2023-01-11T21:38:05.9835160Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9835235Z import torch 2023-01-11T21:38:05.9835309Z import random 2023-01-11T21:38:05.9835430Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9835546Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9835551Z 2023-01-11T21:38:05.9835632Z aten = torch.ops.aten 2023-01-11T21:38:05.9835767Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9835865Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9835870Z 2023-01-11T21:38:05.9835943Z import triton 2023-01-11T21:38:05.9836036Z import triton.language as tl 2023-01-11T21:38:05.9836161Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9836324Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9836337Z 2023-01-11T21:38:05.9836341Z 2023-01-11T21:38:05.9836471Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9836677Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9836801Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9836909Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9837013Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9837079Z { 2023-01-11T21:38:05.9837182Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9837244Z { 2023-01-11T21:38:05.9837325Z #pragma omp for 
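// Elementwise product of two 100000-float tensors: 12500 iterations of 8-wide vectors cover the whole tensor, so the simdlen(4) scalar tail loop is empty.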
2023-01-11T21:38:05.9837413Z for(long i0=0; i0<12500; i0+=1) 2023-01-11T21:38:05.9837480Z { 2023-01-11T21:38:05.9837619Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9837756Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.9837845Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9837941Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9838002Z } 2023-01-11T21:38:05.9838101Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9838197Z for(long i0=100000; i0<100000; i0+=1) 2023-01-11T21:38:05.9838264Z { 2023-01-11T21:38:05.9838353Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9838442Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9838523Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9838609Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9838720Z } 2023-01-11T21:38:05.9838786Z } 2023-01-11T21:38:05.9838852Z } 2023-01-11T21:38:05.9838937Z ''') 2023-01-11T21:38:05.9838942Z 2023-01-11T21:38:05.9838947Z 2023-01-11T21:38:05.9839044Z async_compile.wait(globals()) 2023-01-11T21:38:05.9839115Z del async_compile 2023-01-11T21:38:05.9839126Z 2023-01-11T21:38:05.9839196Z def call(args): 2023-01-11T21:38:05.9839290Z primals_1, tangents_1 = args 2023-01-11T21:38:05.9839366Z args.clear() 2023-01-11T21:38:05.9839567Z buf0 = empty_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9839746Z kernel_cpp_0(c_void_p(tangents_1.data_ptr()), c_void_p(primals_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9839826Z del primals_1 2023-01-11T21:38:05.9839901Z del tangents_1 2023-01-11T21:38:05.9839976Z return (None, buf0, ) 2023-01-11T21:38:05.9839982Z 2023-01-11T21:38:05.9839987Z 2023-01-11T21:38:05.9840065Z if __name__ == "__main__": 2023-01-11T21:38:05.9840184Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9840311Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9840516Z primals_1 = rand_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9840724Z tangents_1 = rand_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9840855Z print_performance(lambda: call([primals_1, tangents_1])) 2023-01-11T21:38:05.9840861Z 2023-01-11T21:38:05.9840865Z 2023-01-11T21:38:05.9840961Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9841028Z import torch 2023-01-11T21:38:05.9841102Z import random 2023-01-11T21:38:05.9841221Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9841344Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9841349Z 2023-01-11T21:38:05.9841430Z aten = torch.ops.aten 2023-01-11T21:38:05.9841565Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9841664Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9841669Z 2023-01-11T21:38:05.9841743Z import triton 2023-01-11T21:38:05.9841828Z import triton.language as tl 2023-01-11T21:38:05.9841953Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9842091Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9842286Z seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce 2023-01-11T21:38:05.9842293Z 2023-01-11T21:38:05.9842297Z 2023-01-11T21:38:05.9842434Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9842642Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9842760Z extern "C" void kernel(const long* __restrict__ seed0, 
2023-01-11T21:38:05.9842874Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9842978Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.9843083Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9843152Z { 2023-01-11T21:38:05.9843254Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9843321Z { 2023-01-11T21:38:05.9843402Z #pragma omp for 2023-01-11T21:38:05.9843492Z for(long i0=0; i0<100000; i0+=1) 2023-01-11T21:38:05.9843552Z { 2023-01-11T21:38:05.9843622Z { 2023-01-11T21:38:05.9843692Z { 2023-01-11T21:38:05.9843784Z auto tmp0 = seed0[0]; 2023-01-11T21:38:05.9843884Z auto tmp6 = in_ptr1[i0]; 2023-01-11T21:38:05.9843982Z auto tmp7 = in_ptr2[i0]; 2023-01-11T21:38:05.9844081Z auto tmp1 = static_cast(i0); 2023-01-11T21:38:05.9844223Z auto tmp2 = static_cast(normalized_rand_cpu(tmp0, tmp1));; 2023-01-11T21:38:05.9844333Z auto tmp3 = static_cast(0.33); 2023-01-11T21:38:05.9844431Z auto tmp4 = tmp2 > tmp3; 2023-01-11T21:38:05.9844573Z auto tmp5 = static_cast(tmp4); 2023-01-11T21:38:05.9844669Z auto tmp8 = tmp6 * tmp7; 2023-01-11T21:38:05.9844763Z auto tmp9 = tmp5 * tmp8; 2023-01-11T21:38:05.9844886Z auto tmp10 = static_cast(1.492537313432836); 2023-01-11T21:38:05.9844981Z auto tmp11 = tmp9 * tmp10; 2023-01-11T21:38:05.9845072Z out_ptr0[i0] = tmp11; 2023-01-11T21:38:05.9845141Z } 2023-01-11T21:38:05.9845208Z } 2023-01-11T21:38:05.9845277Z } 2023-01-11T21:38:05.9845348Z } 2023-01-11T21:38:05.9845420Z } 2023-01-11T21:38:05.9845510Z ''') 2023-01-11T21:38:05.9845516Z 2023-01-11T21:38:05.9845522Z 2023-01-11T21:38:05.9845637Z async_compile.wait(globals()) 2023-01-11T21:38:05.9845715Z del async_compile 2023-01-11T21:38:05.9845720Z 2023-01-11T21:38:05.9845795Z def call(args): 2023-01-11T21:38:05.9845886Z primals_1, primals_2 = args 2023-01-11T21:38:05.9845961Z args.clear() 2023-01-11T21:38:05.9846103Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cpu_None) 2023-01-11T21:38:05.9846295Z buf0 = empty_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9846512Z kernel_cpp_0(c_void_p(seed_cpu_None.data_ptr()), c_void_p(primals_1.data_ptr()), c_void_p(primals_2.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9846589Z del primals_2 2023-01-11T21:38:05.9846707Z return (buf0, primals_1, seed_cpu_None.clone(), ) 2023-01-11T21:38:05.9846713Z 2023-01-11T21:38:05.9846717Z 2023-01-11T21:38:05.9846800Z if __name__ == "__main__": 2023-01-11T21:38:05.9846917Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9847044Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9847238Z seed_cpu_None = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9847437Z primals_1 = rand_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9847639Z primals_2 = rand_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9847771Z print_performance(lambda: call([primals_1, primals_2])) 2023-01-11T21:38:05.9847776Z 2023-01-11T21:38:05.9848071Z [2023-01-11 21:28:56,699] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 179 2023-01-11T21:38:05.9848077Z 2023-01-11T21:38:05.9848176Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9848251Z import torch 2023-01-11T21:38:05.9848329Z import random 2023-01-11T21:38:05.9848449Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9848565Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9848578Z 2023-01-11T21:38:05.9848653Z aten = 
torch.ops.aten 2023-01-11T21:38:05.9848789Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9848884Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9848889Z 2023-01-11T21:38:05.9848966Z import triton 2023-01-11T21:38:05.9849061Z import triton.language as tl 2023-01-11T21:38:05.9849185Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9849325Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9849331Z 2023-01-11T21:38:05.9849335Z 2023-01-11T21:38:05.9849472Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9849673Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9849797Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9849907Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9850017Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.9850121Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9850188Z { 2023-01-11T21:38:05.9850291Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9850350Z { 2023-01-11T21:38:05.9850431Z #pragma omp for 2023-01-11T21:38:05.9850552Z for(long i0=0; i0<100000; i0+=1) 2023-01-11T21:38:05.9850619Z { 2023-01-11T21:38:05.9850688Z { 2023-01-11T21:38:05.9850756Z { 2023-01-11T21:38:05.9850846Z auto tmp0 = in_ptr0[0]; 2023-01-11T21:38:05.9850944Z auto tmp6 = in_ptr1[i0]; 2023-01-11T21:38:05.9851043Z auto tmp10 = in_ptr2[i0]; 2023-01-11T21:38:05.9851149Z auto tmp1 = static_cast(i0); 2023-01-11T21:38:05.9851293Z auto tmp2 = static_cast(normalized_rand_cpu(tmp0, tmp1));; 2023-01-11T21:38:05.9851403Z auto tmp3 = static_cast(0.33); 2023-01-11T21:38:05.9851501Z auto tmp4 = tmp2 > tmp3; 2023-01-11T21:38:05.9851613Z auto tmp5 = static_cast(tmp4); 2023-01-11T21:38:05.9851702Z auto tmp7 = tmp5 * tmp6; 2023-01-11T21:38:05.9851821Z auto tmp8 = static_cast(1.492537313432836); 2023-01-11T21:38:05.9851919Z auto tmp9 = tmp7 * tmp8; 2023-01-11T21:38:05.9852018Z auto tmp11 = tmp9 * tmp10; 2023-01-11T21:38:05.9852109Z out_ptr0[i0] = tmp11; 2023-01-11T21:38:05.9852178Z } 2023-01-11T21:38:05.9852246Z } 2023-01-11T21:38:05.9852306Z } 2023-01-11T21:38:05.9852378Z } 2023-01-11T21:38:05.9852444Z } 2023-01-11T21:38:05.9852530Z ''') 2023-01-11T21:38:05.9852535Z 2023-01-11T21:38:05.9852540Z 2023-01-11T21:38:05.9852635Z async_compile.wait(globals()) 2023-01-11T21:38:05.9852712Z del async_compile 2023-01-11T21:38:05.9852717Z 2023-01-11T21:38:05.9852791Z def call(args): 2023-01-11T21:38:05.9852899Z primals_1, philox_seed_like, tangents_1 = args 2023-01-11T21:38:05.9852974Z args.clear() 2023-01-11T21:38:05.9853176Z buf0 = empty_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9853397Z kernel_cpp_0(c_void_p(philox_seed_like.data_ptr()), c_void_p(tangents_1.data_ptr()), c_void_p(primals_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9853486Z del philox_seed_like 2023-01-11T21:38:05.9853562Z del primals_1 2023-01-11T21:38:05.9853638Z del tangents_1 2023-01-11T21:38:05.9853713Z return (None, buf0, ) 2023-01-11T21:38:05.9853726Z 2023-01-11T21:38:05.9853730Z 2023-01-11T21:38:05.9853834Z if __name__ == "__main__": 2023-01-11T21:38:05.9853954Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9854079Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9854285Z primals_1 = rand_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9854591Z philox_seed_like = rand_strided((), (), 
device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9854805Z tangents_1 = rand_strided((100000, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9854965Z print_performance(lambda: call([primals_1, philox_seed_like, tangents_1]))
2023-01-11T21:38:05.9854975Z 
2023-01-11T21:38:05.9855062Z ok (5.232s)
2023-01-11T21:38:05.9855411Z test_lowmem_dropout2_cpu (__main__.CpuTests) ... [2023-01-11 21:28:56,934] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 180
2023-01-11T21:38:05.9855667Z [2023-01-11 21:28:56,935] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager
2023-01-11T21:38:05.9855933Z [2023-01-11 21:28:58,741] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 180
2023-01-11T21:38:05.9856190Z [2023-01-11 21:28:58,811] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling BACKWARDS graph 180
2023-01-11T21:38:05.9856452Z [2023-01-11 21:29:01,063] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 180
2023-01-11T21:38:05.9856457Z 
2023-01-11T21:38:05.9856555Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9856632Z import torch
2023-01-11T21:38:05.9856709Z import random
2023-01-11T21:38:05.9856868Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9856992Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9856997Z 
2023-01-11T21:38:05.9857079Z aten = torch.ops.aten
2023-01-11T21:38:05.9857274Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9857388Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9857393Z 
2023-01-11T21:38:05.9857478Z import triton
2023-01-11T21:38:05.9857571Z import triton.language as tl
2023-01-11T21:38:05.9857697Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9857828Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9857992Z seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce
2023-01-11T21:38:05.9857997Z 
2023-01-11T21:38:05.9858002Z 
2023-01-11T21:38:05.9858140Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9858344Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9858471Z extern "C" void kernel(float* __restrict__ in_out_ptr0,
2023-01-11T21:38:05.9858578Z const long* __restrict__ seed0)
2023-01-11T21:38:05.9858643Z {
2023-01-11T21:38:05.9858744Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9858803Z {
2023-01-11T21:38:05.9858889Z #pragma omp for
2023-01-11T21:38:05.9858977Z for(long i0=0; i0<256; i0+=1)
2023-01-11T21:38:05.9859044Z {
2023-01-11T21:38:05.9859111Z {
2023-01-11T21:38:05.9859183Z {
2023-01-11T21:38:05.9859268Z auto tmp0 = seed0[0];
2023-01-11T21:38:05.9859372Z auto tmp6 = in_out_ptr0[i0];
2023-01-11T21:38:05.9859478Z auto tmp1 = static_cast<long>(i0);
2023-01-11T21:38:05.9859626Z auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));
2023-01-11T21:38:05.9859736Z auto tmp3 = static_cast<float>(0.5);
2023-01-11T21:38:05.9859838Z auto tmp4 = tmp2 > tmp3;
2023-01-11T21:38:05.9859948Z auto tmp5 = static_cast<float>(tmp4);
2023-01-11T21:38:05.9860044Z auto tmp7 = tmp5 * tmp6;
2023-01-11T21:38:05.9860148Z auto tmp8 = static_cast<float>(2.0);
2023-01-11T21:38:05.9860285Z auto tmp9 = tmp7 * tmp8;
2023-01-11T21:38:05.9860382Z in_out_ptr0[i0] = tmp9;
2023-01-11T21:38:05.9860452Z }
2023-01-11T21:38:05.9860520Z }
2023-01-11T21:38:05.9860587Z }
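// kernel_cpp_1 below repeats the masked scale for the second dropout, offsetting the random-stream index by 256 (one slot per element of the first layer) so both layers draw independent values from the same seed.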
2023-01-11T21:38:05.9855411Z test_lowmem_dropout2_cpu (__main__.CpuTests) ... [2023-01-11 21:28:56,934] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 180
2023-01-11T21:38:05.9855667Z [2023-01-11 21:28:56,935] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager
2023-01-11T21:38:05.9855933Z [2023-01-11 21:28:58,741] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 180
2023-01-11T21:38:05.9856190Z [2023-01-11 21:28:58,811] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling BACKWARDS graph 180
2023-01-11T21:38:05.9856452Z [2023-01-11 21:29:01,063] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 180
2023-01-11T21:38:05.9856457Z 
2023-01-11T21:38:05.9856555Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9856632Z import torch
2023-01-11T21:38:05.9856709Z import random
2023-01-11T21:38:05.9856868Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9856992Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9856997Z 
2023-01-11T21:38:05.9857079Z aten = torch.ops.aten
2023-01-11T21:38:05.9857274Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9857388Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9857393Z 
2023-01-11T21:38:05.9857478Z import triton
2023-01-11T21:38:05.9857571Z import triton.language as tl
2023-01-11T21:38:05.9857697Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9857828Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9857992Z seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce
2023-01-11T21:38:05.9857997Z 
2023-01-11T21:38:05.9858002Z 
2023-01-11T21:38:05.9858140Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9858344Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9858471Z extern "C" void kernel(float* __restrict__ in_out_ptr0,
2023-01-11T21:38:05.9858578Z                        const long* __restrict__ seed0)
2023-01-11T21:38:05.9858643Z {
2023-01-11T21:38:05.9858744Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9858803Z     {
2023-01-11T21:38:05.9858889Z         #pragma omp for
2023-01-11T21:38:05.9858977Z         for(long i0=0; i0<256; i0+=1)
2023-01-11T21:38:05.9859044Z         {
2023-01-11T21:38:05.9859111Z             {
2023-01-11T21:38:05.9859183Z                 {
2023-01-11T21:38:05.9859268Z                     auto tmp0 = seed0[0];
2023-01-11T21:38:05.9859372Z                     auto tmp6 = in_out_ptr0[i0];
2023-01-11T21:38:05.9859478Z                     auto tmp1 = static_cast<long>(i0);
2023-01-11T21:38:05.9859626Z                     auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));
2023-01-11T21:38:05.9859736Z                     auto tmp3 = static_cast<float>(0.5);
2023-01-11T21:38:05.9859838Z                     auto tmp4 = tmp2 > tmp3;
2023-01-11T21:38:05.9859948Z                     auto tmp5 = static_cast<float>(tmp4);
2023-01-11T21:38:05.9860044Z                     auto tmp7 = tmp5 * tmp6;
2023-01-11T21:38:05.9860148Z                     auto tmp8 = static_cast<float>(2.0);
2023-01-11T21:38:05.9860285Z                     auto tmp9 = tmp7 * tmp8;
2023-01-11T21:38:05.9860382Z                     in_out_ptr0[i0] = tmp9;
2023-01-11T21:38:05.9860452Z                 }
2023-01-11T21:38:05.9860520Z             }
2023-01-11T21:38:05.9860587Z         }
2023-01-11T21:38:05.9860653Z     }
2023-01-11T21:38:05.9860711Z }
2023-01-11T21:38:05.9860797Z ''')
2023-01-11T21:38:05.9860802Z 
2023-01-11T21:38:05.9860807Z 
2023-01-11T21:38:05.9860942Z kernel_cpp_1 = async_compile.cpp('''
2023-01-11T21:38:05.9861149Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9861270Z extern "C" void kernel(float* __restrict__ in_out_ptr0,
2023-01-11T21:38:05.9861378Z                        const long* __restrict__ seed0)
2023-01-11T21:38:05.9861442Z {
2023-01-11T21:38:05.9861537Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9861604Z     {
2023-01-11T21:38:05.9861685Z         #pragma omp for
2023-01-11T21:38:05.9861772Z         for(long i0=0; i0<256; i0+=1)
2023-01-11T21:38:05.9861841Z         {
2023-01-11T21:38:05.9861909Z             {
2023-01-11T21:38:05.9861978Z                 {
2023-01-11T21:38:05.9862061Z                     auto tmp0 = seed0[0];
2023-01-11T21:38:05.9862163Z                     auto tmp6 = in_out_ptr0[i0];
2023-01-11T21:38:05.9862276Z                     auto tmp1 = static_cast<long>(256 + i0);
2023-01-11T21:38:05.9862421Z                     auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));
2023-01-11T21:38:05.9862531Z                     auto tmp3 = static_cast<float>(0.5);
2023-01-11T21:38:05.9862626Z                     auto tmp4 = tmp2 > tmp3;
2023-01-11T21:38:05.9862737Z                     auto tmp5 = static_cast<float>(tmp4);
2023-01-11T21:38:05.9862857Z                     auto tmp7 = tmp5 * tmp6;
2023-01-11T21:38:05.9862969Z                     auto tmp8 = static_cast<float>(2.0);
2023-01-11T21:38:05.9863064Z                     auto tmp9 = tmp7 * tmp8;
2023-01-11T21:38:05.9863160Z                     in_out_ptr0[i0] = tmp9;
2023-01-11T21:38:05.9863236Z                 }
2023-01-11T21:38:05.9863303Z             }
2023-01-11T21:38:05.9863371Z         }
2023-01-11T21:38:05.9863431Z     }
2023-01-11T21:38:05.9863494Z }
2023-01-11T21:38:05.9863580Z ''')
2023-01-11T21:38:05.9863585Z 
2023-01-11T21:38:05.9863590Z 
2023-01-11T21:38:05.9863684Z async_compile.wait(globals())
2023-01-11T21:38:05.9863761Z del async_compile
2023-01-11T21:38:05.9863766Z 
2023-01-11T21:38:05.9863840Z def call(args):
2023-01-11T21:38:05.9863946Z     primals_1, primals_2, primals_3 = args
2023-01-11T21:38:05.9864015Z     args.clear()
2023-01-11T21:38:05.9864153Z     torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cpu_None)
2023-01-11T21:38:05.9864354Z     buf0 = empty_strided((8, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9864492Z     aten.mm.out(primals_3, as_strided(primals_1, (32, 32), (1, 32)), out=buf0)
2023-01-11T21:38:05.9864568Z     del primals_1
2023-01-11T21:38:05.9864659Z     buf1 = buf0; del buf0 # reuse
2023-01-11T21:38:05.9864810Z     kernel_cpp_0(c_void_p(buf1.data_ptr()), c_void_p(seed_cpu_None.data_ptr()))
2023-01-11T21:38:05.9865006Z     buf2 = empty_strided((8, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9865128Z     aten.mm.out(buf1, as_strided(primals_2, (32, 32), (1, 32)), out=buf2)
2023-01-11T21:38:05.9865242Z     buf3 = buf2; del buf2 # reuse
2023-01-11T21:38:05.9865410Z     kernel_cpp_1(c_void_p(buf3.data_ptr()), c_void_p(seed_cpu_None.data_ptr()))
2023-01-11T21:38:05.9865574Z     return (buf3, primals_3, seed_cpu_None.clone(), buf1, as_strided(primals_2, (32, 32), (32, 1)), )
2023-01-11T21:38:05.9865580Z 
2023-01-11T21:38:05.9865586Z 
2023-01-11T21:38:05.9865665Z if __name__ == "__main__":
2023-01-11T21:38:05.9865782Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9865908Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9866103Z     seed_cpu_None = rand_strided((), (), device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9866333Z     primals_1 = rand_strided((32, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9866534Z     primals_2 = rand_strided((32, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9866732Z     primals_3 = rand_strided((8, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9866877Z     print_performance(lambda: call([primals_1, primals_2, primals_3]))
2023-01-11T21:38:05.9866882Z 
2023-01-11T21:38:05.9866887Z 
2023-01-11T21:38:05.9866989Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9867064Z import torch
2023-01-11T21:38:05.9867138Z import random
2023-01-11T21:38:05.9867259Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9867380Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9867385Z 
2023-01-11T21:38:05.9867467Z aten = torch.ops.aten
2023-01-11T21:38:05.9867603Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9867699Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9867707Z 
2023-01-11T21:38:05.9867783Z import triton
2023-01-11T21:38:05.9867873Z import triton.language as tl
2023-01-11T21:38:05.9867998Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9868135Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9868141Z 
2023-01-11T21:38:05.9868145Z 
2023-01-11T21:38:05.9868274Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9868478Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9868600Z extern "C" void kernel(const long* __restrict__ in_ptr0,
2023-01-11T21:38:05.9868714Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:05.9868847Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:05.9868912Z {
2023-01-11T21:38:05.9869015Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9869074Z     {
2023-01-11T21:38:05.9869155Z         #pragma omp for
2023-01-11T21:38:05.9869245Z         for(long i0=0; i0<256; i0+=1)
2023-01-11T21:38:05.9869314Z         {
2023-01-11T21:38:05.9869381Z             {
2023-01-11T21:38:05.9869450Z                 {
2023-01-11T21:38:05.9869546Z                     auto tmp0 = in_ptr0[0];
2023-01-11T21:38:05.9869637Z                     auto tmp6 = in_ptr1[i0];
2023-01-11T21:38:05.9869750Z                     auto tmp1 = static_cast<long>(256 + i0);
2023-01-11T21:38:05.9869896Z                     auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));
2023-01-11T21:38:05.9870005Z                     auto tmp3 = static_cast<float>(0.5);
2023-01-11T21:38:05.9870101Z                     auto tmp4 = tmp2 > tmp3;
2023-01-11T21:38:05.9870217Z                     auto tmp5 = static_cast<float>(tmp4);
2023-01-11T21:38:05.9870317Z                     auto tmp7 = tmp5 * tmp6;
2023-01-11T21:38:05.9870420Z                     auto tmp8 = static_cast<float>(2.0);
2023-01-11T21:38:05.9870515Z                     auto tmp9 = tmp7 * tmp8;
2023-01-11T21:38:05.9870607Z                     out_ptr0[i0] = tmp9;
2023-01-11T21:38:05.9870679Z                 }
2023-01-11T21:38:05.9870746Z             }
2023-01-11T21:38:05.9870813Z         }
2023-01-11T21:38:05.9870879Z     }
2023-01-11T21:38:05.9870936Z }
2023-01-11T21:38:05.9871021Z ''')
2023-01-11T21:38:05.9871026Z 
2023-01-11T21:38:05.9871030Z 
2023-01-11T21:38:05.9871167Z kernel_cpp_1 = async_compile.cpp('''
2023-01-11T21:38:05.9871371Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9871492Z extern "C" void kernel(float* __restrict__ in_out_ptr0,
2023-01-11T21:38:05.9871601Z                        const long* __restrict__ in_ptr0)
2023-01-11T21:38:05.9871670Z {
2023-01-11T21:38:05.9871771Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9871831Z     {
2023-01-11T21:38:05.9871911Z         #pragma omp for
2023-01-11T21:38:05.9871999Z         for(long i0=0; i0<256; i0+=1)
2023-01-11T21:38:05.9872067Z         {
2023-01-11T21:38:05.9872134Z             {
2023-01-11T21:38:05.9872235Z                 {
2023-01-11T21:38:05.9872324Z                     auto tmp0 = in_ptr0[0];
2023-01-11T21:38:05.9872426Z                     auto tmp6 = in_out_ptr0[i0];
2023-01-11T21:38:05.9872534Z                     auto tmp1 = static_cast<long>(i0);
2023-01-11T21:38:05.9872677Z                     auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));
2023-01-11T21:38:05.9872788Z                     auto tmp3 = static_cast<float>(0.5);
2023-01-11T21:38:05.9872883Z                     auto tmp4 = tmp2 > tmp3;
2023-01-11T21:38:05.9872994Z                     auto tmp5 = static_cast<float>(tmp4);
2023-01-11T21:38:05.9873094Z                     auto tmp7 = tmp5 * tmp6;
2023-01-11T21:38:05.9873195Z                     auto tmp8 = static_cast<float>(2.0);
2023-01-11T21:38:05.9873290Z                     auto tmp9 = tmp7 * tmp8;
2023-01-11T21:38:05.9873384Z                     in_out_ptr0[i0] = tmp9;
2023-01-11T21:38:05.9873455Z                 }
2023-01-11T21:38:05.9873527Z             }
2023-01-11T21:38:05.9873594Z         }
2023-01-11T21:38:05.9873653Z     }
2023-01-11T21:38:05.9873719Z }
2023-01-11T21:38:05.9873804Z ''')
2023-01-11T21:38:05.9873810Z 
2023-01-11T21:38:05.9873814Z 
2023-01-11T21:38:05.9873909Z async_compile.wait(globals())
2023-01-11T21:38:05.9873986Z del async_compile
2023-01-11T21:38:05.9873991Z 
2023-01-11T21:38:05.9874069Z def call(args):
2023-01-11T21:38:05.9874207Z     primals_3, philox_seed_like, mul_1, permute_4, tangents_1 = args
2023-01-11T21:38:05.9874288Z     args.clear()
2023-01-11T21:38:05.9874479Z     buf0 = empty_strided((8, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9874698Z     kernel_cpp_0(c_void_p(philox_seed_like.data_ptr()), c_void_p(tangents_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:05.9874779Z     del tangents_1
2023-01-11T21:38:05.9874979Z     buf1 = empty_strided((32, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9875106Z     aten.mm.out(as_strided(buf0, (32, 8), (1, 32)), mul_1, out=buf1)
2023-01-11T21:38:05.9875176Z     del mul_1
2023-01-11T21:38:05.9875369Z     buf2 = empty_strided((8, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9875465Z     aten.mm.out(buf0, permute_4, out=buf2)
2023-01-11T21:38:05.9875537Z     del buf0
2023-01-11T21:38:05.9875614Z     del permute_4
2023-01-11T21:38:05.9875704Z     buf3 = buf2; del buf2 # reuse
2023-01-11T21:38:05.9875858Z     kernel_cpp_1(c_void_p(buf3.data_ptr()), c_void_p(philox_seed_like.data_ptr()))
2023-01-11T21:38:05.9875943Z     del philox_seed_like
2023-01-11T21:38:05.9876139Z     buf4 = empty_strided((32, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9876265Z     aten.mm.out(as_strided(buf3, (32, 8), (1, 32)), primals_3, out=buf4)
2023-01-11T21:38:05.9876339Z     del buf3
2023-01-11T21:38:05.9876415Z     del primals_3
2023-01-11T21:38:05.9876552Z     return (as_strided(buf4, (32, 32), (32, 1)), as_strided(buf1, (32, 32), (32, 1)), None, )
2023-01-11T21:38:05.9876558Z 
2023-01-11T21:38:05.9876565Z 
2023-01-11T21:38:05.9876646Z if __name__ == "__main__":
2023-01-11T21:38:05.9876768Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9876897Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9877103Z     primals_3 = rand_strided((8, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9877294Z     philox_seed_like = rand_strided((), (), device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9877488Z     mul_1 = rand_strided((8, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9877691Z     permute_4 = rand_strided((32, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9877892Z     tangents_1 = rand_strided((8, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9878063Z     print_performance(lambda: call([primals_3, philox_seed_like, mul_1, permute_4, tangents_1]))
2023-01-11T21:38:05.9878069Z 
2023-01-11T21:38:05.9878140Z ok (4.423s)
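Two details of the call() wrappers above are worth noting: each dropout is applied in place on the matmul's output buffer (buf1 = buf0; del buf0 # reuse, with the kernels taking an in_out_ptr0), so no extra tensor is allocated between layers, and the second kernel offsets the Philox stream with static_cast<long>(256 + i0) so both dropout layers draw disjoint randomness from one seed. A rough eager equivalent of the compiled forward, assuming the two-linear-layer structure implied by the (8, 32) and (32, 32) buffers (names are illustrative):

import torch
import torch.nn.functional as F

def forward(x, w1, w2, p=0.5):
    # mm into a fresh buffer, then dropout applied in place on that buffer, twice
    h = F.dropout(x @ w1.t(), p=p, training=True)      # buf0 reused as buf1
    return F.dropout(h @ w2.t(), p=p, training=True)   # buf2 reused as buf3

x, w1, w2 = torch.randn(8, 32), torch.randn(32, 32), torch.randn(32, 32)
print(forward(x, w1, w2).shape)  # torch.Size([8, 32])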
2023-01-11T21:38:05.9878627Z test_masked_fill_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9878764Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9879023Z [2023-01-11 21:29:01,173] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 181
2023-01-11T21:38:05.9879286Z [2023-01-11 21:29:03,275] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 181
2023-01-11T21:38:05.9879294Z 
2023-01-11T21:38:05.9879386Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9879465Z import torch
2023-01-11T21:38:05.9879542Z import random
2023-01-11T21:38:05.9879663Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9879787Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9879793Z 
2023-01-11T21:38:05.9879875Z aten = torch.ops.aten
2023-01-11T21:38:05.9880015Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9880111Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9880116Z 
2023-01-11T21:38:05.9880184Z import triton
2023-01-11T21:38:05.9880276Z import triton.language as tl
2023-01-11T21:38:05.9880401Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9880541Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9880629Z 
2023-01-11T21:38:05.9880634Z 
2023-01-11T21:38:05.9880773Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9880979Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9881103Z extern "C" void kernel(const bool* __restrict__ in_ptr0,
2023-01-11T21:38:05.9881214Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:05.9881311Z                        float* __restrict__ out_ptr0,
2023-01-11T21:38:05.9881413Z                        float* __restrict__ out_ptr1)
2023-01-11T21:38:05.9881480Z {
2023-01-11T21:38:05.9881585Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9881652Z     {
2023-01-11T21:38:05.9881735Z         #pragma omp for
2023-01-11T21:38:05.9881823Z         for(long i0=0; i0<16; i0+=1)
2023-01-11T21:38:05.9881884Z         {
2023-01-11T21:38:05.9881969Z             #pragma GCC ivdep
2023-01-11T21:38:05.9882062Z             for(long i1=0; i1<16; i1+=1)
2023-01-11T21:38:05.9882133Z             {
2023-01-11T21:38:05.9882206Z                 {
2023-01-11T21:38:05.9882277Z                     {
2023-01-11T21:38:05.9882371Z                         auto tmp0 = in_ptr0[i1];
2023-01-11T21:38:05.9882483Z                         auto tmp2 = in_ptr1[i1 + (16*i0)];
2023-01-11T21:38:05.9882654Z                         auto tmp1 = static_cast<float>(-10000.0);
2023-01-11T21:38:05.9882766Z                         auto tmp3 = tmp0 ? tmp1 : tmp2;
2023-01-11T21:38:05.9882877Z                         auto tmp4 = static_cast<float>(2);
2023-01-11T21:38:05.9882979Z                         auto tmp5 = tmp3 + tmp4;
2023-01-11T21:38:05.9883076Z                         auto tmp6 = tmp0 == 0;
2023-01-11T21:38:05.9883192Z                         auto tmp7 = static_cast<float>(667.0);
2023-01-11T21:38:05.9883300Z                         auto tmp8 = static_cast<float>(2.0);
2023-01-11T21:38:05.9883398Z                         auto tmp9 = tmp2 / tmp8;
2023-01-11T21:38:05.9883505Z                         auto tmp10 = tmp6 ? tmp7 : tmp9;
2023-01-11T21:38:05.9883611Z                         out_ptr0[i1 + (16*i0)] = tmp5;
2023-01-11T21:38:05.9883714Z                         out_ptr1[i1 + (16*i0)] = tmp10;
2023-01-11T21:38:05.9883786Z                     }
2023-01-11T21:38:05.9883857Z                 }
2023-01-11T21:38:05.9883918Z             }
2023-01-11T21:38:05.9883985Z         }
2023-01-11T21:38:05.9884081Z     }
2023-01-11T21:38:05.9884146Z }
2023-01-11T21:38:05.9884231Z ''')
2023-01-11T21:38:05.9884237Z 
2023-01-11T21:38:05.9884241Z 
2023-01-11T21:38:05.9884337Z async_compile.wait(globals())
2023-01-11T21:38:05.9884414Z del async_compile
2023-01-11T21:38:05.9884419Z 
2023-01-11T21:38:05.9884487Z def call(args):
2023-01-11T21:38:05.9884567Z     arg0_1, arg1_1 = args
2023-01-11T21:38:05.9884644Z     args.clear()
2023-01-11T21:38:05.9884843Z     buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9885042Z     buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9885240Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:05.9885314Z     del arg0_1
2023-01-11T21:38:05.9885379Z     del arg1_1
2023-01-11T21:38:05.9885465Z     return (buf0, buf1, )
2023-01-11T21:38:05.9885470Z 
2023-01-11T21:38:05.9885475Z 
2023-01-11T21:38:05.9885558Z if __name__ == "__main__":
2023-01-11T21:38:05.9885677Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9885804Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9885998Z     arg0_1 = rand_strided((1, 16), (16, 1), device='cpu', dtype=torch.bool)
2023-01-11T21:38:05.9886196Z     arg1_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9886317Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:05.9886322Z 
2023-01-11T21:38:05.9886386Z ok (2.153s)
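Both returned tensors come out of the single kernel_cpp_0 above: one pass over the 16x16 input computes tmp5 = where(mask, -10000.0, x) + 2 and tmp10 = where(mask == 0, 667.0, x / 2.0), i.e. the two masked_fill expressions are fused with their follow-up arithmetic into one loop. An eager-mode sketch consistent with the kernel (the exact test body lives in test_torchinductor.py):

import torch

def fn(mask, x):
    a = torch.masked_fill(x, mask, -10000.0) + 2          # tmp1..tmp5
    b = torch.where(~mask, torch.tensor(667.0), x / 2.0)  # tmp6..tmp10
    return a, b

mask = torch.rand(1, 16) > 0.5
x = torch.randn(16, 16)
a, b = fn(mask, x)
print(a.shape, b.shape)  # torch.Size([16, 16]) torch.Size([16, 16])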
2023-01-11T21:38:05.9886908Z test_masked_fill_promotion_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')`
2023-01-11T21:38:05.9887024Z   warnings.warn(
2023-01-11T21:38:05.9887286Z [2023-01-11 21:29:03,319] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 182
2023-01-11T21:38:05.9887551Z [2023-01-11 21:29:05,432] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 182
2023-01-11T21:38:05.9887806Z [2023-01-11 21:29:05,455] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 183
2023-01-11T21:38:05.9888068Z [2023-01-11 21:29:07,157] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 183
2023-01-11T21:38:05.9888074Z 
2023-01-11T21:38:05.9888174Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9888248Z import torch
2023-01-11T21:38:05.9888319Z import random
2023-01-11T21:38:05.9888440Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9888564Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9888569Z 
2023-01-11T21:38:05.9888654Z aten = torch.ops.aten
2023-01-11T21:38:05.9888791Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9888895Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9888900Z 
2023-01-11T21:38:05.9888975Z import triton
2023-01-11T21:38:05.9889067Z import triton.language as tl
2023-01-11T21:38:05.9889185Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9889325Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9889331Z 
2023-01-11T21:38:05.9889335Z 
2023-01-11T21:38:05.9889472Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9889676Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9889800Z extern "C" void kernel(const bool* __restrict__ in_ptr0,
2023-01-11T21:38:05.9889911Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:05.9890016Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:05.9890081Z {
2023-01-11T21:38:05.9890176Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9890243Z     {
2023-01-11T21:38:05.9890353Z         #pragma omp for
2023-01-11T21:38:05.9890443Z         for(long i0=0; i0<16; i0+=1)
2023-01-11T21:38:05.9890510Z         {
2023-01-11T21:38:05.9890600Z             for(long i1=0; i1<2; i1+=1)
2023-01-11T21:38:05.9890661Z             {
2023-01-11T21:38:05.9890770Z                 float g_tmp_buffer_in_ptr0[8] = {0};
2023-01-11T21:38:05.9890901Z                 flag_to_float(in_ptr0 + 8*i1, g_tmp_buffer_in_ptr0, 8);
2023-01-11T21:38:05.9891055Z                 auto tmp0 = at::vec::Vectorized<float>::loadu(g_tmp_buffer_in_ptr0);
2023-01-11T21:38:05.9891202Z                 auto tmp2 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i1) + (16*i0));
2023-01-11T21:38:05.9891349Z                 auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3.5));
2023-01-11T21:38:05.9891480Z                 auto tmp3 = decltype(tmp1)::blendv(tmp2, tmp1, tmp0);
2023-01-11T21:38:05.9891589Z                 tmp3.store(out_ptr0 + (8*i1) + (16*i0));
2023-01-11T21:38:05.9891658Z             }
2023-01-11T21:38:05.9891752Z             #pragma omp simd simdlen(4)
2023-01-11T21:38:05.9891846Z             for(long i1=16; i1<16; i1+=1)
2023-01-11T21:38:05.9891917Z             {
2023-01-11T21:38:05.9892011Z                 auto tmp0 = in_ptr0[i1];
2023-01-11T21:38:05.9892114Z                 auto tmp2 = in_ptr1[i1 + (16*i0)];
2023-01-11T21:38:05.9892220Z                 auto tmp1 = static_cast<float>(3.5);
2023-01-11T21:38:05.9892323Z                 auto tmp3 = tmp0 ? tmp1 : tmp2;
2023-01-11T21:38:05.9892412Z                 out_ptr0[i1 + (16*i0)] = tmp3;
2023-01-11T21:38:05.9892480Z             }
2023-01-11T21:38:05.9892547Z         }
2023-01-11T21:38:05.9892614Z     }
2023-01-11T21:38:05.9892705Z }
2023-01-11T21:38:05.9892791Z ''')
2023-01-11T21:38:05.9892796Z 
2023-01-11T21:38:05.9892801Z 
2023-01-11T21:38:05.9892896Z async_compile.wait(globals())
2023-01-11T21:38:05.9892967Z del async_compile
2023-01-11T21:38:05.9892973Z 
2023-01-11T21:38:05.9893049Z def call(args):
2023-01-11T21:38:05.9893130Z     arg0_1, arg1_1 = args
2023-01-11T21:38:05.9893208Z     args.clear()
2023-01-11T21:38:05.9893408Z     buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9893575Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:05.9893649Z     del arg0_1
2023-01-11T21:38:05.9893713Z     del arg1_1
2023-01-11T21:38:05.9893790Z     return (buf0, )
2023-01-11T21:38:05.9893795Z 
2023-01-11T21:38:05.9893799Z 
2023-01-11T21:38:05.9893880Z if __name__ == "__main__":
2023-01-11T21:38:05.9893998Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9894124Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9894321Z     arg0_1 = rand_strided((1, 16), (16, 1), device='cpu', dtype=torch.bool)
2023-01-11T21:38:05.9894632Z     arg1_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9894754Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:05.9894763Z 
2023-01-11T21:38:05.9894767Z 
2023-01-11T21:38:05.9894857Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9894932Z import torch
2023-01-11T21:38:05.9895008Z import random
2023-01-11T21:38:05.9895131Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9895254Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9895259Z 
2023-01-11T21:38:05.9895343Z aten = torch.ops.aten
2023-01-11T21:38:05.9895479Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9895576Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9895581Z 
2023-01-11T21:38:05.9895648Z import triton
2023-01-11T21:38:05.9895745Z import triton.language as tl
2023-01-11T21:38:05.9895869Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9896008Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9896013Z 
2023-01-11T21:38:05.9896017Z 
2023-01-11T21:38:05.9896154Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9896410Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9896534Z extern "C" void kernel(const bool* __restrict__ in_ptr0,
2023-01-11T21:38:05.9896643Z                        const long* __restrict__ in_ptr1,
2023-01-11T21:38:05.9896738Z                        long* __restrict__ out_ptr0)
2023-01-11T21:38:05.9896803Z {
2023-01-11T21:38:05.9896904Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9896970Z     {
2023-01-11T21:38:05.9897051Z         #pragma omp for
2023-01-11T21:38:05.9897199Z         for(long i0=0; i0<16; i0+=1)
2023-01-11T21:38:05.9897261Z         {
2023-01-11T21:38:05.9897349Z             #pragma GCC ivdep
2023-01-11T21:38:05.9897440Z             for(long i1=0; i1<16; i1+=1)
2023-01-11T21:38:05.9897508Z             {
2023-01-11T21:38:05.9897577Z                 {
2023-01-11T21:38:05.9897654Z                     {
2023-01-11T21:38:05.9897755Z                         auto tmp0 = in_ptr0[i1];
2023-01-11T21:38:05.9897858Z                         auto tmp3 = in_ptr1[i1 + (16*i0)];
2023-01-11T21:38:05.9897974Z                         auto tmp1 = static_cast<float>(3.5);
2023-01-11T21:38:05.9898087Z                         auto tmp2 = static_cast<long>(tmp1);
2023-01-11T21:38:05.9898192Z                         auto tmp4 = tmp0 ? tmp2 : tmp3;
2023-01-11T21:38:05.9898297Z                         out_ptr0[i1 + (16*i0)] = tmp4;
2023-01-11T21:38:05.9898368Z                     }
2023-01-11T21:38:05.9898437Z                 }
2023-01-11T21:38:05.9898497Z             }
2023-01-11T21:38:05.9898565Z         }
2023-01-11T21:38:05.9898631Z     }
2023-01-11T21:38:05.9898736Z }
2023-01-11T21:38:05.9898826Z ''')
2023-01-11T21:38:05.9898831Z 
2023-01-11T21:38:05.9898836Z 
2023-01-11T21:38:05.9898929Z async_compile.wait(globals())
2023-01-11T21:38:05.9899007Z del async_compile
2023-01-11T21:38:05.9899012Z 
2023-01-11T21:38:05.9899079Z def call(args):
2023-01-11T21:38:05.9899159Z     arg0_1, arg1_1 = args
2023-01-11T21:38:05.9899239Z     args.clear()
2023-01-11T21:38:05.9899438Z     buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9899605Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:05.9899679Z     del arg0_1
2023-01-11T21:38:05.9899750Z     del arg1_1
2023-01-11T21:38:05.9899818Z     return (buf0, )
2023-01-11T21:38:05.9899830Z 
2023-01-11T21:38:05.9899834Z 
2023-01-11T21:38:05.9899907Z if __name__ == "__main__":
2023-01-11T21:38:05.9900028Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9900154Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9900350Z     arg0_1 = rand_strided((1, 16), (16, 1), device='cpu', dtype=torch.bool)
2023-01-11T21:38:05.9900548Z     arg1_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9900667Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:05.9900672Z 
2023-01-11T21:38:05.9900746Z ok (3.877s)
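The promotion being tested shows up in the second (int64) kernel as a pair of scalar casts: the fill value is materialized as static_cast<float>(3.5) and then static_cast<long>(tmp1), so integer outputs receive the truncated value 3 before the per-element select. The same cast-then-select order in eager terms:

import torch

x = torch.zeros(4, dtype=torch.int64)
mask = torch.tensor([True, False, True, False])
# the float fill value is converted to the output dtype first (3.5 -> 3),
# then selected per element, mirroring tmp2 = static_cast<long>(tmp1)
filled = torch.where(mask, torch.tensor(3.5).to(torch.int64), x)
print(filled)  # tensor([3, 0, 3, 0])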
2023-01-11T21:38:05.9901198Z test_max_min_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9901330Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9901584Z [2023-01-11 21:29:07,173] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 184
2023-01-11T21:38:05.9901850Z [2023-01-11 21:29:08,858] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 184
2023-01-11T21:38:05.9902294Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9902427Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9902684Z [2023-01-11 21:29:08,874] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 185
2023-01-11T21:38:05.9902947Z [2023-01-11 21:29:08,883] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 185
2023-01-11T21:38:05.9902953Z 
2023-01-11T21:38:05.9903051Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9903129Z import torch
2023-01-11T21:38:05.9903204Z import random
2023-01-11T21:38:05.9903317Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9903445Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9903450Z 
2023-01-11T21:38:05.9903535Z aten = torch.ops.aten
2023-01-11T21:38:05.9903674Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9903770Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9903775Z 
2023-01-11T21:38:05.9903850Z import triton
2023-01-11T21:38:05.9903945Z import triton.language as tl
2023-01-11T21:38:05.9904072Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9904204Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9904210Z 
2023-01-11T21:38:05.9904222Z 
2023-01-11T21:38:05.9904352Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9904556Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9904708Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:05.9904817Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:05.9904921Z                        float* __restrict__ out_ptr0,
2023-01-11T21:38:05.9905028Z                        float* __restrict__ out_ptr1)
2023-01-11T21:38:05.9905099Z {
2023-01-11T21:38:05.9905194Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9905261Z     {
2023-01-11T21:38:05.9905343Z         #pragma omp for
2023-01-11T21:38:05.9905431Z         for(long i0=0; i0<1; i0+=1)
2023-01-11T21:38:05.9905499Z         {
2023-01-11T21:38:05.9905641Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:05.9905778Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:05.9905884Z             auto tmp2 = at::vec::maximum(tmp0, tmp1);
2023-01-11T21:38:05.9905995Z             auto tmp3 = at::vec::minimum(tmp0, tmp1);
2023-01-11T21:38:05.9906094Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:05.9906191Z             tmp3.store(out_ptr1 + 8*i0);
2023-01-11T21:38:05.9906258Z         }
2023-01-11T21:38:05.9906359Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:05.9906446Z         for(long i0=8; i0<8; i0+=1)
2023-01-11T21:38:05.9906507Z         {
2023-01-11T21:38:05.9906598Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:05.9906688Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:05.9906817Z             auto tmp2 = (tmp1 != tmp1) ? tmp1 : std::max(tmp0, tmp1);
2023-01-11T21:38:05.9906942Z             auto tmp3 = (tmp1 != tmp1) ? tmp1 : std::min(tmp0, tmp1);
2023-01-11T21:38:05.9907027Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:05.9907112Z             out_ptr1[i0] = tmp3;
2023-01-11T21:38:05.9907172Z         }
2023-01-11T21:38:05.9907240Z     }
2023-01-11T21:38:05.9907305Z }
2023-01-11T21:38:05.9907392Z ''')
2023-01-11T21:38:05.9907397Z 
2023-01-11T21:38:05.9907402Z 
2023-01-11T21:38:05.9907497Z async_compile.wait(globals())
2023-01-11T21:38:05.9907575Z del async_compile
2023-01-11T21:38:05.9907580Z 
2023-01-11T21:38:05.9907657Z def call(args):
2023-01-11T21:38:05.9907737Z     arg0_1, arg1_1 = args
2023-01-11T21:38:05.9907806Z     args.clear()
2023-01-11T21:38:05.9908033Z     buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9908228Z     buf1 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9908422Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:05.9908495Z     del arg0_1
2023-01-11T21:38:05.9908567Z     del arg1_1
2023-01-11T21:38:05.9908651Z     return (buf0, buf1, )
2023-01-11T21:38:05.9908656Z 
2023-01-11T21:38:05.9908660Z 
2023-01-11T21:38:05.9908734Z if __name__ == "__main__":
2023-01-11T21:38:05.9908853Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9908979Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9909174Z     arg0_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9909365Z     arg1_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9909484Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:05.9909489Z 
2023-01-11T21:38:05.9909496Z 
2023-01-11T21:38:05.9909594Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9909669Z import torch
2023-01-11T21:38:05.9909737Z import random
2023-01-11T21:38:05.9909857Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9909981Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9909986Z 
2023-01-11T21:38:05.9910068Z aten = torch.ops.aten
2023-01-11T21:38:05.9910204Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9910300Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9910305Z 
2023-01-11T21:38:05.9910379Z import triton
2023-01-11T21:38:05.9910500Z import triton.language as tl
2023-01-11T21:38:05.9910618Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9910758Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9910764Z 
2023-01-11T21:38:05.9910768Z 
2023-01-11T21:38:05.9910903Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9911112Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9911236Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:05.9911347Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:05.9911453Z                        float* __restrict__ out_ptr0,
2023-01-11T21:38:05.9911553Z                        float* __restrict__ out_ptr1)
2023-01-11T21:38:05.9911612Z {
2023-01-11T21:38:05.9911713Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9911780Z     {
2023-01-11T21:38:05.9911865Z         #pragma omp for
2023-01-11T21:38:05.9911952Z         for(long i0=0; i0<1; i0+=1)
2023-01-11T21:38:05.9912025Z         {
2023-01-11T21:38:05.9912163Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:05.9912292Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:05.9912407Z             auto tmp2 = at::vec::maximum(tmp0, tmp1);
2023-01-11T21:38:05.9912521Z             auto tmp3 = at::vec::minimum(tmp0, tmp1);
2023-01-11T21:38:05.9912618Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:05.9912715Z             tmp3.store(out_ptr1 + 8*i0);
2023-01-11T21:38:05.9912784Z         }
2023-01-11T21:38:05.9912883Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:05.9912963Z         for(long i0=8; i0<8; i0+=1)
2023-01-11T21:38:05.9913030Z         {
2023-01-11T21:38:05.9913120Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:05.9913211Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:05.9913338Z             auto tmp2 = (tmp1 != tmp1) ? tmp1 : std::max(tmp0, tmp1);
2023-01-11T21:38:05.9913466Z             auto tmp3 = (tmp1 != tmp1) ? tmp1 : std::min(tmp0, tmp1);
2023-01-11T21:38:05.9913554Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:05.9913631Z             out_ptr1[i0] = tmp3;
2023-01-11T21:38:05.9913702Z         }
2023-01-11T21:38:05.9913768Z     }
2023-01-11T21:38:05.9913833Z }
2023-01-11T21:38:05.9913918Z ''')
2023-01-11T21:38:05.9913956Z 
2023-01-11T21:38:05.9913961Z 
2023-01-11T21:38:05.9914057Z async_compile.wait(globals())
2023-01-11T21:38:05.9914135Z del async_compile
2023-01-11T21:38:05.9914141Z 
2023-01-11T21:38:05.9914216Z def call(args):
2023-01-11T21:38:05.9914289Z     arg0_1, arg1_1 = args
2023-01-11T21:38:05.9914365Z     args.clear()
2023-01-11T21:38:05.9914556Z     buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9914747Z     buf1 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9914941Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:05.9915018Z     del arg0_1
2023-01-11T21:38:05.9915090Z     del arg1_1
2023-01-11T21:38:05.9915164Z     return (buf0, buf1, )
2023-01-11T21:38:05.9915169Z 
2023-01-11T21:38:05.9915181Z 
2023-01-11T21:38:05.9915255Z if __name__ == "__main__":
2023-01-11T21:38:05.9915375Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9915502Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9915694Z     arg0_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9915885Z     arg1_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9916009Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:05.9916014Z 
2023-01-11T21:38:05.9916085Z ok (1.727s)
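Note how both branches of the max_min kernel handle NaN: the vector path uses at::vec::maximum/minimum, and the scalar tail spells out the same semantics as (tmp1 != tmp1) ? tmp1 : std::max(tmp0, tmp1) — the self-inequality test detects NaN so it propagates, matching torch.maximum rather than plain std::max. (The tail loop for(long i0=8; i0<8; ...) is empty here because the 8-element input fits exactly one 8-wide vector.) A quick eager check of the semantics being reproduced:

import torch

a = torch.tensor([1.0, 2.0])
b = torch.tensor([float('nan'), 0.0])
print(torch.maximum(a, b))  # tensor([nan, 2.]) -- NaN propagates
print(torch.minimum(a, b))  # tensor([nan, 0.])
print(max(1.0, float('nan')))  # 1.0 -- a bare max() can silently drop the NaN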
2023-01-11T21:38:05.9916541Z test_max_pool2d1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9916695Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9916960Z [2023-01-11 21:29:08,903] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 186
2023-01-11T21:38:05.9917226Z [2023-01-11 21:29:10,682] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 186
2023-01-11T21:38:05.9917232Z 
2023-01-11T21:38:05.9917329Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9917405Z import torch
2023-01-11T21:38:05.9917481Z import random
2023-01-11T21:38:05.9917601Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9917726Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9917731Z 
2023-01-11T21:38:05.9917809Z aten = torch.ops.aten
2023-01-11T21:38:05.9917946Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9918043Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9918048Z 
2023-01-11T21:38:05.9918123Z import triton
2023-01-11T21:38:05.9918218Z import triton.language as tl
2023-01-11T21:38:05.9918346Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9918486Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9918492Z 
2023-01-11T21:38:05.9918496Z 
2023-01-11T21:38:05.9918634Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9918831Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9918954Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:05.9919061Z                        float* __restrict__ out_ptr0,
2023-01-11T21:38:05.9919161Z                        long* __restrict__ out_ptr1)
2023-01-11T21:38:05.9919232Z {
2023-01-11T21:38:05.9919336Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9919406Z     {
2023-01-11T21:38:05.9919480Z         #pragma omp for
2023-01-11T21:38:05.9919565Z         for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:05.9919633Z         {
2023-01-11T21:38:05.9919717Z             #pragma GCC ivdep
2023-01-11T21:38:05.9919833Z             for(long i1=0; i1<7; i1+=1)
2023-01-11T21:38:05.9919903Z             {
2023-01-11T21:38:05.9919990Z                 #pragma GCC ivdep
2023-01-11T21:38:05.9920077Z                 for(long i2=0; i2<7; i2+=1)
2023-01-11T21:38:05.9920148Z                 {
2023-01-11T21:38:05.9920222Z                     {
2023-01-11T21:38:05.9920295Z                         {
2023-01-11T21:38:05.9920413Z                             auto tmp0 = in_ptr0[(2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9920535Z                             auto tmp1 = in_ptr0[1 + (2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9920657Z                             auto tmp3 = in_ptr0[2 + (2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9920773Z                             auto tmp5 = in_ptr0[16 + (2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9920888Z                             auto tmp7 = in_ptr0[17 + (2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9921000Z                             auto tmp9 = in_ptr0[18 + (2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9921121Z                             auto tmp11 = in_ptr0[32 + (2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9921236Z                             auto tmp13 = in_ptr0[33 + (2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9921349Z                             auto tmp15 = in_ptr0[34 + (2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9921486Z                             auto tmp2 = (tmp0 != tmp0) ? tmp0 : std::max(tmp1, tmp0);
2023-01-11T21:38:05.9921615Z                             auto tmp4 = (tmp2 != tmp2) ? tmp2 : std::max(tmp3, tmp2);
2023-01-11T21:38:05.9921748Z                             auto tmp6 = (tmp4 != tmp4) ? tmp4 : std::max(tmp5, tmp4);
2023-01-11T21:38:05.9921905Z                             auto tmp8 = (tmp6 != tmp6) ? tmp6 : std::max(tmp7, tmp6);
2023-01-11T21:38:05.9922040Z                             auto tmp10 = (tmp8 != tmp8) ? tmp8 : std::max(tmp9, tmp8);
2023-01-11T21:38:05.9922179Z                             auto tmp12 = (tmp10 != tmp10) ? tmp10 : std::max(tmp11, tmp10);
2023-01-11T21:38:05.9922318Z                             auto tmp14 = (tmp12 != tmp12) ? tmp12 : std::max(tmp13, tmp12);
2023-01-11T21:38:05.9922452Z                             auto tmp16 = (tmp14 != tmp14) ? tmp14 : std::max(tmp15, tmp14);
2023-01-11T21:38:05.9922576Z                             auto tmp17 = static_cast<long>((2*i2) + (32*i1));
2023-01-11T21:38:05.9922698Z                             auto tmp18 = static_cast<long>(1 + (2*i2) + (32*i1));
2023-01-11T21:38:05.9922800Z                             auto tmp19 = tmp1 > tmp0;
2023-01-11T21:38:05.9922911Z                             auto tmp20 = tmp19 ? tmp18 : tmp17;
2023-01-11T21:38:05.9923030Z                             auto tmp21 = static_cast<long>(2 + (2*i2) + (32*i1));
2023-01-11T21:38:05.9923131Z                             auto tmp22 = tmp3 > tmp2;
2023-01-11T21:38:05.9923241Z                             auto tmp23 = tmp22 ? tmp21 : tmp20;
2023-01-11T21:38:05.9923369Z                             auto tmp24 = static_cast<long>(16 + (2*i2) + (32*i1));
2023-01-11T21:38:05.9923470Z                             auto tmp25 = tmp5 > tmp4;
2023-01-11T21:38:05.9923577Z                             auto tmp26 = tmp25 ? tmp24 : tmp23;
2023-01-11T21:38:05.9923700Z                             auto tmp27 = static_cast<long>(17 + (2*i2) + (32*i1));
2023-01-11T21:38:05.9923801Z                             auto tmp28 = tmp7 > tmp6;
2023-01-11T21:38:05.9923903Z                             auto tmp29 = tmp28 ? tmp27 : tmp26;
2023-01-11T21:38:05.9924026Z                             auto tmp30 = static_cast<long>(18 + (2*i2) + (32*i1));
2023-01-11T21:38:05.9924130Z                             auto tmp31 = tmp9 > tmp8;
2023-01-11T21:38:05.9924241Z                             auto tmp32 = tmp31 ? tmp30 : tmp29;
2023-01-11T21:38:05.9924363Z                             auto tmp33 = static_cast<long>(32 + (2*i2) + (32*i1));
2023-01-11T21:38:05.9924467Z                             auto tmp34 = tmp11 > tmp10;
2023-01-11T21:38:05.9924601Z                             auto tmp35 = tmp34 ? tmp33 : tmp32;
2023-01-11T21:38:05.9924716Z                             auto tmp36 = static_cast<long>(33 + (2*i2) + (32*i1));
2023-01-11T21:38:05.9924817Z                             auto tmp37 = tmp13 > tmp12;
2023-01-11T21:38:05.9924924Z                             auto tmp38 = tmp37 ? tmp36 : tmp35;
2023-01-11T21:38:05.9925045Z                             auto tmp39 = static_cast<long>(34 + (2*i2) + (32*i1));
2023-01-11T21:38:05.9925145Z                             auto tmp40 = tmp15 > tmp14;
2023-01-11T21:38:05.9925251Z                             auto tmp41 = tmp40 ? tmp39 : tmp38;
2023-01-11T21:38:05.9925378Z                             out_ptr0[i2 + (7*i1) + (49*i0)] = tmp16;
2023-01-11T21:38:05.9925500Z                             out_ptr1[i2 + (7*i1) + (49*i0)] = tmp41;
2023-01-11T21:38:05.9925583Z                         }
2023-01-11T21:38:05.9925659Z                     }
2023-01-11T21:38:05.9925728Z                 }
2023-01-11T21:38:05.9925796Z             }
2023-01-11T21:38:05.9925865Z         }
2023-01-11T21:38:05.9925932Z     }
2023-01-11T21:38:05.9925995Z }
2023-01-11T21:38:05.9926075Z ''')
2023-01-11T21:38:05.9926081Z 
2023-01-11T21:38:05.9926085Z 
2023-01-11T21:38:05.9926182Z async_compile.wait(globals())
2023-01-11T21:38:05.9926263Z del async_compile
2023-01-11T21:38:05.9926268Z 
2023-01-11T21:38:05.9926343Z def call(args):
2023-01-11T21:38:05.9926418Z     arg0_1, = args
2023-01-11T21:38:05.9926494Z     args.clear()
2023-01-11T21:38:05.9926708Z     buf0 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9926911Z     buf1 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9927112Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:05.9927185Z     del arg0_1
2023-01-11T21:38:05.9927266Z     return (buf0, buf1, )
2023-01-11T21:38:05.9927271Z 
2023-01-11T21:38:05.9927276Z 
2023-01-11T21:38:05.9927359Z if __name__ == "__main__":
2023-01-11T21:38:05.9927478Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9927604Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9927824Z     arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9927930Z     print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:05.9927940Z 
2023-01-11T21:38:05.9928005Z ok (1.799s)
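The pooling kernel above evaluates a nine-tap, stride-2 window as a chain of NaN-aware maxes (tmp2..tmp16) and, alongside it, replays the comparisons to track the flat argmax index within each 16x16 plane (tmp17..tmp41); the 7x7 output follows from floor((16 - 3)/2) + 1 = 7. An eager call consistent with the kernel's nine taps and stride-2 indexing is the two-output form of max_pool2d:

import torch
import torch.nn.functional as F

x = torch.randn(2, 4, 16, 16)
out, idx = F.max_pool2d(x, kernel_size=3, stride=2, return_indices=True)
print(out.shape, idx.shape)  # torch.Size([2, 4, 7, 7]) torch.Size([2, 4, 7, 7])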
2023-01-11T21:38:05.9928457Z test_max_pool2d2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9928594Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9928854Z [2023-01-11 21:29:10,726] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 187
2023-01-11T21:38:05.9929121Z [2023-01-11 21:29:12,646] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 187
2023-01-11T21:38:05.9929127Z 
2023-01-11T21:38:05.9929225Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9929299Z import torch
2023-01-11T21:38:05.9929374Z import random
2023-01-11T21:38:05.9929485Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9929609Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9929614Z 
2023-01-11T21:38:05.9929696Z aten = torch.ops.aten
2023-01-11T21:38:05.9929834Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9929931Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9929936Z 
2023-01-11T21:38:05.9930012Z import triton
2023-01-11T21:38:05.9930103Z import triton.language as tl
2023-01-11T21:38:05.9930256Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9930389Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9930395Z 
2023-01-11T21:38:05.9930407Z 
2023-01-11T21:38:05.9930537Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9930744Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9930871Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:05.9930979Z                        float* __restrict__ out_ptr0,
2023-01-11T21:38:05.9931082Z                        long* __restrict__ out_ptr1)
2023-01-11T21:38:05.9931152Z {
2023-01-11T21:38:05.9931259Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9931319Z     {
2023-01-11T21:38:05.9931402Z         #pragma omp for
2023-01-11T21:38:05.9931491Z         for(long i0=0; i0<1024; i0+=1)
2023-01-11T21:38:05.9931558Z         {
2023-01-11T21:38:05.9931642Z             #pragma GCC ivdep
2023-01-11T21:38:05.9931735Z             for(long i1=0; i1<27; i1+=1)
2023-01-11T21:38:05.9931802Z             {
2023-01-11T21:38:05.9931883Z                 #pragma GCC ivdep
2023-01-11T21:38:05.9931976Z                 for(long i2=0; i2<27; i2+=1)
2023-01-11T21:38:05.9932046Z                 {
2023-01-11T21:38:05.9932120Z                     {
2023-01-11T21:38:05.9932193Z                         {
2023-01-11T21:38:05.9932314Z                             auto tmp0 = in_ptr0[(2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9932433Z                             auto tmp1 = in_ptr0[1 + (2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9932542Z                             auto tmp3 = in_ptr0[2 + (2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9932706Z                             auto tmp5 = in_ptr0[55 + (2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9932820Z                             auto tmp7 = in_ptr0[56 + (2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9932935Z                             auto tmp9 = in_ptr0[57 + (2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9933057Z                             auto tmp11 = in_ptr0[110 + (2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9933176Z                             auto tmp13 = in_ptr0[111 + (2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9933296Z                             auto tmp15 = in_ptr0[112 + (2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9933433Z                             auto tmp2 = (tmp0 != tmp0) ? tmp0 : std::max(tmp1, tmp0);
2023-01-11T21:38:05.9933554Z                             auto tmp4 = (tmp2 != tmp2) ? tmp2 : std::max(tmp3, tmp2);
2023-01-11T21:38:05.9933683Z                             auto tmp6 = (tmp4 != tmp4) ? tmp4 : std::max(tmp5, tmp4);
2023-01-11T21:38:05.9933815Z                             auto tmp8 = (tmp6 != tmp6) ? tmp6 : std::max(tmp7, tmp6);
2023-01-11T21:38:05.9933948Z                             auto tmp10 = (tmp8 != tmp8) ? tmp8 : std::max(tmp9, tmp8);
2023-01-11T21:38:05.9934091Z                             auto tmp12 = (tmp10 != tmp10) ? tmp10 : std::max(tmp11, tmp10);
2023-01-11T21:38:05.9934227Z                             auto tmp14 = (tmp12 != tmp12) ? tmp12 : std::max(tmp13, tmp12);
2023-01-11T21:38:05.9934359Z                             auto tmp16 = (tmp14 != tmp14) ? tmp14 : std::max(tmp15, tmp14);
2023-01-11T21:38:05.9934589Z                             auto tmp17 = static_cast<long>((2*i2) + (110*i1));
2023-01-11T21:38:05.9934720Z                             auto tmp18 = static_cast<long>(1 + (2*i2) + (110*i1));
2023-01-11T21:38:05.9934815Z                             auto tmp19 = tmp1 > tmp0;
2023-01-11T21:38:05.9934925Z                             auto tmp20 = tmp19 ? tmp18 : tmp17;
2023-01-11T21:38:05.9935051Z                             auto tmp21 = static_cast<long>(2 + (2*i2) + (110*i1));
2023-01-11T21:38:05.9935152Z                             auto tmp22 = tmp3 > tmp2;
2023-01-11T21:38:05.9935261Z                             auto tmp23 = tmp22 ? tmp21 : tmp20;
2023-01-11T21:38:05.9935431Z                             auto tmp24 = static_cast<long>(55 + (2*i2) + (110*i1));
2023-01-11T21:38:05.9935533Z                             auto tmp25 = tmp5 > tmp4;
2023-01-11T21:38:05.9935640Z                             auto tmp26 = tmp25 ? tmp24 : tmp23;
2023-01-11T21:38:05.9935757Z                             auto tmp27 = static_cast<long>(56 + (2*i2) + (110*i1));
2023-01-11T21:38:05.9935863Z                             auto tmp28 = tmp7 > tmp6;
2023-01-11T21:38:05.9935972Z                             auto tmp29 = tmp28 ? tmp27 : tmp26;
2023-01-11T21:38:05.9936099Z                             auto tmp30 = static_cast<long>(57 + (2*i2) + (110*i1));
2023-01-11T21:38:05.9936201Z                             auto tmp31 = tmp9 > tmp8;
2023-01-11T21:38:05.9936310Z                             auto tmp32 = tmp31 ? tmp30 : tmp29;
2023-01-11T21:38:05.9936433Z                             auto tmp33 = static_cast<long>(110 + (2*i2) + (110*i1));
2023-01-11T21:38:05.9936536Z                             auto tmp34 = tmp11 > tmp10;
2023-01-11T21:38:05.9936637Z                             auto tmp35 = tmp34 ? tmp33 : tmp32;
2023-01-11T21:38:05.9936762Z                             auto tmp36 = static_cast<long>(111 + (2*i2) + (110*i1));
2023-01-11T21:38:05.9936865Z                             auto tmp37 = tmp13 > tmp12;
2023-01-11T21:38:05.9936971Z                             auto tmp38 = tmp37 ? tmp36 : tmp35;
2023-01-11T21:38:05.9937091Z                             auto tmp39 = static_cast<long>(112 + (2*i2) + (110*i1));
2023-01-11T21:38:05.9937245Z                             auto tmp40 = tmp15 > tmp14;
2023-01-11T21:38:05.9937352Z                             auto tmp41 = tmp40 ? tmp39 : tmp38;
2023-01-11T21:38:05.9937491Z                             out_ptr0[i2 + (27*i1) + (729*i0)] = tmp16;
2023-01-11T21:38:05.9937601Z                             out_ptr1[i2 + (27*i1) + (729*i0)] = tmp41;
2023-01-11T21:38:05.9937675Z                         }
2023-01-11T21:38:05.9937745Z                     }
2023-01-11T21:38:05.9937815Z                 }
2023-01-11T21:38:05.9937884Z             }
2023-01-11T21:38:05.9937953Z         }
2023-01-11T21:38:05.9938012Z     }
2023-01-11T21:38:05.9938075Z }
2023-01-11T21:38:05.9938164Z ''')
2023-01-11T21:38:05.9938170Z 
2023-01-11T21:38:05.9938174Z 
2023-01-11T21:38:05.9938271Z async_compile.wait(globals())
2023-01-11T21:38:05.9938347Z del async_compile
2023-01-11T21:38:05.9938352Z 
2023-01-11T21:38:05.9938426Z def call(args):
2023-01-11T21:38:05.9938500Z     arg0_1, = args
2023-01-11T21:38:05.9938574Z     args.clear()
2023-01-11T21:38:05.9938792Z     buf0 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9939012Z     buf1 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9939183Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:05.9939256Z     del arg0_1
2023-01-11T21:38:05.9939335Z     return (buf0, buf1, )
2023-01-11T21:38:05.9939340Z 
2023-01-11T21:38:05.9939344Z 
2023-01-11T21:38:05.9939428Z if __name__ == "__main__":
2023-01-11T21:38:05.9939547Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9939674Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9939893Z     arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9940004Z     print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:05.9940009Z 
2023-01-11T21:38:05.9940079Z ok (2.106s)
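Same codegen pattern at a larger size: nine taps with stride-2 indexing over a 55x55 input give the (16, 64, 27, 27) output buffers, since floor((55 - 3)/2) + 1 = 27. A one-liner to sanity-check the output-size arithmetic (floor mode, the default for max_pool2d):

def pool_out(n, k, s, p=0):
    # standard pooling output-size formula
    return (n + 2 * p - k) // s + 1

print(pool_out(16, 3, 2), pool_out(55, 3, 2))  # 7 27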
2023-01-11T21:38:05.9940532Z test_max_pool2d3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9940668Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9940955Z [2023-01-11 21:29:12,816] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 188
2023-01-11T21:38:05.9940962Z 
2023-01-11T21:38:05.9941058Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9941134Z import torch
2023-01-11T21:38:05.9941208Z import random
2023-01-11T21:38:05.9941320Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9941444Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9941449Z 
2023-01-11T21:38:05.9941530Z aten = torch.ops.aten
2023-01-11T21:38:05.9941668Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9941769Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9941774Z 
2023-01-11T21:38:05.9941846Z import triton
2023-01-11T21:38:05.9941938Z import triton.language as tl
2023-01-11T21:38:05.9942054Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9942192Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9942201Z 
2023-01-11T21:38:05.9942205Z 
2023-01-11T21:38:05.9942342Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9942544Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9942669Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:05.9942772Z                        float* __restrict__ out_ptr0,
2023-01-11T21:38:05.9942874Z                        long* __restrict__ out_ptr1)
2023-01-11T21:38:05.9942939Z {
2023-01-11T21:38:05.9943033Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9943100Z     {
2023-01-11T21:38:05.9943212Z         #pragma omp for
2023-01-11T21:38:05.9943299Z         for(long i0=0; i0<4; i0+=1)
2023-01-11T21:38:05.9943367Z         {
2023-01-11T21:38:05.9943452Z             #pragma GCC ivdep
2023-01-11T21:38:05.9943539Z             for(long i1=0; i1<4; i1+=1)
2023-01-11T21:38:05.9943599Z             {
2023-01-11T21:38:05.9943669Z                 {
2023-01-11T21:38:05.9943743Z                     {
2023-01-11T21:38:05.9943921Z                         auto tmp0 = static_cast<long>((-1) + (2*i0));
2023-01-11T21:38:05.9944030Z                         auto tmp1 = static_cast<long>(0);
2023-01-11T21:38:05.9944131Z                         auto tmp2 = tmp0 >= tmp1;
2023-01-11T21:38:05.9944239Z                         auto tmp3 = static_cast<long>(8);
2023-01-11T21:38:05.9944331Z                         auto tmp4 = tmp0 < tmp3;
2023-01-11T21:38:05.9944451Z                         auto tmp5 = tmp2 & tmp4;
2023-01-11T21:38:05.9944686Z                         auto tmp6 = static_cast<long>((-1) + (2*i1));
2023-01-11T21:38:05.9944818Z                         auto tmp7 = tmp6 >= tmp1;
2023-01-11T21:38:05.9944942Z                         auto tmp8 = tmp6 < tmp3;
2023-01-11T21:38:05.9945064Z                         auto tmp9 = tmp7 & tmp8;
2023-01-11T21:38:05.9945187Z                         auto tmp10 = tmp5 & tmp9;
2023-01-11T21:38:05.9945480Z                         float tmp11 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9945590Z                         if(tmp10)
2023-01-11T21:38:05.9945688Z                         {
2023-01-11T21:38:05.9945918Z                             auto tmp12 = in_ptr0[(-9) + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9946033Z                             tmp11 = tmp12;
2023-01-11T21:38:05.9946131Z                         }
2023-01-11T21:38:05.9946275Z                         auto tmp13 = static_cast<long>(2*i1);
2023-01-11T21:38:05.9946409Z                         auto tmp14 = tmp13 >= tmp1;
2023-01-11T21:38:05.9946535Z                         auto tmp15 = tmp13 < tmp3;
2023-01-11T21:38:05.9946673Z                         auto tmp16 = tmp14 & tmp15;
2023-01-11T21:38:05.9946810Z                         auto tmp17 = tmp5 & tmp16;
2023-01-11T21:38:05.9947139Z                         float tmp18 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9947264Z                         if(tmp17)
2023-01-11T21:38:05.9947444Z                         {
2023-01-11T21:38:05.9947719Z                             auto tmp19 = in_ptr0[(-8) + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9947836Z                             tmp18 = tmp19;
2023-01-11T21:38:05.9947936Z                         }
2023-01-11T21:38:05.9948123Z                         auto tmp20 = (tmp11 != tmp11) ? tmp11 : std::max(tmp18, tmp11);
2023-01-11T21:38:05.9948285Z                         auto tmp21 = static_cast<long>(1 + (2*i1));
2023-01-11T21:38:05.9948426Z                         auto tmp22 = tmp21 >= tmp1;
2023-01-11T21:38:05.9948558Z                         auto tmp23 = tmp21 < tmp3;
2023-01-11T21:38:05.9948691Z                         auto tmp24 = tmp22 & tmp23;
2023-01-11T21:38:05.9948828Z                         auto tmp25 = tmp5 & tmp24;
2023-01-11T21:38:05.9949135Z                         float tmp26 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9949251Z                         if(tmp25)
2023-01-11T21:38:05.9949350Z                         {
2023-01-11T21:38:05.9949607Z                             auto tmp27 = in_ptr0[(-7) + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9949729Z                             tmp26 = tmp27;
2023-01-11T21:38:05.9949834Z                         }
2023-01-11T21:38:05.9950024Z                         auto tmp28 = (tmp20 != tmp20) ? tmp20 : std::max(tmp26, tmp20);
2023-01-11T21:38:05.9950170Z                         auto tmp29 = static_cast<long>(2*i0);
2023-01-11T21:38:05.9950308Z                         auto tmp30 = tmp29 >= tmp1;
2023-01-11T21:38:05.9950440Z                         auto tmp31 = tmp29 < tmp3;
2023-01-11T21:38:05.9950584Z                         auto tmp32 = tmp30 & tmp31;
2023-01-11T21:38:05.9950799Z                         auto tmp33 = tmp32 & tmp9;
2023-01-11T21:38:05.9951123Z                         float tmp34 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9951238Z                         if(tmp33)
2023-01-11T21:38:05.9951342Z                         {
2023-01-11T21:38:05.9951574Z                             auto tmp35 = in_ptr0[(-1) + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9951691Z                             tmp34 = tmp35;
2023-01-11T21:38:05.9951790Z                         }
2023-01-11T21:38:05.9951976Z                         auto tmp36 = (tmp28 != tmp28) ? tmp28 : std::max(tmp34, tmp28);
2023-01-11T21:38:05.9952093Z                         auto tmp37 = tmp32 & tmp16;
2023-01-11T21:38:05.9952325Z                         float tmp38 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9952407Z                         if(tmp37)
2023-01-11T21:38:05.9952475Z                         {
2023-01-11T21:38:05.9952591Z                             auto tmp39 = in_ptr0[(2*i1) + (16*i0)];
2023-01-11T21:38:05.9952685Z                             tmp38 = tmp39;
2023-01-11T21:38:05.9952759Z                         }
2023-01-11T21:38:05.9952896Z                         auto tmp40 = (tmp36 != tmp36) ? tmp36 : std::max(tmp38, tmp36);
2023-01-11T21:38:05.9952999Z                         auto tmp41 = tmp32 & tmp24;
2023-01-11T21:38:05.9953226Z                         float tmp42 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9953310Z                         if(tmp41)
2023-01-11T21:38:05.9953377Z                         {
2023-01-11T21:38:05.9953492Z                             auto tmp43 = in_ptr0[1 + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9953582Z                             tmp42 = tmp43;
2023-01-11T21:38:05.9953656Z                         }
2023-01-11T21:38:05.9953790Z                         auto tmp44 = (tmp40 != tmp40) ? tmp40 : std::max(tmp42, tmp40);
2023-01-11T21:38:05.9953908Z                         auto tmp45 = static_cast<long>(1 + (2*i0));
2023-01-11T21:38:05.9954014Z                         auto tmp46 = tmp45 >= tmp1;
2023-01-11T21:38:05.9954107Z                         auto tmp47 = tmp45 < tmp3;
2023-01-11T21:38:05.9954206Z                         auto tmp48 = tmp46 & tmp47;
2023-01-11T21:38:05.9954306Z                         auto tmp49 = tmp48 & tmp9;
2023-01-11T21:38:05.9954607Z                         float tmp50 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9954693Z                         if(tmp49)
2023-01-11T21:38:05.9954765Z                         {
2023-01-11T21:38:05.9954880Z                             auto tmp51 = in_ptr0[7 + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9954970Z                             tmp50 = tmp51;
2023-01-11T21:38:05.9955037Z                         }
2023-01-11T21:38:05.9955171Z                         auto tmp52 = (tmp44 != tmp44) ? tmp44 : std::max(tmp50, tmp44);
2023-01-11T21:38:05.9955271Z                         auto tmp53 = tmp48 & tmp16;
2023-01-11T21:38:05.9955488Z                         float tmp54 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9955574Z                         if(tmp53)
2023-01-11T21:38:05.9955649Z                         {
2023-01-11T21:38:05.9955762Z                             auto tmp55 = in_ptr0[8 + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9955844Z                             tmp54 = tmp55;
2023-01-11T21:38:05.9955922Z                         }
2023-01-11T21:38:05.9956059Z                         auto tmp56 = (tmp52 != tmp52) ? tmp52 : std::max(tmp54, tmp52);
2023-01-11T21:38:05.9956160Z                         auto tmp57 = tmp48 & tmp24;
2023-01-11T21:38:05.9956378Z                         float tmp58 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9956460Z                         if(tmp57)
2023-01-11T21:38:05.9956535Z                         {
2023-01-11T21:38:05.9956654Z                             auto tmp59 = in_ptr0[9 + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9956740Z                             tmp58 = tmp59;
2023-01-11T21:38:05.9956848Z                         }
2023-01-11T21:38:05.9956985Z                         auto tmp60 = (tmp56 != tmp56) ? tmp56 : std::max(tmp58, tmp56);
2023-01-11T21:38:05.9957206Z                         float tmp61 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9957292Z                         if(tmp10)
2023-01-11T21:38:05.9957370Z                         {
2023-01-11T21:38:05.9957547Z                             auto tmp62 = in_ptr0[(-9) + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9957639Z                             tmp61 = tmp62;
2023-01-11T21:38:05.9957707Z                         }
2023-01-11T21:38:05.9957897Z                         auto tmp63 = static_cast<long>((-9) + (2*i1) + (16*i0));
2023-01-11T21:38:05.9958123Z                         float tmp64 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9958206Z                         if(tmp17)
2023-01-11T21:38:05.9958280Z                         {
2023-01-11T21:38:05.9958457Z                             auto tmp65 = in_ptr0[(-8) + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9958556Z                             tmp64 = tmp65;
2023-01-11T21:38:05.9958623Z                         }
2023-01-11T21:38:05.9958814Z                         auto tmp66 = static_cast<long>((-8) + (2*i1) + (16*i0));
2023-01-11T21:38:05.9958917Z                         auto tmp67 = tmp64 > tmp61;
2023-01-11T21:38:05.9959034Z                         auto tmp68 = tmp67 ? tmp66 : tmp63;
2023-01-11T21:38:05.9959174Z                         auto tmp69 = (tmp61 != tmp61) ? tmp61 : std::max(tmp64, tmp61);
2023-01-11T21:38:05.9959398Z                         float tmp70 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9959489Z                         if(tmp25)
2023-01-11T21:38:05.9959568Z                         {
2023-01-11T21:38:05.9959738Z                             auto tmp71 = in_ptr0[(-7) + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9959829Z                             tmp70 = tmp71;
2023-01-11T21:38:05.9959904Z                         }
2023-01-11T21:38:05.9960095Z                         auto tmp72 = static_cast<long>((-7) + (2*i1) + (16*i0));
2023-01-11T21:38:05.9960198Z                         auto tmp73 = tmp70 > tmp69;
2023-01-11T21:38:05.9960310Z                         auto tmp74 = tmp73 ? tmp72 : tmp68;
2023-01-11T21:38:05.9960485Z                         auto tmp75 = (tmp69 != tmp69) ? tmp69 : std::max(tmp70, tmp69);
2023-01-11T21:38:05.9960705Z                         float tmp76 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9960780Z                         if(tmp33)
2023-01-11T21:38:05.9960855Z                         {
2023-01-11T21:38:05.9961029Z                             auto tmp77 = in_ptr0[(-1) + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9961119Z                             tmp76 = tmp77;
2023-01-11T21:38:05.9961191Z                         }
2023-01-11T21:38:05.9961373Z                         auto tmp78 = static_cast<long>((-1) + (2*i1) + (16*i0));
2023-01-11T21:38:05.9961473Z                         auto tmp79 = tmp76 > tmp75;
2023-01-11T21:38:05.9961578Z                         auto tmp80 = tmp79 ? tmp78 : tmp74;
2023-01-11T21:38:05.9961711Z                         auto tmp81 = (tmp75 != tmp75) ? tmp75 : std::max(tmp76, tmp75);
2023-01-11T21:38:05.9961927Z                         float tmp82 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9962011Z                         if(tmp37)
2023-01-11T21:38:05.9962086Z                         {
2023-01-11T21:38:05.9962200Z                             auto tmp83 = in_ptr0[(2*i1) + (16*i0)];
2023-01-11T21:38:05.9962287Z                             tmp82 = tmp83;
2023-01-11T21:38:05.9962359Z                         }
2023-01-11T21:38:05.9962475Z                         auto tmp84 = static_cast<long>((2*i1) + (16*i0));
2023-01-11T21:38:05.9962573Z                         auto tmp85 = tmp82 > tmp81;
2023-01-11T21:38:05.9962684Z                         auto tmp86 = tmp85 ? tmp84 : tmp80;
2023-01-11T21:38:05.9962821Z                         auto tmp87 = (tmp81 != tmp81) ? tmp81 : std::max(tmp82, tmp81);
2023-01-11T21:38:05.9963087Z                         float tmp88 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9963169Z                         if(tmp41)
2023-01-11T21:38:05.9963242Z                         {
2023-01-11T21:38:05.9963350Z                             auto tmp89 = in_ptr0[1 + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9963442Z                             tmp88 = tmp89;
2023-01-11T21:38:05.9963515Z                         }
2023-01-11T21:38:05.9963640Z                         auto tmp90 = static_cast<long>(1 + (2*i1) + (16*i0));
2023-01-11T21:38:05.9963740Z                         auto tmp91 = tmp88 > tmp87;
2023-01-11T21:38:05.9963848Z                         auto tmp92 = tmp91 ? tmp90 : tmp86;
2023-01-11T21:38:05.9963985Z                         auto tmp93 = (tmp87 != tmp87) ? tmp87 : std::max(tmp88, tmp87);
2023-01-11T21:38:05.9964200Z                         float tmp94 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9964275Z                         if(tmp49)
2023-01-11T21:38:05.9964352Z                         {
2023-01-11T21:38:05.9964468Z                             auto tmp95 = in_ptr0[7 + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9964557Z                             tmp94 = tmp95;
2023-01-11T21:38:05.9964629Z                         }
2023-01-11T21:38:05.9964754Z                         auto tmp96 = static_cast<long>(7 + (2*i1) + (16*i0));
2023-01-11T21:38:05.9964855Z                         auto tmp97 = tmp94 > tmp93;
2023-01-11T21:38:05.9964958Z                         auto tmp98 = tmp97 ? tmp96 : tmp92;
2023-01-11T21:38:05.9965093Z                         auto tmp99 = (tmp93 != tmp93) ? tmp93 : std::max(tmp94, tmp93);
2023-01-11T21:38:05.9965338Z                         float tmp100 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9965427Z                         if(tmp53)
2023-01-11T21:38:05.9965515Z                         {
2023-01-11T21:38:05.9965642Z                             auto tmp101 = in_ptr0[8 + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9965739Z                             tmp100 = tmp101;
2023-01-11T21:38:05.9965810Z                         }
2023-01-11T21:38:05.9965926Z                         auto tmp102 = static_cast<long>(8 + (2*i1) + (16*i0));
2023-01-11T21:38:05.9966030Z                         auto tmp103 = tmp100 > tmp99;
2023-01-11T21:38:05.9966177Z                         auto tmp104 = tmp103 ? tmp102 : tmp98;
2023-01-11T21:38:05.9966322Z                         auto tmp105 = (tmp99 != tmp99) ? tmp99 : std::max(tmp100, tmp99);
2023-01-11T21:38:05.9966543Z                         float tmp106 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9966625Z                         if(tmp57)
2023-01-11T21:38:05.9966697Z                         {
2023-01-11T21:38:05.9966813Z                             auto tmp107 = in_ptr0[9 + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9966899Z                             tmp106 = tmp107;
2023-01-11T21:38:05.9966973Z                         }
2023-01-11T21:38:05.9967097Z                         auto tmp108 = static_cast<long>(9 + (2*i1) + (16*i0));
2023-01-11T21:38:05.9967205Z                         auto tmp109 = tmp106 > tmp105;
2023-01-11T21:38:05.9967317Z                         auto tmp110 = tmp109 ? tmp108 : tmp104;
2023-01-11T21:38:05.9967462Z                         auto tmp111 = (tmp105 != tmp105) ?
tmp105 : std::max(tmp106, tmp105); 2023-01-11T21:38:05.9967567Z out_ptr0[i1 + (4*i0)] = tmp60; 2023-01-11T21:38:05.9967663Z out_ptr1[i1 + (4*i0)] = tmp110; 2023-01-11T21:38:05.9967739Z } 2023-01-11T21:38:05.9967808Z } 2023-01-11T21:38:05.9967876Z } 2023-01-11T21:38:05.9967943Z } 2023-01-11T21:38:05.9968011Z } 2023-01-11T21:38:05.9968074Z } 2023-01-11T21:38:05.9968152Z ''') 2023-01-11T21:38:05.9968159Z 2023-01-11T21:38:05.9968163Z 2023-01-11T21:38:05.9968259Z async_compile.wait(globals()) 2023-01-11T21:38:05.9968338Z del async_compile 2023-01-11T21:38:05.9968343Z 2023-01-11T21:38:05.9968448Z def call(args): 2023-01-11T21:38:05.9968520Z arg0_1, = args 2023-01-11T21:38:05.9968595Z args.clear() 2023-01-11T21:38:05.9968808Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9969024Z buf1 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9969187Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9969261Z del arg0_1 2023-01-11T21:38:05.9969342Z return (buf0, buf1, ) 2023-01-11T21:38:05.9969347Z 2023-01-11T21:38:05.9969351Z 2023-01-11T21:38:05.9969432Z if __name__ == "__main__": 2023-01-11T21:38:05.9969551Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9969680Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9969893Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9970015Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9970279Z [2023-01-11 21:29:14,938] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 188 2023-01-11T21:38:05.9970285Z 2023-01-11T21:38:05.9970358Z ok (2.170s) 2023-01-11T21:38:05.9970822Z test_max_pool2d4_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
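The kernel for graph 188 above folds each window tap into its running maximum with the pattern (a != a) ? a : std::max(b, a): since a != a is true only when a is NaN, a NaN that enters the window result stays there, matching eager max_pool2d semantics. A minimal sketch of that behavior, assuming the pool parameters (kernel 3, stride 2, padding 1) inferred from the (-1) + (2*i0) index arithmetic rather than taken from the test source:

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8)
x[0, 0, 3, 3] = float("nan")
# NaN is sticky: every window that contains x[..., 3, 3] pools to NaN
out, idx = F.max_pool2d(x, kernel_size=3, stride=2, padding=1, return_indices=True)
assert out.isnan().any() and not out.isnan().all()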
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9970956Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9971216Z [2023-01-11 21:29:14,991] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 189 2023-01-11T21:38:05.9971481Z [2023-01-11 21:29:17,146] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 189 2023-01-11T21:38:05.9971489Z 2023-01-11T21:38:05.9971588Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9971664Z import torch 2023-01-11T21:38:05.9971740Z import random 2023-01-11T21:38:05.9971852Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9972048Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9972054Z 2023-01-11T21:38:05.9972137Z aten = torch.ops.aten 2023-01-11T21:38:05.9972276Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9972377Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9972382Z 2023-01-11T21:38:05.9972458Z import triton 2023-01-11T21:38:05.9972550Z import triton.language as tl 2023-01-11T21:38:05.9972675Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9972810Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9972815Z 2023-01-11T21:38:05.9972820Z 2023-01-11T21:38:05.9972960Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9973168Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9973298Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9973404Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9973509Z long* __restrict__ out_ptr1) 2023-01-11T21:38:05.9973575Z { 2023-01-11T21:38:05.9973678Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9973738Z { 2023-01-11T21:38:05.9973820Z #pragma omp for 2023-01-11T21:38:05.9973908Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.9973977Z { 2023-01-11T21:38:05.9974062Z #pragma GCC ivdep 2023-01-11T21:38:05.9974152Z for(long i1=0; i1<55; i1+=1) 2023-01-11T21:38:05.9974213Z { 2023-01-11T21:38:05.9974301Z #pragma GCC ivdep 2023-01-11T21:38:05.9974397Z for(long i2=0; i2<55; i2+=1) 2023-01-11T21:38:05.9974646Z { 2023-01-11T21:38:05.9974721Z { 2023-01-11T21:38:05.9974794Z { 2023-01-11T21:38:05.9974918Z auto tmp0 = in_ptr0[(2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9975038Z auto tmp1 = in_ptr0[1 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9975156Z auto tmp3 = in_ptr0[2 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9975278Z auto tmp5 = in_ptr0[111 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9975401Z auto tmp7 = in_ptr0[112 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9975521Z auto tmp9 = in_ptr0[113 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9975642Z auto tmp11 = in_ptr0[222 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9975764Z auto tmp13 = in_ptr0[223 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9975890Z auto tmp15 = in_ptr0[224 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9976032Z auto tmp2 = (tmp0 != tmp0) ? tmp0 : std::max(tmp1, tmp0); 2023-01-11T21:38:05.9976162Z auto tmp4 = (tmp2 != tmp2) ? tmp2 : std::max(tmp3, tmp2); 2023-01-11T21:38:05.9976292Z auto tmp6 = (tmp4 != tmp4) ? tmp4 : std::max(tmp5, tmp4); 2023-01-11T21:38:05.9976422Z auto tmp8 = (tmp6 != tmp6) ? 
tmp6 : std::max(tmp7, tmp6); 2023-01-11T21:38:05.9976556Z auto tmp10 = (tmp8 != tmp8) ? tmp8 : std::max(tmp9, tmp8); 2023-01-11T21:38:05.9976697Z auto tmp12 = (tmp10 != tmp10) ? tmp10 : std::max(tmp11, tmp10); 2023-01-11T21:38:05.9976835Z auto tmp14 = (tmp12 != tmp12) ? tmp12 : std::max(tmp13, tmp12); 2023-01-11T21:38:05.9976967Z auto tmp16 = (tmp14 != tmp14) ? tmp14 : std::max(tmp15, tmp14); 2023-01-11T21:38:05.9977096Z auto tmp17 = static_cast((2*i2) + (222*i1)); 2023-01-11T21:38:05.9977296Z auto tmp18 = static_cast(1 + (2*i2) + (222*i1)); 2023-01-11T21:38:05.9977453Z auto tmp19 = tmp1 > tmp0; 2023-01-11T21:38:05.9977575Z auto tmp20 = tmp19 ? tmp18 : tmp17; 2023-01-11T21:38:05.9977712Z auto tmp21 = static_cast(2 + (2*i2) + (222*i1)); 2023-01-11T21:38:05.9977820Z auto tmp22 = tmp3 > tmp2; 2023-01-11T21:38:05.9977937Z auto tmp23 = tmp22 ? tmp21 : tmp20; 2023-01-11T21:38:05.9978073Z auto tmp24 = static_cast(111 + (2*i2) + (222*i1)); 2023-01-11T21:38:05.9978178Z auto tmp25 = tmp5 > tmp4; 2023-01-11T21:38:05.9978294Z auto tmp26 = tmp25 ? tmp24 : tmp23; 2023-01-11T21:38:05.9978429Z auto tmp27 = static_cast(112 + (2*i2) + (222*i1)); 2023-01-11T21:38:05.9978536Z auto tmp28 = tmp7 > tmp6; 2023-01-11T21:38:05.9978653Z auto tmp29 = tmp28 ? tmp27 : tmp26; 2023-01-11T21:38:05.9978790Z auto tmp30 = static_cast(113 + (2*i2) + (222*i1)); 2023-01-11T21:38:05.9978897Z auto tmp31 = tmp9 > tmp8; 2023-01-11T21:38:05.9979012Z auto tmp32 = tmp31 ? tmp30 : tmp29; 2023-01-11T21:38:05.9979148Z auto tmp33 = static_cast(222 + (2*i2) + (222*i1)); 2023-01-11T21:38:05.9979249Z auto tmp34 = tmp11 > tmp10; 2023-01-11T21:38:05.9979363Z auto tmp35 = tmp34 ? tmp33 : tmp32; 2023-01-11T21:38:05.9979495Z auto tmp36 = static_cast(223 + (2*i2) + (222*i1)); 2023-01-11T21:38:05.9979639Z auto tmp37 = tmp13 > tmp12; 2023-01-11T21:38:05.9979752Z auto tmp38 = tmp37 ? tmp36 : tmp35; 2023-01-11T21:38:05.9979883Z auto tmp39 = static_cast(224 + (2*i2) + (222*i1)); 2023-01-11T21:38:05.9979989Z auto tmp40 = tmp15 > tmp14; 2023-01-11T21:38:05.9980105Z auto tmp41 = tmp40 ? 
tmp39 : tmp38; 2023-01-11T21:38:05.9980217Z out_ptr0[i2 + (55*i1) + (3025*i0)] = tmp16; 2023-01-11T21:38:05.9980337Z out_ptr1[i2 + (55*i1) + (3025*i0)] = tmp41; 2023-01-11T21:38:05.9980412Z } 2023-01-11T21:38:05.9980484Z } 2023-01-11T21:38:05.9980554Z } 2023-01-11T21:38:05.9980621Z } 2023-01-11T21:38:05.9980688Z } 2023-01-11T21:38:05.9980747Z } 2023-01-11T21:38:05.9980810Z } 2023-01-11T21:38:05.9980905Z ''') 2023-01-11T21:38:05.9980915Z 2023-01-11T21:38:05.9980919Z 2023-01-11T21:38:05.9981019Z async_compile.wait(globals()) 2023-01-11T21:38:05.9981097Z del async_compile 2023-01-11T21:38:05.9981102Z 2023-01-11T21:38:05.9981177Z def call(args): 2023-01-11T21:38:05.9981254Z arg0_1, = args 2023-01-11T21:38:05.9981322Z args.clear() 2023-01-11T21:38:05.9981586Z buf0 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9981806Z buf1 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9981976Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9982050Z del arg0_1 2023-01-11T21:38:05.9982132Z return (buf0, buf1, ) 2023-01-11T21:38:05.9982138Z 2023-01-11T21:38:05.9982142Z 2023-01-11T21:38:05.9982222Z if __name__ == "__main__": 2023-01-11T21:38:05.9982342Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9982461Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9982695Z arg0_1 = rand_strided((2, 8, 111, 111), (98568, 12321, 111, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9982809Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9982814Z 2023-01-11T21:38:05.9982886Z ok (2.327s) 2023-01-11T21:38:05.9983375Z test_max_pool2d5_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
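From the wrapper above, graph 189 maps a (2, 8, 111, 111) float input to (2, 8, 55, 55) values plus int64 indices, and the taps at offsets 0..2, 111..113, and 222..224 along (2*i2) + (222*i1) are consistent with a 3x3 window at stride 2 and no padding. A hedged reconstruction of what test_max_pool2d4 appears to compile (parameters inferred from the generated code, not taken from the test source):

import torch
import torch.nn.functional as F

def pool(x):
    return F.max_pool2d(x, kernel_size=3, stride=2, return_indices=True)

x = torch.randn(2, 8, 111, 111)
eager_out, eager_idx = pool(x)
# torch.compile routes through torch._inductor, as in this log
compiled_out, compiled_idx = torch.compile(pool)(x)
torch.testing.assert_close(compiled_out, eager_out)
torch.testing.assert_close(compiled_idx, eager_idx)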
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9983513Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9983773Z [2023-01-11 21:29:17,355] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 190 2023-01-11T21:38:05.9984041Z [2023-01-11 21:29:19,541] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 190 2023-01-11T21:38:05.9984049Z 2023-01-11T21:38:05.9984148Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9984222Z import torch 2023-01-11T21:38:05.9984290Z import random 2023-01-11T21:38:05.9984411Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9984537Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9984542Z 2023-01-11T21:38:05.9984623Z aten = torch.ops.aten 2023-01-11T21:38:05.9984759Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9984855Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9984860Z 2023-01-11T21:38:05.9984934Z import triton 2023-01-11T21:38:05.9985020Z import triton.language as tl 2023-01-11T21:38:05.9985152Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9985319Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9985358Z 2023-01-11T21:38:05.9985364Z 2023-01-11T21:38:05.9985517Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9985728Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9985852Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9985961Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9986064Z long* __restrict__ out_ptr1) 2023-01-11T21:38:05.9986122Z { 2023-01-11T21:38:05.9986224Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9986290Z { 2023-01-11T21:38:05.9986372Z #pragma omp for 2023-01-11T21:38:05.9986461Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.9986529Z { 2023-01-11T21:38:05.9986616Z #pragma GCC ivdep 2023-01-11T21:38:05.9986700Z for(long i1=0; i1<18; i1+=1) 2023-01-11T21:38:05.9986768Z { 2023-01-11T21:38:05.9986857Z #pragma GCC ivdep 2023-01-11T21:38:05.9986956Z for(long i2=0; i2<18; i2+=1) 2023-01-11T21:38:05.9987027Z { 2023-01-11T21:38:05.9987100Z { 2023-01-11T21:38:05.9987172Z { 2023-01-11T21:38:05.9987290Z auto tmp0 = in_ptr0[(3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9987417Z auto tmp1 = in_ptr0[1 + (3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9987536Z auto tmp3 = in_ptr0[2 + (3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9987660Z auto tmp5 = in_ptr0[55 + (3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9987778Z auto tmp7 = in_ptr0[56 + (3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9987892Z auto tmp9 = in_ptr0[57 + (3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9988015Z auto tmp11 = in_ptr0[110 + (3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9988145Z auto tmp13 = in_ptr0[111 + (3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9988261Z auto tmp15 = in_ptr0[112 + (3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9988402Z auto tmp2 = (tmp0 != tmp0) ? tmp0 : std::max(tmp1, tmp0); 2023-01-11T21:38:05.9988563Z auto tmp4 = (tmp2 != tmp2) ? tmp2 : std::max(tmp3, tmp2); 2023-01-11T21:38:05.9988693Z auto tmp6 = (tmp4 != tmp4) ? tmp4 : std::max(tmp5, tmp4); 2023-01-11T21:38:05.9988822Z auto tmp8 = (tmp6 != tmp6) ? 
tmp6 : std::max(tmp7, tmp6); 2023-01-11T21:38:05.9988957Z auto tmp10 = (tmp8 != tmp8) ? tmp8 : std::max(tmp9, tmp8); 2023-01-11T21:38:05.9989099Z auto tmp12 = (tmp10 != tmp10) ? tmp10 : std::max(tmp11, tmp10); 2023-01-11T21:38:05.9989238Z auto tmp14 = (tmp12 != tmp12) ? tmp12 : std::max(tmp13, tmp12); 2023-01-11T21:38:05.9989377Z auto tmp16 = (tmp14 != tmp14) ? tmp14 : std::max(tmp15, tmp14); 2023-01-11T21:38:05.9989495Z auto tmp17 = static_cast((3*i2) + (165*i1)); 2023-01-11T21:38:05.9989627Z auto tmp18 = static_cast(1 + (3*i2) + (165*i1)); 2023-01-11T21:38:05.9989731Z auto tmp19 = tmp1 > tmp0; 2023-01-11T21:38:05.9989844Z auto tmp20 = tmp19 ? tmp18 : tmp17; 2023-01-11T21:38:05.9989973Z auto tmp21 = static_cast(2 + (3*i2) + (165*i1)); 2023-01-11T21:38:05.9990079Z auto tmp22 = tmp3 > tmp2; 2023-01-11T21:38:05.9990189Z auto tmp23 = tmp22 ? tmp21 : tmp20; 2023-01-11T21:38:05.9990317Z auto tmp24 = static_cast(55 + (3*i2) + (165*i1)); 2023-01-11T21:38:05.9990412Z auto tmp25 = tmp5 > tmp4; 2023-01-11T21:38:05.9990548Z auto tmp26 = tmp25 ? tmp24 : tmp23; 2023-01-11T21:38:05.9990674Z auto tmp27 = static_cast(56 + (3*i2) + (165*i1)); 2023-01-11T21:38:05.9990776Z auto tmp28 = tmp7 > tmp6; 2023-01-11T21:38:05.9990885Z auto tmp29 = tmp28 ? tmp27 : tmp26; 2023-01-11T21:38:05.9991011Z auto tmp30 = static_cast(57 + (3*i2) + (165*i1)); 2023-01-11T21:38:05.9991113Z auto tmp31 = tmp9 > tmp8; 2023-01-11T21:38:05.9991224Z auto tmp32 = tmp31 ? tmp30 : tmp29; 2023-01-11T21:38:05.9991343Z auto tmp33 = static_cast(110 + (3*i2) + (165*i1)); 2023-01-11T21:38:05.9991448Z auto tmp34 = tmp11 > tmp10; 2023-01-11T21:38:05.9991560Z auto tmp35 = tmp34 ? tmp33 : tmp32; 2023-01-11T21:38:05.9991688Z auto tmp36 = static_cast(111 + (3*i2) + (165*i1)); 2023-01-11T21:38:05.9991794Z auto tmp37 = tmp13 > tmp12; 2023-01-11T21:38:05.9991904Z auto tmp38 = tmp37 ? tmp36 : tmp35; 2023-01-11T21:38:05.9992027Z auto tmp39 = static_cast(112 + (3*i2) + (165*i1)); 2023-01-11T21:38:05.9992132Z auto tmp40 = tmp15 > tmp14; 2023-01-11T21:38:05.9992234Z auto tmp41 = tmp40 ? 
tmp39 : tmp38; 2023-01-11T21:38:05.9992349Z out_ptr0[i2 + (18*i1) + (324*i0)] = tmp16; 2023-01-11T21:38:05.9992461Z out_ptr1[i2 + (18*i1) + (324*i0)] = tmp41; 2023-01-11T21:38:05.9992535Z } 2023-01-11T21:38:05.9992610Z } 2023-01-11T21:38:05.9992680Z } 2023-01-11T21:38:05.9992748Z } 2023-01-11T21:38:05.9992808Z } 2023-01-11T21:38:05.9992875Z } 2023-01-11T21:38:05.9992938Z } 2023-01-11T21:38:05.9993028Z ''') 2023-01-11T21:38:05.9993034Z 2023-01-11T21:38:05.9993038Z 2023-01-11T21:38:05.9993133Z async_compile.wait(globals()) 2023-01-11T21:38:05.9993210Z del async_compile 2023-01-11T21:38:05.9993215Z 2023-01-11T21:38:05.9993290Z def call(args): 2023-01-11T21:38:05.9993357Z arg0_1, = args 2023-01-11T21:38:05.9993430Z args.clear() 2023-01-11T21:38:05.9993690Z buf0 = empty_strided((16, 64, 18, 18), (20736, 324, 18, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9993912Z buf1 = empty_strided((16, 64, 18, 18), (20736, 324, 18, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9994081Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9994155Z del arg0_1 2023-01-11T21:38:05.9994236Z return (buf0, buf1, ) 2023-01-11T21:38:05.9994241Z 2023-01-11T21:38:05.9994245Z 2023-01-11T21:38:05.9994326Z if __name__ == "__main__": 2023-01-11T21:38:05.9994439Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9994571Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9994806Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9994923Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9994928Z 2023-01-11T21:38:05.9995001Z ok (2.284s) 2023-01-11T21:38:05.9995463Z test_max_pool2d6_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
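Alongside the running maximum, these kernels carry a running argmax: each tap's flat input offset is selected with a strict > comparison against the running maximum before it is updated, so ties resolve to the earliest tap, and the NaN check keeps the running value sticky. A Python paraphrase of the per-window reduction (the helper name is illustrative, not from the inductor source):

def window_max_argmax(taps):
    # taps: (value, flat_input_index) pairs in the order the kernel visits them
    best_v, best_i = taps[0]
    for v, i in taps[1:]:
        if v > best_v:                   # strict '>': ties keep the earliest index
            best_i = i
        if best_v == best_v:             # a NaN running max is sticky
            best_v = best_v if v < best_v else v   # same result as std::max(v, best_v)
    return best_v, best_i

print(window_max_argmax([(1.0, 10), (3.0, 11), (3.0, 12)]))  # (3.0, 11)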
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9995598Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9995857Z [2023-01-11 21:29:19,612] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 191 2023-01-11T21:38:05.9996136Z [2023-01-11 21:29:19,618] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.max_pool2d_with_indices 2023-01-11T21:38:05.9996394Z [2023-01-11 21:29:19,620] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 191 2023-01-11T21:38:05.9996408Z 2023-01-11T21:38:05.9996503Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9996578Z import torch 2023-01-11T21:38:05.9996652Z import random 2023-01-11T21:38:05.9996773Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9996899Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9996904Z 2023-01-11T21:38:05.9996987Z aten = torch.ops.aten 2023-01-11T21:38:05.9997124Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9997214Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9997219Z 2023-01-11T21:38:05.9997293Z import triton 2023-01-11T21:38:05.9997388Z import triton.language as tl 2023-01-11T21:38:05.9997518Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9997660Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9997666Z 2023-01-11T21:38:05.9997670Z 2023-01-11T21:38:05.9997765Z async_compile.wait(globals()) 2023-01-11T21:38:05.9997843Z del async_compile 2023-01-11T21:38:05.9997848Z 2023-01-11T21:38:05.9997926Z def call(args): 2023-01-11T21:38:05.9997993Z arg0_1, = args 2023-01-11T21:38:05.9998068Z args.clear() 2023-01-11T21:38:05.9998203Z buf0 = aten.max_pool2d_with_indices(arg0_1, [13, 13], [13, 13], [0, 0], 1, False) 2023-01-11T21:38:05.9998276Z del arg0_1 2023-01-11T21:38:05.9998349Z buf1 = buf0[0] 2023-01-11T21:38:05.9998464Z assert_size_stride(buf1, (16, 64, 4, 4), (1024, 16, 4, 1)) 2023-01-11T21:38:05.9998537Z buf2 = buf0[1] 2023-01-11T21:38:05.9998644Z assert_size_stride(buf2, (16, 64, 4, 4), (1024, 16, 4, 1)) 2023-01-11T21:38:05.9998713Z del buf0 2023-01-11T21:38:05.9998794Z return (buf1, buf2, ) 2023-01-11T21:38:05.9998802Z 2023-01-11T21:38:05.9998806Z 2023-01-11T21:38:05.9998886Z if __name__ == "__main__": 2023-01-11T21:38:05.9999006Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9999133Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9999394Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9999508Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9999513Z 2023-01-11T21:38:05.9999577Z ok (0.075s) 2023-01-11T21:38:06.0000055Z test_max_pool2d_with_indices_backward2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
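test_max_pool2d6 is the first case above that produces no C++ kernel at all: a 13x13 window would unroll 169 taps per output element, so inductor lowers it to a direct aten.max_pool2d_with_indices call instead (the "Using FallbackKernel" warning), which is also why the test finishes in 0.075s. A hedged repro; the exact unrolling threshold is an implementation detail of torch._inductor.lowering:

import torch
import torch.nn.functional as F

pool = torch.compile(
    lambda x: F.max_pool2d(x, kernel_size=13, stride=13, return_indices=True))
out, idx = pool(torch.randn(16, 64, 55, 55))
print(out.shape)  # torch.Size([16, 64, 4, 4]), matching the assert_size_stride above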
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0000192Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0000450Z [2023-01-11 21:29:19,667] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 192 2023-01-11T21:38:06.0000718Z [2023-01-11 21:29:21,584] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 192 2023-01-11T21:38:06.0000724Z 2023-01-11T21:38:06.0000821Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0000896Z import torch 2023-01-11T21:38:06.0000971Z import random 2023-01-11T21:38:06.0001091Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0001208Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0001213Z 2023-01-11T21:38:06.0001294Z aten = torch.ops.aten 2023-01-11T21:38:06.0001431Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0001527Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0001532Z 2023-01-11T21:38:06.0001605Z import triton 2023-01-11T21:38:06.0001739Z import triton.language as tl 2023-01-11T21:38:06.0001864Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0001998Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0002011Z 2023-01-11T21:38:06.0002015Z 2023-01-11T21:38:06.0002146Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0002357Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0002482Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:06.0002594Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0002700Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0002765Z { 2023-01-11T21:38:06.0002868Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0002927Z { 2023-01-11T21:38:06.0003010Z #pragma omp for 2023-01-11T21:38:06.0003097Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0003165Z { 2023-01-11T21:38:06.0003252Z #pragma GCC ivdep 2023-01-11T21:38:06.0003346Z for(long i1=0; i1<40; i1+=1) 2023-01-11T21:38:06.0003416Z { 2023-01-11T21:38:06.0003495Z #pragma GCC ivdep 2023-01-11T21:38:06.0003591Z for(long i2=0; i2<56; i2+=1) 2023-01-11T21:38:06.0003664Z { 2023-01-11T21:38:06.0003742Z { 2023-01-11T21:38:06.0003815Z { 2023-01-11T21:38:06.0003938Z auto tmp0 = static_cast(i2 + (56*i1)); 2023-01-11T21:38:06.0004057Z auto tmp1 = static_cast((i1 / 2)); 2023-01-11T21:38:06.0004165Z auto tmp2 = static_cast((i2 / 2)); 2023-01-11T21:38:06.0004290Z auto tmp3 = static_cast(1 + (((1 + i1) / 2))); 2023-01-11T21:38:06.0004411Z auto tmp4 = static_cast(1 + (((1 + i2) / 2))); 2023-01-11T21:38:06.0004526Z auto tmp5 = static_cast(0); 2023-01-11T21:38:06.0004669Z auto tmp6 = (tmp5 != tmp5) ? tmp5 : std::max(tmp1, tmp5); 2023-01-11T21:38:06.0004803Z auto tmp7 = (tmp5 != tmp5) ? tmp5 : std::max(tmp2, tmp5); 2023-01-11T21:38:06.0004917Z auto tmp8 = static_cast(21); 2023-01-11T21:38:06.0005077Z auto tmp9 = (tmp8 != tmp8) ? tmp8 : std::min(tmp3, tmp8); 2023-01-11T21:38:06.0005186Z auto tmp10 = static_cast(29); 2023-01-11T21:38:06.0005331Z auto tmp11 = (tmp10 != tmp10) ? 
tmp10 : std::min(tmp4, tmp10); 2023-01-11T21:38:06.0005459Z auto tmp12 = tmp6 + tmp5; 2023-01-11T21:38:06.0005571Z auto tmp13 = tmp7 + tmp5; 2023-01-11T21:38:06.0005701Z auto tmp14 = static_cast(1); 2023-01-11T21:38:06.0005857Z auto tmp15 = tmp9 - tmp14; 2023-01-11T21:38:06.0006000Z auto tmp16 = (tmp15 != tmp15) ? tmp15 : std::min(tmp12, tmp15); 2023-01-11T21:38:06.0006152Z auto tmp17 = tmp11 - tmp14; 2023-01-11T21:38:06.0006289Z auto tmp18 = (tmp17 != tmp17) ? tmp17 : std::min(tmp13, tmp17); 2023-01-11T21:38:06.0006412Z auto tmp19 = in_ptr0[tmp18 + (29*tmp16) + (609*i0)]; 2023-01-11T21:38:06.0006541Z auto tmp20 = in_ptr1[tmp18 + (29*tmp16) + (609*i0)]; 2023-01-11T21:38:06.0006645Z auto tmp21 = tmp19 == tmp0; 2023-01-11T21:38:06.0006762Z auto tmp22 = static_cast(0.0); 2023-01-11T21:38:06.0006875Z auto tmp23 = tmp21 ? tmp20 : tmp22; 2023-01-11T21:38:06.0006979Z auto tmp24 = tmp7 + tmp14; 2023-01-11T21:38:06.0007115Z auto tmp25 = (tmp17 != tmp17) ? tmp17 : std::min(tmp24, tmp17); 2023-01-11T21:38:06.0007268Z auto tmp26 = in_ptr0[tmp25 + (29*tmp16) + (609*i0)]; 2023-01-11T21:38:06.0007383Z auto tmp27 = in_ptr1[tmp25 + (29*tmp16) + (609*i0)]; 2023-01-11T21:38:06.0007487Z auto tmp28 = tmp26 == tmp0; 2023-01-11T21:38:06.0007599Z auto tmp29 = tmp12 < tmp9; 2023-01-11T21:38:06.0007700Z auto tmp30 = tmp24 < tmp11; 2023-01-11T21:38:06.0007802Z auto tmp31 = tmp29 & tmp30; 2023-01-11T21:38:06.0007902Z auto tmp32 = tmp31 & tmp28; 2023-01-11T21:38:06.0008003Z auto tmp33 = tmp23 + tmp27; 2023-01-11T21:38:06.0008108Z auto tmp34 = tmp32 ? tmp33 : tmp23; 2023-01-11T21:38:06.0008212Z auto tmp35 = tmp6 + tmp14; 2023-01-11T21:38:06.0008351Z auto tmp36 = (tmp15 != tmp15) ? tmp15 : std::min(tmp35, tmp15); 2023-01-11T21:38:06.0008481Z auto tmp37 = in_ptr0[tmp18 + (29*tmp36) + (609*i0)]; 2023-01-11T21:38:06.0008606Z auto tmp38 = in_ptr1[tmp18 + (29*tmp36) + (609*i0)]; 2023-01-11T21:38:06.0008708Z auto tmp39 = tmp37 == tmp0; 2023-01-11T21:38:06.0008814Z auto tmp40 = tmp35 < tmp9; 2023-01-11T21:38:06.0008917Z auto tmp41 = tmp13 < tmp11; 2023-01-11T21:38:06.0009009Z auto tmp42 = tmp40 & tmp41; 2023-01-11T21:38:06.0009112Z auto tmp43 = tmp42 & tmp39; 2023-01-11T21:38:06.0009213Z auto tmp44 = tmp34 + tmp38; 2023-01-11T21:38:06.0009324Z auto tmp45 = tmp43 ? tmp44 : tmp34; 2023-01-11T21:38:06.0009449Z auto tmp46 = in_ptr0[tmp25 + (29*tmp36) + (609*i0)]; 2023-01-11T21:38:06.0009573Z auto tmp47 = in_ptr1[tmp25 + (29*tmp36) + (609*i0)]; 2023-01-11T21:38:06.0009677Z auto tmp48 = tmp46 == tmp0; 2023-01-11T21:38:06.0009776Z auto tmp49 = tmp40 & tmp30; 2023-01-11T21:38:06.0009868Z auto tmp50 = tmp49 & tmp48; 2023-01-11T21:38:06.0010004Z auto tmp51 = tmp45 + tmp47; 2023-01-11T21:38:06.0010120Z auto tmp52 = tmp50 ? 
tmp51 : tmp45; 2023-01-11T21:38:06.0010235Z out_ptr0[i2 + (56*i1) + (2240*i0)] = tmp52; 2023-01-11T21:38:06.0010312Z } 2023-01-11T21:38:06.0010384Z } 2023-01-11T21:38:06.0010455Z } 2023-01-11T21:38:06.0010515Z } 2023-01-11T21:38:06.0010583Z } 2023-01-11T21:38:06.0010651Z } 2023-01-11T21:38:06.0010717Z } 2023-01-11T21:38:06.0010804Z ''') 2023-01-11T21:38:06.0010810Z 2023-01-11T21:38:06.0010814Z 2023-01-11T21:38:06.0010910Z async_compile.wait(globals()) 2023-01-11T21:38:06.0010992Z del async_compile 2023-01-11T21:38:06.0010997Z 2023-01-11T21:38:06.0011064Z def call(args): 2023-01-11T21:38:06.0011151Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.0011230Z args.clear() 2023-01-11T21:38:06.0011454Z buf0 = empty_strided((2, 4, 40, 56), (8960, 2240, 56, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0011626Z kernel_cpp_0(c_void_p(arg2_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0011700Z del arg0_1 2023-01-11T21:38:06.0011771Z del arg2_1 2023-01-11T21:38:06.0011839Z return (buf0, ) 2023-01-11T21:38:06.0011845Z 2023-01-11T21:38:06.0011856Z 2023-01-11T21:38:06.0011931Z if __name__ == "__main__": 2023-01-11T21:38:06.0012050Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0012177Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0012396Z arg0_1 = rand_strided((2, 4, 21, 29), (2436, 609, 29, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0012649Z arg1_1 = rand_strided((2, 4, 40, 56), (8960, 2240, 56, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0012866Z arg2_1 = rand_strided((2, 4, 21, 29), (2436, 609, 29, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0012995Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0013003Z 2023-01-11T21:38:06.0013067Z ok (1.954s) 2023-01-11T21:38:06.0013537Z test_max_pool2d_with_indices_backward3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
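The backward kernels in this block compute max-pool gradients in gathered form: for each input element they enumerate every pooling window that could have selected it, compare the window's recorded argmax against the element's own flat index (tmp21 = tmp19 == tmp0 above), and accumulate the matching output gradients. The equivalent scatter-style reference formulation, as a hedged sketch (the helper is illustrative, not the test's code):

import torch

def max_pool2d_backward_reference(grad_out, indices, input_shape):
    # grad_in accumulates grad_out scatter-added at each window's winning flat index
    N, C, H, W = input_shape
    grad_in = torch.zeros(N, C, H * W, dtype=grad_out.dtype)
    grad_in.scatter_add_(2, indices.flatten(2), grad_out.flatten(2))
    return grad_in.view(N, C, H, W)

# shape check with random stand-ins for the (2, 4, 21, 29) -> (2, 4, 40, 56) case above
g = max_pool2d_backward_reference(
    torch.randn(2, 4, 21, 29), torch.randint(40 * 56, (2, 4, 21, 29)), (2, 4, 40, 56))
assert g.shape == (2, 4, 40, 56)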
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0013673Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0013932Z [2023-01-11 21:29:21,790] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 193 2023-01-11T21:38:06.0014202Z [2023-01-11 21:29:24,125] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 193 2023-01-11T21:38:06.0014208Z 2023-01-11T21:38:06.0014305Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0014381Z import torch 2023-01-11T21:38:06.0014455Z import random 2023-01-11T21:38:06.0014705Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0014832Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0014838Z 2023-01-11T21:38:06.0014913Z aten = torch.ops.aten 2023-01-11T21:38:06.0015051Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0015148Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0015153Z 2023-01-11T21:38:06.0015227Z import triton 2023-01-11T21:38:06.0015322Z import triton.language as tl 2023-01-11T21:38:06.0015449Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0015595Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0015601Z 2023-01-11T21:38:06.0015605Z 2023-01-11T21:38:06.0015744Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0015945Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0016117Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:06.0016231Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0016340Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0016405Z { 2023-01-11T21:38:06.0016509Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0016578Z { 2023-01-11T21:38:06.0016653Z #pragma omp for 2023-01-11T21:38:06.0016743Z for(long i0=0; i0<8192; i0+=1) 2023-01-11T21:38:06.0016810Z { 2023-01-11T21:38:06.0016896Z #pragma GCC ivdep 2023-01-11T21:38:06.0016985Z for(long i1=0; i1<37; i1+=1) 2023-01-11T21:38:06.0023603Z { 2023-01-11T21:38:06.0023713Z #pragma GCC ivdep 2023-01-11T21:38:06.0023813Z for(long i2=0; i2<38; i2+=1) 2023-01-11T21:38:06.0023888Z { 2023-01-11T21:38:06.0023964Z { 2023-01-11T21:38:06.0024033Z { 2023-01-11T21:38:06.0024160Z auto tmp0 = static_cast<long>(i2 + (38*i1)); 2023-01-11T21:38:06.0024285Z auto tmp1 = static_cast<long>(((1 + i1) / 2)); 2023-01-11T21:38:06.0024406Z auto tmp2 = static_cast<long>(((1 + i2) / 2)); 2023-01-11T21:38:06.0024526Z auto tmp3 = static_cast<long>(1 + (i1 / 2)); 2023-01-11T21:38:06.0024646Z auto tmp4 = static_cast<long>(1 + (i2 / 2)); 2023-01-11T21:38:06.0024759Z auto tmp5 = static_cast<long>(0); 2023-01-11T21:38:06.0024906Z auto tmp6 = (tmp5 != tmp5) ? tmp5 : std::max(tmp1, tmp5); 2023-01-11T21:38:06.0025132Z auto tmp7 = (tmp5 != tmp5) ? tmp5 : std::max(tmp2, tmp5); 2023-01-11T21:38:06.0025271Z auto tmp8 = static_cast<long>(19); 2023-01-11T21:38:06.0025407Z auto tmp9 = (tmp8 != tmp8) ? tmp8 : std::min(tmp3, tmp8); 2023-01-11T21:38:06.0025545Z auto tmp10 = (tmp8 != tmp8) ? tmp8 : std::min(tmp4, tmp8); 2023-01-11T21:38:06.0025654Z auto tmp11 = tmp6 + tmp5; 2023-01-11T21:38:06.0025756Z auto tmp12 = tmp7 + tmp5; 2023-01-11T21:38:06.0025871Z auto tmp13 = static_cast<long>(1); 2023-01-11T21:38:06.0026044Z auto tmp14 = tmp9 - tmp13; 2023-01-11T21:38:06.0026176Z auto tmp15 = (tmp14 != tmp14) ?
tmp14 : std::min(tmp11, tmp14); 2023-01-11T21:38:06.0026334Z auto tmp16 = tmp10 - tmp13; 2023-01-11T21:38:06.0026478Z auto tmp17 = (tmp16 != tmp16) ? tmp16 : std::min(tmp12, tmp16); 2023-01-11T21:38:06.0026604Z auto tmp18 = in_ptr0[tmp17 + (19*tmp15) + (361*i0)]; 2023-01-11T21:38:06.0026727Z auto tmp19 = in_ptr1[tmp17 + (19*tmp15) + (361*i0)]; 2023-01-11T21:38:06.0026833Z auto tmp20 = tmp18 == tmp0; 2023-01-11T21:38:06.0026947Z auto tmp21 = static_cast<float>(0.0); 2023-01-11T21:38:06.0027059Z auto tmp22 = tmp20 ? tmp19 : tmp21; 2023-01-11T21:38:06.0027162Z out_ptr0[i2 + (38*i1) + (1406*i0)] = tmp22; 2023-01-11T21:38:06.0027238Z } 2023-01-11T21:38:06.0027311Z } 2023-01-11T21:38:06.0027381Z } 2023-01-11T21:38:06.0027449Z } 2023-01-11T21:38:06.0027516Z } 2023-01-11T21:38:06.0027578Z } 2023-01-11T21:38:06.0027635Z } 2023-01-11T21:38:06.0027725Z ''') 2023-01-11T21:38:06.0027731Z 2023-01-11T21:38:06.0027735Z 2023-01-11T21:38:06.0027831Z async_compile.wait(globals()) 2023-01-11T21:38:06.0027908Z del async_compile 2023-01-11T21:38:06.0027913Z 2023-01-11T21:38:06.0027980Z def call(args): 2023-01-11T21:38:06.0028067Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.0028142Z args.clear() 2023-01-11T21:38:06.0028405Z buf0 = empty_strided((32, 256, 37, 38), (359936, 1406, 38, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0028571Z kernel_cpp_0(c_void_p(arg2_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0028646Z del arg0_1 2023-01-11T21:38:06.0028718Z del arg2_1 2023-01-11T21:38:06.0028786Z return (buf0, ) 2023-01-11T21:38:06.0028798Z 2023-01-11T21:38:06.0028802Z 2023-01-11T21:38:06.0028878Z if __name__ == "__main__": 2023-01-11T21:38:06.0028999Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0029127Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0029362Z arg0_1 = rand_strided((32, 256, 19, 19), (92416, 361, 19, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0029589Z arg1_1 = rand_strided((32, 256, 37, 38), (359936, 1406, 38, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0029811Z arg2_1 = rand_strided((32, 256, 19, 19), (92416, 361, 19, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0029940Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0029945Z 2023-01-11T21:38:06.0030017Z ok (2.813s) 2023-01-11T21:38:06.0030489Z test_max_pool2d_with_indices_backward4_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0030648Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0030913Z [2023-01-11 21:29:24,437] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 194 2023-01-11T21:38:06.0030918Z 2023-01-11T21:38:06.0031017Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0031098Z import torch 2023-01-11T21:38:06.0031176Z import random 2023-01-11T21:38:06.0031300Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0031427Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0031432Z 2023-01-11T21:38:06.0031515Z aten = torch.ops.aten 2023-01-11T21:38:06.0031646Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0031746Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0031751Z 2023-01-11T21:38:06.0031830Z import triton 2023-01-11T21:38:06.0031923Z import triton.language as tl 2023-01-11T21:38:06.0032050Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0032195Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0032201Z 2023-01-11T21:38:06.0032205Z 2023-01-11T21:38:06.0032346Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0032554Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0032674Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:06.0032789Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0032897Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0032965Z { 2023-01-11T21:38:06.0033068Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0033136Z { 2023-01-11T21:38:06.0033220Z #pragma omp for 2023-01-11T21:38:06.0033303Z for(long i0=0; i0<128; i0+=1) 2023-01-11T21:38:06.0033373Z { 2023-01-11T21:38:06.0033461Z #pragma GCC ivdep 2023-01-11T21:38:06.0033551Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.0033624Z { 2023-01-11T21:38:06.0033715Z #pragma GCC ivdep 2023-01-11T21:38:06.0033813Z for(long i2=0; i2<4; i2+=1) 2023-01-11T21:38:06.0033878Z { 2023-01-11T21:38:06.0033953Z { 2023-01-11T21:38:06.0034030Z { 2023-01-11T21:38:06.0034183Z auto tmp0 = static_cast(i2 + (4*i1)); 2023-01-11T21:38:06.0034361Z auto tmp1 = static_cast((-2) + i1); 2023-01-11T21:38:06.0034531Z auto tmp2 = static_cast((-2) + i2); 2023-01-11T21:38:06.0034646Z auto tmp3 = static_cast(3 + i1); 2023-01-11T21:38:06.0034755Z auto tmp4 = static_cast(3 + i2); 2023-01-11T21:38:06.0034867Z auto tmp5 = static_cast(0); 2023-01-11T21:38:06.0035006Z auto tmp6 = (tmp5 != tmp5) ? tmp5 : std::max(tmp1, tmp5); 2023-01-11T21:38:06.0035143Z auto tmp7 = (tmp5 != tmp5) ? tmp5 : std::max(tmp2, tmp5); 2023-01-11T21:38:06.0035258Z auto tmp8 = static_cast(3); 2023-01-11T21:38:06.0035397Z auto tmp9 = (tmp8 != tmp8) ? tmp8 : std::min(tmp3, tmp8); 2023-01-11T21:38:06.0035513Z auto tmp10 = static_cast(4); 2023-01-11T21:38:06.0035654Z auto tmp11 = (tmp10 != tmp10) ? tmp10 : std::min(tmp4, tmp10); 2023-01-11T21:38:06.0035751Z auto tmp12 = tmp6 + tmp5; 2023-01-11T21:38:06.0035855Z auto tmp13 = tmp7 + tmp5; 2023-01-11T21:38:06.0035968Z auto tmp14 = static_cast(1); 2023-01-11T21:38:06.0036122Z auto tmp15 = tmp9 - tmp14; 2023-01-11T21:38:06.0036267Z auto tmp16 = (tmp15 != tmp15) ? tmp15 : std::min(tmp12, tmp15); 2023-01-11T21:38:06.0036420Z auto tmp17 = tmp11 - tmp14; 2023-01-11T21:38:06.0036589Z auto tmp18 = (tmp17 != tmp17) ? 
tmp17 : std::min(tmp13, tmp17); 2023-01-11T21:38:06.0036716Z auto tmp19 = in_ptr0[tmp18 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0036844Z auto tmp20 = in_ptr1[tmp18 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0036943Z auto tmp21 = tmp19 == tmp0; 2023-01-11T21:38:06.0037060Z auto tmp22 = static_cast(0.0); 2023-01-11T21:38:06.0037171Z auto tmp23 = tmp21 ? tmp20 : tmp22; 2023-01-11T21:38:06.0037275Z auto tmp24 = tmp7 + tmp14; 2023-01-11T21:38:06.0037410Z auto tmp25 = (tmp17 != tmp17) ? tmp17 : std::min(tmp24, tmp17); 2023-01-11T21:38:06.0037534Z auto tmp26 = in_ptr0[tmp25 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0037659Z auto tmp27 = in_ptr1[tmp25 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0037759Z auto tmp28 = tmp26 == tmp0; 2023-01-11T21:38:06.0037862Z auto tmp29 = tmp12 < tmp9; 2023-01-11T21:38:06.0037964Z auto tmp30 = tmp24 < tmp11; 2023-01-11T21:38:06.0038072Z auto tmp31 = tmp29 & tmp30; 2023-01-11T21:38:06.0038173Z auto tmp32 = tmp31 & tmp28; 2023-01-11T21:38:06.0038276Z auto tmp33 = tmp23 + tmp27; 2023-01-11T21:38:06.0038387Z auto tmp34 = tmp32 ? tmp33 : tmp23; 2023-01-11T21:38:06.0038503Z auto tmp35 = static_cast(2); 2023-01-11T21:38:06.0038600Z auto tmp36 = tmp7 + tmp35; 2023-01-11T21:38:06.0038736Z auto tmp37 = (tmp17 != tmp17) ? tmp17 : std::min(tmp36, tmp17); 2023-01-11T21:38:06.0038858Z auto tmp38 = in_ptr0[tmp37 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0038983Z auto tmp39 = in_ptr1[tmp37 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0039085Z auto tmp40 = tmp38 == tmp0; 2023-01-11T21:38:06.0039188Z auto tmp41 = tmp36 < tmp11; 2023-01-11T21:38:06.0039315Z auto tmp42 = tmp29 & tmp41; 2023-01-11T21:38:06.0039418Z auto tmp43 = tmp42 & tmp40; 2023-01-11T21:38:06.0039511Z auto tmp44 = tmp34 + tmp39; 2023-01-11T21:38:06.0039627Z auto tmp45 = tmp43 ? tmp44 : tmp34; 2023-01-11T21:38:06.0039732Z auto tmp46 = tmp7 + tmp8; 2023-01-11T21:38:06.0039866Z auto tmp47 = (tmp17 != tmp17) ? tmp17 : std::min(tmp46, tmp17); 2023-01-11T21:38:06.0039987Z auto tmp48 = in_ptr0[tmp47 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0040109Z auto tmp49 = in_ptr1[tmp47 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0040215Z auto tmp50 = tmp48 == tmp0; 2023-01-11T21:38:06.0040320Z auto tmp51 = tmp46 < tmp11; 2023-01-11T21:38:06.0040414Z auto tmp52 = tmp29 & tmp51; 2023-01-11T21:38:06.0040519Z auto tmp53 = tmp52 & tmp50; 2023-01-11T21:38:06.0040618Z auto tmp54 = tmp45 + tmp49; 2023-01-11T21:38:06.0040731Z auto tmp55 = tmp53 ? tmp54 : tmp45; 2023-01-11T21:38:06.0040835Z auto tmp56 = tmp7 + tmp10; 2023-01-11T21:38:06.0040977Z auto tmp57 = (tmp17 != tmp17) ? tmp17 : std::min(tmp56, tmp17); 2023-01-11T21:38:06.0041101Z auto tmp58 = in_ptr0[tmp57 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0041215Z auto tmp59 = in_ptr1[tmp57 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0041345Z auto tmp60 = tmp58 == tmp0; 2023-01-11T21:38:06.0041448Z auto tmp61 = tmp56 < tmp11; 2023-01-11T21:38:06.0041549Z auto tmp62 = tmp29 & tmp61; 2023-01-11T21:38:06.0041651Z auto tmp63 = tmp62 & tmp60; 2023-01-11T21:38:06.0041754Z auto tmp64 = tmp55 + tmp59; 2023-01-11T21:38:06.0041867Z auto tmp65 = tmp63 ? tmp64 : tmp55; 2023-01-11T21:38:06.0041969Z auto tmp66 = tmp6 + tmp14; 2023-01-11T21:38:06.0042101Z auto tmp67 = (tmp15 != tmp15) ? 
tmp15 : std::min(tmp66, tmp15); 2023-01-11T21:38:06.0042224Z auto tmp68 = in_ptr0[tmp18 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0042345Z auto tmp69 = in_ptr1[tmp18 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0042447Z auto tmp70 = tmp68 == tmp0; 2023-01-11T21:38:06.0042552Z auto tmp71 = tmp66 < tmp9; 2023-01-11T21:38:06.0042656Z auto tmp72 = tmp13 < tmp11; 2023-01-11T21:38:06.0042760Z auto tmp73 = tmp71 & tmp72; 2023-01-11T21:38:06.0042860Z auto tmp74 = tmp73 & tmp70; 2023-01-11T21:38:06.0042956Z auto tmp75 = tmp65 + tmp69; 2023-01-11T21:38:06.0043068Z auto tmp76 = tmp74 ? tmp75 : tmp65; 2023-01-11T21:38:06.0043190Z auto tmp77 = in_ptr0[tmp25 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0043312Z auto tmp78 = in_ptr1[tmp25 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0043414Z auto tmp79 = tmp77 == tmp0; 2023-01-11T21:38:06.0043516Z auto tmp80 = tmp71 & tmp30; 2023-01-11T21:38:06.0043616Z auto tmp81 = tmp80 & tmp79; 2023-01-11T21:38:06.0043710Z auto tmp82 = tmp76 + tmp78; 2023-01-11T21:38:06.0043826Z auto tmp83 = tmp81 ? tmp82 : tmp76; 2023-01-11T21:38:06.0043947Z auto tmp84 = in_ptr0[tmp37 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0044067Z auto tmp85 = in_ptr1[tmp37 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0044198Z auto tmp86 = tmp84 == tmp0; 2023-01-11T21:38:06.0044302Z auto tmp87 = tmp71 & tmp41; 2023-01-11T21:38:06.0044404Z auto tmp88 = tmp87 & tmp86; 2023-01-11T21:38:06.0044505Z auto tmp89 = tmp83 + tmp85; 2023-01-11T21:38:06.0044611Z auto tmp90 = tmp88 ? tmp89 : tmp83; 2023-01-11T21:38:06.0044732Z auto tmp91 = in_ptr0[tmp47 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0044853Z auto tmp92 = in_ptr1[tmp47 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0044955Z auto tmp93 = tmp91 == tmp0; 2023-01-11T21:38:06.0045060Z auto tmp94 = tmp71 & tmp51; 2023-01-11T21:38:06.0045160Z auto tmp95 = tmp94 & tmp93; 2023-01-11T21:38:06.0045260Z auto tmp96 = tmp90 + tmp92; 2023-01-11T21:38:06.0045373Z auto tmp97 = tmp95 ? tmp96 : tmp90; 2023-01-11T21:38:06.0045486Z auto tmp98 = in_ptr0[tmp57 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0045605Z auto tmp99 = in_ptr1[tmp57 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0045713Z auto tmp100 = tmp98 == tmp0; 2023-01-11T21:38:06.0045820Z auto tmp101 = tmp71 & tmp61; 2023-01-11T21:38:06.0045929Z auto tmp102 = tmp101 & tmp100; 2023-01-11T21:38:06.0046033Z auto tmp103 = tmp97 + tmp99; 2023-01-11T21:38:06.0046149Z auto tmp104 = tmp102 ? tmp103 : tmp97; 2023-01-11T21:38:06.0046284Z auto tmp105 = tmp6 + tmp35; 2023-01-11T21:38:06.0046429Z auto tmp106 = (tmp15 != tmp15) ? tmp15 : std::min(tmp105, tmp15); 2023-01-11T21:38:06.0046556Z auto tmp107 = in_ptr0[tmp18 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0046683Z auto tmp108 = in_ptr1[tmp18 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0046790Z auto tmp109 = tmp107 == tmp0; 2023-01-11T21:38:06.0046895Z auto tmp110 = tmp105 < tmp9; 2023-01-11T21:38:06.0047003Z auto tmp111 = tmp110 & tmp72; 2023-01-11T21:38:06.0047111Z auto tmp112 = tmp111 & tmp109; 2023-01-11T21:38:06.0047212Z auto tmp113 = tmp104 + tmp108; 2023-01-11T21:38:06.0047326Z auto tmp114 = tmp112 ? tmp113 : tmp104; 2023-01-11T21:38:06.0047449Z auto tmp115 = in_ptr0[tmp25 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0047575Z auto tmp116 = in_ptr1[tmp25 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0047685Z auto tmp117 = tmp115 == tmp0; 2023-01-11T21:38:06.0047790Z auto tmp118 = tmp110 & tmp30; 2023-01-11T21:38:06.0047902Z auto tmp119 = tmp118 & tmp117; 2023-01-11T21:38:06.0048010Z auto tmp120 = tmp114 + tmp116; 2023-01-11T21:38:06.0048116Z auto tmp121 = tmp119 ? 
tmp120 : tmp114; 2023-01-11T21:38:06.0048239Z auto tmp122 = in_ptr0[tmp37 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0048360Z auto tmp123 = in_ptr1[tmp37 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0048467Z auto tmp124 = tmp122 == tmp0; 2023-01-11T21:38:06.0048571Z auto tmp125 = tmp110 & tmp41; 2023-01-11T21:38:06.0048681Z auto tmp126 = tmp125 & tmp124; 2023-01-11T21:38:06.0048789Z auto tmp127 = tmp121 + tmp123; 2023-01-11T21:38:06.0048903Z auto tmp128 = tmp126 ? tmp127 : tmp121; 2023-01-11T21:38:06.0049047Z auto tmp129 = in_ptr0[tmp47 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0049173Z auto tmp130 = in_ptr1[tmp47 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0049278Z auto tmp131 = tmp129 == tmp0; 2023-01-11T21:38:06.0049384Z auto tmp132 = tmp110 & tmp51; 2023-01-11T21:38:06.0049492Z auto tmp133 = tmp132 & tmp131; 2023-01-11T21:38:06.0049598Z auto tmp134 = tmp128 + tmp130; 2023-01-11T21:38:06.0049715Z auto tmp135 = tmp133 ? tmp134 : tmp128; 2023-01-11T21:38:06.0049832Z auto tmp136 = in_ptr0[tmp57 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0049955Z auto tmp137 = in_ptr1[tmp57 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0050062Z auto tmp138 = tmp136 == tmp0; 2023-01-11T21:38:06.0050168Z auto tmp139 = tmp110 & tmp61; 2023-01-11T21:38:06.0050277Z auto tmp140 = tmp139 & tmp138; 2023-01-11T21:38:06.0050385Z auto tmp141 = tmp135 + tmp137; 2023-01-11T21:38:06.0050500Z auto tmp142 = tmp140 ? tmp141 : tmp135; 2023-01-11T21:38:06.0050604Z auto tmp143 = tmp6 + tmp8; 2023-01-11T21:38:06.0050741Z auto tmp144 = (tmp15 != tmp15) ? tmp15 : std::min(tmp143, tmp15); 2023-01-11T21:38:06.0050864Z auto tmp145 = in_ptr0[tmp18 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0050986Z auto tmp146 = in_ptr1[tmp18 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0051130Z auto tmp147 = tmp145 == tmp0; 2023-01-11T21:38:06.0051236Z auto tmp148 = tmp143 < tmp9; 2023-01-11T21:38:06.0051341Z auto tmp149 = tmp148 & tmp72; 2023-01-11T21:38:06.0051446Z auto tmp150 = tmp149 & tmp147; 2023-01-11T21:38:06.0051555Z auto tmp151 = tmp142 + tmp146; 2023-01-11T21:38:06.0051661Z auto tmp152 = tmp150 ? tmp151 : tmp142; 2023-01-11T21:38:06.0051783Z auto tmp153 = in_ptr0[tmp25 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0051903Z auto tmp154 = in_ptr1[tmp25 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0052008Z auto tmp155 = tmp153 == tmp0; 2023-01-11T21:38:06.0052113Z auto tmp156 = tmp148 & tmp30; 2023-01-11T21:38:06.0052218Z auto tmp157 = tmp156 & tmp155; 2023-01-11T21:38:06.0052328Z auto tmp158 = tmp152 + tmp154; 2023-01-11T21:38:06.0052440Z auto tmp159 = tmp157 ? tmp158 : tmp152; 2023-01-11T21:38:06.0052554Z auto tmp160 = in_ptr0[tmp37 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0052678Z auto tmp161 = in_ptr1[tmp37 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0052782Z auto tmp162 = tmp160 == tmp0; 2023-01-11T21:38:06.0052890Z auto tmp163 = tmp148 & tmp41; 2023-01-11T21:38:06.0052999Z auto tmp164 = tmp163 & tmp162; 2023-01-11T21:38:06.0053104Z auto tmp165 = tmp159 + tmp161; 2023-01-11T21:38:06.0053219Z auto tmp166 = tmp164 ? tmp165 : tmp159; 2023-01-11T21:38:06.0053336Z auto tmp167 = in_ptr0[tmp47 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0053456Z auto tmp168 = in_ptr1[tmp47 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0053563Z auto tmp169 = tmp167 == tmp0; 2023-01-11T21:38:06.0053668Z auto tmp170 = tmp148 & tmp51; 2023-01-11T21:38:06.0053775Z auto tmp171 = tmp170 & tmp169; 2023-01-11T21:38:06.0053906Z auto tmp172 = tmp166 + tmp168; 2023-01-11T21:38:06.0054021Z auto tmp173 = tmp171 ? 
tmp172 : tmp166; 2023-01-11T21:38:06.0054145Z auto tmp174 = in_ptr0[tmp57 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0054258Z auto tmp175 = in_ptr1[tmp57 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0054363Z auto tmp176 = tmp174 == tmp0; 2023-01-11T21:38:06.0054468Z auto tmp177 = tmp148 & tmp61; 2023-01-11T21:38:06.0054718Z auto tmp178 = tmp177 & tmp176; 2023-01-11T21:38:06.0054826Z auto tmp179 = tmp173 + tmp175; 2023-01-11T21:38:06.0054944Z auto tmp180 = tmp178 ? tmp179 : tmp173; 2023-01-11T21:38:06.0055062Z auto tmp181 = tmp6 + tmp10; 2023-01-11T21:38:06.0055224Z auto tmp182 = (tmp15 != tmp15) ? tmp15 : std::min(tmp181, tmp15); 2023-01-11T21:38:06.0055357Z auto tmp183 = in_ptr0[tmp18 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0055478Z auto tmp184 = in_ptr1[tmp18 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0055585Z auto tmp185 = tmp183 == tmp0; 2023-01-11T21:38:06.0055691Z auto tmp186 = tmp181 < tmp9; 2023-01-11T21:38:06.0055795Z auto tmp187 = tmp186 & tmp72; 2023-01-11T21:38:06.0055900Z auto tmp188 = tmp187 & tmp185; 2023-01-11T21:38:06.0056006Z auto tmp189 = tmp180 + tmp184; 2023-01-11T21:38:06.0056113Z auto tmp190 = tmp188 ? tmp189 : tmp180; 2023-01-11T21:38:06.0056277Z auto tmp191 = in_ptr0[tmp25 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0056398Z auto tmp192 = in_ptr1[tmp25 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0056503Z auto tmp193 = tmp191 == tmp0; 2023-01-11T21:38:06.0056608Z auto tmp194 = tmp186 & tmp30; 2023-01-11T21:38:06.0056712Z auto tmp195 = tmp194 & tmp193; 2023-01-11T21:38:06.0056816Z auto tmp196 = tmp190 + tmp192; 2023-01-11T21:38:06.0056929Z auto tmp197 = tmp195 ? tmp196 : tmp190; 2023-01-11T21:38:06.0057044Z auto tmp198 = in_ptr0[tmp37 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0057230Z auto tmp199 = in_ptr1[tmp37 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0057337Z auto tmp200 = tmp198 == tmp0; 2023-01-11T21:38:06.0057444Z auto tmp201 = tmp186 & tmp41; 2023-01-11T21:38:06.0057551Z auto tmp202 = tmp201 & tmp200; 2023-01-11T21:38:06.0057656Z auto tmp203 = tmp197 + tmp199; 2023-01-11T21:38:06.0057771Z auto tmp204 = tmp202 ? tmp203 : tmp197; 2023-01-11T21:38:06.0057895Z auto tmp205 = in_ptr0[tmp47 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0058009Z auto tmp206 = in_ptr1[tmp47 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0058114Z auto tmp207 = tmp205 == tmp0; 2023-01-11T21:38:06.0058217Z auto tmp208 = tmp186 & tmp51; 2023-01-11T21:38:06.0058324Z auto tmp209 = tmp208 & tmp207; 2023-01-11T21:38:06.0058429Z auto tmp210 = tmp204 + tmp206; 2023-01-11T21:38:06.0058541Z auto tmp211 = tmp209 ? tmp210 : tmp204; 2023-01-11T21:38:06.0058667Z auto tmp212 = in_ptr0[tmp57 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0058788Z auto tmp213 = in_ptr1[tmp57 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0058885Z auto tmp214 = tmp212 == tmp0; 2023-01-11T21:38:06.0059026Z auto tmp215 = tmp186 & tmp61; 2023-01-11T21:38:06.0059134Z auto tmp216 = tmp215 & tmp214; 2023-01-11T21:38:06.0059237Z auto tmp217 = tmp211 + tmp213; 2023-01-11T21:38:06.0059348Z auto tmp218 = tmp216 ? 
tmp217 : tmp211; 2023-01-11T21:38:06.0059457Z out_ptr0[i2 + (4*i1) + (12*i0)] = tmp218; 2023-01-11T21:38:06.0059531Z } 2023-01-11T21:38:06.0059595Z } 2023-01-11T21:38:06.0059663Z } 2023-01-11T21:38:06.0059731Z } 2023-01-11T21:38:06.0059798Z } 2023-01-11T21:38:06.0059869Z } 2023-01-11T21:38:06.0059934Z } 2023-01-11T21:38:06.0060024Z ''') 2023-01-11T21:38:06.0060029Z 2023-01-11T21:38:06.0060034Z 2023-01-11T21:38:06.0060123Z async_compile.wait(globals()) 2023-01-11T21:38:06.0060200Z del async_compile 2023-01-11T21:38:06.0060207Z 2023-01-11T21:38:06.0060286Z def call(args): 2023-01-11T21:38:06.0060375Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.0060451Z args.clear() 2023-01-11T21:38:06.0060665Z buf0 = empty_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0060851Z kernel_cpp_0(c_void_p(arg2_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0060917Z del arg0_1 2023-01-11T21:38:06.0060989Z del arg2_1 2023-01-11T21:38:06.0061069Z return (buf0, ) 2023-01-11T21:38:06.0061074Z 2023-01-11T21:38:06.0061079Z 2023-01-11T21:38:06.0061160Z if __name__ == "__main__": 2023-01-11T21:38:06.0061289Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0061490Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0061705Z arg0_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0061916Z arg1_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0062120Z arg2_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0062248Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0062514Z [2023-01-11 21:29:26,811] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 194 2023-01-11T21:38:06.0062521Z 2023-01-11T21:38:06.0062592Z ok (2.417s) 2023-01-11T21:38:06.0063070Z test_max_pool2d_with_indices_backward5_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0063204Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0063466Z [2023-01-11 21:29:26,860] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 195 2023-01-11T21:38:06.0063723Z [2023-01-11 21:29:26,880] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.max_pool2d_with_indices_backward 2023-01-11T21:38:06.0063982Z [2023-01-11 21:29:26,885] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 195 2023-01-11T21:38:06.0063988Z 2023-01-11T21:38:06.0064090Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0064158Z import torch 2023-01-11T21:38:06.0064233Z import random 2023-01-11T21:38:06.0064354Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0064478Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0064487Z 2023-01-11T21:38:06.0064568Z aten = torch.ops.aten 2023-01-11T21:38:06.0064706Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0064802Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0064807Z 2023-01-11T21:38:06.0064882Z import triton 2023-01-11T21:38:06.0064994Z import triton.language as tl 2023-01-11T21:38:06.0065122Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0065263Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0065269Z 2023-01-11T21:38:06.0065273Z 2023-01-11T21:38:06.0065364Z async_compile.wait(globals()) 2023-01-11T21:38:06.0065441Z del async_compile 2023-01-11T21:38:06.0065446Z 2023-01-11T21:38:06.0065520Z def call(args): 2023-01-11T21:38:06.0065606Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.0065681Z args.clear() 2023-01-11T21:38:06.0065836Z buf0 = aten.max_pool2d_with_indices_backward(arg0_1, arg1_1, [13, 13], [1, 1], [2, 2], [1, 1], False, arg2_1) 2023-01-11T21:38:06.0065913Z del arg0_1 2023-01-11T21:38:06.0065985Z del arg1_1 2023-01-11T21:38:06.0066056Z del arg2_1 2023-01-11T21:38:06.0066129Z buf1 = buf0 2023-01-11T21:38:06.0066244Z assert_size_stride(buf1, (2, 64, 20, 20), (25600, 400, 20, 1)) 2023-01-11T21:38:06.0066315Z del buf0 2023-01-11T21:38:06.0066386Z return (buf1, ) 2023-01-11T21:38:06.0066391Z 2023-01-11T21:38:06.0066396Z 2023-01-11T21:38:06.0066476Z if __name__ == "__main__": 2023-01-11T21:38:06.0066592Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0066719Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0066942Z arg0_1 = rand_strided((2, 64, 12, 12), (9216, 144, 12, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0067162Z arg1_1 = rand_strided((2, 64, 20, 20), (25600, 400, 20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0067376Z arg2_1 = rand_strided((2, 64, 12, 12), (9216, 144, 12, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0067539Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0067545Z 2023-01-11T21:38:06.0067609Z ok (0.059s) 2023-01-11T21:38:06.0068087Z test_max_pool2d_with_indices_backward_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0068219Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0068478Z [2023-01-11 21:29:26,919] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 196 2023-01-11T21:38:06.0068742Z [2023-01-11 21:29:28,882] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 196 2023-01-11T21:38:06.0068751Z 2023-01-11T21:38:06.0068850Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0068925Z import torch 2023-01-11T21:38:06.0069000Z import random 2023-01-11T21:38:06.0069120Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0069236Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0069244Z 2023-01-11T21:38:06.0069325Z aten = torch.ops.aten 2023-01-11T21:38:06.0069464Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0069561Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0069566Z 2023-01-11T21:38:06.0069641Z import triton 2023-01-11T21:38:06.0069737Z import triton.language as tl 2023-01-11T21:38:06.0069864Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0070003Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0070009Z 2023-01-11T21:38:06.0070014Z 2023-01-11T21:38:06.0070144Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0070353Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0070476Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:06.0070586Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0070718Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0070785Z { 2023-01-11T21:38:06.0070888Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0070947Z { 2023-01-11T21:38:06.0071028Z #pragma omp for 2023-01-11T21:38:06.0071116Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0071183Z { 2023-01-11T21:38:06.0071268Z #pragma GCC ivdep 2023-01-11T21:38:06.0071360Z for(long i1=0; i1<18; i1+=1) 2023-01-11T21:38:06.0071431Z { 2023-01-11T21:38:06.0071510Z #pragma GCC ivdep 2023-01-11T21:38:06.0071604Z for(long i2=0; i2<14; i2+=1) 2023-01-11T21:38:06.0071673Z { 2023-01-11T21:38:06.0071746Z { 2023-01-11T21:38:06.0071818Z { 2023-01-11T21:38:06.0071939Z auto tmp0 = static_cast(i2 + (14*i1)); 2023-01-11T21:38:06.0072057Z auto tmp1 = static_cast((i1 / 2)); 2023-01-11T21:38:06.0072165Z auto tmp2 = static_cast((i2 / 2)); 2023-01-11T21:38:06.0072282Z auto tmp3 = static_cast(1 + (i1 / 2)); 2023-01-11T21:38:06.0072404Z auto tmp4 = static_cast(1 + (i2 / 2)); 2023-01-11T21:38:06.0072515Z auto tmp5 = static_cast(0); 2023-01-11T21:38:06.0072652Z auto tmp6 = (tmp5 != tmp5) ? tmp5 : std::max(tmp1, tmp5); 2023-01-11T21:38:06.0072782Z auto tmp7 = (tmp5 != tmp5) ? tmp5 : std::max(tmp2, tmp5); 2023-01-11T21:38:06.0072894Z auto tmp8 = static_cast(9); 2023-01-11T21:38:06.0073057Z auto tmp9 = (tmp8 != tmp8) ? tmp8 : std::min(tmp3, tmp8); 2023-01-11T21:38:06.0073162Z auto tmp10 = static_cast(7); 2023-01-11T21:38:06.0073299Z auto tmp11 = (tmp10 != tmp10) ? tmp10 : std::min(tmp4, tmp10); 2023-01-11T21:38:06.0073402Z auto tmp12 = tmp6 + tmp5; 2023-01-11T21:38:06.0073502Z auto tmp13 = tmp7 + tmp5; 2023-01-11T21:38:06.0073612Z auto tmp14 = static_cast(1); 2023-01-11T21:38:06.0073765Z auto tmp15 = tmp9 - tmp14; 2023-01-11T21:38:06.0073904Z auto tmp16 = (tmp15 != tmp15) ? 
tmp15 : std::min(tmp12, tmp15); 2023-01-11T21:38:06.0074057Z auto tmp17 = tmp11 - tmp14; 2023-01-11T21:38:06.0074193Z auto tmp18 = (tmp17 != tmp17) ? tmp17 : std::min(tmp13, tmp17); 2023-01-11T21:38:06.0074310Z auto tmp19 = in_ptr0[tmp18 + (7*tmp16) + (63*i0)]; 2023-01-11T21:38:06.0074437Z auto tmp20 = in_ptr1[tmp18 + (7*tmp16) + (63*i0)]; 2023-01-11T21:38:06.0074541Z auto tmp21 = tmp19 == tmp0; 2023-01-11T21:38:06.0074657Z auto tmp22 = static_cast(0.0); 2023-01-11T21:38:06.0074770Z auto tmp23 = tmp21 ? tmp20 : tmp22; 2023-01-11T21:38:06.0074886Z out_ptr0[i2 + (14*i1) + (252*i0)] = tmp23; 2023-01-11T21:38:06.0074960Z } 2023-01-11T21:38:06.0075024Z } 2023-01-11T21:38:06.0075094Z } 2023-01-11T21:38:06.0075163Z } 2023-01-11T21:38:06.0075231Z } 2023-01-11T21:38:06.0075297Z } 2023-01-11T21:38:06.0075360Z } 2023-01-11T21:38:06.0075446Z ''') 2023-01-11T21:38:06.0075452Z 2023-01-11T21:38:06.0075456Z 2023-01-11T21:38:06.0075544Z async_compile.wait(globals()) 2023-01-11T21:38:06.0075624Z del async_compile 2023-01-11T21:38:06.0075630Z 2023-01-11T21:38:06.0075704Z def call(args): 2023-01-11T21:38:06.0075792Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.0075868Z args.clear() 2023-01-11T21:38:06.0076085Z buf0 = empty_strided((2, 4, 18, 14), (1008, 252, 14, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0076281Z kernel_cpp_0(c_void_p(arg2_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0076355Z del arg0_1 2023-01-11T21:38:06.0076420Z del arg2_1 2023-01-11T21:38:06.0076495Z return (buf0, ) 2023-01-11T21:38:06.0076500Z 2023-01-11T21:38:06.0076505Z 2023-01-11T21:38:06.0076586Z if __name__ == "__main__": 2023-01-11T21:38:06.0076705Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0076831Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0077043Z arg0_1 = rand_strided((2, 4, 9, 7), (252, 63, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0077264Z arg1_1 = rand_strided((2, 4, 18, 14), (1008, 252, 14, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0077465Z arg2_1 = rand_strided((2, 4, 9, 7), (252, 63, 7, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0077592Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0077597Z 2023-01-11T21:38:06.0077672Z ok (2.002s) 2023-01-11T21:38:06.0078121Z test_mean_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0078254Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0078512Z [2023-01-11 21:29:28,919] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 197 2023-01-11T21:38:06.0078805Z [2023-01-11 21:29:30,776] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 197 2023-01-11T21:38:06.0078811Z 2023-01-11T21:38:06.0078909Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0078983Z import torch 2023-01-11T21:38:06.0079060Z import random 2023-01-11T21:38:06.0079175Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0079299Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0079304Z 2023-01-11T21:38:06.0079386Z aten = torch.ops.aten 2023-01-11T21:38:06.0079524Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0079620Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0079625Z 2023-01-11T21:38:06.0079701Z import triton 2023-01-11T21:38:06.0079794Z import triton.language as tl 2023-01-11T21:38:06.0079912Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0080050Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0080058Z 2023-01-11T21:38:06.0080063Z 2023-01-11T21:38:06.0080202Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0080406Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0080529Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.0080637Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:06.0080750Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0080852Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0080947Z float* __restrict__ out_ptr3) 2023-01-11T21:38:06.0081015Z { 2023-01-11T21:38:06.0081105Z auto out_ptr1 = in_out_ptr0; 2023-01-11T21:38:06.0081195Z auto out_ptr0 = in_out_ptr1; 2023-01-11T21:38:06.0081261Z { 2023-01-11T21:38:06.0081454Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0081539Z float tmp1 = 0; 2023-01-11T21:38:06.0081664Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:06.0081765Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0081832Z { 2023-01-11T21:38:06.0081944Z #pragma omp for reduction(+:tmp1_vec) 2023-01-11T21:38:06.0082063Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0082133Z { 2023-01-11T21:38:06.0082272Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0082362Z tmp1_vec += tmp0; 2023-01-11T21:38:06.0082422Z } 2023-01-11T21:38:06.0082620Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:06.0082747Z #pragma omp for simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:06.0082838Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0082909Z { 2023-01-11T21:38:06.0083005Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0083087Z tmp1 += tmp0; 2023-01-11T21:38:06.0083147Z } 2023-01-11T21:38:06.0083215Z } 2023-01-11T21:38:06.0083300Z out_ptr0[0] = tmp1; 2023-01-11T21:38:06.0083368Z } 2023-01-11T21:38:06.0083475Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0083542Z { 2023-01-11T21:38:06.0083624Z #pragma omp for 2023-01-11T21:38:06.0083704Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0083772Z { 
2023-01-11T21:38:06.0083839Z { 2023-01-11T21:38:06.0084031Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0084115Z float tmp1 = 0; 2023-01-11T21:38:06.0084246Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:06.0084343Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0084443Z { 2023-01-11T21:38:06.0084594Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0084682Z tmp1_vec += tmp0; 2023-01-11T21:38:06.0084755Z } 2023-01-11T21:38:06.0084982Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:06.0085124Z #pragma omp simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:06.0085226Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0085296Z { 2023-01-11T21:38:06.0085393Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.0085476Z tmp1 += tmp0; 2023-01-11T21:38:06.0085545Z } 2023-01-11T21:38:06.0085633Z out_ptr1[i0] = tmp1; 2023-01-11T21:38:06.0085703Z } 2023-01-11T21:38:06.0085769Z } 2023-01-11T21:38:06.0085855Z #pragma omp for 2023-01-11T21:38:06.0085934Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.0086002Z { 2023-01-11T21:38:06.0086140Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0086277Z auto tmp1 = at::vec::Vectorized(static_cast(8)); 2023-01-11T21:38:06.0086371Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0086470Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0086538Z } 2023-01-11T21:38:06.0086631Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0086716Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:06.0086782Z { 2023-01-11T21:38:06.0086871Z auto tmp0 = out_ptr1[i0]; 2023-01-11T21:38:06.0086974Z auto tmp1 = static_cast(8); 2023-01-11T21:38:06.0087064Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0087153Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0087212Z } 2023-01-11T21:38:06.0087295Z #pragma omp for 2023-01-11T21:38:06.0087379Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0087444Z { 2023-01-11T21:38:06.0087532Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0087600Z { 2023-01-11T21:38:06.0087774Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0087917Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr0 + 8 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0088065Z auto tmp3 = at::vec::Vectorized::loadu(in_ptr0 + 16 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0088208Z auto tmp5 = at::vec::Vectorized::loadu(in_ptr0 + 24 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0088301Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0088396Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0088488Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0088628Z auto tmp7 = at::vec::Vectorized(static_cast(4)); 2023-01-11T21:38:06.0088721Z auto tmp8 = tmp6 / tmp7; 2023-01-11T21:38:06.0088823Z tmp8.store(out_ptr2 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0088892Z } 2023-01-11T21:38:06.0088988Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0089079Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0089145Z { 2023-01-11T21:38:06.0089248Z auto tmp0 = in_ptr0[i1 + (32*i0)]; 2023-01-11T21:38:06.0089350Z auto tmp1 = in_ptr0[8 + i1 + (32*i0)]; 2023-01-11T21:38:06.0089447Z auto tmp3 = in_ptr0[16 + i1 + (32*i0)]; 2023-01-11T21:38:06.0089550Z auto tmp5 = in_ptr0[24 + i1 + (32*i0)]; 2023-01-11T21:38:06.0089641Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0089732Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0089825Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0089932Z 
auto tmp7 = static_cast(4); 2023-01-11T21:38:06.0090056Z auto tmp8 = tmp6 / tmp7; 2023-01-11T21:38:06.0090146Z out_ptr2[i1 + (8*i0)] = tmp8; 2023-01-11T21:38:06.0090219Z } 2023-01-11T21:38:06.0090287Z } 2023-01-11T21:38:06.0090367Z #pragma omp for 2023-01-11T21:38:06.0090457Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0090526Z { 2023-01-11T21:38:06.0090663Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0090803Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr0 + 32 + (8*i0)); 2023-01-11T21:38:06.0090887Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0091022Z auto tmp3 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.0091111Z auto tmp4 = tmp2 / tmp3; 2023-01-11T21:38:06.0091206Z tmp4.store(out_ptr3 + 8*i0); 2023-01-11T21:38:06.0091273Z } 2023-01-11T21:38:06.0091373Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0091466Z for(long i0=32; i0<32; i0+=1) 2023-01-11T21:38:06.0091526Z { 2023-01-11T21:38:06.0091614Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0091707Z auto tmp1 = in_ptr0[32 + i0]; 2023-01-11T21:38:06.0091798Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0091904Z auto tmp3 = static_cast(2); 2023-01-11T21:38:06.0091991Z auto tmp4 = tmp2 / tmp3; 2023-01-11T21:38:06.0092078Z out_ptr3[i0] = tmp4; 2023-01-11T21:38:06.0092138Z } 2023-01-11T21:38:06.0092221Z #pragma omp single 2023-01-11T21:38:06.0092291Z { 2023-01-11T21:38:06.0092358Z { 2023-01-11T21:38:06.0092428Z { 2023-01-11T21:38:06.0092524Z auto tmp0 = out_ptr0[0]; 2023-01-11T21:38:06.0092625Z auto tmp1 = static_cast(64); 2023-01-11T21:38:06.0092722Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0092820Z in_out_ptr1[0] = tmp2; 2023-01-11T21:38:06.0092892Z } 2023-01-11T21:38:06.0092959Z } 2023-01-11T21:38:06.0093028Z } 2023-01-11T21:38:06.0093095Z } 2023-01-11T21:38:06.0093152Z } 2023-01-11T21:38:06.0093243Z ''') 2023-01-11T21:38:06.0093248Z 2023-01-11T21:38:06.0093253Z 2023-01-11T21:38:06.0093376Z async_compile.wait(globals()) 2023-01-11T21:38:06.0093455Z del async_compile 2023-01-11T21:38:06.0093460Z 2023-01-11T21:38:06.0093535Z def call(args): 2023-01-11T21:38:06.0093608Z arg0_1, = args 2023-01-11T21:38:06.0093684Z args.clear() 2023-01-11T21:38:06.0093867Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0094066Z buf1 = empty_strided((1, 2, 4), (8, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0094157Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.0094363Z buf3 = empty_strided((1, 2, 1, 8), (16, 8, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0094685Z buf4 = empty_strided((4, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0094780Z buf5 = buf0; del buf0 # reuse 2023-01-11T21:38:06.0095000Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:06.0095073Z del arg0_1 2023-01-11T21:38:06.0095161Z return (buf5, buf2, buf3, buf4, ) 2023-01-11T21:38:06.0095166Z 2023-01-11T21:38:06.0095178Z 2023-01-11T21:38:06.0095251Z if __name__ == "__main__": 2023-01-11T21:38:06.0095369Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0095494Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0095705Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0095817Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0095822Z 2023-01-11T21:38:06.0095900Z ok (1.887s) 2023-01-11T21:38:06.0096363Z 
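[Note] For orientation, the fused kernel above for test_mean_cpu computes four reductions of the (1, 2, 4, 8) input in a single pass. A plain-PyTorch sketch of what the four outputs appear to be, inferred from the index arithmetic and the divisors (64, 8, 4, 2) in the generated code; this mapping is a reading of the dump, not taken from the test's own source:

import torch

x = torch.randn(1, 2, 4, 8)          # same shape/strides as arg0_1 in the dump above
buf5 = x.mean()                      # divide-by-64 branch (in_out_ptr1): global mean
buf2 = x.mean(dim=3)                 # divide-by-8 branch (in_out_ptr0): shape (1, 2, 4)
buf3 = x.mean(dim=2, keepdim=True)   # divide-by-4 branch (out_ptr2): shape (1, 2, 1, 8)
buf4 = x.mean(dim=1).reshape(4, 8)   # divide-by-2 branch (out_ptr3): shape (4, 8)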
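[Note] A pattern worth noting throughout these CPU kernels: each vectorized 8-lane loop is paired with a scalar remainder loop whose bounds coincide, e.g. "for(long i0=64; i0<64; i0+=1)" above. When the element count divides evenly by 8 the tail has a zero trip count and the compiler can discard it; it only runs for the last numel % 8 elements otherwise. A minimal sketch of the same main/tail split (illustrative only, not Inductor's actual codegen):

def sum_with_tail(xs, lanes=8):
    n_vec = (len(xs) // lanes) * lanes
    total = 0.0
    for i in range(0, n_vec, lanes):   # "vector" main loop: whole groups of `lanes` elements
        total += sum(xs[i:i + lanes])
    for i in range(n_vec, len(xs)):    # scalar tail; empty when len(xs) % lanes == 0
        total += xs[i]
    return total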
test_min_max_reduction_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0096540Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0096805Z [2023-01-11 21:29:30,801] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 198 2023-01-11T21:38:06.0097063Z [2023-01-11 21:29:33,081] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 198 2023-01-11T21:38:06.0097069Z 2023-01-11T21:38:06.0097238Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0097328Z import torch 2023-01-11T21:38:06.0097418Z import random 2023-01-11T21:38:06.0097554Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0097701Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0097707Z 2023-01-11T21:38:06.0097806Z aten = torch.ops.aten 2023-01-11T21:38:06.0097957Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0098056Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0098061Z 2023-01-11T21:38:06.0098139Z import triton 2023-01-11T21:38:06.0098234Z import triton.language as tl 2023-01-11T21:38:06.0098360Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0098502Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0098509Z 2023-01-11T21:38:06.0098513Z 2023-01-11T21:38:06.0098654Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0098862Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0098987Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0099092Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0099200Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0099304Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0099406Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.0099473Z { 2023-01-11T21:38:06.0099542Z { 2023-01-11T21:38:06.0099946Z #pragma omp declare reduction(max:at::vec::Vectorized:omp_out = at::vec::maximum(omp_out, omp_in)) initializer(omp_priv={{-std::numeric_limits::infinity()}}) 2023-01-11T21:38:06.0100157Z float tmp3 = -std::numeric_limits::infinity(); 2023-01-11T21:38:06.0100275Z auto tmp3_vec = at::vec::Vectorized(tmp3); 2023-01-11T21:38:06.0100520Z #pragma omp declare reduction(min:at::vec::Vectorized:omp_out = at::vec::minimum(omp_out, omp_in)) initializer(omp_priv={{std::numeric_limits::infinity()}}) 2023-01-11T21:38:06.0100649Z float tmp4 = std::numeric_limits::infinity(); 2023-01-11T21:38:06.0100777Z auto tmp4_vec = at::vec::Vectorized(tmp4); 2023-01-11T21:38:06.0100975Z float tmp7 = -std::numeric_limits::infinity(); 2023-01-11T21:38:06.0101100Z auto tmp7_vec = at::vec::Vectorized(tmp7); 2023-01-11T21:38:06.0101213Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0101287Z { 2023-01-11T21:38:06.0101455Z #pragma omp for reduction(max:tmp3_vec) reduction(min:tmp4_vec) reduction(max:tmp7_vec) 2023-01-11T21:38:06.0101545Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0101615Z { 2023-01-11T21:38:06.0101758Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0101903Z auto tmp1 = 
at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.0102000Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0102140Z auto tmp5 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0102269Z auto tmp6 = tmp0 + tmp5; 2023-01-11T21:38:06.0102384Z tmp3_vec = at::vec::maximum(tmp3_vec, tmp2); 2023-01-11T21:38:06.0102503Z tmp4_vec = at::vec::minimum(tmp4_vec, tmp2); 2023-01-11T21:38:06.0102620Z tmp7_vec = at::vec::maximum(tmp7_vec, tmp6); 2023-01-11T21:38:06.0102691Z } 2023-01-11T21:38:06.0102904Z tmp3 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return at::vec::maximum(x, y);}, tmp3_vec); 2023-01-11T21:38:06.0103114Z tmp4 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return at::vec::minimum(x, y);}, tmp4_vec); 2023-01-11T21:38:06.0103322Z tmp7 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return at::vec::maximum(x, y);}, tmp7_vec); 2023-01-11T21:38:06.0103496Z #pragma omp for simd simdlen(4) reduction(max:tmp3) reduction(min:tmp4) reduction(max:tmp7) 2023-01-11T21:38:06.0103591Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0103660Z { 2023-01-11T21:38:06.0103746Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0103839Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.0103933Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0104039Z auto tmp5 = static_cast(1); 2023-01-11T21:38:06.0104129Z auto tmp6 = tmp0 + tmp5; 2023-01-11T21:38:06.0104232Z tmp3 = std::max(tmp3, tmp2); 2023-01-11T21:38:06.0104332Z tmp4 = std::min(tmp4, tmp2); 2023-01-11T21:38:06.0104422Z tmp7 = std::max(tmp7, tmp6); 2023-01-11T21:38:06.0104491Z } 2023-01-11T21:38:06.0104559Z } 2023-01-11T21:38:06.0104643Z out_ptr0[0] = tmp3; 2023-01-11T21:38:06.0104724Z out_ptr1[0] = tmp4; 2023-01-11T21:38:06.0104806Z out_ptr2[0] = tmp7; 2023-01-11T21:38:06.0104875Z } 2023-01-11T21:38:06.0104934Z } 2023-01-11T21:38:06.0105017Z ''') 2023-01-11T21:38:06.0105023Z 2023-01-11T21:38:06.0105027Z 2023-01-11T21:38:06.0105121Z async_compile.wait(globals()) 2023-01-11T21:38:06.0105198Z del async_compile 2023-01-11T21:38:06.0105203Z 2023-01-11T21:38:06.0105277Z def call(args): 2023-01-11T21:38:06.0105385Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0105462Z args.clear() 2023-01-11T21:38:06.0105644Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0105826Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0106017Z buf2 = empty_strided((1, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0106240Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.0106313Z del arg0_1 2023-01-11T21:38:06.0106383Z del arg1_1 2023-01-11T21:38:06.0106468Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.0106477Z 2023-01-11T21:38:06.0106481Z 2023-01-11T21:38:06.0106563Z if __name__ == "__main__": 2023-01-11T21:38:06.0106676Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0106803Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0107002Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0107199Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0107317Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0107323Z 2023-01-11T21:38:06.0107395Z ok (2.310s) 2023-01-11T21:38:06.0107863Z test_misaligned_address_issue1_cpu (__main__.CpuTests) 
... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0108028Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0108288Z [2023-01-11 21:29:33,108] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 199 2023-01-11T21:38:06.0108547Z [2023-01-11 21:29:35,087] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 199 2023-01-11T21:38:06.0108561Z 2023-01-11T21:38:06.0108651Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0108731Z import torch 2023-01-11T21:38:06.0108807Z import random 2023-01-11T21:38:06.0108928Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0109051Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0109056Z 2023-01-11T21:38:06.0109137Z aten = torch.ops.aten 2023-01-11T21:38:06.0109274Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0109365Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0109370Z 2023-01-11T21:38:06.0109445Z import triton 2023-01-11T21:38:06.0109544Z import triton.language as tl 2023-01-11T21:38:06.0109668Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0109806Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0109815Z 2023-01-11T21:38:06.0109820Z 2023-01-11T21:38:06.0109956Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0110163Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0110285Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:06.0110388Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0110494Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0110560Z { 2023-01-11T21:38:06.0110624Z { 2023-01-11T21:38:06.0110692Z { 2023-01-11T21:38:06.0110784Z auto tmp0 = in_ptr0[0]; 2023-01-11T21:38:06.0110880Z auto tmp1 = in_ptr1[tmp0]; 2023-01-11T21:38:06.0110957Z out_ptr0[0] = tmp1; 2023-01-11T21:38:06.0111026Z } 2023-01-11T21:38:06.0111094Z } 2023-01-11T21:38:06.0111160Z } 2023-01-11T21:38:06.0111244Z ''') 2023-01-11T21:38:06.0111249Z 2023-01-11T21:38:06.0111254Z 2023-01-11T21:38:06.0111376Z async_compile.wait(globals()) 2023-01-11T21:38:06.0111455Z del async_compile 2023-01-11T21:38:06.0111460Z 2023-01-11T21:38:06.0111527Z def call(args): 2023-01-11T21:38:06.0111608Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0111683Z args.clear() 2023-01-11T21:38:06.0111877Z buf0 = empty_strided((1, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0112045Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0112123Z del arg0_1 2023-01-11T21:38:06.0112196Z del arg1_1 2023-01-11T21:38:06.0112264Z return (buf0, ) 2023-01-11T21:38:06.0112269Z 2023-01-11T21:38:06.0112284Z 2023-01-11T21:38:06.0112359Z if __name__ == "__main__": 2023-01-11T21:38:06.0112477Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0112603Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0112807Z arg0_1 = rand_strided((1, 1000), (1000, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0113000Z 
arg1_1 = rand_strided((1, 1), (1, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0113121Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0113126Z 2023-01-11T21:38:06.0113196Z ok (2.001s) 2023-01-11T21:38:06.0113650Z test_mm_views_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0113812Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0114074Z [2023-01-11 21:29:35,134] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 200 2023-01-11T21:38:06.0114338Z [2023-01-11 21:29:35,137] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 200 2023-01-11T21:38:06.0114344Z 2023-01-11T21:38:06.0114447Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0114521Z import torch 2023-01-11T21:38:06.0114594Z import random 2023-01-11T21:38:06.0114714Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0114838Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0114843Z 2023-01-11T21:38:06.0114935Z aten = torch.ops.aten 2023-01-11T21:38:06.0115089Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0115201Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0115207Z 2023-01-11T21:38:06.0115285Z import triton 2023-01-11T21:38:06.0115380Z import triton.language as tl 2023-01-11T21:38:06.0115507Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0115646Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0115651Z 2023-01-11T21:38:06.0115655Z 2023-01-11T21:38:06.0115748Z async_compile.wait(globals()) 2023-01-11T21:38:06.0115820Z del async_compile 2023-01-11T21:38:06.0115832Z 2023-01-11T21:38:06.0115900Z def call(args): 2023-01-11T21:38:06.0115981Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0116056Z args.clear() 2023-01-11T21:38:06.0116257Z buf0 = empty_strided((32, 32), (32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0116386Z aten.mm.out(arg0_1, as_strided(arg1_1, (32, 32), (32, 1)), out=buf0) 2023-01-11T21:38:06.0116460Z del arg0_1 2023-01-11T21:38:06.0116524Z del arg1_1 2023-01-11T21:38:06.0116602Z return (buf0, ) 2023-01-11T21:38:06.0116607Z 2023-01-11T21:38:06.0116612Z 2023-01-11T21:38:06.0116694Z if __name__ == "__main__": 2023-01-11T21:38:06.0116812Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0116936Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0117138Z arg0_1 = rand_strided((32, 32), (1, 32), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0117379Z arg1_1 = rand_strided((32, 1, 32), (32, 1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0117498Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0117503Z 2023-01-11T21:38:06.0117576Z ok (0.068s) 2023-01-11T21:38:06.0118030Z test_move_arange_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0118164Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0118420Z [2023-01-11 21:29:35,195] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 201
2023-01-11T21:38:06.0118683Z [2023-01-11 21:29:37,179] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 201
2023-01-11T21:38:06.0118692Z
2023-01-11T21:38:06.0118791Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0118865Z import torch
2023-01-11T21:38:06.0118939Z import random
2023-01-11T21:38:06.0119060Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0119186Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0119191Z
2023-01-11T21:38:06.0119266Z aten = torch.ops.aten
2023-01-11T21:38:06.0119405Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0119501Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0119506Z
2023-01-11T21:38:06.0119579Z import triton
2023-01-11T21:38:06.0119701Z import triton.language as tl
2023-01-11T21:38:06.0119826Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0119965Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0119972Z
2023-01-11T21:38:06.0119976Z
2023-01-11T21:38:06.0120114Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0120315Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0120440Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0120545Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0120610Z {
2023-01-11T21:38:06.0120711Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0120777Z {
2023-01-11T21:38:06.0120858Z #pragma omp for
2023-01-11T21:38:06.0120938Z for(long i0=0; i0<32; i0+=1)
2023-01-11T21:38:06.0121004Z {
2023-01-11T21:38:06.0121073Z {
2023-01-11T21:38:06.0121144Z {
2023-01-11T21:38:06.0121249Z auto tmp2 = in_ptr0[i0];
2023-01-11T21:38:06.0121359Z auto tmp0 = static_cast<long>(i0);
2023-01-11T21:38:06.0121471Z auto tmp1 = static_cast<float>(tmp0);
2023-01-11T21:38:06.0121560Z auto tmp3 = tmp1 + tmp2;
2023-01-11T21:38:06.0121655Z out_ptr0[i0] = tmp3;
2023-01-11T21:38:06.0121724Z }
2023-01-11T21:38:06.0121792Z }
2023-01-11T21:38:06.0121859Z }
2023-01-11T21:38:06.0121926Z }
2023-01-11T21:38:06.0121983Z }
2023-01-11T21:38:06.0122067Z ''')
2023-01-11T21:38:06.0122073Z
2023-01-11T21:38:06.0122077Z
2023-01-11T21:38:06.0122170Z async_compile.wait(globals())
2023-01-11T21:38:06.0122247Z del async_compile
2023-01-11T21:38:06.0122252Z
2023-01-11T21:38:06.0122326Z def call(args):
2023-01-11T21:38:06.0122400Z arg0_1, = args
2023-01-11T21:38:06.0122477Z args.clear()
2023-01-11T21:38:06.0122670Z buf0 = empty_strided((32, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0122805Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0122881Z del arg0_1
2023-01-11T21:38:06.0122956Z return (buf0, )
2023-01-11T21:38:06.0122961Z
2023-01-11T21:38:06.0122966Z
2023-01-11T21:38:06.0123045Z if __name__ == "__main__":
2023-01-11T21:38:06.0123190Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0123318Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0123515Z arg0_1 = rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0123619Z
print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0123632Z 2023-01-11T21:38:06.0123696Z ok (2.029s) 2023-01-11T21:38:06.0124149Z test_multi_device_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0124283Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0124544Z [2023-01-11 21:29:37,367] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 202 2023-01-11T21:38:06.0124724Z [2023-01-11 21:29:37,370] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.0124903Z [2023-01-11 21:29:37,371] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.0125087Z [2023-01-11 21:29:37,373] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.0125287Z [2023-01-11 21:29:37,374] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.0125563Z [2023-01-11 21:29:39,669] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 202 2023-01-11T21:38:06.0125569Z 2023-01-11T21:38:06.0125661Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0125778Z import torch 2023-01-11T21:38:06.0125852Z import random 2023-01-11T21:38:06.0125972Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0126096Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0126101Z 2023-01-11T21:38:06.0126181Z aten = torch.ops.aten 2023-01-11T21:38:06.0126321Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0126409Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0126424Z 2023-01-11T21:38:06.0126490Z import triton 2023-01-11T21:38:06.0126586Z import triton.language as tl 2023-01-11T21:38:06.0126711Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0126851Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0126856Z 2023-01-11T21:38:06.0126861Z 2023-01-11T21:38:06.0126997Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0127202Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0127331Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0127428Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0127498Z { 2023-01-11T21:38:06.0127601Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0127667Z { 2023-01-11T21:38:06.0127753Z #pragma omp for 2023-01-11T21:38:06.0127840Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:06.0127910Z { 2023-01-11T21:38:06.0128042Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0128181Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0128271Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0128407Z auto tmp3 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.0128496Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0128592Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0128665Z } 2023-01-11T21:38:06.0128765Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0128845Z for(long i0=40; i0<40; i0+=1) 2023-01-11T21:38:06.0128912Z { 2023-01-11T21:38:06.0128999Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0129104Z auto tmp1 = static_cast(1); 
2023-01-11T21:38:06.0129225Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0129330Z auto tmp3 = static_cast(2); 2023-01-11T21:38:06.0129423Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0129502Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0129571Z } 2023-01-11T21:38:06.0129638Z } 2023-01-11T21:38:06.0129703Z } 2023-01-11T21:38:06.0129789Z ''') 2023-01-11T21:38:06.0129794Z 2023-01-11T21:38:06.0129799Z 2023-01-11T21:38:06.0130001Z triton_fused_add_add_1_add_2_add_3_device_put_1 = async_compile.triton(''' 2023-01-11T21:38:06.0130078Z import triton 2023-01-11T21:38:06.0130163Z import triton.language as tl 2023-01-11T21:38:06.0130282Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.0130384Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.0130517Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.0130644Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.0130650Z 2023-01-11T21:38:06.0131058Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.0131133Z @triton.jit 2023-01-11T21:38:06.0131260Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.0131327Z xnumel = 40 2023-01-11T21:38:06.0131425Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.0131557Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.0131640Z xmask = xindex < xnumel 2023-01-11T21:38:06.0131712Z x0 = xindex 2023-01-11T21:38:06.0131844Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.0131915Z tmp1 = 3 2023-01-11T21:38:06.0131988Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.0132056Z tmp3 = 4 2023-01-11T21:38:06.0132134Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.0132276Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.0132365Z ''') 2023-01-11T21:38:06.0132371Z 2023-01-11T21:38:06.0132375Z 2023-01-11T21:38:06.0132512Z kernel_cpp_2 = async_compile.cpp(''' 2023-01-11T21:38:06.0132718Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0132834Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:06.0132899Z { 2023-01-11T21:38:06.0132999Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0133065Z { 2023-01-11T21:38:06.0133147Z #pragma omp for 2023-01-11T21:38:06.0133234Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:06.0133304Z { 2023-01-11T21:38:06.0133442Z auto tmp0 = at::vec::Vectorized::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0133580Z auto tmp1 = at::vec::Vectorized(static_cast(5)); 2023-01-11T21:38:06.0133670Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0133808Z auto tmp3 = at::vec::Vectorized(static_cast(6)); 2023-01-11T21:38:06.0133897Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0134001Z tmp4.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0134071Z } 2023-01-11T21:38:06.0134165Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0134255Z for(long i0=40; i0<40; i0+=1) 2023-01-11T21:38:06.0134326Z { 2023-01-11T21:38:06.0134420Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.0134633Z auto tmp1 = static_cast(5); 2023-01-11T21:38:06.0134723Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0134828Z auto tmp3 = static_cast(6); 2023-01-11T21:38:06.0134913Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0135017Z in_out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0135093Z 
} 2023-01-11T21:38:06.0135167Z } 2023-01-11T21:38:06.0135246Z } 2023-01-11T21:38:06.0135333Z ''') 2023-01-11T21:38:06.0135338Z 2023-01-11T21:38:06.0135343Z 2023-01-11T21:38:06.0135649Z triton_fused_add_add_1_add_2_add_3_add_4_add_5_add_6_add_7_device_put_device_put_1_3 = async_compile.triton(''' 2023-01-11T21:38:06.0135727Z import triton 2023-01-11T21:38:06.0135812Z import triton.language as tl 2023-01-11T21:38:06.0135927Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.0136029Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.0136163Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.0136291Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.0136297Z 2023-01-11T21:38:06.0136694Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.0136771Z @triton.jit 2023-01-11T21:38:06.0136898Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.0136965Z xnumel = 40 2023-01-11T21:38:06.0137064Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.0137249Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.0137333Z xmask = xindex < xnumel 2023-01-11T21:38:06.0137402Z x0 = xindex 2023-01-11T21:38:06.0137508Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.0137579Z tmp1 = 7 2023-01-11T21:38:06.0137652Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.0137722Z tmp3 = 8 2023-01-11T21:38:06.0137804Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.0137942Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.0138027Z ''') 2023-01-11T21:38:06.0138071Z 2023-01-11T21:38:06.0138076Z 2023-01-11T21:38:06.0138214Z kernel_cpp_4 = async_compile.cpp(''' 2023-01-11T21:38:06.0138422Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0138539Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:06.0138605Z { 2023-01-11T21:38:06.0138712Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0138781Z { 2023-01-11T21:38:06.0138865Z #pragma omp for 2023-01-11T21:38:06.0138954Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:06.0139023Z { 2023-01-11T21:38:06.0139165Z auto tmp0 = at::vec::Vectorized::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0139306Z auto tmp1 = at::vec::Vectorized(static_cast(9)); 2023-01-11T21:38:06.0139398Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0139538Z auto tmp3 = at::vec::Vectorized(static_cast(10)); 2023-01-11T21:38:06.0139629Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0139738Z tmp4.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0139806Z } 2023-01-11T21:38:06.0139902Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0139992Z for(long i0=40; i0<40; i0+=1) 2023-01-11T21:38:06.0140059Z { 2023-01-11T21:38:06.0140160Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.0140266Z auto tmp1 = static_cast(9); 2023-01-11T21:38:06.0140363Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0140469Z auto tmp3 = static_cast(10); 2023-01-11T21:38:06.0140553Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0140644Z in_out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0140712Z } 2023-01-11T21:38:06.0140780Z } 2023-01-11T21:38:06.0140846Z } 2023-01-11T21:38:06.0140933Z ''') 2023-01-11T21:38:06.0140938Z 2023-01-11T21:38:06.0140943Z 2023-01-11T21:38:06.0141038Z 
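# [Editorial note, not part of the generated module] The call() below ping-pongs between
# devices: kernel_cpp_0 adds 1 and 2 on CPU, the buffer is copied to CUDA where
# triton_fused_add_add_1_add_2_add_3_device_put_1 adds 3 and 4 in place, it is copied back
# for kernel_cpp_2 (adds 5 and 6), returned to CUDA for the second Triton kernel (adds 7
# and 8), and finished on CPU by kernel_cpp_4 (adds 9 and 10). Each hop reuses a dead
# buffer's allocation via the "bufN = bufM; del bufM  # reuse" idiom instead of allocating anew.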
async_compile.wait(globals()) 2023-01-11T21:38:06.0141115Z del async_compile 2023-01-11T21:38:06.0141129Z 2023-01-11T21:38:06.0141200Z def call(args): 2023-01-11T21:38:06.0141275Z arg0_1, = args 2023-01-11T21:38:06.0141353Z args.clear() 2023-01-11T21:38:06.0141564Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0141740Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0141817Z del arg0_1 2023-01-11T21:38:06.0141909Z with torch.cuda.device(0): 2023-01-11T21:38:06.0142112Z buf1 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.0142193Z buf1.copy_(buf0) 2023-01-11T21:38:06.0142285Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.0142377Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.0142540Z triton_fused_add_add_1_add_2_add_3_device_put_1.run(buf2, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.0142630Z buf3 = buf0; del buf0 # reuse 2023-01-11T21:38:06.0142712Z buf3.copy_(buf2) 2023-01-11T21:38:06.0142793Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:06.0142900Z kernel_cpp_2(c_void_p(buf4.data_ptr())) 2023-01-11T21:38:06.0142995Z with torch.cuda.device(0): 2023-01-11T21:38:06.0143085Z buf5 = buf2; del buf2 # reuse 2023-01-11T21:38:06.0143169Z buf5.copy_(buf4) 2023-01-11T21:38:06.0143261Z buf6 = buf5; del buf5 # reuse 2023-01-11T21:38:06.0143452Z triton_fused_add_add_1_add_2_add_3_add_4_add_5_add_6_add_7_device_put_device_put_1_3.run(buf6, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.0143534Z buf7 = buf4; del buf4 # reuse 2023-01-11T21:38:06.0143614Z buf7.copy_(buf6) 2023-01-11T21:38:06.0143684Z del buf6 2023-01-11T21:38:06.0143773Z buf8 = buf7; del buf7 # reuse 2023-01-11T21:38:06.0143877Z kernel_cpp_4(c_void_p(buf8.data_ptr())) 2023-01-11T21:38:06.0143952Z return (buf8, ) 2023-01-11T21:38:06.0143957Z 2023-01-11T21:38:06.0143962Z 2023-01-11T21:38:06.0144082Z if __name__ == "__main__": 2023-01-11T21:38:06.0144193Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0144319Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0144527Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0144644Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0144649Z 2023-01-11T21:38:06.0144719Z ok (2.484s) 2023-01-11T21:38:06.0144887Z test_multi_gpu_device_cpu (__main__.CpuTests) ... skip: requires multiple cuda devices (0.000s) 2023-01-11T21:38:06.0145040Z test_multilayer_low_prec_cpu (__main__.CpuTests) ... skip: requires CUDA (0.001s) 2023-01-11T21:38:06.0145190Z test_nan_to_num_cpu (__main__.CpuTests) ... skip: Skipping due to op bugs (0.001s) 2023-01-11T21:38:06.0145645Z test_narrow_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0145779Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0146033Z [2023-01-11 21:29:39,703] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 203
2023-01-11T21:38:06.0146298Z [2023-01-11 21:29:42,043] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 203
2023-01-11T21:38:06.0146304Z
2023-01-11T21:38:06.0146402Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0146477Z import torch
2023-01-11T21:38:06.0146552Z import random
2023-01-11T21:38:06.0146672Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0146796Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0146801Z
2023-01-11T21:38:06.0146885Z aten = torch.ops.aten
2023-01-11T21:38:06.0147016Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0147114Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0147120Z
2023-01-11T21:38:06.0147194Z import triton
2023-01-11T21:38:06.0147287Z import triton.language as tl
2023-01-11T21:38:06.0147411Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0147576Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0147582Z
2023-01-11T21:38:06.0147587Z
2023-01-11T21:38:06.0147722Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0147932Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0148048Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0148155Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0148221Z {
2023-01-11T21:38:06.0148322Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0148390Z {
2023-01-11T21:38:06.0148477Z #pragma omp for
2023-01-11T21:38:06.0148557Z for(long i0=0; i0<128; i0+=1)
2023-01-11T21:38:06.0148624Z {
2023-01-11T21:38:06.0148768Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 640 + (8*i0));
2023-01-11T21:38:06.0148907Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(2));
2023-01-11T21:38:06.0149002Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0149141Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.0149229Z auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.0149325Z tmp4.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0149385Z }
2023-01-11T21:38:06.0149486Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0149579Z for(long i0=1024; i0<1024; i0+=1)
2023-01-11T21:38:06.0149646Z {
2023-01-11T21:38:06.0149741Z auto tmp0 = in_ptr0[640 + i0];
2023-01-11T21:38:06.0149844Z auto tmp1 = static_cast<float>(2);
2023-01-11T21:38:06.0149961Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0150057Z auto tmp3 = static_cast<float>(1);
2023-01-11T21:38:06.0150145Z auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.0150232Z out_ptr0[i0] = tmp4;
2023-01-11T21:38:06.0150301Z }
2023-01-11T21:38:06.0150368Z }
2023-01-11T21:38:06.0150438Z }
2023-01-11T21:38:06.0150524Z ''')
2023-01-11T21:38:06.0150529Z
2023-01-11T21:38:06.0150534Z
2023-01-11T21:38:06.0150620Z async_compile.wait(globals())
2023-01-11T21:38:06.0150702Z del async_compile
2023-01-11T21:38:06.0150708Z
2023-01-11T21:38:06.0150783Z def call(args):
2023-01-11T21:38:06.0150857Z arg0_1, = args
2023-01-11T21:38:06.0150936Z args.clear()
2023-01-11T21:38:06.0151138Z buf0 = empty_strided((16, 64), (64, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0151276Z
kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0151383Z return (as_strided(arg0_1, (64, 16), (64, 1), 10), buf0, ) 2023-01-11T21:38:06.0151401Z 2023-01-11T21:38:06.0151405Z 2023-01-11T21:38:06.0151478Z if __name__ == "__main__": 2023-01-11T21:38:06.0151598Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0151723Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0151930Z arg0_1 = rand_strided((64, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0152044Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0152049Z 2023-01-11T21:38:06.0152119Z ok (2.372s) 2023-01-11T21:38:06.0152582Z test_new_empty_strided_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0152716Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0152973Z [2023-01-11 21:29:42,095] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 204 2023-01-11T21:38:06.0153230Z [2023-01-11 21:29:44,213] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 204 2023-01-11T21:38:06.0153314Z 2023-01-11T21:38:06.0153406Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0153483Z import torch 2023-01-11T21:38:06.0153557Z import random 2023-01-11T21:38:06.0153676Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0153800Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0153805Z 2023-01-11T21:38:06.0153886Z aten = torch.ops.aten 2023-01-11T21:38:06.0154026Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0154115Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0154120Z 2023-01-11T21:38:06.0154194Z import triton 2023-01-11T21:38:06.0154292Z import triton.language as tl 2023-01-11T21:38:06.0154418Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0154557Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0154563Z 2023-01-11T21:38:06.0154567Z 2023-01-11T21:38:06.0154703Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0154910Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0155028Z extern "C" void kernel(float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0155086Z { 2023-01-11T21:38:06.0155191Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0155257Z { 2023-01-11T21:38:06.0155342Z #pragma omp for 2023-01-11T21:38:06.0155431Z for(long i0=0; i0<2048; i0+=1) 2023-01-11T21:38:06.0155498Z { 2023-01-11T21:38:06.0155640Z auto tmp0 = at::vec::Vectorized<float>(static_cast<float>(123)); 2023-01-11T21:38:06.0155728Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0155829Z } 2023-01-11T21:38:06.0155928Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0156023Z for(long i0=16384; i0<16384; i0+=1) 2023-01-11T21:38:06.0156091Z { 2023-01-11T21:38:06.0156196Z auto tmp0 = static_cast<float>(123); 2023-01-11T21:38:06.0156284Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:06.0156346Z } 2023-01-11T21:38:06.0156412Z } 2023-01-11T21:38:06.0156478Z } 2023-01-11T21:38:06.0156564Z ''')
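Dumps like the two modules above can be regenerated locally. A minimal sketch, assuming a PyTorch build of this vintage where torch.compile routes through inductor and torch._inductor.config.debug echoes the generated wrapper and kernel source (treat both knobs as assumptions to verify against your checkout):

import torch
import torch._inductor.config as inductor_config

inductor_config.debug = True  # assumption: echoes the generated call()/kernel_cpp source

def fn(x):
    # mirrors the narrow-then-add pattern of graph 203 above: the C++ kernel
    # reads in_ptr0 at offset 640 (= 10 rows of 64) and adds 2, then 1
    return x.narrow(0, 10, 16) + 2 + 1

compiled = torch.compile(fn)
out = compiled(torch.randn(64, 64))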
2023-01-11T21:38:06.0156569Z 2023-01-11T21:38:06.0156574Z 2023-01-11T21:38:06.0156667Z async_compile.wait(globals()) 2023-01-11T21:38:06.0156744Z del async_compile 2023-01-11T21:38:06.0156749Z 2023-01-11T21:38:06.0156824Z def call(args): 2023-01-11T21:38:06.0156890Z arg0_1, = args 2023-01-11T21:38:06.0156965Z args.clear() 2023-01-11T21:38:06.0157183Z buf0 = empty_strided((1, 128, 128), (16384, 128, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0157293Z kernel_cpp_0(c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0157373Z return (buf0, ) 2023-01-11T21:38:06.0157378Z 2023-01-11T21:38:06.0157382Z 2023-01-11T21:38:06.0157463Z if __name__ == "__main__": 2023-01-11T21:38:06.0157581Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0157700Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0157895Z arg0_1 = rand_strided((55, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0158007Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0158012Z 2023-01-11T21:38:06.0158085Z ok (2.177s) 2023-01-11T21:38:06.0158537Z test_new_ones_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0158672Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0158936Z [2023-01-11 21:29:44,303] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 205 2023-01-11T21:38:06.0159228Z [2023-01-11 21:29:46,211] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 205 2023-01-11T21:38:06.0159234Z 2023-01-11T21:38:06.0159335Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0159411Z import torch 2023-01-11T21:38:06.0159478Z import random 2023-01-11T21:38:06.0159598Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0159724Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0159729Z 2023-01-11T21:38:06.0159812Z aten = torch.ops.aten 2023-01-11T21:38:06.0159950Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0160043Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0160049Z 2023-01-11T21:38:06.0160122Z import triton 2023-01-11T21:38:06.0160212Z import triton.language as tl 2023-01-11T21:38:06.0160336Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0160477Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0160482Z 2023-01-11T21:38:06.0160487Z 2023-01-11T21:38:06.0160626Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0160832Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0160951Z extern "C" void kernel(float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0161055Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0161121Z { 2023-01-11T21:38:06.0161180Z { 2023-01-11T21:38:06.0161247Z { 2023-01-11T21:38:06.0161351Z auto tmp0 = static_cast<float>(1); 2023-01-11T21:38:06.0161438Z out_ptr0[0] = tmp0; 2023-01-11T21:38:06.0161507Z } 2023-01-11T21:38:06.0161575Z } 2023-01-11T21:38:06.0161642Z { 2023-01-11T21:38:06.0161736Z { 2023-01-11T21:38:06.0161841Z auto tmp0 = static_cast<float>(0); 2023-01-11T21:38:06.0161927Z 
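// [editor note] zero-dimensional outputs need no loops: this kernel is just
// two scalar stores, the 1 into out_ptr0 above and the 0 into out_ptr1 below.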
out_ptr1[0] = tmp0; 2023-01-11T21:38:06.0161994Z } 2023-01-11T21:38:06.0162060Z } 2023-01-11T21:38:06.0162125Z } 2023-01-11T21:38:06.0162203Z ''') 2023-01-11T21:38:06.0162208Z 2023-01-11T21:38:06.0162219Z 2023-01-11T21:38:06.0162309Z async_compile.wait(globals()) 2023-01-11T21:38:06.0162387Z del async_compile 2023-01-11T21:38:06.0162392Z 2023-01-11T21:38:06.0162468Z def call(args): 2023-01-11T21:38:06.0162545Z arg0_1, = args 2023-01-11T21:38:06.0162620Z args.clear() 2023-01-11T21:38:06.0162810Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0162991Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0163123Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0163204Z return (buf0, buf1, ) 2023-01-11T21:38:06.0163210Z 2023-01-11T21:38:06.0163217Z 2023-01-11T21:38:06.0163301Z if __name__ == "__main__": 2023-01-11T21:38:06.0163421Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0163548Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0163740Z arg0_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0163854Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0163860Z 2023-01-11T21:38:06.0163929Z ok (1.990s) 2023-01-11T21:38:06.0164375Z test_nll_loss_forward_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0164503Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0164762Z [2023-01-11 21:29:46,263] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 206 2023-01-11T21:38:06.0165024Z [2023-01-11 21:29:48,521] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 206 2023-01-11T21:38:06.0165030Z 2023-01-11T21:38:06.0165164Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0165243Z import torch 2023-01-11T21:38:06.0165321Z import random 2023-01-11T21:38:06.0165440Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0165574Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0165579Z 2023-01-11T21:38:06.0165654Z aten = torch.ops.aten 2023-01-11T21:38:06.0165789Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0165885Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0165890Z 2023-01-11T21:38:06.0165965Z import triton 2023-01-11T21:38:06.0166055Z import triton.language as tl 2023-01-11T21:38:06.0166181Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0166320Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0166326Z 2023-01-11T21:38:06.0166330Z 2023-01-11T21:38:06.0166471Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0166669Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0166791Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.0166902Z const long* __restrict__ in_ptr0, 2023-01-11T21:38:06.0167012Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0167119Z float* __restrict__ out_ptr1) 
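// [editor note] nll_loss forward, mean reduction: the first block below sums
// -in_ptr1[target + 5*i] over the five rows, the second divides the total by
// the element count, and the third stores the total weight (5.0).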
2023-01-11T21:38:06.0167184Z { 2023-01-11T21:38:06.0167275Z auto out_ptr0 = in_out_ptr0; 2023-01-11T21:38:06.0167336Z { 2023-01-11T21:38:06.0167404Z { 2023-01-11T21:38:06.0167488Z float tmp3 = 0; 2023-01-11T21:38:06.0167576Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:06.0167681Z { 2023-01-11T21:38:06.0167751Z { 2023-01-11T21:38:06.0167848Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0167947Z auto tmp1 = in_ptr1[tmp0 + (5*i0)]; 2023-01-11T21:38:06.0168077Z auto tmp2 = -tmp1; 2023-01-11T21:38:06.0168164Z tmp3 += tmp2; 2023-01-11T21:38:06.0168233Z } 2023-01-11T21:38:06.0168300Z } 2023-01-11T21:38:06.0168384Z out_ptr0[0] = tmp3; 2023-01-11T21:38:06.0168452Z } 2023-01-11T21:38:06.0168510Z } 2023-01-11T21:38:06.0168576Z { 2023-01-11T21:38:06.0168644Z { 2023-01-11T21:38:06.0168734Z auto tmp0 = out_ptr0[0]; 2023-01-11T21:38:06.0168838Z auto tmp1 = static_cast<float>(5); 2023-01-11T21:38:06.0168928Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0169008Z in_out_ptr0[0] = tmp2; 2023-01-11T21:38:06.0169072Z } 2023-01-11T21:38:06.0169136Z } 2023-01-11T21:38:06.0169204Z { 2023-01-11T21:38:06.0169270Z { 2023-01-11T21:38:06.0169375Z auto tmp0 = static_cast<float>(5.0); 2023-01-11T21:38:06.0169457Z out_ptr1[0] = tmp0; 2023-01-11T21:38:06.0169516Z } 2023-01-11T21:38:06.0169583Z } 2023-01-11T21:38:06.0169647Z } 2023-01-11T21:38:06.0169730Z ''') 2023-01-11T21:38:06.0169739Z 2023-01-11T21:38:06.0169745Z 2023-01-11T21:38:06.0169836Z async_compile.wait(globals()) 2023-01-11T21:38:06.0169915Z del async_compile 2023-01-11T21:38:06.0169920Z 2023-01-11T21:38:06.0169998Z def call(args): 2023-01-11T21:38:06.0170071Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0170146Z args.clear() 2023-01-11T21:38:06.0170333Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0170424Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.0170601Z buf2 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0170795Z kernel_cpp_0(c_void_p(buf1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.0170871Z del arg0_1 2023-01-11T21:38:06.0170936Z del arg1_1 2023-01-11T21:38:06.0171019Z return (buf1, buf2, ) 2023-01-11T21:38:06.0171024Z 2023-01-11T21:38:06.0171029Z 2023-01-11T21:38:06.0171110Z if __name__ == "__main__": 2023-01-11T21:38:06.0171259Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0171387Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0171584Z arg0_1 = rand_strided((5, 5), (5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0171775Z arg1_1 = rand_strided((5, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0171895Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0171901Z 2023-01-11T21:38:06.0171964Z ok (2.314s) 2023-01-11T21:38:06.0172436Z test_no_mega_fusion_during_lowering_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0172573Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0172829Z [2023-01-11 21:29:48,831] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 207 2023-01-11T21:38:06.0172835Z 2023-01-11T21:38:06.0172933Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0173011Z import torch 2023-01-11T21:38:06.0173086Z import random 2023-01-11T21:38:06.0173202Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0173327Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0173332Z 2023-01-11T21:38:06.0173407Z aten = torch.ops.aten 2023-01-11T21:38:06.0173542Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0173667Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0173672Z 2023-01-11T21:38:06.0173746Z import triton 2023-01-11T21:38:06.0173838Z import triton.language as tl 2023-01-11T21:38:06.0173963Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0174106Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0174111Z 2023-01-11T21:38:06.0174115Z 2023-01-11T21:38:06.0174254Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0174451Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0174788Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.0174900Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0175011Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0175123Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:06.0175236Z const float* __restrict__ in_ptr3, 2023-01-11T21:38:06.0175341Z const float* __restrict__ in_ptr4, 2023-01-11T21:38:06.0175447Z const float* __restrict__ in_ptr5, 2023-01-11T21:38:06.0175545Z const float* __restrict__ in_ptr6, 2023-01-11T21:38:06.0175654Z const float* __restrict__ in_ptr7, 2023-01-11T21:38:06.0175762Z const float* __restrict__ in_ptr8, 2023-01-11T21:38:06.0175866Z const float* __restrict__ in_ptr9, 2023-01-11T21:38:06.0175981Z const float* __restrict__ in_ptr10, 2023-01-11T21:38:06.0176092Z const float* __restrict__ in_ptr11, 2023-01-11T21:38:06.0176200Z const float* __restrict__ in_ptr12, 2023-01-11T21:38:06.0176306Z const float* __restrict__ in_ptr13, 2023-01-11T21:38:06.0176406Z const float* __restrict__ in_ptr14, 2023-01-11T21:38:06.0176515Z const float* __restrict__ in_ptr15, 2023-01-11T21:38:06.0176621Z const float* __restrict__ in_ptr16, 2023-01-11T21:38:06.0176729Z const float* __restrict__ in_ptr17, 2023-01-11T21:38:06.0176834Z const float* __restrict__ in_ptr18, 2023-01-11T21:38:06.0176993Z const float* __restrict__ in_ptr19, 2023-01-11T21:38:06.0177102Z const float* __restrict__ in_ptr20, 2023-01-11T21:38:06.0177255Z const float* __restrict__ in_ptr21, 2023-01-11T21:38:06.0177363Z const float* __restrict__ in_ptr22, 2023-01-11T21:38:06.0177468Z const float* __restrict__ in_ptr23, 2023-01-11T21:38:06.0177574Z const float* __restrict__ in_ptr24, 2023-01-11T21:38:06.0177679Z const float* __restrict__ in_ptr25, 2023-01-11T21:38:06.0177784Z const float* __restrict__ in_ptr26, 2023-01-11T21:38:06.0177894Z const float* __restrict__ in_ptr27, 2023-01-11T21:38:06.0177999Z const float* __restrict__ in_ptr28, 2023-01-11T21:38:06.0178098Z const float* __restrict__ in_ptr29, 2023-01-11T21:38:06.0178206Z const float* __restrict__ in_ptr30, 
2023-01-11T21:38:06.0178311Z const float* __restrict__ in_ptr31, 2023-01-11T21:38:06.0178418Z const float* __restrict__ in_ptr32, 2023-01-11T21:38:06.0178524Z const float* __restrict__ in_ptr33, 2023-01-11T21:38:06.0178627Z const float* __restrict__ in_ptr34, 2023-01-11T21:38:06.0178733Z const float* __restrict__ in_ptr35, 2023-01-11T21:38:06.0178831Z const float* __restrict__ in_ptr36, 2023-01-11T21:38:06.0178935Z const float* __restrict__ in_ptr37, 2023-01-11T21:38:06.0179080Z const float* __restrict__ in_ptr38, 2023-01-11T21:38:06.0179186Z const float* __restrict__ in_ptr39, 2023-01-11T21:38:06.0179292Z const float* __restrict__ in_ptr40, 2023-01-11T21:38:06.0179398Z const float* __restrict__ in_ptr41, 2023-01-11T21:38:06.0179505Z const float* __restrict__ in_ptr42, 2023-01-11T21:38:06.0179612Z const float* __restrict__ in_ptr43, 2023-01-11T21:38:06.0179711Z const float* __restrict__ in_ptr44, 2023-01-11T21:38:06.0179819Z const float* __restrict__ in_ptr45, 2023-01-11T21:38:06.0179923Z const float* __restrict__ in_ptr46, 2023-01-11T21:38:06.0180035Z const float* __restrict__ in_ptr47, 2023-01-11T21:38:06.0180140Z const float* __restrict__ in_ptr48, 2023-01-11T21:38:06.0180247Z const float* __restrict__ in_ptr49) 2023-01-11T21:38:06.0180314Z { 2023-01-11T21:38:06.0180399Z auto out_ptr0 = in_out_ptr0; 2023-01-11T21:38:06.0180498Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0180563Z { 2023-01-11T21:38:06.0180645Z #pragma omp for 2023-01-11T21:38:06.0180733Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0180798Z { 2023-01-11T21:38:06.0180943Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0181073Z auto tmp2 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.0181204Z auto tmp4 = at::vec::Vectorized<float>::loadu(in_ptr2 + 8*i0); 2023-01-11T21:38:06.0181336Z auto tmp6 = at::vec::Vectorized<float>::loadu(in_ptr3 + 8*i0); 2023-01-11T21:38:06.0181465Z auto tmp8 = at::vec::Vectorized<float>::loadu(in_ptr4 + 8*i0); 2023-01-11T21:38:06.0181597Z auto tmp10 = at::vec::Vectorized<float>::loadu(in_ptr5 + 8*i0); 2023-01-11T21:38:06.0181731Z auto tmp12 = at::vec::Vectorized<float>::loadu(in_ptr6 + 8*i0); 2023-01-11T21:38:06.0181861Z auto tmp14 = at::vec::Vectorized<float>::loadu(in_ptr7 + 8*i0); 2023-01-11T21:38:06.0181992Z auto tmp16 = at::vec::Vectorized<float>::loadu(in_ptr8 + 8*i0); 2023-01-11T21:38:06.0182083Z auto tmp1 = tmp0 + tmp0; 2023-01-11T21:38:06.0182194Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.0182286Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:06.0182373Z auto tmp7 = tmp5 + tmp6; 2023-01-11T21:38:06.0182460Z auto tmp9 = tmp7 + tmp8; 2023-01-11T21:38:06.0182553Z auto tmp11 = tmp9 + tmp10; 2023-01-11T21:38:06.0182643Z auto tmp13 = tmp11 + tmp12; 2023-01-11T21:38:06.0182733Z auto tmp15 = tmp13 + tmp14; 2023-01-11T21:38:06.0182816Z auto tmp17 = tmp15 + tmp16; 2023-01-11T21:38:06.0182912Z tmp17.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0182979Z } 2023-01-11T21:38:06.0183077Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0183168Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0183235Z { 2023-01-11T21:38:06.0183322Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0183402Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:06.0183489Z auto tmp4 = in_ptr2[i0]; 2023-01-11T21:38:06.0183577Z auto tmp6 = in_ptr3[i0]; 2023-01-11T21:38:06.0183665Z auto tmp8 = in_ptr4[i0]; 2023-01-11T21:38:06.0183753Z auto tmp10 = in_ptr5[i0]; 2023-01-11T21:38:06.0183841Z auto tmp12 = in_ptr6[i0]; 2023-01-11T21:38:06.0183921Z auto tmp14 = in_ptr7[i0]; 2023-01-11T21:38:06.0184010Z auto tmp16 = in_ptr8[i0]; 
2023-01-11T21:38:06.0184098Z auto tmp1 = tmp0 + tmp0; 2023-01-11T21:38:06.0184184Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.0184274Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:06.0184359Z auto tmp7 = tmp5 + tmp6; 2023-01-11T21:38:06.0184474Z auto tmp9 = tmp7 + tmp8; 2023-01-11T21:38:06.0184556Z auto tmp11 = tmp9 + tmp10; 2023-01-11T21:38:06.0184648Z auto tmp13 = tmp11 + tmp12; 2023-01-11T21:38:06.0184738Z auto tmp15 = tmp13 + tmp14; 2023-01-11T21:38:06.0184828Z auto tmp17 = tmp15 + tmp16; 2023-01-11T21:38:06.0184916Z out_ptr0[i0] = tmp17; 2023-01-11T21:38:06.0184983Z } 2023-01-11T21:38:06.0185064Z #pragma omp for 2023-01-11T21:38:06.0185147Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0185231Z { 2023-01-11T21:38:06.0185387Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0185531Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr9 + 8*i0); 2023-01-11T21:38:06.0185665Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr10 + 8*i0); 2023-01-11T21:38:06.0185798Z auto tmp5 = at::vec::Vectorized<float>::loadu(in_ptr11 + 8*i0); 2023-01-11T21:38:06.0185932Z auto tmp7 = at::vec::Vectorized<float>::loadu(in_ptr12 + 8*i0); 2023-01-11T21:38:06.0186061Z auto tmp9 = at::vec::Vectorized<float>::loadu(in_ptr13 + 8*i0); 2023-01-11T21:38:06.0186199Z auto tmp11 = at::vec::Vectorized<float>::loadu(in_ptr14 + 8*i0); 2023-01-11T21:38:06.0186330Z auto tmp13 = at::vec::Vectorized<float>::loadu(in_ptr15 + 8*i0); 2023-01-11T21:38:06.0186465Z auto tmp15 = at::vec::Vectorized<float>::loadu(in_ptr16 + 8*i0); 2023-01-11T21:38:06.0186559Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0186648Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0186738Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0186823Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0186912Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0186995Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0187086Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0187176Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0187281Z tmp16.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0187351Z } 2023-01-11T21:38:06.0187451Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0187538Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0187597Z { 2023-01-11T21:38:06.0187714Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:06.0187801Z auto tmp1 = in_ptr9[i0]; 2023-01-11T21:38:06.0187889Z auto tmp3 = in_ptr10[i0]; 2023-01-11T21:38:06.0187979Z auto tmp5 = in_ptr11[i0]; 2023-01-11T21:38:06.0188065Z auto tmp7 = in_ptr12[i0]; 2023-01-11T21:38:06.0188156Z auto tmp9 = in_ptr13[i0]; 2023-01-11T21:38:06.0188239Z auto tmp11 = in_ptr14[i0]; 2023-01-11T21:38:06.0188330Z auto tmp13 = in_ptr15[i0]; 2023-01-11T21:38:06.0188422Z auto tmp15 = in_ptr16[i0]; 2023-01-11T21:38:06.0188512Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0188601Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0188693Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0188781Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0188862Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0188952Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0189045Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0189137Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0189224Z in_out_ptr0[i0] = tmp16; 2023-01-11T21:38:06.0189291Z } 2023-01-11T21:38:06.0189372Z #pragma omp for 2023-01-11T21:38:06.0189452Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0189522Z { 2023-01-11T21:38:06.0189667Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0189801Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr17 + 8*i0); 2023-01-11T21:38:06.0189935Z auto tmp3 = 
at::vec::Vectorized<float>::loadu(in_ptr18 + 8*i0); 2023-01-11T21:38:06.0190094Z auto tmp5 = at::vec::Vectorized<float>::loadu(in_ptr19 + 8*i0); 2023-01-11T21:38:06.0190226Z auto tmp7 = at::vec::Vectorized<float>::loadu(in_ptr20 + 8*i0); 2023-01-11T21:38:06.0190354Z auto tmp9 = at::vec::Vectorized<float>::loadu(in_ptr21 + 8*i0); 2023-01-11T21:38:06.0190496Z auto tmp11 = at::vec::Vectorized<float>::loadu(in_ptr22 + 8*i0); 2023-01-11T21:38:06.0190623Z auto tmp13 = at::vec::Vectorized<float>::loadu(in_ptr23 + 8*i0); 2023-01-11T21:38:06.0190754Z auto tmp15 = at::vec::Vectorized<float>::loadu(in_ptr24 + 8*i0); 2023-01-11T21:38:06.0190844Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0190933Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0191019Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0191103Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0191192Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0191275Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0191367Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0191456Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0191556Z tmp16.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0191623Z } 2023-01-11T21:38:06.0191723Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0191814Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0191873Z { 2023-01-11T21:38:06.0191968Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.0192058Z auto tmp1 = in_ptr17[i0]; 2023-01-11T21:38:06.0192151Z auto tmp3 = in_ptr18[i0]; 2023-01-11T21:38:06.0192239Z auto tmp5 = in_ptr19[i0]; 2023-01-11T21:38:06.0192327Z auto tmp7 = in_ptr20[i0]; 2023-01-11T21:38:06.0192415Z auto tmp9 = in_ptr21[i0]; 2023-01-11T21:38:06.0192498Z auto tmp11 = in_ptr22[i0]; 2023-01-11T21:38:06.0192588Z auto tmp13 = in_ptr23[i0]; 2023-01-11T21:38:06.0192677Z auto tmp15 = in_ptr24[i0]; 2023-01-11T21:38:06.0192768Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0192855Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0192942Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0193030Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0193111Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0193228Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0193321Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0193411Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0193499Z in_out_ptr0[i0] = tmp16; 2023-01-11T21:38:06.0193568Z } 2023-01-11T21:38:06.0193648Z #pragma omp for 2023-01-11T21:38:06.0193728Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0193793Z { 2023-01-11T21:38:06.0193935Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0194075Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr25 + 8*i0); 2023-01-11T21:38:06.0194218Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr26 + 8*i0); 2023-01-11T21:38:06.0194350Z auto tmp5 = at::vec::Vectorized<float>::loadu(in_ptr27 + 8*i0); 2023-01-11T21:38:06.0194483Z auto tmp7 = at::vec::Vectorized<float>::loadu(in_ptr28 + 8*i0); 2023-01-11T21:38:06.0194615Z auto tmp9 = at::vec::Vectorized<float>::loadu(in_ptr29 + 8*i0); 2023-01-11T21:38:06.0194746Z auto tmp11 = at::vec::Vectorized<float>::loadu(in_ptr30 + 8*i0); 2023-01-11T21:38:06.0194881Z auto tmp13 = at::vec::Vectorized<float>::loadu(in_ptr31 + 8*i0); 2023-01-11T21:38:06.0195014Z auto tmp15 = at::vec::Vectorized<float>::loadu(in_ptr32 + 8*i0); 2023-01-11T21:38:06.0195103Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0195193Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0195282Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0195369Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0195487Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0195570Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0195660Z auto 
tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0195748Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0195851Z tmp16.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0195919Z } 2023-01-11T21:38:06.0196022Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0196108Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0196168Z { 2023-01-11T21:38:06.0196263Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.0196354Z auto tmp1 = in_ptr25[i0]; 2023-01-11T21:38:06.0196444Z auto tmp3 = in_ptr26[i0]; 2023-01-11T21:38:06.0196532Z auto tmp5 = in_ptr27[i0]; 2023-01-11T21:38:06.0196619Z auto tmp7 = in_ptr28[i0]; 2023-01-11T21:38:06.0196707Z auto tmp9 = in_ptr29[i0]; 2023-01-11T21:38:06.0196789Z auto tmp11 = in_ptr30[i0]; 2023-01-11T21:38:06.0196881Z auto tmp13 = in_ptr31[i0]; 2023-01-11T21:38:06.0196971Z auto tmp15 = in_ptr32[i0]; 2023-01-11T21:38:06.0197063Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0197150Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0197237Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0197320Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0197408Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0197496Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0197587Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0197677Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0197764Z in_out_ptr0[i0] = tmp16; 2023-01-11T21:38:06.0197833Z } 2023-01-11T21:38:06.0197907Z #pragma omp for 2023-01-11T21:38:06.0197994Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0198059Z { 2023-01-11T21:38:06.0198202Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0198337Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr33 + 8*i0); 2023-01-11T21:38:06.0198470Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr34 + 8*i0); 2023-01-11T21:38:06.0198601Z auto tmp5 = at::vec::Vectorized<float>::loadu(in_ptr35 + 8*i0); 2023-01-11T21:38:06.0198756Z auto tmp7 = at::vec::Vectorized<float>::loadu(in_ptr36 + 8*i0); 2023-01-11T21:38:06.0198886Z auto tmp9 = at::vec::Vectorized<float>::loadu(in_ptr37 + 8*i0); 2023-01-11T21:38:06.0199014Z auto tmp11 = at::vec::Vectorized<float>::loadu(in_ptr38 + 8*i0); 2023-01-11T21:38:06.0199148Z auto tmp13 = at::vec::Vectorized<float>::loadu(in_ptr39 + 8*i0); 2023-01-11T21:38:06.0199283Z auto tmp15 = at::vec::Vectorized<float>::loadu(in_ptr40 + 8*i0); 2023-01-11T21:38:06.0199372Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0199461Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0199549Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0199637Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0199718Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0199813Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0199906Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0199993Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0200096Z tmp16.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0200161Z } 2023-01-11T21:38:06.0200261Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0200340Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0200408Z { 2023-01-11T21:38:06.0200502Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.0200592Z auto tmp1 = in_ptr33[i0]; 2023-01-11T21:38:06.0200681Z auto tmp3 = in_ptr34[i0]; 2023-01-11T21:38:06.0200770Z auto tmp5 = in_ptr35[i0]; 2023-01-11T21:38:06.0200858Z auto tmp7 = in_ptr36[i0]; 2023-01-11T21:38:06.0200938Z auto tmp9 = in_ptr37[i0]; 2023-01-11T21:38:06.0201056Z auto tmp11 = in_ptr38[i0]; 2023-01-11T21:38:06.0201147Z auto tmp13 = in_ptr39[i0]; 2023-01-11T21:38:06.0201236Z auto tmp15 = in_ptr40[i0]; 2023-01-11T21:38:06.0201325Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0201420Z auto tmp4 = tmp2 + 
tmp3; 2023-01-11T21:38:06.0201509Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0201590Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0201680Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0201769Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0201857Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0201949Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0202035Z in_out_ptr0[i0] = tmp16; 2023-01-11T21:38:06.0202102Z } 2023-01-11T21:38:06.0202176Z #pragma omp for 2023-01-11T21:38:06.0202262Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0202330Z { 2023-01-11T21:38:06.0202472Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0202606Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr41 + 8*i0); 2023-01-11T21:38:06.0202740Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr42 + 8*i0); 2023-01-11T21:38:06.0202875Z auto tmp5 = at::vec::Vectorized<float>::loadu(in_ptr43 + 8*i0); 2023-01-11T21:38:06.0203005Z auto tmp7 = at::vec::Vectorized<float>::loadu(in_ptr44 + 8*i0); 2023-01-11T21:38:06.0203127Z auto tmp9 = at::vec::Vectorized<float>::loadu(in_ptr45 + 8*i0); 2023-01-11T21:38:06.0203262Z auto tmp11 = at::vec::Vectorized<float>::loadu(in_ptr46 + 8*i0); 2023-01-11T21:38:06.0203396Z auto tmp13 = at::vec::Vectorized<float>::loadu(in_ptr47 + 8*i0); 2023-01-11T21:38:06.0203530Z auto tmp15 = at::vec::Vectorized<float>::loadu(in_ptr48 + 8*i0); 2023-01-11T21:38:06.0203625Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0203717Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0203806Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0203892Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0203975Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0204066Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0204183Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0204274Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0204376Z tmp16.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0204445Z } 2023-01-11T21:38:06.0204545Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0204624Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0204689Z { 2023-01-11T21:38:06.0204782Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.0204874Z auto tmp1 = in_ptr41[i0]; 2023-01-11T21:38:06.0204963Z auto tmp3 = in_ptr42[i0]; 2023-01-11T21:38:06.0205051Z auto tmp5 = in_ptr43[i0]; 2023-01-11T21:38:06.0205142Z auto tmp7 = in_ptr44[i0]; 2023-01-11T21:38:06.0205222Z auto tmp9 = in_ptr45[i0]; 2023-01-11T21:38:06.0205312Z auto tmp11 = in_ptr46[i0]; 2023-01-11T21:38:06.0205402Z auto tmp13 = in_ptr47[i0]; 2023-01-11T21:38:06.0205491Z auto tmp15 = in_ptr48[i0]; 2023-01-11T21:38:06.0205582Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0205671Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0205758Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0205837Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0205930Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0206019Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0206109Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0206198Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0206285Z in_out_ptr0[i0] = tmp16; 2023-01-11T21:38:06.0206353Z } 2023-01-11T21:38:06.0206470Z #pragma omp for 2023-01-11T21:38:06.0206557Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0206623Z { 2023-01-11T21:38:06.0206766Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0206899Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr49 + 8*i0); 2023-01-11T21:38:06.0206992Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0207092Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0207151Z } 2023-01-11T21:38:06.0207254Z #pragma omp for 
simd simdlen(4) 2023-01-11T21:38:06.0207340Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0207410Z { 2023-01-11T21:38:06.0207506Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.0207597Z auto tmp1 = in_ptr49[i0]; 2023-01-11T21:38:06.0207685Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0207766Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0207832Z } 2023-01-11T21:38:06.0207901Z } 2023-01-11T21:38:06.0207966Z } 2023-01-11T21:38:06.0208059Z ''') 2023-01-11T21:38:06.0208065Z 2023-01-11T21:38:06.0208070Z 2023-01-11T21:38:06.0208164Z async_compile.wait(globals()) 2023-01-11T21:38:06.0208240Z del async_compile 2023-01-11T21:38:06.0208246Z 2023-01-11T21:38:06.0208313Z def call(args): 2023-01-11T21:38:06.0208634Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1, arg17_1, arg18_1, arg19_1, arg20_1, arg21_1, arg22_1, arg23_1, arg24_1, arg25_1, arg26_1, arg27_1, arg28_1, arg29_1, arg30_1, arg31_1, arg32_1, arg33_1, arg34_1, arg35_1, arg36_1, arg37_1, arg38_1, arg39_1, arg40_1, arg41_1, arg42_1, arg43_1, arg44_1, arg45_1, arg46_1, arg47_1, arg48_1, arg49_1 = args 2023-01-11T21:38:06.0208711Z args.clear() 2023-01-11T21:38:06.0208910Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0209001Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.0209093Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.0209179Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:06.0209266Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:06.0209352Z buf5 = buf4; del buf4 # reuse 2023-01-11T21:38:06.0209431Z buf6 = buf5; del buf5 # reuse 2023-01-11T21:38:06.0210667Z kernel_cpp_0(c_void_p(buf6.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(arg3_1.data_ptr()), c_void_p(arg4_1.data_ptr()), c_void_p(arg5_1.data_ptr()), c_void_p(arg6_1.data_ptr()), c_void_p(arg7_1.data_ptr()), c_void_p(arg8_1.data_ptr()), c_void_p(arg9_1.data_ptr()), c_void_p(arg10_1.data_ptr()), c_void_p(arg11_1.data_ptr()), c_void_p(arg12_1.data_ptr()), c_void_p(arg13_1.data_ptr()), c_void_p(arg14_1.data_ptr()), c_void_p(arg15_1.data_ptr()), c_void_p(arg16_1.data_ptr()), c_void_p(arg17_1.data_ptr()), c_void_p(arg18_1.data_ptr()), c_void_p(arg19_1.data_ptr()), c_void_p(arg20_1.data_ptr()), c_void_p(arg21_1.data_ptr()), c_void_p(arg22_1.data_ptr()), c_void_p(arg23_1.data_ptr()), c_void_p(arg24_1.data_ptr()), c_void_p(arg25_1.data_ptr()), c_void_p(arg26_1.data_ptr()), c_void_p(arg27_1.data_ptr()), c_void_p(arg28_1.data_ptr()), c_void_p(arg29_1.data_ptr()), c_void_p(arg30_1.data_ptr()), c_void_p(arg31_1.data_ptr()), c_void_p(arg32_1.data_ptr()), c_void_p(arg33_1.data_ptr()), c_void_p(arg34_1.data_ptr()), c_void_p(arg35_1.data_ptr()), c_void_p(arg36_1.data_ptr()), c_void_p(arg37_1.data_ptr()), c_void_p(arg38_1.data_ptr()), c_void_p(arg39_1.data_ptr()), c_void_p(arg40_1.data_ptr()), c_void_p(arg41_1.data_ptr()), c_void_p(arg42_1.data_ptr()), c_void_p(arg43_1.data_ptr()), c_void_p(arg44_1.data_ptr()), c_void_p(arg45_1.data_ptr()), c_void_p(arg46_1.data_ptr()), c_void_p(arg47_1.data_ptr()), c_void_p(arg48_1.data_ptr()), c_void_p(arg49_1.data_ptr())) 2023-01-11T21:38:06.0210754Z del arg0_1 2023-01-11T21:38:06.0210827Z del arg10_1 2023-01-11T21:38:06.0210891Z del arg11_1 2023-01-11T21:38:06.0210964Z del arg12_1 2023-01-11T21:38:06.0211059Z del arg13_1 2023-01-11T21:38:06.0211129Z del arg14_1 2023-01-11T21:38:06.0211203Z del arg15_1 2023-01-11T21:38:06.0211275Z del arg16_1 
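# [editor note] this run of del statements drops every argN_1 reference now
# that the single fused kernel_cpp_0 call has consumed all fifty inputs,
# letting their storage be freed eagerly.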
2023-01-11T21:38:06.0211345Z del arg17_1 2023-01-11T21:38:06.0211408Z del arg18_1 2023-01-11T21:38:06.0211480Z del arg19_1 2023-01-11T21:38:06.0211550Z del arg1_1 2023-01-11T21:38:06.0211623Z del arg20_1 2023-01-11T21:38:06.0211692Z del arg21_1 2023-01-11T21:38:06.0211763Z del arg22_1 2023-01-11T21:38:06.0211825Z del arg23_1 2023-01-11T21:38:06.0211897Z del arg24_1 2023-01-11T21:38:06.0211965Z del arg25_1 2023-01-11T21:38:06.0212036Z del arg26_1 2023-01-11T21:38:06.0212110Z del arg27_1 2023-01-11T21:38:06.0212179Z del arg28_1 2023-01-11T21:38:06.0212249Z del arg29_1 2023-01-11T21:38:06.0212314Z del arg2_1 2023-01-11T21:38:06.0212384Z del arg30_1 2023-01-11T21:38:06.0212454Z del arg31_1 2023-01-11T21:38:06.0212524Z del arg32_1 2023-01-11T21:38:06.0212594Z del arg33_1 2023-01-11T21:38:06.0212665Z del arg34_1 2023-01-11T21:38:06.0212728Z del arg35_1 2023-01-11T21:38:06.0212798Z del arg36_1 2023-01-11T21:38:06.0212867Z del arg37_1 2023-01-11T21:38:06.0212936Z del arg38_1 2023-01-11T21:38:06.0213007Z del arg39_1 2023-01-11T21:38:06.0213078Z del arg3_1 2023-01-11T21:38:06.0213149Z del arg40_1 2023-01-11T21:38:06.0213213Z del arg41_1 2023-01-11T21:38:06.0213283Z del arg42_1 2023-01-11T21:38:06.0213351Z del arg43_1 2023-01-11T21:38:06.0213422Z del arg44_1 2023-01-11T21:38:06.0213492Z del arg45_1 2023-01-11T21:38:06.0213559Z del arg46_1 2023-01-11T21:38:06.0213622Z del arg47_1 2023-01-11T21:38:06.0213692Z del arg48_1 2023-01-11T21:38:06.0213764Z del arg49_1 2023-01-11T21:38:06.0213834Z del arg4_1 2023-01-11T21:38:06.0213904Z del arg5_1 2023-01-11T21:38:06.0213973Z del arg6_1 2023-01-11T21:38:06.0214043Z del arg7_1 2023-01-11T21:38:06.0214106Z del arg8_1 2023-01-11T21:38:06.0214179Z del arg9_1 2023-01-11T21:38:06.0214253Z return (buf6, ) 2023-01-11T21:38:06.0214259Z 2023-01-11T21:38:06.0214263Z 2023-01-11T21:38:06.0214343Z if __name__ == "__main__": 2023-01-11T21:38:06.0214468Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0214703Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0214964Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0215179Z arg1_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0215370Z arg2_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0215556Z arg3_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0215740Z arg4_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0215925Z arg5_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0216108Z arg6_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0216295Z arg7_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0216478Z arg8_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0216656Z arg9_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0216844Z arg10_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0217034Z arg11_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0217273Z arg12_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0217460Z arg13_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0217648Z arg14_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 
2023-01-11T21:38:06.0217835Z arg15_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0218062Z arg16_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0218239Z arg17_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0218422Z arg18_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0218607Z arg19_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0218792Z arg20_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0218976Z arg21_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0219160Z arg22_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0219341Z arg23_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0219524Z arg24_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0219700Z arg25_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0219888Z arg26_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0220071Z arg27_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0220254Z arg28_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0220442Z arg29_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0220627Z arg30_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0220812Z arg31_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0220995Z arg32_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0221172Z arg33_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0221359Z arg34_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0221542Z arg35_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0221729Z arg36_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0221911Z arg37_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0222125Z arg38_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0222311Z arg39_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0222494Z arg40_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0222670Z arg41_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0222857Z arg42_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0223039Z arg43_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0223224Z arg44_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0223411Z arg45_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0223596Z arg46_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0223780Z arg47_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0223967Z arg48_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0224144Z arg49_1 = rand_strided((64, ), (1, ), device='cpu', 
dtype=torch.float32) 2023-01-11T21:38:06.0224496Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1, arg17_1, arg18_1, arg19_1, arg20_1, arg21_1, arg22_1, arg23_1, arg24_1, arg25_1, arg26_1, arg27_1, arg28_1, arg29_1, arg30_1, arg31_1, arg32_1, arg33_1, arg34_1, arg35_1, arg36_1, arg37_1, arg38_1, arg39_1, arg40_1, arg41_1, arg42_1, arg43_1, arg44_1, arg45_1, arg46_1, arg47_1, arg48_1, arg49_1])) 2023-01-11T21:38:06.0224792Z [2023-01-11 21:29:51,517] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 207 2023-01-11T21:38:06.0224799Z 2023-01-11T21:38:06.0224885Z --> 7 2023-01-11T21:38:06.0224958Z ok (3.028s) 2023-01-11T21:38:06.0225420Z test_no_op_reduction_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0225552Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0225809Z [2023-01-11 21:29:51,580] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 208 2023-01-11T21:38:06.0226074Z [2023-01-11 21:29:53,776] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 208 2023-01-11T21:38:06.0226083Z 2023-01-11T21:38:06.0226181Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0226257Z import torch 2023-01-11T21:38:06.0226325Z import random 2023-01-11T21:38:06.0226445Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0226573Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0226578Z 2023-01-11T21:38:06.0226660Z aten = torch.ops.aten 2023-01-11T21:38:06.0226798Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0226894Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0226899Z 2023-01-11T21:38:06.0226977Z import triton 2023-01-11T21:38:06.0227069Z import triton.language as tl 2023-01-11T21:38:06.0227188Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0227327Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0227333Z 2023-01-11T21:38:06.0227338Z 2023-01-11T21:38:06.0227477Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0227686Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0227810Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0227916Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0228046Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0228106Z { 2023-01-11T21:38:06.0228209Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0228274Z { 2023-01-11T21:38:06.0228356Z #pragma omp for 2023-01-11T21:38:06.0228442Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.0228507Z { 2023-01-11T21:38:06.0228646Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0228785Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.0228869Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0228962Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0229066Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0229133Z } 
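// [editor note] the loop above is the 8-wide vectorized body; the simd loop
// below is the scalar remainder, and its bounds (i0=8; i0<8) collapse to zero
// iterations because the 8-element buffer divides evenly into vector lanes.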
2023-01-11T21:38:06.0229233Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0229319Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:06.0229379Z { 2023-01-11T21:38:06.0229471Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0229578Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:06.0229667Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0229754Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:06.0229839Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:06.0229906Z } 2023-01-11T21:38:06.0229965Z } 2023-01-11T21:38:06.0230030Z } 2023-01-11T21:38:06.0230117Z ''') 2023-01-11T21:38:06.0230122Z 2023-01-11T21:38:06.0230127Z 2023-01-11T21:38:06.0230222Z async_compile.wait(globals()) 2023-01-11T21:38:06.0230300Z del async_compile 2023-01-11T21:38:06.0230305Z 2023-01-11T21:38:06.0230382Z def call(args): 2023-01-11T21:38:06.0230484Z arg0_1, = args 2023-01-11T21:38:06.0230559Z args.clear() 2023-01-11T21:38:06.0230750Z buf0 = empty_strided((8, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0230954Z buf1 = empty_strided((8, 1, 1), (1, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0231126Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0231200Z del arg0_1 2023-01-11T21:38:06.0231278Z return (buf0, buf1, ) 2023-01-11T21:38:06.0231283Z 2023-01-11T21:38:06.0231288Z 2023-01-11T21:38:06.0231368Z if __name__ == "__main__": 2023-01-11T21:38:06.0231487Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0231607Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0231808Z arg0_1 = rand_strided((8, 1, 1), (1, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0231921Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0231929Z 2023-01-11T21:38:06.0232000Z ok (2.239s) 2023-01-11T21:38:06.0232521Z test_output_strides_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.0232603Z warnings.warn( 2023-01-11T21:38:06.0232862Z [2023-01-11 21:29:53,818] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 209 2023-01-11T21:38:06.0233125Z [2023-01-11 21:29:56,056] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 209 2023-01-11T21:38:06.0233378Z [2023-01-11 21:29:56,069] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 210 2023-01-11T21:38:06.0233633Z [2023-01-11 21:29:56,077] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 210 2023-01-11T21:38:06.0233884Z [2023-01-11 21:29:56,113] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 211 2023-01-11T21:38:06.0234143Z [2023-01-11 21:29:56,119] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 211 2023-01-11T21:38:06.0234595Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:3148: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0234716Z self.assertEqual(inp.storage(), out.storage()) 2023-01-11T21:38:06.0235351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:1904: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0235452Z device=typed_storage.device, 2023-01-11T21:38:06.0235457Z 2023-01-11T21:38:06.0235556Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0235629Z import torch 2023-01-11T21:38:06.0235704Z import random 2023-01-11T21:38:06.0235820Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0235947Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0235952Z 2023-01-11T21:38:06.0236034Z aten = torch.ops.aten 2023-01-11T21:38:06.0236171Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0236266Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0236272Z 2023-01-11T21:38:06.0236346Z import triton 2023-01-11T21:38:06.0236438Z import triton.language as tl 2023-01-11T21:38:06.0236562Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0236695Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0236728Z 2023-01-11T21:38:06.0236738Z 2023-01-11T21:38:06.0236869Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0237078Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0237201Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0237311Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0237377Z { 2023-01-11T21:38:06.0237483Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0237550Z { 2023-01-11T21:38:06.0237638Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0237722Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0237790Z { 2023-01-11T21:38:06.0237879Z for(long i1=0; i1<16; i1+=1) 2023-01-11T21:38:06.0237946Z { 2023-01-11T21:38:06.0238034Z #pragma GCC ivdep 2023-01-11T21:38:06.0238128Z for(long i2=0; i2<4; i2+=1) 2023-01-11T21:38:06.0238190Z { 2023-01-11T21:38:06.0238267Z { 2023-01-11T21:38:06.0238341Z { 2023-01-11T21:38:06.0238457Z auto tmp0 = in_ptr0[i1 + (16*i2) + (64*i0)]; 2023-01-11T21:38:06.0238566Z out_ptr0[i2 + (4*i1) + (64*i0)] = tmp0; 2023-01-11T21:38:06.0238638Z } 2023-01-11T21:38:06.0238704Z } 2023-01-11T21:38:06.0238773Z } 2023-01-11T21:38:06.0238839Z } 2023-01-11T21:38:06.0238908Z } 2023-01-11T21:38:06.0238976Z } 2023-01-11T21:38:06.0239043Z } 2023-01-11T21:38:06.0239128Z ''') 2023-01-11T21:38:06.0239134Z 2023-01-11T21:38:06.0239138Z 2023-01-11T21:38:06.0239224Z async_compile.wait(globals()) 2023-01-11T21:38:06.0239299Z del async_compile 2023-01-11T21:38:06.0239304Z 2023-01-11T21:38:06.0239378Z def call(args): 2023-01-11T21:38:06.0239454Z arg0_1, = args 2023-01-11T21:38:06.0239529Z args.clear() 2023-01-11T21:38:06.0239740Z buf0 = empty_strided((4, 4, 4, 4), (64, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0239881Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0239953Z del arg0_1 2023-01-11T21:38:06.0240021Z return (buf0, ) 2023-01-11T21:38:06.0240026Z 2023-01-11T21:38:06.0240031Z 
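Reading the index arithmetic in kernel_cpp_0 above, in_ptr0[i1 + (16*i2) + (64*i0)] is gathered into the contiguous out_ptr0[i2 + (4*i1) + (64*i0)], which appears to be exactly the shuffle an eager permute-then-copy performs. A quick sketch to sanity-check that reading (tensor names are illustrative, not from the test):

import torch

x = torch.randn(4, 4, 4, 4)
# same element shuffle as the C++ loop nest, under the reading above
y = x.permute(0, 2, 3, 1).contiguous()
assert y.shape == (4, 4, 4, 4) and y.is_contiguous()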
2023-01-11T21:38:06.0240140Z if __name__ == "__main__": 2023-01-11T21:38:06.0240260Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0240387Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0240595Z arg0_1 = rand_strided((4, 4, 4, 4), (64, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0240706Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0240712Z 2023-01-11T21:38:06.0240716Z 2023-01-11T21:38:06.0240813Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0240888Z import torch 2023-01-11T21:38:06.0240955Z import random 2023-01-11T21:38:06.0241073Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0241199Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0241204Z 2023-01-11T21:38:06.0241286Z aten = torch.ops.aten 2023-01-11T21:38:06.0241421Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0241518Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0241523Z 2023-01-11T21:38:06.0241600Z import triton 2023-01-11T21:38:06.0241686Z import triton.language as tl 2023-01-11T21:38:06.0241810Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0241951Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0241956Z 2023-01-11T21:38:06.0241961Z 2023-01-11T21:38:06.0242052Z async_compile.wait(globals()) 2023-01-11T21:38:06.0242129Z del async_compile 2023-01-11T21:38:06.0242134Z 2023-01-11T21:38:06.0242207Z def call(args): 2023-01-11T21:38:06.0242281Z arg0_1, = args 2023-01-11T21:38:06.0242354Z args.clear() 2023-01-11T21:38:06.0242450Z return (as_strided(arg0_1, (64, 4), (4, 1)), ) 2023-01-11T21:38:06.0242524Z 2023-01-11T21:38:06.0242537Z 2023-01-11T21:38:06.0242610Z if __name__ == "__main__": 2023-01-11T21:38:06.0242729Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0242856Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0243070Z arg0_1 = rand_strided((4, 4, 4, 4), (64, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0243183Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0243188Z 2023-01-11T21:38:06.0243192Z 2023-01-11T21:38:06.0243289Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0243365Z import torch 2023-01-11T21:38:06.0243432Z import random 2023-01-11T21:38:06.0243552Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0243675Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0243680Z 2023-01-11T21:38:06.0243761Z aten = torch.ops.aten 2023-01-11T21:38:06.0243896Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0243992Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0243997Z 2023-01-11T21:38:06.0244071Z import triton 2023-01-11T21:38:06.0244163Z import triton.language as tl 2023-01-11T21:38:06.0244280Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0244427Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0244433Z 2023-01-11T21:38:06.0244437Z 2023-01-11T21:38:06.0244531Z async_compile.wait(globals()) 2023-01-11T21:38:06.0244607Z del async_compile 2023-01-11T21:38:06.0244611Z 2023-01-11T21:38:06.0244685Z def call(args): 2023-01-11T21:38:06.0244756Z arg0_1, = args 2023-01-11T21:38:06.0244830Z args.clear() 2023-01-11T21:38:06.0244944Z return (as_strided(arg0_1, (4, 4, 1), (4, 16, 0), 3), ) 2023-01-11T21:38:06.0244949Z 2023-01-11T21:38:06.0244954Z 
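# [editor note] view-only graphs compile to no kernels at all: call() above
# simply reinterprets the input storage with as_strided -- here a (4, 4, 1)
# view with a zero stride in the trailing dimension and storage offset 3, so
# no data is moved.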
2023-01-11T21:38:06.0245026Z if __name__ == "__main__": 2023-01-11T21:38:06.0245141Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0245269Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0245476Z arg0_1 = rand_strided((4, 4, 4, 4), (64, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0245587Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0245591Z 2023-01-11T21:38:06.0245663Z ok (2.327s) 2023-01-11T21:38:06.0246144Z test_permute_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0246279Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0246536Z [2023-01-11 21:29:56,145] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 212 2023-01-11T21:38:06.0246795Z [2023-01-11 21:29:58,015] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 212 2023-01-11T21:38:06.0246808Z 2023-01-11T21:38:06.0246899Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0246973Z import torch 2023-01-11T21:38:06.0247047Z import random 2023-01-11T21:38:06.0247168Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0247292Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0247299Z 2023-01-11T21:38:06.0247380Z aten = torch.ops.aten 2023-01-11T21:38:06.0247515Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0247603Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0247608Z 2023-01-11T21:38:06.0247680Z import triton 2023-01-11T21:38:06.0247773Z import triton.language as tl 2023-01-11T21:38:06.0247900Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0248038Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0248084Z 2023-01-11T21:38:06.0248089Z 2023-01-11T21:38:06.0248226Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0248432Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0248555Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0248655Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0248755Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0248824Z { 2023-01-11T21:38:06.0248925Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0248992Z { 2023-01-11T21:38:06.0249072Z #pragma omp for 2023-01-11T21:38:06.0249158Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0249218Z { 2023-01-11T21:38:06.0249358Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0249498Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.0249587Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0249728Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:06.0249819Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0249908Z auto tmp5 = tmp0 + tmp3; 2023-01-11T21:38:06.0249997Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0250095Z tmp5.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0250162Z } 2023-01-11T21:38:06.0250261Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0250347Z for(long i0=32; i0<32; i0+=1)
2023-01-11T21:38:06.0250413Z { 2023-01-11T21:38:06.0250502Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0250599Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:06.0250688Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0250791Z auto tmp3 = static_cast<float>(2); 2023-01-11T21:38:06.0250884Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0250971Z auto tmp5 = tmp0 + tmp3; 2023-01-11T21:38:06.0251059Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0251141Z out_ptr1[i0] = tmp5; 2023-01-11T21:38:06.0251201Z } 2023-01-11T21:38:06.0251266Z } 2023-01-11T21:38:06.0251330Z } 2023-01-11T21:38:06.0251416Z ''') 2023-01-11T21:38:06.0251422Z 2023-01-11T21:38:06.0251426Z 2023-01-11T21:38:06.0251548Z async_compile.wait(globals()) 2023-01-11T21:38:06.0251625Z del async_compile 2023-01-11T21:38:06.0251630Z 2023-01-11T21:38:06.0251706Z def call(args): 2023-01-11T21:38:06.0251773Z arg0_1, = args 2023-01-11T21:38:06.0251850Z args.clear() 2023-01-11T21:38:06.0252074Z buf0 = empty_strided((2, 2, 2, 2, 2), (4, 8, 1, 16, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0252285Z buf1 = empty_strided((2, 2, 2, 2, 2), (4, 8, 1, 16, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0252451Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0252524Z del arg0_1 2023-01-11T21:38:06.0252607Z return (buf0, buf1, ) 2023-01-11T21:38:06.0252612Z 2023-01-11T21:38:06.0252617Z 2023-01-11T21:38:06.0252696Z if __name__ == "__main__": 2023-01-11T21:38:06.0252806Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0252932Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0253150Z arg0_1 = rand_strided((2, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0253262Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0253267Z 2023-01-11T21:38:06.0253337Z ok (1.910s) 2023-01-11T21:38:06.0253786Z test_pow1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0253948Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0254206Z [2023-01-11 21:29:58,232] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 213 2023-01-11T21:38:06.0254584Z [2023-01-11 21:30:00,202] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 213 2023-01-11T21:38:06.0254590Z 2023-01-11T21:38:06.0254690Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0254757Z import torch 2023-01-11T21:38:06.0254830Z import random 2023-01-11T21:38:06.0254950Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0255075Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0255080Z 2023-01-11T21:38:06.0255162Z aten = torch.ops.aten 2023-01-11T21:38:06.0255299Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0255393Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0255398Z 2023-01-11T21:38:06.0255469Z import triton 2023-01-11T21:38:06.0255560Z import triton.language as tl 2023-01-11T21:38:06.0255684Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0255825Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0255831Z 2023-01-11T21:38:06.0255835Z 2023-01-11T21:38:06.0255977Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0256182Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0256305Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0256409Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0256503Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0256602Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0256698Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.0256795Z float* __restrict__ out_ptr4, 2023-01-11T21:38:06.0256897Z float* __restrict__ out_ptr5, 2023-01-11T21:38:06.0256996Z float* __restrict__ out_ptr6, 2023-01-11T21:38:06.0257097Z float* __restrict__ out_ptr7, 2023-01-11T21:38:06.0257258Z float* __restrict__ out_ptr8, 2023-01-11T21:38:06.0257415Z float* __restrict__ out_ptr9, 2023-01-11T21:38:06.0257529Z float* __restrict__ out_ptr10, 2023-01-11T21:38:06.0257635Z float* __restrict__ out_ptr11, 2023-01-11T21:38:06.0257736Z float* __restrict__ out_ptr12, 2023-01-11T21:38:06.0257836Z float* __restrict__ out_ptr13, 2023-01-11T21:38:06.0257935Z float* __restrict__ out_ptr14, 2023-01-11T21:38:06.0258036Z float* __restrict__ out_ptr15) 2023-01-11T21:38:06.0258094Z { 2023-01-11T21:38:06.0258195Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0258261Z { 2023-01-11T21:38:06.0258345Z #pragma omp for 2023-01-11T21:38:06.0258432Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0258501Z { 2023-01-11T21:38:06.0258641Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0258737Z auto tmp1 = tmp0.reciprocal(); 2023-01-11T21:38:06.0258830Z auto tmp2 = tmp1 * tmp1; 2023-01-11T21:38:06.0258921Z auto tmp3 = tmp2 * tmp2; 2023-01-11T21:38:06.0259007Z auto tmp4 = tmp3 * tmp3; 2023-01-11T21:38:06.0259093Z auto tmp5 = tmp2 * tmp1; 2023-01-11T21:38:06.0259178Z auto tmp6 = tmp5 * tmp5; 2023-01-11T21:38:06.0259264Z auto tmp7 = tmp6 * tmp1; 2023-01-11T21:38:06.0259343Z auto tmp8 = tmp3 * tmp1; 2023-01-11T21:38:06.0259482Z auto tmp9 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.0259572Z auto tmp10 = tmp0 *
tmp0; 2023-01-11T21:38:06.0259662Z auto tmp11 = tmp10 * tmp0; 2023-01-11T21:38:06.0259792Z auto tmp12 = tmp10 * tmp10; 2023-01-11T21:38:06.0259881Z auto tmp13 = tmp12 * tmp0; 2023-01-11T21:38:06.0259968Z auto tmp14 = tmp11 * tmp11; 2023-01-11T21:38:06.0260049Z auto tmp15 = tmp14 * tmp0; 2023-01-11T21:38:06.0260137Z auto tmp16 = tmp12 * tmp12; 2023-01-11T21:38:06.0260237Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0260331Z tmp7.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0260426Z tmp6.store(out_ptr2 + 8*i0); 2023-01-11T21:38:06.0260519Z tmp8.store(out_ptr3 + 8*i0); 2023-01-11T21:38:06.0260611Z tmp3.store(out_ptr4 + 8*i0); 2023-01-11T21:38:06.0260697Z tmp5.store(out_ptr5 + 8*i0); 2023-01-11T21:38:06.0260790Z tmp2.store(out_ptr6 + 8*i0); 2023-01-11T21:38:06.0260881Z tmp1.store(out_ptr7 + 8*i0); 2023-01-11T21:38:06.0260972Z tmp9.store(out_ptr8 + 8*i0); 2023-01-11T21:38:06.0261066Z tmp10.store(out_ptr9 + 8*i0); 2023-01-11T21:38:06.0261168Z tmp11.store(out_ptr10 + 8*i0); 2023-01-11T21:38:06.0261267Z tmp12.store(out_ptr11 + 8*i0); 2023-01-11T21:38:06.0261356Z tmp13.store(out_ptr12 + 8*i0); 2023-01-11T21:38:06.0261451Z tmp14.store(out_ptr13 + 8*i0); 2023-01-11T21:38:06.0261548Z tmp15.store(out_ptr14 + 8*i0); 2023-01-11T21:38:06.0261640Z tmp16.store(out_ptr15 + 8*i0); 2023-01-11T21:38:06.0261709Z } 2023-01-11T21:38:06.0261808Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0261893Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:06.0261953Z { 2023-01-11T21:38:06.0262043Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0262134Z auto tmp1 = 1 / tmp0; 2023-01-11T21:38:06.0262226Z auto tmp2 = tmp1 * tmp1; 2023-01-11T21:38:06.0262316Z auto tmp3 = tmp2 * tmp2; 2023-01-11T21:38:06.0262407Z auto tmp4 = tmp3 * tmp3; 2023-01-11T21:38:06.0262497Z auto tmp5 = tmp2 * tmp1; 2023-01-11T21:38:06.0262576Z auto tmp6 = tmp5 * tmp5; 2023-01-11T21:38:06.0262660Z auto tmp7 = tmp6 * tmp1; 2023-01-11T21:38:06.0262746Z auto tmp8 = tmp3 * tmp1; 2023-01-11T21:38:06.0262850Z auto tmp9 = static_cast<float>(1); 2023-01-11T21:38:06.0262968Z auto tmp10 = tmp0 * tmp0; 2023-01-11T21:38:06.0263059Z auto tmp11 = tmp10 * tmp0; 2023-01-11T21:38:06.0263151Z auto tmp12 = tmp10 * tmp10; 2023-01-11T21:38:06.0263233Z auto tmp13 = tmp12 * tmp0; 2023-01-11T21:38:06.0263323Z auto tmp14 = tmp11 * tmp11; 2023-01-11T21:38:06.0263411Z auto tmp15 = tmp14 * tmp0; 2023-01-11T21:38:06.0263500Z auto tmp16 = tmp12 * tmp12; 2023-01-11T21:38:06.0263585Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0263669Z out_ptr1[i0] = tmp7; 2023-01-11T21:38:06.0263755Z out_ptr2[i0] = tmp6; 2023-01-11T21:38:06.0263831Z out_ptr3[i0] = tmp8; 2023-01-11T21:38:06.0263916Z out_ptr4[i0] = tmp3; 2023-01-11T21:38:06.0263999Z out_ptr5[i0] = tmp5; 2023-01-11T21:38:06.0264082Z out_ptr6[i0] = tmp2; 2023-01-11T21:38:06.0264164Z out_ptr7[i0] = tmp1; 2023-01-11T21:38:06.0264247Z out_ptr8[i0] = tmp9; 2023-01-11T21:38:06.0264326Z out_ptr9[i0] = tmp10; 2023-01-11T21:38:06.0264413Z out_ptr10[i0] = tmp11; 2023-01-11T21:38:06.0264498Z out_ptr11[i0] = tmp12; 2023-01-11T21:38:06.0264583Z out_ptr12[i0] = tmp13; 2023-01-11T21:38:06.0264665Z out_ptr13[i0] = tmp14; 2023-01-11T21:38:06.0264748Z out_ptr14[i0] = tmp15; 2023-01-11T21:38:06.0264833Z out_ptr15[i0] = tmp16; 2023-01-11T21:38:06.0264893Z } 2023-01-11T21:38:06.0264959Z } 2023-01-11T21:38:06.0265022Z } 2023-01-11T21:38:06.0265112Z ''') 2023-01-11T21:38:06.0265118Z 2023-01-11T21:38:06.0265122Z 2023-01-11T21:38:06.0265215Z async_compile.wait(globals()) 2023-01-11T21:38:06.0265329Z del async_compile 2023-01-11T21:38:06.0265336Z
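# (Editorial sketch, not generated code.) The kernel above lowers small integer
# powers to chains of multiplies, plus one reciprocal for the negative exponents,
# rather than calling pow; e.g. x**8 is three squarings of x, and x**-8 is three
# squarings of 1/x:
import torch

x = torch.rand(16, 16) + 0.5   # bounded away from 0 so the negative powers stay finite
r = x.reciprocal()             # tmp1 in the kernel
r2 = r * r                     # tmp2 -> x**-2
r4 = r2 * r2                   # tmp3 -> x**-4
r8 = r4 * r4                   # tmp4 -> x**-8
x2 = x * x                     # tmp10 -> x**2
x4 = x2 * x2                   # tmp12 -> x**4
x8 = x4 * x4                   # tmp16 -> x**8
torch.testing.assert_close(r8, x.pow(-8))
torch.testing.assert_close(x8, x.pow(8))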
2023-01-11T21:38:06.0265409Z def call(args): 2023-01-11T21:38:06.0265483Z arg0_1, = args 2023-01-11T21:38:06.0265551Z args.clear() 2023-01-11T21:38:06.0265755Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0265956Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0266153Z buf2 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0266346Z buf3 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0266539Z buf4 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0266728Z buf5 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0266911Z buf6 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0267099Z buf7 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0267291Z buf8 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0267480Z buf9 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0267677Z buf10 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0267871Z buf11 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0268067Z buf12 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0268260Z buf13 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0268444Z buf14 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0268634Z buf15 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0269135Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf6.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(buf8.data_ptr()), c_void_p(buf9.data_ptr()), c_void_p(buf10.data_ptr()), c_void_p(buf11.data_ptr()), c_void_p(buf12.data_ptr()), c_void_p(buf13.data_ptr()), c_void_p(buf14.data_ptr()), c_void_p(buf15.data_ptr())) 2023-01-11T21:38:06.0269308Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, arg0_1, buf9, buf10, buf11, buf12, buf13, buf14, buf15, ) 2023-01-11T21:38:06.0269314Z 2023-01-11T21:38:06.0269318Z 2023-01-11T21:38:06.0269399Z if __name__ == "__main__": 2023-01-11T21:38:06.0269520Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0269646Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0269840Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0269954Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0269962Z 2023-01-11T21:38:06.0270036Z ok (2.189s) 2023-01-11T21:38:06.0270487Z test_pow2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0270618Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0270870Z [2023-01-11 21:30:00,255] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 214 2023-01-11T21:38:06.0271134Z [2023-01-11 21:30:02,112] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 214 2023-01-11T21:38:06.0271140Z 2023-01-11T21:38:06.0271240Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0271346Z import torch 2023-01-11T21:38:06.0271421Z import random 2023-01-11T21:38:06.0271542Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0271667Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0271672Z 2023-01-11T21:38:06.0271754Z aten = torch.ops.aten 2023-01-11T21:38:06.0271885Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0271981Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0271986Z 2023-01-11T21:38:06.0272062Z import triton 2023-01-11T21:38:06.0272153Z import triton.language as tl 2023-01-11T21:38:06.0272279Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0272417Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0272423Z 2023-01-11T21:38:06.0272427Z 2023-01-11T21:38:06.0272566Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0272771Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0272892Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0272996Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0273097Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0273162Z { 2023-01-11T21:38:06.0273263Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0273331Z { 2023-01-11T21:38:06.0273414Z #pragma omp for 2023-01-11T21:38:06.0273494Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0273562Z { 2023-01-11T21:38:06.0273703Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0273844Z auto tmp0 = at::vec::Vectorized<float>(static_cast<float>(1000)); 2023-01-11T21:38:06.0273937Z auto tmp2 = tmp0.pow(tmp1); 2023-01-11T21:38:06.0274029Z auto tmp3 = tmp1.pow(tmp0); 2023-01-11T21:38:06.0274125Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0274214Z tmp3.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0274283Z } 2023-01-11T21:38:06.0274380Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0274465Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:06.0274532Z { 2023-01-11T21:38:06.0274621Z auto tmp1 = in_ptr0[i0]; 2023-01-11T21:38:06.0274726Z auto tmp0 = static_cast<float>(1000); 2023-01-11T21:38:06.0274856Z auto tmp2 = std::pow(tmp0, tmp1); 2023-01-11T21:38:06.0274964Z auto tmp3 = std::pow(tmp1, tmp0); 2023-01-11T21:38:06.0275050Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0275134Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:06.0275203Z } 2023-01-11T21:38:06.0275270Z } 2023-01-11T21:38:06.0275337Z } 2023-01-11T21:38:06.0275415Z ''') 2023-01-11T21:38:06.0275421Z 2023-01-11T21:38:06.0275425Z 2023-01-11T21:38:06.0275519Z async_compile.wait(globals()) 2023-01-11T21:38:06.0275595Z del async_compile 2023-01-11T21:38:06.0275600Z 2023-01-11T21:38:06.0275675Z def call(args): 2023-01-11T21:38:06.0275754Z arg0_1, = args 2023-01-11T21:38:06.0275828Z args.clear() 2023-01-11T21:38:06.0276028Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0276216Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0276388Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0276460Z del arg0_1 2023-01-11T21:38:06.0276545Z return (buf0, buf1, ) 2023-01-11T21:38:06.0276550Z 2023-01-11T21:38:06.0276555Z 2023-01-11T21:38:06.0276633Z if __name__ == "__main__": 2023-01-11T21:38:06.0276751Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0276879Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0277075Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0277179Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0277192Z 2023-01-11T21:38:06.0277285Z ok (1.908s) 2023-01-11T21:38:06.0277791Z test_pow3_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.0277874Z warnings.warn( 2023-01-11T21:38:06.0278131Z [2023-01-11 21:30:02,152] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 215 2023-01-11T21:38:06.0278395Z [2023-01-11 21:30:04,306] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 215 2023-01-11T21:38:06.0278400Z 2023-01-11T21:38:06.0278498Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0278572Z import torch 2023-01-11T21:38:06.0278649Z import random 2023-01-11T21:38:06.0278761Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0278886Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0278894Z 2023-01-11T21:38:06.0278978Z aten = torch.ops.aten 2023-01-11T21:38:06.0279115Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0279210Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0279215Z 2023-01-11T21:38:06.0279291Z import triton 2023-01-11T21:38:06.0279384Z import triton.language as tl 2023-01-11T21:38:06.0279506Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0279646Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0279652Z 2023-01-11T21:38:06.0279656Z 2023-01-11T21:38:06.0279792Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0279996Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0280119Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0280224Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0280291Z { 2023-01-11T21:38:06.0280355Z { 2023-01-11T21:38:06.0280418Z { 2023-01-11T21:38:06.0280505Z auto tmp1 = in_ptr0[0]; 2023-01-11T21:38:06.0280622Z auto tmp0 = static_cast<float>(0.12300000339746475); 2023-01-11T21:38:06.0280711Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0280807Z auto tmp3 = std::sqrt(tmp2); 2023-01-11T21:38:06.0280924Z out_ptr0[0] = tmp3; 2023-01-11T21:38:06.0280997Z } 2023-01-11T21:38:06.0281059Z } 2023-01-11T21:38:06.0281126Z } 2023-01-11T21:38:06.0281210Z ''') 2023-01-11T21:38:06.0281215Z 2023-01-11T21:38:06.0281220Z 2023-01-11T21:38:06.0281314Z async_compile.wait(globals()) 2023-01-11T21:38:06.0281390Z del async_compile 2023-01-11T21:38:06.0281395Z 2023-01-11T21:38:06.0281470Z def call(args): 2023-01-11T21:38:06.0281546Z arg0_1, = args
2023-01-11T21:38:06.0281614Z args.clear() 2023-01-11T21:38:06.0281797Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0281937Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0282014Z del arg0_1 2023-01-11T21:38:06.0282091Z return (buf0, ) 2023-01-11T21:38:06.0282096Z 2023-01-11T21:38:06.0282101Z 2023-01-11T21:38:06.0282181Z if __name__ == "__main__": 2023-01-11T21:38:06.0282298Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0282427Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0282606Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0282718Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0282723Z 2023-01-11T21:38:06.0282793Z ok (2.178s) 2023-01-11T21:38:06.0283131Z test_profiler_mark_wrapper_call_cpu (__main__.CpuTests) ... STAGE:2023-01-11 21:30:04 2346:2346 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:38:06.0283387Z [2023-01-11 21:30:04,320] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 216 2023-01-11T21:38:06.0283648Z [2023-01-11 21:30:06,710] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 216 2023-01-11T21:38:06.0283933Z STAGE:2023-01-11 21:30:06 2346:2346 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:38:06.0284195Z STAGE:2023-01-11 21:30:06 2346:2346 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:38:06.0284201Z 2023-01-11T21:38:06.0284298Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0284365Z import torch 2023-01-11T21:38:06.0284441Z import random 2023-01-11T21:38:06.0284559Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0284683Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0284688Z 2023-01-11T21:38:06.0284769Z aten = torch.ops.aten 2023-01-11T21:38:06.0284908Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0285003Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0285008Z 2023-01-11T21:38:06.0285075Z import triton 2023-01-11T21:38:06.0285171Z import triton.language as tl 2023-01-11T21:38:06.0285298Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0285439Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0285444Z 2023-01-11T21:38:06.0285450Z 2023-01-11T21:38:06.0285589Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0285798Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0285923Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0286034Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0286131Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0286199Z { 2023-01-11T21:38:06.0286302Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0286370Z { 2023-01-11T21:38:06.0286452Z #pragma omp for 2023-01-11T21:38:06.0286541Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.0286610Z { 2023-01-11T21:38:06.0286744Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0286879Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.0286972Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0287067Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0287164Z } 2023-01-11T21:38:06.0287267Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0287358Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:06.0287418Z { 2023-01-11T21:38:06.0287506Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0287595Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.0287682Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0287766Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0287829Z } 2023-01-11T21:38:06.0287892Z } 2023-01-11T21:38:06.0287949Z } 2023-01-11T21:38:06.0288034Z ''') 2023-01-11T21:38:06.0288039Z 2023-01-11T21:38:06.0288046Z 2023-01-11T21:38:06.0288140Z async_compile.wait(globals()) 2023-01-11T21:38:06.0288216Z del async_compile 2023-01-11T21:38:06.0288222Z 2023-01-11T21:38:06.0288295Z def call(args): 2023-01-11T21:38:06.0288376Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0288451Z args.clear() 2023-01-11T21:38:06.0288559Z from torch.profiler import record_function 2023-01-11T21:38:06.0288724Z with record_function('inductor_wrapper_call'): 2023-01-11T21:38:06.0288927Z buf0 = empty_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0289098Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0289175Z del arg0_1 2023-01-11T21:38:06.0289248Z del arg1_1 2023-01-11T21:38:06.0289327Z return (buf0, ) 2023-01-11T21:38:06.0289333Z 2023-01-11T21:38:06.0289338Z 2023-01-11T21:38:06.0289421Z if __name__ == "__main__": 2023-01-11T21:38:06.0289531Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0289699Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0289896Z arg0_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0290091Z arg1_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0290216Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0290221Z 2023-01-11T21:38:06.0290293Z ok (2.424s) 2023-01-11T21:38:06.0290813Z test_rand_like_deterministic_cpu (__main__.CpuTests) ... 
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.0290891Z warnings.warn( 2023-01-11T21:38:06.0291146Z [2023-01-11 21:30:06,825] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 217 2023-01-11T21:38:06.0291392Z [2023-01-11 21:30:06,826] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.0291654Z [2023-01-11 21:30:09,192] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 217 2023-01-11T21:38:06.0291660Z 2023-01-11T21:38:06.0291758Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0291830Z import torch 2023-01-11T21:38:06.0291907Z import random 2023-01-11T21:38:06.0292024Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0292148Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0292154Z 2023-01-11T21:38:06.0292237Z aten = torch.ops.aten 2023-01-11T21:38:06.0292365Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0292458Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0292463Z 2023-01-11T21:38:06.0292536Z import triton 2023-01-11T21:38:06.0292628Z import triton.language as tl 2023-01-11T21:38:06.0292752Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0292896Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0293062Z seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce 2023-01-11T21:38:06.0293068Z 2023-01-11T21:38:06.0293072Z 2023-01-11T21:38:06.0293207Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0293429Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0293550Z extern "C" void kernel(const long* __restrict__ seed0, 2023-01-11T21:38:06.0293653Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0293757Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0293822Z { 2023-01-11T21:38:06.0293924Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0293988Z { 2023-01-11T21:38:06.0294063Z #pragma omp for 2023-01-11T21:38:06.0294151Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:06.0294217Z { 2023-01-11T21:38:06.0294288Z { 2023-01-11T21:38:06.0294355Z { 2023-01-11T21:38:06.0294447Z auto tmp0 = seed0[0]; 2023-01-11T21:38:06.0294669Z auto tmp1 = static_cast<long>(i0); 2023-01-11T21:38:06.0294808Z auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));; 2023-01-11T21:38:06.0294923Z auto tmp3 = static_cast<long>(1024 + i0); 2023-01-11T21:38:06.0295067Z auto tmp4 = static_cast<float>(normalized_rand_cpu(tmp0, tmp3));; 2023-01-11T21:38:06.0295156Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0295246Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:06.0295317Z } 2023-01-11T21:38:06.0295384Z } 2023-01-11T21:38:06.0295443Z } 2023-01-11T21:38:06.0295509Z } 2023-01-11T21:38:06.0295570Z } 2023-01-11T21:38:06.0301863Z ''') 2023-01-11T21:38:06.0301869Z 2023-01-11T21:38:06.0301874Z 2023-01-11T21:38:06.0301984Z async_compile.wait(globals()) 2023-01-11T21:38:06.0302148Z del async_compile 2023-01-11T21:38:06.0302153Z 2023-01-11T21:38:06.0302233Z def call(args): 2023-01-11T21:38:06.0302311Z arg0_1, = args 2023-01-11T21:38:06.0302392Z args.clear() 2023-01-11T21:38:06.0302537Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cpu_None)
2023-01-11T21:38:06.0302737Z buf0 = empty_strided((1024, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0302935Z buf1 = empty_strided((1024, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0303110Z kernel_cpp_0(c_void_p(seed_cpu_None.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0303192Z return (buf0, buf1, ) 2023-01-11T21:38:06.0303198Z 2023-01-11T21:38:06.0303202Z 2023-01-11T21:38:06.0303280Z if __name__ == "__main__": 2023-01-11T21:38:06.0303398Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0303525Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0303722Z seed_cpu_None = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0303912Z arg0_1 = rand_strided((1024, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0304025Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0304031Z 2023-01-11T21:38:06.0304100Z ok (2.462s) 2023-01-11T21:38:06.0304560Z test_reduction1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0304693Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0304950Z [2023-01-11 21:30:09,211] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 218 2023-01-11T21:38:06.0305219Z [2023-01-11 21:30:11,591] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 218 2023-01-11T21:38:06.0305225Z 2023-01-11T21:38:06.0305323Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0305398Z import torch 2023-01-11T21:38:06.0305467Z import random 2023-01-11T21:38:06.0305625Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0305754Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0305760Z 2023-01-11T21:38:06.0305843Z aten = torch.ops.aten 2023-01-11T21:38:06.0305978Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0306073Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0306078Z 2023-01-11T21:38:06.0306155Z import triton 2023-01-11T21:38:06.0306249Z import triton.language as tl 2023-01-11T21:38:06.0306366Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0306505Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0306514Z 2023-01-11T21:38:06.0306519Z 2023-01-11T21:38:06.0306655Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0306857Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0306986Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0307087Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0307188Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0307288Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0307389Z long* __restrict__ out_ptr3, 2023-01-11T21:38:06.0307490Z long* __restrict__ out_ptr4) 2023-01-11T21:38:06.0307557Z { 2023-01-11T21:38:06.0307623Z { 2023-01-11T21:38:06.0307684Z { 2023-01-11T21:38:06.0307766Z float tmp1 = 0; 2023-01-11T21:38:06.0307987Z float tmp2 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:06.0308114Z float tmp3 = std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.0308270Z struct IndexValue_7 {size_t index; float value;}; 2023-01-11T21:38:06.0308491Z IndexValue_7 tmp4{0, -std::numeric_limits<float>::infinity()}; 2023-01-11T21:38:06.0308632Z #pragma omp declare reduction(argmax : struct IndexValue_7 :\ 2023-01-11T21:38:06.0308790Z omp_out.value = omp_in.value < omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:06.0308937Z omp_out.index = omp_in.value < omp_out.value ? omp_out.index : omp_in.index)\ 2023-01-11T21:38:06.0309171Z initializer(omp_priv = {0, -std::numeric_limits<float>::infinity()}) 2023-01-11T21:38:06.0309293Z struct IndexValue_8 {size_t index; float value;}; 2023-01-11T21:38:06.0309431Z IndexValue_8 tmp5{0, std::numeric_limits<float>::infinity()}; 2023-01-11T21:38:06.0309571Z #pragma omp declare reduction(argmin : struct IndexValue_8 :\ 2023-01-11T21:38:06.0309726Z omp_out.value = omp_in.value > omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:06.0309877Z omp_out.index = omp_in.value > omp_out.value ? omp_out.index : omp_in.index)\ 2023-01-11T21:38:06.0310025Z initializer(omp_priv = {0, std::numeric_limits<float>::infinity()}) 2023-01-11T21:38:06.0310119Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:06.0310181Z { 2023-01-11T21:38:06.0310251Z { 2023-01-11T21:38:06.0310349Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0310431Z tmp1 += tmp0; 2023-01-11T21:38:06.0310537Z tmp2 = std::max(tmp2, tmp0); 2023-01-11T21:38:06.0310641Z tmp3 = std::min(tmp3, tmp0); 2023-01-11T21:38:06.0310737Z if (tmp4.value < tmp0) { 2023-01-11T21:38:06.0310845Z tmp4.index = i0; tmp4.value = tmp0; 2023-01-11T21:38:06.0310917Z } 2023-01-11T21:38:06.0311015Z if (tmp5.value > tmp0) { 2023-01-11T21:38:06.0311125Z tmp5.index = i0; tmp5.value = tmp0; 2023-01-11T21:38:06.0311198Z } 2023-01-11T21:38:06.0311267Z } 2023-01-11T21:38:06.0311334Z } 2023-01-11T21:38:06.0311412Z out_ptr0[0] = tmp1; 2023-01-11T21:38:06.0311524Z out_ptr1[0] = tmp2; 2023-01-11T21:38:06.0311607Z out_ptr2[0] = tmp3; 2023-01-11T21:38:06.0311697Z out_ptr3[0] = tmp4.index; 2023-01-11T21:38:06.0311785Z out_ptr4[0] = tmp5.index; 2023-01-11T21:38:06.0311852Z } 2023-01-11T21:38:06.0311911Z } 2023-01-11T21:38:06.0311974Z } 2023-01-11T21:38:06.0312058Z ''') 2023-01-11T21:38:06.0312064Z 2023-01-11T21:38:06.0312068Z 2023-01-11T21:38:06.0312161Z async_compile.wait(globals()) 2023-01-11T21:38:06.0312239Z del async_compile 2023-01-11T21:38:06.0312244Z 2023-01-11T21:38:06.0312319Z def call(args): 2023-01-11T21:38:06.0312393Z arg0_1, = args 2023-01-11T21:38:06.0312471Z args.clear() 2023-01-11T21:38:06.0312653Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0312833Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0313013Z buf2 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0313194Z buf3 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0313371Z buf4 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0313609Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:06.0313684Z del arg0_1 2023-01-11T21:38:06.0313777Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.0313788Z 2023-01-11T21:38:06.0313793Z 2023-01-11T21:38:06.0313867Z if __name__ == "__main__": 2023-01-11T21:38:06.0313985Z from torch._dynamo.testing import
rand_strided 2023-01-11T21:38:06.0314146Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0314339Z arg0_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0314451Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0314456Z 2023-01-11T21:38:06.0314530Z ok (2.399s) 2023-01-11T21:38:06.0314986Z test_reduction2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0315119Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0315375Z [2023-01-11 21:30:11,608] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 219 2023-01-11T21:38:06.0315634Z [2023-01-11 21:30:14,220] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 219 2023-01-11T21:38:06.0315647Z 2023-01-11T21:38:06.0315739Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0315812Z import torch 2023-01-11T21:38:06.0315887Z import random 2023-01-11T21:38:06.0316007Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0316130Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0316135Z 2023-01-11T21:38:06.0316218Z aten = torch.ops.aten 2023-01-11T21:38:06.0316357Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0316446Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0316451Z 2023-01-11T21:38:06.0316524Z import triton 2023-01-11T21:38:06.0316618Z import triton.language as tl 2023-01-11T21:38:06.0316746Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0316888Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0316896Z 2023-01-11T21:38:06.0316901Z 2023-01-11T21:38:06.0317038Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0317243Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0317368Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0317495Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0317599Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0317702Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0317803Z long* __restrict__ out_ptr3) 2023-01-11T21:38:06.0317868Z { 2023-01-11T21:38:06.0317934Z { 2023-01-11T21:38:06.0318002Z { 2023-01-11T21:38:06.0318077Z float tmp1 = 0; 2023-01-11T21:38:06.0318284Z float tmp2 = -std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.0318415Z float tmp3 = std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.0318543Z struct IndexValue_9 {size_t index; float value;}; 2023-01-11T21:38:06.0318681Z IndexValue_9 tmp4{0, std::numeric_limits<float>::infinity()}; 2023-01-11T21:38:06.0318821Z #pragma omp declare reduction(argmin : struct IndexValue_9 :\ 2023-01-11T21:38:06.0318976Z omp_out.value = omp_in.value > omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:06.0319126Z omp_out.index = omp_in.value > omp_out.value ?
omp_out.index : omp_in.index)\ 2023-01-11T21:38:06.0319266Z initializer(omp_priv = {0, std::numeric_limits<float>::infinity()}) 2023-01-11T21:38:06.0319358Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0319425Z { 2023-01-11T21:38:06.0319494Z { 2023-01-11T21:38:06.0319592Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0319675Z tmp1 += tmp0; 2023-01-11T21:38:06.0319781Z tmp2 = std::max(tmp2, tmp0); 2023-01-11T21:38:06.0319906Z tmp3 = std::min(tmp3, tmp0); 2023-01-11T21:38:06.0320005Z if (tmp4.value > tmp0) { 2023-01-11T21:38:06.0320120Z tmp4.index = i0; tmp4.value = tmp0; 2023-01-11T21:38:06.0320193Z } 2023-01-11T21:38:06.0320262Z } 2023-01-11T21:38:06.0320331Z } 2023-01-11T21:38:06.0320419Z out_ptr0[0] = tmp1; 2023-01-11T21:38:06.0320496Z out_ptr1[0] = tmp2; 2023-01-11T21:38:06.0320576Z out_ptr2[0] = tmp3; 2023-01-11T21:38:06.0320666Z out_ptr3[0] = tmp4.index; 2023-01-11T21:38:06.0320732Z } 2023-01-11T21:38:06.0320797Z } 2023-01-11T21:38:06.0320861Z } 2023-01-11T21:38:06.0320946Z ''') 2023-01-11T21:38:06.0320951Z 2023-01-11T21:38:06.0320955Z 2023-01-11T21:38:06.0321042Z async_compile.wait(globals()) 2023-01-11T21:38:06.0321116Z del async_compile 2023-01-11T21:38:06.0321121Z 2023-01-11T21:38:06.0321194Z def call(args): 2023-01-11T21:38:06.0321269Z arg0_1, = args 2023-01-11T21:38:06.0321344Z args.clear() 2023-01-11T21:38:06.0321530Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0321710Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0321884Z buf2 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0322065Z buf3 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0322280Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.0322354Z del arg0_1 2023-01-11T21:38:06.0322446Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.0322451Z 2023-01-11T21:38:06.0322455Z 2023-01-11T21:38:06.0322536Z if __name__ == "__main__": 2023-01-11T21:38:06.0322654Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0322780Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0322968Z arg0_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0323082Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0323087Z 2023-01-11T21:38:06.0323157Z ok (2.634s) 2023-01-11T21:38:06.0323639Z test_reduction3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0323777Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0324035Z [2023-01-11 21:30:14,281] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 220 2023-01-11T21:38:06.0324296Z [2023-01-11 21:30:16,495] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 220 2023-01-11T21:38:06.0324305Z 2023-01-11T21:38:06.0324402Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0324476Z import torch 2023-01-11T21:38:06.0324551Z import random 2023-01-11T21:38:06.0324663Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0324789Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0324794Z 2023-01-11T21:38:06.0324875Z aten = torch.ops.aten 2023-01-11T21:38:06.0325012Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0325108Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0325113Z 2023-01-11T21:38:06.0325187Z import triton 2023-01-11T21:38:06.0325278Z import triton.language as tl 2023-01-11T21:38:06.0325408Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0325564Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0325571Z 2023-01-11T21:38:06.0325576Z 2023-01-11T21:38:06.0325757Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0325963Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0326087Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0326193Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0326296Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0326395Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0326489Z long* __restrict__ out_ptr3) 2023-01-11T21:38:06.0326554Z { 2023-01-11T21:38:06.0326616Z { 2023-01-11T21:38:06.0326684Z { 2023-01-11T21:38:06.0326766Z float tmp1 = 0; 2023-01-11T21:38:06.0326970Z float tmp2 = -std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.0327097Z float tmp3 = std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.0327214Z struct IndexValue_10 {size_t index; float value;}; 2023-01-11T21:38:06.0327438Z IndexValue_10 tmp4{0, -std::numeric_limits<float>::infinity()}; 2023-01-11T21:38:06.0327580Z #pragma omp declare reduction(argmax : struct IndexValue_10 :\ 2023-01-11T21:38:06.0327735Z omp_out.value = omp_in.value < omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:06.0327886Z omp_out.index = omp_in.value < omp_out.value ?
omp_out.index : omp_in.index)\ 2023-01-11T21:38:06.0328115Z initializer(omp_priv = {0, -std::numeric_limits<float>::infinity()}) 2023-01-11T21:38:06.0328206Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0328274Z { 2023-01-11T21:38:06.0328337Z { 2023-01-11T21:38:06.0328436Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0328519Z tmp1 += tmp0; 2023-01-11T21:38:06.0328625Z tmp2 = std::max(tmp2, tmp0); 2023-01-11T21:38:06.0328726Z tmp3 = std::min(tmp3, tmp0); 2023-01-11T21:38:06.0328826Z if (tmp4.value < tmp0) { 2023-01-11T21:38:06.0328941Z tmp4.index = i0; tmp4.value = tmp0; 2023-01-11T21:38:06.0329012Z } 2023-01-11T21:38:06.0329074Z } 2023-01-11T21:38:06.0329140Z } 2023-01-11T21:38:06.0329254Z out_ptr0[0] = tmp1; 2023-01-11T21:38:06.0329338Z out_ptr1[0] = tmp2; 2023-01-11T21:38:06.0329420Z out_ptr2[0] = tmp3; 2023-01-11T21:38:06.0329509Z out_ptr3[0] = tmp4.index; 2023-01-11T21:38:06.0329569Z } 2023-01-11T21:38:06.0329636Z } 2023-01-11T21:38:06.0329698Z } 2023-01-11T21:38:06.0329782Z ''') 2023-01-11T21:38:06.0329787Z 2023-01-11T21:38:06.0329792Z 2023-01-11T21:38:06.0329884Z async_compile.wait(globals()) 2023-01-11T21:38:06.0329962Z del async_compile 2023-01-11T21:38:06.0329967Z 2023-01-11T21:38:06.0330041Z def call(args): 2023-01-11T21:38:06.0330108Z arg0_1, = args 2023-01-11T21:38:06.0330186Z args.clear() 2023-01-11T21:38:06.0330373Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0330551Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0330727Z buf2 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0330906Z buf3 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0331121Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.0331197Z del arg0_1 2023-01-11T21:38:06.0331282Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.0331287Z 2023-01-11T21:38:06.0331296Z 2023-01-11T21:38:06.0331371Z if __name__ == "__main__": 2023-01-11T21:38:06.0331489Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0331615Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0331806Z arg0_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0331946Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0331951Z 2023-01-11T21:38:06.0332020Z ok (2.270s) 2023-01-11T21:38:06.0332250Z test_reduction4_cpu (__main__.CpuTests) ... skip: Non-deterministic CPU results (0.001s) 2023-01-11T21:38:06.0332728Z test_reflection_pad2d_backward_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0332860Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0333113Z [2023-01-11 21:30:16,524] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 221 2023-01-11T21:38:06.0333379Z [2023-01-11 21:30:18,601] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 221 2023-01-11T21:38:06.0333799Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0333930Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0334183Z [2023-01-11 21:30:18,643] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 222 2023-01-11T21:38:06.0334442Z [2023-01-11 21:30:21,371] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 222 2023-01-11T21:38:06.0335001Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0335138Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0335442Z [2023-01-11 21:30:21,391] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 223 2023-01-11T21:38:06.0335450Z 2023-01-11T21:38:06.0335568Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0335643Z import torch 2023-01-11T21:38:06.0335737Z import random 2023-01-11T21:38:06.0335856Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0335981Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0335987Z 2023-01-11T21:38:06.0336069Z aten = torch.ops.aten 2023-01-11T21:38:06.0336207Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0336307Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0336312Z 2023-01-11T21:38:06.0336386Z import triton 2023-01-11T21:38:06.0336471Z import triton.language as tl 2023-01-11T21:38:06.0336595Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0336733Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0336741Z 2023-01-11T21:38:06.0336745Z 2023-01-11T21:38:06.0336882Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0337087Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0337329Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0337437Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0337504Z { 2023-01-11T21:38:06.0337599Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0337666Z { 2023-01-11T21:38:06.0337748Z #pragma omp for 2023-01-11T21:38:06.0337834Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0337959Z { 2023-01-11T21:38:06.0338049Z #pragma GCC ivdep 2023-01-11T21:38:06.0338129Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0338196Z { 
2023-01-11T21:38:06.0338265Z {
2023-01-11T21:38:06.0338336Z {
2023-01-11T21:38:06.0338449Z auto tmp0 = static_cast<long>(i0);
2023-01-11T21:38:06.0338561Z auto tmp1 = static_cast<long>(i1);
2023-01-11T21:38:06.0338674Z auto tmp2 = in_ptr0[tmp1 + (8*tmp0)];
2023-01-11T21:38:06.0338767Z out_ptr0[i1 + (8*i0)] = tmp2;
2023-01-11T21:38:06.0338839Z }
2023-01-11T21:38:06.0338907Z }
2023-01-11T21:38:06.0338974Z }
2023-01-11T21:38:06.0339041Z }
2023-01-11T21:38:06.0339108Z }
2023-01-11T21:38:06.0339175Z }
2023-01-11T21:38:06.0339254Z ''')
2023-01-11T21:38:06.0339259Z
2023-01-11T21:38:06.0339270Z
2023-01-11T21:38:06.0339362Z async_compile.wait(globals())
2023-01-11T21:38:06.0339438Z del async_compile
2023-01-11T21:38:06.0339443Z
2023-01-11T21:38:06.0339519Z def call(args):
2023-01-11T21:38:06.0339598Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.0339672Z args.clear()
2023-01-11T21:38:06.0339887Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0340039Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0340105Z del arg0_1
2023-01-11T21:38:06.0340188Z return (buf0, )
2023-01-11T21:38:06.0340193Z
2023-01-11T21:38:06.0340197Z
2023-01-11T21:38:06.0340280Z if __name__ == "__main__":
2023-01-11T21:38:06.0340407Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0340543Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0340786Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0340996Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0341120Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.0341125Z
2023-01-11T21:38:06.0341129Z
2023-01-11T21:38:06.0341219Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0341299Z import torch
2023-01-11T21:38:06.0341373Z import random
2023-01-11T21:38:06.0341566Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0341693Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0341698Z
2023-01-11T21:38:06.0341781Z aten = torch.ops.aten
2023-01-11T21:38:06.0341916Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0342004Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0342016Z
2023-01-11T21:38:06.0342083Z import triton
2023-01-11T21:38:06.0342180Z import triton.language as tl
2023-01-11T21:38:06.0342305Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0342444Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0342451Z
2023-01-11T21:38:06.0342456Z
2023-01-11T21:38:06.0342590Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0342793Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0342924Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0343033Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0343091Z {
2023-01-11T21:38:06.0343191Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0343262Z {
2023-01-11T21:38:06.0343343Z #pragma omp for
2023-01-11T21:38:06.0343427Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:06.0343493Z {
2023-01-11T21:38:06.0343571Z #pragma GCC ivdep
2023-01-11T21:38:06.0343659Z for(long i1=0; i1<8; i1+=1)
2023-01-11T21:38:06.0343732Z {
2023-01-11T21:38:06.0343801Z {
2023-01-11T21:38:06.0343871Z {
2023-01-11T21:38:06.0344014Z auto tmp0 = static_cast<long>(1 + i0);
2023-01-11T21:38:06.0344129Z auto tmp1 = static_cast<long>(1 + i1);
2023-01-11T21:38:06.0344233Z auto tmp2 = in_ptr0[tmp1 + (10*tmp0)];
2023-01-11T21:38:06.0344340Z auto tmp3 = static_cast<long>(i1);
2023-01-11T21:38:06.0344439Z auto tmp4 = tmp3 >= 1;
2023-01-11T21:38:06.0344535Z auto tmp5 = tmp3 <= 1;
2023-01-11T21:38:06.0344637Z auto tmp6 = tmp4 & tmp5;
2023-01-11T21:38:06.0344727Z float tmp7 = 0.0;
2023-01-11T21:38:06.0344807Z if(tmp6)
2023-01-11T21:38:06.0344873Z {
2023-01-11T21:38:06.0344985Z auto tmp8 = static_cast<long>(1 + i0);
2023-01-11T21:38:06.0345162Z auto tmp9 = static_cast<long>(1 + ((-1)*i1));
2023-01-11T21:38:06.0345278Z auto tmp10 = in_ptr0[tmp9 + (10*tmp8)];
2023-01-11T21:38:06.0345367Z tmp7 = tmp10;
2023-01-11T21:38:06.0345440Z }
2023-01-11T21:38:06.0345539Z auto tmp11 = tmp2 + tmp7;
2023-01-11T21:38:06.0345639Z auto tmp12 = tmp3 >= 6;
2023-01-11T21:38:06.0345732Z auto tmp13 = tmp3 <= 6;
2023-01-11T21:38:06.0345833Z auto tmp14 = tmp12 & tmp13;
2023-01-11T21:38:06.0345925Z float tmp15 = 0.0;
2023-01-11T21:38:06.0346006Z if(tmp14)
2023-01-11T21:38:06.0346078Z {
2023-01-11T21:38:06.0346192Z auto tmp16 = static_cast<long>(1 + i0);
2023-01-11T21:38:06.0346369Z auto tmp17 = static_cast<long>(15 + ((-1)*i1));
2023-01-11T21:38:06.0346480Z auto tmp18 = in_ptr0[tmp17 + (10*tmp16)];
2023-01-11T21:38:06.0346571Z tmp15 = tmp18;
2023-01-11T21:38:06.0346645Z }
2023-01-11T21:38:06.0346748Z auto tmp19 = tmp11 + tmp15;
2023-01-11T21:38:06.0346856Z auto tmp20 = static_cast<long>(i0);
2023-01-11T21:38:06.0346953Z auto tmp21 = tmp20 >= 1;
2023-01-11T21:38:06.0347080Z auto tmp22 = tmp20 <= 1;
2023-01-11T21:38:06.0347174Z auto tmp23 = tmp21 & tmp22;
2023-01-11T21:38:06.0347263Z float tmp24 = 0.0;
2023-01-11T21:38:06.0347344Z if(tmp23)
2023-01-11T21:38:06.0347415Z {
2023-01-11T21:38:06.0347591Z auto tmp25 = static_cast<long>(1 + ((-1)*i0));
2023-01-11T21:38:06.0347704Z auto tmp26 = static_cast<long>(1 + i1);
2023-01-11T21:38:06.0347823Z auto tmp27 = in_ptr0[tmp26 + (10*tmp25)];
2023-01-11T21:38:06.0347904Z tmp24 = tmp27;
2023-01-11T21:38:06.0347979Z }
2023-01-11T21:38:06.0348080Z auto tmp28 = tmp19 + tmp24;
2023-01-11T21:38:06.0348178Z auto tmp29 = tmp20 >= 6;
2023-01-11T21:38:06.0348273Z auto tmp30 = tmp20 <= 6;
2023-01-11T21:38:06.0348375Z auto tmp31 = tmp29 & tmp30;
2023-01-11T21:38:06.0348465Z float tmp32 = 0.0;
2023-01-11T21:38:06.0348545Z if(tmp31)
2023-01-11T21:38:06.0348611Z {
2023-01-11T21:38:06.0348788Z auto tmp33 = static_cast<long>(15 + ((-1)*i0));
2023-01-11T21:38:06.0348905Z auto tmp34 = static_cast<long>(1 + i1);
2023-01-11T21:38:06.0349021Z auto tmp35 = in_ptr0[tmp34 + (10*tmp33)];
2023-01-11T21:38:06.0349110Z tmp32 = tmp35;
2023-01-11T21:38:06.0349182Z }
2023-01-11T21:38:06.0349314Z auto tmp36 = tmp28 + tmp32;
2023-01-11T21:38:06.0349406Z auto tmp37 = tmp23 & tmp6;
2023-01-11T21:38:06.0349497Z float tmp38 = 0.0;
2023-01-11T21:38:06.0349583Z if(tmp37)
2023-01-11T21:38:06.0349656Z {
2023-01-11T21:38:06.0349838Z auto tmp39 = static_cast<long>(1 + ((-1)*i0));
2023-01-11T21:38:06.0350012Z auto tmp40 = static_cast<long>(1 + ((-1)*i1));
2023-01-11T21:38:06.0350127Z auto tmp41 = in_ptr0[tmp40 + (10*tmp39)];
2023-01-11T21:38:06.0350208Z tmp38 = tmp41;
2023-01-11T21:38:06.0350279Z }
2023-01-11T21:38:06.0350379Z auto tmp42 = tmp36 + tmp38;
2023-01-11T21:38:06.0350478Z auto tmp43 = tmp23 & tmp14;
2023-01-11T21:38:06.0350570Z float tmp44 = 0.0;
2023-01-11T21:38:06.0350654Z if(tmp43)
2023-01-11T21:38:06.0350726Z {
2023-01-11T21:38:06.0350892Z auto tmp45 = static_cast<long>(1 + ((-1)*i0));
2023-01-11T21:38:06.0351069Z auto tmp46 = static_cast<long>(15 + ((-1)*i1));
2023-01-11T21:38:06.0351185Z auto tmp47 = in_ptr0[tmp46 + (10*tmp45)];
2023-01-11T21:38:06.0351272Z tmp44 = tmp47;
2023-01-11T21:38:06.0351344Z }
2023-01-11T21:38:06.0351443Z auto tmp48 = tmp42 + tmp44;
2023-01-11T21:38:06.0351542Z auto tmp49 = tmp31 & tmp6;
2023-01-11T21:38:06.0351631Z float tmp50 = 0.0;
2023-01-11T21:38:06.0351705Z if(tmp49)
2023-01-11T21:38:06.0351777Z {
2023-01-11T21:38:06.0351953Z auto tmp51 = static_cast<long>(15 + ((-1)*i0));
2023-01-11T21:38:06.0352124Z auto tmp52 = static_cast<long>(1 + ((-1)*i1));
2023-01-11T21:38:06.0352241Z auto tmp53 = in_ptr0[tmp52 + (10*tmp51)];
2023-01-11T21:38:06.0352329Z tmp50 = tmp53;
2023-01-11T21:38:06.0352400Z }
2023-01-11T21:38:06.0352518Z auto tmp54 = tmp48 + tmp50;
2023-01-11T21:38:06.0352618Z auto tmp55 = tmp31 & tmp14;
2023-01-11T21:38:06.0352708Z float tmp56 = 0.0;
2023-01-11T21:38:06.0352792Z if(tmp55)
2023-01-11T21:38:06.0352868Z {
2023-01-11T21:38:06.0353042Z auto tmp57 = static_cast<long>(15 + ((-1)*i0));
2023-01-11T21:38:06.0353216Z auto tmp58 = static_cast<long>(15 + ((-1)*i1));
2023-01-11T21:38:06.0353323Z auto tmp59 = in_ptr0[tmp58 + (10*tmp57)];
2023-01-11T21:38:06.0353411Z tmp56 = tmp59;
2023-01-11T21:38:06.0353487Z }
2023-01-11T21:38:06.0353585Z auto tmp60 = tmp54 + tmp56;
2023-01-11T21:38:06.0353688Z out_ptr0[i1 + (8*i0)] = tmp60;
2023-01-11T21:38:06.0353761Z }
2023-01-11T21:38:06.0353831Z }
2023-01-11T21:38:06.0353891Z }
2023-01-11T21:38:06.0353960Z }
2023-01-11T21:38:06.0354027Z }
2023-01-11T21:38:06.0354093Z }
2023-01-11T21:38:06.0354177Z ''')
2023-01-11T21:38:06.0354183Z
2023-01-11T21:38:06.0354187Z
2023-01-11T21:38:06.0354278Z async_compile.wait(globals())
2023-01-11T21:38:06.0354356Z del async_compile
2023-01-11T21:38:06.0354361Z
2023-01-11T21:38:06.0354429Z def call(args):
2023-01-11T21:38:06.0354510Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.0354585Z args.clear()
2023-01-11T21:38:06.0354799Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0354937Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0355050Z del arg0_1
2023-01-11T21:38:06.0355142Z return (buf0, )
2023-01-11T21:38:06.0355147Z
2023-01-11T21:38:06.0355151Z
2023-01-11T21:38:06.0355240Z if __name__ == "__main__":
2023-01-11T21:38:06.0355368Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0355496Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0355715Z arg0_1 = rand_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0355924Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0356047Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.0356052Z
2023-01-11T21:38:06.0356317Z [2023-01-11 21:30:23,676] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 223
2023-01-11T21:38:06.0356323Z
2023-01-11T21:38:06.0356422Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0356500Z import torch
2023-01-11T21:38:06.0356568Z import random
2023-01-11T21:38:06.0356684Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0356806Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0356811Z
2023-01-11T21:38:06.0356891Z aten = torch.ops.aten
2023-01-11T21:38:06.0357028Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0357124Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0357129Z
2023-01-11T21:38:06.0357203Z import
triton 2023-01-11T21:38:06.0357296Z import triton.language as tl 2023-01-11T21:38:06.0357414Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0357551Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0357556Z 2023-01-11T21:38:06.0357561Z 2023-01-11T21:38:06.0357699Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0357905Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0358032Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0358136Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0358199Z { 2023-01-11T21:38:06.0358293Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0358360Z { 2023-01-11T21:38:06.0358444Z #pragma omp for 2023-01-11T21:38:06.0358563Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0358633Z { 2023-01-11T21:38:06.0358723Z #pragma GCC ivdep 2023-01-11T21:38:06.0358810Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0358871Z { 2023-01-11T21:38:06.0358941Z { 2023-01-11T21:38:06.0359010Z { 2023-01-11T21:38:06.0359123Z auto tmp0 = static_cast(3 + i0); 2023-01-11T21:38:06.0359236Z auto tmp1 = static_cast(1 + i1); 2023-01-11T21:38:06.0359350Z auto tmp2 = in_ptr0[tmp1 + (11*tmp0)]; 2023-01-11T21:38:06.0359461Z auto tmp3 = static_cast(i1); 2023-01-11T21:38:06.0359551Z auto tmp4 = tmp3 >= 1; 2023-01-11T21:38:06.0359651Z auto tmp5 = tmp3 <= 1; 2023-01-11T21:38:06.0359754Z auto tmp6 = tmp4 & tmp5; 2023-01-11T21:38:06.0359848Z float tmp7 = 0.0; 2023-01-11T21:38:06.0359929Z if(tmp6) 2023-01-11T21:38:06.0360003Z { 2023-01-11T21:38:06.0360120Z auto tmp8 = static_cast(3 + i0); 2023-01-11T21:38:06.0360287Z auto tmp9 = static_cast(1 + ((-1)*i1)); 2023-01-11T21:38:06.0360404Z auto tmp10 = in_ptr0[tmp9 + (11*tmp8)]; 2023-01-11T21:38:06.0360492Z tmp7 = tmp10; 2023-01-11T21:38:06.0360566Z } 2023-01-11T21:38:06.0360668Z auto tmp11 = tmp2 + tmp7; 2023-01-11T21:38:06.0360766Z auto tmp12 = tmp3 >= 5; 2023-01-11T21:38:06.0360892Z auto tmp13 = tmp3 <= 6; 2023-01-11T21:38:06.0360991Z auto tmp14 = tmp12 & tmp13; 2023-01-11T21:38:06.0361075Z float tmp15 = 0.0; 2023-01-11T21:38:06.0361161Z if(tmp14) 2023-01-11T21:38:06.0361240Z { 2023-01-11T21:38:06.0361356Z auto tmp16 = static_cast(3 + i0); 2023-01-11T21:38:06.0361540Z auto tmp17 = static_cast(15 + ((-1)*i1)); 2023-01-11T21:38:06.0361656Z auto tmp18 = in_ptr0[tmp17 + (11*tmp16)]; 2023-01-11T21:38:06.0361744Z tmp15 = tmp18; 2023-01-11T21:38:06.0361809Z } 2023-01-11T21:38:06.0361909Z auto tmp19 = tmp11 + tmp15; 2023-01-11T21:38:06.0362018Z auto tmp20 = static_cast(i0); 2023-01-11T21:38:06.0362116Z auto tmp21 = tmp20 >= 1; 2023-01-11T21:38:06.0362214Z auto tmp22 = tmp20 <= 3; 2023-01-11T21:38:06.0362314Z auto tmp23 = tmp21 & tmp22; 2023-01-11T21:38:06.0362404Z float tmp24 = 0.0; 2023-01-11T21:38:06.0362478Z if(tmp23) 2023-01-11T21:38:06.0362555Z { 2023-01-11T21:38:06.0362734Z auto tmp25 = static_cast(3 + ((-1)*i0)); 2023-01-11T21:38:06.0362845Z auto tmp26 = static_cast(1 + i1); 2023-01-11T21:38:06.0362964Z auto tmp27 = in_ptr0[tmp26 + (11*tmp25)]; 2023-01-11T21:38:06.0363055Z tmp24 = tmp27; 2023-01-11T21:38:06.0363129Z } 2023-01-11T21:38:06.0363221Z auto tmp28 = tmp19 + tmp24; 2023-01-11T21:38:06.0363320Z auto tmp29 = tmp20 >= 3; 2023-01-11T21:38:06.0363413Z auto tmp30 = tmp20 <= 6; 2023-01-11T21:38:06.0363516Z auto tmp31 = tmp29 & tmp30; 2023-01-11T21:38:06.0363606Z float tmp32 = 0.0; 2023-01-11T21:38:06.0363692Z if(tmp31) 2023-01-11T21:38:06.0363765Z { 2023-01-11T21:38:06.0363978Z 
auto tmp33 = static_cast(17 + ((-1)*i0)); 2023-01-11T21:38:06.0364087Z auto tmp34 = static_cast(1 + i1); 2023-01-11T21:38:06.0364202Z auto tmp35 = in_ptr0[tmp34 + (11*tmp33)]; 2023-01-11T21:38:06.0364293Z tmp32 = tmp35; 2023-01-11T21:38:06.0364367Z } 2023-01-11T21:38:06.0364468Z auto tmp36 = tmp28 + tmp32; 2023-01-11T21:38:06.0364565Z auto tmp37 = tmp23 & tmp6; 2023-01-11T21:38:06.0364655Z float tmp38 = 0.0; 2023-01-11T21:38:06.0364729Z if(tmp37) 2023-01-11T21:38:06.0364807Z { 2023-01-11T21:38:06.0364982Z auto tmp39 = static_cast(3 + ((-1)*i0)); 2023-01-11T21:38:06.0365158Z auto tmp40 = static_cast(1 + ((-1)*i1)); 2023-01-11T21:38:06.0365273Z auto tmp41 = in_ptr0[tmp40 + (11*tmp39)]; 2023-01-11T21:38:06.0365363Z tmp38 = tmp41; 2023-01-11T21:38:06.0365436Z } 2023-01-11T21:38:06.0365528Z auto tmp42 = tmp36 + tmp38; 2023-01-11T21:38:06.0365625Z auto tmp43 = tmp23 & tmp14; 2023-01-11T21:38:06.0365716Z float tmp44 = 0.0; 2023-01-11T21:38:06.0365797Z if(tmp43) 2023-01-11T21:38:06.0365870Z { 2023-01-11T21:38:06.0366045Z auto tmp45 = static_cast(3 + ((-1)*i0)); 2023-01-11T21:38:06.0366224Z auto tmp46 = static_cast(15 + ((-1)*i1)); 2023-01-11T21:38:06.0366360Z auto tmp47 = in_ptr0[tmp46 + (11*tmp45)]; 2023-01-11T21:38:06.0366448Z tmp44 = tmp47; 2023-01-11T21:38:06.0366521Z } 2023-01-11T21:38:06.0366624Z auto tmp48 = tmp42 + tmp44; 2023-01-11T21:38:06.0366724Z auto tmp49 = tmp31 & tmp6; 2023-01-11T21:38:06.0366813Z float tmp50 = 0.0; 2023-01-11T21:38:06.0366897Z if(tmp49) 2023-01-11T21:38:06.0366971Z { 2023-01-11T21:38:06.0367141Z auto tmp51 = static_cast(17 + ((-1)*i0)); 2023-01-11T21:38:06.0367317Z auto tmp52 = static_cast(1 + ((-1)*i1)); 2023-01-11T21:38:06.0367430Z auto tmp53 = in_ptr0[tmp52 + (11*tmp51)]; 2023-01-11T21:38:06.0367520Z tmp50 = tmp53; 2023-01-11T21:38:06.0367592Z } 2023-01-11T21:38:06.0367694Z auto tmp54 = tmp48 + tmp50; 2023-01-11T21:38:06.0367790Z auto tmp55 = tmp31 & tmp14; 2023-01-11T21:38:06.0367875Z float tmp56 = 0.0; 2023-01-11T21:38:06.0367958Z if(tmp55) 2023-01-11T21:38:06.0368031Z { 2023-01-11T21:38:06.0368213Z auto tmp57 = static_cast(17 + ((-1)*i0)); 2023-01-11T21:38:06.0368387Z auto tmp58 = static_cast(15 + ((-1)*i1)); 2023-01-11T21:38:06.0368505Z auto tmp59 = in_ptr0[tmp58 + (11*tmp57)]; 2023-01-11T21:38:06.0368594Z tmp56 = tmp59; 2023-01-11T21:38:06.0368659Z } 2023-01-11T21:38:06.0368760Z auto tmp60 = tmp54 + tmp56; 2023-01-11T21:38:06.0368864Z out_ptr0[i1 + (8*i0)] = tmp60; 2023-01-11T21:38:06.0368938Z } 2023-01-11T21:38:06.0369009Z } 2023-01-11T21:38:06.0369077Z } 2023-01-11T21:38:06.0369142Z } 2023-01-11T21:38:06.0369201Z } 2023-01-11T21:38:06.0369267Z } 2023-01-11T21:38:06.0369350Z ''') 2023-01-11T21:38:06.0369355Z 2023-01-11T21:38:06.0369360Z 2023-01-11T21:38:06.0369453Z async_compile.wait(globals()) 2023-01-11T21:38:06.0369557Z del async_compile 2023-01-11T21:38:06.0369563Z 2023-01-11T21:38:06.0369641Z def call(args): 2023-01-11T21:38:06.0369721Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0369790Z args.clear() 2023-01-11T21:38:06.0370000Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0370139Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0370211Z del arg0_1 2023-01-11T21:38:06.0370287Z return (buf0, ) 2023-01-11T21:38:06.0370292Z 2023-01-11T21:38:06.0370296Z 2023-01-11T21:38:06.0370372Z if __name__ == "__main__": 2023-01-11T21:38:06.0370493Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0370623Z from torch._inductor.utils import 
print_performance 2023-01-11T21:38:06.0370837Z arg0_1 = rand_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0371045Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0371166Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0371171Z 2023-01-11T21:38:06.0371242Z ok (7.180s) 2023-01-11T21:38:06.0371703Z test_reflection_pad2d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0371834Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0372119Z [2023-01-11 21:30:23,760] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 224 2023-01-11T21:38:06.0372380Z [2023-01-11 21:30:26,391] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 224 2023-01-11T21:38:06.0372386Z 2023-01-11T21:38:06.0372490Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0372565Z import torch 2023-01-11T21:38:06.0372633Z import random 2023-01-11T21:38:06.0372752Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0372875Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0372881Z 2023-01-11T21:38:06.0372964Z aten = torch.ops.aten 2023-01-11T21:38:06.0373100Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0373196Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0373202Z 2023-01-11T21:38:06.0373276Z import triton 2023-01-11T21:38:06.0373361Z import triton.language as tl 2023-01-11T21:38:06.0373492Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0373631Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0373637Z 2023-01-11T21:38:06.0373641Z 2023-01-11T21:38:06.0373778Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0373985Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0374109Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0374213Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0374313Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0374372Z { 2023-01-11T21:38:06.0374474Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0374650Z { 2023-01-11T21:38:06.0374744Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0374829Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.0374897Z { 2023-01-11T21:38:06.0374986Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.0375053Z { 2023-01-11T21:38:06.0375122Z { 2023-01-11T21:38:06.0375193Z { 2023-01-11T21:38:06.0375303Z auto tmp0 = static_cast(7); 2023-01-11T21:38:06.0375416Z auto tmp1 = static_cast(i0); 2023-01-11T21:38:06.0375565Z auto tmp2 = static_cast(1); 2023-01-11T21:38:06.0375713Z auto tmp3 = tmp1 - tmp2; 2023-01-11T21:38:06.0375812Z auto tmp4 = std::abs(tmp3); 2023-01-11T21:38:06.0375954Z auto tmp5 = tmp0 - tmp4; 2023-01-11T21:38:06.0376061Z auto tmp6 = std::abs(tmp5); 2023-01-11T21:38:06.0376202Z auto tmp7 = tmp0 - tmp6; 2023-01-11T21:38:06.0376310Z auto tmp8 = static_cast(i1); 2023-01-11T21:38:06.0376449Z auto tmp9 = tmp8 - tmp2; 
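// Annotation (not part of the captured log): tmp0..tmp13 around this point
// implement reflection indexing, src = (size-1) - |(size-1) - |dst - pad||,
// here 7 - |7 - |i - 1||, which mirrors out-of-range coordinates of the padded
// output back into the 8x8 input. The static_cast calls in this kernel
// presumably read static_cast<long>(...) before the template arguments were
// lost in extraction.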
2023-01-11T21:38:06.0376558Z auto tmp10 = std::abs(tmp9); 2023-01-11T21:38:06.0376696Z auto tmp11 = tmp0 - tmp10; 2023-01-11T21:38:06.0376804Z auto tmp12 = std::abs(tmp11); 2023-01-11T21:38:06.0376944Z auto tmp13 = tmp0 - tmp12; 2023-01-11T21:38:06.0377056Z auto tmp14 = in_ptr0[tmp13 + (8*tmp7)]; 2023-01-11T21:38:06.0377208Z out_ptr0[i1 + (10*i0)] = tmp14; 2023-01-11T21:38:06.0377292Z } 2023-01-11T21:38:06.0377375Z } 2023-01-11T21:38:06.0377447Z } 2023-01-11T21:38:06.0377517Z } 2023-01-11T21:38:06.0377611Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0377697Z for(long i0=0; i0<15; i0+=1) 2023-01-11T21:38:06.0377764Z { 2023-01-11T21:38:06.0377855Z for(long i1=0; i1<11; i1+=1) 2023-01-11T21:38:06.0377924Z { 2023-01-11T21:38:06.0377986Z { 2023-01-11T21:38:06.0378058Z { 2023-01-11T21:38:06.0378219Z auto tmp0 = static_cast(7); 2023-01-11T21:38:06.0378329Z auto tmp1 = static_cast(i0); 2023-01-11T21:38:06.0378435Z auto tmp2 = static_cast(3); 2023-01-11T21:38:06.0378580Z auto tmp3 = tmp1 - tmp2; 2023-01-11T21:38:06.0378685Z auto tmp4 = std::abs(tmp3); 2023-01-11T21:38:06.0378818Z auto tmp5 = tmp0 - tmp4; 2023-01-11T21:38:06.0378922Z auto tmp6 = std::abs(tmp5); 2023-01-11T21:38:06.0379061Z auto tmp7 = tmp0 - tmp6; 2023-01-11T21:38:06.0379167Z auto tmp8 = static_cast(i1); 2023-01-11T21:38:06.0379277Z auto tmp9 = static_cast(1); 2023-01-11T21:38:06.0379421Z auto tmp10 = tmp8 - tmp9; 2023-01-11T21:38:06.0379525Z auto tmp11 = std::abs(tmp10); 2023-01-11T21:38:06.0379669Z auto tmp12 = tmp0 - tmp11; 2023-01-11T21:38:06.0379767Z auto tmp13 = std::abs(tmp12); 2023-01-11T21:38:06.0379911Z auto tmp14 = tmp0 - tmp13; 2023-01-11T21:38:06.0380024Z auto tmp15 = in_ptr0[tmp14 + (8*tmp7)]; 2023-01-11T21:38:06.0380129Z out_ptr1[i1 + (11*i0)] = tmp15; 2023-01-11T21:38:06.0380201Z } 2023-01-11T21:38:06.0380269Z } 2023-01-11T21:38:06.0380334Z } 2023-01-11T21:38:06.0380394Z } 2023-01-11T21:38:06.0380459Z } 2023-01-11T21:38:06.0380523Z } 2023-01-11T21:38:06.0380605Z ''') 2023-01-11T21:38:06.0380611Z 2023-01-11T21:38:06.0380615Z 2023-01-11T21:38:06.0380709Z async_compile.wait(globals()) 2023-01-11T21:38:06.0380786Z del async_compile 2023-01-11T21:38:06.0380791Z 2023-01-11T21:38:06.0380866Z def call(args): 2023-01-11T21:38:06.0380932Z arg0_1, = args 2023-01-11T21:38:06.0381006Z args.clear() 2023-01-11T21:38:06.0381226Z buf0 = empty_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0381440Z buf1 = empty_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0381609Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0381714Z del arg0_1 2023-01-11T21:38:06.0381796Z return (buf0, buf1, ) 2023-01-11T21:38:06.0381801Z 2023-01-11T21:38:06.0381806Z 2023-01-11T21:38:06.0381889Z if __name__ == "__main__": 2023-01-11T21:38:06.0381999Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0382128Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0382339Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0382453Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0382458Z 2023-01-11T21:38:06.0382530Z ok (2.715s) 2023-01-11T21:38:06.0382986Z test_relu_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. 
This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0383120Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0383376Z [2023-01-11 21:30:26,422] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 225
2023-01-11T21:38:06.0383639Z [2023-01-11 21:30:28,171] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 225
2023-01-11T21:38:06.0383645Z
2023-01-11T21:38:06.0383744Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0383811Z import torch
2023-01-11T21:38:06.0383887Z import random
2023-01-11T21:38:06.0384008Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0384161Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0384166Z
2023-01-11T21:38:06.0384249Z aten = torch.ops.aten
2023-01-11T21:38:06.0384388Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0384480Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0384487Z
2023-01-11T21:38:06.0384555Z import triton
2023-01-11T21:38:06.0384648Z import triton.language as tl
2023-01-11T21:38:06.0384773Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0384913Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0384919Z
2023-01-11T21:38:06.0384923Z
2023-01-11T21:38:06.0385058Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0385264Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0385387Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0385499Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.0385596Z float* __restrict__ out_ptr0,
2023-01-11T21:38:06.0385697Z float* __restrict__ out_ptr1)
2023-01-11T21:38:06.0385760Z {
2023-01-11T21:38:06.0385862Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0385926Z {
2023-01-11T21:38:06.0386012Z #pragma omp for
2023-01-11T21:38:06.0386099Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:06.0386159Z {
2023-01-11T21:38:06.0386300Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0386437Z auto tmp2 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.0386575Z auto tmp1 = at::vec::clamp_min(tmp0, decltype(tmp0)(0));
2023-01-11T21:38:06.0386669Z auto tmp3 = tmp0 + tmp2;
2023-01-11T21:38:06.0386799Z auto tmp4 = at::vec::clamp_min(tmp3, decltype(tmp3)(0));
2023-01-11T21:38:06.0386938Z auto tmp5 = at::vec::Vectorized<float>(static_cast<float>(10));
2023-01-11T21:38:06.0387034Z auto tmp6 = tmp4 / tmp5;
2023-01-11T21:38:06.0387125Z tmp1.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0387220Z tmp6.store(out_ptr1 + 8*i0);
2023-01-11T21:38:06.0387286Z }
2023-01-11T21:38:06.0387416Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0387506Z for(long i0=64; i0<64; i0+=1)
2023-01-11T21:38:06.0387576Z {
2023-01-11T21:38:06.0387665Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0387746Z auto tmp2 = in_ptr1[i0];
2023-01-11T21:38:06.0387838Z auto tmp1 = tmp0 * (tmp0>0);
2023-01-11T21:38:06.0387928Z auto tmp3 = tmp0 + tmp2;
2023-01-11T21:38:06.0388020Z auto tmp4 = tmp3 * (tmp3>0);
2023-01-11T21:38:06.0388124Z auto tmp5 = static_cast<float>(10);
2023-01-11T21:38:06.0388213Z auto tmp6 = tmp4 / tmp5;
2023-01-11T21:38:06.0388303Z out_ptr0[i0] = tmp1;
2023-01-11T21:38:06.0388385Z out_ptr1[i0] = tmp6;
2023-01-11T21:38:06.0388455Z }
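// Annotation (not part of the captured log): the scalar tail loop above,
// for(long i0=64; i0<64; ...), is degenerate and never runs -- all 64 floats
// are covered by the 8-wide vectorized loop; inductor appears to emit the tail
// unconditionally. The kernel computes out_ptr0 = relu(x) and
// out_ptr1 = relu(x + y) / 10 in a single fused pass.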
2023-01-11T21:38:06.0388523Z } 2023-01-11T21:38:06.0388587Z } 2023-01-11T21:38:06.0388673Z ''') 2023-01-11T21:38:06.0388678Z 2023-01-11T21:38:06.0388683Z 2023-01-11T21:38:06.0388774Z async_compile.wait(globals()) 2023-01-11T21:38:06.0388850Z del async_compile 2023-01-11T21:38:06.0388855Z 2023-01-11T21:38:06.0388923Z def call(args): 2023-01-11T21:38:06.0389002Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0389079Z args.clear() 2023-01-11T21:38:06.0389279Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0389470Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0389666Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0389738Z del arg0_1 2023-01-11T21:38:06.0389803Z del arg1_1 2023-01-11T21:38:06.0389914Z return (buf0, buf1, ) 2023-01-11T21:38:06.0389919Z 2023-01-11T21:38:06.0389923Z 2023-01-11T21:38:06.0390003Z if __name__ == "__main__": 2023-01-11T21:38:06.0390120Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0390246Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0390443Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0390636Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0390754Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0390759Z 2023-01-11T21:38:06.0390823Z ok (1.780s) 2023-01-11T21:38:06.0391280Z test_remainder_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0391416Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0391672Z [2023-01-11 21:30:28,198] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 226 2023-01-11T21:38:06.0391935Z [2023-01-11 21:30:29,923] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 226 2023-01-11T21:38:06.0391941Z 2023-01-11T21:38:06.0392043Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0392119Z import torch 2023-01-11T21:38:06.0392192Z import random 2023-01-11T21:38:06.0392311Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0392429Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0392434Z 2023-01-11T21:38:06.0392514Z aten = torch.ops.aten 2023-01-11T21:38:06.0392653Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0392747Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0392755Z 2023-01-11T21:38:06.0392829Z import triton 2023-01-11T21:38:06.0392925Z import triton.language as tl 2023-01-11T21:38:06.0393048Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0393184Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0393189Z 2023-01-11T21:38:06.0393232Z 2023-01-11T21:38:06.0393365Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0393573Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0393697Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0393810Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0393918Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0394023Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0394126Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.0394199Z { 2023-01-11T21:38:06.0394296Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0394365Z { 2023-01-11T21:38:06.0394450Z #pragma omp for 2023-01-11T21:38:06.0394542Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:06.0394611Z { 2023-01-11T21:38:06.0394683Z { 2023-01-11T21:38:06.0394748Z { 2023-01-11T21:38:06.0394850Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0394948Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.0395051Z auto tmp2 = mod(tmp0, tmp1); 2023-01-11T21:38:06.0395152Z auto tmp3 = tmp2 + tmp1; 2023-01-11T21:38:06.0395283Z auto tmp4 = ((tmp2 != 0) & ((tmp2 < 0) != (tmp1 < 0))) ? tmp3 : tmp2; 2023-01-11T21:38:06.0395396Z auto tmp5 = static_cast(1); 2023-01-11T21:38:06.0395487Z auto tmp6 = tmp0 + tmp5; 2023-01-11T21:38:06.0395629Z auto tmp7 = tmp1 - tmp5; 2023-01-11T21:38:06.0395759Z auto tmp8 = mod(tmp6, tmp7); 2023-01-11T21:38:06.0395853Z auto tmp9 = tmp8 + tmp7; 2023-01-11T21:38:06.0395982Z auto tmp10 = ((tmp8 != 0) & ((tmp8 < 0) != (tmp7 < 0))) ? tmp9 : tmp8; 2023-01-11T21:38:06.0396124Z auto tmp11 = tmp0 - tmp5; 2023-01-11T21:38:06.0396223Z auto tmp12 = tmp1 + tmp5; 2023-01-11T21:38:06.0396326Z auto tmp13 = mod(tmp11, tmp12); 2023-01-11T21:38:06.0396417Z auto tmp14 = tmp13 + tmp12; 2023-01-11T21:38:06.0396548Z auto tmp15 = ((tmp13 != 0) & ((tmp13 < 0) != (tmp12 < 0))) ? 
tmp14 : tmp13; 2023-01-11T21:38:06.0396640Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0396731Z out_ptr1[i0] = tmp10; 2023-01-11T21:38:06.0396820Z out_ptr2[i0] = tmp15; 2023-01-11T21:38:06.0396888Z } 2023-01-11T21:38:06.0396954Z } 2023-01-11T21:38:06.0397020Z } 2023-01-11T21:38:06.0397088Z } 2023-01-11T21:38:06.0397150Z } 2023-01-11T21:38:06.0397235Z ''') 2023-01-11T21:38:06.0397240Z 2023-01-11T21:38:06.0397245Z 2023-01-11T21:38:06.0397338Z async_compile.wait(globals()) 2023-01-11T21:38:06.0397413Z del async_compile 2023-01-11T21:38:06.0397419Z 2023-01-11T21:38:06.0397492Z def call(args): 2023-01-11T21:38:06.0397568Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0397644Z args.clear() 2023-01-11T21:38:06.0397842Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0398034Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0398222Z buf2 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0398436Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.0398511Z del arg0_1 2023-01-11T21:38:06.0398587Z del arg1_1 2023-01-11T21:38:06.0398668Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.0398673Z 2023-01-11T21:38:06.0398677Z 2023-01-11T21:38:06.0398756Z if __name__ == "__main__": 2023-01-11T21:38:06.0398875Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0398999Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0399220Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0399414Z arg1_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0399535Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0399541Z 2023-01-11T21:38:06.0399613Z ok (1.752s) 2023-01-11T21:38:06.0400058Z test_repeat_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0400192Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0400452Z [2023-01-11 21:30:29,942] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 227 2023-01-11T21:38:06.0400715Z [2023-01-11 21:30:32,016] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 227 2023-01-11T21:38:06.0400721Z 2023-01-11T21:38:06.0400821Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0400895Z import torch 2023-01-11T21:38:06.0400972Z import random 2023-01-11T21:38:06.0401092Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0401217Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0401223Z 2023-01-11T21:38:06.0401298Z aten = torch.ops.aten 2023-01-11T21:38:06.0401436Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0401564Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0401569Z 2023-01-11T21:38:06.0401644Z import triton 2023-01-11T21:38:06.0401740Z import triton.language as tl 2023-01-11T21:38:06.0401865Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0402003Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0402012Z 2023-01-11T21:38:06.0402016Z 2023-01-11T21:38:06.0402154Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0402350Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0402471Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0402576Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0402677Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0402774Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.0402838Z { 2023-01-11T21:38:06.0402943Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0403005Z { 2023-01-11T21:38:06.0403100Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0403185Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0403251Z { 2023-01-11T21:38:06.0403339Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.0403405Z { 2023-01-11T21:38:06.0403494Z #pragma GCC ivdep 2023-01-11T21:38:06.0403583Z for(long i2=0; i2<12; i2+=1) 2023-01-11T21:38:06.0403652Z { 2023-01-11T21:38:06.0403746Z for(long i3=0; i3<1; i3+=1) 2023-01-11T21:38:06.0403817Z { 2023-01-11T21:38:06.0403981Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i3) + (8*(i2 % 4)) + (32*(i1 % 2))); 2023-01-11T21:38:06.0404108Z tmp0.store(out_ptr0 + (8*i2) + (8*i3) + (96*i1) + (384*i0)); 2023-01-11T21:38:06.0404177Z } 2023-01-11T21:38:06.0404274Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0404377Z for(long i3=8; i3<8; i3+=1) 2023-01-11T21:38:06.0404449Z { 2023-01-11T21:38:06.0404568Z auto tmp0 = in_ptr0[i3 + (8*(i2 % 4)) + (32*(i1 % 2))]; 2023-01-11T21:38:06.0404682Z out_ptr0[i3 + (8*i2) + (96*i1) + (384*i0)] = tmp0; 2023-01-11T21:38:06.0404780Z } 2023-01-11T21:38:06.0404851Z } 2023-01-11T21:38:06.0404911Z } 2023-01-11T21:38:06.0404980Z } 2023-01-11T21:38:06.0405064Z #pragma omp for 2023-01-11T21:38:06.0405150Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0405217Z { 2023-01-11T21:38:06.0405308Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0405388Z { 2023-01-11T21:38:06.0405539Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i1); 2023-01-11T21:38:06.0405666Z tmp0.store(out_ptr1 + (8*i1) + (64*i0)); 2023-01-11T21:38:06.0405737Z } 
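// Annotation (not part of the captured log): the modulo indexing in the first
// nest above, in_ptr0[... + (8*(i2 % 4)) + (32*(i1 % 2))], is the general
// torch.repeat lowering -- source coordinates wrap while destination
// coordinates run over the tiled (2, 4, 12, 8) output. This second nest
// streams the same 64-float source block once per output tile, consistent
// with a repeat(8, 1, 1, 1)-style copy (inferred from the shapes in call()).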
2023-01-11T21:38:06.0405835Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0405926Z for(long i1=64; i1<64; i1+=1) 2023-01-11T21:38:06.0405993Z { 2023-01-11T21:38:06.0406083Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.0406172Z out_ptr1[i1 + (64*i0)] = tmp0; 2023-01-11T21:38:06.0406241Z } 2023-01-11T21:38:06.0406309Z } 2023-01-11T21:38:06.0406390Z #pragma omp for 2023-01-11T21:38:06.0406474Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0406541Z { 2023-01-11T21:38:06.0406627Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0406687Z { 2023-01-11T21:38:06.0406827Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i1); 2023-01-11T21:38:06.0406937Z tmp0.store(out_ptr2 + (8*i1) + (64*i0)); 2023-01-11T21:38:06.0407005Z } 2023-01-11T21:38:06.0407102Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0407217Z for(long i1=64; i1<64; i1+=1) 2023-01-11T21:38:06.0407286Z { 2023-01-11T21:38:06.0407370Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.0407465Z out_ptr2[i1 + (64*i0)] = tmp0; 2023-01-11T21:38:06.0407532Z } 2023-01-11T21:38:06.0407600Z } 2023-01-11T21:38:06.0407669Z } 2023-01-11T21:38:06.0407735Z } 2023-01-11T21:38:06.0407815Z ''') 2023-01-11T21:38:06.0407827Z 2023-01-11T21:38:06.0407831Z 2023-01-11T21:38:06.0407918Z async_compile.wait(globals()) 2023-01-11T21:38:06.0407993Z del async_compile 2023-01-11T21:38:06.0407999Z 2023-01-11T21:38:06.0408072Z def call(args): 2023-01-11T21:38:06.0408146Z arg0_1, = args 2023-01-11T21:38:06.0408222Z args.clear() 2023-01-11T21:38:06.0408440Z buf0 = empty_strided((2, 4, 12, 8), (384, 96, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0408650Z buf1 = empty_strided((8, 2, 4, 8), (64, 32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0408872Z buf2 = empty_strided((2, 1, 1, 2, 4, 8), (64, 64, 64, 32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0409067Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.0409139Z del arg0_1 2023-01-11T21:38:06.0409229Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.0409236Z 2023-01-11T21:38:06.0409240Z 2023-01-11T21:38:06.0409320Z if __name__ == "__main__": 2023-01-11T21:38:06.0409440Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0409569Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0409780Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0409893Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0409899Z 2023-01-11T21:38:06.0409962Z ok (2.100s) 2023-01-11T21:38:06.0410413Z test_roi_align_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0410575Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0410835Z [2023-01-11 21:30:33,845] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 228 2023-01-11T21:38:06.0411094Z [2023-01-11 21:30:34,178] torch._inductor.ir: [WARNING] Using FallbackKernel: torch.ops.torchvision.roi_align 2023-01-11T21:38:06.0411356Z [2023-01-11 21:30:34,181] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 228 2023-01-11T21:38:06.0411362Z 2023-01-11T21:38:06.0411456Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0411532Z import torch 2023-01-11T21:38:06.0411607Z import random 2023-01-11T21:38:06.0411720Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0411842Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0411847Z 2023-01-11T21:38:06.0411927Z aten = torch.ops.aten 2023-01-11T21:38:06.0412062Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0412158Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0412164Z 2023-01-11T21:38:06.0412237Z import triton 2023-01-11T21:38:06.0412328Z import triton.language as tl 2023-01-11T21:38:06.0412453Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0412586Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0412591Z 2023-01-11T21:38:06.0412596Z 2023-01-11T21:38:06.0412689Z async_compile.wait(globals()) 2023-01-11T21:38:06.0412766Z del async_compile 2023-01-11T21:38:06.0412771Z 2023-01-11T21:38:06.0412845Z def call(args): 2023-01-11T21:38:06.0412924Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0413029Z args.clear() 2023-01-11T21:38:06.0413175Z buf0 = torch.ops.torchvision.roi_align(arg0_1, arg1_1, 0.25, 7, 7, 2, False) 2023-01-11T21:38:06.0413241Z del arg0_1 2023-01-11T21:38:06.0413309Z del arg1_1 2023-01-11T21:38:06.0413379Z buf1 = buf0 2023-01-11T21:38:06.0413498Z assert_size_stride(buf1, (2292, 256, 7, 7), (12544, 49, 7, 1)) 2023-01-11T21:38:06.0413569Z del buf0 2023-01-11T21:38:06.0413642Z return (buf1, ) 2023-01-11T21:38:06.0413648Z 2023-01-11T21:38:06.0413652Z 2023-01-11T21:38:06.0413729Z if __name__ == "__main__": 2023-01-11T21:38:06.0413844Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0413963Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0414196Z arg0_1 = rand_strided((4, 256, 296, 304), (23035904, 89984, 304, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0414396Z arg1_1 = rand_strided((2292, 5), (5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0414637Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0414646Z 2023-01-11T21:38:06.0414722Z ok (4.251s) 2023-01-11T21:38:06.0415172Z test_roll_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0415303Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0415560Z [2023-01-11 21:30:36,340] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 229 2023-01-11T21:38:06.0415824Z [2023-01-11 21:30:39,345] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 229 2023-01-11T21:38:06.0415830Z 2023-01-11T21:38:06.0415921Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0415999Z import torch 2023-01-11T21:38:06.0416073Z import random 2023-01-11T21:38:06.0416194Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0416315Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0416321Z 2023-01-11T21:38:06.0416404Z aten = torch.ops.aten 2023-01-11T21:38:06.0416585Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0416682Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0416687Z 2023-01-11T21:38:06.0416754Z import triton 2023-01-11T21:38:06.0416846Z import triton.language as tl 2023-01-11T21:38:06.0416969Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0417109Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0417114Z 2023-01-11T21:38:06.0417119Z 2023-01-11T21:38:06.0417305Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0417508Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0417633Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0417735Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0417829Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0417893Z { 2023-01-11T21:38:06.0417994Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0418063Z { 2023-01-11T21:38:06.0418158Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0418244Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0418303Z { 2023-01-11T21:38:06.0418394Z for(long i1=0; i1<56; i1+=1) 2023-01-11T21:38:06.0418460Z { 2023-01-11T21:38:06.0418545Z #pragma GCC ivdep 2023-01-11T21:38:06.0418640Z for(long i2=0; i2<56; i2+=1) 2023-01-11T21:38:06.0418709Z { 2023-01-11T21:38:06.0418804Z for(long i3=0; i3<2; i3+=1) 2023-01-11T21:38:06.0418867Z { 2023-01-11T21:38:06.0419091Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i3) + (16*((46 + i2) % 56)) + (896*((3 + i1) % 56)) + (50176*i0)); 2023-01-11T21:38:06.0419214Z tmp0.store(out_ptr0 + (8*i3) + (16*i2) + (896*i1) + (50176*i0)); 2023-01-11T21:38:06.0419287Z } 2023-01-11T21:38:06.0419395Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0419491Z for(long i3=16; i3<16; i3+=1) 2023-01-11T21:38:06.0419562Z { 2023-01-11T21:38:06.0419700Z auto tmp0 = in_ptr0[i3 + (16*((46 + i2) % 56)) + (896*((3 + i1) % 56)) + (50176*i0)]; 2023-01-11T21:38:06.0419808Z out_ptr0[i3 + (16*i2) + (896*i1) + (50176*i0)] = tmp0; 2023-01-11T21:38:06.0419880Z } 2023-01-11T21:38:06.0419948Z } 2023-01-11T21:38:06.0420014Z } 2023-01-11T21:38:06.0420080Z } 2023-01-11T21:38:06.0420174Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0420264Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0420324Z { 2023-01-11T21:38:06.0420415Z for(long i1=0; i1<56; i1+=1) 2023-01-11T21:38:06.0420480Z { 2023-01-11T21:38:06.0420567Z #pragma GCC ivdep 2023-01-11T21:38:06.0420660Z for(long i2=0; i2<56; i2+=1) 2023-01-11T21:38:06.0420731Z { 2023-01-11T21:38:06.0420820Z #pragma GCC 
ivdep 2023-01-11T21:38:06.0420908Z for(long i3=0; i3<16; i3+=1) 2023-01-11T21:38:06.0420977Z { 2023-01-11T21:38:06.0421052Z { 2023-01-11T21:38:06.0421126Z { 2023-01-11T21:38:06.0421260Z auto tmp0 = in_ptr0[(100347 + i3 + (16*i2) + (896*i1) + (50176*i0)) % 100352]; 2023-01-11T21:38:06.0421382Z out_ptr1[i3 + (16*i2) + (896*i1) + (50176*i0)] = tmp0; 2023-01-11T21:38:06.0421456Z } 2023-01-11T21:38:06.0421523Z } 2023-01-11T21:38:06.0421593Z } 2023-01-11T21:38:06.0421660Z } 2023-01-11T21:38:06.0421725Z } 2023-01-11T21:38:06.0421789Z } 2023-01-11T21:38:06.0421854Z } 2023-01-11T21:38:06.0421916Z } 2023-01-11T21:38:06.0421996Z ''') 2023-01-11T21:38:06.0422002Z 2023-01-11T21:38:06.0422034Z 2023-01-11T21:38:06.0422128Z async_compile.wait(globals()) 2023-01-11T21:38:06.0422204Z del async_compile 2023-01-11T21:38:06.0422209Z 2023-01-11T21:38:06.0422282Z def call(args): 2023-01-11T21:38:06.0422357Z arg0_1, = args 2023-01-11T21:38:06.0422438Z args.clear() 2023-01-11T21:38:06.0422660Z buf0 = empty_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0422873Z buf1 = empty_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0423040Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0423116Z del arg0_1 2023-01-11T21:38:06.0423200Z return (buf0, buf1, ) 2023-01-11T21:38:06.0423205Z 2023-01-11T21:38:06.0423209Z 2023-01-11T21:38:06.0423289Z if __name__ == "__main__": 2023-01-11T21:38:06.0423405Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0423534Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0423755Z arg0_1 = rand_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0423861Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0423872Z 2023-01-11T21:38:06.0423936Z ok (3.098s) 2023-01-11T21:38:06.0424395Z test_round_correctness_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0424557Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0424810Z [2023-01-11 21:30:39,389] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 230 2023-01-11T21:38:06.0425076Z [2023-01-11 21:30:41,752] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 230 2023-01-11T21:38:06.0425082Z 2023-01-11T21:38:06.0425181Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0425255Z import torch 2023-01-11T21:38:06.0425330Z import random 2023-01-11T21:38:06.0425443Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0425569Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0425574Z 2023-01-11T21:38:06.0425655Z aten = torch.ops.aten 2023-01-11T21:38:06.0425789Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0425890Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0425895Z 2023-01-11T21:38:06.0425967Z import triton 2023-01-11T21:38:06.0426056Z import triton.language as tl 2023-01-11T21:38:06.0426183Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0426315Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0426324Z 2023-01-11T21:38:06.0426337Z 2023-01-11T21:38:06.0426465Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0426670Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0426797Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.0426902Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.0426966Z { 2023-01-11T21:38:06.0427067Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0427133Z { 2023-01-11T21:38:06.0427208Z #pragma omp for 2023-01-11T21:38:06.0427293Z for(long i0=0; i0<200; i0+=1) 2023-01-11T21:38:06.0427363Z { 2023-01-11T21:38:06.0427432Z { 2023-01-11T21:38:06.0427500Z { 2023-01-11T21:38:06.0427598Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0427711Z auto tmp1 = std::nearbyint(tmp0); 2023-01-11T21:38:06.0427795Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.0427892Z } 2023-01-11T21:38:06.0427960Z } 2023-01-11T21:38:06.0428028Z } 2023-01-11T21:38:06.0428094Z } 2023-01-11T21:38:06.0428156Z } 2023-01-11T21:38:06.0428233Z ''') 2023-01-11T21:38:06.0428239Z 2023-01-11T21:38:06.0428253Z 2023-01-11T21:38:06.0428339Z async_compile.wait(globals()) 2023-01-11T21:38:06.0428416Z del async_compile 2023-01-11T21:38:06.0428421Z 2023-01-11T21:38:06.0428493Z def call(args): 2023-01-11T21:38:06.0428568Z arg0_1, = args 2023-01-11T21:38:06.0428646Z args.clear() 2023-01-11T21:38:06.0428839Z buf0 = empty_strided((200, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.0428980Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0429045Z del arg0_1 2023-01-11T21:38:06.0429120Z return (buf0, ) 2023-01-11T21:38:06.0429125Z 2023-01-11T21:38:06.0429129Z 2023-01-11T21:38:06.0429209Z if __name__ == "__main__": 2023-01-11T21:38:06.0429331Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0429457Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0429652Z arg0_1 = rand_strided((200, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.0429764Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0429769Z 2023-01-11T21:38:06.0429840Z ok 
(2.382s) 2023-01-11T21:38:06.0430272Z test_round_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0430478Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0430736Z [2023-01-11 21:30:41,807] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 231 2023-01-11T21:38:06.0431000Z [2023-01-11 21:30:44,755] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 231 2023-01-11T21:38:06.0431005Z 2023-01-11T21:38:06.0431103Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0431176Z import torch 2023-01-11T21:38:06.0431251Z import random 2023-01-11T21:38:06.0431372Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0431492Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0431498Z 2023-01-11T21:38:06.0431573Z aten = torch.ops.aten 2023-01-11T21:38:06.0431708Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0431807Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0431812Z 2023-01-11T21:38:06.0431890Z import triton 2023-01-11T21:38:06.0431983Z import triton.language as tl 2023-01-11T21:38:06.0432106Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0432244Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0432253Z 2023-01-11T21:38:06.0432257Z 2023-01-11T21:38:06.0432394Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0432589Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0432712Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0432819Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0432920Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0433020Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0433121Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.0433188Z { 2023-01-11T21:38:06.0433283Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0433349Z { 2023-01-11T21:38:06.0433431Z #pragma omp for 2023-01-11T21:38:06.0433518Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0433585Z { 2023-01-11T21:38:06.0433753Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0433846Z auto tmp1 = tmp0.round(); 2023-01-11T21:38:06.0433981Z auto tmp2 = at::vec::Vectorized(static_cast(100.0)); 2023-01-11T21:38:06.0434069Z auto tmp3 = tmp0 * tmp2; 2023-01-11T21:38:06.0434160Z auto tmp4 = tmp3.round(); 2023-01-11T21:38:06.0434299Z auto tmp5 = at::vec::Vectorized(static_cast(0.01)); 2023-01-11T21:38:06.0434389Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.0434487Z tmp1.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0434581Z tmp6.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0434654Z } 2023-01-11T21:38:06.0434749Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0434838Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0434919Z { 2023-01-11T21:38:06.0435015Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0435144Z auto tmp1 = std::nearbyint(tmp0); 2023-01-11T21:38:06.0435254Z auto tmp2 = static_cast(100.0); 2023-01-11T21:38:06.0435343Z auto tmp3 = tmp0 * tmp2; 
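// Annotation (not part of the captured log): tmp2..tmp6 in this tail implement
// round-to-decimals by scale/round/unscale -- nearbyint(x * 100) * 0.01, i.e.
// torch.round(x, decimals=2) -- while tmp1 is the plain nearbyint round of x.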
2023-01-11T21:38:06.0435441Z auto tmp4 = std::nearbyint(tmp3); 2023-01-11T21:38:06.0435547Z auto tmp5 = static_cast(0.01); 2023-01-11T21:38:06.0435636Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.0435724Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.0435808Z out_ptr1[i0] = tmp6; 2023-01-11T21:38:06.0435874Z } 2023-01-11T21:38:06.0435947Z #pragma omp for 2023-01-11T21:38:06.0436035Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0436136Z { 2023-01-11T21:38:06.0436274Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.0436411Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0436501Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0436595Z auto tmp3 = tmp2.round(); 2023-01-11T21:38:06.0436690Z tmp3.store(out_ptr2 + 8*i0); 2023-01-11T21:38:06.0436750Z } 2023-01-11T21:38:06.0436849Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0436935Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0437002Z { 2023-01-11T21:38:06.0437090Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:06.0437191Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.0437283Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0437382Z auto tmp3 = std::nearbyint(tmp2); 2023-01-11T21:38:06.0437468Z out_ptr2[i0] = tmp3; 2023-01-11T21:38:06.0437538Z } 2023-01-11T21:38:06.0437604Z } 2023-01-11T21:38:06.0437668Z } 2023-01-11T21:38:06.0437756Z ''') 2023-01-11T21:38:06.0437761Z 2023-01-11T21:38:06.0437766Z 2023-01-11T21:38:06.0437859Z async_compile.wait(globals()) 2023-01-11T21:38:06.0437932Z del async_compile 2023-01-11T21:38:06.0437938Z 2023-01-11T21:38:06.0438012Z def call(args): 2023-01-11T21:38:06.0438092Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0438166Z args.clear() 2023-01-11T21:38:06.0438361Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0438582Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0438795Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0439031Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0439105Z del arg0_1 2023-01-11T21:38:06.0439181Z del arg1_1 2023-01-11T21:38:06.0439272Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.0439277Z 2023-01-11T21:38:06.0439281Z 2023-01-11T21:38:06.0439362Z if __name__ == "__main__": 2023-01-11T21:38:06.0439490Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0439627Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0439879Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0440068Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0440192Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0440197Z 2023-01-11T21:38:06.0440266Z ok (3.025s) 2023-01-11T21:38:06.0440711Z test_rsqrt_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0440844Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0441101Z [2023-01-11 21:30:44,819] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 232
2023-01-11T21:38:06.0441366Z [2023-01-11 21:30:47,945] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 232
2023-01-11T21:38:06.0441372Z
2023-01-11T21:38:06.0441472Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0441547Z import torch
2023-01-11T21:38:06.0441615Z import random
2023-01-11T21:38:06.0441733Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0441856Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0441861Z
2023-01-11T21:38:06.0441942Z aten = torch.ops.aten
2023-01-11T21:38:06.0442077Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0442200Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0442205Z
2023-01-11T21:38:06.0442279Z import triton
2023-01-11T21:38:06.0442369Z import triton.language as tl
2023-01-11T21:38:06.0442486Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0442626Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0442632Z
2023-01-11T21:38:06.0442636Z
2023-01-11T21:38:06.0442771Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0442975Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0443099Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0443203Z float* __restrict__ out_ptr0,
2023-01-11T21:38:06.0443304Z float* __restrict__ out_ptr1)
2023-01-11T21:38:06.0443370Z {
2023-01-11T21:38:06.0443466Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0443531Z {
2023-01-11T21:38:06.0443617Z #pragma omp for
2023-01-11T21:38:06.0443702Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:06.0443773Z {
2023-01-11T21:38:06.0443910Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0443999Z auto tmp1 = tmp0.rsqrt();
2023-01-11T21:38:06.0444133Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.0444224Z auto tmp3 = tmp0 + tmp2;
2023-01-11T21:38:06.0444312Z auto tmp4 = tmp3.rsqrt();
2023-01-11T21:38:06.0444446Z auto tmp5 = at::vec::Vectorized<float>(static_cast<float>(2));
2023-01-11T21:38:06.0444575Z auto tmp6 = tmp4 - tmp5;
2023-01-11T21:38:06.0444670Z tmp1.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0444763Z tmp6.store(out_ptr1 + 8*i0);
2023-01-11T21:38:06.0444823Z }
2023-01-11T21:38:06.0444921Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0445013Z for(long i0=64; i0<64; i0+=1)
2023-01-11T21:38:06.0445084Z {
2023-01-11T21:38:06.0445174Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0445275Z auto tmp1 = 1 / std::sqrt(tmp0);
2023-01-11T21:38:06.0445380Z auto tmp2 = static_cast<float>(1);
2023-01-11T21:38:06.0445462Z auto tmp3 = tmp0 + tmp2;
2023-01-11T21:38:06.0445590Z auto tmp4 = 1 / std::sqrt(tmp3);
2023-01-11T21:38:06.0445693Z auto tmp5 = static_cast<float>(2);
2023-01-11T21:38:06.0445819Z auto tmp6 = tmp4 - tmp5;
2023-01-11T21:38:06.0445902Z out_ptr0[i0] = tmp1;
2023-01-11T21:38:06.0445986Z out_ptr1[i0] = tmp6;
2023-01-11T21:38:06.0446053Z }
2023-01-11T21:38:06.0446112Z }
2023-01-11T21:38:06.0446175Z }
2023-01-11T21:38:06.0446261Z ''')
2023-01-11T21:38:06.0446266Z
2023-01-11T21:38:06.0446272Z
2023-01-11T21:38:06.0446366Z
async_compile.wait(globals()) 2023-01-11T21:38:06.0446443Z del async_compile 2023-01-11T21:38:06.0446448Z 2023-01-11T21:38:06.0446524Z def call(args): 2023-01-11T21:38:06.0446598Z arg0_1, = args 2023-01-11T21:38:06.0446666Z args.clear() 2023-01-11T21:38:06.0446860Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0447050Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0447220Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0447295Z del arg0_1 2023-01-11T21:38:06.0447376Z return (buf0, buf1, ) 2023-01-11T21:38:06.0447381Z 2023-01-11T21:38:06.0447386Z 2023-01-11T21:38:06.0447466Z if __name__ == "__main__": 2023-01-11T21:38:06.0447581Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0447703Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0447896Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0448007Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0448040Z 2023-01-11T21:38:06.0448111Z ok (3.180s) 2023-01-11T21:38:06.0448561Z test_scatter1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0448695Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0448953Z [2023-01-11 21:30:47,998] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 233 2023-01-11T21:38:06.0449213Z [2023-01-11 21:30:50,987] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 233 2023-01-11T21:38:06.0449219Z 2023-01-11T21:38:06.0449317Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0449392Z import torch 2023-01-11T21:38:06.0449463Z import random 2023-01-11T21:38:06.0449582Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0449703Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0449709Z 2023-01-11T21:38:06.0449790Z aten = torch.ops.aten 2023-01-11T21:38:06.0449929Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0450024Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0450029Z 2023-01-11T21:38:06.0450103Z import triton 2023-01-11T21:38:06.0450188Z import triton.language as tl 2023-01-11T21:38:06.0450311Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0450450Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0450455Z 2023-01-11T21:38:06.0450459Z 2023-01-11T21:38:06.0450596Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0450800Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0450924Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0451036Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:06.0451145Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:06.0451242Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0451308Z { 2023-01-11T21:38:06.0451420Z #pragma GCC ivdep 2023-01-11T21:38:06.0451509Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:06.0451575Z { 
2023-01-11T21:38:06.0451641Z { 2023-01-11T21:38:06.0451712Z { 2023-01-11T21:38:06.0451799Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0451886Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:06.0451952Z } 2023-01-11T21:38:06.0452017Z } 2023-01-11T21:38:06.0452081Z } 2023-01-11T21:38:06.0452148Z { 2023-01-11T21:38:06.0452207Z { 2023-01-11T21:38:06.0452296Z auto tmp0 = in_ptr1[0]; 2023-01-11T21:38:06.0452383Z auto tmp1 = in_ptr2[0]; 2023-01-11T21:38:06.0452472Z out_ptr0[tmp0] = tmp1; 2023-01-11T21:38:06.0452539Z } 2023-01-11T21:38:06.0452603Z } 2023-01-11T21:38:06.0452664Z } 2023-01-11T21:38:06.0452742Z ''') 2023-01-11T21:38:06.0452747Z 2023-01-11T21:38:06.0452751Z 2023-01-11T21:38:06.0452842Z async_compile.wait(globals()) 2023-01-11T21:38:06.0452919Z del async_compile 2023-01-11T21:38:06.0452924Z 2023-01-11T21:38:06.0452998Z def call(args): 2023-01-11T21:38:06.0453084Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.0453158Z args.clear() 2023-01-11T21:38:06.0453352Z buf0 = empty_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0453542Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0453608Z del arg0_1 2023-01-11T21:38:06.0453680Z del arg1_1 2023-01-11T21:38:06.0453749Z del arg2_1 2023-01-11T21:38:06.0453822Z return (buf0, ) 2023-01-11T21:38:06.0453856Z 2023-01-11T21:38:06.0453861Z 2023-01-11T21:38:06.0453942Z if __name__ == "__main__": 2023-01-11T21:38:06.0454062Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0454187Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0454377Z arg0_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0454680Z arg1_1 = rand_strided((1, 1), (1, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0454872Z arg2_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0455004Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0455009Z 2023-01-11T21:38:06.0455080Z ok (3.029s) 2023-01-11T21:38:06.0455534Z test_scatter2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0455668Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0455927Z [2023-01-11 21:30:51,033] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 234
2023-01-11T21:38:06.0456190Z [2023-01-11 21:30:53,637] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 234
2023-01-11T21:38:06.0456196Z
2023-01-11T21:38:06.0456291Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0456358Z import torch
2023-01-11T21:38:06.0456436Z import random
2023-01-11T21:38:06.0456555Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0456676Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0456681Z
2023-01-11T21:38:06.0456765Z aten = torch.ops.aten
2023-01-11T21:38:06.0456908Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0457007Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0457012Z
2023-01-11T21:38:06.0457079Z import triton
2023-01-11T21:38:06.0457245Z import triton.language as tl
2023-01-11T21:38:06.0457370Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0457559Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0457565Z
2023-01-11T21:38:06.0457569Z
2023-01-11T21:38:06.0457707Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0457912Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0458036Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0458146Z const long* __restrict__ in_ptr1,
2023-01-11T21:38:06.0458247Z const float* __restrict__ in_ptr2,
2023-01-11T21:38:06.0458351Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0458416Z {
2023-01-11T21:38:06.0458520Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0458584Z {
2023-01-11T21:38:06.0458667Z #pragma omp for
2023-01-11T21:38:06.0458757Z for(long i0=0; i0<4096; i0+=1)
2023-01-11T21:38:06.0458817Z {
2023-01-11T21:38:06.0458958Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0459061Z tmp0.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0459128Z }
2023-01-11T21:38:06.0459225Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0459319Z for(long i0=32768; i0<32768; i0+=1)
2023-01-11T21:38:06.0459387Z {
2023-01-11T21:38:06.0459469Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0459552Z out_ptr0[i0] = tmp0;
2023-01-11T21:38:06.0459619Z }
2023-01-11T21:38:06.0459700Z #pragma omp for
2023-01-11T21:38:06.0459787Z for(long i0=0; i0<64; i0+=1)
2023-01-11T21:38:06.0459849Z {
2023-01-11T21:38:06.0459934Z #pragma GCC ivdep
2023-01-11T21:38:06.0460066Z for(long i1=0; i1<512; i1+=1)
2023-01-11T21:38:06.0460133Z {
2023-01-11T21:38:06.0460203Z {
2023-01-11T21:38:06.0460277Z {
2023-01-11T21:38:06.0460384Z auto tmp0 = in_ptr1[i1 + (512*i0)];
2023-01-11T21:38:06.0460497Z auto tmp1 = in_ptr2[i1 + (512*i0)];
2023-01-11T21:38:06.0460616Z atomic_add(&out_ptr0[i1 + (512*tmp0)], tmp1);
2023-01-11T21:38:06.0460680Z }
2023-01-11T21:38:06.0460747Z }
2023-01-11T21:38:06.0460812Z }
2023-01-11T21:38:06.0460878Z }
2023-01-11T21:38:06.0460944Z }
2023-01-11T21:38:06.0461009Z }
2023-01-11T21:38:06.0461088Z ''')
2023-01-11T21:38:06.0461093Z
2023-01-11T21:38:06.0461106Z
2023-01-11T21:38:06.0461193Z async_compile.wait(globals())
2023-01-11T21:38:06.0461269Z del async_compile
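The scatter2 kernel above has the structure inductor uses for scatter with an add reduction: a vectorized loop copies the self tensor into the output buffer, then a second loop applies atomic_add at the scattered locations along dim 0. A small eager-mode sketch of the same semantics, with shapes taken from the wrapper and hypothetical random values (not the test's source):

import torch

x = torch.randn(64, 512)
idx = torch.randint(0, 64, (64, 512))
src = torch.randn(64, 512)

out = x.clone()                # the vectorized copy loop
out.scatter_add_(0, idx, src)  # the atomic_add loop, accumulating along dim 0
assert torch.equal(out, torch.scatter_add(x, 0, idx, src))
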
2023-01-11T21:38:06.0461274Z 2023-01-11T21:38:06.0461351Z def call(args): 2023-01-11T21:38:06.0461442Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.0461515Z args.clear() 2023-01-11T21:38:06.0461718Z buf0 = empty_strided((64, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0461914Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0461981Z del arg0_1 2023-01-11T21:38:06.0462053Z del arg1_1 2023-01-11T21:38:06.0462123Z del arg2_1 2023-01-11T21:38:06.0462199Z return (buf0, ) 2023-01-11T21:38:06.0462204Z 2023-01-11T21:38:06.0462209Z 2023-01-11T21:38:06.0462289Z if __name__ == "__main__": 2023-01-11T21:38:06.0462412Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0462538Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0462743Z arg0_1 = rand_strided((64, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0462935Z arg1_1 = rand_strided((64, 512), (512, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0463140Z arg2_1 = rand_strided((64, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0463269Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0463274Z 2023-01-11T21:38:06.0463346Z ok (2.672s) 2023-01-11T21:38:06.0463823Z test_scatter3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0463954Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0464213Z [2023-01-11 21:30:53,691] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 235 2023-01-11T21:38:06.0464476Z [2023-01-11 21:30:56,035] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 235 2023-01-11T21:38:06.0464484Z 2023-01-11T21:38:06.0464585Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0464652Z import torch 2023-01-11T21:38:06.0464727Z import random 2023-01-11T21:38:06.0464844Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0464971Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0464976Z 2023-01-11T21:38:06.0465055Z aten = torch.ops.aten 2023-01-11T21:38:06.0465189Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0465288Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0465293Z 2023-01-11T21:38:06.0465367Z import triton 2023-01-11T21:38:06.0465452Z import triton.language as tl 2023-01-11T21:38:06.0465576Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0465716Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0465721Z 2023-01-11T21:38:06.0465769Z 2023-01-11T21:38:06.0465907Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0466111Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0466234Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0466346Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:06.0466449Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0466508Z { 2023-01-11T21:38:06.0466608Z 
#pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0466673Z {
2023-01-11T21:38:06.0466754Z #pragma omp for
2023-01-11T21:38:06.0466841Z for(long i0=0; i0<235; i0+=1)
2023-01-11T21:38:06.0466908Z {
2023-01-11T21:38:06.0467049Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0467138Z tmp0.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0467202Z }
2023-01-11T21:38:06.0467303Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0467398Z for(long i0=1880; i0<1885; i0+=1)
2023-01-11T21:38:06.0467464Z {
2023-01-11T21:38:06.0467553Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0467632Z out_ptr0[i0] = tmp0;
2023-01-11T21:38:06.0467700Z }
2023-01-11T21:38:06.0467781Z #pragma omp single
2023-01-11T21:38:06.0467851Z {
2023-01-11T21:38:06.0467935Z #pragma GCC ivdep
2023-01-11T21:38:06.0468023Z for(long i0=0; i0<4; i0+=1)
2023-01-11T21:38:06.0468091Z {
2023-01-11T21:38:06.0468153Z {
2023-01-11T21:38:06.0468226Z {
2023-01-11T21:38:06.0468325Z auto tmp0 = in_ptr1[i0];
2023-01-11T21:38:06.0468439Z auto tmp1 = static_cast<float>(0.8);
2023-01-11T21:38:06.0468552Z atomic_add(&out_ptr0[tmp0], tmp1);
2023-01-11T21:38:06.0468621Z }
2023-01-11T21:38:06.0468686Z }
2023-01-11T21:38:06.0468750Z }
2023-01-11T21:38:06.0468815Z }
2023-01-11T21:38:06.0468880Z }
2023-01-11T21:38:06.0468946Z }
2023-01-11T21:38:06.0469031Z ''')
2023-01-11T21:38:06.0469036Z
2023-01-11T21:38:06.0469041Z
2023-01-11T21:38:06.0469133Z async_compile.wait(globals())
2023-01-11T21:38:06.0469208Z del async_compile
2023-01-11T21:38:06.0469213Z
2023-01-11T21:38:06.0469310Z def call(args):
2023-01-11T21:38:06.0469393Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.0469468Z args.clear()
2023-01-11T21:38:06.0469676Z buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0469847Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0469918Z del arg0_1
2023-01-11T21:38:06.0469990Z del arg1_1
2023-01-11T21:38:06.0470058Z return (buf0, )
2023-01-11T21:38:06.0470072Z
2023-01-11T21:38:06.0470077Z
2023-01-11T21:38:06.0470151Z if __name__ == "__main__":
2023-01-11T21:38:06.0470271Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0470397Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0470606Z arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0470806Z arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.0470925Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.0470931Z
2023-01-11T21:38:06.0470998Z ok (2.404s)
2023-01-11T21:38:06.0471445Z test_scatter4_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0471576Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0471855Z [2023-01-11 21:30:56,112] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 236
2023-01-11T21:38:06.0472118Z [2023-01-11 21:30:58,403] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 236
2023-01-11T21:38:06.0472123Z
2023-01-11T21:38:06.0472231Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0472304Z import torch
2023-01-11T21:38:06.0472379Z import random
2023-01-11T21:38:06.0472497Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0472622Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0472627Z
2023-01-11T21:38:06.0472709Z aten = torch.ops.aten
2023-01-11T21:38:06.0472837Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0472930Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0472936Z
2023-01-11T21:38:06.0473009Z import triton
2023-01-11T21:38:06.0473103Z import triton.language as tl
2023-01-11T21:38:06.0473229Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0473368Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0473374Z
2023-01-11T21:38:06.0473378Z
2023-01-11T21:38:06.0473515Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0473722Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0473837Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0473947Z const long* __restrict__ in_ptr1,
2023-01-11T21:38:06.0474058Z const float* __restrict__ in_ptr2,
2023-01-11T21:38:06.0474166Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0474231Z {
2023-01-11T21:38:06.0474335Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0474398Z {
2023-01-11T21:38:06.0474472Z #pragma omp for
2023-01-11T21:38:06.0474559Z for(long i0=0; i0<24304; i0+=1)
2023-01-11T21:38:06.0474633Z {
2023-01-11T21:38:06.0474772Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0474869Z tmp0.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0474936Z }
2023-01-11T21:38:06.0475033Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0475152Z for(long i0=194432; i0<194432; i0+=1)
2023-01-11T21:38:06.0475219Z {
2023-01-11T21:38:06.0475309Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0475394Z out_ptr0[i0] = tmp0;
2023-01-11T21:38:06.0475460Z }
2023-01-11T21:38:06.0475538Z #pragma omp for
2023-01-11T21:38:06.0475618Z for(long i0=0; i0<992; i0+=1)
2023-01-11T21:38:06.0475685Z {
2023-01-11T21:38:06.0475754Z {
2023-01-11T21:38:06.0475822Z {
2023-01-11T21:38:06.0475921Z auto tmp0 = in_ptr1[i0];
2023-01-11T21:38:06.0476016Z auto tmp1 = in_ptr2[i0];
2023-01-11T21:38:06.0476127Z out_ptr0[i0 + (992*tmp0)] = tmp1;
2023-01-11T21:38:06.0476189Z }
2023-01-11T21:38:06.0476257Z }
2023-01-11T21:38:06.0476324Z }
2023-01-11T21:38:06.0476387Z }
2023-01-11T21:38:06.0476450Z }
2023-01-11T21:38:06.0476533Z ''')
2023-01-11T21:38:06.0476539Z
2023-01-11T21:38:06.0476544Z
2023-01-11T21:38:06.0476635Z async_compile.wait(globals())
2023-01-11T21:38:06.0476705Z del async_compile
2023-01-11T21:38:06.0476717Z
2023-01-11T21:38:06.0476784Z def call(args):
2023-01-11T21:38:06.0476869Z arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.0476946Z args.clear()
2023-01-11T21:38:06.0477151Z buf0 =
empty_strided((196, 992), (992, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0477343Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0477414Z del arg0_1 2023-01-11T21:38:06.0477486Z del arg1_1 2023-01-11T21:38:06.0477580Z del arg2_1 2023-01-11T21:38:06.0477656Z return (buf0, ) 2023-01-11T21:38:06.0477662Z 2023-01-11T21:38:06.0477666Z 2023-01-11T21:38:06.0477747Z if __name__ == "__main__": 2023-01-11T21:38:06.0477868Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0477992Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0478200Z arg0_1 = rand_strided((196, 992), (992, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0478398Z arg1_1 = rand_strided((1, 992), (992, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0478591Z arg2_1 = rand_strided((1, 992), (992, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0478717Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0478722Z 2023-01-11T21:38:06.0478790Z ok (2.339s) 2023-01-11T21:38:06.0478947Z test_scatter_add1_cpu (__main__.CpuTests) ... skip: Flaky test, needs debugging (0.001s) 2023-01-11T21:38:06.0479399Z test_scatter_add2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0479535Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0479792Z [2023-01-11 21:30:58,435] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 237 2023-01-11T21:38:06.0480051Z [2023-01-11 21:31:00,297] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 237 2023-01-11T21:38:06.0480057Z 2023-01-11T21:38:06.0480154Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0480229Z import torch 2023-01-11T21:38:06.0480297Z import random 2023-01-11T21:38:06.0480414Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0480539Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0480544Z 2023-01-11T21:38:06.0480625Z aten = torch.ops.aten 2023-01-11T21:38:06.0480762Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0480857Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0480862Z 2023-01-11T21:38:06.0480962Z import triton 2023-01-11T21:38:06.0481048Z import triton.language as tl 2023-01-11T21:38:06.0481173Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0481310Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0481316Z 2023-01-11T21:38:06.0481320Z 2023-01-11T21:38:06.0481461Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0481664Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0481786Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0481896Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:06.0482005Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:06.0482109Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0482168Z { 2023-01-11T21:38:06.0482249Z #pragma GCC ivdep 
2023-01-11T21:38:06.0482332Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:06.0482398Z { 2023-01-11T21:38:06.0482469Z { 2023-01-11T21:38:06.0482536Z { 2023-01-11T21:38:06.0482621Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0482710Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:06.0482778Z } 2023-01-11T21:38:06.0482842Z } 2023-01-11T21:38:06.0482906Z } 2023-01-11T21:38:06.0482985Z #pragma GCC ivdep 2023-01-11T21:38:06.0483071Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0483130Z { 2023-01-11T21:38:06.0483216Z #pragma GCC ivdep 2023-01-11T21:38:06.0483305Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.0483371Z { 2023-01-11T21:38:06.0483469Z { 2023-01-11T21:38:06.0483545Z { 2023-01-11T21:38:06.0483642Z auto tmp0 = in_ptr1[i1 + (3*i0)]; 2023-01-11T21:38:06.0483747Z auto tmp1 = in_ptr2[i1 + (3*i0)]; 2023-01-11T21:38:06.0483863Z atomic_add(&out_ptr0[i1 + (3*tmp0)], tmp1); 2023-01-11T21:38:06.0483936Z } 2023-01-11T21:38:06.0484002Z } 2023-01-11T21:38:06.0484067Z } 2023-01-11T21:38:06.0484131Z } 2023-01-11T21:38:06.0484188Z } 2023-01-11T21:38:06.0484276Z ''') 2023-01-11T21:38:06.0484281Z 2023-01-11T21:38:06.0484286Z 2023-01-11T21:38:06.0484376Z async_compile.wait(globals()) 2023-01-11T21:38:06.0484456Z del async_compile 2023-01-11T21:38:06.0484462Z 2023-01-11T21:38:06.0484536Z def call(args): 2023-01-11T21:38:06.0484627Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.0484703Z args.clear() 2023-01-11T21:38:06.0484892Z buf0 = empty_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0485087Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0485160Z del arg0_1 2023-01-11T21:38:06.0485233Z del arg1_1 2023-01-11T21:38:06.0485303Z del arg2_1 2023-01-11T21:38:06.0485377Z return (buf0, ) 2023-01-11T21:38:06.0485383Z 2023-01-11T21:38:06.0485394Z 2023-01-11T21:38:06.0485473Z if __name__ == "__main__": 2023-01-11T21:38:06.0485591Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0485711Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0485908Z arg0_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0486099Z arg1_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0486288Z arg2_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0486416Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0486423Z 2023-01-11T21:38:06.0486494Z ok (1.891s) 2023-01-11T21:38:06.0486975Z test_scatter_add3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
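The scatter_add2 kernel above (the two #pragma GCC ivdep loops) is a direct scalar translation of scatter_add on a (2, 3) tensor: after copying self into the output, each (i0, i1) pair does out[idx[i0, i1], i1] += src[i0, i1]. A hedged eager-mode equivalent with hypothetical values (not the test's source):

import torch

x = torch.randn(2, 3)
idx = torch.randint(0, 2, (2, 3))
src = torch.randn(2, 3)

out = x.clone()
for i0 in range(2):
    for i1 in range(3):
        # mirrors atomic_add(&out_ptr0[i1 + (3*tmp0)], tmp1) in the kernel
        out[idx[i0, i1], i1] += src[i0, i1]
assert torch.allclose(out, torch.scatter_add(x, 0, idx, src))
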
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0487109Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0487364Z [2023-01-11 21:31:00,321] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 238
2023-01-11T21:38:06.0487625Z [2023-01-11 21:31:02,733] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 238
2023-01-11T21:38:06.0487631Z
2023-01-11T21:38:06.0487722Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0487796Z import torch
2023-01-11T21:38:06.0487873Z import random
2023-01-11T21:38:06.0487993Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0488116Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0488121Z
2023-01-11T21:38:06.0488203Z aten = torch.ops.aten
2023-01-11T21:38:06.0488343Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0488431Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0488441Z
2023-01-11T21:38:06.0488509Z import triton
2023-01-11T21:38:06.0488602Z import triton.language as tl
2023-01-11T21:38:06.0488725Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0488865Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0488871Z
2023-01-11T21:38:06.0488875Z
2023-01-11T21:38:06.0489014Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0489217Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0489342Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0489474Z const long* __restrict__ in_ptr1,
2023-01-11T21:38:06.0489584Z const float* __restrict__ in_ptr2,
2023-01-11T21:38:06.0489686Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0489752Z {
2023-01-11T21:38:06.0489861Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0489928Z {
2023-01-11T21:38:06.0490014Z #pragma omp for
2023-01-11T21:38:06.0490095Z for(long i0=0; i0<235; i0+=1)
2023-01-11T21:38:06.0490165Z {
2023-01-11T21:38:06.0490306Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0490404Z tmp0.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0490473Z }
2023-01-11T21:38:06.0490572Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0490665Z for(long i0=1880; i0<1885; i0+=1)
2023-01-11T21:38:06.0490725Z {
2023-01-11T21:38:06.0490817Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0490901Z out_ptr0[i0] = tmp0;
2023-01-11T21:38:06.0490971Z }
2023-01-11T21:38:06.0491054Z #pragma omp single
2023-01-11T21:38:06.0491121Z {
2023-01-11T21:38:06.0491210Z #pragma GCC ivdep
2023-01-11T21:38:06.0491290Z for(long i0=0; i0<4; i0+=1)
2023-01-11T21:38:06.0491359Z {
2023-01-11T21:38:06.0491429Z {
2023-01-11T21:38:06.0491501Z {
2023-01-11T21:38:06.0491601Z auto tmp0 = in_ptr1[i0];
2023-01-11T21:38:06.0491700Z auto tmp1 = in_ptr2[i0];
2023-01-11T21:38:06.0491810Z atomic_add(&out_ptr0[tmp0], tmp1);
2023-01-11T21:38:06.0491874Z }
2023-01-11T21:38:06.0491942Z }
2023-01-11T21:38:06.0492012Z }
2023-01-11T21:38:06.0492078Z }
2023-01-11T21:38:06.0492145Z }
2023-01-11T21:38:06.0492209Z }
2023-01-11T21:38:06.0492286Z ''')
2023-01-11T21:38:06.0492299Z
2023-01-11T21:38:06.0492312Z
2023-01-11T21:38:06.0492399Z async_compile.wait(globals())
2023-01-11T21:38:06.0492473Z del async_compile
2023-01-11T21:38:06.0492479Z
2023-01-11T21:38:06.0492554Z def call(args):
2023-01-11T21:38:06.0492640Z arg0_1,
arg1_1, arg2_1 = args 2023-01-11T21:38:06.0492715Z args.clear() 2023-01-11T21:38:06.0492952Z buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0493145Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0493212Z del arg0_1 2023-01-11T21:38:06.0493284Z del arg1_1 2023-01-11T21:38:06.0493354Z del arg2_1 2023-01-11T21:38:06.0493427Z return (buf0, ) 2023-01-11T21:38:06.0493432Z 2023-01-11T21:38:06.0493437Z 2023-01-11T21:38:06.0493518Z if __name__ == "__main__": 2023-01-11T21:38:06.0493634Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0493764Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0493972Z arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0494166Z arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0494369Z arg2_1 = rand_strided((1, 1, 10), (10, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0494603Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0494608Z 2023-01-11T21:38:06.0494679Z ok (2.436s) 2023-01-11T21:38:06.0495186Z test_scatter_reduce1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0495361Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0495575Z [W TensorAdvancedIndexing.cpp:1739] Warning: scatter_reduce() is in beta and the API may change at any time. 
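The beta warning above comes from scatter_reduce, which generalizes scatter_add to other reductions; the reduce="sum" path is what the generated kernel below lowers to a copy plus atomic_add. A minimal sketch of the API on a 1-D toy example (hypothetical values, not the test's shapes):

import torch

t = torch.zeros(13)
idx = torch.tensor([2, 2, 5, 11])
src = torch.tensor([1.0, 2.0, 3.0, 4.0])
# reduce="sum" accumulates duplicates: position 2 receives 1.0 + 2.0
out = t.scatter_reduce(0, idx, src, reduce="sum")
assert out[2] == 3.0 and out[5] == 3.0 and out[11] == 4.0
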
(function operator())
2023-01-11T21:38:06.0495838Z [2023-01-11 21:31:02,755] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 239
2023-01-11T21:38:06.0496098Z [2023-01-11 21:31:02,766] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 239
2023-01-11T21:38:06.0496105Z
2023-01-11T21:38:06.0496204Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0496272Z import torch
2023-01-11T21:38:06.0496345Z import random
2023-01-11T21:38:06.0496465Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0496588Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0496593Z
2023-01-11T21:38:06.0496673Z aten = torch.ops.aten
2023-01-11T21:38:06.0496807Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0496907Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0496912Z
2023-01-11T21:38:06.0496978Z import triton
2023-01-11T21:38:06.0497070Z import triton.language as tl
2023-01-11T21:38:06.0497252Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0497395Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0497400Z
2023-01-11T21:38:06.0497405Z
2023-01-11T21:38:06.0497543Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0497750Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0497871Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0497980Z const long* __restrict__ in_ptr1,
2023-01-11T21:38:06.0498081Z const float* __restrict__ in_ptr2,
2023-01-11T21:38:06.0498184Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0498248Z {
2023-01-11T21:38:06.0498352Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0498419Z {
2023-01-11T21:38:06.0498501Z #pragma omp for
2023-01-11T21:38:06.0498586Z for(long i0=0; i0<235; i0+=1)
2023-01-11T21:38:06.0498646Z {
2023-01-11T21:38:06.0498786Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0498923Z tmp0.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0498993Z }
2023-01-11T21:38:06.0499091Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0499179Z for(long i0=1880; i0<1885; i0+=1)
2023-01-11T21:38:06.0499244Z {
2023-01-11T21:38:06.0499326Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0499412Z out_ptr0[i0] = tmp0;
2023-01-11T21:38:06.0499479Z }
2023-01-11T21:38:06.0499561Z #pragma omp single
2023-01-11T21:38:06.0499629Z {
2023-01-11T21:38:06.0499715Z #pragma GCC ivdep
2023-01-11T21:38:06.0499803Z for(long i0=0; i0<4; i0+=1)
2023-01-11T21:38:06.0499868Z {
2023-01-11T21:38:06.0499936Z {
2023-01-11T21:38:06.0500007Z {
2023-01-11T21:38:06.0500105Z auto tmp0 = in_ptr1[i0];
2023-01-11T21:38:06.0500204Z auto tmp1 = in_ptr2[i0];
2023-01-11T21:38:06.0500316Z atomic_add(&out_ptr0[tmp0], tmp1);
2023-01-11T21:38:06.0500380Z }
2023-01-11T21:38:06.0500446Z }
2023-01-11T21:38:06.0500512Z }
2023-01-11T21:38:06.0500575Z }
2023-01-11T21:38:06.0500640Z }
2023-01-11T21:38:06.0500702Z }
2023-01-11T21:38:06.0500787Z ''')
2023-01-11T21:38:06.0500793Z
2023-01-11T21:38:06.0500797Z
2023-01-11T21:38:06.0500884Z async_compile.wait(globals())
2023-01-11T21:38:06.0500961Z del async_compile
2023-01-11T21:38:06.0500966Z
2023-01-11T21:38:06.0501040Z def call(args):
2023-01-11T21:38:06.0501127Z arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.0501204Z args.clear()
2023-01-11T21:38:06.0501450Z buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cpu',
dtype=torch.float32) 2023-01-11T21:38:06.0501640Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0501711Z del arg0_1 2023-01-11T21:38:06.0501775Z del arg1_1 2023-01-11T21:38:06.0501848Z del arg2_1 2023-01-11T21:38:06.0501926Z return (buf0, ) 2023-01-11T21:38:06.0501931Z 2023-01-11T21:38:06.0501935Z 2023-01-11T21:38:06.0502013Z if __name__ == "__main__": 2023-01-11T21:38:06.0502131Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0502256Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0502466Z arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0502659Z arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0502865Z arg2_1 = rand_strided((1, 1, 10), (10, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0502992Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0502998Z 2023-01-11T21:38:06.0503070Z ok (0.032s) 2023-01-11T21:38:06.0503529Z test_scatter_reduce2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0503660Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0503915Z [2023-01-11 21:31:02,786] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 240 2023-01-11T21:38:06.0504177Z [2023-01-11 21:31:04,822] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 240 2023-01-11T21:38:06.0504186Z 2023-01-11T21:38:06.0504285Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0504359Z import torch 2023-01-11T21:38:06.0504427Z import random 2023-01-11T21:38:06.0504546Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0504669Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0504703Z 2023-01-11T21:38:06.0504786Z aten = torch.ops.aten 2023-01-11T21:38:06.0504920Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0505016Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0505021Z 2023-01-11T21:38:06.0505100Z import triton 2023-01-11T21:38:06.0505186Z import triton.language as tl 2023-01-11T21:38:06.0505313Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0505451Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0505456Z 2023-01-11T21:38:06.0505461Z 2023-01-11T21:38:06.0505602Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0505811Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0505930Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0506041Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:06.0506150Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:06.0506255Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0506313Z { 2023-01-11T21:38:06.0506393Z #pragma GCC ivdep 2023-01-11T21:38:06.0506477Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:06.0506543Z { 2023-01-11T21:38:06.0506609Z { 2023-01-11T21:38:06.0506675Z { 
2023-01-11T21:38:06.0506761Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0506848Z out_ptr0[i0] = tmp0;
2023-01-11T21:38:06.0506913Z }
2023-01-11T21:38:06.0506979Z }
2023-01-11T21:38:06.0507044Z }
2023-01-11T21:38:06.0507122Z #pragma GCC ivdep
2023-01-11T21:38:06.0507237Z for(long i0=0; i0<2; i0+=1)
2023-01-11T21:38:06.0507296Z {
2023-01-11T21:38:06.0507376Z #pragma GCC ivdep
2023-01-11T21:38:06.0507463Z for(long i1=0; i1<3; i1+=1)
2023-01-11T21:38:06.0507530Z {
2023-01-11T21:38:06.0507601Z {
2023-01-11T21:38:06.0507671Z {
2023-01-11T21:38:06.0507772Z auto tmp0 = in_ptr1[i1 + (3*i0)];
2023-01-11T21:38:06.0507879Z auto tmp1 = static_cast<float>(0);
2023-01-11T21:38:06.0507980Z out_ptr0[i1 + (3*tmp0)] = tmp1;
2023-01-11T21:38:06.0508048Z }
2023-01-11T21:38:06.0508112Z }
2023-01-11T21:38:06.0508177Z }
2023-01-11T21:38:06.0508242Z }
2023-01-11T21:38:06.0508314Z #pragma GCC ivdep
2023-01-11T21:38:06.0508395Z for(long i0=0; i0<2; i0+=1)
2023-01-11T21:38:06.0508461Z {
2023-01-11T21:38:06.0508542Z #pragma GCC ivdep
2023-01-11T21:38:06.0508627Z for(long i1=0; i1<3; i1+=1)
2023-01-11T21:38:06.0508695Z {
2023-01-11T21:38:06.0508762Z {
2023-01-11T21:38:06.0508826Z {
2023-01-11T21:38:06.0508932Z auto tmp0 = in_ptr1[i1 + (3*i0)];
2023-01-11T21:38:06.0509033Z auto tmp1 = in_ptr2[i1 + (3*i0)];
2023-01-11T21:38:06.0509150Z atomic_add(&out_ptr0[i1 + (3*tmp0)], tmp1);
2023-01-11T21:38:06.0509219Z }
2023-01-11T21:38:06.0509285Z }
2023-01-11T21:38:06.0509350Z }
2023-01-11T21:38:06.0509409Z }
2023-01-11T21:38:06.0509473Z }
2023-01-11T21:38:06.0509557Z ''')
2023-01-11T21:38:06.0509563Z
2023-01-11T21:38:06.0509568Z
2023-01-11T21:38:06.0509659Z async_compile.wait(globals())
2023-01-11T21:38:06.0509735Z del async_compile
2023-01-11T21:38:06.0509740Z
2023-01-11T21:38:06.0509817Z def call(args):
2023-01-11T21:38:06.0509904Z arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.0509974Z args.clear()
2023-01-11T21:38:06.0510170Z buf0 = empty_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0510366Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0510439Z del arg0_1
2023-01-11T21:38:06.0510509Z del arg1_1
2023-01-11T21:38:06.0510580Z del arg2_1
2023-01-11T21:38:06.0510683Z return (buf0, )
2023-01-11T21:38:06.0510688Z
2023-01-11T21:38:06.0510693Z
2023-01-11T21:38:06.0510766Z if __name__ == "__main__":
2023-01-11T21:38:06.0510884Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0511011Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0511206Z arg0_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0511398Z arg1_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.0511590Z arg2_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0511721Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.0511729Z
2023-01-11T21:38:06.0511801Z ok (2.056s)
2023-01-11T21:38:06.0512277Z test_scheduler_vertical_fusion1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0512402Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0512658Z [2023-01-11 21:31:05,002] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 241
2023-01-11T21:38:06.0512915Z [2023-01-11 21:31:07,464] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 241
2023-01-11T21:38:06.0512921Z
2023-01-11T21:38:06.0513019Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0513122Z import torch
2023-01-11T21:38:06.0513195Z import random
2023-01-11T21:38:06.0513314Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0513437Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0513442Z
2023-01-11T21:38:06.0513518Z aten = torch.ops.aten
2023-01-11T21:38:06.0513658Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0513751Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0513756Z
2023-01-11T21:38:06.0513830Z import triton
2023-01-11T21:38:06.0513920Z import triton.language as tl
2023-01-11T21:38:06.0514044Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0514185Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0514190Z
2023-01-11T21:38:06.0514195Z
2023-01-11T21:38:06.0514330Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0514528Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0514653Z extern "C" void kernel(float* __restrict__ in_out_ptr0,
2023-01-11T21:38:06.0514759Z float* __restrict__ in_out_ptr1,
2023-01-11T21:38:06.0514868Z const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0514980Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.0515085Z const float* __restrict__ in_ptr2)
2023-01-11T21:38:06.0515150Z {
2023-01-11T21:38:06.0515241Z auto out_ptr1 = in_out_ptr1;
2023-01-11T21:38:06.0515335Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0515402Z {
2023-01-11T21:38:06.0515483Z #pragma omp for
2023-01-11T21:38:06.0515575Z for(long i0=0; i0<135252; i0+=1)
2023-01-11T21:38:06.0515640Z {
2023-01-11T21:38:06.0515781Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0515914Z auto tmp8 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.0516135Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(-1.061519070296458e-11));
2023-01-11T21:38:06.0516227Z auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:06.0516445Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(-1.988366587925593e-08));
2023-01-11T21:38:06.0516566Z auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.0516657Z auto tmp5 = tmp0 * tmp4;
2023-01-11T21:38:06.0516878Z auto tmp6 = at::vec::Vectorized<float>(static_cast<float>(-3.087032500374211e-07));
2023-01-11T21:38:06.0516967Z auto tmp7 = tmp5 + tmp6;
2023-01-11T21:38:06.0517184Z auto tmp9 = at::vec::Vectorized<float>(static_cast<float>(1.55093272922008e-10));
2023-01-11T21:38:06.0517268Z auto tmp10 = tmp8 * tmp9;
2023-01-11T21:38:06.0517360Z auto tmp11 = tmp7 + tmp10;
2023-01-11T21:38:06.0517467Z auto tmp12 = tmp11.reciprocal();
2023-01-11T21:38:06.0517612Z auto tmp13 = at::vec::Vectorized<float>(static_cast<float>(1.0));
2023-01-11T21:38:06.0517707Z auto tmp14 = tmp12 * tmp13;
2023-01-11T21:38:06.0517810Z tmp11.store(in_out_ptr0 + 8*i0);
2023-01-11T21:38:06.0517907Z tmp14.store(out_ptr1 + 8*i0);
2023-01-11T21:38:06.0517974Z }
2023-01-11T21:38:06.0518069Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0518173Z for(long i0=1082016; i0<1082016; i0+=1)
2023-01-11T21:38:06.0518239Z {
2023-01-11T21:38:06.0518328Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0518416Z auto tmp8 = in_ptr1[i0];
2023-01-11T21:38:06.0518593Z auto tmp1 = static_cast<float>(-1.061519070296458e-11);
2023-01-11T21:38:06.0518681Z auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:06.0518846Z auto tmp3 = static_cast<float>(-1.988366587925593e-08);
2023-01-11T21:38:06.0518934Z auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.0519023Z auto tmp5 = tmp0 * tmp4;
2023-01-11T21:38:06.0519228Z auto tmp6 = static_cast<float>(-3.087032500374211e-07);
2023-01-11T21:38:06.0519315Z auto tmp7 = tmp5 + tmp6;
2023-01-11T21:38:06.0519489Z auto tmp9 = static_cast<float>(1.55093272922008e-10);
2023-01-11T21:38:06.0519578Z auto tmp10 = tmp8 * tmp9;
2023-01-11T21:38:06.0519665Z auto tmp11 = tmp7 + tmp10;
2023-01-11T21:38:06.0519751Z auto tmp12 = 1 / tmp11;
2023-01-11T21:38:06.0519856Z auto tmp13 = static_cast<float>(1.0);
2023-01-11T21:38:06.0519947Z auto tmp14 = tmp12 * tmp13;
2023-01-11T21:38:06.0520035Z in_out_ptr0[i0] = tmp11;
2023-01-11T21:38:06.0520120Z out_ptr1[i0] = tmp14;
2023-01-11T21:38:06.0520186Z }
2023-01-11T21:38:06.0520260Z #pragma omp for
2023-01-11T21:38:06.0520348Z for(long i0=0; i0<41616; i0+=1)
2023-01-11T21:38:06.0520414Z {
2023-01-11T21:38:06.0520499Z for(long i1=0; i1<3; i1+=1)
2023-01-11T21:38:06.0520570Z {
2023-01-11T21:38:06.0520723Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + (8*i1) + (26*i0));
2023-01-11T21:38:06.0520864Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr2 + 8*i1);
2023-01-11T21:38:06.0520949Z auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:06.0521066Z tmp2.store(in_out_ptr0 + (8*i1) + (26*i0));
2023-01-11T21:38:06.0521134Z }
2023-01-11T21:38:06.0521233Z #pragma omp simd simdlen(4)
2023-01-11T21:38:06.0521324Z for(long i1=24; i1<26; i1+=1)
2023-01-11T21:38:06.0521395Z {
2023-01-11T21:38:06.0521503Z auto tmp0 = in_out_ptr0[i1 + (26*i0)];
2023-01-11T21:38:06.0521588Z auto tmp1 = in_ptr2[i1];
2023-01-11T21:38:06.0521678Z auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:06.0521777Z in_out_ptr0[i1 + (26*i0)] = tmp2;
2023-01-11T21:38:06.0521846Z }
2023-01-11T21:38:06.0521920Z }
2023-01-11T21:38:06.0521998Z #pragma omp for
2023-01-11T21:38:06.0522086Z for(long i0=0; i0<135252; i0+=1)
2023-01-11T21:38:06.0522145Z {
2023-01-11T21:38:06.0522282Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr1 + 8*i0);
2023-01-11T21:38:06.0522497Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0);
2023-01-11T21:38:06.0522588Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0522689Z tmp2.store(in_out_ptr1 + 8*i0);
2023-01-11T21:38:06.0522755Z }
2023-01-11T21:38:06.0522853Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0522942Z for(long i0=1082016; i0<1082016; i0+=1)
2023-01-11T21:38:06.0523008Z {
2023-01-11T21:38:06.0523098Z auto tmp0 = out_ptr1[i0];
2023-01-11T21:38:06.0523193Z auto tmp1 = in_out_ptr0[i0];
2023-01-11T21:38:06.0523278Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0523367Z in_out_ptr1[i0] = tmp2;
2023-01-11T21:38:06.0523436Z }
2023-01-11T21:38:06.0523494Z }
2023-01-11T21:38:06.0523558Z }
2023-01-11T21:38:06.0523642Z ''')
2023-01-11T21:38:06.0523648Z
2023-01-11T21:38:06.0523652Z
2023-01-11T21:38:06.0523744Z async_compile.wait(globals())
2023-01-11T21:38:06.0523821Z del async_compile
2023-01-11T21:38:06.0523827Z
2023-01-11T21:38:06.0523905Z def call(args):
2023-01-11T21:38:06.0523992Z arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.0524060Z args.clear()
2023-01-11T21:38:06.0524273Z buf0 = empty_strided((204, 204,
26), (5304, 26, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0524363Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.0524575Z buf2 = empty_strided((204, 204, 26), (5304, 26, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0524663Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:06.0524751Z buf4 = buf2; del buf2 # reuse 2023-01-11T21:38:06.0525002Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(arg2_1.data_ptr())) 2023-01-11T21:38:06.0525115Z del arg0_1 2023-01-11T21:38:06.0525179Z del arg1_1 2023-01-11T21:38:06.0525249Z del arg2_1 2023-01-11T21:38:06.0525324Z return (buf4, ) 2023-01-11T21:38:06.0525329Z 2023-01-11T21:38:06.0525334Z 2023-01-11T21:38:06.0525416Z if __name__ == "__main__": 2023-01-11T21:38:06.0525537Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0525662Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0525875Z arg0_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0526080Z arg1_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0526270Z arg2_1 = rand_strided((26, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0526394Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0526400Z 2023-01-11T21:38:06.0526474Z ok (2.664s) 2023-01-11T21:38:06.0526935Z test_select_scatter_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
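The vertical-fusion test above is notable for what it does not generate: despite the graph containing a polynomial, a reciprocal, a broadcasted multiply, and a final add, everything lands in a single kernel_cpp_0 under one #pragma omp parallel region, and the wrapper reuses buf0/buf1/buf2 in place rather than allocating a buffer per op. A sketch of how a similar pointwise chain can be compiled and checked against eager, using a hypothetical toy function and the torch._dynamo.optimize entry point that this log's PyTorch build uses (not the test's source):

import torch
import torch._dynamo

def chain(a, b, w):
    t = a * 0.5 + b * 0.25           # pointwise producers
    r = (t * t + 1.0).reciprocal()   # consumer, fused vertically into the same kernel
    return (r + t) * w               # multiply broadcast over the last dim

compiled = torch._dynamo.optimize("inductor")(chain)
a = torch.randn(4, 4, 26)
b = torch.randn(4, 4, 26)
w = torch.randn(26)
assert torch.allclose(compiled(a, b, w), chain(a, b, w), atol=1e-6)
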
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0527066Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0527325Z [2023-01-11 21:31:07,517] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 242
2023-01-11T21:38:06.0527585Z [2023-01-11 21:31:09,285] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 242
2023-01-11T21:38:06.0527591Z
2023-01-11T21:38:06.0527686Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0527758Z import torch
2023-01-11T21:38:06.0527826Z import random
2023-01-11T21:38:06.0527945Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0528066Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0528072Z
2023-01-11T21:38:06.0528151Z aten = torch.ops.aten
2023-01-11T21:38:06.0528288Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0528415Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0528421Z
2023-01-11T21:38:06.0528495Z import triton
2023-01-11T21:38:06.0528581Z import triton.language as tl
2023-01-11T21:38:06.0528706Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0528844Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0528849Z
2023-01-11T21:38:06.0528854Z
2023-01-11T21:38:06.0528991Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0529197Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0529320Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0529431Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.0529537Z const float* __restrict__ in_ptr2,
2023-01-11T21:38:06.0529641Z float* __restrict__ out_ptr0,
2023-01-11T21:38:06.0529736Z float* __restrict__ out_ptr1)
2023-01-11T21:38:06.0529806Z {
2023-01-11T21:38:06.0529907Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0529974Z {
2023-01-11T21:38:06.0530056Z #pragma omp for
2023-01-11T21:38:06.0530142Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:06.0530201Z {
2023-01-11T21:38:06.0530287Z #pragma GCC ivdep
2023-01-11T21:38:06.0530377Z for(long i1=0; i1<197; i1+=1)
2023-01-11T21:38:06.0530443Z {
2023-01-11T21:38:06.0530529Z #pragma GCC ivdep
2023-01-11T21:38:06.0530625Z for(long i2=0; i2<38; i2+=1)
2023-01-11T21:38:06.0530694Z {
2023-01-11T21:38:06.0530786Z {
2023-01-11T21:38:06.0530859Z {
2023-01-11T21:38:06.0530970Z auto tmp3 = in_ptr0[i2 + (38*i0)];
2023-01-11T21:38:06.0531085Z auto tmp4 = in_ptr1[i2 + (38*i1) + (7486*i0)];
2023-01-11T21:38:06.0531197Z auto tmp0 = static_cast<long>(i1);
2023-01-11T21:38:06.0531307Z auto tmp1 = static_cast<long>(0);
2023-01-11T21:38:06.0531408Z auto tmp2 = tmp0 == tmp1;
2023-01-11T21:38:06.0531509Z auto tmp5 = tmp2 ? tmp3 : tmp4;
2023-01-11T21:38:06.0531621Z out_ptr0[i2 + (38*i1) + (7486*i0)] = tmp5;
2023-01-11T21:38:06.0531694Z }
2023-01-11T21:38:06.0531766Z }
2023-01-11T21:38:06.0531836Z }
2023-01-11T21:38:06.0531904Z }
2023-01-11T21:38:06.0531971Z }
2023-01-11T21:38:06.0532044Z #pragma omp for
2023-01-11T21:38:06.0532134Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:06.0532198Z {
2023-01-11T21:38:06.0532282Z #pragma GCC ivdep
2023-01-11T21:38:06.0532373Z for(long i1=0; i1<7486; i1+=1)
2023-01-11T21:38:06.0532439Z {
2023-01-11T21:38:06.0532508Z {
2023-01-11T21:38:06.0532574Z {
2023-01-11T21:38:06.0532675Z auto tmp3 = in_ptr2[i1];
2023-01-11T21:38:06.0532788Z auto tmp4 = in_ptr1[i1 + (7486*i0)];
2023-01-11T21:38:06.0532896Z auto tmp0 = static_cast<long>(i0);
2023-01-11T21:38:06.0533005Z auto tmp1 = static_cast<long>(1);
2023-01-11T21:38:06.0533106Z auto tmp2 = tmp0 == tmp1;
2023-01-11T21:38:06.0533212Z auto tmp5 = tmp2 ? tmp3 : tmp4;
2023-01-11T21:38:06.0533306Z out_ptr1[i1 + (7486*i0)] = tmp5;
2023-01-11T21:38:06.0533376Z }
2023-01-11T21:38:06.0533449Z }
2023-01-11T21:38:06.0533514Z }
2023-01-11T21:38:06.0533582Z }
2023-01-11T21:38:06.0533649Z }
2023-01-11T21:38:06.0533714Z }
2023-01-11T21:38:06.0533792Z ''')
2023-01-11T21:38:06.0533798Z
2023-01-11T21:38:06.0533803Z
2023-01-11T21:38:06.0533898Z async_compile.wait(globals())
2023-01-11T21:38:06.0534001Z del async_compile
2023-01-11T21:38:06.0534007Z
2023-01-11T21:38:06.0534082Z def call(args):
2023-01-11T21:38:06.0534166Z arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.0534242Z args.clear()
2023-01-11T21:38:06.0534454Z buf0 = empty_strided((8, 197, 38), (7486, 38, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0534770Z buf1 = empty_strided((8, 197, 38), (7486, 38, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0534987Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:06.0535065Z del arg0_1
2023-01-11T21:38:06.0535136Z del arg1_1
2023-01-11T21:38:06.0535206Z del arg2_1
2023-01-11T21:38:06.0535288Z return (buf0, buf1, )
2023-01-11T21:38:06.0535294Z
2023-01-11T21:38:06.0535298Z
2023-01-11T21:38:06.0535376Z if __name__ == "__main__":
2023-01-11T21:38:06.0535493Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0535615Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0535825Z arg0_1 = rand_strided((8, 197, 38), (7486, 38, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0536024Z arg1_1 = rand_strided((8, 38), (38, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0536223Z arg2_1 = rand_strided((197, 38), (38, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0536348Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.0536353Z
2023-01-11T21:38:06.0536421Z ok (1.800s)
2023-01-11T21:38:06.0536869Z test_sgn_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0537051Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0537377Z [2023-01-11 21:31:09,307] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 243 2023-01-11T21:38:06.0537644Z [2023-01-11 21:31:11,035] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 243 2023-01-11T21:38:06.0537650Z 2023-01-11T21:38:06.0537741Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0537816Z import torch 2023-01-11T21:38:06.0537892Z import random 2023-01-11T21:38:06.0538013Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0538138Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0538147Z 2023-01-11T21:38:06.0538228Z aten = torch.ops.aten 2023-01-11T21:38:06.0538367Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0538455Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0538466Z 2023-01-11T21:38:06.0538532Z import triton 2023-01-11T21:38:06.0538627Z import triton.language as tl 2023-01-11T21:38:06.0538749Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0538890Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0538895Z 2023-01-11T21:38:06.0538900Z 2023-01-11T21:38:06.0539039Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0539248Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0539371Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0539469Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0539581Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0539647Z { 2023-01-11T21:38:06.0539753Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0539818Z { 2023-01-11T21:38:06.0539900Z #pragma omp for 2023-01-11T21:38:06.0539988Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:06.0540048Z { 2023-01-11T21:38:06.0540228Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0540401Z auto tmp1 = decltype(tmp0)::blendv(decltype(tmp0)(0), decltype(tmp0)(1), decltype(tmp0)(0) < tmp0); 2023-01-11T21:38:06.0540575Z auto tmp2 = decltype(tmp0)::blendv(decltype(tmp0)(0), decltype(tmp0)(1), tmp0 < decltype(tmp0)(0)); 2023-01-11T21:38:06.0540705Z auto tmp3 = tmp1 - tmp2; 2023-01-11T21:38:06.0540840Z auto tmp4 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0540929Z auto tmp5 = tmp0 + tmp4; 2023-01-11T21:38:06.0541096Z auto tmp6 = decltype(tmp5)::blendv(decltype(tmp5)(0), decltype(tmp5)(1), decltype(tmp5)(0) < tmp5); 2023-01-11T21:38:06.0541266Z auto tmp7 = decltype(tmp5)::blendv(decltype(tmp5)(0), decltype(tmp5)(1), tmp5 < decltype(tmp5)(0)); 2023-01-11T21:38:06.0541386Z auto tmp8 = tmp6 - tmp7; 2023-01-11T21:38:06.0541512Z auto tmp9 = tmp8 - tmp4; 2023-01-11T21:38:06.0541611Z tmp3.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0541706Z tmp9.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0541772Z } 2023-01-11T21:38:06.0541874Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0541960Z for(long i0=40; i0<41; i0+=1) 2023-01-11T21:38:06.0542020Z { 2023-01-11T21:38:06.0542108Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0542199Z auto tmp1 = tmp0 > 0 ? 1 : 0; 2023-01-11T21:38:06.0542290Z auto tmp2 = tmp0 < 0 ? 
1 : 0; 2023-01-11T21:38:06.0542416Z auto tmp3 = tmp1 - tmp2; 2023-01-11T21:38:06.0542562Z auto tmp4 = static_cast(1); 2023-01-11T21:38:06.0542653Z auto tmp5 = tmp0 + tmp4; 2023-01-11T21:38:06.0542736Z auto tmp6 = tmp5 > 0 ? 1 : 0; 2023-01-11T21:38:06.0542830Z auto tmp7 = tmp5 < 0 ? 1 : 0; 2023-01-11T21:38:06.0542958Z auto tmp8 = tmp6 - tmp7; 2023-01-11T21:38:06.0543084Z auto tmp9 = tmp8 - tmp4; 2023-01-11T21:38:06.0543172Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.0543257Z out_ptr1[i0] = tmp9; 2023-01-11T21:38:06.0543323Z } 2023-01-11T21:38:06.0543381Z } 2023-01-11T21:38:06.0543444Z } 2023-01-11T21:38:06.0543529Z ''') 2023-01-11T21:38:06.0543535Z 2023-01-11T21:38:06.0543539Z 2023-01-11T21:38:06.0543633Z async_compile.wait(globals()) 2023-01-11T21:38:06.0543715Z del async_compile 2023-01-11T21:38:06.0543720Z 2023-01-11T21:38:06.0543794Z def call(args): 2023-01-11T21:38:06.0543868Z arg0_1, = args 2023-01-11T21:38:06.0543936Z args.clear() 2023-01-11T21:38:06.0544132Z buf0 = empty_strided((41, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0544328Z buf1 = empty_strided((41, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0544496Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0544568Z del arg0_1 2023-01-11T21:38:06.0544651Z return (buf0, buf1, ) 2023-01-11T21:38:06.0544656Z 2023-01-11T21:38:06.0544661Z 2023-01-11T21:38:06.0544740Z if __name__ == "__main__": 2023-01-11T21:38:06.0544858Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0544977Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0545173Z arg0_1 = rand_strided((41, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0545287Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0545292Z 2023-01-11T21:38:06.0545362Z ok (1.748s) 2023-01-11T21:38:06.0545820Z test_sgn_extremal_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0545986Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0546244Z [2023-01-11 21:31:11,048] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 244 2023-01-11T21:38:06.0546507Z [2023-01-11 21:31:12,722] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 244 2023-01-11T21:38:06.0546513Z 2023-01-11T21:38:06.0546612Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0546680Z import torch 2023-01-11T21:38:06.0546755Z import random 2023-01-11T21:38:06.0546874Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0546998Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0547007Z 2023-01-11T21:38:06.0547088Z aten = torch.ops.aten 2023-01-11T21:38:06.0547225Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0547319Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0547324Z 2023-01-11T21:38:06.0547397Z import triton 2023-01-11T21:38:06.0547485Z import triton.language as tl 2023-01-11T21:38:06.0547611Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0547751Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0547756Z 2023-01-11T21:38:06.0547761Z 2023-01-11T21:38:06.0547899Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0548104Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0548228Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0548330Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0548424Z { 2023-01-11T21:38:06.0548498Z #pragma GCC ivdep 2023-01-11T21:38:06.0548582Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0548647Z { 2023-01-11T21:38:06.0548713Z { 2023-01-11T21:38:06.0548780Z { 2023-01-11T21:38:06.0548873Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0548966Z auto tmp1 = tmp0 > 0 ? 1 : 0; 2023-01-11T21:38:06.0549063Z auto tmp2 = tmp0 < 0 ? 
1 : 0; 2023-01-11T21:38:06.0549192Z auto tmp3 = tmp1 - tmp2; 2023-01-11T21:38:06.0549279Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.0549344Z } 2023-01-11T21:38:06.0549412Z } 2023-01-11T21:38:06.0549479Z } 2023-01-11T21:38:06.0549536Z } 2023-01-11T21:38:06.0549618Z ''') 2023-01-11T21:38:06.0549624Z 2023-01-11T21:38:06.0549628Z 2023-01-11T21:38:06.0549719Z async_compile.wait(globals()) 2023-01-11T21:38:06.0549796Z del async_compile 2023-01-11T21:38:06.0549802Z 2023-01-11T21:38:06.0549874Z def call(args): 2023-01-11T21:38:06.0549952Z arg0_1, = args 2023-01-11T21:38:06.0550025Z args.clear() 2023-01-11T21:38:06.0550217Z buf0 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0550348Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0550421Z del arg0_1 2023-01-11T21:38:06.0550501Z return (buf0, ) 2023-01-11T21:38:06.0550506Z 2023-01-11T21:38:06.0550510Z 2023-01-11T21:38:06.0550590Z if __name__ == "__main__": 2023-01-11T21:38:06.0550707Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0550833Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0551025Z arg0_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0551129Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0551144Z 2023-01-11T21:38:06.0551208Z ok (1.686s) 2023-01-11T21:38:06.0551673Z test_shape_prop_torch_ones_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0551837Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0552093Z [2023-01-11 21:31:12,940] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 245 2023-01-11T21:38:06.0552354Z [2023-01-11 21:31:14,628] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 245 2023-01-11T21:38:06.0552360Z 2023-01-11T21:38:06.0552458Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0552533Z import torch 2023-01-11T21:38:06.0552607Z import random 2023-01-11T21:38:06.0552726Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0552846Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0552855Z 2023-01-11T21:38:06.0552936Z aten = torch.ops.aten 2023-01-11T21:38:06.0553072Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0553167Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0553172Z 2023-01-11T21:38:06.0553245Z import triton 2023-01-11T21:38:06.0553341Z import triton.language as tl 2023-01-11T21:38:06.0553465Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0553597Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0553610Z 2023-01-11T21:38:06.0553614Z 2023-01-11T21:38:06.0553743Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0553946Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0554069Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0554171Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0554266Z { 2023-01-11T21:38:06.0554367Z #pragma omp 
parallel num_threads(8)
2023-01-11T21:38:06.0554433Z {
2023-01-11T21:38:06.0554507Z #pragma omp for
2023-01-11T21:38:06.0554600Z for(long i0=0; i0<3145728; i0+=1)
2023-01-11T21:38:06.0554665Z {
2023-01-11T21:38:06.0554806Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0554956Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.0555058Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0555168Z tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0555235Z }
2023-01-11T21:38:06.0555332Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0555430Z for(long i0=25165824; i0<25165824; i0+=1)
2023-01-11T21:38:06.0555499Z {
2023-01-11T21:38:06.0555586Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0555690Z auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.0555779Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0555860Z out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.0555926Z }
2023-01-11T21:38:06.0555990Z }
2023-01-11T21:38:06.0556054Z }
2023-01-11T21:38:06.0556138Z ''')
2023-01-11T21:38:06.0556144Z 
2023-01-11T21:38:06.0556148Z 
2023-01-11T21:38:06.0556239Z async_compile.wait(globals())
2023-01-11T21:38:06.0556318Z del async_compile
2023-01-11T21:38:06.0556324Z 
2023-01-11T21:38:06.0556393Z def call(args):
2023-01-11T21:38:06.0556466Z arg0_1, = args
2023-01-11T21:38:06.0556542Z args.clear()
2023-01-11T21:38:06.0556774Z buf0 = empty_strided((8, 12, 512, 512), (3145728, 262144, 512, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0556913Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0556984Z del arg0_1
2023-01-11T21:38:06.0557060Z return (buf0, )
2023-01-11T21:38:06.0557065Z 
2023-01-11T21:38:06.0557069Z 
2023-01-11T21:38:06.0557150Z if __name__ == "__main__":
2023-01-11T21:38:06.0557263Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0557388Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0557619Z arg0_1 = rand_strided((8, 12, 512, 512), (3145728, 262144, 512, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0557732Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.0557765Z 
2023-01-11T21:38:06.0557838Z ok (2.438s)
2023-01-11T21:38:06.0558289Z test_sigmoid_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0558420Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0558674Z [2023-01-11 21:31:15,179] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 246
2023-01-11T21:38:06.0558933Z [2023-01-11 21:31:16,887] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 246
2023-01-11T21:38:06.0558939Z 
2023-01-11T21:38:06.0559030Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0559103Z import torch
2023-01-11T21:38:06.0559181Z import random
2023-01-11T21:38:06.0559300Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0559422Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0559427Z 
2023-01-11T21:38:06.0559509Z aten = torch.ops.aten
2023-01-11T21:38:06.0559642Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0559737Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0559743Z 
2023-01-11T21:38:06.0559810Z import triton
2023-01-11T21:38:06.0559903Z import triton.language as tl
2023-01-11T21:38:06.0560028Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0560196Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0560201Z 
2023-01-11T21:38:06.0560206Z 
2023-01-11T21:38:06.0560340Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0560543Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0560671Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0560781Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.0560878Z float* __restrict__ out_ptr0,
2023-01-11T21:38:06.0560980Z float* __restrict__ out_ptr1)
2023-01-11T21:38:06.0561045Z {
2023-01-11T21:38:06.0561147Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0561211Z {
2023-01-11T21:38:06.0561292Z #pragma omp for
2023-01-11T21:38:06.0561376Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:06.0561436Z {
2023-01-11T21:38:06.0561575Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0561716Z auto tmp2 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.0561855Z auto tmp1 = decltype(tmp0)(1)/(decltype(tmp0)(1) + tmp0.neg().exp());
2023-01-11T21:38:06.0561946Z auto tmp3 = tmp0 + tmp2;
2023-01-11T21:38:06.0562090Z auto tmp4 = decltype(tmp3)(1)/(decltype(tmp3)(1) + tmp3.neg().exp());
2023-01-11T21:38:06.0562186Z tmp1.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0562275Z tmp4.store(out_ptr1 + 8*i0);
2023-01-11T21:38:06.0562341Z }
2023-01-11T21:38:06.0562440Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0562526Z for(long i0=64; i0<64; i0+=1)
2023-01-11T21:38:06.0562591Z {
2023-01-11T21:38:06.0562678Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0562765Z auto tmp3 = in_ptr1[i0];
2023-01-11T21:38:06.0562895Z auto tmp1 = std::exp(-tmp0);
2023-01-11T21:38:06.0562986Z auto tmp2 = 1 / (1 + tmp1);
2023-01-11T21:38:06.0563073Z auto tmp4 = tmp0 + tmp3;
2023-01-11T21:38:06.0563206Z auto tmp5 = std::exp(-tmp4);
2023-01-11T21:38:06.0563292Z auto tmp6 = 1 / (1 + tmp5);
2023-01-11T21:38:06.0563377Z out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.0563490Z out_ptr1[i0] = tmp6;
2023-01-11T21:38:06.0563552Z }
2023-01-11T21:38:06.0563619Z }
2023-01-11T21:38:06.0563683Z }
2023-01-11T21:38:06.0563767Z ''')
2023-01-11T21:38:06.0563773Z 
2023-01-11T21:38:06.0563777Z 
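The sigmoid kernel above fuses two pointwise ops into one pass: out_ptr0 receives sigmoid(arg0) and out_ptr1 receives sigmoid(arg0 + arg1), via an 8-lane at::vec::Vectorized<float> main loop plus a scalar remainder loop that is empty here (64 elements divide evenly by 8). A minimal repro sketch of the same pattern, assuming a PyTorch 2.x build where inductor is the default torch.compile backend; the function name fn is illustrative, not taken from the test source:

import torch

def fn(x, y):
    # mirrors the fused kernel: sigmoid(x) -> out_ptr0, sigmoid(x + y) -> out_ptr1
    return torch.sigmoid(x), torch.sigmoid(x + y)

x, y = torch.randn(8, 8), torch.randn(8, 8)
for got, want in zip(torch.compile(fn)(x, y), fn(x, y)):
    torch.testing.assert_close(got, want)  # compiled output should match eager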
2023-01-11T21:38:06.0563870Z async_compile.wait(globals()) 2023-01-11T21:38:06.0563946Z del async_compile 2023-01-11T21:38:06.0563952Z 2023-01-11T21:38:06.0564027Z def call(args): 2023-01-11T21:38:06.0564100Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0564174Z args.clear() 2023-01-11T21:38:06.0564369Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0564564Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0564761Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0564831Z del arg0_1 2023-01-11T21:38:06.0564899Z del arg1_1 2023-01-11T21:38:06.0564978Z return (buf0, buf1, ) 2023-01-11T21:38:06.0564986Z 2023-01-11T21:38:06.0564992Z 2023-01-11T21:38:06.0565079Z if __name__ == "__main__": 2023-01-11T21:38:06.0565208Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0565357Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0565549Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0565742Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0565861Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0565866Z 2023-01-11T21:38:06.0565936Z ok (1.728s) 2023-01-11T21:38:06.0566387Z test_signbit_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0566547Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0566798Z [2023-01-11 21:31:16,909] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 247 2023-01-11T21:38:06.0567058Z [2023-01-11 21:31:18,663] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 247 2023-01-11T21:38:06.0567063Z 2023-01-11T21:38:06.0567160Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0567235Z import torch 2023-01-11T21:38:06.0567311Z import random 2023-01-11T21:38:06.0567429Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0567553Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0567558Z 2023-01-11T21:38:06.0567639Z aten = torch.ops.aten 2023-01-11T21:38:06.0567767Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0567862Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0567867Z 2023-01-11T21:38:06.0567943Z import triton 2023-01-11T21:38:06.0568036Z import triton.language as tl 2023-01-11T21:38:06.0568162Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0568301Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0568306Z 2023-01-11T21:38:06.0568311Z 2023-01-11T21:38:06.0568446Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0568654Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0568771Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0568871Z bool* __restrict__ out_ptr0, 2023-01-11T21:38:06.0568974Z long* __restrict__ out_ptr1) 2023-01-11T21:38:06.0569036Z { 
2023-01-11T21:38:06.0569139Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0569205Z {
2023-01-11T21:38:06.0569289Z #pragma omp for
2023-01-11T21:38:06.0569368Z for(long i0=0; i0<72; i0+=1)
2023-01-11T21:38:06.0569463Z {
2023-01-11T21:38:06.0569531Z {
2023-01-11T21:38:06.0569600Z {
2023-01-11T21:38:06.0569700Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0569808Z auto tmp1 = std::signbit(tmp0);
2023-01-11T21:38:06.0569935Z auto tmp2 = -tmp0;
2023-01-11T21:38:06.0570039Z auto tmp3 = std::signbit(tmp2);
2023-01-11T21:38:06.0576413Z auto tmp4 = tmp3 == 0;
2023-01-11T21:38:06.0576549Z auto tmp5 = static_cast<long>(tmp4);
2023-01-11T21:38:06.0576653Z auto tmp6 = static_cast<long>(1);
2023-01-11T21:38:06.0576757Z auto tmp7 = tmp5 & tmp6;
2023-01-11T21:38:06.0576848Z out_ptr0[i0] = tmp1;
2023-01-11T21:38:06.0576942Z out_ptr1[i0] = tmp7;
2023-01-11T21:38:06.0577017Z }
2023-01-11T21:38:06.0577088Z }
2023-01-11T21:38:06.0577235Z }
2023-01-11T21:38:06.0577301Z }
2023-01-11T21:38:06.0577369Z }
2023-01-11T21:38:06.0577478Z ''')
2023-01-11T21:38:06.0577483Z 
2023-01-11T21:38:06.0577488Z 
2023-01-11T21:38:06.0577586Z async_compile.wait(globals())
2023-01-11T21:38:06.0577665Z del async_compile
2023-01-11T21:38:06.0577670Z 
2023-01-11T21:38:06.0577748Z def call(args):
2023-01-11T21:38:06.0577823Z arg0_1, = args
2023-01-11T21:38:06.0577894Z args.clear()
2023-01-11T21:38:06.0578104Z buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.bool)
2023-01-11T21:38:06.0578311Z buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.0578561Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:06.0578635Z del arg0_1
2023-01-11T21:38:06.0578715Z return (buf0, buf1, )
2023-01-11T21:38:06.0578720Z 
2023-01-11T21:38:06.0578724Z 
2023-01-11T21:38:06.0578806Z if __name__ == "__main__":
2023-01-11T21:38:06.0578926Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0579045Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0579257Z arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0579370Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.0579375Z 
2023-01-11T21:38:06.0579447Z ok (1.776s)
2023-01-11T21:38:06.0579899Z test_silu_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0580035Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0580298Z [2023-01-11 21:31:18,686] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 248 2023-01-11T21:38:06.0580565Z [2023-01-11 21:31:20,374] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 248 2023-01-11T21:38:06.0580571Z 2023-01-11T21:38:06.0580670Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0580737Z import torch 2023-01-11T21:38:06.0580814Z import random 2023-01-11T21:38:06.0580933Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0581053Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0581058Z 2023-01-11T21:38:06.0581143Z aten = torch.ops.aten 2023-01-11T21:38:06.0581274Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0581372Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0581378Z 2023-01-11T21:38:06.0581451Z import triton 2023-01-11T21:38:06.0581548Z import triton.language as tl 2023-01-11T21:38:06.0581672Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0581846Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0581853Z 2023-01-11T21:38:06.0581857Z 2023-01-11T21:38:06.0581997Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0582204Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0582322Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0582427Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0582492Z { 2023-01-11T21:38:06.0582592Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0582658Z { 2023-01-11T21:38:06.0582739Z #pragma omp for 2023-01-11T21:38:06.0582826Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0582887Z { 2023-01-11T21:38:06.0583034Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0583172Z auto tmp1 = decltype(tmp0)(1)/(decltype(tmp0)(1) + tmp0.neg().exp()); 2023-01-11T21:38:06.0583264Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0583359Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0583426Z } 2023-01-11T21:38:06.0583526Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0583606Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0583673Z { 2023-01-11T21:38:06.0583758Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0583897Z auto tmp1 = std::exp(-tmp0); 2023-01-11T21:38:06.0583985Z auto tmp2 = 1 / (1 + tmp1); 2023-01-11T21:38:06.0584072Z auto tmp3 = tmp0 * tmp2; 2023-01-11T21:38:06.0584157Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.0584252Z } 2023-01-11T21:38:06.0584317Z } 2023-01-11T21:38:06.0584378Z } 2023-01-11T21:38:06.0584461Z ''') 2023-01-11T21:38:06.0584467Z 2023-01-11T21:38:06.0584471Z 2023-01-11T21:38:06.0584565Z async_compile.wait(globals()) 2023-01-11T21:38:06.0584645Z del async_compile 2023-01-11T21:38:06.0584651Z 2023-01-11T21:38:06.0584725Z def call(args): 2023-01-11T21:38:06.0584794Z arg0_1, = args 2023-01-11T21:38:06.0584872Z args.clear() 2023-01-11T21:38:06.0585072Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0585225Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0585309Z del arg0_1 2023-01-11T21:38:06.0585392Z return (buf0, ) 2023-01-11T21:38:06.0585399Z 
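Like the other generated wrappers in this log, the silu call(args) above destructively consumes its argument list: it unpacks the inputs, clears the list, and deletes each argument after its last use so input storage can be freed as early as possible. A hedged usage sketch, assuming the generated module printed above has been saved and imported so that call and its kernel are in scope:

import torch
from torch._dynamo.testing import rand_strided  # same helper the __main__ block uses

arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32)
args = [arg0_1]
buf0, = call(args)  # returns (buf0, ); the wrapper empties `args` as it runs
assert args == []   # the input reference was dropped inside call()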
2023-01-11T21:38:06.0585404Z 2023-01-11T21:38:06.0585498Z if __name__ == "__main__": 2023-01-11T21:38:06.0585618Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0585738Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0585936Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0586045Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0586050Z 2023-01-11T21:38:06.0586123Z ok (1.710s) 2023-01-11T21:38:06.0586582Z test_simplify_loops_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0587223Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0587649Z [2023-01-11 21:31:20,388] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 249 2023-01-11T21:38:06.0588110Z [2023-01-11 21:31:22,105] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 249 2023-01-11T21:38:06.0588321Z 2023-01-11T21:38:06.0588423Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0588632Z import torch 2023-01-11T21:38:06.0588810Z import random 2023-01-11T21:38:06.0589038Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0589352Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0589519Z 2023-01-11T21:38:06.0589595Z aten = torch.ops.aten 2023-01-11T21:38:06.0589856Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0590121Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0590256Z 2023-01-11T21:38:06.0590332Z import triton 2023-01-11T21:38:06.0590526Z import triton.language as tl 2023-01-11T21:38:06.0590786Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0591087Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0591270Z 2023-01-11T21:38:06.0591274Z 2023-01-11T21:38:06.0591411Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0591756Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0592112Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0592375Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0592618Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0592820Z { 2023-01-11T21:38:06.0593022Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0593220Z { 2023-01-11T21:38:06.0593411Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0593629Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:06.0593813Z { 2023-01-11T21:38:06.0594000Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.0594193Z { 2023-01-11T21:38:06.0594384Z for(long i2=0; i2<3; i2+=1) 2023-01-11T21:38:06.0594579Z { 2023-01-11T21:38:06.0594845Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i2) + (30*i1) + (120*i0)); 2023-01-11T21:38:06.0595266Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + (8*i2) + (30*i0) + (180*i1)); 2023-01-11T21:38:06.0595543Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0595799Z tmp2.store(out_ptr0 + (8*i2) + (30*i1) + (120*i0)); 2023-01-11T21:38:06.0596017Z } 2023-01-11T21:38:06.0596217Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0596447Z 
for(long i2=24; i2<30; i2+=1) 2023-01-11T21:38:06.0596646Z { 2023-01-11T21:38:06.0596858Z auto tmp0 = in_ptr0[i2 + (30*i1) + (120*i0)]; 2023-01-11T21:38:06.0597116Z auto tmp1 = in_ptr1[i2 + (30*i0) + (180*i1)]; 2023-01-11T21:38:06.0597355Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0597587Z out_ptr0[i2 + (30*i1) + (120*i0)] = tmp2; 2023-01-11T21:38:06.0597788Z } 2023-01-11T21:38:06.0597965Z } 2023-01-11T21:38:06.0598129Z } 2023-01-11T21:38:06.0598292Z } 2023-01-11T21:38:06.0598461Z } 2023-01-11T21:38:06.0598646Z ''') 2023-01-11T21:38:06.0598758Z 2023-01-11T21:38:06.0598763Z 2023-01-11T21:38:06.0598861Z async_compile.wait(globals()) 2023-01-11T21:38:06.0599071Z del async_compile 2023-01-11T21:38:06.0599189Z 2023-01-11T21:38:06.0599269Z def call(args): 2023-01-11T21:38:06.0599456Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0599649Z args.clear() 2023-01-11T21:38:06.0599990Z buf0 = empty_strided((2, 3, 4, 5, 6), (360, 120, 30, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0600328Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0600592Z del arg0_1 2023-01-11T21:38:06.0600767Z del arg1_1 2023-01-11T21:38:06.0600942Z return (buf0, ) 2023-01-11T21:38:06.0601056Z 2023-01-11T21:38:06.0601061Z 2023-01-11T21:38:06.0601142Z if __name__ == "__main__": 2023-01-11T21:38:06.0601380Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0601663Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0602055Z arg0_1 = rand_strided((2, 3, 4, 5, 6), (360, 120, 30, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0602472Z arg1_1 = rand_strided((2, 3, 4, 5, 6), (90, 30, 180, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0602765Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0602921Z 2023-01-11T21:38:06.0602994Z ok (1.734s) 2023-01-11T21:38:06.0603549Z test_sin_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0604125Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0604552Z [2023-01-11 21:31:22,138] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 250
2023-01-11T21:38:06.0605023Z [2023-01-11 21:31:23,903] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 250
2023-01-11T21:38:06.0605260Z 
2023-01-11T21:38:06.0605386Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0605592Z import torch
2023-01-11T21:38:06.0605774Z import random
2023-01-11T21:38:06.0606002Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0606271Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0606434Z 
2023-01-11T21:38:06.0606518Z aten = torch.ops.aten
2023-01-11T21:38:06.0606778Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0607033Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0607170Z 
2023-01-11T21:38:06.0607247Z import triton
2023-01-11T21:38:06.0607484Z import triton.language as tl
2023-01-11T21:38:06.0607735Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0608037Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0608210Z 
2023-01-11T21:38:06.0608215Z 
2023-01-11T21:38:06.0608357Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0608713Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0609139Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0609463Z float* __restrict__ out_ptr0,
2023-01-11T21:38:06.0609766Z float* __restrict__ out_ptr1)
2023-01-11T21:38:06.0610024Z {
2023-01-11T21:38:06.0610282Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0610548Z {
2023-01-11T21:38:06.0610794Z #pragma omp for
2023-01-11T21:38:06.0611057Z for(long i0=0; i0<32; i0+=1)
2023-01-11T21:38:06.0611321Z {
2023-01-11T21:38:06.0611641Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0611917Z auto tmp1 = tmp0.sin();
2023-01-11T21:38:06.0612183Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(2));
2023-01-11T21:38:06.0612438Z auto tmp3 = tmp1 + tmp2;
2023-01-11T21:38:06.0612704Z auto tmp4 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.0612949Z auto tmp5 = tmp0 + tmp4;
2023-01-11T21:38:06.0613165Z auto tmp6 = tmp5.sin();
2023-01-11T21:38:06.0613387Z tmp3.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0613605Z tmp6.store(out_ptr1 + 8*i0);
2023-01-11T21:38:06.0613797Z }
2023-01-11T21:38:06.0613999Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0614209Z for(long i0=256; i0<256; i0+=1)
2023-01-11T21:38:06.0614399Z {
2023-01-11T21:38:06.0614738Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0614965Z auto tmp1 = std::sin(tmp0);
2023-01-11T21:38:06.0615238Z auto tmp2 = static_cast<float>(2);
2023-01-11T21:38:06.0615503Z auto tmp3 = tmp1 + tmp2;
2023-01-11T21:38:06.0615740Z auto tmp4 = static_cast<float>(1);
2023-01-11T21:38:06.0615980Z auto tmp5 = tmp0 + tmp4;
2023-01-11T21:38:06.0616298Z auto tmp6 = std::sin(tmp5);
2023-01-11T21:38:06.0616516Z out_ptr0[i0] = tmp3;
2023-01-11T21:38:06.0616713Z out_ptr1[i0] = tmp6;
2023-01-11T21:38:06.0616898Z }
2023-01-11T21:38:06.0617067Z }
2023-01-11T21:38:06.0617326Z }
2023-01-11T21:38:06.0617558Z ''')
2023-01-11T21:38:06.0617659Z 
2023-01-11T21:38:06.0617663Z 
2023-01-11T21:38:06.0617763Z 
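The sin kernel above shows the vector/tail split inductor uses throughout this log: the #pragma omp for loop walks 8-lane Vectorized<float> chunks, and the trailing simd simdlen(4) loop handles any scalar remainder, degenerate here because the (16, 16) input divides evenly by the lane width. A small sketch of that arithmetic, using only the loop bounds printed above:

# Hedged sketch: recompute the vector/tail split encoded in the kernel above.
numel = 16 * 16                  # elements in the (16, 16) input
lanes = 8                        # Vectorized<float> lane count in the main loop
vec_iters = numel // lanes       # 32, matching for(long i0=0; i0<32; i0+=1)
tail_start = vec_iters * lanes   # 256, matching for(long i0=256; i0<256; i0+=1)
assert (vec_iters, tail_start) == (32, 256)  # remainder loop body never runs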
async_compile.wait(globals()) 2023-01-11T21:38:06.0617963Z del async_compile 2023-01-11T21:38:06.0618086Z 2023-01-11T21:38:06.0618164Z def call(args): 2023-01-11T21:38:06.0618350Z arg0_1, = args 2023-01-11T21:38:06.0618526Z args.clear() 2023-01-11T21:38:06.0618844Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0619212Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0619556Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0619809Z del arg0_1 2023-01-11T21:38:06.0620005Z return (buf0, buf1, ) 2023-01-11T21:38:06.0620127Z 2023-01-11T21:38:06.0620131Z 2023-01-11T21:38:06.0620217Z if __name__ == "__main__": 2023-01-11T21:38:06.0620445Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0620719Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0621082Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0621362Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0621501Z 2023-01-11T21:38:06.0621574Z ok (1.795s) 2023-01-11T21:38:06.0622143Z test_sizehint_issue1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0622795Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0623329Z [2023-01-11 21:31:24,057] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 251 2023-01-11T21:38:06.0624010Z [2023-01-11 21:31:25,803] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 251 2023-01-11T21:38:06.0624315Z 2023-01-11T21:38:06.0624461Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0624766Z import torch 2023-01-11T21:38:06.0625034Z import random 2023-01-11T21:38:06.0625345Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0625658Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0625820Z 2023-01-11T21:38:06.0625908Z aten = torch.ops.aten 2023-01-11T21:38:06.0626160Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0626426Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0626563Z 2023-01-11T21:38:06.0626643Z import triton 2023-01-11T21:38:06.0626845Z import triton.language as tl 2023-01-11T21:38:06.0627093Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0627389Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0627560Z 2023-01-11T21:38:06.0627565Z 2023-01-11T21:38:06.0627721Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0628058Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0628407Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0628660Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0628858Z { 2023-01-11T21:38:06.0629057Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0629268Z { 2023-01-11T21:38:06.0629451Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0629669Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0629861Z { 
2023-01-11T21:38:06.0630108Z for(long i1=0; i1<384; i1+=1)
2023-01-11T21:38:06.0630305Z {
2023-01-11T21:38:06.0630496Z #pragma GCC ivdep
2023-01-11T21:38:06.0630714Z for(long i2=0; i2<196; i2+=1)
2023-01-11T21:38:06.0630901Z {
2023-01-11T21:38:06.0631078Z {
2023-01-11T21:38:06.0631256Z {
2023-01-11T21:38:06.0631479Z auto tmp0 = static_cast<long>(4*(i2 / 14));
2023-01-11T21:38:06.0631748Z auto tmp1 = static_cast<long>((i1 / 4) % 4);
2023-01-11T21:38:06.0632000Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0632246Z auto tmp3 = static_cast<long>(4*(i2 % 14));
2023-01-11T21:38:06.0632507Z auto tmp4 = static_cast<long>(i1 % 4);
2023-01-11T21:38:06.0632750Z auto tmp5 = tmp3 + tmp4;
2023-01-11T21:38:06.0633010Z auto tmp6 = in_ptr0[tmp5 + (56*tmp2) + (3136*(i1 / 16)) + (75264*i0)];
2023-01-11T21:38:06.0633282Z out_ptr0[i2 + (196*i1) + (75264*i0)] = tmp6;
2023-01-11T21:38:06.0633494Z }
2023-01-11T21:38:06.0633674Z }
2023-01-11T21:38:06.0633843Z }
2023-01-11T21:38:06.0634013Z }
2023-01-11T21:38:06.0634185Z }
2023-01-11T21:38:06.0634342Z }
2023-01-11T21:38:06.0634502Z }
2023-01-11T21:38:06.0634680Z ''')
2023-01-11T21:38:06.0634772Z 
2023-01-11T21:38:06.0634777Z 
2023-01-11T21:38:06.0634873Z async_compile.wait(globals())
2023-01-11T21:38:06.0635078Z del async_compile
2023-01-11T21:38:06.0635192Z 
2023-01-11T21:38:06.0635304Z def call(args):
2023-01-11T21:38:06.0635480Z arg0_1, = args
2023-01-11T21:38:06.0635661Z args.clear()
2023-01-11T21:38:06.0635987Z buf0 = empty_strided((2, 384, 196), (75264, 196, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0636290Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0636531Z del arg0_1
2023-01-11T21:38:06.0636713Z return (buf0, )
2023-01-11T21:38:06.0636827Z 
2023-01-11T21:38:06.0636832Z 
2023-01-11T21:38:06.0636907Z if __name__ == "__main__":
2023-01-11T21:38:06.0637141Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0637417Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0637800Z arg0_1 = rand_strided((2, 24, 56, 56), (75264, 3136, 56, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0638076Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.0638223Z 
2023-01-11T21:38:06.0638297Z ok (1.903s)
2023-01-11T21:38:06.0638864Z test_slice1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0639455Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0639866Z [2023-01-11 21:31:25,848] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 252 2023-01-11T21:38:06.0640328Z [2023-01-11 21:31:27,596] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 252 2023-01-11T21:38:06.0640537Z 2023-01-11T21:38:06.0640638Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0640845Z import torch 2023-01-11T21:38:06.0641025Z import random 2023-01-11T21:38:06.0641255Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0641533Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0641686Z 2023-01-11T21:38:06.0641769Z aten = torch.ops.aten 2023-01-11T21:38:06.0642023Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0642284Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0642457Z 2023-01-11T21:38:06.0642529Z import triton 2023-01-11T21:38:06.0642731Z import triton.language as tl 2023-01-11T21:38:06.0642987Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0643274Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0643447Z 2023-01-11T21:38:06.0643452Z 2023-01-11T21:38:06.0643591Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0643931Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0644302Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0644555Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0644793Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0644994Z { 2023-01-11T21:38:06.0645182Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0645381Z { 2023-01-11T21:38:06.0645572Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0645784Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0645971Z { 2023-01-11T21:38:06.0646162Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.0646346Z { 2023-01-11T21:38:06.0646519Z { 2023-01-11T21:38:06.0646695Z { 2023-01-11T21:38:06.0646911Z auto tmp0 = in_ptr0[(2*i1) + (40*i0)]; 2023-01-11T21:38:06.0647162Z auto tmp1 = in_ptr0[20 + (2*i1) + (40*i0)]; 2023-01-11T21:38:06.0647403Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0647647Z auto tmp3 = static_cast(1); 2023-01-11T21:38:06.0647915Z auto tmp4 = tmp0 + tmp3; 2023-01-11T21:38:06.0648141Z auto tmp5 = tmp1 + tmp3; 2023-01-11T21:38:06.0648373Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0648600Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.0648839Z out_ptr1[i1 + (10*i0)] = tmp6; 2023-01-11T21:38:06.0649045Z } 2023-01-11T21:38:06.0649218Z } 2023-01-11T21:38:06.0649393Z } 2023-01-11T21:38:06.0649570Z } 2023-01-11T21:38:06.0649735Z } 2023-01-11T21:38:06.0649900Z } 2023-01-11T21:38:06.0650083Z ''') 2023-01-11T21:38:06.0650178Z 2023-01-11T21:38:06.0650190Z 2023-01-11T21:38:06.0650281Z async_compile.wait(globals()) 2023-01-11T21:38:06.0650494Z del async_compile 2023-01-11T21:38:06.0650611Z 2023-01-11T21:38:06.0650689Z def call(args): 2023-01-11T21:38:06.0650867Z arg0_1, = args 2023-01-11T21:38:06.0651057Z args.clear() 2023-01-11T21:38:06.0651375Z buf0 = empty_strided((2, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0651734Z buf1 = empty_strided((2, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0652070Z 
kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0652334Z del arg0_1 2023-01-11T21:38:06.0652531Z return (buf0, buf1, ) 2023-01-11T21:38:06.0652648Z 2023-01-11T21:38:06.0652654Z 2023-01-11T21:38:06.0652737Z if __name__ == "__main__": 2023-01-11T21:38:06.0652974Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0653255Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0653614Z arg0_1 = rand_strided((2, 20, 2), (40, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0653898Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0654048Z 2023-01-11T21:38:06.0654123Z ok (1.790s) 2023-01-11T21:38:06.0654834Z test_slice2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0655551Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0655982Z [2023-01-11 21:31:27,639] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 253 2023-01-11T21:38:06.0656449Z [2023-01-11 21:31:29,356] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 253 2023-01-11T21:38:06.0656658Z 2023-01-11T21:38:06.0656760Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0656967Z import torch 2023-01-11T21:38:06.0657230Z import random 2023-01-11T21:38:06.0657473Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0657746Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0657910Z 2023-01-11T21:38:06.0657995Z aten = torch.ops.aten 2023-01-11T21:38:06.0658256Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0658518Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0658661Z 2023-01-11T21:38:06.0658737Z import triton 2023-01-11T21:38:06.0658948Z import triton.language as tl 2023-01-11T21:38:06.0659209Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0659500Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0659673Z 2023-01-11T21:38:06.0659678Z 2023-01-11T21:38:06.0659819Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0660165Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0660517Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0660781Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0661104Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0661308Z { 2023-01-11T21:38:06.0661500Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0661704Z { 2023-01-11T21:38:06.0661887Z #pragma omp for 2023-01-11T21:38:06.0662087Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.0662283Z { 2023-01-11T21:38:06.0662454Z { 2023-01-11T21:38:06.0662620Z { 2023-01-11T21:38:06.0662831Z auto tmp0 = in_ptr0[1 + (4*i0)]; 2023-01-11T21:38:06.0663068Z auto tmp1 = in_ptr0[42 + (4*i0)]; 2023-01-11T21:38:06.0663295Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0663536Z auto tmp3 = static_cast(1); 2023-01-11T21:38:06.0663776Z auto tmp4 = tmp0 + tmp3; 2023-01-11T21:38:06.0664010Z auto tmp5 = static_cast(2); 2023-01-11T21:38:06.0664245Z auto 
tmp6 = tmp1 + tmp5; 2023-01-11T21:38:06.0664477Z auto tmp7 = tmp4 + tmp6; 2023-01-11T21:38:06.0664688Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0664905Z out_ptr1[i0] = tmp7; 2023-01-11T21:38:06.0665104Z } 2023-01-11T21:38:06.0665273Z } 2023-01-11T21:38:06.0665443Z } 2023-01-11T21:38:06.0665615Z } 2023-01-11T21:38:06.0665773Z } 2023-01-11T21:38:06.0665954Z ''') 2023-01-11T21:38:06.0666055Z 2023-01-11T21:38:06.0666060Z 2023-01-11T21:38:06.0666158Z async_compile.wait(globals()) 2023-01-11T21:38:06.0666362Z del async_compile 2023-01-11T21:38:06.0666477Z 2023-01-11T21:38:06.0666555Z def call(args): 2023-01-11T21:38:06.0666744Z arg0_1, = args 2023-01-11T21:38:06.0666931Z args.clear() 2023-01-11T21:38:06.0667240Z buf0 = empty_strided((1, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0667604Z buf1 = empty_strided((1, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0667943Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0668202Z del arg0_1 2023-01-11T21:38:06.0668394Z return (buf0, buf1, ) 2023-01-11T21:38:06.0668516Z 2023-01-11T21:38:06.0668521Z 2023-01-11T21:38:06.0668604Z if __name__ == "__main__": 2023-01-11T21:38:06.0668872Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0669153Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0669524Z arg0_1 = rand_strided((2, 20, 2), (40, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0669806Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0669946Z 2023-01-11T21:38:06.0670022Z ok (1.760s) 2023-01-11T21:38:06.0670591Z test_slice_mutation1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0671185Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0671611Z [2023-01-11 21:31:29,405] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 254 2023-01-11T21:38:06.0672070Z [2023-01-11 21:31:31,171] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 254 2023-01-11T21:38:06.0672282Z 2023-01-11T21:38:06.0672384Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0672595Z import torch 2023-01-11T21:38:06.0672775Z import random 2023-01-11T21:38:06.0673005Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0673286Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0673449Z 2023-01-11T21:38:06.0673535Z aten = torch.ops.aten 2023-01-11T21:38:06.0673789Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0674087Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0674222Z 2023-01-11T21:38:06.0674300Z import triton 2023-01-11T21:38:06.0674496Z import triton.language as tl 2023-01-11T21:38:06.0674754Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0675057Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0675233Z 2023-01-11T21:38:06.0675237Z 2023-01-11T21:38:06.0675372Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0675720Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0676072Z extern "C" void kernel(float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0676326Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0676554Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0676785Z float* __restrict__ out_ptr3) 2023-01-11T21:38:06.0676991Z { 2023-01-11T21:38:06.0677187Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0677396Z { 2023-01-11T21:38:06.0677577Z #pragma omp for 2023-01-11T21:38:06.0677778Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0677971Z { 2023-01-11T21:38:06.0678219Z auto tmp0 = at::vec::Vectorized(static_cast(0)); 2023-01-11T21:38:06.0678522Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0678774Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0679000Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0679220Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0679418Z } 2023-01-11T21:38:06.0679620Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0679843Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0680033Z { 2023-01-11T21:38:06.0680242Z auto tmp0 = static_cast(0); 2023-01-11T21:38:06.0680486Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.0680708Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0680921Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:06.0681126Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:06.0681306Z } 2023-01-11T21:38:06.0681491Z #pragma omp for 2023-01-11T21:38:06.0681699Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0681918Z { 2023-01-11T21:38:06.0682087Z { 2023-01-11T21:38:06.0682260Z { 2023-01-11T21:38:06.0682471Z auto tmp0 = static_cast(3.0); 2023-01-11T21:38:06.0682711Z out_ptr0[3 + (8*i0)] = tmp0; 2023-01-11T21:38:06.0682910Z } 2023-01-11T21:38:06.0683078Z } 2023-01-11T21:38:06.0683248Z } 2023-01-11T21:38:06.0683429Z #pragma omp for 2023-01-11T21:38:06.0683626Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0683813Z { 2023-01-11T21:38:06.0684052Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr0 + 
8*i0); 2023-01-11T21:38:06.0684309Z tmp0.store(out_ptr2 + 8*i0); 2023-01-11T21:38:06.0684508Z } 2023-01-11T21:38:06.0684707Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0684919Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0685115Z { 2023-01-11T21:38:06.0685307Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:06.0685515Z out_ptr2[i0] = tmp0; 2023-01-11T21:38:06.0685727Z } 2023-01-11T21:38:06.0685929Z #pragma omp for 2023-01-11T21:38:06.0686124Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.0686315Z { 2023-01-11T21:38:06.0686559Z auto tmp0 = at::vec::Vectorized(static_cast(4.0)); 2023-01-11T21:38:06.0686828Z tmp0.store(out_ptr0 + 32 + (8*i0)); 2023-01-11T21:38:06.0687021Z } 2023-01-11T21:38:06.0687219Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0687437Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:06.0687618Z { 2023-01-11T21:38:06.0687870Z auto tmp0 = static_cast(4.0); 2023-01-11T21:38:06.0688095Z out_ptr0[32 + i0] = tmp0; 2023-01-11T21:38:06.0688276Z } 2023-01-11T21:38:06.0688458Z #pragma omp for 2023-01-11T21:38:06.0688658Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0688837Z { 2023-01-11T21:38:06.0689076Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0689381Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0689629Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0689851Z tmp2.store(out_ptr3 + 8*i0); 2023-01-11T21:38:06.0690045Z } 2023-01-11T21:38:06.0690235Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0690452Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0690642Z { 2023-01-11T21:38:06.0690820Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:06.0691049Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.0691276Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0691485Z out_ptr3[i0] = tmp2; 2023-01-11T21:38:06.0691664Z } 2023-01-11T21:38:06.0691829Z } 2023-01-11T21:38:06.0691991Z } 2023-01-11T21:38:06.0692161Z ''') 2023-01-11T21:38:06.0692261Z 2023-01-11T21:38:06.0692266Z 2023-01-11T21:38:06.0692366Z async_compile.wait(globals()) 2023-01-11T21:38:06.0692575Z del async_compile 2023-01-11T21:38:06.0692682Z 2023-01-11T21:38:06.0692760Z def call(args): 2023-01-11T21:38:06.0692943Z arg0_1, = args 2023-01-11T21:38:06.0693124Z args.clear() 2023-01-11T21:38:06.0693425Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0693782Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0694125Z buf3 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0694461Z buf5 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0694923Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf5.data_ptr())) 2023-01-11T21:38:06.0695227Z return (buf0, buf1, buf3, buf5, ) 2023-01-11T21:38:06.0695360Z 2023-01-11T21:38:06.0695365Z 2023-01-11T21:38:06.0695451Z if __name__ == "__main__": 2023-01-11T21:38:06.0695725Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0696002Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0696358Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0696628Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0696775Z 2023-01-11T21:38:06.0696848Z ok (1.815s) 2023-01-11T21:38:06.0697345Z test_slice_mutation2_cpu (__main__.CpuTests) ... 
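The graph 254 listing above is a compact picture of how inductor compiles in-place slice writes: the zero fill, the `x + 1`, the scalar store to column 3, the clone, the row-4 fill, and the final `+ 1` all fuse into the single kernel_cpp_0, and call() returns the four live buffers. A minimal sketch of the kind of function being compiled, assuming a plain torch.compile entry point; the body is reconstructed from the generated kernel, not copied from test_torchinductor.py:

import torch

def fn(a):
    # reconstructed pattern (assumption), matching the kernel above:
    x = torch.zeros_like(a)
    b = x + 1        # out_ptr1
    x[:, 3] = 3.0    # the scalar store out_ptr0[3 + (8*i0)]
    c = x.clone()    # the loadu/store copy into out_ptr2
    x[4, :] = 4.0    # the fill at out_ptr0 + 32
    d = x + 1        # out_ptr3
    return x, b, c, d

compiled = torch.compile(fn, backend="inductor")
print(compiled(torch.randn(8, 8)))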
[2023-01-11 21:31:31,206] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 255 2023-01-11T21:38:06.0697855Z [2023-01-11 21:31:32,943] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 255 2023-01-11T21:38:06.0698067Z 2023-01-11T21:38:06.0698160Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0698367Z import torch 2023-01-11T21:38:06.0698553Z import random 2023-01-11T21:38:06.0698776Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0699055Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0699214Z 2023-01-11T21:38:06.0699298Z aten = torch.ops.aten 2023-01-11T21:38:06.0699558Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0699813Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0699952Z 2023-01-11T21:38:06.0700028Z import triton 2023-01-11T21:38:06.0700228Z import triton.language as tl 2023-01-11T21:38:06.0700475Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0700769Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0700941Z 2023-01-11T21:38:06.0700945Z 2023-01-11T21:38:06.0701124Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0701463Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0701815Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0702072Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0702310Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0702534Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.0702736Z { 2023-01-11T21:38:06.0702931Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0703126Z { 2023-01-11T21:38:06.0703307Z #pragma omp for 2023-01-11T21:38:06.0703510Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0703692Z { 2023-01-11T21:38:06.0703937Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 20 + (8*i0)); 2023-01-11T21:38:06.0704246Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0704496Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0704720Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0704917Z } 2023-01-11T21:38:06.0705113Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0705337Z for(long i0=16; i0<20; i0+=1) 2023-01-11T21:38:06.0705527Z { 2023-01-11T21:38:06.0705717Z auto tmp0 = in_ptr0[20 + i0]; 2023-01-11T21:38:06.0705952Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.0706176Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0706387Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0706567Z } 2023-01-11T21:38:06.0706746Z #pragma omp for 2023-01-11T21:38:06.0706952Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0707133Z { 2023-01-11T21:38:06.0707373Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0707639Z tmp0.store(out_ptr1 + 20 + (8*i0)); 2023-01-11T21:38:06.0707834Z } 2023-01-11T21:38:06.0708029Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0708250Z for(long i0=16; i0<20; i0+=1) 2023-01-11T21:38:06.0708431Z { 2023-01-11T21:38:06.0708619Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:06.0708830Z out_ptr1[20 + i0] = tmp0; 2023-01-11T21:38:06.0709009Z } 2023-01-11T21:38:06.0709227Z #pragma omp for 2023-01-11T21:38:06.0709433Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.0709614Z { 2023-01-11T21:38:06.0709855Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr1 + 1 + (8*i0)); 2023-01-11T21:38:06.0710161Z auto tmp1 = at::vec::Vectorized(static_cast(2)); 
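// out_ptr1 is the same storage as in_ptr0: call() below passes
// arg0_1.data_ptr() for both, so the loads here and the stores in the copy
// loop above touch the argument itself; the graph mutates its input in
// place, which is why call() returns an empty tuple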
2023-01-11T21:38:06.0710406Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0710627Z tmp2.store(out_ptr2 + 8*i0); 2023-01-11T21:38:06.0710821Z } 2023-01-11T21:38:06.0711016Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0711224Z for(long i0=8; i0<9; i0+=1) 2023-01-11T21:38:06.0711413Z { 2023-01-11T21:38:06.0711604Z auto tmp0 = out_ptr1[1 + i0]; 2023-01-11T21:38:06.0711832Z auto tmp1 = static_cast(2); 2023-01-11T21:38:06.0712055Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0712268Z out_ptr2[i0] = tmp2; 2023-01-11T21:38:06.0712333Z } 2023-01-11T21:38:06.0712416Z #pragma omp for 2023-01-11T21:38:06.0712502Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.0712570Z { 2023-01-11T21:38:06.0712709Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr2 + 8*i0); 2023-01-11T21:38:06.0712813Z tmp0.store(out_ptr1 + 2 + (8*i0)); 2023-01-11T21:38:06.0712880Z } 2023-01-11T21:38:06.0712973Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0713059Z for(long i0=8; i0<9; i0+=1) 2023-01-11T21:38:06.0713127Z { 2023-01-11T21:38:06.0713216Z auto tmp0 = out_ptr2[i0]; 2023-01-11T21:38:06.0713337Z out_ptr1[2 + i0] = tmp0; 2023-01-11T21:38:06.0713405Z } 2023-01-11T21:38:06.0713477Z } 2023-01-11T21:38:06.0713536Z } 2023-01-11T21:38:06.0713625Z ''') 2023-01-11T21:38:06.0713631Z 2023-01-11T21:38:06.0713635Z 2023-01-11T21:38:06.0713730Z async_compile.wait(globals()) 2023-01-11T21:38:06.0713812Z del async_compile 2023-01-11T21:38:06.0713820Z 2023-01-11T21:38:06.0713899Z def call(args): 2023-01-11T21:38:06.0713976Z arg0_1, = args 2023-01-11T21:38:06.0714052Z args.clear() 2023-01-11T21:38:06.0714247Z buf0 = empty_strided((1, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0714442Z buf2 = empty_strided((1, 9), (9, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0714643Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.0714718Z del arg0_1 2023-01-11T21:38:06.0714789Z return () 2023-01-11T21:38:06.0714794Z 2023-01-11T21:38:06.0714801Z 2023-01-11T21:38:06.0714884Z if __name__ == "__main__": 2023-01-11T21:38:06.0715003Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0715144Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0715370Z arg0_1 = rand_strided((1, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0715491Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0715497Z 2023-01-11T21:38:06.0715570Z ok (1.771s) 2023-01-11T21:38:06.0716032Z test_slice_scatter2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0716168Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0716431Z [2023-01-11 21:31:32,970] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 256 2023-01-11T21:38:06.0716698Z [2023-01-11 21:31:34,653] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 256 2023-01-11T21:38:06.0716704Z 2023-01-11T21:38:06.0716833Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0716912Z import torch 2023-01-11T21:38:06.0716982Z import random 2023-01-11T21:38:06.0717102Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0717233Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0717238Z 2023-01-11T21:38:06.0717323Z aten = torch.ops.aten 2023-01-11T21:38:06.0717462Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0717561Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0717566Z 2023-01-11T21:38:06.0717643Z import triton 2023-01-11T21:38:06.0717737Z import triton.language as tl 2023-01-11T21:38:06.0717856Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0718002Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0718007Z 2023-01-11T21:38:06.0718012Z 2023-01-11T21:38:06.0718149Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0718357Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0718483Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0718588Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0718653Z { 2023-01-11T21:38:06.0718754Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0718815Z { 2023-01-11T21:38:06.0718900Z #pragma omp for 2023-01-11T21:38:06.0718988Z for(long i0=0; i0<75648; i0+=1) 2023-01-11T21:38:06.0719054Z { 2023-01-11T21:38:06.0719194Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0719294Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0719391Z } 2023-01-11T21:38:06.0719484Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0719580Z for(long i0=605184; i0<605184; i0+=1) 2023-01-11T21:38:06.0719648Z { 2023-01-11T21:38:06.0719739Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0719826Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:06.0719895Z } 2023-01-11T21:38:06.0719956Z } 2023-01-11T21:38:06.0720021Z } 2023-01-11T21:38:06.0720106Z ''') 2023-01-11T21:38:06.0720112Z 2023-01-11T21:38:06.0720117Z 2023-01-11T21:38:06.0720210Z async_compile.wait(globals()) 2023-01-11T21:38:06.0720289Z del async_compile 2023-01-11T21:38:06.0720294Z 2023-01-11T21:38:06.0720370Z def call(args): 2023-01-11T21:38:06.0720451Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0720527Z args.clear() 2023-01-11T21:38:06.0720735Z buf0 = empty_strided((8, 197, 384), (75648, 384, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0720874Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0720952Z del arg1_1 2023-01-11T21:38:06.0721028Z return (buf0, ) 2023-01-11T21:38:06.0721034Z 2023-01-11T21:38:06.0721038Z 2023-01-11T21:38:06.0721118Z if __name__ == "__main__": 2023-01-11T21:38:06.0721235Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0721364Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0721581Z arg0_1 = rand_strided((8, 197, 384), 
(75648, 384, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0721788Z arg1_1 = rand_strided((8, 197, 384), (75648, 384, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0721910Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0721915Z 2023-01-11T21:38:06.0721986Z ok (1.732s) 2023-01-11T21:38:06.0722446Z test_slice_scatter_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0722580Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0722869Z [2023-01-11 21:31:34,712] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 257 2023-01-11T21:38:06.0723135Z [2023-01-11 21:31:36,427] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 257 2023-01-11T21:38:06.0723140Z 2023-01-11T21:38:06.0723240Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0723316Z import torch 2023-01-11T21:38:06.0723385Z import random 2023-01-11T21:38:06.0723506Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0723631Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0723636Z 2023-01-11T21:38:06.0723723Z aten = torch.ops.aten 2023-01-11T21:38:06.0723864Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0723960Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0723966Z 2023-01-11T21:38:06.0724040Z import triton 2023-01-11T21:38:06.0724133Z import triton.language as tl 2023-01-11T21:38:06.0724253Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0724396Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0724402Z 2023-01-11T21:38:06.0724406Z 2023-01-11T21:38:06.0724544Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0724754Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0724879Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0724991Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0725096Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0725206Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0725310Z { 2023-01-11T21:38:06.0725432Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0725505Z { 2023-01-11T21:38:06.0725587Z #pragma omp for 2023-01-11T21:38:06.0725675Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0725742Z { 2023-01-11T21:38:06.0725823Z #pragma GCC ivdep 2023-01-11T21:38:06.0725915Z for(long i1=0; i1<100; i1+=1) 2023-01-11T21:38:06.0725984Z { 2023-01-11T21:38:06.0726054Z { 2023-01-11T21:38:06.0726127Z { 2023-01-11T21:38:06.0726241Z auto tmp8 = in_ptr1[i1 + (100*i0)]; 2023-01-11T21:38:06.0726353Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.0726456Z auto tmp1 = static_cast(10); 2023-01-11T21:38:06.0726556Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:06.0726667Z auto tmp3 = static_cast(90); 2023-01-11T21:38:06.0726769Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:06.0726872Z auto tmp5 = tmp2 & tmp4; 2023-01-11T21:38:06.0726963Z float tmp6 = 0.0; 2023-01-11T21:38:06.0727046Z if(tmp5) 2023-01-11T21:38:06.0727113Z { 2023-01-11T21:38:06.0727290Z auto tmp7 = 
in_ptr0[(-10) + i1 + (80*i0)]; 2023-01-11T21:38:06.0727379Z tmp6 = tmp7; 2023-01-11T21:38:06.0727453Z } 2023-01-11T21:38:06.0727560Z auto tmp9 = tmp5 ? tmp6 : tmp8; 2023-01-11T21:38:06.0727676Z auto tmp10 = static_cast(i1 % 2); 2023-01-11T21:38:06.0727784Z auto tmp11 = static_cast(0); 2023-01-11T21:38:06.0727886Z auto tmp12 = tmp10 == tmp11; 2023-01-11T21:38:06.0727979Z auto tmp13 = tmp5 & tmp12; 2023-01-11T21:38:06.0728072Z float tmp14 = 0.0; 2023-01-11T21:38:06.0728158Z if(tmp13) 2023-01-11T21:38:06.0728233Z { 2023-01-11T21:38:06.0728412Z auto tmp15 = in_ptr0[(-5) + (80*i0) + (i1 / 2)]; 2023-01-11T21:38:06.0728503Z tmp14 = tmp15; 2023-01-11T21:38:06.0728608Z } 2023-01-11T21:38:06.0728714Z auto tmp16 = tmp13 ? tmp14 : tmp8; 2023-01-11T21:38:06.0728816Z out_ptr0[i1 + (100*i0)] = tmp9; 2023-01-11T21:38:06.0728919Z out_ptr1[i1 + (100*i0)] = tmp16; 2023-01-11T21:38:06.0728992Z } 2023-01-11T21:38:06.0729061Z } 2023-01-11T21:38:06.0729131Z } 2023-01-11T21:38:06.0729199Z } 2023-01-11T21:38:06.0729259Z } 2023-01-11T21:38:06.0729324Z } 2023-01-11T21:38:06.0729410Z ''') 2023-01-11T21:38:06.0729415Z 2023-01-11T21:38:06.0729420Z 2023-01-11T21:38:06.0729515Z async_compile.wait(globals()) 2023-01-11T21:38:06.0729600Z del async_compile 2023-01-11T21:38:06.0729605Z 2023-01-11T21:38:06.0729680Z def call(args): 2023-01-11T21:38:06.0729760Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0729830Z args.clear() 2023-01-11T21:38:06.0730042Z buf0 = empty_strided((4, 8, 100), (800, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0730255Z buf1 = empty_strided((4, 8, 100), (800, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0730450Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0730523Z del arg0_1 2023-01-11T21:38:06.0730595Z del arg1_1 2023-01-11T21:38:06.0730676Z return (buf0, buf1, ) 2023-01-11T21:38:06.0730681Z 2023-01-11T21:38:06.0730685Z 2023-01-11T21:38:06.0730766Z if __name__ == "__main__": 2023-01-11T21:38:06.0730878Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0731004Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0731245Z arg0_1 = rand_strided((4, 8, 100), (800, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0731452Z arg1_1 = rand_strided((4, 8, 80), (640, 80, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0731573Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0731581Z 2023-01-11T21:38:06.0731652Z ok (1.753s) 2023-01-11T21:38:06.0732106Z test_softmax_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0732239Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0732499Z [2023-01-11 21:31:36,469] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 258 2023-01-11T21:38:06.0732507Z 2023-01-11T21:38:06.0732599Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0732674Z import torch 2023-01-11T21:38:06.0732749Z import random 2023-01-11T21:38:06.0732871Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0732999Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0733004Z 2023-01-11T21:38:06.0733088Z aten = torch.ops.aten 2023-01-11T21:38:06.0733224Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0733320Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0733326Z 2023-01-11T21:38:06.0733394Z import triton 2023-01-11T21:38:06.0733488Z import triton.language as tl 2023-01-11T21:38:06.0733613Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0733752Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0733758Z 2023-01-11T21:38:06.0733762Z 2023-01-11T21:38:06.0733903Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0734109Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0734230Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.0734337Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:06.0734463Z float* __restrict__ in_out_ptr2, 2023-01-11T21:38:06.0734706Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0734816Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0734921Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0735023Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0735124Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0735224Z float* __restrict__ out_ptr6, 2023-01-11T21:38:06.0735317Z float* __restrict__ out_ptr7, 2023-01-11T21:38:06.0735420Z float* __restrict__ out_ptr8) 2023-01-11T21:38:06.0735490Z { 2023-01-11T21:38:06.0735583Z auto out_ptr3 = in_out_ptr0; 2023-01-11T21:38:06.0735673Z auto out_ptr4 = in_out_ptr1; 2023-01-11T21:38:06.0735762Z auto out_ptr5 = in_out_ptr2; 2023-01-11T21:38:06.0735864Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0735926Z { 2023-01-11T21:38:06.0736009Z #pragma omp for 2023-01-11T21:38:06.0736097Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0736165Z { 2023-01-11T21:38:06.0736234Z { 2023-01-11T21:38:06.0736600Z #pragma omp declare reduction(max:at::vec::Vectorized:omp_out = at::vec::maximum(omp_out, omp_in)) initializer(omp_priv={{-std::numeric_limits::infinity()}}) 2023-01-11T21:38:06.0736844Z float tmp3 = -std::numeric_limits::infinity(); 2023-01-11T21:38:06.0736973Z auto tmp3_vec = at::vec::Vectorized(tmp3); 2023-01-11T21:38:06.0737284Z float tmp4 = -std::numeric_limits::infinity(); 2023-01-11T21:38:06.0737417Z auto tmp4_vec = at::vec::Vectorized(tmp4); 2023-01-11T21:38:06.0737512Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0737583Z { 2023-01-11T21:38:06.0737736Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0737885Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0737986Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0738109Z tmp3_vec = at::vec::maximum(tmp3_vec, tmp2); 2023-01-11T21:38:06.0738225Z tmp4_vec = 
at::vec::maximum(tmp4_vec, tmp1); 2023-01-11T21:38:06.0738295Z } 2023-01-11T21:38:06.0738512Z tmp3 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return at::vec::maximum(x, y);}, tmp3_vec); 2023-01-11T21:38:06.0738719Z tmp4 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return at::vec::maximum(x, y);}, tmp4_vec); 2023-01-11T21:38:06.0738872Z #pragma omp simd simdlen(4) reduction(max:tmp3) reduction(max:tmp4) 2023-01-11T21:38:06.0738971Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0739043Z { 2023-01-11T21:38:06.0739150Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.0739253Z auto tmp1 = in_ptr1[i1 + (8*i0)]; 2023-01-11T21:38:06.0739344Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0739451Z tmp3 = std::max(tmp3, tmp2); 2023-01-11T21:38:06.0739557Z tmp4 = std::max(tmp4, tmp1); 2023-01-11T21:38:06.0739627Z } 2023-01-11T21:38:06.0739714Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.0739801Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:06.0739875Z } 2023-01-11T21:38:06.0739936Z } 2023-01-11T21:38:06.0740019Z #pragma omp for 2023-01-11T21:38:06.0740106Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0740173Z { 2023-01-11T21:38:06.0740243Z { 2023-01-11T21:38:06.0740312Z { 2023-01-11T21:38:06.0740578Z float tmp1 = -std::numeric_limits::infinity(); 2023-01-11T21:38:06.0740671Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0740745Z { 2023-01-11T21:38:06.0740820Z { 2023-01-11T21:38:06.0740931Z auto tmp0 = in_ptr0[i0 + (8*i1)]; 2023-01-11T21:38:06.0741043Z tmp1 = std::max(tmp1, tmp0); 2023-01-11T21:38:06.0741118Z } 2023-01-11T21:38:06.0741191Z } 2023-01-11T21:38:06.0741276Z out_ptr2[i0] = tmp1; 2023-01-11T21:38:06.0741346Z } 2023-01-11T21:38:06.0741419Z } 2023-01-11T21:38:06.0741487Z } 2023-01-11T21:38:06.0741570Z #pragma omp for 2023-01-11T21:38:06.0741655Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0741716Z { 2023-01-11T21:38:06.0741785Z { 2023-01-11T21:38:06.0741981Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0742071Z float tmp12 = 0; 2023-01-11T21:38:06.0742200Z auto tmp12_vec = at::vec::Vectorized(tmp12); 2023-01-11T21:38:06.0742295Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0742366Z { 2023-01-11T21:38:06.0742515Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0742655Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0742787Z auto tmp3 = at::vec::Vectorized(out_ptr0[i0]); 2023-01-11T21:38:06.0742976Z auto tmp6 = at::vec::Vectorized::loadu(out_ptr2 + 8*i1); 2023-01-11T21:38:06.0743110Z auto tmp9 = at::vec::Vectorized(out_ptr1[i0]); 2023-01-11T21:38:06.0743210Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0743354Z auto tmp4 = tmp2 - tmp3; 2023-01-11T21:38:06.0743452Z auto tmp5 = tmp4.exp(); 2023-01-11T21:38:06.0743591Z auto tmp7 = tmp0 - tmp6; 2023-01-11T21:38:06.0743679Z auto tmp8 = tmp7.exp(); 2023-01-11T21:38:06.0743820Z auto tmp10 = tmp1 - tmp9; 2023-01-11T21:38:06.0743919Z auto tmp11 = tmp10.exp(); 2023-01-11T21:38:06.0744033Z tmp5.store(out_ptr3 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0744145Z tmp8.store(out_ptr4 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0744261Z tmp11.store(out_ptr5 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0744355Z tmp12_vec += tmp5; 2023-01-11T21:38:06.0744418Z } 2023-01-11T21:38:06.0744620Z tmp12 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp12_vec); 2023-01-11T21:38:06.0744748Z #pragma omp simd 
simdlen(4) reduction(+:tmp12) 2023-01-11T21:38:06.0744846Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0744914Z { 2023-01-11T21:38:06.0745019Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.0745121Z auto tmp1 = in_ptr1[i1 + (8*i0)]; 2023-01-11T21:38:06.0745218Z auto tmp3 = out_ptr0[i0]; 2023-01-11T21:38:06.0745309Z auto tmp6 = out_ptr2[i1]; 2023-01-11T21:38:06.0745405Z auto tmp9 = out_ptr1[i0]; 2023-01-11T21:38:06.0745502Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0745641Z auto tmp4 = tmp2 - tmp3; 2023-01-11T21:38:06.0745750Z auto tmp5 = std::exp(tmp4); 2023-01-11T21:38:06.0745890Z auto tmp7 = tmp0 - tmp6; 2023-01-11T21:38:06.0745995Z auto tmp8 = std::exp(tmp7); 2023-01-11T21:38:06.0746126Z auto tmp10 = tmp1 - tmp9; 2023-01-11T21:38:06.0746264Z auto tmp11 = std::exp(tmp10); 2023-01-11T21:38:06.0746365Z out_ptr3[i1 + (8*i0)] = tmp5; 2023-01-11T21:38:06.0746462Z out_ptr4[i1 + (8*i0)] = tmp8; 2023-01-11T21:38:06.0746561Z out_ptr5[i1 + (8*i0)] = tmp11; 2023-01-11T21:38:06.0746648Z tmp12 += tmp5; 2023-01-11T21:38:06.0746719Z } 2023-01-11T21:38:06.0746812Z out_ptr6[i0] = tmp12; 2023-01-11T21:38:06.0746874Z } 2023-01-11T21:38:06.0746943Z } 2023-01-11T21:38:06.0747026Z #pragma omp for 2023-01-11T21:38:06.0747114Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0747186Z { 2023-01-11T21:38:06.0747274Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0747336Z { 2023-01-11T21:38:06.0747489Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr3 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0747627Z auto tmp1 = at::vec::Vectorized(out_ptr6[i0]); 2023-01-11T21:38:06.0747721Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0747834Z tmp2.store(in_out_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0747905Z } 2023-01-11T21:38:06.0748003Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0748090Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0748153Z { 2023-01-11T21:38:06.0748255Z auto tmp0 = out_ptr3[i1 + (8*i0)]; 2023-01-11T21:38:06.0748349Z auto tmp1 = out_ptr6[i0]; 2023-01-11T21:38:06.0748444Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0748543Z in_out_ptr0[i1 + (8*i0)] = tmp2; 2023-01-11T21:38:06.0748646Z } 2023-01-11T21:38:06.0748715Z } 2023-01-11T21:38:06.0748791Z #pragma omp for 2023-01-11T21:38:06.0748877Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0748947Z { 2023-01-11T21:38:06.0749014Z { 2023-01-11T21:38:06.0749083Z { 2023-01-11T21:38:06.0749172Z float tmp1 = 0; 2023-01-11T21:38:06.0749262Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0749335Z { 2023-01-11T21:38:06.0749409Z { 2023-01-11T21:38:06.0749521Z auto tmp0 = out_ptr4[i0 + (8*i1)]; 2023-01-11T21:38:06.0749610Z tmp1 += tmp0; 2023-01-11T21:38:06.0749683Z } 2023-01-11T21:38:06.0749754Z } 2023-01-11T21:38:06.0749837Z out_ptr7[i0] = tmp1; 2023-01-11T21:38:06.0749908Z } 2023-01-11T21:38:06.0749976Z } 2023-01-11T21:38:06.0750047Z } 2023-01-11T21:38:06.0750129Z #pragma omp for 2023-01-11T21:38:06.0750217Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0750288Z { 2023-01-11T21:38:06.0750369Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0750438Z { 2023-01-11T21:38:06.0750590Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr4 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0750733Z auto tmp1 = at::vec::Vectorized::loadu(out_ptr7 + 8*i1); 2023-01-11T21:38:06.0750828Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0750942Z tmp2.store(in_out_ptr1 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0751013Z } 2023-01-11T21:38:06.0751103Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0751191Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0751260Z { 
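// scalar tail loop for the vector remainder: the trip count here is zero
// (i1 runs from 8 to 8) because the row length is an exact multiple of the
// 8-lane float vector width, so the body is emitted but never executes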
2023-01-11T21:38:06.0751362Z auto tmp0 = out_ptr4[i1 + (8*i0)]; 2023-01-11T21:38:06.0751458Z auto tmp1 = out_ptr7[i1]; 2023-01-11T21:38:06.0751552Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0751656Z in_out_ptr1[i1 + (8*i0)] = tmp2; 2023-01-11T21:38:06.0751717Z } 2023-01-11T21:38:06.0751787Z } 2023-01-11T21:38:06.0751870Z #pragma omp for 2023-01-11T21:38:06.0751986Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0752057Z { 2023-01-11T21:38:06.0752127Z { 2023-01-11T21:38:06.0752322Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0752401Z float tmp1 = 0; 2023-01-11T21:38:06.0752530Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:06.0752626Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0752697Z { 2023-01-11T21:38:06.0752847Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr5 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0752939Z tmp1_vec += tmp0; 2023-01-11T21:38:06.0753011Z } 2023-01-11T21:38:06.0753220Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:06.0753342Z #pragma omp simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:06.0753437Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0753507Z { 2023-01-11T21:38:06.0753614Z auto tmp0 = out_ptr5[i1 + (8*i0)]; 2023-01-11T21:38:06.0753698Z tmp1 += tmp0; 2023-01-11T21:38:06.0753769Z } 2023-01-11T21:38:06.0753858Z out_ptr8[i0] = tmp1; 2023-01-11T21:38:06.0753920Z } 2023-01-11T21:38:06.0753987Z } 2023-01-11T21:38:06.0754070Z #pragma omp for 2023-01-11T21:38:06.0754157Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0754227Z { 2023-01-11T21:38:06.0754347Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0754418Z { 2023-01-11T21:38:06.0754556Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr5 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0754690Z auto tmp1 = at::vec::Vectorized(out_ptr8[i0]); 2023-01-11T21:38:06.0754786Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0754896Z tmp2.store(in_out_ptr2 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0754966Z } 2023-01-11T21:38:06.0755062Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0755150Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0755212Z { 2023-01-11T21:38:06.0755317Z auto tmp0 = out_ptr5[i1 + (8*i0)]; 2023-01-11T21:38:06.0755411Z auto tmp1 = out_ptr8[i0]; 2023-01-11T21:38:06.0755503Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0755604Z in_out_ptr2[i1 + (8*i0)] = tmp2; 2023-01-11T21:38:06.0755679Z } 2023-01-11T21:38:06.0755747Z } 2023-01-11T21:38:06.0755807Z } 2023-01-11T21:38:06.0755873Z } 2023-01-11T21:38:06.0755961Z ''') 2023-01-11T21:38:06.0755967Z 2023-01-11T21:38:06.0755973Z 2023-01-11T21:38:06.0756067Z async_compile.wait(globals()) 2023-01-11T21:38:06.0756146Z del async_compile 2023-01-11T21:38:06.0756151Z 2023-01-11T21:38:06.0756229Z def call(args): 2023-01-11T21:38:06.0756310Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0756380Z args.clear() 2023-01-11T21:38:06.0756576Z buf0 = empty_strided((8, 1), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0756770Z buf8 = empty_strided((8, 1), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0756960Z buf4 = empty_strided((1, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0757150Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0757336Z buf5 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0757527Z buf9 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 
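    # the three (8, 8) buffers above hold the exponentials for the three
    # softmaxes; the (8, 1) / (1, 8) buffers hold their row- and column-wise
    # maxima and sums. The `bufN = bufM; del bufM  # reuse` lines below mark
    # inductor reusing a dead buffer's storage instead of allocating afresh.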
2023-01-11T21:38:06.0757711Z buf2 = empty_strided((8, 1), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0757796Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:06.0758013Z buf6 = empty_strided((1, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0758106Z buf7 = buf5; del buf5 # reuse 2023-01-11T21:38:06.0758301Z buf10 = empty_strided((8, 1), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0758392Z buf11 = buf9; del buf9 # reuse 2023-01-11T21:38:06.0758740Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(buf11.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf8.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf6.data_ptr()), c_void_p(buf10.data_ptr())) 2023-01-11T21:38:06.0758814Z del arg0_1 2023-01-11T21:38:06.0758890Z del arg1_1 2023-01-11T21:38:06.0758975Z return (buf3, buf7, buf11, ) 2023-01-11T21:38:06.0758991Z 2023-01-11T21:38:06.0758996Z 2023-01-11T21:38:06.0759071Z if __name__ == "__main__": 2023-01-11T21:38:06.0759191Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0759320Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0759518Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0759715Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0759837Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0760107Z [2023-01-11 21:31:38,302] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 258 2023-01-11T21:38:06.0760113Z 2023-01-11T21:38:06.0760186Z ok (1.876s) 2023-01-11T21:38:06.0760640Z test_softmax_one_kernel_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0760811Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0761075Z [2023-01-11 21:31:38,325] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 259 2023-01-11T21:38:06.0761338Z [2023-01-11 21:31:40,054] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 259 2023-01-11T21:38:06.0761344Z 2023-01-11T21:38:06.0761443Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0761520Z import torch 2023-01-11T21:38:06.0761596Z import random 2023-01-11T21:38:06.0761719Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0761843Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0761852Z 2023-01-11T21:38:06.0761929Z aten = torch.ops.aten 2023-01-11T21:38:06.0762067Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0762166Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0762170Z 2023-01-11T21:38:06.0762247Z import triton 2023-01-11T21:38:06.0762344Z import triton.language as tl 2023-01-11T21:38:06.0762471Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0762610Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0762616Z 2023-01-11T21:38:06.0762620Z 2023-01-11T21:38:06.0762757Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0762959Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0763080Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.0763192Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0763298Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0763401Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.0763468Z { 2023-01-11T21:38:06.0763561Z auto out_ptr1 = in_out_ptr0; 2023-01-11T21:38:06.0763656Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0763724Z { 2023-01-11T21:38:06.0763805Z #pragma omp for 2023-01-11T21:38:06.0763922Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:06.0763992Z { 2023-01-11T21:38:06.0764062Z { 2023-01-11T21:38:06.0764428Z #pragma omp declare reduction(max:at::vec::Vectorized<float>:omp_out = at::vec::maximum(omp_out, omp_in)) initializer(omp_priv={{-std::numeric_limits<float>::infinity()}}) 2023-01-11T21:38:06.0764641Z float tmp1 = -std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.0764762Z auto tmp1_vec = at::vec::Vectorized<float>(tmp1); 2023-01-11T21:38:06.0764859Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.0764932Z { 2023-01-11T21:38:06.0765087Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0765212Z tmp1_vec = at::vec::maximum(tmp1_vec, tmp0); 2023-01-11T21:38:06.0765283Z } 2023-01-11T21:38:06.0765500Z tmp1 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return at::vec::maximum(x, y);}, tmp1_vec); 2023-01-11T21:38:06.0765628Z #pragma omp simd simdlen(4) reduction(max:tmp1) 2023-01-11T21:38:06.0765717Z for(long i1=32; i1<32; i1+=1) 2023-01-11T21:38:06.0765786Z { 2023-01-11T21:38:06.0765894Z auto tmp0 = in_ptr0[i1 + (32*i0)]; 2023-01-11T21:38:06.0766000Z tmp1 = std::max(tmp1, tmp0); 2023-01-11T21:38:06.0766069Z } 2023-01-11T21:38:06.0766157Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.0766226Z } 2023-01-11T21:38:06.0766320Z } 2023-01-11T21:38:06.0766402Z #pragma omp for 2023-01-11T21:38:06.0766489Z for(long i0=0; i0<16; i0+=1) 
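// second pass: exp(x * rowmax) and its row sum accumulated in one fused
// loop; note the generated code multiplies by the row maximum
// (tmp2 = tmp0 * tmp1) rather than subtracting it, so this mirrors the
// test's graph, not the usual max-subtraction stabilization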
2023-01-11T21:38:06.0766557Z { 2023-01-11T21:38:06.0766626Z { 2023-01-11T21:38:06.0766824Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0766910Z float tmp4 = 0; 2023-01-11T21:38:06.0767030Z auto tmp4_vec = at::vec::Vectorized(tmp4); 2023-01-11T21:38:06.0767127Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.0767197Z { 2023-01-11T21:38:06.0767344Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0767479Z auto tmp1 = at::vec::Vectorized(out_ptr0[i0]); 2023-01-11T21:38:06.0767582Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0767680Z auto tmp3 = tmp2.exp(); 2023-01-11T21:38:06.0767798Z tmp3.store(out_ptr1 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0767880Z tmp4_vec += tmp3; 2023-01-11T21:38:06.0767949Z } 2023-01-11T21:38:06.0768151Z tmp4 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp4_vec); 2023-01-11T21:38:06.0768281Z #pragma omp simd simdlen(4) reduction(+:tmp4) 2023-01-11T21:38:06.0768378Z for(long i1=32; i1<32; i1+=1) 2023-01-11T21:38:06.0768449Z { 2023-01-11T21:38:06.0768554Z auto tmp0 = in_ptr0[i1 + (32*i0)]; 2023-01-11T21:38:06.0768653Z auto tmp1 = out_ptr0[i0]; 2023-01-11T21:38:06.0768744Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0768850Z auto tmp3 = std::exp(tmp2); 2023-01-11T21:38:06.0768949Z out_ptr1[i1 + (32*i0)] = tmp3; 2023-01-11T21:38:06.0769036Z tmp4 += tmp3; 2023-01-11T21:38:06.0769108Z } 2023-01-11T21:38:06.0769195Z out_ptr2[i0] = tmp4; 2023-01-11T21:38:06.0769264Z } 2023-01-11T21:38:06.0769325Z } 2023-01-11T21:38:06.0769408Z #pragma omp for 2023-01-11T21:38:06.0769495Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:06.0769595Z { 2023-01-11T21:38:06.0769687Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.0769756Z { 2023-01-11T21:38:06.0769899Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr1 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0770032Z auto tmp1 = at::vec::Vectorized(out_ptr2[i0]); 2023-01-11T21:38:06.0770126Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0770241Z tmp2.store(in_out_ptr0 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0770312Z } 2023-01-11T21:38:06.0770410Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0770501Z for(long i1=32; i1<32; i1+=1) 2023-01-11T21:38:06.0770573Z { 2023-01-11T21:38:06.0770669Z auto tmp0 = out_ptr1[i1 + (32*i0)]; 2023-01-11T21:38:06.0770763Z auto tmp1 = out_ptr2[i0]; 2023-01-11T21:38:06.0770856Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0770959Z in_out_ptr0[i1 + (32*i0)] = tmp2; 2023-01-11T21:38:06.0771028Z } 2023-01-11T21:38:06.0771096Z } 2023-01-11T21:38:06.0771163Z } 2023-01-11T21:38:06.0771222Z } 2023-01-11T21:38:06.0771311Z ''') 2023-01-11T21:38:06.0771316Z 2023-01-11T21:38:06.0771321Z 2023-01-11T21:38:06.0771419Z async_compile.wait(globals()) 2023-01-11T21:38:06.0771497Z del async_compile 2023-01-11T21:38:06.0771503Z 2023-01-11T21:38:06.0771576Z def call(args): 2023-01-11T21:38:06.0771651Z arg0_1, = args 2023-01-11T21:38:06.0771730Z args.clear() 2023-01-11T21:38:06.0771922Z buf0 = empty_strided((16, 1), (1, 16), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0772151Z buf1 = empty_strided((16, 32), (32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0772347Z buf2 = empty_strided((16, 1), (1, 16), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0772439Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:06.0772635Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr())) 
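    # the max / exp / sum / divide chain compiled into the single kernel_cpp_0
    # above (hence the test name), with buf3 taking over buf1's storage via
    # `buf3 = buf1; del buf1  # reuse`. A plausible source for the graph,
    # reconstructed from the kernel and not copied from the test file (note
    # the multiply by x_max, matching tmp2 = tmp0 * tmp1 above):
    #     x_max = x.amax(1, keepdim=True)
    #     e = torch.exp(x * x_max)
    #     return e / e.sum(1, keepdim=True)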
2023-01-11T21:38:06.0772709Z del arg0_1 2023-01-11T21:38:06.0772788Z return (buf3, ) 2023-01-11T21:38:06.0772793Z 2023-01-11T21:38:06.0772798Z 2023-01-11T21:38:06.0772879Z if __name__ == "__main__": 2023-01-11T21:38:06.0772991Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0773119Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0773322Z arg0_1 = rand_strided((16, 32), (32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0773435Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0773440Z 2023-01-11T21:38:06.0773516Z ok (1.751s) 2023-01-11T21:38:06.0773971Z test_sort_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0774105Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0774367Z [2023-01-11 21:31:40,069] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 260 2023-01-11T21:38:06.0774701Z [2023-01-11 21:31:40,074] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.sort 2023-01-11T21:38:06.0774963Z [2023-01-11 21:31:40,077] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 260 2023-01-11T21:38:06.0774978Z 2023-01-11T21:38:06.0775071Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0775150Z import torch 2023-01-11T21:38:06.0775244Z import random 2023-01-11T21:38:06.0775380Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0775520Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0775526Z 2023-01-11T21:38:06.0775608Z aten = torch.ops.aten 2023-01-11T21:38:06.0775803Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0775896Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0775901Z 2023-01-11T21:38:06.0775977Z import triton 2023-01-11T21:38:06.0776071Z import triton.language as tl 2023-01-11T21:38:06.0776197Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0776338Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0776344Z 2023-01-11T21:38:06.0776348Z 2023-01-11T21:38:06.0776442Z async_compile.wait(globals()) 2023-01-11T21:38:06.0776520Z del async_compile 2023-01-11T21:38:06.0776525Z 2023-01-11T21:38:06.0776601Z def call(args): 2023-01-11T21:38:06.0776672Z arg0_1, = args 2023-01-11T21:38:06.0776747Z args.clear() 2023-01-11T21:38:06.0776837Z buf0 = aten.sort(arg0_1) 2023-01-11T21:38:06.0776910Z del arg0_1 2023-01-11T21:38:06.0776984Z buf1 = buf0[0] 2023-01-11T21:38:06.0777098Z assert_size_stride(buf1, (1, 1, 8, 8), (64, 64, 8, 1)) 2023-01-11T21:38:06.0777234Z buf2 = buf0[1] 2023-01-11T21:38:06.0777358Z assert_size_stride(buf2, (1, 1, 8, 8), (64, 64, 8, 1)) 2023-01-11T21:38:06.0777432Z del buf0 2023-01-11T21:38:06.0777515Z return (buf1, buf2, ) 2023-01-11T21:38:06.0777520Z 2023-01-11T21:38:06.0777525Z 2023-01-11T21:38:06.0777606Z if __name__ == "__main__": 2023-01-11T21:38:06.0777725Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0777853Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0778071Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 
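    # aten.sort has no inductor lowering, so this graph emits no kernel_cpp
    # at all: call() above invokes the ATen kernel directly (hence the
    # `[WARNING] Using FallbackKernel: aten.sort` line) and only checks the
    # fallback's output layout with assert_size_stride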
2023-01-11T21:38:06.0778177Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0778230Z 2023-01-11T21:38:06.0778296Z ok (0.022s) 2023-01-11T21:38:06.0778755Z test_split_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0778889Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0779150Z [2023-01-11 21:31:40,098] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 261 2023-01-11T21:38:06.0779415Z [2023-01-11 21:31:40,101] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 261 2023-01-11T21:38:06.0779832Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0779970Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0780227Z [2023-01-11 21:31:40,124] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 262 2023-01-11T21:38:06.0780490Z [2023-01-11 21:31:41,831] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 262 2023-01-11T21:38:06.0780495Z 2023-01-11T21:38:06.0780594Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0780662Z import torch 2023-01-11T21:38:06.0780739Z import random 2023-01-11T21:38:06.0780859Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0780984Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0780992Z 2023-01-11T21:38:06.0781077Z aten = torch.ops.aten 2023-01-11T21:38:06.0781215Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0781312Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0781317Z 2023-01-11T21:38:06.0781392Z import triton 2023-01-11T21:38:06.0781478Z import triton.language as tl 2023-01-11T21:38:06.0781679Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0781823Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0781828Z 2023-01-11T21:38:06.0781833Z 2023-01-11T21:38:06.0781926Z async_compile.wait(globals()) 2023-01-11T21:38:06.0782005Z del async_compile 2023-01-11T21:38:06.0782010Z 2023-01-11T21:38:06.0782085Z def call(args): 2023-01-11T21:38:06.0782162Z arg0_1, = args 2023-01-11T21:38:06.0782231Z args.clear() 2023-01-11T21:38:06.0782433Z return (as_strided(arg0_1, (2, 2, 3), (20, 10, 1)), as_strided(arg0_1, (2, 2, 3), (20, 10, 1), 3), as_strided(arg0_1, (2, 2, 3), (20, 10, 1), 6), as_strided(arg0_1, (2, 2, 1), (20, 10, 1), 9), ) 2023-01-11T21:38:06.0782442Z 2023-01-11T21:38:06.0782446Z 2023-01-11T21:38:06.0782528Z if __name__ == "__main__": 2023-01-11T21:38:06.0782648Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0782777Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0782989Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0783102Z 
print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0783107Z 2023-01-11T21:38:06.0783111Z 2023-01-11T21:38:06.0783213Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0783288Z import torch 2023-01-11T21:38:06.0783356Z import random 2023-01-11T21:38:06.0783475Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0783599Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0783605Z 2023-01-11T21:38:06.0783689Z aten = torch.ops.aten 2023-01-11T21:38:06.0783826Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0783961Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0783966Z 2023-01-11T21:38:06.0784039Z import triton 2023-01-11T21:38:06.0784133Z import triton.language as tl 2023-01-11T21:38:06.0784251Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0784394Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0784400Z 2023-01-11T21:38:06.0784404Z 2023-01-11T21:38:06.0784543Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0784751Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0784875Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0784981Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0785048Z { 2023-01-11T21:38:06.0785143Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0785213Z { 2023-01-11T21:38:06.0785299Z #pragma omp for 2023-01-11T21:38:06.0785398Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:06.0785486Z { 2023-01-11T21:38:06.0785649Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0785798Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0785890Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0785981Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0786051Z } 2023-01-11T21:38:06.0786152Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0786242Z for(long i0=40; i0<40; i0+=1) 2023-01-11T21:38:06.0786311Z { 2023-01-11T21:38:06.0786406Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0786513Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.0786597Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0786688Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0786756Z } 2023-01-11T21:38:06.0786825Z } 2023-01-11T21:38:06.0786897Z } 2023-01-11T21:38:06.0786982Z ''') 2023-01-11T21:38:06.0786988Z 2023-01-11T21:38:06.0786992Z 2023-01-11T21:38:06.0787089Z async_compile.wait(globals()) 2023-01-11T21:38:06.0787161Z del async_compile 2023-01-11T21:38:06.0787166Z 2023-01-11T21:38:06.0787251Z def call(args): 2023-01-11T21:38:06.0787326Z arg0_1, = args 2023-01-11T21:38:06.0787431Z args.clear() 2023-01-11T21:38:06.0787649Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0787789Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0787863Z del arg0_1 2023-01-11T21:38:06.0788047Z return (as_strided(buf0, (2, 2, 3), (20, 10, 1)), as_strided(buf0, (2, 2, 3), (20, 10, 1), 3), as_strided(buf0, (2, 2, 3), (20, 10, 1), 6), as_strided(buf0, (2, 2, 1), (20, 10, 0), 9), ) 2023-01-11T21:38:06.0788062Z 2023-01-11T21:38:06.0788066Z 2023-01-11T21:38:06.0788141Z if __name__ == "__main__": 2023-01-11T21:38:06.0788260Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0788390Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0788597Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', 
dtype=torch.float32) 2023-01-11T21:38:06.0788711Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0788716Z 2023-01-11T21:38:06.0788791Z ok (1.755s) 2023-01-11T21:38:06.0789245Z test_split_with_sizes_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0789379Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0789640Z [2023-01-11 21:31:41,862] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 263 2023-01-11T21:38:06.0789929Z [2023-01-11 21:31:43,618] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 263 2023-01-11T21:38:06.0790344Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0790477Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0790732Z [2023-01-11 21:31:43,673] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 264 2023-01-11T21:38:06.0790997Z [2023-01-11 21:31:45,604] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 264 2023-01-11T21:38:06.0791416Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0791554Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0791811Z [2023-01-11 21:31:45,662] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 265 2023-01-11T21:38:06.0791817Z 2023-01-11T21:38:06.0791915Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0791993Z import torch 2023-01-11T21:38:06.0792070Z import random 2023-01-11T21:38:06.0792185Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0792311Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0792316Z 2023-01-11T21:38:06.0792400Z aten = torch.ops.aten 2023-01-11T21:38:06.0792540Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0792640Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0792645Z 2023-01-11T21:38:06.0792725Z import triton 2023-01-11T21:38:06.0792817Z import triton.language as tl 2023-01-11T21:38:06.0792940Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0793111Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0793117Z 2023-01-11T21:38:06.0793122Z 2023-01-11T21:38:06.0793264Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0793477Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0793605Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0793712Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0793815Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0793917Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.0793977Z { 2023-01-11T21:38:06.0794086Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0794153Z { 2023-01-11T21:38:06.0794237Z #pragma omp for 2023-01-11T21:38:06.0794327Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0794398Z { 2023-01-11T21:38:06.0794485Z #pragma GCC ivdep 2023-01-11T21:38:06.0794570Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.0794640Z { 2023-01-11T21:38:06.0794710Z { 2023-01-11T21:38:06.0794782Z { 2023-01-11T21:38:06.0794892Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.0795009Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0795111Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0795220Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0795320Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0795424Z out_ptr0[i1 + (3*i0)] = tmp4; 2023-01-11T21:38:06.0795533Z } 2023-01-11T21:38:06.0795604Z } 2023-01-11T21:38:06.0795674Z } 2023-01-11T21:38:06.0795743Z } 2023-01-11T21:38:06.0795820Z #pragma omp for 2023-01-11T21:38:06.0795911Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0795981Z { 2023-01-11T21:38:06.0796069Z #pragma GCC ivdep 2023-01-11T21:38:06.0796157Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.0796228Z { 2023-01-11T21:38:06.0796291Z { 2023-01-11T21:38:06.0796362Z { 2023-01-11T21:38:06.0796474Z auto tmp0 = in_ptr0[3 + i1 + (10*i0)]; 2023-01-11T21:38:06.0796588Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0796689Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0796802Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0796901Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0796999Z out_ptr1[i1 + (3*i0)] = tmp4; 2023-01-11T21:38:06.0797071Z } 2023-01-11T21:38:06.0797141Z } 2023-01-11T21:38:06.0797212Z } 2023-01-11T21:38:06.0797281Z } 2023-01-11T21:38:06.0797366Z #pragma omp for 2023-01-11T21:38:06.0797456Z for(long i0=0; 
i0<4; i0+=1) 2023-01-11T21:38:06.0797517Z { 2023-01-11T21:38:06.0797603Z #pragma GCC ivdep 2023-01-11T21:38:06.0797693Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.0797763Z { 2023-01-11T21:38:06.0797833Z { 2023-01-11T21:38:06.0797906Z { 2023-01-11T21:38:06.0798017Z auto tmp0 = in_ptr0[6 + i1 + (10*i0)]; 2023-01-11T21:38:06.0798124Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0798223Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0798338Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0798440Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0798542Z out_ptr2[i1 + (4*i0)] = tmp4; 2023-01-11T21:38:06.0798616Z } 2023-01-11T21:38:06.0798686Z } 2023-01-11T21:38:06.0798747Z } 2023-01-11T21:38:06.0798844Z } 2023-01-11T21:38:06.0798916Z } 2023-01-11T21:38:06.0798981Z } 2023-01-11T21:38:06.0799069Z ''') 2023-01-11T21:38:06.0799075Z 2023-01-11T21:38:06.0799079Z 2023-01-11T21:38:06.0799178Z async_compile.wait(globals()) 2023-01-11T21:38:06.0799257Z del async_compile 2023-01-11T21:38:06.0799262Z 2023-01-11T21:38:06.0799331Z def call(args): 2023-01-11T21:38:06.0799408Z arg0_1, = args 2023-01-11T21:38:06.0799486Z args.clear() 2023-01-11T21:38:06.0799695Z buf0 = empty_strided((2, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0799899Z buf1 = empty_strided((2, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0800101Z buf2 = empty_strided((2, 2, 4), (8, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0800294Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.0800367Z del arg0_1 2023-01-11T21:38:06.0800451Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.0800457Z 2023-01-11T21:38:06.0800461Z 2023-01-11T21:38:06.0800542Z if __name__ == "__main__": 2023-01-11T21:38:06.0800663Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0800792Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0801003Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0801116Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0801122Z 2023-01-11T21:38:06.0801126Z 2023-01-11T21:38:06.0801225Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0801301Z import torch 2023-01-11T21:38:06.0801399Z import random 2023-01-11T21:38:06.0801522Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0801646Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0801651Z 2023-01-11T21:38:06.0801735Z aten = torch.ops.aten 2023-01-11T21:38:06.0801874Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0801971Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0801977Z 2023-01-11T21:38:06.0802051Z import triton 2023-01-11T21:38:06.0802139Z import triton.language as tl 2023-01-11T21:38:06.0802264Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0802406Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0802412Z 2023-01-11T21:38:06.0802416Z 2023-01-11T21:38:06.0802554Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0802759Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0802889Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0802993Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0803096Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0803190Z 
float* __restrict__ out_ptr2) 2023-01-11T21:38:06.0803259Z { 2023-01-11T21:38:06.0803363Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0803429Z { 2023-01-11T21:38:06.0803512Z #pragma omp for 2023-01-11T21:38:06.0803597Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0803671Z { 2023-01-11T21:38:06.0803750Z #pragma GCC ivdep 2023-01-11T21:38:06.0803839Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.0803908Z { 2023-01-11T21:38:06.0803977Z { 2023-01-11T21:38:06.0804048Z { 2023-01-11T21:38:06.0804159Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.0804274Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0804370Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0804486Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0804586Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0804687Z out_ptr0[i1 + (4*i0)] = tmp4; 2023-01-11T21:38:06.0804786Z } 2023-01-11T21:38:06.0804858Z } 2023-01-11T21:38:06.0804930Z } 2023-01-11T21:38:06.0804991Z } 2023-01-11T21:38:06.0805075Z #pragma omp for 2023-01-11T21:38:06.0805163Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0805232Z { 2023-01-11T21:38:06.0805318Z #pragma GCC ivdep 2023-01-11T21:38:06.0805422Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.0805492Z { 2023-01-11T21:38:06.0805572Z { 2023-01-11T21:38:06.0805657Z { 2023-01-11T21:38:06.0805769Z auto tmp0 = in_ptr0[4 + i1 + (10*i0)]; 2023-01-11T21:38:06.0805886Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0805988Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0806103Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0806202Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0806299Z out_ptr1[i1 + (3*i0)] = tmp4; 2023-01-11T21:38:06.0806372Z } 2023-01-11T21:38:06.0806444Z } 2023-01-11T21:38:06.0806513Z } 2023-01-11T21:38:06.0806581Z } 2023-01-11T21:38:06.0806661Z #pragma omp for 2023-01-11T21:38:06.0806741Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0806810Z { 2023-01-11T21:38:06.0806895Z #pragma GCC ivdep 2023-01-11T21:38:06.0806984Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.0807052Z { 2023-01-11T21:38:06.0807121Z { 2023-01-11T21:38:06.0807223Z { 2023-01-11T21:38:06.0807329Z auto tmp0 = in_ptr0[7 + i1 + (10*i0)]; 2023-01-11T21:38:06.0807443Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0807542Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0807659Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0807758Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0807859Z out_ptr2[i1 + (3*i0)] = tmp4; 2023-01-11T21:38:06.0807932Z } 2023-01-11T21:38:06.0807995Z } 2023-01-11T21:38:06.0808063Z } 2023-01-11T21:38:06.0808134Z } 2023-01-11T21:38:06.0808202Z } 2023-01-11T21:38:06.0808266Z } 2023-01-11T21:38:06.0808357Z ''') 2023-01-11T21:38:06.0808362Z 2023-01-11T21:38:06.0808367Z 2023-01-11T21:38:06.0808463Z async_compile.wait(globals()) 2023-01-11T21:38:06.0808534Z del async_compile 2023-01-11T21:38:06.0808539Z 2023-01-11T21:38:06.0808618Z def call(args): 2023-01-11T21:38:06.0808695Z arg0_1, = args 2023-01-11T21:38:06.0808770Z args.clear() 2023-01-11T21:38:06.0808972Z buf0 = empty_strided((2, 2, 4), (8, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0809173Z buf1 = empty_strided((2, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0809376Z buf2 = empty_strided((2, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0809569Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.0809636Z del arg0_1 2023-01-11T21:38:06.0809724Z return (buf0, 
buf1, buf2, ) 2023-01-11T21:38:06.0809729Z 2023-01-11T21:38:06.0809733Z 2023-01-11T21:38:06.0809814Z if __name__ == "__main__": 2023-01-11T21:38:06.0809934Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0810063Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0810273Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0810387Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0810392Z 2023-01-11T21:38:06.0810659Z [2023-01-11 21:31:47,543] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 265 2023-01-11T21:38:06.0810693Z 2023-01-11T21:38:06.0810787Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0810863Z import torch 2023-01-11T21:38:06.0810940Z import random 2023-01-11T21:38:06.0811059Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0811187Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0811192Z 2023-01-11T21:38:06.0811276Z aten = torch.ops.aten 2023-01-11T21:38:06.0811413Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0811509Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0811515Z 2023-01-11T21:38:06.0811583Z import triton 2023-01-11T21:38:06.0811675Z import triton.language as tl 2023-01-11T21:38:06.0811805Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0811945Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0811951Z 2023-01-11T21:38:06.0811955Z 2023-01-11T21:38:06.0812093Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0812300Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0812425Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0812529Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0812625Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0812725Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0812825Z float* __restrict__ out_ptr3) 2023-01-11T21:38:06.0812892Z { 2023-01-11T21:38:06.0812973Z #pragma GCC ivdep 2023-01-11T21:38:06.0813059Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0813156Z { 2023-01-11T21:38:06.0813217Z { 2023-01-11T21:38:06.0813285Z { 2023-01-11T21:38:06.0813386Z auto tmp0 = in_ptr0[10*i0]; 2023-01-11T21:38:06.0813495Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0813589Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0813698Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0813790Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0813871Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0813940Z } 2023-01-11T21:38:06.0814007Z } 2023-01-11T21:38:06.0814075Z } 2023-01-11T21:38:06.0814176Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0814244Z { 2023-01-11T21:38:06.0814319Z #pragma omp for 2023-01-11T21:38:06.0814406Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0814584Z { 2023-01-11T21:38:06.0814674Z #pragma GCC ivdep 2023-01-11T21:38:06.0814762Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:06.0814833Z { 2023-01-11T21:38:06.0814903Z { 2023-01-11T21:38:06.0814971Z { 2023-01-11T21:38:06.0815084Z auto tmp0 = in_ptr0[1 + i1 + (10*i0)]; 2023-01-11T21:38:06.0815201Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0815302Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0815417Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0815516Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0815618Z out_ptr1[i1 + (2*i0)] = tmp4; 
2023-01-11T21:38:06.0815683Z } 2023-01-11T21:38:06.0815753Z } 2023-01-11T21:38:06.0815821Z } 2023-01-11T21:38:06.0815889Z } 2023-01-11T21:38:06.0815972Z #pragma omp for 2023-01-11T21:38:06.0816058Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0816127Z { 2023-01-11T21:38:06.0816208Z #pragma GCC ivdep 2023-01-11T21:38:06.0816296Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.0816364Z { 2023-01-11T21:38:06.0816434Z { 2023-01-11T21:38:06.0816509Z { 2023-01-11T21:38:06.0816618Z auto tmp0 = in_ptr0[3 + i1 + (10*i0)]; 2023-01-11T21:38:06.0816769Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0816871Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0816986Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0817085Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0817256Z out_ptr2[i1 + (3*i0)] = tmp4; 2023-01-11T21:38:06.0817332Z } 2023-01-11T21:38:06.0817402Z } 2023-01-11T21:38:06.0817463Z } 2023-01-11T21:38:06.0817532Z } 2023-01-11T21:38:06.0817616Z #pragma omp for 2023-01-11T21:38:06.0817706Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0817774Z { 2023-01-11T21:38:06.0817861Z #pragma GCC ivdep 2023-01-11T21:38:06.0817949Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.0818010Z { 2023-01-11T21:38:06.0818079Z { 2023-01-11T21:38:06.0818150Z { 2023-01-11T21:38:06.0818262Z auto tmp0 = in_ptr0[6 + i1 + (10*i0)]; 2023-01-11T21:38:06.0818373Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0818474Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0818586Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0818678Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0818779Z out_ptr3[i1 + (4*i0)] = tmp4; 2023-01-11T21:38:06.0818850Z } 2023-01-11T21:38:06.0818920Z } 2023-01-11T21:38:06.0818988Z } 2023-01-11T21:38:06.0819095Z } 2023-01-11T21:38:06.0819162Z } 2023-01-11T21:38:06.0819220Z } 2023-01-11T21:38:06.0819311Z ''') 2023-01-11T21:38:06.0819316Z 2023-01-11T21:38:06.0819321Z 2023-01-11T21:38:06.0819417Z async_compile.wait(globals()) 2023-01-11T21:38:06.0819496Z del async_compile 2023-01-11T21:38:06.0819501Z 2023-01-11T21:38:06.0819577Z def call(args): 2023-01-11T21:38:06.0819654Z arg0_1, = args 2023-01-11T21:38:06.0819730Z args.clear() 2023-01-11T21:38:06.0819928Z buf0 = empty_strided((2, 2, 1), (2, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0820129Z buf1 = empty_strided((2, 2, 2), (4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0820328Z buf2 = empty_strided((2, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0820523Z buf3 = empty_strided((2, 2, 4), (8, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0820739Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.0820817Z del arg0_1 2023-01-11T21:38:06.0820909Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.0820914Z 2023-01-11T21:38:06.0820919Z 2023-01-11T21:38:06.0820998Z if __name__ == "__main__": 2023-01-11T21:38:06.0821123Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0821244Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0821451Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0821569Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0821574Z 2023-01-11T21:38:06.0821647Z ok (5.732s) 2023-01-11T21:38:06.0822102Z test_squeeze1_cpu (__main__.CpuTests) ... 
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0822240Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0822499Z [2023-01-11 21:31:47,587] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 266 2023-01-11T21:38:06.0822792Z [2023-01-11 21:31:49,501] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 266 2023-01-11T21:38:06.0822798Z 2023-01-11T21:38:06.0822898Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0822967Z import torch 2023-01-11T21:38:06.0823042Z import random 2023-01-11T21:38:06.0823162Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0823288Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0823293Z 2023-01-11T21:38:06.0823379Z aten = torch.ops.aten 2023-01-11T21:38:06.0823516Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0823617Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0823622Z 2023-01-11T21:38:06.0823697Z import triton 2023-01-11T21:38:06.0823783Z import triton.language as tl 2023-01-11T21:38:06.0823911Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0824052Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0824059Z 2023-01-11T21:38:06.0824064Z 2023-01-11T21:38:06.0824204Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0824410Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0824534Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0824641Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0824743Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0824803Z { 2023-01-11T21:38:06.0824907Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0824973Z { 2023-01-11T21:38:06.0825094Z #pragma omp for 2023-01-11T21:38:06.0825180Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.0825249Z { 2023-01-11T21:38:06.0825384Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0825526Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0825619Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0825756Z auto tmp3 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.0825847Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0825936Z auto tmp5 = tmp0 + tmp3; 2023-01-11T21:38:06.0826033Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0826128Z tmp5.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0826189Z } 2023-01-11T21:38:06.0826290Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0826375Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:06.0826443Z { 2023-01-11T21:38:06.0826537Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0826643Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.0826730Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0826827Z auto tmp3 = static_cast(2); 2023-01-11T21:38:06.0826916Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0827008Z auto tmp5 = tmp0 + tmp3; 2023-01-11T21:38:06.0827096Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0827181Z out_ptr1[i0] = tmp5; 2023-01-11T21:38:06.0827249Z } 2023-01-11T21:38:06.0827309Z } 2023-01-11T21:38:06.0827375Z } 
2023-01-11T21:38:06.0827462Z ''') 2023-01-11T21:38:06.0827468Z 2023-01-11T21:38:06.0827473Z 2023-01-11T21:38:06.0827568Z async_compile.wait(globals()) 2023-01-11T21:38:06.0827647Z del async_compile 2023-01-11T21:38:06.0827653Z 2023-01-11T21:38:06.0827728Z def call(args): 2023-01-11T21:38:06.0827803Z arg0_1, = args 2023-01-11T21:38:06.0827878Z args.clear() 2023-01-11T21:38:06.0828075Z buf0 = empty_strided((2, 2, 2), (4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0828280Z buf1 = empty_strided((2, 2, 2), (4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0828454Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0828529Z del arg0_1 2023-01-11T21:38:06.0828640Z return (buf0, buf1, ) 2023-01-11T21:38:06.0828646Z 2023-01-11T21:38:06.0828651Z 2023-01-11T21:38:06.0828735Z if __name__ == "__main__": 2023-01-11T21:38:06.0828855Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0828984Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0829208Z arg0_1 = rand_strided((1, 2, 1, 2, 2, 1, 1), (8, 4, 4, 2, 1, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0829323Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0829328Z 2023-01-11T21:38:06.0829400Z ok (1.956s) 2023-01-11T21:38:06.0829857Z test_squeeze2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0829995Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0830254Z [2023-01-11 21:31:49,545] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 267 2023-01-11T21:38:06.0830519Z [2023-01-11 21:31:51,433] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 267 2023-01-11T21:38:06.0830525Z 2023-01-11T21:38:06.0830626Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0830702Z import torch 2023-01-11T21:38:06.0830771Z import random 2023-01-11T21:38:06.0830890Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0831043Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0831048Z 2023-01-11T21:38:06.0831132Z aten = torch.ops.aten 2023-01-11T21:38:06.0831268Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0831367Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0831372Z 2023-01-11T21:38:06.0831450Z import triton 2023-01-11T21:38:06.0831549Z import triton.language as tl 2023-01-11T21:38:06.0831668Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0831812Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0831817Z 2023-01-11T21:38:06.0831822Z 2023-01-11T21:38:06.0831959Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0832165Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0832288Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0832393Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0832502Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0832568Z { 2023-01-11T21:38:06.0832663Z #pragma omp parallel 
num_threads(8) 2023-01-11T21:38:06.0832731Z { 2023-01-11T21:38:06.0832815Z #pragma omp for 2023-01-11T21:38:06.0832902Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0832974Z { 2023-01-11T21:38:06.0833114Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0833254Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0833339Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0833475Z auto tmp3 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.0833566Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0833653Z auto tmp5 = tmp0 + tmp3; 2023-01-11T21:38:06.0833753Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0833847Z tmp5.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0833918Z } 2023-01-11T21:38:06.0834011Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0834100Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:06.0834167Z { 2023-01-11T21:38:06.0834257Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0834387Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.0834480Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0834584Z auto tmp3 = static_cast(2); 2023-01-11T21:38:06.0834668Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0834757Z auto tmp5 = tmp0 + tmp3; 2023-01-11T21:38:06.0834846Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0834933Z out_ptr1[i0] = tmp5; 2023-01-11T21:38:06.0835003Z } 2023-01-11T21:38:06.0835070Z } 2023-01-11T21:38:06.0835140Z } 2023-01-11T21:38:06.0835220Z ''') 2023-01-11T21:38:06.0835225Z 2023-01-11T21:38:06.0835229Z 2023-01-11T21:38:06.0835326Z async_compile.wait(globals()) 2023-01-11T21:38:06.0835408Z del async_compile 2023-01-11T21:38:06.0835413Z 2023-01-11T21:38:06.0835489Z def call(args): 2023-01-11T21:38:06.0835564Z arg0_1, = args 2023-01-11T21:38:06.0835640Z args.clear() 2023-01-11T21:38:06.0835862Z buf0 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0836080Z buf1 = empty_strided((2, 1, 2, 2, 2, 1), (8, 8, 4, 2, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0836244Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0836320Z del arg0_1 2023-01-11T21:38:06.0836403Z return (buf0, buf1, ) 2023-01-11T21:38:06.0836408Z 2023-01-11T21:38:06.0836412Z 2023-01-11T21:38:06.0836494Z if __name__ == "__main__": 2023-01-11T21:38:06.0836614Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0836746Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0836977Z arg0_1 = rand_strided((1, 2, 1, 2, 2, 2, 1), (16, 8, 8, 4, 2, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0837115Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0837127Z 2023-01-11T21:38:06.0837193Z ok (1.928s) 2023-01-11T21:38:06.0837646Z test_stack_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0837782Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0838046Z [2023-01-11 21:31:51,471] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 268 2023-01-11T21:38:06.0838309Z [2023-01-11 21:31:53,182] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 268 2023-01-11T21:38:06.0838318Z 2023-01-11T21:38:06.0838417Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0838495Z import torch 2023-01-11T21:38:06.0838575Z import random 2023-01-11T21:38:06.0838688Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0838817Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0838822Z 2023-01-11T21:38:06.0838906Z aten = torch.ops.aten 2023-01-11T21:38:06.0839043Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0839139Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0839144Z 2023-01-11T21:38:06.0839218Z import triton 2023-01-11T21:38:06.0839311Z import triton.language as tl 2023-01-11T21:38:06.0839437Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0839570Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0839576Z 2023-01-11T21:38:06.0839589Z 2023-01-11T21:38:06.0839720Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0839930Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0840057Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0840167Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0840300Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0840404Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0840473Z { 2023-01-11T21:38:06.0840569Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0840636Z { 2023-01-11T21:38:06.0840731Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0840817Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.0840884Z { 2023-01-11T21:38:06.0840977Z for(long i1=0; i1<16; i1+=1) 2023-01-11T21:38:06.0841044Z { 2023-01-11T21:38:06.0841108Z { 2023-01-11T21:38:06.0841182Z { 2023-01-11T21:38:06.0841287Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.0841397Z out_ptr0[(2*i1) + (32*i0)] = tmp0; 2023-01-11T21:38:06.0841470Z } 2023-01-11T21:38:06.0841541Z } 2023-01-11T21:38:06.0841602Z } 2023-01-11T21:38:06.0841670Z } 2023-01-11T21:38:06.0841769Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0841855Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.0841923Z { 2023-01-11T21:38:06.0842014Z for(long i1=0; i1<16; i1+=1) 2023-01-11T21:38:06.0842082Z { 2023-01-11T21:38:06.0842145Z { 2023-01-11T21:38:06.0842217Z { 2023-01-11T21:38:06.0842318Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:06.0842426Z out_ptr1[(2*i1) + (32*i0)] = tmp0; 2023-01-11T21:38:06.0842496Z } 2023-01-11T21:38:06.0842565Z } 2023-01-11T21:38:06.0842633Z } 2023-01-11T21:38:06.0842722Z } 2023-01-11T21:38:06.0842787Z } 2023-01-11T21:38:06.0842851Z } 2023-01-11T21:38:06.0842936Z ''') 2023-01-11T21:38:06.0842942Z 2023-01-11T21:38:06.0842947Z 2023-01-11T21:38:06.0843044Z async_compile.wait(globals()) 2023-01-11T21:38:06.0843122Z del async_compile 2023-01-11T21:38:06.0843127Z 2023-01-11T21:38:06.0843203Z def call(args): 2023-01-11T21:38:06.0843279Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0843356Z args.clear() 2023-01-11T21:38:06.0843565Z buf2 = 
empty_strided((12, 16, 2), (32, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0843677Z buf0 = as_strided(buf2, (12, 16, 1), (32, 2, 1)) # alias 2023-01-11T21:38:06.0843790Z buf1 = as_strided(buf2, (12, 16, 1), (32, 2, 1), 1) # alias 2023-01-11T21:38:06.0843985Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0844059Z del arg0_1 2023-01-11T21:38:06.0844125Z del arg1_1 2023-01-11T21:38:06.0844205Z return (buf2, ) 2023-01-11T21:38:06.0844210Z 2023-01-11T21:38:06.0844215Z 2023-01-11T21:38:06.0844296Z if __name__ == "__main__": 2023-01-11T21:38:06.0844415Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0844543Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0844746Z arg0_1 = rand_strided((1, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0844944Z arg1_1 = rand_strided((12, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0845065Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0845070Z 2023-01-11T21:38:06.0845143Z ok (1.734s) 2023-01-11T21:38:06.0845585Z test_std_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0845721Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0845980Z [2023-01-11 21:31:53,221] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 269 2023-01-11T21:38:06.0845986Z 2023-01-11T21:38:06.0846124Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0846204Z import torch 2023-01-11T21:38:06.0846279Z import random 2023-01-11T21:38:06.0846399Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0846525Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0846530Z 2023-01-11T21:38:06.0846606Z aten = torch.ops.aten 2023-01-11T21:38:06.0846743Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0846841Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0846846Z 2023-01-11T21:38:06.0846920Z import triton 2023-01-11T21:38:06.0847015Z import triton.language as tl 2023-01-11T21:38:06.0847144Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0847284Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0847289Z 2023-01-11T21:38:06.0847293Z 2023-01-11T21:38:06.0847431Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0847632Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0857619Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.0857779Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:06.0857890Z float* __restrict__ in_out_ptr2, 2023-01-11T21:38:06.0857988Z float* __restrict__ in_out_ptr3, 2023-01-11T21:38:06.0858094Z float* __restrict__ in_out_ptr4, 2023-01-11T21:38:06.0858199Z float* __restrict__ in_out_ptr5, 2023-01-11T21:38:06.0858304Z float* __restrict__ in_out_ptr6, 2023-01-11T21:38:06.0858503Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0858611Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0858714Z 
float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0858807Z float* __restrict__ out_ptr4, 2023-01-11T21:38:06.0858915Z float* __restrict__ out_ptr5, 2023-01-11T21:38:06.0859013Z float* __restrict__ out_ptr7, 2023-01-11T21:38:06.0859114Z float* __restrict__ out_ptr10, 2023-01-11T21:38:06.0859221Z float* __restrict__ out_ptr12, 2023-01-11T21:38:06.0859322Z float* __restrict__ out_ptr14) 2023-01-11T21:38:06.0859382Z { 2023-01-11T21:38:06.0859474Z auto out_ptr6 = in_out_ptr0; 2023-01-11T21:38:06.0859562Z auto out_ptr8 = in_out_ptr1; 2023-01-11T21:38:06.0859654Z auto out_ptr11 = in_out_ptr2; 2023-01-11T21:38:06.0859743Z auto out_ptr13 = in_out_ptr3; 2023-01-11T21:38:06.0859834Z auto out_ptr1 = in_out_ptr4; 2023-01-11T21:38:06.0859921Z auto out_ptr3 = in_out_ptr5; 2023-01-11T21:38:06.0860001Z auto out_ptr9 = in_out_ptr6; 2023-01-11T21:38:06.0860069Z { 2023-01-11T21:38:06.0860272Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0860355Z float tmp1 = 0; 2023-01-11T21:38:06.0860479Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:06.0860588Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0860655Z { 2023-01-11T21:38:06.0860761Z #pragma omp for reduction(+:tmp1_vec) 2023-01-11T21:38:06.0860851Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0860920Z { 2023-01-11T21:38:06.0861061Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0861146Z tmp1_vec += tmp0; 2023-01-11T21:38:06.0861215Z } 2023-01-11T21:38:06.0861417Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:06.0861546Z #pragma omp for simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:06.0861633Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:06.0861742Z { 2023-01-11T21:38:06.0861837Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0861918Z tmp1 += tmp0; 2023-01-11T21:38:06.0861985Z } 2023-01-11T21:38:06.0862052Z } 2023-01-11T21:38:06.0862136Z out_ptr0[0] = tmp1; 2023-01-11T21:38:06.0862196Z } 2023-01-11T21:38:06.0862261Z { 2023-01-11T21:38:06.0862450Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0862535Z float tmp6 = 0; 2023-01-11T21:38:06.0862656Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.0862739Z float tmp7 = 0; 2023-01-11T21:38:06.0862862Z auto tmp7_vec = at::vec::Vectorized(tmp7); 2023-01-11T21:38:06.0862963Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0863031Z { 2023-01-11T21:38:06.0863172Z #pragma omp for reduction(+:tmp6_vec) reduction(+:tmp7_vec) 2023-01-11T21:38:06.0863265Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0863333Z { 2023-01-11T21:38:06.0863470Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0863603Z auto tmp1 = at::vec::Vectorized(out_ptr0[0]); 2023-01-11T21:38:06.0863743Z auto tmp2 = at::vec::Vectorized(static_cast(256)); 2023-01-11T21:38:06.0863830Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0863986Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0864079Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.0864165Z tmp6_vec += tmp5; 2023-01-11T21:38:06.0864278Z tmp7_vec += tmp0; 2023-01-11T21:38:06.0864348Z } 2023-01-11T21:38:06.0864546Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:06.0864739Z tmp7 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp7_vec); 
2023-01-11T21:38:06.0864882Z #pragma omp for simd simdlen(4) reduction(+:tmp6) reduction(+:tmp7) 2023-01-11T21:38:06.0864972Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:06.0865039Z { 2023-01-11T21:38:06.0865130Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0865221Z auto tmp1 = out_ptr0[0]; 2023-01-11T21:38:06.0865330Z auto tmp2 = static_cast(256); 2023-01-11T21:38:06.0865421Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0865547Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0865642Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.0865723Z tmp6 += tmp5; 2023-01-11T21:38:06.0865803Z tmp7 += tmp0; 2023-01-11T21:38:06.0865869Z } 2023-01-11T21:38:06.0865936Z } 2023-01-11T21:38:06.0866013Z out_ptr1[0] = tmp6; 2023-01-11T21:38:06.0866088Z out_ptr2[0] = tmp7; 2023-01-11T21:38:06.0866155Z } 2023-01-11T21:38:06.0866219Z { 2023-01-11T21:38:06.0866410Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0866489Z float tmp6 = 0; 2023-01-11T21:38:06.0866612Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.0866692Z float tmp7 = 0; 2023-01-11T21:38:06.0866804Z auto tmp7_vec = at::vec::Vectorized(tmp7); 2023-01-11T21:38:06.0866913Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0866980Z { 2023-01-11T21:38:06.0867120Z #pragma omp for reduction(+:tmp6_vec) reduction(+:tmp7_vec) 2023-01-11T21:38:06.0867210Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0867278Z { 2023-01-11T21:38:06.0867418Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0867572Z auto tmp1 = at::vec::Vectorized(out_ptr2[0]); 2023-01-11T21:38:06.0867716Z auto tmp2 = at::vec::Vectorized(static_cast(256)); 2023-01-11T21:38:06.0867807Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0867941Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0868036Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.0868121Z tmp6_vec += tmp5; 2023-01-11T21:38:06.0868204Z tmp7_vec += tmp0; 2023-01-11T21:38:06.0868271Z } 2023-01-11T21:38:06.0868460Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:06.0868654Z tmp7 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp7_vec); 2023-01-11T21:38:06.0868799Z #pragma omp for simd simdlen(4) reduction(+:tmp6) reduction(+:tmp7) 2023-01-11T21:38:06.0868895Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:06.0868967Z { 2023-01-11T21:38:06.0869058Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0869149Z auto tmp1 = out_ptr2[0]; 2023-01-11T21:38:06.0869258Z auto tmp2 = static_cast(256); 2023-01-11T21:38:06.0869342Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0869474Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0869566Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.0869648Z tmp6 += tmp5; 2023-01-11T21:38:06.0869727Z tmp7 += tmp0; 2023-01-11T21:38:06.0869823Z } 2023-01-11T21:38:06.0869890Z } 2023-01-11T21:38:06.0869965Z out_ptr3[0] = tmp6; 2023-01-11T21:38:06.0870045Z out_ptr4[0] = tmp7; 2023-01-11T21:38:06.0870110Z } 2023-01-11T21:38:06.0870215Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0870284Z { 2023-01-11T21:38:06.0870366Z #pragma omp for 2023-01-11T21:38:06.0870449Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0870516Z { 2023-01-11T21:38:06.0870583Z { 2023-01-11T21:38:06.0870778Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0870862Z float tmp1 = 0; 2023-01-11T21:38:06.0870988Z auto tmp1_vec = 
at::vec::Vectorized(tmp1); 2023-01-11T21:38:06.0871081Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0871150Z { 2023-01-11T21:38:06.0871291Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0871382Z tmp1_vec += tmp0; 2023-01-11T21:38:06.0871450Z } 2023-01-11T21:38:06.0871648Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:06.0871776Z #pragma omp simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:06.0871871Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0871940Z { 2023-01-11T21:38:06.0872049Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.0872124Z tmp1 += tmp0; 2023-01-11T21:38:06.0872194Z } 2023-01-11T21:38:06.0872280Z out_ptr5[i0] = tmp1; 2023-01-11T21:38:06.0872348Z } 2023-01-11T21:38:06.0872415Z } 2023-01-11T21:38:06.0872496Z #pragma omp for 2023-01-11T21:38:06.0872576Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0872646Z { 2023-01-11T21:38:06.0872714Z { 2023-01-11T21:38:06.0872903Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0872988Z float tmp6 = 0; 2023-01-11T21:38:06.0873114Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.0873226Z float tmp7 = 0; 2023-01-11T21:38:06.0873352Z auto tmp7_vec = at::vec::Vectorized(tmp7); 2023-01-11T21:38:06.0873439Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0873512Z { 2023-01-11T21:38:06.0873661Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0873794Z auto tmp1 = at::vec::Vectorized(out_ptr5[i0]); 2023-01-11T21:38:06.0873936Z auto tmp2 = at::vec::Vectorized(static_cast(8)); 2023-01-11T21:38:06.0874034Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0874182Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0874278Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.0874358Z tmp6_vec += tmp5; 2023-01-11T21:38:06.0874442Z tmp7_vec += tmp0; 2023-01-11T21:38:06.0874510Z } 2023-01-11T21:38:06.0874714Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:06.0874907Z tmp7 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp7_vec); 2023-01-11T21:38:06.0875052Z #pragma omp simd simdlen(4) reduction(+:tmp6) reduction(+:tmp7) 2023-01-11T21:38:06.0875146Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0875216Z { 2023-01-11T21:38:06.0875332Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.0875492Z auto tmp1 = out_ptr5[i0]; 2023-01-11T21:38:06.0875615Z auto tmp2 = static_cast(8); 2023-01-11T21:38:06.0875712Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0875849Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0875946Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.0876028Z tmp6 += tmp5; 2023-01-11T21:38:06.0876104Z tmp7 += tmp0; 2023-01-11T21:38:06.0876172Z } 2023-01-11T21:38:06.0876260Z out_ptr6[i0] = tmp6; 2023-01-11T21:38:06.0876346Z out_ptr7[i0] = tmp7; 2023-01-11T21:38:06.0876414Z } 2023-01-11T21:38:06.0876479Z } 2023-01-11T21:38:06.0876561Z #pragma omp for 2023-01-11T21:38:06.0876641Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0876707Z { 2023-01-11T21:38:06.0876849Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr6 + 8*i0); 2023-01-11T21:38:06.0876992Z auto tmp1 = at::vec::Vectorized(static_cast(7)); 2023-01-11T21:38:06.0877081Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0877183Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0877252Z } 2023-01-11T21:38:06.0877344Z #pragma omp for 
simd simdlen(4) 2023-01-11T21:38:06.0877434Z for(long i0=32; i0<32; i0+=1) 2023-01-11T21:38:06.0877502Z { 2023-01-11T21:38:06.0877593Z auto tmp0 = out_ptr6[i0]; 2023-01-11T21:38:06.0877696Z auto tmp1 = static_cast(7); 2023-01-11T21:38:06.0877786Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0877873Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0877932Z } 2023-01-11T21:38:06.0878013Z #pragma omp for 2023-01-11T21:38:06.0878099Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0878164Z { 2023-01-11T21:38:06.0878231Z { 2023-01-11T21:38:06.0878423Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0878510Z float tmp6 = 0; 2023-01-11T21:38:06.0878629Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.0878722Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0878820Z { 2023-01-11T21:38:06.0878969Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0879104Z auto tmp1 = at::vec::Vectorized(out_ptr7[i0]); 2023-01-11T21:38:06.0879247Z auto tmp2 = at::vec::Vectorized(static_cast(8)); 2023-01-11T21:38:06.0879344Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0879484Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0879573Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.0879659Z tmp6_vec += tmp5; 2023-01-11T21:38:06.0879732Z } 2023-01-11T21:38:06.0879928Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:06.0880055Z #pragma omp simd simdlen(4) reduction(+:tmp6) 2023-01-11T21:38:06.0880156Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0880226Z { 2023-01-11T21:38:06.0880331Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.0880422Z auto tmp1 = out_ptr7[i0]; 2023-01-11T21:38:06.0880531Z auto tmp2 = static_cast(8); 2023-01-11T21:38:06.0880628Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0880765Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0880859Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.0880942Z tmp6 += tmp5; 2023-01-11T21:38:06.0881013Z } 2023-01-11T21:38:06.0881093Z out_ptr8[i0] = tmp6; 2023-01-11T21:38:06.0881236Z } 2023-01-11T21:38:06.0881303Z } 2023-01-11T21:38:06.0881387Z #pragma omp for 2023-01-11T21:38:06.0881473Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0881538Z { 2023-01-11T21:38:06.0881681Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr8 + 8*i0); 2023-01-11T21:38:06.0881813Z auto tmp1 = at::vec::Vectorized(static_cast(8)); 2023-01-11T21:38:06.0881904Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0882006Z tmp2.store(in_out_ptr1 + 8*i0); 2023-01-11T21:38:06.0882073Z } 2023-01-11T21:38:06.0882171Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0882258Z for(long i0=32; i0<32; i0+=1) 2023-01-11T21:38:06.0882325Z { 2023-01-11T21:38:06.0882408Z auto tmp0 = out_ptr8[i0]; 2023-01-11T21:38:06.0882512Z auto tmp1 = static_cast(8); 2023-01-11T21:38:06.0882600Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0882693Z in_out_ptr1[i0] = tmp2; 2023-01-11T21:38:06.0882763Z } 2023-01-11T21:38:06.0882830Z } 2023-01-11T21:38:06.0882890Z { 2023-01-11T21:38:06.0883082Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0883165Z float tmp6 = 0; 2023-01-11T21:38:06.0883286Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.0883394Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0883461Z { 2023-01-11T21:38:06.0883576Z #pragma omp for reduction(+:tmp6_vec) 2023-01-11T21:38:06.0883666Z for(long i0=0; i0<32; i0+=1) 
2023-01-11T21:38:06.0883728Z { 2023-01-11T21:38:06.0883869Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0884000Z auto tmp1 = at::vec::Vectorized(out_ptr4[0]); 2023-01-11T21:38:06.0884142Z auto tmp2 = at::vec::Vectorized(static_cast(256)); 2023-01-11T21:38:06.0884237Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0884371Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0884464Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.0884548Z tmp6_vec += tmp5; 2023-01-11T21:38:06.0884637Z } 2023-01-11T21:38:06.0884836Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:06.0884962Z #pragma omp for simd simdlen(4) reduction(+:tmp6) 2023-01-11T21:38:06.0885059Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:06.0885128Z { 2023-01-11T21:38:06.0885219Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0885310Z auto tmp1 = out_ptr4[0]; 2023-01-11T21:38:06.0885412Z auto tmp2 = static_cast(256); 2023-01-11T21:38:06.0885505Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0885642Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0885734Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.0885814Z tmp6 += tmp5; 2023-01-11T21:38:06.0885883Z } 2023-01-11T21:38:06.0885952Z } 2023-01-11T21:38:06.0886029Z out_ptr9[0] = tmp6; 2023-01-11T21:38:06.0886095Z } 2023-01-11T21:38:06.0886197Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0886263Z { 2023-01-11T21:38:06.0886344Z #pragma omp for 2023-01-11T21:38:06.0886432Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0886492Z { 2023-01-11T21:38:06.0886559Z { 2023-01-11T21:38:06.0886627Z { 2023-01-11T21:38:06.0886713Z float tmp1 = 0; 2023-01-11T21:38:06.0886811Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0886883Z { 2023-01-11T21:38:06.0886955Z { 2023-01-11T21:38:06.0887089Z auto tmp0 = in_ptr0[i0 + (32*i1)]; 2023-01-11T21:38:06.0887178Z tmp1 += tmp0; 2023-01-11T21:38:06.0887254Z } 2023-01-11T21:38:06.0887325Z } 2023-01-11T21:38:06.0887416Z out_ptr10[i0] = tmp1; 2023-01-11T21:38:06.0887486Z } 2023-01-11T21:38:06.0887553Z } 2023-01-11T21:38:06.0887613Z } 2023-01-11T21:38:06.0887695Z #pragma omp for 2023-01-11T21:38:06.0887780Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0887846Z { 2023-01-11T21:38:06.0887913Z { 2023-01-11T21:38:06.0887981Z { 2023-01-11T21:38:06.0888067Z float tmp6 = 0; 2023-01-11T21:38:06.0888145Z float tmp7 = 0; 2023-01-11T21:38:06.0888241Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0888312Z { 2023-01-11T21:38:06.0888385Z { 2023-01-11T21:38:06.0888502Z auto tmp0 = in_ptr0[i0 + (32*i1)]; 2023-01-11T21:38:06.0888608Z auto tmp1 = out_ptr10[i0]; 2023-01-11T21:38:06.0888721Z auto tmp2 = static_cast(8); 2023-01-11T21:38:06.0888816Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0888968Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0889069Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.0889156Z tmp6 += tmp5; 2023-01-11T21:38:06.0889244Z tmp7 += tmp0; 2023-01-11T21:38:06.0889315Z } 2023-01-11T21:38:06.0889385Z } 2023-01-11T21:38:06.0889468Z out_ptr11[i0] = tmp6; 2023-01-11T21:38:06.0889556Z out_ptr12[i0] = tmp7; 2023-01-11T21:38:06.0889625Z } 2023-01-11T21:38:06.0889692Z } 2023-01-11T21:38:06.0889758Z } 2023-01-11T21:38:06.0889842Z #pragma omp for 2023-01-11T21:38:06.0889929Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0889989Z { 2023-01-11T21:38:06.0890131Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr11 + 8*i0); 2023-01-11T21:38:06.0890299Z auto tmp1 = at::vec::Vectorized(static_cast(7)); 2023-01-11T21:38:06.0890390Z auto tmp2 = tmp0 / tmp1; 
2023-01-11T21:38:06.0890479Z auto tmp3 = tmp2.sqrt(); 2023-01-11T21:38:06.0890580Z tmp3.store(in_out_ptr2 + 8*i0); 2023-01-11T21:38:06.0890648Z } 2023-01-11T21:38:06.0890740Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0890827Z for(long i0=32; i0<32; i0+=1) 2023-01-11T21:38:06.0890894Z { 2023-01-11T21:38:06.0890986Z auto tmp0 = out_ptr11[i0]; 2023-01-11T21:38:06.0891089Z auto tmp1 = static_cast(7); 2023-01-11T21:38:06.0891179Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0891279Z auto tmp3 = std::sqrt(tmp2); 2023-01-11T21:38:06.0891360Z in_out_ptr2[i0] = tmp3; 2023-01-11T21:38:06.0891426Z } 2023-01-11T21:38:06.0891507Z #pragma omp for 2023-01-11T21:38:06.0891594Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0891661Z { 2023-01-11T21:38:06.0891730Z { 2023-01-11T21:38:06.0891794Z { 2023-01-11T21:38:06.0891880Z float tmp6 = 0; 2023-01-11T21:38:06.0891977Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0892047Z { 2023-01-11T21:38:06.0892119Z { 2023-01-11T21:38:06.0892227Z auto tmp0 = in_ptr0[i0 + (32*i1)]; 2023-01-11T21:38:06.0892331Z auto tmp1 = out_ptr12[i0]; 2023-01-11T21:38:06.0892437Z auto tmp2 = static_cast(8); 2023-01-11T21:38:06.0892539Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0892713Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0892811Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.0892898Z tmp6 += tmp5; 2023-01-11T21:38:06.0892972Z } 2023-01-11T21:38:06.0893042Z } 2023-01-11T21:38:06.0893133Z out_ptr13[i0] = tmp6; 2023-01-11T21:38:06.0893196Z } 2023-01-11T21:38:06.0893264Z } 2023-01-11T21:38:06.0893331Z } 2023-01-11T21:38:06.0893414Z #pragma omp for 2023-01-11T21:38:06.0893499Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0893568Z { 2023-01-11T21:38:06.0893701Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr13 + 8*i0); 2023-01-11T21:38:06.0893839Z auto tmp1 = at::vec::Vectorized(static_cast(8)); 2023-01-11T21:38:06.0893929Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0894020Z auto tmp3 = tmp2.sqrt(); 2023-01-11T21:38:06.0894125Z tmp3.store(in_out_ptr3 + 8*i0); 2023-01-11T21:38:06.0894191Z } 2023-01-11T21:38:06.0894291Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0894379Z for(long i0=32; i0<32; i0+=1) 2023-01-11T21:38:06.0894439Z { 2023-01-11T21:38:06.0894675Z auto tmp0 = out_ptr13[i0]; 2023-01-11T21:38:06.0894785Z auto tmp1 = static_cast(8); 2023-01-11T21:38:06.0894875Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0894971Z auto tmp3 = std::sqrt(tmp2); 2023-01-11T21:38:06.0895066Z in_out_ptr3[i0] = tmp3; 2023-01-11T21:38:06.0895140Z } 2023-01-11T21:38:06.0895223Z #pragma omp for 2023-01-11T21:38:06.0895328Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0895397Z { 2023-01-11T21:38:06.0895488Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0895559Z { 2023-01-11T21:38:06.0895706Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0895859Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr0 + 8 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0896002Z auto tmp3 = at::vec::Vectorized::loadu(in_ptr0 + 16 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0896195Z auto tmp5 = at::vec::Vectorized::loadu(in_ptr0 + 24 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0896289Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0896381Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0896473Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0896614Z auto tmp7 = at::vec::Vectorized(static_cast(4)); 2023-01-11T21:38:06.0896704Z auto tmp8 = tmp6 / tmp7; 2023-01-11T21:38:06.0896838Z auto tmp9 = tmp0 - tmp8; 2023-01-11T21:38:06.0896926Z auto tmp10 = tmp9.pow(2); 
2023-01-11T21:38:06.0897058Z auto tmp11 = tmp1 - tmp8; 2023-01-11T21:38:06.0897228Z auto tmp12 = tmp11.pow(2); 2023-01-11T21:38:06.0897332Z auto tmp13 = tmp10 + tmp12; 2023-01-11T21:38:06.0897466Z auto tmp14 = tmp3 - tmp8; 2023-01-11T21:38:06.0897561Z auto tmp15 = tmp14.pow(2); 2023-01-11T21:38:06.0897654Z auto tmp16 = tmp13 + tmp15; 2023-01-11T21:38:06.0897782Z auto tmp17 = tmp5 - tmp8; 2023-01-11T21:38:06.0897876Z auto tmp18 = tmp17.pow(2); 2023-01-11T21:38:06.0897970Z auto tmp19 = tmp16 + tmp18; 2023-01-11T21:38:06.0898113Z auto tmp20 = at::vec::Vectorized(static_cast(3)); 2023-01-11T21:38:06.0898210Z auto tmp21 = tmp19 / tmp20; 2023-01-11T21:38:06.0898303Z auto tmp22 = tmp21.sqrt(); 2023-01-11T21:38:06.0898414Z tmp22.store(out_ptr14 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0898476Z } 2023-01-11T21:38:06.0898573Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0898703Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0898772Z { 2023-01-11T21:38:06.0898873Z auto tmp0 = in_ptr0[i1 + (32*i0)]; 2023-01-11T21:38:06.0898979Z auto tmp1 = in_ptr0[8 + i1 + (32*i0)]; 2023-01-11T21:38:06.0899086Z auto tmp3 = in_ptr0[16 + i1 + (32*i0)]; 2023-01-11T21:38:06.0899184Z auto tmp5 = in_ptr0[24 + i1 + (32*i0)]; 2023-01-11T21:38:06.0899276Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0899366Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0899457Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0899561Z auto tmp7 = static_cast(4); 2023-01-11T21:38:06.0899651Z auto tmp8 = tmp6 / tmp7; 2023-01-11T21:38:06.0899782Z auto tmp9 = tmp0 - tmp8; 2023-01-11T21:38:06.0899868Z auto tmp10 = tmp9 * tmp9; 2023-01-11T21:38:06.0899999Z auto tmp11 = tmp1 - tmp8; 2023-01-11T21:38:06.0900099Z auto tmp12 = tmp11 * tmp11; 2023-01-11T21:38:06.0900193Z auto tmp13 = tmp10 + tmp12; 2023-01-11T21:38:06.0900322Z auto tmp14 = tmp3 - tmp8; 2023-01-11T21:38:06.0900418Z auto tmp15 = tmp14 * tmp14; 2023-01-11T21:38:06.0900511Z auto tmp16 = tmp13 + tmp15; 2023-01-11T21:38:06.0900638Z auto tmp17 = tmp5 - tmp8; 2023-01-11T21:38:06.0900732Z auto tmp18 = tmp17 * tmp17; 2023-01-11T21:38:06.0900822Z auto tmp19 = tmp16 + tmp18; 2023-01-11T21:38:06.0900929Z auto tmp20 = static_cast(3); 2023-01-11T21:38:06.0901022Z auto tmp21 = tmp19 / tmp20; 2023-01-11T21:38:06.0901126Z auto tmp22 = std::sqrt(tmp21); 2023-01-11T21:38:06.0901222Z out_ptr14[i1 + (8*i0)] = tmp22; 2023-01-11T21:38:06.0901289Z } 2023-01-11T21:38:06.0901350Z } 2023-01-11T21:38:06.0901434Z #pragma omp single 2023-01-11T21:38:06.0901502Z { 2023-01-11T21:38:06.0901568Z { 2023-01-11T21:38:06.0901635Z { 2023-01-11T21:38:06.0901731Z auto tmp0 = out_ptr1[0]; 2023-01-11T21:38:06.0901836Z auto tmp1 = static_cast(255); 2023-01-11T21:38:06.0901931Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0902054Z in_out_ptr4[0] = tmp2; 2023-01-11T21:38:06.0902125Z } 2023-01-11T21:38:06.0902192Z } 2023-01-11T21:38:06.0902258Z } 2023-01-11T21:38:06.0902342Z #pragma omp single 2023-01-11T21:38:06.0902402Z { 2023-01-11T21:38:06.0902467Z { 2023-01-11T21:38:06.0902534Z { 2023-01-11T21:38:06.0902629Z auto tmp0 = out_ptr3[0]; 2023-01-11T21:38:06.0902741Z auto tmp1 = static_cast(256); 2023-01-11T21:38:06.0902842Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0902934Z in_out_ptr5[0] = tmp2; 2023-01-11T21:38:06.0902999Z } 2023-01-11T21:38:06.0903066Z } 2023-01-11T21:38:06.0903134Z } 2023-01-11T21:38:06.0903216Z #pragma omp single 2023-01-11T21:38:06.0903283Z { 2023-01-11T21:38:06.0903348Z { 2023-01-11T21:38:06.0903409Z { 2023-01-11T21:38:06.0903506Z auto tmp0 = out_ptr9[0]; 2023-01-11T21:38:06.0903617Z auto tmp1 = 
static_cast(256); 2023-01-11T21:38:06.0903713Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0903845Z auto tmp3 = std::sqrt(tmp2); 2023-01-11T21:38:06.0903979Z in_out_ptr6[0] = tmp3; 2023-01-11T21:38:06.0904076Z } 2023-01-11T21:38:06.0904156Z } 2023-01-11T21:38:06.0904245Z } 2023-01-11T21:38:06.0904332Z } 2023-01-11T21:38:06.0904414Z } 2023-01-11T21:38:06.0904541Z ''') 2023-01-11T21:38:06.0904550Z 2023-01-11T21:38:06.0904555Z 2023-01-11T21:38:06.0904738Z async_compile.wait(globals()) 2023-01-11T21:38:06.0904846Z del async_compile 2023-01-11T21:38:06.0904853Z 2023-01-11T21:38:06.0904966Z def call(args): 2023-01-11T21:38:06.0905085Z arg0_1, = args 2023-01-11T21:38:06.0905229Z args.clear() 2023-01-11T21:38:06.0905582Z buf0 = empty_strided((1, 1, 1, 1), (1, 1, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0905911Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0906214Z buf2 = empty_strided((1, 1, 1, 1), (1, 1, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0906502Z buf3 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0906799Z buf10 = empty_strided((1, 1, 1, 1), (1, 1, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0907099Z buf4 = empty_strided((2, 4, 4, 1), (16, 4, 1, 32), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0907360Z buf5 = empty_strided((2, 4, 4), (16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0907684Z buf7 = empty_strided((2, 4, 4, 1), (16, 4, 1, 32), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0907817Z buf6 = buf5; del buf5 # reuse 2023-01-11T21:38:06.0908096Z buf8 = empty_strided((2, 4, 4), (16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0908229Z buf9 = buf8; del buf8 # reuse 2023-01-11T21:38:06.0908527Z buf11 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0908833Z buf12 = empty_strided((1, 1, 4, 8), (32, 32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0909109Z buf13 = empty_strided((4, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0909399Z buf15 = empty_strided((1, 1, 4, 8), (32, 32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0909538Z buf14 = buf13; del buf13 # reuse 2023-01-11T21:38:06.0909804Z buf16 = empty_strided((4, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0909936Z buf17 = buf16; del buf16 # reuse 2023-01-11T21:38:06.0910225Z buf18 = empty_strided((2, 4, 1, 8), (32, 8, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0910343Z buf19 = buf1; del buf1 # reuse 2023-01-11T21:38:06.0910459Z buf20 = buf3; del buf3 # reuse 2023-01-11T21:38:06.0910580Z buf21 = buf11; del buf11 # reuse 2023-01-11T21:38:06.0911147Z kernel_cpp_0(c_void_p(buf6.data_ptr()), c_void_p(buf9.data_ptr()), c_void_p(buf14.data_ptr()), c_void_p(buf17.data_ptr()), c_void_p(buf19.data_ptr()), c_void_p(buf20.data_ptr()), c_void_p(buf21.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf10.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(buf12.data_ptr()), c_void_p(buf15.data_ptr()), c_void_p(buf18.data_ptr())) 2023-01-11T21:38:06.0911233Z del arg0_1 2023-01-11T21:38:06.0911362Z return (buf19, buf20, buf6, buf9, buf21, buf14, buf17, buf18, ) 2023-01-11T21:38:06.0911369Z 2023-01-11T21:38:06.0911377Z 2023-01-11T21:38:06.0911462Z if __name__ == "__main__": 2023-01-11T21:38:06.0911584Z from torch._dynamo.testing import rand_strided 
2023-01-11T21:38:06.0912910Z test_strided_inputs_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')`
2023-01-11T21:38:06.0912992Z warnings.warn(
2023-01-11T21:38:06.0913292Z [2023-01-11 21:31:55,328] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 270
2023-01-11T21:38:06.0913559Z [2023-01-11 21:31:57,015] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 270
2023-01-11T21:38:06.0913565Z
2023-01-11T21:38:06.0913670Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0913750Z import torch
2023-01-11T21:38:06.0913831Z import random
2023-01-11T21:38:06.0913956Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0914076Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0914081Z
2023-01-11T21:38:06.0914168Z aten = torch.ops.aten
2023-01-11T21:38:06.0914308Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0914407Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0914412Z
2023-01-11T21:38:06.0914488Z import triton
2023-01-11T21:38:06.0914584Z import triton.language as tl
2023-01-11T21:38:06.0914710Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0914849Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0914863Z
2023-01-11T21:38:06.0914867Z
2023-01-11T21:38:06.0915001Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0915214Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0915342Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0915455Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.0915562Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0915634Z {
2023-01-11T21:38:06.0915742Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0915804Z {
2023-01-11T21:38:06.0915890Z #pragma omp for
2023-01-11T21:38:06.0915986Z for(long i0=0; i0<128; i0+=1)
2023-01-11T21:38:06.0916057Z {
2023-01-11T21:38:06.0916128Z {
2023-01-11T21:38:06.0916198Z {
2023-01-11T21:38:06.0916305Z auto tmp0 = in_ptr0[2*i0];
2023-01-11T21:38:06.0916398Z auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.0916495Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0916587Z out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.0916663Z }
2023-01-11T21:38:06.0916764Z }
2023-01-11T21:38:06.0916837Z }
2023-01-11T21:38:06.0916905Z }
2023-01-11T21:38:06.0916964Z }
2023-01-11T21:38:06.0917051Z ''')
2023-01-11T21:38:06.0917057Z
2023-01-11T21:38:06.0917062Z
2023-01-11T21:38:06.0917157Z async_compile.wait(globals())
2023-01-11T21:38:06.0917233Z del async_compile
2023-01-11T21:38:06.0917238Z
2023-01-11T21:38:06.0917317Z def call(args):
2023-01-11T21:38:06.0917398Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.0917476Z args.clear()
2023-01-11T21:38:06.0917668Z buf0 = empty_strided((8, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0917835Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0917911Z del arg0_1
2023-01-11T21:38:06.0917983Z del arg1_1
2023-01-11T21:38:06.0918060Z return (buf0, )
2023-01-11T21:38:06.0918065Z
2023-01-11T21:38:06.0918070Z
2023-01-11T21:38:06.0918149Z if __name__ == "__main__":
2023-01-11T21:38:06.0918270Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0918399Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0918592Z arg0_1 = rand_strided((8, 16), (32, 2), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0918789Z arg1_1 = rand_strided((8, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0918913Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.0918918Z
2023-01-11T21:38:06.0918988Z ok (1.700s)
2023-01-11T21:38:06.0919443Z test_sum1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0919621Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0919882Z [2023-01-11 21:31:57,030] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 271
2023-01-11T21:38:06.0920147Z [2023-01-11 21:31:58,726] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 271
2023-01-11T21:38:06.0920153Z
2023-01-11T21:38:06.0920251Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0920319Z import torch
2023-01-11T21:38:06.0920397Z import random
2023-01-11T21:38:06.0920517Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0920642Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0920647Z
2023-01-11T21:38:06.0920735Z aten = torch.ops.aten
2023-01-11T21:38:06.0920874Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0920970Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0920976Z
2023-01-11T21:38:06.0921051Z import triton
2023-01-11T21:38:06.0921137Z import triton.language as tl
2023-01-11T21:38:06.0921266Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0921408Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0921414Z
2023-01-11T21:38:06.0921418Z
2023-01-11T21:38:06.0921558Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0921767Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0921894Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0922004Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.0922109Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0922172Z {
2023-01-11T21:38:06.0922273Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0922342Z {
2023-01-11T21:38:06.0922424Z #pragma omp for
2023-01-11T21:38:06.0922512Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:06.0922581Z {
2023-01-11T21:38:06.0922643Z {
2023-01-11T21:38:06.0922873Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
2023-01-11T21:38:06.0922962Z float tmp3 = 0;
2023-01-11T21:38:06.0923092Z auto tmp3_vec = at::vec::Vectorized<float>(tmp3);
2023-01-11T21:38:06.0923189Z for(long i1=0; i1<1; i1+=1)
2023-01-11T21:38:06.0923262Z {
2023-01-11T21:38:06.0923412Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i0) + (8*i1));
2023-01-11T21:38:06.0923561Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i0) + (8*i1));
2023-01-11T21:38:06.0923662Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0923748Z tmp3_vec += tmp2;
2023-01-11T21:38:06.0923819Z }
2023-01-11T21:38:06.0924023Z tmp3 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp3_vec);
2023-01-11T21:38:06.0924153Z #pragma omp simd simdlen(4) reduction(+:tmp3)
2023-01-11T21:38:06.0924250Z for(long i1=8; i1<8; i1+=1)
2023-01-11T21:38:06.0924325Z {
2023-01-11T21:38:06.0924435Z auto tmp0 = in_ptr0[i1 + (8*i0)];
2023-01-11T21:38:06.0924532Z auto tmp1 = in_ptr1[i1 + (8*i0)];
2023-01-11T21:38:06.0924631Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0924717Z tmp3 += tmp2;
2023-01-11T21:38:06.0924789Z }
2023-01-11T21:38:06.0924878Z out_ptr0[i0] = tmp3;
2023-01-11T21:38:06.0924950Z }
2023-01-11T21:38:06.0925046Z }
2023-01-11T21:38:06.0925108Z }
2023-01-11T21:38:06.0925172Z }
2023-01-11T21:38:06.0925259Z ''')
2023-01-11T21:38:06.0925265Z
2023-01-11T21:38:06.0925269Z
2023-01-11T21:38:06.0925367Z async_compile.wait(globals())
2023-01-11T21:38:06.0925446Z del async_compile
2023-01-11T21:38:06.0925451Z
2023-01-11T21:38:06.0925528Z def call(args):
2023-01-11T21:38:06.0925614Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.0925684Z args.clear()
2023-01-11T21:38:06.0925879Z buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0926049Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0926126Z del arg0_1
2023-01-11T21:38:06.0926199Z del arg1_1
2023-01-11T21:38:06.0926278Z return (buf0, )
2023-01-11T21:38:06.0926283Z
2023-01-11T21:38:06.0926288Z
2023-01-11T21:38:06.0926372Z if __name__ == "__main__":
2023-01-11T21:38:06.0926495Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0926620Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0926819Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0927020Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0927143Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.0927148Z
2023-01-11T21:38:06.0927225Z ok (1.717s)
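For orientation, the kernel above is the canonical contiguous row reduction: one Vectorized<float> accumulator per output element, collapsed with vec_reduce_all, plus a scalar remainder loop that is empty here because the row length 8 equals the vector width. A Python-level equivalent reconstructed from the (8, 8) inputs and the (8,) output buffer (the actual test body is an assumption):

import torch

def fn(a, b):
    # row sums of an elementwise add; lowers to the vectorized
    # accumulate-then-vec_reduce_all pattern in kernel_cpp_0 above
    return (a + b).sum(-1)

opt_fn = torch._dynamo.optimize("inductor")(fn)
opt_fn(torch.randn(8, 8), torch.randn(8, 8))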
2023-01-11T21:38:06.0927671Z test_sum2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0927805Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0928065Z [2023-01-11 21:31:58,752] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 272
2023-01-11T21:38:06.0928337Z [2023-01-11 21:32:00,481] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 272
2023-01-11T21:38:06.0928343Z
2023-01-11T21:38:06.0928437Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0928543Z import torch
2023-01-11T21:38:06.0928621Z import random
2023-01-11T21:38:06.0928745Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0928871Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0928876Z
2023-01-11T21:38:06.0928961Z aten = torch.ops.aten
2023-01-11T21:38:06.0929102Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0929200Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0929206Z
2023-01-11T21:38:06.0929275Z import triton
2023-01-11T21:38:06.0929369Z import triton.language as tl
2023-01-11T21:38:06.0929496Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0929644Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0929650Z
2023-01-11T21:38:06.0929654Z
2023-01-11T21:38:06.0929794Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0930003Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0930134Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0930246Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.0930346Z float* __restrict__ out_ptr0,
2023-01-11T21:38:06.0930448Z float* __restrict__ out_ptr1)
2023-01-11T21:38:06.0930516Z {
2023-01-11T21:38:06.0930622Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0930691Z {
2023-01-11T21:38:06.0930776Z #pragma omp for
2023-01-11T21:38:06.0930867Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:06.0930930Z {
2023-01-11T21:38:06.0931019Z #pragma GCC ivdep
2023-01-11T21:38:06.0931143Z for(long i1=0; i1<21; i1+=1)
2023-01-11T21:38:06.0931213Z {
2023-01-11T21:38:06.0931284Z {
2023-01-11T21:38:06.0931356Z {
2023-01-11T21:38:06.0931438Z float tmp3 = 0;
2023-01-11T21:38:06.0931543Z for(long i2=0; i2<27; i2+=1)
2023-01-11T21:38:06.0931617Z {
2023-01-11T21:38:06.0931694Z {
2023-01-11T21:38:06.0931812Z auto tmp0 = in_ptr0[i1 + (21*i2) + (567*i0)];
2023-01-11T21:38:06.0931932Z auto tmp1 = in_ptr1[i1 + (21*i2) + (567*i0)];
2023-01-11T21:38:06.0932041Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0932134Z tmp3 += tmp2;
2023-01-11T21:38:06.0932203Z }
2023-01-11T21:38:06.0932277Z }
2023-01-11T21:38:06.0932381Z out_ptr0[i1 + (21*i0)] = tmp3;
2023-01-11T21:38:06.0932457Z }
2023-01-11T21:38:06.0932525Z }
2023-01-11T21:38:06.0932594Z }
2023-01-11T21:38:06.0932654Z }
2023-01-11T21:38:06.0932736Z #pragma omp for
2023-01-11T21:38:06.0932824Z for(long i0=0; i0<216; i0+=1)
2023-01-11T21:38:06.0932893Z {
2023-01-11T21:38:06.0932961Z {
2023-01-11T21:38:06.0933156Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
2023-01-11T21:38:06.0933243Z float tmp3 = 0;
2023-01-11T21:38:06.0933376Z auto tmp3_vec = at::vec::Vectorized<float>(tmp3);
2023-01-11T21:38:06.0933467Z for(long i1=0; i1<2; i1+=1)
2023-01-11T21:38:06.0933536Z {
2023-01-11T21:38:06.0933686Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i1) + (21*i0));
2023-01-11T21:38:06.0933832Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i1) + (21*i0));
2023-01-11T21:38:06.0933937Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0934026Z tmp3_vec += tmp2;
2023-01-11T21:38:06.0934097Z }
2023-01-11T21:38:06.0934324Z tmp3 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp3_vec);
2023-01-11T21:38:06.0934446Z #pragma omp simd simdlen(4) reduction(+:tmp3)
2023-01-11T21:38:06.0934690Z for(long i1=16; i1<21; i1+=1)
2023-01-11T21:38:06.0934767Z {
2023-01-11T21:38:06.0934875Z auto tmp0 = in_ptr0[i1 + (21*i0)];
2023-01-11T21:38:06.0934977Z auto tmp1 = in_ptr1[i1 + (21*i0)];
2023-01-11T21:38:06.0935076Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0935162Z tmp3 += tmp2;
2023-01-11T21:38:06.0935225Z }
2023-01-11T21:38:06.0935316Z out_ptr1[i0] = tmp3;
2023-01-11T21:38:06.0935384Z }
2023-01-11T21:38:06.0935452Z }
2023-01-11T21:38:06.0935522Z }
2023-01-11T21:38:06.0935589Z }
2023-01-11T21:38:06.0935671Z ''')
2023-01-11T21:38:06.0935684Z
2023-01-11T21:38:06.0935688Z
2023-01-11T21:38:06.0935776Z async_compile.wait(globals())
2023-01-11T21:38:06.0935857Z del async_compile
2023-01-11T21:38:06.0935862Z
2023-01-11T21:38:06.0935939Z def call(args):
2023-01-11T21:38:06.0936018Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.0936093Z args.clear()
2023-01-11T21:38:06.0936299Z buf0 = empty_strided((8, 21), (21, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0936503Z buf1 = empty_strided((8, 9, 3), (27, 3, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0936693Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:06.0936768Z del arg0_1
2023-01-11T21:38:06.0936842Z del arg1_1
2023-01-11T21:38:06.0936982Z return (buf0, buf1, )
2023-01-11T21:38:06.0936988Z
2023-01-11T21:38:06.0936992Z
2023-01-11T21:38:06.0937072Z if __name__ == "__main__":
2023-01-11T21:38:06.0937269Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0937400Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0937619Z arg0_1 = rand_strided((8, 9, 3, 21), (567, 63, 21, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0937821Z arg1_1 = rand_strided((8, 9, 3, 21), (567, 63, 21, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0937943Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.0937948Z
2023-01-11T21:38:06.0938020Z ok (1.749s)
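test_sum2 shows both reduction strategies side by side: the (8, 21) output reduces 27 elements at stride 21 and therefore stays scalar under #pragma GCC ivdep, while the 216-row reduction over the contiguous last dim of length 21 is vectorized with a real 5-element scalar tail (i1 = 16..21). A sketch consistent with the two returned buffers (assumed, not recorded in the log):

import torch

def fn(a, b):
    s = a + b
    # (8, 21): strided reduction over dims 1-2; (8, 9, 3): contiguous
    # reduction over the last dim, whose length 21 is not a multiple of 8
    return s.sum(dim=(1, 2)), s.sum(-1)

opt_fn = torch._dynamo.optimize("inductor")(fn)
opt_fn(torch.randn(8, 9, 3, 21), torch.randn(8, 9, 3, 21))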
2023-01-11T21:38:06.0938464Z test_sum3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0938601Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0938863Z [2023-01-11 21:32:00,501] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 273
2023-01-11T21:38:06.0939128Z [2023-01-11 21:32:02,228] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 273
2023-01-11T21:38:06.0939134Z
2023-01-11T21:38:06.0939233Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0939308Z import torch
2023-01-11T21:38:06.0939383Z import random
2023-01-11T21:38:06.0939496Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0939620Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0939625Z
2023-01-11T21:38:06.0939710Z aten = torch.ops.aten
2023-01-11T21:38:06.0939847Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0939948Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0939953Z
2023-01-11T21:38:06.0940029Z import triton
2023-01-11T21:38:06.0940122Z import triton.language as tl
2023-01-11T21:38:06.0940241Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0940421Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0940427Z
2023-01-11T21:38:06.0940432Z
2023-01-11T21:38:06.0940572Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0940777Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0940903Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0941010Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.0941115Z float* __restrict__ out_ptr0,
2023-01-11T21:38:06.0941217Z float* __restrict__ out_ptr1,
2023-01-11T21:38:06.0941312Z float* __restrict__ out_ptr2)
2023-01-11T21:38:06.0941382Z {
2023-01-11T21:38:06.0941485Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0941552Z {
2023-01-11T21:38:06.0941631Z #pragma omp for
2023-01-11T21:38:06.0941720Z for(long i0=0; i0<10; i0+=1)
2023-01-11T21:38:06.0941789Z {
2023-01-11T21:38:06.0941854Z {
2023-01-11T21:38:06.0942046Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
2023-01-11T21:38:06.0942132Z float tmp3 = 0;
2023-01-11T21:38:06.0942260Z auto tmp3_vec = at::vec::Vectorized<float>(tmp3);
2023-01-11T21:38:06.0942358Z for(long i1=0; i1<1; i1+=1)
2023-01-11T21:38:06.0942429Z {
2023-01-11T21:38:06.0942579Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i1) + (10*i0));
2023-01-11T21:38:06.0942724Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i1);
2023-01-11T21:38:06.0942845Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0942961Z tmp2.store(out_ptr0 + (8*i1) + (10*i0));
2023-01-11T21:38:06.0943049Z tmp3_vec += tmp2;
2023-01-11T21:38:06.0943121Z }
2023-01-11T21:38:06.0943322Z tmp3 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp3_vec);
2023-01-11T21:38:06.0943448Z #pragma omp simd simdlen(4) reduction(+:tmp3)
2023-01-11T21:38:06.0943547Z for(long i1=8; i1<10; i1+=1)
2023-01-11T21:38:06.0943617Z {
2023-01-11T21:38:06.0943717Z auto tmp0 = in_ptr0[i1 + (10*i0)];
2023-01-11T21:38:06.0943815Z auto tmp1 = in_ptr1[i1];
2023-01-11T21:38:06.0943912Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0944011Z out_ptr0[i1 + (10*i0)] = tmp2;
2023-01-11T21:38:06.0944099Z tmp3 += tmp2;
2023-01-11T21:38:06.0944174Z }
2023-01-11T21:38:06.0944262Z out_ptr1[i0] = tmp3;
2023-01-11T21:38:06.0944324Z }
2023-01-11T21:38:06.0944392Z }
2023-01-11T21:38:06.0944474Z #pragma omp for
2023-01-11T21:38:06.0944562Z for(long i0=0; i0<1; i0+=1)
2023-01-11T21:38:06.0944634Z {
2023-01-11T21:38:06.0944775Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.0944916Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(10));
2023-01-11T21:38:06.0945000Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0945099Z tmp2.store(out_ptr2 + 8*i0);
2023-01-11T21:38:06.0945170Z }
2023-01-11T21:38:06.0945268Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0945356Z for(long i0=8; i0<10; i0+=1)
2023-01-11T21:38:06.0945423Z {
2023-01-11T21:38:06.0945512Z auto tmp0 = in_ptr1[i0];
2023-01-11T21:38:06.0945614Z auto tmp1 = static_cast<float>(10);
2023-01-11T21:38:06.0945705Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0945790Z out_ptr2[i0] = tmp2;
2023-01-11T21:38:06.0945858Z }
2023-01-11T21:38:06.0945925Z }
2023-01-11T21:38:06.0945991Z }
2023-01-11T21:38:06.0946070Z ''')
2023-01-11T21:38:06.0946076Z
2023-01-11T21:38:06.0946115Z
2023-01-11T21:38:06.0946204Z async_compile.wait(globals())
2023-01-11T21:38:06.0946282Z del async_compile
2023-01-11T21:38:06.0946287Z
2023-01-11T21:38:06.0946360Z def call(args):
2023-01-11T21:38:06.0946441Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.0946517Z args.clear()
2023-01-11T21:38:06.0946717Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0946910Z buf1 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0947094Z buf2 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0947310Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()))
2023-01-11T21:38:06.0947388Z del arg0_1
2023-01-11T21:38:06.0947462Z del arg1_1
2023-01-11T21:38:06.0947549Z return (buf0, buf1, buf2, )
2023-01-11T21:38:06.0947554Z
2023-01-11T21:38:06.0947558Z
2023-01-11T21:38:06.0947642Z if __name__ == "__main__":
2023-01-11T21:38:06.0947763Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0947890Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0948083Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0948280Z arg1_1 = rand_strided((1, 10), (10, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0948402Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.0948407Z
2023-01-11T21:38:06.0948479Z ok (1.747s)
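test_sum3 fuses three outputs into one kernel: the broadcast add is stored (out_ptr0) and reduced on the fly (out_ptr1), and the second input gets an independent +10 (out_ptr2). A sketch matching the (10, 10) and (1, 10) inputs (assumed):

import torch

def fn(a, b):
    c = a + b              # stored and simultaneously row-reduced
    return c, c.sum(-1), b + 10

opt_fn = torch._dynamo.optimize("inductor")(fn)
opt_fn(torch.randn(10, 10), torch.randn(1, 10))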
2023-01-11T21:38:06.0948918Z test_sum4_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0949084Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0949344Z [2023-01-11 21:32:02,249] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 274
2023-01-11T21:38:06.0949610Z [2023-01-11 21:32:03,996] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 274
2023-01-11T21:38:06.0949615Z
2023-01-11T21:38:06.0949715Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0949783Z import torch
2023-01-11T21:38:06.0949859Z import random
2023-01-11T21:38:06.0949979Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0950104Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0950112Z
2023-01-11T21:38:06.0950195Z aten = torch.ops.aten
2023-01-11T21:38:06.0950334Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0950429Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0950434Z
2023-01-11T21:38:06.0950509Z import triton
2023-01-11T21:38:06.0950599Z import triton.language as tl
2023-01-11T21:38:06.0950726Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0950866Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0950872Z
2023-01-11T21:38:06.0950876Z
2023-01-11T21:38:06.0951015Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0951220Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0951345Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0951450Z float* __restrict__ out_ptr0,
2023-01-11T21:38:06.0951556Z float* __restrict__ out_ptr1,
2023-01-11T21:38:06.0951652Z float* __restrict__ out_ptr2,
2023-01-11T21:38:06.0951753Z float* __restrict__ out_ptr3,
2023-01-11T21:38:06.0951853Z float* __restrict__ out_ptr4)
2023-01-11T21:38:06.0951920Z {
2023-01-11T21:38:06.0952049Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0952119Z {
2023-01-11T21:38:06.0952203Z #pragma omp for
2023-01-11T21:38:06.0952286Z for(long i0=0; i0<128; i0+=1)
2023-01-11T21:38:06.0952355Z {
2023-01-11T21:38:06.0952496Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0952635Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.0952727Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0952824Z tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0952893Z }
2023-01-11T21:38:06.0952986Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0953082Z for(long i0=1024; i0<1024; i0+=1)
2023-01-11T21:38:06.0953151Z {
2023-01-11T21:38:06.0953241Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0953348Z auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.0953438Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0953525Z out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.0953587Z }
2023-01-11T21:38:06.0953669Z #pragma omp for
2023-01-11T21:38:06.0953759Z for(long i0=0; i0<128; i0+=1)
2023-01-11T21:38:06.0953827Z {
2023-01-11T21:38:06.0953895Z {
2023-01-11T21:38:06.0954087Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
2023-01-11T21:38:06.0954172Z float tmp1 = 0;
2023-01-11T21:38:06.0954292Z auto tmp1_vec = at::vec::Vectorized<float>(tmp1);
2023-01-11T21:38:06.0954387Z for(long i1=0; i1<1; i1+=1)
2023-01-11T21:38:06.0954485Z {
2023-01-11T21:38:06.0954637Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + (8*i0) + (8*i1));
2023-01-11T21:38:06.0954727Z tmp1_vec += tmp0;
2023-01-11T21:38:06.0954797Z }
2023-01-11T21:38:06.0955002Z tmp1 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp1_vec);
2023-01-11T21:38:06.0955132Z #pragma omp simd simdlen(4) reduction(+:tmp1)
2023-01-11T21:38:06.0955221Z for(long i1=8; i1<8; i1+=1)
2023-01-11T21:38:06.0955291Z {
2023-01-11T21:38:06.0955398Z auto tmp0 = out_ptr0[i1 + (8*i0)];
2023-01-11T21:38:06.0955482Z tmp1 += tmp0;
2023-01-11T21:38:06.0955551Z }
2023-01-11T21:38:06.0955638Z out_ptr1[i0] = tmp1;
2023-01-11T21:38:06.0955707Z }
2023-01-11T21:38:06.0955767Z }
2023-01-11T21:38:06.0955857Z #pragma omp for
2023-01-11T21:38:06.0955944Z for(long i0=0; i0<16; i0+=1)
2023-01-11T21:38:06.0956011Z {
2023-01-11T21:38:06.0956150Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr1 + 8*i0);
2023-01-11T21:38:06.0956294Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3));
2023-01-11T21:38:06.0956386Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0956476Z tmp2.store(out_ptr2 + 8*i0);
2023-01-11T21:38:06.0956546Z }
2023-01-11T21:38:06.0956648Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0956734Z for(long i0=128; i0<128; i0+=1)
2023-01-11T21:38:06.0956800Z {
2023-01-11T21:38:06.0956891Z auto tmp0 = out_ptr1[i0];
2023-01-11T21:38:06.0956996Z auto tmp1 = static_cast<float>(3);
2023-01-11T21:38:06.0957079Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0957164Z out_ptr2[i0] = tmp2;
2023-01-11T21:38:06.0957234Z }
2023-01-11T21:38:06.0957316Z #pragma omp for
2023-01-11T21:38:06.0957406Z for(long i0=0; i0<16; i0+=1)
2023-01-11T21:38:06.0957473Z {
2023-01-11T21:38:06.0957541Z {
2023-01-11T21:38:06.0957726Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
2023-01-11T21:38:06.0957839Z float tmp1 = 0;
2023-01-11T21:38:06.0957967Z auto tmp1_vec = at::vec::Vectorized<float>(tmp1);
2023-01-11T21:38:06.0958061Z for(long i1=0; i1<1; i1+=1)
2023-01-11T21:38:06.0958132Z {
2023-01-11T21:38:06.0958280Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr2 + (8*i0) + (8*i1));
2023-01-11T21:38:06.0958367Z tmp1_vec += tmp0;
2023-01-11T21:38:06.0958438Z }
2023-01-11T21:38:06.0958629Z tmp1 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp1_vec);
2023-01-11T21:38:06.0958762Z #pragma omp simd simdlen(4) reduction(+:tmp1)
2023-01-11T21:38:06.0958856Z for(long i1=8; i1<8; i1+=1)
2023-01-11T21:38:06.0958926Z {
2023-01-11T21:38:06.0959034Z auto tmp0 = out_ptr2[i1 + (8*i0)];
2023-01-11T21:38:06.0959119Z tmp1 += tmp0;
2023-01-11T21:38:06.0959189Z }
2023-01-11T21:38:06.0959269Z out_ptr3[i0] = tmp1;
2023-01-11T21:38:06.0959336Z }
2023-01-11T21:38:06.0959406Z }
2023-01-11T21:38:06.0959487Z #pragma omp for
2023-01-11T21:38:06.0959573Z for(long i0=0; i0<2; i0+=1)
2023-01-11T21:38:06.0959641Z {
2023-01-11T21:38:06.0959780Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr3 + 8*i0);
2023-01-11T21:38:06.0959913Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(5));
2023-01-11T21:38:06.0960003Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0960142Z tmp2.store(out_ptr4 + 8*i0);
2023-01-11T21:38:06.0960212Z }
2023-01-11T21:38:06.0960312Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0960399Z for(long i0=16; i0<16; i0+=1)
2023-01-11T21:38:06.0960467Z {
2023-01-11T21:38:06.0960550Z auto tmp0 = out_ptr3[i0];
2023-01-11T21:38:06.0960655Z auto tmp1 = static_cast<float>(5);
2023-01-11T21:38:06.0960747Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0960833Z out_ptr4[i0] = tmp2;
2023-01-11T21:38:06.0960901Z }
2023-01-11T21:38:06.0960968Z }
2023-01-11T21:38:06.0961034Z }
2023-01-11T21:38:06.0961116Z ''')
2023-01-11T21:38:06.0961121Z
2023-01-11T21:38:06.0961126Z
2023-01-11T21:38:06.0961218Z async_compile.wait(globals())
2023-01-11T21:38:06.0961296Z del async_compile
2023-01-11T21:38:06.0961302Z
2023-01-11T21:38:06.0961379Z def call(args):
2023-01-11T21:38:06.0961457Z arg0_1, = args
2023-01-11T21:38:06.0961534Z args.clear()
2023-01-11T21:38:06.0961754Z buf0 = empty_strided((1, 16, 8, 8), (1024, 64, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0961955Z buf1 = empty_strided((1, 16, 8), (128, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0962154Z buf2 = empty_strided((1, 16, 8), (128, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0962354Z buf3 = empty_strided((1, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0962552Z buf4 = empty_strided((1, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0962791Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()))
2023-01-11T21:38:06.0962865Z del arg0_1
2023-01-11T21:38:06.0962968Z return (buf4, buf3, buf2, buf1, buf0, )
2023-01-11T21:38:06.0962974Z
2023-01-11T21:38:06.0962978Z
2023-01-11T21:38:06.0963059Z if __name__ == "__main__":
2023-01-11T21:38:06.0963182Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0963304Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0963518Z arg0_1 = rand_strided((1, 16, 8, 8), (1024, 64, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0963632Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.0963669Z
2023-01-11T21:38:06.0963745Z ok (1.768s)
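test_sum4 chains add/reduce pairs and returns every intermediate, so nothing can be fused away; note that all the scalar remainder loops are degenerate (e.g. for(long i0=1024; i0<1024; ...)) because 1024, 128 and 16 are multiples of the 8-lane vector width. A sketch consistent with the five returned buffers (assumed):

import torch

def fn(x):
    a = x + 1
    a_sum = a.sum(-1)
    b = a_sum + 3
    b_sum = b.sum(-1)
    return b_sum + 5, b_sum, b, a_sum, a

opt_fn = torch._dynamo.optimize("inductor")(fn)
opt_fn(torch.randn(1, 16, 8, 8))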
2023-01-11T21:38:06.0964194Z test_sum5_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0964327Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0964588Z [2023-01-11 21:32:04,017] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 275
2023-01-11T21:38:06.0964854Z [2023-01-11 21:32:05,735] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 275
2023-01-11T21:38:06.0964860Z
2023-01-11T21:38:06.0964958Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0965027Z import torch
2023-01-11T21:38:06.0965104Z import random
2023-01-11T21:38:06.0965227Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0965352Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0965358Z
2023-01-11T21:38:06.0965442Z aten = torch.ops.aten
2023-01-11T21:38:06.0965580Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0965675Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0965681Z
2023-01-11T21:38:06.0965756Z import triton
2023-01-11T21:38:06.0965843Z import triton.language as tl
2023-01-11T21:38:06.0965968Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0966107Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0966139Z
2023-01-11T21:38:06.0966144Z
2023-01-11T21:38:06.0966284Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0966492Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0966617Z extern "C" void kernel(float* __restrict__ in_out_ptr0,
2023-01-11T21:38:06.0966729Z const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0966832Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0966892Z {
2023-01-11T21:38:06.0966985Z auto out_ptr1 = in_out_ptr0;
2023-01-11T21:38:06.0967088Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0967155Z {
2023-01-11T21:38:06.0967238Z #pragma omp for
2023-01-11T21:38:06.0967326Z for(long i0=0; i0<136; i0+=1)
2023-01-11T21:38:06.0967397Z {
2023-01-11T21:38:06.0967459Z {
2023-01-11T21:38:06.0967653Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
2023-01-11T21:38:06.0967743Z float tmp3 = 0;
2023-01-11T21:38:06.0967872Z auto tmp3_vec = at::vec::Vectorized<float>(tmp3);
2023-01-11T21:38:06.0967968Z for(long i1=0; i1<1; i1+=1)
2023-01-11T21:38:06.0968041Z {
2023-01-11T21:38:06.0968190Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i1) + (9*i0));
2023-01-11T21:38:06.0968336Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.0968428Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0968515Z tmp3_vec += tmp2;
2023-01-11T21:38:06.0968585Z }
2023-01-11T21:38:06.0968784Z tmp3 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp3_vec);
2023-01-11T21:38:06.0968913Z #pragma omp simd simdlen(4) reduction(+:tmp3)
2023-01-11T21:38:06.0969010Z for(long i1=8; i1<9; i1+=1)
2023-01-11T21:38:06.0969081Z {
2023-01-11T21:38:06.0969179Z auto tmp0 = in_ptr0[i1 + (9*i0)];
2023-01-11T21:38:06.0969291Z auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.0969414Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0969500Z tmp3 += tmp2;
2023-01-11T21:38:06.0969570Z }
2023-01-11T21:38:06.0969658Z out_ptr0[i0] = tmp3;
2023-01-11T21:38:06.0969728Z }
2023-01-11T21:38:06.0969790Z }
2023-01-11T21:38:06.0969873Z #pragma omp for
2023-01-11T21:38:06.0969959Z for(long i0=0; i0<17; i0+=1)
2023-01-11T21:38:06.0970027Z {
2023-01-11T21:38:06.0970095Z {
2023-01-11T21:38:06.0970286Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
2023-01-11T21:38:06.0970374Z float tmp3 = 0;
2023-01-11T21:38:06.0970494Z auto tmp3_vec = at::vec::Vectorized<float>(tmp3);
2023-01-11T21:38:06.0970591Z for(long i1=0; i1<1; i1+=1)
2023-01-11T21:38:06.0970661Z {
2023-01-11T21:38:06.0970812Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + (8*i0) + (8*i1));
2023-01-11T21:38:06.0970957Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3));
2023-01-11T21:38:06.0971055Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0971143Z tmp3_vec += tmp2;
2023-01-11T21:38:06.0971214Z }
2023-01-11T21:38:06.0971404Z tmp3 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp3_vec);
2023-01-11T21:38:06.0971530Z #pragma omp simd simdlen(4) reduction(+:tmp3)
2023-01-11T21:38:06.0971626Z for(long i1=8; i1<8; i1+=1)
2023-01-11T21:38:06.0971724Z {
2023-01-11T21:38:06.0971831Z auto tmp0 = out_ptr0[i1 + (8*i0)];
2023-01-11T21:38:06.0971941Z auto tmp1 = static_cast<float>(3);
2023-01-11T21:38:06.0972037Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0972122Z tmp3 += tmp2;
2023-01-11T21:38:06.0972188Z }
2023-01-11T21:38:06.0972276Z out_ptr1[i0] = tmp3;
2023-01-11T21:38:06.0972346Z }
2023-01-11T21:38:06.0972416Z }
2023-01-11T21:38:06.0972498Z #pragma omp for
2023-01-11T21:38:06.0972586Z for(long i0=0; i0<2; i0+=1)
2023-01-11T21:38:06.0972647Z {
2023-01-11T21:38:06.0972790Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr1 + 8*i0);
2023-01-11T21:38:06.0972928Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(5));
2023-01-11T21:38:06.0973020Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0973123Z tmp2.store(in_out_ptr0 + 8*i0);
2023-01-11T21:38:06.0973195Z }
2023-01-11T21:38:06.0973297Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0973387Z for(long i0=16; i0<17; i0+=1)
2023-01-11T21:38:06.0973447Z {
2023-01-11T21:38:06.0973539Z auto tmp0 = out_ptr1[i0];
2023-01-11T21:38:06.0973645Z auto tmp1 = static_cast<float>(5);
2023-01-11T21:38:06.0973734Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0973822Z in_out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.0973889Z }
2023-01-11T21:38:06.0973949Z }
2023-01-11T21:38:06.0974015Z }
2023-01-11T21:38:06.0974102Z ''')
2023-01-11T21:38:06.0974108Z
2023-01-11T21:38:06.0974112Z
2023-01-11T21:38:06.0974207Z async_compile.wait(globals())
2023-01-11T21:38:06.0974285Z del async_compile
2023-01-11T21:38:06.0974290Z
2023-01-11T21:38:06.0974366Z def call(args):
2023-01-11T21:38:06.0974441Z arg0_1, = args
2023-01-11T21:38:06.0974636Z args.clear()
2023-01-11T21:38:06.0974842Z buf0 = empty_strided((1, 17, 8), (136, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0975041Z buf1 = empty_strided((1, 17), (17, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0975131Z buf2 = buf1; del buf1 # reuse
2023-01-11T21:38:06.0975382Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0975458Z del arg0_1
2023-01-11T21:38:06.0975533Z return (buf2, )
2023-01-11T21:38:06.0975539Z
2023-01-11T21:38:06.0975543Z
2023-01-11T21:38:06.0975624Z if __name__ == "__main__":
2023-01-11T21:38:06.0975746Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0975866Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0976082Z arg0_1 = rand_strided((1, 17, 8, 9), (1224, 72, 9, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0976192Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.0976197Z
2023-01-11T21:38:06.0976267Z ok (1.769s)
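test_sum5 is the same chained pattern with extents 9 and 17 that are not multiples of 8, so the scalar remainder loops (i1 = 8..9, i0 = 16..17) actually run, and since only the final value escapes, the last add writes in place through in_out_ptr0. A sketch (assumed):

import torch

def fn(x):
    # only the final tensor is returned, so the +5 can be done in place
    return ((x + 1).sum(-1) + 3).sum(-1) + 5

opt_fn = torch._dynamo.optimize("inductor")(fn)
opt_fn(torch.randn(1, 17, 8, 9))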
2023-01-11T21:38:06.0976729Z test_sum_dtype_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0976859Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0977116Z [2023-01-11 21:32:05,788] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 276
2023-01-11T21:38:06.0977474Z [2023-01-11 21:32:07,506] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 276
2023-01-11T21:38:06.0977481Z
2023-01-11T21:38:06.0977600Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0977673Z import torch
2023-01-11T21:38:06.0977750Z import random
2023-01-11T21:38:06.0977918Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0978045Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0978050Z
2023-01-11T21:38:06.0978132Z aten = torch.ops.aten
2023-01-11T21:38:06.0978268Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0978366Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0978374Z
2023-01-11T21:38:06.0978449Z import triton
2023-01-11T21:38:06.0978535Z import triton.language as tl
2023-01-11T21:38:06.0978659Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0978799Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0978805Z
2023-01-11T21:38:06.0978809Z
2023-01-11T21:38:06.0978945Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0979149Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0979272Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0979383Z double* __restrict__ out_ptr0,
2023-01-11T21:38:06.0979492Z double* __restrict__ out_ptr1,
2023-01-11T21:38:06.0979587Z double* __restrict__ out_ptr2)
2023-01-11T21:38:06.0979652Z {
2023-01-11T21:38:06.0979756Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0979828Z {
2023-01-11T21:38:06.0979909Z #pragma omp for
2023-01-11T21:38:06.0980002Z for(long i0=0; i0<32; i0+=1)
2023-01-11T21:38:06.0980062Z {
2023-01-11T21:38:06.0980132Z {
2023-01-11T21:38:06.0980202Z {
2023-01-11T21:38:06.0980288Z double tmp2 = 0;
2023-01-11T21:38:06.0980385Z for(long i1=0; i1<32; i1+=1)
2023-01-11T21:38:06.0980456Z {
2023-01-11T21:38:06.0980530Z {
2023-01-11T21:38:06.0980633Z auto tmp0 = in_ptr0[i1 + (32*i0)];
2023-01-11T21:38:06.0980755Z auto tmp1 = static_cast<double>(tmp0);
2023-01-11T21:38:06.0980846Z tmp2 += tmp1;
2023-01-11T21:38:06.0980919Z }
2023-01-11T21:38:06.0980991Z }
2023-01-11T21:38:06.0981082Z out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.0981152Z }
2023-01-11T21:38:06.0981244Z }
2023-01-11T21:38:06.0981312Z }
2023-01-11T21:38:06.0981378Z }
2023-01-11T21:38:06.0981443Z {
2023-01-11T21:38:06.0981512Z {
2023-01-11T21:38:06.0981595Z double tmp2 = 0;
2023-01-11T21:38:06.0981700Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0981770Z {
2023-01-11T21:38:06.0981877Z #pragma omp for reduction(+:tmp2)
2023-01-11T21:38:06.0981973Z for(long i0=0; i0<1024; i0+=1)
2023-01-11T21:38:06.0982042Z {
2023-01-11T21:38:06.0982113Z {
2023-01-11T21:38:06.0982215Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0982328Z auto tmp1 = static_cast<double>(tmp0);
2023-01-11T21:38:06.0982413Z tmp2 += tmp1;
2023-01-11T21:38:06.0982484Z }
2023-01-11T21:38:06.0982554Z }
2023-01-11T21:38:06.0982627Z }
2023-01-11T21:38:06.0982710Z out_ptr1[0] = tmp2;
2023-01-11T21:38:06.0982780Z }
2023-01-11T21:38:06.0982838Z }
2023-01-11T21:38:06.0982941Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0983005Z {
2023-01-11T21:38:06.0983086Z #pragma omp for
2023-01-11T21:38:06.0983172Z for(long i0=0; i0<32; i0+=1)
2023-01-11T21:38:06.0983239Z {
2023-01-11T21:38:06.0983324Z #pragma GCC ivdep
2023-01-11T21:38:06.0983407Z for(long i1=0; i1<32; i1+=1)
2023-01-11T21:38:06.0983475Z {
2023-01-11T21:38:06.0983549Z {
2023-01-11T21:38:06.0983620Z {
2023-01-11T21:38:06.0983730Z auto tmp0 = in_ptr0[i1 + (32*i0)];
2023-01-11T21:38:06.0983863Z auto tmp2 = out_ptr0[i1];
2023-01-11T21:38:06.0983962Z auto tmp4 = out_ptr1[0];
2023-01-11T21:38:06.0984071Z auto tmp1 = static_cast<double>(tmp0);
2023-01-11T21:38:06.0984170Z auto tmp3 = tmp1 * tmp2;
2023-01-11T21:38:06.0984273Z auto tmp5 = tmp3 + tmp4;
2023-01-11T21:38:06.0984378Z out_ptr2[i1 + (32*i0)] = tmp5;
2023-01-11T21:38:06.0984449Z }
2023-01-11T21:38:06.0984518Z }
2023-01-11T21:38:06.0984586Z }
2023-01-11T21:38:06.0984645Z }
2023-01-11T21:38:06.0984711Z }
2023-01-11T21:38:06.0984776Z }
2023-01-11T21:38:06.0984862Z ''')
2023-01-11T21:38:06.0984868Z
2023-01-11T21:38:06.0984872Z
2023-01-11T21:38:06.0984968Z async_compile.wait(globals())
2023-01-11T21:38:06.0985044Z del async_compile
2023-01-11T21:38:06.0985049Z
2023-01-11T21:38:06.0985124Z def call(args):
2023-01-11T21:38:06.0985193Z arg0_1, = args
2023-01-11T21:38:06.0985268Z args.clear()
2023-01-11T21:38:06.0985463Z buf0 = empty_strided((32, ), (1, ), device='cpu', dtype=torch.float64)
2023-01-11T21:38:06.0985653Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float64)
2023-01-11T21:38:06.0985855Z buf2 = empty_strided((32, 32), (32, 1), device='cpu', dtype=torch.float64)
2023-01-11T21:38:06.0986052Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()))
2023-01-11T21:38:06.0986125Z del arg0_1
2023-01-11T21:38:06.0986194Z return (buf2, )
2023-01-11T21:38:06.0986207Z
2023-01-11T21:38:06.0986211Z
2023-01-11T21:38:06.0986285Z if __name__ == "__main__":
2023-01-11T21:38:06.0986403Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0986530Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0986730Z arg0_1 = rand_strided((32, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0986849Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.0986854Z
2023-01-11T21:38:06.0986924Z ok (1.740s)
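test_sum_dtype exercises sum's dtype= argument: the float32 input is upcast element by element (static_cast<double>(tmp0)) and accumulated in double, and all output buffers are float64. A sketch that would produce the three loops above, row sums, a full sum, and the broadcasting multiply-add consumer (the exact expression is an assumption):

import torch

def fn(x):
    # both reductions accumulate in float64; broadcasting the (32,) row
    # sums against (32, 32) indexes them by column, as out_ptr0[i1] does
    return x * x.sum(-1, dtype=torch.float64) + x.sum(dtype=torch.float64)

opt_fn = torch._dynamo.optimize("inductor")(fn)
opt_fn(torch.randn(32, 32))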
2023-01-11T21:38:06.0987406Z test_sum_int_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0987540Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0987799Z [2023-01-11 21:32:07,525] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 277
2023-01-11T21:38:06.0988057Z [2023-01-11 21:32:09,232] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 277
2023-01-11T21:38:06.0988477Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0988609Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0988867Z [2023-01-11 21:32:09,250] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 278
2023-01-11T21:38:06.0989130Z [2023-01-11 21:32:10,966] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 278
2023-01-11T21:38:06.0989541Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0989702Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0989956Z [2023-01-11 21:32:10,984] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 279
2023-01-11T21:38:06.0990216Z [2023-01-11 21:32:12,685] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 279
2023-01-11T21:38:06.0990222Z
2023-01-11T21:38:06.0990320Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0990392Z import torch
2023-01-11T21:38:06.0990460Z import random
2023-01-11T21:38:06.0990581Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0990703Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0990709Z
2023-01-11T21:38:06.0990791Z aten = torch.ops.aten
2023-01-11T21:38:06.0990928Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0991028Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0991034Z
2023-01-11T21:38:06.0991110Z import triton
2023-01-11T21:38:06.0991196Z import triton.language as tl
2023-01-11T21:38:06.0991320Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0991461Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0991469Z
2023-01-11T21:38:06.0991473Z
2023-01-11T21:38:06.0991610Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0991816Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0991938Z extern "C" void kernel(long* __restrict__ in_out_ptr0,
2023-01-11T21:38:06.0992047Z const bool* __restrict__ in_ptr0,
2023-01-11T21:38:06.0992146Z long* __restrict__ out_ptr1)
2023-01-11T21:38:06.0992204Z {
2023-01-11T21:38:06.0992295Z auto out_ptr0 = in_out_ptr0;
2023-01-11T21:38:06.0992361Z {
2023-01-11T21:38:06.0992428Z {
2023-01-11T21:38:06.0992510Z long tmp2 = 0;
2023-01-11T21:38:06.0992589Z long tmp3 = 0;
2023-01-11T21:38:06.0992699Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0992760Z {
2023-01-11T21:38:06.0992893Z #pragma omp for reduction(+:tmp2) reduction(+:tmp3)
2023-01-11T21:38:06.0993016Z for(long i0=0; i0<64; i0+=1)
2023-01-11T21:38:06.0993087Z {
2023-01-11T21:38:06.0993161Z {
2023-01-11T21:38:06.0993262Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0993377Z auto tmp1 = static_cast<long>(tmp0);
2023-01-11T21:38:06.0993455Z tmp2 += tmp1;
2023-01-11T21:38:06.0993538Z tmp3 += tmp1;
2023-01-11T21:38:06.0993608Z }
2023-01-11T21:38:06.0993675Z }
2023-01-11T21:38:06.0993743Z }
2023-01-11T21:38:06.0993826Z out_ptr0[0] = tmp2;
2023-01-11T21:38:06.0993907Z out_ptr1[0] = tmp3;
2023-01-11T21:38:06.0993975Z }
2023-01-11T21:38:06.0994040Z }
2023-01-11T21:38:06.0994104Z {
2023-01-11T21:38:06.0994174Z {
2023-01-11T21:38:06.0994265Z auto tmp0 = out_ptr0[0];
2023-01-11T21:38:06.0994353Z auto tmp3 = out_ptr1[0];
2023-01-11T21:38:06.0994451Z auto tmp1 = static_cast<long>(2);
2023-01-11T21:38:06.0994542Z auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:06.0994629Z auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.0994716Z in_out_ptr0[0] = tmp4;
2023-01-11T21:38:06.0994782Z }
2023-01-11T21:38:06.0994858Z }
2023-01-11T21:38:06.0994937Z }
2023-01-11T21:38:06.0995021Z ''')
2023-01-11T21:38:06.0995028Z
2023-01-11T21:38:06.0995033Z
2023-01-11T21:38:06.0995142Z async_compile.wait(globals())
2023-01-11T21:38:06.0995216Z del async_compile
2023-01-11T21:38:06.0995221Z
2023-01-11T21:38:06.0995296Z def call(args):
2023-01-11T21:38:06.0995370Z arg0_1, = args
2023-01-11T21:38:06.0995477Z args.clear()
2023-01-11T21:38:06.0995665Z buf0 = empty_strided((), (), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.0995838Z buf1 = empty_strided((), (), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.0995930Z buf2 = buf0; del buf0 # reuse
2023-01-11T21:38:06.0996102Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:06.0996175Z del arg0_1
2023-01-11T21:38:06.0996249Z return (buf2, )
2023-01-11T21:38:06.0996255Z
2023-01-11T21:38:06.0996259Z
2023-01-11T21:38:06.0996339Z if __name__ == "__main__":
2023-01-11T21:38:06.0996458Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0996584Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0996765Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.bool)
2023-01-11T21:38:06.0996877Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.0996882Z
2023-01-11T21:38:06.0996890Z
2023-01-11T21:38:06.0996991Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0997067Z import torch
2023-01-11T21:38:06.0997143Z import random
2023-01-11T21:38:06.0997262Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0997385Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0997390Z
2023-01-11T21:38:06.0997478Z aten = torch.ops.aten
2023-01-11T21:38:06.0997607Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0997702Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0997707Z
2023-01-11T21:38:06.0997780Z import triton
2023-01-11T21:38:06.0997872Z import triton.language as tl
2023-01-11T21:38:06.0997998Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0998136Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0998141Z
2023-01-11T21:38:06.0998146Z
2023-01-11T21:38:06.0998282Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0998489Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0998604Z extern "C" void kernel(long* __restrict__ in_out_ptr0,
2023-01-11T21:38:06.0998727Z const unsigned char* __restrict__ in_ptr0,
2023-01-11T21:38:06.0998829Z long* __restrict__ out_ptr1)
2023-01-11T21:38:06.0998925Z {
2023-01-11T21:38:06.0999018Z auto out_ptr0 = in_out_ptr0;
2023-01-11T21:38:06.0999084Z {
2023-01-11T21:38:06.0999150Z {
2023-01-11T21:38:06.0999223Z long tmp2 = 0;
2023-01-11T21:38:06.0999302Z long tmp3 = 0;
2023-01-11T21:38:06.0999413Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0999480Z {
2023-01-11T21:38:06.0999612Z #pragma omp for reduction(+:tmp2) reduction(+:tmp3)
2023-01-11T21:38:06.0999706Z for(long i0=0; i0<64; i0+=1)
2023-01-11T21:38:06.0999775Z {
2023-01-11T21:38:06.0999839Z {
2023-01-11T21:38:06.0999940Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1000054Z auto tmp1 = static_cast<long>(tmp0);
2023-01-11T21:38:06.1000144Z tmp2 += tmp1;
2023-01-11T21:38:06.1000228Z tmp3 += tmp1;
2023-01-11T21:38:06.1000297Z }
2023-01-11T21:38:06.1000367Z }
2023-01-11T21:38:06.1000428Z }
2023-01-11T21:38:06.1000514Z out_ptr0[0] = tmp2;
2023-01-11T21:38:06.1000597Z out_ptr1[0] = tmp3;
2023-01-11T21:38:06.1000663Z }
2023-01-11T21:38:06.1000729Z }
2023-01-11T21:38:06.1000794Z {
2023-01-11T21:38:06.1000853Z {
2023-01-11T21:38:06.1000942Z auto tmp0 = out_ptr0[0];
2023-01-11T21:38:06.1001029Z auto tmp3 = out_ptr1[0];
2023-01-11T21:38:06.1001132Z auto tmp1 = static_cast<long>(2);
2023-01-11T21:38:06.1001220Z auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:06.1001308Z auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1001432Z in_out_ptr0[0] = tmp4;
2023-01-11T21:38:06.1001491Z }
2023-01-11T21:38:06.1001557Z }
2023-01-11T21:38:06.1001620Z }
2023-01-11T21:38:06.1001705Z ''')
2023-01-11T21:38:06.1001710Z
2023-01-11T21:38:06.1001714Z
2023-01-11T21:38:06.1001806Z async_compile.wait(globals())
2023-01-11T21:38:06.1001886Z del async_compile
2023-01-11T21:38:06.1001891Z
2023-01-11T21:38:06.1001965Z def call(args):
2023-01-11T21:38:06.1002032Z arg0_1, = args
2023-01-11T21:38:06.1002106Z args.clear()
2023-01-11T21:38:06.1002286Z buf0 = empty_strided((), (), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.1002464Z buf1 = empty_strided((), (), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.1002553Z buf2 = buf0; del buf0 # reuse
2023-01-11T21:38:06.1002722Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:06.1002796Z del arg0_1
2023-01-11T21:38:06.1002864Z return (buf2, )
2023-01-11T21:38:06.1002882Z
2023-01-11T21:38:06.1002887Z
2023-01-11T21:38:06.1002960Z if __name__ == "__main__":
2023-01-11T21:38:06.1003076Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1003203Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1003396Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.uint8)
2023-01-11T21:38:06.1003508Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.1003513Z
2023-01-11T21:38:06.1003517Z
2023-01-11T21:38:06.1003615Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1003687Z import torch
2023-01-11T21:38:06.1003754Z import random
2023-01-11T21:38:06.1003873Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1003994Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1003999Z
2023-01-11T21:38:06.1004080Z aten = torch.ops.aten
2023-01-11T21:38:06.1004217Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1004314Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1004319Z
2023-01-11T21:38:06.1004395Z import triton
2023-01-11T21:38:06.1004489Z import triton.language as tl
2023-01-11T21:38:06.1004607Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1004744Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1004776Z
2023-01-11T21:38:06.1004781Z
2023-01-11T21:38:06.1004917Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1005125Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1005243Z extern "C" void kernel(long* __restrict__ in_out_ptr0,
2023-01-11T21:38:06.1005350Z const int* __restrict__ in_ptr0,
2023-01-11T21:38:06.1005453Z long* __restrict__ out_ptr1)
2023-01-11T21:38:06.1005516Z {
2023-01-11T21:38:06.1005599Z auto out_ptr0 = in_out_ptr0;
2023-01-11T21:38:06.1005664Z {
2023-01-11T21:38:06.1005729Z {
2023-01-11T21:38:06.1005812Z long tmp2 = 0;
2023-01-11T21:38:06.1005891Z long tmp3 = 0;
2023-01-11T21:38:06.1006002Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1006070Z {
2023-01-11T21:38:06.1006195Z #pragma omp for reduction(+:tmp2) reduction(+:tmp3)
2023-01-11T21:38:06.1006291Z for(long i0=0; i0<64; i0+=1)
2023-01-11T21:38:06.1006360Z {
2023-01-11T21:38:06.1006428Z {
2023-01-11T21:38:06.1006531Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1006645Z auto tmp1 = static_cast<long>(tmp0);
2023-01-11T21:38:06.1006730Z tmp2 += tmp1;
2023-01-11T21:38:06.1006806Z tmp3 += tmp1;
2023-01-11T21:38:06.1006876Z }
2023-01-11T21:38:06.1006945Z }
2023-01-11T21:38:06.1007012Z }
2023-01-11T21:38:06.1007096Z out_ptr0[0] = tmp2;
2023-01-11T21:38:06.1007206Z out_ptr1[0] = tmp3;
2023-01-11T21:38:06.1007266Z }
2023-01-11T21:38:06.1007332Z }
2023-01-11T21:38:06.1007397Z {
2023-01-11T21:38:06.1007463Z {
2023-01-11T21:38:06.1007551Z auto tmp0 = out_ptr0[0];
2023-01-11T21:38:06.1007644Z auto tmp3 = out_ptr1[0];
2023-01-11T21:38:06.1007750Z auto tmp1 = static_cast<long>(2);
2023-01-11T21:38:06.1007833Z auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:06.1007922Z auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1008008Z in_out_ptr0[0] = tmp4;
2023-01-11T21:38:06.1008076Z }
2023-01-11T21:38:06.1008141Z }
2023-01-11T21:38:06.1008205Z }
2023-01-11T21:38:06.1008282Z ''')
2023-01-11T21:38:06.1008294Z
2023-01-11T21:38:06.1008298Z
2023-01-11T21:38:06.1008385Z async_compile.wait(globals())
2023-01-11T21:38:06.1008462Z del async_compile
2023-01-11T21:38:06.1008467Z
2023-01-11T21:38:06.1008541Z def call(args):
2023-01-11T21:38:06.1008615Z arg0_1, = args
2023-01-11T21:38:06.1008694Z args.clear()
2023-01-11T21:38:06.1008876Z buf0 = empty_strided((), (), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.1009053Z buf1 = empty_strided((), (), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.1009135Z buf2 = buf0; del buf0 # reuse
2023-01-11T21:38:06.1009300Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:06.1009374Z del arg0_1
2023-01-11T21:38:06.1009449Z return (buf2, )
2023-01-11T21:38:06.1009454Z
2023-01-11T21:38:06.1009459Z
2023-01-11T21:38:06.1009539Z if __name__ == "__main__":
2023-01-11T21:38:06.1009655Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1009782Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1009974Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int32)
2023-01-11T21:38:06.1010078Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.1010083Z
2023-01-11T21:38:06.1010155Z ok (5.178s)
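The three kernels above are one graph specialized for bool, uint8 and int32 inputs; in each case the accumulators are C long, matching PyTorch's promotion of integral sums to int64, and the epilogue combines the two reductions as 2*s + s into the reused in_out_ptr0. A sketch of the presumed common test body (the combination is read off the epilogue; the actual test source is an assumption):

import torch

def fn(x):
    # integral sums promote to int64 ("long" in the generated kernels)
    return x.sum() * 2 + x.sum(0)

opt_fn = torch._dynamo.optimize("inductor")(fn)
for dtype in (torch.bool, torch.uint8, torch.int32):
    opt_fn((torch.arange(64) % 2).to(dtype))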
dtype=torch.int32) 2023-01-11T21:38:06.1010078Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1010083Z 2023-01-11T21:38:06.1010155Z ok (5.178s) 2023-01-11T21:38:06.1010640Z test_sum_keepdims_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1010774Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1011032Z [2023-01-11 21:32:12,701] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 280 2023-01-11T21:38:06.1011295Z [2023-01-11 21:32:12,710] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 280 2023-01-11T21:38:06.1011301Z 2023-01-11T21:38:06.1011399Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1011473Z import torch 2023-01-11T21:38:06.1011550Z import random 2023-01-11T21:38:06.1011664Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1011786Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1011791Z 2023-01-11T21:38:06.1011873Z aten = torch.ops.aten 2023-01-11T21:38:06.1012007Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1012106Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1012112Z 2023-01-11T21:38:06.1012186Z import triton 2023-01-11T21:38:06.1012280Z import triton.language as tl 2023-01-11T21:38:06.1012404Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1012536Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1012542Z 2023-01-11T21:38:06.1012546Z 2023-01-11T21:38:06.1012684Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1012891Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1013014Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1013152Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1013256Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1013323Z { 2023-01-11T21:38:06.1013426Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1013485Z { 2023-01-11T21:38:06.1013568Z #pragma omp for 2023-01-11T21:38:06.1013656Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1013722Z { 2023-01-11T21:38:06.1013790Z { 2023-01-11T21:38:06.1013985Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1014071Z float tmp3 = 0; 2023-01-11T21:38:06.1014192Z auto tmp3_vec = at::vec::Vectorized(tmp3); 2023-01-11T21:38:06.1014287Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.1014356Z { 2023-01-11T21:38:06.1014619Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.1014772Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.1014867Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1014955Z tmp3_vec += tmp2; 2023-01-11T21:38:06.1015016Z } 2023-01-11T21:38:06.1015220Z tmp3 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp3_vec); 2023-01-11T21:38:06.1015350Z #pragma omp simd simdlen(4) reduction(+:tmp3) 2023-01-11T21:38:06.1015445Z for(long i1=8; i1<8; i1+=1) 
2023-01-11T21:38:06.1015515Z { 2023-01-11T21:38:06.1015621Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.1015730Z auto tmp1 = in_ptr1[i1 + (8*i0)]; 2023-01-11T21:38:06.1015825Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1015901Z tmp3 += tmp2; 2023-01-11T21:38:06.1015973Z } 2023-01-11T21:38:06.1016059Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.1016125Z } 2023-01-11T21:38:06.1016190Z } 2023-01-11T21:38:06.1016254Z } 2023-01-11T21:38:06.1016317Z } 2023-01-11T21:38:06.1016397Z ''') 2023-01-11T21:38:06.1016402Z 2023-01-11T21:38:06.1016448Z 2023-01-11T21:38:06.1016542Z async_compile.wait(globals()) 2023-01-11T21:38:06.1016619Z del async_compile 2023-01-11T21:38:06.1016623Z 2023-01-11T21:38:06.1016696Z def call(args): 2023-01-11T21:38:06.1016776Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1016849Z args.clear() 2023-01-11T21:38:06.1017048Z buf0 = empty_strided((8, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1017268Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1017345Z del arg0_1 2023-01-11T21:38:06.1017417Z del arg1_1 2023-01-11T21:38:06.1017494Z return (buf0, ) 2023-01-11T21:38:06.1017503Z 2023-01-11T21:38:06.1017507Z 2023-01-11T21:38:06.1017587Z if __name__ == "__main__": 2023-01-11T21:38:06.1017705Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1017832Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1018032Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1018221Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1018343Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1018348Z 2023-01-11T21:38:06.1018418Z ok (0.025s) 2023-01-11T21:38:06.1018866Z test_tanh_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1019042Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1019302Z [2023-01-11 21:32:12,739] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 281 2023-01-11T21:38:06.1019567Z [2023-01-11 21:32:14,415] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 281 2023-01-11T21:38:06.1019573Z 2023-01-11T21:38:06.1019672Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1019746Z import torch 2023-01-11T21:38:06.1019814Z import random 2023-01-11T21:38:06.1019934Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1020057Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1020062Z 2023-01-11T21:38:06.1020142Z aten = torch.ops.aten 2023-01-11T21:38:06.1020278Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1020375Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1020384Z 2023-01-11T21:38:06.1020457Z import triton 2023-01-11T21:38:06.1020549Z import triton.language as tl 2023-01-11T21:38:06.1020667Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1020808Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1020814Z 2023-01-11T21:38:06.1020818Z 2023-01-11T21:38:06.1020957Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1021162Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1021288Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1021393Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1021494Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.1021560Z { 2023-01-11T21:38:06.1021655Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1021722Z { 2023-01-11T21:38:06.1021804Z #pragma omp for 2023-01-11T21:38:06.1021893Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.1021961Z { 2023-01-11T21:38:06.1022101Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1022191Z auto tmp1 = tmp0.tanh(); 2023-01-11T21:38:06.1022320Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:06.1022442Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.1022578Z auto tmp4 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.1022667Z auto tmp5 = tmp0 + tmp4; 2023-01-11T21:38:06.1022756Z auto tmp6 = tmp5.tanh(); 2023-01-11T21:38:06.1022852Z tmp3.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1022945Z tmp6.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.1023006Z } 2023-01-11T21:38:06.1023106Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1023193Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:06.1023257Z { 2023-01-11T21:38:06.1023349Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1023447Z auto tmp1 = std::tanh(tmp0); 2023-01-11T21:38:06.1023558Z auto tmp2 = static_cast<float>(2); 2023-01-11T21:38:06.1023640Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.1023743Z auto tmp4 = static_cast<float>(1); 2023-01-11T21:38:06.1023834Z auto tmp5 = tmp0 + tmp4; 2023-01-11T21:38:06.1023933Z auto tmp6 = std::tanh(tmp5); 2023-01-11T21:38:06.1024020Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.1024101Z out_ptr1[i0] = tmp6; 2023-01-11T21:38:06.1024169Z } 2023-01-11T21:38:06.1024228Z } 2023-01-11T21:38:06.1024292Z } 2023-01-11T21:38:06.1024377Z ''') 2023-01-11T21:38:06.1024383Z 2023-01-11T21:38:06.1024387Z 2023-01-11T21:38:06.1024483Z
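# [editor annotation] async_compile.wait(globals()) below blocks until the queued
# kernel builds finish and rebinds names like kernel_cpp_0 in this module's globals
# to the compiled shared-library entry points; nothing calls a kernel before that.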
async_compile.wait(globals()) 2023-01-11T21:38:06.1024560Z del async_compile 2023-01-11T21:38:06.1024565Z 2023-01-11T21:38:06.1024639Z def call(args): 2023-01-11T21:38:06.1024712Z arg0_1, = args 2023-01-11T21:38:06.1024810Z args.clear() 2023-01-11T21:38:06.1025012Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1025213Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1025384Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1025459Z del arg0_1 2023-01-11T21:38:06.1025540Z return (buf0, buf1, ) 2023-01-11T21:38:06.1025546Z 2023-01-11T21:38:06.1025550Z 2023-01-11T21:38:06.1025628Z if __name__ == "__main__": 2023-01-11T21:38:06.1025746Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1025865Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1026064Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1026176Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1026182Z 2023-01-11T21:38:06.1026252Z ok (1.706s) 2023-01-11T21:38:06.1026708Z test_tensor1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1026840Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1027097Z [2023-01-11 21:32:14,433] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 282 2023-01-11T21:38:06.1027361Z [2023-01-11 21:32:16,107] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 282 2023-01-11T21:38:06.1027367Z 2023-01-11T21:38:06.1027464Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1027539Z import torch 2023-01-11T21:38:06.1027607Z import random 2023-01-11T21:38:06.1027728Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1027852Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1027857Z 2023-01-11T21:38:06.1027941Z aten = torch.ops.aten 2023-01-11T21:38:06.1028078Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1028174Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1028210Z 2023-01-11T21:38:06.1028287Z import triton 2023-01-11T21:38:06.1028372Z import triton.language as tl 2023-01-11T21:38:06.1028496Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1028635Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1028640Z 2023-01-11T21:38:06.1028645Z 2023-01-11T21:38:06.1028780Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1028985Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1029111Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1029222Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1029326Z long* __restrict__ out_ptr1) 2023-01-11T21:38:06.1029384Z { 2023-01-11T21:38:06.1029485Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1029552Z { 2023-01-11T21:38:06.1029635Z #pragma omp for 2023-01-11T21:38:06.1029725Z for(long i0=0; i0<10; i0+=1) 
2023-01-11T21:38:06.1029793Z { 2023-01-11T21:38:06.1029858Z { 2023-01-11T21:38:06.1029920Z { 2023-01-11T21:38:06.1030018Z auto tmp2 = in_ptr0[i0]; 2023-01-11T21:38:06.1030126Z auto tmp0 = static_cast<long>(1); 2023-01-11T21:38:06.1030236Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:06.1030333Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.1030425Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.1030492Z } 2023-01-11T21:38:06.1030552Z } 2023-01-11T21:38:06.1030649Z } 2023-01-11T21:38:06.1030734Z #pragma omp single 2023-01-11T21:38:06.1030800Z { 2023-01-11T21:38:06.1030868Z { 2023-01-11T21:38:06.1030935Z { 2023-01-11T21:38:06.1031035Z auto tmp0 = static_cast<long>(5); 2023-01-11T21:38:06.1031122Z out_ptr1[0] = tmp0; 2023-01-11T21:38:06.1031197Z } 2023-01-11T21:38:06.1031268Z } 2023-01-11T21:38:06.1031333Z } 2023-01-11T21:38:06.1031402Z } 2023-01-11T21:38:06.1031469Z } 2023-01-11T21:38:06.1031547Z ''') 2023-01-11T21:38:06.1031552Z 2023-01-11T21:38:06.1031557Z 2023-01-11T21:38:06.1031649Z async_compile.wait(globals()) 2023-01-11T21:38:06.1031726Z del async_compile 2023-01-11T21:38:06.1031731Z 2023-01-11T21:38:06.1031804Z def call(args): 2023-01-11T21:38:06.1031881Z arg0_1, = args 2023-01-11T21:38:06.1031957Z args.clear() 2023-01-11T21:38:06.1032151Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1032327Z buf1 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1032493Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1032568Z del arg0_1 2023-01-11T21:38:06.1032650Z return (buf0, buf1, ) 2023-01-11T21:38:06.1032655Z 2023-01-11T21:38:06.1032662Z 2023-01-11T21:38:06.1032743Z if __name__ == "__main__": 2023-01-11T21:38:06.1032861Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1032984Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1033177Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1033282Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1033296Z 2023-01-11T21:38:06.1033360Z ok (1.692s) 2023-01-11T21:38:06.1033812Z test_tensor2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1033951Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1034246Z [2023-01-11 21:32:16,124] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 283 2023-01-11T21:38:06.1034510Z [2023-01-11 21:32:17,791] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 283 2023-01-11T21:38:06.1034516Z 2023-01-11T21:38:06.1034616Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1034692Z import torch 2023-01-11T21:38:06.1034768Z import random 2023-01-11T21:38:06.1034881Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1035004Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1035012Z 2023-01-11T21:38:06.1035094Z aten = torch.ops.aten 2023-01-11T21:38:06.1035230Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1035324Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1035329Z 2023-01-11T21:38:06.1035402Z import triton 2023-01-11T21:38:06.1035495Z import triton.language as tl 2023-01-11T21:38:06.1035622Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1035754Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1035921Z constant0 = None # 4ebd4ff1c68a89413a036eaaf84436373c4ec2939ac1d7f84e9908772a109281 2023-01-11T21:38:06.1035926Z 2023-01-11T21:38:06.1035931Z 2023-01-11T21:38:06.1036068Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1036274Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1036399Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:06.1036510Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1036644Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1036709Z { 2023-01-11T21:38:06.1036803Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1036869Z { 2023-01-11T21:38:06.1036952Z #pragma omp for 2023-01-11T21:38:06.1037039Z for(long i0=0; i0<19; i0+=1) 2023-01-11T21:38:06.1037108Z { 2023-01-11T21:38:06.1037175Z { 2023-01-11T21:38:06.1037244Z { 2023-01-11T21:38:06.1037334Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1037429Z auto tmp2 = in_ptr1[0]; 2023-01-11T21:38:06.1037541Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:06.1037639Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.1037728Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.1037797Z } 2023-01-11T21:38:06.1037864Z } 2023-01-11T21:38:06.1037923Z } 2023-01-11T21:38:06.1037993Z } 2023-01-11T21:38:06.1038057Z } 2023-01-11T21:38:06.1038139Z ''') 2023-01-11T21:38:06.1038145Z 2023-01-11T21:38:06.1038149Z 2023-01-11T21:38:06.1038241Z async_compile.wait(globals()) 2023-01-11T21:38:06.1038319Z del async_compile 2023-01-11T21:38:06.1038324Z 2023-01-11T21:38:06.1038398Z def call(args): 2023-01-11T21:38:06.1038465Z arg0_1, = args 2023-01-11T21:38:06.1038544Z args.clear() 2023-01-11T21:38:06.1038739Z buf0 = empty_strided((19, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1038915Z kernel_cpp_0(c_void_p(constant0.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1038989Z del arg0_1 2023-01-11T21:38:06.1039063Z return (buf0, ) 2023-01-11T21:38:06.1039069Z 2023-01-11T21:38:06.1039073Z 2023-01-11T21:38:06.1039152Z if __name__ == "__main__": 2023-01-11T21:38:06.1039261Z from torch._dynamo.testing import rand_strided 
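# [editor annotation] Frozen graph constants such as constant0, identified above only
# by a SHA-256 content hash, are re-created in this harness as rand_strided tensors of
# the recorded shape and dtype, so the standalone benchmark runs without the original
# constant data.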
2023-01-11T21:38:06.1039388Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1039590Z constant0 = rand_strided((19, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1039782Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1039893Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1039898Z 2023-01-11T21:38:06.1039971Z ok (1.683s) 2023-01-11T21:38:06.1040447Z test_tensor3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1040581Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1040839Z [2023-01-11 21:32:17,818] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 284 2023-01-11T21:38:06.1041107Z [2023-01-11 21:32:19,491] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 284 2023-01-11T21:38:06.1041113Z 2023-01-11T21:38:06.1041204Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1041278Z import torch 2023-01-11T21:38:06.1041353Z import random 2023-01-11T21:38:06.1041474Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1041597Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1041602Z 2023-01-11T21:38:06.1041683Z aten = torch.ops.aten 2023-01-11T21:38:06.1041819Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1041908Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1041922Z 2023-01-11T21:38:06.1041989Z import triton 2023-01-11T21:38:06.1042086Z import triton.language as tl 2023-01-11T21:38:06.1042211Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1042350Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1042396Z 2023-01-11T21:38:06.1042400Z 2023-01-11T21:38:06.1042538Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1042745Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1042869Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1042966Z long* __restrict__ out_ptr0, 2023-01-11T21:38:06.1043066Z long* __restrict__ out_ptr1, 2023-01-11T21:38:06.1043168Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.1043234Z { 2023-01-11T21:38:06.1043316Z #pragma GCC ivdep 2023-01-11T21:38:06.1043402Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1043468Z { 2023-01-11T21:38:06.1043528Z { 2023-01-11T21:38:06.1043597Z { 2023-01-11T21:38:06.1043703Z auto tmp0 = static_cast(i0); 2023-01-11T21:38:06.1043804Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1043899Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:06.1044000Z auto tmp3 = static_cast(2); 2023-01-11T21:38:06.1044102Z auto tmp4 = tmp2 ? 
tmp1 : tmp3; 2023-01-11T21:38:06.1044187Z auto tmp5 = tmp4 + tmp1; 2023-01-11T21:38:06.1044274Z out_ptr0[i0] = tmp5; 2023-01-11T21:38:06.1044348Z } 2023-01-11T21:38:06.1044415Z } 2023-01-11T21:38:06.1044478Z } 2023-01-11T21:38:06.1044558Z #pragma GCC ivdep 2023-01-11T21:38:06.1044642Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:06.1044701Z { 2023-01-11T21:38:06.1044767Z { 2023-01-11T21:38:06.1044838Z { 2023-01-11T21:38:06.1044941Z auto tmp0 = static_cast(i0); 2023-01-11T21:38:06.1045040Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1045133Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:06.1045227Z auto tmp3 = static_cast(2); 2023-01-11T21:38:06.1045319Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:06.1045417Z auto tmp5 = static_cast(3); 2023-01-11T21:38:06.1045519Z auto tmp6 = tmp4 ? tmp3 : tmp5; 2023-01-11T21:38:06.1045619Z auto tmp7 = tmp2 ? tmp1 : tmp6; 2023-01-11T21:38:06.1045712Z auto tmp8 = tmp7 + tmp3; 2023-01-11T21:38:06.1045829Z out_ptr1[i0] = tmp8; 2023-01-11T21:38:06.1045899Z } 2023-01-11T21:38:06.1045959Z } 2023-01-11T21:38:06.1046026Z } 2023-01-11T21:38:06.1046107Z #pragma GCC ivdep 2023-01-11T21:38:06.1046196Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.1046262Z { 2023-01-11T21:38:06.1046332Z { 2023-01-11T21:38:06.1046393Z { 2023-01-11T21:38:06.1046487Z auto tmp12 = in_ptr0[i0]; 2023-01-11T21:38:06.1046591Z auto tmp0 = static_cast(i0); 2023-01-11T21:38:06.1046692Z auto tmp1 = static_cast(2); 2023-01-11T21:38:06.1046784Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:06.1046888Z auto tmp3 = static_cast(1); 2023-01-11T21:38:06.1046979Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:06.1047072Z auto tmp5 = tmp4 ? tmp3 : tmp1; 2023-01-11T21:38:06.1047177Z auto tmp6 = static_cast(3); 2023-01-11T21:38:06.1047272Z auto tmp7 = tmp0 < tmp6; 2023-01-11T21:38:06.1047376Z auto tmp8 = static_cast(4); 2023-01-11T21:38:06.1047475Z auto tmp9 = tmp7 ? tmp6 : tmp8; 2023-01-11T21:38:06.1047579Z auto tmp10 = tmp2 ? 
tmp5 : tmp9; 2023-01-11T21:38:06.1047690Z auto tmp11 = static_cast(tmp10); 2023-01-11T21:38:06.1047779Z auto tmp13 = tmp11 + tmp12; 2023-01-11T21:38:06.1047867Z out_ptr2[i0] = tmp13; 2023-01-11T21:38:06.1047932Z } 2023-01-11T21:38:06.1047999Z } 2023-01-11T21:38:06.1048064Z } 2023-01-11T21:38:06.1048127Z } 2023-01-11T21:38:06.1048242Z ''') 2023-01-11T21:38:06.1048247Z 2023-01-11T21:38:06.1048252Z 2023-01-11T21:38:06.1048339Z async_compile.wait(globals()) 2023-01-11T21:38:06.1048416Z del async_compile 2023-01-11T21:38:06.1048421Z 2023-01-11T21:38:06.1048496Z def call(args): 2023-01-11T21:38:06.1048570Z arg0_1, = args 2023-01-11T21:38:06.1048647Z args.clear() 2023-01-11T21:38:06.1048840Z buf0 = empty_strided((2, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1049028Z buf1 = empty_strided((3, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1049219Z buf2 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1049403Z buf3 = empty_strided((0, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1049593Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.1049667Z del arg0_1 2023-01-11T21:38:06.1049761Z return (buf3, buf0, buf1, buf2, ) 2023-01-11T21:38:06.1049770Z 2023-01-11T21:38:06.1049774Z 2023-01-11T21:38:06.1049854Z if __name__ == "__main__": 2023-01-11T21:38:06.1049969Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1050094Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1050288Z arg0_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1050393Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1050398Z 2023-01-11T21:38:06.1050473Z ok (1.700s) 2023-01-11T21:38:06.1050937Z test_tmp_not_defined_issue1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1051068Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1051458Z Failed to collect metadata on function, produced code may be suboptimal. Known situations this can occur are inference mode only compilation involving resize_ or prims (!schema.hasAnyAliasInfo() INTERNAL ASSERT FAILED); if your situation looks different please file a bug to PyTorch. 
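[editor annotation] The metadata-collection warning above ends here; the traceback that follows shows the failing op. A minimal sketch of the prims call class involved (hypothetical repro, not taken from this log; in plain eager mode this returns a broadcasted view rather than asserting):

    import torch
    x = torch.randn(1, 512, 1)
    # prims.broadcast_in_dim(a, shape, broadcast_dimensions) is the alias-free
    # broadcast primitive whose functionalization fallback trips the assert below
    y = torch.ops.prims.broadcast_in_dim.default(x, [1, 512, 4], [0, 1, 2])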
2023-01-11T21:38:06.1051558Z Traceback (most recent call last): 2023-01-11T21:38:06.1051833Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1273, in aot_wrapper_dedupe 2023-01-11T21:38:06.1052003Z fw_metadata, _out, _num_aliasing_metadata_outs = run_functionalized_fw_and_collect_metadata( 2023-01-11T21:38:06.1052252Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 289, in inner 2023-01-11T21:38:06.1052331Z outs = f(*f_args) 2023-01-11T21:38:06.1052589Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2327, in functional_call 2023-01-11T21:38:06.1052719Z out = Interpreter(mod).run(*args[params_len:], **kwargs) 2023-01-11T21:38:06.1052949Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 136, in run 2023-01-11T21:38:06.1053053Z self.env[node] = self.run_node(node) 2023-01-11T21:38:06.1053293Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 177, in run_node 2023-01-11T21:38:06.1053414Z return getattr(self, n.op)(n.target, args, kwargs) 2023-01-11T21:38:06.1053660Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 249, in call_function 2023-01-11T21:38:06.1053753Z return target(*args, **kwargs) 2023-01-11T21:38:06.1053961Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.1054063Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.1054320Z File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/overrides.py", line 36, in __torch_function__ 2023-01-11T21:38:06.1054443Z return func(*args, **kwargs) 2023-01-11T21:38:06.1054783Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.1054886Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.1055157Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims/__init__.py", line 285, in _autograd_impl 2023-01-11T21:38:06.1055309Z return backwards_not_supported(_prim)(*args, **kwargs) 2023-01-11T21:38:06.1055559Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 309, in _autograd_impl 2023-01-11T21:38:06.1055663Z return redispatch_prim(args, kwargs) 2023-01-11T21:38:06.1055924Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 279, in redispatch_prim 2023-01-11T21:38:06.1056015Z return prim(*args, **kwargs) 2023-01-11T21:38:06.1056231Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.1056334Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.1056804Z RuntimeError: !schema.hasAnyAliasInfo() INTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/aten/src/ATen/FunctionalizeFallbackKernel.cpp":32, please report a bug to PyTorch. 
mutating and aliasing ops should all have codegen'd kernels 2023-01-11T21:38:06.1056810Z 2023-01-11T21:38:06.1057056Z While executing %broadcast_in_dim_default : [#users=1] = call_function[target=torch.ops.prims.broadcast_in_dim.default](args = (%var_default_1, [1, 512, 1], [0, 1]), kwargs = {}) 2023-01-11T21:38:06.1057185Z Original traceback: 2023-01-11T21:38:06.1057270Z Module stack: {} 2023-01-11T21:38:06.1057438Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 4723, in forward 2023-01-11T21:38:06.1057592Z broadcast_in_dim_default_2 = torch.ops.prims.broadcast_in_dim.default( 2023-01-11T21:38:06.1057754Z | File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 318, in run 2023-01-11T21:38:06.1057842Z return model(*ex, **kwargs) 2023-01-11T21:38:06.1057850Z 2023-01-11T21:38:06.1058108Z [2023-01-11 21:32:19,701] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 285 2023-01-11T21:38:06.1058114Z 2023-01-11T21:38:06.1058211Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1058289Z import torch 2023-01-11T21:38:06.1058432Z import random 2023-01-11T21:38:06.1058557Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1058682Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1058687Z 2023-01-11T21:38:06.1058769Z aten = torch.ops.aten 2023-01-11T21:38:06.1058905Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1059001Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1059007Z 2023-01-11T21:38:06.1059083Z import triton 2023-01-11T21:38:06.1059168Z import triton.language as tl 2023-01-11T21:38:06.1059293Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1059432Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1059441Z 2023-01-11T21:38:06.1059445Z 2023-01-11T21:38:06.1059584Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1059790Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1059912Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.1060019Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:06.1060134Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1060236Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1060344Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:06.1060453Z const float* __restrict__ in_ptr3, 2023-01-11T21:38:06.1060559Z const float* __restrict__ in_ptr4, 2023-01-11T21:38:06.1060666Z const float* __restrict__ in_ptr5, 2023-01-11T21:38:06.1060770Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1060912Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1061015Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.1061110Z float* __restrict__ out_ptr5) 2023-01-11T21:38:06.1061179Z { 2023-01-11T21:38:06.1061276Z auto out_ptr1 = in_out_ptr0; 2023-01-11T21:38:06.1061370Z auto out_ptr4 = in_out_ptr1; 2023-01-11T21:38:06.1061474Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1061545Z { 2023-01-11T21:38:06.1061622Z #pragma omp for 2023-01-11T21:38:06.1061713Z for(long i0=0; i0<512; i0+=1) 2023-01-11T21:38:06.1061784Z { 2023-01-11T21:38:06.1061856Z { 2023-01-11T21:38:06.1062054Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1062142Z float tmp1 = 0; 2023-01-11T21:38:06.1062272Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:06.1062374Z for(long i1=0; 
i1<128; i1+=1) 2023-01-11T21:38:06.1062439Z { 2023-01-11T21:38:06.1062593Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (1024*i0)); 2023-01-11T21:38:06.1062683Z tmp1_vec += tmp0; 2023-01-11T21:38:06.1062758Z } 2023-01-11T21:38:06.1062964Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:06.1063093Z #pragma omp simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:06.1063195Z for(long i1=1024; i1<1024; i1+=1) 2023-01-11T21:38:06.1063266Z { 2023-01-11T21:38:06.1063369Z auto tmp0 = in_ptr0[i1 + (1024*i0)]; 2023-01-11T21:38:06.1063455Z tmp1 += tmp0; 2023-01-11T21:38:06.1063525Z } 2023-01-11T21:38:06.1063616Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.1063688Z } 2023-01-11T21:38:06.1063758Z } 2023-01-11T21:38:06.1063835Z #pragma omp for 2023-01-11T21:38:06.1063926Z for(long i0=0; i0<512; i0+=1) 2023-01-11T21:38:06.1063996Z { 2023-01-11T21:38:06.1064065Z { 2023-01-11T21:38:06.1064290Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1064377Z float tmp6 = 0; 2023-01-11T21:38:06.1064505Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.1064599Z for(long i1=0; i1<128; i1+=1) 2023-01-11T21:38:06.1064661Z { 2023-01-11T21:38:06.1064810Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (1024*i0)); 2023-01-11T21:38:06.1064944Z auto tmp1 = at::vec::Vectorized(out_ptr0[i0]); 2023-01-11T21:38:06.1065087Z auto tmp2 = at::vec::Vectorized(static_cast(1024)); 2023-01-11T21:38:06.1065189Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.1065334Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.1065435Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.1065528Z tmp6_vec += tmp5; 2023-01-11T21:38:06.1065608Z } 2023-01-11T21:38:06.1065835Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:06.1065961Z #pragma omp simd simdlen(4) reduction(+:tmp6) 2023-01-11T21:38:06.1066057Z for(long i1=1024; i1<1024; i1+=1) 2023-01-11T21:38:06.1066127Z { 2023-01-11T21:38:06.1066235Z auto tmp0 = in_ptr0[i1 + (1024*i0)]; 2023-01-11T21:38:06.1066334Z auto tmp1 = out_ptr0[i0]; 2023-01-11T21:38:06.1066437Z auto tmp2 = static_cast(1024); 2023-01-11T21:38:06.1066607Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.1066747Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.1066840Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.1066923Z tmp6 += tmp5; 2023-01-11T21:38:06.1066993Z } 2023-01-11T21:38:06.1067084Z out_ptr1[i0] = tmp6; 2023-01-11T21:38:06.1067144Z } 2023-01-11T21:38:06.1067211Z } 2023-01-11T21:38:06.1067293Z #pragma omp for 2023-01-11T21:38:06.1067382Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:06.1067451Z { 2023-01-11T21:38:06.1067592Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr1 + 8*i0); 2023-01-11T21:38:06.1067734Z auto tmp1 = at::vec::Vectorized(static_cast(1024)); 2023-01-11T21:38:06.1067824Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1067917Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.1067987Z } 2023-01-11T21:38:06.1068085Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1068174Z for(long i0=512; i0<512; i0+=1) 2023-01-11T21:38:06.1068241Z { 2023-01-11T21:38:06.1068333Z auto tmp0 = out_ptr1[i0]; 2023-01-11T21:38:06.1068431Z auto tmp1 = static_cast(1024); 2023-01-11T21:38:06.1068524Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1068612Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1068678Z } 2023-01-11T21:38:06.1068758Z #pragma omp for 2023-01-11T21:38:06.1068845Z for(long 
i0=0; i0<512; i0+=1) 2023-01-11T21:38:06.1068910Z { 2023-01-11T21:38:06.1068971Z { 2023-01-11T21:38:06.1069160Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1069246Z float tmp9 = 0; 2023-01-11T21:38:06.1069370Z auto tmp9_vec = at::vec::Vectorized(tmp9); 2023-01-11T21:38:06.1069468Z for(long i1=0; i1<128; i1+=1) 2023-01-11T21:38:06.1069538Z { 2023-01-11T21:38:06.1069684Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr1 + (8*i1) + (1024*i0)); 2023-01-11T21:38:06.1069815Z auto tmp1 = at::vec::Vectorized(in_ptr2[i0]); 2023-01-11T21:38:06.1069969Z auto tmp3 = at::vec::Vectorized(in_ptr3[i0]); 2023-01-11T21:38:06.1070111Z auto tmp5 = at::vec::Vectorized::loadu(in_ptr4 + 8*i1); 2023-01-11T21:38:06.1070246Z auto tmp7 = at::vec::Vectorized::loadu(in_ptr5 + 8*i1); 2023-01-11T21:38:06.1070387Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:06.1070482Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:06.1070575Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1070672Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.1070783Z tmp8.store(out_ptr2 + (8*i1) + (1024*i0)); 2023-01-11T21:38:06.1070867Z tmp9_vec += tmp8; 2023-01-11T21:38:06.1070935Z } 2023-01-11T21:38:06.1071132Z tmp9 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp9_vec); 2023-01-11T21:38:06.1071260Z #pragma omp simd simdlen(4) reduction(+:tmp9) 2023-01-11T21:38:06.1071358Z for(long i1=1024; i1<1024; i1+=1) 2023-01-11T21:38:06.1071428Z { 2023-01-11T21:38:06.1071534Z auto tmp0 = in_ptr1[i1 + (1024*i0)]; 2023-01-11T21:38:06.1071631Z auto tmp1 = in_ptr2[i0]; 2023-01-11T21:38:06.1071718Z auto tmp3 = in_ptr3[i0]; 2023-01-11T21:38:06.1071814Z auto tmp5 = in_ptr4[i1]; 2023-01-11T21:38:06.1071907Z auto tmp7 = in_ptr5[i1]; 2023-01-11T21:38:06.1072044Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:06.1072168Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:06.1072261Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1072352Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.1072446Z out_ptr2[i1 + (1024*i0)] = tmp8; 2023-01-11T21:38:06.1072530Z tmp9 += tmp8; 2023-01-11T21:38:06.1072602Z } 2023-01-11T21:38:06.1072690Z out_ptr3[i0] = tmp9; 2023-01-11T21:38:06.1072757Z } 2023-01-11T21:38:06.1072824Z } 2023-01-11T21:38:06.1072908Z #pragma omp for 2023-01-11T21:38:06.1072989Z for(long i0=0; i0<512; i0+=1) 2023-01-11T21:38:06.1073057Z { 2023-01-11T21:38:06.1073126Z { 2023-01-11T21:38:06.1073319Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1073407Z float tmp6 = 0; 2023-01-11T21:38:06.1073531Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.1073619Z float tmp7 = 0; 2023-01-11T21:38:06.1073738Z auto tmp7_vec = at::vec::Vectorized(tmp7); 2023-01-11T21:38:06.1073833Z for(long i1=0; i1<128; i1+=1) 2023-01-11T21:38:06.1073902Z { 2023-01-11T21:38:06.1074054Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr2 + (8*i1) + (1024*i0)); 2023-01-11T21:38:06.1074187Z auto tmp1 = at::vec::Vectorized(out_ptr3[i0]); 2023-01-11T21:38:06.1074330Z auto tmp2 = at::vec::Vectorized(static_cast(1024)); 2023-01-11T21:38:06.1074425Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.1074563Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.1074652Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.1074742Z tmp6_vec += tmp5; 2023-01-11T21:38:06.1074826Z tmp7_vec += tmp0; 2023-01-11T21:38:06.1074898Z } 2023-01-11T21:38:06.1075100Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, 
tmp6_vec); 2023-01-11T21:38:06.1075319Z tmp7 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp7_vec); 2023-01-11T21:38:06.1075470Z #pragma omp simd simdlen(4) reduction(+:tmp6) reduction(+:tmp7) 2023-01-11T21:38:06.1075565Z for(long i1=1024; i1<1024; i1+=1) 2023-01-11T21:38:06.1075627Z { 2023-01-11T21:38:06.1075735Z auto tmp0 = out_ptr2[i1 + (1024*i0)]; 2023-01-11T21:38:06.1075831Z auto tmp1 = out_ptr3[i0]; 2023-01-11T21:38:06.1075943Z auto tmp2 = static_cast(1024); 2023-01-11T21:38:06.1076040Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.1076179Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.1076275Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.1076359Z tmp6 += tmp5; 2023-01-11T21:38:06.1076434Z tmp7 += tmp0; 2023-01-11T21:38:06.1076503Z } 2023-01-11T21:38:06.1076591Z out_ptr4[i0] = tmp6; 2023-01-11T21:38:06.1076680Z out_ptr5[i0] = tmp7; 2023-01-11T21:38:06.1076747Z } 2023-01-11T21:38:06.1076814Z } 2023-01-11T21:38:06.1076888Z #pragma omp for 2023-01-11T21:38:06.1076974Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:06.1077041Z { 2023-01-11T21:38:06.1077181Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr4 + 8*i0); 2023-01-11T21:38:06.1077321Z auto tmp1 = at::vec::Vectorized(static_cast(1024)); 2023-01-11T21:38:06.1077410Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1077611Z auto tmp3 = at::vec::Vectorized(static_cast(1e-05)); 2023-01-11T21:38:06.1077731Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1077826Z tmp4.store(in_out_ptr1 + 8*i0); 2023-01-11T21:38:06.1077894Z } 2023-01-11T21:38:06.1077993Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1078079Z for(long i0=512; i0<512; i0+=1) 2023-01-11T21:38:06.1078146Z { 2023-01-11T21:38:06.1078238Z auto tmp0 = out_ptr4[i0]; 2023-01-11T21:38:06.1078343Z auto tmp1 = static_cast(1024); 2023-01-11T21:38:06.1078424Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1078576Z auto tmp3 = static_cast(1e-05); 2023-01-11T21:38:06.1078663Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1078750Z in_out_ptr1[i0] = tmp4; 2023-01-11T21:38:06.1078817Z } 2023-01-11T21:38:06.1078882Z } 2023-01-11T21:38:06.1078939Z } 2023-01-11T21:38:06.1079022Z ''') 2023-01-11T21:38:06.1079027Z 2023-01-11T21:38:06.1079032Z 2023-01-11T21:38:06.1079126Z async_compile.wait(globals()) 2023-01-11T21:38:06.1079209Z del async_compile 2023-01-11T21:38:06.1079214Z 2023-01-11T21:38:06.1079288Z def call(args): 2023-01-11T21:38:06.1079400Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1 = args 2023-01-11T21:38:06.1079477Z args.clear() 2023-01-11T21:38:06.1079687Z buf0 = empty_strided((1, 512, 1), (512, 1, 512), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1079884Z buf1 = empty_strided((1, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1079977Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.1080196Z buf3 = empty_strided((1, 512, 1024), (524288, 1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1080404Z buf4 = empty_strided((1, 512, 1), (512, 1, 512), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1080601Z buf5 = empty_strided((1, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1080797Z buf6 = empty_strided((1, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1080924Z buf7 = as_strided(buf5, (1, 512, 1), (512, 1, 1)); del buf5 # reuse 2023-01-11T21:38:06.1081318Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(arg3_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(arg4_1.data_ptr()), c_void_p(arg5_1.data_ptr()), 
c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf6.data_ptr())) 2023-01-11T21:38:06.1081395Z del arg0_1 2023-01-11T21:38:06.1081460Z del arg1_1 2023-01-11T21:38:06.1081532Z del arg2_1 2023-01-11T21:38:06.1081603Z del arg3_1 2023-01-11T21:38:06.1081673Z del arg4_1 2023-01-11T21:38:06.1081745Z del arg5_1 2023-01-11T21:38:06.1081831Z return (buf2, buf6, buf7, ) 2023-01-11T21:38:06.1081836Z 2023-01-11T21:38:06.1081841Z 2023-01-11T21:38:06.1081922Z if __name__ == "__main__": 2023-01-11T21:38:06.1082034Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1082161Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1082362Z arg0_1 = rand_strided((1024, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1082559Z arg1_1 = rand_strided((1024, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1082779Z arg2_1 = rand_strided((1, 512, 1024), (524288, 1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1082997Z arg3_1 = rand_strided((1, 512, 1024), (524288, 1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1083204Z arg4_1 = rand_strided((1, 512, 1), (512, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1083410Z arg5_1 = rand_strided((1, 512, 1), (512, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1083550Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1])) 2023-01-11T21:38:06.1083814Z [2023-01-11 21:32:21,487] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 285 2023-01-11T21:38:06.1083820Z 2023-01-11T21:38:06.1083932Z ok (2.005s) 2023-01-11T21:38:06.1084400Z test_tmp_not_defined_issue2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1084533Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1084791Z [2023-01-11 21:32:21,558] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 286 2023-01-11T21:38:06.1085050Z [2023-01-11 21:32:23,253] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 286 2023-01-11T21:38:06.1085056Z 2023-01-11T21:38:06.1085156Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1085230Z import torch 2023-01-11T21:38:06.1085307Z import random 2023-01-11T21:38:06.1085420Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1085543Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1085549Z 2023-01-11T21:38:06.1085629Z aten = torch.ops.aten 2023-01-11T21:38:06.1085766Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1085864Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1085870Z 2023-01-11T21:38:06.1085943Z import triton 2023-01-11T21:38:06.1086035Z import triton.language as tl 2023-01-11T21:38:06.1086152Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1086291Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1086297Z 2023-01-11T21:38:06.1086301Z 2023-01-11T21:38:06.1086436Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1086644Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1086767Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1086879Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1086988Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:06.1087094Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1087152Z { 2023-01-11T21:38:06.1087223Z { 2023-01-11T21:38:06.1087441Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1087524Z float tmp5 = 0; 2023-01-11T21:38:06.1087645Z auto tmp5_vec = at::vec::Vectorized<float>(tmp5); 2023-01-11T21:38:06.1087754Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1087824Z { 2023-01-11T21:38:06.1087938Z #pragma omp for reduction(+:tmp5_vec) 2023-01-11T21:38:06.1088026Z for(long i0=0; i0<17600; i0+=1) 2023-01-11T21:38:06.1088096Z { 2023-01-11T21:38:06.1088237Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1088369Z auto tmp1 = at::vec::Vectorized<float>(in_ptr1[0]); 2023-01-11T21:38:06.1088513Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr2 + 8*i0); 2023-01-11T21:38:06.1088611Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1088704Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:06.1088783Z tmp5_vec += tmp4; 2023-01-11T21:38:06.1088854Z } 2023-01-11T21:38:06.1089055Z tmp5 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp5_vec); 2023-01-11T21:38:06.1089187Z #pragma omp for simd simdlen(4) reduction(+:tmp5) 2023-01-11T21:38:06.1089284Z for(long i0=140800; i0<140800; i0+=1) 2023-01-11T21:38:06.1089351Z { 2023-01-11T21:38:06.1089442Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1089533Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:06.1089682Z auto tmp3 = in_ptr2[i0]; 2023-01-11T21:38:06.1089775Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1089864Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:06.1089945Z tmp5 += tmp4; 2023-01-11T21:38:06.1090013Z }
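// [editor annotation] the scalar tail loop above is degenerate in this instance:
// the 8-wide vector loop already covers all 17600*8 = 140800 elements, so the tail
// starts at its end bound and never executes; inductor emits it unconditionally to
// handle sizes that are not a multiple of the vector width.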
2023-01-11T21:38:06.1090079Z } 2023-01-11T21:38:06.1090164Z out_ptr0[0] = tmp5; 2023-01-11T21:38:06.1090223Z } 2023-01-11T21:38:06.1090288Z } 2023-01-11T21:38:06.1090374Z ''') 2023-01-11T21:38:06.1090379Z 2023-01-11T21:38:06.1090384Z 2023-01-11T21:38:06.1090480Z async_compile.wait(globals()) 2023-01-11T21:38:06.1090558Z del async_compile 2023-01-11T21:38:06.1090563Z 2023-01-11T21:38:06.1090638Z def call(args): 2023-01-11T21:38:06.1090744Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.1090813Z args.clear() 2023-01-11T21:38:06.1091001Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1091210Z kernel_cpp_0(c_void_p(primals_3.data_ptr()), c_void_p(primals_2.data_ptr()), c_void_p(primals_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1091333Z return (buf0, primals_1, primals_2, primals_3, ) 2023-01-11T21:38:06.1091338Z 2023-01-11T21:38:06.1091343Z 2023-01-11T21:38:06.1091425Z if __name__ == "__main__": 2023-01-11T21:38:06.1091548Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1091676Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1091907Z primals_1 = rand_strided((1, 88, 40, 40), (140800, 1600, 40, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1092092Z primals_2 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1092323Z primals_3 = rand_strided((1, 88, 40, 40), (140800, 1600, 40, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1092469Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.1092474Z 2023-01-11T21:38:06.1092545Z ok (1.760s) 2023-01-11T21:38:06.1093037Z test_to_device_constant_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1093169Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1093429Z [2023-01-11 21:32:23,402] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 287 2023-01-11T21:38:06.1093692Z [2023-01-11 21:32:25,094] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 287 2023-01-11T21:38:06.1093697Z 2023-01-11T21:38:06.1093796Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1093874Z import torch 2023-01-11T21:38:06.1093942Z import random 2023-01-11T21:38:06.1094063Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1094190Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1094195Z 2023-01-11T21:38:06.1094277Z aten = torch.ops.aten 2023-01-11T21:38:06.1094413Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1094625Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1094633Z 2023-01-11T21:38:06.1094710Z import triton 2023-01-11T21:38:06.1094795Z import triton.language as tl 2023-01-11T21:38:06.1094919Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1095060Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1095213Z constant0 = None # beb02a6268c3f406824d31228875474e7116cf8e770246a5eb85f5795315f9cc 2023-01-11T21:38:06.1095219Z 2023-01-11T21:38:06.1095223Z 2023-01-11T21:38:06.1095362Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1095566Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1095735Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1095846Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:06.1095953Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1096048Z long* __restrict__ out_ptr1, 2023-01-11T21:38:06.1096156Z long* __restrict__ out_ptr2) 2023-01-11T21:38:06.1096223Z { 2023-01-11T21:38:06.1096328Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1096394Z { 2023-01-11T21:38:06.1096475Z #pragma omp for 2023-01-11T21:38:06.1096555Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.1096621Z { 2023-01-11T21:38:06.1096687Z { 2023-01-11T21:38:06.1096757Z { 2023-01-11T21:38:06.1096856Z auto tmp2 = in_ptr0[i0]; 2023-01-11T21:38:06.1096967Z auto tmp0 = static_cast<long>(i0); 2023-01-11T21:38:06.1097079Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:06.1097243Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.1097335Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.1097410Z } 2023-01-11T21:38:06.1097476Z } 2023-01-11T21:38:06.1097544Z } 2023-01-11T21:38:06.1097625Z #pragma omp for 2023-01-11T21:38:06.1097716Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:06.1097775Z { 2023-01-11T21:38:06.1097845Z { 2023-01-11T21:38:06.1097913Z { 2023-01-11T21:38:06.1098010Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:06.1098120Z auto tmp1 = static_cast<long>(1); 2023-01-11T21:38:06.1098214Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1098306Z out_ptr1[i0] = tmp0; 2023-01-11T21:38:06.1098389Z out_ptr2[i0] = tmp2; 2023-01-11T21:38:06.1098457Z } 2023-01-11T21:38:06.1098525Z } 2023-01-11T21:38:06.1098596Z } 2023-01-11T21:38:06.1098665Z } 2023-01-11T21:38:06.1098730Z } 2023-01-11T21:38:06.1098809Z ''') 2023-01-11T21:38:06.1098824Z 2023-01-11T21:38:06.1098829Z 2023-01-11T21:38:06.1098915Z async_compile.wait(globals()) 2023-01-11T21:38:06.1098991Z del async_compile
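# [editor annotation] This test_to_device_constant_cpu dump is a cross-device constant
# case: constant0 (hash above) lives on cuda:0 while the kernel reads the CPU copy
# constant0_cpu0, so call() below passes constant0_cpu0.data_ptr() and no device
# transfer happens at kernel-invocation time.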
2023-01-11T21:38:06.1098996Z 2023-01-11T21:38:06.1099071Z def call(args): 2023-01-11T21:38:06.1099186Z arg0_1, = args 2023-01-11T21:38:06.1099264Z args.clear() 2023-01-11T21:38:06.1099462Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1099654Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1099836Z buf2 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1100063Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(constant0_cpu0.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.1100138Z del arg0_1 2023-01-11T21:38:06.1100229Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1100234Z 2023-01-11T21:38:06.1100239Z 2023-01-11T21:38:06.1100321Z if __name__ == "__main__": 2023-01-11T21:38:06.1100440Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1100569Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1100777Z constant0 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.1100975Z constant0_cpu0 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1101169Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1101281Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1101287Z 2023-01-11T21:38:06.1101357Z ok (1.838s) 2023-01-11T21:38:06.1101818Z test_to_device_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1101982Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1102247Z [2023-01-11 21:32:25,147] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 288 2023-01-11T21:38:06.1102430Z [2023-01-11 21:32:25,148] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.1102695Z [2023-01-11 21:32:25,150] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 288 2023-01-11T21:38:06.1102701Z 2023-01-11T21:38:06.1102800Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1102867Z import torch 2023-01-11T21:38:06.1102943Z import random 2023-01-11T21:38:06.1103065Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1103190Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1103198Z 2023-01-11T21:38:06.1103283Z aten = torch.ops.aten 2023-01-11T21:38:06.1103419Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1103515Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1103520Z 2023-01-11T21:38:06.1103587Z import triton 2023-01-11T21:38:06.1103680Z import triton.language as tl 2023-01-11T21:38:06.1103808Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1103950Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1103955Z 2023-01-11T21:38:06.1103960Z 2023-01-11T21:38:06.1104053Z async_compile.wait(globals()) 2023-01-11T21:38:06.1104132Z del async_compile 2023-01-11T21:38:06.1104137Z 2023-01-11T21:38:06.1104211Z def call(args): 2023-01-11T21:38:06.1104285Z arg0_1, = args 2023-01-11T21:38:06.1104352Z args.clear() 2023-01-11T21:38:06.1104444Z with torch.cuda.device(0): 2023-01-11T21:38:06.1104656Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1104742Z buf0.copy_(arg0_1) 2023-01-11T21:38:06.1104815Z del arg0_1 2023-01-11T21:38:06.1104894Z return (buf0, ) 2023-01-11T21:38:06.1104899Z 2023-01-11T21:38:06.1104903Z 2023-01-11T21:38:06.1104986Z if __name__ == "__main__": 2023-01-11T21:38:06.1105129Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1105250Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1105458Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1105571Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1105576Z 2023-01-11T21:38:06.1105649Z ok (0.057s) 2023-01-11T21:38:06.1106102Z test_to_dtype_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1106237Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1106493Z [2023-01-11 21:32:25,242] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 289 2023-01-11T21:38:06.1106757Z [2023-01-11 21:32:26,946] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 289 2023-01-11T21:38:06.1106763Z 2023-01-11T21:38:06.1106863Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1106931Z import torch 2023-01-11T21:38:06.1107005Z import random 2023-01-11T21:38:06.1107126Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1107251Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1107256Z 2023-01-11T21:38:06.1107339Z aten = torch.ops.aten 2023-01-11T21:38:06.1107479Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1107604Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1107609Z 2023-01-11T21:38:06.1107684Z import triton 2023-01-11T21:38:06.1107770Z import triton.language as tl 2023-01-11T21:38:06.1107896Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1108035Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1108043Z 2023-01-11T21:38:06.1108047Z 2023-01-11T21:38:06.1108185Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1108393Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1108522Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.1108626Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1108721Z bool* __restrict__ out_ptr1) 2023-01-11T21:38:06.1108787Z { 2023-01-11T21:38:06.1108888Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1108956Z { 2023-01-11T21:38:06.1109042Z #pragma omp for 2023-01-11T21:38:06.1109130Z for(long i0=0; i0<40; i0+=1) 2023-01-11T21:38:06.1109195Z { 2023-01-11T21:38:06.1109255Z { 2023-01-11T21:38:06.1109324Z { 2023-01-11T21:38:06.1109423Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1109537Z auto tmp1 = static_cast<double>(1); 2023-01-11T21:38:06.1109634Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1109746Z auto tmp3 = static_cast<float>(tmp2); 2023-01-11T21:38:06.1109858Z auto tmp4 = static_cast<bool>(tmp0); 2023-01-11T21:38:06.1109940Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.1110029Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:06.1110098Z } 2023-01-11T21:38:06.1110166Z } 2023-01-11T21:38:06.1110234Z } 2023-01-11T21:38:06.1110300Z } 2023-01-11T21:38:06.1110365Z } 2023-01-11T21:38:06.1110442Z ''') 2023-01-11T21:38:06.1110450Z 2023-01-11T21:38:06.1110455Z 2023-01-11T21:38:06.1110549Z async_compile.wait(globals()) 2023-01-11T21:38:06.1110625Z del async_compile 2023-01-11T21:38:06.1110631Z 2023-01-11T21:38:06.1110704Z def call(args): 2023-01-11T21:38:06.1110784Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1110859Z args.clear() 2023-01-11T21:38:06.1111095Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1111287Z buf1 = empty_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:06.1111456Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1111552Z return (arg0_1, buf0, arg1_1, buf1, ) 2023-01-11T21:38:06.1111558Z 2023-01-11T21:38:06.1111562Z 2023-01-11T21:38:06.1111644Z if __name__ ==
"__main__": 2023-01-11T21:38:06.1111760Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1111886Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1112094Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1112291Z arg1_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.1112408Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1112413Z 2023-01-11T21:38:06.1112481Z ok (1.795s) 2023-01-11T21:38:06.1112927Z test_topk_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1113057Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1113317Z [2023-01-11 21:32:26,961] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 290 2023-01-11T21:38:06.1113563Z [2023-01-11 21:32:26,967] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.topk 2023-01-11T21:38:06.1113827Z [2023-01-11 21:32:26,969] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 290 2023-01-11T21:38:06.1113832Z 2023-01-11T21:38:06.1113934Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1114009Z import torch 2023-01-11T21:38:06.1114084Z import random 2023-01-11T21:38:06.1114197Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1114322Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1114327Z 2023-01-11T21:38:06.1114408Z aten = torch.ops.aten 2023-01-11T21:38:06.1114547Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1114645Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1114650Z 2023-01-11T21:38:06.1114723Z import triton 2023-01-11T21:38:06.1114815Z import triton.language as tl 2023-01-11T21:38:06.1114944Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1115077Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1115082Z 2023-01-11T21:38:06.1115087Z 2023-01-11T21:38:06.1115179Z async_compile.wait(globals()) 2023-01-11T21:38:06.1115255Z del async_compile 2023-01-11T21:38:06.1115260Z 2023-01-11T21:38:06.1115339Z def call(args): 2023-01-11T21:38:06.1115412Z arg0_1, = args 2023-01-11T21:38:06.1115486Z args.clear() 2023-01-11T21:38:06.1115573Z buf0 = aten.topk(arg0_1, 2) 2023-01-11T21:38:06.1115639Z del arg0_1 2023-01-11T21:38:06.1115712Z buf1 = buf0[0] 2023-01-11T21:38:06.1115824Z assert_size_stride(buf1, (1, 1, 8, 2), (16, 16, 2, 1)) 2023-01-11T21:38:06.1115897Z buf2 = buf0[1] 2023-01-11T21:38:06.1116006Z assert_size_stride(buf2, (1, 1, 8, 2), (16, 16, 2, 1)) 2023-01-11T21:38:06.1116075Z del buf0 2023-01-11T21:38:06.1116155Z return (buf1, buf2, ) 2023-01-11T21:38:06.1116160Z 2023-01-11T21:38:06.1116165Z 2023-01-11T21:38:06.1116240Z if __name__ == "__main__": 2023-01-11T21:38:06.1116357Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1116482Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1116693Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1116837Z print_performance(lambda: 
call([arg0_1])) 2023-01-11T21:38:06.1116843Z 2023-01-11T21:38:06.1116915Z ok (0.022s) 2023-01-11T21:38:06.1117371Z test_transpose_add_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1117504Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1117765Z [2023-01-11 21:32:26,983] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 291 2023-01-11T21:38:06.1118026Z [2023-01-11 21:32:28,699] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 291 2023-01-11T21:38:06.1118031Z 2023-01-11T21:38:06.1118125Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1118199Z import torch 2023-01-11T21:38:06.1118273Z import random 2023-01-11T21:38:06.1118392Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1118517Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1118523Z 2023-01-11T21:38:06.1118605Z aten = torch.ops.aten 2023-01-11T21:38:06.1118742Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1118831Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1118843Z 2023-01-11T21:38:06.1118910Z import triton 2023-01-11T21:38:06.1119002Z import triton.language as tl 2023-01-11T21:38:06.1119129Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1119294Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1119299Z 2023-01-11T21:38:06.1119303Z 2023-01-11T21:38:06.1119440Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1119651Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1119774Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1119877Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1119983Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1120048Z { 2023-01-11T21:38:06.1120152Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1120219Z { 2023-01-11T21:38:06.1120301Z #pragma omp for 2023-01-11T21:38:06.1120389Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.1120450Z { 2023-01-11T21:38:06.1120534Z #pragma GCC ivdep 2023-01-11T21:38:06.1120629Z for(long i1=0; i1<16; i1+=1) 2023-01-11T21:38:06.1120697Z { 2023-01-11T21:38:06.1120767Z { 2023-01-11T21:38:06.1120837Z { 2023-01-11T21:38:06.1120945Z auto tmp0 = in_ptr0[i0 + (32*i1)]; 2023-01-11T21:38:06.1121046Z auto tmp1 = in_ptr1[i1 + (16*i0)]; 2023-01-11T21:38:06.1121155Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1121258Z out_ptr0[i0 + (32*i1)] = tmp2; 2023-01-11T21:38:06.1121330Z } 2023-01-11T21:38:06.1121399Z } 2023-01-11T21:38:06.1121470Z } 2023-01-11T21:38:06.1121540Z } 2023-01-11T21:38:06.1121599Z } 2023-01-11T21:38:06.1121663Z } 2023-01-11T21:38:06.1121748Z ''') 2023-01-11T21:38:06.1121753Z 2023-01-11T21:38:06.1121758Z 2023-01-11T21:38:06.1121850Z async_compile.wait(globals()) 2023-01-11T21:38:06.1121927Z del async_compile 2023-01-11T21:38:06.1121932Z 2023-01-11T21:38:06.1122010Z def call(args): 2023-01-11T21:38:06.1122091Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1122160Z args.clear() 2023-01-11T21:38:06.1122363Z buf0 = empty_strided((32, 16), (1, 
32), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1122531Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1122631Z del arg0_1 2023-01-11T21:38:06.1130305Z del arg1_1 2023-01-11T21:38:06.1130403Z return (buf0, ) 2023-01-11T21:38:06.1130409Z 2023-01-11T21:38:06.1130413Z 2023-01-11T21:38:06.1130498Z if __name__ == "__main__": 2023-01-11T21:38:06.1130621Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1130752Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1130967Z arg0_1 = rand_strided((16, 32), (32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1131169Z arg1_1 = rand_strided((32, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1131290Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1131300Z 2023-01-11T21:38:06.1131374Z ok (1.730s) 2023-01-11T21:38:06.1131835Z test_transpose_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1131968Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1132221Z [2023-01-11 21:32:28,721] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 292 2023-01-11T21:38:06.1132480Z [2023-01-11 21:32:30,458] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 292 2023-01-11T21:38:06.1132486Z 2023-01-11T21:38:06.1132587Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1132728Z import torch 2023-01-11T21:38:06.1132804Z import random 2023-01-11T21:38:06.1132924Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1133048Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1133053Z 2023-01-11T21:38:06.1133136Z aten = torch.ops.aten 2023-01-11T21:38:06.1133269Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1133369Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1133374Z 2023-01-11T21:38:06.1133450Z import triton 2023-01-11T21:38:06.1133542Z import triton.language as tl 2023-01-11T21:38:06.1133668Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1133809Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1133815Z 2023-01-11T21:38:06.1133819Z 2023-01-11T21:38:06.1133957Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1134164Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1134289Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1134391Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1134645Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1134750Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.1134820Z { 2023-01-11T21:38:06.1134921Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1134987Z { 2023-01-11T21:38:06.1135063Z #pragma omp for 2023-01-11T21:38:06.1135151Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1135218Z { 2023-01-11T21:38:06.1135303Z #pragma GCC ivdep 2023-01-11T21:38:06.1135391Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.1135460Z { 2023-01-11T21:38:06.1135527Z { 
2023-01-11T21:38:06.1135591Z { 2023-01-11T21:38:06.1135698Z auto tmp0 = in_ptr0[i0 + (8*i1)]; 2023-01-11T21:38:06.1135808Z auto tmp1 = in_ptr1[i1 + (8*i0)]; 2023-01-11T21:38:06.1135910Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1136011Z out_ptr0[i0 + (8*i1)] = tmp2; 2023-01-11T21:38:06.1136082Z } 2023-01-11T21:38:06.1136150Z } 2023-01-11T21:38:06.1136262Z } 2023-01-11T21:38:06.1136331Z } 2023-01-11T21:38:06.1136415Z #pragma omp for 2023-01-11T21:38:06.1136501Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1136568Z { 2023-01-11T21:38:06.1136713Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1136853Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:06.1136935Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1137074Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(10)); 2023-01-11T21:38:06.1137216Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1137331Z tmp4.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.1137413Z } 2023-01-11T21:38:06.1137530Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1137630Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.1137690Z { 2023-01-11T21:38:06.1137777Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:06.1137885Z auto tmp1 = static_cast<float>(2); 2023-01-11T21:38:06.1137974Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1138077Z auto tmp3 = static_cast<float>(10); 2023-01-11T21:38:06.1138164Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1138249Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:06.1138309Z } 2023-01-11T21:38:06.1138374Z } 2023-01-11T21:38:06.1138437Z } 2023-01-11T21:38:06.1138526Z ''') 2023-01-11T21:38:06.1138534Z 2023-01-11T21:38:06.1138539Z 2023-01-11T21:38:06.1138631Z async_compile.wait(globals()) 2023-01-11T21:38:06.1138708Z del async_compile 2023-01-11T21:38:06.1138756Z 2023-01-11T21:38:06.1138832Z def call(args): 2023-01-11T21:38:06.1138913Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1138981Z args.clear() 2023-01-11T21:38:06.1139180Z buf0 = empty_strided((8, 8), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1139376Z buf1 = empty_strided((8, 8), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1139572Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1139647Z del arg0_1 2023-01-11T21:38:06.1139718Z del arg1_1 2023-01-11T21:38:06.1139800Z return (buf0, buf1, ) 2023-01-11T21:38:06.1139805Z 2023-01-11T21:38:06.1139810Z 2023-01-11T21:38:06.1139883Z if __name__ == "__main__": 2023-01-11T21:38:06.1140000Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1140127Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1140325Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1140523Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1140645Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1140650Z 2023-01-11T21:38:06.1140720Z ok (1.759s) 2023-01-11T21:38:06.1141250Z test_transposed_propagates_cpu (__main__.CpuTests) ...
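A note on the loop structure in the C++ kernels above: the inductor CPP backend splits each elementwise loop into a main loop over at::vec::Vectorized<float> (eight float lanes per iteration here) plus an `#pragma omp for simd simdlen(4)` scalar tail for any remainder; in these kernels the tail bounds are `i0=64; i0<64`, i.e. empty, because the element count divides evenly by the vector width. A minimal sketch of that split in Python terms (`lanes` is an illustrative stand-in for the vector width, not a name from the log):

n, lanes = 64, 8                            # total elements; assumed 8 float lanes
for i0 in range(n // lanes):
    pass                                    # vectorized main loop: loadu/compute/store 8 floats per step
for i0 in range((n // lanes) * lanes, n):
    pass                                    # scalar tail; empty here since 64 % 8 == 0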
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.1141331Z warnings.warn( 2023-01-11T21:38:06.1141583Z [2023-01-11 21:32:30,472] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 293 2023-01-11T21:38:06.1141846Z [2023-01-11 21:32:32,143] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 293 2023-01-11T21:38:06.1141851Z 2023-01-11T21:38:06.1141950Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1142027Z import torch 2023-01-11T21:38:06.1142104Z import random 2023-01-11T21:38:06.1142230Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1142354Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1142359Z 2023-01-11T21:38:06.1142448Z aten = torch.ops.aten 2023-01-11T21:38:06.1142611Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1142709Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1142714Z 2023-01-11T21:38:06.1142793Z import triton 2023-01-11T21:38:06.1142886Z import triton.language as tl 2023-01-11T21:38:06.1143011Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1143152Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1143158Z 2023-01-11T21:38:06.1143162Z 2023-01-11T21:38:06.1143300Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1143506Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1143625Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1143736Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1143840Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1143907Z { 2023-01-11T21:38:06.1144011Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1144077Z { 2023-01-11T21:38:06.1144158Z #pragma omp for 2023-01-11T21:38:06.1144239Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1144307Z { 2023-01-11T21:38:06.1144449Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1144587Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1144678Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1144773Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1144839Z } 2023-01-11T21:38:06.1144932Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1145051Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.1145119Z { 2023-01-11T21:38:06.1145207Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1145302Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.1145408Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1145500Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1145576Z } 2023-01-11T21:38:06.1145645Z } 2023-01-11T21:38:06.1145709Z } 2023-01-11T21:38:06.1145794Z ''') 2023-01-11T21:38:06.1145799Z 2023-01-11T21:38:06.1145804Z 2023-01-11T21:38:06.1145897Z async_compile.wait(globals()) 2023-01-11T21:38:06.1145975Z del async_compile 2023-01-11T21:38:06.1145979Z 2023-01-11T21:38:06.1146052Z def call(args): 2023-01-11T21:38:06.1146125Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1146203Z args.clear() 2023-01-11T21:38:06.1146418Z buf0 = empty_strided((1, 4, 4, 4), (64, 4, 1, 16), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1146590Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
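# Note on the kernel_cpp_0(...) call above: the generated wrapper passes raw data
# pointers (ctypes c_void_p over tensor.data_ptr()) into the ahead-of-time-compiled
# C++ kernel; outputs were preallocated with empty_strided and are filled in place,
# so no tensor objects cross the Python/C++ boundary.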
2023-01-11T21:38:06.1146665Z del arg0_1 2023-01-11T21:38:06.1146738Z del arg1_1 2023-01-11T21:38:06.1146811Z return (buf0, ) 2023-01-11T21:38:06.1146817Z 2023-01-11T21:38:06.1146821Z 2023-01-11T21:38:06.1146902Z if __name__ == "__main__": 2023-01-11T21:38:06.1147016Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1147143Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1147357Z arg0_1 = rand_strided((1, 4, 4, 4), (64, 4, 1, 16), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1147562Z arg1_1 = rand_strided((4, 4, 4), (4, 1, 16), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1147682Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1147688Z 2023-01-11T21:38:06.1147764Z ok (1.684s) 2023-01-11T21:38:06.1148273Z test_triton_conv_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.1148355Z warnings.warn( 2023-01-11T21:38:06.1148605Z [2023-01-11 21:32:32,209] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 294 2023-01-11T21:38:06.1148925Z [2023-01-11 21:32:32,230] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 294 2023-01-11T21:38:06.1148931Z 2023-01-11T21:38:06.1149031Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1149108Z import torch 2023-01-11T21:38:06.1149182Z import random 2023-01-11T21:38:06.1149301Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1149425Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1149430Z 2023-01-11T21:38:06.1149512Z aten = torch.ops.aten 2023-01-11T21:38:06.1149641Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1149738Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1149744Z 2023-01-11T21:38:06.1149819Z import triton 2023-01-11T21:38:06.1149911Z import triton.language as tl 2023-01-11T21:38:06.1150034Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1150182Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1150190Z 2023-01-11T21:38:06.1150340Z from torch._inductor.triton_ops.conv_perf_model import early_config_prune 2023-01-11T21:38:06.1150487Z from torch._inductor.triton_ops.conv_perf_model import estimate_conv_time 2023-01-11T21:38:06.1150618Z from torch._inductor.triton_ops.autotune import conv_heuristics 2023-01-11T21:38:06.1150624Z 2023-01-11T21:38:06.1150635Z 2023-01-11T21:38:06.1150720Z async_compile.wait(globals()) 2023-01-11T21:38:06.1150798Z del async_compile 2023-01-11T21:38:06.1150803Z 2023-01-11T21:38:06.1150878Z def call(args): 2023-01-11T21:38:06.1150965Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.1151039Z args.clear() 2023-01-11T21:38:06.1151211Z buf0 = aten.convolution(arg0_1, arg1_1, arg2_1, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:06.1151325Z assert_size_stride(buf0, (32, 32, 32, 32), (32768, 1024, 32, 1)) 2023-01-11T21:38:06.1151390Z del arg0_1 2023-01-11T21:38:06.1151461Z del arg1_1 2023-01-11T21:38:06.1151535Z del arg2_1 2023-01-11T21:38:06.1151617Z return (buf0, ) 2023-01-11T21:38:06.1151623Z 2023-01-11T21:38:06.1151627Z 2023-01-11T21:38:06.1151706Z if __name__ == "__main__": 2023-01-11T21:38:06.1151824Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1151949Z 
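# The __main__ harness emitted with each debug module works the same way in every
# dump above and below: rand_strided rebuilds inputs with the recorded shape,
# stride, device and dtype, and print_performance times call() on those synthetic
# inputs.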
from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1152171Z arg0_1 = rand_strided((32, 128, 32, 32), (131072, 1024, 32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1152387Z arg1_1 = rand_strided((32, 128, 1, 1), (128, 1, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1152579Z arg2_1 = rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1152709Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.1152714Z 2023-01-11T21:38:06.1152785Z ok (0.099s) 2023-01-11T21:38:06.1153293Z test_triton_mm2_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.1153375Z warnings.warn( 2023-01-11T21:38:06.1153635Z [2023-01-11 21:32:32,278] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 295 2023-01-11T21:38:06.1153895Z [2023-01-11 21:32:33,946] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 295 2023-01-11T21:38:06.1153901Z 2023-01-11T21:38:06.1153992Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1154067Z import torch 2023-01-11T21:38:06.1154142Z import random 2023-01-11T21:38:06.1154264Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1154388Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1154393Z 2023-01-11T21:38:06.1154476Z aten = torch.ops.aten 2023-01-11T21:38:06.1154615Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1154740Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1154746Z 2023-01-11T21:38:06.1154814Z import triton 2023-01-11T21:38:06.1154907Z import triton.language as tl 2023-01-11T21:38:06.1155031Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1155173Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1155179Z 2023-01-11T21:38:06.1155314Z from torch._inductor.triton_ops.autotune import mm_heuristics 2023-01-11T21:38:06.1155474Z from torch._inductor.triton_ops.autotune import mm_autotune 2023-01-11T21:38:06.1155479Z 2023-01-11T21:38:06.1155484Z 2023-01-11T21:38:06.1155644Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1155854Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1155968Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:06.1156034Z { 2023-01-11T21:38:06.1156135Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1156201Z { 2023-01-11T21:38:06.1156285Z #pragma omp for 2023-01-11T21:38:06.1156378Z for(long i0=0; i0<131072; i0+=1) 2023-01-11T21:38:06.1156445Z { 2023-01-11T21:38:06.1156582Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.1156718Z auto tmp1 = at::vec::clamp_min(tmp0, decltype(tmp0)(0)); 2023-01-11T21:38:06.1156819Z tmp1.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.1156886Z } 2023-01-11T21:38:06.1156985Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1157082Z for(long i0=1048576; i0<1048576; i0+=1) 2023-01-11T21:38:06.1157151Z { 2023-01-11T21:38:06.1157267Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.1157358Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:06.1157445Z in_out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.1157511Z } 2023-01-11T21:38:06.1157578Z }
2023-01-11T21:38:06.1157643Z } 2023-01-11T21:38:06.1157728Z ''') 2023-01-11T21:38:06.1157736Z 2023-01-11T21:38:06.1157740Z 2023-01-11T21:38:06.1157827Z async_compile.wait(globals()) 2023-01-11T21:38:06.1157905Z del async_compile 2023-01-11T21:38:06.1157910Z 2023-01-11T21:38:06.1157982Z def call(args): 2023-01-11T21:38:06.1158061Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1158136Z args.clear() 2023-01-11T21:38:06.1158345Z buf0 = empty_strided((1024, 1024), (1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1158445Z aten.mm.out(arg0_1, arg1_1, out=buf0) 2023-01-11T21:38:06.1158510Z del arg0_1 2023-01-11T21:38:06.1158582Z del arg1_1 2023-01-11T21:38:06.1158673Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.1158784Z kernel_cpp_0(c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1158860Z return (buf1, ) 2023-01-11T21:38:06.1158865Z 2023-01-11T21:38:06.1158870Z 2023-01-11T21:38:06.1158950Z if __name__ == "__main__": 2023-01-11T21:38:06.1159068Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1159196Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1159395Z arg0_1 = rand_strided((1024, 1024), (1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1159595Z arg1_1 = rand_strided((1024, 1024), (1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1159714Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1159721Z 2023-01-11T21:38:06.1159791Z ok (1.715s) 2023-01-11T21:38:06.1160236Z test_triu_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1160370Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1160655Z [2023-01-11 21:32:33,996] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 296 2023-01-11T21:38:06.1160919Z [2023-01-11 21:32:35,723] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 296 2023-01-11T21:38:06.1160925Z 2023-01-11T21:38:06.1161023Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1161090Z import torch 2023-01-11T21:38:06.1161166Z import random 2023-01-11T21:38:06.1161285Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1161408Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1161413Z 2023-01-11T21:38:06.1161493Z aten = torch.ops.aten 2023-01-11T21:38:06.1161630Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1161728Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1161734Z 2023-01-11T21:38:06.1161807Z import triton 2023-01-11T21:38:06.1161892Z import triton.language as tl 2023-01-11T21:38:06.1162018Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1162161Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1162166Z 2023-01-11T21:38:06.1162171Z 2023-01-11T21:38:06.1162306Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1162512Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1162639Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1162744Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1162846Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.1162939Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.1163035Z { 2023-01-11T21:38:06.1163137Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1163204Z { 2023-01-11T21:38:06.1163299Z #pragma omp for collapse(2) 2023-01-11T21:38:06.1163385Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1163452Z { 2023-01-11T21:38:06.1163535Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.1163606Z { 2023-01-11T21:38:06.1163696Z #pragma GCC ivdep 2023-01-11T21:38:06.1163792Z for(long i2=0; i2<10; i2+=1) 2023-01-11T21:38:06.1163861Z { 2023-01-11T21:38:06.1163935Z { 2023-01-11T21:38:06.1164000Z { 2023-01-11T21:38:06.1164121Z auto tmp3 = in_ptr0[i2 + (10*i1) + (100*i0)]; 2023-01-11T21:38:06.1164310Z auto tmp0 = static_cast<long>((-1) + i2 + ((-1)*i1)); 2023-01-11T21:38:06.1164422Z auto tmp1 = static_cast<long>(0); 2023-01-11T21:38:06.1164529Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:06.1164641Z auto tmp4 = static_cast<float>(0); 2023-01-11T21:38:06.1164751Z auto tmp5 = tmp2 ? tmp3 : tmp4; 2023-01-11T21:38:06.1164930Z auto tmp6 = static_cast<long>(i2 + ((-1)*i1)); 2023-01-11T21:38:06.1165027Z auto tmp7 = tmp6 >= tmp1; 2023-01-11T21:38:06.1165137Z auto tmp8 = tmp7 ? tmp3 : tmp4; 2023-01-11T21:38:06.1165344Z auto tmp9 = static_cast<long>((-2) + i2 + ((-1)*i1)); 2023-01-11T21:38:06.1165457Z auto tmp10 = tmp9 >= tmp1; 2023-01-11T21:38:06.1165586Z auto tmp11 = tmp10 ?
tmp3 : tmp4; 2023-01-11T21:38:06.1165699Z out_ptr0[i2 + (10*i1) + (100*i0)] = tmp5; 2023-01-11T21:38:06.1165807Z out_ptr1[i2 + (10*i1) + (100*i0)] = tmp8; 2023-01-11T21:38:06.1165922Z out_ptr2[i2 + (10*i1) + (100*i0)] = tmp11; 2023-01-11T21:38:06.1165989Z } 2023-01-11T21:38:06.1166060Z } 2023-01-11T21:38:06.1166130Z } 2023-01-11T21:38:06.1166196Z } 2023-01-11T21:38:06.1166264Z } 2023-01-11T21:38:06.1166330Z } 2023-01-11T21:38:06.1166464Z } 2023-01-11T21:38:06.1166552Z ''') 2023-01-11T21:38:06.1166557Z 2023-01-11T21:38:06.1166562Z 2023-01-11T21:38:06.1166656Z async_compile.wait(globals()) 2023-01-11T21:38:06.1166734Z del async_compile 2023-01-11T21:38:06.1166739Z 2023-01-11T21:38:06.1166815Z def call(args): 2023-01-11T21:38:06.1166890Z arg0_1, = args 2023-01-11T21:38:06.1166965Z args.clear() 2023-01-11T21:38:06.1167179Z buf0 = empty_strided((2, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1167382Z buf1 = empty_strided((2, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1167586Z buf2 = empty_strided((2, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1167786Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.1167860Z del arg0_1 2023-01-11T21:38:06.1167945Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1167950Z 2023-01-11T21:38:06.1167957Z 2023-01-11T21:38:06.1168038Z if __name__ == "__main__": 2023-01-11T21:38:06.1168157Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1168284Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1168485Z arg0_1 = rand_strided((2, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1168598Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1168603Z 2023-01-11T21:38:06.1168673Z ok (1.767s) 2023-01-11T21:38:06.1169125Z test_unbind_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1169288Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1169550Z [2023-01-11 21:32:35,746] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 297 2023-01-11T21:38:06.1169813Z [2023-01-11 21:32:35,753] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 297 2023-01-11T21:38:06.1169819Z 2023-01-11T21:38:06.1169918Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1169993Z import torch 2023-01-11T21:38:06.1170061Z import random 2023-01-11T21:38:06.1170179Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1170304Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1170311Z 2023-01-11T21:38:06.1170394Z aten = torch.ops.aten 2023-01-11T21:38:06.1170531Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1170627Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1170632Z 2023-01-11T21:38:06.1170706Z import triton 2023-01-11T21:38:06.1170797Z import triton.language as tl 2023-01-11T21:38:06.1170918Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1171058Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1171063Z 2023-01-11T21:38:06.1171068Z 2023-01-11T21:38:06.1171159Z async_compile.wait(globals()) 2023-01-11T21:38:06.1171236Z del async_compile 2023-01-11T21:38:06.1171241Z 2023-01-11T21:38:06.1171315Z def call(args): 2023-01-11T21:38:06.1171389Z arg0_1, = args 2023-01-11T21:38:06.1171468Z args.clear() 2023-01-11T21:38:06.1171741Z return (as_strided(arg0_1, (4, 4), (4, 1)), as_strided(arg0_1, (4, 4), (4, 1), 16), as_strided(arg0_1, (4, 4), (4, 1), 32), as_strided(arg0_1, (4, 4), (4, 1), 48), as_strided(arg0_1, (4, 4), (16, 4)), as_strided(arg0_1, (4, 4), (16, 4), 1), as_strided(arg0_1, (4, 4), (16, 4), 2), as_strided(arg0_1, (4, 4), (16, 4), 3), ) 2023-01-11T21:38:06.1171749Z 2023-01-11T21:38:06.1171754Z 2023-01-11T21:38:06.1171836Z if __name__ == "__main__": 2023-01-11T21:38:06.1171973Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1172103Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1172308Z arg0_1 = rand_strided((4, 4, 4), (16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1172419Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1172424Z 2023-01-11T21:38:06.1172496Z ok (0.030s) 2023-01-11T21:38:06.1172961Z test_unroll_small_reduction_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1173096Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1173355Z [2023-01-11 21:32:35,792] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 298 2023-01-11T21:38:06.1173615Z [2023-01-11 21:32:37,522] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 298 2023-01-11T21:38:06.1174033Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1174157Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1174452Z [2023-01-11 21:32:37,561] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 299 2023-01-11T21:38:06.1174457Z 2023-01-11T21:38:06.1174673Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1174751Z import torch 2023-01-11T21:38:06.1174826Z import random 2023-01-11T21:38:06.1174948Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1175073Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1175078Z 2023-01-11T21:38:06.1175167Z aten = torch.ops.aten 2023-01-11T21:38:06.1175296Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1175392Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1175397Z 2023-01-11T21:38:06.1175471Z import triton 2023-01-11T21:38:06.1175563Z import triton.language as tl 2023-01-11T21:38:06.1175690Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1175832Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1175841Z 2023-01-11T21:38:06.1175845Z 2023-01-11T21:38:06.1175997Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1176205Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1176323Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1176433Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1176541Z long* __restrict__ out_ptr1, 2023-01-11T21:38:06.1176643Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1176741Z long* __restrict__ out_ptr3, 2023-01-11T21:38:06.1176840Z float* __restrict__ out_ptr4, 2023-01-11T21:38:06.1176938Z bool* __restrict__ out_ptr5, 2023-01-11T21:38:06.1177036Z bool* __restrict__ out_ptr6, 2023-01-11T21:38:06.1177177Z long* __restrict__ out_ptr7, 2023-01-11T21:38:06.1177291Z long* __restrict__ out_ptr8, 2023-01-11T21:38:06.1177403Z float* __restrict__ out_ptr9, 2023-01-11T21:38:06.1177510Z float* __restrict__ out_ptr10) 2023-01-11T21:38:06.1177575Z { 2023-01-11T21:38:06.1177677Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1177744Z { 2023-01-11T21:38:06.1177866Z #pragma omp for 2023-01-11T21:38:06.1177957Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1178024Z { 2023-01-11T21:38:06.1178092Z { 2023-01-11T21:38:06.1178161Z { 2023-01-11T21:38:06.1178263Z auto tmp0 = in_ptr0[3*i0]; 2023-01-11T21:38:06.1178358Z auto tmp1 = in_ptr0[1 + (3*i0)]; 2023-01-11T21:38:06.1178456Z auto tmp3 = in_ptr0[2 + (3*i0)]; 2023-01-11T21:38:06.1178593Z auto tmp2 = (tmp1 != tmp1) ? tmp1 : std::min(tmp0, tmp1); 2023-01-11T21:38:06.1178724Z auto tmp4 = (tmp3 != tmp3) ? tmp3 : std::min(tmp2, tmp3); 2023-01-11T21:38:06.1178834Z auto tmp5 = static_cast<long>(0); 2023-01-11T21:38:06.1178937Z auto tmp6 = static_cast<long>(1); 2023-01-11T21:38:06.1179036Z auto tmp7 = tmp1 < tmp0; 2023-01-11T21:38:06.1179145Z auto tmp8 = tmp7 ? tmp6 : tmp5; 2023-01-11T21:38:06.1179246Z auto tmp9 = static_cast<long>(2); 2023-01-11T21:38:06.1179345Z auto tmp10 = tmp3 < tmp2; 2023-01-11T21:38:06.1179447Z auto tmp11 = tmp10 ? tmp9 : tmp8; 2023-01-11T21:38:06.1179581Z auto tmp12 = (tmp1 != tmp1) ?
tmp1 : std::max(tmp0, tmp1); 2023-01-11T21:38:06.1179717Z auto tmp13 = (tmp3 != tmp3) ? tmp3 : std::max(tmp12, tmp3); 2023-01-11T21:38:06.1179812Z auto tmp14 = tmp1 > tmp0; 2023-01-11T21:38:06.1179914Z auto tmp15 = tmp14 ? tmp6 : tmp5; 2023-01-11T21:38:06.1180012Z auto tmp16 = tmp3 > tmp12; 2023-01-11T21:38:06.1180148Z auto tmp17 = tmp16 ? tmp9 : tmp15; 2023-01-11T21:38:06.1180246Z auto tmp18 = tmp0 + tmp1; 2023-01-11T21:38:06.1180344Z auto tmp19 = tmp18 + tmp3; 2023-01-11T21:38:06.1180454Z auto tmp20 = static_cast<float>(1); 2023-01-11T21:38:06.1180553Z auto tmp21 = tmp0 > tmp20; 2023-01-11T21:38:06.1180669Z auto tmp22 = static_cast<long>(tmp21); 2023-01-11T21:38:06.1180782Z auto tmp23 = static_cast<bool>(tmp22); 2023-01-11T21:38:06.1180871Z auto tmp24 = tmp1 > tmp20; 2023-01-11T21:38:06.1180983Z auto tmp25 = static_cast<long>(tmp24); 2023-01-11T21:38:06.1181095Z auto tmp26 = static_cast<bool>(tmp25); 2023-01-11T21:38:06.1181195Z auto tmp27 = tmp23 || tmp26; 2023-01-11T21:38:06.1181292Z auto tmp28 = tmp3 > tmp20; 2023-01-11T21:38:06.1181401Z auto tmp29 = static_cast<long>(tmp28); 2023-01-11T21:38:06.1181514Z auto tmp30 = static_cast<bool>(tmp29); 2023-01-11T21:38:06.1181607Z auto tmp31 = tmp27 || tmp30; 2023-01-11T21:38:06.1181714Z auto tmp32 = static_cast<float>(0); 2023-01-11T21:38:06.1181809Z auto tmp33 = tmp0 > tmp32; 2023-01-11T21:38:06.1181912Z auto tmp34 = tmp33 == 0; 2023-01-11T21:38:06.1182023Z auto tmp35 = static_cast<long>(tmp34); 2023-01-11T21:38:06.1182132Z auto tmp36 = static_cast<bool>(tmp35); 2023-01-11T21:38:06.1182230Z auto tmp37 = tmp1 > tmp32; 2023-01-11T21:38:06.1182326Z auto tmp38 = tmp37 == 0; 2023-01-11T21:38:06.1182428Z auto tmp39 = static_cast<long>(tmp38); 2023-01-11T21:38:06.1182536Z auto tmp40 = static_cast<bool>(tmp39); 2023-01-11T21:38:06.1182634Z auto tmp41 = tmp36 || tmp40; 2023-01-11T21:38:06.1182732Z auto tmp42 = tmp3 > tmp32; 2023-01-11T21:38:06.1182827Z auto tmp43 = tmp42 == 0; 2023-01-11T21:38:06.1182935Z auto tmp44 = static_cast<long>(tmp43); 2023-01-11T21:38:06.1183041Z auto tmp45 = static_cast<bool>(tmp44); 2023-01-11T21:38:06.1183159Z auto tmp46 = tmp41 || tmp45; 2023-01-11T21:38:06.1183256Z auto tmp47 = tmp46 == 0; 2023-01-11T21:38:06.1183345Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1183437Z out_ptr1[i0] = tmp11; 2023-01-11T21:38:06.1183526Z out_ptr2[i0] = tmp13; 2023-01-11T21:38:06.1183617Z out_ptr3[i0] = tmp17; 2023-01-11T21:38:06.1183707Z out_ptr4[i0] = tmp19; 2023-01-11T21:38:06.1183787Z out_ptr5[i0] = tmp31; 2023-01-11T21:38:06.1183874Z out_ptr6[i0] = tmp47; 2023-01-11T21:38:06.1183962Z out_ptr7[i0] = tmp11; 2023-01-11T21:38:06.1184052Z out_ptr8[i0] = tmp17; 2023-01-11T21:38:06.1184142Z out_ptr9[i0] = tmp4; 2023-01-11T21:38:06.1184234Z out_ptr10[i0] = tmp13; 2023-01-11T21:38:06.1184303Z } 2023-01-11T21:38:06.1184364Z } 2023-01-11T21:38:06.1184431Z } 2023-01-11T21:38:06.1184499Z } 2023-01-11T21:38:06.1184568Z } 2023-01-11T21:38:06.1184657Z ''') 2023-01-11T21:38:06.1184663Z 2023-01-11T21:38:06.1184668Z 2023-01-11T21:38:06.1184764Z async_compile.wait(globals()) 2023-01-11T21:38:06.1184839Z del async_compile 2023-01-11T21:38:06.1184844Z 2023-01-11T21:38:06.1184911Z def call(args): 2023-01-11T21:38:06.1184986Z arg0_1, = args 2023-01-11T21:38:06.1185059Z args.clear() 2023-01-11T21:38:06.1185255Z buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1185454Z buf1 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1185722Z buf2 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1185911Z buf3 = empty_strided((8, ), (1, ),
device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1186097Z buf4 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1186276Z buf5 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:06.1186461Z buf6 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:06.1186646Z buf7 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1186831Z buf8 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1187018Z buf9 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1187208Z buf10 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1187576Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf6.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(buf8.data_ptr()), c_void_p(buf9.data_ptr()), c_void_p(buf10.data_ptr())) 2023-01-11T21:38:06.1187654Z del arg0_1 2023-01-11T21:38:06.1187786Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, buf9, buf10, ) 2023-01-11T21:38:06.1187798Z 2023-01-11T21:38:06.1187803Z 2023-01-11T21:38:06.1187876Z if __name__ == "__main__": 2023-01-11T21:38:06.1187997Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1188124Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1188322Z arg0_1 = rand_strided((8, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1188435Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1188440Z 2023-01-11T21:38:06.1188706Z [2023-01-11 21:32:39,271] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 299 2023-01-11T21:38:06.1188714Z 2023-01-11T21:38:06.1188812Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1188888Z import torch 2023-01-11T21:38:06.1188956Z import random 2023-01-11T21:38:06.1189077Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1189203Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1189239Z 2023-01-11T21:38:06.1189326Z aten = torch.ops.aten 2023-01-11T21:38:06.1189466Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1189562Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1189567Z 2023-01-11T21:38:06.1189642Z import triton 2023-01-11T21:38:06.1189733Z import triton.language as tl 2023-01-11T21:38:06.1189852Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1189994Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1190000Z 2023-01-11T21:38:06.1190004Z 2023-01-11T21:38:06.1190144Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1190351Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1190471Z extern "C" void kernel(bool* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.1190583Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1190693Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1190793Z long* __restrict__ out_ptr1, 2023-01-11T21:38:06.1190887Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1190985Z long* __restrict__ out_ptr3, 2023-01-11T21:38:06.1191086Z float* __restrict__ out_ptr4, 2023-01-11T21:38:06.1191182Z bool* __restrict__ out_ptr5, 2023-01-11T21:38:06.1191282Z long* __restrict__ out_ptr7, 
2023-01-11T21:38:06.1191380Z long* __restrict__ out_ptr8, 2023-01-11T21:38:06.1191481Z float* __restrict__ out_ptr9, 2023-01-11T21:38:06.1191606Z float* __restrict__ out_ptr10) 2023-01-11T21:38:06.1191670Z { 2023-01-11T21:38:06.1191764Z auto out_ptr6 = in_out_ptr0; 2023-01-11T21:38:06.1191866Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1191931Z { 2023-01-11T21:38:06.1192017Z #pragma omp for 2023-01-11T21:38:06.1192107Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1192167Z { 2023-01-11T21:38:06.1192234Z { 2023-01-11T21:38:06.1192303Z { 2023-01-11T21:38:06.1192440Z float tmp1 = std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.1192573Z struct IndexValue_11 {size_t index; float value;}; 2023-01-11T21:38:06.1192720Z IndexValue_11 tmp2{0, std::numeric_limits<float>::infinity()}; 2023-01-11T21:38:06.1192866Z #pragma omp declare reduction(argmin : struct IndexValue_11 :\ 2023-01-11T21:38:06.1193023Z omp_out.value = omp_in.value > omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:06.1193180Z omp_out.index = omp_in.value > omp_out.value ? omp_out.index : omp_in.index)\ 2023-01-11T21:38:06.1193334Z initializer(omp_priv = {0, std::numeric_limits<float>::infinity()}) 2023-01-11T21:38:06.1193593Z float tmp3 = -std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.1193722Z struct IndexValue_12 {size_t index; float value;}; 2023-01-11T21:38:06.1193954Z IndexValue_12 tmp4{0, -std::numeric_limits<float>::infinity()}; 2023-01-11T21:38:06.1194103Z #pragma omp declare reduction(argmax : struct IndexValue_12 :\ 2023-01-11T21:38:06.1194259Z omp_out.value = omp_in.value < omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:06.1194410Z omp_out.index = omp_in.value < omp_out.value ? omp_out.index : omp_in.index)\ 2023-01-11T21:38:06.1194643Z initializer(omp_priv = {0, -std::numeric_limits<float>::infinity()}) 2023-01-11T21:38:06.1194729Z float tmp5 = 0; 2023-01-11T21:38:06.1194812Z bool tmp10 = 0; 2023-01-11T21:38:06.1194896Z bool tmp16 = 0; 2023-01-11T21:38:06.1195058Z struct IndexValue_13 {size_t index; float value;}; 2023-01-11T21:38:06.1195209Z IndexValue_13 tmp17{0, std::numeric_limits<float>::infinity()}; 2023-01-11T21:38:06.1195374Z #pragma omp declare reduction(argmin : struct IndexValue_13 :\ 2023-01-11T21:38:06.1195555Z omp_out.value = omp_in.value > omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:06.1195698Z omp_out.index = omp_in.value > omp_out.value ? omp_out.index : omp_in.index)\ 2023-01-11T21:38:06.1195850Z initializer(omp_priv = {0, std::numeric_limits<float>::infinity()}) 2023-01-11T21:38:06.1195977Z struct IndexValue_14 {size_t index; float value;}; 2023-01-11T21:38:06.1196217Z IndexValue_14 tmp18{0, -std::numeric_limits<float>::infinity()}; 2023-01-11T21:38:06.1196360Z #pragma omp declare reduction(argmax : struct IndexValue_14 :\ 2023-01-11T21:38:06.1196518Z omp_out.value = omp_in.value < omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:06.1196666Z omp_out.index = omp_in.value < omp_out.value ?
omp_out.index : omp_in.index)\ 2023-01-11T21:38:06.1196901Z initializer(omp_priv = {0, -std::numeric_limits<float>::infinity()}) 2023-01-11T21:38:06.1197035Z float tmp19 = std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.1197239Z float tmp20 = -std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.1197335Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.1197406Z { 2023-01-11T21:38:06.1197509Z { 2023-01-11T21:38:06.1197624Z auto tmp0 = in_ptr0[i1 + (3*i0)]; 2023-01-11T21:38:06.1197739Z auto tmp6 = static_cast<float>(1); 2023-01-11T21:38:06.1197841Z auto tmp7 = tmp0 > tmp6; 2023-01-11T21:38:06.1197960Z auto tmp8 = static_cast<long>(tmp7); 2023-01-11T21:38:06.1198068Z auto tmp9 = static_cast<bool>(tmp8); 2023-01-11T21:38:06.1198184Z auto tmp11 = static_cast<float>(0); 2023-01-11T21:38:06.1198287Z auto tmp12 = tmp0 > tmp11; 2023-01-11T21:38:06.1198387Z auto tmp13 = tmp12 == 0; 2023-01-11T21:38:06.1198504Z auto tmp14 = static_cast<long>(tmp13); 2023-01-11T21:38:06.1198616Z auto tmp15 = static_cast<bool>(tmp14); 2023-01-11T21:38:06.1198731Z tmp1 = std::min(tmp1, tmp0); 2023-01-11T21:38:06.1198829Z if (tmp2.value > tmp0) { 2023-01-11T21:38:06.1198946Z tmp2.index = i1; tmp2.value = tmp0; 2023-01-11T21:38:06.1199024Z } 2023-01-11T21:38:06.1199132Z tmp3 = std::max(tmp3, tmp0); 2023-01-11T21:38:06.1199235Z if (tmp4.value < tmp0) { 2023-01-11T21:38:06.1199351Z tmp4.index = i1; tmp4.value = tmp0; 2023-01-11T21:38:06.1199424Z } 2023-01-11T21:38:06.1199513Z tmp5 += tmp0; 2023-01-11T21:38:06.1199603Z tmp10 = tmp10 || tmp9; 2023-01-11T21:38:06.1199702Z tmp16 = tmp16 || tmp15; 2023-01-11T21:38:06.1199801Z if (tmp17.value > tmp0) { 2023-01-11T21:38:06.1199918Z tmp17.index = i1; tmp17.value = tmp0; 2023-01-11T21:38:06.1199992Z } 2023-01-11T21:38:06.1200095Z if (tmp18.value < tmp0) { 2023-01-11T21:38:06.1200211Z tmp18.index = i1; tmp18.value = tmp0; 2023-01-11T21:38:06.1200279Z } 2023-01-11T21:38:06.1200390Z tmp19 = std::min(tmp19, tmp0); 2023-01-11T21:38:06.1200526Z tmp20 = std::max(tmp20, tmp0); 2023-01-11T21:38:06.1200603Z } 2023-01-11T21:38:06.1200675Z } 2023-01-11T21:38:06.1200768Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.1200867Z out_ptr1[i0] = tmp2.index; 2023-01-11T21:38:06.1200949Z out_ptr2[i0] = tmp3; 2023-01-11T21:38:06.1201047Z out_ptr3[i0] = tmp4.index; 2023-01-11T21:38:06.1201135Z out_ptr4[i0] = tmp5; 2023-01-11T21:38:06.1201226Z out_ptr5[i0] = tmp10; 2023-01-11T21:38:06.1201317Z out_ptr6[i0] = tmp16; 2023-01-11T21:38:06.1201423Z out_ptr7[i0] = tmp17.index; 2023-01-11T21:38:06.1201520Z out_ptr8[i0] = tmp18.index; 2023-01-11T21:38:06.1201602Z out_ptr9[i0] = tmp19; 2023-01-11T21:38:06.1201695Z out_ptr10[i0] = tmp20; 2023-01-11T21:38:06.1201767Z } 2023-01-11T21:38:06.1201837Z } 2023-01-11T21:38:06.1201907Z } 2023-01-11T21:38:06.1201991Z #pragma omp for 2023-01-11T21:38:06.1202079Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1202139Z { 2023-01-11T21:38:06.1202209Z { 2023-01-11T21:38:06.1202278Z { 2023-01-11T21:38:06.1202377Z auto tmp0 = out_ptr6[i0]; 2023-01-11T21:38:06.1202476Z auto tmp1 = tmp0 == 0; 2023-01-11T21:38:06.1202570Z in_out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.1202639Z } 2023-01-11T21:38:06.1202700Z } 2023-01-11T21:38:06.1202796Z } 2023-01-11T21:38:06.1202862Z } 2023-01-11T21:38:06.1202926Z } 2023-01-11T21:38:06.1203015Z ''') 2023-01-11T21:38:06.1203021Z 2023-01-11T21:38:06.1203026Z 2023-01-11T21:38:06.1203121Z async_compile.wait(globals()) 2023-01-11T21:38:06.1203198Z del async_compile 2023-01-11T21:38:06.1203203Z 2023-01-11T21:38:06.1203271Z def call(args): 2023-01-11T21:38:06.1203349Z arg0_1, = args
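# args.clear() below empties the caller's argument list so the wrapper holds the
# only reference to each input; the subsequent `del arg0_1` then frees the storage
# as early as possible, which is also what enables in-place buffer reuse like
# `buf7 = buf6; del buf6  # reuse` further down.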
2023-01-11T21:38:06.1203424Z args.clear() 2023-01-11T21:38:06.1203620Z buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1203809Z buf1 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1203998Z buf2 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1204183Z buf3 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1204361Z buf4 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1204544Z buf5 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:06.1204728Z buf6 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:06.1204913Z buf8 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1205097Z buf9 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1205293Z buf10 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1205485Z buf11 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1205578Z buf7 = buf6; del buf6 # reuse 2023-01-11T21:38:06.1205939Z kernel_cpp_0(c_void_p(buf7.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf8.data_ptr()), c_void_p(buf9.data_ptr()), c_void_p(buf10.data_ptr()), c_void_p(buf11.data_ptr())) 2023-01-11T21:38:06.1206014Z del arg0_1 2023-01-11T21:38:06.1206154Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf7, buf8, buf9, buf10, buf11, ) 2023-01-11T21:38:06.1206159Z 2023-01-11T21:38:06.1206165Z 2023-01-11T21:38:06.1206245Z if __name__ == "__main__": 2023-01-11T21:38:06.1206367Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1206525Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1206728Z arg0_1 = rand_strided((8, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1206842Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1206847Z 2023-01-11T21:38:06.1206918Z ok (3.519s) 2023-01-11T21:38:06.1207427Z test_unspec_inputs_cpu (__main__.CpuTests) ... 
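For orientation, the two kernels above for test_unroll_small_reduction_cpu fuse a set of row-wise reductions over an (8, 3) float input into one pass per row, with out_ptr7 through out_ptr10 duplicating the argmin/argmax/min/max results. A rough eager-mode equivalent of what the outputs hold (an illustrative sketch, not the test's actual source):

import torch

x = torch.randn(8, 3)
outs = (x.min(1).values, x.min(1).indices,   # min / argmin per row
        x.max(1).values, x.max(1).indices,   # max / argmax per row
        x.sum(1),                            # row sums
        (x > 1).any(1),                      # any(x > 1) per row
        (x > 0).all(1))                      # computed as !any(x <= 0) in the kernel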
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.1207507Z warnings.warn( 2023-01-11T21:38:06.1207769Z [2023-01-11 21:32:39,292] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 300 2023-01-11T21:38:06.1208020Z [2023-01-11 21:32:39,706] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1208285Z [2023-01-11 21:32:39,706] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 300 2023-01-11T21:38:06.1208540Z [2023-01-11 21:32:39,721] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 301 2023-01-11T21:38:06.1208786Z [2023-01-11 21:32:40,137] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1209046Z [2023-01-11 21:32:40,137] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 301 2023-01-11T21:38:06.1209298Z [2023-01-11 21:32:40,153] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 302 2023-01-11T21:38:06.1209543Z [2023-01-11 21:32:40,409] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1209825Z [2023-01-11 21:32:40,409] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 302 2023-01-11T21:38:06.1210078Z [2023-01-11 21:32:40,425] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 303 2023-01-11T21:38:06.1210084Z 2023-01-11T21:38:06.1210182Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1210259Z import torch 2023-01-11T21:38:06.1210334Z import random 2023-01-11T21:38:06.1210452Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1210575Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1210580Z 2023-01-11T21:38:06.1210662Z aten = torch.ops.aten 2023-01-11T21:38:06.1210793Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1210889Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1210894Z 2023-01-11T21:38:06.1210968Z import triton 2023-01-11T21:38:06.1211065Z import triton.language as tl 2023-01-11T21:38:06.1211191Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1211331Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1211336Z 2023-01-11T21:38:06.1211341Z 2023-01-11T21:38:06.1211513Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1211588Z import triton 2023-01-11T21:38:06.1211674Z import triton.language as tl 2023-01-11T21:38:06.1211793Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1211896Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1212028Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1212153Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1212158Z 2023-01-11T21:38:06.1212605Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1212682Z @triton.jit 2023-01-11T21:38:06.1212842Z def triton_(in_ptr0, 
in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1212908Z xnumel = 6 2023-01-11T21:38:06.1213048Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1213187Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1213270Z xmask = xindex < xnumel 2023-01-11T21:38:06.1213344Z x0 = xindex 2023-01-11T21:38:06.1213537Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1213610Z tmp1 = in_ptr1 2023-01-11T21:38:06.1213701Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.1213791Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.1213871Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.1213951Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.1214029Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.1214168Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1214302Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1214426Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1214626Z ''') 2023-01-11T21:38:06.1214633Z 2023-01-11T21:38:06.1214639Z 2023-01-11T21:38:06.1214735Z async_compile.wait(globals()) 2023-01-11T21:38:06.1214810Z del async_compile 2023-01-11T21:38:06.1214816Z 2023-01-11T21:38:06.1214889Z def call(args): 2023-01-11T21:38:06.1214967Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1215041Z args.clear() 2023-01-11T21:38:06.1215126Z with torch.cuda.device(0): 2023-01-11T21:38:06.1215329Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1215529Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1215724Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1215889Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1216058Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1216130Z del arg0_1 2023-01-11T21:38:06.1216203Z del arg1_1 2023-01-11T21:38:06.1216289Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1216294Z 2023-01-11T21:38:06.1216298Z 2023-01-11T21:38:06.1216380Z if __name__ == "__main__": 2023-01-11T21:38:06.1216499Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1216627Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1216826Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1217010Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.float16) 2023-01-11T21:38:06.1217176Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1217182Z 2023-01-11T21:38:06.1217191Z 2023-01-11T21:38:06.1217299Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1217378Z import torch 2023-01-11T21:38:06.1217453Z import random 2023-01-11T21:38:06.1217572Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1217696Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1217702Z 2023-01-11T21:38:06.1217786Z aten = torch.ops.aten 2023-01-11T21:38:06.1217924Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1218022Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1218027Z 2023-01-11T21:38:06.1218100Z import triton 2023-01-11T21:38:06.1218185Z import triton.language as tl 2023-01-11T21:38:06.1218310Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1218451Z from torch._C import 
_cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1218457Z 2023-01-11T21:38:06.1218461Z 2023-01-11T21:38:06.1218629Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1218704Z import triton 2023-01-11T21:38:06.1218796Z import triton.language as tl 2023-01-11T21:38:06.1218910Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1219006Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1219139Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1219303Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1219309Z 2023-01-11T21:38:06.1219756Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1219829Z @triton.jit 2023-01-11T21:38:06.1219989Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1220060Z xnumel = 6 2023-01-11T21:38:06.1220157Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1220292Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1220368Z xmask = xindex < xnumel 2023-01-11T21:38:06.1220438Z x0 = xindex 2023-01-11T21:38:06.1220512Z tmp0 = in_ptr0 2023-01-11T21:38:06.1220703Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1220804Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.1220893Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.1220971Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.1221043Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.1221117Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.1221251Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1221382Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1221512Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1221596Z ''') 2023-01-11T21:38:06.1221602Z 2023-01-11T21:38:06.1221634Z 2023-01-11T21:38:06.1221726Z async_compile.wait(globals()) 2023-01-11T21:38:06.1221796Z del async_compile 2023-01-11T21:38:06.1221801Z 2023-01-11T21:38:06.1221877Z def call(args): 2023-01-11T21:38:06.1221957Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1222030Z args.clear() 2023-01-11T21:38:06.1222123Z with torch.cuda.device(0): 2023-01-11T21:38:06.1222322Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1222517Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1222713Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1222798Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1222961Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1223035Z del arg0_1 2023-01-11T21:38:06.1223106Z del arg1_1 2023-01-11T21:38:06.1223194Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1223203Z 2023-01-11T21:38:06.1223208Z 2023-01-11T21:38:06.1223288Z if __name__ == "__main__": 2023-01-11T21:38:06.1223404Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1223524Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1223710Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float16) 2023-01-11T21:38:06.1223904Z arg1_1 = 
rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1224021Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1224027Z 2023-01-11T21:38:06.1224031Z 2023-01-11T21:38:06.1224128Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1224203Z import torch 2023-01-11T21:38:06.1224277Z import random 2023-01-11T21:38:06.1224395Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1224511Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1224516Z 2023-01-11T21:38:06.1224600Z aten = torch.ops.aten 2023-01-11T21:38:06.1224739Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1224835Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1224840Z 2023-01-11T21:38:06.1224912Z import triton 2023-01-11T21:38:06.1225006Z import triton.language as tl 2023-01-11T21:38:06.1225160Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1225302Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1225307Z 2023-01-11T21:38:06.1225312Z 2023-01-11T21:38:06.1225488Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1225572Z import triton 2023-01-11T21:38:06.1225680Z import triton.language as tl 2023-01-11T21:38:06.1225801Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1225902Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1226035Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1226161Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1226169Z 2023-01-11T21:38:06.1226617Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1226687Z @triton.jit 2023-01-11T21:38:06.1226848Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1226922Z xnumel = 6 2023-01-11T21:38:06.1227020Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1227152Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1227236Z xmask = xindex < xnumel 2023-01-11T21:38:06.1227307Z x0 = xindex 2023-01-11T21:38:06.1227491Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1227567Z tmp1 = in_ptr1 2023-01-11T21:38:06.1227663Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.1227780Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.1227858Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.1227935Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.1228012Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.1228140Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1228272Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1228402Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1228489Z ''') 2023-01-11T21:38:06.1228494Z 2023-01-11T21:38:06.1228499Z 2023-01-11T21:38:06.1228591Z async_compile.wait(globals()) 2023-01-11T21:38:06.1228667Z del async_compile 2023-01-11T21:38:06.1228672Z 2023-01-11T21:38:06.1228749Z def call(args): 2023-01-11T21:38:06.1228830Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1228898Z args.clear() 2023-01-11T21:38:06.1228990Z with torch.cuda.device(0): 2023-01-11T21:38:06.1229188Z 
buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1229390Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1229585Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1229677Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1229844Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1229917Z del arg0_1 2023-01-11T21:38:06.1229983Z del arg1_1 2023-01-11T21:38:06.1230072Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1230077Z 2023-01-11T21:38:06.1230081Z 2023-01-11T21:38:06.1230161Z if __name__ == "__main__": 2023-01-11T21:38:06.1230279Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1230406Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1230605Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1230797Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.bfloat16) 2023-01-11T21:38:06.1230908Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1230920Z 2023-01-11T21:38:06.1231162Z [2023-01-11 21:32:40,682] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1231454Z [2023-01-11 21:32:40,682] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 303 2023-01-11T21:38:06.1231708Z [2023-01-11 21:32:40,698] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 304 2023-01-11T21:38:06.1231960Z [2023-01-11 21:32:41,016] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1232222Z [2023-01-11 21:32:41,017] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 304 2023-01-11T21:38:06.1232475Z [2023-01-11 21:32:41,032] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 305 2023-01-11T21:38:06.1232725Z [2023-01-11 21:32:41,353] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1232983Z [2023-01-11 21:32:41,354] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 305 2023-01-11T21:38:06.1233234Z [2023-01-11 21:32:41,369] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 306 2023-01-11T21:38:06.1233240Z 2023-01-11T21:38:06.1233331Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1233406Z import torch 2023-01-11T21:38:06.1233479Z import random 2023-01-11T21:38:06.1233599Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1233721Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1233727Z 2023-01-11T21:38:06.1233808Z aten = torch.ops.aten 2023-01-11T21:38:06.1233944Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1234032Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1234071Z 2023-01-11T21:38:06.1234139Z import triton 2023-01-11T21:38:06.1234231Z import triton.language as tl 2023-01-11T21:38:06.1234358Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1234499Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1234505Z 2023-01-11T21:38:06.1234512Z 2023-01-11T21:38:06.1234678Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1234751Z import triton 2023-01-11T21:38:06.1234842Z 
import triton.language as tl 2023-01-11T21:38:06.1234949Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1235050Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1235198Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1235338Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1235344Z 2023-01-11T21:38:06.1235808Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1235886Z @triton.jit 2023-01-11T21:38:06.1236052Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1236128Z xnumel = 6 2023-01-11T21:38:06.1236227Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1236350Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1236432Z xmask = xindex < xnumel 2023-01-11T21:38:06.1236503Z x0 = xindex 2023-01-11T21:38:06.1236577Z tmp0 = in_ptr0 2023-01-11T21:38:06.1236768Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1236867Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.1236955Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.1237027Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.1237108Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.1237184Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.1237319Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1237453Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1237607Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1237694Z ''') 2023-01-11T21:38:06.1237699Z 2023-01-11T21:38:06.1237704Z 2023-01-11T21:38:06.1237789Z async_compile.wait(globals()) 2023-01-11T21:38:06.1237870Z del async_compile 2023-01-11T21:38:06.1237875Z 2023-01-11T21:38:06.1237951Z def call(args): 2023-01-11T21:38:06.1238031Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1238106Z args.clear() 2023-01-11T21:38:06.1238199Z with torch.cuda.device(0): 2023-01-11T21:38:06.1238400Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1238592Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1238790Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1238882Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1239050Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1239125Z del arg0_1 2023-01-11T21:38:06.1239201Z del arg1_1 2023-01-11T21:38:06.1239290Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1239295Z 2023-01-11T21:38:06.1239300Z 2023-01-11T21:38:06.1239379Z if __name__ == "__main__": 2023-01-11T21:38:06.1239490Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1239616Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1239805Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.bfloat16) 2023-01-11T21:38:06.1240002Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1240151Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1240156Z 2023-01-11T21:38:06.1240161Z 2023-01-11T21:38:06.1240260Z from ctypes import 
c_void_p, c_long 2023-01-11T21:38:06.1240335Z import torch 2023-01-11T21:38:06.1240410Z import random 2023-01-11T21:38:06.1240523Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1240650Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1240655Z 2023-01-11T21:38:06.1240736Z aten = torch.ops.aten 2023-01-11T21:38:06.1240873Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1240970Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1240975Z 2023-01-11T21:38:06.1241050Z import triton 2023-01-11T21:38:06.1241142Z import triton.language as tl 2023-01-11T21:38:06.1241264Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1241396Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1241402Z 2023-01-11T21:38:06.1241409Z 2023-01-11T21:38:06.1241576Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1241650Z import triton 2023-01-11T21:38:06.1241742Z import triton.language as tl 2023-01-11T21:38:06.1241856Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1241961Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1242099Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1242217Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1242229Z 2023-01-11T21:38:06.1242669Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1242744Z @triton.jit 2023-01-11T21:38:06.1242903Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1242979Z xnumel = 6 2023-01-11T21:38:06.1243077Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1243209Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1243293Z xmask = xindex < xnumel 2023-01-11T21:38:06.1243357Z x0 = xindex 2023-01-11T21:38:06.1243580Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1243657Z tmp1 = in_ptr1 2023-01-11T21:38:06.1243757Z tmp4 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.1243837Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.1243913Z tmp3 = tmp0 * tmp1 2023-01-11T21:38:06.1243991Z tmp5 = tmp4 / tmp1 2023-01-11T21:38:06.1244117Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.1244246Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1244374Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.1244458Z ''') 2023-01-11T21:38:06.1244466Z 2023-01-11T21:38:06.1244471Z 2023-01-11T21:38:06.1244563Z async_compile.wait(globals()) 2023-01-11T21:38:06.1244640Z del async_compile 2023-01-11T21:38:06.1244645Z 2023-01-11T21:38:06.1244717Z def call(args): 2023-01-11T21:38:06.1244796Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1244864Z args.clear() 2023-01-11T21:38:06.1244958Z with torch.cuda.device(0): 2023-01-11T21:38:06.1245159Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1245357Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1245553Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1245646Z 
stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1245812Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1245885Z del arg0_1 2023-01-11T21:38:06.1245951Z del arg1_1 2023-01-11T21:38:06.1246069Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1246074Z 2023-01-11T21:38:06.1246079Z 2023-01-11T21:38:06.1246155Z if __name__ == "__main__": 2023-01-11T21:38:06.1246273Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1246401Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1246600Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1246785Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1246897Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1246911Z 2023-01-11T21:38:06.1246915Z 2023-01-11T21:38:06.1247005Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1247077Z import torch 2023-01-11T21:38:06.1247151Z import random 2023-01-11T21:38:06.1247271Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1247393Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1247401Z 2023-01-11T21:38:06.1247482Z aten = torch.ops.aten 2023-01-11T21:38:06.1247619Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1247707Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1247717Z 2023-01-11T21:38:06.1247787Z import triton 2023-01-11T21:38:06.1247879Z import triton.language as tl 2023-01-11T21:38:06.1248007Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1248148Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1248154Z 2023-01-11T21:38:06.1248158Z 2023-01-11T21:38:06.1248324Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1248402Z import triton 2023-01-11T21:38:06.1248493Z import triton.language as tl 2023-01-11T21:38:06.1248599Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1248699Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1248829Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1248955Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1248962Z 2023-01-11T21:38:06.1249437Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1249513Z @triton.jit 2023-01-11T21:38:06.1249675Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1249749Z xnumel = 6 2023-01-11T21:38:06.1249841Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1249969Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1250051Z xmask = xindex < xnumel 2023-01-11T21:38:06.1250120Z x0 = xindex 2023-01-11T21:38:06.1250194Z tmp0 = in_ptr0 2023-01-11T21:38:06.1250387Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1250489Z tmp4 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.1250561Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.1250641Z tmp3 = tmp0 * tmp1 2023-01-11T21:38:06.1250722Z tmp5 = tmp0 / tmp4 2023-01-11T21:38:06.1250855Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 
2023-01-11T21:38:06.1250994Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1251122Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.1251206Z ''') 2023-01-11T21:38:06.1251212Z 2023-01-11T21:38:06.1251217Z 2023-01-11T21:38:06.1251309Z async_compile.wait(globals()) 2023-01-11T21:38:06.1251381Z del async_compile 2023-01-11T21:38:06.1251386Z 2023-01-11T21:38:06.1251460Z def call(args): 2023-01-11T21:38:06.1251539Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1251612Z args.clear() 2023-01-11T21:38:06.1251704Z with torch.cuda.device(0): 2023-01-11T21:38:06.1251904Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1252132Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1252323Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1252418Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1252588Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1252665Z del arg0_1 2023-01-11T21:38:06.1252736Z del arg1_1 2023-01-11T21:38:06.1252825Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1252831Z 2023-01-11T21:38:06.1252835Z 2023-01-11T21:38:06.1252916Z if __name__ == "__main__": 2023-01-11T21:38:06.1253036Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1253156Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1253341Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1253542Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1253660Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1253666Z 2023-01-11T21:38:06.1253914Z [2023-01-11 21:32:41,784] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1254183Z [2023-01-11 21:32:41,784] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 306 2023-01-11T21:38:06.1254436Z [2023-01-11 21:32:41,800] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 307 2023-01-11T21:38:06.1254804Z [2023-01-11 21:32:42,223] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1255064Z [2023-01-11 21:32:42,223] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 307 2023-01-11T21:38:06.1255308Z [2023-01-11 21:32:42,247] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 308 2023-01-11T21:38:06.1255559Z [2023-01-11 21:32:42,663] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1255819Z [2023-01-11 21:32:42,663] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 308 2023-01-11T21:38:06.1256113Z [2023-01-11 21:32:42,684] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 309 2023-01-11T21:38:06.1256120Z 2023-01-11T21:38:06.1256219Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1256294Z import torch 2023-01-11T21:38:06.1256369Z import random 2023-01-11T21:38:06.1256490Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1256606Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1256611Z 2023-01-11T21:38:06.1256695Z aten = torch.ops.aten 2023-01-11T21:38:06.1256831Z 
assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1256929Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1256937Z 2023-01-11T21:38:06.1257011Z import triton 2023-01-11T21:38:06.1257104Z import triton.language as tl 2023-01-11T21:38:06.1257288Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1257429Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1257435Z 2023-01-11T21:38:06.1257441Z 2023-01-11T21:38:06.1257602Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1257677Z import triton 2023-01-11T21:38:06.1257771Z import triton.language as tl 2023-01-11T21:38:06.1257884Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1257986Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1258119Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1258244Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1258249Z 2023-01-11T21:38:06.1258692Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'fp64', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1258851Z @triton.jit 2023-01-11T21:38:06.1259010Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1259087Z xnumel = 6 2023-01-11T21:38:06.1259186Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1259316Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1259400Z xmask = xindex < xnumel 2023-01-11T21:38:06.1259473Z x0 = xindex 2023-01-11T21:38:06.1259657Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1259734Z tmp1 = in_ptr1 2023-01-11T21:38:06.1259831Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.1259920Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.1259998Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.1260080Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.1260160Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.1260289Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1260424Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1260557Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1260642Z ''') 2023-01-11T21:38:06.1260648Z 2023-01-11T21:38:06.1260652Z 2023-01-11T21:38:06.1260744Z async_compile.wait(globals()) 2023-01-11T21:38:06.1260820Z del async_compile 2023-01-11T21:38:06.1260826Z 2023-01-11T21:38:06.1260901Z def call(args): 2023-01-11T21:38:06.1260981Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1261050Z args.clear() 2023-01-11T21:38:06.1261143Z with torch.cuda.device(0): 2023-01-11T21:38:06.1261338Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1261537Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1261734Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1261828Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1261995Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1262097Z del arg0_1 2023-01-11T21:38:06.1262165Z del arg1_1 2023-01-11T21:38:06.1262254Z return (buf0, buf1, buf2, ) 
2023-01-11T21:38:06.1262259Z 2023-01-11T21:38:06.1262264Z 2023-01-11T21:38:06.1262346Z if __name__ == "__main__": 2023-01-11T21:38:06.1262469Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1262597Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1262795Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1262980Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.1263092Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1263109Z 2023-01-11T21:38:06.1263113Z 2023-01-11T21:38:06.1263203Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1263277Z import torch 2023-01-11T21:38:06.1263351Z import random 2023-01-11T21:38:06.1263467Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1263599Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1263604Z 2023-01-11T21:38:06.1263686Z aten = torch.ops.aten 2023-01-11T21:38:06.1263821Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1263910Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1263921Z 2023-01-11T21:38:06.1263988Z import triton 2023-01-11T21:38:06.1264080Z import triton.language as tl 2023-01-11T21:38:06.1264205Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1264343Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1264349Z 2023-01-11T21:38:06.1264353Z 2023-01-11T21:38:06.1264548Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1264626Z import triton 2023-01-11T21:38:06.1264721Z import triton.language as tl 2023-01-11T21:38:06.1264827Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1264931Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1265071Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1265199Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1265205Z 2023-01-11T21:38:06.1265697Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'fp64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1265772Z @triton.jit 2023-01-11T21:38:06.1265936Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1266008Z xnumel = 6 2023-01-11T21:38:06.1266102Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1266232Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1266316Z xmask = xindex < xnumel 2023-01-11T21:38:06.1266388Z x0 = xindex 2023-01-11T21:38:06.1266462Z tmp0 = in_ptr0 2023-01-11T21:38:06.1266656Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1266754Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.1266835Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.1266914Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.1266991Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.1267065Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.1267197Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1267330Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1267461Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1267541Z ''') 
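[editor's sketch] The kernels dumped in this test are near-identical; they differ only in the scalar's dtype in the @pointwise signature ('fp16', 'bf16', 'fp32', 'fp64', later 'i32' and 'i64') and in which operand call() passes by value via .item(). A 0-d CPU tensor mixed with a CUDA tensor is treated as an unspecialized scalar kernel argument, so Inductor compiles one specialization per scalar dtype and per argument order, and cudagraphs is skipped because the inputs span two devices. A minimal sketch that would plausibly trigger this family of triton_fused_add_div_mul_0 kernels follows; it is a hypothetical repro under those assumptions, not the actual test body.

import torch

@torch.compile  # torch.compile is assumed available (PyTorch 2.0-era master)
def f(x, s):
    # mirrors the fused add/mul/div in the generated kernels
    return x + s, x * s, x / s

x = torch.randn(2, 3, device="cuda")
for dt in (torch.float16, torch.bfloat16, torch.float32,
           torch.float64, torch.int32, torch.int64):
    s = torch.ones((), dtype=dt)  # 0-d scalar tensor left on the CPU
    f(x, s)  # each dtype recompiles the same pointwise kernel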
2023-01-11T21:38:06.1267554Z 2023-01-11T21:38:06.1267559Z 2023-01-11T21:38:06.1267645Z async_compile.wait(globals()) 2023-01-11T21:38:06.1267722Z del async_compile 2023-01-11T21:38:06.1267727Z 2023-01-11T21:38:06.1267801Z def call(args): 2023-01-11T21:38:06.1267881Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1267955Z args.clear() 2023-01-11T21:38:06.1268079Z with torch.cuda.device(0): 2023-01-11T21:38:06.1268280Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1268470Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1268667Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1268760Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1268927Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1269001Z del arg0_1 2023-01-11T21:38:06.1269078Z del arg1_1 2023-01-11T21:38:06.1269166Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1269172Z 2023-01-11T21:38:06.1269176Z 2023-01-11T21:38:06.1269256Z if __name__ == "__main__": 2023-01-11T21:38:06.1269367Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1269496Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1269685Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.1269884Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1270001Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1270007Z 2023-01-11T21:38:06.1270011Z 2023-01-11T21:38:06.1270109Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1270183Z import torch 2023-01-11T21:38:06.1270257Z import random 2023-01-11T21:38:06.1270369Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1270492Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1270523Z 2023-01-11T21:38:06.1270606Z aten = torch.ops.aten 2023-01-11T21:38:06.1270743Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1270840Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1270845Z 2023-01-11T21:38:06.1270919Z import triton 2023-01-11T21:38:06.1271012Z import triton.language as tl 2023-01-11T21:38:06.1271129Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1271268Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1271273Z 2023-01-11T21:38:06.1271277Z 2023-01-11T21:38:06.1271444Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1271517Z import triton 2023-01-11T21:38:06.1271609Z import triton.language as tl 2023-01-11T21:38:06.1271728Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1271830Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1271961Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1272082Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1272087Z 2023-01-11T21:38:06.1272536Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1272609Z @triton.jit 2023-01-11T21:38:06.1272769Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 
2023-01-11T21:38:06.1272843Z xnumel = 6 2023-01-11T21:38:06.1272940Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1273072Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1273154Z xmask = xindex < xnumel 2023-01-11T21:38:06.1273218Z x0 = xindex 2023-01-11T21:38:06.1273406Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1273483Z tmp1 = in_ptr1 2023-01-11T21:38:06.1273580Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.1273669Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.1273750Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.1273827Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.1273897Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.1274060Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1274192Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1274328Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1274411Z ''') 2023-01-11T21:38:06.1274416Z 2023-01-11T21:38:06.1274421Z 2023-01-11T21:38:06.1274512Z async_compile.wait(globals()) 2023-01-11T21:38:06.1274588Z del async_compile 2023-01-11T21:38:06.1274593Z 2023-01-11T21:38:06.1274668Z def call(args): 2023-01-11T21:38:06.1274740Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1274815Z args.clear() 2023-01-11T21:38:06.1274912Z with torch.cuda.device(0): 2023-01-11T21:38:06.1275110Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1275335Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1275551Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1275647Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1275807Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1275880Z del arg0_1 2023-01-11T21:38:06.1275953Z del arg1_1 2023-01-11T21:38:06.1276043Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1276048Z 2023-01-11T21:38:06.1276052Z 2023-01-11T21:38:06.1276130Z if __name__ == "__main__": 2023-01-11T21:38:06.1276249Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1276373Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1276596Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1276771Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.1276888Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1276893Z 2023-01-11T21:38:06.1277144Z [2023-01-11 21:32:43,109] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1277410Z [2023-01-11 21:32:43,109] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 309 2023-01-11T21:38:06.1277664Z [2023-01-11 21:32:43,124] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 310 2023-01-11T21:38:06.1277912Z [2023-01-11 21:32:43,540] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1278171Z [2023-01-11 21:32:43,540] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 310 2023-01-11T21:38:06.1278427Z [2023-01-11 21:32:43,557] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 311 2023-01-11T21:38:06.1278673Z [2023-01-11 21:32:43,974] 
torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1278925Z [2023-01-11 21:32:43,974] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 311 2023-01-11T21:38:06.1278939Z 2023-01-11T21:38:06.1279031Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1279103Z import torch 2023-01-11T21:38:06.1279178Z import random 2023-01-11T21:38:06.1279302Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1279425Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1279430Z 2023-01-11T21:38:06.1279512Z aten = torch.ops.aten 2023-01-11T21:38:06.1279650Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1279739Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1279744Z 2023-01-11T21:38:06.1279819Z import triton 2023-01-11T21:38:06.1279911Z import triton.language as tl 2023-01-11T21:38:06.1280034Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1280178Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1280183Z 2023-01-11T21:38:06.1280188Z 2023-01-11T21:38:06.1280381Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1280458Z import triton 2023-01-11T21:38:06.1280550Z import triton.language as tl 2023-01-11T21:38:06.1280658Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1280757Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1280894Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1281020Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1281025Z 2023-01-11T21:38:06.1281467Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'i32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1281542Z @triton.jit 2023-01-11T21:38:06.1281700Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1281775Z xnumel = 6 2023-01-11T21:38:06.1281869Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1281999Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1282087Z xmask = xindex < xnumel 2023-01-11T21:38:06.1282157Z x0 = xindex 2023-01-11T21:38:06.1282232Z tmp0 = in_ptr0 2023-01-11T21:38:06.1282422Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1282519Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.1282601Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.1282681Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.1282758Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.1282868Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.1283003Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1283135Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1283266Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1283344Z ''') 2023-01-11T21:38:06.1283352Z 2023-01-11T21:38:06.1283363Z 2023-01-11T21:38:06.1283450Z async_compile.wait(globals()) 2023-01-11T21:38:06.1283526Z del async_compile 2023-01-11T21:38:06.1283532Z 2023-01-11T21:38:06.1283606Z def call(args): 2023-01-11T21:38:06.1283685Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1283761Z args.clear() 2023-01-11T21:38:06.1283855Z with 
torch.cuda.device(0): 2023-01-11T21:38:06.1284056Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1284247Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1284443Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1284537Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1284702Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1284775Z del arg0_1 2023-01-11T21:38:06.1284850Z del arg1_1 2023-01-11T21:38:06.1284938Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1284943Z 2023-01-11T21:38:06.1284948Z 2023-01-11T21:38:06.1285030Z if __name__ == "__main__": 2023-01-11T21:38:06.1285141Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1285269Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1285450Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.1285647Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1285768Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1285775Z 2023-01-11T21:38:06.1285779Z 2023-01-11T21:38:06.1285878Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1285952Z import torch 2023-01-11T21:38:06.1286020Z import random 2023-01-11T21:38:06.1286138Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1286262Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1286296Z 2023-01-11T21:38:06.1286381Z aten = torch.ops.aten 2023-01-11T21:38:06.1286521Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1286614Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1286619Z 2023-01-11T21:38:06.1286697Z import triton 2023-01-11T21:38:06.1286787Z import triton.language as tl 2023-01-11T21:38:06.1286905Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1287045Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1287050Z 2023-01-11T21:38:06.1287054Z 2023-01-11T21:38:06.1287219Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1287299Z import triton 2023-01-11T21:38:06.1287391Z import triton.language as tl 2023-01-11T21:38:06.1287506Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1287607Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1287740Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1287862Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1287867Z 2023-01-11T21:38:06.1288307Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i64', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1288383Z @triton.jit 2023-01-11T21:38:06.1288542Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1288615Z xnumel = 6 2023-01-11T21:38:06.1288712Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1288869Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1288952Z xmask = xindex < xnumel 2023-01-11T21:38:06.1289015Z x0 = xindex 2023-01-11T21:38:06.1289207Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 
2023-01-11T21:38:06.1289281Z tmp1 = in_ptr1 2023-01-11T21:38:06.1289382Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.1289471Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.1289551Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.1289630Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.1289701Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.1289834Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1289968Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1290099Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1290184Z ''') 2023-01-11T21:38:06.1290192Z 2023-01-11T21:38:06.1290196Z 2023-01-11T21:38:06.1290288Z async_compile.wait(globals()) 2023-01-11T21:38:06.1290364Z del async_compile 2023-01-11T21:38:06.1290369Z 2023-01-11T21:38:06.1290441Z def call(args): 2023-01-11T21:38:06.1290514Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1290588Z args.clear() 2023-01-11T21:38:06.1290682Z with torch.cuda.device(0): 2023-01-11T21:38:06.1290881Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1291078Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1291273Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1291365Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1291524Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1291596Z del arg0_1 2023-01-11T21:38:06.1291673Z del arg1_1 2023-01-11T21:38:06.1291765Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1291771Z 2023-01-11T21:38:06.1291775Z 2023-01-11T21:38:06.1291857Z if __name__ == "__main__": 2023-01-11T21:38:06.1291977Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1292103Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1292326Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1292503Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1292623Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1292628Z 2023-01-11T21:38:06.1292632Z 2023-01-11T21:38:06.1292729Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1292803Z import torch 2023-01-11T21:38:06.1292875Z import random 2023-01-11T21:38:06.1292994Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1293117Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1293122Z 2023-01-11T21:38:06.1293200Z aten = torch.ops.aten 2023-01-11T21:38:06.1293335Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1293430Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1293436Z 2023-01-11T21:38:06.1293510Z import triton 2023-01-11T21:38:06.1293599Z import triton.language as tl 2023-01-11T21:38:06.1293729Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1293869Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1293874Z 2023-01-11T21:38:06.1293879Z 2023-01-11T21:38:06.1294044Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1294111Z import triton 2023-01-11T21:38:06.1294201Z import triton.language as tl 2023-01-11T21:38:06.1294311Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1294412Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1294655Z from 
torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1294781Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1294829Z 2023-01-11T21:38:06.1295272Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1295347Z @triton.jit 2023-01-11T21:38:06.1295504Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1295570Z xnumel = 6 2023-01-11T21:38:06.1295667Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1295796Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1295880Z xmask = xindex < xnumel 2023-01-11T21:38:06.1295953Z x0 = xindex 2023-01-11T21:38:06.1296027Z tmp0 = in_ptr0 2023-01-11T21:38:06.1296219Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1296313Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.1296402Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.1296480Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.1296557Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.1296635Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.1296769Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1296905Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1297030Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1297114Z ''') 2023-01-11T21:38:06.1297120Z 2023-01-11T21:38:06.1297166Z 2023-01-11T21:38:06.1297275Z async_compile.wait(globals()) 2023-01-11T21:38:06.1297367Z del async_compile 2023-01-11T21:38:06.1297373Z 2023-01-11T21:38:06.1297459Z def call(args): 2023-01-11T21:38:06.1297551Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1297627Z args.clear() 2023-01-11T21:38:06.1297713Z with torch.cuda.device(0): 2023-01-11T21:38:06.1297915Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1298112Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1298303Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1298435Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1298603Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1298678Z del arg0_1 2023-01-11T21:38:06.1298750Z del arg1_1 2023-01-11T21:38:06.1298832Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1298837Z 2023-01-11T21:38:06.1298842Z 2023-01-11T21:38:06.1298920Z if __name__ == "__main__": 2023-01-11T21:38:06.1299038Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1299166Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1299349Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1299553Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1299673Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1299679Z 2023-01-11T21:38:06.1299750Z ok (4.701s) 2023-01-11T21:38:06.1300202Z test_unsqueeze_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1300336Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1300596Z [2023-01-11 21:32:44,005] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 312 2023-01-11T21:38:06.1300858Z [2023-01-11 21:32:45,826] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 312 2023-01-11T21:38:06.1300902Z 2023-01-11T21:38:06.1301000Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1301077Z import torch 2023-01-11T21:38:06.1301153Z import random 2023-01-11T21:38:06.1301272Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1301399Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1301404Z 2023-01-11T21:38:06.1301479Z aten = torch.ops.aten 2023-01-11T21:38:06.1301614Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1301712Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1301717Z 2023-01-11T21:38:06.1301793Z import triton 2023-01-11T21:38:06.1301889Z import triton.language as tl 2023-01-11T21:38:06.1302016Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1302155Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1302161Z 2023-01-11T21:38:06.1302168Z 2023-01-11T21:38:06.1302305Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1302506Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1302632Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1302737Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1302842Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.1302944Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1303044Z float* __restrict__ out_ptr3) 2023-01-11T21:38:06.1303111Z { 2023-01-11T21:38:06.1303207Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1303272Z { 2023-01-11T21:38:06.1303353Z #pragma omp for 2023-01-11T21:38:06.1303442Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1303512Z { 2023-01-11T21:38:06.1303653Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1303797Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1303888Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1304016Z auto tmp3 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.1304107Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1304224Z auto tmp5 = tmp0 + tmp3; 2023-01-11T21:38:06.1304323Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1304418Z tmp5.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.1304511Z tmp4.store(out_ptr2 + 8*i0); 2023-01-11T21:38:06.1304606Z tmp5.store(out_ptr3 + 8*i0); 2023-01-11T21:38:06.1304666Z } 2023-01-11T21:38:06.1304768Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1304857Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:06.1304924Z { 2023-01-11T21:38:06.1305013Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1305117Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1305210Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1305307Z auto tmp3 = static_cast(2); 2023-01-11T21:38:06.1305397Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1305505Z auto tmp5 = tmp0 + tmp3; 2023-01-11T21:38:06.1305597Z out_ptr0[i0] = tmp4; 
2023-01-11T21:38:06.1305704Z out_ptr1[i0] = tmp5; 2023-01-11T21:38:06.1305790Z out_ptr2[i0] = tmp4; 2023-01-11T21:38:06.1305866Z out_ptr3[i0] = tmp5; 2023-01-11T21:38:06.1305936Z } 2023-01-11T21:38:06.1306002Z } 2023-01-11T21:38:06.1306067Z } 2023-01-11T21:38:06.1306152Z ''') 2023-01-11T21:38:06.1306158Z 2023-01-11T21:38:06.1306162Z 2023-01-11T21:38:06.1306257Z async_compile.wait(globals()) 2023-01-11T21:38:06.1306334Z del async_compile 2023-01-11T21:38:06.1306339Z 2023-01-11T21:38:06.1306415Z def call(args): 2023-01-11T21:38:06.1306482Z arg0_1, = args 2023-01-11T21:38:06.1306557Z args.clear() 2023-01-11T21:38:06.1306802Z buf0 = empty_strided((2, 2, 2, 2, 1), (8, 4, 2, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1307019Z buf1 = empty_strided((2, 2, 1, 2, 2), (8, 4, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1307233Z buf2 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1307449Z buf3 = empty_strided((2, 2, 2, 1, 2), (8, 4, 2, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1307665Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.1307738Z del arg0_1 2023-01-11T21:38:06.1307823Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.1307828Z 2023-01-11T21:38:06.1307832Z 2023-01-11T21:38:06.1307912Z if __name__ == "__main__": 2023-01-11T21:38:06.1308031Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1308160Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1308370Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1308483Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1308488Z 2023-01-11T21:38:06.1308558Z ok (1.852s) 2023-01-11T21:38:06.1309023Z test_unsqueeze_inplace_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
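In the unsqueeze CPU dump above, four differently-shaped outputs come out of one C++ kernel: the main loop works on 8-wide at::vec::Vectorized chunks (the <float> template arguments appear to have been stripped by the log renderer), and the scalar tail loop is empty because its bounds are for(long i0=16; i0<16; ...), the 16 elements dividing evenly by the vector width. Both intermediates are stored twice, since an unsqueeze only relabels strides: tmp4 = x + 1 + 2 feeds out_ptr0 and out_ptr2, tmp5 = x + 2 feeds out_ptr1 and out_ptr3. A hedged reconstruction of the test function, inferred from the buffer shapes in call(); the actual source in test_torchinductor.py may differ in detail:

    import torch

    def fn(x):
        a = x + 1 + 2  # tmp4 in the kernel, written to out_ptr0 / out_ptr2
        b = x + 2      # tmp5 in the kernel, written to out_ptr1 / out_ptr3
        return (
            a.unsqueeze(-1),  # buf0: (2, 2, 2, 2, 1)
            b.unsqueeze(2),   # buf1: (2, 2, 1, 2, 2)
            a.unsqueeze(0),   # buf2: (1, 2, 2, 2, 2)
            b.unsqueeze(3),   # buf3: (2, 2, 2, 1, 2)
        )

    out = torch.compile(fn)(torch.rand(2, 2, 2, 2))

The test_unsqueeze_inplace_cpu dump that follows repeats the pattern with two outputs; an in-place unsqueeze_ at the Python level still compiles to plain stores into freshly allocated buffers.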
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1309156Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1309406Z [2023-01-11 21:32:45,858] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 313 2023-01-11T21:38:06.1309666Z [2023-01-11 21:32:47,526] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 313 2023-01-11T21:38:06.1309674Z 2023-01-11T21:38:06.1309773Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1309848Z import torch 2023-01-11T21:38:06.1309924Z import random 2023-01-11T21:38:06.1310041Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1310194Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1310199Z 2023-01-11T21:38:06.1310282Z aten = torch.ops.aten 2023-01-11T21:38:06.1310411Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1310508Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1310513Z 2023-01-11T21:38:06.1310588Z import triton 2023-01-11T21:38:06.1310680Z import triton.language as tl 2023-01-11T21:38:06.1310808Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1310952Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1310958Z 2023-01-11T21:38:06.1310962Z 2023-01-11T21:38:06.1311103Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1311312Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1311430Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1311535Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1311640Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.1311708Z { 2023-01-11T21:38:06.1311812Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1311880Z { 2023-01-11T21:38:06.1311962Z #pragma omp for 2023-01-11T21:38:06.1312042Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1312109Z { 2023-01-11T21:38:06.1312249Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1312385Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1312476Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1312640Z auto tmp3 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.1312729Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1312818Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1312913Z tmp4.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.1312979Z } 2023-01-11T21:38:06.1313079Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1313169Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:06.1313235Z { 2023-01-11T21:38:06.1313325Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1313422Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1313513Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1313617Z auto tmp3 = static_cast(2); 2023-01-11T21:38:06.1313707Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1313793Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1313876Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:06.1313943Z } 2023-01-11T21:38:06.1314006Z } 2023-01-11T21:38:06.1314069Z } 2023-01-11T21:38:06.1314154Z ''') 2023-01-11T21:38:06.1314160Z 2023-01-11T21:38:06.1314164Z 2023-01-11T21:38:06.1314260Z async_compile.wait(globals()) 2023-01-11T21:38:06.1314339Z del async_compile 2023-01-11T21:38:06.1314344Z 2023-01-11T21:38:06.1314418Z def call(args): 2023-01-11T21:38:06.1314498Z arg0_1, = args 2023-01-11T21:38:06.1314565Z args.clear() 
2023-01-11T21:38:06.1314783Z buf0 = empty_strided((2, 2, 1, 2, 2), (8, 4, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1314999Z buf1 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1315173Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1315253Z del arg0_1 2023-01-11T21:38:06.1315332Z return (buf0, buf1, ) 2023-01-11T21:38:06.1315337Z 2023-01-11T21:38:06.1315341Z 2023-01-11T21:38:06.1315442Z if __name__ == "__main__": 2023-01-11T21:38:06.1315575Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1315712Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1315918Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1316031Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1316065Z 2023-01-11T21:38:06.1316136Z ok (1.700s) 2023-01-11T21:38:06.1316604Z test_upsample_bicubic2d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1316736Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1316993Z [2023-01-11 21:32:49,107] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 314 2023-01-11T21:38:06.1317002Z 2023-01-11T21:38:06.1317098Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1317173Z import torch 2023-01-11T21:38:06.1317248Z import random 2023-01-11T21:38:06.1317360Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1317486Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1317492Z 2023-01-11T21:38:06.1317574Z aten = torch.ops.aten 2023-01-11T21:38:06.1317709Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1317803Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1317808Z 2023-01-11T21:38:06.1317882Z import triton 2023-01-11T21:38:06.1317975Z import triton.language as tl 2023-01-11T21:38:06.1318093Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1318230Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1318236Z 2023-01-11T21:38:06.1318266Z 2023-01-11T21:38:06.1318402Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1318610Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1318736Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1318842Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1318947Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.1319011Z { 2023-01-11T21:38:06.1319105Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1319174Z { 2023-01-11T21:38:06.1319269Z #pragma omp for collapse(2) 2023-01-11T21:38:06.1319354Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.1319421Z { 2023-01-11T21:38:06.1319512Z for(long i1=0; i1<128; i1+=1) 2023-01-11T21:38:06.1319579Z { 2023-01-11T21:38:06.1319660Z #pragma GCC ivdep 2023-01-11T21:38:06.1319753Z for(long i2=0; i2<128; i2+=1) 2023-01-11T21:38:06.1319826Z { 2023-01-11T21:38:06.1319898Z { 2023-01-11T21:38:06.1319972Z { 
2023-01-11T21:38:06.1320089Z auto tmp0 = static_cast(i2); 2023-01-11T21:38:06.1320197Z auto tmp1 = 0.2440944881889764 * tmp0; 2023-01-11T21:38:06.1320303Z auto tmp2 = std::floor(tmp1); 2023-01-11T21:38:06.1320452Z auto tmp3 = tmp1 - tmp2; 2023-01-11T21:38:06.1320565Z auto tmp4 = static_cast(i1); 2023-01-11T21:38:06.1320672Z auto tmp5 = 0.49606299212598426 * tmp4; 2023-01-11T21:38:06.1320780Z auto tmp6 = std::floor(tmp5); 2023-01-11T21:38:06.1320924Z auto tmp7 = tmp5 - tmp6; 2023-01-11T21:38:06.1321035Z auto tmp8 = static_cast(tmp6); 2023-01-11T21:38:06.1321140Z auto tmp9 = static_cast(tmp2); 2023-01-11T21:38:06.1321291Z auto tmp10 = tmp8 + -1; 2023-01-11T21:38:06.1321389Z auto tmp11 = tmp8 + 0; 2023-01-11T21:38:06.1321485Z auto tmp12 = tmp8 + 1; 2023-01-11T21:38:06.1321579Z auto tmp13 = tmp8 + 2; 2023-01-11T21:38:06.1321747Z auto tmp14 = tmp9 + -1; 2023-01-11T21:38:06.1321843Z auto tmp15 = tmp9 + 0; 2023-01-11T21:38:06.1321936Z auto tmp16 = tmp9 + 1; 2023-01-11T21:38:06.1322022Z auto tmp17 = tmp9 + 2; 2023-01-11T21:38:06.1322161Z auto tmp18 = (tmp10 != tmp10) ? tmp10 : std::min(63, tmp10); 2023-01-11T21:38:06.1322294Z auto tmp19 = (tmp18 != tmp18) ? tmp18 : std::max(0, tmp18); 2023-01-11T21:38:06.1322428Z auto tmp20 = (tmp14 != tmp14) ? tmp14 : std::min(31, tmp14); 2023-01-11T21:38:06.1322560Z auto tmp21 = (tmp20 != tmp20) ? tmp20 : std::max(0, tmp20); 2023-01-11T21:38:06.1322685Z auto tmp22 = in_ptr0[tmp21 + (32*tmp19) + (2048*i0)]; 2023-01-11T21:38:06.1322815Z auto tmp23 = (tmp15 != tmp15) ? tmp15 : std::min(31, tmp15); 2023-01-11T21:38:06.1322950Z auto tmp24 = (tmp23 != tmp23) ? tmp23 : std::max(0, tmp23); 2023-01-11T21:38:06.1323072Z auto tmp25 = in_ptr0[tmp24 + (32*tmp19) + (2048*i0)]; 2023-01-11T21:38:06.1323195Z auto tmp26 = (tmp16 != tmp16) ? tmp16 : std::min(31, tmp16); 2023-01-11T21:38:06.1323325Z auto tmp27 = (tmp26 != tmp26) ? tmp26 : std::max(0, tmp26); 2023-01-11T21:38:06.1323448Z auto tmp28 = in_ptr0[tmp27 + (32*tmp19) + (2048*i0)]; 2023-01-11T21:38:06.1323576Z auto tmp29 = (tmp17 != tmp17) ? tmp17 : std::min(31, tmp17); 2023-01-11T21:38:06.1323733Z auto tmp30 = (tmp29 != tmp29) ? 
tmp29 : std::max(0, tmp29); 2023-01-11T21:38:06.1323856Z auto tmp31 = in_ptr0[tmp30 + (32*tmp19) + (2048*i0)]; 2023-01-11T21:38:06.1323957Z auto tmp32 = tmp3 + 1.0; 2023-01-11T21:38:06.1324110Z auto tmp33 = -0.75 * tmp32; 2023-01-11T21:38:06.1324254Z auto tmp34 = tmp33 - -3.75; 2023-01-11T21:38:06.1324356Z auto tmp35 = tmp34 * tmp32; 2023-01-11T21:38:06.1324505Z auto tmp36 = tmp35 + -6.0; 2023-01-11T21:38:06.1324605Z auto tmp37 = tmp36 * tmp32; 2023-01-11T21:38:06.1324751Z auto tmp38 = tmp37 - -3.0; 2023-01-11T21:38:06.1324854Z auto tmp39 = 1.25 * tmp3; 2023-01-11T21:38:06.1325001Z auto tmp40 = tmp39 - 2.25; 2023-01-11T21:38:06.1325099Z auto tmp41 = tmp40 * tmp3; 2023-01-11T21:38:06.1325196Z auto tmp42 = tmp41 * tmp3; 2023-01-11T21:38:06.1325296Z auto tmp43 = tmp42 + 1.0; 2023-01-11T21:38:06.1325438Z auto tmp44 = 1.0 - tmp3; 2023-01-11T21:38:06.1325542Z auto tmp45 = 1.25 * tmp44; 2023-01-11T21:38:06.1325686Z auto tmp46 = tmp45 - 2.25; 2023-01-11T21:38:06.1325788Z auto tmp47 = tmp46 * tmp44; 2023-01-11T21:38:06.1325886Z auto tmp48 = tmp47 * tmp44; 2023-01-11T21:38:06.1325978Z auto tmp49 = tmp48 + 1.0; 2023-01-11T21:38:06.1326077Z auto tmp50 = tmp44 + 1.0; 2023-01-11T21:38:06.1326223Z auto tmp51 = -0.75 * tmp50; 2023-01-11T21:38:06.1326370Z auto tmp52 = tmp51 - -3.75; 2023-01-11T21:38:06.1326468Z auto tmp53 = tmp52 * tmp50; 2023-01-11T21:38:06.1326615Z auto tmp54 = tmp53 + -6.0; 2023-01-11T21:38:06.1326714Z auto tmp55 = tmp54 * tmp50; 2023-01-11T21:38:06.1326860Z auto tmp56 = tmp55 - -3.0; 2023-01-11T21:38:06.1326952Z auto tmp57 = tmp22 * tmp38; 2023-01-11T21:38:06.1327077Z auto tmp58 = tmp25 * tmp43; 2023-01-11T21:38:06.1327179Z auto tmp59 = tmp28 * tmp49; 2023-01-11T21:38:06.1327277Z auto tmp60 = tmp31 * tmp56; 2023-01-11T21:38:06.1327376Z auto tmp61 = tmp59 + tmp60; 2023-01-11T21:38:06.1327474Z auto tmp62 = tmp58 + tmp61; 2023-01-11T21:38:06.1327574Z auto tmp63 = tmp57 + tmp62; 2023-01-11T21:38:06.1327701Z auto tmp64 = (tmp11 != tmp11) ? tmp11 : std::min(63, tmp11); 2023-01-11T21:38:06.1327834Z auto tmp65 = (tmp64 != tmp64) ? tmp64 : std::max(0, tmp64); 2023-01-11T21:38:06.1327963Z auto tmp66 = in_ptr0[tmp21 + (32*tmp65) + (2048*i0)]; 2023-01-11T21:38:06.1328087Z auto tmp67 = in_ptr0[tmp24 + (32*tmp65) + (2048*i0)]; 2023-01-11T21:38:06.1328212Z auto tmp68 = in_ptr0[tmp27 + (32*tmp65) + (2048*i0)]; 2023-01-11T21:38:06.1328331Z auto tmp69 = in_ptr0[tmp30 + (32*tmp65) + (2048*i0)]; 2023-01-11T21:38:06.1328433Z auto tmp70 = tmp66 * tmp38; 2023-01-11T21:38:06.1328533Z auto tmp71 = tmp67 * tmp43; 2023-01-11T21:38:06.1328626Z auto tmp72 = tmp68 * tmp49; 2023-01-11T21:38:06.1328727Z auto tmp73 = tmp69 * tmp56; 2023-01-11T21:38:06.1328825Z auto tmp74 = tmp72 + tmp73; 2023-01-11T21:38:06.1328923Z auto tmp75 = tmp71 + tmp74; 2023-01-11T21:38:06.1329057Z auto tmp76 = tmp70 + tmp75; 2023-01-11T21:38:06.1329189Z auto tmp77 = (tmp12 != tmp12) ? tmp12 : std::min(63, tmp12); 2023-01-11T21:38:06.1329317Z auto tmp78 = (tmp77 != tmp77) ? 
tmp77 : std::max(0, tmp77); 2023-01-11T21:38:06.1329441Z auto tmp79 = in_ptr0[tmp21 + (32*tmp78) + (2048*i0)]; 2023-01-11T21:38:06.1329555Z auto tmp80 = in_ptr0[tmp24 + (32*tmp78) + (2048*i0)]; 2023-01-11T21:38:06.1329675Z auto tmp81 = in_ptr0[tmp27 + (32*tmp78) + (2048*i0)]; 2023-01-11T21:38:06.1329799Z auto tmp82 = in_ptr0[tmp30 + (32*tmp78) + (2048*i0)]; 2023-01-11T21:38:06.1329899Z auto tmp83 = tmp79 * tmp38; 2023-01-11T21:38:06.1329999Z auto tmp84 = tmp80 * tmp43; 2023-01-11T21:38:06.1330103Z auto tmp85 = tmp81 * tmp49; 2023-01-11T21:38:06.1330205Z auto tmp86 = tmp82 * tmp56; 2023-01-11T21:38:06.1330302Z auto tmp87 = tmp85 + tmp86; 2023-01-11T21:38:06.1330393Z auto tmp88 = tmp84 + tmp87; 2023-01-11T21:38:06.1330494Z auto tmp89 = tmp83 + tmp88; 2023-01-11T21:38:06.1330630Z auto tmp90 = (tmp13 != tmp13) ? tmp13 : std::min(63, tmp13); 2023-01-11T21:38:06.1330762Z auto tmp91 = (tmp90 != tmp90) ? tmp90 : std::max(0, tmp90); 2023-01-11T21:38:06.1330883Z auto tmp92 = in_ptr0[tmp21 + (32*tmp91) + (2048*i0)]; 2023-01-11T21:38:06.1331003Z auto tmp93 = in_ptr0[tmp24 + (32*tmp91) + (2048*i0)]; 2023-01-11T21:38:06.1331122Z auto tmp94 = in_ptr0[tmp27 + (32*tmp91) + (2048*i0)]; 2023-01-11T21:38:06.1331241Z auto tmp95 = in_ptr0[tmp30 + (32*tmp91) + (2048*i0)]; 2023-01-11T21:38:06.1331337Z auto tmp96 = tmp92 * tmp38; 2023-01-11T21:38:06.1331437Z auto tmp97 = tmp93 * tmp43; 2023-01-11T21:38:06.1331537Z auto tmp98 = tmp94 * tmp49; 2023-01-11T21:38:06.1331635Z auto tmp99 = tmp95 * tmp56; 2023-01-11T21:38:06.1331769Z auto tmp100 = tmp98 + tmp99; 2023-01-11T21:38:06.1331877Z auto tmp101 = tmp97 + tmp100; 2023-01-11T21:38:06.1331981Z auto tmp102 = tmp96 + tmp101; 2023-01-11T21:38:06.1332083Z auto tmp103 = tmp7 + 1.0; 2023-01-11T21:38:06.1332231Z auto tmp104 = -0.75 * tmp103; 2023-01-11T21:38:06.1332384Z auto tmp105 = tmp104 - -3.75; 2023-01-11T21:38:06.1332489Z auto tmp106 = tmp105 * tmp103; 2023-01-11T21:38:06.1332642Z auto tmp107 = tmp106 + -6.0; 2023-01-11T21:38:06.1332749Z auto tmp108 = tmp107 * tmp103; 2023-01-11T21:38:06.1332904Z auto tmp109 = tmp108 - -3.0; 2023-01-11T21:38:06.1333003Z auto tmp110 = 1.25 * tmp7; 2023-01-11T21:38:06.1333147Z auto tmp111 = tmp110 - 2.25; 2023-01-11T21:38:06.1333254Z auto tmp112 = tmp111 * tmp7; 2023-01-11T21:38:06.1333356Z auto tmp113 = tmp112 * tmp7; 2023-01-11T21:38:06.1333458Z auto tmp114 = tmp113 + 1.0; 2023-01-11T21:38:06.1333603Z auto tmp115 = 1.0 - tmp7; 2023-01-11T21:38:06.1333704Z auto tmp116 = 1.25 * tmp115; 2023-01-11T21:38:06.1333851Z auto tmp117 = tmp116 - 2.25; 2023-01-11T21:38:06.1333949Z auto tmp118 = tmp117 * tmp115; 2023-01-11T21:38:06.1334055Z auto tmp119 = tmp118 * tmp115; 2023-01-11T21:38:06.1334184Z auto tmp120 = tmp119 + 1.0; 2023-01-11T21:38:06.1334283Z auto tmp121 = tmp115 + 1.0; 2023-01-11T21:38:06.1334433Z auto tmp122 = -0.75 * tmp121; 2023-01-11T21:38:06.1334939Z auto tmp123 = tmp122 - -3.75; 2023-01-11T21:38:06.1335049Z auto tmp124 = tmp123 * tmp121; 2023-01-11T21:38:06.1335201Z auto tmp125 = tmp124 + -6.0; 2023-01-11T21:38:06.1335299Z auto tmp126 = tmp125 * tmp121; 2023-01-11T21:38:06.1335451Z auto tmp127 = tmp126 - -3.0; 2023-01-11T21:38:06.1335554Z auto tmp128 = tmp63 * tmp109; 2023-01-11T21:38:06.1335660Z auto tmp129 = tmp76 * tmp114; 2023-01-11T21:38:06.1335764Z auto tmp130 = tmp89 * tmp120; 2023-01-11T21:38:06.1335867Z auto tmp131 = tmp102 * tmp127; 2023-01-11T21:38:06.1335975Z auto tmp132 = tmp130 + tmp131; 2023-01-11T21:38:06.1336071Z auto tmp133 = tmp129 + tmp132; 2023-01-11T21:38:06.1336173Z auto tmp134 = tmp128 + tmp133; 
2023-01-11T21:38:06.1336288Z out_ptr0[i2 + (128*i1) + (16384*i0)] = tmp134; 2023-01-11T21:38:06.1336360Z } 2023-01-11T21:38:06.1336431Z } 2023-01-11T21:38:06.1336501Z } 2023-01-11T21:38:06.1336570Z } 2023-01-11T21:38:06.1336630Z } 2023-01-11T21:38:06.1336728Z #pragma omp for collapse(2) 2023-01-11T21:38:06.1336814Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.1336882Z { 2023-01-11T21:38:06.1336972Z for(long i1=0; i1<128; i1+=1) 2023-01-11T21:38:06.1337039Z { 2023-01-11T21:38:06.1337180Z #pragma GCC ivdep 2023-01-11T21:38:06.1337289Z for(long i2=0; i2<256; i2+=1) 2023-01-11T21:38:06.1337363Z { 2023-01-11T21:38:06.1337435Z { 2023-01-11T21:38:06.1337508Z { 2023-01-11T21:38:06.1337625Z auto tmp0 = static_cast(i2); 2023-01-11T21:38:06.1337723Z auto tmp1 = tmp0 + 0.5; 2023-01-11T21:38:06.1337874Z auto tmp2 = 0.125 * tmp1; 2023-01-11T21:38:06.1338015Z auto tmp3 = tmp2 - 0.5; 2023-01-11T21:38:06.1338127Z auto tmp4 = std::floor(tmp3); 2023-01-11T21:38:06.1338274Z auto tmp5 = tmp3 - tmp4; 2023-01-11T21:38:06.1338389Z auto tmp6 = static_cast(i1); 2023-01-11T21:38:06.1338486Z auto tmp7 = tmp6 + 0.5; 2023-01-11T21:38:06.1338579Z auto tmp8 = 0.5 * tmp7; 2023-01-11T21:38:06.1338722Z auto tmp9 = tmp8 - 0.5; 2023-01-11T21:38:06.1338836Z auto tmp10 = std::floor(tmp9); 2023-01-11T21:38:06.1338979Z auto tmp11 = tmp9 - tmp10; 2023-01-11T21:38:06.1339093Z auto tmp12 = static_cast(tmp10); 2023-01-11T21:38:06.1339209Z auto tmp13 = static_cast(tmp4); 2023-01-11T21:38:06.1339357Z auto tmp14 = tmp12 + -1; 2023-01-11T21:38:06.1339456Z auto tmp15 = tmp12 + 0; 2023-01-11T21:38:06.1339551Z auto tmp16 = tmp12 + 1; 2023-01-11T21:38:06.1339647Z auto tmp17 = tmp12 + 2; 2023-01-11T21:38:06.1339784Z auto tmp18 = tmp13 + -1; 2023-01-11T21:38:06.1339880Z auto tmp19 = tmp13 + 0; 2023-01-11T21:38:06.1339976Z auto tmp20 = tmp13 + 1; 2023-01-11T21:38:06.1340069Z auto tmp21 = tmp13 + 2; 2023-01-11T21:38:06.1340249Z auto tmp22 = (tmp14 != tmp14) ? tmp14 : std::min(63, tmp14); 2023-01-11T21:38:06.1340383Z auto tmp23 = (tmp22 != tmp22) ? tmp22 : std::max(0, tmp22); 2023-01-11T21:38:06.1340513Z auto tmp24 = (tmp18 != tmp18) ? tmp18 : std::min(31, tmp18); 2023-01-11T21:38:06.1340648Z auto tmp25 = (tmp24 != tmp24) ? tmp24 : std::max(0, tmp24); 2023-01-11T21:38:06.1340765Z auto tmp26 = in_ptr0[tmp25 + (32*tmp23) + (2048*i0)]; 2023-01-11T21:38:06.1340893Z auto tmp27 = (tmp19 != tmp19) ? tmp19 : std::min(31, tmp19); 2023-01-11T21:38:06.1341025Z auto tmp28 = (tmp27 != tmp27) ? tmp27 : std::max(0, tmp27); 2023-01-11T21:38:06.1341146Z auto tmp29 = in_ptr0[tmp28 + (32*tmp23) + (2048*i0)]; 2023-01-11T21:38:06.1341274Z auto tmp30 = (tmp20 != tmp20) ? tmp20 : std::min(31, tmp20); 2023-01-11T21:38:06.1341406Z auto tmp31 = (tmp30 != tmp30) ? tmp30 : std::max(0, tmp30); 2023-01-11T21:38:06.1341525Z auto tmp32 = in_ptr0[tmp31 + (32*tmp23) + (2048*i0)]; 2023-01-11T21:38:06.1341653Z auto tmp33 = (tmp21 != tmp21) ? tmp21 : std::min(31, tmp21); 2023-01-11T21:38:06.1341783Z auto tmp34 = (tmp33 != tmp33) ? 
tmp33 : std::max(0, tmp33); 2023-01-11T21:38:06.1341906Z auto tmp35 = in_ptr0[tmp34 + (32*tmp23) + (2048*i0)]; 2023-01-11T21:38:06.1342000Z auto tmp36 = tmp5 + 1.0; 2023-01-11T21:38:06.1342150Z auto tmp37 = -0.75 * tmp36; 2023-01-11T21:38:06.1342300Z auto tmp38 = tmp37 - -3.75; 2023-01-11T21:38:06.1342401Z auto tmp39 = tmp38 * tmp36; 2023-01-11T21:38:06.1342549Z auto tmp40 = tmp39 + -6.0; 2023-01-11T21:38:06.1342650Z auto tmp41 = tmp40 * tmp36; 2023-01-11T21:38:06.1342799Z auto tmp42 = tmp41 - -3.0; 2023-01-11T21:38:06.1342892Z auto tmp43 = 1.25 * tmp5; 2023-01-11T21:38:06.1343042Z auto tmp44 = tmp43 - 2.25; 2023-01-11T21:38:06.1343169Z auto tmp45 = tmp44 * tmp5; 2023-01-11T21:38:06.1343271Z auto tmp46 = tmp45 * tmp5; 2023-01-11T21:38:06.1343372Z auto tmp47 = tmp46 + 1.0; 2023-01-11T21:38:06.1343516Z auto tmp48 = 1.0 - tmp5; 2023-01-11T21:38:06.1343612Z auto tmp49 = 1.25 * tmp48; 2023-01-11T21:38:06.1343752Z auto tmp50 = tmp49 - 2.25; 2023-01-11T21:38:06.1343854Z auto tmp51 = tmp50 * tmp48; 2023-01-11T21:38:06.1343952Z auto tmp52 = tmp51 * tmp48; 2023-01-11T21:38:06.1344056Z auto tmp53 = tmp52 + 1.0; 2023-01-11T21:38:06.1344154Z auto tmp54 = tmp48 + 1.0; 2023-01-11T21:38:06.1344299Z auto tmp55 = -0.75 * tmp54; 2023-01-11T21:38:06.1344446Z auto tmp56 = tmp55 - -3.75; 2023-01-11T21:38:06.1344548Z auto tmp57 = tmp56 * tmp54; 2023-01-11T21:38:06.1344686Z auto tmp58 = tmp57 + -6.0; 2023-01-11T21:38:06.1344787Z auto tmp59 = tmp58 * tmp54; 2023-01-11T21:38:06.1344934Z auto tmp60 = tmp59 - -3.0; 2023-01-11T21:38:06.1345033Z auto tmp61 = tmp26 * tmp42; 2023-01-11T21:38:06.1345133Z auto tmp62 = tmp29 * tmp47; 2023-01-11T21:38:06.1345232Z auto tmp63 = tmp32 * tmp53; 2023-01-11T21:38:06.1345335Z auto tmp64 = tmp35 * tmp60; 2023-01-11T21:38:06.1345426Z auto tmp65 = tmp63 + tmp64; 2023-01-11T21:38:06.1345558Z auto tmp66 = tmp62 + tmp65; 2023-01-11T21:38:06.1345657Z auto tmp67 = tmp61 + tmp66; 2023-01-11T21:38:06.1345788Z auto tmp68 = (tmp15 != tmp15) ? tmp15 : std::min(63, tmp15); 2023-01-11T21:38:06.1345924Z auto tmp69 = (tmp68 != tmp68) ? tmp68 : std::max(0, tmp68); 2023-01-11T21:38:06.1346048Z auto tmp70 = in_ptr0[tmp25 + (32*tmp69) + (2048*i0)]; 2023-01-11T21:38:06.1346170Z auto tmp71 = in_ptr0[tmp28 + (32*tmp69) + (2048*i0)]; 2023-01-11T21:38:06.1346291Z auto tmp72 = in_ptr0[tmp31 + (32*tmp69) + (2048*i0)]; 2023-01-11T21:38:06.1346413Z auto tmp73 = in_ptr0[tmp34 + (32*tmp69) + (2048*i0)]; 2023-01-11T21:38:06.1346507Z auto tmp74 = tmp70 * tmp42; 2023-01-11T21:38:06.1346608Z auto tmp75 = tmp71 * tmp47; 2023-01-11T21:38:06.1346712Z auto tmp76 = tmp72 * tmp53; 2023-01-11T21:38:06.1346811Z auto tmp77 = tmp73 * tmp60; 2023-01-11T21:38:06.1346910Z auto tmp78 = tmp76 + tmp77; 2023-01-11T21:38:06.1347010Z auto tmp79 = tmp75 + tmp78; 2023-01-11T21:38:06.1347107Z auto tmp80 = tmp74 + tmp79; 2023-01-11T21:38:06.1347232Z auto tmp81 = (tmp16 != tmp16) ? tmp16 : std::min(63, tmp16); 2023-01-11T21:38:06.1347365Z auto tmp82 = (tmp81 != tmp81) ? 
tmp81 : std::max(0, tmp81); 2023-01-11T21:38:06.1347485Z auto tmp83 = in_ptr0[tmp25 + (32*tmp82) + (2048*i0)]; 2023-01-11T21:38:06.1347607Z auto tmp84 = in_ptr0[tmp28 + (32*tmp82) + (2048*i0)]; 2023-01-11T21:38:06.1347727Z auto tmp85 = in_ptr0[tmp31 + (32*tmp82) + (2048*i0)]; 2023-01-11T21:38:06.1347851Z auto tmp86 = in_ptr0[tmp34 + (32*tmp82) + (2048*i0)]; 2023-01-11T21:38:06.1347953Z auto tmp87 = tmp83 * tmp42; 2023-01-11T21:38:06.1348050Z auto tmp88 = tmp84 * tmp47; 2023-01-11T21:38:06.1348221Z auto tmp89 = tmp85 * tmp53; 2023-01-11T21:38:06.1348322Z auto tmp90 = tmp86 * tmp60; 2023-01-11T21:38:06.1348424Z auto tmp91 = tmp89 + tmp90; 2023-01-11T21:38:06.1348521Z auto tmp92 = tmp88 + tmp91; 2023-01-11T21:38:06.1348620Z auto tmp93 = tmp87 + tmp92; 2023-01-11T21:38:06.1348751Z auto tmp94 = (tmp17 != tmp17) ? tmp17 : std::min(63, tmp17); 2023-01-11T21:38:06.1348885Z auto tmp95 = (tmp94 != tmp94) ? tmp94 : std::max(0, tmp94); 2023-01-11T21:38:06.1349005Z auto tmp96 = in_ptr0[tmp25 + (32*tmp95) + (2048*i0)]; 2023-01-11T21:38:06.1349122Z auto tmp97 = in_ptr0[tmp28 + (32*tmp95) + (2048*i0)]; 2023-01-11T21:38:06.1349241Z auto tmp98 = in_ptr0[tmp31 + (32*tmp95) + (2048*i0)]; 2023-01-11T21:38:06.1349363Z auto tmp99 = in_ptr0[tmp34 + (32*tmp95) + (2048*i0)]; 2023-01-11T21:38:06.1349469Z auto tmp100 = tmp96 * tmp42; 2023-01-11T21:38:06.1349577Z auto tmp101 = tmp97 * tmp47; 2023-01-11T21:38:06.1349680Z auto tmp102 = tmp98 * tmp53; 2023-01-11T21:38:06.1349785Z auto tmp103 = tmp99 * tmp60; 2023-01-11T21:38:06.1349891Z auto tmp104 = tmp102 + tmp103; 2023-01-11T21:38:06.1349988Z auto tmp105 = tmp101 + tmp104; 2023-01-11T21:38:06.1350092Z auto tmp106 = tmp100 + tmp105; 2023-01-11T21:38:06.1350193Z auto tmp107 = tmp11 + 1.0; 2023-01-11T21:38:06.1350375Z auto tmp108 = -0.75 * tmp107; 2023-01-11T21:38:06.1350529Z auto tmp109 = tmp108 - -3.75; 2023-01-11T21:38:06.1350636Z auto tmp110 = tmp109 * tmp107; 2023-01-11T21:38:06.1350790Z auto tmp111 = tmp110 + -6.0; 2023-01-11T21:38:06.1350886Z auto tmp112 = tmp111 * tmp107; 2023-01-11T21:38:06.1351038Z auto tmp113 = tmp112 - -3.0; 2023-01-11T21:38:06.1351138Z auto tmp114 = 1.25 * tmp11; 2023-01-11T21:38:06.1351289Z auto tmp115 = tmp114 - 2.25; 2023-01-11T21:38:06.1351393Z auto tmp116 = tmp115 * tmp11; 2023-01-11T21:38:06.1351497Z auto tmp117 = tmp116 * tmp11; 2023-01-11T21:38:06.1351596Z auto tmp118 = tmp117 + 1.0; 2023-01-11T21:38:06.1351741Z auto tmp119 = 1.0 - tmp11; 2023-01-11T21:38:06.1351838Z auto tmp120 = 1.25 * tmp119; 2023-01-11T21:38:06.1351989Z auto tmp121 = tmp120 - 2.25; 2023-01-11T21:38:06.1352094Z auto tmp122 = tmp121 * tmp119; 2023-01-11T21:38:06.1352200Z auto tmp123 = tmp122 * tmp119; 2023-01-11T21:38:06.1352301Z auto tmp124 = tmp123 + 1.0; 2023-01-11T21:38:06.1352400Z auto tmp125 = tmp119 + 1.0; 2023-01-11T21:38:06.1352552Z auto tmp126 = -0.75 * tmp125; 2023-01-11T21:38:06.1352696Z auto tmp127 = tmp126 - -3.75; 2023-01-11T21:38:06.1352800Z auto tmp128 = tmp127 * tmp125; 2023-01-11T21:38:06.1352951Z auto tmp129 = tmp128 + -6.0; 2023-01-11T21:38:06.1353055Z auto tmp130 = tmp129 * tmp125; 2023-01-11T21:38:06.1353208Z auto tmp131 = tmp130 - -3.0; 2023-01-11T21:38:06.1353314Z auto tmp132 = tmp67 * tmp113; 2023-01-11T21:38:06.1353415Z auto tmp133 = tmp80 * tmp118; 2023-01-11T21:38:06.1353519Z auto tmp134 = tmp93 * tmp124; 2023-01-11T21:38:06.1353644Z auto tmp135 = tmp106 * tmp131; 2023-01-11T21:38:06.1353751Z auto tmp136 = tmp134 + tmp135; 2023-01-11T21:38:06.1353853Z auto tmp137 = tmp133 + tmp136; 2023-01-11T21:38:06.1353956Z auto tmp138 = tmp132 + 
tmp137; 2023-01-11T21:38:06.1354067Z out_ptr1[i2 + (256*i1) + (32768*i0)] = tmp138; 2023-01-11T21:38:06.1354140Z } 2023-01-11T21:38:06.1354211Z } 2023-01-11T21:38:06.1354273Z } 2023-01-11T21:38:06.1354341Z } 2023-01-11T21:38:06.1354409Z } 2023-01-11T21:38:06.1354482Z } 2023-01-11T21:38:06.1354550Z } 2023-01-11T21:38:06.1354633Z ''') 2023-01-11T21:38:06.1354639Z 2023-01-11T21:38:06.1354643Z 2023-01-11T21:38:06.1354739Z async_compile.wait(globals()) 2023-01-11T21:38:06.1354809Z del async_compile 2023-01-11T21:38:06.1354814Z 2023-01-11T21:38:06.1354888Z def call(args): 2023-01-11T21:38:06.1354963Z arg0_1, = args 2023-01-11T21:38:06.1355041Z args.clear() 2023-01-11T21:38:06.1355287Z buf0 = empty_strided((4, 3, 128, 128), (49152, 16384, 128, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1355545Z buf1 = empty_strided((4, 3, 128, 256), (98304, 32768, 256, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1355714Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1355788Z del arg0_1 2023-01-11T21:38:06.1355862Z return (buf0, buf1, ) 2023-01-11T21:38:06.1355868Z 2023-01-11T21:38:06.1355872Z 2023-01-11T21:38:06.1355952Z if __name__ == "__main__": 2023-01-11T21:38:06.1356111Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1356240Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1356463Z arg0_1 = rand_strided((4, 3, 64, 32), (6144, 2048, 32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1356580Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1356847Z [2023-01-11 21:32:51,050] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 314 2023-01-11T21:38:06.1356854Z 2023-01-11T21:38:06.1356925Z ok (3.538s) 2023-01-11T21:38:06.1357394Z test_upsample_bilinear2d_a_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
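The long bicubic kernel above is a fully inlined 2D cubic convolution: for each output pixel it computes four clamped neighbor rows and columns (the min/max ladders written NaN-safely as (t != t) ? t : std::min(...)) and blends the 16 taps with Keys cubic-convolution weights. All the magic constants come from one kernel parameter, a = -0.75, expanded by Horner's rule: 1.25 = a + 2 and 2.25 = a + 3 for the inner taps, 3.75 = -5a, 6.0 = -8a and 3.0 = -4a for the outer ones. The first loop nest maps indices as i * (in - 1) / (out - 1) (align_corners=True, 0.2440944... = 31/127) and the second as (i + 0.5) * in / out - 0.5 (align_corners=False, 0.125 = 32/256). A small reference implementation of the weight function; the name cubic_weight is mine, not the log's:

    def cubic_weight(s: float, a: float = -0.75) -> float:
        # Keys (1981) cubic convolution kernel, as inlined by the codegen:
        #   |s| <= 1    : (a+2)|s|^3 - (a+3)|s|^2 + 1
        #   1 < |s| < 2 : a|s|^3 - 5a|s|^2 + 8a|s| - 4a
        s = abs(s)
        if s <= 1.0:
            return ((a + 2.0) * s - (a + 3.0)) * s * s + 1.0
        if s < 2.0:
            return ((a * s - 5.0 * a) * s + 8.0 * a) * s - 4.0 * a
        return 0.0

At each fractional source offset t in [0, 1) the four taps get weights cubic_weight(t + 1), cubic_weight(t), cubic_weight(1 - t) and cubic_weight(2 - t), which is exactly the tmp32..tmp56 block repeated once per axis.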
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1357523Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1357779Z [2023-01-11 21:32:51,625] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 315 2023-01-11T21:38:06.1357785Z 2023-01-11T21:38:06.1357883Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1357963Z import torch 2023-01-11T21:38:06.1358038Z import random 2023-01-11T21:38:06.1358160Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1358283Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1358288Z 2023-01-11T21:38:06.1358370Z aten = torch.ops.aten 2023-01-11T21:38:06.1358500Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1358598Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1358603Z 2023-01-11T21:38:06.1358678Z import triton 2023-01-11T21:38:06.1358771Z import triton.language as tl 2023-01-11T21:38:06.1358896Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1359040Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1359046Z 2023-01-11T21:38:06.1359050Z 2023-01-11T21:38:06.1359186Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1359393Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1359533Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.1359642Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:06.1359752Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1359858Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.1359961Z float* __restrict__ out_ptr3) 2023-01-11T21:38:06.1360027Z { 2023-01-11T21:38:06.1360118Z auto out_ptr0 = in_out_ptr0; 2023-01-11T21:38:06.1360201Z auto out_ptr2 = in_out_ptr1; 2023-01-11T21:38:06.1360304Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1360372Z { 2023-01-11T21:38:06.1360458Z #pragma omp for 2023-01-11T21:38:06.1360547Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1360614Z { 2023-01-11T21:38:06.1360702Z #pragma GCC ivdep 2023-01-11T21:38:06.1360787Z for(long i1=0; i1<45; i1+=1) 2023-01-11T21:38:06.1360859Z { 2023-01-11T21:38:06.1360949Z #pragma GCC ivdep 2023-01-11T21:38:06.1361045Z for(long i2=0; i2<45; i2+=1) 2023-01-11T21:38:06.1361114Z { 2023-01-11T21:38:06.1361186Z { 2023-01-11T21:38:06.1361259Z { 2023-01-11T21:38:06.1361368Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1361486Z auto tmp1 = static_cast(0.5); 2023-01-11T21:38:06.1361585Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1361709Z auto tmp3 = static_cast(0.8222222222222222); 2023-01-11T21:38:06.1361838Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:06.1361983Z auto tmp5 = tmp4 - tmp1; 2023-01-11T21:38:06.1362098Z auto tmp6 = static_cast(0.0); 2023-01-11T21:38:06.1362229Z auto tmp7 = (tmp6 != tmp6) ? tmp6 : std::max(tmp5, tmp6); 2023-01-11T21:38:06.1362345Z auto tmp8 = std::floor(tmp7); 2023-01-11T21:38:06.1362461Z auto tmp9 = static_cast(tmp8); 2023-01-11T21:38:06.1362577Z auto tmp10 = static_cast(i2); 2023-01-11T21:38:06.1362682Z auto tmp11 = tmp10 + tmp1; 2023-01-11T21:38:06.1362807Z auto tmp12 = static_cast(0.8444444444444444); 2023-01-11T21:38:06.1362912Z auto tmp13 = tmp11 * tmp12; 2023-01-11T21:38:06.1363080Z auto tmp14 = tmp13 - tmp1; 2023-01-11T21:38:06.1363247Z auto tmp15 = (tmp6 != tmp6) ? 
tmp6 : std::max(tmp14, tmp6); 2023-01-11T21:38:06.1363397Z auto tmp16 = std::floor(tmp15); 2023-01-11T21:38:06.1363552Z auto tmp17 = static_cast(tmp16); 2023-01-11T21:38:06.1363712Z auto tmp18 = in_ptr0[tmp17 + (38*tmp9) + (1406*i0)]; 2023-01-11T21:38:06.1363844Z auto tmp19 = static_cast(1.0); 2023-01-11T21:38:06.1363983Z auto tmp20 = static_cast(tmp9); 2023-01-11T21:38:06.1364170Z auto tmp21 = tmp7 - tmp20; 2023-01-11T21:38:06.1364343Z auto tmp22 = tmp19 - tmp21; 2023-01-11T21:38:06.1364467Z auto tmp23 = tmp18 * tmp22; 2023-01-11T21:38:06.1364594Z auto tmp24 = std::ceil(tmp7); 2023-01-11T21:38:06.1364727Z auto tmp25 = static_cast(36.0); 2023-01-11T21:38:06.1364891Z auto tmp26 = (tmp25 != tmp25) ? tmp25 : std::min(tmp24, tmp25); 2023-01-11T21:38:06.1365027Z auto tmp27 = static_cast(tmp26); 2023-01-11T21:38:06.1365163Z auto tmp28 = in_ptr0[tmp17 + (38*tmp27) + (1406*i0)]; 2023-01-11T21:38:06.1365333Z auto tmp29 = tmp28 * tmp21; 2023-01-11T21:38:06.1365463Z auto tmp30 = tmp23 + tmp29; 2023-01-11T21:38:06.1365594Z out_ptr0[i2 + (45*i1) + (2025*i0)] = tmp30; 2023-01-11T21:38:06.1365701Z } 2023-01-11T21:38:06.1365785Z } 2023-01-11T21:38:06.1365867Z } 2023-01-11T21:38:06.1365948Z } 2023-01-11T21:38:06.1366028Z } 2023-01-11T21:38:06.1366123Z #pragma omp for 2023-01-11T21:38:06.1366219Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1366305Z { 2023-01-11T21:38:06.1366417Z #pragma GCC ivdep 2023-01-11T21:38:06.1366543Z for(long i1=0; i1<45; i1+=1) 2023-01-11T21:38:06.1366636Z { 2023-01-11T21:38:06.1366746Z #pragma GCC ivdep 2023-01-11T21:38:06.1366835Z for(long i2=0; i2<45; i2+=1) 2023-01-11T21:38:06.1366905Z { 2023-01-11T21:38:06.1366978Z { 2023-01-11T21:38:06.1367050Z { 2023-01-11T21:38:06.1367163Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1367276Z auto tmp1 = static_cast(0.5); 2023-01-11T21:38:06.1367380Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1367494Z auto tmp3 = static_cast(0.8222222222222222); 2023-01-11T21:38:06.1367596Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:06.1367771Z auto tmp5 = tmp4 - tmp1; 2023-01-11T21:38:06.1367887Z auto tmp6 = static_cast(0.0); 2023-01-11T21:38:06.1368068Z auto tmp7 = (tmp6 != tmp6) ? tmp6 : std::max(tmp5, tmp6); 2023-01-11T21:38:06.1368180Z auto tmp8 = std::floor(tmp7); 2023-01-11T21:38:06.1368295Z auto tmp9 = static_cast(tmp8); 2023-01-11T21:38:06.1368418Z auto tmp10 = static_cast(i2); 2023-01-11T21:38:06.1368518Z auto tmp11 = tmp10 + tmp1; 2023-01-11T21:38:06.1368647Z auto tmp12 = static_cast(0.8444444444444444); 2023-01-11T21:38:06.1368752Z auto tmp13 = tmp11 * tmp12; 2023-01-11T21:38:06.1368905Z auto tmp14 = tmp13 - tmp1; 2023-01-11T21:38:06.1369041Z auto tmp15 = (tmp6 != tmp6) ? tmp6 : std::max(tmp14, tmp6); 2023-01-11T21:38:06.1369153Z auto tmp16 = std::ceil(tmp15); 2023-01-11T21:38:06.1369271Z auto tmp17 = static_cast(37.0); 2023-01-11T21:38:06.1369409Z auto tmp18 = (tmp17 != tmp17) ? tmp17 : std::min(tmp16, tmp17); 2023-01-11T21:38:06.1369523Z auto tmp19 = static_cast(tmp18); 2023-01-11T21:38:06.1369653Z auto tmp20 = in_ptr0[tmp19 + (38*tmp9) + (1406*i0)]; 2023-01-11T21:38:06.1369772Z auto tmp21 = static_cast(1.0); 2023-01-11T21:38:06.1369891Z auto tmp22 = static_cast(tmp9); 2023-01-11T21:38:06.1370043Z auto tmp23 = tmp7 - tmp22; 2023-01-11T21:38:06.1370196Z auto tmp24 = tmp21 - tmp23; 2023-01-11T21:38:06.1370303Z auto tmp25 = tmp20 * tmp24; 2023-01-11T21:38:06.1370414Z auto tmp26 = std::ceil(tmp7); 2023-01-11T21:38:06.1370522Z auto tmp27 = static_cast(36.0); 2023-01-11T21:38:06.1370661Z auto tmp28 = (tmp27 != tmp27) ? 
tmp27 : std::min(tmp26, tmp27); 2023-01-11T21:38:06.1370780Z auto tmp29 = static_cast(tmp28); 2023-01-11T21:38:06.1370907Z auto tmp30 = in_ptr0[tmp19 + (38*tmp29) + (1406*i0)]; 2023-01-11T21:38:06.1371041Z auto tmp31 = tmp30 * tmp23; 2023-01-11T21:38:06.1371145Z auto tmp32 = tmp25 + tmp31; 2023-01-11T21:38:06.1371259Z out_ptr1[i2 + (45*i1) + (2025*i0)] = tmp32; 2023-01-11T21:38:06.1371335Z } 2023-01-11T21:38:06.1371402Z } 2023-01-11T21:38:06.1371474Z } 2023-01-11T21:38:06.1371544Z } 2023-01-11T21:38:06.1371613Z } 2023-01-11T21:38:06.1371696Z #pragma omp for 2023-01-11T21:38:06.1371785Z for(long i0=0; i0<360; i0+=1) 2023-01-11T21:38:06.1371854Z { 2023-01-11T21:38:06.1371934Z #pragma GCC ivdep 2023-01-11T21:38:06.1372031Z for(long i1=0; i1<45; i1+=1) 2023-01-11T21:38:06.1372100Z { 2023-01-11T21:38:06.1372170Z { 2023-01-11T21:38:06.1372240Z { 2023-01-11T21:38:06.1372355Z auto tmp0 = out_ptr0[i1 + (45*i0)]; 2023-01-11T21:38:06.1372463Z auto tmp16 = out_ptr1[i1 + (45*i0)]; 2023-01-11T21:38:06.1372581Z auto tmp1 = static_cast(1.0); 2023-01-11T21:38:06.1372692Z auto tmp2 = static_cast(i1); 2023-01-11T21:38:06.1372805Z auto tmp3 = static_cast(0.5); 2023-01-11T21:38:06.1372908Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1373032Z auto tmp5 = static_cast(0.8444444444444444); 2023-01-11T21:38:06.1373135Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1373282Z auto tmp7 = tmp6 - tmp3; 2023-01-11T21:38:06.1373423Z auto tmp8 = static_cast(0.0); 2023-01-11T21:38:06.1373560Z auto tmp9 = (tmp8 != tmp8) ? tmp8 : std::max(tmp7, tmp8); 2023-01-11T21:38:06.1373672Z auto tmp10 = std::floor(tmp9); 2023-01-11T21:38:06.1373790Z auto tmp11 = static_cast(tmp10); 2023-01-11T21:38:06.1373908Z auto tmp12 = static_cast(tmp11); 2023-01-11T21:38:06.1374056Z auto tmp13 = tmp9 - tmp12; 2023-01-11T21:38:06.1374202Z auto tmp14 = tmp1 - tmp13; 2023-01-11T21:38:06.1374302Z auto tmp15 = tmp0 * tmp14; 2023-01-11T21:38:06.1374397Z auto tmp17 = tmp16 * tmp13; 2023-01-11T21:38:06.1374685Z auto tmp18 = tmp15 + tmp17; 2023-01-11T21:38:06.1374832Z in_out_ptr0[i1 + (45*i0)] = tmp18; 2023-01-11T21:38:06.1374909Z } 2023-01-11T21:38:06.1374985Z } 2023-01-11T21:38:06.1375052Z } 2023-01-11T21:38:06.1375111Z } 2023-01-11T21:38:06.1375194Z #pragma omp for 2023-01-11T21:38:06.1375282Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1375362Z { 2023-01-11T21:38:06.1375459Z #pragma GCC ivdep 2023-01-11T21:38:06.1375568Z for(long i1=0; i1<74; i1+=1) 2023-01-11T21:38:06.1375643Z { 2023-01-11T21:38:06.1375724Z #pragma GCC ivdep 2023-01-11T21:38:06.1375818Z for(long i2=0; i2<76; i2+=1) 2023-01-11T21:38:06.1375886Z { 2023-01-11T21:38:06.1375957Z { 2023-01-11T21:38:06.1376030Z { 2023-01-11T21:38:06.1376147Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1376267Z auto tmp1 = static_cast(0.4931506849315068); 2023-01-11T21:38:06.1376361Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1376473Z auto tmp3 = std::floor(tmp2); 2023-01-11T21:38:06.1376590Z auto tmp4 = static_cast(tmp3); 2023-01-11T21:38:06.1376701Z auto tmp5 = static_cast(i2); 2023-01-11T21:38:06.1376892Z auto tmp6 = static_cast(0.49333333333333335); 2023-01-11T21:38:06.1376997Z auto tmp7 = tmp5 * tmp6; 2023-01-11T21:38:06.1377107Z auto tmp8 = std::floor(tmp7); 2023-01-11T21:38:06.1377298Z auto tmp9 = static_cast(tmp8); 2023-01-11T21:38:06.1377420Z auto tmp10 = in_ptr0[tmp9 + (38*tmp4) + (1406*i0)]; 2023-01-11T21:38:06.1377533Z auto tmp11 = static_cast(1.0); 2023-01-11T21:38:06.1377649Z auto tmp12 = static_cast(tmp4); 2023-01-11T21:38:06.1377809Z auto tmp13 = tmp2 - tmp12; 
2023-01-11T21:38:06.1377963Z auto tmp14 = tmp11 - tmp13; 2023-01-11T21:38:06.1378068Z auto tmp15 = tmp10 * tmp14; 2023-01-11T21:38:06.1378180Z auto tmp16 = std::ceil(tmp2); 2023-01-11T21:38:06.1378292Z auto tmp17 = static_cast(36.0); 2023-01-11T21:38:06.1378423Z auto tmp18 = (tmp17 != tmp17) ? tmp17 : std::min(tmp16, tmp17); 2023-01-11T21:38:06.1378538Z auto tmp19 = static_cast(tmp18); 2023-01-11T21:38:06.1378661Z auto tmp20 = in_ptr0[tmp9 + (38*tmp19) + (1406*i0)]; 2023-01-11T21:38:06.1378763Z auto tmp21 = tmp20 * tmp13; 2023-01-11T21:38:06.1378864Z auto tmp22 = tmp15 + tmp21; 2023-01-11T21:38:06.1378981Z auto tmp23 = static_cast(tmp9); 2023-01-11T21:38:06.1379128Z auto tmp24 = tmp7 - tmp23; 2023-01-11T21:38:06.1379325Z auto tmp25 = tmp11 - tmp24; 2023-01-11T21:38:06.1379426Z auto tmp26 = tmp22 * tmp25; 2023-01-11T21:38:06.1379538Z out_ptr2[i2 + (76*i1) + (5624*i0)] = tmp26; 2023-01-11T21:38:06.1379611Z } 2023-01-11T21:38:06.1379685Z } 2023-01-11T21:38:06.1379754Z } 2023-01-11T21:38:06.1379821Z } 2023-01-11T21:38:06.1379880Z } 2023-01-11T21:38:06.1379961Z #pragma omp for 2023-01-11T21:38:06.1380046Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1380113Z { 2023-01-11T21:38:06.1380200Z #pragma GCC ivdep 2023-01-11T21:38:06.1380290Z for(long i1=0; i1<74; i1+=1) 2023-01-11T21:38:06.1380359Z { 2023-01-11T21:38:06.1380438Z #pragma GCC ivdep 2023-01-11T21:38:06.1380532Z for(long i2=0; i2<76; i2+=1) 2023-01-11T21:38:06.1380600Z { 2023-01-11T21:38:06.1380674Z { 2023-01-11T21:38:06.1380746Z { 2023-01-11T21:38:06.1380859Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1380982Z auto tmp1 = static_cast(0.4931506849315068); 2023-01-11T21:38:06.1381078Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1381187Z auto tmp3 = std::floor(tmp2); 2023-01-11T21:38:06.1381302Z auto tmp4 = static_cast(tmp3); 2023-01-11T21:38:06.1381413Z auto tmp5 = static_cast(i2); 2023-01-11T21:38:06.1381536Z auto tmp6 = static_cast(0.49333333333333335); 2023-01-11T21:38:06.1381637Z auto tmp7 = tmp5 * tmp6; 2023-01-11T21:38:06.1381746Z auto tmp8 = std::ceil(tmp7); 2023-01-11T21:38:06.1381860Z auto tmp9 = static_cast(37.0); 2023-01-11T21:38:06.1381992Z auto tmp10 = (tmp9 != tmp9) ? tmp9 : std::min(tmp8, tmp9); 2023-01-11T21:38:06.1382108Z auto tmp11 = static_cast(tmp10); 2023-01-11T21:38:06.1382230Z auto tmp12 = in_ptr0[tmp11 + (38*tmp4) + (1406*i0)]; 2023-01-11T21:38:06.1382377Z auto tmp13 = static_cast(1.0); 2023-01-11T21:38:06.1382494Z auto tmp14 = static_cast(tmp4); 2023-01-11T21:38:06.1382647Z auto tmp15 = tmp2 - tmp14; 2023-01-11T21:38:06.1382796Z auto tmp16 = tmp13 - tmp15; 2023-01-11T21:38:06.1382898Z auto tmp17 = tmp12 * tmp16; 2023-01-11T21:38:06.1383001Z auto tmp18 = std::ceil(tmp2); 2023-01-11T21:38:06.1383118Z auto tmp19 = static_cast(36.0); 2023-01-11T21:38:06.1383256Z auto tmp20 = (tmp19 != tmp19) ? 
tmp19 : std::min(tmp18, tmp19); 2023-01-11T21:38:06.1383374Z auto tmp21 = static_cast(tmp20); 2023-01-11T21:38:06.1383499Z auto tmp22 = in_ptr0[tmp11 + (38*tmp21) + (1406*i0)]; 2023-01-11T21:38:06.1383601Z auto tmp23 = tmp22 * tmp15; 2023-01-11T21:38:06.1383709Z auto tmp24 = tmp17 + tmp23; 2023-01-11T21:38:06.1383821Z auto tmp25 = std::floor(tmp7); 2023-01-11T21:38:06.1383928Z auto tmp26 = static_cast(tmp25); 2023-01-11T21:38:06.1384044Z auto tmp27 = static_cast(tmp26); 2023-01-11T21:38:06.1384194Z auto tmp28 = tmp7 - tmp27; 2023-01-11T21:38:06.1384296Z auto tmp29 = tmp24 * tmp28; 2023-01-11T21:38:06.1384407Z out_ptr3[i2 + (76*i1) + (5624*i0)] = tmp29; 2023-01-11T21:38:06.1384480Z } 2023-01-11T21:38:06.1384581Z } 2023-01-11T21:38:06.1384643Z } 2023-01-11T21:38:06.1384711Z } 2023-01-11T21:38:06.1384778Z } 2023-01-11T21:38:06.1384862Z #pragma omp for 2023-01-11T21:38:06.1384950Z for(long i0=0; i0<5624; i0+=1) 2023-01-11T21:38:06.1385018Z { 2023-01-11T21:38:06.1385159Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr2 + 8*i0); 2023-01-11T21:38:06.1385292Z auto tmp1 = at::vec::Vectorized::loadu(out_ptr3 + 8*i0); 2023-01-11T21:38:06.1385383Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1385482Z tmp2.store(in_out_ptr1 + 8*i0); 2023-01-11T21:38:06.1385550Z } 2023-01-11T21:38:06.1385650Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1385744Z for(long i0=44992; i0<44992; i0+=1) 2023-01-11T21:38:06.1385812Z { 2023-01-11T21:38:06.1385896Z auto tmp0 = out_ptr2[i0]; 2023-01-11T21:38:06.1385989Z auto tmp1 = out_ptr3[i0]; 2023-01-11T21:38:06.1386078Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1386165Z in_out_ptr1[i0] = tmp2; 2023-01-11T21:38:06.1386232Z } 2023-01-11T21:38:06.1386299Z } 2023-01-11T21:38:06.1386365Z } 2023-01-11T21:38:06.1386443Z ''') 2023-01-11T21:38:06.1386449Z 2023-01-11T21:38:06.1386455Z 2023-01-11T21:38:06.1386550Z async_compile.wait(globals()) 2023-01-11T21:38:06.1386628Z del async_compile 2023-01-11T21:38:06.1386633Z 2023-01-11T21:38:06.1386709Z def call(args): 2023-01-11T21:38:06.1386784Z arg0_1, = args 2023-01-11T21:38:06.1386858Z args.clear() 2023-01-11T21:38:06.1387083Z buf0 = empty_strided((2, 4, 45, 45), (8100, 2025, 45, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1387295Z buf1 = empty_strided((2, 4, 45, 45), (8100, 2025, 45, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1387387Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:06.1387608Z buf3 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1387829Z buf4 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1387920Z buf5 = buf3; del buf3 # reuse 2023-01-11T21:38:06.1388171Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:06.1388247Z del arg0_1 2023-01-11T21:38:06.1388330Z return (buf2, buf5, ) 2023-01-11T21:38:06.1388335Z 2023-01-11T21:38:06.1388339Z 2023-01-11T21:38:06.1388420Z if __name__ == "__main__": 2023-01-11T21:38:06.1388533Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1388660Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1388881Z arg0_1 = rand_strided((2, 4, 37, 38), (5624, 1406, 38, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1388995Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1389262Z [2023-01-11 21:32:53,555] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done 
compiling FORWARDS graph 315 2023-01-11T21:38:06.1389268Z 2023-01-11T21:38:06.1389339Z ok (2.491s) 2023-01-11T21:38:06.1389814Z test_upsample_bilinear2d_b_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1389949Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1390203Z [2023-01-11 21:32:53,837] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 316 2023-01-11T21:38:06.1390460Z [2023-01-11 21:32:55,635] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 316 2023-01-11T21:38:06.1390501Z 2023-01-11T21:38:06.1390593Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1390668Z import torch 2023-01-11T21:38:06.1390743Z import random 2023-01-11T21:38:06.1390865Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1390989Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1390996Z 2023-01-11T21:38:06.1391080Z aten = torch.ops.aten 2023-01-11T21:38:06.1391218Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1391308Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1391313Z 2023-01-11T21:38:06.1391388Z import triton 2023-01-11T21:38:06.1391481Z import triton.language as tl 2023-01-11T21:38:06.1391605Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1391746Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1391751Z 2023-01-11T21:38:06.1391756Z 2023-01-11T21:38:06.1391892Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1392104Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1392226Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.1392329Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1392438Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.1392506Z { 2023-01-11T21:38:06.1392600Z auto out_ptr0 = in_out_ptr0; 2023-01-11T21:38:06.1392702Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1392768Z { 2023-01-11T21:38:06.1392863Z #pragma omp for collapse(2) 2023-01-11T21:38:06.1392943Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1393011Z { 2023-01-11T21:38:06.1393102Z for(long i1=0; i1<80; i1+=1) 2023-01-11T21:38:06.1393172Z { 2023-01-11T21:38:06.1393258Z #pragma GCC ivdep 2023-01-11T21:38:06.1393354Z for(long i2=0; i2<118; i2+=1) 2023-01-11T21:38:06.1393419Z { 2023-01-11T21:38:06.1393491Z { 2023-01-11T21:38:06.1393568Z { 2023-01-11T21:38:06.1393689Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1393814Z auto tmp1 = static_cast(0.4936708860759494); 2023-01-11T21:38:06.1393944Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1394058Z auto tmp3 = std::floor(tmp2); 2023-01-11T21:38:06.1394176Z auto tmp4 = static_cast(tmp3); 2023-01-11T21:38:06.1394282Z auto tmp5 = static_cast(i2); 2023-01-11T21:38:06.1394406Z auto tmp6 = static_cast(0.49572649572649574); 2023-01-11T21:38:06.1394508Z auto tmp7 = tmp5 * tmp6; 2023-01-11T21:38:06.1394622Z auto tmp8 = std::floor(tmp7); 2023-01-11T21:38:06.1394736Z auto tmp9 = static_cast(tmp8); 2023-01-11T21:38:06.1394862Z auto tmp10 = in_ptr0[tmp9 + (59*tmp4) + 
(2360*i0)]; 2023-01-11T21:38:06.1394978Z auto tmp11 = static_cast(1.0); 2023-01-11T21:38:06.1395087Z auto tmp12 = static_cast(tmp4); 2023-01-11T21:38:06.1395242Z auto tmp13 = tmp2 - tmp12; 2023-01-11T21:38:06.1395394Z auto tmp14 = tmp11 - tmp13; 2023-01-11T21:38:06.1395498Z auto tmp15 = tmp10 * tmp14; 2023-01-11T21:38:06.1395608Z auto tmp16 = std::ceil(tmp2); 2023-01-11T21:38:06.1395724Z auto tmp17 = static_cast(39.0); 2023-01-11T21:38:06.1395863Z auto tmp18 = (tmp17 != tmp17) ? tmp17 : std::min(tmp16, tmp17); 2023-01-11T21:38:06.1395981Z auto tmp19 = static_cast(tmp18); 2023-01-11T21:38:06.1396098Z auto tmp20 = in_ptr0[tmp9 + (59*tmp19) + (2360*i0)]; 2023-01-11T21:38:06.1396228Z auto tmp21 = tmp20 * tmp13; 2023-01-11T21:38:06.1396329Z auto tmp22 = tmp15 + tmp21; 2023-01-11T21:38:06.1396446Z auto tmp23 = static_cast(tmp9); 2023-01-11T21:38:06.1396598Z auto tmp24 = tmp7 - tmp23; 2023-01-11T21:38:06.1396748Z auto tmp25 = tmp11 - tmp24; 2023-01-11T21:38:06.1396852Z auto tmp26 = tmp22 * tmp25; 2023-01-11T21:38:06.1396965Z out_ptr0[i2 + (118*i1) + (9440*i0)] = tmp26; 2023-01-11T21:38:06.1397032Z } 2023-01-11T21:38:06.1397103Z } 2023-01-11T21:38:06.1397174Z } 2023-01-11T21:38:06.1397242Z } 2023-01-11T21:38:06.1397311Z } 2023-01-11T21:38:06.1397407Z #pragma omp for collapse(2) 2023-01-11T21:38:06.1397500Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1397560Z { 2023-01-11T21:38:06.1397651Z for(long i1=0; i1<80; i1+=1) 2023-01-11T21:38:06.1397719Z { 2023-01-11T21:38:06.1397809Z #pragma GCC ivdep 2023-01-11T21:38:06.1397906Z for(long i2=0; i2<118; i2+=1) 2023-01-11T21:38:06.1397979Z { 2023-01-11T21:38:06.1398043Z { 2023-01-11T21:38:06.1398117Z { 2023-01-11T21:38:06.1398229Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1398352Z auto tmp1 = static_cast(0.4936708860759494); 2023-01-11T21:38:06.1398454Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1398564Z auto tmp3 = std::floor(tmp2); 2023-01-11T21:38:06.1398683Z auto tmp4 = static_cast(tmp3); 2023-01-11T21:38:06.1398796Z auto tmp5 = static_cast(i2); 2023-01-11T21:38:06.1398916Z auto tmp6 = static_cast(0.49572649572649574); 2023-01-11T21:38:06.1399019Z auto tmp7 = tmp5 * tmp6; 2023-01-11T21:38:06.1399130Z auto tmp8 = std::ceil(tmp7); 2023-01-11T21:38:06.1399274Z auto tmp9 = static_cast(58.0); 2023-01-11T21:38:06.1399414Z auto tmp10 = (tmp9 != tmp9) ? tmp9 : std::min(tmp8, tmp9); 2023-01-11T21:38:06.1399530Z auto tmp11 = static_cast(tmp10); 2023-01-11T21:38:06.1399652Z auto tmp12 = in_ptr0[tmp11 + (59*tmp4) + (2360*i0)]; 2023-01-11T21:38:06.1399771Z auto tmp13 = static_cast(1.0); 2023-01-11T21:38:06.1399879Z auto tmp14 = static_cast(tmp4); 2023-01-11T21:38:06.1400031Z auto tmp15 = tmp2 - tmp14; 2023-01-11T21:38:06.1400181Z auto tmp16 = tmp13 - tmp15; 2023-01-11T21:38:06.1400286Z auto tmp17 = tmp12 * tmp16; 2023-01-11T21:38:06.1400395Z auto tmp18 = std::ceil(tmp2); 2023-01-11T21:38:06.1400511Z auto tmp19 = static_cast(39.0); 2023-01-11T21:38:06.1400653Z auto tmp20 = (tmp19 != tmp19) ? 
tmp19 : std::min(tmp18, tmp19); 2023-01-11T21:38:06.1400767Z auto tmp21 = static_cast(tmp20); 2023-01-11T21:38:06.1400884Z auto tmp22 = in_ptr0[tmp11 + (59*tmp21) + (2360*i0)]; 2023-01-11T21:38:06.1400987Z auto tmp23 = tmp22 * tmp15; 2023-01-11T21:38:06.1401089Z auto tmp24 = tmp17 + tmp23; 2023-01-11T21:38:06.1401202Z auto tmp25 = std::floor(tmp7); 2023-01-11T21:38:06.1401317Z auto tmp26 = static_cast(tmp25); 2023-01-11T21:38:06.1401462Z auto tmp27 = static_cast(tmp26); 2023-01-11T21:38:06.1401611Z auto tmp28 = tmp7 - tmp27; 2023-01-11T21:38:06.1401705Z auto tmp29 = tmp24 * tmp28; 2023-01-11T21:38:06.1401820Z out_ptr1[i2 + (118*i1) + (9440*i0)] = tmp29; 2023-01-11T21:38:06.1401897Z } 2023-01-11T21:38:06.1401968Z } 2023-01-11T21:38:06.1402039Z } 2023-01-11T21:38:06.1402107Z } 2023-01-11T21:38:06.1402176Z } 2023-01-11T21:38:06.1402251Z #pragma omp for 2023-01-11T21:38:06.1402340Z for(long i0=0; i0<2360; i0+=1) 2023-01-11T21:38:06.1402407Z { 2023-01-11T21:38:06.1402547Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1402686Z auto tmp1 = at::vec::Vectorized::loadu(out_ptr1 + 8*i0); 2023-01-11T21:38:06.1402777Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1402882Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.1402942Z } 2023-01-11T21:38:06.1403045Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1403137Z for(long i0=18880; i0<18880; i0+=1) 2023-01-11T21:38:06.1403204Z { 2023-01-11T21:38:06.1403299Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:06.1403391Z auto tmp1 = out_ptr1[i0]; 2023-01-11T21:38:06.1403480Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1403561Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1403629Z } 2023-01-11T21:38:06.1403697Z } 2023-01-11T21:38:06.1403762Z } 2023-01-11T21:38:06.1403850Z ''') 2023-01-11T21:38:06.1403855Z 2023-01-11T21:38:06.1403860Z 2023-01-11T21:38:06.1403957Z async_compile.wait(globals()) 2023-01-11T21:38:06.1404034Z del async_compile 2023-01-11T21:38:06.1404039Z 2023-01-11T21:38:06.1404115Z def call(args): 2023-01-11T21:38:06.1404182Z arg0_1, = args 2023-01-11T21:38:06.1404257Z args.clear() 2023-01-11T21:38:06.1404490Z buf0 = empty_strided((1, 2, 80, 118), (18880, 9440, 118, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1404708Z buf1 = empty_strided((1, 2, 80, 118), (18880, 9440, 118, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1404800Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:06.1404994Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1405072Z del arg0_1 2023-01-11T21:38:06.1405141Z return (buf2, ) 2023-01-11T21:38:06.1405148Z 2023-01-11T21:38:06.1405154Z 2023-01-11T21:38:06.1405256Z if __name__ == "__main__": 2023-01-11T21:38:06.1405389Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1405526Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1405747Z arg0_1 = rand_strided((1, 2, 40, 59), (4720, 2360, 59, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1405861Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1405870Z 2023-01-11T21:38:06.1405943Z ok (2.080s) 2023-01-11T21:38:06.1406415Z test_upsample_nearest1d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
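Both bilinear dumps above decompose the 2D interpolation into separable 1D lerps spread over several loop nests, with buffer reuse in the wrapper (buf2 = buf0; del buf0 # reuse) so the final vertical blend overwrites an intermediate through in_out_ptr0. The index math encodes the two align_corners conventions side by side: test_upsample_bilinear2d_a uses max((i + 0.5) * in/out - 0.5, 0) for its 45x45 output (align_corners=False, 0.8222... = 37/45) but i * (in - 1)/(out - 1) for its 74x76 output (align_corners=True, 0.4931506849315068 = 36/73), while _b uses only the latter (0.4936708860759494 = 39/79, 0.49572649572649574 = 58/117). A hedged eager-mode equivalent of the _b case, matching the shapes built in its call() and __main__ blocks:

    import torch
    import torch.nn.functional as F

    x = torch.rand(1, 2, 40, 59)
    # (40-1)/(80-1) = 39/79 and (59-1)/(118-1) = 58/117 are the per-axis
    # scales hard-coded into kernel_cpp_0 above
    fn = torch.compile(lambda t: F.interpolate(
        t, size=(80, 118), mode="bilinear", align_corners=True))
    y = fn(x)
    assert y.shape == (1, 2, 80, 118)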
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1406548Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1406799Z [2023-01-11 21:32:55,924] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 317 2023-01-11T21:38:06.1407063Z [2023-01-11 21:32:57,620] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 317 2023-01-11T21:38:06.1407069Z 2023-01-11T21:38:06.1407166Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1407241Z import torch 2023-01-11T21:38:06.1407350Z import random 2023-01-11T21:38:06.1407471Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1407594Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1407599Z 2023-01-11T21:38:06.1407681Z aten = torch.ops.aten 2023-01-11T21:38:06.1407813Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1407911Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1407916Z 2023-01-11T21:38:06.1407990Z import triton 2023-01-11T21:38:06.1408084Z import triton.language as tl 2023-01-11T21:38:06.1408211Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1408351Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1408356Z 2023-01-11T21:38:06.1408361Z 2023-01-11T21:38:06.1408499Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1408705Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1408825Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1408930Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1409031Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.1409130Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1409236Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.1409341Z float* __restrict__ out_ptr4) 2023-01-11T21:38:06.1409408Z { 2023-01-11T21:38:06.1409503Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1409569Z { 2023-01-11T21:38:06.1409653Z #pragma omp for 2023-01-11T21:38:06.1409741Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1409807Z { 2023-01-11T21:38:06.1409893Z #pragma GCC ivdep 2023-01-11T21:38:06.1409987Z for(long i1=0; i1<74; i1+=1) 2023-01-11T21:38:06.1410048Z { 2023-01-11T21:38:06.1410117Z { 2023-01-11T21:38:06.1410189Z { 2023-01-11T21:38:06.1410306Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1410421Z auto tmp1 = static_cast(0.5); 2023-01-11T21:38:06.1410521Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1410634Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1410767Z auto tmp4 = in_ptr0[tmp3 + (37*i0)]; 2023-01-11T21:38:06.1410871Z out_ptr0[i1 + (74*i0)] = tmp4; 2023-01-11T21:38:06.1410970Z out_ptr1[i1 + (74*i0)] = tmp4; 2023-01-11T21:38:06.1411042Z } 2023-01-11T21:38:06.1411113Z } 2023-01-11T21:38:06.1411181Z } 2023-01-11T21:38:06.1411248Z } 2023-01-11T21:38:06.1411323Z #pragma omp for 2023-01-11T21:38:06.1411412Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1411481Z { 2023-01-11T21:38:06.1411567Z #pragma GCC ivdep 2023-01-11T21:38:06.1411661Z for(long i1=0; i1<70; i1+=1) 2023-01-11T21:38:06.1411728Z { 2023-01-11T21:38:06.1411797Z { 2023-01-11T21:38:06.1411860Z { 2023-01-11T21:38:06.1411972Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1412097Z auto tmp1 = static_cast(0.5285714285714286); 2023-01-11T21:38:06.1412199Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1412312Z auto tmp3 = 
static_cast(tmp2); 2023-01-11T21:38:06.1412423Z auto tmp4 = in_ptr0[tmp3 + (37*i0)]; 2023-01-11T21:38:06.1412521Z out_ptr2[i1 + (70*i0)] = tmp4; 2023-01-11T21:38:06.1421406Z } 2023-01-11T21:38:06.1421495Z } 2023-01-11T21:38:06.1421567Z } 2023-01-11T21:38:06.1421633Z } 2023-01-11T21:38:06.1421724Z #pragma omp for 2023-01-11T21:38:06.1421812Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1422001Z { 2023-01-11T21:38:06.1422094Z #pragma GCC ivdep 2023-01-11T21:38:06.1422191Z for(long i1=0; i1<45; i1+=1) 2023-01-11T21:38:06.1422255Z { 2023-01-11T21:38:06.1422327Z { 2023-01-11T21:38:06.1422402Z { 2023-01-11T21:38:06.1422522Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1422647Z auto tmp1 = static_cast(0.8222222222222222); 2023-01-11T21:38:06.1422750Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1422864Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1422971Z auto tmp4 = in_ptr0[tmp3 + (37*i0)]; 2023-01-11T21:38:06.1423074Z out_ptr3[i1 + (45*i0)] = tmp4; 2023-01-11T21:38:06.1423143Z } 2023-01-11T21:38:06.1423219Z } 2023-01-11T21:38:06.1423289Z } 2023-01-11T21:38:06.1423356Z } 2023-01-11T21:38:06.1423441Z #pragma omp for 2023-01-11T21:38:06.1423529Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1423600Z { 2023-01-11T21:38:06.1423687Z #pragma GCC ivdep 2023-01-11T21:38:06.1423780Z for(long i1=0; i1<36; i1+=1) 2023-01-11T21:38:06.1423852Z { 2023-01-11T21:38:06.1423919Z { 2023-01-11T21:38:06.1423991Z { 2023-01-11T21:38:06.1424105Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1424226Z auto tmp1 = static_cast(1.0277777777777777); 2023-01-11T21:38:06.1424325Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1424438Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1424549Z auto tmp4 = in_ptr0[tmp3 + (37*i0)]; 2023-01-11T21:38:06.1424644Z out_ptr4[i1 + (36*i0)] = tmp4; 2023-01-11T21:38:06.1424716Z } 2023-01-11T21:38:06.1424789Z } 2023-01-11T21:38:06.1424860Z } 2023-01-11T21:38:06.1424931Z } 2023-01-11T21:38:06.1425001Z } 2023-01-11T21:38:06.1425062Z } 2023-01-11T21:38:06.1425171Z ''') 2023-01-11T21:38:06.1425177Z 2023-01-11T21:38:06.1425182Z 2023-01-11T21:38:06.1425280Z async_compile.wait(globals()) 2023-01-11T21:38:06.1425401Z del async_compile 2023-01-11T21:38:06.1425407Z 2023-01-11T21:38:06.1425485Z def call(args): 2023-01-11T21:38:06.1425562Z arg0_1, = args 2023-01-11T21:38:06.1425640Z args.clear() 2023-01-11T21:38:06.1425861Z buf0 = empty_strided((2, 4, 74), (296, 74, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1426060Z buf4 = empty_strided((2, 4, 74), (296, 74, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1426263Z buf1 = empty_strided((2, 4, 70), (280, 70, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1426463Z buf2 = empty_strided((2, 4, 45), (180, 45, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1426666Z buf3 = empty_strided((2, 4, 36), (144, 36, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1426908Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.1426986Z del arg0_1 2023-01-11T21:38:06.1427089Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.1427094Z 2023-01-11T21:38:06.1427098Z 2023-01-11T21:38:06.1427179Z if __name__ == "__main__": 2023-01-11T21:38:06.1427290Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1427417Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1427623Z arg0_1 = rand_strided((2, 4, 37), (148, 
37, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1427737Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1427742Z 2023-01-11T21:38:06.1427814Z ok (2.008s) 2023-01-11T21:38:06.1428292Z test_upsample_nearest2d_backward_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1428474Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1428735Z [2023-01-11 21:32:57,672] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 318 2023-01-11T21:38:06.1428741Z 2023-01-11T21:38:06.1428838Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1428914Z import torch 2023-01-11T21:38:06.1428984Z import random 2023-01-11T21:38:06.1429108Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1429231Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1429239Z 2023-01-11T21:38:06.1429322Z aten = torch.ops.aten 2023-01-11T21:38:06.1429459Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1429558Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1429564Z 2023-01-11T21:38:06.1429638Z import triton 2023-01-11T21:38:06.1429724Z import triton.language as tl 2023-01-11T21:38:06.1429854Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1429996Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1430001Z 2023-01-11T21:38:06.1430006Z 2023-01-11T21:38:06.1430146Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1430354Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1430478Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1430587Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1430693Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.1430789Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1430890Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.1430988Z float* __restrict__ out_ptr4) 2023-01-11T21:38:06.1431054Z { 2023-01-11T21:38:06.1431159Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1431253Z { 2023-01-11T21:38:06.1431336Z #pragma omp for 2023-01-11T21:38:06.1431417Z for(long i0=0; i0<27; i0+=1) 2023-01-11T21:38:06.1431484Z { 2023-01-11T21:38:06.1431570Z #pragma GCC ivdep 2023-01-11T21:38:06.1431660Z for(long i1=0; i1<6; i1+=1) 2023-01-11T21:38:06.1431728Z { 2023-01-11T21:38:06.1431797Z { 2023-01-11T21:38:06.1431868Z { 2023-01-11T21:38:06.1431973Z auto tmp0 = in_ptr0[(2*i1) + (24*i0)]; 2023-01-11T21:38:06.1432085Z auto tmp1 = in_ptr0[1 + (2*i1) + (24*i0)]; 2023-01-11T21:38:06.1432199Z auto tmp3 = in_ptr0[12 + (2*i1) + (24*i0)]; 2023-01-11T21:38:06.1432311Z auto tmp5 = in_ptr0[13 + (2*i1) + (24*i0)]; 2023-01-11T21:38:06.1432414Z auto tmp2 = tmp1 + tmp0; 2023-01-11T21:38:06.1432511Z auto tmp4 = tmp3 + tmp2; 2023-01-11T21:38:06.1432611Z auto tmp6 = tmp5 + tmp4; 2023-01-11T21:38:06.1432720Z auto tmp7 = static_cast(1.0); 2023-01-11T21:38:06.1432817Z auto tmp8 = tmp6 * tmp7; 2023-01-11T21:38:06.1432923Z out_ptr0[i1 + (6*i0)] = tmp8; 2023-01-11T21:38:06.1432995Z } 2023-01-11T21:38:06.1433063Z } 2023-01-11T21:38:06.1433131Z } 
2023-01-11T21:38:06.1433198Z } 2023-01-11T21:38:06.1433287Z #pragma omp for collapse(2) 2023-01-11T21:38:06.1433375Z for(long i0=0; i0<9; i0+=1) 2023-01-11T21:38:06.1433441Z { 2023-01-11T21:38:06.1433558Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.1433631Z { 2023-01-11T21:38:06.1433721Z #pragma GCC ivdep 2023-01-11T21:38:06.1433814Z for(long i2=0; i2<5; i2+=1) 2023-01-11T21:38:06.1433877Z { 2023-01-11T21:38:06.1433947Z { 2023-01-11T21:38:06.1434023Z { 2023-01-11T21:38:06.1434147Z auto tmp0 = static_cast(((3 + (6*i1)) / 4)); 2023-01-11T21:38:06.1434270Z auto tmp1 = static_cast(((9 + (6*i1)) / 4)); 2023-01-11T21:38:06.1434370Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:06.1434492Z auto tmp3 = static_cast(((4 + (12*i2)) / 5)); 2023-01-11T21:38:06.1434609Z auto tmp4 = static_cast(((16 + (12*i2)) / 5)); 2023-01-11T21:38:06.1434712Z auto tmp5 = tmp3 < tmp4; 2023-01-11T21:38:06.1434811Z auto tmp6 = tmp2 & tmp5; 2023-01-11T21:38:06.1434907Z float tmp7 = 0.0; 2023-01-11T21:38:06.1434996Z if(tmp6) 2023-01-11T21:38:06.1435116Z { 2023-01-11T21:38:06.1435312Z auto tmp8 = in_ptr0[(12*(((3 + (6*i1)) / 4))) + (72*i0) + (((4 + (12*i2)) / 5))]; 2023-01-11T21:38:06.1435435Z tmp7 = tmp8; 2023-01-11T21:38:06.1435521Z } 2023-01-11T21:38:06.1435671Z auto tmp9 = static_cast(1 + (((4 + (12*i2)) / 5))); 2023-01-11T21:38:06.1435792Z auto tmp10 = tmp9 < tmp4; 2023-01-11T21:38:06.1435918Z auto tmp11 = tmp2 & tmp10; 2023-01-11T21:38:06.1436041Z float tmp12 = 0.0; 2023-01-11T21:38:06.1436148Z if(tmp11) 2023-01-11T21:38:06.1436246Z { 2023-01-11T21:38:06.1438893Z auto tmp13 = in_ptr0[1 + (12*(((3 + (6*i1)) / 4))) + (72*i0) + (((4 + (12*i2)) / 5))]; 2023-01-11T21:38:06.1439079Z tmp12 = tmp13; 2023-01-11T21:38:06.1439159Z } 2023-01-11T21:38:06.1439267Z auto tmp14 = tmp12 + tmp7; 2023-01-11T21:38:06.1439454Z auto tmp15 = static_cast(2 + (((4 + (12*i2)) / 5))); 2023-01-11T21:38:06.1439561Z auto tmp16 = tmp15 < tmp4; 2023-01-11T21:38:06.1439666Z auto tmp17 = tmp2 & tmp16; 2023-01-11T21:38:06.1439757Z float tmp18 = 0.0; 2023-01-11T21:38:06.1439850Z if(tmp17) 2023-01-11T21:38:06.1439926Z { 2023-01-11T21:38:06.1440065Z auto tmp19 = in_ptr0[2 + (12*(((3 + (6*i1)) / 4))) + (72*i0) + (((4 + (12*i2)) / 5))]; 2023-01-11T21:38:06.1440162Z tmp18 = tmp19; 2023-01-11T21:38:06.1440242Z } 2023-01-11T21:38:06.1440347Z auto tmp20 = tmp18 + tmp14; 2023-01-11T21:38:06.1440477Z auto tmp21 = static_cast(1 + (((3 + (6*i1)) / 4))); 2023-01-11T21:38:06.1440573Z auto tmp22 = tmp21 < tmp1; 2023-01-11T21:38:06.1440682Z auto tmp23 = tmp22 & tmp5; 2023-01-11T21:38:06.1440779Z float tmp24 = 0.0; 2023-01-11T21:38:06.1440866Z if(tmp23) 2023-01-11T21:38:06.1440943Z { 2023-01-11T21:38:06.1441082Z auto tmp25 = in_ptr0[12 + (12*(((3 + (6*i1)) / 4))) + (72*i0) + (((4 + (12*i2)) / 5))]; 2023-01-11T21:38:06.1441183Z tmp24 = tmp25; 2023-01-11T21:38:06.1441252Z } 2023-01-11T21:38:06.1441361Z auto tmp26 = tmp24 + tmp20; 2023-01-11T21:38:06.1441497Z auto tmp27 = tmp22 & tmp10; 2023-01-11T21:38:06.1441591Z float tmp28 = 0.0; 2023-01-11T21:38:06.1441674Z if(tmp27) 2023-01-11T21:38:06.1441751Z { 2023-01-11T21:38:06.1441894Z auto tmp29 = in_ptr0[13 + (12*(((3 + (6*i1)) / 4))) + (72*i0) + (((4 + (12*i2)) / 5))]; 2023-01-11T21:38:06.1441989Z tmp28 = tmp29; 2023-01-11T21:38:06.1442057Z } 2023-01-11T21:38:06.1442161Z auto tmp30 = tmp28 + tmp26; 2023-01-11T21:38:06.1442263Z auto tmp31 = tmp22 & tmp16; 2023-01-11T21:38:06.1442358Z float tmp32 = 0.0; 2023-01-11T21:38:06.1442443Z if(tmp31) 2023-01-11T21:38:06.1442518Z { 2023-01-11T21:38:06.1442657Z auto tmp33 = in_ptr0[14 + 
(12*(((3 + (6*i1)) / 4))) + (72*i0) + (((4 + (12*i2)) / 5))]; 2023-01-11T21:38:06.1442747Z tmp32 = tmp33; 2023-01-11T21:38:06.1442822Z } 2023-01-11T21:38:06.1442924Z auto tmp34 = tmp32 + tmp30; 2023-01-11T21:38:06.1443042Z out_ptr1[i2 + (5*i1) + (20*i0)] = tmp34; 2023-01-11T21:38:06.1443117Z } 2023-01-11T21:38:06.1443190Z } 2023-01-11T21:38:06.1443259Z } 2023-01-11T21:38:06.1443322Z } 2023-01-11T21:38:06.1443390Z } 2023-01-11T21:38:06.1443490Z #pragma omp for collapse(2) 2023-01-11T21:38:06.1443579Z for(long i0=0; i0<9; i0+=1) 2023-01-11T21:38:06.1443651Z { 2023-01-11T21:38:06.1443740Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:06.1443810Z { 2023-01-11T21:38:06.1443892Z #pragma GCC ivdep 2023-01-11T21:38:06.1443987Z for(long i2=0; i2<8; i2+=1) 2023-01-11T21:38:06.1444061Z { 2023-01-11T21:38:06.1444134Z { 2023-01-11T21:38:06.1444207Z { 2023-01-11T21:38:06.1444325Z auto tmp0 = static_cast(3*i1); 2023-01-11T21:38:06.1444445Z auto tmp1 = static_cast(3 + (3*i1)); 2023-01-11T21:38:06.1444572Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:06.1444700Z auto tmp3 = static_cast(((7 + (12*i2)) / 8)); 2023-01-11T21:38:06.1444826Z auto tmp4 = static_cast(((19 + (12*i2)) / 8)); 2023-01-11T21:38:06.1444930Z auto tmp5 = tmp3 < tmp4; 2023-01-11T21:38:06.1445030Z auto tmp6 = tmp2 & tmp5; 2023-01-11T21:38:06.1445128Z float tmp7 = 0.0; 2023-01-11T21:38:06.1445213Z if(tmp6) 2023-01-11T21:38:06.1445283Z { 2023-01-11T21:38:06.1445415Z auto tmp8 = in_ptr0[(36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1445512Z tmp7 = tmp8; 2023-01-11T21:38:06.1445589Z } 2023-01-11T21:38:06.1445715Z auto tmp9 = static_cast(1 + (((7 + (12*i2)) / 8))); 2023-01-11T21:38:06.1445821Z auto tmp10 = tmp9 < tmp4; 2023-01-11T21:38:06.1445925Z auto tmp11 = tmp2 & tmp10; 2023-01-11T21:38:06.1446014Z float tmp12 = 0.0; 2023-01-11T21:38:06.1446098Z if(tmp11) 2023-01-11T21:38:06.1446178Z { 2023-01-11T21:38:06.1446310Z auto tmp13 = in_ptr0[1 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1446405Z tmp12 = tmp13; 2023-01-11T21:38:06.1446482Z } 2023-01-11T21:38:06.1446635Z auto tmp14 = tmp12 + tmp7; 2023-01-11T21:38:06.1446757Z auto tmp15 = static_cast(1 + (3*i1)); 2023-01-11T21:38:06.1446854Z auto tmp16 = tmp15 < tmp1; 2023-01-11T21:38:06.1446957Z auto tmp17 = tmp16 & tmp5; 2023-01-11T21:38:06.1447057Z float tmp18 = 0.0; 2023-01-11T21:38:06.1447146Z if(tmp17) 2023-01-11T21:38:06.1447223Z { 2023-01-11T21:38:06.1447354Z auto tmp19 = in_ptr0[12 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1447447Z tmp18 = tmp19; 2023-01-11T21:38:06.1447515Z } 2023-01-11T21:38:06.1447621Z auto tmp20 = tmp18 + tmp14; 2023-01-11T21:38:06.1447728Z auto tmp21 = tmp16 & tmp10; 2023-01-11T21:38:06.1447824Z float tmp22 = 0.0; 2023-01-11T21:38:06.1447919Z if(tmp21) 2023-01-11T21:38:06.1447997Z { 2023-01-11T21:38:06.1448125Z auto tmp23 = in_ptr0[13 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1448212Z tmp22 = tmp23; 2023-01-11T21:38:06.1448292Z } 2023-01-11T21:38:06.1448397Z auto tmp24 = tmp22 + tmp20; 2023-01-11T21:38:06.1448520Z auto tmp25 = static_cast(2 + (3*i1)); 2023-01-11T21:38:06.1448623Z auto tmp26 = tmp25 < tmp1; 2023-01-11T21:38:06.1448727Z auto tmp27 = tmp26 & tmp5; 2023-01-11T21:38:06.1448823Z float tmp28 = 0.0; 2023-01-11T21:38:06.1448909Z if(tmp27) 2023-01-11T21:38:06.1448977Z { 2023-01-11T21:38:06.1449109Z auto tmp29 = in_ptr0[24 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1449205Z tmp28 = tmp29; 2023-01-11T21:38:06.1449283Z } 2023-01-11T21:38:06.1449390Z auto tmp30 
= tmp28 + tmp24; 2023-01-11T21:38:06.1449540Z auto tmp31 = tmp26 & tmp10; 2023-01-11T21:38:06.1449637Z float tmp32 = 0.0; 2023-01-11T21:38:06.1449714Z if(tmp31) 2023-01-11T21:38:06.1449794Z { 2023-01-11T21:38:06.1449920Z auto tmp33 = in_ptr0[25 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1450013Z tmp32 = tmp33; 2023-01-11T21:38:06.1450091Z } 2023-01-11T21:38:06.1450194Z auto tmp34 = tmp32 + tmp30; 2023-01-11T21:38:06.1450289Z float tmp35 = 0.0; 2023-01-11T21:38:06.1450371Z if(tmp6) 2023-01-11T21:38:06.1450449Z { 2023-01-11T21:38:06.1450580Z auto tmp36 = in_ptr0[(36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1450673Z tmp35 = tmp36; 2023-01-11T21:38:06.1450754Z } 2023-01-11T21:38:06.1450848Z float tmp37 = 0.0; 2023-01-11T21:38:06.1450934Z if(tmp11) 2023-01-11T21:38:06.1451003Z { 2023-01-11T21:38:06.1451131Z auto tmp38 = in_ptr0[1 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1451223Z tmp37 = tmp38; 2023-01-11T21:38:06.1451298Z } 2023-01-11T21:38:06.1451402Z auto tmp39 = tmp37 + tmp35; 2023-01-11T21:38:06.1451498Z float tmp40 = 0.0; 2023-01-11T21:38:06.1451612Z if(tmp17) 2023-01-11T21:38:06.1451690Z { 2023-01-11T21:38:06.1451812Z auto tmp41 = in_ptr0[12 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1451904Z tmp40 = tmp41; 2023-01-11T21:38:06.1451980Z } 2023-01-11T21:38:06.1452085Z auto tmp42 = tmp40 + tmp39; 2023-01-11T21:38:06.1452180Z float tmp43 = 0.0; 2023-01-11T21:38:06.1452267Z if(tmp21) 2023-01-11T21:38:06.1452343Z { 2023-01-11T21:38:06.1452462Z auto tmp44 = in_ptr0[13 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1452556Z tmp43 = tmp44; 2023-01-11T21:38:06.1452632Z } 2023-01-11T21:38:06.1452736Z auto tmp45 = tmp43 + tmp42; 2023-01-11T21:38:06.1452834Z float tmp46 = 0.0; 2023-01-11T21:38:06.1452920Z if(tmp27) 2023-01-11T21:38:06.1452997Z { 2023-01-11T21:38:06.1453120Z auto tmp47 = in_ptr0[24 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1453214Z tmp46 = tmp47; 2023-01-11T21:38:06.1453292Z } 2023-01-11T21:38:06.1453397Z auto tmp48 = tmp46 + tmp45; 2023-01-11T21:38:06.1453491Z float tmp49 = 0.0; 2023-01-11T21:38:06.1453575Z if(tmp31) 2023-01-11T21:38:06.1453652Z { 2023-01-11T21:38:06.1453777Z auto tmp50 = in_ptr0[25 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1453861Z tmp49 = tmp50; 2023-01-11T21:38:06.1453936Z } 2023-01-11T21:38:06.1454042Z auto tmp51 = tmp49 + tmp48; 2023-01-11T21:38:06.1454161Z out_ptr2[i2 + (8*i1) + (16*i0)] = tmp34; 2023-01-11T21:38:06.1454270Z out_ptr3[i2 + (8*i1) + (16*i0)] = tmp51; 2023-01-11T21:38:06.1454344Z } 2023-01-11T21:38:06.1454417Z } 2023-01-11T21:38:06.1454843Z } 2023-01-11T21:38:06.1454939Z } 2023-01-11T21:38:06.1455007Z } 2023-01-11T21:38:06.1455108Z #pragma omp for collapse(2) 2023-01-11T21:38:06.1455194Z for(long i0=0; i0<9; i0+=1) 2023-01-11T21:38:06.1455264Z { 2023-01-11T21:38:06.1455356Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.1455417Z { 2023-01-11T21:38:06.1455504Z #pragma GCC ivdep 2023-01-11T21:38:06.1455599Z for(long i2=0; i2<7; i2+=1) 2023-01-11T21:38:06.1455670Z { 2023-01-11T21:38:06.1455739Z { 2023-01-11T21:38:06.1455815Z { 2023-01-11T21:38:06.1455932Z auto tmp0 = static_cast(((3 + (6*i1)) / 4)); 2023-01-11T21:38:06.1456054Z auto tmp1 = static_cast(((9 + (6*i1)) / 4)); 2023-01-11T21:38:06.1456159Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:06.1456283Z auto tmp3 = static_cast(((6 + (12*i2)) / 7)); 2023-01-11T21:38:06.1456407Z auto tmp4 = static_cast(((18 + (12*i2)) / 7)); 
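                        // tmp0/tmp1 and tmp3/tmp4 are the half-open row/column windows
                        // [ceil(i*scale), ceil((i+1)*scale)) of grad_output cells that
                        // nearest-mapped from this grad_input cell; the < comparisons
                        // below guard the unrolled taps so out-of-window reads add 0.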
2023-01-11T21:38:06.1456510Z auto tmp5 = tmp3 < tmp4; 2023-01-11T21:38:06.1456608Z auto tmp6 = tmp2 & tmp5; 2023-01-11T21:38:06.1456702Z float tmp7 = 0.0; 2023-01-11T21:38:06.1456777Z if(tmp6) 2023-01-11T21:38:06.1456852Z { 2023-01-11T21:38:06.1456985Z auto tmp8 = in_ptr0[(12*(((3 + (6*i1)) / 4))) + (72*i0) + (((6 + (12*i2)) / 7))]; 2023-01-11T21:38:06.1457117Z tmp7 = tmp8; 2023-01-11T21:38:06.1457271Z } 2023-01-11T21:38:06.1457416Z auto tmp9 = static_cast(1 + (((6 + (12*i2)) / 7))); 2023-01-11T21:38:06.1457521Z auto tmp10 = tmp9 < tmp4; 2023-01-11T21:38:06.1457616Z auto tmp11 = tmp2 & tmp10; 2023-01-11T21:38:06.1457714Z float tmp12 = 0.0; 2023-01-11T21:38:06.1457796Z if(tmp11) 2023-01-11T21:38:06.1457870Z { 2023-01-11T21:38:06.1458006Z auto tmp13 = in_ptr0[1 + (12*(((3 + (6*i1)) / 4))) + (72*i0) + (((6 + (12*i2)) / 7))]; 2023-01-11T21:38:06.1458099Z tmp12 = tmp13; 2023-01-11T21:38:06.1458174Z } 2023-01-11T21:38:06.1458275Z auto tmp14 = tmp12 + tmp7; 2023-01-11T21:38:06.1458398Z auto tmp15 = static_cast(1 + (((3 + (6*i1)) / 4))); 2023-01-11T21:38:06.1458501Z auto tmp16 = tmp15 < tmp1; 2023-01-11T21:38:06.1458603Z auto tmp17 = tmp16 & tmp5; 2023-01-11T21:38:06.1458697Z float tmp18 = 0.0; 2023-01-11T21:38:06.1458784Z if(tmp17) 2023-01-11T21:38:06.1458858Z { 2023-01-11T21:38:06.1458997Z auto tmp19 = in_ptr0[12 + (12*(((3 + (6*i1)) / 4))) + (72*i0) + (((6 + (12*i2)) / 7))]; 2023-01-11T21:38:06.1459082Z tmp18 = tmp19; 2023-01-11T21:38:06.1459157Z } 2023-01-11T21:38:06.1459260Z auto tmp20 = tmp18 + tmp14; 2023-01-11T21:38:06.1459363Z auto tmp21 = tmp16 & tmp10; 2023-01-11T21:38:06.1459456Z float tmp22 = 0.0; 2023-01-11T21:38:06.1459542Z if(tmp21) 2023-01-11T21:38:06.1459617Z { 2023-01-11T21:38:06.1459753Z auto tmp23 = in_ptr0[13 + (12*(((3 + (6*i1)) / 4))) + (72*i0) + (((6 + (12*i2)) / 7))]; 2023-01-11T21:38:06.1459837Z tmp22 = tmp23; 2023-01-11T21:38:06.1459943Z } 2023-01-11T21:38:06.1460046Z auto tmp24 = tmp22 + tmp20; 2023-01-11T21:38:06.1460162Z out_ptr4[i2 + (7*i1) + (28*i0)] = tmp24; 2023-01-11T21:38:06.1460235Z } 2023-01-11T21:38:06.1460307Z } 2023-01-11T21:38:06.1460376Z } 2023-01-11T21:38:06.1460436Z } 2023-01-11T21:38:06.1460502Z } 2023-01-11T21:38:06.1460568Z } 2023-01-11T21:38:06.1460632Z } 2023-01-11T21:38:06.1460767Z ''') 2023-01-11T21:38:06.1460774Z 2023-01-11T21:38:06.1460779Z 2023-01-11T21:38:06.1460876Z async_compile.wait(globals()) 2023-01-11T21:38:06.1460958Z del async_compile 2023-01-11T21:38:06.1460963Z 2023-01-11T21:38:06.1461032Z def call(args): 2023-01-11T21:38:06.1461107Z arg0_1, = args 2023-01-11T21:38:06.1461182Z args.clear() 2023-01-11T21:38:06.1461406Z buf0 = empty_strided((3, 3, 3, 6), (54, 18, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1461622Z buf1 = empty_strided((3, 3, 4, 5), (60, 20, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1461829Z buf2 = empty_strided((3, 3, 2, 8), (48, 16, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1462035Z buf3 = empty_strided((3, 3, 2, 8), (48, 16, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1462230Z buf4 = empty_strided((3, 3, 4, 7), (84, 28, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1462470Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:06.1462575Z del arg0_1 2023-01-11T21:38:06.1462679Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.1462684Z 2023-01-11T21:38:06.1462689Z 
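# buf0..buf4 are grad_inputs of nearest-neighbor upsampling to (6, 12) for the
# five requested source sizes; each cell sums the (6, 12) grad_output window
# that nearest-mapped from it (buf2 and buf3 repeat the (2, 8) case).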
2023-01-11T21:38:06.1462776Z if __name__ == "__main__": 2023-01-11T21:38:06.1462896Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1463027Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1463243Z arg0_1 = rand_strided((3, 3, 6, 12), (216, 72, 12, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1463361Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1463626Z [2023-01-11 21:33:00,019] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 318 2023-01-11T21:38:06.1463632Z 2023-01-11T21:38:06.1463708Z ok (2.379s) 2023-01-11T21:38:06.1464184Z test_upsample_nearest2d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1464325Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1464592Z [2023-01-11 21:33:00,472] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 319 2023-01-11T21:38:06.1464862Z [2023-01-11 21:33:02,677] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 319 2023-01-11T21:38:06.1464868Z 2023-01-11T21:38:06.1464968Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1465045Z import torch 2023-01-11T21:38:06.1465133Z import random 2023-01-11T21:38:06.1465266Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1465419Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1465424Z 2023-01-11T21:38:06.1465516Z aten = torch.ops.aten 2023-01-11T21:38:06.1465660Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1465760Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1465765Z 2023-01-11T21:38:06.1465844Z import triton 2023-01-11T21:38:06.1465939Z import triton.language as tl 2023-01-11T21:38:06.1466099Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1466235Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1466241Z 2023-01-11T21:38:06.1466254Z 2023-01-11T21:38:06.1466387Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1466599Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1466726Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1466835Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1466943Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.1467046Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1467152Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.1467247Z float* __restrict__ out_ptr4) 2023-01-11T21:38:06.1467315Z { 2023-01-11T21:38:06.1467419Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1467490Z { 2023-01-11T21:38:06.1467577Z #pragma omp for 2023-01-11T21:38:06.1467669Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1467740Z { 2023-01-11T21:38:06.1467822Z #pragma GCC ivdep 2023-01-11T21:38:06.1467915Z for(long i1=0; i1<74; i1+=1) 2023-01-11T21:38:06.1467985Z { 2023-01-11T21:38:06.1468076Z #pragma GCC ivdep 2023-01-11T21:38:06.1468176Z for(long i2=0; i2<76; i2+=1) 2023-01-11T21:38:06.1468248Z { 2023-01-11T21:38:06.1468315Z { 2023-01-11T21:38:06.1468392Z { 
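                        // Nearest-neighbor gather: each output coordinate is scaled by
                        // in_size/out_size (0.5 for both axes of this 74x76 <- 37x38 case)
                        // and truncated toward zero to select the source pixel.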
2023-01-11T21:38:06.1468513Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1468662Z auto tmp1 = static_cast(0.5); 2023-01-11T21:38:06.1468769Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1468886Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1469005Z auto tmp4 = static_cast(i2); 2023-01-11T21:38:06.1469108Z auto tmp5 = tmp4 * tmp1; 2023-01-11T21:38:06.1469215Z auto tmp6 = static_cast(tmp5); 2023-01-11T21:38:06.1469343Z auto tmp7 = in_ptr0[tmp6 + (38*tmp3) + (1406*i0)]; 2023-01-11T21:38:06.1469458Z out_ptr0[i2 + (76*i1) + (5624*i0)] = tmp7; 2023-01-11T21:38:06.1469573Z out_ptr1[i2 + (76*i1) + (5624*i0)] = tmp7; 2023-01-11T21:38:06.1469648Z } 2023-01-11T21:38:06.1469721Z } 2023-01-11T21:38:06.1469793Z } 2023-01-11T21:38:06.1469859Z } 2023-01-11T21:38:06.1469928Z } 2023-01-11T21:38:06.1470012Z #pragma omp for 2023-01-11T21:38:06.1470103Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1470172Z { 2023-01-11T21:38:06.1470262Z #pragma GCC ivdep 2023-01-11T21:38:06.1470356Z for(long i1=0; i1<70; i1+=1) 2023-01-11T21:38:06.1470420Z { 2023-01-11T21:38:06.1470511Z #pragma GCC ivdep 2023-01-11T21:38:06.1470609Z for(long i2=0; i2<75; i2+=1) 2023-01-11T21:38:06.1470680Z { 2023-01-11T21:38:06.1470755Z { 2023-01-11T21:38:06.1470829Z { 2023-01-11T21:38:06.1470939Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1471066Z auto tmp1 = static_cast(0.5285714285714286); 2023-01-11T21:38:06.1471173Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1471288Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1471408Z auto tmp4 = static_cast(i2); 2023-01-11T21:38:06.1471537Z auto tmp5 = static_cast(0.5066666666666667); 2023-01-11T21:38:06.1471644Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1471787Z auto tmp7 = static_cast(tmp6); 2023-01-11T21:38:06.1471908Z auto tmp8 = in_ptr0[tmp7 + (38*tmp3) + (1406*i0)]; 2023-01-11T21:38:06.1472023Z out_ptr2[i2 + (75*i1) + (5250*i0)] = tmp8; 2023-01-11T21:38:06.1472098Z } 2023-01-11T21:38:06.1472174Z } 2023-01-11T21:38:06.1472246Z } 2023-01-11T21:38:06.1472317Z } 2023-01-11T21:38:06.1472385Z } 2023-01-11T21:38:06.1472463Z #pragma omp for 2023-01-11T21:38:06.1472554Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1472627Z { 2023-01-11T21:38:06.1472715Z #pragma GCC ivdep 2023-01-11T21:38:06.1472807Z for(long i1=0; i1<45; i1+=1) 2023-01-11T21:38:06.1472879Z { 2023-01-11T21:38:06.1472969Z #pragma GCC ivdep 2023-01-11T21:38:06.1473059Z for(long i2=0; i2<74; i2+=1) 2023-01-11T21:38:06.1473133Z { 2023-01-11T21:38:06.1473206Z { 2023-01-11T21:38:06.1473281Z { 2023-01-11T21:38:06.1473396Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1473519Z auto tmp1 = static_cast(0.8222222222222222); 2023-01-11T21:38:06.1473623Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1473729Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1473845Z auto tmp4 = static_cast(i2); 2023-01-11T21:38:06.1473970Z auto tmp5 = static_cast(0.5135135135135135); 2023-01-11T21:38:06.1474200Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1474311Z auto tmp7 = static_cast(tmp6); 2023-01-11T21:38:06.1474433Z auto tmp8 = in_ptr0[tmp7 + (38*tmp3) + (1406*i0)]; 2023-01-11T21:38:06.1474547Z out_ptr3[i2 + (74*i1) + (3330*i0)] = tmp8; 2023-01-11T21:38:06.1474612Z } 2023-01-11T21:38:06.1474683Z } 2023-01-11T21:38:06.1474754Z } 2023-01-11T21:38:06.1474825Z } 2023-01-11T21:38:06.1474894Z } 2023-01-11T21:38:06.1474976Z #pragma omp for 2023-01-11T21:38:06.1475062Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1475128Z { 2023-01-11T21:38:06.1475232Z #pragma GCC ivdep 2023-01-11T21:38:06.1475329Z for(long i1=0; i1<36; i1+=1) 
2023-01-11T21:38:06.1475413Z { 2023-01-11T21:38:06.1475499Z #pragma GCC ivdep 2023-01-11T21:38:06.1475596Z for(long i2=0; i2<39; i2+=1) 2023-01-11T21:38:06.1475666Z { 2023-01-11T21:38:06.1475729Z { 2023-01-11T21:38:06.1475801Z { 2023-01-11T21:38:06.1475914Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1476039Z auto tmp1 = static_cast(1.0277777777777777); 2023-01-11T21:38:06.1476142Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1476254Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1476368Z auto tmp4 = static_cast(i2); 2023-01-11T21:38:06.1476483Z auto tmp5 = static_cast(0.9743589743589743); 2023-01-11T21:38:06.1476586Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1476698Z auto tmp7 = static_cast(tmp6); 2023-01-11T21:38:06.1476826Z auto tmp8 = in_ptr0[tmp7 + (38*tmp3) + (1406*i0)]; 2023-01-11T21:38:06.1476938Z out_ptr4[i2 + (39*i1) + (1404*i0)] = tmp8; 2023-01-11T21:38:06.1477013Z } 2023-01-11T21:38:06.1477084Z } 2023-01-11T21:38:06.1477146Z } 2023-01-11T21:38:06.1477240Z } 2023-01-11T21:38:06.1477309Z } 2023-01-11T21:38:06.1477378Z } 2023-01-11T21:38:06.1477444Z } 2023-01-11T21:38:06.1477537Z ''') 2023-01-11T21:38:06.1477544Z 2023-01-11T21:38:06.1477548Z 2023-01-11T21:38:06.1477648Z async_compile.wait(globals()) 2023-01-11T21:38:06.1477719Z del async_compile 2023-01-11T21:38:06.1477732Z 2023-01-11T21:38:06.1477800Z def call(args): 2023-01-11T21:38:06.1477874Z arg0_1, = args 2023-01-11T21:38:06.1477949Z args.clear() 2023-01-11T21:38:06.1478181Z buf0 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1478402Z buf4 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1478625Z buf1 = empty_strided((2, 4, 70, 75), (21000, 5250, 75, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1478844Z buf2 = empty_strided((2, 4, 45, 74), (13320, 3330, 74, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1479056Z buf3 = empty_strided((2, 4, 36, 39), (5616, 1404, 39, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1479294Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.1479369Z del arg0_1 2023-01-11T21:38:06.1479473Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.1479479Z 2023-01-11T21:38:06.1479484Z 2023-01-11T21:38:06.1479566Z if __name__ == "__main__": 2023-01-11T21:38:06.1479684Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1479812Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1480057Z arg0_1 = rand_strided((2, 4, 37, 38), (5624, 1406, 38, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1480171Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1480177Z 2023-01-11T21:38:06.1480241Z ok (2.665s) 2023-01-11T21:38:06.1480707Z test_upsample_nearest3d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1480841Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1481100Z [2023-01-11 21:33:03,264] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 320 2023-01-11T21:38:06.1481108Z 2023-01-11T21:38:06.1481207Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1481283Z import torch 2023-01-11T21:38:06.1481358Z import random 2023-01-11T21:38:06.1481479Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1481603Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1481608Z 2023-01-11T21:38:06.1481685Z aten = torch.ops.aten 2023-01-11T21:38:06.1481821Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1481919Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1481924Z 2023-01-11T21:38:06.1481997Z import triton 2023-01-11T21:38:06.1482091Z import triton.language as tl 2023-01-11T21:38:06.1482216Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1482358Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1482364Z 2023-01-11T21:38:06.1482368Z 2023-01-11T21:38:06.1482505Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1482705Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1482833Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1482940Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1483044Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.1483172Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1483274Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.1483374Z float* __restrict__ out_ptr4) 2023-01-11T21:38:06.1483432Z { 2023-01-11T21:38:06.1483534Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1483602Z { 2023-01-11T21:38:06.1483684Z #pragma omp for 2023-01-11T21:38:06.1483770Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1483838Z { 2023-01-11T21:38:06.1483924Z #pragma GCC ivdep 2023-01-11T21:38:06.1484008Z for(long i1=0; i1<74; i1+=1) 2023-01-11T21:38:06.1484080Z { 2023-01-11T21:38:06.1484167Z #pragma GCC ivdep 2023-01-11T21:38:06.1484262Z for(long i2=0; i2<76; i2+=1) 2023-01-11T21:38:06.1484332Z { 2023-01-11T21:38:06.1484420Z #pragma GCC ivdep 2023-01-11T21:38:06.1484518Z for(long i3=0; i3<78; i3+=1) 2023-01-11T21:38:06.1484585Z { 2023-01-11T21:38:06.1484662Z { 2023-01-11T21:38:06.1484738Z { 2023-01-11T21:38:06.1484857Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1484978Z auto tmp1 = static_cast(0.5); 2023-01-11T21:38:06.1485083Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1485203Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1485310Z auto tmp4 = static_cast(i2); 2023-01-11T21:38:06.1485412Z auto tmp5 = tmp4 * tmp1; 2023-01-11T21:38:06.1485558Z auto tmp6 = static_cast(tmp5); 2023-01-11T21:38:06.1485674Z auto tmp7 = static_cast(i3); 2023-01-11T21:38:06.1485778Z auto tmp8 = tmp7 * tmp1; 2023-01-11T21:38:06.1485896Z auto tmp9 = static_cast(tmp8); 2023-01-11T21:38:06.1486036Z auto tmp10 = in_ptr0[tmp9 + (39*tmp6) + (1482*tmp3) + (54834*i0)]; 2023-01-11T21:38:06.1486157Z out_ptr0[i3 + (78*i2) + (5928*i1) + (438672*i0)] = tmp10; 2023-01-11T21:38:06.1486270Z out_ptr1[i3 + (78*i2) + (5928*i1) + (438672*i0)] = tmp10; 2023-01-11T21:38:06.1486349Z } 2023-01-11T21:38:06.1486422Z } 2023-01-11T21:38:06.1486493Z } 2023-01-11T21:38:06.1486563Z } 
2023-01-11T21:38:06.1486633Z } 2023-01-11T21:38:06.1486700Z } 2023-01-11T21:38:06.1486775Z #pragma omp for 2023-01-11T21:38:06.1486861Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1486929Z { 2023-01-11T21:38:06.1487013Z #pragma GCC ivdep 2023-01-11T21:38:06.1487103Z for(long i1=0; i1<70; i1+=1) 2023-01-11T21:38:06.1487176Z { 2023-01-11T21:38:06.1487256Z #pragma GCC ivdep 2023-01-11T21:38:06.1487352Z for(long i2=0; i2<75; i2+=1) 2023-01-11T21:38:06.1487426Z { 2023-01-11T21:38:06.1487514Z #pragma GCC ivdep 2023-01-11T21:38:06.1487613Z for(long i3=0; i3<80; i3+=1) 2023-01-11T21:38:06.1487684Z { 2023-01-11T21:38:06.1487757Z { 2023-01-11T21:38:06.1487824Z { 2023-01-11T21:38:06.1487940Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1488065Z auto tmp1 = static_cast(0.5285714285714286); 2023-01-11T21:38:06.1488174Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1488290Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1488403Z auto tmp4 = static_cast(i2); 2023-01-11T21:38:06.1488553Z auto tmp5 = static_cast(0.5066666666666667); 2023-01-11T21:38:06.1488659Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1488766Z auto tmp7 = static_cast(tmp6); 2023-01-11T21:38:06.1488878Z auto tmp8 = static_cast(i3); 2023-01-11T21:38:06.1488999Z auto tmp9 = static_cast(0.4875); 2023-01-11T21:38:06.1489106Z auto tmp10 = tmp8 * tmp9; 2023-01-11T21:38:06.1489226Z auto tmp11 = static_cast(tmp10); 2023-01-11T21:38:06.1489365Z auto tmp12 = in_ptr0[tmp11 + (39*tmp7) + (1482*tmp3) + (54834*i0)]; 2023-01-11T21:38:06.1489483Z out_ptr2[i3 + (80*i2) + (6000*i1) + (420000*i0)] = tmp12; 2023-01-11T21:38:06.1489558Z } 2023-01-11T21:38:06.1489626Z } 2023-01-11T21:38:06.1489700Z } 2023-01-11T21:38:06.1489770Z } 2023-01-11T21:38:06.1489837Z } 2023-01-11T21:38:06.1489904Z } 2023-01-11T21:38:06.1489986Z #pragma omp for 2023-01-11T21:38:06.1490066Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1490133Z { 2023-01-11T21:38:06.1490217Z #pragma GCC ivdep 2023-01-11T21:38:06.1490306Z for(long i1=0; i1<45; i1+=1) 2023-01-11T21:38:06.1490376Z { 2023-01-11T21:38:06.1490463Z #pragma GCC ivdep 2023-01-11T21:38:06.1490557Z for(long i2=0; i2<74; i2+=1) 2023-01-11T21:38:06.1490649Z { 2023-01-11T21:38:06.1490739Z #pragma GCC ivdep 2023-01-11T21:38:06.1490836Z for(long i3=0; i3<103; i3+=1) 2023-01-11T21:38:06.1490907Z { 2023-01-11T21:38:06.1490980Z { 2023-01-11T21:38:06.1491054Z { 2023-01-11T21:38:06.1491174Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1491290Z auto tmp1 = static_cast(0.8222222222222222); 2023-01-11T21:38:06.1491394Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1491510Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1491623Z auto tmp4 = static_cast(i2); 2023-01-11T21:38:06.1491747Z auto tmp5 = static_cast(0.5135135135135135); 2023-01-11T21:38:06.1491852Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1491969Z auto tmp7 = static_cast(tmp6); 2023-01-11T21:38:06.1492081Z auto tmp8 = static_cast(i3); 2023-01-11T21:38:06.1492196Z auto tmp9 = static_cast(0.3786407766990291); 2023-01-11T21:38:06.1492306Z auto tmp10 = tmp8 * tmp9; 2023-01-11T21:38:06.1492430Z auto tmp11 = static_cast(tmp10); 2023-01-11T21:38:06.1492567Z auto tmp12 = in_ptr0[tmp11 + (39*tmp7) + (1482*tmp3) + (54834*i0)]; 2023-01-11T21:38:06.1492690Z out_ptr3[i3 + (103*i2) + (7622*i1) + (342990*i0)] = tmp12; 2023-01-11T21:38:06.1492766Z } 2023-01-11T21:38:06.1492840Z } 2023-01-11T21:38:06.1492904Z } 2023-01-11T21:38:06.1492973Z } 2023-01-11T21:38:06.1493039Z } 2023-01-11T21:38:06.1493108Z } 2023-01-11T21:38:06.1493191Z #pragma omp for 2023-01-11T21:38:06.1493280Z 
for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1493346Z { 2023-01-11T21:38:06.1493423Z #pragma GCC ivdep 2023-01-11T21:38:06.1493514Z for(long i1=0; i1<36; i1+=1) 2023-01-11T21:38:06.1493583Z { 2023-01-11T21:38:06.1493696Z #pragma GCC ivdep 2023-01-11T21:38:06.1493793Z for(long i2=0; i2<39; i2+=1) 2023-01-11T21:38:06.1493865Z { 2023-01-11T21:38:06.1493956Z #pragma GCC ivdep 2023-01-11T21:38:06.1494046Z for(long i3=0; i3<40; i3+=1) 2023-01-11T21:38:06.1494117Z { 2023-01-11T21:38:06.1494189Z { 2023-01-11T21:38:06.1494265Z { 2023-01-11T21:38:06.1494383Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1494811Z auto tmp1 = static_cast(1.0277777777777777); 2023-01-11T21:38:06.1494960Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1495069Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1495180Z auto tmp4 = static_cast(i2); 2023-01-11T21:38:06.1495307Z auto tmp5 = static_cast(0.9743589743589743); 2023-01-11T21:38:06.1495423Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1495552Z auto tmp7 = static_cast(tmp6); 2023-01-11T21:38:06.1495688Z auto tmp8 = static_cast(i3); 2023-01-11T21:38:06.1495806Z auto tmp9 = static_cast(0.975); 2023-01-11T21:38:06.1495911Z auto tmp10 = tmp8 * tmp9; 2023-01-11T21:38:06.1496022Z auto tmp11 = static_cast(tmp10); 2023-01-11T21:38:06.1496248Z auto tmp12 = in_ptr0[tmp11 + (39*tmp7) + (1482*tmp3) + (54834*i0)]; 2023-01-11T21:38:06.1496370Z out_ptr4[i3 + (40*i2) + (1560*i1) + (56160*i0)] = tmp12; 2023-01-11T21:38:06.1496448Z } 2023-01-11T21:38:06.1496524Z } 2023-01-11T21:38:06.1496602Z } 2023-01-11T21:38:06.1496675Z } 2023-01-11T21:38:06.1496738Z } 2023-01-11T21:38:06.1496806Z } 2023-01-11T21:38:06.1496875Z } 2023-01-11T21:38:06.1496941Z } 2023-01-11T21:38:06.1497039Z ''') 2023-01-11T21:38:06.1497051Z 2023-01-11T21:38:06.1497055Z 2023-01-11T21:38:06.1497228Z async_compile.wait(globals()) 2023-01-11T21:38:06.1497324Z del async_compile 2023-01-11T21:38:06.1497330Z 2023-01-11T21:38:06.1497411Z def call(args): 2023-01-11T21:38:06.1497505Z arg0_1, = args 2023-01-11T21:38:06.1497591Z args.clear() 2023-01-11T21:38:06.1497837Z buf0 = empty_strided((2, 4, 74, 76, 78), (1754688, 438672, 5928, 78, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1498079Z buf4 = empty_strided((2, 4, 74, 76, 78), (1754688, 438672, 5928, 78, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1498313Z buf1 = empty_strided((2, 4, 70, 75, 80), (1680000, 420000, 6000, 80, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1498559Z buf2 = empty_strided((2, 4, 45, 74, 103), (1371960, 342990, 7622, 103, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1498792Z buf3 = empty_strided((2, 4, 36, 39, 40), (224640, 56160, 1560, 40, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1499021Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.1499094Z del arg0_1 2023-01-11T21:38:06.1499195Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.1499201Z 2023-01-11T21:38:06.1499205Z 2023-01-11T21:38:06.1499290Z if __name__ == "__main__": 2023-01-11T21:38:06.1499409Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1499538Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1499773Z arg0_1 = rand_strided((2, 4, 37, 38, 39), (219336, 54834, 1482, 39, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1499933Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1500200Z 
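A minimal sketch of how codegen like the kernel above can be reproduced outside the harness (assumes a torch build with inductor available; the function name, target sizes, and use of torch.compile are illustrative, not the test's own code):

import torch
import torch.nn.functional as F

def upsample_variants(x):
    # Target sizes taken from the out_ptr shapes in the kernel above
    return (
        F.interpolate(x, size=(74, 76, 78), mode="nearest"),
        F.interpolate(x, size=(70, 75, 80), mode="nearest"),
        F.interpolate(x, size=(45, 74, 103), mode="nearest"),
        F.interpolate(x, size=(36, 39, 40), mode="nearest"),
    )

x = torch.randn(2, 4, 37, 38, 39)  # matches the rand_strided input above
compiled = torch.compile(upsample_variants)
torch.testing.assert_close(compiled(x), upsample_variants(x))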
[2023-01-11 21:33:05,048] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 320 2023-01-11T21:38:06.1500206Z 2023-01-11T21:38:06.1500270Z ok (2.476s) 2023-01-11T21:38:06.1500724Z test_var_mean_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1500860Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1501115Z [2023-01-11 21:33:05,185] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 321 2023-01-11T21:38:06.1501120Z 2023-01-11T21:38:06.1501220Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1501294Z import torch 2023-01-11T21:38:06.1501368Z import random 2023-01-11T21:38:06.1501489Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1501616Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1501622Z 2023-01-11T21:38:06.1501697Z aten = torch.ops.aten 2023-01-11T21:38:06.1501833Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1501930Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1501936Z 2023-01-11T21:38:06.1502010Z import triton 2023-01-11T21:38:06.1502102Z import triton.language as tl 2023-01-11T21:38:06.1502255Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1502394Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1502400Z 2023-01-11T21:38:06.1502404Z 2023-01-11T21:38:06.1502542Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1502743Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1502865Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.1502971Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:06.1503077Z float* __restrict__ in_out_ptr2, 2023-01-11T21:38:06.1503182Z float* __restrict__ in_out_ptr3, 2023-01-11T21:38:06.1503291Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1503396Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1503490Z float* __restrict__ out_ptr3) 2023-01-11T21:38:06.1503558Z { 2023-01-11T21:38:06.1503654Z auto out_ptr1 = in_out_ptr0; 2023-01-11T21:38:06.1503743Z auto out_ptr2 = in_out_ptr1; 2023-01-11T21:38:06.1503831Z auto out_ptr4 = in_out_ptr2; 2023-01-11T21:38:06.1503918Z auto out_ptr5 = in_out_ptr3; 2023-01-11T21:38:06.1504020Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1504079Z { 2023-01-11T21:38:06.1504162Z #pragma omp for 2023-01-11T21:38:06.1504250Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1504316Z { 2023-01-11T21:38:06.1504386Z { 2023-01-11T21:38:06.1504611Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1504696Z float tmp1 = 0; 2023-01-11T21:38:06.1504816Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:06.1504910Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.1504980Z { 2023-01-11T21:38:06.1505128Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.1505223Z tmp1_vec += tmp0; 2023-01-11T21:38:06.1505310Z } 2023-01-11T21:38:06.1505540Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, 
at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:06.1505699Z #pragma omp simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:06.1505789Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.1505858Z { 2023-01-11T21:38:06.1505963Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.1506047Z tmp1 += tmp0; 2023-01-11T21:38:06.1506116Z } 2023-01-11T21:38:06.1506202Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.1506269Z } 2023-01-11T21:38:06.1506329Z } 2023-01-11T21:38:06.1506412Z #pragma omp for 2023-01-11T21:38:06.1506497Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1506566Z { 2023-01-11T21:38:06.1506633Z { 2023-01-11T21:38:06.1506824Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1506909Z float tmp6 = 0; 2023-01-11T21:38:06.1507030Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.1507114Z float tmp7 = 0; 2023-01-11T21:38:06.1507238Z auto tmp7_vec = at::vec::Vectorized(tmp7); 2023-01-11T21:38:06.1507334Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.1507404Z { 2023-01-11T21:38:06.1507553Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.1507686Z auto tmp1 = at::vec::Vectorized(out_ptr0[i0]); 2023-01-11T21:38:06.1507829Z auto tmp2 = at::vec::Vectorized(static_cast(8)); 2023-01-11T21:38:06.1507950Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.1508094Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.1508192Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.1508280Z tmp6_vec += tmp5; 2023-01-11T21:38:06.1508367Z tmp7_vec += tmp0; 2023-01-11T21:38:06.1508437Z } 2023-01-11T21:38:06.1508637Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:06.1508834Z tmp7 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp7_vec); 2023-01-11T21:38:06.1508980Z #pragma omp simd simdlen(4) reduction(+:tmp6) reduction(+:tmp7) 2023-01-11T21:38:06.1509068Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.1509136Z { 2023-01-11T21:38:06.1509241Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.1509341Z auto tmp1 = out_ptr0[i0]; 2023-01-11T21:38:06.1509450Z auto tmp2 = static_cast(8); 2023-01-11T21:38:06.1509546Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.1509682Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.1509772Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.1509856Z tmp6 += tmp5; 2023-01-11T21:38:06.1509939Z tmp7 += tmp0; 2023-01-11T21:38:06.1510008Z } 2023-01-11T21:38:06.1510098Z out_ptr1[i0] = tmp6; 2023-01-11T21:38:06.1510185Z out_ptr2[i0] = tmp7; 2023-01-11T21:38:06.1510253Z } 2023-01-11T21:38:06.1510313Z } 2023-01-11T21:38:06.1510398Z #pragma omp for 2023-01-11T21:38:06.1510484Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.1510551Z { 2023-01-11T21:38:06.1510690Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr1 + 8*i0); 2023-01-11T21:38:06.1510833Z auto tmp1 = at::vec::Vectorized(static_cast(7)); 2023-01-11T21:38:06.1510923Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1511017Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.1511085Z } 2023-01-11T21:38:06.1511217Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1511305Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:06.1511372Z { 2023-01-11T21:38:06.1511462Z auto tmp0 = out_ptr1[i0]; 2023-01-11T21:38:06.1511565Z auto tmp1 = static_cast(7); 2023-01-11T21:38:06.1511646Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1511732Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1511800Z } 
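            // The two loops above finalize the unbiased variance over the last dim:
            // out_ptr1 held sum((x - mean)^2) for 8 elements, so dividing by 7 applies
            // the default correction=1. The next pair of loops divides the running sum
            // in out_ptr2 by 8 to finalize the corresponding mean.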
2023-01-11T21:38:06.1511880Z #pragma omp for 2023-01-11T21:38:06.1511966Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.1512032Z { 2023-01-11T21:38:06.1512171Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr2 + 8*i0); 2023-01-11T21:38:06.1512303Z auto tmp1 = at::vec::Vectorized(static_cast(8)); 2023-01-11T21:38:06.1512393Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1512493Z tmp2.store(in_out_ptr1 + 8*i0); 2023-01-11T21:38:06.1512562Z } 2023-01-11T21:38:06.1512663Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1512748Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:06.1512816Z { 2023-01-11T21:38:06.1512899Z auto tmp0 = out_ptr2[i0]; 2023-01-11T21:38:06.1513004Z auto tmp1 = static_cast(8); 2023-01-11T21:38:06.1513092Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1513178Z in_out_ptr1[i0] = tmp2; 2023-01-11T21:38:06.1513246Z } 2023-01-11T21:38:06.1513328Z #pragma omp for 2023-01-11T21:38:06.1513413Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.1513473Z { 2023-01-11T21:38:06.1513540Z { 2023-01-11T21:38:06.1513777Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1513863Z float tmp1 = 0; 2023-01-11T21:38:06.1513991Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:06.1514089Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:06.1514158Z { 2023-01-11T21:38:06.1514247Z for(long i2=0; i2<1; i2+=1) 2023-01-11T21:38:06.1514318Z { 2023-01-11T21:38:06.1514476Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i2) + (32*i1)); 2023-01-11T21:38:06.1514568Z tmp1_vec += tmp0; 2023-01-11T21:38:06.1514641Z } 2023-01-11T21:38:06.1514842Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:06.1514970Z #pragma omp simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:06.1515072Z for(long i2=8; i2<8; i2+=1) 2023-01-11T21:38:06.1515136Z { 2023-01-11T21:38:06.1515248Z auto tmp0 = in_ptr0[i2 + (8*i0) + (32*i1)]; 2023-01-11T21:38:06.1515333Z tmp1 += tmp0; 2023-01-11T21:38:06.1515406Z } 2023-01-11T21:38:06.1515477Z } 2023-01-11T21:38:06.1515565Z out_ptr3[i0] = tmp1; 2023-01-11T21:38:06.1515632Z } 2023-01-11T21:38:06.1515691Z } 2023-01-11T21:38:06.1515774Z #pragma omp for 2023-01-11T21:38:06.1515861Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.1515928Z { 2023-01-11T21:38:06.1515994Z { 2023-01-11T21:38:06.1516187Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1516275Z float tmp6 = 0; 2023-01-11T21:38:06.1516398Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.1516482Z float tmp7 = 0; 2023-01-11T21:38:06.1516608Z auto tmp7_vec = at::vec::Vectorized(tmp7); 2023-01-11T21:38:06.1516700Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:06.1516770Z { 2023-01-11T21:38:06.1516896Z for(long i2=0; i2<1; i2+=1) 2023-01-11T21:38:06.1516971Z { 2023-01-11T21:38:06.1517129Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i2) + (32*i1)); 2023-01-11T21:38:06.1517258Z auto tmp1 = at::vec::Vectorized(out_ptr3[i0]); 2023-01-11T21:38:06.1517402Z auto tmp2 = at::vec::Vectorized(static_cast(16)); 2023-01-11T21:38:06.1517502Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.1517652Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.1517755Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.1517845Z tmp6_vec += tmp5; 2023-01-11T21:38:06.1517935Z tmp7_vec += tmp0; 2023-01-11T21:38:06.1517999Z } 2023-01-11T21:38:06.1518203Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, 
at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:06.1518394Z tmp7 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp7_vec); 2023-01-11T21:38:06.1518542Z #pragma omp simd simdlen(4) reduction(+:tmp6) reduction(+:tmp7) 2023-01-11T21:38:06.1518638Z for(long i2=8; i2<8; i2+=1) 2023-01-11T21:38:06.1518709Z { 2023-01-11T21:38:06.1518822Z auto tmp0 = in_ptr0[i2 + (8*i0) + (32*i1)]; 2023-01-11T21:38:06.1518920Z auto tmp1 = out_ptr3[i0]; 2023-01-11T21:38:06.1519066Z auto tmp2 = static_cast(16); 2023-01-11T21:38:06.1519157Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.1519298Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.1519395Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.1519483Z tmp6 += tmp5; 2023-01-11T21:38:06.1519567Z tmp7 += tmp0; 2023-01-11T21:38:06.1519638Z } 2023-01-11T21:38:06.1519706Z } 2023-01-11T21:38:06.1519786Z out_ptr4[i0] = tmp6; 2023-01-11T21:38:06.1519870Z out_ptr5[i0] = tmp7; 2023-01-11T21:38:06.1519940Z } 2023-01-11T21:38:06.1520009Z } 2023-01-11T21:38:06.1520094Z #pragma omp single 2023-01-11T21:38:06.1520161Z { 2023-01-11T21:38:06.1520238Z #pragma GCC ivdep 2023-01-11T21:38:06.1520326Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.1520396Z { 2023-01-11T21:38:06.1520465Z { 2023-01-11T21:38:06.1520536Z { 2023-01-11T21:38:06.1520636Z auto tmp0 = out_ptr4[i0]; 2023-01-11T21:38:06.1520749Z auto tmp1 = static_cast(15); 2023-01-11T21:38:06.1520840Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1520938Z in_out_ptr2[i0] = tmp2; 2023-01-11T21:38:06.1521010Z } 2023-01-11T21:38:06.1521078Z } 2023-01-11T21:38:06.1521144Z } 2023-01-11T21:38:06.1521210Z } 2023-01-11T21:38:06.1521293Z #pragma omp single 2023-01-11T21:38:06.1521355Z { 2023-01-11T21:38:06.1521440Z #pragma GCC ivdep 2023-01-11T21:38:06.1521529Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.1521596Z { 2023-01-11T21:38:06.1521664Z { 2023-01-11T21:38:06.1521734Z { 2023-01-11T21:38:06.1521832Z auto tmp0 = out_ptr5[i0]; 2023-01-11T21:38:06.1521940Z auto tmp1 = static_cast(16); 2023-01-11T21:38:06.1522039Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1522133Z in_out_ptr3[i0] = tmp2; 2023-01-11T21:38:06.1522205Z } 2023-01-11T21:38:06.1522301Z } 2023-01-11T21:38:06.1522373Z } 2023-01-11T21:38:06.1522442Z } 2023-01-11T21:38:06.1522503Z } 2023-01-11T21:38:06.1522568Z } 2023-01-11T21:38:06.1522653Z ''') 2023-01-11T21:38:06.1522661Z 2023-01-11T21:38:06.1522665Z 2023-01-11T21:38:06.1522760Z async_compile.wait(globals()) 2023-01-11T21:38:06.1522837Z del async_compile 2023-01-11T21:38:06.1522842Z 2023-01-11T21:38:06.1522920Z def call(args): 2023-01-11T21:38:06.1522994Z arg0_1, = args 2023-01-11T21:38:06.1523062Z args.clear() 2023-01-11T21:38:06.1523273Z buf0 = empty_strided((1, 2, 4, 1), (8, 4, 1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1523479Z buf1 = empty_strided((1, 2, 4), (8, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1523679Z buf2 = empty_strided((1, 2, 4), (8, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1523771Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:06.1523857Z buf4 = buf2; del buf2 # reuse 2023-01-11T21:38:06.1524065Z buf5 = empty_strided((1, 1, 4, 1), (4, 4, 1, 4), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1524248Z buf6 = empty_strided((1, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1524441Z buf7 = empty_strided((1, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1524529Z buf8 = buf6; del buf6 # reuse 2023-01-11T21:38:06.1524617Z buf9 = buf7; del 
buf7 # reuse 2023-01-11T21:38:06.1524875Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf8.data_ptr()), c_void_p(buf9.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf5.data_ptr())) 2023-01-11T21:38:06.1524979Z del arg0_1 2023-01-11T21:38:06.1525071Z return (buf3, buf4, buf8, buf9, ) 2023-01-11T21:38:06.1525077Z 2023-01-11T21:38:06.1525081Z 2023-01-11T21:38:06.1525162Z if __name__ == "__main__": 2023-01-11T21:38:06.1525281Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1525402Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1525616Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1525729Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1525997Z [2023-01-11 21:33:06,996] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 321 2023-01-11T21:38:06.1526003Z 2023-01-11T21:38:06.1526075Z ok (1.831s) 2023-01-11T21:38:06.1526533Z test_vdd_clamp_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1526670Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1526929Z [2023-01-11 21:33:07,033] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 322 2023-01-11T21:38:06.1527188Z [2023-01-11 21:33:08,749] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 322 2023-01-11T21:38:06.1527194Z 2023-01-11T21:38:06.1527285Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1527360Z import torch 2023-01-11T21:38:06.1527434Z import random 2023-01-11T21:38:06.1527555Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1527679Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1527685Z 2023-01-11T21:38:06.1527767Z aten = torch.ops.aten 2023-01-11T21:38:06.1527908Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1528004Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1528009Z 2023-01-11T21:38:06.1528075Z import triton 2023-01-11T21:38:06.1528169Z import triton.language as tl 2023-01-11T21:38:06.1528295Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1528466Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1528473Z 2023-01-11T21:38:06.1528478Z 2023-01-11T21:38:06.1528616Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1528822Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1528947Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1529053Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1529147Z bool* __restrict__ out_ptr1) 2023-01-11T21:38:06.1529213Z { 2023-01-11T21:38:06.1529315Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1529387Z { 2023-01-11T21:38:06.1529468Z #pragma omp for 2023-01-11T21:38:06.1529557Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:06.1529624Z { 2023-01-11T21:38:06.1529684Z { 2023-01-11T21:38:06.1529752Z { 2023-01-11T21:38:06.1529853Z auto tmp0 = in_ptr0[i0]; 
2023-01-11T21:38:06.1529965Z auto tmp1 = static_cast<float>(3.0); 2023-01-11T21:38:06.1530100Z auto tmp2 = (tmp1 != tmp1) ? tmp1 : std::max(tmp0, tmp1); 2023-01-11T21:38:06.1530213Z auto tmp3 = static_cast<float>(3); 2023-01-11T21:38:06.1530311Z auto tmp4 = tmp0 >= tmp3; 2023-01-11T21:38:06.1530394Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1530484Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:06.1530552Z } 2023-01-11T21:38:06.1530621Z } 2023-01-11T21:38:06.1530688Z } 2023-01-11T21:38:06.1530784Z } 2023-01-11T21:38:06.1530841Z } 2023-01-11T21:38:06.1530925Z ''') 2023-01-11T21:38:06.1530930Z 2023-01-11T21:38:06.1530935Z 2023-01-11T21:38:06.1531032Z async_compile.wait(globals()) 2023-01-11T21:38:06.1531110Z del async_compile 2023-01-11T21:38:06.1531115Z 2023-01-11T21:38:06.1531188Z def call(args): 2023-01-11T21:38:06.1531270Z primals_1, = args 2023-01-11T21:38:06.1531345Z args.clear() 2023-01-11T21:38:06.1531540Z buf0 = empty_strided((16, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1531720Z buf1 = empty_strided((16, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:06.1531896Z kernel_cpp_0(c_void_p(primals_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1531973Z del primals_1 2023-01-11T21:38:06.1532057Z return (buf0, buf1, ) 2023-01-11T21:38:06.1532063Z 2023-01-11T21:38:06.1532067Z 2023-01-11T21:38:06.1532149Z if __name__ == "__main__": 2023-01-11T21:38:06.1532267Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1532397Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1532595Z primals_1 = rand_strided((16, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1532706Z print_performance(lambda: call([primals_1])) 2023-01-11T21:38:06.1532711Z 2023-01-11T21:38:06.1532784Z ok (1.753s) 2023-01-11T21:38:06.1533244Z test_vertical_fusion1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1533376Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1533633Z [2023-01-11 21:33:08,797] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 323 2023-01-11T21:38:06.1533899Z [2023-01-11 21:33:10,533] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 323 2023-01-11T21:38:06.1533904Z 2023-01-11T21:38:06.1534004Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1534080Z import torch 2023-01-11T21:38:06.1534156Z import random 2023-01-11T21:38:06.1534311Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1534438Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1534443Z 2023-01-11T21:38:06.1534735Z aten = torch.ops.aten 2023-01-11T21:38:06.1534920Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1535028Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1535034Z 2023-01-11T21:38:06.1535112Z import triton 2023-01-11T21:38:06.1535226Z import triton.language as tl 2023-01-11T21:38:06.1535368Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1535518Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1535528Z 2023-01-11T21:38:06.1535533Z 2023-01-11T21:38:06.1535683Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1535891Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1536020Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1536132Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1536242Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:06.1536347Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1536416Z { 2023-01-11T21:38:06.1536514Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1536582Z { 2023-01-11T21:38:06.1536669Z #pragma omp for 2023-01-11T21:38:06.1536761Z for(long i0=0; i0<41616; i0+=1) 2023-01-11T21:38:06.1536830Z { 2023-01-11T21:38:06.1536919Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.1537068Z { 2023-01-11T21:38:06.1537293Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i1) + (26*i0)); 2023-01-11T21:38:06.1537444Z auto tmp8 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i1) + (26*i0)); 2023-01-11T21:38:06.1537586Z auto tmp15 = at::vec::Vectorized<float>::loadu(in_ptr2 + 8*i1); 2023-01-11T21:38:06.1537821Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(-1.061519070296458e-11)); 2023-01-11T21:38:06.1537916Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1538142Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(-1.988366587925593e-08)); 2023-01-11T21:38:06.1538234Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1538325Z auto tmp5 = tmp0 * tmp4; 2023-01-11T21:38:06.1538543Z auto tmp6 = at::vec::Vectorized<float>(static_cast<float>(-3.087032500374211e-07)); 2023-01-11T21:38:06.1538638Z auto tmp7 = tmp5 + tmp6; 2023-01-11T21:38:06.1538862Z auto tmp9 = at::vec::Vectorized<float>(static_cast<float>(1.55093272922008e-10)); 2023-01-11T21:38:06.1538957Z auto tmp10 = tmp8 * tmp9; 2023-01-11T21:38:06.1539052Z auto tmp11 = tmp7 + tmp10; 2023-01-11T21:38:06.1539160Z auto tmp12 = tmp11.reciprocal(); 2023-01-11T21:38:06.1539307Z auto tmp13 = at::vec::Vectorized<float>(static_cast<float>(1.0)); 2023-01-11T21:38:06.1539404Z auto tmp14 = tmp12 * tmp13; 2023-01-11T21:38:06.1539492Z auto tmp16 = tmp11 * tmp15; 2023-01-11T21:38:06.1539585Z auto tmp17 = tmp14 + tmp16;
2023-01-11T21:38:06.1539695Z tmp17.store(out_ptr0 + (8*i1) + (26*i0)); 2023-01-11T21:38:06.1539763Z } 2023-01-11T21:38:06.1539863Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.1539953Z for(long i1=24; i1<26; i1+=1) 2023-01-11T21:38:06.1540020Z { 2023-01-11T21:38:06.1540118Z auto tmp0 = in_ptr0[i1 + (26*i0)]; 2023-01-11T21:38:06.1540217Z auto tmp8 = in_ptr1[i1 + (26*i0)]; 2023-01-11T21:38:06.1540310Z auto tmp15 = in_ptr2[i1]; 2023-01-11T21:38:06.1540494Z auto tmp1 = static_cast<float>(-1.061519070296458e-11); 2023-01-11T21:38:06.1540636Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1540820Z auto tmp3 = static_cast<float>(-1.988366587925593e-08); 2023-01-11T21:38:06.1540914Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1540998Z auto tmp5 = tmp0 * tmp4; 2023-01-11T21:38:06.1541182Z auto tmp6 = static_cast<float>(-3.087032500374211e-07); 2023-01-11T21:38:06.1541275Z auto tmp7 = tmp5 + tmp6; 2023-01-11T21:38:06.1541452Z auto tmp9 = static_cast<float>(1.55093272922008e-10); 2023-01-11T21:38:06.1541547Z auto tmp10 = tmp8 * tmp9; 2023-01-11T21:38:06.1541640Z auto tmp11 = tmp7 + tmp10; 2023-01-11T21:38:06.1541735Z auto tmp12 = 1 / tmp11; 2023-01-11T21:38:06.1541837Z auto tmp13 = static_cast<float>(1.0); 2023-01-11T21:38:06.1541934Z auto tmp14 = tmp12 * tmp13; 2023-01-11T21:38:06.1542029Z auto tmp16 = tmp11 * tmp15; 2023-01-11T21:38:06.1542125Z auto tmp17 = tmp14 + tmp16; 2023-01-11T21:38:06.1542223Z out_ptr0[i1 + (26*i0)] = tmp17; 2023-01-11T21:38:06.1542291Z } 2023-01-11T21:38:06.1542357Z } 2023-01-11T21:38:06.1542416Z } 2023-01-11T21:38:06.1542480Z } 2023-01-11T21:38:06.1542564Z ''') 2023-01-11T21:38:06.1542570Z 2023-01-11T21:38:06.1542574Z 2023-01-11T21:38:06.1542668Z async_compile.wait(globals()) 2023-01-11T21:38:06.1542750Z del async_compile 2023-01-11T21:38:06.1542755Z 2023-01-11T21:38:06.1542831Z def call(args): 2023-01-11T21:38:06.1542918Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.1542995Z args.clear() 2023-01-11T21:38:06.1543202Z buf0 = empty_strided((204, 204, 26), (5304, 26, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1543430Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1543504Z del arg0_1 2023-01-11T21:38:06.1543577Z del arg1_1 2023-01-11T21:38:06.1543652Z del arg2_1 2023-01-11T21:38:06.1543728Z return (buf0, ) 2023-01-11T21:38:06.1543733Z 2023-01-11T21:38:06.1543738Z 2023-01-11T21:38:06.1543816Z if __name__ == "__main__": 2023-01-11T21:38:06.1543929Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1544055Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1544272Z arg0_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1544487Z arg1_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1544679Z arg2_1 = rand_strided((26, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1544811Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.1544816Z 2023-01-11T21:38:06.1544889Z ok (1.808s) 2023-01-11T21:38:06.1545338Z test_views1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1545471Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1545729Z [2023-01-11 21:33:10,579] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 324 2023-01-11T21:38:06.1545984Z [2023-01-11 21:33:12,267] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 324 2023-01-11T21:38:06.1546406Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1546563Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1546819Z [2023-01-11 21:33:12,286] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 325 2023-01-11T21:38:06.1547084Z [2023-01-11 21:33:13,974] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 325 2023-01-11T21:38:06.1547498Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1547632Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1547886Z [2023-01-11 21:33:13,992] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 326 2023-01-11T21:38:06.1548150Z [2023-01-11 21:33:15,744] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 326 2023-01-11T21:38:06.1548565Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1548696Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1548950Z [2023-01-11 21:33:15,764] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 327 2023-01-11T21:38:06.1548983Z 2023-01-11T21:38:06.1549077Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1549157Z import torch 2023-01-11T21:38:06.1549235Z import random 2023-01-11T21:38:06.1549358Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1549490Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1549495Z 2023-01-11T21:38:06.1549580Z aten = torch.ops.aten 2023-01-11T21:38:06.1549721Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1549813Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1549826Z 2023-01-11T21:38:06.1549896Z import triton 2023-01-11T21:38:06.1549991Z import triton.language as tl 2023-01-11T21:38:06.1550122Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1550264Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1550270Z 2023-01-11T21:38:06.1550279Z 2023-01-11T21:38:06.1550418Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1550629Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1550757Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1550864Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1550971Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1551040Z { 2023-01-11T21:38:06.1551145Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1551216Z { 2023-01-11T21:38:06.1551301Z #pragma omp for 2023-01-11T21:38:06.1551393Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.1551456Z { 2023-01-11T21:38:06.1551599Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1551739Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1551832Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1551933Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1552002Z } 2023-01-11T21:38:06.1552107Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1552190Z for(long i0=32; i0<35; i0+=1) 2023-01-11T21:38:06.1552260Z { 2023-01-11T21:38:06.1552352Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1552469Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.1552558Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1552645Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1552711Z } 2023-01-11T21:38:06.1552770Z } 2023-01-11T21:38:06.1552836Z } 2023-01-11T21:38:06.1552922Z ''') 2023-01-11T21:38:06.1552928Z 2023-01-11T21:38:06.1552932Z 2023-01-11T21:38:06.1553025Z async_compile.wait(globals()) 2023-01-11T21:38:06.1553102Z del async_compile 2023-01-11T21:38:06.1553107Z 2023-01-11T21:38:06.1553182Z def call(args): 2023-01-11T21:38:06.1553262Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1553330Z args.clear() 2023-01-11T21:38:06.1553529Z buf0 = empty_strided((5, 7), (7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1553699Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1553772Z del arg0_1 2023-01-11T21:38:06.1553845Z del arg1_1 2023-01-11T21:38:06.1553923Z return (buf0, ) 2023-01-11T21:38:06.1553928Z 2023-01-11T21:38:06.1553932Z 2023-01-11T21:38:06.1554012Z if __name__ == "__main__":
2023-01-11T21:38:06.1554130Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1554250Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1554444Z arg0_1 = rand_strided((35, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1554639Z arg1_1 = rand_strided((5, 7), (7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1554759Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1554765Z 2023-01-11T21:38:06.1554769Z 2023-01-11T21:38:06.1554906Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1554982Z import torch 2023-01-11T21:38:06.1555055Z import random 2023-01-11T21:38:06.1555173Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1555289Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1555294Z 2023-01-11T21:38:06.1555379Z aten = torch.ops.aten 2023-01-11T21:38:06.1555517Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1555613Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1555618Z 2023-01-11T21:38:06.1555694Z import triton 2023-01-11T21:38:06.1555785Z import triton.language as tl 2023-01-11T21:38:06.1555911Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1556051Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1556057Z 2023-01-11T21:38:06.1556061Z 2023-01-11T21:38:06.1556194Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1556401Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1556528Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1556637Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1556741Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1556809Z { 2023-01-11T21:38:06.1556914Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1556972Z { 2023-01-11T21:38:06.1557054Z #pragma omp for 2023-01-11T21:38:06.1557139Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.1557206Z { 2023-01-11T21:38:06.1557344Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1557479Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1557612Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.1557701Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1557784Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1557881Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1557948Z } 2023-01-11T21:38:06.1558049Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1558136Z for(long i0=32; i0<35; i0+=1) 2023-01-11T21:38:06.1558203Z { 2023-01-11T21:38:06.1558319Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1558402Z auto tmp3 = in_ptr1[i0]; 2023-01-11T21:38:06.1558505Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:06.1558594Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1558681Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1558766Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1558833Z } 2023-01-11T21:38:06.1558891Z } 2023-01-11T21:38:06.1558955Z } 2023-01-11T21:38:06.1559040Z ''') 2023-01-11T21:38:06.1559045Z 2023-01-11T21:38:06.1559049Z 2023-01-11T21:38:06.1559143Z async_compile.wait(globals()) 2023-01-11T21:38:06.1559219Z del async_compile 2023-01-11T21:38:06.1559228Z 2023-01-11T21:38:06.1559302Z def call(args): 2023-01-11T21:38:06.1559381Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1559458Z args.clear() 2023-01-11T21:38:06.1559645Z buf0 = empty_strided((5, 7), (7, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1559815Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1559886Z del arg0_1 2023-01-11T21:38:06.1559960Z del arg1_1 2023-01-11T21:38:06.1560035Z return (buf0, ) 2023-01-11T21:38:06.1560040Z 2023-01-11T21:38:06.1560044Z 2023-01-11T21:38:06.1560128Z if __name__ == "__main__": 2023-01-11T21:38:06.1560247Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1560366Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1560559Z arg0_1 = rand_strided((35, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1560759Z arg1_1 = rand_strided((5, 7), (7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1560932Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1560937Z 2023-01-11T21:38:06.1560942Z 2023-01-11T21:38:06.1561039Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1561114Z import torch 2023-01-11T21:38:06.1561187Z import random 2023-01-11T21:38:06.1561308Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1561424Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1561437Z 2023-01-11T21:38:06.1561512Z aten = torch.ops.aten 2023-01-11T21:38:06.1561648Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1561742Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1561747Z 2023-01-11T21:38:06.1561820Z import triton 2023-01-11T21:38:06.1561914Z import triton.language as tl 2023-01-11T21:38:06.1562037Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1562183Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1562192Z 2023-01-11T21:38:06.1562196Z 2023-01-11T21:38:06.1562324Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1562530Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1562652Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1562769Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1562873Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1562940Z { 2023-01-11T21:38:06.1563043Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1563112Z { 2023-01-11T21:38:06.1563187Z #pragma omp for 2023-01-11T21:38:06.1563275Z for(long i0=0; i0<630; i0+=1) 2023-01-11T21:38:06.1563343Z { 2023-01-11T21:38:06.1563482Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1563617Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1563710Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1563804Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1563864Z } 2023-01-11T21:38:06.1563963Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1564054Z for(long i0=5040; i0<5040; i0+=1) 2023-01-11T21:38:06.1564121Z { 2023-01-11T21:38:06.1564246Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1564338Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.1564426Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1564504Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1564570Z } 2023-01-11T21:38:06.1564636Z } 2023-01-11T21:38:06.1564700Z } 2023-01-11T21:38:06.1564785Z ''') 2023-01-11T21:38:06.1564791Z 2023-01-11T21:38:06.1564796Z 2023-01-11T21:38:06.1564888Z async_compile.wait(globals()) 2023-01-11T21:38:06.1564969Z del async_compile 2023-01-11T21:38:06.1564974Z 2023-01-11T21:38:06.1565042Z def call(args): 2023-01-11T21:38:06.1565125Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.1565223Z args.clear() 2023-01-11T21:38:06.1565481Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1565649Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1565722Z del arg0_1 2023-01-11T21:38:06.1565796Z del arg1_1 2023-01-11T21:38:06.1565865Z return (buf0, ) 2023-01-11T21:38:06.1565870Z 2023-01-11T21:38:06.1565883Z 2023-01-11T21:38:06.1565956Z if __name__ == "__main__": 2023-01-11T21:38:06.1566076Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1566202Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1566398Z arg0_1 = rand_strided((5040, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1566632Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1566753Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1566786Z 2023-01-11T21:38:06.1567053Z [2023-01-11 21:33:17,488] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 327 2023-01-11T21:38:06.1567470Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1567604Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1567852Z [2023-01-11 21:33:17,506] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 328 2023-01-11T21:38:06.1568113Z [2023-01-11 21:33:17,516] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 328 2023-01-11T21:38:06.1568523Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1568660Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1568914Z [2023-01-11 21:33:17,536] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 329 2023-01-11T21:38:06.1569176Z [2023-01-11 21:33:17,548] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 329 2023-01-11T21:38:06.1569592Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1569725Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1569977Z [2023-01-11 21:33:17,564] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 330 2023-01-11T21:38:06.1569983Z 2023-01-11T21:38:06.1570114Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1570190Z import torch 2023-01-11T21:38:06.1570258Z import random 2023-01-11T21:38:06.1570378Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1570502Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1570507Z 2023-01-11T21:38:06.1570589Z aten = torch.ops.aten 2023-01-11T21:38:06.1570724Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1570822Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1570827Z 2023-01-11T21:38:06.1570902Z import triton 2023-01-11T21:38:06.1570987Z import triton.language as tl 2023-01-11T21:38:06.1571116Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1571257Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1571262Z 2023-01-11T21:38:06.1571267Z 2023-01-11T21:38:06.1571404Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1571612Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1571735Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1571846Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1571950Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1572009Z { 2023-01-11T21:38:06.1572112Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1572181Z { 2023-01-11T21:38:06.1572266Z #pragma omp for 2023-01-11T21:38:06.1572354Z for(long i0=0; i0<630; i0+=1) 2023-01-11T21:38:06.1572423Z { 2023-01-11T21:38:06.1572560Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1572718Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1572853Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.1572946Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1573037Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1573133Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1573201Z } 2023-01-11T21:38:06.1573301Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1573385Z for(long i0=5040; i0<5040; i0+=1) 2023-01-11T21:38:06.1573453Z { 2023-01-11T21:38:06.1573543Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1573632Z auto tmp3 = in_ptr1[i0]; 2023-01-11T21:38:06.1573740Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:06.1573830Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1573918Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1573998Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1574065Z } 2023-01-11T21:38:06.1574131Z } 2023-01-11T21:38:06.1574198Z } 2023-01-11T21:38:06.1574282Z ''') 2023-01-11T21:38:06.1574288Z 2023-01-11T21:38:06.1574292Z 2023-01-11T21:38:06.1574385Z async_compile.wait(globals()) 2023-01-11T21:38:06.1574468Z del async_compile 2023-01-11T21:38:06.1574473Z 2023-01-11T21:38:06.1574828Z def call(args): 2023-01-11T21:38:06.1574929Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1575005Z args.clear() 2023-01-11T21:38:06.1575249Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1575418Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()),
c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1575490Z del arg0_1 2023-01-11T21:38:06.1575564Z del arg1_1 2023-01-11T21:38:06.1575640Z return (buf0, ) 2023-01-11T21:38:06.1575646Z 2023-01-11T21:38:06.1575661Z 2023-01-11T21:38:06.1575734Z if __name__ == "__main__": 2023-01-11T21:38:06.1575856Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1575987Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1576186Z arg0_1 = rand_strided((5040, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1576489Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1576611Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1576616Z 2023-01-11T21:38:06.1576621Z 2023-01-11T21:38:06.1576718Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1576792Z import torch 2023-01-11T21:38:06.1576859Z import random 2023-01-11T21:38:06.1576977Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1577101Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1577106Z 2023-01-11T21:38:06.1577269Z aten = torch.ops.aten 2023-01-11T21:38:06.1577425Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1577523Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1577529Z 2023-01-11T21:38:06.1577602Z import triton 2023-01-11T21:38:06.1577694Z import triton.language as tl 2023-01-11T21:38:06.1577811Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1577953Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1577959Z 2023-01-11T21:38:06.1577963Z 2023-01-11T21:38:06.1578100Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1578306Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1578430Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1578540Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1578645Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1578703Z { 2023-01-11T21:38:06.1578805Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1578915Z { 2023-01-11T21:38:06.1579000Z #pragma omp for 2023-01-11T21:38:06.1579091Z for(long i0=0; i0<630; i0+=1) 2023-01-11T21:38:06.1579162Z { 2023-01-11T21:38:06.1579305Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1579448Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1579534Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1579631Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1579701Z } 2023-01-11T21:38:06.1579802Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1579896Z for(long i0=5040; i0<5040; i0+=1) 2023-01-11T21:38:06.1579966Z { 2023-01-11T21:38:06.1580049Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1580140Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.1580229Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1580316Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1580388Z } 2023-01-11T21:38:06.1580456Z } 2023-01-11T21:38:06.1580524Z } 2023-01-11T21:38:06.1580606Z ''') 2023-01-11T21:38:06.1580611Z 2023-01-11T21:38:06.1580623Z 2023-01-11T21:38:06.1580710Z async_compile.wait(globals()) 2023-01-11T21:38:06.1580792Z del async_compile 2023-01-11T21:38:06.1580797Z 2023-01-11T21:38:06.1580878Z def call(args): 2023-01-11T21:38:06.1580961Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1581039Z args.clear() 2023-01-11T21:38:06.1581272Z buf0 =
empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1581441Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1581510Z del arg0_1 2023-01-11T21:38:06.1581585Z del arg1_1 2023-01-11T21:38:06.1581662Z return (buf0, ) 2023-01-11T21:38:06.1581668Z 2023-01-11T21:38:06.1581672Z 2023-01-11T21:38:06.1581755Z if __name__ == "__main__": 2023-01-11T21:38:06.1581879Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1582008Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1582229Z arg0_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1582493Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1582610Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1582615Z 2023-01-11T21:38:06.1582626Z 2023-01-11T21:38:06.1582719Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1582796Z import torch 2023-01-11T21:38:06.1582874Z import random 2023-01-11T21:38:06.1582995Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1583120Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1583125Z 2023-01-11T21:38:06.1583210Z aten = torch.ops.aten 2023-01-11T21:38:06.1583348Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1583442Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1583447Z 2023-01-11T21:38:06.1583524Z import triton 2023-01-11T21:38:06.1583619Z import triton.language as tl 2023-01-11T21:38:06.1583747Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1583891Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1583897Z 2023-01-11T21:38:06.1583901Z 2023-01-11T21:38:06.1584038Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1584248Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1584372Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1584477Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1584582Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1584650Z { 2023-01-11T21:38:06.1584753Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1584854Z { 2023-01-11T21:38:06.1584939Z #pragma omp for 2023-01-11T21:38:06.1585028Z for(long i0=0; i0<630; i0+=1) 2023-01-11T21:38:06.1585091Z { 2023-01-11T21:38:06.1585242Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1585404Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1585557Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.1585648Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1585738Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1585839Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1585902Z } 2023-01-11T21:38:06.1586004Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1586098Z for(long i0=5040; i0<5040; i0+=1) 2023-01-11T21:38:06.1586168Z { 2023-01-11T21:38:06.1586261Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1586352Z auto tmp3 = in_ptr1[i0]; 2023-01-11T21:38:06.1586463Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:06.1586546Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1586634Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1586723Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1586792Z }
2023-01-11T21:38:06.1586861Z } 2023-01-11T21:38:06.1586931Z } 2023-01-11T21:38:06.1587019Z ''') 2023-01-11T21:38:06.1587025Z 2023-01-11T21:38:06.1587029Z 2023-01-11T21:38:06.1587118Z async_compile.wait(globals()) 2023-01-11T21:38:06.1587198Z del async_compile 2023-01-11T21:38:06.1587203Z 2023-01-11T21:38:06.1587280Z def call(args): 2023-01-11T21:38:06.1587361Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1587441Z args.clear() 2023-01-11T21:38:06.1587677Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1587845Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1587916Z del arg0_1 2023-01-11T21:38:06.1587990Z del arg1_1 2023-01-11T21:38:06.1588066Z return (buf0, ) 2023-01-11T21:38:06.1588072Z 2023-01-11T21:38:06.1588076Z 2023-01-11T21:38:06.1588160Z if __name__ == "__main__": 2023-01-11T21:38:06.1588282Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1588449Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1588668Z arg0_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1588901Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1589013Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1589026Z 2023-01-11T21:38:06.1589284Z [2023-01-11 21:33:19,259] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 330 2023-01-11T21:38:06.1589699Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1589836Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1590092Z [2023-01-11 21:33:19,290] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 331 2023-01-11T21:38:06.1590355Z [2023-01-11 21:33:21,054] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 331 2023-01-11T21:38:06.1590769Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1590931Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1591186Z [2023-01-11 21:33:21,075] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 332 2023-01-11T21:38:06.1591450Z [2023-01-11 21:33:22,802] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 332 2023-01-11T21:38:06.1591861Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1591993Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1592248Z [2023-01-11 21:33:22,820] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 333 2023-01-11T21:38:06.1592257Z 2023-01-11T21:38:06.1592348Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1592423Z import torch 2023-01-11T21:38:06.1592498Z import random 2023-01-11T21:38:06.1592619Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1592746Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1592751Z 2023-01-11T21:38:06.1592835Z aten = torch.ops.aten 2023-01-11T21:38:06.1592971Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1593060Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1593072Z 2023-01-11T21:38:06.1593139Z import triton 2023-01-11T21:38:06.1593232Z import triton.language as tl 2023-01-11T21:38:06.1593356Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1593497Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1593502Z 2023-01-11T21:38:06.1593507Z 2023-01-11T21:38:06.1593642Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1593850Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1593973Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1594075Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1594207Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1594275Z { 2023-01-11T21:38:06.1594381Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1594448Z { 2023-01-11T21:38:06.1594532Z #pragma omp for 2023-01-11T21:38:06.1594623Z for(long i0=0; i0<125; i0+=1) 2023-01-11T21:38:06.1594684Z { 2023-01-11T21:38:06.1594822Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1594957Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1595048Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1595143Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1595217Z } 2023-01-11T21:38:06.1595318Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1595401Z for(long i0=1000; i0<1000; i0+=1) 2023-01-11T21:38:06.1595471Z { 2023-01-11T21:38:06.1595561Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1595648Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.1595739Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1595826Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1595895Z } 2023-01-11T21:38:06.1595954Z } 2023-01-11T21:38:06.1596018Z } 2023-01-11T21:38:06.1596103Z ''') 2023-01-11T21:38:06.1596108Z 2023-01-11T21:38:06.1596113Z 2023-01-11T21:38:06.1596205Z async_compile.wait(globals()) 2023-01-11T21:38:06.1596282Z del async_compile 2023-01-11T21:38:06.1596287Z 2023-01-11T21:38:06.1596362Z def call(args): 2023-01-11T21:38:06.1596442Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1596510Z args.clear() 2023-01-11T21:38:06.1596718Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1596925Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1596998Z del arg0_1 2023-01-11T21:38:06.1597070Z del arg1_1 2023-01-11T21:38:06.1597147Z return (buf0, ) 2023-01-11T21:38:06.1597152Z 2023-01-11T21:38:06.1597158Z 2023-01-11T21:38:06.1597241Z if __name__ == "__main__":
2023-01-11T21:38:06.1597360Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1597480Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1597679Z arg0_1 = rand_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1597886Z arg1_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1598007Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1598012Z 2023-01-11T21:38:06.1598017Z 2023-01-11T21:38:06.1598116Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1598195Z import torch 2023-01-11T21:38:06.1598270Z import random 2023-01-11T21:38:06.1598389Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1598505Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1598511Z 2023-01-11T21:38:06.1598594Z aten = torch.ops.aten 2023-01-11T21:38:06.1598732Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1598827Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1598832Z 2023-01-11T21:38:06.1598905Z import triton 2023-01-11T21:38:06.1599000Z import triton.language as tl 2023-01-11T21:38:06.1599123Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1599264Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1599269Z 2023-01-11T21:38:06.1599274Z 2023-01-11T21:38:06.1599403Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1599609Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1599733Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1599846Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1599952Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1600016Z { 2023-01-11T21:38:06.1600145Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1600205Z { 2023-01-11T21:38:06.1600287Z #pragma omp for 2023-01-11T21:38:06.1600375Z for(long i0=0; i0<125; i0+=1) 2023-01-11T21:38:06.1600443Z { 2023-01-11T21:38:06.1600580Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1600715Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1600850Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.1600942Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1601023Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1601123Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1601190Z } 2023-01-11T21:38:06.1601288Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1601380Z for(long i0=1000; i0<1000; i0+=1) 2023-01-11T21:38:06.1601449Z { 2023-01-11T21:38:06.1601536Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1601620Z auto tmp3 = in_ptr1[i0]; 2023-01-11T21:38:06.1601726Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:06.1601813Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1601902Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1601986Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1602052Z } 2023-01-11T21:38:06.1602119Z } 2023-01-11T21:38:06.1602177Z } 2023-01-11T21:38:06.1602262Z ''') 2023-01-11T21:38:06.1602267Z 2023-01-11T21:38:06.1602272Z 2023-01-11T21:38:06.1602367Z async_compile.wait(globals()) 2023-01-11T21:38:06.1602446Z del async_compile 2023-01-11T21:38:06.1602493Z 2023-01-11T21:38:06.1602568Z def call(args): 2023-01-11T21:38:06.1602648Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1602724Z args.clear() 2023-01-11T21:38:06.1602926Z buf0 = empty_strided((10, 5, 20), (100, 20, 1),
device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1603094Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1603166Z del arg0_1 2023-01-11T21:38:06.1603237Z del arg1_1 2023-01-11T21:38:06.1603314Z return (buf0, ) 2023-01-11T21:38:06.1603319Z 2023-01-11T21:38:06.1603323Z 2023-01-11T21:38:06.1603408Z if __name__ == "__main__": 2023-01-11T21:38:06.1603527Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1603655Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1603847Z arg0_1 = rand_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1604054Z arg1_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1604177Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1604183Z 2023-01-11T21:38:06.1604187Z 2023-01-11T21:38:06.1604287Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1604360Z import torch 2023-01-11T21:38:06.1604434Z import random 2023-01-11T21:38:06.1604557Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1604674Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1604686Z 2023-01-11T21:38:06.1604761Z aten = torch.ops.aten 2023-01-11T21:38:06.1604896Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1604992Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1604997Z 2023-01-11T21:38:06.1605072Z import triton 2023-01-11T21:38:06.1605164Z import triton.language as tl 2023-01-11T21:38:06.1605288Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1605429Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1605436Z 2023-01-11T21:38:06.1605440Z 2023-01-11T21:38:06.1605577Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1605776Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1605902Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1606046Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1606156Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1606224Z { 2023-01-11T21:38:06.1606328Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1606393Z { 2023-01-11T21:38:06.1606467Z #pragma omp for 2023-01-11T21:38:06.1606553Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.1606622Z { 2023-01-11T21:38:06.1606760Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1606896Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1606991Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1607088Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1607148Z } 2023-01-11T21:38:06.1607252Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1607338Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:06.1607404Z { 2023-01-11T21:38:06.1607493Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1607581Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.1607670Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1607748Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1607815Z } 2023-01-11T21:38:06.1607882Z } 2023-01-11T21:38:06.1607945Z } 2023-01-11T21:38:06.1608029Z ''') 2023-01-11T21:38:06.1608035Z 2023-01-11T21:38:06.1608039Z 2023-01-11T21:38:06.1608133Z async_compile.wait(globals()) 2023-01-11T21:38:06.1608212Z del async_compile 2023-01-11T21:38:06.1608217Z 2023-01-11T21:38:06.1608284Z def call(args): 2023-01-11T21:38:06.1608367Z
arg0_1, arg1_1 = args 2023-01-11T21:38:06.1608472Z args.clear() 2023-01-11T21:38:06.1608664Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1608832Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1608904Z del arg0_1 2023-01-11T21:38:06.1608977Z del arg1_1 2023-01-11T21:38:06.1609046Z return (buf0, ) 2023-01-11T21:38:06.1609051Z 2023-01-11T21:38:06.1609062Z 2023-01-11T21:38:06.1609136Z if __name__ == "__main__": 2023-01-11T21:38:06.1609257Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1609383Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1609592Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1609784Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1609903Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1609911Z 2023-01-11T21:38:06.1610174Z [2023-01-11 21:33:24,502] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 333 2023-01-11T21:38:06.1610587Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1610721Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1610969Z [2023-01-11 21:33:24,517] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 334 2023-01-11T21:38:06.1611232Z [2023-01-11 21:33:24,526] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 334 2023-01-11T21:38:06.1611643Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1611802Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1612058Z [2023-01-11 21:33:24,544] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 335 2023-01-11T21:38:06.1612321Z [2023-01-11 21:33:24,566] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 335 2023-01-11T21:38:06.1612730Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1612864Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1613119Z [2023-01-11 21:33:24,581] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 336 2023-01-11T21:38:06.1613124Z 2023-01-11T21:38:06.1613226Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1613304Z import torch 2023-01-11T21:38:06.1613371Z import random 2023-01-11T21:38:06.1613490Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1613612Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1613617Z 2023-01-11T21:38:06.1613699Z aten = torch.ops.aten 2023-01-11T21:38:06.1613835Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1613934Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1613939Z 2023-01-11T21:38:06.1614013Z import triton 2023-01-11T21:38:06.1614098Z import triton.language as tl 2023-01-11T21:38:06.1614224Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1614390Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1614396Z 2023-01-11T21:38:06.1614400Z 2023-01-11T21:38:06.1614755Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1615002Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1615126Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1615235Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1615338Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1615396Z { 2023-01-11T21:38:06.1615498Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1615564Z { 2023-01-11T21:38:06.1615645Z #pragma omp for 2023-01-11T21:38:06.1615732Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.1615799Z { 2023-01-11T21:38:06.1615939Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1616072Z auto tmp3 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1616205Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1616297Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1616384Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1616483Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1616550Z } 2023-01-11T21:38:06.1616647Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1616726Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:06.1616792Z { 2023-01-11T21:38:06.1616880Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1616966Z auto tmp3 = in_ptr1[i0]; 2023-01-11T21:38:06.1617069Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1617289Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1617385Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1617463Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1617532Z } 2023-01-11T21:38:06.1617598Z } 2023-01-11T21:38:06.1617663Z } 2023-01-11T21:38:06.1617756Z ''') 2023-01-11T21:38:06.1617761Z 2023-01-11T21:38:06.1617766Z 2023-01-11T21:38:06.1617858Z async_compile.wait(globals()) 2023-01-11T21:38:06.1617935Z del async_compile 2023-01-11T21:38:06.1617941Z 2023-01-11T21:38:06.1618070Z def call(args): 2023-01-11T21:38:06.1618153Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1618229Z args.clear() 2023-01-11T21:38:06.1618421Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1618587Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 
2023-01-11T21:38:06.1618660Z     del arg0_1
2023-01-11T21:38:06.1618731Z     del arg1_1
2023-01-11T21:38:06.1618799Z     return (buf0, )
2023-01-11T21:38:06.1618811Z 
2023-01-11T21:38:06.1618815Z 
2023-01-11T21:38:06.1618888Z if __name__ == "__main__":
2023-01-11T21:38:06.1619006Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1619136Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1619346Z     arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1619537Z     arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1619659Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1619664Z 
2023-01-11T21:38:06.1619669Z 
2023-01-11T21:38:06.1619766Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1619840Z import torch
2023-01-11T21:38:06.1619906Z import random
2023-01-11T21:38:06.1620026Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1620149Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1620154Z 
2023-01-11T21:38:06.1620236Z aten = torch.ops.aten
2023-01-11T21:38:06.1620373Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1620468Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1620514Z 
2023-01-11T21:38:06.1620594Z import triton
2023-01-11T21:38:06.1620682Z import triton.language as tl
2023-01-11T21:38:06.1620810Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1620951Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1620956Z 
2023-01-11T21:38:06.1620963Z 
2023-01-11T21:38:06.1621102Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1621311Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1621437Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1621550Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1621657Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1621718Z {
2023-01-11T21:38:06.1621822Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1621891Z     {
2023-01-11T21:38:06.1621975Z         #pragma omp for
2023-01-11T21:38:06.1622070Z         for(long i0=0; i0<125; i0+=1)
2023-01-11T21:38:06.1622138Z         {
2023-01-11T21:38:06.1622280Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1622411Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1622504Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1622605Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1622675Z         }
2023-01-11T21:38:06.1622778Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1622872Z         for(long i0=1000; i0<1000; i0+=1)
2023-01-11T21:38:06.1622941Z         {
2023-01-11T21:38:06.1623025Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1623115Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.1623207Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1623295Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.1623365Z         }
2023-01-11T21:38:06.1623433Z     }
2023-01-11T21:38:06.1623499Z }
2023-01-11T21:38:06.1623582Z ''')
2023-01-11T21:38:06.1623587Z 
2023-01-11T21:38:06.1623591Z 
2023-01-11T21:38:06.1623687Z async_compile.wait(globals())
2023-01-11T21:38:06.1623767Z del async_compile
2023-01-11T21:38:06.1623772Z 
2023-01-11T21:38:06.1623850Z def call(args):
2023-01-11T21:38:06.1623931Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1624045Z     args.clear()
2023-01-11T21:38:06.1624252Z     buf0 = empty_strided((10, 100), (100, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1624421Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1624490Z     del arg0_1
2023-01-11T21:38:06.1624563Z     del arg1_1
2023-01-11T21:38:06.1624641Z     return (buf0, )
2023-01-11T21:38:06.1624647Z 
2023-01-11T21:38:06.1624651Z 
2023-01-11T21:38:06.1624734Z if __name__ == "__main__":
2023-01-11T21:38:06.1624853Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1624982Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1625227Z     arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1625457Z     arg1_1 = rand_strided((10, 100), (100, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1625582Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1625588Z 
2023-01-11T21:38:06.1625594Z 
2023-01-11T21:38:06.1625694Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1625770Z import torch
2023-01-11T21:38:06.1625847Z import random
2023-01-11T21:38:06.1625968Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1626095Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1626100Z 
2023-01-11T21:38:06.1626185Z aten = torch.ops.aten
2023-01-11T21:38:06.1626316Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1626414Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1626419Z 
2023-01-11T21:38:06.1626494Z import triton
2023-01-11T21:38:06.1626633Z import triton.language as tl
2023-01-11T21:38:06.1626756Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1626894Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1626899Z 
2023-01-11T21:38:06.1626903Z 
2023-01-11T21:38:06.1627038Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1627243Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1627359Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1627471Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1627578Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1627643Z {
2023-01-11T21:38:06.1627745Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1627815Z     {
2023-01-11T21:38:06.1627896Z         #pragma omp for
2023-01-11T21:38:06.1627977Z         for(long i0=0; i0<125; i0+=1)
2023-01-11T21:38:06.1628044Z         {
2023-01-11T21:38:06.1628185Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1628320Z             auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1628457Z             auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.1628551Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1628639Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1628737Z             tmp4.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1628797Z         }
2023-01-11T21:38:06.1628895Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1628986Z         for(long i0=1000; i0<1000; i0+=1)
2023-01-11T21:38:06.1629053Z         {
2023-01-11T21:38:06.1629143Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1629232Z             auto tmp3 = in_ptr1[i0];
2023-01-11T21:38:06.1629335Z             auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.1629416Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1629506Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1629596Z             out_ptr0[i0] = tmp4;
2023-01-11T21:38:06.1629662Z         }
2023-01-11T21:38:06.1629727Z     }
2023-01-11T21:38:06.1629792Z }
2023-01-11T21:38:06.1629870Z ''')
2023-01-11T21:38:06.1629876Z 
2023-01-11T21:38:06.1629889Z 
2023-01-11T21:38:06.1629974Z async_compile.wait(globals())
2023-01-11T21:38:06.1630084Z del async_compile
2023-01-11T21:38:06.1630090Z 
2023-01-11T21:38:06.1630166Z def call(args):
2023-01-11T21:38:06.1630249Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1630324Z     args.clear()
2023-01-11T21:38:06.1630527Z     buf0 = empty_strided((10, 100), (100, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1630695Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1630760Z     del arg0_1
2023-01-11T21:38:06.1630831Z     del arg1_1
2023-01-11T21:38:06.1630906Z     return (buf0, )
2023-01-11T21:38:06.1630911Z 
2023-01-11T21:38:06.1630918Z 
2023-01-11T21:38:06.1630999Z if __name__ == "__main__":
2023-01-11T21:38:06.1631116Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1631242Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1631468Z     arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1631672Z     arg1_1 = rand_strided((10, 100), (100, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1631786Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1631791Z 
2023-01-11T21:38:06.1632055Z [2023-01-11 21:33:26,249] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 336
2023-01-11T21:38:06.1632467Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1632632Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1632889Z [2023-01-11 21:33:26,272] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 337
2023-01-11T21:38:06.1633154Z [2023-01-11 21:33:27,984] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 337
2023-01-11T21:38:06.1633564Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1633693Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1633947Z [2023-01-11 21:33:28,000] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 338
2023-01-11T21:38:06.1634210Z [2023-01-11 21:33:28,009] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 338
2023-01-11T21:38:06.1634624Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1634754Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1635000Z [2023-01-11 21:33:28,025] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 339
2023-01-11T21:38:06.1635015Z 
2023-01-11T21:38:06.1635106Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1635181Z import torch
2023-01-11T21:38:06.1635256Z import random
2023-01-11T21:38:06.1635378Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1635504Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1635509Z 
2023-01-11T21:38:06.1635592Z aten = torch.ops.aten
2023-01-11T21:38:06.1635729Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1635845Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1635851Z 
2023-01-11T21:38:06.1635929Z import triton
2023-01-11T21:38:06.1636021Z import triton.language as tl
2023-01-11T21:38:06.1636146Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1636288Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1636294Z 
2023-01-11T21:38:06.1636298Z 
2023-01-11T21:38:06.1636437Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1636644Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1636771Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1636875Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1636979Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1637044Z {
2023-01-11T21:38:06.1637148Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1637215Z     {
2023-01-11T21:38:06.1637299Z         #pragma omp for
2023-01-11T21:38:06.1637387Z         for(long i0=0; i0<2; i0+=1)
2023-01-11T21:38:06.1637448Z         {
2023-01-11T21:38:06.1637587Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1637724Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1637817Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1637912Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1637979Z         }
2023-01-11T21:38:06.1638079Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1638159Z         for(long i0=16; i0<16; i0+=1)
2023-01-11T21:38:06.1638271Z         {
2023-01-11T21:38:06.1638359Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1638447Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.1638535Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1638620Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.1638686Z         }
2023-01-11T21:38:06.1638745Z     }
2023-01-11T21:38:06.1638812Z }
2023-01-11T21:38:06.1638897Z ''')
2023-01-11T21:38:06.1638903Z 
2023-01-11T21:38:06.1638910Z 
2023-01-11T21:38:06.1639001Z async_compile.wait(globals())
2023-01-11T21:38:06.1639078Z del async_compile
2023-01-11T21:38:06.1639083Z 
2023-01-11T21:38:06.1639158Z def call(args):
2023-01-11T21:38:06.1639237Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1639305Z     args.clear()
2023-01-11T21:38:06.1639499Z     buf0 = empty_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1639666Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1639739Z     del arg0_1
2023-01-11T21:38:06.1639814Z     del arg1_1
2023-01-11T21:38:06.1639890Z     return (buf0, )
2023-01-11T21:38:06.1639895Z 
2023-01-11T21:38:06.1639899Z 
2023-01-11T21:38:06.1639981Z if __name__ == "__main__":
2023-01-11T21:38:06.1640100Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1640222Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1640429Z     arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1640623Z     arg1_1 = rand_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1640743Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1640749Z 
2023-01-11T21:38:06.1640753Z 
2023-01-11T21:38:06.1640852Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1640926Z import torch
2023-01-11T21:38:06.1640999Z import random
2023-01-11T21:38:06.1641111Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1641233Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1641241Z 
2023-01-11T21:38:06.1641321Z aten = torch.ops.aten
2023-01-11T21:38:06.1641455Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1641550Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1641555Z 
2023-01-11T21:38:06.1641631Z import triton
2023-01-11T21:38:06.1641754Z import triton.language as tl
2023-01-11T21:38:06.1641882Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1642014Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1642019Z 
2023-01-11T21:38:06.1642033Z 
2023-01-11T21:38:06.1642163Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1642368Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1642492Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1642604Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1642711Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1642777Z {
2023-01-11T21:38:06.1642878Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1642937Z     {
2023-01-11T21:38:06.1643019Z         #pragma omp for
2023-01-11T21:38:06.1643104Z         for(long i0=0; i0<2; i0+=1)
2023-01-11T21:38:06.1643171Z         {
2023-01-11T21:38:06.1643309Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1643447Z             auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1643582Z             auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.1643664Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1643751Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1643848Z             tmp4.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1643915Z         }
2023-01-11T21:38:06.1644015Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1644102Z         for(long i0=16; i0<16; i0+=1)
2023-01-11T21:38:06.1644196Z         {
2023-01-11T21:38:06.1644277Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1644365Z             auto tmp3 = in_ptr1[i0];
2023-01-11T21:38:06.1644469Z             auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.1644556Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1644646Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1644732Z             out_ptr0[i0] = tmp4;
2023-01-11T21:38:06.1644798Z         }
2023-01-11T21:38:06.1644857Z     }
2023-01-11T21:38:06.1644922Z }
2023-01-11T21:38:06.1645005Z ''')
2023-01-11T21:38:06.1645011Z 
2023-01-11T21:38:06.1645015Z 
2023-01-11T21:38:06.1645113Z async_compile.wait(globals())
2023-01-11T21:38:06.1645206Z del async_compile
2023-01-11T21:38:06.1645213Z 
2023-01-11T21:38:06.1645295Z def call(args):
2023-01-11T21:38:06.1645391Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1645468Z     args.clear()
2023-01-11T21:38:06.1645660Z     buf0 = empty_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1645829Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1645901Z     del arg0_1
2023-01-11T21:38:06.1645974Z     del arg1_1
2023-01-11T21:38:06.1646050Z     return (buf0, )
2023-01-11T21:38:06.1646055Z 
2023-01-11T21:38:06.1646060Z 
2023-01-11T21:38:06.1646145Z if __name__ == "__main__":
2023-01-11T21:38:06.1646261Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1646381Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1646590Z     arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1646784Z     arg1_1 = rand_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1646904Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1646909Z 
2023-01-11T21:38:06.1646914Z 
2023-01-11T21:38:06.1647011Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1647089Z import torch
2023-01-11T21:38:06.1647164Z import random
2023-01-11T21:38:06.1647282Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1647398Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1647404Z 
2023-01-11T21:38:06.1647484Z aten = torch.ops.aten
2023-01-11T21:38:06.1647654Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1647751Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1647756Z 
2023-01-11T21:38:06.1647830Z import triton
2023-01-11T21:38:06.1647921Z import triton.language as tl
2023-01-11T21:38:06.1648046Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1648178Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1648190Z 
2023-01-11T21:38:06.1648194Z 
2023-01-11T21:38:06.1648321Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1648527Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1648653Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1648762Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1648865Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1648931Z {
2023-01-11T21:38:06.1649033Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1649094Z     {
2023-01-11T21:38:06.1649177Z         #pragma omp for
2023-01-11T21:38:06.1649264Z         for(long i0=0; i0<4; i0+=1)
2023-01-11T21:38:06.1649330Z         {
2023-01-11T21:38:06.1649467Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1649601Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1649689Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1649777Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1649844Z         }
2023-01-11T21:38:06.1649942Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1650060Z         for(long i0=32; i0<35; i0+=1)
2023-01-11T21:38:06.1650128Z         {
2023-01-11T21:38:06.1650216Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1650305Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.1650385Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1650472Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.1650541Z         }
2023-01-11T21:38:06.1650609Z     }
2023-01-11T21:38:06.1650673Z }
2023-01-11T21:38:06.1650756Z ''')
2023-01-11T21:38:06.1650762Z 
2023-01-11T21:38:06.1650766Z 
2023-01-11T21:38:06.1650858Z async_compile.wait(globals())
2023-01-11T21:38:06.1650928Z del async_compile
2023-01-11T21:38:06.1650933Z 
2023-01-11T21:38:06.1651007Z def call(args):
2023-01-11T21:38:06.1651087Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1651162Z     args.clear()
2023-01-11T21:38:06.1651356Z     buf0 = empty_strided((35, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1651522Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1651597Z     del arg0_1
2023-01-11T21:38:06.1651662Z     del arg1_1
2023-01-11T21:38:06.1651737Z     return (buf0, )
2023-01-11T21:38:06.1651742Z 
2023-01-11T21:38:06.1651747Z 
2023-01-11T21:38:06.1651826Z if __name__ == "__main__":
2023-01-11T21:38:06.1651944Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1652073Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1652269Z     arg0_1 = rand_strided((5, 7), (7, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1652461Z     arg1_1 = rand_strided((35, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1652580Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1652585Z 
2023-01-11T21:38:06.1652848Z [2023-01-11 21:33:28,039] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 339
2023-01-11T21:38:06.1653254Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1653417Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1653674Z [2023-01-11 21:33:28,053] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 340
2023-01-11T21:38:06.1653937Z [2023-01-11 21:33:28,062] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 340
2023-01-11T21:38:06.1654347Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1654624Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1654949Z [2023-01-11 21:33:28,078] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 341
2023-01-11T21:38:06.1655216Z [2023-01-11 21:33:28,105] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 341
2023-01-11T21:38:06.1655629Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1655759Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1656011Z [2023-01-11 21:33:28,120] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 342
2023-01-11T21:38:06.1656071Z 
2023-01-11T21:38:06.1656175Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1656245Z import torch
2023-01-11T21:38:06.1656321Z import random
2023-01-11T21:38:06.1656443Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1656573Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1656581Z 
2023-01-11T21:38:06.1656666Z aten = torch.ops.aten
2023-01-11T21:38:06.1656805Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1656903Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1656908Z 
2023-01-11T21:38:06.1656978Z import triton
2023-01-11T21:38:06.1657073Z import triton.language as tl
2023-01-11T21:38:06.1657304Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1657460Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1657466Z 
2023-01-11T21:38:06.1657471Z 
2023-01-11T21:38:06.1657611Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1657821Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1657945Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1658053Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1658153Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1658219Z {
2023-01-11T21:38:06.1658320Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1658387Z     {
2023-01-11T21:38:06.1658470Z         #pragma omp for
2023-01-11T21:38:06.1658558Z         for(long i0=0; i0<4; i0+=1)
2023-01-11T21:38:06.1658627Z         {
2023-01-11T21:38:06.1658760Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1658895Z             auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1659030Z             auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.1659122Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1659212Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1659309Z             tmp4.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1659376Z         }
2023-01-11T21:38:06.1659475Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1659555Z         for(long i0=32; i0<35; i0+=1)
2023-01-11T21:38:06.1659664Z         {
2023-01-11T21:38:06.1659754Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1659842Z             auto tmp3 = in_ptr1[i0];
2023-01-11T21:38:06.1659950Z             auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.1660038Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1660118Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1660204Z             out_ptr0[i0] = tmp4;
2023-01-11T21:38:06.1660269Z         }
2023-01-11T21:38:06.1660333Z     }
2023-01-11T21:38:06.1660397Z }
2023-01-11T21:38:06.1660483Z ''')
2023-01-11T21:38:06.1660488Z 
2023-01-11T21:38:06.1660493Z 
2023-01-11T21:38:06.1660587Z async_compile.wait(globals())
2023-01-11T21:38:06.1660661Z del async_compile
2023-01-11T21:38:06.1660674Z 
2023-01-11T21:38:06.1660742Z def call(args):
2023-01-11T21:38:06.1660820Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1660896Z     args.clear()
2023-01-11T21:38:06.1661086Z     buf0 = empty_strided((35, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1661253Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1661326Z     del arg0_1
2023-01-11T21:38:06.1661397Z     del arg1_1
2023-01-11T21:38:06.1661464Z     return (buf0, )
2023-01-11T21:38:06.1661469Z 
2023-01-11T21:38:06.1661474Z 
2023-01-11T21:38:06.1661556Z if __name__ == "__main__":
2023-01-11T21:38:06.1661674Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1661802Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1661997Z     arg0_1 = rand_strided((5, 7), (7, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1662189Z     arg1_1 = rand_strided((35, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1662337Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1662343Z 
2023-01-11T21:38:06.1662347Z 
2023-01-11T21:38:06.1662445Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1662512Z import torch
2023-01-11T21:38:06.1662584Z import random
2023-01-11T21:38:06.1662706Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1662830Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1662835Z 
2023-01-11T21:38:06.1662918Z aten = torch.ops.aten
2023-01-11T21:38:06.1663053Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1663148Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1663153Z 
2023-01-11T21:38:06.1663227Z import triton
2023-01-11T21:38:06.1663312Z import triton.language as tl
2023-01-11T21:38:06.1663435Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1663574Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1663583Z 
2023-01-11T21:38:06.1663587Z 
2023-01-11T21:38:06.1663721Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1663928Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1664052Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1664161Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1664265Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1664323Z {
2023-01-11T21:38:06.1664424Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1664489Z     {
2023-01-11T21:38:06.1664570Z         #pragma omp for
2023-01-11T21:38:06.1664657Z         for(long i0=0; i0<630; i0+=1)
2023-01-11T21:38:06.1664724Z         {
2023-01-11T21:38:06.1664854Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1664989Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1665081Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1665177Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1665244Z         }
2023-01-11T21:38:06.1665356Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1665462Z         for(long i0=5040; i0<5040; i0+=1)
2023-01-11T21:38:06.1665573Z         {
2023-01-11T21:38:06.1665663Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1665752Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.1665838Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1665923Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.1665989Z         }
2023-01-11T21:38:06.1666055Z     }
2023-01-11T21:38:06.1666112Z }
2023-01-11T21:38:06.1666197Z ''')
2023-01-11T21:38:06.1666203Z 
2023-01-11T21:38:06.1666207Z 
2023-01-11T21:38:06.1666299Z async_compile.wait(globals())
2023-01-11T21:38:06.1666377Z del async_compile
2023-01-11T21:38:06.1666382Z 
2023-01-11T21:38:06.1666458Z def call(args):
2023-01-11T21:38:06.1666541Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1666615Z     args.clear()
2023-01-11T21:38:06.1666810Z     buf0 = empty_strided((5040, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1666970Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1667044Z     del arg0_1
2023-01-11T21:38:06.1667115Z     del arg1_1
2023-01-11T21:38:06.1667191Z     return (buf0, )
2023-01-11T21:38:06.1667196Z 
2023-01-11T21:38:06.1667201Z 
2023-01-11T21:38:06.1667280Z if __name__ == "__main__":
2023-01-11T21:38:06.1667397Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1667523Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1667749Z     arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1667946Z     arg1_1 = rand_strided((5040, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1668099Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1668105Z 
2023-01-11T21:38:06.1668109Z 
2023-01-11T21:38:06.1668208Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1668285Z import torch
2023-01-11T21:38:06.1668361Z import random
2023-01-11T21:38:06.1668482Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1668611Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1668616Z 
2023-01-11T21:38:06.1668693Z aten = torch.ops.aten
2023-01-11T21:38:06.1668832Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1668929Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1668934Z 
2023-01-11T21:38:06.1669009Z import triton
2023-01-11T21:38:06.1669103Z import triton.language as tl
2023-01-11T21:38:06.1669232Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1669377Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1669382Z 
2023-01-11T21:38:06.1669389Z 
2023-01-11T21:38:06.1669527Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1669728Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1669855Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1669968Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1670077Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1670146Z {
2023-01-11T21:38:06.1670249Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1670316Z     {
2023-01-11T21:38:06.1670393Z         #pragma omp for
2023-01-11T21:38:06.1670483Z         for(long i0=0; i0<630; i0+=1)
2023-01-11T21:38:06.1670553Z         {
2023-01-11T21:38:06.1670693Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1670831Z             auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1670968Z             auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.1671064Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1671154Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1671245Z             tmp4.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1671314Z         }
2023-01-11T21:38:06.1671415Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1671537Z         for(long i0=5040; i0<5040; i0+=1)
2023-01-11T21:38:06.1671608Z         {
2023-01-11T21:38:06.1671698Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1671790Z             auto tmp3 = in_ptr1[i0];
2023-01-11T21:38:06.1671890Z             auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.1671980Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1672070Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1672157Z             out_ptr0[i0] = tmp4;
2023-01-11T21:38:06.1672224Z         }
2023-01-11T21:38:06.1672291Z     }
2023-01-11T21:38:06.1672357Z }
2023-01-11T21:38:06.1672437Z ''')
2023-01-11T21:38:06.1672445Z 
2023-01-11T21:38:06.1672449Z 
2023-01-11T21:38:06.1672546Z async_compile.wait(globals())
2023-01-11T21:38:06.1672624Z del async_compile
2023-01-11T21:38:06.1672630Z 
2023-01-11T21:38:06.1672706Z def call(args):
2023-01-11T21:38:06.1672789Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1672867Z     args.clear()
2023-01-11T21:38:06.1673067Z     buf0 = empty_strided((5040, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1673227Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1673303Z     del arg0_1
2023-01-11T21:38:06.1673376Z     del arg1_1
2023-01-11T21:38:06.1673454Z     return (buf0, )
2023-01-11T21:38:06.1673460Z 
2023-01-11T21:38:06.1673464Z 
2023-01-11T21:38:06.1673547Z if __name__ == "__main__":
2023-01-11T21:38:06.1673665Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1673794Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1674028Z     arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1674248Z     arg1_1 = rand_strided((5040, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1674370Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1674375Z 
2023-01-11T21:38:06.1674644Z [2023-01-11 21:33:28,131] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 342
2023-01-11T21:38:06.1675057Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1675191Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1675450Z [2023-01-11 21:33:28,149] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 343
2023-01-11T21:38:06.1675715Z [2023-01-11 21:33:28,195] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 343
2023-01-11T21:38:06.1676132Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1676264Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1676523Z [2023-01-11 21:33:28,211] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 344
2023-01-11T21:38:06.1676787Z [2023-01-11 21:33:28,219] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 344
2023-01-11T21:38:06.1677198Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1677366Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1677622Z [2023-01-11 21:33:28,236] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 345
2023-01-11T21:38:06.1677628Z 
2023-01-11T21:38:06.1677728Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1677804Z import torch
2023-01-11T21:38:06.1677881Z import random
2023-01-11T21:38:06.1678003Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1678129Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1678134Z 
2023-01-11T21:38:06.1678218Z aten = torch.ops.aten
2023-01-11T21:38:06.1678350Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1678453Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1678458Z 
2023-01-11T21:38:06.1678533Z import triton
2023-01-11T21:38:06.1678629Z import triton.language as tl
2023-01-11T21:38:06.1678756Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1678901Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1678907Z 
2023-01-11T21:38:06.1678911Z 
2023-01-11T21:38:06.1679049Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1679258Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1679376Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1679488Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1679597Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1679666Z {
2023-01-11T21:38:06.1679769Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1679877Z     {
2023-01-11T21:38:06.1679963Z         #pragma omp for
2023-01-11T21:38:06.1680046Z         for(long i0=0; i0<630; i0+=1)
2023-01-11T21:38:06.1680115Z         {
2023-01-11T21:38:06.1680254Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1680397Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1680491Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1680590Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1680659Z         }
2023-01-11T21:38:06.1680754Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1680847Z         for(long i0=5040; i0<5040; i0+=1)
2023-01-11T21:38:06.1680915Z         {
2023-01-11T21:38:06.1681006Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1681096Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.1681187Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1681274Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.1681340Z         }
2023-01-11T21:38:06.1681407Z     }
2023-01-11T21:38:06.1681473Z }
2023-01-11T21:38:06.1681560Z ''')
2023-01-11T21:38:06.1681566Z 
2023-01-11T21:38:06.1681570Z 
2023-01-11T21:38:06.1681666Z async_compile.wait(globals())
2023-01-11T21:38:06.1681745Z del async_compile
2023-01-11T21:38:06.1681750Z 
2023-01-11T21:38:06.1681828Z def call(args):
2023-01-11T21:38:06.1681904Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1681981Z     args.clear()
2023-01-11T21:38:06.1682200Z     buf0 = empty_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1682366Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1682442Z     del arg0_1
2023-01-11T21:38:06.1682517Z     del arg1_1
2023-01-11T21:38:06.1682595Z     return (buf0, )
2023-01-11T21:38:06.1682600Z 
2023-01-11T21:38:06.1682604Z 
2023-01-11T21:38:06.1682687Z if __name__ == "__main__":
2023-01-11T21:38:06.1682800Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1682932Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1683168Z     arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1683418Z     arg1_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1683543Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1683549Z 
2023-01-11T21:38:06.1683553Z 
2023-01-11T21:38:06.1683652Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1683730Z import torch
2023-01-11T21:38:06.1683807Z import random
2023-01-11T21:38:06.1683921Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1684046Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1684051Z 
2023-01-11T21:38:06.1684134Z aten = torch.ops.aten
2023-01-11T21:38:06.1684271Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1684370Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1684375Z 
2023-01-11T21:38:06.1684452Z import triton
2023-01-11T21:38:06.1684546Z import triton.language as tl
2023-01-11T21:38:06.1684665Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1684806Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1684814Z 
2023-01-11T21:38:06.1684818Z 
2023-01-11T21:38:06.1684956Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1685162Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1685286Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1685399Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1685504Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1685572Z {
2023-01-11T21:38:06.1685668Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1685737Z     {
2023-01-11T21:38:06.1685851Z         #pragma omp for
2023-01-11T21:38:06.1685943Z         for(long i0=0; i0<630; i0+=1)
2023-01-11T21:38:06.1686012Z         {
2023-01-11T21:38:06.1686151Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1686288Z             auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1686425Z             auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.1686511Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1686602Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1686700Z             tmp4.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1686770Z         }
2023-01-11T21:38:06.1686871Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1686966Z         for(long i0=5040; i0<5040; i0+=1)
2023-01-11T21:38:06.1687035Z         {
2023-01-11T21:38:06.1687120Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1687210Z             auto tmp3 = in_ptr1[i0];
2023-01-11T21:38:06.1687320Z             auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.1687411Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1687502Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1687589Z             out_ptr0[i0] = tmp4;
2023-01-11T21:38:06.1687652Z         }
2023-01-11T21:38:06.1687720Z     }
2023-01-11T21:38:06.1687786Z }
2023-01-11T21:38:06.1687875Z ''')
2023-01-11T21:38:06.1687880Z 
2023-01-11T21:38:06.1687885Z 
2023-01-11T21:38:06.1687981Z async_compile.wait(globals())
2023-01-11T21:38:06.1688062Z del async_compile
2023-01-11T21:38:06.1688067Z 
2023-01-11T21:38:06.1688144Z def call(args):
2023-01-11T21:38:06.1688225Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1688296Z     args.clear()
2023-01-11T21:38:06.1688513Z     buf0 = empty_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1688683Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1688757Z     del arg0_1
2023-01-11T21:38:06.1688834Z     del arg1_1
2023-01-11T21:38:06.1688911Z     return (buf0, )
2023-01-11T21:38:06.1688916Z 
2023-01-11T21:38:06.1688920Z 
2023-01-11T21:38:06.1689005Z if __name__ == "__main__":
2023-01-11T21:38:06.1689118Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1689277Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1689511Z     arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1689725Z     arg1_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1689845Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1689850Z 
2023-01-11T21:38:06.1689855Z 
2023-01-11T21:38:06.1689953Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1690027Z import torch
2023-01-11T21:38:06.1690101Z import random
2023-01-11T21:38:06.1690213Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1690340Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1690345Z 
2023-01-11T21:38:06.1690426Z aten = torch.ops.aten
2023-01-11T21:38:06.1690563Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1690659Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1690664Z 
2023-01-11T21:38:06.1690740Z import triton
2023-01-11T21:38:06.1690831Z import triton.language as tl
2023-01-11T21:38:06.1690957Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1691089Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1691094Z 
2023-01-11T21:38:06.1691105Z 
2023-01-11T21:38:06.1691232Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1691439Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1691560Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1699059Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1699272Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1699343Z {
2023-01-11T21:38:06.1699451Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1699526Z     {
2023-01-11T21:38:06.1699613Z         #pragma omp for
2023-01-11T21:38:06.1699697Z         for(long i0=0; i0<125; i0+=1)
2023-01-11T21:38:06.1699772Z         {
2023-01-11T21:38:06.1699925Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1700065Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1700159Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1700257Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1700327Z         }
2023-01-11T21:38:06.1700422Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1700519Z         for(long i0=1000; i0<1000; i0+=1)
2023-01-11T21:38:06.1700590Z         {
2023-01-11T21:38:06.1700681Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1700770Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.1700854Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1700945Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.1701013Z         }
2023-01-11T21:38:06.1701082Z     }
2023-01-11T21:38:06.1701148Z }
2023-01-11T21:38:06.1701262Z ''')
2023-01-11T21:38:06.1701269Z 
2023-01-11T21:38:06.1701276Z 
2023-01-11T21:38:06.1701375Z async_compile.wait(globals())
2023-01-11T21:38:06.1701449Z del async_compile
2023-01-11T21:38:06.1701462Z 
2023-01-11T21:38:06.1701534Z def call(args):
2023-01-11T21:38:06.1701618Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1701696Z     args.clear()
2023-01-11T21:38:06.1701903Z     buf0 = empty_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1702087Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1702161Z     del arg0_1
2023-01-11T21:38:06.1702233Z     del arg1_1
2023-01-11T21:38:06.1702306Z     return (buf0, )
2023-01-11T21:38:06.1702311Z 
2023-01-11T21:38:06.1702316Z 
2023-01-11T21:38:06.1702397Z if __name__ == "__main__":
2023-01-11T21:38:06.1702526Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1702663Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1702944Z     arg0_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1703174Z     arg1_1 = rand_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1703303Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1703308Z 
2023-01-11T21:38:06.1703615Z [2023-01-11 21:33:28,257] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 345
2023-01-11T21:38:06.1704104Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1704243Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1704538Z [2023-01-11 21:33:28,272] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 346
2023-01-11T21:38:06.1704839Z [2023-01-11 21:33:28,279] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 346
2023-01-11T21:38:06.1705327Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1705469Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1705789Z [2023-01-11 21:33:28,296] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 347
2023-01-11T21:38:06.1706092Z [2023-01-11 21:33:28,305] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 347
2023-01-11T21:38:06.1706581Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1706724Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1707015Z [2023-01-11 21:33:28,319] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 348
2023-01-11T21:38:06.1707021Z 
2023-01-11T21:38:06.1707128Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1707202Z import torch
2023-01-11T21:38:06.1707280Z import random
2023-01-11T21:38:06.1707410Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1707545Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1707550Z 
2023-01-11T21:38:06.1707638Z aten = torch.ops.aten
2023-01-11T21:38:06.1707792Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1707895Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1707900Z 
2023-01-11T21:38:06.1707978Z import triton
2023-01-11T21:38:06.1708070Z import triton.language as tl
2023-01-11T21:38:06.1708207Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1708360Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1708366Z 
2023-01-11T21:38:06.1708371Z 
2023-01-11T21:38:06.1708524Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1708758Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1708896Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1709013Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1709125Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1709186Z {
2023-01-11T21:38:06.1709296Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1709393Z     {
2023-01-11T21:38:06.1709481Z         #pragma omp for
2023-01-11T21:38:06.1709574Z         for(long i0=0; i0<125; i0+=1)
2023-01-11T21:38:06.1709644Z         {
2023-01-11T21:38:06.1709790Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1709940Z             auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1710085Z             auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.1710181Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1710276Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1710378Z             tmp4.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1710452Z         }
2023-01-11T21:38:06.1710558Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1710648Z         for(long i0=1000; i0<1000; i0+=1)
2023-01-11T21:38:06.1710717Z         {
2023-01-11T21:38:06.1710811Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1710906Z             auto tmp3 = in_ptr1[i0];
2023-01-11T21:38:06.1711017Z             auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.1711112Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1711205Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1711288Z             out_ptr0[i0] = tmp4;
2023-01-11T21:38:06.1711356Z         }
2023-01-11T21:38:06.1711423Z     }
2023-01-11T21:38:06.1711490Z }
2023-01-11T21:38:06.1711580Z ''')
2023-01-11T21:38:06.1711586Z 
2023-01-11T21:38:06.1711590Z 
2023-01-11T21:38:06.1711691Z async_compile.wait(globals())
2023-01-11T21:38:06.1711772Z del async_compile
2023-01-11T21:38:06.1711777Z 
2023-01-11T21:38:06.1711849Z def call(args):
2023-01-11T21:38:06.1711963Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1712039Z     args.clear()
2023-01-11T21:38:06.1712269Z     buf0 = empty_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1712438Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1712512Z     del arg0_1
2023-01-11T21:38:06.1712584Z     del arg1_1
2023-01-11T21:38:06.1712652Z     return (buf0, )
2023-01-11T21:38:06.1712657Z 
2023-01-11T21:38:06.1712668Z 
2023-01-11T21:38:06.1712741Z if __name__ == "__main__":
2023-01-11T21:38:06.1712860Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1712988Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1713197Z     arg0_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1713394Z     arg1_1 = rand_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1713517Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1713523Z 
2023-01-11T21:38:06.1713527Z 
2023-01-11T21:38:06.1713624Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1713691Z import torch
2023-01-11T21:38:06.1713766Z import random
2023-01-11T21:38:06.1713884Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1714012Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1714017Z 
2023-01-11T21:38:06.1714100Z aten = torch.ops.aten
2023-01-11T21:38:06.1714236Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1714331Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1714336Z 
2023-01-11T21:38:06.1714409Z import triton
2023-01-11T21:38:06.1714494Z import triton.language as tl
2023-01-11T21:38:06.1714618Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1714758Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1714764Z 
2023-01-11T21:38:06.1714770Z 
2023-01-11T21:38:06.1714905Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1715111Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1715234Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1715375Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1715480Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1715539Z {
2023-01-11T21:38:06.1715639Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1715704Z     {
2023-01-11T21:38:06.1715787Z         #pragma omp for
2023-01-11T21:38:06.1715874Z         for(long i0=0; i0<1; i0+=1)
2023-01-11T21:38:06.1715941Z         {
2023-01-11T21:38:06.1716082Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1716212Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1716301Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1716400Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1716466Z         }
2023-01-11T21:38:06.1716568Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1716654Z         for(long i0=8; i0<10; i0+=1)
2023-01-11T21:38:06.1716720Z         {
2023-01-11T21:38:06.1716801Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1716891Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.1716982Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1717068Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.1717134Z         }
2023-01-11T21:38:06.1717200Z     }
2023-01-11T21:38:06.1717257Z }
2023-01-11T21:38:06.1717342Z ''')
2023-01-11T21:38:06.1717347Z 
2023-01-11T21:38:06.1717352Z 
2023-01-11T21:38:06.1717444Z async_compile.wait(globals())
2023-01-11T21:38:06.1717522Z del async_compile
2023-01-11T21:38:06.1717527Z 
2023-01-11T21:38:06.1717603Z def call(args):
2023-01-11T21:38:06.1717684Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1717759Z     args.clear()
2023-01-11T21:38:06.1718000Z     buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32)
dtype=torch.float32) 2023-01-11T21:38:06.1718158Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1718230Z del arg0_1 2023-01-11T21:38:06.1718306Z del arg1_1 2023-01-11T21:38:06.1718383Z return (buf0, ) 2023-01-11T21:38:06.1718388Z 2023-01-11T21:38:06.1718393Z 2023-01-11T21:38:06.1718473Z if __name__ == "__main__": 2023-01-11T21:38:06.1718590Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1718718Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1718912Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1719108Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1719227Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1719232Z 2023-01-11T21:38:06.1719237Z 2023-01-11T21:38:06.1719337Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1719412Z import torch 2023-01-11T21:38:06.1719487Z import random 2023-01-11T21:38:06.1719608Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1719731Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1719737Z 2023-01-11T21:38:06.1719827Z aten = torch.ops.aten 2023-01-11T21:38:06.1719956Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1720053Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1720058Z 2023-01-11T21:38:06.1720133Z import triton 2023-01-11T21:38:06.1720225Z import triton.language as tl 2023-01-11T21:38:06.1720353Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1720493Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1720498Z 2023-01-11T21:38:06.1720503Z 2023-01-11T21:38:06.1720638Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1720843Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1720963Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1721074Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1721179Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1721245Z { 2023-01-11T21:38:06.1721377Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1721444Z { 2023-01-11T21:38:06.1721526Z #pragma omp for 2023-01-11T21:38:06.1721605Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.1721672Z { 2023-01-11T21:38:06.1721809Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1721946Z auto tmp3 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1722081Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1722170Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1722260Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1722350Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1722416Z } 2023-01-11T21:38:06.1722515Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1722601Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:06.1722668Z { 2023-01-11T21:38:06.1722758Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1722846Z auto tmp3 = in_ptr1[i0]; 2023-01-11T21:38:06.1722943Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1723031Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1723119Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1723205Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1723271Z } 2023-01-11T21:38:06.1723337Z } 2023-01-11T21:38:06.1723400Z } 2023-01-11T21:38:06.1723479Z ''') 2023-01-11T21:38:06.1723484Z 
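A note on the TypedStorage UserWarning that recurs throughout this run: test_torchinductor.py:246 reads the storage size through the deprecated tensor.storage() API, and the warning names the replacement itself. A minimal sketch of the suggested migration, assuming a PyTorch build where Tensor.untyped_storage() is available (the variable names here are illustrative only, not taken from the test):

    import torch

    x = torch.empty((10, 5, 20))

    # Deprecated spelling, as in the test helper above (triggers the warning):
    #     n = x.storage().size()
    # Suggested spelling: UntypedStorage is counted in bytes, so divide by
    # the per-element size to recover the element count.
    n = x.untyped_storage().nbytes() // x.element_size()
    buffer = torch.as_strided(x, (n,), (1,), 0).clone()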
2023-01-11T21:38:06.1723489Z 2023-01-11T21:38:06.1723581Z async_compile.wait(globals()) 2023-01-11T21:38:06.1723657Z del async_compile 2023-01-11T21:38:06.1723695Z 2023-01-11T21:38:06.1723773Z def call(args): 2023-01-11T21:38:06.1723853Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1723928Z args.clear() 2023-01-11T21:38:06.1724133Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1724297Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1724371Z del arg0_1 2023-01-11T21:38:06.1724443Z del arg1_1 2023-01-11T21:38:06.1724519Z return (buf0, ) 2023-01-11T21:38:06.1724525Z 2023-01-11T21:38:06.1724529Z 2023-01-11T21:38:06.1724612Z if __name__ == "__main__": 2023-01-11T21:38:06.1724730Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1724858Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1725051Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1725247Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1725370Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1725375Z 2023-01-11T21:38:06.1725639Z [2023-01-11 21:33:28,329] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 348 2023-01-11T21:38:06.1726053Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1726186Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1726442Z [2023-01-11 21:33:28,346] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 349 2023-01-11T21:38:06.1726705Z [2023-01-11 21:33:28,356] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 349 2023-01-11T21:38:06.1727190Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1727323Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1727578Z [2023-01-11 21:33:28,371] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 350 2023-01-11T21:38:06.1727839Z [2023-01-11 21:33:28,381] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 350 2023-01-11T21:38:06.1728247Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1728375Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1728630Z [2023-01-11 21:33:28,399] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 351 2023-01-11T21:38:06.1728636Z 2023-01-11T21:38:06.1728734Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1728808Z import torch 2023-01-11T21:38:06.1728882Z import random 2023-01-11T21:38:06.1729001Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1729125Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1729130Z 2023-01-11T21:38:06.1729212Z aten = torch.ops.aten 2023-01-11T21:38:06.1729341Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1729437Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1729442Z 2023-01-11T21:38:06.1729559Z import triton 2023-01-11T21:38:06.1729652Z import triton.language as tl 2023-01-11T21:38:06.1729778Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1729919Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1729925Z 2023-01-11T21:38:06.1729929Z 2023-01-11T21:38:06.1730068Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1730275Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1730392Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1730501Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1730604Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1730670Z { 2023-01-11T21:38:06.1730771Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1730837Z { 2023-01-11T21:38:06.1730919Z #pragma omp for 2023-01-11T21:38:06.1731000Z for(long i0=0; i0<125; i0+=1) 2023-01-11T21:38:06.1731070Z { 2023-01-11T21:38:06.1731209Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1731346Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1731436Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1731535Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1731601Z } 2023-01-11T21:38:06.1731693Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1731783Z for(long i0=1000; i0<1000; i0+=1) 2023-01-11T21:38:06.1731850Z { 2023-01-11T21:38:06.1731938Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1732025Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.1732112Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1732197Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1732256Z } 2023-01-11T21:38:06.1732322Z } 2023-01-11T21:38:06.1732385Z } 2023-01-11T21:38:06.1732470Z ''') 2023-01-11T21:38:06.1732478Z 2023-01-11T21:38:06.1732484Z 2023-01-11T21:38:06.1732576Z async_compile.wait(globals()) 2023-01-11T21:38:06.1732653Z del async_compile 2023-01-11T21:38:06.1732658Z 2023-01-11T21:38:06.1732732Z def call(args): 2023-01-11T21:38:06.1732805Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1732880Z args.clear() 2023-01-11T21:38:06.1733143Z buf0 = empty_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1733313Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1733388Z del arg0_1 2023-01-11T21:38:06.1733458Z del arg1_1 2023-01-11T21:38:06.1733533Z return (buf0, ) 2023-01-11T21:38:06.1733538Z 2023-01-11T21:38:06.1733543Z 2023-01-11T21:38:06.1733623Z if __name__ == 
"__main__": 2023-01-11T21:38:06.1733734Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1733861Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1734070Z arg0_1 = rand_strided((10, 100), (100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1734297Z arg1_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1734418Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1734423Z 2023-01-11T21:38:06.1734430Z 2023-01-11T21:38:06.1734685Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1734798Z import torch 2023-01-11T21:38:06.1734894Z import random 2023-01-11T21:38:06.1735008Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1735136Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1735142Z 2023-01-11T21:38:06.1735230Z aten = torch.ops.aten 2023-01-11T21:38:06.1735386Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1735508Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1735514Z 2023-01-11T21:38:06.1735590Z import triton 2023-01-11T21:38:06.1735741Z import triton.language as tl 2023-01-11T21:38:06.1735859Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1735999Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1736005Z 2023-01-11T21:38:06.1736009Z 2023-01-11T21:38:06.1736150Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1736359Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1736480Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1736589Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1736691Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1736755Z { 2023-01-11T21:38:06.1736849Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1736914Z { 2023-01-11T21:38:06.1736995Z #pragma omp for 2023-01-11T21:38:06.1737082Z for(long i0=0; i0<125; i0+=1) 2023-01-11T21:38:06.1737211Z { 2023-01-11T21:38:06.1737373Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1737510Z auto tmp3 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1737644Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1737727Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1737818Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1737914Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1737985Z } 2023-01-11T21:38:06.1738084Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1738176Z for(long i0=1000; i0<1000; i0+=1) 2023-01-11T21:38:06.1738243Z { 2023-01-11T21:38:06.1738324Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1738410Z auto tmp3 = in_ptr1[i0]; 2023-01-11T21:38:06.1738514Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1738602Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1738690Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1738778Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1738844Z } 2023-01-11T21:38:06.1738904Z } 2023-01-11T21:38:06.1738969Z } 2023-01-11T21:38:06.1739055Z ''') 2023-01-11T21:38:06.1739060Z 2023-01-11T21:38:06.1739064Z 2023-01-11T21:38:06.1739157Z async_compile.wait(globals()) 2023-01-11T21:38:06.1739275Z del async_compile 2023-01-11T21:38:06.1739280Z 2023-01-11T21:38:06.1739357Z def call(args): 2023-01-11T21:38:06.1739436Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1739504Z args.clear() 2023-01-11T21:38:06.1739734Z buf0 = empty_strided((10, 1, 
10, 1, 10), (100, 100, 10, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1739901Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1739974Z del arg0_1 2023-01-11T21:38:06.1740045Z del arg1_1 2023-01-11T21:38:06.1740120Z return (buf0, ) 2023-01-11T21:38:06.1740125Z 2023-01-11T21:38:06.1740133Z 2023-01-11T21:38:06.1740212Z if __name__ == "__main__": 2023-01-11T21:38:06.1740323Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1740453Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1740655Z arg0_1 = rand_strided((10, 100), (100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1740886Z arg1_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1741006Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1741012Z 2023-01-11T21:38:06.1741016Z 2023-01-11T21:38:06.1741114Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1741187Z import torch 2023-01-11T21:38:06.1741261Z import random 2023-01-11T21:38:06.1741374Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1741498Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1741503Z 2023-01-11T21:38:06.1741584Z aten = torch.ops.aten 2023-01-11T21:38:06.1741749Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1741849Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1741854Z 2023-01-11T21:38:06.1741930Z import triton 2023-01-11T21:38:06.1742026Z import triton.language as tl 2023-01-11T21:38:06.1742152Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1742289Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1742295Z 2023-01-11T21:38:06.1742305Z 2023-01-11T21:38:06.1742435Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1742643Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1742769Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1742881Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1742987Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1743055Z { 2023-01-11T21:38:06.1743158Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1743223Z { 2023-01-11T21:38:06.1743304Z #pragma omp for 2023-01-11T21:38:06.1743392Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1743461Z { 2023-01-11T21:38:06.1743603Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1743743Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1743834Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1743925Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1743994Z } 2023-01-11T21:38:06.1744096Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1744186Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:06.1744257Z { 2023-01-11T21:38:06.1744347Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1744437Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.1744520Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1744609Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1744681Z } 2023-01-11T21:38:06.1744748Z } 2023-01-11T21:38:06.1744815Z } 2023-01-11T21:38:06.1744901Z ''') 2023-01-11T21:38:06.1744907Z 2023-01-11T21:38:06.1744913Z 2023-01-11T21:38:06.1745006Z async_compile.wait(globals()) 2023-01-11T21:38:06.1745079Z del async_compile 2023-01-11T21:38:06.1745084Z 
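Each module dumped in this log is a self-contained repro: alongside the compiled kernel it carries an `if __name__ == "__main__":` stanza that rebuilds the inputs with rand_strided and times the call() entry point with print_performance, so any one dump can in principle be copied into a file and run with plain `python`. A small sketch of the same helpers used by hand (illustrative; torch._dynamo and torch._inductor are private namespaces that may move between releases):

    import torch
    from torch._dynamo.testing import rand_strided

    # Rebuild an input exactly the way the dumped __main__ stanzas do:
    # a (50, 20) float32 CPU tensor with contiguous (20, 1) strides.
    arg0_1 = rand_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32)
    print(arg0_1.shape, arg0_1.stride())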
2023-01-11T21:38:06.1745190Z def call(args): 2023-01-11T21:38:06.1745274Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1745352Z args.clear() 2023-01-11T21:38:06.1745597Z buf0 = empty_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1745779Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1745854Z del arg0_1 2023-01-11T21:38:06.1745921Z del arg1_1 2023-01-11T21:38:06.1745998Z return (buf0, ) 2023-01-11T21:38:06.1746003Z 2023-01-11T21:38:06.1746007Z 2023-01-11T21:38:06.1746090Z if __name__ == "__main__": 2023-01-11T21:38:06.1746211Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1746340Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1746539Z arg0_1 = rand_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1746749Z arg1_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1746873Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1746878Z 2023-01-11T21:38:06.1747136Z [2023-01-11 21:33:28,409] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 351 2023-01-11T21:38:06.1747150Z 2023-01-11T21:38:06.1747243Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1747320Z import torch 2023-01-11T21:38:06.1747397Z import random 2023-01-11T21:38:06.1747517Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1747640Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1747645Z 2023-01-11T21:38:06.1747761Z aten = torch.ops.aten 2023-01-11T21:38:06.1747898Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1747989Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1747994Z 2023-01-11T21:38:06.1748071Z import triton 2023-01-11T21:38:06.1748166Z import triton.language as tl 2023-01-11T21:38:06.1748295Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1748436Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1748441Z 2023-01-11T21:38:06.1748446Z 2023-01-11T21:38:06.1748586Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1748791Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1748915Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1749019Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1749124Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1749191Z { 2023-01-11T21:38:06.1749303Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1749371Z { 2023-01-11T21:38:06.1749455Z #pragma omp for 2023-01-11T21:38:06.1749543Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1749605Z { 2023-01-11T21:38:06.1749743Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1749881Z auto tmp3 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1750017Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1750109Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1750200Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1750299Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1750370Z } 2023-01-11T21:38:06.1750465Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1750553Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:06.1750621Z { 2023-01-11T21:38:06.1750711Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1750803Z auto tmp3 = in_ptr1[i0]; 
2023-01-11T21:38:06.1750909Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1750994Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1751085Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1751173Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1751271Z } 2023-01-11T21:38:06.1751340Z } 2023-01-11T21:38:06.1751406Z } 2023-01-11T21:38:06.1751494Z ''') 2023-01-11T21:38:06.1751499Z 2023-01-11T21:38:06.1751504Z 2023-01-11T21:38:06.1751599Z async_compile.wait(globals()) 2023-01-11T21:38:06.1751672Z del async_compile 2023-01-11T21:38:06.1751678Z 2023-01-11T21:38:06.1751753Z def call(args): 2023-01-11T21:38:06.1751836Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1751913Z args.clear() 2023-01-11T21:38:06.1752123Z buf0 = empty_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1752293Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1752370Z del arg0_1 2023-01-11T21:38:06.1752436Z del arg1_1 2023-01-11T21:38:06.1752515Z return (buf0, ) 2023-01-11T21:38:06.1752520Z 2023-01-11T21:38:06.1752524Z 2023-01-11T21:38:06.1752606Z if __name__ == "__main__": 2023-01-11T21:38:06.1752732Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1752861Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1753059Z arg0_1 = rand_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1753269Z arg1_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1753390Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1753395Z 2023-01-11T21:38:06.1753462Z ok (17.852s) 2023-01-11T21:38:06.1753921Z test_views2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1754084Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1754343Z [2023-01-11 21:33:28,424] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 352 2023-01-11T21:38:06.1754606Z [2023-01-11 21:33:30,087] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 352 2023-01-11T21:38:06.1755019Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1755157Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1755459Z [2023-01-11 21:33:30,103] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 353 2023-01-11T21:38:06.1755722Z [2023-01-11 21:33:31,775] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 353 2023-01-11T21:38:06.1756134Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1756265Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1756519Z [2023-01-11 21:33:31,790] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 354 2023-01-11T21:38:06.1756775Z [2023-01-11 21:33:33,456] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 354 2023-01-11T21:38:06.1757208Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1757341Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1757594Z [2023-01-11 21:33:33,474] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 355 2023-01-11T21:38:06.1757852Z [2023-01-11 21:33:35,138] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 355 2023-01-11T21:38:06.1758260Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1758396Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1758648Z [2023-01-11 21:33:35,153] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 356 2023-01-11T21:38:06.1758653Z 2023-01-11T21:38:06.1758751Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1758828Z import torch 2023-01-11T21:38:06.1758895Z import random 2023-01-11T21:38:06.1759016Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1759144Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1759149Z 2023-01-11T21:38:06.1759230Z aten = torch.ops.aten 2023-01-11T21:38:06.1759368Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1759494Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1759499Z 2023-01-11T21:38:06.1759573Z import triton 2023-01-11T21:38:06.1759664Z import triton.language as tl 2023-01-11T21:38:06.1759782Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1759923Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1759929Z 2023-01-11T21:38:06.1759933Z 2023-01-11T21:38:06.1760072Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1760279Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1760401Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1760505Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1760571Z { 2023-01-11T21:38:06.1760671Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1760732Z { 2023-01-11T21:38:06.1760814Z #pragma omp for 2023-01-11T21:38:06.1760900Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1760971Z { 2023-01-11T21:38:06.1761109Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1761248Z auto tmp1 = 
at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1761338Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1761430Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1761498Z } 2023-01-11T21:38:06.1761596Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1761683Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:06.1761748Z { 2023-01-11T21:38:06.1761837Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1761944Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1762026Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1762111Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1762178Z } 2023-01-11T21:38:06.1762244Z } 2023-01-11T21:38:06.1762309Z } 2023-01-11T21:38:06.1762397Z ''') 2023-01-11T21:38:06.1762403Z 2023-01-11T21:38:06.1762407Z 2023-01-11T21:38:06.1762499Z async_compile.wait(globals()) 2023-01-11T21:38:06.1762569Z del async_compile 2023-01-11T21:38:06.1762574Z 2023-01-11T21:38:06.1762648Z def call(args): 2023-01-11T21:38:06.1762723Z arg0_1, = args 2023-01-11T21:38:06.1762798Z args.clear() 2023-01-11T21:38:06.1763021Z buf0 = empty_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1763163Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1763234Z del arg0_1 2023-01-11T21:38:06.1763302Z return (buf0, ) 2023-01-11T21:38:06.1763307Z 2023-01-11T21:38:06.1763311Z 2023-01-11T21:38:06.1763391Z if __name__ == "__main__": 2023-01-11T21:38:06.1763511Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1763637Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1763846Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1763961Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1763966Z 2023-01-11T21:38:06.1763970Z 2023-01-11T21:38:06.1764066Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1764141Z import torch 2023-01-11T21:38:06.1764208Z import random 2023-01-11T21:38:06.1764329Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1764451Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1764456Z 2023-01-11T21:38:06.1764537Z aten = torch.ops.aten 2023-01-11T21:38:06.1764671Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1764766Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1764772Z 2023-01-11T21:38:06.1764846Z import triton 2023-01-11T21:38:06.1764937Z import triton.language as tl 2023-01-11T21:38:06.1765055Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1765194Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1765230Z 2023-01-11T21:38:06.1765235Z 2023-01-11T21:38:06.1765371Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1765577Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1765700Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1765806Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1765871Z { 2023-01-11T21:38:06.1765965Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1766031Z { 2023-01-11T21:38:06.1766111Z #pragma omp for 2023-01-11T21:38:06.1766198Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1766264Z { 2023-01-11T21:38:06.1766401Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1766537Z auto tmp1 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.1766626Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1766752Z auto 
tmp3 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1766845Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1766940Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1767007Z } 2023-01-11T21:38:06.1767107Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1767195Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:06.1767263Z { 2023-01-11T21:38:06.1767344Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1767448Z auto tmp1 = static_cast(2); 2023-01-11T21:38:06.1767536Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1767639Z auto tmp3 = static_cast(1); 2023-01-11T21:38:06.1767726Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1767812Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1767879Z } 2023-01-11T21:38:06.1767937Z } 2023-01-11T21:38:06.1768002Z } 2023-01-11T21:38:06.1768088Z ''') 2023-01-11T21:38:06.1768093Z 2023-01-11T21:38:06.1768098Z 2023-01-11T21:38:06.1768193Z async_compile.wait(globals()) 2023-01-11T21:38:06.1768270Z del async_compile 2023-01-11T21:38:06.1768276Z 2023-01-11T21:38:06.1768350Z def call(args): 2023-01-11T21:38:06.1768425Z arg0_1, = args 2023-01-11T21:38:06.1768492Z args.clear() 2023-01-11T21:38:06.1768686Z buf0 = empty_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1768854Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1768928Z del arg0_1 2023-01-11T21:38:06.1769003Z return (buf0, ) 2023-01-11T21:38:06.1769009Z 2023-01-11T21:38:06.1769013Z 2023-01-11T21:38:06.1769093Z if __name__ == "__main__": 2023-01-11T21:38:06.1769211Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1769335Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1769537Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1769649Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1769659Z 2023-01-11T21:38:06.1769663Z 2023-01-11T21:38:06.1769761Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1769838Z import torch 2023-01-11T21:38:06.1769913Z import random 2023-01-11T21:38:06.1770031Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1770156Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1770162Z 2023-01-11T21:38:06.1770236Z aten = torch.ops.aten 2023-01-11T21:38:06.1770370Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1770464Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1770469Z 2023-01-11T21:38:06.1770542Z import triton 2023-01-11T21:38:06.1770636Z import triton.language as tl 2023-01-11T21:38:06.1770760Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1770900Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1770905Z 2023-01-11T21:38:06.1770910Z 2023-01-11T21:38:06.1771045Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1771284Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1771409Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1771513Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1771578Z { 2023-01-11T21:38:06.1771680Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1771746Z { 2023-01-11T21:38:06.1771828Z #pragma omp for 2023-01-11T21:38:06.1771908Z for(long i0=0; i0<125; i0+=1) 2023-01-11T21:38:06.1771973Z { 2023-01-11T21:38:06.1772111Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1772250Z auto tmp1 = 
at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1772340Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1772436Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1772503Z } 2023-01-11T21:38:06.1772604Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1772689Z for(long i0=1000; i0<1000; i0+=1) 2023-01-11T21:38:06.1772756Z { 2023-01-11T21:38:06.1772845Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1772947Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1773035Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1773126Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1773193Z } 2023-01-11T21:38:06.1773252Z } 2023-01-11T21:38:06.1773315Z } 2023-01-11T21:38:06.1773400Z ''') 2023-01-11T21:38:06.1773405Z 2023-01-11T21:38:06.1773410Z 2023-01-11T21:38:06.1773502Z async_compile.wait(globals()) 2023-01-11T21:38:06.1773578Z del async_compile 2023-01-11T21:38:06.1773583Z 2023-01-11T21:38:06.1773659Z def call(args): 2023-01-11T21:38:06.1773733Z arg0_1, = args 2023-01-11T21:38:06.1773800Z args.clear() 2023-01-11T21:38:06.1774004Z buf0 = empty_strided((10, 100), (100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1774144Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1774216Z del arg0_1 2023-01-11T21:38:06.1774292Z return (buf0, ) 2023-01-11T21:38:06.1774297Z 2023-01-11T21:38:06.1774302Z 2023-01-11T21:38:06.1774381Z if __name__ == "__main__": 2023-01-11T21:38:06.1774757Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1774919Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1775163Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1775277Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1775283Z 2023-01-11T21:38:06.1775287Z 2023-01-11T21:38:06.1775387Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1775465Z import torch 2023-01-11T21:38:06.1775541Z import random 2023-01-11T21:38:06.1775666Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1775792Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1775801Z 2023-01-11T21:38:06.1775880Z aten = torch.ops.aten 2023-01-11T21:38:06.1776017Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1776114Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1776119Z 2023-01-11T21:38:06.1776194Z import triton 2023-01-11T21:38:06.1776292Z import triton.language as tl 2023-01-11T21:38:06.1776417Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1776558Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1776563Z 2023-01-11T21:38:06.1776568Z 2023-01-11T21:38:06.1776705Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1776906Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1777032Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1777196Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1777277Z { 2023-01-11T21:38:06.1777437Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1777504Z { 2023-01-11T21:38:06.1777586Z #pragma omp for 2023-01-11T21:38:06.1777666Z for(long i0=0; i0<125; i0+=1) 2023-01-11T21:38:06.1777732Z { 2023-01-11T21:38:06.1777871Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1778012Z auto tmp1 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.1778101Z auto tmp2 = tmp0 * tmp1; 
2023-01-11T21:38:06.1778234Z auto tmp3 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1778322Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1778420Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1778479Z } 2023-01-11T21:38:06.1778581Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1778671Z for(long i0=1000; i0<1000; i0+=1) 2023-01-11T21:38:06.1778738Z { 2023-01-11T21:38:06.1778827Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1778933Z auto tmp1 = static_cast(2); 2023-01-11T21:38:06.1779021Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1779117Z auto tmp3 = static_cast(1); 2023-01-11T21:38:06.1779205Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1779296Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1779365Z } 2023-01-11T21:38:06.1779430Z } 2023-01-11T21:38:06.1779494Z } 2023-01-11T21:38:06.1779574Z ''') 2023-01-11T21:38:06.1779587Z 2023-01-11T21:38:06.1779592Z 2023-01-11T21:38:06.1779677Z async_compile.wait(globals()) 2023-01-11T21:38:06.1779754Z del async_compile 2023-01-11T21:38:06.1779759Z 2023-01-11T21:38:06.1779834Z def call(args): 2023-01-11T21:38:06.1779907Z arg0_1, = args 2023-01-11T21:38:06.1779981Z args.clear() 2023-01-11T21:38:06.1780182Z buf0 = empty_strided((10, 100), (100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1780319Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1780387Z del arg0_1 2023-01-11T21:38:06.1780463Z return (buf0, ) 2023-01-11T21:38:06.1780468Z 2023-01-11T21:38:06.1780472Z 2023-01-11T21:38:06.1780552Z if __name__ == "__main__": 2023-01-11T21:38:06.1780671Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1780827Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1781057Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1781172Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1781177Z 2023-01-11T21:38:06.1781440Z [2023-01-11 21:33:35,160] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 356 2023-01-11T21:38:06.1781851Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1781978Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1782235Z [2023-01-11 21:33:35,177] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 357 2023-01-11T21:38:06.1782497Z [2023-01-11 21:33:35,185] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 357 2023-01-11T21:38:06.1782503Z 2023-01-11T21:38:06.1782601Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1782675Z import torch 2023-01-11T21:38:06.1782750Z import random 2023-01-11T21:38:06.1782868Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1782990Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1782995Z 2023-01-11T21:38:06.1783070Z aten = torch.ops.aten 2023-01-11T21:38:06.1783204Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1783330Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1783335Z 2023-01-11T21:38:06.1783409Z import triton 2023-01-11T21:38:06.1783502Z import triton.language as tl 2023-01-11T21:38:06.1783626Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1783771Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1783776Z 2023-01-11T21:38:06.1783781Z 2023-01-11T21:38:06.1783917Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1784117Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1784239Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1784344Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1784408Z { 2023-01-11T21:38:06.1784509Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1784576Z { 2023-01-11T21:38:06.1784656Z #pragma omp for 2023-01-11T21:38:06.1784741Z for(long i0=0; i0<125; i0+=1) 2023-01-11T21:38:06.1784807Z { 2023-01-11T21:38:06.1784945Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1785081Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1785171Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1785269Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1785341Z } 2023-01-11T21:38:06.1785454Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1785551Z for(long i0=1000; i0<1000; i0+=1) 2023-01-11T21:38:06.1785625Z { 2023-01-11T21:38:06.1785730Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1785832Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1785921Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1786005Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1786072Z } 2023-01-11T21:38:06.1786130Z } 2023-01-11T21:38:06.1786195Z } 2023-01-11T21:38:06.1786283Z ''') 2023-01-11T21:38:06.1786289Z 2023-01-11T21:38:06.1786293Z 2023-01-11T21:38:06.1786386Z async_compile.wait(globals()) 2023-01-11T21:38:06.1786463Z del async_compile 2023-01-11T21:38:06.1786468Z 2023-01-11T21:38:06.1786542Z def call(args): 2023-01-11T21:38:06.1786617Z arg0_1, = args 2023-01-11T21:38:06.1786684Z args.clear() 2023-01-11T21:38:06.1786922Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1787066Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1787140Z del arg0_1 2023-01-11T21:38:06.1787215Z return (buf0, ) 2023-01-11T21:38:06.1787221Z 2023-01-11T21:38:06.1787225Z 2023-01-11T21:38:06.1787306Z if __name__ == "__main__": 
2023-01-11T21:38:06.1787423Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1787541Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1787742Z arg0_1 = rand_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1787857Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1787862Z 2023-01-11T21:38:06.1787866Z 2023-01-11T21:38:06.1787963Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1788036Z import torch 2023-01-11T21:38:06.1788111Z import random 2023-01-11T21:38:06.1788234Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1788357Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1788362Z 2023-01-11T21:38:06.1788437Z aten = torch.ops.aten 2023-01-11T21:38:06.1788572Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1788665Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1788671Z 2023-01-11T21:38:06.1788744Z import triton 2023-01-11T21:38:06.1788837Z import triton.language as tl 2023-01-11T21:38:06.1788960Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1789099Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1789131Z 2023-01-11T21:38:06.1789136Z 2023-01-11T21:38:06.1789272Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1789469Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1789593Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1789699Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1789765Z { 2023-01-11T21:38:06.1789866Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1789931Z { 2023-01-11T21:38:06.1790012Z #pragma omp for 2023-01-11T21:38:06.1790092Z for(long i0=0; i0<125; i0+=1) 2023-01-11T21:38:06.1790159Z { 2023-01-11T21:38:06.1790295Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1790432Z auto tmp1 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.1790520Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1790660Z auto tmp3 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1790748Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1790843Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1790903Z } 2023-01-11T21:38:06.1791002Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1791096Z for(long i0=1000; i0<1000; i0+=1) 2023-01-11T21:38:06.1791163Z { 2023-01-11T21:38:06.1791252Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1791356Z auto tmp1 = static_cast(2); 2023-01-11T21:38:06.1791445Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1791540Z auto tmp3 = static_cast(1); 2023-01-11T21:38:06.1791628Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1791713Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1791778Z } 2023-01-11T21:38:06.1791843Z } 2023-01-11T21:38:06.1791907Z } 2023-01-11T21:38:06.1791984Z ''') 2023-01-11T21:38:06.1791996Z 2023-01-11T21:38:06.1792004Z 2023-01-11T21:38:06.1792090Z async_compile.wait(globals()) 2023-01-11T21:38:06.1792166Z del async_compile 2023-01-11T21:38:06.1792171Z 2023-01-11T21:38:06.1792247Z def call(args): 2023-01-11T21:38:06.1792322Z arg0_1, = args 2023-01-11T21:38:06.1792397Z args.clear() 2023-01-11T21:38:06.1792634Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1792773Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1792839Z del arg0_1 2023-01-11T21:38:06.1792914Z return 
(buf0, ) 2023-01-11T21:38:06.1792919Z 2023-01-11T21:38:06.1792923Z 2023-01-11T21:38:06.1793003Z if __name__ == "__main__": 2023-01-11T21:38:06.1793120Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1793246Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1793446Z arg0_1 = rand_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1793562Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1793567Z 2023-01-11T21:38:06.1793637Z ok (6.775s) 2023-01-11T21:38:06.1794088Z test_views3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1794221Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1794478Z [2023-01-11 21:33:35,243] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 358 2023-01-11T21:38:06.1794740Z [2023-01-11 21:33:36,954] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 358 2023-01-11T21:38:06.1794746Z 2023-01-11T21:38:06.1794844Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1794948Z import torch 2023-01-11T21:38:06.1795027Z import random 2023-01-11T21:38:06.1795146Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1795295Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1795301Z 2023-01-11T21:38:06.1795385Z aten = torch.ops.aten 2023-01-11T21:38:06.1795538Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1795637Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1795642Z 2023-01-11T21:38:06.1795716Z import triton 2023-01-11T21:38:06.1795811Z import triton.language as tl 2023-01-11T21:38:06.1795935Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1796073Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1796079Z 2023-01-11T21:38:06.1796084Z 2023-01-11T21:38:06.1796220Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1796418Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1796543Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:06.1796653Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1796757Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1796825Z { 2023-01-11T21:38:06.1796930Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1796996Z { 2023-01-11T21:38:06.1797071Z #pragma omp for 2023-01-11T21:38:06.1797159Z for(long i0=0; i0<744; i0+=1) 2023-01-11T21:38:06.1797228Z { 2023-01-11T21:38:06.1797313Z #pragma GCC ivdep 2023-01-11T21:38:06.1797406Z for(long i1=0; i1<192; i1+=1) 2023-01-11T21:38:06.1797475Z { 2023-01-11T21:38:06.1797546Z { 2023-01-11T21:38:06.1797610Z { 2023-01-11T21:38:06.1797723Z auto tmp0 = in_ptr0[(3*i0) + (i1 / 64)]; 2023-01-11T21:38:06.1797838Z auto tmp1 = in_ptr1[(64*tmp0) + (i1 % 64)]; 2023-01-11T21:38:06.1797943Z out_ptr0[i1 + (192*i0)] = tmp1; 2023-01-11T21:38:06.1798013Z } 2023-01-11T21:38:06.1798081Z } 2023-01-11T21:38:06.1798150Z } 2023-01-11T21:38:06.1798209Z } 2023-01-11T21:38:06.1798275Z } 2023-01-11T21:38:06.1798339Z } 2023-01-11T21:38:06.1798430Z ''') 
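Most of the C++ kernels in this log share one shape: an 8-lane vectorized main loop over at::vec chunks, then a scalar tail loop, and the tail bounds frequently come out empty. For example, 1000 elements split into 125 full 8-wide iterations leave the degenerate tail `for(long i0=1000; i0<1000; i0+=1)`, while 10 elements leave a real tail over i0=8..9. A sketch of that trip-count arithmetic (illustrative only, not Inductor's own code):

    def split_loop(n, lanes=8):
        # The vectorized main loop runs n // lanes iterations over full
        # 8-lane chunks; the scalar tail covers whatever elements remain.
        main_iters = n // lanes
        tail_start = main_iters * lanes
        return main_iters, tail_start, n

    print(split_loop(1000))  # (125, 1000, 1000) -> tail loop is empty
    print(split_loop(10))    # (1, 8, 10)        -> tail covers i0 = 8, 9
    print(split_loop(16))    # (2, 16, 16)       -> tail loop is empty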
2023-01-11T21:38:06.1798464Z 2023-01-11T21:38:06.1798469Z 2023-01-11T21:38:06.1798564Z async_compile.wait(globals()) 2023-01-11T21:38:06.1798643Z del async_compile 2023-01-11T21:38:06.1798648Z 2023-01-11T21:38:06.1798724Z def call(args): 2023-01-11T21:38:06.1798796Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1798873Z args.clear() 2023-01-11T21:38:06.1799104Z buf0 = empty_strided((1, 12, 62, 192), (142848, 11904, 192, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1799272Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1799346Z del arg0_1 2023-01-11T21:38:06.1799421Z del arg1_1 2023-01-11T21:38:06.1799496Z return (buf0, ) 2023-01-11T21:38:06.1799501Z 2023-01-11T21:38:06.1799506Z 2023-01-11T21:38:06.1799585Z if __name__ == "__main__": 2023-01-11T21:38:06.1799696Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1799821Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1800022Z arg0_1 = rand_strided((64, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1800216Z arg1_1 = rand_strided((2232, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1800338Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1800343Z 2023-01-11T21:38:06.1800415Z ok (1.769s) 2023-01-11T21:38:06.1800934Z test_zero_dim_reductions_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.1801058Z warnings.warn( 2023-01-11T21:38:06.1801306Z [2023-01-11 21:33:37,014] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 359 2023-01-11T21:38:06.1801565Z [2023-01-11 21:33:38,661] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 359 2023-01-11T21:38:06.1801819Z [2023-01-11 21:33:38,718] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 360 2023-01-11T21:38:06.1802080Z [2023-01-11 21:33:38,724] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 360 2023-01-11T21:38:06.1802086Z 2023-01-11T21:38:06.1802184Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1802259Z import torch 2023-01-11T21:38:06.1802334Z import random 2023-01-11T21:38:06.1802454Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1802572Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1802584Z 2023-01-11T21:38:06.1802663Z aten = torch.ops.aten 2023-01-11T21:38:06.1802799Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1802896Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1802901Z 2023-01-11T21:38:06.1802975Z import triton 2023-01-11T21:38:06.1803070Z import triton.language as tl 2023-01-11T21:38:06.1803198Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1803337Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1803343Z 2023-01-11T21:38:06.1803347Z 2023-01-11T21:38:06.1803477Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1803679Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1803794Z extern "C" void kernel(bool* __restrict__ out_ptr0) 2023-01-11T21:38:06.1803860Z { 2023-01-11T21:38:06.1803942Z #pragma GCC 
ivdep 2023-01-11T21:38:06.1804027Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1804093Z { 2023-01-11T21:38:06.1804156Z { 2023-01-11T21:38:06.1804226Z { 2023-01-11T21:38:06.1804338Z auto tmp0 = static_cast(false); 2023-01-11T21:38:06.1804428Z auto tmp1 = tmp0 == 0; 2023-01-11T21:38:06.1804515Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.1804583Z } 2023-01-11T21:38:06.1804678Z } 2023-01-11T21:38:06.1804741Z } 2023-01-11T21:38:06.1804807Z } 2023-01-11T21:38:06.1804891Z ''') 2023-01-11T21:38:06.1804897Z 2023-01-11T21:38:06.1804901Z 2023-01-11T21:38:06.1804996Z async_compile.wait(globals()) 2023-01-11T21:38:06.1805073Z del async_compile 2023-01-11T21:38:06.1805079Z 2023-01-11T21:38:06.1805154Z def call(args): 2023-01-11T21:38:06.1805232Z arg0_1, = args 2023-01-11T21:38:06.1805300Z args.clear() 2023-01-11T21:38:06.1805491Z buf0 = empty_strided((2, 1), (1, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:06.1805599Z kernel_cpp_0(c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1805674Z return (buf0, ) 2023-01-11T21:38:06.1805682Z 2023-01-11T21:38:06.1805686Z 2023-01-11T21:38:06.1805767Z if __name__ == "__main__": 2023-01-11T21:38:06.1805882Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1806007Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1806205Z arg0_1 = rand_strided((2, 0), (1, 1), device='cpu', dtype=torch.float16) 2023-01-11T21:38:06.1806310Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1806315Z 2023-01-11T21:38:06.1806327Z 2023-01-11T21:38:06.1806417Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1806491Z import torch 2023-01-11T21:38:06.1806565Z import random 2023-01-11T21:38:06.1806683Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1806805Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1806810Z 2023-01-11T21:38:06.1806891Z aten = torch.ops.aten 2023-01-11T21:38:06.1807025Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1807143Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1807148Z 2023-01-11T21:38:06.1807222Z import triton 2023-01-11T21:38:06.1807314Z import triton.language as tl 2023-01-11T21:38:06.1807437Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1807574Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1807582Z 2023-01-11T21:38:06.1807586Z 2023-01-11T21:38:06.1807721Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1807929Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1808044Z extern "C" void kernel(bool* __restrict__ out_ptr0) 2023-01-11T21:38:06.1808103Z { 2023-01-11T21:38:06.1808182Z #pragma GCC ivdep 2023-01-11T21:38:06.1808267Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1808333Z { 2023-01-11T21:38:06.1808402Z { 2023-01-11T21:38:06.1808471Z { 2023-01-11T21:38:06.1808573Z auto tmp0 = static_cast(false); 2023-01-11T21:38:06.1808666Z auto tmp1 = tmp0 == 0; 2023-01-11T21:38:06.1808754Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.1808822Z } 2023-01-11T21:38:06.1808889Z } 2023-01-11T21:38:06.1808955Z } 2023-01-11T21:38:06.1809019Z } 2023-01-11T21:38:06.1809096Z ''') 2023-01-11T21:38:06.1809101Z 2023-01-11T21:38:06.1809115Z 2023-01-11T21:38:06.1809202Z async_compile.wait(globals()) 2023-01-11T21:38:06.1809282Z del async_compile 2023-01-11T21:38:06.1809287Z 2023-01-11T21:38:06.1809361Z def call(args): 2023-01-11T21:38:06.1809435Z arg0_1, = args 
2023-01-11T21:38:06.1809511Z args.clear() 2023-01-11T21:38:06.1809700Z buf0 = empty_strided((2, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:06.1809807Z kernel_cpp_0(c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1809876Z return (buf0, ) 2023-01-11T21:38:06.1809881Z 2023-01-11T21:38:06.1809886Z 2023-01-11T21:38:06.1809965Z if __name__ == "__main__": 2023-01-11T21:38:06.1810083Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1810208Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1810406Z arg0_1 = rand_strided((2, 0), (1, 1), device='cpu', dtype=torch.float16) 2023-01-11T21:38:06.1810518Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1810523Z 2023-01-11T21:38:06.1810624Z ok (1.770s) 2023-01-11T21:38:06.1811076Z test_zeros_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1811208Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1811458Z [2023-01-11 21:33:38,831] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 361 2023-01-11T21:38:06.1811722Z [2023-01-11 21:33:40,564] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 361 2023-01-11T21:38:06.1811728Z 2023-01-11T21:38:06.1811826Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1811900Z import torch 2023-01-11T21:38:06.1811974Z import random 2023-01-11T21:38:06.1812092Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1812214Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1812219Z 2023-01-11T21:38:06.1812300Z aten = torch.ops.aten 2023-01-11T21:38:06.1812428Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1812523Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1812528Z 2023-01-11T21:38:06.1812603Z import triton 2023-01-11T21:38:06.1812695Z import triton.language as tl 2023-01-11T21:38:06.1812819Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1812960Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1813005Z 2023-01-11T21:38:06.1813010Z 2023-01-11T21:38:06.1813148Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1813352Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1813470Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1813575Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1813677Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.1813778Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1813877Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.1813976Z float* __restrict__ out_ptr4, 2023-01-11T21:38:06.1814072Z float* __restrict__ out_ptr5) 2023-01-11T21:38:06.1814131Z { 2023-01-11T21:38:06.1814232Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1814299Z { 2023-01-11T21:38:06.1814384Z #pragma omp for 2023-01-11T21:38:06.1814470Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.1814661Z { 2023-01-11T21:38:06.1814803Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1814934Z auto tmp1 =
at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.1815029Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1815125Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1815220Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.1815287Z } 2023-01-11T21:38:06.1815385Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1815471Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:06.1815530Z { 2023-01-11T21:38:06.1815621Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1815725Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:06.1815814Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1815899Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1815988Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:06.1816056Z } 2023-01-11T21:38:06.1816130Z #pragma omp for 2023-01-11T21:38:06.1816219Z for(long i0=0; i0<4096; i0+=1) 2023-01-11T21:38:06.1816286Z { 2023-01-11T21:38:06.1816468Z auto tmp0 = at::vec::Vectorized<float>(static_cast<float>(0)); 2023-01-11T21:38:06.1816568Z tmp0.store(out_ptr2 + 8*i0); 2023-01-11T21:38:06.1816636Z } 2023-01-11T21:38:06.1816741Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1816833Z for(long i0=32768; i0<32768; i0+=1) 2023-01-11T21:38:06.1816899Z { 2023-01-11T21:38:06.1817008Z auto tmp0 = static_cast<float>(0); 2023-01-11T21:38:06.1817096Z out_ptr2[i0] = tmp0; 2023-01-11T21:38:06.1817252Z } 2023-01-11T21:38:06.1817343Z #pragma omp for 2023-01-11T21:38:06.1817431Z for(long i0=0; i0<4096; i0+=1) 2023-01-11T21:38:06.1817490Z { 2023-01-11T21:38:06.1817630Z auto tmp0 = at::vec::Vectorized<float>(static_cast<float>(0)); 2023-01-11T21:38:06.1817726Z tmp0.store(out_ptr3 + 8*i0); 2023-01-11T21:38:06.1817795Z } 2023-01-11T21:38:06.1817893Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1817986Z for(long i0=32768; i0<32768; i0+=1) 2023-01-11T21:38:06.1818056Z { 2023-01-11T21:38:06.1818151Z auto tmp0 = static_cast<float>(0); 2023-01-11T21:38:06.1818235Z out_ptr3[i0] = tmp0; 2023-01-11T21:38:06.1818302Z } 2023-01-11T21:38:06.1818386Z #pragma omp single 2023-01-11T21:38:06.1818454Z { 2023-01-11T21:38:06.1818539Z #pragma GCC ivdep 2023-01-11T21:38:06.1818619Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:06.1818687Z { 2023-01-11T21:38:06.1818755Z { 2023-01-11T21:38:06.1818826Z { 2023-01-11T21:38:06.1818938Z auto tmp0 = static_cast<float>(0); 2023-01-11T21:38:06.1819125Z out_ptr4[i0] = tmp0; 2023-01-11T21:38:06.1819198Z } 2023-01-11T21:38:06.1819262Z } 2023-01-11T21:38:06.1819332Z } 2023-01-11T21:38:06.1819402Z } 2023-01-11T21:38:06.1819489Z #pragma omp single 2023-01-11T21:38:06.1819558Z { 2023-01-11T21:38:06.1819648Z #pragma GCC ivdep 2023-01-11T21:38:06.1819738Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:06.1819802Z { 2023-01-11T21:38:06.1819874Z { 2023-01-11T21:38:06.1819947Z { 2023-01-11T21:38:06.1820065Z auto tmp0 = static_cast<float>(3.1416); 2023-01-11T21:38:06.1820162Z out_ptr5[i0] = tmp0; 2023-01-11T21:38:06.1820236Z } 2023-01-11T21:38:06.1820310Z } 2023-01-11T21:38:06.1820373Z } 2023-01-11T21:38:06.1820442Z } 2023-01-11T21:38:06.1820513Z } 2023-01-11T21:38:06.1820580Z } 2023-01-11T21:38:06.1820677Z ''') 2023-01-11T21:38:06.1820683Z 2023-01-11T21:38:06.1820687Z 2023-01-11T21:38:06.1820783Z async_compile.wait(globals()) 2023-01-11T21:38:06.1820862Z del async_compile 2023-01-11T21:38:06.1820867Z 2023-01-11T21:38:06.1820936Z def call(args): 2023-01-11T21:38:06.1821012Z arg0_1, = args 2023-01-11T21:38:06.1821092Z args.clear() 2023-01-11T21:38:06.1821294Z buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1821488Z buf4 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1821712Z buf1 =
empty_strided((1, 8, 64, 64), (32768, 4096, 64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1821934Z buf2 = empty_strided((1, 8, 64, 64), (32768, 4096, 64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1822121Z buf3 = empty_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1822315Z buf5 = empty_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1822579Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf5.data_ptr())) 2023-01-11T21:38:06.1822655Z del arg0_1 2023-01-11T21:38:06.1822763Z return (buf0, buf1, buf2, buf3, buf4, buf5, ) 2023-01-11T21:38:06.1822798Z 2023-01-11T21:38:06.1822804Z 2023-01-11T21:38:06.1822885Z if __name__ == "__main__": 2023-01-11T21:38:06.1823006Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1823131Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1823322Z arg0_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1823428Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1823433Z 2023-01-11T21:38:06.1823504Z ok (1.841s) 2023-01-11T21:38:06.1824027Z test_accuracy_issue1 (__main__.CudaReproTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.1824110Z warnings.warn( 2023-01-11T21:38:06.1824369Z [2023-01-11 21:33:41,585] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 362 2023-01-11T21:38:06.1824579Z [2023-01-11 21:33:41,613] torch._inductor.scheduler: [DEBUG] remove_buffer('buf2') 2023-01-11T21:38:06.1824782Z [2023-01-11 21:33:41,613] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.1825044Z [2023-01-11 21:33:42,055] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 362 2023-01-11T21:38:06.1825298Z [2023-01-11 21:33:42,194] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling BACKWARDS graph 362 2023-01-11T21:38:06.1825499Z [2023-01-11 21:33:42,232] torch._inductor.scheduler: [DEBUG] remove_buffer('buf2') 2023-01-11T21:38:06.1825533Z 2023-01-11T21:38:06.1825654Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1825733Z import torch 2023-01-11T21:38:06.1825826Z import random 2023-01-11T21:38:06.1825953Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1826075Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1826080Z 2023-01-11T21:38:06.1826166Z aten = torch.ops.aten 2023-01-11T21:38:06.1826295Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1826391Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1826396Z 2023-01-11T21:38:06.1826470Z import triton 2023-01-11T21:38:06.1826562Z import triton.language as tl 2023-01-11T21:38:06.1826688Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1826828Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1826834Z 2023-01-11T21:38:06.1826838Z 2023-01-11T21:38:06.1827083Z triton_fused_addmm_amax_clone_exp_permute_split_squeeze_sub_sub_1_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.1827161Z import triton 2023-01-11T21:38:06.1827246Z import 
triton.language as tl 2023-01-11T21:38:06.1827363Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1827465Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1827597Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.1827724Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1827729Z 2023-01-11T21:38:06.1827819Z @reduction(size_hints=[1, 128], 2023-01-11T21:38:06.1827933Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.1828018Z filename=__file__, 2023-01-11T21:38:06.1828380Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.1828455Z @triton.jit 2023-01-11T21:38:06.1828624Z def triton_(in_ptr0, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.1828700Z xnumel = 1 2023-01-11T21:38:06.1828774Z rnumel = 128 2023-01-11T21:38:06.1828870Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1829005Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.1829082Z xmask = xindex < xnumel 2023-01-11T21:38:06.1829228Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.1829413Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.1829518Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.1829609Z rindex = roffset + rbase 2023-01-11T21:38:06.1829695Z rmask = rindex < rnumel 2023-01-11T21:38:06.1829768Z r0 = rindex 2023-01-11T21:38:06.1829958Z tmp0 = tl.load(in_ptr0 + (2*r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1830090Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.1830206Z tmp1 = tl.reshape(tl.max(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.1830322Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.1830428Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.1830515Z rindex = roffset + rbase 2023-01-11T21:38:06.1830601Z rmask = rindex < rnumel 2023-01-11T21:38:06.1830675Z r0 = rindex 2023-01-11T21:38:06.1830864Z tmp2 = tl.load(in_ptr0 + (2*r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1830980Z tmp3 = tmp2 - tmp1 2023-01-11T21:38:06.1831063Z tmp4 = tl.exp(tmp3) 2023-01-11T21:38:06.1831184Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.1831297Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.1831400Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.1831489Z rindex = roffset + rbase 2023-01-11T21:38:06.1831566Z rmask = rindex < rnumel 2023-01-11T21:38:06.1831639Z r0 = rindex 2023-01-11T21:38:06.1831774Z tmp6 = tl.load(in_ptr0 + (2*r0), rmask) 2023-01-11T21:38:06.1831889Z tmp7 = tmp6 - tmp1 2023-01-11T21:38:06.1831970Z tmp8 = tl.log(tmp5) 2023-01-11T21:38:06.1832083Z tmp9 = tmp7 - tmp8 2023-01-11T21:38:06.1832239Z tl.store(out_ptr2 + (r0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp9, rmask & xmask) 2023-01-11T21:38:06.1832317Z ''') 2023-01-11T21:38:06.1832322Z 2023-01-11T21:38:06.1832326Z 2023-01-11T21:38:06.1832604Z triton_fused_convert_element_type_div_gather_ne_neg_scalar_tensor_squeeze_1_sub_1_sum_2_sum_3_1 = async_compile.triton(''' 2023-01-11T21:38:06.1832680Z import triton 2023-01-11T21:38:06.1832776Z import triton.language as tl 2023-01-11T21:38:06.1832890Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1832992Z from 
torch._inductor.ir import TileHint 2023-01-11T21:38:06.1833125Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1833251Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1833259Z 2023-01-11T21:38:06.1833692Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*i64', 3: '*i1', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1833771Z @triton.jit 2023-01-11T21:38:06.1833932Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1834005Z xnumel = 1 2023-01-11T21:38:06.1834107Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1834238Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1834323Z xmask = xindex < xnumel 2023-01-11T21:38:06.1834559Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.1834684Z tmp5 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.1834758Z tmp1 = 0 2023-01-11T21:38:06.1834897Z tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1)) 2023-01-11T21:38:06.1834971Z tmp3 = 128 2023-01-11T21:38:06.1835105Z tmp4 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp3, tmp2, tmp3)) 2023-01-11T21:38:06.1835268Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp1, tmp5, tmp1)) 2023-01-11T21:38:06.1835401Z tmp7 = tl.where(tmp6 != tmp6, tmp6, tl.where(tmp6 < tmp3, tmp6, tmp3)) 2023-01-11T21:38:06.1835475Z tmp8 = tmp7 != tmp3 2023-01-11T21:38:06.1835576Z tmp9 = tl.load(in_ptr1 + (tmp4), None) 2023-01-11T21:38:06.1835683Z tmp10 = -tmp9 2023-01-11T21:38:06.1835783Z tmp11 = tl.where(tmp8, tmp10, tmp1) 2023-01-11T21:38:06.1835871Z tmp12 = tmp8.to(tl.int64) 2023-01-11T21:38:06.1835963Z tmp13 = tmp12.to(tl.float32) 2023-01-11T21:38:06.1836044Z tmp14 = tmp11 / tmp13 2023-01-11T21:38:06.1836171Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp4, None) 2023-01-11T21:38:06.1836306Z tl.store(out_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp8, None) 2023-01-11T21:38:06.1836443Z tl.store(out_ptr2 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp14, None) 2023-01-11T21:38:06.1836530Z ''') 2023-01-11T21:38:06.1836536Z 2023-01-11T21:38:06.1836540Z 2023-01-11T21:38:06.1836634Z async_compile.wait(globals()) 2023-01-11T21:38:06.1836720Z del async_compile 2023-01-11T21:38:06.1836725Z 2023-01-11T21:38:06.1836801Z def call(args): 2023-01-11T21:38:06.1836920Z primals_1, primals_2, primals_3, primals_4 = args 2023-01-11T21:38:06.1836989Z args.clear() 2023-01-11T21:38:06.1837082Z with torch.cuda.device(0): 2023-01-11T21:38:06.1837287Z buf0 = empty_strided((128, 2), (2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1837481Z aten.addmm.out(primals_2, as_strided(primals_4, (128, 768), (768, 1)), as_strided(primals_1, (768, 2), (1, 768)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.1837560Z del primals_2 2023-01-11T21:38:06.1837765Z buf3 = empty_strided((1, 128), (128, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1837886Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1838080Z triton_fused_addmm_amax_clone_exp_permute_split_squeeze_sub_sub_1_sum_1_0.run(buf0, buf3, 1, 128, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.1838145Z del buf0 2023-01-11T21:38:06.1838345Z buf4 = empty_strided((1, 1), (1, 1), device='cuda', dtype=torch.int64) 
2023-01-11T21:38:06.1838535Z buf5 = empty_strided((1, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.1838721Z buf6 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1838951Z triton_fused_convert_element_type_div_gather_ne_neg_scalar_tensor_squeeze_1_sub_1_sum_2_sum_3_1.run(primals_3, buf3, buf4, buf5, buf6, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.1839031Z del primals_3 2023-01-11T21:38:06.1839200Z return (buf6, as_strided(primals_4, (128, 768), (768, 1)), buf3, buf4, buf5, as_strided(primals_1, (2, 768), (768, 1)), ) 2023-01-11T21:38:06.1839209Z 2023-01-11T21:38:06.1839213Z 2023-01-11T21:38:06.1839296Z if __name__ == "__main__": 2023-01-11T21:38:06.1839408Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1839536Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1839750Z primals_1 = rand_strided((2, 768), (768, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1839954Z primals_2 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1840153Z primals_3 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.1840378Z primals_4 = rand_strided((1, 128, 768), (98304, 768, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1840533Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4])) 2023-01-11T21:38:06.1840538Z 2023-01-11T21:38:06.1840543Z 2023-01-11T21:38:06.1840643Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1840722Z import torch 2023-01-11T21:38:06.1840790Z import random 2023-01-11T21:38:06.1840911Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1841036Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1841041Z 2023-01-11T21:38:06.1841127Z aten = torch.ops.aten 2023-01-11T21:38:06.1841291Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1841389Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1841394Z 2023-01-11T21:38:06.1841469Z import triton 2023-01-11T21:38:06.1841555Z import triton.language as tl 2023-01-11T21:38:06.1841680Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1841823Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1841828Z 2023-01-11T21:38:06.1841833Z 2023-01-11T21:38:06.1842013Z triton_fused_scatter_zeros_like_0 = async_compile.triton(''' 2023-01-11T21:38:06.1842088Z import triton 2023-01-11T21:38:06.1842181Z import triton.language as tl 2023-01-11T21:38:06.1842298Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1842400Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1842525Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1842650Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1842656Z 2023-01-11T21:38:06.1843048Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.1843128Z @triton.jit 2023-01-11T21:38:06.1843250Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1843324Z xnumel = 128 2023-01-11T21:38:06.1843422Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1843550Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1843626Z xmask = xindex < xnumel 
2023-01-11T21:38:06.1843728Z x0 = xindex 2023-01-11T21:38:06.1843800Z tmp0 = 0 2023-01-11T21:38:06.1843934Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.1844020Z ''') 2023-01-11T21:38:06.1844026Z 2023-01-11T21:38:06.1844030Z 2023-01-11T21:38:06.1844209Z triton_fused_scatter_zeros_like_1 = async_compile.triton(''' 2023-01-11T21:38:06.1844288Z import triton 2023-01-11T21:38:06.1844373Z import triton.language as tl 2023-01-11T21:38:06.1844488Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1844590Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1844722Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1844846Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1844851Z 2023-01-11T21:38:06.1845256Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.1845333Z @triton.jit 2023-01-11T21:38:06.1845488Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1845556Z xnumel = 1 2023-01-11T21:38:06.1845674Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1845811Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1845895Z xmask = xindex < xnumel 2023-01-11T21:38:06.1846128Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.1846230Z tmp1 = -1.0 2023-01-11T21:38:06.1846334Z tl.store(out_ptr0 + (tmp0), tmp1, None) 2023-01-11T21:38:06.1846411Z ''') 2023-01-11T21:38:06.1846423Z 2023-01-11T21:38:06.1846428Z 2023-01-11T21:38:06.1846695Z triton_fused_cat_convert_element_type_div_1_mul_ne_2_scalar_tensor_scatter_sum_2_sum_4_where_1_2 = async_compile.triton(''' 2023-01-11T21:38:06.1846775Z import triton 2023-01-11T21:38:06.1846873Z import triton.language as tl 2023-01-11T21:38:06.1846990Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1847092Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1847223Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.1847348Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1847353Z 2023-01-11T21:38:06.1847471Z @reduction(size_hints=[1, 128], 2023-01-11T21:38:06.1847582Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.1847669Z filename=__file__, 2023-01-11T21:38:06.1848088Z meta={'signature': {0: '*fp32', 1: '*i64', 2: '*fp32', 3: '*i1', 4: '*fp32', 5: '*fp32', 6: 'i32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 7), equal_to_1=())]}) 2023-01-11T21:38:06.1848163Z @triton.jit 2023-01-11T21:38:06.1848361Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.1848439Z xnumel = 1 2023-01-11T21:38:06.1848513Z rnumel = 128 2023-01-11T21:38:06.1848603Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1848740Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.1848826Z xmask = xindex < xnumel 2023-01-11T21:38:06.1848948Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.1849194Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None, eviction_policy='evict_last') 
2023-01-11T21:38:06.1849439Z tmp4 = tl.load(in_ptr2 + (0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.1849677Z tmp5 = tl.load(in_ptr3 + (0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.1849800Z _tmp12 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.1849900Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.1850019Z rindex = roffset + rbase 2023-01-11T21:38:06.1850109Z rmask = rindex < rnumel 2023-01-11T21:38:06.1850181Z r0 = rindex 2023-01-11T21:38:06.1850375Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1850451Z tmp2 = 128 2023-01-11T21:38:06.1850538Z tmp3 = tmp1 != tmp2 2023-01-11T21:38:06.1850621Z tmp6 = tmp5.to(tl.int64) 2023-01-11T21:38:06.1850713Z tmp7 = tmp6.to(tl.float32) 2023-01-11T21:38:06.1850795Z tmp8 = tmp4 / tmp7 2023-01-11T21:38:06.1850868Z tmp9 = 0 2023-01-11T21:38:06.1850969Z tmp10 = tl.where(tmp3, tmp8, tmp9) 2023-01-11T21:38:06.1851052Z tmp11 = tmp0 * tmp10 2023-01-11T21:38:06.1851175Z _tmp12 = tl.where(xmask & rmask, _tmp12 + tmp11, _tmp12) 2023-01-11T21:38:06.1851284Z tmp12 = tl.reshape(tl.sum(_tmp12, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.1851424Z tmp14 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None) 2023-01-11T21:38:06.1851569Z tmp17 = tl.load(in_ptr2 + (0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None) 2023-01-11T21:38:06.1851706Z tmp18 = tl.load(in_ptr3 + (0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None) 2023-01-11T21:38:06.1851813Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.1851902Z rindex = roffset + rbase 2023-01-11T21:38:06.1851991Z rmask = rindex < rnumel 2023-01-11T21:38:06.1852056Z r0 = rindex 2023-01-11T21:38:06.1852158Z tmp13 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.1852262Z tmp25 = tl.load(in_ptr4 + (r0), rmask) 2023-01-11T21:38:06.1852336Z tmp15 = 128 2023-01-11T21:38:06.1852420Z tmp16 = tmp14 != tmp15 2023-01-11T21:38:06.1852513Z tmp19 = tmp18.to(tl.int64) 2023-01-11T21:38:06.1852610Z tmp20 = tmp19.to(tl.float32) 2023-01-11T21:38:06.1852687Z tmp21 = tmp17 / tmp20 2023-01-11T21:38:06.1852758Z tmp22 = 0 2023-01-11T21:38:06.1852863Z tmp23 = tl.where(tmp16, tmp21, tmp22) 2023-01-11T21:38:06.1852949Z tmp24 = tmp13 * tmp23 2023-01-11T21:38:06.1853034Z tmp26 = tl.exp(tmp25) 2023-01-11T21:38:06.1853115Z tmp27 = tmp26 * tmp12 2023-01-11T21:38:06.1853233Z tmp28 = tmp24 - tmp27 2023-01-11T21:38:06.1853380Z tl.store(out_ptr1 + (2*r0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp28, rmask & xmask) 2023-01-11T21:38:06.1853492Z ''') 2023-01-11T21:38:06.1853497Z 2023-01-11T21:38:06.1853502Z 2023-01-11T21:38:06.1853664Z triton_fused_zeros_3 = async_compile.triton(''' 2023-01-11T21:38:06.1853738Z import triton 2023-01-11T21:38:06.1853834Z import triton.language as tl 2023-01-11T21:38:06.1853948Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1854051Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1854184Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1854302Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1854307Z 2023-01-11T21:38:06.1854808Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(1,), equal_to_1=())]}) 2023-01-11T21:38:06.1854887Z @triton.jit 2023-01-11T21:38:06.1855010Z def triton_(out_ptr0, xnumel, XBLOCK : 
tl.constexpr): 2023-01-11T21:38:06.1855088Z xnumel = 128 2023-01-11T21:38:06.1855185Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1855313Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1855396Z xmask = xindex < xnumel 2023-01-11T21:38:06.1855460Z x0 = xindex 2023-01-11T21:38:06.1855531Z tmp0 = 0 2023-01-11T21:38:06.1855668Z tl.store(out_ptr0 + (2*x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.1855755Z ''') 2023-01-11T21:38:06.1855760Z 2023-01-11T21:38:06.1855764Z 2023-01-11T21:38:06.1855940Z triton_fused_cat_sum_5_view_2_4 = async_compile.triton(''' 2023-01-11T21:38:06.1856016Z import triton 2023-01-11T21:38:06.1856175Z import triton.language as tl 2023-01-11T21:38:06.1856289Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1856397Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1856541Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.1856676Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1856684Z 2023-01-11T21:38:06.1856777Z @reduction(size_hints=[2, 128], 2023-01-11T21:38:06.1856903Z reduction_hint=ReductionHint.OUTER, 2023-01-11T21:38:06.1856992Z filename=__file__, 2023-01-11T21:38:06.1857450Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.1857518Z @triton.jit 2023-01-11T21:38:06.1857686Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.1857763Z xnumel = 2 2023-01-11T21:38:06.1857836Z rnumel = 128 2023-01-11T21:38:06.1857934Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1858070Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.1858152Z xmask = xindex < xnumel 2023-01-11T21:38:06.1858267Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.1858339Z x0 = xindex 2023-01-11T21:38:06.1858458Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.1858564Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.1858652Z rindex = roffset + rbase 2023-01-11T21:38:06.1858736Z rmask = rindex < rnumel 2023-01-11T21:38:06.1858807Z r1 = rindex 2023-01-11T21:38:06.1858918Z tmp0 = tl.load(in_ptr0 + (x0 + (2*r1)), rmask & xmask) 2023-01-11T21:38:06.1859039Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.1859153Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.1859257Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.1859342Z ''') 2023-01-11T21:38:06.1859348Z 2023-01-11T21:38:06.1859352Z 2023-01-11T21:38:06.1859444Z async_compile.wait(globals()) 2023-01-11T21:38:06.1859522Z del async_compile 2023-01-11T21:38:06.1859527Z 2023-01-11T21:38:06.1859602Z def call(args): 2023-01-11T21:38:06.1859763Z view, sub_1, unsqueeze, ne, permute_1, tangents_1 = args 2023-01-11T21:38:06.1859840Z args.clear() 2023-01-11T21:38:06.1859932Z with torch.cuda.device(0): 2023-01-11T21:38:06.1860140Z buf0 = empty_strided((1, 128), (128, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1860234Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1860381Z triton_fused_scatter_zeros_like_0.run(buf0, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.1860538Z triton_fused_scatter_zeros_like_1.run(unsqueeze, buf0, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.1860740Z buf5 = 
empty_strided((1, 128, 2), (256, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1860857Z buf3 = as_strided(buf5, (1, 128, 1), (256, 2, 1)) # alias 2023-01-11T21:38:06.1861098Z triton_fused_cat_convert_element_type_div_1_mul_ne_2_scalar_tensor_scatter_sum_2_sum_4_where_1_2.run(buf0, unsqueeze, tangents_1, ne, sub_1, buf3, 1, 128, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.1861171Z del buf0 2023-01-11T21:38:06.1861243Z del ne 2023-01-11T21:38:06.1861315Z del sub_1 2023-01-11T21:38:06.1861394Z del tangents_1 2023-01-11T21:38:06.1861469Z del unsqueeze 2023-01-11T21:38:06.1861578Z buf4 = as_strided(buf5, (1, 128, 1), (256, 2, 1), 1) # alias 2023-01-11T21:38:06.1861714Z triton_fused_zeros_3.run(buf4, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.1861789Z del buf3 2023-01-11T21:38:06.1861871Z del buf4 2023-01-11T21:38:06.1862077Z buf6 = empty_strided((128, 768), (768, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1862239Z aten.mm.out(as_strided(buf5, (128, 2), (2, 1)), permute_1, out=buf6) 2023-01-11T21:38:06.1862318Z del permute_1 2023-01-11T21:38:06.1862515Z buf7 = empty_strided((2, 768), (768, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1862645Z aten.mm.out(as_strided(buf5, (2, 128), (1, 2)), view, out=buf7) 2023-01-11T21:38:06.1862720Z del view 2023-01-11T21:38:06.1862918Z buf8 = empty_strided((1, 2), (2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1863066Z triton_fused_cat_sum_5_view_2_4.run(buf5, buf8, 2, 128, grid=grid(2), stream=stream0) 2023-01-11T21:38:06.1863242Z return (as_strided(buf7, (2, 768), (768, 1)), as_strided(buf8, (2, ), (1, )), None, as_strided(buf6, (1, 128, 768), (98304, 768, 1)), ) 2023-01-11T21:38:06.1863248Z 2023-01-11T21:38:06.1863253Z 2023-01-11T21:38:06.1863333Z if __name__ == "__main__": 2023-01-11T21:38:06.1863451Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1863574Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1863779Z view = rand_strided((128, 768), (768, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1863981Z sub_1 = rand_strided((1, 128), (128, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1864184Z unsqueeze = rand_strided((1, 1), (1, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.1864368Z ne = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.1864578Z permute_1 = rand_strided((2, 768), (768, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1864772Z tangents_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1864941Z print_performance(lambda: call([view, sub_1, unsqueeze, ne, permute_1, tangents_1])) 2023-01-11T21:38:06.1865213Z [2023-01-11 21:33:42,577] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 362 2023-01-11T21:38:06.1865223Z 2023-01-11T21:38:06.1865287Z ok (2.011s) 2023-01-11T21:38:06.1865419Z test_autotune_inplace_kernel (__main__.CudaReproTests) 2023-01-11T21:38:06.1865616Z This UT tests autotune on an inplace kernel. The autotune should not contaminate ... ok (4.483s) 2023-01-11T21:38:06.1866007Z test_dtype_factory_issue (__main__.CudaReproTests) ... 
[2023-01-11 21:33:47,080] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:06.1866230Z [2023-01-11 21:33:47,084] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.randn 2023-01-11T21:38:06.1866490Z [2023-01-11 21:33:47,086] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:06.1866496Z 2023-01-11T21:38:06.1866595Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1866670Z import torch 2023-01-11T21:38:06.1866738Z import random 2023-01-11T21:38:06.1866857Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1866981Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1866990Z 2023-01-11T21:38:06.1867074Z aten = torch.ops.aten 2023-01-11T21:38:06.1867210Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1867308Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1867313Z 2023-01-11T21:38:06.1867389Z import triton 2023-01-11T21:38:06.1867484Z import triton.language as tl 2023-01-11T21:38:06.1867602Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1867742Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1867748Z 2023-01-11T21:38:06.1867752Z 2023-01-11T21:38:06.1867845Z async_compile.wait(globals()) 2023-01-11T21:38:06.1867924Z del async_compile 2023-01-11T21:38:06.1867929Z 2023-01-11T21:38:06.1868006Z def call(args): 2023-01-11T21:38:06.1868098Z with torch.cuda.device(0): 2023-01-11T21:38:06.1868357Z buf0 = aten.randn([12, 64, 1, 64], dtype=torch.float32, device=device(type='cuda', index=0), pin_memory=False) 2023-01-11T21:38:06.1868464Z buf1 = buf0 2023-01-11T21:38:06.1868575Z assert_size_stride(buf1, (12, 64, 1, 64), (4096, 64, 64, 1)) 2023-01-11T21:38:06.1868646Z del buf0 2023-01-11T21:38:06.1868767Z return (as_strided(buf1, (12, 64, 1, 64, 1), (4096, 64, 64, 1, 1)), ) 2023-01-11T21:38:06.1868772Z 2023-01-11T21:38:06.1868777Z 2023-01-11T21:38:06.1868861Z if __name__ == "__main__": 2023-01-11T21:38:06.1868977Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1869101Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1869206Z print_performance(lambda: call([])) 2023-01-11T21:38:06.1869211Z 2023-01-11T21:38:06.1869282Z ok (0.024s) 2023-01-11T21:38:06.1869792Z test_dynamic_shapes (__main__.CudaReproTests) ... 
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.1869874Z warnings.warn( 2023-01-11T21:38:06.1870128Z [2023-01-11 21:33:47,154] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 363 2023-01-11T21:38:06.1870392Z [2023-01-11 21:33:48,818] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 363 2023-01-11T21:38:06.1870401Z 2023-01-11T21:38:06.1870501Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1870577Z import torch 2023-01-11T21:38:06.1870651Z import random 2023-01-11T21:38:06.1870769Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1870886Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1870891Z 2023-01-11T21:38:06.1870972Z aten = torch.ops.aten 2023-01-11T21:38:06.1871108Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1871203Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1871208Z 2023-01-11T21:38:06.1871281Z import triton 2023-01-11T21:38:06.1871379Z import triton.language as tl 2023-01-11T21:38:06.1871504Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1871642Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1871647Z 2023-01-11T21:38:06.1871652Z 2023-01-11T21:38:06.1871781Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1872015Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1872140Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1872245Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1872332Z const long ks0) 2023-01-11T21:38:06.1872397Z { 2023-01-11T21:38:06.1872498Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1872557Z { 2023-01-11T21:38:06.1872639Z #pragma omp for 2023-01-11T21:38:06.1872729Z for(long i0=0; i0(1); 2023-01-11T21:38:06.1947965Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1948055Z out_ptr0[0] = tmp2; 2023-01-11T21:38:06.1948151Z } 2023-01-11T21:38:06.1948213Z } 2023-01-11T21:38:06.1948278Z } 2023-01-11T21:38:06.1948364Z ''') 2023-01-11T21:38:06.1948369Z 2023-01-11T21:38:06.1948373Z 2023-01-11T21:38:06.1948466Z async_compile.wait(globals()) 2023-01-11T21:38:06.1948545Z del async_compile 2023-01-11T21:38:06.1948551Z 2023-01-11T21:38:06.1948630Z def call(args): 2023-01-11T21:38:06.1948712Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1948781Z args.clear() 2023-01-11T21:38:06.1948964Z buf0 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1949103Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1949176Z del arg1_1 2023-01-11T21:38:06.1949300Z return (as_strided(arg0_1, (12, 3, 512, 1, 64), (64, 196608, 768, 0, 1)), buf0, ) 2023-01-11T21:38:06.1949306Z 2023-01-11T21:38:06.1949310Z 2023-01-11T21:38:06.1949390Z if __name__ == "__main__": 2023-01-11T21:38:06.1949508Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1949637Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1949859Z arg0_1 = rand_strided((12, 3, 512, 64), (64, 196608, 768, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1950042Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1950167Z 
print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1950172Z 2023-01-11T21:38:06.1950246Z ok (1.679s) 2023-01-11T21:38:06.1950700Z test_abs_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1950831Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1951091Z [2023-01-11 21:33:53,763] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 372 2023-01-11T21:38:06.1951355Z [2023-01-11 21:33:53,923] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 372 2023-01-11T21:38:06.1951801Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1951933Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1952190Z [2023-01-11 21:33:53,939] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 373 2023-01-11T21:38:06.1952446Z [2023-01-11 21:33:54,008] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 373 2023-01-11T21:38:06.1952454Z 2023-01-11T21:38:06.1952551Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1952625Z import torch 2023-01-11T21:38:06.1952699Z import random 2023-01-11T21:38:06.1952819Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1952945Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1952950Z 2023-01-11T21:38:06.1953034Z aten = torch.ops.aten 2023-01-11T21:38:06.1953164Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1953258Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1953264Z 2023-01-11T21:38:06.1953336Z import triton 2023-01-11T21:38:06.1953427Z import triton.language as tl 2023-01-11T21:38:06.1953552Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1953692Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1953697Z 2023-01-11T21:38:06.1953702Z 2023-01-11T21:38:06.1953855Z triton_fused_div_0 = async_compile.triton(''' 2023-01-11T21:38:06.1953956Z import triton 2023-01-11T21:38:06.1954041Z import triton.language as tl 2023-01-11T21:38:06.1954155Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1954255Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1954388Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1954514Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1954520Z 2023-01-11T21:38:06.1954923Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.1954996Z @triton.jit 2023-01-11T21:38:06.1955145Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
2023-01-11T21:38:06.1955219Z xnumel = 17 2023-01-11T21:38:06.1955335Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1955470Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1955556Z xmask = xindex < xnumel 2023-01-11T21:38:06.1955627Z x0 = xindex 2023-01-11T21:38:06.1955723Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.1955803Z tmp1 = tl.abs(tmp0) 2023-01-11T21:38:06.1955866Z tmp2 = 1 2023-01-11T21:38:06.1955944Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.1956025Z tmp4 = tmp0 / tmp3 2023-01-11T21:38:06.1956161Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1956246Z ''') 2023-01-11T21:38:06.1956251Z 2023-01-11T21:38:06.1956256Z 2023-01-11T21:38:06.1956349Z async_compile.wait(globals()) 2023-01-11T21:38:06.1956428Z del async_compile 2023-01-11T21:38:06.1956433Z 2023-01-11T21:38:06.1956507Z def call(args): 2023-01-11T21:38:06.1956573Z arg0_1, = args 2023-01-11T21:38:06.1956647Z args.clear() 2023-01-11T21:38:06.1956737Z with torch.cuda.device(0): 2023-01-11T21:38:06.1956933Z buf0 = empty_strided((17, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1957029Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1957164Z triton_fused_div_0.run(arg0_1, buf0, 17, grid=grid(17), stream=stream0) 2023-01-11T21:38:06.1957237Z del arg0_1 2023-01-11T21:38:06.1957307Z return (buf0, ) 2023-01-11T21:38:06.1957312Z 2023-01-11T21:38:06.1957317Z 2023-01-11T21:38:06.1957426Z if __name__ == "__main__": 2023-01-11T21:38:06.1957549Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1957676Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1957875Z arg0_1 = rand_strided((17, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1957987Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1957992Z 2023-01-11T21:38:06.1957997Z 2023-01-11T21:38:06.1958093Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1958168Z import torch 2023-01-11T21:38:06.1958235Z import random 2023-01-11T21:38:06.1958353Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1958478Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1958483Z 2023-01-11T21:38:06.1958566Z aten = torch.ops.aten 2023-01-11T21:38:06.1958700Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1958793Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1958798Z 2023-01-11T21:38:06.1958874Z import triton 2023-01-11T21:38:06.1958960Z import triton.language as tl 2023-01-11T21:38:06.1959082Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1959224Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1959229Z 2023-01-11T21:38:06.1959234Z 2023-01-11T21:38:06.1959385Z triton_fused_div_0 = async_compile.triton(''' 2023-01-11T21:38:06.1959458Z import triton 2023-01-11T21:38:06.1959550Z import triton.language as tl 2023-01-11T21:38:06.1959664Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1959765Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1959917Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1960043Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1960048Z 2023-01-11T21:38:06.1960451Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.1960525Z @triton.jit 2023-01-11T21:38:06.1960657Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1960731Z xnumel = 17 2023-01-11T21:38:06.1960827Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1960957Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1961033Z xmask = xindex < xnumel 2023-01-11T21:38:06.1961104Z x0 = xindex 2023-01-11T21:38:06.1961223Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.1961304Z tmp1 = tl.abs(tmp0) 2023-01-11T21:38:06.1961380Z tmp2 = 1 2023-01-11T21:38:06.1961458Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.1961535Z tmp4 = tmp0 / tmp3 2023-01-11T21:38:06.1961666Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1961751Z ''') 2023-01-11T21:38:06.1961756Z 2023-01-11T21:38:06.1961761Z 2023-01-11T21:38:06.1961855Z async_compile.wait(globals()) 2023-01-11T21:38:06.1961932Z del async_compile 2023-01-11T21:38:06.1961937Z 2023-01-11T21:38:06.1962012Z def call(args): 2023-01-11T21:38:06.1962086Z arg0_1, = args 2023-01-11T21:38:06.1962159Z args.clear() 2023-01-11T21:38:06.1962243Z with torch.cuda.device(0): 2023-01-11T21:38:06.1962439Z buf0 = empty_strided((17, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.1962532Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1962666Z triton_fused_div_0.run(arg0_1, buf0, 17, grid=grid(17), stream=stream0) 2023-01-11T21:38:06.1962739Z del arg0_1 2023-01-11T21:38:06.1962820Z return (buf0, ) 2023-01-11T21:38:06.1962825Z 2023-01-11T21:38:06.1962830Z 2023-01-11T21:38:06.1962910Z if __name__ == "__main__": 2023-01-11T21:38:06.1963027Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1963146Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1963369Z arg0_1 = rand_strided((17, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.1963482Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1963487Z 2023-01-11T21:38:06.1963562Z ok (0.261s) 2023-01-11T21:38:06.1964033Z test_adaptive_avg_pool2d1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1964167Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1964427Z [2023-01-11 21:33:54,036] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 374 2023-01-11T21:38:06.1964673Z [2023-01-11 21:33:54,045] torch._inductor.ir: [WARNING] Using FallbackKernel: aten._adaptive_avg_pool2d 2023-01-11T21:38:06.1964679Z 2023-01-11T21:38:06.1964774Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1964841Z import torch 2023-01-11T21:38:06.1964917Z import random 2023-01-11T21:38:06.1965036Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1965157Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1965163Z 2023-01-11T21:38:06.1965244Z aten = torch.ops.aten 2023-01-11T21:38:06.1965380Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1965474Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1965479Z 2023-01-11T21:38:06.1965555Z import triton 2023-01-11T21:38:06.1965669Z import triton.language as tl 2023-01-11T21:38:06.1965793Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1965931Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1965936Z 2023-01-11T21:38:06.1965941Z 2023-01-11T21:38:06.1966129Z triton_fused__adaptive_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.1966205Z import triton 2023-01-11T21:38:06.1966297Z import triton.language as tl 2023-01-11T21:38:06.1966409Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1966510Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1966635Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1966761Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1966766Z 2023-01-11T21:38:06.1967170Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.1967247Z @triton.jit 2023-01-11T21:38:06.1967377Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1967451Z xnumel = 288 2023-01-11T21:38:06.1967549Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1967681Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1967757Z xmask = xindex < xnumel 2023-01-11T21:38:06.1967839Z x1 = (xindex // 6) % 6 2023-01-11T21:38:06.1967914Z x0 = xindex % 6 2023-01-11T21:38:06.1967994Z x2 = (xindex // 36) 2023-01-11T21:38:06.1968065Z x4 = xindex 2023-01-11T21:38:06.1968147Z tmp0 = ((8*x1) // 3) 2023-01-11T21:38:06.1968223Z tmp1 = ((21 + (16*x1)) // 6) 2023-01-11T21:38:06.1968301Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.1968377Z tmp3 = ((8*x0) // 3) 2023-01-11T21:38:06.1968460Z tmp4 = ((21 + (16*x0)) // 6) 2023-01-11T21:38:06.1968539Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.1968623Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.1968934Z tmp7 = tl.load(in_ptr0 + ((16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1969023Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.1969106Z tmp9 = 1 + (((8*x0) // 3)) 2023-01-11T21:38:06.1969212Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.1969295Z tmp11 = tmp2 & tmp10 
2023-01-11T21:38:06.1969616Z tmp12 = tl.load(in_ptr0 + (1 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1969712Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.1969793Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.1969877Z tmp15 = 2 + (((8*x0) // 3)) 2023-01-11T21:38:06.1969950Z tmp16 = tmp15 < tmp4 2023-01-11T21:38:06.1970031Z tmp17 = tmp2 & tmp16 2023-01-11T21:38:06.1970344Z tmp18 = tl.load(in_ptr0 + (2 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1970445Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.1970528Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.1970611Z tmp21 = 3 + (((8*x0) // 3)) 2023-01-11T21:38:06.1970693Z tmp22 = tmp21 < tmp4 2023-01-11T21:38:06.1970766Z tmp23 = tmp2 & tmp22 2023-01-11T21:38:06.1971077Z tmp24 = tl.load(in_ptr0 + (3 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp23 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1971172Z tmp25 = tl.where(tmp23, tmp24, 0.0) 2023-01-11T21:38:06.1971255Z tmp26 = tmp25 + tmp20 2023-01-11T21:38:06.1971335Z tmp27 = 1 + (((8*x1) // 3)) 2023-01-11T21:38:06.1971416Z tmp28 = tmp27 < tmp1 2023-01-11T21:38:06.1971496Z tmp29 = tmp28 & tmp5 2023-01-11T21:38:06.1971805Z tmp30 = tl.load(in_ptr0 + (16 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp29 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1971927Z tmp31 = tl.where(tmp29, tmp30, 0.0) 2023-01-11T21:38:06.1972009Z tmp32 = tmp31 + tmp26 2023-01-11T21:38:06.1972093Z tmp33 = tmp28 & tmp10 2023-01-11T21:38:06.1972408Z tmp34 = tl.load(in_ptr0 + (17 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1972504Z tmp35 = tl.where(tmp33, tmp34, 0.0) 2023-01-11T21:38:06.1972585Z tmp36 = tmp35 + tmp32 2023-01-11T21:38:06.1972668Z tmp37 = tmp28 & tmp16 2023-01-11T21:38:06.1972972Z tmp38 = tl.load(in_ptr0 + (18 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1973066Z tmp39 = tl.where(tmp37, tmp38, 0.0) 2023-01-11T21:38:06.1973147Z tmp40 = tmp39 + tmp36 2023-01-11T21:38:06.1973228Z tmp41 = tmp28 & tmp22 2023-01-11T21:38:06.1973539Z tmp42 = tl.load(in_ptr0 + (19 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1973635Z tmp43 = tl.where(tmp41, tmp42, 0.0) 2023-01-11T21:38:06.1973715Z tmp44 = tmp43 + tmp40 2023-01-11T21:38:06.1973798Z tmp45 = 2 + (((8*x1) // 3)) 2023-01-11T21:38:06.1973872Z tmp46 = tmp45 < tmp1 2023-01-11T21:38:06.1973954Z tmp47 = tmp46 & tmp5 2023-01-11T21:38:06.1974261Z tmp48 = tl.load(in_ptr0 + (32 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp47 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1974357Z tmp49 = tl.where(tmp47, tmp48, 0.0) 2023-01-11T21:38:06.1974438Z tmp50 = tmp49 + tmp44 2023-01-11T21:38:06.1974637Z tmp51 = tmp46 & tmp10 2023-01-11T21:38:06.1974944Z tmp52 = tl.load(in_ptr0 + (33 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp51 & xmask, eviction_policy='evict_last', other=0) 
2023-01-11T21:38:06.1975036Z tmp53 = tl.where(tmp51, tmp52, 0.0) 2023-01-11T21:38:06.1975117Z tmp54 = tmp53 + tmp50 2023-01-11T21:38:06.1975211Z tmp55 = tmp46 & tmp16 2023-01-11T21:38:06.1975603Z tmp56 = tl.load(in_ptr0 + (34 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp55 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1975704Z tmp57 = tl.where(tmp55, tmp56, 0.0) 2023-01-11T21:38:06.1975787Z tmp58 = tmp57 + tmp54 2023-01-11T21:38:06.1975869Z tmp59 = tmp46 & tmp22 2023-01-11T21:38:06.1976179Z tmp60 = tl.load(in_ptr0 + (35 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp59 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1976270Z tmp61 = tl.where(tmp59, tmp60, 0.0) 2023-01-11T21:38:06.1976354Z tmp62 = tmp61 + tmp58 2023-01-11T21:38:06.1976437Z tmp63 = 3 + (((8*x1) // 3)) 2023-01-11T21:38:06.1976523Z tmp64 = tmp63 < tmp1 2023-01-11T21:38:06.1976606Z tmp65 = tmp64 & tmp5 2023-01-11T21:38:06.1976920Z tmp66 = tl.load(in_ptr0 + (48 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp65 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1977019Z tmp67 = tl.where(tmp65, tmp66, 0.0) 2023-01-11T21:38:06.1977095Z tmp68 = tmp67 + tmp62 2023-01-11T21:38:06.1977245Z tmp69 = tmp64 & tmp10 2023-01-11T21:38:06.1977557Z tmp70 = tl.load(in_ptr0 + (49 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp69 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1977656Z tmp71 = tl.where(tmp69, tmp70, 0.0) 2023-01-11T21:38:06.1977739Z tmp72 = tmp71 + tmp68 2023-01-11T21:38:06.1977823Z tmp73 = tmp64 & tmp16 2023-01-11T21:38:06.1978133Z tmp74 = tl.load(in_ptr0 + (50 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp73 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1978271Z tmp75 = tl.where(tmp73, tmp74, 0.0) 2023-01-11T21:38:06.1978345Z tmp76 = tmp75 + tmp72 2023-01-11T21:38:06.1978425Z tmp77 = tmp64 & tmp22 2023-01-11T21:38:06.1978729Z tmp78 = tl.load(in_ptr0 + (51 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp77 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1978831Z tmp79 = tl.where(tmp77, tmp78, 0.0) 2023-01-11T21:38:06.1978910Z tmp80 = tmp79 + tmp76 2023-01-11T21:38:06.1978983Z tmp81 = 1 2023-01-11T21:38:06.1979079Z tmp82 = tl.where(tmp6, tmp81, 0.0) 2023-01-11T21:38:06.1979144Z tmp83 = 1 2023-01-11T21:38:06.1979238Z tmp84 = tl.where(tmp11, tmp83, 0.0) 2023-01-11T21:38:06.1979318Z tmp85 = tmp84 + tmp82 2023-01-11T21:38:06.1979388Z tmp86 = 1 2023-01-11T21:38:06.1979481Z tmp87 = tl.where(tmp17, tmp86, 0.0) 2023-01-11T21:38:06.1979562Z tmp88 = tmp87 + tmp85 2023-01-11T21:38:06.1979634Z tmp89 = 1 2023-01-11T21:38:06.1979722Z tmp90 = tl.where(tmp23, tmp89, 0.0) 2023-01-11T21:38:06.1979801Z tmp91 = tmp90 + tmp88 2023-01-11T21:38:06.1979874Z tmp92 = 1 2023-01-11T21:38:06.1979966Z tmp93 = tl.where(tmp29, tmp92, 0.0) 2023-01-11T21:38:06.1980044Z tmp94 = tmp93 + tmp91 2023-01-11T21:38:06.1980115Z tmp95 = 1 2023-01-11T21:38:06.1980210Z tmp96 = tl.where(tmp33, tmp95, 0.0) 2023-01-11T21:38:06.1980282Z tmp97 = tmp96 + tmp94 2023-01-11T21:38:06.1980353Z tmp98 = 1 2023-01-11T21:38:06.1980445Z tmp99 = tl.where(tmp37, tmp98, 0.0) 2023-01-11T21:38:06.1980528Z tmp100 = tmp99 + tmp97 2023-01-11T21:38:06.1980600Z tmp101 = 1 2023-01-11T21:38:06.1980702Z tmp102 = tl.where(tmp41, tmp101, 0.0) 
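    # The chain of masked constants 1 accumulates, per output element, the
    # count of valid input cells in its pooling window; the kernel ends by
    # storing sum / count, i.e. the window average.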
2023-01-11T21:38:06.1980779Z tmp103 = tmp102 + tmp100 2023-01-11T21:38:06.1980851Z tmp104 = 1 2023-01-11T21:38:06.1980950Z tmp105 = tl.where(tmp47, tmp104, 0.0) 2023-01-11T21:38:06.1981031Z tmp106 = tmp105 + tmp103 2023-01-11T21:38:06.1981102Z tmp107 = 1 2023-01-11T21:38:06.1981203Z tmp108 = tl.where(tmp51, tmp107, 0.0) 2023-01-11T21:38:06.1981284Z tmp109 = tmp108 + tmp106 2023-01-11T21:38:06.1981349Z tmp110 = 1 2023-01-11T21:38:06.1981446Z tmp111 = tl.where(tmp55, tmp110, 0.0) 2023-01-11T21:38:06.1981528Z tmp112 = tmp111 + tmp109 2023-01-11T21:38:06.1981601Z tmp113 = 1 2023-01-11T21:38:06.1981727Z tmp114 = tl.where(tmp59, tmp113, 0.0) 2023-01-11T21:38:06.1981811Z tmp115 = tmp114 + tmp112 2023-01-11T21:38:06.1981883Z tmp116 = 1 2023-01-11T21:38:06.1981971Z tmp117 = tl.where(tmp65, tmp116, 0.0) 2023-01-11T21:38:06.1982053Z tmp118 = tmp117 + tmp115 2023-01-11T21:38:06.1982125Z tmp119 = 1 2023-01-11T21:38:06.1982218Z tmp120 = tl.where(tmp69, tmp119, 0.0) 2023-01-11T21:38:06.1988990Z tmp121 = tmp120 + tmp118 2023-01-11T21:38:06.1989084Z tmp122 = 1 2023-01-11T21:38:06.1989188Z tmp123 = tl.where(tmp73, tmp122, 0.0) 2023-01-11T21:38:06.1989275Z tmp124 = tmp123 + tmp121 2023-01-11T21:38:06.1989360Z tmp125 = 1 2023-01-11T21:38:06.1989454Z tmp126 = tl.where(tmp77, tmp125, 0.0) 2023-01-11T21:38:06.1989541Z tmp127 = tmp126 + tmp124 2023-01-11T21:38:06.1989628Z tmp128 = tmp80 / tmp127 2023-01-11T21:38:06.1989767Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp128, xmask) 2023-01-11T21:38:06.1989882Z ''') 2023-01-11T21:38:06.1989892Z 2023-01-11T21:38:06.1989897Z 2023-01-11T21:38:06.1990057Z triton_fused_add_1 = async_compile.triton(''' 2023-01-11T21:38:06.1990136Z import triton 2023-01-11T21:38:06.1990225Z import triton.language as tl 2023-01-11T21:38:06.1990347Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1990452Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1990585Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1990717Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1990723Z 2023-01-11T21:38:06.1991129Z @pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.1991277Z @triton.jit 2023-01-11T21:38:06.1991406Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1991482Z xnumel = 2048 2023-01-11T21:38:06.1991583Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1991714Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1991798Z xmask = xindex < xnumel 2023-01-11T21:38:06.1991870Z x0 = xindex 2023-01-11T21:38:06.1991972Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.1992036Z tmp1 = 1 2023-01-11T21:38:06.1992115Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.1992249Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.1992334Z ''') 2023-01-11T21:38:06.1992340Z 2023-01-11T21:38:06.1992344Z 2023-01-11T21:38:06.1992437Z async_compile.wait(globals()) 2023-01-11T21:38:06.1992516Z del async_compile 2023-01-11T21:38:06.1992521Z 2023-01-11T21:38:06.1992595Z def call(args): 2023-01-11T21:38:06.1992662Z arg0_1, = args 2023-01-11T21:38:06.1992737Z args.clear() 2023-01-11T21:38:06.1992830Z with torch.cuda.device(0): 2023-01-11T21:38:06.1993053Z buf0 = empty_strided((2, 4, 6, 6), (144, 36, 6, 1), 
device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1993146Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1993304Z triton_fused__adaptive_avg_pool2d_0.run(arg0_1, buf0, 288, grid=grid(288), stream=stream0) 2023-01-11T21:38:06.1993528Z buf1 = empty_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1993667Z triton_fused_add_1.run(arg0_1, buf1, 2048, grid=grid(2048), stream=stream0) 2023-01-11T21:38:06.1993734Z del arg0_1 2023-01-11T21:38:06.1993851Z buf2 = aten._adaptive_avg_pool2d(buf1, [2, 5]) 2023-01-11T21:38:06.1993923Z del buf1 2023-01-11T21:38:06.1993997Z buf3 = buf2 2023-01-11T21:38:06.1994111Z assert_size_stride(buf3, (2, 4, 2, 5), (40, 10, 5, 1)) 2023-01-11T21:38:06.1994181Z del buf2 2023-01-11T21:38:06.1994262Z return (buf0, buf3, ) 2023-01-11T21:38:06.1994267Z 2023-01-11T21:38:06.1994272Z 2023-01-11T21:38:06.1994345Z if __name__ == "__main__": 2023-01-11T21:38:06.1994491Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1994619Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1994847Z arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1994961Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1995233Z [2023-01-11 21:33:54,749] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 374 2023-01-11T21:38:06.1995653Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1995789Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1996048Z [2023-01-11 21:33:54,776] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 375 2023-01-11T21:38:06.1996310Z [2023-01-11 21:33:55,168] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 375 2023-01-11T21:38:06.1996717Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1996878Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1997133Z [2023-01-11 21:33:55,195] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 376 2023-01-11T21:38:06.1997138Z 2023-01-11T21:38:06.1997143Z 2023-01-11T21:38:06.1997241Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1997318Z import torch 2023-01-11T21:38:06.1997392Z import random 2023-01-11T21:38:06.1997511Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1997634Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1997641Z 2023-01-11T21:38:06.1997723Z aten = torch.ops.aten 2023-01-11T21:38:06.1997853Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1997948Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1997953Z 2023-01-11T21:38:06.1998027Z import triton 2023-01-11T21:38:06.1998120Z import triton.language as tl 2023-01-11T21:38:06.1998244Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1998386Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1998391Z 2023-01-11T21:38:06.1998396Z 2023-01-11T21:38:06.1998581Z triton_fused__adaptive_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.1998656Z import triton 2023-01-11T21:38:06.1998741Z import triton.language as tl 2023-01-11T21:38:06.1998857Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1998960Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1999096Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1999223Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1999228Z 2023-01-11T21:38:06.1999634Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.1999709Z @triton.jit 2023-01-11T21:38:06.1999842Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1999910Z xnumel = 288 2023-01-11T21:38:06.2000006Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2000133Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2000217Z xmask = xindex < xnumel 2023-01-11T21:38:06.2000326Z x1 = (xindex // 6) % 6 2023-01-11T21:38:06.2000402Z x0 = xindex % 6 2023-01-11T21:38:06.2000481Z x2 = (xindex // 36) 2023-01-11T21:38:06.2000544Z x4 = xindex 2023-01-11T21:38:06.2000619Z tmp0 = (x1 // 2) 2023-01-11T21:38:06.2000701Z tmp1 = ((8 + (3*x1)) // 6) 2023-01-11T21:38:06.2000780Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.2000853Z tmp3 = (x0 // 2) 2023-01-11T21:38:06.2000934Z tmp4 = ((8 + (3*x0)) // 6) 2023-01-11T21:38:06.2001006Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.2001085Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.2001381Z tmp7 = tl.load(in_ptr0 + ((3*(x1 // 2)) + (9*x2) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.2001483Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.2001562Z tmp9 = 1 + (x0 // 2) 2023-01-11T21:38:06.2001642Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.2001723Z tmp11 = tmp2 & tmp10 2023-01-11T21:38:06.2002029Z tmp12 = tl.load(in_ptr0 + (1 + (3*(x1 // 2)) + (9*x2) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, 
eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.2002119Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.2002199Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.2002277Z tmp15 = 1 + (x1 // 2) 2023-01-11T21:38:06.2002356Z tmp16 = tmp15 < tmp1 2023-01-11T21:38:06.2002436Z tmp17 = tmp16 & tmp5 2023-01-11T21:38:06.2002736Z tmp18 = tl.load(in_ptr0 + (3 + (3*(x1 // 2)) + (9*x2) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.2002833Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.2002937Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.2003018Z tmp21 = tmp16 & tmp10 2023-01-11T21:38:06.2003306Z tmp22 = tl.load(in_ptr0 + (4 + (3*(x1 // 2)) + (9*x2) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.2003405Z tmp23 = tl.where(tmp21, tmp22, 0.0) 2023-01-11T21:38:06.2003489Z tmp24 = tmp23 + tmp20 2023-01-11T21:38:06.2003562Z tmp25 = 1 2023-01-11T21:38:06.2003659Z tmp26 = tl.where(tmp6, tmp25, 0.0) 2023-01-11T21:38:06.2003724Z tmp27 = 1 2023-01-11T21:38:06.2003816Z tmp28 = tl.where(tmp11, tmp27, 0.0) 2023-01-11T21:38:06.2003900Z tmp29 = tmp28 + tmp26 2023-01-11T21:38:06.2003971Z tmp30 = 1 2023-01-11T21:38:06.2004063Z tmp31 = tl.where(tmp17, tmp30, 0.0) 2023-01-11T21:38:06.2004143Z tmp32 = tmp31 + tmp29 2023-01-11T21:38:06.2004216Z tmp33 = 1 2023-01-11T21:38:06.2004301Z tmp34 = tl.where(tmp21, tmp33, 0.0) 2023-01-11T21:38:06.2004381Z tmp35 = tmp34 + tmp32 2023-01-11T21:38:06.2004463Z tmp36 = tmp24 / tmp35 2023-01-11T21:38:06.2004598Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp36, xmask) 2023-01-11T21:38:06.2004684Z ''') 2023-01-11T21:38:06.2004690Z 2023-01-11T21:38:06.2004694Z 2023-01-11T21:38:06.2004884Z triton_fused__adaptive_avg_pool2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2004962Z import triton 2023-01-11T21:38:06.2005048Z import triton.language as tl 2023-01-11T21:38:06.2005164Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2005270Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2005402Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2005527Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2005532Z 2023-01-11T21:38:06.2005933Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2006007Z @triton.jit 2023-01-11T21:38:06.2006138Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2006205Z xnumel = 80 2023-01-11T21:38:06.2006300Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2006455Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2006543Z xmask = xindex < xnumel 2023-01-11T21:38:06.2006624Z x1 = (xindex // 5) % 2 2023-01-11T21:38:06.2006696Z x0 = xindex % 5 2023-01-11T21:38:06.2006774Z x2 = (xindex // 10) 2023-01-11T21:38:06.2006837Z x4 = xindex 2023-01-11T21:38:06.2006918Z tmp0 = ((3*x1) // 2) 2023-01-11T21:38:06.2006999Z tmp1 = 2 + (((3*x1) // 2)) 2023-01-11T21:38:06.2007077Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.2007153Z tmp3 = ((3*x0) // 5) 2023-01-11T21:38:06.2007234Z tmp4 = ((7 + (3*x0)) // 5) 2023-01-11T21:38:06.2007313Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.2007383Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.2007556Z tmp7 = tl.load(in_ptr0 + 
((3*(((3*x1) // 2))) + (9*x2) + (((3*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, other=0) 2023-01-11T21:38:06.2007626Z tmp8 = 1 2023-01-11T21:38:06.2007704Z tmp9 = tmp7 + tmp8 2023-01-11T21:38:06.2007799Z tmp10 = tl.where(tmp6, tmp9, 0.0) 2023-01-11T21:38:06.2007882Z tmp11 = 1 + (((3*x0) // 5)) 2023-01-11T21:38:06.2007962Z tmp12 = tmp11 < tmp4 2023-01-11T21:38:06.2008035Z tmp13 = tmp2 & tmp12 2023-01-11T21:38:06.2008208Z tmp14 = tl.load(in_ptr0 + (1 + (3*(((3*x1) // 2))) + (9*x2) + (((3*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp13 & xmask, other=0) 2023-01-11T21:38:06.2008280Z tmp15 = 1 2023-01-11T21:38:06.2008360Z tmp16 = tmp14 + tmp15 2023-01-11T21:38:06.2008454Z tmp17 = tl.where(tmp13, tmp16, 0.0) 2023-01-11T21:38:06.2008535Z tmp18 = tmp17 + tmp10 2023-01-11T21:38:06.2008615Z tmp19 = 1 + (((3*x1) // 2)) 2023-01-11T21:38:06.2008688Z tmp20 = tmp19 < tmp1 2023-01-11T21:38:06.2008795Z tmp21 = tmp20 & tmp5 2023-01-11T21:38:06.2008960Z tmp22 = tl.load(in_ptr0 + (3 + (3*(((3*x1) // 2))) + (9*x2) + (((3*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, other=0) 2023-01-11T21:38:06.2009034Z tmp23 = 1 2023-01-11T21:38:06.2009114Z tmp24 = tmp22 + tmp23 2023-01-11T21:38:06.2009216Z tmp25 = tl.where(tmp21, tmp24, 0.0) 2023-01-11T21:38:06.2009297Z tmp26 = tmp25 + tmp18 2023-01-11T21:38:06.2009369Z tmp27 = tmp20 & tmp12 2023-01-11T21:38:06.2009529Z tmp28 = tl.load(in_ptr0 + (4 + (3*(((3*x1) // 2))) + (9*x2) + (((3*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, other=0) 2023-01-11T21:38:06.2009601Z tmp29 = 1 2023-01-11T21:38:06.2009679Z tmp30 = tmp28 + tmp29 2023-01-11T21:38:06.2009772Z tmp31 = tl.where(tmp27, tmp30, 0.0) 2023-01-11T21:38:06.2009851Z tmp32 = tmp31 + tmp26 2023-01-11T21:38:06.2009924Z tmp33 = 1 2023-01-11T21:38:06.2010011Z tmp34 = tl.where(tmp6, tmp33, 0.0) 2023-01-11T21:38:06.2010082Z tmp35 = 1 2023-01-11T21:38:06.2010180Z tmp36 = tl.where(tmp13, tmp35, 0.0) 2023-01-11T21:38:06.2010258Z tmp37 = tmp36 + tmp34 2023-01-11T21:38:06.2010329Z tmp38 = 1 2023-01-11T21:38:06.2010422Z tmp39 = tl.where(tmp21, tmp38, 0.0) 2023-01-11T21:38:06.2010503Z tmp40 = tmp39 + tmp37 2023-01-11T21:38:06.2010567Z tmp41 = 1 2023-01-11T21:38:06.2010661Z tmp42 = tl.where(tmp27, tmp41, 0.0) 2023-01-11T21:38:06.2010741Z tmp43 = tmp42 + tmp40 2023-01-11T21:38:06.2010821Z tmp44 = tmp32 / tmp43 2023-01-11T21:38:06.2010957Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp44, xmask) 2023-01-11T21:38:06.2011048Z ''') 2023-01-11T21:38:06.2011053Z 2023-01-11T21:38:06.2011058Z 2023-01-11T21:38:06.2011151Z async_compile.wait(globals()) 2023-01-11T21:38:06.2011221Z del async_compile 2023-01-11T21:38:06.2011226Z 2023-01-11T21:38:06.2011302Z def call(args): 2023-01-11T21:38:06.2011375Z arg0_1, = args 2023-01-11T21:38:06.2011455Z args.clear() 2023-01-11T21:38:06.2011551Z with torch.cuda.device(0): 2023-01-11T21:38:06.2011770Z buf0 = empty_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2011863Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2012016Z triton_fused__adaptive_avg_pool2d_0.run(arg0_1, buf0, 288, grid=grid(288), stream=stream0) 2023-01-11T21:38:06.2012259Z buf1 = empty_strided((2, 4, 2, 5), (40, 10, 5, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2012418Z triton_fused__adaptive_avg_pool2d_1_1.run(arg0_1, buf1, 80, grid=grid(80), stream=stream0) 2023-01-11T21:38:06.2012494Z del arg0_1 2023-01-11T21:38:06.2012576Z return (buf0, buf1, ) 2023-01-11T21:38:06.2012581Z 2023-01-11T21:38:06.2012586Z 
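# The __main__ harness below lets the generated module be benchmarked
# standalone: it rebuilds a random input with the original shape and strides
# and times call() via print_performance.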
2023-01-11T21:38:06.2012666Z if __name__ == "__main__": 2023-01-11T21:38:06.2012785Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2012912Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2013126Z arg0_1 = rand_strided((2, 4, 3, 3), (36, 9, 3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2013237Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2013242Z 2023-01-11T21:38:06.2013506Z [2023-01-11 21:33:55,478] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 376 2023-01-11T21:38:06.2013927Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2014059Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2014314Z [2023-01-11 21:33:55,505] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 377 2023-01-11T21:38:06.2014320Z 2023-01-11T21:38:06.2014417Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2014752Z import torch 2023-01-11T21:38:06.2014826Z import random 2023-01-11T21:38:06.2014945Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2015062Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2015067Z 2023-01-11T21:38:06.2015150Z aten = torch.ops.aten 2023-01-11T21:38:06.2015289Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2015385Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2015390Z 2023-01-11T21:38:06.2015463Z import triton 2023-01-11T21:38:06.2015556Z import triton.language as tl 2023-01-11T21:38:06.2015681Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2015814Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2015825Z 2023-01-11T21:38:06.2015829Z 2023-01-11T21:38:06.2016010Z triton_fused__adaptive_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2016084Z import triton 2023-01-11T21:38:06.2016181Z import triton.language as tl 2023-01-11T21:38:06.2016294Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2016395Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2016527Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2016656Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2016665Z 2023-01-11T21:38:06.2017069Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2017192Z @triton.jit 2023-01-11T21:38:06.2017335Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2017410Z xnumel = 288 2023-01-11T21:38:06.2017508Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2017638Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2017724Z xmask = xindex < xnumel 2023-01-11T21:38:06.2017805Z x1 = (xindex // 6) % 6 2023-01-11T21:38:06.2017872Z x0 = xindex % 6 2023-01-11T21:38:06.2017951Z x2 = (xindex // 36) 2023-01-11T21:38:06.2018022Z x4 = xindex 2023-01-11T21:38:06.2018096Z tmp0 = (x1 // 2) 
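    # float16 recompile of the graph-376 pooling kernel: the signature above
    # is '*fp16' and each load below carries .to(tl.float32), so the window
    # sums are still accumulated in float32 before the fp16 store.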
2023-01-11T21:38:06.2018178Z tmp1 = ((8 + (3*x1)) // 6) 2023-01-11T21:38:06.2018307Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.2018383Z tmp3 = (x0 // 2) 2023-01-11T21:38:06.2018456Z tmp4 = ((8 + (3*x0)) // 6) 2023-01-11T21:38:06.2018533Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.2018608Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.2018926Z tmp7 = tl.load(in_ptr0 + ((3*(x1 // 2)) + (9*x2) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.2019024Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.2019102Z tmp9 = 1 + (x0 // 2) 2023-01-11T21:38:06.2019185Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.2019259Z tmp11 = tmp2 & tmp10 2023-01-11T21:38:06.2019584Z tmp12 = tl.load(in_ptr0 + (1 + (3*(x1 // 2)) + (9*x2) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.2019682Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.2019762Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.2019844Z tmp15 = 1 + (x1 // 2) 2023-01-11T21:38:06.2019924Z tmp16 = tmp15 < tmp1 2023-01-11T21:38:06.2020008Z tmp17 = tmp16 & tmp5 2023-01-11T21:38:06.2020318Z tmp18 = tl.load(in_ptr0 + (3 + (3*(x1 // 2)) + (9*x2) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.2020414Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.2020494Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.2020576Z tmp21 = tmp16 & tmp10 2023-01-11T21:38:06.2020886Z tmp22 = tl.load(in_ptr0 + (4 + (3*(x1 // 2)) + (9*x2) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.2021019Z tmp23 = tl.where(tmp21, tmp22, 0.0) 2023-01-11T21:38:06.2021100Z tmp24 = tmp23 + tmp20 2023-01-11T21:38:06.2021172Z tmp25 = 1 2023-01-11T21:38:06.2021260Z tmp26 = tl.where(tmp6, tmp25, 0.0) 2023-01-11T21:38:06.2021331Z tmp27 = 1 2023-01-11T21:38:06.2021427Z tmp28 = tl.where(tmp11, tmp27, 0.0) 2023-01-11T21:38:06.2021511Z tmp29 = tmp28 + tmp26 2023-01-11T21:38:06.2021584Z tmp30 = 1 2023-01-11T21:38:06.2021675Z tmp31 = tl.where(tmp17, tmp30, 0.0) 2023-01-11T21:38:06.2021748Z tmp32 = tmp31 + tmp29 2023-01-11T21:38:06.2021819Z tmp33 = 1 2023-01-11T21:38:06.2021912Z tmp34 = tl.where(tmp21, tmp33, 0.0) 2023-01-11T21:38:06.2021992Z tmp35 = tmp34 + tmp32 2023-01-11T21:38:06.2022071Z tmp36 = tmp24 / tmp35 2023-01-11T21:38:06.2022208Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp36, xmask) 2023-01-11T21:38:06.2022294Z ''') 2023-01-11T21:38:06.2022300Z 2023-01-11T21:38:06.2022308Z 2023-01-11T21:38:06.2022499Z triton_fused__adaptive_avg_pool2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2022567Z import triton 2023-01-11T21:38:06.2022661Z import triton.language as tl 2023-01-11T21:38:06.2022777Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2022878Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2023016Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2023142Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2023147Z 2023-01-11T21:38:06.2023552Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2023628Z @triton.jit 2023-01-11T21:38:06.2023754Z def triton_(in_ptr0, 
out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2023827Z xnumel = 80 2023-01-11T21:38:06.2023929Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2024058Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2024141Z xmask = xindex < xnumel 2023-01-11T21:38:06.2024227Z x1 = (xindex // 5) % 2 2023-01-11T21:38:06.2024300Z x0 = xindex % 5 2023-01-11T21:38:06.2024371Z x2 = (xindex // 10) 2023-01-11T21:38:06.2024477Z x4 = xindex 2023-01-11T21:38:06.2024557Z tmp0 = ((3*x1) // 2) 2023-01-11T21:38:06.2024638Z tmp1 = 2 + (((3*x1) // 2)) 2023-01-11T21:38:06.2024717Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.2024794Z tmp3 = ((3*x0) // 5) 2023-01-11T21:38:06.2024868Z tmp4 = ((7 + (3*x0)) // 5) 2023-01-11T21:38:06.2024946Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.2025024Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.2025211Z tmp7 = tl.load(in_ptr0 + ((3*(((3*x1) // 2))) + (9*x2) + (((3*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2025281Z tmp8 = 1 2023-01-11T21:38:06.2025364Z tmp9 = tmp7 + tmp8 2023-01-11T21:38:06.2025460Z tmp10 = tl.where(tmp6, tmp9, 0.0) 2023-01-11T21:38:06.2025534Z tmp11 = 1 + (((3*x0) // 5)) 2023-01-11T21:38:06.2025614Z tmp12 = tmp11 < tmp4 2023-01-11T21:38:06.2025695Z tmp13 = tmp2 & tmp12 2023-01-11T21:38:06.2025883Z tmp14 = tl.load(in_ptr0 + (1 + (3*(((3*x1) // 2))) + (9*x2) + (((3*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp13 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2025958Z tmp15 = 1 2023-01-11T21:38:06.2026040Z tmp16 = tmp14 + tmp15 2023-01-11T21:38:06.2026133Z tmp17 = tl.where(tmp13, tmp16, 0.0) 2023-01-11T21:38:06.2026206Z tmp18 = tmp17 + tmp10 2023-01-11T21:38:06.2026287Z tmp19 = 1 + (((3*x1) // 2)) 2023-01-11T21:38:06.2026366Z tmp20 = tmp19 < tmp1 2023-01-11T21:38:06.2026445Z tmp21 = tmp20 & tmp5 2023-01-11T21:38:06.2026628Z tmp22 = tl.load(in_ptr0 + (3 + (3*(((3*x1) // 2))) + (9*x2) + (((3*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2026731Z tmp23 = 1 2023-01-11T21:38:06.2026812Z tmp24 = tmp22 + tmp23 2023-01-11T21:38:06.2026907Z tmp25 = tl.where(tmp21, tmp24, 0.0) 2023-01-11T21:38:06.2026980Z tmp26 = tmp25 + tmp18 2023-01-11T21:38:06.2027059Z tmp27 = tmp20 & tmp12 2023-01-11T21:38:06.2027244Z tmp28 = tl.load(in_ptr0 + (4 + (3*(((3*x1) // 2))) + (9*x2) + (((3*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2027319Z tmp29 = 1 2023-01-11T21:38:06.2027399Z tmp30 = tmp28 + tmp29 2023-01-11T21:38:06.2027496Z tmp31 = tl.where(tmp27, tmp30, 0.0) 2023-01-11T21:38:06.2027576Z tmp32 = tmp31 + tmp26 2023-01-11T21:38:06.2027640Z tmp33 = 1 2023-01-11T21:38:06.2027734Z tmp34 = tl.where(tmp6, tmp33, 0.0) 2023-01-11T21:38:06.2027804Z tmp35 = 1 2023-01-11T21:38:06.2027898Z tmp36 = tl.where(tmp13, tmp35, 0.0) 2023-01-11T21:38:06.2027979Z tmp37 = tmp36 + tmp34 2023-01-11T21:38:06.2028050Z tmp38 = 1 2023-01-11T21:38:06.2028138Z tmp39 = tl.where(tmp21, tmp38, 0.0) 2023-01-11T21:38:06.2028222Z tmp40 = tmp39 + tmp37 2023-01-11T21:38:06.2028292Z tmp41 = 1 2023-01-11T21:38:06.2028384Z tmp42 = tl.where(tmp27, tmp41, 0.0) 2023-01-11T21:38:06.2028464Z tmp43 = tmp42 + tmp40 2023-01-11T21:38:06.2028543Z tmp44 = tmp32 / tmp43 2023-01-11T21:38:06.2028682Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp44, xmask) 2023-01-11T21:38:06.2028765Z ''') 2023-01-11T21:38:06.2028779Z 2023-01-11T21:38:06.2028784Z 2023-01-11T21:38:06.2028871Z async_compile.wait(globals()) 
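# async_compile.wait() blocks until the Triton kernels defined above have
# finished compiling in the background; the helper is then dropped.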
2023-01-11T21:38:06.2028948Z del async_compile 2023-01-11T21:38:06.2028954Z 2023-01-11T21:38:06.2029027Z def call(args): 2023-01-11T21:38:06.2029100Z arg0_1, = args 2023-01-11T21:38:06.2029175Z args.clear() 2023-01-11T21:38:06.2029268Z with torch.cuda.device(0): 2023-01-11T21:38:06.2029486Z buf0 = empty_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2029572Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2029735Z triton_fused__adaptive_avg_pool2d_0.run(arg0_1, buf0, 288, grid=grid(288), stream=stream0) 2023-01-11T21:38:06.2029949Z buf1 = empty_strided((2, 4, 2, 5), (40, 10, 5, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2030139Z triton_fused__adaptive_avg_pool2d_1_1.run(arg0_1, buf1, 80, grid=grid(80), stream=stream0) 2023-01-11T21:38:06.2030214Z del arg0_1 2023-01-11T21:38:06.2030297Z return (buf0, buf1, ) 2023-01-11T21:38:06.2030302Z 2023-01-11T21:38:06.2030307Z 2023-01-11T21:38:06.2030389Z if __name__ == "__main__": 2023-01-11T21:38:06.2030508Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2030629Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2030845Z arg0_1 = rand_strided((2, 4, 3, 3), (36, 9, 3, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2030957Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2030965Z 2023-01-11T21:38:06.2031228Z [2023-01-11 21:33:55,784] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 377 2023-01-11T21:38:06.2031647Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2031779Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2032035Z [2023-01-11 21:33:55,811] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 378 2023-01-11T21:38:06.2032040Z 2023-01-11T21:38:06.2032137Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2032212Z import torch 2023-01-11T21:38:06.2032280Z import random 2023-01-11T21:38:06.2032400Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2032565Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2032570Z 2023-01-11T21:38:06.2032652Z aten = torch.ops.aten 2023-01-11T21:38:06.2032789Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2032886Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2032891Z 2023-01-11T21:38:06.2032966Z import triton 2023-01-11T21:38:06.2033056Z import triton.language as tl 2023-01-11T21:38:06.2033174Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2033313Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2033319Z 2023-01-11T21:38:06.2033323Z 2023-01-11T21:38:06.2033508Z triton_fused__adaptive_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2033583Z import triton 2023-01-11T21:38:06.2033677Z import triton.language as tl 2023-01-11T21:38:06.2033792Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2033894Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2034031Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2034149Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2034154Z 2023-01-11T21:38:06.2034558Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2034634Z @triton.jit 2023-01-11T21:38:06.2034766Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2034839Z xnumel = 288 2023-01-11T21:38:06.2034935Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2035065Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2035149Z xmask = xindex < xnumel 2023-01-11T21:38:06.2035213Z x0 = xindex 2023-01-11T21:38:06.2035403Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2035542Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2035629Z ''') 2023-01-11T21:38:06.2035634Z 2023-01-11T21:38:06.2035638Z 2023-01-11T21:38:06.2035825Z triton_fused__adaptive_avg_pool2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2035900Z import triton 2023-01-11T21:38:06.2036023Z import triton.language as tl 2023-01-11T21:38:06.2036131Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2036233Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2036364Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2036489Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2036494Z 2023-01-11T21:38:06.2036892Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2036966Z @triton.jit 2023-01-11T21:38:06.2037103Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2037176Z xnumel = 80 2023-01-11T21:38:06.2037265Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2037393Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2037476Z xmask = xindex < xnumel 2023-01-11T21:38:06.2037559Z x1 = (xindex // 5) % 2 2023-01-11T21:38:06.2037633Z x0 = xindex % 5 2023-01-11T21:38:06.2037710Z x3 = (xindex // 5) 2023-01-11T21:38:06.2037782Z x4 = xindex 2023-01-11T21:38:06.2037849Z tmp0 = 3*x1 2023-01-11T21:38:06.2037924Z tmp1 = 3 + (3*x1) 2023-01-11T21:38:06.2038001Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.2038080Z tmp3 = ((6*x0) // 5) 2023-01-11T21:38:06.2038163Z tmp4 = 2 + (((6*x0) // 5)) 2023-01-11T21:38:06.2038240Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.2038321Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.2038477Z tmp7 = tl.load(in_ptr0 + ((18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, other=0) 2023-01-11T21:38:06.2038577Z tmp8 = 1 2023-01-11T21:38:06.2038654Z tmp9 = tmp7 + tmp8 2023-01-11T21:38:06.2038749Z tmp10 = tl.where(tmp6, tmp9, 0.0) 2023-01-11T21:38:06.2038829Z tmp11 = 1 + (((6*x0) // 5)) 2023-01-11T21:38:06.2038910Z tmp12 = tmp11 < tmp4 2023-01-11T21:38:06.2038998Z tmp13 = tmp2 & tmp12 2023-01-11T21:38:06.2039153Z tmp14 = tl.load(in_ptr0 + (1 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp13 & xmask, other=0) 2023-01-11T21:38:06.2039226Z tmp15 = 1 2023-01-11T21:38:06.2039306Z tmp16 = tmp14 + tmp15 2023-01-11T21:38:06.2039401Z tmp17 = tl.where(tmp13, tmp16, 0.0) 2023-01-11T21:38:06.2039481Z tmp18 = tmp17 + tmp10 2023-01-11T21:38:06.2039555Z tmp19 = 1 + (3*x1) 2023-01-11T21:38:06.2039637Z tmp20 = tmp19 < tmp1 2023-01-11T21:38:06.2039709Z tmp21 = tmp20 & tmp5 2023-01-11T21:38:06.2039862Z tmp22 = tl.load(in_ptr0 + (6 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, other=0) 2023-01-11T21:38:06.2039939Z tmp23 = 1 2023-01-11T21:38:06.2040019Z tmp24 = tmp22 + tmp23 2023-01-11T21:38:06.2040112Z tmp25 = tl.where(tmp21, tmp24, 0.0) 2023-01-11T21:38:06.2040192Z tmp26 = tmp25 + tmp18 2023-01-11T21:38:06.2040272Z tmp27 = tmp20 & tmp12 2023-01-11T21:38:06.2040429Z tmp28 = tl.load(in_ptr0 + (7 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, other=0) 2023-01-11T21:38:06.2040502Z tmp29 = 1 2023-01-11T21:38:06.2040580Z tmp30 = tmp28 + tmp29 2023-01-11T21:38:06.2040677Z tmp31 = tl.where(tmp27, tmp30, 0.0) 2023-01-11T21:38:06.2040757Z tmp32 = tmp31 + tmp26 2023-01-11T21:38:06.2040835Z tmp33 = 2 + (3*x1) 2023-01-11T21:38:06.2040915Z tmp34 = tmp33 < tmp1 2023-01-11T21:38:06.2040988Z tmp35 = tmp34 & tmp5 2023-01-11T21:38:06.2041147Z tmp36 = tl.load(in_ptr0 + (12 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp35 & xmask, other=0) 2023-01-11T21:38:06.2041219Z tmp37 = 1 2023-01-11T21:38:06.2041304Z tmp38 = tmp36 + tmp37 2023-01-11T21:38:06.2041398Z tmp39 = tl.where(tmp35, tmp38, 0.0) 2023-01-11T21:38:06.2041479Z tmp40 = tmp39 + tmp32 2023-01-11T21:38:06.2041557Z tmp41 = tmp34 & tmp12 2023-01-11T21:38:06.2041735Z tmp42 = tl.load(in_ptr0 + (13 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0) 2023-01-11T21:38:06.2041812Z tmp43 = 1 2023-01-11T21:38:06.2041891Z tmp44 = tmp42 + tmp43 2023-01-11T21:38:06.2041987Z tmp45 = tl.where(tmp41, tmp44, 0.0) 2023-01-11T21:38:06.2042065Z tmp46 = tmp45 + 
tmp40 2023-01-11T21:38:06.2042147Z tmp47 = tmp1 < tmp1 2023-01-11T21:38:06.2042227Z tmp48 = tmp47 & tmp5 2023-01-11T21:38:06.2042375Z tmp49 = tl.load(in_ptr0 + (18 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp48 & xmask, other=0) 2023-01-11T21:38:06.2042448Z tmp50 = 1 2023-01-11T21:38:06.2042528Z tmp51 = tmp49 + tmp50 2023-01-11T21:38:06.2042624Z tmp52 = tl.where(tmp48, tmp51, 0.0) 2023-01-11T21:38:06.2042706Z tmp53 = tmp52 + tmp46 2023-01-11T21:38:06.2042786Z tmp54 = tmp47 & tmp12 2023-01-11T21:38:06.2042944Z tmp55 = tl.load(in_ptr0 + (19 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp54 & xmask, other=0) 2023-01-11T21:38:06.2043009Z tmp56 = 1 2023-01-11T21:38:06.2043088Z tmp57 = tmp55 + tmp56 2023-01-11T21:38:06.2043184Z tmp58 = tl.where(tmp54, tmp57, 0.0) 2023-01-11T21:38:06.2043263Z tmp59 = tmp58 + tmp53 2023-01-11T21:38:06.2043335Z tmp60 = 1 2023-01-11T21:38:06.2043430Z tmp61 = tl.where(tmp6, tmp60, 0.0) 2023-01-11T21:38:06.2043501Z tmp62 = 1 2023-01-11T21:38:06.2043587Z tmp63 = tl.where(tmp13, tmp62, 0.0) 2023-01-11T21:38:06.2043667Z tmp64 = tmp63 + tmp61 2023-01-11T21:38:06.2043737Z tmp65 = 1 2023-01-11T21:38:06.2043829Z tmp66 = tl.where(tmp21, tmp65, 0.0) 2023-01-11T21:38:06.2043910Z tmp67 = tmp66 + tmp64 2023-01-11T21:38:06.2043981Z tmp68 = 1 2023-01-11T21:38:06.2044066Z tmp69 = tl.where(tmp27, tmp68, 0.0) 2023-01-11T21:38:06.2044181Z tmp70 = tmp69 + tmp67 2023-01-11T21:38:06.2044252Z tmp71 = 1 2023-01-11T21:38:06.2044344Z tmp72 = tl.where(tmp35, tmp71, 0.0) 2023-01-11T21:38:06.2044423Z tmp73 = tmp72 + tmp70 2023-01-11T21:38:06.2044494Z tmp74 = 1 2023-01-11T21:38:06.2044586Z tmp75 = tl.where(tmp41, tmp74, 0.0) 2023-01-11T21:38:06.2044662Z tmp76 = tmp75 + tmp73 2023-01-11T21:38:06.2044732Z tmp77 = 1 2023-01-11T21:38:06.2044823Z tmp78 = tl.where(tmp48, tmp77, 0.0) 2023-01-11T21:38:06.2044902Z tmp79 = tmp78 + tmp76 2023-01-11T21:38:06.2044978Z tmp80 = 1 2023-01-11T21:38:06.2045069Z tmp81 = tl.where(tmp54, tmp80, 0.0) 2023-01-11T21:38:06.2045151Z tmp82 = tmp81 + tmp79 2023-01-11T21:38:06.2045223Z tmp83 = tmp59 / tmp82 2023-01-11T21:38:06.2045360Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp83, xmask) 2023-01-11T21:38:06.2045448Z ''') 2023-01-11T21:38:06.2045454Z 2023-01-11T21:38:06.2045458Z 2023-01-11T21:38:06.2045556Z async_compile.wait(globals()) 2023-01-11T21:38:06.2045632Z del async_compile 2023-01-11T21:38:06.2045638Z 2023-01-11T21:38:06.2045713Z def call(args): 2023-01-11T21:38:06.2045785Z arg0_1, = args 2023-01-11T21:38:06.2045853Z args.clear() 2023-01-11T21:38:06.2045946Z with torch.cuda.device(0): 2023-01-11T21:38:06.2046168Z buf0 = empty_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2046262Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2046419Z triton_fused__adaptive_avg_pool2d_0.run(arg0_1, buf0, 288, grid=grid(288), stream=stream0) 2023-01-11T21:38:06.2046633Z buf1 = empty_strided((2, 4, 2, 5), (40, 10, 5, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2046790Z triton_fused__adaptive_avg_pool2d_1_1.run(arg0_1, buf1, 80, grid=grid(80), stream=stream0) 2023-01-11T21:38:06.2046862Z del arg0_1 2023-01-11T21:38:06.2046939Z return (buf0, buf1, ) 2023-01-11T21:38:06.2046944Z 2023-01-11T21:38:06.2046958Z 2023-01-11T21:38:06.2047032Z if __name__ == "__main__": 2023-01-11T21:38:06.2047149Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2047275Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2047488Z arg0_1 = 
rand_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2047625Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2047631Z 2023-01-11T21:38:06.2047895Z [2023-01-11 21:33:56,054] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 378 2023-01-11T21:38:06.2047901Z 2023-01-11T21:38:06.2047998Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2048078Z import torch 2023-01-11T21:38:06.2048145Z import random 2023-01-11T21:38:06.2048265Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2048391Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2048396Z 2023-01-11T21:38:06.2048479Z aten = torch.ops.aten 2023-01-11T21:38:06.2048618Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2048715Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2048720Z 2023-01-11T21:38:06.2048794Z import triton 2023-01-11T21:38:06.2048879Z import triton.language as tl 2023-01-11T21:38:06.2049006Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2049149Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2049154Z 2023-01-11T21:38:06.2049158Z 2023-01-11T21:38:06.2049343Z triton_fused__adaptive_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2049417Z import triton 2023-01-11T21:38:06.2049509Z import triton.language as tl 2023-01-11T21:38:06.2049625Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2049727Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2049853Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2049977Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2050009Z 2023-01-11T21:38:06.2050411Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2050484Z @triton.jit 2023-01-11T21:38:06.2050617Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2050694Z xnumel = 288 2023-01-11T21:38:06.2050787Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2050918Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2050995Z xmask = xindex < xnumel 2023-01-11T21:38:06.2051065Z x0 = xindex 2023-01-11T21:38:06.2051279Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2051413Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2051498Z ''') 2023-01-11T21:38:06.2051509Z 2023-01-11T21:38:06.2051514Z 2023-01-11T21:38:06.2051701Z triton_fused__adaptive_avg_pool2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2051777Z import triton 2023-01-11T21:38:06.2051863Z import triton.language as tl 2023-01-11T21:38:06.2051975Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2052080Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2052211Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2052335Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2052341Z 2023-01-11T21:38:06.2052735Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2052813Z @triton.jit 2023-01-11T21:38:06.2052942Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2053008Z xnumel = 80 2023-01-11T21:38:06.2053106Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2053234Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2053316Z xmask = xindex < xnumel 2023-01-11T21:38:06.2053399Z x1 = (xindex // 5) % 2 2023-01-11T21:38:06.2053477Z x0 = xindex % 5 2023-01-11T21:38:06.2053583Z x3 = (xindex // 5) 2023-01-11T21:38:06.2053648Z x4 = xindex 2023-01-11T21:38:06.2053721Z tmp0 = 3*x1 2023-01-11T21:38:06.2053797Z tmp1 = 3 + (3*x1) 2023-01-11T21:38:06.2053873Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.2053951Z tmp3 = ((6*x0) // 5) 2023-01-11T21:38:06.2054031Z tmp4 = 2 + (((6*x0) // 5)) 2023-01-11T21:38:06.2054108Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.2054178Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.2054352Z tmp7 = tl.load(in_ptr0 + ((18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2054422Z tmp8 = 1 2023-01-11T21:38:06.2054616Z tmp9 = tmp7 + tmp8 2023-01-11T21:38:06.2054714Z tmp10 = tl.where(tmp6, tmp9, 0.0) 2023-01-11T21:38:06.2054796Z tmp11 = 1 + (((6*x0) // 5)) 2023-01-11T21:38:06.2054876Z tmp12 = tmp11 < tmp4 2023-01-11T21:38:06.2054949Z tmp13 = tmp2 & tmp12 2023-01-11T21:38:06.2055131Z tmp14 = tl.load(in_ptr0 + (1 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp13 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2055204Z tmp15 = 1 2023-01-11T21:38:06.2055285Z tmp16 = tmp14 + tmp15 2023-01-11T21:38:06.2055379Z tmp17 = tl.where(tmp13, tmp16, 0.0) 2023-01-11T21:38:06.2055459Z tmp18 = tmp17 + tmp10 2023-01-11T21:38:06.2055533Z tmp19 = 1 + (3*x1) 2023-01-11T21:38:06.2055606Z tmp20 = tmp19 < tmp1 2023-01-11T21:38:06.2055685Z tmp21 = tmp20 & tmp5 2023-01-11T21:38:06.2055857Z tmp22 = tl.load(in_ptr0 + (6 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2055930Z tmp23 = 1 2023-01-11T21:38:06.2056056Z tmp24 = tmp22 + tmp23 2023-01-11T21:38:06.2056150Z tmp25 = tl.where(tmp21, tmp24, 0.0) 2023-01-11T21:38:06.2056231Z tmp26 = tmp25 + tmp18 2023-01-11T21:38:06.2056303Z tmp27 = tmp20 & tmp12 2023-01-11T21:38:06.2056481Z tmp28 = tl.load(in_ptr0 + (7 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2056553Z tmp29 = 1 2023-01-11T21:38:06.2056632Z tmp30 = tmp28 + tmp29 2023-01-11T21:38:06.2056727Z tmp31 = tl.where(tmp27, tmp30, 0.0) 2023-01-11T21:38:06.2056808Z tmp32 = tmp31 + tmp26 2023-01-11T21:38:06.2056882Z tmp33 = 2 + (3*x1) 2023-01-11T21:38:06.2056955Z tmp34 = tmp33 < tmp1 2023-01-11T21:38:06.2057033Z tmp35 = tmp34 & tmp5 2023-01-11T21:38:06.2057263Z tmp36 = tl.load(in_ptr0 + (12 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp35 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2057337Z tmp37 = 1 2023-01-11T21:38:06.2057419Z tmp38 = tmp36 + tmp37 2023-01-11T21:38:06.2057519Z tmp39 = tl.where(tmp35, tmp38, 0.0) 2023-01-11T21:38:06.2057597Z tmp40 = tmp39 + tmp32 2023-01-11T21:38:06.2057671Z tmp41 = tmp34 & tmp12 2023-01-11T21:38:06.2057842Z tmp42 = tl.load(in_ptr0 + (13 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2057917Z tmp43 = 1 2023-01-11T21:38:06.2057995Z tmp44 = tmp42 + tmp43 
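    # tmp47 below compares tmp1 with itself and is always false, so the two
    # loads it guards (flat offsets 18 and 19) are fully masked out; they
    # appear to pad the unrolled window to a fixed row count.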
2023-01-11T21:38:06.2058089Z tmp45 = tl.where(tmp41, tmp44, 0.0) 2023-01-11T21:38:06.2058169Z tmp46 = tmp45 + tmp40 2023-01-11T21:38:06.2058248Z tmp47 = tmp1 < tmp1 2023-01-11T21:38:06.2058320Z tmp48 = tmp47 & tmp5 2023-01-11T21:38:06.2058490Z tmp49 = tl.load(in_ptr0 + (18 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp48 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2058563Z tmp50 = 1 2023-01-11T21:38:06.2058643Z tmp51 = tmp49 + tmp50 2023-01-11T21:38:06.2058737Z tmp52 = tl.where(tmp48, tmp51, 0.0) 2023-01-11T21:38:06.2058818Z tmp53 = tmp52 + tmp46 2023-01-11T21:38:06.2058895Z tmp54 = tmp47 & tmp12 2023-01-11T21:38:06.2059060Z tmp55 = tl.load(in_ptr0 + (19 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp54 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2059136Z tmp56 = 1 2023-01-11T21:38:06.2059215Z tmp57 = tmp55 + tmp56 2023-01-11T21:38:06.2059348Z tmp58 = tl.where(tmp54, tmp57, 0.0) 2023-01-11T21:38:06.2059428Z tmp59 = tmp58 + tmp53 2023-01-11T21:38:06.2059499Z tmp60 = 1 2023-01-11T21:38:06.2059593Z tmp61 = tl.where(tmp6, tmp60, 0.0) 2023-01-11T21:38:06.2059657Z tmp62 = 1 2023-01-11T21:38:06.2059750Z tmp63 = tl.where(tmp13, tmp62, 0.0) 2023-01-11T21:38:06.2059829Z tmp64 = tmp63 + tmp61 2023-01-11T21:38:06.2059901Z tmp65 = 1 2023-01-11T21:38:06.2059992Z tmp66 = tl.where(tmp21, tmp65, 0.0) 2023-01-11T21:38:06.2060071Z tmp67 = tmp66 + tmp64 2023-01-11T21:38:06.2060142Z tmp68 = 1 2023-01-11T21:38:06.2060227Z tmp69 = tl.where(tmp27, tmp68, 0.0) 2023-01-11T21:38:06.2060309Z tmp70 = tmp69 + tmp67 2023-01-11T21:38:06.2060382Z tmp71 = 1 2023-01-11T21:38:06.2060475Z tmp72 = tl.where(tmp35, tmp71, 0.0) 2023-01-11T21:38:06.2060554Z tmp73 = tmp72 + tmp70 2023-01-11T21:38:06.2060624Z tmp74 = 1 2023-01-11T21:38:06.2060715Z tmp75 = tl.where(tmp41, tmp74, 0.0) 2023-01-11T21:38:06.2060791Z tmp76 = tmp75 + tmp73 2023-01-11T21:38:06.2060861Z tmp77 = 1 2023-01-11T21:38:06.2060952Z tmp78 = tl.where(tmp48, tmp77, 0.0) 2023-01-11T21:38:06.2061032Z tmp79 = tmp78 + tmp76 2023-01-11T21:38:06.2061101Z tmp80 = 1 2023-01-11T21:38:06.2061194Z tmp81 = tl.where(tmp54, tmp80, 0.0) 2023-01-11T21:38:06.2061266Z tmp82 = tmp81 + tmp79 2023-01-11T21:38:06.2061345Z tmp83 = tmp59 / tmp82 2023-01-11T21:38:06.2061482Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp83, xmask) 2023-01-11T21:38:06.2061574Z ''') 2023-01-11T21:38:06.2061580Z 2023-01-11T21:38:06.2061585Z 2023-01-11T21:38:06.2061681Z async_compile.wait(globals()) 2023-01-11T21:38:06.2061787Z del async_compile 2023-01-11T21:38:06.2061793Z 2023-01-11T21:38:06.2061867Z def call(args): 2023-01-11T21:38:06.2061940Z arg0_1, = args 2023-01-11T21:38:06.2062008Z args.clear() 2023-01-11T21:38:06.2062103Z with torch.cuda.device(0): 2023-01-11T21:38:06.2062326Z buf0 = empty_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2062423Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2062598Z triton_fused__adaptive_avg_pool2d_0.run(arg0_1, buf0, 288, grid=grid(288), stream=stream0) 2023-01-11T21:38:06.2062840Z buf1 = empty_strided((2, 4, 2, 5), (40, 10, 5, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2063012Z triton_fused__adaptive_avg_pool2d_1_1.run(arg0_1, buf1, 80, grid=grid(80), stream=stream0) 2023-01-11T21:38:06.2063088Z del arg0_1 2023-01-11T21:38:06.2063167Z return (buf0, buf1, ) 2023-01-11T21:38:06.2063172Z 2023-01-11T21:38:06.2063180Z 2023-01-11T21:38:06.2063262Z if __name__ == "__main__": 2023-01-11T21:38:06.2063388Z from torch._dynamo.testing import 
rand_strided 2023-01-11T21:38:06.2063525Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2063767Z arg0_1 = rand_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2063891Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2063897Z 2023-01-11T21:38:06.2063969Z ok (2.046s) 2023-01-11T21:38:06.2064523Z test_adaptive_avg_pool2d2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2064667Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2064959Z [2023-01-11 21:33:56,072] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 379 2023-01-11T21:38:06.2065238Z [2023-01-11 21:33:56,077] torch._inductor.ir: [WARNING] Using FallbackKernel: aten._adaptive_avg_pool2d 2023-01-11T21:38:06.2065565Z [2023-01-11 21:33:56,080] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 379 2023-01-11T21:38:06.2065571Z 2023-01-11T21:38:06.2065676Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2065749Z import torch 2023-01-11T21:38:06.2065825Z import random 2023-01-11T21:38:06.2065954Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2066088Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2066093Z 2023-01-11T21:38:06.2066171Z aten = torch.ops.aten 2023-01-11T21:38:06.2066321Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2066422Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2066431Z 2023-01-11T21:38:06.2066506Z import triton 2023-01-11T21:38:06.2066602Z import triton.language as tl 2023-01-11T21:38:06.2066739Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2066891Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2066897Z 2023-01-11T21:38:06.2066901Z 2023-01-11T21:38:06.2066999Z async_compile.wait(globals()) 2023-01-11T21:38:06.2067071Z del async_compile 2023-01-11T21:38:06.2067076Z 2023-01-11T21:38:06.2067150Z def call(args): 2023-01-11T21:38:06.2067225Z arg0_1, = args 2023-01-11T21:38:06.2067301Z args.clear() 2023-01-11T21:38:06.2067401Z with torch.cuda.device(0): 2023-01-11T21:38:06.2067525Z buf0 = aten._adaptive_avg_pool2d(arg0_1, [4, 4]) 2023-01-11T21:38:06.2067598Z del arg0_1 2023-01-11T21:38:06.2067664Z buf1 = buf0 2023-01-11T21:38:06.2067785Z assert_size_stride(buf1, (2, 4, 4, 4), (64, 16, 4, 1)) 2023-01-11T21:38:06.2067857Z del buf0 2023-01-11T21:38:06.2067966Z return (buf1, ) 2023-01-11T21:38:06.2067971Z 2023-01-11T21:38:06.2067976Z 2023-01-11T21:38:06.2068058Z if __name__ == "__main__": 2023-01-11T21:38:06.2068183Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2068319Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2068583Z arg0_1 = rand_strided((2, 4, 21, 21), (1764, 441, 21, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2068687Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2068693Z 2023-01-11T21:38:06.2068764Z ok (0.025s) 2023-01-11T21:38:06.2069221Z test_add_const_float_cuda (__main__.CudaTests) ... 
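The UserWarning repeated throughout this run comes from the test helper's use of the deprecated TypedStorage API, x.storage().size(). A minimal sketch of the replacement the warning itself suggests, assuming the helper only needs the flattened element count (untyped storage is sized in bytes, hence the division by element_size()):

import torch

def flatten_storage(x: torch.Tensor) -> torch.Tensor:
    # Element count of the underlying storage via the non-deprecated API:
    # untyped storage reports its size in bytes.
    numel = x.untyped_storage().size() // x.element_size()
    # Same view the test builds: the whole storage as a 1-D tensor, cloned.
    return torch.as_strided(x, (numel,), (1,), 0).clone()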
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2069354Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2069614Z [2023-01-11 21:33:56,091] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 380 2023-01-11T21:38:06.2069879Z [2023-01-11 21:33:56,155] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 380 2023-01-11T21:38:06.2070290Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2070421Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2070675Z [2023-01-11 21:33:56,167] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 381 2023-01-11T21:38:06.2070940Z [2023-01-11 21:33:56,230] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 381 2023-01-11T21:38:06.2070946Z 2023-01-11T21:38:06.2071048Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2071115Z import torch 2023-01-11T21:38:06.2071189Z import random 2023-01-11T21:38:06.2071338Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2071465Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2071471Z 2023-01-11T21:38:06.2071552Z aten = torch.ops.aten 2023-01-11T21:38:06.2071688Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2071783Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2071789Z 2023-01-11T21:38:06.2071856Z import triton 2023-01-11T21:38:06.2071949Z import triton.language as tl 2023-01-11T21:38:06.2072074Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2072219Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2072228Z 2023-01-11T21:38:06.2072232Z 2023-01-11T21:38:06.2072387Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2072459Z import triton 2023-01-11T21:38:06.2072550Z import triton.language as tl 2023-01-11T21:38:06.2072664Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2072761Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2072897Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2073023Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2073029Z 2023-01-11T21:38:06.2073434Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2073507Z @triton.jit 2023-01-11T21:38:06.2073639Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2073712Z xnumel = 32 2023-01-11T21:38:06.2073860Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2073982Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 
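    # Standard Inductor pointwise indexing: each program instance handles XBLOCK
    # consecutive elements, and the xmask below turns off out-of-range lanes
    # whenever XBLOCK does not evenly divide xnumel.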
2023-01-11T21:38:06.2074063Z xmask = xindex < xnumel 2023-01-11T21:38:06.2074134Z x0 = xindex 2023-01-11T21:38:06.2074232Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2074305Z tmp1 = 1.5 2023-01-11T21:38:06.2074383Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2074518Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2074597Z ''') 2023-01-11T21:38:06.2074603Z 2023-01-11T21:38:06.2074607Z 2023-01-11T21:38:06.2074699Z async_compile.wait(globals()) 2023-01-11T21:38:06.2074775Z del async_compile 2023-01-11T21:38:06.2074780Z 2023-01-11T21:38:06.2074856Z def call(args): 2023-01-11T21:38:06.2074929Z arg0_1, = args 2023-01-11T21:38:06.2075006Z args.clear() 2023-01-11T21:38:06.2075099Z with torch.cuda.device(0): 2023-01-11T21:38:06.2075323Z buf0 = empty_strided((32, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2075433Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2075569Z triton_fused_add_0.run(arg0_1, buf0, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.2075642Z del arg0_1 2023-01-11T21:38:06.2075719Z return (buf0, ) 2023-01-11T21:38:06.2075724Z 2023-01-11T21:38:06.2075731Z 2023-01-11T21:38:06.2075810Z if __name__ == "__main__": 2023-01-11T21:38:06.2075928Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2076055Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2076248Z arg0_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2076358Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2076363Z 2023-01-11T21:38:06.2076368Z 2023-01-11T21:38:06.2076465Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2076537Z import torch 2023-01-11T21:38:06.2076611Z import random 2023-01-11T21:38:06.2076734Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2076856Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2076861Z 2023-01-11T21:38:06.2076944Z aten = torch.ops.aten 2023-01-11T21:38:06.2077071Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2077166Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2077203Z 2023-01-11T21:38:06.2077279Z import triton 2023-01-11T21:38:06.2077371Z import triton.language as tl 2023-01-11T21:38:06.2077495Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2077636Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2077641Z 2023-01-11T21:38:06.2077646Z 2023-01-11T21:38:06.2077799Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2077874Z import triton 2023-01-11T21:38:06.2077959Z import triton.language as tl 2023-01-11T21:38:06.2078072Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2078173Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2078307Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2078432Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2078437Z 2023-01-11T21:38:06.2078845Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2078918Z @triton.jit 2023-01-11T21:38:06.2079050Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2079116Z xnumel = 32 2023-01-11T21:38:06.2079212Z xoffset = tl.program_id(0) * XBLOCK 
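    # fp16 variant of the kernel above: the load below upcasts to fp32 via
    # .to(tl.float32), the add runs in fp32, and the result is narrowed back
    # to fp16 on the store through the *fp16 out_ptr0.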
2023-01-11T21:38:06.2079341Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2079427Z xmask = xindex < xnumel 2023-01-11T21:38:06.2079497Z x0 = xindex 2023-01-11T21:38:06.2079614Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2079718Z tmp1 = 1.5 2023-01-11T21:38:06.2079791Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2079926Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2080012Z ''') 2023-01-11T21:38:06.2080018Z 2023-01-11T21:38:06.2080022Z 2023-01-11T21:38:06.2080115Z async_compile.wait(globals()) 2023-01-11T21:38:06.2080194Z del async_compile 2023-01-11T21:38:06.2080199Z 2023-01-11T21:38:06.2080272Z def call(args): 2023-01-11T21:38:06.2080346Z arg0_1, = args 2023-01-11T21:38:06.2080414Z args.clear() 2023-01-11T21:38:06.2080506Z with torch.cuda.device(0): 2023-01-11T21:38:06.2080703Z buf0 = empty_strided((32, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2080795Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2080928Z triton_fused_add_0.run(arg0_1, buf0, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.2081002Z del arg0_1 2023-01-11T21:38:06.2081079Z return (buf0, ) 2023-01-11T21:38:06.2081087Z 2023-01-11T21:38:06.2081091Z 2023-01-11T21:38:06.2081172Z if __name__ == "__main__": 2023-01-11T21:38:06.2081283Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2081410Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2081610Z arg0_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2081722Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2081727Z 2023-01-11T21:38:06.2081797Z ok (0.151s) 2023-01-11T21:38:06.2082259Z test_add_const_int_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2082390Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2082648Z [2023-01-11 21:33:56,242] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 382 2023-01-11T21:38:06.2082912Z [2023-01-11 21:33:56,308] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 382 2023-01-11T21:38:06.2083347Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2083481Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2083737Z [2023-01-11 21:33:56,320] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 383 2023-01-11T21:38:06.2083999Z [2023-01-11 21:33:56,382] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 383 2023-01-11T21:38:06.2084007Z 2023-01-11T21:38:06.2084106Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2084181Z import torch 2023-01-11T21:38:06.2084255Z import random 2023-01-11T21:38:06.2084373Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2084504Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2084510Z 2023-01-11T21:38:06.2084584Z aten = torch.ops.aten 2023-01-11T21:38:06.2084721Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2084815Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2084820Z 2023-01-11T21:38:06.2084893Z import triton 2023-01-11T21:38:06.2084985Z import triton.language as tl 2023-01-11T21:38:06.2085110Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2085248Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2085253Z 2023-01-11T21:38:06.2085257Z 2023-01-11T21:38:06.2085441Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2085508Z import triton 2023-01-11T21:38:06.2085600Z import triton.language as tl 2023-01-11T21:38:06.2085714Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2085816Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2085950Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2086075Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2086080Z 2023-01-11T21:38:06.2086485Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2086559Z @triton.jit 2023-01-11T21:38:06.2086685Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2086756Z xnumel = 32 2023-01-11T21:38:06.2086853Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2086985Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2087070Z xmask = xindex < xnumel 2023-01-11T21:38:06.2087139Z x0 = xindex 2023-01-11T21:38:06.2087237Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2087301Z tmp1 = 1 2023-01-11T21:38:06.2087379Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2087519Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2087607Z ''') 2023-01-11T21:38:06.2087612Z 2023-01-11T21:38:06.2087617Z 2023-01-11T21:38:06.2087710Z async_compile.wait(globals()) 2023-01-11T21:38:06.2087786Z del async_compile 2023-01-11T21:38:06.2087791Z 2023-01-11T21:38:06.2087866Z def call(args): 2023-01-11T21:38:06.2087933Z arg0_1, = args 2023-01-11T21:38:06.2088007Z args.clear() 2023-01-11T21:38:06.2088099Z with torch.cuda.device(0): 2023-01-11T21:38:06.2088297Z buf0 = empty_strided((32, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2088391Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2088526Z triton_fused_add_0.run(arg0_1, buf0, 32, grid=grid(32), stream=stream0) 
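        # Dropping the input reference immediately after the asynchronous launch
        # is safe: caching-allocator reuse is stream-ordered, so this block can
        # only back a later allocation once the kernel has completed.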
2023-01-11T21:38:06.2088600Z del arg0_1 2023-01-11T21:38:06.2088671Z return (buf0, ) 2023-01-11T21:38:06.2088684Z 2023-01-11T21:38:06.2088688Z 2023-01-11T21:38:06.2088761Z if __name__ == "__main__": 2023-01-11T21:38:06.2088911Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2089037Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2089239Z arg0_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2089354Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2089360Z 2023-01-11T21:38:06.2089364Z 2023-01-11T21:38:06.2089461Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2089533Z import torch 2023-01-11T21:38:06.2089607Z import random 2023-01-11T21:38:06.2089718Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2089843Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2089848Z 2023-01-11T21:38:06.2089930Z aten = torch.ops.aten 2023-01-11T21:38:06.2090068Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2090163Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2090168Z 2023-01-11T21:38:06.2090241Z import triton 2023-01-11T21:38:06.2090336Z import triton.language as tl 2023-01-11T21:38:06.2090454Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2090592Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2090598Z 2023-01-11T21:38:06.2090602Z 2023-01-11T21:38:06.2090756Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2090831Z import triton 2023-01-11T21:38:06.2090923Z import triton.language as tl 2023-01-11T21:38:06.2091035Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2091138Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2091271Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2091418Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2091423Z 2023-01-11T21:38:06.2091825Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2091899Z @triton.jit 2023-01-11T21:38:06.2092033Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2092106Z xnumel = 32 2023-01-11T21:38:06.2092203Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2092330Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2092416Z xmask = xindex < xnumel 2023-01-11T21:38:06.2092480Z x0 = xindex 2023-01-11T21:38:06.2092596Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2092667Z tmp1 = 1 2023-01-11T21:38:06.2092751Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2092884Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2092971Z ''') 2023-01-11T21:38:06.2092976Z 2023-01-11T21:38:06.2092981Z 2023-01-11T21:38:06.2093071Z async_compile.wait(globals()) 2023-01-11T21:38:06.2093140Z del async_compile 2023-01-11T21:38:06.2093146Z 2023-01-11T21:38:06.2093224Z def call(args): 2023-01-11T21:38:06.2093297Z arg0_1, = args 2023-01-11T21:38:06.2093371Z args.clear() 2023-01-11T21:38:06.2093461Z with torch.cuda.device(0): 2023-01-11T21:38:06.2093658Z buf0 = empty_strided((32, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2093751Z stream0 = get_cuda_stream(0) 
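        # Identical wrapper to the fp32 version above; only the buffer dtype
        # (torch.float16) and the kernel's *fp16 pointer signature differ.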
2023-01-11T21:38:06.2093878Z triton_fused_add_0.run(arg0_1, buf0, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.2093951Z del arg0_1 2023-01-11T21:38:06.2094029Z return (buf0, ) 2023-01-11T21:38:06.2094034Z 2023-01-11T21:38:06.2094039Z 2023-01-11T21:38:06.2094120Z if __name__ == "__main__": 2023-01-11T21:38:06.2094241Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2094366Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2094675Z arg0_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2094834Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2094840Z 2023-01-11T21:38:06.2094905Z ok (0.152s) 2023-01-11T21:38:06.2095378Z test_add_inplace_permuted_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2095510Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2095788Z [2023-01-11 21:33:56,395] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 384 2023-01-11T21:38:06.2096054Z [2023-01-11 21:33:56,481] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 384 2023-01-11T21:38:06.2096470Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2096600Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2096858Z [2023-01-11 21:33:56,494] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 385 2023-01-11T21:38:06.2097120Z [2023-01-11 21:33:56,573] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 385 2023-01-11T21:38:06.2097211Z 2023-01-11T21:38:06.2097333Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2097407Z import torch 2023-01-11T21:38:06.2097474Z import random 2023-01-11T21:38:06.2097594Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2097716Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2097724Z 2023-01-11T21:38:06.2097806Z aten = torch.ops.aten 2023-01-11T21:38:06.2097942Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2098038Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2098043Z 2023-01-11T21:38:06.2098116Z import triton 2023-01-11T21:38:06.2098208Z import triton.language as tl 2023-01-11T21:38:06.2098326Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2098466Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2098471Z 2023-01-11T21:38:06.2098476Z 2023-01-11T21:38:06.2098634Z triton_fused_add__0 = async_compile.triton(''' 2023-01-11T21:38:06.2098712Z import triton 2023-01-11T21:38:06.2098804Z import triton.language as tl 2023-01-11T21:38:06.2098916Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2099017Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2099143Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2099272Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2099277Z 2023-01-11T21:38:06.2099719Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2099791Z @triton.jit 2023-01-11T21:38:06.2099932Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2100008Z xnumel = 5304 2023-01-11T21:38:06.2100108Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2100236Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2100315Z xmask = xindex < xnumel 2023-01-11T21:38:06.2100385Z x3 = xindex 2023-01-11T21:38:06.2100461Z x0 = xindex % 221 2023-01-11T21:38:06.2100542Z x2 = (xindex // 2652) 2023-01-11T21:38:06.2100763Z tmp0 = tl.load(in_ptr0 + (x3), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2100872Z tmp1 = tl.load(in_ptr1 + (x0 + (221*x2)), xmask) 2023-01-11T21:38:06.2100951Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2101078Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2101163Z ''') 2023-01-11T21:38:06.2101168Z 2023-01-11T21:38:06.2101173Z 2023-01-11T21:38:06.2101266Z async_compile.wait(globals()) 2023-01-11T21:38:06.2101342Z del async_compile 2023-01-11T21:38:06.2101347Z 2023-01-11T21:38:06.2101421Z def call(args): 2023-01-11T21:38:06.2101499Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2101572Z args.clear() 2023-01-11T21:38:06.2101663Z with torch.cuda.device(0): 2023-01-11T21:38:06.2101750Z stream0 = 
get_cuda_stream(0) 2023-01-11T21:38:06.2101899Z triton_fused_add__0.run(arg0_1, arg1_1, arg0_1, 5304, grid=grid(5304), stream=stream0) 2023-01-11T21:38:06.2101973Z del arg1_1 2023-01-11T21:38:06.2102053Z return (arg0_1, ) 2023-01-11T21:38:06.2102058Z 2023-01-11T21:38:06.2102065Z 2023-01-11T21:38:06.2102147Z if __name__ == "__main__": 2023-01-11T21:38:06.2102262Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2102385Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2102611Z arg0_1 = rand_strided((2, 13, 12, 17), (2652, 17, 221, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2102821Z arg1_1 = rand_strided((2, 13, 1, 17), (221, 17, 17, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2102941Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2102946Z 2023-01-11T21:38:06.2102951Z 2023-01-11T21:38:06.2103084Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2103159Z import torch 2023-01-11T21:38:06.2103233Z import random 2023-01-11T21:38:06.2103351Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2103474Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2103479Z 2023-01-11T21:38:06.2103559Z aten = torch.ops.aten 2023-01-11T21:38:06.2103690Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2103784Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2103789Z 2023-01-11T21:38:06.2103862Z import triton 2023-01-11T21:38:06.2103954Z import triton.language as tl 2023-01-11T21:38:06.2104078Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2104218Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2104223Z 2023-01-11T21:38:06.2104228Z 2023-01-11T21:38:06.2104382Z triton_fused_add__0 = async_compile.triton(''' 2023-01-11T21:38:06.2104456Z import triton 2023-01-11T21:38:06.2104545Z import triton.language as tl 2023-01-11T21:38:06.2104658Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2104759Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2104892Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2105016Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2105023Z 2023-01-11T21:38:06.2105492Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2105584Z @triton.jit 2023-01-11T21:38:06.2105726Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2105794Z xnumel = 5304 2023-01-11T21:38:06.2105890Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2106018Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2106104Z xmask = xindex < xnumel 2023-01-11T21:38:06.2106174Z x3 = xindex 2023-01-11T21:38:06.2106248Z x0 = xindex % 221 2023-01-11T21:38:06.2106331Z x2 = (xindex // 2652) 2023-01-11T21:38:06.2106539Z tmp0 = tl.load(in_ptr0 + (x3), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2106741Z tmp1 = tl.load(in_ptr1 + (x0 + (221*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2106820Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2106956Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2107040Z ''') 2023-01-11T21:38:06.2107046Z 
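# Note the aliasing in call() below: arg0_1 is passed as both in_ptr0 and
# out_ptr0, so the kernel updates the permuted tensor in place and returns
# arg0_1 itself; 'mutated_arg_names' in the @pointwise metadata above records
# exactly this pair.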
2023-01-11T21:38:06.2107050Z 2023-01-11T21:38:06.2107143Z async_compile.wait(globals()) 2023-01-11T21:38:06.2107219Z del async_compile 2023-01-11T21:38:06.2107224Z 2023-01-11T21:38:06.2107292Z def call(args): 2023-01-11T21:38:06.2107371Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2107445Z args.clear() 2023-01-11T21:38:06.2107537Z with torch.cuda.device(0): 2023-01-11T21:38:06.2107632Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2107780Z triton_fused_add__0.run(arg0_1, arg1_1, arg0_1, 5304, grid=grid(5304), stream=stream0) 2023-01-11T21:38:06.2107853Z del arg1_1 2023-01-11T21:38:06.2107926Z return (arg0_1, ) 2023-01-11T21:38:06.2107938Z 2023-01-11T21:38:06.2107942Z 2023-01-11T21:38:06.2108018Z if __name__ == "__main__": 2023-01-11T21:38:06.2108138Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2108263Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2108489Z arg0_1 = rand_strided((2, 13, 12, 17), (2652, 17, 221, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2108704Z arg1_1 = rand_strided((2, 13, 1, 17), (221, 17, 17, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2108824Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2108829Z 2023-01-11T21:38:06.2108899Z ok (0.192s) 2023-01-11T21:38:06.2109353Z test_addmm_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2109512Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2109761Z [2023-01-11 21:33:56,601] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 386 2023-01-11T21:38:06.2110025Z [2023-01-11 21:33:56,826] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 386 2023-01-11T21:38:06.2110440Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2110576Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2110831Z [2023-01-11 21:33:56,924] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 387 2023-01-11T21:38:06.2110840Z 2023-01-11T21:38:06.2110938Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2111012Z import torch 2023-01-11T21:38:06.2111086Z import random 2023-01-11T21:38:06.2111204Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2111321Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2111326Z 2023-01-11T21:38:06.2111406Z aten = torch.ops.aten 2023-01-11T21:38:06.2111542Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2111636Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2111641Z 2023-01-11T21:38:06.2111714Z import triton 2023-01-11T21:38:06.2111811Z import triton.language as tl 2023-01-11T21:38:06.2111935Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2112074Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2112080Z 2023-01-11T21:38:06.2112084Z 2023-01-11T21:38:06.2112232Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2112334Z import triton 2023-01-11T21:38:06.2112428Z import triton.language as tl 2023-01-11T21:38:06.2112542Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2112644Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2112777Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2112902Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2112908Z 2023-01-11T21:38:06.2113312Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2113381Z @triton.jit 2023-01-11T21:38:06.2113513Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2113585Z xnumel = 64 2023-01-11T21:38:06.2113682Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2113813Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2113896Z xmask = xindex < xnumel 2023-01-11T21:38:06.2113965Z x0 = xindex 2023-01-11T21:38:06.2114055Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2114125Z tmp1 = 1 2023-01-11T21:38:06.2114203Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2114337Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2114422Z ''') 2023-01-11T21:38:06.2114428Z 2023-01-11T21:38:06.2114432Z 2023-01-11T21:38:06.2114589Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2114663Z import triton 2023-01-11T21:38:06.2114756Z import triton.language as tl 2023-01-11T21:38:06.2114908Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2115008Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2115151Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2115295Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2115302Z 2023-01-11T21:38:06.2115731Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2115805Z @triton.jit 2023-01-11T21:38:06.2115937Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2116010Z xnumel = 64 2023-01-11T21:38:06.2116099Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2116228Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2116311Z xmask = xindex < xnumel 2023-01-11T21:38:06.2116385Z x0 = xindex 2023-01-11T21:38:06.2116480Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2116550Z tmp1 = 2 2023-01-11T21:38:06.2116622Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2116754Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2116840Z ''') 2023-01-11T21:38:06.2116847Z 2023-01-11T21:38:06.2116853Z 2023-01-11T21:38:06.2117009Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.2117086Z import triton 2023-01-11T21:38:06.2117179Z import triton.language as tl 2023-01-11T21:38:06.2117291Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2117392Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2117517Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2117639Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2117644Z 2023-01-11T21:38:06.2118045Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2118121Z @triton.jit 2023-01-11T21:38:06.2118252Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2118325Z xnumel = 64 2023-01-11T21:38:06.2118452Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2118582Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2118658Z xmask = xindex < xnumel 2023-01-11T21:38:06.2118726Z x0 = xindex 2023-01-11T21:38:06.2118822Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2118893Z tmp1 = 3 2023-01-11T21:38:06.2118971Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2119104Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2119189Z ''') 2023-01-11T21:38:06.2119195Z 2023-01-11T21:38:06.2119199Z 2023-01-11T21:38:06.2119348Z triton_fused_add_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.2119426Z import triton 2023-01-11T21:38:06.2119518Z import triton.language as tl 2023-01-11T21:38:06.2119631Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2119733Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2119864Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2119992Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2119997Z 2023-01-11T21:38:06.2120396Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2120463Z @triton.jit 2023-01-11T21:38:06.2120587Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2120659Z xnumel = 64 2023-01-11T21:38:06.2120755Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2120883Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2120996Z xmask = xindex < xnumel 
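    # Single in_out_ptr0 argument: this epilogue adds the constant in place to
    # the addmm result; the wrapper below reuses the buffer (buf4 = buf3) rather
    # than allocating a new output.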
2023-01-11T21:38:06.2121066Z x0 = xindex 2023-01-11T21:38:06.2121161Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2121232Z tmp1 = 4 2023-01-11T21:38:06.2121310Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2121449Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2121534Z ''') 2023-01-11T21:38:06.2121539Z 2023-01-11T21:38:06.2121544Z 2023-01-11T21:38:06.2121638Z async_compile.wait(globals()) 2023-01-11T21:38:06.2121715Z del async_compile 2023-01-11T21:38:06.2121720Z 2023-01-11T21:38:06.2121792Z def call(args): 2023-01-11T21:38:06.2121873Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.2121948Z args.clear() 2023-01-11T21:38:06.2122040Z with torch.cuda.device(0): 2023-01-11T21:38:06.2122241Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2122334Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2122474Z triton_fused_add_0.run(arg0_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2122547Z del arg0_1 2023-01-11T21:38:06.2122738Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2122880Z triton_fused_add_1_1.run(arg1_1, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2122953Z del arg1_1 2023-01-11T21:38:06.2123148Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2123284Z triton_fused_add_2_2.run(arg2_1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2123356Z del arg2_1 2023-01-11T21:38:06.2123546Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2123671Z aten.addmm.out(buf0, buf1, buf2, beta=1, alpha=1, out=buf3) 2023-01-11T21:38:06.2123745Z del buf0 2023-01-11T21:38:06.2123814Z del buf1 2023-01-11T21:38:06.2123886Z del buf2 2023-01-11T21:38:06.2123976Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:06.2124106Z triton_fused_add_3_3.run(buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2124183Z return (buf4, ) 2023-01-11T21:38:06.2124189Z 2023-01-11T21:38:06.2124193Z 2023-01-11T21:38:06.2124273Z if __name__ == "__main__": 2023-01-11T21:38:06.2124410Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2124538Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2124737Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2124935Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2125130Z arg2_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2125255Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.2125261Z 2023-01-11T21:38:06.2125553Z [2023-01-11 21:33:57,016] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 387 2023-01-11T21:38:06.2125564Z 2023-01-11T21:38:06.2125683Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2125751Z import torch 2023-01-11T21:38:06.2125825Z import random 2023-01-11T21:38:06.2125944Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2126072Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2126077Z 2023-01-11T21:38:06.2126159Z aten = torch.ops.aten 2023-01-11T21:38:06.2126295Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2126391Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2126397Z 2023-01-11T21:38:06.2126468Z import triton 2023-01-11T21:38:06.2126555Z import 
triton.language as tl 2023-01-11T21:38:06.2126679Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2126817Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2126822Z 2023-01-11T21:38:06.2126857Z 2023-01-11T21:38:06.2127011Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2127085Z import triton 2023-01-11T21:38:06.2127178Z import triton.language as tl 2023-01-11T21:38:06.2127290Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2127384Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2127518Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2127641Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2127646Z 2023-01-11T21:38:06.2128048Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2128123Z @triton.jit 2023-01-11T21:38:06.2128255Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2128328Z xnumel = 64 2023-01-11T21:38:06.2128423Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2128547Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2128630Z xmask = xindex < xnumel 2023-01-11T21:38:06.2128699Z x0 = xindex 2023-01-11T21:38:06.2128816Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2128886Z tmp1 = 1 2023-01-11T21:38:06.2128967Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2129103Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2129181Z ''') 2023-01-11T21:38:06.2129186Z 2023-01-11T21:38:06.2129197Z 2023-01-11T21:38:06.2129347Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2129422Z import triton 2023-01-11T21:38:06.2129513Z import triton.language as tl 2023-01-11T21:38:06.2129628Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2129730Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2129862Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2129987Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2129992Z 2023-01-11T21:38:06.2130386Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2130490Z @triton.jit 2023-01-11T21:38:06.2130621Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2130694Z xnumel = 64 2023-01-11T21:38:06.2130790Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2130918Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2131001Z xmask = xindex < xnumel 2023-01-11T21:38:06.2131064Z x0 = xindex 2023-01-11T21:38:06.2131180Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2131251Z tmp1 = 2 2023-01-11T21:38:06.2131331Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2131463Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2131550Z ''') 2023-01-11T21:38:06.2131556Z 2023-01-11T21:38:06.2131560Z 2023-01-11T21:38:06.2131715Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.2131788Z import triton 2023-01-11T21:38:06.2131874Z import 
triton.language as tl 2023-01-11T21:38:06.2131990Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2132091Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2132224Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2132350Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2132355Z 2023-01-11T21:38:06.2132755Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2132828Z @triton.jit 2023-01-11T21:38:06.2132957Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2133053Z xnumel = 64 2023-01-11T21:38:06.2133152Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2133281Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2133366Z xmask = xindex < xnumel 2023-01-11T21:38:06.2133438Z x0 = xindex 2023-01-11T21:38:06.2133554Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2133624Z tmp1 = 3 2023-01-11T21:38:06.2133695Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2133826Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2133911Z ''') 2023-01-11T21:38:06.2133916Z 2023-01-11T21:38:06.2133921Z 2023-01-11T21:38:06.2134077Z triton_fused_add_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.2134150Z import triton 2023-01-11T21:38:06.2134242Z import triton.language as tl 2023-01-11T21:38:06.2134354Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2134451Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2134700Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2134826Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2134831Z 2023-01-11T21:38:06.2135236Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2135309Z @triton.jit 2023-01-11T21:38:06.2135435Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2135508Z xnumel = 64 2023-01-11T21:38:06.2135604Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2135725Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2135808Z xmask = xindex < xnumel 2023-01-11T21:38:06.2135879Z x0 = xindex 2023-01-11T21:38:06.2135999Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2136074Z tmp1 = 4 2023-01-11T21:38:06.2136150Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2136287Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2136365Z ''') 2023-01-11T21:38:06.2136370Z 2023-01-11T21:38:06.2136381Z 2023-01-11T21:38:06.2136467Z async_compile.wait(globals()) 2023-01-11T21:38:06.2136588Z del async_compile 2023-01-11T21:38:06.2136594Z 2023-01-11T21:38:06.2136669Z def call(args): 2023-01-11T21:38:06.2136757Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.2136832Z args.clear() 2023-01-11T21:38:06.2136924Z with torch.cuda.device(0): 2023-01-11T21:38:06.2137167Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2137267Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2137420Z triton_fused_add_0.run(arg0_1, buf0, 64, 
grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2137494Z del arg0_1 2023-01-11T21:38:06.2137695Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2137836Z triton_fused_add_1_1.run(arg1_1, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2137911Z del arg1_1 2023-01-11T21:38:06.2138104Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2138235Z triton_fused_add_2_2.run(arg2_1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2138308Z del arg2_1 2023-01-11T21:38:06.2138501Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2138631Z aten.addmm.out(buf0, buf1, buf2, beta=1, alpha=1, out=buf3) 2023-01-11T21:38:06.2138703Z del buf0 2023-01-11T21:38:06.2138772Z del buf1 2023-01-11T21:38:06.2138840Z del buf2 2023-01-11T21:38:06.2138924Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:06.2139055Z triton_fused_add_3_3.run(buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2139177Z return (buf4, ) 2023-01-11T21:38:06.2139182Z 2023-01-11T21:38:06.2139187Z 2023-01-11T21:38:06.2139265Z if __name__ == "__main__": 2023-01-11T21:38:06.2139382Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2139506Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2139708Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2139903Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2140093Z arg2_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2140221Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.2140227Z 2023-01-11T21:38:06.2140296Z ok (0.444s) 2023-01-11T21:38:06.2140757Z test_alexnet_prefix_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2140888Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2141149Z [2023-01-11 21:33:57,096] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 388 2023-01-11T21:38:06.2141362Z [2023-01-11 21:33:57,185] torch._inductor.scheduler: [DEBUG] removed dead node: buf3 2023-01-11T21:38:06.2141623Z [2023-01-11 21:33:57,325] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 388 2023-01-11T21:38:06.2142039Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2142171Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2142424Z [2023-01-11 21:33:57,387] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 389 2023-01-11T21:38:06.2142660Z [2023-01-11 21:33:57,438] torch._inductor.scheduler: [DEBUG] removed dead node: buf3 2023-01-11T21:38:06.2142674Z 2023-01-11T21:38:06.2142766Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2142840Z import torch 2023-01-11T21:38:06.2142913Z import random 2023-01-11T21:38:06.2143032Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2143155Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2143161Z 2023-01-11T21:38:06.2143242Z aten = torch.ops.aten 2023-01-11T21:38:06.2143378Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2143468Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2143475Z 2023-01-11T21:38:06.2143548Z import triton 2023-01-11T21:38:06.2143640Z import triton.language as tl 2023-01-11T21:38:06.2143765Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2143907Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2143912Z 2023-01-11T21:38:06.2143917Z 2023-01-11T21:38:06.2144099Z triton_fused_convolution_relu_0 = async_compile.triton(''' 2023-01-11T21:38:06.2144174Z import triton 2023-01-11T21:38:06.2144266Z import triton.language as tl 2023-01-11T21:38:06.2144372Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2144476Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2144609Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2144733Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2144738Z 2023-01-11T21:38:06.2145156Z @pointwise(size_hints=[4194304], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2145265Z @triton.jit 2023-01-11T21:38:06.2145399Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2145474Z xnumel = 3097600 2023-01-11T21:38:06.2145567Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2145693Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2145774Z xmask = xindex < xnumel 2023-01-11T21:38:06.2145845Z x3 = xindex 2023-01-11T21:38:06.2145930Z x1 = (xindex // 3025) % 64 2023-01-11T21:38:06.2146038Z tmp0 = tl.load(in_out_ptr0 + (x3), xmask) 2023-01-11T21:38:06.2146132Z tmp1 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.2146205Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2146322Z tmp3 = tl.where(0 != 0, 0, tl.where(0 > tmp2, 0, tmp2)) 2023-01-11T21:38:06.2146460Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2146548Z ''') 2023-01-11T21:38:06.2146553Z 2023-01-11T21:38:06.2146558Z 2023-01-11T21:38:06.2146717Z triton_fused_getitem_1 = async_compile.triton(''' 2023-01-11T21:38:06.2146789Z import triton 2023-01-11T21:38:06.2146880Z import triton.language as tl 2023-01-11T21:38:06.2146990Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2147090Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2147222Z from torch._inductor.triton_ops.autotune import 
pointwise 2023-01-11T21:38:06.2147346Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2147351Z 2023-01-11T21:38:06.2147758Z @pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2147832Z @triton.jit 2023-01-11T21:38:06.2147964Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2148042Z xnumel = 746496 2023-01-11T21:38:06.2148132Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2148259Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2148341Z xmask = xindex < xnumel 2023-01-11T21:38:06.2148416Z x0 = xindex % 27 2023-01-11T21:38:06.2148526Z x1 = (xindex // 27) % 27 2023-01-11T21:38:06.2148608Z x2 = (xindex // 729) 2023-01-11T21:38:06.2148678Z x3 = xindex 2023-01-11T21:38:06.2148791Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2148909Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2149028Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2149146Z tmp5 = tl.load(in_ptr0 + (55 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2149263Z tmp7 = tl.load(in_ptr0 + (56 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2149381Z tmp9 = tl.load(in_ptr0 + (57 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2149500Z tmp11 = tl.load(in_ptr0 + (110 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2149617Z tmp13 = tl.load(in_ptr0 + (111 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2149728Z tmp15 = tl.load(in_ptr0 + (112 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2149864Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.2150000Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.2150131Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.2150261Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.2150394Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 2023-01-11T21:38:06.2150536Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.2150702Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.2150828Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.2150958Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.2151049Z ''') 2023-01-11T21:38:06.2151055Z 2023-01-11T21:38:06.2151059Z 2023-01-11T21:38:06.2151153Z async_compile.wait(globals()) 2023-01-11T21:38:06.2151231Z del async_compile 2023-01-11T21:38:06.2151237Z 2023-01-11T21:38:06.2151313Z def call(args): 2023-01-11T21:38:06.2151399Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.2151475Z args.clear() 2023-01-11T21:38:06.2151561Z with torch.cuda.device(0): 2023-01-11T21:38:06.2151704Z buf0 = aten.convolution(arg2_1, arg1_1, None, (4, 4), (2, 2), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:06.2151824Z assert_size_stride(buf0, (16, 64, 55, 55), (193600, 3025, 55, 1)) 2023-01-11T21:38:06.2151900Z del arg1_1 2023-01-11T21:38:06.2151975Z del arg2_1 2023-01-11T21:38:06.2152103Z buf1 = 
as_strided(buf0, (16, 64, 55, 55), (193600, 3025, 55, 1)); del buf0 # reuse 2023-01-11T21:38:06.2152196Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2152349Z triton_fused_convolution_relu_0.run(buf1, arg0_1, 3097600, grid=grid(3097600), stream=stream0) 2023-01-11T21:38:06.2152422Z del arg0_1 2023-01-11T21:38:06.2152652Z buf2 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2152796Z triton_fused_getitem_1.run(buf1, buf2, 746496, grid=grid(746496), stream=stream0) 2023-01-11T21:38:06.2152877Z return (buf2, ) 2023-01-11T21:38:06.2152882Z 2023-01-11T21:38:06.2152887Z 2023-01-11T21:38:06.2152965Z if __name__ == "__main__": 2023-01-11T21:38:06.2153083Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2153212Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2153407Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2153630Z arg1_1 = rand_strided((64, 3, 11, 11), (363, 121, 11, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2153891Z arg2_1 = rand_strided((16, 3, 224, 224), (150528, 50176, 224, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2154021Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.2154026Z 2023-01-11T21:38:06.2154290Z [2023-01-11 21:33:57,574] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 389 2023-01-11T21:38:06.2154296Z 2023-01-11T21:38:06.2154394Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2154472Z import torch 2023-01-11T21:38:06.2154548Z import random 2023-01-11T21:38:06.2154660Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2154785Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2154794Z 2023-01-11T21:38:06.2154875Z aten = torch.ops.aten 2023-01-11T21:38:06.2155009Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2155104Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2155110Z 2023-01-11T21:38:06.2155184Z import triton 2023-01-11T21:38:06.2155277Z import triton.language as tl 2023-01-11T21:38:06.2155405Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2155536Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2155541Z 2023-01-11T21:38:06.2155552Z 2023-01-11T21:38:06.2155724Z triton_fused_convolution_relu_0 = async_compile.triton(''' 2023-01-11T21:38:06.2155797Z import triton 2023-01-11T21:38:06.2155889Z import triton.language as tl 2023-01-11T21:38:06.2156003Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2156102Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2156233Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2156356Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2156407Z 2023-01-11T21:38:06.2156825Z @pointwise(size_hints=[4194304], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2156901Z @triton.jit 2023-01-11T21:38:06.2157034Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2157111Z xnumel = 3097600 2023-01-11T21:38:06.2157210Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2157337Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), 
[XBLOCK]) 2023-01-11T21:38:06.2157424Z xmask = xindex < xnumel 2023-01-11T21:38:06.2157495Z x3 = xindex 2023-01-11T21:38:06.2157570Z x1 = (xindex // 3025) % 64 2023-01-11T21:38:06.2157690Z tmp0 = tl.load(in_out_ptr0 + (x3), xmask).to(tl.float32) 2023-01-11T21:38:06.2157807Z tmp1 = tl.load(in_ptr0 + (x1), xmask).to(tl.float32) 2023-01-11T21:38:06.2157890Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2158005Z tmp3 = tl.where(0 != 0, 0, tl.where(0 > tmp2, 0, tmp2)) 2023-01-11T21:38:06.2158141Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2158227Z ''') 2023-01-11T21:38:06.2158233Z 2023-01-11T21:38:06.2158238Z 2023-01-11T21:38:06.2158392Z triton_fused_getitem_1 = async_compile.triton(''' 2023-01-11T21:38:06.2158466Z import triton 2023-01-11T21:38:06.2158557Z import triton.language as tl 2023-01-11T21:38:06.2158671Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2158771Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2158903Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2159029Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2159034Z 2023-01-11T21:38:06.2159444Z @pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2159514Z @triton.jit 2023-01-11T21:38:06.2159648Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2159723Z xnumel = 746496 2023-01-11T21:38:06.2159855Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2159983Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2160063Z xmask = xindex < xnumel 2023-01-11T21:38:06.2160139Z x0 = xindex % 27 2023-01-11T21:38:06.2160215Z x1 = (xindex // 27) % 27 2023-01-11T21:38:06.2160294Z x2 = (xindex // 729) 2023-01-11T21:38:06.2160363Z x3 = xindex 2023-01-11T21:38:06.2160499Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2160633Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2160765Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2160901Z tmp5 = tl.load(in_ptr0 + (55 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2161032Z tmp7 = tl.load(in_ptr0 + (56 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2161158Z tmp9 = tl.load(in_ptr0 + (57 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2161289Z tmp11 = tl.load(in_ptr0 + (110 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2161420Z tmp13 = tl.load(in_ptr0 + (111 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2161546Z tmp15 = tl.load(in_ptr0 + (112 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2161680Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.2161813Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.2161944Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.2162101Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.2162230Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 
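# [editor's note, annotation only, not part of the logged kernel] The nine
# tl.load calls above fetch a 3x3 input window: offsets step by 1 along a row
# and by 55 (the row stride of the 55x55 conv output), while 2*x0 and 110*x1
# advance the output with stride 2 in each dimension. Each
# tl.where(a != a, a, tl.where(a > b, a, b)) is a NaN-propagating max, so this
# chain of tl.where reductions computes max_pool2d(kernel_size=3, stride=2)
# one window element at a time.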
2023-01-11T21:38:06.2162376Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.2162513Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.2162645Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.2162776Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.2162866Z ''') 2023-01-11T21:38:06.2162871Z 2023-01-11T21:38:06.2162876Z 2023-01-11T21:38:06.2162969Z async_compile.wait(globals()) 2023-01-11T21:38:06.2163046Z del async_compile 2023-01-11T21:38:06.2163051Z 2023-01-11T21:38:06.2163118Z def call(args): 2023-01-11T21:38:06.2163209Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.2163285Z args.clear() 2023-01-11T21:38:06.2163379Z with torch.cuda.device(0): 2023-01-11T21:38:06.2163523Z buf0 = aten.convolution(arg2_1, arg1_1, None, (4, 4), (2, 2), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:06.2163644Z assert_size_stride(buf0, (16, 64, 55, 55), (193600, 3025, 55, 1)) 2023-01-11T21:38:06.2163718Z del arg1_1 2023-01-11T21:38:06.2163784Z del arg2_1 2023-01-11T21:38:06.2163911Z buf1 = as_strided(buf0, (16, 64, 55, 55), (193600, 3025, 55, 1)); del buf0 # reuse 2023-01-11T21:38:06.2164004Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2164163Z triton_fused_convolution_relu_0.run(buf1, arg0_1, 3097600, grid=grid(3097600), stream=stream0) 2023-01-11T21:38:06.2164238Z del arg0_1 2023-01-11T21:38:06.2164468Z buf2 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2164609Z triton_fused_getitem_1.run(buf1, buf2, 746496, grid=grid(746496), stream=stream0) 2023-01-11T21:38:06.2164689Z return (buf2, ) 2023-01-11T21:38:06.2164696Z 2023-01-11T21:38:06.2164700Z 2023-01-11T21:38:06.2164780Z if __name__ == "__main__": 2023-01-11T21:38:06.2164890Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2165042Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2165247Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2165471Z arg1_1 = rand_strided((64, 3, 11, 11), (363, 121, 11, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2165701Z arg2_1 = rand_strided((16, 3, 224, 224), (150528, 50176, 224, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2165829Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.2165834Z 2023-01-11T21:38:06.2165906Z ok (0.560s) 2023-01-11T21:38:06.2166356Z test_any_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2166493Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2166745Z [2023-01-11 21:33:57,612] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 390 2023-01-11T21:38:06.2167008Z [2023-01-11 21:33:57,914] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 390 2023-01-11T21:38:06.2167423Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2167580Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2167836Z [2023-01-11 21:33:57,947] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 391 2023-01-11T21:38:06.2167841Z 2023-01-11T21:38:06.2167942Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2168018Z import torch 2023-01-11T21:38:06.2168091Z import random 2023-01-11T21:38:06.2168210Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2168327Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2168332Z 2023-01-11T21:38:06.2168416Z aten = torch.ops.aten 2023-01-11T21:38:06.2168551Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2168646Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2168651Z 2023-01-11T21:38:06.2168725Z import triton 2023-01-11T21:38:06.2168816Z import triton.language as tl 2023-01-11T21:38:06.2168946Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2169085Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2169091Z 2023-01-11T21:38:06.2169095Z 2023-01-11T21:38:06.2169366Z triton_fused_any_1_any_2_any_3_any_4_isinf_isinf_1_isinf_2_logical_not_logical_not_1_logical_not_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.2169443Z import triton 2023-01-11T21:38:06.2169535Z import triton.language as tl 2023-01-11T21:38:06.2169648Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2169749Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2169879Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2170002Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2170007Z 2023-01-11T21:38:06.2170099Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.2170206Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2170292Z filename=__file__, 2023-01-11T21:38:06.2170717Z meta={'signature': {0: '*i1', 1: '*i1', 2: '*fp32', 3: '*i1', 4: '*i1', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 6), equal_to_1=())]}) 2023-01-11T21:38:06.2170790Z @triton.jit 2023-01-11T21:38:06.2171018Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2171094Z xnumel = 1 2023-01-11T21:38:06.2171169Z rnumel = 64 2023-01-11T21:38:06.2171266Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2171394Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2171476Z xmask = xindex < xnumel 2023-01-11T21:38:06.2171593Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2171706Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2171817Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2171923Z _tmp8 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2172034Z _tmp13 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2172131Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2172220Z rindex = roffset + rbase 2023-01-11T21:38:06.2172308Z rmask = rindex < rnumel 
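# [editor's note] Inductor's reduction template tiles the iteration space as
# [XBLOCK, RBLOCK]: each _tmpN accumulator starts as tl.int1 zeros (False),
# and every trip of this loop folds in RBLOCK more elements, with
# tl.where(mask & (_acc < val), val, _acc) acting as a boolean maximum, i.e.
# a running logical OR, which is how torch.any() lowers here.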
2023-01-11T21:38:06.2172381Z r0 = rindex 2023-01-11T21:38:06.2172579Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2172679Z tmp9 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.2172760Z tmp1 = (tmp0 != 0) 2023-01-11T21:38:06.2172882Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.2172981Z tmp3 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.2173108Z _tmp4 = tl.where(xmask & rmask & (_tmp4 < tmp3), tmp3, _tmp4) 2023-01-11T21:38:06.2173187Z tmp5 = tmp3 == 0 2023-01-11T21:38:06.2173304Z tmp6 = tmp5.to(tl.int64) 2023-01-11T21:38:06.2173384Z tmp7 = (tmp6 != 0) 2023-01-11T21:38:06.2173508Z _tmp8 = tl.where(xmask & rmask & (_tmp8 < tmp7), tmp7, _tmp8) 2023-01-11T21:38:06.2173603Z tmp10 = tl.libdevice.isinf(tmp9) 2023-01-11T21:38:06.2173683Z tmp11 = tmp10 == 0 2023-01-11T21:38:06.2173765Z tmp12 = tmp11 == 0 2023-01-11T21:38:06.2173898Z _tmp13 = tl.where(xmask & rmask & (_tmp13 < tmp12), tmp12, _tmp13) 2023-01-11T21:38:06.2174012Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2174144Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.2174258Z tmp4 = tl.reshape(tl.max(_tmp4, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2174383Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp4, None) 2023-01-11T21:38:06.2174621Z tmp8 = tl.reshape(tl.max(_tmp8, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2174739Z tmp13 = tl.reshape(tl.max(_tmp13, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2174822Z tmp14 = tmp8 == 0 2023-01-11T21:38:06.2174900Z tmp15 = tmp13 == 0 2023-01-11T21:38:06.2175037Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp14, None) 2023-01-11T21:38:06.2175197Z tl.store(in_out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp15, None) 2023-01-11T21:38:06.2175287Z ''') 2023-01-11T21:38:06.2175297Z 2023-01-11T21:38:06.2175303Z 2023-01-11T21:38:06.2175406Z async_compile.wait(globals()) 2023-01-11T21:38:06.2175486Z del async_compile 2023-01-11T21:38:06.2175491Z 2023-01-11T21:38:06.2175568Z def call(args): 2023-01-11T21:38:06.2175642Z arg0_1, = args 2023-01-11T21:38:06.2175717Z args.clear() 2023-01-11T21:38:06.2175817Z with torch.cuda.device(0): 2023-01-11T21:38:06.2176027Z buf0 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2176200Z buf1 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2176379Z buf2 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2176556Z buf3 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2176645Z buf4 = buf2; del buf2 # reuse 2023-01-11T21:38:06.2176736Z buf5 = buf3; del buf3 # reuse 2023-01-11T21:38:06.2176826Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2177086Z triton_fused_any_1_any_2_any_3_any_4_isinf_isinf_1_isinf_2_logical_not_logical_not_1_logical_not_2_0.run(buf4, buf5, arg0_1, buf0, buf1, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.2177229Z del arg0_1 2023-01-11T21:38:06.2177319Z return (buf0, buf1, buf4, buf5, ) 2023-01-11T21:38:06.2177324Z 2023-01-11T21:38:06.2177338Z 2023-01-11T21:38:06.2177411Z if __name__ == "__main__": 2023-01-11T21:38:06.2177528Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2177656Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2177856Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2177972Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2177977Z 
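[editor's note] The module above is the complete generated wrapper for graph 391 (test_any_cuda, float32): a single fused reduction kernel produces all four boolean outputs in one launch. The test source itself is not in the log; reading the kernel body back, the compiled function is plausibly something like the hypothetical sketch below (fn and x are illustrative names, not taken from the log):

import torch

def fn(x):  # hypothetical reconstruction from the fused kernel name and body
    i = torch.isinf(x)
    # buf0: x.any(); buf1: i.any(); buf4: not any(~i), i.e. i.all();
    # buf5: not any(i)
    return x.any(), i.any(), i.all(), torch.logical_not(i.any())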
2023-01-11T21:38:06.2178240Z [2023-01-11 21:33:58,123] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 391 2023-01-11T21:38:06.2178663Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2178795Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2179056Z [2023-01-11 21:33:58,156] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 392 2023-01-11T21:38:06.2179062Z 2023-01-11T21:38:06.2179152Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2179264Z import torch 2023-01-11T21:38:06.2179338Z import random 2023-01-11T21:38:06.2179458Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2179584Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2179589Z 2023-01-11T21:38:06.2179671Z aten = torch.ops.aten 2023-01-11T21:38:06.2179808Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2179897Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2179912Z 2023-01-11T21:38:06.2179979Z import triton 2023-01-11T21:38:06.2180077Z import triton.language as tl 2023-01-11T21:38:06.2180203Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2180344Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2180349Z 2023-01-11T21:38:06.2180354Z 2023-01-11T21:38:06.2180630Z triton_fused_any_1_any_2_any_3_any_4_isinf_isinf_1_isinf_2_logical_not_logical_not_1_logical_not_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.2180705Z import triton 2023-01-11T21:38:06.2180799Z import triton.language as tl 2023-01-11T21:38:06.2180906Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2181005Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2181139Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2181263Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2181271Z 2023-01-11T21:38:06.2181358Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.2181473Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2181557Z filename=__file__, 2023-01-11T21:38:06.2181984Z meta={'signature': {0: '*i1', 1: '*i1', 2: '*fp16', 3: '*i1', 4: '*i1', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 6), equal_to_1=())]}) 2023-01-11T21:38:06.2182052Z @triton.jit 2023-01-11T21:38:06.2182251Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2182328Z xnumel = 1 2023-01-11T21:38:06.2182403Z rnumel = 64 2023-01-11T21:38:06.2182499Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2182635Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2182745Z xmask = xindex < xnumel 2023-01-11T21:38:06.2182863Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2182970Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2183080Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2183187Z _tmp8 = tl.zeros([XBLOCK, 
RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2183300Z _tmp13 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2183406Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2183495Z rindex = roffset + rbase 2023-01-11T21:38:06.2183582Z rmask = rindex < rnumel 2023-01-11T21:38:06.2183651Z r0 = rindex 2023-01-11T21:38:06.2183869Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2183991Z tmp9 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.2184074Z tmp1 = (tmp0 != 0) 2023-01-11T21:38:06.2184204Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.2184305Z tmp3 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.2184431Z _tmp4 = tl.where(xmask & rmask & (_tmp4 < tmp3), tmp3, _tmp4) 2023-01-11T21:38:06.2184504Z tmp5 = tmp3 == 0 2023-01-11T21:38:06.2184593Z tmp6 = tmp5.to(tl.int64) 2023-01-11T21:38:06.2184675Z tmp7 = (tmp6 != 0) 2023-01-11T21:38:06.2184801Z _tmp8 = tl.where(xmask & rmask & (_tmp8 < tmp7), tmp7, _tmp8) 2023-01-11T21:38:06.2184903Z tmp10 = tl.libdevice.isinf(tmp9) 2023-01-11T21:38:06.2184983Z tmp11 = tmp10 == 0 2023-01-11T21:38:06.2185063Z tmp12 = tmp11 == 0 2023-01-11T21:38:06.2185216Z _tmp13 = tl.where(xmask & rmask & (_tmp13 < tmp12), tmp12, _tmp13) 2023-01-11T21:38:06.2185332Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2185488Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.2185622Z tmp4 = tl.reshape(tl.max(_tmp4, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2185757Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp4, None) 2023-01-11T21:38:06.2185868Z tmp8 = tl.reshape(tl.max(_tmp8, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2185982Z tmp13 = tl.reshape(tl.max(_tmp13, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2186053Z tmp14 = tmp8 == 0 2023-01-11T21:38:06.2186132Z tmp15 = tmp13 == 0 2023-01-11T21:38:06.2186270Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp14, None) 2023-01-11T21:38:06.2186403Z tl.store(in_out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp15, None) 2023-01-11T21:38:06.2186493Z ''') 2023-01-11T21:38:06.2186499Z 2023-01-11T21:38:06.2186507Z 2023-01-11T21:38:06.2186600Z async_compile.wait(globals()) 2023-01-11T21:38:06.2186681Z del async_compile 2023-01-11T21:38:06.2186686Z 2023-01-11T21:38:06.2186760Z def call(args): 2023-01-11T21:38:06.2186827Z arg0_1, = args 2023-01-11T21:38:06.2186902Z args.clear() 2023-01-11T21:38:06.2186994Z with torch.cuda.device(0): 2023-01-11T21:38:06.2187181Z buf0 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2187360Z buf1 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2187535Z buf2 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2187710Z buf3 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2187801Z buf4 = buf2; del buf2 # reuse 2023-01-11T21:38:06.2187885Z buf5 = buf3; del buf3 # reuse 2023-01-11T21:38:06.2187978Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2188195Z triton_fused_any_1_any_2_any_3_any_4_isinf_isinf_1_isinf_2_logical_not_logical_not_1_logical_not_2_0.run(buf4, buf5, arg0_1, buf0, buf1, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.2188272Z del arg0_1 2023-01-11T21:38:06.2188366Z return (buf0, buf1, buf4, buf5, ) 2023-01-11T21:38:06.2188372Z 2023-01-11T21:38:06.2188376Z 2023-01-11T21:38:06.2188484Z if __name__ == "__main__": 2023-01-11T21:38:06.2188600Z from torch._dynamo.testing import 
rand_strided 2023-01-11T21:38:06.2188730Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2188922Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2189033Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2189038Z 2023-01-11T21:38:06.2189300Z [2023-01-11 21:33:58,425] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 392 2023-01-11T21:38:06.2189717Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2189852Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2190108Z [2023-01-11 21:33:58,458] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 393 2023-01-11T21:38:06.2190113Z 2023-01-11T21:38:06.2190211Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2190288Z import torch 2023-01-11T21:38:06.2190362Z import random 2023-01-11T21:38:06.2190475Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2190599Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2190604Z 2023-01-11T21:38:06.2190685Z aten = torch.ops.aten 2023-01-11T21:38:06.2190821Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2190946Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2190951Z 2023-01-11T21:38:06.2191026Z import triton 2023-01-11T21:38:06.2191118Z import triton.language as tl 2023-01-11T21:38:06.2191236Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2191377Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2191385Z 2023-01-11T21:38:06.2191389Z 2023-01-11T21:38:06.2191550Z triton_fused_any_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.2191624Z import triton 2023-01-11T21:38:06.2191716Z import triton.language as tl 2023-01-11T21:38:06.2191831Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2191934Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2192066Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2192183Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2192188Z 2023-01-11T21:38:06.2192278Z @reduction(size_hints=[16, 8], 2023-01-11T21:38:06.2192395Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2192479Z filename=__file__, 2023-01-11T21:38:06.2192838Z meta={'signature': {0: '*fp32', 1: '*i1', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2192916Z @triton.jit 2023-01-11T21:38:06.2193086Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2193160Z xnumel = 16 2023-01-11T21:38:06.2193225Z rnumel = 8 2023-01-11T21:38:06.2193322Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2193461Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2193543Z xmask = xindex < xnumel 2023-01-11T21:38:06.2193663Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2193732Z x0 = xindex 2023-01-11T21:38:06.2193842Z _tmp2 = 
tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2193943Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2194030Z rindex = roffset + rbase 2023-01-11T21:38:06.2194116Z rmask = rindex < rnumel 2023-01-11T21:38:06.2194187Z r1 = rindex 2023-01-11T21:38:06.2194427Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2194513Z tmp1 = (tmp0 != 0) 2023-01-11T21:38:06.2194641Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.2194748Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2194845Z tl.store(out_ptr0 + x0, tmp2, xmask) 2023-01-11T21:38:06.2194930Z ''') 2023-01-11T21:38:06.2194936Z 2023-01-11T21:38:06.2194942Z 2023-01-11T21:38:06.2195205Z triton_fused_any_2_any_4_isinf_isinf_2_logical_not_2_logical_not_3_logical_not_4_1 = async_compile.triton(''' 2023-01-11T21:38:06.2195296Z import triton 2023-01-11T21:38:06.2195398Z import triton.language as tl 2023-01-11T21:38:06.2195531Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2195632Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2195757Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2195883Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2195888Z 2023-01-11T21:38:06.2195977Z @reduction(size_hints=[1, 128], 2023-01-11T21:38:06.2196091Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2196174Z filename=__file__, 2023-01-11T21:38:06.2196561Z meta={'signature': {0: '*i1', 1: '*fp32', 2: '*i1', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.2196634Z @triton.jit 2023-01-11T21:38:06.2196813Z def triton_(in_out_ptr0, in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2196880Z xnumel = 1 2023-01-11T21:38:06.2197036Z rnumel = 128 2023-01-11T21:38:06.2197133Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2197268Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2197350Z xmask = xindex < xnumel 2023-01-11T21:38:06.2197468Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2197585Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2197689Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2197793Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2197879Z rindex = roffset + rbase 2023-01-11T21:38:06.2197964Z rmask = rindex < rnumel 2023-01-11T21:38:06.2198033Z r0 = rindex 2023-01-11T21:38:06.2198229Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2198330Z tmp3 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.2198422Z tmp1 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.2198550Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.2198649Z tmp4 = tl.libdevice.isinf(tmp3) 2023-01-11T21:38:06.2198728Z tmp5 = tmp4 == 0 2023-01-11T21:38:06.2198809Z tmp6 = tmp5 == 0 2023-01-11T21:38:06.2198939Z _tmp7 = tl.where(xmask & rmask & (_tmp7 < tmp6), tmp6, _tmp7) 2023-01-11T21:38:06.2199052Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2199177Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.2199290Z tmp7 = tl.reshape(tl.max(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2199366Z tmp8 = tmp7 == 0 
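# [editor's note] Two reductions share this loop: _tmp2 accumulates
# any(isinf(x)) for out_ptr0, while _tmp7 re-derives isinf through the pair of
# == 0 negations and tmp8 then inverts the accumulated any(), producing the
# fused logical_not result that is stored to in_out_ptr0 below.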
2023-01-11T21:38:06.2199503Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp8, None) 2023-01-11T21:38:06.2199588Z ''') 2023-01-11T21:38:06.2199594Z 2023-01-11T21:38:06.2199598Z 2023-01-11T21:38:06.2199808Z triton_fused_any_3_isinf_1_logical_not_logical_not_1_2 = async_compile.triton(''' 2023-01-11T21:38:06.2199890Z import triton 2023-01-11T21:38:06.2199982Z import triton.language as tl 2023-01-11T21:38:06.2200090Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2200191Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2200321Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2200475Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2200480Z 2023-01-11T21:38:06.2200570Z @reduction(size_hints=[8, 16], 2023-01-11T21:38:06.2200687Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.2200770Z filename=__file__, 2023-01-11T21:38:06.2201143Z meta={'signature': {0: '*i1', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2201209Z @triton.jit 2023-01-11T21:38:06.2201377Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2201453Z xnumel = 8 2023-01-11T21:38:06.2201526Z rnumel = 16 2023-01-11T21:38:06.2201624Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2201759Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2201843Z xmask = xindex < xnumel 2023-01-11T21:38:06.2201958Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2202029Z x0 = xindex 2023-01-11T21:38:06.2202140Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2202242Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2202330Z rindex = roffset + rbase 2023-01-11T21:38:06.2202417Z rmask = rindex < rnumel 2023-01-11T21:38:06.2202488Z r1 = rindex 2023-01-11T21:38:06.2202598Z tmp0 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask) 2023-01-11T21:38:06.2202696Z tmp1 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.2202775Z tmp2 = tmp1 == 0 2023-01-11T21:38:06.2202891Z tmp3 = tmp2.to(tl.int64) 2023-01-11T21:38:06.2202971Z tmp4 = (tmp3 != 0) 2023-01-11T21:38:06.2203098Z _tmp5 = tl.where(xmask & rmask & (_tmp5 < tmp4), tmp4, _tmp5) 2023-01-11T21:38:06.2203212Z tmp5 = tl.reshape(tl.max(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2203282Z tmp6 = tmp5 == 0 2023-01-11T21:38:06.2203423Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.2203508Z ''') 2023-01-11T21:38:06.2203514Z 2023-01-11T21:38:06.2203518Z 2023-01-11T21:38:06.2203609Z async_compile.wait(globals()) 2023-01-11T21:38:06.2203685Z del async_compile 2023-01-11T21:38:06.2203690Z 2023-01-11T21:38:06.2203763Z def call(args): 2023-01-11T21:38:06.2203836Z arg0_1, = args 2023-01-11T21:38:06.2203905Z args.clear() 2023-01-11T21:38:06.2203996Z with torch.cuda.device(0): 2023-01-11T21:38:06.2204193Z buf0 = empty_strided((16, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2204284Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2204428Z triton_fused_any_1_0.run(arg0_1, buf0, 16, 8, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.2204610Z buf1 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2204787Z buf4 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2204879Z buf5 = buf4; 
del buf4 # reuse 2023-01-11T21:38:06.2205069Z triton_fused_any_2_any_4_isinf_isinf_2_logical_not_2_logical_not_3_logical_not_4_1.run(buf5, arg0_1, buf1, 1, 128, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.2205259Z buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2205359Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:06.2205559Z triton_fused_any_3_isinf_1_logical_not_logical_not_1_2.run(buf3, arg0_1, 8, 16, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.2205643Z del arg0_1 2023-01-11T21:38:06.2205737Z return (buf0, buf1, buf3, buf5, ) 2023-01-11T21:38:06.2205745Z 2023-01-11T21:38:06.2205749Z 2023-01-11T21:38:06.2205830Z if __name__ == "__main__": 2023-01-11T21:38:06.2205948Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2206066Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2206295Z arg0_1 = rand_strided((16, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2206408Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2206413Z 2023-01-11T21:38:06.2206678Z [2023-01-11 21:33:58,656] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 393 2023-01-11T21:38:06.2206684Z 2023-01-11T21:38:06.2206781Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2206858Z import torch 2023-01-11T21:38:06.2206932Z import random 2023-01-11T21:38:06.2207051Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2207169Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2207174Z 2023-01-11T21:38:06.2207258Z aten = torch.ops.aten 2023-01-11T21:38:06.2207393Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2207491Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2207496Z 2023-01-11T21:38:06.2207570Z import triton 2023-01-11T21:38:06.2207662Z import triton.language as tl 2023-01-11T21:38:06.2207788Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2207926Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2207931Z 2023-01-11T21:38:06.2207936Z 2023-01-11T21:38:06.2208087Z triton_fused_any_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.2208162Z import triton 2023-01-11T21:38:06.2208255Z import triton.language as tl 2023-01-11T21:38:06.2208366Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2208467Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2208598Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2208721Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2208754Z 2023-01-11T21:38:06.2208847Z @reduction(size_hints=[16, 8], 2023-01-11T21:38:06.2208954Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2209038Z filename=__file__, 2023-01-11T21:38:06.2209398Z meta={'signature': {0: '*fp16', 1: '*i1', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2209472Z @triton.jit 2023-01-11T21:38:06.2209639Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2209712Z xnumel = 16 2023-01-11T21:38:06.2209782Z rnumel = 8 2023-01-11T21:38:06.2209873Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2210005Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2210086Z xmask = xindex < xnumel 
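# [editor's note] This second module appears to be the float16 recompile of
# the same test: the kernel signatures switch from *fp32 to *fp16 and every
# tl.load gains a .to(tl.float32) upcast, while the indexing and reduction
# logic stay identical to the float32 module above.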
2023-01-11T21:38:06.2210204Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2210277Z x0 = xindex 2023-01-11T21:38:06.2210389Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2210495Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2210577Z rindex = roffset + rbase 2023-01-11T21:38:06.2210662Z rmask = rindex < rnumel 2023-01-11T21:38:06.2210734Z r1 = rindex 2023-01-11T21:38:06.2210971Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2211056Z tmp1 = (tmp0 != 0) 2023-01-11T21:38:06.2211182Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.2211295Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2211385Z tl.store(out_ptr0 + x0, tmp2, xmask) 2023-01-11T21:38:06.2211470Z ''') 2023-01-11T21:38:06.2211475Z 2023-01-11T21:38:06.2211480Z 2023-01-11T21:38:06.2211730Z triton_fused_any_2_any_4_isinf_isinf_2_logical_not_2_logical_not_3_logical_not_4_1 = async_compile.triton(''' 2023-01-11T21:38:06.2211810Z import triton 2023-01-11T21:38:06.2211903Z import triton.language as tl 2023-01-11T21:38:06.2212017Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2212117Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2212274Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2212394Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2212406Z 2023-01-11T21:38:06.2212489Z @reduction(size_hints=[1, 128], 2023-01-11T21:38:06.2212607Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2212692Z filename=__file__, 2023-01-11T21:38:06.2213076Z meta={'signature': {0: '*i1', 1: '*fp16', 2: '*i1', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.2213150Z @triton.jit 2023-01-11T21:38:06.2213331Z def triton_(in_out_ptr0, in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2213405Z xnumel = 1 2023-01-11T21:38:06.2213472Z rnumel = 128 2023-01-11T21:38:06.2213569Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2213707Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2213795Z xmask = xindex < xnumel 2023-01-11T21:38:06.2213913Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2214025Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2214134Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2214231Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2214321Z rindex = roffset + rbase 2023-01-11T21:38:06.2214406Z rmask = rindex < rnumel 2023-01-11T21:38:06.2214588Z r0 = rindex 2023-01-11T21:38:06.2214808Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2214973Z tmp3 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.2215074Z tmp1 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.2215194Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.2215294Z tmp4 = tl.libdevice.isinf(tmp3) 2023-01-11T21:38:06.2215376Z tmp5 = tmp4 == 0 2023-01-11T21:38:06.2215455Z tmp6 = tmp5 == 0 2023-01-11T21:38:06.2215582Z _tmp7 = tl.where(xmask & rmask & (_tmp7 < tmp6), tmp6, _tmp7) 2023-01-11T21:38:06.2215695Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2215827Z 
tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.2215945Z tmp7 = tl.reshape(tl.max(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2216015Z tmp8 = tmp7 == 0 2023-01-11T21:38:06.2216150Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp8, None) 2023-01-11T21:38:06.2216236Z ''') 2023-01-11T21:38:06.2216244Z 2023-01-11T21:38:06.2216249Z 2023-01-11T21:38:06.2216460Z triton_fused_any_3_isinf_1_logical_not_logical_not_1_2 = async_compile.triton(''' 2023-01-11T21:38:06.2216535Z import triton 2023-01-11T21:38:06.2216627Z import triton.language as tl 2023-01-11T21:38:06.2216741Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2216838Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2216970Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2217095Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2217100Z 2023-01-11T21:38:06.2217247Z @reduction(size_hints=[8, 16], 2023-01-11T21:38:06.2217367Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.2217450Z filename=__file__, 2023-01-11T21:38:06.2217822Z meta={'signature': {0: '*i1', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2217900Z @triton.jit 2023-01-11T21:38:06.2218063Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2218137Z xnumel = 8 2023-01-11T21:38:06.2218209Z rnumel = 16 2023-01-11T21:38:06.2218304Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2218472Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2218556Z xmask = xindex < xnumel 2023-01-11T21:38:06.2218673Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2218737Z x0 = xindex 2023-01-11T21:38:06.2218848Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2218951Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2219038Z rindex = roffset + rbase 2023-01-11T21:38:06.2219124Z rmask = rindex < rnumel 2023-01-11T21:38:06.2219196Z r1 = rindex 2023-01-11T21:38:06.2219325Z tmp0 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.2219423Z tmp1 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.2219499Z tmp2 = tmp1 == 0 2023-01-11T21:38:06.2219587Z tmp3 = tmp2.to(tl.int64) 2023-01-11T21:38:06.2219667Z tmp4 = (tmp3 != 0) 2023-01-11T21:38:06.2219797Z _tmp5 = tl.where(xmask & rmask & (_tmp5 < tmp4), tmp4, _tmp5) 2023-01-11T21:38:06.2219909Z tmp5 = tl.reshape(tl.max(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2219986Z tmp6 = tmp5 == 0 2023-01-11T21:38:06.2220117Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.2220203Z ''') 2023-01-11T21:38:06.2220209Z 2023-01-11T21:38:06.2220213Z 2023-01-11T21:38:06.2220303Z async_compile.wait(globals()) 2023-01-11T21:38:06.2220380Z del async_compile 2023-01-11T21:38:06.2220385Z 2023-01-11T21:38:06.2220461Z def call(args): 2023-01-11T21:38:06.2220533Z arg0_1, = args 2023-01-11T21:38:06.2220606Z args.clear() 2023-01-11T21:38:06.2220728Z with torch.cuda.device(0): 2023-01-11T21:38:06.2220914Z buf0 = empty_strided((16, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2221008Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2221149Z triton_fused_any_1_0.run(arg0_1, buf0, 16, 8, grid=grid(16), stream=stream0) 
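# [editor's note] The allocation pattern below ("buf5 = buf4; del buf4
# # reuse") is inductor's in-place reuse: deleting the old name leaves a
# single live reference, so the same storage serves as the in_out_ptr
# argument of the next fused kernel instead of a freshly allocated scalar
# buffer.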
2023-01-11T21:38:06.2221331Z buf1 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2221509Z buf4 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2221598Z buf5 = buf4; del buf4 # reuse 2023-01-11T21:38:06.2221796Z triton_fused_any_2_any_4_isinf_isinf_2_logical_not_2_logical_not_3_logical_not_4_1.run(buf5, arg0_1, buf1, 1, 128, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.2221988Z buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2222071Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:06.2222241Z triton_fused_any_3_isinf_1_logical_not_logical_not_1_2.run(buf3, arg0_1, 8, 16, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.2222318Z del arg0_1 2023-01-11T21:38:06.2222412Z return (buf0, buf1, buf3, buf5, ) 2023-01-11T21:38:06.2222418Z 2023-01-11T21:38:06.2222422Z 2023-01-11T21:38:06.2222500Z if __name__ == "__main__": 2023-01-11T21:38:06.2222620Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2222746Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2222950Z arg0_1 = rand_strided((16, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2223056Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2223062Z 2023-01-11T21:38:06.2223133Z ok (1.078s) 2023-01-11T21:38:06.2223589Z test_arange1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2223726Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2224014Z [2023-01-11 21:33:58,710] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 394 2023-01-11T21:38:06.2224277Z [2023-01-11 21:33:58,876] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 394 2023-01-11T21:38:06.2224691Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2224821Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2225078Z [2023-01-11 21:33:58,929] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 395 2023-01-11T21:38:06.2225339Z [2023-01-11 21:33:59,009] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 395 2023-01-11T21:38:06.2225345Z 2023-01-11T21:38:06.2225441Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2225510Z import torch 2023-01-11T21:38:06.2225583Z import random 2023-01-11T21:38:06.2225727Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2225865Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2225871Z 2023-01-11T21:38:06.2225958Z aten = torch.ops.aten 2023-01-11T21:38:06.2226098Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2226194Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2226199Z 2023-01-11T21:38:06.2226269Z import triton 2023-01-11T21:38:06.2226360Z import triton.language as tl 2023-01-11T21:38:06.2226488Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2226659Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2226665Z 2023-01-11T21:38:06.2226669Z 2023-01-11T21:38:06.2226830Z triton_fused_add_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.2226906Z import triton 2023-01-11T21:38:06.2226996Z import triton.language as tl 2023-01-11T21:38:06.2227112Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2227207Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2227339Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2227465Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2227470Z 2023-01-11T21:38:06.2227891Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2227963Z @triton.jit 2023-01-11T21:38:06.2228109Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2228181Z xnumel = 64 2023-01-11T21:38:06.2228277Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2228401Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2228491Z xmask = xindex < xnumel 2023-01-11T21:38:06.2228562Z x0 = xindex 2023-01-11T21:38:06.2228634Z x1 = xindex % 8 2023-01-11T21:38:06.2228731Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2228803Z tmp1 = x0 2023-01-11T21:38:06.2228882Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.2228949Z tmp3 = 10 + x1 2023-01-11T21:38:06.2229038Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.2229114Z tmp5 = tmp2 + tmp4 2023-01-11T21:38:06.2229250Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2229385Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.2229474Z ''') 2023-01-11T21:38:06.2229480Z 2023-01-11T21:38:06.2229484Z 2023-01-11T21:38:06.2229576Z async_compile.wait(globals()) 2023-01-11T21:38:06.2229645Z del async_compile 2023-01-11T21:38:06.2229650Z 2023-01-11T21:38:06.2229725Z def call(args): 2023-01-11T21:38:06.2229799Z arg0_1, = args 2023-01-11T21:38:06.2229875Z args.clear() 
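# [editor's note] args.clear() empties the caller's input list so that each
# later "del argN_1" drops the last Python reference, letting the CUDA
# caching allocator reclaim input storage as soon as it is no longer needed.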
2023-01-11T21:38:06.2229996Z with torch.cuda.device(0): 2023-01-11T21:38:06.2230198Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2230398Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2230484Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2230628Z triton_fused_add_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2230701Z del arg0_1 2023-01-11T21:38:06.2230782Z return (buf0, buf1, ) 2023-01-11T21:38:06.2230787Z 2023-01-11T21:38:06.2230791Z 2023-01-11T21:38:06.2230874Z if __name__ == "__main__": 2023-01-11T21:38:06.2230993Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2231121Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2231320Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2231424Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2231439Z 2023-01-11T21:38:06.2231444Z 2023-01-11T21:38:06.2231534Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2231608Z import torch 2023-01-11T21:38:06.2231682Z import random 2023-01-11T21:38:06.2231803Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2231929Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2231934Z 2023-01-11T21:38:06.2232017Z aten = torch.ops.aten 2023-01-11T21:38:06.2232154Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2232244Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2232249Z 2023-01-11T21:38:06.2232352Z import triton 2023-01-11T21:38:06.2232444Z import triton.language as tl 2023-01-11T21:38:06.2232568Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2232707Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2232712Z 2023-01-11T21:38:06.2232716Z 2023-01-11T21:38:06.2232878Z triton_fused_add_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.2232955Z import triton 2023-01-11T21:38:06.2233046Z import triton.language as tl 2023-01-11T21:38:06.2233154Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2233256Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2233388Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2233513Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2233518Z 2023-01-11T21:38:06.2233939Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2234016Z @triton.jit 2023-01-11T21:38:06.2234157Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2234230Z xnumel = 64 2023-01-11T21:38:06.2234320Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2234456Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2234538Z xmask = xindex < xnumel 2023-01-11T21:38:06.2234609Z x0 = xindex 2023-01-11T21:38:06.2234681Z x1 = xindex % 8 2023-01-11T21:38:06.2234800Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2234888Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.2234952Z tmp2 = x0 2023-01-11T21:38:06.2235030Z tmp3 = tmp1 * tmp2 2023-01-11T21:38:06.2235103Z tmp4 = 10 + x1 2023-01-11T21:38:06.2235188Z tmp5 = tmp4.to(tl.float32) 
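# [editor's note] No arange tensor is ever read from memory here: tmp2 = x0
# and tmp4 = 10 + x1 materialize the torch.arange values straight from the
# flattened index, so the multiply and add fuse into this one pointwise
# kernel; the float16 input is upcast on load and the arithmetic runs in
# fp32.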
2023-01-11T21:38:06.2235264Z tmp6 = tmp3 + tmp5 2023-01-11T21:38:06.2235399Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2235530Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.2235609Z ''') 2023-01-11T21:38:06.2235614Z 2023-01-11T21:38:06.2235619Z 2023-01-11T21:38:06.2235710Z async_compile.wait(globals()) 2023-01-11T21:38:06.2235816Z del async_compile 2023-01-11T21:38:06.2235821Z 2023-01-11T21:38:06.2235897Z def call(args): 2023-01-11T21:38:06.2235974Z arg0_1, = args 2023-01-11T21:38:06.2236047Z args.clear() 2023-01-11T21:38:06.2236139Z with torch.cuda.device(0): 2023-01-11T21:38:06.2236331Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2236528Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2236621Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2236764Z triton_fused_add_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2236840Z del arg0_1 2023-01-11T21:38:06.2236924Z return (buf0, buf1, ) 2023-01-11T21:38:06.2236929Z 2023-01-11T21:38:06.2236933Z 2023-01-11T21:38:06.2237019Z if __name__ == "__main__": 2023-01-11T21:38:06.2237136Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2237256Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2237454Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2237570Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2237575Z 2023-01-11T21:38:06.2237646Z ok (0.353s) 2023-01-11T21:38:06.2238102Z test_arange2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2238276Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2238539Z [2023-01-11 21:33:59,040] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 396 2023-01-11T21:38:06.2238802Z [2023-01-11 21:33:59,108] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 396 2023-01-11T21:38:06.2238809Z 2023-01-11T21:38:06.2238909Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2238986Z import torch 2023-01-11T21:38:06.2239054Z import random 2023-01-11T21:38:06.2239174Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2239297Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2239302Z 2023-01-11T21:38:06.2239382Z aten = torch.ops.aten 2023-01-11T21:38:06.2239521Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2239614Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2239620Z 2023-01-11T21:38:06.2239697Z import triton 2023-01-11T21:38:06.2239783Z import triton.language as tl 2023-01-11T21:38:06.2239907Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2240047Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2240053Z 2023-01-11T21:38:06.2240057Z 2023-01-11T21:38:06.2240215Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2240291Z import triton 2023-01-11T21:38:06.2240385Z import triton.language as tl 2023-01-11T21:38:06.2240497Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2240599Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2240724Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2240851Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2240856Z 2023-01-11T21:38:06.2241256Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2241333Z @triton.jit 2023-01-11T21:38:06.2241467Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2241539Z xnumel = 64 2023-01-11T21:38:06.2241637Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2241790Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2241868Z xmask = xindex < xnumel 2023-01-11T21:38:06.2241938Z x2 = xindex 2023-01-11T21:38:06.2242015Z x0 = xindex % 8 2023-01-11T21:38:06.2242111Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.2242184Z tmp1 = x0 2023-01-11T21:38:06.2242262Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2242399Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2242477Z ''') 2023-01-11T21:38:06.2242482Z 2023-01-11T21:38:06.2242487Z 2023-01-11T21:38:06.2242578Z async_compile.wait(globals()) 2023-01-11T21:38:06.2242656Z del async_compile 2023-01-11T21:38:06.2242662Z 2023-01-11T21:38:06.2242738Z def call(args): 2023-01-11T21:38:06.2242810Z arg0_1, = args 2023-01-11T21:38:06.2242887Z args.clear() 2023-01-11T21:38:06.2242978Z with torch.cuda.device(0): 2023-01-11T21:38:06.2243170Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2243261Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2243400Z triton_fused_add_0.run(arg0_1, 
buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2243475Z del arg0_1 2023-01-11T21:38:06.2243556Z return (buf0, ) 2023-01-11T21:38:06.2243562Z 2023-01-11T21:38:06.2243566Z 2023-01-11T21:38:06.2243645Z if __name__ == "__main__": 2023-01-11T21:38:06.2243763Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2243888Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2244078Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.2244218Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2244224Z 2023-01-11T21:38:06.2244298Z ok (0.098s) 2023-01-11T21:38:06.2244763Z test_arange3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2244894Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2245152Z [2023-01-11 21:33:59,143] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 397 2023-01-11T21:38:06.2245416Z [2023-01-11 21:33:59,208] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 397 2023-01-11T21:38:06.2245833Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2245972Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2246226Z [2023-01-11 21:33:59,243] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 398 2023-01-11T21:38:06.2246487Z [2023-01-11 21:33:59,305] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 398 2023-01-11T21:38:06.2246493Z 2023-01-11T21:38:06.2246584Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2246659Z import torch 2023-01-11T21:38:06.2246735Z import random 2023-01-11T21:38:06.2246856Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2246980Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2246988Z 2023-01-11T21:38:06.2247069Z aten = torch.ops.aten 2023-01-11T21:38:06.2247204Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2247293Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2247298Z 2023-01-11T21:38:06.2247374Z import triton 2023-01-11T21:38:06.2247495Z import triton.language as tl 2023-01-11T21:38:06.2247622Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2247759Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2247765Z 2023-01-11T21:38:06.2247770Z 2023-01-11T21:38:06.2247927Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2248000Z import triton 2023-01-11T21:38:06.2248094Z import triton.language as tl 2023-01-11T21:38:06.2248202Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2248303Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2248439Z from torch._inductor.triton_ops.autotune import 
pointwise 2023-01-11T21:38:06.2248566Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2248571Z 2023-01-11T21:38:06.2248975Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2249046Z @triton.jit 2023-01-11T21:38:06.2249181Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2249256Z xnumel = 14 2023-01-11T21:38:06.2249346Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2249474Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2249559Z xmask = xindex < xnumel 2023-01-11T21:38:06.2249629Z x0 = xindex 2023-01-11T21:38:06.2249727Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2249804Z tmp1 = 4*x0 2023-01-11T21:38:06.2249890Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.2249991Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.2250125Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2250212Z ''') 2023-01-11T21:38:06.2250218Z 2023-01-11T21:38:06.2250222Z 2023-01-11T21:38:06.2250317Z async_compile.wait(globals()) 2023-01-11T21:38:06.2250398Z del async_compile 2023-01-11T21:38:06.2250405Z 2023-01-11T21:38:06.2250482Z def call(args): 2023-01-11T21:38:06.2250556Z arg0_1, = args 2023-01-11T21:38:06.2250623Z args.clear() 2023-01-11T21:38:06.2250715Z with torch.cuda.device(0): 2023-01-11T21:38:06.2250915Z buf0 = empty_strided((14, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2251009Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2251150Z triton_fused_add_0.run(arg0_1, buf0, 14, grid=grid(14), stream=stream0) 2023-01-11T21:38:06.2251224Z del arg0_1 2023-01-11T21:38:06.2251301Z return (buf0, ) 2023-01-11T21:38:06.2251306Z 2023-01-11T21:38:06.2251312Z 2023-01-11T21:38:06.2251396Z if __name__ == "__main__": 2023-01-11T21:38:06.2251507Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2251633Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2251835Z arg0_1 = rand_strided((14, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2251954Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2251959Z 2023-01-11T21:38:06.2251963Z 2023-01-11T21:38:06.2252060Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2252136Z import torch 2023-01-11T21:38:06.2252210Z import random 2023-01-11T21:38:06.2252327Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2252443Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2252448Z 2023-01-11T21:38:06.2252532Z aten = torch.ops.aten 2023-01-11T21:38:06.2252668Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2252764Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2252772Z 2023-01-11T21:38:06.2252850Z import triton 2023-01-11T21:38:06.2252943Z import triton.language as tl 2023-01-11T21:38:06.2253070Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2253202Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2253215Z 2023-01-11T21:38:06.2253220Z 2023-01-11T21:38:06.2253393Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2253470Z import triton 2023-01-11T21:38:06.2253565Z import triton.language as tl 2023-01-11T21:38:06.2253682Z from torch._inductor.ir import ReductionHint 
2023-01-11T21:38:06.2253785Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2253917Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2254043Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2254048Z 2023-01-11T21:38:06.2254440Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2254631Z @triton.jit 2023-01-11T21:38:06.2254763Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2254834Z xnumel = 14 2023-01-11T21:38:06.2254934Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2255067Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2255150Z xmask = xindex < xnumel 2023-01-11T21:38:06.2255223Z x0 = xindex 2023-01-11T21:38:06.2255333Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2255414Z tmp1 = 4*x0 2023-01-11T21:38:06.2255520Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.2255604Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.2255758Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2255847Z ''') 2023-01-11T21:38:06.2255853Z 2023-01-11T21:38:06.2255857Z 2023-01-11T21:38:06.2255952Z async_compile.wait(globals()) 2023-01-11T21:38:06.2256067Z del async_compile 2023-01-11T21:38:06.2256072Z 2023-01-11T21:38:06.2256150Z def call(args): 2023-01-11T21:38:06.2256227Z arg0_1, = args 2023-01-11T21:38:06.2256302Z args.clear() 2023-01-11T21:38:06.2256392Z with torch.cuda.device(0): 2023-01-11T21:38:06.2256596Z buf0 = empty_strided((14, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2256691Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2256818Z triton_fused_add_0.run(arg0_1, buf0, 14, grid=grid(14), stream=stream0) 2023-01-11T21:38:06.2256894Z del arg0_1 2023-01-11T21:38:06.2256971Z return (buf0, ) 2023-01-11T21:38:06.2256976Z 2023-01-11T21:38:06.2256981Z 2023-01-11T21:38:06.2257065Z if __name__ == "__main__": 2023-01-11T21:38:06.2257242Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2257372Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2257574Z arg0_1 = rand_strided((14, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2257690Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2257696Z 2023-01-11T21:38:06.2257759Z ok (0.198s) 2023-01-11T21:38:06.2258216Z test_arange4_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2258350Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2258606Z [2023-01-11 21:33:59,336] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 399 2023-01-11T21:38:06.2258869Z [2023-01-11 21:33:59,407] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 399 2023-01-11T21:38:06.2259295Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2259463Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2259717Z [2023-01-11 21:33:59,437] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 400 2023-01-11T21:38:06.2259983Z [2023-01-11 21:33:59,505] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 400 2023-01-11T21:38:06.2259989Z 2023-01-11T21:38:06.2260086Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2260161Z import torch 2023-01-11T21:38:06.2260229Z import random 2023-01-11T21:38:06.2260347Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2260481Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2260487Z 2023-01-11T21:38:06.2260569Z aten = torch.ops.aten 2023-01-11T21:38:06.2260705Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2260805Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2260810Z 2023-01-11T21:38:06.2260889Z import triton 2023-01-11T21:38:06.2260974Z import triton.language as tl 2023-01-11T21:38:06.2261103Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2261242Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2261247Z 2023-01-11T21:38:06.2261252Z 2023-01-11T21:38:06.2261407Z triton_fused_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.2261480Z import triton 2023-01-11T21:38:06.2261576Z import triton.language as tl 2023-01-11T21:38:06.2261688Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2261788Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2261914Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2262071Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2262076Z 2023-01-11T21:38:06.2262487Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2262564Z @triton.jit 2023-01-11T21:38:06.2262711Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2262788Z xnumel = 1024 2023-01-11T21:38:06.2262893Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2263032Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2263111Z xmask = xindex < xnumel 2023-01-11T21:38:06.2263184Z x0 = xindex 2023-01-11T21:38:06.2263289Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2263407Z tmp1 = 512 + ((-1)*x0) 2023-01-11T21:38:06.2263502Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.2263617Z tmp3 = tmp0 - tmp2 2023-01-11T21:38:06.2263763Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2263843Z ''') 2023-01-11T21:38:06.2263849Z 2023-01-11T21:38:06.2263863Z 2023-01-11T21:38:06.2263954Z async_compile.wait(globals()) 2023-01-11T21:38:06.2264036Z del async_compile 2023-01-11T21:38:06.2264042Z 2023-01-11T21:38:06.2264118Z def call(args): 2023-01-11T21:38:06.2264191Z arg0_1, = args 2023-01-11T21:38:06.2264267Z args.clear() 2023-01-11T21:38:06.2264362Z with torch.cuda.device(0): 2023-01-11T21:38:06.2264587Z buf0 = 
empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2264673Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2264815Z triton_fused_sub_0.run(arg0_1, buf0, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.2264892Z del arg0_1 2023-01-11T21:38:06.2264969Z return (buf0, ) 2023-01-11T21:38:06.2264977Z 2023-01-11T21:38:06.2264981Z 2023-01-11T21:38:06.2265064Z if __name__ == "__main__": 2023-01-11T21:38:06.2265182Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2265307Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2265531Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2265645Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2265650Z 2023-01-11T21:38:06.2265655Z 2023-01-11T21:38:06.2265752Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2265827Z import torch 2023-01-11T21:38:06.2265899Z import random 2023-01-11T21:38:06.2266014Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2266140Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2266146Z 2023-01-11T21:38:06.2266225Z aten = torch.ops.aten 2023-01-11T21:38:06.2266353Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2266452Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2266457Z 2023-01-11T21:38:06.2266528Z import triton 2023-01-11T21:38:06.2266617Z import triton.language as tl 2023-01-11T21:38:06.2266739Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2266876Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2266883Z 2023-01-11T21:38:06.2266888Z 2023-01-11T21:38:06.2267042Z triton_fused_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.2267115Z import triton 2023-01-11T21:38:06.2267200Z import triton.language as tl 2023-01-11T21:38:06.2267313Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2267419Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2267551Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2267676Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2267681Z 2023-01-11T21:38:06.2268084Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2268185Z @triton.jit 2023-01-11T21:38:06.2268317Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2268385Z xnumel = 1024 2023-01-11T21:38:06.2268484Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2268612Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2268698Z xmask = xindex < xnumel 2023-01-11T21:38:06.2268769Z x0 = xindex 2023-01-11T21:38:06.2268885Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2268999Z tmp1 = 512 + ((-1)*x0) 2023-01-11T21:38:06.2269080Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.2269187Z tmp3 = tmp0 - tmp2 2023-01-11T21:38:06.2269318Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2269402Z ''') 2023-01-11T21:38:06.2269410Z 2023-01-11T21:38:06.2269415Z 2023-01-11T21:38:06.2269509Z async_compile.wait(globals()) 2023-01-11T21:38:06.2269583Z del async_compile 2023-01-11T21:38:06.2269589Z 2023-01-11T21:38:06.2269662Z def call(args): 
2023-01-11T21:38:06.2269734Z arg0_1, = args 2023-01-11T21:38:06.2269802Z args.clear() 2023-01-11T21:38:06.2269895Z with torch.cuda.device(0): 2023-01-11T21:38:06.2270095Z buf0 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2270187Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2270324Z triton_fused_sub_0.run(arg0_1, buf0, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.2270397Z del arg0_1 2023-01-11T21:38:06.2270475Z return (buf0, ) 2023-01-11T21:38:06.2270480Z 2023-01-11T21:38:06.2270484Z 2023-01-11T21:38:06.2270558Z if __name__ == "__main__": 2023-01-11T21:38:06.2270673Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2270798Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2271003Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2271113Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2271118Z 2023-01-11T21:38:06.2271186Z ok (0.199s) 2023-01-11T21:38:06.2271671Z test_argmax_argmin1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2271803Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2272055Z [2023-01-11 21:33:59,532] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 401 2023-01-11T21:38:06.2272309Z [2023-01-11 21:33:59,757] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 401 2023-01-11T21:38:06.2272726Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2272857Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2273110Z [2023-01-11 21:33:59,781] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 402 2023-01-11T21:38:06.2273369Z [2023-01-11 21:33:59,912] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 402 2023-01-11T21:38:06.2273375Z 2023-01-11T21:38:06.2273472Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2273545Z import torch 2023-01-11T21:38:06.2273618Z import random 2023-01-11T21:38:06.2273767Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2273883Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2273896Z 2023-01-11T21:38:06.2273970Z aten = torch.ops.aten 2023-01-11T21:38:06.2274107Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2280386Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2280395Z 2023-01-11T21:38:06.2280486Z import triton 2023-01-11T21:38:06.2280586Z import triton.language as tl 2023-01-11T21:38:06.2280714Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2280856Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2280862Z 2023-01-11T21:38:06.2280866Z 2023-01-11T21:38:06.2281060Z triton_fused_argmax_argmin_0 = async_compile.triton(''' 2023-01-11T21:38:06.2281129Z import triton 2023-01-11T21:38:06.2281224Z import triton.language as tl 2023-01-11T21:38:06.2281339Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2281446Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2281578Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2281706Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2281711Z 2023-01-11T21:38:06.2281805Z @reduction(size_hints=[1, 524288], 2023-01-11T21:38:06.2281925Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2282004Z filename=__file__, 2023-01-11T21:38:06.2282374Z meta={'signature': {0: '*fp32', 1: '*i64', 2: '*i64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.2282451Z @triton.jit 2023-01-11T21:38:06.2282625Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2282699Z xnumel = 1 2023-01-11T21:38:06.2282773Z rnumel = 524288 2023-01-11T21:38:06.2282877Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2283006Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2283090Z xmask = xindex < xnumel 2023-01-11T21:38:06.2283208Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2283442Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.2283561Z _tmp1_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2283690Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.2283803Z _tmp3_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2283900Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2283989Z rindex = roffset + rbase 2023-01-11T21:38:06.2284075Z rmask = rindex < rnumel 2023-01-11T21:38:06.2284146Z r0 = rindex 2023-01-11T21:38:06.2284343Z tmp0 = tl.load(in_ptr0 + (r0), rmask, 
eviction_policy='evict_last') 2023-01-11T21:38:06.2284445Z tmp2 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.2284589Z _tmp1_index = tl.where(xmask & rmask & (_tmp1 < tmp0), rindex, _tmp1_index) 2023-01-11T21:38:06.2284710Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.2284853Z _tmp3_index = tl.where(xmask & rmask & (_tmp3 > tmp2), rindex, _tmp3_index) 2023-01-11T21:38:06.2284976Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.2285071Z _tmp1_index_reduce = tl.reshape( 2023-01-11T21:38:06.2285186Z tl.argmax(_tmp1, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2285305Z _tmp1_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2285404Z [1, RBLOCK]) == _tmp1_index_reduce) 2023-01-11T21:38:06.2285492Z tmp1 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2285610Z tl.where(_tmp1_index_mask, _tmp1_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2285741Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.2285882Z _tmp3_index_reduce = tl.reshape( 2023-01-11T21:38:06.2285991Z tl.argmin(_tmp3, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2286108Z _tmp3_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2286201Z [1, RBLOCK]) == _tmp3_index_reduce) 2023-01-11T21:38:06.2286292Z tmp3 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2286409Z tl.where(_tmp3_index_mask, _tmp3_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2286542Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.2286629Z ''') 2023-01-11T21:38:06.2286634Z 2023-01-11T21:38:06.2286639Z 2023-01-11T21:38:06.2286732Z async_compile.wait(globals()) 2023-01-11T21:38:06.2286809Z del async_compile 2023-01-11T21:38:06.2286814Z 2023-01-11T21:38:06.2286888Z def call(args): 2023-01-11T21:38:06.2286963Z arg0_1, = args 2023-01-11T21:38:06.2287031Z args.clear() 2023-01-11T21:38:06.2287123Z with torch.cuda.device(0): 2023-01-11T21:38:06.2287314Z buf0 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2287500Z buf1 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2287590Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2287752Z triton_fused_argmax_argmin_0.run(arg0_1, buf0, buf1, 1, 524288, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.2287823Z del arg0_1 2023-01-11T21:38:06.2287906Z return (buf0, buf1, ) 2023-01-11T21:38:06.2287911Z 2023-01-11T21:38:06.2287915Z 2023-01-11T21:38:06.2287988Z if __name__ == "__main__": 2023-01-11T21:38:06.2288107Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2288235Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2288456Z arg0_1 = rand_strided((8, 256, 256), (65536, 256, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2288570Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2288576Z 2023-01-11T21:38:06.2288581Z 2023-01-11T21:38:06.2288679Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2288753Z import torch 2023-01-11T21:38:06.2288829Z import random 2023-01-11T21:38:06.2288940Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2289065Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2289070Z 2023-01-11T21:38:06.2289180Z aten = torch.ops.aten 2023-01-11T21:38:06.2289317Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2289413Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2289418Z 2023-01-11T21:38:06.2289492Z import triton 
2023-01-11T21:38:06.2289584Z import triton.language as tl 2023-01-11T21:38:06.2289708Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2289838Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2289844Z 2023-01-11T21:38:06.2289848Z 2023-01-11T21:38:06.2290019Z triton_fused_argmax_argmin_0 = async_compile.triton(''' 2023-01-11T21:38:06.2290097Z import triton 2023-01-11T21:38:06.2290189Z import triton.language as tl 2023-01-11T21:38:06.2290302Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2290403Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2290532Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2290651Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2290663Z 2023-01-11T21:38:06.2290748Z @reduction(size_hints=[1, 524288], 2023-01-11T21:38:06.2290864Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2290948Z filename=__file__, 2023-01-11T21:38:06.2291322Z meta={'signature': {0: '*fp16', 1: '*i64', 2: '*i64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.2291394Z @triton.jit 2023-01-11T21:38:06.2291571Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2291674Z xnumel = 1 2023-01-11T21:38:06.2291742Z rnumel = 524288 2023-01-11T21:38:06.2291841Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2291975Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2292058Z xmask = xindex < xnumel 2023-01-11T21:38:06.2292179Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2292364Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.2292480Z _tmp1_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2292607Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.2292714Z _tmp3_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2292820Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2292907Z rindex = roffset + rbase 2023-01-11T21:38:06.2292991Z rmask = rindex < rnumel 2023-01-11T21:38:06.2293062Z r0 = rindex 2023-01-11T21:38:06.2293284Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2293404Z tmp2 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.2293539Z _tmp1_index = tl.where(xmask & rmask & (_tmp1 < tmp0), rindex, _tmp1_index) 2023-01-11T21:38:06.2293667Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.2293802Z _tmp3_index = tl.where(xmask & rmask & (_tmp3 > tmp2), rindex, _tmp3_index) 2023-01-11T21:38:06.2293925Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.2294021Z _tmp1_index_reduce = tl.reshape( 2023-01-11T21:38:06.2294129Z tl.argmax(_tmp1, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2294252Z _tmp1_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2294341Z [1, RBLOCK]) == _tmp1_index_reduce) 2023-01-11T21:38:06.2294429Z tmp1 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2294692Z tl.where(_tmp1_index_mask, _tmp1_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2294828Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.2294922Z _tmp3_index_reduce = tl.reshape( 
2023-01-11T21:38:06.2295030Z tl.argmin(_tmp3, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2295261Z _tmp3_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2295366Z [1, RBLOCK]) == _tmp3_index_reduce) 2023-01-11T21:38:06.2295468Z tmp3 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2295591Z tl.where(_tmp3_index_mask, _tmp3_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2295721Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.2295830Z ''') 2023-01-11T21:38:06.2295836Z 2023-01-11T21:38:06.2295840Z 2023-01-11T21:38:06.2295933Z async_compile.wait(globals()) 2023-01-11T21:38:06.2296012Z del async_compile 2023-01-11T21:38:06.2296017Z 2023-01-11T21:38:06.2296090Z def call(args): 2023-01-11T21:38:06.2296159Z arg0_1, = args 2023-01-11T21:38:06.2296234Z args.clear() 2023-01-11T21:38:06.2296327Z with torch.cuda.device(0): 2023-01-11T21:38:06.2296512Z buf0 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2296697Z buf1 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2296788Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2296943Z triton_fused_argmax_argmin_0.run(arg0_1, buf0, buf1, 1, 524288, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.2297016Z del arg0_1 2023-01-11T21:38:06.2297092Z return (buf0, buf1, ) 2023-01-11T21:38:06.2297097Z 2023-01-11T21:38:06.2297102Z 2023-01-11T21:38:06.2297255Z if __name__ == "__main__": 2023-01-11T21:38:06.2297379Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2297505Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2297725Z arg0_1 = rand_strided((8, 256, 256), (65536, 256, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2297880Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2297885Z 2023-01-11T21:38:06.2297954Z ok (0.407s) 2023-01-11T21:38:06.2298415Z test_argmax_argmin2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2298547Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2298799Z [2023-01-11 21:33:59,952] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 403 2023-01-11T21:38:06.2299061Z [2023-01-11 21:34:01,511] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 403 2023-01-11T21:38:06.2299487Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2299621Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2299876Z [2023-01-11 21:34:01,553] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 404 2023-01-11T21:38:06.2299881Z 2023-01-11T21:38:06.2299978Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2300052Z import torch 2023-01-11T21:38:06.2300127Z import random 2023-01-11T21:38:06.2300245Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2300362Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2300368Z 2023-01-11T21:38:06.2300450Z aten = torch.ops.aten 2023-01-11T21:38:06.2300590Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2300686Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2300691Z 2023-01-11T21:38:06.2300767Z import triton 2023-01-11T21:38:06.2300857Z import triton.language as tl 2023-01-11T21:38:06.2300983Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2301153Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2301159Z 2023-01-11T21:38:06.2301163Z 2023-01-11T21:38:06.2301329Z triton_fused_argmax_argmin_0 = async_compile.triton(''' 2023-01-11T21:38:06.2301405Z import triton 2023-01-11T21:38:06.2301496Z import triton.language as tl 2023-01-11T21:38:06.2301612Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2301715Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2301845Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2301973Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2301982Z 2023-01-11T21:38:06.2302073Z @reduction(size_hints=[256, 256], 2023-01-11T21:38:06.2302184Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.2302268Z filename=__file__, 2023-01-11T21:38:06.2302649Z meta={'signature': {0: '*fp32', 1: '*i64', 2: '*i64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.2302722Z @triton.jit 2023-01-11T21:38:06.2302897Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2302972Z xnumel = 144 2023-01-11T21:38:06.2303046Z rnumel = 144 2023-01-11T21:38:06.2303138Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2303272Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2303354Z xmask = xindex < xnumel 2023-01-11T21:38:06.2303473Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2303576Z x0 = xindex 2023-01-11T21:38:06.2303759Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.2303876Z _tmp1_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2303995Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.2304110Z _tmp3_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2304215Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2304305Z rindex = roffset + rbase 2023-01-11T21:38:06.2304392Z rmask = rindex < rnumel 2023-01-11T21:38:06.2304463Z r1 = rindex 2023-01-11T21:38:06.2304683Z tmp0 = tl.load(in_ptr0 + (x0 + (144*r1)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2304800Z tmp2 = tl.load(in_ptr0 + (x0 + 
(144*r1)), rmask & xmask) 2023-01-11T21:38:06.2304935Z _tmp1_index = tl.where(xmask & rmask & (_tmp1 < tmp0), rindex, _tmp1_index) 2023-01-11T21:38:06.2305065Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.2305203Z _tmp3_index = tl.where(xmask & rmask & (_tmp3 > tmp2), rindex, _tmp3_index) 2023-01-11T21:38:06.2305325Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.2305423Z _tmp1_index_reduce = tl.reshape( 2023-01-11T21:38:06.2305558Z tl.argmax(_tmp1, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2305691Z _tmp1_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2305791Z [1, RBLOCK]) == _tmp1_index_reduce) 2023-01-11T21:38:06.2305880Z tmp1 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2306003Z tl.where(_tmp1_index_mask, _tmp1_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2306101Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.2306198Z _tmp3_index_reduce = tl.reshape( 2023-01-11T21:38:06.2306306Z tl.argmin(_tmp3, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2306426Z _tmp3_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2306517Z [1, RBLOCK]) == _tmp3_index_reduce) 2023-01-11T21:38:06.2306608Z tmp3 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2306731Z tl.where(_tmp3_index_mask, _tmp3_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2306828Z tl.store(out_ptr1 + x0, tmp3, xmask) 2023-01-11T21:38:06.2306939Z ''') 2023-01-11T21:38:06.2306946Z 2023-01-11T21:38:06.2306950Z 2023-01-11T21:38:06.2307129Z triton_fused_argmax_1_argmin_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2307204Z import triton 2023-01-11T21:38:06.2307295Z import triton.language as tl 2023-01-11T21:38:06.2307402Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2307505Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2307635Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2307760Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2307765Z 2023-01-11T21:38:06.2307857Z @reduction(size_hints=[256, 256], 2023-01-11T21:38:06.2307974Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2308060Z filename=__file__, 2023-01-11T21:38:06.2308439Z meta={'signature': {0: '*fp32', 1: '*i64', 2: '*i64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.2308506Z @triton.jit 2023-01-11T21:38:06.2308680Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2308752Z xnumel = 144 2023-01-11T21:38:06.2308826Z rnumel = 144 2023-01-11T21:38:06.2308924Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2309057Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2309142Z xmask = xindex < xnumel 2023-01-11T21:38:06.2309253Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2309323Z x0 = xindex 2023-01-11T21:38:06.2309536Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.2309652Z _tmp1_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2309778Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.2309892Z _tmp3_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2310001Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2310083Z rindex = roffset + rbase 2023-01-11T21:38:06.2310168Z rmask = 
rindex < rnumel 2023-01-11T21:38:06.2310239Z r1 = rindex 2023-01-11T21:38:06.2310459Z tmp0 = tl.load(in_ptr0 + (r1 + (144*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2310576Z tmp2 = tl.load(in_ptr0 + (r1 + (144*x0)), rmask & xmask) 2023-01-11T21:38:06.2310720Z _tmp1_index = tl.where(xmask & rmask & (_tmp1 < tmp0), rindex, _tmp1_index) 2023-01-11T21:38:06.2310848Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.2310991Z _tmp3_index = tl.where(xmask & rmask & (_tmp3 > tmp2), rindex, _tmp3_index) 2023-01-11T21:38:06.2311108Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.2311204Z _tmp1_index_reduce = tl.reshape( 2023-01-11T21:38:06.2311316Z tl.argmax(_tmp1, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2311432Z _tmp1_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2311527Z [1, RBLOCK]) == _tmp1_index_reduce) 2023-01-11T21:38:06.2311614Z tmp1 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2311738Z tl.where(_tmp1_index_mask, _tmp1_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2311828Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.2311922Z _tmp3_index_reduce = tl.reshape( 2023-01-11T21:38:06.2312032Z tl.argmin(_tmp3, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2312150Z _tmp3_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2312245Z [1, RBLOCK]) == _tmp3_index_reduce) 2023-01-11T21:38:06.2312333Z tmp3 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2312456Z tl.where(_tmp3_index_mask, _tmp3_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2312546Z tl.store(out_ptr1 + x0, tmp3, xmask) 2023-01-11T21:38:06.2312631Z ''') 2023-01-11T21:38:06.2312636Z 2023-01-11T21:38:06.2312641Z 2023-01-11T21:38:06.2312758Z async_compile.wait(globals()) 2023-01-11T21:38:06.2312839Z del async_compile 2023-01-11T21:38:06.2312844Z 2023-01-11T21:38:06.2312919Z def call(args): 2023-01-11T21:38:06.2312993Z arg0_1, = args 2023-01-11T21:38:06.2313068Z args.clear() 2023-01-11T21:38:06.2313159Z with torch.cuda.device(0): 2023-01-11T21:38:06.2313350Z buf0 = empty_strided((144, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2313546Z buf1 = empty_strided((144, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2313638Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2313795Z triton_fused_argmax_argmin_0.run(arg0_1, buf0, buf1, 144, 144, grid=grid(144), stream=stream0) 2023-01-11T21:38:06.2313990Z buf2 = empty_strided((144, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2314182Z buf3 = empty_strided((144, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2314346Z triton_fused_argmax_1_argmin_1_1.run(arg0_1, buf2, buf3, 144, 144, grid=grid(144), stream=stream0) 2023-01-11T21:38:06.2314420Z del arg0_1 2023-01-11T21:38:06.2314510Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.2314515Z 2023-01-11T21:38:06.2314520Z 2023-01-11T21:38:06.2314599Z if __name__ == "__main__": 2023-01-11T21:38:06.2314717Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2314843Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2315055Z arg0_1 = rand_strided((144, 144), (144, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2315167Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2315206Z 2023-01-11T21:38:06.2315515Z [2023-01-11 21:34:03,297] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 404 2023-01-11T21:38:06.2315521Z 
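The two reduction kernels dumped above for graphs 403/404 share one pattern: each program carries a running maximum `_tmp1` together with the index where it occurred (`_tmp1_index`), and a mirrored `_tmp3`/`_tmp3_index` pair for the minimum, across the `roffset` loop; the epilogue then reduces across the block with `tl.argmax`/`tl.argmin` and extracts the winning index by a masked `tl.sum`. The first kernel loads `in_ptr0 + (x0 + 144*r1)` (strided, reducing over dim 0, `ReductionHint.DEFAULT`), the second loads `in_ptr0 + (r1 + 144*x0)` (contiguous, reducing over dim 1, `ReductionHint.INNER`). A minimal sketch of the eager-mode computation these graphs appear to compile — the function name `fn`, the return order, and the commented `torch.compile` call are illustrative assumptions, not taken from this log:

    import torch

    def fn(x):
        # One reduction per axis, matching the two generated kernels above:
        # the strided kernel (x0 + 144*r1) reduces over dim 0, the
        # contiguous kernel (r1 + 144*x0) over the inner dim 1.
        return x.argmax(0), x.argmin(0), x.argmax(1), x.argmin(1)

    # Hypothetical driver; falls back to CPU so the sketch runs anywhere.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(144, 144, device=device)
    out = fn(x)  # eager reference
    # torch.compile(fn)(x)  # on CUDA, would exercise the inductor codegen shown above
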
2023-01-11T21:38:06.2315620Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2315688Z import torch 2023-01-11T21:38:06.2315762Z import random 2023-01-11T21:38:06.2315882Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2316004Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2316009Z 2023-01-11T21:38:06.2316091Z aten = torch.ops.aten 2023-01-11T21:38:06.2316227Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2316323Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2316328Z 2023-01-11T21:38:06.2316402Z import triton 2023-01-11T21:38:06.2316488Z import triton.language as tl 2023-01-11T21:38:06.2316613Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2316751Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2316759Z 2023-01-11T21:38:06.2316763Z 2023-01-11T21:38:06.2316933Z triton_fused_argmax_argmin_0 = async_compile.triton(''' 2023-01-11T21:38:06.2317009Z import triton 2023-01-11T21:38:06.2317100Z import triton.language as tl 2023-01-11T21:38:06.2317216Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2317314Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2317443Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2317568Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2317573Z 2023-01-11T21:38:06.2317665Z @reduction(size_hints=[256, 256], 2023-01-11T21:38:06.2317780Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.2317864Z filename=__file__, 2023-01-11T21:38:06.2318252Z meta={'signature': {0: '*fp16', 1: '*i64', 2: '*i64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.2318329Z @triton.jit 2023-01-11T21:38:06.2318505Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2318580Z xnumel = 144 2023-01-11T21:38:06.2318653Z rnumel = 144 2023-01-11T21:38:06.2318778Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2318911Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2318996Z xmask = xindex < xnumel 2023-01-11T21:38:06.2319115Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2319180Z x0 = xindex 2023-01-11T21:38:06.2319363Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.2319479Z _tmp1_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2319604Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.2319719Z _tmp3_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2319827Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2319915Z rindex = roffset + rbase 2023-01-11T21:38:06.2320000Z rmask = rindex < rnumel 2023-01-11T21:38:06.2320065Z r1 = rindex 2023-01-11T21:38:06.2320308Z tmp0 = tl.load(in_ptr0 + (x0 + (144*r1)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2320443Z tmp2 = tl.load(in_ptr0 + (x0 + (144*r1)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.2320586Z _tmp1_index = tl.where(xmask & rmask & (_tmp1 < tmp0), rindex, _tmp1_index) 2023-01-11T21:38:06.2320710Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.2320847Z _tmp3_index = tl.where(xmask & rmask & (_tmp3 > tmp2), rindex, _tmp3_index) 
2023-01-11T21:38:06.2320972Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.2321060Z _tmp1_index_reduce = tl.reshape( 2023-01-11T21:38:06.2321201Z tl.argmax(_tmp1, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2321321Z _tmp1_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2321417Z [1, RBLOCK]) == _tmp1_index_reduce) 2023-01-11T21:38:06.2321504Z tmp1 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2321631Z tl.where(_tmp1_index_mask, _tmp1_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2321729Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.2321816Z _tmp3_index_reduce = tl.reshape( 2023-01-11T21:38:06.2321925Z tl.argmin(_tmp3, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2322044Z _tmp3_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2322138Z [1, RBLOCK]) == _tmp3_index_reduce) 2023-01-11T21:38:06.2322226Z tmp3 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2322349Z tl.where(_tmp3_index_mask, _tmp3_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2322448Z tl.store(out_ptr1 + x0, tmp3, xmask) 2023-01-11T21:38:06.2322534Z ''') 2023-01-11T21:38:06.2322542Z 2023-01-11T21:38:06.2322547Z 2023-01-11T21:38:06.2322718Z triton_fused_argmax_1_argmin_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2322793Z import triton 2023-01-11T21:38:06.2322885Z import triton.language as tl 2023-01-11T21:38:06.2323000Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2323105Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2323235Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2323360Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2323365Z 2023-01-11T21:38:06.2323449Z @reduction(size_hints=[256, 256], 2023-01-11T21:38:06.2323563Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2323648Z filename=__file__, 2023-01-11T21:38:06.2324021Z meta={'signature': {0: '*fp16', 1: '*i64', 2: '*i64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.2324097Z @triton.jit 2023-01-11T21:38:06.2324274Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2324348Z xnumel = 144 2023-01-11T21:38:06.2324421Z rnumel = 144 2023-01-11T21:38:06.2324511Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2324673Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2324758Z xmask = xindex < xnumel 2023-01-11T21:38:06.2324877Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2324947Z x0 = xindex 2023-01-11T21:38:06.2325129Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.2325249Z _tmp1_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2325368Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.2325482Z _tmp3_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2325587Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2325691Z rindex = roffset + rbase 2023-01-11T21:38:06.2325786Z rmask = rindex < rnumel 2023-01-11T21:38:06.2325869Z r1 = rindex 2023-01-11T21:38:06.2326123Z tmp0 = tl.load(in_ptr0 + (r1 + (144*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2326262Z tmp2 = tl.load(in_ptr0 + (r1 + (144*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.2326398Z 
_tmp1_index = tl.where(xmask & rmask & (_tmp1 < tmp0), rindex, _tmp1_index) 2023-01-11T21:38:06.2326526Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.2326666Z _tmp3_index = tl.where(xmask & rmask & (_tmp3 > tmp2), rindex, _tmp3_index) 2023-01-11T21:38:06.2326788Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.2326882Z _tmp1_index_reduce = tl.reshape( 2023-01-11T21:38:06.2326990Z tl.argmax(_tmp1, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2327149Z _tmp1_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2327238Z [1, RBLOCK]) == _tmp1_index_reduce) 2023-01-11T21:38:06.2327327Z tmp1 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2327452Z tl.where(_tmp1_index_mask, _tmp1_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2327554Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.2327648Z _tmp3_index_reduce = tl.reshape( 2023-01-11T21:38:06.2327758Z tl.argmin(_tmp3, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2327875Z _tmp3_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2327963Z [1, RBLOCK]) == _tmp3_index_reduce) 2023-01-11T21:38:06.2328054Z tmp3 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2328178Z tl.where(_tmp3_index_mask, _tmp3_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2328275Z tl.store(out_ptr1 + x0, tmp3, xmask) 2023-01-11T21:38:06.2328361Z ''') 2023-01-11T21:38:06.2328366Z 2023-01-11T21:38:06.2328371Z 2023-01-11T21:38:06.2328468Z async_compile.wait(globals()) 2023-01-11T21:38:06.2328544Z del async_compile 2023-01-11T21:38:06.2328549Z 2023-01-11T21:38:06.2328623Z def call(args): 2023-01-11T21:38:06.2328689Z arg0_1, = args 2023-01-11T21:38:06.2328763Z args.clear() 2023-01-11T21:38:06.2328856Z with torch.cuda.device(0): 2023-01-11T21:38:06.2329053Z buf0 = empty_strided((144, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2329250Z buf1 = empty_strided((144, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2329341Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2329500Z triton_fused_argmax_argmin_0.run(arg0_1, buf0, buf1, 144, 144, grid=grid(144), stream=stream0) 2023-01-11T21:38:06.2329685Z buf2 = empty_strided((144, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2329877Z buf3 = empty_strided((144, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2330037Z triton_fused_argmax_1_argmin_1_1.run(arg0_1, buf2, buf3, 144, 144, grid=grid(144), stream=stream0) 2023-01-11T21:38:06.2330115Z del arg0_1 2023-01-11T21:38:06.2330211Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.2330217Z 2023-01-11T21:38:06.2330221Z 2023-01-11T21:38:06.2330302Z if __name__ == "__main__": 2023-01-11T21:38:06.2330451Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2330579Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2330789Z arg0_1 = rand_strided((144, 144), (144, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2330894Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2330899Z 2023-01-11T21:38:06.2330970Z ok (3.387s) 2023-01-11T21:38:06.2331092Z test_argmax_argmin3_cuda (__main__.CudaTests) ... skip: 2023-01-11T21:38:06.2331255Z FIXME: In the case of having equally max/min elements, our implementation returns 2023-01-11T21:38:06.2331363Z the last index instead of the first one 2023-01-11T21:38:06.2331437Z (0.000s) 2023-01-11T21:38:06.2331897Z test_as_strided_cuda (__main__.CudaTests) ... 
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2332028Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2332288Z [2023-01-11 21:34:03,331] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 405 2023-01-11T21:38:06.2332546Z [2023-01-11 21:34:03,407] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 405 2023-01-11T21:38:06.2332959Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2333119Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2333376Z [2023-01-11 21:34:03,438] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 406 2023-01-11T21:38:06.2333637Z [2023-01-11 21:34:03,511] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 406 2023-01-11T21:38:06.2333643Z 2023-01-11T21:38:06.2333740Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2333814Z import torch 2023-01-11T21:38:06.2333887Z import random 2023-01-11T21:38:06.2334005Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2334121Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2334126Z 2023-01-11T21:38:06.2334208Z aten = torch.ops.aten 2023-01-11T21:38:06.2334347Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2334443Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2334448Z 2023-01-11T21:38:06.2334634Z import triton 2023-01-11T21:38:06.2334729Z import triton.language as tl 2023-01-11T21:38:06.2334854Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2334990Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2335001Z 2023-01-11T21:38:06.2335005Z 2023-01-11T21:38:06.2335162Z triton_fused_add_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.2335235Z import triton 2023-01-11T21:38:06.2335327Z import triton.language as tl 2023-01-11T21:38:06.2335437Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2335539Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2335672Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2335797Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2335805Z 2023-01-11T21:38:06.2336224Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2336292Z @triton.jit 2023-01-11T21:38:06.2336469Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2336546Z xnumel = 4096 2023-01-11T21:38:06.2336642Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2336776Z xindex = xoffset + 
tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2336861Z xmask = xindex < xnumel 2023-01-11T21:38:06.2336931Z x0 = xindex 2023-01-11T21:38:06.2337021Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2337093Z tmp1 = 1 2023-01-11T21:38:06.2337227Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2337298Z tmp3 = 2 2023-01-11T21:38:06.2337377Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.2337519Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2337607Z ''') 2023-01-11T21:38:06.2337612Z 2023-01-11T21:38:06.2337617Z 2023-01-11T21:38:06.2337703Z async_compile.wait(globals()) 2023-01-11T21:38:06.2337779Z del async_compile 2023-01-11T21:38:06.2337785Z 2023-01-11T21:38:06.2337858Z def call(args): 2023-01-11T21:38:06.2337934Z arg0_1, = args 2023-01-11T21:38:06.2338007Z args.clear() 2023-01-11T21:38:06.2338097Z with torch.cuda.device(0): 2023-01-11T21:38:06.2338300Z buf0 = empty_strided((64, 64), (64, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2338414Z buf1 = as_strided(buf0, (8, 8, 64), (512, 64, 1)); del buf0 # reuse 2023-01-11T21:38:06.2338506Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2338648Z triton_fused_add_add_1_0.run(buf1, arg0_1, 4096, grid=grid(4096), stream=stream0) 2023-01-11T21:38:06.2338765Z return (as_strided(arg0_1, (8, 8, 64), (512, 64, 1)), buf1, ) 2023-01-11T21:38:06.2338808Z 2023-01-11T21:38:06.2338812Z 2023-01-11T21:38:06.2338895Z if __name__ == "__main__": 2023-01-11T21:38:06.2339011Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2339136Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2339342Z arg0_1 = rand_strided((64, 64), (64, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2339447Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2339460Z 2023-01-11T21:38:06.2339464Z 2023-01-11T21:38:06.2339555Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2339628Z import torch 2023-01-11T21:38:06.2339702Z import random 2023-01-11T21:38:06.2339818Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2339942Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2339947Z 2023-01-11T21:38:06.2340028Z aten = torch.ops.aten 2023-01-11T21:38:06.2340164Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2340254Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2340267Z 2023-01-11T21:38:06.2340334Z import triton 2023-01-11T21:38:06.2340426Z import triton.language as tl 2023-01-11T21:38:06.2340549Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2340687Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2340695Z 2023-01-11T21:38:06.2340699Z 2023-01-11T21:38:06.2340860Z triton_fused_add_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.2340935Z import triton 2023-01-11T21:38:06.2341026Z import triton.language as tl 2023-01-11T21:38:06.2341132Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2341236Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2341366Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2341489Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2341494Z 2023-01-11T21:38:06.2341915Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), 
equal_to_1=())]}) 2023-01-11T21:38:06.2341989Z @triton.jit 2023-01-11T21:38:06.2342121Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2342224Z xnumel = 4096 2023-01-11T21:38:06.2342316Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2342445Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2342527Z xmask = xindex < xnumel 2023-01-11T21:38:06.2342595Z x0 = xindex 2023-01-11T21:38:06.2342712Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2342780Z tmp1 = 1 2023-01-11T21:38:06.2342860Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2342923Z tmp3 = 2 2023-01-11T21:38:06.2343001Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.2343137Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2343226Z ''') 2023-01-11T21:38:06.2343231Z 2023-01-11T21:38:06.2343236Z 2023-01-11T21:38:06.2343327Z async_compile.wait(globals()) 2023-01-11T21:38:06.2343403Z del async_compile 2023-01-11T21:38:06.2343408Z 2023-01-11T21:38:06.2343482Z def call(args): 2023-01-11T21:38:06.2343555Z arg0_1, = args 2023-01-11T21:38:06.2343623Z args.clear() 2023-01-11T21:38:06.2343716Z with torch.cuda.device(0): 2023-01-11T21:38:06.2343920Z buf0 = empty_strided((64, 64), (64, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2344039Z buf1 = as_strided(buf0, (8, 8, 64), (512, 64, 1)); del buf0 # reuse 2023-01-11T21:38:06.2344131Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2344272Z triton_fused_add_add_1_0.run(buf1, arg0_1, 4096, grid=grid(4096), stream=stream0) 2023-01-11T21:38:06.2344388Z return (as_strided(arg0_1, (8, 8, 64), (512, 64, 1)), buf1, ) 2023-01-11T21:38:06.2344393Z 2023-01-11T21:38:06.2344398Z 2023-01-11T21:38:06.2344471Z if __name__ == "__main__": 2023-01-11T21:38:06.2344614Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2344740Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2344943Z arg0_1 = rand_strided((64, 64), (64, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2345057Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2345062Z 2023-01-11T21:38:06.2345132Z ok (0.213s) 2023-01-11T21:38:06.2345620Z test_as_strided_scatter_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2345776Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2346036Z [2023-01-11 21:34:03,541] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 407 2023-01-11T21:38:06.2346305Z [2023-01-11 21:34:03,624] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 407 2023-01-11T21:38:06.2346715Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2346845Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2347100Z [2023-01-11 21:34:03,653] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 408 2023-01-11T21:38:06.2347360Z [2023-01-11 21:34:03,731] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 408 2023-01-11T21:38:06.2347368Z 2023-01-11T21:38:06.2347470Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2347544Z import torch 2023-01-11T21:38:06.2347618Z import random 2023-01-11T21:38:06.2347735Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2347860Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2347865Z 2023-01-11T21:38:06.2347967Z aten = torch.ops.aten 2023-01-11T21:38:06.2348105Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2348200Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2348204Z 2023-01-11T21:38:06.2348277Z import triton 2023-01-11T21:38:06.2348370Z import triton.language as tl 2023-01-11T21:38:06.2348496Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2348634Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2348640Z 2023-01-11T21:38:06.2348644Z 2023-01-11T21:38:06.2348851Z triton_fused_add_as_strided_scatter_mul_mul_1_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.2348922Z import triton 2023-01-11T21:38:06.2349014Z import triton.language as tl 2023-01-11T21:38:06.2349127Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2349229Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2349361Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2349490Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2349496Z 2023-01-11T21:38:06.2349902Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2349976Z @triton.jit 2023-01-11T21:38:06.2350102Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2350178Z xnumel = 10240 2023-01-11T21:38:06.2350275Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2350401Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2350509Z xmask = xindex < xnumel 2023-01-11T21:38:06.2350579Z x0 = xindex 2023-01-11T21:38:06.2350675Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2350740Z tmp1 = 8 2023-01-11T21:38:06.2350819Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.2350891Z tmp3 = 10 2023-01-11T21:38:06.2350972Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.2351107Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2351193Z ''') 2023-01-11T21:38:06.2351199Z 2023-01-11T21:38:06.2351203Z 2023-01-11T21:38:06.2351361Z triton_fused_mul_1_sub_1 = async_compile.triton(''' 2023-01-11T21:38:06.2351428Z import triton 2023-01-11T21:38:06.2351520Z import triton.language as tl 2023-01-11T21:38:06.2351632Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2351732Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2351863Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2351991Z from 
torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2351996Z 2023-01-11T21:38:06.2352410Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2352484Z @triton.jit 2023-01-11T21:38:06.2352609Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2352683Z xnumel = 5120 2023-01-11T21:38:06.2352779Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2352907Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2352990Z xmask = xindex < xnumel 2023-01-11T21:38:06.2353063Z x0 = xindex 2023-01-11T21:38:06.2353159Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2353222Z tmp1 = 2 2023-01-11T21:38:06.2353304Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.2353372Z tmp3 = 4 2023-01-11T21:38:06.2353482Z tmp4 = tmp2 - tmp3 2023-01-11T21:38:06.2353619Z tl.store(out_ptr0 + (2*x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2353704Z ''') 2023-01-11T21:38:06.2353709Z 2023-01-11T21:38:06.2353714Z 2023-01-11T21:38:06.2353804Z async_compile.wait(globals()) 2023-01-11T21:38:06.2353875Z del async_compile 2023-01-11T21:38:06.2353917Z 2023-01-11T21:38:06.2353986Z def call(args): 2023-01-11T21:38:06.2354067Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2354141Z args.clear() 2023-01-11T21:38:06.2354233Z with torch.cuda.device(0): 2023-01-11T21:38:06.2354441Z buf0 = empty_strided((10, 1024), (1024, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2354533Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2354706Z triton_fused_add_as_strided_scatter_mul_mul_1_sub_0.run(arg0_1, buf0, 10240, grid=grid(10240), stream=stream0) 2023-01-11T21:38:06.2354773Z del arg0_1 2023-01-11T21:38:06.2354913Z triton_fused_mul_1_sub_1.run(arg1_1, buf0, 5120, grid=grid(5120), stream=stream0) 2023-01-11T21:38:06.2354986Z del arg1_1 2023-01-11T21:38:06.2355065Z return (buf0, ) 2023-01-11T21:38:06.2355070Z 2023-01-11T21:38:06.2355074Z 2023-01-11T21:38:06.2355155Z if __name__ == "__main__": 2023-01-11T21:38:06.2355294Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2355434Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2355652Z arg0_1 = rand_strided((10, 1024), (1024, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2355857Z arg1_1 = rand_strided((10, 512), (512, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2355977Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2355983Z 2023-01-11T21:38:06.2355987Z 2023-01-11T21:38:06.2356082Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2356158Z import torch 2023-01-11T21:38:06.2356231Z import random 2023-01-11T21:38:06.2356350Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2356500Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2356505Z 2023-01-11T21:38:06.2356585Z aten = torch.ops.aten 2023-01-11T21:38:06.2356713Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2356807Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2356815Z 2023-01-11T21:38:06.2356889Z import triton 2023-01-11T21:38:06.2356980Z import triton.language as tl 2023-01-11T21:38:06.2357103Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2357241Z from torch._C import _cuda_getCurrentRawStream as 
get_cuda_stream 2023-01-11T21:38:06.2357246Z 2023-01-11T21:38:06.2357251Z 2023-01-11T21:38:06.2357462Z triton_fused_add_as_strided_scatter_mul_mul_1_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.2357536Z import triton 2023-01-11T21:38:06.2357621Z import triton.language as tl 2023-01-11T21:38:06.2357733Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2357837Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2357969Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2358093Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2358098Z 2023-01-11T21:38:06.2358506Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2358579Z @triton.jit 2023-01-11T21:38:06.2358709Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2358777Z xnumel = 10240 2023-01-11T21:38:06.2358873Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2359003Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2359085Z xmask = xindex < xnumel 2023-01-11T21:38:06.2359156Z x0 = xindex 2023-01-11T21:38:06.2359270Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2359337Z tmp1 = 8 2023-01-11T21:38:06.2359414Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.2359484Z tmp3 = 10 2023-01-11T21:38:06.2359560Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.2359694Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2359778Z ''') 2023-01-11T21:38:06.2359811Z 2023-01-11T21:38:06.2359817Z 2023-01-11T21:38:06.2359978Z triton_fused_mul_1_sub_1 = async_compile.triton(''' 2023-01-11T21:38:06.2360052Z import triton 2023-01-11T21:38:06.2360137Z import triton.language as tl 2023-01-11T21:38:06.2360249Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2360350Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2360480Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2360606Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2360611Z 2023-01-11T21:38:06.2361024Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2361100Z @triton.jit 2023-01-11T21:38:06.2361232Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2361301Z xnumel = 5120 2023-01-11T21:38:06.2361396Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2361524Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2361606Z xmask = xindex < xnumel 2023-01-11T21:38:06.2361675Z x0 = xindex 2023-01-11T21:38:06.2361795Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2361865Z tmp1 = 2 2023-01-11T21:38:06.2361937Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.2362005Z tmp3 = 4 2023-01-11T21:38:06.2362115Z tmp4 = tmp2 - tmp3 2023-01-11T21:38:06.2362250Z tl.store(out_ptr0 + (2*x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2362363Z ''') 2023-01-11T21:38:06.2362369Z 2023-01-11T21:38:06.2362373Z 2023-01-11T21:38:06.2362466Z async_compile.wait(globals()) 2023-01-11T21:38:06.2362541Z del async_compile 
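# [Editor's note, added commentary -- not in the log] In the call() wrapper
# below, the first kernel fills buf0 densely with arg0_1 * 8 + 10, and the
# second writes arg1_1 * 2 - 4 through the strided offset 2*x0 into the same
# buffer; together they realize the aten.as_strided_scatter pattern named in
# triton_fused_add_as_strided_scatter_mul_mul_1_sub_0.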
2023-01-11T21:38:06.2362546Z 2023-01-11T21:38:06.2362614Z def call(args): 2023-01-11T21:38:06.2362694Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2362770Z args.clear() 2023-01-11T21:38:06.2362862Z with torch.cuda.device(0): 2023-01-11T21:38:06.2363070Z buf0 = empty_strided((10, 1024), (1024, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2363163Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2363338Z triton_fused_add_as_strided_scatter_mul_mul_1_sub_0.run(arg0_1, buf0, 10240, grid=grid(10240), stream=stream0) 2023-01-11T21:38:06.2363405Z del arg0_1 2023-01-11T21:38:06.2363550Z triton_fused_mul_1_sub_1.run(arg1_1, buf0, 5120, grid=grid(5120), stream=stream0) 2023-01-11T21:38:06.2363621Z del arg1_1 2023-01-11T21:38:06.2363696Z return (buf0, ) 2023-01-11T21:38:06.2363705Z 2023-01-11T21:38:06.2363709Z 2023-01-11T21:38:06.2363790Z if __name__ == "__main__": 2023-01-11T21:38:06.2363906Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2364031Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2364242Z arg0_1 = rand_strided((10, 1024), (1024, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2364441Z arg1_1 = rand_strided((10, 512), (512, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2364558Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2364563Z 2023-01-11T21:38:06.2364638Z ok (0.220s) 2023-01-11T21:38:06.2365090Z test_avg_pool2d1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2365227Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2365482Z [2023-01-11 21:34:03,751] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 409 2023-01-11T21:38:06.2365775Z [2023-01-11 21:34:03,903] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 409 2023-01-11T21:38:06.2366189Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2366323Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2366578Z [2023-01-11 21:34:03,921] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 410 2023-01-11T21:38:06.2366842Z [2023-01-11 21:34:04,052] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 410 2023-01-11T21:38:06.2366848Z 2023-01-11T21:38:06.2366939Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2367012Z import torch 2023-01-11T21:38:06.2367087Z import random 2023-01-11T21:38:06.2367205Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2367328Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2367333Z 2023-01-11T21:38:06.2367416Z aten = torch.ops.aten 2023-01-11T21:38:06.2367553Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2367649Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2367654Z 2023-01-11T21:38:06.2367721Z import triton 2023-01-11T21:38:06.2367814Z import triton.language as tl 2023-01-11T21:38:06.2367936Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2368117Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2368122Z 2023-01-11T21:38:06.2368127Z 2023-01-11T21:38:06.2368293Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2368366Z import triton 2023-01-11T21:38:06.2368458Z import triton.language as tl 2023-01-11T21:38:06.2368568Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2368671Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2368805Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2368930Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2368935Z 2023-01-11T21:38:06.2369334Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2369408Z @triton.jit 2023-01-11T21:38:06.2369539Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2369616Z xnumel = 392 2023-01-11T21:38:06.2369705Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2369833Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2369916Z xmask = xindex < xnumel 2023-01-11T21:38:06.2369990Z x0 = xindex % 7 2023-01-11T21:38:06.2370077Z x1 = (xindex // 7) % 7 2023-01-11T21:38:06.2370156Z x2 = (xindex // 49) 2023-01-11T21:38:06.2370226Z x3 = xindex 2023-01-11T21:38:06.2370337Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (32*x1) + (256*x2)), xmask) 2023-01-11T21:38:06.2370456Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (32*x1) + (256*x2)), xmask) 2023-01-11T21:38:06.2370572Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (32*x1) + (256*x2)), xmask) 2023-01-11T21:38:06.2370691Z tmp5 = tl.load(in_ptr0 + (16 + (2*x0) + (32*x1) + (256*x2)), xmask) 2023-01-11T21:38:06.2370806Z tmp7 = tl.load(in_ptr0 + (17 + (2*x0) + (32*x1) + (256*x2)), xmask) 2023-01-11T21:38:06.2370917Z tmp9 = tl.load(in_ptr0 + (18 + (2*x0) + (32*x1) + (256*x2)), xmask) 2023-01-11T21:38:06.2371036Z tmp11 = tl.load(in_ptr0 + (32 + (2*x0) + (32*x1) + (256*x2)), xmask) 2023-01-11T21:38:06.2371154Z tmp13 = tl.load(in_ptr0 + (33 + (2*x0) + (32*x1) + (256*x2)), xmask) 
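# [Editor's note, added commentary -- not in the log] The nine tl.load calls
# in this kernel read a 3x3 input window anchored at (2*x1, 2*x0): column
# offsets +0/+1/+2 and row offsets +0/+16/+32 step across the 16-wide input
# rows. The window sum is scaled by tmp17 = 1/9 below, i.e.
# avg_pool2d(kernel_size=3, stride=2) over the 16x16 spatial maps.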
2023-01-11T21:38:06.2371291Z tmp15 = tl.load(in_ptr0 + (34 + (2*x0) + (32*x1) + (256*x2)), xmask) 2023-01-11T21:38:06.2371372Z tmp2 = tmp1 + tmp0 2023-01-11T21:38:06.2371451Z tmp4 = tmp3 + tmp2 2023-01-11T21:38:06.2371528Z tmp6 = tmp5 + tmp4 2023-01-11T21:38:06.2371604Z tmp8 = tmp7 + tmp6 2023-01-11T21:38:06.2371680Z tmp10 = tmp9 + tmp8 2023-01-11T21:38:06.2371760Z tmp12 = tmp11 + tmp10 2023-01-11T21:38:06.2371833Z tmp14 = tmp13 + tmp12 2023-01-11T21:38:06.2371910Z tmp16 = tmp15 + tmp14 2023-01-11T21:38:06.2371990Z tmp17 = 0.1111111111111111 2023-01-11T21:38:06.2372068Z tmp18 = tmp16 * tmp17 2023-01-11T21:38:06.2372202Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.2372291Z ''') 2023-01-11T21:38:06.2372296Z 2023-01-11T21:38:06.2372301Z 2023-01-11T21:38:06.2372396Z async_compile.wait(globals()) 2023-01-11T21:38:06.2372465Z del async_compile 2023-01-11T21:38:06.2372477Z 2023-01-11T21:38:06.2372545Z def call(args): 2023-01-11T21:38:06.2372618Z arg0_1, = args 2023-01-11T21:38:06.2372694Z args.clear() 2023-01-11T21:38:06.2372789Z with torch.cuda.device(0): 2023-01-11T21:38:06.2373007Z buf0 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2373097Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2373242Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 392, grid=grid(392), stream=stream0) 2023-01-11T21:38:06.2373308Z del arg0_1 2023-01-11T21:38:06.2373382Z return (buf0, ) 2023-01-11T21:38:06.2373388Z 2023-01-11T21:38:06.2373392Z 2023-01-11T21:38:06.2373472Z if __name__ == "__main__": 2023-01-11T21:38:06.2373589Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2373744Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2373968Z arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2374080Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2374085Z 2023-01-11T21:38:06.2374089Z 2023-01-11T21:38:06.2374185Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2374252Z import torch 2023-01-11T21:38:06.2374326Z import random 2023-01-11T21:38:06.2374443Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2374692Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2374697Z 2023-01-11T21:38:06.2374779Z aten = torch.ops.aten 2023-01-11T21:38:06.2374915Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2375009Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2375015Z 2023-01-11T21:38:06.2375081Z import triton 2023-01-11T21:38:06.2375175Z import triton.language as tl 2023-01-11T21:38:06.2375304Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2375443Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2375448Z 2023-01-11T21:38:06.2375452Z 2023-01-11T21:38:06.2375619Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2375698Z import triton 2023-01-11T21:38:06.2375792Z import triton.language as tl 2023-01-11T21:38:06.2375905Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2376000Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2376131Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2376258Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2376264Z 2023-01-11T21:38:06.2376664Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: 
'*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2376740Z @triton.jit 2023-01-11T21:38:06.2376872Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2376946Z xnumel = 392 2023-01-11T21:38:06.2377041Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2377243Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2377387Z xmask = xindex < xnumel 2023-01-11T21:38:06.2377465Z x0 = xindex % 7 2023-01-11T21:38:06.2377547Z x1 = (xindex // 7) % 7 2023-01-11T21:38:06.2377626Z x2 = (xindex // 49) 2023-01-11T21:38:06.2377699Z x3 = xindex 2023-01-11T21:38:06.2377831Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2377960Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2378085Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2378220Z tmp5 = tl.load(in_ptr0 + (16 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2378354Z tmp7 = tl.load(in_ptr0 + (17 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2378484Z tmp9 = tl.load(in_ptr0 + (18 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2378616Z tmp11 = tl.load(in_ptr0 + (32 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2378750Z tmp13 = tl.load(in_ptr0 + (33 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2378878Z tmp15 = tl.load(in_ptr0 + (34 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2378950Z tmp2 = tmp1 + tmp0 2023-01-11T21:38:06.2379028Z tmp4 = tmp3 + tmp2 2023-01-11T21:38:06.2379105Z tmp6 = tmp5 + tmp4 2023-01-11T21:38:06.2379181Z tmp8 = tmp7 + tmp6 2023-01-11T21:38:06.2379257Z tmp10 = tmp9 + tmp8 2023-01-11T21:38:06.2379336Z tmp12 = tmp11 + tmp10 2023-01-11T21:38:06.2379417Z tmp14 = tmp13 + tmp12 2023-01-11T21:38:06.2379525Z tmp16 = tmp15 + tmp14 2023-01-11T21:38:06.2379605Z tmp17 = 0.1111111111111111 2023-01-11T21:38:06.2379682Z tmp18 = tmp16 * tmp17 2023-01-11T21:38:06.2379818Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.2379905Z ''') 2023-01-11T21:38:06.2379910Z 2023-01-11T21:38:06.2379915Z 2023-01-11T21:38:06.2380010Z async_compile.wait(globals()) 2023-01-11T21:38:06.2380084Z del async_compile 2023-01-11T21:38:06.2380090Z 2023-01-11T21:38:06.2380157Z def call(args): 2023-01-11T21:38:06.2380229Z arg0_1, = args 2023-01-11T21:38:06.2380301Z args.clear() 2023-01-11T21:38:06.2380393Z with torch.cuda.device(0): 2023-01-11T21:38:06.2380610Z buf0 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2380701Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2380846Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 392, grid=grid(392), stream=stream0) 2023-01-11T21:38:06.2380919Z del arg0_1 2023-01-11T21:38:06.2380993Z return (buf0, ) 2023-01-11T21:38:06.2380998Z 2023-01-11T21:38:06.2381002Z 2023-01-11T21:38:06.2381081Z if __name__ == "__main__": 2023-01-11T21:38:06.2381197Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2381322Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2381551Z arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2381663Z 
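# [Editor's note, added commentary -- not in the log] rand_strided allocates
# a tensor with the exact size/stride layout the graph was traced with, and
# print_performance benchmarks the call() wrapper, so each dumped module
# doubles as a standalone repro/benchmark script.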
print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2381668Z 2023-01-11T21:38:06.2381741Z ok (0.320s) 2023-01-11T21:38:06.2382201Z test_avg_pool2d2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2382335Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2382587Z [2023-01-11 21:34:04,091] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 411 2023-01-11T21:38:06.2382880Z [2023-01-11 21:34:04,233] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 411 2023-01-11T21:38:06.2383295Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2383425Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2383680Z [2023-01-11 21:34:04,252] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 412 2023-01-11T21:38:06.2383945Z [2023-01-11 21:34:04,385] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 412 2023-01-11T21:38:06.2383951Z 2023-01-11T21:38:06.2384049Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2384123Z import torch 2023-01-11T21:38:06.2384196Z import random 2023-01-11T21:38:06.2384310Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2384435Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2384440Z 2023-01-11T21:38:06.2384519Z aten = torch.ops.aten 2023-01-11T21:38:06.2384655Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2384750Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2384755Z 2023-01-11T21:38:06.2384827Z import triton 2023-01-11T21:38:06.2384917Z import triton.language as tl 2023-01-11T21:38:06.2385041Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2385174Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2385249Z 2023-01-11T21:38:06.2385260Z 2023-01-11T21:38:06.2385419Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2385503Z import triton 2023-01-11T21:38:06.2385608Z import triton.language as tl 2023-01-11T21:38:06.2385742Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2385848Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2385980Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2386099Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2386111Z 2023-01-11T21:38:06.2386510Z @pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2386583Z @triton.jit 2023-01-11T21:38:06.2386714Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : 
tl.constexpr): 2023-01-11T21:38:06.2386790Z xnumel = 746496 2023-01-11T21:38:06.2386887Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2387013Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2387097Z xmask = xindex < xnumel 2023-01-11T21:38:06.2387166Z x0 = xindex % 27 2023-01-11T21:38:06.2387248Z x1 = (xindex // 27) % 27 2023-01-11T21:38:06.2387329Z x2 = (xindex // 729) 2023-01-11T21:38:06.2387399Z x3 = xindex 2023-01-11T21:38:06.2387518Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2387635Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2387755Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2387873Z tmp5 = tl.load(in_ptr0 + (55 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2387986Z tmp7 = tl.load(in_ptr0 + (56 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2388101Z tmp9 = tl.load(in_ptr0 + (57 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2388223Z tmp11 = tl.load(in_ptr0 + (110 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2388342Z tmp13 = tl.load(in_ptr0 + (111 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2388457Z tmp15 = tl.load(in_ptr0 + (112 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2388566Z tmp2 = tmp1 + tmp0 2023-01-11T21:38:06.2388646Z tmp4 = tmp3 + tmp2 2023-01-11T21:38:06.2388718Z tmp6 = tmp5 + tmp4 2023-01-11T21:38:06.2388795Z tmp8 = tmp7 + tmp6 2023-01-11T21:38:06.2388873Z tmp10 = tmp9 + tmp8 2023-01-11T21:38:06.2388954Z tmp12 = tmp11 + tmp10 2023-01-11T21:38:06.2389034Z tmp14 = tmp13 + tmp12 2023-01-11T21:38:06.2389112Z tmp16 = tmp15 + tmp14 2023-01-11T21:38:06.2389192Z tmp17 = 0.1111111111111111 2023-01-11T21:38:06.2389264Z tmp18 = tmp16 * tmp17 2023-01-11T21:38:06.2389400Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.2389489Z ''') 2023-01-11T21:38:06.2389494Z 2023-01-11T21:38:06.2389498Z 2023-01-11T21:38:06.2389592Z async_compile.wait(globals()) 2023-01-11T21:38:06.2389670Z del async_compile 2023-01-11T21:38:06.2389675Z 2023-01-11T21:38:06.2389748Z def call(args): 2023-01-11T21:38:06.2389821Z arg0_1, = args 2023-01-11T21:38:06.2389889Z args.clear() 2023-01-11T21:38:06.2389983Z with torch.cuda.device(0): 2023-01-11T21:38:06.2390214Z buf0 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2390305Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2390453Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 746496, grid=grid(746496), stream=stream0) 2023-01-11T21:38:06.2390527Z del arg0_1 2023-01-11T21:38:06.2390603Z return (buf0, ) 2023-01-11T21:38:06.2390608Z 2023-01-11T21:38:06.2390612Z 2023-01-11T21:38:06.2390691Z if __name__ == "__main__": 2023-01-11T21:38:06.2390803Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2390958Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2391189Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2391300Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2391305Z 2023-01-11T21:38:06.2391309Z 2023-01-11T21:38:06.2391408Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2391484Z import torch 2023-01-11T21:38:06.2391559Z import random 2023-01-11T21:38:06.2391678Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2391794Z from 
torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2391799Z 2023-01-11T21:38:06.2391880Z aten = torch.ops.aten 2023-01-11T21:38:06.2392017Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2392112Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2392118Z 2023-01-11T21:38:06.2392190Z import triton 2023-01-11T21:38:06.2392280Z import triton.language as tl 2023-01-11T21:38:06.2392407Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2392538Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2392552Z 2023-01-11T21:38:06.2392556Z 2023-01-11T21:38:06.2392714Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2392789Z import triton 2023-01-11T21:38:06.2392881Z import triton.language as tl 2023-01-11T21:38:06.2392993Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2393095Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2393224Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2393346Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2393351Z 2023-01-11T21:38:06.2393760Z @pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2393831Z @triton.jit 2023-01-11T21:38:06.2393961Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2394034Z xnumel = 746496 2023-01-11T21:38:06.2394129Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2394294Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2394378Z xmask = xindex < xnumel 2023-01-11T21:38:06.2394453Z x0 = xindex % 27 2023-01-11T21:38:06.2394529Z x1 = (xindex // 27) % 27 2023-01-11T21:38:06.2394608Z x2 = (xindex // 729) 2023-01-11T21:38:06.2394678Z x3 = xindex 2023-01-11T21:38:06.2394812Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2394946Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2395078Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2395211Z tmp5 = tl.load(in_ptr0 + (55 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2395346Z tmp7 = tl.load(in_ptr0 + (56 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2395469Z tmp9 = tl.load(in_ptr0 + (57 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2395602Z tmp11 = tl.load(in_ptr0 + (110 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2395732Z tmp13 = tl.load(in_ptr0 + (111 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2395856Z tmp15 = tl.load(in_ptr0 + (112 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2395934Z tmp2 = tmp1 + tmp0 2023-01-11T21:38:06.2396013Z tmp4 = tmp3 + tmp2 2023-01-11T21:38:06.2396089Z tmp6 = tmp5 + tmp4 2023-01-11T21:38:06.2396159Z tmp8 = tmp7 + tmp6 2023-01-11T21:38:06.2396237Z tmp10 = tmp9 + tmp8 2023-01-11T21:38:06.2396315Z tmp12 = tmp11 + tmp10 2023-01-11T21:38:06.2396395Z tmp14 = tmp13 + tmp12 2023-01-11T21:38:06.2396501Z tmp16 = tmp15 + tmp14 2023-01-11T21:38:06.2396579Z tmp17 = 0.1111111111111111 2023-01-11T21:38:06.2396658Z tmp18 = tmp16 * tmp17 
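# [Editor's note, added commentary -- not in the log] Unlike the fp32 variant
# above, every load in this kernel is upcast via .to(tl.float32), so the
# nine-tap sum and the 1/9 scaling accumulate in fp32; the store below writes
# the result back through the *fp16 out_ptr0.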
2023-01-11T21:38:06.2396787Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.2396874Z ''') 2023-01-11T21:38:06.2396880Z 2023-01-11T21:38:06.2396887Z 2023-01-11T21:38:06.2396977Z async_compile.wait(globals()) 2023-01-11T21:38:06.2397055Z del async_compile 2023-01-11T21:38:06.2397061Z 2023-01-11T21:38:06.2397135Z def call(args): 2023-01-11T21:38:06.2397206Z arg0_1, = args 2023-01-11T21:38:06.2397279Z args.clear() 2023-01-11T21:38:06.2397365Z with torch.cuda.device(0): 2023-01-11T21:38:06.2397591Z buf0 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2397682Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2397829Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 746496, grid=grid(746496), stream=stream0) 2023-01-11T21:38:06.2397902Z del arg0_1 2023-01-11T21:38:06.2397979Z return (buf0, ) 2023-01-11T21:38:06.2397984Z 2023-01-11T21:38:06.2397989Z 2023-01-11T21:38:06.2398069Z if __name__ == "__main__": 2023-01-11T21:38:06.2398185Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2398308Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2398540Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2398653Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2398658Z 2023-01-11T21:38:06.2398726Z ok (0.334s) 2023-01-11T21:38:06.2399183Z test_avg_pool2d3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2399314Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2399570Z [2023-01-11 21:34:04,405] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 413 2023-01-11T21:38:06.2399859Z [2023-01-11 21:34:04,576] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 413 2023-01-11T21:38:06.2400276Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2400406Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2400663Z [2023-01-11 21:34:04,594] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 414 2023-01-11T21:38:06.2400672Z 2023-01-11T21:38:06.2400763Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2400840Z import torch 2023-01-11T21:38:06.2400912Z import random 2023-01-11T21:38:06.2401028Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2401154Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2401160Z 2023-01-11T21:38:06.2401241Z aten = torch.ops.aten 2023-01-11T21:38:06.2401377Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2401474Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2401480Z 2023-01-11T21:38:06.2401547Z import triton 2023-01-11T21:38:06.2401636Z import triton.language as tl 2023-01-11T21:38:06.2401759Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2401904Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2401909Z 2023-01-11T21:38:06.2401913Z 2023-01-11T21:38:06.2402106Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2402181Z import triton 2023-01-11T21:38:06.2402271Z import triton.language as tl 2023-01-11T21:38:06.2402378Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2402480Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2402615Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2402740Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2402745Z 2023-01-11T21:38:06.2403148Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2403221Z @triton.jit 2023-01-11T21:38:06.2403351Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2403424Z xnumel = 16 2023-01-11T21:38:06.2403514Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2403644Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2403727Z xmask = xindex < xnumel 2023-01-11T21:38:06.2403805Z x1 = (xindex // 4) 2023-01-11T21:38:06.2403878Z x0 = xindex % 4 2023-01-11T21:38:06.2403950Z x2 = xindex 2023-01-11T21:38:06.2404058Z tmp0 = (-1) + (2*x1) 2023-01-11T21:38:06.2404124Z tmp1 = 0 2023-01-11T21:38:06.2404203Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.2404271Z tmp3 = 8 2023-01-11T21:38:06.2404351Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.2404427Z tmp5 = tmp2 & tmp4 2023-01-11T21:38:06.2404533Z tmp6 = (-1) + (2*x0) 2023-01-11T21:38:06.2404611Z tmp7 = tmp6 >= tmp1 2023-01-11T21:38:06.2404681Z tmp8 = tmp6 < tmp3 2023-01-11T21:38:06.2404757Z tmp9 = tmp7 & tmp8 2023-01-11T21:38:06.2404834Z tmp10 = tmp5 & tmp9 2023-01-11T21:38:06.2405070Z tmp11 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0) 2023-01-11T21:38:06.2405169Z tmp12 = tl.where(tmp10, tmp11, 0.0) 2023-01-11T21:38:06.2405240Z tmp13 = 2*x0 2023-01-11T21:38:06.2405319Z tmp14 = tmp13 >= tmp1 2023-01-11T21:38:06.2405392Z tmp15 = tmp13 < tmp3 2023-01-11T21:38:06.2405471Z tmp16 = tmp14 & tmp15 2023-01-11T21:38:06.2405550Z tmp17 = tmp5 & tmp16 
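# [Editor's note, added commentary -- not in the log] This padded variant
# (8x8 input -> 4x4 output, padding=1) guards every tap: tmp2/tmp4 bound the
# row index -1 + 2*x1 to [0, 8), the column predicates do the same for
# -1 + 2*x0, and each tl.load applies the combined mask with other=0 plus a
# tl.where to zero out-of-bounds taps. The sum is still scaled by 1/9
# regardless of how many taps were in bounds, matching count_include_pad=True.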
2023-01-11T21:38:06.2405823Z tmp18 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0) 2023-01-11T21:38:06.2405921Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.2406001Z tmp20 = tmp19 + tmp12 2023-01-11T21:38:06.2406075Z tmp21 = 1 + (2*x0) 2023-01-11T21:38:06.2406149Z tmp22 = tmp21 >= tmp1 2023-01-11T21:38:06.2406229Z tmp23 = tmp21 < tmp3 2023-01-11T21:38:06.2406310Z tmp24 = tmp22 & tmp23 2023-01-11T21:38:06.2406389Z tmp25 = tmp5 & tmp24 2023-01-11T21:38:06.2406626Z tmp26 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, other=0) 2023-01-11T21:38:06.2406725Z tmp27 = tl.where(tmp25, tmp26, 0.0) 2023-01-11T21:38:06.2406805Z tmp28 = tmp27 + tmp20 2023-01-11T21:38:06.2406871Z tmp29 = 2*x1 2023-01-11T21:38:06.2406950Z tmp30 = tmp29 >= tmp1 2023-01-11T21:38:06.2407027Z tmp31 = tmp29 < tmp3 2023-01-11T21:38:06.2407107Z tmp32 = tmp30 & tmp31 2023-01-11T21:38:06.2407184Z tmp33 = tmp32 & tmp9 2023-01-11T21:38:06.2407420Z tmp34 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, other=0) 2023-01-11T21:38:06.2407515Z tmp35 = tl.where(tmp33, tmp34, 0.0) 2023-01-11T21:38:06.2407589Z tmp36 = tmp35 + tmp28 2023-01-11T21:38:06.2407667Z tmp37 = tmp32 & tmp16 2023-01-11T21:38:06.2407824Z tmp38 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, other=0) 2023-01-11T21:38:06.2407918Z tmp39 = tl.where(tmp37, tmp38, 0.0) 2023-01-11T21:38:06.2407998Z tmp40 = tmp39 + tmp36 2023-01-11T21:38:06.2408076Z tmp41 = tmp32 & tmp24 2023-01-11T21:38:06.2408231Z tmp42 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0) 2023-01-11T21:38:06.2408359Z tmp43 = tl.where(tmp41, tmp42, 0.0) 2023-01-11T21:38:06.2408438Z tmp44 = tmp43 + tmp40 2023-01-11T21:38:06.2408511Z tmp45 = 1 + (2*x1) 2023-01-11T21:38:06.2408590Z tmp46 = tmp45 >= tmp1 2023-01-11T21:38:06.2408673Z tmp47 = tmp45 < tmp3 2023-01-11T21:38:06.2408752Z tmp48 = tmp46 & tmp47 2023-01-11T21:38:06.2408829Z tmp49 = tmp48 & tmp9 2023-01-11T21:38:06.2408979Z tmp50 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, other=0) 2023-01-11T21:38:06.2409072Z tmp51 = tl.where(tmp49, tmp50, 0.0) 2023-01-11T21:38:06.2409150Z tmp52 = tmp51 + tmp44 2023-01-11T21:38:06.2409229Z tmp53 = tmp48 & tmp16 2023-01-11T21:38:06.2409381Z tmp54 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, other=0) 2023-01-11T21:38:06.2409475Z tmp55 = tl.where(tmp53, tmp54, 0.0) 2023-01-11T21:38:06.2409556Z tmp56 = tmp55 + tmp52 2023-01-11T21:38:06.2409628Z tmp57 = tmp48 & tmp24 2023-01-11T21:38:06.2409783Z tmp58 = tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, other=0) 2023-01-11T21:38:06.2409876Z tmp59 = tl.where(tmp57, tmp58, 0.0) 2023-01-11T21:38:06.2409959Z tmp60 = tmp59 + tmp56 2023-01-11T21:38:06.2410038Z tmp61 = 0.1111111111111111 2023-01-11T21:38:06.2410116Z tmp62 = tmp60 * tmp61 2023-01-11T21:38:06.2410257Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp62, xmask) 2023-01-11T21:38:06.2410337Z ''') 2023-01-11T21:38:06.2410342Z 2023-01-11T21:38:06.2410354Z 2023-01-11T21:38:06.2410440Z async_compile.wait(globals()) 2023-01-11T21:38:06.2410517Z del async_compile 2023-01-11T21:38:06.2410522Z 2023-01-11T21:38:06.2410599Z def call(args): 2023-01-11T21:38:06.2410672Z arg0_1, = args 2023-01-11T21:38:06.2410746Z args.clear() 2023-01-11T21:38:06.2410838Z with 
torch.cuda.device(0): 2023-01-11T21:38:06.2411055Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2411142Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2411304Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.2411380Z del arg0_1 2023-01-11T21:38:06.2411485Z return (buf0, ) 2023-01-11T21:38:06.2411490Z 2023-01-11T21:38:06.2411495Z 2023-01-11T21:38:06.2411576Z if __name__ == "__main__": 2023-01-11T21:38:06.2411696Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2411825Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2412059Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2412183Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2412188Z 2023-01-11T21:38:06.2412476Z [2023-01-11 21:34:04,749] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 414 2023-01-11T21:38:06.2412485Z 2023-01-11T21:38:06.2412581Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2412655Z import torch 2023-01-11T21:38:06.2412731Z import random 2023-01-11T21:38:06.2412851Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2412975Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2412980Z 2023-01-11T21:38:06.2413068Z aten = torch.ops.aten 2023-01-11T21:38:06.2413198Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2413299Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2413304Z 2023-01-11T21:38:06.2413376Z import triton 2023-01-11T21:38:06.2413468Z import triton.language as tl 2023-01-11T21:38:06.2413598Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2413739Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2413745Z 2023-01-11T21:38:06.2413749Z 2023-01-11T21:38:06.2413913Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2414021Z import triton 2023-01-11T21:38:06.2414105Z import triton.language as tl 2023-01-11T21:38:06.2414218Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2414317Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2414453Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2414706Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2414712Z 2023-01-11T21:38:06.2415125Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2415200Z @triton.jit 2023-01-11T21:38:06.2415333Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2415402Z xnumel = 16 2023-01-11T21:38:06.2415508Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2415657Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2415764Z xmask = xindex < xnumel 2023-01-11T21:38:06.2415852Z x1 = (xindex // 4) 2023-01-11T21:38:06.2415924Z x0 = xindex % 4 2023-01-11T21:38:06.2415997Z x2 = xindex 2023-01-11T21:38:06.2416100Z tmp0 = (-1) + (2*x1) 2023-01-11T21:38:06.2416173Z tmp1 = 0 2023-01-11T21:38:06.2416259Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.2416332Z tmp3 = 8 2023-01-11T21:38:06.2416409Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.2416491Z tmp5 = tmp2 & tmp4 
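# [Editor's note, added commentary -- not in the log] Same masking scheme as
# the fp32 kernel above: tmp5 is the row-validity predicate for -1 + 2*x1;
# the matching column predicate is built next and AND-ed per tap, with the
# fp16 loads additionally upcast to fp32 before summing.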
2023-01-11T21:38:06.2416593Z tmp6 = (-1) + (2*x0) 2023-01-11T21:38:06.2416674Z tmp7 = tmp6 >= tmp1 2023-01-11T21:38:06.2416751Z tmp8 = tmp6 < tmp3 2023-01-11T21:38:06.2416835Z tmp9 = tmp7 & tmp8 2023-01-11T21:38:06.2416915Z tmp10 = tmp5 & tmp9 2023-01-11T21:38:06.2417261Z tmp11 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2417417Z tmp12 = tl.where(tmp10, tmp11, 0.0) 2023-01-11T21:38:06.2417487Z tmp13 = 2*x0 2023-01-11T21:38:06.2417567Z tmp14 = tmp13 >= tmp1 2023-01-11T21:38:06.2417648Z tmp15 = tmp13 < tmp3 2023-01-11T21:38:06.2417729Z tmp16 = tmp14 & tmp15 2023-01-11T21:38:06.2417808Z tmp17 = tmp5 & tmp16 2023-01-11T21:38:06.2418127Z tmp18 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2418226Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.2418300Z tmp20 = tmp19 + tmp12 2023-01-11T21:38:06.2418375Z tmp21 = 1 + (2*x0) 2023-01-11T21:38:06.2418454Z tmp22 = tmp21 >= tmp1 2023-01-11T21:38:06.2418532Z tmp23 = tmp21 < tmp3 2023-01-11T21:38:06.2418610Z tmp24 = tmp22 & tmp23 2023-01-11T21:38:06.2418687Z tmp25 = tmp5 & tmp24 2023-01-11T21:38:06.2418948Z tmp26 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2419040Z tmp27 = tl.where(tmp25, tmp26, 0.0) 2023-01-11T21:38:06.2419121Z tmp28 = tmp27 + tmp20 2023-01-11T21:38:06.2419195Z tmp29 = 2*x1 2023-01-11T21:38:06.2419274Z tmp30 = tmp29 >= tmp1 2023-01-11T21:38:06.2419351Z tmp31 = tmp29 < tmp3 2023-01-11T21:38:06.2419429Z tmp32 = tmp30 & tmp31 2023-01-11T21:38:06.2419507Z tmp33 = tmp32 & tmp9 2023-01-11T21:38:06.2419756Z tmp34 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2419853Z tmp35 = tl.where(tmp33, tmp34, 0.0) 2023-01-11T21:38:06.2419931Z tmp36 = tmp35 + tmp28 2023-01-11T21:38:06.2420008Z tmp37 = tmp32 & tmp16 2023-01-11T21:38:06.2420182Z tmp38 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2420277Z tmp39 = tl.where(tmp37, tmp38, 0.0) 2023-01-11T21:38:06.2420355Z tmp40 = tmp39 + tmp36 2023-01-11T21:38:06.2420427Z tmp41 = tmp32 & tmp24 2023-01-11T21:38:06.2420632Z tmp42 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2420733Z tmp43 = tl.where(tmp41, tmp42, 0.0) 2023-01-11T21:38:06.2420812Z tmp44 = tmp43 + tmp40 2023-01-11T21:38:06.2420887Z tmp45 = 1 + (2*x1) 2023-01-11T21:38:06.2420967Z tmp46 = tmp45 >= tmp1 2023-01-11T21:38:06.2421047Z tmp47 = tmp45 < tmp3 2023-01-11T21:38:06.2421120Z tmp48 = tmp46 & tmp47 2023-01-11T21:38:06.2421198Z tmp49 = tmp48 & tmp9 2023-01-11T21:38:06.2421369Z tmp50 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2421464Z tmp51 = tl.where(tmp49, tmp50, 0.0) 2023-01-11T21:38:06.2421543Z tmp52 = tmp51 + tmp44 2023-01-11T21:38:06.2421621Z tmp53 = tmp48 & tmp16 2023-01-11T21:38:06.2421787Z tmp54 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2421884Z tmp55 = tl.where(tmp53, tmp54, 0.0) 2023-01-11T21:38:06.2421956Z tmp56 = tmp55 + tmp52 2023-01-11T21:38:06.2422035Z tmp57 = tmp48 & tmp24 2023-01-11T21:38:06.2422206Z tmp58 = 
tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2422303Z tmp59 = tl.where(tmp57, tmp58, 0.0) 2023-01-11T21:38:06.2422381Z tmp60 = tmp59 + tmp56 2023-01-11T21:38:06.2422460Z tmp61 = 0.1111111111111111 2023-01-11T21:38:06.2422540Z tmp62 = tmp60 * tmp61 2023-01-11T21:38:06.2422670Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp62, xmask) 2023-01-11T21:38:06.2422755Z ''') 2023-01-11T21:38:06.2422762Z 2023-01-11T21:38:06.2422767Z 2023-01-11T21:38:06.2422860Z async_compile.wait(globals()) 2023-01-11T21:38:06.2422937Z del async_compile 2023-01-11T21:38:06.2422942Z 2023-01-11T21:38:06.2423015Z def call(args): 2023-01-11T21:38:06.2423088Z arg0_1, = args 2023-01-11T21:38:06.2423165Z args.clear() 2023-01-11T21:38:06.2423257Z with torch.cuda.device(0): 2023-01-11T21:38:06.2423469Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2423562Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2423733Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.2423813Z del arg0_1 2023-01-11T21:38:06.2423890Z return (buf0, ) 2023-01-11T21:38:06.2423895Z 2023-01-11T21:38:06.2423899Z 2023-01-11T21:38:06.2423981Z if __name__ == "__main__": 2023-01-11T21:38:06.2424101Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2424221Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2424438Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2424549Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2424554Z 2023-01-11T21:38:06.2424626Z ok (0.363s) 2023-01-11T21:38:06.2425088Z test_avg_pool2d4_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2425224Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2425482Z [2023-01-11 21:34:04,769] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 415 2023-01-11T21:38:06.2425746Z [2023-01-11 21:34:04,918] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 415 2023-01-11T21:38:06.2426160Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2426319Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2426576Z [2023-01-11 21:34:04,935] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 416 2023-01-11T21:38:06.2426836Z [2023-01-11 21:34:05,067] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 416 2023-01-11T21:38:06.2426850Z 2023-01-11T21:38:06.2426942Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2427016Z import torch 2023-01-11T21:38:06.2427090Z import random 2023-01-11T21:38:06.2427210Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2427334Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2427339Z 2023-01-11T21:38:06.2427422Z aten = torch.ops.aten 2023-01-11T21:38:06.2427560Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2427649Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2427654Z 2023-01-11T21:38:06.2427729Z import triton 2023-01-11T21:38:06.2427821Z import triton.language as tl 2023-01-11T21:38:06.2427945Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2428086Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2428092Z 2023-01-11T21:38:06.2428096Z 2023-01-11T21:38:06.2428263Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2428338Z import triton 2023-01-11T21:38:06.2428431Z import triton.language as tl 2023-01-11T21:38:06.2428538Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2428640Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2428776Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2428903Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2428911Z 2023-01-11T21:38:06.2429316Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2429388Z @triton.jit 2023-01-11T21:38:06.2429554Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2429630Z xnumel = 48400 2023-01-11T21:38:06.2429722Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2429854Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2429939Z xmask = xindex < xnumel 2023-01-11T21:38:06.2430016Z x0 = xindex % 55 2023-01-11T21:38:06.2430098Z x1 = (xindex // 55) % 55 2023-01-11T21:38:06.2430177Z x2 = (xindex // 3025) 2023-01-11T21:38:06.2430240Z x3 = xindex 2023-01-11T21:38:06.2430361Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2430485Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2430611Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2430730Z tmp5 = tl.load(in_ptr0 + (111 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2430852Z tmp7 = tl.load(in_ptr0 + (112 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2430972Z tmp9 = tl.load(in_ptr0 + (113 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2431094Z tmp11 = tl.load(in_ptr0 + (222 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2431208Z tmp13 = tl.load(in_ptr0 + (223 + (2*x0) 
+ (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2431324Z tmp15 = tl.load(in_ptr0 + (224 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2431403Z tmp2 = tmp1 + tmp0 2023-01-11T21:38:06.2431481Z tmp4 = tmp3 + tmp2 2023-01-11T21:38:06.2431558Z tmp6 = tmp5 + tmp4 2023-01-11T21:38:06.2431636Z tmp8 = tmp7 + tmp6 2023-01-11T21:38:06.2431747Z tmp10 = tmp9 + tmp8 2023-01-11T21:38:06.2431822Z tmp12 = tmp11 + tmp10 2023-01-11T21:38:06.2431903Z tmp14 = tmp13 + tmp12 2023-01-11T21:38:06.2431983Z tmp16 = tmp15 + tmp14 2023-01-11T21:38:06.2432064Z tmp17 = 0.1111111111111111 2023-01-11T21:38:06.2432141Z tmp18 = tmp16 * tmp17 2023-01-11T21:38:06.2432282Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.2432369Z ''') 2023-01-11T21:38:06.2432375Z 2023-01-11T21:38:06.2432380Z 2023-01-11T21:38:06.2432472Z async_compile.wait(globals()) 2023-01-11T21:38:06.2432542Z del async_compile 2023-01-11T21:38:06.2432547Z 2023-01-11T21:38:06.2432621Z def call(args): 2023-01-11T21:38:06.2432691Z arg0_1, = args 2023-01-11T21:38:06.2432765Z args.clear() 2023-01-11T21:38:06.2432858Z with torch.cuda.device(0): 2023-01-11T21:38:06.2433084Z buf0 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2433177Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2433322Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 48400, grid=grid(48400), stream=stream0) 2023-01-11T21:38:06.2433394Z del arg0_1 2023-01-11T21:38:06.2433470Z return (buf0, ) 2023-01-11T21:38:06.2433476Z 2023-01-11T21:38:06.2433480Z 2023-01-11T21:38:06.2433560Z if __name__ == "__main__": 2023-01-11T21:38:06.2433683Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2433810Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2434045Z arg0_1 = rand_strided((2, 8, 111, 111), (98568, 12321, 111, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2434160Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2434165Z 2023-01-11T21:38:06.2434170Z 2023-01-11T21:38:06.2434260Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2434337Z import torch 2023-01-11T21:38:06.2434409Z import random 2023-01-11T21:38:06.2434530Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2434657Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2434663Z 2023-01-11T21:38:06.2434746Z aten = torch.ops.aten 2023-01-11T21:38:06.2434882Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2434978Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2434983Z 2023-01-11T21:38:06.2435079Z import triton 2023-01-11T21:38:06.2435172Z import triton.language as tl 2023-01-11T21:38:06.2435297Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2435439Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2435445Z 2023-01-11T21:38:06.2435449Z 2023-01-11T21:38:06.2435617Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2435688Z import triton 2023-01-11T21:38:06.2435780Z import triton.language as tl 2023-01-11T21:38:06.2435888Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2435989Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2436125Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2436250Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2436255Z 2023-01-11T21:38:06.2436662Z 
@pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2436736Z @triton.jit 2023-01-11T21:38:06.2436869Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2436943Z xnumel = 48400 2023-01-11T21:38:06.2437034Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2437164Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2437247Z xmask = xindex < xnumel 2023-01-11T21:38:06.2437321Z x0 = xindex % 55 2023-01-11T21:38:06.2437402Z x1 = (xindex // 55) % 55 2023-01-11T21:38:06.2437484Z x2 = (xindex // 3025) 2023-01-11T21:38:06.2437587Z x3 = xindex 2023-01-11T21:38:06.2437717Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2437852Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2437989Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2438130Z tmp5 = tl.load(in_ptr0 + (111 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2438264Z tmp7 = tl.load(in_ptr0 + (112 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2438394Z tmp9 = tl.load(in_ptr0 + (113 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2438529Z tmp11 = tl.load(in_ptr0 + (222 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2438664Z tmp13 = tl.load(in_ptr0 + (223 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2438796Z tmp15 = tl.load(in_ptr0 + (224 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2438871Z tmp2 = tmp1 + tmp0 2023-01-11T21:38:06.2438948Z tmp4 = tmp3 + tmp2 2023-01-11T21:38:06.2439025Z tmp6 = tmp5 + tmp4 2023-01-11T21:38:06.2439103Z tmp8 = tmp7 + tmp6 2023-01-11T21:38:06.2439179Z tmp10 = tmp9 + tmp8 2023-01-11T21:38:06.2439260Z tmp12 = tmp11 + tmp10 2023-01-11T21:38:06.2439334Z tmp14 = tmp13 + tmp12 2023-01-11T21:38:06.2439414Z tmp16 = tmp15 + tmp14 2023-01-11T21:38:06.2439495Z tmp17 = 0.1111111111111111 2023-01-11T21:38:06.2439573Z tmp18 = tmp16 * tmp17 2023-01-11T21:38:06.2439713Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.2439798Z ''') 2023-01-11T21:38:06.2439803Z 2023-01-11T21:38:06.2439808Z 2023-01-11T21:38:06.2439904Z async_compile.wait(globals()) 2023-01-11T21:38:06.2439980Z del async_compile 2023-01-11T21:38:06.2439985Z 2023-01-11T21:38:06.2440053Z def call(args): 2023-01-11T21:38:06.2440130Z arg0_1, = args 2023-01-11T21:38:06.2440204Z args.clear() 2023-01-11T21:38:06.2440297Z with torch.cuda.device(0): 2023-01-11T21:38:06.2440527Z buf0 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2440621Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2440800Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 48400, grid=grid(48400), stream=stream0) 2023-01-11T21:38:06.2440868Z del arg0_1 2023-01-11T21:38:06.2440944Z return (buf0, ) 2023-01-11T21:38:06.2440949Z 2023-01-11T21:38:06.2440953Z 2023-01-11T21:38:06.2441034Z if __name__ == "__main__": 2023-01-11T21:38:06.2441152Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2441279Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2441518Z 
arg0_1 = rand_strided((2, 8, 111, 111), (98568, 12321, 111, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2441631Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2441640Z 2023-01-11T21:38:06.2441711Z ok (0.318s) 2023-01-11T21:38:06.2442174Z test_avg_pool2d5_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2442301Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2442562Z [2023-01-11 21:34:05,235] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 417 2023-01-11T21:38:06.2442825Z [2023-01-11 21:34:05,433] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 417 2023-01-11T21:38:06.2443242Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2443405Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2443663Z [2023-01-11 21:34:05,451] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 418 2023-01-11T21:38:06.2443668Z 2023-01-11T21:38:06.2443766Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2443844Z import torch 2023-01-11T21:38:06.2443918Z import random 2023-01-11T21:38:06.2444036Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2444154Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2444159Z 2023-01-11T21:38:06.2444238Z aten = torch.ops.aten 2023-01-11T21:38:06.2444379Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2444479Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2444485Z 2023-01-11T21:38:06.2444558Z import triton 2023-01-11T21:38:06.2444653Z import triton.language as tl 2023-01-11T21:38:06.2444778Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2444915Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2444927Z 2023-01-11T21:38:06.2444931Z 2023-01-11T21:38:06.2445092Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2445165Z import triton 2023-01-11T21:38:06.2445257Z import triton.language as tl 2023-01-11T21:38:06.2445370Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2445471Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2445603Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2445729Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2445734Z 2023-01-11T21:38:06.2446136Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2446211Z @triton.jit 2023-01-11T21:38:06.2446342Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
2023-01-11T21:38:06.2446446Z xnumel = 16 2023-01-11T21:38:06.2446545Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2446675Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2446758Z xmask = xindex < xnumel 2023-01-11T21:38:06.2446835Z x1 = (xindex // 4) 2023-01-11T21:38:06.2446902Z x0 = xindex % 4 2023-01-11T21:38:06.2446971Z x2 = xindex 2023-01-11T21:38:06.2447083Z tmp0 = (-1) + (2*x1) 2023-01-11T21:38:06.2447154Z tmp1 = 0 2023-01-11T21:38:06.2447234Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.2447301Z tmp3 = 8 2023-01-11T21:38:06.2447372Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.2447448Z tmp5 = tmp2 & tmp4 2023-01-11T21:38:06.2447561Z tmp6 = (-1) + (2*x0) 2023-01-11T21:38:06.2447638Z tmp7 = tmp6 >= tmp1 2023-01-11T21:38:06.2447715Z tmp8 = tmp6 < tmp3 2023-01-11T21:38:06.2447790Z tmp9 = tmp7 & tmp8 2023-01-11T21:38:06.2447867Z tmp10 = tmp5 & tmp9 2023-01-11T21:38:06.2448106Z tmp11 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0) 2023-01-11T21:38:06.2448203Z tmp12 = tl.where(tmp10, tmp11, 0.0) 2023-01-11T21:38:06.2448277Z tmp13 = 2*x0 2023-01-11T21:38:06.2448357Z tmp14 = tmp13 >= tmp1 2023-01-11T21:38:06.2448437Z tmp15 = tmp13 < tmp3 2023-01-11T21:38:06.2448519Z tmp16 = tmp14 & tmp15 2023-01-11T21:38:06.2448597Z tmp17 = tmp5 & tmp16 2023-01-11T21:38:06.2448827Z tmp18 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0) 2023-01-11T21:38:06.2448923Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.2449003Z tmp20 = tmp19 + tmp12 2023-01-11T21:38:06.2449119Z tmp21 = 1 + (2*x0) 2023-01-11T21:38:06.2449197Z tmp22 = tmp21 >= tmp1 2023-01-11T21:38:06.2449275Z tmp23 = tmp21 < tmp3 2023-01-11T21:38:06.2449353Z tmp24 = tmp22 & tmp23 2023-01-11T21:38:06.2449425Z tmp25 = tmp5 & tmp24 2023-01-11T21:38:06.2449667Z tmp26 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, other=0) 2023-01-11T21:38:06.2449762Z tmp27 = tl.where(tmp25, tmp26, 0.0) 2023-01-11T21:38:06.2449840Z tmp28 = tmp27 + tmp20 2023-01-11T21:38:06.2449913Z tmp29 = 2*x1 2023-01-11T21:38:06.2449996Z tmp30 = tmp29 >= tmp1 2023-01-11T21:38:06.2450074Z tmp31 = tmp29 < tmp3 2023-01-11T21:38:06.2450146Z tmp32 = tmp30 & tmp31 2023-01-11T21:38:06.2450223Z tmp33 = tmp32 & tmp9 2023-01-11T21:38:06.2450456Z tmp34 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, other=0) 2023-01-11T21:38:06.2450550Z tmp35 = tl.where(tmp33, tmp34, 0.0) 2023-01-11T21:38:06.2450633Z tmp36 = tmp35 + tmp28 2023-01-11T21:38:06.2450710Z tmp37 = tmp32 & tmp16 2023-01-11T21:38:06.2450866Z tmp38 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, other=0) 2023-01-11T21:38:06.2450954Z tmp39 = tl.where(tmp37, tmp38, 0.0) 2023-01-11T21:38:06.2451033Z tmp40 = tmp39 + tmp36 2023-01-11T21:38:06.2451114Z tmp41 = tmp32 & tmp24 2023-01-11T21:38:06.2451268Z tmp42 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0) 2023-01-11T21:38:06.2451365Z tmp43 = tl.where(tmp41, tmp42, 0.0) 2023-01-11T21:38:06.2451441Z tmp44 = tmp43 + tmp40 2023-01-11T21:38:06.2451517Z tmp45 = 1 + (2*x1) 2023-01-11T21:38:06.2451589Z tmp46 = tmp45 >= tmp1 2023-01-11T21:38:06.2451669Z tmp47 = tmp45 < tmp3 2023-01-11T21:38:06.2451747Z tmp48 = tmp46 & tmp47 2023-01-11T21:38:06.2451829Z tmp49 = tmp48 & tmp9 2023-01-11T21:38:06.2451987Z tmp50 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], 
tl.int32)), tmp49 & xmask, other=0) 2023-01-11T21:38:06.2452085Z tmp51 = tl.where(tmp49, tmp50, 0.0) 2023-01-11T21:38:06.2452164Z tmp52 = tmp51 + tmp44 2023-01-11T21:38:06.2452236Z tmp53 = tmp48 & tmp16 2023-01-11T21:38:06.2452422Z tmp54 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, other=0) 2023-01-11T21:38:06.2452519Z tmp55 = tl.where(tmp53, tmp54, 0.0) 2023-01-11T21:38:06.2452600Z tmp56 = tmp55 + tmp52 2023-01-11T21:38:06.2452679Z tmp57 = tmp48 & tmp24 2023-01-11T21:38:06.2452837Z tmp58 = tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, other=0) 2023-01-11T21:38:06.2452933Z tmp59 = tl.where(tmp57, tmp58, 0.0) 2023-01-11T21:38:06.2453006Z tmp60 = tmp59 + tmp56 2023-01-11T21:38:06.2453077Z tmp61 = 1 2023-01-11T21:38:06.2453170Z tmp62 = tl.where(tmp10, tmp61, 0.0) 2023-01-11T21:38:06.2453239Z tmp63 = 1 2023-01-11T21:38:06.2453333Z tmp64 = tl.where(tmp17, tmp63, 0.0) 2023-01-11T21:38:06.2453417Z tmp65 = tmp64 + tmp62 2023-01-11T21:38:06.2453488Z tmp66 = 1 2023-01-11T21:38:06.2453574Z tmp67 = tl.where(tmp25, tmp66, 0.0) 2023-01-11T21:38:06.2453654Z tmp68 = tmp67 + tmp65 2023-01-11T21:38:06.2453726Z tmp69 = 1 2023-01-11T21:38:06.2453816Z tmp70 = tl.where(tmp33, tmp69, 0.0) 2023-01-11T21:38:06.2453896Z tmp71 = tmp70 + tmp68 2023-01-11T21:38:06.2453968Z tmp72 = 1 2023-01-11T21:38:06.2454061Z tmp73 = tl.where(tmp37, tmp72, 0.0) 2023-01-11T21:38:06.2454134Z tmp74 = tmp73 + tmp71 2023-01-11T21:38:06.2454205Z tmp75 = 1 2023-01-11T21:38:06.2454298Z tmp76 = tl.where(tmp41, tmp75, 0.0) 2023-01-11T21:38:06.2454377Z tmp77 = tmp76 + tmp74 2023-01-11T21:38:06.2454447Z tmp78 = 1 2023-01-11T21:38:06.2454654Z tmp79 = tl.where(tmp49, tmp78, 0.0) 2023-01-11T21:38:06.2454733Z tmp80 = tmp79 + tmp77 2023-01-11T21:38:06.2454797Z tmp81 = 1 2023-01-11T21:38:06.2454890Z tmp82 = tl.where(tmp53, tmp81, 0.0) 2023-01-11T21:38:06.2455013Z tmp83 = tmp82 + tmp80 2023-01-11T21:38:06.2455083Z tmp84 = 1 2023-01-11T21:38:06.2455175Z tmp85 = tl.where(tmp57, tmp84, 0.0) 2023-01-11T21:38:06.2455252Z tmp86 = tmp85 + tmp83 2023-01-11T21:38:06.2455324Z tmp87 = tmp60 / tmp86 2023-01-11T21:38:06.2455465Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp87, xmask) 2023-01-11T21:38:06.2455554Z ''') 2023-01-11T21:38:06.2455560Z 2023-01-11T21:38:06.2455565Z 2023-01-11T21:38:06.2455659Z async_compile.wait(globals()) 2023-01-11T21:38:06.2455737Z del async_compile 2023-01-11T21:38:06.2455742Z 2023-01-11T21:38:06.2455820Z def call(args): 2023-01-11T21:38:06.2455892Z arg0_1, = args 2023-01-11T21:38:06.2455967Z args.clear() 2023-01-11T21:38:06.2456054Z with torch.cuda.device(0): 2023-01-11T21:38:06.2456271Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2456361Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2456507Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.2456582Z del arg0_1 2023-01-11T21:38:06.2456660Z return (buf0, ) 2023-01-11T21:38:06.2456666Z 2023-01-11T21:38:06.2456671Z 2023-01-11T21:38:06.2456750Z if __name__ == "__main__": 2023-01-11T21:38:06.2456870Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2456994Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2457270Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2457406Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2457412Z 2023-01-11T21:38:06.2457713Z 
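[Editor's note, not part of the log] The test_avg_pool2d5 kernel dump above differs from the earlier avg_pool2d dumps: instead of multiplying the window sum by a constant 1/9 (the 0.1111111111111111 factor seen before), it divides by a per-window element count built from tl.where(mask, 1, 0) terms (tmp87 = tmp60 / tmp86). That is the Inductor codegen pattern one would expect for avg_pool2d with count_include_pad=False, where border windows divide by the number of valid taps. A minimal sketch of the kind of call that could produce such a kernel; the kernel_size=3, stride=2, padding=1 parameters are an assumption inferred from the generated index arithmetic ((-1) + (2*x1), nine taps, bounds checked against 0 and 8), and torch.compile assumes a PyTorch 2.x build:

import torch
import torch.nn.functional as F

def fn(x):
    # count_include_pad=False excludes padding from the averaging denominator,
    # so corner/edge windows divide by fewer than 9 elements
    return F.avg_pool2d(x, kernel_size=3, stride=2, padding=1,
                        count_include_pad=False)

compiled = torch.compile(fn)  # lowers through TorchInductor, emitting a Triton kernel
x = torch.randn(1, 1, 8, 8, device="cuda")
out = compiled(x)
print(out.shape)  # torch.Size([1, 1, 4, 4]), matching the buf0 shape in the dump

For comparison, the fp32/fp16 dumps further above that multiply by the constant 1/9 correspond to count_include_pad=True, where every window, padded or not, averages over the full 3x3 = 9 taps.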
[2023-01-11 21:34:05,651] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 418 2023-01-11T21:38:06.2457720Z 2023-01-11T21:38:06.2457833Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2457921Z import torch 2023-01-11T21:38:06.2458135Z import random 2023-01-11T21:38:06.2458266Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2458400Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2458405Z 2023-01-11T21:38:06.2458490Z aten = torch.ops.aten 2023-01-11T21:38:06.2458642Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2458783Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2458789Z 2023-01-11T21:38:06.2458865Z import triton 2023-01-11T21:38:06.2458961Z import triton.language as tl 2023-01-11T21:38:06.2459099Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2459247Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2459259Z 2023-01-11T21:38:06.2459263Z 2023-01-11T21:38:06.2459445Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2459520Z import triton 2023-01-11T21:38:06.2459615Z import triton.language as tl 2023-01-11T21:38:06.2459737Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2459848Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2459992Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2460128Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2460134Z 2023-01-11T21:38:06.2460606Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2460679Z @triton.jit 2023-01-11T21:38:06.2460814Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2460886Z xnumel = 16 2023-01-11T21:38:06.2460983Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2461113Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2461197Z xmask = xindex < xnumel 2023-01-11T21:38:06.2461277Z x1 = (xindex // 4) 2023-01-11T21:38:06.2461343Z x0 = xindex % 4 2023-01-11T21:38:06.2461442Z x2 = xindex 2023-01-11T21:38:06.2461555Z tmp0 = (-1) + (2*x1) 2023-01-11T21:38:06.2461627Z tmp1 = 0 2023-01-11T21:38:06.2461709Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.2461781Z tmp3 = 8 2023-01-11T21:38:06.2461855Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.2461933Z tmp5 = tmp2 & tmp4 2023-01-11T21:38:06.2462045Z tmp6 = (-1) + (2*x0) 2023-01-11T21:38:06.2462124Z tmp7 = tmp6 >= tmp1 2023-01-11T21:38:06.2462204Z tmp8 = tmp6 < tmp3 2023-01-11T21:38:06.2462285Z tmp9 = tmp7 & tmp8 2023-01-11T21:38:06.2462366Z tmp10 = tmp5 & tmp9 2023-01-11T21:38:06.2462627Z tmp11 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2462726Z tmp12 = tl.where(tmp10, tmp11, 0.0) 2023-01-11T21:38:06.2462800Z tmp13 = 2*x0 2023-01-11T21:38:06.2462887Z tmp14 = tmp13 >= tmp1 2023-01-11T21:38:06.2462969Z tmp15 = tmp13 < tmp3 2023-01-11T21:38:06.2463050Z tmp16 = tmp14 & tmp15 2023-01-11T21:38:06.2463134Z tmp17 = tmp5 & tmp16 2023-01-11T21:38:06.2463394Z tmp18 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2463491Z tmp19 = tl.where(tmp17, tmp18, 
0.0) 2023-01-11T21:38:06.2463575Z tmp20 = tmp19 + tmp12 2023-01-11T21:38:06.2463656Z tmp21 = 1 + (2*x0) 2023-01-11T21:38:06.2463738Z tmp22 = tmp21 >= tmp1 2023-01-11T21:38:06.2463819Z tmp23 = tmp21 < tmp3 2023-01-11T21:38:06.2463900Z tmp24 = tmp22 & tmp23 2023-01-11T21:38:06.2463975Z tmp25 = tmp5 & tmp24 2023-01-11T21:38:06.2464237Z tmp26 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2464335Z tmp27 = tl.where(tmp25, tmp26, 0.0) 2023-01-11T21:38:06.2464416Z tmp28 = tmp27 + tmp20 2023-01-11T21:38:06.2464493Z tmp29 = 2*x1 2023-01-11T21:38:06.2464572Z tmp30 = tmp29 >= tmp1 2023-01-11T21:38:06.2464657Z tmp31 = tmp29 < tmp3 2023-01-11T21:38:06.2464731Z tmp32 = tmp30 & tmp31 2023-01-11T21:38:06.2464814Z tmp33 = tmp32 & tmp9 2023-01-11T21:38:06.2465070Z tmp34 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2465170Z tmp35 = tl.where(tmp33, tmp34, 0.0) 2023-01-11T21:38:06.2465285Z tmp36 = tmp35 + tmp28 2023-01-11T21:38:06.2465368Z tmp37 = tmp32 & tmp16 2023-01-11T21:38:06.2465542Z tmp38 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2465632Z tmp39 = tl.where(tmp37, tmp38, 0.0) 2023-01-11T21:38:06.2465713Z tmp40 = tmp39 + tmp36 2023-01-11T21:38:06.2465795Z tmp41 = tmp32 & tmp24 2023-01-11T21:38:06.2465966Z tmp42 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2466063Z tmp43 = tl.where(tmp41, tmp42, 0.0) 2023-01-11T21:38:06.2466150Z tmp44 = tmp43 + tmp40 2023-01-11T21:38:06.2466228Z tmp45 = 1 + (2*x1) 2023-01-11T21:38:06.2466302Z tmp46 = tmp45 >= tmp1 2023-01-11T21:38:06.2466385Z tmp47 = tmp45 < tmp3 2023-01-11T21:38:06.2466467Z tmp48 = tmp46 & tmp47 2023-01-11T21:38:06.2466547Z tmp49 = tmp48 & tmp9 2023-01-11T21:38:06.2466722Z tmp50 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2466820Z tmp51 = tl.where(tmp49, tmp50, 0.0) 2023-01-11T21:38:06.2466899Z tmp52 = tmp51 + tmp44 2023-01-11T21:38:06.2466973Z tmp53 = tmp48 & tmp16 2023-01-11T21:38:06.2467144Z tmp54 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2467241Z tmp55 = tl.where(tmp53, tmp54, 0.0) 2023-01-11T21:38:06.2467324Z tmp56 = tmp55 + tmp52 2023-01-11T21:38:06.2467409Z tmp57 = tmp48 & tmp24 2023-01-11T21:38:06.2467580Z tmp58 = tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2467704Z tmp59 = tl.where(tmp57, tmp58, 0.0) 2023-01-11T21:38:06.2467786Z tmp60 = tmp59 + tmp56 2023-01-11T21:38:06.2467853Z tmp61 = 1 2023-01-11T21:38:06.2467950Z tmp62 = tl.where(tmp10, tmp61, 0.0) 2023-01-11T21:38:06.2468027Z tmp63 = 1 2023-01-11T21:38:06.2468123Z tmp64 = tl.where(tmp17, tmp63, 0.0) 2023-01-11T21:38:06.2468204Z tmp65 = tmp64 + tmp62 2023-01-11T21:38:06.2468281Z tmp66 = 1 2023-01-11T21:38:06.2468375Z tmp67 = tl.where(tmp25, tmp66, 0.0) 2023-01-11T21:38:06.2468450Z tmp68 = tmp67 + tmp65 2023-01-11T21:38:06.2468523Z tmp69 = 1 2023-01-11T21:38:06.2468617Z tmp70 = tl.where(tmp33, tmp69, 0.0) 2023-01-11T21:38:06.2468699Z tmp71 = tmp70 + tmp68 2023-01-11T21:38:06.2468772Z tmp72 = 1 2023-01-11T21:38:06.2468868Z tmp73 = tl.where(tmp37, tmp72, 0.0) 
2023-01-11T21:38:06.2468944Z tmp74 = tmp73 + tmp71 2023-01-11T21:38:06.2469019Z tmp75 = 1 2023-01-11T21:38:06.2469117Z tmp76 = tl.where(tmp41, tmp75, 0.0) 2023-01-11T21:38:06.2469198Z tmp77 = tmp76 + tmp74 2023-01-11T21:38:06.2469271Z tmp78 = 1 2023-01-11T21:38:06.2469367Z tmp79 = tl.where(tmp49, tmp78, 0.0) 2023-01-11T21:38:06.2469450Z tmp80 = tmp79 + tmp77 2023-01-11T21:38:06.2469519Z tmp81 = 1 2023-01-11T21:38:06.2469614Z tmp82 = tl.where(tmp53, tmp81, 0.0) 2023-01-11T21:38:06.2469695Z tmp83 = tmp82 + tmp80 2023-01-11T21:38:06.2469769Z tmp84 = 1 2023-01-11T21:38:06.2469863Z tmp85 = tl.where(tmp57, tmp84, 0.0) 2023-01-11T21:38:06.2469945Z tmp86 = tmp85 + tmp83 2023-01-11T21:38:06.2470028Z tmp87 = tmp60 / tmp86 2023-01-11T21:38:06.2470161Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp87, xmask) 2023-01-11T21:38:06.2470251Z ''') 2023-01-11T21:38:06.2470257Z 2023-01-11T21:38:06.2470261Z 2023-01-11T21:38:06.2470357Z async_compile.wait(globals()) 2023-01-11T21:38:06.2470435Z del async_compile 2023-01-11T21:38:06.2470443Z 2023-01-11T21:38:06.2470521Z def call(args): 2023-01-11T21:38:06.2470597Z arg0_1, = args 2023-01-11T21:38:06.2470673Z args.clear() 2023-01-11T21:38:06.2470763Z with torch.cuda.device(0): 2023-01-11T21:38:06.2470983Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2471105Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2471255Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.2471332Z del arg0_1 2023-01-11T21:38:06.2471412Z return (buf0, ) 2023-01-11T21:38:06.2471417Z 2023-01-11T21:38:06.2471422Z 2023-01-11T21:38:06.2471505Z if __name__ == "__main__": 2023-01-11T21:38:06.2471626Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2471749Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2471969Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2472087Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2472092Z 2023-01-11T21:38:06.2472165Z ok (0.585s) 2023-01-11T21:38:06.2472623Z test_avg_pool2d6_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2472760Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2473022Z [2023-01-11 21:34:05,670] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 419 2023-01-11T21:38:06.2473289Z [2023-01-11 21:34:05,820] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 419 2023-01-11T21:38:06.2473706Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2473871Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2474130Z [2023-01-11 21:34:05,838] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 420 2023-01-11T21:38:06.2474136Z 2023-01-11T21:38:06.2474230Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2474307Z import torch 2023-01-11T21:38:06.2474382Z import random 2023-01-11T21:38:06.2474508Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2474633Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2474639Z 2023-01-11T21:38:06.2474723Z aten = torch.ops.aten 2023-01-11T21:38:06.2474867Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2474959Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2474969Z 2023-01-11T21:38:06.2475039Z import triton 2023-01-11T21:38:06.2475131Z import triton.language as tl 2023-01-11T21:38:06.2475261Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2475408Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2475414Z 2023-01-11T21:38:06.2475418Z 2023-01-11T21:38:06.2475591Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2475668Z import triton 2023-01-11T21:38:06.2475767Z import triton.language as tl 2023-01-11T21:38:06.2475878Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2475980Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2476117Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2476246Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2476254Z 2023-01-11T21:38:06.2476660Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2476735Z @triton.jit 2023-01-11T21:38:06.2476950Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2477028Z xnumel = 16 2023-01-11T21:38:06.2477121Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2477254Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2477341Z xmask = xindex < xnumel 2023-01-11T21:38:06.2477422Z x1 = (xindex // 4) 2023-01-11T21:38:06.2477501Z x0 = xindex % 4 2023-01-11T21:38:06.2477573Z x2 = xindex 2023-01-11T21:38:06.2477686Z tmp0 = (-1) + (2*x1) 2023-01-11T21:38:06.2477753Z tmp1 = 0 2023-01-11T21:38:06.2477837Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.2477909Z tmp3 = 8 2023-01-11T21:38:06.2477994Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.2478073Z tmp5 = tmp2 & tmp4 2023-01-11T21:38:06.2478186Z tmp6 = (-1) + (2*x0) 2023-01-11T21:38:06.2478261Z tmp7 = tmp6 >= tmp1 2023-01-11T21:38:06.2478339Z tmp8 = tmp6 < tmp3 2023-01-11T21:38:06.2478418Z tmp9 = tmp7 & tmp8 2023-01-11T21:38:06.2478498Z tmp10 = tmp5 & tmp9 2023-01-11T21:38:06.2478747Z tmp11 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0) 2023-01-11T21:38:06.2478846Z tmp12 = tl.where(tmp10, tmp11, 0.0) 2023-01-11T21:38:06.2478922Z tmp13 = 2*x0 2023-01-11T21:38:06.2478999Z tmp14 = tmp13 >= tmp1 2023-01-11T21:38:06.2479080Z tmp15 = tmp13 < tmp3 2023-01-11T21:38:06.2479161Z tmp16 = tmp14 & tmp15 2023-01-11T21:38:06.2479243Z tmp17 = tmp5 & tmp16 
2023-01-11T21:38:06.2479482Z tmp18 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0) 2023-01-11T21:38:06.2479582Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.2479695Z tmp20 = tmp19 + tmp12 2023-01-11T21:38:06.2479765Z tmp21 = 1 + (2*x0) 2023-01-11T21:38:06.2479850Z tmp22 = tmp21 >= tmp1 2023-01-11T21:38:06.2479932Z tmp23 = tmp21 < tmp3 2023-01-11T21:38:06.2480015Z tmp24 = tmp22 & tmp23 2023-01-11T21:38:06.2480097Z tmp25 = tmp5 & tmp24 2023-01-11T21:38:06.2480340Z tmp26 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, other=0) 2023-01-11T21:38:06.2480437Z tmp27 = tl.where(tmp25, tmp26, 0.0) 2023-01-11T21:38:06.2480512Z tmp28 = tmp27 + tmp20 2023-01-11T21:38:06.2480585Z tmp29 = 2*x1 2023-01-11T21:38:06.2480666Z tmp30 = tmp29 >= tmp1 2023-01-11T21:38:06.2480744Z tmp31 = tmp29 < tmp3 2023-01-11T21:38:06.2480826Z tmp32 = tmp30 & tmp31 2023-01-11T21:38:06.2480905Z tmp33 = tmp32 & tmp9 2023-01-11T21:38:06.2481143Z tmp34 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, other=0) 2023-01-11T21:38:06.2481238Z tmp35 = tl.where(tmp33, tmp34, 0.0) 2023-01-11T21:38:06.2481321Z tmp36 = tmp35 + tmp28 2023-01-11T21:38:06.2481403Z tmp37 = tmp32 & tmp16 2023-01-11T21:38:06.2481563Z tmp38 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, other=0) 2023-01-11T21:38:06.2481660Z tmp39 = tl.where(tmp37, tmp38, 0.0) 2023-01-11T21:38:06.2481745Z tmp40 = tmp39 + tmp36 2023-01-11T21:38:06.2481826Z tmp41 = tmp32 & tmp24 2023-01-11T21:38:06.2481979Z tmp42 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0) 2023-01-11T21:38:06.2482077Z tmp43 = tl.where(tmp41, tmp42, 0.0) 2023-01-11T21:38:06.2482157Z tmp44 = tmp43 + tmp40 2023-01-11T21:38:06.2482234Z tmp45 = 1 + (2*x1) 2023-01-11T21:38:06.2482313Z tmp46 = tmp45 >= tmp1 2023-01-11T21:38:06.2482396Z tmp47 = tmp45 < tmp3 2023-01-11T21:38:06.2482479Z tmp48 = tmp46 & tmp47 2023-01-11T21:38:06.2482553Z tmp49 = tmp48 & tmp9 2023-01-11T21:38:06.2482718Z tmp50 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, other=0) 2023-01-11T21:38:06.2482817Z tmp51 = tl.where(tmp49, tmp50, 0.0) 2023-01-11T21:38:06.2482897Z tmp52 = tmp51 + tmp44 2023-01-11T21:38:06.2482978Z tmp53 = tmp48 & tmp16 2023-01-11T21:38:06.2483161Z tmp54 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, other=0) 2023-01-11T21:38:06.2483260Z tmp55 = tl.where(tmp53, tmp54, 0.0) 2023-01-11T21:38:06.2483343Z tmp56 = tmp55 + tmp52 2023-01-11T21:38:06.2483417Z tmp57 = tmp48 & tmp24 2023-01-11T21:38:06.2483578Z tmp58 = tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, other=0) 2023-01-11T21:38:06.2483677Z tmp59 = tl.where(tmp57, tmp58, 0.0) 2023-01-11T21:38:06.2483760Z tmp60 = tmp59 + tmp56 2023-01-11T21:38:06.2483843Z tmp61 = 0.3333333333333333 2023-01-11T21:38:06.2483925Z tmp62 = tmp60 * tmp61 2023-01-11T21:38:06.2484066Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp62, xmask) 2023-01-11T21:38:06.2484151Z ''') 2023-01-11T21:38:06.2484158Z 2023-01-11T21:38:06.2484162Z 2023-01-11T21:38:06.2484260Z async_compile.wait(globals()) 2023-01-11T21:38:06.2484339Z del async_compile 2023-01-11T21:38:06.2484344Z 2023-01-11T21:38:06.2484422Z def call(args): 2023-01-11T21:38:06.2484500Z arg0_1, = args 2023-01-11T21:38:06.2484578Z args.clear() 2023-01-11T21:38:06.2484674Z with 
torch.cuda.device(0): 2023-01-11T21:38:06.2484887Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2484983Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2485129Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.2485204Z del arg0_1 2023-01-11T21:38:06.2485281Z return (buf0, ) 2023-01-11T21:38:06.2485286Z 2023-01-11T21:38:06.2485291Z 2023-01-11T21:38:06.2485376Z if __name__ == "__main__": 2023-01-11T21:38:06.2485525Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2485656Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2485868Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2485984Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2485992Z 2023-01-11T21:38:06.2486263Z [2023-01-11 21:34:05,990] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 420 2023-01-11T21:38:06.2486268Z 2023-01-11T21:38:06.2486373Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2486450Z import torch 2023-01-11T21:38:06.2486526Z import random 2023-01-11T21:38:06.2486649Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2486777Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2486782Z 2023-01-11T21:38:06.2486860Z aten = torch.ops.aten 2023-01-11T21:38:06.2487001Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2487105Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2487110Z 2023-01-11T21:38:06.2487186Z import triton 2023-01-11T21:38:06.2487280Z import triton.language as tl 2023-01-11T21:38:06.2487408Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2487554Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2487560Z 2023-01-11T21:38:06.2487564Z 2023-01-11T21:38:06.2487734Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2487805Z import triton 2023-01-11T21:38:06.2487898Z import triton.language as tl 2023-01-11T21:38:06.2488017Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2488122Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2488256Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2488383Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2488388Z 2023-01-11T21:38:06.2488791Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2488868Z @triton.jit 2023-01-11T21:38:06.2488997Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2489103Z xnumel = 16 2023-01-11T21:38:06.2489204Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2489338Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2489423Z xmask = xindex < xnumel 2023-01-11T21:38:06.2489503Z x1 = (xindex // 4) 2023-01-11T21:38:06.2489579Z x0 = xindex % 4 2023-01-11T21:38:06.2489645Z x2 = xindex 2023-01-11T21:38:06.2489759Z tmp0 = (-1) + (2*x1) 2023-01-11T21:38:06.2489832Z tmp1 = 0 2023-01-11T21:38:06.2489911Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.2489983Z tmp3 = 8 2023-01-11T21:38:06.2490062Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.2490145Z tmp5 = tmp2 & tmp4 
2023-01-11T21:38:06.2490249Z tmp6 = (-1) + (2*x0) 2023-01-11T21:38:06.2490329Z tmp7 = tmp6 >= tmp1 2023-01-11T21:38:06.2490407Z tmp8 = tmp6 < tmp3 2023-01-11T21:38:06.2490483Z tmp9 = tmp7 & tmp8 2023-01-11T21:38:06.2490564Z tmp10 = tmp5 & tmp9 2023-01-11T21:38:06.2490835Z tmp11 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2490935Z tmp12 = tl.where(tmp10, tmp11, 0.0) 2023-01-11T21:38:06.2491004Z tmp13 = 2*x0 2023-01-11T21:38:06.2491088Z tmp14 = tmp13 >= tmp1 2023-01-11T21:38:06.2491170Z tmp15 = tmp13 < tmp3 2023-01-11T21:38:06.2491252Z tmp16 = tmp14 & tmp15 2023-01-11T21:38:06.2491337Z tmp17 = tmp5 & tmp16 2023-01-11T21:38:06.2491599Z tmp18 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2491698Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.2491815Z tmp20 = tmp19 + tmp12 2023-01-11T21:38:06.2491893Z tmp21 = 1 + (2*x0) 2023-01-11T21:38:06.2491978Z tmp22 = tmp21 >= tmp1 2023-01-11T21:38:06.2492061Z tmp23 = tmp21 < tmp3 2023-01-11T21:38:06.2492141Z tmp24 = tmp22 & tmp23 2023-01-11T21:38:06.2492222Z tmp25 = tmp5 & tmp24 2023-01-11T21:38:06.2492479Z tmp26 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2492580Z tmp27 = tl.where(tmp25, tmp26, 0.0) 2023-01-11T21:38:06.2492661Z tmp28 = tmp27 + tmp20 2023-01-11T21:38:06.2492737Z tmp29 = 2*x1 2023-01-11T21:38:06.2492817Z tmp30 = tmp29 >= tmp1 2023-01-11T21:38:06.2492898Z tmp31 = tmp29 < tmp3 2023-01-11T21:38:06.2492979Z tmp32 = tmp30 & tmp31 2023-01-11T21:38:06.2493054Z tmp33 = tmp32 & tmp9 2023-01-11T21:38:06.2493308Z tmp34 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2493410Z tmp35 = tl.where(tmp33, tmp34, 0.0) 2023-01-11T21:38:06.2493493Z tmp36 = tmp35 + tmp28 2023-01-11T21:38:06.2493575Z tmp37 = tmp32 & tmp16 2023-01-11T21:38:06.2493745Z tmp38 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2493847Z tmp39 = tl.where(tmp37, tmp38, 0.0) 2023-01-11T21:38:06.2493927Z tmp40 = tmp39 + tmp36 2023-01-11T21:38:06.2494001Z tmp41 = tmp32 & tmp24 2023-01-11T21:38:06.2494174Z tmp42 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2494271Z tmp43 = tl.where(tmp41, tmp42, 0.0) 2023-01-11T21:38:06.2494354Z tmp44 = tmp43 + tmp40 2023-01-11T21:38:06.2494431Z tmp45 = 1 + (2*x1) 2023-01-11T21:38:06.2494624Z tmp46 = tmp45 >= tmp1 2023-01-11T21:38:06.2494704Z tmp47 = tmp45 < tmp3 2023-01-11T21:38:06.2494777Z tmp48 = tmp46 & tmp47 2023-01-11T21:38:06.2494856Z tmp49 = tmp48 & tmp9 2023-01-11T21:38:06.2495031Z tmp50 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2495131Z tmp51 = tl.where(tmp49, tmp50, 0.0) 2023-01-11T21:38:06.2495230Z tmp52 = tmp51 + tmp44 2023-01-11T21:38:06.2495314Z tmp53 = tmp48 & tmp16 2023-01-11T21:38:06.2495547Z tmp54 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2495637Z tmp55 = tl.where(tmp53, tmp54, 0.0) 2023-01-11T21:38:06.2495718Z tmp56 = tmp55 + tmp52 2023-01-11T21:38:06.2495797Z tmp57 = tmp48 & tmp24 2023-01-11T21:38:06.2495967Z tmp58 = 
tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2496062Z tmp59 = tl.where(tmp57, tmp58, 0.0) 2023-01-11T21:38:06.2496142Z tmp60 = tmp59 + tmp56 2023-01-11T21:38:06.2496223Z tmp61 = 0.3333333333333333 2023-01-11T21:38:06.2496299Z tmp62 = tmp60 * tmp61 2023-01-11T21:38:06.2496435Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp62, xmask) 2023-01-11T21:38:06.2496523Z ''') 2023-01-11T21:38:06.2496529Z 2023-01-11T21:38:06.2496533Z 2023-01-11T21:38:06.2496627Z async_compile.wait(globals()) 2023-01-11T21:38:06.2496704Z del async_compile 2023-01-11T21:38:06.2496712Z 2023-01-11T21:38:06.2496787Z def call(args): 2023-01-11T21:38:06.2496862Z arg0_1, = args 2023-01-11T21:38:06.2496941Z args.clear() 2023-01-11T21:38:06.2497027Z with torch.cuda.device(0): 2023-01-11T21:38:06.2497304Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2497399Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2497546Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.2497622Z del arg0_1 2023-01-11T21:38:06.2497699Z return (buf0, ) 2023-01-11T21:38:06.2497746Z 2023-01-11T21:38:06.2497750Z 2023-01-11T21:38:06.2497833Z if __name__ == "__main__": 2023-01-11T21:38:06.2497950Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2498072Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2498296Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2498411Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2498416Z 2023-01-11T21:38:06.2498485Z ok (0.339s) 2023-01-11T21:38:06.2498944Z test_avg_pool2d7_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2499077Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2499340Z [2023-01-11 21:34:06,009] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 421 2023-01-11T21:38:06.2499567Z [2023-01-11 21:34:06,014] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.avg_pool2d 2023-01-11T21:38:06.2499834Z [2023-01-11 21:34:06,016] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 421 2023-01-11T21:38:06.2500255Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2500382Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2500637Z [2023-01-11 21:34:06,033] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 422 2023-01-11T21:38:06.2500870Z [2023-01-11 21:34:06,039] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.avg_pool2d 2023-01-11T21:38:06.2501133Z [2023-01-11 21:34:06,041] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 422 2023-01-11T21:38:06.2501139Z 2023-01-11T21:38:06.2501265Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2501341Z import torch 2023-01-11T21:38:06.2501415Z import random 2023-01-11T21:38:06.2501536Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2501653Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2501666Z 2023-01-11T21:38:06.2501742Z aten = torch.ops.aten 2023-01-11T21:38:06.2501880Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2501974Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2501979Z 2023-01-11T21:38:06.2502053Z import triton 2023-01-11T21:38:06.2502146Z import triton.language as tl 2023-01-11T21:38:06.2502277Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2502420Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2502425Z 2023-01-11T21:38:06.2502430Z 2023-01-11T21:38:06.2502516Z async_compile.wait(globals()) 2023-01-11T21:38:06.2502592Z del async_compile 2023-01-11T21:38:06.2502597Z 2023-01-11T21:38:06.2502673Z def call(args): 2023-01-11T21:38:06.2502745Z arg0_1, = args 2023-01-11T21:38:06.2502820Z args.clear() 2023-01-11T21:38:06.2502912Z with torch.cuda.device(0): 2023-01-11T21:38:06.2503048Z buf0 = aten.avg_pool2d(arg0_1, [13, 13], [1, 1], [0, 0], False, True, None) 2023-01-11T21:38:06.2503114Z del arg0_1 2023-01-11T21:38:06.2503186Z buf1 = buf0 2023-01-11T21:38:06.2503304Z assert_size_stride(buf1, (1, 1, 12, 12), (144, 144, 12, 1)) 2023-01-11T21:38:06.2503375Z del buf0 2023-01-11T21:38:06.2503450Z return (buf1, ) 2023-01-11T21:38:06.2503455Z 2023-01-11T21:38:06.2503487Z 2023-01-11T21:38:06.2503571Z if __name__ == "__main__": 2023-01-11T21:38:06.2503692Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2503819Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2504042Z arg0_1 = rand_strided((1, 1, 24, 24), (576, 576, 24, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2504157Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2504162Z 2023-01-11T21:38:06.2504166Z 2023-01-11T21:38:06.2504266Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2504342Z import torch 2023-01-11T21:38:06.2504418Z import random 2023-01-11T21:38:06.2504540Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2504667Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2504672Z 2023-01-11T21:38:06.2504755Z aten = torch.ops.aten 2023-01-11T21:38:06.2504886Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2504980Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2504989Z 2023-01-11T21:38:06.2505065Z import triton 2023-01-11T21:38:06.2505159Z import triton.language as tl 2023-01-11T21:38:06.2505288Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2505428Z from torch._C import 
_cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2505433Z 2023-01-11T21:38:06.2505440Z 2023-01-11T21:38:06.2505534Z async_compile.wait(globals()) 2023-01-11T21:38:06.2505613Z del async_compile 2023-01-11T21:38:06.2505618Z 2023-01-11T21:38:06.2505689Z def call(args): 2023-01-11T21:38:06.2505764Z arg0_1, = args 2023-01-11T21:38:06.2505841Z args.clear() 2023-01-11T21:38:06.2505935Z with torch.cuda.device(0): 2023-01-11T21:38:06.2506071Z buf0 = aten.avg_pool2d(arg0_1, [13, 13], [1, 1], [0, 0], False, True, None) 2023-01-11T21:38:06.2506151Z del arg0_1 2023-01-11T21:38:06.2506225Z buf1 = buf0 2023-01-11T21:38:06.2506335Z assert_size_stride(buf1, (1, 1, 12, 12), (144, 144, 12, 1)) 2023-01-11T21:38:06.2506412Z del buf0 2023-01-11T21:38:06.2506492Z return (buf1, ) 2023-01-11T21:38:06.2506497Z 2023-01-11T21:38:06.2506501Z 2023-01-11T21:38:06.2506583Z if __name__ == "__main__": 2023-01-11T21:38:06.2506700Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2506857Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2507085Z arg0_1 = rand_strided((1, 1, 24, 24), (576, 576, 24, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2507201Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2507206Z 2023-01-11T21:38:06.2507273Z ok (0.051s) 2023-01-11T21:38:06.2507744Z test_avg_pool2d_backward2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2507885Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2508145Z [2023-01-11 21:34:06,062] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 423 2023-01-11T21:38:06.2508416Z [2023-01-11 21:34:06,383] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 423 2023-01-11T21:38:06.2508837Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:06,402] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 424

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_avg_pool2d_backward_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 300
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 15)
    x0 = xindex % 15
    x2 = xindex
    tmp0 = (-1) + x1
    tmp1 = (-1) + x0
    tmp2 = 2 + x1
    tmp3 = 2 + x0
    tmp4 = 0
    tmp5 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp4, tmp0, tmp4))
    tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp4, tmp1, tmp4))
    tmp7 = 20
    tmp8 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp7, tmp2, tmp7))
    tmp9 = 15
    tmp10 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp9, tmp3, tmp9))
    tmp11 = tmp5 + tmp4
    tmp12 = tmp6 + tmp4
    tmp13 = 1
    tmp14 = 3
    tmp15 = tmp11 * tmp13
    tmp16 = tmp15 - tmp13
    tmp17 = tmp12 * tmp13
    tmp18 = tmp17 - tmp13
    tmp19 = tmp16 + tmp14
    tmp20 = tmp7 + tmp13
    tmp21 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 < tmp20, tmp19, tmp20))
    tmp22 = tmp18 + tmp14
    tmp23 = tmp9 + tmp13
    tmp24 = tl.where(tmp22 != tmp22, tmp22, tl.where(tmp22 < tmp23, tmp22, tmp23))
    tmp25 = tl.where(tmp16 != tmp16, tmp16, tl.where(tmp16 > tmp4, tmp16, tmp4))
    tmp26 = tl.where(tmp18 != tmp18, tmp18, tl.where(tmp18 > tmp4, tmp18, tmp4))
    tmp27 = tl.where(tmp21 != tmp21, tmp21, tl.where(tmp21 < tmp7, tmp21, tmp7))
    tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 < tmp9, tmp24, tmp9))
    tmp29 = tmp27 - tmp25
    tmp30 = tmp28 - tmp26
    tmp31 = tmp29 * tmp30
    tmp32 = tmp8 - tmp13
    tmp33 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp32, tmp11, tmp32))
    tmp34 = tmp10 - tmp13
    tmp35 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp34, tmp12, tmp34))
    tmp36 = tl.load(in_ptr0 + (tmp35 + (15*tmp33)), None)
    tmp37 = tmp36 / tmp31
    tmp38 = tmp11 < tmp8
    tmp39 = tmp12 < tmp10
    tmp40 = tmp38 & tmp39
    tmp41 = 0.0
    tmp42 = tl.where(tmp40, tmp37, tmp41)
    tmp43 = tmp6 + tmp13
    tmp44 = tmp43 * tmp13
    tmp45 = tmp44 - tmp13
    tmp46 = tmp45 + tmp14
    tmp47 = tl.where(tmp46 != tmp46, tmp46, tl.where(tmp46 < tmp23, tmp46, tmp23))
    tmp48 = tl.where(tmp45 != tmp45, tmp45, tl.where(tmp45 > tmp4, tmp45, tmp4))
    tmp49 = tl.where(tmp47 != tmp47, tmp47, tl.where(tmp47 < tmp9, tmp47, tmp9))
    tmp50 = tmp49 - tmp48
    tmp51 = tmp29 * tmp50
    tmp52 = tl.where(tmp43 != tmp43, tmp43, tl.where(tmp43 < tmp34, tmp43, tmp34))
    tmp53 = tl.load(in_ptr0 + (tmp52 + (15*tmp33)), None)
    tmp54 = tmp53 / tmp51
    tmp55 = tmp43 < tmp10
    tmp56 = tmp38 & tmp55
    tmp57 = tmp42 + tmp54
    tmp58 = tl.where(tmp56, tmp57, tmp42)
    tmp59 = 2
    tmp60 = tmp6 + tmp59
    tmp61 = tmp60 * tmp13
    tmp62 = tmp61 - tmp13
    tmp63 = tmp62 + tmp14
    tmp64 = tl.where(tmp63 != tmp63, tmp63, tl.where(tmp63 < tmp23, tmp63, tmp23))
    tmp65 = tl.where(tmp62 != tmp62, tmp62, tl.where(tmp62 > tmp4, tmp62, tmp4))
    tmp66 = tl.where(tmp64 != tmp64, tmp64, tl.where(tmp64 < tmp9, tmp64, tmp9))
    tmp67 = tmp66 - tmp65
    tmp68 = tmp29 * tmp67
    tmp69 = tl.where(tmp60 != tmp60, tmp60, tl.where(tmp60 < tmp34, tmp60, tmp34))
    tmp70 = tl.load(in_ptr0 + (tmp69 + (15*tmp33)), None)
    tmp71 = tmp70 / tmp68
    tmp72 = tmp60 < tmp10
    tmp73 = tmp38 & tmp72
    tmp74 = tmp58 + tmp71
    tmp75 = tl.where(tmp73, tmp74, tmp58)
    tmp76 = tmp5 + tmp13
    tmp77 = tmp76 * tmp13
    tmp78 = tmp77 - tmp13
    tmp79 = tmp78 + tmp14
    tmp80 = tl.where(tmp79 != tmp79, tmp79, tl.where(tmp79 < tmp20, tmp79, tmp20))
    tmp81 = tl.where(tmp78 != tmp78, tmp78, tl.where(tmp78 > tmp4, tmp78, tmp4))
    tmp82 = tl.where(tmp80 != tmp80, tmp80, tl.where(tmp80 < tmp7, tmp80, tmp7))
    tmp83 = tmp82 - tmp81
    tmp84 = tmp83 * tmp30
    tmp85 = tl.where(tmp76 != tmp76, tmp76, tl.where(tmp76 < tmp32, tmp76, tmp32))
    tmp86 = tl.load(in_ptr0 + (tmp35 + (15*tmp85)), None)
    tmp87 = tmp86 / tmp84
    tmp88 = tmp76 < tmp8
    tmp89 = tmp88 & tmp39
    tmp90 = tmp75 + tmp87
    tmp91 = tl.where(tmp89, tmp90, tmp75)
    tmp92 = tmp83 * tmp50
    tmp93 = tl.load(in_ptr0 + (tmp52 + (15*tmp85)), None)
    tmp94 = tmp93 / tmp92
    tmp95 = tmp88 & tmp55
    tmp96 = tmp91 + tmp94
    tmp97 = tl.where(tmp95, tmp96, tmp91)
    tmp98 = tmp83 * tmp67
    tmp99 = tl.load(in_ptr0 + (tmp69 + (15*tmp85)), None)
    tmp100 = tmp99 / tmp98
    tmp101 = tmp88 & tmp72
    tmp102 = tmp97 + tmp100
    tmp103 = tl.where(tmp101, tmp102, tmp97)
    tmp104 = tmp5 + tmp59
    tmp105 = tmp104 * tmp13
    tmp106 = tmp105 - tmp13
    tmp107 = tmp106 + tmp14
    tmp108 = tl.where(tmp107 != tmp107, tmp107, tl.where(tmp107 < tmp20, tmp107, tmp20))
    tmp109 = tl.where(tmp106 != tmp106, tmp106, tl.where(tmp106 > tmp4, tmp106, tmp4))
    tmp110 = tl.where(tmp108 != tmp108, tmp108, tl.where(tmp108 < tmp7, tmp108, tmp7))
    tmp111 = tmp110 - tmp109
    tmp112 = tmp111 * tmp30
    tmp113 = tl.where(tmp104 != tmp104, tmp104, tl.where(tmp104 < tmp32, tmp104, tmp32))
    tmp114 = tl.load(in_ptr0 + (tmp35 + (15*tmp113)), None)
    tmp115 = tmp114 / tmp112
    tmp116 = tmp104 < tmp8
    tmp117 = tmp116 & tmp39
    tmp118 = tmp103 + tmp115
    tmp119 = tl.where(tmp117, tmp118, tmp103)
    tmp120 = tmp111 * tmp50
    tmp121 = tl.load(in_ptr0 + (tmp52 + (15*tmp113)), None)
    tmp122 = tmp121 / tmp120
    tmp123 = tmp116 & tmp55
    tmp124 = tmp119 + tmp122
    tmp125 = tl.where(tmp123, tmp124, tmp119)
    tmp126 = tmp111 * tmp67
    tmp127 = tl.load(in_ptr0 + (tmp69 + (15*tmp113)), None)
    tmp128 = tmp127 / tmp126
    tmp129 = tmp116 & tmp72
    tmp130 = tmp125 + tmp128
    tmp131 = tl.where(tmp129, tmp130, tmp125)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp131, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_avg_pool2d_backward_0.run(arg0_1, buf0, 300, grid=grid(300), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

[2023-01-11 21:34:06,718] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 424

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_avg_pool2d_backward_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 300
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 15)
    x0 = xindex % 15
    x2 = xindex
    tmp0 = (-1) + x1
    tmp1 = (-1) + x0
    tmp2 = 2 + x1
    tmp3 = 2 + x0
    tmp4 = 0
    tmp5 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp4, tmp0, tmp4))
    tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp4, tmp1, tmp4))
    tmp7 = 20
    tmp8 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp7, tmp2, tmp7))
    tmp9 = 15
    tmp10 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp9, tmp3, tmp9))
    tmp11 = tmp5 + tmp4
    tmp12 = tmp6 + tmp4
    tmp13 = 1
    tmp14 = 3
    tmp15 = tmp11 * tmp13
    tmp16 = tmp15 - tmp13
    tmp17 = tmp12 * tmp13
    tmp18 = tmp17 - tmp13
    tmp19 = tmp16 + tmp14
    tmp20 = tmp7 + tmp13
    tmp21 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 < tmp20, tmp19, tmp20))
    tmp22 = tmp18 + tmp14
    tmp23 = tmp9 + tmp13
    tmp24 = tl.where(tmp22 != tmp22, tmp22, tl.where(tmp22 < tmp23, tmp22, tmp23))
    tmp25 = tl.where(tmp16 != tmp16, tmp16, tl.where(tmp16 > tmp4, tmp16, tmp4))
    tmp26 = tl.where(tmp18 != tmp18, tmp18, tl.where(tmp18 > tmp4, tmp18, tmp4))
    tmp27 = tl.where(tmp21 != tmp21, tmp21, tl.where(tmp21 < tmp7, tmp21, tmp7))
    tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 < tmp9, tmp24, tmp9))
    tmp29 = tmp27 - tmp25
    tmp30 = tmp28 - tmp26
    tmp31 = tmp29 * tmp30
    tmp32 = tmp8 - tmp13
    tmp33 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp32, tmp11, tmp32))
    tmp34 = tmp10 - tmp13
    tmp35 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp34, tmp12, tmp34))
    tmp36 = tl.load(in_ptr0 + (tmp35 + (15*tmp33)), None).to(tl.float32)
    tmp37 = tmp36 / tmp31
    tmp38 = tmp11 < tmp8
    tmp39 = tmp12 < tmp10
    tmp40 = tmp38 & tmp39
    tmp41 = 0.0
    tmp42 = tl.where(tmp40, tmp37, tmp41)
    tmp43 = tmp6 + tmp13
    tmp44 = tmp43 * tmp13
    tmp45 = tmp44 - tmp13
    tmp46 = tmp45 + tmp14
    tmp47 = tl.where(tmp46 != tmp46, tmp46, tl.where(tmp46 < tmp23, tmp46, tmp23))
    tmp48 = tl.where(tmp45 != tmp45, tmp45, tl.where(tmp45 > tmp4, tmp45, tmp4))
    tmp49 = tl.where(tmp47 != tmp47, tmp47, tl.where(tmp47 < tmp9, tmp47, tmp9))
    tmp50 = tmp49 - tmp48
    tmp51 = tmp29 * tmp50
    tmp52 = tl.where(tmp43 != tmp43, tmp43, tl.where(tmp43 < tmp34, tmp43, tmp34))
    tmp53 = tl.load(in_ptr0 + (tmp52 + (15*tmp33)), None).to(tl.float32)
    tmp54 = tmp53 / tmp51
    tmp55 = tmp43 < tmp10
    tmp56 = tmp38 & tmp55
    tmp57 = tmp42 + tmp54
    tmp58 = tl.where(tmp56, tmp57, tmp42)
    tmp59 = 2
    tmp60 = tmp6 + tmp59
    tmp61 = tmp60 * tmp13
    tmp62 = tmp61 - tmp13
    tmp63 = tmp62 + tmp14
    tmp64 = tl.where(tmp63 != tmp63, tmp63, tl.where(tmp63 < tmp23, tmp63, tmp23))
    tmp65 = tl.where(tmp62 != tmp62, tmp62, tl.where(tmp62 > tmp4, tmp62, tmp4))
    tmp66 = tl.where(tmp64 != tmp64, tmp64, tl.where(tmp64 < tmp9, tmp64, tmp9))
    tmp67 = tmp66 - tmp65
    tmp68 = tmp29 * tmp67
    tmp69 = tl.where(tmp60 != tmp60, tmp60, tl.where(tmp60 < tmp34, tmp60, tmp34))
    tmp70 = tl.load(in_ptr0 + (tmp69 + (15*tmp33)), None).to(tl.float32)
    tmp71 = tmp70 / tmp68
    tmp72 = tmp60 < tmp10
    tmp73 = tmp38 & tmp72
    tmp74 = tmp58 + tmp71
    tmp75 = tl.where(tmp73, tmp74, tmp58)
    tmp76 = tmp5 + tmp13
    tmp77 = tmp76 * tmp13
    tmp78 = tmp77 - tmp13
    tmp79 = tmp78 + tmp14
    tmp80 = tl.where(tmp79 != tmp79, tmp79, tl.where(tmp79 < tmp20, tmp79, tmp20))
    tmp81 = tl.where(tmp78 != tmp78, tmp78, tl.where(tmp78 > tmp4, tmp78, tmp4))
    tmp82 = tl.where(tmp80 != tmp80, tmp80, tl.where(tmp80 < tmp7, tmp80, tmp7))
    tmp83 = tmp82 - tmp81
    tmp84 = tmp83 * tmp30
    tmp85 = tl.where(tmp76 != tmp76, tmp76, tl.where(tmp76 < tmp32, tmp76, tmp32))
    tmp86 = tl.load(in_ptr0 + (tmp35 + (15*tmp85)), None).to(tl.float32)
    tmp87 = tmp86 / tmp84
    tmp88 = tmp76 < tmp8
    tmp89 = tmp88 & tmp39
    tmp90 = tmp75 + tmp87
    tmp91 = tl.where(tmp89, tmp90, tmp75)
    tmp92 = tmp83 * tmp50
    tmp93 = tl.load(in_ptr0 + (tmp52 + (15*tmp85)), None).to(tl.float32)
    tmp94 = tmp93 / tmp92
    tmp95 = tmp88 & tmp55
    tmp96 = tmp91 + tmp94
    tmp97 = tl.where(tmp95, tmp96, tmp91)
    tmp98 = tmp83 * tmp67
    tmp99 = tl.load(in_ptr0 + (tmp69 + (15*tmp85)), None).to(tl.float32)
    tmp100 = tmp99 / tmp98
    tmp101 = tmp88 & tmp72
    tmp102 = tmp97 + tmp100
    tmp103 = tl.where(tmp101, tmp102, tmp97)
    tmp104 = tmp5 + tmp59
    tmp105 = tmp104 * tmp13
    tmp106 = tmp105 - tmp13
    tmp107 = tmp106 + tmp14
    tmp108 = tl.where(tmp107 != tmp107, tmp107, tl.where(tmp107 < tmp20, tmp107, tmp20))
    tmp109 = tl.where(tmp106 != tmp106, tmp106, tl.where(tmp106 > tmp4, tmp106, tmp4))
    tmp110 = tl.where(tmp108 != tmp108, tmp108, tl.where(tmp108 < tmp7, tmp108, tmp7))
    tmp111 = tmp110 - tmp109
    tmp112 = tmp111 * tmp30
    tmp113 = tl.where(tmp104 != tmp104, tmp104, tl.where(tmp104 < tmp32, tmp104, tmp32))
    tmp114 = tl.load(in_ptr0 + (tmp35 + (15*tmp113)), None).to(tl.float32)
    tmp115 = tmp114 / tmp112
    tmp116 = tmp104 < tmp8
    tmp117 = tmp116 & tmp39
    tmp118 = tmp103 + tmp115
    tmp119 = tl.where(tmp117, tmp118, tmp103)
    tmp120 = tmp111 * tmp50
    tmp121 = tl.load(in_ptr0 + (tmp52 + (15*tmp113)), None).to(tl.float32)
    tmp122 = tmp121 / tmp120
    tmp123 = tmp116 & tmp55
    tmp124 = tmp119 + tmp122
    tmp125 = tl.where(tmp123, tmp124, tmp119)
    tmp126 = tmp111 * tmp67
    tmp127 = tl.load(in_ptr0 + (tmp69 + (15*tmp113)), None).to(tl.float32)
    tmp128 = tmp127 / tmp126
    tmp129 = tmp116 & tmp72
    tmp130 = tmp125 + tmp128
    tmp131 = tl.where(tmp129, tmp130, tmp125)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp131, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_avg_pool2d_backward_0.run(arg0_1, buf0, 300, grid=grid(300), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.678s)
test_avg_pool2d_backward3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:06,749] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 425
[2023-01-11 21:34:06,881] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 425
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:06,901] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 426
[2023-01-11 21:34:07,022] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 426

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_avg_pool2d_backward_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 889056
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 21) % 21
    x0 = xindex % 21
    x2 = (xindex // 441)
    x5 = xindex
    tmp0 = ((1 + x1) // 2)
    tmp1 = ((1 + x0) // 2)
    tmp2 = 1 + (x1 // 2)
    tmp3 = 1 + (x0 // 2)
    tmp4 = 0
    tmp5 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp4, tmp0, tmp4))
    tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp4, tmp1, tmp4))
    tmp7 = 11
    tmp8 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp7, tmp2, tmp7))
    tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp7, tmp3, tmp7))
    tmp10 = tmp5 + tmp4
    tmp11 = tmp6 + tmp4
    tmp12 = 1
    tmp13 = tmp8 - tmp12
    tmp14 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 < tmp13, tmp10, tmp13))
    tmp15 = tmp9 - tmp12
    tmp16 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp15, tmp11, tmp15))
    tmp17 = tl.load(in_ptr0 + (tmp16 + (11*tmp14) + (121*x2)), xmask)
    tmp18 = tmp17 / 1
    tmp19 = tmp10 < tmp8
    tmp20 = tmp11 < tmp9
    tmp21 = tmp19 & tmp20
    tmp22 = 0.0
    tmp23 = tl.where(tmp21, tmp18, tmp22)
    tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp23, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 2016, 21, 21), (889056, 441, 21, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_avg_pool2d_backward_0.run(arg0_1, buf0, 889056, grid=grid(889056), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 2016, 11, 11), (243936, 121, 11, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 2016, 21, 21), (889056, 441, 21, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_avg_pool2d_backward_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 889056
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 21) % 21
    x0 = xindex % 21
    x2 = (xindex // 441)
    x5 = xindex
    tmp0 = ((1 + x1) // 2)
    tmp1 = ((1 + x0) // 2)
    tmp2 = 1 + (x1 // 2)
    tmp3 = 1 + (x0 // 2)
    tmp4 = 0
    tmp5 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp4, tmp0, tmp4))
    tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp4, tmp1, tmp4))
    tmp7 = 11
    tmp8 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp7, tmp2, tmp7))
    tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp7, tmp3, tmp7))
    tmp10 = tmp5 + tmp4
    tmp11 = tmp6 + tmp4
    tmp12 = 1
    tmp13 = tmp8 - tmp12
    tmp14 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 < tmp13, tmp10, tmp13))
    tmp15 = tmp9 - tmp12
    tmp16 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp15, tmp11, tmp15))
    tmp17 = tl.load(in_ptr0 + (tmp16 + (11*tmp14) + (121*x2)), xmask).to(tl.float32)
    tmp18 = tmp17 / 1
    tmp19 = tmp10 < tmp8
    tmp20 = tmp11 < tmp9
    tmp21 = tmp19 & tmp20
    tmp22 = 0.0
    tmp23 = tl.where(tmp21, tmp18, tmp22)
    tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp23, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 2016, 21, 21), (889056, 441, 21, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_avg_pool2d_backward_0.run(arg0_1, buf0, 889056, grid=grid(889056), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 2016, 11, 11), (243936, 121, 11, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((1, 2016, 21, 21), (889056, 441, 21, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.303s)
test_avg_pool2d_backward4_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:07,043] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 427
[2023-01-11 21:34:07,053] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.avg_pool2d_backward
[2023-01-11 21:34:07,055] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 427

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = aten.avg_pool2d_backward(arg0_1, arg1_1, [13, 13], [1, 1], [0, 0], True, False, None)
        del arg0_1
        del arg1_1
        buf1 = buf0
        assert_size_stride(buf1, (1, 16, 24, 24), (9216, 576, 24, 1))
        del buf0
        return (buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 16, 12, 12), (2304, 144, 12, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 16, 24, 24), (9216, 576, 24, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.033s)
test_avg_pool2d_backward_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:07,075] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 428
[2023-01-11 21:34:07,205] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 428
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:07,224] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 429
[2023-01-11 21:34:07,343] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 429

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_avg_pool2d_backward_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1568
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 14) % 14
    x0 = xindex % 14
    x2 = (xindex // 196)
    x5 = xindex
    tmp0 = (x1 // 2)
    tmp1 = (x0 // 2)
    tmp2 = 1 + (x1 // 2)
    tmp3 = 1 + (x0 // 2)
    tmp4 = 0
    tmp5 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp4, tmp0, tmp4))
    tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp4, tmp1, tmp4))
    tmp7 = 7
    tmp8 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp7, tmp2, tmp7))
    tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp7, tmp3, tmp7))
    tmp10 = tmp5 + tmp4
    tmp11 = tmp6 + tmp4
    tmp12 = 1
    tmp13 = tmp8 - tmp12
    tmp14 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 < tmp13, tmp10, tmp13))
    tmp15 = tmp9 - tmp12
    tmp16 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp15, tmp11, tmp15))
    tmp17 = tl.load(in_ptr0 + (tmp16 + (7*tmp14) + (49*x2)), xmask)
    tmp18 = tmp17 / 4
    tmp19 = tmp10 < tmp8
    tmp20 = tmp11 < tmp9
    tmp21 = tmp19 & tmp20
    tmp22 = 0.0
    tmp23 = tl.where(tmp21, tmp18, tmp22)
    tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp23, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 4, 14, 14), (784, 196, 14, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_avg_pool2d_backward_0.run(arg0_1, buf0, 1568, grid=grid(1568), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((2, 4, 14, 14), (784, 196, 14, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_avg_pool2d_backward_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1568
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 14) % 14
    x0 = xindex % 14
    x2 = (xindex // 196)
    x5 = xindex
    tmp0 = (x1 // 2)
    tmp1 = (x0 // 2)
    tmp2 = 1 + (x1 // 2)
    tmp3 = 1 + (x0 // 2)
    tmp4 = 0
    tmp5 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp4, tmp0, tmp4))
    tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp4, tmp1, tmp4))
    tmp7 = 7
    tmp8 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp7, tmp2, tmp7))
    tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp7, tmp3, tmp7))
    tmp10 = tmp5 + tmp4
    tmp11 = tmp6 + tmp4
    tmp12 = 1
    tmp13 = tmp8 - tmp12
    tmp14 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 < tmp13, tmp10, tmp13))
    tmp15 = tmp9 - tmp12
    tmp16 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp15, tmp11, tmp15))
    tmp17 = tl.load(in_ptr0 + (tmp16 + (7*tmp14) + (49*x2)), xmask).to(tl.float32)
    tmp18 = tmp17 / 4
    tmp19 = tmp10 < tmp8
    tmp20 = tmp11 < tmp9
    tmp21 = tmp19 & tmp20
    tmp22 = 0.0
    tmp23 = tl.where(tmp21, tmp18, tmp22)
    tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp23, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 4, 14, 14), (784, 196, 14, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_avg_pool2d_backward_0.run(arg0_1, buf0, 1568, grid=grid(1568), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((2, 4, 14, 14), (784, 196, 14, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.288s)
test_baddbmm_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:07,364] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 430
[2023-01-11 21:34:07,444] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 430
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2599414Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2599665Z [2023-01-11 21:34:07,464] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 431 2023-01-11T21:38:06.2599929Z [2023-01-11 21:34:07,539] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 431 2023-01-11T21:38:06.2599934Z 2023-01-11T21:38:06.2600068Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2600144Z import torch 2023-01-11T21:38:06.2600222Z import random 2023-01-11T21:38:06.2600345Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2600471Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2600476Z 2023-01-11T21:38:06.2600562Z aten = torch.ops.aten 2023-01-11T21:38:06.2600695Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2600793Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2600798Z 2023-01-11T21:38:06.2600873Z import triton 2023-01-11T21:38:06.2600969Z import triton.language as tl 2023-01-11T21:38:06.2601093Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2601233Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2601238Z 2023-01-11T21:38:06.2601243Z 2023-01-11T21:38:06.2601400Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2601478Z import triton 2023-01-11T21:38:06.2601569Z import triton.language as tl 2023-01-11T21:38:06.2601684Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2601789Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2601922Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2602050Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2602060Z 2023-01-11T21:38:06.2602485Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2602562Z @triton.jit 2023-01-11T21:38:06.2602698Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2602769Z xnumel = 76800 2023-01-11T21:38:06.2602869Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2602999Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2603088Z xmask = xindex < xnumel 2023-01-11T21:38:06.2603168Z x0 = xindex % 100 2023-01-11T21:38:06.2603255Z x2 = (xindex // 12800) 2023-01-11T21:38:06.2603328Z x3 = xindex 2023-01-11T21:38:06.2603432Z tmp0 = tl.load(in_ptr0 + (x0 + (100*x2)), xmask) 2023-01-11T21:38:06.2603537Z tmp1 = tl.load(in_out_ptr0 + (x3), xmask) 2023-01-11T21:38:06.2603649Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2603791Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2603877Z ''') 2023-01-11T21:38:06.2603883Z 2023-01-11T21:38:06.2603887Z 2023-01-11T21:38:06.2603982Z async_compile.wait(globals()) 2023-01-11T21:38:06.2604061Z del async_compile 2023-01-11T21:38:06.2604066Z 2023-01-11T21:38:06.2604142Z def call(args): 2023-01-11T21:38:06.2604225Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.2604303Z args.clear() 2023-01-11T21:38:06.2604397Z with torch.cuda.device(0): 2023-01-11T21:38:06.2604622Z buf0 = empty_strided((6, 128, 100), (12800, 100, 1), 
device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2604731Z aten.bmm.out(arg1_1, arg2_1, out=buf0) 2023-01-11T21:38:06.2604808Z del arg1_1 2023-01-11T21:38:06.2604883Z del arg2_1 2023-01-11T21:38:06.2604969Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.2605065Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2605207Z triton_fused_add_0.run(buf1, arg0_1, 76800, grid=grid(76800), stream=stream0) 2023-01-11T21:38:06.2605281Z del arg0_1 2023-01-11T21:38:06.2605359Z return (buf1, ) 2023-01-11T21:38:06.2605364Z 2023-01-11T21:38:06.2605368Z 2023-01-11T21:38:06.2605450Z if __name__ == "__main__": 2023-01-11T21:38:06.2605569Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2605700Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2605954Z arg0_1 = rand_strided((6, 1, 100), (100, 100, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2606172Z arg1_1 = rand_strided((6, 128, 64), (8192, 64, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2606419Z arg2_1 = rand_strided((6, 64, 100), (6400, 100, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2606548Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.2606553Z 2023-01-11T21:38:06.2606558Z 2023-01-11T21:38:06.2606662Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2606739Z import torch 2023-01-11T21:38:06.2606816Z import random 2023-01-11T21:38:06.2606930Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2607056Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2607061Z 2023-01-11T21:38:06.2607144Z aten = torch.ops.aten 2023-01-11T21:38:06.2607281Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2607377Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2607382Z 2023-01-11T21:38:06.2607458Z import triton 2023-01-11T21:38:06.2607553Z import triton.language as tl 2023-01-11T21:38:06.2607684Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2607818Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2607823Z 2023-01-11T21:38:06.2607837Z 2023-01-11T21:38:06.2607986Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2608062Z import triton 2023-01-11T21:38:06.2608158Z import triton.language as tl 2023-01-11T21:38:06.2608274Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2608380Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2608515Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2608643Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2608649Z 2023-01-11T21:38:06.2609066Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2609145Z @triton.jit 2023-01-11T21:38:06.2609282Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2609359Z xnumel = 76800 2023-01-11T21:38:06.2609458Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2609615Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2609703Z xmask = xindex < xnumel 2023-01-11T21:38:06.2609785Z x0 = xindex % 100 2023-01-11T21:38:06.2609862Z x2 = (xindex // 12800) 2023-01-11T21:38:06.2609934Z x3 = xindex 2023-01-11T21:38:06.2610064Z tmp0 = tl.load(in_ptr0 + 
(x0 + (100*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2610190Z tmp1 = tl.load(in_out_ptr0 + (x3), xmask).to(tl.float32) 2023-01-11T21:38:06.2610272Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2610412Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2610501Z ''') 2023-01-11T21:38:06.2610506Z 2023-01-11T21:38:06.2610511Z 2023-01-11T21:38:06.2610601Z async_compile.wait(globals()) 2023-01-11T21:38:06.2610681Z del async_compile 2023-01-11T21:38:06.2610687Z 2023-01-11T21:38:06.2610763Z def call(args): 2023-01-11T21:38:06.2610852Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.2610932Z args.clear() 2023-01-11T21:38:06.2611029Z with torch.cuda.device(0): 2023-01-11T21:38:06.2611254Z buf0 = empty_strided((6, 128, 100), (12800, 100, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2611355Z aten.bmm.out(arg1_1, arg2_1, out=buf0) 2023-01-11T21:38:06.2611430Z del arg1_1 2023-01-11T21:38:06.2611505Z del arg2_1 2023-01-11T21:38:06.2611596Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.2611690Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2611828Z triton_fused_add_0.run(buf1, arg0_1, 76800, grid=grid(76800), stream=stream0) 2023-01-11T21:38:06.2611903Z del arg0_1 2023-01-11T21:38:06.2611976Z return (buf1, ) 2023-01-11T21:38:06.2611989Z 2023-01-11T21:38:06.2612024Z 2023-01-11T21:38:06.2612100Z if __name__ == "__main__": 2023-01-11T21:38:06.2612221Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2612353Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2612567Z arg0_1 = rand_strided((6, 1, 100), (100, 100, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2612781Z arg1_1 = rand_strided((6, 128, 64), (8192, 64, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2612995Z arg2_1 = rand_strided((6, 64, 100), (6400, 100, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2613124Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.2613129Z 2023-01-11T21:38:06.2613204Z ok (0.197s) 2023-01-11T21:38:06.2613658Z test_batch_norm_2d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2613793Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2614055Z [2023-01-11 21:34:07,851] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 432 2023-01-11T21:38:06.2614320Z [2023-01-11 21:34:08,091] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 432 2023-01-11T21:38:06.2614846Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
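The two generated modules above (float32, then float16) show test_baddbmm's lowering: baddbmm is split into an out-of-place bmm (aten.bmm.out into buf0) followed by the fused Triton broadcast-add triton_fused_add_0, which reuses buf0 in place (buf1 = buf0). A short sketch of the equivalence, with shapes taken from the generated __main__ blocks:

import torch

inp = torch.randn(6, 1, 100, device="cuda")
b1 = torch.randn(6, 128, 64, device="cuda")
b2 = torch.randn(6, 64, 100, device="cuda")
ref = torch.baddbmm(inp, b1, b2)
out = torch.bmm(b1, b2)   # the aten.bmm.out step
out += inp                # the triton_fused_add_0 step, broadcasting over dim 1
assert torch.allclose(out, ref, atol=1e-5)  # equal up to fp rounding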
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2614980Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2615240Z [2023-01-11 21:34:08,398] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 433 2023-01-11T21:38:06.2615246Z 2023-01-11T21:38:06.2615347Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2615418Z import torch 2023-01-11T21:38:06.2615490Z import random 2023-01-11T21:38:06.2615646Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2615776Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2615781Z 2023-01-11T21:38:06.2615863Z aten = torch.ops.aten 2023-01-11T21:38:06.2615998Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2616094Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2616099Z 2023-01-11T21:38:06.2616171Z import triton 2023-01-11T21:38:06.2616266Z import triton.language as tl 2023-01-11T21:38:06.2616391Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2616524Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2616532Z 2023-01-11T21:38:06.2616543Z 2023-01-11T21:38:06.2616722Z triton_fused_convert_element_type_0 = async_compile.triton(''' 2023-01-11T21:38:06.2616795Z import triton 2023-01-11T21:38:06.2616885Z import triton.language as tl 2023-01-11T21:38:06.2616999Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2617104Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2617324Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2617453Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2617459Z 2023-01-11T21:38:06.2617854Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2617928Z @triton.jit 2023-01-11T21:38:06.2618061Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2618176Z xnumel = 10 2023-01-11T21:38:06.2618271Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2618401Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2618484Z xmask = xindex < xnumel 2023-01-11T21:38:06.2618547Z x0 = xindex 2023-01-11T21:38:06.2618648Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2618782Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2618870Z ''') 2023-01-11T21:38:06.2618875Z 2023-01-11T21:38:06.2618879Z 2023-01-11T21:38:06.2619040Z triton_fused_le_relu_1 = async_compile.triton(''' 2023-01-11T21:38:06.2619116Z import triton 2023-01-11T21:38:06.2619208Z import triton.language as tl 2023-01-11T21:38:06.2619323Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2619416Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2619551Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2619676Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2619685Z 2023-01-11T21:38:06.2620166Z @pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*i1', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.2620240Z @triton.jit 2023-01-11T21:38:06.2620420Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2620495Z xnumel = 1280 2023-01-11T21:38:06.2620594Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2620718Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2620800Z xmask = xindex < xnumel 2023-01-11T21:38:06.2620869Z x3 = xindex 2023-01-11T21:38:06.2620952Z x1 = (xindex // 64) % 10 2023-01-11T21:38:06.2621049Z tmp0 = tl.load(in_ptr0 + (x3), xmask) 2023-01-11T21:38:06.2621148Z tmp1 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.2621244Z tmp3 = tl.load(in_ptr2 + (x1), xmask) 2023-01-11T21:38:06.2621335Z tmp11 = tl.load(in_ptr3 + (x1), xmask) 2023-01-11T21:38:06.2621431Z tmp13 = tl.load(in_ptr4 + (x1), xmask) 2023-01-11T21:38:06.2621542Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.2621670Z tmp4 = 1e-05 2023-01-11T21:38:06.2621750Z tmp5 = tmp3 + tmp4 2023-01-11T21:38:06.2621830Z tmp6 = tl.sqrt(tmp5) 2023-01-11T21:38:06.2621906Z tmp7 = 1 / tmp6 2023-01-11T21:38:06.2621970Z tmp8 = 1 2023-01-11T21:38:06.2622046Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.2622124Z tmp10 = tmp2 * tmp9 2023-01-11T21:38:06.2622205Z tmp12 = tmp10 * tmp11 2023-01-11T21:38:06.2622284Z tmp14 = tmp12 + tmp13 2023-01-11T21:38:06.2622404Z tmp15 = tl.where(0 != 0, 0, tl.where(0 > tmp14, 0, tmp14)) 2023-01-11T21:38:06.2622475Z tmp16 = 0 2023-01-11T21:38:06.2622549Z tmp17 = tmp15 <= tmp16 2023-01-11T21:38:06.2622687Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.2622822Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.2622906Z ''') 2023-01-11T21:38:06.2622911Z 2023-01-11T21:38:06.2622916Z 2023-01-11T21:38:06.2623009Z async_compile.wait(globals()) 2023-01-11T21:38:06.2623087Z del async_compile 2023-01-11T21:38:06.2623093Z 2023-01-11T21:38:06.2623167Z def call(args): 2023-01-11T21:38:06.2623312Z primals_1, primals_2, primals_3, primals_4, primals_5, primals_6 = args 2023-01-11T21:38:06.2623381Z args.clear() 2023-01-11T21:38:06.2623476Z with torch.cuda.device(0): 2023-01-11T21:38:06.2623676Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2623772Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2623933Z triton_fused_convert_element_type_0.run(primals_3, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.2624011Z del primals_3 2023-01-11T21:38:06.2624243Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2624396Z triton_fused_convert_element_type_0.run(primals_4, buf1, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.2624476Z del primals_4 2023-01-11T21:38:06.2624694Z buf2 = empty_strided((2, 10, 8, 8), (640, 64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2624902Z buf3 = empty_strided((2, 10, 8, 8), (640, 64, 8, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2625085Z triton_fused_le_relu_1.run(primals_6, buf0, buf1, primals_1, primals_2, buf2, buf3, 1280, grid=grid(1280), stream=stream0) 2023-01-11T21:38:06.2625162Z del primals_1 2023-01-11T21:38:06.2625238Z del primals_2 2023-01-11T21:38:06.2625359Z return (buf0, buf1, buf2, primals_6, buf0, buf1, buf3, ) 2023-01-11T21:38:06.2625366Z 2023-01-11T21:38:06.2625370Z 2023-01-11T21:38:06.2625444Z if __name__ == "__main__": 2023-01-11T21:38:06.2625589Z 
from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2625737Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2625945Z primals_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2626151Z primals_2 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2626351Z primals_3 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2626552Z primals_4 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2626740Z primals_5 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.2626954Z primals_6 = rand_strided((2, 10, 8, 8), (640, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2627132Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4, primals_5, primals_6])) 2023-01-11T21:38:06.2627137Z 2023-01-11T21:38:06.2627400Z [2023-01-11 21:34:08,530] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 433 2023-01-11T21:38:06.2627409Z 2023-01-11T21:38:06.2627505Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2627579Z import torch 2023-01-11T21:38:06.2627655Z import random 2023-01-11T21:38:06.2627804Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2627930Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2627936Z 2023-01-11T21:38:06.2628011Z aten = torch.ops.aten 2023-01-11T21:38:06.2628147Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2628241Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2628246Z 2023-01-11T21:38:06.2628319Z import triton 2023-01-11T21:38:06.2628414Z import triton.language as tl 2023-01-11T21:38:06.2628537Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2628678Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2628687Z 2023-01-11T21:38:06.2628691Z 2023-01-11T21:38:06.2628875Z triton_fused_convert_element_type_0 = async_compile.triton(''' 2023-01-11T21:38:06.2628944Z import triton 2023-01-11T21:38:06.2629037Z import triton.language as tl 2023-01-11T21:38:06.2629148Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2629251Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2629385Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2629510Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2629515Z 2023-01-11T21:38:06.2629917Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2629993Z @triton.jit 2023-01-11T21:38:06.2630118Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2630192Z xnumel = 10 2023-01-11T21:38:06.2630318Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2630449Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2630536Z xmask = xindex < xnumel 2023-01-11T21:38:06.2630606Z x0 = xindex 2023-01-11T21:38:06.2630703Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2630836Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2630924Z ''') 2023-01-11T21:38:06.2630929Z 2023-01-11T21:38:06.2630933Z 2023-01-11T21:38:06.2631094Z triton_fused_le_relu_1 = 
async_compile.triton(''' 2023-01-11T21:38:06.2631170Z import triton 2023-01-11T21:38:06.2631265Z import triton.language as tl 2023-01-11T21:38:06.2631382Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2631486Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2631622Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2631743Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2631752Z 2023-01-11T21:38:06.2632228Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*i1', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.2632310Z @triton.jit 2023-01-11T21:38:06.2632486Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2632563Z xnumel = 7680 2023-01-11T21:38:06.2632663Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2632793Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2632880Z xmask = xindex < xnumel 2023-01-11T21:38:06.2632946Z x3 = xindex 2023-01-11T21:38:06.2633031Z x1 = (xindex // 256) % 10 2023-01-11T21:38:06.2633130Z tmp0 = tl.load(in_ptr0 + (x3), xmask) 2023-01-11T21:38:06.2633228Z tmp1 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.2633327Z tmp3 = tl.load(in_ptr2 + (x1), xmask) 2023-01-11T21:38:06.2633429Z tmp11 = tl.load(in_ptr3 + (x1), xmask) 2023-01-11T21:38:06.2633527Z tmp13 = tl.load(in_ptr4 + (x1), xmask) 2023-01-11T21:38:06.2633634Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.2633737Z tmp4 = 1e-05 2023-01-11T21:38:06.2633855Z tmp5 = tmp3 + tmp4 2023-01-11T21:38:06.2633939Z tmp6 = tl.sqrt(tmp5) 2023-01-11T21:38:06.2634018Z tmp7 = 1 / tmp6 2023-01-11T21:38:06.2634093Z tmp8 = 1 2023-01-11T21:38:06.2634174Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.2634248Z tmp10 = tmp2 * tmp9 2023-01-11T21:38:06.2634332Z tmp12 = tmp10 * tmp11 2023-01-11T21:38:06.2634414Z tmp14 = tmp12 + tmp13 2023-01-11T21:38:06.2634539Z tmp15 = tl.where(0 != 0, 0, tl.where(0 > tmp14, 0, tmp14)) 2023-01-11T21:38:06.2634613Z tmp16 = 0 2023-01-11T21:38:06.2634696Z tmp17 = tmp15 <= tmp16 2023-01-11T21:38:06.2634832Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.2634965Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.2635053Z ''') 2023-01-11T21:38:06.2635058Z 2023-01-11T21:38:06.2635062Z 2023-01-11T21:38:06.2635158Z async_compile.wait(globals()) 2023-01-11T21:38:06.2635253Z del async_compile 2023-01-11T21:38:06.2635260Z 2023-01-11T21:38:06.2635344Z def call(args): 2023-01-11T21:38:06.2635516Z primals_1, primals_2, primals_3, primals_4, primals_5, primals_6 = args 2023-01-11T21:38:06.2635592Z args.clear() 2023-01-11T21:38:06.2635680Z with torch.cuda.device(0): 2023-01-11T21:38:06.2635880Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2635976Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2636141Z triton_fused_convert_element_type_0.run(primals_3, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.2636221Z del primals_3 2023-01-11T21:38:06.2636417Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2636607Z triton_fused_convert_element_type_0.run(primals_4, buf1, 10, grid=grid(10), stream=stream0) 
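# Note: despite its name, triton_fused_convert_element_type_0 above is a pure
# elementwise copy (one load, one store, no arithmetic); presumably the dtype
# conversion it was fused from is a no-op here, leaving buf0/buf1 as private
# float32 copies of primals_3/primals_4.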
2023-01-11T21:38:06.2636690Z del primals_4 2023-01-11T21:38:06.2636907Z buf2 = empty_strided((3, 10, 16, 16), (2560, 256, 16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2637132Z buf3 = empty_strided((3, 10, 16, 16), (2560, 256, 16, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2637318Z triton_fused_le_relu_1.run(primals_6, buf0, buf1, primals_1, primals_2, buf2, buf3, 7680, grid=grid(7680), stream=stream0) 2023-01-11T21:38:06.2637397Z del primals_1 2023-01-11T21:38:06.2637476Z del primals_2 2023-01-11T21:38:06.2637600Z return (buf0, buf1, buf2, primals_6, buf0, buf1, buf3, ) 2023-01-11T21:38:06.2637607Z 2023-01-11T21:38:06.2637611Z 2023-01-11T21:38:06.2637692Z if __name__ == "__main__": 2023-01-11T21:38:06.2637812Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2637940Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2638146Z primals_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2638351Z primals_2 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2638557Z primals_3 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2638759Z primals_4 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2638953Z primals_5 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.2639184Z primals_6 = rand_strided((3, 10, 16, 16), (2560, 256, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2639366Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4, primals_5, primals_6])) 2023-01-11T21:38:06.2639371Z 2023-01-11T21:38:06.2639444Z ok (0.991s) 2023-01-11T21:38:06.2639927Z test_bernoulli1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2640067Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2640330Z [2023-01-11 21:34:08,566] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 434 2023-01-11T21:38:06.2640593Z [2023-01-11 21:34:08,630] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 434 2023-01-11T21:38:06.2641009Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
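A sketch of the math fused into triton_fused_le_relu_1 above (shapes assumed from the second generated __main__ block): a batch-norm style normalisation with eps=1e-05 and affine scale/shift, followed by relu, plus the boolean mask (out <= 0) returned as buf3 for the backward graph:

import torch

x = torch.randn(3, 10, 16, 16, device="cuda")   # primals_6 (in_ptr0)
mean = torch.randn(10, device="cuda")           # buf0 (in_ptr1), copied from primals_3
var = torch.rand(10, device="cuda")             # buf1 (in_ptr2), copied from primals_4
weight = torch.randn(10, device="cuda")         # primals_1 (in_ptr3)
bias = torch.randn(10, device="cuda")           # primals_2 (in_ptr4)
c = lambda t: t[None, :, None, None]            # broadcast per channel
inv_std = 1.0 / torch.sqrt(var + 1e-05)
out = torch.relu((x - c(mean)) * c(inv_std) * c(weight) + c(bias))  # buf2
mask = out <= 0                                 # buf3 (torch.bool)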
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2641147Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2641404Z [2023-01-11 21:34:08,665] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 435 2023-01-11T21:38:06.2641670Z [2023-01-11 21:34:08,731] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 435 2023-01-11T21:38:06.2641676Z 2023-01-11T21:38:06.2641774Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2641852Z import torch 2023-01-11T21:38:06.2641922Z import random 2023-01-11T21:38:06.2642044Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2642170Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2642175Z 2023-01-11T21:38:06.2642262Z aten = torch.ops.aten 2023-01-11T21:38:06.2642403Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2642529Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2642534Z 2023-01-11T21:38:06.2642608Z import triton 2023-01-11T21:38:06.2642702Z import triton.language as tl 2023-01-11T21:38:06.2642824Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2642967Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2642972Z 2023-01-11T21:38:06.2642977Z 2023-01-11T21:38:06.2643147Z triton_fused_empty_like_0 = async_compile.triton(''' 2023-01-11T21:38:06.2643225Z import triton 2023-01-11T21:38:06.2643319Z import triton.language as tl 2023-01-11T21:38:06.2643435Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2643538Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2643672Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2643794Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2643799Z 2023-01-11T21:38:06.2644188Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.2644267Z @triton.jit 2023-01-11T21:38:06.2644391Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2644469Z xnumel = 100 2023-01-11T21:38:06.2644568Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2644698Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2644781Z xmask = xindex < xnumel 2023-01-11T21:38:06.2644848Z x0 = xindex 2023-01-11T21:38:06.2644921Z tmp0 = 0 2023-01-11T21:38:06.2645058Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2645145Z ''') 2023-01-11T21:38:06.2645151Z 2023-01-11T21:38:06.2645155Z 2023-01-11T21:38:06.2645250Z async_compile.wait(globals()) 2023-01-11T21:38:06.2645329Z del async_compile 2023-01-11T21:38:06.2645334Z 2023-01-11T21:38:06.2645412Z def call(args): 2023-01-11T21:38:06.2645481Z arg0_1, = args 2023-01-11T21:38:06.2645560Z args.clear() 2023-01-11T21:38:06.2645654Z with torch.cuda.device(0): 2023-01-11T21:38:06.2645858Z buf0 = empty_strided((100, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2645954Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2646122Z triton_fused_empty_like_0.run(buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.2646212Z aten.bernoulli_(buf0, ) 2023-01-11T21:38:06.2646291Z return (buf0, buf0, ) 2023-01-11T21:38:06.2646296Z 
2023-01-11T21:38:06.2646307Z 2023-01-11T21:38:06.2646382Z if __name__ == "__main__": 2023-01-11T21:38:06.2646503Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2646632Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2646838Z arg0_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2646954Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2646962Z 2023-01-11T21:38:06.2646967Z 2023-01-11T21:38:06.2647066Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2647140Z import torch 2023-01-11T21:38:06.2647210Z import random 2023-01-11T21:38:06.2647330Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2647458Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2647463Z 2023-01-11T21:38:06.2647546Z aten = torch.ops.aten 2023-01-11T21:38:06.2647683Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2647785Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2647790Z 2023-01-11T21:38:06.2647865Z import triton 2023-01-11T21:38:06.2647963Z import triton.language as tl 2023-01-11T21:38:06.2648083Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2648225Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2648230Z 2023-01-11T21:38:06.2648235Z 2023-01-11T21:38:06.2648430Z triton_fused_empty_like_0 = async_compile.triton(''' 2023-01-11T21:38:06.2648506Z import triton 2023-01-11T21:38:06.2648600Z import triton.language as tl 2023-01-11T21:38:06.2648717Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2648825Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2648956Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2649084Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2649090Z 2023-01-11T21:38:06.2649481Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.2649556Z @triton.jit 2023-01-11T21:38:06.2649680Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2649756Z xnumel = 100 2023-01-11T21:38:06.2649853Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2649988Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2650066Z xmask = xindex < xnumel 2023-01-11T21:38:06.2650140Z x0 = xindex 2023-01-11T21:38:06.2650212Z tmp0 = 0 2023-01-11T21:38:06.2650348Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2650438Z ''') 2023-01-11T21:38:06.2650446Z 2023-01-11T21:38:06.2650451Z 2023-01-11T21:38:06.2650549Z async_compile.wait(globals()) 2023-01-11T21:38:06.2650628Z del async_compile 2023-01-11T21:38:06.2650633Z 2023-01-11T21:38:06.2650710Z def call(args): 2023-01-11T21:38:06.2650779Z arg0_1, = args 2023-01-11T21:38:06.2650858Z args.clear() 2023-01-11T21:38:06.2650953Z with torch.cuda.device(0): 2023-01-11T21:38:06.2651157Z buf0 = empty_strided((100, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2651252Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2651393Z triton_fused_empty_like_0.run(buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.2651488Z aten.bernoulli_(buf0, ) 2023-01-11T21:38:06.2651567Z return (buf0, buf0, ) 2023-01-11T21:38:06.2651572Z 2023-01-11T21:38:06.2651584Z 
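# Note on both bernoulli1 variants above: inductor generates only the
# zero-fill of the fresh buffer (triton_fused_empty_like_0) and then falls
# back to the eager aten.bernoulli_ kernel -- no Triton RNG is emitted for
# this graph. A minimal sketch of the same two-step pattern (the log elides
# the probability; Tensor.bernoulli_ defaults to p=0.5):
#
#     import torch
#     buf = torch.zeros(100, device="cuda", dtype=torch.float16)  # Triton zero-fill
#     buf.bernoulli_()                                            # eager fallback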
2023-01-11T21:38:06.2651659Z if __name__ == "__main__": 2023-01-11T21:38:06.2651780Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2651937Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2652140Z arg0_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2652252Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2652257Z 2023-01-11T21:38:06.2652330Z ok (0.200s) 2023-01-11T21:38:06.2652793Z test_bernoulli2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2652929Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2653181Z [2023-01-11 21:34:08,754] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 436 2023-01-11T21:38:06.2653440Z [2023-01-11 21:34:08,754] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.2653705Z [2023-01-11 21:34:08,945] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 436 2023-01-11T21:38:06.2654122Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2654258Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2654650Z [2023-01-11 21:34:08,967] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 437 2023-01-11T21:38:06.2654903Z [2023-01-11 21:34:08,968] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.2655168Z [2023-01-11 21:34:09,063] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 437 2023-01-11T21:38:06.2655175Z 2023-01-11T21:38:06.2655294Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2655370Z import torch 2023-01-11T21:38:06.2655454Z import random 2023-01-11T21:38:06.2655579Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2655702Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2655707Z 2023-01-11T21:38:06.2655792Z aten = torch.ops.aten 2023-01-11T21:38:06.2655929Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2656028Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2656033Z 2023-01-11T21:38:06.2656107Z import triton 2023-01-11T21:38:06.2656200Z import triton.language as tl 2023-01-11T21:38:06.2656317Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2656454Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2656621Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.2656627Z 2023-01-11T21:38:06.2656631Z 2023-01-11T21:38:06.2656782Z triton_fused_lt_0 = async_compile.triton(''' 2023-01-11T21:38:06.2656856Z import triton 2023-01-11T21:38:06.2656946Z import triton.language as tl 
2023-01-11T21:38:06.2657060Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2657217Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2657353Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2657479Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2657487Z 2023-01-11T21:38:06.2657893Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2657967Z @triton.jit 2023-01-11T21:38:06.2658155Z def triton_(seed0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2658230Z xnumel = 8 2023-01-11T21:38:06.2658327Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2658449Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2658533Z xmask = xindex < xnumel 2023-01-11T21:38:06.2658606Z x0 = xindex 2023-01-11T21:38:06.2658739Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.2658837Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.2658908Z tmp1 = 65535 2023-01-11T21:38:06.2658987Z tmp2 = tmp0 ^ tmp1 2023-01-11T21:38:06.2659054Z tmp3 = x0 2023-01-11T21:38:06.2659142Z tmp4 = tl.rand(tmp2, tmp3) 2023-01-11T21:38:06.2659221Z tmp6 = tmp4 < tmp5 2023-01-11T21:38:06.2659359Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.2659445Z ''') 2023-01-11T21:38:06.2659451Z 2023-01-11T21:38:06.2659455Z 2023-01-11T21:38:06.2659552Z async_compile.wait(globals()) 2023-01-11T21:38:06.2659629Z del async_compile 2023-01-11T21:38:06.2659635Z 2023-01-11T21:38:06.2659708Z def call(args): 2023-01-11T21:38:06.2659775Z arg0_1, = args 2023-01-11T21:38:06.2659850Z args.clear() 2023-01-11T21:38:06.2659984Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.2660078Z with torch.cuda.device(0): 2023-01-11T21:38:06.2660270Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2660363Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2660509Z triton_fused_lt_0.run(seed_cuda_0, arg0_1, buf0, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.2660612Z del arg0_1 2023-01-11T21:38:06.2660694Z return (buf0, ) 2023-01-11T21:38:06.2660699Z 2023-01-11T21:38:06.2660704Z 2023-01-11T21:38:06.2660784Z if __name__ == "__main__": 2023-01-11T21:38:06.2660902Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2661034Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2661236Z seed_cuda_0 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.2661435Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2661546Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2661551Z 2023-01-11T21:38:06.2661556Z 2023-01-11T21:38:06.2661646Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2661719Z import torch 2023-01-11T21:38:06.2661792Z import random 2023-01-11T21:38:06.2661909Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2662031Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2662040Z 2023-01-11T21:38:06.2662122Z aten = torch.ops.aten 2023-01-11T21:38:06.2662256Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2662348Z async_compile = AsyncCompile() 
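# (Second compile of the same test: graph 437 is the float16 variant of
# graph 436 above -- the kernel below differs from the float32 one only in
# its '*fp16' signature and the .to(tl.float32) upcast on load.)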
2023-01-11T21:38:06.2662354Z 2023-01-11T21:38:06.2662421Z import triton 2023-01-11T21:38:06.2662518Z import triton.language as tl 2023-01-11T21:38:06.2662643Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2662781Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2662944Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.2662950Z 2023-01-11T21:38:06.2662955Z 2023-01-11T21:38:06.2663107Z triton_fused_lt_0 = async_compile.triton(''' 2023-01-11T21:38:06.2663183Z import triton 2023-01-11T21:38:06.2663276Z import triton.language as tl 2023-01-11T21:38:06.2663381Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2663481Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2663615Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2663738Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2663743Z 2023-01-11T21:38:06.2664183Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2664259Z @triton.jit 2023-01-11T21:38:06.2664398Z def triton_(seed0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2664472Z xnumel = 8 2023-01-11T21:38:06.2664563Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2664690Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2664774Z xmask = xindex < xnumel 2023-01-11T21:38:06.2664844Z x0 = xindex 2023-01-11T21:38:06.2664972Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.2665093Z tmp5 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2665167Z tmp1 = 65535 2023-01-11T21:38:06.2665239Z tmp2 = tmp0 ^ tmp1 2023-01-11T21:38:06.2665310Z tmp3 = x0 2023-01-11T21:38:06.2665401Z tmp4 = tl.rand(tmp2, tmp3) 2023-01-11T21:38:06.2665482Z tmp6 = tmp4 < tmp5 2023-01-11T21:38:06.2665618Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.2665707Z ''') 2023-01-11T21:38:06.2665713Z 2023-01-11T21:38:06.2665717Z 2023-01-11T21:38:06.2665809Z async_compile.wait(globals()) 2023-01-11T21:38:06.2665879Z del async_compile 2023-01-11T21:38:06.2665884Z 2023-01-11T21:38:06.2665957Z def call(args): 2023-01-11T21:38:06.2666030Z arg0_1, = args 2023-01-11T21:38:06.2666105Z args.clear() 2023-01-11T21:38:06.2666238Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.2666330Z with torch.cuda.device(0): 2023-01-11T21:38:06.2666566Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2666654Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2666799Z triton_fused_lt_0.run(seed_cuda_0, arg0_1, buf0, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.2666875Z del arg0_1 2023-01-11T21:38:06.2666957Z return (buf0, ) 2023-01-11T21:38:06.2666962Z 2023-01-11T21:38:06.2666967Z 2023-01-11T21:38:06.2667049Z if __name__ == "__main__": 2023-01-11T21:38:06.2667166Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2667293Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2667492Z seed_cuda_0 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.2667683Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 
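# Note on triton_fused_lt_0 (both variants): this is inductor's Triton RNG
# path, hence the "[WARNING] using triton random, expect difference from
# eager" lines above. The host draws a fresh seed into seed_cuda_0 via
# torch.randint, and each element i computes tl.rand(seed ^ 65535, i) < p[i],
# i.e. a per-element bernoulli draw, but from a different RNG stream than
# eager torch.bernoulli, so results are not bit-identical to eager mode.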
2023-01-11T21:38:06.2667796Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2667801Z 2023-01-11T21:38:06.2667875Z ok (0.332s) 2023-01-11T21:38:06.2668340Z test_bitwise2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2668474Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2668735Z [2023-01-11 21:34:09,084] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 438 2023-01-11T21:38:06.2669000Z [2023-01-11 21:34:09,274] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 438 2023-01-11T21:38:06.2669419Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2669554Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2669883Z [2023-01-11 21:34:09,294] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 439 2023-01-11T21:38:06.2670153Z [2023-01-11 21:34:09,309] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 439 2023-01-11T21:38:06.2670158Z 2023-01-11T21:38:06.2670253Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2670329Z import torch 2023-01-11T21:38:06.2670406Z import random 2023-01-11T21:38:06.2670529Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2670655Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2670660Z 2023-01-11T21:38:06.2670745Z aten = torch.ops.aten 2023-01-11T21:38:06.2670886Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2670980Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2670991Z 2023-01-11T21:38:06.2671061Z import triton 2023-01-11T21:38:06.2671154Z import triton.language as tl 2023-01-11T21:38:06.2671282Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2671431Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2671437Z 2023-01-11T21:38:06.2671442Z 2023-01-11T21:38:06.2671668Z triton_fused_bitwise_and_bitwise_not_bitwise_or_bitwise_xor_0 = async_compile.triton(''' 2023-01-11T21:38:06.2671747Z import triton 2023-01-11T21:38:06.2671843Z import triton.language as tl 2023-01-11T21:38:06.2671954Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2672059Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2672193Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2672321Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2672357Z 2023-01-11T21:38:06.2672806Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i1', 1: '*i1', 2: '*i1', 3: '*i1', 4: '*i1', 5: '*i1', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 
2023-01-11T21:38:06.2672886Z @triton.jit 2023-01-11T21:38:06.2673057Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2673136Z xnumel = 40 2023-01-11T21:38:06.2673237Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2673363Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2673447Z xmask = xindex < xnumel 2023-01-11T21:38:06.2673521Z x0 = xindex 2023-01-11T21:38:06.2673716Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2673911Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2674013Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2674114Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.2674186Z tmp1 = tmp0 == 0 2023-01-11T21:38:06.2674267Z tmp3 = tmp0 | tmp2 2023-01-11T21:38:06.2674347Z tmp4 = tmp0 ^ tmp2 2023-01-11T21:38:06.2674426Z tmp7 = tmp5 & tmp6 2023-01-11T21:38:06.2674569Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.2674704Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2674838Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2674965Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.2675053Z ''') 2023-01-11T21:38:06.2675058Z 2023-01-11T21:38:06.2675063Z 2023-01-11T21:38:06.2675160Z async_compile.wait(globals()) 2023-01-11T21:38:06.2675239Z del async_compile 2023-01-11T21:38:06.2675244Z 2023-01-11T21:38:06.2675322Z def call(args): 2023-01-11T21:38:06.2675409Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2675488Z args.clear() 2023-01-11T21:38:06.2675584Z with torch.cuda.device(0): 2023-01-11T21:38:06.2675777Z buf0 = empty_strided((2, 20), (20, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2676006Z buf1 = empty_strided((2, 20), (20, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2676208Z buf2 = empty_strided((2, 20), (20, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2676404Z buf3 = empty_strided((2, 20), (20, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2676500Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2676702Z triton_fused_bitwise_and_bitwise_not_bitwise_or_bitwise_xor_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.2676778Z del arg0_1 2023-01-11T21:38:06.2676846Z del arg1_1 2023-01-11T21:38:06.2676944Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.2676953Z 2023-01-11T21:38:06.2676958Z 2023-01-11T21:38:06.2677041Z if __name__ == "__main__": 2023-01-11T21:38:06.2677161Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2677291Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2677493Z arg0_1 = rand_strided((2, 20), (20, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.2677691Z arg1_1 = rand_strided((2, 20), (20, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.2677811Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2677817Z 2023-01-11T21:38:06.2677821Z 2023-01-11T21:38:06.2677920Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2677990Z import torch 2023-01-11T21:38:06.2678066Z import random 2023-01-11T21:38:06.2678186Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2678310Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2678315Z 
2023-01-11T21:38:06.2678398Z aten = torch.ops.aten 2023-01-11T21:38:06.2678564Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2678659Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2678665Z 2023-01-11T21:38:06.2678734Z import triton 2023-01-11T21:38:06.2678829Z import triton.language as tl 2023-01-11T21:38:06.2678957Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2679101Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2679107Z 2023-01-11T21:38:06.2679111Z 2023-01-11T21:38:06.2679338Z triton_fused_bitwise_and_bitwise_not_bitwise_or_bitwise_xor_0 = async_compile.triton(''' 2023-01-11T21:38:06.2679415Z import triton 2023-01-11T21:38:06.2679509Z import triton.language as tl 2023-01-11T21:38:06.2679625Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2679722Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2679857Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2679985Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2679993Z 2023-01-11T21:38:06.2680433Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i1', 1: '*i1', 2: '*i1', 3: '*i1', 4: '*i1', 5: '*i1', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.2680514Z @triton.jit 2023-01-11T21:38:06.2680688Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2680765Z xnumel = 40 2023-01-11T21:38:06.2680864Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2680990Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2681076Z xmask = xindex < xnumel 2023-01-11T21:38:06.2681148Z x0 = xindex 2023-01-11T21:38:06.2681346Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2681538Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2681639Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2681738Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.2681812Z tmp1 = tmp0 == 0 2023-01-11T21:38:06.2681895Z tmp3 = tmp0 | tmp2 2023-01-11T21:38:06.2681971Z tmp4 = tmp0 ^ tmp2 2023-01-11T21:38:06.2682079Z tmp7 = tmp5 & tmp6 2023-01-11T21:38:06.2682215Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.2682351Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2682483Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2682615Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.2682695Z ''') 2023-01-11T21:38:06.2682701Z 2023-01-11T21:38:06.2682705Z 2023-01-11T21:38:06.2682799Z async_compile.wait(globals()) 2023-01-11T21:38:06.2682879Z del async_compile 2023-01-11T21:38:06.2682884Z 2023-01-11T21:38:06.2682963Z def call(args): 2023-01-11T21:38:06.2683045Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2683122Z args.clear() 2023-01-11T21:38:06.2683216Z with torch.cuda.device(0): 2023-01-11T21:38:06.2683407Z buf0 = empty_strided((2, 20), (20, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2683605Z buf1 = empty_strided((2, 20), (20, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2683800Z buf2 = empty_strided((2, 20), (20, 1), device='cuda', dtype=torch.bool) 
2023-01-11T21:38:06.2683992Z buf3 = empty_strided((2, 20), (20, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2684087Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2684288Z triton_fused_bitwise_and_bitwise_not_bitwise_or_bitwise_xor_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.2684364Z del arg0_1 2023-01-11T21:38:06.2684438Z del arg1_1 2023-01-11T21:38:06.2684529Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.2684562Z 2023-01-11T21:38:06.2684574Z 2023-01-11T21:38:06.2684650Z if __name__ == "__main__": 2023-01-11T21:38:06.2684771Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2684899Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2685100Z arg0_1 = rand_strided((2, 20), (20, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.2685295Z arg1_1 = rand_strided((2, 20), (20, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.2685436Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2685443Z 2023-01-11T21:38:06.2685520Z ok (0.245s) 2023-01-11T21:38:06.2686001Z test_bitwise_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2686132Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2686393Z [2023-01-11 21:34:09,329] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 440 2023-01-11T21:38:06.2686663Z [2023-01-11 21:34:09,410] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 440 2023-01-11T21:38:06.2687080Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
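The two identical bool modules above (graphs 438 and 439) fuse all four bitwise ops of test_bitwise2 into a single kernel; note that on torch.bool, bitwise_not is lowered to a comparison with zero ("tmp1 = tmp0 == 0") rather than an integer complement. A sketch with the logged shapes:

import torch

a = torch.randint(0, 2, (2, 20), device="cuda", dtype=torch.bool)
b = torch.randint(0, 2, (2, 20), device="cuda", dtype=torch.bool)
not_a, a_or_b, a_xor_b, a_and_b = ~a, a | b, a ^ b, a & b  # buf0..buf3, one launch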
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2687214Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2687474Z [2023-01-11 21:34:09,430] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 441 2023-01-11T21:38:06.2687744Z [2023-01-11 21:34:09,444] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 441 2023-01-11T21:38:06.2687750Z 2023-01-11T21:38:06.2687850Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2687927Z import torch 2023-01-11T21:38:06.2688034Z import random 2023-01-11T21:38:06.2688150Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2688276Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2688281Z 2023-01-11T21:38:06.2688366Z aten = torch.ops.aten 2023-01-11T21:38:06.2688505Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2688601Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2688607Z 2023-01-11T21:38:06.2688687Z import triton 2023-01-11T21:38:06.2688782Z import triton.language as tl 2023-01-11T21:38:06.2688902Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2689046Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2689051Z 2023-01-11T21:38:06.2689055Z 2023-01-11T21:38:06.2689281Z triton_fused_bitwise_and_bitwise_not_bitwise_or_bitwise_xor_0 = async_compile.triton(''' 2023-01-11T21:38:06.2689358Z import triton 2023-01-11T21:38:06.2689452Z import triton.language as tl 2023-01-11T21:38:06.2689571Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2689675Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2689809Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2689930Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2689941Z 2023-01-11T21:38:06.2690386Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i32', 1: '*i32', 2: '*i32', 3: '*i32', 4: '*i32', 5: '*i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.2690489Z @triton.jit 2023-01-11T21:38:06.2690659Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2690736Z xnumel = 64 2023-01-11T21:38:06.2690839Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2690971Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2691058Z xmask = xindex < xnumel 2023-01-11T21:38:06.2691125Z x0 = xindex 2023-01-11T21:38:06.2691318Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2691510Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2691608Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2691705Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.2691782Z tmp1 = ~tmp0 2023-01-11T21:38:06.2691863Z tmp3 = tmp0 | tmp2 2023-01-11T21:38:06.2691937Z tmp4 = tmp0 ^ tmp2 2023-01-11T21:38:06.2692016Z tmp7 = tmp5 & tmp6 2023-01-11T21:38:06.2692158Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.2692294Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2692425Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], 
tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2692561Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.2692650Z ''') 2023-01-11T21:38:06.2692656Z 2023-01-11T21:38:06.2692660Z 2023-01-11T21:38:06.2692756Z async_compile.wait(globals()) 2023-01-11T21:38:06.2692829Z del async_compile 2023-01-11T21:38:06.2692840Z 2023-01-11T21:38:06.2692910Z def call(args): 2023-01-11T21:38:06.2692991Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2693068Z args.clear() 2023-01-11T21:38:06.2693162Z with torch.cuda.device(0): 2023-01-11T21:38:06.2693362Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.2693558Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.2693750Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.2693935Z buf3 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.2694031Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2694261Z triton_fused_bitwise_and_bitwise_not_bitwise_or_bitwise_xor_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2694339Z del arg0_1 2023-01-11T21:38:06.2694415Z del arg1_1 2023-01-11T21:38:06.2694618Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.2694624Z 2023-01-11T21:38:06.2694628Z 2023-01-11T21:38:06.2694710Z if __name__ == "__main__": 2023-01-11T21:38:06.2694828Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2694949Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2695148Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.2695346Z arg1_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.2695482Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2695489Z 2023-01-11T21:38:06.2695495Z 2023-01-11T21:38:06.2695601Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2695696Z import torch 2023-01-11T21:38:06.2695777Z import random 2023-01-11T21:38:06.2695890Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2696015Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2696020Z 2023-01-11T21:38:06.2696101Z aten = torch.ops.aten 2023-01-11T21:38:06.2696235Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2696329Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2696334Z 2023-01-11T21:38:06.2696405Z import triton 2023-01-11T21:38:06.2696497Z import triton.language as tl 2023-01-11T21:38:06.2696622Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2696801Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2696806Z 2023-01-11T21:38:06.2696818Z 2023-01-11T21:38:06.2697036Z triton_fused_bitwise_and_bitwise_not_bitwise_or_bitwise_xor_0 = async_compile.triton(''' 2023-01-11T21:38:06.2697112Z import triton 2023-01-11T21:38:06.2697268Z import triton.language as tl 2023-01-11T21:38:06.2697386Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2697488Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2697621Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2697747Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2697753Z 2023-01-11T21:38:06.2698199Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i32', 1: '*i32', 2: '*i32', 3: '*i32', 4: '*i32', 5: 
'*i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.2698275Z @triton.jit 2023-01-11T21:38:06.2698445Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2698521Z xnumel = 64 2023-01-11T21:38:06.2698621Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2698754Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2698839Z xmask = xindex < xnumel 2023-01-11T21:38:06.2698910Z x0 = xindex 2023-01-11T21:38:06.2699094Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2699284Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2699380Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2699475Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.2699549Z tmp1 = ~tmp0 2023-01-11T21:38:06.2699628Z tmp3 = tmp0 | tmp2 2023-01-11T21:38:06.2699709Z tmp4 = tmp0 ^ tmp2 2023-01-11T21:38:06.2699780Z tmp7 = tmp5 & tmp6 2023-01-11T21:38:06.2699921Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.2700056Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2700187Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2700354Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.2700442Z ''') 2023-01-11T21:38:06.2700448Z 2023-01-11T21:38:06.2700452Z 2023-01-11T21:38:06.2700545Z async_compile.wait(globals()) 2023-01-11T21:38:06.2700621Z del async_compile 2023-01-11T21:38:06.2700626Z 2023-01-11T21:38:06.2700694Z def call(args): 2023-01-11T21:38:06.2700777Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2700852Z args.clear() 2023-01-11T21:38:06.2700946Z with torch.cuda.device(0): 2023-01-11T21:38:06.2701141Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.2701334Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.2701527Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.2701709Z buf3 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.2701800Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2701997Z triton_fused_bitwise_and_bitwise_not_bitwise_or_bitwise_xor_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2702070Z del arg0_1 2023-01-11T21:38:06.2702145Z del arg1_1 2023-01-11T21:38:06.2702238Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.2702244Z 2023-01-11T21:38:06.2702248Z 2023-01-11T21:38:06.2702328Z if __name__ == "__main__": 2023-01-11T21:38:06.2702445Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2702566Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2702759Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.2702982Z arg1_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.2703100Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2703106Z 2023-01-11T21:38:06.2703175Z ok (0.135s) 2023-01-11T21:38:06.2703629Z test_bmm1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2703761Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2704021Z [2023-01-11 21:34:09,466] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 442 2023-01-11T21:38:06.2704283Z [2023-01-11 21:34:09,554] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 442 2023-01-11T21:38:06.2704708Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2704840Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2705087Z [2023-01-11 21:34:09,576] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 443 2023-01-11T21:38:06.2705092Z 2023-01-11T21:38:06.2705190Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2705265Z import torch 2023-01-11T21:38:06.2705339Z import random 2023-01-11T21:38:06.2705461Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2705587Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2705594Z 2023-01-11T21:38:06.2705691Z aten = torch.ops.aten 2023-01-11T21:38:06.2705838Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2705952Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2705957Z 2023-01-11T21:38:06.2706030Z import triton 2023-01-11T21:38:06.2706148Z import triton.language as tl 2023-01-11T21:38:06.2706276Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2706417Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2706422Z 2023-01-11T21:38:06.2706426Z 2023-01-11T21:38:06.2706582Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2706657Z import triton 2023-01-11T21:38:06.2706742Z import triton.language as tl 2023-01-11T21:38:06.2706857Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2706958Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2707092Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2707224Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2707229Z 2023-01-11T21:38:06.2707636Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2707710Z @triton.jit 2023-01-11T21:38:06.2707845Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2707913Z xnumel = 128 2023-01-11T21:38:06.2708009Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2708136Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2708219Z xmask = xindex < xnumel 2023-01-11T21:38:06.2708292Z x0 = xindex 2023-01-11T21:38:06.2708390Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2708461Z tmp1 = 1 2023-01-11T21:38:06.2708534Z tmp2 = tmp0 + tmp1 
2023-01-11T21:38:06.2708709Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2708799Z ''') 2023-01-11T21:38:06.2708805Z 2023-01-11T21:38:06.2708809Z 2023-01-11T21:38:06.2708969Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2709045Z import triton 2023-01-11T21:38:06.2709143Z import triton.language as tl 2023-01-11T21:38:06.2709262Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2709367Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2709496Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2709627Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2709633Z 2023-01-11T21:38:06.2710041Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2710119Z @triton.jit 2023-01-11T21:38:06.2710255Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2710332Z xnumel = 128 2023-01-11T21:38:06.2710430Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2710560Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2710638Z xmask = xindex < xnumel 2023-01-11T21:38:06.2710713Z x0 = xindex 2023-01-11T21:38:06.2710812Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2710887Z tmp1 = 2 2023-01-11T21:38:06.2710969Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2711102Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2711191Z ''') 2023-01-11T21:38:06.2711197Z 2023-01-11T21:38:06.2711201Z 2023-01-11T21:38:06.2711353Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.2711429Z import triton 2023-01-11T21:38:06.2711524Z import triton.language as tl 2023-01-11T21:38:06.2711640Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2711747Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2711883Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2712009Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2712015Z 2023-01-11T21:38:06.2712444Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2712515Z @triton.jit 2023-01-11T21:38:06.2712641Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2712717Z xnumel = 128 2023-01-11T21:38:06.2712814Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2712945Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2713030Z xmask = xindex < xnumel 2023-01-11T21:38:06.2713103Z x0 = xindex 2023-01-11T21:38:06.2713202Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2713279Z tmp1 = 3 2023-01-11T21:38:06.2713358Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2713499Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2713585Z ''') 2023-01-11T21:38:06.2713591Z 2023-01-11T21:38:06.2713595Z 2023-01-11T21:38:06.2713692Z async_compile.wait(globals()) 2023-01-11T21:38:06.2713772Z del async_compile 2023-01-11T21:38:06.2713777Z 2023-01-11T21:38:06.2713848Z def call(args): 2023-01-11T21:38:06.2713931Z arg0_1, arg1_1 = args 
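# [editor annotation] Generated wrapper for the bmm test (graph 443): `call` allocates outputs
# with empty_strided, dispatches aten.bmm.out for the matmuls plus the three fused add kernels
# above on stream0, `del`s each input as soon as it is dead, and reuses buf3's storage for buf4.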
2023-01-11T21:38:06.2714006Z args.clear() 2023-01-11T21:38:06.2714101Z with torch.cuda.device(0): 2023-01-11T21:38:06.2714312Z buf0 = empty_strided((2, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2714422Z aten.bmm.out(arg0_1, arg1_1, out=buf0) 2023-01-11T21:38:06.2714624Z buf1 = empty_strided((2, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2714718Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2714882Z triton_fused_add_0.run(arg0_1, buf1, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.2714956Z del arg0_1 2023-01-11T21:38:06.2715160Z buf2 = empty_strided((2, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2715330Z triton_fused_add_1_1.run(arg1_1, buf2, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.2715416Z del arg1_1 2023-01-11T21:38:06.2715633Z buf3 = empty_strided((2, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2715736Z aten.bmm.out(buf1, buf2, out=buf3) 2023-01-11T21:38:06.2715804Z del buf1 2023-01-11T21:38:06.2715876Z del buf2 2023-01-11T21:38:06.2715968Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:06.2716103Z triton_fused_add_2_2.run(buf4, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.2716189Z return (buf0, buf4, ) 2023-01-11T21:38:06.2716194Z 2023-01-11T21:38:06.2716199Z 2023-01-11T21:38:06.2716284Z if __name__ == "__main__": 2023-01-11T21:38:06.2716404Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2716532Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2716737Z arg0_1 = rand_strided((2, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2716949Z arg1_1 = rand_strided((2, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2717071Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2717077Z 2023-01-11T21:38:06.2717346Z [2023-01-11 21:34:09,750] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 443 2023-01-11T21:38:06.2717352Z 2023-01-11T21:38:06.2717450Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2717526Z import torch 2023-01-11T21:38:06.2717602Z import random 2023-01-11T21:38:06.2717722Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2717841Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2717849Z 2023-01-11T21:38:06.2717932Z aten = torch.ops.aten 2023-01-11T21:38:06.2718068Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2718165Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2718171Z 2023-01-11T21:38:06.2718249Z import triton 2023-01-11T21:38:06.2718343Z import triton.language as tl 2023-01-11T21:38:06.2718500Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2718635Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2718647Z 2023-01-11T21:38:06.2718651Z 2023-01-11T21:38:06.2718803Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2718880Z import triton 2023-01-11T21:38:06.2718973Z import triton.language as tl 2023-01-11T21:38:06.2719089Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2719193Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2719327Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2719456Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2719461Z 2023-01-11T21:38:06.2719865Z 
@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2719936Z @triton.jit 2023-01-11T21:38:06.2720071Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2720148Z xnumel = 128 2023-01-11T21:38:06.2720247Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2720378Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2720464Z xmask = xindex < xnumel 2023-01-11T21:38:06.2720537Z x0 = xindex 2023-01-11T21:38:06.2720629Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2720701Z tmp1 = 1 2023-01-11T21:38:06.2720783Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2720919Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2721034Z ''') 2023-01-11T21:38:06.2721040Z 2023-01-11T21:38:06.2721044Z 2023-01-11T21:38:06.2721205Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2721284Z import triton 2023-01-11T21:38:06.2721372Z import triton.language as tl 2023-01-11T21:38:06.2721490Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2721593Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2721726Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2721854Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2721859Z 2023-01-11T21:38:06.2722262Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2722340Z @triton.jit 2023-01-11T21:38:06.2722474Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2722547Z xnumel = 80 2023-01-11T21:38:06.2722647Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2722777Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2722864Z xmask = xindex < xnumel 2023-01-11T21:38:06.2722937Z x0 = xindex 2023-01-11T21:38:06.2723043Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2723116Z tmp1 = 2 2023-01-11T21:38:06.2723191Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2723331Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2723418Z ''') 2023-01-11T21:38:06.2723424Z 2023-01-11T21:38:06.2723428Z 2023-01-11T21:38:06.2723586Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.2723662Z import triton 2023-01-11T21:38:06.2723757Z import triton.language as tl 2023-01-11T21:38:06.2723871Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2723975Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2724104Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2724230Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2724235Z 2023-01-11T21:38:06.2724669Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2724747Z @triton.jit 2023-01-11T21:38:06.2724876Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2724953Z xnumel = 160 2023-01-11T21:38:06.2725051Z xoffset = 
tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2725181Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2725259Z xmask = xindex < xnumel 2023-01-11T21:38:06.2725332Z x0 = xindex 2023-01-11T21:38:06.2725437Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2725509Z tmp1 = 3 2023-01-11T21:38:06.2725591Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2725728Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2725809Z ''') 2023-01-11T21:38:06.2725821Z 2023-01-11T21:38:06.2725825Z 2023-01-11T21:38:06.2725914Z async_compile.wait(globals()) 2023-01-11T21:38:06.2725993Z del async_compile 2023-01-11T21:38:06.2726000Z 2023-01-11T21:38:06.2726077Z def call(args): 2023-01-11T21:38:06.2726159Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2726235Z args.clear() 2023-01-11T21:38:06.2726329Z with torch.cuda.device(0): 2023-01-11T21:38:06.2726545Z buf0 = empty_strided((1, 16, 10), (160, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2726712Z aten.mm.out(as_strided(arg0_1, (16, 8), (8, 1)), as_strided(arg1_1, (8, 10), (10, 1)), out=as_strided(buf0, (16, 10), (10, 1))) 2023-01-11T21:38:06.2726926Z buf1 = empty_strided((1, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2727051Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2727193Z triton_fused_add_0.run(arg0_1, buf1, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.2727269Z del arg0_1 2023-01-11T21:38:06.2727476Z buf2 = empty_strided((1, 8, 10), (80, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2727616Z triton_fused_add_1_1.run(arg1_1, buf2, 80, grid=grid(80), stream=stream0) 2023-01-11T21:38:06.2727692Z del arg1_1 2023-01-11T21:38:06.2727895Z buf3 = empty_strided((1, 16, 10), (160, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2728065Z aten.mm.out(as_strided(buf1, (16, 8), (8, 1)), as_strided(buf2, (8, 10), (10, 1)), out=as_strided(buf3, (16, 10), (10, 1))) 2023-01-11T21:38:06.2728140Z del buf1 2023-01-11T21:38:06.2728211Z del buf2 2023-01-11T21:38:06.2728305Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:06.2728438Z triton_fused_add_2_2.run(buf4, 160, grid=grid(160), stream=stream0) 2023-01-11T21:38:06.2728527Z return (buf0, buf4, ) 2023-01-11T21:38:06.2728532Z 2023-01-11T21:38:06.2728536Z 2023-01-11T21:38:06.2728619Z if __name__ == "__main__": 2023-01-11T21:38:06.2728733Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2728863Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2729077Z arg0_1 = rand_strided((1, 16, 8), (128, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2729288Z arg1_1 = rand_strided((1, 8, 10), (80, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2729412Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2729417Z 2023-01-11T21:38:06.2729491Z ok (0.307s) 2023-01-11T21:38:06.2729945Z test_bmm2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2730081Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2730368Z [2023-01-11 21:34:09,766] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 444 2023-01-11T21:38:06.2730626Z [2023-01-11 21:34:09,773] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 444 2023-01-11T21:38:06.2730637Z 2023-01-11T21:38:06.2730731Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2730807Z import torch 2023-01-11T21:38:06.2730882Z import random 2023-01-11T21:38:06.2731004Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2731128Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2731133Z 2023-01-11T21:38:06.2731217Z aten = torch.ops.aten 2023-01-11T21:38:06.2731357Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2731451Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2731456Z 2023-01-11T21:38:06.2731531Z import triton 2023-01-11T21:38:06.2731625Z import triton.language as tl 2023-01-11T21:38:06.2731752Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2731895Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2731900Z 2023-01-11T21:38:06.2731905Z 2023-01-11T21:38:06.2731999Z async_compile.wait(globals()) 2023-01-11T21:38:06.2732080Z del async_compile 2023-01-11T21:38:06.2732085Z 2023-01-11T21:38:06.2732161Z def call(args): 2023-01-11T21:38:06.2732237Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2732317Z args.clear() 2023-01-11T21:38:06.2732412Z with torch.cuda.device(0): 2023-01-11T21:38:06.2732621Z buf0 = empty_strided((1, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2732792Z aten.mm.out(as_strided(arg0_1, (8, 8), (1, 8)), as_strided(arg1_1, (8, 8), (8, 1)), out=as_strided(buf0, (8, 8), (8, 1))) 2023-01-11T21:38:06.2732896Z del arg0_1 2023-01-11T21:38:06.2732969Z del arg1_1 2023-01-11T21:38:06.2733042Z return (buf0, ) 2023-01-11T21:38:06.2733048Z 2023-01-11T21:38:06.2733056Z 2023-01-11T21:38:06.2733131Z if __name__ == "__main__": 2023-01-11T21:38:06.2733254Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2733382Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2733594Z arg0_1 = rand_strided((1, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2733804Z arg1_1 = rand_strided((1, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2733924Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2733929Z 2023-01-11T21:38:06.2734004Z ok (0.022s) 2023-01-11T21:38:06.2734456Z test_bool_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2734773Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2735036Z [2023-01-11 21:34:09,805] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 445 2023-01-11T21:38:06.2735301Z [2023-01-11 21:34:10,025] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 445 2023-01-11T21:38:06.2735716Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2735850Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2736106Z [2023-01-11 21:34:10,056] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 446 2023-01-11T21:38:06.2736412Z [2023-01-11 21:34:10,086] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 446 2023-01-11T21:38:06.2736418Z 2023-01-11T21:38:06.2736515Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2736590Z import torch 2023-01-11T21:38:06.2736665Z import random 2023-01-11T21:38:06.2736778Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2736901Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2736907Z 2023-01-11T21:38:06.2736988Z aten = torch.ops.aten 2023-01-11T21:38:06.2737174Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2737284Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2737290Z 2023-01-11T21:38:06.2737374Z import triton 2023-01-11T21:38:06.2737471Z import triton.language as tl 2023-01-11T21:38:06.2737591Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2737733Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2737738Z 2023-01-11T21:38:06.2737743Z 2023-01-11T21:38:06.2738030Z triton_fused_add_bitwise_and_bitwise_or_bitwise_xor_logical_and_logical_not_logical_or_mul_sign_0 = async_compile.triton(''' 2023-01-11T21:38:06.2738106Z import triton 2023-01-11T21:38:06.2738197Z import triton.language as tl 2023-01-11T21:38:06.2738310Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2738412Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2738545Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2738665Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2738676Z 2023-01-11T21:38:06.2739172Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i1', 1: '*i1', 2: '*i1', 3: '*i1', 4: '*i1', 5: '*i1', 6: '*i1', 7: '*i1', 8: '*i1', 9: '*i1', 10: '*i1', 11: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10), equal_to_1=())]}) 2023-01-11T21:38:06.2739287Z @triton.jit 2023-01-11T21:38:06.2739498Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2739575Z xnumel = 4 2023-01-11T21:38:06.2739676Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2739809Z xindex = xoffset + tl.reshape(tl.arange(0, 
XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2739893Z xmask = xindex < xnumel 2023-01-11T21:38:06.2739967Z x0 = xindex 2023-01-11T21:38:06.2740155Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2740346Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2740445Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2740546Z tmp7 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.2740628Z tmp2 = tmp0 | tmp1 2023-01-11T21:38:06.2740708Z tmp3 = tmp0 & tmp1 2023-01-11T21:38:06.2740788Z tmp4 = tmp0 ^ tmp1 2023-01-11T21:38:06.2740859Z tmp6 = tmp5 == 0 2023-01-11T21:38:06.2740999Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2741135Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2741270Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2741402Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2741532Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2741661Z tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2741789Z tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2741917Z tl.store(out_ptr7 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.2742044Z tl.store(out_ptr8 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.2742133Z ''') 2023-01-11T21:38:06.2742138Z 2023-01-11T21:38:06.2742143Z 2023-01-11T21:38:06.2742266Z async_compile.wait(globals()) 2023-01-11T21:38:06.2742351Z del async_compile 2023-01-11T21:38:06.2742356Z 2023-01-11T21:38:06.2742431Z def call(args): 2023-01-11T21:38:06.2742513Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2742592Z args.clear() 2023-01-11T21:38:06.2742681Z with torch.cuda.device(0): 2023-01-11T21:38:06.2742873Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2743064Z buf1 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2743251Z buf2 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2743439Z buf3 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2743626Z buf4 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2743812Z buf5 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2743987Z buf6 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2744178Z buf7 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2744359Z buf8 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2744457Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2744708Z triton_fused_add_bitwise_and_bitwise_or_bitwise_xor_logical_and_logical_not_logical_or_mul_sign_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.2744784Z del arg0_1 2023-01-11T21:38:06.2744859Z del arg1_1 2023-01-11T21:38:06.2744984Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, ) 2023-01-11T21:38:06.2745019Z 2023-01-11T21:38:06.2745023Z 2023-01-11T21:38:06.2745103Z if __name__ == "__main__": 2023-01-11T21:38:06.2745218Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2745347Z from torch._inductor.utils import print_performance 
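# [editor annotation] Every dump closes with this self-contained benchmark harness: rand_strided
# rebuilds inputs with the recorded shape, stride, device and dtype, and print_performance
# times the generated call() wrapper in isolation.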
2023-01-11T21:38:06.2745542Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.2745734Z arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.2745856Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2745861Z 2023-01-11T21:38:06.2745865Z 2023-01-11T21:38:06.2745965Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2746041Z import torch 2023-01-11T21:38:06.2746112Z import random 2023-01-11T21:38:06.2746234Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2746359Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2746367Z 2023-01-11T21:38:06.2746451Z aten = torch.ops.aten 2023-01-11T21:38:06.2746589Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2746688Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2746693Z 2023-01-11T21:38:06.2746768Z import triton 2023-01-11T21:38:06.2746862Z import triton.language as tl 2023-01-11T21:38:06.2746985Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2747127Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2747132Z 2023-01-11T21:38:06.2747137Z 2023-01-11T21:38:06.2747419Z triton_fused_add_bitwise_and_bitwise_or_bitwise_xor_logical_and_logical_not_logical_or_mul_sign_0 = async_compile.triton(''' 2023-01-11T21:38:06.2747497Z import triton 2023-01-11T21:38:06.2747591Z import triton.language as tl 2023-01-11T21:38:06.2747708Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2747810Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2747946Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2748069Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2748074Z 2023-01-11T21:38:06.2748607Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i1', 1: '*i1', 2: '*i1', 3: '*i1', 4: '*i1', 5: '*i1', 6: '*i1', 7: '*i1', 8: '*i1', 9: '*i1', 10: '*i1', 11: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10), equal_to_1=())]}) 2023-01-11T21:38:06.2748686Z @triton.jit 2023-01-11T21:38:06.2748894Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2748970Z xnumel = 4 2023-01-11T21:38:06.2749072Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2749205Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2749292Z xmask = xindex < xnumel 2023-01-11T21:38:06.2749369Z x0 = xindex 2023-01-11T21:38:06.2749555Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2749747Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2749846Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2749947Z tmp7 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.2750030Z tmp2 = tmp0 | tmp1 2023-01-11T21:38:06.2750110Z tmp3 = tmp0 & tmp1 2023-01-11T21:38:06.2750190Z tmp4 = tmp0 ^ tmp1 2023-01-11T21:38:06.2750261Z tmp6 = tmp5 == 0 2023-01-11T21:38:06.2750397Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2750533Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2750668Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 
2023-01-11T21:38:06.2750799Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2750977Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2751107Z tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2751231Z tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2751359Z tl.store(out_ptr7 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.2751487Z tl.store(out_ptr8 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.2751576Z ''') 2023-01-11T21:38:06.2751582Z 2023-01-11T21:38:06.2751586Z 2023-01-11T21:38:06.2751683Z async_compile.wait(globals()) 2023-01-11T21:38:06.2751764Z del async_compile 2023-01-11T21:38:06.2751769Z 2023-01-11T21:38:06.2751847Z def call(args): 2023-01-11T21:38:06.2751928Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2751999Z args.clear() 2023-01-11T21:38:06.2752094Z with torch.cuda.device(0): 2023-01-11T21:38:06.2752288Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2752483Z buf1 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2752671Z buf2 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2752858Z buf3 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2753046Z buf4 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2753231Z buf5 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2753408Z buf6 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2753590Z buf7 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2753773Z buf8 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2753868Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2754118Z triton_fused_add_bitwise_and_bitwise_or_bitwise_xor_logical_and_logical_not_logical_or_mul_sign_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.2754199Z del arg0_1 2023-01-11T21:38:06.2754275Z del arg1_1 2023-01-11T21:38:06.2754400Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, ) 2023-01-11T21:38:06.2754406Z 2023-01-11T21:38:06.2754439Z 2023-01-11T21:38:06.2754517Z if __name__ == "__main__": 2023-01-11T21:38:06.2754639Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2754766Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2754957Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.2755163Z arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.2755299Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2755305Z 2023-01-11T21:38:06.2755403Z ok (0.314s) 2023-01-11T21:38:06.2755735Z test_both_scalars_cuda (__main__.CudaTests) ... 
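[editor annotation] The kernel_cpp_0 dumps under test_both_scalars_cuda come from a function whose inputs are both Python scalars, so Inductor emits a plain C++ CPU kernel instead of a Triton one. A minimal sketch of the kind of function involved (a hypothetical reconstruction, not the test's actual harness; the real body lives in test/inductor/test_torchinductor.py):

    import torch

    def fn(a, b):
        # six scalar results, mirroring out_ptr0..out_ptr5 in kernel_cpp_0 below
        return a + b, b + a, a - b, b - a, a * b, b * a

    # scalar-only inputs: dynamo still traces the function, and inductor
    # lowers it to a CPU C++ kernel like the one dumped below
    print(torch.compile(fn)(4, 3.3))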
[2023-01-11 21:34:10,142] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 447 2023-01-11T21:38:06.2756000Z [2023-01-11 21:34:10,158] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 447 2023-01-11T21:38:06.2756258Z [2023-01-11 21:34:10,212] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 448 2023-01-11T21:38:06.2756524Z [2023-01-11 21:34:10,228] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 448 2023-01-11T21:38:06.2756529Z 2023-01-11T21:38:06.2756632Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2756709Z import torch 2023-01-11T21:38:06.2756788Z import random 2023-01-11T21:38:06.2756911Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2757035Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2757041Z 2023-01-11T21:38:06.2757119Z aten = torch.ops.aten 2023-01-11T21:38:06.2757285Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2757384Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2757389Z 2023-01-11T21:38:06.2757465Z import triton 2023-01-11T21:38:06.2757560Z import triton.language as tl 2023-01-11T21:38:06.2757687Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2757830Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2757836Z 2023-01-11T21:38:06.2757840Z 2023-01-11T21:38:06.2757982Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.2758185Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.2758306Z extern "C" void kernel(float* __restrict__ out_ptr0, 2023-01-11T21:38:06.2758411Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.2758512Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.2758616Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.2758721Z float* __restrict__ out_ptr4, 2023-01-11T21:38:06.2758821Z float* __restrict__ out_ptr5) 2023-01-11T21:38:06.2758889Z { 2023-01-11T21:38:06.2758950Z { 2023-01-11T21:38:06.2759020Z { 2023-01-11T21:38:06.2759126Z auto tmp0 = static_cast<float>(4); 2023-01-11T21:38:06.2759239Z auto tmp1 = static_cast<float>(3.3); 2023-01-11T21:38:06.2759334Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.2759421Z out_ptr0[0] = tmp2; 2023-01-11T21:38:06.2759483Z } 2023-01-11T21:38:06.2759552Z } 2023-01-11T21:38:06.2759620Z { 2023-01-11T21:38:06.2759687Z { 2023-01-11T21:38:06.2759796Z auto tmp0 = static_cast<float>(3.3); 2023-01-11T21:38:06.2759902Z auto tmp1 = static_cast<float>(4); 2023-01-11T21:38:06.2759994Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.2760073Z out_ptr1[0] = tmp2; 2023-01-11T21:38:06.2760141Z } 2023-01-11T21:38:06.2760211Z } 2023-01-11T21:38:06.2760277Z { 2023-01-11T21:38:06.2760344Z { 2023-01-11T21:38:06.2760449Z auto tmp0 = static_cast<float>(4); 2023-01-11T21:38:06.2760557Z auto tmp1 = static_cast<float>(3.3); 2023-01-11T21:38:06.2760683Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:06.2760798Z out_ptr2[0] = tmp2; 2023-01-11T21:38:06.2760867Z } 2023-01-11T21:38:06.2760934Z } 2023-01-11T21:38:06.2761002Z { 2023-01-11T21:38:06.2761071Z { 2023-01-11T21:38:06.2761171Z auto tmp0 = static_cast<float>(3.3); 2023-01-11T21:38:06.2761275Z auto tmp1 = static_cast<float>(4); 2023-01-11T21:38:06.2761406Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:06.2761489Z out_ptr3[0] = tmp2; 2023-01-11T21:38:06.2761559Z } 2023-01-11T21:38:06.2761627Z } 2023-01-11T21:38:06.2761694Z { 2023-01-11T21:38:06.2761755Z { 2023-01-11T21:38:06.2761858Z auto tmp0
= static_cast<float>(4); 2023-01-11T21:38:06.2761969Z auto tmp1 = static_cast<float>(3.3); 2023-01-11T21:38:06.2762059Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.2762144Z out_ptr4[0] = tmp2; 2023-01-11T21:38:06.2762212Z } 2023-01-11T21:38:06.2762281Z } 2023-01-11T21:38:06.2762342Z { 2023-01-11T21:38:06.2762409Z { 2023-01-11T21:38:06.2762517Z auto tmp0 = static_cast<float>(3.3); 2023-01-11T21:38:06.2762620Z auto tmp1 = static_cast<float>(4); 2023-01-11T21:38:06.2762714Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.2762800Z out_ptr5[0] = tmp2; 2023-01-11T21:38:06.2762863Z } 2023-01-11T21:38:06.2762930Z } 2023-01-11T21:38:06.2762996Z } 2023-01-11T21:38:06.2763081Z ''') 2023-01-11T21:38:06.2763087Z 2023-01-11T21:38:06.2763091Z 2023-01-11T21:38:06.2763185Z async_compile.wait(globals()) 2023-01-11T21:38:06.2763265Z del async_compile 2023-01-11T21:38:06.2763271Z 2023-01-11T21:38:06.2763345Z def call(args): 2023-01-11T21:38:06.2763608Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2763785Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2763966Z buf2 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2764150Z buf3 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2764327Z buf4 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2764505Z buf5 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2764743Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr())) 2023-01-11T21:38:06.2764854Z return (buf0, buf1, buf2, buf3, buf4, buf5, ) 2023-01-11T21:38:06.2764859Z 2023-01-11T21:38:06.2764864Z 2023-01-11T21:38:06.2764947Z if __name__ == "__main__": 2023-01-11T21:38:06.2765062Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2765194Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2765298Z print_performance(lambda: call([])) 2023-01-11T21:38:06.2765304Z 2023-01-11T21:38:06.2765308Z 2023-01-11T21:38:06.2765409Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2765485Z import torch 2023-01-11T21:38:06.2765563Z import random 2023-01-11T21:38:06.2765684Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2765809Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2765814Z 2023-01-11T21:38:06.2765891Z aten = torch.ops.aten 2023-01-11T21:38:06.2766029Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2766127Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2766132Z 2023-01-11T21:38:06.2766210Z import triton 2023-01-11T21:38:06.2766303Z import triton.language as tl 2023-01-11T21:38:06.2766431Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2766575Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2766580Z 2023-01-11T21:38:06.2766585Z 2023-01-11T21:38:06.2766724Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.2766926Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.2767074Z extern "C" void kernel(float* __restrict__ out_ptr0, 2023-01-11T21:38:06.2767183Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.2767287Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.2767389Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.2767493Z float* __restrict__ out_ptr4,
2023-01-11T21:38:06.2767594Z float* __restrict__ out_ptr5) 2023-01-11T21:38:06.2767655Z { 2023-01-11T21:38:06.2767723Z { 2023-01-11T21:38:06.2767791Z { 2023-01-11T21:38:06.2767897Z auto tmp0 = static_cast<float>(4); 2023-01-11T21:38:06.2768011Z auto tmp1 = static_cast<float>(3.3); 2023-01-11T21:38:06.2768105Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.2768191Z out_ptr0[0] = tmp2; 2023-01-11T21:38:06.2768253Z } 2023-01-11T21:38:06.2768322Z } 2023-01-11T21:38:06.2768388Z { 2023-01-11T21:38:06.2768455Z { 2023-01-11T21:38:06.2768565Z auto tmp0 = static_cast<float>(3.3); 2023-01-11T21:38:06.2768670Z auto tmp1 = static_cast<float>(4); 2023-01-11T21:38:06.2768764Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.2768842Z out_ptr1[0] = tmp2; 2023-01-11T21:38:06.2768910Z } 2023-01-11T21:38:06.2768975Z } 2023-01-11T21:38:06.2769043Z { 2023-01-11T21:38:06.2769111Z { 2023-01-11T21:38:06.2769214Z auto tmp0 = static_cast<float>(4); 2023-01-11T21:38:06.2769316Z auto tmp1 = static_cast<float>(3.3); 2023-01-11T21:38:06.2769449Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:06.2769532Z out_ptr2[0] = tmp2; 2023-01-11T21:38:06.2769629Z } 2023-01-11T21:38:06.2769698Z } 2023-01-11T21:38:06.2769765Z { 2023-01-11T21:38:06.2769833Z { 2023-01-11T21:38:06.2769933Z auto tmp0 = static_cast<float>(3.3); 2023-01-11T21:38:06.2770038Z auto tmp1 = static_cast<float>(4); 2023-01-11T21:38:06.2770174Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:06.2770259Z out_ptr3[0] = tmp2; 2023-01-11T21:38:06.2770327Z } 2023-01-11T21:38:06.2770394Z } 2023-01-11T21:38:06.2770461Z { 2023-01-11T21:38:06.2770522Z { 2023-01-11T21:38:06.2770626Z auto tmp0 = static_cast<float>(4); 2023-01-11T21:38:06.2770732Z auto tmp1 = static_cast<float>(3.3); 2023-01-11T21:38:06.2770823Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.2770909Z out_ptr4[0] = tmp2; 2023-01-11T21:38:06.2770976Z } 2023-01-11T21:38:06.2771037Z } 2023-01-11T21:38:06.2771104Z { 2023-01-11T21:38:06.2771170Z { 2023-01-11T21:38:06.2771278Z auto tmp0 = static_cast<float>(3.3); 2023-01-11T21:38:06.2771382Z auto tmp1 = static_cast<float>(4); 2023-01-11T21:38:06.2771471Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.2771555Z out_ptr5[0] = tmp2; 2023-01-11T21:38:06.2771617Z } 2023-01-11T21:38:06.2771684Z } 2023-01-11T21:38:06.2771752Z } 2023-01-11T21:38:06.2771839Z ''') 2023-01-11T21:38:06.2771844Z 2023-01-11T21:38:06.2771849Z 2023-01-11T21:38:06.2771947Z async_compile.wait(globals()) 2023-01-11T21:38:06.2772026Z del async_compile 2023-01-11T21:38:06.2772031Z 2023-01-11T21:38:06.2772108Z def call(args): 2023-01-11T21:38:06.2772290Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2772473Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2772652Z buf2 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2772831Z buf3 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2773012Z buf4 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2773187Z buf5 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2773450Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr())) 2023-01-11T21:38:06.2773557Z return (buf0, buf1, buf2, buf3, buf4, buf5, ) 2023-01-11T21:38:06.2773563Z 2023-01-11T21:38:06.2773567Z 2023-01-11T21:38:06.2773648Z if __name__ == "__main__": 2023-01-11T21:38:06.2773760Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2773888Z from torch._inductor.utils import
print_performance 2023-01-11T21:38:06.2773989Z print_performance(lambda: call([])) 2023-01-11T21:38:06.2773995Z 2023-01-11T21:38:06.2774067Z ok (0.141s) 2023-01-11T21:38:06.2774632Z test_cat_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2774770Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2775033Z [2023-01-11 21:34:10,276] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 449 2023-01-11T21:38:06.2775295Z [2023-01-11 21:34:10,525] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 449 2023-01-11T21:38:06.2775712Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2775920Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2776169Z [2023-01-11 21:34:10,575] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 450 2023-01-11T21:38:06.2776183Z 2023-01-11T21:38:06.2776276Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2776351Z import torch 2023-01-11T21:38:06.2776425Z import random 2023-01-11T21:38:06.2776543Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2776665Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2776671Z 2023-01-11T21:38:06.2776751Z aten = torch.ops.aten 2023-01-11T21:38:06.2776886Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2776976Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2776981Z 2023-01-11T21:38:06.2777054Z import triton 2023-01-11T21:38:06.2777208Z import triton.language as tl 2023-01-11T21:38:06.2777365Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2777506Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2777511Z 2023-01-11T21:38:06.2777516Z 2023-01-11T21:38:06.2777740Z triton_fused_add_1_cat_cat_1_cat_2_convert_element_type_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.2777819Z import triton 2023-01-11T21:38:06.2777911Z import triton.language as tl 2023-01-11T21:38:06.2778018Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2778122Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2778255Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2778381Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2778386Z 2023-01-11T21:38:06.2778866Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp64', 6: '*fp64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.2778943Z @triton.jit 2023-01-11T21:38:06.2779124Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, 
out_ptr4, out_ptr5, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2779195Z xnumel = 128 2023-01-11T21:38:06.2779326Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2779460Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2779540Z xmask = xindex < xnumel 2023-01-11T21:38:06.2779610Z x2 = xindex 2023-01-11T21:38:06.2779684Z x0 = xindex % 16 2023-01-11T21:38:06.2779764Z x1 = (xindex // 16) 2023-01-11T21:38:06.2779961Z tmp0 = tl.load(in_ptr0 + (x2), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2780053Z tmp3 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.2780124Z tmp1 = 2 2023-01-11T21:38:06.2780202Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2780280Z tmp4 = tmp3 * tmp1 2023-01-11T21:38:06.2780371Z tmp5 = tmp4.to(tl.float64) 2023-01-11T21:38:06.2780515Z tl.store(out_ptr0 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2780650Z tl.store(out_ptr1 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2780783Z tl.store(out_ptr2 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2780910Z tl.store(out_ptr3 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2781040Z tl.store(out_ptr4 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.2781171Z tl.store(out_ptr5 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.2781256Z ''') 2023-01-11T21:38:06.2781262Z 2023-01-11T21:38:06.2781266Z 2023-01-11T21:38:06.2781445Z triton_fused_add_slice_1_slice_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.2781520Z import triton 2023-01-11T21:38:06.2781613Z import triton.language as tl 2023-01-11T21:38:06.2781721Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2781856Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2781990Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2782114Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2782120Z 2023-01-11T21:38:06.2782530Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2782603Z @triton.jit 2023-01-11T21:38:06.2782737Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2782810Z xnumel = 32 2023-01-11T21:38:06.2782899Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2783032Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2783118Z xmask = xindex < xnumel 2023-01-11T21:38:06.2783192Z x0 = xindex % 4 2023-01-11T21:38:06.2783273Z x1 = (xindex // 4) 2023-01-11T21:38:06.2783379Z tmp0 = tl.load(in_ptr0 + (x0 + (16*x1)), xmask) 2023-01-11T21:38:06.2783451Z tmp1 = 1 2023-01-11T21:38:06.2783523Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2783663Z tl.store(out_ptr0 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2783749Z ''') 2023-01-11T21:38:06.2783757Z 2023-01-11T21:38:06.2783762Z 2023-01-11T21:38:06.2783853Z async_compile.wait(globals()) 2023-01-11T21:38:06.2783932Z del async_compile 2023-01-11T21:38:06.2783937Z 2023-01-11T21:38:06.2784011Z def call(args): 2023-01-11T21:38:06.2784082Z arg0_1, = args 2023-01-11T21:38:06.2784157Z args.clear() 2023-01-11T21:38:06.2784244Z with torch.cuda.device(0): 2023-01-11T21:38:06.2784444Z buf3 = empty_strided((8, 36), (36, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.2784553Z buf0 = as_strided(buf3, (8, 16), (36, 1)) # alias 2023-01-11T21:38:06.2784667Z buf2 = as_strided(buf3, (8, 16), (36, 1), 20) # alias 2023-01-11T21:38:06.2784876Z buf6 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2784985Z buf4 = as_strided(buf6, (8, 16), (16, 1)) # alias 2023-01-11T21:38:06.2785096Z buf5 = as_strided(buf6, (8, 16), (16, 1), 128) # alias 2023-01-11T21:38:06.2785318Z buf9 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.2785452Z buf7 = as_strided(buf9, (8, 16), (16, 1)) # alias 2023-01-11T21:38:06.2785572Z buf8 = as_strided(buf9, (8, 16), (16, 1), 128) # alias 2023-01-11T21:38:06.2785681Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2785886Z triton_fused_add_1_cat_cat_1_cat_2_convert_element_type_mul_0.run(arg0_1, buf0, buf2, buf4, buf5, buf7, buf8, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.2785999Z buf1 = as_strided(buf3, (8, 4), (36, 1), 16) # alias 2023-01-11T21:38:06.2786149Z triton_fused_add_slice_1_slice_2_1.run(arg0_1, buf1, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.2786225Z del arg0_1 2023-01-11T21:38:06.2786309Z return (buf3, buf6, buf9, ) 2023-01-11T21:38:06.2786321Z 2023-01-11T21:38:06.2786325Z 2023-01-11T21:38:06.2786399Z if __name__ == "__main__": 2023-01-11T21:38:06.2786516Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2786644Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2786848Z arg0_1 = rand_strided((8, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2786959Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2786964Z 2023-01-11T21:38:06.2787227Z [2023-01-11 21:34:10,712] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 450 2023-01-11T21:38:06.2787646Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2787806Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2788061Z [2023-01-11 21:34:10,760] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 451 2023-01-11T21:38:06.2788067Z 2023-01-11T21:38:06.2788157Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2788231Z import torch 2023-01-11T21:38:06.2788306Z import random 2023-01-11T21:38:06.2788424Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2788549Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2788554Z 2023-01-11T21:38:06.2788635Z aten = torch.ops.aten 2023-01-11T21:38:06.2788776Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2788865Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2788877Z 2023-01-11T21:38:06.2788948Z import triton 2023-01-11T21:38:06.2789040Z import triton.language as tl 2023-01-11T21:38:06.2789164Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2789307Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2789312Z 2023-01-11T21:38:06.2789317Z 2023-01-11T21:38:06.2789542Z triton_fused_add_1_cat_cat_1_cat_2_convert_element_type_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.2789615Z import triton 2023-01-11T21:38:06.2789707Z import triton.language as tl 2023-01-11T21:38:06.2789814Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2789915Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2790047Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2790172Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2790177Z 2023-01-11T21:38:06.2790653Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp64', 6: '*fp64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.2790728Z @triton.jit 2023-01-11T21:38:06.2790938Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2791015Z xnumel = 128 2023-01-11T21:38:06.2791111Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2791234Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2791318Z xmask = xindex < xnumel 2023-01-11T21:38:06.2791389Z x2 = xindex 2023-01-11T21:38:06.2791463Z x0 = xindex % 16 2023-01-11T21:38:06.2791542Z x1 = (xindex // 16) 2023-01-11T21:38:06.2791757Z tmp0 = tl.load(in_ptr0 + (x2), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2791874Z tmp3 = tl.load(in_ptr0 + (x2), xmask).to(tl.float32) 2023-01-11T21:38:06.2791942Z tmp1 = 2 2023-01-11T21:38:06.2792022Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2792099Z tmp4 = tmp3 * tmp1 2023-01-11T21:38:06.2792187Z tmp5 = tmp4.to(tl.float64) 2023-01-11T21:38:06.2792327Z tl.store(out_ptr0 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2792464Z tl.store(out_ptr1 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2792596Z tl.store(out_ptr2 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2792722Z tl.store(out_ptr3 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 
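    # tmp5 (the fp64 upcast of tmp4) goes to both aliased halves of the fp64 cat output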
2023-01-11T21:38:06.2792854Z tl.store(out_ptr4 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.2792984Z tl.store(out_ptr5 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.2793069Z ''') 2023-01-11T21:38:06.2793075Z 2023-01-11T21:38:06.2793079Z 2023-01-11T21:38:06.2793260Z triton_fused_add_slice_1_slice_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.2793377Z import triton 2023-01-11T21:38:06.2793471Z import triton.language as tl 2023-01-11T21:38:06.2793585Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2793680Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2793812Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2793940Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2793946Z 2023-01-11T21:38:06.2794350Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2794424Z @triton.jit 2023-01-11T21:38:06.2794556Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2794629Z xnumel = 32 2023-01-11T21:38:06.2794724Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2794846Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2794931Z xmask = xindex < xnumel 2023-01-11T21:38:06.2795005Z x0 = xindex % 4 2023-01-11T21:38:06.2795083Z x1 = (xindex // 4) 2023-01-11T21:38:06.2795209Z tmp0 = tl.load(in_ptr0 + (x0 + (16*x1)), xmask).to(tl.float32) 2023-01-11T21:38:06.2795280Z tmp1 = 1 2023-01-11T21:38:06.2795359Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2795519Z tl.store(out_ptr0 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2795618Z ''') 2023-01-11T21:38:06.2795625Z 2023-01-11T21:38:06.2795630Z 2023-01-11T21:38:06.2795732Z async_compile.wait(globals()) 2023-01-11T21:38:06.2795811Z del async_compile 2023-01-11T21:38:06.2795816Z 2023-01-11T21:38:06.2795890Z def call(args): 2023-01-11T21:38:06.2795963Z arg0_1, = args 2023-01-11T21:38:06.2796037Z args.clear() 2023-01-11T21:38:06.2796123Z with torch.cuda.device(0): 2023-01-11T21:38:06.2796325Z buf3 = empty_strided((8, 36), (36, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2796435Z buf0 = as_strided(buf3, (8, 16), (36, 1)) # alias 2023-01-11T21:38:06.2796547Z buf2 = as_strided(buf3, (8, 16), (36, 1), 20) # alias 2023-01-11T21:38:06.2796750Z buf6 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2796890Z buf4 = as_strided(buf6, (8, 16), (16, 1)) # alias 2023-01-11T21:38:06.2797005Z buf5 = as_strided(buf6, (8, 16), (16, 1), 128) # alias 2023-01-11T21:38:06.2797206Z buf9 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.2797306Z buf7 = as_strided(buf9, (8, 16), (16, 1)) # alias 2023-01-11T21:38:06.2797417Z buf8 = as_strided(buf9, (8, 16), (16, 1), 128) # alias 2023-01-11T21:38:06.2797508Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2797711Z triton_fused_add_1_cat_cat_1_cat_2_convert_element_type_mul_0.run(arg0_1, buf0, buf2, buf4, buf5, buf7, buf8, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.2797825Z buf1 = as_strided(buf3, (8, 4), (36, 1), 16) # alias 2023-01-11T21:38:06.2797975Z triton_fused_add_slice_1_slice_2_1.run(arg0_1, buf1, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.2798049Z del arg0_1 
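        # buf3, buf6 and buf9 were assembled in place through the aliased as_strided views above, so no separate concatenation kernel is needed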
2023-01-11T21:38:06.2798138Z return (buf3, buf6, buf9, ) 2023-01-11T21:38:06.2798144Z 2023-01-11T21:38:06.2798150Z 2023-01-11T21:38:06.2798224Z if __name__ == "__main__": 2023-01-11T21:38:06.2798342Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2798467Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2798672Z arg0_1 = rand_strided((8, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2798784Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2798789Z 2023-01-11T21:38:06.2798793Z 2023-01-11T21:38:06.2798889Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2798964Z import torch 2023-01-11T21:38:06.2799038Z import random 2023-01-11T21:38:06.2799151Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2799309Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2799314Z 2023-01-11T21:38:06.2799395Z aten = torch.ops.aten 2023-01-11T21:38:06.2799530Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2799628Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2799635Z 2023-01-11T21:38:06.2799710Z import triton 2023-01-11T21:38:06.2799802Z import triton.language as tl 2023-01-11T21:38:06.2799919Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2800058Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2800063Z 2023-01-11T21:38:06.2800067Z 2023-01-11T21:38:06.2800322Z triton_fused_add_add_1_cat_cat_1_cat_2_convert_element_type_mul_slice_1_slice_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.2800396Z import triton 2023-01-11T21:38:06.2800488Z import triton.language as tl 2023-01-11T21:38:06.2800603Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2800711Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2800842Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2800961Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2800974Z 2023-01-11T21:38:06.2801520Z @pointwise(size_hints=[4, 64], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp64', 7: '*fp64', 8: 'i32', 9: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 9), equal_to_1=())]}) 2023-01-11T21:38:06.2801594Z @triton.jit 2023-01-11T21:38:06.2801808Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.2801881Z xnumel = 3 2023-01-11T21:38:06.2801954Z ynumel = 48 2023-01-11T21:38:06.2802053Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2802190Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2802274Z xmask = xindex < xnumel 2023-01-11T21:38:06.2802363Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.2802496Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.2802608Z ymask = yindex < ynumel 2023-01-11T21:38:06.2802680Z x0 = xindex 2023-01-11T21:38:06.2802750Z y3 = yindex 2023-01-11T21:38:06.2802824Z y1 = yindex % 16 2023-01-11T21:38:06.2802904Z y2 = (yindex // 16) 2023-01-11T21:38:06.2803112Z tmp0 = tl.load(in_ptr0 + (x0 + (3*y3)), xmask & ymask, eviction_policy='evict_last') 2023-01-11T21:38:06.2803229Z tmp5 = tl.load(in_ptr0 + (x0 + (3*y3)), xmask & ymask) 
2023-01-11T21:38:06.2803302Z tmp1 = 1 2023-01-11T21:38:06.2803381Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2803452Z tmp3 = 2 2023-01-11T21:38:06.2803530Z tmp4 = tmp0 + tmp3 2023-01-11T21:38:06.2803609Z tmp6 = tmp5 * tmp3 2023-01-11T21:38:06.2803691Z tmp7 = tmp6.to(tl.float64) 2023-01-11T21:38:06.2803856Z tl.store(out_ptr0 + (y1 + (48*y2) + (144*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 2023-01-11T21:38:06.2804019Z tl.store(out_ptr1 + (y1 + (48*y2) + (144*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.2804180Z tl.store(out_ptr2 + (y1 + (48*y2) + (144*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp4, xmask & ymask) 2023-01-11T21:38:06.2804333Z tl.store(out_ptr3 + (y3 + (48*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp6, xmask & ymask) 2023-01-11T21:38:06.2804481Z tl.store(out_ptr4 + (y3 + (48*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp6, xmask & ymask) 2023-01-11T21:38:06.2804628Z tl.store(out_ptr5 + (y3 + (48*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp7, xmask & ymask) 2023-01-11T21:38:06.2804772Z tl.store(out_ptr6 + (y3 + (48*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp7, xmask & ymask) 2023-01-11T21:38:06.2804880Z ''') 2023-01-11T21:38:06.2804886Z 2023-01-11T21:38:06.2804893Z 2023-01-11T21:38:06.2805040Z triton_fused_cat_1 = async_compile.triton(''' 2023-01-11T21:38:06.2805115Z import triton 2023-01-11T21:38:06.2805211Z import triton.language as tl 2023-01-11T21:38:06.2805344Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2805456Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2805611Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2805735Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2805740Z 2023-01-11T21:38:06.2806193Z @pointwise(size_hints=[4, 256], tile_hint=TileHint.SQUARE,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2806267Z @triton.jit 2023-01-11T21:38:06.2806433Z def triton_(in_ptr0, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.2806509Z xnumel = 3 2023-01-11T21:38:06.2806583Z ynumel = 144 2023-01-11T21:38:06.2806681Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2806815Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2806901Z xmask = xindex < xnumel 2023-01-11T21:38:06.2806991Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.2807121Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.2807202Z ymask = yindex < ynumel 2023-01-11T21:38:06.2807270Z x0 = xindex 2023-01-11T21:38:06.2807339Z y1 = yindex 2023-01-11T21:38:06.2807454Z tmp0 = tl.load(in_ptr0 + (y1 + (144*x0)), xmask & ymask) 2023-01-11T21:38:06.2807611Z tl.store(out_ptr0 + (x0 + (3*y1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 2023-01-11T21:38:06.2807689Z ''') 2023-01-11T21:38:06.2807694Z 2023-01-11T21:38:06.2807699Z 2023-01-11T21:38:06.2807857Z triton_fused_cat_1_2 = async_compile.triton(''' 2023-01-11T21:38:06.2807932Z import triton 2023-01-11T21:38:06.2808024Z import triton.language as tl 2023-01-11T21:38:06.2808137Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2808238Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2808398Z from torch._inductor.triton_ops.autotune import 
pointwise 2023-01-11T21:38:06.2808517Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2808529Z 2023-01-11T21:38:06.2808983Z @pointwise(size_hints=[8, 64], tile_hint=TileHint.SQUARE,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2809057Z @triton.jit 2023-01-11T21:38:06.2809222Z def triton_(in_ptr0, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.2809291Z xnumel = 6 2023-01-11T21:38:06.2809365Z ynumel = 48 2023-01-11T21:38:06.2809459Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2809589Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2809670Z xmask = xindex < xnumel 2023-01-11T21:38:06.2809759Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.2809895Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.2809980Z ymask = yindex < ynumel 2023-01-11T21:38:06.2810050Z x3 = xindex 2023-01-11T21:38:06.2810121Z y2 = yindex 2023-01-11T21:38:06.2810193Z x0 = xindex % 3 2023-01-11T21:38:06.2810264Z x1 = (xindex // 3) 2023-01-11T21:38:06.2810378Z tmp0 = tl.load(in_ptr0 + (y2 + (48*x3)), xmask & ymask) 2023-01-11T21:38:06.2810539Z tl.store(out_ptr0 + (x0 + (3*y2) + (144*x1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 2023-01-11T21:38:06.2810625Z ''') 2023-01-11T21:38:06.2810631Z 2023-01-11T21:38:06.2810635Z 2023-01-11T21:38:06.2810818Z triton_fused_cat_2_3 = async_compile.triton(''' 2023-01-11T21:38:06.2810897Z import triton 2023-01-11T21:38:06.2810991Z import triton.language as tl 2023-01-11T21:38:06.2811105Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2811200Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2811339Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2811464Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2811470Z 2023-01-11T21:38:06.2811926Z @pointwise(size_hints=[8, 64], tile_hint=TileHint.SQUARE,filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2811999Z @triton.jit 2023-01-11T21:38:06.2812166Z def triton_(in_ptr0, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.2812237Z xnumel = 6 2023-01-11T21:38:06.2812310Z ynumel = 48 2023-01-11T21:38:06.2812399Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2812534Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2812619Z xmask = xindex < xnumel 2023-01-11T21:38:06.2812712Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.2812845Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.2812925Z ymask = yindex < ynumel 2023-01-11T21:38:06.2812996Z x3 = xindex 2023-01-11T21:38:06.2813059Z y2 = yindex 2023-01-11T21:38:06.2813134Z x0 = xindex % 3 2023-01-11T21:38:06.2813210Z x1 = (xindex // 3) 2023-01-11T21:38:06.2813324Z tmp0 = tl.load(in_ptr0 + (y2 + (48*x3)), xmask & ymask) 2023-01-11T21:38:06.2813487Z tl.store(out_ptr0 + (x0 + (3*y2) + (144*x1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 2023-01-11T21:38:06.2813572Z ''') 2023-01-11T21:38:06.2813578Z 
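# The fused kernel above writes into aliased as_strided views of contiguous
# staging buffers, and the three cat kernels then permute each staged result
# into the channels-last output layout. A minimal sketch (hypothetical,
# reconstructed from the kernel names and buffer shapes) of the eager
# computation being compiled:
#
#     def fn(a):                     # a: (1, 3, 3, 16), channels-last
#         tmp = a * 2
#         return (torch.cat((a, a + 1, a + 2), -1),
#                 torch.cat((tmp, tmp), 0),
#                 torch.cat((tmp, tmp.double()), 0))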
2023-01-11T21:38:06.2813582Z 2023-01-11T21:38:06.2813677Z async_compile.wait(globals()) 2023-01-11T21:38:06.2813759Z del async_compile 2023-01-11T21:38:06.2813765Z 2023-01-11T21:38:06.2813832Z def call(args): 2023-01-11T21:38:06.2813906Z arg0_1, = args 2023-01-11T21:38:06.2813983Z args.clear() 2023-01-11T21:38:06.2814074Z with torch.cuda.device(0): 2023-01-11T21:38:06.2814321Z buf3 = empty_strided((1, 3, 3, 48), (432, 144, 48, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2814438Z buf0 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1)) # alias 2023-01-11T21:38:06.2814674Z buf1 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1), 16) # alias 2023-01-11T21:38:06.2814788Z buf2 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1), 32) # alias 2023-01-11T21:38:06.2815004Z buf7 = empty_strided((2, 3, 3, 16), (144, 48, 16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2815123Z buf5 = as_strided(buf7, (1, 3, 3, 16), (144, 48, 16, 1)) # alias 2023-01-11T21:38:06.2815242Z buf6 = as_strided(buf7, (1, 3, 3, 16), (144, 48, 16, 1), 144) # alias 2023-01-11T21:38:06.2815464Z buf11 = empty_strided((2, 3, 3, 16), (144, 48, 16, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.2815583Z buf9 = as_strided(buf11, (1, 3, 3, 16), (144, 48, 16, 1)) # alias 2023-01-11T21:38:06.2815706Z buf10 = as_strided(buf11, (1, 3, 3, 16), (144, 48, 16, 1), 144) # alias 2023-01-11T21:38:06.2815799Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2816012Z triton_fused_add_add_1_cat_cat_1_cat_2_convert_element_type_mul_slice_1_slice_2_0.run(arg0_1, buf0, buf1, buf2, buf5, buf6, buf9, buf10, 3, 48, grid=grid(3, 48), stream=stream0) 2023-01-11T21:38:06.2816088Z del arg0_1 2023-01-11T21:38:06.2816303Z buf4 = empty_strided((1, 3, 3, 48), (432, 1, 144, 3), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2816443Z triton_fused_cat_1.run(buf3, buf4, 3, 144, grid=grid(3, 144), stream=stream0) 2023-01-11T21:38:06.2816513Z del buf0 2023-01-11T21:38:06.2816633Z del buf1 2023-01-11T21:38:06.2816703Z del buf2 2023-01-11T21:38:06.2816771Z del buf3 2023-01-11T21:38:06.2816980Z buf8 = empty_strided((2, 3, 3, 16), (144, 1, 48, 3), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2817122Z triton_fused_cat_1_2.run(buf7, buf8, 6, 48, grid=grid(6, 48), stream=stream0) 2023-01-11T21:38:06.2817268Z del buf5 2023-01-11T21:38:06.2817356Z del buf6 2023-01-11T21:38:06.2817427Z del buf7 2023-01-11T21:38:06.2817641Z buf12 = empty_strided((2, 3, 3, 16), (144, 1, 48, 3), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.2817780Z triton_fused_cat_2_3.run(buf11, buf12, 6, 48, grid=grid(6, 48), stream=stream0) 2023-01-11T21:38:06.2817844Z del buf10 2023-01-11T21:38:06.2817915Z del buf11 2023-01-11T21:38:06.2817984Z del buf9 2023-01-11T21:38:06.2818074Z return (buf4, buf8, buf12, ) 2023-01-11T21:38:06.2818080Z 2023-01-11T21:38:06.2818088Z 2023-01-11T21:38:06.2818169Z if __name__ == "__main__": 2023-01-11T21:38:06.2818288Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2818415Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2818634Z arg0_1 = rand_strided((1, 3, 3, 16), (144, 1, 48, 3), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2818742Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2819009Z [2023-01-11 21:34:11,483] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 451 2023-01-11T21:38:06.2819430Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is 
deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2819560Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2819820Z [2023-01-11 21:34:11,532] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 452 2023-01-11T21:38:06.2819826Z 2023-01-11T21:38:06.2819830Z 2023-01-11T21:38:06.2819926Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2820000Z import torch 2023-01-11T21:38:06.2820114Z import random 2023-01-11T21:38:06.2820236Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2820354Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2820359Z 2023-01-11T21:38:06.2820441Z aten = torch.ops.aten 2023-01-11T21:38:06.2820577Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2820673Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2820679Z 2023-01-11T21:38:06.2820753Z import triton 2023-01-11T21:38:06.2820845Z import triton.language as tl 2023-01-11T21:38:06.2820968Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2821101Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2821116Z 2023-01-11T21:38:06.2821121Z 2023-01-11T21:38:06.2821369Z triton_fused_add_add_1_cat_cat_1_cat_2_convert_element_type_mul_slice_1_slice_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.2821443Z import triton 2023-01-11T21:38:06.2821536Z import triton.language as tl 2023-01-11T21:38:06.2821650Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2821753Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2821884Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2822011Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2822016Z 2023-01-11T21:38:06.2822574Z @pointwise(size_hints=[4, 64], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: '*fp64', 7: '*fp64', 8: 'i32', 9: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 9), equal_to_1=())]}) 2023-01-11T21:38:06.2822678Z @triton.jit 2023-01-11T21:38:06.2822886Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.2822961Z xnumel = 3 2023-01-11T21:38:06.2823036Z ynumel = 48 2023-01-11T21:38:06.2823137Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2823272Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2823355Z xmask = xindex < xnumel 2023-01-11T21:38:06.2823452Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.2823577Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.2823659Z ymask = yindex < ynumel 2023-01-11T21:38:06.2823729Z x0 = xindex 2023-01-11T21:38:06.2823800Z y3 = yindex 2023-01-11T21:38:06.2823876Z y1 = yindex % 16 2023-01-11T21:38:06.2823954Z y2 = (yindex // 16) 2023-01-11T21:38:06.2824195Z tmp0 = tl.load(in_ptr0 + (x0 + (3*y3)), xmask & ymask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2824322Z tmp5 = tl.load(in_ptr0 + (x0 + (3*y3)), xmask & 
ymask).to(tl.float32) 2023-01-11T21:38:06.2824391Z tmp1 = 1 2023-01-11T21:38:06.2824471Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2824542Z tmp3 = 2 2023-01-11T21:38:06.2824620Z tmp4 = tmp0 + tmp3 2023-01-11T21:38:06.2824697Z tmp6 = tmp5 * tmp3 2023-01-11T21:38:06.2824785Z tmp7 = tmp6.to(tl.float64) 2023-01-11T21:38:06.2824945Z tl.store(out_ptr0 + (y1 + (48*y2) + (144*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 2023-01-11T21:38:06.2825109Z tl.store(out_ptr1 + (y1 + (48*y2) + (144*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.2825268Z tl.store(out_ptr2 + (y1 + (48*y2) + (144*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp4, xmask & ymask) 2023-01-11T21:38:06.2825447Z tl.store(out_ptr3 + (y3 + (48*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp6, xmask & ymask) 2023-01-11T21:38:06.2825623Z tl.store(out_ptr4 + (y3 + (48*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp6, xmask & ymask) 2023-01-11T21:38:06.2825770Z tl.store(out_ptr5 + (y3 + (48*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp7, xmask & ymask) 2023-01-11T21:38:06.2825942Z tl.store(out_ptr6 + (y3 + (48*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp7, xmask & ymask) 2023-01-11T21:38:06.2826030Z ''') 2023-01-11T21:38:06.2826035Z 2023-01-11T21:38:06.2826040Z 2023-01-11T21:38:06.2826195Z triton_fused_cat_1 = async_compile.triton(''' 2023-01-11T21:38:06.2826263Z import triton 2023-01-11T21:38:06.2826357Z import triton.language as tl 2023-01-11T21:38:06.2826475Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2826576Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2826709Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2826834Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2826842Z 2023-01-11T21:38:06.2827302Z @pointwise(size_hints=[4, 256], tile_hint=TileHint.SQUARE,filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2827380Z @triton.jit 2023-01-11T21:38:06.2827540Z def triton_(in_ptr0, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.2827617Z xnumel = 3 2023-01-11T21:38:06.2827689Z ynumel = 144 2023-01-11T21:38:06.2827784Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2827920Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2828004Z xmask = xindex < xnumel 2023-01-11T21:38:06.2828099Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.2828224Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.2828306Z ymask = yindex < ynumel 2023-01-11T21:38:06.2828408Z x0 = xindex 2023-01-11T21:38:06.2828478Z y1 = yindex 2023-01-11T21:38:06.2828609Z tmp0 = tl.load(in_ptr0 + (y1 + (144*x0)), xmask & ymask).to(tl.float32) 2023-01-11T21:38:06.2828765Z tl.store(out_ptr0 + (x0 + (3*y1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 2023-01-11T21:38:06.2828852Z ''') 2023-01-11T21:38:06.2828858Z 2023-01-11T21:38:06.2828862Z 2023-01-11T21:38:06.2829013Z triton_fused_cat_1_2 = async_compile.triton(''' 2023-01-11T21:38:06.2829088Z import triton 2023-01-11T21:38:06.2829180Z import triton.language as tl 2023-01-11T21:38:06.2829294Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2829397Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2829529Z from 
torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2829655Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2829660Z 2023-01-11T21:38:06.2830119Z @pointwise(size_hints=[8, 64], tile_hint=TileHint.SQUARE,filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2830196Z @triton.jit 2023-01-11T21:38:06.2830357Z def triton_(in_ptr0, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.2830430Z xnumel = 6 2023-01-11T21:38:06.2830503Z ynumel = 48 2023-01-11T21:38:06.2830597Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2830729Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2830812Z xmask = xindex < xnumel 2023-01-11T21:38:06.2830908Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.2831032Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.2831117Z ymask = yindex < ynumel 2023-01-11T21:38:06.2831191Z x3 = xindex 2023-01-11T21:38:06.2831263Z y2 = yindex 2023-01-11T21:38:06.2831340Z x0 = xindex % 3 2023-01-11T21:38:06.2831418Z x1 = (xindex // 3) 2023-01-11T21:38:06.2831544Z tmp0 = tl.load(in_ptr0 + (y2 + (48*x3)), xmask & ymask).to(tl.float32) 2023-01-11T21:38:06.2831706Z tl.store(out_ptr0 + (x0 + (3*y2) + (144*x1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 2023-01-11T21:38:06.2831818Z ''') 2023-01-11T21:38:06.2831824Z 2023-01-11T21:38:06.2831828Z 2023-01-11T21:38:06.2831987Z triton_fused_cat_2_3 = async_compile.triton(''' 2023-01-11T21:38:06.2832063Z import triton 2023-01-11T21:38:06.2832155Z import triton.language as tl 2023-01-11T21:38:06.2832270Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2832375Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2832500Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2832625Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2832630Z 2023-01-11T21:38:06.2833084Z @pointwise(size_hints=[8, 64], tile_hint=TileHint.SQUARE,filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2833161Z @triton.jit 2023-01-11T21:38:06.2833330Z def triton_(in_ptr0, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.2833403Z xnumel = 6 2023-01-11T21:38:06.2833475Z ynumel = 48 2023-01-11T21:38:06.2833573Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2833700Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2833783Z xmask = xindex < xnumel 2023-01-11T21:38:06.2833878Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.2834007Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.2834090Z ymask = yindex < ynumel 2023-01-11T21:38:06.2834158Z x3 = xindex 2023-01-11T21:38:06.2834271Z y2 = yindex 2023-01-11T21:38:06.2834337Z x0 = xindex % 3 2023-01-11T21:38:06.2834417Z x1 = (xindex // 3) 2023-01-11T21:38:06.2834535Z tmp0 = tl.load(in_ptr0 + (y2 + (48*x3)), xmask & ymask) 2023-01-11T21:38:06.2834698Z tl.store(out_ptr0 + (x0 + (3*y2) + (144*x1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 
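    # same layout shuffle as triton_fused_cat_1_2 above, but for the fp64 result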
2023-01-11T21:38:06.2834790Z ''') 2023-01-11T21:38:06.2834795Z 2023-01-11T21:38:06.2834800Z 2023-01-11T21:38:06.2834894Z async_compile.wait(globals()) 2023-01-11T21:38:06.2834972Z del async_compile 2023-01-11T21:38:06.2834977Z 2023-01-11T21:38:06.2835054Z def call(args): 2023-01-11T21:38:06.2835138Z arg0_1, = args 2023-01-11T21:38:06.2835219Z args.clear() 2023-01-11T21:38:06.2835333Z with torch.cuda.device(0): 2023-01-11T21:38:06.2835559Z buf3 = empty_strided((1, 3, 3, 48), (432, 144, 48, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2835675Z buf0 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1)) # alias 2023-01-11T21:38:06.2835798Z buf1 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1), 16) # alias 2023-01-11T21:38:06.2835917Z buf2 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1), 32) # alias 2023-01-11T21:38:06.2836127Z buf7 = empty_strided((2, 3, 3, 16), (144, 48, 16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2836248Z buf5 = as_strided(buf7, (1, 3, 3, 16), (144, 48, 16, 1)) # alias 2023-01-11T21:38:06.2836366Z buf6 = as_strided(buf7, (1, 3, 3, 16), (144, 48, 16, 1), 144) # alias 2023-01-11T21:38:06.2836585Z buf11 = empty_strided((2, 3, 3, 16), (144, 48, 16, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.2836700Z buf9 = as_strided(buf11, (1, 3, 3, 16), (144, 48, 16, 1)) # alias 2023-01-11T21:38:06.2836818Z buf10 = as_strided(buf11, (1, 3, 3, 16), (144, 48, 16, 1), 144) # alias 2023-01-11T21:38:06.2836910Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2837132Z triton_fused_add_add_1_cat_cat_1_cat_2_convert_element_type_mul_slice_1_slice_2_0.run(arg0_1, buf0, buf1, buf2, buf5, buf6, buf9, buf10, 3, 48, grid=grid(3, 48), stream=stream0) 2023-01-11T21:38:06.2837208Z del arg0_1 2023-01-11T21:38:06.2837417Z buf4 = empty_strided((1, 3, 3, 48), (432, 1, 144, 3), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2837583Z triton_fused_cat_1.run(buf3, buf4, 3, 144, grid=grid(3, 144), stream=stream0) 2023-01-11T21:38:06.2837657Z del buf0 2023-01-11T21:38:06.2837732Z del buf1 2023-01-11T21:38:06.2837801Z del buf2 2023-01-11T21:38:06.2837870Z del buf3 2023-01-11T21:38:06.2838087Z buf8 = empty_strided((2, 3, 3, 16), (144, 1, 48, 3), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2838219Z triton_fused_cat_1_2.run(buf7, buf8, 6, 48, grid=grid(6, 48), stream=stream0) 2023-01-11T21:38:06.2838289Z del buf5 2023-01-11T21:38:06.2838362Z del buf6 2023-01-11T21:38:06.2838431Z del buf7 2023-01-11T21:38:06.2838647Z buf12 = empty_strided((2, 3, 3, 16), (144, 1, 48, 3), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.2838788Z triton_fused_cat_2_3.run(buf11, buf12, 6, 48, grid=grid(6, 48), stream=stream0) 2023-01-11T21:38:06.2838860Z del buf10 2023-01-11T21:38:06.2838924Z del buf11 2023-01-11T21:38:06.2838992Z del buf9 2023-01-11T21:38:06.2839084Z return (buf4, buf8, buf12, ) 2023-01-11T21:38:06.2839089Z 2023-01-11T21:38:06.2839094Z 2023-01-11T21:38:06.2839175Z if __name__ == "__main__": 2023-01-11T21:38:06.2839295Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2839421Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2839638Z arg0_1 = rand_strided((1, 3, 3, 16), (144, 1, 48, 3), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2839750Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2840008Z [2023-01-11 21:34:11,904] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 452 2023-01-11T21:38:06.2840042Z 2023-01-11T21:38:06.2840114Z ok 
(1.677s) 2023-01-11T21:38:06.2840581Z test_cat_extern_kernel_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2840715Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2840970Z [2023-01-11 21:34:11,940] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 453 2023-01-11T21:38:06.2841232Z [2023-01-11 21:34:12,023] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 453 2023-01-11T21:38:06.2841239Z 2023-01-11T21:38:06.2841336Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2841413Z import torch 2023-01-11T21:38:06.2841492Z import random 2023-01-11T21:38:06.2841604Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2841728Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2841734Z 2023-01-11T21:38:06.2841816Z aten = torch.ops.aten 2023-01-11T21:38:06.2841958Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2842054Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2842059Z 2023-01-11T21:38:06.2842133Z import triton 2023-01-11T21:38:06.2842228Z import triton.language as tl 2023-01-11T21:38:06.2842353Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2842487Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2842492Z 2023-01-11T21:38:06.2842503Z 2023-01-11T21:38:06.2842649Z triton_fused_cat_0 = async_compile.triton(''' 2023-01-11T21:38:06.2842723Z import triton 2023-01-11T21:38:06.2842815Z import triton.language as tl 2023-01-11T21:38:06.2842932Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2843032Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2843162Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2843289Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2843294Z 2023-01-11T21:38:06.2843720Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2843797Z @triton.jit 2023-01-11T21:38:06.2843930Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2844003Z xnumel = 65536 2023-01-11T21:38:06.2844100Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2844229Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2844311Z xmask = xindex < xnumel 2023-01-11T21:38:06.2844377Z x2 = xindex 2023-01-11T21:38:06.2844453Z x0 = xindex % 256 2023-01-11T21:38:06.2844535Z x1 = (xindex // 256) 2023-01-11T21:38:06.2844633Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.2844776Z tl.store(out_ptr0 + (x0 + (512*x1) + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2844862Z ''') 2023-01-11T21:38:06.2844872Z 2023-01-11T21:38:06.2844877Z 2023-01-11T21:38:06.2844971Z async_compile.wait(globals()) 2023-01-11T21:38:06.2845049Z del async_compile 2023-01-11T21:38:06.2845054Z 2023-01-11T21:38:06.2845122Z def call(args): 2023-01-11T21:38:06.2845215Z 
arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:06.2845290Z args.clear() 2023-01-11T21:38:06.2845381Z with torch.cuda.device(0): 2023-01-11T21:38:06.2845591Z buf0 = empty_strided((256, 1600), (1600, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2845692Z aten.mm.out(arg1_1, arg2_1, out=buf0) 2023-01-11T21:38:06.2845764Z del arg1_1 2023-01-11T21:38:06.2845859Z del arg2_1 2023-01-11T21:38:06.2846068Z buf3 = empty_strided((256, 512), (512, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2846184Z buf1 = as_strided(buf3, (256, 256), (512, 1)) # alias 2023-01-11T21:38:06.2846310Z aten.mm.out(as_strided(buf0, (256, 100), (1600, 1)), arg3_1, out=buf1) 2023-01-11T21:38:06.2846386Z del arg3_1 2023-01-11T21:38:06.2846458Z del buf0 2023-01-11T21:38:06.2846573Z buf2 = as_strided(buf3, (256, 256), (512, 1), 256) # alias 2023-01-11T21:38:06.2846659Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2846800Z triton_fused_cat_0.run(arg0_1, buf2, 65536, grid=grid(65536), stream=stream0) 2023-01-11T21:38:06.2846872Z del arg0_1 2023-01-11T21:38:06.2846949Z return (buf3, ) 2023-01-11T21:38:06.2846954Z 2023-01-11T21:38:06.2846958Z 2023-01-11T21:38:06.2847035Z if __name__ == "__main__": 2023-01-11T21:38:06.2847152Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2847281Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2847486Z arg0_1 = rand_strided((256, 256), (256, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2847688Z arg1_1 = rand_strided((256, 1024), (1024, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2847899Z arg2_1 = rand_strided((1024, 1600), (1600, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2848101Z arg3_1 = rand_strided((100, 256), (256, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2848232Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:06.2848238Z 2023-01-11T21:38:06.2848309Z ok (0.120s) 2023-01-11T21:38:06.2848770Z test_cat_upcasting_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2848905Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2849162Z [2023-01-11 21:34:12,046] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 454 2023-01-11T21:38:06.2849458Z [2023-01-11 21:34:12,221] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 454 2023-01-11T21:38:06.2849878Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2850009Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2850258Z [2023-01-11 21:34:12,240] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 455 2023-01-11T21:38:06.2850522Z [2023-01-11 21:34:12,327] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 455 2023-01-11T21:38:06.2850527Z 2023-01-11T21:38:06.2850624Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2850701Z import torch 2023-01-11T21:38:06.2850776Z import random 2023-01-11T21:38:06.2850895Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2851016Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2851021Z 2023-01-11T21:38:06.2851101Z aten = torch.ops.aten 2023-01-11T21:38:06.2851231Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2851322Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2851327Z 2023-01-11T21:38:06.2851399Z import triton 2023-01-11T21:38:06.2851489Z import triton.language as tl 2023-01-11T21:38:06.2851616Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2851779Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2851785Z 2023-01-11T21:38:06.2851789Z 2023-01-11T21:38:06.2851943Z triton_fused_cat_0 = async_compile.triton(''' 2023-01-11T21:38:06.2852017Z import triton 2023-01-11T21:38:06.2852103Z import triton.language as tl 2023-01-11T21:38:06.2852220Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2852320Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2852451Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2852579Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2852584Z 2023-01-11T21:38:06.2852986Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2853060Z @triton.jit 2023-01-11T21:38:06.2853197Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2853264Z xnumel = 128 2023-01-11T21:38:06.2853358Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2853486Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2853569Z xmask = xindex < xnumel 2023-01-11T21:38:06.2853642Z x2 = xindex 2023-01-11T21:38:06.2853717Z x0 = xindex % 16 2023-01-11T21:38:06.2853797Z x1 = (xindex // 16) 2023-01-11T21:38:06.2853888Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.2854029Z tl.store(out_ptr0 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2854114Z ''') 2023-01-11T21:38:06.2854119Z 2023-01-11T21:38:06.2854124Z 2023-01-11T21:38:06.2854276Z triton_fused_cat_1 = async_compile.triton(''' 2023-01-11T21:38:06.2854349Z import triton 2023-01-11T21:38:06.2854442Z import triton.language as tl 2023-01-11T21:38:06.2854663Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2854762Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2854896Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2855019Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2855025Z 2023-01-11T21:38:06.2855516Z 
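# upcast half of the cat: this kernel loads the fp16 operand and stores it into the fp32 output slice (note the .to(tl.float32) below)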
@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2855593Z @triton.jit 2023-01-11T21:38:06.2855725Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2855798Z xnumel = 160 2023-01-11T21:38:06.2855893Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2856015Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2856098Z xmask = xindex < xnumel 2023-01-11T21:38:06.2856168Z x2 = xindex 2023-01-11T21:38:06.2856242Z x0 = xindex % 20 2023-01-11T21:38:06.2856323Z x1 = (xindex // 20) 2023-01-11T21:38:06.2856441Z tmp0 = tl.load(in_ptr0 + (x2), xmask).to(tl.float32) 2023-01-11T21:38:06.2856528Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.2856661Z tl.store(out_ptr0 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.2856748Z ''') 2023-01-11T21:38:06.2856753Z 2023-01-11T21:38:06.2856760Z 2023-01-11T21:38:06.2856850Z async_compile.wait(globals()) 2023-01-11T21:38:06.2856926Z del async_compile 2023-01-11T21:38:06.2856931Z 2023-01-11T21:38:06.2857004Z def call(args): 2023-01-11T21:38:06.2857081Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2857213Z args.clear() 2023-01-11T21:38:06.2857306Z with torch.cuda.device(0): 2023-01-11T21:38:06.2857504Z buf2 = empty_strided((8, 36), (36, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2857614Z buf0 = as_strided(buf2, (8, 16), (36, 1)) # alias 2023-01-11T21:38:06.2857705Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2857881Z triton_fused_cat_0.run(arg0_1, buf0, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.2857954Z del arg0_1 2023-01-11T21:38:06.2858066Z buf1 = as_strided(buf2, (8, 20), (36, 1), 16) # alias 2023-01-11T21:38:06.2858202Z triton_fused_cat_1.run(arg1_1, buf1, 160, grid=grid(160), stream=stream0) 2023-01-11T21:38:06.2858271Z del arg1_1 2023-01-11T21:38:06.2858348Z return (buf2, ) 2023-01-11T21:38:06.2858353Z 2023-01-11T21:38:06.2858357Z 2023-01-11T21:38:06.2858436Z if __name__ == "__main__": 2023-01-11T21:38:06.2858553Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2864823Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2865063Z arg0_1 = rand_strided((8, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2865265Z arg1_1 = rand_strided((8, 20), (20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2865408Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2865423Z 2023-01-11T21:38:06.2865428Z 2023-01-11T21:38:06.2865537Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2865627Z import torch 2023-01-11T21:38:06.2865702Z import random 2023-01-11T21:38:06.2865815Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2865944Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2865949Z 2023-01-11T21:38:06.2866033Z aten = torch.ops.aten 2023-01-11T21:38:06.2866171Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2866268Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2866274Z 2023-01-11T21:38:06.2866349Z import triton 2023-01-11T21:38:06.2866437Z import triton.language as tl 2023-01-11T21:38:06.2866555Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2866698Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 
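# second variant: both cat operands are already fp16, so no promotion is needed and the output buffer (buf2 below, torch.float16) matches the inputs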
2023-01-11T21:38:06.2866703Z 2023-01-11T21:38:06.2866708Z 2023-01-11T21:38:06.2866859Z triton_fused_cat_0 = async_compile.triton(''' 2023-01-11T21:38:06.2866938Z import triton 2023-01-11T21:38:06.2867028Z import triton.language as tl 2023-01-11T21:38:06.2867143Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2867238Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2867370Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2867548Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2867554Z 2023-01-11T21:38:06.2867959Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2868032Z @triton.jit 2023-01-11T21:38:06.2868165Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2868240Z xnumel = 128 2023-01-11T21:38:06.2868337Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2868459Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2868545Z xmask = xindex < xnumel 2023-01-11T21:38:06.2868615Z x2 = xindex 2023-01-11T21:38:06.2868691Z x0 = xindex % 16 2023-01-11T21:38:06.2868768Z x1 = (xindex // 16) 2023-01-11T21:38:06.2868886Z tmp0 = tl.load(in_ptr0 + (x2), xmask).to(tl.float32) 2023-01-11T21:38:06.2869030Z tl.store(out_ptr0 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2869109Z ''') 2023-01-11T21:38:06.2869114Z 2023-01-11T21:38:06.2869118Z 2023-01-11T21:38:06.2869275Z triton_fused_cat_1 = async_compile.triton(''' 2023-01-11T21:38:06.2869350Z import triton 2023-01-11T21:38:06.2869442Z import triton.language as tl 2023-01-11T21:38:06.2869554Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2869656Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2869790Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2869909Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2869952Z 2023-01-11T21:38:06.2870349Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2870421Z @triton.jit 2023-01-11T21:38:06.2870554Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2870626Z xnumel = 160 2023-01-11T21:38:06.2870720Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2870849Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2870933Z xmask = xindex < xnumel 2023-01-11T21:38:06.2870997Z x2 = xindex 2023-01-11T21:38:06.2871072Z x0 = xindex % 20 2023-01-11T21:38:06.2871150Z x1 = (xindex // 20) 2023-01-11T21:38:06.2871267Z tmp0 = tl.load(in_ptr0 + (x2), xmask).to(tl.float32) 2023-01-11T21:38:06.2871410Z tl.store(out_ptr0 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2871499Z ''') 2023-01-11T21:38:06.2871505Z 2023-01-11T21:38:06.2871509Z 2023-01-11T21:38:06.2871601Z async_compile.wait(globals()) 2023-01-11T21:38:06.2871677Z del async_compile 2023-01-11T21:38:06.2871682Z 2023-01-11T21:38:06.2871750Z def call(args): 2023-01-11T21:38:06.2871831Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2871906Z args.clear() 2023-01-11T21:38:06.2871999Z with 
torch.cuda.device(0): 2023-01-11T21:38:06.2872200Z buf2 = empty_strided((8, 36), (36, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2872309Z buf0 = as_strided(buf2, (8, 16), (36, 1)) # alias 2023-01-11T21:38:06.2872400Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2872530Z triton_fused_cat_0.run(arg0_1, buf0, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.2872603Z del arg0_1 2023-01-11T21:38:06.2872715Z buf1 = as_strided(buf2, (8, 20), (36, 1), 16) # alias 2023-01-11T21:38:06.2872850Z triton_fused_cat_1.run(arg1_1, buf1, 160, grid=grid(160), stream=stream0) 2023-01-11T21:38:06.2872926Z del arg1_1 2023-01-11T21:38:06.2873005Z return (buf2, ) 2023-01-11T21:38:06.2873010Z 2023-01-11T21:38:06.2873015Z 2023-01-11T21:38:06.2873093Z if __name__ == "__main__": 2023-01-11T21:38:06.2873246Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2873368Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2873571Z arg0_1 = rand_strided((8, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2873768Z arg1_1 = rand_strided((8, 20), (20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2873887Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2873892Z 2023-01-11T21:38:06.2873962Z ok (0.302s) 2023-01-11T21:38:06.2874418Z test_cauchy_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2874553Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2874813Z [2023-01-11 21:34:12,349] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 456 2023-01-11T21:38:06.2875079Z [2023-01-11 21:34:12,527] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 456 2023-01-11T21:38:06.2875086Z 2023-01-11T21:38:06.2875189Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2875276Z import torch 2023-01-11T21:38:06.2875357Z import random 2023-01-11T21:38:06.2875495Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2875618Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2875623Z 2023-01-11T21:38:06.2875737Z aten = torch.ops.aten 2023-01-11T21:38:06.2875874Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2875962Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2875973Z 2023-01-11T21:38:06.2876040Z import triton 2023-01-11T21:38:06.2876130Z import triton.language as tl 2023-01-11T21:38:06.2876257Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2876395Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2876400Z 2023-01-11T21:38:06.2876405Z 2023-01-11T21:38:06.2876613Z triton_fused_mul_reciprocal_sub_sum_1_unsqueeze_0 = async_compile.triton(''' 2023-01-11T21:38:06.2876688Z import triton 2023-01-11T21:38:06.2876782Z import triton.language as tl 2023-01-11T21:38:06.2876888Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2876990Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2877121Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2877248Z from torch._inductor.utils 
test_cauchy_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:12,349] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 456
[2023-01-11 21:34:12,527] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 456

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_mul_reciprocal_sub_sum_1_unsqueeze_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[1, 1024],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 1
    rnumel = 1024
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    _tmp6 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = (rindex // 32)
        r0 = rindex % 32
        tmp0 = tl.load(in_ptr0 + (r1), rmask)
        tmp1 = tl.load(in_ptr1 + (r0), rmask)
        tmp2 = tmp0 - tmp1
        tmp3 = 1 / tmp2
        tmp4 = 1
        tmp5 = tmp3 * tmp4
        _tmp6 = tl.where(xmask & rmask, _tmp6 + tmp5, _tmp6)
    tmp6 = tl.reshape(tl.sum(_tmp6, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp6, None)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((), (), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_mul_reciprocal_sub_sum_1_unsqueeze_0.run(arg0_1, arg1_1, buf0, 1, 1024, grid=grid(1), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.199s)
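The reduction kernel above accumulates 1 / (in_ptr0[r1] - in_ptr1[r0]) over all 32 x 32 = 1024 index pairs into a single scalar, matching the fused mul/reciprocal/sub/sum/unsqueeze in its name. A rough eager-mode sketch; a and b are illustrative stand-ins for the two 32-element inputs, assuming a CUDA device:

import torch

a = torch.randn(32, device='cuda')  # stand-in for arg0_1 (assumption)
b = torch.randn(32, device='cuda')  # stand-in for arg1_1 (assumption)
# Outer difference (32, 32) via unsqueeze, reciprocal, then a full sum to
# a 0-d tensor -- the same value the kernel reduces into buf0.
out = (1.0 / (a.unsqueeze(-1) - b)).sum()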
test_clamp_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:12,556] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 457
[2023-01-11 21:34:12,740] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 457
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:12,769] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 458
[2023-01-11 21:34:12,855] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 458

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_maximum_1_minimum_minimum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp5 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last')
    tmp8 = tl.load(in_ptr0 + (x0), xmask)
    tmp9 = tl.load(in_ptr1 + (x0), xmask)
    tmp1 = -0.10000000149011612
    tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1))
    tmp3 = 0.10000000149011612
    tmp4 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp3, tmp2, tmp3))
    tmp6 = 0.0
    tmp7 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp6, tmp5, tmp6))
    tmp10 = tmp8 + tmp9
    tmp11 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 < tmp6, tmp10, tmp6))
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
    tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_maximum_1_minimum_minimum_1_0.run(arg0_1, arg1_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_maximum_1_minimum_minimum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp5 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp8 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp9 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tmp1 = -0.0999755859375
    tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1))
    tmp3 = 0.0999755859375
    tmp4 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp3, tmp2, tmp3))
    tmp6 = 0.0
    tmp7 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp6, tmp5, tmp6))
    tmp10 = tmp8 + tmp9
    tmp11 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 < tmp6, tmp10, tmp6))
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
    tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16)
        buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_maximum_1_minimum_minimum_1_0.run(arg0_1, arg1_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.329s)
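The tl.where(x != x, x, ...) pattern in both kernels above is a NaN-propagating min/max, so the three stores correspond to three clamps fused into one kernel (the fp32 and fp16 constants are just 0.1 rounded to each dtype). A sketch of the equivalent eager ops on illustrative (8, 8) CUDA inputs:

import torch

a = torch.randn(8, 8, device='cuda')  # stand-in for arg0_1 (assumption)
b = torch.randn(8, 8, device='cuda')  # stand-in for arg1_1 (assumption)
out0 = a.clamp(-0.1, 0.1)        # tmp4: clamp to [-0.1, 0.1]
out1 = b.clamp(min=0.0)          # tmp7: lower bound only
out2 = (a + b).clamp(max=0.0)    # tmp11: upper bound only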
test_clone_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:12,885] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 459
[2023-01-11 21:34:13,052] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 459
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:13,081] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 460
[2023-01-11 21:34:13,155] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 460

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_clone_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = 2
    tmp2 = tmp0 + tmp1
    tmp4 = 1
    tmp5 = tmp3 + tmp4
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_clone_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_clone_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp1 = 2
    tmp2 = tmp0 + tmp1
    tmp4 = 1
    tmp5 = tmp3 + tmp4
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_clone_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.299s)
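Both stores above read the same input and add a constant, i.e. the clone in the test collapses into two independent pointwise adds on one load. A sketch of the equivalent eager computation on an illustrative (16, 16) CUDA input:

import torch

x = torch.randn(16, 16, device='cuda')  # stand-in for arg0_1 (assumption)
out0 = x + 2   # tmp2
out1 = x + 1   # tmp5 (the clone is folded into the same read of x)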
test_constant_pad_1d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:13,182] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 461
[2023-01-11 21:34:13,444] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 461
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:13,471] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 462
[2023-01-11 21:34:13,562] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 462

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_constant_pad_nd_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1024
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 32
    x1 = (xindex // 32)
    x2 = xindex
    tmp0 = x0
    tmp1 = 31
    tmp2 = tmp0 < tmp1
    tmp3 = tl.load(in_ptr0 + (x0 + (31*x1) + tl.zeros([XBLOCK], tl.int32)), tmp2 & xmask, eviction_policy='evict_last', other=0)
    tmp4 = tl.where(tmp2, tmp3, 6.0)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
''')


triton_fused_constant_pad_nd_1_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1152
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 36
    x1 = (xindex // 36)
    x2 = xindex
    tmp0 = (-2) + x0
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = 31
    tmp4 = tmp0 < tmp3
    tmp5 = tmp2 & tmp4
    tmp6 = tl.load(in_ptr0 + ((-2) + x0 + (31*x1) + tl.zeros([XBLOCK], tl.int32)), tmp5 & xmask, other=0)
    tmp7 = tl.where(tmp5, tmp6, 99.0)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 16, 32), (512, 32, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_constant_pad_nd_0.run(arg0_1, buf0, 1024, grid=grid(1024), stream=stream0)
        buf1 = empty_strided((2, 16, 36), (576, 36, 1), device='cuda', dtype=torch.float32)
        triton_fused_constant_pad_nd_1_1.run(arg0_1, buf1, 1152, grid=grid(1152), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 16, 31), (496, 31, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_constant_pad_nd_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1024
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 32
    x1 = (xindex // 32)
    x2 = xindex
    tmp0 = x0
    tmp1 = 31
    tmp2 = tmp0 < tmp1
    tmp3 = tl.load(in_ptr0 + (x0 + (31*x1) + tl.zeros([XBLOCK], tl.int32)), tmp2 & xmask, eviction_policy='evict_last', other=0).to(tl.float32)
    tmp4 = tl.where(tmp2, tmp3, 6.0)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
''')


triton_fused_constant_pad_nd_1_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1152
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 36
    x1 = (xindex // 36)
    x2 = xindex
    tmp0 = (-2) + x0
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = 31
    tmp4 = tmp0 < tmp3
    tmp5 = tmp2 & tmp4
    tmp6 = tl.load(in_ptr0 + ((-2) + x0 + (31*x1) + tl.zeros([XBLOCK], tl.int32)), tmp5 & xmask, other=0).to(tl.float32)
    tmp7 = tl.where(tmp5, tmp6, 99.0)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 16, 32), (512, 32, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_constant_pad_nd_0.run(arg0_1, buf0, 1024, grid=grid(1024), stream=stream0)
        buf1 = empty_strided((2, 16, 36), (576, 36, 1), device='cuda', dtype=torch.float16)
        triton_fused_constant_pad_nd_1_1.run(arg0_1, buf1, 1152, grid=grid(1152), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 16, 31), (496, 31, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.407s)
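Each pad kernel above guards its load with a bounds check on the shifted index ((-2) + x0 in [0, 31), for instance) and substitutes the fill value where the mask is false. The equivalent eager calls, assuming x stands in for the (2, 16, 31) input:

import torch
import torch.nn.functional as F

x = torch.randn(2, 16, 31, device='cuda')  # stand-in for arg0_1 (assumption)
out0 = F.pad(x, (0, 1), value=6.0)    # (2, 16, 32), matches kernel ..._0
out1 = F.pad(x, (2, 3), value=99.0)   # (2, 16, 36), matches kernel ..._1_1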
test_constant_pad_2d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:13,589] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 463
[2023-01-11 21:34:13,696] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 463
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:13,723] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 464
[2023-01-11 21:34:13,820] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 464

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_constant_pad_nd_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 100
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 10)
    x0 = xindex % 10
    x2 = xindex
    tmp0 = (-1) + x1
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = 8
    tmp4 = tmp0 < tmp3
    tmp5 = (-1) + x0
    tmp6 = tmp5 >= tmp1
    tmp7 = tmp5 < tmp3
    tmp8 = tmp2 & tmp4
    tmp9 = tmp8 & tmp6
    tmp10 = tmp9 & tmp7
    tmp11 = tl.load(in_ptr0 + ((-9) + x0 + (8*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, eviction_policy='evict_last', other=0)
    tmp12 = tl.where(tmp10, tmp11, 6.0)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask)
''')


triton_fused_constant_pad_nd_1_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 165
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 11)
    x0 = xindex % 11
    x2 = xindex
    tmp0 = (-3) + x1
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = 8
    tmp4 = tmp0 < tmp3
    tmp5 = (-1) + x0
    tmp6 = tmp5 >= tmp1
    tmp7 = tmp5 < tmp3
    tmp8 = tmp2 & tmp4
    tmp9 = tmp8 & tmp6
    tmp10 = tmp9 & tmp7
    tmp11 = tl.load(in_ptr0 + ((-25) + x0 + (8*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0)
    tmp12 = tl.where(tmp10, tmp11, 99.0)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_constant_pad_nd_0.run(arg0_1, buf0, 100, grid=grid(100), stream=stream0)
        buf1 = empty_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cuda', dtype=torch.float32)
        triton_fused_constant_pad_nd_1_1.run(arg0_1, buf1, 165, grid=grid(165), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_constant_pad_nd_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 100
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 10)
    x0 = xindex % 10
    x2 = xindex
    tmp0 = (-1) + x1
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = 8
    tmp4 = tmp0 < tmp3
    tmp5 = (-1) + x0
    tmp6 = tmp5 >= tmp1
    tmp7 = tmp5 < tmp3
    tmp8 = tmp2 & tmp4
    tmp9 = tmp8 & tmp6
    tmp10 = tmp9 & tmp7
    tmp11 = tl.load(in_ptr0 + ((-9) + x0 + (8*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, eviction_policy='evict_last', other=0).to(tl.float32)
    tmp12 = tl.where(tmp10, tmp11, 6.0)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask)
''')


triton_fused_constant_pad_nd_1_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 165
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 11)
    x0 = xindex % 11
    x2 = xindex
    tmp0 = (-3) + x1
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = 8
    tmp4 = tmp0 < tmp3
    tmp5 = (-1) + x0
    tmp6 = tmp5 >= tmp1
    tmp7 = tmp5 < tmp3
    tmp8 = tmp2 & tmp4
    tmp9 = tmp8 & tmp6
    tmp10 = tmp9 & tmp7
    tmp11 = tl.load(in_ptr0 + ((-25) + x0 + (8*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0).to(tl.float32)
    tmp12 = tl.where(tmp10, tmp11, 99.0)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_constant_pad_nd_0.run(arg0_1, buf0, 100, grid=grid(100), stream=stream0)
        buf1 = empty_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cuda', dtype=torch.float16)
        triton_fused_constant_pad_nd_1_1.run(arg0_1, buf1, 165, grid=grid(165), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.259s)
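The same masked-load pattern in 2D: the row mask ((-1) + x1 or (-3) + x1 in [0, 8)) and column mask ((-1) + x0 in [0, 8)) are combined with & before the load. A sketch of the equivalent padding on an illustrative (1, 1, 8, 8) CUDA input (F.pad takes (left, right, top, bottom) for the last two dims):

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8, device='cuda')  # stand-in for arg0_1 (assumption)
out0 = F.pad(x, (1, 1, 1, 1), value=6.0)    # (1, 1, 10, 10), kernel ..._0
out1 = F.pad(x, (1, 2, 3, 4), value=99.0)   # (1, 1, 15, 11), kernel ..._1_1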
test_constant_pad_3d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:13,849] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 465
[2023-01-11 21:34:13,990] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 465
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:14,018] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 466
[2023-01-11 21:34:14,136] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 466

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_constant_pad_nd_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 2310
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x2 = (xindex // 77) % 15
    x1 = (xindex // 7) % 11
    x0 = xindex % 7
    x3 = (xindex // 1155)
    x7 = xindex
    tmp0 = (-5) + x2
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = 4
    tmp4 = tmp0 < tmp3
    tmp5 = (-3) + x1
    tmp6 = tmp5 >= tmp1
    tmp7 = tmp5 < tmp3
    tmp8 = (-1) + x0
    tmp9 = tmp8 >= tmp1
    tmp10 = tmp8 < tmp3
    tmp11 = tmp2 & tmp4
    tmp12 = tmp11 & tmp6
    tmp13 = tmp12 & tmp7
    tmp14 = tmp13 & tmp9
    tmp15 = tmp14 & tmp10
    tmp16 = tl.load(in_ptr0 + ((-93) + x0 + (4*x1) + (16*x2) + (64*x3) + tl.zeros([XBLOCK], tl.int32)), tmp15 & xmask, eviction_policy='evict_last', other=0)
    tmp17 = tl.where(tmp15, tmp16, 6.0)
    tl.store(out_ptr0 + (x7 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask)
''')


triton_fused_constant_pad_nd_1_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 352
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 4) % 11
    x2 = (xindex // 44)
    x3 = xindex % 44
    x4 = xindex
    tmp0 = (-3) + x1
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = 4
    tmp4 = tmp0 < tmp3
    tmp5 = tmp2 & tmp4
    tmp6 = tl.load(in_ptr0 + ((-12) + x3 + (16*x2) + tl.zeros([XBLOCK], tl.int32)), tmp5 & xmask, other=0)
    tmp7 = tl.where(tmp5, tmp6, 6.0)
    tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 15, 11, 7), (1155, 77, 7, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_constant_pad_nd_0.run(arg0_1, buf0, 2310, grid=grid(2310), stream=stream0)
        buf1 = empty_strided((2, 4, 11, 4), (176, 44, 4, 1), device='cuda', dtype=torch.float32)
        triton_fused_constant_pad_nd_1_1.run(arg0_1, buf1, 352, grid=grid(352), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 4, 4, 4), (64, 16, 4, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_constant_pad_nd_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 2310
    xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.2977625Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2977710Z xmask = xindex < xnumel 2023-01-11T21:38:06.2977793Z x2 = (xindex // 77) % 15 2023-01-11T21:38:06.2977878Z x1 = (xindex // 7) % 11 2023-01-11T21:38:06.2977957Z x0 = xindex % 7 2023-01-11T21:38:06.2978037Z x3 = (xindex // 1155) 2023-01-11T21:38:06.2978110Z x7 = xindex 2023-01-11T21:38:06.2978212Z tmp0 = (-5) + x2 2023-01-11T21:38:06.2978283Z tmp1 = 0 2023-01-11T21:38:06.2978364Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.2978435Z tmp3 = 4 2023-01-11T21:38:06.2978515Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.2978623Z tmp5 = (-3) + x1 2023-01-11T21:38:06.2978697Z tmp6 = tmp5 >= tmp1 2023-01-11T21:38:06.2978776Z tmp7 = tmp5 < tmp3 2023-01-11T21:38:06.2978881Z tmp8 = (-1) + x0 2023-01-11T21:38:06.2978960Z tmp9 = tmp8 >= tmp1 2023-01-11T21:38:06.2979040Z tmp10 = tmp8 < tmp3 2023-01-11T21:38:06.2979118Z tmp11 = tmp2 & tmp4 2023-01-11T21:38:06.2979201Z tmp12 = tmp11 & tmp6 2023-01-11T21:38:06.2979278Z tmp13 = tmp12 & tmp7 2023-01-11T21:38:06.2979359Z tmp14 = tmp13 & tmp9 2023-01-11T21:38:06.2979444Z tmp15 = tmp14 & tmp10 2023-01-11T21:38:06.2979774Z tmp16 = tl.load(in_ptr0 + ((-93) + x0 + (4*x1) + (16*x2) + (64*x3) + tl.zeros([XBLOCK], tl.int32)), tmp15 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.2979915Z tmp17 = tl.where(tmp15, tmp16, 6.0) 2023-01-11T21:38:06.2980053Z tl.store(out_ptr0 + (x7 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.2980140Z ''') 2023-01-11T21:38:06.2980146Z 2023-01-11T21:38:06.2980153Z 2023-01-11T21:38:06.2980334Z triton_fused_constant_pad_nd_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2980405Z import triton 2023-01-11T21:38:06.2980497Z import triton.language as tl 2023-01-11T21:38:06.2980612Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2980714Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2980848Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2980975Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2980980Z 2023-01-11T21:38:06.2981381Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2981462Z @triton.jit 2023-01-11T21:38:06.2981588Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2981663Z xnumel = 352 2023-01-11T21:38:06.2981764Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2981897Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2981982Z xmask = xindex < xnumel 2023-01-11T21:38:06.2982064Z x1 = (xindex // 4) % 11 2023-01-11T21:38:06.2982143Z x2 = (xindex // 44) 2023-01-11T21:38:06.2982214Z x3 = xindex % 44 2023-01-11T21:38:06.2982286Z x4 = xindex 2023-01-11T21:38:06.2982392Z tmp0 = (-3) + x1 2023-01-11T21:38:06.2982465Z tmp1 = 0 2023-01-11T21:38:06.2982545Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.2982614Z tmp3 = 4 2023-01-11T21:38:06.2982687Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.2982766Z tmp5 = tmp2 & tmp4 2023-01-11T21:38:06.2983025Z tmp6 = tl.load(in_ptr0 + ((-12) + x3 + (16*x2) + tl.zeros([XBLOCK], tl.int32)), tmp5 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2983124Z tmp7 = tl.where(tmp5, tmp6, 6.0) 2023-01-11T21:38:06.2983259Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 
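    # (annotation) this fp16 module differs from the fp32 one above only in the
    # '*fp16' signature and the .to(tl.float32) upcasts on each load: inductor
    # computes in fp32 and lets the store into the fp16 output pointer downcast.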
2023-01-11T21:38:06.2983373Z ''') 2023-01-11T21:38:06.2983379Z 2023-01-11T21:38:06.2983384Z 2023-01-11T21:38:06.2983479Z async_compile.wait(globals()) 2023-01-11T21:38:06.2983558Z del async_compile 2023-01-11T21:38:06.2983563Z 2023-01-11T21:38:06.2983633Z def call(args): 2023-01-11T21:38:06.2983711Z arg0_1, = args 2023-01-11T21:38:06.2983785Z args.clear() 2023-01-11T21:38:06.2983882Z with torch.cuda.device(0): 2023-01-11T21:38:06.2984106Z buf0 = empty_strided((2, 15, 11, 7), (1155, 77, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2984198Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2984352Z triton_fused_constant_pad_nd_0.run(arg0_1, buf0, 2310, grid=grid(2310), stream=stream0) 2023-01-11T21:38:06.2984569Z buf1 = empty_strided((2, 4, 11, 4), (176, 44, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2984723Z triton_fused_constant_pad_nd_1_1.run(arg0_1, buf1, 352, grid=grid(352), stream=stream0) 2023-01-11T21:38:06.2984800Z del arg0_1 2023-01-11T21:38:06.2984885Z return (buf0, buf1, ) 2023-01-11T21:38:06.2984890Z 2023-01-11T21:38:06.2984895Z 2023-01-11T21:38:06.2984975Z if __name__ == "__main__": 2023-01-11T21:38:06.2985094Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2985221Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2985435Z arg0_1 = rand_strided((2, 4, 4, 4), (64, 16, 4, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2985543Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2985554Z 2023-01-11T21:38:06.2985621Z ok (0.316s) 2023-01-11T21:38:06.2986105Z test_conv2d_backward_channels_last_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2986268Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2986531Z [2023-01-11 21:34:14,303] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 467 2023-01-11T21:38:06.2986798Z [2023-01-11 21:34:14,500] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 467 2023-01-11T21:38:06.2986803Z 2023-01-11T21:38:06.2986901Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2986976Z import torch 2023-01-11T21:38:06.2987050Z import random 2023-01-11T21:38:06.2987165Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2987296Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2987301Z 2023-01-11T21:38:06.2987385Z aten = torch.ops.aten 2023-01-11T21:38:06.2987526Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2987627Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2987635Z 2023-01-11T21:38:06.2987710Z import triton 2023-01-11T21:38:06.2987805Z import triton.language as tl 2023-01-11T21:38:06.2987931Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2988065Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2988070Z 2023-01-11T21:38:06.2988081Z 2023-01-11T21:38:06.2988234Z triton_fused_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.2988311Z import triton 2023-01-11T21:38:06.2988406Z import triton.language as tl 2023-01-11T21:38:06.2988521Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2988628Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2988762Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2988889Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2988894Z 2023-01-11T21:38:06.2988982Z @reduction(size_hints=[512, 128], 2023-01-11T21:38:06.2989101Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2989214Z filename=__file__, 2023-01-11T21:38:06.2989585Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2989661Z @triton.jit 2023-01-11T21:38:06.2989831Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2989907Z xnumel = 320 2023-01-11T21:38:06.2989983Z rnumel = 128 2023-01-11T21:38:06.2990077Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2990213Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2990299Z xmask = xindex < xnumel 2023-01-11T21:38:06.2990419Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2990492Z x0 = xindex 2023-01-11T21:38:06.2990609Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.2990719Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2990804Z rindex = roffset + rbase 2023-01-11T21:38:06.2990892Z rmask = rindex < rnumel 2023-01-11T21:38:06.2990970Z r1 = rindex % 64 2023-01-11T21:38:06.2991054Z r2 = (rindex // 64) 2023-01-11T21:38:06.2991289Z tmp0 = tl.load(in_ptr0 + (r1 + (64*x0) + (20480*r2)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2991412Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.2991527Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), 
[XBLOCK, 1]) 2023-01-11T21:38:06.2991621Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.2991744Z ''') 2023-01-11T21:38:06.2991750Z 2023-01-11T21:38:06.2991754Z 2023-01-11T21:38:06.2991854Z async_compile.wait(globals()) 2023-01-11T21:38:06.2991932Z del async_compile 2023-01-11T21:38:06.2991937Z 2023-01-11T21:38:06.2992014Z def call(args): 2023-01-11T21:38:06.2992103Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.2992182Z args.clear() 2023-01-11T21:38:06.2992270Z with torch.cuda.device(0): 2023-01-11T21:38:06.2992471Z buf0 = empty_strided((320, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2992565Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2992705Z triton_fused_sum_1_0.run(arg0_1, buf0, 320, 128, grid=grid(320), stream=stream0) 2023-01-11T21:38:06.2992877Z buf1 = aten.convolution_backward(arg0_1, arg1_1, arg2_1, [320], [1, 1], [0, 0], [1, 1], False, [0, 0], 1, [True, True, False]) 2023-01-11T21:38:06.2992951Z del arg0_1 2023-01-11T21:38:06.2993026Z del arg1_1 2023-01-11T21:38:06.2993102Z del arg2_1 2023-01-11T21:38:06.2993171Z buf2 = buf1[0] 2023-01-11T21:38:06.2993293Z assert_size_stride(buf2, (2, 2048, 8, 8), (131072, 1, 16384, 2048)) 2023-01-11T21:38:06.2993369Z buf3 = buf1[1] 2023-01-11T21:38:06.2993490Z assert_size_stride(buf3, (320, 2048, 1, 1), (2048, 1, 2048, 2048)) 2023-01-11T21:38:06.2993566Z del buf1 2023-01-11T21:38:06.2993658Z return (buf2, buf3, buf0, ) 2023-01-11T21:38:06.2993663Z 2023-01-11T21:38:06.2993668Z 2023-01-11T21:38:06.2993749Z if __name__ == "__main__": 2023-01-11T21:38:06.2993863Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2993991Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2994216Z arg0_1 = rand_strided((2, 320, 8, 8), (20480, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2994442Z arg1_1 = rand_strided((2, 2048, 8, 8), (131072, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2994672Z arg2_1 = rand_strided((320, 2048, 1, 1), (2048, 1, 2048, 2048), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2994801Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.2994806Z 2023-01-11T21:38:06.2994880Z ok (0.365s) 2023-01-11T21:38:06.2995078Z test_conv2d_binary_cuda (__main__.CudaTests) ... skip: only support cpu conv2d binary test (0.001s) 2023-01-11T21:38:06.2995254Z test_conv2d_channels_last_cuda (__main__.CudaTests) ... skip: only support cpu channels_last (0.001s) 2023-01-11T21:38:06.2995419Z test_conv2d_packed_cuda (__main__.CudaTests) ... skip: only support cpu conv2d unary test (0.000s) 2023-01-11T21:38:06.2995588Z test_conv2d_unary_cuda (__main__.CudaTests) ... skip: only support cpu conv2d unary test (0.001s) 2023-01-11T21:38:06.2995762Z test_conv3d_channels_last_cuda (__main__.CudaTests) ... skip: only support cpu channels_last (0.001s) 2023-01-11T21:38:06.2996302Z test_conv_autotune_cuda (__main__.CudaTests) ... 
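(Annotation, not part of the captured log; it refers to the conv2d-backward dump just above, before the autotune output that follows.) Inductor emits a Triton kernel only for the bias gradient -- triton_fused_sum_1_0 reduces grad_output over batch and spatial positions (xnumel = 320 channels, rnumel = 2 * 8 * 8 = 128 reduced elements per channel) -- and falls back to aten.convolution_backward for grad_input and grad_weight. A standalone sketch with shapes taken from the assert_size_stride calls (contiguous layout here for brevity; the test itself exercises channels_last):

import torch

grad_out = torch.randn(2, 320, 8, 8, device="cuda")
inp = torch.randn(2, 2048, 8, 8, device="cuda")
weight = torch.randn(320, 2048, 1, 1, device="cuda")

# the role of triton_fused_sum_1_0: a plain sum over (N, H, W) per channel
grad_bias = grad_out.sum(dim=(0, 2, 3))

# arguments copied from the call() above; output_mask=[True, True, False]
# means the third result (the bias grad) is left to the Triton reduction
grad_input, grad_weight, _ = torch.ops.aten.convolution_backward(
    grad_out, inp, weight, [320], [1, 1], [0, 0], [1, 1],
    False, [0, 0], 1, [True, True, False])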
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.2996387Z warnings.warn( 2023-01-11T21:38:06.2996650Z [2023-01-11 21:34:14,551] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 468 2023-01-11T21:38:06.2996912Z [2023-01-11 21:34:25,241] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 468 2023-01-11T21:38:06.2997111Z for key = ('conv', 32, 128, 32, 32, 32, 128, 1, 1, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:06.2997324Z timing {'aten.convolution': 0.05315199866890907, 'triton_ops.conv': 0.07064960002899169} 2023-01-11T21:38:06.2997420Z best_kernel aten.convolution 2023-01-11T21:38:06.2997426Z 2023-01-11T21:38:06.2997519Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2997595Z import torch 2023-01-11T21:38:06.2997671Z import random 2023-01-11T21:38:06.2997825Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2997950Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2997956Z 2023-01-11T21:38:06.2998038Z aten = torch.ops.aten 2023-01-11T21:38:06.2998175Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2998269Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2998274Z 2023-01-11T21:38:06.2998350Z import triton 2023-01-11T21:38:06.2998442Z import triton.language as tl 2023-01-11T21:38:06.2998567Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2998708Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2998714Z 2023-01-11T21:38:06.2998864Z from torch._inductor.triton_ops.conv_perf_model import early_config_prune 2023-01-11T21:38:06.2999014Z from torch._inductor.triton_ops.conv_perf_model import estimate_conv_time 2023-01-11T21:38:06.2999153Z from torch._inductor.triton_ops.autotune import conv_heuristics 2023-01-11T21:38:06.2999162Z 2023-01-11T21:38:06.2999167Z 2023-01-11T21:38:06.2999336Z triton_fused_convolution_0 = async_compile.triton(''' 2023-01-11T21:38:06.2999407Z import triton 2023-01-11T21:38:06.2999500Z import triton.language as tl 2023-01-11T21:38:06.2999616Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2999722Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2999855Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2999980Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2999985Z 2023-01-11T21:38:06.3000412Z @pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3000488Z @triton.jit 2023-01-11T21:38:06.3000619Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3000699Z xnumel = 1048576 2023-01-11T21:38:06.3000800Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3000930Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3001013Z xmask = xindex < xnumel 2023-01-11T21:38:06.3001085Z x3 = xindex 2023-01-11T21:38:06.3001163Z x1 = (xindex // 1024) % 32 2023-01-11T21:38:06.3001327Z tmp0 = tl.load(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), xmask) 
2023-01-11T21:38:06.3001465Z tmp1 = tl.load(in_ptr0 + (x1 + tl.zeros([XBLOCK], tl.int32)), xmask) 2023-01-11T21:38:06.3001546Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3001681Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3001769Z ''') 2023-01-11T21:38:06.3001775Z 2023-01-11T21:38:06.3001780Z 2023-01-11T21:38:06.3001879Z async_compile.wait(globals()) 2023-01-11T21:38:06.3001957Z del async_compile 2023-01-11T21:38:06.3001962Z 2023-01-11T21:38:06.3002032Z def call(args): 2023-01-11T21:38:06.3002123Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.3002200Z args.clear() 2023-01-11T21:38:06.3002292Z with torch.cuda.device(0): 2023-01-11T21:38:06.3002439Z buf0 = aten.convolution(arg0_1, arg1_1, None, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:06.3002563Z assert_size_stride(buf0, (32, 32, 32, 32), (32768, 1024, 32, 1)) 2023-01-11T21:38:06.3002638Z del arg0_1 2023-01-11T21:38:06.3002706Z del arg1_1 2023-01-11T21:38:06.3002837Z buf1 = as_strided(buf0, (32, 32, 32, 32), (32768, 1024, 32, 1)); del buf0 # reuse 2023-01-11T21:38:06.3002929Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3003084Z triton_fused_convolution_0.run(buf1, arg2_1, 1048576, grid=grid(1048576), stream=stream0) 2023-01-11T21:38:06.3003159Z del arg2_1 2023-01-11T21:38:06.3003237Z return (buf1, ) 2023-01-11T21:38:06.3003242Z 2023-01-11T21:38:06.3003246Z 2023-01-11T21:38:06.3003326Z if __name__ == "__main__": 2023-01-11T21:38:06.3003474Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3003595Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3003832Z arg0_1 = rand_strided((32, 128, 32, 32), (131072, 1024, 32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3004057Z arg1_1 = rand_strided((32, 128, 1, 1), (128, 1, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3004257Z arg2_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3004387Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.3004392Z 2023-01-11T21:38:06.3004465Z ok (10.912s) 2023-01-11T21:38:06.3004796Z test_conv_backward_cuda (__main__.CudaTests) ... 
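(Annotation, not part of the captured log; it refers to the test_conv_autotune output just above.) The autotuner timed both convolution backends for the (32, 128, 32, 32) -> (32, 32, 32, 32) 1x1 conv -- aten.convolution at ~0.053 ms vs triton_ops.conv at ~0.071 ms -- and kept aten, so the generated call() runs the convolution externally and the only Triton kernel left is the fused in-place bias add. A standalone sketch of what that call() executes (argument order copied from the dump):

import torch

x = torch.randn(32, 128, 32, 32, device="cuda")
w = torch.randn(32, 128, 1, 1, device="cuda")
b = torch.randn(32, device="cuda")

# external kernel chosen by the autotuner
out = torch.ops.aten.convolution(x, w, None, (1, 1), (0, 0), (1, 1), False, (0, 0), 1)
# triton_fused_convolution_0: x1 = (xindex // 1024) % 32 picks the channel,
# i.e. a broadcasted in-place bias add over all 32*32*32*32 = 1048576 elements
out += b.view(1, 32, 1, 1)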
[2023-01-11 21:34:25,504] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 469 2023-01-11T21:38:06.3005062Z [2023-01-11 21:34:25,571] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 469 2023-01-11T21:38:06.3005317Z [2023-01-11 21:34:25,662] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 470 2023-01-11T21:38:06.3005570Z [2023-01-11 21:34:25,729] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 470 2023-01-11T21:38:06.3005581Z 2023-01-11T21:38:06.3005675Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3005755Z import torch 2023-01-11T21:38:06.3005833Z import random 2023-01-11T21:38:06.3005954Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3006078Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3006084Z 2023-01-11T21:38:06.3006168Z aten = torch.ops.aten 2023-01-11T21:38:06.3006310Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3006402Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3006407Z 2023-01-11T21:38:06.3006483Z import triton 2023-01-11T21:38:06.3006577Z import triton.language as tl 2023-01-11T21:38:06.3006703Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3006846Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3006852Z 2023-01-11T21:38:06.3006856Z 2023-01-11T21:38:06.3006946Z async_compile.wait(globals()) 2023-01-11T21:38:06.3007026Z del async_compile 2023-01-11T21:38:06.3007031Z 2023-01-11T21:38:06.3007138Z def call(args): 2023-01-11T21:38:06.3007265Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1 = args 2023-01-11T21:38:06.3007344Z args.clear() 2023-01-11T21:38:06.3007516Z buf0 = aten.convolution_backward(arg0_1, arg1_1, arg2_1, [4], [1, 1], [0, 0], [1, 1], False, [0, 0], 1, [True, True, True]) 2023-01-11T21:38:06.3007591Z buf1 = buf0[0] 2023-01-11T21:38:06.3007706Z assert_size_stride(buf1, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:06.3007781Z buf2 = buf0[1] 2023-01-11T21:38:06.3007892Z assert_size_stride(buf2, (4, 4, 3, 3), (36, 9, 3, 1)) 2023-01-11T21:38:06.3007959Z buf3 = buf0[2] 2023-01-11T21:38:06.3008063Z assert_size_stride(buf3, (4, ), (1, )) 2023-01-11T21:38:06.3008134Z del buf0 2023-01-11T21:38:06.3008307Z buf4 = aten.convolution_backward(arg0_1, arg1_1, arg2_1, [4], [1, 1], [0, 0], [1, 1], False, [0, 0], 1, [True, False, False]) 2023-01-11T21:38:06.3008381Z del arg0_1 2023-01-11T21:38:06.3008454Z del arg1_1 2023-01-11T21:38:06.3008528Z del arg2_1 2023-01-11T21:38:06.3008596Z buf5 = buf4[0] 2023-01-11T21:38:06.3008708Z assert_size_stride(buf5, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:06.3008779Z del buf4 2023-01-11T21:38:06.3008944Z buf6 = aten.convolution_backward(arg3_1, arg4_1, arg5_1, [4], [1], [0], [1], False, [0], 1, [True, True, True]) 2023-01-11T21:38:06.3009019Z del arg3_1 2023-01-11T21:38:06.3009095Z del arg4_1 2023-01-11T21:38:06.3009168Z del arg5_1 2023-01-11T21:38:06.3009236Z buf7 = buf6[0] 2023-01-11T21:38:06.3009347Z assert_size_stride(buf7, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:06.3009422Z buf8 = buf6[1] 2023-01-11T21:38:06.3009571Z assert_size_stride(buf8, (4, 4, 3, 3), (36, 9, 3, 1)) 2023-01-11T21:38:06.3009647Z buf9 = buf6[2] 2023-01-11T21:38:06.3009749Z assert_size_stride(buf9, (4, ), (1, )) 2023-01-11T21:38:06.3009827Z del buf6 2023-01-11T21:38:06.3010001Z buf10 = aten.convolution_backward(arg6_1, 
arg7_1, arg8_1, [4], [1, 1, 1], [0, 0, 0], [1, 1, 1], False, [0, 0, 0], 1, [True, True, True]) 2023-01-11T21:38:06.3010076Z del arg6_1 2023-01-11T21:38:06.3010151Z del arg7_1 2023-01-11T21:38:06.3010225Z del arg8_1 2023-01-11T21:38:06.3010302Z buf11 = buf10[0] 2023-01-11T21:38:06.3010419Z assert_size_stride(buf11, (3, 4, 5, 5, 5), (500, 125, 25, 5, 1)) 2023-01-11T21:38:06.3010497Z buf12 = buf10[1] 2023-01-11T21:38:06.3010606Z assert_size_stride(buf12, (4, 4, 3, 3, 3), (108, 27, 9, 3, 1)) 2023-01-11T21:38:06.3010681Z buf13 = buf10[2] 2023-01-11T21:38:06.3010782Z assert_size_stride(buf13, (4, ), (1, )) 2023-01-11T21:38:06.3010853Z del buf10 2023-01-11T21:38:06.3010990Z return (buf1, buf2, buf3, buf5, buf7, buf8, buf9, buf11, buf12, buf13, ) 2023-01-11T21:38:06.3010996Z 2023-01-11T21:38:06.3011000Z 2023-01-11T21:38:06.3011082Z if __name__ == "__main__": 2023-01-11T21:38:06.3011202Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3011333Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3011540Z arg0_1 = rand_strided((3, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3011753Z arg1_1 = rand_strided((3, 4, 5, 5), (100, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3011959Z arg2_1 = rand_strided((4, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3012163Z arg3_1 = rand_strided((3, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3012369Z arg4_1 = rand_strided((3, 4, 5, 5), (100, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3012575Z arg5_1 = rand_strided((4, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3012800Z arg6_1 = rand_strided((3, 4, 3, 3, 3), (108, 27, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3013017Z arg7_1 = rand_strided((3, 4, 5, 5, 5), (500, 125, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3013250Z arg8_1 = rand_strided((4, 4, 3, 3, 3), (108, 27, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3013423Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1])) 2023-01-11T21:38:06.3013428Z 2023-01-11T21:38:06.3013433Z 2023-01-11T21:38:06.3013535Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3013611Z import torch 2023-01-11T21:38:06.3013687Z import random 2023-01-11T21:38:06.3013809Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3013939Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3013944Z 2023-01-11T21:38:06.3014036Z aten = torch.ops.aten 2023-01-11T21:38:06.3014169Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3014267Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3014272Z 2023-01-11T21:38:06.3014348Z import triton 2023-01-11T21:38:06.3014442Z import triton.language as tl 2023-01-11T21:38:06.3014692Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3014833Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3014839Z 2023-01-11T21:38:06.3014843Z 2023-01-11T21:38:06.3014933Z async_compile.wait(globals()) 2023-01-11T21:38:06.3015013Z del async_compile 2023-01-11T21:38:06.3015018Z 2023-01-11T21:38:06.3015085Z def call(args): 2023-01-11T21:38:06.3015214Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1 = args 2023-01-11T21:38:06.3015289Z args.clear() 2023-01-11T21:38:06.3015455Z buf0 = 
aten.convolution_backward(arg0_1, arg1_1, arg2_1, [4], [1, 1], [0, 0], [1, 1], False, [0, 0], 1, [True, True, True]) 2023-01-11T21:38:06.3015574Z buf1 = buf0[0] 2023-01-11T21:38:06.3015688Z assert_size_stride(buf1, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:06.3015765Z buf2 = buf0[1] 2023-01-11T21:38:06.3015875Z assert_size_stride(buf2, (4, 4, 3, 3), (36, 9, 3, 1)) 2023-01-11T21:38:06.3015942Z buf3 = buf0[2] 2023-01-11T21:38:06.3016046Z assert_size_stride(buf3, (4, ), (1, )) 2023-01-11T21:38:06.3016118Z del buf0 2023-01-11T21:38:06.3016287Z buf4 = aten.convolution_backward(arg0_1, arg1_1, arg2_1, [4], [1, 1], [0, 0], [1, 1], False, [0, 0], 1, [True, False, False]) 2023-01-11T21:38:06.3016361Z del arg0_1 2023-01-11T21:38:06.3016432Z del arg1_1 2023-01-11T21:38:06.3016506Z del arg2_1 2023-01-11T21:38:06.3016573Z buf5 = buf4[0] 2023-01-11T21:38:06.3016684Z assert_size_stride(buf5, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:06.3016758Z del buf4 2023-01-11T21:38:06.3016921Z buf6 = aten.convolution_backward(arg3_1, arg4_1, arg5_1, [4], [1], [0], [1], False, [0], 1, [True, True, True]) 2023-01-11T21:38:06.3016999Z del arg3_1 2023-01-11T21:38:06.3017071Z del arg4_1 2023-01-11T21:38:06.3017188Z del arg5_1 2023-01-11T21:38:06.3017265Z buf7 = buf6[0] 2023-01-11T21:38:06.3017380Z assert_size_stride(buf7, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:06.3017457Z buf8 = buf6[1] 2023-01-11T21:38:06.3017572Z assert_size_stride(buf8, (4, 4, 3, 3), (36, 9, 3, 1)) 2023-01-11T21:38:06.3017647Z buf9 = buf6[2] 2023-01-11T21:38:06.3017747Z assert_size_stride(buf9, (4, ), (1, )) 2023-01-11T21:38:06.3017812Z del buf6 2023-01-11T21:38:06.3017985Z buf10 = aten.convolution_backward(arg6_1, arg7_1, arg8_1, [4], [1, 1, 1], [0, 0, 0], [1, 1, 1], False, [0, 0, 0], 1, [True, True, True]) 2023-01-11T21:38:06.3018059Z del arg6_1 2023-01-11T21:38:06.3018132Z del arg7_1 2023-01-11T21:38:06.3018204Z del arg8_1 2023-01-11T21:38:06.3018279Z buf11 = buf10[0] 2023-01-11T21:38:06.3018398Z assert_size_stride(buf11, (3, 4, 5, 5, 5), (500, 125, 25, 5, 1)) 2023-01-11T21:38:06.3018472Z buf12 = buf10[1] 2023-01-11T21:38:06.3018588Z assert_size_stride(buf12, (4, 4, 3, 3, 3), (108, 27, 9, 3, 1)) 2023-01-11T21:38:06.3018663Z buf13 = buf10[2] 2023-01-11T21:38:06.3018766Z assert_size_stride(buf13, (4, ), (1, )) 2023-01-11T21:38:06.3018838Z del buf10 2023-01-11T21:38:06.3019010Z return (buf1, buf2, buf3, buf5, buf7, buf8, buf9, buf11, buf12, buf13, ) 2023-01-11T21:38:06.3019016Z 2023-01-11T21:38:06.3019021Z 2023-01-11T21:38:06.3019106Z if __name__ == "__main__": 2023-01-11T21:38:06.3019227Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3019350Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3019566Z arg0_1 = rand_strided((3, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3019808Z arg1_1 = rand_strided((3, 4, 5, 5), (100, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3020041Z arg2_1 = rand_strided((4, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3020274Z arg3_1 = rand_strided((3, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3020509Z arg4_1 = rand_strided((3, 4, 5, 5), (100, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3020738Z arg5_1 = rand_strided((4, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3020987Z arg6_1 = rand_strided((3, 4, 3, 3, 3), (108, 27, 9, 3, 1), device='cpu', dtype=torch.float32) 
2023-01-11T21:38:06.3021231Z arg7_1 = rand_strided((3, 4, 5, 5, 5), (500, 125, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3021475Z arg8_1 = rand_strided((4, 4, 3, 3, 3), (108, 27, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3021657Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1])) 2023-01-11T21:38:06.3021663Z 2023-01-11T21:38:06.3021735Z ok (0.319s) 2023-01-11T21:38:06.3021943Z test_conv_bn_fuse_cuda (__main__.CudaTests) ... skip: only support cpu conv bn test (0.001s) 2023-01-11T21:38:06.3022142Z test_conv_functional_bn_fuse_cuda (__main__.CudaTests) ... skip: only support cpu conv bn test (0.001s) 2023-01-11T21:38:06.3022690Z test_convolution1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3022835Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3023137Z [2023-01-11 21:34:25,828] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 471 2023-01-11T21:38:06.3023438Z [2023-01-11 21:34:25,939] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 471 2023-01-11T21:38:06.3023931Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3024076Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3024359Z [2023-01-11 21:34:26,044] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 472 2023-01-11T21:38:06.3024662Z [2023-01-11 21:34:26,149] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 472 2023-01-11T21:38:06.3024667Z 2023-01-11T21:38:06.3024772Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3024850Z import torch 2023-01-11T21:38:06.3024928Z import random 2023-01-11T21:38:06.3025063Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3025199Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3025204Z 2023-01-11T21:38:06.3025291Z aten = torch.ops.aten 2023-01-11T21:38:06.3025435Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3025541Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3025576Z 2023-01-11T21:38:06.3025665Z import triton 2023-01-11T21:38:06.3025781Z import triton.language as tl 2023-01-11T21:38:06.3025942Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3026097Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3026102Z 2023-01-11T21:38:06.3026107Z 2023-01-11T21:38:06.3026287Z triton_fused_le_relu_0 = async_compile.triton(''' 2023-01-11T21:38:06.3026358Z import triton 2023-01-11T21:38:06.3026457Z import triton.language as tl 2023-01-11T21:38:06.3026581Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3026691Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3026840Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3026976Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3026981Z 2023-01-11T21:38:06.3027496Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3027571Z @triton.jit 2023-01-11T21:38:06.3027719Z def triton_(in_out_ptr0, in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3027790Z xnumel = 2352 2023-01-11T21:38:06.3027890Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3028021Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3028105Z xmask = xindex < xnumel 2023-01-11T21:38:06.3028179Z x3 = xindex 2023-01-11T21:38:06.3028262Z x1 = (xindex // 196) % 6 2023-01-11T21:38:06.3028390Z tmp0 = tl.load(in_out_ptr0 + (x3), xmask) 2023-01-11T21:38:06.3028489Z tmp1 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.3028568Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3028687Z tmp3 = tl.where(0 != 0, 0, tl.where(0 > tmp2, 0, tmp2)) 2023-01-11T21:38:06.3028759Z tmp4 = 0 2023-01-11T21:38:06.3028846Z tmp5 = tmp3 <= tmp4 2023-01-11T21:38:06.3028986Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3029120Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3029202Z ''') 2023-01-11T21:38:06.3029207Z 2023-01-11T21:38:06.3029212Z 2023-01-11T21:38:06.3029307Z async_compile.wait(globals()) 2023-01-11T21:38:06.3029387Z del async_compile 2023-01-11T21:38:06.3029392Z 2023-01-11T21:38:06.3029469Z def call(args): 
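    # (annotation) buf1 below ends up holding relu(conv(primals_3, primals_1) + bias)
    # computed in place, and buf2 is the boolean (result <= 0) mask that
    # triton_fused_le_relu_0 stores alongside it, presumably for the backward pass.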
2023-01-11T21:38:06.3029580Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.3029657Z args.clear() 2023-01-11T21:38:06.3029754Z with torch.cuda.device(0): 2023-01-11T21:38:06.3029907Z buf0 = aten.convolution(primals_3, primals_1, None, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:06.3030028Z assert_size_stride(buf0, (2, 6, 14, 14), (1176, 196, 14, 1)) 2023-01-11T21:38:06.3030161Z buf1 = as_strided(buf0, (2, 6, 14, 14), (1176, 196, 14, 1)); del buf0 # reuse 2023-01-11T21:38:06.3030380Z buf2 = empty_strided((2, 6, 14, 14), (1176, 196, 14, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.3030474Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3030633Z triton_fused_le_relu_0.run(buf1, primals_2, buf2, 2352, grid=grid(2352), stream=stream0) 2023-01-11T21:38:06.3030716Z del primals_2 2023-01-11T21:38:06.3030827Z return (buf1, primals_1, primals_3, buf2, ) 2023-01-11T21:38:06.3030833Z 2023-01-11T21:38:06.3030837Z 2023-01-11T21:38:06.3030919Z if __name__ == "__main__": 2023-01-11T21:38:06.3031033Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3031163Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3031387Z primals_1 = rand_strided((6, 5, 3, 3), (45, 9, 3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3031592Z primals_2 = rand_strided((6, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3031851Z primals_3 = rand_strided((2, 5, 16, 16), (1280, 256, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3031994Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.3032000Z 2023-01-11T21:38:06.3032004Z 2023-01-11T21:38:06.3032106Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3032183Z import torch 2023-01-11T21:38:06.3032253Z import random 2023-01-11T21:38:06.3032375Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3032499Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3032504Z 2023-01-11T21:38:06.3032584Z aten = torch.ops.aten 2023-01-11T21:38:06.3032725Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3032821Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3032826Z 2023-01-11T21:38:06.3032902Z import triton 2023-01-11T21:38:06.3032990Z import triton.language as tl 2023-01-11T21:38:06.3033116Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3033260Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3033265Z 2023-01-11T21:38:06.3033270Z 2023-01-11T21:38:06.3033429Z triton_fused_le_relu_0 = async_compile.triton(''' 2023-01-11T21:38:06.3033506Z import triton 2023-01-11T21:38:06.3033601Z import triton.language as tl 2023-01-11T21:38:06.3033715Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3033818Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3033945Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3034073Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3034105Z 2023-01-11T21:38:06.3034538Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3034615Z @triton.jit 2023-01-11T21:38:06.3034765Z def triton_(in_out_ptr0, in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
2023-01-11T21:38:06.3034841Z xnumel = 2352 2023-01-11T21:38:06.3034942Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3035077Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3035175Z xmask = xindex < xnumel 2023-01-11T21:38:06.3035252Z x3 = xindex 2023-01-11T21:38:06.3035351Z x1 = (xindex // 196) % 6 2023-01-11T21:38:06.3035487Z tmp0 = tl.load(in_out_ptr0 + (x3), xmask).to(tl.float32) 2023-01-11T21:38:06.3035604Z tmp1 = tl.load(in_ptr0 + (x1), xmask).to(tl.float32) 2023-01-11T21:38:06.3035685Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3035808Z tmp3 = tl.where(0 != 0, 0, tl.where(0 > tmp2, 0, tmp2)) 2023-01-11T21:38:06.3035891Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3035964Z tmp5 = 0 2023-01-11T21:38:06.3036045Z tmp6 = tmp4 <= tmp5 2023-01-11T21:38:06.3036183Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3036318Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3036407Z ''') 2023-01-11T21:38:06.3036413Z 2023-01-11T21:38:06.3036417Z 2023-01-11T21:38:06.3036510Z async_compile.wait(globals()) 2023-01-11T21:38:06.3036582Z del async_compile 2023-01-11T21:38:06.3036594Z 2023-01-11T21:38:06.3036664Z def call(args): 2023-01-11T21:38:06.3036769Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.3036848Z args.clear() 2023-01-11T21:38:06.3036942Z with torch.cuda.device(0): 2023-01-11T21:38:06.3037101Z buf0 = aten.convolution(primals_3, primals_1, None, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:06.3037223Z assert_size_stride(buf0, (2, 6, 14, 14), (1176, 196, 14, 1)) 2023-01-11T21:38:06.3037351Z buf1 = as_strided(buf0, (2, 6, 14, 14), (1176, 196, 14, 1)); del buf0 # reuse 2023-01-11T21:38:06.3037593Z buf2 = empty_strided((2, 6, 14, 14), (1176, 196, 14, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.3037691Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3037845Z triton_fused_le_relu_0.run(buf1, primals_2, buf2, 2352, grid=grid(2352), stream=stream0) 2023-01-11T21:38:06.3037924Z del primals_2 2023-01-11T21:38:06.3038038Z return (buf1, primals_1, primals_3, buf2, ) 2023-01-11T21:38:06.3038044Z 2023-01-11T21:38:06.3038048Z 2023-01-11T21:38:06.3038132Z if __name__ == "__main__": 2023-01-11T21:38:06.3038251Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3038379Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3038594Z primals_1 = rand_strided((6, 5, 3, 3), (45, 9, 3, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3038804Z primals_2 = rand_strided((6, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3039030Z primals_3 = rand_strided((2, 5, 16, 16), (1280, 256, 16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3039178Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.3039183Z 2023-01-11T21:38:06.3039257Z ok (0.412s) 2023-01-11T21:38:06.3039716Z test_convolution2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3039849Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3040140Z [2023-01-11 21:34:26,193] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 473 2023-01-11T21:38:06.3040407Z [2023-01-11 21:34:26,291] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 473 2023-01-11T21:38:06.3040413Z 2023-01-11T21:38:06.3040514Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3040584Z import torch 2023-01-11T21:38:06.3040661Z import random 2023-01-11T21:38:06.3040782Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3040907Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3040912Z 2023-01-11T21:38:06.3040999Z aten = torch.ops.aten 2023-01-11T21:38:06.3041138Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3041234Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3041239Z 2023-01-11T21:38:06.3041308Z import triton 2023-01-11T21:38:06.3041403Z import triton.language as tl 2023-01-11T21:38:06.3041529Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3041673Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3041679Z 2023-01-11T21:38:06.3041683Z 2023-01-11T21:38:06.3041853Z triton_fused_convolution_0 = async_compile.triton(''' 2023-01-11T21:38:06.3041930Z import triton 2023-01-11T21:38:06.3042025Z import triton.language as tl 2023-01-11T21:38:06.3042143Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3042241Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3042376Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3042502Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3042507Z 2023-01-11T21:38:06.3042930Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3043008Z @triton.jit 2023-01-11T21:38:06.3043146Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3043223Z xnumel = 11648 2023-01-11T21:38:06.3043321Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3043444Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3043554Z xmask = xindex < xnumel 2023-01-11T21:38:06.3043630Z x3 = xindex 2023-01-11T21:38:06.3043716Z x1 = (xindex // 364) % 16 2023-01-11T21:38:06.3043823Z tmp0 = tl.load(in_out_ptr0 + (x3), xmask) 2023-01-11T21:38:06.3043919Z tmp1 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.3044001Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3044133Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3044222Z ''') 2023-01-11T21:38:06.3044227Z 2023-01-11T21:38:06.3044232Z 2023-01-11T21:38:06.3044324Z async_compile.wait(globals()) 2023-01-11T21:38:06.3044405Z del async_compile 2023-01-11T21:38:06.3044413Z 2023-01-11T21:38:06.3044490Z def call(args): 2023-01-11T21:38:06.3044577Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.3044654Z args.clear() 2023-01-11T21:38:06.3044747Z with torch.cuda.device(0): 2023-01-11T21:38:06.3044884Z buf0 = aten.convolution(arg0_1, arg1_1, None, (4,), (0,), (1,), True, (0,), 1) 
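        # (annotation) transposed=True makes this a 1-d transposed convolution:
        # L_out = (L_in - 1) * stride + kernel = (90 - 1) * 4 + 8 = 364, matching
        # the (2, 16, 364) assert below; the bias add is again left to the fused kernel.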
2023-01-11T21:38:06.3045002Z assert_size_stride(buf0, (2, 16, 364), (5824, 364, 1)) 2023-01-11T21:38:06.3045076Z del arg0_1 2023-01-11T21:38:06.3045152Z del arg1_1 2023-01-11T21:38:06.3045279Z buf1 = as_strided(buf0, (2, 16, 364), (5824, 364, 1)); del buf0 # reuse 2023-01-11T21:38:06.3045373Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3045524Z triton_fused_convolution_0.run(buf1, arg2_1, 11648, grid=grid(11648), stream=stream0) 2023-01-11T21:38:06.3045592Z del arg2_1 2023-01-11T21:38:06.3045670Z return (buf1, ) 2023-01-11T21:38:06.3045675Z 2023-01-11T21:38:06.3045680Z 2023-01-11T21:38:06.3045758Z if __name__ == "__main__": 2023-01-11T21:38:06.3045903Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3046032Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3046250Z arg0_1 = rand_strided((2, 32, 90), (2880, 90, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3046462Z arg1_1 = rand_strided((32, 16, 8), (128, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3046660Z arg2_1 = rand_strided((16, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3046783Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.3046793Z 2023-01-11T21:38:06.3046860Z ok (0.141s) 2023-01-11T21:38:06.3047316Z test_cos_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3047452Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3047711Z [2023-01-11 21:34:26,322] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 474 2023-01-11T21:38:06.3047980Z [2023-01-11 21:34:26,404] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 474 2023-01-11T21:38:06.3048400Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3048534Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3048789Z [2023-01-11 21:34:26,433] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 475 2023-01-11T21:38:06.3049054Z [2023-01-11 21:34:26,514] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 475 2023-01-11T21:38:06.3049060Z 2023-01-11T21:38:06.3049158Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3049229Z import torch 2023-01-11T21:38:06.3049378Z import random 2023-01-11T21:38:06.3049500Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3049626Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3049631Z 2023-01-11T21:38:06.3049714Z aten = torch.ops.aten 2023-01-11T21:38:06.3049854Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3049955Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3049960Z 2023-01-11T21:38:06.3050036Z import triton 2023-01-11T21:38:06.3050124Z import triton.language as tl 2023-01-11T21:38:06.3050251Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3050394Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3050399Z 2023-01-11T21:38:06.3050404Z 2023-01-11T21:38:06.3050572Z triton_fused_add_cos_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3050649Z import triton 2023-01-11T21:38:06.3050745Z import triton.language as tl 2023-01-11T21:38:06.3050864Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3050961Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3051095Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3051221Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3051227Z 2023-01-11T21:38:06.3051650Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3051726Z @triton.jit 2023-01-11T21:38:06.3051871Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3051984Z xnumel = 256 2023-01-11T21:38:06.3052083Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3052206Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3052292Z xmask = xindex < xnumel 2023-01-11T21:38:06.3052371Z x0 = xindex 2023-01-11T21:38:06.3052564Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3052664Z tmp4 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3052747Z tmp1 = tl.cos(tmp0) 2023-01-11T21:38:06.3052818Z tmp2 = 2 2023-01-11T21:38:06.3052893Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.3052966Z tmp5 = 1 2023-01-11T21:38:06.3053046Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.3053126Z tmp7 = tl.cos(tmp6) 2023-01-11T21:38:06.3053266Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3053402Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3053492Z ''') 2023-01-11T21:38:06.3053498Z 2023-01-11T21:38:06.3053502Z 2023-01-11T21:38:06.3053598Z async_compile.wait(globals()) 2023-01-11T21:38:06.3053671Z del async_compile 2023-01-11T21:38:06.3053676Z 2023-01-11T21:38:06.3053752Z def 
call(args): 2023-01-11T21:38:06.3053827Z arg0_1, = args 2023-01-11T21:38:06.3053905Z args.clear() 2023-01-11T21:38:06.3053998Z with torch.cuda.device(0): 2023-01-11T21:38:06.3054208Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3054408Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3054601Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3054753Z triton_fused_add_cos_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.3054828Z del arg0_1 2023-01-11T21:38:06.3054912Z return (buf0, buf1, ) 2023-01-11T21:38:06.3054918Z 2023-01-11T21:38:06.3054926Z 2023-01-11T21:38:06.3055009Z if __name__ == "__main__": 2023-01-11T21:38:06.3055129Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3055258Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3055490Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3055659Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3055667Z 2023-01-11T21:38:06.3055672Z 2023-01-11T21:38:06.3055777Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3055851Z import torch 2023-01-11T21:38:06.3055925Z import random 2023-01-11T21:38:06.3056046Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3056170Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3056175Z 2023-01-11T21:38:06.3056260Z aten = torch.ops.aten 2023-01-11T21:38:06.3056391Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3056489Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3056497Z 2023-01-11T21:38:06.3056573Z import triton 2023-01-11T21:38:06.3056666Z import triton.language as tl 2023-01-11T21:38:06.3056796Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3056937Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3056942Z 2023-01-11T21:38:06.3056949Z 2023-01-11T21:38:06.3057113Z triton_fused_add_cos_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3057244Z import triton 2023-01-11T21:38:06.3057333Z import triton.language as tl 2023-01-11T21:38:06.3057449Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3057552Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3057688Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3057814Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3057820Z 2023-01-11T21:38:06.3058243Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3058356Z @triton.jit 2023-01-11T21:38:06.3058503Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3058573Z xnumel = 256 2023-01-11T21:38:06.3058675Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3058806Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3058892Z xmask = xindex < xnumel 2023-01-11T21:38:06.3058963Z x0 = xindex 2023-01-11T21:38:06.3059178Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3059297Z tmp4 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3059381Z tmp1 = tl.cos(tmp0) 
2023-01-11T21:38:06.3059447Z tmp2 = 2 2023-01-11T21:38:06.3059526Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.3059599Z tmp5 = 1 2023-01-11T21:38:06.3059681Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.3059761Z tmp7 = tl.cos(tmp6) 2023-01-11T21:38:06.3059901Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3060030Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3060118Z ''') 2023-01-11T21:38:06.3060124Z 2023-01-11T21:38:06.3060131Z 2023-01-11T21:38:06.3060227Z async_compile.wait(globals()) 2023-01-11T21:38:06.3060306Z del async_compile 2023-01-11T21:38:06.3060311Z 2023-01-11T21:38:06.3060387Z def call(args): 2023-01-11T21:38:06.3060462Z arg0_1, = args 2023-01-11T21:38:06.3060538Z args.clear() 2023-01-11T21:38:06.3060633Z with torch.cuda.device(0): 2023-01-11T21:38:06.3060833Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3061032Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3061125Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3061279Z triton_fused_add_cos_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.3061353Z del arg0_1 2023-01-11T21:38:06.3061437Z return (buf0, buf1, ) 2023-01-11T21:38:06.3061442Z 2023-01-11T21:38:06.3061447Z 2023-01-11T21:38:06.3061528Z if __name__ == "__main__": 2023-01-11T21:38:06.3061680Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3061801Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3062003Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3062113Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3062119Z 2023-01-11T21:38:06.3062188Z ok (0.222s) 2023-01-11T21:38:06.3062359Z test_cpp_wrapper_cuda (__main__.CudaTests) ... skip: cpp_wrapper only supports cpu (0.001s) 2023-01-11T21:38:06.3062814Z test_cudnn_rnn_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3062953Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3063208Z [2023-01-11 21:34:28,899] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 476 2023-01-11T21:38:06.3063430Z [2023-01-11 21:34:28,989] torch._inductor.ir: [WARNING] Using FallbackKernel: aten._cudnn_rnn 2023-01-11T21:38:06.3063686Z [2023-01-11 21:34:28,995] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 476 2023-01-11T21:38:06.3063699Z 2023-01-11T21:38:06.3063790Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3063863Z import torch 2023-01-11T21:38:06.3063937Z import random 2023-01-11T21:38:06.3064057Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3064206Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3064212Z 2023-01-11T21:38:06.3064292Z aten = torch.ops.aten 2023-01-11T21:38:06.3064428Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3064516Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3064523Z 2023-01-11T21:38:06.3064596Z import triton 2023-01-11T21:38:06.3064688Z import triton.language as tl 2023-01-11T21:38:06.3064813Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3064952Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3064957Z 2023-01-11T21:38:06.3064962Z 2023-01-11T21:38:06.3065055Z async_compile.wait(globals()) 2023-01-11T21:38:06.3065130Z del async_compile 2023-01-11T21:38:06.3065135Z 2023-01-11T21:38:06.3065208Z def call(args): 2023-01-11T21:38:06.3065395Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1, arg17_1, arg18_1, arg19_1 = args 2023-01-11T21:38:06.3065491Z args.clear() 2023-01-11T21:38:06.3065592Z with torch.cuda.device(0): 2023-01-11T21:38:06.3065840Z buf0 = aten._cudnn_rnn(arg0_1, [arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1], 4, arg17_1, arg18_1, arg19_1, 2, 2048, 0, 2, False, 0.0, False, True, [], None) 2023-01-11T21:38:06.3065913Z del arg0_1 2023-01-11T21:38:06.3065984Z del arg10_1 2023-01-11T21:38:06.3066055Z del arg11_1 2023-01-11T21:38:06.3066126Z del arg12_1 2023-01-11T21:38:06.3066189Z del arg13_1 2023-01-11T21:38:06.3066260Z del arg14_1 2023-01-11T21:38:06.3066328Z del arg15_1 2023-01-11T21:38:06.3066398Z del arg16_1 2023-01-11T21:38:06.3066467Z del arg17_1 2023-01-11T21:38:06.3066536Z del arg18_1 2023-01-11T21:38:06.3066599Z del arg19_1 2023-01-11T21:38:06.3066672Z del arg1_1 2023-01-11T21:38:06.3066740Z del arg2_1 2023-01-11T21:38:06.3066811Z del arg3_1 2023-01-11T21:38:06.3066882Z del arg4_1 2023-01-11T21:38:06.3066951Z del arg5_1 2023-01-11T21:38:06.3067021Z del arg6_1 2023-01-11T21:38:06.3067084Z del arg7_1 2023-01-11T21:38:06.3067154Z del arg8_1 2023-01-11T21:38:06.3067251Z del arg9_1 2023-01-11T21:38:06.3067326Z buf1 = buf0[0] 2023-01-11T21:38:06.3067439Z assert_size_stride(buf1, (92, 8, 4096), (32768, 4096, 1)) 2023-01-11T21:38:06.3067514Z buf2 = buf0[1] 2023-01-11T21:38:06.3067626Z assert_size_stride(buf2, (4, 8, 2048), (16384, 2048, 1)) 2023-01-11T21:38:06.3067694Z buf3 = buf0[2] 2023-01-11T21:38:06.3067804Z assert_size_stride(buf3, (4, 8, 2048), (16384, 2048, 1)) 2023-01-11T21:38:06.3067881Z buf4 = buf0[3] 
2023-01-11T21:38:06.3067981Z assert_size_stride(buf4, (0, ), (1, )) 2023-01-11T21:38:06.3068056Z buf5 = buf0[4] 2023-01-11T21:38:06.3068167Z assert_size_stride(buf5, (167837696, ), (1, )) 2023-01-11T21:38:06.3068238Z del buf0 2023-01-11T21:38:06.3068334Z return (buf1, buf2, buf3, buf4, buf5, ) 2023-01-11T21:38:06.3068339Z 2023-01-11T21:38:06.3068344Z 2023-01-11T21:38:06.3068423Z if __name__ == "__main__": 2023-01-11T21:38:06.3068544Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3068669Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3068895Z arg0_1 = rand_strided((92, 8, 2048), (16384, 2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3069106Z arg1_1 = rand_strided((8192, 2048), (2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3069313Z arg2_1 = rand_strided((8192, 2048), (2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3069512Z arg3_1 = rand_strided((8192, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3069703Z arg4_1 = rand_strided((8192, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3069934Z arg5_1 = rand_strided((8192, 2048), (2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3070136Z arg6_1 = rand_strided((8192, 2048), (2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3070332Z arg7_1 = rand_strided((8192, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3070526Z arg8_1 = rand_strided((8192, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3070726Z arg9_1 = rand_strided((8192, 4096), (4096, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3070935Z arg10_1 = rand_strided((8192, 2048), (2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3071136Z arg11_1 = rand_strided((8192, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3071326Z arg12_1 = rand_strided((8192, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3071535Z arg13_1 = rand_strided((8192, 4096), (4096, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3071742Z arg14_1 = rand_strided((8192, 2048), (2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3071937Z arg15_1 = rand_strided((8192, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3072136Z arg16_1 = rand_strided((8192, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3072342Z arg17_1 = rand_strided((167837696, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3072558Z arg18_1 = rand_strided((4, 8, 2048), (16384, 2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3072773Z arg19_1 = rand_strided((4, 8, 2048), (16384, 2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3072990Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1, arg17_1, arg18_1, arg19_1])) 2023-01-11T21:38:06.3072999Z 2023-01-11T21:38:06.3073063Z ok (2.801s) 2023-01-11T21:38:06.3073549Z test_dense_mask_index_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3073683Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3073942Z [2023-01-11 21:34:29,342] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 477 2023-01-11T21:38:06.3074204Z [2023-01-11 21:34:29,516] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 477 2023-01-11T21:38:06.3074618Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3074755Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3075011Z [2023-01-11 21:34:29,540] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 478 2023-01-11T21:38:06.3075017Z 2023-01-11T21:38:06.3075114Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3075187Z import torch 2023-01-11T21:38:06.3075260Z import random 2023-01-11T21:38:06.3075372Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3075495Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3075500Z 2023-01-11T21:38:06.3075582Z aten = torch.ops.aten 2023-01-11T21:38:06.3075721Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3075817Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3075850Z 2023-01-11T21:38:06.3075925Z import triton 2023-01-11T21:38:06.3076018Z import triton.language as tl 2023-01-11T21:38:06.3076136Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3076273Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3076279Z 2023-01-11T21:38:06.3076285Z 2023-01-11T21:38:06.3076460Z triton_fused_mul_select_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3076537Z import triton 2023-01-11T21:38:06.3076628Z import triton.language as tl 2023-01-11T21:38:06.3076742Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3076842Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3076972Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.3077090Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3077095Z 2023-01-11T21:38:06.3077187Z @reduction(size_hints=[16, 8192], 2023-01-11T21:38:06.3077304Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.3077395Z filename=__file__, 2023-01-11T21:38:06.3077772Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3077852Z @triton.jit 2023-01-11T21:38:06.3078028Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.3078102Z xnumel = 13 2023-01-11T21:38:06.3078170Z rnumel = 7877 2023-01-11T21:38:06.3078267Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3078401Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.3078484Z xmask = xindex < xnumel 2023-01-11T21:38:06.3078600Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 
2023-01-11T21:38:06.3078671Z x0 = xindex 2023-01-11T21:38:06.3078811Z tmp4 = tl.load(in_ptr1 + (2 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3078924Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.3079028Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.3079119Z rindex = roffset + rbase 2023-01-11T21:38:06.3079204Z rmask = rindex < rnumel 2023-01-11T21:38:06.3079300Z r1 = rindex 2023-01-11T21:38:06.3079382Z tmp0 = r1 + (7877*x0) 2023-01-11T21:38:06.3079458Z tmp1 = 102400 2023-01-11T21:38:06.3079532Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.3079703Z tmp3 = tl.load(in_ptr0 + (r1 + (7877*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, other=0) 2023-01-11T21:38:06.3079784Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3079880Z tmp6 = tl.where(tmp2, tmp5, 0) 2023-01-11T21:38:06.3080002Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.3080116Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.3080219Z tl.store(out_ptr0 + x0, tmp7, xmask) 2023-01-11T21:38:06.3080297Z ''') 2023-01-11T21:38:06.3080303Z 2023-01-11T21:38:06.3080313Z 2023-01-11T21:38:06.3080483Z triton_fused_mul_select_sum_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.3080556Z import triton 2023-01-11T21:38:06.3080648Z import triton.language as tl 2023-01-11T21:38:06.3080764Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3080865Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3080997Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.3081123Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3081129Z 2023-01-11T21:38:06.3081214Z @reduction(size_hints=[1, 16], 2023-01-11T21:38:06.3081328Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.3081415Z filename=__file__, 2023-01-11T21:38:06.3081774Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.3081873Z @triton.jit 2023-01-11T21:38:06.3082041Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.3082115Z xnumel = 1 2023-01-11T21:38:06.3082186Z rnumel = 13 2023-01-11T21:38:06.3082278Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3082415Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.3082498Z xmask = xindex < xnumel 2023-01-11T21:38:06.3082616Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.3082734Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.3082838Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.3082927Z rindex = roffset + rbase 2023-01-11T21:38:06.3083005Z rmask = rindex < rnumel 2023-01-11T21:38:06.3083075Z r0 = rindex 2023-01-11T21:38:06.3083181Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.3083303Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.3083416Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.3083551Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.3083640Z ''') 2023-01-11T21:38:06.3083648Z 2023-01-11T21:38:06.3083652Z 2023-01-11T21:38:06.3083738Z async_compile.wait(globals()) 2023-01-11T21:38:06.3083817Z del async_compile 2023-01-11T21:38:06.3083822Z 2023-01-11T21:38:06.3083895Z def call(args): 2023-01-11T21:38:06.3083974Z arg0_1, 
arg1_1 = args 2023-01-11T21:38:06.3084048Z args.clear() 2023-01-11T21:38:06.3084141Z with torch.cuda.device(0): 2023-01-11T21:38:06.3084339Z buf0 = empty_strided((13, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3084424Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3084582Z triton_fused_mul_select_sum_1_0.run(arg0_1, arg1_1, buf0, 13, 7877, grid=grid(13), stream=stream0) 2023-01-11T21:38:06.3084658Z del arg0_1 2023-01-11T21:38:06.3084731Z del arg1_1 2023-01-11T21:38:06.3084920Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3085069Z triton_fused_mul_select_sum_1_1.run(buf0, buf1, 1, 13, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.3085176Z return (buf1, ) 2023-01-11T21:38:06.3085181Z 2023-01-11T21:38:06.3085186Z 2023-01-11T21:38:06.3085270Z if __name__ == "__main__": 2023-01-11T21:38:06.3085381Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3085529Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3085765Z arg0_1 = rand_strided((102400, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3085960Z arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3086080Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3086088Z 2023-01-11T21:38:06.3086353Z [2023-01-11 21:34:29,639] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 478 2023-01-11T21:38:06.3086359Z 2023-01-11T21:38:06.3086462Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3086535Z import torch 2023-01-11T21:38:06.3086603Z import random 2023-01-11T21:38:06.3086723Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3086849Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3086854Z 2023-01-11T21:38:06.3086936Z aten = torch.ops.aten 2023-01-11T21:38:06.3087074Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3087170Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3087176Z 2023-01-11T21:38:06.3087249Z import triton 2023-01-11T21:38:06.3087344Z import triton.language as tl 2023-01-11T21:38:06.3087462Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3087601Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3087633Z 2023-01-11T21:38:06.3087637Z 2023-01-11T21:38:06.3087808Z triton_fused_mul_select_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3087884Z import triton 2023-01-11T21:38:06.3087974Z import triton.language as tl 2023-01-11T21:38:06.3088088Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3088192Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3088324Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.3088443Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3088448Z 2023-01-11T21:38:06.3088540Z @reduction(size_hints=[16, 8192], 2023-01-11T21:38:06.3088657Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.3088742Z filename=__file__, 2023-01-11T21:38:06.3089114Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3089194Z @triton.jit 2023-01-11T21:38:06.3089368Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 
2023-01-11T21:38:06.3089442Z xnumel = 13 2023-01-11T21:38:06.3089510Z rnumel = 7877 2023-01-11T21:38:06.3089610Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3089747Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.3089832Z xmask = xindex < xnumel 2023-01-11T21:38:06.3089949Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.3090019Z x0 = xindex 2023-01-11T21:38:06.3090174Z tmp4 = tl.load(in_ptr1 + (2 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None).to(tl.float32) 2023-01-11T21:38:06.3090285Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.3090391Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.3090477Z rindex = roffset + rbase 2023-01-11T21:38:06.3090566Z rmask = rindex < rnumel 2023-01-11T21:38:06.3090639Z r1 = rindex 2023-01-11T21:38:06.3090717Z tmp0 = r1 + (7877*x0) 2023-01-11T21:38:06.3090790Z tmp1 = 102400 2023-01-11T21:38:06.3090864Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.3091081Z tmp3 = tl.load(in_ptr0 + (r1 + (7877*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.3091165Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3091261Z tmp6 = tl.where(tmp2, tmp5, 0) 2023-01-11T21:38:06.3091383Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.3091497Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.3091596Z tl.store(out_ptr0 + x0, tmp7, xmask) 2023-01-11T21:38:06.3091674Z ''') 2023-01-11T21:38:06.3091680Z 2023-01-11T21:38:06.3091691Z 2023-01-11T21:38:06.3091859Z triton_fused_mul_select_sum_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.3091934Z import triton 2023-01-11T21:38:06.3092029Z import triton.language as tl 2023-01-11T21:38:06.3092143Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3092243Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3092373Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.3092497Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3092504Z 2023-01-11T21:38:06.3092587Z @reduction(size_hints=[1, 16], 2023-01-11T21:38:06.3092701Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.3092785Z filename=__file__, 2023-01-11T21:38:06.3093141Z meta={'signature': {0: '*fp32', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.3093217Z @triton.jit 2023-01-11T21:38:06.3093386Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.3093460Z xnumel = 1 2023-01-11T21:38:06.3093565Z rnumel = 13 2023-01-11T21:38:06.3093660Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3093797Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.3093880Z xmask = xindex < xnumel 2023-01-11T21:38:06.3093996Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.3094116Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.3094221Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.3094308Z rindex = roffset + rbase 2023-01-11T21:38:06.3094387Z rmask = rindex < rnumel 2023-01-11T21:38:06.3094460Z r0 = rindex 2023-01-11T21:38:06.3094666Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.3094788Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.3094902Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 
1]) 2023-01-11T21:38:06.3095035Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.3095125Z ''') 2023-01-11T21:38:06.3095131Z 2023-01-11T21:38:06.3095135Z 2023-01-11T21:38:06.3095222Z async_compile.wait(globals()) 2023-01-11T21:38:06.3095300Z del async_compile 2023-01-11T21:38:06.3095305Z 2023-01-11T21:38:06.3095380Z def call(args): 2023-01-11T21:38:06.3095464Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3095564Z args.clear() 2023-01-11T21:38:06.3095666Z with torch.cuda.device(0): 2023-01-11T21:38:06.3095880Z buf0 = empty_strided((13, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3095965Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3096124Z triton_fused_mul_select_sum_1_0.run(arg0_1, arg1_1, buf0, 13, 7877, grid=grid(13), stream=stream0) 2023-01-11T21:38:06.3096197Z del arg0_1 2023-01-11T21:38:06.3096270Z del arg1_1 2023-01-11T21:38:06.3096457Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3096604Z triton_fused_mul_select_sum_1_1.run(buf0, buf1, 1, 13, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.3096683Z return (buf1, ) 2023-01-11T21:38:06.3096688Z 2023-01-11T21:38:06.3096693Z 2023-01-11T21:38:06.3096772Z if __name__ == "__main__": 2023-01-11T21:38:06.3096884Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3097054Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3097323Z arg0_1 = rand_strided((102400, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3097519Z arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3097640Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3097646Z 2023-01-11T21:38:06.3097714Z ok (0.322s) 2023-01-11T21:38:06.3098165Z test_div1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3098299Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3098561Z [2023-01-11 21:34:29,694] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 479 2023-01-11T21:38:06.3098816Z [2023-01-11 21:34:29,968] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 479 2023-01-11T21:38:06.3099232Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3099364Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3099659Z [2023-01-11 21:34:30,024] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 480 2023-01-11T21:38:06.3099922Z [2023-01-11 21:34:30,182] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 480 2023-01-11T21:38:06.3099927Z 2023-01-11T21:38:06.3100028Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3100102Z import torch 2023-01-11T21:38:06.3100175Z import random 2023-01-11T21:38:06.3100292Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3100409Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3100420Z 2023-01-11T21:38:06.3100495Z aten = torch.ops.aten 2023-01-11T21:38:06.3100633Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3100729Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3100734Z 2023-01-11T21:38:06.3100807Z import triton 2023-01-11T21:38:06.3100898Z import triton.language as tl 2023-01-11T21:38:06.3101031Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3101170Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3101176Z 2023-01-11T21:38:06.3101180Z 2023-01-11T21:38:06.3101371Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3101439Z import triton 2023-01-11T21:38:06.3101532Z import triton.language as tl 2023-01-11T21:38:06.3101645Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3101747Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3101878Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3102001Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3102006Z 2023-01-11T21:38:06.3102486Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3102561Z @triton.jit 2023-01-11T21:38:06.3102733Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3102806Z xnumel = 64 2023-01-11T21:38:06.3102933Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3103065Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3103149Z xmask = xindex < xnumel 2023-01-11T21:38:06.3103219Z x0 = xindex 2023-01-11T21:38:06.3103412Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3103594Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3103691Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3103786Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3103866Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.3103964Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.3104064Z tmp4 = tl.libdevice.trunc(tmp2) 2023-01-11T21:38:06.3104140Z tmp7 = tmp5 / tmp6 2023-01-11T21:38:06.3104229Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.3104363Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3104499Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], 
tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3104628Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3104757Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3104883Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.3104968Z ''') 2023-01-11T21:38:06.3104974Z 2023-01-11T21:38:06.3104978Z 2023-01-11T21:38:06.3105071Z async_compile.wait(globals()) 2023-01-11T21:38:06.3105140Z del async_compile 2023-01-11T21:38:06.3105146Z 2023-01-11T21:38:06.3105220Z def call(args): 2023-01-11T21:38:06.3105298Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3105402Z args.clear() 2023-01-11T21:38:06.3105497Z with torch.cuda.device(0): 2023-01-11T21:38:06.3105696Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3105894Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3106085Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3106284Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3106477Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3106569Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3106748Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3106822Z del arg0_1 2023-01-11T21:38:06.3106893Z del arg1_1 2023-01-11T21:38:06.3107003Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3107008Z 2023-01-11T21:38:06.3107013Z 2023-01-11T21:38:06.3107092Z if __name__ == "__main__": 2023-01-11T21:38:06.3107205Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3107330Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3107530Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3107725Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3107840Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3107846Z 2023-01-11T21:38:06.3107850Z 2023-01-11T21:38:06.3107946Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3108020Z import torch 2023-01-11T21:38:06.3108088Z import random 2023-01-11T21:38:06.3108205Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3108327Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3108336Z 2023-01-11T21:38:06.3108417Z aten = torch.ops.aten 2023-01-11T21:38:06.3108554Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3108646Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3108651Z 2023-01-11T21:38:06.3108725Z import triton 2023-01-11T21:38:06.3108816Z import triton.language as tl 2023-01-11T21:38:06.3108971Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3109111Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3109117Z 2023-01-11T21:38:06.3109122Z 2023-01-11T21:38:06.3109311Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3109385Z import triton 2023-01-11T21:38:06.3109480Z import triton.language as tl 2023-01-11T21:38:06.3109595Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3109699Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3109829Z from 
torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3109950Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3109955Z 2023-01-11T21:38:06.3110438Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: '*fp16', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3110512Z @triton.jit 2023-01-11T21:38:06.3110691Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3110764Z xnumel = 64 2023-01-11T21:38:06.3110862Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3110989Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3111071Z xmask = xindex < xnumel 2023-01-11T21:38:06.3111135Z x0 = xindex 2023-01-11T21:38:06.3111348Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3111587Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3111704Z tmp5 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3111817Z tmp6 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3111896Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.3111997Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.3112086Z tmp4 = tl.libdevice.trunc(tmp2) 2023-01-11T21:38:06.3112165Z tmp7 = tmp5 / tmp6 2023-01-11T21:38:06.3112261Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.3112395Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3112528Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3112661Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3112788Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3112916Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.3112995Z ''') 2023-01-11T21:38:06.3113000Z 2023-01-11T21:38:06.3113004Z 2023-01-11T21:38:06.3113097Z async_compile.wait(globals()) 2023-01-11T21:38:06.3113173Z del async_compile 2023-01-11T21:38:06.3113178Z 2023-01-11T21:38:06.3113253Z def call(args): 2023-01-11T21:38:06.3113331Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3113406Z args.clear() 2023-01-11T21:38:06.3113497Z with torch.cuda.device(0): 2023-01-11T21:38:06.3113691Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3113886Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3114084Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3114277Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3114466Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3114561Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3114737Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3114811Z del arg0_1 2023-01-11T21:38:06.3114902Z del arg1_1 2023-01-11T21:38:06.3115009Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3115015Z 2023-01-11T21:38:06.3115019Z 2023-01-11T21:38:06.3115102Z if __name__ == "__main__": 
2023-01-11T21:38:06.3115243Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3115383Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3115595Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3115790Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3115908Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3115916Z 2023-01-11T21:38:06.3115979Z ok (0.544s) 2023-01-11T21:38:06.3116433Z test_div2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3116565Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3116819Z [2023-01-11 21:34:30,238] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 481 2023-01-11T21:38:06.3117084Z [2023-01-11 21:34:30,400] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 481 2023-01-11T21:38:06.3117498Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3117658Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3117911Z [2023-01-11 21:34:30,454] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 482 2023-01-11T21:38:06.3118171Z [2023-01-11 21:34:30,616] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 482 2023-01-11T21:38:06.3118177Z 2023-01-11T21:38:06.3118279Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3118353Z import torch 2023-01-11T21:38:06.3118420Z import random 2023-01-11T21:38:06.3118538Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3118662Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3118670Z 2023-01-11T21:38:06.3118752Z aten = torch.ops.aten 2023-01-11T21:38:06.3118889Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3118985Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3118990Z 2023-01-11T21:38:06.3119063Z import triton 2023-01-11T21:38:06.3119156Z import triton.language as tl 2023-01-11T21:38:06.3119277Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3119418Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3119424Z 2023-01-11T21:38:06.3119428Z 2023-01-11T21:38:06.3119617Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3119690Z import triton 2023-01-11T21:38:06.3119780Z import triton.language as tl 2023-01-11T21:38:06.3119895Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3119994Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3120120Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3120248Z from torch._inductor.utils import instance_descriptor 
2023-01-11T21:38:06.3120253Z 2023-01-11T21:38:06.3120755Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3120830Z @triton.jit 2023-01-11T21:38:06.3121011Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3121085Z xnumel = 64 2023-01-11T21:38:06.3121182Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3121312Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3121394Z xmask = xindex < xnumel 2023-01-11T21:38:06.3121458Z x0 = xindex 2023-01-11T21:38:06.3121651Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3121842Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3121939Z tmp6 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3122034Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3122123Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3122205Z tmp3 = tmp1 / tmp2 2023-01-11T21:38:06.3122298Z tmp4 = tl.libdevice.floor(tmp3) 2023-01-11T21:38:06.3122393Z tmp5 = tl.libdevice.trunc(tmp3) 2023-01-11T21:38:06.3122480Z tmp7 = tmp6.to(tl.float32) 2023-01-11T21:38:06.3122557Z tmp9 = tmp7 / tmp8 2023-01-11T21:38:06.3122654Z tmp10 = tl.libdevice.floor(tmp9) 2023-01-11T21:38:06.3122787Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3122921Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3123045Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3123176Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3123334Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.3123421Z ''') 2023-01-11T21:38:06.3123427Z 2023-01-11T21:38:06.3123431Z 2023-01-11T21:38:06.3123525Z async_compile.wait(globals()) 2023-01-11T21:38:06.3123599Z del async_compile 2023-01-11T21:38:06.3123608Z 2023-01-11T21:38:06.3123682Z def call(args): 2023-01-11T21:38:06.3123761Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3123829Z args.clear() 2023-01-11T21:38:06.3123922Z with torch.cuda.device(0): 2023-01-11T21:38:06.3124122Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3124320Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3124515Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3124707Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3124901Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3124986Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3125165Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3125239Z del arg0_1 2023-01-11T21:38:06.3125329Z del arg1_1 2023-01-11T21:38:06.3125442Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3125449Z 2023-01-11T21:38:06.3125455Z 2023-01-11T21:38:06.3125554Z if __name__ == "__main__": 2023-01-11T21:38:06.3125674Z from torch._dynamo.testing import rand_strided 
2023-01-11T21:38:06.3125799Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3125990Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3126189Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3126311Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3126316Z 2023-01-11T21:38:06.3126321Z 2023-01-11T21:38:06.3126418Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3126491Z import torch 2023-01-11T21:38:06.3126565Z import random 2023-01-11T21:38:06.3126713Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3126837Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3126843Z 2023-01-11T21:38:06.3126917Z aten = torch.ops.aten 2023-01-11T21:38:06.3127053Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3127150Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3127155Z 2023-01-11T21:38:06.3127234Z import triton 2023-01-11T21:38:06.3127326Z import triton.language as tl 2023-01-11T21:38:06.3127451Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3127594Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3127603Z 2023-01-11T21:38:06.3127607Z 2023-01-11T21:38:06.3127798Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3127866Z import triton 2023-01-11T21:38:06.3127957Z import triton.language as tl 2023-01-11T21:38:06.3128070Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3128175Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3128306Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3128430Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3128435Z 2023-01-11T21:38:06.3128913Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: '*fp16', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3128987Z @triton.jit 2023-01-11T21:38:06.3129163Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3129259Z xnumel = 64 2023-01-11T21:38:06.3129356Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3129485Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3129570Z xmask = xindex < xnumel 2023-01-11T21:38:06.3129642Z x0 = xindex 2023-01-11T21:38:06.3129830Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3130043Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3130134Z tmp6 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3130251Z tmp8 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3130339Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3130418Z tmp3 = tmp1 / tmp2 2023-01-11T21:38:06.3130517Z tmp4 = tl.libdevice.floor(tmp3) 2023-01-11T21:38:06.3130613Z tmp5 = tl.libdevice.trunc(tmp3) 2023-01-11T21:38:06.3130705Z tmp7 = tmp6.to(tl.float32) 2023-01-11T21:38:06.3130776Z tmp9 = tmp7 / tmp8 2023-01-11T21:38:06.3130874Z tmp10 = tl.libdevice.floor(tmp9) 2023-01-11T21:38:06.3131006Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 
2023-01-11T21:38:06.3131138Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3131271Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3131400Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3131531Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.3131609Z ''') 2023-01-11T21:38:06.3131622Z 2023-01-11T21:38:06.3131626Z 2023-01-11T21:38:06.3131712Z async_compile.wait(globals()) 2023-01-11T21:38:06.3131789Z del async_compile 2023-01-11T21:38:06.3131794Z 2023-01-11T21:38:06.3131868Z def call(args): 2023-01-11T21:38:06.3131946Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3132023Z args.clear() 2023-01-11T21:38:06.3132115Z with torch.cuda.device(0): 2023-01-11T21:38:06.3132312Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3132502Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3132725Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3132921Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3133110Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3133201Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3133381Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3133455Z del arg0_1 2023-01-11T21:38:06.3133526Z del arg1_1 2023-01-11T21:38:06.3133621Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3133631Z 2023-01-11T21:38:06.3133635Z 2023-01-11T21:38:06.3133714Z if __name__ == "__main__": 2023-01-11T21:38:06.3133836Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3133961Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3134161Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3134358Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3134588Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3134594Z 2023-01-11T21:38:06.3134667Z ok (0.434s) 2023-01-11T21:38:06.3135117Z test_div3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3135301Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3135567Z [2023-01-11 21:34:30,663] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 483 2023-01-11T21:38:06.3135833Z [2023-01-11 21:34:30,769] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 483 2023-01-11T21:38:06.3136246Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3136384Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3136709Z [2023-01-11 21:34:30,815] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 484 2023-01-11T21:38:06.3136976Z [2023-01-11 21:34:30,834] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 484 2023-01-11T21:38:06.3136982Z 2023-01-11T21:38:06.3137081Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3137215Z import torch 2023-01-11T21:38:06.3137291Z import random 2023-01-11T21:38:06.3137414Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3137537Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3137542Z 2023-01-11T21:38:06.3137625Z aten = torch.ops.aten 2023-01-11T21:38:06.3137762Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3137857Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3137862Z 2023-01-11T21:38:06.3137935Z import triton 2023-01-11T21:38:06.3138027Z import triton.language as tl 2023-01-11T21:38:06.3138145Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3138283Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3138291Z 2023-01-11T21:38:06.3138296Z 2023-01-11T21:38:06.3138489Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3138564Z import triton 2023-01-11T21:38:06.3138658Z import triton.language as tl 2023-01-11T21:38:06.3138818Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3138921Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3139052Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3139171Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3139176Z 2023-01-11T21:38:06.3139642Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3139716Z @triton.jit 2023-01-11T21:38:06.3139894Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3139967Z xnumel = 64 2023-01-11T21:38:06.3140063Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3140192Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3140278Z xmask = xindex < xnumel 2023-01-11T21:38:06.3140342Z x0 = xindex 2023-01-11T21:38:06.3140533Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3140723Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3140894Z tmp7 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3140992Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3141082Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3141168Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3141240Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3141499Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3141650Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3141905Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // 
tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3142047Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3142182Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3142316Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3142446Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3142570Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3142657Z ''') 2023-01-11T21:38:06.3142663Z 2023-01-11T21:38:06.3142668Z 2023-01-11T21:38:06.3142766Z async_compile.wait(globals()) 2023-01-11T21:38:06.3142846Z del async_compile 2023-01-11T21:38:06.3142854Z 2023-01-11T21:38:06.3142932Z def call(args): 2023-01-11T21:38:06.3143013Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3143089Z args.clear() 2023-01-11T21:38:06.3143183Z with torch.cuda.device(0): 2023-01-11T21:38:06.3143378Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3143578Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3143774Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3143973Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3144168Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3144263Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3144445Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3144523Z del arg0_1 2023-01-11T21:38:06.3144591Z del arg1_1 2023-01-11T21:38:06.3144697Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3144703Z 2023-01-11T21:38:06.3144707Z 2023-01-11T21:38:06.3144789Z if __name__ == "__main__": 2023-01-11T21:38:06.3144909Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3145066Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3145265Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3145463Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3145601Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3145608Z 2023-01-11T21:38:06.3145614Z 2023-01-11T21:38:06.3145713Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3145808Z import torch 2023-01-11T21:38:06.3145887Z import random 2023-01-11T21:38:06.3146006Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3146135Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3146141Z 2023-01-11T21:38:06.3146225Z aten = torch.ops.aten 2023-01-11T21:38:06.3146363Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3146454Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3146465Z 2023-01-11T21:38:06.3146538Z import triton 2023-01-11T21:38:06.3146633Z import triton.language as tl 2023-01-11T21:38:06.3146759Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3146900Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3146905Z 2023-01-11T21:38:06.3146910Z 2023-01-11T21:38:06.3147101Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3147178Z import triton 2023-01-11T21:38:06.3147271Z import triton.language as tl 
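# [editor's note, not part of the captured log] A reading aid for the kernel
# source being dumped here: the name triton_fused_div_div_1_div_2_div_3_div_4_0
# records which FX nodes Inductor fused -- five aten.div calls (div through
# div_4: the same operands divided with rounding_mode=None, "floor", and
# "trunc", judging by the stores below) collapsed into one pointwise kernel --
# and the trailing _0 is the kernel's index within the compiled graph.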
2023-01-11T21:38:06.3147381Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3147485Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3147651Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3147778Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3147784Z 2023-01-11T21:38:06.3148255Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3148331Z @triton.jit 2023-01-11T21:38:06.3148511Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3148587Z xnumel = 64 2023-01-11T21:38:06.3148686Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3148813Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3148898Z xmask = xindex < xnumel 2023-01-11T21:38:06.3148971Z x0 = xindex 2023-01-11T21:38:06.3149165Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3149355Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3149456Z tmp7 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3149555Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3149642Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3149731Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3149810Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3150066Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3150149Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3150399Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3150535Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3150662Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3150797Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3150928Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3151057Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3151173Z ''') 2023-01-11T21:38:06.3151179Z 2023-01-11T21:38:06.3151183Z 2023-01-11T21:38:06.3151281Z async_compile.wait(globals()) 2023-01-11T21:38:06.3151361Z del async_compile 2023-01-11T21:38:06.3151367Z 2023-01-11T21:38:06.3151443Z def call(args): 2023-01-11T21:38:06.3151517Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3151594Z args.clear() 2023-01-11T21:38:06.3151690Z with torch.cuda.device(0): 2023-01-11T21:38:06.3151891Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3152088Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3152287Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3152486Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3152679Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3152769Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3152954Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, 
buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3153030Z del arg0_1 2023-01-11T21:38:06.3153104Z del arg1_1 2023-01-11T21:38:06.3153210Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3153215Z 2023-01-11T21:38:06.3153220Z 2023-01-11T21:38:06.3153300Z if __name__ == "__main__": 2023-01-11T21:38:06.3153423Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3153552Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3153773Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3153967Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3154090Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3154095Z 2023-01-11T21:38:06.3154171Z ok (0.217s) 2023-01-11T21:38:06.3154623Z test_div4_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3154756Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3155014Z [2023-01-11 21:34:30,880] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 485 2023-01-11T21:38:06.3155303Z [2023-01-11 21:34:30,900] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 485 2023-01-11T21:38:06.3155749Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3155881Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3156131Z [2023-01-11 21:34:30,945] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 486 2023-01-11T21:38:06.3156393Z [2023-01-11 21:34:30,965] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 486 2023-01-11T21:38:06.3156399Z 2023-01-11T21:38:06.3156498Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3156576Z import torch 2023-01-11T21:38:06.3156654Z import random 2023-01-11T21:38:06.3156775Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3156901Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3156906Z 2023-01-11T21:38:06.3156992Z aten = torch.ops.aten 2023-01-11T21:38:06.3157148Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3157247Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3157252Z 2023-01-11T21:38:06.3157328Z import triton 2023-01-11T21:38:06.3157422Z import triton.language as tl 2023-01-11T21:38:06.3157548Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3157691Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3157696Z 2023-01-11T21:38:06.3157701Z 2023-01-11T21:38:06.3157893Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3157970Z import triton 2023-01-11T21:38:06.3158058Z import triton.language as tl 2023-01-11T21:38:06.3158176Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3158278Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3158410Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3158536Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3158541Z 2023-01-11T21:38:06.3159012Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3159088Z @triton.jit 2023-01-11T21:38:06.3159264Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3159339Z xnumel = 64 2023-01-11T21:38:06.3159432Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3159562Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3159675Z xmask = xindex < xnumel 2023-01-11T21:38:06.3159748Z x0 = xindex 2023-01-11T21:38:06.3159939Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3160130Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3160232Z tmp7 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3160323Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3160414Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3160503Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3160582Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3160836Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3160917Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3161167Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // 
tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3161300Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3161434Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3161570Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3161703Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3161829Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3161918Z ''') 2023-01-11T21:38:06.3161924Z 2023-01-11T21:38:06.3161928Z 2023-01-11T21:38:06.3162023Z async_compile.wait(globals()) 2023-01-11T21:38:06.3162101Z del async_compile 2023-01-11T21:38:06.3162107Z 2023-01-11T21:38:06.3162176Z def call(args): 2023-01-11T21:38:06.3162257Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3162334Z args.clear() 2023-01-11T21:38:06.3162426Z with torch.cuda.device(0): 2023-01-11T21:38:06.3162631Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3162831Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3163026Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3163225Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3163444Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3163540Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3163719Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3163794Z del arg0_1 2023-01-11T21:38:06.3163872Z del arg1_1 2023-01-11T21:38:06.3163977Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3163982Z 2023-01-11T21:38:06.3163987Z 2023-01-11T21:38:06.3164068Z if __name__ == "__main__": 2023-01-11T21:38:06.3164188Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3164313Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3164511Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3164704Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3164828Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3164833Z 2023-01-11T21:38:06.3164838Z 2023-01-11T21:38:06.3164936Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3165012Z import torch 2023-01-11T21:38:06.3165088Z import random 2023-01-11T21:38:06.3165203Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3165328Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3165333Z 2023-01-11T21:38:06.3165417Z aten = torch.ops.aten 2023-01-11T21:38:06.3165553Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3165650Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3165680Z 2023-01-11T21:38:06.3165758Z import triton 2023-01-11T21:38:06.3165853Z import triton.language as tl 2023-01-11T21:38:06.3165979Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3166112Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3166118Z 2023-01-11T21:38:06.3166128Z 2023-01-11T21:38:06.3166315Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3166394Z import triton 2023-01-11T21:38:06.3166487Z import triton.language as tl 
2023-01-11T21:38:06.3172813Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3172933Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3173067Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3173194Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3173199Z 2023-01-11T21:38:06.3173688Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3173768Z @triton.jit 2023-01-11T21:38:06.3173950Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3174018Z xnumel = 64 2023-01-11T21:38:06.3174117Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3174248Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3174333Z xmask = xindex < xnumel 2023-01-11T21:38:06.3174405Z x0 = xindex 2023-01-11T21:38:06.3174843Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3175032Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3175123Z tmp7 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3175224Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3175318Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3175408Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3175484Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3175741Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3175888Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3176145Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3176282Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3176416Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3176548Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3176679Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3176808Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3176898Z ''') 2023-01-11T21:38:06.3176904Z 2023-01-11T21:38:06.3176908Z 2023-01-11T21:38:06.3176998Z async_compile.wait(globals()) 2023-01-11T21:38:06.3177077Z del async_compile 2023-01-11T21:38:06.3177082Z 2023-01-11T21:38:06.3177215Z def call(args): 2023-01-11T21:38:06.3177303Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3177378Z args.clear() 2023-01-11T21:38:06.3177473Z with torch.cuda.device(0): 2023-01-11T21:38:06.3177678Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3177876Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3178065Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3178263Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3178454Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3178601Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3178786Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, 
buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3178863Z del arg0_1 2023-01-11T21:38:06.3178936Z del arg1_1 2023-01-11T21:38:06.3179038Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3179051Z 2023-01-11T21:38:06.3179056Z 2023-01-11T21:38:06.3179132Z if __name__ == "__main__": 2023-01-11T21:38:06.3179252Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3179377Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3179575Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3179771Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3179894Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3179899Z 2023-01-11T21:38:06.3179975Z ok (0.131s) 2023-01-11T21:38:06.3180435Z test_div5_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3180570Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3180822Z [2023-01-11 21:34:31,010] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 487 2023-01-11T21:38:06.3181088Z [2023-01-11 21:34:31,101] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 487 2023-01-11T21:38:06.3181506Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3181640Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3181923Z [2023-01-11 21:34:31,146] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 488 2023-01-11T21:38:06.3182188Z [2023-01-11 21:34:31,163] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 488 2023-01-11T21:38:06.3182194Z 2023-01-11T21:38:06.3182293Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3182371Z import torch 2023-01-11T21:38:06.3182450Z import random 2023-01-11T21:38:06.3182565Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3182693Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3182698Z 2023-01-11T21:38:06.3182782Z aten = torch.ops.aten 2023-01-11T21:38:06.3182925Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3183022Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3183027Z 2023-01-11T21:38:06.3183104Z import triton 2023-01-11T21:38:06.3183199Z import triton.language as tl 2023-01-11T21:38:06.3183327Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3183464Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3183469Z 2023-01-11T21:38:06.3183474Z 2023-01-11T21:38:06.3183667Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3183744Z import triton 2023-01-11T21:38:06.3183838Z import triton.language as tl 2023-01-11T21:38:06.3183955Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3184062Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3184199Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3184320Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3184373Z 2023-01-11T21:38:06.3184826Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*i64', 3: '*i64', 4: '*fp32', 5: '*i64', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3184903Z @triton.jit 2023-01-11T21:38:06.3185076Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3185151Z xnumel = 64 2023-01-11T21:38:06.3185250Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3185381Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3185472Z xmask = xindex < xnumel 2023-01-11T21:38:06.3185561Z x0 = xindex 2023-01-11T21:38:06.3185770Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3185868Z tmp6 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3185960Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3186030Z tmp2 = 16 2023-01-11T21:38:06.3186108Z tmp3 = tmp1 / tmp2 2023-01-11T21:38:06.3186362Z tmp4 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3186443Z tmp5 = tmp0 // tmp2 2023-01-11T21:38:06.3186687Z tmp7 = tl.where((tmp6 < 0) != (tmp2 < 0), tl.where(tmp6 % tmp2 != 0, tmp6 // tmp2 - 1, tmp6 // tmp2), tmp6 // tmp2) 2023-01-11T21:38:06.3186824Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3186957Z tl.store(out_ptr1 + (x0 + 
tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3187089Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3187220Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3187350Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3187440Z ''') 2023-01-11T21:38:06.3187446Z 2023-01-11T21:38:06.3187450Z 2023-01-11T21:38:06.3187545Z async_compile.wait(globals()) 2023-01-11T21:38:06.3187616Z del async_compile 2023-01-11T21:38:06.3187621Z 2023-01-11T21:38:06.3187696Z def call(args): 2023-01-11T21:38:06.3187769Z arg0_1, = args 2023-01-11T21:38:06.3187844Z args.clear() 2023-01-11T21:38:06.3187966Z with torch.cuda.device(0): 2023-01-11T21:38:06.3188169Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3188365Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3188553Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3188755Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3188950Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3189043Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3189219Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3189295Z del arg0_1 2023-01-11T21:38:06.3189400Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3189405Z 2023-01-11T21:38:06.3189410Z 2023-01-11T21:38:06.3189493Z if __name__ == "__main__": 2023-01-11T21:38:06.3189612Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3189731Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3189929Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3190042Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3190047Z 2023-01-11T21:38:06.3190052Z 2023-01-11T21:38:06.3190150Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3190223Z import torch 2023-01-11T21:38:06.3190299Z import random 2023-01-11T21:38:06.3190420Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3190569Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3190582Z 2023-01-11T21:38:06.3190659Z aten = torch.ops.aten 2023-01-11T21:38:06.3190797Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3190892Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3190897Z 2023-01-11T21:38:06.3190974Z import triton 2023-01-11T21:38:06.3191067Z import triton.language as tl 2023-01-11T21:38:06.3191193Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3191331Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3191337Z 2023-01-11T21:38:06.3191341Z 2023-01-11T21:38:06.3191525Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3191600Z import triton 2023-01-11T21:38:06.3191692Z import triton.language as tl 2023-01-11T21:38:06.3191806Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3191908Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3192041Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3192167Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3192172Z 
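# [editor's note, not part of the captured log] A rough sketch of what the
# @pointwise decorator below encodes in this Inductor snapshot: size_hints=[64]
# is the element count rounded up to a power of two, used to select launch
# configs; 'signature' maps argument index to dtype ('*i64' = int64 pointer,
# '*fp32' = float32 pointer, 'i32' = scalar int32); and the instance_descriptor
# divisible_by_16 tuple lists the arguments the compiler may assume are
# divisible by 16 (aligned pointers, padded sizes), enabling vectorized loads.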
2023-01-11T21:38:06.3192633Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*i64', 3: '*i64', 4: '*fp32', 5: '*i64', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3192709Z @triton.jit 2023-01-11T21:38:06.3192873Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3192947Z xnumel = 64 2023-01-11T21:38:06.3193044Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3193172Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3193257Z xmask = xindex < xnumel 2023-01-11T21:38:06.3193328Z x0 = xindex 2023-01-11T21:38:06.3193523Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3193616Z tmp6 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3193705Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3193779Z tmp2 = 16 2023-01-11T21:38:06.3193862Z tmp3 = tmp1 / tmp2 2023-01-11T21:38:06.3194147Z tmp4 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3194228Z tmp5 = tmp0 // tmp2 2023-01-11T21:38:06.3194482Z tmp7 = tl.where((tmp6 < 0) != (tmp2 < 0), tl.where(tmp6 % tmp2 != 0, tmp6 // tmp2 - 1, tmp6 // tmp2), tmp6 // tmp2) 2023-01-11T21:38:06.3194610Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3194742Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3194872Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3195000Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3195131Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3195218Z ''') 2023-01-11T21:38:06.3195223Z 2023-01-11T21:38:06.3195228Z 2023-01-11T21:38:06.3195324Z async_compile.wait(globals()) 2023-01-11T21:38:06.3195401Z del async_compile 2023-01-11T21:38:06.3195408Z 2023-01-11T21:38:06.3195477Z def call(args): 2023-01-11T21:38:06.3195552Z arg0_1, = args 2023-01-11T21:38:06.3195629Z args.clear() 2023-01-11T21:38:06.3195720Z with torch.cuda.device(0): 2023-01-11T21:38:06.3195920Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3196117Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3196312Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3196509Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3196727Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3196822Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3196999Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3197075Z del arg0_1 2023-01-11T21:38:06.3197180Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3197185Z 2023-01-11T21:38:06.3197190Z 2023-01-11T21:38:06.3197271Z if __name__ == "__main__": 2023-01-11T21:38:06.3197390Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3197517Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3197709Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 
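# [editor's note, not part of the captured log] Each dumped module doubles as
# a standalone repro script: this __main__ block rebuilds the inputs with
# rand_strided using the exact shape, stride, device, and dtype recorded at
# compile time, and print_performance times call() on them. Saving the dump
# to a file and running it benchmarks this one compiled graph in isolation.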
2023-01-11T21:38:06.3197822Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3197827Z 2023-01-11T21:38:06.3197900Z ok (0.198s) 2023-01-11T21:38:06.3198355Z test_div6_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3198492Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3198753Z [2023-01-11 21:34:31,208] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 489 2023-01-11T21:38:06.3199020Z [2023-01-11 21:34:31,319] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 489 2023-01-11T21:38:06.3199436Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3199572Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3199855Z [2023-01-11 21:34:31,363] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 490 2023-01-11T21:38:06.3200115Z [2023-01-11 21:34:31,384] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 490 2023-01-11T21:38:06.3200128Z 2023-01-11T21:38:06.3200220Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3200296Z import torch 2023-01-11T21:38:06.3200371Z import random 2023-01-11T21:38:06.3200491Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3200617Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3200622Z 2023-01-11T21:38:06.3200704Z aten = torch.ops.aten 2023-01-11T21:38:06.3200841Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3200934Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3200939Z 2023-01-11T21:38:06.3201013Z import triton 2023-01-11T21:38:06.3201106Z import triton.language as tl 2023-01-11T21:38:06.3201232Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3201374Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3201380Z 2023-01-11T21:38:06.3201384Z 2023-01-11T21:38:06.3201580Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3201654Z import triton 2023-01-11T21:38:06.3201749Z import triton.language as tl 2023-01-11T21:38:06.3201857Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3201961Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3202095Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3202221Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3202227Z 2023-01-11T21:38:06.3202697Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i1', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 
2023-01-11T21:38:06.3202800Z @triton.jit 2023-01-11T21:38:06.3202981Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3203056Z xnumel = 64 2023-01-11T21:38:06.3203148Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3203280Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3203363Z xmask = xindex < xnumel 2023-01-11T21:38:06.3203434Z x0 = xindex 2023-01-11T21:38:06.3203625Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3203814Z tmp3 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3203914Z tmp10 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3204015Z tmp12 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3204097Z tmp1 = tmp0.to(tl.int64) 2023-01-11T21:38:06.3204186Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.3204273Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3204351Z tmp5 = tmp2 / tmp4 2023-01-11T21:38:06.3204605Z tmp6 = tl.where((tmp1 < 0) != (tmp3 < 0), tl.where(tmp1 % tmp3 != 0, tmp1 // tmp3 - 1, tmp1 // tmp3), tmp1 // tmp3) 2023-01-11T21:38:06.3204684Z tmp7 = tmp1 // tmp3 2023-01-11T21:38:06.3204772Z tmp8 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3204844Z tmp9 = tmp8 / tmp4 2023-01-11T21:38:06.3204931Z tmp11 = tmp10.to(tl.int64) 2023-01-11T21:38:06.3205200Z tmp13 = tl.where((tmp11 < 0) != (tmp12 < 0), tl.where(tmp11 % tmp12 != 0, tmp11 // tmp12 - 1, tmp11 // tmp12), tmp11 // tmp12) 2023-01-11T21:38:06.3205342Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3205493Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3205651Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3205782Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3205907Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.3206025Z ''') 2023-01-11T21:38:06.3206031Z 2023-01-11T21:38:06.3206036Z 2023-01-11T21:38:06.3206133Z async_compile.wait(globals()) 2023-01-11T21:38:06.3206210Z del async_compile 2023-01-11T21:38:06.3206215Z 2023-01-11T21:38:06.3206289Z def call(args): 2023-01-11T21:38:06.3206373Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3206450Z args.clear() 2023-01-11T21:38:06.3206543Z with torch.cuda.device(0): 2023-01-11T21:38:06.3206738Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3206934Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3207134Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3207334Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3207526Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3207621Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3207799Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3207873Z del arg0_1 2023-01-11T21:38:06.3207939Z del arg1_1 2023-01-11T21:38:06.3208043Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3208048Z 2023-01-11T21:38:06.3208052Z 2023-01-11T21:38:06.3208132Z if __name__ == "__main__": 2023-01-11T21:38:06.3208250Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3208380Z from 
torch._inductor.utils import print_performance 2023-01-11T21:38:06.3208603Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.3208797Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3208914Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3208920Z 2023-01-11T21:38:06.3208924Z 2023-01-11T21:38:06.3209018Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3209093Z import torch 2023-01-11T21:38:06.3209168Z import random 2023-01-11T21:38:06.3209285Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3209409Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3209414Z 2023-01-11T21:38:06.3209496Z aten = torch.ops.aten 2023-01-11T21:38:06.3209635Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3209724Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3209734Z 2023-01-11T21:38:06.3209803Z import triton 2023-01-11T21:38:06.3209895Z import triton.language as tl 2023-01-11T21:38:06.3210023Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3210162Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3210168Z 2023-01-11T21:38:06.3210172Z 2023-01-11T21:38:06.3210364Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3210444Z import triton 2023-01-11T21:38:06.3210538Z import triton.language as tl 2023-01-11T21:38:06.3210646Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3210747Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3210880Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3211004Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3211009Z 2023-01-11T21:38:06.3211484Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i1', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3211565Z @triton.jit 2023-01-11T21:38:06.3211742Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3211819Z xnumel = 64 2023-01-11T21:38:06.3211949Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3212074Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3212158Z xmask = xindex < xnumel 2023-01-11T21:38:06.3212230Z x0 = xindex 2023-01-11T21:38:06.3212422Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3212612Z tmp3 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3212712Z tmp10 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3212810Z tmp12 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3212891Z tmp1 = tmp0.to(tl.int64) 2023-01-11T21:38:06.3212980Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.3213070Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3213148Z tmp5 = tmp2 / tmp4 2023-01-11T21:38:06.3213403Z tmp6 = tl.where((tmp1 < 0) != (tmp3 < 0), tl.where(tmp1 % tmp3 != 0, tmp1 // tmp3 - 1, tmp1 // tmp3), tmp1 // tmp3) 2023-01-11T21:38:06.3213483Z tmp7 = tmp1 // tmp3 2023-01-11T21:38:06.3213574Z tmp8 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3213646Z tmp9 = tmp8 / tmp4 2023-01-11T21:38:06.3213733Z tmp11 = tmp10.to(tl.int64) 
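# [editor's note, not part of the captured log] The tl.where expression below
# is Inductor's floor-division idiom: Triton's integer // truncates toward
# zero (C semantics), so when the operands' signs differ and the remainder is
# nonzero, the truncated quotient is nudged down by one to match Python floor
# semantics. Worked example: for a, b = -7, 2 truncation gives -3; the signs
# differ and a % b != 0, so the kernel stores -3 - 1 = -4, which is -7 // 2.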
2023-01-11T21:38:06.3214002Z tmp13 = tl.where((tmp11 < 0) != (tmp12 < 0), tl.where(tmp11 % tmp12 != 0, tmp11 // tmp12 - 1, tmp11 // tmp12), tmp11 // tmp12) 2023-01-11T21:38:06.3214135Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3214267Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3214399Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3214642Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3214823Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.3214904Z ''') 2023-01-11T21:38:06.3214910Z 2023-01-11T21:38:06.3214914Z 2023-01-11T21:38:06.3215011Z async_compile.wait(globals()) 2023-01-11T21:38:06.3215088Z del async_compile 2023-01-11T21:38:06.3215096Z 2023-01-11T21:38:06.3215171Z def call(args): 2023-01-11T21:38:06.3215251Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3215325Z args.clear() 2023-01-11T21:38:06.3215415Z with torch.cuda.device(0): 2023-01-11T21:38:06.3215608Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3215804Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3216000Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3216197Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3216393Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3216485Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3216666Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3216744Z del arg0_1 2023-01-11T21:38:06.3216811Z del arg1_1 2023-01-11T21:38:06.3216915Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3216920Z 2023-01-11T21:38:06.3216924Z 2023-01-11T21:38:06.3217005Z if __name__ == "__main__": 2023-01-11T21:38:06.3217172Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3217305Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3217500Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.3217694Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3217819Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3217825Z 2023-01-11T21:38:06.3217890Z ok (0.222s) 2023-01-11T21:38:06.3218380Z test_div7_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3218515Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3218779Z [2023-01-11 21:34:31,430] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 491 2023-01-11T21:38:06.3219045Z [2023-01-11 21:34:31,585] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 491 2023-01-11T21:38:06.3219458Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3219597Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3219854Z [2023-01-11 21:34:31,629] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 492 2023-01-11T21:38:06.3220118Z [2023-01-11 21:34:31,650] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 492 2023-01-11T21:38:06.3220124Z 2023-01-11T21:38:06.3220223Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3220298Z import torch 2023-01-11T21:38:06.3220367Z import random 2023-01-11T21:38:06.3220486Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3220611Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3220641Z 2023-01-11T21:38:06.3220726Z aten = torch.ops.aten 2023-01-11T21:38:06.3220863Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3220960Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3220965Z 2023-01-11T21:38:06.3221039Z import triton 2023-01-11T21:38:06.3221127Z import triton.language as tl 2023-01-11T21:38:06.3221254Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3221394Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3221400Z 2023-01-11T21:38:06.3221404Z 2023-01-11T21:38:06.3221593Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3221669Z import triton 2023-01-11T21:38:06.3221761Z import triton.language as tl 2023-01-11T21:38:06.3221875Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3221977Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3222105Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3222234Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3222239Z 2023-01-11T21:38:06.3222716Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3222790Z @triton.jit 2023-01-11T21:38:06.3222966Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3223042Z xnumel = 10000 2023-01-11T21:38:06.3223141Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3223272Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3223355Z xmask = xindex < xnumel 2023-01-11T21:38:06.3223421Z x0 = xindex 2023-01-11T21:38:06.3223609Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3223801Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3223899Z tmp7 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3223995Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3224084Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3224200Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3224275Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3224528Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // 
tmp2) 2023-01-11T21:38:06.3224610Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3224860Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3224996Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3225130Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3225264Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3225420Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3225557Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3225652Z ''') 2023-01-11T21:38:06.3225658Z 2023-01-11T21:38:06.3225664Z 2023-01-11T21:38:06.3225760Z async_compile.wait(globals()) 2023-01-11T21:38:06.3225842Z del async_compile 2023-01-11T21:38:06.3225847Z 2023-01-11T21:38:06.3225923Z def call(args): 2023-01-11T21:38:06.3226004Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3226079Z args.clear() 2023-01-11T21:38:06.3226164Z with torch.cuda.device(0): 2023-01-11T21:38:06.3226373Z buf0 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3226578Z buf1 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3226779Z buf2 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3227024Z buf3 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3227226Z buf4 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3227318Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3227506Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10000, grid=grid(10000), stream=stream0) 2023-01-11T21:38:06.3227574Z del arg0_1 2023-01-11T21:38:06.3227647Z del arg1_1 2023-01-11T21:38:06.3227750Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3227755Z 2023-01-11T21:38:06.3227759Z 2023-01-11T21:38:06.3227839Z if __name__ == "__main__": 2023-01-11T21:38:06.3227958Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3228085Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3228290Z arg0_1 = rand_strided((100, 100), (100, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3228491Z arg1_1 = rand_strided((100, 100), (100, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3228605Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3228618Z 2023-01-11T21:38:06.3228623Z 2023-01-11T21:38:06.3228717Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3228794Z import torch 2023-01-11T21:38:06.3228869Z import random 2023-01-11T21:38:06.3228989Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3229112Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3229118Z 2023-01-11T21:38:06.3229201Z aten = torch.ops.aten 2023-01-11T21:38:06.3229336Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3229424Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3229429Z 2023-01-11T21:38:06.3229506Z import triton 2023-01-11T21:38:06.3229600Z import triton.language as tl 2023-01-11T21:38:06.3229729Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3229869Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3229875Z 2023-01-11T21:38:06.3229879Z 
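# [editor's note, not part of the captured log] The compilation pattern here,
# roughly: async_compile.triton('''...''') queues the kernel source on a
# background compile pool and returns a handle immediately, so several kernels
# can build in parallel; async_compile.wait(globals()) further down blocks
# until every pending kernel is ready and rebinds the module-level names to
# the compiled kernels, which is why call() can invoke .run(...) on them.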
2023-01-11T21:38:06.3230070Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3230146Z import triton 2023-01-11T21:38:06.3230266Z import triton.language as tl 2023-01-11T21:38:06.3230376Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3230475Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3230609Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3230736Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3230741Z 2023-01-11T21:38:06.3231210Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3231292Z @triton.jit 2023-01-11T21:38:06.3231466Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3231540Z xnumel = 10000 2023-01-11T21:38:06.3231633Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3231763Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3231846Z xmask = xindex < xnumel 2023-01-11T21:38:06.3231917Z x0 = xindex 2023-01-11T21:38:06.3232108Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3232299Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3232397Z tmp7 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3232494Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3232576Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3232661Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3232777Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3233029Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3233108Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3233362Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3233496Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3233624Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3233755Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3233887Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3234017Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3234104Z ''') 2023-01-11T21:38:06.3234110Z 2023-01-11T21:38:06.3234114Z 2023-01-11T21:38:06.3234213Z async_compile.wait(globals()) 2023-01-11T21:38:06.3234293Z del async_compile 2023-01-11T21:38:06.3234298Z 2023-01-11T21:38:06.3234375Z def call(args): 2023-01-11T21:38:06.3234450Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3234525Z args.clear() 2023-01-11T21:38:06.3234617Z with torch.cuda.device(0): 2023-01-11T21:38:06.3234828Z buf0 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3235039Z buf1 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3235256Z buf2 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3235497Z buf3 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.float32) 
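# [editor's note, not part of the captured log] call() allocates one output
# buffer per fused div (buf0..buf4, each 100x100 to match the inputs), while
# args.clear() above and the del arg0_1 / del arg1_1 after the launch drop
# Python references as early as possible so the CUDA caching allocator can
# recycle the input memory.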
2023-01-11T21:38:06.3235702Z buf4 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3235790Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3235972Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10000, grid=grid(10000), stream=stream0) 2023-01-11T21:38:06.3236051Z del arg0_1 2023-01-11T21:38:06.3236124Z del arg1_1 2023-01-11T21:38:06.3236229Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3236234Z 2023-01-11T21:38:06.3236239Z 2023-01-11T21:38:06.3236352Z if __name__ == "__main__": 2023-01-11T21:38:06.3236474Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3236601Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3236800Z arg0_1 = rand_strided((100, 100), (100, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3236999Z arg1_1 = rand_strided((100, 100), (100, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3237124Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3237129Z 2023-01-11T21:38:06.3237203Z ok (0.265s) 2023-01-11T21:38:06.3237521Z test_div8_cuda (__main__.CudaTests) ... [2023-01-11 21:34:31,686] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 493 2023-01-11T21:38:06.3237798Z [2023-01-11 21:34:31,695] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 493 2023-01-11T21:38:06.3238056Z [2023-01-11 21:34:31,729] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 494 2023-01-11T21:38:06.3238319Z [2023-01-11 21:34:31,737] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 494 2023-01-11T21:38:06.3238324Z 2023-01-11T21:38:06.3238423Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3238493Z import torch 2023-01-11T21:38:06.3238570Z import random 2023-01-11T21:38:06.3238691Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3238820Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3238825Z 2023-01-11T21:38:06.3238910Z aten = torch.ops.aten 2023-01-11T21:38:06.3239047Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3239173Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3239178Z 2023-01-11T21:38:06.3239248Z import triton 2023-01-11T21:38:06.3239341Z import triton.language as tl 2023-01-11T21:38:06.3239468Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3239614Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3239619Z 2023-01-11T21:38:06.3239624Z 2023-01-11T21:38:06.3239766Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.3239975Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.3240095Z extern "C" void kernel(long* __restrict__ out_ptr0, 2023-01-11T21:38:06.3240198Z long* __restrict__ out_ptr1, 2023-01-11T21:38:06.3240291Z long* __restrict__ out_ptr2) 2023-01-11T21:38:06.3240359Z { 2023-01-11T21:38:06.3240426Z { 2023-01-11T21:38:06.3240498Z { 2023-01-11T21:38:06.3240609Z auto tmp0 = static_cast<long>(1024); 2023-01-11T21:38:06.3240717Z auto tmp1 = static_cast<long>(100); 2023-01-11T21:38:06.3240810Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.3240891Z out_ptr0[0] = tmp2; 2023-01-11T21:38:06.3240958Z } 2023-01-11T21:38:06.3241026Z } 2023-01-11T21:38:06.3241097Z { 2023-01-11T21:38:06.3241165Z { 2023-01-11T21:38:06.3241272Z auto tmp0 = static_cast<long>(1024);
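// C++ integer division truncates toward zero, so the ternary just below
// adjusts the quotient down by one when the operand signs differ and the
// remainder is nonzero, reproducing Python floor-division semantics.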
2023-01-11T21:38:06.3241370Z auto tmp1 = static_cast<long>(100); 2023-01-11T21:38:06.3241610Z auto tmp2 = ((tmp0 < 0) != (tmp1 < 0) ? (tmp0 % tmp1 != 0 ? tmp0 / tmp1 - 1 : tmp0 / tmp1) : tmp0 / tmp1); 2023-01-11T21:38:06.3241696Z out_ptr1[0] = tmp2; 2023-01-11T21:38:06.3241763Z } 2023-01-11T21:38:06.3241831Z } 2023-01-11T21:38:06.3241898Z { 2023-01-11T21:38:06.3241967Z { 2023-01-11T21:38:06.3242066Z auto tmp0 = static_cast<long>(1024); 2023-01-11T21:38:06.3242173Z auto tmp1 = static_cast<long>(100); 2023-01-11T21:38:06.3242264Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.3242349Z out_ptr2[0] = tmp2; 2023-01-11T21:38:06.3242416Z } 2023-01-11T21:38:06.3242483Z } 2023-01-11T21:38:06.3242551Z } 2023-01-11T21:38:06.3242630Z ''') 2023-01-11T21:38:06.3242635Z 2023-01-11T21:38:06.3242710Z 2023-01-11T21:38:06.3242808Z async_compile.wait(globals()) 2023-01-11T21:38:06.3242886Z del async_compile 2023-01-11T21:38:06.3242891Z 2023-01-11T21:38:06.3242968Z def call(args): 2023-01-11T21:38:06.3243152Z buf0 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.3243335Z buf1 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.3243515Z buf2 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.3243682Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.3243766Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3243773Z 2023-01-11T21:38:06.3243777Z 2023-01-11T21:38:06.3243860Z if __name__ == "__main__": 2023-01-11T21:38:06.3243979Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3244107Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3244212Z print_performance(lambda: call([])) 2023-01-11T21:38:06.3244219Z 2023-01-11T21:38:06.3244223Z 2023-01-11T21:38:06.3244323Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3244398Z import torch 2023-01-11T21:38:06.3244476Z import random 2023-01-11T21:38:06.3244590Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3244714Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3244720Z 2023-01-11T21:38:06.3244803Z aten = torch.ops.aten 2023-01-11T21:38:06.3244939Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3245040Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3245046Z 2023-01-11T21:38:06.3245172Z import triton 2023-01-11T21:38:06.3245277Z import triton.language as tl 2023-01-11T21:38:06.3245410Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3245551Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3245556Z 2023-01-11T21:38:06.3245561Z 2023-01-11T21:38:06.3245702Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.3245908Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.3246027Z extern "C" void kernel(long* __restrict__ out_ptr0, 2023-01-11T21:38:06.3246128Z long* __restrict__ out_ptr1, 2023-01-11T21:38:06.3246232Z long* __restrict__ out_ptr2) 2023-01-11T21:38:06.3246299Z { 2023-01-11T21:38:06.3246360Z { 2023-01-11T21:38:06.3246430Z { 2023-01-11T21:38:06.3246538Z auto tmp0 = static_cast<long>(1024); 2023-01-11T21:38:06.3246644Z auto tmp1 = static_cast<long>(100); 2023-01-11T21:38:06.3246738Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.3246823Z out_ptr0[0] = tmp2; 2023-01-11T21:38:06.3246893Z } 2023-01-11T21:38:06.3246953Z } 2023-01-11T21:38:06.3247022Z { 2023-01-11T21:38:06.3247089Z {
2023-01-11T21:38:06.3247194Z auto tmp0 = static_cast<long>(1024); 2023-01-11T21:38:06.3247302Z auto tmp1 = static_cast<long>(100); 2023-01-11T21:38:06.3247545Z auto tmp2 = ((tmp0 < 0) != (tmp1 < 0) ? (tmp0 % tmp1 != 0 ? tmp0 / tmp1 - 1 : tmp0 / tmp1) : tmp0 / tmp1); 2023-01-11T21:38:06.3247630Z out_ptr1[0] = tmp2; 2023-01-11T21:38:06.3247691Z } 2023-01-11T21:38:06.3247756Z } 2023-01-11T21:38:06.3247822Z { 2023-01-11T21:38:06.3247889Z { 2023-01-11T21:38:06.3247993Z auto tmp0 = static_cast<long>(1024); 2023-01-11T21:38:06.3248099Z auto tmp1 = static_cast<long>(100); 2023-01-11T21:38:06.3248190Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.3248268Z out_ptr2[0] = tmp2; 2023-01-11T21:38:06.3248342Z } 2023-01-11T21:38:06.3248409Z } 2023-01-11T21:38:06.3248472Z } 2023-01-11T21:38:06.3248557Z ''') 2023-01-11T21:38:06.3248563Z 2023-01-11T21:38:06.3248567Z 2023-01-11T21:38:06.3248663Z async_compile.wait(globals()) 2023-01-11T21:38:06.3248741Z del async_compile 2023-01-11T21:38:06.3248746Z 2023-01-11T21:38:06.3248846Z def call(args): 2023-01-11T21:38:06.3249031Z buf0 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.3249212Z buf1 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.3249392Z buf2 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.3249556Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.3249645Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3249650Z 2023-01-11T21:38:06.3249656Z 2023-01-11T21:38:06.3249736Z if __name__ == "__main__": 2023-01-11T21:38:06.3249859Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3249980Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3250085Z print_performance(lambda: call([])) 2023-01-11T21:38:06.3250091Z 2023-01-11T21:38:06.3250163Z ok (0.087s) 2023-01-11T21:38:06.3250621Z test_div_prim_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3250755Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3251014Z [2023-01-11 21:34:31,775] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 495 2023-01-11T21:38:06.3251281Z [2023-01-11 21:34:31,935] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 495 2023-01-11T21:38:06.3251727Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3251862Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3252118Z [2023-01-11 21:34:31,972] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 496 2023-01-11T21:38:06.3252381Z [2023-01-11 21:34:32,045] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 496 2023-01-11T21:38:06.3252788Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3252924Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3253181Z [2023-01-11 21:34:32,082] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 497 2023-01-11T21:38:06.3253446Z [2023-01-11 21:34:32,158] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 497 2023-01-11T21:38:06.3253860Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3253989Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3254247Z [2023-01-11 21:34:32,195] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 498 2023-01-11T21:38:06.3254253Z 2023-01-11T21:38:06.3254352Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3254426Z import torch 2023-01-11T21:38:06.3254613Z import random 2023-01-11T21:38:06.3254771Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3254897Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3254902Z 2023-01-11T21:38:06.3254984Z aten = torch.ops.aten 2023-01-11T21:38:06.3255121Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3255218Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3255223Z 2023-01-11T21:38:06.3255297Z import triton 2023-01-11T21:38:06.3255392Z import triton.language as tl 2023-01-11T21:38:06.3255525Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3255683Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3255693Z 2023-01-11T21:38:06.3255699Z 2023-01-11T21:38:06.3255885Z triton_fused_div_0 = async_compile.triton(''' 2023-01-11T21:38:06.3255961Z import triton 2023-01-11T21:38:06.3256054Z import triton.language as tl 2023-01-11T21:38:06.3256167Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3256273Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3256406Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3256525Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3256537Z 2023-01-11T21:38:06.3256949Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3257022Z @triton.jit 2023-01-11T21:38:06.3257246Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3257374Z xnumel = 100 2023-01-11T21:38:06.3257474Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3257603Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3257688Z xmask = xindex < xnumel 2023-01-11T21:38:06.3257754Z x0 = xindex 2023-01-11T21:38:06.3257854Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3257953Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3258033Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.3258169Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3258257Z ''') 2023-01-11T21:38:06.3258263Z 2023-01-11T21:38:06.3258267Z 2023-01-11T21:38:06.3258359Z async_compile.wait(globals()) 2023-01-11T21:38:06.3258436Z del async_compile 2023-01-11T21:38:06.3258441Z 2023-01-11T21:38:06.3258510Z def call(args): 2023-01-11T21:38:06.3258589Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3258663Z args.clear() 2023-01-11T21:38:06.3258760Z with torch.cuda.device(0): 2023-01-11T21:38:06.3258962Z buf0 = empty_strided((100, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3259055Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3259196Z triton_fused_div_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.3259267Z del arg0_1 2023-01-11T21:38:06.3259339Z del arg1_1 2023-01-11T21:38:06.3259417Z return (buf0, ) 2023-01-11T21:38:06.3259422Z 2023-01-11T21:38:06.3259427Z 2023-01-11T21:38:06.3259507Z if __name__ == "__main__": 2023-01-11T21:38:06.3259623Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3259748Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3259949Z arg0_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3260148Z arg1_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3260262Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3260270Z 2023-01-11T21:38:06.3260274Z 2023-01-11T21:38:06.3260372Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3260446Z import torch 2023-01-11T21:38:06.3260521Z import random 2023-01-11T21:38:06.3260639Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3260793Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3260799Z 2023-01-11T21:38:06.3260883Z aten = torch.ops.aten 2023-01-11T21:38:06.3261014Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3261109Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3261113Z 2023-01-11T21:38:06.3261189Z import triton 2023-01-11T21:38:06.3261282Z import triton.language as tl 2023-01-11T21:38:06.3261409Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3261547Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3261553Z 2023-01-11T21:38:06.3261557Z 2023-01-11T21:38:06.3261714Z triton_fused_div_0 = async_compile.triton(''' 2023-01-11T21:38:06.3261789Z import triton 2023-01-11T21:38:06.3261875Z import triton.language as tl 2023-01-11T21:38:06.3261990Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3262092Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3262226Z from torch._inductor.triton_ops.autotune import pointwise 
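# Note on the fp16 variant dumped here: both operands are loaded with
# .to(tl.float32), so the division itself runs in fp32 before the result is
# stored back through the *fp16 output pointer.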
2023-01-11T21:38:06.3262351Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3262356Z 2023-01-11T21:38:06.3262771Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3262843Z @triton.jit 2023-01-11T21:38:06.3262985Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3263053Z xnumel = 100 2023-01-11T21:38:06.3263151Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3263308Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3263392Z xmask = xindex < xnumel 2023-01-11T21:38:06.3263462Z x0 = xindex 2023-01-11T21:38:06.3263575Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3263694Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3263768Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.3263903Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3263989Z ''') 2023-01-11T21:38:06.3263994Z 2023-01-11T21:38:06.3263999Z 2023-01-11T21:38:06.3264092Z async_compile.wait(globals()) 2023-01-11T21:38:06.3264169Z del async_compile 2023-01-11T21:38:06.3264174Z 2023-01-11T21:38:06.3264250Z def call(args): 2023-01-11T21:38:06.3264326Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3264402Z args.clear() 2023-01-11T21:38:06.3264488Z with torch.cuda.device(0): 2023-01-11T21:38:06.3264687Z buf0 = empty_strided((100, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3264782Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3264924Z triton_fused_div_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.3264997Z del arg0_1 2023-01-11T21:38:06.3265070Z del arg1_1 2023-01-11T21:38:06.3265153Z return (buf0, ) 2023-01-11T21:38:06.3265158Z 2023-01-11T21:38:06.3265162Z 2023-01-11T21:38:06.3265243Z if __name__ == "__main__": 2023-01-11T21:38:06.3265354Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3265479Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3265675Z arg0_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3265872Z arg1_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3265995Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3266000Z 2023-01-11T21:38:06.3266008Z 2023-01-11T21:38:06.3266109Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3266183Z import torch 2023-01-11T21:38:06.3266251Z import random 2023-01-11T21:38:06.3266369Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3266491Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3266497Z 2023-01-11T21:38:06.3266605Z aten = torch.ops.aten 2023-01-11T21:38:06.3266748Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3266842Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3266847Z 2023-01-11T21:38:06.3266922Z import triton 2023-01-11T21:38:06.3267015Z import triton.language as tl 2023-01-11T21:38:06.3267133Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3267270Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3267275Z 2023-01-11T21:38:06.3267280Z 2023-01-11T21:38:06.3267434Z triton_fused_div_0 = async_compile.triton(''' 
2023-01-11T21:38:06.3267513Z import triton 2023-01-11T21:38:06.3267606Z import triton.language as tl 2023-01-11T21:38:06.3267720Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3267822Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3267956Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3268077Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3268083Z 2023-01-11T21:38:06.3268497Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3268571Z @triton.jit 2023-01-11T21:38:06.3268711Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3268787Z xnumel = 100 2023-01-11T21:38:06.3268887Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3269017Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3269140Z xmask = xindex < xnumel 2023-01-11T21:38:06.3269205Z x0 = xindex 2023-01-11T21:38:06.3269303Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3269398Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3269477Z tmp2 = tmp0 // tmp1 2023-01-11T21:38:06.3269614Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3269700Z ''') 2023-01-11T21:38:06.3269706Z 2023-01-11T21:38:06.3269710Z 2023-01-11T21:38:06.3269804Z async_compile.wait(globals()) 2023-01-11T21:38:06.3269875Z del async_compile 2023-01-11T21:38:06.3269880Z 2023-01-11T21:38:06.3269955Z def call(args): 2023-01-11T21:38:06.3270034Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3270109Z args.clear() 2023-01-11T21:38:06.3270199Z with torch.cuda.device(0): 2023-01-11T21:38:06.3270396Z buf0 = empty_strided((100, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3270485Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3270625Z triton_fused_div_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.3270699Z del arg0_1 2023-01-11T21:38:06.3270771Z del arg1_1 2023-01-11T21:38:06.3270849Z return (buf0, ) 2023-01-11T21:38:06.3270854Z 2023-01-11T21:38:06.3270860Z 2023-01-11T21:38:06.3270942Z if __name__ == "__main__": 2023-01-11T21:38:06.3271062Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3271189Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3271387Z arg0_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3271575Z arg1_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3271694Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3271700Z 2023-01-11T21:38:06.3271965Z [2023-01-11 21:34:32,200] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 498 2023-01-11T21:38:06.3271974Z 2023-01-11T21:38:06.3272074Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3272147Z import torch 2023-01-11T21:38:06.3272222Z import random 2023-01-11T21:38:06.3272339Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3272464Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3272496Z 2023-01-11T21:38:06.3272574Z aten = torch.ops.aten 2023-01-11T21:38:06.3272710Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3272806Z async_compile = AsyncCompile() 
2023-01-11T21:38:06.3272811Z 2023-01-11T21:38:06.3272884Z import triton 2023-01-11T21:38:06.3272978Z import triton.language as tl 2023-01-11T21:38:06.3273103Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3273242Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3273248Z 2023-01-11T21:38:06.3273252Z 2023-01-11T21:38:06.3273406Z triton_fused_div_0 = async_compile.triton(''' 2023-01-11T21:38:06.3273478Z import triton 2023-01-11T21:38:06.3273572Z import triton.language as tl 2023-01-11T21:38:06.3273685Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3273786Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3273918Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3274049Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3274054Z 2023-01-11T21:38:06.3274464Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3274538Z @triton.jit 2023-01-11T21:38:06.3274673Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3274747Z xnumel = 100 2023-01-11T21:38:06.3274844Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3274973Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3275084Z xmask = xindex < xnumel 2023-01-11T21:38:06.3275157Z x0 = xindex 2023-01-11T21:38:06.3275263Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3275370Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3275462Z tmp2 = tmp0 // tmp1 2023-01-11T21:38:06.3275612Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3275698Z ''') 2023-01-11T21:38:06.3275704Z 2023-01-11T21:38:06.3275708Z 2023-01-11T21:38:06.3275801Z async_compile.wait(globals()) 2023-01-11T21:38:06.3275878Z del async_compile 2023-01-11T21:38:06.3275883Z 2023-01-11T21:38:06.3275957Z def call(args): 2023-01-11T21:38:06.3276038Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3276107Z args.clear() 2023-01-11T21:38:06.3276199Z with torch.cuda.device(0): 2023-01-11T21:38:06.3276396Z buf0 = empty_strided((100, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3276492Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3276635Z triton_fused_div_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.3276709Z del arg0_1 2023-01-11T21:38:06.3276782Z del arg1_1 2023-01-11T21:38:06.3276853Z return (buf0, ) 2023-01-11T21:38:06.3276858Z 2023-01-11T21:38:06.3276862Z 2023-01-11T21:38:06.3276946Z if __name__ == "__main__": 2023-01-11T21:38:06.3277063Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3277188Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3277389Z arg0_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3277582Z arg1_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3277702Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3277707Z 2023-01-11T21:38:06.3277778Z ok (0.463s) 2023-01-11T21:38:06.3278230Z test_div_zero_dim_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3278391Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3278650Z [2023-01-11 21:34:32,252] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 499 2023-01-11T21:38:06.3278915Z [2023-01-11 21:34:32,516] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 499 2023-01-11T21:38:06.3279331Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3279466Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3279717Z [2023-01-11 21:34:32,570] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 500 2023-01-11T21:38:06.3279980Z [2023-01-11 21:34:32,728] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 500 2023-01-11T21:38:06.3280396Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3280526Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3280776Z [2023-01-11 21:34:32,782] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 501 2023-01-11T21:38:06.3280807Z 2023-01-11T21:38:06.3280907Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3280976Z import torch 2023-01-11T21:38:06.3281050Z import random 2023-01-11T21:38:06.3281169Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3281297Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3281302Z 2023-01-11T21:38:06.3281384Z aten = torch.ops.aten 2023-01-11T21:38:06.3281521Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3281618Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3281623Z 2023-01-11T21:38:06.3281691Z import triton 2023-01-11T21:38:06.3281784Z import triton.language as tl 2023-01-11T21:38:06.3281908Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3282048Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3282053Z 2023-01-11T21:38:06.3282061Z 2023-01-11T21:38:06.3282253Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3282329Z import triton 2023-01-11T21:38:06.3282422Z import triton.language as tl 2023-01-11T21:38:06.3282535Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3282631Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3282767Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3282891Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3282896Z 2023-01-11T21:38:06.3283373Z @pointwise(size_hints=[16], 
filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3283448Z @triton.jit 2023-01-11T21:38:06.3283626Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3283701Z xnumel = 10 2023-01-11T21:38:06.3283802Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3283931Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3284009Z xmask = xindex < xnumel 2023-01-11T21:38:06.3284080Z x0 = xindex 2023-01-11T21:38:06.3284298Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3284533Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3284634Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3284764Z tmp6 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3284849Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.3284942Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.3285038Z tmp4 = tl.libdevice.trunc(tmp2) 2023-01-11T21:38:06.3285116Z tmp7 = tmp5 / tmp6 2023-01-11T21:38:06.3285216Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.3285348Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3285481Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3285613Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3285739Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3285867Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.3285954Z ''') 2023-01-11T21:38:06.3285960Z 2023-01-11T21:38:06.3285964Z 2023-01-11T21:38:06.3286059Z async_compile.wait(globals()) 2023-01-11T21:38:06.3286137Z del async_compile 2023-01-11T21:38:06.3286142Z 2023-01-11T21:38:06.3286216Z def call(args): 2023-01-11T21:38:06.3286295Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3286370Z args.clear() 2023-01-11T21:38:06.3286457Z with torch.cuda.device(0): 2023-01-11T21:38:06.3286656Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3286882Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3287078Z buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3287275Z buf3 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3287466Z buf4 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3287560Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3287740Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3287808Z del arg0_1 2023-01-11T21:38:06.3287881Z del arg1_1 2023-01-11T21:38:06.3287986Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3287991Z 2023-01-11T21:38:06.3287996Z 2023-01-11T21:38:06.3288079Z if __name__ == "__main__": 2023-01-11T21:38:06.3288202Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3288333Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3288533Z arg0_1 = rand_strided((10, ), (1, ), 
device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3288725Z arg1_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3288839Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3288844Z 2023-01-11T21:38:06.3288848Z 2023-01-11T21:38:06.3288945Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3289021Z import torch 2023-01-11T21:38:06.3289095Z import random 2023-01-11T21:38:06.3289215Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3289342Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3289348Z 2023-01-11T21:38:06.3289430Z aten = torch.ops.aten 2023-01-11T21:38:06.3289560Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3289660Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3289665Z 2023-01-11T21:38:06.3289739Z import triton 2023-01-11T21:38:06.3289835Z import triton.language as tl 2023-01-11T21:38:06.3289962Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3290103Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3290137Z 2023-01-11T21:38:06.3290142Z 2023-01-11T21:38:06.3290335Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3290411Z import triton 2023-01-11T21:38:06.3290497Z import triton.language as tl 2023-01-11T21:38:06.3290611Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3290714Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3290846Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3290972Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3290977Z 2023-01-11T21:38:06.3291458Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: '*fp16', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3291535Z @triton.jit 2023-01-11T21:38:06.3291713Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3291786Z xnumel = 10 2023-01-11T21:38:06.3291877Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3292007Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3292091Z xmask = xindex < xnumel 2023-01-11T21:38:06.3292163Z x0 = xindex 2023-01-11T21:38:06.3292378Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3292636Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3292782Z tmp5 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3292920Z tmp6 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32) 2023-01-11T21:38:06.3293001Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.3293100Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.3293200Z tmp4 = tl.libdevice.trunc(tmp2) 2023-01-11T21:38:06.3293280Z tmp7 = tmp5 / tmp6 2023-01-11T21:38:06.3293377Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.3293512Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3293639Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3293770Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, 
xmask) 2023-01-11T21:38:06.3293899Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3294027Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.3294117Z ''') 2023-01-11T21:38:06.3294122Z 2023-01-11T21:38:06.3294127Z 2023-01-11T21:38:06.3294225Z async_compile.wait(globals()) 2023-01-11T21:38:06.3294302Z del async_compile 2023-01-11T21:38:06.3294307Z 2023-01-11T21:38:06.3294378Z def call(args): 2023-01-11T21:38:06.3294452Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3294655Z args.clear() 2023-01-11T21:38:06.3294748Z with torch.cuda.device(0): 2023-01-11T21:38:06.3294947Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3295144Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3295337Z buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3295530Z buf3 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3295723Z buf4 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3295810Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3295993Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3296067Z del arg0_1 2023-01-11T21:38:06.3296143Z del arg1_1 2023-01-11T21:38:06.3296247Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3296293Z 2023-01-11T21:38:06.3296298Z 2023-01-11T21:38:06.3296381Z if __name__ == "__main__": 2023-01-11T21:38:06.3296502Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3296629Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3296824Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3297014Z arg1_1 = rand_strided((), (), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3297213Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3297219Z 2023-01-11T21:38:06.3297518Z [2023-01-11 21:34:32,940] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 501 2023-01-11T21:38:06.3297947Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3298082Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3298340Z [2023-01-11 21:34:32,993] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 502 2023-01-11T21:38:06.3298602Z [2023-01-11 21:34:33,149] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 502 2023-01-11T21:38:06.3299016Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3299187Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3299447Z [2023-01-11 21:34:33,195] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 503 2023-01-11T21:38:06.3299452Z 2023-01-11T21:38:06.3299546Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3299621Z import torch 2023-01-11T21:38:06.3299700Z import random 2023-01-11T21:38:06.3299821Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3299944Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3299949Z 2023-01-11T21:38:06.3300032Z aten = torch.ops.aten 2023-01-11T21:38:06.3300175Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3300266Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3300274Z 2023-01-11T21:38:06.3300354Z import triton 2023-01-11T21:38:06.3300451Z import triton.language as tl 2023-01-11T21:38:06.3300578Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3300718Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3300724Z 2023-01-11T21:38:06.3300728Z 2023-01-11T21:38:06.3300924Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3300997Z import triton 2023-01-11T21:38:06.3301091Z import triton.language as tl 2023-01-11T21:38:06.3301200Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3301308Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3301443Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3301570Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3301575Z 2023-01-11T21:38:06.3302052Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3302128Z @triton.jit 2023-01-11T21:38:06.3302336Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3302410Z xnumel = 10 2023-01-11T21:38:06.3302503Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3302633Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3302717Z xmask = xindex < xnumel 2023-01-11T21:38:06.3302791Z x0 = xindex 2023-01-11T21:38:06.3303025Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3303217Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3303351Z tmp5 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3303452Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3303527Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.3303626Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.3303723Z tmp4 = tl.libdevice.trunc(tmp2) 2023-01-11T21:38:06.3303802Z tmp7 = tmp5 / tmp6 2023-01-11T21:38:06.3303901Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.3304039Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3304170Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3304295Z tl.store(out_ptr2 + (x0 + 
tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3304425Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3304556Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.3304646Z ''') 2023-01-11T21:38:06.3304652Z 2023-01-11T21:38:06.3304656Z 2023-01-11T21:38:06.3304781Z async_compile.wait(globals()) 2023-01-11T21:38:06.3304859Z del async_compile 2023-01-11T21:38:06.3304864Z 2023-01-11T21:38:06.3304941Z def call(args): 2023-01-11T21:38:06.3305022Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3305092Z args.clear() 2023-01-11T21:38:06.3305187Z with torch.cuda.device(0): 2023-01-11T21:38:06.3305395Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3305626Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3305834Z buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3306037Z buf3 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3306228Z buf4 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3306316Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3306500Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3306580Z del arg0_1 2023-01-11T21:38:06.3306653Z del arg1_1 2023-01-11T21:38:06.3306759Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3306765Z 2023-01-11T21:38:06.3306769Z 2023-01-11T21:38:06.3306849Z if __name__ == "__main__": 2023-01-11T21:38:06.3306973Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3307100Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3307284Z arg0_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3307482Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3307604Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3307609Z 2023-01-11T21:38:06.3307613Z 2023-01-11T21:38:06.3307710Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3307787Z import torch 2023-01-11T21:38:06.3307863Z import random 2023-01-11T21:38:06.3307985Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3308108Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3308113Z 2023-01-11T21:38:06.3308191Z aten = torch.ops.aten 2023-01-11T21:38:06.3308328Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3308452Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3308458Z 2023-01-11T21:38:06.3308535Z import triton 2023-01-11T21:38:06.3308628Z import triton.language as tl 2023-01-11T21:38:06.3308756Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3308895Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3308901Z 2023-01-11T21:38:06.3308905Z 2023-01-11T21:38:06.3309098Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3309169Z import triton 2023-01-11T21:38:06.3309263Z import triton.language as tl 2023-01-11T21:38:06.3309378Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3309489Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3309625Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3309751Z from torch._inductor.utils import 
instance_descriptor 2023-01-11T21:38:06.3309756Z 2023-01-11T21:38:06.3310232Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: '*fp16', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3310307Z @triton.jit 2023-01-11T21:38:06.3310487Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3310556Z xnumel = 10 2023-01-11T21:38:06.3310655Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3310785Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3310907Z xmask = xindex < xnumel 2023-01-11T21:38:06.3310980Z x0 = xindex 2023-01-11T21:38:06.3311239Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3311453Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3311594Z tmp5 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32) 2023-01-11T21:38:06.3311711Z tmp6 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3311794Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.3311893Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.3311990Z tmp4 = tl.libdevice.trunc(tmp2) 2023-01-11T21:38:06.3312070Z tmp7 = tmp5 / tmp6 2023-01-11T21:38:06.3312167Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.3312296Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3312429Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3312563Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3312695Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3312825Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.3312914Z ''') 2023-01-11T21:38:06.3312920Z 2023-01-11T21:38:06.3312925Z 2023-01-11T21:38:06.3313018Z async_compile.wait(globals()) 2023-01-11T21:38:06.3313096Z del async_compile 2023-01-11T21:38:06.3313101Z 2023-01-11T21:38:06.3313171Z def call(args): 2023-01-11T21:38:06.3313251Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3313327Z args.clear() 2023-01-11T21:38:06.3313419Z with torch.cuda.device(0): 2023-01-11T21:38:06.3313617Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3313813Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3314011Z buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3314207Z buf3 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3314393Z buf4 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3314488Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3314697Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3314775Z del arg0_1 2023-01-11T21:38:06.3314851Z del arg1_1 2023-01-11T21:38:06.3314960Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3314965Z 2023-01-11T21:38:06.3314970Z 2023-01-11T21:38:06.3315050Z if __name__ == "__main__": 2023-01-11T21:38:06.3315168Z from torch._dynamo.testing import 
rand_strided 2023-01-11T21:38:06.3315316Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3315534Z arg0_1 = rand_strided((), (), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3315736Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3315856Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3315861Z 2023-01-11T21:38:06.3316132Z [2023-01-11 21:34:33,296] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 503 2023-01-11T21:38:06.3316551Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3316682Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3316938Z [2023-01-11 21:34:33,340] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 504 2023-01-11T21:38:06.3317228Z [2023-01-11 21:34:33,359] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 504 2023-01-11T21:38:06.3317645Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3317771Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3318021Z [2023-01-11 21:34:33,403] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 505 2023-01-11T21:38:06.3318027Z 2023-01-11T21:38:06.3318124Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3318197Z import torch 2023-01-11T21:38:06.3318271Z import random 2023-01-11T21:38:06.3318390Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3318515Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3318521Z 2023-01-11T21:38:06.3318604Z aten = torch.ops.aten 2023-01-11T21:38:06.3318733Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3318827Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3318835Z 2023-01-11T21:38:06.3318909Z import triton 2023-01-11T21:38:06.3319000Z import triton.language as tl 2023-01-11T21:38:06.3319124Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3319264Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3319269Z 2023-01-11T21:38:06.3319274Z 2023-01-11T21:38:06.3319463Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3319539Z import triton 2023-01-11T21:38:06.3319624Z import triton.language as tl 2023-01-11T21:38:06.3319737Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3319837Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3319972Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3320095Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3320100Z 2023-01-11T21:38:06.3320590Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: 
'*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3320666Z @triton.jit 2023-01-11T21:38:06.3320844Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3320913Z xnumel = 10 2023-01-11T21:38:06.3321011Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3321138Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3321222Z xmask = xindex < xnumel 2023-01-11T21:38:06.3321295Z x0 = xindex 2023-01-11T21:38:06.3321482Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3321715Z tmp2 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3321812Z tmp7 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3321938Z tmp8 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3322025Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3322113Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3322191Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3322444Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3322524Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3322775Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3322902Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3323061Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3323191Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3323320Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3323448Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3323533Z ''') 2023-01-11T21:38:06.3323539Z 2023-01-11T21:38:06.3323543Z 2023-01-11T21:38:06.3323636Z async_compile.wait(globals()) 2023-01-11T21:38:06.3323713Z del async_compile 2023-01-11T21:38:06.3323719Z 2023-01-11T21:38:06.3323785Z def call(args): 2023-01-11T21:38:06.3323865Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3323939Z args.clear() 2023-01-11T21:38:06.3324031Z with torch.cuda.device(0): 2023-01-11T21:38:06.3324227Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3324426Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3324618Z buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3324814Z buf3 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3325004Z buf4 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3325097Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3325274Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3325350Z del arg0_1 2023-01-11T21:38:06.3325422Z del arg1_1 2023-01-11T21:38:06.3325525Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3325530Z 2023-01-11T21:38:06.3325535Z 2023-01-11T21:38:06.3325613Z if __name__ == "__main__": 2023-01-11T21:38:06.3325732Z from torch._dynamo.testing import rand_strided 
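# In the zero-dim kernels above, the scalar operand is read with
# tl.load(ptr + (0 + tl.zeros([XBLOCK], tl.int32)), None): the vector of zero
# offsets broadcasts the single element across the block, and the mask is None
# because offset 0 is always in bounds.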
2023-01-11T21:38:06.3325854Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3326048Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3326234Z arg1_1 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3326354Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3326390Z 2023-01-11T21:38:06.3326396Z 2023-01-11T21:38:06.3326495Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3326571Z import torch 2023-01-11T21:38:06.3326645Z import random 2023-01-11T21:38:06.3326758Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3326882Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3326887Z 2023-01-11T21:38:06.3326970Z aten = torch.ops.aten 2023-01-11T21:38:06.3327108Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3327205Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3327210Z 2023-01-11T21:38:06.3327285Z import triton 2023-01-11T21:38:06.3327380Z import triton.language as tl 2023-01-11T21:38:06.3327505Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3327636Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3327642Z 2023-01-11T21:38:06.3327651Z 2023-01-11T21:38:06.3327835Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3327910Z import triton 2023-01-11T21:38:06.3328000Z import triton.language as tl 2023-01-11T21:38:06.3328112Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3328212Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3328345Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3328471Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3328476Z 2023-01-11T21:38:06.3328934Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3329043Z @triton.jit 2023-01-11T21:38:06.3329220Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3329301Z xnumel = 10 2023-01-11T21:38:06.3329396Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3329525Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3329608Z xmask = xindex < xnumel 2023-01-11T21:38:06.3329681Z x0 = xindex 2023-01-11T21:38:06.3329865Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3330099Z tmp2 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3330195Z tmp7 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3330326Z tmp8 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3330415Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3330499Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3330577Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3330822Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3330906Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3331155Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // tmp8 - 1, tmp7 // tmp8), tmp7 // 
tmp8) 2023-01-11T21:38:06.3331289Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3331418Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3331547Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3331679Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3331807Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3331887Z ''') 2023-01-11T21:38:06.3331900Z 2023-01-11T21:38:06.3331904Z 2023-01-11T21:38:06.3331991Z async_compile.wait(globals()) 2023-01-11T21:38:06.3332066Z del async_compile 2023-01-11T21:38:06.3332071Z 2023-01-11T21:38:06.3332148Z def call(args): 2023-01-11T21:38:06.3332255Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3332331Z args.clear() 2023-01-11T21:38:06.3332423Z with torch.cuda.device(0): 2023-01-11T21:38:06.3332622Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3332808Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3332997Z buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3333193Z buf3 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3333382Z buf4 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3333476Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3333657Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3333731Z del arg0_1 2023-01-11T21:38:06.3333803Z del arg1_1 2023-01-11T21:38:06.3333902Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3333907Z 2023-01-11T21:38:06.3333912Z 2023-01-11T21:38:06.3333990Z if __name__ == "__main__": 2023-01-11T21:38:06.3334108Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3334232Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3334427Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3334727Z arg1_1 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3334845Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3334850Z 2023-01-11T21:38:06.3335210Z [2023-01-11 21:34:33,507] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 505 2023-01-11T21:38:06.3335631Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3335757Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3336016Z [2023-01-11 21:34:33,551] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 506 2023-01-11T21:38:06.3336280Z [2023-01-11 21:34:33,570] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 506 2023-01-11T21:38:06.3336286Z 2023-01-11T21:38:06.3336385Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3336457Z import torch 2023-01-11T21:38:06.3336535Z import random 2023-01-11T21:38:06.3336653Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3336774Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3336780Z 2023-01-11T21:38:06.3336855Z aten = torch.ops.aten 2023-01-11T21:38:06.3336993Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3337087Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3337092Z 2023-01-11T21:38:06.3337239Z import triton 2023-01-11T21:38:06.3337347Z import triton.language as tl 2023-01-11T21:38:06.3337484Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3337623Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3337629Z 2023-01-11T21:38:06.3337633Z 2023-01-11T21:38:06.3337826Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3337893Z import triton 2023-01-11T21:38:06.3337983Z import triton.language as tl 2023-01-11T21:38:06.3338099Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3338201Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3338334Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3338459Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3338464Z 2023-01-11T21:38:06.3338964Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3339041Z @triton.jit 2023-01-11T21:38:06.3339222Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3339289Z xnumel = 10 2023-01-11T21:38:06.3339387Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3339513Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3339599Z xmask = xindex < xnumel 2023-01-11T21:38:06.3339670Z x0 = xindex 2023-01-11T21:38:06.3339902Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3340088Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3340217Z tmp7 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3340313Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3340403Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3340491Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3340569Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3340823Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3340901Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3341144Z tmp9 = tl.where((tmp7 
< 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3341307Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3341441Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3341574Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3341710Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3341840Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3341928Z ''') 2023-01-11T21:38:06.3341934Z 2023-01-11T21:38:06.3341938Z 2023-01-11T21:38:06.3342036Z async_compile.wait(globals()) 2023-01-11T21:38:06.3342108Z del async_compile 2023-01-11T21:38:06.3342119Z 2023-01-11T21:38:06.3342189Z def call(args): 2023-01-11T21:38:06.3342271Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3342347Z args.clear() 2023-01-11T21:38:06.3342443Z with torch.cuda.device(0): 2023-01-11T21:38:06.3342642Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3342840Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3343036Z buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3343230Z buf3 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3343427Z buf4 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3343522Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3343704Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3343779Z del arg0_1 2023-01-11T21:38:06.3343854Z del arg1_1 2023-01-11T21:38:06.3343959Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3343964Z 2023-01-11T21:38:06.3343968Z 2023-01-11T21:38:06.3344049Z if __name__ == "__main__": 2023-01-11T21:38:06.3344165Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3344294Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3344485Z arg0_1 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3344680Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3344833Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3344839Z 2023-01-11T21:38:06.3344844Z 2023-01-11T21:38:06.3344943Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3345020Z import torch 2023-01-11T21:38:06.3345099Z import random 2023-01-11T21:38:06.3345236Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3345382Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3345389Z 2023-01-11T21:38:06.3345473Z aten = torch.ops.aten 2023-01-11T21:38:06.3345610Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3345708Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3345716Z 2023-01-11T21:38:06.3345792Z import triton 2023-01-11T21:38:06.3345886Z import triton.language as tl 2023-01-11T21:38:06.3346005Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3346148Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3346153Z 2023-01-11T21:38:06.3346160Z 2023-01-11T21:38:06.3346351Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3346428Z import triton 
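# Kernel source handed to AsyncCompile: the @pointwise decorator below carries the pointer signature and divisible_by_16 hints that inductor uses to specialize and autotune the launch.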
2023-01-11T21:38:06.3346523Z import triton.language as tl 2023-01-11T21:38:06.3346637Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3346740Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3346875Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3346995Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3347000Z 2023-01-11T21:38:06.3347462Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3347566Z @triton.jit 2023-01-11T21:38:06.3347748Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3347825Z xnumel = 10 2023-01-11T21:38:06.3347925Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3348056Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3348139Z xmask = xindex < xnumel 2023-01-11T21:38:06.3348206Z x0 = xindex 2023-01-11T21:38:06.3348440Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3348630Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3348762Z tmp7 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3348864Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3348954Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3349040Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3349121Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3349370Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3349454Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3349706Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3349840Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3349970Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3350103Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3350234Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3350366Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3350447Z ''') 2023-01-11T21:38:06.3350453Z 2023-01-11T21:38:06.3350457Z 2023-01-11T21:38:06.3350553Z async_compile.wait(globals()) 2023-01-11T21:38:06.3350630Z del async_compile 2023-01-11T21:38:06.3350636Z 2023-01-11T21:38:06.3350749Z def call(args): 2023-01-11T21:38:06.3350831Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3350906Z args.clear() 2023-01-11T21:38:06.3350996Z with torch.cuda.device(0): 2023-01-11T21:38:06.3351186Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3351378Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3351569Z buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3351764Z buf3 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3351958Z buf4 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3352053Z stream0 = 
get_cuda_stream(0) 2023-01-11T21:38:06.3352228Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3352302Z del arg0_1 2023-01-11T21:38:06.3352370Z del arg1_1 2023-01-11T21:38:06.3352476Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3352481Z 2023-01-11T21:38:06.3352485Z 2023-01-11T21:38:06.3352562Z if __name__ == "__main__": 2023-01-11T21:38:06.3352679Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3352804Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3352992Z arg0_1 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3353183Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3353302Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3353348Z 2023-01-11T21:38:06.3353412Z ok (1.371s) 2023-01-11T21:38:06.3353926Z test_dropout_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.3354006Z warnings.warn( 2023-01-11T21:38:06.3354262Z [2023-01-11 21:34:33,639] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 507 2023-01-11T21:38:06.3354514Z [2023-01-11 21:34:33,639] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.3354779Z [2023-01-11 21:34:33,774] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 507 2023-01-11T21:38:06.3355027Z [2023-01-11 21:34:34,024] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 508 2023-01-11T21:38:06.3355321Z [2023-01-11 21:34:34,025] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.3355587Z [2023-01-11 21:34:34,033] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 508 2023-01-11T21:38:06.3355593Z 2023-01-11T21:38:06.3355695Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3355764Z import torch 2023-01-11T21:38:06.3355839Z import random 2023-01-11T21:38:06.3355957Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3356078Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3356083Z 2023-01-11T21:38:06.3356163Z aten = torch.ops.aten 2023-01-11T21:38:06.3356300Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3356396Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3356401Z 2023-01-11T21:38:06.3356469Z import triton 2023-01-11T21:38:06.3356560Z import triton.language as tl 2023-01-11T21:38:06.3356684Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3356827Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3356992Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.3356998Z 2023-01-11T21:38:06.3357002Z 2023-01-11T21:38:06.3357184Z triton_fused_mul_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3357260Z import triton 2023-01-11T21:38:06.3357352Z import triton.language as tl 2023-01-11T21:38:06.3357458Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3357559Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3357692Z 
from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3357817Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3357822Z 2023-01-11T21:38:06.3358236Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3358314Z @triton.jit 2023-01-11T21:38:06.3358454Z def triton_(seed0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3358527Z xnumel = 1000 2023-01-11T21:38:06.3358617Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3358746Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3358828Z xmask = xindex < xnumel 2023-01-11T21:38:06.3358899Z x0 = xindex 2023-01-11T21:38:06.3359028Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3359124Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3359193Z tmp1 = x0 2023-01-11T21:38:06.3359273Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.3359345Z tmp3 = 0.5 2023-01-11T21:38:06.3359422Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.3359508Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.3359584Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3359682Z tmp8 = 2.0 2023-01-11T21:38:06.3359752Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.3359885Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3359970Z ''') 2023-01-11T21:38:06.3359976Z 2023-01-11T21:38:06.3359980Z 2023-01-11T21:38:06.3360073Z async_compile.wait(globals()) 2023-01-11T21:38:06.3360150Z del async_compile 2023-01-11T21:38:06.3360155Z 2023-01-11T21:38:06.3360229Z def call(args): 2023-01-11T21:38:06.3360300Z arg0_1, = args 2023-01-11T21:38:06.3360374Z args.clear() 2023-01-11T21:38:06.3360500Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.3360590Z with torch.cuda.device(0): 2023-01-11T21:38:06.3360789Z buf0 = empty_strided((1000, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3360882Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3361033Z triton_fused_mul_1_0.run(seed_cuda_0, arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.3361109Z del arg0_1 2023-01-11T21:38:06.3361186Z return (buf0, ) 2023-01-11T21:38:06.3361191Z 2023-01-11T21:38:06.3361195Z 2023-01-11T21:38:06.3361272Z if __name__ == "__main__": 2023-01-11T21:38:06.3361384Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3361514Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3361710Z seed_cuda_0 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3361908Z arg0_1 = rand_strided((1000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3362020Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3362026Z 2023-01-11T21:38:06.3362030Z 2023-01-11T21:38:06.3362129Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3362201Z import torch 2023-01-11T21:38:06.3362274Z import random 2023-01-11T21:38:06.3362385Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3362507Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3362516Z 2023-01-11T21:38:06.3362595Z aten = torch.ops.aten 2023-01-11T21:38:06.3362730Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3362824Z async_compile = AsyncCompile() 
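# The dropout lowering dumped above (and repeated below) draws per-element uniforms with tl.rand(seed, index) from a single int64 seed tensor (seed_cuda_0), keeps elements where the draw exceeds p, and rescales by 1/(1-p). A minimal eager-mode sketch of the same computation, assuming p = 0.5 (hence the 2.0 scale in the kernel):
#   keep = (torch.rand_like(x) > 0.5).to(x.dtype)
#   out = keep * x * 2.0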
2023-01-11T21:38:06.3362829Z 2023-01-11T21:38:06.3362899Z import triton 2023-01-11T21:38:06.3363020Z import triton.language as tl 2023-01-11T21:38:06.3363139Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3363279Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3363442Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.3363447Z 2023-01-11T21:38:06.3363451Z 2023-01-11T21:38:06.3363604Z triton_fused_mul_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3363675Z import triton 2023-01-11T21:38:06.3363766Z import triton.language as tl 2023-01-11T21:38:06.3363878Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3363981Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3364107Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3364230Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3364235Z 2023-01-11T21:38:06.3364648Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3364723Z @triton.jit 2023-01-11T21:38:06.3364862Z def triton_(seed0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3364936Z xnumel = 1000 2023-01-11T21:38:06.3365032Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3365160Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3365236Z xmask = xindex < xnumel 2023-01-11T21:38:06.3365306Z x0 = xindex 2023-01-11T21:38:06.3365431Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3365554Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3365623Z tmp1 = x0 2023-01-11T21:38:06.3365711Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.3365780Z tmp3 = 0.5 2023-01-11T21:38:06.3365851Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.3365939Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.3366014Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3366085Z tmp8 = 2.0 2023-01-11T21:38:06.3366161Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.3366293Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3366378Z ''') 2023-01-11T21:38:06.3366384Z 2023-01-11T21:38:06.3366389Z 2023-01-11T21:38:06.3366473Z async_compile.wait(globals()) 2023-01-11T21:38:06.3366549Z del async_compile 2023-01-11T21:38:06.3366554Z 2023-01-11T21:38:06.3366628Z def call(args): 2023-01-11T21:38:06.3366700Z arg0_1, = args 2023-01-11T21:38:06.3366775Z args.clear() 2023-01-11T21:38:06.3366910Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.3367000Z with torch.cuda.device(0): 2023-01-11T21:38:06.3367195Z buf0 = empty_strided((1000, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3367287Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3367442Z triton_fused_mul_1_0.run(seed_cuda_0, arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.3367515Z del arg0_1 2023-01-11T21:38:06.3367592Z return (buf0, ) 2023-01-11T21:38:06.3367597Z 2023-01-11T21:38:06.3367602Z 2023-01-11T21:38:06.3367680Z if __name__ == "__main__": 2023-01-11T21:38:06.3367800Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3367925Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3368110Z seed_cuda_0 = rand_strided((), 
(), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3368311Z arg0_1 = rand_strided((1000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3368426Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3368432Z 2023-01-11T21:38:06.3368501Z ok (0.588s) 2023-01-11T21:38:06.3368843Z test_dropout_deterministic_cuda (__main__.CudaTests) ... [2023-01-11 21:34:34,226] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 509 2023-01-11T21:38:06.3369125Z [2023-01-11 21:34:34,227] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.3369390Z [2023-01-11 21:34:34,364] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 509 2023-01-11T21:38:06.3369641Z [2023-01-11 21:34:34,429] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 510 2023-01-11T21:38:06.3369892Z [2023-01-11 21:34:34,430] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.3370144Z [2023-01-11 21:34:34,438] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 510 2023-01-11T21:38:06.3370160Z 2023-01-11T21:38:06.3370251Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3370326Z import torch 2023-01-11T21:38:06.3370401Z import random 2023-01-11T21:38:06.3370520Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3370647Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3370652Z 2023-01-11T21:38:06.3370733Z aten = torch.ops.aten 2023-01-11T21:38:06.3370871Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3370958Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3370963Z 2023-01-11T21:38:06.3371035Z import triton 2023-01-11T21:38:06.3371127Z import triton.language as tl 2023-01-11T21:38:06.3371254Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3371391Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3371555Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.3371586Z 2023-01-11T21:38:06.3371591Z 2023-01-11T21:38:06.3371747Z triton_fused_mul_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3371820Z import triton 2023-01-11T21:38:06.3371905Z import triton.language as tl 2023-01-11T21:38:06.3372021Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3372124Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3372254Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3372379Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3372384Z 2023-01-11T21:38:06.3372801Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3372871Z @triton.jit 2023-01-11T21:38:06.3373013Z def triton_(seed0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3373084Z xnumel = 1024 2023-01-11T21:38:06.3373179Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3373309Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3373392Z xmask = xindex < xnumel 2023-01-11T21:38:06.3373463Z x0 = xindex 2023-01-11T21:38:06.3373591Z tmp0 = tl.load(seed0 + (0 + 
tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3373688Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3373752Z tmp1 = x0 2023-01-11T21:38:06.3373840Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.3373911Z tmp3 = 0.55 2023-01-11T21:38:06.3373989Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.3374077Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.3374153Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3374232Z tmp8 = 2.2222222222222223 2023-01-11T21:38:06.3374302Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.3374439Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3374638Z ''') 2023-01-11T21:38:06.3374644Z 2023-01-11T21:38:06.3374648Z 2023-01-11T21:38:06.3374739Z async_compile.wait(globals()) 2023-01-11T21:38:06.3374815Z del async_compile 2023-01-11T21:38:06.3374820Z 2023-01-11T21:38:06.3374893Z def call(args): 2023-01-11T21:38:06.3374966Z arg0_1, = args 2023-01-11T21:38:06.3375034Z args.clear() 2023-01-11T21:38:06.3375214Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.3375306Z with torch.cuda.device(0): 2023-01-11T21:38:06.3375506Z buf0 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3375598Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3375750Z triton_fused_mul_1_0.run(seed_cuda_0, arg0_1, buf0, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.3375822Z del arg0_1 2023-01-11T21:38:06.3375892Z return (buf0, ) 2023-01-11T21:38:06.3375905Z 2023-01-11T21:38:06.3375910Z 2023-01-11T21:38:06.3375982Z if __name__ == "__main__": 2023-01-11T21:38:06.3376103Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3376227Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3376424Z seed_cuda_0 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3376628Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3376742Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3376747Z 2023-01-11T21:38:06.3376751Z 2023-01-11T21:38:06.3376849Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3376921Z import torch 2023-01-11T21:38:06.3376988Z import random 2023-01-11T21:38:06.3377105Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3377304Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3377310Z 2023-01-11T21:38:06.3377404Z aten = torch.ops.aten 2023-01-11T21:38:06.3377550Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3377687Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3377692Z 2023-01-11T21:38:06.3377767Z import triton 2023-01-11T21:38:06.3377855Z import triton.language as tl 2023-01-11T21:38:06.3377979Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3378120Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3378291Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.3378296Z 2023-01-11T21:38:06.3378300Z 2023-01-11T21:38:06.3378460Z triton_fused_mul_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3378535Z import triton 2023-01-11T21:38:06.3378628Z import triton.language as tl 2023-01-11T21:38:06.3378748Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3378845Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3378979Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3379107Z from 
torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3379115Z 2023-01-11T21:38:06.3379537Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3379612Z @triton.jit 2023-01-11T21:38:06.3379756Z def triton_(seed0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3379833Z xnumel = 1024 2023-01-11T21:38:06.3379933Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3380058Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3380142Z xmask = xindex < xnumel 2023-01-11T21:38:06.3380214Z x0 = xindex 2023-01-11T21:38:06.3380345Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3380443Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3380516Z tmp1 = x0 2023-01-11T21:38:06.3380606Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.3380677Z tmp3 = 0.55 2023-01-11T21:38:06.3380756Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.3380849Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.3380928Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3381009Z tmp8 = 2.2222222222222223 2023-01-11T21:38:06.3381090Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.3381253Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3381334Z ''') 2023-01-11T21:38:06.3381339Z 2023-01-11T21:38:06.3381344Z 2023-01-11T21:38:06.3381438Z async_compile.wait(globals()) 2023-01-11T21:38:06.3381519Z del async_compile 2023-01-11T21:38:06.3381524Z 2023-01-11T21:38:06.3381600Z def call(args): 2023-01-11T21:38:06.3381675Z arg0_1, = args 2023-01-11T21:38:06.3381749Z args.clear() 2023-01-11T21:38:06.3381884Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.3381971Z with torch.cuda.device(0): 2023-01-11T21:38:06.3382175Z buf0 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3382273Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3382425Z triton_fused_mul_1_0.run(seed_cuda_0, arg0_1, buf0, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.3382502Z del arg0_1 2023-01-11T21:38:06.3382581Z return (buf0, ) 2023-01-11T21:38:06.3382588Z 2023-01-11T21:38:06.3382593Z 2023-01-11T21:38:06.3382674Z if __name__ == "__main__": 2023-01-11T21:38:06.3382795Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3382916Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3383110Z seed_cuda_0 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3383311Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3383428Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3383434Z 2023-01-11T21:38:06.3383506Z ok (0.401s) 2023-01-11T21:38:06.3383847Z test_dtype_mismatch_issue_cuda (__main__.CudaTests) ... 
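(The scale constants in the dropout dumps above follow directly from the dropout probability: for p = 0.5 the keep-rescale is 1/(1-0.5) = 2.0, and for p = 0.55 it is 1/(1-0.55) = 2.2222222222222223, matching tmp8 in each kernel.)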
[2023-01-11 21:34:34,583] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 511 2023-01-11T21:38:06.3384145Z [2023-01-11 21:34:34,615] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 511 2023-01-11T21:38:06.3384151Z 2023-01-11T21:38:06.3384255Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3384332Z import torch 2023-01-11T21:38:06.3384402Z import random 2023-01-11T21:38:06.3384521Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3384646Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3384651Z 2023-01-11T21:38:06.3384737Z aten = torch.ops.aten 2023-01-11T21:38:06.3384875Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3384971Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3384976Z 2023-01-11T21:38:06.3385051Z import triton 2023-01-11T21:38:06.3385141Z import triton.language as tl 2023-01-11T21:38:06.3385294Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3385458Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3385464Z 2023-01-11T21:38:06.3385469Z 2023-01-11T21:38:06.3385610Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.3385823Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.3385948Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.3386061Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.3386169Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.3386265Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.3386333Z { 2023-01-11T21:38:06.3386425Z auto out_ptr1 = in_out_ptr0; 2023-01-11T21:38:06.3386528Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.3386598Z { 2023-01-11T21:38:06.3386682Z #pragma omp for 2023-01-11T21:38:06.3386777Z for(long i0=0; i0<4096; i0+=1) 2023-01-11T21:38:06.3386840Z { 2023-01-11T21:38:06.3386908Z { 2023-01-11T21:38:06.3386978Z { 2023-01-11T21:38:06.3387255Z float tmp5 = -std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.3387354Z for(long i1=0; i1<64; i1+=1) 2023-01-11T21:38:06.3387458Z { 2023-01-11T21:38:06.3387534Z { 2023-01-11T21:38:06.3387643Z auto tmp0 = static_cast<long>(i1); 2023-01-11T21:38:06.3387757Z auto tmp1 = static_cast<long>(63); 2023-01-11T21:38:06.3387860Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:06.3387956Z float tmp3 = 0.0; 2023-01-11T21:38:06.3388040Z if(tmp2) 2023-01-11T21:38:06.3388117Z { 2023-01-11T21:38:06.3388227Z auto tmp4 = in_ptr0[i1 + (63*i0)]; 2023-01-11T21:38:06.3388317Z tmp3 = tmp4; 2023-01-11T21:38:06.3388396Z } 2023-01-11T21:38:06.3388507Z tmp5 = std::max(tmp5, tmp3); 2023-01-11T21:38:06.3388582Z } 2023-01-11T21:38:06.3388655Z } 2023-01-11T21:38:06.3388749Z out_ptr0[i0] = tmp5; 2023-01-11T21:38:06.3388823Z } 2023-01-11T21:38:06.3388885Z } 2023-01-11T21:38:06.3388954Z } 2023-01-11T21:38:06.3389039Z #pragma omp for 2023-01-11T21:38:06.3389127Z for(long i0=0; i0<4096; i0+=1) 2023-01-11T21:38:06.3389194Z { 2023-01-11T21:38:06.3389263Z { 2023-01-11T21:38:06.3389333Z { 2023-01-11T21:38:06.3389415Z float tmp8 = 0; 2023-01-11T21:38:06.3389512Z for(long i1=0; i1<64; i1+=1) 2023-01-11T21:38:06.3389586Z { 2023-01-11T21:38:06.3389660Z { 2023-01-11T21:38:06.3389794Z auto tmp5 = out_ptr0[i0]; 2023-01-11T21:38:06.3389908Z auto tmp0 = static_cast<long>(i1); 2023-01-11T21:38:06.3390023Z auto tmp1 = static_cast<long>(63); 2023-01-11T21:38:06.3390119Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:06.3390216Z float tmp3 = 0.0;
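// i1 iterates 64 padded columns but only the first 63 exist in the (128, 32, 63) input; the guard below leaves tmp3 at 0.0 for the padded column.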
2023-01-11T21:38:06.3390299Z if(tmp2) 2023-01-11T21:38:06.3390377Z { 2023-01-11T21:38:06.3390488Z auto tmp4 = in_ptr0[i1 + (63*i0)]; 2023-01-11T21:38:06.3390580Z tmp3 = tmp4; 2023-01-11T21:38:06.3390659Z } 2023-01-11T21:38:06.3390801Z auto tmp6 = tmp3 - tmp5; 2023-01-11T21:38:06.3390911Z auto tmp7 = std::exp(tmp6); 2023-01-11T21:38:06.3391018Z out_ptr1[i1 + (64*i0)] = tmp7; 2023-01-11T21:38:06.3391114Z tmp8 += tmp7; 2023-01-11T21:38:06.3391189Z } 2023-01-11T21:38:06.3391261Z } 2023-01-11T21:38:06.3391354Z out_ptr2[i0] = tmp8; 2023-01-11T21:38:06.3391418Z } 2023-01-11T21:38:06.3391487Z } 2023-01-11T21:38:06.3391558Z } 2023-01-11T21:38:06.3391641Z #pragma omp for 2023-01-11T21:38:06.3391729Z for(long i0=0; i0<4096; i0+=1) 2023-01-11T21:38:06.3391799Z { 2023-01-11T21:38:06.3391883Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.3391955Z { 2023-01-11T21:38:06.3392108Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr1 + (8*i1) + (64*i0)); 2023-01-11T21:38:06.3392243Z auto tmp1 = at::vec::Vectorized<float>(out_ptr2[i0]); 2023-01-11T21:38:06.3392338Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.3392452Z tmp2.store(in_out_ptr0 + (8*i1) + (64*i0)); 2023-01-11T21:38:06.3392523Z } 2023-01-11T21:38:06.3392628Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.3392714Z for(long i1=64; i1<64; i1+=1) 2023-01-11T21:38:06.3392783Z { 2023-01-11T21:38:06.3392886Z auto tmp0 = out_ptr1[i1 + (64*i0)]; 2023-01-11T21:38:06.3393008Z auto tmp1 = out_ptr2[i0]; 2023-01-11T21:38:06.3393104Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.3393205Z in_out_ptr0[i1 + (64*i0)] = tmp2; 2023-01-11T21:38:06.3393275Z } 2023-01-11T21:38:06.3393337Z } 2023-01-11T21:38:06.3393403Z } 2023-01-11T21:38:06.3393469Z } 2023-01-11T21:38:06.3393554Z ''') 2023-01-11T21:38:06.3393560Z 2023-01-11T21:38:06.3393564Z 2023-01-11T21:38:06.3393661Z async_compile.wait(globals()) 2023-01-11T21:38:06.3393739Z del async_compile 2023-01-11T21:38:06.3393744Z 2023-01-11T21:38:06.3393820Z def call(args): 2023-01-11T21:38:06.3393889Z arg0_1, = args 2023-01-11T21:38:06.3393971Z args.clear() 2023-01-11T21:38:06.3394183Z buf0 = empty_strided((128, 32, 1), (32, 1, 4096), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3394396Z buf1 = empty_strided((128, 32, 64), (2048, 64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3394612Z buf2 = empty_strided((128, 32, 1), (32, 1, 4096), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3394706Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:06.3394902Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.3394977Z del arg0_1 2023-01-11T21:38:06.3395048Z return (buf3, ) 2023-01-11T21:38:06.3395053Z 2023-01-11T21:38:06.3395058Z 2023-01-11T21:38:06.3395149Z if __name__ == "__main__": 2023-01-11T21:38:06.3395287Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3395440Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3395653Z arg0_1 = rand_strided((128, 32, 63), (2016, 63, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3395810Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3395815Z 2023-01-11T21:38:06.3395884Z ok (0.056s) 2023-01-11T21:38:06.3396345Z test_elu_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3396477Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3396732Z [2023-01-11 21:34:34,657] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 512 2023-01-11T21:38:06.3396998Z [2023-01-11 21:34:34,821] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 512 2023-01-11T21:38:06.3397420Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3397554Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3397810Z [2023-01-11 21:34:34,884] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 513 2023-01-11T21:38:06.3398075Z [2023-01-11 21:34:35,058] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 513 2023-01-11T21:38:06.3398080Z 2023-01-11T21:38:06.3398181Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3398257Z import torch 2023-01-11T21:38:06.3398333Z import random 2023-01-11T21:38:06.3398448Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3398576Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3398581Z 2023-01-11T21:38:06.3398666Z aten = torch.ops.aten 2023-01-11T21:38:06.3398803Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3398901Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3398906Z 2023-01-11T21:38:06.3399010Z import triton 2023-01-11T21:38:06.3399107Z import triton.language as tl 2023-01-11T21:38:06.3399235Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3399368Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3399373Z 2023-01-11T21:38:06.3399384Z 2023-01-11T21:38:06.3399544Z triton_fused_add_where_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3399621Z import triton 2023-01-11T21:38:06.3399716Z import triton.language as tl 2023-01-11T21:38:06.3399830Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3399936Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3400072Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3400195Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3400200Z 2023-01-11T21:38:06.3400616Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3400690Z @triton.jit 2023-01-11T21:38:06.3400831Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3400904Z xnumel = 256 2023-01-11T21:38:06.3401000Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3401129Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3401213Z xmask = xindex < xnumel 2023-01-11T21:38:06.3401284Z x0 = xindex 2023-01-11T21:38:06.3401468Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3401598Z 
tmp13 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3401671Z tmp1 = 0 2023-01-11T21:38:06.3401752Z tmp2 = tmp0 > tmp1 2023-01-11T21:38:06.3401831Z tmp3 = 1.0507009873554805 2023-01-11T21:38:06.3401907Z tmp4 = tmp0 * tmp3 2023-01-11T21:38:06.3401972Z tmp5 = 1.0 2023-01-11T21:38:06.3402051Z tmp6 = tmp0 * tmp5 2023-01-11T21:38:06.3402150Z tmp7 = tl.libdevice.expm1(tmp6) 2023-01-11T21:38:06.3402229Z tmp8 = 1.7580993408473766 2023-01-11T21:38:06.3402307Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.3402404Z tmp10 = tl.where(tmp2, tmp4, tmp9) 2023-01-11T21:38:06.3402476Z tmp11 = 2 2023-01-11T21:38:06.3402549Z tmp12 = tmp10 + tmp11 2023-01-11T21:38:06.3402622Z tmp14 = 1 2023-01-11T21:38:06.3402700Z tmp15 = tmp13 + tmp14 2023-01-11T21:38:06.3402780Z tmp16 = tmp15 > tmp1 2023-01-11T21:38:06.3402851Z tmp17 = 3 2023-01-11T21:38:06.3402930Z tmp18 = tmp15 * tmp17 2023-01-11T21:38:06.3403002Z tmp19 = 4 2023-01-11T21:38:06.3403077Z tmp20 = tmp15 * tmp19 2023-01-11T21:38:06.3403176Z tmp21 = tl.libdevice.expm1(tmp20) 2023-01-11T21:38:06.3403249Z tmp22 = 6 2023-01-11T21:38:06.3403327Z tmp23 = tmp21 * tmp22 2023-01-11T21:38:06.3403428Z tmp24 = tl.where(tmp16, tmp18, tmp23) 2023-01-11T21:38:06.3403566Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.3403699Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp24, xmask) 2023-01-11T21:38:06.3403777Z ''') 2023-01-11T21:38:06.3403782Z 2023-01-11T21:38:06.3403786Z 2023-01-11T21:38:06.3403879Z async_compile.wait(globals()) 2023-01-11T21:38:06.3403955Z del async_compile 2023-01-11T21:38:06.3403960Z 2023-01-11T21:38:06.3404041Z def call(args): 2023-01-11T21:38:06.3404113Z arg0_1, = args 2023-01-11T21:38:06.3404184Z args.clear() 2023-01-11T21:38:06.3404274Z with torch.cuda.device(0): 2023-01-11T21:38:06.3404470Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3404673Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3404764Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3404915Z triton_fused_add_where_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.3405014Z del arg0_1 2023-01-11T21:38:06.3405098Z return (buf0, buf1, ) 2023-01-11T21:38:06.3405103Z 2023-01-11T21:38:06.3405107Z 2023-01-11T21:38:06.3405185Z if __name__ == "__main__": 2023-01-11T21:38:06.3405301Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3405419Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3405622Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3405740Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3405745Z 2023-01-11T21:38:06.3405749Z 2023-01-11T21:38:06.3405845Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3405921Z import torch 2023-01-11T21:38:06.3405994Z import random 2023-01-11T21:38:06.3406112Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3406236Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3406241Z 2023-01-11T21:38:06.3406315Z aten = torch.ops.aten 2023-01-11T21:38:06.3406452Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3406547Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3406552Z 2023-01-11T21:38:06.3406623Z import triton 2023-01-11T21:38:06.3406714Z import triton.language as tl 2023-01-11T21:38:06.3406838Z from torch._inductor.triton_ops.autotune import grid 
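# Second compilation of the same fusion, this time for float16 inputs: the loads below are promoted with .to(tl.float32) so the math runs in float32. The constants in both kernels are the SELU parameters, scale = 1.0507009873554805 and scale * alpha = 1.7580993408473766, i.e. x > 0 ? scale * x : scale * alpha * expm1(x).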
2023-01-11T21:38:06.3406981Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3406987Z 2023-01-11T21:38:06.3406991Z 2023-01-11T21:38:06.3407184Z triton_fused_add_convert_element_type_3_0 = async_compile.triton(''' 2023-01-11T21:38:06.3407252Z import triton 2023-01-11T21:38:06.3407343Z import triton.language as tl 2023-01-11T21:38:06.3407485Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3407586Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3407717Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3407841Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3407846Z 2023-01-11T21:38:06.3408267Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3408342Z @triton.jit 2023-01-11T21:38:06.3408478Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3408553Z xnumel = 256 2023-01-11T21:38:06.3408650Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3408781Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3408868Z xmask = xindex < xnumel 2023-01-11T21:38:06.3408938Z x0 = xindex 2023-01-11T21:38:06.3409151Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3409262Z tmp15 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3409350Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3409424Z tmp2 = 0 2023-01-11T21:38:06.3409502Z tmp3 = tmp1 > tmp2 2023-01-11T21:38:06.3409580Z tmp4 = 1.0507009873554805 2023-01-11T21:38:06.3409657Z tmp5 = tmp1 * tmp4 2023-01-11T21:38:06.3409730Z tmp6 = 1.0 2023-01-11T21:38:06.3409800Z tmp7 = tmp1 * tmp6 2023-01-11T21:38:06.3409900Z tmp8 = tl.libdevice.expm1(tmp7) 2023-01-11T21:38:06.3409977Z tmp9 = 1.7580993408473766 2023-01-11T21:38:06.3410056Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.3410151Z tmp11 = tl.where(tmp3, tmp5, tmp10) 2023-01-11T21:38:06.3410239Z tmp12 = tmp11.to(tl.float32) 2023-01-11T21:38:06.3410309Z tmp13 = 2 2023-01-11T21:38:06.3410386Z tmp14 = tmp12 + tmp13 2023-01-11T21:38:06.3410454Z tmp16 = 1 2023-01-11T21:38:06.3410533Z tmp17 = tmp15 + tmp16 2023-01-11T21:38:06.3410623Z tmp18 = tmp17.to(tl.float32) 2023-01-11T21:38:06.3410700Z tmp19 = tmp18 > tmp2 2023-01-11T21:38:06.3410770Z tmp20 = 3 2023-01-11T21:38:06.3410842Z tmp21 = tmp18 * tmp20 2023-01-11T21:38:06.3410941Z tmp22 = 4 2023-01-11T21:38:06.3411022Z tmp23 = tmp18 * tmp22 2023-01-11T21:38:06.3411121Z tmp24 = tl.libdevice.expm1(tmp23) 2023-01-11T21:38:06.3411193Z tmp25 = 6 2023-01-11T21:38:06.3411270Z tmp26 = tmp24 * tmp25 2023-01-11T21:38:06.3411370Z tmp27 = tl.where(tmp19, tmp21, tmp26) 2023-01-11T21:38:06.3411451Z tmp28 = tmp27.to(tl.float32) 2023-01-11T21:38:06.3411586Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.3411720Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp28, xmask) 2023-01-11T21:38:06.3411804Z ''') 2023-01-11T21:38:06.3411809Z 2023-01-11T21:38:06.3411816Z 2023-01-11T21:38:06.3411908Z async_compile.wait(globals()) 2023-01-11T21:38:06.3411984Z del async_compile 2023-01-11T21:38:06.3411990Z 2023-01-11T21:38:06.3412064Z def call(args): 2023-01-11T21:38:06.3412138Z arg0_1, = args 2023-01-11T21:38:06.3412206Z args.clear() 
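    # Allocate the two fp16 output buffers and launch the fused kernel once on the current CUDA stream; both results come back from a single launch.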
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_convert_element_type_3_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.442s)
test_embedding_bag_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:35,081] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 514
[2023-01-11 21:34:35,096] torch._inductor.ir: [WARNING] Using FallbackKernel: aten._embedding_bag
[2023-01-11 21:34:35,099] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 514
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:35,121] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 515
[2023-01-11 21:34:35,136] torch._inductor.ir: [WARNING] Using FallbackKernel: aten._embedding_bag
[2023-01-11 21:34:35,138] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 515

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = aten._embedding_bag(arg0_1, arg1_1, arg2_1)
        del arg0_1
        del arg1_1
        del arg2_1
        buf1 = buf0[0]
        assert_size_stride(buf1, (3, 4), (4, 1))
        buf2 = buf0[1]
        assert_size_stride(buf2, (8, ), (1, ))
        buf3 = buf0[2]
        assert_size_stride(buf3, (3, ), (1, ))
        buf4 = buf0[3]
        assert_size_stride(buf4, (0, ), (1, ))
        del buf0
        return (buf1, buf2, buf3, buf4, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((10, 4), (4, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = aten._embedding_bag(arg0_1, arg1_1, arg2_1)
        del arg0_1
        del arg1_1
        del arg2_1
        buf1 = buf0[0]
        assert_size_stride(buf1, (3, 4), (4, 1))
        buf2 = buf0[1]
        assert_size_stride(buf2, (8, ), (1, ))
        buf3 = buf0[2]
        assert_size_stride(buf3, (3, ), (1, ))
        buf4 = buf0[3]
        assert_size_stride(buf4, (0, ), (1, ))
        del buf0
        return (buf1, buf2, buf3, buf4, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((10, 4), (4, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (0.081s)
test_embedding_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:35,243] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 516
[2023-01-11 21:34:35,432] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 516
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:35,566] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 517
[2023-01-11 21:34:35,650] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 517

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_le_relu_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*i1', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 4)
    x0 = xindex % 4
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (x1), xmask)
    tmp1 = tl.load(in_ptr1 + (x0 + (4*tmp0)), xmask)
    tmp2 = tl.where(0 != 0, 0, tl.where(0 > tmp1, 0, tmp1))
    tmp3 = 0
    tmp4 = tmp2 <= tmp3
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
''')


triton_fused_convert_element_type_1_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 16
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tmp0.to(tl.float32)
    tmp2 = tmp1.to(tl.int64)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    primals_1, primals_2 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 8, 4), (32, 4, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((2, 8, 4), (32, 4, 1), device='cuda', dtype=torch.bool)
        stream0 = get_cuda_stream(0)
        triton_fused_le_relu_0.run(primals_2, primals_1, buf0, buf1, 64, grid=grid(64), stream=stream0)
        del primals_1
        buf2 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.int64)
        triton_fused_convert_element_type_1_1.run(primals_2, buf2, 16, grid=grid(16), stream=stream0)
        del primals_2
        return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    primals_1 = rand_strided((10, 4), (4, 1), device='cuda:0', dtype=torch.float32)
    primals_2 = rand_strided((2, 8), (8, 1), device='cuda:0', dtype=torch.int64)
    print_performance(lambda: call([primals_1, primals_2]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_le_relu_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: '*i1', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 4)
    x0 = xindex % 4
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (x1), xmask)
    tmp1 = tl.load(in_ptr1 + (x0 + (4*tmp0)), xmask).to(tl.float32)
    tmp2 = tl.where(0 != 0, 0, tl.where(0 > tmp1, 0, tmp1))
    tmp3 = tmp2.to(tl.float32)
    tmp4 = 0
    tmp5 = tmp3 <= tmp4
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


triton_fused_convert_element_type_5_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 16
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tmp0.to(tl.float32)
    tmp2 = tmp1.to(tl.int64)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    primals_1, primals_2 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 8, 4), (32, 4, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((2, 8, 4), (32, 4, 1), device='cuda', dtype=torch.bool)
        stream0 = get_cuda_stream(0)
        triton_fused_le_relu_0.run(primals_2, primals_1, buf0, buf1, 64, grid=grid(64), stream=stream0)
        del primals_1
        buf2 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.int64)
        triton_fused_convert_element_type_5_1.run(primals_2, buf2, 16, grid=grid(16), stream=stream0)
        del primals_2
        return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    primals_1 = rand_strided((10, 4), (4, 1), device='cuda:0', dtype=torch.float16)
    primals_2 = rand_strided((2, 8), (8, 1), device='cuda:0', dtype=torch.int64)
    print_performance(lambda: call([primals_1, primals_2]))

ok (0.511s)
test_exp_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:35,668] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 518
[2023-01-11 21:34:35,744] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 518
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:35,762] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 519
[2023-01-11 21:34:35,835] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 519

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_exp_exp_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp2 = tl.load(in_ptr0 + (x0), xmask)
    tmp3 = tl.load(in_ptr1 + (x0), xmask)
    tmp1 = tl.exp(tmp0)
    tmp4 = tmp2 + tmp3
    tmp5 = tl.exp(tmp4)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_exp_exp_1_0.run(arg0_1, arg1_1, buf0, buf1, 64, grid=grid(64), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_exp_exp_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
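    # in_ptr0 is read a second time below (tmp2); both fp16 loads are widened
    # to fp32 with .to(tl.float32) so the two fused exponentials run in float32.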
    tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tmp1 = tl.exp(tmp0)
    tmp4 = tmp2 + tmp3
    tmp5 = tl.exp(tmp4)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_exp_exp_1_0.run(arg0_1, arg1_1, buf0, buf1, 64, grid=grid(64), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.186s)
test_expand_as_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:35,870] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 520
[2023-01-11 21:34:35,949] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 520
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:35,983] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 521
[2023-01-11 21:34:36,061] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 521

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_2_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 76800
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 100
    x2 = (xindex // 12800)
    x3 = xindex
    tmp0 = tl.load(in_ptr0 + (x0 + (100*x2)), xmask, eviction_policy='evict_last')
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tmp3 = tmp2 + tmp1
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((6, 128, 100), (12800, 100, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_2_0.run(arg0_1, buf0, 76800, grid=grid(76800), stream=stream0)
        return (as_strided(arg0_1, (6, 128, 100), (100, 0, 1)), buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((6, 1, 100), (100, 100, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((6, 128, 100), (12800, 100, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_2_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 76800
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 100
    x2 = (xindex // 12800)
    x3 = xindex
    tmp0 = tl.load(in_ptr0 + (x0 + (100*x2)), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tmp3 = tmp2 + tmp1
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask)
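    # The load index x0 + 100*x2 skips the middle (128-wide) dimension, so the
    # (6, 1, 100) input is broadcast across it; the two adds fuse input + 1 + 1.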
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((6, 128, 100), (12800, 100, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_2_0.run(arg0_1, buf0, 76800, grid=grid(76800), stream=stream0)
        return (as_strided(arg0_1, (6, 128, 100), (100, 0, 1)), buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((6, 1, 100), (100, 100, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((6, 128, 100), (12800, 100, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.226s)
test_expand_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:36,086] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 522
[2023-01-11 21:34:36,189] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 522
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:36,213] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 523
[2023-01-11 21:34:36,299] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 523

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 144
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 2
    x2 = (xindex // 6) % 2
    x4 = xindex
    tmp0 = tl.load(in_ptr0 + (x0 + (2*x2)), xmask, eviction_policy='evict_last')
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tmp3 = 2
    tmp4 = tmp2 + tmp3
    tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
''')


triton_fused_add_2_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 24
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 2
    x2 = (xindex // 6) % 2
    x4 = xindex
    tmp0 = tl.load(in_ptr0 + (x0 + (2*x2)), xmask, eviction_policy='evict_last')
    tmp1 = 2
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((3, 4, 2, 3, 2), (48, 12, 6, 2, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_1_0.run(arg0_1, buf0, 144, grid=grid(144), stream=stream0)
        buf1 = empty_strided((2, 1, 2, 3, 2), (12, 12, 6, 2, 1), device='cuda', dtype=torch.float32)
        triton_fused_add_2_1.run(arg0_1, buf1, 24, grid=grid(24), stream=stream0)
        return (buf0, buf1, as_strided(arg0_1, (2, 2, 5, 2), (0, 2, 0, 1)), )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 1, 2), (2, 2, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
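# grid() (imported above) turns an element count into a Triton launch grid; the
# raw-stream helper imported next lets call() launch on PyTorch's current CUDA stream.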
2023-01-11T21:38:06.3494340Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3494350Z 2023-01-11T21:38:06.3494354Z 2023-01-11T21:38:06.3494797Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3494878Z import triton 2023-01-11T21:38:06.3494972Z import triton.language as tl 2023-01-11T21:38:06.3495086Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3495189Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3495321Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3495443Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3495448Z 2023-01-11T21:38:06.3495852Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3495923Z @triton.jit 2023-01-11T21:38:06.3496052Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3496128Z xnumel = 144 2023-01-11T21:38:06.3496225Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3496355Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3496435Z xmask = xindex < xnumel 2023-01-11T21:38:06.3496510Z x0 = xindex % 2 2023-01-11T21:38:06.3496584Z x2 = (xindex // 6) % 2 2023-01-11T21:38:06.3496653Z x4 = xindex 2023-01-11T21:38:06.3496877Z tmp0 = tl.load(in_ptr0 + (x0 + (2*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3496946Z tmp1 = 1 2023-01-11T21:38:06.3497025Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3497097Z tmp3 = 2 2023-01-11T21:38:06.3497238Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.3497377Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3497504Z ''') 2023-01-11T21:38:06.3497510Z 2023-01-11T21:38:06.3497514Z 2023-01-11T21:38:06.3497672Z triton_fused_add_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.3497747Z import triton 2023-01-11T21:38:06.3497840Z import triton.language as tl 2023-01-11T21:38:06.3497961Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3498067Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3498196Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3498324Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3498329Z 2023-01-11T21:38:06.3498733Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.3498808Z @triton.jit 2023-01-11T21:38:06.3498941Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3499020Z xnumel = 24 2023-01-11T21:38:06.3499118Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3499251Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3499329Z xmask = xindex < xnumel 2023-01-11T21:38:06.3499410Z x0 = xindex % 2 2023-01-11T21:38:06.3499495Z x2 = (xindex // 6) % 2 2023-01-11T21:38:06.3499569Z x4 = xindex 2023-01-11T21:38:06.3499797Z tmp0 = tl.load(in_ptr0 + (x0 + (2*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3499870Z tmp1 = 2 2023-01-11T21:38:06.3499951Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3500081Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp2, 
xmask) 2023-01-11T21:38:06.3500167Z ''') 2023-01-11T21:38:06.3500173Z 2023-01-11T21:38:06.3500177Z 2023-01-11T21:38:06.3500273Z async_compile.wait(globals()) 2023-01-11T21:38:06.3500352Z del async_compile 2023-01-11T21:38:06.3500357Z 2023-01-11T21:38:06.3500437Z def call(args): 2023-01-11T21:38:06.3500513Z arg0_1, = args 2023-01-11T21:38:06.3500587Z args.clear() 2023-01-11T21:38:06.3500681Z with torch.cuda.device(0): 2023-01-11T21:38:06.3500898Z buf0 = empty_strided((3, 4, 2, 3, 2), (48, 12, 6, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3501021Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3501162Z triton_fused_add_1_0.run(arg0_1, buf0, 144, grid=grid(144), stream=stream0) 2023-01-11T21:38:06.3501383Z buf1 = empty_strided((2, 1, 2, 3, 2), (12, 12, 6, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3501525Z triton_fused_add_2_1.run(arg0_1, buf1, 24, grid=grid(24), stream=stream0) 2023-01-11T21:38:06.3501651Z return (buf0, buf1, as_strided(arg0_1, (2, 2, 5, 2), (0, 2, 0, 1)), ) 2023-01-11T21:38:06.3501657Z 2023-01-11T21:38:06.3501661Z 2023-01-11T21:38:06.3501743Z if __name__ == "__main__": 2023-01-11T21:38:06.3501861Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3501985Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3502192Z arg0_1 = rand_strided((2, 1, 2), (2, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3502306Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3502311Z 2023-01-11T21:38:06.3502385Z ok (0.237s) 2023-01-11T21:38:06.3502852Z test_expanded_reduction_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3502988Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3503248Z [2023-01-11 21:34:36,316] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 524 2023-01-11T21:38:06.3503545Z [2023-01-11 21:34:36,597] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 524 2023-01-11T21:38:06.3503967Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3504102Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3504357Z [2023-01-11 21:34:36,613] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 525 2023-01-11T21:38:06.3504363Z 2023-01-11T21:38:06.3504456Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3504531Z import torch 2023-01-11T21:38:06.3504605Z import random 2023-01-11T21:38:06.3504727Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3504861Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3504866Z 2023-01-11T21:38:06.3504950Z aten = torch.ops.aten 2023-01-11T21:38:06.3505089Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3505181Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3505191Z 2023-01-11T21:38:06.3505263Z import triton 2023-01-11T21:38:06.3505357Z import triton.language as tl 2023-01-11T21:38:06.3505509Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3505673Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3505680Z 2023-01-11T21:38:06.3505686Z 2023-01-11T21:38:06.3505851Z triton_fused_mul_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3505927Z import triton 2023-01-11T21:38:06.3506026Z import triton.language as tl 2023-01-11T21:38:06.3506136Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3506239Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3506377Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.3506505Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3506510Z 2023-01-11T21:38:06.3506605Z @reduction(size_hints=[1024, 128], 2023-01-11T21:38:06.3506723Z reduction_hint=ReductionHint.OUTER, 2023-01-11T21:38:06.3506837Z filename=__file__, 2023-01-11T21:38:06.3507218Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3507289Z @triton.jit 2023-01-11T21:38:06.3507468Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.3507545Z xnumel = 1024 2023-01-11T21:38:06.3507621Z rnumel = 99 2023-01-11T21:38:06.3507720Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3507865Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.3507949Z xmask = xindex < xnumel 2023-01-11T21:38:06.3508064Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.3508154Z x1 = (xindex // 256) 2023-01-11T21:38:06.3508232Z x0 = xindex % 256 2023-01-11T21:38:06.3508355Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.3508428Z x3 = xindex 2023-01-11T21:38:06.3508535Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.3508627Z rindex = roffset + rbase 2023-01-11T21:38:06.3508707Z rmask = rindex < rnumel 2023-01-11T21:38:06.3508782Z r2 = rindex 2023-01-11T21:38:06.3508866Z tmp0 = r2 + (99*x1) 2023-01-11T21:38:06.3508943Z tmp1 = 394 2023-01-11T21:38:06.3509026Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.3509209Z tmp3 = tl.load(in_ptr0 + (x0 + (256*((r2 + (99*x1)) % 394)) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, other=0) 2023-01-11T21:38:06.3509425Z tmp4 = tl.load(in_ptr1 + 
(x0 + (256*(((r2 + (99*x1)) // 197) % 2)) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, other=0) 2023-01-11T21:38:06.3509502Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3509600Z tmp6 = tl.where(tmp2, tmp5, 0) 2023-01-11T21:38:06.3509725Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.3509842Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.3509945Z tl.store(out_ptr0 + x3, tmp7, xmask) 2023-01-11T21:38:06.3510032Z ''') 2023-01-11T21:38:06.3510038Z 2023-01-11T21:38:06.3510042Z 2023-01-11T21:38:06.3510207Z triton_fused_mul_sum_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.3510282Z import triton 2023-01-11T21:38:06.3510370Z import triton.language as tl 2023-01-11T21:38:06.3510484Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3510589Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3510723Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.3510857Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3510862Z 2023-01-11T21:38:06.3510953Z @reduction(size_hints=[256, 4], 2023-01-11T21:38:06.3511076Z reduction_hint=ReductionHint.OUTER_TINY, 2023-01-11T21:38:06.3511160Z filename=__file__, 2023-01-11T21:38:06.3511526Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3511601Z @triton.jit 2023-01-11T21:38:06.3511772Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.3511849Z xnumel = 256 2023-01-11T21:38:06.3511926Z rnumel = 4 2023-01-11T21:38:06.3512023Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3512162Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.3512244Z xmask = xindex < xnumel 2023-01-11T21:38:06.3512367Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.3512440Z x0 = xindex 2023-01-11T21:38:06.3512560Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.3512670Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.3512792Z rindex = roffset + rbase 2023-01-11T21:38:06.3512881Z rmask = rindex < rnumel 2023-01-11T21:38:06.3512948Z r1 = rindex 2023-01-11T21:38:06.3513067Z tmp0 = tl.load(in_ptr0 + (x0 + (256*r1)), rmask & xmask) 2023-01-11T21:38:06.3513189Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.3513305Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.3513410Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.3513497Z ''') 2023-01-11T21:38:06.3513503Z 2023-01-11T21:38:06.3513507Z 2023-01-11T21:38:06.3513602Z async_compile.wait(globals()) 2023-01-11T21:38:06.3513677Z del async_compile 2023-01-11T21:38:06.3513690Z 2023-01-11T21:38:06.3513759Z def call(args): 2023-01-11T21:38:06.3513842Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3513920Z args.clear() 2023-01-11T21:38:06.3514012Z with torch.cuda.device(0): 2023-01-11T21:38:06.3514222Z buf0 = empty_strided((256, 4), (1, 256), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3514315Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3514469Z triton_fused_mul_sum_1_0.run(arg0_1, arg1_1, buf0, 1024, 99, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.3514538Z del arg0_1 2023-01-11T21:38:06.3514612Z del arg1_1 2023-01-11T21:38:06.3514814Z buf1 = empty_strided((256, ), (1, ), device='cuda', dtype=torch.float32) 
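# Editorial note (not part of the captured log): this wrapper implements a
# split ("two-stage") reduction. The first kernel wrote 4 fp32 partial sums
# per channel into buf0 of shape (256, 4); the launch below reduces those 4
# partials into the final (256,) result in buf1.
# A hedged eager-mode sketch of what graph 525 appears to compute, judging
# from the argument shapes in the __main__ harness at the bottom of this dump:
#   x = torch.randn(2, 197, 256, device="cuda")   # matches arg0_1
#   y = torch.randn(2, 1, 256, device="cuda")     # matches arg1_1
#   out = (x * y).sum(dim=(0, 1))                 # -> shape (256,), cf. buf1
# The 394 in the first kernel's bounds check is 2 * 197, the total number of
# elements reduced per output channel.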
2023-01-11T21:38:06.3514957Z triton_fused_mul_sum_1_1.run(buf0, buf1, 256, 4, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.3515036Z return (buf1, ) 2023-01-11T21:38:06.3515041Z 2023-01-11T21:38:06.3515046Z 2023-01-11T21:38:06.3515171Z if __name__ == "__main__": 2023-01-11T21:38:06.3515308Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3515451Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3515676Z arg0_1 = rand_strided((2, 197, 256), (50432, 256, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3515893Z arg1_1 = rand_strided((2, 1, 256), (256, 256, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3516017Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3516023Z 2023-01-11T21:38:06.3516290Z [2023-01-11 21:34:36,741] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 525 2023-01-11T21:38:06.3516296Z 2023-01-11T21:38:06.3516395Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3516472Z import torch 2023-01-11T21:38:06.3516547Z import random 2023-01-11T21:38:06.3516668Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3516787Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3516796Z 2023-01-11T21:38:06.3516880Z aten = torch.ops.aten 2023-01-11T21:38:06.3517017Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3517115Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3517120Z 2023-01-11T21:38:06.3517198Z import triton 2023-01-11T21:38:06.3517296Z import triton.language as tl 2023-01-11T21:38:06.3517423Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3517558Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3517570Z 2023-01-11T21:38:06.3517574Z 2023-01-11T21:38:06.3517731Z triton_fused_mul_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3517810Z import triton 2023-01-11T21:38:06.3517905Z import triton.language as tl 2023-01-11T21:38:06.3518024Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3518127Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3518261Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.3518391Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3518396Z 2023-01-11T21:38:06.3518484Z @reduction(size_hints=[1024, 128], 2023-01-11T21:38:06.3518600Z reduction_hint=ReductionHint.OUTER, 2023-01-11T21:38:06.3518687Z filename=__file__, 2023-01-11T21:38:06.3519099Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3519177Z @triton.jit 2023-01-11T21:38:06.3519356Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.3519434Z xnumel = 1024 2023-01-11T21:38:06.3519512Z rnumel = 99 2023-01-11T21:38:06.3519606Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3519743Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.3519829Z xmask = xindex < xnumel 2023-01-11T21:38:06.3519950Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.3520033Z x1 = (xindex // 256) 2023-01-11T21:38:06.3520111Z x0 = xindex % 256 2023-01-11T21:38:06.3520230Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 
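# Editorial note: although this variant of the kernel loads fp16 inputs
# (the '*fp16' pointers in the signature above, upcast via .to(tl.float32)
# below), the running accumulator _tmp7 is kept in fp32 to avoid precision
# loss across the 394-element reduction; the fp32 partials land in out_ptr0
# ('*fp32') and are only narrowed back to fp16 by the second-stage kernel.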
2023-01-11T21:38:06.3520298Z x3 = xindex 2023-01-11T21:38:06.3520404Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.3520496Z rindex = roffset + rbase 2023-01-11T21:38:06.3520581Z rmask = rindex < rnumel 2023-01-11T21:38:06.3520654Z r2 = rindex 2023-01-11T21:38:06.3520737Z tmp0 = r2 + (99*x1) 2023-01-11T21:38:06.3520814Z tmp1 = 394 2023-01-11T21:38:06.3520891Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.3521090Z tmp3 = tl.load(in_ptr0 + (x0 + (256*((r2 + (99*x1)) % 394)) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.3521289Z tmp4 = tl.load(in_ptr1 + (x0 + (256*(((r2 + (99*x1)) // 197) % 2)) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.3521402Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3521497Z tmp6 = tl.where(tmp2, tmp5, 0) 2023-01-11T21:38:06.3521620Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.3521739Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.3521832Z tl.store(out_ptr0 + x3, tmp7, xmask) 2023-01-11T21:38:06.3521920Z ''') 2023-01-11T21:38:06.3521925Z 2023-01-11T21:38:06.3521930Z 2023-01-11T21:38:06.3522096Z triton_fused_mul_sum_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.3522172Z import triton 2023-01-11T21:38:06.3522266Z import triton.language as tl 2023-01-11T21:38:06.3522382Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3522486Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3522621Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.3522744Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3522750Z 2023-01-11T21:38:06.3522841Z @reduction(size_hints=[256, 4], 2023-01-11T21:38:06.3522965Z reduction_hint=ReductionHint.OUTER_TINY, 2023-01-11T21:38:06.3523051Z filename=__file__, 2023-01-11T21:38:06.3523415Z meta={'signature': {0: '*fp32', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3523492Z @triton.jit 2023-01-11T21:38:06.3523663Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.3523739Z xnumel = 256 2023-01-11T21:38:06.3523806Z rnumel = 4 2023-01-11T21:38:06.3523905Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3524044Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.3524129Z xmask = xindex < xnumel 2023-01-11T21:38:06.3524254Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.3524327Z x0 = xindex 2023-01-11T21:38:06.3524444Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.3524545Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.3524634Z rindex = roffset + rbase 2023-01-11T21:38:06.3524752Z rmask = rindex < rnumel 2023-01-11T21:38:06.3524827Z r1 = rindex 2023-01-11T21:38:06.3524945Z tmp0 = tl.load(in_ptr0 + (x0 + (256*r1)), rmask & xmask) 2023-01-11T21:38:06.3525069Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.3525183Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.3525276Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.3525363Z ''') 2023-01-11T21:38:06.3525369Z 2023-01-11T21:38:06.3525373Z 2023-01-11T21:38:06.3525469Z async_compile.wait(globals()) 2023-01-11T21:38:06.3525548Z del async_compile 2023-01-11T21:38:06.3525557Z 
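# Editorial note: call() below is Inductor's generated wrapper for the fp16
# variant of graph 525. It allocates the fp32 partial-sum buffer (256, 4),
# launches the two reduction kernels on the current CUDA stream, deletes the
# input references as soon as they are dead, and returns the fp16 result.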
2023-01-11T21:38:06.3525634Z def call(args): 2023-01-11T21:38:06.3525715Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3525797Z args.clear() 2023-01-11T21:38:06.3525884Z with torch.cuda.device(0): 2023-01-11T21:38:06.3526093Z buf0 = empty_strided((256, 4), (1, 256), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3526189Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3526343Z triton_fused_mul_sum_1_0.run(arg0_1, arg1_1, buf0, 1024, 99, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.3526417Z del arg0_1 2023-01-11T21:38:06.3526489Z del arg1_1 2023-01-11T21:38:06.3526691Z buf1 = empty_strided((256, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3526835Z triton_fused_mul_sum_1_1.run(buf0, buf1, 256, 4, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.3526909Z return (buf1, ) 2023-01-11T21:38:06.3526914Z 2023-01-11T21:38:06.3526918Z 2023-01-11T21:38:06.3527000Z if __name__ == "__main__": 2023-01-11T21:38:06.3527209Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3527337Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3527560Z arg0_1 = rand_strided((2, 197, 256), (50432, 256, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3527778Z arg1_1 = rand_strided((2, 1, 256), (256, 256, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3527903Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3527908Z 2023-01-11T21:38:06.3527982Z ok (0.442s) 2023-01-11T21:38:06.3528443Z test_expm1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3528571Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3528835Z [2023-01-11 21:34:36,758] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 526 2023-01-11T21:38:06.3529103Z [2023-01-11 21:34:36,900] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 526 2023-01-11T21:38:06.3529524Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3529657Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3529916Z [2023-01-11 21:34:36,916] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 527 2023-01-11T21:38:06.3530180Z [2023-01-11 21:34:36,924] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 527 2023-01-11T21:38:06.3530626Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3530761Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3531017Z [2023-01-11 21:34:36,940] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 528 2023-01-11T21:38:06.3531283Z [2023-01-11 21:34:37,090] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 528 2023-01-11T21:38:06.3531696Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3531826Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3532086Z [2023-01-11 21:34:37,106] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 529 2023-01-11T21:38:06.3532092Z 2023-01-11T21:38:06.3532194Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3532270Z import torch 2023-01-11T21:38:06.3532349Z import random 2023-01-11T21:38:06.3532471Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3532598Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3532603Z 2023-01-11T21:38:06.3532686Z aten = torch.ops.aten 2023-01-11T21:38:06.3532819Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3532918Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3532950Z 2023-01-11T21:38:06.3533027Z import triton 2023-01-11T21:38:06.3533123Z import triton.language as tl 2023-01-11T21:38:06.3533251Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3533393Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3533398Z 2023-01-11T21:38:06.3533405Z 2023-01-11T21:38:06.3533574Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3533652Z import triton 2023-01-11T21:38:06.3533740Z import triton.language as tl 2023-01-11T21:38:06.3533856Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3533962Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3534098Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3534224Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3534230Z 2023-01-11T21:38:06.3534774Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3534856Z @triton.jit 2023-01-11T21:38:06.3535002Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3535071Z xnumel = 64 2023-01-11T21:38:06.3535173Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3535308Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3535393Z xmask = xindex < xnumel 2023-01-11T21:38:06.3535465Z x0 = xindex 2023-01-11T21:38:06.3535679Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3535799Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3535895Z tmp1 = tl.libdevice.expm1(tmp0) 
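# Editorial note: the fp16 inputs are upcast to fp32 before calling the
# libdevice expm1 intrinsic, which is more accurate than computing
# exp(x) - 1 directly for small x. tmp0 and tmp2 read the same address; the
# test apparently uses expm1(x) twice (once as-is, once scaled by 2 below),
# and codegen keeps both loads with different eviction hints rather than
# reusing a single value.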
2023-01-11T21:38:06.3535995Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3536067Z tmp4 = 2 2023-01-11T21:38:06.3536150Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3536289Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3536423Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3536509Z ''') 2023-01-11T21:38:06.3536515Z 2023-01-11T21:38:06.3536519Z 2023-01-11T21:38:06.3536659Z async_compile.wait(globals()) 2023-01-11T21:38:06.3536733Z del async_compile 2023-01-11T21:38:06.3536738Z 2023-01-11T21:38:06.3536814Z def call(args): 2023-01-11T21:38:06.3536888Z arg0_1, = args 2023-01-11T21:38:06.3536967Z args.clear() 2023-01-11T21:38:06.3537063Z with torch.cuda.device(0): 2023-01-11T21:38:06.3537337Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3537536Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3537625Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3537777Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3537857Z del arg0_1 2023-01-11T21:38:06.3537939Z return (buf0, buf1, ) 2023-01-11T21:38:06.3537944Z 2023-01-11T21:38:06.3537949Z 2023-01-11T21:38:06.3538028Z if __name__ == "__main__": 2023-01-11T21:38:06.3538146Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3538278Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3538478Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3538587Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3538593Z 2023-01-11T21:38:06.3538597Z 2023-01-11T21:38:06.3538700Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3538775Z import torch 2023-01-11T21:38:06.3538851Z import random 2023-01-11T21:38:06.3538974Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3539104Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3539109Z 2023-01-11T21:38:06.3539232Z aten = torch.ops.aten 2023-01-11T21:38:06.3539370Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3539462Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3539466Z 2023-01-11T21:38:06.3539542Z import triton 2023-01-11T21:38:06.3539639Z import triton.language as tl 2023-01-11T21:38:06.3539769Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3539911Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3539917Z 2023-01-11T21:38:06.3539921Z 2023-01-11T21:38:06.3540088Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3540166Z import triton 2023-01-11T21:38:06.3540253Z import triton.language as tl 2023-01-11T21:38:06.3540369Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3540472Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3540606Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3540731Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3540739Z 2023-01-11T21:38:06.3541162Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3541240Z @triton.jit 2023-01-11T21:38:06.3541385Z def triton_(in_ptr0, out_ptr0, out_ptr1, 
xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3541466Z xnumel = 64 2023-01-11T21:38:06.3541558Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3541690Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3541778Z xmask = xindex < xnumel 2023-01-11T21:38:06.3541851Z x0 = xindex 2023-01-11T21:38:06.3542066Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3542188Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3542288Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3542384Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3542457Z tmp4 = 2 2023-01-11T21:38:06.3542538Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3542673Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3542839Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3542926Z ''') 2023-01-11T21:38:06.3542932Z 2023-01-11T21:38:06.3542936Z 2023-01-11T21:38:06.3543033Z async_compile.wait(globals()) 2023-01-11T21:38:06.3543105Z del async_compile 2023-01-11T21:38:06.3543111Z 2023-01-11T21:38:06.3543189Z def call(args): 2023-01-11T21:38:06.3543265Z arg0_1, = args 2023-01-11T21:38:06.3543343Z args.clear() 2023-01-11T21:38:06.3543437Z with torch.cuda.device(0): 2023-01-11T21:38:06.3543635Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3543831Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3543922Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3544068Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3544145Z del arg0_1 2023-01-11T21:38:06.3544233Z return (buf0, buf1, ) 2023-01-11T21:38:06.3544238Z 2023-01-11T21:38:06.3544242Z 2023-01-11T21:38:06.3544326Z if __name__ == "__main__": 2023-01-11T21:38:06.3544445Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3544573Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3544775Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3544882Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3544893Z 2023-01-11T21:38:06.3544897Z 2023-01-11T21:38:06.3544990Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3545066Z import torch 2023-01-11T21:38:06.3545143Z import random 2023-01-11T21:38:06.3545264Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3545442Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3545447Z 2023-01-11T21:38:06.3545535Z aten = torch.ops.aten 2023-01-11T21:38:06.3545690Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3545781Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3545789Z 2023-01-11T21:38:06.3545864Z import triton 2023-01-11T21:38:06.3545958Z import triton.language as tl 2023-01-11T21:38:06.3546085Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3546226Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3546231Z 2023-01-11T21:38:06.3546236Z 2023-01-11T21:38:06.3546401Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3546477Z import triton 2023-01-11T21:38:06.3546570Z import triton.language as tl 2023-01-11T21:38:06.3546678Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3546781Z from torch._inductor.ir import TileHint 
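# Editorial note: in the kernel below the actual element count is 201, but
# the autotuner's size_hints round it up to the next power of two (256);
# the xmask computed from "xindex < xnumel" keeps the padded tail lanes
# from reading or writing out of bounds.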
2023-01-11T21:38:06.3546918Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3547046Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3547051Z 2023-01-11T21:38:06.3547473Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3547548Z @triton.jit 2023-01-11T21:38:06.3547692Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3547769Z xnumel = 201 2023-01-11T21:38:06.3547862Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3547992Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3548076Z xmask = xindex < xnumel 2023-01-11T21:38:06.3548150Z x0 = xindex 2023-01-11T21:38:06.3548366Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3548488Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3548590Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3548683Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3548756Z tmp4 = 2 2023-01-11T21:38:06.3548838Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3549001Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3549138Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3549223Z ''') 2023-01-11T21:38:06.3549229Z 2023-01-11T21:38:06.3549234Z 2023-01-11T21:38:06.3549327Z async_compile.wait(globals()) 2023-01-11T21:38:06.3549403Z del async_compile 2023-01-11T21:38:06.3549408Z 2023-01-11T21:38:06.3549478Z def call(args): 2023-01-11T21:38:06.3549555Z arg0_1, = args 2023-01-11T21:38:06.3549631Z args.clear() 2023-01-11T21:38:06.3549723Z with torch.cuda.device(0): 2023-01-11T21:38:06.3549925Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3550132Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3550225Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3550371Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3550447Z del arg0_1 2023-01-11T21:38:06.3550530Z return (buf0, buf1, ) 2023-01-11T21:38:06.3550535Z 2023-01-11T21:38:06.3550539Z 2023-01-11T21:38:06.3550626Z if __name__ == "__main__": 2023-01-11T21:38:06.3550745Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3550872Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3551073Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3551188Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3551193Z 2023-01-11T21:38:06.3551452Z [2023-01-11 21:34:37,114] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 529 2023-01-11T21:38:06.3551903Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3552040Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3552302Z [2023-01-11 21:34:37,130] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 530 2023-01-11T21:38:06.3552570Z [2023-01-11 21:34:37,270] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 530 2023-01-11T21:38:06.3552990Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3553127Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3553386Z [2023-01-11 21:34:37,286] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 531 2023-01-11T21:38:06.3553646Z [2023-01-11 21:34:37,294] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 531 2023-01-11T21:38:06.3554059Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3554191Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3554443Z [2023-01-11 21:34:37,310] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 532 2023-01-11T21:38:06.3554449Z 2023-01-11T21:38:06.3554543Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3554619Z import torch 2023-01-11T21:38:06.3554723Z import random 2023-01-11T21:38:06.3554847Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3554973Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3554978Z 2023-01-11T21:38:06.3555063Z aten = torch.ops.aten 2023-01-11T21:38:06.3555202Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3555299Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3555305Z 2023-01-11T21:38:06.3555393Z import triton 2023-01-11T21:38:06.3555495Z import triton.language as tl 2023-01-11T21:38:06.3555642Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3555787Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3555792Z 2023-01-11T21:38:06.3555796Z 2023-01-11T21:38:06.3555963Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3556038Z import triton 2023-01-11T21:38:06.3556135Z import triton.language as tl 2023-01-11T21:38:06.3556247Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3556351Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3556486Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3556611Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3556616Z 2023-01-11T21:38:06.3557034Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3557110Z @triton.jit 2023-01-11T21:38:06.3557257Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3557360Z xnumel = 201 2023-01-11T21:38:06.3557454Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3557587Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3557671Z xmask = xindex < xnumel 2023-01-11T21:38:06.3557747Z x0 = xindex 2023-01-11T21:38:06.3557964Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3558086Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3558187Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3558288Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3558354Z tmp4 = 2 2023-01-11T21:38:06.3558438Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3558575Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3558709Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3558800Z ''') 2023-01-11T21:38:06.3558806Z 2023-01-11T21:38:06.3558810Z 2023-01-11T21:38:06.3558904Z async_compile.wait(globals()) 2023-01-11T21:38:06.3558985Z del async_compile 2023-01-11T21:38:06.3558990Z 2023-01-11T21:38:06.3559060Z def call(args): 2023-01-11T21:38:06.3559134Z arg0_1, = args 2023-01-11T21:38:06.3559208Z args.clear() 2023-01-11T21:38:06.3559305Z with torch.cuda.device(0): 2023-01-11T21:38:06.3559506Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3559706Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3559800Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3559945Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3560021Z del arg0_1 2023-01-11T21:38:06.3560107Z return (buf0, buf1, ) 2023-01-11T21:38:06.3560112Z 2023-01-11T21:38:06.3560117Z 2023-01-11T21:38:06.3560198Z if __name__ == "__main__": 2023-01-11T21:38:06.3560321Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3560449Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3560650Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3560765Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3560799Z 2023-01-11T21:38:06.3560804Z 2023-01-11T21:38:06.3560903Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3560973Z import torch 2023-01-11T21:38:06.3561053Z import random 2023-01-11T21:38:06.3561173Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3561299Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3561304Z 2023-01-11T21:38:06.3561388Z aten = torch.ops.aten 2023-01-11T21:38:06.3561525Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3561621Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3561626Z 2023-01-11T21:38:06.3561698Z import triton 2023-01-11T21:38:06.3561792Z import triton.language as tl 2023-01-11T21:38:06.3561919Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3562061Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3562067Z 2023-01-11T21:38:06.3562071Z 2023-01-11T21:38:06.3562238Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3562314Z import triton 2023-01-11T21:38:06.3562408Z 
import triton.language as tl 2023-01-11T21:38:06.3562523Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3562621Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3562757Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3562884Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3562889Z 2023-01-11T21:38:06.3563310Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3563416Z @triton.jit 2023-01-11T21:38:06.3563559Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3563635Z xnumel = 64 2023-01-11T21:38:06.3563733Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3563861Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3563949Z xmask = xindex < xnumel 2023-01-11T21:38:06.3564023Z x0 = xindex 2023-01-11T21:38:06.3564215Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3564315Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3564417Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3564515Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3564582Z tmp4 = 2 2023-01-11T21:38:06.3564660Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3564798Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3564936Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3565023Z ''') 2023-01-11T21:38:06.3565029Z 2023-01-11T21:38:06.3565033Z 2023-01-11T21:38:06.3565129Z async_compile.wait(globals()) 2023-01-11T21:38:06.3565208Z del async_compile 2023-01-11T21:38:06.3565213Z 2023-01-11T21:38:06.3565293Z def call(args): 2023-01-11T21:38:06.3565363Z arg0_1, = args 2023-01-11T21:38:06.3565437Z args.clear() 2023-01-11T21:38:06.3565532Z with torch.cuda.device(0): 2023-01-11T21:38:06.3565729Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3565925Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3566021Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3566170Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3566239Z del arg0_1 2023-01-11T21:38:06.3566326Z return (buf0, buf1, ) 2023-01-11T21:38:06.3566331Z 2023-01-11T21:38:06.3566335Z 2023-01-11T21:38:06.3566416Z if __name__ == "__main__": 2023-01-11T21:38:06.3566537Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3566665Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3566896Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3567010Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3567015Z 2023-01-11T21:38:06.3567020Z 2023-01-11T21:38:06.3567119Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3567189Z import torch 2023-01-11T21:38:06.3567265Z import random 2023-01-11T21:38:06.3567384Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3567512Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3567517Z 2023-01-11T21:38:06.3567602Z aten = torch.ops.aten 2023-01-11T21:38:06.3567738Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3567838Z 
async_compile = AsyncCompile() 2023-01-11T21:38:06.3567844Z 2023-01-11T21:38:06.3567918Z import triton 2023-01-11T21:38:06.3568006Z import triton.language as tl 2023-01-11T21:38:06.3568131Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3568281Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3568286Z 2023-01-11T21:38:06.3568290Z 2023-01-11T21:38:06.3568456Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3568533Z import triton 2023-01-11T21:38:06.3568627Z import triton.language as tl 2023-01-11T21:38:06.3568742Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3568840Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3568975Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3569099Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3569104Z 2023-01-11T21:38:06.3569528Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3569641Z @triton.jit 2023-01-11T21:38:06.3569788Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3569865Z xnumel = 64 2023-01-11T21:38:06.3569964Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3570089Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3570173Z xmask = xindex < xnumel 2023-01-11T21:38:06.3570247Z x0 = xindex 2023-01-11T21:38:06.3570462Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3570584Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3570687Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3570786Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3570856Z tmp4 = 2 2023-01-11T21:38:06.3570938Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3571075Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3571209Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3571295Z ''') 2023-01-11T21:38:06.3571300Z 2023-01-11T21:38:06.3571308Z 2023-01-11T21:38:06.3571403Z async_compile.wait(globals()) 2023-01-11T21:38:06.3571483Z del async_compile 2023-01-11T21:38:06.3571488Z 2023-01-11T21:38:06.3571566Z def call(args): 2023-01-11T21:38:06.3571635Z arg0_1, = args 2023-01-11T21:38:06.3571710Z args.clear() 2023-01-11T21:38:06.3571804Z with torch.cuda.device(0): 2023-01-11T21:38:06.3572003Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3572200Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3572294Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3572446Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3572515Z del arg0_1 2023-01-11T21:38:06.3572603Z return (buf0, buf1, ) 2023-01-11T21:38:06.3572608Z 2023-01-11T21:38:06.3572613Z 2023-01-11T21:38:06.3572694Z if __name__ == "__main__": 2023-01-11T21:38:06.3572840Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3572968Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3573167Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3573276Z print_performance(lambda: 
call([arg0_1])) 2023-01-11T21:38:06.3573282Z 2023-01-11T21:38:06.3573546Z [2023-01-11 21:34:37,460] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 532 2023-01-11T21:38:06.3573963Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3574099Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3574350Z [2023-01-11 21:34:37,476] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 533 2023-01-11T21:38:06.3574719Z [2023-01-11 21:34:37,484] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 533 2023-01-11T21:38:06.3575135Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3575268Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3575570Z [2023-01-11 21:34:37,499] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 534 2023-01-11T21:38:06.3575831Z [2023-01-11 21:34:37,638] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 534 2023-01-11T21:38:06.3576249Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3576379Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3576632Z [2023-01-11 21:34:37,654] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 535 2023-01-11T21:38:06.3576637Z 2023-01-11T21:38:06.3576738Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3576813Z import torch 2023-01-11T21:38:06.3576881Z import random 2023-01-11T21:38:06.3577003Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3577185Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3577191Z 2023-01-11T21:38:06.3577292Z aten = torch.ops.aten 2023-01-11T21:38:06.3577440Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3577536Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3577541Z 2023-01-11T21:38:06.3577613Z import triton 2023-01-11T21:38:06.3577698Z import triton.language as tl 2023-01-11T21:38:06.3577820Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3577960Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3577965Z 2023-01-11T21:38:06.3577969Z 2023-01-11T21:38:06.3578132Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3578207Z import triton 2023-01-11T21:38:06.3578302Z import triton.language as tl 2023-01-11T21:38:06.3578415Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3578518Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3578644Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3578769Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3578815Z 2023-01-11T21:38:06.3579230Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3579304Z @triton.jit 2023-01-11T21:38:06.3579446Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3579521Z xnumel = 201 2023-01-11T21:38:06.3579618Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3579747Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3579826Z xmask = xindex < xnumel 2023-01-11T21:38:06.3579897Z x0 = xindex 2023-01-11T21:38:06.3580086Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3580181Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3580285Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3580386Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3580463Z tmp4 = 2 2023-01-11T21:38:06.3580535Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3580669Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3580802Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3580888Z ''') 2023-01-11T21:38:06.3580893Z 2023-01-11T21:38:06.3580897Z 2023-01-11T21:38:06.3580992Z async_compile.wait(globals()) 2023-01-11T21:38:06.3581069Z del async_compile 2023-01-11T21:38:06.3581075Z 2023-01-11T21:38:06.3581148Z def call(args): 2023-01-11T21:38:06.3581214Z arg0_1, = args 2023-01-11T21:38:06.3581321Z args.clear() 2023-01-11T21:38:06.3581414Z with torch.cuda.device(0): 2023-01-11T21:38:06.3581615Z buf0 = empty_strided((201, ), (1, 
), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3581820Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3581914Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3582065Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3582137Z del arg0_1 2023-01-11T21:38:06.3582213Z return (buf0, buf1, ) 2023-01-11T21:38:06.3582218Z 2023-01-11T21:38:06.3582222Z 2023-01-11T21:38:06.3582300Z if __name__ == "__main__": 2023-01-11T21:38:06.3582418Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3582544Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3582743Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3582861Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3582866Z 2023-01-11T21:38:06.3582871Z 2023-01-11T21:38:06.3582969Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3583042Z import torch 2023-01-11T21:38:06.3583110Z import random 2023-01-11T21:38:06.3583228Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3583355Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3583361Z 2023-01-11T21:38:06.3583442Z aten = torch.ops.aten 2023-01-11T21:38:06.3583578Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3583672Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3583677Z 2023-01-11T21:38:06.3583750Z import triton 2023-01-11T21:38:06.3583836Z import triton.language as tl 2023-01-11T21:38:06.3583957Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3584093Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3584099Z 2023-01-11T21:38:06.3584106Z 2023-01-11T21:38:06.3584268Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3584340Z import triton 2023-01-11T21:38:06.3584432Z import triton.language as tl 2023-01-11T21:38:06.3584546Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3584646Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3584804Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3584929Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3584934Z 2023-01-11T21:38:06.3585374Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3585466Z @triton.jit 2023-01-11T21:38:06.3585615Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3585686Z xnumel = 201 2023-01-11T21:38:06.3585784Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3585913Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3585990Z xmask = xindex < xnumel 2023-01-11T21:38:06.3586060Z x0 = xindex 2023-01-11T21:38:06.3586271Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3586389Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3586490Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3586587Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3586657Z tmp4 = 2 2023-01-11T21:38:06.3586730Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3586862Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, 
xmask) 2023-01-11T21:38:06.3586997Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3587084Z ''') 2023-01-11T21:38:06.3587089Z 2023-01-11T21:38:06.3587093Z 2023-01-11T21:38:06.3587185Z async_compile.wait(globals()) 2023-01-11T21:38:06.3587291Z del async_compile 2023-01-11T21:38:06.3587296Z 2023-01-11T21:38:06.3587372Z def call(args): 2023-01-11T21:38:06.3587445Z arg0_1, = args 2023-01-11T21:38:06.3587513Z args.clear() 2023-01-11T21:38:06.3587604Z with torch.cuda.device(0): 2023-01-11T21:38:06.3587806Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3588006Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3588097Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3588244Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3588315Z del arg0_1 2023-01-11T21:38:06.3588391Z return (buf0, buf1, ) 2023-01-11T21:38:06.3588396Z 2023-01-11T21:38:06.3588407Z 2023-01-11T21:38:06.3588479Z if __name__ == "__main__": 2023-01-11T21:38:06.3588596Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3588726Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3588926Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3589038Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3589043Z 2023-01-11T21:38:06.3589048Z 2023-01-11T21:38:06.3589144Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3589220Z import torch 2023-01-11T21:38:06.3589288Z import random 2023-01-11T21:38:06.3589404Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3589527Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3589532Z 2023-01-11T21:38:06.3589616Z aten = torch.ops.aten 2023-01-11T21:38:06.3589749Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3589843Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3589848Z 2023-01-11T21:38:06.3589919Z import triton 2023-01-11T21:38:06.3590010Z import triton.language as tl 2023-01-11T21:38:06.3590126Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3590266Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3590271Z 2023-01-11T21:38:06.3590276Z 2023-01-11T21:38:06.3590438Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3590514Z import triton 2023-01-11T21:38:06.3590643Z import triton.language as tl 2023-01-11T21:38:06.3590760Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3590861Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3590992Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3591111Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3591116Z 2023-01-11T21:38:06.3591535Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3591610Z @triton.jit 2023-01-11T21:38:06.3591752Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3591823Z xnumel = 64 2023-01-11T21:38:06.3591919Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3592049Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 
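# Editorial note: this is the float64 instantiation of the same expm1 test.
# Unlike the fp16 dumps above, the loads below carry no .to(tl.float32)
# upcast: libdevice.expm1 is applied to the fp64 values directly, and the
# xmask guard is effectively always true here because xnumel (64) is an
# exact multiple of any power-of-two XBLOCK the autotuner can pick.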
2023-01-11T21:38:06.3592133Z xmask = xindex < xnumel 2023-01-11T21:38:06.3592198Z x0 = xindex 2023-01-11T21:38:06.3592389Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3592486Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3592584Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3592682Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3592753Z tmp4 = 2 2023-01-11T21:38:06.3592824Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3592958Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3593088Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3593201Z ''') 2023-01-11T21:38:06.3593206Z 2023-01-11T21:38:06.3593211Z 2023-01-11T21:38:06.3593305Z async_compile.wait(globals()) 2023-01-11T21:38:06.3593383Z del async_compile 2023-01-11T21:38:06.3593388Z 2023-01-11T21:38:06.3593462Z def call(args): 2023-01-11T21:38:06.3593538Z arg0_1, = args 2023-01-11T21:38:06.3593608Z args.clear() 2023-01-11T21:38:06.3593702Z with torch.cuda.device(0): 2023-01-11T21:38:06.3593899Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.3594096Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.3594186Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3594334Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3594406Z del arg0_1 2023-01-11T21:38:06.3594481Z return (buf0, buf1, ) 2023-01-11T21:38:06.3594492Z 2023-01-11T21:38:06.3594497Z 2023-01-11T21:38:06.3594571Z if __name__ == "__main__": 2023-01-11T21:38:06.3594689Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3594814Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3595012Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.3595127Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3595133Z 2023-01-11T21:38:06.3595399Z [2023-01-11 21:34:37,662] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 535 2023-01-11T21:38:06.3595813Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3595943Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3596202Z [2023-01-11 21:34:37,678] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 536 2023-01-11T21:38:06.3596456Z [2023-01-11 21:34:37,829] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 536 2023-01-11T21:38:06.3596897Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
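# Editorial note: the captured dump is truncated at this point. Judging by
# the preceding variants, the body of this final float64 kernel (size hint
# 256, i.e. presumably the 201-element case) should follow the same pattern:
# masked loads, two libdevice.expm1 calls, a scale by 2, and two masked
# stores.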
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3597031Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3597284Z [2023-01-11 21:34:37,845] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 537 2023-01-11T21:38:06.3597541Z [2023-01-11 21:34:37,853] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 537 2023-01-11T21:38:06.3597954Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3598083Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3598337Z [2023-01-11 21:34:37,868] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 538 2023-01-11T21:38:06.3598344Z 2023-01-11T21:38:06.3598442Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3598516Z import torch 2023-01-11T21:38:06.3598591Z import random 2023-01-11T21:38:06.3598704Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3598829Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3598860Z 2023-01-11T21:38:06.3598944Z aten = torch.ops.aten 2023-01-11T21:38:06.3599081Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3599177Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3599182Z 2023-01-11T21:38:06.3599256Z import triton 2023-01-11T21:38:06.3599348Z import triton.language as tl 2023-01-11T21:38:06.3599469Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3599611Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3599617Z 2023-01-11T21:38:06.3599621Z 2023-01-11T21:38:06.3599784Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3599858Z import triton 2023-01-11T21:38:06.3599951Z import triton.language as tl 2023-01-11T21:38:06.3600065Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3600165Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3600295Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3600416Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3600422Z 2023-01-11T21:38:06.3600842Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3600914Z @triton.jit 2023-01-11T21:38:06.3601055Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3601126Z xnumel = 64 2023-01-11T21:38:06.3601224Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3601350Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3601433Z xmask = xindex < xnumel 2023-01-11T21:38:06.3601497Z x0 = xindex 2023-01-11T21:38:06.3601687Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3601784Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3601885Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3601981Z tmp3 = 
tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3602051Z tmp4 = 2 2023-01-11T21:38:06.3602129Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3602258Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3602416Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3602500Z ''') 2023-01-11T21:38:06.3602506Z 2023-01-11T21:38:06.3602511Z 2023-01-11T21:38:06.3602600Z async_compile.wait(globals()) 2023-01-11T21:38:06.3602676Z del async_compile 2023-01-11T21:38:06.3602681Z 2023-01-11T21:38:06.3602757Z def call(args): 2023-01-11T21:38:06.3602831Z arg0_1, = args 2023-01-11T21:38:06.3602905Z args.clear() 2023-01-11T21:38:06.3602990Z with torch.cuda.device(0): 2023-01-11T21:38:06.3603186Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.3603380Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.3603474Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3603620Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3603691Z del arg0_1 2023-01-11T21:38:06.3603776Z return (buf0, buf1, ) 2023-01-11T21:38:06.3603781Z 2023-01-11T21:38:06.3603788Z 2023-01-11T21:38:06.3603861Z if __name__ == "__main__": 2023-01-11T21:38:06.3603980Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3604103Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3604302Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.3604413Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3604418Z 2023-01-11T21:38:06.3604422Z 2023-01-11T21:38:06.3604517Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3604591Z import torch 2023-01-11T21:38:06.3604663Z import random 2023-01-11T21:38:06.3604803Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3604926Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3604931Z 2023-01-11T21:38:06.3605011Z aten = torch.ops.aten 2023-01-11T21:38:06.3605143Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3605239Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3605244Z 2023-01-11T21:38:06.3605318Z import triton 2023-01-11T21:38:06.3605408Z import triton.language as tl 2023-01-11T21:38:06.3605531Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3605663Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3605669Z 2023-01-11T21:38:06.3605673Z 2023-01-11T21:38:06.3605835Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3605908Z import triton 2023-01-11T21:38:06.3605999Z import triton.language as tl 2023-01-11T21:38:06.3606109Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3606214Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3606346Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3606464Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3606477Z 2023-01-11T21:38:06.3606888Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3606960Z @triton.jit 2023-01-11T21:38:06.3607101Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 
2023-01-11T21:38:06.3607175Z xnumel = 201 2023-01-11T21:38:06.3607271Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3607398Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3607481Z xmask = xindex < xnumel 2023-01-11T21:38:06.3607545Z x0 = xindex 2023-01-11T21:38:06.3607733Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3607833Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3607931Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3608024Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3608094Z tmp4 = 2 2023-01-11T21:38:06.3608174Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3608328Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3608461Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3608547Z ''') 2023-01-11T21:38:06.3608553Z 2023-01-11T21:38:06.3608557Z 2023-01-11T21:38:06.3608653Z async_compile.wait(globals()) 2023-01-11T21:38:06.3608730Z del async_compile 2023-01-11T21:38:06.3608735Z 2023-01-11T21:38:06.3608811Z def call(args): 2023-01-11T21:38:06.3608883Z arg0_1, = args 2023-01-11T21:38:06.3608957Z args.clear() 2023-01-11T21:38:06.3609042Z with torch.cuda.device(0): 2023-01-11T21:38:06.3609241Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.3609443Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.3609535Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3609683Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3609761Z del arg0_1 2023-01-11T21:38:06.3609844Z return (buf0, buf1, ) 2023-01-11T21:38:06.3609849Z 2023-01-11T21:38:06.3609854Z 2023-01-11T21:38:06.3609932Z if __name__ == "__main__": 2023-01-11T21:38:06.3610043Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3610169Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3610367Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.3610479Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3610484Z 2023-01-11T21:38:06.3610489Z 2023-01-11T21:38:06.3610634Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3610707Z import torch 2023-01-11T21:38:06.3610781Z import random 2023-01-11T21:38:06.3610892Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3611013Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3611018Z 2023-01-11T21:38:06.3611099Z aten = torch.ops.aten 2023-01-11T21:38:06.3611237Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3611332Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3611337Z 2023-01-11T21:38:06.3611408Z import triton 2023-01-11T21:38:06.3611498Z import triton.language as tl 2023-01-11T21:38:06.3611619Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3611753Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3611758Z 2023-01-11T21:38:06.3611771Z 2023-01-11T21:38:06.3611927Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3611999Z import triton 2023-01-11T21:38:06.3612095Z import triton.language as tl 2023-01-11T21:38:06.3612204Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3612306Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3612438Z from torch._inductor.triton_ops.autotune 
import pointwise 2023-01-11T21:38:06.3612562Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3612570Z 2023-01-11T21:38:06.3612973Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3613048Z @triton.jit 2023-01-11T21:38:06.3613192Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3613265Z xnumel = 201 2023-01-11T21:38:06.3613362Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3613489Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3613577Z xmask = xindex < xnumel 2023-01-11T21:38:06.3613647Z x0 = xindex 2023-01-11T21:38:06.3613829Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3613925Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3614024Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3614146Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3614220Z tmp4 = 2 2023-01-11T21:38:06.3614299Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3614432Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3614677Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3614766Z ''') 2023-01-11T21:38:06.3614771Z 2023-01-11T21:38:06.3614775Z 2023-01-11T21:38:06.3614868Z async_compile.wait(globals()) 2023-01-11T21:38:06.3614943Z del async_compile 2023-01-11T21:38:06.3614948Z 2023-01-11T21:38:06.3615021Z def call(args): 2023-01-11T21:38:06.3615093Z arg0_1, = args 2023-01-11T21:38:06.3615174Z args.clear() 2023-01-11T21:38:06.3615259Z with torch.cuda.device(0): 2023-01-11T21:38:06.3615456Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.3615656Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.3615750Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3615899Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3615969Z del arg0_1 2023-01-11T21:38:06.3616050Z return (buf0, buf1, ) 2023-01-11T21:38:06.3616055Z 2023-01-11T21:38:06.3616059Z 2023-01-11T21:38:06.3616135Z if __name__ == "__main__": 2023-01-11T21:38:06.3616246Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3616371Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3616568Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.3616722Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3616728Z 2023-01-11T21:38:06.3616992Z [2023-01-11 21:34:38,012] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 538 2023-01-11T21:38:06.3617485Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3617619Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3617877Z [2023-01-11 21:34:38,029] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 539 2023-01-11T21:38:06.3618139Z [2023-01-11 21:34:38,037] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 539 2023-01-11T21:38:06.3618558Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3618691Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3618938Z [2023-01-11 21:34:38,052] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 540 2023-01-11T21:38:06.3619200Z [2023-01-11 21:34:38,204] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 540 2023-01-11T21:38:06.3619606Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3619735Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3619988Z [2023-01-11 21:34:38,220] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 541 2023-01-11T21:38:06.3620076Z 2023-01-11T21:38:06.3620176Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3620252Z import torch 2023-01-11T21:38:06.3620326Z import random 2023-01-11T21:38:06.3620446Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3620562Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3620568Z 2023-01-11T21:38:06.3620651Z aten = torch.ops.aten 2023-01-11T21:38:06.3620787Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3620880Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3620886Z 2023-01-11T21:38:06.3620960Z import triton 2023-01-11T21:38:06.3621057Z import triton.language as tl 2023-01-11T21:38:06.3621182Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3621314Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3621328Z 2023-01-11T21:38:06.3621333Z 2023-01-11T21:38:06.3621489Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3621566Z import triton 2023-01-11T21:38:06.3621659Z import triton.language as tl 2023-01-11T21:38:06.3621771Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3621873Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3622004Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3622128Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3622133Z 2023-01-11T21:38:06.3622549Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3622644Z @triton.jit 2023-01-11T21:38:06.3622786Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3622861Z xnumel = 64 2023-01-11T21:38:06.3622957Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3623088Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3623172Z xmask = xindex < xnumel 2023-01-11T21:38:06.3623240Z x0 = xindex 2023-01-11T21:38:06.3623423Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3623519Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3623606Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3623703Z tmp2 = tl.libdevice.expm1(tmp1) 2023-01-11T21:38:06.3623789Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3623887Z tmp5 = tl.libdevice.expm1(tmp4) 2023-01-11T21:38:06.3623957Z tmp6 = 2 2023-01-11T21:38:06.3624028Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3624163Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3624293Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3624377Z ''') 2023-01-11T21:38:06.3624383Z 2023-01-11T21:38:06.3624387Z 2023-01-11T21:38:06.3624476Z async_compile.wait(globals()) 2023-01-11T21:38:06.3624552Z del async_compile 2023-01-11T21:38:06.3624558Z 2023-01-11T21:38:06.3624633Z def call(args): 2023-01-11T21:38:06.3624706Z arg0_1, = args 2023-01-11T21:38:06.3624774Z args.clear() 2023-01-11T21:38:06.3624865Z with torch.cuda.device(0): 2023-01-11T21:38:06.3625062Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3625259Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3625353Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3625500Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3625577Z del arg0_1 2023-01-11T21:38:06.3625653Z return (buf0, buf1, ) 2023-01-11T21:38:06.3625658Z 2023-01-11T21:38:06.3625669Z 2023-01-11T21:38:06.3625741Z if __name__ == "__main__": 2023-01-11T21:38:06.3625859Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3626012Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3626208Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.3626320Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3626325Z 2023-01-11T21:38:06.3626330Z 2023-01-11T21:38:06.3626429Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3626504Z import torch 2023-01-11T21:38:06.3626571Z import random 2023-01-11T21:38:06.3626689Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3626811Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3626816Z 2023-01-11T21:38:06.3626899Z aten = torch.ops.aten 2023-01-11T21:38:06.3627038Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3627132Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3627137Z 2023-01-11T21:38:06.3627213Z import triton 2023-01-11T21:38:06.3627305Z import triton.language as tl 2023-01-11T21:38:06.3627424Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3627560Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3627565Z 2023-01-11T21:38:06.3627570Z 2023-01-11T21:38:06.3627733Z triton_fused_expm1_mul_0 = async_compile.triton(''' 
2023-01-11T21:38:06.3627806Z import triton 2023-01-11T21:38:06.3627897Z import triton.language as tl 2023-01-11T21:38:06.3628009Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3628113Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3628239Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3628363Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3628395Z 2023-01-11T21:38:06.3628813Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3628887Z @triton.jit 2023-01-11T21:38:06.3629032Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3629108Z xnumel = 64 2023-01-11T21:38:06.3629205Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3629332Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3629409Z xmask = xindex < xnumel 2023-01-11T21:38:06.3629481Z x0 = xindex 2023-01-11T21:38:06.3629671Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3629766Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3629854Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3629961Z tmp2 = tl.libdevice.expm1(tmp1) 2023-01-11T21:38:06.3630048Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3630139Z tmp5 = tl.libdevice.expm1(tmp4) 2023-01-11T21:38:06.3630208Z tmp6 = 2 2023-01-11T21:38:06.3630288Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3630422Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3630557Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3630640Z ''') 2023-01-11T21:38:06.3630646Z 2023-01-11T21:38:06.3630650Z 2023-01-11T21:38:06.3630741Z async_compile.wait(globals()) 2023-01-11T21:38:06.3630818Z del async_compile 2023-01-11T21:38:06.3630824Z 2023-01-11T21:38:06.3630891Z def call(args): 2023-01-11T21:38:06.3630963Z arg0_1, = args 2023-01-11T21:38:06.3631037Z args.clear() 2023-01-11T21:38:06.3631128Z with torch.cuda.device(0): 2023-01-11T21:38:06.3631323Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3631548Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3631643Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3631797Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3631874Z del arg0_1 2023-01-11T21:38:06.3631984Z return (buf0, buf1, ) 2023-01-11T21:38:06.3631990Z 2023-01-11T21:38:06.3631995Z 2023-01-11T21:38:06.3632075Z if __name__ == "__main__": 2023-01-11T21:38:06.3632202Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3632335Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3632554Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.3632665Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3632670Z 2023-01-11T21:38:06.3632675Z 2023-01-11T21:38:06.3632770Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3632837Z import torch 2023-01-11T21:38:06.3632914Z import random 2023-01-11T21:38:06.3633032Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3633156Z from torch._inductor.codecache import AsyncCompile 
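# Recurring preamble of every Inductor-generated module in this log:
# AsyncCompile queues the Triton source below for background compilation,
# and the async_compile.wait(globals()) call further down blocks until the
# compiled kernels are bound into module globals.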
2023-01-11T21:38:06.3633161Z 2023-01-11T21:38:06.3633242Z aten = torch.ops.aten 2023-01-11T21:38:06.3633378Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3633473Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3633478Z 2023-01-11T21:38:06.3633545Z import triton 2023-01-11T21:38:06.3633641Z import triton.language as tl 2023-01-11T21:38:06.3633764Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3633904Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3633909Z 2023-01-11T21:38:06.3633914Z 2023-01-11T21:38:06.3634076Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3634151Z import triton 2023-01-11T21:38:06.3634246Z import triton.language as tl 2023-01-11T21:38:06.3634359Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3634480Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3634610Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3634734Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3634739Z 2023-01-11T21:38:06.3635154Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3635232Z @triton.jit 2023-01-11T21:38:06.3635378Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3635454Z xnumel = 201 2023-01-11T21:38:06.3635550Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3635673Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3635754Z xmask = xindex < xnumel 2023-01-11T21:38:06.3635826Z x0 = xindex 2023-01-11T21:38:06.3636021Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3636119Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3636208Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3636306Z tmp2 = tl.libdevice.expm1(tmp1) 2023-01-11T21:38:06.3636386Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3636486Z tmp5 = tl.libdevice.expm1(tmp4) 2023-01-11T21:38:06.3636557Z tmp6 = 2 2023-01-11T21:38:06.3636640Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3636772Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3636907Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3636991Z ''') 2023-01-11T21:38:06.3636996Z 2023-01-11T21:38:06.3637001Z 2023-01-11T21:38:06.3637086Z async_compile.wait(globals()) 2023-01-11T21:38:06.3637162Z del async_compile 2023-01-11T21:38:06.3637168Z 2023-01-11T21:38:06.3637240Z def call(args): 2023-01-11T21:38:06.3637315Z arg0_1, = args 2023-01-11T21:38:06.3637388Z args.clear() 2023-01-11T21:38:06.3637478Z with torch.cuda.device(0): 2023-01-11T21:38:06.3637677Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3637870Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3637995Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3638145Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3638218Z del arg0_1 2023-01-11T21:38:06.3638301Z return (buf0, buf1, ) 2023-01-11T21:38:06.3638306Z 2023-01-11T21:38:06.3638311Z 2023-01-11T21:38:06.3638389Z if __name__ == "__main__": 2023-01-11T21:38:06.3638505Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3638632Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3638823Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.3638939Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3638945Z 2023-01-11T21:38:06.3639211Z [2023-01-11 21:34:38,228] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 541 2023-01-11T21:38:06.3639628Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3639759Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3640015Z [2023-01-11 21:34:38,243] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 542 2023-01-11T21:38:06.3640280Z [2023-01-11 21:34:38,387] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 542 2023-01-11T21:38:06.3640719Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3640851Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3641105Z [2023-01-11 21:34:38,403] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 543 2023-01-11T21:38:06.3641369Z [2023-01-11 21:34:38,411] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 543 2023-01-11T21:38:06.3641778Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3641905Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3642162Z [2023-01-11 21:34:38,427] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 544 2023-01-11T21:38:06.3642168Z 2023-01-11T21:38:06.3642267Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3642341Z import torch 2023-01-11T21:38:06.3642416Z import random 2023-01-11T21:38:06.3642537Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3642662Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3642667Z 2023-01-11T21:38:06.3642746Z aten = torch.ops.aten 2023-01-11T21:38:06.3642876Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3642969Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3642974Z 2023-01-11T21:38:06.3643049Z import triton 2023-01-11T21:38:06.3643143Z import triton.language as tl 2023-01-11T21:38:06.3643269Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3643410Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3643416Z 2023-01-11T21:38:06.3643420Z 2023-01-11T21:38:06.3643610Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3643686Z import triton 2023-01-11T21:38:06.3643770Z import triton.language as tl 2023-01-11T21:38:06.3643885Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3643986Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3644120Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3644245Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3644251Z 2023-01-11T21:38:06.3644664Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3644740Z @triton.jit 2023-01-11T21:38:06.3644883Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3644952Z xnumel = 201 2023-01-11T21:38:06.3645049Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3645180Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3645262Z xmask = xindex < xnumel 2023-01-11T21:38:06.3645331Z x0 = xindex 2023-01-11T21:38:06.3645523Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3645619Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3645701Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3645799Z tmp2 = tl.libdevice.expm1(tmp1) 2023-01-11T21:38:06.3645885Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3645985Z tmp5 = tl.libdevice.expm1(tmp4) 2023-01-11T21:38:06.3646056Z tmp6 = 2 2023-01-11T21:38:06.3646134Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3646300Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3646427Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3646511Z ''') 2023-01-11T21:38:06.3646518Z 2023-01-11T21:38:06.3646522Z 2023-01-11T21:38:06.3646616Z async_compile.wait(globals()) 2023-01-11T21:38:06.3646692Z del async_compile 2023-01-11T21:38:06.3646697Z 2023-01-11T21:38:06.3646770Z def call(args): 2023-01-11T21:38:06.3646843Z arg0_1, = args 2023-01-11T21:38:06.3646916Z args.clear() 
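# args.clear() here, together with the del arg0_1 after the kernel launch,
# drops input references early; this is presumably so the CUDA caching
# allocator can reuse that memory as soon as the input is dead.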
2023-01-11T21:38:06.3647001Z with torch.cuda.device(0): 2023-01-11T21:38:06.3647200Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3647401Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3647493Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3647643Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3647719Z del arg0_1 2023-01-11T21:38:06.3647799Z return (buf0, buf1, ) 2023-01-11T21:38:06.3647804Z 2023-01-11T21:38:06.3647809Z 2023-01-11T21:38:06.3647890Z if __name__ == "__main__": 2023-01-11T21:38:06.3648001Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3648130Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3648329Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.3648445Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3648450Z 2023-01-11T21:38:06.3648454Z 2023-01-11T21:38:06.3648549Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3648623Z import torch 2023-01-11T21:38:06.3648697Z import random 2023-01-11T21:38:06.3648820Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3648936Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3648941Z 2023-01-11T21:38:06.3649026Z aten = torch.ops.aten 2023-01-11T21:38:06.3649162Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3649256Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3649262Z 2023-01-11T21:38:06.3649335Z import triton 2023-01-11T21:38:06.3649427Z import triton.language as tl 2023-01-11T21:38:06.3649577Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3649714Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3649720Z 2023-01-11T21:38:06.3649724Z 2023-01-11T21:38:06.3649880Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3649957Z import triton 2023-01-11T21:38:06.3650050Z import triton.language as tl 2023-01-11T21:38:06.3650163Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3650261Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3650391Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3650516Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3650523Z 2023-01-11T21:38:06.3650939Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3651010Z @triton.jit 2023-01-11T21:38:06.3651152Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3651223Z xnumel = 64 2023-01-11T21:38:06.3651319Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3651445Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3651528Z xmask = xindex < xnumel 2023-01-11T21:38:06.3651597Z x0 = xindex 2023-01-11T21:38:06.3651781Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3651877Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3651965Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3652103Z tmp2 = tl.libdevice.expm1(tmp1) 2023-01-11T21:38:06.3652192Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3652289Z tmp5 = 
tl.libdevice.expm1(tmp4) 2023-01-11T21:38:06.3652359Z tmp6 = 2 2023-01-11T21:38:06.3652430Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3652564Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3652698Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3652786Z ''') 2023-01-11T21:38:06.3652792Z 2023-01-11T21:38:06.3652796Z 2023-01-11T21:38:06.3652886Z async_compile.wait(globals()) 2023-01-11T21:38:06.3652962Z del async_compile 2023-01-11T21:38:06.3652967Z 2023-01-11T21:38:06.3653043Z def call(args): 2023-01-11T21:38:06.3653115Z arg0_1, = args 2023-01-11T21:38:06.3653183Z args.clear() 2023-01-11T21:38:06.3653270Z with torch.cuda.device(0): 2023-01-11T21:38:06.3653472Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3653673Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3653764Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3653911Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3653982Z del arg0_1 2023-01-11T21:38:06.3654061Z return (buf0, buf1, ) 2023-01-11T21:38:06.3654066Z 2023-01-11T21:38:06.3654080Z 2023-01-11T21:38:06.3654153Z if __name__ == "__main__": 2023-01-11T21:38:06.3654270Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3654397Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3654697Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3654814Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3654819Z 2023-01-11T21:38:06.3654823Z 2023-01-11T21:38:06.3654923Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3655001Z import torch 2023-01-11T21:38:06.3655092Z import random 2023-01-11T21:38:06.3655224Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3655359Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3655365Z 2023-01-11T21:38:06.3655447Z aten = torch.ops.aten 2023-01-11T21:38:06.3655623Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3655722Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3655728Z 2023-01-11T21:38:06.3655808Z import triton 2023-01-11T21:38:06.3655899Z import triton.language as tl 2023-01-11T21:38:06.3656016Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3656157Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3656163Z 2023-01-11T21:38:06.3656167Z 2023-01-11T21:38:06.3656331Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3656406Z import triton 2023-01-11T21:38:06.3656497Z import triton.language as tl 2023-01-11T21:38:06.3656612Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3656713Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3656847Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3656965Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3656970Z 2023-01-11T21:38:06.3657443Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3657519Z @triton.jit 2023-01-11T21:38:06.3657659Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 
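# Same pointwise body as the earlier dumps, now for 64 int64 inputs: both
# loads read in_ptr0, values are upcast to fp32, expm1 is applied, and the
# second result is doubled before the two masked stores.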
2023-01-11T21:38:06.3657729Z xnumel = 64 2023-01-11T21:38:06.3657825Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3657955Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3658038Z xmask = xindex < xnumel 2023-01-11T21:38:06.3658146Z x0 = xindex 2023-01-11T21:38:06.3658342Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3658441Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3658533Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3658634Z tmp2 = tl.libdevice.expm1(tmp1) 2023-01-11T21:38:06.3658725Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3658825Z tmp5 = tl.libdevice.expm1(tmp4) 2023-01-11T21:38:06.3658891Z tmp6 = 2 2023-01-11T21:38:06.3658971Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3659106Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3659240Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3659327Z ''') 2023-01-11T21:38:06.3659333Z 2023-01-11T21:38:06.3659337Z 2023-01-11T21:38:06.3659434Z async_compile.wait(globals()) 2023-01-11T21:38:06.3659512Z del async_compile 2023-01-11T21:38:06.3659517Z 2023-01-11T21:38:06.3659587Z def call(args): 2023-01-11T21:38:06.3659664Z arg0_1, = args 2023-01-11T21:38:06.3659740Z args.clear() 2023-01-11T21:38:06.3659834Z with torch.cuda.device(0): 2023-01-11T21:38:06.3660033Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3660234Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3660330Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3660471Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3660546Z del arg0_1 2023-01-11T21:38:06.3660630Z return (buf0, buf1, ) 2023-01-11T21:38:06.3660636Z 2023-01-11T21:38:06.3660640Z 2023-01-11T21:38:06.3660722Z if __name__ == "__main__": 2023-01-11T21:38:06.3660839Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3660966Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3661165Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3661281Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3661287Z 2023-01-11T21:38:06.3661553Z [2023-01-11 21:34:38,578] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 544 2023-01-11T21:38:06.3661990Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3662126Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3662383Z [2023-01-11 21:34:38,593] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 545 2023-01-11T21:38:06.3662647Z [2023-01-11 21:34:38,601] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 545 2023-01-11T21:38:06.3662656Z 2023-01-11T21:38:06.3662757Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3662832Z import torch 2023-01-11T21:38:06.3662908Z import random 2023-01-11T21:38:06.3663029Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3663153Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3663161Z 2023-01-11T21:38:06.3663238Z aten = torch.ops.aten 2023-01-11T21:38:06.3663377Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3663475Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3663480Z 2023-01-11T21:38:06.3663555Z import triton 2023-01-11T21:38:06.3663645Z import triton.language as tl 2023-01-11T21:38:06.3663768Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3663910Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3663915Z 2023-01-11T21:38:06.3663920Z 2023-01-11T21:38:06.3664084Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3664182Z import triton 2023-01-11T21:38:06.3664275Z import triton.language as tl 2023-01-11T21:38:06.3664390Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3664494Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3664631Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3664760Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3664766Z 2023-01-11T21:38:06.3665197Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3665281Z @triton.jit 2023-01-11T21:38:06.3665444Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3665518Z xnumel = 201 2023-01-11T21:38:06.3665617Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3665751Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3665839Z xmask = xindex < xnumel 2023-01-11T21:38:06.3665910Z x0 = xindex 2023-01-11T21:38:06.3666102Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3666195Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3666285Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3666387Z tmp2 = tl.libdevice.expm1(tmp1) 2023-01-11T21:38:06.3666473Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3666572Z tmp5 = tl.libdevice.expm1(tmp4) 2023-01-11T21:38:06.3666644Z tmp6 = 2 2023-01-11T21:38:06.3666725Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3666855Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3666990Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3667076Z ''') 2023-01-11T21:38:06.3667082Z 2023-01-11T21:38:06.3667086Z 2023-01-11T21:38:06.3667186Z async_compile.wait(globals()) 2023-01-11T21:38:06.3667264Z del async_compile 
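# async_compile.wait(globals()) above has bound the finished Triton kernels
# into module globals; the compile helper itself is not needed past here.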
2023-01-11T21:38:06.3667269Z 2023-01-11T21:38:06.3667347Z def call(args): 2023-01-11T21:38:06.3667421Z arg0_1, = args 2023-01-11T21:38:06.3667492Z args.clear() 2023-01-11T21:38:06.3667584Z with torch.cuda.device(0): 2023-01-11T21:38:06.3667813Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3668015Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3668109Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3668258Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3668334Z del arg0_1 2023-01-11T21:38:06.3668418Z return (buf0, buf1, ) 2023-01-11T21:38:06.3668423Z 2023-01-11T21:38:06.3668428Z 2023-01-11T21:38:06.3668503Z if __name__ == "__main__": 2023-01-11T21:38:06.3668623Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3668754Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3668953Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3669067Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3669072Z 2023-01-11T21:38:06.3669077Z 2023-01-11T21:38:06.3669178Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3669250Z import torch 2023-01-11T21:38:06.3669330Z import random 2023-01-11T21:38:06.3669444Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3669569Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3669574Z 2023-01-11T21:38:06.3669657Z aten = torch.ops.aten 2023-01-11T21:38:06.3669796Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3669896Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3669901Z 2023-01-11T21:38:06.3669977Z import triton 2023-01-11T21:38:06.3670072Z import triton.language as tl 2023-01-11T21:38:06.3670219Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3670360Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3670366Z 2023-01-11T21:38:06.3670370Z 2023-01-11T21:38:06.3670535Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3670611Z import triton 2023-01-11T21:38:06.3670707Z import triton.language as tl 2023-01-11T21:38:06.3670822Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3670927Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3671059Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3671180Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3671185Z 2023-01-11T21:38:06.3671605Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3671684Z @triton.jit 2023-01-11T21:38:06.3671829Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3671907Z xnumel = 201 2023-01-11T21:38:06.3672007Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3672135Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3672222Z xmask = xindex < xnumel 2023-01-11T21:38:06.3672288Z x0 = xindex 2023-01-11T21:38:06.3672478Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3672575Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3672665Z tmp1 = tmp0.to(tl.float32) 
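# The int64 input is upcast to fp32 before tl.libdevice.expm1, matching
# PyTorch's integer-to-float type promotion for floating-point unary ops.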
2023-01-11T21:38:06.3672764Z tmp2 = tl.libdevice.expm1(tmp1) 2023-01-11T21:38:06.3672854Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3672953Z tmp5 = tl.libdevice.expm1(tmp4) 2023-01-11T21:38:06.3673020Z tmp6 = 2 2023-01-11T21:38:06.3673100Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3673235Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3673372Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3673458Z ''') 2023-01-11T21:38:06.3673464Z 2023-01-11T21:38:06.3673468Z 2023-01-11T21:38:06.3673563Z async_compile.wait(globals()) 2023-01-11T21:38:06.3673641Z del async_compile 2023-01-11T21:38:06.3673672Z 2023-01-11T21:38:06.3673750Z def call(args): 2023-01-11T21:38:06.3673819Z arg0_1, = args 2023-01-11T21:38:06.3673895Z args.clear() 2023-01-11T21:38:06.3673989Z with torch.cuda.device(0): 2023-01-11T21:38:06.3674190Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3674388Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3674483Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3674633Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3674701Z del arg0_1 2023-01-11T21:38:06.3674790Z return (buf0, buf1, ) 2023-01-11T21:38:06.3674794Z 2023-01-11T21:38:06.3674799Z 2023-01-11T21:38:06.3674881Z if __name__ == "__main__": 2023-01-11T21:38:06.3675000Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3675139Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3675372Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3675488Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3675493Z 2023-01-11T21:38:06.3675568Z ok (1.860s) 2023-01-11T21:38:06.3676021Z test_fill1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3676175Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3676436Z [2023-01-11 21:34:38,642] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 546 2023-01-11T21:38:06.3676704Z [2023-01-11 21:34:38,715] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 546 2023-01-11T21:38:06.3677121Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
test_fill1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:38,642] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 546
[2023-01-11 21:34:38,715] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 546
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:38,756] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 547
[2023-01-11 21:34:38,828] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 547

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_ones_like_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

triton_fused_full_like_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 2
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_ones_like_0.run(buf0, 256, grid=grid(256), stream=stream0)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        triton_fused_full_like_1.run(buf1, 256, grid=grid(256), stream=stream0)
        return (buf0, buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_ones_like_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

triton_fused_full_like_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 2
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_ones_like_0.run(buf0, 256, grid=grid(256), stream=stream0)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        triton_fused_full_like_1.run(buf1, 256, grid=grid(256), stream=stream0)
        return (buf0, buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.227s)
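Neither kernel in the two programs above reads the input tensor: triton_fused_ones_like_0 stores the constant 1 and triton_fused_full_like_1 stores the constant 2 into freshly allocated (16, 16) buffers, so arg0_1 only contributes shape and dtype. A plausible eager-mode sketch (illustrative, not the verbatim test body):

import torch

def fn(x):
    # the outputs do not depend on x's values, so Inductor emits
    # two pure constant-fill kernels and never loads from x
    return torch.ones_like(x), torch.full_like(x, 2)

compiled = torch.compile(fn)
compiled(torch.randn(16, 16, device="cuda"))                        # float32 graph
compiled(torch.randn(16, 16, device="cuda", dtype=torch.float16))   # float16 graph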
test_fill2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:38,872] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 548
[2023-01-11 21:34:38,939] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 548
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:38,982] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 549
[2023-01-11 21:34:39,052] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 549

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_ones_like_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

triton_fused_full_like_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 3.0
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_ones_like_0.run(buf0, 256, grid=grid(256), stream=stream0)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        triton_fused_full_like_1.run(buf1, 256, grid=grid(256), stream=stream0)
        return (buf0, buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_ones_like_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

triton_fused_full_like_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 3.0
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_ones_like_0.run(buf0, 256, grid=grid(256), stream=stream0)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        triton_fused_full_like_1.run(buf1, 256, grid=grid(256), stream=stream0)
        return (buf0, buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.224s)
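The only difference from test_fill1 is the second constant: here triton_fused_full_like_1 stores 3.0 rather than 2. The corresponding eager-mode sketch, with the same caveats as above, is simply:

import torch

def fn(x):
    return torch.ones_like(x), torch.full_like(x, 3.0)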
test_flip_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:39,092] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 550
[2023-01-11 21:34:39,182] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 550
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:39,221] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 551
[2023-01-11 21:34:39,305] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 551

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_rev_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 72
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 6
    x1 = (xindex // 6)
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (5 + ((-1)*x0) + (6*x1)), xmask, eviction_policy='evict_last')
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

triton_fused_sub_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 72
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 6
    x1 = (xindex // 6) % 6
    x2 = (xindex // 36)
    x3 = xindex
    tmp0 = tl.load(in_ptr0 + (30 + x0 + ((-6)*x1) + (36*x2)), xmask)
    tmp1 = 2
    tmp2 = tmp0 - tmp1
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_rev_0.run(arg0_1, buf0, 72, grid=grid(72), stream=stream0)
        buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.float32)
        triton_fused_sub_1.run(arg0_1, buf1, 72, grid=grid(72), stream=stream0)
        del arg0_1
        return (buf0, buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_rev_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 72
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 6
    x1 = (xindex // 6)
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (5 + ((-1)*x0) + (6*x1)), xmask, eviction_policy='evict_last').to(tl.float32)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

triton_fused_sub_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 72
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 6
    x1 = (xindex // 6) % 6
    x2 = (xindex // 36)
    x3 = xindex
    tmp0 = tl.load(in_ptr0 + (30 + x0 + ((-6)*x1) + (36*x2)), xmask).to(tl.float32)
    tmp1 = 2
    tmp2 = tmp0 - tmp1
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_rev_0.run(arg0_1, buf0, 72, grid=grid(72), stream=stream0)
        buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.float16)
        triton_fused_sub_1.run(arg0_1, buf1, 72, grid=grid(72), stream=stream0)
        del arg0_1
        return (buf0, buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.253s)
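In triton_fused_rev_0 the load index 5 + ((-1)*x0) + (6*x1) walks the innermost axis of the (1, 2, 6, 6) input backwards, and in triton_fused_sub_1 the index 30 + x0 + ((-6)*x1) + (36*x2) reverses the second-to-last axis before subtracting the constant 2. A matching eager-mode sketch (illustrative, not the verbatim test body):

import torch

def fn(x):
    # flip of the last axis, plus flip of dim -2 fused with a subtract,
    # one generated kernel per output
    return torch.flip(x, [-1]), torch.flip(x, [-2]) - 2

x = torch.randn(1, 2, 6, 6, device="cuda")
torch.compile(fn)(x)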
test_fmod_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:39,326] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 552
[2023-01-11 21:34:39,576] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 552
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:39,597] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 553
[2023-01-11 21:34:39,761] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 553

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_fmod_sub_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 72
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + (x0), xmask)
    tmp6 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tl.libdevice.fmod(tmp0, tmp1)
    tmp4 = 3.0
    tmp5 = tmp3 * tmp4
    tmp7 = tl.libdevice.fmod(tmp5, tmp6)
    tmp8 = 2.0
    tmp9 = tmp7 - tmp8
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_fmod_sub_0.run(arg0_1, arg1_1, buf0, buf1, 72, grid=grid(72), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_fmod_sub_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 72
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp6 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tmp2 = tl.libdevice.fmod(tmp0, tmp1)
    tmp4 = 3.0
    tmp5 = tmp3 * tmp4
    tmp7 = tl.libdevice.fmod(tmp5, tmp6)
    tmp8 = 2.0
    tmp9 = tmp7 - tmp8
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_fmod_sub_0.run(arg0_1, arg1_1, buf0, buf1, 72, grid=grid(72), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.456s)
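triton_fused_fmod_sub_0 produces both outputs in a single fused kernel: tmp2 = fmod(a, b) and tmp9 = fmod(a * 3.0, b) - 2.0, reloading the inputs rather than materializing the intermediate a * 3.0. An eager-mode sketch consistent with it (illustrative names):

import torch

def fn(a, b):
    # both expressions are pointwise over the same shape,
    # so Inductor fuses them into one kernel with two stores
    return torch.fmod(a, b), torch.fmod(a * 3.0, b) - 2.0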
test_fmod_zero_dim_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:39,775] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 554
[2023-01-11 21:34:40,011] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 554
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:40,024] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 555
[2023-01-11 21:34:40,168] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 555
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:40,181] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 556
[2023-01-11 21:34:40,325] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 556
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:40,338] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 557

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_fmod_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None)
    tmp2 = tl.libdevice.fmod(tmp0, tmp1)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_fmod_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_fmod_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32)
    tmp2 = tl.libdevice.fmod(tmp0, tmp1)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_fmod_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((), (), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_fmod_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tl.libdevice.fmod(tmp0, tmp1)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_fmod_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

[2023-01-11 21:34:40,483] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 557

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_fmod_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32)
    tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tmp2 = tl.libdevice.fmod(tmp0, tmp1)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_fmod_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((), (), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.722s)
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3776155Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3776408Z [2023-01-11 21:34:40,598] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 559 2023-01-11T21:38:06.3776668Z [2023-01-11 21:34:40,667] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 559 2023-01-11T21:38:06.3776674Z 2023-01-11T21:38:06.3776765Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3776839Z import torch 2023-01-11T21:38:06.3776911Z import random 2023-01-11T21:38:06.3777030Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3777213Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3777269Z 2023-01-11T21:38:06.3777368Z aten = torch.ops.aten 2023-01-11T21:38:06.3777527Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3777620Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3777633Z 2023-01-11T21:38:06.3777700Z import triton 2023-01-11T21:38:06.3777793Z import triton.language as tl 2023-01-11T21:38:06.3777918Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3778055Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3778061Z 2023-01-11T21:38:06.3778065Z 2023-01-11T21:38:06.3778226Z triton_fused_mul_mul_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3778297Z import triton 2023-01-11T21:38:06.3778389Z import triton.language as tl 2023-01-11T21:38:06.3778495Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3778597Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3778727Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3778859Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3778864Z 2023-01-11T21:38:06.3779284Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.3779357Z @triton.jit 2023-01-11T21:38:06.3779488Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3779561Z xnumel = 10 2023-01-11T21:38:06.3779651Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3779778Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3779860Z xmask = xindex < xnumel 2023-01-11T21:38:06.3779931Z x0 = xindex 2023-01-11T21:38:06.3780026Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3780094Z tmp1 = 2 2023-01-11T21:38:06.3780170Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.3786356Z tmp3 = tmp2 * tmp1 2023-01-11T21:38:06.3786520Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3786619Z ''') 2023-01-11T21:38:06.3786624Z 2023-01-11T21:38:06.3786629Z 2023-01-11T21:38:06.3786725Z async_compile.wait(globals()) 2023-01-11T21:38:06.3786803Z del async_compile 2023-01-11T21:38:06.3786808Z 2023-01-11T21:38:06.3786946Z def call(args): 2023-01-11T21:38:06.3787022Z arg0_1, = args 2023-01-11T21:38:06.3787100Z args.clear() 2023-01-11T21:38:06.3787194Z with torch.cuda.device(0): 2023-01-11T21:38:06.3787390Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3787482Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.3787575Z stream0 = 
get_cuda_stream(0) 2023-01-11T21:38:06.3787716Z triton_fused_mul_mul_1_0.run(buf1, arg0_1, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3787789Z del arg0_1 2023-01-11T21:38:06.3787862Z return (buf1, ) 2023-01-11T21:38:06.3787871Z 2023-01-11T21:38:06.3787875Z 2023-01-11T21:38:06.3787950Z if __name__ == "__main__": 2023-01-11T21:38:06.3788069Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3788194Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3788395Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3788507Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3788512Z 2023-01-11T21:38:06.3788517Z 2023-01-11T21:38:06.3788614Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3788687Z import torch 2023-01-11T21:38:06.3788765Z import random 2023-01-11T21:38:06.3788877Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3789001Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3789006Z 2023-01-11T21:38:06.3789090Z aten = torch.ops.aten 2023-01-11T21:38:06.3789224Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3789350Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3789355Z 2023-01-11T21:38:06.3789429Z import triton 2023-01-11T21:38:06.3789521Z import triton.language as tl 2023-01-11T21:38:06.3789645Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3789778Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3789786Z 2023-01-11T21:38:06.3789790Z 2023-01-11T21:38:06.3789952Z triton_fused_mul_mul_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3790023Z import triton 2023-01-11T21:38:06.3790115Z import triton.language as tl 2023-01-11T21:38:06.3790232Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3790334Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3790469Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3790587Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3790599Z 2023-01-11T21:38:06.3791004Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.3791079Z @triton.jit 2023-01-11T21:38:06.3791213Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3791289Z xnumel = 10 2023-01-11T21:38:06.3791388Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3791515Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3791598Z xmask = xindex < xnumel 2023-01-11T21:38:06.3791662Z x0 = xindex 2023-01-11T21:38:06.3791775Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3791844Z tmp1 = 2 2023-01-11T21:38:06.3791923Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.3792002Z tmp3 = tmp2 * tmp1 2023-01-11T21:38:06.3792139Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3792226Z ''') 2023-01-11T21:38:06.3792234Z 2023-01-11T21:38:06.3792239Z 2023-01-11T21:38:06.3792331Z async_compile.wait(globals()) 2023-01-11T21:38:06.3792401Z del async_compile 2023-01-11T21:38:06.3792406Z 2023-01-11T21:38:06.3792485Z def call(args): 2023-01-11T21:38:06.3792558Z arg0_1, = args 2023-01-11T21:38:06.3792632Z args.clear() 2023-01-11T21:38:06.3792753Z 
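# args.clear() drops the caller's reference to the input list so the
# `del arg0_1` below releases the tensor as early as possible; together with
# the `buf1 = buf0; del buf0  # reuse` pattern a few lines down, the two
# fused multiplies write through a single reused output buffer (in_out_ptr0)
# rather than allocating one buffer per op.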
with torch.cuda.device(0): 2023-01-11T21:38:06.3792952Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3793041Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.3793126Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3793264Z triton_fused_mul_mul_1_0.run(buf1, arg0_1, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3793335Z del arg0_1 2023-01-11T21:38:06.3793414Z return (buf1, ) 2023-01-11T21:38:06.3793419Z 2023-01-11T21:38:06.3793423Z 2023-01-11T21:38:06.3793506Z if __name__ == "__main__": 2023-01-11T21:38:06.3793622Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3793749Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3793940Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3794051Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3794056Z 2023-01-11T21:38:06.3794127Z ok (0.183s) 2023-01-11T21:38:06.3794585Z test_full_like_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3794718Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3794977Z [2023-01-11 21:34:40,697] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 560 2023-01-11T21:38:06.3795281Z [2023-01-11 21:34:40,848] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 560 2023-01-11T21:38:06.3795696Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3795827Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3796084Z [2023-01-11 21:34:40,877] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 561 2023-01-11T21:38:06.3796344Z [2023-01-11 21:34:40,943] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 561 2023-01-11T21:38:06.3796350Z 2023-01-11T21:38:06.3796448Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3796518Z import torch 2023-01-11T21:38:06.3796591Z import random 2023-01-11T21:38:06.3796711Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3796837Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3796842Z 2023-01-11T21:38:06.3796923Z aten = torch.ops.aten 2023-01-11T21:38:06.3797063Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3797157Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3797163Z 2023-01-11T21:38:06.3797230Z import triton 2023-01-11T21:38:06.3797320Z import triton.language as tl 2023-01-11T21:38:06.3797443Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3797581Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3797587Z 2023-01-11T21:38:06.3797591Z 2023-01-11T21:38:06.3797742Z triton_fused_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.3797814Z import triton 2023-01-11T21:38:06.3797906Z import triton.language as tl 2023-01-11T21:38:06.3798024Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3798119Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3798252Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3798377Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3798382Z 2023-01-11T21:38:06.3798801Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.3798876Z @triton.jit 2023-01-11T21:38:06.3798999Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3799072Z xnumel = 8 2023-01-11T21:38:06.3799162Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3799295Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3799377Z xmask = xindex < xnumel 2023-01-11T21:38:06.3799448Z x0 = xindex 2023-01-11T21:38:06.3799523Z tmp0 = 7.777 2023-01-11T21:38:06.3799592Z tmp1 = 1 2023-01-11T21:38:06.3799703Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.3799832Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3799916Z ''') 2023-01-11T21:38:06.3799922Z 2023-01-11T21:38:06.3799926Z 2023-01-11T21:38:06.3800019Z async_compile.wait(globals()) 2023-01-11T21:38:06.3800095Z del async_compile 2023-01-11T21:38:06.3800100Z 2023-01-11T21:38:06.3800174Z def call(args): 2023-01-11T21:38:06.3800248Z arg0_1, = args 2023-01-11T21:38:06.3800322Z args.clear() 2023-01-11T21:38:06.3800413Z with torch.cuda.device(0): 2023-01-11T21:38:06.3800603Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3800696Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3800828Z triton_fused_sub_0.run(buf0, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.3800906Z return (buf0, ) 
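# call() unpacks arg0_1 but never reads it: full_like only needs the input
# for its shape and dtype, so the kernel just stores the constant 7.777 - 1
# into all 8 slots. A minimal eager-mode equivalent (a sketch, not the test
# source):
#     out = torch.full_like(x, 7.777) - 1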
2023-01-11T21:38:06.3800911Z 2023-01-11T21:38:06.3800943Z 2023-01-11T21:38:06.3801024Z if __name__ == "__main__": 2023-01-11T21:38:06.3801140Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3801265Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3801463Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3801571Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3801577Z 2023-01-11T21:38:06.3801581Z 2023-01-11T21:38:06.3801680Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3801754Z import torch 2023-01-11T21:38:06.3801828Z import random 2023-01-11T21:38:06.3801945Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3802069Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3802074Z 2023-01-11T21:38:06.3802154Z aten = torch.ops.aten 2023-01-11T21:38:06.3802283Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3802379Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3802387Z 2023-01-11T21:38:06.3802462Z import triton 2023-01-11T21:38:06.3802553Z import triton.language as tl 2023-01-11T21:38:06.3802677Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3802814Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3802819Z 2023-01-11T21:38:06.3802823Z 2023-01-11T21:38:06.3802977Z triton_fused_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.3803050Z import triton 2023-01-11T21:38:06.3803135Z import triton.language as tl 2023-01-11T21:38:06.3803248Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3803350Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3803480Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3803604Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3803609Z 2023-01-11T21:38:06.3803992Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.3804067Z @triton.jit 2023-01-11T21:38:06.3804186Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3804252Z xnumel = 8 2023-01-11T21:38:06.3804348Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3804502Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3804584Z xmask = xindex < xnumel 2023-01-11T21:38:06.3804657Z x0 = xindex 2023-01-11T21:38:06.3804729Z tmp0 = 7.777 2023-01-11T21:38:06.3804797Z tmp1 = 1 2023-01-11T21:38:06.3804901Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.3805036Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3805120Z ''') 2023-01-11T21:38:06.3805125Z 2023-01-11T21:38:06.3805130Z 2023-01-11T21:38:06.3805221Z async_compile.wait(globals()) 2023-01-11T21:38:06.3805297Z del async_compile 2023-01-11T21:38:06.3805302Z 2023-01-11T21:38:06.3805379Z def call(args): 2023-01-11T21:38:06.3805452Z arg0_1, = args 2023-01-11T21:38:06.3805527Z args.clear() 2023-01-11T21:38:06.3805613Z with torch.cuda.device(0): 2023-01-11T21:38:06.3805809Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3805901Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3806031Z triton_fused_sub_0.run(buf0, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.3806108Z return (buf0, ) 
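# the fp16 variant is the same program modulo dtype: the '*fp32' pointer
# annotations become '*fp16' and the output buffer is float16, while the
# constant is still formed as 7.777 - 1 and, presumably, narrowed to half
# precision at the store.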
2023-01-11T21:38:06.3806113Z 2023-01-11T21:38:06.3806119Z 2023-01-11T21:38:06.3806198Z if __name__ == "__main__": 2023-01-11T21:38:06.3806316Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3806435Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3806629Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3806741Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3806746Z 2023-01-11T21:38:06.3806848Z ok (0.276s) 2023-01-11T21:38:06.3807308Z test_fuse_tiled_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3807437Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3807693Z [2023-01-11 21:34:40,961] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 562 2023-01-11T21:38:06.3807957Z [2023-01-11 21:34:41,046] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 562 2023-01-11T21:38:06.3808365Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3808502Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3808762Z [2023-01-11 21:34:41,064] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 563 2023-01-11T21:38:06.3809019Z [2023-01-11 21:34:41,147] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 563 2023-01-11T21:38:06.3809032Z 2023-01-11T21:38:06.3809123Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3809195Z import torch 2023-01-11T21:38:06.3809269Z import random 2023-01-11T21:38:06.3809389Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3809512Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3809517Z 2023-01-11T21:38:06.3809599Z aten = torch.ops.aten 2023-01-11T21:38:06.3809739Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3809827Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3809832Z 2023-01-11T21:38:06.3809905Z import triton 2023-01-11T21:38:06.3809997Z import triton.language as tl 2023-01-11T21:38:06.3810127Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3810343Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3810349Z 2023-01-11T21:38:06.3810354Z 2023-01-11T21:38:06.3810507Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.3810581Z import triton 2023-01-11T21:38:06.3810667Z import triton.language as tl 2023-01-11T21:38:06.3810783Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3810882Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3811013Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3811139Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3811144Z 
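# @pointwise is Inductor's autotuning wrapper for elementwise kernels:
# size_hints bounds the XBLOCK configs it will try, and the
# instance_descriptor in meta appears to record which arguments are known to
# be divisible by 16 so that aligned, vectorizable accesses can be assumed
# (an interpretation of the metadata, not something this log states).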
2023-01-11T21:38:06.3811571Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3811643Z @triton.jit 2023-01-11T21:38:06.3811786Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3811861Z xnumel = 16384 2023-01-11T21:38:06.3811951Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3812078Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3812160Z xmask = xindex < xnumel 2023-01-11T21:38:06.3812240Z x1 = (xindex // 128) 2023-01-11T21:38:06.3812317Z x0 = xindex % 128 2023-01-11T21:38:06.3812387Z x2 = xindex 2023-01-11T21:38:06.3812479Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.3812576Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3812654Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3812817Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3812903Z ''') 2023-01-11T21:38:06.3812909Z 2023-01-11T21:38:06.3812913Z 2023-01-11T21:38:06.3813068Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.3813141Z import triton 2023-01-11T21:38:06.3813236Z import triton.language as tl 2023-01-11T21:38:06.3813343Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3813446Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3813578Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3813702Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3813707Z 2023-01-11T21:38:06.3814111Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3814187Z @triton.jit 2023-01-11T21:38:06.3814317Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3814389Z xnumel = 16384 2023-01-11T21:38:06.3814722Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3814856Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3814946Z xmask = xindex < xnumel 2023-01-11T21:38:06.3815019Z x0 = xindex 2023-01-11T21:38:06.3815118Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3815190Z tmp1 = 1 2023-01-11T21:38:06.3815271Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3815401Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3815490Z ''') 2023-01-11T21:38:06.3815496Z 2023-01-11T21:38:06.3815502Z 2023-01-11T21:38:06.3815598Z async_compile.wait(globals()) 2023-01-11T21:38:06.3815677Z del async_compile 2023-01-11T21:38:06.3815682Z 2023-01-11T21:38:06.3815759Z def call(args): 2023-01-11T21:38:06.3815847Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.3815928Z args.clear() 2023-01-11T21:38:06.3816015Z with torch.cuda.device(0): 2023-01-11T21:38:06.3816226Z buf0 = empty_strided((128, 128), (128, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3816321Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3816525Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 16384, grid=grid(16384), stream=stream0) 2023-01-11T21:38:06.3816600Z del arg0_1 2023-01-11T21:38:06.3816673Z del arg1_1 2023-01-11T21:38:06.3816879Z buf1 = empty_strided((128, 128), (128, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.3817019Z triton_fused_add_1_1.run(arg2_1, buf1, 16384, grid=grid(16384), stream=stream0) 2023-01-11T21:38:06.3817085Z del arg2_1 2023-01-11T21:38:06.3817227Z return (buf0, buf1, ) 2023-01-11T21:38:06.3817233Z 2023-01-11T21:38:06.3817237Z 2023-01-11T21:38:06.3817316Z if __name__ == "__main__": 2023-01-11T21:38:06.3817439Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3817568Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3817772Z arg0_1 = rand_strided((128, 1), (1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3817976Z arg1_1 = rand_strided((1, 128), (128, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3818182Z arg2_1 = rand_strided((128, 128), (128, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3818302Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.3818308Z 2023-01-11T21:38:06.3818320Z 2023-01-11T21:38:06.3818411Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3818484Z import torch 2023-01-11T21:38:06.3818555Z import random 2023-01-11T21:38:06.3818674Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3818796Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3818802Z 2023-01-11T21:38:06.3818881Z aten = torch.ops.aten 2023-01-11T21:38:06.3819015Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3819147Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3819152Z 2023-01-11T21:38:06.3819227Z import triton 2023-01-11T21:38:06.3819322Z import triton.language as tl 2023-01-11T21:38:06.3819448Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3819594Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3819600Z 2023-01-11T21:38:06.3819604Z 2023-01-11T21:38:06.3819761Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.3819836Z import triton 2023-01-11T21:38:06.3819924Z import triton.language as tl 2023-01-11T21:38:06.3820043Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3820147Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3820281Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3820408Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3820413Z 2023-01-11T21:38:06.3820842Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3820917Z @triton.jit 2023-01-11T21:38:06.3821065Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3821141Z xnumel = 16384 2023-01-11T21:38:06.3821234Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3821363Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3821447Z xmask = xindex < xnumel 2023-01-11T21:38:06.3821529Z x1 = (xindex // 128) 2023-01-11T21:38:06.3821608Z x0 = xindex % 128 2023-01-11T21:38:06.3821680Z x2 = xindex 2023-01-11T21:38:06.3821794Z tmp0 = tl.load(in_ptr0 + (x1), xmask).to(tl.float32) 2023-01-11T21:38:06.3821913Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3821996Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3822129Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3822216Z ''') 
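# broadcast add flattened to 1-D: with arg0_1 of shape (128, 1) and arg1_1 of
# shape (1, 128), the kernel derives x1 = xindex // 128 (row, used to read
# arg0_1) and x0 = xindex % 128 (column, used to read arg1_1), so the whole
# 128x128 broadcast is one pointwise launch over 16384 elements instead of a
# materialized expand.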
2023-01-11T21:38:06.3822222Z 2023-01-11T21:38:06.3822227Z 2023-01-11T21:38:06.3822386Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.3822465Z import triton 2023-01-11T21:38:06.3822590Z import triton.language as tl 2023-01-11T21:38:06.3822700Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3822803Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3822937Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3823066Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3823071Z 2023-01-11T21:38:06.3823474Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3823554Z @triton.jit 2023-01-11T21:38:06.3823688Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3823765Z xnumel = 16384 2023-01-11T21:38:06.3823858Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3823989Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3824074Z xmask = xindex < xnumel 2023-01-11T21:38:06.3824146Z x0 = xindex 2023-01-11T21:38:06.3824264Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3824339Z tmp1 = 1 2023-01-11T21:38:06.3824420Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3824548Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3824636Z ''') 2023-01-11T21:38:06.3824641Z 2023-01-11T21:38:06.3824646Z 2023-01-11T21:38:06.3824740Z async_compile.wait(globals()) 2023-01-11T21:38:06.3824818Z del async_compile 2023-01-11T21:38:06.3824823Z 2023-01-11T21:38:06.3824901Z def call(args): 2023-01-11T21:38:06.3825035Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.3825114Z args.clear() 2023-01-11T21:38:06.3825202Z with torch.cuda.device(0): 2023-01-11T21:38:06.3825411Z buf0 = empty_strided((128, 128), (128, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3825508Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3825658Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 16384, grid=grid(16384), stream=stream0) 2023-01-11T21:38:06.3825735Z del arg0_1 2023-01-11T21:38:06.3825809Z del arg1_1 2023-01-11T21:38:06.3826016Z buf1 = empty_strided((128, 128), (128, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3826156Z triton_fused_add_1_1.run(arg2_1, buf1, 16384, grid=grid(16384), stream=stream0) 2023-01-11T21:38:06.3826224Z del arg2_1 2023-01-11T21:38:06.3826310Z return (buf0, buf1, ) 2023-01-11T21:38:06.3826315Z 2023-01-11T21:38:06.3826319Z 2023-01-11T21:38:06.3826401Z if __name__ == "__main__": 2023-01-11T21:38:06.3826522Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3826648Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3826852Z arg0_1 = rand_strided((128, 1), (1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3827057Z arg1_1 = rand_strided((1, 128), (128, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3827264Z arg2_1 = rand_strided((128, 128), (128, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3827386Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.3827391Z 2023-01-11T21:38:06.3827467Z ok (0.205s) 2023-01-11T21:38:06.3827920Z test_gather1_cuda (__main__.CudaTests) ... 
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3828057Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3828317Z [2023-01-11 21:34:41,174] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 564 2023-01-11T21:38:06.3828619Z [2023-01-11 21:34:41,369] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 564 2023-01-11T21:38:06.3829035Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3829167Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3829425Z [2023-01-11 21:34:41,395] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 565 2023-01-11T21:38:06.3829693Z [2023-01-11 21:34:41,487] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 565 2023-01-11T21:38:06.3829699Z 2023-01-11T21:38:06.3829798Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3829868Z import torch 2023-01-11T21:38:06.3829944Z import random 2023-01-11T21:38:06.3830067Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3830191Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3830196Z 2023-01-11T21:38:06.3830281Z aten = torch.ops.aten 2023-01-11T21:38:06.3830423Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3830518Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3830524Z 2023-01-11T21:38:06.3830601Z import triton 2023-01-11T21:38:06.3830689Z import triton.language as tl 2023-01-11T21:38:06.3830817Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3830958Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3830990Z 2023-01-11T21:38:06.3830995Z 2023-01-11T21:38:06.3831173Z triton_fused_gather_gather_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3831251Z import triton 2023-01-11T21:38:06.3831346Z import triton.language as tl 2023-01-11T21:38:06.3831463Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3831562Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3831695Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3831824Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3831829Z 2023-01-11T21:38:06.3832264Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3832341Z @triton.jit 2023-01-11T21:38:06.3832493Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3832569Z xnumel = 200 2023-01-11T21:38:06.3832669Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3832794Z 
xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3832879Z xmask = xindex < xnumel 2023-01-11T21:38:06.3832952Z x2 = xindex 2023-01-11T21:38:06.3833031Z x0 = xindex % 10 2023-01-11T21:38:06.3833225Z tmp0 = tl.load(in_ptr0 + (x2), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3833326Z tmp4 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.3833399Z tmp1 = 1 2023-01-11T21:38:06.3833474Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3833680Z tmp3 = tl.load(in_ptr1 + (tmp2 + (6*x0)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3833764Z tmp5 = tmp4 + tmp1 2023-01-11T21:38:06.3833876Z tmp6 = tl.load(in_ptr1 + (tmp5 + (6*x0)), xmask) 2023-01-11T21:38:06.3834012Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3834150Z tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3834241Z ''') 2023-01-11T21:38:06.3834246Z 2023-01-11T21:38:06.3834251Z 2023-01-11T21:38:06.3834347Z async_compile.wait(globals()) 2023-01-11T21:38:06.3834419Z del async_compile 2023-01-11T21:38:06.3834425Z 2023-01-11T21:38:06.3834531Z def call(args): 2023-01-11T21:38:06.3834615Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3834692Z args.clear() 2023-01-11T21:38:06.3834786Z with torch.cuda.device(0): 2023-01-11T21:38:06.3835004Z buf0 = empty_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3835235Z buf1 = empty_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3835338Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3835529Z triton_fused_gather_gather_1_0.run(arg1_1, arg0_1, buf0, buf1, 200, grid=grid(200), stream=stream0) 2023-01-11T21:38:06.3835604Z del arg0_1 2023-01-11T21:38:06.3835682Z del arg1_1 2023-01-11T21:38:06.3835765Z return (buf0, buf1, ) 2023-01-11T21:38:06.3835771Z 2023-01-11T21:38:06.3835775Z 2023-01-11T21:38:06.3835857Z if __name__ == "__main__": 2023-01-11T21:38:06.3835979Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3836109Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3836321Z arg0_1 = rand_strided((1, 1, 10, 6), (60, 60, 6, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3836534Z arg1_1 = rand_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3836658Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3836663Z 2023-01-11T21:38:06.3836668Z 2023-01-11T21:38:06.3836768Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3836844Z import torch 2023-01-11T21:38:06.3836921Z import random 2023-01-11T21:38:06.3837042Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3837196Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3837202Z 2023-01-11T21:38:06.3837279Z aten = torch.ops.aten 2023-01-11T21:38:06.3837416Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3837513Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3837518Z 2023-01-11T21:38:06.3837597Z import triton 2023-01-11T21:38:06.3837692Z import triton.language as tl 2023-01-11T21:38:06.3837816Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3837957Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3837963Z 2023-01-11T21:38:06.3837967Z 2023-01-11T21:38:06.3838145Z triton_fused_gather_gather_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3838215Z import triton 
2023-01-11T21:38:06.3838309Z import triton.language as tl 2023-01-11T21:38:06.3838425Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3838528Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3838665Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3838792Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3838798Z 2023-01-11T21:38:06.3839228Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3839306Z @triton.jit 2023-01-11T21:38:06.3839452Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3839529Z xnumel = 200 2023-01-11T21:38:06.3839628Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3839760Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3839843Z xmask = xindex < xnumel 2023-01-11T21:38:06.3839915Z x2 = xindex 2023-01-11T21:38:06.3839992Z x0 = xindex % 10 2023-01-11T21:38:06.3840180Z tmp0 = tl.load(in_ptr0 + (x2), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3840281Z tmp4 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.3840353Z tmp1 = 1 2023-01-11T21:38:06.3840433Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3840663Z tmp3 = tl.load(in_ptr1 + (tmp2 + (6*x0)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3840772Z tmp5 = tmp4 + tmp1 2023-01-11T21:38:06.3840905Z tmp6 = tl.load(in_ptr1 + (tmp5 + (6*x0)), xmask).to(tl.float32) 2023-01-11T21:38:06.3841035Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3841168Z tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3841256Z ''') 2023-01-11T21:38:06.3841261Z 2023-01-11T21:38:06.3841265Z 2023-01-11T21:38:06.3841360Z async_compile.wait(globals()) 2023-01-11T21:38:06.3841436Z del async_compile 2023-01-11T21:38:06.3841441Z 2023-01-11T21:38:06.3841514Z def call(args): 2023-01-11T21:38:06.3841596Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3841669Z args.clear() 2023-01-11T21:38:06.3841757Z with torch.cuda.device(0): 2023-01-11T21:38:06.3841974Z buf0 = empty_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3842192Z buf1 = empty_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3842285Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3842448Z triton_fused_gather_gather_1_0.run(arg1_1, arg0_1, buf0, buf1, 200, grid=grid(200), stream=stream0) 2023-01-11T21:38:06.3842523Z del arg0_1 2023-01-11T21:38:06.3842597Z del arg1_1 2023-01-11T21:38:06.3842675Z return (buf0, buf1, ) 2023-01-11T21:38:06.3842685Z 2023-01-11T21:38:06.3842689Z 2023-01-11T21:38:06.3842764Z if __name__ == "__main__": 2023-01-11T21:38:06.3842882Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3843009Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3843251Z arg0_1 = rand_strided((1, 1, 10, 6), (60, 60, 6, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3843466Z arg1_1 = rand_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3843588Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3843593Z 2023-01-11T21:38:06.3843670Z ok (0.340s) 2023-01-11T21:38:06.3843792Z 
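The two gather kernels above show how Inductor lowers torch.gather to plain indirect addressing: the int64 indices are loaded (once with eviction_policy='evict_last' and once without, for the two fused gathers), the graph's `+ 1` is folded into the index, and the source is read at `tmp2 + (6*x0)`, i.e. a row-major offset into the (10, 6) view of arg0_1. A minimal eager-mode reconstruction of that addressing, with shapes taken from the rand_strided harness above (variable names are illustrative, not from the test source):

    import torch

    src = torch.randn(1, 1, 10, 6, device='cuda')             # plays the role of in_ptr1
    idx = torch.randint(0, 5, (4, 5, 10, 1), device='cuda')   # plays the role of in_ptr0

    # out[a, b, r, 0] = src[0, 0, r, idx[a, b, r, 0] + 1],
    # which is exactly the kernel's tl.load(in_ptr1 + (tmp2 + 6*x0))
    out = torch.gather(src.expand(4, 5, 10, 6), 3, idx + 1)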
test_gather2_cuda (__main__.CudaTests) ... ok (0.001s) 2023-01-11T21:38:06.3844246Z test_gather_scatter_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3844378Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3844637Z [2023-01-11 21:34:41,581] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 566 2023-01-11T21:38:06.3844903Z [2023-01-11 21:34:41,755] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 566 2023-01-11T21:38:06.3844909Z 2023-01-11T21:38:06.3845012Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3845088Z import torch 2023-01-11T21:38:06.3845167Z import random 2023-01-11T21:38:06.3845288Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3845413Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3845419Z 2023-01-11T21:38:06.3845496Z aten = torch.ops.aten 2023-01-11T21:38:06.3845634Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3845729Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3845734Z 2023-01-11T21:38:06.3845810Z import triton 2023-01-11T21:38:06.3845905Z import triton.language as tl 2023-01-11T21:38:06.3846033Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3846176Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3846181Z 2023-01-11T21:38:06.3846185Z 2023-01-11T21:38:06.3846449Z triton_fused_add_expand_index_index_1_scatter_add__select_2_sub_unsqueeze_zeros_like_0 = async_compile.triton(''' 2023-01-11T21:38:06.3846547Z import triton 2023-01-11T21:38:06.3846642Z import triton.language as tl 2023-01-11T21:38:06.3846757Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3846860Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3846993Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3847120Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3847125Z 2023-01-11T21:38:06.3847516Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.3847593Z @triton.jit 2023-01-11T21:38:06.3847710Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3847786Z xnumel = 512 2023-01-11T21:38:06.3847886Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3848015Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3848103Z xmask = xindex < xnumel 2023-01-11T21:38:06.3848176Z x0 = xindex 2023-01-11T21:38:06.3848247Z tmp0 = 0 2023-01-11T21:38:06.3848375Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.3848461Z ''') 2023-01-11T21:38:06.3848467Z 2023-01-11T21:38:06.3848471Z 2023-01-11T21:38:06.3848735Z triton_fused_add_expand_index_index_1_scatter_add__select_2_sub_unsqueeze_zeros_like_1 = async_compile.triton(''' 2023-01-11T21:38:06.3848811Z import triton 2023-01-11T21:38:06.3848906Z import triton.language as tl 
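# kernel 0 above zero-fills the 512-element (16, 32) output buffer (the
# zeros_like in the fused name); the kernel defined here performs the actual
# scatter_add_: as its body below shows, each lane computes a gathered
# difference plus one and folds it in with tl.atomic_add, which is required
# because several of the 2560 lanes can target the same output location.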
2023-01-11T21:38:06.3849022Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3849153Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3849287Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3849407Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3849412Z 2023-01-11T21:38:06.3849841Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3849918Z @triton.jit 2023-01-11T21:38:06.3850057Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3850132Z xnumel = 2560 2023-01-11T21:38:06.3850230Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3850360Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3850444Z xmask = xindex < xnumel 2023-01-11T21:38:06.3850520Z x1 = (xindex // 32) 2023-01-11T21:38:06.3850602Z x0 = xindex % 32 2023-01-11T21:38:06.3850706Z tmp0 = tl.load(in_ptr0 + (80 + x1), xmask) 2023-01-11T21:38:06.3850806Z tmp1 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.3850916Z tmp2 = tl.load(in_ptr1 + (x0 + (32*tmp1)), xmask) 2023-01-11T21:38:06.3851023Z tmp3 = tl.load(in_ptr1 + (x0 + (32*tmp0)), xmask) 2023-01-11T21:38:06.3851138Z tmp4 = tmp2 - tmp3 2023-01-11T21:38:06.3851205Z tmp5 = 1 2023-01-11T21:38:06.3851286Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.3851437Z tl.atomic_add(out_ptr0 + (x0 + (32*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3851524Z ''') 2023-01-11T21:38:06.3851529Z 2023-01-11T21:38:06.3851534Z 2023-01-11T21:38:06.3851627Z async_compile.wait(globals()) 2023-01-11T21:38:06.3851707Z del async_compile 2023-01-11T21:38:06.3851712Z 2023-01-11T21:38:06.3851788Z def call(args): 2023-01-11T21:38:06.3851863Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3851939Z args.clear() 2023-01-11T21:38:06.3852033Z with torch.cuda.device(0): 2023-01-11T21:38:06.3852242Z buf0 = empty_strided((16, 32), (32, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3852337Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3852566Z triton_fused_add_expand_index_index_1_scatter_add__select_2_sub_unsqueeze_zeros_like_0.run(buf0, 512, grid=grid(512), stream=stream0) 2023-01-11T21:38:06.3852781Z triton_fused_add_expand_index_index_1_scatter_add__select_2_sub_unsqueeze_zeros_like_1.run(arg1_1, arg0_1, buf0, 2560, grid=grid(2560), stream=stream0) 2023-01-11T21:38:06.3852856Z del arg0_1 2023-01-11T21:38:06.3852924Z del arg1_1 2023-01-11T21:38:06.3853005Z return (buf0, ) 2023-01-11T21:38:06.3853010Z 2023-01-11T21:38:06.3853015Z 2023-01-11T21:38:06.3853095Z if __name__ == "__main__": 2023-01-11T21:38:06.3853216Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3853343Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3853553Z arg0_1 = rand_strided((16, 32), (32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3853753Z arg1_1 = rand_strided((2, 80), (80, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3853874Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3853879Z 2023-01-11T21:38:06.3853945Z ok (0.267s) 2023-01-11T21:38:06.3854403Z test_gelu_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3854650Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3854914Z [2023-01-11 21:34:41,793] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 567 2023-01-11T21:38:06.3855221Z [2023-01-11 21:34:41,965] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 567 2023-01-11T21:38:06.3855639Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3855774Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3856028Z [2023-01-11 21:34:42,027] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 568 2023-01-11T21:38:06.3856294Z [2023-01-11 21:34:42,190] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 568 2023-01-11T21:38:06.3856300Z 2023-01-11T21:38:06.3856399Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3856477Z import torch 2023-01-11T21:38:06.3856551Z import random 2023-01-11T21:38:06.3856670Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3856794Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3856800Z 2023-01-11T21:38:06.3856884Z aten = torch.ops.aten 2023-01-11T21:38:06.3857028Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3857177Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3857183Z 2023-01-11T21:38:06.3857274Z import triton 2023-01-11T21:38:06.3857374Z import triton.language as tl 2023-01-11T21:38:06.3857516Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3857658Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3857664Z 2023-01-11T21:38:06.3857668Z 2023-01-11T21:38:06.3857836Z triton_fused_add_1_mul_5_0 = async_compile.triton(''' 2023-01-11T21:38:06.3857912Z import triton 2023-01-11T21:38:06.3858004Z import triton.language as tl 2023-01-11T21:38:06.3858125Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3858227Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3858356Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3858480Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3858486Z 2023-01-11T21:38:06.3858946Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3859024Z @triton.jit 2023-01-11T21:38:06.3859168Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3859244Z xnumel = 256 2023-01-11T21:38:06.3859342Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3859469Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3859548Z xmask = xindex < xnumel 
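# the straight-line body below is the erf formulation of GELU, fused twice:
# gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2))), with 0.7071067811865476 being
# 1/sqrt(2); out_ptr0 receives gelu(x) + 2 and out_ptr1 receives gelu(x + 1),
# both computed from loads of the same input.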
2023-01-11T21:38:06.3859624Z x0 = xindex 2023-01-11T21:38:06.3859816Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3859917Z tmp11 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3859992Z tmp1 = 0.5 2023-01-11T21:38:06.3860072Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.3860155Z tmp3 = 0.7071067811865476 2023-01-11T21:38:06.3860232Z tmp4 = tmp0 * tmp3 2023-01-11T21:38:06.3860330Z tmp5 = tl.libdevice.erf(tmp4) 2023-01-11T21:38:06.3860402Z tmp6 = 1 2023-01-11T21:38:06.3860479Z tmp7 = tmp5 + tmp6 2023-01-11T21:38:06.3860558Z tmp8 = tmp2 * tmp7 2023-01-11T21:38:06.3860630Z tmp9 = 2 2023-01-11T21:38:06.3860704Z tmp10 = tmp8 + tmp9 2023-01-11T21:38:06.3860788Z tmp12 = tmp11 + tmp6 2023-01-11T21:38:06.3860872Z tmp13 = tmp12 * tmp1 2023-01-11T21:38:06.3860954Z tmp14 = tmp12 * tmp3 2023-01-11T21:38:06.3861051Z tmp15 = tl.libdevice.erf(tmp14) 2023-01-11T21:38:06.3861132Z tmp16 = tmp15 + tmp6 2023-01-11T21:38:06.3861241Z tmp17 = tmp13 * tmp16 2023-01-11T21:38:06.3861374Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.3861510Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.3861597Z ''') 2023-01-11T21:38:06.3861603Z 2023-01-11T21:38:06.3861608Z 2023-01-11T21:38:06.3861708Z async_compile.wait(globals()) 2023-01-11T21:38:06.3861790Z del async_compile 2023-01-11T21:38:06.3861795Z 2023-01-11T21:38:06.3861870Z def call(args): 2023-01-11T21:38:06.3861946Z arg0_1, = args 2023-01-11T21:38:06.3862022Z args.clear() 2023-01-11T21:38:06.3862110Z with torch.cuda.device(0): 2023-01-11T21:38:06.3862314Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3862518Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3862610Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3862759Z triton_fused_add_1_mul_5_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.3862838Z del arg0_1 2023-01-11T21:38:06.3862922Z return (buf0, buf1, ) 2023-01-11T21:38:06.3862927Z 2023-01-11T21:38:06.3862931Z 2023-01-11T21:38:06.3863012Z if __name__ == "__main__": 2023-01-11T21:38:06.3863126Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3863260Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3863465Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3863579Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3863584Z 2023-01-11T21:38:06.3863588Z 2023-01-11T21:38:06.3863686Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3863762Z import torch 2023-01-11T21:38:06.3863837Z import random 2023-01-11T21:38:06.3863952Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3864076Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3864084Z 2023-01-11T21:38:06.3864168Z aten = torch.ops.aten 2023-01-11T21:38:06.3864304Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3864406Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3864411Z 2023-01-11T21:38:06.3864488Z import triton 2023-01-11T21:38:06.3864582Z import triton.language as tl 2023-01-11T21:38:06.3864735Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3864871Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3864877Z 2023-01-11T21:38:06.3864886Z 2023-01-11T21:38:06.3865079Z 
triton_fused_add_1_convert_element_type_3_0 = async_compile.triton(''' 2023-01-11T21:38:06.3865156Z import triton 2023-01-11T21:38:06.3865249Z import triton.language as tl 2023-01-11T21:38:06.3865365Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3865468Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3865616Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3865759Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3865766Z 2023-01-11T21:38:06.3866197Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3866271Z @triton.jit 2023-01-11T21:38:06.3866414Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3866491Z xnumel = 256 2023-01-11T21:38:06.3866588Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3866719Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3866805Z xmask = xindex < xnumel 2023-01-11T21:38:06.3866878Z x0 = xindex 2023-01-11T21:38:06.3867087Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3867206Z tmp13 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3867339Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3867414Z tmp2 = 0.5 2023-01-11T21:38:06.3867494Z tmp3 = tmp1 * tmp2 2023-01-11T21:38:06.3867574Z tmp4 = 0.7071067811865476 2023-01-11T21:38:06.3867654Z tmp5 = tmp1 * tmp4 2023-01-11T21:38:06.3867744Z tmp6 = tl.libdevice.erf(tmp5) 2023-01-11T21:38:06.3867819Z tmp7 = 1 2023-01-11T21:38:06.3867898Z tmp8 = tmp6 + tmp7 2023-01-11T21:38:06.3867976Z tmp9 = tmp3 * tmp8 2023-01-11T21:38:06.3868066Z tmp10 = tmp9.to(tl.float32) 2023-01-11T21:38:06.3868140Z tmp11 = 2 2023-01-11T21:38:06.3868216Z tmp12 = tmp10 + tmp11 2023-01-11T21:38:06.3868300Z tmp14 = tmp13 + tmp7 2023-01-11T21:38:06.3868392Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.3868474Z tmp16 = tmp15 * tmp2 2023-01-11T21:38:06.3868552Z tmp17 = tmp15 * tmp4 2023-01-11T21:38:06.3868649Z tmp18 = tl.libdevice.erf(tmp17) 2023-01-11T21:38:06.3868731Z tmp19 = tmp18 + tmp7 2023-01-11T21:38:06.3868810Z tmp20 = tmp16 * tmp19 2023-01-11T21:38:06.3868901Z tmp21 = tmp20.to(tl.float32) 2023-01-11T21:38:06.3869038Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.3869181Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp21, xmask) 2023-01-11T21:38:06.3869272Z ''') 2023-01-11T21:38:06.3869278Z 2023-01-11T21:38:06.3869285Z 2023-01-11T21:38:06.3869383Z async_compile.wait(globals()) 2023-01-11T21:38:06.3869461Z del async_compile 2023-01-11T21:38:06.3869466Z 2023-01-11T21:38:06.3869541Z def call(args): 2023-01-11T21:38:06.3869610Z arg0_1, = args 2023-01-11T21:38:06.3869692Z args.clear() 2023-01-11T21:38:06.3869786Z with torch.cuda.device(0): 2023-01-11T21:38:06.3869992Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3870192Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3870285Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3870460Z triton_fused_add_1_convert_element_type_3_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.3870528Z del arg0_1 2023-01-11T21:38:06.3870613Z return (buf0, buf1, ) 
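# [illustration, not part of the log] A minimal eager-mode sketch of
# what the two fused kernels above compute, reconstructed from their
# temporaries; the actual test body is not visible in this excerpt, so
# the function name here is hypothetical:
import torch
import torch.nn.functional as F

def gelu_pair(x):
    # buf0: erf-based ("exact") GELU of x, plus the constant 2
    #       (tmp10 = 0.5 * x * (1 + erf(x * 0.7071...)) + 2)
    # buf1: erf-based GELU of (x + 1)  (tmp17)
    return F.gelu(x) + 2, F.gelu(x + 1)

# e.g. gelu_pair(torch.randn(16, 16, device="cuda", dtype=torch.float16))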
2023-01-11T21:38:06.3870618Z 2023-01-11T21:38:06.3870624Z 2023-01-11T21:38:06.3870706Z if __name__ == "__main__": 2023-01-11T21:38:06.3870854Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3870985Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3871189Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3871300Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3871306Z 2023-01-11T21:38:06.3871379Z ok (0.434s) 2023-01-11T21:38:06.3871832Z test_glu_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3871962Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3872226Z [2023-01-11 21:34:42,227] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 569 2023-01-11T21:38:06.3872489Z [2023-01-11 21:34:42,352] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 569 2023-01-11T21:38:06.3872901Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3873035Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3873319Z [2023-01-11 21:34:42,391] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 570 2023-01-11T21:38:06.3873325Z 2023-01-11T21:38:06.3873425Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3873501Z import torch 2023-01-11T21:38:06.3873576Z import random 2023-01-11T21:38:06.3873692Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3873816Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3873821Z 2023-01-11T21:38:06.3873906Z aten = torch.ops.aten 2023-01-11T21:38:06.3874042Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3874143Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3874148Z 2023-01-11T21:38:06.3874224Z import triton 2023-01-11T21:38:06.3874317Z import triton.language as tl 2023-01-11T21:38:06.3874444Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3874579Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3874587Z 2023-01-11T21:38:06.3874598Z 2023-01-11T21:38:06.3874746Z triton_fused_glu_0 = async_compile.triton(''' 2023-01-11T21:38:06.3874824Z import triton 2023-01-11T21:38:06.3874919Z import triton.language as tl 2023-01-11T21:38:06.3875036Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3875142Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3875280Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3875429Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3875434Z 2023-01-11T21:38:06.3875857Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 
'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3875933Z @triton.jit 2023-01-11T21:38:06.3876066Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3876148Z xnumel = 4096 2023-01-11T21:38:06.3876248Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3876376Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3876461Z xmask = xindex < xnumel 2023-01-11T21:38:06.3876539Z x0 = xindex % 4 2023-01-11T21:38:06.3876614Z x1 = (xindex // 4) 2023-01-11T21:38:06.3876713Z x2 = xindex 2023-01-11T21:38:06.3876920Z tmp0 = tl.load(in_ptr0 + (x0 + (8*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3877128Z tmp1 = tl.load(in_ptr0 + (4 + x0 + (8*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3877216Z tmp2 = tl.sigmoid(tmp1) 2023-01-11T21:38:06.3877297Z tmp3 = tmp0 * tmp2 2023-01-11T21:38:06.3877435Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3877515Z ''') 2023-01-11T21:38:06.3877520Z 2023-01-11T21:38:06.3877524Z 2023-01-11T21:38:06.3877682Z triton_fused_glu_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.3877762Z import triton 2023-01-11T21:38:06.3877865Z import triton.language as tl 2023-01-11T21:38:06.3877982Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3878087Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3878220Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3878343Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3878353Z 2023-01-11T21:38:06.3878753Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3878826Z @triton.jit 2023-01-11T21:38:06.3878960Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3879037Z xnumel = 4096 2023-01-11T21:38:06.3879138Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3879268Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3879384Z xmask = xindex < xnumel 2023-01-11T21:38:06.3879456Z x0 = xindex % 512 2023-01-11T21:38:06.3879538Z x1 = (xindex // 512) 2023-01-11T21:38:06.3879610Z x2 = xindex 2023-01-11T21:38:06.3879818Z tmp0 = tl.load(in_ptr0 + (x0 + (1024*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3880030Z tmp1 = tl.load(in_ptr0 + (512 + x0 + (1024*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3880117Z tmp2 = tl.sigmoid(tmp1) 2023-01-11T21:38:06.3880196Z tmp3 = tmp0 * tmp2 2023-01-11T21:38:06.3880322Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3880406Z ''') 2023-01-11T21:38:06.3880411Z 2023-01-11T21:38:06.3880416Z 2023-01-11T21:38:06.3880572Z triton_fused_glu_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.3880647Z import triton 2023-01-11T21:38:06.3880745Z import triton.language as tl 2023-01-11T21:38:06.3880860Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3880966Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3881104Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3881223Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3881229Z 2023-01-11T21:38:06.3881635Z @pointwise(size_hints=[4096], 
filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3881709Z @triton.jit 2023-01-11T21:38:06.3881845Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3881919Z xnumel = 4096 2023-01-11T21:38:06.3882022Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3882150Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3882235Z xmask = xindex < xnumel 2023-01-11T21:38:06.3882306Z x0 = xindex % 32 2023-01-11T21:38:06.3882387Z x1 = (xindex // 32) 2023-01-11T21:38:06.3882462Z x2 = xindex 2023-01-11T21:38:06.3882567Z tmp0 = tl.load(in_ptr0 + (x0 + (64*x1)), xmask) 2023-01-11T21:38:06.3882679Z tmp1 = tl.load(in_ptr0 + (32 + x0 + (64*x1)), xmask) 2023-01-11T21:38:06.3882766Z tmp2 = tl.sigmoid(tmp1) 2023-01-11T21:38:06.3882847Z tmp3 = tmp0 * tmp2 2023-01-11T21:38:06.3883007Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3883097Z ''') 2023-01-11T21:38:06.3883102Z 2023-01-11T21:38:06.3883107Z 2023-01-11T21:38:06.3883202Z async_compile.wait(globals()) 2023-01-11T21:38:06.3883280Z del async_compile 2023-01-11T21:38:06.3883285Z 2023-01-11T21:38:06.3883363Z def call(args): 2023-01-11T21:38:06.3883437Z arg0_1, = args 2023-01-11T21:38:06.3883514Z args.clear() 2023-01-11T21:38:06.3883602Z with torch.cuda.device(0): 2023-01-11T21:38:06.3883824Z buf0 = empty_strided((8, 16, 8, 4), (512, 32, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3883919Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3884064Z triton_fused_glu_0.run(arg0_1, buf0, 4096, grid=grid(4096), stream=stream0) 2023-01-11T21:38:06.3884282Z buf1 = empty_strided((8, 8, 8, 8), (512, 64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3884424Z triton_fused_glu_1_1.run(arg0_1, buf1, 4096, grid=grid(4096), stream=stream0) 2023-01-11T21:38:06.3884640Z buf2 = empty_strided((8, 16, 4, 8), (512, 32, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3884780Z triton_fused_glu_2_2.run(arg0_1, buf2, 4096, grid=grid(4096), stream=stream0) 2023-01-11T21:38:06.3884849Z del arg0_1 2023-01-11T21:38:06.3884938Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3884943Z 2023-01-11T21:38:06.3884948Z 2023-01-11T21:38:06.3885031Z if __name__ == "__main__": 2023-01-11T21:38:06.3885151Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3885280Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3885525Z arg0_1 = rand_strided((8, 16, 8, 8), (1024, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3885641Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3885646Z 2023-01-11T21:38:06.3885912Z [2023-01-11 21:34:42,497] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 570 2023-01-11T21:38:06.3885922Z 2023-01-11T21:38:06.3886021Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3886091Z import torch 2023-01-11T21:38:06.3886167Z import random 2023-01-11T21:38:06.3886291Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3886418Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3886423Z 2023-01-11T21:38:06.3886507Z aten = torch.ops.aten 2023-01-11T21:38:06.3886646Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3886742Z async_compile = AsyncCompile() 
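# [illustration, not part of the log] The three glu kernels in the fp32
# module above each split one axis of the (8, 16, 8, 8) input in half
# and compute a * sigmoid(b); a hedged eager-mode equivalent
# (hypothetical name, the test body itself is not shown):
import torch
import torch.nn.functional as F

def glu_triple(x):
    # output shapes match buf0 (8, 16, 8, 4), buf1 (8, 8, 8, 8),
    # and buf2 (8, 16, 4, 8) allocated in call() above
    return F.glu(x, dim=-1), F.glu(x, dim=1), F.glu(x, dim=2)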
2023-01-11T21:38:06.3886747Z 2023-01-11T21:38:06.3886816Z import triton 2023-01-11T21:38:06.3886912Z import triton.language as tl 2023-01-11T21:38:06.3887041Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3887181Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3887186Z 2023-01-11T21:38:06.3887190Z 2023-01-11T21:38:06.3887345Z triton_fused_glu_0 = async_compile.triton(''' 2023-01-11T21:38:06.3887426Z import triton 2023-01-11T21:38:06.3887519Z import triton.language as tl 2023-01-11T21:38:06.3887633Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3887730Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3887863Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3887989Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3887995Z 2023-01-11T21:38:06.3888397Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3888473Z @triton.jit 2023-01-11T21:38:06.3888606Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3888683Z xnumel = 4096 2023-01-11T21:38:06.3888787Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3888936Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3889020Z xmask = xindex < xnumel 2023-01-11T21:38:06.3889096Z x0 = xindex % 4 2023-01-11T21:38:06.3889175Z x1 = (xindex // 4) 2023-01-11T21:38:06.3889248Z x2 = xindex 2023-01-11T21:38:06.3889474Z tmp0 = tl.load(in_ptr0 + (x0 + (8*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3889702Z tmp1 = tl.load(in_ptr0 + (4 + x0 + (8*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3889783Z tmp2 = tl.sigmoid(tmp1) 2023-01-11T21:38:06.3889864Z tmp3 = tmp0 * tmp2 2023-01-11T21:38:06.3890001Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3890090Z ''') 2023-01-11T21:38:06.3890095Z 2023-01-11T21:38:06.3890100Z 2023-01-11T21:38:06.3890260Z triton_fused_glu_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.3890335Z import triton 2023-01-11T21:38:06.3890432Z import triton.language as tl 2023-01-11T21:38:06.3890543Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3890646Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3890781Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3890908Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3890913Z 2023-01-11T21:38:06.3891317Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3891392Z @triton.jit 2023-01-11T21:38:06.3891530Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3891638Z xnumel = 4096 2023-01-11T21:38:06.3891730Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3891859Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3891942Z xmask = xindex < xnumel 2023-01-11T21:38:06.3892022Z x0 = xindex % 512 2023-01-11T21:38:06.3892107Z x1 = (xindex // 512) 2023-01-11T21:38:06.3892180Z x2 = xindex 2023-01-11T21:38:06.3892411Z tmp0 = 
tl.load(in_ptr0 + (x0 + (1024*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3892644Z tmp1 = tl.load(in_ptr0 + (512 + x0 + (1024*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3892725Z tmp2 = tl.sigmoid(tmp1) 2023-01-11T21:38:06.3892807Z tmp3 = tmp0 * tmp2 2023-01-11T21:38:06.3892941Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3893028Z ''') 2023-01-11T21:38:06.3893033Z 2023-01-11T21:38:06.3893038Z 2023-01-11T21:38:06.3893196Z triton_fused_glu_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.3893270Z import triton 2023-01-11T21:38:06.3893365Z import triton.language as tl 2023-01-11T21:38:06.3893473Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3893575Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3893710Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3893838Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3893843Z 2023-01-11T21:38:06.3894240Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3894317Z @triton.jit 2023-01-11T21:38:06.3894454Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3894643Z xnumel = 4096 2023-01-11T21:38:06.3894736Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3894870Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3894956Z xmask = xindex < xnumel 2023-01-11T21:38:06.3895035Z x0 = xindex % 32 2023-01-11T21:38:06.3895115Z x1 = (xindex // 32) 2023-01-11T21:38:06.3895187Z x2 = xindex 2023-01-11T21:38:06.3895357Z tmp0 = tl.load(in_ptr0 + (x0 + (64*x1)), xmask).to(tl.float32) 2023-01-11T21:38:06.3895481Z tmp1 = tl.load(in_ptr0 + (32 + x0 + (64*x1)), xmask).to(tl.float32) 2023-01-11T21:38:06.3895567Z tmp2 = tl.sigmoid(tmp1) 2023-01-11T21:38:06.3895648Z tmp3 = tmp0 * tmp2 2023-01-11T21:38:06.3895786Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3895897Z ''') 2023-01-11T21:38:06.3895903Z 2023-01-11T21:38:06.3895907Z 2023-01-11T21:38:06.3896004Z async_compile.wait(globals()) 2023-01-11T21:38:06.3896085Z del async_compile 2023-01-11T21:38:06.3896090Z 2023-01-11T21:38:06.3896166Z def call(args): 2023-01-11T21:38:06.3896234Z arg0_1, = args 2023-01-11T21:38:06.3896314Z args.clear() 2023-01-11T21:38:06.3896408Z with torch.cuda.device(0): 2023-01-11T21:38:06.3896628Z buf0 = empty_strided((8, 16, 8, 4), (512, 32, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3896727Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3896882Z triton_fused_glu_0.run(arg0_1, buf0, 4096, grid=grid(4096), stream=stream0) 2023-01-11T21:38:06.3897172Z buf1 = empty_strided((8, 8, 8, 8), (512, 64, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3897312Z triton_fused_glu_1_1.run(arg0_1, buf1, 4096, grid=grid(4096), stream=stream0) 2023-01-11T21:38:06.3897547Z buf2 = empty_strided((8, 16, 4, 8), (512, 32, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3897687Z triton_fused_glu_2_2.run(arg0_1, buf2, 4096, grid=grid(4096), stream=stream0) 2023-01-11T21:38:06.3897763Z del arg0_1 2023-01-11T21:38:06.3897851Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3897895Z 2023-01-11T21:38:06.3897900Z 2023-01-11T21:38:06.3897983Z if __name__ == 
"__main__": 2023-01-11T21:38:06.3898102Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3898232Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3898448Z arg0_1 = rand_strided((8, 16, 8, 8), (1024, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3898562Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3898567Z 2023-01-11T21:38:06.3898639Z ok (0.308s) 2023-01-11T21:38:06.3899102Z test_grid_sampler_2d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3899236Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3899499Z [2023-01-11 21:34:43,850] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 571 2023-01-11T21:38:06.3899711Z [2023-01-11 21:34:44,454] torch._inductor.scheduler: [DEBUG] remove_buffer('buf3') 2023-01-11T21:38:06.3899917Z [2023-01-11 21:34:44,454] torch._inductor.scheduler: [DEBUG] remove_buffer('buf7') 2023-01-11T21:38:06.3900118Z [2023-01-11 21:34:44,454] torch._inductor.scheduler: [DEBUG] remove_buffer('buf5') 2023-01-11T21:38:06.3900124Z 2023-01-11T21:38:06.3900224Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3900295Z import torch 2023-01-11T21:38:06.3900371Z import random 2023-01-11T21:38:06.3900492Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3900614Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3900619Z 2023-01-11T21:38:06.3900704Z aten = torch.ops.aten 2023-01-11T21:38:06.3900843Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3900941Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3900947Z 2023-01-11T21:38:06.3901016Z import triton 2023-01-11T21:38:06.3901112Z import triton.language as tl 2023-01-11T21:38:06.3901240Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3901410Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3901416Z 2023-01-11T21:38:06.3901421Z 2023-01-11T21:38:06.3901718Z triton_fused_add_add_1_add_10_add_2_add_3_add_7_add_8_add_9_convert_element_type_10_convert_element_type_11_0 = async_compile.triton(''' 2023-01-11T21:38:06.3901796Z import triton 2023-01-11T21:38:06.3901892Z import triton.language as tl 2023-01-11T21:38:06.3902008Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3902106Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3902239Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3902370Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3902378Z 2023-01-11T21:38:06.3903011Z @pointwise(size_hints=[524288], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*i64', 6: '*i64', 7: '*fp32', 8: '*i64', 9: '*i64', 10: '*fp32', 11: '*i64', 12: '*i64', 13: '*fp32', 14: '*i64', 15: '*i64', 16: '*fp32', 17: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17), equal_to_1=())]}) 2023-01-11T21:38:06.3903086Z @triton.jit 
2023-01-11T21:38:06.3903339Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, out_ptr9, out_ptr10, out_ptr11, out_ptr12, out_ptr13, out_ptr14, out_ptr15, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3903416Z xnumel = 495616 2023-01-11T21:38:06.3903517Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3903647Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3903798Z xmask = xindex < xnumel 2023-01-11T21:38:06.3903864Z x0 = xindex 2023-01-11T21:38:06.3904059Z tmp0 = tl.load(in_ptr0 + (2*x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3904258Z tmp9 = tl.load(in_ptr0 + (1 + (2*x0)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3904363Z tmp99 = tl.load(in_ptr0 + (2*x0), xmask) 2023-01-11T21:38:06.3904468Z tmp108 = tl.load(in_ptr0 + (1 + (2*x0)), xmask) 2023-01-11T21:38:06.3904544Z tmp1 = 175.5 2023-01-11T21:38:06.3904624Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.3904698Z tmp3 = tmp2 + tmp1 2023-01-11T21:38:06.3904801Z tmp4 = tl.libdevice.floor(tmp3) 2023-01-11T21:38:06.3904874Z tmp5 = 0 2023-01-11T21:38:06.3904954Z tmp6 = tmp4 >= tmp5 2023-01-11T21:38:06.3905029Z tmp7 = 352 2023-01-11T21:38:06.3905108Z tmp8 = tmp4 < tmp7 2023-01-11T21:38:06.3905181Z tmp10 = tmp9 * tmp1 2023-01-11T21:38:06.3905265Z tmp11 = tmp10 + tmp1 2023-01-11T21:38:06.3905375Z tmp12 = tl.libdevice.floor(tmp11) 2023-01-11T21:38:06.3905459Z tmp13 = tmp12 >= tmp5 2023-01-11T21:38:06.3905539Z tmp14 = tmp12 < tmp7 2023-01-11T21:38:06.3905637Z tmp15 = tmp13 & tmp14 2023-01-11T21:38:06.3905722Z tmp16 = tmp8 & tmp15 2023-01-11T21:38:06.3905812Z tmp17 = tmp6 & tmp16 2023-01-11T21:38:06.3905894Z tmp18 = 1 2023-01-11T21:38:06.3905976Z tmp19 = tmp4 + tmp18 2023-01-11T21:38:06.3906090Z tmp20 = tmp19 - tmp3 2023-01-11T21:38:06.3906170Z tmp21 = tmp12 + tmp18 2023-01-11T21:38:06.3906281Z tmp22 = tmp21 - tmp11 2023-01-11T21:38:06.3906361Z tmp23 = tmp20 * tmp22 2023-01-11T21:38:06.3906456Z tmp24 = tl.where(tmp17, tmp23, tmp5) 2023-01-11T21:38:06.3906538Z tmp25 = tmp19 >= tmp5 2023-01-11T21:38:06.3906616Z tmp26 = tmp19 < tmp7 2023-01-11T21:38:06.3906695Z tmp27 = tmp26 & tmp15 2023-01-11T21:38:06.3906777Z tmp28 = tmp25 & tmp27 2023-01-11T21:38:06.3906886Z tmp29 = tmp3 - tmp4 2023-01-11T21:38:06.3906960Z tmp30 = tmp29 * tmp22 2023-01-11T21:38:06.3907064Z tmp31 = tl.where(tmp28, tmp30, tmp5) 2023-01-11T21:38:06.3907144Z tmp32 = tmp21 >= tmp5 2023-01-11T21:38:06.3907223Z tmp33 = tmp21 < tmp7 2023-01-11T21:38:06.3907303Z tmp34 = tmp32 & tmp33 2023-01-11T21:38:06.3907384Z tmp35 = tmp8 & tmp34 2023-01-11T21:38:06.3907466Z tmp36 = tmp6 & tmp35 2023-01-11T21:38:06.3907601Z tmp37 = tmp11 - tmp12 2023-01-11T21:38:06.3907682Z tmp38 = tmp20 * tmp37 2023-01-11T21:38:06.3907778Z tmp39 = tl.where(tmp36, tmp38, tmp5) 2023-01-11T21:38:06.3907857Z tmp40 = tmp26 & tmp34 2023-01-11T21:38:06.3907937Z tmp41 = tmp25 & tmp40 2023-01-11T21:38:06.3908015Z tmp42 = tmp29 * tmp37 2023-01-11T21:38:06.3908113Z tmp43 = tl.where(tmp41, tmp42, tmp5) 2023-01-11T21:38:06.3908182Z tmp44 = 176.0 2023-01-11T21:38:06.3908263Z tmp45 = tmp0 * tmp44 2023-01-11T21:38:06.3908342Z tmp46 = tmp45 + tmp1 2023-01-11T21:38:06.3908413Z tmp47 = 0.0 2023-01-11T21:38:06.3908555Z tmp48 = tl.where(tmp46 != tmp46, tmp46, tl.where(tmp46 > tmp47, tmp46, tmp47)) 2023-01-11T21:38:06.3908634Z tmp49 = 351.0 2023-01-11T21:38:06.3908777Z tmp50 = tl.where(tmp48 != tmp48, tmp48, tl.where(tmp48 < tmp49, tmp48, tmp49)) 2023-01-11T21:38:06.3908872Z tmp51 = tl.libdevice.floor(tmp50) 
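# [annotation, not TorchInductor output] The constants above appear to
# encode grid_sample coordinate unnormalization for a 352-wide axis:
#   x * 175.5 + 175.5 matches the align_corners=True form
#       ((x + 1) / 2) * (352 - 1),
#   x * 176.0 + 175.5 matches the align_corners=False form
#       ((x + 1) * 352 - 1) / 2.
# The nested tl.where(a != a, a, ...) pairs are NaN-propagating min/max
# that clamp coordinates to [0.0, 351.0], while the mask-and-zero
# tl.where selects earlier in the kernel zero out-of-bounds taps --
# consistent with "border"-style versus "zeros"-style padding for the
# two outputs this test produces.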
2023-01-11T21:38:06.3908954Z tmp52 = tmp51 >= tmp5 2023-01-11T21:38:06.3909036Z tmp53 = tmp51 < tmp7 2023-01-11T21:38:06.3909116Z tmp54 = tmp9 * tmp44 2023-01-11T21:38:06.3909197Z tmp55 = tmp54 + tmp1 2023-01-11T21:38:06.3909335Z tmp56 = tl.where(tmp55 != tmp55, tmp55, tl.where(tmp55 > tmp47, tmp55, tmp47)) 2023-01-11T21:38:06.3909470Z tmp57 = tl.where(tmp56 != tmp56, tmp56, tl.where(tmp56 < tmp49, tmp56, tmp49)) 2023-01-11T21:38:06.3909564Z tmp58 = tl.libdevice.floor(tmp57) 2023-01-11T21:38:06.3909646Z tmp59 = tmp58 >= tmp5 2023-01-11T21:38:06.3909727Z tmp60 = tmp58 < tmp7 2023-01-11T21:38:06.3909808Z tmp61 = tmp59 & tmp60 2023-01-11T21:38:06.3909888Z tmp62 = tmp53 & tmp61 2023-01-11T21:38:06.3910005Z tmp63 = tmp52 & tmp62 2023-01-11T21:38:06.3910098Z tmp64 = tmp51.to(tl.int64) 2023-01-11T21:38:06.3910191Z tmp65 = tl.where(tmp63, tmp64, tmp5) 2023-01-11T21:38:06.3910279Z tmp66 = tmp58.to(tl.int64) 2023-01-11T21:38:06.3910377Z tmp67 = tl.where(tmp63, tmp66, tmp5) 2023-01-11T21:38:06.3910460Z tmp68 = tmp51 + tmp18 2023-01-11T21:38:06.3910577Z tmp69 = tmp68 - tmp50 2023-01-11T21:38:06.3910658Z tmp70 = tmp58 + tmp18 2023-01-11T21:38:06.3910772Z tmp71 = tmp70 - tmp57 2023-01-11T21:38:06.3910846Z tmp72 = tmp69 * tmp71 2023-01-11T21:38:06.3910945Z tmp73 = tl.where(tmp63, tmp72, tmp5) 2023-01-11T21:38:06.3911026Z tmp74 = tmp68 >= tmp5 2023-01-11T21:38:06.3911105Z tmp75 = tmp68 < tmp7 2023-01-11T21:38:06.3911185Z tmp76 = tmp75 & tmp61 2023-01-11T21:38:06.3911266Z tmp77 = tmp74 & tmp76 2023-01-11T21:38:06.3911348Z tmp78 = tmp68.to(tl.int64) 2023-01-11T21:38:06.3911450Z tmp79 = tl.where(tmp77, tmp78, tmp5) 2023-01-11T21:38:06.3911548Z tmp80 = tl.where(tmp77, tmp66, tmp5) 2023-01-11T21:38:06.3911667Z tmp81 = tmp50 - tmp51 2023-01-11T21:38:06.3911746Z tmp82 = tmp81 * tmp71 2023-01-11T21:38:06.3911844Z tmp83 = tl.where(tmp77, tmp82, tmp5) 2023-01-11T21:38:06.3911925Z tmp84 = tmp70 >= tmp5 2023-01-11T21:38:06.3911999Z tmp85 = tmp70 < tmp7 2023-01-11T21:38:06.3912081Z tmp86 = tmp84 & tmp85 2023-01-11T21:38:06.3912159Z tmp87 = tmp53 & tmp86 2023-01-11T21:38:06.3912239Z tmp88 = tmp52 & tmp87 2023-01-11T21:38:06.3912336Z tmp89 = tl.where(tmp88, tmp64, tmp5) 2023-01-11T21:38:06.3912423Z tmp90 = tmp70.to(tl.int64) 2023-01-11T21:38:06.3912522Z tmp91 = tl.where(tmp88, tmp90, tmp5) 2023-01-11T21:38:06.3912628Z tmp92 = tmp57 - tmp58 2023-01-11T21:38:06.3912710Z tmp93 = tmp69 * tmp92 2023-01-11T21:38:06.3912807Z tmp94 = tl.where(tmp88, tmp93, tmp5) 2023-01-11T21:38:06.3912885Z tmp95 = tmp75 & tmp86 2023-01-11T21:38:06.3912963Z tmp96 = tmp74 & tmp95 2023-01-11T21:38:06.3913058Z tmp97 = tl.where(tmp96, tmp78, tmp5) 2023-01-11T21:38:06.3913159Z tmp98 = tl.where(tmp96, tmp90, tmp5) 2023-01-11T21:38:06.3913236Z tmp100 = tmp99 * tmp44 2023-01-11T21:38:06.3913319Z tmp101 = tmp100 + tmp1 2023-01-11T21:38:06.3913470Z tmp102 = tl.where(tmp101 != tmp101, tmp101, tl.where(tmp101 > tmp47, tmp101, tmp47)) 2023-01-11T21:38:06.3913642Z tmp103 = tl.where(tmp102 != tmp102, tmp102, tl.where(tmp102 < tmp49, tmp102, tmp49)) 2023-01-11T21:38:06.3913746Z tmp104 = tl.libdevice.floor(tmp103) 2023-01-11T21:38:06.3913830Z tmp105 = tmp104 + tmp18 2023-01-11T21:38:06.3913912Z tmp106 = tmp105 >= tmp5 2023-01-11T21:38:06.3913989Z tmp107 = tmp105 < tmp7 2023-01-11T21:38:06.3914074Z tmp109 = tmp108 * tmp44 2023-01-11T21:38:06.3914157Z tmp110 = tmp109 + tmp1 2023-01-11T21:38:06.3914300Z tmp111 = tl.where(tmp110 != tmp110, tmp110, tl.where(tmp110 > tmp47, tmp110, tmp47)) 2023-01-11T21:38:06.3914442Z tmp112 = tl.where(tmp111 != tmp111, tmp111, 
tl.where(tmp111 < tmp49, tmp111, tmp49)) 2023-01-11T21:38:06.3914549Z tmp113 = tl.libdevice.floor(tmp112) 2023-01-11T21:38:06.3914632Z tmp114 = tmp113 + tmp18 2023-01-11T21:38:06.3914709Z tmp115 = tmp114 >= tmp5 2023-01-11T21:38:06.3914791Z tmp116 = tmp114 < tmp7 2023-01-11T21:38:06.3914872Z tmp117 = tmp115 & tmp116 2023-01-11T21:38:06.3914957Z tmp118 = tmp107 & tmp117 2023-01-11T21:38:06.3915041Z tmp119 = tmp106 & tmp118 2023-01-11T21:38:06.3915164Z tmp120 = tmp103 - tmp104 2023-01-11T21:38:06.3915282Z tmp121 = tmp112 - tmp113 2023-01-11T21:38:06.3915357Z tmp122 = tmp120 * tmp121 2023-01-11T21:38:06.3915471Z tmp123 = tl.where(tmp119, tmp122, tmp5) 2023-01-11T21:38:06.3915632Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp24, xmask) 2023-01-11T21:38:06.3915786Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp31, xmask) 2023-01-11T21:38:06.3915917Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp39, xmask) 2023-01-11T21:38:06.3916048Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp43, xmask) 2023-01-11T21:38:06.3916206Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp65, xmask) 2023-01-11T21:38:06.3916328Z tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp67, xmask) 2023-01-11T21:38:06.3916455Z tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp73, xmask) 2023-01-11T21:38:06.3916583Z tl.store(out_ptr7 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp79, xmask) 2023-01-11T21:38:06.3916714Z tl.store(out_ptr8 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp80, xmask) 2023-01-11T21:38:06.3916842Z tl.store(out_ptr9 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp83, xmask) 2023-01-11T21:38:06.3916974Z tl.store(out_ptr10 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp89, xmask) 2023-01-11T21:38:06.3917109Z tl.store(out_ptr11 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp91, xmask) 2023-01-11T21:38:06.3917238Z tl.store(out_ptr12 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp94, xmask) 2023-01-11T21:38:06.3917363Z tl.store(out_ptr13 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp97, xmask) 2023-01-11T21:38:06.3917489Z tl.store(out_ptr14 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp98, xmask) 2023-01-11T21:38:06.3917625Z tl.store(out_ptr15 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp123, xmask) 2023-01-11T21:38:06.3917715Z ''') 2023-01-11T21:38:06.3917721Z 2023-01-11T21:38:06.3917727Z 2023-01-11T21:38:06.3917933Z triton_fused_add_6_index_index_1_index_2_index_3_1 = async_compile.triton(''' 2023-01-11T21:38:06.3918010Z import triton 2023-01-11T21:38:06.3918104Z import triton.language as tl 2023-01-11T21:38:06.3918221Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3918319Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3918453Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3918580Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3918585Z 2023-01-11T21:38:06.3919081Z @pointwise(size_hints=[2097152], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3919160Z @triton.jit 2023-01-11T21:38:06.3919369Z def triton_(in_out_ptr0, in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, in_ptr5, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3919451Z xnumel = 1486848 2023-01-11T21:38:06.3919550Z xoffset = tl.program_id(0) * XBLOCK 
2023-01-11T21:38:06.3919682Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3919760Z xmask = xindex < xnumel 2023-01-11T21:38:06.3919844Z x2 = (xindex // 371712) 2023-01-11T21:38:06.3919927Z x1 = (xindex // 123904) % 3 2023-01-11T21:38:06.3920008Z x0 = xindex % 123904 2023-01-11T21:38:06.3920081Z x3 = xindex 2023-01-11T21:38:06.3920299Z tmp2 = tl.load(in_ptr0 + ((2*x0) + (247808*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3920522Z tmp11 = tl.load(in_ptr0 + (1 + (2*x0) + (247808*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3920632Z tmp45 = tl.load(in_ptr0 + ((2*x0) + (247808*x2)), xmask) 2023-01-11T21:38:06.3920747Z tmp52 = tl.load(in_ptr0 + (1 + (2*x0) + (247808*x2)), xmask) 2023-01-11T21:38:06.3920862Z tmp67 = tl.load(in_ptr2 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3920975Z tmp69 = tl.load(in_ptr3 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3921085Z tmp72 = tl.load(in_ptr4 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3921194Z tmp75 = tl.load(in_ptr5 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3921271Z tmp0 = x2 2023-01-11T21:38:06.3921338Z tmp1 = x1 2023-01-11T21:38:06.3921413Z tmp3 = 175.5 2023-01-11T21:38:06.3921492Z tmp4 = tmp2 * tmp3 2023-01-11T21:38:06.3921573Z tmp5 = tmp4 + tmp3 2023-01-11T21:38:06.3921675Z tmp6 = tl.libdevice.floor(tmp5) 2023-01-11T21:38:06.3921748Z tmp7 = 0 2023-01-11T21:38:06.3921859Z tmp8 = tmp6 >= tmp7 2023-01-11T21:38:06.3921928Z tmp9 = 352 2023-01-11T21:38:06.3922008Z tmp10 = tmp6 < tmp9 2023-01-11T21:38:06.3922092Z tmp12 = tmp11 * tmp3 2023-01-11T21:38:06.3922171Z tmp13 = tmp12 + tmp3 2023-01-11T21:38:06.3922274Z tmp14 = tl.libdevice.floor(tmp13) 2023-01-11T21:38:06.3922358Z tmp15 = tmp14 >= tmp7 2023-01-11T21:38:06.3922433Z tmp16 = tmp14 < tmp9 2023-01-11T21:38:06.3922512Z tmp17 = tmp15 & tmp16 2023-01-11T21:38:06.3922592Z tmp18 = tmp10 & tmp17 2023-01-11T21:38:06.3922671Z tmp19 = tmp8 & tmp18 2023-01-11T21:38:06.3922759Z tmp20 = tmp14.to(tl.int64) 2023-01-11T21:38:06.3922860Z tmp21 = tl.where(tmp19, tmp20, tmp7) 2023-01-11T21:38:06.3922950Z tmp22 = tmp6.to(tl.int64) 2023-01-11T21:38:06.3923044Z tmp23 = tl.where(tmp19, tmp22, tmp7) 2023-01-11T21:38:06.3923301Z tmp24 = tl.load(in_ptr1 + (tmp23 + (352*tmp21) + (123904*tmp1) + (371712*tmp0)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3923377Z tmp25 = 1 2023-01-11T21:38:06.3923458Z tmp26 = tmp6 + tmp25 2023-01-11T21:38:06.3923539Z tmp27 = tmp26 >= tmp7 2023-01-11T21:38:06.3923618Z tmp28 = tmp26 < tmp9 2023-01-11T21:38:06.3923696Z tmp29 = tmp28 & tmp17 2023-01-11T21:38:06.3923770Z tmp30 = tmp27 & tmp29 2023-01-11T21:38:06.3923869Z tmp31 = tl.where(tmp30, tmp20, tmp7) 2023-01-11T21:38:06.3923964Z tmp32 = tmp26.to(tl.int64) 2023-01-11T21:38:06.3924063Z tmp33 = tl.where(tmp30, tmp32, tmp7) 2023-01-11T21:38:06.3924317Z tmp34 = tl.load(in_ptr1 + (tmp33 + (352*tmp31) + (123904*tmp1) + (371712*tmp0)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3924403Z tmp35 = tmp14 + tmp25 2023-01-11T21:38:06.3924484Z tmp36 = tmp35 >= tmp7 2023-01-11T21:38:06.3924558Z tmp37 = tmp35 < tmp9 2023-01-11T21:38:06.3924639Z tmp38 = tmp36 & tmp37 2023-01-11T21:38:06.3924720Z tmp39 = tmp10 & tmp38 2023-01-11T21:38:06.3924801Z tmp40 = tmp8 & tmp39 2023-01-11T21:38:06.3924890Z tmp41 = tmp35.to(tl.int64) 2023-01-11T21:38:06.3924991Z tmp42 = tl.where(tmp40, tmp41, tmp7) 2023-01-11T21:38:06.3925091Z tmp43 = tl.where(tmp40, tmp22, tmp7) 2023-01-11T21:38:06.3925334Z tmp44 = tl.load(in_ptr1 + (tmp43 + (352*tmp42) + (123904*tmp1) + 
(371712*tmp0)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3925417Z tmp46 = tmp45 * tmp3 2023-01-11T21:38:06.3925526Z tmp47 = tmp46 + tmp3 2023-01-11T21:38:06.3925630Z tmp48 = tl.libdevice.floor(tmp47) 2023-01-11T21:38:06.3925711Z tmp49 = tmp48 + tmp25 2023-01-11T21:38:06.3925792Z tmp50 = tmp49 >= tmp7 2023-01-11T21:38:06.3925874Z tmp51 = tmp49 < tmp9 2023-01-11T21:38:06.3925948Z tmp53 = tmp52 * tmp3 2023-01-11T21:38:06.3926029Z tmp54 = tmp53 + tmp3 2023-01-11T21:38:06.3926131Z tmp55 = tl.libdevice.floor(tmp54) 2023-01-11T21:38:06.3926215Z tmp56 = tmp55 + tmp25 2023-01-11T21:38:06.3926296Z tmp57 = tmp56 >= tmp7 2023-01-11T21:38:06.3926377Z tmp58 = tmp56 < tmp9 2023-01-11T21:38:06.3926456Z tmp59 = tmp57 & tmp58 2023-01-11T21:38:06.3926534Z tmp60 = tmp51 & tmp59 2023-01-11T21:38:06.3926612Z tmp61 = tmp50 & tmp60 2023-01-11T21:38:06.3926698Z tmp62 = tmp56.to(tl.int64) 2023-01-11T21:38:06.3926798Z tmp63 = tl.where(tmp61, tmp62, tmp7) 2023-01-11T21:38:06.3926888Z tmp64 = tmp49.to(tl.int64) 2023-01-11T21:38:06.3926989Z tmp65 = tl.where(tmp61, tmp64, tmp7) 2023-01-11T21:38:06.3927129Z tmp66 = tl.load(in_ptr1 + (tmp65 + (352*tmp63) + (123904*tmp1) + (371712*tmp0)), None) 2023-01-11T21:38:06.3927205Z tmp68 = tmp24 * tmp67 2023-01-11T21:38:06.3927288Z tmp70 = tmp34 * tmp69 2023-01-11T21:38:06.3927369Z tmp71 = tmp68 + tmp70 2023-01-11T21:38:06.3927450Z tmp73 = tmp44 * tmp72 2023-01-11T21:38:06.3927530Z tmp74 = tmp71 + tmp73 2023-01-11T21:38:06.3927608Z tmp76 = tmp66 * tmp75 2023-01-11T21:38:06.3927688Z tmp77 = tmp74 + tmp76 2023-01-11T21:38:06.3927824Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp77, xmask) 2023-01-11T21:38:06.3927911Z ''') 2023-01-11T21:38:06.3927943Z 2023-01-11T21:38:06.3927948Z 2023-01-11T21:38:06.3928200Z triton_fused_add_10_add_11_add_12_add_13_add_7_add_8_add_9_floor_2_floor_3_ge_10_2 = async_compile.triton(''' 2023-01-11T21:38:06.3928278Z import triton 2023-01-11T21:38:06.3928373Z import triton.language as tl 2023-01-11T21:38:06.3928493Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3928597Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3928726Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3928852Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3928857Z 2023-01-11T21:38:06.3929451Z @pointwise(size_hints=[2097152], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i64', 2: '*i64', 3: '*fp32', 4: '*fp32', 5: '*i64', 6: '*i64', 7: '*fp32', 8: '*i64', 9: '*i64', 10: '*fp32', 11: '*i64', 12: '*i64', 13: '*fp32', 14: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), equal_to_1=())]}) 2023-01-11T21:38:06.3929529Z @triton.jit 2023-01-11T21:38:06.3929753Z def triton_(in_out_ptr0, in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, in_ptr5, in_ptr6, in_ptr7, in_ptr8, in_ptr9, in_ptr10, in_ptr11, in_ptr12, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3929835Z xnumel = 1486848 2023-01-11T21:38:06.3929935Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3930066Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3930152Z xmask = xindex < xnumel 2023-01-11T21:38:06.3930229Z x2 = (xindex // 371712) 2023-01-11T21:38:06.3930314Z x1 = (xindex // 123904) % 3 2023-01-11T21:38:06.3930394Z x0 = xindex % 123904 2023-01-11T21:38:06.3930466Z x3 = xindex 2023-01-11T21:38:06.3930577Z tmp2 = tl.load(in_ptr0 + (x0 + 
(123904*x2)), xmask) 2023-01-11T21:38:06.3930689Z tmp3 = tl.load(in_ptr1 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3930797Z tmp5 = tl.load(in_ptr3 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3930901Z tmp7 = tl.load(in_ptr4 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931007Z tmp8 = tl.load(in_ptr5 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931119Z tmp10 = tl.load(in_ptr6 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931254Z tmp13 = tl.load(in_ptr7 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931360Z tmp14 = tl.load(in_ptr8 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931465Z tmp16 = tl.load(in_ptr9 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931573Z tmp19 = tl.load(in_ptr10 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931681Z tmp20 = tl.load(in_ptr11 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931781Z tmp22 = tl.load(in_ptr12 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931852Z tmp0 = x2 2023-01-11T21:38:06.3931923Z tmp1 = x1 2023-01-11T21:38:06.3932175Z tmp4 = tl.load(in_ptr2 + (tmp3 + (352*tmp2) + (123904*tmp1) + (371712*tmp0)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3932257Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.3932504Z tmp9 = tl.load(in_ptr2 + (tmp8 + (352*tmp7) + (123904*tmp1) + (371712*tmp0)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3932586Z tmp11 = tmp9 * tmp10 2023-01-11T21:38:06.3932658Z tmp12 = tmp6 + tmp11 2023-01-11T21:38:06.3932911Z tmp15 = tl.load(in_ptr2 + (tmp14 + (352*tmp13) + (123904*tmp1) + (371712*tmp0)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3932993Z tmp17 = tmp15 * tmp16 2023-01-11T21:38:06.3933073Z tmp18 = tmp12 + tmp17 2023-01-11T21:38:06.3933208Z tmp21 = tl.load(in_ptr2 + (tmp20 + (352*tmp19) + (123904*tmp1) + (371712*tmp0)), xmask) 2023-01-11T21:38:06.3933286Z tmp23 = tmp21 * tmp22 2023-01-11T21:38:06.3933363Z tmp24 = tmp18 + tmp23 2023-01-11T21:38:06.3933494Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp24, xmask) 2023-01-11T21:38:06.3933576Z ''') 2023-01-11T21:38:06.3933582Z 2023-01-11T21:38:06.3933615Z 2023-01-11T21:38:06.3933709Z async_compile.wait(globals()) 2023-01-11T21:38:06.3933784Z del async_compile 2023-01-11T21:38:06.3933789Z 2023-01-11T21:38:06.3933862Z def call(args): 2023-01-11T21:38:06.3933940Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3934013Z args.clear() 2023-01-11T21:38:06.3934103Z with torch.cuda.device(0): 2023-01-11T21:38:06.3934318Z buf0 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3934661Z buf2 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3934882Z buf4 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3935097Z buf6 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3935314Z buf9 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3935528Z buf10 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3935751Z buf11 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3935961Z buf12 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3936179Z buf13 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3936390Z buf14 = empty_strided((4, 352, 
352), (123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3936602Z buf15 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3936814Z buf16 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3937029Z buf17 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3937309Z buf19 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3937525Z buf20 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3937738Z buf21 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3937879Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3938162Z triton_fused_add_add_1_add_10_add_2_add_3_add_7_add_8_add_9_convert_element_type_10_convert_element_type_11_0.run(arg1_1, buf0, buf2, buf4, buf6, buf9, buf10, buf11, buf12, buf13, buf14, buf15, buf16, buf17, buf19, buf20, buf21, 495616, grid=grid(495616), stream=stream0) 2023-01-11T21:38:06.3938388Z buf1 = empty_strided((4, 3, 352, 352), (371712, 123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3938479Z buf8 = buf1; del buf1 # reuse 2023-01-11T21:38:06.3938667Z triton_fused_add_6_index_index_1_index_2_index_3_1.run(buf8, arg1_1, arg0_1, buf0, buf2, buf4, buf6, 1486848, grid=grid(1486848), stream=stream0) 2023-01-11T21:38:06.3938745Z del arg1_1 2023-01-11T21:38:06.3938816Z del buf0 2023-01-11T21:38:06.3938886Z del buf2 2023-01-11T21:38:06.3938955Z del buf4 2023-01-11T21:38:06.3939017Z del buf6 2023-01-11T21:38:06.3939256Z buf18 = empty_strided((4, 3, 352, 352), (371712, 123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3939349Z buf22 = buf18; del buf18 # reuse 2023-01-11T21:38:06.3939582Z triton_fused_add_10_add_11_add_12_add_13_add_7_add_8_add_9_floor_2_floor_3_ge_10_2.run(buf22, buf10, buf9, arg0_1, buf11, buf13, buf12, buf14, buf16, buf15, buf17, buf20, buf19, buf21, 1486848, grid=grid(1486848), stream=stream0) 2023-01-11T21:38:06.3939657Z del arg0_1 2023-01-11T21:38:06.3939742Z return (buf8, buf22, ) 2023-01-11T21:38:06.3939747Z 2023-01-11T21:38:06.3939751Z 2023-01-11T21:38:06.3939829Z if __name__ == "__main__": 2023-01-11T21:38:06.3939989Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3940118Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3940346Z arg0_1 = rand_strided((4, 3, 352, 352), (371712, 123904, 352, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3940611Z arg1_1 = rand_strided((4, 352, 352, 2), (247808, 704, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3940739Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3941046Z [2023-01-11 21:34:44,927] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 571 2023-01-11T21:38:06.3941052Z 2023-01-11T21:38:06.3941127Z ok (2.432s) 2023-01-11T21:38:06.3941670Z test_hardsigmoid_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3941816Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3942110Z [2023-01-11 21:34:44,978] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 572 2023-01-11T21:38:06.3942416Z [2023-01-11 21:34:45,071] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 572 2023-01-11T21:38:06.3942908Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3943051Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3943336Z [2023-01-11 21:34:45,222] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 573 2023-01-11T21:38:06.3943638Z [2023-01-11 21:34:45,326] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 573 2023-01-11T21:38:06.3943644Z 2023-01-11T21:38:06.3943749Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3943853Z import torch 2023-01-11T21:38:06.3943931Z import random 2023-01-11T21:38:06.3944064Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3944199Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3944204Z 2023-01-11T21:38:06.3944291Z aten = torch.ops.aten 2023-01-11T21:38:06.3944437Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3944538Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3944544Z 2023-01-11T21:38:06.3944621Z import triton 2023-01-11T21:38:06.3944720Z import triton.language as tl 2023-01-11T21:38:06.3944857Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3945016Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3945022Z 2023-01-11T21:38:06.3945026Z 2023-01-11T21:38:06.3945244Z triton_fused_div_div_1_div_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.3945328Z import triton 2023-01-11T21:38:06.3945443Z import triton.language as tl 2023-01-11T21:38:06.3945569Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3945680Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3945824Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3945968Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3945973Z 2023-01-11T21:38:06.3946491Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.3946605Z @triton.jit 2023-01-11T21:38:06.3946759Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3946829Z xnumel = 64 2023-01-11T21:38:06.3946927Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3947058Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3947146Z xmask = xindex < xnumel 2023-01-11T21:38:06.3947220Z x0 = xindex 2023-01-11T21:38:06.3947414Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 
2023-01-11T21:38:06.3947517Z tmp13 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3947583Z tmp1 = 3 2023-01-11T21:38:06.3947662Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3947736Z tmp3 = 0.0 2023-01-11T21:38:06.3947875Z tmp4 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp3, tmp2, tmp3)) 2023-01-11T21:38:06.3947949Z tmp5 = 6.0 2023-01-11T21:38:06.3948085Z tmp6 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp5, tmp4, tmp5)) 2023-01-11T21:38:06.3948160Z tmp7 = 6 2023-01-11T21:38:06.3948234Z tmp8 = tmp6 / tmp7 2023-01-11T21:38:06.3948312Z tmp9 = tmp2 + tmp1 2023-01-11T21:38:06.3948450Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp3, tmp9, tmp3)) 2023-01-11T21:38:06.3948596Z tmp11 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 < tmp5, tmp10, tmp5)) 2023-01-11T21:38:06.3948681Z tmp12 = tmp11 / tmp7 2023-01-11T21:38:06.3948796Z tmp14 = tmp13 - tmp1 2023-01-11T21:38:06.3948878Z tmp15 = tmp14 + tmp1 2023-01-11T21:38:06.3949013Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp3, tmp15, tmp3)) 2023-01-11T21:38:06.3949153Z tmp17 = tl.where(tmp16 != tmp16, tmp16, tl.where(tmp16 < tmp5, tmp16, tmp5)) 2023-01-11T21:38:06.3949233Z tmp18 = tmp17 / tmp7 2023-01-11T21:38:06.3949367Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.3949502Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.3949635Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.3949725Z ''') 2023-01-11T21:38:06.3949730Z 2023-01-11T21:38:06.3949735Z 2023-01-11T21:38:06.3949831Z async_compile.wait(globals()) 2023-01-11T21:38:06.3949903Z del async_compile 2023-01-11T21:38:06.3949908Z 2023-01-11T21:38:06.3949984Z def call(args): 2023-01-11T21:38:06.3950086Z arg0_1, = args 2023-01-11T21:38:06.3950164Z args.clear() 2023-01-11T21:38:06.3950257Z with torch.cuda.device(0): 2023-01-11T21:38:06.3950458Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3950657Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3950848Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3950941Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3951100Z triton_fused_div_div_1_div_2_0.run(arg0_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3951179Z del arg0_1 2023-01-11T21:38:06.3951267Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3951273Z 2023-01-11T21:38:06.3951277Z 2023-01-11T21:38:06.3951361Z if __name__ == "__main__": 2023-01-11T21:38:06.3951482Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3951612Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3951804Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3951917Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3951922Z 2023-01-11T21:38:06.3951926Z 2023-01-11T21:38:06.3952023Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3952099Z import torch 2023-01-11T21:38:06.3952175Z import random 2023-01-11T21:38:06.3952297Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3952421Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3952426Z 2023-01-11T21:38:06.3952510Z aten = torch.ops.aten 2023-01-11T21:38:06.3952680Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3952776Z async_compile = AsyncCompile() 
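# [illustration, not part of the log] The fp32 kernel above fuses three
# hardsigmoid evaluations, clamp(v + 3, 0, 6) / 6 for v in {x, x + 3,
# x - 3}; the clamp is spelled as the NaN-propagating idiom
# tl.where(a != a, a, tl.where(a > b, a, b)). A hedged eager-mode sketch
# (hypothetical name; the test body is not shown):
import torch
import torch.nn.functional as F

def hardsigmoid_triple(x):
    # buf0, buf1, buf2 in the call() above, each of shape (64,)
    return F.hardsigmoid(x), F.hardsigmoid(x + 3), F.hardsigmoid(x - 3)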
2023-01-11T21:38:06.3952781Z 2023-01-11T21:38:06.3952856Z import triton 2023-01-11T21:38:06.3952950Z import triton.language as tl 2023-01-11T21:38:06.3953076Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3953220Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3953225Z 2023-01-11T21:38:06.3953230Z 2023-01-11T21:38:06.3953494Z triton_fused_convert_element_type_1_convert_element_type_3_convert_element_type_5_0 = async_compile.triton(''' 2023-01-11T21:38:06.3953572Z import triton 2023-01-11T21:38:06.3953659Z import triton.language as tl 2023-01-11T21:38:06.3953776Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3953881Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3954015Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3954144Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3954152Z 2023-01-11T21:38:06.3954592Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.3954666Z @triton.jit 2023-01-11T21:38:06.3954820Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3954890Z xnumel = 64 2023-01-11T21:38:06.3954991Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3955143Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3955230Z xmask = xindex < xnumel 2023-01-11T21:38:06.3955320Z x0 = xindex 2023-01-11T21:38:06.3955541Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3955661Z tmp18 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3955746Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3955819Z tmp2 = 3 2023-01-11T21:38:06.3955897Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.3955972Z tmp4 = 0.0 2023-01-11T21:38:06.3956107Z tmp5 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp4, tmp3, tmp4)) 2023-01-11T21:38:06.3956180Z tmp6 = 6.0 2023-01-11T21:38:06.3956347Z tmp7 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 < tmp6, tmp5, tmp6)) 2023-01-11T21:38:06.3956414Z tmp8 = 6 2023-01-11T21:38:06.3956496Z tmp9 = tmp7 / tmp8 2023-01-11T21:38:06.3956586Z tmp10 = tmp9.to(tl.float32) 2023-01-11T21:38:06.3956669Z tmp11 = tmp0 + tmp2 2023-01-11T21:38:06.3956761Z tmp12 = tmp11.to(tl.float32) 2023-01-11T21:38:06.3956842Z tmp13 = tmp12 + tmp2 2023-01-11T21:38:06.3956985Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp4, tmp13, tmp4)) 2023-01-11T21:38:06.3957120Z tmp15 = tl.where(tmp14 != tmp14, tmp14, tl.where(tmp14 < tmp6, tmp14, tmp6)) 2023-01-11T21:38:06.3957206Z tmp16 = tmp15 / tmp8 2023-01-11T21:38:06.3957297Z tmp17 = tmp16.to(tl.float32) 2023-01-11T21:38:06.3957413Z tmp19 = tmp18 - tmp2 2023-01-11T21:38:06.3957503Z tmp20 = tmp19.to(tl.float32) 2023-01-11T21:38:06.3957583Z tmp21 = tmp20 + tmp2 2023-01-11T21:38:06.3957721Z tmp22 = tl.where(tmp21 != tmp21, tmp21, tl.where(tmp21 > tmp4, tmp21, tmp4)) 2023-01-11T21:38:06.3957857Z tmp23 = tl.where(tmp22 != tmp22, tmp22, tl.where(tmp22 < tmp6, tmp22, tmp6)) 2023-01-11T21:38:06.3957943Z tmp24 = tmp23 / tmp8 2023-01-11T21:38:06.3958037Z tmp25 = tmp24.to(tl.float32) 2023-01-11T21:38:06.3958172Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.3958306Z tl.store(out_ptr1 + 
(x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.3958436Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp25, xmask) 2023-01-11T21:38:06.3958524Z ''') 2023-01-11T21:38:06.3958530Z 2023-01-11T21:38:06.3958534Z 2023-01-11T21:38:06.3958658Z async_compile.wait(globals()) 2023-01-11T21:38:06.3958730Z del async_compile 2023-01-11T21:38:06.3958735Z 2023-01-11T21:38:06.3958812Z def call(args): 2023-01-11T21:38:06.3958885Z arg0_1, = args 2023-01-11T21:38:06.3958962Z args.clear() 2023-01-11T21:38:06.3959056Z with torch.cuda.device(0): 2023-01-11T21:38:06.3959259Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3959458Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3959648Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3959743Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3959955Z triton_fused_convert_element_type_1_convert_element_type_3_convert_element_type_5_0.run(arg0_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3960031Z del arg0_1 2023-01-11T21:38:06.3960120Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3960128Z 2023-01-11T21:38:06.3960132Z 2023-01-11T21:38:06.3960215Z if __name__ == "__main__": 2023-01-11T21:38:06.3960334Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3960462Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3960659Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3960771Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3960776Z 2023-01-11T21:38:06.3960849Z ok (0.397s) 2023-01-11T21:38:06.3961312Z test_hardswish_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3961444Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3961707Z [2023-01-11 21:34:45,379] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 574 2023-01-11T21:38:06.3961972Z [2023-01-11 21:34:45,480] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 574 2023-01-11T21:38:06.3962416Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3962552Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3962808Z [2023-01-11 21:34:45,636] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 575 2023-01-11T21:38:06.3963072Z [2023-01-11 21:34:45,743] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 575 2023-01-11T21:38:06.3963080Z 2023-01-11T21:38:06.3963175Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3963250Z import torch 2023-01-11T21:38:06.3963326Z import random 2023-01-11T21:38:06.3963448Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3963573Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3963578Z 2023-01-11T21:38:06.3963663Z aten = torch.ops.aten 2023-01-11T21:38:06.3963801Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3963893Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3963902Z 2023-01-11T21:38:06.3963972Z import triton 2023-01-11T21:38:06.3964066Z import triton.language as tl 2023-01-11T21:38:06.3964193Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3964333Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3964338Z 2023-01-11T21:38:06.3964343Z 2023-01-11T21:38:06.3964546Z triton_fused_div_div_1_div_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.3964623Z import triton 2023-01-11T21:38:06.3964717Z import triton.language as tl 2023-01-11T21:38:06.3964826Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3964930Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3965066Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3965192Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3965197Z 2023-01-11T21:38:06.3965632Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.3965710Z @triton.jit 2023-01-11T21:38:06.3965862Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3965935Z xnumel = 64 2023-01-11T21:38:06.3966032Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3966159Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3966246Z xmask = xindex < xnumel 2023-01-11T21:38:06.3966320Z x0 = xindex 2023-01-11T21:38:06.3966511Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3966617Z tmp15 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3966688Z tmp1 = 3 2023-01-11T21:38:06.3966763Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3966838Z tmp3 = 0.0 2023-01-11T21:38:06.3966976Z tmp4 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp3, tmp2, tmp3)) 2023-01-11T21:38:06.3967050Z tmp5 = 6.0 2023-01-11T21:38:06.3967186Z tmp6 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp5, tmp4, tmp5)) 2023-01-11T21:38:06.3967265Z tmp7 = tmp0 * tmp6 2023-01-11T21:38:06.3967337Z tmp8 = 6 2023-01-11T21:38:06.3967410Z tmp9 = tmp7 / tmp8 2023-01-11T21:38:06.3967491Z tmp10 = tmp2 + tmp1 2023-01-11T21:38:06.3967638Z tmp11 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 > tmp3, tmp10, tmp3)) 2023-01-11T21:38:06.3967776Z tmp12 = 
tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp5, tmp11, tmp5)) 2023-01-11T21:38:06.3967859Z tmp13 = tmp2 * tmp12 2023-01-11T21:38:06.3967938Z tmp14 = tmp13 / tmp8 2023-01-11T21:38:06.3968081Z tmp16 = tmp15 - tmp1 2023-01-11T21:38:06.3968156Z tmp17 = tmp16 + tmp1 2023-01-11T21:38:06.3968296Z tmp18 = tl.where(tmp17 != tmp17, tmp17, tl.where(tmp17 > tmp3, tmp17, tmp3)) 2023-01-11T21:38:06.3968435Z tmp19 = tl.where(tmp18 != tmp18, tmp18, tl.where(tmp18 < tmp5, tmp18, tmp5)) 2023-01-11T21:38:06.3968517Z tmp20 = tmp16 * tmp19 2023-01-11T21:38:06.3968598Z tmp21 = tmp20 / tmp8 2023-01-11T21:38:06.3968737Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3968872Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.3969006Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp21, xmask) 2023-01-11T21:38:06.3969090Z ''') 2023-01-11T21:38:06.3969095Z 2023-01-11T21:38:06.3969100Z 2023-01-11T21:38:06.3969196Z async_compile.wait(globals()) 2023-01-11T21:38:06.3969275Z del async_compile 2023-01-11T21:38:06.3969280Z 2023-01-11T21:38:06.3969356Z def call(args): 2023-01-11T21:38:06.3969432Z arg0_1, = args 2023-01-11T21:38:06.3969508Z args.clear() 2023-01-11T21:38:06.3969603Z with torch.cuda.device(0): 2023-01-11T21:38:06.3969795Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3969994Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3970193Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3970289Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3970447Z triton_fused_div_div_1_div_2_0.run(arg0_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3970555Z del arg0_1 2023-01-11T21:38:06.3970645Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3970651Z 2023-01-11T21:38:06.3970655Z 2023-01-11T21:38:06.3970738Z if __name__ == "__main__": 2023-01-11T21:38:06.3970851Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3970981Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3971183Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3971295Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3971300Z 2023-01-11T21:38:06.3971305Z 2023-01-11T21:38:06.3971403Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3971478Z import torch 2023-01-11T21:38:06.3971555Z import random 2023-01-11T21:38:06.3971678Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3971796Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3971801Z 2023-01-11T21:38:06.3971886Z aten = torch.ops.aten 2023-01-11T21:38:06.3972027Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3972125Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3972130Z 2023-01-11T21:38:06.3972206Z import triton 2023-01-11T21:38:06.3972299Z import triton.language as tl 2023-01-11T21:38:06.3972426Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3972563Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3972575Z 2023-01-11T21:38:06.3972579Z 2023-01-11T21:38:06.3972839Z triton_fused_convert_element_type_1_convert_element_type_3_convert_element_type_5_0 = async_compile.triton(''' 2023-01-11T21:38:06.3972917Z import triton 2023-01-11T21:38:06.3973012Z import triton.language as tl 
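[annotation] The triton_fused_div_div_1_div_2_0 kernel in this test_hardswish_cuda listing differs from the hardsigmoid one above only by the extra multiplies (tmp7 = tmp0 * tmp6 and its shifted twins), i.e. x * clamp(x + 3, 0, 6) / 6 at x, x + 3 and x - 3; the fp16 program that follows is the same computation with explicit fp32 upcasts. A short reference check under the same assumption (plain PyTorch, not the test's own code):

    import torch
    import torch.nn.functional as F

    x = torch.randn(64)
    # per-element math of the kernel: x * relu6(x + 3) / 6
    manual = x * torch.clamp(x + 3, 0.0, 6.0) / 6
    assert torch.allclose(F.hardswish(x), manual)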
2023-01-11T21:38:06.3973127Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3973231Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3973367Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3973494Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3973501Z 2023-01-11T21:38:06.3973931Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.3974032Z @triton.jit 2023-01-11T21:38:06.3974188Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3974262Z xnumel = 64 2023-01-11T21:38:06.3974360Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3974668Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3974750Z xmask = xindex < xnumel 2023-01-11T21:38:06.3974820Z x0 = xindex 2023-01-11T21:38:06.3975027Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3975144Z tmp20 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3975237Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3975307Z tmp2 = 3 2023-01-11T21:38:06.3975384Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.3975455Z tmp4 = 0.0 2023-01-11T21:38:06.3975591Z tmp5 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp4, tmp3, tmp4)) 2023-01-11T21:38:06.3975656Z tmp6 = 6.0 2023-01-11T21:38:06.3975794Z tmp7 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 < tmp6, tmp5, tmp6)) 2023-01-11T21:38:06.3975872Z tmp8 = tmp1 * tmp7 2023-01-11T21:38:06.3975941Z tmp9 = 6 2023-01-11T21:38:06.3976019Z tmp10 = tmp8 / tmp9 2023-01-11T21:38:06.3976107Z tmp11 = tmp10.to(tl.float32) 2023-01-11T21:38:06.3976185Z tmp12 = tmp0 + tmp2 2023-01-11T21:38:06.3976267Z tmp13 = tmp12.to(tl.float32) 2023-01-11T21:38:06.3976348Z tmp14 = tmp13 + tmp2 2023-01-11T21:38:06.3976485Z tmp15 = tl.where(tmp14 != tmp14, tmp14, tl.where(tmp14 > tmp4, tmp14, tmp4)) 2023-01-11T21:38:06.3976622Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 < tmp6, tmp15, tmp6)) 2023-01-11T21:38:06.3976750Z tmp17 = tmp13 * tmp16 2023-01-11T21:38:06.3976834Z tmp18 = tmp17 / tmp9 2023-01-11T21:38:06.3976924Z tmp19 = tmp18.to(tl.float32) 2023-01-11T21:38:06.3977034Z tmp21 = tmp20 - tmp2 2023-01-11T21:38:06.3977167Z tmp22 = tmp21.to(tl.float32) 2023-01-11T21:38:06.3977261Z tmp23 = tmp22 + tmp2 2023-01-11T21:38:06.3977402Z tmp24 = tl.where(tmp23 != tmp23, tmp23, tl.where(tmp23 > tmp4, tmp23, tmp4)) 2023-01-11T21:38:06.3977538Z tmp25 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 < tmp6, tmp24, tmp6)) 2023-01-11T21:38:06.3977619Z tmp26 = tmp22 * tmp25 2023-01-11T21:38:06.3977698Z tmp27 = tmp26 / tmp9 2023-01-11T21:38:06.3977781Z tmp28 = tmp27.to(tl.float32) 2023-01-11T21:38:06.3977921Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.3978054Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp19, xmask) 2023-01-11T21:38:06.3978183Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp28, xmask) 2023-01-11T21:38:06.3978277Z ''') 2023-01-11T21:38:06.3978283Z 2023-01-11T21:38:06.3978287Z 2023-01-11T21:38:06.3978382Z async_compile.wait(globals()) 2023-01-11T21:38:06.3978461Z del async_compile 2023-01-11T21:38:06.3978466Z 2023-01-11T21:38:06.3978542Z def call(args): 2023-01-11T21:38:06.3978613Z 
arg0_1, = args 2023-01-11T21:38:06.3978689Z args.clear() 2023-01-11T21:38:06.3978784Z with torch.cuda.device(0): 2023-01-11T21:38:06.3978981Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3979181Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3979380Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3979473Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3979688Z triton_fused_convert_element_type_1_convert_element_type_3_convert_element_type_5_0.run(arg0_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3979761Z del arg0_1 2023-01-11T21:38:06.3979851Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3979857Z 2023-01-11T21:38:06.3979861Z 2023-01-11T21:38:06.3979943Z if __name__ == "__main__": 2023-01-11T21:38:06.3980064Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3980232Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3980436Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3980549Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3980554Z 2023-01-11T21:38:06.3980628Z ok (0.417s) 2023-01-11T21:38:06.3981082Z test_hardtanh_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3981222Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3981481Z [2023-01-11 21:34:45,779] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 576 2023-01-11T21:38:06.3981752Z [2023-01-11 21:34:45,870] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 576 2023-01-11T21:38:06.3982168Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3982301Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3982558Z [2023-01-11 21:34:46,098] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 577 2023-01-11T21:38:06.3982848Z [2023-01-11 21:34:46,191] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 577 2023-01-11T21:38:06.3982854Z 2023-01-11T21:38:06.3982954Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3983030Z import torch 2023-01-11T21:38:06.3983104Z import random 2023-01-11T21:38:06.3983225Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3983349Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3983354Z 2023-01-11T21:38:06.3983438Z aten = torch.ops.aten 2023-01-11T21:38:06.3983578Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3983675Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3983680Z 2023-01-11T21:38:06.3983756Z import triton 2023-01-11T21:38:06.3983850Z import triton.language as tl 2023-01-11T21:38:06.3983970Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3984118Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3984124Z 2023-01-11T21:38:06.3984128Z 2023-01-11T21:38:06.3984324Z triton_fused_minimum_minimum_1_minimum_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.3984401Z import triton 2023-01-11T21:38:06.3984496Z import triton.language as tl 2023-01-11T21:38:06.3984616Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3984719Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3984853Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3984974Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3984979Z 2023-01-11T21:38:06.3985449Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.3985536Z @triton.jit 2023-01-11T21:38:06.3985697Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3985773Z xnumel = 64 2023-01-11T21:38:06.3985873Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3986006Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3986090Z xmask = xindex < xnumel 2023-01-11T21:38:06.3986187Z x0 = xindex 2023-01-11T21:38:06.3986381Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3986481Z tmp9 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3986581Z tmp1 = -1.0 2023-01-11T21:38:06.3986720Z tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1)) 2023-01-11T21:38:06.3986794Z tmp3 = 1.0 2023-01-11T21:38:06.3986934Z tmp4 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp3, tmp2, tmp3)) 2023-01-11T21:38:06.3987000Z tmp5 = 1 2023-01-11T21:38:06.3987082Z tmp6 = tmp0 + tmp5 2023-01-11T21:38:06.3987220Z tmp7 = tl.where(tmp6 != tmp6, tmp6, tl.where(tmp6 > tmp1, tmp6, tmp1)) 2023-01-11T21:38:06.3987355Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 < tmp3, tmp7, tmp3)) 2023-01-11T21:38:06.3987469Z tmp10 = tmp9 - tmp5 2023-01-11T21:38:06.3987615Z tmp11 = tl.where(tmp10 != tmp10, tmp10, 
tl.where(tmp10 > tmp1, tmp10, tmp1)) 2023-01-11T21:38:06.3987762Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp3, tmp11, tmp3)) 2023-01-11T21:38:06.3987892Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3988029Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.3988166Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.3988254Z ''') 2023-01-11T21:38:06.3988259Z 2023-01-11T21:38:06.3988264Z 2023-01-11T21:38:06.3988358Z async_compile.wait(globals()) 2023-01-11T21:38:06.3988438Z del async_compile 2023-01-11T21:38:06.3988443Z 2023-01-11T21:38:06.3988521Z def call(args): 2023-01-11T21:38:06.3988628Z arg0_1, = args 2023-01-11T21:38:06.3988698Z args.clear() 2023-01-11T21:38:06.3988792Z with torch.cuda.device(0): 2023-01-11T21:38:06.3988994Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3989192Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3989394Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3989487Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3989660Z triton_fused_minimum_minimum_1_minimum_2_0.run(arg0_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3989737Z del arg0_1 2023-01-11T21:38:06.3989821Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3989826Z 2023-01-11T21:38:06.3989831Z 2023-01-11T21:38:06.3989914Z if __name__ == "__main__": 2023-01-11T21:38:06.3990036Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3990169Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3990369Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3990482Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3990487Z 2023-01-11T21:38:06.3990492Z 2023-01-11T21:38:06.3990591Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3990670Z import torch 2023-01-11T21:38:06.3990740Z import random 2023-01-11T21:38:06.3990863Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3990987Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3990992Z 2023-01-11T21:38:06.3991074Z aten = torch.ops.aten 2023-01-11T21:38:06.3991213Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3991312Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3991317Z 2023-01-11T21:38:06.3991392Z import triton 2023-01-11T21:38:06.3991480Z import triton.language as tl 2023-01-11T21:38:06.3991611Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3991752Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3991758Z 2023-01-11T21:38:06.3991762Z 2023-01-11T21:38:06.3992026Z triton_fused_convert_element_type_1_convert_element_type_3_convert_element_type_5_0 = async_compile.triton(''' 2023-01-11T21:38:06.3992133Z import triton 2023-01-11T21:38:06.3992230Z import triton.language as tl 2023-01-11T21:38:06.3992348Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3992452Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3992581Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3992708Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3992714Z 2023-01-11T21:38:06.3993150Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 
1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.3993229Z @triton.jit 2023-01-11T21:38:06.3993382Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3993458Z xnumel = 64 2023-01-11T21:38:06.3993562Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3993696Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3993775Z xmask = xindex < xnumel 2023-01-11T21:38:06.3993848Z x0 = xindex 2023-01-11T21:38:06.3994065Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3994184Z tmp13 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3994275Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3994376Z tmp2 = -1.0 2023-01-11T21:38:06.3994513Z tmp3 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp2, tmp1, tmp2)) 2023-01-11T21:38:06.3994590Z tmp4 = 1.0 2023-01-11T21:38:06.3994721Z tmp5 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp4, tmp3, tmp4)) 2023-01-11T21:38:06.3994851Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.3994926Z tmp7 = 1 2023-01-11T21:38:06.3995005Z tmp8 = tmp0 + tmp7 2023-01-11T21:38:06.3995091Z tmp9 = tmp8.to(tl.float32) 2023-01-11T21:38:06.3995241Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp2, tmp9, tmp2)) 2023-01-11T21:38:06.3995406Z tmp11 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 < tmp4, tmp10, tmp4)) 2023-01-11T21:38:06.3995509Z tmp12 = tmp11.to(tl.float32) 2023-01-11T21:38:06.3995625Z tmp14 = tmp13 - tmp7 2023-01-11T21:38:06.3995718Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.3995859Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp2, tmp15, tmp2)) 2023-01-11T21:38:06.3995998Z tmp17 = tl.where(tmp16 != tmp16, tmp16, tl.where(tmp16 < tmp4, tmp16, tmp4)) 2023-01-11T21:38:06.3996088Z tmp18 = tmp17.to(tl.float32) 2023-01-11T21:38:06.3996223Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3996355Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.3996486Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.3996570Z ''') 2023-01-11T21:38:06.3996576Z 2023-01-11T21:38:06.3996580Z 2023-01-11T21:38:06.3996679Z async_compile.wait(globals()) 2023-01-11T21:38:06.3996758Z del async_compile 2023-01-11T21:38:06.3996763Z 2023-01-11T21:38:06.3996841Z def call(args): 2023-01-11T21:38:06.3996915Z arg0_1, = args 2023-01-11T21:38:06.3996993Z args.clear() 2023-01-11T21:38:06.3997081Z with torch.cuda.device(0): 2023-01-11T21:38:06.3997280Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3997478Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3997675Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3997769Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3997987Z triton_fused_convert_element_type_1_convert_element_type_3_convert_element_type_5_0.run(arg0_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3998064Z del arg0_1 2023-01-11T21:38:06.3998149Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3998162Z 2023-01-11T21:38:06.3998230Z 2023-01-11T21:38:06.3998308Z if __name__ == "__main__": 2023-01-11T21:38:06.3998430Z from torch._dynamo.testing import rand_strided 
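[annotation] triton_fused_minimum_minimum_1_minimum_2_0 is hardtanh, a clamp to [-1, 1] built from the same NaN-propagating max/min pair. The fp16 variant directly above shows the pattern every half-precision listing in this log follows: load through *fp16 pointers, upcast with .to(tl.float32), compute in fp32, and cast back on the stores. A minimal sketch of that compute-in-fp32/store-in-fp16 shape (illustrative; clamp is exact at every precision, so the round trip should not change results):

    import torch
    import torch.nn.functional as F

    x = torch.randn(64, dtype=torch.float16)
    # compute in fp32, store fp16, mirroring the kernel's .to() casts
    manual = torch.clamp(x.float(), -1.0, 1.0).to(torch.float16)
    assert torch.allclose(F.hardtanh(x), manual)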
2023-01-11T21:38:06.3998560Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3998761Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3998875Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3998880Z 2023-01-11T21:38:06.3998957Z ok (0.448s) 2023-01-11T21:38:06.3999422Z test_horizonal_fusion1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3999559Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3999822Z [2023-01-11 21:34:46,210] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 578 2023-01-11T21:38:06.4000080Z [2023-01-11 21:34:46,424] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 578 2023-01-11T21:38:06.4000498Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4000662Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4000917Z [2023-01-11 21:34:46,444] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 579 2023-01-11T21:38:06.4001190Z [2023-01-11 21:34:46,540] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 579 2023-01-11T21:38:06.4001195Z 2023-01-11T21:38:06.4001297Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4001374Z import torch 2023-01-11T21:38:06.4001450Z import random 2023-01-11T21:38:06.4001571Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4001690Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4001695Z 2023-01-11T21:38:06.4001779Z aten = torch.ops.aten 2023-01-11T21:38:06.4001921Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4002017Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4002022Z 2023-01-11T21:38:06.4002100Z import triton 2023-01-11T21:38:06.4002199Z import triton.language as tl 2023-01-11T21:38:06.4002323Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4002464Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4002469Z 2023-01-11T21:38:06.4002474Z 2023-01-11T21:38:06.4002641Z triton_fused_add_mul_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.4002717Z import triton 2023-01-11T21:38:06.4002813Z import triton.language as tl 2023-01-11T21:38:06.4002929Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4003033Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4003170Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4003296Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4003301Z 2023-01-11T21:38:06.4003768Z @pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: 
'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.4003841Z @triton.jit 2023-01-11T21:38:06.4004012Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4004088Z xnumel = 2048 2023-01-11T21:38:06.4004216Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4004348Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4004433Z xmask = xindex < xnumel 2023-01-11T21:38:06.4004507Z x0 = xindex 2023-01-11T21:38:06.4004585Z x2 = (xindex // 16) % 16 2023-01-11T21:38:06.4004777Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4004975Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4005075Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4005266Z tmp4 = tl.load(in_ptr2 + (x2), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4005366Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.4005464Z tmp7 = tl.load(in_ptr2 + (x2), xmask) 2023-01-11T21:38:06.4005546Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4005650Z tmp5 = tmp3 - tmp4 2023-01-11T21:38:06.4005728Z tmp8 = tmp6 * tmp7 2023-01-11T21:38:06.4005866Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4006002Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.4006135Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.4006222Z ''') 2023-01-11T21:38:06.4006227Z 2023-01-11T21:38:06.4006232Z 2023-01-11T21:38:06.4006327Z async_compile.wait(globals()) 2023-01-11T21:38:06.4006399Z del async_compile 2023-01-11T21:38:06.4006411Z 2023-01-11T21:38:06.4006482Z def call(args): 2023-01-11T21:38:06.4006569Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4006648Z args.clear() 2023-01-11T21:38:06.4006772Z with torch.cuda.device(0): 2023-01-11T21:38:06.4006988Z buf0 = empty_strided((8, 16, 16), (256, 16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4007201Z buf1 = empty_strided((8, 16, 16), (256, 16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4007410Z buf2 = empty_strided((8, 16, 16), (256, 16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4007499Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4007670Z triton_fused_add_mul_sub_0.run(arg0_1, arg1_1, arg2_1, buf0, buf1, buf2, 2048, grid=grid(2048), stream=stream0) 2023-01-11T21:38:06.4007744Z del arg0_1 2023-01-11T21:38:06.4007818Z del arg1_1 2023-01-11T21:38:06.4007891Z del arg2_1 2023-01-11T21:38:06.4007982Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4007987Z 2023-01-11T21:38:06.4007991Z 2023-01-11T21:38:06.4008075Z if __name__ == "__main__": 2023-01-11T21:38:06.4008199Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4008324Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4008540Z arg0_1 = rand_strided((8, 16, 16), (256, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4008751Z arg1_1 = rand_strided((8, 16, 16), (256, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4008960Z arg2_1 = rand_strided((1, 16, 1), (16, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4009091Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4009096Z 2023-01-11T21:38:06.4009101Z 2023-01-11T21:38:06.4009199Z from ctypes 
import c_void_p, c_long 2023-01-11T21:38:06.4009276Z import torch 2023-01-11T21:38:06.4009353Z import random 2023-01-11T21:38:06.4009468Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4009595Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4009601Z 2023-01-11T21:38:06.4009685Z aten = torch.ops.aten 2023-01-11T21:38:06.4009825Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4009921Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4009926Z 2023-01-11T21:38:06.4010001Z import triton 2023-01-11T21:38:06.4010099Z import triton.language as tl 2023-01-11T21:38:06.4010218Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4010386Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4010393Z 2023-01-11T21:38:06.4010397Z 2023-01-11T21:38:06.4010566Z triton_fused_add_mul_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.4010642Z import triton 2023-01-11T21:38:06.4010737Z import triton.language as tl 2023-01-11T21:38:06.4010849Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4010953Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4011089Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4011210Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4011218Z 2023-01-11T21:38:06.4011686Z @pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.4011764Z @triton.jit 2023-01-11T21:38:06.4011936Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4012016Z xnumel = 2048 2023-01-11T21:38:06.4012115Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4012248Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4012334Z xmask = xindex < xnumel 2023-01-11T21:38:06.4012400Z x0 = xindex 2023-01-11T21:38:06.4012486Z x2 = (xindex // 16) % 16 2023-01-11T21:38:06.4012703Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4012940Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4013056Z tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4013264Z tmp4 = tl.load(in_ptr2 + (x2), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4013382Z tmp6 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4013496Z tmp7 = tl.load(in_ptr2 + (x2), xmask).to(tl.float32) 2023-01-11T21:38:06.4013568Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4013676Z tmp5 = tmp3 - tmp4 2023-01-11T21:38:06.4013753Z tmp8 = tmp6 * tmp7 2023-01-11T21:38:06.4013887Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4014022Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.4014153Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.4014237Z ''') 2023-01-11T21:38:06.4014243Z 2023-01-11T21:38:06.4014247Z 2023-01-11T21:38:06.4014337Z async_compile.wait(globals()) 2023-01-11T21:38:06.4014414Z del async_compile 2023-01-11T21:38:06.4014419Z 2023-01-11T21:38:06.4014624Z def call(args): 
2023-01-11T21:38:06.4014714Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4014788Z args.clear() 2023-01-11T21:38:06.4014879Z with torch.cuda.device(0): 2023-01-11T21:38:06.4015099Z buf0 = empty_strided((8, 16, 16), (256, 16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4015309Z buf1 = empty_strided((8, 16, 16), (256, 16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4015532Z buf2 = empty_strided((8, 16, 16), (256, 16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4015642Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4015818Z triton_fused_add_mul_sub_0.run(arg0_1, arg1_1, arg2_1, buf0, buf1, buf2, 2048, grid=grid(2048), stream=stream0) 2023-01-11T21:38:06.4015893Z del arg0_1 2023-01-11T21:38:06.4015965Z del arg1_1 2023-01-11T21:38:06.4016040Z del arg2_1 2023-01-11T21:38:06.4016131Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4016136Z 2023-01-11T21:38:06.4016140Z 2023-01-11T21:38:06.4016217Z if __name__ == "__main__": 2023-01-11T21:38:06.4016328Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4016500Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4016711Z arg0_1 = rand_strided((8, 16, 16), (256, 16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4016919Z arg1_1 = rand_strided((8, 16, 16), (256, 16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4017176Z arg2_1 = rand_strided((1, 16, 1), (16, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4017334Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4017340Z 2023-01-11T21:38:06.4017412Z ok (0.350s) 2023-01-11T21:38:06.4017873Z test_horizonal_fusion2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4018013Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4018266Z [2023-01-11 21:34:46,560] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 580 2023-01-11T21:38:06.4018526Z [2023-01-11 21:34:46,643] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 580 2023-01-11T21:38:06.4018940Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4019110Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4019363Z [2023-01-11 21:34:46,663] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 581 2023-01-11T21:38:06.4019369Z 2023-01-11T21:38:06.4019468Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4019542Z import torch 2023-01-11T21:38:06.4019614Z import random 2023-01-11T21:38:06.4019735Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4019852Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4019857Z 2023-01-11T21:38:06.4019938Z aten = torch.ops.aten 2023-01-11T21:38:06.4020077Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4020173Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4020178Z 2023-01-11T21:38:06.4020252Z import triton 2023-01-11T21:38:06.4020344Z import triton.language as tl 2023-01-11T21:38:06.4020473Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4020607Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4020619Z 2023-01-11T21:38:06.4020623Z 2023-01-11T21:38:06.4020770Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.4020847Z import triton 2023-01-11T21:38:06.4020941Z import triton.language as tl 2023-01-11T21:38:06.4021055Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4021155Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4021290Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4021415Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4021421Z 2023-01-11T21:38:06.4021833Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4021903Z @triton.jit 2023-01-11T21:38:06.4022038Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4022111Z xnumel = 1024 2023-01-11T21:38:06.4022208Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4022335Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4022445Z xmask = xindex < xnumel 2023-01-11T21:38:06.4022517Z x0 = xindex 2023-01-11T21:38:06.4022607Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4022679Z tmp1 = 1 2023-01-11T21:38:06.4022758Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4022893Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4022981Z ''') 2023-01-11T21:38:06.4022987Z 2023-01-11T21:38:06.4022992Z 2023-01-11T21:38:06.4023149Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.4023224Z import triton 2023-01-11T21:38:06.4023310Z import triton.language as tl 2023-01-11T21:38:06.4023425Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4023528Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4023660Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4023785Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4023790Z 2023-01-11T21:38:06.4024194Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4024273Z @triton.jit 2023-01-11T21:38:06.4024406Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4024473Z xnumel = 128 2023-01-11T21:38:06.4024571Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4024702Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4024785Z xmask = xindex < xnumel 2023-01-11T21:38:06.4024854Z x0 = xindex 2023-01-11T21:38:06.4024981Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4025051Z tmp1 = 2 2023-01-11T21:38:06.4025123Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4025260Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4025348Z ''') 2023-01-11T21:38:06.4025353Z 2023-01-11T21:38:06.4025357Z 2023-01-11T21:38:06.4025542Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.4025621Z import triton 2023-01-11T21:38:06.4025734Z import triton.language as tl 2023-01-11T21:38:06.4025848Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4025950Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4026078Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4026205Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4026210Z 2023-01-11T21:38:06.4026615Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4026693Z @triton.jit 2023-01-11T21:38:06.4026825Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4026898Z xnumel = 128 2023-01-11T21:38:06.4026994Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4027127Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4027203Z xmask = xindex < xnumel 2023-01-11T21:38:06.4027272Z x0 = xindex 2023-01-11T21:38:06.4027368Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4027439Z tmp1 = 3 2023-01-11T21:38:06.4027518Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4027651Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4027735Z ''') 2023-01-11T21:38:06.4027741Z 2023-01-11T21:38:06.4027745Z 2023-01-11T21:38:06.4027831Z async_compile.wait(globals()) 2023-01-11T21:38:06.4027906Z del async_compile 2023-01-11T21:38:06.4027914Z 2023-01-11T21:38:06.4027989Z def call(args): 2023-01-11T21:38:06.4028073Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4028147Z args.clear() 2023-01-11T21:38:06.4028239Z with torch.cuda.device(0): 2023-01-11T21:38:06.4028450Z buf0 = empty_strided((8, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4028562Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4028707Z triton_fused_add_0.run(arg0_1, buf0, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.4028779Z del arg0_1 2023-01-11T21:38:06.4028982Z buf1 = empty_strided((8, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4029116Z triton_fused_add_1_1.run(arg1_1, buf1, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.4029188Z del arg1_1 2023-01-11T21:38:06.4029388Z buf2 = empty_strided((16, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4029526Z triton_fused_add_2_2.run(arg2_1, buf2, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.4029595Z del arg2_1 
2023-01-11T21:38:06.4029682Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4029687Z 2023-01-11T21:38:06.4029692Z 2023-01-11T21:38:06.4029771Z if __name__ == "__main__": 2023-01-11T21:38:06.4029892Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4030019Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4030233Z arg0_1 = rand_strided((8, 16, 8), (128, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4030435Z arg1_1 = rand_strided((8, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4030636Z arg2_1 = rand_strided((16, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4030758Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4030763Z 2023-01-11T21:38:06.4031030Z [2023-01-11 21:34:46,748] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 581 2023-01-11T21:38:06.4031063Z 2023-01-11T21:38:06.4031161Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4031235Z import torch 2023-01-11T21:38:06.4031310Z import random 2023-01-11T21:38:06.4031427Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4031554Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4031559Z 2023-01-11T21:38:06.4031641Z aten = torch.ops.aten 2023-01-11T21:38:06.4031771Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4031865Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4031870Z 2023-01-11T21:38:06.4031944Z import triton 2023-01-11T21:38:06.4032038Z import triton.language as tl 2023-01-11T21:38:06.4032163Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4032302Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4032307Z 2023-01-11T21:38:06.4032312Z 2023-01-11T21:38:06.4032466Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.4032544Z import triton 2023-01-11T21:38:06.4032629Z import triton.language as tl 2023-01-11T21:38:06.4032743Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4032844Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4032981Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4033108Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4033114Z 2023-01-11T21:38:06.4033518Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4033589Z @triton.jit 2023-01-11T21:38:06.4033724Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4033792Z xnumel = 1024 2023-01-11T21:38:06.4033886Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4034014Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4034105Z xmask = xindex < xnumel 2023-01-11T21:38:06.4034177Z x0 = xindex 2023-01-11T21:38:06.4034295Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4034366Z tmp1 = 1 2023-01-11T21:38:06.4034438Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4034601Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4034687Z ''') 2023-01-11T21:38:06.4034693Z 2023-01-11T21:38:06.4034697Z 2023-01-11T21:38:06.4034857Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.4034932Z import triton 
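[annotation] Contrast the two horizontal-fusion tests: test_horizonal_fusion1 produced a single triton_fused_add_mul_sub_0 kernel (one launch over 2048 elements, three same-shape outputs), while test_horizonal_fusion2 above compiles into three separate kernels (triton_fused_add_0, _1_1 and _2_2, with xnumel 1024, 128 and 128) because the three tensors have different shapes and cannot share one xindex. A hedged reconstruction of what the test functions plausibly look like, inferred only from the generated kernels and the rand_strided shapes in the harnesses (the actual definitions live in test_torchinductor.py and are not shown in this log):

    import torch

    def fusion1(a, b, c):
        # three same-shape outputs: inductor fused these into one kernel above
        return a + b, a - c, b * c

    def fusion2(a, b, c):
        # three differently-shaped outputs: one kernel per tensor in the log above
        return a + 1, b + 2, c + 3

    p, q, r = fusion1(torch.randn(8, 16, 16), torch.randn(8, 16, 16),
                      torch.randn(1, 16, 1))            # all (8, 16, 16)
    r0, r1, r2 = fusion2(torch.randn(8, 16, 8), torch.randn(8, 16),
                         torch.randn(16, 8))            # (8,16,8), (8,16), (16,8)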
2023-01-11T21:38:06.4035024Z import triton.language as tl 2023-01-11T21:38:06.4035142Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4035253Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4035403Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4035545Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4035554Z 2023-01-11T21:38:06.4035953Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4036026Z @triton.jit 2023-01-11T21:38:06.4036160Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4036237Z xnumel = 128 2023-01-11T21:38:06.4036333Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4036455Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4036538Z xmask = xindex < xnumel 2023-01-11T21:38:06.4036607Z x0 = xindex 2023-01-11T21:38:06.4036728Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4036799Z tmp1 = 2 2023-01-11T21:38:06.4036877Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4037011Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4037132Z ''') 2023-01-11T21:38:06.4037138Z 2023-01-11T21:38:06.4037147Z 2023-01-11T21:38:06.4037299Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.4037371Z import triton 2023-01-11T21:38:06.4037468Z import triton.language as tl 2023-01-11T21:38:06.4037582Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4037685Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4037818Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4037945Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4037950Z 2023-01-11T21:38:06.4038346Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4038420Z @triton.jit 2023-01-11T21:38:06.4038552Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4038632Z xnumel = 128 2023-01-11T21:38:06.4038730Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4038858Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4038940Z xmask = xindex < xnumel 2023-01-11T21:38:06.4039008Z x0 = xindex 2023-01-11T21:38:06.4039122Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4039191Z tmp1 = 3 2023-01-11T21:38:06.4039267Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4039403Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4039487Z ''') 2023-01-11T21:38:06.4039492Z 2023-01-11T21:38:06.4039496Z 2023-01-11T21:38:06.4039590Z async_compile.wait(globals()) 2023-01-11T21:38:06.4039666Z del async_compile 2023-01-11T21:38:06.4039671Z 2023-01-11T21:38:06.4039739Z def call(args): 2023-01-11T21:38:06.4039825Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4039901Z args.clear() 2023-01-11T21:38:06.4039995Z with torch.cuda.device(0): 2023-01-11T21:38:06.4040208Z buf0 = empty_strided((8, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4040301Z stream0 = get_cuda_stream(0) 
2023-01-11T21:38:06.4040440Z         triton_fused_add_0.run(arg0_1, buf0, 1024, grid=grid(1024), stream=stream0)
2023-01-11T21:38:06.4040507Z         del arg0_1
2023-01-11T21:38:06.4040738Z         buf1 = empty_strided((8, 16), (16, 1), device='cuda', dtype=torch.float16)
2023-01-11T21:38:06.4040878Z         triton_fused_add_1_1.run(arg1_1, buf1, 128, grid=grid(128), stream=stream0)
2023-01-11T21:38:06.4040952Z         del arg1_1
2023-01-11T21:38:06.4041153Z         buf2 = empty_strided((16, 8), (8, 1), device='cuda', dtype=torch.float16)
2023-01-11T21:38:06.4041291Z         triton_fused_add_2_2.run(arg2_1, buf2, 128, grid=grid(128), stream=stream0)
2023-01-11T21:38:06.4041362Z         del arg2_1
2023-01-11T21:38:06.4041449Z     return (buf0, buf1, buf2, )
2023-01-11T21:38:06.4041454Z 
2023-01-11T21:38:06.4041461Z 
2023-01-11T21:38:06.4041534Z if __name__ == "__main__":
2023-01-11T21:38:06.4041653Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4041781Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4041990Z     arg0_1 = rand_strided((8, 16, 8), (128, 8, 1), device='cuda:0', dtype=torch.float16)
2023-01-11T21:38:06.4042195Z     arg1_1 = rand_strided((8, 16), (16, 1), device='cuda:0', dtype=torch.float16)
2023-01-11T21:38:06.4042395Z     arg2_1 = rand_strided((16, 8), (8, 1), device='cuda:0', dtype=torch.float16)
2023-01-11T21:38:06.4042523Z     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.4042528Z 
2023-01-11T21:38:06.4042598Z ok (0.208s)
2023-01-11T21:38:06.4043055Z test_index1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4043209Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4043471Z [2023-01-11 21:34:46,791] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 582
2023-01-11T21:38:06.4043736Z [2023-01-11 21:34:46,869] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 582
2023-01-11T21:38:06.4044151Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4044283Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4044544Z [2023-01-11 21:34:46,909] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 583
2023-01-11T21:38:06.4044802Z [2023-01-11 21:34:46,983] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 583
2023-01-11T21:38:06.4045217Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4045348Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4045601Z [2023-01-11 21:34:47,023] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 584
2023-01-11T21:38:06.4045863Z [2023-01-11 21:34:47,121] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 584
2023-01-11T21:38:06.4046300Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4046427Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4046686Z [2023-01-11 21:34:47,162] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 585
2023-01-11T21:38:06.4046692Z 
2023-01-11T21:38:06.4046790Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4046865Z import torch
2023-01-11T21:38:06.4046938Z import random
2023-01-11T21:38:06.4047058Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4047184Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4047189Z 
2023-01-11T21:38:06.4047276Z aten = torch.ops.aten
2023-01-11T21:38:06.4047407Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4047501Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4047506Z 
2023-01-11T21:38:06.4047579Z import triton
2023-01-11T21:38:06.4047670Z import triton.language as tl
2023-01-11T21:38:06.4047799Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4047940Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4047946Z 
2023-01-11T21:38:06.4047950Z 
2023-01-11T21:38:06.4048109Z triton_fused_index_0 = async_compile.triton('''
2023-01-11T21:38:06.4048184Z import triton
2023-01-11T21:38:06.4048270Z import triton.language as tl
2023-01-11T21:38:06.4048385Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4048487Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4048625Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4048752Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4048783Z 
2023-01-11T21:38:06.4049218Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
2023-01-11T21:38:06.4049296Z @triton.jit
2023-01-11T21:38:06.4049449Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4049515Z     xnumel = 48
2023-01-11T21:38:06.4049612Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4049742Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4049824Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4049904Z     x1 = (xindex // 12)
2023-01-11T21:38:06.4049977Z     x0 = xindex % 12
2023-01-11T21:38:06.4050047Z     x2 = xindex
2023-01-11T21:38:06.4050139Z     tmp0 = tl.load(in_ptr0 + (x1), xmask)
2023-01-11T21:38:06.4050236Z     tmp1 = tl.load(in_ptr1 + (x1), xmask)
2023-01-11T21:38:06.4050359Z     tmp2 = tl.load(in_ptr2 + (x0 + (12*tmp1) + (96*tmp0)), xmask)
2023-01-11T21:38:06.4050494Z     tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
2023-01-11T21:38:06.4050579Z ''')
2023-01-11T21:38:06.4050585Z 
2023-01-11T21:38:06.4050590Z 
2023-01-11T21:38:06.4050685Z async_compile.wait(globals())
2023-01-11T21:38:06.4050763Z del async_compile
2023-01-11T21:38:06.4050768Z 
2023-01-11T21:38:06.4050843Z def call(args):
2023-01-11T21:38:06.4050922Z     arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.4050999Z     args.clear()
2023-01-11T21:38:06.4051091Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4051296Z         buf0 = empty_strided((4, 12), (12, 1), device='cuda', dtype=torch.float32)
2023-01-11T21:38:06.4051390Z         stream0 = get_cuda_stream(0)
2023-01-11T21:38:06.4051541Z         triton_fused_index_0.run(arg1_1, arg2_1, arg0_1, buf0, 48, grid=grid(48), stream=stream0)
2023-01-11T21:38:06.4051615Z         del arg0_1
2023-01-11T21:38:06.4051683Z         del arg1_1
2023-01-11T21:38:06.4051755Z         del arg2_1
2023-01-11T21:38:06.4051831Z     return (buf0, )
2023-01-11T21:38:06.4051836Z 
2023-01-11T21:38:06.4051840Z 
2023-01-11T21:38:06.4051921Z if __name__ == "__main__":
2023-01-11T21:38:06.4052039Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4052191Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4052405Z     arg0_1 = rand_strided((8, 8, 12), (96, 12, 1), device='cuda:0', dtype=torch.float32)
2023-01-11T21:38:06.4052602Z     arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4052789Z     arg2_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4052919Z     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.4052924Z 
2023-01-11T21:38:06.4052928Z 
2023-01-11T21:38:06.4053026Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4053101Z import torch
2023-01-11T21:38:06.4053177Z import random
2023-01-11T21:38:06.4053296Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4053423Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4053428Z 
2023-01-11T21:38:06.4053509Z aten = torch.ops.aten
2023-01-11T21:38:06.4053638Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4053734Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4053739Z 
2023-01-11T21:38:06.4053812Z import triton
2023-01-11T21:38:06.4053905Z import triton.language as tl
2023-01-11T21:38:06.4054031Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4054170Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4054175Z 
2023-01-11T21:38:06.4054180Z 
2023-01-11T21:38:06.4054336Z triton_fused_index_0 = async_compile.triton('''
2023-01-11T21:38:06.4054404Z import triton
2023-01-11T21:38:06.4054603Z import triton.language as tl
2023-01-11T21:38:06.4054722Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4054869Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4055004Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4055130Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4055135Z 
2023-01-11T21:38:06.4055620Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
2023-01-11T21:38:06.4055694Z @triton.jit
2023-01-11T21:38:06.4055847Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4055914Z     xnumel = 48
2023-01-11T21:38:06.4056010Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4056138Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4056220Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4056305Z     x1 = (xindex // 12)
2023-01-11T21:38:06.4056381Z     x0 = xindex % 12
2023-01-11T21:38:06.4056446Z     x2 = xindex
2023-01-11T21:38:06.4056544Z     tmp0 = tl.load(in_ptr0 + (x1), xmask)
2023-01-11T21:38:06.4056641Z     tmp1 = tl.load(in_ptr1 + (x1), xmask)
2023-01-11T21:38:06.4056782Z     tmp2 = tl.load(in_ptr2 + (x0 + (12*tmp1) + (96*tmp0)), xmask).to(tl.float32)
2023-01-11T21:38:06.4056921Z     tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
2023-01-11T21:38:06.4057005Z ''')
2023-01-11T21:38:06.4057011Z 
2023-01-11T21:38:06.4057015Z 
2023-01-11T21:38:06.4057107Z async_compile.wait(globals())
2023-01-11T21:38:06.4057245Z del async_compile
2023-01-11T21:38:06.4057251Z 
2023-01-11T21:38:06.4057320Z def call(args):
2023-01-11T21:38:06.4057408Z     arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.4057482Z     args.clear()
2023-01-11T21:38:06.4057573Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4057780Z         buf0 = empty_strided((4, 12), (12, 1), device='cuda', dtype=torch.float16)
2023-01-11T21:38:06.4057880Z         stream0 = get_cuda_stream(0)
2023-01-11T21:38:06.4058029Z         triton_fused_index_0.run(arg1_1, arg2_1, arg0_1, buf0, 48, grid=grid(48), stream=stream0)
2023-01-11T21:38:06.4058096Z         del arg0_1
2023-01-11T21:38:06.4058169Z         del arg1_1
2023-01-11T21:38:06.4058238Z         del arg2_1
2023-01-11T21:38:06.4058353Z     return (buf0, )
2023-01-11T21:38:06.4058359Z 
2023-01-11T21:38:06.4058363Z 
2023-01-11T21:38:06.4058446Z if __name__ == "__main__":
2023-01-11T21:38:06.4058564Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4058690Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4058902Z     arg0_1 = rand_strided((8, 8, 12), (96, 12, 1), device='cuda:0', dtype=torch.float16)
2023-01-11T21:38:06.4059090Z     arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4059278Z     arg2_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4059409Z     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.4059414Z 
2023-01-11T21:38:06.4059419Z 
2023-01-11T21:38:06.4059516Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4059591Z import torch
2023-01-11T21:38:06.4059664Z import random
2023-01-11T21:38:06.4059785Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4059909Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4059914Z 
2023-01-11T21:38:06.4059989Z aten = torch.ops.aten
2023-01-11T21:38:06.4060124Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4060220Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4060225Z 
2023-01-11T21:38:06.4060299Z import triton
2023-01-11T21:38:06.4060392Z import triton.language as tl
2023-01-11T21:38:06.4060516Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4060655Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4060660Z 
2023-01-11T21:38:06.4060692Z 
2023-01-11T21:38:06.4060852Z triton_fused_index_0 = async_compile.triton('''
2023-01-11T21:38:06.4060922Z import triton
2023-01-11T21:38:06.4061016Z import triton.language as tl
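# The kernel continuing below is the broadcast variant of the gather above:
# the (1, 4) and (4, 1) int64 index tensors broadcast to a (4, 4) grid, so
# the flat index decomposes as x2 = xindex // 48 (rows, fed by the second
# index), x1 = (xindex // 12) % 4 (columns, fed by the first index), and
# x0 = xindex % 12 (the untouched trailing dim of the (8, 8, 12) source).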
2023-01-11T21:38:06.4061134Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4061238Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4061377Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4061505Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4061510Z 
2023-01-11T21:38:06.4061942Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
2023-01-11T21:38:06.4062017Z @triton.jit
2023-01-11T21:38:06.4062164Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4062242Z     xnumel = 192
2023-01-11T21:38:06.4062342Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4062474Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4062561Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4062646Z     x1 = (xindex // 12) % 4
2023-01-11T21:38:06.4062725Z     x2 = (xindex // 48)
2023-01-11T21:38:06.4062798Z     x0 = xindex % 12
2023-01-11T21:38:06.4062871Z     x4 = xindex
2023-01-11T21:38:06.4062974Z     tmp0 = tl.load(in_ptr0 + (x1), xmask)
2023-01-11T21:38:06.4063075Z     tmp1 = tl.load(in_ptr1 + (x2), xmask)
2023-01-11T21:38:06.4063198Z     tmp2 = tl.load(in_ptr2 + (x0 + (12*tmp1) + (96*tmp0)), xmask)
2023-01-11T21:38:06.4063337Z     tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
2023-01-11T21:38:06.4063425Z ''')
2023-01-11T21:38:06.4063431Z 
2023-01-11T21:38:06.4063435Z 
2023-01-11T21:38:06.4063530Z async_compile.wait(globals())
2023-01-11T21:38:06.4063602Z del async_compile
2023-01-11T21:38:06.4063607Z 
2023-01-11T21:38:06.4063685Z def call(args):
2023-01-11T21:38:06.4063774Z     arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.4063853Z     args.clear()
2023-01-11T21:38:06.4063948Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4064159Z         buf0 = empty_strided((4, 4, 12), (48, 12, 1), device='cuda', dtype=torch.float32)
2023-01-11T21:38:06.4064295Z         stream0 = get_cuda_stream(0)
2023-01-11T21:38:06.4064448Z         triton_fused_index_0.run(arg1_1, arg2_1, arg0_1, buf0, 192, grid=grid(192), stream=stream0)
2023-01-11T21:38:06.4064525Z         del arg0_1
2023-01-11T21:38:06.4064598Z         del arg1_1
2023-01-11T21:38:06.4064673Z         del arg2_1
2023-01-11T21:38:06.4064753Z     return (buf0, )
2023-01-11T21:38:06.4064758Z 
2023-01-11T21:38:06.4064762Z 
2023-01-11T21:38:06.4064844Z if __name__ == "__main__":
2023-01-11T21:38:06.4064967Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4065089Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4065303Z     arg0_1 = rand_strided((8, 8, 12), (96, 12, 1), device='cuda:0', dtype=torch.float32)
2023-01-11T21:38:06.4065504Z     arg1_1 = rand_strided((1, 4), (4, 1), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4065701Z     arg2_1 = rand_strided((4, 1), (1, 1), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4065832Z     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.4065838Z 
2023-01-11T21:38:06.4066106Z [2023-01-11 21:34:47,253] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 585
2023-01-11T21:38:06.4066112Z 
2023-01-11T21:38:06.4066215Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4066294Z import torch
2023-01-11T21:38:06.4066364Z import random
2023-01-11T21:38:06.4066485Z from torch import empty_strided, as_strided, device
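# The module continuing below is the fp16 twin of the fp32 broadcast-index
# module above; only the dtype and the .to(tl.float32) upcast in the load
# differ. A minimal eager sketch of what both compute (names hypothetical,
# shapes taken from the __main__ harness above):
#
#   out = a[i, j]   # a: (8, 8, 12), i: (1, 4), j: (4, 1) int64
#                   # indices broadcast -> out: (4, 4, 12)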
2023-01-11T21:38:06.4066614Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4066619Z 
2023-01-11T21:38:06.4066703Z aten = torch.ops.aten
2023-01-11T21:38:06.4066845Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4066974Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4066979Z 
2023-01-11T21:38:06.4067058Z import triton
2023-01-11T21:38:06.4067155Z import triton.language as tl
2023-01-11T21:38:06.4067276Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4067421Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4067426Z 
2023-01-11T21:38:06.4067431Z 
2023-01-11T21:38:06.4067594Z triton_fused_index_0 = async_compile.triton('''
2023-01-11T21:38:06.4067671Z import triton
2023-01-11T21:38:06.4067766Z import triton.language as tl
2023-01-11T21:38:06.4067881Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4067990Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4068126Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4068248Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4068253Z 
2023-01-11T21:38:06.4068686Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
2023-01-11T21:38:06.4068760Z @triton.jit
2023-01-11T21:38:06.4068919Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4068994Z     xnumel = 192
2023-01-11T21:38:06.4069094Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4069224Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4069311Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4069389Z     x1 = (xindex // 12) % 4
2023-01-11T21:38:06.4069470Z     x2 = (xindex // 48)
2023-01-11T21:38:06.4069546Z     x0 = xindex % 12
2023-01-11T21:38:06.4069619Z     x4 = xindex
2023-01-11T21:38:06.4069720Z     tmp0 = tl.load(in_ptr0 + (x1), xmask)
2023-01-11T21:38:06.4069820Z     tmp1 = tl.load(in_ptr1 + (x2), xmask)
2023-01-11T21:38:06.4069962Z     tmp2 = tl.load(in_ptr2 + (x0 + (12*tmp1) + (96*tmp0)), xmask).to(tl.float32)
2023-01-11T21:38:06.4070093Z     tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
2023-01-11T21:38:06.4070181Z ''')
2023-01-11T21:38:06.4070187Z 
2023-01-11T21:38:06.4070191Z 
2023-01-11T21:38:06.4070313Z async_compile.wait(globals())
2023-01-11T21:38:06.4070394Z del async_compile
2023-01-11T21:38:06.4070399Z 
2023-01-11T21:38:06.4070476Z def call(args):
2023-01-11T21:38:06.4070564Z     arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.4070641Z     args.clear()
2023-01-11T21:38:06.4070728Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4070942Z         buf0 = empty_strided((4, 4, 12), (48, 12, 1), device='cuda', dtype=torch.float16)
2023-01-11T21:38:06.4071036Z         stream0 = get_cuda_stream(0)
2023-01-11T21:38:06.4071194Z         triton_fused_index_0.run(arg1_1, arg2_1, arg0_1, buf0, 192, grid=grid(192), stream=stream0)
2023-01-11T21:38:06.4071269Z         del arg0_1
2023-01-11T21:38:06.4071347Z         del arg1_1
2023-01-11T21:38:06.4071421Z         del arg2_1
2023-01-11T21:38:06.4071493Z     return (buf0, )
2023-01-11T21:38:06.4071506Z 
2023-01-11T21:38:06.4071510Z 
2023-01-11T21:38:06.4071585Z if __name__ == "__main__":
2023-01-11T21:38:06.4071705Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4071835Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4072046Z     arg0_1 = rand_strided((8, 8, 12), (96, 12, 1), device='cuda:0', dtype=torch.float16)
2023-01-11T21:38:06.4072245Z     arg1_1 = rand_strided((1, 4), (4, 1), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4072443Z     arg2_1 = rand_strided((4, 1), (1, 1), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4072573Z     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.4072578Z 
2023-01-11T21:38:06.4072651Z ok (0.504s)
2023-01-11T21:38:06.4073103Z test_index2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4073266Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4073530Z [2023-01-11 21:34:47,321] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 586
2023-01-11T21:38:06.4073795Z [2023-01-11 21:34:47,423] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 586
2023-01-11T21:38:06.4074213Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4074349Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4074610Z [2023-01-11 21:34:47,491] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 587
2023-01-11T21:38:06.4074877Z [2023-01-11 21:34:47,582] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 587
2023-01-11T21:38:06.4074883Z 
2023-01-11T21:38:06.4074983Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4075061Z import torch
2023-01-11T21:38:06.4075130Z import random
2023-01-11T21:38:06.4075252Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4075383Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4075388Z 
2023-01-11T21:38:06.4075471Z aten = torch.ops.aten
2023-01-11T21:38:06.4075612Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4075709Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4075718Z 
2023-01-11T21:38:06.4075791Z import triton
2023-01-11T21:38:06.4075883Z import triton.language as tl
2023-01-11T21:38:06.4076004Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4076145Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4076151Z 
2023-01-11T21:38:06.4076180Z 
2023-01-11T21:38:06.4076340Z triton_fused_index_0 = async_compile.triton('''
2023-01-11T21:38:06.4076418Z import triton
2023-01-11T21:38:06.4076510Z import triton.language as tl
2023-01-11T21:38:06.4076625Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4076729Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4076866Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4076988Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4076993Z 
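# Both loads in the kernel below carry eviction_policy='evict_last'. Roughly
# speaking (an assumption about Inductor's heuristics, not confirmed by this
# log), that hint asks the cache to keep lines it expects to be reused
# resident longer; here each 64-element row rereads the same index value.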
2023-01-11T21:38:06.4077415Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
2023-01-11T21:38:06.4077493Z @triton.jit
2023-01-11T21:38:06.4077638Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4077716Z     xnumel = 256
2023-01-11T21:38:06.4077817Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4077953Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4078037Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4078112Z     x1 = (xindex // 64)
2023-01-11T21:38:06.4078189Z     x0 = xindex % 64
2023-01-11T21:38:06.4078262Z     x2 = xindex
2023-01-11T21:38:06.4078457Z     tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.4078667Z     tmp1 = tl.load(in_ptr1 + (x0 + (64*tmp0)), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.4078805Z     tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
2023-01-11T21:38:06.4078929Z ''')
2023-01-11T21:38:06.4078935Z 
2023-01-11T21:38:06.4078939Z 
2023-01-11T21:38:06.4079096Z triton_fused_index_1_1 = async_compile.triton('''
2023-01-11T21:38:06.4079174Z import triton
2023-01-11T21:38:06.4079269Z import triton.language as tl
2023-01-11T21:38:06.4079388Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4079492Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4079627Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4079756Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4079761Z 
2023-01-11T21:38:06.4080179Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
2023-01-11T21:38:06.4080249Z @triton.jit
2023-01-11T21:38:06.4080393Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4080473Z     xnumel = 256
2023-01-11T21:38:06.4080571Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4080704Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4080790Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4080872Z     x1 = (xindex // 8) % 4
2023-01-11T21:38:06.4080944Z     x0 = xindex % 8
2023-01-11T21:38:06.4081025Z     x2 = (xindex // 32)
2023-01-11T21:38:06.4081100Z     x3 = xindex
2023-01-11T21:38:06.4081200Z     tmp0 = tl.load(in_ptr0 + (x1), xmask)
2023-01-11T21:38:06.4081323Z     tmp1 = tl.load(in_ptr1 + (x0 + (8*tmp0) + (64*x2)), xmask)
2023-01-11T21:38:06.4081459Z     tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
2023-01-11T21:38:06.4081545Z ''')
2023-01-11T21:38:06.4081550Z 
2023-01-11T21:38:06.4081555Z 
2023-01-11T21:38:06.4081647Z async_compile.wait(globals())
2023-01-11T21:38:06.4081720Z del async_compile
2023-01-11T21:38:06.4081725Z 
2023-01-11T21:38:06.4081805Z def call(args):
2023-01-11T21:38:06.4081885Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.4081961Z     args.clear()
2023-01-11T21:38:06.4082057Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4082279Z         buf0 = empty_strided((1, 4, 8, 8), (256, 64, 8, 1), device='cuda', dtype=torch.float32)
2023-01-11T21:38:06.4082400Z         stream0 = get_cuda_stream(0)
2023-01-11T21:38:06.4082544Z         triton_fused_index_0.run(arg1_1, arg0_1, buf0, 256, grid=grid(256), stream=stream0)
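        # buf0 is the dim-0 gather; the second launch below fills buf1 by
        # gathering along dim 1 instead. A rough eager equivalent of the pair
        # (names hypothetical, shapes from the __main__ harness below):
        #
        #   buf0 = a[i]       # a: (8, 8, 8) fp32, i: (1, 4) int64 -> (1, 4, 8, 8)
        #   buf1 = a[:, i]    #                                    -> (8, 1, 4, 8)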
2023-01-11T21:38:06.4082763Z         buf1 = empty_strided((8, 1, 4, 8), (32, 32, 8, 1), device='cuda', dtype=torch.float32)
2023-01-11T21:38:06.4082917Z         triton_fused_index_1_1.run(arg1_1, arg0_1, buf1, 256, grid=grid(256), stream=stream0)
2023-01-11T21:38:06.4082992Z         del arg0_1
2023-01-11T21:38:06.4083065Z         del arg1_1
2023-01-11T21:38:06.4083151Z     return (buf0, buf1, )
2023-01-11T21:38:06.4083156Z 
2023-01-11T21:38:06.4083160Z 
2023-01-11T21:38:06.4083241Z if __name__ == "__main__":
2023-01-11T21:38:06.4083362Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4083485Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4083694Z     arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float32)
2023-01-11T21:38:06.4090045Z     arg1_1 = rand_strided((1, 4), (4, 1), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4090168Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.4090174Z 
2023-01-11T21:38:06.4090178Z 
2023-01-11T21:38:06.4090276Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4090353Z import torch
2023-01-11T21:38:06.4090431Z import random
2023-01-11T21:38:06.4090551Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4090677Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4090683Z 
2023-01-11T21:38:06.4090757Z aten = torch.ops.aten
2023-01-11T21:38:06.4090898Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4091054Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4091059Z 
2023-01-11T21:38:06.4091136Z import triton
2023-01-11T21:38:06.4091232Z import triton.language as tl
2023-01-11T21:38:06.4091361Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4091499Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4091506Z 
2023-01-11T21:38:06.4091510Z 
2023-01-11T21:38:06.4091667Z triton_fused_index_0 = async_compile.triton('''
2023-01-11T21:38:06.4091745Z import triton
2023-01-11T21:38:06.4091837Z import triton.language as tl
2023-01-11T21:38:06.4091954Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4092058Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4092194Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4092323Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4092328Z 
2023-01-11T21:38:06.4092748Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
2023-01-11T21:38:06.4092822Z @triton.jit
2023-01-11T21:38:06.4092966Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4093045Z     xnumel = 256
2023-01-11T21:38:06.4093144Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4093275Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4093361Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4093443Z     x1 = (xindex // 64)
2023-01-11T21:38:06.4093514Z     x0 = xindex % 64
2023-01-11T21:38:06.4093586Z     x2 = xindex
2023-01-11T21:38:06.4093780Z     tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.4094012Z     tmp1 = tl.load(in_ptr1 + (x0 + (64*tmp0)), xmask, eviction_policy='evict_last').to(tl.float32)
2023-01-11T21:38:06.4094150Z     tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
2023-01-11T21:38:06.4094239Z ''')
2023-01-11T21:38:06.4094245Z 
2023-01-11T21:38:06.4094249Z 
2023-01-11T21:38:06.4094411Z triton_fused_index_1_1 = async_compile.triton('''
2023-01-11T21:38:06.4094724Z import triton
2023-01-11T21:38:06.4094814Z import triton.language as tl
2023-01-11T21:38:06.4094981Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4095084Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4095218Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4095361Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4095368Z 
2023-01-11T21:38:06.4095818Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
2023-01-11T21:38:06.4095890Z @triton.jit
2023-01-11T21:38:06.4096034Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4096101Z     xnumel = 256
2023-01-11T21:38:06.4096193Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4096321Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4096403Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4096489Z     x1 = (xindex // 8) % 4
2023-01-11T21:38:06.4096562Z     x0 = xindex % 8
2023-01-11T21:38:06.4096639Z     x2 = (xindex // 32)
2023-01-11T21:38:06.4096703Z     x3 = xindex
2023-01-11T21:38:06.4096801Z     tmp0 = tl.load(in_ptr0 + (x1), xmask)
2023-01-11T21:38:06.4096934Z     tmp1 = tl.load(in_ptr1 + (x0 + (8*tmp0) + (64*x2)), xmask).to(tl.float32)
2023-01-11T21:38:06.4097068Z     tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
2023-01-11T21:38:06.4097219Z ''')
2023-01-11T21:38:06.4097226Z 
2023-01-11T21:38:06.4097234Z 
2023-01-11T21:38:06.4097338Z async_compile.wait(globals())
2023-01-11T21:38:06.4097415Z del async_compile
2023-01-11T21:38:06.4097508Z 
2023-01-11T21:38:06.4097577Z def call(args):
2023-01-11T21:38:06.4097655Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.4097729Z     args.clear()
2023-01-11T21:38:06.4097818Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4098037Z         buf0 = empty_strided((1, 4, 8, 8), (256, 64, 8, 1), device='cuda', dtype=torch.float16)
2023-01-11T21:38:06.4098131Z         stream0 = get_cuda_stream(0)
2023-01-11T21:38:06.4098277Z         triton_fused_index_0.run(arg1_1, arg0_1, buf0, 256, grid=grid(256), stream=stream0)
2023-01-11T21:38:06.4098492Z         buf1 = empty_strided((8, 1, 4, 8), (32, 32, 8, 1), device='cuda', dtype=torch.float16)
2023-01-11T21:38:06.4098630Z         triton_fused_index_1_1.run(arg1_1, arg0_1, buf1, 256, grid=grid(256), stream=stream0)
2023-01-11T21:38:06.4098703Z         del arg0_1
2023-01-11T21:38:06.4098775Z         del arg1_1
2023-01-11T21:38:06.4098859Z     return (buf0, buf1, )
2023-01-11T21:38:06.4098864Z 
2023-01-11T21:38:06.4098868Z 
2023-01-11T21:38:06.4098950Z if __name__ == "__main__":
2023-01-11T21:38:06.4099066Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4099192Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4099400Z     arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float16)
2023-01-11T21:38:06.4099593Z     arg1_1 = rand_strided((1, 4), (4, 1), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4099711Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.4099717Z 
2023-01-11T21:38:06.4099787Z ok (0.329s)
2023-01-11T21:38:06.4100239Z test_index3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4100373Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4100636Z [2023-01-11 21:34:47,625] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 588
2023-01-11T21:38:06.4100854Z [2023-01-11 21:34:47,647] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.index
2023-01-11T21:38:06.4101146Z [2023-01-11 21:34:47,650] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 588
2023-01-11T21:38:06.4101556Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4101686Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4101930Z [2023-01-11 21:34:47,690] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 589
2023-01-11T21:38:06.4102144Z [2023-01-11 21:34:47,711] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.index
2023-01-11T21:38:06.4102401Z [2023-01-11 21:34:47,714] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 589
2023-01-11T21:38:06.4102409Z 
2023-01-11T21:38:06.4102506Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4102580Z import torch
2023-01-11T21:38:06.4102652Z import random
2023-01-11T21:38:06.4102770Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4102892Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4102898Z 
2023-01-11T21:38:06.4102973Z aten = torch.ops.aten
2023-01-11T21:38:06.4103111Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4103207Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4103213Z 
2023-01-11T21:38:06.4103286Z import triton
2023-01-11T21:38:06.4103403Z import triton.language as tl
2023-01-11T21:38:06.4103528Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4103668Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4103673Z 
2023-01-11T21:38:06.4103677Z 
2023-01-11T21:38:06.4103769Z async_compile.wait(globals())
2023-01-11T21:38:06.4103840Z del async_compile
2023-01-11T21:38:06.4103851Z 
2023-01-11T21:38:06.4103919Z def call(args):
2023-01-11T21:38:06.4104004Z     arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.4104080Z     args.clear()
2023-01-11T21:38:06.4104172Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4104324Z         buf0 = aten.index(as_strided(arg0_1, (3, 4, 1, 4, 3), (192, 48, 0, 12, 1)), [None, arg1_1, None, arg2_1])
2023-01-11T21:38:06.4104399Z         del arg0_1
2023-01-11T21:38:06.4104464Z         del arg1_1
2023-01-11T21:38:06.4104534Z         del arg2_1
2023-01-11T21:38:06.4104604Z         buf1 = buf0
2023-01-11T21:38:06.4104716Z         assert_size_stride(buf1, (3, 3, 1, 3), (9, 3, 3, 1))
2023-01-11T21:38:06.4104789Z         del buf0
2023-01-11T21:38:06.4104867Z     return (buf1, )
2023-01-11T21:38:06.4104872Z 
2023-01-11T21:38:06.4104877Z 
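# Because Inductor logged "Using FallbackKernel: aten.index" for this graph,
# call() above dispatches straight to the eager ATen kernel rather than a
# generated Triton kernel. Roughly the Python-level equivalent of that
# fallback line (a sketch, not the test source):
#
#   base = torch.as_strided(arg0_1, (3, 4, 1, 4, 3), (192, 48, 0, 12, 1))
#   buf0 = base[:, arg1_1, :, arg2_1]
#   # the two (3,) advanced indices, separated by a slice, move their
#   # broadcast dimension to the front: result (3, 3, 1, 3), as asserted above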
2023-01-11T21:38:06.4104955Z if __name__ == "__main__":
2023-01-11T21:38:06.4105070Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4105194Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4105425Z     arg0_1 = rand_strided((3, 4, 4, 4, 3), (192, 48, 12, 3, 1), device='cuda:0', dtype=torch.float32)
2023-01-11T21:38:06.4105639Z     arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4105860Z     arg2_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4105987Z     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.4105993Z 
2023-01-11T21:38:06.4105997Z 
2023-01-11T21:38:06.4106096Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4106170Z import torch
2023-01-11T21:38:06.4106245Z import random
2023-01-11T21:38:06.4106356Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4106480Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4106486Z 
2023-01-11T21:38:06.4106568Z aten = torch.ops.aten
2023-01-11T21:38:06.4106702Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4106823Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4106829Z 
2023-01-11T21:38:06.4106903Z import triton
2023-01-11T21:38:06.4106993Z import triton.language as tl
2023-01-11T21:38:06.4107111Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4107248Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4107253Z 
2023-01-11T21:38:06.4107258Z 
2023-01-11T21:38:06.4107349Z async_compile.wait(globals())
2023-01-11T21:38:06.4107424Z del async_compile
2023-01-11T21:38:06.4107429Z 
2023-01-11T21:38:06.4107503Z def call(args):
2023-01-11T21:38:06.4107587Z     arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.4107662Z     args.clear()
2023-01-11T21:38:06.4107752Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4107896Z         buf0 = aten.index(as_strided(arg0_1, (3, 4, 1, 4, 3), (192, 48, 0, 12, 1)), [None, arg1_1, None, arg2_1])
2023-01-11T21:38:06.4107972Z         del arg0_1
2023-01-11T21:38:06.4108045Z         del arg1_1
2023-01-11T21:38:06.4108118Z         del arg2_1
2023-01-11T21:38:06.4108190Z         buf1 = buf0
2023-01-11T21:38:06.4108300Z         assert_size_stride(buf1, (3, 3, 1, 3), (9, 3, 3, 1))
2023-01-11T21:38:06.4108371Z         del buf0
2023-01-11T21:38:06.4108442Z     return (buf1, )
2023-01-11T21:38:06.4108447Z 
2023-01-11T21:38:06.4108457Z 
2023-01-11T21:38:06.4108529Z if __name__ == "__main__":
2023-01-11T21:38:06.4108647Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4108772Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4108995Z     arg0_1 = rand_strided((3, 4, 4, 4, 3), (192, 48, 12, 3, 1), device='cuda:0', dtype=torch.float16)
2023-01-11T21:38:06.4109217Z     arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4109407Z     arg2_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4109531Z     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.4109536Z 
2023-01-11T21:38:06.4109605Z ok (0.131s)
2023-01-11T21:38:06.4110061Z test_index_put1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4110193Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4110449Z [2023-01-11 21:34:47,901] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 590
2023-01-11T21:38:06.4110712Z [2023-01-11 21:34:48,118] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 590
2023-01-11T21:38:06.4111129Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4111260Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4111518Z [2023-01-11 21:34:48,208] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 591
2023-01-11T21:38:06.4111524Z 
2023-01-11T21:38:06.4111621Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4111695Z import torch
2023-01-11T21:38:06.4111768Z import random
2023-01-11T21:38:06.4111880Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4112003Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4112009Z 
2023-01-11T21:38:06.4112090Z aten = torch.ops.aten
2023-01-11T21:38:06.4112227Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4112320Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4112354Z 
2023-01-11T21:38:06.4112432Z import triton
2023-01-11T21:38:06.4112527Z import triton.language as tl
2023-01-11T21:38:06.4112646Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4112788Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4112794Z 
2023-01-11T21:38:06.4112798Z 
2023-01-11T21:38:06.4112970Z triton_fused_add_index_put_0 = async_compile.triton('''
2023-01-11T21:38:06.4113045Z import triton
2023-01-11T21:38:06.4113138Z import triton.language as tl
2023-01-11T21:38:06.4113251Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4113355Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4113494Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4113615Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4113620Z 
2023-01-11T21:38:06.4114053Z @pointwise(size_hints=[16777216], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
2023-01-11T21:38:06.4114130Z @triton.jit
2023-01-11T21:38:06.4114274Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4114355Z     xnumel = 10035200
2023-01-11T21:38:06.4114455Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4114589Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4114674Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4114741Z     x0 = xindex
2023-01-11T21:38:06.4114933Z     tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.4115067Z     tmp1 = tl.load(in_ptr0 + (x0), xmask)
2023-01-11T21:38:06.4115139Z     tmp2 = 1
2023-01-11T21:38:06.4115218Z     tmp3 = tmp1 + tmp2
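    # The two stores below fuse the setup for both index_put_ destinations in
    # one pass: out_ptr0 gets an unmodified copy of the input (the clone that
    # the first scatter mutates) while out_ptr1 gets input + 1 for the second.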
2023-01-11T21:38:06.4115375Z     tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
2023-01-11T21:38:06.4115531Z     tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask)
2023-01-11T21:38:06.4115618Z ''')
2023-01-11T21:38:06.4115624Z 
2023-01-11T21:38:06.4115636Z 
2023-01-11T21:38:06.4115829Z triton_fused_add_add_2_index_put_index_put__1 = async_compile.triton('''
2023-01-11T21:38:06.4115903Z import triton
2023-01-11T21:38:06.4115995Z import triton.language as tl
2023-01-11T21:38:06.4116109Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4116215Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4116345Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4116470Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4116479Z 
2023-01-11T21:38:06.4116930Z @pointwise(size_hints=[8388608], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
2023-01-11T21:38:06.4117004Z @triton.jit
2023-01-11T21:38:06.4117156Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4117233Z     xnumel = 7538944
2023-01-11T21:38:06.4117330Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4117460Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4117543Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4117627Z     x1 = (xindex // 12544)
2023-01-11T21:38:06.4117690Z     x2 = xindex
2023-01-11T21:38:06.4117771Z     x0 = xindex % 12544
2023-01-11T21:38:06.4117965Z     tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.4118159Z     tmp1 = tl.load(in_ptr1 + (x2), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.4118255Z     tmp2 = tl.load(in_ptr0 + (x1), xmask)
2023-01-11T21:38:06.4118351Z     tmp5 = tl.load(in_ptr1 + (x2), xmask)
2023-01-11T21:38:06.4118421Z     tmp3 = 1
2023-01-11T21:38:06.4118492Z     tmp4 = tmp2 + tmp3
2023-01-11T21:38:06.4118598Z     tmp6 = tmp5 + tmp3
2023-01-11T21:38:06.4118743Z     tl.store(out_ptr0 + (x0 + (12544*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
2023-01-11T21:38:06.4118888Z     tl.store(out_ptr1 + (x0 + (12544*tmp4) + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask)
2023-01-11T21:38:06.4118973Z ''')
2023-01-11T21:38:06.4118979Z 
2023-01-11T21:38:06.4118983Z 
2023-01-11T21:38:06.4119139Z triton_fused_add_3_2 = async_compile.triton('''
2023-01-11T21:38:06.4119212Z import triton
2023-01-11T21:38:06.4119298Z import triton.language as tl
2023-01-11T21:38:06.4119411Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4119510Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4119648Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4119772Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4119777Z 
2023-01-11T21:38:06.4120187Z @pointwise(size_hints=[16777216], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
2023-01-11T21:38:06.4120262Z @triton.jit
2023-01-11T21:38:06.4120395Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4120470Z     xnumel = 10035200
2023-01-11T21:38:06.4120560Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4120687Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4120770Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4120841Z     x0 = xindex
2023-01-11T21:38:06.4120935Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2023-01-11T21:38:06.4121034Z     tmp1 = 1
2023-01-11T21:38:06.4121106Z     tmp2 = tmp0 + tmp1
2023-01-11T21:38:06.4121240Z     tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
2023-01-11T21:38:06.4121325Z ''')
2023-01-11T21:38:06.4121331Z 
2023-01-11T21:38:06.4121335Z 
2023-01-11T21:38:06.4121427Z async_compile.wait(globals())
2023-01-11T21:38:06.4121506Z del async_compile
2023-01-11T21:38:06.4121511Z 
2023-01-11T21:38:06.4121589Z def call(args):
2023-01-11T21:38:06.4121676Z     arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.4121751Z     args.clear()
2023-01-11T21:38:06.4121835Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4122062Z         buf0 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cuda', dtype=torch.float32)
2023-01-11T21:38:06.4122280Z         buf2 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cuda', dtype=torch.float32)
2023-01-11T21:38:06.4122372Z         stream0 = get_cuda_stream(0)
2023-01-11T21:38:06.4122532Z         triton_fused_add_index_put_0.run(arg0_1, buf0, buf2, 10035200, grid=grid(10035200), stream=stream0)
2023-01-11T21:38:06.4122609Z         del arg0_1
2023-01-11T21:38:06.4122789Z         triton_fused_add_add_2_index_put_index_put__1.run(arg1_1, arg2_1, buf0, buf2, 7538944, grid=grid(7538944), stream=stream0)
2023-01-11T21:38:06.4122867Z         del arg1_1
2023-01-11T21:38:06.4122935Z         del arg2_1
2023-01-11T21:38:06.4123159Z         buf4 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cuda', dtype=torch.float32)
2023-01-11T21:38:06.4123300Z         triton_fused_add_3_2.run(buf2, buf4, 10035200, grid=grid(10035200), stream=stream0)
2023-01-11T21:38:06.4123381Z     return (buf0, buf4, )
2023-01-11T21:38:06.4123386Z 
2023-01-11T21:38:06.4123391Z 
2023-01-11T21:38:06.4123470Z if __name__ == "__main__":
2023-01-11T21:38:06.4123589Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4123713Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4123939Z     arg0_1 = rand_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cuda:0', dtype=torch.float32)
2023-01-11T21:38:06.4124130Z     arg1_1 = rand_strided((601, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4124352Z     arg2_1 = rand_strided((601, 256, 7, 7), (12544, 49, 7, 1), device='cuda:0', dtype=torch.float32)
2023-01-11T21:38:06.4124512Z     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.4124518Z 
2023-01-11T21:38:06.4124782Z [2023-01-11 21:34:48,322] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 591
2023-01-11T21:38:06.4125197Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4125331Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4125590Z [2023-01-11 21:34:48,412] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 592
2023-01-11T21:38:06.4125596Z 
2023-01-11T21:38:06.4125693Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4125767Z import torch
2023-01-11T21:38:06.4125836Z import random
2023-01-11T21:38:06.4125957Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4126079Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4126084Z 
2023-01-11T21:38:06.4126166Z aten = torch.ops.aten
2023-01-11T21:38:06.4126302Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4126398Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4126403Z 
2023-01-11T21:38:06.4126476Z import triton
2023-01-11T21:38:06.4126569Z import triton.language as tl
2023-01-11T21:38:06.4126687Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4126826Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4126869Z 
2023-01-11T21:38:06.4126874Z 
2023-01-11T21:38:06.4127050Z triton_fused_add_index_put_0 = async_compile.triton('''
2023-01-11T21:38:06.4127125Z import triton
2023-01-11T21:38:06.4127218Z import triton.language as tl
2023-01-11T21:38:06.4127334Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4127435Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4127565Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4127684Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4127689Z 
2023-01-11T21:38:06.4128112Z @pointwise(size_hints=[16777216], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
2023-01-11T21:38:06.4128188Z @triton.jit
2023-01-11T21:38:06.4128332Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4128412Z     xnumel = 10035200
2023-01-11T21:38:06.4128507Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4128634Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4128718Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4128782Z     x0 = xindex
2023-01-11T21:38:06.4128997Z     tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
2023-01-11T21:38:06.4129115Z     tmp1 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2023-01-11T21:38:06.4129186Z     tmp2 = 1
2023-01-11T21:38:06.4129266Z     tmp3 = tmp1 + tmp2
2023-01-11T21:38:06.4129401Z     tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
2023-01-11T21:38:06.4129531Z     tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask)
2023-01-11T21:38:06.4129608Z ''')
2023-01-11T21:38:06.4129613Z 
2023-01-11T21:38:06.4129624Z 
2023-01-11T21:38:06.4129817Z triton_fused_add_add_2_index_put_index_put__1 = async_compile.triton('''
2023-01-11T21:38:06.4129898Z import triton
2023-01-11T21:38:06.4129990Z import triton.language as tl
2023-01-11T21:38:06.4130106Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4130207Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4130339Z from torch._inductor.triton_ops.autotune import pointwise
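# Note the scatter below declares mutated_arg_names={'out_ptr0', 'out_ptr1'}:
# unlike the pointwise kernels, it writes those buffers in place, which
# (roughly, an assumption about the autotuner rather than something this log
# confirms) lets benchmarking restore or clone them between timing runs so
# repeated passes don't compound the writes.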
2023-01-11T21:38:06.4130491Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4130496Z 
2023-01-11T21:38:06.4130948Z @pointwise(size_hints=[8388608], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
2023-01-11T21:38:06.4131022Z @triton.jit
2023-01-11T21:38:06.4131173Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4131251Z     xnumel = 7538944
2023-01-11T21:38:06.4131356Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4131481Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4131563Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4131644Z     x1 = (xindex // 12544)
2023-01-11T21:38:06.4131707Z     x2 = xindex
2023-01-11T21:38:06.4131783Z     x0 = xindex % 12544
2023-01-11T21:38:06.4131977Z     tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.4132191Z     tmp1 = tl.load(in_ptr1 + (x2), xmask, eviction_policy='evict_last').to(tl.float32)
2023-01-11T21:38:06.4132288Z     tmp2 = tl.load(in_ptr0 + (x1), xmask)
2023-01-11T21:38:06.4132404Z     tmp5 = tl.load(in_ptr1 + (x2), xmask).to(tl.float32)
2023-01-11T21:38:06.4132473Z     tmp3 = 1
2023-01-11T21:38:06.4132545Z     tmp4 = tmp2 + tmp3
2023-01-11T21:38:06.4132620Z     tmp6 = tmp5 + tmp3
2023-01-11T21:38:06.4132765Z     tl.store(out_ptr0 + (x0 + (12544*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
2023-01-11T21:38:06.4132908Z     tl.store(out_ptr1 + (x0 + (12544*tmp4) + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask)
2023-01-11T21:38:06.4133023Z ''')
2023-01-11T21:38:06.4133028Z 
2023-01-11T21:38:06.4133033Z 
2023-01-11T21:38:06.4133189Z triton_fused_add_3_2 = async_compile.triton('''
2023-01-11T21:38:06.4133262Z import triton
2023-01-11T21:38:06.4133348Z import triton.language as tl
2023-01-11T21:38:06.4133464Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4133566Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4133697Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4133820Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4133825Z 
2023-01-11T21:38:06.4134232Z @pointwise(size_hints=[16777216], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
2023-01-11T21:38:06.4134307Z @triton.jit
2023-01-11T21:38:06.4134442Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4134635Z     xnumel = 10035200
2023-01-11T21:38:06.4134727Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4134858Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4134940Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4135011Z     x0 = xindex
2023-01-11T21:38:06.4135127Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2023-01-11T21:38:06.4135197Z     tmp1 = 1
2023-01-11T21:38:06.4135269Z     tmp2 = tmp0 + tmp1
2023-01-11T21:38:06.4135401Z     tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
2023-01-11T21:38:06.4135485Z ''')
2023-01-11T21:38:06.4135491Z 
2023-01-11T21:38:06.4135495Z 
2023-01-11T21:38:06.4135587Z async_compile.wait(globals())
2023-01-11T21:38:06.4135663Z del async_compile
2023-01-11T21:38:06.4135668Z 
2023-01-11T21:38:06.4135742Z def call(args):
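    # Same three-kernel schedule as the fp32 module above, now in fp16:
    # kernel 0 writes buf0 = input and buf2 = input + 1, kernel 1 scatters
    # values into buf0[idx] and values + 1 into buf2[idx + 1], and kernel 2
    # adds 1 once more into buf4. A rough eager sketch (names hypothetical):
    #
    #   buf0 = a.clone(); buf0[idx] = vals
    #   buf2 = a + 1;     buf2[idx + 1] = vals + 1
    #   return buf0, buf2 + 1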
2023-01-11T21:38:06.4135828Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4135905Z args.clear() 2023-01-11T21:38:06.4135990Z with torch.cuda.device(0): 2023-01-11T21:38:06.4136217Z buf0 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4136436Z buf2 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4136573Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4136736Z triton_fused_add_index_put_0.run(arg0_1, buf0, buf2, 10035200, grid=grid(10035200), stream=stream0) 2023-01-11T21:38:06.4136809Z del arg0_1 2023-01-11T21:38:06.4136990Z triton_fused_add_add_2_index_put_index_put__1.run(arg1_1, arg2_1, buf0, buf2, 7538944, grid=grid(7538944), stream=stream0) 2023-01-11T21:38:06.4137061Z del arg1_1 2023-01-11T21:38:06.4137177Z del arg2_1 2023-01-11T21:38:06.4137404Z buf4 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4137549Z triton_fused_add_3_2.run(buf2, buf4, 10035200, grid=grid(10035200), stream=stream0) 2023-01-11T21:38:06.4137632Z return (buf0, buf4, ) 2023-01-11T21:38:06.4137637Z 2023-01-11T21:38:06.4137641Z 2023-01-11T21:38:06.4137721Z if __name__ == "__main__": 2023-01-11T21:38:06.4137838Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4137968Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4138193Z arg0_1 = rand_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4138382Z arg1_1 = rand_strided((601, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4138606Z arg2_1 = rand_strided((601, 256, 7, 7), (12544, 49, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4138733Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4138738Z 2023-01-11T21:38:06.4139002Z [2023-01-11 21:34:48,618] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 592 2023-01-11T21:38:06.4139461Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4139591Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4139846Z [2023-01-11 21:34:48,691] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 593 2023-01-11T21:38:06.4139851Z 2023-01-11T21:38:06.4139949Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4140026Z import torch 2023-01-11T21:38:06.4140093Z import random 2023-01-11T21:38:06.4140212Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4140335Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4140343Z 2023-01-11T21:38:06.4140424Z aten = torch.ops.aten 2023-01-11T21:38:06.4140558Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4140652Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4140657Z 2023-01-11T21:38:06.4140732Z import triton 2023-01-11T21:38:06.4140824Z import triton.language as tl 2023-01-11T21:38:06.4140945Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4141085Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4141090Z 2023-01-11T21:38:06.4141095Z 2023-01-11T21:38:06.4141264Z triton_fused_add_index_put_0 = async_compile.triton(''' 2023-01-11T21:38:06.4141338Z import triton 2023-01-11T21:38:06.4141429Z import triton.language as tl 2023-01-11T21:38:06.4141541Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4141643Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4141775Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4141897Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4141902Z 2023-01-11T21:38:06.4142345Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4142422Z @triton.jit 2023-01-11T21:38:06.4142568Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4142644Z xnumel = 8192 2023-01-11T21:38:06.4142742Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4142870Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4142953Z xmask = xindex < xnumel 2023-01-11T21:38:06.4143017Z x0 = xindex 2023-01-11T21:38:06.4143207Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4143306Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4143378Z tmp2 = 1 2023-01-11T21:38:06.4143458Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.4143593Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4143727Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4143805Z ''') 2023-01-11T21:38:06.4143810Z 2023-01-11T21:38:06.4143817Z 2023-01-11T21:38:06.4144014Z triton_fused_add_add_2_index_put_index_put__1 = async_compile.triton(''' 2023-01-11T21:38:06.4144088Z import triton 2023-01-11T21:38:06.4144180Z import triton.language as tl 2023-01-11T21:38:06.4144294Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4144391Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4144524Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4144641Z from torch._inductor.utils import 
instance_descriptor 2023-01-11T21:38:06.4144653Z 2023-01-11T21:38:06.4145098Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.4145204Z @triton.jit 2023-01-11T21:38:06.4145353Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4145431Z xnumel = 32 2023-01-11T21:38:06.4145548Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4145698Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4145783Z xmask = xindex < xnumel 2023-01-11T21:38:06.4145861Z x1 = (xindex // 8) 2023-01-11T21:38:06.4145928Z x0 = xindex % 8 2023-01-11T21:38:06.4146118Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4146310Z tmp1 = tl.load(in_ptr1 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4146407Z tmp2 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4146505Z tmp5 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.4146575Z tmp3 = 1 2023-01-11T21:38:06.4146653Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.4146725Z tmp6 = tmp5 + tmp3 2023-01-11T21:38:06.4146869Z tl.store(out_ptr0 + (x0 + (8*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4147005Z tl.store(out_ptr1 + (x0 + (8*tmp4) + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.4147087Z ''') 2023-01-11T21:38:06.4147093Z 2023-01-11T21:38:06.4147097Z 2023-01-11T21:38:06.4147251Z triton_fused_add_3_2 = async_compile.triton(''' 2023-01-11T21:38:06.4147321Z import triton 2023-01-11T21:38:06.4147413Z import triton.language as tl 2023-01-11T21:38:06.4147520Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4147621Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4147755Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4147878Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4147887Z 2023-01-11T21:38:06.4148288Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4148361Z @triton.jit 2023-01-11T21:38:06.4148522Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4148597Z xnumel = 8192 2023-01-11T21:38:06.4148687Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4148815Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4148899Z xmask = xindex < xnumel 2023-01-11T21:38:06.4148969Z x0 = xindex 2023-01-11T21:38:06.4149065Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4149136Z tmp1 = 1 2023-01-11T21:38:06.4149214Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4149339Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4149428Z ''') 2023-01-11T21:38:06.4149434Z 2023-01-11T21:38:06.4149438Z 2023-01-11T21:38:06.4149528Z async_compile.wait(globals()) 2023-01-11T21:38:06.4149609Z del async_compile 2023-01-11T21:38:06.4149614Z 2023-01-11T21:38:06.4149687Z def call(args): 2023-01-11T21:38:06.4149774Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4149847Z args.clear() 2023-01-11T21:38:06.4149941Z with torch.cuda.device(0): 
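# A plausible eager-mode reading of graph 593, inferred from the kernel names
# (...add_index_put_0, ...add_add_2_index_put_index_put__1, ...add_3_2) and
# the scatter stores above -- a sketch of what this wrapper computes, not the
# test's actual source:
#
#     def fn(x, idx, v):                        # x:(1024,4,2), idx:(4,), v:(4,1,1)
#         a = x.index_put((idx,), v)                # buf0: copy of x, rows idx <- v
#         b = (x + 1).index_put((idx + 1,), v + 1)  # buf2: fused adds + second scatter
#         return a, b + 1                           # buf4 = buf2 + 1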
2023-01-11T21:38:06.4150141Z buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4150350Z buf2 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4150443Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4150598Z triton_fused_add_index_put_0.run(arg0_1, buf0, buf2, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4150671Z del arg0_1 2023-01-11T21:38:06.4150845Z triton_fused_add_add_2_index_put_index_put__1.run(arg1_1, arg2_1, buf0, buf2, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.4150947Z del arg1_1 2023-01-11T21:38:06.4151012Z del arg2_1 2023-01-11T21:38:06.4151220Z buf4 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4151357Z triton_fused_add_3_2.run(buf2, buf4, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4151441Z return (buf0, buf4, ) 2023-01-11T21:38:06.4151447Z 2023-01-11T21:38:06.4151451Z 2023-01-11T21:38:06.4151530Z if __name__ == "__main__": 2023-01-11T21:38:06.4151646Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4151771Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4151979Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4152165Z arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4152368Z arg2_1 = rand_strided((4, 1, 1), (1, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4152498Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4152503Z 2023-01-11T21:38:06.4152765Z [2023-01-11 21:34:48,791] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 593 2023-01-11T21:38:06.4152771Z 2023-01-11T21:38:06.4152868Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4152944Z import torch 2023-01-11T21:38:06.4153020Z import random 2023-01-11T21:38:06.4153138Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4153256Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4153267Z 2023-01-11T21:38:06.4153342Z aten = torch.ops.aten 2023-01-11T21:38:06.4153477Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4153572Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4153577Z 2023-01-11T21:38:06.4153648Z import triton 2023-01-11T21:38:06.4153741Z import triton.language as tl 2023-01-11T21:38:06.4153865Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4154003Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4154008Z 2023-01-11T21:38:06.4154013Z 2023-01-11T21:38:06.4154184Z triton_fused_add_index_put_0 = async_compile.triton(''' 2023-01-11T21:38:06.4154254Z import triton 2023-01-11T21:38:06.4154373Z import triton.language as tl 2023-01-11T21:38:06.4154488Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4154588Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4154719Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4154843Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4154849Z 2023-01-11T21:38:06.4155270Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 
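# Reading the @pointwise meta above: 'signature' gives the Triton-level types
# ('*fp16' = pointer to half, 'i32' = scalar int32), size_hints bounds the
# grid sizes the autotuner will try, 'mutated_arg_names' lists any buffers
# the kernel updates in place (empty here; the scatter kernels elsewhere in
# this log mark out_ptr0/out_ptr1), and divisible_by_16 records which
# arguments may be assumed divisible by 16 -- an alignment hint that lets the
# compiler emit vectorized loads and stores.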
2023-01-11T21:38:06.4155366Z @triton.jit 2023-01-11T21:38:06.4155519Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4155605Z xnumel = 8192 2023-01-11T21:38:06.4155700Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4155829Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4155913Z xmask = xindex < xnumel 2023-01-11T21:38:06.4155983Z x0 = xindex 2023-01-11T21:38:06.4156201Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4156312Z tmp1 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4156381Z tmp2 = 1 2023-01-11T21:38:06.4156461Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.4156594Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4156724Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4156809Z ''') 2023-01-11T21:38:06.4156815Z 2023-01-11T21:38:06.4156846Z 2023-01-11T21:38:06.4157042Z triton_fused_add_add_2_index_put_index_put__1 = async_compile.triton(''' 2023-01-11T21:38:06.4157110Z import triton 2023-01-11T21:38:06.4157202Z import triton.language as tl 2023-01-11T21:38:06.4157315Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4157417Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4157550Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4157673Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4157678Z 2023-01-11T21:38:06.4158134Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.4158207Z @triton.jit 2023-01-11T21:38:06.4158358Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4158428Z xnumel = 32 2023-01-11T21:38:06.4158523Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4158651Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4158732Z xmask = xindex < xnumel 2023-01-11T21:38:06.4158809Z x1 = (xindex // 8) 2023-01-11T21:38:06.4158879Z x0 = xindex % 8 2023-01-11T21:38:06.4159074Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4159282Z tmp1 = tl.load(in_ptr1 + (x1), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4159378Z tmp2 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4159493Z tmp5 = tl.load(in_ptr1 + (x1), xmask).to(tl.float32) 2023-01-11T21:38:06.4159564Z tmp3 = 1 2023-01-11T21:38:06.4159640Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.4159716Z tmp6 = tmp5 + tmp3 2023-01-11T21:38:06.4159858Z tl.store(out_ptr0 + (x0 + (8*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4159991Z tl.store(out_ptr1 + (x0 + (8*tmp4) + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.4160074Z ''') 2023-01-11T21:38:06.4160080Z 2023-01-11T21:38:06.4160084Z 2023-01-11T21:38:06.4160244Z triton_fused_add_3_2 = async_compile.triton(''' 2023-01-11T21:38:06.4160317Z import triton 2023-01-11T21:38:06.4160409Z import triton.language as tl 2023-01-11T21:38:06.4160548Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4160649Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4160773Z from torch._inductor.triton_ops.autotune 
import pointwise 2023-01-11T21:38:06.4160897Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4160902Z 2023-01-11T21:38:06.4161307Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4161379Z @triton.jit 2023-01-11T21:38:06.4161516Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4161589Z xnumel = 8192 2023-01-11T21:38:06.4161686Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4161814Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4161889Z xmask = xindex < xnumel 2023-01-11T21:38:06.4161963Z x0 = xindex 2023-01-11T21:38:06.4162077Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4162145Z tmp1 = 1 2023-01-11T21:38:06.4162222Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4162354Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4162439Z ''') 2023-01-11T21:38:06.4162444Z 2023-01-11T21:38:06.4162449Z 2023-01-11T21:38:06.4162543Z async_compile.wait(globals()) 2023-01-11T21:38:06.4162612Z del async_compile 2023-01-11T21:38:06.4162617Z 2023-01-11T21:38:06.4162697Z def call(args): 2023-01-11T21:38:06.4162782Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4162885Z args.clear() 2023-01-11T21:38:06.4162972Z with torch.cuda.device(0): 2023-01-11T21:38:06.4163182Z buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4163388Z buf2 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4163476Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4163630Z triton_fused_add_index_put_0.run(arg0_1, buf0, buf2, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4163703Z del arg0_1 2023-01-11T21:38:06.4163875Z triton_fused_add_add_2_index_put_index_put__1.run(arg1_1, arg2_1, buf0, buf2, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.4163947Z del arg1_1 2023-01-11T21:38:06.4164017Z del arg2_1 2023-01-11T21:38:06.4164224Z buf4 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4164361Z triton_fused_add_3_2.run(buf2, buf4, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4164440Z return (buf0, buf4, ) 2023-01-11T21:38:06.4164445Z 2023-01-11T21:38:06.4164449Z 2023-01-11T21:38:06.4164530Z if __name__ == "__main__": 2023-01-11T21:38:06.4164648Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4164781Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4164987Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4165182Z arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4165385Z arg2_1 = rand_strided((4, 1, 1), (1, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4165514Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4165520Z 2023-01-11T21:38:06.4165583Z ok (1.078s) 2023-01-11T21:38:06.4166044Z test_index_put2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4166208Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4166468Z [2023-01-11 21:34:48,880] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 594 2023-01-11T21:38:06.4166730Z [2023-01-11 21:34:48,970] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 594 2023-01-11T21:38:06.4166737Z 2023-01-11T21:38:06.4166835Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4166908Z import torch 2023-01-11T21:38:06.4166983Z import random 2023-01-11T21:38:06.4167101Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4167217Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4167225Z 2023-01-11T21:38:06.4167308Z aten = torch.ops.aten 2023-01-11T21:38:06.4167444Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4167539Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4167544Z 2023-01-11T21:38:06.4167615Z import triton 2023-01-11T21:38:06.4167707Z import triton.language as tl 2023-01-11T21:38:06.4167834Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4167976Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4167981Z 2023-01-11T21:38:06.4167985Z 2023-01-11T21:38:06.4168142Z triton_fused_index_put_0 = async_compile.triton(''' 2023-01-11T21:38:06.4168216Z import triton 2023-01-11T21:38:06.4168308Z import triton.language as tl 2023-01-11T21:38:06.4168422Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4168522Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4168657Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4168823Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4168828Z 2023-01-11T21:38:06.4169238Z @pointwise(size_hints=[2097152], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4169307Z @triton.jit 2023-01-11T21:38:06.4169441Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4169515Z xnumel = 1254400 2023-01-11T21:38:06.4169611Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4169738Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4169820Z xmask = xindex < xnumel 2023-01-11T21:38:06.4169889Z x0 = xindex 2023-01-11T21:38:06.4169979Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4170112Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4170197Z ''') 2023-01-11T21:38:06.4170205Z 2023-01-11T21:38:06.4170209Z 2023-01-11T21:38:06.4170373Z triton_fused_index_put_1 = async_compile.triton(''' 2023-01-11T21:38:06.4170448Z import triton 2023-01-11T21:38:06.4170540Z import triton.language as tl 2023-01-11T21:38:06.4170651Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4170754Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4170879Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4171003Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4171008Z 2023-01-11T21:38:06.4171436Z @pointwise(size_hints=[8388608], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: 
'*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4171511Z @triton.jit 2023-01-11T21:38:06.4171651Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4171729Z xnumel = 7526400 2023-01-11T21:38:06.4171825Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4171953Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4172028Z xmask = xindex < xnumel 2023-01-11T21:38:06.4172109Z x1 = (xindex // 12544) 2023-01-11T21:38:06.4172178Z x2 = xindex 2023-01-11T21:38:06.4172288Z x0 = xindex % 12544 2023-01-11T21:38:06.4172385Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4172480Z tmp1 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.4172631Z tl.atomic_add(out_ptr0 + (x0 + (12544*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4172710Z ''') 2023-01-11T21:38:06.4172715Z 2023-01-11T21:38:06.4172726Z 2023-01-11T21:38:06.4172812Z async_compile.wait(globals()) 2023-01-11T21:38:06.4172886Z del async_compile 2023-01-11T21:38:06.4172891Z 2023-01-11T21:38:06.4172965Z def call(args): 2023-01-11T21:38:06.4173050Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4173127Z args.clear() 2023-01-11T21:38:06.4173220Z with torch.cuda.device(0): 2023-01-11T21:38:06.4173447Z buf0 = empty_strided((100, 256, 7, 7), (12544, 49, 7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4173533Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4173682Z triton_fused_index_put_0.run(arg0_1, buf0, 1254400, grid=grid(1254400), stream=stream0) 2023-01-11T21:38:06.4173755Z del arg0_1 2023-01-11T21:38:06.4173906Z triton_fused_index_put_1.run(arg1_1, arg2_1, buf0, 7526400, grid=grid(7526400), stream=stream0) 2023-01-11T21:38:06.4173980Z del arg1_1 2023-01-11T21:38:06.4174052Z del arg2_1 2023-01-11T21:38:06.4174128Z return (buf0, ) 2023-01-11T21:38:06.4174133Z 2023-01-11T21:38:06.4174137Z 2023-01-11T21:38:06.4174210Z if __name__ == "__main__": 2023-01-11T21:38:06.4174329Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4174456Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4174844Z arg0_1 = rand_strided((100, 256, 7, 7), (12544, 49, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4175040Z arg1_1 = rand_strided((600, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4175262Z arg2_1 = rand_strided((600, 256, 7, 7), (12544, 49, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4175395Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4175400Z 2023-01-11T21:38:06.4175471Z ok (0.181s) 2023-01-11T21:38:06.4175975Z test_index_put3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4176107Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4176363Z [2023-01-11 21:34:49,060] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 595 2023-01-11T21:38:06.4176629Z [2023-01-11 21:34:49,171] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 595 2023-01-11T21:38:06.4177043Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4177230Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4177489Z [2023-01-11 21:34:49,258] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 596 2023-01-11T21:38:06.4177494Z 2023-01-11T21:38:06.4177591Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4177676Z import torch 2023-01-11T21:38:06.4177750Z import random 2023-01-11T21:38:06.4177866Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4177983Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4177988Z 2023-01-11T21:38:06.4178069Z aten = torch.ops.aten 2023-01-11T21:38:06.4178253Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4178352Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4178357Z 2023-01-11T21:38:06.4178432Z import triton 2023-01-11T21:38:06.4178526Z import triton.language as tl 2023-01-11T21:38:06.4178650Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4178782Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4178791Z 2023-01-11T21:38:06.4178796Z 2023-01-11T21:38:06.4178956Z triton_fused_index_put__0 = async_compile.triton(''' 2023-01-11T21:38:06.4179031Z import triton 2023-01-11T21:38:06.4179121Z import triton.language as tl 2023-01-11T21:38:06.4179236Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4179336Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4179468Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4179593Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4179599Z 2023-01-11T21:38:06.4180034Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4180101Z @triton.jit 2023-01-11T21:38:06.4180241Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4180316Z xnumel = 6144 2023-01-11T21:38:06.4180412Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4180541Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4180667Z xmask = xindex < xnumel 2023-01-11T21:38:06.4180752Z x1 = (xindex // 2) % 3 2023-01-11T21:38:06.4180821Z x0 = xindex % 2 2023-01-11T21:38:06.4180902Z x2 = (xindex // 6) 2023-01-11T21:38:06.4181097Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4181303Z tmp1 = tl.load(in_ptr1 + (x0 + (2*x2)), 
xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4181453Z tl.store(out_ptr0 + (x0 + (2*tmp0) + (8*x2) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4181541Z ''') 2023-01-11T21:38:06.4181547Z 2023-01-11T21:38:06.4181551Z 2023-01-11T21:38:06.4181703Z triton_fused_add_1 = async_compile.triton(''' 2023-01-11T21:38:06.4181782Z import triton 2023-01-11T21:38:06.4181870Z import triton.language as tl 2023-01-11T21:38:06.4181988Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4182091Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4182223Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4182357Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4182362Z 2023-01-11T21:38:06.4182776Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4182852Z @triton.jit 2023-01-11T21:38:06.4182987Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4183057Z xnumel = 8192 2023-01-11T21:38:06.4183155Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4183286Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4183371Z xmask = xindex < xnumel 2023-01-11T21:38:06.4183444Z x0 = xindex 2023-01-11T21:38:06.4183544Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4183619Z tmp1 = 1 2023-01-11T21:38:06.4183694Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4183833Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4183920Z ''') 2023-01-11T21:38:06.4183926Z 2023-01-11T21:38:06.4183930Z 2023-01-11T21:38:06.4184115Z triton_fused_add_add_2_index_put__1_2 = async_compile.triton(''' 2023-01-11T21:38:06.4184193Z import triton 2023-01-11T21:38:06.4184312Z import triton.language as tl 2023-01-11T21:38:06.4184431Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4184527Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4184662Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4184788Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4184793Z 2023-01-11T21:38:06.4185223Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4185302Z @triton.jit 2023-01-11T21:38:06.4185450Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4185550Z xnumel = 6144 2023-01-11T21:38:06.4185656Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4185797Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4185883Z xmask = xindex < xnumel 2023-01-11T21:38:06.4185964Z x1 = (xindex // 2) % 3 2023-01-11T21:38:06.4186039Z x0 = xindex % 2 2023-01-11T21:38:06.4186116Z x2 = (xindex // 6) 2023-01-11T21:38:06.4186214Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4186320Z tmp3 = tl.load(in_ptr1 + (x0 + (2*x2)), xmask) 2023-01-11T21:38:06.4186387Z tmp1 = 1 2023-01-11T21:38:06.4186468Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4186547Z tmp4 = tmp3 + tmp1 2023-01-11T21:38:06.4186700Z tl.store(out_ptr0 + (x0 + (2*tmp2) + (8*x2) + tl.zeros([XBLOCK], tl.int32)), 
tmp4, xmask) 2023-01-11T21:38:06.4186788Z ''') 2023-01-11T21:38:06.4186830Z 2023-01-11T21:38:06.4186834Z 2023-01-11T21:38:06.4186932Z async_compile.wait(globals()) 2023-01-11T21:38:06.4187009Z del async_compile 2023-01-11T21:38:06.4187014Z 2023-01-11T21:38:06.4187092Z def call(args): 2023-01-11T21:38:06.4187173Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4187250Z args.clear() 2023-01-11T21:38:06.4187348Z with torch.cuda.device(0): 2023-01-11T21:38:06.4187441Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4187597Z triton_fused_index_put__0.run(arg1_1, arg2_1, arg0_1, 6144, grid=grid(6144), stream=stream0) 2023-01-11T21:38:06.4187809Z buf1 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4187949Z triton_fused_add_1.run(arg0_1, buf1, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4188018Z del arg0_1 2023-01-11T21:38:06.4188180Z triton_fused_add_add_2_index_put__1_2.run(arg1_1, arg2_1, buf1, 6144, grid=grid(6144), stream=stream0) 2023-01-11T21:38:06.4188258Z del arg1_1 2023-01-11T21:38:06.4188333Z del arg2_1 2023-01-11T21:38:06.4188414Z return (buf1, ) 2023-01-11T21:38:06.4188420Z 2023-01-11T21:38:06.4188425Z 2023-01-11T21:38:06.4188509Z if __name__ == "__main__": 2023-01-11T21:38:06.4188630Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4188764Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4188970Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4189166Z arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4189376Z arg2_1 = rand_strided((1024, 1, 2), (2, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4189505Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4189510Z 2023-01-11T21:38:06.4189777Z [2023-01-11 21:34:49,361] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 596 2023-01-11T21:38:06.4189786Z 2023-01-11T21:38:06.4189887Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4189964Z import torch 2023-01-11T21:38:06.4190041Z import random 2023-01-11T21:38:06.4190155Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4190280Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4190286Z 2023-01-11T21:38:06.4190439Z aten = torch.ops.aten 2023-01-11T21:38:06.4190580Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4190676Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4190681Z 2023-01-11T21:38:06.4190757Z import triton 2023-01-11T21:38:06.4190852Z import triton.language as tl 2023-01-11T21:38:06.4190972Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4191112Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4191117Z 2023-01-11T21:38:06.4191121Z 2023-01-11T21:38:06.4191290Z triton_fused_index_put__0 = async_compile.triton(''' 2023-01-11T21:38:06.4191370Z import triton 2023-01-11T21:38:06.4191466Z import triton.language as tl 2023-01-11T21:38:06.4191582Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4191687Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4191821Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4191943Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4191956Z 2023-01-11T21:38:06.4192375Z @pointwise(size_hints=[8192], filename=__file__, 
meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4192452Z @triton.jit 2023-01-11T21:38:06.4192594Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4192670Z xnumel = 6144 2023-01-11T21:38:06.4192768Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4192900Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4193007Z xmask = xindex < xnumel 2023-01-11T21:38:06.4193081Z x1 = (xindex // 2) % 3 2023-01-11T21:38:06.4193154Z x0 = xindex % 2 2023-01-11T21:38:06.4193231Z x2 = (xindex // 6) 2023-01-11T21:38:06.4193420Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4193644Z tmp1 = tl.load(in_ptr1 + (x0 + (2*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4193790Z tl.store(out_ptr0 + (x0 + (2*tmp0) + (8*x2) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4193875Z ''') 2023-01-11T21:38:06.4193880Z 2023-01-11T21:38:06.4193885Z 2023-01-11T21:38:06.4194036Z triton_fused_add_1 = async_compile.triton(''' 2023-01-11T21:38:06.4194104Z import triton 2023-01-11T21:38:06.4194193Z import triton.language as tl 2023-01-11T21:38:06.4194310Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4194414Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4194549Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4194672Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4194677Z 2023-01-11T21:38:06.4195080Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4195151Z @triton.jit 2023-01-11T21:38:06.4195280Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4195372Z xnumel = 8192 2023-01-11T21:38:06.4195477Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4195626Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4195710Z xmask = xindex < xnumel 2023-01-11T21:38:06.4195783Z x0 = xindex 2023-01-11T21:38:06.4195904Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4195970Z tmp1 = 1 2023-01-11T21:38:06.4196048Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4196181Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4196267Z ''') 2023-01-11T21:38:06.4196272Z 2023-01-11T21:38:06.4196276Z 2023-01-11T21:38:06.4196462Z triton_fused_add_add_2_index_put__1_2 = async_compile.triton(''' 2023-01-11T21:38:06.4196567Z import triton 2023-01-11T21:38:06.4196662Z import triton.language as tl 2023-01-11T21:38:06.4196769Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4196869Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4196998Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4197122Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4197127Z 2023-01-11T21:38:06.4197546Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 
3), equal_to_1=())]}) 2023-01-11T21:38:06.4197625Z @triton.jit 2023-01-11T21:38:06.4197763Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4197837Z xnumel = 6144 2023-01-11T21:38:06.4197925Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4198058Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4198140Z xmask = xindex < xnumel 2023-01-11T21:38:06.4198222Z x1 = (xindex // 2) % 3 2023-01-11T21:38:06.4198293Z x0 = xindex % 2 2023-01-11T21:38:06.4198373Z x2 = (xindex // 6) 2023-01-11T21:38:06.4198472Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4198586Z tmp3 = tl.load(in_ptr1 + (x0 + (2*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.4198659Z tmp1 = 1 2023-01-11T21:38:06.4198737Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4198813Z tmp4 = tmp3 + tmp1 2023-01-11T21:38:06.4198957Z tl.store(out_ptr0 + (x0 + (2*tmp2) + (8*x2) + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.4199071Z ''') 2023-01-11T21:38:06.4199076Z 2023-01-11T21:38:06.4199081Z 2023-01-11T21:38:06.4199173Z async_compile.wait(globals()) 2023-01-11T21:38:06.4199248Z del async_compile 2023-01-11T21:38:06.4199254Z 2023-01-11T21:38:06.4199321Z def call(args): 2023-01-11T21:38:06.4199408Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4199485Z args.clear() 2023-01-11T21:38:06.4199578Z with torch.cuda.device(0): 2023-01-11T21:38:06.4199671Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4199820Z triton_fused_index_put__0.run(arg1_1, arg2_1, arg0_1, 6144, grid=grid(6144), stream=stream0) 2023-01-11T21:38:06.4200025Z buf1 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4200158Z triton_fused_add_1.run(arg0_1, buf1, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4200231Z del arg0_1 2023-01-11T21:38:06.4200394Z triton_fused_add_add_2_index_put__1_2.run(arg1_1, arg2_1, buf1, 6144, grid=grid(6144), stream=stream0) 2023-01-11T21:38:06.4200470Z del arg1_1 2023-01-11T21:38:06.4200542Z del arg2_1 2023-01-11T21:38:06.4200622Z return (buf1, ) 2023-01-11T21:38:06.4200627Z 2023-01-11T21:38:06.4200632Z 2023-01-11T21:38:06.4200714Z if __name__ == "__main__": 2023-01-11T21:38:06.4200835Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4200954Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4201161Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4201353Z arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4201560Z arg2_1 = rand_strided((1024, 1, 2), (2, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4201687Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4201692Z 2023-01-11T21:38:06.4201762Z ok (0.388s) 2023-01-11T21:38:06.4202260Z test_index_put_as_masked_fill_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4202394Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4202652Z [2023-01-11 21:34:49,410] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 597 2023-01-11T21:38:06.4202917Z [2023-01-11 21:34:49,483] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 597 2023-01-11T21:38:06.4203325Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4203461Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4203721Z [2023-01-11 21:34:49,530] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 598 2023-01-11T21:38:06.4203981Z [2023-01-11 21:34:49,607] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 598 2023-01-11T21:38:06.4204391Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4204525Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4204801Z [2023-01-11 21:34:49,656] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 599 2023-01-11T21:38:06.4205063Z [2023-01-11 21:34:49,736] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 599 2023-01-11T21:38:06.4205478Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4205609Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4205864Z [2023-01-11 21:34:49,784] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 600 2023-01-11T21:38:06.4205870Z 2023-01-11T21:38:06.4205968Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4206039Z import torch 2023-01-11T21:38:06.4206113Z import random 2023-01-11T21:38:06.4206231Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4206353Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4206358Z 2023-01-11T21:38:06.4206438Z aten = torch.ops.aten 2023-01-11T21:38:06.4206578Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4206675Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4206680Z 2023-01-11T21:38:06.4206747Z import triton 2023-01-11T21:38:06.4206838Z import triton.language as tl 2023-01-11T21:38:06.4206963Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4207098Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4207104Z 2023-01-11T21:38:06.4207108Z 2023-01-11T21:38:06.4207287Z triton_fused_clone_index_put__0 = async_compile.triton(''' 2023-01-11T21:38:06.4207361Z import triton 2023-01-11T21:38:06.4207453Z import triton.language as tl 2023-01-11T21:38:06.4207568Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4207662Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4207793Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4207919Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4207949Z 2023-01-11T21:38:06.4208386Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*i1', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.4208459Z @triton.jit 2023-01-11T21:38:06.4208611Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4208685Z xnumel = 8192 2023-01-11T21:38:06.4208783Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4208903Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4208990Z xmask = xindex < xnumel 2023-01-11T21:38:06.4209061Z x0 = xindex 2023-01-11T21:38:06.4209156Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4209289Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.4209385Z tmp2 = tl.load(in_ptr2 + (x0), xmask) 2023-01-11T21:38:06.4209485Z tmp3 = tl.where(tmp0, tmp1, tmp2) 2023-01-11T21:38:06.4209611Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4209694Z ''') 2023-01-11T21:38:06.4209699Z 2023-01-11T21:38:06.4209704Z 2023-01-11T21:38:06.4209794Z async_compile.wait(globals()) 2023-01-11T21:38:06.4209870Z del async_compile 2023-01-11T21:38:06.4209875Z 2023-01-11T21:38:06.4209947Z def call(args): 2023-01-11T21:38:06.4210031Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4210106Z args.clear() 2023-01-11T21:38:06.4210197Z with torch.cuda.device(0): 2023-01-11T21:38:06.4210401Z buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4210532Z stream0 = get_cuda_stream(0) 
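# get_cuda_stream is torch._C._cuda_getCurrentRawStream (aliased in the
# imports above), so the launch below lands on PyTorch's current stream
# rather than a private one; same-stream ordering is what keeps this kernel
# correctly sequenced with surrounding eager ops without extra synchronization.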
2023-01-11T21:38:06.4210695Z triton_fused_clone_index_put__0.run(arg1_1, arg2_1, arg0_1, buf0, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4210767Z del arg0_1 2023-01-11T21:38:06.4210839Z del arg1_1 2023-01-11T21:38:06.4210914Z del arg2_1 2023-01-11T21:38:06.4210993Z return (buf0, ) 2023-01-11T21:38:06.4210998Z 2023-01-11T21:38:06.4211003Z 2023-01-11T21:38:06.4211076Z if __name__ == "__main__": 2023-01-11T21:38:06.4211194Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4211319Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4211528Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4211729Z arg1_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4211914Z arg2_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4212044Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4212049Z 2023-01-11T21:38:06.4212054Z 2023-01-11T21:38:06.4212151Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4212218Z import torch 2023-01-11T21:38:06.4212293Z import random 2023-01-11T21:38:06.4212416Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4212540Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4212545Z 2023-01-11T21:38:06.4212626Z aten = torch.ops.aten 2023-01-11T21:38:06.4212767Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4212863Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4212868Z 2023-01-11T21:38:06.4212940Z import triton 2023-01-11T21:38:06.4213025Z import triton.language as tl 2023-01-11T21:38:06.4213151Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4213292Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4213300Z 2023-01-11T21:38:06.4213305Z 2023-01-11T21:38:06.4213478Z triton_fused_clone_index_put__0 = async_compile.triton(''' 2023-01-11T21:38:06.4213552Z import triton 2023-01-11T21:38:06.4213647Z import triton.language as tl 2023-01-11T21:38:06.4213758Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4213887Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4214014Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4214137Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4214143Z 2023-01-11T21:38:06.4214767Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*i1', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.4214842Z @triton.jit 2023-01-11T21:38:06.4214990Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4215066Z xnumel = 8192 2023-01-11T21:38:06.4215162Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4215290Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4215366Z xmask = xindex < xnumel 2023-01-11T21:38:06.4215436Z x0 = xindex 2023-01-11T21:38:06.4215534Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4215678Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32) 2023-01-11T21:38:06.4215795Z tmp2 = tl.load(in_ptr2 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4215887Z tmp3 = tl.where(tmp0, tmp1, tmp2) 
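# The where() above is the entire masked fill: tmp0 is the boolean mask,
# tmp1 the 0-d fill value (note the broadcast load at offset 0), and tmp2
# the input tensor. A plausible eager equivalent, inferred from the kernel
# rather than taken from the test source:
#
#     out = torch.where(mask, value, x)  # ~ x.clone().index_put_((mask,), value)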
2023-01-11T21:38:06.4216023Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4216103Z ''') 2023-01-11T21:38:06.4216109Z 2023-01-11T21:38:06.4216119Z 2023-01-11T21:38:06.4216205Z async_compile.wait(globals()) 2023-01-11T21:38:06.4216281Z del async_compile 2023-01-11T21:38:06.4216331Z 2023-01-11T21:38:06.4216404Z def call(args): 2023-01-11T21:38:06.4216491Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4216565Z args.clear() 2023-01-11T21:38:06.4216655Z with torch.cuda.device(0): 2023-01-11T21:38:06.4216864Z buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4216952Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4217120Z triton_fused_clone_index_put__0.run(arg1_1, arg2_1, arg0_1, buf0, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4217259Z del arg0_1 2023-01-11T21:38:06.4217340Z del arg1_1 2023-01-11T21:38:06.4217411Z del arg2_1 2023-01-11T21:38:06.4217486Z return (buf0, ) 2023-01-11T21:38:06.4217491Z 2023-01-11T21:38:06.4217496Z 2023-01-11T21:38:06.4217575Z if __name__ == "__main__": 2023-01-11T21:38:06.4217685Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4217812Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4218026Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4218227Z arg1_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4218414Z arg2_1 = rand_strided((), (), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4218545Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4218550Z 2023-01-11T21:38:06.4218555Z 2023-01-11T21:38:06.4218656Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4218730Z import torch 2023-01-11T21:38:06.4218797Z import random 2023-01-11T21:38:06.4218915Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4219036Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4219041Z 2023-01-11T21:38:06.4219123Z aten = torch.ops.aten 2023-01-11T21:38:06.4219258Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4219350Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4219359Z 2023-01-11T21:38:06.4219431Z import triton 2023-01-11T21:38:06.4219523Z import triton.language as tl 2023-01-11T21:38:06.4219641Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4219779Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4219784Z 2023-01-11T21:38:06.4219826Z 2023-01-11T21:38:06.4220004Z triton_fused_clone_index_put__0 = async_compile.triton(''' 2023-01-11T21:38:06.4220080Z import triton 2023-01-11T21:38:06.4220171Z import triton.language as tl 2023-01-11T21:38:06.4220283Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4220384Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4220509Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4220633Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4220638Z 2023-01-11T21:38:06.4221070Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*i1', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.4221147Z @triton.jit 2023-01-11T21:38:06.4221298Z def 
triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4221373Z xnumel = 8192 2023-01-11T21:38:06.4221467Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4221598Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4221678Z xmask = xindex < xnumel 2023-01-11T21:38:06.4221741Z x0 = xindex 2023-01-11T21:38:06.4221838Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4221933Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.4222068Z tmp2 = tl.load(in_ptr2 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.4222146Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.4222243Z tmp4 = tl.where(tmp0, tmp3, tmp1) 2023-01-11T21:38:06.4222401Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.4222479Z ''') 2023-01-11T21:38:06.4222485Z 2023-01-11T21:38:06.4222490Z 2023-01-11T21:38:06.4222582Z async_compile.wait(globals()) 2023-01-11T21:38:06.4222658Z del async_compile 2023-01-11T21:38:06.4222663Z 2023-01-11T21:38:06.4222738Z def call(args): 2023-01-11T21:38:06.4222830Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4222903Z args.clear() 2023-01-11T21:38:06.4222995Z with torch.cuda.device(0): 2023-01-11T21:38:06.4223196Z buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4223288Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4223450Z triton_fused_clone_index_put__0.run(arg1_1, arg0_1, arg2_1, buf0, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4223520Z del arg0_1 2023-01-11T21:38:06.4223592Z del arg1_1 2023-01-11T21:38:06.4223662Z del arg2_1 2023-01-11T21:38:06.4223742Z return (buf0, ) 2023-01-11T21:38:06.4223747Z 2023-01-11T21:38:06.4223752Z 2023-01-11T21:38:06.4223831Z if __name__ == "__main__": 2023-01-11T21:38:06.4223942Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4224068Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4224281Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4224481Z arg1_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4224668Z arg2_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4224794Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4224799Z 2023-01-11T21:38:06.4225062Z [2023-01-11 21:34:49,862] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 600 2023-01-11T21:38:06.4225068Z 2023-01-11T21:38:06.4225165Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4225234Z import torch 2023-01-11T21:38:06.4225306Z import random 2023-01-11T21:38:06.4225432Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4225557Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4225563Z 2023-01-11T21:38:06.4225665Z aten = torch.ops.aten 2023-01-11T21:38:06.4225848Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4225944Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4225950Z 2023-01-11T21:38:06.4226020Z import triton 2023-01-11T21:38:06.4226106Z import triton.language as tl 2023-01-11T21:38:06.4226229Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4226364Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4226369Z 2023-01-11T21:38:06.4226374Z 2023-01-11T21:38:06.4226548Z 
triton_fused_clone_index_put__0 = async_compile.triton(''' 2023-01-11T21:38:06.4226621Z import triton 2023-01-11T21:38:06.4226713Z import triton.language as tl 2023-01-11T21:38:06.4226830Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4226925Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4227060Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4227185Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4227190Z 2023-01-11T21:38:06.4227616Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*i1', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.4227690Z @triton.jit 2023-01-11T21:38:06.4227839Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4227912Z xnumel = 8192 2023-01-11T21:38:06.4228007Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4228135Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4228240Z xmask = xindex < xnumel 2023-01-11T21:38:06.4228310Z x0 = xindex 2023-01-11T21:38:06.4228405Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4228521Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4228666Z tmp2 = tl.load(in_ptr2 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32) 2023-01-11T21:38:06.4228745Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.4228840Z tmp4 = tl.where(tmp0, tmp3, tmp1) 2023-01-11T21:38:06.4228967Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.4229050Z ''') 2023-01-11T21:38:06.4229056Z 2023-01-11T21:38:06.4229060Z 2023-01-11T21:38:06.4229151Z async_compile.wait(globals()) 2023-01-11T21:38:06.4229227Z del async_compile 2023-01-11T21:38:06.4229232Z 2023-01-11T21:38:06.4229305Z def call(args): 2023-01-11T21:38:06.4229391Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4229466Z args.clear() 2023-01-11T21:38:06.4229550Z with torch.cuda.device(0): 2023-01-11T21:38:06.4229760Z buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4229850Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4230015Z triton_fused_clone_index_put__0.run(arg1_1, arg0_1, arg2_1, buf0, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4230086Z del arg0_1 2023-01-11T21:38:06.4230160Z del arg1_1 2023-01-11T21:38:06.4230231Z del arg2_1 2023-01-11T21:38:06.4230301Z return (buf0, ) 2023-01-11T21:38:06.4230312Z 2023-01-11T21:38:06.4230317Z 2023-01-11T21:38:06.4230390Z if __name__ == "__main__": 2023-01-11T21:38:06.4230507Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4230630Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4230840Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4231043Z arg1_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4231230Z arg2_1 = rand_strided((), (), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4231356Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4231362Z 2023-01-11T21:38:06.4231430Z ok (0.501s) 2023-01-11T21:38:06.4231919Z test_index_put_fallback1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4232051Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4232308Z [2023-01-11 21:34:49,910] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 601 2023-01-11T21:38:06.4232575Z [2023-01-11 21:34:50,066] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 601 2023-01-11T21:38:06.4232992Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4233122Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4233374Z [2023-01-11 21:34:50,113] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 602 2023-01-11T21:38:06.4233636Z [2023-01-11 21:34:50,179] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 602 2023-01-11T21:38:06.4234045Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4234203Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4234459Z [2023-01-11 21:34:50,227] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 603 2023-01-11T21:38:06.4234721Z [2023-01-11 21:34:50,237] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 603 2023-01-11T21:38:06.4235130Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4235262Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4235560Z [2023-01-11 21:34:50,284] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 604 2023-01-11T21:38:06.4235566Z 2023-01-11T21:38:06.4235664Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4235738Z import torch 2023-01-11T21:38:06.4235815Z import random 2023-01-11T21:38:06.4235936Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4236060Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4236066Z 2023-01-11T21:38:06.4236146Z aten = torch.ops.aten 2023-01-11T21:38:06.4236275Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4236368Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4236373Z 2023-01-11T21:38:06.4236446Z import triton 2023-01-11T21:38:06.4236537Z import triton.language as tl 2023-01-11T21:38:06.4236661Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4236800Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4236808Z 2023-01-11T21:38:06.4236813Z 2023-01-11T21:38:06.4236973Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.4237048Z import triton 2023-01-11T21:38:06.4237132Z import triton.language as tl 2023-01-11T21:38:06.4237247Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4237378Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4237512Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4237636Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4237641Z 2023-01-11T21:38:06.4238046Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4238118Z @triton.jit 2023-01-11T21:38:06.4238250Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4238320Z xnumel = 3 2023-01-11T21:38:06.4238415Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4238541Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4238623Z xmask = xindex < xnumel 2023-01-11T21:38:06.4238695Z x0 = xindex 2023-01-11T21:38:06.4238794Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4238928Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4239006Z ''') 2023-01-11T21:38:06.4239012Z 2023-01-11T21:38:06.4239017Z 2023-01-11T21:38:06.4239109Z async_compile.wait(globals()) 2023-01-11T21:38:06.4239187Z del async_compile 2023-01-11T21:38:06.4239192Z 2023-01-11T21:38:06.4239266Z def call(args): 2023-01-11T21:38:06.4239351Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4239424Z args.clear() 2023-01-11T21:38:06.4239516Z with torch.cuda.device(0): 2023-01-11T21:38:06.4239705Z buf0 = empty_strided((3, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4239825Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4239961Z triton_fused_clone_0.run(arg0_1, buf0, 3, grid=grid(3), stream=stream0) 2023-01-11T21:38:06.4240038Z del arg0_1 2023-01-11T21:38:06.4240149Z aten.index_put_(buf0, [arg1_1], arg2_1, False) 2023-01-11T21:38:06.4240223Z del arg1_1 2023-01-11T21:38:06.4240296Z del arg2_1 2023-01-11T21:38:06.4240366Z 
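# Annotation (not part of the captured log): as the test name
# test_index_put_fallback1 suggests, Inductor does not codegen this
# boolean-mask index_put (3-element mask, 2 values); only the defensive clone
# of the input is compiled, and the put itself is the aten call above.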
return (buf0, ) 2023-01-11T21:38:06.4240379Z 2023-01-11T21:38:06.4240384Z 2023-01-11T21:38:06.4240456Z if __name__ == "__main__": 2023-01-11T21:38:06.4240572Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4240697Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4240901Z arg0_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4241089Z arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4241280Z arg2_1 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4241413Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4241418Z 2023-01-11T21:38:06.4241423Z 2023-01-11T21:38:06.4241523Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4241590Z import torch 2023-01-11T21:38:06.4241664Z import random 2023-01-11T21:38:06.4241783Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4241907Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4241912Z 2023-01-11T21:38:06.4241993Z aten = torch.ops.aten 2023-01-11T21:38:06.4242129Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4242223Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4242230Z 2023-01-11T21:38:06.4242296Z import triton 2023-01-11T21:38:06.4242386Z import triton.language as tl 2023-01-11T21:38:06.4242510Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4242646Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4242654Z 2023-01-11T21:38:06.4242658Z 2023-01-11T21:38:06.4242813Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.4242888Z import triton 2023-01-11T21:38:06.4242979Z import triton.language as tl 2023-01-11T21:38:06.4243093Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4243213Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4243348Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4243472Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4243478Z 2023-01-11T21:38:06.4243870Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4243943Z @triton.jit 2023-01-11T21:38:06.4244074Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4244150Z xnumel = 3 2023-01-11T21:38:06.4244248Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4244369Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4244452Z xmask = xindex < xnumel 2023-01-11T21:38:06.4244522Z x0 = xindex 2023-01-11T21:38:06.4244641Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4244775Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4244862Z ''') 2023-01-11T21:38:06.4244867Z 2023-01-11T21:38:06.4244871Z 2023-01-11T21:38:06.4244963Z async_compile.wait(globals()) 2023-01-11T21:38:06.4245032Z del async_compile 2023-01-11T21:38:06.4245043Z 2023-01-11T21:38:06.4245110Z def call(args): 2023-01-11T21:38:06.4245196Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4245270Z args.clear() 2023-01-11T21:38:06.4245361Z with torch.cuda.device(0): 2023-01-11T21:38:06.4245557Z buf0 = empty_strided((3, ), (1, ), device='cuda', 
dtype=torch.float16) 2023-01-11T21:38:06.4245674Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4245809Z triton_fused_clone_0.run(arg0_1, buf0, 3, grid=grid(3), stream=stream0) 2023-01-11T21:38:06.4245875Z del arg0_1 2023-01-11T21:38:06.4245988Z aten.index_put_(buf0, [arg1_1], arg2_1, False) 2023-01-11T21:38:06.4246059Z del arg1_1 2023-01-11T21:38:06.4246136Z del arg2_1 2023-01-11T21:38:06.4246213Z return (buf0, ) 2023-01-11T21:38:06.4246218Z 2023-01-11T21:38:06.4246223Z 2023-01-11T21:38:06.4246301Z if __name__ == "__main__": 2023-01-11T21:38:06.4246418Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4246536Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4246731Z arg0_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4246920Z arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4247110Z arg2_1 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4247242Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4247247Z 2023-01-11T21:38:06.4247252Z 2023-01-11T21:38:06.4247348Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4247421Z import torch 2023-01-11T21:38:06.4247495Z import random 2023-01-11T21:38:06.4247610Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4247731Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4247736Z 2023-01-11T21:38:06.4247816Z aten = torch.ops.aten 2023-01-11T21:38:06.4247949Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4248043Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4248048Z 2023-01-11T21:38:06.4248122Z import triton 2023-01-11T21:38:06.4248213Z import triton.language as tl 2023-01-11T21:38:06.4248337Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4248467Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4248474Z 2023-01-11T21:38:06.4248478Z 2023-01-11T21:38:06.4248635Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.4248708Z import triton 2023-01-11T21:38:06.4248798Z import triton.language as tl 2023-01-11T21:38:06.4248912Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4249039Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4249172Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4249289Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4249299Z 2023-01-11T21:38:06.4249684Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4249757Z @triton.jit 2023-01-11T21:38:06.4249888Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4249961Z xnumel = 3 2023-01-11T21:38:06.4250058Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4250185Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4250268Z xmask = xindex < xnumel 2023-01-11T21:38:06.4250332Z x0 = xindex 2023-01-11T21:38:06.4250428Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4250562Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4250647Z ''') 2023-01-11T21:38:06.4250652Z 2023-01-11T21:38:06.4250657Z 2023-01-11T21:38:06.4250752Z 
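# Annotation (not part of the captured log): same clone kernel as in the
# earlier float32 listing; the only change in this module is the fallback
# call below, which now passes accumulate=True to aten.index_put_.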
async_compile.wait(globals()) 2023-01-11T21:38:06.4250828Z del async_compile 2023-01-11T21:38:06.4250833Z 2023-01-11T21:38:06.4250907Z def call(args): 2023-01-11T21:38:06.4250992Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4251061Z args.clear() 2023-01-11T21:38:06.4251151Z with torch.cuda.device(0): 2023-01-11T21:38:06.4251344Z buf0 = empty_strided((3, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4251435Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4251610Z triton_fused_clone_0.run(arg0_1, buf0, 3, grid=grid(3), stream=stream0) 2023-01-11T21:38:06.4251680Z del arg0_1 2023-01-11T21:38:06.4251791Z aten.index_put_(buf0, [arg1_1], arg2_1, True) 2023-01-11T21:38:06.4251856Z del arg1_1 2023-01-11T21:38:06.4251931Z del arg2_1 2023-01-11T21:38:06.4252010Z return (buf0, ) 2023-01-11T21:38:06.4252015Z 2023-01-11T21:38:06.4252020Z 2023-01-11T21:38:06.4252100Z if __name__ == "__main__": 2023-01-11T21:38:06.4252218Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4252343Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4252540Z arg0_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4252730Z arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4252913Z arg2_1 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4253046Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4253051Z 2023-01-11T21:38:06.4253319Z [2023-01-11 21:34:50,294] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 604 2023-01-11T21:38:06.4253325Z 2023-01-11T21:38:06.4253424Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4253501Z import torch 2023-01-11T21:38:06.4253575Z import random 2023-01-11T21:38:06.4253694Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4253816Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4253821Z 2023-01-11T21:38:06.4253896Z aten = torch.ops.aten 2023-01-11T21:38:06.4254030Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4254124Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4254129Z 2023-01-11T21:38:06.4254202Z import triton 2023-01-11T21:38:06.4254295Z import triton.language as tl 2023-01-11T21:38:06.4254421Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4254690Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4254696Z 2023-01-11T21:38:06.4254701Z 2023-01-11T21:38:06.4254858Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.4254925Z import triton 2023-01-11T21:38:06.4255016Z import triton.language as tl 2023-01-11T21:38:06.4255174Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4255283Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4255413Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4255537Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4255542Z 2023-01-11T21:38:06.4255943Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4256017Z @triton.jit 2023-01-11T21:38:06.4256148Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4256221Z xnumel = 3 
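    # Standard pointwise prologue: each program instance covers XBLOCK
    # contiguous elements; xmask disables the lanes past xnumel (here 3),
    # since XBLOCK generally does not divide the element count evenly.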
2023-01-11T21:38:06.4256315Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4256442Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4256525Z xmask = xindex < xnumel 2023-01-11T21:38:06.4256597Z x0 = xindex 2023-01-11T21:38:06.4256713Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4256840Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4256924Z ''') 2023-01-11T21:38:06.4256930Z 2023-01-11T21:38:06.4256934Z 2023-01-11T21:38:06.4257026Z async_compile.wait(globals()) 2023-01-11T21:38:06.4257102Z del async_compile 2023-01-11T21:38:06.4257107Z 2023-01-11T21:38:06.4257244Z def call(args): 2023-01-11T21:38:06.4257359Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4257443Z args.clear() 2023-01-11T21:38:06.4257528Z with torch.cuda.device(0): 2023-01-11T21:38:06.4257766Z buf0 = empty_strided((3, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4257860Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4257998Z triton_fused_clone_0.run(arg0_1, buf0, 3, grid=grid(3), stream=stream0) 2023-01-11T21:38:06.4258072Z del arg0_1 2023-01-11T21:38:06.4258185Z aten.index_put_(buf0, [arg1_1], arg2_1, True) 2023-01-11T21:38:06.4258255Z del arg1_1 2023-01-11T21:38:06.4258320Z del arg2_1 2023-01-11T21:38:06.4258397Z return (buf0, ) 2023-01-11T21:38:06.4258403Z 2023-01-11T21:38:06.4258407Z 2023-01-11T21:38:06.4258488Z if __name__ == "__main__": 2023-01-11T21:38:06.4258605Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4258734Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4258929Z arg0_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4259121Z arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4259313Z arg2_1 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4259432Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4259441Z 2023-01-11T21:38:06.4259505Z ok (0.432s) 2023-01-11T21:38:06.4259972Z test_index_put_fallback2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4260105Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4260361Z [2023-01-11 21:34:50,343] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 605 2023-01-11T21:38:06.4260624Z [2023-01-11 21:34:50,498] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 605 2023-01-11T21:38:06.4261066Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4261198Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4261452Z [2023-01-11 21:34:50,547] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 606 2023-01-11T21:38:06.4261711Z [2023-01-11 21:34:50,614] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 606 2023-01-11T21:38:06.4262123Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4262256Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4262515Z [2023-01-11 21:34:50,664] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 607 2023-01-11T21:38:06.4262771Z [2023-01-11 21:34:50,674] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 607 2023-01-11T21:38:06.4263182Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4263313Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4263596Z [2023-01-11 21:34:50,723] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 608 2023-01-11T21:38:06.4263601Z 2023-01-11T21:38:06.4263702Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4263784Z import torch 2023-01-11T21:38:06.4263865Z import random 2023-01-11T21:38:06.4263986Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4264112Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4264118Z 2023-01-11T21:38:06.4264195Z aten = torch.ops.aten 2023-01-11T21:38:06.4264337Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4264434Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4264439Z 2023-01-11T21:38:06.4264512Z import triton 2023-01-11T21:38:06.4264609Z import triton.language as tl 2023-01-11T21:38:06.4264735Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4264879Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4264884Z 2023-01-11T21:38:06.4264889Z 2023-01-11T21:38:06.4265048Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.4265118Z import triton 2023-01-11T21:38:06.4265210Z import triton.language as tl 2023-01-11T21:38:06.4265333Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4265438Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4265594Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4265733Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4265739Z 2023-01-11T21:38:06.4266150Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4266226Z @triton.jit 2023-01-11T21:38:06.4266354Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4266432Z xnumel = 6 2023-01-11T21:38:06.4266534Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4266664Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4266750Z xmask = xindex < xnumel 2023-01-11T21:38:06.4266823Z x0 = xindex 2023-01-11T21:38:06.4266980Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4267113Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4267201Z ''') 2023-01-11T21:38:06.4267207Z 2023-01-11T21:38:06.4267212Z 2023-01-11T21:38:06.4267306Z async_compile.wait(globals()) 2023-01-11T21:38:06.4267386Z del async_compile 2023-01-11T21:38:06.4267391Z 2023-01-11T21:38:06.4267467Z def call(args): 2023-01-11T21:38:06.4267563Z arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:06.4267639Z args.clear() 2023-01-11T21:38:06.4267726Z with torch.cuda.device(0): 2023-01-11T21:38:06.4267935Z buf0 = empty_strided((1, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4268034Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4268173Z triton_fused_clone_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.4268248Z del arg0_1 2023-01-11T21:38:06.4268379Z aten.index_put_(buf0, [None,arg1_1,arg2_1], arg3_1, False) 2023-01-11T21:38:06.4268457Z del arg1_1 2023-01-11T21:38:06.4268524Z del arg2_1 2023-01-11T21:38:06.4268595Z del arg3_1 2023-01-11T21:38:06.4268673Z return (buf0, ) 2023-01-11T21:38:06.4268678Z 2023-01-11T21:38:06.4268682Z 2023-01-11T21:38:06.4268764Z if __name__ == "__main__": 2023-01-11T21:38:06.4268890Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4269017Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4269231Z arg0_1 = rand_strided((1, 2, 3), (6, 3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4269423Z arg1_1 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4269633Z arg2_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4269824Z arg3_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4269958Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:06.4269965Z 2023-01-11T21:38:06.4269970Z 2023-01-11T21:38:06.4270074Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4270151Z import torch 2023-01-11T21:38:06.4270229Z import random 2023-01-11T21:38:06.4270348Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4270474Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4270479Z 2023-01-11T21:38:06.4270556Z aten = torch.ops.aten 2023-01-11T21:38:06.4270695Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4270789Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4270794Z 2023-01-11T21:38:06.4270871Z import triton 2023-01-11T21:38:06.4270971Z import triton.language as tl 2023-01-11T21:38:06.4271100Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4271238Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4271244Z 2023-01-11T21:38:06.4271248Z 2023-01-11T21:38:06.4271407Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.4271478Z import triton 2023-01-11T21:38:06.4271573Z import 
triton.language as tl 2023-01-11T21:38:06.4271687Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4271792Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4271928Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4272050Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4272056Z 2023-01-11T21:38:06.4272453Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4272534Z @triton.jit 2023-01-11T21:38:06.4272662Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4272734Z xnumel = 6 2023-01-11T21:38:06.4272834Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4272994Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4273081Z xmask = xindex < xnumel 2023-01-11T21:38:06.4273159Z x0 = xindex 2023-01-11T21:38:06.4273279Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4273409Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4273497Z ''') 2023-01-11T21:38:06.4273503Z 2023-01-11T21:38:06.4273507Z 2023-01-11T21:38:06.4273601Z async_compile.wait(globals()) 2023-01-11T21:38:06.4273682Z del async_compile 2023-01-11T21:38:06.4273687Z 2023-01-11T21:38:06.4273763Z def call(args): 2023-01-11T21:38:06.4273858Z arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:06.4273939Z args.clear() 2023-01-11T21:38:06.4274033Z with torch.cuda.device(0): 2023-01-11T21:38:06.4274233Z buf0 = empty_strided((1, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4274325Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4274463Z triton_fused_clone_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.4274542Z del arg0_1 2023-01-11T21:38:06.4274671Z aten.index_put_(buf0, [None,arg1_1,arg2_1], arg3_1, False) 2023-01-11T21:38:06.4274745Z del arg1_1 2023-01-11T21:38:06.4274821Z del arg2_1 2023-01-11T21:38:06.4274889Z del arg3_1 2023-01-11T21:38:06.4274968Z return (buf0, ) 2023-01-11T21:38:06.4274973Z 2023-01-11T21:38:06.4274977Z 2023-01-11T21:38:06.4275063Z if __name__ == "__main__": 2023-01-11T21:38:06.4275195Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4275337Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4275593Z arg0_1 = rand_strided((1, 2, 3), (6, 3, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4275786Z arg1_1 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4275977Z arg2_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4276161Z arg3_1 = rand_strided((), (), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4276295Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:06.4276300Z 2023-01-11T21:38:06.4276305Z 2023-01-11T21:38:06.4276405Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4276481Z import torch 2023-01-11T21:38:06.4276557Z import random 2023-01-11T21:38:06.4276676Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4276802Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4276807Z 2023-01-11T21:38:06.4276890Z aten = torch.ops.aten 2023-01-11T21:38:06.4277021Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 
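# Annotation (not part of the captured log): this preamble is common to every
# generated module. assert_size_stride is used to check that runtime tensors
# match the sizes/strides assumed at compile time, and AsyncCompile lets the
# Triton kernels in the module build in the background before .wait().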
2023-01-11T21:38:06.4277121Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4277126Z 2023-01-11T21:38:06.4277203Z import triton 2023-01-11T21:38:06.4277296Z import triton.language as tl 2023-01-11T21:38:06.4277423Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4277566Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4277571Z 2023-01-11T21:38:06.4277575Z 2023-01-11T21:38:06.4277733Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.4277810Z import triton 2023-01-11T21:38:06.4277898Z import triton.language as tl 2023-01-11T21:38:06.4278013Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4278117Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4278252Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4278377Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4278382Z 2023-01-11T21:38:06.4278776Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4278853Z @triton.jit 2023-01-11T21:38:06.4278985Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4279081Z xnumel = 6 2023-01-11T21:38:06.4279182Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4279311Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4279396Z xmask = xindex < xnumel 2023-01-11T21:38:06.4279470Z x0 = xindex 2023-01-11T21:38:06.4279568Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4279706Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4279787Z ''') 2023-01-11T21:38:06.4279793Z 2023-01-11T21:38:06.4279797Z 2023-01-11T21:38:06.4279896Z async_compile.wait(globals()) 2023-01-11T21:38:06.4279975Z del async_compile 2023-01-11T21:38:06.4279980Z 2023-01-11T21:38:06.4280056Z def call(args): 2023-01-11T21:38:06.4280151Z arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:06.4280227Z args.clear() 2023-01-11T21:38:06.4280320Z with torch.cuda.device(0): 2023-01-11T21:38:06.4280521Z buf0 = empty_strided((1, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4280618Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4280757Z triton_fused_clone_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.4280832Z del arg0_1 2023-01-11T21:38:06.4280958Z aten.index_put_(buf0, [None,arg1_1,arg2_1], arg3_1, True) 2023-01-11T21:38:06.4281036Z del arg1_1 2023-01-11T21:38:06.4281111Z del arg2_1 2023-01-11T21:38:06.4281178Z del arg3_1 2023-01-11T21:38:06.4281256Z return (buf0, ) 2023-01-11T21:38:06.4281261Z 2023-01-11T21:38:06.4281266Z 2023-01-11T21:38:06.4281348Z if __name__ == "__main__": 2023-01-11T21:38:06.4281538Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4281663Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4281870Z arg0_1 = rand_strided((1, 2, 3), (6, 3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4282065Z arg1_1 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4282257Z arg2_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4282439Z arg3_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4282577Z print_performance(lambda: call([arg0_1, arg1_1, 
arg2_1, arg3_1])) 2023-01-11T21:38:06.4282582Z 2023-01-11T21:38:06.4282849Z [2023-01-11 21:34:50,733] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 608 2023-01-11T21:38:06.4282856Z 2023-01-11T21:38:06.4282955Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4283032Z import torch 2023-01-11T21:38:06.4283109Z import random 2023-01-11T21:38:06.4283228Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4283352Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4283358Z 2023-01-11T21:38:06.4283434Z aten = torch.ops.aten 2023-01-11T21:38:06.4283571Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4283671Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4283676Z 2023-01-11T21:38:06.4283752Z import triton 2023-01-11T21:38:06.4283847Z import triton.language as tl 2023-01-11T21:38:06.4283972Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4284115Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4284121Z 2023-01-11T21:38:06.4284126Z 2023-01-11T21:38:06.4284284Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.4284354Z import triton 2023-01-11T21:38:06.4284449Z import triton.language as tl 2023-01-11T21:38:06.4284564Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4284669Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4284803Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4284931Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4284936Z 2023-01-11T21:38:06.4285367Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4285441Z @triton.jit 2023-01-11T21:38:06.4285568Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4285641Z xnumel = 6 2023-01-11T21:38:06.4285740Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4285871Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4285959Z xmask = xindex < xnumel 2023-01-11T21:38:06.4286033Z x0 = xindex 2023-01-11T21:38:06.4286151Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4286282Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4286369Z ''') 2023-01-11T21:38:06.4286375Z 2023-01-11T21:38:06.4286379Z 2023-01-11T21:38:06.4286476Z async_compile.wait(globals()) 2023-01-11T21:38:06.4286554Z del async_compile 2023-01-11T21:38:06.4286561Z 2023-01-11T21:38:06.4286637Z def call(args): 2023-01-11T21:38:06.4286733Z arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:06.4286811Z args.clear() 2023-01-11T21:38:06.4286904Z with torch.cuda.device(0): 2023-01-11T21:38:06.4287104Z buf0 = empty_strided((1, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4287199Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4287339Z triton_fused_clone_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.4287415Z del arg0_1 2023-01-11T21:38:06.4287542Z aten.index_put_(buf0, [None,arg1_1,arg2_1], arg3_1, True) 2023-01-11T21:38:06.4287642Z del arg1_1 2023-01-11T21:38:06.4287717Z del arg2_1 2023-01-11T21:38:06.4287784Z del arg3_1 2023-01-11T21:38:06.4287863Z return (buf0, ) 2023-01-11T21:38:06.4287868Z 
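# Annotation (not part of the captured log): an eager-mode equivalent of this
# compiled graph would be roughly the sketch below (illustrative names, not
# from the test):
#   out = x.clone()
#   out.index_put_([None, idx, mask], value, accumulate=True)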
2023-01-11T21:38:06.4287873Z 2023-01-11T21:38:06.4287954Z if __name__ == "__main__": 2023-01-11T21:38:06.4288076Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4288204Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4288413Z arg0_1 = rand_strided((1, 2, 3), (6, 3, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4288605Z arg1_1 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4288795Z arg2_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4288977Z arg3_1 = rand_strided((), (), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4289113Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:06.4289122Z 2023-01-11T21:38:06.4289197Z ok (0.440s) 2023-01-11T21:38:06.4289660Z test_index_select_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4289793Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4290054Z [2023-01-11 21:34:50,776] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 609 2023-01-11T21:38:06.4290319Z [2023-01-11 21:34:50,894] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 609 2023-01-11T21:38:06.4290733Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4290868Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4291150Z [2023-01-11 21:34:50,936] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 610 2023-01-11T21:38:06.4291156Z 2023-01-11T21:38:06.4291258Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4291328Z import torch 2023-01-11T21:38:06.4291404Z import random 2023-01-11T21:38:06.4291525Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4291650Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4291655Z 2023-01-11T21:38:06.4291738Z aten = torch.ops.aten 2023-01-11T21:38:06.4291875Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4291974Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4291979Z 2023-01-11T21:38:06.4292049Z import triton 2023-01-11T21:38:06.4292143Z import triton.language as tl 2023-01-11T21:38:06.4292276Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4292417Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4292424Z 2023-01-11T21:38:06.4292429Z 2023-01-11T21:38:06.4292589Z triton_fused_index_0 = async_compile.triton(''' 2023-01-11T21:38:06.4292668Z import triton 2023-01-11T21:38:06.4292762Z import triton.language as tl 2023-01-11T21:38:06.4292880Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4292977Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4293110Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4293238Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4293243Z 2023-01-11T21:38:06.4293663Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4293786Z @triton.jit 2023-01-11T21:38:06.4293928Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4294002Z xnumel = 256 2023-01-11T21:38:06.4294103Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4294228Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4294314Z xmask = xindex < xnumel 2023-01-11T21:38:06.4294400Z x1 = (xindex // 64) 2023-01-11T21:38:06.4294587Z x0 = xindex % 64 2023-01-11T21:38:06.4294662Z x2 = xindex 2023-01-11T21:38:06.4294861Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4295069Z tmp1 = tl.load(in_ptr1 + (x0 + (64*tmp0)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4295197Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4295290Z ''') 2023-01-11T21:38:06.4295295Z 2023-01-11T21:38:06.4295300Z 2023-01-11T21:38:06.4295460Z triton_fused_index_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.4295535Z import triton 2023-01-11T21:38:06.4295633Z import triton.language as tl 2023-01-11T21:38:06.4295753Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4295858Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4295996Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4296116Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4296121Z 2023-01-11T21:38:06.4296540Z 
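# Annotation (not part of the captured log): second of the three gather
# kernels from test_index_select_cuda. This one applies the index vector to
# dim 1 of the (8, 8, 8) input (source offset x0 + 8*tmp0 + 64*x2), leaving
# dims 0 and 2 untouched.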
@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4296616Z @triton.jit 2023-01-11T21:38:06.4296755Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4296835Z xnumel = 256 2023-01-11T21:38:06.4296932Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4297062Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4297196Z xmask = xindex < xnumel 2023-01-11T21:38:06.4297335Z x1 = (xindex // 8) % 4 2023-01-11T21:38:06.4297424Z x0 = xindex % 8 2023-01-11T21:38:06.4297519Z x2 = (xindex // 32) 2023-01-11T21:38:06.4297592Z x3 = xindex 2023-01-11T21:38:06.4297785Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4298001Z tmp1 = tl.load(in_ptr1 + (x0 + (8*tmp0) + (64*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4298136Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4298213Z ''') 2023-01-11T21:38:06.4298219Z 2023-01-11T21:38:06.4298223Z 2023-01-11T21:38:06.4298381Z triton_fused_index_3_2 = async_compile.triton(''' 2023-01-11T21:38:06.4298462Z import triton 2023-01-11T21:38:06.4298553Z import triton.language as tl 2023-01-11T21:38:06.4298670Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4298770Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4298903Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4299023Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4299035Z 2023-01-11T21:38:06.4299444Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4299516Z @triton.jit 2023-01-11T21:38:06.4299653Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4299725Z xnumel = 128 2023-01-11T21:38:06.4299822Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4299953Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4300074Z xmask = xindex < xnumel 2023-01-11T21:38:06.4300147Z x1 = (xindex // 4) % 4 2023-01-11T21:38:06.4300220Z x0 = xindex % 4 2023-01-11T21:38:06.4300295Z x2 = (xindex // 16) 2023-01-11T21:38:06.4300365Z x4 = xindex 2023-01-11T21:38:06.4300466Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4300562Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4300681Z tmp2 = tl.load(in_ptr1 + (tmp1 + (8*tmp0) + (64*x2)), xmask) 2023-01-11T21:38:06.4300805Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4300888Z ''') 2023-01-11T21:38:06.4300893Z 2023-01-11T21:38:06.4300897Z 2023-01-11T21:38:06.4300990Z async_compile.wait(globals()) 2023-01-11T21:38:06.4301066Z del async_compile 2023-01-11T21:38:06.4301071Z 2023-01-11T21:38:06.4301145Z def call(args): 2023-01-11T21:38:06.4301224Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4301298Z args.clear() 2023-01-11T21:38:06.4301395Z with torch.cuda.device(0): 2023-01-11T21:38:06.4301596Z buf0 = empty_strided((4, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4301686Z stream0 = get_cuda_stream(0) 
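        # Annotation (not part of the captured log): buf0 gathers along dim 0,
        # buf1 along dim 1, and buf2 applies the same index vector to two
        # dimensions at once; all three kernels launch on the same stream.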
2023-01-11T21:38:06.4301831Z triton_fused_index_0.run(arg1_1, arg0_1, buf0, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4302037Z buf1 = empty_strided((8, 4, 8), (32, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4302182Z triton_fused_index_1_1.run(arg1_1, arg0_1, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4302384Z buf2 = empty_strided((8, 4, 4), (16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4302528Z triton_fused_index_3_2.run(arg1_1, arg0_1, buf2, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.4302600Z del arg0_1 2023-01-11T21:38:06.4302666Z del arg1_1 2023-01-11T21:38:06.4302756Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4302763Z 2023-01-11T21:38:06.4302768Z 2023-01-11T21:38:06.4302846Z if __name__ == "__main__": 2023-01-11T21:38:06.4302963Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4303088Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4303328Z arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4303520Z arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.4303638Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4303643Z 2023-01-11T21:38:06.4303901Z [2023-01-11 21:34:51,039] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 610 2023-01-11T21:38:06.4304316Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4304448Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4304702Z [2023-01-11 21:34:51,081] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 611 2023-01-11T21:38:06.4304707Z 2023-01-11T21:38:06.4304807Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4304882Z import torch 2023-01-11T21:38:06.4304957Z import random 2023-01-11T21:38:06.4305077Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4305203Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4305208Z 2023-01-11T21:38:06.4305283Z aten = torch.ops.aten 2023-01-11T21:38:06.4305437Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4305540Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4305547Z 2023-01-11T21:38:06.4305638Z import triton 2023-01-11T21:38:06.4305733Z import triton.language as tl 2023-01-11T21:38:06.4305893Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4306033Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4306038Z 2023-01-11T21:38:06.4306043Z 2023-01-11T21:38:06.4306202Z triton_fused_index_0 = async_compile.triton(''' 2023-01-11T21:38:06.4306271Z import triton 2023-01-11T21:38:06.4306365Z import triton.language as tl 2023-01-11T21:38:06.4306478Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4306579Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4306713Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4306836Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4306841Z 2023-01-11T21:38:06.4307256Z 
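# Annotation (not part of the captured log): the kernels in this module look
# like the float16 re-run of the same three gather kernels; apart from the
# *fp16 pointer types and the `.to(tl.float32)` upcasts on the gathered
# loads, the indexing math is unchanged.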
@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4307332Z @triton.jit 2023-01-11T21:38:06.4307464Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4307538Z xnumel = 256 2023-01-11T21:38:06.4307634Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4307765Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4307849Z xmask = xindex < xnumel 2023-01-11T21:38:06.4307926Z x1 = (xindex // 64) 2023-01-11T21:38:06.4307999Z x0 = xindex % 64 2023-01-11T21:38:06.4308063Z x2 = xindex 2023-01-11T21:38:06.4308252Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4308485Z tmp1 = tl.load(in_ptr1 + (x0 + (64*tmp0)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4308620Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4308703Z ''') 2023-01-11T21:38:06.4308708Z 2023-01-11T21:38:06.4308713Z 2023-01-11T21:38:06.4308874Z triton_fused_index_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.4308949Z import triton 2023-01-11T21:38:06.4309040Z import triton.language as tl 2023-01-11T21:38:06.4309148Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4309249Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4309408Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4309536Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4309541Z 2023-01-11T21:38:06.4309958Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4310030Z @triton.jit 2023-01-11T21:38:06.4310169Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4310242Z xnumel = 256 2023-01-11T21:38:06.4310332Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4310463Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4310544Z xmask = xindex < xnumel 2023-01-11T21:38:06.4310622Z x1 = (xindex // 8) % 4 2023-01-11T21:38:06.4310696Z x0 = xindex % 8 2023-01-11T21:38:06.4310770Z x2 = (xindex // 32) 2023-01-11T21:38:06.4310843Z x3 = xindex 2023-01-11T21:38:06.4311024Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4311263Z tmp1 = tl.load(in_ptr1 + (x0 + (8*tmp0) + (64*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4311395Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4311479Z ''') 2023-01-11T21:38:06.4311485Z 2023-01-11T21:38:06.4311489Z 2023-01-11T21:38:06.4311647Z triton_fused_index_3_2 = async_compile.triton(''' 2023-01-11T21:38:06.4311722Z import triton 2023-01-11T21:38:06.4311815Z import triton.language as tl 2023-01-11T21:38:06.4311950Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4312051Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4312182Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4312306Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4312311Z 2023-01-11T21:38:06.4312728Z 
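# Annotation (not part of the captured log): the double-indexed gather. tmp0
# and tmp1 are read from the same index vector (at positions x1 and x0) and
# combined into one source offset, which appears to be the compiled form of
# indexing two dimensions with broadcast copies of one index vector.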
@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4312802Z @triton.jit 2023-01-11T21:38:06.4312938Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4313013Z xnumel = 128 2023-01-11T21:38:06.4313109Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4313231Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4313317Z xmask = xindex < xnumel 2023-01-11T21:38:06.4313401Z x1 = (xindex // 4) % 4 2023-01-11T21:38:06.4313474Z x0 = xindex % 4 2023-01-11T21:38:06.4313551Z x2 = (xindex // 16) 2023-01-11T21:38:06.4313620Z x4 = xindex 2023-01-11T21:38:06.4313711Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4313808Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4313944Z tmp2 = tl.load(in_ptr1 + (tmp1 + (8*tmp0) + (64*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.4314078Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4314163Z ''') 2023-01-11T21:38:06.4314169Z 2023-01-11T21:38:06.4314173Z 2023-01-11T21:38:06.4314267Z async_compile.wait(globals()) 2023-01-11T21:38:06.4314343Z del async_compile 2023-01-11T21:38:06.4314348Z 2023-01-11T21:38:06.4314421Z def call(args): 2023-01-11T21:38:06.4314494Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4314568Z args.clear() 2023-01-11T21:38:06.4314659Z with torch.cuda.device(0): 2023-01-11T21:38:06.4314865Z buf0 = empty_strided((4, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4314959Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4315108Z triton_fused_index_0.run(arg1_1, arg0_1, buf0, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4315364Z buf1 = empty_strided((8, 4, 8), (32, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4315536Z triton_fused_index_1_1.run(arg1_1, arg0_1, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4315733Z buf2 = empty_strided((8, 4, 4), (16, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4315879Z triton_fused_index_3_2.run(arg1_1, arg0_1, buf2, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.4315949Z del arg0_1 2023-01-11T21:38:06.4316022Z del arg1_1 2023-01-11T21:38:06.4316108Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4316114Z 2023-01-11T21:38:06.4316118Z 2023-01-11T21:38:06.4316196Z if __name__ == "__main__": 2023-01-11T21:38:06.4316321Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4316448Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4316647Z arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4316841Z arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.4316960Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4316965Z 2023-01-11T21:38:06.4317228Z [2023-01-11 21:34:51,184] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 611 2023-01-11T21:38:06.4317644Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4317800Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4318057Z [2023-01-11 21:34:51,226] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 612 2023-01-11T21:38:06.4318063Z 2023-01-11T21:38:06.4318160Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4318236Z import torch 2023-01-11T21:38:06.4318304Z import random 2023-01-11T21:38:06.4318422Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4318547Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4318553Z 2023-01-11T21:38:06.4318634Z aten = torch.ops.aten 2023-01-11T21:38:06.4318769Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4318865Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4318870Z 2023-01-11T21:38:06.4318944Z import triton 2023-01-11T21:38:06.4319033Z import triton.language as tl 2023-01-11T21:38:06.4319151Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4319292Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4319297Z 2023-01-11T21:38:06.4319302Z 2023-01-11T21:38:06.4319459Z triton_fused_index_0 = async_compile.triton(''' 2023-01-11T21:38:06.4319532Z import triton 2023-01-11T21:38:06.4319622Z import triton.language as tl 2023-01-11T21:38:06.4319739Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4319839Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4319964Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4320088Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4320093Z 2023-01-11T21:38:06.4320508Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4320583Z @triton.jit 2023-01-11T21:38:06.4320726Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4320799Z xnumel = 256 2023-01-11T21:38:06.4320895Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4321024Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4321101Z xmask = xindex < xnumel 2023-01-11T21:38:06.4321208Z x1 = (xindex // 64) 2023-01-11T21:38:06.4321284Z x0 = xindex % 64 2023-01-11T21:38:06.4321355Z x2 = xindex 2023-01-11T21:38:06.4321544Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4321749Z tmp1 = tl.load(in_ptr1 + (x0 + (64*tmp0)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4321881Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4321958Z ''') 2023-01-11T21:38:06.4321968Z 2023-01-11T21:38:06.4321972Z 2023-01-11T21:38:06.4322125Z triton_fused_index_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.4322199Z import triton 2023-01-11T21:38:06.4322294Z import triton.language as tl 2023-01-11T21:38:06.4322408Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4322509Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4322643Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4322766Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4322774Z 2023-01-11T21:38:06.4323187Z 
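# --- Editorial note (added; not emitted by inductor) ---------------------------------
# This variant gathers on the middle axis: out[b, i, j] = a[b, idx[i], j] for an (8, 4, 8)
# output; graph 612 appears to be the same indexing test retraced with fp32 data and
# int64 indices (see the '*i64' signature below and the int64 arg1_1 in __main__).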
@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4323255Z @triton.jit 2023-01-11T21:38:06.4323392Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4323464Z xnumel = 256 2023-01-11T21:38:06.4323560Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4323688Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4323798Z xmask = xindex < xnumel 2023-01-11T21:38:06.4323881Z x1 = (xindex // 8) % 4 2023-01-11T21:38:06.4323947Z x0 = xindex % 8 2023-01-11T21:38:06.4324027Z x2 = (xindex // 32) 2023-01-11T21:38:06.4324098Z x3 = xindex 2023-01-11T21:38:06.4324287Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4324506Z tmp1 = tl.load(in_ptr1 + (x0 + (8*tmp0) + (64*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4324641Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4324721Z ''') 2023-01-11T21:38:06.4324727Z 2023-01-11T21:38:06.4324731Z 2023-01-11T21:38:06.4324893Z triton_fused_index_3_2 = async_compile.triton(''' 2023-01-11T21:38:06.4324961Z import triton 2023-01-11T21:38:06.4325057Z import triton.language as tl 2023-01-11T21:38:06.4325170Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4325276Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4325410Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4325535Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4325540Z 2023-01-11T21:38:06.4325954Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4326027Z @triton.jit 2023-01-11T21:38:06.4326158Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4326230Z xnumel = 128 2023-01-11T21:38:06.4326325Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4326453Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4326534Z xmask = xindex < xnumel 2023-01-11T21:38:06.4326616Z x1 = (xindex // 4) % 4 2023-01-11T21:38:06.4326689Z x0 = xindex % 4 2023-01-11T21:38:06.4326760Z x2 = (xindex // 16) 2023-01-11T21:38:06.4326833Z x4 = xindex 2023-01-11T21:38:06.4326929Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4327026Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4327146Z tmp2 = tl.load(in_ptr1 + (tmp1 + (8*tmp0) + (64*x2)), xmask) 2023-01-11T21:38:06.4327310Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4327396Z ''') 2023-01-11T21:38:06.4327402Z 2023-01-11T21:38:06.4327406Z 2023-01-11T21:38:06.4327492Z async_compile.wait(globals()) 2023-01-11T21:38:06.4327568Z del async_compile 2023-01-11T21:38:06.4327573Z 2023-01-11T21:38:06.4327647Z def call(args): 2023-01-11T21:38:06.4327725Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4327800Z args.clear() 2023-01-11T21:38:06.4327890Z with torch.cuda.device(0): 2023-01-11T21:38:06.4328097Z buf0 = empty_strided((4, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4328182Z stream0 = get_cuda_stream(0) 
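# Editorial comment (added; not emitted by inductor): three independent gather launches
# on stream0 follow, each writing its own buffer -- buf0 ~ a[idx], buf1 ~ a[:, idx], and
# buf2 indexing two axes at once.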
2023-01-11T21:38:06.4328330Z triton_fused_index_0.run(arg1_1, arg0_1, buf0, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4328537Z buf1 = empty_strided((8, 4, 8), (32, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4328685Z triton_fused_index_1_1.run(arg1_1, arg0_1, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4328890Z buf2 = empty_strided((8, 4, 4), (16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4329034Z triton_fused_index_3_2.run(arg1_1, arg0_1, buf2, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.4329105Z del arg0_1 2023-01-11T21:38:06.4329177Z del arg1_1 2023-01-11T21:38:06.4329258Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4329263Z 2023-01-11T21:38:06.4329273Z 2023-01-11T21:38:06.4329346Z if __name__ == "__main__": 2023-01-11T21:38:06.4329463Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4329589Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4329825Z arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4330021Z arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4330139Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4330145Z 2023-01-11T21:38:06.4330414Z [2023-01-11 21:34:51,330] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 612 2023-01-11T21:38:06.4330419Z 2023-01-11T21:38:06.4330515Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4330583Z import torch 2023-01-11T21:38:06.4330656Z import random 2023-01-11T21:38:06.4330774Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4330901Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4330906Z 2023-01-11T21:38:06.4330989Z aten = torch.ops.aten 2023-01-11T21:38:06.4331126Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4331225Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4331230Z 2023-01-11T21:38:06.4331297Z import triton 2023-01-11T21:38:06.4331389Z import triton.language as tl 2023-01-11T21:38:06.4331513Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4331653Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4331661Z 2023-01-11T21:38:06.4331665Z 2023-01-11T21:38:06.4331825Z triton_fused_index_0 = async_compile.triton(''' 2023-01-11T21:38:06.4331900Z import triton 2023-01-11T21:38:06.4331995Z import triton.language as tl 2023-01-11T21:38:06.4332108Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4332202Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4332333Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4332457Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4332462Z 2023-01-11T21:38:06.4332874Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4332950Z @triton.jit 2023-01-11T21:38:06.4333090Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4333197Z xnumel = 256 2023-01-11T21:38:06.4333295Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4333417Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4333500Z xmask = xindex < xnumel 2023-01-11T21:38:06.4333581Z x1 = (xindex // 64) 2023-01-11T21:38:06.4333656Z x0 = xindex % 64 2023-01-11T21:38:06.4333725Z x2 = xindex 2023-01-11T21:38:06.4333917Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4334147Z tmp1 = tl.load(in_ptr1 + (x0 + (64*tmp0)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4334272Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4334360Z ''') 2023-01-11T21:38:06.4334366Z 2023-01-11T21:38:06.4334370Z 2023-01-11T21:38:06.4334633Z triton_fused_index_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.4334712Z import triton 2023-01-11T21:38:06.4334804Z import triton.language as tl 2023-01-11T21:38:06.4334920Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4335022Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4335153Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4335271Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4335276Z 2023-01-11T21:38:06.4335694Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4335766Z @triton.jit 2023-01-11T21:38:06.4335901Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4336031Z xnumel = 256 2023-01-11T21:38:06.4336127Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4336256Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4336339Z xmask = xindex < xnumel 2023-01-11T21:38:06.4336416Z x1 = (xindex // 8) % 4 2023-01-11T21:38:06.4336490Z x0 = xindex % 8 2023-01-11T21:38:06.4336567Z x2 = (xindex // 32) 2023-01-11T21:38:06.4336637Z x3 = xindex 2023-01-11T21:38:06.4336822Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4337062Z tmp1 = tl.load(in_ptr1 + (x0 + (8*tmp0) + (64*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4337276Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4337355Z ''') 2023-01-11T21:38:06.4337360Z 2023-01-11T21:38:06.4337364Z 2023-01-11T21:38:06.4337523Z triton_fused_index_3_2 = async_compile.triton(''' 2023-01-11T21:38:06.4337603Z import triton 2023-01-11T21:38:06.4337695Z import triton.language as tl 2023-01-11T21:38:06.4337808Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4337910Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4338042Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4338163Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4338175Z 2023-01-11T21:38:06.4338583Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4338656Z @triton.jit 2023-01-11T21:38:06.4338794Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4338866Z xnumel = 128 2023-01-11T21:38:06.4338962Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4339093Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4339175Z xmask = xindex < xnumel 2023-01-11T21:38:06.4339248Z x1 = (xindex // 4) % 4 2023-01-11T21:38:06.4339320Z x0 = xindex % 4 2023-01-11T21:38:06.4339401Z x2 = (xindex // 16) 2023-01-11T21:38:06.4339472Z x4 = xindex 2023-01-11T21:38:06.4339615Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4339712Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4339848Z tmp2 = tl.load(in_ptr1 + (tmp1 + (8*tmp0) + (64*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.4339975Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4340061Z ''') 2023-01-11T21:38:06.4340067Z 2023-01-11T21:38:06.4340071Z 2023-01-11T21:38:06.4340165Z async_compile.wait(globals()) 2023-01-11T21:38:06.4340242Z del async_compile 2023-01-11T21:38:06.4340247Z 2023-01-11T21:38:06.4340322Z def call(args): 2023-01-11T21:38:06.4340401Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4340479Z args.clear() 2023-01-11T21:38:06.4340571Z with torch.cuda.device(0): 2023-01-11T21:38:06.4340770Z buf0 = empty_strided((4, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4340860Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4341007Z triton_fused_index_0.run(arg1_1, arg0_1, buf0, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4341212Z buf1 = empty_strided((8, 4, 8), (32, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4341354Z triton_fused_index_1_1.run(arg1_1, arg0_1, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4341555Z buf2 = empty_strided((8, 4, 4), (16, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4341701Z triton_fused_index_3_2.run(arg1_1, arg0_1, buf2, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.4341775Z del arg0_1 2023-01-11T21:38:06.4341841Z del arg1_1 2023-01-11T21:38:06.4341929Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4341961Z 2023-01-11T21:38:06.4341966Z 2023-01-11T21:38:06.4342050Z if __name__ == "__main__": 2023-01-11T21:38:06.4342169Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4342297Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4342509Z arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4342703Z arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4342827Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4342832Z 2023-01-11T21:38:06.4342898Z ok (0.597s) 2023-01-11T21:38:06.4343373Z test_indirect_load_broadcast_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4343511Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4343771Z [2023-01-11 21:34:51,348] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 613 2023-01-11T21:38:06.4344036Z [2023-01-11 21:34:51,876] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 613 2023-01-11T21:38:06.4344453Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. 
This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4344584Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4344841Z [2023-01-11 21:34:51,895] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 614 2023-01-11T21:38:06.4345110Z [2023-01-11 21:34:52,122] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 614 2023-01-11T21:38:06.4345116Z 2023-01-11T21:38:06.4345217Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4345321Z import torch 2023-01-11T21:38:06.4345393Z import random 2023-01-11T21:38:06.4345537Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4345677Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4345683Z 2023-01-11T21:38:06.4345780Z aten = torch.ops.aten 2023-01-11T21:38:06.4345918Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4346014Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4346019Z 2023-01-11T21:38:06.4346095Z import triton 2023-01-11T21:38:06.4346183Z import triton.language as tl 2023-01-11T21:38:06.4346309Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4346451Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4346457Z 2023-01-11T21:38:06.4346461Z 2023-01-11T21:38:06.4346617Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.4346694Z import triton 2023-01-11T21:38:06.4346790Z import triton.language as tl 2023-01-11T21:38:06.4346908Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4347013Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4347141Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4347269Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4347274Z 2023-01-11T21:38:06.4347767Z @pointwise(size_hints=[32, 32], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.4347871Z @triton.jit 2023-01-11T21:38:06.4348055Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.4348129Z xnumel = 32 2023-01-11T21:38:06.4348203Z ynumel = 21 2023-01-11T21:38:06.4348301Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4348434Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4348521Z xmask = xindex < xnumel 2023-01-11T21:38:06.4348617Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.4348751Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.4348835Z ymask = yindex < ynumel 2023-01-11T21:38:06.4348908Z x0 = xindex 2023-01-11T21:38:06.4348981Z y1 = yindex 2023-01-11T21:38:06.4349092Z tmp0 = tl.load(in_ptr0 + (x0 + (32*y1)), xmask & ymask) 2023-01-11T21:38:06.4349192Z tmp2 = tl.load(in_ptr2 + (x0), xmask) 2023-01-11T21:38:06.4349314Z tmp1 = tl.load(in_ptr1 + (y1 + (512*tmp0)), xmask & ymask) 2023-01-11T21:38:06.4349401Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.4349560Z tl.store(out_ptr0 + (y1 + (21*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp3, xmask & ymask) 2023-01-11T21:38:06.4349647Z ''') 
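# --- Editorial sketch (added; not part of the generated module) ----------------------
# Element [i, j] of the (32, 21) output of triton_fused_add_0 above is
# table[idx[i, j], j] + bias[i, 0]: an indirect row load broadcast against a column
# vector. A rough eager-mode equivalent, with illustrative names (idx, table, bias
# stand for arg2_1, arg1_1, arg0_1 in call() below):
#
#     cols = torch.arange(idx.size(1), device=idx.device)
#     out = table[idx, cols] + bias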
2023-01-11T21:38:06.4349653Z 2023-01-11T21:38:06.4349657Z 2023-01-11T21:38:06.4349753Z async_compile.wait(globals()) 2023-01-11T21:38:06.4349835Z del async_compile 2023-01-11T21:38:06.4349840Z 2023-01-11T21:38:06.4349910Z def call(args): 2023-01-11T21:38:06.4349999Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4350081Z args.clear() 2023-01-11T21:38:06.4350174Z with torch.cuda.device(0): 2023-01-11T21:38:06.4350378Z buf0 = empty_strided((32, 21), (21, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4350472Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4350630Z triton_fused_add_0.run(arg2_1, arg1_1, arg0_1, buf0, 32, 21, grid=grid(32, 21), stream=stream0) 2023-01-11T21:38:06.4350699Z del arg0_1 2023-01-11T21:38:06.4350776Z del arg1_1 2023-01-11T21:38:06.4350850Z del arg2_1 2023-01-11T21:38:06.4350928Z return (buf0, ) 2023-01-11T21:38:06.4350934Z 2023-01-11T21:38:06.4350938Z 2023-01-11T21:38:06.4351020Z if __name__ == "__main__": 2023-01-11T21:38:06.4351139Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4351295Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4351502Z arg0_1 = rand_strided((32, 1), (1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4351704Z arg1_1 = rand_strided((9521, 512), (512, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4351902Z arg2_1 = rand_strided((32, 21), (1, 32), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4352032Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4352037Z 2023-01-11T21:38:06.4352042Z 2023-01-11T21:38:06.4352140Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4352215Z import torch 2023-01-11T21:38:06.4352293Z import random 2023-01-11T21:38:06.4352411Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4352529Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4352540Z 2023-01-11T21:38:06.4352617Z aten = torch.ops.aten 2023-01-11T21:38:06.4352758Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4352858Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4352863Z 2023-01-11T21:38:06.4352940Z import triton 2023-01-11T21:38:06.4353035Z import triton.language as tl 2023-01-11T21:38:06.4353160Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4353300Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4353306Z 2023-01-11T21:38:06.4353310Z 2023-01-11T21:38:06.4353465Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.4353536Z import triton 2023-01-11T21:38:06.4353630Z import triton.language as tl 2023-01-11T21:38:06.4353745Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4353875Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4354006Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4354136Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4354141Z 2023-01-11T21:38:06.4354638Z @pointwise(size_hints=[32, 32], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.4354715Z @triton.jit 2023-01-11T21:38:06.4354892Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 
2023-01-11T21:38:06.4354968Z xnumel = 32 2023-01-11T21:38:06.4355044Z ynumel = 21 2023-01-11T21:38:06.4355144Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4355281Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4355369Z xmask = xindex < xnumel 2023-01-11T21:38:06.4355467Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.4355593Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.4355678Z ymask = yindex < ynumel 2023-01-11T21:38:06.4355750Z x0 = xindex 2023-01-11T21:38:06.4355823Z y1 = yindex 2023-01-11T21:38:06.4355941Z tmp0 = tl.load(in_ptr0 + (x0 + (32*y1)), xmask & ymask) 2023-01-11T21:38:06.4356061Z tmp2 = tl.load(in_ptr2 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4356197Z tmp1 = tl.load(in_ptr1 + (y1 + (512*tmp0)), xmask & ymask).to(tl.float32) 2023-01-11T21:38:06.4356272Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.4356429Z tl.store(out_ptr0 + (y1 + (21*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp3, xmask & ymask) 2023-01-11T21:38:06.4356518Z ''') 2023-01-11T21:38:06.4356523Z 2023-01-11T21:38:06.4356528Z 2023-01-11T21:38:06.4356621Z async_compile.wait(globals()) 2023-01-11T21:38:06.4356704Z del async_compile 2023-01-11T21:38:06.4356709Z 2023-01-11T21:38:06.4356785Z def call(args): 2023-01-11T21:38:06.4356877Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4356954Z args.clear() 2023-01-11T21:38:06.4357042Z with torch.cuda.device(0): 2023-01-11T21:38:06.4357280Z buf0 = empty_strided((32, 21), (21, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4357376Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4357532Z triton_fused_add_0.run(arg2_1, arg1_1, arg0_1, buf0, 32, 21, grid=grid(32, 21), stream=stream0) 2023-01-11T21:38:06.4357607Z del arg0_1 2023-01-11T21:38:06.4357682Z del arg1_1 2023-01-11T21:38:06.4357756Z del arg2_1 2023-01-11T21:38:06.4357828Z return (buf0, ) 2023-01-11T21:38:06.4357833Z 2023-01-11T21:38:06.4357838Z 2023-01-11T21:38:06.4357920Z if __name__ == "__main__": 2023-01-11T21:38:06.4358039Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4358168Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4358377Z arg0_1 = rand_strided((32, 1), (1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4358588Z arg1_1 = rand_strided((9521, 512), (512, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4358788Z arg2_1 = rand_strided((32, 21), (1, 32), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4358920Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4358925Z 2023-01-11T21:38:06.4358992Z ok (0.794s) 2023-01-11T21:38:06.4359444Z test_inf_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4359576Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4359867Z [2023-01-11 21:34:52,142] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 615 2023-01-11T21:38:06.4360130Z [2023-01-11 21:34:52,313] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 615 2023-01-11T21:38:06.4360550Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4360683Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4360939Z [2023-01-11 21:34:52,331] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 616 2023-01-11T21:38:06.4361201Z [2023-01-11 21:34:52,406] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 616 2023-01-11T21:38:06.4361209Z 2023-01-11T21:38:06.4361310Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4361386Z import torch 2023-01-11T21:38:06.4361455Z import random 2023-01-11T21:38:06.4361574Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4361702Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4361707Z 2023-01-11T21:38:06.4361793Z aten = torch.ops.aten 2023-01-11T21:38:06.4361933Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4362030Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4362036Z 2023-01-11T21:38:06.4362111Z import triton 2023-01-11T21:38:06.4362198Z import triton.language as tl 2023-01-11T21:38:06.4362325Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4362468Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4362474Z 2023-01-11T21:38:06.4362481Z 2023-01-11T21:38:06.4362653Z triton_fused_add_add_1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.4362730Z import triton 2023-01-11T21:38:06.4362825Z import triton.language as tl 2023-01-11T21:38:06.4362941Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4363044Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4363201Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4363331Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4363336Z 2023-01-11T21:38:06.4363768Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4363845Z @triton.jit 2023-01-11T21:38:06.4363999Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4364072Z xnumel = 8 2023-01-11T21:38:06.4364176Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4364306Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4364384Z xmask = xindex < xnumel 2023-01-11T21:38:06.4364458Z x0 = xindex 2023-01-11T21:38:06.4364649Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 
2023-01-11T21:38:06.4364752Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4364837Z tmp1 = float("inf") 2023-01-11T21:38:06.4364918Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4365032Z tmp3 = float("-inf") 2023-01-11T21:38:06.4365106Z tmp4 = tmp0 + tmp3 2023-01-11T21:38:06.4365185Z tmp6 = tmp5 * tmp3 2023-01-11T21:38:06.4365323Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4365456Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.4365590Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.4365703Z ''') 2023-01-11T21:38:06.4365709Z 2023-01-11T21:38:06.4365714Z 2023-01-11T21:38:06.4365811Z async_compile.wait(globals()) 2023-01-11T21:38:06.4365888Z del async_compile 2023-01-11T21:38:06.4365893Z 2023-01-11T21:38:06.4365962Z def call(args): 2023-01-11T21:38:06.4366037Z arg0_1, = args 2023-01-11T21:38:06.4366111Z args.clear() 2023-01-11T21:38:06.4366209Z with torch.cuda.device(0): 2023-01-11T21:38:06.4366410Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4366607Z buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4366804Z buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4366892Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4367055Z triton_fused_add_add_1_mul_0.run(arg0_1, buf0, buf1, buf2, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4367129Z del arg0_1 2023-01-11T21:38:06.4367220Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4367228Z 2023-01-11T21:38:06.4367233Z 2023-01-11T21:38:06.4367315Z if __name__ == "__main__": 2023-01-11T21:38:06.4367436Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4367565Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4367767Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4367875Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4367880Z 2023-01-11T21:38:06.4367891Z 2023-01-11T21:38:06.4367984Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4368065Z import torch 2023-01-11T21:38:06.4368143Z import random 2023-01-11T21:38:06.4368265Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4368390Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4368395Z 2023-01-11T21:38:06.4368479Z aten = torch.ops.aten 2023-01-11T21:38:06.4368617Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4368711Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4368716Z 2023-01-11T21:38:06.4368791Z import triton 2023-01-11T21:38:06.4368883Z import triton.language as tl 2023-01-11T21:38:06.4369010Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4369178Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4369184Z 2023-01-11T21:38:06.4369189Z 2023-01-11T21:38:06.4369361Z triton_fused_add_add_1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.4369436Z import triton 2023-01-11T21:38:06.4369526Z import triton.language as tl 2023-01-11T21:38:06.4369633Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4369733Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4369864Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4369988Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4369994Z 
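# --- Editorial note (added; not emitted by inductor) ---------------------------------
# fp16 retrace of the kernel above: the inf constants stay fp32 Python floats, the fp16
# loads are upcast with .to(tl.float32), the three outputs (x + inf, x + (-inf),
# x * (-inf)) are computed in fp32, and the stores appear to downcast back to fp16.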
2023-01-11T21:38:06.4370429Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4370506Z @triton.jit 2023-01-11T21:38:06.4370660Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4370733Z xnumel = 8 2023-01-11T21:38:06.4370823Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4370951Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4371033Z xmask = xindex < xnumel 2023-01-11T21:38:06.4371103Z x0 = xindex 2023-01-11T21:38:06.4371318Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4371435Z tmp5 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4371515Z tmp1 = float("inf") 2023-01-11T21:38:06.4371587Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4371727Z tmp3 = float("-inf") 2023-01-11T21:38:06.4371803Z tmp4 = tmp0 + tmp3 2023-01-11T21:38:06.4371880Z tmp6 = tmp5 * tmp3 2023-01-11T21:38:06.4372016Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4372146Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.4372276Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.4372354Z ''') 2023-01-11T21:38:06.4372359Z 2023-01-11T21:38:06.4372371Z 2023-01-11T21:38:06.4372458Z async_compile.wait(globals()) 2023-01-11T21:38:06.4372533Z del async_compile 2023-01-11T21:38:06.4372538Z 2023-01-11T21:38:06.4372611Z def call(args): 2023-01-11T21:38:06.4372684Z arg0_1, = args 2023-01-11T21:38:06.4372759Z args.clear() 2023-01-11T21:38:06.4372851Z with torch.cuda.device(0): 2023-01-11T21:38:06.4373040Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4373233Z buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4373428Z buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4373519Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4373676Z triton_fused_add_add_1_mul_0.run(arg0_1, buf0, buf1, buf2, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4373750Z del arg0_1 2023-01-11T21:38:06.4373839Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4373844Z 2023-01-11T21:38:06.4373849Z 2023-01-11T21:38:06.4373929Z if __name__ == "__main__": 2023-01-11T21:38:06.4374039Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4374163Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4374358Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4374579Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4374585Z 2023-01-11T21:38:06.4374658Z ok (0.282s) 2023-01-11T21:38:06.4375217Z test_inplace_activations_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4375365Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4375660Z [2023-01-11 21:34:52,548] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 617 2023-01-11T21:38:06.4375921Z [2023-01-11 21:34:52,849] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 617 2023-01-11T21:38:06.4376335Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4376470Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4376720Z [2023-01-11 21:34:53,061] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 618 2023-01-11T21:38:06.4376733Z 2023-01-11T21:38:06.4376824Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4376898Z import torch 2023-01-11T21:38:06.4376969Z import random 2023-01-11T21:38:06.4377090Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4377272Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4377278Z 2023-01-11T21:38:06.4377368Z aten = torch.ops.aten 2023-01-11T21:38:06.4377516Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4377605Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4377610Z 2023-01-11T21:38:06.4377734Z import triton 2023-01-11T21:38:06.4377825Z import triton.language as tl 2023-01-11T21:38:06.4377950Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4378088Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4378093Z 2023-01-11T21:38:06.4378098Z 2023-01-11T21:38:06.4378349Z triton_fused_add_add_1_add_2_add_3_add_4_add_6_add_7_div_gt_lift_fresh_copy_0 = async_compile.triton(''' 2023-01-11T21:38:06.4378424Z import triton 2023-01-11T21:38:06.4378517Z import triton.language as tl 2023-01-11T21:38:06.4378624Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4378727Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4378863Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4378990Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4378995Z 2023-01-11T21:38:06.4379489Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: '*fp32', 8: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8), equal_to_1=())]}) 2023-01-11T21:38:06.4379566Z @triton.jit 2023-01-11T21:38:06.4379755Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4379831Z xnumel = 64 2023-01-11T21:38:06.4379921Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4380049Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4380132Z xmask = xindex < xnumel 2023-01-11T21:38:06.4380203Z x0 = xindex 2023-01-11T21:38:06.4380394Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 
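# Editorial comment (added; not emitted by inductor): everything below derives from
# tmp2 = x + 1 -- a hardswish-style clamp chain (tmp4..tmp11), a hardtanh clamp to
# [-1, 1] (tmp13..tmp15), a leaky-relu select (tmp17..tmp20), x * sigmoid(x) (tmp21,
# tmp22), log1p (tmp23), and two constant-predicate tl.where selects (tmp25, tmp28)
# from the lift_fresh/masked-fill part of the test.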
2023-01-11T21:38:06.4380493Z tmp26 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4380564Z tmp1 = 1 2023-01-11T21:38:06.4380636Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4380709Z tmp3 = 3 2023-01-11T21:38:06.4380785Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.4380857Z tmp5 = 0.0 2023-01-11T21:38:06.4380994Z tmp6 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 > tmp5, tmp4, tmp5)) 2023-01-11T21:38:06.4381066Z tmp7 = 6.0 2023-01-11T21:38:06.4381227Z tmp8 = tl.where(tmp6 != tmp6, tmp6, tl.where(tmp6 < tmp7, tmp6, tmp7)) 2023-01-11T21:38:06.4381299Z tmp9 = tmp2 * tmp8 2023-01-11T21:38:06.4381370Z tmp10 = 6 2023-01-11T21:38:06.4381449Z tmp11 = tmp9 / tmp10 2023-01-11T21:38:06.4381552Z tmp12 = -1.0 2023-01-11T21:38:06.4381690Z tmp13 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp12, tmp2, tmp12)) 2023-01-11T21:38:06.4381761Z tmp14 = 1.0 2023-01-11T21:38:06.4381903Z tmp15 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 < tmp14, tmp13, tmp14)) 2023-01-11T21:38:06.4381967Z tmp16 = 0 2023-01-11T21:38:06.4382047Z tmp17 = tmp2 > tmp16 2023-01-11T21:38:06.4382121Z tmp18 = 0.01 2023-01-11T21:38:06.4382202Z tmp19 = tmp2 * tmp18 2023-01-11T21:38:06.4382304Z tmp20 = tl.where(tmp17, tmp2, tmp19) 2023-01-11T21:38:06.4382391Z tmp21 = tl.sigmoid(tmp2) 2023-01-11T21:38:06.4382469Z tmp22 = tmp2 * tmp21 2023-01-11T21:38:06.4382561Z tmp23 = tl.libdevice.log1p(tmp2) 2023-01-11T21:38:06.4382633Z tmp24 = 99.0 2023-01-11T21:38:06.4382734Z tmp25 = tl.where(tmp16, tmp24, tmp2) 2023-01-11T21:38:06.4382811Z tmp27 = tmp26 + tmp1 2023-01-11T21:38:06.4382909Z tmp28 = tl.where(tmp1, tmp24, tmp27) 2023-01-11T21:38:06.4383043Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.4383177Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.4383299Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp20, xmask) 2023-01-11T21:38:06.4383422Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp22, xmask) 2023-01-11T21:38:06.4383544Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp23, xmask) 2023-01-11T21:38:06.4383696Z tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp25, xmask) 2023-01-11T21:38:06.4383819Z tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp28, xmask) 2023-01-11T21:38:06.4383903Z ''') 2023-01-11T21:38:06.4383909Z 2023-01-11T21:38:06.4383913Z 2023-01-11T21:38:06.4384007Z async_compile.wait(globals()) 2023-01-11T21:38:06.4384084Z del async_compile 2023-01-11T21:38:06.4384089Z 2023-01-11T21:38:06.4384156Z def call(args): 2023-01-11T21:38:06.4384227Z arg0_1, = args 2023-01-11T21:38:06.4384299Z args.clear() 2023-01-11T21:38:06.4384391Z with torch.cuda.device(0): 2023-01-11T21:38:06.4384589Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4384785Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4384978Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4385165Z buf3 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4385360Z buf4 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4385551Z buf5 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4385776Z buf6 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4385885Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4386100Z 
triton_fused_add_add_1_add_2_add_3_add_4_add_6_add_7_div_gt_lift_fresh_copy_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, buf5, buf6, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4386174Z del arg0_1 2023-01-11T21:38:06.4386290Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, ) 2023-01-11T21:38:06.4386296Z 2023-01-11T21:38:06.4386300Z 2023-01-11T21:38:06.4386380Z if __name__ == "__main__": 2023-01-11T21:38:06.4386491Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4386617Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4386818Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4386930Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4386936Z 2023-01-11T21:38:06.4387228Z [2023-01-11 21:34:53,261] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 618 2023-01-11T21:38:06.4387235Z 2023-01-11T21:38:06.4387332Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4387408Z import torch 2023-01-11T21:38:06.4387481Z import random 2023-01-11T21:38:06.4387593Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4387714Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4387719Z 2023-01-11T21:38:06.4387799Z aten = torch.ops.aten 2023-01-11T21:38:06.4387935Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4388028Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4388033Z 2023-01-11T21:38:06.4388109Z import triton 2023-01-11T21:38:06.4388201Z import triton.language as tl 2023-01-11T21:38:06.4388319Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4388461Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4388467Z 2023-01-11T21:38:06.4388471Z 2023-01-11T21:38:06.4388783Z triton_fused_add_add_1_add_2_add_3_add_4_add_6_add_7_convert_element_type_convert_element_type_1_convert_element_type_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.4388858Z import triton 2023-01-11T21:38:06.4388949Z import triton.language as tl 2023-01-11T21:38:06.4389063Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4389168Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4389298Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4389416Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4389426Z 2023-01-11T21:38:06.4389906Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: '*fp16', 7: '*fp16', 8: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8), equal_to_1=())]}) 2023-01-11T21:38:06.4390008Z @triton.jit 2023-01-11T21:38:06.4390196Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4390270Z xnumel = 64 2023-01-11T21:38:06.4390367Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4390501Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4390585Z xmask = xindex < xnumel 2023-01-11T21:38:06.4390655Z x0 = xindex 2023-01-11T21:38:06.4390859Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4390976Z tmp31 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4391046Z tmp1 = 1 
2023-01-11T21:38:06.4391128Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4391215Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.4391284Z tmp4 = 3 2023-01-11T21:38:06.4391361Z tmp5 = tmp3 + tmp4 2023-01-11T21:38:06.4391426Z tmp6 = 0.0 2023-01-11T21:38:06.4391559Z tmp7 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp6, tmp5, tmp6)) 2023-01-11T21:38:06.4391630Z tmp8 = 6.0 2023-01-11T21:38:06.4397973Z tmp9 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 < tmp8, tmp7, tmp8)) 2023-01-11T21:38:06.4398075Z tmp10 = tmp3 * tmp9 2023-01-11T21:38:06.4398151Z tmp11 = 6 2023-01-11T21:38:06.4398237Z tmp12 = tmp10 / tmp11 2023-01-11T21:38:06.4398334Z tmp13 = tmp12.to(tl.float32) 2023-01-11T21:38:06.4398458Z tmp14 = -1.0 2023-01-11T21:38:06.4398609Z tmp15 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp14, tmp3, tmp14)) 2023-01-11T21:38:06.4398682Z tmp16 = 1.0 2023-01-11T21:38:06.4398836Z tmp17 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 < tmp16, tmp15, tmp16)) 2023-01-11T21:38:06.4398936Z tmp18 = tmp17.to(tl.float32) 2023-01-11T21:38:06.4399009Z tmp19 = 0 2023-01-11T21:38:06.4399093Z tmp20 = tmp3 > tmp19 2023-01-11T21:38:06.4399159Z tmp21 = 0.01 2023-01-11T21:38:06.4399240Z tmp22 = tmp3 * tmp21 2023-01-11T21:38:06.4399340Z tmp23 = tl.where(tmp20, tmp3, tmp22) 2023-01-11T21:38:06.4399487Z tmp24 = tmp23.to(tl.float32) 2023-01-11T21:38:06.4399573Z tmp25 = tl.sigmoid(tmp3) 2023-01-11T21:38:06.4399647Z tmp26 = tmp3 * tmp25 2023-01-11T21:38:06.4399735Z tmp27 = tmp26.to(tl.float32) 2023-01-11T21:38:06.4399833Z tmp28 = tl.libdevice.log1p(tmp2) 2023-01-11T21:38:06.4399907Z tmp29 = 99.0 2023-01-11T21:38:06.4400005Z tmp30 = tl.where(tmp19, tmp29, tmp2) 2023-01-11T21:38:06.4400084Z tmp32 = tmp31 + tmp1 2023-01-11T21:38:06.4400181Z tmp33 = tl.where(tmp1, tmp29, tmp32) 2023-01-11T21:38:06.4400309Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.4400442Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.4400574Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp24, xmask) 2023-01-11T21:38:06.4400703Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp27, xmask) 2023-01-11T21:38:06.4400829Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp28, xmask) 2023-01-11T21:38:06.4400958Z tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp30, xmask) 2023-01-11T21:38:06.4401081Z tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp33, xmask) 2023-01-11T21:38:06.4401169Z ''') 2023-01-11T21:38:06.4401175Z 2023-01-11T21:38:06.4401179Z 2023-01-11T21:38:06.4401266Z async_compile.wait(globals()) 2023-01-11T21:38:06.4401343Z del async_compile 2023-01-11T21:38:06.4401348Z 2023-01-11T21:38:06.4401422Z def call(args): 2023-01-11T21:38:06.4401495Z arg0_1, = args 2023-01-11T21:38:06.4401570Z args.clear() 2023-01-11T21:38:06.4401662Z with torch.cuda.device(0): 2023-01-11T21:38:06.4401895Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4402087Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4402287Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4402487Z buf3 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4402681Z buf4 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4402872Z buf5 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4403063Z buf6 = empty_strided((64, ), (1, 
), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4403159Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4403416Z triton_fused_add_add_1_add_2_add_3_add_4_add_6_add_7_convert_element_type_convert_element_type_1_convert_element_type_2_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, buf5, buf6, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4403494Z del arg0_1 2023-01-11T21:38:06.4403605Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, ) 2023-01-11T21:38:06.4403610Z 2023-01-11T21:38:06.4403615Z 2023-01-11T21:38:06.4403698Z if __name__ == "__main__": 2023-01-11T21:38:06.4403820Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4403951Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4404156Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4404271Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4404276Z 2023-01-11T21:38:06.4404349Z ok (0.857s) 2023-01-11T21:38:06.4404701Z test_inplace_add_alpha_autotune (__main__.CudaTests) ... [2023-01-11 21:34:53,283] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:06.4404946Z [2023-01-11 21:34:53,943] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to input mutation 2023-01-11T21:38:06.4405211Z [2023-01-11 21:34:53,943] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:06.4405217Z 2023-01-11T21:38:06.4405315Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4405391Z import torch 2023-01-11T21:38:06.4405469Z import random 2023-01-11T21:38:06.4405639Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4405781Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4405787Z 2023-01-11T21:38:06.4405885Z aten = torch.ops.aten 2023-01-11T21:38:06.4406018Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4406117Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4406122Z 2023-01-11T21:38:06.4406198Z import triton 2023-01-11T21:38:06.4406292Z import triton.language as tl 2023-01-11T21:38:06.4406419Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4406559Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4406567Z 2023-01-11T21:38:06.4406572Z 2023-01-11T21:38:06.4406733Z triton_fused_add__0 = async_compile.triton(''' 2023-01-11T21:38:06.4406811Z import triton 2023-01-11T21:38:06.4406899Z import triton.language as tl 2023-01-11T21:38:06.4407013Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4407120Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4407253Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4407382Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4407387Z 2023-01-11T21:38:06.4407886Z @pointwise(size_hints=[8, 64], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4407965Z @triton.jit 2023-01-11T21:38:06.4408144Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.4408241Z xnumel = 6 2023-01-11T21:38:06.4408316Z ynumel = 40 2023-01-11T21:38:06.4408417Z xoffset = 
tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4408552Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4408641Z xmask = xindex < xnumel 2023-01-11T21:38:06.4408744Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.4408878Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.4408956Z ymask = yindex < ynumel 2023-01-11T21:38:06.4409028Z x3 = xindex 2023-01-11T21:38:06.4409100Z y2 = yindex 2023-01-11T21:38:06.4409174Z x0 = xindex % 3 2023-01-11T21:38:06.4409256Z x1 = (xindex // 3) 2023-01-11T21:38:06.4409472Z tmp0 = tl.load(in_ptr0 + (y2 + (40*x3)), xmask & ymask, eviction_policy='evict_last') 2023-01-11T21:38:06.4409598Z tmp1 = tl.load(in_ptr1 + (x0 + (3*y2) + (120*x1)), xmask & ymask) 2023-01-11T21:38:06.4409669Z tmp2 = 0.55 2023-01-11T21:38:06.4409750Z tmp3 = tmp1 * tmp2 2023-01-11T21:38:06.4409831Z tmp4 = tmp0 + tmp3 2023-01-11T21:38:06.4409991Z tl.store(out_ptr0 + (y2 + (40*x3) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp4, xmask & ymask) 2023-01-11T21:38:06.4410078Z ''') 2023-01-11T21:38:06.4410084Z 2023-01-11T21:38:06.4410092Z 2023-01-11T21:38:06.4410188Z async_compile.wait(globals()) 2023-01-11T21:38:06.4410268Z del async_compile 2023-01-11T21:38:06.4410273Z 2023-01-11T21:38:06.4410349Z def call(args): 2023-01-11T21:38:06.4410416Z x_1, y_1 = args 2023-01-11T21:38:06.4410493Z args.clear() 2023-01-11T21:38:06.4410587Z with torch.cuda.device(0): 2023-01-11T21:38:06.4410682Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4410821Z triton_fused_add__0.run(x_1, y_1, x_1, 6, 40, grid=grid(6, 40), stream=stream0) 2023-01-11T21:38:06.4410893Z del y_1 2023-01-11T21:38:06.4410972Z return (x_1, ) 2023-01-11T21:38:06.4410977Z 2023-01-11T21:38:06.4410984Z 2023-01-11T21:38:06.4411060Z if __name__ == "__main__": 2023-01-11T21:38:06.4411179Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4411307Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4411552Z x_1 = rand_strided((2, 3, 4, 10), (120, 40, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4411767Z y_1 = rand_strided((2, 3, 4, 10), (120, 1, 30, 3), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4411886Z print_performance(lambda: call([x_1, y_1])) 2023-01-11T21:38:06.4411891Z 2023-01-11T21:38:06.4411963Z ok (8.995s) 2023-01-11T21:38:06.4412486Z test_inplace_add_cuda (__main__.CudaTests) ... 
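test_inplace_add_cuda exercises a plain in-place tensor add. A minimal eager-mode sketch of the pattern, reconstructed from the generated module below (the kernel computes tmp0 + tmp1 and call() passes arg0_1 as both an input and the output pointer); the function and variable names here are hypothetical, not taken from the test source:

import torch

def fn(x, y):
    x += y  # mutates x in place, hence the "skipping cudagraphs due to input mutation" warning
    return x

x = torch.randn(4, 4, device="cuda")
y = torch.randn(4, 4, device="cuda")
out = torch.compile(fn)(x, y)
assert out.data_ptr() == x.data_ptr()  # the input buffer is reused as the output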
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.4412567Z warnings.warn( 2023-01-11T21:38:06.4412822Z [2023-01-11 21:35:02,272] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 619 2023-01-11T21:38:06.4413071Z [2023-01-11 21:35:02,343] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to input mutation 2023-01-11T21:38:06.4413336Z [2023-01-11 21:35:02,343] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 619 2023-01-11T21:38:06.4413342Z 2023-01-11T21:38:06.4413442Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4413520Z import torch 2023-01-11T21:38:06.4413595Z import random 2023-01-11T21:38:06.4413716Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4413843Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4413848Z 2023-01-11T21:38:06.4413926Z aten = torch.ops.aten 2023-01-11T21:38:06.4414066Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4414163Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4414168Z 2023-01-11T21:38:06.4414273Z import triton 2023-01-11T21:38:06.4414367Z import triton.language as tl 2023-01-11T21:38:06.4414720Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4414863Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4414868Z 2023-01-11T21:38:06.4414873Z 2023-01-11T21:38:06.4415035Z triton_fused_add__0 = async_compile.triton(''' 2023-01-11T21:38:06.4415103Z import triton 2023-01-11T21:38:06.4415194Z import triton.language as tl 2023-01-11T21:38:06.4415321Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4415433Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4415589Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4415713Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4415718Z 2023-01-11T21:38:06.4416160Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4416237Z @triton.jit 2023-01-11T21:38:06.4416371Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4416441Z xnumel = 16 2023-01-11T21:38:06.4416536Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4416666Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4416749Z xmask = xindex < xnumel 2023-01-11T21:38:06.4416817Z x0 = xindex 2023-01-11T21:38:06.4417005Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4417095Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.4417239Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4417374Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4417459Z ''') 2023-01-11T21:38:06.4417464Z 2023-01-11T21:38:06.4417469Z 2023-01-11T21:38:06.4417562Z async_compile.wait(globals()) 2023-01-11T21:38:06.4417639Z del async_compile 2023-01-11T21:38:06.4417644Z 2023-01-11T21:38:06.4417717Z def call(args): 2023-01-11T21:38:06.4417797Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4417864Z 
args.clear() 2023-01-11T21:38:06.4417955Z with torch.cuda.device(0): 2023-01-11T21:38:06.4418109Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4418256Z triton_fused_add__0.run(arg0_1, arg1_1, arg0_1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4418329Z del arg1_1 2023-01-11T21:38:06.4418408Z return (arg0_1, ) 2023-01-11T21:38:06.4418413Z 2023-01-11T21:38:06.4418418Z 2023-01-11T21:38:06.4418496Z if __name__ == "__main__": 2023-01-11T21:38:06.4418606Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4418732Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4418931Z arg0_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4419126Z arg1_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4419248Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4419253Z 2023-01-11T21:38:06.4419328Z ok (0.083s) 2023-01-11T21:38:06.4419793Z test_inplace_buffer_autotune (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4419924Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4420180Z [2023-01-11 21:35:02,362] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 620 2023-01-11T21:38:06.4420439Z [2023-01-11 21:35:02,430] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 620 2023-01-11T21:38:06.4420488Z 2023-01-11T21:38:06.4420582Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4420657Z import torch 2023-01-11T21:38:06.4420737Z import random 2023-01-11T21:38:06.4420856Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4420978Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4420986Z 2023-01-11T21:38:06.4421071Z aten = torch.ops.aten 2023-01-11T21:38:06.4421208Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4421299Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4421304Z 2023-01-11T21:38:06.4421380Z import triton 2023-01-11T21:38:06.4421473Z import triton.language as tl 2023-01-11T21:38:06.4421598Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4421741Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4421747Z 2023-01-11T21:38:06.4421751Z 2023-01-11T21:38:06.4421910Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.4421990Z import triton 2023-01-11T21:38:06.4422084Z import triton.language as tl 2023-01-11T21:38:06.4422193Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4422298Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4422434Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4422567Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4422573Z 2023-01-11T21:38:06.4422989Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4423064Z @triton.jit 
2023-01-11T21:38:06.4423199Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4423275Z xnumel = 25 2023-01-11T21:38:06.4423368Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4423499Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4423587Z xmask = xindex < xnumel 2023-01-11T21:38:06.4423660Z x0 = xindex 2023-01-11T21:38:06.4423767Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4423866Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4423947Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4424112Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4424199Z ''') 2023-01-11T21:38:06.4424205Z 2023-01-11T21:38:06.4424210Z 2023-01-11T21:38:06.4424304Z async_compile.wait(globals()) 2023-01-11T21:38:06.4424383Z del async_compile 2023-01-11T21:38:06.4424388Z 2023-01-11T21:38:06.4424466Z def call(args): 2023-01-11T21:38:06.4424555Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4424633Z args.clear() 2023-01-11T21:38:06.4424720Z with torch.cuda.device(0): 2023-01-11T21:38:06.4424924Z buf0 = empty_strided((5, 5), (5, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4425032Z aten.mm.out(arg0_1, arg1_1, out=buf0) 2023-01-11T21:38:06.4425106Z del arg0_1 2023-01-11T21:38:06.4425181Z del arg1_1 2023-01-11T21:38:06.4425308Z buf1 = as_strided(buf0, (1, 1, 5, 5), (25, 25, 5, 1)); del buf0 # reuse 2023-01-11T21:38:06.4425400Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4425547Z triton_fused_add_0.run(buf1, arg2_1, 25, grid=grid(25), stream=stream0) 2023-01-11T21:38:06.4425631Z del arg2_1 2023-01-11T21:38:06.4425717Z return (buf1, ) 2023-01-11T21:38:06.4425723Z 2023-01-11T21:38:06.4425729Z 2023-01-11T21:38:06.4425825Z if __name__ == "__main__": 2023-01-11T21:38:06.4425946Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4426074Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4426273Z arg0_1 = rand_strided((5, 5), (5, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4426471Z arg1_1 = rand_strided((5, 5), (5, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4426734Z arg2_1 = rand_strided((1, 1, 5, 5), (25, 1, 5, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4426856Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4426861Z 2023-01-11T21:38:06.4426932Z ok (0.088s) 2023-01-11T21:38:06.4427467Z test_inplace_mixed_dtype_ops_cuda (__main__.CudaTests) ... 
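The call() above (test_inplace_buffer_autotune) shows inductor's buffer reuse: the mm output buf0 is reinterpreted with as_strided (the "del buf0  # reuse" line) and then mutated in place by the fused add kernel instead of allocating a fresh output. A rough eager equivalent, with shapes taken from the repro script and hypothetical names:

import torch

a = torch.randn(5, 5, device="cuda")
b = torch.randn(5, 5, device="cuda")
c = torch.randn(1, 1, 5, 5, device="cuda")

out = torch.mm(a, b)        # aten.mm.out writes into buf0
out = out.view(1, 1, 5, 5)  # the as_strided reinterpretation of the same storage
out += c                    # triton_fused_add_0 mutates the reused buffer in place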
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.4427548Z warnings.warn( 2023-01-11T21:38:06.4427805Z [2023-01-11 21:35:02,477] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 621 2023-01-11T21:38:06.4428070Z [2023-01-11 21:35:02,550] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 621 2023-01-11T21:38:06.4428076Z 2023-01-11T21:38:06.4428178Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4428253Z import torch 2023-01-11T21:38:06.4428323Z import random 2023-01-11T21:38:06.4428445Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4428570Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4428576Z 2023-01-11T21:38:06.4428660Z aten = torch.ops.aten 2023-01-11T21:38:06.4428801Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4428897Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4428902Z 2023-01-11T21:38:06.4428978Z import triton 2023-01-11T21:38:06.4429072Z import triton.language as tl 2023-01-11T21:38:06.4429194Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4429333Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4429338Z 2023-01-11T21:38:06.4429343Z 2023-01-11T21:38:06.4429552Z triton_fused_add_add__convert_element_type_mul__0 = async_compile.triton(''' 2023-01-11T21:38:06.4429630Z import triton 2023-01-11T21:38:06.4429729Z import triton.language as tl 2023-01-11T21:38:06.4429845Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4429950Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4430084Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4430233Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4430239Z 2023-01-11T21:38:06.4430671Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4430747Z @triton.jit 2023-01-11T21:38:06.4430892Z def triton_(in_out_ptr0, in_ptr0, in_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4430968Z xnumel = 16 2023-01-11T21:38:06.4431067Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4431200Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4431287Z xmask = xindex < xnumel 2023-01-11T21:38:06.4431354Z x0 = xindex 2023-01-11T21:38:06.4431451Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4431645Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4431746Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.4431835Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.4431916Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.4432008Z tmp4 = tmp3.to(tl.float64) 2023-01-11T21:38:06.4432081Z tmp5 = tmp4 + tmp1 2023-01-11T21:38:06.4432168Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.4432254Z tmp7 = tmp6.to(tl.float64) 2023-01-11T21:38:06.4432333Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.4432421Z tmp10 = tmp9.to(tl.float32) 2023-01-11T21:38:06.4432563Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 
2023-01-11T21:38:06.4432649Z ''') 2023-01-11T21:38:06.4432655Z 2023-01-11T21:38:06.4432689Z 2023-01-11T21:38:06.4432779Z async_compile.wait(globals()) 2023-01-11T21:38:06.4432859Z del async_compile 2023-01-11T21:38:06.4432864Z 2023-01-11T21:38:06.4432943Z def call(args): 2023-01-11T21:38:06.4433024Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4433101Z args.clear() 2023-01-11T21:38:06.4433195Z with torch.cuda.device(0): 2023-01-11T21:38:06.4433402Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4433489Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.4433583Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4433761Z triton_fused_add_add__convert_element_type_mul__0.run(buf1, arg0_1, arg1_1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4433836Z del arg0_1 2023-01-11T21:38:06.4433911Z del arg1_1 2023-01-11T21:38:06.4433991Z return (buf1, ) 2023-01-11T21:38:06.4433996Z 2023-01-11T21:38:06.4434002Z 2023-01-11T21:38:06.4434086Z if __name__ == "__main__": 2023-01-11T21:38:06.4434208Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4434330Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4434533Z arg0_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4434731Z arg1_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.4434857Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4434862Z 2023-01-11T21:38:06.4434936Z ok (0.304s) 2023-01-11T21:38:06.4435275Z test_input_mutation1_cuda (__main__.CudaTests) ... [2023-01-11 21:35:02,754] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 622 2023-01-11T21:38:06.4435525Z [2023-01-11 21:35:02,765] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.4435783Z [2023-01-11 21:35:02,832] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to input mutation 2023-01-11T21:38:06.4436042Z [2023-01-11 21:35:02,832] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 622 2023-01-11T21:38:06.4436056Z 2023-01-11T21:38:06.4436150Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4436227Z import torch 2023-01-11T21:38:06.4436303Z import random 2023-01-11T21:38:06.4436425Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4436582Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4436588Z 2023-01-11T21:38:06.4436675Z aten = torch.ops.aten 2023-01-11T21:38:06.4436813Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4436903Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4436908Z 2023-01-11T21:38:06.4436985Z import triton 2023-01-11T21:38:06.4437080Z import triton.language as tl 2023-01-11T21:38:06.4437207Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4437349Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4437355Z 2023-01-11T21:38:06.4437363Z 2023-01-11T21:38:06.4437536Z triton_fused_add_copy__div_0 = async_compile.triton(''' 2023-01-11T21:38:06.4437614Z import triton 2023-01-11T21:38:06.4437708Z import triton.language as tl 2023-01-11T21:38:06.4437817Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4437920Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4438058Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4438190Z from torch._inductor.utils import 
instance_descriptor 2023-01-11T21:38:06.4438195Z 2023-01-11T21:38:06.4438638Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr1', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4438715Z @triton.jit 2023-01-11T21:38:06.4438858Z def triton_(in_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4438933Z xnumel = 64 2023-01-11T21:38:06.4439053Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4439184Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4439267Z xmask = xindex < xnumel 2023-01-11T21:38:06.4439339Z x0 = xindex 2023-01-11T21:38:06.4439532Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4439608Z tmp1 = 1 2023-01-11T21:38:06.4439692Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4439766Z tmp3 = tmp2 * tmp2 2023-01-11T21:38:06.4439838Z tmp4 = 2 2023-01-11T21:38:06.4439919Z tmp5 = tmp2 + tmp4 2023-01-11T21:38:06.4439997Z tmp6 = tmp3 / tmp5 2023-01-11T21:38:06.4440140Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4440274Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.4440362Z ''') 2023-01-11T21:38:06.4440367Z 2023-01-11T21:38:06.4440372Z 2023-01-11T21:38:06.4440460Z async_compile.wait(globals()) 2023-01-11T21:38:06.4440541Z del async_compile 2023-01-11T21:38:06.4440546Z 2023-01-11T21:38:06.4440624Z def call(args): 2023-01-11T21:38:06.4440701Z arg0_1, = args 2023-01-11T21:38:06.4440777Z args.clear() 2023-01-11T21:38:06.4440878Z with torch.cuda.device(0): 2023-01-11T21:38:06.4441083Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4441171Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4441330Z triton_fused_add_copy__div_0.run(arg0_1, arg0_1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4441404Z del arg0_1 2023-01-11T21:38:06.4441484Z return (buf2, ) 2023-01-11T21:38:06.4441489Z 2023-01-11T21:38:06.4441494Z 2023-01-11T21:38:06.4441577Z if __name__ == "__main__": 2023-01-11T21:38:06.4441697Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4441824Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4442031Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4442142Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4442154Z 2023-01-11T21:38:06.4442220Z ok (0.097s) 2023-01-11T21:38:06.4442579Z test_input_mutation2_cuda (__main__.CudaTests) ... 
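The test_input_mutation1_cuda kernel above fuses an input mutation with a dependent computation: tmp2 = tmp0 + 1 is stored back through out_ptr1 (which call() binds to arg0_1 itself), while tmp6 = tmp2 * tmp2 / (tmp2 + 2) goes to the fresh buffer buf2. A minimal sketch with hypothetical names:

import torch

def fn(x):
    x += 1                  # written back to the input buffer (out_ptr1 is arg0_1)
    return x * x / (x + 2)  # written to the new buffer buf2

x = torch.randn(64, device="cuda")
out = torch.compile(fn)(x)  # input mutation is why cudagraphs was skipped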
[2023-01-11 21:35:02,905] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 623 2023-01-11T21:38:06.4442845Z [2023-01-11 21:35:02,914] torch._inductor.ir: [WARNING] Using FallbackKernel: torch.ops.aten.expand_copy.default 2023-01-11T21:38:06.4443107Z [2023-01-11 21:35:03,070] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 623 2023-01-11T21:38:06.4443113Z 2023-01-11T21:38:06.4443212Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4443289Z import torch 2023-01-11T21:38:06.4443367Z import random 2023-01-11T21:38:06.4443487Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4443607Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4443614Z 2023-01-11T21:38:06.4443698Z aten = torch.ops.aten 2023-01-11T21:38:06.4443838Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4443938Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4443944Z 2023-01-11T21:38:06.4444021Z import triton 2023-01-11T21:38:06.4444116Z import triton.language as tl 2023-01-11T21:38:06.4444244Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4444380Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4444391Z 2023-01-11T21:38:06.4444395Z 2023-01-11T21:38:06.4444544Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.4444621Z import triton 2023-01-11T21:38:06.4444718Z import triton.language as tl 2023-01-11T21:38:06.4444834Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4444937Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4445072Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4445201Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4445240Z 2023-01-11T21:38:06.4445671Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4445770Z @triton.jit 2023-01-11T21:38:06.4445904Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4445981Z xnumel = 64 2023-01-11T21:38:06.4446081Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4446210Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4446294Z xmask = xindex < xnumel 2023-01-11T21:38:06.4446367Z x0 = xindex 2023-01-11T21:38:06.4446458Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4446532Z tmp1 = 1 2023-01-11T21:38:06.4446614Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4446754Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4446845Z ''') 2023-01-11T21:38:06.4446851Z 2023-01-11T21:38:06.4446855Z 2023-01-11T21:38:06.4447031Z triton_fused_lift_fresh_copy_1 = async_compile.triton(''' 2023-01-11T21:38:06.4447108Z import triton 2023-01-11T21:38:06.4447196Z import triton.language as tl 2023-01-11T21:38:06.4447315Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4447420Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4447552Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4447679Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4447685Z 2023-01-11T21:38:06.4448069Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 
'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.4448145Z @triton.jit 2023-01-11T21:38:06.4448268Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4448339Z xnumel = 1 2023-01-11T21:38:06.4448436Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4448567Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4448653Z xmask = xindex < xnumel 2023-01-11T21:38:06.4448727Z tmp0 = 66.0 2023-01-11T21:38:06.4448892Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.4448981Z ''') 2023-01-11T21:38:06.4448986Z 2023-01-11T21:38:06.4448991Z 2023-01-11T21:38:06.4449151Z triton_fused_add_1_2 = async_compile.triton(''' 2023-01-11T21:38:06.4449222Z import triton 2023-01-11T21:38:06.4449321Z import triton.language as tl 2023-01-11T21:38:06.4449436Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4449538Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4449673Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4449799Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4449808Z 2023-01-11T21:38:06.4450210Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4450280Z @triton.jit 2023-01-11T21:38:06.4450418Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4450496Z xnumel = 64 2023-01-11T21:38:06.4450594Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4450724Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4450811Z xmask = xindex < xnumel 2023-01-11T21:38:06.4450884Z x0 = xindex 2023-01-11T21:38:06.4451070Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4451144Z tmp1 = 2 2023-01-11T21:38:06.4451222Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4451355Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4451470Z ''') 2023-01-11T21:38:06.4451476Z 2023-01-11T21:38:06.4451480Z 2023-01-11T21:38:06.4451576Z async_compile.wait(globals()) 2023-01-11T21:38:06.4451656Z del async_compile 2023-01-11T21:38:06.4451661Z 2023-01-11T21:38:06.4451736Z def call(args): 2023-01-11T21:38:06.4451811Z primals_1, = args 2023-01-11T21:38:06.4451889Z args.clear() 2023-01-11T21:38:06.4451985Z with torch.cuda.device(0): 2023-01-11T21:38:06.4452191Z buf0 = empty_strided((1, 64), (64, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4452285Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4452425Z triton_fused_add_0.run(primals_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4452505Z del primals_1 2023-01-11T21:38:06.4452696Z buf1 = empty_strided((1, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4452842Z triton_fused_lift_fresh_copy_1.run(buf1, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.4452980Z buf2 = torch.ops.aten.expand_copy.default(buf1, [64]) 2023-01-11T21:38:06.4453059Z del buf1 2023-01-11T21:38:06.4453135Z buf3 = buf2 2023-01-11T21:38:06.4453241Z assert_size_stride(buf3, (64, ), (1, )) 2023-01-11T21:38:06.4453310Z del buf2 2023-01-11T21:38:06.4453508Z buf4 = empty_strided((1, 64), (64, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4453643Z 
triton_fused_add_1_2.run(buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4453757Z return (as_strided(buf3, (1, 64), (64, 1)), buf0, buf4, ) 2023-01-11T21:38:06.4453762Z 2023-01-11T21:38:06.4453766Z 2023-01-11T21:38:06.4453847Z if __name__ == "__main__": 2023-01-11T21:38:06.4453965Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4454095Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4454306Z primals_1 = rand_strided((1, 64), (64, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4454426Z print_performance(lambda: call([primals_1])) 2023-01-11T21:38:06.4454435Z 2023-01-11T21:38:06.4454629Z ok (0.372s) 2023-01-11T21:38:06.4454954Z test_input_mutation3_cuda (__main__.CudaTests) ... [2023-01-11 21:35:03,240] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 624 2023-01-11T21:38:06.4455244Z [2023-01-11 21:35:03,351] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to input mutation 2023-01-11T21:38:06.4455514Z [2023-01-11 21:35:03,351] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 624 2023-01-11T21:38:06.4455520Z 2023-01-11T21:38:06.4455617Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4455692Z import torch 2023-01-11T21:38:06.4455770Z import random 2023-01-11T21:38:06.4455891Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4456015Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4456020Z 2023-01-11T21:38:06.4456095Z aten = torch.ops.aten 2023-01-11T21:38:06.4456237Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4456333Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4456339Z 2023-01-11T21:38:06.4456413Z import triton 2023-01-11T21:38:06.4456503Z import triton.language as tl 2023-01-11T21:38:06.4456628Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4456769Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4456775Z 2023-01-11T21:38:06.4456779Z 2023-01-11T21:38:06.4456934Z triton_fused_add__0 = async_compile.triton(''' 2023-01-11T21:38:06.4457001Z import triton 2023-01-11T21:38:06.4457092Z import triton.language as tl 2023-01-11T21:38:06.4457264Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4457368Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4457502Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4457629Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4457634Z 2023-01-11T21:38:06.4458180Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4458262Z @triton.jit 2023-01-11T21:38:06.4458395Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4458472Z xnumel = 64 2023-01-11T21:38:06.4458571Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4458702Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4458788Z xmask = xindex < xnumel 2023-01-11T21:38:06.4458860Z x0 = xindex 2023-01-11T21:38:06.4459054Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4459121Z tmp1 = 1 2023-01-11T21:38:06.4459205Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4459343Z tl.store(out_ptr0 + 
(x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4459433Z ''') 2023-01-11T21:38:06.4459439Z 2023-01-11T21:38:06.4459443Z 2023-01-11T21:38:06.4459601Z triton_fused_mul__1 = async_compile.triton(''' 2023-01-11T21:38:06.4459678Z import triton 2023-01-11T21:38:06.4459774Z import triton.language as tl 2023-01-11T21:38:06.4459884Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4459991Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4460125Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4460256Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4460261Z 2023-01-11T21:38:06.4460678Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4460751Z @triton.jit 2023-01-11T21:38:06.4460886Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4460963Z xnumel = 64 2023-01-11T21:38:06.4461054Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4461186Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4461271Z xmask = xindex < xnumel 2023-01-11T21:38:06.4461343Z x0 = xindex 2023-01-11T21:38:06.4461561Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4461635Z tmp1 = 2 2023-01-11T21:38:06.4461716Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.4461844Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4461930Z ''') 2023-01-11T21:38:06.4461936Z 2023-01-11T21:38:06.4461940Z 2023-01-11T21:38:06.4462101Z triton_fused_sigmoid__2 = async_compile.triton(''' 2023-01-11T21:38:06.4462181Z import triton 2023-01-11T21:38:06.4462274Z import triton.language as tl 2023-01-11T21:38:06.4462391Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4462494Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4462629Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4462749Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4462754Z 2023-01-11T21:38:06.4463181Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4463257Z @triton.jit 2023-01-11T21:38:06.4463392Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4463467Z xnumel = 64 2023-01-11T21:38:06.4463566Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4463697Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4463782Z xmask = xindex < xnumel 2023-01-11T21:38:06.4463848Z x0 = xindex 2023-01-11T21:38:06.4464039Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4464160Z tmp1 = tl.sigmoid(tmp0) 2023-01-11T21:38:06.4464295Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4464381Z ''') 2023-01-11T21:38:06.4464386Z 2023-01-11T21:38:06.4464391Z 2023-01-11T21:38:06.4464558Z triton_fused_add__1_view_3 = async_compile.triton(''' 2023-01-11T21:38:06.4464638Z import triton 2023-01-11T21:38:06.4464726Z import triton.language as tl 2023-01-11T21:38:06.4464842Z from torch._inductor.ir 
import ReductionHint 2023-01-11T21:38:06.4464944Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4465076Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4465203Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4465208Z 2023-01-11T21:38:06.4465627Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4465705Z @triton.jit 2023-01-11T21:38:06.4465840Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4465912Z xnumel = 64 2023-01-11T21:38:06.4466004Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4466136Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4466217Z xmask = xindex < xnumel 2023-01-11T21:38:06.4466290Z x0 = xindex 2023-01-11T21:38:06.4466480Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4466552Z tmp1 = 3 2023-01-11T21:38:06.4466626Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4466760Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4466845Z ''') 2023-01-11T21:38:06.4466852Z 2023-01-11T21:38:06.4466856Z 2023-01-11T21:38:06.4467024Z triton_fused_mul__1_view_4 = async_compile.triton(''' 2023-01-11T21:38:06.4467103Z import triton 2023-01-11T21:38:06.4467200Z import triton.language as tl 2023-01-11T21:38:06.4467317Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4467420Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4467545Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4467676Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4467707Z 2023-01-11T21:38:06.4468134Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4468209Z @triton.jit 2023-01-11T21:38:06.4468345Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4468420Z xnumel = 64 2023-01-11T21:38:06.4468519Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4468651Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4468734Z xmask = xindex < xnumel 2023-01-11T21:38:06.4468808Z x0 = xindex 2023-01-11T21:38:06.4469000Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4469073Z tmp1 = 4 2023-01-11T21:38:06.4469155Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.4469291Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4469375Z ''') 2023-01-11T21:38:06.4469380Z 2023-01-11T21:38:06.4469385Z 2023-01-11T21:38:06.4469551Z triton_fused_relu__view_5 = async_compile.triton(''' 2023-01-11T21:38:06.4469621Z import triton 2023-01-11T21:38:06.4469716Z import triton.language as tl 2023-01-11T21:38:06.4469832Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4469936Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4470069Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4470195Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4470201Z 2023-01-11T21:38:06.4470664Z 
@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4470738Z @triton.jit 2023-01-11T21:38:06.4470869Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4470944Z xnumel = 64 2023-01-11T21:38:06.4471043Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4471172Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4471256Z xmask = xindex < xnumel 2023-01-11T21:38:06.4471328Z x0 = xindex 2023-01-11T21:38:06.4471518Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4471630Z tmp1 = tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) 2023-01-11T21:38:06.4471763Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4471854Z ''') 2023-01-11T21:38:06.4471859Z 2023-01-11T21:38:06.4471863Z 2023-01-11T21:38:06.4471957Z async_compile.wait(globals()) 2023-01-11T21:38:06.4472044Z del async_compile 2023-01-11T21:38:06.4472049Z 2023-01-11T21:38:06.4472128Z def call(args): 2023-01-11T21:38:06.4472204Z arg0_1, = args 2023-01-11T21:38:06.4472274Z args.clear() 2023-01-11T21:38:06.4472373Z with torch.cuda.device(0): 2023-01-11T21:38:06.4472468Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4472608Z triton_fused_add__0.run(arg0_1, arg0_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4472740Z triton_fused_mul__1.run(arg0_1, arg0_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4472884Z triton_fused_sigmoid__2.run(arg0_1, arg0_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4473032Z triton_fused_add__1_view_3.run(arg0_1, arg0_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4473174Z triton_fused_mul__1_view_4.run(arg0_1, arg0_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4473313Z triton_fused_relu__view_5.run(arg0_1, arg0_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4473421Z return (as_strided(arg0_1, (64, ), (1, )), ) 2023-01-11T21:38:06.4473427Z 2023-01-11T21:38:06.4473432Z 2023-01-11T21:38:06.4473518Z if __name__ == "__main__": 2023-01-11T21:38:06.4473662Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4473793Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4473999Z arg0_1 = rand_strided((1, 64), (64, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4474112Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4474117Z 2023-01-11T21:38:06.4474192Z ok (0.146s) 2023-01-11T21:38:06.4474517Z test_input_mutation4_cuda (__main__.CudaTests) ... 
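test_input_mutation3_cuda above chains six pointwise kernels that all read and write the same buffer arg0_1, then returns a flat view of it. Reconstructed from the kernel constants and names (add_ 1, mul_ 2, sigmoid_, then add_ 3, mul_ 4, and relu_ through a view), a sketch with hypothetical names:

import torch

def fn(x):
    x += 1
    x *= 2
    x.sigmoid_()
    y = x.view(64)
    y += 3
    y *= 4
    return y.relu_()

x = torch.randn(1, 64, device="cuda")
out = torch.compile(fn)(x)  # returns a (64,) view of the mutated input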
[2023-01-11 21:35:03,361] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 625 2023-01-11T21:38:06.4474767Z [2023-01-11 21:35:03,370] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to input mutation 2023-01-11T21:38:06.4475034Z [2023-01-11 21:35:03,370] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 625 2023-01-11T21:38:06.4475040Z 2023-01-11T21:38:06.4475143Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4475219Z import torch 2023-01-11T21:38:06.4475299Z import random 2023-01-11T21:38:06.4475423Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4475549Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4475554Z 2023-01-11T21:38:06.4475639Z aten = torch.ops.aten 2023-01-11T21:38:06.4475772Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4475868Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4475873Z 2023-01-11T21:38:06.4475952Z import triton 2023-01-11T21:38:06.4476044Z import triton.language as tl 2023-01-11T21:38:06.4476173Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4476395Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4476400Z 2023-01-11T21:38:06.4476405Z 2023-01-11T21:38:06.4476562Z triton_fused_relu__0 = async_compile.triton(''' 2023-01-11T21:38:06.4476633Z import triton 2023-01-11T21:38:06.4476728Z import triton.language as tl 2023-01-11T21:38:06.4476848Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4476954Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4477089Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4477215Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4477220Z 2023-01-11T21:38:06.4477641Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4477717Z @triton.jit 2023-01-11T21:38:06.4477845Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4477926Z xnumel = 64 2023-01-11T21:38:06.4478025Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4478156Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4478243Z xmask = xindex < xnumel 2023-01-11T21:38:06.4478315Z x0 = xindex 2023-01-11T21:38:06.4478511Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4478630Z tmp1 = tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) 2023-01-11T21:38:06.4478761Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4478847Z ''') 2023-01-11T21:38:06.4478852Z 2023-01-11T21:38:06.4478857Z 2023-01-11T21:38:06.4478953Z async_compile.wait(globals()) 2023-01-11T21:38:06.4479034Z del async_compile 2023-01-11T21:38:06.4479039Z 2023-01-11T21:38:06.4479114Z def call(args): 2023-01-11T21:38:06.4479189Z arg0_1, = args 2023-01-11T21:38:06.4479266Z args.clear() 2023-01-11T21:38:06.4479356Z with torch.cuda.device(0): 2023-01-11T21:38:06.4479452Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4479593Z triton_fused_relu__0.run(arg0_1, arg0_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4479675Z return (arg0_1, ) 2023-01-11T21:38:06.4479680Z 2023-01-11T21:38:06.4479684Z 2023-01-11T21:38:06.4479792Z if 
__name__ == "__main__": 2023-01-11T21:38:06.4479916Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4480042Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4480247Z arg0_1 = rand_strided((1, 64), (64, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4480354Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4480359Z 2023-01-11T21:38:06.4480431Z ok (0.019s) 2023-01-11T21:38:06.4480900Z test_invalid_operand_issue1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4481039Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4481302Z [2023-01-11 21:35:03,715] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 626 2023-01-11T21:38:06.4481566Z [2023-01-11 21:35:03,834] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 626 2023-01-11T21:38:06.4481985Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4482142Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4482399Z [2023-01-11 21:35:03,950] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 627 2023-01-11T21:38:06.4482665Z [2023-01-11 21:35:04,066] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 627 2023-01-11T21:38:06.4482671Z 2023-01-11T21:38:06.4482771Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4482841Z import torch 2023-01-11T21:38:06.4482918Z import random 2023-01-11T21:38:06.4483039Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4483164Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4483169Z 2023-01-11T21:38:06.4483252Z aten = torch.ops.aten 2023-01-11T21:38:06.4483391Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4483489Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4483495Z 2023-01-11T21:38:06.4483569Z import triton 2023-01-11T21:38:06.4483660Z import triton.language as tl 2023-01-11T21:38:06.4483784Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4483926Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4483932Z 2023-01-11T21:38:06.4483936Z 2023-01-11T21:38:06.4484105Z triton_fused_embedding_0 = async_compile.triton(''' 2023-01-11T21:38:06.4484180Z import triton 2023-01-11T21:38:06.4484274Z import triton.language as tl 2023-01-11T21:38:06.4484390Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4484487Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4484620Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4484747Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4484752Z 2023-01-11T21:38:06.4485199Z @pointwise(size_hints=[1048576], 
filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*i64', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.4485275Z @triton.jit 2023-01-11T21:38:06.4485433Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4485512Z xnumel = 786432 2023-01-11T21:38:06.4485638Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4485765Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4485850Z xmask = xindex < xnumel 2023-01-11T21:38:06.4485935Z x1 = (xindex // 768) % 128 2023-01-11T21:38:06.4486018Z x2 = (xindex // 98304) 2023-01-11T21:38:06.4486095Z x3 = (xindex // 768) 2023-01-11T21:38:06.4486174Z x0 = xindex % 768 2023-01-11T21:38:06.4486246Z x4 = xindex 2023-01-11T21:38:06.4486340Z tmp3 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.4486438Z tmp8 = tl.load(in_ptr2 + (x3), xmask) 2023-01-11T21:38:06.4486510Z tmp0 = x1 2023-01-11T21:38:06.4486587Z tmp1 = 0 2023-01-11T21:38:06.4486668Z tmp2 = tmp0 == tmp1 2023-01-11T21:38:06.4486741Z tmp4 = 1 2023-01-11T21:38:06.4486821Z tmp5 = tmp0 >= tmp4 2023-01-11T21:38:06.4487051Z tmp6 = tl.load(in_ptr1 + ((-1) + x1 + (127*x2) + tl.zeros([XBLOCK], tl.int32)), tmp5 & xmask, other=0) 2023-01-11T21:38:06.4487149Z tmp7 = tl.where(tmp5, tmp6, 0) 2023-01-11T21:38:06.4487248Z tmp9 = tl.where(tmp5, tmp7, tmp8) 2023-01-11T21:38:06.4487346Z tmp10 = tl.where(tmp2, tmp3, tmp9) 2023-01-11T21:38:06.4487463Z tmp11 = tl.load(in_ptr3 + (x0 + (768*tmp10)), xmask) 2023-01-11T21:38:06.4487600Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.4487687Z ''') 2023-01-11T21:38:06.4487692Z 2023-01-11T21:38:06.4487698Z 2023-01-11T21:38:06.4487792Z async_compile.wait(globals()) 2023-01-11T21:38:06.4487865Z del async_compile 2023-01-11T21:38:06.4487870Z 2023-01-11T21:38:06.4487945Z def call(args): 2023-01-11T21:38:06.4488051Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1 = args 2023-01-11T21:38:06.4488158Z args.clear() 2023-01-11T21:38:06.4488254Z with torch.cuda.device(0): 2023-01-11T21:38:06.4488481Z buf0 = empty_strided((8, 128, 768), (98304, 768, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4488577Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4488742Z triton_fused_embedding_0.run(arg3_1, arg2_1, arg4_1, arg0_1, buf0, 786432, grid=grid(786432), stream=stream0) 2023-01-11T21:38:06.4488816Z del arg0_1 2023-01-11T21:38:06.4488890Z del arg2_1 2023-01-11T21:38:06.4488963Z del arg3_1 2023-01-11T21:38:06.4489037Z del arg4_1 2023-01-11T21:38:06.4489115Z return (buf0, ) 2023-01-11T21:38:06.4489120Z 2023-01-11T21:38:06.4489124Z 2023-01-11T21:38:06.4489206Z if __name__ == "__main__": 2023-01-11T21:38:06.4489320Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4489449Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4489663Z arg0_1 = rand_strided((50005, 768), (768, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4489870Z arg1_1 = rand_strided((8, 128), (128, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4490071Z arg2_1 = rand_strided((8, 127), (127, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4490270Z arg3_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4490468Z arg4_1 = rand_strided((8, 128), (128, 1), device='cuda:0', dtype=torch.int64) 
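# Annotation (inferred from the generated kernel, not from the test source):
# arg0_1 is a (50005, 768) embedding table and arg2_1/arg3_1/arg4_1 hold int64
# token indices. For each output position the kernel builds one index: position
# 0 of a sequence takes it from arg3_1, positions >= 1 take the previous token
# from arg2_1 (a decoder-style right shift, the masked tl.load with other=0),
# and it then gathers the 768-wide row from arg0_1.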
2023-01-11T21:38:06.4490612Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1])) 2023-01-11T21:38:06.4490617Z 2023-01-11T21:38:06.4490622Z 2023-01-11T21:38:06.4490722Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4490792Z import torch 2023-01-11T21:38:06.4490867Z import random 2023-01-11T21:38:06.4490987Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4491110Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4491119Z 2023-01-11T21:38:06.4491204Z aten = torch.ops.aten 2023-01-11T21:38:06.4491340Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4491437Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4491442Z 2023-01-11T21:38:06.4491512Z import triton 2023-01-11T21:38:06.4491606Z import triton.language as tl 2023-01-11T21:38:06.4491758Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4491900Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4491905Z 2023-01-11T21:38:06.4491910Z 2023-01-11T21:38:06.4492076Z triton_fused_embedding_0 = async_compile.triton(''' 2023-01-11T21:38:06.4492154Z import triton 2023-01-11T21:38:06.4492246Z import triton.language as tl 2023-01-11T21:38:06.4492360Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4492457Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4492589Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4492714Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4492722Z 2023-01-11T21:38:06.4493175Z @pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*i64', 3: '*fp16', 4: '*fp16', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.4493251Z @triton.jit 2023-01-11T21:38:06.4493412Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4493490Z xnumel = 786432 2023-01-11T21:38:06.4493590Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4493715Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4493801Z xmask = xindex < xnumel 2023-01-11T21:38:06.4493887Z x1 = (xindex // 768) % 128 2023-01-11T21:38:06.4493966Z x2 = (xindex // 98304) 2023-01-11T21:38:06.4494047Z x3 = (xindex // 768) 2023-01-11T21:38:06.4494124Z x0 = xindex % 768 2023-01-11T21:38:06.4494224Z x4 = xindex 2023-01-11T21:38:06.4494318Z tmp3 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.4494415Z tmp8 = tl.load(in_ptr2 + (x3), xmask) 2023-01-11T21:38:06.4494599Z tmp0 = x1 2023-01-11T21:38:06.4494674Z tmp1 = 0 2023-01-11T21:38:06.4494755Z tmp2 = tmp0 == tmp1 2023-01-11T21:38:06.4494827Z tmp4 = 1 2023-01-11T21:38:06.4494908Z tmp5 = tmp0 >= tmp4 2023-01-11T21:38:06.4495145Z tmp6 = tl.load(in_ptr1 + ((-1) + x1 + (127*x2) + tl.zeros([XBLOCK], tl.int32)), tmp5 & xmask, other=0) 2023-01-11T21:38:06.4495244Z tmp7 = tl.where(tmp5, tmp6, 0) 2023-01-11T21:38:06.4495341Z tmp9 = tl.where(tmp5, tmp7, tmp8) 2023-01-11T21:38:06.4495438Z tmp10 = tl.where(tmp2, tmp3, tmp9) 2023-01-11T21:38:06.4495571Z tmp11 = tl.load(in_ptr3 + (x0 + (768*tmp10)), xmask).to(tl.float32) 2023-01-11T21:38:06.4495711Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.4495808Z ''') 2023-01-11T21:38:06.4495817Z 2023-01-11T21:38:06.4495821Z 2023-01-11T21:38:06.4495910Z 
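# Note: this module is the same embedding graph recompiled for float16 inputs;
# the kernel above matches the float32 version except for the *fp16 pointers in
# its signature and the trailing .to(tl.float32) on the gathered row.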
async_compile.wait(globals()) 2023-01-11T21:38:06.4495990Z del async_compile 2023-01-11T21:38:06.4495995Z 2023-01-11T21:38:06.4496070Z def call(args): 2023-01-11T21:38:06.4496175Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1 = args 2023-01-11T21:38:06.4496252Z args.clear() 2023-01-11T21:38:06.4496350Z with torch.cuda.device(0): 2023-01-11T21:38:06.4496572Z buf0 = empty_strided((8, 128, 768), (98304, 768, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4496665Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4496827Z triton_fused_embedding_0.run(arg3_1, arg2_1, arg4_1, arg0_1, buf0, 786432, grid=grid(786432), stream=stream0) 2023-01-11T21:38:06.4496902Z del arg0_1 2023-01-11T21:38:06.4496975Z del arg2_1 2023-01-11T21:38:06.4497048Z del arg3_1 2023-01-11T21:38:06.4497121Z del arg4_1 2023-01-11T21:38:06.4497259Z return (buf0, ) 2023-01-11T21:38:06.4497268Z 2023-01-11T21:38:06.4497272Z 2023-01-11T21:38:06.4497355Z if __name__ == "__main__": 2023-01-11T21:38:06.4497469Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4497596Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4497868Z arg0_1 = rand_strided((50005, 768), (768, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4498075Z arg1_1 = rand_strided((8, 128), (128, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4498276Z arg2_1 = rand_strided((8, 127), (127, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4498473Z arg3_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4498671Z arg4_1 = rand_strided((8, 128), (128, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4498814Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1])) 2023-01-11T21:38:06.4498819Z 2023-01-11T21:38:06.4498887Z ok (0.719s) 2023-01-11T21:38:06.4499348Z test_isinf2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4499485Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4499744Z [2023-01-11 21:35:04,104] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 628 2023-01-11T21:38:06.4500009Z [2023-01-11 21:35:04,176] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 628 2023-01-11T21:38:06.4500427Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4500591Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4500852Z [2023-01-11 21:35:04,193] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 629 2023-01-11T21:38:06.4501112Z [2023-01-11 21:35:04,267] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 629 2023-01-11T21:38:06.4501118Z 2023-01-11T21:38:06.4501215Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4501287Z import torch 2023-01-11T21:38:06.4501354Z import random 2023-01-11T21:38:06.4501472Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4501596Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4501601Z 2023-01-11T21:38:06.4501685Z aten = torch.ops.aten 2023-01-11T21:38:06.4501821Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4501920Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4501925Z 2023-01-11T21:38:06.4501999Z import triton 2023-01-11T21:38:06.4502090Z import triton.language as tl 2023-01-11T21:38:06.4502207Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4502346Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4502352Z 2023-01-11T21:38:06.4502356Z 2023-01-11T21:38:06.4502507Z triton_fused_eq_0 = async_compile.triton(''' 2023-01-11T21:38:06.4502580Z import triton 2023-01-11T21:38:06.4502672Z import triton.language as tl 2023-01-11T21:38:06.4502785Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4502886Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4503010Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4503134Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4503140Z 2023-01-11T21:38:06.4503536Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i1', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4503614Z @triton.jit 2023-01-11T21:38:06.4503746Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4503846Z xnumel = 5 2023-01-11T21:38:06.4503946Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4504076Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4504152Z xmask = xindex < xnumel 2023-01-11T21:38:06.4504224Z x0 = xindex 2023-01-11T21:38:06.4504321Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4504393Z tmp1 = x0 2023-01-11T21:38:06.4504461Z tmp2 = 2 2023-01-11T21:38:06.4504540Z tmp3 = tmp1 < tmp2 2023-01-11T21:38:06.4504609Z tmp4 = 1 2023-01-11T21:38:06.4504680Z tmp5 = tmp1 < tmp4 2023-01-11T21:38:06.4504750Z tmp6 = 1.0 2023-01-11T21:38:06.4504827Z tmp7 = float("inf") 2023-01-11T21:38:06.4504926Z tmp8 = tl.where(tmp5, tmp6, tmp7) 2023-01-11T21:38:06.4504994Z tmp9 = 3 2023-01-11T21:38:06.4505073Z tmp10 = tmp1 < tmp9 2023-01-11T21:38:06.4505138Z tmp11 = 2.0 2023-01-11T21:38:06.4505208Z tmp12 = 4 2023-01-11T21:38:06.4505286Z tmp13 = tmp1 < tmp12 2023-01-11T21:38:06.4505407Z tmp14 = float("-inf") 2023-01-11T21:38:06.4505491Z tmp15 = float("nan") 2023-01-11T21:38:06.4505607Z tmp16 = tl.where(tmp13, tmp14, tmp15) 2023-01-11T21:38:06.4505714Z tmp17 = tl.where(tmp10, tmp11, tmp16) 2023-01-11T21:38:06.4505824Z tmp18 = tl.where(tmp3, tmp8, 
tmp17) 2023-01-11T21:38:06.4505905Z tmp19 = tmp0 == tmp18 2023-01-11T21:38:06.4506041Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp19, xmask) 2023-01-11T21:38:06.4506124Z ''') 2023-01-11T21:38:06.4506130Z 2023-01-11T21:38:06.4506134Z 2023-01-11T21:38:06.4506227Z async_compile.wait(globals()) 2023-01-11T21:38:06.4506302Z del async_compile 2023-01-11T21:38:06.4506338Z 2023-01-11T21:38:06.4506413Z def call(args): 2023-01-11T21:38:06.4506487Z arg0_1, = args 2023-01-11T21:38:06.4506554Z args.clear() 2023-01-11T21:38:06.4506649Z with torch.cuda.device(0): 2023-01-11T21:38:06.4506840Z buf0 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4506931Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4507065Z triton_fused_eq_0.run(arg0_1, buf0, 5, grid=grid(5), stream=stream0) 2023-01-11T21:38:06.4507138Z del arg0_1 2023-01-11T21:38:06.4507214Z return (buf0, ) 2023-01-11T21:38:06.4507219Z 2023-01-11T21:38:06.4507223Z 2023-01-11T21:38:06.4507296Z if __name__ == "__main__": 2023-01-11T21:38:06.4507412Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4507539Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4507736Z arg0_1 = rand_strided((5, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4507848Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4507856Z 2023-01-11T21:38:06.4507860Z 2023-01-11T21:38:06.4507957Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4508029Z import torch 2023-01-11T21:38:06.4508102Z import random 2023-01-11T21:38:06.4508214Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4508340Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4508347Z 2023-01-11T21:38:06.4508427Z aten = torch.ops.aten 2023-01-11T21:38:06.4508563Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4508657Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4508662Z 2023-01-11T21:38:06.4508735Z import triton 2023-01-11T21:38:06.4508829Z import triton.language as tl 2023-01-11T21:38:06.4508947Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4509086Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4509091Z 2023-01-11T21:38:06.4509096Z 2023-01-11T21:38:06.4509247Z triton_fused_eq_0 = async_compile.triton(''' 2023-01-11T21:38:06.4509322Z import triton 2023-01-11T21:38:06.4509413Z import triton.language as tl 2023-01-11T21:38:06.4509526Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4509626Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4509786Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4509905Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4509917Z 2023-01-11T21:38:06.4510307Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*i1', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4510380Z @triton.jit 2023-01-11T21:38:06.4510515Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4510588Z xnumel = 5 2023-01-11T21:38:06.4510685Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4510813Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4510899Z xmask = xindex < xnumel 2023-01-11T21:38:06.4510962Z x0 = xindex 2023-01-11T21:38:06.4511080Z 
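    # fp16 variant of the same kernel: the loads below upcast to fp32 before
    # comparing. The tl.where chain keyed on x0 materializes the constant
    # vector [1.0, inf, 2.0, -inf, nan], so tmp19 is the elementwise test
    #   x == tensor([1.0, inf, 2.0, -inf, nan])
    # with the nan lane always False (nan != nan).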
tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4511149Z tmp1 = x0 2023-01-11T21:38:06.4511218Z tmp2 = 2 2023-01-11T21:38:06.4511302Z tmp3 = tmp1 < tmp2 2023-01-11T21:38:06.4511372Z tmp4 = 1 2023-01-11T21:38:06.4511451Z tmp5 = tmp1 < tmp4 2023-01-11T21:38:06.4511515Z tmp6 = 1.0 2023-01-11T21:38:06.4511590Z tmp7 = float("inf") 2023-01-11T21:38:06.4511685Z tmp8 = tl.where(tmp5, tmp6, tmp7) 2023-01-11T21:38:06.4511753Z tmp9 = 3 2023-01-11T21:38:06.4511832Z tmp10 = tmp1 < tmp9 2023-01-11T21:38:06.4511903Z tmp11 = 2.0 2023-01-11T21:38:06.4511967Z tmp12 = 4 2023-01-11T21:38:06.4512050Z tmp13 = tmp1 < tmp12 2023-01-11T21:38:06.4512164Z tmp14 = float("-inf") 2023-01-11T21:38:06.4512246Z tmp15 = float("nan") 2023-01-11T21:38:06.4512347Z tmp16 = tl.where(tmp13, tmp14, tmp15) 2023-01-11T21:38:06.4512482Z tmp17 = tl.where(tmp10, tmp11, tmp16) 2023-01-11T21:38:06.4512579Z tmp18 = tl.where(tmp3, tmp8, tmp17) 2023-01-11T21:38:06.4512652Z tmp19 = tmp0 == tmp18 2023-01-11T21:38:06.4512788Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp19, xmask) 2023-01-11T21:38:06.4512876Z ''') 2023-01-11T21:38:06.4512881Z 2023-01-11T21:38:06.4512886Z 2023-01-11T21:38:06.4512980Z async_compile.wait(globals()) 2023-01-11T21:38:06.4513054Z del async_compile 2023-01-11T21:38:06.4513059Z 2023-01-11T21:38:06.4513136Z def call(args): 2023-01-11T21:38:06.4513209Z arg0_1, = args 2023-01-11T21:38:06.4513284Z args.clear() 2023-01-11T21:38:06.4513369Z with torch.cuda.device(0): 2023-01-11T21:38:06.4513555Z buf0 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4513643Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4513776Z triton_fused_eq_0.run(arg0_1, buf0, 5, grid=grid(5), stream=stream0) 2023-01-11T21:38:06.4513850Z del arg0_1 2023-01-11T21:38:06.4513926Z return (buf0, ) 2023-01-11T21:38:06.4513931Z 2023-01-11T21:38:06.4513935Z 2023-01-11T21:38:06.4514014Z if __name__ == "__main__": 2023-01-11T21:38:06.4514124Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4514251Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4514446Z arg0_1 = rand_strided((5, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4514557Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4514562Z 2023-01-11T21:38:06.4514632Z ok (0.179s) 2023-01-11T21:38:06.4515085Z test_isinf_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4515220Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4515480Z [2023-01-11 21:35:04,282] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 630 2023-01-11T21:38:06.4515767Z [2023-01-11 21:35:04,420] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 630 2023-01-11T21:38:06.4516186Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4516315Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4516562Z [2023-01-11 21:35:04,434] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 631 2023-01-11T21:38:06.4516828Z [2023-01-11 21:35:04,573] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 631 2023-01-11T21:38:06.4517239Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4517368Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4517623Z [2023-01-11 21:35:04,589] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 632 2023-01-11T21:38:06.4517882Z [2023-01-11 21:35:04,730] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 632 2023-01-11T21:38:06.4518294Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4518451Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4518700Z [2023-01-11 21:35:04,745] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 633 2023-01-11T21:38:06.4518705Z 2023-01-11T21:38:06.4518802Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4518881Z import torch 2023-01-11T21:38:06.4518948Z import random 2023-01-11T21:38:06.4519065Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4519190Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4519195Z 2023-01-11T21:38:06.4519277Z aten = torch.ops.aten 2023-01-11T21:38:06.4519411Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4519509Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4519514Z 2023-01-11T21:38:06.4519588Z import triton 2023-01-11T21:38:06.4519672Z import triton.language as tl 2023-01-11T21:38:06.4519795Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4519936Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4519942Z 2023-01-11T21:38:06.4519946Z 2023-01-11T21:38:06.4520114Z triton_fused_isinf_isnan_0 = async_compile.triton(''' 2023-01-11T21:38:06.4520186Z import triton 2023-01-11T21:38:06.4520275Z import triton.language as tl 2023-01-11T21:38:06.4520386Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4520485Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4520611Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4520738Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4520743Z 2023-01-11T21:38:06.4521152Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i1', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4521226Z @triton.jit 2023-01-11T21:38:06.4521392Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4521465Z xnumel = 5 2023-01-11T21:38:06.4521558Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4521684Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4521760Z xmask = xindex < xnumel 2023-01-11T21:38:06.4521829Z x0 = xindex 2023-01-11T21:38:06.4522020Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4522120Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4522216Z tmp1 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.4522313Z tmp3 = tl.libdevice.isnan(tmp2) 2023-01-11T21:38:06.4522451Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4522575Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4522658Z ''') 2023-01-11T21:38:06.4522664Z 2023-01-11T21:38:06.4522668Z 2023-01-11T21:38:06.4522760Z async_compile.wait(globals()) 2023-01-11T21:38:06.4522838Z del async_compile 2023-01-11T21:38:06.4522843Z 2023-01-11T21:38:06.4522917Z def call(args): 2023-01-11T21:38:06.4522991Z arg0_1, = args 2023-01-11T21:38:06.4523066Z args.clear() 2023-01-11T21:38:06.4523152Z with torch.cuda.device(0): 2023-01-11T21:38:06.4523341Z buf0 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4523529Z buf1 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4523621Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4523770Z triton_fused_isinf_isnan_0.run(arg0_1, buf0, buf1, 5, grid=grid(5), stream=stream0) 2023-01-11T21:38:06.4523872Z del arg0_1 2023-01-11T21:38:06.4523954Z return (buf0, buf1, ) 2023-01-11T21:38:06.4523959Z 2023-01-11T21:38:06.4523964Z 2023-01-11T21:38:06.4524044Z if __name__ == "__main__": 2023-01-11T21:38:06.4524155Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4524283Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4524480Z arg0_1 = rand_strided((5, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4524592Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4524598Z 2023-01-11T21:38:06.4524602Z 2023-01-11T21:38:06.4524698Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4524772Z import torch 2023-01-11T21:38:06.4524845Z import random 2023-01-11T21:38:06.4524961Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4525078Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4525083Z 2023-01-11T21:38:06.4525165Z aten = torch.ops.aten 2023-01-11T21:38:06.4525303Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4525397Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4525402Z 2023-01-11T21:38:06.4525476Z import triton 2023-01-11T21:38:06.4525567Z import triton.language as tl 2023-01-11T21:38:06.4525689Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4525823Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4525833Z 2023-01-11T21:38:06.4525838Z 2023-01-11T21:38:06.4525997Z triton_fused_isinf_isnan_0 = async_compile.triton(''' 2023-01-11T21:38:06.4526072Z import triton 2023-01-11T21:38:06.4526163Z import triton.language as tl 2023-01-11T21:38:06.4526275Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4526376Z 
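# Same isinf/isnan kernel as the fp32 one above, respecialized for fp16
# inputs (signature '*fp16' below): Inductor emits one kernel per input
# dtype, which is why this module repeats with only the pointer type and the
# .to(tl.float32) casts changed.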
from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4526509Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4526630Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4526637Z 2023-01-11T21:38:06.4527044Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*i1', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4527111Z @triton.jit 2023-01-11T21:38:06.4527280Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4527351Z xnumel = 5 2023-01-11T21:38:06.4527446Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4527573Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4527655Z xmask = xindex < xnumel 2023-01-11T21:38:06.4527724Z x0 = xindex 2023-01-11T21:38:06.4527931Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4528048Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4528145Z tmp1 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.4528245Z tmp3 = tl.libdevice.isnan(tmp2) 2023-01-11T21:38:06.4528377Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4528509Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4528593Z ''') 2023-01-11T21:38:06.4528598Z 2023-01-11T21:38:06.4528603Z 2023-01-11T21:38:06.4528696Z async_compile.wait(globals()) 2023-01-11T21:38:06.4528765Z del async_compile 2023-01-11T21:38:06.4528771Z 2023-01-11T21:38:06.4528846Z def call(args): 2023-01-11T21:38:06.4528917Z arg0_1, = args 2023-01-11T21:38:06.4528991Z args.clear() 2023-01-11T21:38:06.4529080Z with torch.cuda.device(0): 2023-01-11T21:38:06.4529272Z buf0 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4529460Z buf1 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4529545Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4529692Z triton_fused_isinf_isnan_0.run(arg0_1, buf0, buf1, 5, grid=grid(5), stream=stream0) 2023-01-11T21:38:06.4529794Z del arg0_1 2023-01-11T21:38:06.4529876Z return (buf0, buf1, ) 2023-01-11T21:38:06.4529881Z 2023-01-11T21:38:06.4529886Z 2023-01-11T21:38:06.4529965Z if __name__ == "__main__": 2023-01-11T21:38:06.4530082Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4530207Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4530402Z arg0_1 = rand_strided((5, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4530508Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4530513Z 2023-01-11T21:38:06.4530524Z 2023-01-11T21:38:06.4530613Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4530685Z import torch 2023-01-11T21:38:06.4530756Z import random 2023-01-11T21:38:06.4530875Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4531000Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4531009Z 2023-01-11T21:38:06.4531090Z aten = torch.ops.aten 2023-01-11T21:38:06.4531224Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4531311Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4531316Z 2023-01-11T21:38:06.4531387Z import triton 2023-01-11T21:38:06.4531477Z import triton.language as tl 2023-01-11T21:38:06.4531604Z from 
torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4531742Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4531748Z 2023-01-11T21:38:06.4531752Z 2023-01-11T21:38:06.4531917Z triton_fused_isinf_isnan_0 = async_compile.triton(''' 2023-01-11T21:38:06.4531990Z import triton 2023-01-11T21:38:06.4532082Z import triton.language as tl 2023-01-11T21:38:06.4532188Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4532289Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4532422Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4532544Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4532551Z 2023-01-11T21:38:06.4532958Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp64', 1: '*i1', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4533057Z @triton.jit 2023-01-11T21:38:06.4533200Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4533270Z xnumel = 5 2023-01-11T21:38:06.4533360Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4533487Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4533569Z xmask = xindex < xnumel 2023-01-11T21:38:06.4533640Z x0 = xindex 2023-01-11T21:38:06.4533829Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4533925Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4534022Z tmp1 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.4534114Z tmp3 = tl.libdevice.isnan(tmp2) 2023-01-11T21:38:06.4534245Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4534379Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4534462Z ''') 2023-01-11T21:38:06.4534467Z 2023-01-11T21:38:06.4534596Z 2023-01-11T21:38:06.4534691Z async_compile.wait(globals()) 2023-01-11T21:38:06.4534766Z del async_compile 2023-01-11T21:38:06.4534771Z 2023-01-11T21:38:06.4534846Z def call(args): 2023-01-11T21:38:06.4534913Z arg0_1, = args 2023-01-11T21:38:06.4534987Z args.clear() 2023-01-11T21:38:06.4535079Z with torch.cuda.device(0): 2023-01-11T21:38:06.4535270Z buf0 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4535481Z buf1 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4535586Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4535745Z triton_fused_isinf_isnan_0.run(arg0_1, buf0, buf1, 5, grid=grid(5), stream=stream0) 2023-01-11T21:38:06.4535861Z del arg0_1 2023-01-11T21:38:06.4535937Z return (buf0, buf1, ) 2023-01-11T21:38:06.4535942Z 2023-01-11T21:38:06.4535946Z 2023-01-11T21:38:06.4536024Z if __name__ == "__main__": 2023-01-11T21:38:06.4536146Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4536273Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4536469Z arg0_1 = rand_strided((5, ), (1, ), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.4536582Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4536587Z 2023-01-11T21:38:06.4536850Z [2023-01-11 21:35:04,752] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 633 2023-01-11T21:38:06.4536856Z 2023-01-11T21:38:06.4536951Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4537017Z import torch 
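# This fp64 isinf/isnan module appears twice in the log: the generated
# wrapper is printed once per compiled graph, and graphs 632 and 633
# evidently lowered to the same kernel.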
2023-01-11T21:38:06.4537089Z import random 2023-01-11T21:38:06.4537272Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4537396Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4537401Z 2023-01-11T21:38:06.4537483Z aten = torch.ops.aten 2023-01-11T21:38:06.4537620Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4537718Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4537724Z 2023-01-11T21:38:06.4537798Z import triton 2023-01-11T21:38:06.4537883Z import triton.language as tl 2023-01-11T21:38:06.4538007Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4538143Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4538149Z 2023-01-11T21:38:06.4538153Z 2023-01-11T21:38:06.4538318Z triton_fused_isinf_isnan_0 = async_compile.triton(''' 2023-01-11T21:38:06.4538390Z import triton 2023-01-11T21:38:06.4538482Z import triton.language as tl 2023-01-11T21:38:06.4538597Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4538695Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4538827Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4538950Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4538955Z 2023-01-11T21:38:06.4539407Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp64', 1: '*i1', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4539482Z @triton.jit 2023-01-11T21:38:06.4539625Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4539698Z xnumel = 5 2023-01-11T21:38:06.4539794Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4539916Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4539997Z xmask = xindex < xnumel 2023-01-11T21:38:06.4540065Z x0 = xindex 2023-01-11T21:38:06.4540253Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4540354Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4540454Z tmp1 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.4540549Z tmp3 = tl.libdevice.isnan(tmp2) 2023-01-11T21:38:06.4540674Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4540810Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4540894Z ''') 2023-01-11T21:38:06.4540899Z 2023-01-11T21:38:06.4540904Z 2023-01-11T21:38:06.4540999Z async_compile.wait(globals()) 2023-01-11T21:38:06.4541075Z del async_compile 2023-01-11T21:38:06.4541080Z 2023-01-11T21:38:06.4541154Z def call(args): 2023-01-11T21:38:06.4541227Z arg0_1, = args 2023-01-11T21:38:06.4541303Z args.clear() 2023-01-11T21:38:06.4541388Z with torch.cuda.device(0): 2023-01-11T21:38:06.4541575Z buf0 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4541792Z buf1 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4541886Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4542039Z triton_fused_isinf_isnan_0.run(arg0_1, buf0, buf1, 5, grid=grid(5), stream=stream0) 2023-01-11T21:38:06.4542112Z del arg0_1 2023-01-11T21:38:06.4542201Z return (buf0, buf1, ) 2023-01-11T21:38:06.4542206Z 2023-01-11T21:38:06.4542211Z 2023-01-11T21:38:06.4542292Z if __name__ == "__main__": 2023-01-11T21:38:06.4542404Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4542531Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4542732Z arg0_1 = rand_strided((5, ), (1, ), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.4542847Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4542852Z 2023-01-11T21:38:06.4542925Z ok (0.485s) 2023-01-11T21:38:06.4543443Z test_kernel_names_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.4543526Z warnings.warn( 2023-01-11T21:38:06.4543789Z [2023-01-11 21:35:04,763] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 634 2023-01-11T21:38:06.4544055Z [2023-01-11 21:35:04,917] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 634 2023-01-11T21:38:06.4544061Z 2023-01-11T21:38:06.4544154Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4544229Z import torch 2023-01-11T21:38:06.4544308Z import random 2023-01-11T21:38:06.4544429Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4544554Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4544559Z 2023-01-11T21:38:06.4544642Z aten = torch.ops.aten 2023-01-11T21:38:06.4544782Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4544875Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4544880Z 2023-01-11T21:38:06.4544957Z import triton 2023-01-11T21:38:06.4545050Z import triton.language as tl 2023-01-11T21:38:06.4545178Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4545353Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4545359Z 2023-01-11T21:38:06.4545363Z 2023-01-11T21:38:06.4545499Z triton__0 = async_compile.triton(''' 2023-01-11T21:38:06.4545577Z import triton 2023-01-11T21:38:06.4545687Z import triton.language as tl 2023-01-11T21:38:06.4545808Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4545919Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4546048Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4546173Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4546178Z 2023-01-11T21:38:06.4546580Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4546656Z @triton.jit 2023-01-11T21:38:06.4546789Z def triton__0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4546862Z xnumel = 8 2023-01-11T21:38:06.4546952Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4547079Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4547160Z xmask = xindex < xnumel 2023-01-11T21:38:06.4547228Z x0 = xindex 2023-01-11T21:38:06.4547323Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4547394Z tmp1 = 2 2023-01-11T21:38:06.4547472Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.4547599Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4547684Z ''') 2023-01-11T21:38:06.4547690Z 2023-01-11T21:38:06.4547695Z 2023-01-11T21:38:06.4547813Z async_compile.wait(globals()) 
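# Note the kernel name in the module above: a bare `triton__0` instead of the
# descriptive `triton_fused_*` names seen elsewhere in this log, which is the
# behavior test_kernel_names_cuda appears to exercise (the config option that
# toggles descriptive kernel names is not shown here). The kernel itself is a
# trivial pointwise doubling of an 8-element fp32 tensor; in eager terms:
#   y = x * 2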
2023-01-11T21:38:06.4547887Z del async_compile 2023-01-11T21:38:06.4547893Z 2023-01-11T21:38:06.4547969Z def call(args): 2023-01-11T21:38:06.4548039Z arg0_1, = args 2023-01-11T21:38:06.4548115Z args.clear() 2023-01-11T21:38:06.4548200Z with torch.cuda.device(0): 2023-01-11T21:38:06.4548399Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4548490Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4548614Z triton__0.run(arg0_1, buf0, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4548685Z del arg0_1 2023-01-11T21:38:06.4548761Z return (buf0, ) 2023-01-11T21:38:06.4548767Z 2023-01-11T21:38:06.4548771Z 2023-01-11T21:38:06.4548852Z if __name__ == "__main__": 2023-01-11T21:38:06.4548967Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4549086Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4549280Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4549394Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4549399Z 2023-01-11T21:38:06.4549467Z ok (0.324s) 2023-01-11T21:38:06.4549626Z test_kwargs_cuda (__main__.CudaTests) ... skip: histogramdd only supports cpu (0.000s) 2023-01-11T21:38:06.4550078Z test_l1_loss_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4550209Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4550466Z [2023-01-11 21:35:05,101] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 635 2023-01-11T21:38:06.4550729Z [2023-01-11 21:35:05,315] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 635 2023-01-11T21:38:06.4550738Z 2023-01-11T21:38:06.4550834Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4550901Z import torch 2023-01-11T21:38:06.4550978Z import random 2023-01-11T21:38:06.4551124Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4551247Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4551251Z 2023-01-11T21:38:06.4551332Z aten = torch.ops.aten 2023-01-11T21:38:06.4551469Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4551566Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4551571Z 2023-01-11T21:38:06.4551639Z import triton 2023-01-11T21:38:06.4551730Z import triton.language as tl 2023-01-11T21:38:06.4551855Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4551993Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4551999Z 2023-01-11T21:38:06.4552006Z 2023-01-11T21:38:06.4552202Z triton_fused_abs_1_mean_mean_1_pow_1_sub_sub_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.4552279Z import triton 2023-01-11T21:38:06.4552369Z import triton.language as tl 2023-01-11T21:38:06.4552481Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4552577Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4552707Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4552830Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4552835Z 2023-01-11T21:38:06.4552925Z 
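# The reduction kernel below fuses the two reductions of test_l1_loss_cuda
# into a single pass over rnumel = 1536 = 2 * 3 * 16 * 16 elements: _tmp4
# accumulates sum(|a - b|) and _tmp9 accumulates sum((a - b)**2), and both
# are divided by 1536 at the end. An eager-mode reading:
#   l1  = (a - b).abs().mean()
#   mse = ((a - b) ** 2).mean()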
@reduction(size_hints=[1, 2048], 2023-01-11T21:38:06.4553039Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.4553122Z filename=__file__, 2023-01-11T21:38:06.4553545Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 5), equal_to_1=())]}) 2023-01-11T21:38:06.4553652Z @triton.jit 2023-01-11T21:38:06.4553837Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, in_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4553909Z xnumel = 1 2023-01-11T21:38:06.4553985Z rnumel = 1536 2023-01-11T21:38:06.4554087Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4554224Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4554308Z xmask = xindex < xnumel 2023-01-11T21:38:06.4554425Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4554537Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4554654Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4554760Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4554847Z rindex = roffset + rbase 2023-01-11T21:38:06.4554931Z rmask = rindex < rnumel 2023-01-11T21:38:06.4555006Z r0 = rindex 2023-01-11T21:38:06.4555201Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4555410Z tmp1 = tl.load(in_ptr1 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4555522Z tmp5 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.4555634Z tmp6 = tl.load(in_ptr1 + (r0), rmask) 2023-01-11T21:38:06.4555748Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.4555833Z tmp3 = tl.abs(tmp2) 2023-01-11T21:38:06.4555953Z _tmp4 = tl.where(xmask & rmask, _tmp4 + tmp3, _tmp4) 2023-01-11T21:38:06.4556067Z tmp7 = tmp5 - tmp6 2023-01-11T21:38:06.4556141Z tmp8 = tmp7 * tmp7 2023-01-11T21:38:06.4556262Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.4556374Z tmp4 = tl.reshape(tl.sum(_tmp4, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4556484Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4556557Z tmp10 = 1536 2023-01-11T21:38:06.4556640Z tmp11 = tmp4 / tmp10 2023-01-11T21:38:06.4556720Z tmp12 = tmp9 / tmp10 2023-01-11T21:38:06.4556853Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp11, None) 2023-01-11T21:38:06.4556990Z tl.store(in_out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp12, None) 2023-01-11T21:38:06.4557074Z ''') 2023-01-11T21:38:06.4557107Z 2023-01-11T21:38:06.4557112Z 2023-01-11T21:38:06.4557205Z async_compile.wait(globals()) 2023-01-11T21:38:06.4557285Z del async_compile 2023-01-11T21:38:06.4557290Z 2023-01-11T21:38:06.4557363Z def call(args): 2023-01-11T21:38:06.4557443Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4557518Z args.clear() 2023-01-11T21:38:06.4557603Z with torch.cuda.device(0): 2023-01-11T21:38:06.4557792Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4557981Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4558073Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:06.4558165Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:06.4558255Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4558429Z triton_fused_abs_1_mean_mean_1_pow_1_sub_sub_1_0.run(buf2, buf3, arg0_1, arg1_1, 1, 1536, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.4558495Z del arg0_1 
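    # The wrapper drops its argument references as soon as the kernel launch
    # is queued on stream0, letting the caching allocator recycle that memory;
    # buf2/buf3 above alias buf0/buf1 (marked "# reuse") because the kernel
    # writes the final means through the mutated in_out_ptr0/in_out_ptr1.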
2023-01-11T21:38:06.4558569Z del arg1_1 2023-01-11T21:38:06.4558652Z return (buf2, buf3, ) 2023-01-11T21:38:06.4558657Z 2023-01-11T21:38:06.4558661Z 2023-01-11T21:38:06.4558740Z if __name__ == "__main__": 2023-01-11T21:38:06.4558857Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4558982Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4559206Z arg0_1 = rand_strided((2, 3, 16, 16), (768, 256, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4559420Z arg1_1 = rand_strided((2, 3, 16, 16), (768, 256, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4559533Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4559575Z 2023-01-11T21:38:06.4559642Z ok (0.239s) 2023-01-11T21:38:06.4560102Z test_layer_norm_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4560236Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4560491Z [2023-01-11 21:35:05,382] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 636 2023-01-11T21:38:06.4560700Z [2023-01-11 21:35:05,412] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.4560962Z [2023-01-11 21:35:05,680] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 636 2023-01-11T21:38:06.4560970Z 2023-01-11T21:38:06.4561066Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4561140Z import torch 2023-01-11T21:38:06.4561215Z import random 2023-01-11T21:38:06.4561327Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4561449Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4561457Z 2023-01-11T21:38:06.4561538Z aten = torch.ops.aten 2023-01-11T21:38:06.4561673Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4561767Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4561772Z 2023-01-11T21:38:06.4561843Z import triton 2023-01-11T21:38:06.4561934Z import triton.language as tl 2023-01-11T21:38:06.4562051Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4562190Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4562196Z 2023-01-11T21:38:06.4562200Z 2023-01-11T21:38:06.4562397Z triton_fused_getitem_1_relu_rsqrt_var_mean_0 = async_compile.triton(''' 2023-01-11T21:38:06.4562474Z import triton 2023-01-11T21:38:06.4562566Z import triton.language as tl 2023-01-11T21:38:06.4562678Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4562778Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4562911Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4563056Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4563062Z 2023-01-11T21:38:06.4563151Z @reduction(size_hints=[16, 32], 2023-01-11T21:38:06.4563268Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.4563352Z filename=__file__, 2023-01-11T21:38:06.4563808Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: 'i32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 
'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.4563885Z @triton.jit 2023-01-11T21:38:06.4564089Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, in_ptr1, in_ptr2, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4564160Z xnumel = 16 2023-01-11T21:38:06.4564225Z rnumel = 32 2023-01-11T21:38:06.4564322Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4564459Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4564543Z xmask = xindex < xnumel 2023-01-11T21:38:06.4564658Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4564727Z x0 = xindex 2023-01-11T21:38:06.4564844Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4564942Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4565029Z rindex = roffset + rbase 2023-01-11T21:38:06.4565116Z rmask = rindex < rnumel 2023-01-11T21:38:06.4565186Z r1 = rindex 2023-01-11T21:38:06.4565406Z tmp0 = tl.load(in_ptr0 + (r1 + (32*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4565582Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.4565709Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4565836Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4565947Z _tmp8 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4566052Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4566140Z rindex = roffset + rbase 2023-01-11T21:38:06.4566225Z rmask = rindex < rnumel 2023-01-11T21:38:06.4566296Z r1 = rindex 2023-01-11T21:38:06.4566515Z tmp2 = tl.load(in_ptr0 + (r1 + (32*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4566588Z tmp3 = 32 2023-01-11T21:38:06.4566662Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.4566775Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.4566856Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.4566977Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.4567098Z _tmp8 = tl.where(xmask & rmask, _tmp8 + tmp2, _tmp8) 2023-01-11T21:38:06.4567210Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4567322Z tmp8 = tl.reshape(tl.sum(_tmp8, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4567386Z tmp9 = 32 2023-01-11T21:38:06.4567469Z tmp10 = tmp8 / tmp9 2023-01-11T21:38:06.4567546Z tmp11 = tmp7 / tmp9 2023-01-11T21:38:06.4567650Z tmp12 = 1e-05 2023-01-11T21:38:06.4567732Z tmp13 = tmp11 + tmp12 2023-01-11T21:38:06.4567833Z tmp14 = tl.libdevice.rsqrt(tmp13) 2023-01-11T21:38:06.4567966Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.4568108Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.4568211Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4568299Z rindex = roffset + rbase 2023-01-11T21:38:06.4568381Z rmask = rindex < rnumel 2023-01-11T21:38:06.4568455Z r1 = rindex 2023-01-11T21:38:06.4568571Z tmp15 = tl.load(in_ptr0 + (r1 + (32*x0)), rmask & xmask) 2023-01-11T21:38:06.4568679Z tmp18 = tl.load(in_ptr1 + (r1), rmask) 2023-01-11T21:38:06.4568774Z tmp20 = tl.load(in_ptr2 + (r1), rmask) 2023-01-11T21:38:06.4568966Z tmp16 = tmp15 - tmp10 2023-01-11T21:38:06.4569054Z tmp17 = tmp16 * tmp14 2023-01-11T21:38:06.4569134Z tmp19 = tmp17 * tmp18 2023-01-11T21:38:06.4569213Z tmp21 = tmp19 + tmp20 2023-01-11T21:38:06.4569332Z tmp22 = tl.where(0 != 0, 0, tl.where(0 > tmp21, 0, tmp21)) 
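        # tl.where(0 != 0, 0, tl.where(0 > tmp21, 0, tmp21)) is the generated
        # ReLU idiom: the dead first branch folds away and the second clamps
        # negatives to zero. With the three r-loops (row sum, then variance,
        # then normalize * weight + bias), the kernel as a whole computes
        #   relu(layer_norm(x, (32,), weight, bias, eps=1e-05))
        # on the (16, 32) input.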
2023-01-11T21:38:06.4569489Z tl.store(out_ptr1 + (r1 + (32*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp22, rmask & xmask) 2023-01-11T21:38:06.4569567Z ''') 2023-01-11T21:38:06.4569573Z 2023-01-11T21:38:06.4569578Z 2023-01-11T21:38:06.4569668Z async_compile.wait(globals()) 2023-01-11T21:38:06.4569745Z del async_compile 2023-01-11T21:38:06.4569750Z 2023-01-11T21:38:06.4569828Z def call(args): 2023-01-11T21:38:06.4569933Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.4570008Z args.clear() 2023-01-11T21:38:06.4570101Z with torch.cuda.device(0): 2023-01-11T21:38:06.4570296Z buf1 = empty_strided((16, 1), (1, 16), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4570494Z buf2 = empty_strided((16, 1), (1, 16), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4570616Z buf3 = as_strided(buf2, (16, 1), (1, 1)); del buf2 # reuse 2023-01-11T21:38:06.4570730Z buf4 = as_strided(buf1, (16, 1), (1, 1)); del buf1 # reuse 2023-01-11T21:38:06.4570929Z buf5 = empty_strided((16, 32), (32, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4571022Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4571221Z triton_fused_getitem_1_relu_rsqrt_var_mean_0.run(buf3, buf4, primals_3, primals_1, primals_2, buf5, 16, 32, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4571352Z return (buf5, primals_1, primals_2, primals_3, buf3, buf4, ) 2023-01-11T21:38:06.4571386Z 2023-01-11T21:38:06.4571390Z 2023-01-11T21:38:06.4571468Z if __name__ == "__main__": 2023-01-11T21:38:06.4571580Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4571708Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4571916Z primals_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4572119Z primals_2 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4572327Z primals_3 = rand_strided((16, 32), (32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4572470Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.4572476Z 2023-01-11T21:38:06.4572545Z ok (0.365s) 2023-01-11T21:38:06.4573007Z test_leaky_relu_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4573138Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4573392Z [2023-01-11 21:35:05,715] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 637 2023-01-11T21:38:06.4573657Z [2023-01-11 21:35:05,798] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 637 2023-01-11T21:38:06.4574076Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4574211Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4574464Z [2023-01-11 21:35:05,857] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 638 2023-01-11T21:38:06.4574880Z [2023-01-11 21:35:05,947] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 638 2023-01-11T21:38:06.4574886Z 2023-01-11T21:38:06.4574989Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4575062Z import torch 2023-01-11T21:38:06.4575137Z import random 2023-01-11T21:38:06.4575255Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4575371Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4575376Z 2023-01-11T21:38:06.4575459Z aten = torch.ops.aten 2023-01-11T21:38:06.4575597Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4575697Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4575702Z 2023-01-11T21:38:06.4575775Z import triton 2023-01-11T21:38:06.4575872Z import triton.language as tl 2023-01-11T21:38:06.4575996Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4576128Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4576142Z 2023-01-11T21:38:06.4576146Z 2023-01-11T21:38:06.4576309Z triton_fused_add_where_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.4576382Z import triton 2023-01-11T21:38:06.4576477Z import triton.language as tl 2023-01-11T21:38:06.4576591Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4576692Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4576826Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4576950Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4576956Z 2023-01-11T21:38:06.4577437Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4577598Z @triton.jit 2023-01-11T21:38:06.4577742Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4577816Z xnumel = 256 2023-01-11T21:38:06.4577911Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4578041Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4578126Z xmask = xindex < xnumel 2023-01-11T21:38:06.4578196Z x0 = xindex 2023-01-11T21:38:06.4578382Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4578479Z tmp8 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4578551Z tmp1 = 0 2023-01-11T21:38:06.4578628Z tmp2 = tmp0 > tmp1 2023-01-11T21:38:06.4578696Z tmp3 = 0.2 2023-01-11T21:38:06.4578771Z tmp4 = tmp0 * tmp3 2023-01-11T21:38:06.4578866Z tmp5 = tl.where(tmp2, tmp0, tmp4) 2023-01-11T21:38:06.4578930Z tmp6 = 2 2023-01-11T21:38:06.4579010Z tmp7 = tmp5 + tmp6 2023-01-11T21:38:06.4579080Z tmp9 = 1 2023-01-11T21:38:06.4579158Z tmp10 = tmp8 + tmp9 2023-01-11T21:38:06.4579237Z tmp11 = tmp10 > tmp1 2023-01-11T21:38:06.4579309Z tmp12 = 0.01 2023-01-11T21:38:06.4579381Z tmp13 = tmp10 * tmp12 2023-01-11T21:38:06.4579479Z tmp14 = tl.where(tmp11, tmp10, tmp13) 2023-01-11T21:38:06.4579618Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 
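    # Two leaky_relu flavors are fused here (read off the arithmetic above):
    #   out_ptr0 <- leaky_relu(x, negative_slope=0.2) + 2
    #   out_ptr1 <- leaky_relu(x + 1, negative_slope=0.01)
    # The fp16 module that follows is the same computation with
    # convert_element_type casts inserted around each op.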
2023-01-11T21:38:06.4579751Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.4579834Z ''') 2023-01-11T21:38:06.4579840Z 2023-01-11T21:38:06.4579844Z 2023-01-11T21:38:06.4579936Z async_compile.wait(globals()) 2023-01-11T21:38:06.4580011Z del async_compile 2023-01-11T21:38:06.4580016Z 2023-01-11T21:38:06.4580091Z def call(args): 2023-01-11T21:38:06.4580157Z arg0_1, = args 2023-01-11T21:38:06.4580230Z args.clear() 2023-01-11T21:38:06.4580321Z with torch.cuda.device(0): 2023-01-11T21:38:06.4580525Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4580722Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4580810Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4580990Z triton_fused_add_where_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4581058Z del arg0_1 2023-01-11T21:38:06.4581140Z return (buf0, buf1, ) 2023-01-11T21:38:06.4581145Z 2023-01-11T21:38:06.4581150Z 2023-01-11T21:38:06.4581229Z if __name__ == "__main__": 2023-01-11T21:38:06.4581347Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4581472Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4581677Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4581788Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4581793Z 2023-01-11T21:38:06.4581801Z 2023-01-11T21:38:06.4581900Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4581967Z import torch 2023-01-11T21:38:06.4582041Z import random 2023-01-11T21:38:06.4582159Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4582280Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4582287Z 2023-01-11T21:38:06.4582369Z aten = torch.ops.aten 2023-01-11T21:38:06.4582506Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4582601Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4582606Z 2023-01-11T21:38:06.4582679Z import triton 2023-01-11T21:38:06.4582764Z import triton.language as tl 2023-01-11T21:38:06.4582887Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4583026Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4583031Z 2023-01-11T21:38:06.4583036Z 2023-01-11T21:38:06.4583229Z triton_fused_add_convert_element_type_3_0 = async_compile.triton(''' 2023-01-11T21:38:06.4583334Z import triton 2023-01-11T21:38:06.4583427Z import triton.language as tl 2023-01-11T21:38:06.4583543Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4583645Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4583772Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4583902Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4583907Z 2023-01-11T21:38:06.4584329Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4584405Z @triton.jit 2023-01-11T21:38:06.4584550Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4584627Z xnumel = 256 2023-01-11T21:38:06.4584725Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4584855Z xindex = xoffset + tl.reshape(tl.arange(0, 
XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4584936Z xmask = xindex < xnumel 2023-01-11T21:38:06.4585009Z x0 = xindex 2023-01-11T21:38:06.4585223Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4585343Z tmp10 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4585436Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.4585509Z tmp2 = 0 2023-01-11T21:38:06.4585591Z tmp3 = tmp1 > tmp2 2023-01-11T21:38:06.4585659Z tmp4 = 0.2 2023-01-11T21:38:06.4585738Z tmp5 = tmp1 * tmp4 2023-01-11T21:38:06.4585837Z tmp6 = tl.where(tmp3, tmp1, tmp5) 2023-01-11T21:38:06.4585926Z tmp7 = tmp6.to(tl.float32) 2023-01-11T21:38:06.4585998Z tmp8 = 2 2023-01-11T21:38:06.4586080Z tmp9 = tmp7 + tmp8 2023-01-11T21:38:06.4586146Z tmp11 = 1 2023-01-11T21:38:06.4586230Z tmp12 = tmp10 + tmp11 2023-01-11T21:38:06.4586322Z tmp13 = tmp12.to(tl.float32) 2023-01-11T21:38:06.4586403Z tmp14 = tmp13 > tmp2 2023-01-11T21:38:06.4586479Z tmp15 = 0.01 2023-01-11T21:38:06.4586558Z tmp16 = tmp13 * tmp15 2023-01-11T21:38:06.4586660Z tmp17 = tl.where(tmp14, tmp13, tmp16) 2023-01-11T21:38:06.4586745Z tmp18 = tmp17.to(tl.float32) 2023-01-11T21:38:06.4586883Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.4587046Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.4587133Z ''') 2023-01-11T21:38:06.4587138Z 2023-01-11T21:38:06.4587143Z 2023-01-11T21:38:06.4587236Z async_compile.wait(globals()) 2023-01-11T21:38:06.4587310Z del async_compile 2023-01-11T21:38:06.4587316Z 2023-01-11T21:38:06.4587391Z def call(args): 2023-01-11T21:38:06.4587463Z arg0_1, = args 2023-01-11T21:38:06.4587531Z args.clear() 2023-01-11T21:38:06.4587624Z with torch.cuda.device(0): 2023-01-11T21:38:06.4587825Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4588024Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4588116Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4588283Z triton_fused_add_convert_element_type_3_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4588356Z del arg0_1 2023-01-11T21:38:06.4588434Z return (buf0, buf1, ) 2023-01-11T21:38:06.4588440Z 2023-01-11T21:38:06.4588450Z 2023-01-11T21:38:06.4588523Z if __name__ == "__main__": 2023-01-11T21:38:06.4588637Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4588765Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4588968Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4589079Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4589084Z 2023-01-11T21:38:06.4589153Z ok (0.267s) 2023-01-11T21:38:06.4589599Z test_lgamma_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4589757Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4590013Z [2023-01-11 21:35:06,235] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 639 2023-01-11T21:38:06.4590267Z [2023-01-11 21:35:06,418] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 639 2023-01-11T21:38:06.4590677Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4590809Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4591063Z [2023-01-11 21:35:06,448] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 640 2023-01-11T21:38:06.4591328Z [2023-01-11 21:35:06,629] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 640 2023-01-11T21:38:06.4591334Z 2023-01-11T21:38:06.4591431Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4591505Z import torch 2023-01-11T21:38:06.4591580Z import random 2023-01-11T21:38:06.4591700Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4591816Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4591821Z 2023-01-11T21:38:06.4591902Z aten = torch.ops.aten 2023-01-11T21:38:06.4592038Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4592133Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4592142Z 2023-01-11T21:38:06.4592214Z import triton 2023-01-11T21:38:06.4592305Z import triton.language as tl 2023-01-11T21:38:06.4592431Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4592563Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4592575Z 2023-01-11T21:38:06.4592608Z 2023-01-11T21:38:06.4592763Z triton_fused_add_cos_0 = async_compile.triton(''' 2023-01-11T21:38:06.4592837Z import triton 2023-01-11T21:38:06.4592926Z import triton.language as tl 2023-01-11T21:38:06.4593044Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4593148Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4593279Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4593406Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4593411Z 2023-01-11T21:38:06.4593829Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4593899Z @triton.jit 2023-01-11T21:38:06.4594040Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4594116Z xnumel = 256 2023-01-11T21:38:06.4594217Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4594347Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4594428Z xmask = xindex < xnumel 2023-01-11T21:38:06.4594499Z x0 = xindex 2023-01-11T21:38:06.4594681Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4594778Z tmp4 = 
tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4594878Z tmp1 = tl.libdevice.lgamma(tmp0) 2023-01-11T21:38:06.4594947Z tmp2 = 2 2023-01-11T21:38:06.4595025Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.4595097Z tmp5 = 1 2023-01-11T21:38:06.4595216Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.4595298Z tmp7 = tl.cos(tmp6) 2023-01-11T21:38:06.4595452Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4595606Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.4595691Z ''') 2023-01-11T21:38:06.4595696Z 2023-01-11T21:38:06.4595703Z 2023-01-11T21:38:06.4595795Z async_compile.wait(globals()) 2023-01-11T21:38:06.4595874Z del async_compile 2023-01-11T21:38:06.4595879Z 2023-01-11T21:38:06.4595954Z def call(args): 2023-01-11T21:38:06.4596020Z arg0_1, = args 2023-01-11T21:38:06.4596095Z args.clear() 2023-01-11T21:38:06.4596186Z with torch.cuda.device(0): 2023-01-11T21:38:06.4596389Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4596587Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4596680Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4596829Z triton_fused_add_cos_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4596900Z del arg0_1 2023-01-11T21:38:06.4596975Z return (buf0, buf1, ) 2023-01-11T21:38:06.4596980Z 2023-01-11T21:38:06.4596984Z 2023-01-11T21:38:06.4597063Z if __name__ == "__main__": 2023-01-11T21:38:06.4597184Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4597310Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4597512Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4597623Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4597628Z 2023-01-11T21:38:06.4597633Z 2023-01-11T21:38:06.4597729Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4597802Z import torch 2023-01-11T21:38:06.4597869Z import random 2023-01-11T21:38:06.4597987Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4598107Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4598116Z 2023-01-11T21:38:06.4598195Z aten = torch.ops.aten 2023-01-11T21:38:06.4598329Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4598423Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4598428Z 2023-01-11T21:38:06.4598502Z import triton 2023-01-11T21:38:06.4598618Z import triton.language as tl 2023-01-11T21:38:06.4598745Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4598883Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4598889Z 2023-01-11T21:38:06.4598893Z 2023-01-11T21:38:06.4599052Z triton_fused_add_cos_0 = async_compile.triton(''' 2023-01-11T21:38:06.4599125Z import triton 2023-01-11T21:38:06.4599217Z import triton.language as tl 2023-01-11T21:38:06.4599329Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4599430Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4599555Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4599682Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4599687Z 2023-01-11T21:38:06.4600107Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4600181Z @triton.jit 2023-01-11T21:38:06.4600325Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4600401Z xnumel = 256 2023-01-11T21:38:06.4600496Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4600626Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4600702Z xmask = xindex < xnumel 2023-01-11T21:38:06.4600772Z x0 = xindex 2023-01-11T21:38:06.4600986Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4601102Z tmp4 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4601231Z tmp1 = tl.libdevice.lgamma(tmp0) 2023-01-11T21:38:06.4601301Z tmp2 = 2 2023-01-11T21:38:06.4601378Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.4601442Z tmp5 = 1 2023-01-11T21:38:06.4601518Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.4601597Z tmp7 = tl.cos(tmp6) 2023-01-11T21:38:06.4601731Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4601861Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.4601946Z ''') 2023-01-11T21:38:06.4601953Z 2023-01-11T21:38:06.4601957Z 2023-01-11T21:38:06.4602049Z async_compile.wait(globals()) 2023-01-11T21:38:06.4602126Z del async_compile 2023-01-11T21:38:06.4602131Z 2023-01-11T21:38:06.4602199Z def call(args): 2023-01-11T21:38:06.4602270Z arg0_1, = args 2023-01-11T21:38:06.4602342Z args.clear() 2023-01-11T21:38:06.4602434Z with torch.cuda.device(0): 2023-01-11T21:38:06.4602635Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4602836Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4602927Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4603068Z triton_fused_add_cos_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4603138Z del arg0_1 2023-01-11T21:38:06.4603221Z return (buf0, buf1, ) 2023-01-11T21:38:06.4603227Z 2023-01-11T21:38:06.4603232Z 2023-01-11T21:38:06.4603311Z if __name__ == "__main__": 2023-01-11T21:38:06.4603427Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4603553Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4603756Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4603869Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4603874Z 2023-01-11T21:38:06.4603938Z ok (0.682s) 2023-01-11T21:38:06.4604417Z test_linear1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4604552Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4604808Z [2023-01-11 21:35:06,680] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 641 2023-01-11T21:38:06.4605071Z [2023-01-11 21:35:06,837] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 641 2023-01-11T21:38:06.4605510Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4605663Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4605925Z [2023-01-11 21:35:06,996] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 642 2023-01-11T21:38:06.4606187Z [2023-01-11 21:35:07,066] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 642 2023-01-11T21:38:06.4606192Z 2023-01-11T21:38:06.4606287Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4606361Z import torch 2023-01-11T21:38:06.4606428Z import random 2023-01-11T21:38:06.4606544Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4606666Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4606671Z 2023-01-11T21:38:06.4606752Z aten = torch.ops.aten 2023-01-11T21:38:06.4606885Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4607006Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4607011Z 2023-01-11T21:38:06.4607083Z import triton 2023-01-11T21:38:06.4607167Z import triton.language as tl 2023-01-11T21:38:06.4607288Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4607429Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4607435Z 2023-01-11T21:38:06.4607439Z 2023-01-11T21:38:06.4607600Z triton_fused_sigmoid_0 = async_compile.triton(''' 2023-01-11T21:38:06.4607675Z import triton 2023-01-11T21:38:06.4607767Z import triton.language as tl 2023-01-11T21:38:06.4607882Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4607981Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4608108Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4608231Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4608236Z 2023-01-11T21:38:06.4608638Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4608710Z @triton.jit 2023-01-11T21:38:06.4608835Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4608911Z xnumel = 32 2023-01-11T21:38:06.4609009Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4609139Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4609215Z xmask = xindex < xnumel 2023-01-11T21:38:06.4609285Z x0 = xindex 2023-01-11T21:38:06.4609393Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4609478Z tmp1 = tl.sigmoid(tmp0) 2023-01-11T21:38:06.4609612Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4609696Z ''') 2023-01-11T21:38:06.4609702Z 2023-01-11T21:38:06.4609706Z 2023-01-11T21:38:06.4609801Z async_compile.wait(globals()) 2023-01-11T21:38:06.4609871Z del async_compile 2023-01-11T21:38:06.4609881Z 2023-01-11T21:38:06.4609949Z def call(args): 2023-01-11T21:38:06.4610053Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.4610129Z args.clear() 2023-01-11T21:38:06.4610220Z with torch.cuda.device(0): 2023-01-11T21:38:06.4610447Z buf0 = empty_strided((2, 16), (16, 1), device='cuda', dtype=torch.float32) 
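# (annotation) The statements that follow show the split Inductor makes here:
# the matmul itself is not code-generated. aten.addmm.out computes
# bias + input @ weight.T into the preallocated buf0 using the prebuilt ATen
# kernel (typically cuBLAS on CUDA), and only the elementwise sigmoid runs as
# the generated triton_fused_sigmoid_0 kernel, in place on the same buffer
# (note 'mutated_arg_names': {'in_out_ptr0'} in its @pointwise metadata above).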
2023-01-11T21:38:06.4610615Z aten.addmm.out(primals_2, primals_3, as_strided(primals_1, (8, 16), (1, 8)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.4610692Z del primals_1 2023-01-11T21:38:06.4610761Z del primals_2 2023-01-11T21:38:06.4610852Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.4610943Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4611075Z triton_fused_sigmoid_0.run(buf1, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.4611171Z return (buf1, primals_3, buf1, ) 2023-01-11T21:38:06.4611180Z 2023-01-11T21:38:06.4611184Z 2023-01-11T21:38:06.4611264Z if __name__ == "__main__": 2023-01-11T21:38:06.4611383Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4611503Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4611710Z primals_1 = rand_strided((16, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4611913Z primals_2 = rand_strided((16, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4612118Z primals_3 = rand_strided((2, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4612261Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.4612266Z 2023-01-11T21:38:06.4612270Z 2023-01-11T21:38:06.4612367Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4612440Z import torch 2023-01-11T21:38:06.4612513Z import random 2023-01-11T21:38:06.4612625Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4612748Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4612779Z 2023-01-11T21:38:06.4612861Z aten = torch.ops.aten 2023-01-11T21:38:06.4612993Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4613087Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4613093Z 2023-01-11T21:38:06.4613163Z import triton 2023-01-11T21:38:06.4613258Z import triton.language as tl 2023-01-11T21:38:06.4613380Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4613511Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4613516Z 2023-01-11T21:38:06.4613527Z 2023-01-11T21:38:06.4613681Z triton_fused_sigmoid_0 = async_compile.triton(''' 2023-01-11T21:38:06.4613755Z import triton 2023-01-11T21:38:06.4613847Z import triton.language as tl 2023-01-11T21:38:06.4613961Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4614062Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4614195Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4614322Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4614327Z 2023-01-11T21:38:06.4614831Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4614905Z @triton.jit 2023-01-11T21:38:06.4615029Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4615103Z xnumel = 32 2023-01-11T21:38:06.4615198Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4615327Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4615408Z xmask = xindex < xnumel 2023-01-11T21:38:06.4615473Z x0 = xindex 2023-01-11T21:38:06.4615594Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4615679Z tmp1 = tl.sigmoid(tmp0) 2023-01-11T21:38:06.4615815Z tl.store(in_out_ptr0 + (x0 + 
tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4615904Z ''') 2023-01-11T21:38:06.4615909Z 2023-01-11T21:38:06.4615914Z 2023-01-11T21:38:06.4616006Z async_compile.wait(globals()) 2023-01-11T21:38:06.4616083Z del async_compile 2023-01-11T21:38:06.4616088Z 2023-01-11T21:38:06.4616161Z def call(args): 2023-01-11T21:38:06.4616301Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.4616378Z args.clear() 2023-01-11T21:38:06.4616470Z with torch.cuda.device(0): 2023-01-11T21:38:06.4616671Z buf0 = empty_strided((2, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4616836Z aten.addmm.out(primals_2, primals_3, as_strided(primals_1, (8, 16), (1, 8)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.4616912Z del primals_1 2023-01-11T21:38:06.4616987Z del primals_2 2023-01-11T21:38:06.4617070Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.4617216Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4617355Z triton_fused_sigmoid_0.run(buf1, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.4617462Z return (buf1, primals_3, buf1, ) 2023-01-11T21:38:06.4617467Z 2023-01-11T21:38:06.4617471Z 2023-01-11T21:38:06.4617551Z if __name__ == "__main__": 2023-01-11T21:38:06.4617669Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4617798Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4618003Z primals_1 = rand_strided((16, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4618196Z primals_2 = rand_strided((16, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4618396Z primals_3 = rand_strided((2, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4618538Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.4618544Z 2023-01-11T21:38:06.4618613Z ok (0.437s) 2023-01-11T21:38:06.4619067Z test_linear2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4619242Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4619503Z [2023-01-11 21:35:07,244] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 643 2023-01-11T21:38:06.4619766Z [2023-01-11 21:35:07,417] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 643 2023-01-11T21:38:06.4620183Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4620319Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4620575Z [2023-01-11 21:35:08,030] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 644 2023-01-11T21:38:06.4620583Z 2023-01-11T21:38:06.4620677Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4620752Z import torch 2023-01-11T21:38:06.4620828Z import random 2023-01-11T21:38:06.4620949Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4621075Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4621080Z 2023-01-11T21:38:06.4621165Z aten = torch.ops.aten 2023-01-11T21:38:06.4621304Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4621401Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4621406Z 2023-01-11T21:38:06.4621475Z import triton 2023-01-11T21:38:06.4621569Z import triton.language as tl 2023-01-11T21:38:06.4621699Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4621841Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4621846Z 2023-01-11T21:38:06.4621851Z 2023-01-11T21:38:06.4622007Z triton_fused_relu_0 = async_compile.triton(''' 2023-01-11T21:38:06.4622113Z import triton 2023-01-11T21:38:06.4622206Z import triton.language as tl 2023-01-11T21:38:06.4622315Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4622414Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4622549Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4622675Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4622680Z 2023-01-11T21:38:06.4623079Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4623158Z @triton.jit 2023-01-11T21:38:06.4623283Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4623359Z xnumel = 16 2023-01-11T21:38:06.4623449Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4623580Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4623665Z xmask = xindex < xnumel 2023-01-11T21:38:06.4623737Z x0 = xindex 2023-01-11T21:38:06.4623841Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4623959Z tmp1 = tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) 2023-01-11T21:38:06.4624095Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4624173Z ''') 2023-01-11T21:38:06.4624185Z 2023-01-11T21:38:06.4624189Z 2023-01-11T21:38:06.4624342Z triton_fused_le_relu_3_1 = async_compile.triton(''' 2023-01-11T21:38:06.4624418Z import triton 2023-01-11T21:38:06.4624510Z import triton.language as tl 2023-01-11T21:38:06.4624654Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4624755Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4624888Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4625014Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4625019Z 2023-01-11T21:38:06.4625440Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i1', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4625510Z @triton.jit 2023-01-11T21:38:06.4625670Z def triton_(in_out_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4625749Z xnumel = 16 2023-01-11T21:38:06.4625871Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4626000Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4626085Z xmask = xindex < xnumel 2023-01-11T21:38:06.4626160Z x0 = xindex 2023-01-11T21:38:06.4626258Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4626374Z tmp1 = tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) 2023-01-11T21:38:06.4626449Z tmp2 = 0 2023-01-11T21:38:06.4626531Z tmp3 = tmp1 <= tmp2 2023-01-11T21:38:06.4626672Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4626807Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4626897Z ''') 2023-01-11T21:38:06.4626902Z 2023-01-11T21:38:06.4626907Z 2023-01-11T21:38:06.4627001Z async_compile.wait(globals()) 2023-01-11T21:38:06.4627075Z del async_compile 2023-01-11T21:38:06.4627080Z 2023-01-11T21:38:06.4627156Z def call(args): 2023-01-11T21:38:06.4627339Z primals_1, primals_2, primals_3, primals_4, primals_5, primals_6, primals_7, primals_8, primals_9 = args 2023-01-11T21:38:06.4627417Z args.clear() 2023-01-11T21:38:06.4627511Z with torch.cuda.device(0): 2023-01-11T21:38:06.4627718Z buf0 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4627889Z aten.addmm.out(primals_2, primals_9, as_strided(primals_1, (8, 8), (1, 8)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.4627963Z del primals_1 2023-01-11T21:38:06.4628044Z del primals_2 2023-01-11T21:38:06.4628163Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.4628260Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4628392Z triton_fused_relu_0.run(buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4628594Z buf2 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4628757Z aten.addmm.out(primals_4, buf1, as_strided(primals_3, (8, 8), (1, 8)), beta=1, alpha=1, out=buf2) 2023-01-11T21:38:06.4628831Z del primals_4 2023-01-11T21:38:06.4628922Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:06.4629054Z triton_fused_relu_0.run(buf3, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4629257Z buf4 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4629421Z aten.addmm.out(primals_6, buf3, as_strided(primals_5, (8, 8), (1, 8)), beta=1, alpha=1, out=buf4) 2023-01-11T21:38:06.4629501Z del primals_6 2023-01-11T21:38:06.4629593Z buf5 = buf4; del buf4 # reuse 2023-01-11T21:38:06.4629723Z triton_fused_relu_0.run(buf5, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4629912Z buf6 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4630072Z aten.addmm.out(primals_8, buf5, as_strided(primals_7, (8, 8), (1, 8)), beta=1, alpha=1, out=buf6) 2023-01-11T21:38:06.4630152Z del primals_8 2023-01-11T21:38:06.4630244Z buf7 = buf6; del buf6 # reuse 2023-01-11T21:38:06.4630440Z buf8 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4630582Z triton_fused_le_relu_3_1.run(buf7, buf8, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4630820Z return (buf7, primals_9, buf1, buf3, buf5, buf8, as_strided(primals_7, (8, 8), (8, 1)), as_strided(primals_5, (8, 8), (8, 1)), as_strided(primals_3, (8, 8), (8, 1)), ) 
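# (annotation) The return tuple above keeps everything the backward pass will
# need: the three intermediate relu outputs (buf1, buf3, buf5), the <= 0 mask
# buf8 computed alongside the final relu, and transposed as_strided views of
# the weights. Below is a minimal sketch of the kind of module that would
# produce a trace like this -- an assumption for illustration only; the real
# test_linear2 body may differ.
def _sketch_linear2_repro():
    # Four Linear(8, 8) + ReLU stages match the four aten.addmm.out calls and
    # the shared in-place triton_fused_relu_0 kernel in this dump; parameters
    # require grad by default, so AOT autograd saves the activations above.
    import torch
    mod = torch.nn.Sequential(
        torch.nn.Linear(8, 8), torch.nn.ReLU(),
        torch.nn.Linear(8, 8), torch.nn.ReLU(),
        torch.nn.Linear(8, 8), torch.nn.ReLU(),
        torch.nn.Linear(8, 8), torch.nn.ReLU(),
    ).cuda()
    compiled = torch.compile(mod)  # Dynamo traces, Inductor generates the code
    return compiled(torch.randn(2, 8, device="cuda"))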
2023-01-11T21:38:06.4630827Z 2023-01-11T21:38:06.4630831Z 2023-01-11T21:38:06.4630913Z if __name__ == "__main__": 2023-01-11T21:38:06.4631038Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4631160Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4631365Z primals_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4631567Z primals_2 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4631770Z primals_3 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4631969Z primals_4 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4632174Z primals_5 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4632377Z primals_6 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4632580Z primals_7 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4632777Z primals_8 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4632979Z primals_9 = rand_strided((2, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4633190Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4, primals_5, primals_6, primals_7, primals_8, primals_9])) 2023-01-11T21:38:06.4633196Z 2023-01-11T21:38:06.4633464Z [2023-01-11 21:35:08,117] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 644 2023-01-11T21:38:06.4633470Z 2023-01-11T21:38:06.4633569Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4633647Z import torch 2023-01-11T21:38:06.4633726Z import random 2023-01-11T21:38:06.4633851Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4633971Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4633976Z 2023-01-11T21:38:06.4634060Z aten = torch.ops.aten 2023-01-11T21:38:06.4634199Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4634325Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4634332Z 2023-01-11T21:38:06.4634408Z import triton 2023-01-11T21:38:06.4634500Z import triton.language as tl 2023-01-11T21:38:06.4634627Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4634767Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4634772Z 2023-01-11T21:38:06.4634777Z 2023-01-11T21:38:06.4634925Z triton_fused_relu_0 = async_compile.triton(''' 2023-01-11T21:38:06.4635003Z import triton 2023-01-11T21:38:06.4635094Z import triton.language as tl 2023-01-11T21:38:06.4635209Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4635327Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4635477Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4635620Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4635625Z 2023-01-11T21:38:06.4636027Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4636094Z @triton.jit 2023-01-11T21:38:06.4636218Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4636291Z xnumel = 16 2023-01-11T21:38:06.4636388Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4636518Z 
xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4636601Z xmask = xindex < xnumel 2023-01-11T21:38:06.4636671Z x0 = xindex 2023-01-11T21:38:06.4636786Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4636941Z tmp1 = tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) 2023-01-11T21:38:06.4637077Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4637163Z ''') 2023-01-11T21:38:06.4637168Z 2023-01-11T21:38:06.4637173Z 2023-01-11T21:38:06.4637335Z triton_fused_le_relu_3_1 = async_compile.triton(''' 2023-01-11T21:38:06.4637410Z import triton 2023-01-11T21:38:06.4637503Z import triton.language as tl 2023-01-11T21:38:06.4637616Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4637711Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4637841Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4637965Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4637970Z 2023-01-11T21:38:06.4638381Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*i1', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4638457Z @triton.jit 2023-01-11T21:38:06.4638592Z def triton_(in_out_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4638665Z xnumel = 16 2023-01-11T21:38:06.4638762Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4638885Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4638970Z xmask = xindex < xnumel 2023-01-11T21:38:06.4639040Z x0 = xindex 2023-01-11T21:38:06.4639160Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4639273Z tmp1 = tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) 2023-01-11T21:38:06.4639360Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.4639426Z tmp3 = 0 2023-01-11T21:38:06.4639499Z tmp4 = tmp2 <= tmp3 2023-01-11T21:38:06.4639639Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4639771Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.4639859Z ''') 2023-01-11T21:38:06.4639865Z 2023-01-11T21:38:06.4639869Z 2023-01-11T21:38:06.4639962Z async_compile.wait(globals()) 2023-01-11T21:38:06.4640040Z del async_compile 2023-01-11T21:38:06.4640045Z 2023-01-11T21:38:06.4640119Z def call(args): 2023-01-11T21:38:06.4640321Z primals_1, primals_2, primals_3, primals_4, primals_5, primals_6, primals_7, primals_8, primals_9 = args 2023-01-11T21:38:06.4640390Z args.clear() 2023-01-11T21:38:06.4640482Z with torch.cuda.device(0): 2023-01-11T21:38:06.4640681Z buf0 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4640853Z aten.addmm.out(primals_2, primals_9, as_strided(primals_1, (8, 8), (1, 8)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.4640929Z del primals_1 2023-01-11T21:38:06.4641006Z del primals_2 2023-01-11T21:38:06.4641096Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.4641181Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4641313Z triton_fused_relu_0.run(buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4641508Z buf2 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4641673Z aten.addmm.out(primals_4, buf1, as_strided(primals_3, (8, 8), (1, 8)), beta=1, alpha=1, out=buf2) 2023-01-11T21:38:06.4641751Z 
del primals_4 2023-01-11T21:38:06.4641842Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:06.4641971Z triton_fused_relu_0.run(buf3, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4642161Z buf4 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4642323Z aten.addmm.out(primals_6, buf3, as_strided(primals_5, (8, 8), (1, 8)), beta=1, alpha=1, out=buf4) 2023-01-11T21:38:06.4642400Z del primals_6 2023-01-11T21:38:06.4642486Z buf5 = buf4; del buf4 # reuse 2023-01-11T21:38:06.4642613Z triton_fused_relu_0.run(buf5, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4642835Z buf6 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4642990Z aten.addmm.out(primals_8, buf5, as_strided(primals_7, (8, 8), (1, 8)), beta=1, alpha=1, out=buf6) 2023-01-11T21:38:06.4643068Z del primals_8 2023-01-11T21:38:06.4643153Z buf7 = buf6; del buf6 # reuse 2023-01-11T21:38:06.4643345Z buf8 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4643485Z triton_fused_le_relu_3_1.run(buf7, buf8, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4643690Z return (buf7, primals_9, buf1, buf3, buf5, buf8, as_strided(primals_7, (8, 8), (8, 1)), as_strided(primals_5, (8, 8), (8, 1)), as_strided(primals_3, (8, 8), (8, 1)), ) 2023-01-11T21:38:06.4643695Z 2023-01-11T21:38:06.4643700Z 2023-01-11T21:38:06.4643779Z if __name__ == "__main__": 2023-01-11T21:38:06.4643897Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4644027Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4644233Z primals_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4644434Z primals_2 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4644631Z primals_3 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4644828Z primals_4 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4645027Z primals_5 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4645224Z primals_6 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4645422Z primals_7 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4645614Z primals_8 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4645811Z primals_9 = rand_strided((2, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4646023Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4, primals_5, primals_6, primals_7, primals_8, primals_9])) 2023-01-11T21:38:06.4646029Z 2023-01-11T21:38:06.4646094Z ok (1.051s) 2023-01-11T21:38:06.4646249Z test_linear_binary_cuda (__main__.CudaTests) ... ok (0.001s) 2023-01-11T21:38:06.4646714Z test_linear_packed_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4646845Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4647104Z [2023-01-11 21:35:08,144] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 645 2023-01-11T21:38:06.4647369Z [2023-01-11 21:35:08,148] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 645 2023-01-11T21:38:06.4648171Z [2023-01-11 21:35:08,243] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 646 2023-01-11T21:38:06.4648432Z [2023-01-11 21:35:08,246] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 646 2023-01-11T21:38:06.4649249Z [2023-01-11 21:35:08,264] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 647 2023-01-11T21:38:06.4649507Z [2023-01-11 21:35:08,267] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 647 2023-01-11T21:38:06.4650301Z [2023-01-11 21:35:08,284] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 648 2023-01-11T21:38:06.4650562Z [2023-01-11 21:35:08,286] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 648 2023-01-11T21:38:06.4651349Z [2023-01-11 21:35:08,308] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 649 2023-01-11T21:38:06.4651610Z [2023-01-11 21:35:08,310] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 649 2023-01-11T21:38:06.4652417Z [2023-01-11 21:35:08,401] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 650 2023-01-11T21:38:06.4652671Z [2023-01-11 21:35:08,404] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 650 2023-01-11T21:38:06.4653458Z [2023-01-11 21:35:08,418] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 651 2023-01-11T21:38:06.4653464Z 2023-01-11T21:38:06.4653560Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4653637Z import torch 2023-01-11T21:38:06.4653713Z import random 2023-01-11T21:38:06.4653833Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4653950Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4653964Z 2023-01-11T21:38:06.4654039Z aten = torch.ops.aten 2023-01-11T21:38:06.4654177Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4654310Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4654315Z 2023-01-11T21:38:06.4654389Z import triton 2023-01-11T21:38:06.4654581Z import triton.language as tl 2023-01-11T21:38:06.4654707Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4654850Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4654856Z 2023-01-11T21:38:06.4654860Z 2023-01-11T21:38:06.4654945Z async_compile.wait(globals()) 2023-01-11T21:38:06.4655022Z del async_compile 2023-01-11T21:38:06.4655027Z 2023-01-11T21:38:06.4655100Z def call(args): 2023-01-11T21:38:06.4655187Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4655264Z args.clear() 2023-01-11T21:38:06.4655355Z with torch.cuda.device(0): 2023-01-11T21:38:06.4655562Z buf0 = empty_strided((6, 30), (30, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4655757Z aten.addmm.out(arg1_1, as_strided(arg2_1, (6, 10), (10, 1)), as_strided(arg0_1, (10, 30), (1, 10)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.4655827Z del arg0_1 2023-01-11T21:38:06.4655899Z del arg1_1 2023-01-11T21:38:06.4655969Z del arg2_1 2023-01-11T21:38:06.4656089Z return
(as_strided(buf0, (2, 3, 30), (90, 30, 1)), ) 2023-01-11T21:38:06.4656094Z 2023-01-11T21:38:06.4656101Z 2023-01-11T21:38:06.4656181Z if __name__ == "__main__": 2023-01-11T21:38:06.4656306Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4656443Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4656675Z arg0_1 = rand_strided((30, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4656893Z arg1_1 = rand_strided((30, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4657182Z arg2_1 = rand_strided((2, 3, 10), (30, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4657326Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4657335Z 2023-01-11T21:38:06.4657340Z 2023-01-11T21:38:06.4657442Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4657516Z import torch 2023-01-11T21:38:06.4657589Z import random 2023-01-11T21:38:06.4657708Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4657876Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4657889Z 2023-01-11T21:38:06.4657965Z aten = torch.ops.aten 2023-01-11T21:38:06.4658101Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4658195Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4658200Z 2023-01-11T21:38:06.4658274Z import triton 2023-01-11T21:38:06.4658363Z import triton.language as tl 2023-01-11T21:38:06.4658487Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4658628Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4658634Z 2023-01-11T21:38:06.4658638Z 2023-01-11T21:38:06.4658728Z async_compile.wait(globals()) 2023-01-11T21:38:06.4658801Z del async_compile 2023-01-11T21:38:06.4658806Z 2023-01-11T21:38:06.4658886Z def call(args): 2023-01-11T21:38:06.4658970Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4659044Z args.clear() 2023-01-11T21:38:06.4659132Z with torch.cuda.device(0): 2023-01-11T21:38:06.4659334Z buf0 = empty_strided((6, 30), (30, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4659510Z aten.addmm.out(arg1_1, as_strided(arg2_1, (6, 10), (10, 1)), as_strided(arg0_1, (10, 30), (1, 10)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.4659577Z del arg0_1 2023-01-11T21:38:06.4659649Z del arg1_1 2023-01-11T21:38:06.4659718Z del arg2_1 2023-01-11T21:38:06.4659830Z return (as_strided(buf0, (2, 3, 30), (90, 30, 1)), ) 2023-01-11T21:38:06.4659835Z 2023-01-11T21:38:06.4659839Z 2023-01-11T21:38:06.4659920Z if __name__ == "__main__": 2023-01-11T21:38:06.4660035Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4660248Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4660454Z arg0_1 = rand_strided((30, 10), (10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4660645Z arg1_1 = rand_strided((30, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4660859Z arg2_1 = rand_strided((2, 3, 10), (30, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4660991Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4660997Z 2023-01-11T21:38:06.4661001Z 2023-01-11T21:38:06.4661101Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4661177Z import torch 2023-01-11T21:38:06.4661253Z import random 2023-01-11T21:38:06.4661374Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4661499Z from torch._inductor.codecache import AsyncCompile 
2023-01-11T21:38:06.4661504Z 2023-01-11T21:38:06.4661582Z aten = torch.ops.aten 2023-01-11T21:38:06.4661719Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4661820Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4661825Z 2023-01-11T21:38:06.4661900Z import triton 2023-01-11T21:38:06.4661995Z import triton.language as tl 2023-01-11T21:38:06.4662124Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4662271Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4662276Z 2023-01-11T21:38:06.4662280Z 2023-01-11T21:38:06.4662375Z async_compile.wait(globals()) 2023-01-11T21:38:06.4662447Z del async_compile 2023-01-11T21:38:06.4662451Z 2023-01-11T21:38:06.4662527Z def call(args): 2023-01-11T21:38:06.4662608Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4662685Z args.clear() 2023-01-11T21:38:06.4662777Z with torch.cuda.device(0): 2023-01-11T21:38:06.4662979Z buf0 = empty_strided((6, 30), (30, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4663129Z aten.mm.out(as_strided(arg1_1, (6, 10), (10, 1)), as_strided(arg0_1, (10, 30), (1, 10)), out=buf0) 2023-01-11T21:38:06.4663202Z del arg0_1 2023-01-11T21:38:06.4663277Z del arg1_1 2023-01-11T21:38:06.4663390Z return (as_strided(buf0, (2, 3, 30), (90, 30, 1)), ) 2023-01-11T21:38:06.4663395Z 2023-01-11T21:38:06.4663400Z 2023-01-11T21:38:06.4663481Z if __name__ == "__main__": 2023-01-11T21:38:06.4663630Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4663759Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4663964Z arg0_1 = rand_strided((30, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4664172Z arg1_1 = rand_strided((2, 3, 10), (30, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4664288Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4664293Z 2023-01-11T21:38:06.4664305Z 2023-01-11T21:38:06.4664397Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4664472Z import torch 2023-01-11T21:38:06.4664548Z import random 2023-01-11T21:38:06.4664672Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4664797Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4664802Z 2023-01-11T21:38:06.4664884Z aten = torch.ops.aten 2023-01-11T21:38:06.4665022Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4665114Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4665119Z 2023-01-11T21:38:06.4665195Z import triton 2023-01-11T21:38:06.4665290Z import triton.language as tl 2023-01-11T21:38:06.4665421Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4665586Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4665591Z 2023-01-11T21:38:06.4665596Z 2023-01-11T21:38:06.4665707Z async_compile.wait(globals()) 2023-01-11T21:38:06.4665792Z del async_compile 2023-01-11T21:38:06.4665797Z 2023-01-11T21:38:06.4665873Z def call(args): 2023-01-11T21:38:06.4665948Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4666024Z args.clear() 2023-01-11T21:38:06.4666147Z with torch.cuda.device(0): 2023-01-11T21:38:06.4666348Z buf0 = empty_strided((6, 30), (30, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4666496Z aten.mm.out(as_strided(arg1_1, (6, 10), (10, 1)), as_strided(arg0_1, (10, 30), (1, 10)), out=buf0) 2023-01-11T21:38:06.4666572Z del arg0_1 2023-01-11T21:38:06.4666647Z del arg1_1 2023-01-11T21:38:06.4666754Z return (as_strided(buf0, 
(2, 3, 30), (90, 30, 1)), ) 2023-01-11T21:38:06.4666759Z 2023-01-11T21:38:06.4666770Z 2023-01-11T21:38:06.4666845Z if __name__ == "__main__": 2023-01-11T21:38:06.4666963Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4667088Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4667294Z arg0_1 = rand_strided((30, 10), (10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4667501Z arg1_1 = rand_strided((2, 3, 10), (30, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4667630Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4667635Z 2023-01-11T21:38:06.4667640Z 2023-01-11T21:38:06.4667740Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4667816Z import torch 2023-01-11T21:38:06.4667885Z import random 2023-01-11T21:38:06.4668005Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4668133Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4668138Z 2023-01-11T21:38:06.4668225Z aten = torch.ops.aten 2023-01-11T21:38:06.4668361Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4668458Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4668463Z 2023-01-11T21:38:06.4668539Z import triton 2023-01-11T21:38:06.4668626Z import triton.language as tl 2023-01-11T21:38:06.4668752Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4668892Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4668897Z 2023-01-11T21:38:06.4668904Z 2023-01-11T21:38:06.4668999Z async_compile.wait(globals()) 2023-01-11T21:38:06.4669076Z del async_compile 2023-01-11T21:38:06.4669082Z 2023-01-11T21:38:06.4669157Z def call(args): 2023-01-11T21:38:06.4669243Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4669322Z args.clear() 2023-01-11T21:38:06.4669408Z with torch.cuda.device(0): 2023-01-11T21:38:06.4669642Z buf0 = empty_strided((2, 30), (30, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4669801Z aten.addmm.out(arg1_1, arg2_1, as_strided(arg0_1, (10, 30), (1, 10)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.4669876Z del arg0_1 2023-01-11T21:38:06.4669952Z del arg1_1 2023-01-11T21:38:06.4670026Z del arg2_1 2023-01-11T21:38:06.4670106Z return (buf0, ) 2023-01-11T21:38:06.4670112Z 2023-01-11T21:38:06.4670116Z 2023-01-11T21:38:06.4670192Z if __name__ == "__main__": 2023-01-11T21:38:06.4670310Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4670440Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4670642Z arg0_1 = rand_strided((30, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4670842Z arg1_1 = rand_strided((30, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4671045Z arg2_1 = rand_strided((2, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4671175Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4671180Z 2023-01-11T21:38:06.4671185Z 2023-01-11T21:38:06.4671285Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4671355Z import torch 2023-01-11T21:38:06.4671429Z import random 2023-01-11T21:38:06.4671549Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4671670Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4671675Z 2023-01-11T21:38:06.4671760Z aten = torch.ops.aten 2023-01-11T21:38:06.4671898Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4672024Z 
async_compile = AsyncCompile() 2023-01-11T21:38:06.4672030Z 2023-01-11T21:38:06.4672105Z import triton 2023-01-11T21:38:06.4672192Z import triton.language as tl 2023-01-11T21:38:06.4672320Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4672461Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4672467Z 2023-01-11T21:38:06.4672471Z 2023-01-11T21:38:06.4672565Z async_compile.wait(globals()) 2023-01-11T21:38:06.4672643Z del async_compile 2023-01-11T21:38:06.4672648Z 2023-01-11T21:38:06.4672725Z def call(args): 2023-01-11T21:38:06.4672813Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4672890Z args.clear() 2023-01-11T21:38:06.4672977Z with torch.cuda.device(0): 2023-01-11T21:38:06.4673181Z buf0 = empty_strided((2, 30), (30, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4673339Z aten.addmm.out(arg1_1, arg2_1, as_strided(arg0_1, (10, 30), (1, 10)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.4673418Z del arg0_1 2023-01-11T21:38:06.4673493Z del arg1_1 2023-01-11T21:38:06.4673568Z del arg2_1 2023-01-11T21:38:06.4673646Z return (buf0, ) 2023-01-11T21:38:06.4673652Z 2023-01-11T21:38:06.4673656Z 2023-01-11T21:38:06.4673731Z if __name__ == "__main__": 2023-01-11T21:38:06.4673853Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4673981Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4674186Z arg0_1 = rand_strided((30, 10), (10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4674384Z arg1_1 = rand_strided((30, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4674586Z arg2_1 = rand_strided((2, 10), (10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4674721Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4674726Z 2023-01-11T21:38:06.4674992Z [2023-01-11 21:35:08,421] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 651 2023-01-11T21:38:06.4675467Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
[2023-01-11 21:35:08,421] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 651
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:08,435] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 652
[2023-01-11 21:35:08,437] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 652

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 30), (30, 1), device='cuda', dtype=torch.float32)
        aten.mm.out(arg1_1, as_strided(arg0_1, (10, 30), (1, 10)), out=buf0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((30, 10), (10, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((2, 10), (10, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))
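This wrapper (and the fp16 twin that follows) is the bias-free variant: it reduces to a plain aten.mm.out against the same transposed weight view. In eager terms, as a sketch with shapes assumed from the dump:

import torch

weight = torch.randn(30, 10, device='cuda')  # arg0_1 in the dump
x = torch.randn(2, 10, device='cuda')        # arg1_1 in the dump

# The (10, 30)/(1, 10) as_strided view is the transpose of the contiguous weight.
assert torch.allclose(torch.mm(x, weight.as_strided((10, 30), (1, 10))),
                      x @ weight.t())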
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 30), (30, 1), device='cuda', dtype=torch.float16)
        aten.mm.out(arg1_1, as_strided(arg0_1, (10, 30), (1, 10)), out=buf0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((30, 10), (10, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((2, 10), (10, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.318s)
test_linear_permute_fusion (__main__.CudaTests) ... ok (0.004s)
test_linear_unary_cuda (__main__.CudaTests) ... ok (0.001s)
test_linspace1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:08,477] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 653
[2023-01-11 21:35:08,548] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 653
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:08,581] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 654
[2023-01-11 21:35:08,649] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 654

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 7
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp4 = tl.load(in_ptr0 + (x0), xmask)
    tmp0 = 0.125
    tmp1 = x0
    tmp2 = tmp0 * tmp1
    tmp3 = tmp2 + tmp0
    tmp5 = tmp3 + tmp4
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 7), (7, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, buf0, 7, grid=grid(7), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 7), (7, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 7
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp4 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp0 = 0.125
    tmp1 = x0
    tmp2 = tmp0 * tmp1
    tmp3 = tmp2 + tmp0
    tmp5 = tmp4.to(tl.float32)
    tmp6 = tmp3 + tmp5
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 7), (7, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, buf0, 7, grid=grid(7), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 7), (7, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.207s)
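In both linspace1 kernels the traced linspace has been folded into index arithmetic: with tmp0 = 0.125 and tmp1 = x0, the stored value is 0.125 * i + 0.125 + input[i] for i in 0..6. A quick eager check of that reading (the original linspace endpoints are an assumption; only the fused arithmetic is visible in the log):

import torch

x = torch.randn(1, 7, device='cuda')
i = torch.arange(7, device='cuda', dtype=torch.float32)
fused = 0.125 * i + 0.125 + x                               # what triton_fused_add_0 stores
eager = torch.linspace(0.125, 0.875, 7, device='cuda') + x  # one expression that traces to it
assert torch.allclose(fused, eager)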
test_linspace2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:08,680] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 655
[2023-01-11 21:35:08,745] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 655
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:08,774] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 656
[2023-01-11 21:35:08,839] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 656

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    tmp4 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None)
    tmp0 = 0.0
    tmp1 = 0
    tmp2 = tmp0 * tmp1
    tmp3 = tmp2 + tmp1
    tmp5 = tmp3 + tmp4
    tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp5, None)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 1), (1, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, buf0, 1, grid=grid(1), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 1), (1, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    tmp4 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32)
    tmp0 = 0.0
    tmp1 = 0
    tmp2 = tmp0 * tmp1
    tmp3 = tmp2 + tmp1
    tmp5 = tmp4.to(tl.float32)
    tmp6 = tmp3 + tmp5
    tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp6, None)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 1), (1, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, buf0, 1, grid=grid(1), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 1), (1, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.190s)
test_linspace3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:08,867] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 657
[2023-01-11 21:35:08,868] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 657
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:08,894] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 658
[2023-01-11 21:35:08,896] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 658

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

async_compile.wait(globals())
del async_compile

def call(args):
    buf0 = empty_strided((0, ), (1, ), device='cuda', dtype=torch.float32)
    return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    print_performance(lambda: call([]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

async_compile.wait(globals())
del async_compile

def call(args):
    buf0 = empty_strided((0, ), (1, ), device='cuda', dtype=torch.float32)
    return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    print_performance(lambda: call([]))
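Both graphs for this test compile away entirely: call takes no inputs and just allocates a zero-length buffer, i.e. the traced linspace had zero steps. A sketch of the degenerate case (the endpoints here are assumed purely for illustration):

import torch

# A steps=0 linspace yields a (0,)-shaped tensor, matching empty_strided((0, ), (1, )).
assert torch.linspace(0, 1, steps=0).shape == (0,)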
ok (0.056s)
test_list_clearing_cuda (__main__.CudaTests) ... [2023-01-11 21:35:08,923] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None
[2023-01-11 21:35:09,221] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None
[2023-01-11 21:35:09,246] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None
[2023-01-11 21:35:09,383] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 25
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    x_1, y_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((5, 5), (5, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(x_1, y_1, buf0, 25, grid=grid(25), stream=stream0)
        del x_1
        del y_1
        buf1 = empty_strided((5, 5), (5, 1), device='cuda', dtype=torch.float32)
        aten.mm.out(buf0, buf0, out=buf1)
        return (buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    x_1 = rand_strided((5, 5), (5, 1), device='cuda:0', dtype=torch.float32)
    y_1 = rand_strided((5, 5), (5, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([x_1, y_1]))
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 25
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    x_1, y_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((5, 5), (5, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(x_1, y_1, buf0, 25, grid=grid(25), stream=stream0)
        del x_1
        del y_1
        buf1 = empty_strided((5, 5), (5, 1), device='cuda', dtype=torch.float32)
        aten.mm.out(buf0, buf0, out=buf1)
        return (buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    x_1 = rand_strided((5, 5), (5, 1), device='cuda:0', dtype=torch.float32)
    y_1 = rand_strided((5, 5), (5, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([x_1, y_1]))

ok (0.486s)
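Functionally the wrapper is (x + y) @ (x + y), but what this test exercises is the memory discipline around it: args.clear() plus the del x_1 / del y_1 right after the fused add drop the last references to the inputs before the matmul runs, so their storage can be reclaimed. A rough eager rendering, with names assumed:

import torch

def eager_equivalent(x, y):
    tmp = x + y         # triton_fused_add_0: elementwise add of the two 5x5 inputs
    del x, y            # mirrors the generated wrapper freeing inputs once dead
    return tmp.mm(tmp)  # aten.mm.out(buf0, buf0, out=buf1)

out = eager_equivalent(torch.randn(5, 5, device='cuda'), torch.randn(5, 5, device='cuda'))
assert out.shape == (5, 5)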
test_log1p_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:09,398] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 659
[2023-01-11 21:35:09,553] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 659
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:09,569] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 660
[2023-01-11 21:35:09,577] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 660
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:09,593] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 661
[2023-01-11 21:35:09,750] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 661
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:09,767] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 662
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_log1p_mul_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp1 = tl.libdevice.log1p(tmp0)
    tmp3 = tl.libdevice.log1p(tmp2)
    tmp4 = 2
    tmp5 = tmp3 * tmp4
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))
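Each log1p kernel produces both outputs of the test in a single pass over the input: log1p(x) into out_ptr0 and log1p(x) * 2 into out_ptr1. Note that the fp16 variants upcast every load with .to(tl.float32) and apply tl.libdevice.log1p in fp32 before storing back to fp16. An eager sketch of the same output pair:

import torch

x = torch.randn(64, device='cuda', dtype=torch.float16)
a = torch.log1p(x)      # out_ptr0
b = torch.log1p(x) * 2  # out_ptr1; Inductor fuses both into one read of x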
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_log1p_mul_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp1 = tl.libdevice.log1p(tmp0)
    tmp3 = tl.libdevice.log1p(tmp2)
    tmp4 = 2
    tmp5 = tmp3 * tmp4
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_log1p_mul_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 201
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp1 = tl.libdevice.log1p(tmp0)
    tmp3 = tl.libdevice.log1p(tmp2)
    tmp4 = 2
    tmp5 = tmp3 * tmp4
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

[2023-01-11 21:35:09,775] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 662
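For the 201-element variants the launch is grid(201) with a compile-time XBLOCK; since 201 is not a multiple of any power-of-two block size, the xmask = xindex < xnumel guard is what keeps the trailing lanes of the last program from reading or writing out of bounds. A sketch of the same masking logic in plain Python (the block size is chosen arbitrarily for illustration):

import math

xnumel, XBLOCK = 201, 64                   # assumed block size
num_programs = math.ceil(xnumel / XBLOCK)  # what grid(201) resolves to
for pid in range(num_programs):
    xoffset = pid * XBLOCK
    lanes = range(xoffset, xoffset + XBLOCK)
    active = [i for i in lanes if i < xnumel]  # xmask
# The last program covers lanes 192..255, but only 192..200 are active.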
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:09,791] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 663
[2023-01-11 21:35:09,935] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 663
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:09,952] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 664
[2023-01-11 21:35:09,960] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 664
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:09,975] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 665
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_log1p_mul_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 201
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp1 = tl.libdevice.log1p(tmp0)
    tmp3 = tl.libdevice.log1p(tmp2)
    tmp4 = 2
    tmp5 = tmp3 * tmp4
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_log1p_mul_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp2 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.libdevice.log1p(tmp0)
    tmp3 = tl.libdevice.log1p(tmp2)
    tmp4 = 2
    tmp5 = tmp3 * tmp4
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


[output code repeated: same module specialized for torch.float16 with xnumel = 64, size_hints=[64], divisible_by_16=(0, 1, 2, 3); the fp16 loads are cast to tl.float32 as in the first module]
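Note how the instance_descriptor changes between the dumps: with xnumel = 201 it is divisible_by_16=(0, 1, 2) (only the three pointers), while with xnumel = 64 it grows to (0, 1, 2, 3) because the size argument is itself divisible by 16. A sketch of that selection rule as it appears from these logs (assumed semantics; the zeros stand in for freshly allocated, 16-byte-aligned pointers):

# Assumed model: an argument index lands in divisible_by_16 when its value
# (pointer address or integer size) is a multiple of 16.
def divisible_by_16_args(arg_values):
    return tuple(i for i, v in enumerate(arg_values) if v % 16 == 0)

assert divisible_by_16_args([0, 0, 0, 64]) == (0, 1, 2, 3)   # xnumel = 64
assert divisible_by_16_args([0, 0, 0, 201]) == (0, 1, 2)     # xnumel = 201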
[2023-01-11 21:35:10,128] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 665
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:10,144] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 666
[2023-01-11 21:35:10,152] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 666
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:10,168] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 667
[2023-01-11 21:35:10,318] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 667
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
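The UserWarning repeated throughout this run comes from the helper at test_torchinductor.py:246, which flattens a tensor over its whole storage via the deprecated tensor.storage().size(). A sketch of the replacement the warning itself suggests (note that UntypedStorage is sized in bytes, so the element count has to be recovered through element_size()):

import torch

x = torch.randn(8, 4)

# Deprecated: TypedStorage.size() returns an element count but now warns.
buffer_old = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()

# Replacement: untyped storage reports bytes, so convert back to elements.
numel = x.untyped_storage().nbytes() // x.element_size()
buffer_new = torch.as_strided(x, (numel,), (1,), 0).clone()

assert torch.equal(buffer_old, buffer_new)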
[2023-01-11 21:35:10,334] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 668

[output code repeated: same module specialized for torch.float32 with xnumel = 201, size_hints=[256], divisible_by_16=(0, 1, 2)]
[output code repeated: the torch.float16 / xnumel = 201 module shown earlier is printed again verbatim]
[output code repeated: same module specialized for torch.float64 with xnumel = 64, size_hints=[64], divisible_by_16=(0, 1, 2, 3)]

[2023-01-11 21:35:10,342] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 668
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:10,358] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 669
[2023-01-11 21:35:10,517] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 669
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
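All of these dumps launch the kernel with grid(xnumel): size_hints is the next power of two at or above xnumel (256 for 201, 64 for 64), and the tail of the final program instance is disabled by xmask = xindex < xnumel. A sketch of the 1D launch arithmetic (grid_1d is an assumed stand-in for torch._inductor's grid helper):

# One program instance covers XBLOCK contiguous elements; the last one is
# partially masked, mirroring `xmask = xindex < xnumel` in the kernels.
def grid_1d(xnumel: int, xblock: int) -> int:
    return (xnumel + xblock - 1) // xblock  # ceil division

for xnumel, xblock in [(201, 64), (201, 256), (64, 64)]:
    programs = grid_1d(xnumel, xblock)
    tail = programs * xblock - xnumel  # lanes switched off by xmask
    print(f"xnumel={xnumel}, XBLOCK={xblock}: {programs} program(s), {tail} masked lane(s)")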
[2023-01-11 21:35:10,533] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 670
[2023-01-11 21:35:10,541] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 670
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:10,556] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 671

[output code repeated: the torch.float64 / xnumel = 64 module is printed again verbatim]
[output code repeated: same module specialized for torch.float64 with xnumel = 201, size_hints=[256], divisible_by_16=(0, 1, 2)]
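Each __main__ harness builds its input with torch._dynamo.testing.rand_strided. A rough model of what that helper does for the shapes in these dumps (an assumed reimplementation for illustration, not the actual torch._dynamo source):

import torch

def rand_strided_sketch(size, stride, device="cpu", dtype=torch.float32):
    # Allocate with the requested layout, then fill with random values;
    # integer dtypes need randint rather than randn.
    t = torch.empty_strided(size, stride, device=device, dtype=dtype)
    if dtype.is_floating_point:
        return t.copy_(torch.randn(size, device=device).to(dtype))
    return t.copy_(torch.randint(0, 10, size, device=device, dtype=dtype))

arg0_1 = rand_strided_sketch((201,), (1,), dtype=torch.float16)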
[output code repeated: the torch.float64 / xnumel = 201 module is printed again verbatim]

[2023-01-11 21:35:10,699] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 671
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
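print_performance in these harnesses times call() on the GPU. A rough equivalent using only public CUDA-event APIs (warm-up and iteration counts are arbitrary choices, and torch._inductor.utils.print_performance may measure differently):

import torch

def time_cuda_ms(fn, warmup=5, iters=100):
    for _ in range(warmup):          # exclude one-off allocation/compile cost
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()         # wait for the timed work to finish
    return start.elapsed_time(end) / iters  # milliseconds per call

# e.g. with one of the dumps saved to a file: time_cuda_ms(lambda: call([arg0_1]))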
[2023-01-11 21:35:10,716] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 672
[2023-01-11 21:35:10,724] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 672
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:10,740] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 673
[2023-01-11 21:35:10,892] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 673
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:10,908] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 674

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_log1p_mul_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tmp0.to(tl.float32)
    tmp2 = tl.libdevice.log1p(tmp1)
    tmp4 = tmp3.to(tl.float32)
    tmp5 = tl.libdevice.log1p(tmp4)
    tmp6 = 2
    tmp7 = tmp5 * tmp6
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int32)
    print_performance(lambda: call([arg0_1]))
[output code repeated: the torch.int32 / xnumel = 64 module is printed again verbatim]
[output code repeated: torch.int32 input with xnumel = 201, size_hints=[256], divisible_by_16=(0, 1, 2); outputs allocated as torch.float32]

[2023-01-11 21:35:10,916] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 674
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:10,932] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 675
[2023-01-11 21:35:11,075] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 675
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:11,091] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 676
[2023-01-11 21:35:11,099] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 676
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
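The int32 modules above differ from the float ones in two ways: loads are explicitly converted with .to(tl.float32) before log1p, and call() allocates float32 output buffers for an integer input. Both follow eager-mode type promotion, which can be checked directly:

import torch

# log1p on integer tensors promotes to the default float dtype (float32),
# matching the torch.float32 buffers in the generated call() above.
for dtype in (torch.int32, torch.int64):
    x = torch.arange(4, dtype=dtype)
    assert torch.log1p(x).dtype == torch.float32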
[2023-01-11 21:35:11,115] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 677

[output code repeated: the torch.int32 / xnumel = 201 module is printed again verbatim]
[output code repeated: torch.int64 input with xnumel = 64, size_hints=[64], divisible_by_16=(0, 1, 2, 3); loads cast via .to(tl.float32), outputs allocated as torch.float32]
[output code repeated: the torch.int64 / xnumel = 64 module is printed again verbatim]
2023-01-11T21:38:06.4867573Z xnumel = 64 2023-01-11T21:38:06.4867673Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4867805Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4867883Z xmask = xindex < xnumel 2023-01-11T21:38:06.4867954Z x0 = xindex 2023-01-11T21:38:06.4868147Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4868247Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4868336Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.4868435Z tmp2 = tl.libdevice.log1p(tmp1) 2023-01-11T21:38:06.4868524Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.4868620Z tmp5 = tl.libdevice.log1p(tmp4) 2023-01-11T21:38:06.4868691Z tmp6 = 2 2023-01-11T21:38:06.4868773Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.4868910Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4869044Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.4869132Z ''') 2023-01-11T21:38:06.4869138Z 2023-01-11T21:38:06.4869142Z 2023-01-11T21:38:06.4869236Z async_compile.wait(globals()) 2023-01-11T21:38:06.4869308Z del async_compile 2023-01-11T21:38:06.4869320Z 2023-01-11T21:38:06.4869391Z def call(args): 2023-01-11T21:38:06.4869466Z arg0_1, = args 2023-01-11T21:38:06.4869569Z args.clear() 2023-01-11T21:38:06.4869662Z with torch.cuda.device(0): 2023-01-11T21:38:06.4869862Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4870063Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4870162Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4870304Z triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4870377Z del arg0_1 2023-01-11T21:38:06.4870462Z return (buf0, buf1, ) 2023-01-11T21:38:06.4870467Z 2023-01-11T21:38:06.4870472Z 2023-01-11T21:38:06.4870551Z if __name__ == "__main__": 2023-01-11T21:38:06.4870675Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4870801Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4870999Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4871111Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4871121Z 2023-01-11T21:38:06.4871381Z [2023-01-11 21:35:11,268] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 677 2023-01-11T21:38:06.4871797Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4871932Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4872190Z [2023-01-11 21:35:11,284] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 678 2023-01-11T21:38:06.4872450Z [2023-01-11 21:35:11,292] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 678 2023-01-11T21:38:06.4872458Z 2023-01-11T21:38:06.4872558Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4872634Z import torch 2023-01-11T21:38:06.4872711Z import random 2023-01-11T21:38:06.4872829Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4872947Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4872952Z 2023-01-11T21:38:06.4873061Z aten = torch.ops.aten 2023-01-11T21:38:06.4873202Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4873299Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4873304Z 2023-01-11T21:38:06.4873379Z import triton 2023-01-11T21:38:06.4873471Z import triton.language as tl 2023-01-11T21:38:06.4873599Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4873734Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4873744Z 2023-01-11T21:38:06.4873748Z 2023-01-11T21:38:06.4873908Z triton_fused_log1p_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.4873987Z import triton 2023-01-11T21:38:06.4874084Z import triton.language as tl 2023-01-11T21:38:06.4874199Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4874306Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4874443Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4874573Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4874578Z 2023-01-11T21:38:06.4874994Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4875063Z @triton.jit 2023-01-11T21:38:06.4875207Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4875294Z xnumel = 201 2023-01-11T21:38:06.4875409Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4875566Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4875679Z xmask = xindex < xnumel 2023-01-11T21:38:06.4875751Z x0 = xindex 2023-01-11T21:38:06.4875938Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4876039Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4876130Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.4876227Z tmp2 = tl.libdevice.log1p(tmp1) 2023-01-11T21:38:06.4876316Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.4876416Z tmp5 = tl.libdevice.log1p(tmp4) 2023-01-11T21:38:06.4876489Z tmp6 = 2 2023-01-11T21:38:06.4876563Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.4876699Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4876831Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.4876922Z ''') 2023-01-11T21:38:06.4876927Z 2023-01-11T21:38:06.4876933Z 2023-01-11T21:38:06.4877025Z async_compile.wait(globals()) 2023-01-11T21:38:06.4877108Z del async_compile 
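# Annotation (not part of the generated code): AsyncCompile builds the
# Triton source above in the background; async_compile.wait(globals())
# blocks until every kernel handle in this module is ready before the
# call() wrapper below can launch it.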
2023-01-11T21:38:06.4877113Z 2023-01-11T21:38:06.4877189Z def call(args): 2023-01-11T21:38:06.4877265Z arg0_1, = args 2023-01-11T21:38:06.4877335Z args.clear() 2023-01-11T21:38:06.4877430Z with torch.cuda.device(0): 2023-01-11T21:38:06.4877636Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4877839Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4877932Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4878081Z triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.4878155Z del arg0_1 2023-01-11T21:38:06.4878234Z return (buf0, buf1, ) 2023-01-11T21:38:06.4878239Z 2023-01-11T21:38:06.4878248Z 2023-01-11T21:38:06.4878323Z if __name__ == "__main__": 2023-01-11T21:38:06.4878445Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4878572Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4878775Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4878888Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4878893Z 2023-01-11T21:38:06.4878898Z 2023-01-11T21:38:06.4878999Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4879103Z import torch 2023-01-11T21:38:06.4879174Z import random 2023-01-11T21:38:06.4879295Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4879421Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4879426Z 2023-01-11T21:38:06.4879510Z aten = torch.ops.aten 2023-01-11T21:38:06.4879650Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4879750Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4879755Z 2023-01-11T21:38:06.4879831Z import triton 2023-01-11T21:38:06.4879922Z import triton.language as tl 2023-01-11T21:38:06.4880042Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4880189Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4880194Z 2023-01-11T21:38:06.4880198Z 2023-01-11T21:38:06.4880366Z triton_fused_log1p_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.4880443Z import triton 2023-01-11T21:38:06.4880538Z import triton.language as tl 2023-01-11T21:38:06.4880657Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4880759Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4880888Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4881015Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4881020Z 2023-01-11T21:38:06.4881434Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4881510Z @triton.jit 2023-01-11T21:38:06.4881680Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4881758Z xnumel = 201 2023-01-11T21:38:06.4881859Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4881990Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4882069Z xmask = xindex < xnumel 2023-01-11T21:38:06.4882144Z x0 = xindex 2023-01-11T21:38:06.4882334Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4882434Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4882524Z tmp1 = tmp0.to(tl.float32) 
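# Annotation (not part of the generated code): integer inputs (i64 here)
# are first upcast to fp32, since tl.libdevice.log1p is a floating-point
# primitive; the kernel then writes out both log1p(x) and 2 * log1p(x).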
2023-01-11T21:38:06.4882624Z tmp2 = tl.libdevice.log1p(tmp1) 2023-01-11T21:38:06.4882711Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.4882805Z tmp5 = tl.libdevice.log1p(tmp4) 2023-01-11T21:38:06.4882876Z tmp6 = 2 2023-01-11T21:38:06.4882958Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.4883093Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4883233Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.4883322Z ''') 2023-01-11T21:38:06.4883328Z 2023-01-11T21:38:06.4883332Z 2023-01-11T21:38:06.4883425Z async_compile.wait(globals()) 2023-01-11T21:38:06.4883504Z del async_compile 2023-01-11T21:38:06.4883510Z 2023-01-11T21:38:06.4883583Z def call(args): 2023-01-11T21:38:06.4883658Z arg0_1, = args 2023-01-11T21:38:06.4883733Z args.clear() 2023-01-11T21:38:06.4883828Z with torch.cuda.device(0): 2023-01-11T21:38:06.4884029Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4884227Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4884325Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4884470Z triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.4884546Z del arg0_1 2023-01-11T21:38:06.4884630Z return (buf0, buf1, ) 2023-01-11T21:38:06.4884638Z 2023-01-11T21:38:06.4884642Z 2023-01-11T21:38:06.4884726Z if __name__ == "__main__": 2023-01-11T21:38:06.4884848Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4884973Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4885201Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4885317Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4885323Z 2023-01-11T21:38:06.4885388Z ok (1.911s) 2023-01-11T21:38:06.4885842Z test_log2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4885972Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4886235Z [2023-01-11 21:35:11,313] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 679 2023-01-11T21:38:06.4886503Z [2023-01-11 21:35:11,393] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 679 2023-01-11T21:38:06.4886928Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4887059Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4887318Z [2023-01-11 21:35:11,413] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 680 2023-01-11T21:38:06.4887584Z [2023-01-11 21:35:11,491] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 680 2023-01-11T21:38:06.4887614Z 2023-01-11T21:38:06.4887718Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4887794Z import torch 2023-01-11T21:38:06.4887864Z import random 2023-01-11T21:38:06.4887985Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4888111Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4888116Z 2023-01-11T21:38:06.4888198Z aten = torch.ops.aten 2023-01-11T21:38:06.4888339Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4888435Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4888440Z 2023-01-11T21:38:06.4888517Z import triton 2023-01-11T21:38:06.4888605Z import triton.language as tl 2023-01-11T21:38:06.4888735Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4888878Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4888884Z 2023-01-11T21:38:06.4888891Z 2023-01-11T21:38:06.4889052Z triton_fused_mul_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.4889130Z import triton 2023-01-11T21:38:06.4889222Z import triton.language as tl 2023-01-11T21:38:06.4889341Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4889448Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4889580Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4889707Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4889712Z 2023-01-11T21:38:06.4890133Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4890209Z @triton.jit 2023-01-11T21:38:06.4890356Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4890432Z xnumel = 64 2023-01-11T21:38:06.4890536Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4890671Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4890750Z xmask = xindex < xnumel 2023-01-11T21:38:06.4890825Z x0 = xindex 2023-01-11T21:38:06.4891019Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4891146Z tmp4 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4891231Z tmp1 = tl.log(tmp0) 2023-01-11T21:38:06.4891313Z tmp2 = 1.4426950408889634 2023-01-11T21:38:06.4891402Z tmp3 = tmp1 * tmp2 2023-01-11T21:38:06.4891469Z tmp5 = 1 2023-01-11T21:38:06.4891547Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.4891629Z tmp7 = tl.log(tmp6) 2023-01-11T21:38:06.4891709Z tmp8 = tmp7 * tmp2 2023-01-11T21:38:06.4891783Z tmp9 = 2 2023-01-11T21:38:06.4891897Z tmp10 = tmp8 - tmp9 2023-01-11T21:38:06.4892035Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4892167Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.4892252Z ''') 2023-01-11T21:38:06.4892257Z 2023-01-11T21:38:06.4892262Z 
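# Annotation (not part of the generated code): the kernel above lowers
# torch.log2 via the identity log2(x) = ln(x) * log2(e), where
# 1.4426950408889634 is log2(e); the second output folds the extra "+ 1"
# and "- 2" pointwise ops into the same kernel. A minimal sketch of source
# that would compile to such a fused kernel, assuming a CUDA-enabled
# PyTorch 2.x build (the function name is illustrative, not taken from the
# test file):
import torch

def log2_example(x):
    # both expressions fuse into a single pointwise Triton kernel
    return torch.log2(x), torch.log2(x + 1) - 2

compiled_log2 = torch.compile(log2_example)
out0, out1 = compiled_log2(torch.rand(64, device="cuda"))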
2023-01-11T21:38:06.4892356Z async_compile.wait(globals()) 2023-01-11T21:38:06.4892434Z del async_compile 2023-01-11T21:38:06.4892439Z 2023-01-11T21:38:06.4892515Z def call(args): 2023-01-11T21:38:06.4892593Z arg0_1, = args 2023-01-11T21:38:06.4892672Z args.clear() 2023-01-11T21:38:06.4892759Z with torch.cuda.device(0): 2023-01-11T21:38:06.4892961Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4893159Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4893254Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4893400Z triton_fused_mul_sub_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4893475Z del arg0_1 2023-01-11T21:38:06.4893558Z return (buf0, buf1, ) 2023-01-11T21:38:06.4893604Z 2023-01-11T21:38:06.4893608Z 2023-01-11T21:38:06.4893691Z if __name__ == "__main__": 2023-01-11T21:38:06.4893804Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4893933Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4894133Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4894248Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4894254Z 2023-01-11T21:38:06.4894259Z 2023-01-11T21:38:06.4894358Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4894433Z import torch 2023-01-11T21:38:06.4894624Z import random 2023-01-11T21:38:06.4894749Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4894867Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4894872Z 2023-01-11T21:38:06.4894952Z aten = torch.ops.aten 2023-01-11T21:38:06.4895088Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4895181Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4895190Z 2023-01-11T21:38:06.4895261Z import triton 2023-01-11T21:38:06.4895352Z import triton.language as tl 2023-01-11T21:38:06.4895475Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4895608Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4895621Z 2023-01-11T21:38:06.4895628Z 2023-01-11T21:38:06.4895782Z triton_fused_mul_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.4895857Z import triton 2023-01-11T21:38:06.4895950Z import triton.language as tl 2023-01-11T21:38:06.4896061Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4896162Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4896293Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4896417Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4896422Z 2023-01-11T21:38:06.4896839Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4896908Z @triton.jit 2023-01-11T21:38:06.4897047Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4897118Z xnumel = 64 2023-01-11T21:38:06.4897379Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4897533Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4897628Z xmask = xindex < xnumel 2023-01-11T21:38:06.4897713Z x0 = xindex 2023-01-11T21:38:06.4897950Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 
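# Annotation (not part of the generated code): in this fp16 variant the
# loads are explicitly upcast with .to(tl.float32) so the log and multiply
# run in fp32; the results are written back through fp16 output buffers
# (see the torch.float16 empty_strided allocations in the wrapper below).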
2023-01-11T21:38:06.4898185Z tmp4 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4898266Z tmp1 = tl.log(tmp0) 2023-01-11T21:38:06.4898347Z tmp2 = 1.4426950408889634 2023-01-11T21:38:06.4898425Z tmp3 = tmp1 * tmp2 2023-01-11T21:38:06.4898493Z tmp5 = 1 2023-01-11T21:38:06.4898573Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.4898644Z tmp7 = tl.log(tmp6) 2023-01-11T21:38:06.4898721Z tmp8 = tmp7 * tmp2 2023-01-11T21:38:06.4898789Z tmp9 = 2 2023-01-11T21:38:06.4898899Z tmp10 = tmp8 - tmp9 2023-01-11T21:38:06.4899035Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4899172Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.4899255Z ''') 2023-01-11T21:38:06.4899260Z 2023-01-11T21:38:06.4899265Z 2023-01-11T21:38:06.4899350Z async_compile.wait(globals()) 2023-01-11T21:38:06.4899427Z del async_compile 2023-01-11T21:38:06.4899432Z 2023-01-11T21:38:06.4899507Z def call(args): 2023-01-11T21:38:06.4899581Z arg0_1, = args 2023-01-11T21:38:06.4899655Z args.clear() 2023-01-11T21:38:06.4899746Z with torch.cuda.device(0): 2023-01-11T21:38:06.4899943Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4900179Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4900274Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4900420Z triton_fused_mul_sub_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4900494Z del arg0_1 2023-01-11T21:38:06.4900580Z return (buf0, buf1, ) 2023-01-11T21:38:06.4900586Z 2023-01-11T21:38:06.4900590Z 2023-01-11T21:38:06.4900672Z if __name__ == "__main__": 2023-01-11T21:38:06.4900792Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4900921Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4901114Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4901229Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4901234Z 2023-01-11T21:38:06.4901310Z ok (0.200s) 2023-01-11T21:38:06.4901771Z test_log_fp64_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4901910Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4902176Z [2023-01-11 21:35:11,507] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 681 2023-01-11T21:38:06.4902441Z [2023-01-11 21:35:11,656] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 681 2023-01-11T21:38:06.4902857Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4902997Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4903253Z [2023-01-11 21:35:11,671] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 682 2023-01-11T21:38:06.4903548Z [2023-01-11 21:35:11,680] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 682 2023-01-11T21:38:06.4903555Z 2023-01-11T21:38:06.4903650Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4903727Z import torch 2023-01-11T21:38:06.4903804Z import random 2023-01-11T21:38:06.4903925Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4904053Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4904058Z 2023-01-11T21:38:06.4904141Z aten = torch.ops.aten 2023-01-11T21:38:06.4904281Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4904378Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4904386Z 2023-01-11T21:38:06.4904456Z import triton 2023-01-11T21:38:06.4904551Z import triton.language as tl 2023-01-11T21:38:06.4904677Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4904819Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4904824Z 2023-01-11T21:38:06.4904828Z 2023-01-11T21:38:06.4904995Z triton_fused_log_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.4905073Z import triton 2023-01-11T21:38:06.4905168Z import triton.language as tl 2023-01-11T21:38:06.4905279Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4905384Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4905540Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4905685Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4905692Z 2023-01-11T21:38:06.4906126Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4906228Z @triton.jit 2023-01-11T21:38:06.4906369Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4906446Z xnumel = 1024 2023-01-11T21:38:06.4906548Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4906675Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4906760Z xmask = xindex < xnumel 2023-01-11T21:38:06.4906834Z x0 = xindex 2023-01-11T21:38:06.4907029Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4907129Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4907226Z tmp1 = tl.libdevice.log(tmp0) 2023-01-11T21:38:06.4907322Z tmp3 = tl.libdevice.log(tmp2) 2023-01-11T21:38:06.4907397Z tmp4 = 1.4426950408889634 2023-01-11T21:38:06.4907476Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.4907616Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4907751Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.4907838Z ''') 2023-01-11T21:38:06.4907844Z 2023-01-11T21:38:06.4907848Z 2023-01-11T21:38:06.4907944Z async_compile.wait(globals()) 2023-01-11T21:38:06.4908025Z del async_compile 2023-01-11T21:38:06.4908031Z 2023-01-11T21:38:06.4908101Z def call(args): 2023-01-11T21:38:06.4908176Z arg0_1, = args 
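# Annotation (not part of the generated code): unlike the fp32/fp16
# kernels above, which use tl.log, this float64 kernel routes through
# tl.libdevice.log, keeping the logarithm in double precision.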
2023-01-11T21:38:06.4908252Z args.clear() 2023-01-11T21:38:06.4908345Z with torch.cuda.device(0): 2023-01-11T21:38:06.4908549Z buf0 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.4908750Z buf1 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.4908844Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4908994Z triton_fused_log_mul_0.run(arg0_1, buf0, buf1, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.4909066Z del arg0_1 2023-01-11T21:38:06.4909150Z return (buf0, buf1, ) 2023-01-11T21:38:06.4909156Z 2023-01-11T21:38:06.4909160Z 2023-01-11T21:38:06.4909242Z if __name__ == "__main__": 2023-01-11T21:38:06.4909362Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4909518Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4909724Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.4909839Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4909844Z 2023-01-11T21:38:06.4909848Z 2023-01-11T21:38:06.4909947Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4910016Z import torch 2023-01-11T21:38:06.4910094Z import random 2023-01-11T21:38:06.4910215Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4910340Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4910345Z 2023-01-11T21:38:06.4910429Z aten = torch.ops.aten 2023-01-11T21:38:06.4910572Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4910669Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4910674Z 2023-01-11T21:38:06.4910743Z import triton 2023-01-11T21:38:06.4910837Z import triton.language as tl 2023-01-11T21:38:06.4910966Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4911107Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4911112Z 2023-01-11T21:38:06.4911116Z 2023-01-11T21:38:06.4911280Z triton_fused_log_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.4911358Z import triton 2023-01-11T21:38:06.4911453Z import triton.language as tl 2023-01-11T21:38:06.4911567Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4911664Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4911798Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4911926Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4911962Z 2023-01-11T21:38:06.4912389Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4912464Z @triton.jit 2023-01-11T21:38:06.4912611Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4912686Z xnumel = 1024 2023-01-11T21:38:06.4912786Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4912911Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4912997Z xmask = xindex < xnumel 2023-01-11T21:38:06.4913073Z x0 = xindex 2023-01-11T21:38:06.4913266Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4913366Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4913465Z tmp1 = tl.libdevice.log(tmp0) 2023-01-11T21:38:06.4913562Z tmp3 = tl.libdevice.log(tmp2) 2023-01-11T21:38:06.4913637Z tmp4 = 
1.4426950408889634 2023-01-11T21:38:06.4913721Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.4913856Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4913989Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.4914080Z ''') 2023-01-11T21:38:06.4914086Z 2023-01-11T21:38:06.4914090Z 2023-01-11T21:38:06.4914185Z async_compile.wait(globals()) 2023-01-11T21:38:06.4914265Z del async_compile 2023-01-11T21:38:06.4914270Z 2023-01-11T21:38:06.4914349Z def call(args): 2023-01-11T21:38:06.4914417Z arg0_1, = args 2023-01-11T21:38:06.4914493Z args.clear() 2023-01-11T21:38:06.4914588Z with torch.cuda.device(0): 2023-01-11T21:38:06.4914789Z buf0 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.4914990Z buf1 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.4915085Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4915234Z triton_fused_log_mul_0.run(arg0_1, buf0, buf1, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.4915312Z del arg0_1 2023-01-11T21:38:06.4915413Z return (buf0, buf1, ) 2023-01-11T21:38:06.4915418Z 2023-01-11T21:38:06.4915423Z 2023-01-11T21:38:06.4915548Z if __name__ == "__main__": 2023-01-11T21:38:06.4915671Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4915798Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4915997Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.4916110Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4916115Z 2023-01-11T21:38:06.4916185Z ok (0.188s) 2023-01-11T21:38:06.4916646Z test_log_softmax_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4916774Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4917032Z [2023-01-11 21:35:11,724] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 683 2023-01-11T21:38:06.4917241Z [2023-01-11 21:35:11,772] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.4917443Z [2023-01-11 21:35:11,772] torch._inductor.scheduler: [DEBUG] remove_buffer('buf7') 2023-01-11T21:38:06.4917640Z [2023-01-11 21:35:11,772] torch._inductor.scheduler: [DEBUG] remove_buffer('buf6') 2023-01-11T21:38:06.4917838Z [2023-01-11 21:35:11,772] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.4918100Z [2023-01-11 21:35:11,996] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 683 2023-01-11T21:38:06.4918546Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4918678Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4918932Z [2023-01-11 21:35:12,144] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 684 2023-01-11T21:38:06.4919133Z [2023-01-11 21:35:12,194] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.4919338Z [2023-01-11 21:35:12,194] torch._inductor.scheduler: [DEBUG] remove_buffer('buf7') 2023-01-11T21:38:06.4919539Z [2023-01-11 21:35:12,194] torch._inductor.scheduler: [DEBUG] remove_buffer('buf6') 2023-01-11T21:38:06.4919740Z [2023-01-11 21:35:12,194] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.4919746Z 2023-01-11T21:38:06.4919842Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4919915Z import torch 2023-01-11T21:38:06.4919989Z import random 2023-01-11T21:38:06.4920111Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4920232Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4920237Z 2023-01-11T21:38:06.4920320Z aten = torch.ops.aten 2023-01-11T21:38:06.4920457Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4920550Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4920555Z 2023-01-11T21:38:06.4920628Z import triton 2023-01-11T21:38:06.4920719Z import triton.language as tl 2023-01-11T21:38:06.4920844Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4920978Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4920988Z 2023-01-11T21:38:06.4920999Z 2023-01-11T21:38:06.4921180Z triton_fused_amax_1_exp_1_sub_2_sum_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.4921254Z import triton 2023-01-11T21:38:06.4921344Z import triton.language as tl 2023-01-11T21:38:06.4921458Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4921558Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4921715Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4921844Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4921850Z 2023-01-11T21:38:06.4921931Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.4922050Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.4922137Z filename=__file__, 2023-01-11T21:38:06.4922514Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4922591Z @triton.jit 2023-01-11T21:38:06.4922767Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4922842Z xnumel = 8 2023-01-11T21:38:06.4922912Z rnumel = 8 2023-01-11T21:38:06.4923003Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4923137Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4923220Z xmask = xindex < xnumel 2023-01-11T21:38:06.4923335Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4923405Z x0 = xindex 2023-01-11T21:38:06.4923588Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4923695Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4923777Z rindex = roffset + rbase 2023-01-11T21:38:06.4923860Z rmask = rindex < rnumel 
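# Annotation (not part of the generated code): this is the numerically
# stable log-softmax recipe. Pass 1 keeps a running row maximum in _tmp1
# (tl.where against an accumulator seeded with -inf), pass 2 accumulates
# sum(exp(x - max)); the second kernel then forms (x - max) - log(sum).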
2023-01-11T21:38:06.4923931Z r1 = rindex 2023-01-11T21:38:06.4924151Z tmp0 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4924307Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.4924420Z tmp1 = tl.reshape(tl.max(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4924517Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.4924631Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4924736Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4924822Z rindex = roffset + rbase 2023-01-11T21:38:06.4924905Z rmask = rindex < rnumel 2023-01-11T21:38:06.4924975Z r1 = rindex 2023-01-11T21:38:06.4925095Z tmp2 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask) 2023-01-11T21:38:06.4925212Z tmp3 = tmp2 - tmp1 2023-01-11T21:38:06.4925288Z tmp4 = tl.exp(tmp3) 2023-01-11T21:38:06.4925410Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.4925525Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4925625Z tl.store(out_ptr1 + x0, tmp5, xmask) 2023-01-11T21:38:06.4925711Z ''') 2023-01-11T21:38:06.4925717Z 2023-01-11T21:38:06.4925721Z 2023-01-11T21:38:06.4925953Z triton_fused_add_amax_amax_2_exp_exp_2_sub_sub_1_sub_3_sub_4_sub_5_1 = async_compile.triton(''' 2023-01-11T21:38:06.4926030Z import triton 2023-01-11T21:38:06.4926123Z import triton.language as tl 2023-01-11T21:38:06.4926233Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4926334Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4926467Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4926591Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4926596Z 2023-01-11T21:38:06.4926686Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.4926802Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.4926888Z filename=__file__, 2023-01-11T21:38:06.4927321Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: 'i32', 8: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.4927390Z @triton.jit 2023-01-11T21:38:06.4927630Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr4, out_ptr5, out_ptr6, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4927705Z xnumel = 8 2023-01-11T21:38:06.4927776Z rnumel = 8 2023-01-11T21:38:06.4927875Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4928012Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4928096Z xmask = xindex < xnumel 2023-01-11T21:38:06.4928207Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4928276Z x0 = xindex 2023-01-11T21:38:06.4928459Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4928562Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4928655Z rindex = roffset + rbase 2023-01-11T21:38:06.4928740Z rmask = rindex < rnumel 2023-01-11T21:38:06.4928811Z r1 = rindex 2023-01-11T21:38:06.4929018Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4929236Z tmp1 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4929318Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4929445Z _tmp3 = tl.where(xmask & rmask & (_tmp3 < tmp2), tmp2, _tmp3) 
2023-01-11T21:38:06.4929559Z tmp3 = tl.reshape(tl.max(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4929676Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4929862Z _tmp10 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4929969Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4930049Z rindex = roffset + rbase 2023-01-11T21:38:06.4930135Z rmask = rindex < rnumel 2023-01-11T21:38:06.4930243Z r1 = rindex 2023-01-11T21:38:06.4930459Z tmp4 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4930672Z tmp5 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4930753Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.4930871Z tmp7 = tmp6 - tmp3 2023-01-11T21:38:06.4930946Z tmp8 = tl.exp(tmp7) 2023-01-11T21:38:06.4931067Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.4931197Z _tmp10 = tl.where(xmask & rmask & (_tmp10 < tmp5), tmp5, _tmp10) 2023-01-11T21:38:06.4931309Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4931423Z tmp10 = tl.reshape(tl.max(_tmp10, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4931538Z _tmp14 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4931640Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4931721Z rindex = roffset + rbase 2023-01-11T21:38:06.4931809Z rmask = rindex < rnumel 2023-01-11T21:38:06.4931880Z r1 = rindex 2023-01-11T21:38:06.4932096Z tmp11 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4932320Z tmp15 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4932437Z tmp20 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.4932541Z tmp21 = tl.load(in_ptr2 + (r1), rmask) 2023-01-11T21:38:06.4932644Z tmp23 = tl.load(in_ptr3 + (r1), rmask) 2023-01-11T21:38:06.4932757Z tmp12 = tmp11 - tmp10 2023-01-11T21:38:06.4932842Z tmp13 = tl.exp(tmp12) 2023-01-11T21:38:06.4932966Z _tmp14 = tl.where(xmask & rmask, _tmp14 + tmp13, _tmp14) 2023-01-11T21:38:06.4933048Z tmp16 = tmp15 + tmp11 2023-01-11T21:38:06.4933164Z tmp17 = tmp16 - tmp3 2023-01-11T21:38:06.4933250Z tmp18 = tl.log(tmp9) 2023-01-11T21:38:06.4933361Z tmp19 = tmp17 - tmp18 2023-01-11T21:38:06.4933476Z tmp22 = tmp20 - tmp21 2023-01-11T21:38:06.4933557Z tmp24 = tl.log(tmp23) 2023-01-11T21:38:06.4933670Z tmp25 = tmp22 - tmp24 2023-01-11T21:38:06.4933829Z tl.store(out_ptr4 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp19, rmask & xmask) 2023-01-11T21:38:06.4934016Z tl.store(out_ptr5 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp25, rmask & xmask) 2023-01-11T21:38:06.4934135Z tmp14 = tl.reshape(tl.sum(_tmp14, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4934241Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4934322Z rindex = roffset + rbase 2023-01-11T21:38:06.4934406Z rmask = rindex < rnumel 2023-01-11T21:38:06.4934588Z r1 = rindex 2023-01-11T21:38:06.4934707Z tmp26 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.4934825Z tmp27 = tmp26 - tmp10 2023-01-11T21:38:06.4934907Z tmp28 = tl.log(tmp14) 2023-01-11T21:38:06.4935026Z tmp29 = tmp27 - tmp28 2023-01-11T21:38:06.4935172Z tl.store(out_ptr6 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp29, rmask & xmask) 2023-01-11T21:38:06.4935255Z ''') 2023-01-11T21:38:06.4935261Z 2023-01-11T21:38:06.4935265Z 2023-01-11T21:38:06.4935359Z async_compile.wait(globals()) 
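# Annotation (not part of the generated code): calling the two inputs a
# (in_ptr0) and b (in_ptr1), the second kernel above fuses three
# log-softmax computations into one pass: out_ptr4 gets
# log_softmax(a + b) along the last dim, out_ptr5 reuses the dim-0
# max/sum statistics from the first kernel for log_softmax(a, dim=0),
# and out_ptr6 computes log_softmax(b) along the last dim.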
2023-01-11T21:38:06.4935437Z del async_compile 2023-01-11T21:38:06.4935442Z 2023-01-11T21:38:06.4935519Z def call(args): 2023-01-11T21:38:06.4935599Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4935675Z args.clear() 2023-01-11T21:38:06.4935762Z with torch.cuda.device(0): 2023-01-11T21:38:06.4935966Z buf3 = empty_strided((1, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4936166Z buf4 = empty_strided((1, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4936259Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4936418Z triton_fused_amax_1_exp_1_sub_2_sum_2_0.run(arg0_1, buf3, buf4, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4936674Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4936866Z buf5 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4937064Z buf8 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4937312Z triton_fused_add_amax_amax_2_exp_exp_2_sub_sub_1_sub_3_sub_4_sub_5_1.run(arg0_1, arg1_1, buf3, buf4, buf2, buf5, buf8, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4937396Z del arg0_1 2023-01-11T21:38:06.4937469Z del arg1_1 2023-01-11T21:38:06.4937558Z return (buf2, buf5, buf8, ) 2023-01-11T21:38:06.4937564Z 2023-01-11T21:38:06.4937568Z 2023-01-11T21:38:06.4937647Z if __name__ == "__main__": 2023-01-11T21:38:06.4937766Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4937893Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4938095Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4938285Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4938405Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4938410Z 2023-01-11T21:38:06.4938415Z 2023-01-11T21:38:06.4938512Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4938586Z import torch 2023-01-11T21:38:06.4938660Z import random 2023-01-11T21:38:06.4938777Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4938901Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4938906Z 2023-01-11T21:38:06.4938988Z aten = torch.ops.aten 2023-01-11T21:38:06.4939117Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4939214Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4939219Z 2023-01-11T21:38:06.4939292Z import triton 2023-01-11T21:38:06.4939385Z import triton.language as tl 2023-01-11T21:38:06.4939513Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4939654Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4939659Z 2023-01-11T21:38:06.4939664Z 2023-01-11T21:38:06.4939887Z triton_fused_amax_1_convert_element_type_2_exp_1_sub_2_sum_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.4940002Z import triton 2023-01-11T21:38:06.4940089Z import triton.language as tl 2023-01-11T21:38:06.4940201Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4940303Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4940434Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4940562Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4940567Z 2023-01-11T21:38:06.4940655Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.4940772Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.4940856Z filename=__file__, 
2023-01-11T21:38:06.4941226Z meta={'signature': {0: '*fp16', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4941303Z @triton.jit 2023-01-11T21:38:06.4941483Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4941555Z xnumel = 8 2023-01-11T21:38:06.4941626Z rnumel = 8 2023-01-11T21:38:06.4941724Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4941861Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4941937Z xmask = xindex < xnumel 2023-01-11T21:38:06.4942056Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4942127Z x0 = xindex 2023-01-11T21:38:06.4942312Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4942417Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4942536Z rindex = roffset + rbase 2023-01-11T21:38:06.4942624Z rmask = rindex < rnumel 2023-01-11T21:38:06.4942690Z r1 = rindex 2023-01-11T21:38:06.4942933Z tmp0 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4943028Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.4943160Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.4943276Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4943376Z tl.store(out_ptr0 + x0, tmp2, xmask) 2023-01-11T21:38:06.4943499Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4943607Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4943690Z rindex = roffset + rbase 2023-01-11T21:38:06.4943776Z rmask = rindex < rnumel 2023-01-11T21:38:06.4943849Z r1 = rindex 2023-01-11T21:38:06.4943983Z tmp3 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.4944086Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.4944204Z tmp5 = tmp4 - tmp2 2023-01-11T21:38:06.4944293Z tmp6 = tl.exp(tmp5) 2023-01-11T21:38:06.4944409Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.4944529Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4944631Z tl.store(out_ptr1 + x0, tmp7, xmask) 2023-01-11T21:38:06.4944718Z ''') 2023-01-11T21:38:06.4944723Z 2023-01-11T21:38:06.4944728Z 2023-01-11T21:38:06.4945091Z triton_fused_add_amax_amax_2_convert_element_type_convert_element_type_1_convert_element_type_3_convert_element_type_4_convert_element_type_5_exp_exp_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.4945170Z import triton 2023-01-11T21:38:06.4945267Z import triton.language as tl 2023-01-11T21:38:06.4945385Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4945485Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4945636Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4945783Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4945790Z 2023-01-11T21:38:06.4945899Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.4946016Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.4946104Z filename=__file__, 2023-01-11T21:38:06.4946615Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp32', 3: '*fp32', 4: '*fp16', 5: '*fp16', 6: '*fp16', 7: 'i32', 8: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 
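# Annotation (not part of the generated code): fp16 rerun of the same
# fusion. Note the mixed signature: the intermediate max/sum statistics
# stay in fp32 buffers (*fp32 at positions 2-3) while the inputs and the
# three outputs are fp16, so only the final results are rounded down.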
2023-01-11T21:38:06.4946693Z @triton.jit 2023-01-11T21:38:06.4946896Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr4, out_ptr5, out_ptr6, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4946972Z xnumel = 8 2023-01-11T21:38:06.4947047Z rnumel = 8 2023-01-11T21:38:06.4947151Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4947291Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4947375Z xmask = xindex < xnumel 2023-01-11T21:38:06.4947493Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4947560Z x0 = xindex 2023-01-11T21:38:06.4947746Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4947850Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4947940Z rindex = roffset + rbase 2023-01-11T21:38:06.4948027Z rmask = rindex < rnumel 2023-01-11T21:38:06.4948101Z r1 = rindex 2023-01-11T21:38:06.4948340Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4948572Z tmp1 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4948658Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4948751Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.4948911Z _tmp4 = tl.where(xmask & rmask & (_tmp4 < tmp3), tmp3, _tmp4) 2023-01-11T21:38:06.4949027Z tmp4 = tl.reshape(tl.max(_tmp4, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4949148Z _tmp11 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4949338Z _tmp13 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4949443Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4949526Z rindex = roffset + rbase 2023-01-11T21:38:06.4949613Z rmask = rindex < rnumel 2023-01-11T21:38:06.4949687Z r1 = rindex 2023-01-11T21:38:06.4949929Z tmp5 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4950166Z tmp6 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4950251Z tmp7 = tmp5 + tmp6 2023-01-11T21:38:06.4950343Z tmp8 = tmp7.to(tl.float32) 2023-01-11T21:38:06.4950456Z tmp9 = tmp8 - tmp4 2023-01-11T21:38:06.4950543Z tmp10 = tl.exp(tmp9) 2023-01-11T21:38:06.4950668Z _tmp11 = tl.where(xmask & rmask, _tmp11 + tmp10, _tmp11) 2023-01-11T21:38:06.4950762Z tmp12 = tmp6.to(tl.float32) 2023-01-11T21:38:06.4950899Z _tmp13 = tl.where(xmask & rmask & (_tmp13 < tmp12), tmp12, _tmp13) 2023-01-11T21:38:06.4951015Z tmp11 = tl.reshape(tl.sum(_tmp11, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4951134Z tmp13 = tl.reshape(tl.max(_tmp13, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4951247Z _tmp18 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4951353Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4951443Z rindex = roffset + rbase 2023-01-11T21:38:06.4951530Z rmask = rindex < rnumel 2023-01-11T21:38:06.4951603Z r1 = rindex 2023-01-11T21:38:06.4951843Z tmp14 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4952083Z tmp19 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4952218Z tmp26 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.4952316Z tmp28 = tl.load(in_ptr2 + (r1), rmask) 2023-01-11T21:38:06.4952447Z tmp30 = tl.load(in_ptr3 + (r1), rmask) 2023-01-11T21:38:06.4952543Z tmp15 = 
tmp14.to(tl.float32) 2023-01-11T21:38:06.4952665Z tmp16 = tmp15 - tmp13 2023-01-11T21:38:06.4952753Z tmp17 = tl.exp(tmp16) 2023-01-11T21:38:06.4952876Z _tmp18 = tl.where(xmask & rmask, _tmp18 + tmp17, _tmp18) 2023-01-11T21:38:06.4952958Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.4953043Z tmp21 = tmp20.to(tl.float32) 2023-01-11T21:38:06.4953159Z tmp22 = tmp21 - tmp4 2023-01-11T21:38:06.4953242Z tmp23 = tl.log(tmp11) 2023-01-11T21:38:06.4953358Z tmp24 = tmp22 - tmp23 2023-01-11T21:38:06.4953447Z tmp25 = tmp24.to(tl.float32) 2023-01-11T21:38:06.4953541Z tmp27 = tmp26.to(tl.float32) 2023-01-11T21:38:06.4953659Z tmp29 = tmp27 - tmp28 2023-01-11T21:38:06.4953736Z tmp31 = tl.log(tmp30) 2023-01-11T21:38:06.4953852Z tmp32 = tmp29 - tmp31 2023-01-11T21:38:06.4953944Z tmp33 = tmp32.to(tl.float32) 2023-01-11T21:38:06.4954108Z tl.store(out_ptr4 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp25, rmask & xmask) 2023-01-11T21:38:06.4954268Z tl.store(out_ptr5 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp33, rmask & xmask) 2023-01-11T21:38:06.4954387Z tmp18 = tl.reshape(tl.sum(_tmp18, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4954495Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4954578Z rindex = roffset + rbase 2023-01-11T21:38:06.4954664Z rmask = rindex < rnumel 2023-01-11T21:38:06.4954738Z r1 = rindex 2023-01-11T21:38:06.4954872Z tmp34 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.4954992Z tmp35 = tmp34.to(tl.float32) 2023-01-11T21:38:06.4955108Z tmp36 = tmp35 - tmp13 2023-01-11T21:38:06.4955194Z tmp37 = tl.log(tmp18) 2023-01-11T21:38:06.4955311Z tmp38 = tmp36 - tmp37 2023-01-11T21:38:06.4955422Z tmp39 = tmp38.to(tl.float32) 2023-01-11T21:38:06.4955603Z tl.store(out_ptr6 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp39, rmask & xmask) 2023-01-11T21:38:06.4955690Z ''') 2023-01-11T21:38:06.4955696Z 2023-01-11T21:38:06.4955700Z 2023-01-11T21:38:06.4955793Z async_compile.wait(globals()) 2023-01-11T21:38:06.4955871Z del async_compile 2023-01-11T21:38:06.4955877Z 2023-01-11T21:38:06.4955954Z def call(args): 2023-01-11T21:38:06.4956037Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4956108Z args.clear() 2023-01-11T21:38:06.4956202Z with torch.cuda.device(0): 2023-01-11T21:38:06.4956403Z buf3 = empty_strided((1, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4956602Z buf4 = empty_strided((1, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4956699Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4956889Z triton_fused_amax_1_convert_element_type_2_exp_1_sub_2_sum_2_0.run(arg0_1, buf3, buf4, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4957092Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4957283Z buf5 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4957477Z buf8 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4957767Z triton_fused_add_amax_amax_2_convert_element_type_convert_element_type_1_convert_element_type_3_convert_element_type_4_convert_element_type_5_exp_exp_2_1.run(arg0_1, arg1_1, buf3, buf4, buf2, buf5, buf8, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4957842Z del arg0_1 2023-01-11T21:38:06.4957918Z del arg1_1 2023-01-11T21:38:06.4958019Z return (buf2, buf5, buf8, ) 2023-01-11T21:38:06.4958024Z 2023-01-11T21:38:06.4958029Z 2023-01-11T21:38:06.4958111Z if __name__ == "__main__": 2023-01-11T21:38:06.4958237Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4958366Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4958587Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4958782Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4958906Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4959173Z [2023-01-11 21:35:12,319] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 684 2023-01-11T21:38:06.4959179Z 2023-01-11T21:38:06.4959250Z ok (0.640s) 2023-01-11T21:38:06.4959709Z test_logsumexp_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4959845Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4960110Z [2023-01-11 21:35:12,366] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 685 2023-01-11T21:38:06.4960321Z [2023-01-11 21:35:12,392] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.4960520Z [2023-01-11 21:35:12,395] torch._inductor.scheduler: [DEBUG] remove_buffer('buf3') 2023-01-11T21:38:06.4960786Z [2023-01-11 21:35:12,624] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 685 2023-01-11T21:38:06.4961203Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4961365Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4961624Z [2023-01-11 21:35:12,740] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 686 2023-01-11T21:38:06.4961835Z [2023-01-11 21:35:12,769] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.4962042Z [2023-01-11 21:35:12,769] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.4962243Z [2023-01-11 21:35:12,772] torch._inductor.scheduler: [DEBUG] remove_buffer('buf3') 2023-01-11T21:38:06.4962443Z [2023-01-11 21:35:12,772] torch._inductor.scheduler: [DEBUG] remove_buffer('buf4') 2023-01-11T21:38:06.4962449Z 2023-01-11T21:38:06.4962550Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4962623Z import torch 2023-01-11T21:38:06.4962700Z import random 2023-01-11T21:38:06.4962823Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4962949Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4962955Z 2023-01-11T21:38:06.4963037Z aten = torch.ops.aten 2023-01-11T21:38:06.4963178Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4963278Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4963284Z 2023-01-11T21:38:06.4963353Z import triton 2023-01-11T21:38:06.4963449Z import triton.language as tl 2023-01-11T21:38:06.4963576Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4963716Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4963722Z 2023-01-11T21:38:06.4963727Z 2023-01-11T21:38:06.4963917Z triton_fused_add_amax_exp_sub_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.4963995Z import triton 2023-01-11T21:38:06.4964090Z import triton.language as tl 2023-01-11T21:38:06.4964211Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4964308Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4964441Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4964569Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4964574Z 2023-01-11T21:38:06.4964691Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.4964810Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.4964896Z filename=__file__, 2023-01-11T21:38:06.4965270Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4965344Z @triton.jit 2023-01-11T21:38:06.4965510Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4965586Z xnumel = 8 2023-01-11T21:38:06.4965664Z rnumel = 8 2023-01-11T21:38:06.4965766Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4965904Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4965988Z xmask = xindex < xnumel 2023-01-11T21:38:06.4966110Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4966179Z x0 = xindex 2023-01-11T21:38:06.4966365Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4966475Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4966564Z rindex = roffset + rbase 2023-01-11T21:38:06.4966653Z rmask = rindex < rnumel 2023-01-11T21:38:06.4966724Z r1 = rindex 
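        # annotation: first pass of a numerically stable logsumexp. _tmp1
        # accumulates the running row maximum; the second loop below sums
        # exp(x - max), and the epilogue adds the max back after tl.log.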
2023-01-11T21:38:06.4966940Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4967065Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.4967179Z tmp1 = tl.reshape(tl.max(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4967332Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4967441Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4967536Z rindex = roffset + rbase 2023-01-11T21:38:06.4967625Z rmask = rindex < rnumel 2023-01-11T21:38:06.4967700Z r1 = rindex 2023-01-11T21:38:06.4967816Z tmp2 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.4967936Z tmp3 = tmp2 - tmp1 2023-01-11T21:38:06.4968022Z tmp4 = tl.exp(tmp3) 2023-01-11T21:38:06.4968144Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.4968259Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4968345Z tmp6 = tl.log(tmp5) 2023-01-11T21:38:06.4968427Z tmp7 = tl.abs(tmp1) 2023-01-11T21:38:06.4968502Z tmp8 = float("inf") 2023-01-11T21:38:06.4968581Z tmp9 = tmp7 == tmp8 2023-01-11T21:38:06.4968653Z tmp10 = 0.0 2023-01-11T21:38:06.4968752Z tmp11 = tl.where(tmp9, tmp10, tmp1) 2023-01-11T21:38:06.4968839Z tmp12 = tmp6 + tmp11 2023-01-11T21:38:06.4968981Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.4969070Z ''') 2023-01-11T21:38:06.4969075Z 2023-01-11T21:38:06.4969080Z 2023-01-11T21:38:06.4969276Z triton_fused_amax_1_exp_1_sub_1_sub_2_sum_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.4969350Z import triton 2023-01-11T21:38:06.4969444Z import triton.language as tl 2023-01-11T21:38:06.4969560Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4969665Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4969799Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4969926Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4969931Z 2023-01-11T21:38:06.4970022Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.4970136Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.4970225Z filename=__file__, 2023-01-11T21:38:06.4970602Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4970678Z @triton.jit 2023-01-11T21:38:06.4970881Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4970959Z xnumel = 8 2023-01-11T21:38:06.4971033Z rnumel = 8 2023-01-11T21:38:06.4971127Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4971264Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4971348Z xmask = xindex < xnumel 2023-01-11T21:38:06.4971470Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4971542Z x0 = xindex 2023-01-11T21:38:06.4971726Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4971832Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4971927Z rindex = roffset + rbase 2023-01-11T21:38:06.4972008Z rmask = rindex < rnumel 2023-01-11T21:38:06.4972081Z r1 = rindex 2023-01-11T21:38:06.4972297Z tmp0 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4972430Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.4972545Z 
tmp1 = tl.reshape(tl.max(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4972667Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4972773Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4972858Z rindex = roffset + rbase 2023-01-11T21:38:06.4972942Z rmask = rindex < rnumel 2023-01-11T21:38:06.4973015Z r1 = rindex 2023-01-11T21:38:06.4973135Z tmp2 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask) 2023-01-11T21:38:06.4973254Z tmp3 = tmp2 - tmp1 2023-01-11T21:38:06.4973340Z tmp4 = tl.exp(tmp3) 2023-01-11T21:38:06.4973492Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.4973601Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4973683Z tmp6 = tl.log(tmp5) 2023-01-11T21:38:06.4973765Z tmp7 = tl.abs(tmp1) 2023-01-11T21:38:06.4973847Z tmp8 = float("inf") 2023-01-11T21:38:06.4973929Z tmp9 = tmp7 == tmp8 2023-01-11T21:38:06.4974008Z tmp10 = 0.0 2023-01-11T21:38:06.4974109Z tmp11 = tl.where(tmp9, tmp10, tmp1) 2023-01-11T21:38:06.4974185Z tmp12 = tmp6 + tmp11 2023-01-11T21:38:06.4974259Z tmp13 = 2 2023-01-11T21:38:06.4974378Z tmp14 = tmp12 - tmp13 2023-01-11T21:38:06.4974636Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.4974721Z ''') 2023-01-11T21:38:06.4974727Z 2023-01-11T21:38:06.4974731Z 2023-01-11T21:38:06.4974824Z async_compile.wait(globals()) 2023-01-11T21:38:06.4974900Z del async_compile 2023-01-11T21:38:06.4974905Z 2023-01-11T21:38:06.4974972Z def call(args): 2023-01-11T21:38:06.4975049Z arg0_1, = args 2023-01-11T21:38:06.4975124Z args.clear() 2023-01-11T21:38:06.4975217Z with torch.cuda.device(0): 2023-01-11T21:38:06.4975420Z buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4975510Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.4975605Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4975759Z triton_fused_add_amax_exp_sub_sum_1_0.run(buf2, arg0_1, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4975953Z buf4 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4976042Z buf5 = buf4; del buf4 # reuse 2023-01-11T21:38:06.4976203Z triton_fused_amax_1_exp_1_sub_1_sub_2_sum_2_1.run(buf5, arg0_1, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4976276Z del arg0_1 2023-01-11T21:38:06.4976358Z return (buf2, buf5, ) 2023-01-11T21:38:06.4976363Z 2023-01-11T21:38:06.4976368Z 2023-01-11T21:38:06.4976446Z if __name__ == "__main__": 2023-01-11T21:38:06.4976567Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4976688Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4976888Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4977040Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4977045Z 2023-01-11T21:38:06.4977383Z [2023-01-11 21:35:12,870] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 686 2023-01-11T21:38:06.4977390Z 2023-01-11T21:38:06.4977485Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4977559Z import torch 2023-01-11T21:38:06.4977633Z import random 2023-01-11T21:38:06.4977753Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4977871Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4977876Z 2023-01-11T21:38:06.4977957Z aten = torch.ops.aten 2023-01-11T21:38:06.4978096Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4978194Z async_compile = 
AsyncCompile() 2023-01-11T21:38:06.4978199Z 2023-01-11T21:38:06.4978273Z import triton 2023-01-11T21:38:06.4978364Z import triton.language as tl 2023-01-11T21:38:06.4978489Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4978632Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4978637Z 2023-01-11T21:38:06.4978642Z 2023-01-11T21:38:06.4978887Z triton_fused_amax_convert_element_type_convert_element_type_1_exp_sub_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.4978963Z import triton 2023-01-11T21:38:06.4979056Z import triton.language as tl 2023-01-11T21:38:06.4979170Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4979272Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4979402Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4979531Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4979585Z 2023-01-11T21:38:06.4979679Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.4979789Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.4979873Z filename=__file__, 2023-01-11T21:38:06.4980235Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4980309Z @triton.jit 2023-01-11T21:38:06.4980477Z def triton_(in_ptr0, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4980552Z xnumel = 8 2023-01-11T21:38:06.4980627Z rnumel = 8 2023-01-11T21:38:06.4980718Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4980856Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4980938Z xmask = xindex < xnumel 2023-01-11T21:38:06.4981055Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4981129Z x0 = xindex 2023-01-11T21:38:06.4981311Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4981416Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4981499Z rindex = roffset + rbase 2023-01-11T21:38:06.4981588Z rmask = rindex < rnumel 2023-01-11T21:38:06.4981664Z r1 = rindex 2023-01-11T21:38:06.4981900Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4981993Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.4982121Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.4982234Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4982349Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4982448Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4982534Z rindex = roffset + rbase 2023-01-11T21:38:06.4982619Z rmask = rindex < rnumel 2023-01-11T21:38:06.4982693Z r1 = rindex 2023-01-11T21:38:06.4982826Z tmp3 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.4982915Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.4983027Z tmp5 = tmp4 - tmp2 2023-01-11T21:38:06.4983102Z tmp6 = tl.exp(tmp5) 2023-01-11T21:38:06.4983253Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.4983368Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4983446Z tmp8 = tl.log(tmp7) 2023-01-11T21:38:06.4983523Z tmp9 = tl.abs(tmp2) 2023-01-11T21:38:06.4983603Z tmp10 = float("inf") 2023-01-11T21:38:06.4983682Z tmp11 = tmp9 == tmp10 2023-01-11T21:38:06.4983750Z tmp12 = 0.0 
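    # annotation: inf guard. If the row max is +/-inf, substitute 0 before
    # adding it back, so logsumexp yields +/-inf rather than nan (inf - inf).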
2023-01-11T21:38:06.4983849Z tmp13 = tl.where(tmp11, tmp12, tmp2) 2023-01-11T21:38:06.4983929Z tmp14 = tmp8 + tmp13 2023-01-11T21:38:06.4984017Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.4984151Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.4984239Z ''') 2023-01-11T21:38:06.4984244Z 2023-01-11T21:38:06.4984248Z 2023-01-11T21:38:06.4984481Z triton_fused_amax_1_convert_element_type_2_exp_1_sub_1_sub_2_sum_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.4984549Z import triton 2023-01-11T21:38:06.4984643Z import triton.language as tl 2023-01-11T21:38:06.4984757Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4984860Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4984992Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4985118Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4985123Z 2023-01-11T21:38:06.4985211Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.4985321Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.4985427Z filename=__file__, 2023-01-11T21:38:06.4985817Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4985924Z @triton.jit 2023-01-11T21:38:06.4986096Z def triton_(in_ptr0, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4986173Z xnumel = 8 2023-01-11T21:38:06.4986251Z rnumel = 8 2023-01-11T21:38:06.4986352Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4986482Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4986566Z xmask = xindex < xnumel 2023-01-11T21:38:06.4986686Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4986757Z x0 = xindex 2023-01-11T21:38:06.4986941Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4987046Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4987136Z rindex = roffset + rbase 2023-01-11T21:38:06.4987216Z rmask = rindex < rnumel 2023-01-11T21:38:06.4987292Z r1 = rindex 2023-01-11T21:38:06.4987532Z tmp0 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4987626Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.4987757Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.4987876Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4987994Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4988096Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4988187Z rindex = roffset + rbase 2023-01-11T21:38:06.4988275Z rmask = rindex < rnumel 2023-01-11T21:38:06.4988349Z r1 = rindex 2023-01-11T21:38:06.4988483Z tmp3 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.4988575Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.4988693Z tmp5 = tmp4 - tmp2 2023-01-11T21:38:06.4988775Z tmp6 = tl.exp(tmp5) 2023-01-11T21:38:06.4988900Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.4989015Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4989096Z tmp8 = tl.log(tmp7) 2023-01-11T21:38:06.4989177Z tmp9 = tl.abs(tmp2) 2023-01-11T21:38:06.4989289Z tmp10 = float("inf") 2023-01-11T21:38:06.4989373Z tmp11 = tmp9 == tmp10 2023-01-11T21:38:06.4989440Z tmp12 = 0.0 
2023-01-11T21:38:06.4989542Z tmp13 = tl.where(tmp11, tmp12, tmp2) 2023-01-11T21:38:06.4989626Z tmp14 = tmp8 + tmp13 2023-01-11T21:38:06.4989717Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.4989791Z tmp16 = 2 2023-01-11T21:38:06.4989908Z tmp17 = tmp15 - tmp16 2023-01-11T21:38:06.4990049Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.4990129Z ''') 2023-01-11T21:38:06.4990134Z 2023-01-11T21:38:06.4990145Z 2023-01-11T21:38:06.4990234Z async_compile.wait(globals()) 2023-01-11T21:38:06.4990316Z del async_compile 2023-01-11T21:38:06.4990321Z 2023-01-11T21:38:06.4990398Z def call(args): 2023-01-11T21:38:06.4990475Z arg0_1, = args 2023-01-11T21:38:06.4990551Z args.clear() 2023-01-11T21:38:06.4990646Z with torch.cuda.device(0): 2023-01-11T21:38:06.4990846Z buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4990935Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4991137Z triton_fused_amax_convert_element_type_convert_element_type_1_exp_sub_sum_1_0.run(arg0_1, buf2, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4991335Z buf5 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4991521Z triton_fused_amax_1_convert_element_type_2_exp_1_sub_1_sub_2_sum_2_1.run(arg0_1, buf5, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4991595Z del arg0_1 2023-01-11T21:38:06.4991682Z return (buf2, buf5, ) 2023-01-11T21:38:06.4991688Z 2023-01-11T21:38:06.4991719Z 2023-01-11T21:38:06.4991802Z if __name__ == "__main__": 2023-01-11T21:38:06.4991924Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4992046Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4992249Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4992364Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4992370Z 2023-01-11T21:38:06.4992443Z ok (0.550s) 2023-01-11T21:38:06.4992903Z test_long_tensor_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4993038Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4993301Z [2023-01-11 21:35:12,897] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 687 2023-01-11T21:38:06.4993563Z [2023-01-11 21:35:12,971] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 687 2023-01-11T21:38:06.4993978Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4994111Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4994362Z [2023-01-11 21:35:12,997] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 688 2023-01-11T21:38:06.4994623Z [2023-01-11 21:35:13,007] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 688 2023-01-11T21:38:06.4994632Z 2023-01-11T21:38:06.4994733Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4994807Z import torch 2023-01-11T21:38:06.4994882Z import random 2023-01-11T21:38:06.4995002Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4995156Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4995162Z 2023-01-11T21:38:06.4995248Z aten = torch.ops.aten 2023-01-11T21:38:06.4995403Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4995507Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4995513Z 2023-01-11T21:38:06.4995606Z import triton 2023-01-11T21:38:06.4995703Z import triton.language as tl 2023-01-11T21:38:06.4995829Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4995970Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4995976Z 2023-01-11T21:38:06.4995981Z 2023-01-11T21:38:06.4996142Z triton_fused_add_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.4996219Z import triton 2023-01-11T21:38:06.4996307Z import triton.language as tl 2023-01-11T21:38:06.4996422Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4996528Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4996664Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4996794Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4996799Z 2023-01-11T21:38:06.4997210Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4997285Z @triton.jit 2023-01-11T21:38:06.4997431Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4997500Z xnumel = 64 2023-01-11T21:38:06.4997598Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4997757Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4997841Z xmask = xindex < xnumel 2023-01-11T21:38:06.4997914Z x0 = xindex 2023-01-11T21:38:06.4998105Z tmp1 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4998207Z tmp4 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4998279Z tmp0 = 294 2023-01-11T21:38:06.4998389Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.4998463Z tmp3 = 295 2023-01-11T21:38:06.4998543Z tmp5 = tmp3 + tmp4 2023-01-11T21:38:06.4998677Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4998812Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.4998898Z ''') 2023-01-11T21:38:06.4998903Z 2023-01-11T21:38:06.4998908Z 2023-01-11T21:38:06.4999003Z async_compile.wait(globals()) 2023-01-11T21:38:06.4999075Z del async_compile 2023-01-11T21:38:06.4999081Z 2023-01-11T21:38:06.4999157Z def call(args): 2023-01-11T21:38:06.4999237Z arg0_1, = args 2023-01-11T21:38:06.4999313Z args.clear() 
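    # annotation: clearing the caller's args list drops the last outside
    # references to the inputs, so they can be freed promptly once the
    # wrapper deletes its own handles (see del arg0_1 below).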
2023-01-11T21:38:06.4999406Z with torch.cuda.device(0): 2023-01-11T21:38:06.4999602Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.4999803Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.4999890Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5000037Z triton_fused_add_sub_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5000113Z del arg0_1 2023-01-11T21:38:06.5000197Z return (buf0, buf1, ) 2023-01-11T21:38:06.5000203Z 2023-01-11T21:38:06.5000207Z 2023-01-11T21:38:06.5000293Z if __name__ == "__main__": 2023-01-11T21:38:06.5000412Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5000539Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5000738Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5000848Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5000854Z 2023-01-11T21:38:06.5000858Z 2023-01-11T21:38:06.5000957Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5001032Z import torch 2023-01-11T21:38:06.5001107Z import random 2023-01-11T21:38:06.5001255Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5001382Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5001387Z 2023-01-11T21:38:06.5001467Z aten = torch.ops.aten 2023-01-11T21:38:06.5007845Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5007961Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5007967Z 2023-01-11T21:38:06.5008040Z import triton 2023-01-11T21:38:06.5008141Z import triton.language as tl 2023-01-11T21:38:06.5008272Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5008419Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5008428Z 2023-01-11T21:38:06.5008432Z 2023-01-11T21:38:06.5008614Z triton_fused_add_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.5008692Z import triton 2023-01-11T21:38:06.5008787Z import triton.language as tl 2023-01-11T21:38:06.5008906Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5009006Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5009146Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5009273Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5009279Z 2023-01-11T21:38:06.5009698Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5009767Z @triton.jit 2023-01-11T21:38:06.5009916Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5010047Z xnumel = 64 2023-01-11T21:38:06.5010139Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5010272Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5010355Z xmask = xindex < xnumel 2023-01-11T21:38:06.5010426Z x0 = xindex 2023-01-11T21:38:06.5010621Z tmp1 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5010720Z tmp4 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5010794Z tmp0 = 294 2023-01-11T21:38:06.5010899Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.5010971Z tmp3 = 295 2023-01-11T21:38:06.5011049Z tmp5 = tmp3 + tmp4 2023-01-11T21:38:06.5011184Z tl.store(out_ptr0 + (x0 
+ tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5011320Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.5011405Z ''') 2023-01-11T21:38:06.5011410Z 2023-01-11T21:38:06.5011415Z 2023-01-11T21:38:06.5011507Z async_compile.wait(globals()) 2023-01-11T21:38:06.5011580Z del async_compile 2023-01-11T21:38:06.5011586Z 2023-01-11T21:38:06.5011659Z def call(args): 2023-01-11T21:38:06.5011731Z arg0_1, = args 2023-01-11T21:38:06.5011805Z args.clear() 2023-01-11T21:38:06.5011899Z with torch.cuda.device(0): 2023-01-11T21:38:06.5012098Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5012294Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5012379Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5012528Z triton_fused_add_sub_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5012598Z del arg0_1 2023-01-11T21:38:06.5012681Z return (buf0, buf1, ) 2023-01-11T21:38:06.5012686Z 2023-01-11T21:38:06.5012691Z 2023-01-11T21:38:06.5012770Z if __name__ == "__main__": 2023-01-11T21:38:06.5012887Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5013012Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5013212Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5013317Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5013326Z 2023-01-11T21:38:06.5013389Z ok (0.136s) 2023-01-11T21:38:06.5013943Z test_lowmem_dropout1_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.5014026Z warnings.warn( 2023-01-11T21:38:06.5014286Z [2023-01-11 21:35:13,028] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 689 2023-01-11T21:38:06.5014778Z [2023-01-11 21:35:13,097] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 689 2023-01-11T21:38:06.5015042Z [2023-01-11 21:35:13,100] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling BACKWARDS graph 689 2023-01-11T21:38:06.5015310Z [2023-01-11 21:35:13,168] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 689 2023-01-11T21:38:06.5015564Z [2023-01-11 21:35:13,287] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 690 2023-01-11T21:38:06.5015824Z [2023-01-11 21:35:13,289] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.5016078Z [2023-01-11 21:35:13,428] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 690 2023-01-11T21:38:06.5016331Z [2023-01-11 21:35:13,431] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling BACKWARDS graph 690 2023-01-11T21:38:06.5016337Z 2023-01-11T21:38:06.5016437Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5016514Z import torch 2023-01-11T21:38:06.5016589Z import random 2023-01-11T21:38:06.5016710Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5016887Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5016893Z 2023-01-11T21:38:06.5016977Z aten = torch.ops.aten 2023-01-11T21:38:06.5017107Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 
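# annotation: each generated wrapper module creates its own AsyncCompile so
# the Triton kernels below can compile in the background; the later
# async_compile.wait(globals()) blocks until all of them are ready.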
2023-01-11T21:38:06.5017266Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5017272Z 2023-01-11T21:38:06.5017380Z import triton 2023-01-11T21:38:06.5017475Z import triton.language as tl 2023-01-11T21:38:06.5017601Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5017746Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5017752Z 2023-01-11T21:38:06.5017757Z 2023-01-11T21:38:06.5017913Z triton_fused_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.5017987Z import triton 2023-01-11T21:38:06.5018073Z import triton.language as tl 2023-01-11T21:38:06.5018187Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5018289Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5018429Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5018555Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5018561Z 2023-01-11T21:38:06.5018992Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5019071Z @triton.jit 2023-01-11T21:38:06.5019216Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5019285Z xnumel = 100000 2023-01-11T21:38:06.5019382Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5019512Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5019595Z xmask = xindex < xnumel 2023-01-11T21:38:06.5019666Z x0 = xindex 2023-01-11T21:38:06.5019856Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5019958Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.5020030Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.5020164Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5020248Z ''') 2023-01-11T21:38:06.5020253Z 2023-01-11T21:38:06.5020258Z 2023-01-11T21:38:06.5020392Z async_compile.wait(globals()) 2023-01-11T21:38:06.5020469Z del async_compile 2023-01-11T21:38:06.5020474Z 2023-01-11T21:38:06.5020547Z def call(args): 2023-01-11T21:38:06.5020637Z primals_1, primals_2 = args 2023-01-11T21:38:06.5020706Z args.clear() 2023-01-11T21:38:06.5020798Z with torch.cuda.device(0): 2023-01-11T21:38:06.5021004Z buf0 = empty_strided((100000, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5021096Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5021254Z triton_fused_mul_0.run(primals_1, primals_2, buf0, 100000, grid=grid(100000), stream=stream0) 2023-01-11T21:38:06.5021332Z del primals_2 2023-01-11T21:38:06.5021429Z return (buf0, primals_1, ) 2023-01-11T21:38:06.5021435Z 2023-01-11T21:38:06.5021439Z 2023-01-11T21:38:06.5021522Z if __name__ == "__main__": 2023-01-11T21:38:06.5021633Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5021759Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5021973Z primals_1 = rand_strided((100000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5022177Z primals_2 = rand_strided((100000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5022310Z print_performance(lambda: call([primals_1, primals_2])) 2023-01-11T21:38:06.5022316Z 2023-01-11T21:38:06.5022320Z 2023-01-11T21:38:06.5022419Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5022494Z import torch 
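# annotation: the module starting here is the compiled BACKWARDS graph for
# the elementwise mul above. Its call() returns (None, buf0), where
# buf0 = tangents_1 * primals_1, i.e. gradient flows only to the second input.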
2023-01-11T21:38:06.5022570Z import random 2023-01-11T21:38:06.5022682Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5022804Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5022838Z 2023-01-11T21:38:06.5022922Z aten = torch.ops.aten 2023-01-11T21:38:06.5023058Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5023154Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5023159Z 2023-01-11T21:38:06.5023232Z import triton 2023-01-11T21:38:06.5023325Z import triton.language as tl 2023-01-11T21:38:06.5023444Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5023583Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5023588Z 2023-01-11T21:38:06.5023593Z 2023-01-11T21:38:06.5023751Z triton_fused_mul_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5023826Z import triton 2023-01-11T21:38:06.5023918Z import triton.language as tl 2023-01-11T21:38:06.5024030Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5024134Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5024267Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5024388Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5024401Z 2023-01-11T21:38:06.5024821Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5024894Z @triton.jit 2023-01-11T21:38:06.5025034Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5025109Z xnumel = 100000 2023-01-11T21:38:06.5025204Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5025331Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5025415Z xmask = xindex < xnumel 2023-01-11T21:38:06.5025479Z x0 = xindex 2023-01-11T21:38:06.5025577Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5025673Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.5025753Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.5025885Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5025968Z ''') 2023-01-11T21:38:06.5025973Z 2023-01-11T21:38:06.5025978Z 2023-01-11T21:38:06.5026069Z async_compile.wait(globals()) 2023-01-11T21:38:06.5026146Z del async_compile 2023-01-11T21:38:06.5026151Z 2023-01-11T21:38:06.5026255Z def call(args): 2023-01-11T21:38:06.5026350Z primals_1, tangents_1 = args 2023-01-11T21:38:06.5026424Z args.clear() 2023-01-11T21:38:06.5026514Z with torch.cuda.device(0): 2023-01-11T21:38:06.5026719Z buf0 = empty_strided((100000, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5026810Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5026968Z triton_fused_mul_1_0.run(tangents_1, primals_1, buf0, 100000, grid=grid(100000), stream=stream0) 2023-01-11T21:38:06.5027040Z del primals_1 2023-01-11T21:38:06.5027116Z del tangents_1 2023-01-11T21:38:06.5027202Z return (None, buf0, ) 2023-01-11T21:38:06.5027210Z 2023-01-11T21:38:06.5027215Z 2023-01-11T21:38:06.5027296Z if __name__ == "__main__": 2023-01-11T21:38:06.5027414Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5027537Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5027747Z primals_1 = rand_strided((100000, ), (1, ), device='cuda:0', 
dtype=torch.float32) 2023-01-11T21:38:06.5027957Z tangents_1 = rand_strided((100000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5028083Z print_performance(lambda: call([primals_1, tangents_1])) 2023-01-11T21:38:06.5028089Z 2023-01-11T21:38:06.5028099Z 2023-01-11T21:38:06.5028190Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5028264Z import torch 2023-01-11T21:38:06.5028337Z import random 2023-01-11T21:38:06.5028455Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5028578Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5028624Z 2023-01-11T21:38:06.5028705Z aten = torch.ops.aten 2023-01-11T21:38:06.5028840Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5028929Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5028934Z 2023-01-11T21:38:06.5029007Z import triton 2023-01-11T21:38:06.5029099Z import triton.language as tl 2023-01-11T21:38:06.5029224Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5029364Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5029528Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.5029533Z 2023-01-11T21:38:06.5029538Z 2023-01-11T21:38:06.5029694Z triton_fused_mul_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.5029768Z import triton 2023-01-11T21:38:06.5029854Z import triton.language as tl 2023-01-11T21:38:06.5029969Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5030071Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5030204Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5030327Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5030332Z 2023-01-11T21:38:06.5030766Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5030837Z @triton.jit 2023-01-11T21:38:06.5030985Z def triton_(seed0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5031054Z xnumel = 100000 2023-01-11T21:38:06.5031151Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5031279Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5031361Z xmask = xindex < xnumel 2023-01-11T21:38:06.5031430Z x0 = xindex 2023-01-11T21:38:06.5031662Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.5031854Z tmp6 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5031945Z tmp7 = tl.load(in_ptr2 + (x0), xmask) 2023-01-11T21:38:06.5032016Z tmp1 = x0 2023-01-11T21:38:06.5032104Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.5032208Z tmp3 = 0.33 2023-01-11T21:38:06.5032287Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.5032376Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.5032455Z tmp8 = tmp6 * tmp7 2023-01-11T21:38:06.5032526Z tmp9 = tmp5 * tmp8 2023-01-11T21:38:06.5032604Z tmp10 = 1.492537313432836 2023-01-11T21:38:06.5032684Z tmp11 = tmp9 * tmp10 2023-01-11T21:38:06.5032823Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.5032906Z ''') 2023-01-11T21:38:06.5032911Z 2023-01-11T21:38:06.5032916Z 2023-01-11T21:38:06.5033008Z async_compile.wait(globals()) 
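# annotation: triton_fused_mul_2_0 above recomputes the dropout mask on the
# fly instead of loading a saved one: tl.rand(seed, x0) > 0.33 keeps roughly
# 67% of elements, and tmp10 = 1.492537313432836 is the 1/(1 - 0.33) rescale.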
2023-01-11T21:38:06.5033085Z del async_compile 2023-01-11T21:38:06.5033093Z 2023-01-11T21:38:06.5033160Z def call(args): 2023-01-11T21:38:06.5033253Z primals_1, primals_2 = args 2023-01-11T21:38:06.5033326Z args.clear() 2023-01-11T21:38:06.5033461Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.5033550Z with torch.cuda.device(0): 2023-01-11T21:38:06.5033758Z buf0 = empty_strided((100000, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5033848Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5034020Z triton_fused_mul_2_0.run(seed_cuda_0, primals_1, primals_2, buf0, 100000, grid=grid(100000), stream=stream0) 2023-01-11T21:38:06.5034091Z del primals_2 2023-01-11T21:38:06.5034207Z return (buf0, primals_1, seed_cuda_0.clone(), ) 2023-01-11T21:38:06.5034212Z 2023-01-11T21:38:06.5034217Z 2023-01-11T21:38:06.5034299Z if __name__ == "__main__": 2023-01-11T21:38:06.5034417Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5034543Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5034766Z seed_cuda_0 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5034980Z primals_1 = rand_strided((100000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5035186Z primals_2 = rand_strided((100000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5035322Z print_performance(lambda: call([primals_1, primals_2])) 2023-01-11T21:38:06.5035329Z 2023-01-11T21:38:06.5035634Z [2023-01-11 21:35:13,571] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 690 2023-01-11T21:38:06.5035640Z 2023-01-11T21:38:06.5035737Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5035812Z import torch 2023-01-11T21:38:06.5035889Z import random 2023-01-11T21:38:06.5036008Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5036131Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5036136Z 2023-01-11T21:38:06.5036220Z aten = torch.ops.aten 2023-01-11T21:38:06.5036349Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5036442Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5036447Z 2023-01-11T21:38:06.5036519Z import triton 2023-01-11T21:38:06.5036608Z import triton.language as tl 2023-01-11T21:38:06.5036737Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5036875Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5036880Z 2023-01-11T21:38:06.5036885Z 2023-01-11T21:38:06.5037041Z triton_fused_mul_5_0 = async_compile.triton(''' 2023-01-11T21:38:06.5037115Z import triton 2023-01-11T21:38:06.5037201Z import triton.language as tl 2023-01-11T21:38:06.5037316Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5037416Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5037548Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5037673Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5037681Z 2023-01-11T21:38:06.5038116Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5038216Z @triton.jit 2023-01-11T21:38:06.5038366Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
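    # annotation: this backward kernel regenerates the identical mask from the
    # saved philox seed plus element index, so the 100000-element mask never
    # has to be stored between forward and backward (the "lowmem" in the test).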
2023-01-11T21:38:06.5038435Z xnumel = 100000 2023-01-11T21:38:06.5038532Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5038660Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5038743Z xmask = xindex < xnumel 2023-01-11T21:38:06.5038814Z x0 = xindex 2023-01-11T21:38:06.5038946Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.5039043Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.5039139Z tmp10 = tl.load(in_ptr2 + (x0), xmask) 2023-01-11T21:38:06.5039210Z tmp1 = x0 2023-01-11T21:38:06.5039301Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.5039373Z tmp3 = 0.33 2023-01-11T21:38:06.5039451Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.5039541Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.5039618Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.5039693Z tmp8 = 1.492537313432836 2023-01-11T21:38:06.5039771Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.5039851Z tmp11 = tmp9 * tmp10 2023-01-11T21:38:06.5039986Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.5040071Z ''') 2023-01-11T21:38:06.5040077Z 2023-01-11T21:38:06.5040081Z 2023-01-11T21:38:06.5040173Z async_compile.wait(globals()) 2023-01-11T21:38:06.5040250Z del async_compile 2023-01-11T21:38:06.5040255Z 2023-01-11T21:38:06.5040324Z def call(args): 2023-01-11T21:38:06.5040439Z primals_1, philox_seed_like, tangents_1 = args 2023-01-11T21:38:06.5040512Z args.clear() 2023-01-11T21:38:06.5040632Z with torch.cuda.device(0): 2023-01-11T21:38:06.5040836Z buf0 = empty_strided((100000, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5040928Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5041107Z triton_fused_mul_5_0.run(philox_seed_like, tangents_1, primals_1, buf0, 100000, grid=grid(100000), stream=stream0) 2023-01-11T21:38:06.5041194Z del philox_seed_like 2023-01-11T21:38:06.5041264Z del primals_1 2023-01-11T21:38:06.5041342Z del tangents_1 2023-01-11T21:38:06.5041425Z return (None, buf0, ) 2023-01-11T21:38:06.5041430Z 2023-01-11T21:38:06.5041435Z 2023-01-11T21:38:06.5041513Z if __name__ == "__main__": 2023-01-11T21:38:06.5041631Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5041756Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5041968Z primals_1 = rand_strided((100000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5042169Z philox_seed_like = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5042380Z tangents_1 = rand_strided((100000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5042532Z print_performance(lambda: call([primals_1, philox_seed_like, tangents_1])) 2023-01-11T21:38:06.5042538Z 2023-01-11T21:38:06.5042614Z ok (0.565s) 2023-01-11T21:38:06.5042948Z test_lowmem_dropout2_cuda (__main__.CudaTests) ... 
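Annotation: the lowmem-dropout kernels above never materialize the dropout
mask; forward and backward both rederive it from a seed and the element index.
Below is a minimal eager-mode sketch of that idea. All names are illustrative
and CUDA is assumed, as in the log; this is not Inductor's actual codegen.

import torch

def lowmem_dropout_fwd(x, p, seed):
    # the mask is a pure function of (seed, index), so it need not be saved
    g = torch.Generator(device=x.device).manual_seed(seed)
    mask = (torch.rand(x.shape, generator=g, device=x.device) > p).to(x.dtype)
    return x * mask * (1.0 / (1.0 - p))

def lowmem_dropout_bwd(grad, p, seed):
    # regenerate the same mask from the same seed; only the seed was kept
    g = torch.Generator(device=grad.device).manual_seed(seed)
    mask = (torch.rand(grad.shape, generator=g, device=grad.device) > p).to(grad.dtype)
    return grad * mask * (1.0 / (1.0 - p))

x = torch.randn(100000, device="cuda")
print(lowmem_dropout_fwd(x, 0.33, 1234).shape)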
[2023-01-11 21:35:13,808] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 691 2023-01-11T21:38:06.5043203Z [2023-01-11 21:35:13,810] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.5043466Z [2023-01-11 21:35:13,976] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 691 2023-01-11T21:38:06.5043720Z [2023-01-11 21:35:14,130] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling BACKWARDS graph 691 2023-01-11T21:38:06.5043727Z 2023-01-11T21:38:06.5043830Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5043898Z import torch 2023-01-11T21:38:06.5043972Z import random 2023-01-11T21:38:06.5044091Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5044215Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5044221Z 2023-01-11T21:38:06.5044331Z aten = torch.ops.aten 2023-01-11T21:38:06.5044469Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5044563Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5044568Z 2023-01-11T21:38:06.5044642Z import triton 2023-01-11T21:38:06.5044728Z import triton.language as tl 2023-01-11T21:38:06.5044854Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5044991Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5045156Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.5045161Z 2023-01-11T21:38:06.5045169Z 2023-01-11T21:38:06.5045328Z triton_fused_mul_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5045402Z import triton 2023-01-11T21:38:06.5045495Z import triton.language as tl 2023-01-11T21:38:06.5045603Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5045706Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5045843Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5045970Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5045975Z 2023-01-11T21:38:06.5046389Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i64', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5046463Z @triton.jit 2023-01-11T21:38:06.5046596Z def triton_(in_out_ptr0, seed0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5046671Z xnumel = 256 2023-01-11T21:38:06.5046842Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5046966Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5047050Z xmask = xindex < xnumel 2023-01-11T21:38:06.5047120Z x0 = xindex 2023-01-11T21:38:06.5047350Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.5047456Z tmp6 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5047528Z tmp1 = x0 2023-01-11T21:38:06.5047616Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.5047683Z tmp3 = 0.5 2023-01-11T21:38:06.5047764Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.5047851Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.5047929Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.5048002Z tmp8 = 2.0 2023-01-11T21:38:06.5048077Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.5048211Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.5048297Z ''') 2023-01-11T21:38:06.5048302Z 
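# annotation: triton_fused_mul_3_1 below is the same kernel as
# triton_fused_mul_1_0 above except for the philox offset (tmp1 = 256 + x0
# versus tmp1 = x0): the model's two dropout calls consume disjoint
# 256-element slices of the random stream derived from one seed_cuda_0.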
2023-01-11T21:38:06.5048310Z 2023-01-11T21:38:06.5048464Z triton_fused_mul_3_1 = async_compile.triton(''' 2023-01-11T21:38:06.5048539Z import triton 2023-01-11T21:38:06.5048631Z import triton.language as tl 2023-01-11T21:38:06.5048745Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5048845Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5048982Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5049100Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5049106Z 2023-01-11T21:38:06.5049520Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i64', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5049595Z @triton.jit 2023-01-11T21:38:06.5049730Z def triton_(in_out_ptr0, seed0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5049805Z xnumel = 256 2023-01-11T21:38:06.5049905Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5050035Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5050119Z xmask = xindex < xnumel 2023-01-11T21:38:06.5050183Z x0 = xindex 2023-01-11T21:38:06.5050440Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.5050546Z tmp6 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5050622Z tmp1 = 256 + x0 2023-01-11T21:38:06.5050712Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.5050785Z tmp3 = 0.5 2023-01-11T21:38:06.5050863Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.5050944Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.5051022Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.5051093Z tmp8 = 2.0 2023-01-11T21:38:06.5051172Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.5051311Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.5051396Z ''') 2023-01-11T21:38:06.5051404Z 2023-01-11T21:38:06.5051409Z 2023-01-11T21:38:06.5051502Z async_compile.wait(globals()) 2023-01-11T21:38:06.5051572Z del async_compile 2023-01-11T21:38:06.5051577Z 2023-01-11T21:38:06.5051650Z def call(args): 2023-01-11T21:38:06.5051756Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.5051834Z args.clear() 2023-01-11T21:38:06.5051975Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.5052068Z with torch.cuda.device(0): 2023-01-11T21:38:06.5052272Z buf0 = empty_strided((8, 32), (32, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5052401Z aten.mm.out(primals_3, as_strided(primals_1, (32, 32), (1, 32)), out=buf0) 2023-01-11T21:38:06.5052480Z del primals_1 2023-01-11T21:38:06.5052573Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.5052668Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5052811Z triton_fused_mul_1_0.run(buf1, seed_cuda_0, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.5053045Z buf2 = empty_strided((8, 32), (32, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5053174Z aten.mm.out(buf1, as_strided(primals_2, (32, 32), (1, 32)), out=buf2) 2023-01-11T21:38:06.5053265Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:06.5053400Z triton_fused_mul_3_1.run(buf3, seed_cuda_0, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.5053564Z return (buf3, primals_3, seed_cuda_0.clone(), buf1, as_strided(primals_2, (32, 32), (32, 1)), ) 2023-01-11T21:38:06.5053570Z 2023-01-11T21:38:06.5053574Z 2023-01-11T21:38:06.5053655Z if __name__ == "__main__": 
2023-01-11T21:38:06.5053774Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5053901Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5054096Z seed_cuda_0 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5054305Z primals_1 = rand_strided((32, 32), (32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5054630Z primals_2 = rand_strided((32, 32), (32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5054832Z primals_3 = rand_strided((8, 32), (32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5054977Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.5054986Z 2023-01-11T21:38:06.5055255Z [2023-01-11 21:35:14,425] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 691 2023-01-11T21:38:06.5055260Z 2023-01-11T21:38:06.5055358Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5055433Z import torch 2023-01-11T21:38:06.5055509Z import random 2023-01-11T21:38:06.5055628Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5055753Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5055758Z 2023-01-11T21:38:06.5055841Z aten = torch.ops.aten 2023-01-11T21:38:06.5055970Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5056068Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5056073Z 2023-01-11T21:38:06.5056147Z import triton 2023-01-11T21:38:06.5056239Z import triton.language as tl 2023-01-11T21:38:06.5056364Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5056554Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5056560Z 2023-01-11T21:38:06.5056565Z 2023-01-11T21:38:06.5056810Z triton_fused_convert_element_type_2_gt_2_mul_4_mul_5_philox_rand_like_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.5056888Z import triton 2023-01-11T21:38:06.5056973Z import triton.language as tl 2023-01-11T21:38:06.5057085Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5057254Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5057433Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5057563Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5057568Z 2023-01-11T21:38:06.5057990Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5058066Z @triton.jit 2023-01-11T21:38:06.5058209Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5058277Z xnumel = 256 2023-01-11T21:38:06.5058374Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5058502Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5058587Z xmask = xindex < xnumel 2023-01-11T21:38:06.5058658Z x0 = xindex 2023-01-11T21:38:06.5058899Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.5058997Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.5059065Z tmp1 = 256 + x0 2023-01-11T21:38:06.5059156Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.5059273Z tmp3 = 0.5 2023-01-11T21:38:06.5059350Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.5059441Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.5059519Z 
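Note how randomness is threaded through this forward graph: a single int64 seed (seed_cuda_0) is drawn once per call with torch.randint, and each dropout site reads that same seed at a distinct Philox counter offset, tl.rand(seed, x0) for the first site and tl.rand(seed, 256 + x0) for the second, so the two masks are independent yet fully reproducible from one scalar. A sketch of the pattern (hypothetical names; a torch.Generator drawn from twice emulates the two counter ranges):

import torch

def lowmem_dropout2_forward(x, w1, w2, seed, p=0.5):
    # One seed for the whole graph; each dropout site consumes the next
    # slice of the random stream (offsets 0..255, then 256..511 above).
    gen = torch.Generator(device=x.device)
    gen.manual_seed(int(seed))
    h = x @ w1.t()
    h = h * (torch.rand(h.shape, generator=gen, device=x.device) > p) / (1 - p)
    out = h @ w2.t()
    out = out * (torch.rand(out.shape, generator=gen, device=x.device) > p) / (1 - p)
    return out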
[2023-01-11 21:35:14,425] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 691

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_convert_element_type_2_gt_2_mul_4_mul_5_philox_rand_like_2_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last')
    tmp6 = tl.load(in_ptr1 + (x0), xmask)
    tmp1 = 256 + x0
    tmp2 = tl.rand(tmp0, tmp1)
    tmp3 = 0.5
    tmp4 = tmp2 > tmp3
    tmp5 = tmp4.to(tl.float32)
    tmp7 = tmp5 * tmp6
    tmp8 = 2.0
    tmp9 = tmp7 * tmp8
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask)
''')


triton_fused_convert_element_type_2_convert_element_type_3_gt_2_gt_3_mm_3_mul_4_mul_5_mul_6_mul_7_philox_rand_like_2_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i64', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None)
    tmp6 = tl.load(in_out_ptr0 + (x0), xmask)
    tmp1 = x0
    tmp2 = tl.rand(tmp0, tmp1)
    tmp3 = 0.5
    tmp4 = tmp2 > tmp3
    tmp5 = tmp4.to(tl.float32)
    tmp7 = tmp5 * tmp6
    tmp8 = 2.0
    tmp9 = tmp7 * tmp8
    tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    primals_3, philox_seed_like, mul_1, permute_4, tangents_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 32), (32, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_convert_element_type_2_gt_2_mul_4_mul_5_philox_rand_like_2_0.run(philox_seed_like, tangents_1, buf0, 256, grid=grid(256), stream=stream0)
        del tangents_1
        buf1 = empty_strided((32, 32), (32, 1), device='cuda', dtype=torch.float32)
        aten.mm.out(as_strided(buf0, (32, 8), (1, 32)), mul_1, out=buf1)
        del mul_1
        buf2 = empty_strided((8, 32), (32, 1), device='cuda', dtype=torch.float32)
        aten.mm.out(buf0, permute_4, out=buf2)
        del buf0
        del permute_4
        buf3 = buf2; del buf2  # reuse
        triton_fused_convert_element_type_2_convert_element_type_3_gt_2_gt_3_mm_3_mul_4_mul_5_mul_6_mul_7_philox_rand_like_2_1.run(buf3, philox_seed_like, 256, grid=grid(256), stream=stream0)
        del philox_seed_like
        buf4 = empty_strided((32, 32), (32, 1), device='cuda', dtype=torch.float32)
        aten.mm.out(as_strided(buf3, (32, 8), (1, 32)), primals_3, out=buf4)
        del buf3
        del primals_3
        return (as_strided(buf4, (32, 32), (32, 1)), as_strided(buf1, (32, 32), (32, 1)), None, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    primals_3 = rand_strided((8, 32), (32, 1), device='cuda:0', dtype=torch.float32)
    philox_seed_like = rand_strided((), (), device='cuda:0', dtype=torch.int64)
    mul_1 = rand_strided((8, 32), (32, 1), device='cuda:0', dtype=torch.float32)
    permute_4 = rand_strided((32, 32), (32, 1), device='cuda:0', dtype=torch.float32)
    tangents_1 = rand_strided((8, 32), (32, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([primals_3, philox_seed_like, mul_1, permute_4, tangents_1]))

ok (0.851s)
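The matching backward above receives only philox_seed_like (the clone of seed_cuda_0 returned by the forward), the saved activation mul_1, and a transposed weight permute_4, then regenerates both dropout masks in-kernel at counter offsets 256 + x0 and x0; no mask tensor is ever materialized between the passes. A sketch of the replay idea under those assumptions (hypothetical helper; a seeded generator stands in for the Philox counter):

import torch

def replay_dropout_masks(seed, shape1, shape2, p=0.5, device='cuda'):
    # Draw masks in the same order as the forward pass, so the backward
    # pass sees identical keep/drop decisions without storing any mask.
    gen = torch.Generator(device=device)
    gen.manual_seed(int(seed))
    mask1 = torch.rand(shape1, generator=gen, device=device) > p  # counter 0..255
    mask2 = torch.rand(shape2, generator=gen, device=device) > p  # counter 256..511
    return mask1, mask2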
test_masked_fill_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:14,461] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 692
[2023-01-11 21:35:14,550] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 692
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:14,587] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 693
[2023-01-11 21:35:14,684] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 693

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_where_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i1', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 16
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp2 = tl.load(in_ptr1 + (x2), xmask, eviction_policy='evict_last')
    tmp6 = tl.load(in_ptr0 + (x0), xmask)
    tmp9 = tl.load(in_ptr1 + (x2), xmask)
    tmp1 = -10000.0
    tmp3 = tl.where(tmp0, tmp1, tmp2)
    tmp4 = 2
    tmp5 = tmp3 + tmp4
    tmp7 = tmp6 == 0
    tmp8 = 667.0
    tmp10 = 2.0
    tmp11 = tmp9 / tmp10
    tmp12 = tl.where(tmp7, tmp8, tmp11)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
    tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_where_1_0.run(arg0_1, arg1_1, buf0, buf1, 256, grid=grid(256), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 16), (16, 1), device='cuda:0', dtype=torch.bool)
    arg1_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_where_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i1', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 16
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp2 = tl.load(in_ptr1 + (x2), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp6 = tl.load(in_ptr0 + (x0), xmask)
    tmp9 = tl.load(in_ptr1 + (x2), xmask).to(tl.float32)
    tmp1 = -10000.0
    tmp3 = tl.where(tmp0, tmp1, tmp2)
    tmp4 = 2
    tmp5 = tmp3 + tmp4
    tmp7 = tmp6 == 0
    tmp8 = 667.0
    tmp10 = 2.0
    tmp11 = tmp9 / tmp10
    tmp12 = tl.where(tmp7, tmp8, tmp11)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
    tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_where_1_0.run(arg0_1, arg1_1, buf0, buf1, 256, grid=grid(256), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 16), (16, 1), device='cuda:0', dtype=torch.bool)
    arg1_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.262s)
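Both dumps implement the same pair of selects, first in float32 and then in float16 (the fp16 variant upcasts its loads with .to(tl.float32) and stores the results back as fp16). In eager terms the fused kernel computes both outputs in a single launch, roughly (a sketch; the helper name is illustrative):

import torch

def masked_fill_pair(mask, x):
    # Two masked_fill-style selects over x, fused into one pass above:
    out0 = x.masked_fill(mask, -10000.0) + 2       # where(mask, -1e4, x) + 2
    out1 = (x / 2).masked_fill(mask == 0, 667.0)   # where(mask == 0, 667, x/2)
    return out0, out1

mask = torch.zeros(1, 16, dtype=torch.bool, device='cuda')
x = torch.randn(16, 16, device='cuda')
out0, out1 = masked_fill_pair(mask, x)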
test_masked_fill_promotion_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')`
  warnings.warn(
[2023-01-11 21:35:14,714] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 694
[2023-01-11 21:35:14,787] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 694
[2023-01-11 21:35:14,941] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 695
[2023-01-11 21:35:14,947] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name
[2023-01-11 21:35:15,021] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 695

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_where_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i1', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 16
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp3 = tl.load(in_ptr1 + (x2), xmask).to(tl.float32)
    tmp1 = 3.5
    tmp2 = tmp1.to(tl.float32)
    tmp4 = tl.where(tmp0, tmp2, tmp3)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_where_0.run(arg0_1, arg1_1, buf0, 256, grid=grid(256), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 16), (16, 1), device='cuda:0', dtype=torch.bool)
    arg1_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_where_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i1', 1: '*i64', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 16
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp3 = tl.load(in_ptr1 + (x2), xmask)
    tmp1 = 3.5
    tmp2 = tmp1.to(tl.int64)
    tmp4 = tl.where(tmp0, tmp2, tmp3)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.int64)
        stream0 = get_cuda_stream(0)
        triton_fused_where_0.run(arg0_1, arg1_1, buf0, 256, grid=grid(256), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 16), (16, 1), device='cuda:0', dtype=torch.bool)
    arg1_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.466s)
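The promotion test compiles the same where(mask, 3.5, x) twice: the fp16 graph keeps the scalar as a float (tmp1.to(tl.float32)), while the int64 graph casts it to integer (tmp1.to(tl.int64), truncating 3.5 to 3), so each output keeps its input's dtype. Roughly equivalent eager code (a sketch; the explicit scalar casts mirror tmp2 in the kernels):

import torch

mask = torch.rand(1, 16, device='cuda') > 0.5
x_half = torch.randn(16, 16, device='cuda', dtype=torch.float16)
x_long = torch.randint(10, (16, 16), device='cuda', dtype=torch.int64)

# The scalar is converted to the tensor's dtype before the select,
# so the int64 graph effectively fills with 3.
out_half = torch.where(mask, torch.tensor(3.5, device='cuda', dtype=torch.float16), x_half)
out_long = torch.where(mask, torch.tensor(3.5, device='cuda').to(torch.int64), x_long)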
test_max_min_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:15,168] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 696
[2023-01-11 21:35:15,256] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 696
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:15,272] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 697
[2023-01-11 21:35:15,350] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 697
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:15,366] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 698
[2023-01-11 21:35:15,374] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 698
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:15,390] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 699

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_maximum_minimum_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + (x0), xmask)
    tmp4 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1))
    tmp5 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp4, tmp3, tmp4))
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_maximum_minimum_0.run(arg0_1, arg1_1, buf0, buf1, 8, grid=grid(8), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_maximum_minimum_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp4 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1))
    tmp5 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp4, tmp3, tmp4))
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_maximum_minimum_0.run(arg0_1, arg1_1, buf0, buf1, 8, grid=grid(8), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_maximum_minimum_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + (x0), xmask)
    tmp4 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1))
    tmp5 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp4, tmp3, tmp4))
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_maximum_minimum_0.run(arg0_1, arg1_1, buf0, buf1, 8, grid=grid(8), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

[2023-01-11 21:35:15,399] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 699

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_maximum_minimum_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp4 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1))
    tmp5 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp4, tmp3, tmp4))
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_maximum_minimum_0.run(arg0_1, arg1_1, buf0, buf1, 8, grid=grid(8), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.248s)
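Each of the four max/min graphs lowers torch.maximum and torch.minimum with explicit NaN propagation: tl.where(a != a, a, ...) returns a whenever a is NaN, matching the eager contract, and the fp16 variants do the comparison in fp32 before storing fp16. A minimal eager sketch of the function under test (names are illustrative):

import torch

def max_min(a, b):
    # torch.maximum/minimum propagate NaNs, hence the a != a
    # checks in the generated kernels above.
    return torch.maximum(a, b), torch.minimum(a, b)

a = torch.randn(8, device='cuda')
b = torch.randn(8, device='cuda')
out_max, out_min = max_min(a, b)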
test_max_pool2d1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:15,419] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 700
[2023-01-11 21:35:15,797] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 700
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:15,816] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 701

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_getitem_getitem_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 392
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 7
    x1 = (xindex // 7) % 7
    x2 = (xindex // 49)
    x3 = xindex
    tmp0 = tl.load(in_ptr0 + ((2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp5 = tl.load(in_ptr0 + (16 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp7 = tl.load(in_ptr0 + (17 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp9 = tl.load(in_ptr0 + (18 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp11 = tl.load(in_ptr0 + (32 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp13 = tl.load(in_ptr0 + (33 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp15 = tl.load(in_ptr0 + (34 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp17 = tl.load(in_ptr0 + ((2*x0) + (32*x1) + (256*x2)), xmask)
    tmp19 = tl.load(in_ptr0 + (1 + (2*x0) + (32*x1) + (256*x2)), xmask)
    tmp24 = tl.load(in_ptr0 + (2 + (2*x0) + (32*x1) + (256*x2)), xmask)
    tmp29 = tl.load(in_ptr0 + (16 + (2*x0) + (32*x1) + (256*x2)), xmask)
    tmp34 = tl.load(in_ptr0 + (17 + (2*x0) + (32*x1) + (256*x2)), xmask)
    tmp39 = tl.load(in_ptr0 + (18 + (2*x0) + (32*x1) + (256*x2)), xmask)
    tmp44 = tl.load(in_ptr0 + (32 + (2*x0) + (32*x1) + (256*x2)), xmask)
    tmp49 = tl.load(in_ptr0 + (33 + (2*x0) + (32*x1) + (256*x2)), xmask)
    tmp54 = tl.load(in_ptr0 + (34 + (2*x0) + (32*x1) + (256*x2)), xmask)
    tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0))
    tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2))
    tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4))
    tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6))
    tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8))
    tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10))
    tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12))
    tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14))
    tmp18 = (2*x0) + (32*x1)
    tmp20 = 1 + (2*x0) + (32*x1)
    tmp21 = tmp19 > tmp17
    tmp22 = tl.where(tmp21, tmp20, tmp18)
    tmp23 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp17, tmp19, tmp17))
    tmp25 = 2 + (2*x0) + (32*x1)
    tmp26 = tmp24 > tmp23
    tmp27 = tl.where(tmp26, tmp25, tmp22)
    tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 > tmp23, tmp24, tmp23))
    tmp30 = 16 + (2*x0) + (32*x1)
    tmp31 = tmp29 > tmp28
    tmp32 = tl.where(tmp31, tmp30, tmp27)
    tmp33 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 > tmp28, tmp29, tmp28))
    tmp35 = 17 + (2*x0) + (32*x1)
    tmp36 = tmp34 > tmp33
    tmp37 = tl.where(tmp36, tmp35, tmp32)
    tmp38 = tl.where(tmp34 != tmp34, tmp34, tl.where(tmp34 > tmp33, tmp34, tmp33))
    tmp40 = 18 + (2*x0) + (32*x1)
    tmp41 = tmp39 > tmp38
    tmp42 = tl.where(tmp41, tmp40, tmp37)
    tmp43 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp38, tmp39, tmp38))
    tmp45 = 32 + (2*x0) + (32*x1)
    tmp46 = tmp44 > tmp43
    tmp47 = tl.where(tmp46, tmp45, tmp42)
    tmp48 = tl.where(tmp44 != tmp44, tmp44, tl.where(tmp44 > tmp43, tmp44, tmp43))
    tmp50 = 33 + (2*x0) + (32*x1)
    tmp51 = tmp49 > tmp48
    tmp52 = tl.where(tmp51, tmp50, tmp47)
    tmp53 = tl.where(tmp49 != tmp49, tmp49, tl.where(tmp49 > tmp48, tmp49, tmp48))
    tmp55 = 34 + (2*x0) + (32*x1)
    tmp56 = tmp54 > tmp53
    tmp57 = tl.where(tmp56, tmp55, tmp52)
    tmp58 = tl.where(tmp54 != tmp54, tmp54, tl.where(tmp54 > tmp53, tmp54, tmp53))
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask)
    tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp57, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cuda', dtype=torch.int64)
        stream0 = get_cuda_stream(0)
        triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 392, grid=grid(392), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

[2023-01-11 21:35:16,074] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 701

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_getitem_getitem_1_0 = async_compile.triton('''
2023-01-11T21:38:06.5147104Z import triton 2023-01-11T21:38:06.5147202Z import triton.language as tl 2023-01-11T21:38:06.5147317Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5147421Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5147550Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5147679Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5147684Z 2023-01-11T21:38:06.5148101Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5148176Z @triton.jit 2023-01-11T21:38:06.5148320Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5148427Z xnumel = 392 2023-01-11T21:38:06.5148525Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5148656Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5148735Z xmask = xindex < xnumel 2023-01-11T21:38:06.5148811Z x0 = xindex % 7 2023-01-11T21:38:06.5148894Z x1 = (xindex // 7) % 7 2023-01-11T21:38:06.5148974Z x2 = (xindex // 49) 2023-01-11T21:38:06.5149050Z x3 = xindex 2023-01-11T21:38:06.5149296Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5149537Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5149773Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5150008Z tmp5 = tl.load(in_ptr0 + (16 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5150250Z tmp7 = tl.load(in_ptr0 + (17 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5150483Z tmp9 = tl.load(in_ptr0 + (18 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5150728Z tmp11 = tl.load(in_ptr0 + (32 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5150964Z tmp13 = tl.load(in_ptr0 + (33 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5151198Z tmp15 = tl.load(in_ptr0 + (34 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5151330Z tmp17 = tl.load(in_ptr0 + ((2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5151465Z tmp19 = tl.load(in_ptr0 + (1 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5151603Z tmp24 = tl.load(in_ptr0 + (2 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5151733Z tmp29 = tl.load(in_ptr0 + (16 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5151870Z tmp34 = tl.load(in_ptr0 + (17 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5152034Z tmp39 = tl.load(in_ptr0 + (18 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5152165Z tmp44 = tl.load(in_ptr0 + (32 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5152293Z tmp49 = tl.load(in_ptr0 + (33 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5152421Z tmp54 = tl.load(in_ptr0 + (34 + (2*x0) + (32*x1) + 
(256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5152560Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.5152697Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.5152824Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.5152960Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.5153097Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 2023-01-11T21:38:06.5153243Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.5153381Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.5153517Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.5153604Z tmp18 = (2*x0) + (32*x1) 2023-01-11T21:38:06.5153690Z tmp20 = 1 + (2*x0) + (32*x1) 2023-01-11T21:38:06.5153766Z tmp21 = tmp19 > tmp17 2023-01-11T21:38:06.5153866Z tmp22 = tl.where(tmp21, tmp20, tmp18) 2023-01-11T21:38:06.5154007Z tmp23 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp17, tmp19, tmp17)) 2023-01-11T21:38:06.5154092Z tmp25 = 2 + (2*x0) + (32*x1) 2023-01-11T21:38:06.5154220Z tmp26 = tmp24 > tmp23 2023-01-11T21:38:06.5154319Z tmp27 = tl.where(tmp26, tmp25, tmp22) 2023-01-11T21:38:06.5154456Z tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 > tmp23, tmp24, tmp23)) 2023-01-11T21:38:06.5154536Z tmp30 = 16 + (2*x0) + (32*x1) 2023-01-11T21:38:06.5154618Z tmp31 = tmp29 > tmp28 2023-01-11T21:38:06.5154721Z tmp32 = tl.where(tmp31, tmp30, tmp27) 2023-01-11T21:38:06.5154862Z tmp33 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 > tmp28, tmp29, tmp28)) 2023-01-11T21:38:06.5154946Z tmp35 = 17 + (2*x0) + (32*x1) 2023-01-11T21:38:06.5155029Z tmp36 = tmp34 > tmp33 2023-01-11T21:38:06.5155129Z tmp37 = tl.where(tmp36, tmp35, tmp32) 2023-01-11T21:38:06.5155266Z tmp38 = tl.where(tmp34 != tmp34, tmp34, tl.where(tmp34 > tmp33, tmp34, tmp33)) 2023-01-11T21:38:06.5155366Z tmp40 = 18 + (2*x0) + (32*x1) 2023-01-11T21:38:06.5155457Z tmp41 = tmp39 > tmp38 2023-01-11T21:38:06.5155576Z tmp42 = tl.where(tmp41, tmp40, tmp37) 2023-01-11T21:38:06.5155722Z tmp43 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp38, tmp39, tmp38)) 2023-01-11T21:38:06.5155807Z tmp45 = 32 + (2*x0) + (32*x1) 2023-01-11T21:38:06.5155889Z tmp46 = tmp44 > tmp43 2023-01-11T21:38:06.5155981Z tmp47 = tl.where(tmp46, tmp45, tmp42) 2023-01-11T21:38:06.5156120Z tmp48 = tl.where(tmp44 != tmp44, tmp44, tl.where(tmp44 > tmp43, tmp44, tmp43)) 2023-01-11T21:38:06.5156203Z tmp50 = 33 + (2*x0) + (32*x1) 2023-01-11T21:38:06.5156286Z tmp51 = tmp49 > tmp48 2023-01-11T21:38:06.5156383Z tmp52 = tl.where(tmp51, tmp50, tmp47) 2023-01-11T21:38:06.5156522Z tmp53 = tl.where(tmp49 != tmp49, tmp49, tl.where(tmp49 > tmp48, tmp49, tmp48)) 2023-01-11T21:38:06.5156604Z tmp55 = 34 + (2*x0) + (32*x1) 2023-01-11T21:38:06.5156685Z tmp56 = tmp54 > tmp53 2023-01-11T21:38:06.5156778Z tmp57 = tl.where(tmp56, tmp55, tmp52) 2023-01-11T21:38:06.5156915Z tmp58 = tl.where(tmp54 != tmp54, tmp54, tl.where(tmp54 > tmp53, tmp54, tmp53)) 2023-01-11T21:38:06.5157052Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.5157185Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp57, xmask) 2023-01-11T21:38:06.5157276Z ''') 2023-01-11T21:38:06.5157282Z 2023-01-11T21:38:06.5157287Z 2023-01-11T21:38:06.5157381Z async_compile.wait(globals()) 
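# A minimal sketch (helper name is illustrative, not part of the generated
# output): the pattern tl.where(a != a, a, tl.where(a > b, a, b)) used
# throughout these kernels is a NaN-propagating maximum, since a != a is
# true only when a is NaN.
import torch  # re-imported so the sketch stays self-contained

def _nan_propagating_max(a, b):
    # keep a when a is NaN; otherwise return the larger of a and b
    return torch.where(a != a, a, torch.where(a > b, a, b))

# NaN wins over 2.0, unlike a naive a > b chain that would drop it:
assert torch.isnan(_nan_propagating_max(torch.tensor([float("nan")]),
                                        torch.tensor([2.0]))).all()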
2023-01-11T21:38:06.5157487Z del async_compile 2023-01-11T21:38:06.5157493Z 2023-01-11T21:38:06.5157570Z def call(args): 2023-01-11T21:38:06.5157639Z arg0_1, = args 2023-01-11T21:38:06.5157717Z args.clear() 2023-01-11T21:38:06.5157813Z with torch.cuda.device(0): 2023-01-11T21:38:06.5158033Z buf0 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5158252Z buf1 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5158345Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5158505Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 392, grid=grid(392), stream=stream0) 2023-01-11T21:38:06.5158578Z del arg0_1 2023-01-11T21:38:06.5158661Z return (buf0, buf1, ) 2023-01-11T21:38:06.5158666Z 2023-01-11T21:38:06.5158671Z 2023-01-11T21:38:06.5158753Z if __name__ == "__main__": 2023-01-11T21:38:06.5158875Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5159009Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5159235Z arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5159348Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5159353Z 2023-01-11T21:38:06.5159426Z ok (0.675s) 2023-01-11T21:38:06.5159885Z test_max_pool2d2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5160040Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5160300Z [2023-01-11 21:35:16,113] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 702 2023-01-11T21:38:06.5160571Z [2023-01-11 21:35:16,402] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 702 2023-01-11T21:38:06.5160988Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5161121Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5161379Z [2023-01-11 21:35:16,423] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 703 2023-01-11T21:38:06.5161387Z 2023-01-11T21:38:06.5161491Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5161570Z import torch 2023-01-11T21:38:06.5161645Z import random 2023-01-11T21:38:06.5161763Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5161888Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5161893Z 2023-01-11T21:38:06.5161979Z aten = torch.ops.aten 2023-01-11T21:38:06.5162117Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5162216Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5162221Z 2023-01-11T21:38:06.5162297Z import triton 2023-01-11T21:38:06.5162390Z import triton.language as tl 2023-01-11T21:38:06.5162520Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5162655Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5162660Z 2023-01-11T21:38:06.5162678Z 2023-01-11T21:38:06.5162853Z triton_fused_getitem_getitem_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5162929Z import triton 2023-01-11T21:38:06.5163025Z import triton.language as tl 2023-01-11T21:38:06.5163141Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5163246Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5163409Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5163538Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5163544Z 2023-01-11T21:38:06.5163964Z @pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5164041Z @triton.jit 2023-01-11T21:38:06.5164186Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5164264Z xnumel = 746496 2023-01-11T21:38:06.5164366Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5164498Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5164584Z xmask = xindex < xnumel 2023-01-11T21:38:06.5164663Z x0 = xindex % 27 2023-01-11T21:38:06.5164741Z x1 = (xindex // 27) % 27 2023-01-11T21:38:06.5164825Z x2 = (xindex // 729) 2023-01-11T21:38:06.5164897Z x3 = xindex 2023-01-11T21:38:06.5165123Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5165352Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5165579Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5165805Z tmp5 = tl.load(in_ptr0 + (55 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5166024Z tmp7 = tl.load(in_ptr0 + (56 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5166278Z tmp9 = tl.load(in_ptr0 + (57 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5166507Z tmp11 = tl.load(in_ptr0 + (110 + (2*x0) + (110*x1) + (3025*x2)), xmask, 
eviction_policy='evict_last') 2023-01-11T21:38:06.5166737Z tmp13 = tl.load(in_ptr0 + (111 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5166965Z tmp15 = tl.load(in_ptr0 + (112 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5167088Z tmp17 = tl.load(in_ptr0 + ((2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5167209Z tmp19 = tl.load(in_ptr0 + (1 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5167328Z tmp24 = tl.load(in_ptr0 + (2 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5167448Z tmp29 = tl.load(in_ptr0 + (55 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5167565Z tmp34 = tl.load(in_ptr0 + (56 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5167684Z tmp39 = tl.load(in_ptr0 + (57 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5167806Z tmp44 = tl.load(in_ptr0 + (110 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5167928Z tmp49 = tl.load(in_ptr0 + (111 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5168047Z tmp54 = tl.load(in_ptr0 + (112 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5168184Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.5168318Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.5168450Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.5168577Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.5168714Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 2023-01-11T21:38:06.5168861Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.5168999Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.5169135Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.5169248Z tmp18 = (2*x0) + (110*x1) 2023-01-11T21:38:06.5169337Z tmp20 = 1 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5169418Z tmp21 = tmp19 > tmp17 2023-01-11T21:38:06.5169514Z tmp22 = tl.where(tmp21, tmp20, tmp18) 2023-01-11T21:38:06.5169653Z tmp23 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp17, tmp19, tmp17)) 2023-01-11T21:38:06.5169739Z tmp25 = 2 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5169822Z tmp26 = tmp24 > tmp23 2023-01-11T21:38:06.5169923Z tmp27 = tl.where(tmp26, tmp25, tmp22) 2023-01-11T21:38:06.5170059Z tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 > tmp23, tmp24, tmp23)) 2023-01-11T21:38:06.5170148Z tmp30 = 55 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5170224Z tmp31 = tmp29 > tmp28 2023-01-11T21:38:06.5170323Z tmp32 = tl.where(tmp31, tmp30, tmp27) 2023-01-11T21:38:06.5170465Z tmp33 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 > tmp28, tmp29, tmp28)) 2023-01-11T21:38:06.5170551Z tmp35 = 56 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5170637Z tmp36 = tmp34 > tmp33 2023-01-11T21:38:06.5170738Z tmp37 = tl.where(tmp36, tmp35, tmp32) 2023-01-11T21:38:06.5170874Z tmp38 = tl.where(tmp34 != tmp34, tmp34, tl.where(tmp34 > tmp33, tmp34, tmp33)) 2023-01-11T21:38:06.5170952Z tmp40 = 57 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5171035Z tmp41 = tmp39 > tmp38 2023-01-11T21:38:06.5171134Z tmp42 = tl.where(tmp41, tmp40, tmp37) 2023-01-11T21:38:06.5171272Z tmp43 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp38, tmp39, tmp38)) 2023-01-11T21:38:06.5171359Z tmp45 = 110 + (2*x0) + 
(110*x1) 2023-01-11T21:38:06.5171441Z tmp46 = tmp44 > tmp43 2023-01-11T21:38:06.5171566Z tmp47 = tl.where(tmp46, tmp45, tmp42) 2023-01-11T21:38:06.5171694Z tmp48 = tl.where(tmp44 != tmp44, tmp44, tl.where(tmp44 > tmp43, tmp44, tmp43)) 2023-01-11T21:38:06.5171780Z tmp50 = 111 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5171862Z tmp51 = tmp49 > tmp48 2023-01-11T21:38:06.5171959Z tmp52 = tl.where(tmp51, tmp50, tmp47) 2023-01-11T21:38:06.5172099Z tmp53 = tl.where(tmp49 != tmp49, tmp49, tl.where(tmp49 > tmp48, tmp49, tmp48)) 2023-01-11T21:38:06.5172185Z tmp55 = 112 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5172266Z tmp56 = tmp54 > tmp53 2023-01-11T21:38:06.5172359Z tmp57 = tl.where(tmp56, tmp55, tmp52) 2023-01-11T21:38:06.5172493Z tmp58 = tl.where(tmp54 != tmp54, tmp54, tl.where(tmp54 > tmp53, tmp54, tmp53)) 2023-01-11T21:38:06.5172628Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.5172761Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp57, xmask) 2023-01-11T21:38:06.5172848Z ''') 2023-01-11T21:38:06.5172857Z 2023-01-11T21:38:06.5172862Z 2023-01-11T21:38:06.5172959Z async_compile.wait(globals()) 2023-01-11T21:38:06.5173037Z del async_compile 2023-01-11T21:38:06.5173042Z 2023-01-11T21:38:06.5173119Z def call(args): 2023-01-11T21:38:06.5173188Z arg0_1, = args 2023-01-11T21:38:06.5173265Z args.clear() 2023-01-11T21:38:06.5173364Z with torch.cuda.device(0): 2023-01-11T21:38:06.5173597Z buf0 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5173823Z buf1 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5173918Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5174079Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 746496, grid=grid(746496), stream=stream0) 2023-01-11T21:38:06.5174152Z del arg0_1 2023-01-11T21:38:06.5174231Z return (buf0, buf1, ) 2023-01-11T21:38:06.5174236Z 2023-01-11T21:38:06.5174243Z 2023-01-11T21:38:06.5174326Z if __name__ == "__main__": 2023-01-11T21:38:06.5174446Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5174687Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5174922Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5175079Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5175085Z 2023-01-11T21:38:06.5175351Z [2023-01-11 21:35:16,679] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 703 2023-01-11T21:38:06.5175357Z 2023-01-11T21:38:06.5175455Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5175523Z import torch 2023-01-11T21:38:06.5175599Z import random 2023-01-11T21:38:06.5175717Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5175841Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5175846Z 2023-01-11T21:38:06.5175927Z aten = torch.ops.aten 2023-01-11T21:38:06.5176066Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5176160Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5176165Z 2023-01-11T21:38:06.5176239Z import triton 2023-01-11T21:38:06.5176324Z import triton.language as tl 2023-01-11T21:38:06.5176447Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5176587Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5176593Z 2023-01-11T21:38:06.5176598Z 
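# The unrolled pooling windows below also track the flat index of the running
# maximum: each tap computes cand > running_max, then picks the tap's index
# with tl.where. A plain-Python sketch of that chain (names are illustrative,
# not from the generated code):
def _running_max_with_index(taps):
    # taps: sequence of (value, flat_index) pairs, one per window tap,
    # matching the tmp17/tmp19/... loads and their index expressions
    best_val, best_idx = taps[0]
    for val, idx in taps[1:]:
        if val > best_val:  # mirrors tmpN = cand > running_max
            best_val, best_idx = val, idx
    return best_val, best_idx

assert _running_max_with_index([(0.1, 0), (0.7, 1), (0.3, 2)]) == (0.7, 1)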
2023-01-11T21:38:06.5176778Z triton_fused_getitem_getitem_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5176853Z import triton 2023-01-11T21:38:06.5176940Z import triton.language as tl 2023-01-11T21:38:06.5177051Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5177200Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5177339Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5177470Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5177518Z 2023-01-11T21:38:06.5177946Z @pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5178021Z @triton.jit 2023-01-11T21:38:06.5178168Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5178244Z xnumel = 746496 2023-01-11T21:38:06.5178344Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5178475Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5178554Z xmask = xindex < xnumel 2023-01-11T21:38:06.5178633Z x0 = xindex % 27 2023-01-11T21:38:06.5178718Z x1 = (xindex // 27) % 27 2023-01-11T21:38:06.5178800Z x2 = (xindex // 729) 2023-01-11T21:38:06.5178879Z x3 = xindex 2023-01-11T21:38:06.5179128Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5179381Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5179614Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5179867Z tmp5 = tl.load(in_ptr0 + (55 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5180116Z tmp7 = tl.load(in_ptr0 + (56 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5180362Z tmp9 = tl.load(in_ptr0 + (57 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5180610Z tmp11 = tl.load(in_ptr0 + (110 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5180857Z tmp13 = tl.load(in_ptr0 + (111 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5181112Z tmp15 = tl.load(in_ptr0 + (112 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5181245Z tmp17 = tl.load(in_ptr0 + ((2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5181405Z tmp19 = tl.load(in_ptr0 + (1 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5181536Z tmp24 = tl.load(in_ptr0 + (2 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5181673Z tmp29 = tl.load(in_ptr0 + (55 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5181810Z tmp34 = tl.load(in_ptr0 + (56 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5181941Z tmp39 = tl.load(in_ptr0 + (57 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5182075Z tmp44 = tl.load(in_ptr0 + (110 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5182211Z tmp49 = tl.load(in_ptr0 + 
(111 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5182346Z tmp54 = tl.load(in_ptr0 + (112 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5182484Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.5182621Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.5182748Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.5182880Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.5183019Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 2023-01-11T21:38:06.5183164Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.5183306Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.5183468Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.5183550Z tmp18 = (2*x0) + (110*x1) 2023-01-11T21:38:06.5183630Z tmp20 = 1 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5183713Z tmp21 = tmp19 > tmp17 2023-01-11T21:38:06.5183817Z tmp22 = tl.where(tmp21, tmp20, tmp18) 2023-01-11T21:38:06.5183960Z tmp23 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp17, tmp19, tmp17)) 2023-01-11T21:38:06.5184044Z tmp25 = 2 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5184127Z tmp26 = tmp24 > tmp23 2023-01-11T21:38:06.5184227Z tmp27 = tl.where(tmp26, tmp25, tmp22) 2023-01-11T21:38:06.5184363Z tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 > tmp23, tmp24, tmp23)) 2023-01-11T21:38:06.5184442Z tmp30 = 55 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5184525Z tmp31 = tmp29 > tmp28 2023-01-11T21:38:06.5184628Z tmp32 = tl.where(tmp31, tmp30, tmp27) 2023-01-11T21:38:06.5184769Z tmp33 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 > tmp28, tmp29, tmp28)) 2023-01-11T21:38:06.5184858Z tmp35 = 56 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5184941Z tmp36 = tmp34 > tmp33 2023-01-11T21:38:06.5185041Z tmp37 = tl.where(tmp36, tmp35, tmp32) 2023-01-11T21:38:06.5185170Z tmp38 = tl.where(tmp34 != tmp34, tmp34, tl.where(tmp34 > tmp33, tmp34, tmp33)) 2023-01-11T21:38:06.5185257Z tmp40 = 57 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5185341Z tmp41 = tmp39 > tmp38 2023-01-11T21:38:06.5185438Z tmp42 = tl.where(tmp41, tmp40, tmp37) 2023-01-11T21:38:06.5185577Z tmp43 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp38, tmp39, tmp38)) 2023-01-11T21:38:06.5185686Z tmp45 = 110 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5185774Z tmp46 = tmp44 > tmp43 2023-01-11T21:38:06.5185888Z tmp47 = tl.where(tmp46, tmp45, tmp42) 2023-01-11T21:38:06.5186023Z tmp48 = tl.where(tmp44 != tmp44, tmp44, tl.where(tmp44 > tmp43, tmp44, tmp43)) 2023-01-11T21:38:06.5186109Z tmp50 = 111 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5186192Z tmp51 = tmp49 > tmp48 2023-01-11T21:38:06.5186292Z tmp52 = tl.where(tmp51, tmp50, tmp47) 2023-01-11T21:38:06.5186427Z tmp53 = tl.where(tmp49 != tmp49, tmp49, tl.where(tmp49 > tmp48, tmp49, tmp48)) 2023-01-11T21:38:06.5186512Z tmp55 = 112 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5186587Z tmp56 = tmp54 > tmp53 2023-01-11T21:38:06.5186712Z tmp57 = tl.where(tmp56, tmp55, tmp52) 2023-01-11T21:38:06.5186849Z tmp58 = tl.where(tmp54 != tmp54, tmp54, tl.where(tmp54 > tmp53, tmp54, tmp53)) 2023-01-11T21:38:06.5186983Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.5187118Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp57, xmask) 
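    # (Note on the fp16 variant above: every tl.load ends in .to(tl.float32),
    # so the max/argmax comparisons run in fp32; the value written through
    # out_ptr0, declared '*fp16' in the @pointwise signature, is narrowed back
    # to half precision on store, while the int64 index output is unaffected.)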
2023-01-11T21:38:06.5187209Z ''') 2023-01-11T21:38:06.5187215Z 2023-01-11T21:38:06.5187219Z 2023-01-11T21:38:06.5187315Z async_compile.wait(globals()) 2023-01-11T21:38:06.5187399Z del async_compile 2023-01-11T21:38:06.5187404Z 2023-01-11T21:38:06.5187474Z def call(args): 2023-01-11T21:38:06.5187552Z arg0_1, = args 2023-01-11T21:38:06.5187629Z args.clear() 2023-01-11T21:38:06.5187724Z with torch.cuda.device(0): 2023-01-11T21:38:06.5187954Z buf0 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5188181Z buf1 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5188276Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5188431Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 746496, grid=grid(746496), stream=stream0) 2023-01-11T21:38:06.5188511Z del arg0_1 2023-01-11T21:38:06.5188597Z return (buf0, buf1, ) 2023-01-11T21:38:06.5188602Z 2023-01-11T21:38:06.5188607Z 2023-01-11T21:38:06.5188687Z if __name__ == "__main__": 2023-01-11T21:38:06.5188809Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5188940Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5189200Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5189318Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5189323Z 2023-01-11T21:38:06.5189398Z ok (0.605s) 2023-01-11T21:38:06.5189855Z test_max_pool2d3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5189988Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5190249Z [2023-01-11 21:35:16,701] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 704 2023-01-11T21:38:06.5190254Z 2023-01-11T21:38:06.5190354Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5190433Z import torch 2023-01-11T21:38:06.5190509Z import random 2023-01-11T21:38:06.5190632Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5190755Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5190761Z 2023-01-11T21:38:06.5190846Z aten = torch.ops.aten 2023-01-11T21:38:06.5190982Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5191081Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5191086Z 2023-01-11T21:38:06.5191160Z import triton 2023-01-11T21:38:06.5191252Z import triton.language as tl 2023-01-11T21:38:06.5191378Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5191521Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5191527Z 2023-01-11T21:38:06.5191531Z 2023-01-11T21:38:06.5191712Z triton_fused_getitem_getitem_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5191783Z import triton 2023-01-11T21:38:06.5191875Z import triton.language as tl 2023-01-11T21:38:06.5191994Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5192097Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5192236Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5192366Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5192371Z 2023-01-11T21:38:06.5192822Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5192900Z @triton.jit 2023-01-11T21:38:06.5193042Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5193112Z xnumel = 16 2023-01-11T21:38:06.5193211Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5193341Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5193429Z xmask = xindex < xnumel 2023-01-11T21:38:06.5193512Z x1 = (xindex // 4) 2023-01-11T21:38:06.5193588Z x0 = xindex % 4 2023-01-11T21:38:06.5193654Z x2 = xindex 2023-01-11T21:38:06.5193763Z tmp0 = (-1) + (2*x1) 2023-01-11T21:38:06.5193836Z tmp1 = 0 2023-01-11T21:38:06.5193916Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.5193988Z tmp3 = 8 2023-01-11T21:38:06.5194071Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.5194149Z tmp5 = tmp2 & tmp4 2023-01-11T21:38:06.5194254Z tmp6 = (-1) + (2*x0) 2023-01-11T21:38:06.5194335Z tmp7 = tmp6 >= tmp1 2023-01-11T21:38:06.5194415Z tmp8 = tmp6 < tmp3 2023-01-11T21:38:06.5194494Z tmp9 = tmp7 & tmp8 2023-01-11T21:38:06.5194574Z tmp10 = tmp5 & tmp9 2023-01-11T21:38:06.5194869Z tmp11 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5195027Z tmp12 = tl.where(tmp10, tmp11, float("-inf")) 2023-01-11T21:38:06.5195097Z tmp13 = 2*x0 2023-01-11T21:38:06.5195217Z tmp14 = tmp13 >= tmp1 2023-01-11T21:38:06.5195299Z tmp15 = tmp13 < tmp3 2023-01-11T21:38:06.5195383Z tmp16 = tmp14 & 
tmp15 2023-01-11T21:38:06.5195467Z tmp17 = tmp5 & tmp16 2023-01-11T21:38:06.5195751Z tmp18 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5195906Z tmp19 = tl.where(tmp17, tmp18, float("-inf")) 2023-01-11T21:38:06.5196046Z tmp20 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp12, tmp19, tmp12)) 2023-01-11T21:38:06.5196123Z tmp21 = 1 + (2*x0) 2023-01-11T21:38:06.5196206Z tmp22 = tmp21 >= tmp1 2023-01-11T21:38:06.5196286Z tmp23 = tmp21 < tmp3 2023-01-11T21:38:06.5196369Z tmp24 = tmp22 & tmp23 2023-01-11T21:38:06.5196450Z tmp25 = tmp5 & tmp24 2023-01-11T21:38:06.5196736Z tmp26 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5196889Z tmp27 = tl.where(tmp25, tmp26, float("-inf")) 2023-01-11T21:38:06.5197028Z tmp28 = tl.where(tmp27 != tmp27, tmp27, tl.where(tmp27 > tmp20, tmp27, tmp20)) 2023-01-11T21:38:06.5197103Z tmp29 = 2*x1 2023-01-11T21:38:06.5197185Z tmp30 = tmp29 >= tmp1 2023-01-11T21:38:06.5197265Z tmp31 = tmp29 < tmp3 2023-01-11T21:38:06.5197349Z tmp32 = tmp30 & tmp31 2023-01-11T21:38:06.5197432Z tmp33 = tmp32 & tmp9 2023-01-11T21:38:06.5197702Z tmp34 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5197856Z tmp35 = tl.where(tmp33, tmp34, float("-inf")) 2023-01-11T21:38:06.5197995Z tmp36 = tl.where(tmp35 != tmp35, tmp35, tl.where(tmp35 > tmp28, tmp35, tmp28)) 2023-01-11T21:38:06.5198082Z tmp37 = tmp32 & tmp16 2023-01-11T21:38:06.5198358Z tmp38 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5198514Z tmp39 = tl.where(tmp37, tmp38, float("-inf")) 2023-01-11T21:38:06.5198654Z tmp40 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp36, tmp39, tmp36)) 2023-01-11T21:38:06.5198737Z tmp41 = tmp32 & tmp24 2023-01-11T21:38:06.5199039Z tmp42 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5199202Z tmp43 = tl.where(tmp41, tmp42, float("-inf")) 2023-01-11T21:38:06.5199344Z tmp44 = tl.where(tmp43 != tmp43, tmp43, tl.where(tmp43 > tmp40, tmp43, tmp40)) 2023-01-11T21:38:06.5199423Z tmp45 = 1 + (2*x1) 2023-01-11T21:38:06.5199507Z tmp46 = tmp45 >= tmp1 2023-01-11T21:38:06.5199589Z tmp47 = tmp45 < tmp3 2023-01-11T21:38:06.5199672Z tmp48 = tmp46 & tmp47 2023-01-11T21:38:06.5199757Z tmp49 = tmp48 & tmp9 2023-01-11T21:38:06.5200031Z tmp50 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5200190Z tmp51 = tl.where(tmp49, tmp50, float("-inf")) 2023-01-11T21:38:06.5200330Z tmp52 = tl.where(tmp51 != tmp51, tmp51, tl.where(tmp51 > tmp44, tmp51, tmp44)) 2023-01-11T21:38:06.5200410Z tmp53 = tmp48 & tmp16 2023-01-11T21:38:06.5200684Z tmp54 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5200839Z tmp55 = tl.where(tmp53, tmp54, float("-inf")) 2023-01-11T21:38:06.5200978Z tmp56 = tl.where(tmp55 != tmp55, tmp55, tl.where(tmp55 > tmp52, tmp55, tmp52)) 2023-01-11T21:38:06.5201080Z tmp57 = tmp48 & tmp24 2023-01-11T21:38:06.5201380Z tmp58 = tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 
& xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5201535Z tmp59 = tl.where(tmp57, tmp58, float("-inf")) 2023-01-11T21:38:06.5201674Z tmp60 = tl.where(tmp59 != tmp59, tmp59, tl.where(tmp59 > tmp56, tmp59, tmp56)) 2023-01-11T21:38:06.5201942Z tmp61 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0) 2023-01-11T21:38:06.5202096Z tmp62 = tl.where(tmp10, tmp61, float("-inf")) 2023-01-11T21:38:06.5202223Z tmp63 = (-9) + (2*x0) + (16*x1) 2023-01-11T21:38:06.5202465Z tmp64 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0) 2023-01-11T21:38:06.5202618Z tmp65 = tl.where(tmp17, tmp64, float("-inf")) 2023-01-11T21:38:06.5202738Z tmp66 = (-8) + (2*x0) + (16*x1) 2023-01-11T21:38:06.5202819Z tmp67 = tmp65 > tmp62 2023-01-11T21:38:06.5202919Z tmp68 = tl.where(tmp67, tmp66, tmp63) 2023-01-11T21:38:06.5203059Z tmp69 = tl.where(tmp65 != tmp65, tmp65, tl.where(tmp65 > tmp62, tmp65, tmp62)) 2023-01-11T21:38:06.5203294Z tmp70 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, other=0) 2023-01-11T21:38:06.5203445Z tmp71 = tl.where(tmp25, tmp70, float("-inf")) 2023-01-11T21:38:06.5203571Z tmp72 = (-7) + (2*x0) + (16*x1) 2023-01-11T21:38:06.5203647Z tmp73 = tmp71 > tmp69 2023-01-11T21:38:06.5203748Z tmp74 = tl.where(tmp73, tmp72, tmp68) 2023-01-11T21:38:06.5203887Z tmp75 = tl.where(tmp71 != tmp71, tmp71, tl.where(tmp71 > tmp69, tmp71, tmp69)) 2023-01-11T21:38:06.5204122Z tmp76 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, other=0) 2023-01-11T21:38:06.5204276Z tmp77 = tl.where(tmp33, tmp76, float("-inf")) 2023-01-11T21:38:06.5204399Z tmp78 = (-1) + (2*x0) + (16*x1) 2023-01-11T21:38:06.5204480Z tmp79 = tmp77 > tmp75 2023-01-11T21:38:06.5204574Z tmp80 = tl.where(tmp79, tmp78, tmp74) 2023-01-11T21:38:06.5204711Z tmp81 = tl.where(tmp77 != tmp77, tmp77, tl.where(tmp77 > tmp75, tmp77, tmp75)) 2023-01-11T21:38:06.5204866Z tmp82 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, other=0) 2023-01-11T21:38:06.5205019Z tmp83 = tl.where(tmp37, tmp82, float("-inf")) 2023-01-11T21:38:06.5205105Z tmp84 = (2*x0) + (16*x1) 2023-01-11T21:38:06.5205187Z tmp85 = tmp83 > tmp81 2023-01-11T21:38:06.5205285Z tmp86 = tl.where(tmp85, tmp84, tmp80) 2023-01-11T21:38:06.5205422Z tmp87 = tl.where(tmp83 != tmp83, tmp83, tl.where(tmp83 > tmp81, tmp83, tmp81)) 2023-01-11T21:38:06.5205601Z tmp88 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0) 2023-01-11T21:38:06.5205757Z tmp89 = tl.where(tmp41, tmp88, float("-inf")) 2023-01-11T21:38:06.5205847Z tmp90 = 1 + (2*x0) + (16*x1) 2023-01-11T21:38:06.5205929Z tmp91 = tmp89 > tmp87 2023-01-11T21:38:06.5206028Z tmp92 = tl.where(tmp91, tmp90, tmp86) 2023-01-11T21:38:06.5206169Z tmp93 = tl.where(tmp89 != tmp89, tmp89, tl.where(tmp89 > tmp87, tmp89, tmp87)) 2023-01-11T21:38:06.5206325Z tmp94 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, other=0) 2023-01-11T21:38:06.5206472Z tmp95 = tl.where(tmp49, tmp94, float("-inf")) 2023-01-11T21:38:06.5206561Z tmp96 = 7 + (2*x0) + (16*x1) 2023-01-11T21:38:06.5206643Z tmp97 = tmp95 > tmp93 2023-01-11T21:38:06.5206741Z tmp98 = tl.where(tmp97, tmp96, tmp92) 2023-01-11T21:38:06.5206879Z tmp99 = tl.where(tmp95 != tmp95, tmp95, tl.where(tmp95 > tmp93, tmp95, tmp93)) 2023-01-11T21:38:06.5207037Z tmp100 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + 
tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, other=0) 2023-01-11T21:38:06.5207197Z tmp101 = tl.where(tmp53, tmp100, float("-inf")) 2023-01-11T21:38:06.5207286Z tmp102 = 8 + (2*x0) + (16*x1) 2023-01-11T21:38:06.5207364Z tmp103 = tmp101 > tmp99 2023-01-11T21:38:06.5207469Z tmp104 = tl.where(tmp103, tmp102, tmp98) 2023-01-11T21:38:06.5207614Z tmp105 = tl.where(tmp101 != tmp101, tmp101, tl.where(tmp101 > tmp99, tmp101, tmp99)) 2023-01-11T21:38:06.5207772Z tmp106 = tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, other=0) 2023-01-11T21:38:06.5207958Z tmp107 = tl.where(tmp57, tmp106, float("-inf")) 2023-01-11T21:38:06.5208044Z tmp108 = 9 + (2*x0) + (16*x1) 2023-01-11T21:38:06.5208129Z tmp109 = tmp107 > tmp105 2023-01-11T21:38:06.5208229Z tmp110 = tl.where(tmp109, tmp108, tmp104) 2023-01-11T21:38:06.5208378Z tmp111 = tl.where(tmp107 != tmp107, tmp107, tl.where(tmp107 > tmp105, tmp107, tmp105)) 2023-01-11T21:38:06.5208512Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp60, xmask) 2023-01-11T21:38:06.5208647Z tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp110, xmask) 2023-01-11T21:38:06.5208735Z ''') 2023-01-11T21:38:06.5208741Z 2023-01-11T21:38:06.5208745Z 2023-01-11T21:38:06.5208841Z async_compile.wait(globals()) 2023-01-11T21:38:06.5208919Z del async_compile 2023-01-11T21:38:06.5208924Z 2023-01-11T21:38:06.5209002Z def call(args): 2023-01-11T21:38:06.5209071Z arg0_1, = args 2023-01-11T21:38:06.5209147Z args.clear() 2023-01-11T21:38:06.5209240Z with torch.cuda.device(0): 2023-01-11T21:38:06.5209460Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5209675Z buf1 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5209768Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5209932Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.5210010Z del arg0_1 2023-01-11T21:38:06.5210089Z return (buf0, buf1, ) 2023-01-11T21:38:06.5210094Z 2023-01-11T21:38:06.5210099Z 2023-01-11T21:38:06.5210182Z if __name__ == "__main__": 2023-01-11T21:38:06.5210303Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5210432Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5210650Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5210764Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5211035Z [2023-01-11 21:35:16,976] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 704 2023-01-11T21:38:06.5211487Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5211621Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5211872Z [2023-01-11 21:35:16,996] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 705 2023-01-11T21:38:06.5211878Z 2023-01-11T21:38:06.5211889Z 2023-01-11T21:38:06.5211982Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5212059Z import torch 2023-01-11T21:38:06.5212136Z import random 2023-01-11T21:38:06.5212255Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5212385Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5212391Z 2023-01-11T21:38:06.5212473Z aten = torch.ops.aten 2023-01-11T21:38:06.5212611Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5212703Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5212711Z 2023-01-11T21:38:06.5212786Z import triton 2023-01-11T21:38:06.5212879Z import triton.language as tl 2023-01-11T21:38:06.5213005Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5213148Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5213154Z 2023-01-11T21:38:06.5213158Z 2023-01-11T21:38:06.5213340Z triton_fused_getitem_getitem_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5213418Z import triton 2023-01-11T21:38:06.5213511Z import triton.language as tl 2023-01-11T21:38:06.5213620Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5213723Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5213885Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5214012Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5214017Z 2023-01-11T21:38:06.5214438Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5214624Z @triton.jit 2023-01-11T21:38:06.5214770Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5214847Z xnumel = 16 2023-01-11T21:38:06.5214938Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5215066Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5215150Z xmask = xindex < xnumel 2023-01-11T21:38:06.5215229Z x1 = (xindex // 4) 2023-01-11T21:38:06.5215303Z x0 = xindex % 4 2023-01-11T21:38:06.5215378Z x2 = xindex 2023-01-11T21:38:06.5215490Z tmp0 = (-1) + (2*x1) 2023-01-11T21:38:06.5215555Z tmp1 = 0 2023-01-11T21:38:06.5215633Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.5215703Z tmp3 = 8 2023-01-11T21:38:06.5215779Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.5215855Z tmp5 = tmp2 & tmp4 2023-01-11T21:38:06.5215966Z tmp6 = (-1) + (2*x0) 2023-01-11T21:38:06.5216038Z tmp7 = tmp6 >= tmp1 2023-01-11T21:38:06.5216117Z tmp8 = tmp6 < tmp3 2023-01-11T21:38:06.5216192Z tmp9 = tmp7 & tmp8 2023-01-11T21:38:06.5216269Z tmp10 = tmp5 & tmp9 2023-01-11T21:38:06.5216584Z tmp11 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5216738Z tmp12 = tl.where(tmp10, tmp11, float("-inf")) 2023-01-11T21:38:06.5216811Z tmp13 = 2*x0 2023-01-11T21:38:06.5216885Z tmp14 = tmp13 >= tmp1 2023-01-11T21:38:06.5216965Z tmp15 = tmp13 < tmp3 
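    # (Note: tmp0..tmp10 build an in-bounds mask for the padded window; row
    # and column offsets such as (-1) + 2*x1 are checked against the [0, 8)
    # range, and each tl.load combines that mask with xmask, reading
    # masked-off taps as other=0. The following
    # tl.where(mask, val, float("-inf")) then replaces those taps with -inf
    # so out-of-bounds positions can never win the running maximum.)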
2023-01-11T21:38:06.5217046Z tmp16 = tmp14 & tmp15 2023-01-11T21:38:06.5217171Z tmp17 = tmp5 & tmp16 2023-01-11T21:38:06.5217520Z tmp18 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5217722Z tmp19 = tl.where(tmp17, tmp18, float("-inf")) 2023-01-11T21:38:06.5217866Z tmp20 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp12, tmp19, tmp12)) 2023-01-11T21:38:06.5217941Z tmp21 = 1 + (2*x0) 2023-01-11T21:38:06.5218014Z tmp22 = tmp21 >= tmp1 2023-01-11T21:38:06.5218093Z tmp23 = tmp21 < tmp3 2023-01-11T21:38:06.5218169Z tmp24 = tmp22 & tmp23 2023-01-11T21:38:06.5218248Z tmp25 = tmp5 & tmp24 2023-01-11T21:38:06.5218557Z tmp26 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5218712Z tmp27 = tl.where(tmp25, tmp26, float("-inf")) 2023-01-11T21:38:06.5218857Z tmp28 = tl.where(tmp27 != tmp27, tmp27, tl.where(tmp27 > tmp20, tmp27, tmp20)) 2023-01-11T21:38:06.5218924Z tmp29 = 2*x1 2023-01-11T21:38:06.5219002Z tmp30 = tmp29 >= tmp1 2023-01-11T21:38:06.5219080Z tmp31 = tmp29 < tmp3 2023-01-11T21:38:06.5219155Z tmp32 = tmp30 & tmp31 2023-01-11T21:38:06.5219236Z tmp33 = tmp32 & tmp9 2023-01-11T21:38:06.5219531Z tmp34 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5219683Z tmp35 = tl.where(tmp33, tmp34, float("-inf")) 2023-01-11T21:38:06.5219812Z tmp36 = tl.where(tmp35 != tmp35, tmp35, tl.where(tmp35 > tmp28, tmp35, tmp28)) 2023-01-11T21:38:06.5219891Z tmp37 = tmp32 & tmp16 2023-01-11T21:38:06.5220185Z tmp38 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5220336Z tmp39 = tl.where(tmp37, tmp38, float("-inf")) 2023-01-11T21:38:06.5220509Z tmp40 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp36, tmp39, tmp36)) 2023-01-11T21:38:06.5220590Z tmp41 = tmp32 & tmp24 2023-01-11T21:38:06.5220893Z tmp42 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5221046Z tmp43 = tl.where(tmp41, tmp42, float("-inf")) 2023-01-11T21:38:06.5221180Z tmp44 = tl.where(tmp43 != tmp43, tmp43, tl.where(tmp43 > tmp40, tmp43, tmp40)) 2023-01-11T21:38:06.5221259Z tmp45 = 1 + (2*x1) 2023-01-11T21:38:06.5221339Z tmp46 = tmp45 >= tmp1 2023-01-11T21:38:06.5221419Z tmp47 = tmp45 < tmp3 2023-01-11T21:38:06.5221501Z tmp48 = tmp46 & tmp47 2023-01-11T21:38:06.5221581Z tmp49 = tmp48 & tmp9 2023-01-11T21:38:06.5221879Z tmp50 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5222038Z tmp51 = tl.where(tmp49, tmp50, float("-inf")) 2023-01-11T21:38:06.5222172Z tmp52 = tl.where(tmp51 != tmp51, tmp51, tl.where(tmp51 > tmp44, tmp51, tmp44)) 2023-01-11T21:38:06.5222254Z tmp53 = tmp48 & tmp16 2023-01-11T21:38:06.5222556Z tmp54 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5222711Z tmp55 = tl.where(tmp53, tmp54, float("-inf")) 2023-01-11T21:38:06.5222852Z tmp56 = tl.where(tmp55 != tmp55, tmp55, tl.where(tmp55 > tmp52, tmp55, tmp52)) 
2023-01-11T21:38:06.5222936Z tmp57 = tmp48 & tmp24 2023-01-11T21:38:06.5223237Z tmp58 = tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5223391Z tmp59 = tl.where(tmp57, tmp58, float("-inf")) 2023-01-11T21:38:06.5223522Z tmp60 = tl.where(tmp59 != tmp59, tmp59, tl.where(tmp59 > tmp56, tmp59, tmp56)) 2023-01-11T21:38:06.5223783Z tmp61 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5223937Z tmp62 = tl.where(tmp10, tmp61, float("-inf")) 2023-01-11T21:38:06.5224063Z tmp63 = (-9) + (2*x0) + (16*x1) 2023-01-11T21:38:06.5224351Z tmp64 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5224508Z tmp65 = tl.where(tmp17, tmp64, float("-inf")) 2023-01-11T21:38:06.5224634Z tmp66 = (-8) + (2*x0) + (16*x1) 2023-01-11T21:38:06.5224710Z tmp67 = tmp65 > tmp62 2023-01-11T21:38:06.5224811Z tmp68 = tl.where(tmp67, tmp66, tmp63) 2023-01-11T21:38:06.5224951Z tmp69 = tl.where(tmp65 != tmp65, tmp65, tl.where(tmp65 > tmp62, tmp65, tmp62)) 2023-01-11T21:38:06.5225202Z tmp70 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5225357Z tmp71 = tl.where(tmp25, tmp70, float("-inf")) 2023-01-11T21:38:06.5225480Z tmp72 = (-7) + (2*x0) + (16*x1) 2023-01-11T21:38:06.5225561Z tmp73 = tmp71 > tmp69 2023-01-11T21:38:06.5225662Z tmp74 = tl.where(tmp73, tmp72, tmp68) 2023-01-11T21:38:06.5225809Z tmp75 = tl.where(tmp71 != tmp71, tmp71, tl.where(tmp71 > tmp69, tmp71, tmp69)) 2023-01-11T21:38:06.5226097Z tmp76 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5226252Z tmp77 = tl.where(tmp33, tmp76, float("-inf")) 2023-01-11T21:38:06.5226375Z tmp78 = (-1) + (2*x0) + (16*x1) 2023-01-11T21:38:06.5226458Z tmp79 = tmp77 > tmp75 2023-01-11T21:38:06.5226559Z tmp80 = tl.where(tmp79, tmp78, tmp74) 2023-01-11T21:38:06.5226697Z tmp81 = tl.where(tmp77 != tmp77, tmp77, tl.where(tmp77 > tmp75, tmp77, tmp75)) 2023-01-11T21:38:06.5226865Z tmp82 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5227042Z tmp83 = tl.where(tmp37, tmp82, float("-inf")) 2023-01-11T21:38:06.5227127Z tmp84 = (2*x0) + (16*x1) 2023-01-11T21:38:06.5227210Z tmp85 = tmp83 > tmp81 2023-01-11T21:38:06.5227310Z tmp86 = tl.where(tmp85, tmp84, tmp80) 2023-01-11T21:38:06.5227452Z tmp87 = tl.where(tmp83 != tmp83, tmp83, tl.where(tmp83 > tmp81, tmp83, tmp81)) 2023-01-11T21:38:06.5227620Z tmp88 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5227775Z tmp89 = tl.where(tmp41, tmp88, float("-inf")) 2023-01-11T21:38:06.5227854Z tmp90 = 1 + (2*x0) + (16*x1) 2023-01-11T21:38:06.5227940Z tmp91 = tmp89 > tmp87 2023-01-11T21:38:06.5228043Z tmp92 = tl.where(tmp91, tmp90, tmp86) 2023-01-11T21:38:06.5228181Z tmp93 = tl.where(tmp89 != tmp89, tmp89, tl.where(tmp89 > tmp87, tmp89, tmp87)) 2023-01-11T21:38:06.5228350Z tmp94 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5228505Z tmp95 = tl.where(tmp49, tmp94, float("-inf")) 2023-01-11T21:38:06.5228591Z tmp96 = 7 + (2*x0) + (16*x1) 
2023-01-11T21:38:06.5228672Z tmp97 = tmp95 > tmp93 2023-01-11T21:38:06.5228765Z tmp98 = tl.where(tmp97, tmp96, tmp92) 2023-01-11T21:38:06.5228906Z tmp99 = tl.where(tmp95 != tmp95, tmp95, tl.where(tmp95 > tmp93, tmp95, tmp93)) 2023-01-11T21:38:06.5229077Z tmp100 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5229231Z tmp101 = tl.where(tmp53, tmp100, float("-inf")) 2023-01-11T21:38:06.5229318Z tmp102 = 8 + (2*x0) + (16*x1) 2023-01-11T21:38:06.5229401Z tmp103 = tmp101 > tmp99 2023-01-11T21:38:06.5229508Z tmp104 = tl.where(tmp103, tmp102, tmp98) 2023-01-11T21:38:06.5229647Z tmp105 = tl.where(tmp101 != tmp101, tmp101, tl.where(tmp101 > tmp99, tmp101, tmp99)) 2023-01-11T21:38:06.5229819Z tmp106 = tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5229977Z tmp107 = tl.where(tmp57, tmp106, float("-inf")) 2023-01-11T21:38:06.5230066Z tmp108 = 9 + (2*x0) + (16*x1) 2023-01-11T21:38:06.5230150Z tmp109 = tmp107 > tmp105 2023-01-11T21:38:06.5230284Z tmp110 = tl.where(tmp109, tmp108, tmp104) 2023-01-11T21:38:06.5230432Z tmp111 = tl.where(tmp107 != tmp107, tmp107, tl.where(tmp107 > tmp105, tmp107, tmp105)) 2023-01-11T21:38:06.5230569Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp60, xmask) 2023-01-11T21:38:06.5230699Z tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp110, xmask) 2023-01-11T21:38:06.5230788Z ''') 2023-01-11T21:38:06.5230793Z 2023-01-11T21:38:06.5230798Z 2023-01-11T21:38:06.5230895Z async_compile.wait(globals()) 2023-01-11T21:38:06.5230974Z del async_compile 2023-01-11T21:38:06.5230979Z 2023-01-11T21:38:06.5231058Z def call(args): 2023-01-11T21:38:06.5231137Z arg0_1, = args 2023-01-11T21:38:06.5231215Z args.clear() 2023-01-11T21:38:06.5231304Z with torch.cuda.device(0): 2023-01-11T21:38:06.5231523Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5231737Z buf1 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5231834Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5231992Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.5232068Z del arg0_1 2023-01-11T21:38:06.5232152Z return (buf0, buf1, ) 2023-01-11T21:38:06.5232157Z 2023-01-11T21:38:06.5232162Z 2023-01-11T21:38:06.5232247Z if __name__ == "__main__": 2023-01-11T21:38:06.5232360Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5232489Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5232706Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5232895Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5233165Z [2023-01-11 21:35:17,263] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 705 2023-01-11T21:38:06.5233171Z 2023-01-11T21:38:06.5233246Z ok (0.583s) 2023-01-11T21:38:06.5233708Z test_max_pool2d4_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5233842Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5234101Z [2023-01-11 21:35:17,285] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 706 2023-01-11T21:38:06.5234369Z [2023-01-11 21:35:17,571] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 706 2023-01-11T21:38:06.5234779Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5234913Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5235170Z [2023-01-11 21:35:17,593] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 707 2023-01-11T21:38:06.5235176Z 2023-01-11T21:38:06.5235276Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5235365Z import torch 2023-01-11T21:38:06.5235452Z import random 2023-01-11T21:38:06.5235595Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5235726Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5235732Z 2023-01-11T21:38:06.5235816Z aten = torch.ops.aten 2023-01-11T21:38:06.5235951Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5236052Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5236082Z 2023-01-11T21:38:06.5236161Z import triton 2023-01-11T21:38:06.5236255Z import triton.language as tl 2023-01-11T21:38:06.5236382Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5236525Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5236530Z 2023-01-11T21:38:06.5236535Z 2023-01-11T21:38:06.5236714Z triton_fused_getitem_getitem_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5236790Z import triton 2023-01-11T21:38:06.5236878Z import triton.language as tl 2023-01-11T21:38:06.5236994Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5237098Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5237236Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5237366Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5237371Z 2023-01-11T21:38:06.5237793Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5237869Z @triton.jit 2023-01-11T21:38:06.5238014Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5238085Z xnumel = 48400 2023-01-11T21:38:06.5238186Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5238317Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5238403Z xmask = xindex < xnumel 2023-01-11T21:38:06.5238481Z x0 = xindex % 55 2023-01-11T21:38:06.5238565Z x1 = (xindex // 55) % 55 2023-01-11T21:38:06.5238684Z x2 = (xindex // 3025) 2023-01-11T21:38:06.5238750Z x3 = xindex 2023-01-11T21:38:06.5238980Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (222*x1) + (12321*x2)), xmask, 
eviction_policy='evict_last') 2023-01-11T21:38:06.5239206Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5239434Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5239662Z tmp5 = tl.load(in_ptr0 + (111 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5239890Z tmp7 = tl.load(in_ptr0 + (112 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5240115Z tmp9 = tl.load(in_ptr0 + (113 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5240345Z tmp11 = tl.load(in_ptr0 + (222 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5240571Z tmp13 = tl.load(in_ptr0 + (223 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5240798Z tmp15 = tl.load(in_ptr0 + (224 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5240922Z tmp17 = tl.load(in_ptr0 + ((2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5241044Z tmp19 = tl.load(in_ptr0 + (1 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5241165Z tmp24 = tl.load(in_ptr0 + (2 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5241288Z tmp29 = tl.load(in_ptr0 + (111 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5241409Z tmp34 = tl.load(in_ptr0 + (112 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5241528Z tmp39 = tl.load(in_ptr0 + (113 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5241639Z tmp44 = tl.load(in_ptr0 + (222 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5241757Z tmp49 = tl.load(in_ptr0 + (223 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5241873Z tmp54 = tl.load(in_ptr0 + (224 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5242010Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.5242176Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.5242314Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.5242447Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.5242587Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 2023-01-11T21:38:06.5242725Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.5242864Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.5242998Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.5243086Z tmp18 = (2*x0) + (222*x1) 2023-01-11T21:38:06.5243171Z tmp20 = 1 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5243254Z tmp21 = tmp19 > tmp17 2023-01-11T21:38:06.5243356Z tmp22 = tl.where(tmp21, tmp20, tmp18) 2023-01-11T21:38:06.5243499Z tmp23 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp17, tmp19, tmp17)) 2023-01-11T21:38:06.5243578Z tmp25 = 2 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5243661Z tmp26 = tmp24 > tmp23 2023-01-11T21:38:06.5243762Z tmp27 = tl.where(tmp26, tmp25, tmp22) 2023-01-11T21:38:06.5243899Z tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 > tmp23, tmp24, tmp23)) 2023-01-11T21:38:06.5243984Z tmp30 = 111 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5244067Z tmp31 = tmp29 > tmp28 
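# [annotation -- not part of the generated kernel] Unlike the padded kernel
# earlier, this one carries no per-tap bounds masks: with a 111x111 input,
# window 3 and stride 2, all 55x55 output windows are fully in-bounds
# ((111 - 3) // 2 + 1 = 55), so no float("-inf") guards are needed. Each tap
# is loaded twice -- once with eviction_policy='evict_last' for the value
# chain (tmp0..tmp15) and once plain for the argmax chain (tmp17..tmp54).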
2023-01-11T21:38:06.5244168Z tmp32 = tl.where(tmp31, tmp30, tmp27) 2023-01-11T21:38:06.5244301Z tmp33 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 > tmp28, tmp29, tmp28)) 2023-01-11T21:38:06.5244417Z tmp35 = 112 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5244501Z tmp36 = tmp34 > tmp33 2023-01-11T21:38:06.5244601Z tmp37 = tl.where(tmp36, tmp35, tmp32) 2023-01-11T21:38:06.5244736Z tmp38 = tl.where(tmp34 != tmp34, tmp34, tl.where(tmp34 > tmp33, tmp34, tmp33)) 2023-01-11T21:38:06.5244822Z tmp40 = 113 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5244908Z tmp41 = tmp39 > tmp38 2023-01-11T21:38:06.5245001Z tmp42 = tl.where(tmp41, tmp40, tmp37) 2023-01-11T21:38:06.5245139Z tmp43 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp38, tmp39, tmp38)) 2023-01-11T21:38:06.5245223Z tmp45 = 222 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5245306Z tmp46 = tmp44 > tmp43 2023-01-11T21:38:06.5245404Z tmp47 = tl.where(tmp46, tmp45, tmp42) 2023-01-11T21:38:06.5245538Z tmp48 = tl.where(tmp44 != tmp44, tmp44, tl.where(tmp44 > tmp43, tmp44, tmp43)) 2023-01-11T21:38:06.5245624Z tmp50 = 223 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5245700Z tmp51 = tmp49 > tmp48 2023-01-11T21:38:06.5245801Z tmp52 = tl.where(tmp51, tmp50, tmp47) 2023-01-11T21:38:06.5245936Z tmp53 = tl.where(tmp49 != tmp49, tmp49, tl.where(tmp49 > tmp48, tmp49, tmp48)) 2023-01-11T21:38:06.5246021Z tmp55 = 224 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5246101Z tmp56 = tmp54 > tmp53 2023-01-11T21:38:06.5246201Z tmp57 = tl.where(tmp56, tmp55, tmp52) 2023-01-11T21:38:06.5246339Z tmp58 = tl.where(tmp54 != tmp54, tmp54, tl.where(tmp54 > tmp53, tmp54, tmp53)) 2023-01-11T21:38:06.5246473Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.5246601Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp57, xmask) 2023-01-11T21:38:06.5246691Z ''') 2023-01-11T21:38:06.5246696Z 2023-01-11T21:38:06.5246701Z 2023-01-11T21:38:06.5246800Z async_compile.wait(globals()) 2023-01-11T21:38:06.5246878Z del async_compile 2023-01-11T21:38:06.5246883Z 2023-01-11T21:38:06.5246960Z def call(args): 2023-01-11T21:38:06.5247037Z arg0_1, = args 2023-01-11T21:38:06.5247117Z args.clear() 2023-01-11T21:38:06.5247207Z with torch.cuda.device(0): 2023-01-11T21:38:06.5247436Z buf0 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5247662Z buf1 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5247787Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5247951Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 48400, grid=grid(48400), stream=stream0) 2023-01-11T21:38:06.5248028Z del arg0_1 2023-01-11T21:38:06.5248112Z return (buf0, buf1, ) 2023-01-11T21:38:06.5248117Z 2023-01-11T21:38:06.5248122Z 2023-01-11T21:38:06.5248203Z if __name__ == "__main__": 2023-01-11T21:38:06.5248317Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5248447Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5248682Z arg0_1 = rand_strided((2, 8, 111, 111), (98568, 12321, 111, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5248801Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5248806Z 2023-01-11T21:38:06.5249073Z [2023-01-11 21:35:17,855] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 707 2023-01-11T21:38:06.5249079Z 2023-01-11T21:38:06.5249181Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5249258Z import torch 
2023-01-11T21:38:06.5249335Z import random 2023-01-11T21:38:06.5249449Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5249575Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5249580Z 2023-01-11T21:38:06.5249665Z aten = torch.ops.aten 2023-01-11T21:38:06.5249804Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5249900Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5249905Z 2023-01-11T21:38:06.5249982Z import triton 2023-01-11T21:38:06.5250077Z import triton.language as tl 2023-01-11T21:38:06.5250239Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5250374Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5250379Z 2023-01-11T21:38:06.5250390Z 2023-01-11T21:38:06.5250565Z triton_fused_getitem_getitem_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5250642Z import triton 2023-01-11T21:38:06.5250739Z import triton.language as tl 2023-01-11T21:38:06.5250855Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5250960Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5251095Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5251222Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5251227Z 2023-01-11T21:38:06.5251643Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5251721Z @triton.jit 2023-01-11T21:38:06.5251865Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5251943Z xnumel = 48400 2023-01-11T21:38:06.5252044Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5252175Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5252263Z xmask = xindex < xnumel 2023-01-11T21:38:06.5252343Z x0 = xindex % 55 2023-01-11T21:38:06.5252421Z x1 = (xindex // 55) % 55 2023-01-11T21:38:06.5252503Z x2 = (xindex // 3025) 2023-01-11T21:38:06.5252575Z x3 = xindex 2023-01-11T21:38:06.5252829Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5253083Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5253333Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5253585Z tmp5 = tl.load(in_ptr0 + (111 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5253833Z tmp7 = tl.load(in_ptr0 + (112 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5254104Z tmp9 = tl.load(in_ptr0 + (113 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5254357Z tmp11 = tl.load(in_ptr0 + (222 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5254717Z tmp13 = tl.load(in_ptr0 + (223 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5254969Z tmp15 = tl.load(in_ptr0 + (224 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5255103Z tmp17 = 
tl.load(in_ptr0 + ((2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5255241Z tmp19 = tl.load(in_ptr0 + (1 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5255378Z tmp24 = tl.load(in_ptr0 + (2 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5255514Z tmp29 = tl.load(in_ptr0 + (111 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5255648Z tmp34 = tl.load(in_ptr0 + (112 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5255772Z tmp39 = tl.load(in_ptr0 + (113 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5255900Z tmp44 = tl.load(in_ptr0 + (222 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5256027Z tmp49 = tl.load(in_ptr0 + (223 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5256156Z tmp54 = tl.load(in_ptr0 + (224 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5256342Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.5256479Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.5256614Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.5256750Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.5256882Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 2023-01-11T21:38:06.5257025Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.5257211Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.5257365Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.5257446Z tmp18 = (2*x0) + (222*x1) 2023-01-11T21:38:06.5257532Z tmp20 = 1 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5257616Z tmp21 = tmp19 > tmp17 2023-01-11T21:38:06.5257713Z tmp22 = tl.where(tmp21, tmp20, tmp18) 2023-01-11T21:38:06.5257855Z tmp23 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp17, tmp19, tmp17)) 2023-01-11T21:38:06.5257937Z tmp25 = 2 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5258018Z tmp26 = tmp24 > tmp23 2023-01-11T21:38:06.5258120Z tmp27 = tl.where(tmp26, tmp25, tmp22) 2023-01-11T21:38:06.5258259Z tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 > tmp23, tmp24, tmp23)) 2023-01-11T21:38:06.5258344Z tmp30 = 111 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5258426Z tmp31 = tmp29 > tmp28 2023-01-11T21:38:06.5258520Z tmp32 = tl.where(tmp31, tmp30, tmp27) 2023-01-11T21:38:06.5258662Z tmp33 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 > tmp28, tmp29, tmp28)) 2023-01-11T21:38:06.5258748Z tmp35 = 112 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5258830Z tmp36 = tmp34 > tmp33 2023-01-11T21:38:06.5258930Z tmp37 = tl.where(tmp36, tmp35, tmp32) 2023-01-11T21:38:06.5259065Z tmp38 = tl.where(tmp34 != tmp34, tmp34, tl.where(tmp34 > tmp33, tmp34, tmp33)) 2023-01-11T21:38:06.5259152Z tmp40 = 113 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5259228Z tmp41 = tmp39 > tmp38 2023-01-11T21:38:06.5259326Z tmp42 = tl.where(tmp41, tmp40, tmp37) 2023-01-11T21:38:06.5259464Z tmp43 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp38, tmp39, tmp38)) 2023-01-11T21:38:06.5259596Z tmp45 = 222 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5259681Z tmp46 = tmp44 > tmp43 2023-01-11T21:38:06.5259779Z tmp47 = tl.where(tmp46, tmp45, tmp42) 2023-01-11T21:38:06.5259915Z tmp48 = tl.where(tmp44 != 
tmp44, tmp44, tl.where(tmp44 > tmp43, tmp44, tmp43)) 2023-01-11T21:38:06.5259992Z tmp50 = 223 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5260076Z tmp51 = tmp49 > tmp48 2023-01-11T21:38:06.5260175Z tmp52 = tl.where(tmp51, tmp50, tmp47) 2023-01-11T21:38:06.5260312Z tmp53 = tl.where(tmp49 != tmp49, tmp49, tl.where(tmp49 > tmp48, tmp49, tmp48)) 2023-01-11T21:38:06.5260396Z tmp55 = 224 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5260484Z tmp56 = tmp54 > tmp53 2023-01-11T21:38:06.5260583Z tmp57 = tl.where(tmp56, tmp55, tmp52) 2023-01-11T21:38:06.5260711Z tmp58 = tl.where(tmp54 != tmp54, tmp54, tl.where(tmp54 > tmp53, tmp54, tmp53)) 2023-01-11T21:38:06.5260844Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.5260978Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp57, xmask) 2023-01-11T21:38:06.5261067Z ''') 2023-01-11T21:38:06.5261073Z 2023-01-11T21:38:06.5261078Z 2023-01-11T21:38:06.5261174Z async_compile.wait(globals()) 2023-01-11T21:38:06.5261251Z del async_compile 2023-01-11T21:38:06.5261256Z 2023-01-11T21:38:06.5261332Z def call(args): 2023-01-11T21:38:06.5261408Z arg0_1, = args 2023-01-11T21:38:06.5261479Z args.clear() 2023-01-11T21:38:06.5261574Z with torch.cuda.device(0): 2023-01-11T21:38:06.5261805Z buf0 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5262058Z buf1 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5262151Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5262308Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 48400, grid=grid(48400), stream=stream0) 2023-01-11T21:38:06.5262382Z del arg0_1 2023-01-11T21:38:06.5262467Z return (buf0, buf1, ) 2023-01-11T21:38:06.5262473Z 2023-01-11T21:38:06.5262477Z 2023-01-11T21:38:06.5262550Z if __name__ == "__main__": 2023-01-11T21:38:06.5262667Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5262796Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5263027Z arg0_1 = rand_strided((2, 8, 111, 111), (98568, 12321, 111, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5263139Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5263144Z 2023-01-11T21:38:06.5263211Z ok (0.592s) 2023-01-11T21:38:06.5263672Z test_max_pool2d5_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5263807Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5264067Z [2023-01-11 21:35:17,893] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 708 2023-01-11T21:38:06.5264322Z [2023-01-11 21:35:18,181] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 708 2023-01-11T21:38:06.5264738Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5264870Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5265148Z [2023-01-11 21:35:18,201] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 709 2023-01-11T21:38:06.5265154Z 2023-01-11T21:38:06.5265253Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5265327Z import torch 2023-01-11T21:38:06.5265401Z import random 2023-01-11T21:38:06.5265520Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5265659Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5265666Z 2023-01-11T21:38:06.5265747Z aten = torch.ops.aten 2023-01-11T21:38:06.5265909Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5266003Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5266008Z 2023-01-11T21:38:06.5266081Z import triton 2023-01-11T21:38:06.5266175Z import triton.language as tl 2023-01-11T21:38:06.5266300Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5266440Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5266446Z 2023-01-11T21:38:06.5266450Z 2023-01-11T21:38:06.5266631Z triton_fused_getitem_getitem_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5266699Z import triton 2023-01-11T21:38:06.5266793Z import triton.language as tl 2023-01-11T21:38:06.5266907Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5267009Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5267141Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5267264Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5267269Z 2023-01-11T21:38:06.5267689Z @pointwise(size_hints=[524288], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5267791Z @triton.jit 2023-01-11T21:38:06.5267927Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5268000Z xnumel = 331776 2023-01-11T21:38:06.5268098Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5268229Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5268313Z xmask = xindex < xnumel 2023-01-11T21:38:06.5268387Z x0 = xindex % 18 2023-01-11T21:38:06.5268467Z x1 = (xindex // 18) % 18 2023-01-11T21:38:06.5268539Z x2 = (xindex // 324) 2023-01-11T21:38:06.5268611Z x3 = xindex 2023-01-11T21:38:06.5268838Z tmp0 = tl.load(in_ptr0 + ((3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5269063Z tmp1 = tl.load(in_ptr0 + (1 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5269286Z tmp3 = tl.load(in_ptr0 + (2 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5269512Z tmp5 = tl.load(in_ptr0 + (55 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5269735Z tmp7 = tl.load(in_ptr0 + (56 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5269958Z tmp9 = tl.load(in_ptr0 + (57 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5270186Z tmp11 = tl.load(in_ptr0 + (110 + (3*x0) + (165*x1) + (3025*x2)), xmask, 
eviction_policy='evict_last') 2023-01-11T21:38:06.5270404Z tmp13 = tl.load(in_ptr0 + (111 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5270629Z tmp15 = tl.load(in_ptr0 + (112 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5270748Z tmp17 = tl.load(in_ptr0 + ((3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5270864Z tmp19 = tl.load(in_ptr0 + (1 + (3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5270984Z tmp24 = tl.load(in_ptr0 + (2 + (3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5271100Z tmp29 = tl.load(in_ptr0 + (55 + (3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5271217Z tmp34 = tl.load(in_ptr0 + (56 + (3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5271358Z tmp39 = tl.load(in_ptr0 + (57 + (3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5271470Z tmp44 = tl.load(in_ptr0 + (110 + (3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5271588Z tmp49 = tl.load(in_ptr0 + (111 + (3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5271701Z tmp54 = tl.load(in_ptr0 + (112 + (3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5271834Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.5271967Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.5272095Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.5272228Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.5272359Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 2023-01-11T21:38:06.5272495Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.5272630Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.5272760Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.5272841Z tmp18 = (3*x0) + (165*x1) 2023-01-11T21:38:06.5272921Z tmp20 = 1 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5273000Z tmp21 = tmp19 > tmp17 2023-01-11T21:38:06.5273097Z tmp22 = tl.where(tmp21, tmp20, tmp18) 2023-01-11T21:38:06.5273228Z tmp23 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp17, tmp19, tmp17)) 2023-01-11T21:38:06.5273311Z tmp25 = 2 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5273418Z tmp26 = tmp24 > tmp23 2023-01-11T21:38:06.5273520Z tmp27 = tl.where(tmp26, tmp25, tmp22) 2023-01-11T21:38:06.5273656Z tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 > tmp23, tmp24, tmp23)) 2023-01-11T21:38:06.5273739Z tmp30 = 55 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5273822Z tmp31 = tmp29 > tmp28 2023-01-11T21:38:06.5273914Z tmp32 = tl.where(tmp31, tmp30, tmp27) 2023-01-11T21:38:06.5274052Z tmp33 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 > tmp28, tmp29, tmp28)) 2023-01-11T21:38:06.5274133Z tmp35 = 56 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5274212Z tmp36 = tmp34 > tmp33 2023-01-11T21:38:06.5274311Z tmp37 = tl.where(tmp36, tmp35, tmp32) 2023-01-11T21:38:06.5274441Z tmp38 = tl.where(tmp34 != tmp34, tmp34, tl.where(tmp34 > tmp33, tmp34, tmp33)) 2023-01-11T21:38:06.5274523Z tmp40 = 57 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5274602Z tmp41 = tmp39 > tmp38 2023-01-11T21:38:06.5274693Z tmp42 = tl.where(tmp41, tmp40, tmp37) 2023-01-11T21:38:06.5274832Z tmp43 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp38, tmp39, tmp38)) 2023-01-11T21:38:06.5274916Z tmp45 = 110 + (3*x0) + 
(165*x1) 2023-01-11T21:38:06.5275002Z tmp46 = tmp44 > tmp43 2023-01-11T21:38:06.5275101Z tmp47 = tl.where(tmp46, tmp45, tmp42) 2023-01-11T21:38:06.5275237Z tmp48 = tl.where(tmp44 != tmp44, tmp44, tl.where(tmp44 > tmp43, tmp44, tmp43)) 2023-01-11T21:38:06.5275339Z tmp50 = 111 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5275419Z tmp51 = tmp49 > tmp48 2023-01-11T21:38:06.5275534Z tmp52 = tl.where(tmp51, tmp50, tmp47) 2023-01-11T21:38:06.5275672Z tmp53 = tl.where(tmp49 != tmp49, tmp49, tl.where(tmp49 > tmp48, tmp49, tmp48)) 2023-01-11T21:38:06.5275755Z tmp55 = 112 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5275834Z tmp56 = tmp54 > tmp53 2023-01-11T21:38:06.5275931Z tmp57 = tl.where(tmp56, tmp55, tmp52) 2023-01-11T21:38:06.5276062Z tmp58 = tl.where(tmp54 != tmp54, tmp54, tl.where(tmp54 > tmp53, tmp54, tmp53)) 2023-01-11T21:38:06.5276188Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.5276317Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp57, xmask) 2023-01-11T21:38:06.5276403Z ''') 2023-01-11T21:38:06.5276409Z 2023-01-11T21:38:06.5276413Z 2023-01-11T21:38:06.5276534Z async_compile.wait(globals()) 2023-01-11T21:38:06.5276612Z del async_compile 2023-01-11T21:38:06.5276618Z 2023-01-11T21:38:06.5276694Z def call(args): 2023-01-11T21:38:06.5276768Z arg0_1, = args 2023-01-11T21:38:06.5276844Z args.clear() 2023-01-11T21:38:06.5276929Z with torch.cuda.device(0): 2023-01-11T21:38:06.5277160Z buf0 = empty_strided((16, 64, 18, 18), (20736, 324, 18, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5277384Z buf1 = empty_strided((16, 64, 18, 18), (20736, 324, 18, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5277474Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5277635Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 331776, grid=grid(331776), stream=stream0) 2023-01-11T21:38:06.5277711Z del arg0_1 2023-01-11T21:38:06.5277795Z return (buf0, buf1, ) 2023-01-11T21:38:06.5277800Z 2023-01-11T21:38:06.5277805Z 2023-01-11T21:38:06.5277884Z if __name__ == "__main__": 2023-01-11T21:38:06.5277998Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5278123Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5278355Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5278469Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5278474Z 2023-01-11T21:38:06.5278739Z [2023-01-11 21:35:18,459] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 709 2023-01-11T21:38:06.5278745Z 2023-01-11T21:38:06.5278841Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5278914Z import torch 2023-01-11T21:38:06.5279035Z import random 2023-01-11T21:38:06.5279148Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5279271Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5279276Z 2023-01-11T21:38:06.5279359Z aten = torch.ops.aten 2023-01-11T21:38:06.5279495Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5279595Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5279600Z 2023-01-11T21:38:06.5279675Z import triton 2023-01-11T21:38:06.5279767Z import triton.language as tl 2023-01-11T21:38:06.5279885Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5280024Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5280030Z 2023-01-11T21:38:06.5280034Z 
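# [annotation -- not part of the generated kernel] A second dump follows for
# the half-precision run of the same test; it differs from the fp32 module
# above only in the '*fp16' pointer types and in upcasting every tl.load with
# .to(tl.float32) so the max/argmax chains run in fp32. For reference, the
# NaN-propagating max used throughout these kernels, as a minimal plain-Python
# sketch (hypothetical helper, not emitted by Inductor):
def _nanmax_ref(a, b):
    # mirrors tl.where(a != a, a, tl.where(a > b, a, b)):
    # a NaN in `a` wins outright; otherwise the larger value wins
    return a if (a != a or a > b) else b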
2023-01-11T21:38:06.5280213Z triton_fused_getitem_getitem_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5280288Z import triton 2023-01-11T21:38:06.5280379Z import triton.language as tl 2023-01-11T21:38:06.5280492Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5280595Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5280729Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5280848Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5280860Z 2023-01-11T21:38:06.5281278Z @pointwise(size_hints=[524288], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5281352Z @triton.jit 2023-01-11T21:38:06.5281495Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5281571Z xnumel = 331776 2023-01-11T21:38:06.5281668Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5281799Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5281886Z xmask = xindex < xnumel 2023-01-11T21:38:06.5281957Z x0 = xindex % 18 2023-01-11T21:38:06.5282039Z x1 = (xindex // 18) % 18 2023-01-11T21:38:06.5282118Z x2 = (xindex // 324) 2023-01-11T21:38:06.5282188Z x3 = xindex 2023-01-11T21:38:06.5282434Z tmp0 = tl.load(in_ptr0 + ((3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5282708Z tmp1 = tl.load(in_ptr0 + (1 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5282947Z tmp3 = tl.load(in_ptr0 + (2 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5283192Z tmp5 = tl.load(in_ptr0 + (55 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5283427Z tmp7 = tl.load(in_ptr0 + (56 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5283668Z tmp9 = tl.load(in_ptr0 + (57 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5283922Z tmp11 = tl.load(in_ptr0 + (110 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5284168Z tmp13 = tl.load(in_ptr0 + (111 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5284415Z tmp15 = tl.load(in_ptr0 + (112 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5284547Z tmp17 = tl.load(in_ptr0 + ((3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5284680Z tmp19 = tl.load(in_ptr0 + (1 + (3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5284810Z tmp24 = tl.load(in_ptr0 + (2 + (3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5284942Z tmp29 = tl.load(in_ptr0 + (55 + (3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5285065Z tmp34 = tl.load(in_ptr0 + (56 + (3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5285228Z tmp39 = tl.load(in_ptr0 + (57 + (3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5285358Z tmp44 = tl.load(in_ptr0 + (110 + (3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5285494Z tmp49 = tl.load(in_ptr0 + 
(111 + (3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5285624Z tmp54 = tl.load(in_ptr0 + (112 + (3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5285780Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.5285932Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.5286071Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.5286195Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.5286327Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 2023-01-11T21:38:06.5286470Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.5286602Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.5286735Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.5286820Z tmp18 = (3*x0) + (165*x1) 2023-01-11T21:38:06.5286905Z tmp20 = 1 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5286985Z tmp21 = tmp19 > tmp17 2023-01-11T21:38:06.5287077Z tmp22 = tl.where(tmp21, tmp20, tmp18) 2023-01-11T21:38:06.5287217Z tmp23 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp17, tmp19, tmp17)) 2023-01-11T21:38:06.5287301Z tmp25 = 2 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5287383Z tmp26 = tmp24 > tmp23 2023-01-11T21:38:06.5287480Z tmp27 = tl.where(tmp26, tmp25, tmp22) 2023-01-11T21:38:06.5287613Z tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 > tmp23, tmp24, tmp23)) 2023-01-11T21:38:06.5287699Z tmp30 = 55 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5287771Z tmp31 = tmp29 > tmp28 2023-01-11T21:38:06.5287872Z tmp32 = tl.where(tmp31, tmp30, tmp27) 2023-01-11T21:38:06.5288014Z tmp33 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 > tmp28, tmp29, tmp28)) 2023-01-11T21:38:06.5288097Z tmp35 = 56 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5288203Z tmp36 = tmp34 > tmp33 2023-01-11T21:38:06.5288302Z tmp37 = tl.where(tmp36, tmp35, tmp32) 2023-01-11T21:38:06.5288438Z tmp38 = tl.where(tmp34 != tmp34, tmp34, tl.where(tmp34 > tmp33, tmp34, tmp33)) 2023-01-11T21:38:06.5288513Z tmp40 = 57 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5288592Z tmp41 = tmp39 > tmp38 2023-01-11T21:38:06.5288690Z tmp42 = tl.where(tmp41, tmp40, tmp37) 2023-01-11T21:38:06.5288825Z tmp43 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp38, tmp39, tmp38)) 2023-01-11T21:38:06.5288907Z tmp45 = 110 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5288990Z tmp46 = tmp44 > tmp43 2023-01-11T21:38:06.5289091Z tmp47 = tl.where(tmp46, tmp45, tmp42) 2023-01-11T21:38:06.5289223Z tmp48 = tl.where(tmp44 != tmp44, tmp44, tl.where(tmp44 > tmp43, tmp44, tmp43)) 2023-01-11T21:38:06.5289299Z tmp50 = 111 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5289380Z tmp51 = tmp49 > tmp48 2023-01-11T21:38:06.5289482Z tmp52 = tl.where(tmp51, tmp50, tmp47) 2023-01-11T21:38:06.5289616Z tmp53 = tl.where(tmp49 != tmp49, tmp49, tl.where(tmp49 > tmp48, tmp49, tmp48)) 2023-01-11T21:38:06.5289699Z tmp55 = 112 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5289779Z tmp56 = tmp54 > tmp53 2023-01-11T21:38:06.5289877Z tmp57 = tl.where(tmp56, tmp55, tmp52) 2023-01-11T21:38:06.5290002Z tmp58 = tl.where(tmp54 != tmp54, tmp54, tl.where(tmp54 > tmp53, tmp54, tmp53)) 2023-01-11T21:38:06.5290133Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.5290265Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp57, xmask) 
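# [annotation -- not part of the generated kernel] One pass produces both
# results of aten.max_pool2d_with_indices: tmp16, the NaN-aware window max,
# is stored through the fp16 out_ptr0, and tmp57, the flat argmax offset
# within the 55x55 input plane, goes to out_ptr1, which call() below
# allocates as an int64 buffer.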
2023-01-11T21:38:06.5290376Z ''') 2023-01-11T21:38:06.5290382Z 2023-01-11T21:38:06.5290386Z 2023-01-11T21:38:06.5290483Z async_compile.wait(globals()) 2023-01-11T21:38:06.5290558Z del async_compile 2023-01-11T21:38:06.5290564Z 2023-01-11T21:38:06.5290638Z def call(args): 2023-01-11T21:38:06.5290711Z arg0_1, = args 2023-01-11T21:38:06.5290780Z args.clear() 2023-01-11T21:38:06.5290875Z with torch.cuda.device(0): 2023-01-11T21:38:06.5291104Z buf0 = empty_strided((16, 64, 18, 18), (20736, 324, 18, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5291327Z buf1 = empty_strided((16, 64, 18, 18), (20736, 324, 18, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5291417Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5291579Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 331776, grid=grid(331776), stream=stream0) 2023-01-11T21:38:06.5291652Z del arg0_1 2023-01-11T21:38:06.5291728Z return (buf0, buf1, ) 2023-01-11T21:38:06.5291733Z 2023-01-11T21:38:06.5291746Z 2023-01-11T21:38:06.5291819Z if __name__ == "__main__": 2023-01-11T21:38:06.5291936Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5292063Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5292296Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5292407Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5292413Z 2023-01-11T21:38:06.5292481Z ok (0.605s) 2023-01-11T21:38:06.5292939Z test_max_pool2d6_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5293071Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5293332Z [2023-01-11 21:35:18,499] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 710 2023-01-11T21:38:06.5293569Z [2023-01-11 21:35:18,504] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.max_pool2d_with_indices 2023-01-11T21:38:06.5293859Z [2023-01-11 21:35:18,507] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 710 2023-01-11T21:38:06.5294278Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5294409Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5294849Z [2023-01-11 21:35:18,527] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 711 2023-01-11T21:38:06.5295102Z [2023-01-11 21:35:18,532] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.max_pool2d_with_indices 2023-01-11T21:38:06.5295367Z [2023-01-11 21:35:18,535] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 711 2023-01-11T21:38:06.5295373Z 2023-01-11T21:38:06.5295492Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5295574Z import torch 2023-01-11T21:38:06.5295655Z import random 2023-01-11T21:38:06.5295784Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5295906Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5295912Z 2023-01-11T21:38:06.5295993Z aten = torch.ops.aten 2023-01-11T21:38:06.5296130Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5296226Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5296231Z 2023-01-11T21:38:06.5296304Z import triton 2023-01-11T21:38:06.5296396Z import triton.language as tl 2023-01-11T21:38:06.5296558Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5296697Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5296702Z 2023-01-11T21:38:06.5296707Z 2023-01-11T21:38:06.5296798Z async_compile.wait(globals()) 2023-01-11T21:38:06.5296875Z del async_compile 2023-01-11T21:38:06.5296882Z 2023-01-11T21:38:06.5296954Z def call(args): 2023-01-11T21:38:06.5297028Z arg0_1, = args 2023-01-11T21:38:06.5297102Z args.clear() 2023-01-11T21:38:06.5297257Z with torch.cuda.device(0): 2023-01-11T21:38:06.5297397Z buf0 = aten.max_pool2d_with_indices(arg0_1, [13, 13], [13, 13], [0, 0], 1, False) 2023-01-11T21:38:06.5297471Z del arg0_1 2023-01-11T21:38:06.5297544Z buf1 = buf0[0] 2023-01-11T21:38:06.5297657Z assert_size_stride(buf1, (16, 64, 4, 4), (1024, 16, 4, 1)) 2023-01-11T21:38:06.5297730Z buf2 = buf0[1] 2023-01-11T21:38:06.5297841Z assert_size_stride(buf2, (16, 64, 4, 4), (1024, 16, 4, 1)) 2023-01-11T21:38:06.5297916Z del buf0 2023-01-11T21:38:06.5297991Z return (buf1, buf2, ) 2023-01-11T21:38:06.5297997Z 2023-01-11T21:38:06.5298001Z 2023-01-11T21:38:06.5298079Z if __name__ == "__main__": 2023-01-11T21:38:06.5298194Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5298322Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5298562Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5298677Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5298682Z 2023-01-11T21:38:06.5298687Z 2023-01-11T21:38:06.5298783Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5298856Z import torch 2023-01-11T21:38:06.5298924Z import random 2023-01-11T21:38:06.5299047Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5299169Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5299175Z 2023-01-11T21:38:06.5299262Z aten = torch.ops.aten 2023-01-11T21:38:06.5299396Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5299491Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5299496Z 2023-01-11T21:38:06.5299570Z import triton 2023-01-11T21:38:06.5299655Z 
import triton.language as tl 2023-01-11T21:38:06.5299817Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5299957Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5299963Z 2023-01-11T21:38:06.5299967Z 2023-01-11T21:38:06.5300058Z async_compile.wait(globals()) 2023-01-11T21:38:06.5300133Z del async_compile 2023-01-11T21:38:06.5300138Z 2023-01-11T21:38:06.5300211Z def call(args): 2023-01-11T21:38:06.5300285Z arg0_1, = args 2023-01-11T21:38:06.5300359Z args.clear() 2023-01-11T21:38:06.5300443Z with torch.cuda.device(0): 2023-01-11T21:38:06.5300579Z buf0 = aten.max_pool2d_with_indices(arg0_1, [13, 13], [13, 13], [0, 0], 1, False) 2023-01-11T21:38:06.5300658Z del arg0_1 2023-01-11T21:38:06.5300731Z buf1 = buf0[0] 2023-01-11T21:38:06.5300844Z assert_size_stride(buf1, (16, 64, 4, 4), (1024, 16, 4, 1)) 2023-01-11T21:38:06.5300921Z buf2 = buf0[1] 2023-01-11T21:38:06.5301032Z assert_size_stride(buf2, (16, 64, 4, 4), (1024, 16, 4, 1)) 2023-01-11T21:38:06.5301098Z del buf0 2023-01-11T21:38:06.5301180Z return (buf1, buf2, ) 2023-01-11T21:38:06.5301185Z 2023-01-11T21:38:06.5301189Z 2023-01-11T21:38:06.5301268Z if __name__ == "__main__": 2023-01-11T21:38:06.5301384Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5301509Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5301745Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5301859Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5301864Z 2023-01-11T21:38:06.5301934Z ok (0.075s) 2023-01-11T21:38:06.5302443Z test_max_pool2d_with_indices_backward2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5302580Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5302834Z [2023-01-11 21:35:18,558] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 712 2023-01-11T21:38:06.5303095Z [2023-01-11 21:35:18,770] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 712 2023-01-11T21:38:06.5303511Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5303649Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5303911Z [2023-01-11 21:35:18,791] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 713 2023-01-11T21:38:06.5303917Z 2023-01-11T21:38:06.5304016Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5304089Z import torch 2023-01-11T21:38:06.5304162Z import random 2023-01-11T21:38:06.5304281Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5304400Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5304405Z 2023-01-11T21:38:06.5304487Z aten = torch.ops.aten 2023-01-11T21:38:06.5304626Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5304725Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5304733Z 2023-01-11T21:38:06.5304808Z import triton 2023-01-11T21:38:06.5304901Z import triton.language as tl 2023-01-11T21:38:06.5305028Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5305163Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5305175Z 2023-01-11T21:38:06.5305180Z 2023-01-11T21:38:06.5305406Z triton_fused_max_pool2d_with_indices_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5305484Z import triton 2023-01-11T21:38:06.5305583Z import triton.language as tl 2023-01-11T21:38:06.5305724Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5305842Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5305985Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5306112Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5306118Z 2023-01-11T21:38:06.5306540Z @pointwise(size_hints=[32768], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5306612Z @triton.jit 2023-01-11T21:38:06.5306756Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5306836Z xnumel = 17920 2023-01-11T21:38:06.5306939Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5307073Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5307158Z xmask = xindex < xnumel 2023-01-11T21:38:06.5307236Z x3 = xindex % 2240 2023-01-11T21:38:06.5307314Z x1 = (xindex // 56) % 40 2023-01-11T21:38:06.5307393Z x0 = xindex % 56 2023-01-11T21:38:06.5307474Z x2 = (xindex // 2240) 2023-01-11T21:38:06.5307547Z x5 = xindex 2023-01-11T21:38:06.5307619Z tmp0 = x3 2023-01-11T21:38:06.5307697Z tmp1 = (x1 // 2) 2023-01-11T21:38:06.5307774Z tmp2 = (x0 // 2) 2023-01-11T21:38:06.5307853Z tmp3 = 1 + (((1 + x1) // 2)) 2023-01-11T21:38:06.5307963Z tmp4 = 1 + (((1 + x0) // 2)) 2023-01-11T21:38:06.5308035Z tmp5 = 0 2023-01-11T21:38:06.5308177Z tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp5, tmp1, tmp5)) 2023-01-11T21:38:06.5308314Z tmp7 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp5, tmp2, tmp5)) 2023-01-11T21:38:06.5308387Z tmp8 = 21 2023-01-11T21:38:06.5308523Z tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp8, tmp3, tmp8)) 2023-01-11T21:38:06.5308591Z tmp10 = 29 2023-01-11T21:38:06.5308733Z tmp11 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp10, tmp4, tmp10)) 2023-01-11T21:38:06.5308816Z tmp12 = tmp6 + tmp5 
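# [annotation -- not part of the generated kernel] Backward of
# max_pool2d_with_indices: tmp1..tmp11 bound, for this grad_input element,
# the range of pooling windows that could have covered it -- lower bounds
# clamped to 0 (tmp6, tmp7), upper bounds clamped to the 21x29 output extent
# (tmp9, tmp11). Each candidate window below compares its stored argmax
# (in_ptr0, int64) against this element's own flat offset tmp0, and the
# upstream gradient (in_ptr1) is accumulated only where they match.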
2023-01-11T21:38:06.5308897Z tmp13 = tmp7 + tmp5 2023-01-11T21:38:06.5308969Z tmp14 = 1 2023-01-11T21:38:06.5309085Z tmp15 = tmp9 - tmp14 2023-01-11T21:38:06.5309228Z tmp16 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp15, tmp12, tmp15)) 2023-01-11T21:38:06.5309336Z tmp17 = tmp11 - tmp14 2023-01-11T21:38:06.5309478Z tmp18 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 < tmp17, tmp13, tmp17)) 2023-01-11T21:38:06.5309609Z tmp19 = tl.load(in_ptr0 + (tmp18 + (29*tmp16) + (609*x2)), xmask) 2023-01-11T21:38:06.5309729Z tmp20 = tl.load(in_ptr1 + (tmp18 + (29*tmp16) + (609*x2)), xmask) 2023-01-11T21:38:06.5309810Z tmp21 = tmp19 == tmp0 2023-01-11T21:38:06.5309887Z tmp22 = 0.0 2023-01-11T21:38:06.5309992Z tmp23 = tl.where(tmp21, tmp20, tmp22) 2023-01-11T21:38:06.5310069Z tmp24 = tmp7 + tmp14 2023-01-11T21:38:06.5310213Z tmp25 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 < tmp17, tmp24, tmp17)) 2023-01-11T21:38:06.5310335Z tmp26 = tl.load(in_ptr0 + (tmp25 + (29*tmp16) + (609*x2)), xmask) 2023-01-11T21:38:06.5310455Z tmp27 = tl.load(in_ptr1 + (tmp25 + (29*tmp16) + (609*x2)), xmask) 2023-01-11T21:38:06.5310537Z tmp28 = tmp26 == tmp0 2023-01-11T21:38:06.5310620Z tmp29 = tmp12 < tmp9 2023-01-11T21:38:06.5310701Z tmp30 = tmp24 < tmp11 2023-01-11T21:38:06.5310775Z tmp31 = tmp29 & tmp30 2023-01-11T21:38:06.5310858Z tmp32 = tmp31 & tmp28 2023-01-11T21:38:06.5310937Z tmp33 = tmp23 + tmp27 2023-01-11T21:38:06.5311036Z tmp34 = tl.where(tmp32, tmp33, tmp23) 2023-01-11T21:38:06.5317374Z tmp35 = tmp6 + tmp14 2023-01-11T21:38:06.5317540Z tmp36 = tl.where(tmp35 != tmp35, tmp35, tl.where(tmp35 < tmp15, tmp35, tmp15)) 2023-01-11T21:38:06.5317721Z tmp37 = tl.load(in_ptr0 + (tmp18 + (29*tmp36) + (609*x2)), xmask) 2023-01-11T21:38:06.5317844Z tmp38 = tl.load(in_ptr1 + (tmp18 + (29*tmp36) + (609*x2)), xmask) 2023-01-11T21:38:06.5317922Z tmp39 = tmp37 == tmp0 2023-01-11T21:38:06.5318006Z tmp40 = tmp35 < tmp9 2023-01-11T21:38:06.5318087Z tmp41 = tmp13 < tmp11 2023-01-11T21:38:06.5318169Z tmp42 = tmp40 & tmp41 2023-01-11T21:38:06.5318251Z tmp43 = tmp42 & tmp39 2023-01-11T21:38:06.5318331Z tmp44 = tmp34 + tmp38 2023-01-11T21:38:06.5318433Z tmp45 = tl.where(tmp43, tmp44, tmp34) 2023-01-11T21:38:06.5318550Z tmp46 = tl.load(in_ptr0 + (tmp25 + (29*tmp36) + (609*x2)), xmask) 2023-01-11T21:38:06.5318676Z tmp47 = tl.load(in_ptr1 + (tmp25 + (29*tmp36) + (609*x2)), xmask) 2023-01-11T21:38:06.5318758Z tmp48 = tmp46 == tmp0 2023-01-11T21:38:06.5318835Z tmp49 = tmp40 & tmp30 2023-01-11T21:38:06.5318919Z tmp50 = tmp49 & tmp48 2023-01-11T21:38:06.5318998Z tmp51 = tmp45 + tmp47 2023-01-11T21:38:06.5319095Z tmp52 = tl.where(tmp50, tmp51, tmp45) 2023-01-11T21:38:06.5319237Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp52, xmask) 2023-01-11T21:38:06.5319345Z ''') 2023-01-11T21:38:06.5319351Z 2023-01-11T21:38:06.5319355Z 2023-01-11T21:38:06.5319453Z async_compile.wait(globals()) 2023-01-11T21:38:06.5319537Z del async_compile 2023-01-11T21:38:06.5319542Z 2023-01-11T21:38:06.5319619Z def call(args): 2023-01-11T21:38:06.5319710Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5319787Z args.clear() 2023-01-11T21:38:06.5319876Z with torch.cuda.device(0): 2023-01-11T21:38:06.5320109Z buf0 = empty_strided((2, 4, 40, 56), (8960, 2240, 56, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5320237Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5320420Z triton_fused_max_pool2d_with_indices_backward_0.run(arg2_1, arg0_1, buf0, 17920, grid=grid(17920), stream=stream0) 2023-01-11T21:38:06.5320496Z del arg0_1 
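# [annotation -- not part of the generated kernel] Argument routing in the
# run() call above: arg2_1 (the int64 indices saved by the forward) feeds
# in_ptr0 and arg0_1 (the incoming gradient) feeds in_ptr1, with grad_input
# written to buf0. arg1_1 -- evidently the forward input, judging by the
# (2, 4, 40, 56) benchmark tensor below -- is never read by the kernel; only
# its shape was needed to size buf0.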
2023-01-11T21:38:06.5320572Z del arg2_1 2023-01-11T21:38:06.5320652Z return (buf0, ) 2023-01-11T21:38:06.5320657Z 2023-01-11T21:38:06.5320661Z 2023-01-11T21:38:06.5320738Z if __name__ == "__main__": 2023-01-11T21:38:06.5320858Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5320990Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5321250Z arg0_1 = rand_strided((2, 4, 21, 29), (2436, 609, 29, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5321488Z arg1_1 = rand_strided((2, 4, 40, 56), (8960, 2240, 56, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5321705Z arg2_1 = rand_strided((2, 4, 21, 29), (2436, 609, 29, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5321837Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5321843Z 2023-01-11T21:38:06.5322107Z [2023-01-11 21:35:18,982] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 713 2023-01-11T21:38:06.5322113Z 2023-01-11T21:38:06.5322213Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5322281Z import torch 2023-01-11T21:38:06.5322355Z import random 2023-01-11T21:38:06.5322474Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5322597Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5322602Z 2023-01-11T21:38:06.5322685Z aten = torch.ops.aten 2023-01-11T21:38:06.5322819Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5322914Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5322919Z 2023-01-11T21:38:06.5322993Z import triton 2023-01-11T21:38:06.5323079Z import triton.language as tl 2023-01-11T21:38:06.5323208Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5323347Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5323352Z 2023-01-11T21:38:06.5323357Z 2023-01-11T21:38:06.5323565Z triton_fused_max_pool2d_with_indices_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5323670Z import triton 2023-01-11T21:38:06.5323764Z import triton.language as tl 2023-01-11T21:38:06.5323880Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5323974Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5324107Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5324230Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5324236Z 2023-01-11T21:38:06.5324655Z @pointwise(size_hints=[32768], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5324730Z @triton.jit 2023-01-11T21:38:06.5324870Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5324945Z xnumel = 17920 2023-01-11T21:38:06.5325045Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5325170Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5325254Z xmask = xindex < xnumel 2023-01-11T21:38:06.5325332Z x3 = xindex % 2240 2023-01-11T21:38:06.5325415Z x1 = (xindex // 56) % 40 2023-01-11T21:38:06.5325489Z x0 = xindex % 56 2023-01-11T21:38:06.5325571Z x2 = (xindex // 2240) 2023-01-11T21:38:06.5325642Z x5 = xindex 2023-01-11T21:38:06.5325706Z tmp0 = x3 2023-01-11T21:38:06.5325781Z tmp1 = (x1 // 2) 2023-01-11T21:38:06.5325854Z tmp2 = (x0 // 2) 2023-01-11T21:38:06.5325940Z 
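# Index bookkeeping for the (2, 4, 40, 56) gradient input with strides
# (8960, 2240, 56, 1): x0 is the width coordinate (xindex % 56), x1 the
# height, x2 the fused batch*channel index, and x3 = h*56 + w is this
# element's flat offset within its H*W plane -- the value the stored int64
# argmax indices in in_ptr0 are compared against (tmp0). tmp1/tmp2 begin
# mapping the coordinate back to the pooled output cells whose window can
# cover it; the //2 suggests stride-2 pooling over the (21, 29) output.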
tmp3 = 1 + (((1 + x1) // 2)) 2023-01-11T21:38:06.5326021Z tmp4 = 1 + (((1 + x0) // 2)) 2023-01-11T21:38:06.5326139Z tmp5 = 0 2023-01-11T21:38:06.5326280Z tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp5, tmp1, tmp5)) 2023-01-11T21:38:06.5326413Z tmp7 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp5, tmp2, tmp5)) 2023-01-11T21:38:06.5326489Z tmp8 = 21 2023-01-11T21:38:06.5326624Z tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp8, tmp3, tmp8)) 2023-01-11T21:38:06.5326699Z tmp10 = 29 2023-01-11T21:38:06.5326841Z tmp11 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp10, tmp4, tmp10)) 2023-01-11T21:38:06.5326922Z tmp12 = tmp6 + tmp5 2023-01-11T21:38:06.5327003Z tmp13 = tmp7 + tmp5 2023-01-11T21:38:06.5327069Z tmp14 = 1 2023-01-11T21:38:06.5327185Z tmp15 = tmp9 - tmp14 2023-01-11T21:38:06.5327331Z tmp16 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp15, tmp12, tmp15)) 2023-01-11T21:38:06.5327448Z tmp17 = tmp11 - tmp14 2023-01-11T21:38:06.5327594Z tmp18 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 < tmp17, tmp13, tmp17)) 2023-01-11T21:38:06.5327723Z tmp19 = tl.load(in_ptr0 + (tmp18 + (29*tmp16) + (609*x2)), xmask) 2023-01-11T21:38:06.5327860Z tmp20 = tl.load(in_ptr1 + (tmp18 + (29*tmp16) + (609*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5327937Z tmp21 = tmp19 == tmp0 2023-01-11T21:38:06.5328010Z tmp22 = 0.0 2023-01-11T21:38:06.5328112Z tmp23 = tl.where(tmp21, tmp20, tmp22) 2023-01-11T21:38:06.5328205Z tmp24 = tmp7 + tmp14 2023-01-11T21:38:06.5328352Z tmp25 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 < tmp17, tmp24, tmp17)) 2023-01-11T21:38:06.5328477Z tmp26 = tl.load(in_ptr0 + (tmp25 + (29*tmp16) + (609*x2)), xmask) 2023-01-11T21:38:06.5328616Z tmp27 = tl.load(in_ptr1 + (tmp25 + (29*tmp16) + (609*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5328691Z tmp28 = tmp26 == tmp0 2023-01-11T21:38:06.5328775Z tmp29 = tmp12 < tmp9 2023-01-11T21:38:06.5328857Z tmp30 = tmp24 < tmp11 2023-01-11T21:38:06.5328941Z tmp31 = tmp29 & tmp30 2023-01-11T21:38:06.5329020Z tmp32 = tmp31 & tmp28 2023-01-11T21:38:06.5329103Z tmp33 = tmp23 + tmp27 2023-01-11T21:38:06.5329204Z tmp34 = tl.where(tmp32, tmp33, tmp23) 2023-01-11T21:38:06.5329279Z tmp35 = tmp6 + tmp14 2023-01-11T21:38:06.5329423Z tmp36 = tl.where(tmp35 != tmp35, tmp35, tl.where(tmp35 < tmp15, tmp35, tmp15)) 2023-01-11T21:38:06.5329617Z tmp37 = tl.load(in_ptr0 + (tmp18 + (29*tmp36) + (609*x2)), xmask) 2023-01-11T21:38:06.5329756Z tmp38 = tl.load(in_ptr1 + (tmp18 + (29*tmp36) + (609*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5329839Z tmp39 = tmp37 == tmp0 2023-01-11T21:38:06.5329919Z tmp40 = tmp35 < tmp9 2023-01-11T21:38:06.5329999Z tmp41 = tmp13 < tmp11 2023-01-11T21:38:06.5330073Z tmp42 = tmp40 & tmp41 2023-01-11T21:38:06.5330151Z tmp43 = tmp42 & tmp39 2023-01-11T21:38:06.5330235Z tmp44 = tmp34 + tmp38 2023-01-11T21:38:06.5330336Z tmp45 = tl.where(tmp43, tmp44, tmp34) 2023-01-11T21:38:06.5330460Z tmp46 = tl.load(in_ptr0 + (tmp25 + (29*tmp36) + (609*x2)), xmask) 2023-01-11T21:38:06.5330606Z tmp47 = tl.load(in_ptr1 + (tmp25 + (29*tmp36) + (609*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5330689Z tmp48 = tmp46 == tmp0 2023-01-11T21:38:06.5330771Z tmp49 = tmp40 & tmp30 2023-01-11T21:38:06.5330845Z tmp50 = tmp49 & tmp48 2023-01-11T21:38:06.5330926Z tmp51 = tmp45 + tmp47 2023-01-11T21:38:06.5331028Z tmp52 = tl.where(tmp50, tmp51, tmp45) 2023-01-11T21:38:06.5331167Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp52, xmask) 2023-01-11T21:38:06.5331256Z ''') 2023-01-11T21:38:06.5331262Z 
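# The kernel above is the same max_pool2d backward re-specialized for fp16
# inputs ('*fp16' pointers in the @pointwise signature). Every gradient
# load gets a `.to(tl.float32)`, so the masked accumulation runs in fp32;
# the final tl.store back through the fp16 out_ptr then narrows the result
# (relying, it appears, on Triton casting the stored value to the pointer's
# element type). Accumulating in fp32 avoids losing small gradient
# contributions to half-precision rounding.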
2023-01-11T21:38:06.5331266Z 2023-01-11T21:38:06.5331363Z async_compile.wait(globals()) 2023-01-11T21:38:06.5331445Z del async_compile 2023-01-11T21:38:06.5331450Z 2023-01-11T21:38:06.5331520Z def call(args): 2023-01-11T21:38:06.5331607Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5331684Z args.clear() 2023-01-11T21:38:06.5331778Z with torch.cuda.device(0): 2023-01-11T21:38:06.5332006Z buf0 = empty_strided((2, 4, 40, 56), (8960, 2240, 56, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5332131Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5332311Z triton_fused_max_pool2d_with_indices_backward_0.run(arg2_1, arg0_1, buf0, 17920, grid=grid(17920), stream=stream0) 2023-01-11T21:38:06.5332381Z del arg0_1 2023-01-11T21:38:06.5332457Z del arg2_1 2023-01-11T21:38:06.5332541Z return (buf0, ) 2023-01-11T21:38:06.5332547Z 2023-01-11T21:38:06.5332551Z 2023-01-11T21:38:06.5332631Z if __name__ == "__main__": 2023-01-11T21:38:06.5332750Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5332880Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5333109Z arg0_1 = rand_strided((2, 4, 21, 29), (2436, 609, 29, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5333335Z arg1_1 = rand_strided((2, 4, 40, 56), (8960, 2240, 56, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5333546Z arg2_1 = rand_strided((2, 4, 21, 29), (2436, 609, 29, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5333680Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5333685Z 2023-01-11T21:38:06.5333757Z ok (0.446s) 2023-01-11T21:38:06.5334240Z test_max_pool2d_with_indices_backward3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5334373Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5334781Z [2023-01-11 21:35:19,106] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 714 2023-01-11T21:38:06.5335049Z [2023-01-11 21:35:19,247] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 714 2023-01-11T21:38:06.5335572Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5335713Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5335976Z [2023-01-11 21:35:19,280] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 715 2023-01-11T21:38:06.5336242Z [2023-01-11 21:35:19,409] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 715 2023-01-11T21:38:06.5336248Z 2023-01-11T21:38:06.5336345Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5336412Z import torch 2023-01-11T21:38:06.5336487Z import random 2023-01-11T21:38:06.5336606Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5336730Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5336735Z 2023-01-11T21:38:06.5336817Z aten = torch.ops.aten 2023-01-11T21:38:06.5336954Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5337050Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5337057Z 2023-01-11T21:38:06.5337181Z import triton 2023-01-11T21:38:06.5337293Z import triton.language as tl 2023-01-11T21:38:06.5337423Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5337567Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5337573Z 2023-01-11T21:38:06.5337578Z 2023-01-11T21:38:06.5337783Z triton_fused_max_pool2d_with_indices_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5337859Z import triton 2023-01-11T21:38:06.5337950Z import triton.language as tl 2023-01-11T21:38:06.5338065Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5338160Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5338341Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5338471Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5338476Z 2023-01-11T21:38:06.5338907Z @pointwise(size_hints=[16777216], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5338984Z @triton.jit 2023-01-11T21:38:06.5339125Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5339203Z xnumel = 11517952 2023-01-11T21:38:06.5339302Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5339426Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5339508Z xmask = xindex < xnumel 2023-01-11T21:38:06.5339593Z x3 = xindex % 1406 2023-01-11T21:38:06.5339677Z x1 = (xindex // 38) % 37 2023-01-11T21:38:06.5339760Z x0 = xindex % 38 2023-01-11T21:38:06.5339840Z x2 = (xindex // 1406) 2023-01-11T21:38:06.5339912Z x5 = xindex 2023-01-11T21:38:06.5339979Z tmp0 = x3 2023-01-11T21:38:06.5340061Z tmp1 = ((1 + x1) // 2) 2023-01-11T21:38:06.5340141Z tmp2 = ((1 + x0) // 2) 2023-01-11T21:38:06.5340223Z tmp3 = 1 + (x1 // 2) 2023-01-11T21:38:06.5340301Z tmp4 = 1 + (x0 // 2) 2023-01-11T21:38:06.5340376Z tmp5 = 0 2023-01-11T21:38:06.5340510Z tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp5, tmp1, tmp5)) 2023-01-11T21:38:06.5340648Z tmp7 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp5, tmp2, tmp5)) 2023-01-11T21:38:06.5340727Z tmp8 = 19 2023-01-11T21:38:06.5340862Z tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp8, tmp3, tmp8)) 2023-01-11T21:38:06.5341001Z tmp10 = 
tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp8, tmp4, tmp8)) 2023-01-11T21:38:06.5341083Z tmp11 = tmp6 + tmp5 2023-01-11T21:38:06.5341164Z tmp12 = tmp7 + tmp5 2023-01-11T21:38:06.5341241Z tmp13 = 1 2023-01-11T21:38:06.5341350Z tmp14 = tmp9 - tmp13 2023-01-11T21:38:06.5341494Z tmp15 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp14, tmp11, tmp14)) 2023-01-11T21:38:06.5341610Z tmp16 = tmp10 - tmp13 2023-01-11T21:38:06.5341782Z tmp17 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp16, tmp12, tmp16)) 2023-01-11T21:38:06.5341906Z tmp18 = tl.load(in_ptr0 + (tmp17 + (19*tmp15) + (361*x2)), xmask) 2023-01-11T21:38:06.5342026Z tmp19 = tl.load(in_ptr1 + (tmp17 + (19*tmp15) + (361*x2)), xmask) 2023-01-11T21:38:06.5342104Z tmp20 = tmp18 == tmp0 2023-01-11T21:38:06.5342171Z tmp21 = 0.0 2023-01-11T21:38:06.5342274Z tmp22 = tl.where(tmp20, tmp19, tmp21) 2023-01-11T21:38:06.5342409Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp22, xmask) 2023-01-11T21:38:06.5342494Z ''') 2023-01-11T21:38:06.5342500Z 2023-01-11T21:38:06.5342504Z 2023-01-11T21:38:06.5342598Z async_compile.wait(globals()) 2023-01-11T21:38:06.5342679Z del async_compile 2023-01-11T21:38:06.5342684Z 2023-01-11T21:38:06.5342759Z def call(args): 2023-01-11T21:38:06.5342844Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5342913Z args.clear() 2023-01-11T21:38:06.5343005Z with torch.cuda.device(0): 2023-01-11T21:38:06.5343241Z buf0 = empty_strided((32, 256, 37, 38), (359936, 1406, 38, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5343333Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5343515Z triton_fused_max_pool2d_with_indices_backward_0.run(arg2_1, arg0_1, buf0, 11517952, grid=grid(11517952), stream=stream0) 2023-01-11T21:38:06.5343588Z del arg0_1 2023-01-11T21:38:06.5343660Z del arg2_1 2023-01-11T21:38:06.5343730Z return (buf0, ) 2023-01-11T21:38:06.5343736Z 2023-01-11T21:38:06.5343740Z 2023-01-11T21:38:06.5343820Z if __name__ == "__main__": 2023-01-11T21:38:06.5343938Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5344065Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5344327Z arg0_1 = rand_strided((32, 256, 19, 19), (92416, 361, 19, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5344557Z arg1_1 = rand_strided((32, 256, 37, 38), (359936, 1406, 38, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5344782Z arg2_1 = rand_strided((32, 256, 19, 19), (92416, 361, 19, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5344907Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5344912Z 2023-01-11T21:38:06.5344916Z 2023-01-11T21:38:06.5345012Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5345080Z import torch 2023-01-11T21:38:06.5345153Z import random 2023-01-11T21:38:06.5345273Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5345396Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5345401Z 2023-01-11T21:38:06.5345486Z aten = torch.ops.aten 2023-01-11T21:38:06.5345629Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5345745Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5345750Z 2023-01-11T21:38:06.5345823Z import triton 2023-01-11T21:38:06.5345933Z import triton.language as tl 2023-01-11T21:38:06.5346058Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5346198Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5346203Z 
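# Each dump like the one starting above is a self-contained module that
# Inductor emits for debugging. Roughly: async_compile.triton('''...''')
# hands the kernel source to background compile workers,
# async_compile.wait(globals()) blocks until every kernel in the module is
# built, and call(args) is the compiled graph's entry point. The
# `if __name__ == "__main__"` tail makes the module benchmarkable on its
# own; e.g. the fp32 dump just above ends with (shapes copied from it):
#
#     arg0_1 = rand_strided((32, 256, 19, 19), (92416, 361, 19, 1),
#                           device='cuda:0', dtype=torch.float32)
#     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
#
# rand_strided allocates tensors with the exact size/stride signature the
# graph was compiled for, so a standalone run exercises the same layout.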
2023-01-11T21:38:06.5346208Z 2023-01-11T21:38:06.5346409Z triton_fused_max_pool2d_with_indices_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5346484Z import triton 2023-01-11T21:38:06.5346577Z import triton.language as tl 2023-01-11T21:38:06.5346691Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5346786Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5346917Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5347046Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5347054Z 2023-01-11T21:38:06.5347479Z @pointwise(size_hints=[16777216], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5347552Z @triton.jit 2023-01-11T21:38:06.5347721Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5347800Z xnumel = 11517952 2023-01-11T21:38:06.5347895Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5348016Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5348099Z xmask = xindex < xnumel 2023-01-11T21:38:06.5348177Z x3 = xindex % 1406 2023-01-11T21:38:06.5348259Z x1 = (xindex // 38) % 37 2023-01-11T21:38:06.5348334Z x0 = xindex % 38 2023-01-11T21:38:06.5348415Z x2 = (xindex // 1406) 2023-01-11T21:38:06.5348487Z x5 = xindex 2023-01-11T21:38:06.5348551Z tmp0 = x3 2023-01-11T21:38:06.5348630Z tmp1 = ((1 + x1) // 2) 2023-01-11T21:38:06.5348708Z tmp2 = ((1 + x0) // 2) 2023-01-11T21:38:06.5348784Z tmp3 = 1 + (x1 // 2) 2023-01-11T21:38:06.5348857Z tmp4 = 1 + (x0 // 2) 2023-01-11T21:38:06.5348927Z tmp5 = 0 2023-01-11T21:38:06.5349057Z tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp5, tmp1, tmp5)) 2023-01-11T21:38:06.5349197Z tmp7 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp5, tmp2, tmp5)) 2023-01-11T21:38:06.5349269Z tmp8 = 19 2023-01-11T21:38:06.5349398Z tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp8, tmp3, tmp8)) 2023-01-11T21:38:06.5349534Z tmp10 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp8, tmp4, tmp8)) 2023-01-11T21:38:06.5349614Z tmp11 = tmp6 + tmp5 2023-01-11T21:38:06.5349698Z tmp12 = tmp7 + tmp5 2023-01-11T21:38:06.5349767Z tmp13 = 1 2023-01-11T21:38:06.5349875Z tmp14 = tmp9 - tmp13 2023-01-11T21:38:06.5350017Z tmp15 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp14, tmp11, tmp14)) 2023-01-11T21:38:06.5350159Z tmp16 = tmp10 - tmp13 2023-01-11T21:38:06.5350298Z tmp17 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp16, tmp12, tmp16)) 2023-01-11T21:38:06.5350420Z tmp18 = tl.load(in_ptr0 + (tmp17 + (19*tmp15) + (361*x2)), xmask) 2023-01-11T21:38:06.5350563Z tmp19 = tl.load(in_ptr1 + (tmp17 + (19*tmp15) + (361*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5350644Z tmp20 = tmp18 == tmp0 2023-01-11T21:38:06.5350710Z tmp21 = 0.0 2023-01-11T21:38:06.5350810Z tmp22 = tl.where(tmp20, tmp19, tmp21) 2023-01-11T21:38:06.5350943Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp22, xmask) 2023-01-11T21:38:06.5351026Z ''') 2023-01-11T21:38:06.5351032Z 2023-01-11T21:38:06.5351036Z 2023-01-11T21:38:06.5351130Z async_compile.wait(globals()) 2023-01-11T21:38:06.5351207Z del async_compile 2023-01-11T21:38:06.5351212Z 2023-01-11T21:38:06.5351286Z def call(args): 2023-01-11T21:38:06.5351372Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5351443Z args.clear() 2023-01-11T21:38:06.5351535Z with 
torch.cuda.device(0): 2023-01-11T21:38:06.5351764Z buf0 = empty_strided((32, 256, 37, 38), (359936, 1406, 38, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5351857Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5352038Z triton_fused_max_pool2d_with_indices_backward_0.run(arg2_1, arg0_1, buf0, 11517952, grid=grid(11517952), stream=stream0) 2023-01-11T21:38:06.5352115Z del arg0_1 2023-01-11T21:38:06.5352189Z del arg2_1 2023-01-11T21:38:06.5352259Z return (buf0, ) 2023-01-11T21:38:06.5352264Z 2023-01-11T21:38:06.5352268Z 2023-01-11T21:38:06.5352346Z if __name__ == "__main__": 2023-01-11T21:38:06.5352464Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5352590Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5352821Z arg0_1 = rand_strided((32, 256, 19, 19), (92416, 361, 19, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5353054Z arg1_1 = rand_strided((32, 256, 37, 38), (359936, 1406, 38, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5353279Z arg2_1 = rand_strided((32, 256, 19, 19), (92416, 361, 19, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5353406Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5353441Z 2023-01-11T21:38:06.5353507Z ok (0.440s) 2023-01-11T21:38:06.5353983Z test_max_pool2d_with_indices_backward4_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5354113Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5354373Z [2023-01-11 21:35:19,444] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 716 2023-01-11T21:38:06.5354381Z 2023-01-11T21:38:06.5354479Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5354556Z import torch 2023-01-11T21:38:06.5354631Z import random 2023-01-11T21:38:06.5354750Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5354874Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5354879Z 2023-01-11T21:38:06.5354954Z aten = torch.ops.aten 2023-01-11T21:38:06.5355087Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5355186Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5355191Z 2023-01-11T21:38:06.5355266Z import triton 2023-01-11T21:38:06.5355373Z import triton.language as tl 2023-01-11T21:38:06.5355514Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5355671Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5355676Z 2023-01-11T21:38:06.5355682Z 2023-01-11T21:38:06.5355912Z triton_fused_max_pool2d_with_indices_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5355981Z import triton 2023-01-11T21:38:06.5356073Z import triton.language as tl 2023-01-11T21:38:06.5356186Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5356287Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5356420Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5356545Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5356550Z 2023-01-11T21:38:06.5356973Z @pointwise(size_hints=[2048], filename=__file__, 
meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5357045Z @triton.jit 2023-01-11T21:38:06.5357186Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5357255Z xnumel = 1536 2023-01-11T21:38:06.5357354Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5357485Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5357566Z xmask = xindex < xnumel 2023-01-11T21:38:06.5357643Z x3 = xindex % 12 2023-01-11T21:38:06.5357721Z x1 = (xindex // 4) % 3 2023-01-11T21:38:06.5357791Z x0 = xindex % 4 2023-01-11T21:38:06.5357870Z x2 = (xindex // 12) 2023-01-11T21:38:06.5357940Z x5 = xindex 2023-01-11T21:38:06.5358011Z tmp0 = x3 2023-01-11T21:38:06.5358116Z tmp1 = (-2) + x1 2023-01-11T21:38:06.5358218Z tmp2 = (-2) + x0 2023-01-11T21:38:06.5358291Z tmp3 = 3 + x1 2023-01-11T21:38:06.5358357Z tmp4 = 3 + x0 2023-01-11T21:38:06.5358429Z tmp5 = 0 2023-01-11T21:38:06.5358567Z tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp5, tmp1, tmp5)) 2023-01-11T21:38:06.5358703Z tmp7 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp5, tmp2, tmp5)) 2023-01-11T21:38:06.5358773Z tmp8 = 3 2023-01-11T21:38:06.5358906Z tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp8, tmp3, tmp8)) 2023-01-11T21:38:06.5358978Z tmp10 = 4 2023-01-11T21:38:06.5359109Z tmp11 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp10, tmp4, tmp10)) 2023-01-11T21:38:06.5359188Z tmp12 = tmp6 + tmp5 2023-01-11T21:38:06.5359264Z tmp13 = tmp7 + tmp5 2023-01-11T21:38:06.5359364Z tmp14 = 1 2023-01-11T21:38:06.5359478Z tmp15 = tmp9 - tmp14 2023-01-11T21:38:06.5359620Z tmp16 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp15, tmp12, tmp15)) 2023-01-11T21:38:06.5359733Z tmp17 = tmp11 - tmp14 2023-01-11T21:38:06.5359867Z tmp18 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 < tmp17, tmp13, tmp17)) 2023-01-11T21:38:06.5359988Z tmp19 = tl.load(in_ptr0 + (tmp18 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5360104Z tmp20 = tl.load(in_ptr1 + (tmp18 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5360184Z tmp21 = tmp19 == tmp0 2023-01-11T21:38:06.5360257Z tmp22 = 0.0 2023-01-11T21:38:06.5360359Z tmp23 = tl.where(tmp21, tmp20, tmp22) 2023-01-11T21:38:06.5360441Z tmp24 = tmp7 + tmp14 2023-01-11T21:38:06.5360574Z tmp25 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 < tmp17, tmp24, tmp17)) 2023-01-11T21:38:06.5360693Z tmp26 = tl.load(in_ptr0 + (tmp25 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5360809Z tmp27 = tl.load(in_ptr1 + (tmp25 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5360889Z tmp28 = tmp26 == tmp0 2023-01-11T21:38:06.5360968Z tmp29 = tmp12 < tmp9 2023-01-11T21:38:06.5361049Z tmp30 = tmp24 < tmp11 2023-01-11T21:38:06.5361127Z tmp31 = tmp29 & tmp30 2023-01-11T21:38:06.5361199Z tmp32 = tmp31 & tmp28 2023-01-11T21:38:06.5361276Z tmp33 = tmp23 + tmp27 2023-01-11T21:38:06.5361377Z tmp34 = tl.where(tmp32, tmp33, tmp23) 2023-01-11T21:38:06.5361448Z tmp35 = 2 2023-01-11T21:38:06.5361528Z tmp36 = tmp7 + tmp35 2023-01-11T21:38:06.5361667Z tmp37 = tl.where(tmp36 != tmp36, tmp36, tl.where(tmp36 < tmp17, tmp36, tmp17)) 2023-01-11T21:38:06.5361811Z tmp38 = tl.load(in_ptr0 + (tmp37 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5361920Z tmp39 = tl.load(in_ptr1 + (tmp37 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5362003Z tmp40 = tmp38 == tmp0 2023-01-11T21:38:06.5362080Z tmp41 = 
tmp36 < tmp11 2023-01-11T21:38:06.5362160Z tmp42 = tmp29 & tmp41 2023-01-11T21:38:06.5362241Z tmp43 = tmp42 & tmp40 2023-01-11T21:38:06.5362319Z tmp44 = tmp34 + tmp39 2023-01-11T21:38:06.5362419Z tmp45 = tl.where(tmp43, tmp44, tmp34) 2023-01-11T21:38:06.5362491Z tmp46 = tmp7 + tmp8 2023-01-11T21:38:06.5362627Z tmp47 = tl.where(tmp46 != tmp46, tmp46, tl.where(tmp46 < tmp17, tmp46, tmp17)) 2023-01-11T21:38:06.5362746Z tmp48 = tl.load(in_ptr0 + (tmp47 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5362861Z tmp49 = tl.load(in_ptr1 + (tmp47 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5362938Z tmp50 = tmp48 == tmp0 2023-01-11T21:38:06.5363018Z tmp51 = tmp46 < tmp11 2023-01-11T21:38:06.5363101Z tmp52 = tmp29 & tmp51 2023-01-11T21:38:06.5363173Z tmp53 = tmp52 & tmp50 2023-01-11T21:38:06.5363251Z tmp54 = tmp45 + tmp49 2023-01-11T21:38:06.5363350Z tmp55 = tl.where(tmp53, tmp54, tmp45) 2023-01-11T21:38:06.5363427Z tmp56 = tmp7 + tmp10 2023-01-11T21:38:06.5363569Z tmp57 = tl.where(tmp56 != tmp56, tmp56, tl.where(tmp56 < tmp17, tmp56, tmp17)) 2023-01-11T21:38:06.5363685Z tmp58 = tl.load(in_ptr0 + (tmp57 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5363798Z tmp59 = tl.load(in_ptr1 + (tmp57 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5363881Z tmp60 = tmp58 == tmp0 2023-01-11T21:38:06.5363954Z tmp61 = tmp56 < tmp11 2023-01-11T21:38:06.5364033Z tmp62 = tmp29 & tmp61 2023-01-11T21:38:06.5364111Z tmp63 = tmp62 & tmp60 2023-01-11T21:38:06.5364190Z tmp64 = tmp55 + tmp59 2023-01-11T21:38:06.5364288Z tmp65 = tl.where(tmp63, tmp64, tmp55) 2023-01-11T21:38:06.5364369Z tmp66 = tmp6 + tmp14 2023-01-11T21:38:06.5364501Z tmp67 = tl.where(tmp66 != tmp66, tmp66, tl.where(tmp66 < tmp15, tmp66, tmp15)) 2023-01-11T21:38:06.5364622Z tmp68 = tl.load(in_ptr0 + (tmp18 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5364737Z tmp69 = tl.load(in_ptr1 + (tmp18 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5364817Z tmp70 = tmp68 == tmp0 2023-01-11T21:38:06.5364896Z tmp71 = tmp66 < tmp9 2023-01-11T21:38:06.5365002Z tmp72 = tmp13 < tmp11 2023-01-11T21:38:06.5365082Z tmp73 = tmp71 & tmp72 2023-01-11T21:38:06.5365153Z tmp74 = tmp73 & tmp70 2023-01-11T21:38:06.5365233Z tmp75 = tmp65 + tmp69 2023-01-11T21:38:06.5365336Z tmp76 = tl.where(tmp74, tmp75, tmp65) 2023-01-11T21:38:06.5365451Z tmp77 = tl.load(in_ptr0 + (tmp25 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5365565Z tmp78 = tl.load(in_ptr1 + (tmp25 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5365646Z tmp79 = tmp77 == tmp0 2023-01-11T21:38:06.5365723Z tmp80 = tmp71 & tmp30 2023-01-11T21:38:06.5365800Z tmp81 = tmp80 & tmp79 2023-01-11T21:38:06.5365875Z tmp82 = tmp76 + tmp78 2023-01-11T21:38:06.5365969Z tmp83 = tl.where(tmp81, tmp82, tmp76) 2023-01-11T21:38:06.5366085Z tmp84 = tl.load(in_ptr0 + (tmp37 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5366201Z tmp85 = tl.load(in_ptr1 + (tmp37 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5366282Z tmp86 = tmp84 == tmp0 2023-01-11T21:38:06.5366363Z tmp87 = tmp71 & tmp41 2023-01-11T21:38:06.5366439Z tmp88 = tmp87 & tmp86 2023-01-11T21:38:06.5366511Z tmp89 = tmp83 + tmp85 2023-01-11T21:38:06.5366606Z tmp90 = tl.where(tmp88, tmp89, tmp83) 2023-01-11T21:38:06.5366721Z tmp91 = tl.load(in_ptr0 + (tmp47 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5366837Z tmp92 = tl.load(in_ptr1 + (tmp47 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5366915Z tmp93 = tmp91 == tmp0 2023-01-11T21:38:06.5366993Z tmp94 = tmp71 & tmp51 2023-01-11T21:38:06.5367070Z tmp95 = tmp94 & tmp93 
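# This long tmpN chain is the pooling window fully unrolled: the bounds
# computed from (-2)+x and 3+x span 5 positions per axis, so 25 candidate
# (row, col) pairs are checked. Each position p repeats one branch-free
# step, in pseudocode:
#
#     # match = in_bounds(p) and (indices[p] == my_flat_index)  # e.g. tmp95
#     # acc   = acc + grad[p] if match else acc                 # the tl.where
#
# so each gradient-input element sums exactly the incoming gradients of the
# output cells whose argmax selected it.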
2023-01-11T21:38:06.5367142Z tmp96 = tmp90 + tmp92 2023-01-11T21:38:06.5367278Z tmp97 = tl.where(tmp95, tmp96, tmp90) 2023-01-11T21:38:06.5367397Z tmp98 = tl.load(in_ptr0 + (tmp57 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5367511Z tmp99 = tl.load(in_ptr1 + (tmp57 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5367592Z tmp100 = tmp98 == tmp0 2023-01-11T21:38:06.5367679Z tmp101 = tmp71 & tmp61 2023-01-11T21:38:06.5367757Z tmp102 = tmp101 & tmp100 2023-01-11T21:38:06.5367832Z tmp103 = tmp97 + tmp99 2023-01-11T21:38:06.5367936Z tmp104 = tl.where(tmp102, tmp103, tmp97) 2023-01-11T21:38:06.5368013Z tmp105 = tmp6 + tmp35 2023-01-11T21:38:06.5368155Z tmp106 = tl.where(tmp105 != tmp105, tmp105, tl.where(tmp105 < tmp15, tmp105, tmp15)) 2023-01-11T21:38:06.5368276Z tmp107 = tl.load(in_ptr0 + (tmp18 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5368395Z tmp108 = tl.load(in_ptr1 + (tmp18 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5368477Z tmp109 = tmp107 == tmp0 2023-01-11T21:38:06.5368551Z tmp110 = tmp105 < tmp9 2023-01-11T21:38:06.5368635Z tmp111 = tmp110 & tmp72 2023-01-11T21:38:06.5368717Z tmp112 = tmp111 & tmp109 2023-01-11T21:38:06.5368798Z tmp113 = tmp104 + tmp108 2023-01-11T21:38:06.5368900Z tmp114 = tl.where(tmp112, tmp113, tmp104) 2023-01-11T21:38:06.5369022Z tmp115 = tl.load(in_ptr0 + (tmp25 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5369143Z tmp116 = tl.load(in_ptr1 + (tmp25 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5369217Z tmp117 = tmp115 == tmp0 2023-01-11T21:38:06.5369299Z tmp118 = tmp110 & tmp30 2023-01-11T21:38:06.5369381Z tmp119 = tmp118 & tmp117 2023-01-11T21:38:06.5369463Z tmp120 = tmp114 + tmp116 2023-01-11T21:38:06.5369567Z tmp121 = tl.where(tmp119, tmp120, tmp114) 2023-01-11T21:38:06.5369686Z tmp122 = tl.load(in_ptr0 + (tmp37 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5369804Z tmp123 = tl.load(in_ptr1 + (tmp37 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5369878Z tmp124 = tmp122 == tmp0 2023-01-11T21:38:06.5369958Z tmp125 = tmp110 & tmp41 2023-01-11T21:38:06.5370041Z tmp126 = tmp125 & tmp124 2023-01-11T21:38:06.5370121Z tmp127 = tmp121 + tmp123 2023-01-11T21:38:06.5370223Z tmp128 = tl.where(tmp126, tmp127, tmp121) 2023-01-11T21:38:06.5370340Z tmp129 = tl.load(in_ptr0 + (tmp47 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5370481Z tmp130 = tl.load(in_ptr1 + (tmp47 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5370557Z tmp131 = tmp129 == tmp0 2023-01-11T21:38:06.5370638Z tmp132 = tmp110 & tmp51 2023-01-11T21:38:06.5370719Z tmp133 = tmp132 & tmp131 2023-01-11T21:38:06.5370800Z tmp134 = tmp128 + tmp130 2023-01-11T21:38:06.5370901Z tmp135 = tl.where(tmp133, tmp134, tmp128) 2023-01-11T21:38:06.5371021Z tmp136 = tl.load(in_ptr0 + (tmp57 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5371139Z tmp137 = tl.load(in_ptr1 + (tmp57 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5371213Z tmp138 = tmp136 == tmp0 2023-01-11T21:38:06.5371295Z tmp139 = tmp110 & tmp61 2023-01-11T21:38:06.5371378Z tmp140 = tmp139 & tmp138 2023-01-11T21:38:06.5371458Z tmp141 = tmp135 + tmp137 2023-01-11T21:38:06.5371560Z tmp142 = tl.where(tmp140, tmp141, tmp135) 2023-01-11T21:38:06.5371640Z tmp143 = tmp6 + tmp8 2023-01-11T21:38:06.5371786Z tmp144 = tl.where(tmp143 != tmp143, tmp143, tl.where(tmp143 < tmp15, tmp143, tmp15)) 2023-01-11T21:38:06.5371896Z tmp145 = tl.load(in_ptr0 + (tmp18 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5372011Z tmp146 = tl.load(in_ptr1 + (tmp18 + (4*tmp144) + (12*x2)), xmask) 
2023-01-11T21:38:06.5372092Z tmp147 = tmp145 == tmp0 2023-01-11T21:38:06.5372172Z tmp148 = tmp143 < tmp9 2023-01-11T21:38:06.5372255Z tmp149 = tmp148 & tmp72 2023-01-11T21:38:06.5372336Z tmp150 = tmp149 & tmp147 2023-01-11T21:38:06.5372415Z tmp151 = tmp142 + tmp146 2023-01-11T21:38:06.5372509Z tmp152 = tl.where(tmp150, tmp151, tmp142) 2023-01-11T21:38:06.5372624Z tmp153 = tl.load(in_ptr0 + (tmp25 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5372766Z tmp154 = tl.load(in_ptr1 + (tmp25 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5372847Z tmp155 = tmp153 == tmp0 2023-01-11T21:38:06.5372926Z tmp156 = tmp148 & tmp30 2023-01-11T21:38:06.5373007Z tmp157 = tmp156 & tmp155 2023-01-11T21:38:06.5373089Z tmp158 = tmp152 + tmp154 2023-01-11T21:38:06.5373186Z tmp159 = tl.where(tmp157, tmp158, tmp152) 2023-01-11T21:38:06.5373303Z tmp160 = tl.load(in_ptr0 + (tmp37 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5373418Z tmp161 = tl.load(in_ptr1 + (tmp37 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5373498Z tmp162 = tmp160 == tmp0 2023-01-11T21:38:06.5373578Z tmp163 = tmp148 & tmp41 2023-01-11T21:38:06.5373659Z tmp164 = tmp163 & tmp162 2023-01-11T21:38:06.5373740Z tmp165 = tmp159 + tmp161 2023-01-11T21:38:06.5373835Z tmp166 = tl.where(tmp164, tmp165, tmp159) 2023-01-11T21:38:06.5373952Z tmp167 = tl.load(in_ptr0 + (tmp47 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5374067Z tmp168 = tl.load(in_ptr1 + (tmp47 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5374149Z tmp169 = tmp167 == tmp0 2023-01-11T21:38:06.5374228Z tmp170 = tmp148 & tmp51 2023-01-11T21:38:06.5374309Z tmp171 = tmp170 & tmp169 2023-01-11T21:38:06.5374390Z tmp172 = tmp166 + tmp168 2023-01-11T21:38:06.5374603Z tmp173 = tl.where(tmp171, tmp172, tmp166) 2023-01-11T21:38:06.5374721Z tmp174 = tl.load(in_ptr0 + (tmp57 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5374835Z tmp175 = tl.load(in_ptr1 + (tmp57 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5374916Z tmp176 = tmp174 == tmp0 2023-01-11T21:38:06.5374996Z tmp177 = tmp148 & tmp61 2023-01-11T21:38:06.5375077Z tmp178 = tmp177 & tmp176 2023-01-11T21:38:06.5375158Z tmp179 = tmp173 + tmp175 2023-01-11T21:38:06.5375251Z tmp180 = tl.where(tmp178, tmp179, tmp173) 2023-01-11T21:38:06.5375331Z tmp181 = tmp6 + tmp10 2023-01-11T21:38:06.5375473Z tmp182 = tl.where(tmp181 != tmp181, tmp181, tl.where(tmp181 < tmp15, tmp181, tmp15)) 2023-01-11T21:38:06.5375593Z tmp183 = tl.load(in_ptr0 + (tmp18 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5375708Z tmp184 = tl.load(in_ptr1 + (tmp18 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5375788Z tmp185 = tmp183 == tmp0 2023-01-11T21:38:06.5375909Z tmp186 = tmp181 < tmp9 2023-01-11T21:38:06.5375989Z tmp187 = tmp186 & tmp72 2023-01-11T21:38:06.5376063Z tmp188 = tmp187 & tmp185 2023-01-11T21:38:06.5376145Z tmp189 = tmp180 + tmp184 2023-01-11T21:38:06.5376246Z tmp190 = tl.where(tmp188, tmp189, tmp180) 2023-01-11T21:38:06.5376360Z tmp191 = tl.load(in_ptr0 + (tmp25 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5376476Z tmp192 = tl.load(in_ptr1 + (tmp25 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5376555Z tmp193 = tmp191 == tmp0 2023-01-11T21:38:06.5376637Z tmp194 = tmp186 & tmp30 2023-01-11T21:38:06.5376711Z tmp195 = tmp194 & tmp193 2023-01-11T21:38:06.5376793Z tmp196 = tmp190 + tmp192 2023-01-11T21:38:06.5376892Z tmp197 = tl.where(tmp195, tmp196, tmp190) 2023-01-11T21:38:06.5377008Z tmp198 = tl.load(in_ptr0 + (tmp37 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5377169Z tmp199 = tl.load(in_ptr1 + 
(tmp37 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5377269Z tmp200 = tmp198 == tmp0 2023-01-11T21:38:06.5377362Z tmp201 = tmp186 & tmp41 2023-01-11T21:38:06.5377437Z tmp202 = tmp201 & tmp200 2023-01-11T21:38:06.5377520Z tmp203 = tmp197 + tmp199 2023-01-11T21:38:06.5377620Z tmp204 = tl.where(tmp202, tmp203, tmp197) 2023-01-11T21:38:06.5377736Z tmp205 = tl.load(in_ptr0 + (tmp47 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5377849Z tmp206 = tl.load(in_ptr1 + (tmp47 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5377933Z tmp207 = tmp205 == tmp0 2023-01-11T21:38:06.5378013Z tmp208 = tmp186 & tmp51 2023-01-11T21:38:06.5378087Z tmp209 = tmp208 & tmp207 2023-01-11T21:38:06.5378166Z tmp210 = tmp204 + tmp206 2023-01-11T21:38:06.5378317Z tmp211 = tl.where(tmp209, tmp210, tmp204) 2023-01-11T21:38:06.5378432Z tmp212 = tl.load(in_ptr0 + (tmp57 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5378548Z tmp213 = tl.load(in_ptr1 + (tmp57 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5378627Z tmp214 = tmp212 == tmp0 2023-01-11T21:38:06.5378712Z tmp215 = tmp186 & tmp61 2023-01-11T21:38:06.5378785Z tmp216 = tmp215 & tmp214 2023-01-11T21:38:06.5378866Z tmp217 = tmp211 + tmp213 2023-01-11T21:38:06.5378967Z tmp218 = tl.where(tmp216, tmp217, tmp211) 2023-01-11T21:38:06.5379104Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp218, xmask) 2023-01-11T21:38:06.5379196Z ''') 2023-01-11T21:38:06.5379202Z 2023-01-11T21:38:06.5379207Z 2023-01-11T21:38:06.5379300Z async_compile.wait(globals()) 2023-01-11T21:38:06.5379377Z del async_compile 2023-01-11T21:38:06.5379382Z 2023-01-11T21:38:06.5379456Z def call(args): 2023-01-11T21:38:06.5379535Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5379616Z args.clear() 2023-01-11T21:38:06.5379706Z with torch.cuda.device(0): 2023-01-11T21:38:06.5379928Z buf0 = empty_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5380019Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5380200Z triton_fused_max_pool2d_with_indices_backward_0.run(arg2_1, arg0_1, buf0, 1536, grid=grid(1536), stream=stream0) 2023-01-11T21:38:06.5380272Z del arg0_1 2023-01-11T21:38:06.5380338Z del arg2_1 2023-01-11T21:38:06.5380415Z return (buf0, ) 2023-01-11T21:38:06.5380421Z 2023-01-11T21:38:06.5380425Z 2023-01-11T21:38:06.5380505Z if __name__ == "__main__": 2023-01-11T21:38:06.5380625Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5380752Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5380970Z arg0_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5381189Z arg1_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5381399Z arg2_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5381521Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5381817Z [2023-01-11 21:35:20,323] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 716 2023-01-11T21:38:06.5382235Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5382366Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5382623Z [2023-01-11 21:35:20,346] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 717 2023-01-11T21:38:06.5382631Z 2023-01-11T21:38:06.5382635Z 2023-01-11T21:38:06.5382732Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5382805Z import torch 2023-01-11T21:38:06.5382877Z import random 2023-01-11T21:38:06.5383000Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5383117Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5383130Z 2023-01-11T21:38:06.5383205Z aten = torch.ops.aten 2023-01-11T21:38:06.5383342Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5383436Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5383441Z 2023-01-11T21:38:06.5383514Z import triton 2023-01-11T21:38:06.5383604Z import triton.language as tl 2023-01-11T21:38:06.5383729Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5383869Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5383874Z 2023-01-11T21:38:06.5383905Z 2023-01-11T21:38:06.5384104Z triton_fused_max_pool2d_with_indices_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5384180Z import triton 2023-01-11T21:38:06.5384272Z import triton.language as tl 2023-01-11T21:38:06.5384387Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5384492Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5384623Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5384751Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5384757Z 2023-01-11T21:38:06.5385177Z @pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5385244Z @triton.jit 2023-01-11T21:38:06.5385384Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5385463Z xnumel = 1536 2023-01-11T21:38:06.5385559Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5385710Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5385800Z xmask = xindex < xnumel 2023-01-11T21:38:06.5385891Z x3 = xindex % 12 2023-01-11T21:38:06.5385966Z x1 = (xindex // 4) % 3 2023-01-11T21:38:06.5386042Z x0 = xindex % 4 2023-01-11T21:38:06.5386120Z x2 = (xindex // 12) 2023-01-11T21:38:06.5386190Z x5 = xindex 2023-01-11T21:38:06.5386261Z tmp0 = x3 2023-01-11T21:38:06.5386363Z tmp1 = (-2) + x1 2023-01-11T21:38:06.5386465Z tmp2 = (-2) + x0 2023-01-11T21:38:06.5386532Z tmp3 = 3 + x1 2023-01-11T21:38:06.5386603Z tmp4 = 3 + x0 2023-01-11T21:38:06.5386673Z tmp5 = 0 2023-01-11T21:38:06.5386811Z tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp5, tmp1, tmp5)) 2023-01-11T21:38:06.5386948Z tmp7 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp5, tmp2, tmp5)) 2023-01-11T21:38:06.5387022Z tmp8 = 3 2023-01-11T21:38:06.5387156Z tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp8, tmp3, tmp8)) 2023-01-11T21:38:06.5387220Z tmp10 = 4 2023-01-11T21:38:06.5387357Z tmp11 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp10, tmp4, tmp10)) 2023-01-11T21:38:06.5387433Z tmp12 = tmp6 + tmp5 
2023-01-11T21:38:06.5387508Z tmp13 = tmp7 + tmp5 2023-01-11T21:38:06.5387607Z tmp14 = 1 2023-01-11T21:38:06.5387722Z tmp15 = tmp9 - tmp14 2023-01-11T21:38:06.5387867Z tmp16 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp15, tmp12, tmp15)) 2023-01-11T21:38:06.5387973Z tmp17 = tmp11 - tmp14 2023-01-11T21:38:06.5388115Z tmp18 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 < tmp17, tmp13, tmp17)) 2023-01-11T21:38:06.5388235Z tmp19 = tl.load(in_ptr0 + (tmp18 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5388370Z tmp20 = tl.load(in_ptr1 + (tmp18 + (4*tmp16) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5388449Z tmp21 = tmp19 == tmp0 2023-01-11T21:38:06.5388524Z tmp22 = 0.0 2023-01-11T21:38:06.5388624Z tmp23 = tl.where(tmp21, tmp20, tmp22) 2023-01-11T21:38:06.5388697Z tmp24 = tmp7 + tmp14 2023-01-11T21:38:06.5388835Z tmp25 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 < tmp17, tmp24, tmp17)) 2023-01-11T21:38:06.5388953Z tmp26 = tl.load(in_ptr0 + (tmp25 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5389087Z tmp27 = tl.load(in_ptr1 + (tmp25 + (4*tmp16) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5389167Z tmp28 = tmp26 == tmp0 2023-01-11T21:38:06.5389247Z tmp29 = tmp12 < tmp9 2023-01-11T21:38:06.5389324Z tmp30 = tmp24 < tmp11 2023-01-11T21:38:06.5389396Z tmp31 = tmp29 & tmp30 2023-01-11T21:38:06.5389477Z tmp32 = tmp31 & tmp28 2023-01-11T21:38:06.5389554Z tmp33 = tmp23 + tmp27 2023-01-11T21:38:06.5389652Z tmp34 = tl.where(tmp32, tmp33, tmp23) 2023-01-11T21:38:06.5389721Z tmp35 = 2 2023-01-11T21:38:06.5389800Z tmp36 = tmp7 + tmp35 2023-01-11T21:38:06.5389935Z tmp37 = tl.where(tmp36 != tmp36, tmp36, tl.where(tmp36 < tmp17, tmp36, tmp17)) 2023-01-11T21:38:06.5390073Z tmp38 = tl.load(in_ptr0 + (tmp37 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5390205Z tmp39 = tl.load(in_ptr1 + (tmp37 + (4*tmp16) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5390288Z tmp40 = tmp38 == tmp0 2023-01-11T21:38:06.5390370Z tmp41 = tmp36 < tmp11 2023-01-11T21:38:06.5390449Z tmp42 = tmp29 & tmp41 2023-01-11T21:38:06.5390531Z tmp43 = tmp42 & tmp40 2023-01-11T21:38:06.5390608Z tmp44 = tmp34 + tmp39 2023-01-11T21:38:06.5390700Z tmp45 = tl.where(tmp43, tmp44, tmp34) 2023-01-11T21:38:06.5390779Z tmp46 = tmp7 + tmp8 2023-01-11T21:38:06.5390915Z tmp47 = tl.where(tmp46 != tmp46, tmp46, tl.where(tmp46 < tmp17, tmp46, tmp17)) 2023-01-11T21:38:06.5391034Z tmp48 = tl.load(in_ptr0 + (tmp47 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5391164Z tmp49 = tl.load(in_ptr1 + (tmp47 + (4*tmp16) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5391241Z tmp50 = tmp48 == tmp0 2023-01-11T21:38:06.5391325Z tmp51 = tmp46 < tmp11 2023-01-11T21:38:06.5391403Z tmp52 = tmp29 & tmp51 2023-01-11T21:38:06.5391475Z tmp53 = tmp52 & tmp50 2023-01-11T21:38:06.5391551Z tmp54 = tmp45 + tmp49 2023-01-11T21:38:06.5391648Z tmp55 = tl.where(tmp53, tmp54, tmp45) 2023-01-11T21:38:06.5391725Z tmp56 = tmp7 + tmp10 2023-01-11T21:38:06.5391865Z tmp57 = tl.where(tmp56 != tmp56, tmp56, tl.where(tmp56 < tmp17, tmp56, tmp17)) 2023-01-11T21:38:06.5391983Z tmp58 = tl.load(in_ptr0 + (tmp57 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5392113Z tmp59 = tl.load(in_ptr1 + (tmp57 + (4*tmp16) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5392185Z tmp60 = tmp58 == tmp0 2023-01-11T21:38:06.5392264Z tmp61 = tmp56 < tmp11 2023-01-11T21:38:06.5392342Z tmp62 = tmp29 & tmp61 2023-01-11T21:38:06.5392419Z tmp63 = tmp62 & tmp60 2023-01-11T21:38:06.5392499Z tmp64 = tmp55 + tmp59 2023-01-11T21:38:06.5392597Z tmp65 = 
tl.where(tmp63, tmp64, tmp55) 2023-01-11T21:38:06.5392678Z tmp66 = tmp6 + tmp14 2023-01-11T21:38:06.5392811Z tmp67 = tl.where(tmp66 != tmp66, tmp66, tl.where(tmp66 < tmp15, tmp66, tmp15)) 2023-01-11T21:38:06.5392928Z tmp68 = tl.load(in_ptr0 + (tmp18 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5393086Z tmp69 = tl.load(in_ptr1 + (tmp18 + (4*tmp67) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5393163Z tmp70 = tmp68 == tmp0 2023-01-11T21:38:06.5393242Z tmp71 = tmp66 < tmp9 2023-01-11T21:38:06.5393321Z tmp72 = tmp13 < tmp11 2023-01-11T21:38:06.5393398Z tmp73 = tmp71 & tmp72 2023-01-11T21:38:06.5393468Z tmp74 = tmp73 & tmp70 2023-01-11T21:38:06.5393544Z tmp75 = tmp65 + tmp69 2023-01-11T21:38:06.5393642Z tmp76 = tl.where(tmp74, tmp75, tmp65) 2023-01-11T21:38:06.5393761Z tmp77 = tl.load(in_ptr0 + (tmp25 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5393891Z tmp78 = tl.load(in_ptr1 + (tmp25 + (4*tmp67) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5393971Z tmp79 = tmp77 == tmp0 2023-01-11T21:38:06.5394052Z tmp80 = tmp71 & tmp30 2023-01-11T21:38:06.5394124Z tmp81 = tmp80 & tmp79 2023-01-11T21:38:06.5394200Z tmp82 = tmp76 + tmp78 2023-01-11T21:38:06.5394295Z tmp83 = tl.where(tmp81, tmp82, tmp76) 2023-01-11T21:38:06.5394413Z tmp84 = tl.load(in_ptr0 + (tmp37 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5394545Z tmp85 = tl.load(in_ptr1 + (tmp37 + (4*tmp67) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5394625Z tmp86 = tmp84 == tmp0 2023-01-11T21:38:06.5394704Z tmp87 = tmp71 & tmp41 2023-01-11T21:38:06.5394775Z tmp88 = tmp87 & tmp86 2023-01-11T21:38:06.5394854Z tmp89 = tmp83 + tmp85 2023-01-11T21:38:06.5394951Z tmp90 = tl.where(tmp88, tmp89, tmp83) 2023-01-11T21:38:06.5395068Z tmp91 = tl.load(in_ptr0 + (tmp47 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5395200Z tmp92 = tl.load(in_ptr1 + (tmp47 + (4*tmp67) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5395285Z tmp93 = tmp91 == tmp0 2023-01-11T21:38:06.5395411Z tmp94 = tmp71 & tmp51 2023-01-11T21:38:06.5395491Z tmp95 = tmp94 & tmp93 2023-01-11T21:38:06.5395585Z tmp96 = tmp90 + tmp92 2023-01-11T21:38:06.5395681Z tmp97 = tl.where(tmp95, tmp96, tmp90) 2023-01-11T21:38:06.5395801Z tmp98 = tl.load(in_ptr0 + (tmp57 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5395933Z tmp99 = tl.load(in_ptr1 + (tmp57 + (4*tmp67) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5396014Z tmp100 = tmp98 == tmp0 2023-01-11T21:38:06.5396096Z tmp101 = tmp71 & tmp61 2023-01-11T21:38:06.5396172Z tmp102 = tmp101 & tmp100 2023-01-11T21:38:06.5396254Z tmp103 = tmp97 + tmp99 2023-01-11T21:38:06.5396356Z tmp104 = tl.where(tmp102, tmp103, tmp97) 2023-01-11T21:38:06.5396434Z tmp105 = tmp6 + tmp35 2023-01-11T21:38:06.5396578Z tmp106 = tl.where(tmp105 != tmp105, tmp105, tl.where(tmp105 < tmp15, tmp105, tmp15)) 2023-01-11T21:38:06.5396700Z tmp107 = tl.load(in_ptr0 + (tmp18 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5396836Z tmp108 = tl.load(in_ptr1 + (tmp18 + (4*tmp106) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5396911Z tmp109 = tmp107 == tmp0 2023-01-11T21:38:06.5396992Z tmp110 = tmp105 < tmp9 2023-01-11T21:38:06.5397074Z tmp111 = tmp110 & tmp72 2023-01-11T21:38:06.5397156Z tmp112 = tmp111 & tmp109 2023-01-11T21:38:06.5397236Z tmp113 = tmp104 + tmp108 2023-01-11T21:38:06.5397339Z tmp114 = tl.where(tmp112, tmp113, tmp104) 2023-01-11T21:38:06.5397460Z tmp115 = tl.load(in_ptr0 + (tmp25 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5397585Z tmp116 = tl.load(in_ptr1 + (tmp25 + (4*tmp106) + (12*x2)), 
xmask).to(tl.float32) 2023-01-11T21:38:06.5397666Z tmp117 = tmp115 == tmp0 2023-01-11T21:38:06.5397746Z tmp118 = tmp110 & tmp30 2023-01-11T21:38:06.5397827Z tmp119 = tmp118 & tmp117 2023-01-11T21:38:06.5397908Z tmp120 = tmp114 + tmp116 2023-01-11T21:38:06.5398010Z tmp121 = tl.where(tmp119, tmp120, tmp114) 2023-01-11T21:38:06.5398130Z tmp122 = tl.load(in_ptr0 + (tmp37 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5398266Z tmp123 = tl.load(in_ptr1 + (tmp37 + (4*tmp106) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5398341Z tmp124 = tmp122 == tmp0 2023-01-11T21:38:06.5398420Z tmp125 = tmp110 & tmp41 2023-01-11T21:38:06.5398501Z tmp126 = tmp125 & tmp124 2023-01-11T21:38:06.5398609Z tmp127 = tmp121 + tmp123 2023-01-11T21:38:06.5398712Z tmp128 = tl.where(tmp126, tmp127, tmp121) 2023-01-11T21:38:06.5398831Z tmp129 = tl.load(in_ptr0 + (tmp47 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5398959Z tmp130 = tl.load(in_ptr1 + (tmp47 + (4*tmp106) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5399033Z tmp131 = tmp129 == tmp0 2023-01-11T21:38:06.5399112Z tmp132 = tmp110 & tmp51 2023-01-11T21:38:06.5399190Z tmp133 = tmp132 & tmp131 2023-01-11T21:38:06.5399270Z tmp134 = tmp128 + tmp130 2023-01-11T21:38:06.5399371Z tmp135 = tl.where(tmp133, tmp134, tmp128) 2023-01-11T21:38:06.5399490Z tmp136 = tl.load(in_ptr0 + (tmp57 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5399625Z tmp137 = tl.load(in_ptr1 + (tmp57 + (4*tmp106) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5399699Z tmp138 = tmp136 == tmp0 2023-01-11T21:38:06.5399777Z tmp139 = tmp110 & tmp61 2023-01-11T21:38:06.5399861Z tmp140 = tmp139 & tmp138 2023-01-11T21:38:06.5399944Z tmp141 = tmp135 + tmp137 2023-01-11T21:38:06.5400044Z tmp142 = tl.where(tmp140, tmp141, tmp135) 2023-01-11T21:38:06.5400124Z tmp143 = tmp6 + tmp8 2023-01-11T21:38:06.5400268Z tmp144 = tl.where(tmp143 != tmp143, tmp143, tl.where(tmp143 < tmp15, tmp143, tmp15)) 2023-01-11T21:38:06.5400381Z tmp145 = tl.load(in_ptr0 + (tmp18 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5400509Z tmp146 = tl.load(in_ptr1 + (tmp18 + (4*tmp144) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5400589Z tmp147 = tmp145 == tmp0 2023-01-11T21:38:06.5400667Z tmp148 = tmp143 < tmp9 2023-01-11T21:38:06.5400748Z tmp149 = tmp148 & tmp72 2023-01-11T21:38:06.5400856Z tmp150 = tmp149 & tmp147 2023-01-11T21:38:06.5400935Z tmp151 = tmp142 + tmp146 2023-01-11T21:38:06.5401030Z tmp152 = tl.where(tmp150, tmp151, tmp142) 2023-01-11T21:38:06.5401147Z tmp153 = tl.load(in_ptr0 + (tmp25 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5401276Z tmp154 = tl.load(in_ptr1 + (tmp25 + (4*tmp144) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5401353Z tmp155 = tmp153 == tmp0 2023-01-11T21:38:06.5401435Z tmp156 = tmp148 & tmp30 2023-01-11T21:38:06.5401516Z tmp157 = tmp156 & tmp155 2023-01-11T21:38:06.5401598Z tmp158 = tmp152 + tmp154 2023-01-11T21:38:06.5401692Z tmp159 = tl.where(tmp157, tmp158, tmp152) 2023-01-11T21:38:06.5401809Z tmp160 = tl.load(in_ptr0 + (tmp37 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5401938Z tmp161 = tl.load(in_ptr1 + (tmp37 + (4*tmp144) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5402019Z tmp162 = tmp160 == tmp0 2023-01-11T21:38:06.5402105Z tmp163 = tmp148 & tmp41 2023-01-11T21:38:06.5402189Z tmp164 = tmp163 & tmp162 2023-01-11T21:38:06.5402269Z tmp165 = tmp159 + tmp161 2023-01-11T21:38:06.5402363Z tmp166 = tl.where(tmp164, tmp165, tmp159) 2023-01-11T21:38:06.5402480Z tmp167 = tl.load(in_ptr0 + (tmp47 + (4*tmp144) + (12*x2)), 
xmask) 2023-01-11T21:38:06.5402611Z tmp168 = tl.load(in_ptr1 + (tmp47 + (4*tmp144) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5402693Z tmp169 = tmp167 == tmp0 2023-01-11T21:38:06.5402773Z tmp170 = tmp148 & tmp51 2023-01-11T21:38:06.5402853Z tmp171 = tmp170 & tmp169 2023-01-11T21:38:06.5402931Z tmp172 = tmp166 + tmp168 2023-01-11T21:38:06.5403025Z tmp173 = tl.where(tmp171, tmp172, tmp166) 2023-01-11T21:38:06.5403142Z tmp174 = tl.load(in_ptr0 + (tmp57 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5403271Z tmp175 = tl.load(in_ptr1 + (tmp57 + (4*tmp144) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5403352Z tmp176 = tmp174 == tmp0 2023-01-11T21:38:06.5403434Z tmp177 = tmp148 & tmp61 2023-01-11T21:38:06.5403516Z tmp178 = tmp177 & tmp176 2023-01-11T21:38:06.5403596Z tmp179 = tmp173 + tmp175 2023-01-11T21:38:06.5403691Z tmp180 = tl.where(tmp178, tmp179, tmp173) 2023-01-11T21:38:06.5403772Z tmp181 = tmp6 + tmp10 2023-01-11T21:38:06.5403943Z tmp182 = tl.where(tmp181 != tmp181, tmp181, tl.where(tmp181 < tmp15, tmp181, tmp15)) 2023-01-11T21:38:06.5404062Z tmp183 = tl.load(in_ptr0 + (tmp18 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5404193Z tmp184 = tl.load(in_ptr1 + (tmp18 + (4*tmp182) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5404274Z tmp185 = tmp183 == tmp0 2023-01-11T21:38:06.5404355Z tmp186 = tmp181 < tmp9 2023-01-11T21:38:06.5404435Z tmp187 = tmp186 & tmp72 2023-01-11T21:38:06.5404510Z tmp188 = tmp187 & tmp185 2023-01-11T21:38:06.5404591Z tmp189 = tmp180 + tmp184 2023-01-11T21:38:06.5404690Z tmp190 = tl.where(tmp188, tmp189, tmp180) 2023-01-11T21:38:06.5404807Z tmp191 = tl.load(in_ptr0 + (tmp25 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5404941Z tmp192 = tl.load(in_ptr1 + (tmp25 + (4*tmp182) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5405020Z tmp193 = tmp191 == tmp0 2023-01-11T21:38:06.5405100Z tmp194 = tmp186 & tmp30 2023-01-11T21:38:06.5405174Z tmp195 = tmp194 & tmp193 2023-01-11T21:38:06.5405257Z tmp196 = tmp190 + tmp192 2023-01-11T21:38:06.5405358Z tmp197 = tl.where(tmp195, tmp196, tmp190) 2023-01-11T21:38:06.5405473Z tmp198 = tl.load(in_ptr0 + (tmp37 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5405603Z tmp199 = tl.load(in_ptr1 + (tmp37 + (4*tmp182) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5405686Z tmp200 = tmp198 == tmp0 2023-01-11T21:38:06.5405767Z tmp201 = tmp186 & tmp41 2023-01-11T21:38:06.5405841Z tmp202 = tmp201 & tmp200 2023-01-11T21:38:06.5405919Z tmp203 = tmp197 + tmp199 2023-01-11T21:38:06.5406020Z tmp204 = tl.where(tmp202, tmp203, tmp197) 2023-01-11T21:38:06.5406140Z tmp205 = tl.load(in_ptr0 + (tmp47 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5406305Z tmp206 = tl.load(in_ptr1 + (tmp47 + (4*tmp182) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5406382Z tmp207 = tmp205 == tmp0 2023-01-11T21:38:06.5406461Z tmp208 = tmp186 & tmp51 2023-01-11T21:38:06.5406535Z tmp209 = tmp208 & tmp207 2023-01-11T21:38:06.5406617Z tmp210 = tmp204 + tmp206 2023-01-11T21:38:06.5406718Z tmp211 = tl.where(tmp209, tmp210, tmp204) 2023-01-11T21:38:06.5406835Z tmp212 = tl.load(in_ptr0 + (tmp57 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5406963Z tmp213 = tl.load(in_ptr1 + (tmp57 + (4*tmp182) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5407044Z tmp214 = tmp212 == tmp0 2023-01-11T21:38:06.5407129Z tmp215 = tmp186 & tmp61 2023-01-11T21:38:06.5407203Z tmp216 = tmp215 & tmp214 2023-01-11T21:38:06.5407286Z tmp217 = tmp211 + tmp213 2023-01-11T21:38:06.5407388Z tmp218 = tl.where(tmp216, tmp217, 
tmp211) 2023-01-11T21:38:06.5407529Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp218, xmask) 2023-01-11T21:38:06.5407624Z ''') 2023-01-11T21:38:06.5407630Z 2023-01-11T21:38:06.5407634Z 2023-01-11T21:38:06.5407729Z async_compile.wait(globals()) 2023-01-11T21:38:06.5407807Z del async_compile 2023-01-11T21:38:06.5407812Z 2023-01-11T21:38:06.5407887Z def call(args): 2023-01-11T21:38:06.5407970Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5408046Z args.clear() 2023-01-11T21:38:06.5408140Z with torch.cuda.device(0): 2023-01-11T21:38:06.5408363Z buf0 = empty_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5408455Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5408630Z triton_fused_max_pool2d_with_indices_backward_0.run(arg2_1, arg0_1, buf0, 1536, grid=grid(1536), stream=stream0) 2023-01-11T21:38:06.5408701Z del arg0_1 2023-01-11T21:38:06.5408766Z del arg2_1 2023-01-11T21:38:06.5408844Z return (buf0, ) 2023-01-11T21:38:06.5408852Z 2023-01-11T21:38:06.5408857Z 2023-01-11T21:38:06.5408937Z if __name__ == "__main__": 2023-01-11T21:38:06.5409054Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5409185Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5409433Z arg0_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5409650Z arg1_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5409864Z arg2_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5409986Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5410252Z [2023-01-11 21:35:21,222] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 717 2023-01-11T21:38:06.5410258Z 2023-01-11T21:38:06.5410331Z ok (1.801s) 2023-01-11T21:38:06.5410806Z test_max_pool2d_with_indices_backward5_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5410942Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5411198Z [2023-01-11 21:35:21,247] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 718 2023-01-11T21:38:06.5411458Z [2023-01-11 21:35:21,261] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.max_pool2d_with_indices_backward 2023-01-11T21:38:06.5411716Z [2023-01-11 21:35:21,264] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 718 2023-01-11T21:38:06.5412131Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5412298Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5412554Z [2023-01-11 21:35:21,285] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 719 2023-01-11T21:38:06.5412806Z [2023-01-11 21:35:21,299] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.max_pool2d_with_indices_backward 2023-01-11T21:38:06.5413071Z [2023-01-11 21:35:21,302] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 719 2023-01-11T21:38:06.5413077Z 2023-01-11T21:38:06.5413173Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5413246Z import torch 2023-01-11T21:38:06.5413320Z import random 2023-01-11T21:38:06.5413438Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5413563Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5413568Z 2023-01-11T21:38:06.5413652Z aten = torch.ops.aten 2023-01-11T21:38:06.5413782Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5413877Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5413885Z 2023-01-11T21:38:06.5413960Z import triton 2023-01-11T21:38:06.5414052Z import triton.language as tl 2023-01-11T21:38:06.5414176Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5414317Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5414323Z 2023-01-11T21:38:06.5414327Z 2023-01-11T21:38:06.5414419Z async_compile.wait(globals()) 2023-01-11T21:38:06.5414614Z del async_compile 2023-01-11T21:38:06.5414620Z 2023-01-11T21:38:06.5414689Z def call(args): 2023-01-11T21:38:06.5414776Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5414852Z args.clear() 2023-01-11T21:38:06.5414948Z with torch.cuda.device(0): 2023-01-11T21:38:06.5415110Z buf0 = aten.max_pool2d_with_indices_backward(arg0_1, arg1_1, [13, 13], [1, 1], [2, 2], [1, 1], False, arg2_1) 2023-01-11T21:38:06.5415183Z del arg0_1 2023-01-11T21:38:06.5415255Z del arg1_1 2023-01-11T21:38:06.5415319Z del arg2_1 2023-01-11T21:38:06.5415436Z buf1 = buf0 2023-01-11T21:38:06.5415557Z assert_size_stride(buf1, (2, 64, 20, 20), (25600, 400, 20, 1)) 2023-01-11T21:38:06.5415630Z del buf0 2023-01-11T21:38:06.5415708Z return (buf1, ) 2023-01-11T21:38:06.5415715Z 2023-01-11T21:38:06.5415720Z 2023-01-11T21:38:06.5415824Z if __name__ == "__main__": 2023-01-11T21:38:06.5415955Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5416094Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5416318Z arg0_1 = rand_strided((2, 64, 12, 12), (9216, 144, 12, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5416546Z arg1_1 = rand_strided((2, 64, 20, 20), (25600, 400, 20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5416767Z arg2_1 = rand_strided((2, 64, 12, 12), (9216, 144, 12, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5416895Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5416900Z 2023-01-11T21:38:06.5416906Z 2023-01-11T21:38:06.5417004Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5417077Z import torch 2023-01-11T21:38:06.5417213Z import random 2023-01-11T21:38:06.5417339Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5417456Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5417461Z 2023-01-11T21:38:06.5417547Z aten = 
torch.ops.aten 2023-01-11T21:38:06.5417683Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5417779Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5417784Z 2023-01-11T21:38:06.5417857Z import triton 2023-01-11T21:38:06.5418032Z import triton.language as tl 2023-01-11T21:38:06.5418157Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5418289Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5418299Z 2023-01-11T21:38:06.5418304Z 2023-01-11T21:38:06.5418389Z async_compile.wait(globals()) 2023-01-11T21:38:06.5418468Z del async_compile 2023-01-11T21:38:06.5418473Z 2023-01-11T21:38:06.5418550Z def call(args): 2023-01-11T21:38:06.5418638Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5418714Z args.clear() 2023-01-11T21:38:06.5418806Z with torch.cuda.device(0): 2023-01-11T21:38:06.5418967Z buf0 = aten.max_pool2d_with_indices_backward(arg0_1, arg1_1, [13, 13], [1, 1], [2, 2], [1, 1], False, arg2_1) 2023-01-11T21:38:06.5419034Z del arg0_1 2023-01-11T21:38:06.5419107Z del arg1_1 2023-01-11T21:38:06.5419178Z del arg2_1 2023-01-11T21:38:06.5419250Z buf1 = buf0 2023-01-11T21:38:06.5419366Z assert_size_stride(buf1, (2, 64, 20, 20), (25600, 400, 20, 1)) 2023-01-11T21:38:06.5419441Z del buf0 2023-01-11T21:38:06.5419516Z return (buf1, ) 2023-01-11T21:38:06.5419521Z 2023-01-11T21:38:06.5419525Z 2023-01-11T21:38:06.5419606Z if __name__ == "__main__": 2023-01-11T21:38:06.5419719Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5419846Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5420073Z arg0_1 = rand_strided((2, 64, 12, 12), (9216, 144, 12, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5420293Z arg1_1 = rand_strided((2, 64, 20, 20), (25600, 400, 20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5420509Z arg2_1 = rand_strided((2, 64, 12, 12), (9216, 144, 12, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5420635Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5420641Z 2023-01-11T21:38:06.5420711Z ok (0.080s) 2023-01-11T21:38:06.5421217Z test_max_pool2d_with_indices_backward_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5421352Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5421603Z [2023-01-11 21:35:21,325] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 720 2023-01-11T21:38:06.5421864Z [2023-01-11 21:35:21,462] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 720 2023-01-11T21:38:06.5422277Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5422409Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5422669Z [2023-01-11 21:35:21,484] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 721 2023-01-11T21:38:06.5422932Z [2023-01-11 21:35:21,609] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 721 2023-01-11T21:38:06.5422937Z 2023-01-11T21:38:06.5423036Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5423110Z import torch 2023-01-11T21:38:06.5423184Z import random 2023-01-11T21:38:06.5423297Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5423424Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5423430Z 2023-01-11T21:38:06.5423510Z aten = torch.ops.aten 2023-01-11T21:38:06.5423646Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5423764Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5423769Z 2023-01-11T21:38:06.5423840Z import triton 2023-01-11T21:38:06.5423931Z import triton.language as tl 2023-01-11T21:38:06.5424058Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5424193Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5424198Z 2023-01-11T21:38:06.5424207Z 2023-01-11T21:38:06.5424404Z triton_fused_max_pool2d_with_indices_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5424477Z import triton 2023-01-11T21:38:06.5424567Z import triton.language as tl 2023-01-11T21:38:06.5424681Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5424782Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5424917Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5425042Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5425047Z 2023-01-11T21:38:06.5425463Z @pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5425536Z @triton.jit 2023-01-11T21:38:06.5425679Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5425753Z xnumel = 2016 2023-01-11T21:38:06.5425850Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5425979Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5426061Z xmask = xindex < xnumel 2023-01-11T21:38:06.5426138Z x3 = xindex % 252 2023-01-11T21:38:06.5426214Z x1 = (xindex // 14) % 18 2023-01-11T21:38:06.5426287Z x0 = xindex % 14 2023-01-11T21:38:06.5426365Z x2 = (xindex // 252) 2023-01-11T21:38:06.5426432Z x5 = xindex 2023-01-11T21:38:06.5426502Z tmp0 = x3 2023-01-11T21:38:06.5426577Z tmp1 = (x1 // 2) 2023-01-11T21:38:06.5426647Z tmp2 = (x0 // 2) 2023-01-11T21:38:06.5426722Z tmp3 = 1 + (x1 // 2) 2023-01-11T21:38:06.5426798Z tmp4 = 1 + (x0 // 2) 2023-01-11T21:38:06.5426866Z tmp5 = 0 2023-01-11T21:38:06.5427002Z tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp5, tmp1, tmp5)) 2023-01-11T21:38:06.5427163Z tmp7 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp5, tmp2, tmp5)) 2023-01-11T21:38:06.5427232Z tmp8 = 9 2023-01-11T21:38:06.5427356Z tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp8, tmp3, tmp8)) 2023-01-11T21:38:06.5427427Z tmp10 = 7 2023-01-11T21:38:06.5427564Z 
tmp11 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp10, tmp4, tmp10)) 2023-01-11T21:38:06.5427643Z tmp12 = tmp6 + tmp5 2023-01-11T21:38:06.5427719Z tmp13 = tmp7 + tmp5 2023-01-11T21:38:06.5427790Z tmp14 = 1 2023-01-11T21:38:06.5427902Z tmp15 = tmp9 - tmp14 2023-01-11T21:38:06.5428039Z tmp16 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp15, tmp12, tmp15)) 2023-01-11T21:38:06.5428155Z tmp17 = tmp11 - tmp14 2023-01-11T21:38:06.5428294Z tmp18 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 < tmp17, tmp13, tmp17)) 2023-01-11T21:38:06.5428415Z tmp19 = tl.load(in_ptr0 + (tmp18 + (7*tmp16) + (63*x2)), xmask) 2023-01-11T21:38:06.5428531Z tmp20 = tl.load(in_ptr1 + (tmp18 + (7*tmp16) + (63*x2)), xmask) 2023-01-11T21:38:06.5428613Z tmp21 = tmp19 == tmp0 2023-01-11T21:38:06.5428685Z tmp22 = 0.0 2023-01-11T21:38:06.5428778Z tmp23 = tl.where(tmp21, tmp20, tmp22) 2023-01-11T21:38:06.5428912Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp23, xmask) 2023-01-11T21:38:06.5428995Z ''') 2023-01-11T21:38:06.5429001Z 2023-01-11T21:38:06.5429005Z 2023-01-11T21:38:06.5429099Z async_compile.wait(globals()) 2023-01-11T21:38:06.5429175Z del async_compile 2023-01-11T21:38:06.5429180Z 2023-01-11T21:38:06.5429254Z def call(args): 2023-01-11T21:38:06.5429338Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5429414Z args.clear() 2023-01-11T21:38:06.5429528Z with torch.cuda.device(0): 2023-01-11T21:38:06.5429753Z buf0 = empty_strided((2, 4, 18, 14), (1008, 252, 14, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5429845Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5430030Z triton_fused_max_pool2d_with_indices_backward_0.run(arg2_1, arg0_1, buf0, 2016, grid=grid(2016), stream=stream0) 2023-01-11T21:38:06.5430102Z del arg0_1 2023-01-11T21:38:06.5430175Z del arg2_1 2023-01-11T21:38:06.5430257Z return (buf0, ) 2023-01-11T21:38:06.5430262Z 2023-01-11T21:38:06.5430267Z 2023-01-11T21:38:06.5430344Z if __name__ == "__main__": 2023-01-11T21:38:06.5430454Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5430579Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5430796Z arg0_1 = rand_strided((2, 4, 9, 7), (252, 63, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5431016Z arg1_1 = rand_strided((2, 4, 18, 14), (1008, 252, 14, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5431233Z arg2_1 = rand_strided((2, 4, 9, 7), (252, 63, 7, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5431357Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5431363Z 2023-01-11T21:38:06.5431367Z 2023-01-11T21:38:06.5431467Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5431540Z import torch 2023-01-11T21:38:06.5431606Z import random 2023-01-11T21:38:06.5431724Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5431847Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5431852Z 2023-01-11T21:38:06.5431933Z aten = torch.ops.aten 2023-01-11T21:38:06.5432072Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5432167Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5432172Z 2023-01-11T21:38:06.5432244Z import triton 2023-01-11T21:38:06.5432330Z import triton.language as tl 2023-01-11T21:38:06.5432455Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5432594Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5432599Z 2023-01-11T21:38:06.5432603Z 
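[Annotation] The two code dumps around this point are the fp32 and fp16 runs of test_max_pool2d_with_indices_backward_cuda (graphs 720 and 721 above). The fp16 kernel that follows is identical to the fp32 one above except that its gradient pointers are typed '*fp16' and the gradient load gains .to(tl.float32), so the index compare-and-select runs in fp32 before the result is stored back through the half-precision output pointer. Below is a minimal sketch of one way to reach this lowering from Python; the shapes are taken from the rand_strided calls in this log, but the helper name bwd and the use of torch.compile are assumptions — this is not how the test suite drives it:

    # Hypothetical repro sketch; not part of test_torchinductor.py.
    import torch
    import torch.nn.functional as F

    def bwd(grad_out, inp, indices):
        # The op Inductor lowers to the pointwise Triton kernel dumped here.
        # Signature: (grad_output, self, kernel_size, stride, padding,
        #             dilation, ceil_mode, indices)
        return torch.ops.aten.max_pool2d_with_indices_backward(
            grad_out, inp, [2, 2], [2, 2], [0, 0], [1, 1], False, indices
        )

    inp = torch.randn(2, 4, 18, 14, device="cuda")           # float32 run
    _out, idx = F.max_pool2d(inp, 2, return_indices=True)    # idx: (2, 4, 9, 7), int64
    grad_out = torch.randn_like(_out)
    compiled = torch.compile(bwd)                            # Inductor is the default backend
    grad_in = compiled(grad_out, inp, idx)                   # (2, 4, 18, 14), as in buf0 above

Repeating the call with grad_out and inp cast to .half() should exercise the fp16 variant below, whose only difference is the '*fp16' signature entries and the .to(tl.float32) upcast on the gradient load. [End annotation]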
2023-01-11T21:38:06.5432803Z triton_fused_max_pool2d_with_indices_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5432876Z import triton 2023-01-11T21:38:06.5432997Z import triton.language as tl 2023-01-11T21:38:06.5433113Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5433212Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5433337Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5433462Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5433467Z 2023-01-11T21:38:06.5433884Z @pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5433960Z @triton.jit 2023-01-11T21:38:06.5434102Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5434179Z xnumel = 2016 2023-01-11T21:38:06.5434275Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5434404Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5434480Z xmask = xindex < xnumel 2023-01-11T21:38:06.5434555Z x3 = xindex % 252 2023-01-11T21:38:06.5434637Z x1 = (xindex // 14) % 18 2023-01-11T21:38:06.5434711Z x0 = xindex % 14 2023-01-11T21:38:06.5434788Z x2 = (xindex // 252) 2023-01-11T21:38:06.5434860Z x5 = xindex 2023-01-11T21:38:06.5434933Z tmp0 = x3 2023-01-11T21:38:06.5435001Z tmp1 = (x1 // 2) 2023-01-11T21:38:06.5435075Z tmp2 = (x0 // 2) 2023-01-11T21:38:06.5435156Z tmp3 = 1 + (x1 // 2) 2023-01-11T21:38:06.5435232Z tmp4 = 1 + (x0 // 2) 2023-01-11T21:38:06.5435311Z tmp5 = 0 2023-01-11T21:38:06.5435465Z tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp5, tmp1, tmp5)) 2023-01-11T21:38:06.5435653Z tmp7 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp5, tmp2, tmp5)) 2023-01-11T21:38:06.5435717Z tmp8 = 9 2023-01-11T21:38:06.5435846Z tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp8, tmp3, tmp8)) 2023-01-11T21:38:06.5435917Z tmp10 = 7 2023-01-11T21:38:06.5436060Z tmp11 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp10, tmp4, tmp10)) 2023-01-11T21:38:06.5436136Z tmp12 = tmp6 + tmp5 2023-01-11T21:38:06.5436215Z tmp13 = tmp7 + tmp5 2023-01-11T21:38:06.5436286Z tmp14 = 1 2023-01-11T21:38:06.5436392Z tmp15 = tmp9 - tmp14 2023-01-11T21:38:06.5436534Z tmp16 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp15, tmp12, tmp15)) 2023-01-11T21:38:06.5436646Z tmp17 = tmp11 - tmp14 2023-01-11T21:38:06.5436785Z tmp18 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 < tmp17, tmp13, tmp17)) 2023-01-11T21:38:06.5436904Z tmp19 = tl.load(in_ptr0 + (tmp18 + (7*tmp16) + (63*x2)), xmask) 2023-01-11T21:38:06.5437041Z tmp20 = tl.load(in_ptr1 + (tmp18 + (7*tmp16) + (63*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5437119Z tmp21 = tmp19 == tmp0 2023-01-11T21:38:06.5437184Z tmp22 = 0.0 2023-01-11T21:38:06.5437281Z tmp23 = tl.where(tmp21, tmp20, tmp22) 2023-01-11T21:38:06.5437419Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp23, xmask) 2023-01-11T21:38:06.5437504Z ''') 2023-01-11T21:38:06.5437510Z 2023-01-11T21:38:06.5437514Z 2023-01-11T21:38:06.5437610Z async_compile.wait(globals()) 2023-01-11T21:38:06.5437687Z del async_compile 2023-01-11T21:38:06.5437692Z 2023-01-11T21:38:06.5437767Z def call(args): 2023-01-11T21:38:06.5437855Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5437924Z args.clear() 2023-01-11T21:38:06.5438018Z with torch.cuda.device(0): 
2023-01-11T21:38:06.5438243Z buf0 = empty_strided((2, 4, 18, 14), (1008, 252, 14, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5438333Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5438510Z triton_fused_max_pool2d_with_indices_backward_0.run(arg2_1, arg0_1, buf0, 2016, grid=grid(2016), stream=stream0) 2023-01-11T21:38:06.5438584Z del arg0_1 2023-01-11T21:38:06.5438656Z del arg2_1 2023-01-11T21:38:06.5438726Z return (buf0, ) 2023-01-11T21:38:06.5438731Z 2023-01-11T21:38:06.5438769Z 2023-01-11T21:38:06.5438844Z if __name__ == "__main__": 2023-01-11T21:38:06.5438964Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5439091Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5439304Z arg0_1 = rand_strided((2, 4, 9, 7), (252, 63, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5439526Z arg1_1 = rand_strided((2, 4, 18, 14), (1008, 252, 14, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5439736Z arg2_1 = rand_strided((2, 4, 9, 7), (252, 63, 7, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5439862Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5439870Z 2023-01-11T21:38:06.5439935Z ok (0.307s) 2023-01-11T21:38:06.5440387Z test_mean_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5440519Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5440776Z [2023-01-11 21:35:21,630] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 722 2023-01-11T21:38:06.5441041Z [2023-01-11 21:35:21,837] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 722 2023-01-11T21:38:06.5441457Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5441617Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5441874Z [2023-01-11 21:35:21,859] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 723 2023-01-11T21:38:06.5442078Z [2023-01-11 21:35:22,040] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.5442282Z [2023-01-11 21:35:22,042] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.5442287Z 2023-01-11T21:38:06.5442386Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5442454Z import torch 2023-01-11T21:38:06.5442529Z import random 2023-01-11T21:38:06.5442648Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5442773Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5442778Z 2023-01-11T21:38:06.5442860Z aten = torch.ops.aten 2023-01-11T21:38:06.5443000Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5443094Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5443099Z 2023-01-11T21:38:06.5443176Z import triton 2023-01-11T21:38:06.5443261Z import triton.language as tl 2023-01-11T21:38:06.5443384Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5443520Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5443526Z 2023-01-11T21:38:06.5443530Z 2023-01-11T21:38:06.5443685Z triton_fused_mean_0 = async_compile.triton(''' 2023-01-11T21:38:06.5443759Z import triton 2023-01-11T21:38:06.5443850Z import triton.language as tl 2023-01-11T21:38:06.5443962Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5444056Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5444186Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5444313Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5444319Z 2023-01-11T21:38:06.5444404Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.5444519Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5444607Z filename=__file__, 2023-01-11T21:38:06.5445007Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5445082Z @triton.jit 2023-01-11T21:38:06.5445247Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5445320Z xnumel = 1 2023-01-11T21:38:06.5445392Z rnumel = 64 2023-01-11T21:38:06.5445494Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5445627Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5445713Z xmask = xindex < xnumel 2023-01-11T21:38:06.5445831Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5445942Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5446048Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5446137Z rindex = roffset + rbase 2023-01-11T21:38:06.5446222Z rmask = rindex < rnumel 2023-01-11T21:38:06.5446295Z r0 = rindex 2023-01-11T21:38:06.5446396Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5446518Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5446625Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5446696Z tmp2 = 64 
2023-01-11T21:38:06.5446774Z tmp3 = tmp1 / tmp2 2023-01-11T21:38:06.5446914Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp3, None) 2023-01-11T21:38:06.5446999Z ''') 2023-01-11T21:38:06.5447005Z 2023-01-11T21:38:06.5447045Z 2023-01-11T21:38:06.5447206Z triton_fused_mean_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.5447283Z import triton 2023-01-11T21:38:06.5447375Z import triton.language as tl 2023-01-11T21:38:06.5447481Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5447581Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5447717Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5447840Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5447845Z 2023-01-11T21:38:06.5447934Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.5448047Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5448132Z filename=__file__, 2023-01-11T21:38:06.5448496Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5448569Z @triton.jit 2023-01-11T21:38:06.5448741Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5448814Z xnumel = 8 2023-01-11T21:38:06.5448887Z rnumel = 8 2023-01-11T21:38:06.5448985Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5449123Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5449206Z xmask = xindex < xnumel 2023-01-11T21:38:06.5449318Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5449390Z x0 = xindex 2023-01-11T21:38:06.5449511Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5449616Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5449704Z rindex = roffset + rbase 2023-01-11T21:38:06.5449787Z rmask = rindex < rnumel 2023-01-11T21:38:06.5449861Z r1 = rindex 2023-01-11T21:38:06.5449972Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.5450091Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5450207Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5450279Z tmp2 = 8 2023-01-11T21:38:06.5450357Z tmp3 = tmp1 / tmp2 2023-01-11T21:38:06.5450497Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.5450608Z ''') 2023-01-11T21:38:06.5450614Z 2023-01-11T21:38:06.5450618Z 2023-01-11T21:38:06.5450769Z triton_fused_mean_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.5450845Z import triton 2023-01-11T21:38:06.5450936Z import triton.language as tl 2023-01-11T21:38:06.5451049Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5451151Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5451284Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5451408Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5451414Z 2023-01-11T21:38:06.5451818Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5451889Z @triton.jit 2023-01-11T21:38:06.5452020Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5452097Z xnumel = 16 
2023-01-11T21:38:06.5452191Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5452320Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5452400Z xmask = xindex < xnumel 2023-01-11T21:38:06.5452472Z x0 = xindex % 8 2023-01-11T21:38:06.5452543Z x1 = (xindex // 8) 2023-01-11T21:38:06.5452612Z x2 = xindex 2023-01-11T21:38:06.5452813Z tmp0 = tl.load(in_ptr0 + (x0 + (32*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5453019Z tmp1 = tl.load(in_ptr0 + (8 + x0 + (32*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5453224Z tmp3 = tl.load(in_ptr0 + (16 + x0 + (32*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5453459Z tmp5 = tl.load(in_ptr0 + (24 + x0 + (32*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5453539Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5453616Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5453686Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.5453758Z tmp7 = 4 2023-01-11T21:38:06.5453836Z tmp8 = tmp6 / tmp7 2023-01-11T21:38:06.5453970Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.5454057Z ''') 2023-01-11T21:38:06.5454062Z 2023-01-11T21:38:06.5454067Z 2023-01-11T21:38:06.5454223Z triton_fused_mean_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.5454296Z import triton 2023-01-11T21:38:06.5454382Z import triton.language as tl 2023-01-11T21:38:06.5454613Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5454717Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5454853Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5454981Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5454986Z 2023-01-11T21:38:06.5455433Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5455518Z @triton.jit 2023-01-11T21:38:06.5455648Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5455715Z xnumel = 32 2023-01-11T21:38:06.5455810Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5455937Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5456018Z xmask = xindex < xnumel 2023-01-11T21:38:06.5456088Z x0 = xindex 2023-01-11T21:38:06.5456182Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5456285Z tmp1 = tl.load(in_ptr0 + (32 + x0), xmask) 2023-01-11T21:38:06.5456357Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5456428Z tmp3 = 2 2023-01-11T21:38:06.5456504Z tmp4 = tmp2 / tmp3 2023-01-11T21:38:06.5456638Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5456723Z ''') 2023-01-11T21:38:06.5456728Z 2023-01-11T21:38:06.5456733Z 2023-01-11T21:38:06.5456826Z async_compile.wait(globals()) 2023-01-11T21:38:06.5456946Z del async_compile 2023-01-11T21:38:06.5456951Z 2023-01-11T21:38:06.5457019Z def call(args): 2023-01-11T21:38:06.5457093Z arg0_1, = args 2023-01-11T21:38:06.5457231Z args.clear() 2023-01-11T21:38:06.5457337Z with torch.cuda.device(0): 2023-01-11T21:38:06.5457550Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5457640Z buf5 = buf0; del buf0 # reuse 2023-01-11T21:38:06.5457729Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5457861Z triton_fused_mean_0.run(buf5, arg0_1, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5458062Z buf1 = 
empty_strided((1, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5458156Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.5458296Z triton_fused_mean_1_1.run(buf2, arg0_1, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.5458507Z buf3 = empty_strided((1, 2, 1, 8), (16, 8, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5458645Z triton_fused_mean_2_2.run(arg0_1, buf3, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.5458843Z buf4 = empty_strided((4, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5458978Z triton_fused_mean_3_3.run(arg0_1, buf4, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.5459045Z del arg0_1 2023-01-11T21:38:06.5459139Z return (buf5, buf2, buf3, buf4, ) 2023-01-11T21:38:06.5459144Z 2023-01-11T21:38:06.5459149Z 2023-01-11T21:38:06.5459230Z if __name__ == "__main__": 2023-01-11T21:38:06.5459349Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5459515Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5459727Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5459838Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5459843Z 2023-01-11T21:38:06.5460109Z [2023-01-11 21:35:22,132] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 723 2023-01-11T21:38:06.5460115Z 2023-01-11T21:38:06.5460213Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5460281Z import torch 2023-01-11T21:38:06.5460353Z import random 2023-01-11T21:38:06.5460473Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5460596Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5460601Z 2023-01-11T21:38:06.5460682Z aten = torch.ops.aten 2023-01-11T21:38:06.5460816Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5460911Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5460920Z 2023-01-11T21:38:06.5460986Z import triton 2023-01-11T21:38:06.5461079Z import triton.language as tl 2023-01-11T21:38:06.5461203Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5461341Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5461346Z 2023-01-11T21:38:06.5461354Z 2023-01-11T21:38:06.5461509Z triton_fused_mean_0 = async_compile.triton(''' 2023-01-11T21:38:06.5461583Z import triton 2023-01-11T21:38:06.5461672Z import triton.language as tl 2023-01-11T21:38:06.5461785Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5461880Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5462011Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5462134Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5462139Z 2023-01-11T21:38:06.5462228Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.5462342Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5462430Z filename=__file__, 2023-01-11T21:38:06.5462791Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5462864Z @triton.jit 2023-01-11T21:38:06.5463052Z def triton_(in_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5463126Z xnumel = 1 2023-01-11T21:38:06.5463198Z rnumel = 64 2023-01-11T21:38:06.5463294Z xoffset = 
tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5463429Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5463512Z xmask = xindex < xnumel 2023-01-11T21:38:06.5463632Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5463744Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5463848Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5463938Z rindex = roffset + rbase 2023-01-11T21:38:06.5464022Z rmask = rindex < rnumel 2023-01-11T21:38:06.5464093Z r0 = rindex 2023-01-11T21:38:06.5464212Z tmp0 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.5464300Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.5464416Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.5464529Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5464600Z tmp3 = 64 2023-01-11T21:38:06.5464678Z tmp4 = tmp2 / tmp3 2023-01-11T21:38:06.5464763Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.5464896Z tl.store(out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp5, None) 2023-01-11T21:38:06.5464981Z ''') 2023-01-11T21:38:06.5464986Z 2023-01-11T21:38:06.5464990Z 2023-01-11T21:38:06.5465149Z triton_fused_mean_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.5465217Z import triton 2023-01-11T21:38:06.5465307Z import triton.language as tl 2023-01-11T21:38:06.5465457Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5465575Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5465727Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5465854Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5465859Z 2023-01-11T21:38:06.5465948Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.5466056Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5466139Z filename=__file__, 2023-01-11T21:38:06.5466494Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5466569Z @triton.jit 2023-01-11T21:38:06.5466736Z def triton_(in_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5466809Z xnumel = 8 2023-01-11T21:38:06.5466883Z rnumel = 8 2023-01-11T21:38:06.5466973Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5467110Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5467191Z xmask = xindex < xnumel 2023-01-11T21:38:06.5467309Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5467381Z x0 = xindex 2023-01-11T21:38:06.5467498Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5467602Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5467682Z rindex = roffset + rbase 2023-01-11T21:38:06.5467769Z rmask = rindex < rnumel 2023-01-11T21:38:06.5467841Z r1 = rindex 2023-01-11T21:38:06.5467973Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.5468062Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.5468179Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.5468293Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5468366Z tmp3 = 8 2023-01-11T21:38:06.5468439Z tmp4 = tmp2 / tmp3 2023-01-11T21:38:06.5468522Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.5468657Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp5, xmask) 
2023-01-11T21:38:06.5468742Z ''') 2023-01-11T21:38:06.5468748Z 2023-01-11T21:38:06.5468781Z 2023-01-11T21:38:06.5468940Z triton_fused_mean_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.5469010Z import triton 2023-01-11T21:38:06.5469100Z import triton.language as tl 2023-01-11T21:38:06.5469207Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5469308Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5469437Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5469561Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5469567Z 2023-01-11T21:38:06.5469970Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5470046Z @triton.jit 2023-01-11T21:38:06.5470177Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5470250Z xnumel = 16 2023-01-11T21:38:06.5470341Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5470473Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5470554Z xmask = xindex < xnumel 2023-01-11T21:38:06.5470628Z x0 = xindex % 8 2023-01-11T21:38:06.5470707Z x1 = (xindex // 8) 2023-01-11T21:38:06.5470781Z x2 = xindex 2023-01-11T21:38:06.5471008Z tmp0 = tl.load(in_ptr0 + (x0 + (32*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5471229Z tmp2 = tl.load(in_ptr0 + (8 + x0 + (32*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5471454Z tmp5 = tl.load(in_ptr0 + (16 + x0 + (32*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5471703Z tmp8 = tl.load(in_ptr0 + (24 + x0 + (32*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5471794Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.5471883Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.5471962Z tmp4 = tmp1 + tmp3 2023-01-11T21:38:06.5472047Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.5472118Z tmp7 = tmp4 + tmp6 2023-01-11T21:38:06.5472204Z tmp9 = tmp8.to(tl.float32) 2023-01-11T21:38:06.5472281Z tmp10 = tmp7 + tmp9 2023-01-11T21:38:06.5472349Z tmp11 = 4 2023-01-11T21:38:06.5472430Z tmp12 = tmp10 / tmp11 2023-01-11T21:38:06.5472518Z tmp13 = tmp12.to(tl.float32) 2023-01-11T21:38:06.5472654Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.5472732Z ''') 2023-01-11T21:38:06.5472743Z 2023-01-11T21:38:06.5472748Z 2023-01-11T21:38:06.5472897Z triton_fused_mean_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.5472974Z import triton 2023-01-11T21:38:06.5473068Z import triton.language as tl 2023-01-11T21:38:06.5473178Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5473281Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5473411Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5473540Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5473546Z 2023-01-11T21:38:06.5473941Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5474014Z @triton.jit 2023-01-11T21:38:06.5474146Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
2023-01-11T21:38:06.5474220Z xnumel = 32 2023-01-11T21:38:06.5474313Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5474443Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5474530Z xmask = xindex < xnumel 2023-01-11T21:38:06.5474599Z x0 = xindex 2023-01-11T21:38:06.5474708Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.5474826Z tmp2 = tl.load(in_ptr0 + (32 + x0), xmask).to(tl.float32) 2023-01-11T21:38:06.5474943Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.5475030Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.5475110Z tmp4 = tmp1 + tmp3 2023-01-11T21:38:06.5475181Z tmp5 = 2 2023-01-11T21:38:06.5475258Z tmp6 = tmp4 / tmp5 2023-01-11T21:38:06.5475350Z tmp7 = tmp6.to(tl.float32) 2023-01-11T21:38:06.5475502Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.5475607Z ''') 2023-01-11T21:38:06.5475613Z 2023-01-11T21:38:06.5475617Z 2023-01-11T21:38:06.5475709Z async_compile.wait(globals()) 2023-01-11T21:38:06.5475786Z del async_compile 2023-01-11T21:38:06.5475791Z 2023-01-11T21:38:06.5475866Z def call(args): 2023-01-11T21:38:06.5475943Z arg0_1, = args 2023-01-11T21:38:06.5476013Z args.clear() 2023-01-11T21:38:06.5476101Z with torch.cuda.device(0): 2023-01-11T21:38:06.5476290Z buf5 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5476383Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5476524Z triton_fused_mean_0.run(arg0_1, buf5, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5476728Z buf2 = empty_strided((1, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5476867Z triton_fused_mean_1_1.run(arg0_1, buf2, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.5477076Z buf3 = empty_strided((1, 2, 1, 8), (16, 8, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5477205Z triton_fused_mean_2_2.run(arg0_1, buf3, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.5477402Z buf4 = empty_strided((4, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5477569Z triton_fused_mean_3_3.run(arg0_1, buf4, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.5477642Z del arg0_1 2023-01-11T21:38:06.5477737Z return (buf5, buf2, buf3, buf4, ) 2023-01-11T21:38:06.5477743Z 2023-01-11T21:38:06.5477748Z 2023-01-11T21:38:06.5477828Z if __name__ == "__main__": 2023-01-11T21:38:06.5477947Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5478072Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5478281Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5478393Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5478398Z 2023-01-11T21:38:06.5478466Z ok (0.524s) 2023-01-11T21:38:06.5478931Z test_min_max_reduction_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5479064Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5479326Z [2023-01-11 21:35:22,157] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 724 2023-01-11T21:38:06.5479590Z [2023-01-11 21:35:22,253] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 724 2023-01-11T21:38:06.5480004Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5480135Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5480392Z [2023-01-11 21:35:22,277] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 725 2023-01-11T21:38:06.5480655Z [2023-01-11 21:35:22,375] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 725 2023-01-11T21:38:06.5480661Z 2023-01-11T21:38:06.5480779Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5480856Z import torch 2023-01-11T21:38:06.5480927Z import random 2023-01-11T21:38:06.5481046Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5481170Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5481175Z 2023-01-11T21:38:06.5481255Z aten = torch.ops.aten 2023-01-11T21:38:06.5481390Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5481479Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5481490Z 2023-01-11T21:38:06.5481558Z import triton 2023-01-11T21:38:06.5481647Z import triton.language as tl 2023-01-11T21:38:06.5481776Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5481916Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5481921Z 2023-01-11T21:38:06.5481926Z 2023-01-11T21:38:06.5482125Z triton_fused_add_add_1_add_2_amax_max_1_min_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5482202Z import triton 2023-01-11T21:38:06.5482295Z import triton.language as tl 2023-01-11T21:38:06.5482401Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5482500Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5482630Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5482755Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5482760Z 2023-01-11T21:38:06.5482849Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.5482962Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5483046Z filename=__file__, 2023-01-11T21:38:06.5483453Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 6), equal_to_1=())]}) 2023-01-11T21:38:06.5483549Z @triton.jit 2023-01-11T21:38:06.5483745Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5483817Z xnumel = 1 2023-01-11T21:38:06.5483889Z rnumel = 64 2023-01-11T21:38:06.5483985Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5484120Z xindex = 
xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5484202Z xmask = xindex < xnumel 2023-01-11T21:38:06.5484314Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5484496Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5484622Z _tmp6 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5484805Z _tmp10 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5484911Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5484997Z rindex = roffset + rbase 2023-01-11T21:38:06.5485085Z rmask = rindex < rnumel 2023-01-11T21:38:06.5485158Z r0 = rindex 2023-01-11T21:38:06.5485346Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5485563Z tmp1 = tl.load(in_ptr1 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5485672Z tmp4 = tl.load(in_ptr1 + (r0), rmask) 2023-01-11T21:38:06.5485786Z tmp7 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5485865Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5485991Z _tmp3 = tl.where(xmask & rmask & (_tmp3 < tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.5486071Z tmp5 = tmp0 + tmp4 2023-01-11T21:38:06.5486191Z _tmp6 = tl.where(xmask & rmask & (_tmp6 > tmp5), tmp5, _tmp6) 2023-01-11T21:38:06.5486264Z tmp8 = 1 2023-01-11T21:38:06.5486345Z tmp9 = tmp7 + tmp8 2023-01-11T21:38:06.5486473Z _tmp10 = tl.where(xmask & rmask & (_tmp10 < tmp9), tmp9, _tmp10) 2023-01-11T21:38:06.5486589Z tmp3 = tl.reshape(tl.max(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5486747Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5486865Z tmp6 = tl.reshape(tl.min(_tmp6, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5486990Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp6, None) 2023-01-11T21:38:06.5487104Z tmp10 = tl.reshape(tl.max(_tmp10, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5487234Z tl.store(out_ptr2 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp10, None) 2023-01-11T21:38:06.5487319Z ''') 2023-01-11T21:38:06.5487324Z 2023-01-11T21:38:06.5487329Z 2023-01-11T21:38:06.5487422Z async_compile.wait(globals()) 2023-01-11T21:38:06.5487498Z del async_compile 2023-01-11T21:38:06.5487503Z 2023-01-11T21:38:06.5487578Z def call(args): 2023-01-11T21:38:06.5487658Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5487726Z args.clear() 2023-01-11T21:38:06.5487817Z with torch.cuda.device(0): 2023-01-11T21:38:06.5488006Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5488191Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5488392Z buf2 = empty_strided((1, 1), (1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5488484Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5488662Z triton_fused_add_add_1_add_2_amax_max_1_min_1_0.run(arg0_1, arg1_1, buf0, buf1, buf2, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5488729Z del arg0_1 2023-01-11T21:38:06.5488797Z del arg1_1 2023-01-11T21:38:06.5488887Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.5488892Z 2023-01-11T21:38:06.5488897Z 2023-01-11T21:38:06.5488978Z if __name__ == "__main__": 2023-01-11T21:38:06.5489095Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5489265Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5489467Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5489663Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 
2023-01-11T21:38:06.5489779Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5489791Z 2023-01-11T21:38:06.5489795Z 2023-01-11T21:38:06.5489885Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5489963Z import torch 2023-01-11T21:38:06.5490037Z import random 2023-01-11T21:38:06.5490156Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5490282Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5490287Z 2023-01-11T21:38:06.5490369Z aten = torch.ops.aten 2023-01-11T21:38:06.5490506Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5490594Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5490601Z 2023-01-11T21:38:06.5490678Z import triton 2023-01-11T21:38:06.5490772Z import triton.language as tl 2023-01-11T21:38:06.5490897Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5491035Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5491040Z 2023-01-11T21:38:06.5491045Z 2023-01-11T21:38:06.5491245Z triton_fused_add_add_1_add_2_amax_max_1_min_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5491319Z import triton 2023-01-11T21:38:06.5491409Z import triton.language as tl 2023-01-11T21:38:06.5491515Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5491614Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5491744Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5491865Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5491870Z 2023-01-11T21:38:06.5491962Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.5492075Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5492163Z filename=__file__, 2023-01-11T21:38:06.5492607Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 6), equal_to_1=())]}) 2023-01-11T21:38:06.5492675Z @triton.jit 2023-01-11T21:38:06.5492869Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5492939Z xnumel = 1 2023-01-11T21:38:06.5493013Z rnumel = 64 2023-01-11T21:38:06.5493109Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5493245Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5493329Z xmask = xindex < xnumel 2023-01-11T21:38:06.5493440Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5493621Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5493750Z _tmp6 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5493935Z _tmp10 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5494041Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5494130Z rindex = roffset + rbase 2023-01-11T21:38:06.5494213Z rmask = rindex < rnumel 2023-01-11T21:38:06.5494284Z r0 = rindex 2023-01-11T21:38:06.5494605Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5494823Z tmp1 = tl.load(in_ptr1 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5494942Z tmp4 = tl.load(in_ptr1 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.5495059Z tmp7 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.5495140Z tmp2 = tmp0 + tmp1 
2023-01-11T21:38:06.5495267Z _tmp3 = tl.where(xmask & rmask & (_tmp3 < tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.5495391Z tmp5 = tmp0 + tmp4 2023-01-11T21:38:06.5495511Z _tmp6 = tl.where(xmask & rmask & (_tmp6 > tmp5), tmp5, _tmp6) 2023-01-11T21:38:06.5495581Z tmp8 = 1 2023-01-11T21:38:06.5495660Z tmp9 = tmp7 + tmp8 2023-01-11T21:38:06.5495795Z _tmp10 = tl.where(xmask & rmask & (_tmp10 < tmp9), tmp9, _tmp10) 2023-01-11T21:38:06.5495910Z tmp3 = tl.reshape(tl.max(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5496038Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5496153Z tmp6 = tl.reshape(tl.min(_tmp6, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5496276Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp6, None) 2023-01-11T21:38:06.5496392Z tmp10 = tl.reshape(tl.max(_tmp10, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5496525Z tl.store(out_ptr2 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp10, None) 2023-01-11T21:38:06.5496610Z ''') 2023-01-11T21:38:06.5496616Z 2023-01-11T21:38:06.5496623Z 2023-01-11T21:38:06.5496717Z async_compile.wait(globals()) 2023-01-11T21:38:06.5496794Z del async_compile 2023-01-11T21:38:06.5496800Z 2023-01-11T21:38:06.5496873Z def call(args): 2023-01-11T21:38:06.5496952Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5497020Z args.clear() 2023-01-11T21:38:06.5497113Z with torch.cuda.device(0): 2023-01-11T21:38:06.5497375Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5497560Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5497762Z buf2 = empty_strided((1, 1), (1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5497854Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5498033Z triton_fused_add_add_1_add_2_amax_max_1_min_1_0.run(arg0_1, arg1_1, buf0, buf1, buf2, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5498105Z del arg0_1 2023-01-11T21:38:06.5498170Z del arg1_1 2023-01-11T21:38:06.5498258Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.5498267Z 2023-01-11T21:38:06.5498271Z 2023-01-11T21:38:06.5498349Z if __name__ == "__main__": 2023-01-11T21:38:06.5498468Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5498596Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5498836Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5499034Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5499147Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5499158Z 2023-01-11T21:38:06.5499221Z ok (0.243s) 2023-01-11T21:38:06.5499696Z test_misaligned_address_issue1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5499831Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5500088Z [2023-01-11 21:35:22,396] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 726 2023-01-11T21:38:06.5500351Z [2023-01-11 21:35:22,558] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 726 2023-01-11T21:38:06.5500767Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5500900Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5501152Z [2023-01-11 21:35:22,577] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 727 2023-01-11T21:38:06.5501445Z [2023-01-11 21:35:22,642] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 727 2023-01-11T21:38:06.5501450Z 2023-01-11T21:38:06.5501551Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5501623Z import torch 2023-01-11T21:38:06.5501699Z import random 2023-01-11T21:38:06.5501820Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5501946Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5501952Z 2023-01-11T21:38:06.5502036Z aten = torch.ops.aten 2023-01-11T21:38:06.5502176Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5502273Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5502278Z 2023-01-11T21:38:06.5502355Z import triton 2023-01-11T21:38:06.5502443Z import triton.language as tl 2023-01-11T21:38:06.5502570Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5502714Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5502720Z 2023-01-11T21:38:06.5502725Z 2023-01-11T21:38:06.5502885Z triton_fused_gather_0 = async_compile.triton(''' 2023-01-11T21:38:06.5502959Z import triton 2023-01-11T21:38:06.5503053Z import triton.language as tl 2023-01-11T21:38:06.5503173Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5503277Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5503406Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5503532Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5503537Z 2023-01-11T21:38:06.5503953Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5504029Z @triton.jit 2023-01-11T21:38:06.5504174Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5504249Z xnumel = 1 2023-01-11T21:38:06.5504347Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5504476Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5504556Z xmask = xindex < xnumel 2023-01-11T21:38:06.5504718Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.5504823Z tmp1 = tl.load(in_ptr1 + (tmp0), None) 
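# Gather lowering: tmp0 is the int64 index read from in_ptr0, and it is used
# directly as the pointer offset into in_ptr1, i.e. tmp1 = in_ptr1[tmp0];
# the store below writes the single gathered element to out_ptr0.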
2023-01-11T21:38:06.5504962Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp1, None) 2023-01-11T21:38:06.5505049Z ''') 2023-01-11T21:38:06.5505054Z 2023-01-11T21:38:06.5505058Z 2023-01-11T21:38:06.5505154Z async_compile.wait(globals()) 2023-01-11T21:38:06.5505233Z del async_compile 2023-01-11T21:38:06.5505239Z 2023-01-11T21:38:06.5505309Z def call(args): 2023-01-11T21:38:06.5505389Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5505466Z args.clear() 2023-01-11T21:38:06.5505561Z with torch.cuda.device(0): 2023-01-11T21:38:06.5505798Z buf0 = empty_strided((1, 1), (1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5505915Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5506064Z triton_fused_gather_0.run(arg1_1, arg0_1, buf0, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5506141Z del arg0_1 2023-01-11T21:38:06.5506211Z del arg1_1 2023-01-11T21:38:06.5506293Z return (buf0, ) 2023-01-11T21:38:06.5506298Z 2023-01-11T21:38:06.5506303Z 2023-01-11T21:38:06.5506384Z if __name__ == "__main__": 2023-01-11T21:38:06.5506504Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5506632Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5506843Z arg0_1 = rand_strided((1, 1000), (1000, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5507042Z arg1_1 = rand_strided((1, 1), (1, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5507157Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5507194Z 2023-01-11T21:38:06.5507198Z 2023-01-11T21:38:06.5507293Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5507369Z import torch 2023-01-11T21:38:06.5507445Z import random 2023-01-11T21:38:06.5507566Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5507695Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5507700Z 2023-01-11T21:38:06.5507784Z aten = torch.ops.aten 2023-01-11T21:38:06.5507922Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5508013Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5508018Z 2023-01-11T21:38:06.5508092Z import triton 2023-01-11T21:38:06.5508186Z import triton.language as tl 2023-01-11T21:38:06.5508314Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5508457Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5508463Z 2023-01-11T21:38:06.5508467Z 2023-01-11T21:38:06.5508632Z triton_fused_gather_0 = async_compile.triton(''' 2023-01-11T21:38:06.5508706Z import triton 2023-01-11T21:38:06.5508801Z import triton.language as tl 2023-01-11T21:38:06.5508910Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5509014Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5509153Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5509280Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5509285Z 2023-01-11T21:38:06.5509699Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5509775Z @triton.jit 2023-01-11T21:38:06.5509920Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5509999Z xnumel = 1 2023-01-11T21:38:06.5510091Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5510226Z xindex = xoffset + 
tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5510312Z xmask = xindex < xnumel 2023-01-11T21:38:06.5510444Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.5510564Z tmp1 = tl.load(in_ptr1 + (tmp0), None).to(tl.float32) 2023-01-11T21:38:06.5510774Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp1, None) 2023-01-11T21:38:06.5510862Z ''') 2023-01-11T21:38:06.5510868Z 2023-01-11T21:38:06.5510872Z 2023-01-11T21:38:06.5510970Z async_compile.wait(globals()) 2023-01-11T21:38:06.5511042Z del async_compile 2023-01-11T21:38:06.5511047Z 2023-01-11T21:38:06.5511123Z def call(args): 2023-01-11T21:38:06.5511204Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5511282Z args.clear() 2023-01-11T21:38:06.5511377Z with torch.cuda.device(0): 2023-01-11T21:38:06.5511578Z buf0 = empty_strided((1, 1), (1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5511673Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5511816Z triton_fused_gather_0.run(arg1_1, arg0_1, buf0, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5511888Z del arg0_1 2023-01-11T21:38:06.5511963Z del arg1_1 2023-01-11T21:38:06.5512041Z return (buf0, ) 2023-01-11T21:38:06.5512046Z 2023-01-11T21:38:06.5512050Z 2023-01-11T21:38:06.5512133Z if __name__ == "__main__": 2023-01-11T21:38:06.5512253Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5512379Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5512587Z arg0_1 = rand_strided((1, 1000), (1000, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5512777Z arg1_1 = rand_strided((1, 1), (1, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5512898Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5512903Z 2023-01-11T21:38:06.5512976Z ok (0.265s) 2023-01-11T21:38:06.5513434Z test_mm_views_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5513599Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5513859Z [2023-01-11 21:35:22,660] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 728 2023-01-11T21:38:06.5514123Z [2023-01-11 21:35:22,662] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 728 2023-01-11T21:38:06.5514128Z 2023-01-11T21:38:06.5514227Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5514304Z import torch 2023-01-11T21:38:06.5514374Z import random 2023-01-11T21:38:06.5514497Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5514624Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5514629Z 2023-01-11T21:38:06.5514713Z aten = torch.ops.aten 2023-01-11T21:38:06.5514857Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5514954Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5514959Z 2023-01-11T21:38:06.5515041Z import triton 2023-01-11T21:38:06.5515141Z import triton.language as tl 2023-01-11T21:38:06.5515261Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5515428Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5515434Z 2023-01-11T21:38:06.5515438Z 2023-01-11T21:38:06.5515545Z async_compile.wait(globals()) 2023-01-11T21:38:06.5515633Z del async_compile 2023-01-11T21:38:06.5515638Z 2023-01-11T21:38:06.5515714Z def call(args): 2023-01-11T21:38:06.5515795Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5515871Z args.clear() 2023-01-11T21:38:06.5515958Z with torch.cuda.device(0): 2023-01-11T21:38:06.5516169Z buf0 = empty_strided((32, 32), (32, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5516299Z aten.mm.out(arg0_1, as_strided(arg1_1, (32, 32), (32, 1)), out=buf0) 2023-01-11T21:38:06.5516373Z del arg0_1 2023-01-11T21:38:06.5516447Z del arg1_1 2023-01-11T21:38:06.5516526Z return (buf0, ) 2023-01-11T21:38:06.5516559Z 2023-01-11T21:38:06.5516564Z 2023-01-11T21:38:06.5516647Z if __name__ == "__main__": 2023-01-11T21:38:06.5516764Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5516886Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5517090Z arg0_1 = rand_strided((32, 32), (1, 32), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5517302Z arg1_1 = rand_strided((32, 1, 32), (32, 1024, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5517424Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5517430Z 2023-01-11T21:38:06.5517505Z ok (0.020s) 2023-01-11T21:38:06.5517977Z test_move_arange_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5518111Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5518370Z [2023-01-11 21:35:22,728] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 729 2023-01-11T21:38:06.5518634Z [2023-01-11 21:35:22,800] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 729 2023-01-11T21:38:06.5518640Z 2023-01-11T21:38:06.5518740Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5518811Z import torch 2023-01-11T21:38:06.5518890Z import random 2023-01-11T21:38:06.5519045Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5519170Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5519175Z 2023-01-11T21:38:06.5519257Z aten = torch.ops.aten 2023-01-11T21:38:06.5519392Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5519489Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5519497Z 2023-01-11T21:38:06.5519567Z import triton 2023-01-11T21:38:06.5519657Z import triton.language as tl 2023-01-11T21:38:06.5519785Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5519924Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5519930Z 2023-01-11T21:38:06.5519934Z 2023-01-11T21:38:06.5520091Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.5520168Z import triton 2023-01-11T21:38:06.5520262Z import triton.language as tl 2023-01-11T21:38:06.5520380Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5520477Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5520616Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5520744Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5520750Z 2023-01-11T21:38:06.5521159Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5521235Z @triton.jit 2023-01-11T21:38:06.5521368Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5521444Z xnumel = 32 2023-01-11T21:38:06.5521542Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5521667Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5521753Z xmask = xindex < xnumel 2023-01-11T21:38:06.5521826Z x0 = xindex 2023-01-11T21:38:06.5521925Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5522003Z tmp0 = x0 2023-01-11T21:38:06.5522092Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.5522166Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.5522304Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.5522394Z ''') 2023-01-11T21:38:06.5522400Z 2023-01-11T21:38:06.5522404Z 2023-01-11T21:38:06.5522527Z async_compile.wait(globals()) 2023-01-11T21:38:06.5522607Z del async_compile 2023-01-11T21:38:06.5522613Z 2023-01-11T21:38:06.5522692Z def call(args): 2023-01-11T21:38:06.5522767Z arg0_1, = args 2023-01-11T21:38:06.5522845Z args.clear() 2023-01-11T21:38:06.5522932Z with torch.cuda.device(0): 2023-01-11T21:38:06.5523132Z buf0 = empty_strided((32, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5523227Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5523367Z 
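# Note: the arange has been folded into the kernel above (tmp0 = x0,
# upcast to fp32), so the launch below reads only arg0_1 and writes buf0;
# no separate arange tensor is materialized on the device.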
triton_fused_add_0.run(arg0_1, buf0, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.5523441Z del arg0_1 2023-01-11T21:38:06.5523525Z return (buf0, ) 2023-01-11T21:38:06.5523530Z 2023-01-11T21:38:06.5523534Z 2023-01-11T21:38:06.5523616Z if __name__ == "__main__": 2023-01-11T21:38:06.5523737Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5523858Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5524061Z arg0_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5524177Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5524182Z 2023-01-11T21:38:06.5524258Z ok (0.137s) 2023-01-11T21:38:06.5524717Z test_multi_device_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5524878Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5525137Z [2023-01-11 21:35:22,937] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 730 2023-01-11T21:38:06.5525321Z [2023-01-11 21:35:22,941] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.5525502Z [2023-01-11 21:35:22,943] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.5525671Z [2023-01-11 21:35:22,944] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.5525942Z [2023-01-11 21:35:23,027] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 730 2023-01-11T21:38:06.5525948Z 2023-01-11T21:38:06.5526048Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5526129Z import torch 2023-01-11T21:38:06.5526206Z import random 2023-01-11T21:38:06.5526329Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5526456Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5526463Z 2023-01-11T21:38:06.5526547Z aten = torch.ops.aten 2023-01-11T21:38:06.5526679Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5526776Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5526781Z 2023-01-11T21:38:06.5526856Z import triton 2023-01-11T21:38:06.5526951Z import triton.language as tl 2023-01-11T21:38:06.5527082Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5527223Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5527229Z 2023-01-11T21:38:06.5527233Z 2023-01-11T21:38:06.5527417Z triton_fused_add_add_1_add_2_add_3_0 = async_compile.triton(''' 2023-01-11T21:38:06.5527495Z import triton 2023-01-11T21:38:06.5527582Z import triton.language as tl 2023-01-11T21:38:06.5527699Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5527800Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5527935Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5528067Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5528072Z 2023-01-11T21:38:06.5528473Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5528576Z @triton.jit 
2023-01-11T21:38:06.5528714Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5528783Z xnumel = 40 2023-01-11T21:38:06.5528881Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5529011Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5529096Z xmask = xindex < xnumel 2023-01-11T21:38:06.5529170Z x0 = xindex 2023-01-11T21:38:06.5529268Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5529342Z tmp1 = 1 2023-01-11T21:38:06.5529417Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5529490Z tmp3 = 2 2023-01-11T21:38:06.5529572Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5529644Z tmp5 = 3 2023-01-11T21:38:06.5529722Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.5529793Z tmp7 = 4 2023-01-11T21:38:06.5529866Z tmp8 = tmp6 + tmp7 2023-01-11T21:38:06.5530003Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.5530091Z ''') 2023-01-11T21:38:06.5530098Z 2023-01-11T21:38:06.5530103Z 2023-01-11T21:38:06.5530242Z kernel_cpp_1 = async_compile.cpp(''' 2023-01-11T21:38:06.5530451Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.5530574Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:06.5530642Z { 2023-01-11T21:38:06.5530747Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.5530809Z { 2023-01-11T21:38:06.5530891Z #pragma omp for 2023-01-11T21:38:06.5530981Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:06.5531050Z { 2023-01-11T21:38:06.5531217Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.5531402Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(5)); 2023-01-11T21:38:06.5531491Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.5531621Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(6)); 2023-01-11T21:38:06.5531712Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.5531813Z tmp4.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.5531879Z } 2023-01-11T21:38:06.5531978Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.5532066Z for(long i0=40; i0<40; i0+=1) 2023-01-11T21:38:06.5532132Z { 2023-01-11T21:38:06.5532219Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.5532323Z auto tmp1 = static_cast<float>(5); 2023-01-11T21:38:06.5532413Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.5532517Z auto tmp3 = static_cast<float>(6); 2023-01-11T21:38:06.5532603Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.5532695Z in_out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.5532761Z } 2023-01-11T21:38:06.5532820Z } 2023-01-11T21:38:06.5532886Z } 2023-01-11T21:38:06.5532972Z ''') 2023-01-11T21:38:06.5532977Z 2023-01-11T21:38:06.5532981Z 2023-01-11T21:38:06.5533241Z triton_fused_add_add_1_add_2_add_3_add_4_add_5_add_6_add_7_device_put_device_put_1_2 = async_compile.triton(''' 2023-01-11T21:38:06.5533318Z import triton 2023-01-11T21:38:06.5533408Z import triton.language as tl 2023-01-11T21:38:06.5533523Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5533617Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5533748Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5533872Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5533877Z 2023-01-11T21:38:06.5534274Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.5534349Z @triton.jit 2023-01-11T21:38:06.5534586Z def
triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5534661Z xnumel = 40 2023-01-11T21:38:06.5534755Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5534923Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5535008Z xmask = xindex < xnumel 2023-01-11T21:38:06.5535079Z x0 = xindex 2023-01-11T21:38:06.5535182Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5535252Z tmp1 = 7 2023-01-11T21:38:06.5535333Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5535405Z tmp3 = 8 2023-01-11T21:38:06.5535475Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5535613Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5535699Z ''') 2023-01-11T21:38:06.5535705Z 2023-01-11T21:38:06.5535709Z 2023-01-11T21:38:06.5535847Z kernel_cpp_3 = async_compile.cpp(''' 2023-01-11T21:38:06.5536051Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.5536171Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:06.5536235Z { 2023-01-11T21:38:06.5536336Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.5536397Z { 2023-01-11T21:38:06.5536478Z #pragma omp for 2023-01-11T21:38:06.5536564Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:06.5536630Z { 2023-01-11T21:38:06.5536772Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.5536910Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(9)); 2023-01-11T21:38:06.5536998Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.5537186Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(10)); 2023-01-11T21:38:06.5537298Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.5537494Z tmp4.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.5537561Z } 2023-01-11T21:38:06.5537661Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.5537747Z for(long i0=40; i0<40; i0+=1) 2023-01-11T21:38:06.5537813Z { 2023-01-11T21:38:06.5537900Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.5538007Z auto tmp1 = static_cast<float>(9); 2023-01-11T21:38:06.5538095Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.5538200Z auto tmp3 = static_cast<float>(10); 2023-01-11T21:38:06.5538288Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.5538376Z in_out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.5538442Z } 2023-01-11T21:38:06.5538501Z } 2023-01-11T21:38:06.5538564Z } 2023-01-11T21:38:06.5538659Z ''') 2023-01-11T21:38:06.5538664Z 2023-01-11T21:38:06.5538668Z 2023-01-11T21:38:06.5538763Z async_compile.wait(globals()) 2023-01-11T21:38:06.5538840Z del async_compile 2023-01-11T21:38:06.5538845Z 2023-01-11T21:38:06.5538923Z def call(args): 2023-01-11T21:38:06.5538996Z arg0_1, = args 2023-01-11T21:38:06.5539064Z args.clear() 2023-01-11T21:38:06.5539154Z with torch.cuda.device(0): 2023-01-11T21:38:06.5539366Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5539462Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5539618Z triton_fused_add_add_1_add_2_add_3_0.run(arg0_1, buf0, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.5539691Z del arg0_1 2023-01-11T21:38:06.5539897Z buf1 = empty_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.5539969Z buf1.copy_(buf0) 2023-01-11T21:38:06.5540059Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.5540165Z kernel_cpp_1(c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.5540256Z with torch.cuda.device(0): 2023-01-11T21:38:06.5540345Z buf3 = buf0; del buf0 # reuse 2023-01-11T21:38:06.5540426Z buf3.copy_(buf2)
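# Cross-device schedule: buf0 (cuda, triton add kernel) -> buf2 (cpu,
# kernel_cpp_1) -> buf4 (cuda, triton kernel below) -> buf6 (cpu,
# kernel_cpp_3). Each copy_ corresponds to one of the DeviceCopy nodes
# warned about above, and buffers are reused in place between stages.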
2023-01-11T21:38:06.5540520Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:06.5540703Z triton_fused_add_add_1_add_2_add_3_add_4_add_5_add_6_add_7_device_put_device_put_1_2.run(buf4, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.5540792Z buf5 = buf2; del buf2 # reuse 2023-01-11T21:38:06.5540869Z buf5.copy_(buf4) 2023-01-11T21:38:06.5540967Z del buf4 2023-01-11T21:38:06.5541056Z buf6 = buf5; del buf5 # reuse 2023-01-11T21:38:06.5541164Z kernel_cpp_3(c_void_p(buf6.data_ptr())) 2023-01-11T21:38:06.5541238Z return (buf6, ) 2023-01-11T21:38:06.5541243Z 2023-01-11T21:38:06.5541248Z 2023-01-11T21:38:06.5541327Z if __name__ == "__main__": 2023-01-11T21:38:06.5541438Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5541566Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5541778Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5541890Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5541899Z 2023-01-11T21:38:06.5541968Z ok (0.227s) 2023-01-11T21:38:06.5542136Z test_multi_gpu_device_cuda (__main__.CudaTests) ... skip: requires multiple cuda devices (0.000s) 2023-01-11T21:38:06.5542603Z test_multilayer_low_prec_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5542733Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5542991Z [2023-01-11 21:35:23,124] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 731 2023-01-11T21:38:06.5543193Z [2023-01-11 21:35:23,177] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.5543485Z [2023-01-11 21:35:23,362] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 731 2023-01-11T21:38:06.5543907Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5544040Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5544294Z [2023-01-11 21:35:23,375] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 732 2023-01-11T21:38:06.5544501Z [2023-01-11 21:35:23,392] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.5544507Z 2023-01-11T21:38:06.5544600Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5544676Z import torch 2023-01-11T21:38:06.5544754Z import random 2023-01-11T21:38:06.5544866Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5544990Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5544995Z 2023-01-11T21:38:06.5545074Z aten = torch.ops.aten 2023-01-11T21:38:06.5545215Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5545311Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5545317Z 2023-01-11T21:38:06.5545390Z import triton 2023-01-11T21:38:06.5545484Z import triton.language as tl 2023-01-11T21:38:06.5545615Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5545766Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5545772Z 2023-01-11T21:38:06.5545784Z 2023-01-11T21:38:06.5545953Z triton_fused_mean_0 = async_compile.triton(''' 2023-01-11T21:38:06.5546027Z import triton 2023-01-11T21:38:06.5546118Z import triton.language as tl 2023-01-11T21:38:06.5546231Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5546336Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5546467Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5546585Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5546596Z 2023-01-11T21:38:06.5546708Z @reduction(size_hints=[512, 8192], 2023-01-11T21:38:06.5546823Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5546907Z filename=__file__, 2023-01-11T21:38:06.5547267Z meta={'signature': {0: '*fp16', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5547339Z @triton.jit 2023-01-11T21:38:06.5547506Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5547582Z xnumel = 454 2023-01-11T21:38:06.5547649Z rnumel = 8188 2023-01-11T21:38:06.5547748Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5547884Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5547968Z xmask = xindex < xnumel 2023-01-11T21:38:06.5548087Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5548158Z x0 = xindex 2023-01-11T21:38:06.5548276Z _tmp6 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5548374Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5548460Z rindex = roffset + rbase 2023-01-11T21:38:06.5548543Z rmask = rindex < rnumel 2023-01-11T21:38:06.5548613Z r1 = rindex 2023-01-11T21:38:06.5548690Z tmp0 = r1 + (8188*x0) 2023-01-11T21:38:06.5548764Z tmp1 = 3717120 2023-01-11T21:38:06.5548846Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.5549036Z tmp3 = tl.load(in_ptr0 + ((r1 + (8188*x0)) % 3717120 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5549155Z tmp4 = tmp3.to(tl.float32) 
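# Low-precision mean, stage 1: fp16 loads are upcast to fp32 (tmp3/tmp4)
# and accumulated in fp32; tmp2 masks flattened indices >= 3717120 so the
# tail of the last x-slice contributes 0 to its partial sum.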
2023-01-11T21:38:06.5549248Z tmp5 = tl.where(tmp2, tmp4, 0) 2023-01-11T21:38:06.5549371Z _tmp6 = tl.where(xmask & rmask, _tmp6 + tmp5, _tmp6) 2023-01-11T21:38:06.5549486Z tmp6 = tl.reshape(tl.sum(_tmp6, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5549584Z tl.store(out_ptr0 + x0, tmp6, xmask) 2023-01-11T21:38:06.5549672Z ''') 2023-01-11T21:38:06.5549678Z 2023-01-11T21:38:06.5549682Z 2023-01-11T21:38:06.5549837Z triton_fused_mean_1 = async_compile.triton(''' 2023-01-11T21:38:06.5549904Z import triton 2023-01-11T21:38:06.5549997Z import triton.language as tl 2023-01-11T21:38:06.5550110Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5550212Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5550342Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5550462Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5550467Z 2023-01-11T21:38:06.5550557Z @reduction(size_hints=[1, 512], 2023-01-11T21:38:06.5550669Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5550754Z filename=__file__, 2023-01-11T21:38:06.5551117Z meta={'signature': {0: '*fp32', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5551190Z @triton.jit 2023-01-11T21:38:06.5551357Z def triton_(in_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5551427Z xnumel = 1 2023-01-11T21:38:06.5551501Z rnumel = 454 2023-01-11T21:38:06.5551599Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5551728Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5551809Z xmask = xindex < xnumel 2023-01-11T21:38:06.5551927Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5552042Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5552148Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5552237Z rindex = roffset + rbase 2023-01-11T21:38:06.5552321Z rmask = rindex < rnumel 2023-01-11T21:38:06.5552386Z r0 = rindex 2023-01-11T21:38:06.5552485Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5552632Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5552746Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5552821Z tmp2 = 3717120 2023-01-11T21:38:06.5552900Z tmp3 = tmp1 / tmp2 2023-01-11T21:38:06.5552988Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.5553116Z tl.store(out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp4, None) 2023-01-11T21:38:06.5553204Z ''') 2023-01-11T21:38:06.5553209Z 2023-01-11T21:38:06.5553214Z 2023-01-11T21:38:06.5553306Z async_compile.wait(globals()) 2023-01-11T21:38:06.5553380Z del async_compile 2023-01-11T21:38:06.5553385Z 2023-01-11T21:38:06.5553461Z def call(args): 2023-01-11T21:38:06.5553534Z arg0_1, = args 2023-01-11T21:38:06.5553607Z args.clear() 2023-01-11T21:38:06.5553698Z with torch.cuda.device(0): 2023-01-11T21:38:06.5553890Z buf0 = empty_strided((454, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5553982Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5554126Z triton_fused_mean_0.run(arg0_1, buf0, 454, 8188, grid=grid(454), stream=stream0) 2023-01-11T21:38:06.5554200Z del arg0_1 2023-01-11T21:38:06.5554386Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5554525Z triton_fused_mean_1.run(buf0, buf2, 1, 454, grid=grid(1), stream=stream0) 
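# Stage 2 sums the 454 fp32 partials from buf0, divides by the element
# count (3717120) and only then downcasts to fp16 -- effectively
# x.float().mean().half() rather than a mean accumulated in fp16.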
2023-01-11T21:38:06.5554602Z return (buf2, ) 2023-01-11T21:38:06.5554607Z 2023-01-11T21:38:06.5554612Z 2023-01-11T21:38:06.5554684Z if __name__ == "__main__": 2023-01-11T21:38:06.5554804Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5554933Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5555194Z arg0_1 = rand_strided((10, 3, 352, 352), (371712, 123904, 352, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5555311Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5555317Z 2023-01-11T21:38:06.5555627Z [2023-01-11 21:35:23,394] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 732 2023-01-11T21:38:06.5555633Z 2023-01-11T21:38:06.5555731Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5555804Z import torch 2023-01-11T21:38:06.5555880Z import random 2023-01-11T21:38:06.5555992Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5556113Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5556118Z 2023-01-11T21:38:06.5556201Z aten = torch.ops.aten 2023-01-11T21:38:06.5556335Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5556429Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5556437Z 2023-01-11T21:38:06.5556511Z import triton 2023-01-11T21:38:06.5556603Z import triton.language as tl 2023-01-11T21:38:06.5556720Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5556859Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5556864Z 2023-01-11T21:38:06.5556868Z 2023-01-11T21:38:06.5557025Z triton_fused_mean_0 = async_compile.triton(''' 2023-01-11T21:38:06.5557099Z import triton 2023-01-11T21:38:06.5557190Z import triton.language as tl 2023-01-11T21:38:06.5557302Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5557403Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5557533Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5557650Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5557655Z 2023-01-11T21:38:06.5557747Z @reduction(size_hints=[512, 8192], 2023-01-11T21:38:06.5557859Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5557948Z filename=__file__, 2023-01-11T21:38:06.5558301Z meta={'signature': {0: '*fp16', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5558372Z @triton.jit 2023-01-11T21:38:06.5558568Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5558641Z xnumel = 454 2023-01-11T21:38:06.5558709Z rnumel = 8188 2023-01-11T21:38:06.5558806Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5558944Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5559027Z xmask = xindex < xnumel 2023-01-11T21:38:06.5559145Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5559215Z x0 = xindex 2023-01-11T21:38:06.5559333Z _tmp6 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5559434Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5559520Z rindex = roffset + rbase 2023-01-11T21:38:06.5559604Z rmask = rindex < rnumel 2023-01-11T21:38:06.5559674Z r1 = rindex 2023-01-11T21:38:06.5559753Z tmp0 = r1 + (8188*x0) 2023-01-11T21:38:06.5559827Z tmp1 = 3717120 
2023-01-11T21:38:06.5559904Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.5560097Z tmp3 = tl.load(in_ptr0 + ((r1 + (8188*x0)) % 3717120 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5560187Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.5560281Z tmp5 = tl.where(tmp2, tmp4, 0) 2023-01-11T21:38:06.5560402Z _tmp6 = tl.where(xmask & rmask, _tmp6 + tmp5, _tmp6) 2023-01-11T21:38:06.5560515Z tmp6 = tl.reshape(tl.sum(_tmp6, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5560613Z tl.store(out_ptr0 + x0, tmp6, xmask) 2023-01-11T21:38:06.5560698Z ''') 2023-01-11T21:38:06.5560704Z 2023-01-11T21:38:06.5560736Z 2023-01-11T21:38:06.5560885Z triton_fused_mean_1 = async_compile.triton(''' 2023-01-11T21:38:06.5560958Z import triton 2023-01-11T21:38:06.5561050Z import triton.language as tl 2023-01-11T21:38:06.5561163Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5561264Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5561393Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5561517Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5561523Z 2023-01-11T21:38:06.5561610Z @reduction(size_hints=[1, 512], 2023-01-11T21:38:06.5561718Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5561803Z filename=__file__, 2023-01-11T21:38:06.5562158Z meta={'signature': {0: '*fp32', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5562235Z @triton.jit 2023-01-11T21:38:06.5562405Z def triton_(in_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5562477Z xnumel = 1 2023-01-11T21:38:06.5562550Z rnumel = 454 2023-01-11T21:38:06.5562639Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5562773Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5562857Z xmask = xindex < xnumel 2023-01-11T21:38:06.5562974Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5563091Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5563198Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5563286Z rindex = roffset + rbase 2023-01-11T21:38:06.5563364Z rmask = rindex < rnumel 2023-01-11T21:38:06.5563437Z r0 = rindex 2023-01-11T21:38:06.5563539Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5563658Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5563774Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5563847Z tmp2 = 3717120 2023-01-11T21:38:06.5563926Z tmp3 = tmp1 / tmp2 2023-01-11T21:38:06.5564007Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.5564140Z tl.store(out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp4, None) 2023-01-11T21:38:06.5564226Z ''') 2023-01-11T21:38:06.5564259Z 2023-01-11T21:38:06.5564264Z 2023-01-11T21:38:06.5564357Z async_compile.wait(globals()) 2023-01-11T21:38:06.5564434Z del async_compile 2023-01-11T21:38:06.5564439Z 2023-01-11T21:38:06.5564514Z def call(args): 2023-01-11T21:38:06.5564586Z arg0_1, = args 2023-01-11T21:38:06.5564660Z args.clear() 2023-01-11T21:38:06.5564744Z with torch.cuda.device(0): 2023-01-11T21:38:06.5564944Z buf0 = empty_strided((454, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5565034Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5565174Z triton_fused_mean_0.run(arg0_1, buf0, 454, 8188, grid=grid(454), 
stream=stream0) 2023-01-11T21:38:06.5565251Z del arg0_1 2023-01-11T21:38:06.5565437Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5565573Z triton_fused_mean_1.run(buf0, buf2, 1, 454, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5565644Z return (buf2, ) 2023-01-11T21:38:06.5565657Z 2023-01-11T21:38:06.5565664Z 2023-01-11T21:38:06.5565737Z if __name__ == "__main__": 2023-01-11T21:38:06.5565854Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5565980Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5566214Z arg0_1 = rand_strided((10, 3, 352, 352), (371712, 123904, 352, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5566327Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5566332Z 2023-01-11T21:38:06.5566402Z ok (0.367s) 2023-01-11T21:38:06.5566558Z test_nan_to_num_cuda (__main__.CudaTests) ... skip: Skipping due to op bugs (0.001s) 2023-01-11T21:38:06.5567048Z test_narrow_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5567182Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5567433Z [2023-01-11 21:35:23,426] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 733 2023-01-11T21:38:06.5567696Z [2023-01-11 21:35:23,497] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 733 2023-01-11T21:38:06.5568107Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5568242Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5568504Z [2023-01-11 21:35:23,528] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 734 2023-01-11T21:38:06.5568766Z [2023-01-11 21:35:23,598] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 734 2023-01-11T21:38:06.5568772Z 2023-01-11T21:38:06.5568869Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5568941Z import torch 2023-01-11T21:38:06.5569017Z import random 2023-01-11T21:38:06.5569129Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5569254Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5569259Z 2023-01-11T21:38:06.5569340Z aten = torch.ops.aten 2023-01-11T21:38:06.5569476Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5569575Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5569580Z 2023-01-11T21:38:06.5569654Z import triton 2023-01-11T21:38:06.5569745Z import triton.language as tl 2023-01-11T21:38:06.5569870Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5570028Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5570034Z 2023-01-11T21:38:06.5570039Z 2023-01-11T21:38:06.5570197Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5570272Z import triton 2023-01-11T21:38:06.5570363Z import triton.language as tl 2023-01-11T21:38:06.5570476Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5570577Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5570709Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5570826Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5570835Z 2023-01-11T21:38:06.5571235Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5571313Z @triton.jit 2023-01-11T21:38:06.5571446Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5571521Z xnumel = 1024 2023-01-11T21:38:06.5571618Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5571747Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5571829Z xmask = xindex < xnumel 2023-01-11T21:38:06.5571892Z x0 = xindex 2023-01-11T21:38:06.5572089Z tmp0 = tl.load(in_ptr0 + (640 + x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5572158Z tmp1 = 2 2023-01-11T21:38:06.5572237Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5572306Z tmp3 = 1 2023-01-11T21:38:06.5572384Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5572517Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5572636Z ''') 2023-01-11T21:38:06.5572642Z 2023-01-11T21:38:06.5572653Z 2023-01-11T21:38:06.5572740Z async_compile.wait(globals()) 2023-01-11T21:38:06.5572817Z del async_compile 2023-01-11T21:38:06.5572822Z 2023-01-11T21:38:06.5572896Z def call(args): 2023-01-11T21:38:06.5572973Z arg0_1, = args 2023-01-11T21:38:06.5573047Z args.clear() 2023-01-11T21:38:06.5573139Z with torch.cuda.device(0): 2023-01-11T21:38:06.5573343Z buf0 = empty_strided((16, 64), (64, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5573429Z 
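# The narrow itself generates no data movement for the first output: it is
# returned below as an as_strided view of arg0_1 with storage offset 10.
# Only the add is compiled to a kernel, which reads arg0_1 at a fixed
# 640-element offset.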
stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5573571Z triton_fused_add_1_0.run(arg0_1, buf0, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.5573684Z return (as_strided(arg0_1, (64, 16), (64, 1), 10), buf0, ) 2023-01-11T21:38:06.5573690Z 2023-01-11T21:38:06.5573694Z 2023-01-11T21:38:06.5573772Z if __name__ == "__main__": 2023-01-11T21:38:06.5573892Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5574023Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5574229Z arg0_1 = rand_strided((64, 64), (64, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5574341Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5574347Z 2023-01-11T21:38:06.5574354Z 2023-01-11T21:38:06.5574444Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5574633Z import torch 2023-01-11T21:38:06.5574708Z import random 2023-01-11T21:38:06.5574826Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5574949Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5574954Z 2023-01-11T21:38:06.5575035Z aten = torch.ops.aten 2023-01-11T21:38:06.5575171Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5575267Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5575272Z 2023-01-11T21:38:06.5575338Z import triton 2023-01-11T21:38:06.5575436Z import triton.language as tl 2023-01-11T21:38:06.5575559Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5575697Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5575702Z 2023-01-11T21:38:06.5575707Z 2023-01-11T21:38:06.5575865Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5575981Z import triton 2023-01-11T21:38:06.5576075Z import triton.language as tl 2023-01-11T21:38:06.5576181Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5576283Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5576414Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5576536Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5576541Z 2023-01-11T21:38:06.5576942Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5577019Z @triton.jit 2023-01-11T21:38:06.5577206Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5577294Z xnumel = 1024 2023-01-11T21:38:06.5577386Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5577536Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5577619Z xmask = xindex < xnumel 2023-01-11T21:38:06.5577689Z x0 = xindex 2023-01-11T21:38:06.5577906Z tmp0 = tl.load(in_ptr0 + (640 + x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5577976Z tmp1 = 2 2023-01-11T21:38:06.5578054Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5578117Z tmp3 = 1 2023-01-11T21:38:06.5578194Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5578329Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5578413Z ''') 2023-01-11T21:38:06.5578419Z 2023-01-11T21:38:06.5578423Z 2023-01-11T21:38:06.5578514Z async_compile.wait(globals()) 2023-01-11T21:38:06.5578664Z del async_compile 2023-01-11T21:38:06.5578670Z 2023-01-11T21:38:06.5578745Z def call(args): 2023-01-11T21:38:06.5578819Z arg0_1, = 
args 2023-01-11T21:38:06.5578886Z args.clear() 2023-01-11T21:38:06.5578977Z with torch.cuda.device(0): 2023-01-11T21:38:06.5579188Z buf0 = empty_strided((16, 64), (64, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5579279Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5579419Z triton_fused_add_1_0.run(arg0_1, buf0, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.5579532Z return (as_strided(arg0_1, (64, 16), (64, 1), 10), buf0, ) 2023-01-11T21:38:06.5579538Z 2023-01-11T21:38:06.5579542Z 2023-01-11T21:38:06.5579621Z if __name__ == "__main__": 2023-01-11T21:38:06.5579737Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5579855Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5580057Z arg0_1 = rand_strided((64, 64), (64, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5580173Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5580178Z 2023-01-11T21:38:06.5580250Z ok (0.203s) 2023-01-11T21:38:06.5580725Z test_new_empty_strided_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5580857Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5581112Z [2023-01-11 21:35:23,638] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 735 2023-01-11T21:38:06.5581377Z [2023-01-11 21:35:23,822] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 735 2023-01-11T21:38:06.5581826Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5581958Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5582207Z [2023-01-11 21:35:23,859] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 736 2023-01-11T21:38:06.5582469Z [2023-01-11 21:35:23,923] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 736 2023-01-11T21:38:06.5582475Z 2023-01-11T21:38:06.5582574Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5582647Z import torch 2023-01-11T21:38:06.5582721Z import random 2023-01-11T21:38:06.5582837Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5582966Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5582971Z 2023-01-11T21:38:06.5583053Z aten = torch.ops.aten 2023-01-11T21:38:06.5583182Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5583276Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5583281Z 2023-01-11T21:38:06.5583356Z import triton 2023-01-11T21:38:06.5583450Z import triton.language as tl 2023-01-11T21:38:06.5583573Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5583713Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5583718Z 2023-01-11T21:38:06.5583723Z 2023-01-11T21:38:06.5583908Z triton_fused_fill__new_empty_strided_0 = async_compile.triton(''' 2023-01-11T21:38:06.5583983Z import triton 2023-01-11T21:38:06.5584069Z import triton.language as tl 2023-01-11T21:38:06.5584183Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5584283Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5584444Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5584572Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5584578Z 2023-01-11T21:38:06.5584972Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5585044Z @triton.jit 2023-01-11T21:38:06.5585165Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5585233Z xnumel = 16384 2023-01-11T21:38:06.5585331Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5585456Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5585539Z xmask = xindex < xnumel 2023-01-11T21:38:06.5585616Z x0 = xindex 2023-01-11T21:38:06.5585705Z tmp0 = 123 2023-01-11T21:38:06.5585854Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.5585944Z ''') 2023-01-11T21:38:06.5585949Z 2023-01-11T21:38:06.5585960Z 2023-01-11T21:38:06.5586046Z async_compile.wait(globals()) 2023-01-11T21:38:06.5586121Z del async_compile 2023-01-11T21:38:06.5586126Z 2023-01-11T21:38:06.5586201Z def call(args): 2023-01-11T21:38:06.5586272Z arg0_1, = args 2023-01-11T21:38:06.5586348Z args.clear() 2023-01-11T21:38:06.5586439Z with torch.cuda.device(0): 2023-01-11T21:38:06.5586653Z buf0 = empty_strided((1, 128, 128), (16384, 128, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5586746Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5586901Z triton_fused_fill__new_empty_strided_0.run(buf0, 16384, grid=grid(16384), stream=stream0) 2023-01-11T21:38:06.5586978Z return (buf0, ) 2023-01-11T21:38:06.5586983Z 
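# Illustrative sketch (hedged; not part of the generated module): the call()
# wrapper above never reads arg0_1 -- it only allocates a (1, 128, 128)
# float32 buffer and launches triton_fused_fill__new_empty_strided_0 to write
# the constant 123 into all 16384 elements. In eager PyTorch the wrapper is
# roughly equivalent to:
#
#     import torch
#
#     def call_eager(args):
#         args.clear()  # mirrors the generated wrapper's argument handling
#         # new_empty_strided(...) followed by fill_(123) collapses to full()
#         return (torch.full((1, 128, 128), 123.0,
#                            device='cuda', dtype=torch.float32),)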
2023-01-11T21:38:06.5586988Z 2023-01-11T21:38:06.5587066Z if __name__ == "__main__": 2023-01-11T21:38:06.5587186Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5587314Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5587510Z arg0_1 = rand_strided((55, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5587615Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5587629Z 2023-01-11T21:38:06.5587633Z 2023-01-11T21:38:06.5587752Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5587826Z import torch 2023-01-11T21:38:06.5587899Z import random 2023-01-11T21:38:06.5588020Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5588143Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5588148Z 2023-01-11T21:38:06.5588227Z aten = torch.ops.aten 2023-01-11T21:38:06.5588359Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5588447Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5588456Z 2023-01-11T21:38:06.5588522Z import triton 2023-01-11T21:38:06.5588612Z import triton.language as tl 2023-01-11T21:38:06.5588739Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5588878Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5588883Z 2023-01-11T21:38:06.5588887Z 2023-01-11T21:38:06.5589070Z triton_fused_fill__new_empty_strided_0 = async_compile.triton(''' 2023-01-11T21:38:06.5589143Z import triton 2023-01-11T21:38:06.5589236Z import triton.language as tl 2023-01-11T21:38:06.5589344Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5589447Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5589579Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5589702Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5589707Z 2023-01-11T21:38:06.5590096Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5590197Z @triton.jit 2023-01-11T21:38:06.5590317Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5590394Z xnumel = 16384 2023-01-11T21:38:06.5590484Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5590610Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5590693Z xmask = xindex < xnumel 2023-01-11T21:38:06.5590763Z x0 = xindex 2023-01-11T21:38:06.5590836Z tmp0 = 123 2023-01-11T21:38:06.5590970Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.5591056Z ''') 2023-01-11T21:38:06.5591061Z 2023-01-11T21:38:06.5591065Z 2023-01-11T21:38:06.5591160Z async_compile.wait(globals()) 2023-01-11T21:38:06.5591229Z del async_compile 2023-01-11T21:38:06.5591234Z 2023-01-11T21:38:06.5591307Z def call(args): 2023-01-11T21:38:06.5591379Z arg0_1, = args 2023-01-11T21:38:06.5591451Z args.clear() 2023-01-11T21:38:06.5591546Z with torch.cuda.device(0): 2023-01-11T21:38:06.5591768Z buf0 = empty_strided((1, 128, 128), (16384, 128, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5591861Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5592008Z triton_fused_fill__new_empty_strided_0.run(buf0, 16384, grid=grid(16384), stream=stream0) 2023-01-11T21:38:06.5592086Z return (buf0, ) 2023-01-11T21:38:06.5592093Z 2023-01-11T21:38:06.5592097Z 
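# Illustrative sketch (hedged; not part of the generated module): grid(16384)
# in the launch above builds the 1-D launch geometry for the pointwise kernel.
# Conceptually it is a ceil-division of xnumel by the autotuned block size;
# grid_1d below is a hypothetical stand-in for
# torch._inductor.triton_ops.autotune.grid, shown only to make the launch
# arithmetic concrete:
#
#     def grid_1d(xnumel):
#         # one program instance per XBLOCK-sized chunk of the flat range
#         return lambda meta: ((xnumel + meta["XBLOCK"] - 1) // meta["XBLOCK"],)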
2023-01-11T21:38:06.5592177Z if __name__ == "__main__": 2023-01-11T21:38:06.5592295Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5592420Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5592620Z arg0_1 = rand_strided((55, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5592733Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5592738Z 2023-01-11T21:38:06.5592806Z ok (0.324s) 2023-01-11T21:38:06.5593256Z test_new_ones_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5593417Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5593678Z [2023-01-11 21:35:23,985] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 737 2023-01-11T21:38:06.5593942Z [2023-01-11 21:35:24,052] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 737 2023-01-11T21:38:06.5594356Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5594489Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5594742Z [2023-01-11 21:35:24,113] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 738 2023-01-11T21:38:06.5595002Z [2023-01-11 21:35:24,123] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 738 2023-01-11T21:38:06.5595007Z 2023-01-11T21:38:06.5595105Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5595178Z import torch 2023-01-11T21:38:06.5595246Z import random 2023-01-11T21:38:06.5595382Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5595520Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5595526Z 2023-01-11T21:38:06.5595621Z aten = torch.ops.aten 2023-01-11T21:38:06.5595758Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5595853Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5595859Z 2023-01-11T21:38:06.5595960Z import triton 2023-01-11T21:38:06.5596054Z import triton.language as tl 2023-01-11T21:38:06.5596172Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5596312Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5596317Z 2023-01-11T21:38:06.5596322Z 2023-01-11T21:38:06.5596480Z triton_fused_full_0 = async_compile.triton(''' 2023-01-11T21:38:06.5596554Z import triton 2023-01-11T21:38:06.5596645Z import triton.language as tl 2023-01-11T21:38:06.5596756Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5602983Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5603144Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5603275Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5603281Z 2023-01-11T21:38:06.5603687Z @pointwise(size_hints=[1], 
filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.5603769Z @triton.jit 2023-01-11T21:38:06.5603893Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5603966Z xnumel = 1 2023-01-11T21:38:06.5604067Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5604202Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5604293Z xmask = xindex < xnumel 2023-01-11T21:38:06.5604357Z tmp0 = 1 2023-01-11T21:38:06.5604492Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.5604579Z ''') 2023-01-11T21:38:06.5604585Z 2023-01-11T21:38:06.5604589Z 2023-01-11T21:38:06.5604751Z triton_fused_new_zeros_1 = async_compile.triton(''' 2023-01-11T21:38:06.5604822Z import triton 2023-01-11T21:38:06.5604918Z import triton.language as tl 2023-01-11T21:38:06.5605030Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5605124Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5605261Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5605385Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5605390Z 2023-01-11T21:38:06.5605828Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.5605907Z @triton.jit 2023-01-11T21:38:06.5606025Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5606099Z xnumel = 1 2023-01-11T21:38:06.5606196Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5606318Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5606400Z xmask = xindex < xnumel 2023-01-11T21:38:06.5606470Z tmp0 = 0 2023-01-11T21:38:06.5606602Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.5606688Z ''') 2023-01-11T21:38:06.5606696Z 2023-01-11T21:38:06.5606701Z 2023-01-11T21:38:06.5606795Z async_compile.wait(globals()) 2023-01-11T21:38:06.5606871Z del async_compile 2023-01-11T21:38:06.5606876Z 2023-01-11T21:38:06.5606949Z def call(args): 2023-01-11T21:38:06.5607015Z arg0_1, = args 2023-01-11T21:38:06.5607092Z args.clear() 2023-01-11T21:38:06.5607189Z with torch.cuda.device(0): 2023-01-11T21:38:06.5607380Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5607472Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5607603Z triton_fused_full_0.run(buf0, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5607790Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5607919Z triton_fused_new_zeros_1.run(buf1, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5608004Z return (buf0, buf1, ) 2023-01-11T21:38:06.5608009Z 2023-01-11T21:38:06.5608014Z 2023-01-11T21:38:06.5608094Z if __name__ == "__main__": 2023-01-11T21:38:06.5608287Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5608412Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5608608Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5608716Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5608724Z 2023-01-11T21:38:06.5608729Z 2023-01-11T21:38:06.5608824Z from ctypes import c_void_p, c_long 
2023-01-11T21:38:06.5608890Z import torch 2023-01-11T21:38:06.5608965Z import random 2023-01-11T21:38:06.5609085Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5609209Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5609214Z 2023-01-11T21:38:06.5609299Z aten = torch.ops.aten 2023-01-11T21:38:06.5609434Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5609531Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5609536Z 2023-01-11T21:38:06.5609614Z import triton 2023-01-11T21:38:06.5609699Z import triton.language as tl 2023-01-11T21:38:06.5609822Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5609958Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5609964Z 2023-01-11T21:38:06.5609968Z 2023-01-11T21:38:06.5610126Z triton_fused_full_0 = async_compile.triton(''' 2023-01-11T21:38:06.5610200Z import triton 2023-01-11T21:38:06.5610291Z import triton.language as tl 2023-01-11T21:38:06.5610402Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5610496Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5610631Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5610754Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5610760Z 2023-01-11T21:38:06.5611139Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.5611214Z @triton.jit 2023-01-11T21:38:06.5611335Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5611407Z xnumel = 1 2023-01-11T21:38:06.5611504Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5611653Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5611738Z xmask = xindex < xnumel 2023-01-11T21:38:06.5611809Z tmp0 = 1 2023-01-11T21:38:06.5611940Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.5612024Z ''') 2023-01-11T21:38:06.5612029Z 2023-01-11T21:38:06.5612034Z 2023-01-11T21:38:06.5612196Z triton_fused_new_zeros_1 = async_compile.triton(''' 2023-01-11T21:38:06.5612270Z import triton 2023-01-11T21:38:06.5612362Z import triton.language as tl 2023-01-11T21:38:06.5612467Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5612568Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5612700Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5612823Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5612828Z 2023-01-11T21:38:06.5613204Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.5613277Z @triton.jit 2023-01-11T21:38:06.5613396Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5613469Z xnumel = 1 2023-01-11T21:38:06.5613559Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5613686Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5613768Z xmask = xindex < xnumel 2023-01-11T21:38:06.5613836Z tmp0 = 0 2023-01-11T21:38:06.5613964Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.5614047Z ''') 2023-01-11T21:38:06.5614094Z 
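# Illustrative sketch (hedged; not part of the generated module): both scalar
# kernels above follow the same pointwise template seen throughout this log --
# build a flat block of indices, mask off the lanes beyond xnumel, then store:
#
#     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
#     xmask = xindex < xnumel
#     tl.store(out_ptr0 + xindex, value, xmask)
#
# With xnumel == 1, every lane of the single program instance writes the same
# scalar to offset 0, so the generated stores can safely pass mask=None.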
2023-01-11T21:38:06.5614098Z 2023-01-11T21:38:06.5614191Z async_compile.wait(globals()) 2023-01-11T21:38:06.5614261Z del async_compile 2023-01-11T21:38:06.5614266Z 2023-01-11T21:38:06.5614340Z def call(args): 2023-01-11T21:38:06.5614414Z arg0_1, = args 2023-01-11T21:38:06.5614724Z args.clear() 2023-01-11T21:38:06.5614820Z with torch.cuda.device(0): 2023-01-11T21:38:06.5615017Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5615109Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5615233Z triton_fused_full_0.run(buf0, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5615420Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5615554Z triton_fused_new_zeros_1.run(buf1, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5615636Z return (buf0, buf1, ) 2023-01-11T21:38:06.5615643Z 2023-01-11T21:38:06.5615647Z 2023-01-11T21:38:06.5615727Z if __name__ == "__main__": 2023-01-11T21:38:06.5615849Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5615976Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5616172Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5616277Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5616285Z 2023-01-11T21:38:06.5616356Z ok (0.200s) 2023-01-11T21:38:06.5616822Z test_nll_loss_forward_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5616954Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5617272Z [2023-01-11 21:35:24,174] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 739 2023-01-11T21:38:06.5617562Z [2023-01-11 21:35:24,357] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 739 2023-01-11T21:38:06.5618033Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5618170Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5618430Z [2023-01-11 21:35:24,406] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 740 2023-01-11T21:38:06.5618644Z [2023-01-11 21:35:24,416] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.5618909Z [2023-01-11 21:35:24,493] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 740 2023-01-11T21:38:06.5618918Z 2023-01-11T21:38:06.5619011Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5619088Z import torch 2023-01-11T21:38:06.5619163Z import random 2023-01-11T21:38:06.5619285Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5619413Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5619419Z 2023-01-11T21:38:06.5619507Z aten = torch.ops.aten 2023-01-11T21:38:06.5619646Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5619742Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5619747Z 2023-01-11T21:38:06.5619816Z import triton 2023-01-11T21:38:06.5619909Z import triton.language as tl 2023-01-11T21:38:06.5620035Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5620175Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5620181Z 2023-01-11T21:38:06.5620185Z 2023-01-11T21:38:06.5620441Z triton_fused_gather_mean_neg_squeeze_unsqueeze_0 = async_compile.triton(''' 2023-01-11T21:38:06.5620519Z import triton 2023-01-11T21:38:06.5620612Z import triton.language as tl 2023-01-11T21:38:06.5620721Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5620828Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5620961Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5621088Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5621093Z 2023-01-11T21:38:06.5621182Z @reduction(size_hints=[1, 8], 2023-01-11T21:38:06.5621300Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5621388Z filename=__file__, 2023-01-11T21:38:06.5621776Z meta={'signature': {0: '*fp32', 1: '*i64', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5621848Z @triton.jit 2023-01-11T21:38:06.5622032Z def triton_(in_out_ptr0, in_ptr0, in_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5622106Z xnumel = 1 2023-01-11T21:38:06.5622180Z rnumel = 5 2023-01-11T21:38:06.5622279Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5622422Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5622508Z xmask = xindex < xnumel 2023-01-11T21:38:06.5622627Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5622741Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5622848Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5622939Z rindex = roffset + rbase 2023-01-11T21:38:06.5623028Z rmask = rindex < rnumel 2023-01-11T21:38:06.5623103Z r0 = rindex 2023-01-11T21:38:06.5623208Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5623321Z tmp1 = tl.load(in_ptr1 + (tmp0 + (5*r0)), rmask) 2023-01-11T21:38:06.5623426Z tmp2 = -tmp1 
2023-01-11T21:38:06.5623550Z _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3) 2023-01-11T21:38:06.5623669Z tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5623742Z tmp4 = 5 2023-01-11T21:38:06.5623824Z tmp5 = tmp3 / tmp4 2023-01-11T21:38:06.5623991Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp5, None) 2023-01-11T21:38:06.5624080Z ''') 2023-01-11T21:38:06.5624086Z 2023-01-11T21:38:06.5624091Z 2023-01-11T21:38:06.5624241Z triton_fused_full_1 = async_compile.triton(''' 2023-01-11T21:38:06.5624319Z import triton 2023-01-11T21:38:06.5624413Z import triton.language as tl 2023-01-11T21:38:06.5624529Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5624634Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5624769Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5624897Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5624904Z 2023-01-11T21:38:06.5625293Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.5625363Z @triton.jit 2023-01-11T21:38:06.5625487Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5625560Z xnumel = 1 2023-01-11T21:38:06.5625676Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5625822Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5625919Z xmask = xindex < xnumel 2023-01-11T21:38:06.5625994Z tmp0 = 5.0 2023-01-11T21:38:06.5626122Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.5626208Z ''') 2023-01-11T21:38:06.5626213Z 2023-01-11T21:38:06.5626218Z 2023-01-11T21:38:06.5626313Z async_compile.wait(globals()) 2023-01-11T21:38:06.5626390Z del async_compile 2023-01-11T21:38:06.5626426Z 2023-01-11T21:38:06.5626503Z def call(args): 2023-01-11T21:38:06.5626584Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5626661Z args.clear() 2023-01-11T21:38:06.5626749Z with torch.cuda.device(0): 2023-01-11T21:38:06.5626941Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5627037Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.5627130Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5627308Z triton_fused_gather_mean_neg_squeeze_unsqueeze_0.run(buf1, arg1_1, arg0_1, 1, 5, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5627384Z del arg0_1 2023-01-11T21:38:06.5627459Z del arg1_1 2023-01-11T21:38:06.5627651Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5627778Z triton_fused_full_1.run(buf2, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5627864Z return (buf1, buf2, ) 2023-01-11T21:38:06.5627869Z 2023-01-11T21:38:06.5627877Z 2023-01-11T21:38:06.5627960Z if __name__ == "__main__": 2023-01-11T21:38:06.5628080Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5628212Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5628414Z arg0_1 = rand_strided((5, 5), (5, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5628613Z arg1_1 = rand_strided((5, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5628733Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5628738Z 2023-01-11T21:38:06.5628743Z 2023-01-11T21:38:06.5628836Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5628912Z import torch 
2023-01-11T21:38:06.5628987Z import random 2023-01-11T21:38:06.5629108Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5629231Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5629236Z 2023-01-11T21:38:06.5629319Z aten = torch.ops.aten 2023-01-11T21:38:06.5629458Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5629559Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5629564Z 2023-01-11T21:38:06.5629634Z import triton 2023-01-11T21:38:06.5629728Z import triton.language as tl 2023-01-11T21:38:06.5629854Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5630026Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5630032Z 2023-01-11T21:38:06.5630036Z 2023-01-11T21:38:06.5630248Z triton_fused_gather_mean_neg_squeeze_unsqueeze_0 = async_compile.triton(''' 2023-01-11T21:38:06.5630325Z import triton 2023-01-11T21:38:06.5630417Z import triton.language as tl 2023-01-11T21:38:06.5630526Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5630629Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5630761Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5630887Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5630892Z 2023-01-11T21:38:06.5630986Z @reduction(size_hints=[1, 8], 2023-01-11T21:38:06.5631103Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5631191Z filename=__file__, 2023-01-11T21:38:06.5631569Z meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5631639Z @triton.jit 2023-01-11T21:38:06.5631818Z def triton_(in_ptr0, in_ptr1, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5631892Z xnumel = 1 2023-01-11T21:38:06.5631966Z rnumel = 5 2023-01-11T21:38:06.5632065Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5632204Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5632290Z xmask = xindex < xnumel 2023-01-11T21:38:06.5632409Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5632556Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5632662Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5632752Z rindex = roffset + rbase 2023-01-11T21:38:06.5632839Z rmask = rindex < rnumel 2023-01-11T21:38:06.5632912Z r0 = rindex 2023-01-11T21:38:06.5633018Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5633148Z tmp1 = tl.load(in_ptr1 + (tmp0 + (5*r0)), rmask).to(tl.float32) 2023-01-11T21:38:06.5633250Z tmp2 = -tmp1 2023-01-11T21:38:06.5633344Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.5633468Z _tmp4 = tl.where(xmask & rmask, _tmp4 + tmp3, _tmp4) 2023-01-11T21:38:06.5633583Z tmp4 = tl.reshape(tl.sum(_tmp4, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5633656Z tmp5 = 5 2023-01-11T21:38:06.5633738Z tmp6 = tmp4 / tmp5 2023-01-11T21:38:06.5633821Z tmp7 = tmp6.to(tl.float32) 2023-01-11T21:38:06.5633959Z tl.store(out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp7, None) 2023-01-11T21:38:06.5634048Z ''') 2023-01-11T21:38:06.5634054Z 2023-01-11T21:38:06.5634058Z 2023-01-11T21:38:06.5634216Z triton_fused_full_1 = async_compile.triton(''' 2023-01-11T21:38:06.5634291Z import triton 2023-01-11T21:38:06.5634384Z import triton.language as tl 
2023-01-11T21:38:06.5634499Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5634606Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5634735Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5634862Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5634867Z 2023-01-11T21:38:06.5635257Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.5635344Z @triton.jit 2023-01-11T21:38:06.5635483Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5635579Z xnumel = 1 2023-01-11T21:38:06.5635683Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5635814Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5635892Z xmask = xindex < xnumel 2023-01-11T21:38:06.5635965Z tmp0 = 5.0 2023-01-11T21:38:06.5636129Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.5636218Z ''') 2023-01-11T21:38:06.5636223Z 2023-01-11T21:38:06.5636228Z 2023-01-11T21:38:06.5636323Z async_compile.wait(globals()) 2023-01-11T21:38:06.5636401Z del async_compile 2023-01-11T21:38:06.5636406Z 2023-01-11T21:38:06.5636483Z def call(args): 2023-01-11T21:38:06.5636569Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5636640Z args.clear() 2023-01-11T21:38:06.5636735Z with torch.cuda.device(0): 2023-01-11T21:38:06.5636927Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5637022Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5637204Z triton_fused_gather_mean_neg_squeeze_unsqueeze_0.run(arg1_1, arg0_1, buf1, 1, 5, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5637283Z del arg0_1 2023-01-11T21:38:06.5637357Z del arg1_1 2023-01-11T21:38:06.5637542Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5637676Z triton_fused_full_1.run(buf2, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5637762Z return (buf1, buf2, ) 2023-01-11T21:38:06.5637768Z 2023-01-11T21:38:06.5637772Z 2023-01-11T21:38:06.5637854Z if __name__ == "__main__": 2023-01-11T21:38:06.5637972Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5638102Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5638307Z arg0_1 = rand_strided((5, 5), (5, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5638504Z arg1_1 = rand_strided((5, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5638620Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5638651Z 2023-01-11T21:38:06.5638731Z ok (0.370s) 2023-01-11T21:38:06.5639211Z test_no_mega_fusion_during_lowering_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5639346Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5639608Z [2023-01-11 21:35:24,674] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 741 2023-01-11T21:38:06.5639614Z 2023-01-11T21:38:06.5639717Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5639793Z import torch 2023-01-11T21:38:06.5639872Z import random 2023-01-11T21:38:06.5639995Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5640118Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5640123Z 2023-01-11T21:38:06.5640206Z aten = torch.ops.aten 2023-01-11T21:38:06.5640344Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5640445Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5640452Z 2023-01-11T21:38:06.5640528Z import triton 2023-01-11T21:38:06.5640623Z import triton.language as tl 2023-01-11T21:38:06.5640749Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5640883Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5640897Z 2023-01-11T21:38:06.5640901Z 2023-01-11T21:38:06.5641126Z triton_fused_add_add_1_add_2_add_3_add_4_add_5_add_6_add_7_add_8_0 = async_compile.triton(''' 2023-01-11T21:38:06.5641203Z import triton 2023-01-11T21:38:06.5641295Z import triton.language as tl 2023-01-11T21:38:06.5641411Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5641519Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5641654Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5641782Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5641786Z 2023-01-11T21:38:06.5642343Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: '*fp32', 8: '*fp32', 9: '*fp32', 10: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10), equal_to_1=())]}) 2023-01-11T21:38:06.5642420Z @triton.jit 2023-01-11T21:38:06.5642613Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, in_ptr5, in_ptr6, in_ptr7, in_ptr8, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5642688Z xnumel = 64 2023-01-11T21:38:06.5642788Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5642920Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5643010Z xmask = xindex < xnumel 2023-01-11T21:38:06.5643083Z x0 = xindex 2023-01-11T21:38:06.5643182Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5643275Z tmp2 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.5643372Z tmp4 = tl.load(in_ptr2 + (x0), xmask) 2023-01-11T21:38:06.5643472Z tmp6 = tl.load(in_ptr3 + (x0), xmask) 2023-01-11T21:38:06.5643568Z tmp8 = tl.load(in_ptr4 + (x0), xmask) 2023-01-11T21:38:06.5643668Z tmp10 = tl.load(in_ptr5 + (x0), xmask) 2023-01-11T21:38:06.5643767Z tmp12 = tl.load(in_ptr6 + (x0), xmask) 2023-01-11T21:38:06.5643865Z tmp14 = tl.load(in_ptr7 + (x0), xmask) 2023-01-11T21:38:06.5643956Z tmp16 = tl.load(in_ptr8 + (x0), xmask) 2023-01-11T21:38:06.5644036Z tmp1 = tmp0 + tmp0 2023-01-11T21:38:06.5644114Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.5644191Z tmp5 = tmp3 + tmp4 2023-01-11T21:38:06.5644270Z tmp7 = tmp5 + tmp6 2023-01-11T21:38:06.5644349Z tmp9 = 
tmp7 + tmp8 2023-01-11T21:38:06.5644452Z tmp11 = tmp9 + tmp10 2023-01-11T21:38:06.5644534Z tmp13 = tmp11 + tmp12 2023-01-11T21:38:06.5644616Z tmp15 = tmp13 + tmp14 2023-01-11T21:38:06.5644696Z tmp17 = tmp15 + tmp16 2023-01-11T21:38:06.5644836Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.5644931Z ''') 2023-01-11T21:38:06.5644937Z 2023-01-11T21:38:06.5644941Z 2023-01-11T21:38:06.5645190Z triton_fused_add_add_1_add_10_add_11_add_12_add_13_add_14_add_15_add_16_add_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.5645268Z import triton 2023-01-11T21:38:06.5645357Z import triton.language as tl 2023-01-11T21:38:06.5645473Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5645579Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5645723Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5645871Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5645876Z 2023-01-11T21:38:06.5646426Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: '*fp32', 8: '*fp32', 9: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), equal_to_1=())]}) 2023-01-11T21:38:06.5646506Z @triton.jit 2023-01-11T21:38:06.5646697Z def triton_(in_out_ptr0, in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, in_ptr5, in_ptr6, in_ptr7, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5646774Z xnumel = 64 2023-01-11T21:38:06.5646867Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5647002Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5647085Z xmask = xindex < xnumel 2023-01-11T21:38:06.5647157Z x0 = xindex 2023-01-11T21:38:06.5647261Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5647358Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5647458Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.5647548Z tmp5 = tl.load(in_ptr2 + (x0), xmask) 2023-01-11T21:38:06.5647645Z tmp7 = tl.load(in_ptr3 + (x0), xmask) 2023-01-11T21:38:06.5647740Z tmp9 = tl.load(in_ptr4 + (x0), xmask) 2023-01-11T21:38:06.5647839Z tmp11 = tl.load(in_ptr5 + (x0), xmask) 2023-01-11T21:38:06.5647968Z tmp13 = tl.load(in_ptr6 + (x0), xmask) 2023-01-11T21:38:06.5648064Z tmp15 = tl.load(in_ptr7 + (x0), xmask) 2023-01-11T21:38:06.5648146Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5648217Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5648293Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.5648368Z tmp8 = tmp6 + tmp7 2023-01-11T21:38:06.5648444Z tmp10 = tmp8 + tmp9 2023-01-11T21:38:06.5648525Z tmp12 = tmp10 + tmp11 2023-01-11T21:38:06.5648603Z tmp14 = tmp12 + tmp13 2023-01-11T21:38:06.5648675Z tmp16 = tmp14 + tmp15 2023-01-11T21:38:06.5648816Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.5648906Z ''') 2023-01-11T21:38:06.5648911Z 2023-01-11T21:38:06.5648915Z 2023-01-11T21:38:06.5649075Z triton_fused_add_49_2 = async_compile.triton(''' 2023-01-11T21:38:06.5649151Z import triton 2023-01-11T21:38:06.5649244Z import triton.language as tl 2023-01-11T21:38:06.5649357Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5649460Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5649590Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5649716Z from torch._inductor.utils import instance_descriptor 
2023-01-11T21:38:06.5649721Z 2023-01-11T21:38:06.5650134Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5650206Z @triton.jit 2023-01-11T21:38:06.5650340Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5650444Z xnumel = 64 2023-01-11T21:38:06.5650542Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5650671Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5650746Z xmask = xindex < xnumel 2023-01-11T21:38:06.5650815Z x0 = xindex 2023-01-11T21:38:06.5650922Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5651015Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5651096Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5651235Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5651321Z ''') 2023-01-11T21:38:06.5651327Z 2023-01-11T21:38:06.5651331Z 2023-01-11T21:38:06.5651415Z async_compile.wait(globals()) 2023-01-11T21:38:06.5651492Z del async_compile 2023-01-11T21:38:06.5651497Z 2023-01-11T21:38:06.5651569Z def call(args): 2023-01-11T21:38:06.5651890Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1, arg17_1, arg18_1, arg19_1, arg20_1, arg21_1, arg22_1, arg23_1, arg24_1, arg25_1, arg26_1, arg27_1, arg28_1, arg29_1, arg30_1, arg31_1, arg32_1, arg33_1, arg34_1, arg35_1, arg36_1, arg37_1, arg38_1, arg39_1, arg40_1, arg41_1, arg42_1, arg43_1, arg44_1, arg45_1, arg46_1, arg47_1, arg48_1, arg49_1 = args 2023-01-11T21:38:06.5651974Z args.clear() 2023-01-11T21:38:06.5652065Z with torch.cuda.device(0): 2023-01-11T21:38:06.5652268Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5652361Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5652573Z triton_fused_add_add_1_add_2_add_3_add_4_add_5_add_6_add_7_add_8_0.run(arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5652639Z del arg0_1 2023-01-11T21:38:06.5652711Z del arg1_1 2023-01-11T21:38:06.5652782Z del arg2_1 2023-01-11T21:38:06.5652860Z del arg3_1 2023-01-11T21:38:06.5652933Z del arg4_1 2023-01-11T21:38:06.5653002Z del arg5_1 2023-01-11T21:38:06.5653071Z del arg6_1 2023-01-11T21:38:06.5653134Z del arg7_1 2023-01-11T21:38:06.5653204Z del arg8_1 2023-01-11T21:38:06.5653293Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.5653536Z triton_fused_add_add_1_add_10_add_11_add_12_add_13_add_14_add_15_add_16_add_2_1.run(buf1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5653615Z del arg10_1 2023-01-11T21:38:06.5653691Z del arg11_1 2023-01-11T21:38:06.5653763Z del arg12_1 2023-01-11T21:38:06.5653829Z del arg13_1 2023-01-11T21:38:06.5653901Z del arg14_1 2023-01-11T21:38:06.5653974Z del arg15_1 2023-01-11T21:38:06.5654044Z del arg16_1 2023-01-11T21:38:06.5654115Z del arg9_1 2023-01-11T21:38:06.5654203Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.5654428Z triton_fused_add_add_1_add_10_add_11_add_12_add_13_add_14_add_15_add_16_add_2_1.run(buf2, arg17_1, arg18_1, arg19_1, arg20_1, arg21_1, arg22_1, arg23_1, arg24_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5654610Z del arg17_1 
2023-01-11T21:38:06.5654687Z del arg18_1 2023-01-11T21:38:06.5654760Z del arg19_1 2023-01-11T21:38:06.5654830Z del arg20_1 2023-01-11T21:38:06.5654899Z del arg21_1 2023-01-11T21:38:06.5654966Z del arg22_1 2023-01-11T21:38:06.5655034Z del arg23_1 2023-01-11T21:38:06.5655097Z del arg24_1 2023-01-11T21:38:06.5655185Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:06.5655398Z triton_fused_add_add_1_add_10_add_11_add_12_add_13_add_14_add_15_add_16_add_2_1.run(buf3, arg25_1, arg26_1, arg27_1, arg28_1, arg29_1, arg30_1, arg31_1, arg32_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5655469Z del arg25_1 2023-01-11T21:38:06.5655541Z del arg26_1 2023-01-11T21:38:06.5655654Z del arg27_1 2023-01-11T21:38:06.5655726Z del arg28_1 2023-01-11T21:38:06.5655789Z del arg29_1 2023-01-11T21:38:06.5655859Z del arg30_1 2023-01-11T21:38:06.5655929Z del arg31_1 2023-01-11T21:38:06.5656000Z del arg32_1 2023-01-11T21:38:06.5656085Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:06.5656304Z triton_fused_add_add_1_add_10_add_11_add_12_add_13_add_14_add_15_add_16_add_2_1.run(buf4, arg33_1, arg34_1, arg35_1, arg36_1, arg37_1, arg38_1, arg39_1, arg40_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5656375Z del arg33_1 2023-01-11T21:38:06.5656439Z del arg34_1 2023-01-11T21:38:06.5656507Z del arg35_1 2023-01-11T21:38:06.5656577Z del arg36_1 2023-01-11T21:38:06.5656647Z del arg37_1 2023-01-11T21:38:06.5656717Z del arg38_1 2023-01-11T21:38:06.5656786Z del arg39_1 2023-01-11T21:38:06.5656856Z del arg40_1 2023-01-11T21:38:06.5656942Z buf5 = buf4; del buf4 # reuse 2023-01-11T21:38:06.5657209Z triton_fused_add_add_1_add_10_add_11_add_12_add_13_add_14_add_15_add_16_add_2_1.run(buf5, arg41_1, arg42_1, arg43_1, arg44_1, arg45_1, arg46_1, arg47_1, arg48_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5657282Z del arg41_1 2023-01-11T21:38:06.5657354Z del arg42_1 2023-01-11T21:38:06.5657437Z del arg43_1 2023-01-11T21:38:06.5657515Z del arg44_1 2023-01-11T21:38:06.5657591Z del arg45_1 2023-01-11T21:38:06.5657655Z del arg46_1 2023-01-11T21:38:06.5657727Z del arg47_1 2023-01-11T21:38:06.5657800Z del arg48_1 2023-01-11T21:38:06.5657888Z buf6 = buf5; del buf5 # reuse 2023-01-11T21:38:06.5658029Z triton_fused_add_49_2.run(buf6, arg49_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5658102Z del arg49_1 2023-01-11T21:38:06.5658178Z return (buf6, ) 2023-01-11T21:38:06.5658184Z 2023-01-11T21:38:06.5658188Z 2023-01-11T21:38:06.5658265Z if __name__ == "__main__": 2023-01-11T21:38:06.5658383Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5658512Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5658729Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5658989Z arg1_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5659213Z arg2_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5659433Z arg3_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5659649Z arg4_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5659858Z arg5_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5660074Z arg6_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5660288Z arg7_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5660509Z arg8_1 = rand_strided((64, ), (1, ), 
device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5660722Z arg9_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5660947Z arg10_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5661170Z arg11_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5661390Z arg12_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5661604Z arg13_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5661820Z arg14_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5662035Z arg15_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5662252Z arg16_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5662504Z arg17_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5662719Z arg18_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5662935Z arg19_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5663154Z arg20_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5663362Z arg21_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5663574Z arg22_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5663788Z arg23_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5664002Z arg24_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5664217Z arg25_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5664431Z arg26_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5664646Z arg27_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5664858Z arg28_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5665077Z arg29_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5665283Z arg30_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5665504Z arg31_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5665753Z arg32_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5665975Z arg33_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5666192Z arg34_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5666405Z arg35_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5666621Z arg36_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5666831Z arg37_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5667037Z arg38_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5667278Z arg39_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5667491Z arg40_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5667705Z arg41_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5667918Z arg42_1 = 
rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5668132Z arg43_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5668345Z arg44_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5668561Z arg45_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5668768Z arg46_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5668980Z arg47_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5669196Z arg48_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5669411Z arg49_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5669829Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1, arg17_1, arg18_1, arg19_1, arg20_1, arg21_1, arg22_1, arg23_1, arg24_1, arg25_1, arg26_1, arg27_1, arg28_1, arg29_1, arg30_1, arg31_1, arg32_1, arg33_1, arg34_1, arg35_1, arg36_1, arg37_1, arg38_1, arg39_1, arg40_1, arg41_1, arg42_1, arg43_1, arg44_1, arg45_1, arg46_1, arg47_1, arg48_1, arg49_1])) 2023-01-11T21:38:06.5670164Z [2023-01-11 21:35:24,964] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 741 2023-01-11T21:38:06.5670170Z 2023-01-11T21:38:06.5670259Z --> 7 2023-01-11T21:38:06.5670334Z ok (0.487s) 2023-01-11T21:38:06.5670885Z test_no_op_reduction_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5671028Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5671322Z [2023-01-11 21:35:24,998] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 742 2023-01-11T21:38:06.5671618Z [2023-01-11 21:35:25,068] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 742 2023-01-11T21:38:06.5672116Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5672257Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5672553Z [2023-01-11 21:35:25,087] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 743 2023-01-11T21:38:06.5672854Z [2023-01-11 21:35:25,156] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 743 2023-01-11T21:38:06.5672860Z 2023-01-11T21:38:06.5672962Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5673039Z import torch 2023-01-11T21:38:06.5673112Z import random 2023-01-11T21:38:06.5673244Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5673371Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5673377Z 2023-01-11T21:38:06.5673457Z aten = torch.ops.aten 2023-01-11T21:38:06.5673593Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5673712Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5673718Z 2023-01-11T21:38:06.5673795Z import triton 2023-01-11T21:38:06.5673887Z import triton.language as tl 2023-01-11T21:38:06.5674013Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5674145Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5674160Z 2023-01-11T21:38:06.5674165Z 2023-01-11T21:38:06.5674323Z triton_fused_amax_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5674397Z import triton 2023-01-11T21:38:06.5674488Z import triton.language as tl 2023-01-11T21:38:06.5674602Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5674705Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5674840Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5674964Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5674969Z 2023-01-11T21:38:06.5675413Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5675483Z @triton.jit 2023-01-11T21:38:06.5675648Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5675721Z xnumel = 8 2023-01-11T21:38:06.5675818Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5675947Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5676029Z xmask = xindex < xnumel 2023-01-11T21:38:06.5676101Z x0 = xindex 2023-01-11T21:38:06.5676286Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5676413Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5676484Z tmp2 = 1 2023-01-11T21:38:06.5676562Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.5676697Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.5676833Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.5676915Z ''') 2023-01-11T21:38:06.5676920Z 2023-01-11T21:38:06.5676925Z 2023-01-11T21:38:06.5677018Z async_compile.wait(globals()) 2023-01-11T21:38:06.5677088Z del async_compile 2023-01-11T21:38:06.5677093Z 2023-01-11T21:38:06.5677166Z def call(args): 2023-01-11T21:38:06.5677238Z arg0_1, = args 2023-01-11T21:38:06.5677313Z args.clear() 2023-01-11T21:38:06.5677407Z with torch.cuda.device(0): 2023-01-11T21:38:06.5677608Z buf0 = 
empty_strided((8, 1), (1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5677815Z buf1 = empty_strided((8, 1, 1), (1, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5677904Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5678046Z triton_fused_amax_sum_1_0.run(arg0_1, buf0, buf1, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.5678118Z del arg0_1 2023-01-11T21:38:06.5678200Z return (buf0, buf1, ) 2023-01-11T21:38:06.5678209Z 2023-01-11T21:38:06.5678213Z 2023-01-11T21:38:06.5678291Z if __name__ == "__main__": 2023-01-11T21:38:06.5678407Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5678532Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5678738Z arg0_1 = rand_strided((8, 1, 1), (1, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5678844Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5678849Z 2023-01-11T21:38:06.5678858Z 2023-01-11T21:38:06.5678949Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5679022Z import torch 2023-01-11T21:38:06.5679093Z import random 2023-01-11T21:38:06.5679217Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5679341Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5679346Z 2023-01-11T21:38:06.5679431Z aten = torch.ops.aten 2023-01-11T21:38:06.5679566Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5679683Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5679688Z 2023-01-11T21:38:06.5679762Z import triton 2023-01-11T21:38:06.5679852Z import triton.language as tl 2023-01-11T21:38:06.5679976Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5680117Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5680122Z 2023-01-11T21:38:06.5680127Z 2023-01-11T21:38:06.5680292Z triton_fused_amax_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5680365Z import triton 2023-01-11T21:38:06.5680449Z import triton.language as tl 2023-01-11T21:38:06.5680562Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5680668Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5680800Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5680924Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5680929Z 2023-01-11T21:38:06.5681347Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5681420Z @triton.jit 2023-01-11T21:38:06.5681563Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5681635Z xnumel = 8 2023-01-11T21:38:06.5681725Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5681850Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5681934Z xmask = xindex < xnumel 2023-01-11T21:38:06.5682004Z x0 = xindex 2023-01-11T21:38:06.5682246Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5682364Z tmp1 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.5682434Z tmp2 = 1 2023-01-11T21:38:06.5682507Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.5682648Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.5682779Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), 
tmp3, xmask) 2023-01-11T21:38:06.5682863Z ''') 2023-01-11T21:38:06.5682869Z 2023-01-11T21:38:06.5682873Z 2023-01-11T21:38:06.5682965Z async_compile.wait(globals()) 2023-01-11T21:38:06.5683041Z del async_compile 2023-01-11T21:38:06.5683047Z 2023-01-11T21:38:06.5683124Z def call(args): 2023-01-11T21:38:06.5683190Z arg0_1, = args 2023-01-11T21:38:06.5683263Z args.clear() 2023-01-11T21:38:06.5683355Z with torch.cuda.device(0): 2023-01-11T21:38:06.5683554Z buf0 = empty_strided((8, 1), (1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5683759Z buf1 = empty_strided((8, 1, 1), (1, 1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5683849Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5683993Z triton_fused_amax_sum_1_0.run(arg0_1, buf0, buf1, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.5684067Z del arg0_1 2023-01-11T21:38:06.5684146Z return (buf0, buf1, ) 2023-01-11T21:38:06.5684151Z 2023-01-11T21:38:06.5684155Z 2023-01-11T21:38:06.5684236Z if __name__ == "__main__": 2023-01-11T21:38:06.5684355Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5684482Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5684687Z arg0_1 = rand_strided((8, 1, 1), (1, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5684799Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5684804Z 2023-01-11T21:38:06.5684874Z ok (0.175s) 2023-01-11T21:38:06.5685390Z test_output_strides_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.5685465Z warnings.warn( 2023-01-11T21:38:06.5685750Z [2023-01-11 21:35:25,171] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 744 2023-01-11T21:38:06.5686014Z [2023-01-11 21:35:25,351] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 744 2023-01-11T21:38:06.5686268Z [2023-01-11 21:35:25,528] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 745 2023-01-11T21:38:06.5686526Z [2023-01-11 21:35:25,530] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 745 2023-01-11T21:38:06.5687038Z /opt/conda/lib/python3.10/site-packages/torch/cuda/graphs.py:82: UserWarning: The CUDA Graph is empty. This ususally means that the graph was attempted to be captured on wrong device or stream. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/cuda/CUDAGraph.cpp:192.) 2023-01-11T21:38:06.5687145Z super(CUDAGraph, self).capture_end() 2023-01-11T21:38:06.5687397Z [2023-01-11 21:35:25,681] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 746 2023-01-11T21:38:06.5687658Z [2023-01-11 21:35:25,685] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 746 2023-01-11T21:38:06.5688167Z /opt/conda/lib/python3.10/site-packages/torch/cuda/graphs.py:82: UserWarning: The CUDA Graph is empty. This ususally means that the graph was attempted to be captured on wrong device or stream. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/cuda/CUDAGraph.cpp:192.) 2023-01-11T21:38:06.5688271Z super(CUDAGraph, self).capture_end() 2023-01-11T21:38:06.5688681Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:3148: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5688828Z self.assertEqual(inp.storage(), out.storage()) 2023-01-11T21:38:06.5689462Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:1904: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5689556Z device=typed_storage.device, 2023-01-11T21:38:06.5689562Z 2023-01-11T21:38:06.5689661Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5689736Z import torch 2023-01-11T21:38:06.5689813Z import random 2023-01-11T21:38:06.5689933Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5690061Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5690067Z 2023-01-11T21:38:06.5690142Z aten = torch.ops.aten 2023-01-11T21:38:06.5690278Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5690374Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5690379Z 2023-01-11T21:38:06.5690455Z import triton 2023-01-11T21:38:06.5690548Z import triton.language as tl 2023-01-11T21:38:06.5690671Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5690810Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5690815Z 2023-01-11T21:38:06.5690820Z 2023-01-11T21:38:06.5690978Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.5691045Z import triton 2023-01-11T21:38:06.5691136Z import triton.language as tl 2023-01-11T21:38:06.5691248Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5691349Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5691485Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5691610Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5691615Z 2023-01-11T21:38:06.5692103Z @pointwise(size_hints=[64, 4], tile_hint=TileHint.SQUARE,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5692178Z @triton.jit 2023-01-11T21:38:06.5692338Z def triton_(in_ptr0, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.5692414Z xnumel = 64 2023-01-11T21:38:06.5692485Z ynumel = 4 2023-01-11T21:38:06.5692581Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5692717Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5692800Z xmask = xindex < xnumel 2023-01-11T21:38:06.5692896Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.5693024Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.5693106Z ymask = yindex < ynumel 2023-01-11T21:38:06.5693181Z x0 = xindex % 16 2023-01-11T21:38:06.5693258Z x1 = (xindex // 16) 2023-01-11T21:38:06.5693327Z y2 = yindex 2023-01-11T21:38:06.5693400Z x3 = xindex 2023-01-11T21:38:06.5693526Z tmp0 = tl.load(in_ptr0 + (x0 + (16*y2) + (64*x1)), xmask & ymask) 2023-01-11T21:38:06.5693678Z tl.store(out_ptr0 + (y2 + (4*x3) + 
tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 2023-01-11T21:38:06.5693763Z ''') 2023-01-11T21:38:06.5693769Z 2023-01-11T21:38:06.5693774Z 2023-01-11T21:38:06.5693867Z async_compile.wait(globals()) 2023-01-11T21:38:06.5693945Z del async_compile 2023-01-11T21:38:06.5693950Z 2023-01-11T21:38:06.5694021Z def call(args): 2023-01-11T21:38:06.5694093Z arg0_1, = args 2023-01-11T21:38:06.5694169Z args.clear() 2023-01-11T21:38:06.5694259Z with torch.cuda.device(0): 2023-01-11T21:38:06.5694609Z buf0 = empty_strided((4, 4, 4, 4), (64, 16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5694706Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5694846Z triton_fused_clone_0.run(arg0_1, buf0, 64, 4, grid=grid(64, 4), stream=stream0) 2023-01-11T21:38:06.5694919Z del arg0_1 2023-01-11T21:38:06.5695001Z return (buf0, ) 2023-01-11T21:38:06.5695007Z 2023-01-11T21:38:06.5695011Z 2023-01-11T21:38:06.5695090Z if __name__ == "__main__": 2023-01-11T21:38:06.5695206Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5695330Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5695540Z arg0_1 = rand_strided((4, 4, 4, 4), (64, 16, 4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5695653Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5695658Z 2023-01-11T21:38:06.5695663Z 2023-01-11T21:38:06.5695760Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5695843Z import torch 2023-01-11T21:38:06.5695919Z import random 2023-01-11T21:38:06.5696038Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5696160Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5696165Z 2023-01-11T21:38:06.5696249Z aten = torch.ops.aten 2023-01-11T21:38:06.5696382Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5696478Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5696483Z 2023-01-11T21:38:06.5696556Z import triton 2023-01-11T21:38:06.5696649Z import triton.language as tl 2023-01-11T21:38:06.5696773Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5696910Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5696916Z 2023-01-11T21:38:06.5696920Z 2023-01-11T21:38:06.5697012Z async_compile.wait(globals()) 2023-01-11T21:38:06.5697081Z del async_compile 2023-01-11T21:38:06.5697094Z 2023-01-11T21:38:06.5697238Z def call(args): 2023-01-11T21:38:06.5697326Z arg0_1, = args 2023-01-11T21:38:06.5697415Z args.clear() 2023-01-11T21:38:06.5697523Z return (as_strided(arg0_1, (64, 4), (4, 1)), ) 2023-01-11T21:38:06.5697528Z 2023-01-11T21:38:06.5697532Z 2023-01-11T21:38:06.5697614Z if __name__ == "__main__": 2023-01-11T21:38:06.5697733Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5697915Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5698130Z arg0_1 = rand_strided((4, 4, 4, 4), (64, 16, 4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5698243Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5698248Z 2023-01-11T21:38:06.5698252Z 2023-01-11T21:38:06.5698364Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5698453Z import torch 2023-01-11T21:38:06.5698541Z import random 2023-01-11T21:38:06.5698673Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5698917Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5698927Z 2023-01-11T21:38:06.5699028Z aten = torch.ops.aten 2023-01-11T21:38:06.5699193Z 
assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5699317Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5699322Z 2023-01-11T21:38:06.5699397Z import triton 2023-01-11T21:38:06.5699490Z import triton.language as tl 2023-01-11T21:38:06.5699619Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5699759Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5699765Z 2023-01-11T21:38:06.5699769Z 2023-01-11T21:38:06.5699862Z async_compile.wait(globals()) 2023-01-11T21:38:06.5699940Z del async_compile 2023-01-11T21:38:06.5699945Z 2023-01-11T21:38:06.5700014Z def call(args): 2023-01-11T21:38:06.5700087Z arg0_1, = args 2023-01-11T21:38:06.5700164Z args.clear() 2023-01-11T21:38:06.5700273Z return (as_strided(arg0_1, (4, 4, 1), (4, 16, 0), 3), ) 2023-01-11T21:38:06.5700278Z 2023-01-11T21:38:06.5700282Z 2023-01-11T21:38:06.5700412Z if __name__ == "__main__": 2023-01-11T21:38:06.5700527Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5700648Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5700869Z arg0_1 = rand_strided((4, 4, 4, 4), (64, 16, 4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5700977Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5700982Z 2023-01-11T21:38:06.5701052Z ok (0.666s) 2023-01-11T21:38:06.5701182Z test_permute_bmm_fusion (__main__.CudaTests) ... ok (0.004s) 2023-01-11T21:38:06.5701643Z test_permute_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5701776Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5702036Z [2023-01-11 21:35:25,849] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 747 2023-01-11T21:38:06.5702305Z [2023-01-11 21:35:25,926] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 747 2023-01-11T21:38:06.5702718Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5702849Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5703099Z [2023-01-11 21:35:25,948] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 748 2023-01-11T21:38:06.5703365Z [2023-01-11 21:35:26,025] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 748 2023-01-11T21:38:06.5703370Z 2023-01-11T21:38:06.5703463Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5703535Z import torch 2023-01-11T21:38:06.5703608Z import random 2023-01-11T21:38:06.5703802Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5703931Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5703937Z 2023-01-11T21:38:06.5704015Z aten = torch.ops.aten 2023-01-11T21:38:06.5704153Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5704242Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5704253Z 2023-01-11T21:38:06.5704321Z import triton 2023-01-11T21:38:06.5704411Z import triton.language as tl 2023-01-11T21:38:06.5704537Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5704680Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5704688Z 2023-01-11T21:38:06.5704693Z 2023-01-11T21:38:06.5704860Z triton_fused_add_1_add_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.5704938Z import triton 2023-01-11T21:38:06.5705030Z import triton.language as tl 2023-01-11T21:38:06.5705138Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5705239Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5705374Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5705501Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5705506Z 2023-01-11T21:38:06.5705975Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5706049Z @triton.jit 2023-01-11T21:38:06.5706189Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5706263Z xnumel = 32 2023-01-11T21:38:06.5706393Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5706521Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5706604Z xmask = xindex < xnumel 2023-01-11T21:38:06.5706676Z x0 = xindex 2023-01-11T21:38:06.5706870Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5706967Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5707037Z tmp1 = 1 2023-01-11T21:38:06.5707109Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5707179Z tmp3 = 2 2023-01-11T21:38:06.5707257Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5707335Z tmp6 = tmp5 + tmp3 2023-01-11T21:38:06.5707470Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5707606Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.5707696Z ''') 2023-01-11T21:38:06.5707701Z 2023-01-11T21:38:06.5707706Z 2023-01-11T21:38:06.5707791Z async_compile.wait(globals()) 2023-01-11T21:38:06.5707873Z del async_compile 2023-01-11T21:38:06.5707878Z 2023-01-11T21:38:06.5707953Z def call(args): 2023-01-11T21:38:06.5708024Z arg0_1, = args 
2023-01-11T21:38:06.5708098Z args.clear() 2023-01-11T21:38:06.5708191Z with torch.cuda.device(0): 2023-01-11T21:38:06.5708414Z buf0 = empty_strided((2, 2, 2, 2, 2), (4, 8, 1, 16, 2), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5708631Z buf1 = empty_strided((2, 2, 2, 2, 2), (4, 8, 1, 16, 2), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5708716Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5708865Z triton_fused_add_1_add_2_0.run(arg0_1, buf0, buf1, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.5708938Z del arg0_1 2023-01-11T21:38:06.5709018Z return (buf0, buf1, ) 2023-01-11T21:38:06.5709023Z 2023-01-11T21:38:06.5709027Z 2023-01-11T21:38:06.5709106Z if __name__ == "__main__": 2023-01-11T21:38:06.5709225Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5709353Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5709572Z arg0_1 = rand_strided((2, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5709678Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5709683Z 2023-01-11T21:38:06.5709715Z 2023-01-11T21:38:06.5709814Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5709891Z import torch 2023-01-11T21:38:06.5709965Z import random 2023-01-11T21:38:06.5710081Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5710204Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5710209Z 2023-01-11T21:38:06.5710288Z aten = torch.ops.aten 2023-01-11T21:38:06.5710423Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5710512Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5710517Z 2023-01-11T21:38:06.5710590Z import triton 2023-01-11T21:38:06.5710680Z import triton.language as tl 2023-01-11T21:38:06.5710811Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5710951Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5710956Z 2023-01-11T21:38:06.5710961Z 2023-01-11T21:38:06.5711127Z triton_fused_add_1_add_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.5711203Z import triton 2023-01-11T21:38:06.5711288Z import triton.language as tl 2023-01-11T21:38:06.5711404Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5711506Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5711639Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5711762Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5711767Z 2023-01-11T21:38:06.5712182Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5712288Z @triton.jit 2023-01-11T21:38:06.5712431Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5712498Z xnumel = 32 2023-01-11T21:38:06.5712595Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5712725Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5712807Z xmask = xindex < xnumel 2023-01-11T21:38:06.5712876Z x0 = xindex 2023-01-11T21:38:06.5713092Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5713208Z tmp5 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.5713278Z tmp1 = 1 2023-01-11T21:38:06.5713351Z tmp2 = tmp0 + 
tmp1 2023-01-11T21:38:06.5713421Z tmp3 = 2 2023-01-11T21:38:06.5713497Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5713576Z tmp6 = tmp5 + tmp3 2023-01-11T21:38:06.5713713Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5713850Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.5713928Z ''') 2023-01-11T21:38:06.5713938Z 2023-01-11T21:38:06.5713942Z 2023-01-11T21:38:06.5714028Z async_compile.wait(globals()) 2023-01-11T21:38:06.5714105Z del async_compile 2023-01-11T21:38:06.5714110Z 2023-01-11T21:38:06.5714186Z def call(args): 2023-01-11T21:38:06.5714257Z arg0_1, = args 2023-01-11T21:38:06.5714332Z args.clear() 2023-01-11T21:38:06.5714427Z with torch.cuda.device(0): 2023-01-11T21:38:06.5714646Z buf0 = empty_strided((2, 2, 2, 2, 2), (4, 8, 1, 16, 2), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5714854Z buf1 = empty_strided((2, 2, 2, 2, 2), (4, 8, 1, 16, 2), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5714947Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5715094Z triton_fused_add_1_add_2_0.run(arg0_1, buf0, buf1, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.5715168Z del arg0_1 2023-01-11T21:38:06.5715248Z return (buf0, buf1, ) 2023-01-11T21:38:06.5715253Z 2023-01-11T21:38:06.5715257Z 2023-01-11T21:38:06.5715336Z if __name__ == "__main__": 2023-01-11T21:38:06.5715452Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5715576Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5715814Z arg0_1 = rand_strided((2, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5715927Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5715932Z 2023-01-11T21:38:06.5716004Z ok (0.199s) 2023-01-11T21:38:06.5716514Z test_permute_fusion (__main__.CudaTests) ... 
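Note on the pair of test_permute_cuda dumps above: the input arrives contiguous with strides (16, 8, 4, 2, 1), but both output buffers are allocated with the permuted strides (4, 8, 1, 16, 2), so the fused add kernel materializes the permutation directly instead of launching a separate transpose/copy kernel; the fp16 variant differs only in pointer types and the .to(tl.float32) upcasts around the arithmetic. A minimal sketch of the kind of pattern being compiled (the test body itself is not captured in this log, so the permutation order and constants below are assumptions chosen to be consistent with the dumped kernel):

import torch
import torch._dynamo

def fn(a):
    # One output is (permuted + 1) + 2, the other is permuted + 2,
    # mirroring tmp4 and tmp6 in the dumped Triton kernel. For a contiguous
    # (2, 2, 2, 2, 2) input this permutation yields strides (4, 8, 1, 16, 2).
    b = a.permute(2, 1, 4, 0, 3)
    return b + 1 + 2, b + 2

compiled_fn = torch._dynamo.optimize("inductor")(fn)
a = torch.randn(2, 2, 2, 2, 2, device="cuda")
out0, out1 = compiled_fn(a)
assert torch.allclose(out0, a.permute(2, 1, 4, 0, 3) + 3)
assert torch.allclose(out1, a.permute(2, 1, 4, 0, 3) + 2)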
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.5716594Z warnings.warn( 2023-01-11T21:38:06.5716854Z [2023-01-11 21:35:26,082] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 749 2023-01-11T21:38:06.5717118Z [2023-01-11 21:35:26,086] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 749 2023-01-11T21:38:06.5717124Z 2023-01-11T21:38:06.5717222Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5717298Z import torch 2023-01-11T21:38:06.5717365Z import random 2023-01-11T21:38:06.5717484Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5717606Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5717611Z 2023-01-11T21:38:06.5717693Z aten = torch.ops.aten 2023-01-11T21:38:06.5717830Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5717925Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5717930Z 2023-01-11T21:38:06.5718003Z import triton 2023-01-11T21:38:06.5718088Z import triton.language as tl 2023-01-11T21:38:06.5718214Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5718380Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5718386Z 2023-01-11T21:38:06.5718390Z 2023-01-11T21:38:06.5718482Z async_compile.wait(globals()) 2023-01-11T21:38:06.5718557Z del async_compile 2023-01-11T21:38:06.5718562Z 2023-01-11T21:38:06.5718637Z def call(args): 2023-01-11T21:38:06.5718732Z primals_1, primals_2 = args 2023-01-11T21:38:06.5718809Z args.clear() 2023-01-11T21:38:06.5718893Z with torch.cuda.device(0): 2023-01-11T21:38:06.5719114Z buf0 = empty_strided((1024, 160, 20), (3200, 20, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5719266Z aten.bmm.out(as_strided(primals_1, (1024, 160, 642), (102720, 1, 160)), primals_2, out=buf0) 2023-01-11T21:38:06.5719433Z return (buf0, as_strided(primals_1, (1024, 642, 160), (102720, 160, 1)), as_strided(primals_2, (1024, 20, 642), (12840, 1, 20)), ) 2023-01-11T21:38:06.5719439Z 2023-01-11T21:38:06.5719443Z 2023-01-11T21:38:06.5719527Z if __name__ == "__main__": 2023-01-11T21:38:06.5719642Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5719767Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5719998Z primals_1 = rand_strided((1024, 642, 160), (102720, 160, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5720228Z primals_2 = rand_strided((1024, 642, 20), (12840, 20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5720354Z print_performance(lambda: call([primals_1, primals_2])) 2023-01-11T21:38:06.5720359Z 2023-01-11T21:38:06.5720430Z ok (0.203s) 2023-01-11T21:38:06.5720561Z test_permute_linear_fusion (__main__.CudaTests) ... ok (0.004s) 2023-01-11T21:38:06.5721018Z test_pow1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5721155Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5721408Z [2023-01-11 21:35:26,381] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 750 2023-01-11T21:38:06.5721695Z [2023-01-11 21:35:26,670] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 750 2023-01-11T21:38:06.5722109Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5722240Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5722492Z [2023-01-11 21:35:26,823] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 751 2023-01-11T21:38:06.5722501Z 2023-01-11T21:38:06.5722600Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5722668Z import torch 2023-01-11T21:38:06.5722746Z import random 2023-01-11T21:38:06.5722867Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5722993Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5722998Z 2023-01-11T21:38:06.5723081Z aten = torch.ops.aten 2023-01-11T21:38:06.5723220Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5723315Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5723320Z 2023-01-11T21:38:06.5723387Z import triton 2023-01-11T21:38:06.5723481Z import triton.language as tl 2023-01-11T21:38:06.5723613Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5723752Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5723757Z 2023-01-11T21:38:06.5723789Z 2023-01-11T21:38:06.5724040Z triton_fused_pow_1_pow_11_pow_12_pow_13_pow_14_pow_15_pow_16_pow_17_pow_2_pow_3_0 = async_compile.triton(''' 2023-01-11T21:38:06.5724116Z import triton 2023-01-11T21:38:06.5724210Z import triton.language as tl 2023-01-11T21:38:06.5724324Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5724421Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5724553Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5724678Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5724684Z 2023-01-11T21:38:06.5725316Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: '*fp32', 8: '*fp32', 9: '*fp32', 10: '*fp32', 11: '*fp32', 12: '*fp32', 13: '*fp32', 14: '*fp32', 15: '*fp32', 16: '*fp32', 17: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17), equal_to_1=())]}) 2023-01-11T21:38:06.5725393Z @triton.jit 2023-01-11T21:38:06.5725646Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, out_ptr9, out_ptr10, out_ptr11, out_ptr12, out_ptr13, out_ptr14, out_ptr15, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5725722Z xnumel = 256 2023-01-11T21:38:06.5725816Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5725944Z 
xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5726020Z xmask = xindex < xnumel 2023-01-11T21:38:06.5726090Z x0 = xindex 2023-01-11T21:38:06.5726281Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5726380Z tmp16 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5726455Z tmp1 = 1 / tmp0 2023-01-11T21:38:06.5726534Z tmp2 = tmp1 * tmp1 2023-01-11T21:38:06.5726611Z tmp3 = tmp2 * tmp2 2023-01-11T21:38:06.5726684Z tmp4 = tmp3 * tmp3 2023-01-11T21:38:06.5726760Z tmp5 = tmp2 * tmp1 2023-01-11T21:38:06.5726834Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.5726907Z tmp7 = tmp6 * tmp1 2023-01-11T21:38:06.5726981Z tmp8 = tmp3 * tmp1 2023-01-11T21:38:06.5727051Z tmp9 = 1 2023-01-11T21:38:06.5727129Z tmp10 = tmp0 * tmp0 2023-01-11T21:38:06.5727233Z tmp11 = tmp10 * tmp0 2023-01-11T21:38:06.5727315Z tmp12 = tmp10 * tmp10 2023-01-11T21:38:06.5727394Z tmp13 = tmp12 * tmp0 2023-01-11T21:38:06.5727472Z tmp14 = tmp11 * tmp11 2023-01-11T21:38:06.5727551Z tmp15 = tmp14 * tmp0 2023-01-11T21:38:06.5727630Z tmp17 = tmp16 * tmp16 2023-01-11T21:38:06.5727701Z tmp18 = tmp17 * tmp17 2023-01-11T21:38:06.5727780Z tmp19 = tmp18 * tmp18 2023-01-11T21:38:06.5727916Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5728071Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.5728215Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.5728362Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.5728489Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.5728618Z tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.5728738Z tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5728867Z tl.store(out_ptr7 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.5728993Z tl.store(out_ptr8 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.5729124Z tl.store(out_ptr9 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.5729254Z tl.store(out_ptr10 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.5729384Z tl.store(out_ptr11 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.5729544Z tl.store(out_ptr12 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.5729673Z tl.store(out_ptr13 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.5729792Z tl.store(out_ptr14 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.5729921Z tl.store(out_ptr15 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp19, xmask) 2023-01-11T21:38:06.5730009Z ''') 2023-01-11T21:38:06.5730015Z 2023-01-11T21:38:06.5730019Z 2023-01-11T21:38:06.5730116Z async_compile.wait(globals()) 2023-01-11T21:38:06.5730191Z del async_compile 2023-01-11T21:38:06.5730196Z 2023-01-11T21:38:06.5730269Z def call(args): 2023-01-11T21:38:06.5730339Z arg0_1, = args 2023-01-11T21:38:06.5730413Z args.clear() 2023-01-11T21:38:06.5730499Z with torch.cuda.device(0): 2023-01-11T21:38:06.5730703Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5730903Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5731103Z buf2 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5731295Z buf3 
= empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5731485Z buf4 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5731683Z buf5 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5731872Z buf6 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5732055Z buf7 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5732246Z buf8 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5732436Z buf9 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5732637Z buf10 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5732837Z buf11 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5733030Z buf12 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5733224Z buf13 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5733455Z buf14 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5733642Z buf15 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5733736Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5733985Z triton_fused_pow_1_pow_11_pow_12_pow_13_pow_14_pow_15_pow_16_pow_17_pow_2_pow_3_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, buf9, buf10, buf11, buf12, buf13, buf14, buf15, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.5734153Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, arg0_1, buf9, buf10, buf11, buf12, buf13, buf14, buf15, ) 2023-01-11T21:38:06.5734163Z 2023-01-11T21:38:06.5734167Z 2023-01-11T21:38:06.5734249Z if __name__ == "__main__": 2023-01-11T21:38:06.5734369Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5734623Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5734833Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5734944Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5734950Z 2023-01-11T21:38:06.5735210Z [2023-01-11 21:35:26,981] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 751 2023-01-11T21:38:06.5735223Z 2023-01-11T21:38:06.5735315Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5735390Z import torch 2023-01-11T21:38:06.5735461Z import random 2023-01-11T21:38:06.5735580Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5735702Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5735753Z 2023-01-11T21:38:06.5735836Z aten = torch.ops.aten 2023-01-11T21:38:06.5735977Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5736066Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5736071Z 2023-01-11T21:38:06.5736143Z import triton 2023-01-11T21:38:06.5736241Z import triton.language as tl 2023-01-11T21:38:06.5736363Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5736503Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5736508Z 2023-01-11T21:38:06.5736514Z 2023-01-11T21:38:06.5736762Z triton_fused_pow_1_pow_11_pow_12_pow_13_pow_14_pow_15_pow_16_pow_17_pow_2_pow_3_0 = async_compile.triton(''' 2023-01-11T21:38:06.5736835Z import triton 
2023-01-11T21:38:06.5736928Z import triton.language as tl 2023-01-11T21:38:06.5737035Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5737193Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5737329Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5737452Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5737457Z 2023-01-11T21:38:06.5738095Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: '*fp16', 7: '*fp16', 8: '*fp16', 9: '*fp16', 10: '*fp16', 11: '*fp16', 12: '*fp16', 13: '*fp16', 14: '*fp16', 15: '*fp16', 16: '*fp16', 17: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17), equal_to_1=())]}) 2023-01-11T21:38:06.5738167Z @triton.jit 2023-01-11T21:38:06.5738418Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, out_ptr9, out_ptr10, out_ptr11, out_ptr12, out_ptr13, out_ptr14, out_ptr15, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5738492Z xnumel = 256 2023-01-11T21:38:06.5738595Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5738718Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5738801Z xmask = xindex < xnumel 2023-01-11T21:38:06.5738872Z x0 = xindex 2023-01-11T21:38:06.5739124Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5739244Z tmp16 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.5739319Z tmp1 = 1 / tmp0 2023-01-11T21:38:06.5739400Z tmp2 = tmp1 * tmp1 2023-01-11T21:38:06.5739471Z tmp3 = tmp2 * tmp2 2023-01-11T21:38:06.5739549Z tmp4 = tmp3 * tmp3 2023-01-11T21:38:06.5739623Z tmp5 = tmp2 * tmp1 2023-01-11T21:38:06.5739701Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.5739777Z tmp7 = tmp6 * tmp1 2023-01-11T21:38:06.5739854Z tmp8 = tmp3 * tmp1 2023-01-11T21:38:06.5739921Z tmp9 = 1 2023-01-11T21:38:06.5739992Z tmp10 = tmp0 * tmp0 2023-01-11T21:38:06.5740076Z tmp11 = tmp10 * tmp0 2023-01-11T21:38:06.5740159Z tmp12 = tmp10 * tmp10 2023-01-11T21:38:06.5740240Z tmp13 = tmp12 * tmp0 2023-01-11T21:38:06.5740321Z tmp14 = tmp11 * tmp11 2023-01-11T21:38:06.5740399Z tmp15 = tmp14 * tmp0 2023-01-11T21:38:06.5740476Z tmp17 = tmp16 * tmp16 2023-01-11T21:38:06.5740548Z tmp18 = tmp17 * tmp17 2023-01-11T21:38:06.5740629Z tmp19 = tmp18 * tmp18 2023-01-11T21:38:06.5740765Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5740898Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.5741027Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.5741157Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.5741283Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.5741403Z tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.5741558Z tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5741687Z tl.store(out_ptr7 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.5741812Z tl.store(out_ptr8 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.5741946Z tl.store(out_ptr9 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 
2023-01-11T21:38:06.5742076Z tl.store(out_ptr10 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.5742207Z tl.store(out_ptr11 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.5742335Z tl.store(out_ptr12 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.5742457Z tl.store(out_ptr13 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.5742583Z tl.store(out_ptr14 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.5742707Z tl.store(out_ptr15 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp19, xmask) 2023-01-11T21:38:06.5742797Z ''') 2023-01-11T21:38:06.5742802Z 2023-01-11T21:38:06.5742807Z 2023-01-11T21:38:06.5742904Z async_compile.wait(globals()) 2023-01-11T21:38:06.5742981Z del async_compile 2023-01-11T21:38:06.5742986Z 2023-01-11T21:38:06.5743060Z def call(args): 2023-01-11T21:38:06.5743133Z arg0_1, = args 2023-01-11T21:38:06.5743204Z args.clear() 2023-01-11T21:38:06.5743297Z with torch.cuda.device(0): 2023-01-11T21:38:06.5743502Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5743700Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5743894Z buf2 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5744089Z buf3 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5744281Z buf4 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5744475Z buf5 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5744659Z buf6 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5744849Z buf7 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5745068Z buf8 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5745260Z buf9 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5745462Z buf10 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5745685Z buf11 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5745903Z buf12 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5746096Z buf13 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5746285Z buf14 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5746476Z buf15 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5746567Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5746812Z triton_fused_pow_1_pow_11_pow_12_pow_13_pow_14_pow_15_pow_16_pow_17_pow_2_pow_3_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, buf9, buf10, buf11, buf12, buf13, buf14, buf15, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.5746982Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, arg0_1, buf9, buf10, buf11, buf12, buf13, buf14, buf15, ) 2023-01-11T21:38:06.5746987Z 2023-01-11T21:38:06.5746992Z 2023-01-11T21:38:06.5747071Z if __name__ == "__main__": 2023-01-11T21:38:06.5747189Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5747315Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5747516Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16) 
2023-01-11T21:38:06.5747661Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5747665Z 2023-01-11T21:38:06.5747736Z ok (0.753s) 2023-01-11T21:38:06.5748192Z test_pow2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5748324Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5748584Z [2023-01-11 21:35:27,013] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 752 2023-01-11T21:38:06.5748844Z [2023-01-11 21:35:27,175] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 752 2023-01-11T21:38:06.5749261Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5749393Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5749643Z [2023-01-11 21:35:27,200] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 753 2023-01-11T21:38:06.5749903Z [2023-01-11 21:35:27,361] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 753 2023-01-11T21:38:06.5749908Z 2023-01-11T21:38:06.5750005Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5750072Z import torch 2023-01-11T21:38:06.5750145Z import random 2023-01-11T21:38:06.5750265Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5750394Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5750399Z 2023-01-11T21:38:06.5750479Z aten = torch.ops.aten 2023-01-11T21:38:06.5750615Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5750711Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5750717Z 2023-01-11T21:38:06.5750817Z import triton 2023-01-11T21:38:06.5750904Z import triton.language as tl 2023-01-11T21:38:06.5751026Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5751166Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5751171Z 2023-01-11T21:38:06.5751176Z 2023-01-11T21:38:06.5751344Z triton_fused_pow_1_pow_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.5751419Z import triton 2023-01-11T21:38:06.5751510Z import triton.language as tl 2023-01-11T21:38:06.5751626Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5751722Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5751858Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5751981Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5751986Z 2023-01-11T21:38:06.5752412Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5752489Z @triton.jit 2023-01-11T21:38:06.5752630Z def triton_(in_ptr0, 
out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5752706Z xnumel = 256 2023-01-11T21:38:06.5752802Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5752924Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5753008Z xmask = xindex < xnumel 2023-01-11T21:38:06.5753081Z x0 = xindex 2023-01-11T21:38:06.5753271Z tmp1 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5753397Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5753469Z tmp0 = 1000 2023-01-11T21:38:06.5753570Z tmp2 = tl.libdevice.pow(tmp0, tmp1) 2023-01-11T21:38:06.5753665Z tmp4 = tl.libdevice.pow(tmp3, tmp0) 2023-01-11T21:38:06.5753799Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5753935Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5754020Z ''') 2023-01-11T21:38:06.5754026Z 2023-01-11T21:38:06.5754030Z 2023-01-11T21:38:06.5754125Z async_compile.wait(globals()) 2023-01-11T21:38:06.5754202Z del async_compile 2023-01-11T21:38:06.5754207Z 2023-01-11T21:38:06.5754279Z def call(args): 2023-01-11T21:38:06.5754352Z arg0_1, = args 2023-01-11T21:38:06.5754420Z args.clear() 2023-01-11T21:38:06.5754514Z with torch.cuda.device(0): 2023-01-11T21:38:06.5754717Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5754917Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5755009Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5755158Z triton_fused_pow_1_pow_2_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.5755234Z del arg0_1 2023-01-11T21:38:06.5755315Z return (buf0, buf1, ) 2023-01-11T21:38:06.5755330Z 2023-01-11T21:38:06.5755336Z 2023-01-11T21:38:06.5755426Z if __name__ == "__main__": 2023-01-11T21:38:06.5755559Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5755700Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5755902Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5756013Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5756018Z 2023-01-11T21:38:06.5756023Z 2023-01-11T21:38:06.5756120Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5756194Z import torch 2023-01-11T21:38:06.5756263Z import random 2023-01-11T21:38:06.5756382Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5756505Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5756510Z 2023-01-11T21:38:06.5756591Z aten = torch.ops.aten 2023-01-11T21:38:06.5756752Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5756847Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5756852Z 2023-01-11T21:38:06.5756926Z import triton 2023-01-11T21:38:06.5757015Z import triton.language as tl 2023-01-11T21:38:06.5757132Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5757269Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5757274Z 2023-01-11T21:38:06.5757279Z 2023-01-11T21:38:06.5757445Z triton_fused_pow_1_pow_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.5757522Z import triton 2023-01-11T21:38:06.5757613Z import triton.language as tl 2023-01-11T21:38:06.5757728Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5757833Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5757965Z from 
torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5758083Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5758088Z 2023-01-11T21:38:06.5758508Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5758582Z @triton.jit 2023-01-11T21:38:06.5758724Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5758802Z xnumel = 256 2023-01-11T21:38:06.5758898Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5759025Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5759107Z xmask = xindex < xnumel 2023-01-11T21:38:06.5759200Z x0 = xindex 2023-01-11T21:38:06.5759412Z tmp1 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5759529Z tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.5759601Z tmp0 = 1000 2023-01-11T21:38:06.5759701Z tmp2 = tl.libdevice.pow(tmp0, tmp1) 2023-01-11T21:38:06.5759806Z tmp4 = tl.libdevice.pow(tmp3, tmp0) 2023-01-11T21:38:06.5759940Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5760064Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5760148Z ''') 2023-01-11T21:38:06.5760154Z 2023-01-11T21:38:06.5760158Z 2023-01-11T21:38:06.5760248Z async_compile.wait(globals()) 2023-01-11T21:38:06.5760324Z del async_compile 2023-01-11T21:38:06.5760329Z 2023-01-11T21:38:06.5760401Z def call(args): 2023-01-11T21:38:06.5760473Z arg0_1, = args 2023-01-11T21:38:06.5760546Z args.clear() 2023-01-11T21:38:06.5760631Z with torch.cuda.device(0): 2023-01-11T21:38:06.5760842Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5761038Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5761130Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5761283Z triton_fused_pow_1_pow_2_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.5761356Z del arg0_1 2023-01-11T21:38:06.5761437Z return (buf0, buf1, ) 2023-01-11T21:38:06.5761442Z 2023-01-11T21:38:06.5761446Z 2023-01-11T21:38:06.5761524Z if __name__ == "__main__": 2023-01-11T21:38:06.5761634Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5761759Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5761959Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5762071Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5762078Z 2023-01-11T21:38:06.5762148Z ok (0.374s) 2023-01-11T21:38:06.5762650Z test_pow3_cuda (__main__.CudaTests) ... 
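Note on the test_pow1_cuda and test_pow2_cuda dumps above: for small literal integer exponents, inductor lowers Tensor ** n to a chain of multiplications (tmp10 = x*x, tmp12 = tmp10*tmp10 for x**4, and so on, with tmp1 = 1/x feeding the negative exponents), whereas the test_pow2_cuda kernels show the fallback once the exponent is no longer a small integer: both 1000 ** x and x ** 1000 go through tl.libdevice.pow. The multiply chains follow ordinary exponentiation by squaring; a small self-contained sketch of that strategy (the helper name is made up for illustration):

def pow_by_squaring(x, n):
    # x**8 becomes three multiplies (s = x*x; s = s*s; s = s*s),
    # matching tmp17/tmp18/tmp19 in the dumped kernel.
    if n == 0:
        return 1.0
    if n < 0:
        return pow_by_squaring(1.0 / x, -n)
    half = pow_by_squaring(x, n // 2)
    return half * half if n % 2 == 0 else half * half * x

assert pow_by_squaring(3.0, 8) == 3.0 ** 8
assert pow_by_squaring(2.0, -3) == 2.0 ** -3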
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.5762753Z warnings.warn( 2023-01-11T21:38:06.5763013Z [2023-01-11 21:35:27,378] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 754 2023-01-11T21:38:06.5763263Z [2023-01-11 21:35:27,530] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.5763519Z [2023-01-11 21:35:27,530] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 754 2023-01-11T21:38:06.5763535Z 2023-01-11T21:38:06.5763625Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5763700Z import torch 2023-01-11T21:38:06.5763774Z import random 2023-01-11T21:38:06.5763896Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5764020Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5764025Z 2023-01-11T21:38:06.5764107Z aten = torch.ops.aten 2023-01-11T21:38:06.5764244Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5764334Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5764339Z 2023-01-11T21:38:06.5764414Z import triton 2023-01-11T21:38:06.5764507Z import triton.language as tl 2023-01-11T21:38:06.5764631Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5764771Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5764777Z 2023-01-11T21:38:06.5764781Z 2023-01-11T21:38:06.5764970Z triton_fused_add_lift_fresh_copy_pow_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5765043Z import triton 2023-01-11T21:38:06.5765132Z import triton.language as tl 2023-01-11T21:38:06.5765239Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5765370Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5765506Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5765656Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5765661Z 2023-01-11T21:38:06.5766087Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: 'fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5766162Z @triton.jit 2023-01-11T21:38:06.5766294Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5766365Z xnumel = 1 2023-01-11T21:38:06.5766455Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5766583Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5766667Z xmask = xindex < xnumel 2023-01-11T21:38:06.5766742Z tmp1 = in_ptr0 2023-01-11T21:38:06.5766821Z tmp0 = 0.12300000339746475 2023-01-11T21:38:06.5766906Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5766989Z tmp3 = tl.sqrt(tmp2) 2023-01-11T21:38:06.5767115Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp3, None) 2023-01-11T21:38:06.5767200Z ''') 2023-01-11T21:38:06.5767205Z 2023-01-11T21:38:06.5767211Z 2023-01-11T21:38:06.5767303Z async_compile.wait(globals()) 2023-01-11T21:38:06.5767384Z del async_compile 2023-01-11T21:38:06.5767389Z 2023-01-11T21:38:06.5767465Z def call(args): 2023-01-11T21:38:06.5767537Z arg0_1, = args 2023-01-11T21:38:06.5767610Z args.clear() 2023-01-11T21:38:06.5767695Z with torch.cuda.device(0): 
2023-01-11T21:38:06.5767890Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5767985Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5768151Z triton_fused_add_lift_fresh_copy_pow_1_0.run(arg0_1.item(), buf0, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5768226Z del arg0_1 2023-01-11T21:38:06.5768308Z return (buf0, ) 2023-01-11T21:38:06.5768313Z 2023-01-11T21:38:06.5768318Z 2023-01-11T21:38:06.5768397Z if __name__ == "__main__": 2023-01-11T21:38:06.5768514Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5768632Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5768846Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.5768960Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5768965Z 2023-01-11T21:38:06.5769033Z ok (0.168s) 2023-01-11T21:38:06.5769375Z test_profiler_mark_wrapper_call_cuda (__main__.CudaTests) ... STAGE:2023-01-11 21:35:27 2346:2346 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:38:06.5769636Z [2023-01-11 21:35:27,542] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 755 2023-01-11T21:38:06.5769902Z [2023-01-11 21:35:27,549] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 755 2023-01-11T21:38:06.5770155Z STAGE:2023-01-11 21:35:27 2346:2346 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:38:06.5770404Z STAGE:2023-01-11 21:35:27 2346:2346 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:38:06.5770417Z 2023-01-11T21:38:06.5770509Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5770584Z import torch 2023-01-11T21:38:06.5770662Z import random 2023-01-11T21:38:06.5770783Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5770909Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5770914Z 2023-01-11T21:38:06.5770997Z aten = torch.ops.aten 2023-01-11T21:38:06.5771141Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5771228Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5771233Z 2023-01-11T21:38:06.5771306Z import triton 2023-01-11T21:38:06.5771398Z import triton.language as tl 2023-01-11T21:38:06.5771524Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5771692Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5771698Z 2023-01-11T21:38:06.5771702Z 2023-01-11T21:38:06.5771842Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.5772053Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.5772176Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.5772279Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.5772386Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.5772452Z { 2023-01-11T21:38:06.5772554Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.5772619Z { 2023-01-11T21:38:06.5772701Z #pragma omp for 2023-01-11T21:38:06.5772790Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.5772850Z { 2023-01-11T21:38:06.5773004Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.5773143Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.5773233Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.5773330Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.5773396Z } 2023-01-11T21:38:06.5773495Z 
#pragma omp for simd simdlen(4) 2023-01-11T21:38:06.5773580Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:06.5773645Z { 2023-01-11T21:38:06.5773734Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.5773822Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.5773909Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.5773998Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.5774065Z } 2023-01-11T21:38:06.5774125Z } 2023-01-11T21:38:06.5774189Z } 2023-01-11T21:38:06.5774274Z ''') 2023-01-11T21:38:06.5774280Z 2023-01-11T21:38:06.5774284Z 2023-01-11T21:38:06.5774376Z async_compile.wait(globals()) 2023-01-11T21:38:06.5774452Z del async_compile 2023-01-11T21:38:06.5774461Z 2023-01-11T21:38:06.5774648Z def call(args): 2023-01-11T21:38:06.5774731Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5774799Z args.clear() 2023-01-11T21:38:06.5774918Z from torch.profiler import record_function 2023-01-11T21:38:06.5775078Z with record_function('inductor_wrapper_call'): 2023-01-11T21:38:06.5775327Z buf0 = empty_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.5775497Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.5775570Z del arg0_1 2023-01-11T21:38:06.5775644Z del arg1_1 2023-01-11T21:38:06.5775714Z return (buf0, ) 2023-01-11T21:38:06.5775730Z 2023-01-11T21:38:06.5775734Z 2023-01-11T21:38:06.5775807Z if __name__ == "__main__": 2023-01-11T21:38:06.5775922Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5776050Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5776251Z arg0_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.5776443Z arg1_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.5776560Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5776566Z 2023-01-11T21:38:06.5776633Z ok (0.023s) 2023-01-11T21:38:06.5777220Z test_rand_like_deterministic_cuda (__main__.CudaTests) ... 
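The test that just passed exercises inductor's option of wrapping the generated call() in record_function('inductor_wrapper_call'), visible in the dump above; at the time of this log that behavior is gated by an inductor config flag (profiler_mark_wrapper_call). A minimal sketch of how such a region is observed from user code, using only the public torch.profiler API:

import torch
from torch.profiler import profile, record_function

def add(a, b):
    return a + b

with profile() as prof:
    with record_function("inductor_wrapper_call"):   # same API as the wrapper
        add(torch.randn(100), torch.randn(100))

# The labelled region shows up as its own row in the profiler output,
# which is what lets a test (or a user) check that the wrapper ran.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))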
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.5777307Z warnings.warn( 2023-01-11T21:38:06.5777591Z [2023-01-11 21:35:27,625] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 756 2023-01-11T21:38:06.5777844Z [2023-01-11 21:35:27,626] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.5778146Z [2023-01-11 21:35:27,817] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 756 2023-01-11T21:38:06.5778152Z 2023-01-11T21:38:06.5778250Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5778322Z import torch 2023-01-11T21:38:06.5778399Z import random 2023-01-11T21:38:06.5778520Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5778636Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5778649Z 2023-01-11T21:38:06.5778724Z aten = torch.ops.aten 2023-01-11T21:38:06.5778859Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5778954Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5778959Z 2023-01-11T21:38:06.5779033Z import triton 2023-01-11T21:38:06.5779124Z import triton.language as tl 2023-01-11T21:38:06.5779248Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5779385Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5779545Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.5779551Z 2023-01-11T21:38:06.5779565Z 2023-01-11T21:38:06.5779765Z triton_fused_philox_rand_like_philox_rand_like_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5779842Z import triton 2023-01-11T21:38:06.5779935Z import triton.language as tl 2023-01-11T21:38:06.5780055Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5780160Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5780292Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5780414Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5780419Z 2023-01-11T21:38:06.5780828Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5780904Z @triton.jit 2023-01-11T21:38:06.5781043Z def triton_(seed0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5781116Z xnumel = 1024 2023-01-11T21:38:06.5781212Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5781339Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5781451Z xmask = xindex < xnumel 2023-01-11T21:38:06.5781521Z x0 = xindex 2023-01-11T21:38:06.5781747Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.5781880Z tmp3 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.5781950Z tmp1 = x0 2023-01-11T21:38:06.5782039Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.5782113Z tmp4 = 1024 + x0 2023-01-11T21:38:06.5782201Z tmp5 = tl.rand(tmp3, tmp4) 2023-01-11T21:38:06.5782335Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 
2023-01-11T21:38:06.5782460Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.5782549Z ''') 2023-01-11T21:38:06.5782555Z 2023-01-11T21:38:06.5782559Z 2023-01-11T21:38:06.5782653Z async_compile.wait(globals()) 2023-01-11T21:38:06.5782729Z del async_compile 2023-01-11T21:38:06.5782734Z 2023-01-11T21:38:06.5782806Z def call(args): 2023-01-11T21:38:06.5782881Z arg0_1, = args 2023-01-11T21:38:06.5782957Z args.clear() 2023-01-11T21:38:06.5783092Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.5783177Z with torch.cuda.device(0): 2023-01-11T21:38:06.5783377Z buf0 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5783576Z buf1 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5783669Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5783851Z triton_fused_philox_rand_like_philox_rand_like_1_0.run(seed_cuda_0, buf0, buf1, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.5783961Z return (buf0, buf1, ) 2023-01-11T21:38:06.5783966Z 2023-01-11T21:38:06.5783971Z 2023-01-11T21:38:06.5784050Z if __name__ == "__main__": 2023-01-11T21:38:06.5784168Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5784287Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5784488Z seed_cuda_0 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5784691Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5784801Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5784806Z 2023-01-11T21:38:06.5784875Z ok (0.413s) 2023-01-11T21:38:06.5785331Z test_reduction1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5785466Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5785748Z [2023-01-11 21:35:27,985] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 757 2023-01-11T21:38:06.5786041Z [2023-01-11 21:35:28,228] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 757 2023-01-11T21:38:06.5786455Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5786578Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5786835Z [2023-01-11 21:35:28,247] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 758 2023-01-11T21:38:06.5786842Z 2023-01-11T21:38:06.5786940Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5787013Z import torch 2023-01-11T21:38:06.5787087Z import random 2023-01-11T21:38:06.5787205Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5787353Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5787359Z 2023-01-11T21:38:06.5787442Z aten = torch.ops.aten 2023-01-11T21:38:06.5787571Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5787665Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5787670Z 2023-01-11T21:38:06.5787745Z import triton 2023-01-11T21:38:06.5787837Z import triton.language as tl 2023-01-11T21:38:06.5787962Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5788101Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5788107Z 2023-01-11T21:38:06.5788114Z 2023-01-11T21:38:06.5788314Z triton_fused_argmax_argmin_max_1_min_1_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5788386Z import triton 2023-01-11T21:38:06.5788471Z import triton.language as tl 2023-01-11T21:38:06.5788585Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5788686Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5788819Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5788944Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5788950Z 2023-01-11T21:38:06.5789037Z @reduction(size_hints=[1, 4], 2023-01-11T21:38:06.5789153Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5789231Z filename=__file__, 2023-01-11T21:38:06.5789647Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*i64', 5: '*i64', 6: 'i32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.5789758Z @triton.jit 2023-01-11T21:38:06.5789958Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5790031Z xnumel = 1 2023-01-11T21:38:06.5790105Z rnumel = 3 2023-01-11T21:38:06.5790207Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5790342Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5790419Z xmask = xindex < xnumel 2023-01-11T21:38:06.5790538Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5790656Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5790842Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5790966Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5791146Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5791265Z _tmp4_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5791387Z _tmp6 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5791495Z _tmp6_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5791597Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5791687Z 
rindex = roffset + rbase 2023-01-11T21:38:06.5791773Z rmask = rindex < rnumel 2023-01-11T21:38:06.5791843Z r0 = rindex 2023-01-11T21:38:06.5792039Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5792141Z tmp5 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5792255Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5792382Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp0), tmp0, _tmp2) 2023-01-11T21:38:06.5792507Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp0), tmp0, _tmp3) 2023-01-11T21:38:06.5792650Z _tmp4_index = tl.where(xmask & rmask & (_tmp4 < tmp0), rindex, _tmp4_index) 2023-01-11T21:38:06.5792774Z _tmp4 = tl.where(xmask & rmask & (_tmp4 < tmp0), tmp0, _tmp4) 2023-01-11T21:38:06.5792909Z _tmp6_index = tl.where(xmask & rmask & (_tmp6 > tmp5), rindex, _tmp6_index) 2023-01-11T21:38:06.5793028Z _tmp6 = tl.where(xmask & rmask & (_tmp6 > tmp5), tmp5, _tmp6) 2023-01-11T21:38:06.5793173Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5793299Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.5793413Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5793545Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.5793657Z tmp3 = tl.reshape(tl.min(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5793787Z tl.store(out_ptr2 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5793883Z _tmp4_index_reduce = tl.reshape( 2023-01-11T21:38:06.5793992Z tl.argmax(_tmp4, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5794108Z _tmp4_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5794202Z [1, RBLOCK]) == _tmp4_index_reduce) 2023-01-11T21:38:06.5794290Z tmp4 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5794415Z tl.where(_tmp4_index_mask, _tmp4_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5794546Z tl.store(out_ptr3 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp4, None) 2023-01-11T21:38:06.5794641Z _tmp6_index_reduce = tl.reshape( 2023-01-11T21:38:06.5794749Z tl.argmin(_tmp6, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5794860Z _tmp6_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5794957Z [1, RBLOCK]) == _tmp6_index_reduce) 2023-01-11T21:38:06.5795043Z tmp6 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5795166Z tl.where(_tmp6_index_mask, _tmp6_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5795294Z tl.store(out_ptr4 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp6, None) 2023-01-11T21:38:06.5795472Z ''') 2023-01-11T21:38:06.5795478Z 2023-01-11T21:38:06.5795482Z 2023-01-11T21:38:06.5795595Z async_compile.wait(globals()) 2023-01-11T21:38:06.5795676Z del async_compile 2023-01-11T21:38:06.5795682Z 2023-01-11T21:38:06.5795749Z def call(args): 2023-01-11T21:38:06.5795822Z arg0_1, = args 2023-01-11T21:38:06.5795900Z args.clear() 2023-01-11T21:38:06.5795995Z with torch.cuda.device(0): 2023-01-11T21:38:06.5796186Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5796370Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5796554Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5796733Z buf3 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5796918Z buf4 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5797012Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5797196Z triton_fused_argmax_argmin_max_1_min_1_sum_1_0.run(arg0_1, buf0, 
buf1, buf2, buf3, buf4, 1, 3, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5797271Z del arg0_1 2023-01-11T21:38:06.5797374Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.5797380Z 2023-01-11T21:38:06.5797384Z 2023-01-11T21:38:06.5797467Z if __name__ == "__main__": 2023-01-11T21:38:06.5797588Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5797709Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5797907Z arg0_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5798019Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5798024Z 2023-01-11T21:38:06.5798289Z [2023-01-11 21:35:28,358] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 758 2023-01-11T21:38:06.5798295Z 2023-01-11T21:38:06.5798393Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5798466Z import torch 2023-01-11T21:38:06.5798543Z import random 2023-01-11T21:38:06.5798662Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5798779Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5798789Z 2023-01-11T21:38:06.5798863Z aten = torch.ops.aten 2023-01-11T21:38:06.5799027Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5799125Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5799130Z 2023-01-11T21:38:06.5799203Z import triton 2023-01-11T21:38:06.5799293Z import triton.language as tl 2023-01-11T21:38:06.5799420Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5799559Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5799564Z 2023-01-11T21:38:06.5799569Z 2023-01-11T21:38:06.5799767Z triton_fused_argmax_argmin_max_1_min_1_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5799834Z import triton 2023-01-11T21:38:06.5799927Z import triton.language as tl 2023-01-11T21:38:06.5800042Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5800145Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5800274Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5800396Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5800401Z 2023-01-11T21:38:06.5800491Z @reduction(size_hints=[1, 4], 2023-01-11T21:38:06.5800599Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5800682Z filename=__file__, 2023-01-11T21:38:06.5801098Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*i64', 5: '*i64', 6: 'i32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.5801171Z @triton.jit 2023-01-11T21:38:06.5801374Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5801473Z xnumel = 1 2023-01-11T21:38:06.5801546Z rnumel = 3 2023-01-11T21:38:06.5801642Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5801770Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5801853Z xmask = xindex < xnumel 2023-01-11T21:38:06.5801974Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5802090Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5802277Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5802402Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], 
tl.float32) + float("inf") 2023-01-11T21:38:06.5802582Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5802690Z _tmp4_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5802811Z _tmp6 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5802924Z _tmp6_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5803032Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5803121Z rindex = roffset + rbase 2023-01-11T21:38:06.5803207Z rmask = rindex < rnumel 2023-01-11T21:38:06.5803278Z r0 = rindex 2023-01-11T21:38:06.5803493Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5803612Z tmp5 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.5803732Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5803859Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp0), tmp0, _tmp2) 2023-01-11T21:38:06.5803982Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp0), tmp0, _tmp3) 2023-01-11T21:38:06.5804126Z _tmp4_index = tl.where(xmask & rmask & (_tmp4 < tmp0), rindex, _tmp4_index) 2023-01-11T21:38:06.5804249Z _tmp4 = tl.where(xmask & rmask & (_tmp4 < tmp0), tmp0, _tmp4) 2023-01-11T21:38:06.5804386Z _tmp6_index = tl.where(xmask & rmask & (_tmp6 > tmp5), rindex, _tmp6_index) 2023-01-11T21:38:06.5804504Z _tmp6 = tl.where(xmask & rmask & (_tmp6 > tmp5), tmp5, _tmp6) 2023-01-11T21:38:06.5804617Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5804748Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.5804886Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5805022Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.5805133Z tmp3 = tl.reshape(tl.min(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5805267Z tl.store(out_ptr2 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5805363Z _tmp4_index_reduce = tl.reshape( 2023-01-11T21:38:06.5805466Z tl.argmax(_tmp4, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5805584Z _tmp4_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5805680Z [1, RBLOCK]) == _tmp4_index_reduce) 2023-01-11T21:38:06.5805771Z tmp4 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5805897Z tl.where(_tmp4_index_mask, _tmp4_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5806024Z tl.store(out_ptr3 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp4, None) 2023-01-11T21:38:06.5806116Z _tmp6_index_reduce = tl.reshape( 2023-01-11T21:38:06.5806219Z tl.argmin(_tmp6, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5806339Z _tmp6_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5806433Z [1, RBLOCK]) == _tmp6_index_reduce) 2023-01-11T21:38:06.5806521Z tmp6 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5806643Z tl.where(_tmp6_index_mask, _tmp6_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5806767Z tl.store(out_ptr4 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp6, None) 2023-01-11T21:38:06.5806854Z ''') 2023-01-11T21:38:06.5806860Z 2023-01-11T21:38:06.5806864Z 2023-01-11T21:38:06.5806957Z async_compile.wait(globals()) 2023-01-11T21:38:06.5807028Z del async_compile 2023-01-11T21:38:06.5807061Z 2023-01-11T21:38:06.5807137Z def call(args): 2023-01-11T21:38:06.5807212Z arg0_1, = args 2023-01-11T21:38:06.5807285Z args.clear() 2023-01-11T21:38:06.5807381Z with torch.cuda.device(0): 2023-01-11T21:38:06.5807569Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float16) 
2023-01-11T21:38:06.5807754Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5807932Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5808116Z buf3 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5808299Z buf4 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5808392Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5808572Z triton_fused_argmax_argmin_max_1_min_1_sum_1_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, 1, 3, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5808644Z del arg0_1 2023-01-11T21:38:06.5808749Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.5808759Z 2023-01-11T21:38:06.5808763Z 2023-01-11T21:38:06.5808843Z if __name__ == "__main__": 2023-01-11T21:38:06.5808955Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5809083Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5809284Z arg0_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5809396Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5809402Z 2023-01-11T21:38:06.5809473Z ok (0.394s) 2023-01-11T21:38:06.5809929Z test_reduction2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5810063Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5810320Z [2023-01-11 21:35:28,376] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 759 2023-01-11T21:38:06.5810613Z [2023-01-11 21:35:28,571] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 759 2023-01-11T21:38:06.5811030Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5811154Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5811406Z [2023-01-11 21:35:28,589] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 760 2023-01-11T21:38:06.5811668Z [2023-01-11 21:35:28,687] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 760 2023-01-11T21:38:06.5811674Z 2023-01-11T21:38:06.5811775Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5811850Z import torch 2023-01-11T21:38:06.5811925Z import random 2023-01-11T21:38:06.5812046Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5812169Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5812176Z 2023-01-11T21:38:06.5812250Z aten = torch.ops.aten 2023-01-11T21:38:06.5812385Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5812479Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5812485Z 2023-01-11T21:38:06.5812556Z import triton 2023-01-11T21:38:06.5812646Z import triton.language as tl 2023-01-11T21:38:06.5812769Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5812908Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5812942Z 2023-01-11T21:38:06.5812946Z 2023-01-11T21:38:06.5813139Z triton_fused_argmin_max_1_min_1_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5813207Z import triton 2023-01-11T21:38:06.5813299Z import triton.language as tl 2023-01-11T21:38:06.5813411Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5813517Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5813648Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5813774Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5813779Z 2023-01-11T21:38:06.5813867Z @reduction(size_hints=[1, 4], 2023-01-11T21:38:06.5813982Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5814061Z filename=__file__, 2023-01-11T21:38:06.5814468Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*i64', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5814669Z @triton.jit 2023-01-11T21:38:06.5814866Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5814941Z xnumel = 1 2023-01-11T21:38:06.5815013Z rnumel = 4 2023-01-11T21:38:06.5815112Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5815248Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5815325Z xmask = xindex < xnumel 2023-01-11T21:38:06.5815444Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5815563Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5815747Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5815873Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5815997Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5816116Z _tmp5_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5816214Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5816301Z rindex = roffset + rbase 2023-01-11T21:38:06.5816386Z 
rmask = rindex < rnumel 2023-01-11T21:38:06.5816456Z r0 = rindex 2023-01-11T21:38:06.5816693Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5816796Z tmp4 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5816918Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5817037Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp0), tmp0, _tmp2) 2023-01-11T21:38:06.5817213Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp0), tmp0, _tmp3) 2023-01-11T21:38:06.5817377Z _tmp5_index = tl.where(xmask & rmask & (_tmp5 > tmp4), rindex, _tmp5_index) 2023-01-11T21:38:06.5817508Z _tmp5 = tl.where(xmask & rmask & (_tmp5 > tmp4), tmp4, _tmp5) 2023-01-11T21:38:06.5817629Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5817764Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.5817878Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5818008Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.5818114Z tmp3 = tl.reshape(tl.min(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5818241Z tl.store(out_ptr2 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5818337Z _tmp5_index_reduce = tl.reshape( 2023-01-11T21:38:06.5818447Z tl.argmin(_tmp5, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5818568Z _tmp5_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5818667Z [1, RBLOCK]) == _tmp5_index_reduce) 2023-01-11T21:38:06.5818754Z tmp5 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5818871Z tl.where(_tmp5_index_mask, _tmp5_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5818998Z tl.store(out_ptr3 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp5, None) 2023-01-11T21:38:06.5819121Z ''') 2023-01-11T21:38:06.5819126Z 2023-01-11T21:38:06.5819131Z 2023-01-11T21:38:06.5819224Z async_compile.wait(globals()) 2023-01-11T21:38:06.5819302Z del async_compile 2023-01-11T21:38:06.5819307Z 2023-01-11T21:38:06.5819381Z def call(args): 2023-01-11T21:38:06.5819455Z arg0_1, = args 2023-01-11T21:38:06.5819530Z args.clear() 2023-01-11T21:38:06.5819615Z with torch.cuda.device(0): 2023-01-11T21:38:06.5819802Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5819986Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5820168Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5820354Z buf3 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5820445Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5820617Z triton_fused_argmin_max_1_min_1_sum_1_0.run(arg0_1, buf0, buf1, buf2, buf3, 1, 4, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5820686Z del arg0_1 2023-01-11T21:38:06.5820782Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.5820787Z 2023-01-11T21:38:06.5820791Z 2023-01-11T21:38:06.5820871Z if __name__ == "__main__": 2023-01-11T21:38:06.5820992Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5821121Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5821315Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5821427Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5821433Z 2023-01-11T21:38:06.5821437Z 2023-01-11T21:38:06.5821533Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5821609Z import torch 2023-01-11T21:38:06.5821677Z import random 2023-01-11T21:38:06.5821796Z from torch import empty_strided, 
as_strided, device 2023-01-11T21:38:06.5821919Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5821927Z 2023-01-11T21:38:06.5822011Z aten = torch.ops.aten 2023-01-11T21:38:06.5822146Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5822241Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5822246Z 2023-01-11T21:38:06.5822318Z import triton 2023-01-11T21:38:06.5822404Z import triton.language as tl 2023-01-11T21:38:06.5822557Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5822702Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5822707Z 2023-01-11T21:38:06.5822712Z 2023-01-11T21:38:06.5822898Z triton_fused_argmin_max_1_min_1_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5822974Z import triton 2023-01-11T21:38:06.5823065Z import triton.language as tl 2023-01-11T21:38:06.5823178Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5823281Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5823404Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5823529Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5823534Z 2023-01-11T21:38:06.5823620Z @reduction(size_hints=[1, 4], 2023-01-11T21:38:06.5823736Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5823820Z filename=__file__, 2023-01-11T21:38:06.5824227Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*i64', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5824301Z @triton.jit 2023-01-11T21:38:06.5824496Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5824562Z xnumel = 1 2023-01-11T21:38:06.5824632Z rnumel = 4 2023-01-11T21:38:06.5824728Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5824863Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5824974Z xmask = xindex < xnumel 2023-01-11T21:38:06.5825092Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5825208Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5825384Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5825511Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5825649Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5825778Z _tmp5_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5825907Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5825994Z rindex = roffset + rbase 2023-01-11T21:38:06.5826079Z rmask = rindex < rnumel 2023-01-11T21:38:06.5826143Z r0 = rindex 2023-01-11T21:38:06.5826362Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5826483Z tmp4 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.5826603Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5826731Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp0), tmp0, _tmp2) 2023-01-11T21:38:06.5826854Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp0), tmp0, _tmp3) 2023-01-11T21:38:06.5826999Z _tmp5_index = tl.where(xmask & rmask & (_tmp5 > tmp4), rindex, _tmp5_index) 2023-01-11T21:38:06.5827117Z _tmp5 = tl.where(xmask & rmask & (_tmp5 > tmp4), tmp4, _tmp5) 
2023-01-11T21:38:06.5827223Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5827356Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.5827468Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5827600Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.5827710Z tmp3 = tl.reshape(tl.min(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5827843Z tl.store(out_ptr2 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5827937Z _tmp5_index_reduce = tl.reshape( 2023-01-11T21:38:06.5828047Z tl.argmin(_tmp5, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5828161Z _tmp5_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5828284Z [1, RBLOCK]) == _tmp5_index_reduce) 2023-01-11T21:38:06.5828377Z tmp5 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5828503Z tl.where(_tmp5_index_mask, _tmp5_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5828631Z tl.store(out_ptr3 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp5, None) 2023-01-11T21:38:06.5828716Z ''') 2023-01-11T21:38:06.5828722Z 2023-01-11T21:38:06.5828726Z 2023-01-11T21:38:06.5828818Z async_compile.wait(globals()) 2023-01-11T21:38:06.5828889Z del async_compile 2023-01-11T21:38:06.5828901Z 2023-01-11T21:38:06.5828969Z def call(args): 2023-01-11T21:38:06.5829040Z arg0_1, = args 2023-01-11T21:38:06.5829115Z args.clear() 2023-01-11T21:38:06.5829209Z with torch.cuda.device(0): 2023-01-11T21:38:06.5829399Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5829584Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5829768Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5829948Z buf3 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5830039Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5830208Z triton_fused_argmin_max_1_min_1_sum_1_0.run(arg0_1, buf0, buf1, buf2, buf3, 1, 4, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5830281Z del arg0_1 2023-01-11T21:38:06.5830375Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.5830381Z 2023-01-11T21:38:06.5830385Z 2023-01-11T21:38:06.5830465Z if __name__ == "__main__": 2023-01-11T21:38:06.5830584Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5830705Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5830944Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5831056Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5831061Z 2023-01-11T21:38:06.5831131Z ok (0.329s) 2023-01-11T21:38:06.5831590Z test_reduction3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5831722Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5831977Z [2023-01-11 21:35:28,706] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 761 2023-01-11T21:38:06.5832240Z [2023-01-11 21:35:28,804] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 761 2023-01-11T21:38:06.5832661Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5832796Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5833053Z [2023-01-11 21:35:28,822] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 762 2023-01-11T21:38:06.5833305Z [2023-01-11 21:35:28,920] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 762 2023-01-11T21:38:06.5833317Z 2023-01-11T21:38:06.5833409Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5833481Z import torch 2023-01-11T21:38:06.5833558Z import random 2023-01-11T21:38:06.5833677Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5833802Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5833807Z 2023-01-11T21:38:06.5833889Z aten = torch.ops.aten 2023-01-11T21:38:06.5834025Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5834140Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5834146Z 2023-01-11T21:38:06.5834222Z import triton 2023-01-11T21:38:06.5834314Z import triton.language as tl 2023-01-11T21:38:06.5834437Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5834575Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5834580Z 2023-01-11T21:38:06.5834585Z 2023-01-11T21:38:06.5834770Z triton_fused_argmax_max_1_min_1_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5834845Z import triton 2023-01-11T21:38:06.5834936Z import triton.language as tl 2023-01-11T21:38:06.5835042Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5835146Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5835278Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5835429Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5835435Z 2023-01-11T21:38:06.5835533Z @reduction(size_hints=[1, 4], 2023-01-11T21:38:06.5835662Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5835747Z filename=__file__, 2023-01-11T21:38:06.5836151Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*i64', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5836220Z @triton.jit 2023-01-11T21:38:06.5836412Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5836491Z xnumel = 1 2023-01-11T21:38:06.5836593Z rnumel = 4 2023-01-11T21:38:06.5836686Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5836821Z xindex = xoffset + 
tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5836906Z xmask = xindex < xnumel 2023-01-11T21:38:06.5837017Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5837138Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5837325Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5837451Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5837629Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5837742Z _tmp5_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5837847Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5837929Z rindex = roffset + rbase 2023-01-11T21:38:06.5838015Z rmask = rindex < rnumel 2023-01-11T21:38:06.5838091Z r0 = rindex 2023-01-11T21:38:06.5838288Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5838390Z tmp4 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5838511Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5838643Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp0), tmp0, _tmp2) 2023-01-11T21:38:06.5838766Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp0), tmp0, _tmp3) 2023-01-11T21:38:06.5838902Z _tmp5_index = tl.where(xmask & rmask & (_tmp5 < tmp4), rindex, _tmp5_index) 2023-01-11T21:38:06.5839025Z _tmp5 = tl.where(xmask & rmask & (_tmp5 < tmp4), tmp4, _tmp5) 2023-01-11T21:38:06.5839138Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5839268Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.5839381Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5839511Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.5839624Z tmp3 = tl.reshape(tl.min(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5839747Z tl.store(out_ptr2 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5839843Z _tmp5_index_reduce = tl.reshape( 2023-01-11T21:38:06.5839978Z tl.argmax(_tmp5, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5840099Z _tmp5_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5840195Z [1, RBLOCK]) == _tmp5_index_reduce) 2023-01-11T21:38:06.5840285Z tmp5 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5840408Z tl.where(_tmp5_index_mask, _tmp5_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5840536Z tl.store(out_ptr3 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp5, None) 2023-01-11T21:38:06.5840617Z ''') 2023-01-11T21:38:06.5840622Z 2023-01-11T21:38:06.5840627Z 2023-01-11T21:38:06.5840719Z async_compile.wait(globals()) 2023-01-11T21:38:06.5840796Z del async_compile 2023-01-11T21:38:06.5840804Z 2023-01-11T21:38:06.5840880Z def call(args): 2023-01-11T21:38:06.5840954Z arg0_1, = args 2023-01-11T21:38:06.5841029Z args.clear() 2023-01-11T21:38:06.5841122Z with torch.cuda.device(0): 2023-01-11T21:38:06.5841302Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5841488Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5841673Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5841856Z buf3 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5841948Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5842116Z triton_fused_argmax_max_1_min_1_sum_1_0.run(arg0_1, buf0, buf1, buf2, buf3, 1, 4, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5842192Z del arg0_1 
2023-01-11T21:38:06.5842288Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.5842293Z 2023-01-11T21:38:06.5842298Z 2023-01-11T21:38:06.5842401Z if __name__ == "__main__": 2023-01-11T21:38:06.5842520Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5842644Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5842841Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5842956Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5842961Z 2023-01-11T21:38:06.5842966Z 2023-01-11T21:38:06.5843061Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5843134Z import torch 2023-01-11T21:38:06.5843207Z import random 2023-01-11T21:38:06.5843321Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5843444Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5843449Z 2023-01-11T21:38:06.5843531Z aten = torch.ops.aten 2023-01-11T21:38:06.5843667Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5843766Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5843771Z 2023-01-11T21:38:06.5843848Z import triton 2023-01-11T21:38:06.5843940Z import triton.language as tl 2023-01-11T21:38:06.5844064Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5844196Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5844201Z 2023-01-11T21:38:06.5844205Z 2023-01-11T21:38:06.5844393Z triton_fused_argmax_max_1_min_1_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5844467Z import triton 2023-01-11T21:38:06.5844558Z import triton.language as tl 2023-01-11T21:38:06.5844671Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5844774Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5844903Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5845022Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5845033Z 2023-01-11T21:38:06.5845115Z @reduction(size_hints=[1, 4], 2023-01-11T21:38:06.5845228Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5845315Z filename=__file__, 2023-01-11T21:38:06.5845718Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*i64', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5845820Z @triton.jit 2023-01-11T21:38:06.5846015Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5846088Z xnumel = 1 2023-01-11T21:38:06.5846153Z rnumel = 4 2023-01-11T21:38:06.5846252Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5846387Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5846469Z xmask = xindex < xnumel 2023-01-11T21:38:06.5846588Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5846707Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5846890Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5847015Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5847189Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5847307Z _tmp5_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5847413Z for roffset in range(0, rnumel, RBLOCK): 
2023-01-11T21:38:06.5847501Z rindex = roffset + rbase 2023-01-11T21:38:06.5847586Z rmask = rindex < rnumel 2023-01-11T21:38:06.5847660Z r0 = rindex 2023-01-11T21:38:06.5847875Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5847988Z tmp4 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.5848106Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5848232Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp0), tmp0, _tmp2) 2023-01-11T21:38:06.5848390Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp0), tmp0, _tmp3) 2023-01-11T21:38:06.5848533Z _tmp5_index = tl.where(xmask & rmask & (_tmp5 < tmp4), rindex, _tmp5_index) 2023-01-11T21:38:06.5848654Z _tmp5 = tl.where(xmask & rmask & (_tmp5 < tmp4), tmp4, _tmp5) 2023-01-11T21:38:06.5848770Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5848900Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.5849006Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5849135Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.5849247Z tmp3 = tl.reshape(tl.min(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5849376Z tl.store(out_ptr2 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5849471Z _tmp5_index_reduce = tl.reshape( 2023-01-11T21:38:06.5849581Z tl.argmax(_tmp5, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5849701Z _tmp5_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5849793Z [1, RBLOCK]) == _tmp5_index_reduce) 2023-01-11T21:38:06.5849880Z tmp5 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5850003Z tl.where(_tmp5_index_mask, _tmp5_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5850134Z tl.store(out_ptr3 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp5, None) 2023-01-11T21:38:06.5850219Z ''') 2023-01-11T21:38:06.5850224Z 2023-01-11T21:38:06.5850229Z 2023-01-11T21:38:06.5850322Z async_compile.wait(globals()) 2023-01-11T21:38:06.5850399Z del async_compile 2023-01-11T21:38:06.5850404Z 2023-01-11T21:38:06.5850479Z def call(args): 2023-01-11T21:38:06.5850545Z arg0_1, = args 2023-01-11T21:38:06.5850619Z args.clear() 2023-01-11T21:38:06.5850712Z with torch.cuda.device(0): 2023-01-11T21:38:06.5850899Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5851082Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5851270Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5851453Z buf3 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5851537Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5851737Z triton_fused_argmax_max_1_min_1_sum_1_0.run(arg0_1, buf0, buf1, buf2, buf3, 1, 4, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5851815Z del arg0_1 2023-01-11T21:38:06.5851910Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.5851916Z 2023-01-11T21:38:06.5851920Z 2023-01-11T21:38:06.5852000Z if __name__ == "__main__": 2023-01-11T21:38:06.5852118Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5852244Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5852438Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5852544Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5852549Z 2023-01-11T21:38:06.5852624Z ok (0.233s) 2023-01-11T21:38:06.5853082Z test_reduction4_cuda (__main__.CudaTests) ... 
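All of the reduction kernels above share one scheme: walk the reduction axis in RBLOCK-sized chunks, keep running accumulators (_tmp1 for the sum, _tmp2/_tmp3 for max/min), and derive argmax/argmin by updating a parallel index accumulator under the same tl.where predicate as the value. A plain-PyTorch re-expression of that running-accumulator idea, purely illustrative (this is not inductor's code):

import torch

def blocked_reduce(x, RBLOCK=4):
    # Running accumulators, same roles as _tmp1 (sum) and _tmp5/_tmp5_index
    # (max value and its index) in the generated kernels.
    acc_sum = torch.zeros(())
    acc_max = torch.full((), float("-inf"))
    acc_idx = torch.zeros((), dtype=torch.int64)
    for roffset in range(0, x.numel(), RBLOCK):
        blk = x[roffset:roffset + RBLOCK]            # slicing plays the rmask role
        idx = torch.arange(roffset, roffset + blk.numel())
        acc_sum = acc_sum + blk.sum()
        m, pos = blk.max(), idx[blk.argmax()]
        take = m > acc_max                           # the tl.where predicate
        acc_idx = torch.where(take, pos, acc_idx)
        acc_max = torch.where(take, m, acc_max)
    return acc_sum, acc_max, acc_idx

x = torch.randn(10)
s, mx, am = blocked_reduce(x)
assert torch.allclose(s, x.sum()) and mx == x.max() and am == x.argmax()

The final tl.argmax-over-accumulators, index-mask, and sum steps in the dumps do the cross-lane version of the same selection; the loop above only shows the per-chunk update rule.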
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5853216Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5853474Z [2023-01-11 21:35:28,935] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 763 2023-01-11T21:38:06.5853738Z [2023-01-11 21:35:29,045] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 763 2023-01-11T21:38:06.5854151Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5854308Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5854678Z [2023-01-11 21:35:29,058] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 764 2023-01-11T21:38:06.5854942Z [2023-01-11 21:35:29,168] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 764 2023-01-11T21:38:06.5855354Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5855481Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5855730Z [2023-01-11 21:35:29,181] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 765 2023-01-11T21:38:06.5855748Z 2023-01-11T21:38:06.5855839Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5855913Z import torch 2023-01-11T21:38:06.5855990Z import random 2023-01-11T21:38:06.5856109Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5856234Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5856239Z 2023-01-11T21:38:06.5856320Z aten = torch.ops.aten 2023-01-11T21:38:06.5856458Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5856547Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5856552Z 2023-01-11T21:38:06.5856626Z import triton 2023-01-11T21:38:06.5856717Z import triton.language as tl 2023-01-11T21:38:06.5856842Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5856985Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5856993Z 2023-01-11T21:38:06.5856997Z 2023-01-11T21:38:06.5857217Z triton_fused_argmax_argmin_0 = async_compile.triton(''' 2023-01-11T21:38:06.5857323Z import triton 2023-01-11T21:38:06.5857417Z import triton.language as tl 2023-01-11T21:38:06.5857575Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5857680Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5857809Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5857934Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5857939Z 2023-01-11T21:38:06.5858029Z @reduction(size_hints=[1, 128], 2023-01-11T21:38:06.5858141Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5858227Z filename=__file__, 2023-01-11T21:38:06.5858596Z meta={'signature': {0: '*fp32', 1: '*i64', 2: '*i64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5858674Z @triton.jit 2023-01-11T21:38:06.5858852Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5858925Z xnumel = 1 2023-01-11T21:38:06.5858997Z rnumel = 128 2023-01-11T21:38:06.5859093Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5859225Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5859307Z xmask = xindex < xnumel 2023-01-11T21:38:06.5859418Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5859599Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5859714Z _tmp1_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5859840Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5859955Z _tmp3_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5860097Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5860185Z rindex = roffset + rbase 2023-01-11T21:38:06.5860263Z rmask = rindex < rnumel 2023-01-11T21:38:06.5860335Z r0 = rindex 2023-01-11T21:38:06.5860529Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5860632Z tmp2 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5860773Z _tmp1_index = tl.where(xmask & rmask 
& (_tmp1 < tmp0), rindex, _tmp1_index) 2023-01-11T21:38:06.5860899Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.5861036Z _tmp3_index = tl.where(xmask & rmask & (_tmp3 > tmp2), rindex, _tmp3_index) 2023-01-11T21:38:06.5861153Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.5861249Z _tmp1_index_reduce = tl.reshape( 2023-01-11T21:38:06.5861359Z tl.argmax(_tmp1, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5861476Z _tmp1_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5861574Z [1, RBLOCK]) == _tmp1_index_reduce) 2023-01-11T21:38:06.5861661Z tmp1 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5861784Z tl.where(_tmp1_index_mask, _tmp1_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5861919Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.5862007Z _tmp3_index_reduce = tl.reshape( 2023-01-11T21:38:06.5862112Z tl.argmin(_tmp3, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5862230Z _tmp3_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5862326Z [1, RBLOCK]) == _tmp3_index_reduce) 2023-01-11T21:38:06.5862414Z tmp3 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5862535Z tl.where(_tmp3_index_mask, _tmp3_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5862663Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5862742Z ''') 2023-01-11T21:38:06.5862747Z 2023-01-11T21:38:06.5862756Z 2023-01-11T21:38:06.5862845Z async_compile.wait(globals()) 2023-01-11T21:38:06.5862920Z del async_compile 2023-01-11T21:38:06.5862925Z 2023-01-11T21:38:06.5863000Z def call(args): 2023-01-11T21:38:06.5863071Z arg0_1, = args 2023-01-11T21:38:06.5863144Z args.clear() 2023-01-11T21:38:06.5863236Z with torch.cuda.device(0): 2023-01-11T21:38:06.5863441Z buf0 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5863627Z buf1 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5863717Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5863877Z triton_fused_argmax_argmin_0.run(arg0_1, buf0, buf1, 1, 128, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5863949Z del arg0_1 2023-01-11T21:38:06.5864036Z return (buf0, buf1, ) 2023-01-11T21:38:06.5864041Z 2023-01-11T21:38:06.5864045Z 2023-01-11T21:38:06.5864123Z if __name__ == "__main__": 2023-01-11T21:38:06.5864240Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5864364Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5864564Z arg0_1 = rand_strided((128, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5864676Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5864681Z 2023-01-11T21:38:06.5864685Z 2023-01-11T21:38:06.5864782Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5864856Z import torch 2023-01-11T21:38:06.5864931Z import random 2023-01-11T21:38:06.5865052Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5865175Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5865180Z 2023-01-11T21:38:06.5865256Z aten = torch.ops.aten 2023-01-11T21:38:06.5865390Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5865494Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5865501Z 2023-01-11T21:38:06.5865590Z import triton 2023-01-11T21:38:06.5865688Z import triton.language as tl 2023-01-11T21:38:06.5865856Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5865993Z from torch._C 
import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5865999Z 2023-01-11T21:38:06.5866003Z 2023-01-11T21:38:06.5866175Z triton_fused_argmax_argmin_0 = async_compile.triton(''' 2023-01-11T21:38:06.5866242Z import triton 2023-01-11T21:38:06.5866337Z import triton.language as tl 2023-01-11T21:38:06.5866450Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5866552Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5866681Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5866804Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5866809Z 2023-01-11T21:38:06.5866896Z @reduction(size_hints=[1, 128], 2023-01-11T21:38:06.5867011Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5867089Z filename=__file__, 2023-01-11T21:38:06.5867466Z meta={'signature': {0: '*fp16', 1: '*i64', 2: '*i64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5867543Z @triton.jit 2023-01-11T21:38:06.5867722Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5867796Z xnumel = 1 2023-01-11T21:38:06.5867869Z rnumel = 128 2023-01-11T21:38:06.5867966Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5868094Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5868178Z xmask = xindex < xnumel 2023-01-11T21:38:06.5868297Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5868479Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5868596Z _tmp1_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5868723Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5868840Z _tmp3_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5868944Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5869025Z rindex = roffset + rbase 2023-01-11T21:38:06.5869108Z rmask = rindex < rnumel 2023-01-11T21:38:06.5869178Z r0 = rindex 2023-01-11T21:38:06.5869423Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5869544Z tmp2 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.5869686Z _tmp1_index = tl.where(xmask & rmask & (_tmp1 < tmp0), rindex, _tmp1_index) 2023-01-11T21:38:06.5869813Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.5869944Z _tmp3_index = tl.where(xmask & rmask & (_tmp3 > tmp2), rindex, _tmp3_index) 2023-01-11T21:38:06.5870066Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.5870162Z _tmp1_index_reduce = tl.reshape( 2023-01-11T21:38:06.5870273Z tl.argmax(_tmp1, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5870392Z _tmp1_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5870486Z [1, RBLOCK]) == _tmp1_index_reduce) 2023-01-11T21:38:06.5870578Z tmp1 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5870698Z tl.where(_tmp1_index_mask, _tmp1_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5870830Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.5870925Z _tmp3_index_reduce = tl.reshape( 2023-01-11T21:38:06.5871035Z tl.argmin(_tmp3, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5871155Z _tmp3_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5871251Z [1, RBLOCK]) == 
_tmp3_index_reduce) 2023-01-11T21:38:06.5871338Z tmp3 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5871461Z tl.where(_tmp3_index_mask, _tmp3_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5871584Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5871711Z ''') 2023-01-11T21:38:06.5871717Z 2023-01-11T21:38:06.5871722Z 2023-01-11T21:38:06.5871819Z async_compile.wait(globals()) 2023-01-11T21:38:06.5871898Z del async_compile 2023-01-11T21:38:06.5871903Z 2023-01-11T21:38:06.5871980Z def call(args): 2023-01-11T21:38:06.5872059Z arg0_1, = args 2023-01-11T21:38:06.5872137Z args.clear() 2023-01-11T21:38:06.5872226Z with torch.cuda.device(0): 2023-01-11T21:38:06.5872415Z buf0 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5872601Z buf1 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5872694Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5872851Z triton_fused_argmax_argmin_0.run(arg0_1, buf0, buf1, 1, 128, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5872926Z del arg0_1 2023-01-11T21:38:06.5873011Z return (buf0, buf1, ) 2023-01-11T21:38:06.5873016Z 2023-01-11T21:38:06.5873024Z 2023-01-11T21:38:06.5873107Z if __name__ == "__main__": 2023-01-11T21:38:06.5873220Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5873348Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5873553Z arg0_1 = rand_strided((128, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5873671Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5873676Z 2023-01-11T21:38:06.5873944Z [2023-01-11 21:35:29,247] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 765 2023-01-11T21:38:06.5874360Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5874496Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5874760Z [2023-01-11 21:35:29,260] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 766 2023-01-11T21:38:06.5875023Z [2023-01-11 21:35:29,270] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 766 2023-01-11T21:38:06.5875061Z 2023-01-11T21:38:06.5875162Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5875232Z import torch 2023-01-11T21:38:06.5875309Z import random 2023-01-11T21:38:06.5875455Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5875602Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5875608Z 2023-01-11T21:38:06.5875694Z aten = torch.ops.aten 2023-01-11T21:38:06.5875830Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5875927Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5875932Z 2023-01-11T21:38:06.5876001Z import triton 2023-01-11T21:38:06.5876096Z import triton.language as tl 2023-01-11T21:38:06.5876226Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5876366Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5876372Z 2023-01-11T21:38:06.5876376Z 2023-01-11T21:38:06.5876539Z triton_fused_argmax_0 = async_compile.triton(''' 2023-01-11T21:38:06.5876618Z import triton 2023-01-11T21:38:06.5876712Z import triton.language as tl 2023-01-11T21:38:06.5876827Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5876924Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5877060Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5877186Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5877191Z 2023-01-11T21:38:06.5877581Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5877683Z @triton.jit 2023-01-11T21:38:06.5877805Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5877879Z xnumel = 16 2023-01-11T21:38:06.5877980Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5878104Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5878195Z xmask = xindex < xnumel 2023-01-11T21:38:06.5878268Z x0 = xindex 2023-01-11T21:38:06.5878402Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), 0, xmask) 2023-01-11T21:38:06.5878491Z ''') 2023-01-11T21:38:06.5878497Z 2023-01-11T21:38:06.5878501Z 2023-01-11T21:38:06.5878596Z async_compile.wait(globals()) 2023-01-11T21:38:06.5878677Z del async_compile 2023-01-11T21:38:06.5878682Z 2023-01-11T21:38:06.5878752Z def call(args): 2023-01-11T21:38:06.5878828Z arg0_1, = args 2023-01-11T21:38:06.5878906Z args.clear() 2023-01-11T21:38:06.5878999Z with torch.cuda.device(0): 2023-01-11T21:38:06.5879200Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5879298Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5879433Z triton_fused_argmax_0.run(buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.5879622Z buf1 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5879759Z triton_fused_argmax_0.run(buf1, 16, grid=grid(16), 
stream=stream0) 2023-01-11T21:38:06.5879845Z return (buf0, buf1, ) 2023-01-11T21:38:06.5879850Z 2023-01-11T21:38:06.5879854Z 2023-01-11T21:38:06.5879934Z if __name__ == "__main__": 2023-01-11T21:38:06.5880054Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5880185Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5880394Z arg0_1 = rand_strided((4, 4, 1), (4, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5880508Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5880513Z 2023-01-11T21:38:06.5880520Z 2023-01-11T21:38:06.5880619Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5880689Z import torch 2023-01-11T21:38:06.5880766Z import random 2023-01-11T21:38:06.5880885Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5881009Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5881014Z 2023-01-11T21:38:06.5881126Z aten = torch.ops.aten 2023-01-11T21:38:06.5881269Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5881366Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5881371Z 2023-01-11T21:38:06.5881439Z import triton 2023-01-11T21:38:06.5881533Z import triton.language as tl 2023-01-11T21:38:06.5881659Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5881799Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5881804Z 2023-01-11T21:38:06.5881809Z 2023-01-11T21:38:06.5881969Z triton_fused_argmax_0 = async_compile.triton(''' 2023-01-11T21:38:06.5882048Z import triton 2023-01-11T21:38:06.5882139Z import triton.language as tl 2023-01-11T21:38:06.5882255Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5882352Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5882487Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5882615Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5882620Z 2023-01-11T21:38:06.5883004Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5883080Z @triton.jit 2023-01-11T21:38:06.5883201Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5883278Z xnumel = 16 2023-01-11T21:38:06.5883377Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5883501Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5883614Z xmask = xindex < xnumel 2023-01-11T21:38:06.5883687Z x0 = xindex 2023-01-11T21:38:06.5883819Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), 0, xmask) 2023-01-11T21:38:06.5883906Z ''') 2023-01-11T21:38:06.5883911Z 2023-01-11T21:38:06.5883916Z 2023-01-11T21:38:06.5884012Z async_compile.wait(globals()) 2023-01-11T21:38:06.5884094Z del async_compile 2023-01-11T21:38:06.5884100Z 2023-01-11T21:38:06.5884179Z def call(args): 2023-01-11T21:38:06.5884248Z arg0_1, = args 2023-01-11T21:38:06.5884325Z args.clear() 2023-01-11T21:38:06.5884418Z with torch.cuda.device(0): 2023-01-11T21:38:06.5884619Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5884712Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5884848Z triton_fused_argmax_0.run(buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.5885046Z buf1 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.int64) 
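        # note: arg0_1, the (4, 4, 1) fp16 input, is never read in this call. An
        # arg-reduction over the trailing size-1 dimension is identically 0, so the
        # generated triton_fused_argmax_0 just zero-fills both (4, 4) int64 outputs.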
2023-01-11T21:38:06.5885173Z triton_fused_argmax_0.run(buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.5885263Z return (buf0, buf1, ) 2023-01-11T21:38:06.5885268Z 2023-01-11T21:38:06.5885272Z 2023-01-11T21:38:06.5885356Z if __name__ == "__main__": 2023-01-11T21:38:06.5885476Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5885608Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5885815Z arg0_1 = rand_strided((4, 4, 1), (4, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5885928Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5885933Z 2023-01-11T21:38:06.5886004Z ok (0.349s) 2023-01-11T21:38:06.5886481Z test_reflection_pad2d_backward_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5886611Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5886875Z [2023-01-11 21:35:29,290] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 767 2023-01-11T21:38:06.5887217Z [2023-01-11 21:35:29,360] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 767 2023-01-11T21:38:06.5887633Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5887766Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5888021Z [2023-01-11 21:35:29,380] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 768 2023-01-11T21:38:06.5888286Z [2023-01-11 21:35:29,448] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 768 2023-01-11T21:38:06.5888702Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5888834Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5889089Z [2023-01-11 21:35:29,467] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 769 2023-01-11T21:38:06.5889095Z 2023-01-11T21:38:06.5889197Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5889267Z import torch 2023-01-11T21:38:06.5889343Z import random 2023-01-11T21:38:06.5889493Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5889620Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5889625Z 2023-01-11T21:38:06.5889708Z aten = torch.ops.aten 2023-01-11T21:38:06.5889847Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5889949Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5889954Z 2023-01-11T21:38:06.5890030Z import triton 2023-01-11T21:38:06.5890118Z import triton.language as tl 2023-01-11T21:38:06.5890243Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5890387Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5890392Z 2023-01-11T21:38:06.5890397Z 2023-01-11T21:38:06.5890592Z triton_fused_reflection_pad2d_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5890669Z import triton 2023-01-11T21:38:06.5890764Z import triton.language as tl 2023-01-11T21:38:06.5890881Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5890981Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5891119Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5891248Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5891254Z 2023-01-11T21:38:06.5891660Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5891739Z @triton.jit 2023-01-11T21:38:06.5891873Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5891953Z xnumel = 64 2023-01-11T21:38:06.5892053Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5892177Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5892263Z xmask = xindex < xnumel 2023-01-11T21:38:06.5892345Z x1 = (xindex // 8) 2023-01-11T21:38:06.5892425Z x0 = xindex % 8 2023-01-11T21:38:06.5892499Z x2 = xindex 2023-01-11T21:38:06.5892572Z tmp0 = x1 2023-01-11T21:38:06.5892645Z tmp1 = x0 2023-01-11T21:38:06.5892751Z tmp2 = tl.load(in_ptr0 + (tmp1 + (8*tmp0)), None) 2023-01-11T21:38:06.5892886Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5893002Z ''') 2023-01-11T21:38:06.5893008Z 2023-01-11T21:38:06.5893013Z 2023-01-11T21:38:06.5893110Z async_compile.wait(globals()) 2023-01-11T21:38:06.5893189Z del async_compile 2023-01-11T21:38:06.5893194Z 2023-01-11T21:38:06.5893270Z def call(args): 2023-01-11T21:38:06.5893354Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5893432Z args.clear() 2023-01-11T21:38:06.5893519Z with torch.cuda.device(0): 2023-01-11T21:38:06.5893737Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5893829Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5893999Z triton_fused_reflection_pad2d_backward_0.run(arg0_1, buf0, 64, grid=grid(64), 
stream=stream0) 2023-01-11T21:38:06.5894078Z del arg0_1 2023-01-11T21:38:06.5894157Z return (buf0, ) 2023-01-11T21:38:06.5894163Z 2023-01-11T21:38:06.5894167Z 2023-01-11T21:38:06.5894252Z if __name__ == "__main__": 2023-01-11T21:38:06.5894367Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5894601Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5894824Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5895037Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5895155Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5895160Z 2023-01-11T21:38:06.5895165Z 2023-01-11T21:38:06.5895262Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5895336Z import torch 2023-01-11T21:38:06.5895427Z import random 2023-01-11T21:38:06.5895611Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5895746Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5895751Z 2023-01-11T21:38:06.5895836Z aten = torch.ops.aten 2023-01-11T21:38:06.5895975Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5896074Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5896079Z 2023-01-11T21:38:06.5896155Z import triton 2023-01-11T21:38:06.5896249Z import triton.language as tl 2023-01-11T21:38:06.5896376Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5896510Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5896515Z 2023-01-11T21:38:06.5896525Z 2023-01-11T21:38:06.5896715Z triton_fused_reflection_pad2d_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5896792Z import triton 2023-01-11T21:38:06.5896892Z import triton.language as tl 2023-01-11T21:38:06.5897006Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5897114Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5897307Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5897441Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5897445Z 2023-01-11T21:38:06.5897845Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5897919Z @triton.jit 2023-01-11T21:38:06.5898051Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5898125Z xnumel = 64 2023-01-11T21:38:06.5898225Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5898355Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5898438Z xmask = xindex < xnumel 2023-01-11T21:38:06.5898512Z x1 = (xindex // 8) 2023-01-11T21:38:06.5898588Z x0 = xindex % 8 2023-01-11T21:38:06.5898665Z x2 = xindex 2023-01-11T21:38:06.5898739Z tmp0 = x1 2023-01-11T21:38:06.5898813Z tmp1 = x0 2023-01-11T21:38:06.5898943Z tmp2 = tl.load(in_ptr0 + (tmp1 + (8*tmp0)), None).to(tl.float32) 2023-01-11T21:38:06.5899081Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5899161Z ''') 2023-01-11T21:38:06.5899211Z 2023-01-11T21:38:06.5899216Z 2023-01-11T21:38:06.5899302Z async_compile.wait(globals()) 2023-01-11T21:38:06.5899379Z del async_compile 2023-01-11T21:38:06.5899385Z 2023-01-11T21:38:06.5899460Z def call(args): 2023-01-11T21:38:06.5899539Z 
arg0_1, arg1_1 = args 2023-01-11T21:38:06.5899614Z args.clear() 2023-01-11T21:38:06.5899707Z with torch.cuda.device(0): 2023-01-11T21:38:06.5899923Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5900009Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5900172Z triton_fused_reflection_pad2d_backward_0.run(arg0_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5900246Z del arg0_1 2023-01-11T21:38:06.5900320Z return (buf0, ) 2023-01-11T21:38:06.5900326Z 2023-01-11T21:38:06.5900330Z 2023-01-11T21:38:06.5900408Z if __name__ == "__main__": 2023-01-11T21:38:06.5900533Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5900672Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5900913Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5901146Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5901273Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5901278Z 2023-01-11T21:38:06.5901581Z [2023-01-11 21:35:29,634] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 769 2023-01-11T21:38:06.5902070Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5902247Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5902537Z [2023-01-11 21:35:29,653] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 770 2023-01-11T21:38:06.5902544Z 2023-01-11T21:38:06.5902646Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5902722Z import torch 2023-01-11T21:38:06.5902796Z import random 2023-01-11T21:38:06.5902917Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5903052Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5903057Z 2023-01-11T21:38:06.5903140Z aten = torch.ops.aten 2023-01-11T21:38:06.5903291Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5903392Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5903397Z 2023-01-11T21:38:06.5903470Z import triton 2023-01-11T21:38:06.5903570Z import triton.language as tl 2023-01-11T21:38:06.5903705Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5903852Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5903857Z 2023-01-11T21:38:06.5903872Z 2023-01-11T21:38:06.5904085Z triton_fused_reflection_pad2d_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5904159Z import triton 2023-01-11T21:38:06.5904257Z import triton.language as tl 2023-01-11T21:38:06.5904378Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5904486Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5904629Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5904766Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5904771Z 2023-01-11T21:38:06.5905249Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 
'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5905323Z @triton.jit 2023-01-11T21:38:06.5905489Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5905562Z xnumel = 64 2023-01-11T21:38:06.5905674Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5905817Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5905915Z xmask = xindex < xnumel 2023-01-11T21:38:06.5905987Z x1 = (xindex // 8) 2023-01-11T21:38:06.5906059Z x0 = xindex % 8 2023-01-11T21:38:06.5906131Z x2 = xindex 2023-01-11T21:38:06.5906205Z tmp0 = 1 + x1 2023-01-11T21:38:06.5906278Z tmp1 = 1 + x0 2023-01-11T21:38:06.5906391Z tmp2 = tl.load(in_ptr0 + (tmp1 + (10*tmp0)), None) 2023-01-11T21:38:06.5906460Z tmp3 = x0 2023-01-11T21:38:06.5906533Z tmp4 = tmp3 >= 1 2023-01-11T21:38:06.5906608Z tmp5 = tmp3 <= 1 2023-01-11T21:38:06.5906683Z tmp6 = tmp4 & tmp5 2023-01-11T21:38:06.5906791Z tmp7 = 1 + x1 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5906949Z tmp8 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5907074Z tmp9 = tl.load(in_ptr0 + (tmp8 + (10*tmp7)), tmp6, other=0) 2023-01-11T21:38:06.5907169Z tmp10 = tl.where(tmp6, tmp9, 0.0) 2023-01-11T21:38:06.5907243Z tmp11 = tmp2 + tmp10 2023-01-11T21:38:06.5907323Z tmp12 = tmp3 >= 6 2023-01-11T21:38:06.5907400Z tmp13 = tmp3 <= 6 2023-01-11T21:38:06.5907479Z tmp14 = tmp12 & tmp13 2023-01-11T21:38:06.5907585Z tmp15 = 1 + x1 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5907747Z tmp16 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5907872Z tmp17 = tl.load(in_ptr0 + (tmp16 + (10*tmp15)), tmp14, other=0) 2023-01-11T21:38:06.5907960Z tmp18 = tl.where(tmp14, tmp17, 0.0) 2023-01-11T21:38:06.5908065Z tmp19 = tmp11 + tmp18 2023-01-11T21:38:06.5908137Z tmp20 = x1 2023-01-11T21:38:06.5908211Z tmp21 = tmp20 >= 1 2023-01-11T21:38:06.5908286Z tmp22 = tmp20 <= 1 2023-01-11T21:38:06.5914676Z tmp23 = tmp21 & tmp22 2023-01-11T21:38:06.5914882Z tmp24 = 1 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5914998Z tmp25 = 1 + x0 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5915118Z tmp26 = tl.load(in_ptr0 + (tmp25 + (10*tmp24)), tmp23, other=0) 2023-01-11T21:38:06.5915218Z tmp27 = tl.where(tmp23, tmp26, 0.0) 2023-01-11T21:38:06.5915301Z tmp28 = tmp19 + tmp27 2023-01-11T21:38:06.5915381Z tmp29 = tmp20 >= 6 2023-01-11T21:38:06.5915467Z tmp30 = tmp20 <= 6 2023-01-11T21:38:06.5915566Z tmp31 = tmp29 & tmp30 2023-01-11T21:38:06.5915756Z tmp32 = 15 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5915857Z tmp33 = 1 + x0 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5915984Z tmp34 = tl.load(in_ptr0 + (tmp33 + (10*tmp32)), tmp31, other=0) 2023-01-11T21:38:06.5916084Z tmp35 = tl.where(tmp31, tmp34, 0.0) 2023-01-11T21:38:06.5916164Z tmp36 = tmp28 + tmp35 2023-01-11T21:38:06.5916240Z tmp37 = tmp23 & tmp6 2023-01-11T21:38:06.5916403Z tmp38 = 1 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5916557Z tmp39 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5916684Z tmp40 = tl.load(in_ptr0 + (tmp39 + (10*tmp38)), tmp37, other=0) 2023-01-11T21:38:06.5916782Z tmp41 = tl.where(tmp37, tmp40, 0.0) 2023-01-11T21:38:06.5916863Z tmp42 = tmp36 + tmp41 2023-01-11T21:38:06.5916943Z tmp43 = tmp23 & tmp14 2023-01-11T21:38:06.5917103Z tmp44 = 1 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5917264Z tmp45 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5917382Z tmp46 = 
tl.load(in_ptr0 + (tmp45 + (10*tmp44)), tmp43, other=0) 2023-01-11T21:38:06.5917476Z tmp47 = tl.where(tmp43, tmp46, 0.0) 2023-01-11T21:38:06.5917560Z tmp48 = tmp42 + tmp47 2023-01-11T21:38:06.5917643Z tmp49 = tmp31 & tmp6 2023-01-11T21:38:06.5917801Z tmp50 = 15 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5917960Z tmp51 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5918082Z tmp52 = tl.load(in_ptr0 + (tmp51 + (10*tmp50)), tmp49, other=0) 2023-01-11T21:38:06.5918226Z tmp53 = tl.where(tmp49, tmp52, 0.0) 2023-01-11T21:38:06.5918310Z tmp54 = tmp48 + tmp53 2023-01-11T21:38:06.5918390Z tmp55 = tmp31 & tmp14 2023-01-11T21:38:06.5918549Z tmp56 = 15 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5918709Z tmp57 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5918833Z tmp58 = tl.load(in_ptr0 + (tmp57 + (10*tmp56)), tmp55, other=0) 2023-01-11T21:38:06.5918931Z tmp59 = tl.where(tmp55, tmp58, 0.0) 2023-01-11T21:38:06.5919005Z tmp60 = tmp54 + tmp59 2023-01-11T21:38:06.5919142Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp60, xmask) 2023-01-11T21:38:06.5919232Z ''') 2023-01-11T21:38:06.5919238Z 2023-01-11T21:38:06.5919243Z 2023-01-11T21:38:06.5919334Z async_compile.wait(globals()) 2023-01-11T21:38:06.5919411Z del async_compile 2023-01-11T21:38:06.5919416Z 2023-01-11T21:38:06.5919492Z def call(args): 2023-01-11T21:38:06.5919569Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5919645Z args.clear() 2023-01-11T21:38:06.5919731Z with torch.cuda.device(0): 2023-01-11T21:38:06.5919950Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5920040Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5920205Z triton_fused_reflection_pad2d_backward_0.run(arg0_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5920279Z del arg0_1 2023-01-11T21:38:06.5920358Z return (buf0, ) 2023-01-11T21:38:06.5920363Z 2023-01-11T21:38:06.5920367Z 2023-01-11T21:38:06.5920448Z if __name__ == "__main__": 2023-01-11T21:38:06.5920567Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5920729Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5920953Z arg0_1 = rand_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5921168Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5921290Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5921296Z 2023-01-11T21:38:06.5921560Z [2023-01-11 21:35:29,809] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 770 2023-01-11T21:38:06.5921977Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5922112Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5922366Z [2023-01-11 21:35:29,829] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 771 2023-01-11T21:38:06.5922372Z 2023-01-11T21:38:06.5922469Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5922542Z import torch 2023-01-11T21:38:06.5922614Z import random 2023-01-11T21:38:06.5922734Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5922859Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5922864Z 2023-01-11T21:38:06.5922946Z aten = torch.ops.aten 2023-01-11T21:38:06.5923081Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5923178Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5923183Z 2023-01-11T21:38:06.5923257Z import triton 2023-01-11T21:38:06.5923342Z import triton.language as tl 2023-01-11T21:38:06.5923467Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5923609Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5923614Z 2023-01-11T21:38:06.5923619Z 2023-01-11T21:38:06.5923812Z triton_fused_reflection_pad2d_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5923889Z import triton 2023-01-11T21:38:06.5923980Z import triton.language as tl 2023-01-11T21:38:06.5924121Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5924229Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5924355Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5924480Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5924486Z 2023-01-11T21:38:06.5924892Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5924966Z @triton.jit 2023-01-11T21:38:06.5925099Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5925174Z xnumel = 64 2023-01-11T21:38:06.5925271Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5925417Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5925501Z xmask = xindex < xnumel 2023-01-11T21:38:06.5925597Z x1 = (xindex // 8) 2023-01-11T21:38:06.5925679Z x0 = xindex % 8 2023-01-11T21:38:06.5925749Z x2 = xindex 2023-01-11T21:38:06.5925822Z tmp0 = 1 + x1 2023-01-11T21:38:06.5925898Z tmp1 = 1 + x0 2023-01-11T21:38:06.5926021Z tmp2 = tl.load(in_ptr0 + (tmp1 + (10*tmp0)), None).to(tl.float32) 2023-01-11T21:38:06.5926091Z tmp3 = x0 2023-01-11T21:38:06.5926165Z tmp4 = tmp3 >= 1 2023-01-11T21:38:06.5926240Z tmp5 = tmp3 <= 1 2023-01-11T21:38:06.5926318Z tmp6 = tmp4 & tmp5 2023-01-11T21:38:06.5926428Z tmp7 = 1 + x1 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5926587Z tmp8 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5926751Z tmp9 = tl.load(in_ptr0 + (tmp8 + (10*tmp7)), tmp6, other=0).to(tl.float32) 2023-01-11T21:38:06.5926847Z tmp10 = tl.where(tmp6, tmp9, 0.0) 2023-01-11T21:38:06.5926930Z tmp11 = tmp2 + tmp10 2023-01-11T21:38:06.5927007Z tmp12 = tmp3 >= 6 2023-01-11T21:38:06.5927084Z tmp13 = tmp3 <= 6 2023-01-11T21:38:06.5927168Z tmp14 = tmp12 & tmp13 2023-01-11T21:38:06.5927274Z tmp15 = 1 + x1 + tl.zeros([XBLOCK], tl.int32) 
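    # right-edge band (x0 == 6): tmp15/tmp16 map this output element back to
    # padded column 15 - x0 (= 9) of the 10-wide grad_output, i.e. the column the
    # forward reflection mirrored it into; the load below is masked by tmp14.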
2023-01-11T21:38:06.5927429Z tmp16 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5927572Z tmp17 = tl.load(in_ptr0 + (tmp16 + (10*tmp15)), tmp14, other=0).to(tl.float32) 2023-01-11T21:38:06.5927667Z tmp18 = tl.where(tmp14, tmp17, 0.0) 2023-01-11T21:38:06.5927747Z tmp19 = tmp11 + tmp18 2023-01-11T21:38:06.5927822Z tmp20 = x1 2023-01-11T21:38:06.5927898Z tmp21 = tmp20 >= 1 2023-01-11T21:38:06.5927974Z tmp22 = tmp20 <= 1 2023-01-11T21:38:06.5928047Z tmp23 = tmp21 & tmp22 2023-01-11T21:38:06.5928207Z tmp24 = 1 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5928317Z tmp25 = 1 + x0 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5928452Z tmp26 = tl.load(in_ptr0 + (tmp25 + (10*tmp24)), tmp23, other=0).to(tl.float32) 2023-01-11T21:38:06.5928546Z tmp27 = tl.where(tmp23, tmp26, 0.0) 2023-01-11T21:38:06.5928626Z tmp28 = tmp19 + tmp27 2023-01-11T21:38:06.5928709Z tmp29 = tmp20 >= 6 2023-01-11T21:38:06.5928779Z tmp30 = tmp20 <= 6 2023-01-11T21:38:06.5928860Z tmp31 = tmp29 & tmp30 2023-01-11T21:38:06.5929022Z tmp32 = 15 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5929127Z tmp33 = 1 + x0 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5929267Z tmp34 = tl.load(in_ptr0 + (tmp33 + (10*tmp32)), tmp31, other=0).to(tl.float32) 2023-01-11T21:38:06.5929362Z tmp35 = tl.where(tmp31, tmp34, 0.0) 2023-01-11T21:38:06.5929442Z tmp36 = tmp28 + tmp35 2023-01-11T21:38:06.5929515Z tmp37 = tmp23 & tmp6 2023-01-11T21:38:06.5929677Z tmp38 = 1 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5929837Z tmp39 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5929976Z tmp40 = tl.load(in_ptr0 + (tmp39 + (10*tmp38)), tmp37, other=0).to(tl.float32) 2023-01-11T21:38:06.5930070Z tmp41 = tl.where(tmp37, tmp40, 0.0) 2023-01-11T21:38:06.5930148Z tmp42 = tmp36 + tmp41 2023-01-11T21:38:06.5930264Z tmp43 = tmp23 & tmp14 2023-01-11T21:38:06.5930423Z tmp44 = 1 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5930576Z tmp45 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5930714Z tmp46 = tl.load(in_ptr0 + (tmp45 + (10*tmp44)), tmp43, other=0).to(tl.float32) 2023-01-11T21:38:06.5930807Z tmp47 = tl.where(tmp43, tmp46, 0.0) 2023-01-11T21:38:06.5930886Z tmp48 = tmp42 + tmp47 2023-01-11T21:38:06.5930964Z tmp49 = tmp31 & tmp6 2023-01-11T21:38:06.5931121Z tmp50 = 15 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5931279Z tmp51 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5931409Z tmp52 = tl.load(in_ptr0 + (tmp51 + (10*tmp50)), tmp49, other=0).to(tl.float32) 2023-01-11T21:38:06.5931503Z tmp53 = tl.where(tmp49, tmp52, 0.0) 2023-01-11T21:38:06.5931581Z tmp54 = tmp48 + tmp53 2023-01-11T21:38:06.5931658Z tmp55 = tmp31 & tmp14 2023-01-11T21:38:06.5931817Z tmp56 = 15 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5931976Z tmp57 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5932113Z tmp58 = tl.load(in_ptr0 + (tmp57 + (10*tmp56)), tmp55, other=0).to(tl.float32) 2023-01-11T21:38:06.5932200Z tmp59 = tl.where(tmp55, tmp58, 0.0) 2023-01-11T21:38:06.5932284Z tmp60 = tmp54 + tmp59 2023-01-11T21:38:06.5932419Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp60, xmask) 2023-01-11T21:38:06.5932504Z ''') 2023-01-11T21:38:06.5932509Z 2023-01-11T21:38:06.5932514Z 2023-01-11T21:38:06.5932608Z async_compile.wait(globals()) 2023-01-11T21:38:06.5932684Z del async_compile 2023-01-11T21:38:06.5932717Z 2023-01-11T21:38:06.5932792Z def call(args): 
2023-01-11T21:38:06.5932871Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5932938Z args.clear() 2023-01-11T21:38:06.5933031Z with torch.cuda.device(0): 2023-01-11T21:38:06.5933251Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5933346Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5933510Z triton_fused_reflection_pad2d_backward_0.run(arg0_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5933583Z del arg0_1 2023-01-11T21:38:06.5933662Z return (buf0, ) 2023-01-11T21:38:06.5933667Z 2023-01-11T21:38:06.5933671Z 2023-01-11T21:38:06.5933750Z if __name__ == "__main__": 2023-01-11T21:38:06.5933861Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5933988Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5934212Z arg0_1 = rand_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5934426Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5934780Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5934786Z 2023-01-11T21:38:06.5935060Z [2023-01-11 21:35:29,993] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 771 2023-01-11T21:38:06.5935502Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5935652Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5935918Z [2023-01-11 21:35:30,012] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 772 2023-01-11T21:38:06.5935927Z 2023-01-11T21:38:06.5936018Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5936095Z import torch 2023-01-11T21:38:06.5936167Z import random 2023-01-11T21:38:06.5936284Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5936457Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5936463Z 2023-01-11T21:38:06.5936544Z aten = torch.ops.aten 2023-01-11T21:38:06.5936682Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5936777Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5936782Z 2023-01-11T21:38:06.5936850Z import triton 2023-01-11T21:38:06.5936943Z import triton.language as tl 2023-01-11T21:38:06.5937066Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5937264Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5937270Z 2023-01-11T21:38:06.5937275Z 2023-01-11T21:38:06.5937477Z triton_fused_reflection_pad2d_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5937556Z import triton 2023-01-11T21:38:06.5937647Z import triton.language as tl 2023-01-11T21:38:06.5937755Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5937856Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5937991Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5938117Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5938122Z 2023-01-11T21:38:06.5938527Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': 
{}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5938601Z @triton.jit 2023-01-11T21:38:06.5938734Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5938807Z xnumel = 64 2023-01-11T21:38:06.5938898Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5939072Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5939160Z xmask = xindex < xnumel 2023-01-11T21:38:06.5939242Z x1 = (xindex // 8) 2023-01-11T21:38:06.5939320Z x0 = xindex % 8 2023-01-11T21:38:06.5939393Z x2 = xindex 2023-01-11T21:38:06.5939468Z tmp0 = 3 + x1 2023-01-11T21:38:06.5939540Z tmp1 = 1 + x0 2023-01-11T21:38:06.5939657Z tmp2 = tl.load(in_ptr0 + (tmp1 + (11*tmp0)), None) 2023-01-11T21:38:06.5939728Z tmp3 = x0 2023-01-11T21:38:06.5939809Z tmp4 = tmp3 >= 1 2023-01-11T21:38:06.5939887Z tmp5 = tmp3 <= 1 2023-01-11T21:38:06.5939969Z tmp6 = tmp4 & tmp5 2023-01-11T21:38:06.5940079Z tmp7 = 3 + x1 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5940233Z tmp8 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5940359Z tmp9 = tl.load(in_ptr0 + (tmp8 + (11*tmp7)), tmp6, other=0) 2023-01-11T21:38:06.5940457Z tmp10 = tl.where(tmp6, tmp9, 0.0) 2023-01-11T21:38:06.5940542Z tmp11 = tmp2 + tmp10 2023-01-11T21:38:06.5940627Z tmp12 = tmp3 >= 5 2023-01-11T21:38:06.5940706Z tmp13 = tmp3 <= 6 2023-01-11T21:38:06.5940787Z tmp14 = tmp12 & tmp13 2023-01-11T21:38:06.5940890Z tmp15 = 3 + x1 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5941054Z tmp16 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5941183Z tmp17 = tl.load(in_ptr0 + (tmp16 + (11*tmp15)), tmp14, other=0) 2023-01-11T21:38:06.5941281Z tmp18 = tl.where(tmp14, tmp17, 0.0) 2023-01-11T21:38:06.5941363Z tmp19 = tmp11 + tmp18 2023-01-11T21:38:06.5941439Z tmp20 = x1 2023-01-11T21:38:06.5941519Z tmp21 = tmp20 >= 1 2023-01-11T21:38:06.5941591Z tmp22 = tmp20 <= 3 2023-01-11T21:38:06.5941673Z tmp23 = tmp21 & tmp22 2023-01-11T21:38:06.5941835Z tmp24 = 3 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5941943Z tmp25 = 1 + x0 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5942071Z tmp26 = tl.load(in_ptr0 + (tmp25 + (11*tmp24)), tmp23, other=0) 2023-01-11T21:38:06.5942176Z tmp27 = tl.where(tmp23, tmp26, 0.0) 2023-01-11T21:38:06.5942259Z tmp28 = tmp19 + tmp27 2023-01-11T21:38:06.5942331Z tmp29 = tmp20 >= 3 2023-01-11T21:38:06.5942410Z tmp30 = tmp20 <= 6 2023-01-11T21:38:06.5942495Z tmp31 = tmp29 & tmp30 2023-01-11T21:38:06.5942656Z tmp32 = 17 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5942796Z tmp33 = 1 + x0 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5942924Z tmp34 = tl.load(in_ptr0 + (tmp33 + (11*tmp32)), tmp31, other=0) 2023-01-11T21:38:06.5943024Z tmp35 = tl.where(tmp31, tmp34, 0.0) 2023-01-11T21:38:06.5943099Z tmp36 = tmp28 + tmp35 2023-01-11T21:38:06.5943182Z tmp37 = tmp23 & tmp6 2023-01-11T21:38:06.5943342Z tmp38 = 3 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5943501Z tmp39 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5943627Z tmp40 = tl.load(in_ptr0 + (tmp39 + (11*tmp38)), tmp37, other=0) 2023-01-11T21:38:06.5943724Z tmp41 = tl.where(tmp37, tmp40, 0.0) 2023-01-11T21:38:06.5943811Z tmp42 = tmp36 + tmp41 2023-01-11T21:38:06.5943887Z tmp43 = tmp23 & tmp14 2023-01-11T21:38:06.5944046Z tmp44 = 3 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5944208Z tmp45 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 
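    # corner term (top band tmp23 & right band tmp14): tmp44/tmp45 address the
    # reflected top-right corner of the 15x11 grad_output at (3 - x1, 15 - x0);
    # the masked load below contributes 0 for lanes outside both bands.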
2023-01-11T21:38:06.5944338Z tmp46 = tl.load(in_ptr0 + (tmp45 + (11*tmp44)), tmp43, other=0) 2023-01-11T21:38:06.5944434Z tmp47 = tl.where(tmp43, tmp46, 0.0) 2023-01-11T21:38:06.5944515Z tmp48 = tmp42 + tmp47 2023-01-11T21:38:06.5944596Z tmp49 = tmp31 & tmp6 2023-01-11T21:38:06.5944748Z tmp50 = 17 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5944911Z tmp51 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5945034Z tmp52 = tl.load(in_ptr0 + (tmp51 + (11*tmp50)), tmp49, other=0) 2023-01-11T21:38:06.5945129Z tmp53 = tl.where(tmp49, tmp52, 0.0) 2023-01-11T21:38:06.5945212Z tmp54 = tmp48 + tmp53 2023-01-11T21:38:06.5945297Z tmp55 = tmp31 & tmp14 2023-01-11T21:38:06.5945490Z tmp56 = 17 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5945643Z tmp57 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5945769Z tmp58 = tl.load(in_ptr0 + (tmp57 + (11*tmp56)), tmp55, other=0) 2023-01-11T21:38:06.5945865Z tmp59 = tl.where(tmp55, tmp58, 0.0) 2023-01-11T21:38:06.5945952Z tmp60 = tmp54 + tmp59 2023-01-11T21:38:06.5946095Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp60, xmask) 2023-01-11T21:38:06.5946180Z ''') 2023-01-11T21:38:06.5946185Z 2023-01-11T21:38:06.5946190Z 2023-01-11T21:38:06.5946285Z async_compile.wait(globals()) 2023-01-11T21:38:06.5946364Z del async_compile 2023-01-11T21:38:06.5946369Z 2023-01-11T21:38:06.5946439Z def call(args): 2023-01-11T21:38:06.5946521Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5946598Z args.clear() 2023-01-11T21:38:06.5946694Z with torch.cuda.device(0): 2023-01-11T21:38:06.5946912Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5947013Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5947180Z triton_fused_reflection_pad2d_backward_0.run(arg0_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5947249Z del arg0_1 2023-01-11T21:38:06.5947329Z return (buf0, ) 2023-01-11T21:38:06.5947337Z 2023-01-11T21:38:06.5947342Z 2023-01-11T21:38:06.5947426Z if __name__ == "__main__": 2023-01-11T21:38:06.5947548Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5947678Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5947903Z arg0_1 = rand_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5948117Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5948239Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5948244Z 2023-01-11T21:38:06.5948515Z [2023-01-11 21:35:30,165] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 772 2023-01-11T21:38:06.5948520Z 2023-01-11T21:38:06.5948614Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5948694Z import torch 2023-01-11T21:38:06.5948771Z import random 2023-01-11T21:38:06.5948922Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5949052Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5949057Z 2023-01-11T21:38:06.5949141Z aten = torch.ops.aten 2023-01-11T21:38:06.5949281Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5949372Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5949384Z 2023-01-11T21:38:06.5949453Z import triton 2023-01-11T21:38:06.5949546Z import triton.language as tl 2023-01-11T21:38:06.5949675Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5949818Z from 
torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5949826Z 2023-01-11T21:38:06.5949831Z 2023-01-11T21:38:06.5950031Z triton_fused_reflection_pad2d_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5950107Z import triton 2023-01-11T21:38:06.5950203Z import triton.language as tl 2023-01-11T21:38:06.5950312Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5950421Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5950555Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5950683Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5950689Z 2023-01-11T21:38:06.5951091Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5951167Z @triton.jit 2023-01-11T21:38:06.5951304Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5951377Z xnumel = 64 2023-01-11T21:38:06.5951502Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5951636Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5951721Z xmask = xindex < xnumel 2023-01-11T21:38:06.5951801Z x1 = (xindex // 8) 2023-01-11T21:38:06.5951877Z x0 = xindex % 8 2023-01-11T21:38:06.5951950Z x2 = xindex 2023-01-11T21:38:06.5952027Z tmp0 = 3 + x1 2023-01-11T21:38:06.5952096Z tmp1 = 1 + x0 2023-01-11T21:38:06.5952227Z tmp2 = tl.load(in_ptr0 + (tmp1 + (11*tmp0)), None).to(tl.float32) 2023-01-11T21:38:06.5952301Z tmp3 = x0 2023-01-11T21:38:06.5952380Z tmp4 = tmp3 >= 1 2023-01-11T21:38:06.5952459Z tmp5 = tmp3 <= 1 2023-01-11T21:38:06.5952539Z tmp6 = tmp4 & tmp5 2023-01-11T21:38:06.5952653Z tmp7 = 3 + x1 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5952806Z tmp8 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5952947Z tmp9 = tl.load(in_ptr0 + (tmp8 + (11*tmp7)), tmp6, other=0).to(tl.float32) 2023-01-11T21:38:06.5953050Z tmp10 = tl.where(tmp6, tmp9, 0.0) 2023-01-11T21:38:06.5953133Z tmp11 = tmp2 + tmp10 2023-01-11T21:38:06.5953212Z tmp12 = tmp3 >= 5 2023-01-11T21:38:06.5953291Z tmp13 = tmp3 <= 6 2023-01-11T21:38:06.5953374Z tmp14 = tmp12 & tmp13 2023-01-11T21:38:06.5953476Z tmp15 = 3 + x1 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5953644Z tmp16 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5953786Z tmp17 = tl.load(in_ptr0 + (tmp16 + (11*tmp15)), tmp14, other=0).to(tl.float32) 2023-01-11T21:38:06.5953884Z tmp18 = tl.where(tmp14, tmp17, 0.0) 2023-01-11T21:38:06.5953967Z tmp19 = tmp11 + tmp18 2023-01-11T21:38:06.5954043Z tmp20 = x1 2023-01-11T21:38:06.5954122Z tmp21 = tmp20 >= 1 2023-01-11T21:38:06.5954195Z tmp22 = tmp20 <= 3 2023-01-11T21:38:06.5954276Z tmp23 = tmp21 & tmp22 2023-01-11T21:38:06.5954438Z tmp24 = 3 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5954546Z tmp25 = 1 + x0 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5954691Z tmp26 = tl.load(in_ptr0 + (tmp25 + (11*tmp24)), tmp23, other=0).to(tl.float32) 2023-01-11T21:38:06.5954789Z tmp27 = tl.where(tmp23, tmp26, 0.0) 2023-01-11T21:38:06.5954871Z tmp28 = tmp19 + tmp27 2023-01-11T21:38:06.5954943Z tmp29 = tmp20 >= 3 2023-01-11T21:38:06.5955023Z tmp30 = tmp20 <= 6 2023-01-11T21:38:06.5955135Z tmp31 = tmp29 & tmp30 2023-01-11T21:38:06.5955305Z tmp32 = 17 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5955436Z tmp33 = 1 + x0 + 
tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5955599Z tmp34 = tl.load(in_ptr0 + (tmp33 + (11*tmp32)), tmp31, other=0).to(tl.float32) 2023-01-11T21:38:06.5955698Z tmp35 = tl.where(tmp31, tmp34, 0.0) 2023-01-11T21:38:06.5955774Z tmp36 = tmp28 + tmp35 2023-01-11T21:38:06.5955855Z tmp37 = tmp23 & tmp6 2023-01-11T21:38:06.5956015Z tmp38 = 3 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5956176Z tmp39 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5956319Z tmp40 = tl.load(in_ptr0 + (tmp39 + (11*tmp38)), tmp37, other=0).to(tl.float32) 2023-01-11T21:38:06.5956416Z tmp41 = tl.where(tmp37, tmp40, 0.0) 2023-01-11T21:38:06.5956499Z tmp42 = tmp36 + tmp41 2023-01-11T21:38:06.5956574Z tmp43 = tmp23 & tmp14 2023-01-11T21:38:06.5956736Z tmp44 = 3 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5956898Z tmp45 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5957040Z tmp46 = tl.load(in_ptr0 + (tmp45 + (11*tmp44)), tmp43, other=0).to(tl.float32) 2023-01-11T21:38:06.5957137Z tmp47 = tl.where(tmp43, tmp46, 0.0) 2023-01-11T21:38:06.5957220Z tmp48 = tmp42 + tmp47 2023-01-11T21:38:06.5957304Z tmp49 = tmp31 & tmp6 2023-01-11T21:38:06.5957457Z tmp50 = 17 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5957619Z tmp51 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5957756Z tmp52 = tl.load(in_ptr0 + (tmp51 + (11*tmp50)), tmp49, other=0).to(tl.float32) 2023-01-11T21:38:06.5957883Z tmp53 = tl.where(tmp49, tmp52, 0.0) 2023-01-11T21:38:06.5957965Z tmp54 = tmp48 + tmp53 2023-01-11T21:38:06.5958046Z tmp55 = tmp31 & tmp14 2023-01-11T21:38:06.5958207Z tmp56 = 17 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5958370Z tmp57 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5958505Z tmp58 = tl.load(in_ptr0 + (tmp57 + (11*tmp56)), tmp55, other=0).to(tl.float32) 2023-01-11T21:38:06.5958601Z tmp59 = tl.where(tmp55, tmp58, 0.0) 2023-01-11T21:38:06.5958684Z tmp60 = tmp54 + tmp59 2023-01-11T21:38:06.5958822Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp60, xmask) 2023-01-11T21:38:06.5958911Z ''') 2023-01-11T21:38:06.5958917Z 2023-01-11T21:38:06.5958922Z 2023-01-11T21:38:06.5959017Z async_compile.wait(globals()) 2023-01-11T21:38:06.5959098Z del async_compile 2023-01-11T21:38:06.5959103Z 2023-01-11T21:38:06.5959173Z def call(args): 2023-01-11T21:38:06.5959260Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5959337Z args.clear() 2023-01-11T21:38:06.5959434Z with torch.cuda.device(0): 2023-01-11T21:38:06.5959650Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5959744Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5959914Z triton_fused_reflection_pad2d_backward_0.run(arg0_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5959991Z del arg0_1 2023-01-11T21:38:06.5960064Z return (buf0, ) 2023-01-11T21:38:06.5960070Z 2023-01-11T21:38:06.5960074Z 2023-01-11T21:38:06.5960158Z if __name__ == "__main__": 2023-01-11T21:38:06.5960280Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5960407Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5960631Z arg0_1 = rand_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5960844Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5960970Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5960976Z 
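The two kernels above are the fp32 and fp16 lowerings of reflection_pad2d_backward. A minimal repro sketch, assuming the torch.compile entry point and pad widths (left=1, right=2, top=3, bottom=4) inferred from the (1, 1, 8, 8) -> (1, 1, 15, 11) buffer shapes in the generated code:

import torch

def fn(x):
    # reflection pad; widths inferred from the buffer shapes above
    return torch.nn.functional.pad(x, (1, 2, 3, 4), mode="reflect")

x = torch.randn(1, 1, 8, 8, device="cuda", requires_grad=True)
out = torch.compile(fn)(x)   # forward: (1, 1, 8, 8) -> (1, 1, 15, 11)
out.sum().backward()         # backward emits a pad2d_backward kernel like the ones above

Each input-gradient element accumulates every padded location that reflects onto it, which is why the kernel issues one unconditional load (the interior) plus eight masked loads (the mirrored edge and corner bands) before the single masked store.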
2023-01-11T21:38:06.5961050Z ok (0.896s) 2023-01-11T21:38:06.5961537Z test_reflection_pad2d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5961678Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5961943Z [2023-01-11 21:35:30,192] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 773 2023-01-11T21:38:06.5962211Z [2023-01-11 21:35:30,298] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 773 2023-01-11T21:38:06.5962628Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5962761Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5963015Z [2023-01-11 21:35:30,324] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 774 2023-01-11T21:38:06.5963277Z [2023-01-11 21:35:30,421] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 774 2023-01-11T21:38:06.5963283Z 2023-01-11T21:38:06.5963383Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5963462Z import torch 2023-01-11T21:38:06.5963539Z import random 2023-01-11T21:38:06.5963655Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5963823Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5963829Z 2023-01-11T21:38:06.5963915Z aten = torch.ops.aten 2023-01-11T21:38:06.5964055Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5964153Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5964160Z 2023-01-11T21:38:06.5964238Z import triton 2023-01-11T21:38:06.5964335Z import triton.language as tl 2023-01-11T21:38:06.5964456Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5964599Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5964605Z 2023-01-11T21:38:06.5964609Z 2023-01-11T21:38:06.5964793Z triton_fused_reflection_pad2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.5964873Z import triton 2023-01-11T21:38:06.5964967Z import triton.language as tl 2023-01-11T21:38:06.5965083Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5965188Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5965328Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5965448Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5965453Z 2023-01-11T21:38:06.5965860Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5965933Z @triton.jit 2023-01-11T21:38:06.5966070Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5966146Z xnumel = 100 
2023-01-11T21:38:06.5966246Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5966380Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5966463Z xmask = xindex < xnumel 2023-01-11T21:38:06.5966539Z x1 = (xindex // 10) 2023-01-11T21:38:06.5966619Z x0 = xindex % 10 2023-01-11T21:38:06.5966690Z x2 = xindex 2023-01-11T21:38:06.5966765Z tmp0 = 7 2023-01-11T21:38:06.5966838Z tmp1 = x1 2023-01-11T21:38:06.5966910Z tmp2 = 1 2023-01-11T21:38:06.5967016Z tmp3 = tmp1 - tmp2 2023-01-11T21:38:06.5967099Z tmp4 = tl.abs(tmp3) 2023-01-11T21:38:06.5967210Z tmp5 = tmp0 - tmp4 2023-01-11T21:38:06.5967291Z tmp6 = tl.abs(tmp5) 2023-01-11T21:38:06.5967429Z tmp7 = tmp0 - tmp6 2023-01-11T21:38:06.5967504Z tmp8 = x0 2023-01-11T21:38:06.5967613Z tmp9 = tmp8 - tmp2 2023-01-11T21:38:06.5967690Z tmp10 = tl.abs(tmp9) 2023-01-11T21:38:06.5967804Z tmp11 = tmp0 - tmp10 2023-01-11T21:38:06.5967886Z tmp12 = tl.abs(tmp11) 2023-01-11T21:38:06.5968000Z tmp13 = tmp0 - tmp12 2023-01-11T21:38:06.5968211Z tmp14 = tl.load(in_ptr0 + (tmp13 + (8*tmp7)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.5968352Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.5968439Z ''') 2023-01-11T21:38:06.5968445Z 2023-01-11T21:38:06.5968449Z 2023-01-11T21:38:06.5968630Z triton_fused_reflection_pad2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.5968708Z import triton 2023-01-11T21:38:06.5968802Z import triton.language as tl 2023-01-11T21:38:06.5968919Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5969025Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5969164Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5969291Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5969297Z 2023-01-11T21:38:06.5969699Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5969769Z @triton.jit 2023-01-11T21:38:06.5969903Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5969979Z xnumel = 165 2023-01-11T21:38:06.5970080Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5970239Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5970325Z xmask = xindex < xnumel 2023-01-11T21:38:06.5970407Z x1 = (xindex // 11) 2023-01-11T21:38:06.5970478Z x0 = xindex % 11 2023-01-11T21:38:06.5970548Z x2 = xindex 2023-01-11T21:38:06.5970620Z tmp0 = 7 2023-01-11T21:38:06.5970695Z tmp1 = x1 2023-01-11T21:38:06.5970767Z tmp2 = 3 2023-01-11T21:38:06.5970880Z tmp3 = tmp1 - tmp2 2023-01-11T21:38:06.5970962Z tmp4 = tl.abs(tmp3) 2023-01-11T21:38:06.5971064Z tmp5 = tmp0 - tmp4 2023-01-11T21:38:06.5971146Z tmp6 = tl.abs(tmp5) 2023-01-11T21:38:06.5971255Z tmp7 = tmp0 - tmp6 2023-01-11T21:38:06.5971328Z tmp8 = x0 2023-01-11T21:38:06.5971400Z tmp9 = 1 2023-01-11T21:38:06.5971511Z tmp10 = tmp8 - tmp9 2023-01-11T21:38:06.5971589Z tmp11 = tl.abs(tmp10) 2023-01-11T21:38:06.5971706Z tmp12 = tmp0 - tmp11 2023-01-11T21:38:06.5971789Z tmp13 = tl.abs(tmp12) 2023-01-11T21:38:06.5971905Z tmp14 = tmp0 - tmp13 2023-01-11T21:38:06.5972023Z tmp15 = tl.load(in_ptr0 + (tmp14 + (8*tmp7)), None) 2023-01-11T21:38:06.5972159Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.5972248Z ''') 2023-01-11T21:38:06.5972253Z 
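The index arithmetic in both forward kernels is the branch-free reflect trick: with m = size - 1 (tmp0 == 7 for the 8-wide input), the source index for output position i under a leading pad p is m - |m - |i - p||. A plain-Python sketch of the same computation (the helper name is illustrative, not from the generated code):

def reflect_index(i, pad, size):
    # mirrors i back into [0, size), matching tmp7/tmp14 above
    m = size - 1
    return m - abs(m - abs(i - pad))

# 10-wide output from the 8-wide input with pad 1 (triton_fused_reflection_pad2d_0):
assert [reflect_index(i, 1, 8) for i in range(10)] == [1, 0, 1, 2, 3, 4, 5, 6, 7, 6]

Because the reflected index always lands in bounds, the loads above can pass mask None; only the store is guarded by xmask.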
2023-01-11T21:38:06.5972258Z 2023-01-11T21:38:06.5972356Z async_compile.wait(globals()) 2023-01-11T21:38:06.5972430Z del async_compile 2023-01-11T21:38:06.5972434Z 2023-01-11T21:38:06.5972512Z def call(args): 2023-01-11T21:38:06.5972587Z arg0_1, = args 2023-01-11T21:38:06.5972664Z args.clear() 2023-01-11T21:38:06.5972760Z with torch.cuda.device(0): 2023-01-11T21:38:06.5972989Z buf0 = empty_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5973088Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5973235Z triton_fused_reflection_pad2d_0.run(arg0_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.5973462Z buf1 = empty_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5973622Z triton_fused_reflection_pad2d_1_1.run(arg0_1, buf1, 165, grid=grid(165), stream=stream0) 2023-01-11T21:38:06.5973699Z del arg0_1 2023-01-11T21:38:06.5973785Z return (buf0, buf1, ) 2023-01-11T21:38:06.5973790Z 2023-01-11T21:38:06.5973834Z 2023-01-11T21:38:06.5973918Z if __name__ == "__main__": 2023-01-11T21:38:06.5974037Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5974167Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5974377Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5974603Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5974609Z 2023-01-11T21:38:06.5974614Z 2023-01-11T21:38:06.5974719Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5974798Z import torch 2023-01-11T21:38:06.5974875Z import random 2023-01-11T21:38:06.5975002Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5975128Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5975133Z 2023-01-11T21:38:06.5975214Z aten = torch.ops.aten 2023-01-11T21:38:06.5975346Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5975447Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5975452Z 2023-01-11T21:38:06.5975527Z import triton 2023-01-11T21:38:06.5975621Z import triton.language as tl 2023-01-11T21:38:06.5975749Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5975893Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5975899Z 2023-01-11T21:38:06.5975903Z 2023-01-11T21:38:06.5976085Z triton_fused_reflection_pad2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.5976162Z import triton 2023-01-11T21:38:06.5976250Z import triton.language as tl 2023-01-11T21:38:06.5976366Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5976519Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5976655Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5976783Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5976788Z 2023-01-11T21:38:06.5977258Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5977336Z @triton.jit 2023-01-11T21:38:06.5977470Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5977540Z xnumel = 100 2023-01-11T21:38:06.5977642Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5977775Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 
2023-01-11T21:38:06.5977861Z xmask = xindex < xnumel 2023-01-11T21:38:06.5977944Z x1 = (xindex // 10) 2023-01-11T21:38:06.5978021Z x0 = xindex % 10 2023-01-11T21:38:06.5978096Z x2 = xindex 2023-01-11T21:38:06.5978162Z tmp0 = 7 2023-01-11T21:38:06.5978237Z tmp1 = x1 2023-01-11T21:38:06.5978311Z tmp2 = 1 2023-01-11T21:38:06.5978424Z tmp3 = tmp1 - tmp2 2023-01-11T21:38:06.5978504Z tmp4 = tl.abs(tmp3) 2023-01-11T21:38:06.5978618Z tmp5 = tmp0 - tmp4 2023-01-11T21:38:06.5978695Z tmp6 = tl.abs(tmp5) 2023-01-11T21:38:06.5978806Z tmp7 = tmp0 - tmp6 2023-01-11T21:38:06.5978878Z tmp8 = x0 2023-01-11T21:38:06.5978986Z tmp9 = tmp8 - tmp2 2023-01-11T21:38:06.5979072Z tmp10 = tl.abs(tmp9) 2023-01-11T21:38:06.5979186Z tmp11 = tmp0 - tmp10 2023-01-11T21:38:06.5979271Z tmp12 = tl.abs(tmp11) 2023-01-11T21:38:06.5979378Z tmp13 = tmp0 - tmp12 2023-01-11T21:38:06.5979617Z tmp14 = tl.load(in_ptr0 + (tmp13 + (8*tmp7)), None, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5979754Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.5979842Z ''') 2023-01-11T21:38:06.5979850Z 2023-01-11T21:38:06.5979855Z 2023-01-11T21:38:06.5980039Z triton_fused_reflection_pad2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.5980114Z import triton 2023-01-11T21:38:06.5980210Z import triton.language as tl 2023-01-11T21:38:06.5980324Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5980466Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5980606Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5980733Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5980739Z 2023-01-11T21:38:06.5981142Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5981222Z @triton.jit 2023-01-11T21:38:06.5981354Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5981430Z xnumel = 165 2023-01-11T21:38:06.5981534Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5981657Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5981743Z xmask = xindex < xnumel 2023-01-11T21:38:06.5981825Z x1 = (xindex // 11) 2023-01-11T21:38:06.5981902Z x0 = xindex % 11 2023-01-11T21:38:06.5981974Z x2 = xindex 2023-01-11T21:38:06.5982046Z tmp0 = 7 2023-01-11T21:38:06.5982112Z tmp1 = x1 2023-01-11T21:38:06.5982184Z tmp2 = 3 2023-01-11T21:38:06.5982296Z tmp3 = tmp1 - tmp2 2023-01-11T21:38:06.5982378Z tmp4 = tl.abs(tmp3) 2023-01-11T21:38:06.5982487Z tmp5 = tmp0 - tmp4 2023-01-11T21:38:06.5982567Z tmp6 = tl.abs(tmp5) 2023-01-11T21:38:06.5982676Z tmp7 = tmp0 - tmp6 2023-01-11T21:38:06.5982743Z tmp8 = x0 2023-01-11T21:38:06.5982814Z tmp9 = 1 2023-01-11T21:38:06.5982926Z tmp10 = tmp8 - tmp9 2023-01-11T21:38:06.5983008Z tmp11 = tl.abs(tmp10) 2023-01-11T21:38:06.5983123Z tmp12 = tmp0 - tmp11 2023-01-11T21:38:06.5983237Z tmp13 = tl.abs(tmp12) 2023-01-11T21:38:06.5983341Z tmp14 = tmp0 - tmp13 2023-01-11T21:38:06.5983473Z tmp15 = tl.load(in_ptr0 + (tmp14 + (8*tmp7)), None).to(tl.float32) 2023-01-11T21:38:06.5983610Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.5983695Z ''') 2023-01-11T21:38:06.5983701Z 2023-01-11T21:38:06.5983708Z 2023-01-11T21:38:06.5983804Z async_compile.wait(globals()) 2023-01-11T21:38:06.5983881Z del async_compile 
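The fp16 variants above differ from their fp32 counterparts only in the *fp16 pointer signatures and in each tl.load gaining a .to(tl.float32) cast: the generated code upcasts half-precision inputs, does the indexing and arithmetic in fp32, and writes back through the fp16 output pointer. Schematically (offs stands in for the kernel's computed index; not a name from the log):

tmp = tl.load(in_ptr0 + offs, None).to(tl.float32)  # fp16 -> fp32 on load
tl.store(out_ptr0 + offs, tmp, xmask)               # narrowed back through the *fp16 pointer on store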
2023-01-11T21:38:06.5983885Z 2023-01-11T21:38:06.5983959Z def call(args): 2023-01-11T21:38:06.5984031Z arg0_1, = args 2023-01-11T21:38:06.5984098Z args.clear() 2023-01-11T21:38:06.5984193Z with torch.cuda.device(0): 2023-01-11T21:38:06.5984416Z buf0 = empty_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5984508Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5984661Z triton_fused_reflection_pad2d_0.run(arg0_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.5984884Z buf1 = empty_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5985038Z triton_fused_reflection_pad2d_1_1.run(arg0_1, buf1, 165, grid=grid(165), stream=stream0) 2023-01-11T21:38:06.5985111Z del arg0_1 2023-01-11T21:38:06.5985191Z return (buf0, buf1, ) 2023-01-11T21:38:06.5985196Z 2023-01-11T21:38:06.5985200Z 2023-01-11T21:38:06.5985282Z if __name__ == "__main__": 2023-01-11T21:38:06.5985406Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5985549Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5985796Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5985908Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5985913Z 2023-01-11T21:38:06.5985983Z ok (0.255s) 2023-01-11T21:38:06.5986435Z test_relu_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5986597Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5986853Z [2023-01-11 21:35:30,443] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 775 2023-01-11T21:38:06.5987117Z [2023-01-11 21:35:30,520] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 775 2023-01-11T21:38:06.5987537Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5987671Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5987926Z [2023-01-11 21:35:30,542] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 776 2023-01-11T21:38:06.5988194Z [2023-01-11 21:35:30,620] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 776 2023-01-11T21:38:06.5988200Z 2023-01-11T21:38:06.5988298Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5988373Z import torch 2023-01-11T21:38:06.5988448Z import random 2023-01-11T21:38:06.5988561Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5988688Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5988693Z 2023-01-11T21:38:06.5988779Z aten = torch.ops.aten 2023-01-11T21:38:06.5988916Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5989012Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5989088Z 2023-01-11T21:38:06.5989162Z import triton 2023-01-11T21:38:06.5989254Z import triton.language as tl 2023-01-11T21:38:06.5989379Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5989512Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5989517Z 2023-01-11T21:38:06.5989531Z 2023-01-11T21:38:06.5989685Z triton_fused_div_relu_0 = async_compile.triton(''' 2023-01-11T21:38:06.5989759Z import triton 2023-01-11T21:38:06.5989851Z import triton.language as tl 2023-01-11T21:38:06.5989965Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5990066Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5990198Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5990323Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5990328Z 2023-01-11T21:38:06.5990755Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5990832Z @triton.jit 2023-01-11T21:38:06.5990982Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5991061Z xnumel = 64 2023-01-11T21:38:06.5991155Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5991286Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5991368Z xmask = xindex < xnumel 2023-01-11T21:38:06.5991437Z x0 = xindex 2023-01-11T21:38:06.5991622Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5991720Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5991816Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.5991933Z tmp1 = tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) 2023-01-11T21:38:06.5992013Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5992130Z tmp5 = tl.where(0 != 0, 0, tl.where(0 > tmp4, 0, tmp4)) 2023-01-11T21:38:06.5992202Z tmp6 = 10 2023-01-11T21:38:06.5992274Z tmp7 = tmp5 / tmp6 2023-01-11T21:38:06.5992409Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.5992574Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.5992660Z ''') 2023-01-11T21:38:06.5992666Z 2023-01-11T21:38:06.5992670Z 2023-01-11T21:38:06.5992766Z 
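The triton_fused_div_relu_0 kernel corresponds to a function returning both relu(a) and relu(a + b) / 10, matching its two stores (tmp1 to out_ptr0, tmp7 to out_ptr1). A repro sketch under that reading, again assuming the torch.compile entry point:

import torch

def fn(a, b):
    # tmp1 = relu(a); tmp7 = relu(a + b) / 10
    return torch.relu(a), torch.relu(a + b) / 10

a = torch.randn(8, 8, device="cuda")
b = torch.randn(8, 8, device="cuda")
torch.compile(fn)(a, b)

The seemingly redundant tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) appears to be relu lowered through the NaN-propagating maximum pattern tl.where(x != x, x, tl.where(x > y, x, y)); with x constant-folded to 0 the NaN check is always false, so the expression reduces to max(tmp0, 0).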
async_compile.wait(globals()) 2023-01-11T21:38:06.5992844Z del async_compile 2023-01-11T21:38:06.5992849Z 2023-01-11T21:38:06.5992925Z def call(args): 2023-01-11T21:38:06.5993007Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5993075Z args.clear() 2023-01-11T21:38:06.5993167Z with torch.cuda.device(0): 2023-01-11T21:38:06.5993371Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5993572Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5993667Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5993821Z triton_fused_div_relu_0.run(arg0_1, arg1_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5993894Z del arg0_1 2023-01-11T21:38:06.5993960Z del arg1_1 2023-01-11T21:38:06.5994045Z return (buf0, buf1, ) 2023-01-11T21:38:06.5994050Z 2023-01-11T21:38:06.5994055Z 2023-01-11T21:38:06.5994134Z if __name__ == "__main__": 2023-01-11T21:38:06.5994252Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5994377Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5994578Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5994777Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5994896Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5994901Z 2023-01-11T21:38:06.5994906Z 2023-01-11T21:38:06.5995027Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5995100Z import torch 2023-01-11T21:38:06.5995173Z import random 2023-01-11T21:38:06.5995300Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5995440Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5995445Z 2023-01-11T21:38:06.5995547Z aten = torch.ops.aten 2023-01-11T21:38:06.5995688Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5995783Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5995789Z 2023-01-11T21:38:06.5995855Z import triton 2023-01-11T21:38:06.5995945Z import triton.language as tl 2023-01-11T21:38:06.5996069Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5996212Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5996217Z 2023-01-11T21:38:06.5996222Z 2023-01-11T21:38:06.5996383Z triton_fused_div_relu_0 = async_compile.triton(''' 2023-01-11T21:38:06.5996456Z import triton 2023-01-11T21:38:06.5996551Z import triton.language as tl 2023-01-11T21:38:06.5996658Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5996759Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5996887Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5997015Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5997020Z 2023-01-11T21:38:06.5997457Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5997530Z @triton.jit 2023-01-11T21:38:06.5997681Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5997755Z xnumel = 64 2023-01-11T21:38:06.5997845Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5997975Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5998058Z xmask = xindex < 
xnumel 2023-01-11T21:38:06.5998131Z x0 = xindex 2023-01-11T21:38:06.5998345Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5998461Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.5998602Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.5998719Z tmp1 = tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) 2023-01-11T21:38:06.5998791Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5998905Z tmp5 = tl.where(0 != 0, 0, tl.where(0 > tmp4, 0, tmp4)) 2023-01-11T21:38:06.5998978Z tmp6 = 10 2023-01-11T21:38:06.5999057Z tmp7 = tmp5 / tmp6 2023-01-11T21:38:06.5999189Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.5999321Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.5999404Z ''') 2023-01-11T21:38:06.5999409Z 2023-01-11T21:38:06.5999416Z 2023-01-11T21:38:06.5999505Z async_compile.wait(globals()) 2023-01-11T21:38:06.5999584Z del async_compile 2023-01-11T21:38:06.5999589Z 2023-01-11T21:38:06.5999663Z def call(args): 2023-01-11T21:38:06.5999742Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5999818Z args.clear() 2023-01-11T21:38:06.5999909Z with torch.cuda.device(0): 2023-01-11T21:38:06.6000110Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6000306Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6000391Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6000541Z triton_fused_div_relu_0.run(arg0_1, arg1_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6000615Z del arg0_1 2023-01-11T21:38:06.6000686Z del arg1_1 2023-01-11T21:38:06.6000766Z return (buf0, buf1, ) 2023-01-11T21:38:06.6000771Z 2023-01-11T21:38:06.6000776Z 2023-01-11T21:38:06.6000886Z if __name__ == "__main__": 2023-01-11T21:38:06.6001009Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6001128Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6001328Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6001528Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6001648Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6001653Z 2023-01-11T21:38:06.6001725Z ok (0.200s) 2023-01-11T21:38:06.6002184Z test_remainder_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6002317Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6002583Z [2023-01-11 21:35:30,647] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 777 2023-01-11T21:38:06.6002849Z [2023-01-11 21:35:30,744] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 777 2023-01-11T21:38:06.6003269Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6003402Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6003649Z [2023-01-11 21:35:30,771] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 778 2023-01-11T21:38:06.6003910Z [2023-01-11 21:35:30,870] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 778 2023-01-11T21:38:06.6003919Z 2023-01-11T21:38:06.6004018Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6004091Z import torch 2023-01-11T21:38:06.6004162Z import random 2023-01-11T21:38:06.6004281Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6004436Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6004442Z 2023-01-11T21:38:06.6004525Z aten = torch.ops.aten 2023-01-11T21:38:06.6004655Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6004754Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6004760Z 2023-01-11T21:38:06.6004834Z import triton 2023-01-11T21:38:06.6004925Z import triton.language as tl 2023-01-11T21:38:06.6005048Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6005189Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6005194Z 2023-01-11T21:38:06.6005202Z 2023-01-11T21:38:06.6005405Z triton_fused_remainder_remainder_1_remainder_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6005478Z import triton 2023-01-11T21:38:06.6005563Z import triton.language as tl 2023-01-11T21:38:06.6005676Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6005777Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6005912Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6006036Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6006041Z 2023-01-11T21:38:06.6006495Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.6006569Z @triton.jit 2023-01-11T21:38:06.6006727Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6006837Z xnumel = 64 2023-01-11T21:38:06.6006934Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6007064Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6007145Z xmask = xindex < xnumel 2023-01-11T21:38:06.6007215Z x0 = xindex 2023-01-11T21:38:06.6007407Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6007598Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6007692Z tmp11 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6007790Z tmp13 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.6007870Z tmp2 = tmp0 % tmp1 2023-01-11T21:38:06.6007947Z tmp3 = tmp2 + tmp1 2023-01-11T21:38:06.6008075Z tmp4 = tl.where(((tmp2 != 0) & ((tmp2 < 0) != (tmp1 < 0))), tmp3, tmp2) 2023-01-11T21:38:06.6008143Z tmp5 = 1 2023-01-11T21:38:06.6008220Z tmp6 = tmp0 + tmp5 2023-01-11T21:38:06.6008322Z tmp7 = tmp1 - tmp5 2023-01-11T21:38:06.6008403Z tmp8 = tmp6 % tmp7 2023-01-11T21:38:06.6008479Z tmp9 = tmp8 + tmp7 2023-01-11T21:38:06.6008607Z tmp10 = tl.where(((tmp8 != 0) & ((tmp8 < 
0) != (tmp7 < 0))), tmp9, tmp8) 2023-01-11T21:38:06.6008719Z tmp12 = tmp11 - tmp5 2023-01-11T21:38:06.6008799Z tmp14 = tmp13 + tmp5 2023-01-11T21:38:06.6008885Z tmp15 = tmp12 % tmp14 2023-01-11T21:38:06.6008958Z tmp16 = tmp15 + tmp14 2023-01-11T21:38:06.6009088Z tmp17 = tl.where(((tmp15 != 0) & ((tmp15 < 0) != (tmp14 < 0))), tmp16, tmp15) 2023-01-11T21:38:06.6009223Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6009357Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.6009490Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.6009576Z ''') 2023-01-11T21:38:06.6009581Z 2023-01-11T21:38:06.6009586Z 2023-01-11T21:38:06.6009678Z async_compile.wait(globals()) 2023-01-11T21:38:06.6009756Z del async_compile 2023-01-11T21:38:06.6009762Z 2023-01-11T21:38:06.6009829Z def call(args): 2023-01-11T21:38:06.6009906Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6009980Z args.clear() 2023-01-11T21:38:06.6010071Z with torch.cuda.device(0): 2023-01-11T21:38:06.6010299Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6010498Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6010694Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6010781Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6010968Z triton_fused_remainder_remainder_1_remainder_2_0.run(arg0_1, arg1_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6011043Z del arg0_1 2023-01-11T21:38:06.6011115Z del arg1_1 2023-01-11T21:38:06.6011207Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6011213Z 2023-01-11T21:38:06.6011220Z 2023-01-11T21:38:06.6011299Z if __name__ == "__main__": 2023-01-11T21:38:06.6011419Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6011544Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6011736Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6011934Z arg1_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6012053Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6012058Z 2023-01-11T21:38:06.6012063Z 2023-01-11T21:38:06.6012158Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6012231Z import torch 2023-01-11T21:38:06.6012304Z import random 2023-01-11T21:38:06.6012421Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6012546Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6012551Z 2023-01-11T21:38:06.6012626Z aten = torch.ops.aten 2023-01-11T21:38:06.6012764Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6012885Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6012890Z 2023-01-11T21:38:06.6012961Z import triton 2023-01-11T21:38:06.6013052Z import triton.language as tl 2023-01-11T21:38:06.6013178Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6013322Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6013328Z 2023-01-11T21:38:06.6013332Z 2023-01-11T21:38:06.6013536Z triton_fused_remainder_remainder_1_remainder_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6013603Z import triton 2023-01-11T21:38:06.6013693Z import triton.language as tl 2023-01-11T21:38:06.6013806Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6013907Z from 
torch._inductor.ir import TileHint 2023-01-11T21:38:06.6014038Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6014161Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6014169Z 2023-01-11T21:38:06.6014725Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.6014800Z @triton.jit 2023-01-11T21:38:06.6014960Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6015034Z xnumel = 64 2023-01-11T21:38:06.6015132Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6015263Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6015354Z xmask = xindex < xnumel 2023-01-11T21:38:06.6015443Z x0 = xindex 2023-01-11T21:38:06.6015684Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6015892Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6016012Z tmp11 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6016128Z tmp13 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6016207Z tmp2 = tmp0 % tmp1 2023-01-11T21:38:06.6016285Z tmp3 = tmp2 + tmp1 2023-01-11T21:38:06.6016454Z tmp4 = tl.where(((tmp2 != 0) & ((tmp2 < 0) != (tmp1 < 0))), tmp3, tmp2) 2023-01-11T21:38:06.6016529Z tmp5 = 1 2023-01-11T21:38:06.6016600Z tmp6 = tmp0 + tmp5 2023-01-11T21:38:06.6016707Z tmp7 = tmp1 - tmp5 2023-01-11T21:38:06.6016783Z tmp8 = tmp6 % tmp7 2023-01-11T21:38:06.6016858Z tmp9 = tmp8 + tmp7 2023-01-11T21:38:06.6016985Z tmp10 = tl.where(((tmp8 != 0) & ((tmp8 < 0) != (tmp7 < 0))), tmp9, tmp8) 2023-01-11T21:38:06.6017097Z tmp12 = tmp11 - tmp5 2023-01-11T21:38:06.6017241Z tmp14 = tmp13 + tmp5 2023-01-11T21:38:06.6017327Z tmp15 = tmp12 % tmp14 2023-01-11T21:38:06.6017423Z tmp16 = tmp15 + tmp14 2023-01-11T21:38:06.6017568Z tmp17 = tl.where(((tmp15 != 0) & ((tmp15 < 0) != (tmp14 < 0))), tmp16, tmp15) 2023-01-11T21:38:06.6017708Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6017839Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.6017970Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.6018059Z ''') 2023-01-11T21:38:06.6018064Z 2023-01-11T21:38:06.6018069Z 2023-01-11T21:38:06.6018162Z async_compile.wait(globals()) 2023-01-11T21:38:06.6018232Z del async_compile 2023-01-11T21:38:06.6018237Z 2023-01-11T21:38:06.6018309Z def call(args): 2023-01-11T21:38:06.6018385Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6018462Z args.clear() 2023-01-11T21:38:06.6018551Z with torch.cuda.device(0): 2023-01-11T21:38:06.6018749Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6018946Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6019177Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6019270Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6019458Z triton_fused_remainder_remainder_1_remainder_2_0.run(arg0_1, arg1_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6019534Z del arg0_1 2023-01-11T21:38:06.6019606Z del arg1_1 
2023-01-11T21:38:06.6019693Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6019698Z 2023-01-11T21:38:06.6019703Z 2023-01-11T21:38:06.6019781Z if __name__ == "__main__": 2023-01-11T21:38:06.6019899Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6020019Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6020214Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6020408Z arg1_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6020530Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6020542Z 2023-01-11T21:38:06.6020611Z ok (0.250s) 2023-01-11T21:38:06.6021069Z test_repeat_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6021205Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6021463Z [2023-01-11 21:35:30,889] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 779 2023-01-11T21:38:06.6021728Z [2023-01-11 21:35:31,019] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 779 2023-01-11T21:38:06.6022142Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6022304Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6022552Z [2023-01-11 21:35:31,036] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 780 2023-01-11T21:38:06.6022558Z 2023-01-11T21:38:06.6022656Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6022729Z import torch 2023-01-11T21:38:06.6022803Z import random 2023-01-11T21:38:06.6022922Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6023048Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6023053Z 2023-01-11T21:38:06.6023135Z aten = torch.ops.aten 2023-01-11T21:38:06.6023273Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6023365Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6023370Z 2023-01-11T21:38:06.6023444Z import triton 2023-01-11T21:38:06.6023534Z import triton.language as tl 2023-01-11T21:38:06.6023659Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6023804Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6023810Z 2023-01-11T21:38:06.6023815Z 2023-01-11T21:38:06.6023975Z triton_fused_repeat_0 = async_compile.triton(''' 2023-01-11T21:38:06.6024048Z import triton 2023-01-11T21:38:06.6024133Z import triton.language as tl 2023-01-11T21:38:06.6024248Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6024349Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6024484Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6024610Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6024615Z 2023-01-11T21:38:06.6025019Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6025122Z @triton.jit 2023-01-11T21:38:06.6025254Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6025323Z xnumel = 768 2023-01-11T21:38:06.6025423Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6025555Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6025650Z xmask = xindex < xnumel 2023-01-11T21:38:06.6025735Z x0 = xindex % 8 2023-01-11T21:38:06.6025823Z x1 = (xindex // 8) % 12 2023-01-11T21:38:06.6025916Z x2 = (xindex // 96) % 4 2023-01-11T21:38:06.6025979Z x4 = xindex 2023-01-11T21:38:06.6026208Z tmp0 = tl.load(in_ptr0 + (x0 + (8*(x1 % 4)) + (32*(x2 % 2))), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6026344Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6026432Z ''') 2023-01-11T21:38:06.6026438Z 2023-01-11T21:38:06.6026442Z 2023-01-11T21:38:06.6026604Z triton_fused_repeat_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6026680Z import triton 2023-01-11T21:38:06.6026771Z import triton.language as tl 2023-01-11T21:38:06.6026886Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6026981Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6027112Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6027237Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6027242Z 2023-01-11T21:38:06.6027646Z @pointwise(size_hints=[512], filename=__file__, 
meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6027719Z @triton.jit 2023-01-11T21:38:06.6027853Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6027927Z xnumel = 512 2023-01-11T21:38:06.6028023Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6028145Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6028228Z xmask = xindex < xnumel 2023-01-11T21:38:06.6028304Z x0 = xindex % 64 2023-01-11T21:38:06.6028399Z x2 = xindex 2023-01-11T21:38:06.6028589Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6028724Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6028809Z ''') 2023-01-11T21:38:06.6028815Z 2023-01-11T21:38:06.6028819Z 2023-01-11T21:38:06.6028978Z triton_fused_repeat_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6029046Z import triton 2023-01-11T21:38:06.6029136Z import triton.language as tl 2023-01-11T21:38:06.6029249Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6029348Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6029483Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6029606Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6029612Z 2023-01-11T21:38:06.6030016Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6030089Z @triton.jit 2023-01-11T21:38:06.6030213Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6030286Z xnumel = 128 2023-01-11T21:38:06.6030381Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6030510Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6030592Z xmask = xindex < xnumel 2023-01-11T21:38:06.6030666Z x0 = xindex % 64 2023-01-11T21:38:06.6030729Z x2 = xindex 2023-01-11T21:38:06.6030823Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6030986Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6031071Z ''') 2023-01-11T21:38:06.6031077Z 2023-01-11T21:38:06.6031081Z 2023-01-11T21:38:06.6031176Z async_compile.wait(globals()) 2023-01-11T21:38:06.6031252Z del async_compile 2023-01-11T21:38:06.6031257Z 2023-01-11T21:38:06.6031330Z def call(args): 2023-01-11T21:38:06.6031403Z arg0_1, = args 2023-01-11T21:38:06.6031471Z args.clear() 2023-01-11T21:38:06.6031566Z with torch.cuda.device(0): 2023-01-11T21:38:06.6031787Z buf0 = empty_strided((2, 4, 12, 8), (384, 96, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6031877Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6032019Z triton_fused_repeat_0.run(arg0_1, buf0, 768, grid=grid(768), stream=stream0) 2023-01-11T21:38:06.6032233Z buf1 = empty_strided((8, 2, 4, 8), (64, 32, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6032376Z triton_fused_repeat_1_1.run(arg0_1, buf1, 512, grid=grid(512), stream=stream0) 2023-01-11T21:38:06.6032607Z buf2 = empty_strided((2, 1, 1, 2, 4, 8), (64, 64, 64, 32, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6032742Z triton_fused_repeat_2_2.run(arg0_1, buf2, 128, grid=grid(128), stream=stream0) 
2023-01-11T21:38:06.6032814Z del arg0_1 2023-01-11T21:38:06.6032905Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6032911Z 2023-01-11T21:38:06.6032915Z 2023-01-11T21:38:06.6032994Z if __name__ == "__main__": 2023-01-11T21:38:06.6033112Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6033237Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6033449Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6033562Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6033567Z 2023-01-11T21:38:06.6033824Z [2023-01-11 21:35:31,131] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 780 2023-01-11T21:38:06.6033846Z 2023-01-11T21:38:06.6033937Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6034011Z import torch 2023-01-11T21:38:06.6034083Z import random 2023-01-11T21:38:06.6034202Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6034355Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6034360Z 2023-01-11T21:38:06.6034442Z aten = torch.ops.aten 2023-01-11T21:38:06.6034578Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6034666Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6034671Z 2023-01-11T21:38:06.6034741Z import triton 2023-01-11T21:38:06.6034832Z import triton.language as tl 2023-01-11T21:38:06.6034956Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6035094Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6035099Z 2023-01-11T21:38:06.6035104Z 2023-01-11T21:38:06.6035263Z triton_fused_repeat_0 = async_compile.triton(''' 2023-01-11T21:38:06.6035360Z import triton 2023-01-11T21:38:06.6035461Z import triton.language as tl 2023-01-11T21:38:06.6035591Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6035693Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6035825Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6035950Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6035955Z 2023-01-11T21:38:06.6036359Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6036431Z @triton.jit 2023-01-11T21:38:06.6036564Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6036634Z xnumel = 768 2023-01-11T21:38:06.6036725Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6036880Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6036963Z xmask = xindex < xnumel 2023-01-11T21:38:06.6037037Z x0 = xindex % 8 2023-01-11T21:38:06.6037118Z x1 = (xindex // 8) % 12 2023-01-11T21:38:06.6037196Z x2 = (xindex // 96) % 4 2023-01-11T21:38:06.6037259Z x4 = xindex 2023-01-11T21:38:06.6037513Z tmp0 = tl.load(in_ptr0 + (x0 + (8*(x1 % 4)) + (32*(x2 % 2))), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6037648Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6037732Z ''') 2023-01-11T21:38:06.6037738Z 2023-01-11T21:38:06.6037742Z 2023-01-11T21:38:06.6037903Z triton_fused_repeat_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6037976Z import triton 2023-01-11T21:38:06.6038068Z import triton.language as tl 
2023-01-11T21:38:06.6038182Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6038277Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6038409Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6038536Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6038542Z 2023-01-11T21:38:06.6038947Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6039018Z @triton.jit 2023-01-11T21:38:06.6039150Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6039222Z xnumel = 512 2023-01-11T21:38:06.6039316Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6039438Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6039519Z xmask = xindex < xnumel 2023-01-11T21:38:06.6039593Z x0 = xindex % 64 2023-01-11T21:38:06.6039664Z x2 = xindex 2023-01-11T21:38:06.6039876Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6040012Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6040096Z ''') 2023-01-11T21:38:06.6040102Z 2023-01-11T21:38:06.6040106Z 2023-01-11T21:38:06.6040265Z triton_fused_repeat_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6040333Z import triton 2023-01-11T21:38:06.6040453Z import triton.language as tl 2023-01-11T21:38:06.6040570Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6040672Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6040803Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6040930Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6040935Z 2023-01-11T21:38:06.6041341Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6041415Z @triton.jit 2023-01-11T21:38:06.6041538Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6041612Z xnumel = 128 2023-01-11T21:38:06.6041707Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6041836Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6041921Z xmask = xindex < xnumel 2023-01-11T21:38:06.6041998Z x0 = xindex % 64 2023-01-11T21:38:06.6042068Z x2 = xindex 2023-01-11T21:38:06.6042180Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6042312Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6042400Z ''') 2023-01-11T21:38:06.6042406Z 2023-01-11T21:38:06.6042410Z 2023-01-11T21:38:06.6042502Z async_compile.wait(globals()) 2023-01-11T21:38:06.6042578Z del async_compile 2023-01-11T21:38:06.6042583Z 2023-01-11T21:38:06.6042655Z def call(args): 2023-01-11T21:38:06.6042727Z arg0_1, = args 2023-01-11T21:38:06.6042823Z args.clear() 2023-01-11T21:38:06.6042916Z with torch.cuda.device(0): 2023-01-11T21:38:06.6043136Z buf0 = empty_strided((2, 4, 12, 8), (384, 96, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6043230Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6043372Z triton_fused_repeat_0.run(arg0_1, buf0, 768, grid=grid(768), stream=stream0) 2023-01-11T21:38:06.6043587Z buf1 
= empty_strided((8, 2, 4, 8), (64, 32, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6043728Z triton_fused_repeat_1_1.run(arg0_1, buf1, 512, grid=grid(512), stream=stream0) 2023-01-11T21:38:06.6043954Z buf2 = empty_strided((2, 1, 1, 2, 4, 8), (64, 64, 64, 32, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6044088Z triton_fused_repeat_2_2.run(arg0_1, buf2, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.6044160Z del arg0_1 2023-01-11T21:38:06.6044252Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6044260Z 2023-01-11T21:38:06.6044265Z 2023-01-11T21:38:06.6044344Z if __name__ == "__main__": 2023-01-11T21:38:06.6044461Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6044588Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6044810Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6044923Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6044928Z 2023-01-11T21:38:06.6044994Z ok (0.261s) 2023-01-11T21:38:06.6045451Z test_roi_align_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6045581Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6045841Z [2023-01-11 21:35:31,211] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 781 2023-01-11T21:38:06.6046098Z [2023-01-11 21:35:31,227] torch._inductor.ir: [WARNING] Using FallbackKernel: torch.ops.torchvision.roi_align 2023-01-11T21:38:06.6046385Z [2023-01-11 21:35:31,230] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 781 2023-01-11T21:38:06.6046801Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6046931Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6047187Z [2023-01-11 21:35:31,310] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 782 2023-01-11T21:38:06.6047448Z [2023-01-11 21:35:31,326] torch._inductor.ir: [WARNING] Using FallbackKernel: torch.ops.torchvision.roi_align 2023-01-11T21:38:06.6047709Z [2023-01-11 21:35:31,329] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 782 2023-01-11T21:38:06.6047717Z 2023-01-11T21:38:06.6047814Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6047882Z import torch 2023-01-11T21:38:06.6047955Z import random 2023-01-11T21:38:06.6048073Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6048196Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6048201Z 2023-01-11T21:38:06.6048282Z aten = torch.ops.aten 2023-01-11T21:38:06.6048416Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6048511Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6048516Z 2023-01-11T21:38:06.6048583Z import triton 2023-01-11T21:38:06.6048675Z import triton.language as tl 2023-01-11T21:38:06.6048838Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6048977Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6048983Z 2023-01-11T21:38:06.6048988Z 2023-01-11T21:38:06.6049083Z async_compile.wait(globals()) 2023-01-11T21:38:06.6049157Z del async_compile 2023-01-11T21:38:06.6049165Z 2023-01-11T21:38:06.6049239Z def call(args): 2023-01-11T21:38:06.6049318Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6049385Z args.clear() 2023-01-11T21:38:06.6049480Z with torch.cuda.device(0): 2023-01-11T21:38:06.6049628Z buf0 = torch.ops.torchvision.roi_align(arg0_1, arg1_1, 0.25, 7, 7, 2, False) 2023-01-11T21:38:06.6049700Z del arg0_1 2023-01-11T21:38:06.6049772Z del arg1_1 2023-01-11T21:38:06.6049844Z buf1 = buf0 2023-01-11T21:38:06.6049961Z assert_size_stride(buf1, (2292, 256, 7, 7), (12544, 49, 7, 1)) 2023-01-11T21:38:06.6050025Z del buf0 2023-01-11T21:38:06.6050104Z return (buf1, ) 2023-01-11T21:38:06.6050110Z 2023-01-11T21:38:06.6050114Z 2023-01-11T21:38:06.6050194Z if __name__ == "__main__": 2023-01-11T21:38:06.6050310Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6050436Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6050679Z arg0_1 = rand_strided((4, 256, 296, 304), (23035904, 89984, 304, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6050884Z arg1_1 = rand_strided((2292, 5), (5, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6051003Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6051008Z 2023-01-11T21:38:06.6051013Z 2023-01-11T21:38:06.6051103Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6051175Z import torch 2023-01-11T21:38:06.6051249Z import random 2023-01-11T21:38:06.6051366Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6051488Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6051499Z 2023-01-11T21:38:06.6051578Z aten = torch.ops.aten 2023-01-11T21:38:06.6051711Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6051799Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6051810Z 2023-01-11T21:38:06.6051877Z import triton 
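# Note that no Triton source is emitted for this test: the
# "Using FallbackKernel" warnings above mean inductor has no lowering for
# torch.ops.torchvision.roi_align, so the call() above simply invokes the
# eager CUDA op and asserts the output layout (the fp16 variant below is
# identical apart from dtype). A minimal sketch of the fallback call, with
# the positional arguments copied from call() and interpreted per the usual
# torchvision roi_align signature (that interpretation is an aid, not taken
# from this log):
import torch
import torchvision  # registers torch.ops.torchvision.roi_align
feats = torch.rand(4, 256, 296, 304, device='cuda')
rois = torch.rand(2292, 5, device='cuda')  # rows: (batch_index, x1, y1, x2, y2)
out = torch.ops.torchvision.roi_align(feats, rois, 0.25, 7, 7, 2, False)
assert out.shape == (2292, 256, 7, 7)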
2023-01-11T21:38:06.6051968Z import triton.language as tl 2023-01-11T21:38:06.6052119Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6052258Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6052263Z 2023-01-11T21:38:06.6052267Z 2023-01-11T21:38:06.6052358Z async_compile.wait(globals()) 2023-01-11T21:38:06.6052433Z del async_compile 2023-01-11T21:38:06.6052438Z 2023-01-11T21:38:06.6052513Z def call(args): 2023-01-11T21:38:06.6052584Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6052656Z args.clear() 2023-01-11T21:38:06.6052745Z with torch.cuda.device(0): 2023-01-11T21:38:06.6052892Z buf0 = torch.ops.torchvision.roi_align(arg0_1, arg1_1, 0.25, 7, 7, 2, False) 2023-01-11T21:38:06.6052970Z del arg0_1 2023-01-11T21:38:06.6053041Z del arg1_1 2023-01-11T21:38:06.6053111Z buf1 = buf0 2023-01-11T21:38:06.6053221Z assert_size_stride(buf1, (2292, 256, 7, 7), (12544, 49, 7, 1)) 2023-01-11T21:38:06.6053292Z del buf0 2023-01-11T21:38:06.6053371Z return (buf1, ) 2023-01-11T21:38:06.6053377Z 2023-01-11T21:38:06.6053381Z 2023-01-11T21:38:06.6053460Z if __name__ == "__main__": 2023-01-11T21:38:06.6053577Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6053707Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6053944Z arg0_1 = rand_strided((4, 256, 296, 304), (23035904, 89984, 304, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6054144Z arg1_1 = rand_strided((2292, 5), (5, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6054255Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6054270Z 2023-01-11T21:38:06.6054364Z ok (0.262s) 2023-01-11T21:38:06.6054932Z test_roll_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6055063Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6055324Z [2023-01-11 21:35:31,424] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 783 2023-01-11T21:38:06.6055587Z [2023-01-11 21:35:31,596] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 783 2023-01-11T21:38:06.6056000Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6056133Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6056390Z [2023-01-11 21:35:31,624] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 784 2023-01-11T21:38:06.6056652Z [2023-01-11 21:35:31,742] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 784 2023-01-11T21:38:06.6056657Z 2023-01-11T21:38:06.6056755Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6056822Z import torch 2023-01-11T21:38:06.6056897Z import random 2023-01-11T21:38:06.6057014Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6057188Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6057194Z 2023-01-11T21:38:06.6057276Z aten = torch.ops.aten 2023-01-11T21:38:06.6057415Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6057511Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6057516Z 2023-01-11T21:38:06.6057589Z import triton 2023-01-11T21:38:06.6057674Z import triton.language as tl 2023-01-11T21:38:06.6057797Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6057982Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6057989Z 2023-01-11T21:38:06.6057993Z 2023-01-11T21:38:06.6058151Z triton_fused_roll_0 = async_compile.triton(''' 2023-01-11T21:38:06.6058228Z import triton 2023-01-11T21:38:06.6058321Z import triton.language as tl 2023-01-11T21:38:06.6058435Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6058529Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6058663Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6058787Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6058795Z 2023-01-11T21:38:06.6059200Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6059271Z @triton.jit 2023-01-11T21:38:06.6059409Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6059483Z xnumel = 100352 2023-01-11T21:38:06.6059581Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6059702Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6059784Z xmask = xindex < xnumel 2023-01-11T21:38:06.6059858Z x0 = xindex % 16 2023-01-11T21:38:06.6059939Z x1 = (xindex // 16) % 56 2023-01-11T21:38:06.6060021Z x2 = (xindex // 896) % 56 2023-01-11T21:38:06.6060099Z x3 = (xindex // 50176) 2023-01-11T21:38:06.6060168Z x4 = xindex 2023-01-11T21:38:06.6060425Z tmp0 = tl.load(in_ptr0 + (x0 + (16*((46 + x1) % 56)) + (896*((3 + x2) % 56)) + (50176*x3)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6060599Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6060686Z ''') 2023-01-11T21:38:06.6060692Z 2023-01-11T21:38:06.6060697Z 2023-01-11T21:38:06.6060856Z triton_fused_roll_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6060936Z import triton 2023-01-11T21:38:06.6061028Z import triton.language as tl 2023-01-11T21:38:06.6061145Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6061247Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6061375Z from 
torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6061502Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6061508Z 2023-01-11T21:38:06.6061909Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6061989Z @triton.jit 2023-01-11T21:38:06.6062124Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6062199Z xnumel = 100352 2023-01-11T21:38:06.6062296Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6062427Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6062506Z xmask = xindex < xnumel 2023-01-11T21:38:06.6062582Z x0 = xindex % 16 2023-01-11T21:38:06.6062666Z x1 = (xindex // 16) % 56 2023-01-11T21:38:06.6062749Z x2 = (xindex // 896) % 56 2023-01-11T21:38:06.6062832Z x3 = (xindex // 50176) 2023-01-11T21:38:06.6062903Z x4 = xindex 2023-01-11T21:38:06.6063039Z tmp0 = tl.load(in_ptr0 + ((100347 + x0 + (16*x1) + (896*x2) + (50176*x3)) % 100352), xmask) 2023-01-11T21:38:06.6063170Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6063256Z ''') 2023-01-11T21:38:06.6063261Z 2023-01-11T21:38:06.6063268Z 2023-01-11T21:38:06.6063366Z async_compile.wait(globals()) 2023-01-11T21:38:06.6063444Z del async_compile 2023-01-11T21:38:06.6063449Z 2023-01-11T21:38:06.6063525Z def call(args): 2023-01-11T21:38:06.6063600Z arg0_1, = args 2023-01-11T21:38:06.6063677Z args.clear() 2023-01-11T21:38:06.6063765Z with torch.cuda.device(0): 2023-01-11T21:38:06.6064025Z buf0 = empty_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6064124Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6064268Z triton_fused_roll_0.run(arg0_1, buf0, 100352, grid=grid(100352), stream=stream0) 2023-01-11T21:38:06.6064493Z buf1 = empty_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6064635Z triton_fused_roll_1_1.run(arg0_1, buf1, 100352, grid=grid(100352), stream=stream0) 2023-01-11T21:38:06.6064709Z del arg0_1 2023-01-11T21:38:06.6064794Z return (buf0, buf1, ) 2023-01-11T21:38:06.6064803Z 2023-01-11T21:38:06.6064807Z 2023-01-11T21:38:06.6064889Z if __name__ == "__main__": 2023-01-11T21:38:06.6065002Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6065131Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6065359Z arg0_1 = rand_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6065474Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6065479Z 2023-01-11T21:38:06.6065484Z 2023-01-11T21:38:06.6065582Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6065678Z import torch 2023-01-11T21:38:06.6065760Z import random 2023-01-11T21:38:06.6065899Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6066025Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6066030Z 2023-01-11T21:38:06.6066113Z aten = torch.ops.aten 2023-01-11T21:38:06.6066250Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6066378Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6066383Z 2023-01-11T21:38:06.6066459Z import triton 2023-01-11T21:38:06.6066554Z import triton.language as tl 2023-01-11T21:38:06.6066680Z 
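# The fp32 roll pair above is again pure index arithmetic:
# triton_fused_roll_0 applies per-dimension modular offsets
# ((46 + x1) % 56 on the stride-16 dim, (3 + x2) % 56 on the stride-896 dim),
# while triton_fused_roll_1_1 rolls the flattened tensor
# ((100347 + x) % 100352); the fp16 variants that follow differ only in
# dtype. One consistent eager-mode reconstruction (the shift values are only
# determined modulo the dim sizes, so these exact numbers are assumptions):
import torch
x = torch.randn(2, 56, 56, 16, device='cuda')
out0 = torch.roll(x, shifts=(-3, 10), dims=(1, 2))  # matches roll_0's offsets
out1 = torch.roll(x, shifts=5)                      # flat roll, matches roll_1_1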
from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6066815Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6066821Z 2023-01-11T21:38:06.6066831Z 2023-01-11T21:38:06.6066982Z triton_fused_roll_0 = async_compile.triton(''' 2023-01-11T21:38:06.6067057Z import triton 2023-01-11T21:38:06.6067149Z import triton.language as tl 2023-01-11T21:38:06.6067263Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6067364Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6067499Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6067625Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6067630Z 2023-01-11T21:38:06.6068026Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6068103Z @triton.jit 2023-01-11T21:38:06.6068237Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6068316Z xnumel = 100352 2023-01-11T21:38:06.6068415Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6068544Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6068630Z xmask = xindex < xnumel 2023-01-11T21:38:06.6068709Z x0 = xindex % 16 2023-01-11T21:38:06.6068786Z x1 = (xindex // 16) % 56 2023-01-11T21:38:06.6068870Z x2 = (xindex // 896) % 56 2023-01-11T21:38:06.6068952Z x3 = (xindex // 50176) 2023-01-11T21:38:06.6069023Z x4 = xindex 2023-01-11T21:38:06.6069312Z tmp0 = tl.load(in_ptr0 + (x0 + (16*((46 + x1) % 56)) + (896*((3 + x2) % 56)) + (50176*x3)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6069456Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6069540Z ''') 2023-01-11T21:38:06.6069546Z 2023-01-11T21:38:06.6069550Z 2023-01-11T21:38:06.6069702Z triton_fused_roll_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6069779Z import triton 2023-01-11T21:38:06.6069901Z import triton.language as tl 2023-01-11T21:38:06.6070018Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6070123Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6070257Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6070385Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6070390Z 2023-01-11T21:38:06.6070801Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6070873Z @triton.jit 2023-01-11T21:38:06.6071007Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6071085Z xnumel = 100352 2023-01-11T21:38:06.6071184Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6071312Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6071398Z xmask = xindex < xnumel 2023-01-11T21:38:06.6071475Z x0 = xindex % 16 2023-01-11T21:38:06.6071552Z x1 = (xindex // 16) % 56 2023-01-11T21:38:06.6071632Z x2 = (xindex // 896) % 56 2023-01-11T21:38:06.6071714Z x3 = (xindex // 50176) 2023-01-11T21:38:06.6071786Z x4 = xindex 2023-01-11T21:38:06.6071934Z tmp0 = tl.load(in_ptr0 + ((100347 + x0 + (16*x1) + (896*x2) + (50176*x3)) % 100352), 
xmask).to(tl.float32) 2023-01-11T21:38:06.6072071Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6072159Z ''') 2023-01-11T21:38:06.6072164Z 2023-01-11T21:38:06.6072168Z 2023-01-11T21:38:06.6072290Z async_compile.wait(globals()) 2023-01-11T21:38:06.6072362Z del async_compile 2023-01-11T21:38:06.6072367Z 2023-01-11T21:38:06.6072443Z def call(args): 2023-01-11T21:38:06.6072519Z arg0_1, = args 2023-01-11T21:38:06.6072596Z args.clear() 2023-01-11T21:38:06.6072690Z with torch.cuda.device(0): 2023-01-11T21:38:06.6072921Z buf0 = empty_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6073016Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6073150Z triton_fused_roll_0.run(arg0_1, buf0, 100352, grid=grid(100352), stream=stream0) 2023-01-11T21:38:06.6073377Z buf1 = empty_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6073522Z triton_fused_roll_1_1.run(arg0_1, buf1, 100352, grid=grid(100352), stream=stream0) 2023-01-11T21:38:06.6073597Z del arg0_1 2023-01-11T21:38:06.6073680Z return (buf0, buf1, ) 2023-01-11T21:38:06.6073688Z 2023-01-11T21:38:06.6073693Z 2023-01-11T21:38:06.6073774Z if __name__ == "__main__": 2023-01-11T21:38:06.6073892Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6074019Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6074243Z arg0_1 = rand_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6074359Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6074364Z 2023-01-11T21:38:06.6074436Z ok (0.348s) 2023-01-11T21:38:06.6074612Z test_round_correctness_cuda (__main__.CudaTests) ... skip: need to debug tl.libdevice on A100/V100 (0.001s) 2023-01-11T21:38:06.6075067Z test_round_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6075202Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6075488Z [2023-01-11 21:35:31,764] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 785 2023-01-11T21:38:06.6075803Z [2023-01-11 21:35:31,912] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 785 2023-01-11T21:38:06.6076218Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6076351Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6076606Z [2023-01-11 21:35:31,934] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 786 2023-01-11T21:38:06.6076867Z [2023-01-11 21:35:32,081] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 786 2023-01-11T21:38:06.6076879Z 2023-01-11T21:38:06.6076972Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6077048Z import torch 2023-01-11T21:38:06.6077126Z import random 2023-01-11T21:38:06.6077249Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6077374Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6077378Z 2023-01-11T21:38:06.6077463Z aten = torch.ops.aten 2023-01-11T21:38:06.6077602Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6077693Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6077698Z 2023-01-11T21:38:06.6077773Z import triton 2023-01-11T21:38:06.6077868Z import triton.language as tl 2023-01-11T21:38:06.6077995Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6078136Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6078167Z 2023-01-11T21:38:06.6078172Z 2023-01-11T21:38:06.6078342Z triton_fused_mul_1_round_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6078419Z import triton 2023-01-11T21:38:06.6078514Z import triton.language as tl 2023-01-11T21:38:06.6078626Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6078731Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6078865Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6078992Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6078997Z 2023-01-11T21:38:06.6079417Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6079492Z @triton.jit 2023-01-11T21:38:06.6079636Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6079710Z xnumel = 64 2023-01-11T21:38:06.6079803Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6079933Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6080020Z xmask = xindex < xnumel 2023-01-11T21:38:06.6080093Z x0 = xindex 2023-01-11T21:38:06.6080285Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6080381Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6080488Z tmp1 = tl.libdevice.nearbyint(tmp0) 2023-01-11T21:38:06.6080557Z tmp3 = 100.0 2023-01-11T21:38:06.6080638Z tmp4 = tmp2 * tmp3 2023-01-11T21:38:06.6080748Z tmp5 = tl.libdevice.nearbyint(tmp4) 2023-01-11T21:38:06.6080821Z tmp6 = 0.01 2023-01-11T21:38:06.6080900Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.6081035Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6081174Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.6081257Z ''') 2023-01-11T21:38:06.6081262Z 2023-01-11T21:38:06.6081267Z 2023-01-11T21:38:06.6081428Z triton_fused_round_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.6081504Z import triton 
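# In triton_fused_mul_1_round_1_0 above, inductor lowers torch.round to
# tl.libdevice.nearbyint (round-half-to-even), and a decimals=2 round is
# lowered to scale/round/unscale: tmp4 = x * 100.0, tmp5 = nearbyint(tmp4),
# tmp7 = tmp5 * 0.01. The kernel whose source begins here rounds the second
# input plus one. A presumed eager equivalent (an assumption; the test body
# is not in this log):
import torch
a = torch.randn(8, 8, device='cuda')
b = torch.randn(8, 8, device='cuda')
out0 = torch.round(a)               # -> buf0
out1 = torch.round(b + 1)           # -> buf1 (triton_fused_round_2_1)
out2 = torch.round(a, decimals=2)   # -> buf2, i.e. nearbyint(a * 100) * 0.01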
2023-01-11T21:38:06.6081602Z import triton.language as tl 2023-01-11T21:38:06.6081795Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6081899Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6082031Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6082149Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6082160Z 2023-01-11T21:38:06.6082557Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6082628Z @triton.jit 2023-01-11T21:38:06.6082759Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6082835Z xnumel = 64 2023-01-11T21:38:06.6082931Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6083059Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6083142Z xmask = xindex < xnumel 2023-01-11T21:38:06.6083206Z x0 = xindex 2023-01-11T21:38:06.6083303Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6083371Z tmp1 = 1 2023-01-11T21:38:06.6083451Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6083552Z tmp3 = tl.libdevice.nearbyint(tmp2) 2023-01-11T21:38:06.6083684Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.6083769Z ''') 2023-01-11T21:38:06.6083775Z 2023-01-11T21:38:06.6083779Z 2023-01-11T21:38:06.6083865Z async_compile.wait(globals()) 2023-01-11T21:38:06.6083939Z del async_compile 2023-01-11T21:38:06.6083944Z 2023-01-11T21:38:06.6084020Z def call(args): 2023-01-11T21:38:06.6084096Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6084200Z args.clear() 2023-01-11T21:38:06.6084292Z with torch.cuda.device(0): 2023-01-11T21:38:06.6084493Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6084690Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6084779Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6084928Z triton_fused_mul_1_round_1_0.run(arg0_1, buf0, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6085001Z del arg0_1 2023-01-11T21:38:06.6085196Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6085339Z triton_fused_round_2_1.run(arg1_1, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6085411Z del arg1_1 2023-01-11T21:38:06.6085497Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6085502Z 2023-01-11T21:38:06.6085506Z 2023-01-11T21:38:06.6085585Z if __name__ == "__main__": 2023-01-11T21:38:06.6085702Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6085827Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6086025Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6086222Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6086346Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6086352Z 2023-01-11T21:38:06.6086356Z 2023-01-11T21:38:06.6086453Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6086529Z import torch 2023-01-11T21:38:06.6086597Z import random 2023-01-11T21:38:06.6086715Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6086836Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6086841Z 2023-01-11T21:38:06.6086923Z aten = 
torch.ops.aten 2023-01-11T21:38:06.6087062Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6087162Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6087167Z 2023-01-11T21:38:06.6087238Z import triton 2023-01-11T21:38:06.6087328Z import triton.language as tl 2023-01-11T21:38:06.6087445Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6087584Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6087617Z 2023-01-11T21:38:06.6087622Z 2023-01-11T21:38:06.6087789Z triton_fused_mul_1_round_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6087863Z import triton 2023-01-11T21:38:06.6087956Z import triton.language as tl 2023-01-11T21:38:06.6088067Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6088168Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6088298Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6088415Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6088420Z 2023-01-11T21:38:06.6088838Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6088916Z @triton.jit 2023-01-11T21:38:06.6089057Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6089133Z xnumel = 64 2023-01-11T21:38:06.6089230Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6089359Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6089444Z xmask = xindex < xnumel 2023-01-11T21:38:06.6089508Z x0 = xindex 2023-01-11T21:38:06.6089721Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6089840Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6089944Z tmp1 = tl.libdevice.nearbyint(tmp0) 2023-01-11T21:38:06.6090015Z tmp3 = 100.0 2023-01-11T21:38:06.6090093Z tmp4 = tmp2 * tmp3 2023-01-11T21:38:06.6090243Z tmp5 = tl.libdevice.nearbyint(tmp4) 2023-01-11T21:38:06.6090308Z tmp6 = 0.01 2023-01-11T21:38:06.6090386Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.6090521Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6090654Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.6090739Z ''') 2023-01-11T21:38:06.6090745Z 2023-01-11T21:38:06.6090749Z 2023-01-11T21:38:06.6090908Z triton_fused_round_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.6090983Z import triton 2023-01-11T21:38:06.6091068Z import triton.language as tl 2023-01-11T21:38:06.6091179Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6091283Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6091417Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6091539Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6091545Z 2023-01-11T21:38:06.6091948Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6092025Z @triton.jit 2023-01-11T21:38:06.6092156Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6092225Z xnumel = 64 
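# As in every fp16 variant in this log, the load below widens with
# .to(tl.float32) so the libdevice math runs in fp32, and the store through
# the *fp16 out_ptr narrows the result back to half precision; the index
# arithmetic is identical to the fp32 kernel above.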
2023-01-11T21:38:06.6092321Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6092449Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6092529Z xmask = xindex < xnumel 2023-01-11T21:38:06.6092598Z x0 = xindex 2023-01-11T21:38:06.6092715Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6092783Z tmp1 = 1 2023-01-11T21:38:06.6092855Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6092958Z tmp3 = tl.libdevice.nearbyint(tmp2) 2023-01-11T21:38:06.6093086Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.6093178Z ''') 2023-01-11T21:38:06.6093183Z 2023-01-11T21:38:06.6093188Z 2023-01-11T21:38:06.6093282Z async_compile.wait(globals()) 2023-01-11T21:38:06.6093359Z del async_compile 2023-01-11T21:38:06.6093364Z 2023-01-11T21:38:06.6093438Z def call(args): 2023-01-11T21:38:06.6093517Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6093585Z args.clear() 2023-01-11T21:38:06.6093704Z with torch.cuda.device(0): 2023-01-11T21:38:06.6093902Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6094099Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6094190Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6094337Z triton_fused_mul_1_round_1_0.run(arg0_1, buf0, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6094410Z del arg0_1 2023-01-11T21:38:06.6094723Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6094866Z triton_fused_round_2_1.run(arg1_1, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6094944Z del arg1_1 2023-01-11T21:38:06.6095031Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6095036Z 2023-01-11T21:38:06.6095041Z 2023-01-11T21:38:06.6095120Z if __name__ == "__main__": 2023-01-11T21:38:06.6095238Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6095364Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6095564Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6095753Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6095872Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6095878Z 2023-01-11T21:38:06.6095948Z ok (0.339s) 2023-01-11T21:38:06.6096397Z test_rsqrt_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6096573Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6096835Z [2023-01-11 21:35:32,101] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 787 2023-01-11T21:38:06.6097098Z [2023-01-11 21:35:32,241] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 787 2023-01-11T21:38:06.6097587Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6097727Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6097981Z [2023-01-11 21:35:32,260] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 788 2023-01-11T21:38:06.6098245Z [2023-01-11 21:35:32,402] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 788 2023-01-11T21:38:06.6098251Z 2023-01-11T21:38:06.6098342Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6098416Z import torch 2023-01-11T21:38:06.6098490Z import random 2023-01-11T21:38:06.6098610Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6098738Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6098743Z 2023-01-11T21:38:06.6098824Z aten = torch.ops.aten 2023-01-11T21:38:06.6098958Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6099046Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6099057Z 2023-01-11T21:38:06.6099124Z import triton 2023-01-11T21:38:06.6099218Z import triton.language as tl 2023-01-11T21:38:06.6099342Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6099482Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6099487Z 2023-01-11T21:38:06.6099492Z 2023-01-11T21:38:06.6099694Z triton_fused_rsqrt_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.6099771Z import triton 2023-01-11T21:38:06.6099862Z import triton.language as tl 2023-01-11T21:38:06.6099968Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6100069Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6100201Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6100329Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6100335Z 2023-01-11T21:38:06.6100751Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6100827Z @triton.jit 2023-01-11T21:38:06.6100968Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6101042Z xnumel = 64 2023-01-11T21:38:06.6101132Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6101261Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6101345Z xmask = xindex < xnumel 2023-01-11T21:38:06.6101415Z x0 = xindex 2023-01-11T21:38:06.6101606Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6101705Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6101802Z tmp1 = tl.libdevice.rsqrt(tmp0) 2023-01-11T21:38:06.6101866Z tmp3 = 1 2023-01-11T21:38:06.6101948Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6102049Z tmp5 = tl.libdevice.rsqrt(tmp4) 2023-01-11T21:38:06.6102119Z tmp6 = 2 2023-01-11T21:38:06.6102229Z tmp7 = tmp5 - tmp6 2023-01-11T21:38:06.6102389Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6102519Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.6102597Z ''') 2023-01-11T21:38:06.6102602Z 2023-01-11T21:38:06.6102614Z 2023-01-11T21:38:06.6102700Z async_compile.wait(globals()) 2023-01-11T21:38:06.6102777Z del async_compile 2023-01-11T21:38:06.6102782Z 
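# triton_fused_rsqrt_sub_0 above writes two outputs from a single pass over
# the input (tmp1 = rsqrt(x), tmp7 = rsqrt(x + 1) - 2): inductor fused both
# pointwise expressions into one kernel rather than launching twice and
# reloading the input. A presumed eager equivalent (an assumption; the test
# body is not in this log):
import torch
x = torch.randn(64, device='cuda')
out0 = torch.rsqrt(x)          # -> buf0
out1 = torch.rsqrt(x + 1) - 2  # -> buf1, fused into the same kernel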
2023-01-11T21:38:06.6102853Z def call(args): 2023-01-11T21:38:06.6102926Z arg0_1, = args 2023-01-11T21:38:06.6103004Z args.clear() 2023-01-11T21:38:06.6103124Z with torch.cuda.device(0): 2023-01-11T21:38:06.6103393Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6103602Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6103701Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6103868Z triton_fused_rsqrt_sub_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6103974Z del arg0_1 2023-01-11T21:38:06.6104085Z return (buf0, buf1, ) 2023-01-11T21:38:06.6104094Z 2023-01-11T21:38:06.6104100Z 2023-01-11T21:38:06.6104212Z if __name__ == "__main__": 2023-01-11T21:38:06.6104374Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6104507Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6104706Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6104821Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6104827Z 2023-01-11T21:38:06.6104831Z 2023-01-11T21:38:06.6104932Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6105005Z import torch 2023-01-11T21:38:06.6105077Z import random 2023-01-11T21:38:06.6105194Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6105318Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6105323Z 2023-01-11T21:38:06.6105405Z aten = torch.ops.aten 2023-01-11T21:38:06.6105537Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6105643Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6105650Z 2023-01-11T21:38:06.6105736Z import triton 2023-01-11T21:38:06.6105838Z import triton.language as tl 2023-01-11T21:38:06.6106022Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6106164Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6106169Z 2023-01-11T21:38:06.6106174Z 2023-01-11T21:38:06.6106334Z triton_fused_rsqrt_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.6106401Z import triton 2023-01-11T21:38:06.6106493Z import triton.language as tl 2023-01-11T21:38:06.6106633Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6106736Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6106868Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6106999Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6107007Z 2023-01-11T21:38:06.6107427Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6107506Z @triton.jit 2023-01-11T21:38:06.6107650Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6107718Z xnumel = 64 2023-01-11T21:38:06.6107814Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6107942Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6108029Z xmask = xindex < xnumel 2023-01-11T21:38:06.6108098Z x0 = xindex 2023-01-11T21:38:06.6108311Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6108432Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6108523Z tmp1 = 
tl.libdevice.rsqrt(tmp0) 2023-01-11T21:38:06.6108625Z tmp3 = 1 2023-01-11T21:38:06.6108707Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6108807Z tmp5 = tl.libdevice.rsqrt(tmp4) 2023-01-11T21:38:06.6108880Z tmp6 = 2 2023-01-11T21:38:06.6108990Z tmp7 = tmp5 - tmp6 2023-01-11T21:38:06.6109121Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6109256Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.6109343Z ''') 2023-01-11T21:38:06.6109348Z 2023-01-11T21:38:06.6109353Z 2023-01-11T21:38:06.6109449Z async_compile.wait(globals()) 2023-01-11T21:38:06.6109529Z del async_compile 2023-01-11T21:38:06.6109535Z 2023-01-11T21:38:06.6109609Z def call(args): 2023-01-11T21:38:06.6109686Z arg0_1, = args 2023-01-11T21:38:06.6109763Z args.clear() 2023-01-11T21:38:06.6109851Z with torch.cuda.device(0): 2023-01-11T21:38:06.6110051Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6110251Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6110346Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6110491Z triton_fused_rsqrt_sub_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6110565Z del arg0_1 2023-01-11T21:38:06.6110652Z return (buf0, buf1, ) 2023-01-11T21:38:06.6110657Z 2023-01-11T21:38:06.6110662Z 2023-01-11T21:38:06.6110745Z if __name__ == "__main__": 2023-01-11T21:38:06.6110858Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6110985Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6111185Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6111301Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6111307Z 2023-01-11T21:38:06.6111382Z ok (0.320s) 2023-01-11T21:38:06.6111844Z test_scatter1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6112014Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6112277Z [2023-01-11 21:35:32,425] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 789 2023-01-11T21:38:06.6112542Z [2023-01-11 21:35:32,495] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 789 2023-01-11T21:38:06.6112959Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:32,517] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 790
[2023-01-11 21:35:32,591] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 790

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None)
    tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None)
    tl.store(out_ptr0 + (tmp0), tmp1, None)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0)
        del arg0_1
        triton_fused_scatter_1.run(arg1_1, arg2_1, buf0, 1, grid=grid(1), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 1), (1, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None)
    tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32)
    tl.store(out_ptr0 + (tmp0), tmp1, None)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0)
        del arg0_1
        triton_fused_scatter_1.run(arg1_1, arg2_1, buf0, 1, grid=grid(1), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((1, 1), (1, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (0.189s)
test_scatter2_cuda (__main__.CudaTests) ... skip: unstable on sm86 (0.001s)
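The wrapper above shows Inductor's functionalized form of an in-place scatter: triton_fused_scatter_0 copies all six elements of the (2, 3) input into a fresh buffer, and triton_fused_scatter_1 performs the single indexed store (xnumel = 1). The flat offset tmp0 with stride 1 is consistent with a scatter along the last dimension. A minimal eager-mode sketch of what call() computes, with shapes taken from the rand_strided calls above (the test body itself is not in this log, so the exact op and index values are inferences):

import torch

x = torch.randn(2, 3, device='cuda')                         # arg0_1
index = torch.zeros(1, 1, dtype=torch.int64, device='cuda')  # arg1_1 (values assumed)
src = torch.randn(2, 3, device='cuda')                       # arg2_1

out = x.clone()              # kernel 0: pointwise copy, xnumel = 6
out.scatter_(1, index, src)  # kernel 1: one store at out[0, index[0, 0]]

Cloning first keeps the graph functional: only the freshly allocated buffer is ever mutated, which is why 'mutated_arg_names' lists just out_ptr0.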
test_scatter3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:32,611] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 791
[2023-01-11 21:35:32,693] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 791
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:32,713] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 792
[2023-01-11 21:35:32,790] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 792

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1885
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 4
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = 0.8
    tl.atomic_add(out_ptr0 + (tmp0), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_0.run(arg0_1, buf0, 1885, grid=grid(1885), stream=stream0)
        del arg0_1
        triton_fused_scatter_1.run(arg1_1, buf0, 4, grid=grid(4), stream=stream0)
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cuda:0', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1885
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 4
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = 0.8
    tl.atomic_add(out_ptr0 + (tmp0), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_0.run(arg0_1, buf0, 1885, grid=grid(1885), stream=stream0)
        del arg0_1
        triton_fused_scatter_1.run(arg1_1, buf0, 4, grid=grid(4), stream=stream0)
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cuda:0', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.198s)
test_scatter4_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:32,806] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 793
[2023-01-11 21:35:32,888] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 793
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:32,902] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 794
[2023-01-11 21:35:32,980] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 794

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[262144], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 194432
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 992
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tl.store(out_ptr0 + (x0 + (992*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((196, 992), (992, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_0.run(arg0_1, buf0, 194432, grid=grid(194432), stream=stream0)
        del arg0_1
        triton_fused_scatter_1.run(arg1_1, arg2_1, buf0, 992, grid=grid(992), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((196, 992), (992, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 992), (992, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((1, 992), (992, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[262144], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 194432
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 992
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tl.store(out_ptr0 + (x0 + (992*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((196, 992), (992, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_0.run(arg0_1, buf0, 194432, grid=grid(194432), stream=stream0)
        del arg0_1
        triton_fused_scatter_1.run(arg1_1, arg2_1, buf0, 992, grid=grid(992), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((196, 992), (992, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((1, 992), (992, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((1, 992), (992, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (0.190s)
test_scatter_add1_cuda (__main__.CudaTests) ... skip: Flaky test, needs debugging (0.000s)
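In the test_scatter4 wrapper above, the second kernel issues one store per column: the flat offset x0 + 992*tmp0 is column + row_stride * index, i.e. a scatter along dim 0 of the row-major (196, 992) buffer. A rough eager equivalent (shapes from the rand_strided calls; the index values below are invented, since rand_strided leaves the int64 tensor effectively uninitialized):

import torch

x = torch.randn(196, 992, device='cuda')                # arg0_1
index = torch.randint(0, 196, (1, 992), device='cuda')  # arg1_1 (hypothetical values)
src = torch.randn(1, 992, device='cuda')                # arg2_1

out = x.clone()              # kernel 0: copy all 194432 elements
out.scatter_(0, index, src)  # kernel 1: 992 stores at col + 992 * index[0, col]

Each lane writes a distinct destination here, so a plain tl.store suffices; the scatter_add tests below are lowered to tl.atomic_add instead.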
test_scatter_add2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:33,001] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 795
[2023-01-11 21:35:33,078] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 795
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:33,099] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 796
[2023-01-11 21:35:33,173] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 796

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_add_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x2 = xindex
    x0 = xindex % 3
    tmp0 = tl.load(in_ptr0 + (x2), xmask)
    tmp1 = tl.load(in_ptr1 + (x2), xmask)
    tl.atomic_add(out_ptr0 + (x0 + (3*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_add_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0)
        del arg0_1
        triton_fused_scatter_add_1.run(arg1_1, arg2_1, buf0, 6, grid=grid(6), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_add_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x2 = xindex
    x0 = xindex % 3
    tmp0 = tl.load(in_ptr0 + (x2), xmask)
    tmp1 = tl.load(in_ptr1 + (x2), xmask).to(tl.float32)
    tl.atomic_add(out_ptr0 + (x0 + (3*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_add_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0)
        del arg0_1
        triton_fused_scatter_add_1.run(arg1_1, arg2_1, buf0, 6, grid=grid(6), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (0.193s)
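test_scatter_add2 shows the accumulating variant: instead of tl.store, the second kernel does tl.atomic_add at offset x0 + 3*tmp0, i.e. column x0 of the row named by the index, which matches a scatter_add along dim 0 of the (2, 3) buffer. A sketch of the eager counterpart (shapes from the rand_strided calls; index values invented, as before):

import torch

x = torch.randn(2, 3, device='cuda')                # arg0_1
index = torch.randint(0, 2, (2, 3), device='cuda')  # arg1_1 (hypothetical values)
src = torch.randn(2, 3, device='cuda')              # arg2_1

out = x.clone()
out.scatter_add_(0, index, src)  # kernel 1: tl.atomic_add at x0 + 3 * index

Because colliding indices are resolved by hardware atomics, the accumulation order is not fixed, so floating-point results from this lowering can differ bitwise between runs.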
test_scatter_add3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:33,194] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 797
[2023-01-11 21:35:33,357] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 797
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:33,376] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 798
[2023-01-11 21:35:33,446] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 798

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1885
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_add_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 4
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tl.atomic_add(out_ptr0 + (tmp0), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_add_0.run(arg0_1, buf0, 1885, grid=grid(1885), stream=stream0)
        del arg0_1
        triton_fused_scatter_add_1.run(arg1_1, arg2_1, buf0, 4, grid=grid(4), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((1, 1, 10), (10, 10, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1885
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_add_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 4
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tl.atomic_add(out_ptr0 + (tmp0), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_add_0.run(arg0_1, buf0, 1885, grid=grid(1885), stream=stream0)
        del arg0_1
        triton_fused_scatter_add_1.run(arg1_1, arg2_1, buf0, 4, grid=grid(4), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((1, 1, 10), (10, 10, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (0.273s)
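test_scatter_add3 above adds at the raw offset tmp0 with an index tensor of shape (1, 1, 4); given the (5, 29, 13) buffer's strides (377, 13, 1), that is consistent with a scatter_add along the last dimension, and only the first four elements of the (1, 1, 10) source are ever read because the index shape bounds the iteration. A sketch under that reading (index values invented):

import torch

x = torch.randn(5, 29, 13, device='cuda')               # arg0_1
index = torch.randint(0, 13, (1, 1, 4), device='cuda')  # arg1_1 (hypothetical values)
src = torch.randn(1, 1, 10, device='cuda')              # arg2_1; elements 4..9 unused

out = x.clone()
out.scatter_add_(2, index, src)  # 4 atomic adds at out[0, 0, index[0, 0, k]]

The test_scatter_reduce1 kernels that follow are the same pattern kernel-for-kernel, so the reduce='sum' path appears to reuse the scatter_add lowering.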
test_scatter_reduce1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:33,467] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 799
[2023-01-11 21:35:33,481] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 799
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:33,502] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 800
[2023-01-11 21:35:33,516] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 800

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_reduce_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1885
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_reduce_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 4
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tl.atomic_add(out_ptr0 + (tmp0), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_reduce_0.run(arg0_1, buf0, 1885, grid=grid(1885), stream=stream0)
        del arg0_1
        triton_fused_scatter_reduce_1.run(arg1_1, arg2_1, buf0, 4, grid=grid(4), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((1, 1, 10), (10, 10, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_reduce_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1885
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_reduce_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 4
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tl.atomic_add(out_ptr0 + (tmp0), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_reduce_0.run(arg0_1, buf0,
1885, grid=grid(1885), stream=stream0) 2023-01-11T21:38:06.6232659Z del arg0_1 2023-01-11T21:38:06.6232808Z triton_fused_scatter_reduce_1.run(arg1_1, arg2_1, buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.6232879Z del arg1_1 2023-01-11T21:38:06.6232952Z del arg2_1 2023-01-11T21:38:06.6233032Z return (buf0, ) 2023-01-11T21:38:06.6233037Z 2023-01-11T21:38:06.6233041Z 2023-01-11T21:38:06.6233121Z if __name__ == "__main__": 2023-01-11T21:38:06.6233239Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6233366Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6233606Z arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6233804Z arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.6234011Z arg2_1 = rand_strided((1, 1, 10), (10, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6234137Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.6234142Z 2023-01-11T21:38:06.6234213Z ok (0.070s) 2023-01-11T21:38:06.6234675Z test_scatter_reduce2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6234811Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6235075Z [2023-01-11 21:35:33,537] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 801 2023-01-11T21:38:06.6235343Z [2023-01-11 21:35:33,614] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 801 2023-01-11T21:38:06.6235760Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
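A reading of the two kernels above: Inductor lowers the scatter-reduce in graph 800 to a copy kernel (triton_fused_scatter_reduce_0 seeds the output buffer with the input) followed by an accumulation kernel (triton_fused_scatter_reduce_1 does tl.atomic_add at the gathered flat offsets). The fp16 variant (graph 801) is byte-for-byte the same except that loads are upcast with .to(tl.float32) and implicitly narrowed back to fp16 on store. A minimal eager-mode sketch of the same computation, with shapes copied from the rand_strided calls in the log; the reduction dim and the exact test body are assumptions, not read from the test source:

import torch

x = torch.randn(5, 29, 13, device="cuda")               # arg0_1
index = torch.randint(0, 13, (1, 1, 4), device="cuda")  # arg1_1 (int64)
src = torch.randn(1, 1, 10, device="cuda")              # arg2_1
# Kernel 0 copies x into the output buffer (the include-self seed); kernel 1
# atomically adds the first index.numel() elements of src at the indexed slots.
out = torch.scatter_reduce(x, 2, index, src, reduce="sum")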
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6235949Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6236223Z [2023-01-11 21:35:33,634] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 802 2023-01-11T21:38:06.6236237Z 2023-01-11T21:38:06.6236329Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6236407Z import torch 2023-01-11T21:38:06.6236482Z import random 2023-01-11T21:38:06.6236600Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6236724Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6236729Z 2023-01-11T21:38:06.6236814Z aten = torch.ops.aten 2023-01-11T21:38:06.6236950Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6237039Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6237044Z 2023-01-11T21:38:06.6237115Z import triton 2023-01-11T21:38:06.6237204Z import triton.language as tl 2023-01-11T21:38:06.6237328Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6237472Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6237478Z 2023-01-11T21:38:06.6237483Z 2023-01-11T21:38:06.6237656Z triton_fused_scatter_reduce_0 = async_compile.triton(''' 2023-01-11T21:38:06.6237728Z import triton 2023-01-11T21:38:06.6237819Z import triton.language as tl 2023-01-11T21:38:06.6237929Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6238028Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6238160Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6238283Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6238289Z 2023-01-11T21:38:06.6238685Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6238759Z @triton.jit 2023-01-11T21:38:06.6238895Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6238968Z xnumel = 6 2023-01-11T21:38:06.6239058Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6239187Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6239269Z xmask = xindex < xnumel 2023-01-11T21:38:06.6239375Z x0 = xindex 2023-01-11T21:38:06.6239473Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6239608Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6239693Z ''') 2023-01-11T21:38:06.6239699Z 2023-01-11T21:38:06.6239703Z 2023-01-11T21:38:06.6239869Z triton_fused_scatter_reduce_1 = async_compile.triton(''' 2023-01-11T21:38:06.6239947Z import triton 2023-01-11T21:38:06.6240040Z import triton.language as tl 2023-01-11T21:38:06.6240155Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6240254Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6240385Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6240514Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6240520Z 2023-01-11T21:38:06.6240924Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 
2023-01-11T21:38:06.6240992Z @triton.jit 2023-01-11T21:38:06.6241123Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6241196Z xnumel = 6 2023-01-11T21:38:06.6241293Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6241424Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6241507Z xmask = xindex < xnumel 2023-01-11T21:38:06.6241577Z x2 = xindex 2023-01-11T21:38:06.6241644Z x0 = xindex % 3 2023-01-11T21:38:06.6241832Z tmp0 = tl.load(in_ptr0 + (x2), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6241934Z tmp1 = 0 2023-01-11T21:38:06.6242076Z tl.store(out_ptr0 + (x0 + (3*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6242158Z ''') 2023-01-11T21:38:06.6242163Z 2023-01-11T21:38:06.6242167Z 2023-01-11T21:38:06.6242338Z triton_fused_scatter_reduce_2 = async_compile.triton(''' 2023-01-11T21:38:06.6242412Z import triton 2023-01-11T21:38:06.6242506Z import triton.language as tl 2023-01-11T21:38:06.6242614Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6242714Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6242847Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6242972Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6242977Z 2023-01-11T21:38:06.6243401Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6243479Z @triton.jit 2023-01-11T21:38:06.6243617Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6243689Z xnumel = 6 2023-01-11T21:38:06.6243779Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6243910Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6243993Z xmask = xindex < xnumel 2023-01-11T21:38:06.6244064Z x2 = xindex 2023-01-11T21:38:06.6244136Z x0 = xindex % 3 2023-01-11T21:38:06.6244233Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.6244331Z tmp1 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.6244472Z tl.atomic_add(out_ptr0 + (x0 + (3*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6244556Z ''') 2023-01-11T21:38:06.6244561Z 2023-01-11T21:38:06.6244565Z 2023-01-11T21:38:06.6244658Z async_compile.wait(globals()) 2023-01-11T21:38:06.6244736Z del async_compile 2023-01-11T21:38:06.6244741Z 2023-01-11T21:38:06.6244822Z def call(args): 2023-01-11T21:38:06.6244906Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.6244981Z args.clear() 2023-01-11T21:38:06.6245066Z with torch.cuda.device(0): 2023-01-11T21:38:06.6245264Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6245385Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6245535Z triton_fused_scatter_reduce_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.6245608Z del arg0_1 2023-01-11T21:38:06.6245753Z triton_fused_scatter_reduce_1.run(arg1_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.6245909Z triton_fused_scatter_reduce_2.run(arg1_1, arg2_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.6245981Z del arg1_1 2023-01-11T21:38:06.6246047Z del arg2_1 2023-01-11T21:38:06.6246124Z return (buf0, ) 2023-01-11T21:38:06.6246129Z 2023-01-11T21:38:06.6246133Z 2023-01-11T21:38:06.6246217Z if __name__ == "__main__": 
2023-01-11T21:38:06.6246335Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6246459Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6246657Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6246854Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.6247056Z arg2_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6247177Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.6247182Z 2023-01-11T21:38:06.6247443Z [2023-01-11 21:35:33,711] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 802 2023-01-11T21:38:06.6247449Z 2023-01-11T21:38:06.6247548Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6247622Z import torch 2023-01-11T21:38:06.6247695Z import random 2023-01-11T21:38:06.6247812Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6247962Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6247967Z 2023-01-11T21:38:06.6248048Z aten = torch.ops.aten 2023-01-11T21:38:06.6248177Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6248274Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6248281Z 2023-01-11T21:38:06.6248354Z import triton 2023-01-11T21:38:06.6248446Z import triton.language as tl 2023-01-11T21:38:06.6248570Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6248708Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6248713Z 2023-01-11T21:38:06.6248717Z 2023-01-11T21:38:06.6248890Z triton_fused_scatter_reduce_0 = async_compile.triton(''' 2023-01-11T21:38:06.6248967Z import triton 2023-01-11T21:38:06.6249052Z import triton.language as tl 2023-01-11T21:38:06.6249166Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6249268Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6249403Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6249526Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6249531Z 2023-01-11T21:38:06.6249938Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6250011Z @triton.jit 2023-01-11T21:38:06.6250143Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6250209Z xnumel = 6 2023-01-11T21:38:06.6250306Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6250438Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6250520Z xmask = xindex < xnumel 2023-01-11T21:38:06.6250591Z x0 = xindex 2023-01-11T21:38:06.6250708Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6250844Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6250922Z ''') 2023-01-11T21:38:06.6250928Z 2023-01-11T21:38:06.6250932Z 2023-01-11T21:38:06.6251106Z triton_fused_scatter_reduce_1 = async_compile.triton(''' 2023-01-11T21:38:06.6251180Z import triton 2023-01-11T21:38:06.6251299Z import triton.language as tl 2023-01-11T21:38:06.6251415Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6251516Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6251647Z from 
torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6251765Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6251778Z 2023-01-11T21:38:06.6252179Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6252252Z @triton.jit 2023-01-11T21:38:06.6252387Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6252458Z xnumel = 6 2023-01-11T21:38:06.6252553Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6252686Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6252769Z xmask = xindex < xnumel 2023-01-11T21:38:06.6252835Z x2 = xindex 2023-01-11T21:38:06.6252909Z x0 = xindex % 3 2023-01-11T21:38:06.6253097Z tmp0 = tl.load(in_ptr0 + (x2), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6253171Z tmp1 = 0 2023-01-11T21:38:06.6253315Z tl.store(out_ptr0 + (x0 + (3*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6253400Z ''') 2023-01-11T21:38:06.6253405Z 2023-01-11T21:38:06.6253409Z 2023-01-11T21:38:06.6253580Z triton_fused_scatter_reduce_2 = async_compile.triton(''' 2023-01-11T21:38:06.6253654Z import triton 2023-01-11T21:38:06.6253740Z import triton.language as tl 2023-01-11T21:38:06.6253853Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6253982Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6254113Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6254236Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6254241Z 2023-01-11T21:38:06.6254800Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6254876Z @triton.jit 2023-01-11T21:38:06.6255014Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6255079Z xnumel = 6 2023-01-11T21:38:06.6255175Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6255304Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6255389Z xmask = xindex < xnumel 2023-01-11T21:38:06.6255463Z x2 = xindex 2023-01-11T21:38:06.6255536Z x0 = xindex % 3 2023-01-11T21:38:06.6255634Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.6255744Z tmp1 = tl.load(in_ptr1 + (x2), xmask).to(tl.float32) 2023-01-11T21:38:06.6255894Z tl.atomic_add(out_ptr0 + (x0 + (3*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6255985Z ''') 2023-01-11T21:38:06.6255991Z 2023-01-11T21:38:06.6255995Z 2023-01-11T21:38:06.6256088Z async_compile.wait(globals()) 2023-01-11T21:38:06.6256169Z del async_compile 2023-01-11T21:38:06.6256174Z 2023-01-11T21:38:06.6256248Z def call(args): 2023-01-11T21:38:06.6256333Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.6256408Z args.clear() 2023-01-11T21:38:06.6256493Z with torch.cuda.device(0): 2023-01-11T21:38:06.6256696Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6256788Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6256937Z triton_fused_scatter_reduce_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.6257017Z del arg0_1 
2023-01-11T21:38:06.6257226Z triton_fused_scatter_reduce_1.run(arg1_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.6257397Z triton_fused_scatter_reduce_2.run(arg1_1, arg2_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.6257529Z del arg1_1 2023-01-11T21:38:06.6257616Z del arg2_1 2023-01-11T21:38:06.6257706Z return (buf0, ) 2023-01-11T21:38:06.6257712Z 2023-01-11T21:38:06.6257716Z 2023-01-11T21:38:06.6257797Z if __name__ == "__main__": 2023-01-11T21:38:06.6257914Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6258041Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6258243Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6258436Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.6258631Z arg2_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6258756Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.6258761Z 2023-01-11T21:38:06.6258832Z ok (0.194s) 2023-01-11T21:38:06.6259308Z test_scheduler_vertical_fusion1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6259442Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6259701Z [2023-01-11 21:35:33,792] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 803 2023-01-11T21:38:06.6259909Z [2023-01-11 21:35:33,824] torch._inductor.scheduler: [DEBUG] remove_buffer('buf3') 2023-01-11T21:38:06.6260208Z [2023-01-11 21:35:33,906] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 803 2023-01-11T21:38:06.6260628Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
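Graph 802 lowers to three kernels rather than two: the copy kernel seeds buf0 with the input, triton_fused_scatter_reduce_1 stores a literal 0 at every scattered slot (out_ptr0[x0 + 3*tmp0]), and triton_fused_scatter_reduce_2 then atomically adds src at the same slots. Zeroing the targets before accumulating is what a sum reduction with include_self=False would need, so a plausible eager-mode reading is the following; treat the flag and the dim as inferences from the generated indexing, not facts from the test source:

import torch

x = torch.randn(2, 3, device="cuda")                # arg0_1
index = torch.randint(0, 2, (2, 3), device="cuda")  # arg1_1, values index dim 0
src = torch.randn(2, 3, device="cuda")              # arg2_1
# x0 + 3*tmp0 in the kernels is a flat offset with stride 3, i.e. a dim=0 scatter.
out = torch.scatter_reduce(x, 0, index, src, reduce="sum", include_self=False)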
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6260758Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6261005Z [2023-01-11 21:35:33,974] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 804 2023-01-11T21:38:06.6261212Z [2023-01-11 21:35:34,000] torch._inductor.scheduler: [DEBUG] remove_buffer('buf3') 2023-01-11T21:38:06.6261473Z [2023-01-11 21:35:34,083] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 804 2023-01-11T21:38:06.6261482Z 2023-01-11T21:38:06.6261580Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6261654Z import torch 2023-01-11T21:38:06.6261727Z import random 2023-01-11T21:38:06.6261848Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6261971Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6261979Z 2023-01-11T21:38:06.6262055Z aten = torch.ops.aten 2023-01-11T21:38:06.6262195Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6262290Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6262295Z 2023-01-11T21:38:06.6262371Z import triton 2023-01-11T21:38:06.6262463Z import triton.language as tl 2023-01-11T21:38:06.6262587Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6262728Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6262734Z 2023-01-11T21:38:06.6262738Z 2023-01-11T21:38:06.6262975Z triton_fused_add_add_1_add_2_add_3_mul_mul_1_mul_2_mul_3_mul_4_realize_0 = async_compile.triton(''' 2023-01-11T21:38:06.6263053Z import triton 2023-01-11T21:38:06.6263138Z import triton.language as tl 2023-01-11T21:38:06.6263251Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6263354Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6263515Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6263645Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6263651Z 2023-01-11T21:38:06.6264137Z @pointwise(size_hints=[2097152], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.6264211Z @triton.jit 2023-01-11T21:38:06.6264373Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, in_ptr1, in_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6264446Z xnumel = 1082016 2023-01-11T21:38:06.6264544Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6264673Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6264754Z xmask = xindex < xnumel 2023-01-11T21:38:06.6264825Z x0 = xindex 2023-01-11T21:38:06.6264899Z x1 = xindex % 26 2023-01-11T21:38:06.6265092Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6265183Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6265280Z tmp9 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.6265378Z tmp16 = tl.load(in_ptr2 + (x1), xmask) 2023-01-11T21:38:06.6265502Z tmp1 = -1.061519070296458e-11 2023-01-11T21:38:06.6265583Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6265705Z tmp4 = -1.988366587925593e-08 2023-01-11T21:38:06.6265784Z tmp5 = tmp2 + tmp4 2023-01-11T21:38:06.6265855Z tmp6 = tmp3 * tmp5 2023-01-11T21:38:06.6265977Z tmp7 = 
-3.087032500374211e-07 2023-01-11T21:38:06.6266098Z tmp8 = tmp6 + tmp7 2023-01-11T21:38:06.6266220Z tmp10 = 1.55093272922008e-10 2023-01-11T21:38:06.6266302Z tmp11 = tmp9 * tmp10 2023-01-11T21:38:06.6266384Z tmp12 = tmp8 + tmp11 2023-01-11T21:38:06.6266453Z tmp13 = 1 / tmp12 2023-01-11T21:38:06.6266527Z tmp14 = 1.0 2023-01-11T21:38:06.6266611Z tmp15 = tmp13 * tmp14 2023-01-11T21:38:06.6266694Z tmp17 = tmp12 * tmp16 2023-01-11T21:38:06.6266773Z tmp18 = tmp15 + tmp17 2023-01-11T21:38:06.6266914Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.6266999Z ''') 2023-01-11T21:38:06.6267005Z 2023-01-11T21:38:06.6267009Z 2023-01-11T21:38:06.6267104Z async_compile.wait(globals()) 2023-01-11T21:38:06.6267174Z del async_compile 2023-01-11T21:38:06.6267179Z 2023-01-11T21:38:06.6267256Z def call(args): 2023-01-11T21:38:06.6267342Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.6267417Z args.clear() 2023-01-11T21:38:06.6267513Z with torch.cuda.device(0): 2023-01-11T21:38:06.6267742Z buf0 = empty_strided((204, 204, 26), (5304, 26, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6267832Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.6268037Z buf2 = empty_strided((204, 204, 26), (5304, 26, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6268132Z buf4 = buf2; del buf2 # reuse 2023-01-11T21:38:06.6268223Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6268423Z triton_fused_add_add_1_add_2_add_3_mul_mul_1_mul_2_mul_3_mul_4_realize_0.run(buf1, buf4, arg1_1, arg0_1, arg2_1, 1082016, grid=grid(1082016), stream=stream0) 2023-01-11T21:38:06.6268496Z del arg0_1 2023-01-11T21:38:06.6268568Z del arg1_1 2023-01-11T21:38:06.6268641Z del arg2_1 2023-01-11T21:38:06.6268711Z return (buf4, ) 2023-01-11T21:38:06.6268723Z 2023-01-11T21:38:06.6268728Z 2023-01-11T21:38:06.6268801Z if __name__ == "__main__": 2023-01-11T21:38:06.6268919Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6269052Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6269267Z arg0_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6269485Z arg1_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6269713Z arg2_1 = rand_strided((26, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6269841Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.6269846Z 2023-01-11T21:38:06.6269851Z 2023-01-11T21:38:06.6269952Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6270020Z import torch 2023-01-11T21:38:06.6270094Z import random 2023-01-11T21:38:06.6270213Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6270337Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6270342Z 2023-01-11T21:38:06.6270425Z aten = torch.ops.aten 2023-01-11T21:38:06.6270565Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6270662Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6270667Z 2023-01-11T21:38:06.6270741Z import triton 2023-01-11T21:38:06.6270827Z import triton.language as tl 2023-01-11T21:38:06.6270954Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6271097Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6271103Z 2023-01-11T21:38:06.6271107Z 2023-01-11T21:38:06.6271344Z triton_fused_add_add_1_add_2_add_3_mul_mul_1_mul_2_mul_3_mul_4_realize_0 = 
async_compile.triton(''' 2023-01-11T21:38:06.6271418Z import triton 2023-01-11T21:38:06.6271509Z import triton.language as tl 2023-01-11T21:38:06.6271625Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6271719Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6271852Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6271977Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6272011Z 2023-01-11T21:38:06.6272502Z @pointwise(size_hints=[2097152], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.6272575Z @triton.jit 2023-01-11T21:38:06.6272741Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, in_ptr1, in_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6272816Z xnumel = 1082016 2023-01-11T21:38:06.6272916Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6273047Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6273123Z xmask = xindex < xnumel 2023-01-11T21:38:06.6273195Z x0 = xindex 2023-01-11T21:38:06.6273270Z x1 = xindex % 26 2023-01-11T21:38:06.6273481Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6273601Z tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6273715Z tmp9 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6273831Z tmp16 = tl.load(in_ptr2 + (x1), xmask).to(tl.float32) 2023-01-11T21:38:06.6273948Z tmp1 = -1.061519070296458e-11 2023-01-11T21:38:06.6274031Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6274152Z tmp4 = -1.988366587925593e-08 2023-01-11T21:38:06.6274231Z tmp5 = tmp2 + tmp4 2023-01-11T21:38:06.6274310Z tmp6 = tmp3 * tmp5 2023-01-11T21:38:06.6274432Z tmp7 = -3.087032500374211e-07 2023-01-11T21:38:06.6274511Z tmp8 = tmp6 + tmp7 2023-01-11T21:38:06.6274626Z tmp10 = 1.55093272922008e-10 2023-01-11T21:38:06.6274706Z tmp11 = tmp9 * tmp10 2023-01-11T21:38:06.6274787Z tmp12 = tmp8 + tmp11 2023-01-11T21:38:06.6274861Z tmp13 = 1 / tmp12 2023-01-11T21:38:06.6274930Z tmp14 = 1.0 2023-01-11T21:38:06.6275008Z tmp15 = tmp13 * tmp14 2023-01-11T21:38:06.6275084Z tmp17 = tmp12 * tmp16 2023-01-11T21:38:06.6275161Z tmp18 = tmp15 + tmp17 2023-01-11T21:38:06.6275303Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.6275387Z ''') 2023-01-11T21:38:06.6275393Z 2023-01-11T21:38:06.6275397Z 2023-01-11T21:38:06.6275488Z async_compile.wait(globals()) 2023-01-11T21:38:06.6275639Z del async_compile 2023-01-11T21:38:06.6275645Z 2023-01-11T21:38:06.6275721Z def call(args): 2023-01-11T21:38:06.6275807Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.6275876Z args.clear() 2023-01-11T21:38:06.6275966Z with torch.cuda.device(0): 2023-01-11T21:38:06.6276181Z buf0 = empty_strided((204, 204, 26), (5304, 26, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6276273Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.6276484Z buf2 = empty_strided((204, 204, 26), (5304, 26, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6276574Z buf4 = buf2; del buf2 # reuse 2023-01-11T21:38:06.6276669Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6276859Z triton_fused_add_add_1_add_2_add_3_mul_mul_1_mul_2_mul_3_mul_4_realize_0.run(buf1, buf4, arg1_1, arg0_1, arg2_1, 1082016, 
grid=grid(1082016), stream=stream0) 2023-01-11T21:38:06.6276933Z del arg0_1 2023-01-11T21:38:06.6277007Z del arg1_1 2023-01-11T21:38:06.6277081Z del arg2_1 2023-01-11T21:38:06.6277156Z return (buf4, ) 2023-01-11T21:38:06.6277161Z 2023-01-11T21:38:06.6277166Z 2023-01-11T21:38:06.6277245Z if __name__ == "__main__": 2023-01-11T21:38:06.6277366Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6277492Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6277701Z arg0_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6277916Z arg1_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6278113Z arg2_1 = rand_strided((26, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6278274Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.6278279Z 2023-01-11T21:38:06.6278350Z ok (0.373s) 2023-01-11T21:38:06.6278814Z test_select_scatter_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6278945Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6279204Z [2023-01-11 21:35:34,114] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 805 2023-01-11T21:38:06.6279471Z [2023-01-11 21:35:34,218] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 805 2023-01-11T21:38:06.6279887Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
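The point of test_scheduler_vertical_fusion1 is visible in the output above: the whole mul/add chain compiles to a single pointwise kernel over 1082016 elements, and the scheduler drops one intermediate buffer entirely (the remove_buffer('buf3') debug line), so every tmp value stays in registers instead of round-tripping through global memory. Reading the kernel body back into eager form, with the constants copied verbatim and x/y/z standing for the three inputs (the in_ptr-to-arg mapping follows the call() site; the variable names are mine):

import torch

x = torch.randn(204, 204, 26, device="cuda")  # in_ptr0 (arg1_1 at the call site)
y = torch.randn(204, 204, 26, device="cuda")  # in_ptr1 (arg0_1 at the call site)
z = torch.randn(26, device="cuda")            # in_ptr2, broadcast over the last dim

# tmp12 in the kernel:
t = x * (x * -1.061519070296458e-11 + -1.988366587925593e-08) \
    + -3.087032500374211e-07 + y * 1.55093272922008e-10
out = 1.0 / t + t * z                         # tmp18, the single stored value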
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6280017Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6280265Z [2023-01-11 21:35:34,246] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 806 2023-01-11T21:38:06.6280525Z [2023-01-11 21:35:34,343] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 806 2023-01-11T21:38:06.6280531Z 2023-01-11T21:38:06.6280629Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6280705Z import torch 2023-01-11T21:38:06.6280780Z import random 2023-01-11T21:38:06.6280901Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6281029Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6281034Z 2023-01-11T21:38:06.6281117Z aten = torch.ops.aten 2023-01-11T21:38:06.6281247Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6281343Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6281376Z 2023-01-11T21:38:06.6281455Z import triton 2023-01-11T21:38:06.6281546Z import triton.language as tl 2023-01-11T21:38:06.6281672Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6281813Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6281818Z 2023-01-11T21:38:06.6281822Z 2023-01-11T21:38:06.6282022Z triton_fused_select_scatter_select_scatter_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6282095Z import triton 2023-01-11T21:38:06.6282180Z import triton.language as tl 2023-01-11T21:38:06.6282293Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6282392Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6282526Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6282652Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6282657Z 2023-01-11T21:38:06.6283108Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.6283182Z @triton.jit 2023-01-11T21:38:06.6283340Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6283409Z xnumel = 59888 2023-01-11T21:38:06.6283505Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6283634Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6283714Z xmask = xindex < xnumel 2023-01-11T21:38:06.6283799Z x1 = (xindex // 38) % 197 2023-01-11T21:38:06.6283901Z x0 = xindex % 38 2023-01-11T21:38:06.6283980Z x2 = (xindex // 7486) 2023-01-11T21:38:06.6284044Z x3 = xindex 2023-01-11T21:38:06.6284121Z x4 = xindex % 7486 2023-01-11T21:38:06.6284230Z tmp3 = tl.load(in_ptr0 + (x0 + (38*x2)), xmask) 2023-01-11T21:38:06.6284423Z tmp4 = tl.load(in_ptr1 + (x3), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6284521Z tmp9 = tl.load(in_ptr2 + (x4), xmask) 2023-01-11T21:38:06.6284621Z tmp10 = tl.load(in_ptr1 + (x3), xmask) 2023-01-11T21:38:06.6284693Z tmp0 = x1 2023-01-11T21:38:06.6284757Z tmp1 = 0 2023-01-11T21:38:06.6284835Z tmp2 = tmp0 == tmp1 2023-01-11T21:38:06.6284932Z tmp5 = tl.where(tmp2, tmp3, tmp4) 2023-01-11T21:38:06.6285003Z tmp6 = x2 2023-01-11T21:38:06.6285074Z tmp7 = 1 2023-01-11T21:38:06.6285154Z tmp8 = tmp6 == tmp7 
2023-01-11T21:38:06.6285251Z tmp11 = tl.where(tmp8, tmp9, tmp10) 2023-01-11T21:38:06.6285378Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.6285515Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.6285599Z ''') 2023-01-11T21:38:06.6285605Z 2023-01-11T21:38:06.6285609Z 2023-01-11T21:38:06.6285701Z async_compile.wait(globals()) 2023-01-11T21:38:06.6285778Z del async_compile 2023-01-11T21:38:06.6285783Z 2023-01-11T21:38:06.6285859Z def call(args): 2023-01-11T21:38:06.6285946Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.6286015Z args.clear() 2023-01-11T21:38:06.6286107Z with torch.cuda.device(0): 2023-01-11T21:38:06.6286323Z buf0 = empty_strided((8, 197, 38), (7486, 38, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6286535Z buf1 = empty_strided((8, 197, 38), (7486, 38, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6286628Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6286816Z triton_fused_select_scatter_select_scatter_1_0.run(arg1_1, arg0_1, arg2_1, buf0, buf1, 59888, grid=grid(59888), stream=stream0) 2023-01-11T21:38:06.6286890Z del arg0_1 2023-01-11T21:38:06.6286960Z del arg1_1 2023-01-11T21:38:06.6287025Z del arg2_1 2023-01-11T21:38:06.6287109Z return (buf0, buf1, ) 2023-01-11T21:38:06.6287114Z 2023-01-11T21:38:06.6287118Z 2023-01-11T21:38:06.6287196Z if __name__ == "__main__": 2023-01-11T21:38:06.6287343Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6287472Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6287685Z arg0_1 = rand_strided((8, 197, 38), (7486, 38, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6287885Z arg1_1 = rand_strided((8, 38), (38, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6288091Z arg2_1 = rand_strided((197, 38), (38, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6288211Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.6288216Z 2023-01-11T21:38:06.6288226Z 2023-01-11T21:38:06.6288320Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6288396Z import torch 2023-01-11T21:38:06.6288469Z import random 2023-01-11T21:38:06.6288589Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6288712Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6288717Z 2023-01-11T21:38:06.6288801Z aten = torch.ops.aten 2023-01-11T21:38:06.6288938Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6289026Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6289031Z 2023-01-11T21:38:06.6289107Z import triton 2023-01-11T21:38:06.6289199Z import triton.language as tl 2023-01-11T21:38:06.6289324Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6289464Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6289469Z 2023-01-11T21:38:06.6289473Z 2023-01-11T21:38:06.6289672Z triton_fused_select_scatter_select_scatter_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6289777Z import triton 2023-01-11T21:38:06.6289872Z import triton.language as tl 2023-01-11T21:38:06.6289978Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6290078Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6290208Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6290335Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6290341Z 2023-01-11T21:38:06.6290795Z 
@pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.6290868Z @triton.jit 2023-01-11T21:38:06.6291026Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6291101Z xnumel = 59888 2023-01-11T21:38:06.6291190Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6291322Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6291405Z xmask = xindex < xnumel 2023-01-11T21:38:06.6291486Z x1 = (xindex // 38) % 197 2023-01-11T21:38:06.6291562Z x0 = xindex % 38 2023-01-11T21:38:06.6291640Z x2 = (xindex // 7486) 2023-01-11T21:38:06.6291711Z x3 = xindex 2023-01-11T21:38:06.6291785Z x4 = xindex % 7486 2023-01-11T21:38:06.6291909Z tmp3 = tl.load(in_ptr0 + (x0 + (38*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.6292120Z tmp4 = tl.load(in_ptr1 + (x3), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6292237Z tmp9 = tl.load(in_ptr2 + (x4), xmask).to(tl.float32) 2023-01-11T21:38:06.6292352Z tmp10 = tl.load(in_ptr1 + (x3), xmask).to(tl.float32) 2023-01-11T21:38:06.6292424Z tmp0 = x1 2023-01-11T21:38:06.6292495Z tmp1 = 0 2023-01-11T21:38:06.6292568Z tmp2 = tmp0 == tmp1 2023-01-11T21:38:06.6292667Z tmp5 = tl.where(tmp2, tmp3, tmp4) 2023-01-11T21:38:06.6292742Z tmp6 = x2 2023-01-11T21:38:06.6292811Z tmp7 = 1 2023-01-11T21:38:06.6292894Z tmp8 = tmp6 == tmp7 2023-01-11T21:38:06.6292991Z tmp11 = tl.where(tmp8, tmp9, tmp10) 2023-01-11T21:38:06.6293126Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.6293282Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.6293369Z ''') 2023-01-11T21:38:06.6293374Z 2023-01-11T21:38:06.6293379Z 2023-01-11T21:38:06.6293472Z async_compile.wait(globals()) 2023-01-11T21:38:06.6293551Z del async_compile 2023-01-11T21:38:06.6293557Z 2023-01-11T21:38:06.6293631Z def call(args): 2023-01-11T21:38:06.6293718Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.6293793Z args.clear() 2023-01-11T21:38:06.6293879Z with torch.cuda.device(0): 2023-01-11T21:38:06.6294090Z buf0 = empty_strided((8, 197, 38), (7486, 38, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6294304Z buf1 = empty_strided((8, 197, 38), (7486, 38, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6294399Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6294700Z triton_fused_select_scatter_select_scatter_1_0.run(arg1_1, arg0_1, arg2_1, buf0, buf1, 59888, grid=grid(59888), stream=stream0) 2023-01-11T21:38:06.6294777Z del arg0_1 2023-01-11T21:38:06.6294851Z del arg1_1 2023-01-11T21:38:06.6294922Z del arg2_1 2023-01-11T21:38:06.6294997Z return (buf0, buf1, ) 2023-01-11T21:38:06.6295002Z 2023-01-11T21:38:06.6295007Z 2023-01-11T21:38:06.6295089Z if __name__ == "__main__": 2023-01-11T21:38:06.6295206Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6295332Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6295549Z arg0_1 = rand_strided((8, 197, 38), (7486, 38, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6295777Z arg1_1 = rand_strided((8, 38), (38, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6296053Z arg2_1 = rand_strided((197, 38), (38, 1), device='cuda:0', dtype=torch.float16) 
2023-01-11T21:38:06.6296180Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.6296185Z 2023-01-11T21:38:06.6296252Z ok (0.259s) 2023-01-11T21:38:06.6296705Z test_sgn_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6296839Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6297103Z [2023-01-11 21:35:34,363] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 807 2023-01-11T21:38:06.6297437Z [2023-01-11 21:35:34,440] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 807 2023-01-11T21:38:06.6297859Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6297994Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6298250Z [2023-01-11 21:35:34,483] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 808 2023-01-11T21:38:06.6298516Z [2023-01-11 21:35:34,564] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 808 2023-01-11T21:38:06.6298522Z 2023-01-11T21:38:06.6298622Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6298698Z import torch 2023-01-11T21:38:06.6298769Z import random 2023-01-11T21:38:06.6298890Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6299019Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6299024Z 2023-01-11T21:38:06.6299108Z aten = torch.ops.aten 2023-01-11T21:38:06.6299248Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6299346Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6299390Z 2023-01-11T21:38:06.6299473Z import triton 2023-01-11T21:38:06.6299562Z import triton.language as tl 2023-01-11T21:38:06.6299689Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6299837Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6299843Z 2023-01-11T21:38:06.6299847Z 2023-01-11T21:38:06.6300012Z triton_fused_sign_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.6300087Z import triton 2023-01-11T21:38:06.6300182Z import triton.language as tl 2023-01-11T21:38:06.6300299Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6300404Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6300536Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6300664Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6300670Z 2023-01-11T21:38:06.6301095Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6301173Z @triton.jit 2023-01-11T21:38:06.6301318Z def 
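Both outputs of test_select_scatter come out of one fused kernel: instead of materializing a copy and writing a slice in place, each select_scatter becomes a predicated select, tl.where(coord == index, src, base). From the decomposed coordinates (x1 is the dim-1 position, x2 the dim-0 position), a hedged eager equivalent is:

import torch

a = torch.randn(8, 197, 38, device="cuda")  # arg0_1
b = torch.randn(8, 38, device="cuda")       # arg1_1
c = torch.randn(197, 38, device="cuda")     # arg2_1

out0 = torch.select_scatter(a, b, dim=1, index=0)  # tmp5:  where(x1 == 0, b, a)
out1 = torch.select_scatter(a, c, dim=0, index=1)  # tmp11: where(x2 == 1, c, a)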
triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6301395Z xnumel = 41 2023-01-11T21:38:06.6301495Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6301627Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6301706Z xmask = xindex < xnumel 2023-01-11T21:38:06.6301782Z x0 = xindex 2023-01-11T21:38:06.6301975Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6302105Z tmp6 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6302183Z tmp1 = 0 < tmp0 2023-01-11T21:38:06.6302277Z tmp2 = tl.where(tmp1, 1, 0) 2023-01-11T21:38:06.6302356Z tmp3 = tmp0 < 0 2023-01-11T21:38:06.6302440Z tmp4 = tl.where(tmp3, 1, 0) 2023-01-11T21:38:06.6302554Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.6302627Z tmp7 = 1 2023-01-11T21:38:06.6302705Z tmp8 = tmp6 + tmp7 2023-01-11T21:38:06.6302782Z tmp9 = 0 < tmp8 2023-01-11T21:38:06.6302874Z tmp10 = tl.where(tmp9, 1, 0) 2023-01-11T21:38:06.6302945Z tmp11 = tmp8 < 0 2023-01-11T21:38:06.6303039Z tmp12 = tl.where(tmp11, 1, 0) 2023-01-11T21:38:06.6303156Z tmp13 = tmp10 - tmp12 2023-01-11T21:38:06.6303271Z tmp14 = tmp13 - tmp7 2023-01-11T21:38:06.6303409Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.6303546Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.6303636Z ''') 2023-01-11T21:38:06.6303642Z 2023-01-11T21:38:06.6303646Z 2023-01-11T21:38:06.6303741Z async_compile.wait(globals()) 2023-01-11T21:38:06.6303813Z del async_compile 2023-01-11T21:38:06.6303819Z 2023-01-11T21:38:06.6303896Z def call(args): 2023-01-11T21:38:06.6303972Z arg0_1, = args 2023-01-11T21:38:06.6304053Z args.clear() 2023-01-11T21:38:06.6304151Z with torch.cuda.device(0): 2023-01-11T21:38:06.6304351Z buf0 = empty_strided((41, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6304550Z buf1 = empty_strided((41, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6304638Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6304785Z triton_fused_sign_sub_0.run(arg0_1, buf0, buf1, 41, grid=grid(41), stream=stream0) 2023-01-11T21:38:06.6304859Z del arg0_1 2023-01-11T21:38:06.6304945Z return (buf0, buf1, ) 2023-01-11T21:38:06.6304950Z 2023-01-11T21:38:06.6304954Z 2023-01-11T21:38:06.6305035Z if __name__ == "__main__": 2023-01-11T21:38:06.6305154Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6305285Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6305485Z arg0_1 = rand_strided((41, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6305629Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6305637Z 2023-01-11T21:38:06.6305650Z 2023-01-11T21:38:06.6305758Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6305848Z import torch 2023-01-11T21:38:06.6305936Z import random 2023-01-11T21:38:06.6306058Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6306181Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6306186Z 2023-01-11T21:38:06.6306271Z aten = torch.ops.aten 2023-01-11T21:38:06.6306409Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6306500Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6306505Z 2023-01-11T21:38:06.6306583Z import triton 2023-01-11T21:38:06.6306675Z import triton.language as tl 2023-01-11T21:38:06.6306801Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6306943Z from torch._C 
import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6306948Z 2023-01-11T21:38:06.6306953Z 2023-01-11T21:38:06.6307150Z triton_fused_convert_element_type_1_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.6307227Z import triton 2023-01-11T21:38:06.6307320Z import triton.language as tl 2023-01-11T21:38:06.6307428Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6307532Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6307667Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6307794Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6307799Z 2023-01-11T21:38:06.6308212Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6308329Z @triton.jit 2023-01-11T21:38:06.6308473Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6308551Z xnumel = 41 2023-01-11T21:38:06.6308646Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6308778Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6308864Z xmask = xindex < xnumel 2023-01-11T21:38:06.6308935Z x0 = xindex 2023-01-11T21:38:06.6309151Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6309272Z tmp8 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6309362Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.6309433Z tmp2 = 0 < tmp1 2023-01-11T21:38:06.6309523Z tmp3 = tl.where(tmp2, 1, 0) 2023-01-11T21:38:06.6309599Z tmp4 = tmp1 < 0 2023-01-11T21:38:06.6309689Z tmp5 = tl.where(tmp4, 1, 0) 2023-01-11T21:38:06.6309804Z tmp6 = tmp3 - tmp5 2023-01-11T21:38:06.6309894Z tmp7 = tmp6.to(tl.float32) 2023-01-11T21:38:06.6309967Z tmp9 = 1 2023-01-11T21:38:06.6310043Z tmp10 = tmp8 + tmp9 2023-01-11T21:38:06.6310135Z tmp11 = tmp10.to(tl.float32) 2023-01-11T21:38:06.6310211Z tmp12 = 0 < tmp11 2023-01-11T21:38:06.6310307Z tmp13 = tl.where(tmp12, 1, 0) 2023-01-11T21:38:06.6310387Z tmp14 = tmp11 < 0 2023-01-11T21:38:06.6310479Z tmp15 = tl.where(tmp14, 1, 0) 2023-01-11T21:38:06.6310589Z tmp16 = tmp13 - tmp15 2023-01-11T21:38:06.6310680Z tmp17 = tmp16.to(tl.float32) 2023-01-11T21:38:06.6310795Z tmp18 = tmp17 - tmp9 2023-01-11T21:38:06.6310933Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.6311069Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.6311152Z ''') 2023-01-11T21:38:06.6311158Z 2023-01-11T21:38:06.6311162Z 2023-01-11T21:38:06.6311261Z async_compile.wait(globals()) 2023-01-11T21:38:06.6311340Z del async_compile 2023-01-11T21:38:06.6311345Z 2023-01-11T21:38:06.6311416Z def call(args): 2023-01-11T21:38:06.6311493Z arg0_1, = args 2023-01-11T21:38:06.6311569Z args.clear() 2023-01-11T21:38:06.6311663Z with torch.cuda.device(0): 2023-01-11T21:38:06.6311889Z buf0 = empty_strided((41, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6312090Z buf1 = empty_strided((41, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6312184Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6312346Z triton_fused_convert_element_type_1_sub_0.run(arg0_1, buf0, buf1, 41, grid=grid(41), stream=stream0) 2023-01-11T21:38:06.6312419Z del arg0_1 2023-01-11T21:38:06.6312504Z return (buf0, buf1, ) 
2023-01-11T21:38:06.6312509Z 2023-01-11T21:38:06.6312514Z 2023-01-11T21:38:06.6312594Z if __name__ == "__main__": 2023-01-11T21:38:06.6312712Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6312843Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6313040Z arg0_1 = rand_strided((41, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6313156Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6313161Z 2023-01-11T21:38:06.6313227Z ok (0.221s) 2023-01-11T21:38:06.6313691Z test_sgn_extremal_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6313823Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6314082Z [2023-01-11 21:35:34,577] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 809 2023-01-11T21:38:06.6314384Z [2023-01-11 21:35:34,642] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 809 2023-01-11T21:38:06.6314800Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6314934Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6315188Z [2023-01-11 21:35:34,666] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 810 2023-01-11T21:38:06.6315452Z [2023-01-11 21:35:34,734] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 810 2023-01-11T21:38:06.6315458Z 2023-01-11T21:38:06.6315558Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6315637Z import torch 2023-01-11T21:38:06.6315708Z import random 2023-01-11T21:38:06.6315829Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6315953Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6315958Z 2023-01-11T21:38:06.6316039Z aten = torch.ops.aten 2023-01-11T21:38:06.6316185Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6316285Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6316290Z 2023-01-11T21:38:06.6316367Z import triton 2023-01-11T21:38:06.6316455Z import triton.language as tl 2023-01-11T21:38:06.6316583Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6316726Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6316732Z 2023-01-11T21:38:06.6316736Z 2023-01-11T21:38:06.6316895Z triton_fused_sign_0 = async_compile.triton(''' 2023-01-11T21:38:06.6316974Z import triton 2023-01-11T21:38:06.6317075Z import triton.language as tl 2023-01-11T21:38:06.6317195Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6317298Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6317427Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6317555Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6317560Z 
2023-01-11T21:38:06.6317995Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6318074Z @triton.jit 2023-01-11T21:38:06.6318209Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6318284Z xnumel = 4 2023-01-11T21:38:06.6318386Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6318516Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6318595Z xmask = xindex < xnumel 2023-01-11T21:38:06.6318670Z x0 = xindex 2023-01-11T21:38:06.6318772Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6318851Z tmp1 = 0 < tmp0 2023-01-11T21:38:06.6318942Z tmp2 = tl.where(tmp1, 1, 0) 2023-01-11T21:38:06.6319021Z tmp3 = tmp0 < 0 2023-01-11T21:38:06.6319111Z tmp4 = tl.where(tmp3, 1, 0) 2023-01-11T21:38:06.6319215Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.6319355Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.6319441Z ''') 2023-01-11T21:38:06.6319446Z 2023-01-11T21:38:06.6319451Z 2023-01-11T21:38:06.6319546Z async_compile.wait(globals()) 2023-01-11T21:38:06.6319627Z del async_compile 2023-01-11T21:38:06.6319632Z 2023-01-11T21:38:06.6319710Z def call(args): 2023-01-11T21:38:06.6319788Z arg0_1, = args 2023-01-11T21:38:06.6319859Z args.clear() 2023-01-11T21:38:06.6319953Z with torch.cuda.device(0): 2023-01-11T21:38:06.6320150Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6320246Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6320415Z triton_fused_sign_0.run(arg0_1, buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.6320491Z del arg0_1 2023-01-11T21:38:06.6320571Z return (buf0, ) 2023-01-11T21:38:06.6320576Z 2023-01-11T21:38:06.6320581Z 2023-01-11T21:38:06.6320661Z if __name__ == "__main__": 2023-01-11T21:38:06.6320777Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6320903Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6321102Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6321219Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6321224Z 2023-01-11T21:38:06.6321229Z 2023-01-11T21:38:06.6321329Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6321403Z import torch 2023-01-11T21:38:06.6321477Z import random 2023-01-11T21:38:06.6321598Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6321720Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6321725Z 2023-01-11T21:38:06.6321809Z aten = torch.ops.aten 2023-01-11T21:38:06.6321944Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6322040Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6322046Z 2023-01-11T21:38:06.6322120Z import triton 2023-01-11T21:38:06.6322217Z import triton.language as tl 2023-01-11T21:38:06.6322344Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6322478Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6322491Z 2023-01-11T21:38:06.6322495Z 2023-01-11T21:38:06.6322677Z triton_fused_convert_element_type_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6322754Z import triton 2023-01-11T21:38:06.6322849Z import triton.language as tl 2023-01-11T21:38:06.6322966Z from torch._inductor.ir import ReductionHint 
2023-01-11T21:38:06.6323073Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6323205Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6323335Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6323340Z 2023-01-11T21:38:06.6323772Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6323842Z @triton.jit 2023-01-11T21:38:06.6323977Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6324053Z xnumel = 4 2023-01-11T21:38:06.6324152Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6324282Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6324368Z xmask = xindex < xnumel 2023-01-11T21:38:06.6324442Z x0 = xindex 2023-01-11T21:38:06.6324555Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6324645Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.6324723Z tmp2 = 0 < tmp1 2023-01-11T21:38:06.6324813Z tmp3 = tl.where(tmp2, 1, 0) 2023-01-11T21:38:06.6324890Z tmp4 = tmp1 < 0 2023-01-11T21:38:06.6324979Z tmp5 = tl.where(tmp4, 1, 0) 2023-01-11T21:38:06.6325091Z tmp6 = tmp3 - tmp5 2023-01-11T21:38:06.6325175Z tmp7 = tmp6.to(tl.float32) 2023-01-11T21:38:06.6325313Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.6325400Z ''') 2023-01-11T21:38:06.6325406Z 2023-01-11T21:38:06.6325410Z 2023-01-11T21:38:06.6325507Z async_compile.wait(globals()) 2023-01-11T21:38:06.6325586Z del async_compile 2023-01-11T21:38:06.6325591Z 2023-01-11T21:38:06.6325666Z def call(args): 2023-01-11T21:38:06.6325741Z arg0_1, = args 2023-01-11T21:38:06.6325811Z args.clear() 2023-01-11T21:38:06.6325905Z with torch.cuda.device(0): 2023-01-11T21:38:06.6326100Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6326195Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6326383Z triton_fused_convert_element_type_1_0.run(arg0_1, buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.6326459Z del arg0_1 2023-01-11T21:38:06.6326540Z return (buf0, ) 2023-01-11T21:38:06.6326545Z 2023-01-11T21:38:06.6326549Z 2023-01-11T21:38:06.6326632Z if __name__ == "__main__": 2023-01-11T21:38:06.6326746Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6326872Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6327070Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6327183Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6327188Z 2023-01-11T21:38:06.6327261Z ok (0.169s) 2023-01-11T21:38:06.6327731Z test_shape_prop_torch_ones_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6327869Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6328133Z [2023-01-11 21:35:34,927] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 811 2023-01-11T21:38:06.6328402Z [2023-01-11 21:35:35,004] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 811 2023-01-11T21:38:06.6328819Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6328947Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6329206Z [2023-01-11 21:35:35,058] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 812 2023-01-11T21:38:06.6329468Z [2023-01-11 21:35:35,129] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 812 2023-01-11T21:38:06.6329474Z 2023-01-11T21:38:06.6329602Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6329682Z import torch 2023-01-11T21:38:06.6329758Z import random 2023-01-11T21:38:06.6329879Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6330005Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6330012Z 2023-01-11T21:38:06.6330089Z aten = torch.ops.aten 2023-01-11T21:38:06.6330228Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6330325Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6330330Z 2023-01-11T21:38:06.6330406Z import triton 2023-01-11T21:38:06.6330502Z import triton.language as tl 2023-01-11T21:38:06.6330638Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6330782Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6330787Z 2023-01-11T21:38:06.6330792Z 2023-01-11T21:38:06.6330948Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6331019Z import triton 2023-01-11T21:38:06.6331115Z import triton.language as tl 2023-01-11T21:38:06.6331233Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6331337Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6331472Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6331601Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6331607Z 2023-01-11T21:38:06.6332020Z @pointwise(size_hints=[33554432], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6332132Z @triton.jit 2023-01-11T21:38:06.6332261Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6332339Z xnumel = 25165824 2023-01-11T21:38:06.6332439Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6332574Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6332661Z xmask = xindex < xnumel 2023-01-11T21:38:06.6332737Z x0 = xindex 2023-01-11T21:38:06.6332836Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6332903Z tmp1 = 1 2023-01-11T21:38:06.6332984Z tmp2 = tmp0 + tmp1 
2023-01-11T21:38:06.6333122Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.6333207Z ''') 2023-01-11T21:38:06.6333214Z 2023-01-11T21:38:06.6333219Z 2023-01-11T21:38:06.6333314Z async_compile.wait(globals()) 2023-01-11T21:38:06.6333393Z del async_compile 2023-01-11T21:38:06.6333398Z 2023-01-11T21:38:06.6333477Z def call(args): 2023-01-11T21:38:06.6333558Z arg0_1, = args 2023-01-11T21:38:06.6333629Z args.clear() 2023-01-11T21:38:06.6333724Z with torch.cuda.device(0): 2023-01-11T21:38:06.6333963Z buf0 = empty_strided((8, 12, 512, 512), (3145728, 262144, 512, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6334061Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6334211Z triton_fused_add_0.run(arg0_1, buf0, 25165824, grid=grid(25165824), stream=stream0) 2023-01-11T21:38:06.6334287Z del arg0_1 2023-01-11T21:38:06.6334368Z return (buf0, ) 2023-01-11T21:38:06.6334374Z 2023-01-11T21:38:06.6334378Z 2023-01-11T21:38:06.6334454Z if __name__ == "__main__": 2023-01-11T21:38:06.6334719Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6334846Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6335084Z arg0_1 = rand_strided((8, 12, 512, 512), (3145728, 262144, 512, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6335197Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6335206Z 2023-01-11T21:38:06.6335211Z 2023-01-11T21:38:06.6335308Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6335383Z import torch 2023-01-11T21:38:06.6335459Z import random 2023-01-11T21:38:06.6335576Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6335764Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6335772Z 2023-01-11T21:38:06.6335872Z aten = torch.ops.aten 2023-01-11T21:38:06.6336006Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6336103Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6336108Z 2023-01-11T21:38:06.6336179Z import triton 2023-01-11T21:38:06.6336270Z import triton.language as tl 2023-01-11T21:38:06.6336394Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6336525Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6336531Z 2023-01-11T21:38:06.6336535Z 2023-01-11T21:38:06.6336692Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6336766Z import triton 2023-01-11T21:38:06.6336858Z import triton.language as tl 2023-01-11T21:38:06.6336970Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6337072Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6337266Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6337386Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6337398Z 2023-01-11T21:38:06.6337801Z @pointwise(size_hints=[33554432], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6337876Z @triton.jit 2023-01-11T21:38:06.6338007Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6338083Z xnumel = 25165824 2023-01-11T21:38:06.6338176Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6338349Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6338437Z xmask = xindex < xnumel 2023-01-11T21:38:06.6338503Z 
x0 = xindex 2023-01-11T21:38:06.6338621Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6338711Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.6338785Z tmp2 = 1 2023-01-11T21:38:06.6338865Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.6339000Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.6339091Z ''') 2023-01-11T21:38:06.6339096Z 2023-01-11T21:38:06.6339101Z 2023-01-11T21:38:06.6339195Z async_compile.wait(globals()) 2023-01-11T21:38:06.6339267Z del async_compile 2023-01-11T21:38:06.6339272Z 2023-01-11T21:38:06.6339348Z def call(args): 2023-01-11T21:38:06.6339424Z arg0_1, = args 2023-01-11T21:38:06.6339501Z args.clear() 2023-01-11T21:38:06.6339595Z with torch.cuda.device(0): 2023-01-11T21:38:06.6339832Z buf0 = empty_strided((8, 12, 512, 512), (3145728, 262144, 512, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6339929Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6340066Z triton_fused_add_0.run(arg0_1, buf0, 25165824, grid=grid(25165824), stream=stream0) 2023-01-11T21:38:06.6340142Z del arg0_1 2023-01-11T21:38:06.6340222Z return (buf0, ) 2023-01-11T21:38:06.6340230Z 2023-01-11T21:38:06.6340235Z 2023-01-11T21:38:06.6340315Z if __name__ == "__main__": 2023-01-11T21:38:06.6340435Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6340559Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6340794Z arg0_1 = rand_strided((8, 12, 512, 512), (3145728, 262144, 512, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6340909Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6340914Z 2023-01-11T21:38:06.6340980Z ok (0.420s) 2023-01-11T21:38:06.6341437Z test_sigmoid_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6341602Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6341861Z [2023-01-11 21:35:35,173] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 813 2023-01-11T21:38:06.6342124Z [2023-01-11 21:35:35,251] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 813 2023-01-11T21:38:06.6342535Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6342668Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6342923Z [2023-01-11 21:35:35,271] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 814 2023-01-11T21:38:06.6343187Z [2023-01-11 21:35:35,349] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 814 2023-01-11T21:38:06.6343193Z 2023-01-11T21:38:06.6343293Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6343373Z import torch 2023-01-11T21:38:06.6343443Z import random 2023-01-11T21:38:06.6343563Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6343688Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6343693Z 2023-01-11T21:38:06.6343777Z aten = torch.ops.aten 2023-01-11T21:38:06.6343913Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6344010Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6344043Z 2023-01-11T21:38:06.6344121Z import triton 2023-01-11T21:38:06.6344209Z import triton.language as tl 2023-01-11T21:38:06.6344339Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6344480Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6344486Z 2023-01-11T21:38:06.6344493Z 2023-01-11T21:38:06.6344673Z triton_fused_sigmoid_sigmoid_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6344750Z import triton 2023-01-11T21:38:06.6344845Z import triton.language as tl 2023-01-11T21:38:06.6344961Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6345066Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6345194Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6345322Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6345327Z 2023-01-11T21:38:06.6345790Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.6345888Z @triton.jit 2023-01-11T21:38:06.6346046Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6346131Z xnumel = 64 2023-01-11T21:38:06.6346229Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6346360Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6346438Z xmask = xindex < xnumel 2023-01-11T21:38:06.6346511Z x0 = xindex 2023-01-11T21:38:06.6346702Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6346802Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6346904Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.6346990Z tmp1 = tl.sigmoid(tmp0) 2023-01-11T21:38:06.6347071Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6347153Z tmp5 = tl.sigmoid(tmp4) 2023-01-11T21:38:06.6347288Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6347424Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.6347511Z ''') 2023-01-11T21:38:06.6347517Z 2023-01-11T21:38:06.6347521Z 2023-01-11T21:38:06.6347643Z async_compile.wait(globals()) 2023-01-11T21:38:06.6347725Z del async_compile 2023-01-11T21:38:06.6347730Z 2023-01-11T21:38:06.6347806Z def call(args): 
2023-01-11T21:38:06.6347887Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6347957Z args.clear() 2023-01-11T21:38:06.6348052Z with torch.cuda.device(0): 2023-01-11T21:38:06.6348254Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6348452Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6348546Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6348712Z triton_fused_sigmoid_sigmoid_1_0.run(arg0_1, arg1_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6348790Z del arg0_1 2023-01-11T21:38:06.6348858Z del arg1_1 2023-01-11T21:38:06.6348942Z return (buf0, buf1, ) 2023-01-11T21:38:06.6348947Z 2023-01-11T21:38:06.6348951Z 2023-01-11T21:38:06.6349034Z if __name__ == "__main__": 2023-01-11T21:38:06.6349154Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6349284Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6349487Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6349685Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6349807Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6349813Z 2023-01-11T21:38:06.6349817Z 2023-01-11T21:38:06.6349917Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6349987Z import torch 2023-01-11T21:38:06.6350063Z import random 2023-01-11T21:38:06.6350224Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6350351Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6350356Z 2023-01-11T21:38:06.6350441Z aten = torch.ops.aten 2023-01-11T21:38:06.6350578Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6350677Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6350682Z 2023-01-11T21:38:06.6350751Z import triton 2023-01-11T21:38:06.6350846Z import triton.language as tl 2023-01-11T21:38:06.6350972Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6351114Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6351120Z 2023-01-11T21:38:06.6351124Z 2023-01-11T21:38:06.6351306Z triton_fused_sigmoid_sigmoid_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6351383Z import triton 2023-01-11T21:38:06.6351477Z import triton.language as tl 2023-01-11T21:38:06.6351597Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6351697Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6351829Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6351956Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6351961Z 2023-01-11T21:38:06.6352398Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.6352474Z @triton.jit 2023-01-11T21:38:06.6352625Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6352702Z xnumel = 64 2023-01-11T21:38:06.6352803Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6352927Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6353012Z xmask = xindex < xnumel 2023-01-11T21:38:06.6353084Z x0 = xindex 2023-01-11T21:38:06.6353302Z tmp0 = tl.load(in_ptr0 + (x0), xmask, 
eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6353421Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6353539Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6353625Z tmp1 = tl.sigmoid(tmp0) 2023-01-11T21:38:06.6353727Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6353814Z tmp5 = tl.sigmoid(tmp4) 2023-01-11T21:38:06.6353954Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6354089Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.6354176Z ''') 2023-01-11T21:38:06.6354182Z 2023-01-11T21:38:06.6354186Z 2023-01-11T21:38:06.6354282Z async_compile.wait(globals()) 2023-01-11T21:38:06.6354360Z del async_compile 2023-01-11T21:38:06.6354365Z 2023-01-11T21:38:06.6354446Z def call(args): 2023-01-11T21:38:06.6354521Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6354599Z args.clear() 2023-01-11T21:38:06.6354693Z with torch.cuda.device(0): 2023-01-11T21:38:06.6354893Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6355090Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6355188Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6355364Z triton_fused_sigmoid_sigmoid_1_0.run(arg0_1, arg1_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6355449Z del arg0_1 2023-01-11T21:38:06.6355525Z del arg1_1 2023-01-11T21:38:06.6355630Z return (buf0, buf1, ) 2023-01-11T21:38:06.6355635Z 2023-01-11T21:38:06.6355639Z 2023-01-11T21:38:06.6355723Z if __name__ == "__main__": 2023-01-11T21:38:06.6355842Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6355971Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6356169Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6356397Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6356512Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6356517Z 2023-01-11T21:38:06.6356590Z ok (0.196s) 2023-01-11T21:38:06.6357051Z test_signbit_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6357185Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6357444Z [2023-01-11 21:35:35,371] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 815 2023-01-11T21:38:06.6357709Z [2023-01-11 21:35:35,520] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 815 2023-01-11T21:38:06.6358131Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6358270Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6358525Z [2023-01-11 21:35:35,540] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 816 2023-01-11T21:38:06.6358788Z [2023-01-11 21:35:35,687] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 816 2023-01-11T21:38:06.6358794Z 2023-01-11T21:38:06.6358895Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6358964Z import torch 2023-01-11T21:38:06.6359042Z import random 2023-01-11T21:38:06.6359169Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6359295Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6359300Z 2023-01-11T21:38:06.6359385Z aten = torch.ops.aten 2023-01-11T21:38:06.6359525Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6359649Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6359655Z 2023-01-11T21:38:06.6359725Z import triton 2023-01-11T21:38:06.6359819Z import triton.language as tl 2023-01-11T21:38:06.6359946Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6360086Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6360092Z 2023-01-11T21:38:06.6360096Z 2023-01-11T21:38:06.6360279Z triton_fused_bitwise_and_signbit_0 = async_compile.triton(''' 2023-01-11T21:38:06.6360358Z import triton 2023-01-11T21:38:06.6360452Z import triton.language as tl 2023-01-11T21:38:06.6360566Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6360666Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6360801Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6360932Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6360937Z 2023-01-11T21:38:06.6361353Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i1', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6361429Z @triton.jit 2023-01-11T21:38:06.6361576Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6361653Z xnumel = 72 2023-01-11T21:38:06.6361754Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6361879Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6361963Z xmask = xindex < xnumel 2023-01-11T21:38:06.6362035Z x0 = xindex 2023-01-11T21:38:06.6362255Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6362356Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6362512Z tmp1 = tl.libdevice.signbit(tmp0) if (tmp0).dtype is tl.float32 else tmp0 < 0 2023-01-11T21:38:06.6362617Z tmp3 = -tmp2 2023-01-11T21:38:06.6362765Z tmp4 = tl.libdevice.signbit(tmp3) if (tmp3).dtype is tl.float32 else tmp3 < 0 2023-01-11T21:38:06.6362848Z tmp5 = tmp4 == 0 2023-01-11T21:38:06.6362934Z tmp6 = tmp5.to(tl.int64) 2023-01-11T21:38:06.6363009Z tmp7 = 1 2023-01-11T21:38:06.6363092Z tmp8 = tmp6 & tmp7 2023-01-11T21:38:06.6363227Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6363363Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.6363443Z ''') 2023-01-11T21:38:06.6363450Z 2023-01-11T21:38:06.6363460Z 
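The signbit kernel just emitted is worth a note: the expression `tl.libdevice.signbit(tmp0) if (tmp0).dtype is tl.float32 else tmp0 < 0` is an ordinary Python conditional resolved while the Triton function is traced, not on the GPU, so the float32 kernel bakes in the libdevice call; the second output folds signbit(-x) == 0 into an int64 bitwise-and. A rough eager-mode rendering of what the fused kernel computes, for orientation only (our own helper name):

import torch

def signbit_pair(x: torch.Tensor):
    # out_ptr0 <- signbit(x)                         (bool)
    # out_ptr1 <- (signbit(-x) == 0).to(int64) & 1   (int64)
    out0 = torch.signbit(x)
    out1 = (torch.signbit(-x) == 0).to(torch.int64) & 1
    return out0, out1

x = torch.randn(1, 2, 6, 6)
b, i = signbit_pair(x)
assert b.dtype == torch.bool and i.dtype == torch.int64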
2023-01-11T21:38:06.6363549Z async_compile.wait(globals()) 2023-01-11T21:38:06.6363627Z del async_compile 2023-01-11T21:38:06.6363635Z 2023-01-11T21:38:06.6363713Z def call(args): 2023-01-11T21:38:06.6363790Z arg0_1, = args 2023-01-11T21:38:06.6363866Z args.clear() 2023-01-11T21:38:06.6363963Z with torch.cuda.device(0): 2023-01-11T21:38:06.6364174Z buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.6364382Z buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.6364475Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6364637Z triton_fused_bitwise_and_signbit_0.run(arg0_1, buf0, buf1, 72, grid=grid(72), stream=stream0) 2023-01-11T21:38:06.6364711Z del arg0_1 2023-01-11T21:38:06.6364796Z return (buf0, buf1, ) 2023-01-11T21:38:06.6364801Z 2023-01-11T21:38:06.6364806Z 2023-01-11T21:38:06.6364886Z if __name__ == "__main__": 2023-01-11T21:38:06.6365009Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6365139Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6365353Z arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6365467Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6365472Z 2023-01-11T21:38:06.6365476Z 2023-01-11T21:38:06.6365600Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6365707Z import torch 2023-01-11T21:38:06.6365801Z import random 2023-01-11T21:38:06.6365922Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6366047Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6366052Z 2023-01-11T21:38:06.6366137Z aten = torch.ops.aten 2023-01-11T21:38:06.6366269Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6366368Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6366373Z 2023-01-11T21:38:06.6366447Z import triton 2023-01-11T21:38:06.6366542Z import triton.language as tl 2023-01-11T21:38:06.6366668Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6366812Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6366818Z 2023-01-11T21:38:06.6366822Z 2023-01-11T21:38:06.6367004Z triton_fused_bitwise_and_signbit_0 = async_compile.triton(''' 2023-01-11T21:38:06.6367081Z import triton 2023-01-11T21:38:06.6367171Z import triton.language as tl 2023-01-11T21:38:06.6367289Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6367392Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6367524Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6367652Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6367657Z 2023-01-11T21:38:06.6368067Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*i1', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6368224Z @triton.jit 2023-01-11T21:38:06.6368368Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6368437Z xnumel = 72 2023-01-11T21:38:06.6368537Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6368666Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6368754Z xmask = xindex < xnumel 2023-01-11T21:38:06.6368828Z x0 = xindex 2023-01-11T21:38:06.6369046Z tmp0 = tl.load(in_ptr0 + (x0), 
xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6369167Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6369314Z tmp1 = tl.libdevice.signbit(tmp0) if (tmp0).dtype is tl.float32 else tmp0 < 0 2023-01-11T21:38:06.6369417Z tmp3 = -tmp2 2023-01-11T21:38:06.6369569Z tmp4 = tl.libdevice.signbit(tmp3) if (tmp3).dtype is tl.float32 else tmp3 < 0 2023-01-11T21:38:06.6369647Z tmp5 = tmp4 == 0 2023-01-11T21:38:06.6369735Z tmp6 = tmp5.to(tl.int64) 2023-01-11T21:38:06.6369811Z tmp7 = 1 2023-01-11T21:38:06.6369891Z tmp8 = tmp6 & tmp7 2023-01-11T21:38:06.6370019Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6370153Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.6370239Z ''') 2023-01-11T21:38:06.6370244Z 2023-01-11T21:38:06.6370251Z 2023-01-11T21:38:06.6370350Z async_compile.wait(globals()) 2023-01-11T21:38:06.6370427Z del async_compile 2023-01-11T21:38:06.6370432Z 2023-01-11T21:38:06.6370509Z def call(args): 2023-01-11T21:38:06.6370583Z arg0_1, = args 2023-01-11T21:38:06.6370662Z args.clear() 2023-01-11T21:38:06.6370749Z with torch.cuda.device(0): 2023-01-11T21:38:06.6370956Z buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.6371166Z buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.6371259Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6371423Z triton_fused_bitwise_and_signbit_0.run(arg0_1, buf0, buf1, 72, grid=grid(72), stream=stream0) 2023-01-11T21:38:06.6371499Z del arg0_1 2023-01-11T21:38:06.6371585Z return (buf0, buf1, ) 2023-01-11T21:38:06.6371590Z 2023-01-11T21:38:06.6371595Z 2023-01-11T21:38:06.6371670Z if __name__ == "__main__": 2023-01-11T21:38:06.6371815Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6371945Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6372159Z arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6372274Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6372280Z 2023-01-11T21:38:06.6372352Z ok (0.337s) 2023-01-11T21:38:06.6372813Z test_silu_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6372950Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6373210Z [2023-01-11 21:35:35,703] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 817 2023-01-11T21:38:06.6373478Z [2023-01-11 21:35:35,773] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 817 2023-01-11T21:38:06.6373886Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6374019Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6374300Z [2023-01-11 21:35:35,823] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 818 2023-01-11T21:38:06.6374671Z [2023-01-11 21:35:35,897] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 818 2023-01-11T21:38:06.6374677Z 2023-01-11T21:38:06.6374780Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6374857Z import torch 2023-01-11T21:38:06.6374936Z import random 2023-01-11T21:38:06.6375059Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6375186Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6375191Z 2023-01-11T21:38:06.6375268Z aten = torch.ops.aten 2023-01-11T21:38:06.6375408Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6375505Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6375510Z 2023-01-11T21:38:06.6375586Z import triton 2023-01-11T21:38:06.6375686Z import triton.language as tl 2023-01-11T21:38:06.6375818Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6375958Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6375963Z 2023-01-11T21:38:06.6375968Z 2023-01-11T21:38:06.6376124Z triton_fused_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.6376194Z import triton 2023-01-11T21:38:06.6376290Z import triton.language as tl 2023-01-11T21:38:06.6376406Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6376513Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6376647Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6376775Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6376780Z 2023-01-11T21:38:06.6377237Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6377317Z @triton.jit 2023-01-11T21:38:06.6377445Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6377521Z xnumel = 64 2023-01-11T21:38:06.6377621Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6377751Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6377881Z xmask = xindex < xnumel 2023-01-11T21:38:06.6377955Z x0 = xindex 2023-01-11T21:38:06.6378054Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6378135Z tmp1 = tl.sigmoid(tmp0) 2023-01-11T21:38:06.6378216Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6378350Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.6378439Z ''') 2023-01-11T21:38:06.6378445Z 2023-01-11T21:38:06.6378449Z 2023-01-11T21:38:06.6378546Z async_compile.wait(globals()) 2023-01-11T21:38:06.6378624Z del async_compile 2023-01-11T21:38:06.6378629Z 2023-01-11T21:38:06.6378707Z def call(args): 2023-01-11T21:38:06.6378779Z arg0_1, = args 2023-01-11T21:38:06.6378857Z args.clear() 2023-01-11T21:38:06.6378951Z with torch.cuda.device(0): 2023-01-11T21:38:06.6379152Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6379248Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6379388Z triton_fused_mul_0.run(arg0_1, buf0, 64, grid=grid(64), 
stream=stream0) 2023-01-11T21:38:06.6379463Z del arg0_1 2023-01-11T21:38:06.6379536Z return (buf0, ) 2023-01-11T21:38:06.6379548Z 2023-01-11T21:38:06.6379553Z 2023-01-11T21:38:06.6379628Z if __name__ == "__main__": 2023-01-11T21:38:06.6379747Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6379878Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6380079Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6380194Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6380199Z 2023-01-11T21:38:06.6380237Z 2023-01-11T21:38:06.6380337Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6380413Z import torch 2023-01-11T21:38:06.6380493Z import random 2023-01-11T21:38:06.6380607Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6380733Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6380738Z 2023-01-11T21:38:06.6380824Z aten = torch.ops.aten 2023-01-11T21:38:06.6380965Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6381067Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6381073Z 2023-01-11T21:38:06.6381149Z import triton 2023-01-11T21:38:06.6381245Z import triton.language as tl 2023-01-11T21:38:06.6381367Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6381508Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6381513Z 2023-01-11T21:38:06.6381517Z 2023-01-11T21:38:06.6381706Z triton_fused_convert_element_type_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6381786Z import triton 2023-01-11T21:38:06.6381881Z import triton.language as tl 2023-01-11T21:38:06.6381999Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6382103Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6382238Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6382361Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6382366Z 2023-01-11T21:38:06.6382773Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6382851Z @triton.jit 2023-01-11T21:38:06.6382985Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6383059Z xnumel = 64 2023-01-11T21:38:06.6383156Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6383287Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6383376Z xmask = xindex < xnumel 2023-01-11T21:38:06.6383443Z x0 = xindex 2023-01-11T21:38:06.6383562Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6383655Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.6383742Z tmp2 = tl.sigmoid(tmp1) 2023-01-11T21:38:06.6383850Z tmp3 = tmp1 * tmp2 2023-01-11T21:38:06.6383940Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.6384081Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6384163Z ''') 2023-01-11T21:38:06.6384168Z 2023-01-11T21:38:06.6384172Z 2023-01-11T21:38:06.6384266Z async_compile.wait(globals()) 2023-01-11T21:38:06.6384343Z del async_compile 2023-01-11T21:38:06.6384348Z 2023-01-11T21:38:06.6384426Z def call(args): 2023-01-11T21:38:06.6384500Z arg0_1, = args 2023-01-11T21:38:06.6384576Z args.clear() 2023-01-11T21:38:06.6384670Z with torch.cuda.device(0): 
2023-01-11T21:38:06.6384865Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6384962Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6385124Z triton_fused_convert_element_type_1_0.run(arg0_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6385200Z del arg0_1 2023-01-11T21:38:06.6385278Z return (buf0, ) 2023-01-11T21:38:06.6385286Z 2023-01-11T21:38:06.6385290Z 2023-01-11T21:38:06.6385372Z if __name__ == "__main__": 2023-01-11T21:38:06.6385495Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6385630Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6385854Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6385977Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6385982Z 2023-01-11T21:38:06.6386054Z ok (0.210s) 2023-01-11T21:38:06.6386504Z test_simplify_dims (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6386665Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6386923Z [2023-01-11 21:35:35,909] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 819 2023-01-11T21:38:06.6387190Z [2023-01-11 21:35:35,998] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 819 2023-01-11T21:38:06.6387602Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:36,010] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 820
[2023-01-11 21:35:36,088] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 820

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 720
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 30
    x1 = (xindex // 30) % 4
    x2 = (xindex // 120)
    x3 = xindex
    tmp0 = tl.load(in_ptr0 + (x0 + (60*x1) + (300*x2)), xmask)
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 3, 4, 5, 6), (360, 120, 30, 6, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, buf0, 720, grid=grid(720), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 3, 4, 5, 6), (900, 300, 60, 6, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 720
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 30
    x1 = (xindex // 30) % 4
    x2 = (xindex // 120)
    x3 = xindex
    tmp0 = tl.load(in_ptr0 + (x0 + (60*x1) + (300*x2)), xmask).to(tl.float32)
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 3, 4, 5, 6), (360, 120, 30, 6, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, buf0, 720, grid=grid(720), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 3, 4, 5, 6), (900, 300, 60, 6, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.191s)
test_simplify_loops_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:36,101] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 821
[2023-01-11 21:35:36,186] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 821
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:36,199] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 822
[2023-01-11 21:35:36,278] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 822

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 720
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x3 = xindex
    x0 = xindex % 30
    x1 = (xindex // 30) % 4
    x2 = (xindex // 120)
    tmp0 = tl.load(in_ptr0 + (x3), xmask)
    tmp1 = tl.load(in_ptr1 + (x0 + (30*x2) + (180*x1)), xmask)
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 3, 4, 5, 6), (360, 120, 30, 6, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, arg1_1, buf0, 720, grid=grid(720), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 3, 4, 5, 6), (360, 120, 30, 6, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((2, 3, 4, 5, 6), (90, 30, 180, 6, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 720
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x3 = xindex
    x0 = xindex % 30
    x1 = (xindex // 30) % 4
    x2 = (xindex // 120)
    tmp0 = tl.load(in_ptr0 + (x3), xmask).to(tl.float32)
    tmp1 = tl.load(in_ptr1 + (x0 + (30*x2) + (180*x1)), xmask).to(tl.float32)
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 3, 4, 5, 6), (360, 120, 30, 6, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, arg1_1, buf0, 720, grid=grid(720), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 3, 4, 5, 6), (360, 120, 30, 6, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((2, 3, 4, 5, 6), (90, 30, 180, 6, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.190s)
test_sin_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:36,309] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 823
[2023-01-11 21:35:36,388] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 823
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:36,418] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 824
[2023-01-11 21:35:36,499] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 824

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_sin_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp4 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.sin(tmp0)
    tmp2 = 2
    tmp3 = tmp1 + tmp2
    tmp5 = 1
    tmp6 = tmp4 + tmp5
    tmp7 = tl.sin(tmp6)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_sin_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_sin_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp4 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp1 = tl.sin(tmp0)
    tmp2 = 2
    tmp3 = tmp1 + tmp2
    tmp5 = 1
    tmp6 = tmp4 + tmp5
    tmp7 = tl.sin(tmp6)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_sin_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.221s)
test_sink_cat_after_pointwise (__main__.CudaTests) ... ok (0.004s)
test_sizehint_issue1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:36,656] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 825
[2023-01-11 21:35:36,795] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 825
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:36,980] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 826
[2023-01-11 21:35:37,099] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 826

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused__unsafe_view_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[262144], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 150528
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 196
    x1 = (xindex // 196) % 384
    x2 = (xindex // 75264)
    x4 = xindex
    tmp0 = 4*(x0 // 14)
    tmp1 = (x1 // 4) % 4
    tmp2 = tmp0 + tmp1
    tmp3 = 4*(x0 % 14)
    tmp4 = x1 % 4
    tmp5 = tmp3 + tmp4
    tmp6 = tl.load(in_ptr0 + (tmp5 + (56*tmp2) + (3136*(x1 // 16)) + (75264*x2)), xmask)
    tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 384, 196), (75264, 196, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused__unsafe_view_0.run(arg0_1, buf0, 150528, grid=grid(150528), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 24, 56, 56), (75264, 3136, 56, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_convert_element_type_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[262144], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 150528
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 196
    x1 = (xindex // 196) % 384
    x2 = (xindex // 75264)
    x4 = xindex
    tmp0 = 4*(x0 // 14)
    tmp1 = (x1 // 4) % 4
    tmp2 = tmp0 + tmp1
    tmp3 = 4*(x0 % 14)
    tmp4 = x1 % 4
    tmp5 = tmp3 + tmp4
    tmp6 = tl.load(in_ptr0 + (tmp5 + (56*tmp2) + (3136*(x1 // 16)) + (75264*x2)), xmask).to(tl.float32)
    tmp7 = tmp6.to(tl.float32)
    tmp8 = tmp7.to(tl.float32)
    tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 384, 196), (75264, 196, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_convert_element_type_1_0.run(arg0_1, buf0, 150528, grid=grid(150528), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 24, 56, 56), (75264, 3136, 56, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.596s)
test_slice1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:37,140] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 827
[2023-01-11 21:35:37,230] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 827
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:37,272] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 828
[2023-01-11 21:35:37,356] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 828

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_add_3_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 20
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 10
    x1 = (xindex // 10)
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + ((2*x0) + (40*x1)), xmask, eviction_policy='evict_last')
    tmp1 = tl.load(in_ptr0 + (20 + (2*x0) + (40*x1)), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + ((2*x0) + (40*x1)), xmask)
    tmp6 = tl.load(in_ptr0 + (20 + (2*x0) + (40*x1)), xmask)
    tmp2 = tmp0 + tmp1
    tmp4 = 1
    tmp5 = tmp3 + tmp4
    tmp7 = tmp6 + tmp4
    tmp8 = tmp5 + tmp7
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 10), (10, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((2, 10), (10, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_add_3_0.run(arg0_1, buf0, buf1, 20, grid=grid(20), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 20, 2), (40, 2, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_add_3_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 20
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 10
    x1 = (xindex // 10)
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + ((2*x0) + (40*x1)), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp1 = tl.load(in_ptr0 + (20 + (2*x0) + (40*x1)), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp3 = tl.load(in_ptr0 + ((2*x0) + (40*x1)), xmask).to(tl.float32)
    tmp6 = tl.load(in_ptr0 + (20 + (2*x0) + (40*x1)), xmask).to(tl.float32)
    tmp2 = tmp0 + tmp1
    tmp4 = 1
    tmp5 = tmp3 + tmp4
    tmp7 = tmp6 + tmp4
    tmp8 = tmp5 + tmp7
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 10), (10, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((2, 10), (10, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_add_3_0.run(arg0_1, buf0, buf1, 20, grid=grid(20), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 20, 2), (40, 2, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.257s)
test_slice2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:37,398] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 829
[2023-01-11 21:35:37,482] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 829
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:37,523] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 830
[2023-01-11 21:35:37,605] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 830

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_add_3_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (1 + (4*x0)), xmask, eviction_policy='evict_last')
    tmp1 = tl.load(in_ptr0 + (42 + (4*x0)), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + (1 + (4*x0)), xmask)
    tmp6 = tl.load(in_ptr0 + (42 + (4*x0)), xmask)
    tmp2 = tmp0 + tmp1
    tmp4 = 1
    tmp5 = tmp3 + tmp4
    tmp7 = 2
    tmp8 = tmp6 + tmp7
    tmp9 = tmp5 + tmp8
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 10), (10, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((1, 10), (10, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_add_3_0.run(arg0_1, buf0, buf1, 10, grid=grid(10), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 20, 2), (40, 2, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_add_3_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (1 + (4*x0)), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp1 = tl.load(in_ptr0 + (42 + (4*x0)), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp3 = tl.load(in_ptr0 + (1 + (4*x0)), xmask).to(tl.float32)
    tmp6 = tl.load(in_ptr0 + (42 + (4*x0)), xmask).to(tl.float32)
    tmp2 = tmp0 + tmp1
    tmp4 = 1
    tmp5 = tmp3 + tmp4
    tmp7 = 2
    tmp8 = tmp6 + tmp7
    tmp9 = tmp5 + tmp8
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 10), (10, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((1, 10), (10, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_add_3_0.run(arg0_1, buf0, buf1, 10, grid=grid(10), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 20, 2), (40, 2, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.249s)
test_slice_mutation1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:37,653] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 831
[2023-01-11 21:35:37,751] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 831
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:37,800] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 832

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_zeros_like_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 0
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


triton_fused_fill__lift_fresh_copy_select_slice_1_zeros_like_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 3.0
    tl.store(out_ptr0 + (3 + (8*x0) + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_clone_2 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_fill__1_lift_fresh_copy_1_select_1_slice_2_zeros_like_3 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 4.0
    tl.store(out_ptr0 + (32 + x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_add_1_4 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_zeros_like_0.run(buf0, buf1, 64, grid=grid(64), stream=stream0)
        triton_fused_fill__lift_fresh_copy_select_slice_1_zeros_like_1.run(buf0, 8, grid=grid(8), stream=stream0)
        buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        triton_fused_clone_2.run(buf0, buf3, 64, grid=grid(64), stream=stream0)
        triton_fused_fill__1_lift_fresh_copy_1_select_1_slice_2_zeros_like_3.run(buf0, 8, grid=grid(8), stream=stream0)
        buf5 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        triton_fused_add_1_4.run(buf0, buf5, 64, grid=grid(64), stream=stream0)
        return (buf0, buf1, buf3, buf5, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

[2023-01-11 21:35:37,899] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 832

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_zeros_like_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 0
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


triton_fused_fill__lift_fresh_copy_select_slice_1_zeros_like_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.6500491Z @triton.jit 2023-01-11T21:38:06.6500614Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6500687Z xnumel = 8 2023-01-11T21:38:06.6500785Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6500918Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6501003Z xmask = xindex < xnumel 2023-01-11T21:38:06.6501079Z x0 = xindex 2023-01-11T21:38:06.6501147Z tmp0 = 3.0 2023-01-11T21:38:06.6501287Z tl.store(out_ptr0 + (3 + (8*x0) + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6501374Z ''') 2023-01-11T21:38:06.6501379Z 2023-01-11T21:38:06.6501384Z 2023-01-11T21:38:06.6501542Z triton_fused_clone_2 = async_compile.triton(''' 2023-01-11T21:38:06.6501617Z import triton 2023-01-11T21:38:06.6501712Z import triton.language as tl 2023-01-11T21:38:06.6501827Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6501923Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6502057Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6502187Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6502192Z 2023-01-11T21:38:06.6502597Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6502673Z @triton.jit 2023-01-11T21:38:06.6502808Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6502881Z xnumel = 64 2023-01-11T21:38:06.6502980Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6503104Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6503189Z xmask = xindex < xnumel 2023-01-11T21:38:06.6503260Z x0 = xindex 2023-01-11T21:38:06.6503476Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6503610Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6503697Z ''') 2023-01-11T21:38:06.6503703Z 2023-01-11T21:38:06.6503707Z 2023-01-11T21:38:06.6503938Z triton_fused_fill__1_lift_fresh_copy_1_select_1_slice_2_zeros_like_3 = async_compile.triton(''' 2023-01-11T21:38:06.6504014Z import triton 2023-01-11T21:38:06.6504130Z import triton.language as tl 2023-01-11T21:38:06.6504249Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6504352Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6504483Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6504608Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6504614Z 2023-01-11T21:38:06.6505009Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.6505086Z @triton.jit 2023-01-11T21:38:06.6505213Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6505280Z xnumel = 8 2023-01-11T21:38:06.6505381Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6505510Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 
2023-01-11T21:38:06.6505607Z xmask = xindex < xnumel 2023-01-11T21:38:06.6505694Z x0 = xindex 2023-01-11T21:38:06.6505777Z tmp0 = 4.0 2023-01-11T21:38:06.6505934Z tl.store(out_ptr0 + (32 + x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6506014Z ''') 2023-01-11T21:38:06.6506019Z 2023-01-11T21:38:06.6506024Z 2023-01-11T21:38:06.6506183Z triton_fused_add_1_4 = async_compile.triton(''' 2023-01-11T21:38:06.6506259Z import triton 2023-01-11T21:38:06.6506353Z import triton.language as tl 2023-01-11T21:38:06.6506469Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6506572Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6506705Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6506853Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6506865Z 2023-01-11T21:38:06.6507262Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6507336Z @triton.jit 2023-01-11T21:38:06.6507468Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6507541Z xnumel = 64 2023-01-11T21:38:06.6507638Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6507768Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6507852Z xmask = xindex < xnumel 2023-01-11T21:38:06.6507918Z x0 = xindex 2023-01-11T21:38:06.6508132Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6508204Z tmp1 = 1 2023-01-11T21:38:06.6508289Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6508424Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.6508511Z ''') 2023-01-11T21:38:06.6508517Z 2023-01-11T21:38:06.6508521Z 2023-01-11T21:38:06.6508614Z async_compile.wait(globals()) 2023-01-11T21:38:06.6508694Z del async_compile 2023-01-11T21:38:06.6508699Z 2023-01-11T21:38:06.6508772Z def call(args): 2023-01-11T21:38:06.6508845Z arg0_1, = args 2023-01-11T21:38:06.6508920Z args.clear() 2023-01-11T21:38:06.6509015Z with torch.cuda.device(0): 2023-01-11T21:38:06.6509214Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6509415Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6509509Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6509650Z triton_fused_add_zeros_like_0.run(buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6509831Z triton_fused_fill__lift_fresh_copy_select_slice_1_zeros_like_1.run(buf0, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6510033Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6510167Z triton_fused_clone_2.run(buf0, buf3, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6510372Z triton_fused_fill__1_lift_fresh_copy_1_select_1_slice_2_zeros_like_3.run(buf0, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6510572Z buf5 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6510706Z triton_fused_add_1_4.run(buf0, buf5, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6510805Z return (buf0, buf1, buf3, buf5, ) 2023-01-11T21:38:06.6510811Z 2023-01-11T21:38:06.6510815Z 2023-01-11T21:38:06.6510896Z if __name__ == "__main__": 2023-01-11T21:38:06.6511010Z from torch._dynamo.testing import 
rand_strided 2023-01-11T21:38:06.6511137Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6511339Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6511453Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6511458Z 2023-01-11T21:38:06.6511533Z ok (0.294s) 2023-01-11T21:38:06.6511870Z test_slice_mutation2_cuda (__main__.CudaTests) ... [2023-01-11 21:35:37,935] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 833 2023-01-11T21:38:06.6512117Z [2023-01-11 21:35:38,027] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to input mutation 2023-01-11T21:38:06.6512384Z [2023-01-11 21:35:38,027] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 833 2023-01-11T21:38:06.6512391Z 2023-01-11T21:38:06.6512484Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6512561Z import torch 2023-01-11T21:38:06.6512637Z import random 2023-01-11T21:38:06.6512757Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6512882Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6512919Z 2023-01-11T21:38:06.6513004Z aten = torch.ops.aten 2023-01-11T21:38:06.6513143Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6513241Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6513246Z 2023-01-11T21:38:06.6513316Z import triton 2023-01-11T21:38:06.6513418Z import triton.language as tl 2023-01-11T21:38:06.6513547Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6513687Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6513692Z 2023-01-11T21:38:06.6513697Z 2023-01-11T21:38:06.6513879Z triton_fused_add_slice_1_slice_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6513954Z import triton 2023-01-11T21:38:06.6514048Z import triton.language as tl 2023-01-11T21:38:06.6514157Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6514259Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6514392Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6514521Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6514526Z 2023-01-11T21:38:06.6514929Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6515004Z @triton.jit 2023-01-11T21:38:06.6515137Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6515212Z xnumel = 20 2023-01-11T21:38:06.6515304Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6515435Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6515519Z xmask = xindex < xnumel 2023-01-11T21:38:06.6515592Z x0 = xindex 2023-01-11T21:38:06.6515794Z tmp0 = tl.load(in_ptr0 + (20 + x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6515868Z tmp1 = 1 2023-01-11T21:38:06.6515954Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6516083Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.6516168Z ''') 2023-01-11T21:38:06.6516174Z 2023-01-11T21:38:06.6516178Z 2023-01-11T21:38:06.6516394Z triton_fused_add_copy__slice_1_slice_2_slice_3_slice_4_1 = async_compile.triton(''' 2023-01-11T21:38:06.6516496Z import triton 2023-01-11T21:38:06.6516593Z import 
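A note on the recurring UserWarning above: tensor.storage() now returns the deprecated TypedStorage wrapper, and the warning points at the storage-cloning helper on line 246 of test_torchinductor.py. A minimal sketch of the replacement the warning suggests, assuming x owns its storage; the helper name is hypothetical:

import torch

def clone_storage(x: torch.Tensor) -> torch.Tensor:
    # untyped_storage().size() counts bytes, so convert to an element count
    # before viewing the whole storage at x's dtype and cloning it.
    numel = x.untyped_storage().size() // x.element_size()
    return torch.as_strided(x, (numel,), (1,), 0).clone()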
test_slice_mutation2_cuda (__main__.CudaTests) ... [2023-01-11 21:35:37,935] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 833
[2023-01-11 21:35:38,027] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to input mutation
[2023-01-11 21:35:38,027] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 833

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_slice_1_slice_2_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 20
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (20 + x0), xmask, eviction_policy='evict_last')
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


triton_fused_add_copy__slice_1_slice_2_slice_3_slice_4_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 20
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (20 + x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_add_1_slice_5_slice_6_2 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 9
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (1 + x0), xmask, eviction_policy='evict_last')
    tmp1 = 2
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


triton_fused_add_1_copy__1_slice_5_slice_6_slice_7_slice_8_3 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 9
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (2 + x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 20), (20, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_slice_1_slice_2_0.run(arg0_1, buf0, 20, grid=grid(20), stream=stream0)
        triton_fused_add_copy__slice_1_slice_2_slice_3_slice_4_1.run(buf0, arg0_1, 20, grid=grid(20), stream=stream0)
        del buf0
        buf2 = empty_strided((1, 9), (9, 1), device='cuda', dtype=torch.float32)
        triton_fused_add_1_slice_5_slice_6_2.run(arg0_1, buf2, 9, grid=grid(9), stream=stream0)
        triton_fused_add_1_copy__1_slice_5_slice_6_slice_7_slice_8_3.run(buf2, arg0_1, 9, grid=grid(9), stream=stream0)
        del arg0_1
        return ()


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 64), (64, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

ok (0.126s)
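The "[WARNING] skipping cudagraphs due to input mutation" line above is about graph 833 writing back into its own input: the two copy_ kernels store into arg0_1 and call() returns (). A rough reconstruction of the kind of compiled function involved, inferred from the kernel offsets (the actual test body lives in test_torchinductor.py and is an assumption here):

import torch

def fn(x):  # x: (1, 64); both statements mutate the graph input in place
    x[0, 20:40] = x[0, 20:40] + 1   # kernels ..._0 and ..._1
    x[0, 2:11] = x[0, 1:10] + 2     # kernels ..._2 and ..._3

CUDA graph capture needs stable input buffers, so inductor falls back to a plain stream launch here.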
test_slice_scatter2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:38,052] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 834
[2023-01-11 21:35:38,120] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 834
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:38,139] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 835
[2023-01-11 21:35:38,209] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 835

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_slice_scatter_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 605184
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 197, 384), (75648, 384, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_slice_scatter_0.run(arg1_1, buf0, 605184, grid=grid(605184), stream=stream0)
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 197, 384), (75648, 384, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, 197, 384), (75648, 384, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_slice_scatter_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 605184
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 197, 384), (75648, 384, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_slice_scatter_0.run(arg1_1, buf0, 605184, grid=grid(605184), stream=stream0)
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 197, 384), (75648, 384, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((8, 197, 384), (75648, 384, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.183s)
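In both dumps above the slice_scatter collapses into one copy kernel: xnumel = 605184 = 8 * 197 * 384, so the scattered slice covers the whole destination and only the source tensor is ever read. A small sketch of that degenerate case using the public torch.slice_scatter (the test's exact call is an assumption):

import torch

a = torch.randn(8, 197, 384, device="cuda")
b = torch.randn(8, 197, 384, device="cuda")
# a slice spanning the full dim makes slice_scatter an outright copy of b
out = torch.slice_scatter(a, b, dim=0, start=0, end=8)
assert torch.equal(out, b)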
test_slice_scatter_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:38,243] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 836
[2023-01-11 21:35:38,358] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 836
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:38,391] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 837
[2023-01-11 21:35:38,500] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 837

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_slice_scatter_slice_scatter_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 3200
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 100
    x1 = (xindex // 100)
    x2 = xindex
    tmp8 = tl.load(in_ptr1 + (x2), xmask, eviction_policy='evict_last')
    tmp16 = tl.load(in_ptr1 + (x2), xmask)
    tmp0 = x0
    tmp1 = 10
    tmp2 = tmp0 >= tmp1
    tmp3 = 90
    tmp4 = tmp0 < tmp3
    tmp5 = tmp2 & tmp4
    tmp6 = tl.load(in_ptr0 + ((-10) + x0 + (80*x1) + tl.zeros([XBLOCK], tl.int32)), tmp5 & xmask, eviction_policy='evict_last', other=0)
    tmp7 = tl.where(tmp5, tmp6, 0.0)
    tmp9 = tl.where(tmp5, tmp7, tmp8)
    tmp10 = x0 % 2
    tmp11 = 0
    tmp12 = tmp10 == tmp11
    tmp13 = tmp5 & tmp12
    tmp14 = tl.load(in_ptr0 + ((-5) + (80*x1) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp13 & xmask, other=0)
    tmp15 = tl.where(tmp13, tmp14, 0.0)
    tmp17 = tl.where(tmp13, tmp15, tmp16)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask)
    tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((4, 8, 100), (800, 100, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((4, 8, 100), (800, 100, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_slice_scatter_slice_scatter_1_0.run(arg1_1, arg0_1, buf0, buf1, 3200, grid=grid(3200), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((4, 8, 100), (800, 100, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((4, 8, 80), (640, 80, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_slice_scatter_slice_scatter_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 3200
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 100
    x1 = (xindex // 100)
    x2 = xindex
    tmp8 = tl.load(in_ptr1 + (x2), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp16 = tl.load(in_ptr1 + (x2), xmask).to(tl.float32)
    tmp0 = x0
    tmp1 = 10
    tmp2 = tmp0 >= tmp1
    tmp3 = 90
    tmp4 = tmp0 < tmp3
    tmp5 = tmp2 & tmp4
    tmp6 = tl.load(in_ptr0 + ((-10) + x0 + (80*x1) + tl.zeros([XBLOCK], tl.int32)), tmp5 & xmask, eviction_policy='evict_last', other=0).to(tl.float32)
    tmp7 = tl.where(tmp5, tmp6, 0.0)
    tmp9 = tl.where(tmp5, tmp7, tmp8)
    tmp10 = x0 % 2
    tmp11 = 0
    tmp12 = tmp10 == tmp11
    tmp13 = tmp5 & tmp12
    tmp14 = tl.load(in_ptr0 + ((-5) + (80*x1) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp13 & xmask, other=0).to(tl.float32)
    tmp15 = tl.where(tmp13, tmp14, 0.0)
    tmp17 = tl.where(tmp13, tmp15, tmp16)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask)
    tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((4, 8, 100), (800, 100, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((4, 8, 100), (800, 100, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_slice_scatter_slice_scatter_1_0.run(arg1_1, arg0_1, buf0, buf1, 3200, grid=grid(3200), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((4, 8, 100), (800, 100, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((4, 8, 80), (640, 80, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.291s)
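The fused kernel above shows the predicated lowering of an interior slice_scatter: rather than copying dst and overwriting a slice, each output element is computed as where(index in slice, src, dst), with tmp5 encoding 10 <= x0 < 90 on the last dimension for the first output. A sketch of that equivalence in eager terms (shapes taken from the harness above; the where/pad formulation is illustrative, not inductor's code):

import torch
import torch.nn.functional as F

dst = torch.randn(4, 8, 100)
src = torch.randn(4, 8, 80)
ref = torch.slice_scatter(dst, src, dim=2, start=10, end=90)
# predicated form: select src where the last-dim index falls in [10, 90)
idx = torch.arange(100)
alt = torch.where((idx >= 10) & (idx < 90), F.pad(src, (10, 10)), dst)
assert torch.equal(ref, alt)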
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6566755Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6567013Z [2023-01-11 21:35:38,851] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 839 2023-01-11T21:38:06.6567257Z [2023-01-11 21:35:38,899] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.6567458Z [2023-01-11 21:35:38,899] torch._inductor.scheduler: [DEBUG] remove_buffer('buf7') 2023-01-11T21:38:06.6567659Z [2023-01-11 21:35:38,899] torch._inductor.scheduler: [DEBUG] remove_buffer('buf6') 2023-01-11T21:38:06.6567857Z [2023-01-11 21:35:38,899] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.6567862Z 2023-01-11T21:38:06.6567959Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6568027Z import torch 2023-01-11T21:38:06.6568100Z import random 2023-01-11T21:38:06.6568219Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6568345Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6568350Z 2023-01-11T21:38:06.6568433Z aten = torch.ops.aten 2023-01-11T21:38:06.6568570Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6568669Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6568675Z 2023-01-11T21:38:06.6568748Z import triton 2023-01-11T21:38:06.6568834Z import triton.language as tl 2023-01-11T21:38:06.6568958Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6569097Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6569105Z 2023-01-11T21:38:06.6569110Z 2023-01-11T21:38:06.6569296Z triton_fused_amax_1_exp_1_sub_1_sum_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6569369Z import triton 2023-01-11T21:38:06.6569461Z import triton.language as tl 2023-01-11T21:38:06.6569575Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6569678Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6569803Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6569929Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6569934Z 2023-01-11T21:38:06.6570022Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.6570144Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.6570228Z filename=__file__, 2023-01-11T21:38:06.6570633Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6570709Z @triton.jit 2023-01-11T21:38:06.6570885Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6570951Z xnumel = 8 2023-01-11T21:38:06.6571022Z rnumel = 8 2023-01-11T21:38:06.6571118Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6571254Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6571338Z xmask = xindex < xnumel 2023-01-11T21:38:06.6571457Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6571528Z x0 = xindex 2023-01-11T21:38:06.6571709Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.6571813Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6571902Z rindex = roffset + rbase 2023-01-11T21:38:06.6571986Z rmask = rindex < rnumel 
2023-01-11T21:38:06.6572057Z r1 = rindex 2023-01-11T21:38:06.6572279Z tmp0 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6572408Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.6572516Z tmp1 = tl.reshape(tl.max(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6572612Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.6572731Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6572837Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6572925Z rindex = roffset + rbase 2023-01-11T21:38:06.6573011Z rmask = rindex < rnumel 2023-01-11T21:38:06.6573084Z r1 = rindex 2023-01-11T21:38:06.6573233Z tmp2 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask) 2023-01-11T21:38:06.6573348Z tmp3 = tmp2 - tmp1 2023-01-11T21:38:06.6573432Z tmp4 = tl.exp(tmp3) 2023-01-11T21:38:06.6573556Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.6573675Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6573772Z tl.store(out_ptr1 + x0, tmp5, xmask) 2023-01-11T21:38:06.6573856Z ''') 2023-01-11T21:38:06.6573862Z 2023-01-11T21:38:06.6573867Z 2023-01-11T21:38:06.6574089Z triton_fused_add_amax_amax_2_div_div_1_div_2_exp_exp_2_sub_sub_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.6574163Z import triton 2023-01-11T21:38:06.6574258Z import triton.language as tl 2023-01-11T21:38:06.6574371Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6574648Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6574784Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6574913Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6574922Z 2023-01-11T21:38:06.6575010Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.6575119Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6575202Z filename=__file__, 2023-01-11T21:38:06.6575647Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: 'i32', 8: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.6575723Z @triton.jit 2023-01-11T21:38:06.6575933Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr4, out_ptr5, out_ptr6, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6576006Z xnumel = 8 2023-01-11T21:38:06.6576077Z rnumel = 8 2023-01-11T21:38:06.6576173Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6576301Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6576385Z xmask = xindex < xnumel 2023-01-11T21:38:06.6576501Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6576571Z x0 = xindex 2023-01-11T21:38:06.6576756Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.6576906Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6576996Z rindex = roffset + rbase 2023-01-11T21:38:06.6577074Z rmask = rindex < rnumel 2023-01-11T21:38:06.6577201Z r1 = rindex 2023-01-11T21:38:06.6577424Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6577639Z tmp1 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6577720Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6577851Z _tmp3 = tl.where(xmask & rmask & (_tmp3 < tmp2), tmp2, _tmp3) 
2023-01-11T21:38:06.6577965Z tmp3 = tl.reshape(tl.max(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6578078Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6578265Z _tmp10 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.6578369Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6578457Z rindex = roffset + rbase 2023-01-11T21:38:06.6578547Z rmask = rindex < rnumel 2023-01-11T21:38:06.6578619Z r1 = rindex 2023-01-11T21:38:06.6578834Z tmp4 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6579050Z tmp5 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6579126Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.6579241Z tmp7 = tmp6 - tmp3 2023-01-11T21:38:06.6579323Z tmp8 = tl.exp(tmp7) 2023-01-11T21:38:06.6579441Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.6579574Z _tmp10 = tl.where(xmask & rmask & (_tmp10 < tmp5), tmp5, _tmp10) 2023-01-11T21:38:06.6579731Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6579844Z tmp10 = tl.reshape(tl.max(_tmp10, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6579955Z _tmp14 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6580062Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6580151Z rindex = roffset + rbase 2023-01-11T21:38:06.6580237Z rmask = rindex < rnumel 2023-01-11T21:38:06.6580309Z r1 = rindex 2023-01-11T21:38:06.6580526Z tmp11 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6580742Z tmp15 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6580852Z tmp20 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.6580954Z tmp21 = tl.load(in_ptr2 + (r1), rmask) 2023-01-11T21:38:06.6581058Z tmp24 = tl.load(in_ptr3 + (r1), rmask) 2023-01-11T21:38:06.6581182Z tmp12 = tmp11 - tmp10 2023-01-11T21:38:06.6581268Z tmp13 = tl.exp(tmp12) 2023-01-11T21:38:06.6581390Z _tmp14 = tl.where(xmask & rmask, _tmp14 + tmp13, _tmp14) 2023-01-11T21:38:06.6581473Z tmp16 = tmp15 + tmp11 2023-01-11T21:38:06.6581581Z tmp17 = tmp16 - tmp3 2023-01-11T21:38:06.6581668Z tmp18 = tl.exp(tmp17) 2023-01-11T21:38:06.6581747Z tmp19 = tmp18 / tmp9 2023-01-11T21:38:06.6581859Z tmp22 = tmp20 - tmp21 2023-01-11T21:38:06.6581939Z tmp23 = tl.exp(tmp22) 2023-01-11T21:38:06.6582017Z tmp25 = tmp23 / tmp24 2023-01-11T21:38:06.6582178Z tl.store(out_ptr4 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp19, rmask & xmask) 2023-01-11T21:38:06.6582330Z tl.store(out_ptr5 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp25, rmask & xmask) 2023-01-11T21:38:06.6582447Z tmp14 = tl.reshape(tl.sum(_tmp14, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6582552Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6582642Z rindex = roffset + rbase 2023-01-11T21:38:06.6582728Z rmask = rindex < rnumel 2023-01-11T21:38:06.6582801Z r1 = rindex 2023-01-11T21:38:06.6582919Z tmp26 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.6583029Z tmp27 = tmp26 - tmp10 2023-01-11T21:38:06.6583141Z tmp28 = tl.exp(tmp27) 2023-01-11T21:38:06.6583225Z tmp29 = tmp28 / tmp14 2023-01-11T21:38:06.6583378Z tl.store(out_ptr6 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp29, rmask & xmask) 2023-01-11T21:38:06.6583463Z ''') 2023-01-11T21:38:06.6583469Z 2023-01-11T21:38:06.6583473Z 2023-01-11T21:38:06.6583569Z async_compile.wait(globals()) 
2023-01-11T21:38:06.6583647Z del async_compile 2023-01-11T21:38:06.6583652Z 2023-01-11T21:38:06.6583728Z def call(args): 2023-01-11T21:38:06.6583800Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6583876Z args.clear() 2023-01-11T21:38:06.6583967Z with torch.cuda.device(0): 2023-01-11T21:38:06.6584174Z buf3 = empty_strided((1, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6584371Z buf4 = empty_strided((1, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6584464Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6584624Z triton_fused_amax_1_exp_1_sub_1_sum_2_0.run(arg0_1, buf3, buf4, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6584820Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6585006Z buf5 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6585204Z buf8 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6585411Z triton_fused_add_amax_amax_2_div_div_1_div_2_exp_exp_2_sub_sub_2_1.run(arg0_1, arg1_1, buf3, buf4, buf2, buf5, buf8, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6585486Z del arg0_1 2023-01-11T21:38:06.6585587Z del arg1_1 2023-01-11T21:38:06.6585675Z return (buf2, buf5, buf8, ) 2023-01-11T21:38:06.6585681Z 2023-01-11T21:38:06.6585685Z 2023-01-11T21:38:06.6585768Z if __name__ == "__main__": 2023-01-11T21:38:06.6585886Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6586006Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6586210Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6586408Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6586527Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6586533Z 2023-01-11T21:38:06.6586537Z 2023-01-11T21:38:06.6586636Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6586709Z import torch 2023-01-11T21:38:06.6586785Z import random 2023-01-11T21:38:06.6586905Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6587023Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6587031Z 2023-01-11T21:38:06.6587113Z aten = torch.ops.aten 2023-01-11T21:38:06.6587252Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6587348Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6587353Z 2023-01-11T21:38:06.6587427Z import triton 2023-01-11T21:38:06.6587522Z import triton.language as tl 2023-01-11T21:38:06.6587648Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6587782Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6587795Z 2023-01-11T21:38:06.6587800Z 2023-01-11T21:38:06.6588014Z triton_fused_amax_1_convert_element_type_2_exp_1_sub_1_sum_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6588090Z import triton 2023-01-11T21:38:06.6588182Z import triton.language as tl 2023-01-11T21:38:06.6588294Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6588396Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6588528Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6588654Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6588659Z 2023-01-11T21:38:06.6588741Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.6588857Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.6588943Z filename=__file__, 
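# (still inside the @reduction(...) call) ReductionHint.DEFAULT indicates Inductor found no
# strong layout signal for this reduction, in contrast to the INNER hints attached to the
# row-contiguous softmax reductions elsewhere in this file.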
2023-01-11T21:38:06.6589344Z meta={'signature': {0: '*fp16', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6589417Z @triton.jit 2023-01-11T21:38:06.6589592Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6589672Z xnumel = 8 2023-01-11T21:38:06.6589741Z rnumel = 8 2023-01-11T21:38:06.6589832Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6589969Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6590055Z xmask = xindex < xnumel 2023-01-11T21:38:06.6590174Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6590244Z x0 = xindex 2023-01-11T21:38:06.6590428Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.6590535Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6590617Z rindex = roffset + rbase 2023-01-11T21:38:06.6590699Z rmask = rindex < rnumel 2023-01-11T21:38:06.6590770Z r1 = rindex 2023-01-11T21:38:06.6591010Z tmp0 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6591103Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.6591231Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.6591347Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6591438Z tl.store(out_ptr0 + x0, tmp2, xmask) 2023-01-11T21:38:06.6591583Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6591688Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6591776Z rindex = roffset + rbase 2023-01-11T21:38:06.6591859Z rmask = rindex < rnumel 2023-01-11T21:38:06.6591932Z r1 = rindex 2023-01-11T21:38:06.6592066Z tmp3 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.6592149Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.6592264Z tmp5 = tmp4 - tmp2 2023-01-11T21:38:06.6592349Z tmp6 = tl.exp(tmp5) 2023-01-11T21:38:06.6592469Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.6592583Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6592682Z tl.store(out_ptr1 + x0, tmp7, xmask) 2023-01-11T21:38:06.6592764Z ''') 2023-01-11T21:38:06.6592769Z 2023-01-11T21:38:06.6592773Z 2023-01-11T21:38:06.6593131Z triton_fused_add_amax_amax_2_convert_element_type_convert_element_type_1_convert_element_type_3_convert_element_type_4_convert_element_type_5_exp_exp_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.6593204Z import triton 2023-01-11T21:38:06.6593296Z import triton.language as tl 2023-01-11T21:38:06.6593412Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6593515Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6593650Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6593775Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6593781Z 2023-01-11T21:38:06.6593868Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.6593982Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6594060Z filename=__file__, 2023-01-11T21:38:06.6594494Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp32', 3: '*fp32', 4: '*fp16', 5: '*fp16', 6: '*fp16', 7: 'i32', 8: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 
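# In the signature above the two inputs and the three outputs are *fp16 while the row
# statistics passed in as in_ptr2/in_ptr3 stay *fp32: every half-precision load below is
# widened with .to(tl.float32) so the max/sum accumulators run in fp32. divisible_by_16
# records which pointer and size arguments Triton may assume are 16-byte aligned when it
# specializes the kernel.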
2023-01-11T21:38:06.6594572Z @triton.jit 2023-01-11T21:38:06.6594784Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr4, out_ptr5, out_ptr6, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6594856Z xnumel = 8 2023-01-11T21:38:06.6594927Z rnumel = 8 2023-01-11T21:38:06.6595051Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6595188Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6595264Z xmask = xindex < xnumel 2023-01-11T21:38:06.6595384Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6595455Z x0 = xindex 2023-01-11T21:38:06.6595638Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.6595743Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6595831Z rindex = roffset + rbase 2023-01-11T21:38:06.6595916Z rmask = rindex < rnumel 2023-01-11T21:38:06.6595980Z r1 = rindex 2023-01-11T21:38:06.6596220Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6596458Z tmp1 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6596539Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6596630Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.6596758Z _tmp4 = tl.where(xmask & rmask & (_tmp4 < tmp3), tmp3, _tmp4) 2023-01-11T21:38:06.6596873Z tmp4 = tl.reshape(tl.max(_tmp4, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6596984Z _tmp11 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6597167Z _tmp13 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.6597273Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6597361Z rindex = roffset + rbase 2023-01-11T21:38:06.6597446Z rmask = rindex < rnumel 2023-01-11T21:38:06.6597516Z r1 = rindex 2023-01-11T21:38:06.6597777Z tmp5 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6598003Z tmp6 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6598085Z tmp7 = tmp5 + tmp6 2023-01-11T21:38:06.6598178Z tmp8 = tmp7.to(tl.float32) 2023-01-11T21:38:06.6598290Z tmp9 = tmp8 - tmp4 2023-01-11T21:38:06.6598374Z tmp10 = tl.exp(tmp9) 2023-01-11T21:38:06.6598496Z _tmp11 = tl.where(xmask & rmask, _tmp11 + tmp10, _tmp11) 2023-01-11T21:38:06.6598586Z tmp12 = tmp6.to(tl.float32) 2023-01-11T21:38:06.6598714Z _tmp13 = tl.where(xmask & rmask & (_tmp13 < tmp12), tmp12, _tmp13) 2023-01-11T21:38:06.6598823Z tmp11 = tl.reshape(tl.sum(_tmp11, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6598938Z tmp13 = tl.reshape(tl.max(_tmp13, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6599053Z _tmp18 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6599163Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6599252Z rindex = roffset + rbase 2023-01-11T21:38:06.6599337Z rmask = rindex < rnumel 2023-01-11T21:38:06.6599408Z r1 = rindex 2023-01-11T21:38:06.6599641Z tmp14 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6599881Z tmp19 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6600012Z tmp26 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.6600114Z tmp28 = tl.load(in_ptr2 + (r1), rmask) 2023-01-11T21:38:06.6600216Z tmp31 = tl.load(in_ptr3 + (r1), rmask) 2023-01-11T21:38:06.6600309Z tmp15 = 
tmp14.to(tl.float32) 2023-01-11T21:38:06.6600428Z tmp16 = tmp15 - tmp13 2023-01-11T21:38:06.6600506Z tmp17 = tl.exp(tmp16) 2023-01-11T21:38:06.6600630Z _tmp18 = tl.where(xmask & rmask, _tmp18 + tmp17, _tmp18) 2023-01-11T21:38:06.6600715Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.6600804Z tmp21 = tmp20.to(tl.float32) 2023-01-11T21:38:06.6600919Z tmp22 = tmp21 - tmp4 2023-01-11T21:38:06.6601002Z tmp23 = tl.exp(tmp22) 2023-01-11T21:38:06.6601088Z tmp24 = tmp23 / tmp11 2023-01-11T21:38:06.6601199Z tmp25 = tmp24.to(tl.float32) 2023-01-11T21:38:06.6601290Z tmp27 = tmp26.to(tl.float32) 2023-01-11T21:38:06.6601405Z tmp29 = tmp27 - tmp28 2023-01-11T21:38:06.6601485Z tmp30 = tl.exp(tmp29) 2023-01-11T21:38:06.6601569Z tmp32 = tmp30 / tmp31 2023-01-11T21:38:06.6601656Z tmp33 = tmp32.to(tl.float32) 2023-01-11T21:38:06.6601816Z tl.store(out_ptr4 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp25, rmask & xmask) 2023-01-11T21:38:06.6601967Z tl.store(out_ptr5 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp33, rmask & xmask) 2023-01-11T21:38:06.6602085Z tmp18 = tl.reshape(tl.sum(_tmp18, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6602194Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6602286Z rindex = roffset + rbase 2023-01-11T21:38:06.6602370Z rmask = rindex < rnumel 2023-01-11T21:38:06.6602441Z r1 = rindex 2023-01-11T21:38:06.6602575Z tmp34 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.6602667Z tmp35 = tmp34.to(tl.float32) 2023-01-11T21:38:06.6602777Z tmp36 = tmp35 - tmp13 2023-01-11T21:38:06.6602860Z tmp37 = tl.exp(tmp36) 2023-01-11T21:38:06.6602940Z tmp38 = tmp37 / tmp18 2023-01-11T21:38:06.6603029Z tmp39 = tmp38.to(tl.float32) 2023-01-11T21:38:06.6603184Z tl.store(out_ptr6 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp39, rmask & xmask) 2023-01-11T21:38:06.6603268Z ''') 2023-01-11T21:38:06.6603273Z 2023-01-11T21:38:06.6603278Z 2023-01-11T21:38:06.6603369Z async_compile.wait(globals()) 2023-01-11T21:38:06.6603440Z del async_compile 2023-01-11T21:38:06.6603471Z 2023-01-11T21:38:06.6603545Z def call(args): 2023-01-11T21:38:06.6603624Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6603698Z args.clear() 2023-01-11T21:38:06.6603794Z with torch.cuda.device(0): 2023-01-11T21:38:06.6603996Z buf3 = empty_strided((1, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6604196Z buf4 = empty_strided((1, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6604281Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6604465Z triton_fused_amax_1_convert_element_type_2_exp_1_sub_1_sum_2_0.run(arg0_1, buf3, buf4, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6604665Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6604863Z buf5 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6605059Z buf8 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6605353Z triton_fused_add_amax_amax_2_convert_element_type_convert_element_type_1_convert_element_type_3_convert_element_type_4_convert_element_type_5_exp_exp_2_1.run(arg0_1, arg1_1, buf3, buf4, buf2, buf5, buf8, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6605431Z del arg0_1 2023-01-11T21:38:06.6605504Z del arg1_1 2023-01-11T21:38:06.6605595Z return (buf2, buf5, buf8, ) 2023-01-11T21:38:06.6605600Z 2023-01-11T21:38:06.6605605Z 2023-01-11T21:38:06.6605678Z if __name__ == "__main__": 2023-01-11T21:38:06.6605796Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6605924Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6606122Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6606319Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6606440Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6606710Z [2023-01-11 21:35:39,021] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 839 2023-01-11T21:38:06.6606716Z 2023-01-11T21:38:06.6606789Z ok (0.522s) 2023-01-11T21:38:06.6607279Z test_softmax_one_kernel_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6607406Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6607661Z [2023-01-11 21:35:39,208] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 840 2023-01-11T21:38:06.6607871Z [2023-01-11 21:35:39,225] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.6608080Z [2023-01-11 21:35:39,225] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.6608344Z [2023-01-11 21:35:39,322] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 840 2023-01-11T21:38:06.6608350Z 2023-01-11T21:38:06.6608447Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6608524Z import torch 2023-01-11T21:38:06.6608597Z import random 2023-01-11T21:38:06.6608711Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6608832Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6608837Z 2023-01-11T21:38:06.6608919Z aten = torch.ops.aten 2023-01-11T21:38:06.6609056Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6609150Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6609155Z 2023-01-11T21:38:06.6609231Z import triton 2023-01-11T21:38:06.6609323Z import triton.language as tl 2023-01-11T21:38:06.6609446Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6609618Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6609624Z 2023-01-11T21:38:06.6609634Z 2023-01-11T21:38:06.6609812Z triton_fused_amax_div_exp_mul_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6609884Z import triton 2023-01-11T21:38:06.6609977Z import triton.language as tl 2023-01-11T21:38:06.6610094Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6610194Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6610328Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6610453Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6610458Z 2023-01-11T21:38:06.6610542Z @reduction(size_hints=[16, 32], 2023-01-11T21:38:06.6610657Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6610742Z filename=__file__, 2023-01-11T21:38:06.6611108Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 
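# size_hints=[16, 32] matches xnumel = 16 rows and rnumel = 32 reduction elements below,
# and ReductionHint.INNER tells the autotuner that the reduced dimension is the innermost,
# contiguous one, so the chosen configs keep the loads along r1 coalesced.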
2023-01-11T21:38:06.6611186Z @triton.jit 2023-01-11T21:38:06.6611358Z def triton_(in_ptr0, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6611430Z xnumel = 16 2023-01-11T21:38:06.6611498Z rnumel = 32 2023-01-11T21:38:06.6611591Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6611727Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6611810Z xmask = xindex < xnumel 2023-01-11T21:38:06.6611928Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6611999Z x0 = xindex 2023-01-11T21:38:06.6612182Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.6612287Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6612369Z rindex = roffset + rbase 2023-01-11T21:38:06.6612450Z rmask = rindex < rnumel 2023-01-11T21:38:06.6612525Z r1 = rindex 2023-01-11T21:38:06.6612739Z tmp0 = tl.load(in_ptr0 + (r1 + (32*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6612866Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.6612979Z tmp1 = tl.reshape(tl.max(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6613121Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6613221Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6613311Z rindex = roffset + rbase 2023-01-11T21:38:06.6613397Z rmask = rindex < rnumel 2023-01-11T21:38:06.6613468Z r1 = rindex 2023-01-11T21:38:06.6613683Z tmp2 = tl.load(in_ptr0 + (r1 + (32*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6613769Z tmp3 = tmp2 * tmp1 2023-01-11T21:38:06.6613850Z tmp4 = tl.exp(tmp3) 2023-01-11T21:38:06.6613964Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.6614080Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6614185Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6614273Z rindex = roffset + rbase 2023-01-11T21:38:06.6614359Z rmask = rindex < rnumel 2023-01-11T21:38:06.6614429Z r1 = rindex 2023-01-11T21:38:06.6614657Z tmp6 = tl.load(in_ptr0 + (r1 + (32*x0)), rmask & xmask) 2023-01-11T21:38:06.6614734Z tmp7 = tmp6 * tmp1 2023-01-11T21:38:06.6614816Z tmp8 = tl.exp(tmp7) 2023-01-11T21:38:06.6614895Z tmp9 = tmp8 / tmp5 2023-01-11T21:38:06.6615053Z tl.store(out_ptr2 + (r1 + (32*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp9, rmask & xmask) 2023-01-11T21:38:06.6615138Z ''') 2023-01-11T21:38:06.6615144Z 2023-01-11T21:38:06.6615148Z 2023-01-11T21:38:06.6615240Z async_compile.wait(globals()) 2023-01-11T21:38:06.6615314Z del async_compile 2023-01-11T21:38:06.6615320Z 2023-01-11T21:38:06.6615395Z def call(args): 2023-01-11T21:38:06.6615461Z arg0_1, = args 2023-01-11T21:38:06.6615606Z args.clear() 2023-01-11T21:38:06.6615712Z with torch.cuda.device(0): 2023-01-11T21:38:06.6615935Z buf2 = empty_strided((16, 32), (32, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6616029Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6616189Z triton_fused_amax_div_exp_mul_sum_1_0.run(arg0_1, buf2, 16, 32, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6616260Z del arg0_1 2023-01-11T21:38:06.6616330Z return (buf2, ) 2023-01-11T21:38:06.6616336Z 2023-01-11T21:38:06.6616340Z 2023-01-11T21:38:06.6616419Z if __name__ == "__main__": 2023-01-11T21:38:06.6616536Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6616665Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6616874Z arg0_1 = rand_strided((16, 32), (32, 1), 
device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6616986Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6616991Z 2023-01-11T21:38:06.6617065Z ok (0.300s) 2023-01-11T21:38:06.6617573Z test_sort_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6617703Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6617953Z [2023-01-11 21:35:39,336] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 841 2023-01-11T21:38:06.6618171Z [2023-01-11 21:35:39,341] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.sort 2023-01-11T21:38:06.6618433Z [2023-01-11 21:35:39,344] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 841 2023-01-11T21:38:06.6618848Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6619025Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6619280Z [2023-01-11 21:35:39,356] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 842 2023-01-11T21:38:06.6619493Z [2023-01-11 21:35:39,362] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.sort 2023-01-11T21:38:06.6619748Z [2023-01-11 21:35:39,364] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 842 2023-01-11T21:38:06.6619754Z 2023-01-11T21:38:06.6619852Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6619923Z import torch 2023-01-11T21:38:06.6619991Z import random 2023-01-11T21:38:06.6620114Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6620235Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6620240Z 2023-01-11T21:38:06.6620319Z aten = torch.ops.aten 2023-01-11T21:38:06.6620457Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6620555Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6620561Z 2023-01-11T21:38:06.6620636Z import triton 2023-01-11T21:38:06.6620720Z import triton.language as tl 2023-01-11T21:38:06.6620844Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6620982Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6620988Z 2023-01-11T21:38:06.6620992Z 2023-01-11T21:38:06.6621085Z async_compile.wait(globals()) 2023-01-11T21:38:06.6621162Z del async_compile 2023-01-11T21:38:06.6621167Z 2023-01-11T21:38:06.6621238Z def call(args): 2023-01-11T21:38:06.6621312Z arg0_1, = args 2023-01-11T21:38:06.6621384Z args.clear() 2023-01-11T21:38:06.6621501Z with torch.cuda.device(0): 2023-01-11T21:38:06.6621591Z buf0 = aten.sort(arg0_1) 2023-01-11T21:38:06.6621667Z del arg0_1 2023-01-11T21:38:06.6621743Z buf1 = buf0[0] 2023-01-11T21:38:06.6621860Z assert_size_stride(buf1, (1, 1, 8, 8), (64, 64, 8, 1)) 2023-01-11T21:38:06.6621936Z buf2 = buf0[1] 2023-01-11T21:38:06.6622049Z 
assert_size_stride(buf2, (1, 1, 8, 8), (64, 64, 8, 1)) 2023-01-11T21:38:06.6622115Z del buf0 2023-01-11T21:38:06.6622200Z return (buf1, buf2, ) 2023-01-11T21:38:06.6622205Z 2023-01-11T21:38:06.6622210Z 2023-01-11T21:38:06.6622292Z if __name__ == "__main__": 2023-01-11T21:38:06.6622413Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6622540Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6622761Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6622884Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6622892Z 2023-01-11T21:38:06.6622896Z 2023-01-11T21:38:06.6623003Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6623073Z import torch 2023-01-11T21:38:06.6623150Z import random 2023-01-11T21:38:06.6623279Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6623418Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6623423Z 2023-01-11T21:38:06.6623508Z aten = torch.ops.aten 2023-01-11T21:38:06.6623659Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6623760Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6623765Z 2023-01-11T21:38:06.6623840Z import triton 2023-01-11T21:38:06.6623932Z import triton.language as tl 2023-01-11T21:38:06.6624071Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6624223Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6624228Z 2023-01-11T21:38:06.6624233Z 2023-01-11T21:38:06.6624331Z async_compile.wait(globals()) 2023-01-11T21:38:06.6624416Z del async_compile 2023-01-11T21:38:06.6624421Z 2023-01-11T21:38:06.6624501Z def call(args): 2023-01-11T21:38:06.6624577Z arg0_1, = args 2023-01-11T21:38:06.6624648Z args.clear() 2023-01-11T21:38:06.6624745Z with torch.cuda.device(0): 2023-01-11T21:38:06.6624838Z buf0 = aten.sort(arg0_1) 2023-01-11T21:38:06.6624942Z del arg0_1 2023-01-11T21:38:06.6625021Z buf1 = buf0[0] 2023-01-11T21:38:06.6625143Z assert_size_stride(buf1, (1, 1, 8, 8), (64, 64, 8, 1)) 2023-01-11T21:38:06.6625219Z buf2 = buf0[1] 2023-01-11T21:38:06.6625331Z assert_size_stride(buf2, (1, 1, 8, 8), (64, 64, 8, 1)) 2023-01-11T21:38:06.6625406Z del buf0 2023-01-11T21:38:06.6625495Z return (buf1, buf2, ) 2023-01-11T21:38:06.6625500Z 2023-01-11T21:38:06.6625504Z 2023-01-11T21:38:06.6625588Z if __name__ == "__main__": 2023-01-11T21:38:06.6625719Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6625862Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6626110Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6626230Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6626235Z 2023-01-11T21:38:06.6626302Z ok (0.042s) 2023-01-11T21:38:06.6626840Z test_split_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6626984Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6627284Z [2023-01-11 21:35:39,386] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 843 2023-01-11T21:38:06.6627582Z [2023-01-11 21:35:39,389] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 843 2023-01-11T21:38:06.6628099Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6628247Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6628540Z [2023-01-11 21:35:39,411] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 844 2023-01-11T21:38:06.6628841Z [2023-01-11 21:35:39,414] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 844 2023-01-11T21:38:06.6629329Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6629473Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6629766Z [2023-01-11 21:35:39,439] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 845 2023-01-11T21:38:06.6630056Z [2023-01-11 21:35:39,512] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 845 2023-01-11T21:38:06.6630533Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6630679Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6630969Z [2023-01-11 21:35:39,537] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 846 2023-01-11T21:38:06.6631296Z [2023-01-11 21:35:39,607] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 846 2023-01-11T21:38:06.6631302Z 2023-01-11T21:38:06.6631409Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6631487Z import torch 2023-01-11T21:38:06.6631565Z import random 2023-01-11T21:38:06.6631696Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6631825Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6631830Z 2023-01-11T21:38:06.6631916Z aten = torch.ops.aten 2023-01-11T21:38:06.6632067Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6632169Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6632174Z 2023-01-11T21:38:06.6632251Z import triton 2023-01-11T21:38:06.6632353Z import triton.language as tl 2023-01-11T21:38:06.6632490Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6632644Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6632649Z 2023-01-11T21:38:06.6632654Z 2023-01-11T21:38:06.6632745Z async_compile.wait(globals()) 2023-01-11T21:38:06.6632828Z del async_compile 2023-01-11T21:38:06.6632833Z 2023-01-11T21:38:06.6632910Z def call(args): 2023-01-11T21:38:06.6632986Z arg0_1, = args 2023-01-11T21:38:06.6633064Z args.clear() 2023-01-11T21:38:06.6633287Z return (as_strided(arg0_1, (2, 2, 3), (20, 10, 1)), as_strided(arg0_1, (2, 2, 3), (20, 10, 1), 3), as_strided(arg0_1, (2, 2, 3), (20, 10, 1), 6), as_strided(arg0_1, (2, 2, 1), (20, 10, 1), 9), ) 2023-01-11T21:38:06.6633293Z 2023-01-11T21:38:06.6633297Z 2023-01-11T21:38:06.6633384Z if __name__ == "__main__": 2023-01-11T21:38:06.6633511Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6633642Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6633909Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6634023Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6634029Z 2023-01-11T21:38:06.6634033Z 2023-01-11T21:38:06.6634133Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6634212Z import torch 2023-01-11T21:38:06.6634291Z import random 2023-01-11T21:38:06.6634413Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6634542Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6634547Z 2023-01-11T21:38:06.6634624Z aten = torch.ops.aten 2023-01-11T21:38:06.6634759Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6634859Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6634864Z 2023-01-11T21:38:06.6634939Z import triton 2023-01-11T21:38:06.6635036Z import triton.language as tl 2023-01-11T21:38:06.6635162Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6635306Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6635311Z 2023-01-11T21:38:06.6635316Z 2023-01-11T21:38:06.6635411Z async_compile.wait(globals()) 2023-01-11T21:38:06.6635484Z del async_compile 2023-01-11T21:38:06.6635496Z 2023-01-11T21:38:06.6635568Z def call(args): 2023-01-11T21:38:06.6635643Z arg0_1, = args 
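# No kernel launch in this graph: the split is expressed purely as as_strided views of
# arg0_1 with storage offsets 0, 3, 6 and 9 into the last dimension, so call() just
# returns four aliases. Eager equivalent (a sketch, not from the log):
#   x.split(3, dim=-1)  # -> chunks of size 3, 3, 3, 1 on a (2, 2, 10) tensor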
2023-01-11T21:38:06.6635721Z args.clear() 2023-01-11T21:38:06.6635918Z return (as_strided(arg0_1, (2, 2, 3), (20, 10, 1)), as_strided(arg0_1, (2, 2, 3), (20, 10, 1), 3), as_strided(arg0_1, (2, 2, 3), (20, 10, 1), 6), as_strided(arg0_1, (2, 2, 1), (20, 10, 1), 9), ) 2023-01-11T21:38:06.6635925Z 2023-01-11T21:38:06.6635929Z 2023-01-11T21:38:06.6636010Z if __name__ == "__main__": 2023-01-11T21:38:06.6636131Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6636261Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6636475Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6636585Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6636590Z 2023-01-11T21:38:06.6636595Z 2023-01-11T21:38:06.6636694Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6636770Z import torch 2023-01-11T21:38:06.6636847Z import random 2023-01-11T21:38:06.6636995Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6637122Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6637127Z 2023-01-11T21:38:06.6637211Z aten = torch.ops.aten 2023-01-11T21:38:06.6637342Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6637439Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6637444Z 2023-01-11T21:38:06.6637518Z import triton 2023-01-11T21:38:06.6637615Z import triton.language as tl 2023-01-11T21:38:06.6637742Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6637882Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6637891Z 2023-01-11T21:38:06.6637895Z 2023-01-11T21:38:06.6638051Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6638129Z import triton 2023-01-11T21:38:06.6638216Z import triton.language as tl 2023-01-11T21:38:06.6638332Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6638439Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6638575Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6638703Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6638708Z 2023-01-11T21:38:06.6639111Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6639191Z @triton.jit 2023-01-11T21:38:06.6639326Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6639423Z xnumel = 40 2023-01-11T21:38:06.6639523Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6639654Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6639740Z xmask = xindex < xnumel 2023-01-11T21:38:06.6639814Z x0 = xindex 2023-01-11T21:38:06.6639912Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6639988Z tmp1 = 1 2023-01-11T21:38:06.6640063Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6640201Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.6640287Z ''') 2023-01-11T21:38:06.6640293Z 2023-01-11T21:38:06.6640297Z 2023-01-11T21:38:06.6640392Z async_compile.wait(globals()) 2023-01-11T21:38:06.6640471Z del async_compile 2023-01-11T21:38:06.6640476Z 2023-01-11T21:38:06.6640553Z def call(args): 2023-01-11T21:38:06.6640629Z arg0_1, = args 2023-01-11T21:38:06.6640704Z args.clear() 2023-01-11T21:38:06.6640791Z with 
torch.cuda.device(0): 2023-01-11T21:38:06.6641007Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6641104Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6641247Z triton_fused_add_0.run(arg0_1, buf0, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.6641323Z del arg0_1 2023-01-11T21:38:06.6641521Z return (as_strided(buf0, (2, 2, 3), (20, 10, 1)), as_strided(buf0, (2, 2, 3), (20, 10, 1), 3), as_strided(buf0, (2, 2, 3), (20, 10, 1), 6), as_strided(buf0, (2, 2, 1), (20, 10, 0), 9), ) 2023-01-11T21:38:06.6641528Z 2023-01-11T21:38:06.6641532Z 2023-01-11T21:38:06.6641615Z if __name__ == "__main__": 2023-01-11T21:38:06.6641735Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6641856Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6642068Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6642184Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6642191Z 2023-01-11T21:38:06.6642196Z 2023-01-11T21:38:06.6642301Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6642377Z import torch 2023-01-11T21:38:06.6642454Z import random 2023-01-11T21:38:06.6642578Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6642702Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6642733Z 2023-01-11T21:38:06.6642813Z aten = torch.ops.aten 2023-01-11T21:38:06.6642950Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6643048Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6643053Z 2023-01-11T21:38:06.6643129Z import triton 2023-01-11T21:38:06.6643224Z import triton.language as tl 2023-01-11T21:38:06.6643352Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6643492Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6643498Z 2023-01-11T21:38:06.6643502Z 2023-01-11T21:38:06.6643656Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6643730Z import triton 2023-01-11T21:38:06.6643824Z import triton.language as tl 2023-01-11T21:38:06.6643939Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6644045Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6644179Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6644312Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6644317Z 2023-01-11T21:38:06.6644719Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6644794Z @triton.jit 2023-01-11T21:38:06.6644923Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6644999Z xnumel = 40 2023-01-11T21:38:06.6645098Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6645230Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6645343Z xmask = xindex < xnumel 2023-01-11T21:38:06.6645416Z x0 = xindex 2023-01-11T21:38:06.6645535Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6645601Z tmp1 = 1 2023-01-11T21:38:06.6645681Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6645821Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.6645910Z ''') 2023-01-11T21:38:06.6645916Z 2023-01-11T21:38:06.6645920Z 
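# Same pointwise add-1 kernel as the fp32 variant above, except the *fp16 load is widened
# with .to(tl.float32); the add runs in fp32 registers and tl.store narrows the result back
# to the pointer's fp16 element type. Eager equivalent (a sketch, not from the log):
#   (x + 1).split(3, dim=-1)  # one fused add kernel, then four strided views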
2023-01-11T21:38:06.6646016Z async_compile.wait(globals()) 2023-01-11T21:38:06.6646095Z del async_compile 2023-01-11T21:38:06.6646100Z 2023-01-11T21:38:06.6646176Z def call(args): 2023-01-11T21:38:06.6646244Z arg0_1, = args 2023-01-11T21:38:06.6646320Z args.clear() 2023-01-11T21:38:06.6646416Z with torch.cuda.device(0): 2023-01-11T21:38:06.6646628Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6646723Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6646865Z triton_fused_add_0.run(arg0_1, buf0, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.6646942Z del arg0_1 2023-01-11T21:38:06.6647134Z return (as_strided(buf0, (2, 2, 3), (20, 10, 1)), as_strided(buf0, (2, 2, 3), (20, 10, 1), 3), as_strided(buf0, (2, 2, 3), (20, 10, 1), 6), as_strided(buf0, (2, 2, 1), (20, 10, 0), 9), ) 2023-01-11T21:38:06.6647140Z 2023-01-11T21:38:06.6647144Z 2023-01-11T21:38:06.6647220Z if __name__ == "__main__": 2023-01-11T21:38:06.6647340Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6647467Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6647679Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6647797Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6647802Z 2023-01-11T21:38:06.6647874Z ok (0.244s) 2023-01-11T21:38:06.6648336Z test_split_with_sizes_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6648503Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6648763Z [2023-01-11 21:35:39,638] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 847 2023-01-11T21:38:06.6649028Z [2023-01-11 21:35:39,738] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 847 2023-01-11T21:38:06.6649434Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6649571Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6649828Z [2023-01-11 21:35:39,768] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 848 2023-01-11T21:38:06.6649834Z 2023-01-11T21:38:06.6649938Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6650016Z import torch 2023-01-11T21:38:06.6650092Z import random 2023-01-11T21:38:06.6650213Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6650338Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6650343Z 2023-01-11T21:38:06.6650427Z aten = torch.ops.aten 2023-01-11T21:38:06.6650559Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6650658Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6650663Z 2023-01-11T21:38:06.6650741Z import triton 2023-01-11T21:38:06.6650835Z import triton.language as tl 2023-01-11T21:38:06.6651002Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6651145Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6651150Z 2023-01-11T21:38:06.6651155Z 2023-01-11T21:38:06.6651311Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6651389Z import triton 2023-01-11T21:38:06.6651477Z import triton.language as tl 2023-01-11T21:38:06.6651595Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6651699Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6651835Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6651964Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6651969Z 2023-01-11T21:38:06.6652368Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6652446Z @triton.jit 2023-01-11T21:38:06.6652581Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6652651Z xnumel = 12 2023-01-11T21:38:06.6652751Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6652883Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6652975Z xmask = xindex < xnumel 2023-01-11T21:38:06.6653053Z x0 = xindex % 3 2023-01-11T21:38:06.6653135Z x1 = (xindex // 3) 2023-01-11T21:38:06.6653200Z x2 = xindex 2023-01-11T21:38:06.6653407Z tmp0 = tl.load(in_ptr0 + (x0 + (10*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6653483Z tmp1 = 2.0 2023-01-11T21:38:06.6653568Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6653643Z tmp3 = 1.0 2023-01-11T21:38:06.6653723Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6653859Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6653939Z ''') 2023-01-11T21:38:06.6653955Z 2023-01-11T21:38:06.6653960Z 2023-01-11T21:38:06.6654112Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6654189Z import triton 2023-01-11T21:38:06.6654284Z import triton.language as tl 2023-01-11T21:38:06.6654401Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6654628Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6654860Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6654988Z from torch._inductor.utils import instance_descriptor 
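# graph 848 lowers split_with_sizes into three pointwise kernels, one per output chunk:
# triton_fused_add_0 reads columns x0, this kernel reads 3 + x0, and triton_fused_add_2_2
# reads 6 + x0 of the (2, 2, 10) input, each writing x * 2 + 1 into its own contiguous
# buffer. Eager equivalent (a sketch, not from the log):
#   [c * 2 + 1 for c in x.split([3, 3, 4], dim=-1)]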
2023-01-11T21:38:06.6654993Z 2023-01-11T21:38:06.6655387Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6655462Z @triton.jit 2023-01-11T21:38:06.6655595Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6655669Z xnumel = 12 2023-01-11T21:38:06.6655765Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6655897Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6655979Z xmask = xindex < xnumel 2023-01-11T21:38:06.6656053Z x0 = xindex % 3 2023-01-11T21:38:06.6656123Z x1 = (xindex // 3) 2023-01-11T21:38:06.6656193Z x2 = xindex 2023-01-11T21:38:06.6656400Z tmp0 = tl.load(in_ptr0 + (3 + x0 + (10*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6656473Z tmp1 = 2.0 2023-01-11T21:38:06.6656549Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6656620Z tmp3 = 1.0 2023-01-11T21:38:06.6656690Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6656824Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6656910Z ''') 2023-01-11T21:38:06.6656915Z 2023-01-11T21:38:06.6656920Z 2023-01-11T21:38:06.6657074Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6657248Z import triton 2023-01-11T21:38:06.6657352Z import triton.language as tl 2023-01-11T21:38:06.6657517Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6657618Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6657743Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6657866Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6657871Z 2023-01-11T21:38:06.6658281Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6658355Z @triton.jit 2023-01-11T21:38:06.6658486Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6658557Z xnumel = 16 2023-01-11T21:38:06.6658652Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6658781Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6658856Z xmask = xindex < xnumel 2023-01-11T21:38:06.6658937Z x0 = xindex % 4 2023-01-11T21:38:06.6659014Z x1 = (xindex // 4) 2023-01-11T21:38:06.6659084Z x2 = xindex 2023-01-11T21:38:06.6659194Z tmp0 = tl.load(in_ptr0 + (6 + x0 + (10*x1)), xmask) 2023-01-11T21:38:06.6659266Z tmp1 = 2.0 2023-01-11T21:38:06.6659343Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6659408Z tmp3 = 1.0 2023-01-11T21:38:06.6659488Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6659623Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6659708Z ''') 2023-01-11T21:38:06.6659714Z 2023-01-11T21:38:06.6659718Z 2023-01-11T21:38:06.6659811Z async_compile.wait(globals()) 2023-01-11T21:38:06.6659888Z del async_compile 2023-01-11T21:38:06.6659893Z 2023-01-11T21:38:06.6659969Z def call(args): 2023-01-11T21:38:06.6660035Z arg0_1, = args 2023-01-11T21:38:06.6660111Z args.clear() 2023-01-11T21:38:06.6660202Z with torch.cuda.device(0): 2023-01-11T21:38:06.6660408Z buf0 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6660503Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6660640Z 
triton_fused_add_0.run(arg0_1, buf0, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6660838Z buf1 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6661002Z triton_fused_add_1_1.run(arg0_1, buf1, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6661195Z buf2 = empty_strided((2, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6661328Z triton_fused_add_2_2.run(arg0_1, buf2, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6661402Z del arg0_1 2023-01-11T21:38:06.6661492Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6661497Z 2023-01-11T21:38:06.6661501Z 2023-01-11T21:38:06.6661580Z if __name__ == "__main__": 2023-01-11T21:38:06.6661700Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6661827Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6662038Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6662144Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6662149Z 2023-01-11T21:38:06.6662423Z [2023-01-11 21:35:39,859] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 848 2023-01-11T21:38:06.6662839Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6662970Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6663226Z [2023-01-11 21:35:39,889] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 849 2023-01-11T21:38:06.6663256Z 2023-01-11T21:38:06.6663357Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6663434Z import torch 2023-01-11T21:38:06.6663511Z import random 2023-01-11T21:38:06.6663634Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6663754Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6663759Z 2023-01-11T21:38:06.6663844Z aten = torch.ops.aten 2023-01-11T21:38:06.6663983Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6664079Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6664084Z 2023-01-11T21:38:06.6664162Z import triton 2023-01-11T21:38:06.6664255Z import triton.language as tl 2023-01-11T21:38:06.6664383Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6664517Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6664531Z 2023-01-11T21:38:06.6664535Z 2023-01-11T21:38:06.6664684Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6664765Z import triton 2023-01-11T21:38:06.6664860Z import triton.language as tl 2023-01-11T21:38:06.6664977Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6665081Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6665214Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6665344Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6665349Z 2023-01-11T21:38:06.6665753Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), 
equal_to_1=())]}) 2023-01-11T21:38:06.6665822Z @triton.jit 2023-01-11T21:38:06.6665960Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6666036Z xnumel = 12 2023-01-11T21:38:06.6666133Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6666265Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6666354Z xmask = xindex < xnumel 2023-01-11T21:38:06.6666430Z x0 = xindex % 3 2023-01-11T21:38:06.6666503Z x1 = (xindex // 3) 2023-01-11T21:38:06.6666574Z x2 = xindex 2023-01-11T21:38:06.6666809Z tmp0 = tl.load(in_ptr0 + (x0 + (10*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6666915Z tmp1 = 2.0 2023-01-11T21:38:06.6667001Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6667075Z tmp3 = 1.0 2023-01-11T21:38:06.6667154Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6667287Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6667376Z ''') 2023-01-11T21:38:06.6667381Z 2023-01-11T21:38:06.6667386Z 2023-01-11T21:38:06.6667545Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6667621Z import triton 2023-01-11T21:38:06.6667719Z import triton.language as tl 2023-01-11T21:38:06.6667837Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6667941Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6668072Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6668200Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6668205Z 2023-01-11T21:38:06.6668609Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6668684Z @triton.jit 2023-01-11T21:38:06.6668819Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6668894Z xnumel = 12 2023-01-11T21:38:06.6668995Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6669127Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6669205Z xmask = xindex < xnumel 2023-01-11T21:38:06.6669281Z x0 = xindex % 3 2023-01-11T21:38:06.6669365Z x1 = (xindex // 3) 2023-01-11T21:38:06.6669435Z x2 = xindex 2023-01-11T21:38:06.6669703Z tmp0 = tl.load(in_ptr0 + (3 + x0 + (10*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6669779Z tmp1 = 2.0 2023-01-11T21:38:06.6669860Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6669928Z tmp3 = 1.0 2023-01-11T21:38:06.6670009Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6670149Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6670235Z ''') 2023-01-11T21:38:06.6670240Z 2023-01-11T21:38:06.6670245Z 2023-01-11T21:38:06.6670406Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6670482Z import triton 2023-01-11T21:38:06.6670577Z import triton.language as tl 2023-01-11T21:38:06.6670686Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6670789Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6670924Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6671052Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6671061Z 2023-01-11T21:38:06.6671464Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6671540Z @triton.jit 2023-01-11T21:38:06.6671680Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6671756Z xnumel = 16 2023-01-11T21:38:06.6671848Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6671979Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6672065Z xmask = xindex < xnumel 2023-01-11T21:38:06.6672141Z x0 = xindex % 4 2023-01-11T21:38:06.6672221Z x1 = (xindex // 4) 2023-01-11T21:38:06.6672292Z x2 = xindex 2023-01-11T21:38:06.6672421Z tmp0 = tl.load(in_ptr0 + (6 + x0 + (10*x1)), xmask).to(tl.float32) 2023-01-11T21:38:06.6672488Z tmp1 = 2.0 2023-01-11T21:38:06.6672570Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6672647Z tmp3 = 1.0 2023-01-11T21:38:06.6672728Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6672868Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6672956Z ''') 2023-01-11T21:38:06.6672962Z 2023-01-11T21:38:06.6672966Z 2023-01-11T21:38:06.6673061Z async_compile.wait(globals()) 2023-01-11T21:38:06.6673168Z del async_compile 2023-01-11T21:38:06.6673178Z 2023-01-11T21:38:06.6673248Z def call(args): 2023-01-11T21:38:06.6673323Z arg0_1, = args 2023-01-11T21:38:06.6673400Z args.clear() 2023-01-11T21:38:06.6673497Z with torch.cuda.device(0): 2023-01-11T21:38:06.6673702Z buf0 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6673798Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6673940Z triton_fused_add_0.run(arg0_1, buf0, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6674136Z buf1 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6674280Z triton_fused_add_1_1.run(arg0_1, buf1, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6674480Z buf2 = empty_strided((2, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6674615Z triton_fused_add_2_2.run(arg0_1, buf2, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6674696Z del arg0_1 2023-01-11T21:38:06.6674788Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6674793Z 2023-01-11T21:38:06.6674798Z 2023-01-11T21:38:06.6674879Z if __name__ == "__main__": 2023-01-11T21:38:06.6674999Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6675121Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6675335Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6675452Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6675457Z 2023-01-11T21:38:06.6675723Z [2023-01-11 21:35:39,986] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 849 2023-01-11T21:38:06.6676172Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6676306Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6676563Z [2023-01-11 21:35:40,016] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 850 2023-01-11T21:38:06.6676569Z 2023-01-11T21:38:06.6676671Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6676748Z import torch 2023-01-11T21:38:06.6676818Z import random 2023-01-11T21:38:06.6676940Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6677066Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6677074Z 2023-01-11T21:38:06.6677158Z aten = torch.ops.aten 2023-01-11T21:38:06.6677297Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6677395Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6677400Z 2023-01-11T21:38:06.6677479Z import triton 2023-01-11T21:38:06.6677576Z import triton.language as tl 2023-01-11T21:38:06.6677697Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6677840Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6677845Z 2023-01-11T21:38:06.6677849Z 2023-01-11T21:38:06.6678003Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6678080Z import triton 2023-01-11T21:38:06.6678174Z import triton.language as tl 2023-01-11T21:38:06.6678290Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6678394Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6678532Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6678656Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6678661Z 2023-01-11T21:38:06.6679086Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6679164Z @triton.jit 2023-01-11T21:38:06.6679299Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6679375Z xnumel = 16 2023-01-11T21:38:06.6679473Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6679605Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6679694Z xmask = xindex < xnumel 2023-01-11T21:38:06.6679763Z x0 = xindex % 4 2023-01-11T21:38:06.6679843Z x1 = (xindex // 4) 2023-01-11T21:38:06.6679915Z x2 = xindex 2023-01-11T21:38:06.6680119Z tmp0 = tl.load(in_ptr0 + (x0 + (10*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6680199Z tmp1 = 2.0 2023-01-11T21:38:06.6680281Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6680349Z tmp3 = 1.0 2023-01-11T21:38:06.6680429Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6680567Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6680654Z ''') 2023-01-11T21:38:06.6680662Z 2023-01-11T21:38:06.6680666Z 2023-01-11T21:38:06.6680825Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6680903Z import triton 2023-01-11T21:38:06.6681000Z import triton.language as tl 2023-01-11T21:38:06.6681119Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6681216Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6681354Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6681481Z from torch._inductor.utils import instance_descriptor 
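# The three pointwise kernels in this graph read slices of the (2, 2, 10) input at
# last-dim offsets 0, 4 and 7 with element counts 16, 12 and 12, each computing
# x * 2.0 + 1.0. One plausible eager-mode reconstruction (inferred from the
# generated code, not stated in the log):
#   x = torch.randn(2, 2, 10, device="cuda")
#   a, b, c = torch.split(x, (4, 3, 3), dim=-1)
#   outs = (a * 2.0 + 1.0, b * 2.0 + 1.0, c * 2.0 + 1.0)
# This second kernel covers the middle section, loading in_ptr0 + (4 + x0 + 10*x1).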
2023-01-11T21:38:06.6681486Z 2023-01-11T21:38:06.6681885Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6681990Z @triton.jit 2023-01-11T21:38:06.6682128Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6682204Z xnumel = 12 2023-01-11T21:38:06.6682306Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6682430Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6682514Z xmask = xindex < xnumel 2023-01-11T21:38:06.6682591Z x0 = xindex % 3 2023-01-11T21:38:06.6682671Z x1 = (xindex // 3) 2023-01-11T21:38:06.6682743Z x2 = xindex 2023-01-11T21:38:06.6682955Z tmp0 = tl.load(in_ptr0 + (4 + x0 + (10*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6683031Z tmp1 = 2.0 2023-01-11T21:38:06.6683105Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6683178Z tmp3 = 1.0 2023-01-11T21:38:06.6683257Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6683396Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6683485Z ''') 2023-01-11T21:38:06.6683490Z 2023-01-11T21:38:06.6683496Z 2023-01-11T21:38:06.6683655Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6683732Z import triton 2023-01-11T21:38:06.6683822Z import triton.language as tl 2023-01-11T21:38:06.6683939Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6684043Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6684178Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6684305Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6684310Z 2023-01-11T21:38:06.6684707Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6684784Z @triton.jit 2023-01-11T21:38:06.6684923Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6684991Z xnumel = 12 2023-01-11T21:38:06.6685090Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6685222Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6685307Z xmask = xindex < xnumel 2023-01-11T21:38:06.6685414Z x0 = xindex % 3 2023-01-11T21:38:06.6685518Z x1 = (xindex // 3) 2023-01-11T21:38:06.6685595Z x2 = xindex 2023-01-11T21:38:06.6685723Z tmp0 = tl.load(in_ptr0 + (7 + x0 + (10*x1)), xmask) 2023-01-11T21:38:06.6685800Z tmp1 = 2.0 2023-01-11T21:38:06.6685881Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6685955Z tmp3 = 1.0 2023-01-11T21:38:06.6686035Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6686171Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6686259Z ''') 2023-01-11T21:38:06.6686264Z 2023-01-11T21:38:06.6686269Z 2023-01-11T21:38:06.6686357Z async_compile.wait(globals()) 2023-01-11T21:38:06.6686439Z del async_compile 2023-01-11T21:38:06.6686444Z 2023-01-11T21:38:06.6686521Z def call(args): 2023-01-11T21:38:06.6686596Z arg0_1, = args 2023-01-11T21:38:06.6686674Z args.clear() 2023-01-11T21:38:06.6686768Z with torch.cuda.device(0): 2023-01-11T21:38:06.6686981Z buf0 = empty_strided((2, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6687069Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6687209Z 
triton_fused_add_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6687415Z buf1 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6687554Z triton_fused_add_1_1.run(arg0_1, buf1, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6687755Z buf2 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6687889Z triton_fused_add_2_2.run(arg0_1, buf2, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6687992Z del arg0_1 2023-01-11T21:38:06.6688082Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6688088Z 2023-01-11T21:38:06.6688092Z 2023-01-11T21:38:06.6688168Z if __name__ == "__main__": 2023-01-11T21:38:06.6688289Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6688424Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6688637Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6688754Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6688759Z 2023-01-11T21:38:06.6689027Z [2023-01-11 21:35:40,107] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 850 2023-01-11T21:38:06.6689449Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6689586Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6689849Z [2023-01-11 21:35:40,141] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 851 2023-01-11T21:38:06.6689855Z 2023-01-11T21:38:06.6689956Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6690025Z import torch 2023-01-11T21:38:06.6690103Z import random 2023-01-11T21:38:06.6690229Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6690356Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6690361Z 2023-01-11T21:38:06.6690445Z aten = torch.ops.aten 2023-01-11T21:38:06.6690584Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6690681Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6690686Z 2023-01-11T21:38:06.6690755Z import triton 2023-01-11T21:38:06.6690853Z import triton.language as tl 2023-01-11T21:38:06.6690978Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6691123Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6691129Z 2023-01-11T21:38:06.6691133Z 2023-01-11T21:38:06.6691324Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6691401Z import triton 2023-01-11T21:38:06.6691496Z import triton.language as tl 2023-01-11T21:38:06.6691612Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6691709Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6691842Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6691970Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6691975Z 2023-01-11T21:38:06.6692386Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), 
equal_to_1=())]}) 2023-01-11T21:38:06.6692466Z @triton.jit 2023-01-11T21:38:06.6692601Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6692677Z xnumel = 16 2023-01-11T21:38:06.6692776Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6692903Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6692988Z xmask = xindex < xnumel 2023-01-11T21:38:06.6693067Z x0 = xindex % 4 2023-01-11T21:38:06.6693148Z x1 = (xindex // 4) 2023-01-11T21:38:06.6693219Z x2 = xindex 2023-01-11T21:38:06.6693451Z tmp0 = tl.load(in_ptr0 + (x0 + (10*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6693527Z tmp1 = 2.0 2023-01-11T21:38:06.6693600Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6693674Z tmp3 = 1.0 2023-01-11T21:38:06.6693752Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6693888Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6694023Z ''') 2023-01-11T21:38:06.6694028Z 2023-01-11T21:38:06.6694033Z 2023-01-11T21:38:06.6694191Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6694268Z import triton 2023-01-11T21:38:06.6694356Z import triton.language as tl 2023-01-11T21:38:06.6694471Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6694699Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6694845Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6694983Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6694988Z 2023-01-11T21:38:06.6695463Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6695541Z @triton.jit 2023-01-11T21:38:06.6695685Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6695758Z xnumel = 12 2023-01-11T21:38:06.6695861Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6696001Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6696089Z xmask = xindex < xnumel 2023-01-11T21:38:06.6696170Z x0 = xindex % 3 2023-01-11T21:38:06.6696250Z x1 = (xindex // 3) 2023-01-11T21:38:06.6696328Z x2 = xindex 2023-01-11T21:38:06.6696587Z tmp0 = tl.load(in_ptr0 + (4 + x0 + (10*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6696663Z tmp1 = 2.0 2023-01-11T21:38:06.6696744Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6696818Z tmp3 = 1.0 2023-01-11T21:38:06.6696899Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6697036Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6697123Z ''') 2023-01-11T21:38:06.6697171Z 2023-01-11T21:38:06.6697176Z 2023-01-11T21:38:06.6697350Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6697423Z import triton 2023-01-11T21:38:06.6697519Z import triton.language as tl 2023-01-11T21:38:06.6697636Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6697741Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6697876Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6698056Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6698062Z 2023-01-11T21:38:06.6698463Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6698539Z @triton.jit 2023-01-11T21:38:06.6698667Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6698742Z xnumel = 12 2023-01-11T21:38:06.6698842Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6698974Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6699063Z xmask = xindex < xnumel 2023-01-11T21:38:06.6699140Z x0 = xindex % 3 2023-01-11T21:38:06.6699212Z x1 = (xindex // 3) 2023-01-11T21:38:06.6699285Z x2 = xindex 2023-01-11T21:38:06.6699414Z tmp0 = tl.load(in_ptr0 + (7 + x0 + (10*x1)), xmask).to(tl.float32) 2023-01-11T21:38:06.6699490Z tmp1 = 2.0 2023-01-11T21:38:06.6699574Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6699649Z tmp3 = 1.0 2023-01-11T21:38:06.6699728Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6699859Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6699946Z ''') 2023-01-11T21:38:06.6699953Z 2023-01-11T21:38:06.6699957Z 2023-01-11T21:38:06.6700053Z async_compile.wait(globals()) 2023-01-11T21:38:06.6700132Z del async_compile 2023-01-11T21:38:06.6700137Z 2023-01-11T21:38:06.6700213Z def call(args): 2023-01-11T21:38:06.6700287Z arg0_1, = args 2023-01-11T21:38:06.6700367Z args.clear() 2023-01-11T21:38:06.6700463Z with torch.cuda.device(0): 2023-01-11T21:38:06.6700698Z buf0 = empty_strided((2, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6700794Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6700934Z triton_fused_add_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6701139Z buf1 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6701279Z triton_fused_add_1_1.run(arg0_1, buf1, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6701479Z buf2 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6701619Z triton_fused_add_2_2.run(arg0_1, buf2, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6701695Z del arg0_1 2023-01-11T21:38:06.6701780Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6701785Z 2023-01-11T21:38:06.6701789Z 2023-01-11T21:38:06.6701872Z if __name__ == "__main__": 2023-01-11T21:38:06.6701993Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6702124Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6702336Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6702451Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6702456Z 2023-01-11T21:38:06.6702728Z [2023-01-11 21:35:40,237] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 851 2023-01-11T21:38:06.6703148Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6703285Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6703537Z [2023-01-11 21:35:40,273] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 852 2023-01-11T21:38:06.6703552Z 2023-01-11T21:38:06.6703645Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6703723Z import torch 2023-01-11T21:38:06.6703800Z import random 2023-01-11T21:38:06.6703922Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6704077Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6704082Z 2023-01-11T21:38:06.6704168Z aten = torch.ops.aten 2023-01-11T21:38:06.6704306Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6704397Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6704403Z 2023-01-11T21:38:06.6704478Z import triton 2023-01-11T21:38:06.6704575Z import triton.language as tl 2023-01-11T21:38:06.6704702Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6704844Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6704850Z 2023-01-11T21:38:06.6704857Z 2023-01-11T21:38:06.6705013Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6705090Z import triton 2023-01-11T21:38:06.6705184Z import triton.language as tl 2023-01-11T21:38:06.6705293Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6705400Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6705536Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6705664Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6705669Z 2023-01-11T21:38:06.6706070Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6706150Z @triton.jit 2023-01-11T21:38:06.6706284Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6706360Z xnumel = 4 2023-01-11T21:38:06.6706453Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6706615Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6706701Z xmask = xindex < xnumel 2023-01-11T21:38:06.6706775Z x0 = xindex 2023-01-11T21:38:06.6706972Z tmp0 = tl.load(in_ptr0 + (10*x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6707047Z tmp1 = 2.0 2023-01-11T21:38:06.6707132Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6707200Z tmp3 = 1.0 2023-01-11T21:38:06.6707280Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6707417Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6707504Z ''') 2023-01-11T21:38:06.6707509Z 2023-01-11T21:38:06.6707513Z 2023-01-11T21:38:06.6707673Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6707750Z import triton 2023-01-11T21:38:06.6707850Z import triton.language as tl 2023-01-11T21:38:06.6707959Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6708064Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6708201Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6708329Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6708335Z 2023-01-11T21:38:06.6708740Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': 
{0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6708815Z @triton.jit 2023-01-11T21:38:06.6708949Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6709023Z xnumel = 8 2023-01-11T21:38:06.6709115Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6709247Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6709333Z xmask = xindex < xnumel 2023-01-11T21:38:06.6709409Z x0 = xindex % 2 2023-01-11T21:38:06.6709489Z x1 = (xindex // 2) 2023-01-11T21:38:06.6709562Z x2 = xindex 2023-01-11T21:38:06.6709774Z tmp0 = tl.load(in_ptr0 + (1 + x0 + (10*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6709843Z tmp1 = 2.0 2023-01-11T21:38:06.6709923Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6709998Z tmp3 = 1.0 2023-01-11T21:38:06.6710076Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6710241Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6710331Z ''') 2023-01-11T21:38:06.6710336Z 2023-01-11T21:38:06.6710341Z 2023-01-11T21:38:06.6710502Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6710572Z import triton 2023-01-11T21:38:06.6710667Z import triton.language as tl 2023-01-11T21:38:06.6710785Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6710888Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6711023Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6711154Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6711159Z 2023-01-11T21:38:06.6711562Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6711638Z @triton.jit 2023-01-11T21:38:06.6711767Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6711847Z xnumel = 12 2023-01-11T21:38:06.6711949Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6712082Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6712169Z xmask = xindex < xnumel 2023-01-11T21:38:06.6712246Z x0 = xindex % 3 2023-01-11T21:38:06.6712328Z x1 = (xindex // 3) 2023-01-11T21:38:06.6712394Z x2 = xindex 2023-01-11T21:38:06.6712604Z tmp0 = tl.load(in_ptr0 + (3 + x0 + (10*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6712680Z tmp1 = 2.0 2023-01-11T21:38:06.6712762Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6712872Z tmp3 = 1.0 2023-01-11T21:38:06.6712952Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6713091Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6713172Z ''') 2023-01-11T21:38:06.6713177Z 2023-01-11T21:38:06.6713182Z 2023-01-11T21:38:06.6713342Z triton_fused_add_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.6713422Z import triton 2023-01-11T21:38:06.6713517Z import triton.language as tl 2023-01-11T21:38:06.6713633Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6713737Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6713872Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6713993Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6714006Z 2023-01-11T21:38:06.6714405Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': 
{0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6714486Z @triton.jit 2023-01-11T21:38:06.6714623Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6714700Z xnumel = 16 2023-01-11T21:38:06.6714799Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6714931Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6715018Z xmask = xindex < xnumel 2023-01-11T21:38:06.6715088Z x0 = xindex % 4 2023-01-11T21:38:06.6715170Z x1 = (xindex // 4) 2023-01-11T21:38:06.6715251Z x2 = xindex 2023-01-11T21:38:06.6715381Z tmp0 = tl.load(in_ptr0 + (6 + x0 + (10*x1)), xmask) 2023-01-11T21:38:06.6715471Z tmp1 = 2.0 2023-01-11T21:38:06.6715561Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6715634Z tmp3 = 1.0 2023-01-11T21:38:06.6715707Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6715844Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6715932Z ''') 2023-01-11T21:38:06.6715937Z 2023-01-11T21:38:06.6715945Z 2023-01-11T21:38:06.6716042Z async_compile.wait(globals()) 2023-01-11T21:38:06.6716120Z del async_compile 2023-01-11T21:38:06.6716126Z 2023-01-11T21:38:06.6716202Z def call(args): 2023-01-11T21:38:06.6716277Z arg0_1, = args 2023-01-11T21:38:06.6716356Z args.clear() 2023-01-11T21:38:06.6716444Z with torch.cuda.device(0): 2023-01-11T21:38:06.6716691Z buf0 = empty_strided((2, 2, 1), (2, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6716788Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6716929Z triton_fused_add_0.run(arg0_1, buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.6717136Z buf1 = empty_strided((2, 2, 2), (4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6717274Z triton_fused_add_1_1.run(arg0_1, buf1, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6717476Z buf2 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6717608Z triton_fused_add_2_2.run(arg0_1, buf2, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6717812Z buf3 = empty_strided((2, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6717948Z triton_fused_add_3_3.run(arg0_1, buf3, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6718025Z del arg0_1 2023-01-11T21:38:06.6718126Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.6718131Z 2023-01-11T21:38:06.6718136Z 2023-01-11T21:38:06.6718219Z if __name__ == "__main__": 2023-01-11T21:38:06.6718342Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6718471Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6718676Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6718791Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6718796Z 2023-01-11T21:38:06.6719063Z [2023-01-11 21:35:40,366] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 852 2023-01-11T21:38:06.6719095Z 2023-01-11T21:38:06.6719199Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6719277Z import torch 2023-01-11T21:38:06.6719352Z import random 2023-01-11T21:38:06.6719473Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6719602Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6719607Z 2023-01-11T21:38:06.6719684Z aten = torch.ops.aten 2023-01-11T21:38:06.6719823Z 
assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6719921Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6719926Z 2023-01-11T21:38:06.6720003Z import triton 2023-01-11T21:38:06.6720099Z import triton.language as tl 2023-01-11T21:38:06.6720226Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6720367Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6720373Z 2023-01-11T21:38:06.6720377Z 2023-01-11T21:38:06.6720533Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6720607Z import triton 2023-01-11T21:38:06.6720703Z import triton.language as tl 2023-01-11T21:38:06.6720819Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6720925Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6721064Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6721192Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6721197Z 2023-01-11T21:38:06.6721601Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6721677Z @triton.jit 2023-01-11T21:38:06.6721803Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6721879Z xnumel = 4 2023-01-11T21:38:06.6721979Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6722109Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6722198Z xmask = xindex < xnumel 2023-01-11T21:38:06.6722271Z x0 = xindex 2023-01-11T21:38:06.6722489Z tmp0 = tl.load(in_ptr0 + (10*x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6722558Z tmp1 = 2.0 2023-01-11T21:38:06.6722673Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6722750Z tmp3 = 1.0 2023-01-11T21:38:06.6722832Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6722972Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6723059Z ''') 2023-01-11T21:38:06.6723064Z 2023-01-11T21:38:06.6723069Z 2023-01-11T21:38:06.6723227Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6723302Z import triton 2023-01-11T21:38:06.6723391Z import triton.language as tl 2023-01-11T21:38:06.6723510Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6723613Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6723747Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6723877Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6723883Z 2023-01-11T21:38:06.6724287Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6724363Z @triton.jit 2023-01-11T21:38:06.6724499Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6724567Z xnumel = 8 2023-01-11T21:38:06.6724666Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6724797Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6724882Z xmask = xindex < xnumel 2023-01-11T21:38:06.6724958Z x0 = xindex % 2 2023-01-11T21:38:06.6725038Z x1 = (xindex // 2) 2023-01-11T21:38:06.6725104Z x2 = xindex 2023-01-11T21:38:06.6725338Z tmp0 = tl.load(in_ptr0 + (1 + x0 + (10*x1)), xmask, 
eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6725441Z tmp1 = 2.0 2023-01-11T21:38:06.6725523Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6725598Z tmp3 = 1.0 2023-01-11T21:38:06.6725682Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6725823Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6725903Z ''') 2023-01-11T21:38:06.6725917Z 2023-01-11T21:38:06.6725922Z 2023-01-11T21:38:06.6726073Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6726150Z import triton 2023-01-11T21:38:06.6726246Z import triton.language as tl 2023-01-11T21:38:06.6726363Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6726465Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6726599Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6726733Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6726738Z 2023-01-11T21:38:06.6727137Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6727210Z @triton.jit 2023-01-11T21:38:06.6727344Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6727423Z xnumel = 12 2023-01-11T21:38:06.6727523Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6727656Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6727742Z xmask = xindex < xnumel 2023-01-11T21:38:06.6727818Z x0 = xindex % 3 2023-01-11T21:38:06.6727892Z x1 = (xindex // 3) 2023-01-11T21:38:06.6727964Z x2 = xindex 2023-01-11T21:38:06.6728199Z tmp0 = tl.load(in_ptr0 + (3 + x0 + (10*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6728276Z tmp1 = 2.0 2023-01-11T21:38:06.6728357Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6728434Z tmp3 = 1.0 2023-01-11T21:38:06.6728515Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6728644Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6728731Z ''') 2023-01-11T21:38:06.6728737Z 2023-01-11T21:38:06.6728741Z 2023-01-11T21:38:06.6728902Z triton_fused_add_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.6729012Z import triton 2023-01-11T21:38:06.6729109Z import triton.language as tl 2023-01-11T21:38:06.6729228Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6729336Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6729464Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6729594Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6729599Z 2023-01-11T21:38:06.6730001Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6730082Z @triton.jit 2023-01-11T21:38:06.6730215Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6730293Z xnumel = 16 2023-01-11T21:38:06.6730392Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6730525Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6730603Z xmask = xindex < xnumel 2023-01-11T21:38:06.6730681Z x0 = xindex % 4 2023-01-11T21:38:06.6730763Z x1 = (xindex // 4) 2023-01-11T21:38:06.6730837Z x2 = xindex 2023-01-11T21:38:06.6730968Z tmp0 = 
tl.load(in_ptr0 + (6 + x0 + (10*x1)), xmask).to(tl.float32) 2023-01-11T21:38:06.6731043Z tmp1 = 2.0 2023-01-11T21:38:06.6731127Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6731195Z tmp3 = 1.0 2023-01-11T21:38:06.6731274Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6731415Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6731504Z ''') 2023-01-11T21:38:06.6731536Z 2023-01-11T21:38:06.6731541Z 2023-01-11T21:38:06.6731638Z async_compile.wait(globals()) 2023-01-11T21:38:06.6731717Z del async_compile 2023-01-11T21:38:06.6731722Z 2023-01-11T21:38:06.6731799Z def call(args): 2023-01-11T21:38:06.6731867Z arg0_1, = args 2023-01-11T21:38:06.6731945Z args.clear() 2023-01-11T21:38:06.6732044Z with torch.cuda.device(0): 2023-01-11T21:38:06.6732253Z buf0 = empty_strided((2, 2, 1), (2, 1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6732348Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6732489Z triton_fused_add_0.run(arg0_1, buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.6732700Z buf1 = empty_strided((2, 2, 2), (4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6732843Z triton_fused_add_1_1.run(arg0_1, buf1, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6733037Z buf2 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6733180Z triton_fused_add_2_2.run(arg0_1, buf2, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6733382Z buf3 = empty_strided((2, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6733517Z triton_fused_add_3_3.run(arg0_1, buf3, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6733596Z del arg0_1 2023-01-11T21:38:06.6733695Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.6733701Z 2023-01-11T21:38:06.6733705Z 2023-01-11T21:38:06.6733789Z if __name__ == "__main__": 2023-01-11T21:38:06.6733912Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6734034Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6734249Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6734366Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6734371Z 2023-01-11T21:38:06.6734446Z ok (0.759s) 2023-01-11T21:38:06.6735067Z test_squeeze1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6735204Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6735465Z [2023-01-11 21:35:40,388] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 853 2023-01-11T21:38:06.6735727Z [2023-01-11 21:35:40,460] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 853 2023-01-11T21:38:06.6736144Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6736282Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6736541Z [2023-01-11 21:35:40,481] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 854 2023-01-11T21:38:06.6736794Z [2023-01-11 21:35:40,556] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 854 2023-01-11T21:38:06.6736799Z 2023-01-11T21:38:06.6736897Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6736973Z import torch 2023-01-11T21:38:06.6737050Z import random 2023-01-11T21:38:06.6737225Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6737353Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6737358Z 2023-01-11T21:38:06.6737441Z aten = torch.ops.aten 2023-01-11T21:38:06.6737571Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6737721Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6737727Z 2023-01-11T21:38:06.6737803Z import triton 2023-01-11T21:38:06.6737897Z import triton.language as tl 2023-01-11T21:38:06.6738028Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6738174Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6738179Z 2023-01-11T21:38:06.6738183Z 2023-01-11T21:38:06.6738358Z triton_fused_add_1_add_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6738435Z import triton 2023-01-11T21:38:06.6738523Z import triton.language as tl 2023-01-11T21:38:06.6738644Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6738748Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6738884Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6739010Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6739016Z 2023-01-11T21:38:06.6739434Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6739514Z @triton.jit 2023-01-11T21:38:06.6739661Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6739732Z xnumel = 8 2023-01-11T21:38:06.6739834Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6739967Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6740055Z xmask = xindex < xnumel 2023-01-11T21:38:06.6740129Z x0 = xindex 2023-01-11T21:38:06.6740322Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6740424Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6740490Z tmp1 = 1 2023-01-11T21:38:06.6740571Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6740644Z tmp3 = 2 2023-01-11T21:38:06.6740724Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6740807Z tmp6 = tmp5 + tmp3 2023-01-11T21:38:06.6740944Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6741080Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.6741160Z ''') 2023-01-11T21:38:06.6741172Z 2023-01-11T21:38:06.6741176Z 2023-01-11T21:38:06.6741296Z async_compile.wait(globals()) 2023-01-11T21:38:06.6741379Z del async_compile 2023-01-11T21:38:06.6741384Z 2023-01-11T21:38:06.6741462Z def call(args): 2023-01-11T21:38:06.6741538Z arg0_1, = args 
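# Both graph outputs come from the single fused kernel above: in one pass over the
# 8 input elements it computes tmp4 = (x + 1) + 2 and tmp6 = x + 2, writing into
# the two (2, 2, 2) buffers allocated below, with every size-1 dim of the
# (1, 2, 1, 2, 2, 1, 1) input squeezed away. A hedged eager-mode equivalent
# (a reconstruction, not taken from the log):
#   x = torch.randn(1, 2, 1, 2, 2, 1, 1, device="cuda")
#   out0 = torch.squeeze(x) + 1 + 2   # (2, 2, 2)
#   out1 = torch.squeeze(x) + 2       # (2, 2, 2)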
2023-01-11T21:38:06.6741616Z args.clear() 2023-01-11T21:38:06.6741713Z with torch.cuda.device(0): 2023-01-11T21:38:06.6741921Z buf0 = empty_strided((2, 2, 2), (4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6742121Z buf1 = empty_strided((2, 2, 2), (4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6742218Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6742371Z triton_fused_add_1_add_2_0.run(arg0_1, buf0, buf1, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6742447Z del arg0_1 2023-01-11T21:38:06.6742536Z return (buf0, buf1, ) 2023-01-11T21:38:06.6742541Z 2023-01-11T21:38:06.6742545Z 2023-01-11T21:38:06.6742628Z if __name__ == "__main__": 2023-01-11T21:38:06.6742750Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6742879Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6743109Z arg0_1 = rand_strided((1, 2, 1, 2, 2, 1, 1), (8, 4, 4, 2, 1, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6743223Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6743228Z 2023-01-11T21:38:06.6743233Z 2023-01-11T21:38:06.6743333Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6743409Z import torch 2023-01-11T21:38:06.6743488Z import random 2023-01-11T21:38:06.6743611Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6743737Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6743769Z 2023-01-11T21:38:06.6743855Z aten = torch.ops.aten 2023-01-11T21:38:06.6743987Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6744086Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6744091Z 2023-01-11T21:38:06.6744166Z import triton 2023-01-11T21:38:06.6744264Z import triton.language as tl 2023-01-11T21:38:06.6744392Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6744534Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6744539Z 2023-01-11T21:38:06.6744544Z 2023-01-11T21:38:06.6744713Z triton_fused_add_1_add_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6744789Z import triton 2023-01-11T21:38:06.6744876Z import triton.language as tl 2023-01-11T21:38:06.6744993Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6745096Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6745233Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6745364Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6745370Z 2023-01-11T21:38:06.6745839Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6745915Z @triton.jit 2023-01-11T21:38:06.6746060Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6746130Z xnumel = 8 2023-01-11T21:38:06.6746229Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6746361Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6746447Z xmask = xindex < xnumel 2023-01-11T21:38:06.6746521Z x0 = xindex 2023-01-11T21:38:06.6746737Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6746858Z tmp5 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6746927Z tmp1 = 1 2023-01-11T21:38:06.6747010Z tmp2 = tmp0 + tmp1 
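# fp16 variant of the same fused add_1/add_2 kernel: both loads are upcast with
# .to(tl.float32), the lines below finish tmp4 = (x + 1) + 2 and tmp6 = x + 2,
# and the results are stored back through the fp16 output pointers.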
2023-01-11T21:38:06.6747082Z tmp3 = 2 2023-01-11T21:38:06.6747163Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6747242Z tmp6 = tmp5 + tmp3 2023-01-11T21:38:06.6747406Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6747542Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.6747623Z ''') 2023-01-11T21:38:06.6747628Z 2023-01-11T21:38:06.6747633Z 2023-01-11T21:38:06.6747728Z async_compile.wait(globals()) 2023-01-11T21:38:06.6747807Z del async_compile 2023-01-11T21:38:06.6747812Z 2023-01-11T21:38:06.6747889Z def call(args): 2023-01-11T21:38:06.6747964Z arg0_1, = args 2023-01-11T21:38:06.6748041Z args.clear() 2023-01-11T21:38:06.6748136Z with torch.cuda.device(0): 2023-01-11T21:38:06.6748337Z buf0 = empty_strided((2, 2, 2), (4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6748546Z buf1 = empty_strided((2, 2, 2), (4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6748644Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6748795Z triton_fused_add_1_add_2_0.run(arg0_1, buf0, buf1, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6748873Z del arg0_1 2023-01-11T21:38:06.6748965Z return (buf0, buf1, ) 2023-01-11T21:38:06.6748970Z 2023-01-11T21:38:06.6748974Z 2023-01-11T21:38:06.6749056Z if __name__ == "__main__": 2023-01-11T21:38:06.6749175Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6749296Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6749530Z arg0_1 = rand_strided((1, 2, 1, 2, 2, 1, 1), (8, 4, 4, 2, 1, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6749646Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6749651Z 2023-01-11T21:38:06.6749725Z ok (0.189s) 2023-01-11T21:38:06.6750256Z test_squeeze2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6750394Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6750659Z [2023-01-11 21:35:40,579] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 855 2023-01-11T21:38:06.6750924Z [2023-01-11 21:35:40,657] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 855 2023-01-11T21:38:06.6751340Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6751476Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6751733Z [2023-01-11 21:35:40,679] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 856 2023-01-11T21:38:06.6751987Z [2023-01-11 21:35:40,753] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 856 2023-01-11T21:38:06.6752001Z 2023-01-11T21:38:06.6752095Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6752174Z import torch 2023-01-11T21:38:06.6752255Z import random 2023-01-11T21:38:06.6752378Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6752507Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6752513Z 2023-01-11T21:38:06.6752597Z aten = torch.ops.aten 2023-01-11T21:38:06.6752736Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6752831Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6752836Z 2023-01-11T21:38:06.6752911Z import triton 2023-01-11T21:38:06.6753005Z import triton.language as tl 2023-01-11T21:38:06.6753133Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6753300Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6753306Z 2023-01-11T21:38:06.6753311Z 2023-01-11T21:38:06.6753480Z triton_fused_add_1_add_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6753559Z import triton 2023-01-11T21:38:06.6753656Z import triton.language as tl 2023-01-11T21:38:06.6753764Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6753871Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6754005Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6754132Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6754137Z 2023-01-11T21:38:06.6754559Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6754639Z @triton.jit 2023-01-11T21:38:06.6754787Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6754863Z xnumel = 16 2023-01-11T21:38:06.6754956Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6755088Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6755174Z xmask = xindex < xnumel 2023-01-11T21:38:06.6755251Z x0 = xindex 2023-01-11T21:38:06.6755445Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6755546Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6755621Z tmp1 = 1 2023-01-11T21:38:06.6755696Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6755769Z tmp3 = 2 2023-01-11T21:38:06.6755877Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6755955Z tmp6 = tmp5 + tmp3 2023-01-11T21:38:06.6756092Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6756229Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.6756309Z ''') 2023-01-11T21:38:06.6756323Z 2023-01-11T21:38:06.6756329Z 2023-01-11T21:38:06.6756418Z async_compile.wait(globals()) 2023-01-11T21:38:06.6756498Z del async_compile 2023-01-11T21:38:06.6756503Z 2023-01-11T21:38:06.6756581Z def call(args): 2023-01-11T21:38:06.6756657Z arg0_1, = args 
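# Same fused add_1/add_2 pattern as the previous test, now over 16 elements. The
# two buffers below drop different size-1 dims of the (1, 2, 1, 2, 2, 2, 1) input,
# which is consistent with squeezing specific dims rather than all of them. A
# hedged reconstruction (not taken from the log):
#   x = torch.randn(1, 2, 1, 2, 2, 2, 1, device="cuda")
#   out0 = x.squeeze(2).squeeze(-1) + 1 + 2   # (1, 2, 2, 2, 2)
#   out1 = x.squeeze(0) + 2                   # (2, 1, 2, 2, 2, 1)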
2023-01-11T21:38:06.6756735Z args.clear() 2023-01-11T21:38:06.6756830Z with torch.cuda.device(0): 2023-01-11T21:38:06.6757056Z buf0 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6757277Z buf1 = empty_strided((2, 1, 2, 2, 2, 1), (8, 8, 4, 2, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6757377Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6757533Z triton_fused_add_1_add_2_0.run(arg0_1, buf0, buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6757609Z del arg0_1 2023-01-11T21:38:06.6757694Z return (buf0, buf1, ) 2023-01-11T21:38:06.6757699Z 2023-01-11T21:38:06.6757703Z 2023-01-11T21:38:06.6757785Z if __name__ == "__main__": 2023-01-11T21:38:06.6757908Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6758038Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6758268Z arg0_1 = rand_strided((1, 2, 1, 2, 2, 2, 1), (16, 8, 8, 4, 2, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6758383Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6758389Z 2023-01-11T21:38:06.6758393Z 2023-01-11T21:38:06.6758494Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6758569Z import torch 2023-01-11T21:38:06.6758646Z import random 2023-01-11T21:38:06.6758767Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6758898Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6758903Z 2023-01-11T21:38:06.6758989Z aten = torch.ops.aten 2023-01-11T21:38:06.6759121Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6759218Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6759223Z 2023-01-11T21:38:06.6759325Z import triton 2023-01-11T21:38:06.6759422Z import triton.language as tl 2023-01-11T21:38:06.6759548Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6759689Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6759694Z 2023-01-11T21:38:06.6759698Z 2023-01-11T21:38:06.6759867Z triton_fused_add_1_add_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6759947Z import triton 2023-01-11T21:38:06.6760035Z import triton.language as tl 2023-01-11T21:38:06.6760153Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6760259Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6760396Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6760523Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6760528Z 2023-01-11T21:38:06.6760954Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6761029Z @triton.jit 2023-01-11T21:38:06.6761175Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6761243Z xnumel = 16 2023-01-11T21:38:06.6761345Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6761479Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6761564Z xmask = xindex < xnumel 2023-01-11T21:38:06.6761638Z x0 = xindex 2023-01-11T21:38:06.6761852Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6762000Z tmp5 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6762066Z tmp1 = 1 
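# As in the fp32 version above: the arithmetic below yields tmp4 = (x + 1) + 2 and
# tmp6 = x + 2 per element, computed in fp32 after the .to(tl.float32) upcasts and
# written to the two fp16 output buffers.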
2023-01-11T21:38:06.6762148Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6762220Z tmp3 = 2 2023-01-11T21:38:06.6762304Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6762383Z tmp6 = tmp5 + tmp3 2023-01-11T21:38:06.6762522Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6762658Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.6762739Z ''') 2023-01-11T21:38:06.6762744Z 2023-01-11T21:38:06.6762755Z 2023-01-11T21:38:06.6762843Z async_compile.wait(globals()) 2023-01-11T21:38:06.6762922Z del async_compile 2023-01-11T21:38:06.6762927Z 2023-01-11T21:38:06.6763003Z def call(args): 2023-01-11T21:38:06.6763080Z arg0_1, = args 2023-01-11T21:38:06.6763159Z args.clear() 2023-01-11T21:38:06.6763254Z with torch.cuda.device(0): 2023-01-11T21:38:06.6763478Z buf0 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6763700Z buf1 = empty_strided((2, 1, 2, 2, 2, 1), (8, 8, 4, 2, 1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6763795Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6763949Z triton_fused_add_1_add_2_0.run(arg0_1, buf0, buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6764030Z del arg0_1 2023-01-11T21:38:06.6764115Z return (buf0, buf1, ) 2023-01-11T21:38:06.6764120Z 2023-01-11T21:38:06.6764124Z 2023-01-11T21:38:06.6764206Z if __name__ == "__main__": 2023-01-11T21:38:06.6764326Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6764457Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6764686Z arg0_1 = rand_strided((1, 2, 1, 2, 2, 2, 1), (16, 8, 8, 4, 2, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6764801Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6764809Z 2023-01-11T21:38:06.6764882Z ok (0.197s) 2023-01-11T21:38:06.6765369Z test_stack_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6765505Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6765768Z [2023-01-11 21:35:40,774] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 857 2023-01-11T21:38:06.6766032Z [2023-01-11 21:35:40,867] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 857 2023-01-11T21:38:06.6766449Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6766588Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6766847Z [2023-01-11 21:35:40,886] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 858 2023-01-11T21:38:06.6767117Z [2023-01-11 21:35:40,976] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 858 2023-01-11T21:38:06.6767123Z 2023-01-11T21:38:06.6767216Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6767293Z import torch 2023-01-11T21:38:06.6767369Z import random 2023-01-11T21:38:06.6767490Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6767617Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6767622Z 2023-01-11T21:38:06.6767708Z aten = torch.ops.aten 2023-01-11T21:38:06.6767874Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6767965Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6767970Z 2023-01-11T21:38:06.6768047Z import triton 2023-01-11T21:38:06.6768141Z import triton.language as tl 2023-01-11T21:38:06.6768271Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6768414Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6768420Z 2023-01-11T21:38:06.6768424Z 2023-01-11T21:38:06.6768584Z triton_fused_cat_0 = async_compile.triton(''' 2023-01-11T21:38:06.6768661Z import triton 2023-01-11T21:38:06.6768762Z import triton.language as tl 2023-01-11T21:38:06.6768872Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6768975Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6769109Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6769236Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6769244Z 2023-01-11T21:38:06.6769648Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6769726Z @triton.jit 2023-01-11T21:38:06.6769867Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6769945Z xnumel = 192 2023-01-11T21:38:06.6770037Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6770169Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6770256Z xmask = xindex < xnumel 2023-01-11T21:38:06.6770334Z x0 = xindex % 16 2023-01-11T21:38:06.6770407Z x2 = xindex 2023-01-11T21:38:06.6770506Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6770645Z tl.store(out_ptr0 + (2*x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6770725Z ''') 2023-01-11T21:38:06.6770734Z 2023-01-11T21:38:06.6770739Z 2023-01-11T21:38:06.6770893Z triton_fused_cat_1 = async_compile.triton(''' 2023-01-11T21:38:06.6770970Z import triton 2023-01-11T21:38:06.6771067Z import triton.language as tl 2023-01-11T21:38:06.6771182Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6771285Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6771454Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6771572Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6771585Z 2023-01-11T21:38:06.6771975Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: 
'*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6772049Z @triton.jit 2023-01-11T21:38:06.6772181Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6772255Z xnumel = 192 2023-01-11T21:38:06.6772354Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6772483Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6772565Z xmask = xindex < xnumel 2023-01-11T21:38:06.6772638Z x1 = (xindex // 16) 2023-01-11T21:38:06.6772708Z x2 = xindex 2023-01-11T21:38:06.6772808Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.6772943Z tl.store(out_ptr0 + (2*x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6773029Z ''') 2023-01-11T21:38:06.6773034Z 2023-01-11T21:38:06.6773039Z 2023-01-11T21:38:06.6773130Z async_compile.wait(globals()) 2023-01-11T21:38:06.6773207Z del async_compile 2023-01-11T21:38:06.6773212Z 2023-01-11T21:38:06.6773287Z def call(args): 2023-01-11T21:38:06.6773359Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6773433Z args.clear() 2023-01-11T21:38:06.6773525Z with torch.cuda.device(0): 2023-01-11T21:38:06.6773735Z buf2 = empty_strided((12, 16, 2), (32, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6773872Z buf0 = as_strided(buf2, (12, 16, 1), (32, 2, 1)) # alias 2023-01-11T21:38:06.6773964Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6774101Z triton_fused_cat_0.run(arg0_1, buf0, 192, grid=grid(192), stream=stream0) 2023-01-11T21:38:06.6774167Z del arg0_1 2023-01-11T21:38:06.6774286Z buf1 = as_strided(buf2, (12, 16, 1), (32, 2, 1), 1) # alias 2023-01-11T21:38:06.6774423Z triton_fused_cat_1.run(arg1_1, buf1, 192, grid=grid(192), stream=stream0) 2023-01-11T21:38:06.6774629Z del arg1_1 2023-01-11T21:38:06.6774709Z return (buf2, ) 2023-01-11T21:38:06.6774714Z 2023-01-11T21:38:06.6774719Z 2023-01-11T21:38:06.6774798Z if __name__ == "__main__": 2023-01-11T21:38:06.6774915Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6775042Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6775241Z arg0_1 = rand_strided((1, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6775445Z arg1_1 = rand_strided((12, 1), (1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6775564Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6775570Z 2023-01-11T21:38:06.6775574Z 2023-01-11T21:38:06.6775672Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6775747Z import torch 2023-01-11T21:38:06.6775824Z import random 2023-01-11T21:38:06.6775947Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6776070Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6776076Z 2023-01-11T21:38:06.6776150Z aten = torch.ops.aten 2023-01-11T21:38:06.6776286Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6776382Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6776387Z 2023-01-11T21:38:06.6776460Z import triton 2023-01-11T21:38:06.6776551Z import triton.language as tl 2023-01-11T21:38:06.6776676Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6776817Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6776822Z 2023-01-11T21:38:06.6776827Z 2023-01-11T21:38:06.6776978Z triton_fused_cat_0 = async_compile.triton(''' 2023-01-11T21:38:06.6777045Z import triton 
2023-01-11T21:38:06.6777189Z import triton.language as tl 2023-01-11T21:38:06.6777385Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6777490Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6777630Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6777755Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6777760Z 2023-01-11T21:38:06.6778163Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6778236Z @triton.jit 2023-01-11T21:38:06.6778370Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6778449Z xnumel = 192 2023-01-11T21:38:06.6778549Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6778687Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6778772Z xmask = xindex < xnumel 2023-01-11T21:38:06.6778850Z x0 = xindex % 16 2023-01-11T21:38:06.6778922Z x2 = xindex 2023-01-11T21:38:06.6779040Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6779187Z tl.store(out_ptr0 + (2*x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6779281Z ''') 2023-01-11T21:38:06.6779286Z 2023-01-11T21:38:06.6779291Z 2023-01-11T21:38:06.6779459Z triton_fused_cat_1 = async_compile.triton(''' 2023-01-11T21:38:06.6779534Z import triton 2023-01-11T21:38:06.6779630Z import triton.language as tl 2023-01-11T21:38:06.6779752Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6779852Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6780042Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6780178Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6780183Z 2023-01-11T21:38:06.6780655Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6780731Z @triton.jit 2023-01-11T21:38:06.6780864Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6780941Z xnumel = 192 2023-01-11T21:38:06.6781039Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6781163Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6781247Z xmask = xindex < xnumel 2023-01-11T21:38:06.6781329Z x1 = (xindex // 16) 2023-01-11T21:38:06.6781403Z x2 = xindex 2023-01-11T21:38:06.6781523Z tmp0 = tl.load(in_ptr0 + (x1), xmask).to(tl.float32) 2023-01-11T21:38:06.6781663Z tl.store(out_ptr0 + (2*x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6781750Z ''') 2023-01-11T21:38:06.6781755Z 2023-01-11T21:38:06.6781760Z 2023-01-11T21:38:06.6781856Z async_compile.wait(globals()) 2023-01-11T21:38:06.6781929Z del async_compile 2023-01-11T21:38:06.6781934Z 2023-01-11T21:38:06.6782015Z def call(args): 2023-01-11T21:38:06.6782095Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6782174Z args.clear() 2023-01-11T21:38:06.6782268Z with torch.cuda.device(0): 2023-01-11T21:38:06.6782478Z buf2 = empty_strided((12, 16, 2), (32, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6782590Z buf0 = as_strided(buf2, (12, 16, 1), (32, 2, 1)) # alias 2023-01-11T21:38:06.6782678Z stream0 = get_cuda_stream(0) 
2023-01-11T21:38:06.6782817Z triton_fused_cat_0.run(arg0_1, buf0, 192, grid=grid(192), stream=stream0) 2023-01-11T21:38:06.6782891Z del arg0_1 2023-01-11T21:38:06.6783011Z buf1 = as_strided(buf2, (12, 16, 1), (32, 2, 1), 1) # alias 2023-01-11T21:38:06.6783153Z triton_fused_cat_1.run(arg1_1, buf1, 192, grid=grid(192), stream=stream0) 2023-01-11T21:38:06.6783229Z del arg1_1 2023-01-11T21:38:06.6783309Z return (buf2, ) 2023-01-11T21:38:06.6783314Z 2023-01-11T21:38:06.6783319Z 2023-01-11T21:38:06.6783427Z if __name__ == "__main__": 2023-01-11T21:38:06.6783542Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6783670Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6783873Z arg0_1 = rand_strided((1, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6784075Z arg1_1 = rand_strided((12, 1), (1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6784198Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6784203Z 2023-01-11T21:38:06.6784276Z ok (0.223s) 2023-01-11T21:38:06.6784727Z test_std_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6784874Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6785135Z [2023-01-11 21:35:41,014] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 859 2023-01-11T21:38:06.6785342Z [2023-01-11 21:35:41,112] torch._inductor.scheduler: [DEBUG] remove_buffer('buf10') 2023-01-11T21:38:06.6785551Z [2023-01-11 21:35:41,112] torch._inductor.scheduler: [DEBUG] remove_buffer('buf2') 2023-01-11T21:38:06.6785755Z [2023-01-11 21:35:41,112] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.6785961Z [2023-01-11 21:35:41,118] torch._inductor.scheduler: [DEBUG] remove_buffer('buf7') 2023-01-11T21:38:06.6786189Z [2023-01-11 21:35:41,118] torch._inductor.scheduler: [DEBUG] remove_buffer('buf4') 2023-01-11T21:38:06.6786398Z [2023-01-11 21:35:41,123] torch._inductor.scheduler: [DEBUG] remove_buffer('buf15') 2023-01-11T21:38:06.6786602Z [2023-01-11 21:35:41,123] torch._inductor.scheduler: [DEBUG] remove_buffer('buf12') 2023-01-11T21:38:06.6786607Z 2023-01-11T21:38:06.6786709Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6786779Z import torch 2023-01-11T21:38:06.6786856Z import random 2023-01-11T21:38:06.6786978Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6787106Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6787112Z 2023-01-11T21:38:06.6787197Z aten = torch.ops.aten 2023-01-11T21:38:06.6787336Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6787434Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6787440Z 2023-01-11T21:38:06.6787518Z import triton 2023-01-11T21:38:06.6787606Z import triton.language as tl 2023-01-11T21:38:06.6787737Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6787880Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6787885Z 2023-01-11T21:38:06.6787890Z 2023-01-11T21:38:06.6788064Z triton_fused_std_var_var_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6788144Z 
import triton 2023-01-11T21:38:06.6788239Z import triton.language as tl 2023-01-11T21:38:06.6788356Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6788455Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6788589Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6788719Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6788724Z 2023-01-11T21:38:06.6788817Z @reduction(size_hints=[1, 256], 2023-01-11T21:38:06.6788935Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6789023Z filename=__file__, 2023-01-11T21:38:06.6789466Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr1', 'in_out_ptr0', 'in_out_ptr2'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 5), equal_to_1=())]}) 2023-01-11T21:38:06.6789549Z @triton.jit 2023-01-11T21:38:06.6789776Z def triton_(in_out_ptr0, in_out_ptr1, in_out_ptr2, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6789848Z xnumel = 1 2023-01-11T21:38:06.6789923Z rnumel = 256 2023-01-11T21:38:06.6790023Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6790161Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6790247Z xmask = xindex < xnumel 2023-01-11T21:38:06.6790368Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6790489Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6790591Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6790686Z rindex = roffset + rbase 2023-01-11T21:38:06.6790776Z rmask = rindex < rnumel 2023-01-11T21:38:06.6790854Z r0 = rindex 2023-01-11T21:38:06.6791051Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6791175Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.6791294Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6791406Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6791526Z _tmp8 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6791633Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6791723Z rindex = roffset + rbase 2023-01-11T21:38:06.6791810Z rmask = rindex < rnumel 2023-01-11T21:38:06.6791884Z r0 = rindex 2023-01-11T21:38:06.6792079Z tmp2 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6792149Z tmp3 = 256 2023-01-11T21:38:06.6792261Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.6792378Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.6792461Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.6792584Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.6792704Z _tmp8 = tl.where(xmask & rmask, _tmp8 + tmp2, _tmp8) 2023-01-11T21:38:06.6792823Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6792929Z tmp8 = tl.reshape(tl.sum(_tmp8, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6793049Z _tmp14 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6793169Z _tmp15 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6793280Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6793371Z rindex = roffset + rbase 2023-01-11T21:38:06.6793458Z rmask = rindex < rnumel 2023-01-11T21:38:06.6793532Z r0 = rindex 2023-01-11T21:38:06.6793721Z tmp9 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6793801Z tmp10 = 256 2023-01-11T21:38:06.6793885Z tmp11 = tmp8 / 
tmp10 2023-01-11T21:38:06.6794004Z tmp12 = tmp9 - tmp11 2023-01-11T21:38:06.6794091Z tmp13 = tmp12 * tmp12 2023-01-11T21:38:06.6794216Z _tmp14 = tl.where(xmask & rmask, _tmp14 + tmp13, _tmp14) 2023-01-11T21:38:06.6794342Z _tmp15 = tl.where(xmask & rmask, _tmp15 + tmp9, _tmp15) 2023-01-11T21:38:06.6794453Z tmp14 = tl.reshape(tl.sum(_tmp14, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6794570Z tmp15 = tl.reshape(tl.sum(_tmp15, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6794688Z _tmp21 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6794798Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6794888Z rindex = roffset + rbase 2023-01-11T21:38:06.6794975Z rmask = rindex < rnumel 2023-01-11T21:38:06.6795051Z r0 = rindex 2023-01-11T21:38:06.6795149Z tmp16 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.6795226Z tmp17 = 256 2023-01-11T21:38:06.6795315Z tmp18 = tmp15 / tmp17 2023-01-11T21:38:06.6795435Z tmp19 = tmp16 - tmp18 2023-01-11T21:38:06.6795524Z tmp20 = tmp19 * tmp19 2023-01-11T21:38:06.6795650Z _tmp21 = tl.where(xmask & rmask, _tmp21 + tmp20, _tmp21) 2023-01-11T21:38:06.6795767Z tmp21 = tl.reshape(tl.sum(_tmp21, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6795862Z tmp22 = 256 2023-01-11T21:38:06.6795948Z tmp23 = tmp21 / tmp22 2023-01-11T21:38:06.6796035Z tmp24 = tl.sqrt(tmp23) 2023-01-11T21:38:06.6796121Z tmp25 = tmp14 / tmp22 2023-01-11T21:38:06.6796196Z tmp26 = 255 2023-01-11T21:38:06.6796279Z tmp27 = tmp7 / tmp26 2023-01-11T21:38:06.6796422Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp24, None) 2023-01-11T21:38:06.6796555Z tl.store(in_out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp25, None) 2023-01-11T21:38:06.6796692Z tl.store(in_out_ptr2 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp27, None) 2023-01-11T21:38:06.6796783Z ''') 2023-01-11T21:38:06.6796789Z 2023-01-11T21:38:06.6796793Z 2023-01-11T21:38:06.6796962Z triton_fused_var_2_var_3_1 = async_compile.triton(''' 2023-01-11T21:38:06.6797040Z import triton 2023-01-11T21:38:06.6797137Z import triton.language as tl 2023-01-11T21:38:06.6797254Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6797362Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6797489Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6797616Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6797621Z 2023-01-11T21:38:06.6797713Z @reduction(size_hints=[32, 8], 2023-01-11T21:38:06.6797833Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6797920Z filename=__file__, 2023-01-11T21:38:06.6798332Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6798433Z @triton.jit 2023-01-11T21:38:06.6798619Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6798688Z xnumel = 32 2023-01-11T21:38:06.6798763Z rnumel = 8 2023-01-11T21:38:06.6798866Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6799004Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6799089Z xmask = xindex < xnumel 2023-01-11T21:38:06.6799211Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6799284Z x0 = xindex 2023-01-11T21:38:06.6799396Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6799503Z for roffset in range(0, 
rnumel, RBLOCK): 2023-01-11T21:38:06.6799593Z rindex = roffset + rbase 2023-01-11T21:38:06.6799681Z rmask = rindex < rnumel 2023-01-11T21:38:06.6799756Z r1 = rindex 2023-01-11T21:38:06.6799978Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6800102Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.6800211Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6800329Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6800451Z _tmp8 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6800559Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6800652Z rindex = roffset + rbase 2023-01-11T21:38:06.6800742Z rmask = rindex < rnumel 2023-01-11T21:38:06.6800818Z r1 = rindex 2023-01-11T21:38:06.6801029Z tmp2 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6801107Z tmp3 = 8 2023-01-11T21:38:06.6801191Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.6801310Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.6801393Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.6801520Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.6801643Z _tmp8 = tl.where(xmask & rmask, _tmp8 + tmp2, _tmp8) 2023-01-11T21:38:06.6801753Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6801870Z tmp8 = tl.reshape(tl.sum(_tmp8, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6802017Z _tmp14 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6802125Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6802215Z rindex = roffset + rbase 2023-01-11T21:38:06.6802304Z rmask = rindex < rnumel 2023-01-11T21:38:06.6802379Z r1 = rindex 2023-01-11T21:38:06.6802492Z tmp9 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.6802568Z tmp10 = 8 2023-01-11T21:38:06.6802654Z tmp11 = tmp8 / tmp10 2023-01-11T21:38:06.6802773Z tmp12 = tmp9 - tmp11 2023-01-11T21:38:06.6802863Z tmp13 = tmp12 * tmp12 2023-01-11T21:38:06.6802993Z _tmp14 = tl.where(xmask & rmask, _tmp14 + tmp13, _tmp14) 2023-01-11T21:38:06.6803115Z tmp14 = tl.reshape(tl.sum(_tmp14, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6803182Z tmp15 = 7 2023-01-11T21:38:06.6803266Z tmp16 = tmp7 / tmp15 2023-01-11T21:38:06.6803342Z tmp17 = 8 2023-01-11T21:38:06.6803424Z tmp18 = tmp14 / tmp17 2023-01-11T21:38:06.6803572Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.6803712Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.6803803Z ''') 2023-01-11T21:38:06.6803808Z 2023-01-11T21:38:06.6803812Z 2023-01-11T21:38:06.6803981Z triton_fused_std_1_std_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6804051Z import triton 2023-01-11T21:38:06.6804145Z import triton.language as tl 2023-01-11T21:38:06.6804261Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6804365Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6804501Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6804655Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6804661Z 2023-01-11T21:38:06.6804753Z @reduction(size_hints=[32, 8], 2023-01-11T21:38:06.6804868Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.6804957Z filename=__file__, 2023-01-11T21:38:06.6805366Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 
'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6805445Z @triton.jit 2023-01-11T21:38:06.6805654Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6805739Z xnumel = 32 2023-01-11T21:38:06.6805831Z rnumel = 8 2023-01-11T21:38:06.6805929Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6806060Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6806148Z xmask = xindex < xnumel 2023-01-11T21:38:06.6806273Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6806347Z x0 = xindex 2023-01-11T21:38:06.6806465Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6806575Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6806666Z rindex = roffset + rbase 2023-01-11T21:38:06.6806748Z rmask = rindex < rnumel 2023-01-11T21:38:06.6806825Z r1 = rindex 2023-01-11T21:38:06.6807043Z tmp0 = tl.load(in_ptr0 + (x0 + (32*r1)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6807165Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.6807282Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6807400Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6807518Z _tmp8 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6807620Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6807710Z rindex = roffset + rbase 2023-01-11T21:38:06.6807799Z rmask = rindex < rnumel 2023-01-11T21:38:06.6807874Z r1 = rindex 2023-01-11T21:38:06.6808122Z tmp2 = tl.load(in_ptr0 + (x0 + (32*r1)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6808199Z tmp3 = 8 2023-01-11T21:38:06.6808284Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.6808393Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.6808476Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.6808599Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.6808720Z _tmp8 = tl.where(xmask & rmask, _tmp8 + tmp2, _tmp8) 2023-01-11T21:38:06.6808836Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6808950Z tmp8 = tl.reshape(tl.sum(_tmp8, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6809069Z _tmp14 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6809173Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6809265Z rindex = roffset + rbase 2023-01-11T21:38:06.6809353Z rmask = rindex < rnumel 2023-01-11T21:38:06.6809427Z r1 = rindex 2023-01-11T21:38:06.6809546Z tmp9 = tl.load(in_ptr0 + (x0 + (32*r1)), rmask & xmask) 2023-01-11T21:38:06.6809625Z tmp10 = 8 2023-01-11T21:38:06.6809712Z tmp11 = tmp8 / tmp10 2023-01-11T21:38:06.6809824Z tmp12 = tmp9 - tmp11 2023-01-11T21:38:06.6809910Z tmp13 = tmp12 * tmp12 2023-01-11T21:38:06.6810034Z _tmp14 = tl.where(xmask & rmask, _tmp14 + tmp13, _tmp14) 2023-01-11T21:38:06.6810152Z tmp14 = tl.reshape(tl.sum(_tmp14, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6810230Z tmp15 = 7 2023-01-11T21:38:06.6810314Z tmp16 = tmp7 / tmp15 2023-01-11T21:38:06.6810404Z tmp17 = tl.sqrt(tmp16) 2023-01-11T21:38:06.6810470Z tmp18 = 8 2023-01-11T21:38:06.6810555Z tmp19 = tmp14 / tmp18 2023-01-11T21:38:06.6810681Z tmp20 = tl.sqrt(tmp19) 2023-01-11T21:38:06.6810822Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.6810961Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp20, xmask) 2023-01-11T21:38:06.6811049Z ''') 2023-01-11T21:38:06.6811055Z 
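The reduction kernels above and the pointwise triton_fused_std_3_3 that follows were all emitted for a single compiled graph (graph 859, from test_std_cuda) taking std/var reductions of a (2, 4, 4, 8) CUDA tensor over different dims. A minimal sketch of the kind of user code that produces this shape of dump; the exact set of reduction dims is illustrative, not copied from the test source:

import torch

def fn(x):
    # a mix of full-tensor and per-dim reductions, loosely mirroring test_std_cuda
    return x.std(), x.var(), x.std([1, 3]), x.var([1, 3]), x.std(2, keepdim=True)

x = torch.randn(2, 4, 4, 8, device="cuda")  # needs a CUDA device, as on this runner
expected = fn(x)
actual = torch.compile(fn)(x)  # dynamo traces fn, then Inductor emits Triton kernels like those above

Each distinct reduction layout (whole tensor, inner dims, outer dims, keepdim slice) gets its own generated kernel, which is why one small function fans out into four Triton sources in this dump.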
2023-01-11T21:38:06.6811059Z 2023-01-11T21:38:06.6811224Z triton_fused_std_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.6811301Z import triton 2023-01-11T21:38:06.6811389Z import triton.language as tl 2023-01-11T21:38:06.6811506Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6811612Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6811746Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6811876Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6811881Z 2023-01-11T21:38:06.6812288Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6812368Z @triton.jit 2023-01-11T21:38:06.6812495Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6812574Z xnumel = 64 2023-01-11T21:38:06.6812673Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6812809Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6812895Z xmask = xindex < xnumel 2023-01-11T21:38:06.6812973Z x0 = xindex % 8 2023-01-11T21:38:06.6813054Z x1 = (xindex // 8) 2023-01-11T21:38:06.6813120Z x2 = xindex 2023-01-11T21:38:06.6813229Z tmp0 = tl.load(in_ptr0 + (x0 + (32*x1)), xmask) 2023-01-11T21:38:06.6813344Z tmp1 = tl.load(in_ptr0 + (8 + x0 + (32*x1)), xmask) 2023-01-11T21:38:06.6813457Z tmp3 = tl.load(in_ptr0 + (16 + x0 + (32*x1)), xmask) 2023-01-11T21:38:06.6813567Z tmp5 = tl.load(in_ptr0 + (24 + x0 + (32*x1)), xmask) 2023-01-11T21:38:06.6813647Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6813731Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6813804Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.6813877Z tmp7 = 4 2023-01-11T21:38:06.6813957Z tmp8 = tmp6 / tmp7 2023-01-11T21:38:06.6814067Z tmp9 = tmp0 - tmp8 2023-01-11T21:38:06.6814149Z tmp10 = tmp9 * tmp9 2023-01-11T21:38:06.6814290Z tmp11 = tmp1 - tmp8 2023-01-11T21:38:06.6814377Z tmp12 = tmp11 * tmp11 2023-01-11T21:38:06.6814453Z tmp13 = tmp10 + tmp12 2023-01-11T21:38:06.6814694Z tmp14 = tmp3 - tmp8 2023-01-11T21:38:06.6814777Z tmp15 = tmp14 * tmp14 2023-01-11T21:38:06.6814861Z tmp16 = tmp13 + tmp15 2023-01-11T21:38:06.6814972Z tmp17 = tmp5 - tmp8 2023-01-11T21:38:06.6815054Z tmp18 = tmp17 * tmp17 2023-01-11T21:38:06.6815136Z tmp19 = tmp16 + tmp18 2023-01-11T21:38:06.6815202Z tmp20 = 3 2023-01-11T21:38:06.6815286Z tmp21 = tmp19 / tmp20 2023-01-11T21:38:06.6815371Z tmp22 = tl.sqrt(tmp21) 2023-01-11T21:38:06.6815511Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp22, xmask) 2023-01-11T21:38:06.6815601Z ''') 2023-01-11T21:38:06.6815606Z 2023-01-11T21:38:06.6815610Z 2023-01-11T21:38:06.6815708Z async_compile.wait(globals()) 2023-01-11T21:38:06.6815787Z del async_compile 2023-01-11T21:38:06.6815792Z 2023-01-11T21:38:06.6815862Z def call(args): 2023-01-11T21:38:06.6815938Z arg0_1, = args 2023-01-11T21:38:06.6816017Z args.clear() 2023-01-11T21:38:06.6816112Z with torch.cuda.device(0): 2023-01-11T21:38:06.6816303Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6816495Z buf3 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6816687Z buf11 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6816778Z buf21 = buf11; del buf11 # reuse 2023-01-11T21:38:06.6816870Z buf20 = buf3; del buf3 # reuse 2023-01-11T21:38:06.6816962Z buf19 = buf1; del buf1 # reuse 
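# The "bufN = bufM; del bufM  # reuse" lines above rebind freshly allocated
# scalar buffers under the names the kernel mutates in place: buf19/buf20/
# buf21 alias buf1/buf3/buf11 and are passed as the in_out_ptr arguments of
# triton_fused_std_var_var_1_0 (cf. mutated_arg_names in its @reduction meta).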
2023-01-11T21:38:06.6817057Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6817332Z triton_fused_std_var_var_1_0.run(buf21, buf20, buf19, arg0_1, 1, 256, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.6817544Z buf5 = empty_strided((2, 4, 4), (16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6817753Z buf8 = empty_strided((2, 4, 4), (16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6817848Z buf6 = buf5; del buf5 # reuse 2023-01-11T21:38:06.6817934Z buf9 = buf8; del buf8 # reuse 2023-01-11T21:38:06.6818086Z triton_fused_var_2_var_3_1.run(buf6, buf9, arg0_1, 32, 8, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.6818288Z buf13 = empty_strided((4, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6818488Z buf16 = empty_strided((4, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6818585Z buf14 = buf13; del buf13 # reuse 2023-01-11T21:38:06.6818682Z buf17 = buf16; del buf16 # reuse 2023-01-11T21:38:06.6818840Z triton_fused_std_1_std_2_2.run(buf14, buf17, arg0_1, 32, 8, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.6819055Z buf18 = empty_strided((2, 4, 1, 8), (32, 8, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6819190Z triton_fused_std_3_3.run(arg0_1, buf18, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6819270Z del arg0_1 2023-01-11T21:38:06.6819398Z return (buf19, buf20, buf6, buf9, buf21, buf14, buf17, buf18, ) 2023-01-11T21:38:06.6819404Z 2023-01-11T21:38:06.6819408Z 2023-01-11T21:38:06.6819492Z if __name__ == "__main__": 2023-01-11T21:38:06.6819613Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6819743Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6819960Z arg0_1 = rand_strided((2, 4, 4, 8), (128, 32, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6820074Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6820337Z [2023-01-11 21:35:41,429] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 859 2023-01-11T21:38:06.6820798Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6820936Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6821197Z [2023-01-11 21:35:41,614] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 860 2023-01-11T21:38:06.6821411Z [2023-01-11 21:35:41,704] torch._inductor.scheduler: [DEBUG] remove_buffer('buf10') 2023-01-11T21:38:06.6821620Z [2023-01-11 21:35:41,704] torch._inductor.scheduler: [DEBUG] remove_buffer('buf2') 2023-01-11T21:38:06.6821824Z [2023-01-11 21:35:41,704] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.6822034Z [2023-01-11 21:35:41,709] torch._inductor.scheduler: [DEBUG] remove_buffer('buf7') 2023-01-11T21:38:06.6822238Z [2023-01-11 21:35:41,709] torch._inductor.scheduler: [DEBUG] remove_buffer('buf4') 2023-01-11T21:38:06.6829207Z [2023-01-11 21:35:41,714] torch._inductor.scheduler: [DEBUG] remove_buffer('buf15') 2023-01-11T21:38:06.6829458Z [2023-01-11 21:35:41,714] torch._inductor.scheduler: [DEBUG] remove_buffer('buf12') 2023-01-11T21:38:06.6829475Z 2023-01-11T21:38:06.6829480Z 2023-01-11T21:38:06.6829577Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6829656Z import torch 2023-01-11T21:38:06.6829736Z import random 2023-01-11T21:38:06.6829859Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6829990Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6829995Z 2023-01-11T21:38:06.6830080Z aten = torch.ops.aten 2023-01-11T21:38:06.6830219Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6830392Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6830397Z 2023-01-11T21:38:06.6830475Z import triton 2023-01-11T21:38:06.6830572Z import triton.language as tl 2023-01-11T21:38:06.6830701Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6830846Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6830851Z 2023-01-11T21:38:06.6830855Z 2023-01-11T21:38:06.6831026Z triton_fused_std_var_var_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6831107Z import triton 2023-01-11T21:38:06.6831195Z import triton.language as tl 2023-01-11T21:38:06.6831311Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6831415Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6831549Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6831678Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6831683Z 2023-01-11T21:38:06.6831777Z @reduction(size_hints=[1, 256], 2023-01-11T21:38:06.6831897Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6831986Z filename=__file__, 2023-01-11T21:38:06.6832426Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr1', 'in_out_ptr0', 'in_out_ptr2'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 5), equal_to_1=())]}) 2023-01-11T21:38:06.6832502Z @triton.jit 2023-01-11T21:38:06.6832698Z def triton_(in_out_ptr0, in_out_ptr1, in_out_ptr2, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6832775Z xnumel = 1 2023-01-11T21:38:06.6832851Z rnumel = 256 2023-01-11T21:38:06.6832953Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6833091Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6833176Z xmask 
= xindex < xnumel 2023-01-11T21:38:06.6833291Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6833413Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6833521Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6833612Z rindex = roffset + rbase 2023-01-11T21:38:06.6833700Z rmask = rindex < rnumel 2023-01-11T21:38:06.6833808Z r0 = rindex 2023-01-11T21:38:06.6834030Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6834118Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.6834241Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.6834360Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6834478Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6834598Z _tmp11 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6834707Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6834798Z rindex = roffset + rbase 2023-01-11T21:38:06.6834881Z rmask = rindex < rnumel 2023-01-11T21:38:06.6834956Z r0 = rindex 2023-01-11T21:38:06.6835178Z tmp3 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6835254Z tmp4 = 256 2023-01-11T21:38:06.6835361Z tmp5 = tmp2 / tmp4 2023-01-11T21:38:06.6835463Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.6835600Z tmp7 = tmp3 - tmp6 2023-01-11T21:38:06.6835678Z tmp8 = tmp7 * tmp7 2023-01-11T21:38:06.6835800Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.6835893Z tmp10 = tmp3.to(tl.float32) 2023-01-11T21:38:06.6836018Z _tmp11 = tl.where(xmask & rmask, _tmp11 + tmp10, _tmp11) 2023-01-11T21:38:06.6836135Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6836251Z tmp11 = tl.reshape(tl.sum(_tmp11, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6836375Z _tmp18 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6836515Z _tmp20 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6836624Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6836714Z rindex = roffset + rbase 2023-01-11T21:38:06.6836803Z rmask = rindex < rnumel 2023-01-11T21:38:06.6836877Z r0 = rindex 2023-01-11T21:38:06.6837101Z tmp12 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6837179Z tmp13 = 256 2023-01-11T21:38:06.6837258Z tmp14 = tmp11 / tmp13 2023-01-11T21:38:06.6837351Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.6837471Z tmp16 = tmp12 - tmp15 2023-01-11T21:38:06.6837556Z tmp17 = tmp16 * tmp16 2023-01-11T21:38:06.6837683Z _tmp18 = tl.where(xmask & rmask, _tmp18 + tmp17, _tmp18) 2023-01-11T21:38:06.6837776Z tmp19 = tmp12.to(tl.float32) 2023-01-11T21:38:06.6837898Z _tmp20 = tl.where(xmask & rmask, _tmp20 + tmp19, _tmp20) 2023-01-11T21:38:06.6838009Z tmp18 = tl.reshape(tl.sum(_tmp18, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6838128Z tmp20 = tl.reshape(tl.sum(_tmp20, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6838247Z _tmp27 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6838358Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6838451Z rindex = roffset + rbase 2023-01-11T21:38:06.6838541Z rmask = rindex < rnumel 2023-01-11T21:38:06.6838615Z r0 = rindex 2023-01-11T21:38:06.6838730Z tmp21 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.6838805Z tmp22 = 256 2023-01-11T21:38:06.6838890Z tmp23 = tmp20 / tmp22 2023-01-11T21:38:06.6838981Z tmp24 = tmp23.to(tl.float32) 2023-01-11T21:38:06.6839102Z tmp25 = tmp21 - 
tmp24 2023-01-11T21:38:06.6839187Z tmp26 = tmp25 * tmp25 2023-01-11T21:38:06.6839312Z _tmp27 = tl.where(xmask & rmask, _tmp27 + tmp26, _tmp27) 2023-01-11T21:38:06.6839421Z tmp27 = tl.reshape(tl.sum(_tmp27, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6839500Z tmp28 = 256 2023-01-11T21:38:06.6839584Z tmp29 = tmp27 / tmp28 2023-01-11T21:38:06.6839670Z tmp30 = tl.sqrt(tmp29) 2023-01-11T21:38:06.6839752Z tmp31 = tmp18 / tmp28 2023-01-11T21:38:06.6839826Z tmp32 = 255 2023-01-11T21:38:06.6839909Z tmp33 = tmp9 / tmp32 2023-01-11T21:38:06.6840072Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp30, None) 2023-01-11T21:38:06.6840217Z tl.store(in_out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp31, None) 2023-01-11T21:38:06.6840356Z tl.store(in_out_ptr2 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp33, None) 2023-01-11T21:38:06.6840444Z ''') 2023-01-11T21:38:06.6840450Z 2023-01-11T21:38:06.6840454Z 2023-01-11T21:38:06.6840623Z triton_fused_var_2_var_3_1 = async_compile.triton(''' 2023-01-11T21:38:06.6840702Z import triton 2023-01-11T21:38:06.6840798Z import triton.language as tl 2023-01-11T21:38:06.6840915Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6841014Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6841151Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6841278Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6841283Z 2023-01-11T21:38:06.6841376Z @reduction(size_hints=[32, 8], 2023-01-11T21:38:06.6841497Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6841585Z filename=__file__, 2023-01-11T21:38:06.6841995Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6842071Z @triton.jit 2023-01-11T21:38:06.6842248Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6842324Z xnumel = 32 2023-01-11T21:38:06.6842400Z rnumel = 8 2023-01-11T21:38:06.6842500Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6842666Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6842752Z xmask = xindex < xnumel 2023-01-11T21:38:06.6842873Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6842939Z x0 = xindex 2023-01-11T21:38:06.6843060Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6843168Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6843257Z rindex = roffset + rbase 2023-01-11T21:38:06.6843343Z rmask = rindex < rnumel 2023-01-11T21:38:06.6843417Z r1 = rindex 2023-01-11T21:38:06.6843661Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6843749Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.6843877Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.6843993Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6844114Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6844235Z _tmp11 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6844342Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6844433Z rindex = roffset + rbase 2023-01-11T21:38:06.6844513Z rmask = rindex < rnumel 2023-01-11T21:38:06.6844592Z r1 = rindex 2023-01-11T21:38:06.6844833Z tmp3 = tl.load(in_ptr0 + 
(r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6844908Z tmp4 = 8 2023-01-11T21:38:06.6844992Z tmp5 = tmp2 / tmp4 2023-01-11T21:38:06.6845086Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.6845201Z tmp7 = tmp3 - tmp6 2023-01-11T21:38:06.6845278Z tmp8 = tmp7 * tmp7 2023-01-11T21:38:06.6845407Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.6845503Z tmp10 = tmp3.to(tl.float32) 2023-01-11T21:38:06.6845628Z _tmp11 = tl.where(xmask & rmask, _tmp11 + tmp10, _tmp11) 2023-01-11T21:38:06.6845748Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6845864Z tmp11 = tl.reshape(tl.sum(_tmp11, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6845982Z _tmp18 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6846109Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6846202Z rindex = roffset + rbase 2023-01-11T21:38:06.6846290Z rmask = rindex < rnumel 2023-01-11T21:38:06.6846364Z r1 = rindex 2023-01-11T21:38:06.6846502Z tmp12 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.6846578Z tmp13 = 8 2023-01-11T21:38:06.6846663Z tmp14 = tmp11 / tmp13 2023-01-11T21:38:06.6846750Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.6846871Z tmp16 = tmp12 - tmp15 2023-01-11T21:38:06.6846956Z tmp17 = tmp16 * tmp16 2023-01-11T21:38:06.6847080Z _tmp18 = tl.where(xmask & rmask, _tmp18 + tmp17, _tmp18) 2023-01-11T21:38:06.6847202Z tmp18 = tl.reshape(tl.sum(_tmp18, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6847276Z tmp19 = 7 2023-01-11T21:38:06.6847360Z tmp20 = tmp9 / tmp19 2023-01-11T21:38:06.6847427Z tmp21 = 8 2023-01-11T21:38:06.6847512Z tmp22 = tmp18 / tmp21 2023-01-11T21:38:06.6847657Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp20, xmask) 2023-01-11T21:38:06.6847802Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp22, xmask) 2023-01-11T21:38:06.6847890Z ''') 2023-01-11T21:38:06.6847896Z 2023-01-11T21:38:06.6847900Z 2023-01-11T21:38:06.6848068Z triton_fused_std_1_std_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6848146Z import triton 2023-01-11T21:38:06.6848242Z import triton.language as tl 2023-01-11T21:38:06.6848353Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6848458Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6848594Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6848748Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6848754Z 2023-01-11T21:38:06.6848845Z @reduction(size_hints=[32, 8], 2023-01-11T21:38:06.6848967Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.6849056Z filename=__file__, 2023-01-11T21:38:06.6849469Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6849538Z @triton.jit 2023-01-11T21:38:06.6849724Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6849800Z xnumel = 32 2023-01-11T21:38:06.6849874Z rnumel = 8 2023-01-11T21:38:06.6849973Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6850111Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6850201Z xmask = xindex < xnumel 2023-01-11T21:38:06.6850318Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6850391Z x0 = 
xindex 2023-01-11T21:38:06.6850509Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6850618Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6850711Z rindex = roffset + rbase 2023-01-11T21:38:06.6850799Z rmask = rindex < rnumel 2023-01-11T21:38:06.6850873Z r1 = rindex 2023-01-11T21:38:06.6851107Z tmp0 = tl.load(in_ptr0 + (x0 + (32*r1)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6851203Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.6851326Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.6851443Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6851566Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6851686Z _tmp11 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6851797Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6851880Z rindex = roffset + rbase 2023-01-11T21:38:06.6851969Z rmask = rindex < rnumel 2023-01-11T21:38:06.6852046Z r1 = rindex 2023-01-11T21:38:06.6852358Z tmp3 = tl.load(in_ptr0 + (x0 + (32*r1)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6852435Z tmp4 = 8 2023-01-11T21:38:06.6852519Z tmp5 = tmp2 / tmp4 2023-01-11T21:38:06.6852613Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.6852722Z tmp7 = tmp3 - tmp6 2023-01-11T21:38:06.6852808Z tmp8 = tmp7 * tmp7 2023-01-11T21:38:06.6852930Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.6853022Z tmp10 = tmp3.to(tl.float32) 2023-01-11T21:38:06.6853147Z _tmp11 = tl.where(xmask & rmask, _tmp11 + tmp10, _tmp11) 2023-01-11T21:38:06.6853264Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6853385Z tmp11 = tl.reshape(tl.sum(_tmp11, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6853504Z _tmp18 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6853604Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6853695Z rindex = roffset + rbase 2023-01-11T21:38:06.6853785Z rmask = rindex < rnumel 2023-01-11T21:38:06.6853859Z r1 = rindex 2023-01-11T21:38:06.6853995Z tmp12 = tl.load(in_ptr0 + (x0 + (32*r1)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.6854072Z tmp13 = 8 2023-01-11T21:38:06.6854152Z tmp14 = tmp11 / tmp13 2023-01-11T21:38:06.6854245Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.6854366Z tmp16 = tmp12 - tmp15 2023-01-11T21:38:06.6854452Z tmp17 = tmp16 * tmp16 2023-01-11T21:38:06.6854729Z _tmp18 = tl.where(xmask & rmask, _tmp18 + tmp17, _tmp18) 2023-01-11T21:38:06.6854846Z tmp18 = tl.reshape(tl.sum(_tmp18, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6854982Z tmp19 = 7 2023-01-11T21:38:06.6855057Z tmp20 = tmp9 / tmp19 2023-01-11T21:38:06.6855140Z tmp21 = tl.sqrt(tmp20) 2023-01-11T21:38:06.6855212Z tmp22 = 8 2023-01-11T21:38:06.6855293Z tmp23 = tmp18 / tmp22 2023-01-11T21:38:06.6855375Z tmp24 = tl.sqrt(tmp23) 2023-01-11T21:38:06.6855519Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp21, xmask) 2023-01-11T21:38:06.6855659Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp24, xmask) 2023-01-11T21:38:06.6855740Z ''') 2023-01-11T21:38:06.6855756Z 2023-01-11T21:38:06.6855760Z 2023-01-11T21:38:06.6855911Z triton_fused_std_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.6855986Z import triton 2023-01-11T21:38:06.6856079Z import triton.language as tl 2023-01-11T21:38:06.6856196Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6856298Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6856431Z from 
torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 8
    x1 = (xindex // 8)
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (x0 + (32*x1)), xmask).to(tl.float32)
    tmp2 = tl.load(in_ptr0 + (8 + x0 + (32*x1)), xmask).to(tl.float32)
    tmp5 = tl.load(in_ptr0 + (16 + x0 + (32*x1)), xmask).to(tl.float32)
    tmp8 = tl.load(in_ptr0 + (24 + x0 + (32*x1)), xmask).to(tl.float32)
    tmp1 = tmp0.to(tl.float32)
    tmp3 = tmp2.to(tl.float32)
    tmp4 = tmp1 + tmp3
    tmp6 = tmp5.to(tl.float32)
    tmp7 = tmp4 + tmp6
    tmp9 = tmp8.to(tl.float32)
    tmp10 = tmp7 + tmp9
    tmp11 = 4
    tmp12 = tmp10 / tmp11
    tmp13 = tmp12.to(tl.float32)
    tmp14 = tmp0 - tmp13
    tmp15 = tmp14 * tmp14
    tmp16 = tmp2 - tmp13
    tmp17 = tmp16 * tmp16
    tmp18 = tmp15 + tmp17
    tmp19 = tmp5 - tmp13
    tmp20 = tmp19 * tmp19
    tmp21 = tmp18 + tmp20
    tmp22 = tmp8 - tmp13
    tmp23 = tmp22 * tmp22
    tmp24 = tmp21 + tmp23
    tmp25 = 3
    tmp26 = tmp24 / tmp25
    tmp27 = tl.sqrt(tmp26)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp27, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf1 = empty_strided((), (), device='cuda', dtype=torch.float16)
        buf3 = empty_strided((), (), device='cuda', dtype=torch.float16)
        buf11 = empty_strided((), (), device='cuda', dtype=torch.float16)
        buf21 = buf11; del buf11  # reuse
        buf20 = buf3; del buf3  # reuse
        buf19 = buf1; del buf1  # reuse
        stream0 = get_cuda_stream(0)
        triton_fused_std_var_var_1_0.run(buf21, buf20, buf19, arg0_1, 1, 256, grid=grid(1), stream=stream0)
        buf5 = empty_strided((2, 4, 4), (16, 4, 1), device='cuda', dtype=torch.float16)
        buf8 = empty_strided((2, 4, 4), (16, 4, 1), device='cuda', dtype=torch.float16)
        buf6 = buf5; del buf5  # reuse
        buf9 = buf8; del buf8  # reuse
        triton_fused_var_2_var_3_1.run(buf6, buf9, arg0_1, 32, 8, grid=grid(32), stream=stream0)
        buf13 = empty_strided((4, 8), (8, 1), device='cuda', dtype=torch.float16)
        buf16 = empty_strided((4, 8), (8, 1), device='cuda', dtype=torch.float16)
        buf14 = buf13; del buf13  # reuse
        buf17 = buf16; del buf16  # reuse
        triton_fused_std_1_std_2_2.run(buf14, buf17, arg0_1, 32, 8, grid=grid(32), stream=stream0)
        buf18 = empty_strided((2, 4, 1, 8), (32, 8, 8, 1), device='cuda', dtype=torch.float16)
        triton_fused_std_3_3.run(arg0_1, buf18, 64, grid=grid(64), stream=stream0)
        del arg0_1
        return (buf19, buf20, buf6, buf9, buf21, buf14, buf17, buf18, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 4, 4, 8), (128, 32, 8, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

[2023-01-11 21:35:41,877] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 860

ok (0.902s)
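Note: the size-64 pointwise kernel above unrolls a length-4 reduction in registers: four fp16 loads are upcast to fp32, averaged (tmp10 / 4), the squared deviations are summed and divided by 3 (Bessel's correction), and the square root is stored back as fp16, i.e. an unbiased std over one length-4 dimension. A minimal eager-mode sketch of the same arithmetic, assuming the (2, 4, 4, 8) fp16 input from the rand_strided call above and reduction over dim=2 (variable names here are illustrative, not from the log):

import torch

x = torch.randn(2, 4, 4, 8, device='cuda', dtype=torch.float16)
acc = x.to(torch.float32)                    # fp16 loads are upcast before accumulating
mean = acc.sum(dim=2, keepdim=True) / 4      # tmp10 / tmp11
var = ((acc - mean) ** 2).sum(dim=2) / 3     # squared deviations / (N - 1)
std = var.sqrt().to(torch.float16)           # tmp27, stored back as fp16
print(torch.allclose(std.float(), x.std(dim=2).float(), rtol=1e-2, atol=1e-2))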
test_strided_inputs_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')`
  warnings.warn(
[2023-01-11 21:35:41,892] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 861
[2023-01-11 21:35:41,960] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 861

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 128
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (2*x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 16), (16, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, arg1_1, buf0, 128, grid=grid(128), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 16), (32, 2), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, 16), (16, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.259s)
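Note: this test feeds the kernel a non-contiguous view: arg0_1 has shape (8, 16) but strides (32, 2), and Inductor folds those strides into the flat index, which is why the kernel reads in_ptr0 + (2*x0) while in_ptr1 is read contiguously. A small illustration of the same layout with as_strided (hypothetical names, not from the log):

import torch

# Element [i, j] of an (8, 16) view with strides (32, 2) lives at offset
# 32*i + 2*j = 2*(16*i + j), i.e. exactly the 2*x0 indexing above.
base = torch.randn(8 * 32, device='cuda')
x = base.as_strided((8, 16), (32, 2))
y = torch.randn(8, 16, device='cuda')
out = x + y                            # what call() materializes into buf0
print(x.is_contiguous(), out.shape)    # False torch.Size([8, 16])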
test_sum1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:42,154] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 862
[2023-01-11 21:35:42,233] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 862
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:42,249] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 863
[2023-01-11 21:35:42,327] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 863

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_sum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[8, 8],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 8
    rnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask)
        tmp1 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask)
        tmp2 = tmp0 + tmp1
        _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3)
    tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x0, tmp3, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_sum_1_0.run(arg0_1, arg1_1, buf0, 8, 8, grid=grid(8), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_sum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[8, 8],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 8
    rnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32)
        tmp1 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask).to(tl.float32)
        tmp2 = tmp0 + tmp1
        _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3)
    tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x0, tmp3, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_sum_1_0.run(arg0_1, arg1_1, buf0, 8, 8, grid=grid(8), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.189s)
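Note: both generated modules above (fp32 and fp16) use the same reduction template: a [XBLOCK, RBLOCK] fp32 accumulator is updated under rmask & xmask inside the roffset loop, then collapsed with tl.sum and stored once per row. A stripped-down sketch of that pattern against a recent Triton (the kernel name and signature here are mine, not Inductor's):

import torch
import triton
import triton.language as tl

@triton.jit
def row_sum_kernel(in_ptr, out_ptr, n_cols, BLOCK: tl.constexpr):
    # One program per row; the mask plays the role of rmask above.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK)
    mask = cols < n_cols
    vals = tl.load(in_ptr + row * n_cols + cols, mask=mask, other=0.0)
    tl.store(out_ptr + row, tl.sum(vals, axis=0))

x = torch.randn(8, 8, device='cuda')
out = torch.empty(8, device='cuda')
row_sum_kernel[(8,)](x, out, 8, BLOCK=8)
print(torch.allclose(out, x.sum(dim=1)))  # True up to float rounding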
test_sum2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:42,348] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 864
[2023-01-11 21:35:42,743] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 864
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:42,765] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 865

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_sum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[256, 32],
           reduction_hint=ReductionHint.DEFAULT,
           filename=__file__,
           meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 168
    rnumel = 27
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex % 21
    x1 = (xindex // 21)
    _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    x3 = xindex
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r2 = rindex
        tmp0 = tl.load(in_ptr0 + (x0 + (21*r2) + (567*x1)), rmask & xmask, eviction_policy='evict_last')
        tmp1 = tl.load(in_ptr1 + (x0 + (21*r2) + (567*x1)), rmask & xmask, eviction_policy='evict_last')
        tmp2 = tmp0 + tmp1
        _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3)
    tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x3, tmp3, xmask)
''')


triton_fused_add_1_sum_2_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[256, 32],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 216
    rnumel = 21
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (21*x0)), rmask & xmask)
        tmp1 = tl.load(in_ptr1 + (r1 + (21*x0)), rmask & xmask)
        tmp2 = tmp0 + tmp1
        _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3)
    tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x0, tmp3, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 21), (21, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_sum_1_0.run(arg0_1, arg1_1, buf0, 168, 27, grid=grid(168), stream=stream0)
        buf1 = empty_strided((8, 9, 3), (27, 3, 1), device='cuda', dtype=torch.float32)
        triton_fused_add_1_sum_2_1.run(arg0_1, arg1_1, buf1, 216, 21, grid=grid(216), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 9, 3, 21), (567, 63, 21, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, 9, 3, 21), (567, 63, 21, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

[2023-01-11 21:35:43,093] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 865

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_sum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[256, 32],
           reduction_hint=ReductionHint.DEFAULT,
           filename=__file__,
           meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 168
    rnumel = 27
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex % 21
    x1 = (xindex // 21)
    _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    x3 = xindex
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r2 = rindex
        tmp0 = tl.load(in_ptr0 + (x0 + (21*r2) + (567*x1)), rmask & xmask, eviction_policy='evict_last').to(tl.float32)
        tmp1 = tl.load(in_ptr1 + (x0 + (21*r2) + (567*x1)), rmask & xmask, eviction_policy='evict_last').to(tl.float32)
        tmp2 = tmp0 + tmp1
        _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3)
    tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x3, tmp3, xmask)
''')


triton_fused_add_1_sum_2_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[256, 32],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 216
    rnumel = 21
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (21*x0)), rmask & xmask).to(tl.float32)
        tmp1 = tl.load(in_ptr1 + (r1 + (21*x0)), rmask & xmask).to(tl.float32)
        tmp2 = tmp0 + tmp1
        _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3)
    tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x0, tmp3, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 21), (21, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_sum_1_0.run(arg0_1, arg1_1, buf0, 168, 27, grid=grid(168), stream=stream0)
        buf1 = empty_strided((8, 9, 3), (27, 3, 1), device='cuda', dtype=torch.float16)
        triton_fused_add_1_sum_2_1.run(arg0_1, arg1_1, buf1, 216, 21, grid=grid(216), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 9, 3, 21), (567, 63, 21, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((8, 9, 3, 21), (567, 63, 21, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.766s)
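Note: test_sum2 lowers two reductions over the same (8, 9, 3, 21) inputs, and the hints track the reduced dimensions: the first kernel reduces the middle dims (stride-21 loads, hence eviction_policy='evict_last' and ReductionHint.DEFAULT), while the second reduces the contiguous innermost dim (ReductionHint.INNER). A plausible eager equivalent, inferred from the buffer shapes above (an assumption, not taken from the log):

import torch

a = torch.randn(8, 9, 3, 21, device='cuda')
b = torch.randn(8, 9, 3, 21, device='cuda')
s0 = (a + b).sum(dim=(1, 2))   # (8, 21): strided middle-dim reduction -> buf0
s1 = (a + b).sum(dim=3)        # (8, 9, 3): innermost reduction -> buf1
print(s0.shape, s1.shape)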
test_sum3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:43,114] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 866
[2023-01-11 21:35:43,310] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 866
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:43,330] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 867

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_sum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[16, 16],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 10
    rnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (10*x0)), rmask & xmask)
        tmp1 = tl.load(in_ptr1 + (r1), rmask)
        tmp2 = tmp0 + tmp1
        tl.store(out_ptr0 + (r1 + (10*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp2, rmask & xmask)
    _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp3 = tl.load(out_ptr0 + (r1 + (10*x0)), rmask & xmask)
        _tmp4 = tl.where(xmask & rmask, _tmp4 + tmp3, _tmp4)
    tmp4 = tl.reshape(tl.sum(_tmp4, 1), [XBLOCK, 1])
    tl.store(out_ptr1 + x0, tmp4, xmask)
''')


triton_fused_add_1_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = 10
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_sum_1_0.run(arg0_1, arg1_1, buf0, buf1, 10, 10, grid=grid(10), stream=stream0)
        del arg0_1
        buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32)
        triton_fused_add_1_1.run(arg1_1, buf2, 10, grid=grid(10), stream=stream0)
        del arg1_1
        return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 10), (10, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

[2023-01-11 21:35:43,430] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 867

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_sum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[16, 16],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 10
    rnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (10*x0)), rmask & xmask).to(tl.float32)
        tmp1 = tl.load(in_ptr1 + (r1), rmask).to(tl.float32)
        tmp2 = tmp0 + tmp1
        tl.store(out_ptr0 + (r1 + (10*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp2, rmask & xmask)
    _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp3 = tl.load(out_ptr0 + (r1 + (10*x0)), rmask & xmask).to(tl.float32)
        _tmp4 = tl.where(xmask & rmask, _tmp4 + tmp3, _tmp4)
    tmp4 = tl.reshape(tl.sum(_tmp4, 1), [XBLOCK, 1])
    tl.store(out_ptr1 + x0, tmp4, xmask)
''')


triton_fused_add_1_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp1 = 10
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_sum_1_0.run(arg0_1, arg1_1, buf0, buf1, 10, 10, grid=grid(10), stream=stream0)
        del arg0_1
        buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16)
        triton_fused_add_1_1.run(arg1_1, buf2, 10, grid=grid(10), stream=stream0)
        del arg1_1
        return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((1, 10), (10, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.338s)
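Note: the fused kernel in test_sum3 runs two passes inside one program: the first roffset loop computes the broadcast add and stores it to out_ptr0 (buf0), and the second loop re-reads that buffer to accumulate the row sums (buf1); the +10 on arg1_1 cannot fuse with the reduction and gets its own pointwise kernel (buf2). A plausible eager equivalent, inferred from the buffer shapes above (an assumption, not taken from the log):

import torch

x = torch.randn(10, 10, device='cuda')
y = torch.randn(1, 10, device='cuda')
z = x + y                 # first loop: stored to buf0
s = z.sum(dim=1)          # second loop: re-reads buf0, accumulates into buf1
w = (y + 10).squeeze(0)   # separate pointwise kernel -> buf2, shape (10,)
print(z.shape, s.shape, w.shape)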
test_sum4_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:43,453] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 868
[2023-01-11 21:35:43,689] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 868
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:43,711] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 869

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_add_1_sum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[128, 8],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 128
    rnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask)
        tmp1 = 1
        tmp2 = tmp0 + tmp1
        tl.store(out_ptr0 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp2, rmask & xmask)
    _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp3 = tl.load(out_ptr0 + (r1 + (8*x0)), rmask & xmask)
        _tmp4 = tl.where(xmask & rmask, _tmp4 + tmp3, _tmp4)
    tmp4 = tl.reshape(tl.sum(_tmp4, 1), [XBLOCK, 1])
    tl.store(out_ptr1 + x0, tmp4, xmask)
    tmp5 = 3
    tmp6 = tmp4 + tmp5
    tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp6, xmask)
''')


triton_fused_add_1_add_2_sum_2_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[16, 8],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 16
    rnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask)
        _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1)
    tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x0, tmp1, xmask)
    tmp2 = 5
    tmp3 = tmp1 + tmp2
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp3, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 16, 8, 8), (1024, 64, 8, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((1, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float32)
        buf2 = empty_strided((1, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_add_1_sum_1_0.run(arg0_1, buf0, buf1, buf2, 128, 8, grid=grid(128), stream=stream0)
        del arg0_1
        buf3 = empty_strided((1, 16), (16, 1), device='cuda', dtype=torch.float32)
        buf4 = empty_strided((1, 16), (16, 1), device='cuda', dtype=torch.float32)
        triton_fused_add_1_add_2_sum_2_1.run(buf2, buf3, buf4, 16, 8, grid=grid(16), stream=stream0)
        return (buf4, buf3, buf2, buf1, buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 16, 8, 8), (1024, 64, 8, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

[2023-01-11 21:35:43,832] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 869

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_add_1_sum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[128, 8],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 128
    rnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32)
        tmp1 = 1
        tmp2 = tmp0 + tmp1
        tl.store(out_ptr0 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp2, rmask & xmask)
    _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp3 = tl.load(out_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32)
        _tmp4 = tl.where(xmask & rmask, _tmp4 + tmp3, _tmp4)
    tmp4 = tl.reshape(tl.sum(_tmp4, 1), [XBLOCK, 1])
    tl.store(out_ptr1 + x0, tmp4, xmask)
    tmp5 = 3
    tmp6 = tmp4 + tmp5
    tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp6, xmask)
''')


triton_fused_add_1_add_2_sum_2_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[16, 8],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 16
    rnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32)
        _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1)
    tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x0, tmp1, xmask)
    tmp2 = 5
    tmp3 = tmp1 + tmp2
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp3, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 16, 8, 8), (1024, 64, 8, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((1, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float16)
        buf2 = empty_strided((1, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_add_1_sum_1_0.run(arg0_1, buf0, buf1, buf2, 128, 8, grid=grid(128), stream=stream0)
        del arg0_1
        buf3 = empty_strided((1, 16), (16, 1), device='cuda', dtype=torch.float16)
        buf4 = empty_strided((1, 16), (16, 1), device='cuda', dtype=torch.float16)
        triton_fused_add_1_add_2_sum_2_1.run(buf2, buf3, buf4, 16, 8, grid=grid(16), stream=stream0)
        return (buf4, buf3, buf2, buf1, buf0, )


if __name__ == "__main__":
from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6959684Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6959811Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6959816Z 2023-01-11T21:38:06.6959909Z @reduction(size_hints=[16, 8], 2023-01-11T21:38:06.6960027Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6960113Z filename=__file__, 2023-01-11T21:38:06.6960498Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6960576Z @triton.jit 2023-01-11T21:38:06.6960775Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6960854Z xnumel = 16 2023-01-11T21:38:06.6960928Z rnumel = 8 2023-01-11T21:38:06.6961028Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6961165Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6961250Z xmask = xindex < xnumel 2023-01-11T21:38:06.6961372Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6961438Z x0 = xindex 2023-01-11T21:38:06.6961557Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6961665Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6961760Z rindex = roffset + rbase 2023-01-11T21:38:06.6961848Z rmask = rindex < rnumel 2023-01-11T21:38:06.6961922Z r1 = rindex 2023-01-11T21:38:06.6962056Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.6962172Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.6962293Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6962393Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.6962472Z tmp2 = 5 2023-01-11T21:38:06.6962555Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.6962692Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.6962781Z ''') 2023-01-11T21:38:06.6962786Z 2023-01-11T21:38:06.6962790Z 2023-01-11T21:38:06.6962886Z async_compile.wait(globals()) 2023-01-11T21:38:06.6962959Z del async_compile 2023-01-11T21:38:06.6962964Z 2023-01-11T21:38:06.6963044Z def call(args): 2023-01-11T21:38:06.6963121Z arg0_1, = args 2023-01-11T21:38:06.6963227Z args.clear() 2023-01-11T21:38:06.6963321Z with torch.cuda.device(0): 2023-01-11T21:38:06.6963546Z buf0 = empty_strided((1, 16, 8, 8), (1024, 64, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6963759Z buf1 = empty_strided((1, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6963961Z buf2 = empty_strided((1, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6964057Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6964219Z triton_fused_add_add_1_sum_1_0.run(arg0_1, buf0, buf1, buf2, 128, 8, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.6964295Z del arg0_1 2023-01-11T21:38:06.6964500Z buf3 = empty_strided((1, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6964697Z buf4 = empty_strided((1, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6964857Z triton_fused_add_1_add_2_sum_2_1.run(buf2, buf3, buf4, 16, 8, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6964969Z return (buf4, buf3, buf2, buf1, buf0, ) 2023-01-11T21:38:06.6964974Z 2023-01-11T21:38:06.6964979Z 2023-01-11T21:38:06.6965063Z if __name__ == "__main__": 
2023-01-11T21:38:06.6965177Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6965311Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6965555Z arg0_1 = rand_strided((1, 16, 8, 8), (1024, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6965681Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6965687Z 2023-01-11T21:38:06.6965774Z ok (0.402s) 2023-01-11T21:38:06.6966228Z test_sum5_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6966363Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6966627Z [2023-01-11 21:35:43,855] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 870 2023-01-11T21:38:06.6966924Z [2023-01-11 21:35:44,053] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 870 2023-01-11T21:38:06.6967334Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6967468Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6967724Z [2023-01-11 21:35:44,074] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 871 2023-01-11T21:38:06.6967733Z 2023-01-11T21:38:06.6967833Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6967911Z import torch 2023-01-11T21:38:06.6967987Z import random 2023-01-11T21:38:06.6968109Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6968238Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6968244Z 2023-01-11T21:38:06.6968329Z aten = torch.ops.aten 2023-01-11T21:38:06.6968462Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6968560Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6968565Z 2023-01-11T21:38:06.6968641Z import triton 2023-01-11T21:38:06.6968738Z import triton.language as tl 2023-01-11T21:38:06.6968865Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6969008Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6969014Z 2023-01-11T21:38:06.6969042Z 2023-01-11T21:38:06.6969207Z triton_fused_add_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6969284Z import triton 2023-01-11T21:38:06.6969373Z import triton.language as tl 2023-01-11T21:38:06.6969488Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6969592Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6969728Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6969858Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6969863Z 2023-01-11T21:38:06.6969958Z @reduction(size_hints=[256, 16], 2023-01-11T21:38:06.6970074Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6970155Z filename=__file__, 2023-01-11T21:38:06.6970515Z meta={'signature': {0: '*fp32', 1: 
'*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6970594Z @triton.jit 2023-01-11T21:38:06.6970769Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6970848Z xnumel = 136 2023-01-11T21:38:06.6970922Z rnumel = 9 2023-01-11T21:38:06.6971026Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6971167Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6971246Z xmask = xindex < xnumel 2023-01-11T21:38:06.6971366Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6971442Z x0 = xindex 2023-01-11T21:38:06.6971561Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6971671Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6971763Z rindex = roffset + rbase 2023-01-11T21:38:06.6971849Z rmask = rindex < rnumel 2023-01-11T21:38:06.6971916Z r1 = rindex 2023-01-11T21:38:06.6972037Z tmp0 = tl.load(in_ptr0 + (r1 + (9*x0)), rmask & xmask) 2023-01-11T21:38:06.6972112Z tmp1 = 1 2023-01-11T21:38:06.6972199Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6972322Z _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3) 2023-01-11T21:38:06.6972440Z tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6972540Z tl.store(out_ptr0 + x0, tmp3, xmask) 2023-01-11T21:38:06.6972621Z ''') 2023-01-11T21:38:06.6972626Z 2023-01-11T21:38:06.6972659Z 2023-01-11T21:38:06.6972853Z triton_fused_add_add_1_add_2_sum_1_sum_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.6972930Z import triton 2023-01-11T21:38:06.6973026Z import triton.language as tl 2023-01-11T21:38:06.6973147Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6973252Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6973388Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6973509Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6973514Z 2023-01-11T21:38:06.6973605Z @reduction(size_hints=[32, 8], 2023-01-11T21:38:06.6973724Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6973812Z filename=__file__, 2023-01-11T21:38:06.6974186Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6974267Z @triton.jit 2023-01-11T21:38:06.6974440Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6974644Z xnumel = 17 2023-01-11T21:38:06.6974710Z rnumel = 8 2023-01-11T21:38:06.6974807Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6974942Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6975025Z xmask = xindex < xnumel 2023-01-11T21:38:06.6975145Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6975214Z x0 = xindex 2023-01-11T21:38:06.6975331Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6975480Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6975570Z rindex = roffset + rbase 2023-01-11T21:38:06.6975658Z rmask = rindex < rnumel 2023-01-11T21:38:06.6975732Z r1 = rindex 2023-01-11T21:38:06.6975855Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.6975930Z tmp1 = 3 2023-01-11T21:38:06.6976014Z tmp2 = tmp0 + tmp1 
2023-01-11T21:38:06.6976130Z _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3) 2023-01-11T21:38:06.6976247Z tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6976320Z tmp4 = 5 2023-01-11T21:38:06.6976402Z tmp5 = tmp3 + tmp4 2023-01-11T21:38:06.6976544Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.6976632Z ''') 2023-01-11T21:38:06.6976638Z 2023-01-11T21:38:06.6976642Z 2023-01-11T21:38:06.6976737Z async_compile.wait(globals()) 2023-01-11T21:38:06.6976813Z del async_compile 2023-01-11T21:38:06.6976829Z 2023-01-11T21:38:06.6976898Z def call(args): 2023-01-11T21:38:06.6976973Z arg0_1, = args 2023-01-11T21:38:06.6977051Z args.clear() 2023-01-11T21:38:06.6977197Z with torch.cuda.device(0): 2023-01-11T21:38:06.6977442Z buf0 = empty_strided((1, 17, 8), (136, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6977537Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6977683Z triton_fused_add_sum_1_0.run(arg0_1, buf0, 136, 9, grid=grid(136), stream=stream0) 2023-01-11T21:38:06.6977752Z del arg0_1 2023-01-11T21:38:06.6977957Z buf1 = empty_strided((1, 17), (17, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6978050Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.6978208Z triton_fused_add_add_1_add_2_sum_1_sum_2_1.run(buf2, buf0, 17, 8, grid=grid(17), stream=stream0) 2023-01-11T21:38:06.6978290Z return (buf2, ) 2023-01-11T21:38:06.6978296Z 2023-01-11T21:38:06.6978304Z 2023-01-11T21:38:06.6978385Z if __name__ == "__main__": 2023-01-11T21:38:06.6978503Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6978632Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6978848Z arg0_1 = rand_strided((1, 17, 8, 9), (1224, 72, 9, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6979007Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6979013Z 2023-01-11T21:38:06.6979281Z [2023-01-11 21:35:44,194] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 871 2023-01-11T21:38:06.6979287Z 2023-01-11T21:38:06.6979387Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6979466Z import torch 2023-01-11T21:38:06.6979541Z import random 2023-01-11T21:38:06.6979662Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6979780Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6979792Z 2023-01-11T21:38:06.6979869Z aten = torch.ops.aten 2023-01-11T21:38:06.6980013Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6980111Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6980116Z 2023-01-11T21:38:06.6980192Z import triton 2023-01-11T21:38:06.6980286Z import triton.language as tl 2023-01-11T21:38:06.6980414Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6980555Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6980561Z 2023-01-11T21:38:06.6980565Z 2023-01-11T21:38:06.6980728Z triton_fused_add_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6980798Z import triton 2023-01-11T21:38:06.6980893Z import triton.language as tl 2023-01-11T21:38:06.6981007Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6981112Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6981250Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6981377Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6981409Z 2023-01-11T21:38:06.6981504Z 
@reduction(size_hints=[256, 16], 2023-01-11T21:38:06.6981614Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6981702Z filename=__file__, 2023-01-11T21:38:06.6982067Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6982143Z @triton.jit 2023-01-11T21:38:06.6982313Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6982389Z xnumel = 136 2023-01-11T21:38:06.6982464Z rnumel = 9 2023-01-11T21:38:06.6982564Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6982694Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6982780Z xmask = xindex < xnumel 2023-01-11T21:38:06.6982902Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6982978Z x0 = xindex 2023-01-11T21:38:06.6983098Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6983206Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6983296Z rindex = roffset + rbase 2023-01-11T21:38:06.6983376Z rmask = rindex < rnumel 2023-01-11T21:38:06.6983455Z r1 = rindex 2023-01-11T21:38:06.6983590Z tmp0 = tl.load(in_ptr0 + (r1 + (9*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.6983664Z tmp1 = 1 2023-01-11T21:38:06.6983748Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6983870Z _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3) 2023-01-11T21:38:06.6983987Z tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6984081Z tl.store(out_ptr0 + x0, tmp3, xmask) 2023-01-11T21:38:06.6984168Z ''') 2023-01-11T21:38:06.6984173Z 2023-01-11T21:38:06.6984178Z 2023-01-11T21:38:06.6984371Z triton_fused_add_add_1_add_2_sum_1_sum_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.6984453Z import triton 2023-01-11T21:38:06.6984548Z import triton.language as tl 2023-01-11T21:38:06.6984666Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6984770Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6984897Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6985054Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6985059Z 2023-01-11T21:38:06.6985152Z @reduction(size_hints=[32, 8], 2023-01-11T21:38:06.6985269Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6985355Z filename=__file__, 2023-01-11T21:38:06.6985729Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6985807Z @triton.jit 2023-01-11T21:38:06.6985978Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6986050Z xnumel = 17 2023-01-11T21:38:06.6986129Z rnumel = 8 2023-01-11T21:38:06.6986228Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6986364Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6986449Z xmask = xindex < xnumel 2023-01-11T21:38:06.6986573Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6986646Z x0 = xindex 2023-01-11T21:38:06.6986757Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6986865Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6986954Z rindex = roffset + rbase 2023-01-11T21:38:06.6987041Z rmask = rindex < rnumel 
2023-01-11T21:38:06.6987115Z r1 = rindex 2023-01-11T21:38:06.6987248Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.6987323Z tmp1 = 3 2023-01-11T21:38:06.6987400Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6987551Z _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3) 2023-01-11T21:38:06.6987667Z tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6987740Z tmp4 = 5 2023-01-11T21:38:06.6987821Z tmp5 = tmp3 + tmp4 2023-01-11T21:38:06.6987964Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.6988053Z ''') 2023-01-11T21:38:06.6988059Z 2023-01-11T21:38:06.6988063Z 2023-01-11T21:38:06.6988159Z async_compile.wait(globals()) 2023-01-11T21:38:06.6988231Z del async_compile 2023-01-11T21:38:06.6988236Z 2023-01-11T21:38:06.6988314Z def call(args): 2023-01-11T21:38:06.6988389Z arg0_1, = args 2023-01-11T21:38:06.6988466Z args.clear() 2023-01-11T21:38:06.6988561Z with torch.cuda.device(0): 2023-01-11T21:38:06.6988771Z buf0 = empty_strided((1, 17, 8), (136, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6988866Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6989010Z triton_fused_add_sum_1_0.run(arg0_1, buf0, 136, 9, grid=grid(136), stream=stream0) 2023-01-11T21:38:06.6989086Z del arg0_1 2023-01-11T21:38:06.6989291Z buf1 = empty_strided((1, 17), (17, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6989384Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.6989547Z triton_fused_add_add_1_add_2_sum_1_sum_2_1.run(buf2, buf0, 17, 8, grid=grid(17), stream=stream0) 2023-01-11T21:38:06.6989631Z return (buf2, ) 2023-01-11T21:38:06.6989636Z 2023-01-11T21:38:06.6989640Z 2023-01-11T21:38:06.6989723Z if __name__ == "__main__": 2023-01-11T21:38:06.6989843Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6989966Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6990187Z arg0_1 = rand_strided((1, 17, 8, 9), (1224, 72, 9, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6990304Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6990309Z 2023-01-11T21:38:06.6990387Z ok (0.361s) 2023-01-11T21:38:06.6990874Z test_sum_dtype_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6991012Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6991274Z [2023-01-11 21:35:44,215] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 872 2023-01-11T21:38:06.6991484Z [2023-01-11 21:35:44,231] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.6991749Z [2023-01-11 21:35:44,315] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 872 2023-01-11T21:38:06.6992162Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6992294Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6992553Z [2023-01-11 21:35:44,334] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 873 2023-01-11T21:38:06.6992765Z [2023-01-11 21:35:44,348] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.6992770Z 2023-01-11T21:38:06.6992870Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6992947Z import torch 2023-01-11T21:38:06.6993024Z import random 2023-01-11T21:38:06.6993145Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6993272Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6993304Z 2023-01-11T21:38:06.6993383Z aten = torch.ops.aten 2023-01-11T21:38:06.6993522Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6993621Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6993626Z 2023-01-11T21:38:06.6993705Z import triton 2023-01-11T21:38:06.6993799Z import triton.language as tl 2023-01-11T21:38:06.6993929Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6994072Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6994077Z 2023-01-11T21:38:06.6994082Z 2023-01-11T21:38:06.6994240Z triton_fused_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6994310Z import triton 2023-01-11T21:38:06.6994404Z import triton.language as tl 2023-01-11T21:38:06.6994520Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6994626Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6994761Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6994894Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6994899Z 2023-01-11T21:38:06.6994990Z @reduction(size_hints=[32, 32], 2023-01-11T21:38:06.6995100Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6995206Z filename=__file__, 2023-01-11T21:38:06.6995608Z meta={'signature': {0: '*fp32', 1: '*fp64', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6995685Z @triton.jit 2023-01-11T21:38:06.6995859Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6995937Z xnumel = 32 2023-01-11T21:38:06.6996012Z rnumel = 32 2023-01-11T21:38:06.6996111Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6996241Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6996325Z xmask = xindex < xnumel 2023-01-11T21:38:06.6996453Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6996526Z x0 = xindex 2023-01-11T21:38:06.6996644Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float64) + 0 2023-01-11T21:38:06.6996753Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6996846Z rindex = roffset + rbase 2023-01-11T21:38:06.6996955Z rmask = rindex < rnumel 2023-01-11T21:38:06.6997030Z r1 = rindex 2023-01-11T21:38:06.6997249Z tmp0 = tl.load(in_ptr0 + (r1 + (32*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6997342Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.6997466Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.6997585Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6997684Z 
tl.store(out_ptr0 + x0, tmp2, xmask) 2023-01-11T21:38:06.6997764Z ''') 2023-01-11T21:38:06.6997770Z 2023-01-11T21:38:06.6997782Z 2023-01-11T21:38:06.6997938Z triton_fused_add_sum_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.6998019Z import triton 2023-01-11T21:38:06.6998117Z import triton.language as tl 2023-01-11T21:38:06.6998234Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6998339Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6998477Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6998606Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6998612Z 2023-01-11T21:38:06.6998700Z @reduction(size_hints=[1, 1024], 2023-01-11T21:38:06.6998816Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6998903Z filename=__file__, 2023-01-11T21:38:06.6999281Z meta={'signature': {0: '*fp32', 1: '*fp64', 2: '*fp64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.6999360Z @triton.jit 2023-01-11T21:38:06.6999541Z def triton_(in_ptr0, in_ptr1, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6999656Z xnumel = 1 2023-01-11T21:38:06.6999733Z rnumel = 1024 2023-01-11T21:38:06.6999826Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6999962Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7000052Z xmask = xindex < xnumel 2023-01-11T21:38:06.7000176Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7000297Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float64) + 0 2023-01-11T21:38:06.7000405Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7000497Z rindex = roffset + rbase 2023-01-11T21:38:06.7000577Z rmask = rindex < rnumel 2023-01-11T21:38:06.7000651Z r0 = rindex 2023-01-11T21:38:06.7000849Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7000943Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.7001067Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.7001189Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7001296Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7001379Z rindex = roffset + rbase 2023-01-11T21:38:06.7001466Z rmask = rindex < rnumel 2023-01-11T21:38:06.7001543Z r0 = rindex 2023-01-11T21:38:06.7001624Z r1 = rindex % 32 2023-01-11T21:38:06.7001730Z tmp3 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.7001833Z tmp5 = tl.load(in_ptr1 + (r1), rmask) 2023-01-11T21:38:06.7001925Z tmp4 = tmp3.to(tl.float64) 2023-01-11T21:38:06.7002003Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7002087Z tmp7 = tmp6 + tmp2 2023-01-11T21:38:06.7002241Z tl.store(out_ptr1 + (r0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp7, rmask & xmask) 2023-01-11T21:38:06.7002329Z ''') 2023-01-11T21:38:06.7002336Z 2023-01-11T21:38:06.7002340Z 2023-01-11T21:38:06.7002437Z async_compile.wait(globals()) 2023-01-11T21:38:06.7002518Z del async_compile 2023-01-11T21:38:06.7002523Z 2023-01-11T21:38:06.7002602Z def call(args): 2023-01-11T21:38:06.7002670Z arg0_1, = args 2023-01-11T21:38:06.7002749Z args.clear() 2023-01-11T21:38:06.7002844Z with torch.cuda.device(0): 2023-01-11T21:38:06.7003076Z buf0 = empty_strided((32, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.7003172Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7003314Z triton_fused_sum_1_0.run(arg0_1, buf0, 32, 32, 
grid=grid(32), stream=stream0) 2023-01-11T21:38:06.7003520Z buf2 = empty_strided((32, 32), (32, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.7003670Z triton_fused_add_sum_2_1.run(arg0_1, buf0, buf2, 1, 1024, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7003739Z del arg0_1 2023-01-11T21:38:06.7003820Z return (buf2, ) 2023-01-11T21:38:06.7003825Z 2023-01-11T21:38:06.7003830Z 2023-01-11T21:38:06.7003912Z if __name__ == "__main__": 2023-01-11T21:38:06.7004036Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7004165Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7004372Z arg0_1 = rand_strided((32, 32), (32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7004486Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7004493Z 2023-01-11T21:38:06.7004758Z [2023-01-11 21:35:44,435] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 873 2023-01-11T21:38:06.7004764Z 2023-01-11T21:38:06.7004856Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7004933Z import torch 2023-01-11T21:38:06.7005011Z import random 2023-01-11T21:38:06.7005131Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7005256Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7005261Z 2023-01-11T21:38:06.7005346Z aten = torch.ops.aten 2023-01-11T21:38:06.7005484Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7005609Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7005614Z 2023-01-11T21:38:06.7005684Z import triton 2023-01-11T21:38:06.7005778Z import triton.language as tl 2023-01-11T21:38:06.7005904Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7006049Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7006055Z 2023-01-11T21:38:06.7006059Z 2023-01-11T21:38:06.7006220Z triton_fused_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7006297Z import triton 2023-01-11T21:38:06.7006393Z import triton.language as tl 2023-01-11T21:38:06.7006502Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7006605Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7006737Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7006864Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7006869Z 2023-01-11T21:38:06.7006960Z @reduction(size_hints=[32, 32], 2023-01-11T21:38:06.7007082Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7007172Z filename=__file__, 2023-01-11T21:38:06.7007538Z meta={'signature': {0: '*fp16', 1: '*fp64', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7007607Z @triton.jit 2023-01-11T21:38:06.7007776Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7007850Z xnumel = 32 2023-01-11T21:38:06.7007924Z rnumel = 32 2023-01-11T21:38:06.7008024Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7008161Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7008248Z xmask = xindex < xnumel 2023-01-11T21:38:06.7008361Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7008435Z x0 = xindex 2023-01-11T21:38:06.7008558Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float64) + 0 2023-01-11T21:38:06.7008665Z for roffset in range(0, rnumel, 
RBLOCK): 2023-01-11T21:38:06.7008755Z rindex = roffset + rbase 2023-01-11T21:38:06.7008843Z rmask = rindex < rnumel 2023-01-11T21:38:06.7008917Z r1 = rindex 2023-01-11T21:38:06.7009177Z tmp0 = tl.load(in_ptr0 + (r1 + (32*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7009277Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.7009400Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.7009516Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7009616Z tl.store(out_ptr0 + x0, tmp2, xmask) 2023-01-11T21:38:06.7009703Z ''') 2023-01-11T21:38:06.7009708Z 2023-01-11T21:38:06.7009713Z 2023-01-11T21:38:06.7009879Z triton_fused_add_sum_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.7009956Z import triton 2023-01-11T21:38:06.7010044Z import triton.language as tl 2023-01-11T21:38:06.7010163Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7010266Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7010401Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7010528Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7010533Z 2023-01-11T21:38:06.7010628Z @reduction(size_hints=[1, 1024], 2023-01-11T21:38:06.7010743Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7010823Z filename=__file__, 2023-01-11T21:38:06.7011202Z meta={'signature': {0: '*fp16', 1: '*fp64', 2: '*fp64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7011279Z @triton.jit 2023-01-11T21:38:06.7011457Z def triton_(in_ptr0, in_ptr1, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7011564Z xnumel = 1 2023-01-11T21:38:06.7011640Z rnumel = 1024 2023-01-11T21:38:06.7011740Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7011876Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7011955Z xmask = xindex < xnumel 2023-01-11T21:38:06.7012077Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7012198Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float64) + 0 2023-01-11T21:38:06.7012305Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7012396Z rindex = roffset + rbase 2023-01-11T21:38:06.7012483Z rmask = rindex < rnumel 2023-01-11T21:38:06.7012557Z r0 = rindex 2023-01-11T21:38:06.7012767Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7012860Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.7012985Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.7013101Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7013212Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7013305Z rindex = roffset + rbase 2023-01-11T21:38:06.7013392Z rmask = rindex < rnumel 2023-01-11T21:38:06.7013459Z r0 = rindex 2023-01-11T21:38:06.7013541Z r1 = rindex % 32 2023-01-11T21:38:06.7013663Z tmp3 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.7013767Z tmp5 = tl.load(in_ptr1 + (r1), rmask) 2023-01-11T21:38:06.7013860Z tmp4 = tmp3.to(tl.float64) 2023-01-11T21:38:06.7013944Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7014033Z tmp7 = tmp6 + tmp2 2023-01-11T21:38:06.7014183Z tl.store(out_ptr1 + (r0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp7, rmask & xmask) 2023-01-11T21:38:06.7014270Z ''') 2023-01-11T21:38:06.7014276Z 2023-01-11T21:38:06.7014280Z 
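For orientation: the two kernels above (triton_fused_sum_1_0 and triton_fused_add_sum_2_1) are the float16 variant that test_sum_dtype_cuda compiles; the float32 variant earlier in the log has the same structure. The test body itself is never printed in this log, but reconstructing from the kernels — a per-row sum accumulated in float64, then a full reduction fused with a broadcasted multiply-add — a plausible eager-mode equivalent is sketched below. The function name and the input tensor are hypothetical, not taken from the log.

import torch

def fn(x):
    # kernel 1 (triton_fused_sum_1_0): row sums accumulated in fp64,
    #   buf0[j] = sum_k x[j, k], stored as torch.float64
    # kernel 2 (triton_fused_add_sum_2_1): a single program re-reduces
    #   all 1024 elements to a scalar total, then fuses the broadcasted
    #   multiply-add: out[i, j] = x[i, j] * buf0[j] + total
    return x * x.sum(-1, dtype=torch.double) + x.sum(dtype=torch.double)

x = torch.randn(32, 32, device="cuda", dtype=torch.float16)
expected = fn(x)  # float64 result, matching buf2's dtype=torch.float64 above

Compiled through inductor, this shape of function yields exactly the two-kernel split seen here: the (32,)-shaped float64 buffer buf0 from the first kernel, and the grid(1) second kernel that consumes both arg0_1 and buf0.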
2023-01-11T21:38:06.7014377Z async_compile.wait(globals()) 2023-01-11T21:38:06.7014456Z del async_compile 2023-01-11T21:38:06.7014463Z 2023-01-11T21:38:06.7014728Z def call(args): 2023-01-11T21:38:06.7014807Z arg0_1, = args 2023-01-11T21:38:06.7014886Z args.clear() 2023-01-11T21:38:06.7014982Z with torch.cuda.device(0): 2023-01-11T21:38:06.7015180Z buf0 = empty_strided((32, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.7015321Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7015463Z triton_fused_sum_1_0.run(arg0_1, buf0, 32, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.7015666Z buf2 = empty_strided((32, 32), (32, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.7015816Z triton_fused_add_sum_2_1.run(arg0_1, buf0, buf2, 1, 1024, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7015891Z del arg0_1 2023-01-11T21:38:06.7015973Z return (buf2, ) 2023-01-11T21:38:06.7015978Z 2023-01-11T21:38:06.7015982Z 2023-01-11T21:38:06.7016063Z if __name__ == "__main__": 2023-01-11T21:38:06.7016176Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7016309Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7016514Z arg0_1 = rand_strided((32, 32), (32, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7016629Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7016634Z 2023-01-11T21:38:06.7016707Z ok (0.240s) 2023-01-11T21:38:06.7017221Z test_sum_int_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7017359Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7017621Z [2023-01-11 21:35:44,454] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 874 2023-01-11T21:38:06.7017873Z [2023-01-11 21:35:44,464] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.7018132Z [2023-01-11 21:35:44,547] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 874 2023-01-11T21:38:06.7018554Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7018687Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7018945Z [2023-01-11 21:35:44,564] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 875 2023-01-11T21:38:06.7019155Z [2023-01-11 21:35:44,574] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.7019416Z [2023-01-11 21:35:44,654] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 875 2023-01-11T21:38:06.7019835Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7019968Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7020223Z [2023-01-11 21:35:44,672] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 876 2023-01-11T21:38:06.7020433Z [2023-01-11 21:35:44,682] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.7020439Z 2023-01-11T21:38:06.7020541Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7020612Z import torch 2023-01-11T21:38:06.7020689Z import random 2023-01-11T21:38:06.7020812Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7020938Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7020943Z 2023-01-11T21:38:06.7021030Z aten = torch.ops.aten 2023-01-11T21:38:06.7021169Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7021295Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7021302Z 2023-01-11T21:38:06.7021373Z import triton 2023-01-11T21:38:06.7021467Z import triton.language as tl 2023-01-11T21:38:06.7021595Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7021738Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7021743Z 2023-01-11T21:38:06.7021748Z 2023-01-11T21:38:06.7021929Z triton_fused_add_mul_sum_1_sum_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.7022006Z import triton 2023-01-11T21:38:06.7022101Z import triton.language as tl 2023-01-11T21:38:06.7022220Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7022320Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7022453Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7022579Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7022585Z 2023-01-11T21:38:06.7022681Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.7022800Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7022888Z filename=__file__, 2023-01-11T21:38:06.7023260Z meta={'signature': {0: '*i64', 1: '*i1', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7023337Z @triton.jit 2023-01-11T21:38:06.7023502Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7023577Z xnumel = 1 2023-01-11T21:38:06.7023655Z rnumel = 64 2023-01-11T21:38:06.7023754Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7023923Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7024009Z xmask = xindex < xnumel 2023-01-11T21:38:06.7024131Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7024241Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.int64) + 0 2023-01-11T21:38:06.7024357Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.int64) + 0 2023-01-11T21:38:06.7024464Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7024557Z rindex = roffset + rbase 2023-01-11T21:38:06.7024644Z rmask = rindex < rnumel 2023-01-11T21:38:06.7024719Z r0 = rindex 2023-01-11T21:38:06.7024916Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7025013Z tmp3 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.7025105Z tmp1 = tmp0.to(tl.int64) 2023-01-11T21:38:06.7025230Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 
2023-01-11T21:38:06.7025324Z tmp4 = tmp3.to(tl.int64) 2023-01-11T21:38:06.7025447Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.7025562Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7025675Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7025742Z tmp6 = 2 2023-01-11T21:38:06.7025827Z tmp7 = tmp2 * tmp6 2023-01-11T21:38:06.7025908Z tmp8 = tmp7 + tmp5 2023-01-11T21:38:06.7026050Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp8, None) 2023-01-11T21:38:06.7026139Z ''') 2023-01-11T21:38:06.7026145Z 2023-01-11T21:38:06.7026149Z 2023-01-11T21:38:06.7026247Z async_compile.wait(globals()) 2023-01-11T21:38:06.7026325Z del async_compile 2023-01-11T21:38:06.7026331Z 2023-01-11T21:38:06.7026407Z def call(args): 2023-01-11T21:38:06.7026476Z arg0_1, = args 2023-01-11T21:38:06.7026554Z args.clear() 2023-01-11T21:38:06.7026648Z with torch.cuda.device(0): 2023-01-11T21:38:06.7026840Z buf0 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7026939Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:06.7027032Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7027185Z triton_fused_add_mul_sum_1_sum_2_0.run(buf2, arg0_1, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7027254Z del arg0_1 2023-01-11T21:38:06.7027362Z return (buf2, ) 2023-01-11T21:38:06.7027368Z 2023-01-11T21:38:06.7027373Z 2023-01-11T21:38:06.7027455Z if __name__ == "__main__": 2023-01-11T21:38:06.7027576Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7027706Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7027902Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.7028017Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7028022Z 2023-01-11T21:38:06.7028027Z 2023-01-11T21:38:06.7028127Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7028199Z import torch 2023-01-11T21:38:06.7028276Z import random 2023-01-11T21:38:06.7028400Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7028529Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7028534Z 2023-01-11T21:38:06.7028619Z aten = torch.ops.aten 2023-01-11T21:38:06.7028759Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7028858Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7028863Z 2023-01-11T21:38:06.7028940Z import triton 2023-01-11T21:38:06.7029027Z import triton.language as tl 2023-01-11T21:38:06.7029156Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7029298Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7029303Z 2023-01-11T21:38:06.7029308Z 2023-01-11T21:38:06.7029490Z triton_fused_add_mul_sum_1_sum_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.7029569Z import triton 2023-01-11T21:38:06.7029663Z import triton.language as tl 2023-01-11T21:38:06.7029806Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7029903Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7030036Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7030161Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7030167Z 2023-01-11T21:38:06.7030261Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.7030378Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7030466Z filename=__file__, 2023-01-11T21:38:06.7030834Z meta={'signature': {0: '*i64', 1: '*u8', 2: 'i32', 
3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7030909Z @triton.jit 2023-01-11T21:38:06.7031076Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7031152Z xnumel = 1 2023-01-11T21:38:06.7031231Z rnumel = 64 2023-01-11T21:38:06.7031334Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7031471Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7031558Z xmask = xindex < xnumel 2023-01-11T21:38:06.7031680Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7031792Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.int64) + 0 2023-01-11T21:38:06.7031907Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.int64) + 0 2023-01-11T21:38:06.7032018Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7032109Z rindex = roffset + rbase 2023-01-11T21:38:06.7032197Z rmask = rindex < rnumel 2023-01-11T21:38:06.7032271Z r0 = rindex 2023-01-11T21:38:06.7032467Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7032564Z tmp3 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.7032654Z tmp1 = tmp0.to(tl.int64) 2023-01-11T21:38:06.7032776Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.7032870Z tmp4 = tmp3.to(tl.int64) 2023-01-11T21:38:06.7032993Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.7033113Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7033255Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7033330Z tmp6 = 2 2023-01-11T21:38:06.7033405Z tmp7 = tmp2 * tmp6 2023-01-11T21:38:06.7033484Z tmp8 = tmp7 + tmp5 2023-01-11T21:38:06.7033622Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp8, None) 2023-01-11T21:38:06.7033711Z ''') 2023-01-11T21:38:06.7033716Z 2023-01-11T21:38:06.7033721Z 2023-01-11T21:38:06.7033819Z async_compile.wait(globals()) 2023-01-11T21:38:06.7033899Z del async_compile 2023-01-11T21:38:06.7033904Z 2023-01-11T21:38:06.7033982Z def call(args): 2023-01-11T21:38:06.7034050Z arg0_1, = args 2023-01-11T21:38:06.7034130Z args.clear() 2023-01-11T21:38:06.7034228Z with torch.cuda.device(0): 2023-01-11T21:38:06.7034423Z buf0 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7034520Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:06.7034615Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7034771Z triton_fused_add_mul_sum_1_sum_2_0.run(buf2, arg0_1, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7034840Z del arg0_1 2023-01-11T21:38:06.7034921Z return (buf2, ) 2023-01-11T21:38:06.7034926Z 2023-01-11T21:38:06.7034931Z 2023-01-11T21:38:06.7035011Z if __name__ == "__main__": 2023-01-11T21:38:06.7035135Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7035284Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7035508Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.uint8) 2023-01-11T21:38:06.7035623Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7035628Z 2023-01-11T21:38:06.7035897Z [2023-01-11 21:35:44,764] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 876 2023-01-11T21:38:06.7035929Z 2023-01-11T21:38:06.7036029Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7036099Z import torch 2023-01-11T21:38:06.7036175Z import random 
2023-01-11T21:38:06.7036296Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7036426Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7036431Z 2023-01-11T21:38:06.7036515Z aten = torch.ops.aten 2023-01-11T21:38:06.7036655Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7036754Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7036759Z 2023-01-11T21:38:06.7036828Z import triton 2023-01-11T21:38:06.7036924Z import triton.language as tl 2023-01-11T21:38:06.7037051Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7037194Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7037199Z 2023-01-11T21:38:06.7037207Z 2023-01-11T21:38:06.7037389Z triton_fused_add_mul_sum_1_sum_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.7037466Z import triton 2023-01-11T21:38:06.7037562Z import triton.language as tl 2023-01-11T21:38:06.7037677Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7037773Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7037908Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7038036Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7038041Z 2023-01-11T21:38:06.7038133Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.7038249Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7038337Z filename=__file__, 2023-01-11T21:38:06.7038708Z meta={'signature': {0: '*i64', 1: '*i32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7038788Z @triton.jit 2023-01-11T21:38:06.7038953Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7039027Z xnumel = 1 2023-01-11T21:38:06.7039103Z rnumel = 64 2023-01-11T21:38:06.7039204Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7039405Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7039493Z xmask = xindex < xnumel 2023-01-11T21:38:06.7039616Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7039725Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.int64) + 0 2023-01-11T21:38:06.7039840Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.int64) + 0 2023-01-11T21:38:06.7039948Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7040039Z rindex = roffset + rbase 2023-01-11T21:38:06.7040128Z rmask = rindex < rnumel 2023-01-11T21:38:06.7040204Z r0 = rindex 2023-01-11T21:38:06.7040401Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7040502Z tmp3 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.7040592Z tmp1 = tmp0.to(tl.int64) 2023-01-11T21:38:06.7040716Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.7040810Z tmp4 = tmp3.to(tl.int64) 2023-01-11T21:38:06.7040931Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.7041048Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7041162Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7041229Z tmp6 = 2 2023-01-11T21:38:06.7041312Z tmp7 = tmp2 * tmp6 2023-01-11T21:38:06.7041392Z tmp8 = tmp7 + tmp5 2023-01-11T21:38:06.7041532Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp8, None) 2023-01-11T21:38:06.7041620Z ''') 2023-01-11T21:38:06.7041625Z 2023-01-11T21:38:06.7041630Z 
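The kernel above is the int32 variant of triton_fused_add_mul_sum_1_sum_2_0; the log shows the same kernel specialized three times — for bool ('*i1'), uint8 ('*u8'), and int32 ('*i32') inputs — as test_sum_int_cuda re-runs the graph per dtype. Reconstructing from the kernel body (two int64 accumulators, then tmp8 = tmp2 * 2 + tmp5), a plausible eager-mode equivalent is sketched below; the name fn and the concrete input are hypothetical, since the exact test body is not in this log.

import torch

def fn(x):
    # both reductions accumulate in tl.int64 above, matching PyTorch's
    # promotion of bool/uint8/int32 sums to torch.int64; inductor keeps
    # the two syntactic sums as separate accumulators (_tmp2, _tmp5)
    # and fuses the final 2 * s1 + s2 into the same single-program kernel
    return 2 * x.sum() + x.sum()

x = torch.ones(64, device="cuda", dtype=torch.int32)
out = fn(x)  # 0-dim int64 tensor, matching buf0 = empty_strided((), (), dtype=torch.int64)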
2023-01-11T21:38:06.7041729Z async_compile.wait(globals()) 2023-01-11T21:38:06.7041807Z del async_compile 2023-01-11T21:38:06.7041848Z 2023-01-11T21:38:06.7041926Z def call(args): 2023-01-11T21:38:06.7041995Z arg0_1, = args 2023-01-11T21:38:06.7042071Z args.clear() 2023-01-11T21:38:06.7042167Z with torch.cuda.device(0): 2023-01-11T21:38:06.7042354Z buf0 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7042450Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:06.7042545Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7042697Z triton_fused_add_mul_sum_1_sum_2_0.run(buf2, arg0_1, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7042767Z del arg0_1 2023-01-11T21:38:06.7042845Z return (buf2, ) 2023-01-11T21:38:06.7042850Z 2023-01-11T21:38:06.7042855Z 2023-01-11T21:38:06.7042936Z if __name__ == "__main__": 2023-01-11T21:38:06.7043056Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7043184Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7043383Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.7043501Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7043506Z 2023-01-11T21:38:06.7043579Z ok (0.329s) 2023-01-11T21:38:06.7044042Z test_sum_keepdims_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7044178Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7044437Z [2023-01-11 21:35:44,780] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 877 2023-01-11T21:38:06.7044706Z [2023-01-11 21:35:44,792] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 877 2023-01-11T21:38:06.7045150Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7045286Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7045545Z [2023-01-11 21:35:44,809] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 878 2023-01-11T21:38:06.7045807Z [2023-01-11 21:35:44,820] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 878 2023-01-11T21:38:06.7045813Z 2023-01-11T21:38:06.7045915Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7045992Z import torch 2023-01-11T21:38:06.7046062Z import random 2023-01-11T21:38:06.7046185Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7046315Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7046320Z 2023-01-11T21:38:06.7046404Z aten = torch.ops.aten 2023-01-11T21:38:06.7046542Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7046641Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7046646Z 2023-01-11T21:38:06.7046725Z import triton 2023-01-11T21:38:06.7046822Z import triton.language as tl 2023-01-11T21:38:06.7046942Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7047085Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7047091Z 2023-01-11T21:38:06.7047096Z 2023-01-11T21:38:06.7047259Z triton_fused_add_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7047336Z import triton 2023-01-11T21:38:06.7047431Z import triton.language as tl 2023-01-11T21:38:06.7047546Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7047650Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7047809Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7047930Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7047935Z 2023-01-11T21:38:06.7048026Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.7048144Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7048234Z filename=__file__, 2023-01-11T21:38:06.7048613Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7048691Z @triton.jit 2023-01-11T21:38:06.7048871Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7048946Z xnumel = 8 2023-01-11T21:38:06.7049014Z rnumel = 8 2023-01-11T21:38:06.7049114Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7049250Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7049340Z xmask = xindex < xnumel 2023-01-11T21:38:06.7049460Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7049533Z x0 = xindex 2023-01-11T21:38:06.7049652Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7049756Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7049846Z rindex = roffset + rbase 2023-01-11T21:38:06.7049936Z rmask = rindex < rnumel 2023-01-11T21:38:06.7050009Z r1 = rindex 2023-01-11T21:38:06.7050128Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.7050247Z tmp1 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.7050332Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7050448Z _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3) 
2023-01-11T21:38:06.7050565Z tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7050665Z tl.store(out_ptr0 + x0, tmp3, xmask) 2023-01-11T21:38:06.7050755Z ''') 2023-01-11T21:38:06.7050760Z 2023-01-11T21:38:06.7050764Z 2023-01-11T21:38:06.7050860Z async_compile.wait(globals()) 2023-01-11T21:38:06.7050939Z del async_compile 2023-01-11T21:38:06.7050944Z 2023-01-11T21:38:06.7051021Z def call(args): 2023-01-11T21:38:06.7051096Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7051199Z args.clear() 2023-01-11T21:38:06.7051293Z with torch.cuda.device(0): 2023-01-11T21:38:06.7051497Z buf0 = empty_strided((8, 1), (1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7051591Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7051742Z triton_fused_add_sum_1_0.run(arg0_1, arg1_1, buf0, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.7051818Z del arg0_1 2023-01-11T21:38:06.7051886Z del arg1_1 2023-01-11T21:38:06.7051966Z return (buf0, ) 2023-01-11T21:38:06.7051971Z 2023-01-11T21:38:06.7051975Z 2023-01-11T21:38:06.7052057Z if __name__ == "__main__": 2023-01-11T21:38:06.7052179Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7052306Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7052509Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7052710Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7052832Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7052837Z 2023-01-11T21:38:06.7052842Z 2023-01-11T21:38:06.7052943Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7053012Z import torch 2023-01-11T21:38:06.7053088Z import random 2023-01-11T21:38:06.7053210Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7053336Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7053341Z 2023-01-11T21:38:06.7053426Z aten = torch.ops.aten 2023-01-11T21:38:06.7053564Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7053706Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7053712Z 2023-01-11T21:38:06.7053781Z import triton 2023-01-11T21:38:06.7053878Z import triton.language as tl 2023-01-11T21:38:06.7054006Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7054147Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7054155Z 2023-01-11T21:38:06.7054159Z 2023-01-11T21:38:06.7054322Z triton_fused_add_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7054399Z import triton 2023-01-11T21:38:06.7054617Z import triton.language as tl 2023-01-11T21:38:06.7054732Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7054827Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7054957Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7055083Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7055089Z 2023-01-11T21:38:06.7055175Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.7055314Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7055407Z filename=__file__, 2023-01-11T21:38:06.7055807Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7055881Z @triton.jit 2023-01-11T21:38:06.7056051Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, 
rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7056124Z xnumel = 8 2023-01-11T21:38:06.7056196Z rnumel = 8 2023-01-11T21:38:06.7056292Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7056431Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7056513Z xmask = xindex < xnumel 2023-01-11T21:38:06.7056630Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7056694Z x0 = xindex 2023-01-11T21:38:06.7056814Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7056917Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7057004Z rindex = roffset + rbase 2023-01-11T21:38:06.7057089Z rmask = rindex < rnumel 2023-01-11T21:38:06.7057217Z r1 = rindex 2023-01-11T21:38:06.7057406Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.7057535Z tmp1 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.7057617Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7057740Z _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3) 2023-01-11T21:38:06.7057855Z tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7057954Z tl.store(out_ptr0 + x0, tmp3, xmask) 2023-01-11T21:38:06.7058039Z ''') 2023-01-11T21:38:06.7058044Z 2023-01-11T21:38:06.7058049Z 2023-01-11T21:38:06.7058141Z async_compile.wait(globals()) 2023-01-11T21:38:06.7058211Z del async_compile 2023-01-11T21:38:06.7058226Z 2023-01-11T21:38:06.7058294Z def call(args): 2023-01-11T21:38:06.7058372Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7058447Z args.clear() 2023-01-11T21:38:06.7058538Z with torch.cuda.device(0): 2023-01-11T21:38:06.7058737Z buf0 = empty_strided((8, 1), (1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7058831Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7058980Z triton_fused_add_sum_1_0.run(arg0_1, arg1_1, buf0, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.7059046Z del arg0_1 2023-01-11T21:38:06.7059118Z del arg1_1 2023-01-11T21:38:06.7059195Z return (buf0, ) 2023-01-11T21:38:06.7059200Z 2023-01-11T21:38:06.7059204Z 2023-01-11T21:38:06.7059284Z if __name__ == "__main__": 2023-01-11T21:38:06.7059402Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7059527Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7059725Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7059948Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7060068Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7060073Z 2023-01-11T21:38:06.7060143Z ok (0.057s) 2023-01-11T21:38:06.7060595Z test_tanh_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7060725Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7060982Z [2023-01-11 21:35:44,850] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 879 2023-01-11T21:38:06.7061245Z [2023-01-11 21:35:44,998] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 879 2023-01-11T21:38:06.7061663Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7061795Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7062048Z [2023-01-11 21:35:45,027] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 880 2023-01-11T21:38:06.7062312Z [2023-01-11 21:35:45,175] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 880 2023-01-11T21:38:06.7062318Z 2023-01-11T21:38:06.7062417Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7062484Z import torch 2023-01-11T21:38:06.7062561Z import random 2023-01-11T21:38:06.7062680Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7062803Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7062808Z 2023-01-11T21:38:06.7062888Z aten = torch.ops.aten 2023-01-11T21:38:06.7063024Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7063143Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7063149Z 2023-01-11T21:38:06.7063219Z import triton 2023-01-11T21:38:06.7063310Z import triton.language as tl 2023-01-11T21:38:06.7063438Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7063579Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7063585Z 2023-01-11T21:38:06.7063589Z 2023-01-11T21:38:06.7063752Z triton_fused_add_tanh_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7063825Z import triton 2023-01-11T21:38:06.7063916Z import triton.language as tl 2023-01-11T21:38:06.7064028Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7064126Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7064258Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7064381Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7064386Z 2023-01-11T21:38:06.7064804Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7064878Z @triton.jit 2023-01-11T21:38:06.7065021Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7065094Z xnumel = 256 2023-01-11T21:38:06.7065193Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7065315Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7065400Z xmask = xindex < xnumel 2023-01-11T21:38:06.7065471Z x0 = xindex 2023-01-11T21:38:06.7065692Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7065792Z tmp4 
= tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7065892Z tmp1 = tl.libdevice.tanh(tmp0) 2023-01-11T21:38:06.7065966Z tmp2 = 2 2023-01-11T21:38:06.7066041Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7066117Z tmp5 = 1 2023-01-11T21:38:06.7066197Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.7066296Z tmp7 = tl.libdevice.tanh(tmp6) 2023-01-11T21:38:06.7066434Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7066568Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.7066656Z ''') 2023-01-11T21:38:06.7066661Z 2023-01-11T21:38:06.7066666Z 2023-01-11T21:38:06.7066755Z async_compile.wait(globals()) 2023-01-11T21:38:06.7066834Z del async_compile 2023-01-11T21:38:06.7066841Z 2023-01-11T21:38:06.7066916Z def call(args): 2023-01-11T21:38:06.7066991Z arg0_1, = args 2023-01-11T21:38:06.7067071Z args.clear() 2023-01-11T21:38:06.7067166Z with torch.cuda.device(0): 2023-01-11T21:38:06.7067376Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7067569Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7067666Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7067818Z triton_fused_add_tanh_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.7067894Z del arg0_1 2023-01-11T21:38:06.7067981Z return (buf0, buf1, ) 2023-01-11T21:38:06.7067986Z 2023-01-11T21:38:06.7067990Z 2023-01-11T21:38:06.7068071Z if __name__ == "__main__": 2023-01-11T21:38:06.7068192Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7068320Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7068518Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7068637Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7068642Z 2023-01-11T21:38:06.7068647Z 2023-01-11T21:38:06.7068746Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7068822Z import torch 2023-01-11T21:38:06.7068899Z import random 2023-01-11T21:38:06.7069023Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7069176Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7069182Z 2023-01-11T21:38:06.7069267Z aten = torch.ops.aten 2023-01-11T21:38:06.7069398Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7069495Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7069500Z 2023-01-11T21:38:06.7069576Z import triton 2023-01-11T21:38:06.7069672Z import triton.language as tl 2023-01-11T21:38:06.7069799Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7069940Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7069946Z 2023-01-11T21:38:06.7069953Z 2023-01-11T21:38:06.7070119Z triton_fused_add_tanh_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7070196Z import triton 2023-01-11T21:38:06.7070285Z import triton.language as tl 2023-01-11T21:38:06.7070399Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7070503Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7070640Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7070766Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7070772Z 2023-01-11T21:38:06.7071185Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': 
set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7071263Z @triton.jit 2023-01-11T21:38:06.7071406Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7071477Z xnumel = 256 2023-01-11T21:38:06.7071608Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7071742Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7071827Z xmask = xindex < xnumel 2023-01-11T21:38:06.7071900Z x0 = xindex 2023-01-11T21:38:06.7072115Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7072237Z tmp4 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7072328Z tmp1 = tl.libdevice.tanh(tmp0) 2023-01-11T21:38:06.7072402Z tmp2 = 2 2023-01-11T21:38:06.7072484Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7072557Z tmp5 = 1 2023-01-11T21:38:06.7072638Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.7072736Z tmp7 = tl.libdevice.tanh(tmp6) 2023-01-11T21:38:06.7072872Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7072999Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.7073086Z ''') 2023-01-11T21:38:06.7073094Z 2023-01-11T21:38:06.7073100Z 2023-01-11T21:38:06.7073194Z async_compile.wait(globals()) 2023-01-11T21:38:06.7073274Z del async_compile 2023-01-11T21:38:06.7073279Z 2023-01-11T21:38:06.7073356Z def call(args): 2023-01-11T21:38:06.7073432Z arg0_1, = args 2023-01-11T21:38:06.7073509Z args.clear() 2023-01-11T21:38:06.7073600Z with torch.cuda.device(0): 2023-01-11T21:38:06.7073804Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7074002Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7074096Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7074248Z triton_fused_add_tanh_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.7074324Z del arg0_1 2023-01-11T21:38:06.7074408Z return (buf0, buf1, ) 2023-01-11T21:38:06.7074413Z 2023-01-11T21:38:06.7074418Z 2023-01-11T21:38:06.7074499Z if __name__ == "__main__": 2023-01-11T21:38:06.7074614Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7074743Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7074946Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7075059Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7075094Z 2023-01-11T21:38:06.7075169Z ok (0.355s) 2023-01-11T21:38:06.7075626Z test_tensor1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7075761Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7076020Z [2023-01-11 21:35:45,195] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 881 2023-01-11T21:38:06.7076288Z [2023-01-11 21:35:45,268] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 881 2023-01-11T21:38:06.7076705Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7076840Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7077090Z [2023-01-11 21:35:45,285] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 882 2023-01-11T21:38:06.7077354Z [2023-01-11 21:35:45,354] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 882 2023-01-11T21:38:06.7077359Z 2023-01-11T21:38:06.7077486Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7077560Z import torch 2023-01-11T21:38:06.7077637Z import random 2023-01-11T21:38:06.7077758Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7077882Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7077887Z 2023-01-11T21:38:06.7077972Z aten = torch.ops.aten 2023-01-11T21:38:06.7078101Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7078197Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7078202Z 2023-01-11T21:38:06.7078275Z import triton 2023-01-11T21:38:06.7078367Z import triton.language as tl 2023-01-11T21:38:06.7078492Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7078631Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7078637Z 2023-01-11T21:38:06.7078642Z 2023-01-11T21:38:06.7078796Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7078872Z import triton 2023-01-11T21:38:06.7078960Z import triton.language as tl 2023-01-11T21:38:06.7079074Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7079176Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7079308Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7079437Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7079445Z 2023-01-11T21:38:06.7079848Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7079924Z @triton.jit 2023-01-11T21:38:06.7080057Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7080124Z xnumel = 10 2023-01-11T21:38:06.7080220Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7080350Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7080436Z xmask = xindex < xnumel 2023-01-11T21:38:06.7080508Z x0 = xindex 2023-01-11T21:38:06.7080604Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7080674Z tmp0 = 1 2023-01-11T21:38:06.7080755Z tmp1 = tmp0.to(tl.float32) 
2023-01-11T21:38:06.7080833Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7080994Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7081082Z ''') 2023-01-11T21:38:06.7081087Z 2023-01-11T21:38:06.7081092Z 2023-01-11T21:38:06.7081269Z triton_fused_lift_fresh_copy_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7081344Z import triton 2023-01-11T21:38:06.7081436Z import triton.language as tl 2023-01-11T21:38:06.7081544Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7081643Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7081774Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7081898Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7081906Z 2023-01-11T21:38:06.7082286Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*i64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.7082362Z @triton.jit 2023-01-11T21:38:06.7082485Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7082557Z xnumel = 1 2023-01-11T21:38:06.7082646Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7082773Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7082855Z xmask = xindex < xnumel 2023-01-11T21:38:06.7082925Z tmp0 = 5 2023-01-11T21:38:06.7083058Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.7083143Z ''') 2023-01-11T21:38:06.7083148Z 2023-01-11T21:38:06.7083152Z 2023-01-11T21:38:06.7083246Z async_compile.wait(globals()) 2023-01-11T21:38:06.7083316Z del async_compile 2023-01-11T21:38:06.7083329Z 2023-01-11T21:38:06.7083435Z def call(args): 2023-01-11T21:38:06.7083508Z arg0_1, = args 2023-01-11T21:38:06.7083585Z args.clear() 2023-01-11T21:38:06.7083677Z with torch.cuda.device(0): 2023-01-11T21:38:06.7083874Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7083968Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7084107Z triton_fused_add_0.run(arg0_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.7084173Z del arg0_1 2023-01-11T21:38:06.7084359Z buf1 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7084502Z triton_fused_lift_fresh_copy_1_1.run(buf1, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7084584Z return (buf0, buf1, ) 2023-01-11T21:38:06.7084590Z 2023-01-11T21:38:06.7084594Z 2023-01-11T21:38:06.7084672Z if __name__ == "__main__": 2023-01-11T21:38:06.7084789Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7084915Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7085109Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7085220Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7085225Z 2023-01-11T21:38:06.7085229Z 2023-01-11T21:38:06.7085324Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7085399Z import torch 2023-01-11T21:38:06.7085473Z import random 2023-01-11T21:38:06.7085592Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7085715Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7085720Z 2023-01-11T21:38:06.7085801Z aten = torch.ops.aten 2023-01-11T21:38:06.7085929Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7086023Z async_compile = AsyncCompile() 
2023-01-11T21:38:06.7086028Z 2023-01-11T21:38:06.7086100Z import triton 2023-01-11T21:38:06.7086192Z import triton.language as tl 2023-01-11T21:38:06.7086315Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7086459Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7086464Z 2023-01-11T21:38:06.7086468Z 2023-01-11T21:38:06.7086622Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7086697Z import triton 2023-01-11T21:38:06.7086782Z import triton.language as tl 2023-01-11T21:38:06.7086922Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7087026Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7087159Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7087284Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7087289Z 2023-01-11T21:38:06.7087688Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7087760Z @triton.jit 2023-01-11T21:38:06.7087897Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7087963Z xnumel = 10 2023-01-11T21:38:06.7088060Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7088188Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7088270Z xmask = xindex < xnumel 2023-01-11T21:38:06.7088342Z x0 = xindex 2023-01-11T21:38:06.7088459Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7088529Z tmp0 = 1 2023-01-11T21:38:06.7088609Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7088688Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7088822Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7088907Z ''') 2023-01-11T21:38:06.7088913Z 2023-01-11T21:38:06.7088917Z 2023-01-11T21:38:06.7089095Z triton_fused_lift_fresh_copy_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7089169Z import triton 2023-01-11T21:38:06.7089261Z import triton.language as tl 2023-01-11T21:38:06.7089396Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7089497Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7089630Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7089754Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7089759Z 2023-01-11T21:38:06.7090141Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*i64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.7090214Z @triton.jit 2023-01-11T21:38:06.7090335Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7090407Z xnumel = 1 2023-01-11T21:38:06.7090495Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7090623Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7090705Z xmask = xindex < xnumel 2023-01-11T21:38:06.7090775Z tmp0 = 5 2023-01-11T21:38:06.7090914Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.7091001Z ''') 2023-01-11T21:38:06.7091006Z 2023-01-11T21:38:06.7091011Z 2023-01-11T21:38:06.7091103Z async_compile.wait(globals()) 2023-01-11T21:38:06.7091184Z del async_compile 2023-01-11T21:38:06.7091189Z 2023-01-11T21:38:06.7091257Z def call(args): 
2023-01-11T21:38:06.7091332Z arg0_1, = args 2023-01-11T21:38:06.7091407Z args.clear() 2023-01-11T21:38:06.7091499Z with torch.cuda.device(0): 2023-01-11T21:38:06.7091697Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7091792Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7091930Z triton_fused_add_0.run(arg0_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.7091995Z del arg0_1 2023-01-11T21:38:06.7092182Z buf1 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7092326Z triton_fused_lift_fresh_copy_1_1.run(buf1, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7092411Z return (buf0, buf1, ) 2023-01-11T21:38:06.7092417Z 2023-01-11T21:38:06.7092421Z 2023-01-11T21:38:06.7092500Z if __name__ == "__main__": 2023-01-11T21:38:06.7092619Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7092772Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7092971Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7093075Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7093080Z 2023-01-11T21:38:06.7093151Z ok (0.179s) 2023-01-11T21:38:06.7093603Z test_tensor2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7093737Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7093993Z [2023-01-11 21:35:45,371] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 883 2023-01-11T21:38:06.7094258Z [2023-01-11 21:35:45,439] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 883 2023-01-11T21:38:06.7094859Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7094992Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7095252Z [2023-01-11 21:35:45,455] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 884 2023-01-11T21:38:06.7095557Z [2023-01-11 21:35:45,525] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 884 2023-01-11T21:38:06.7095562Z 2023-01-11T21:38:06.7095661Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7095729Z import torch 2023-01-11T21:38:06.7095802Z import random 2023-01-11T21:38:06.7095922Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7096046Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7096051Z 2023-01-11T21:38:06.7096132Z aten = torch.ops.aten 2023-01-11T21:38:06.7096268Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7096363Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7096368Z 2023-01-11T21:38:06.7096441Z import triton 2023-01-11T21:38:06.7096526Z import triton.language as tl 2023-01-11T21:38:06.7096651Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7096791Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7096951Z constant0 = None # 00601f3c4f47596913f89a8bc54b1f5991da93d930cde1b0ac60c4341d62906d 2023-01-11T21:38:06.7096957Z 2023-01-11T21:38:06.7096961Z 2023-01-11T21:38:06.7097114Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7097260Z import triton 2023-01-11T21:38:06.7097357Z import triton.language as tl 2023-01-11T21:38:06.7097465Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7097565Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7097697Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7097824Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7097829Z 2023-01-11T21:38:06.7098243Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7098319Z @triton.jit 2023-01-11T21:38:06.7098462Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7098536Z xnumel = 19 2023-01-11T21:38:06.7098626Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7098752Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7098874Z xmask = xindex < xnumel 2023-01-11T21:38:06.7098950Z x0 = xindex 2023-01-11T21:38:06.7099047Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7099177Z tmp2 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.7099266Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7099338Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7099472Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7099557Z ''') 2023-01-11T21:38:06.7099562Z 2023-01-11T21:38:06.7099567Z 2023-01-11T21:38:06.7099658Z async_compile.wait(globals()) 2023-01-11T21:38:06.7099734Z del async_compile 2023-01-11T21:38:06.7099742Z 2023-01-11T21:38:06.7099817Z def call(args): 2023-01-11T21:38:06.7099889Z arg0_1, = args 2023-01-11T21:38:06.7099962Z args.clear() 2023-01-11T21:38:06.7100047Z with torch.cuda.device(0): 
2023-01-11T21:38:06.7100244Z buf0 = empty_strided((19, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7100340Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7100487Z triton_fused_add_0.run(constant0, arg0_1, buf0, 19, grid=grid(19), stream=stream0) 2023-01-11T21:38:06.7100559Z del arg0_1 2023-01-11T21:38:06.7100636Z return (buf0, ) 2023-01-11T21:38:06.7100641Z 2023-01-11T21:38:06.7100646Z 2023-01-11T21:38:06.7100725Z if __name__ == "__main__": 2023-01-11T21:38:06.7100843Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7100962Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7101164Z constant0 = rand_strided((19, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.7101389Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7101504Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7101509Z 2023-01-11T21:38:06.7101514Z 2023-01-11T21:38:06.7101614Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7101690Z import torch 2023-01-11T21:38:06.7101768Z import random 2023-01-11T21:38:06.7101885Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7102012Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7102017Z 2023-01-11T21:38:06.7102101Z aten = torch.ops.aten 2023-01-11T21:38:06.7102239Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7102336Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7102342Z 2023-01-11T21:38:06.7102417Z import triton 2023-01-11T21:38:06.7102515Z import triton.language as tl 2023-01-11T21:38:06.7102642Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7102775Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7102936Z constant0 = None # 00601f3c4f47596913f89a8bc54b1f5991da93d930cde1b0ac60c4341d62906d 2023-01-11T21:38:06.7102941Z 2023-01-11T21:38:06.7102946Z 2023-01-11T21:38:06.7103101Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7103179Z import triton 2023-01-11T21:38:06.7103275Z import triton.language as tl 2023-01-11T21:38:06.7103391Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7103494Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7103628Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7103749Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7103755Z 2023-01-11T21:38:06.7104166Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7104244Z @triton.jit 2023-01-11T21:38:06.7104389Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7104464Z xnumel = 19 2023-01-11T21:38:06.7104563Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7104696Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7104808Z xmask = xindex < xnumel 2023-01-11T21:38:06.7104876Z x0 = xindex 2023-01-11T21:38:06.7104976Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7105123Z tmp2 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32) 2023-01-11T21:38:06.7105213Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7105293Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7105430Z tl.store(out_ptr0 + (x0 + 
tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7105517Z ''') 2023-01-11T21:38:06.7105522Z 2023-01-11T21:38:06.7105527Z 2023-01-11T21:38:06.7105614Z async_compile.wait(globals()) 2023-01-11T21:38:06.7105698Z del async_compile 2023-01-11T21:38:06.7105703Z 2023-01-11T21:38:06.7105783Z def call(args): 2023-01-11T21:38:06.7105858Z arg0_1, = args 2023-01-11T21:38:06.7105935Z args.clear() 2023-01-11T21:38:06.7106028Z with torch.cuda.device(0): 2023-01-11T21:38:06.7106228Z buf0 = empty_strided((19, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7106319Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7106471Z triton_fused_add_0.run(constant0, arg0_1, buf0, 19, grid=grid(19), stream=stream0) 2023-01-11T21:38:06.7106547Z del arg0_1 2023-01-11T21:38:06.7106626Z return (buf0, ) 2023-01-11T21:38:06.7106631Z 2023-01-11T21:38:06.7106636Z 2023-01-11T21:38:06.7106718Z if __name__ == "__main__": 2023-01-11T21:38:06.7106839Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7106967Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7107171Z constant0 = rand_strided((19, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.7107399Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7107513Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7107518Z 2023-01-11T21:38:06.7107591Z ok (0.170s) 2023-01-11T21:38:06.7108049Z test_tensor3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7108182Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7108442Z [2023-01-11 21:35:45,552] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 885 2023-01-11T21:38:06.7108705Z [2023-01-11 21:35:45,729] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 885 2023-01-11T21:38:06.7109131Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7109264Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7109522Z [2023-01-11 21:35:45,756] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 886 2023-01-11T21:38:06.7109528Z 2023-01-11T21:38:06.7109628Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7109698Z import torch 2023-01-11T21:38:06.7109777Z import random 2023-01-11T21:38:06.7109902Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7110029Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7110037Z 2023-01-11T21:38:06.7110120Z aten = torch.ops.aten 2023-01-11T21:38:06.7110260Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7110358Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7110364Z 2023-01-11T21:38:06.7110440Z import triton 2023-01-11T21:38:06.7110528Z import triton.language as tl 2023-01-11T21:38:06.7110688Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7110833Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7110839Z 2023-01-11T21:38:06.7110843Z 2023-01-11T21:38:06.7111001Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7111080Z import triton 2023-01-11T21:38:06.7111174Z import triton.language as tl 2023-01-11T21:38:06.7111291Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7111388Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7111522Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7111654Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7111659Z 2023-01-11T21:38:06.7112042Z @pointwise(size_hints=[2], filename=__file__, meta={'signature': {0: '*i64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.7112121Z @triton.jit 2023-01-11T21:38:06.7112246Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7112322Z xnumel = 2 2023-01-11T21:38:06.7112420Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7112545Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7112630Z xmask = xindex < xnumel 2023-01-11T21:38:06.7112703Z x0 = xindex 2023-01-11T21:38:06.7112776Z tmp0 = x0 2023-01-11T21:38:06.7112849Z tmp1 = 1 2023-01-11T21:38:06.7112929Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7113002Z tmp3 = 2 2023-01-11T21:38:06.7113094Z tmp4 = tl.where(tmp2, tmp1, tmp3) 2023-01-11T21:38:06.7113202Z tmp5 = tmp4 + tmp1 2023-01-11T21:38:06.7113340Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7113428Z ''') 2023-01-11T21:38:06.7113434Z 2023-01-11T21:38:06.7113438Z 2023-01-11T21:38:06.7113597Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7113675Z import triton 2023-01-11T21:38:06.7113772Z import triton.language as tl 2023-01-11T21:38:06.7113883Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7113987Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7114121Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7114248Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7114253Z 2023-01-11T21:38:06.7114638Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i64', 1: 'i32'}, 
'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.7114716Z @triton.jit 2023-01-11T21:38:06.7114839Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7114913Z xnumel = 3 2023-01-11T21:38:06.7115005Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7115137Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7115225Z xmask = xindex < xnumel 2023-01-11T21:38:06.7115298Z x0 = xindex 2023-01-11T21:38:06.7115373Z tmp0 = x0 2023-01-11T21:38:06.7115444Z tmp1 = 1 2023-01-11T21:38:06.7115529Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7115594Z tmp3 = 2 2023-01-11T21:38:06.7115673Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.7115744Z tmp5 = 3 2023-01-11T21:38:06.7115843Z tmp6 = tl.where(tmp4, tmp3, tmp5) 2023-01-11T21:38:06.7115939Z tmp7 = tl.where(tmp2, tmp1, tmp6) 2023-01-11T21:38:06.7116019Z tmp8 = tmp7 + tmp3 2023-01-11T21:38:06.7116158Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7116239Z ''') 2023-01-11T21:38:06.7116247Z 2023-01-11T21:38:06.7116251Z 2023-01-11T21:38:06.7116410Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.7116486Z import triton 2023-01-11T21:38:06.7116581Z import triton.language as tl 2023-01-11T21:38:06.7116697Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7116829Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7116964Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7117082Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7117087Z 2023-01-11T21:38:06.7117486Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7117559Z @triton.jit 2023-01-11T21:38:06.7117692Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7117764Z xnumel = 4 2023-01-11T21:38:06.7117864Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7117996Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7118078Z xmask = xindex < xnumel 2023-01-11T21:38:06.7118142Z x0 = xindex 2023-01-11T21:38:06.7118240Z tmp12 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7118315Z tmp0 = x0 2023-01-11T21:38:06.7118385Z tmp1 = 2 2023-01-11T21:38:06.7118463Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7118531Z tmp3 = 1 2023-01-11T21:38:06.7118608Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.7118697Z tmp5 = tl.where(tmp4, tmp3, tmp1) 2023-01-11T21:38:06.7118768Z tmp6 = 3 2023-01-11T21:38:06.7118845Z tmp7 = tmp0 < tmp6 2023-01-11T21:38:06.7118914Z tmp8 = 4 2023-01-11T21:38:06.7119010Z tmp9 = tl.where(tmp7, tmp6, tmp8) 2023-01-11T21:38:06.7119105Z tmp10 = tl.where(tmp2, tmp5, tmp9) 2023-01-11T21:38:06.7126077Z tmp11 = tmp10.to(tl.float32) 2023-01-11T21:38:06.7126177Z tmp13 = tmp11 + tmp12 2023-01-11T21:38:06.7126385Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.7126498Z ''') 2023-01-11T21:38:06.7126504Z 2023-01-11T21:38:06.7126509Z 2023-01-11T21:38:06.7126608Z async_compile.wait(globals()) 2023-01-11T21:38:06.7126691Z del async_compile 2023-01-11T21:38:06.7126696Z 2023-01-11T21:38:06.7126779Z def call(args): 2023-01-11T21:38:06.7126857Z arg0_1, = args 2023-01-11T21:38:06.7126929Z args.clear() 2023-01-11T21:38:06.7127026Z 
with torch.cuda.device(0): 2023-01-11T21:38:06.7127228Z buf0 = empty_strided((2, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7127325Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7127461Z triton_fused_add_0.run(buf0, 2, grid=grid(2), stream=stream0) 2023-01-11T21:38:06.7127657Z buf1 = empty_strided((3, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7127785Z triton_fused_add_1_1.run(buf1, 3, grid=grid(3), stream=stream0) 2023-01-11T21:38:06.7127980Z buf2 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7128123Z triton_fused_add_2_2.run(arg0_1, buf2, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.7128198Z del arg0_1 2023-01-11T21:38:06.7128397Z buf3 = empty_strided((0, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7128498Z return (buf3, buf0, buf1, buf2, ) 2023-01-11T21:38:06.7128503Z 2023-01-11T21:38:06.7128508Z 2023-01-11T21:38:06.7128590Z if __name__ == "__main__": 2023-01-11T21:38:06.7128712Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7128841Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7129032Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7129148Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7129153Z 2023-01-11T21:38:06.7129420Z [2023-01-11 21:35:45,836] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 886 2023-01-11T21:38:06.7129429Z 2023-01-11T21:38:06.7129527Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7129602Z import torch 2023-01-11T21:38:06.7129676Z import random 2023-01-11T21:38:06.7129796Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7129955Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7129961Z 2023-01-11T21:38:06.7130043Z aten = torch.ops.aten 2023-01-11T21:38:06.7130173Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7130272Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7130277Z 2023-01-11T21:38:06.7130351Z import triton 2023-01-11T21:38:06.7130443Z import triton.language as tl 2023-01-11T21:38:06.7130569Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7130710Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7130715Z 2023-01-11T21:38:06.7130719Z 2023-01-11T21:38:06.7130872Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7130944Z import triton 2023-01-11T21:38:06.7131036Z import triton.language as tl 2023-01-11T21:38:06.7131149Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7131250Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7131381Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7131507Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7131512Z 2023-01-11T21:38:06.7131898Z @pointwise(size_hints=[2], filename=__file__, meta={'signature': {0: '*i64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.7131972Z @triton.jit 2023-01-11T21:38:06.7132088Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7132163Z xnumel = 2 2023-01-11T21:38:06.7132263Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7132392Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7132518Z xmask = 
xindex < xnumel 2023-01-11T21:38:06.7132590Z x0 = xindex 2023-01-11T21:38:06.7132661Z tmp0 = x0 2023-01-11T21:38:06.7132724Z tmp1 = 1 2023-01-11T21:38:06.7132803Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7132876Z tmp3 = 2 2023-01-11T21:38:06.7132974Z tmp4 = tl.where(tmp2, tmp1, tmp3) 2023-01-11T21:38:06.7133053Z tmp5 = tmp4 + tmp1 2023-01-11T21:38:06.7133188Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7133273Z ''') 2023-01-11T21:38:06.7133279Z 2023-01-11T21:38:06.7133283Z 2023-01-11T21:38:06.7133433Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7133507Z import triton 2023-01-11T21:38:06.7133599Z import triton.language as tl 2023-01-11T21:38:06.7133716Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7133817Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7133949Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7134077Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7134081Z 2023-01-11T21:38:06.7134702Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.7134772Z @triton.jit 2023-01-11T21:38:06.7134894Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7134966Z xnumel = 3 2023-01-11T21:38:06.7135063Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7135191Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7135275Z xmask = xindex < xnumel 2023-01-11T21:38:06.7135345Z x0 = xindex 2023-01-11T21:38:06.7135409Z tmp0 = x0 2023-01-11T21:38:06.7135479Z tmp1 = 1 2023-01-11T21:38:06.7135558Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7135627Z tmp3 = 2 2023-01-11T21:38:06.7135705Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.7135779Z tmp5 = 3 2023-01-11T21:38:06.7135874Z tmp6 = tl.where(tmp4, tmp3, tmp5) 2023-01-11T21:38:06.7135962Z tmp7 = tl.where(tmp2, tmp1, tmp6) 2023-01-11T21:38:06.7136041Z tmp8 = tmp7 + tmp3 2023-01-11T21:38:06.7136177Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7136320Z ''') 2023-01-11T21:38:06.7136326Z 2023-01-11T21:38:06.7136331Z 2023-01-11T21:38:06.7136494Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.7136568Z import triton 2023-01-11T21:38:06.7136663Z import triton.language as tl 2023-01-11T21:38:06.7136771Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7136872Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7137003Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7137174Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7137180Z 2023-01-11T21:38:06.7137624Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7137704Z @triton.jit 2023-01-11T21:38:06.7137848Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7137923Z xnumel = 4 2023-01-11T21:38:06.7138018Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7138158Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7138244Z xmask = xindex < xnumel 2023-01-11T21:38:06.7138315Z x0 = xindex 2023-01-11T21:38:06.7138442Z 
tmp12 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7138513Z tmp0 = x0 2023-01-11T21:38:06.7138582Z tmp1 = 2 2023-01-11T21:38:06.7138656Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7138728Z tmp3 = 1 2023-01-11T21:38:06.7138806Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.7138906Z tmp5 = tl.where(tmp4, tmp3, tmp1) 2023-01-11T21:38:06.7139072Z tmp6 = 3 2023-01-11T21:38:06.7139153Z tmp7 = tmp0 < tmp6 2023-01-11T21:38:06.7139225Z tmp8 = 4 2023-01-11T21:38:06.7139319Z tmp9 = tl.where(tmp7, tmp6, tmp8) 2023-01-11T21:38:06.7139422Z tmp10 = tl.where(tmp2, tmp5, tmp9) 2023-01-11T21:38:06.7139519Z tmp11 = tmp10.to(tl.float32) 2023-01-11T21:38:06.7139609Z tmp13 = tmp11 + tmp12 2023-01-11T21:38:06.7139761Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.7139854Z ''') 2023-01-11T21:38:06.7139860Z 2023-01-11T21:38:06.7139864Z 2023-01-11T21:38:06.7139962Z async_compile.wait(globals()) 2023-01-11T21:38:06.7140035Z del async_compile 2023-01-11T21:38:06.7140041Z 2023-01-11T21:38:06.7140119Z def call(args): 2023-01-11T21:38:06.7140194Z arg0_1, = args 2023-01-11T21:38:06.7140272Z args.clear() 2023-01-11T21:38:06.7140370Z with torch.cuda.device(0): 2023-01-11T21:38:06.7140591Z buf0 = empty_strided((2, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7140688Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7140813Z triton_fused_add_0.run(buf0, 2, grid=grid(2), stream=stream0) 2023-01-11T21:38:06.7141005Z buf1 = empty_strided((3, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7141141Z triton_fused_add_1_1.run(buf1, 3, grid=grid(3), stream=stream0) 2023-01-11T21:38:06.7141340Z buf2 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7141477Z triton_fused_add_2_2.run(arg0_1, buf2, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.7141552Z del arg0_1 2023-01-11T21:38:06.7141749Z buf3 = empty_strided((0, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7141852Z return (buf3, buf0, buf1, buf2, ) 2023-01-11T21:38:06.7141857Z 2023-01-11T21:38:06.7141862Z 2023-01-11T21:38:06.7141936Z if __name__ == "__main__": 2023-01-11T21:38:06.7142057Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7142186Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7142383Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7142503Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7142508Z 2023-01-11T21:38:06.7142581Z ok (0.312s) 2023-01-11T21:38:06.7143078Z test_tmp_not_defined_issue1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7143212Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7143578Z Failed to collect metadata on function, produced code may be suboptimal. Known situations this can occur are inference mode only compilation involving resize_ or prims (!schema.hasAnyAliasInfo() INTERNAL ASSERT FAILED); if your situation looks different please file a bug to PyTorch. 
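The traceback that follows comes from AOTAutograd's metadata-collection pass (run_functionalized_fw_and_collect_metadata in torch/_functorch/aot_autograd.py), which traces a functionalized copy of the forward before Inductor compiles it. The functionalization fallback kernel only accepts ops whose schema carries no alias or mutation annotations, and torch.ops.prims.broadcast_in_dim.default trips its !schema.hasAnyAliasInfo() assert, the known prims limitation the warning above names; compilation then proceeds without the metadata, hence "produced code may be suboptimal". Outside functionalization the op itself is well defined; a minimal sketch of its semantics, where the (1, 512) input shape is an assumption inferred from the broadcast arguments [1, 512, 1] / [0, 1] reported further down:

import torch

# Hypothetical stand-in for %var_default_1: assumed (1, 512), since
# broadcast_dimensions [0, 1] map its two dims onto output shape [1, 512, 1].
t = torch.randn(1, 512)
out = torch.ops.prims.broadcast_in_dim.default(t, [1, 512, 1], [0, 1])
print(out.shape)  # torch.Size([1, 512, 1])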
2023-01-11T21:38:06.7143678Z Traceback (most recent call last): 2023-01-11T21:38:06.7143946Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1273, in aot_wrapper_dedupe 2023-01-11T21:38:06.7144116Z fw_metadata, _out, _num_aliasing_metadata_outs = run_functionalized_fw_and_collect_metadata( 2023-01-11T21:38:06.7144363Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 289, in inner 2023-01-11T21:38:06.7144443Z outs = f(*f_args) 2023-01-11T21:38:06.7144705Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2327, in functional_call 2023-01-11T21:38:06.7144834Z out = Interpreter(mod).run(*args[params_len:], **kwargs) 2023-01-11T21:38:06.7145064Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 136, in run 2023-01-11T21:38:06.7145193Z self.env[node] = self.run_node(node) 2023-01-11T21:38:06.7145421Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 177, in run_node 2023-01-11T21:38:06.7145557Z return getattr(self, n.op)(n.target, args, kwargs) 2023-01-11T21:38:06.7145837Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 249, in call_function 2023-01-11T21:38:06.7145930Z return target(*args, **kwargs) 2023-01-11T21:38:06.7146147Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.7146251Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.7146510Z File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/overrides.py", line 36, in __torch_function__ 2023-01-11T21:38:06.7146600Z return func(*args, **kwargs) 2023-01-11T21:38:06.7146808Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.7146911Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.7147157Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims/__init__.py", line 285, in _autograd_impl 2023-01-11T21:38:06.7147283Z return backwards_not_supported(_prim)(*args, **kwargs) 2023-01-11T21:38:06.7147542Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 309, in _autograd_impl 2023-01-11T21:38:06.7147646Z return redispatch_prim(args, kwargs) 2023-01-11T21:38:06.7147907Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 279, in redispatch_prim 2023-01-11T21:38:06.7147998Z return prim(*args, **kwargs) 2023-01-11T21:38:06.7148205Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.7148309Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.7148777Z RuntimeError: !schema.hasAnyAliasInfo() INTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/aten/src/ATen/FunctionalizeFallbackKernel.cpp":32, please report a bug to PyTorch. 
mutating and aliasing ops should all have codegen'd kernels 2023-01-11T21:38:06.7148786Z 2023-01-11T21:38:06.7149028Z While executing %broadcast_in_dim_default : [#users=1] = call_function[target=torch.ops.prims.broadcast_in_dim.default](args = (%var_default_1, [1, 512, 1], [0, 1]), kwargs = {}) 2023-01-11T21:38:06.7149140Z Original traceback: 2023-01-11T21:38:06.7149219Z Module stack: {} 2023-01-11T21:38:06.7149390Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 4723, in forward 2023-01-11T21:38:06.7149548Z broadcast_in_dim_default_2 = torch.ops.prims.broadcast_in_dim.default( 2023-01-11T21:38:06.7149711Z | File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 318, in run 2023-01-11T21:38:06.7149792Z return model(*ex, **kwargs) 2023-01-11T21:38:06.7149798Z 2023-01-11T21:38:06.7150057Z [2023-01-11 21:35:46,046] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 887 2023-01-11T21:38:06.7150270Z [2023-01-11 21:35:46,081] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.7150476Z [2023-01-11 21:35:46,087] torch._inductor.scheduler: [DEBUG] remove_buffer('buf4') 2023-01-11T21:38:06.7150743Z [2023-01-11 21:35:46,306] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 887 2023-01-11T21:38:06.7151167Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7151299Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7151657Z Failed to collect metadata on function, produced code may be suboptimal. Known situations this can occur are inference mode only compilation involving resize_ or prims (!schema.hasAnyAliasInfo() INTERNAL ASSERT FAILED); if your situation looks different please file a bug to PyTorch. 
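The same TypedStorage warning and metadata-collection failure repeat here because the harness compiles the test again, apparently in float16 (the paired generated modules throughout this log show an fp32 run followed by an fp16 run). The deprecated call is the helper at test_torchinductor.py:246, which sizes a flat view of x via x.storage().size(). A minimal sketch of that line written against UntypedStorage instead, as the warning itself suggests; the division by element_size() is needed because untyped_storage().size() counts bytes where storage().size() counted elements:

import torch

x = torch.randn(4)
# untyped_storage().size() is in bytes; recover the element count explicitly.
numel = x.untyped_storage().size() // x.element_size()
buffer = torch.as_strided(x, (numel,), (1,), 0).clone()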
2023-01-11T21:38:06.7151786Z Traceback (most recent call last): 2023-01-11T21:38:06.7152058Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1273, in aot_wrapper_dedupe 2023-01-11T21:38:06.7152221Z fw_metadata, _out, _num_aliasing_metadata_outs = run_functionalized_fw_and_collect_metadata( 2023-01-11T21:38:06.7152466Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 289, in inner 2023-01-11T21:38:06.7152546Z outs = f(*f_args) 2023-01-11T21:38:06.7152807Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2327, in functional_call 2023-01-11T21:38:06.7152934Z out = Interpreter(mod).run(*args[params_len:], **kwargs) 2023-01-11T21:38:06.7153165Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 136, in run 2023-01-11T21:38:06.7153268Z self.env[node] = self.run_node(node) 2023-01-11T21:38:06.7153506Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 177, in run_node 2023-01-11T21:38:06.7153618Z return getattr(self, n.op)(n.target, args, kwargs) 2023-01-11T21:38:06.7153867Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 249, in call_function 2023-01-11T21:38:06.7153961Z return target(*args, **kwargs) 2023-01-11T21:38:06.7154175Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.7154279Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.7154535Z File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/overrides.py", line 36, in __torch_function__ 2023-01-11T21:38:06.7154626Z return func(*args, **kwargs) 2023-01-11T21:38:06.7154842Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.7154938Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.7155187Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims/__init__.py", line 285, in _autograd_impl 2023-01-11T21:38:06.7155316Z return backwards_not_supported(_prim)(*args, **kwargs) 2023-01-11T21:38:06.7155573Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 309, in _autograd_impl 2023-01-11T21:38:06.7155704Z return redispatch_prim(args, kwargs) 2023-01-11T21:38:06.7155969Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 279, in redispatch_prim 2023-01-11T21:38:06.7156062Z return prim(*args, **kwargs) 2023-01-11T21:38:06.7156283Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.7156380Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.7156848Z RuntimeError: !schema.hasAnyAliasInfo() INTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/aten/src/ATen/FunctionalizeFallbackKernel.cpp":32, please report a bug to PyTorch. 
mutating and aliasing ops should all have codegen'd kernels 2023-01-11T21:38:06.7156857Z 2023-01-11T21:38:06.7157100Z While executing %broadcast_in_dim_default : [#users=1] = call_function[target=torch.ops.prims.broadcast_in_dim.default](args = (%var_default_1, [1, 512, 1], [0, 1]), kwargs = {}) 2023-01-11T21:38:06.7157182Z Original traceback: 2023-01-11T21:38:06.7157262Z Module stack: {} 2023-01-11T21:38:06.7157428Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 4723, in forward 2023-01-11T21:38:06.7157582Z broadcast_in_dim_default_2 = torch.ops.prims.broadcast_in_dim.default( 2023-01-11T21:38:06.7157748Z | File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 318, in run 2023-01-11T21:38:06.7157835Z return model(*ex, **kwargs) 2023-01-11T21:38:06.7157840Z 2023-01-11T21:38:06.7158091Z [2023-01-11 21:35:46,508] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 888 2023-01-11T21:38:06.7158300Z [2023-01-11 21:35:46,542] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.7158533Z [2023-01-11 21:35:46,548] torch._inductor.scheduler: [DEBUG] remove_buffer('buf4') 2023-01-11T21:38:06.7158539Z 2023-01-11T21:38:06.7158637Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7158712Z import torch 2023-01-11T21:38:06.7158788Z import random 2023-01-11T21:38:06.7158911Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7159034Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7159040Z 2023-01-11T21:38:06.7159115Z aten = torch.ops.aten 2023-01-11T21:38:06.7159252Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7159347Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7159352Z 2023-01-11T21:38:06.7159426Z import triton 2023-01-11T21:38:06.7159518Z import triton.language as tl 2023-01-11T21:38:06.7159645Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7159786Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7159795Z 2023-01-11T21:38:06.7159799Z 2023-01-11T21:38:06.7159953Z triton_fused_var_0 = async_compile.triton(''' 2023-01-11T21:38:06.7160021Z import triton 2023-01-11T21:38:06.7160114Z import triton.language as tl 2023-01-11T21:38:06.7160229Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7160333Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7160466Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7160592Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7160597Z 2023-01-11T21:38:06.7160691Z @reduction(size_hints=[512, 1024], 2023-01-11T21:38:06.7160799Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7160884Z filename=__file__, 2023-01-11T21:38:06.7161261Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7161341Z @triton.jit 2023-01-11T21:38:06.7161510Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7161585Z xnumel = 512 2023-01-11T21:38:06.7161658Z rnumel = 1024 2023-01-11T21:38:06.7161782Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7161913Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7161998Z xmask = xindex < xnumel 2023-01-11T21:38:06.7162116Z 
rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7162188Z x0 = xindex 2023-01-11T21:38:06.7162310Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7162416Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7162505Z rindex = roffset + rbase 2023-01-11T21:38:06.7162583Z rmask = rindex < rnumel 2023-01-11T21:38:06.7162655Z r1 = rindex 2023-01-11T21:38:06.7162880Z tmp0 = tl.load(in_ptr0 + (r1 + (1024*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7163006Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.7163123Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7163239Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7163345Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7163426Z rindex = roffset + rbase 2023-01-11T21:38:06.7163511Z rmask = rindex < rnumel 2023-01-11T21:38:06.7163584Z r1 = rindex 2023-01-11T21:38:06.7163702Z tmp2 = tl.load(in_ptr0 + (r1 + (1024*x0)), rmask & xmask) 2023-01-11T21:38:06.7163776Z tmp3 = 1024 2023-01-11T21:38:06.7163860Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.7163975Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.7164048Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.7164168Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.7164309Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7164383Z tmp8 = 1024 2023-01-11T21:38:06.7164462Z tmp9 = tmp7 / tmp8 2023-01-11T21:38:06.7164601Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.7164689Z ''') 2023-01-11T21:38:06.7164694Z 2023-01-11T21:38:06.7164699Z 2023-01-11T21:38:06.7164910Z triton_fused_add_add_1_mul_mul_1_sub_sum_1_var_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7164980Z import triton 2023-01-11T21:38:06.7165095Z import triton.language as tl 2023-01-11T21:38:06.7165220Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7165335Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7165466Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7165591Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7165596Z 2023-01-11T21:38:06.7165689Z @reduction(size_hints=[512, 1024], 2023-01-11T21:38:06.7165796Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7165884Z filename=__file__, 2023-01-11T21:38:06.7166352Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: '*fp32', 8: 'i32', 9: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), equal_to_1=())]}) 2023-01-11T21:38:06.7166427Z @triton.jit 2023-01-11T21:38:06.7166645Z def triton_(in_out_ptr0, in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, out_ptr0, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7166719Z xnumel = 512 2023-01-11T21:38:06.7166792Z rnumel = 1024 2023-01-11T21:38:06.7166889Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7167018Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7167104Z xmask = xindex < xnumel 2023-01-11T21:38:06.7167222Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7167297Z x0 = xindex 2023-01-11T21:38:06.7167394Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.7167490Z tmp3 = tl.load(in_ptr2 + (x0), xmask) 2023-01-11T21:38:06.7167596Z 
for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7167676Z rindex = roffset + rbase 2023-01-11T21:38:06.7167795Z rmask = rindex < rnumel 2023-01-11T21:38:06.7167868Z r1 = rindex 2023-01-11T21:38:06.7167985Z tmp0 = tl.load(in_ptr0 + (r1 + (1024*x0)), rmask & xmask) 2023-01-11T21:38:06.7168086Z tmp5 = tl.load(in_ptr3 + (r1), rmask) 2023-01-11T21:38:06.7168187Z tmp7 = tl.load(in_ptr4 + (r1), rmask) 2023-01-11T21:38:06.7168304Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.7168378Z tmp4 = tmp2 * tmp3 2023-01-11T21:38:06.7168458Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7168538Z tmp8 = tmp6 + tmp7 2023-01-11T21:38:06.7168697Z tl.store(out_ptr0 + (r1 + (1024*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp8, rmask & xmask) 2023-01-11T21:38:06.7168822Z _tmp10 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7168927Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7169014Z rindex = roffset + rbase 2023-01-11T21:38:06.7169092Z rmask = rindex < rnumel 2023-01-11T21:38:06.7169167Z r1 = rindex 2023-01-11T21:38:06.7169387Z tmp9 = tl.load(out_ptr0 + (r1 + (1024*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7169511Z _tmp10 = tl.where(xmask & rmask, _tmp10 + tmp9, _tmp10) 2023-01-11T21:38:06.7169627Z tmp10 = tl.reshape(tl.sum(_tmp10, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7169745Z _tmp16 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7169860Z _tmp18 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7169967Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7170048Z rindex = roffset + rbase 2023-01-11T21:38:06.7170132Z rmask = rindex < rnumel 2023-01-11T21:38:06.7170236Z r1 = rindex 2023-01-11T21:38:06.7170461Z tmp11 = tl.load(out_ptr0 + (r1 + (1024*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7170583Z tmp17 = tl.load(out_ptr0 + (r1 + (1024*x0)), rmask & xmask) 2023-01-11T21:38:06.7170658Z tmp12 = 1024 2023-01-11T21:38:06.7170745Z tmp13 = tmp10 / tmp12 2023-01-11T21:38:06.7170856Z tmp14 = tmp11 - tmp13 2023-01-11T21:38:06.7170939Z tmp15 = tmp14 * tmp14 2023-01-11T21:38:06.7171062Z _tmp16 = tl.where(xmask & rmask, _tmp16 + tmp15, _tmp16) 2023-01-11T21:38:06.7171184Z _tmp18 = tl.where(xmask & rmask, _tmp18 + tmp17, _tmp18) 2023-01-11T21:38:06.7171300Z tmp16 = tl.reshape(tl.sum(_tmp16, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7171415Z tmp18 = tl.reshape(tl.sum(_tmp18, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7171513Z tl.store(out_ptr2 + x0, tmp18, xmask) 2023-01-11T21:38:06.7171580Z tmp19 = 1024 2023-01-11T21:38:06.7171665Z tmp20 = tmp16 / tmp19 2023-01-11T21:38:06.7171769Z tmp21 = 1e-05 2023-01-11T21:38:06.7171850Z tmp22 = tmp20 + tmp21 2023-01-11T21:38:06.7171991Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp22, xmask) 2023-01-11T21:38:06.7172080Z ''') 2023-01-11T21:38:06.7172085Z 2023-01-11T21:38:06.7172090Z 2023-01-11T21:38:06.7172187Z async_compile.wait(globals()) 2023-01-11T21:38:06.7172257Z del async_compile 2023-01-11T21:38:06.7172262Z 2023-01-11T21:38:06.7172337Z def call(args): 2023-01-11T21:38:06.7172447Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1 = args 2023-01-11T21:38:06.7172523Z args.clear() 2023-01-11T21:38:06.7172617Z with torch.cuda.device(0): 2023-01-11T21:38:06.7172826Z buf1 = empty_strided((1, 512), (512, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7172918Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.7173004Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7173144Z triton_fused_var_0.run(buf2, arg3_1, 512, 1024, grid=grid(512), 
stream=stream0) 2023-01-11T21:38:06.7173221Z del arg3_1 2023-01-11T21:38:06.7173446Z buf3 = empty_strided((1, 512, 1024), (524288, 1024, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7173649Z buf5 = empty_strided((1, 512), (512, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7173880Z buf6 = empty_strided((1, 512), (512, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7174008Z buf7 = as_strided(buf5, (1, 512, 1), (512, 1, 1)); del buf5 # reuse 2023-01-11T21:38:06.7174209Z triton_fused_add_add_1_mul_mul_1_sub_sum_1_var_1_1.run(buf7, arg2_1, arg4_1, arg5_1, arg0_1, arg1_1, buf3, buf6, 512, 1024, grid=grid(512), stream=stream0) 2023-01-11T21:38:06.7174289Z del arg0_1 2023-01-11T21:38:06.7174357Z del arg1_1 2023-01-11T21:38:06.7174432Z del arg2_1 2023-01-11T21:38:06.7174631Z del arg4_1 2023-01-11T21:38:06.7174705Z del arg5_1 2023-01-11T21:38:06.7174798Z return (buf2, buf6, buf7, ) 2023-01-11T21:38:06.7174808Z 2023-01-11T21:38:06.7174812Z 2023-01-11T21:38:06.7174895Z if __name__ == "__main__": 2023-01-11T21:38:06.7175019Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7175167Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7175400Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7175604Z arg1_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7175828Z arg2_1 = rand_strided((1, 512, 1024), (524288, 1024, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7176046Z arg3_1 = rand_strided((1, 512, 1024), (524288, 1024, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7176261Z arg4_1 = rand_strided((1, 512, 1), (512, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7176471Z arg5_1 = rand_strided((1, 512, 1), (512, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7176677Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1])) 2023-01-11T21:38:06.7176683Z 2023-01-11T21:38:06.7176942Z [2023-01-11 21:35:46,662] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 888 2023-01-11T21:38:06.7176956Z 2023-01-11T21:38:06.7177051Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7177177Z import torch 2023-01-11T21:38:06.7177257Z import random 2023-01-11T21:38:06.7177377Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7177504Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7177510Z 2023-01-11T21:38:06.7177593Z aten = torch.ops.aten 2023-01-11T21:38:06.7177730Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7177818Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7177823Z 2023-01-11T21:38:06.7177900Z import triton 2023-01-11T21:38:06.7177993Z import triton.language as tl 2023-01-11T21:38:06.7178121Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7178259Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7178264Z 2023-01-11T21:38:06.7178269Z 2023-01-11T21:38:06.7178422Z triton_fused_var_0 = async_compile.triton(''' 2023-01-11T21:38:06.7178497Z import triton 2023-01-11T21:38:06.7178592Z import triton.language as tl 2023-01-11T21:38:06.7178699Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7178800Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7178931Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7179055Z from 
torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7179060Z 2023-01-11T21:38:06.7179153Z @reduction(size_hints=[512, 1024], 2023-01-11T21:38:06.7179267Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7179354Z filename=__file__, 2023-01-11T21:38:06.7179729Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7179800Z @triton.jit 2023-01-11T21:38:06.7179971Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7180101Z xnumel = 512 2023-01-11T21:38:06.7180178Z rnumel = 1024 2023-01-11T21:38:06.7180277Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7180414Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7180497Z xmask = xindex < xnumel 2023-01-11T21:38:06.7180610Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7180680Z x0 = xindex 2023-01-11T21:38:06.7180799Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7180905Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7180992Z rindex = roffset + rbase 2023-01-11T21:38:06.7181081Z rmask = rindex < rnumel 2023-01-11T21:38:06.7181155Z r1 = rindex 2023-01-11T21:38:06.7181391Z tmp0 = tl.load(in_ptr0 + (r1 + (1024*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7181482Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7181604Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.7181718Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7181839Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7181944Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7182032Z rindex = roffset + rbase 2023-01-11T21:38:06.7182110Z rmask = rindex < rnumel 2023-01-11T21:38:06.7182184Z r1 = rindex 2023-01-11T21:38:06.7182316Z tmp3 = tl.load(in_ptr0 + (r1 + (1024*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.7182393Z tmp4 = 1024 2023-01-11T21:38:06.7182477Z tmp5 = tmp2 / tmp4 2023-01-11T21:38:06.7182595Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.7182709Z tmp7 = tmp3 - tmp6 2023-01-11T21:38:06.7182783Z tmp8 = tmp7 * tmp7 2023-01-11T21:38:06.7182904Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.7183018Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7183097Z tmp10 = 1024 2023-01-11T21:38:06.7183180Z tmp11 = tmp9 / tmp10 2023-01-11T21:38:06.7183321Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.7183405Z ''') 2023-01-11T21:38:06.7183411Z 2023-01-11T21:38:06.7183415Z 2023-01-11T21:38:06.7183652Z triton_fused_add_add_1_convert_element_type_mul_mul_1_sub_sum_1_var_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7183720Z import triton 2023-01-11T21:38:06.7183814Z import triton.language as tl 2023-01-11T21:38:06.7183928Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7184031Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7184167Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7184295Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7184301Z 2023-01-11T21:38:06.7184397Z @reduction(size_hints=[512, 1024], 2023-01-11T21:38:06.7184504Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7184593Z 
filename=__file__, 2023-01-11T21:38:06.7185060Z meta={'signature': {0: '*fp32', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: '*fp32', 7: '*fp32', 8: 'i32', 9: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), equal_to_1=())]}) 2023-01-11T21:38:06.7185135Z @triton.jit 2023-01-11T21:38:06.7185350Z def triton_(in_out_ptr0, in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, out_ptr0, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7185426Z xnumel = 512 2023-01-11T21:38:06.7185504Z rnumel = 1024 2023-01-11T21:38:06.7185603Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7185730Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7185813Z xmask = xindex < xnumel 2023-01-11T21:38:06.7185931Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7186036Z x0 = xindex 2023-01-11T21:38:06.7186155Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7186271Z tmp3 = tl.load(in_ptr2 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7186375Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7186456Z rindex = roffset + rbase 2023-01-11T21:38:06.7186543Z rmask = rindex < rnumel 2023-01-11T21:38:06.7186615Z r1 = rindex 2023-01-11T21:38:06.7186747Z tmp0 = tl.load(in_ptr0 + (r1 + (1024*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.7186865Z tmp5 = tl.load(in_ptr3 + (r1), rmask).to(tl.float32) 2023-01-11T21:38:06.7186984Z tmp7 = tl.load(in_ptr4 + (r1), rmask).to(tl.float32) 2023-01-11T21:38:06.7187104Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.7187179Z tmp4 = tmp2 * tmp3 2023-01-11T21:38:06.7187260Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7187338Z tmp8 = tmp6 + tmp7 2023-01-11T21:38:06.7187429Z tmp9 = tmp8.to(tl.float32) 2023-01-11T21:38:06.7187593Z tl.store(out_ptr0 + (r1 + (1024*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp9, rmask & xmask) 2023-01-11T21:38:06.7187713Z _tmp11 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7187818Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7187906Z rindex = roffset + rbase 2023-01-11T21:38:06.7187984Z rmask = rindex < rnumel 2023-01-11T21:38:06.7188059Z r1 = rindex 2023-01-11T21:38:06.7188282Z tmp10 = tl.load(out_ptr0 + (r1 + (1024*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7188405Z _tmp11 = tl.where(xmask & rmask, _tmp11 + tmp10, _tmp11) 2023-01-11T21:38:06.7188552Z tmp11 = tl.reshape(tl.sum(_tmp11, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7188669Z _tmp17 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7188785Z _tmp19 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7188883Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7188973Z rindex = roffset + rbase 2023-01-11T21:38:06.7189059Z rmask = rindex < rnumel 2023-01-11T21:38:06.7189130Z r1 = rindex 2023-01-11T21:38:06.7189351Z tmp12 = tl.load(out_ptr0 + (r1 + (1024*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7189472Z tmp18 = tl.load(out_ptr0 + (r1 + (1024*x0)), rmask & xmask) 2023-01-11T21:38:06.7189547Z tmp13 = 1024 2023-01-11T21:38:06.7189624Z tmp14 = tmp11 / tmp13 2023-01-11T21:38:06.7189741Z tmp15 = tmp12 - tmp14 2023-01-11T21:38:06.7189821Z tmp16 = tmp15 * tmp15 2023-01-11T21:38:06.7189944Z _tmp17 = tl.where(xmask & rmask, _tmp17 + tmp16, _tmp17) 2023-01-11T21:38:06.7190069Z _tmp19 = tl.where(xmask & rmask, _tmp19 + tmp18, 
_tmp19) 2023-01-11T21:38:06.7190184Z tmp17 = tl.reshape(tl.sum(_tmp17, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7190297Z tmp19 = tl.reshape(tl.sum(_tmp19, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7190394Z tl.store(out_ptr2 + x0, tmp19, xmask) 2023-01-11T21:38:06.7190468Z tmp20 = 1024 2023-01-11T21:38:06.7190555Z tmp21 = tmp17 / tmp20 2023-01-11T21:38:06.7190659Z tmp22 = 1e-05 2023-01-11T21:38:06.7190741Z tmp23 = tmp21 + tmp22 2023-01-11T21:38:06.7190880Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp23, xmask) 2023-01-11T21:38:06.7190963Z ''') 2023-01-11T21:38:06.7190970Z 2023-01-11T21:38:06.7190975Z 2023-01-11T21:38:06.7191067Z async_compile.wait(globals()) 2023-01-11T21:38:06.7191137Z del async_compile 2023-01-11T21:38:06.7191142Z 2023-01-11T21:38:06.7191217Z def call(args): 2023-01-11T21:38:06.7191327Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1 = args 2023-01-11T21:38:06.7191404Z args.clear() 2023-01-11T21:38:06.7191499Z with torch.cuda.device(0): 2023-01-11T21:38:06.7191704Z buf1 = empty_strided((1, 512), (512, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7191794Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.7191906Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7192049Z triton_fused_var_0.run(buf2, arg3_1, 512, 1024, grid=grid(512), stream=stream0) 2023-01-11T21:38:06.7192123Z del arg3_1 2023-01-11T21:38:06.7192346Z buf3 = empty_strided((1, 512, 1024), (524288, 1024, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7192548Z buf5 = empty_strided((1, 512), (512, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7192742Z buf6 = empty_strided((1, 512), (512, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7192864Z buf7 = as_strided(buf5, (1, 512, 1), (512, 1, 1)); del buf5 # reuse 2023-01-11T21:38:06.7193086Z triton_fused_add_add_1_convert_element_type_mul_mul_1_sub_sum_1_var_1_1.run(buf7, arg2_1, arg4_1, arg5_1, arg0_1, arg1_1, buf3, buf6, 512, 1024, grid=grid(512), stream=stream0) 2023-01-11T21:38:06.7193153Z del arg0_1 2023-01-11T21:38:06.7193225Z del arg1_1 2023-01-11T21:38:06.7193298Z del arg2_1 2023-01-11T21:38:06.7193372Z del arg4_1 2023-01-11T21:38:06.7193442Z del arg5_1 2023-01-11T21:38:06.7193531Z return (buf2, buf6, buf7, ) 2023-01-11T21:38:06.7193537Z 2023-01-11T21:38:06.7193542Z 2023-01-11T21:38:06.7193622Z if __name__ == "__main__": 2023-01-11T21:38:06.7193735Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7193864Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7194066Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7194265Z arg1_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7194513Z arg2_1 = rand_strided((1, 512, 1024), (524288, 1024, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7194726Z arg3_1 = rand_strided((1, 512, 1024), (524288, 1024, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7194934Z arg4_1 = rand_strided((1, 512, 1), (512, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7195144Z arg5_1 = rand_strided((1, 512, 1), (512, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7195284Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1])) 2023-01-11T21:38:06.7195298Z 2023-01-11T21:38:06.7195362Z ok (0.827s) 2023-01-11T21:38:06.7195832Z test_tmp_not_defined_issue2_cuda (__main__.CudaTests) ... 
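Every wrapper above launches its kernels as kernel.run(..., grid=grid(xnumel), stream=stream0), with grid imported from torch._inductor.triton_ops.autotune. A sketch of the 1-D convention, under the assumption (not checked against this exact revision) that it ceil-divides the element count by the autotuned block size; grid_1d is a hypothetical stand-in, not the library function:

# Hypothetical sketch: one Triton program per XBLOCK-sized slice of xnumel.
def grid_1d(xnumel):
    def grid_fn(meta):
        # meta["XBLOCK"] is chosen by the @pointwise/@reduction autotuner.
        return ((xnumel + meta["XBLOCK"] - 1) // meta["XBLOCK"],)
    return grid_fn

With xnumel=512 and, say, XBLOCK=1 this yields 512 programs; the xmask = xindex < xnumel guard in the kernels above covers the ragged tail whenever XBLOCK does not divide xnumel.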
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7195966Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7196225Z [2023-01-11 21:35:46,725] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 889 2023-01-11T21:38:06.7196494Z [2023-01-11 21:35:46,936] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 889 2023-01-11T21:38:06.7196910Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7197042Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7197297Z [2023-01-11 21:35:46,998] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 890 2023-01-11T21:38:06.7197305Z 2023-01-11T21:38:06.7197403Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7197479Z import torch 2023-01-11T21:38:06.7197547Z import random 2023-01-11T21:38:06.7197669Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7197821Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7197827Z 2023-01-11T21:38:06.7197911Z aten = torch.ops.aten 2023-01-11T21:38:06.7198050Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7198147Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7198152Z 2023-01-11T21:38:06.7198229Z import triton 2023-01-11T21:38:06.7198323Z import triton.language as tl 2023-01-11T21:38:06.7198440Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7198581Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7198587Z 2023-01-11T21:38:06.7198592Z 2023-01-11T21:38:06.7198764Z triton_fused_div_mul_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7198840Z import triton 2023-01-11T21:38:06.7198934Z import triton.language as tl 2023-01-11T21:38:06.7199050Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7199151Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7199277Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7199404Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7199409Z 2023-01-11T21:38:06.7199500Z @reduction(size_hints=[32, 8192], 2023-01-11T21:38:06.7199616Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7199704Z filename=__file__, 2023-01-11T21:38:06.7200092Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7200166Z @triton.jit 2023-01-11T21:38:06.7200379Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7200455Z xnumel = 18 2023-01-11T21:38:06.7200522Z rnumel = 7823 2023-01-11T21:38:06.7200621Z 
xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7200759Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7200843Z xmask = xindex < xnumel 2023-01-11T21:38:06.7200963Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7201033Z x0 = xindex 2023-01-11T21:38:06.7201281Z tmp4 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.7201392Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7201498Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7201588Z rindex = roffset + rbase 2023-01-11T21:38:06.7201673Z rmask = rindex < rnumel 2023-01-11T21:38:06.7201748Z r1 = rindex 2023-01-11T21:38:06.7201830Z tmp0 = r1 + (7823*x0) 2023-01-11T21:38:06.7201904Z tmp1 = 140800 2023-01-11T21:38:06.7201979Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7202295Z tmp3 = tl.load(in_ptr0 + ((r1 + (7823*x0)) % 140800 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7202378Z tmp5 = tmp3 / tmp4 2023-01-11T21:38:06.7202696Z tmp6 = tl.load(in_ptr2 + ((r1 + (7823*x0)) % 140800 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7202780Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.7202874Z tmp8 = tl.where(tmp2, tmp7, 0) 2023-01-11T21:38:06.7203000Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.7203108Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7203207Z tl.store(out_ptr0 + x0, tmp9, xmask) 2023-01-11T21:38:06.7203294Z ''') 2023-01-11T21:38:06.7203300Z 2023-01-11T21:38:06.7203304Z 2023-01-11T21:38:06.7203472Z triton_fused_div_mul_sum_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7203547Z import triton 2023-01-11T21:38:06.7203642Z import triton.language as tl 2023-01-11T21:38:06.7203755Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7203884Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7204009Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7204138Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7204143Z 2023-01-11T21:38:06.7204232Z @reduction(size_hints=[1, 32], 2023-01-11T21:38:06.7204346Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7204431Z filename=__file__, 2023-01-11T21:38:06.7204789Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7204866Z @triton.jit 2023-01-11T21:38:06.7205033Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7205099Z xnumel = 1 2023-01-11T21:38:06.7205171Z rnumel = 18 2023-01-11T21:38:06.7205268Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7205407Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7205490Z xmask = xindex < xnumel 2023-01-11T21:38:06.7205609Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7205726Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7205824Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7205911Z rindex = roffset + rbase 2023-01-11T21:38:06.7205996Z rmask = rindex < rnumel 2023-01-11T21:38:06.7206067Z r0 = rindex 2023-01-11T21:38:06.7206170Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 
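        # _tmp1 accumulates a masked running sum over RBLOCK-sized chunks:
        # tl.where leaves lanes outside (xmask & rmask) at their previous
        # value, and tl.sum below collapses the RBLOCK axis after the loop.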
2023-01-11T21:38:06.7206290Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.7206430Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7206555Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.7206641Z ''') 2023-01-11T21:38:06.7206647Z 2023-01-11T21:38:06.7206651Z 2023-01-11T21:38:06.7206748Z async_compile.wait(globals()) 2023-01-11T21:38:06.7206825Z del async_compile 2023-01-11T21:38:06.7206831Z 2023-01-11T21:38:06.7206904Z def call(args): 2023-01-11T21:38:06.7207009Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.7207086Z args.clear() 2023-01-11T21:38:06.7207171Z with torch.cuda.device(0): 2023-01-11T21:38:06.7207370Z buf0 = empty_strided((18, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7207461Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7207635Z triton_fused_div_mul_sum_1_0.run(primals_3, primals_2, primals_1, buf0, 18, 7823, grid=grid(18), stream=stream0) 2023-01-11T21:38:06.7207826Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7207971Z triton_fused_div_mul_sum_1_1.run(buf0, buf1, 1, 18, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7208091Z return (buf1, primals_1, primals_2, primals_3, ) 2023-01-11T21:38:06.7208096Z 2023-01-11T21:38:06.7208100Z 2023-01-11T21:38:06.7208183Z if __name__ == "__main__": 2023-01-11T21:38:06.7208299Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7208418Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7208652Z primals_1 = rand_strided((1, 88, 40, 40), (140800, 1600, 40, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7208847Z primals_2 = rand_strided((), (), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7209074Z primals_3 = rand_strided((1, 88, 40, 40), (140800, 1600, 40, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7209220Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.7209229Z 2023-01-11T21:38:06.7209495Z [2023-01-11 21:35:47,121] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 890 2023-01-11T21:38:06.7209501Z 2023-01-11T21:38:06.7209600Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7209674Z import torch 2023-01-11T21:38:06.7209771Z import random 2023-01-11T21:38:06.7209891Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7210017Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7210022Z 2023-01-11T21:38:06.7210103Z aten = torch.ops.aten 2023-01-11T21:38:06.7210240Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7210338Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7210344Z 2023-01-11T21:38:06.7210417Z import triton 2023-01-11T21:38:06.7210509Z import triton.language as tl 2023-01-11T21:38:06.7210626Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7210769Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7210774Z 2023-01-11T21:38:06.7210779Z 2023-01-11T21:38:06.7210948Z triton_fused_div_mul_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7211023Z import triton 2023-01-11T21:38:06.7211116Z import triton.language as tl 2023-01-11T21:38:06.7211230Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7211332Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7211455Z from torch._inductor.triton_ops.autotune import reduction 
2023-01-11T21:38:06.7211582Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7211587Z 2023-01-11T21:38:06.7211677Z @reduction(size_hints=[32, 8192], 2023-01-11T21:38:06.7211790Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7211875Z filename=__file__, 2023-01-11T21:38:06.7212268Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp32', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7212371Z @triton.jit 2023-01-11T21:38:06.7212554Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7212627Z xnumel = 18 2023-01-11T21:38:06.7212697Z rnumel = 7823 2023-01-11T21:38:06.7212795Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7212928Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7213012Z xmask = xindex < xnumel 2023-01-11T21:38:06.7213131Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7213202Z x0 = xindex 2023-01-11T21:38:06.7213471Z tmp4 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7213582Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7213687Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7213780Z rindex = roffset + rbase 2023-01-11T21:38:06.7213866Z rmask = rindex < rnumel 2023-01-11T21:38:06.7213936Z r1 = rindex 2023-01-11T21:38:06.7214017Z tmp0 = r1 + (7823*x0) 2023-01-11T21:38:06.7214093Z tmp1 = 140800 2023-01-11T21:38:06.7214166Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7214723Z tmp3 = tl.load(in_ptr0 + ((r1 + (7823*x0)) % 140800 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7214811Z tmp5 = tmp3 / tmp4 2023-01-11T21:38:06.7215147Z tmp6 = tl.load(in_ptr2 + ((r1 + (7823*x0)) % 140800 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7215228Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.7215322Z tmp8 = tl.where(tmp2, tmp7, 0) 2023-01-11T21:38:06.7215444Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.7215562Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7215654Z tl.store(out_ptr0 + x0, tmp9, xmask) 2023-01-11T21:38:06.7215739Z ''') 2023-01-11T21:38:06.7215745Z 2023-01-11T21:38:06.7215750Z 2023-01-11T21:38:06.7215918Z triton_fused_div_mul_sum_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7216038Z import triton 2023-01-11T21:38:06.7216133Z import triton.language as tl 2023-01-11T21:38:06.7216249Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7216352Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7216476Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7216600Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7216605Z 2023-01-11T21:38:06.7216695Z @reduction(size_hints=[1, 32], 2023-01-11T21:38:06.7216808Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7216892Z filename=__file__, 2023-01-11T21:38:06.7217311Z meta={'signature': {0: '*fp32', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 
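# Note on the signature dict above: it types each pointer argument, so here
# in_ptr0 is *fp32 (the per-chunk partial sums) while out_ptr0 is *fp16,
# matching the float32 buf0 and float16 buf1 allocated in call() below.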
2023-01-11T21:38:06.7217392Z @triton.jit 2023-01-11T21:38:06.7217561Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7217627Z xnumel = 1 2023-01-11T21:38:06.7217705Z rnumel = 18 2023-01-11T21:38:06.7217803Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7217936Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7218019Z xmask = xindex < xnumel 2023-01-11T21:38:06.7218138Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7218255Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7218353Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7218440Z rindex = roffset + rbase 2023-01-11T21:38:06.7218581Z rmask = rindex < rnumel 2023-01-11T21:38:06.7218658Z r0 = rindex 2023-01-11T21:38:06.7218761Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.7218884Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.7218998Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7219129Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.7219219Z ''') 2023-01-11T21:38:06.7219224Z 2023-01-11T21:38:06.7219228Z 2023-01-11T21:38:06.7219324Z async_compile.wait(globals()) 2023-01-11T21:38:06.7219404Z del async_compile 2023-01-11T21:38:06.7219409Z 2023-01-11T21:38:06.7219485Z def call(args): 2023-01-11T21:38:06.7219594Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.7219672Z args.clear() 2023-01-11T21:38:06.7219766Z with torch.cuda.device(0): 2023-01-11T21:38:06.7219959Z buf0 = empty_strided((18, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7220059Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7220233Z triton_fused_div_mul_sum_1_0.run(primals_3, primals_2, primals_1, buf0, 18, 7823, grid=grid(18), stream=stream0) 2023-01-11T21:38:06.7220424Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7220573Z triton_fused_div_mul_sum_1_1.run(buf0, buf1, 1, 18, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7220696Z return (buf1, primals_1, primals_2, primals_3, ) 2023-01-11T21:38:06.7220701Z 2023-01-11T21:38:06.7220706Z 2023-01-11T21:38:06.7220789Z if __name__ == "__main__": 2023-01-11T21:38:06.7220910Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7221032Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7221267Z primals_1 = rand_strided((1, 88, 40, 40), (140800, 1600, 40, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7221464Z primals_2 = rand_strided((), (), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7221694Z primals_3 = rand_strided((1, 88, 40, 40), (140800, 1600, 40, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7221840Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.7221845Z 2023-01-11T21:38:06.7221918Z ok (0.459s) 2023-01-11T21:38:06.7222416Z test_to_device_constant_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7222547Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7222810Z [2023-01-11 21:35:47,267] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 891 2023-01-11T21:38:06.7223072Z [2023-01-11 21:35:47,350] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 891 2023-01-11T21:38:06.7223484Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7223616Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7223870Z [2023-01-11 21:35:47,494] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 892 2023-01-11T21:38:06.7224133Z [2023-01-11 21:35:47,571] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 892 2023-01-11T21:38:06.7224138Z 2023-01-11T21:38:06.7224235Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7224310Z import torch 2023-01-11T21:38:06.7224416Z import random 2023-01-11T21:38:06.7224535Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7224658Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7224663Z 2023-01-11T21:38:06.7224738Z aten = torch.ops.aten 2023-01-11T21:38:06.7224876Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7224973Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7224979Z 2023-01-11T21:38:06.7225051Z import triton 2023-01-11T21:38:06.7225143Z import triton.language as tl 2023-01-11T21:38:06.7225270Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7225409Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7225568Z constant0 = None # 001ba0e61f337f449db2ca901dfd67b1df5cf0825674994be1be485b60aabc98 2023-01-11T21:38:06.7225725Z constant0_cuda0 = None # beb02a6268c3f406824d31228875474e7116cf8e770246a5eb85f5795315f9cc 2023-01-11T21:38:06.7225731Z 2023-01-11T21:38:06.7225735Z 2023-01-11T21:38:06.7225896Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7225970Z import triton 2023-01-11T21:38:06.7226063Z import triton.language as tl 2023-01-11T21:38:06.7226176Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7226280Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7226415Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7226532Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7226544Z 2023-01-11T21:38:06.7226937Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7227013Z @triton.jit 2023-01-11T21:38:06.7227147Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7227219Z xnumel = 10 2023-01-11T21:38:06.7227316Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7227450Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7227533Z 
xmask = xindex < xnumel 2023-01-11T21:38:06.7227597Z x0 = xindex 2023-01-11T21:38:06.7227694Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7227765Z tmp0 = x0 2023-01-11T21:38:06.7227882Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7227961Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7228098Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7228186Z ''') 2023-01-11T21:38:06.7228192Z 2023-01-11T21:38:06.7228196Z 2023-01-11T21:38:06.7228417Z triton_fused_convert_element_type_1_convert_element_type_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.7228485Z import triton 2023-01-11T21:38:06.7228577Z import triton.language as tl 2023-01-11T21:38:06.7228691Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7228791Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7228922Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7229050Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7229056Z 2023-01-11T21:38:06.7229469Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7229544Z @triton.jit 2023-01-11T21:38:06.7229680Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7229753Z xnumel = 64 2023-01-11T21:38:06.7229849Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7229978Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7230060Z xmask = xindex < xnumel 2023-01-11T21:38:06.7230131Z x0 = xindex 2023-01-11T21:38:06.7230321Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7230439Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7230510Z tmp2 = 1 2023-01-11T21:38:06.7230592Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7230725Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.7230857Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7230943Z ''') 2023-01-11T21:38:06.7230949Z 2023-01-11T21:38:06.7230953Z 2023-01-11T21:38:06.7231047Z async_compile.wait(globals()) 2023-01-11T21:38:06.7231118Z del async_compile 2023-01-11T21:38:06.7231128Z 2023-01-11T21:38:06.7231196Z def call(args): 2023-01-11T21:38:06.7231268Z arg0_1, = args 2023-01-11T21:38:06.7231343Z args.clear() 2023-01-11T21:38:06.7231438Z with torch.cuda.device(0): 2023-01-11T21:38:06.7231636Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7231729Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7231858Z triton_fused_add_0.run(arg0_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.7231934Z del arg0_1 2023-01-11T21:38:06.7232129Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7232324Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7232524Z triton_fused_convert_element_type_1_convert_element_type_2_1.run(constant0_cuda0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.7232613Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7232619Z 2023-01-11T21:38:06.7232623Z 2023-01-11T21:38:06.7232703Z if __name__ == "__main__": 2023-01-11T21:38:06.7232821Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7232948Z from 
torch._inductor.utils import print_performance 2023-01-11T21:38:06.7233136Z constant0 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.7233345Z constant0_cuda0 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.7233546Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7233658Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7233664Z 2023-01-11T21:38:06.7233668Z 2023-01-11T21:38:06.7233765Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7233838Z import torch 2023-01-11T21:38:06.7233984Z import random 2023-01-11T21:38:06.7234098Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7234223Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7234228Z 2023-01-11T21:38:06.7234309Z aten = torch.ops.aten 2023-01-11T21:38:06.7234447Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7234541Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7234546Z 2023-01-11T21:38:06.7234622Z import triton 2023-01-11T21:38:06.7234715Z import triton.language as tl 2023-01-11T21:38:06.7234839Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7234975Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7235143Z constant0 = None # 001ba0e61f337f449db2ca901dfd67b1df5cf0825674994be1be485b60aabc98 2023-01-11T21:38:06.7235302Z constant0_cuda0 = None # beb02a6268c3f406824d31228875474e7116cf8e770246a5eb85f5795315f9cc 2023-01-11T21:38:06.7235308Z 2023-01-11T21:38:06.7235312Z 2023-01-11T21:38:06.7235467Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7235542Z import triton 2023-01-11T21:38:06.7235635Z import triton.language as tl 2023-01-11T21:38:06.7235748Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7235850Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7235976Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7236100Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7236105Z 2023-01-11T21:38:06.7236504Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7236605Z @triton.jit 2023-01-11T21:38:06.7236739Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7236816Z xnumel = 10 2023-01-11T21:38:06.7236916Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7237044Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7237120Z xmask = xindex < xnumel 2023-01-11T21:38:06.7237191Z x0 = xindex 2023-01-11T21:38:06.7237310Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7237382Z tmp0 = x0 2023-01-11T21:38:06.7237470Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7237550Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7237685Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7237763Z ''') 2023-01-11T21:38:06.7237769Z 2023-01-11T21:38:06.7237773Z 2023-01-11T21:38:06.7238000Z triton_fused_convert_element_type_1_convert_element_type_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.7238076Z import triton 2023-01-11T21:38:06.7238168Z import triton.language as tl 2023-01-11T21:38:06.7238282Z from torch._inductor.ir import ReductionHint 
2023-01-11T21:38:06.7238383Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7238518Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7238646Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7238651Z 2023-01-11T21:38:06.7239054Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7239128Z @triton.jit 2023-01-11T21:38:06.7239270Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7239344Z xnumel = 64 2023-01-11T21:38:06.7239443Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7239571Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7239656Z xmask = xindex < xnumel 2023-01-11T21:38:06.7239726Z x0 = xindex 2023-01-11T21:38:06.7239909Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7240038Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7240112Z tmp2 = 1 2023-01-11T21:38:06.7240192Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7240326Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.7240458Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7240543Z ''') 2023-01-11T21:38:06.7240548Z 2023-01-11T21:38:06.7240553Z 2023-01-11T21:38:06.7240638Z async_compile.wait(globals()) 2023-01-11T21:38:06.7240716Z del async_compile 2023-01-11T21:38:06.7240722Z 2023-01-11T21:38:06.7240796Z def call(args): 2023-01-11T21:38:06.7240874Z arg0_1, = args 2023-01-11T21:38:06.7240951Z args.clear() 2023-01-11T21:38:06.7241044Z with torch.cuda.device(0): 2023-01-11T21:38:06.7241241Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7241326Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7241465Z triton_fused_add_0.run(arg0_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.7241539Z del arg0_1 2023-01-11T21:38:06.7241732Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7241925Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7242121Z triton_fused_convert_element_type_1_convert_element_type_2_1.run(constant0_cuda0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.7242210Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7242215Z 2023-01-11T21:38:06.7242220Z 2023-01-11T21:38:06.7242301Z if __name__ == "__main__": 2023-01-11T21:38:06.7242439Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7242566Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7242765Z constant0 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.7242978Z constant0_cuda0 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.7243176Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7243293Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7243298Z 2023-01-11T21:38:06.7243369Z ok (0.448s) 2023-01-11T21:38:06.7243829Z test_to_device_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. 
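A note on the pointwise kernels in this block: each program instance turns tl.program_id(0) into a slab of XBLOCK flat indices, masks off the tail where xindex >= xnumel, and performs a masked load/compute/store; triton_fused_add_0 specifically computes out[x0] = float(x0) + in[x0], with an extra .to(tl.float32) upcast in the fp16 variant. A rough eager emulation, assuming plain PyTorch (pointwise_add_index is a hypothetical stand-in, not generated code):

import torch

# Hypothetical emulation of triton_fused_add_0: process one XBLOCK-sized
# slab of indices per program instance, guarding the ragged tail with a mask.
def pointwise_add_index(x: torch.Tensor, xblock: int = 16) -> torch.Tensor:
    out = torch.empty(x.numel(), dtype=torch.float32)
    for pid in range((x.numel() + xblock - 1) // xblock):  # grid(xnumel)
        xindex = pid * xblock + torch.arange(xblock)       # xoffset + arange
        idx = xindex[xindex < x.numel()]                   # apply xmask
        out[idx] = idx.to(torch.float32) + x[idx].to(torch.float32)
    return out

x = torch.randn(10, dtype=torch.float16)                   # xnumel = 10 above
assert torch.allclose(pointwise_add_index(x),
                      torch.arange(10, dtype=torch.float32) + x.float())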
This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7243963Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7244219Z [2023-01-11 21:35:47,624] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 893 2023-01-11T21:38:06.7244395Z [2023-01-11 21:35:47,625] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.7244661Z [2023-01-11 21:35:47,627] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 893 2023-01-11T21:38:06.7245076Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7245209Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7245467Z [2023-01-11 21:35:47,677] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 894 2023-01-11T21:38:06.7245648Z [2023-01-11 21:35:47,679] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.7245909Z [2023-01-11 21:35:47,745] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 894 2023-01-11T21:38:06.7245939Z 2023-01-11T21:38:06.7246039Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7246114Z import torch 2023-01-11T21:38:06.7246182Z import random 2023-01-11T21:38:06.7246301Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7246426Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7246432Z 2023-01-11T21:38:06.7246515Z aten = torch.ops.aten 2023-01-11T21:38:06.7246652Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7246748Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7246754Z 2023-01-11T21:38:06.7246827Z import triton 2023-01-11T21:38:06.7246922Z import triton.language as tl 2023-01-11T21:38:06.7247041Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7247180Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7247186Z 2023-01-11T21:38:06.7247190Z 2023-01-11T21:38:06.7247282Z async_compile.wait(globals()) 2023-01-11T21:38:06.7247361Z del async_compile 2023-01-11T21:38:06.7247366Z 2023-01-11T21:38:06.7247442Z def call(args): 2023-01-11T21:38:06.7247514Z arg0_1, = args 2023-01-11T21:38:06.7247590Z args.clear() 2023-01-11T21:38:06.7247790Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.7247871Z buf0.copy_(arg0_1) 2023-01-11T21:38:06.7247944Z del arg0_1 2023-01-11T21:38:06.7248024Z return (buf0, ) 2023-01-11T21:38:06.7248029Z 2023-01-11T21:38:06.7248033Z 2023-01-11T21:38:06.7248112Z if __name__ == "__main__": 2023-01-11T21:38:06.7248229Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7248384Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7248592Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7248698Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7248703Z 2023-01-11T21:38:06.7248714Z 2023-01-11T21:38:06.7248807Z from ctypes import c_void_p, c_long 
2023-01-11T21:38:06.7248881Z import torch 2023-01-11T21:38:06.7248957Z import random 2023-01-11T21:38:06.7249074Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7249197Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7249202Z 2023-01-11T21:38:06.7249288Z aten = torch.ops.aten 2023-01-11T21:38:06.7249424Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7249512Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7249517Z 2023-01-11T21:38:06.7249590Z import triton 2023-01-11T21:38:06.7249683Z import triton.language as tl 2023-01-11T21:38:06.7249812Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7249951Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7249957Z 2023-01-11T21:38:06.7249961Z 2023-01-11T21:38:06.7250144Z triton_fused_convert_element_type_0 = async_compile.triton(''' 2023-01-11T21:38:06.7250218Z import triton 2023-01-11T21:38:06.7250314Z import triton.language as tl 2023-01-11T21:38:06.7250421Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7250522Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7250654Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7250780Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7250785Z 2023-01-11T21:38:06.7251184Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7251261Z @triton.jit 2023-01-11T21:38:06.7251392Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7251465Z xnumel = 40 2023-01-11T21:38:06.7251555Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7251683Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7251794Z xmask = xindex < xnumel 2023-01-11T21:38:06.7251867Z x0 = xindex 2023-01-11T21:38:06.7251986Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7252073Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7252208Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.7252286Z ''') 2023-01-11T21:38:06.7252292Z 2023-01-11T21:38:06.7252296Z 2023-01-11T21:38:06.7252388Z async_compile.wait(globals()) 2023-01-11T21:38:06.7252464Z del async_compile 2023-01-11T21:38:06.7252469Z 2023-01-11T21:38:06.7252544Z def call(args): 2023-01-11T21:38:06.7252616Z arg0_1, = args 2023-01-11T21:38:06.7252693Z args.clear() 2023-01-11T21:38:06.7252788Z with torch.cuda.device(0): 2023-01-11T21:38:06.7252989Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7253083Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7253247Z triton_fused_convert_element_type_0.run(arg0_1, buf0, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.7253322Z del arg0_1 2023-01-11T21:38:06.7253526Z buf1 = empty_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.7253605Z buf1.copy_(buf0) 2023-01-11T21:38:06.7253681Z return (buf1, ) 2023-01-11T21:38:06.7253686Z 2023-01-11T21:38:06.7253690Z 2023-01-11T21:38:06.7253769Z if __name__ == "__main__": 2023-01-11T21:38:06.7253879Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7254004Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7254212Z arg0_1 = 
rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7254360Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7254366Z 2023-01-11T21:38:06.7254437Z ok (0.173s) 2023-01-11T21:38:06.7255030Z test_to_dtype_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7255162Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7255421Z [2023-01-11 21:35:47,836] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 895 2023-01-11T21:38:06.7255684Z [2023-01-11 21:35:47,909] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 895 2023-01-11T21:38:06.7256101Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7256228Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7256478Z [2023-01-11 21:35:48,002] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 896 2023-01-11T21:38:06.7256738Z [2023-01-11 21:35:48,019] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 896 2023-01-11T21:38:06.7256744Z 2023-01-11T21:38:06.7256840Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7256914Z import torch 2023-01-11T21:38:06.7256988Z import random 2023-01-11T21:38:06.7257106Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7257299Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7257305Z 2023-01-11T21:38:06.7257380Z aten = torch.ops.aten 2023-01-11T21:38:06.7257518Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7257613Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7257618Z 2023-01-11T21:38:06.7257749Z import triton 2023-01-11T21:38:06.7257846Z import triton.language as tl 2023-01-11T21:38:06.7257971Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7258110Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7258116Z 2023-01-11T21:38:06.7258120Z 2023-01-11T21:38:06.7258343Z triton_fused_convert_element_type_1_convert_element_type_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.7258410Z import triton 2023-01-11T21:38:06.7258503Z import triton.language as tl 2023-01-11T21:38:06.7258616Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7258719Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7258859Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7258985Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7258990Z 2023-01-11T21:38:06.7259403Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp32', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), 
equal_to_1=())]}) 2023-01-11T21:38:06.7259478Z @triton.jit 2023-01-11T21:38:06.7259614Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7259688Z xnumel = 40 2023-01-11T21:38:06.7259785Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7259913Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7259996Z xmask = xindex < xnumel 2023-01-11T21:38:06.7260068Z x0 = xindex 2023-01-11T21:38:06.7260258Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7260399Z tmp4 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7260473Z tmp1 = 1 2023-01-11T21:38:06.7260555Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7260646Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.7260727Z tmp5 = (tmp4 != 0) 2023-01-11T21:38:06.7260868Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7261005Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7261087Z ''') 2023-01-11T21:38:06.7261101Z 2023-01-11T21:38:06.7261106Z 2023-01-11T21:38:06.7261194Z async_compile.wait(globals()) 2023-01-11T21:38:06.7261274Z del async_compile 2023-01-11T21:38:06.7261279Z 2023-01-11T21:38:06.7261356Z def call(args): 2023-01-11T21:38:06.7261438Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7261516Z args.clear() 2023-01-11T21:38:06.7261611Z with torch.cuda.device(0): 2023-01-11T21:38:06.7261821Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7262024Z buf1 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.7262120Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7262309Z triton_fused_convert_element_type_1_convert_element_type_2_0.run(arg1_1, buf0, buf1, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.7262416Z return (arg0_1, buf0, arg1_1, buf1, ) 2023-01-11T21:38:06.7262422Z 2023-01-11T21:38:06.7262426Z 2023-01-11T21:38:06.7262510Z if __name__ == "__main__": 2023-01-11T21:38:06.7262629Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7262757Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7262969Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7263173Z arg1_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.7263297Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7263306Z 2023-01-11T21:38:06.7263310Z 2023-01-11T21:38:06.7263411Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7263490Z import torch 2023-01-11T21:38:06.7263565Z import random 2023-01-11T21:38:06.7263685Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7263840Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7263845Z 2023-01-11T21:38:06.7263933Z aten = torch.ops.aten 2023-01-11T21:38:06.7264064Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7264162Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7264167Z 2023-01-11T21:38:06.7264244Z import triton 2023-01-11T21:38:06.7264338Z import triton.language as tl 2023-01-11T21:38:06.7264464Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7264605Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7264611Z 2023-01-11T21:38:06.7264616Z 2023-01-11T21:38:06.7264803Z triton_fused_convert_element_type_0 = async_compile.triton(''' 
2023-01-11T21:38:06.7264884Z import triton 2023-01-11T21:38:06.7264972Z import triton.language as tl 2023-01-11T21:38:06.7265089Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7265196Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7265334Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7265461Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7265467Z 2023-01-11T21:38:06.7265870Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7265946Z @triton.jit 2023-01-11T21:38:06.7266080Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7266149Z xnumel = 40 2023-01-11T21:38:06.7266249Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7266410Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7266496Z xmask = xindex < xnumel 2023-01-11T21:38:06.7266570Z x0 = xindex 2023-01-11T21:38:06.7266690Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7266783Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7266913Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.7267003Z ''') 2023-01-11T21:38:06.7267008Z 2023-01-11T21:38:06.7267014Z 2023-01-11T21:38:06.7267238Z triton_fused_convert_element_type_1_convert_element_type_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.7267316Z import triton 2023-01-11T21:38:06.7267411Z import triton.language as tl 2023-01-11T21:38:06.7267525Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7267629Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7267764Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7267884Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7267892Z 2023-01-11T21:38:06.7268306Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp32', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7268386Z @triton.jit 2023-01-11T21:38:06.7268534Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7268610Z xnumel = 40 2023-01-11T21:38:06.7268712Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7268843Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7268930Z xmask = xindex < xnumel 2023-01-11T21:38:06.7268997Z x0 = xindex 2023-01-11T21:38:06.7269194Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7269294Z tmp4 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7269369Z tmp1 = 1 2023-01-11T21:38:06.7269453Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7269545Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.7269627Z tmp5 = (tmp4 != 0) 2023-01-11T21:38:06.7269757Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7269890Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7270005Z ''') 2023-01-11T21:38:06.7270011Z 2023-01-11T21:38:06.7270016Z 2023-01-11T21:38:06.7270113Z async_compile.wait(globals()) 2023-01-11T21:38:06.7270192Z del async_compile 2023-01-11T21:38:06.7270197Z 2023-01-11T21:38:06.7270275Z def call(args): 
2023-01-11T21:38:06.7270358Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7270436Z args.clear() 2023-01-11T21:38:06.7270524Z with torch.cuda.device(0): 2023-01-11T21:38:06.7270738Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7270833Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7270995Z triton_fused_convert_element_type_0.run(arg0_1, buf0, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.7271073Z del arg0_1 2023-01-11T21:38:06.7271280Z buf1 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7271488Z buf2 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.7271671Z triton_fused_convert_element_type_1_convert_element_type_2_1.run(arg1_1, buf1, buf2, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.7271774Z return (buf0, buf1, arg1_1, buf2, ) 2023-01-11T21:38:06.7271779Z 2023-01-11T21:38:06.7271784Z 2023-01-11T21:38:06.7271865Z if __name__ == "__main__": 2023-01-11T21:38:06.7271985Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7272113Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7272323Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7272536Z arg1_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.7272685Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7272690Z 2023-01-11T21:38:06.7272761Z ok (0.276s) 2023-01-11T21:38:06.7273211Z test_topk_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7273344Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7273605Z [2023-01-11 21:35:48,035] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 897 2023-01-11T21:38:06.7273825Z [2023-01-11 21:35:48,040] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.topk 2023-01-11T21:38:06.7274094Z [2023-01-11 21:35:48,043] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 897 2023-01-11T21:38:06.7274515Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
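For reference before the topk modules below: stripped of codegen plumbing, the to_device/to_dtype call() functions above amount to a cast (the convert_element_type kernel, which in the to_dtype case also materializes the bool output via tmp4 != 0) plus a DeviceCopy done as empty_strided + copy_. A hypothetical eager equivalent of the to-device variant, on CPU so the sketch runs anywhere (the log's version casts on 'cuda:0'):

import torch

# call_sketch is hypothetical: cast first (the generated
# convert_element_type kernel), then the DeviceCopy fallback pattern of
# allocating the destination with empty_strided and filling it with copy_.
def call_sketch(arg0_1: torch.Tensor) -> torch.Tensor:
    buf0 = arg0_1.to(torch.float32)
    buf1 = torch.empty_strided((2, 2, 10), (20, 10, 1),
                               device='cpu', dtype=torch.float32)
    buf1.copy_(buf0)
    return buf1

x = torch.randn(2, 2, 10, dtype=torch.float16)
assert torch.equal(call_sketch(x), x.float())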
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7274644Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7274895Z [2023-01-11 21:35:48,056] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 898 2023-01-11T21:38:06.7275106Z [2023-01-11 21:35:48,061] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.topk 2023-01-11T21:38:06.7275362Z [2023-01-11 21:35:48,063] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 898 2023-01-11T21:38:06.7275370Z 2023-01-11T21:38:06.7275469Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7275537Z import torch 2023-01-11T21:38:06.7275613Z import random 2023-01-11T21:38:06.7275733Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7275858Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7275890Z 2023-01-11T21:38:06.7275974Z aten = torch.ops.aten 2023-01-11T21:38:06.7276112Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7276208Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7276213Z 2023-01-11T21:38:06.7276280Z import triton 2023-01-11T21:38:06.7276377Z import triton.language as tl 2023-01-11T21:38:06.7276504Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7276645Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7276651Z 2023-01-11T21:38:06.7276655Z 2023-01-11T21:38:06.7276747Z async_compile.wait(globals()) 2023-01-11T21:38:06.7276827Z del async_compile 2023-01-11T21:38:06.7276832Z 2023-01-11T21:38:06.7276907Z def call(args): 2023-01-11T21:38:06.7276983Z arg0_1, = args 2023-01-11T21:38:06.7277051Z args.clear() 2023-01-11T21:38:06.7277145Z with torch.cuda.device(0): 2023-01-11T21:38:06.7277237Z buf0 = aten.topk(arg0_1, 2) 2023-01-11T21:38:06.7277310Z del arg0_1 2023-01-11T21:38:06.7277386Z buf1 = buf0[0] 2023-01-11T21:38:06.7277501Z assert_size_stride(buf1, (1, 1, 8, 2), (16, 16, 2, 1)) 2023-01-11T21:38:06.7277578Z buf2 = buf0[1] 2023-01-11T21:38:06.7277681Z assert_size_stride(buf2, (1, 1, 8, 2), (16, 16, 2, 1)) 2023-01-11T21:38:06.7277751Z del buf0 2023-01-11T21:38:06.7277836Z return (buf1, buf2, ) 2023-01-11T21:38:06.7277841Z 2023-01-11T21:38:06.7277846Z 2023-01-11T21:38:06.7277928Z if __name__ == "__main__": 2023-01-11T21:38:06.7278046Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7278171Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7278419Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7278525Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7278537Z 2023-01-11T21:38:06.7278542Z 2023-01-11T21:38:06.7278631Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7278709Z import torch 2023-01-11T21:38:06.7278783Z import random 2023-01-11T21:38:06.7278907Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7279032Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7279037Z 2023-01-11T21:38:06.7279121Z aten = torch.ops.aten 2023-01-11T21:38:06.7279257Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7279346Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7279358Z 2023-01-11T21:38:06.7279425Z import triton 2023-01-11T21:38:06.7279517Z import triton.language as tl 2023-01-11T21:38:06.7279642Z from 
torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7279783Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7279788Z 2023-01-11T21:38:06.7279792Z 2023-01-11T21:38:06.7279886Z async_compile.wait(globals()) 2023-01-11T21:38:06.7279963Z del async_compile 2023-01-11T21:38:06.7279968Z 2023-01-11T21:38:06.7280042Z def call(args): 2023-01-11T21:38:06.7280112Z arg0_1, = args 2023-01-11T21:38:06.7280186Z args.clear() 2023-01-11T21:38:06.7280278Z with torch.cuda.device(0): 2023-01-11T21:38:06.7280370Z buf0 = aten.topk(arg0_1, 2) 2023-01-11T21:38:06.7280442Z del arg0_1 2023-01-11T21:38:06.7280517Z buf1 = buf0[0] 2023-01-11T21:38:06.7280631Z assert_size_stride(buf1, (1, 1, 8, 2), (16, 16, 2, 1)) 2023-01-11T21:38:06.7280699Z buf2 = buf0[1] 2023-01-11T21:38:06.7280806Z assert_size_stride(buf2, (1, 1, 8, 2), (16, 16, 2, 1)) 2023-01-11T21:38:06.7280877Z del buf0 2023-01-11T21:38:06.7280960Z return (buf1, buf2, ) 2023-01-11T21:38:06.7280967Z 2023-01-11T21:38:06.7280971Z 2023-01-11T21:38:06.7281051Z if __name__ == "__main__": 2023-01-11T21:38:06.7281168Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7281294Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7281537Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7281644Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7281649Z 2023-01-11T21:38:06.7281721Z ok (0.043s) 2023-01-11T21:38:06.7282183Z test_transpose_add_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
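As the FallbackKernel warnings above indicate, aten.topk has no Triton lowering at this point, so both generated modules (fp32 and fp16) skip codegen entirely: call() invokes the ATen op eagerly and then guards the layout of what comes back. The same fallback in isolation, using only names that appear in the modules above (run on CPU here; the log uses 'cuda:0'):

import torch

# Direct ATen fallback, as in the generated call(): run the op eagerly,
# then assert the exact sizes and strides handed back to the graph.
aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride

arg0_1 = torch.randn(1, 1, 8, 8)
buf1, buf2 = aten.topk(arg0_1, 2)
assert_size_stride(buf1, (1, 1, 8, 2), (16, 16, 2, 1))
assert_size_stride(buf2, (1, 1, 8, 2), (16, 16, 2, 1))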
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7282314Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7282576Z [2023-01-11 21:35:48,078] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 899 2023-01-11T21:38:06.7282836Z [2023-01-11 21:35:48,696] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 899 2023-01-11T21:38:06.7282842Z 2023-01-11T21:38:06.7282941Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7283021Z import torch 2023-01-11T21:38:06.7283096Z import random 2023-01-11T21:38:06.7283208Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7283331Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7283337Z 2023-01-11T21:38:06.7283417Z aten = torch.ops.aten 2023-01-11T21:38:06.7283555Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7283650Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7283655Z 2023-01-11T21:38:06.7283729Z import triton 2023-01-11T21:38:06.7283821Z import triton.language as tl 2023-01-11T21:38:06.7283967Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7284107Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7284112Z 2023-01-11T21:38:06.7284117Z 2023-01-11T21:38:06.7284270Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7284345Z import triton 2023-01-11T21:38:06.7284440Z import triton.language as tl 2023-01-11T21:38:06.7284553Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7284659Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7284792Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7284910Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7284923Z 2023-01-11T21:38:06.7285400Z @pointwise(size_hints=[32, 16], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7285477Z @triton.jit 2023-01-11T21:38:06.7285654Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.7285733Z xnumel = 32 2023-01-11T21:38:06.7285806Z ynumel = 16 2023-01-11T21:38:06.7285910Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7286044Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7286121Z xmask = xindex < xnumel 2023-01-11T21:38:06.7286217Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.7286350Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.7286433Z ymask = yindex < ynumel 2023-01-11T21:38:06.7286505Z x0 = xindex 2023-01-11T21:38:06.7286576Z y1 = yindex 2023-01-11T21:38:06.7286693Z tmp0 = tl.load(in_ptr0 + (x0 + (32*y1)), xmask & ymask) 2023-01-11T21:38:06.7286802Z tmp1 = tl.load(in_ptr1 + (y1 + (16*x0)), xmask & ymask) 2023-01-11T21:38:06.7286883Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7287043Z tl.store(out_ptr0 + (x0 + (32*y1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.7287131Z ''') 2023-01-11T21:38:06.7287137Z 2023-01-11T21:38:06.7287141Z 2023-01-11T21:38:06.7287236Z async_compile.wait(globals()) 2023-01-11T21:38:06.7287342Z del 
async_compile 2023-01-11T21:38:06.7287347Z 2023-01-11T21:38:06.7287427Z def call(args): 2023-01-11T21:38:06.7287508Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7287576Z args.clear() 2023-01-11T21:38:06.7287669Z with torch.cuda.device(0): 2023-01-11T21:38:06.7287873Z buf0 = empty_strided((32, 16), (1, 32), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7287966Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7288112Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 32, 16, grid=grid(32, 16), stream=stream0) 2023-01-11T21:38:06.7288187Z del arg0_1 2023-01-11T21:38:06.7288260Z del arg1_1 2023-01-11T21:38:06.7288334Z return (buf0, ) 2023-01-11T21:38:06.7288339Z 2023-01-11T21:38:06.7288351Z 2023-01-11T21:38:06.7288424Z if __name__ == "__main__": 2023-01-11T21:38:06.7288541Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7288666Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7288874Z arg0_1 = rand_strided((16, 32), (32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7289075Z arg1_1 = rand_strided((32, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7289197Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7289202Z 2023-01-11T21:38:06.7289273Z ok (0.633s) 2023-01-11T21:38:06.7289729Z test_transpose_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7289889Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7290140Z [2023-01-11 21:35:48,718] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 900 2023-01-11T21:38:06.7290405Z [2023-01-11 21:35:48,904] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 900 2023-01-11T21:38:06.7290818Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
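Reading the index expressions in the transpose_add kernel above: in_ptr0 is addressed row-major, in_ptr1 through swapped coordinates, and the result lands in a stride-(1, 32) (column-major) buffer, so the transpose is folded into addressing rather than materialized. A hypothetical eager equivalent of that call(), inferred from the index math rather than taken from the test source:

import torch

# out[x0, y1] = arg0_1[y1, x0] + arg1_1[x0, y1], written into a
# column-major buffer just like buf0 = empty_strided((32, 16), (1, 32)).
a = torch.randn(16, 32)
b = torch.randn(32, 16)
out = torch.empty_strided((32, 16), (1, 32), dtype=torch.float32)
out.copy_(a.t() + b)
assert torch.allclose(out, a.t() + b)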
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7290949Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7291208Z [2023-01-11 21:35:48,927] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 901 2023-01-11T21:38:06.7291470Z [2023-01-11 21:35:49,016] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 901 2023-01-11T21:38:06.7291476Z 2023-01-11T21:38:06.7291575Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7291654Z import torch 2023-01-11T21:38:06.7291729Z import random 2023-01-11T21:38:06.7291842Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7291967Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7291972Z 2023-01-11T21:38:06.7292056Z aten = torch.ops.aten 2023-01-11T21:38:06.7292193Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7292290Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7292295Z 2023-01-11T21:38:06.7292370Z import triton 2023-01-11T21:38:06.7292461Z import triton.language as tl 2023-01-11T21:38:06.7292585Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7292721Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7292726Z 2023-01-11T21:38:06.7292731Z 2023-01-11T21:38:06.7292897Z triton_fused_add_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7292973Z import triton 2023-01-11T21:38:06.7293067Z import triton.language as tl 2023-01-11T21:38:06.7293205Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7293310Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7293447Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7293566Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7293581Z 2023-01-11T21:38:06.7294065Z @pointwise(size_hints=[8, 8], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7294141Z @triton.jit 2023-01-11T21:38:06.7294328Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.7294403Z xnumel = 8 2023-01-11T21:38:06.7294677Z ynumel = 8 2023-01-11T21:38:06.7294781Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7294923Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7295007Z xmask = xindex < xnumel 2023-01-11T21:38:06.7295097Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.7295230Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.7295314Z ymask = yindex < ynumel 2023-01-11T21:38:06.7295385Z x0 = xindex 2023-01-11T21:38:06.7295457Z y1 = yindex 2023-01-11T21:38:06.7295577Z tmp0 = tl.load(in_ptr0 + (x0 + (8*y1)), xmask & ymask) 2023-01-11T21:38:06.7295798Z tmp1 = tl.load(in_ptr1 + (y1 + (8*x0)), xmask & ymask, eviction_policy='evict_last') 2023-01-11T21:38:06.7295966Z tmp3 = tl.load(in_ptr1 + (y1 + (8*x0)), xmask & ymask) 2023-01-11T21:38:06.7296050Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7296125Z tmp4 = 2 2023-01-11T21:38:06.7296206Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.7296283Z tmp6 = 10 2023-01-11T21:38:06.7296363Z tmp7 = tmp5 + tmp6 
2023-01-11T21:38:06.7296520Z tl.store(out_ptr0 + (x0 + (8*y1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.7296673Z tl.store(out_ptr1 + (y1 + (8*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp7, xmask & ymask) 2023-01-11T21:38:06.7296763Z ''') 2023-01-11T21:38:06.7296769Z 2023-01-11T21:38:06.7296773Z 2023-01-11T21:38:06.7296871Z async_compile.wait(globals()) 2023-01-11T21:38:06.7296950Z del async_compile 2023-01-11T21:38:06.7296955Z 2023-01-11T21:38:06.7297034Z def call(args): 2023-01-11T21:38:06.7297116Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7297257Z args.clear() 2023-01-11T21:38:06.7297359Z with torch.cuda.device(0): 2023-01-11T21:38:06.7297587Z buf0 = empty_strided((8, 8), (1, 8), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7297789Z buf1 = empty_strided((8, 8), (1, 8), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7297884Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7298048Z triton_fused_add_add_1_0.run(arg0_1, arg1_1, buf0, buf1, 8, 8, grid=grid(8, 8), stream=stream0) 2023-01-11T21:38:06.7298127Z del arg0_1 2023-01-11T21:38:06.7298203Z del arg1_1 2023-01-11T21:38:06.7298282Z return (buf0, buf1, ) 2023-01-11T21:38:06.7298295Z 2023-01-11T21:38:06.7298300Z 2023-01-11T21:38:06.7298376Z if __name__ == "__main__": 2023-01-11T21:38:06.7298496Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7298625Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7298827Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7299024Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7299151Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7299156Z 2023-01-11T21:38:06.7299160Z 2023-01-11T21:38:06.7299261Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7299338Z import torch 2023-01-11T21:38:06.7299408Z import random 2023-01-11T21:38:06.7299574Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7299703Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7299711Z 2023-01-11T21:38:06.7299796Z aten = torch.ops.aten 2023-01-11T21:38:06.7299935Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7300033Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7300038Z 2023-01-11T21:38:06.7300114Z import triton 2023-01-11T21:38:06.7300203Z import triton.language as tl 2023-01-11T21:38:06.7300331Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7300475Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7300484Z 2023-01-11T21:38:06.7300488Z 2023-01-11T21:38:06.7300654Z triton_fused_add_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7300731Z import triton 2023-01-11T21:38:06.7300826Z import triton.language as tl 2023-01-11T21:38:06.7300944Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7301052Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7301181Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7301309Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7301314Z 2023-01-11T21:38:06.7301797Z @pointwise(size_hints=[8, 8], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), 
equal_to_1=())]}) 2023-01-11T21:38:06.7301875Z @triton.jit 2023-01-11T21:38:06.7302066Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.7302180Z xnumel = 8 2023-01-11T21:38:06.7302254Z ynumel = 8 2023-01-11T21:38:06.7302352Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7302480Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7302567Z xmask = xindex < xnumel 2023-01-11T21:38:06.7302665Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.7302796Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.7302881Z ymask = yindex < ynumel 2023-01-11T21:38:06.7302953Z x0 = xindex 2023-01-11T21:38:06.7303023Z y1 = yindex 2023-01-11T21:38:06.7303148Z tmp0 = tl.load(in_ptr0 + (x0 + (8*y1)), xmask & ymask).to(tl.float32) 2023-01-11T21:38:06.7303386Z tmp1 = tl.load(in_ptr1 + (y1 + (8*x0)), xmask & ymask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7303515Z tmp3 = tl.load(in_ptr1 + (y1 + (8*x0)), xmask & ymask).to(tl.float32) 2023-01-11T21:38:06.7303601Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7303673Z tmp4 = 2 2023-01-11T21:38:06.7303752Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.7303825Z tmp6 = 10 2023-01-11T21:38:06.7303895Z tmp7 = tmp5 + tmp6 2023-01-11T21:38:06.7304059Z tl.store(out_ptr0 + (x0 + (8*y1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.7304211Z tl.store(out_ptr1 + (y1 + (8*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp7, xmask & ymask) 2023-01-11T21:38:06.7304296Z ''') 2023-01-11T21:38:06.7304302Z 2023-01-11T21:38:06.7304306Z 2023-01-11T21:38:06.7304400Z async_compile.wait(globals()) 2023-01-11T21:38:06.7304480Z del async_compile 2023-01-11T21:38:06.7304485Z 2023-01-11T21:38:06.7304560Z def call(args): 2023-01-11T21:38:06.7304640Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7304708Z args.clear() 2023-01-11T21:38:06.7304804Z with torch.cuda.device(0): 2023-01-11T21:38:06.7305004Z buf0 = empty_strided((8, 8), (1, 8), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7305206Z buf1 = empty_strided((8, 8), (1, 8), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7305299Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7305458Z triton_fused_add_add_1_0.run(arg0_1, arg1_1, buf0, buf1, 8, 8, grid=grid(8, 8), stream=stream0) 2023-01-11T21:38:06.7305560Z del arg0_1 2023-01-11T21:38:06.7305635Z del arg1_1 2023-01-11T21:38:06.7305712Z return (buf0, buf1, ) 2023-01-11T21:38:06.7305717Z 2023-01-11T21:38:06.7305722Z 2023-01-11T21:38:06.7305803Z if __name__ == "__main__": 2023-01-11T21:38:06.7305921Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7306049Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7306248Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7306446Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7306568Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7306574Z 2023-01-11T21:38:06.7306644Z ok (0.321s) 2023-01-11T21:38:06.7307169Z test_transposed_propagates_cuda (__main__.CudaTests) ... 
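One more annotation before the next test's output: triton_fused_add_add_1_0 above shows two ops fused into a single kernel with two stores, each store using its own index expression, so no separate transpose kernel is emitted. Roughly, in eager terms (again inferred from the index math; the variable names are just the log's buffer names):

import torch

# buf0[x0, y1] = arg0_1[y1, x0] + arg1_1[x0, y1]  ->  arg0_1.t() + arg1_1
# buf1[y1, x0] = arg1_1[x0, y1] * 2 + 10          ->  arg1_1.t() * 2 + 10
arg0_1 = torch.randn(8, 8)
arg1_1 = torch.randn(8, 8)
buf0 = arg0_1.t() + arg1_1
buf1 = arg1_1.t() * 2 + 10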
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.7307251Z warnings.warn( 2023-01-11T21:38:06.7307508Z [2023-01-11 21:35:49,031] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 902 2023-01-11T21:38:06.7307770Z [2023-01-11 21:35:49,100] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 902 2023-01-11T21:38:06.7307776Z 2023-01-11T21:38:06.7307874Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7307949Z import torch 2023-01-11T21:38:06.7308022Z import random 2023-01-11T21:38:06.7308142Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7308289Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7308302Z 2023-01-11T21:38:06.7308378Z aten = torch.ops.aten 2023-01-11T21:38:06.7308517Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7308613Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7308620Z 2023-01-11T21:38:06.7308695Z import triton 2023-01-11T21:38:06.7308787Z import triton.language as tl 2023-01-11T21:38:06.7308911Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7309049Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7309055Z 2023-01-11T21:38:06.7309059Z 2023-01-11T21:38:06.7309206Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7309281Z import triton 2023-01-11T21:38:06.7309375Z import triton.language as tl 2023-01-11T21:38:06.7309491Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7309592Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7309730Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7309855Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7309861Z 2023-01-11T21:38:06.7310283Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7310350Z @triton.jit 2023-01-11T21:38:06.7310491Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7310565Z xnumel = 64 2023-01-11T21:38:06.7310661Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7310791Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7310875Z xmask = xindex < xnumel 2023-01-11T21:38:06.7310947Z x0 = xindex 2023-01-11T21:38:06.7311038Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7311137Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.7311217Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7311352Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.7311437Z ''') 2023-01-11T21:38:06.7311443Z 2023-01-11T21:38:06.7311447Z 2023-01-11T21:38:06.7311569Z async_compile.wait(globals()) 2023-01-11T21:38:06.7311648Z del async_compile 2023-01-11T21:38:06.7311654Z 2023-01-11T21:38:06.7311728Z def call(args): 2023-01-11T21:38:06.7311800Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7311876Z args.clear() 2023-01-11T21:38:06.7311967Z with torch.cuda.device(0): 2023-01-11T21:38:06.7312182Z buf0 = empty_strided((1, 4, 4, 4), (64, 4, 1, 16), device='cuda', dtype=torch.float32) 
2023-01-11T21:38:06.7312275Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7312422Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.7312495Z del arg0_1 2023-01-11T21:38:06.7312564Z del arg1_1 2023-01-11T21:38:06.7312640Z return (buf0, ) 2023-01-11T21:38:06.7312646Z 2023-01-11T21:38:06.7312650Z 2023-01-11T21:38:06.7312730Z if __name__ == "__main__": 2023-01-11T21:38:06.7312848Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7312975Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7313190Z arg0_1 = rand_strided((1, 4, 4, 4), (64, 4, 1, 16), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7313395Z arg1_1 = rand_strided((4, 4, 4), (4, 1, 16), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7313516Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7313521Z 2023-01-11T21:38:06.7313585Z ok (0.239s) 2023-01-11T21:38:06.7314096Z test_triton_conv_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.7314203Z warnings.warn( 2023-01-11T21:38:06.7314459Z [2023-01-11 21:35:49,300] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 903 2023-01-11T21:38:06.7314798Z [2023-01-11 21:35:49,345] torch._inductor.codegen.triton_template: [DEBUG] template_codegen: TemplateSchedulerNode(name='buf1') -- [SchedulerNode(name='buf2')] 2023-01-11T21:38:06.7315012Z [2023-01-11 21:35:49,370] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.7315018Z 2023-01-11T21:38:06.7315117Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7315192Z import torch 2023-01-11T21:38:06.7315266Z import random 2023-01-11T21:38:06.7315378Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7315500Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7315505Z 2023-01-11T21:38:06.7315588Z aten = torch.ops.aten 2023-01-11T21:38:06.7315723Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7315826Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7315831Z 2023-01-11T21:38:06.7315904Z import triton 2023-01-11T21:38:06.7315996Z import triton.language as tl 2023-01-11T21:38:06.7316114Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7316255Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7316261Z 2023-01-11T21:38:06.7316414Z from torch._inductor.triton_ops.conv_perf_model import early_config_prune 2023-01-11T21:38:06.7316562Z from torch._inductor.triton_ops.conv_perf_model import estimate_conv_time 2023-01-11T21:38:06.7316700Z from torch._inductor.triton_ops.autotune import conv_heuristics 2023-01-11T21:38:06.7316705Z 2023-01-11T21:38:06.7316710Z 2023-01-11T21:38:06.7316879Z triton_fused_convolution_0 = async_compile.triton(''' 2023-01-11T21:38:06.7316954Z import triton 2023-01-11T21:38:06.7317045Z import triton.language as tl 2023-01-11T21:38:06.7317158Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7317255Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7317391Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7317521Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7317526Z 
test_triton_conv_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')`
  warnings.warn(
[2023-01-11 21:35:49,300] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 903
[2023-01-11 21:35:49,345] torch._inductor.codegen.triton_template: [DEBUG] template_codegen: TemplateSchedulerNode(name='buf1') -- [SchedulerNode(name='buf2')]
[2023-01-11 21:35:49,370] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1')

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

from torch._inductor.triton_ops.conv_perf_model import early_config_prune
from torch._inductor.triton_ops.conv_perf_model import estimate_conv_time
from torch._inductor.triton_ops.autotune import conv_heuristics


triton_fused_convolution_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4096, 1024], tile_hint=TileHint.SQUARE, filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr):
    xnumel = 4096
    ynumel = 1024
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    yoffset = tl.program_id(1) * YBLOCK
    yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK])
    ymask = yindex < ynumel
    x3 = xindex
    y2 = yindex
    x0 = xindex % 128
    x1 = (xindex // 128)
    tmp0 = tl.load(in_ptr0 + (y2 + (1024*x3) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), xmask & ymask)
    tl.store(out_ptr0 + (x0 + (128*y2) + (131072*x1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask)
''')

@conv_heuristics()
@triton.jit
def triton_template_1(
    x,
    w,
    # stride of tensor
    stride_xn,
    stride_xc,
    stride_xh,
    stride_xw,
    stride_wn,
    stride_wc,
    stride_wh,
    stride_ww,
    stride_yn,
    stride_yc,
    stride_yh,
    stride_yw,
    stride_biasn,
    # Tensor dimensions
    BATCH,
    IN_C,
    IN_H,
    IN_W,
    KERNEL_N,
    KERNEL_H,
    KERNEL_W,
    OUT_H,
    OUT_W,
    # parameters of conv
    stride_h,
    stride_w,
    padding_h,
    padding_w,
    dilation_h,
    dilation_w,
    output_padding_h,
    output_padding_w,
    groups: tl.constexpr,
    # pointer inc for x
    delta_x_ptr,
    # fusable kernels args
    in_ptr0,
    out_ptr3,
    # Metaparameters
    ACC_TYPE: tl.constexpr,
    CONV1X1_NHWC: tl.constexpr,
    # blocks in different dimension
    BLOCK_M: tl.constexpr,
    BLOCK_N: tl.constexpr,
    # reduction tiling parameter for matmul
    BLOCK_K: tl.constexpr,
):
    """
    each program instance computes a [BLOCK_BATCH, BLOCK_N, BLOCK_H, BLOCK_W] block of y
    """
    # -----------------------------------------------------------
    # Map program ids `pid` to the block of y it should compute.
    pid_nhw = tl.program_id(0)
    pid_k = tl.program_id(1)

    # offset for output y
    off_y_k = pid_k * BLOCK_N + tl.arange(0, BLOCK_N)
    off_y_nhw = pid_nhw * BLOCK_M + tl.arange(0, BLOCK_M)
    off_y_n = off_y_nhw // (OUT_H * OUT_W)
    off_y_hw = off_y_nhw % (OUT_H * OUT_W)
    off_y_h = off_y_hw // OUT_W
    off_y_w = off_y_hw % OUT_W

    # offset for the initial ptr for x
    off_x_n = off_y_n
    off_x_h = off_y_h * stride_h - padding_h
    off_x_w = off_y_w * stride_w - padding_w
    off_x_nhw = off_x_n * stride_xn + off_x_h * stride_xh + off_x_w * stride_xw
    off_x_crs = tl.arange(0, BLOCK_K)

    CRS = IN_C * KERNEL_H * KERNEL_W
    # load inc ptr of x, update x_ptrs
    if not CONV1X1_NHWC:
        delta_x_ptrs = delta_x_ptr + off_x_crs
        off_x_crs_unpacked = tl.load(delta_x_ptrs, mask=off_x_crs < CRS, other=0)
        x_ptrs = x + off_x_nhw[:, None] + off_x_crs_unpacked[None, :]
    else:
        x_ptrs = x + off_x_nhw[:, None] + off_x_crs[None, :]

    mask_x = (
        (off_x_n < BATCH)
        & (off_x_h >= 0)
        & (off_x_h < IN_H)
        & (off_x_w >= 0)
        & (off_x_w < IN_W)
    )[:, None] & (off_x_crs < CRS)[None, :]

    # offset for the initial ptr for w
    off_w_crs = tl.arange(0, BLOCK_K)
    off_w_k = off_y_k
    w_ptrs = w + off_w_crs[:, None] + off_w_k[None, :] * stride_wn
    mask_w = (off_x_crs < CRS)[:, None] & (off_w_k < KERNEL_N)[None, :]

    # ------ load x ------
    matrix_x = tl.load(x_ptrs, mask=mask_x, other=0.0)
    # ------ load w ------
    matrix_w = tl.load(w_ptrs, mask=mask_w, other=0.0)

    # -----------------------------------------------------------
    # allocate accumulator
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=ACC_TYPE)
    for crs in range(0, CRS, BLOCK_K):

        # ------ matrix multiplication ------
        acc += tl.dot(matrix_x, matrix_w)
        # ------ update ptrs ------
        w_ptrs += BLOCK_K
        # load inc ptr of x, update x_ptrs
        if not CONV1X1_NHWC:
            delta_x_ptrs += BLOCK_K
            off_x_crs = crs + BLOCK_K + tl.arange(0, BLOCK_K)
            off_x_crs_unpacked = tl.load(delta_x_ptrs, mask=off_x_crs < CRS, other=0)
            x_ptrs = x + off_x_nhw[:, None] + off_x_crs_unpacked[None, :]
        else:
            off_x_crs = crs + BLOCK_K + tl.arange(0, BLOCK_K)
            x_ptrs += BLOCK_K

        mask_x = (
            (off_x_n < BATCH)
            & (off_x_h >= 0)
            & (off_x_h < IN_H)
            & (off_x_w >= 0)
            & (off_x_w < IN_W)
        )[:, None] & (off_x_crs < CRS)[None, :]
        mask_w = (off_x_crs < CRS)[:, None] & (off_w_k < KERNEL_N)[None, :]
        # ------ prefetch ------
        # ------ load x ------
        matrix_x = tl.load(x_ptrs, mask=mask_x, other=0.0)
        # ------ load w ------
        matrix_w = tl.load(w_ptrs, mask=mask_w, other=0.0)

    acc = acc.to(out_ptr3.dtype.element_ty)

    XBLOCK: tl.constexpr = BLOCK_M
    YBLOCK: tl.constexpr = BLOCK_N
    xnumel = BATCH * (OUT_H + 2 * output_padding_h) * (OUT_W + 2 * output_padding_w)
    ynumel = KERNEL_N
    xoffset = pid_nhw * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    yoffset = pid_k * YBLOCK
    yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK])
    ymask = yindex < ynumel
    y2 = yindex
    x0 = xindex % 1024
    x1 = (xindex // 1024)
    tmp0 = tl.load(in_ptr0 + (y2 + tl.zeros([XBLOCK, YBLOCK], tl.int32)), xmask & ymask)
    tmp1 = acc + tmp0
    tl.store(out_ptr3 + (x0 + (1024*y2) + (32768*x1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp1, xmask & ymask)

    return


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((32, 128, 32, 32), (131072, 1, 4096, 128), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_convolution_0.run(arg0_1, buf0, 4096, 1024, grid=grid(4096, 1024), stream=stream0)
        del arg0_1
        buf2 = empty_strided((32, 32, 32, 32), (32768, 1024, 32, 1), device='cuda', dtype=torch.float32)

        def grid_triton_template_1(META):
            return (
                triton.cdiv(32 * 32 * 32, META["BLOCK_M"]),
                triton.cdiv(32, META["BLOCK_N"]),
            )

        triton_template_1[grid_triton_template_1](buf0, arg1_1, 131072, 1, 4096, 128, 128, 1, 1, 1, 32768, 1, 1024, 32, None, 32, 128, 32, 32, 32, 1, 1, 32, 32, 1, 1, 0, 0, 1, 1, 0, 0, 1, None, arg2_1, buf2, ACC_TYPE=tl.float32, CONV1X1_NHWC=True)
        del arg1_1
        del arg2_1
        return (buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((32, 128, 32, 32), (131072, 1024, 32, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((32, 128, 1, 1), (128, 1, 1, 1), device='cuda:0', dtype=torch.float32)
    arg2_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
[2023-01-11 21:35:49,532] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 903

ok (9.917s)
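For reference, the two-axis launch grid that grid_triton_template_1 builds above can be reproduced in plain Python. BLOCK_M/BLOCK_N stand in for whatever conv_heuristics picks at autotuning time, so the concrete values below are illustrative only:

import math

def conv_grid(batch, out_h, out_w, kernel_n, BLOCK_M, BLOCK_N):
    # axis 0 tiles the flattened (batch, out_h, out_w) output positions,
    # axis 1 tiles the output channels, mirroring the triton.cdiv calls above
    return (math.ceil(batch * out_h * out_w / BLOCK_M),
            math.ceil(kernel_n / BLOCK_N))

print(conv_grid(32, 32, 32, 32, BLOCK_M=64, BLOCK_N=32))  # -> (512, 1)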
test_triton_mm2_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')`
  warnings.warn(
[2023-01-11 21:35:59,191] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 904
[2023-01-11 21:35:59,195] torch._inductor.codegen.triton_template: [DEBUG] template_codegen: TemplateSchedulerNode(name='buf0') -- [SchedulerNode(name='buf1')]
[2023-01-11 21:35:59,200] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0')
[2023-01-11 21:35:59,206] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 904

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

from torch._inductor.triton_ops.autotune import mm_heuristics
from torch._inductor.triton_ops.autotune import mm_autotune
import torch
import triton
import triton.language as tl

@mm_autotune()
@mm_heuristics()
@triton.jit
def triton_template_0(
    A,
    B,
    M,
    N,
    K,
    stride_am,
    stride_ak,
    stride_bk,
    stride_bn,
    stride_cm,
    stride_cn,
    # fusable kernels args
    out_ptr3,
    allow_tf32: tl.constexpr,
    BLOCK_M: tl.constexpr,
    BLOCK_N: tl.constexpr,
    BLOCK_K: tl.constexpr,
    GROUP_M: tl.constexpr,
    SPLIT_K: tl.constexpr,
    EVEN_K: tl.constexpr,
    ACC_TYPE: tl.constexpr,
):
    # matrix multiplication
    pid = tl.program_id(0)
    pid_z = tl.program_id(1)
    grid_m = (M + BLOCK_M - 1) // BLOCK_M
    grid_n = (N + BLOCK_N - 1) // BLOCK_N
    # re-order program ID for better L2 performance
    width = GROUP_M * grid_n
    group_id = pid // width
    group_size = min(grid_m - group_id * GROUP_M, GROUP_M)
    pid_m = group_id * GROUP_M + (pid % group_size)
    pid_n = (pid % width) // (group_size)
    # do matrix multiplication
    rm = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    rn = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    ram = tl.max_contiguous(tl.multiple_of(rm % M, BLOCK_M), BLOCK_M)
    rbn = tl.max_contiguous(tl.multiple_of(rn % N, BLOCK_N), BLOCK_N)
    rk = pid_z * BLOCK_K + tl.arange(0, BLOCK_K)
    # pointers
    A_ptrs = A + (ram[:, None] * stride_am + rk[None, :] * stride_ak)
    B_ptrs = B + (rk[:, None] * stride_bk + rbn[None, :] * stride_bn)
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=ACC_TYPE)
    for k in range(K, 0, -BLOCK_K * SPLIT_K):
        if EVEN_K:
            a = tl.load(A_ptrs)
            b = tl.load(B_ptrs)
        else:
            a = tl.load(A_ptrs, mask=rk[None, :] < k, other=0.0)
            b = tl.load(B_ptrs, mask=rk[:, None] < k, other=0.0)
        acc += tl.dot(a, b, allow_tf32=allow_tf32)
        A_ptrs += BLOCK_K * SPLIT_K * stride_ak
        B_ptrs += BLOCK_K * SPLIT_K * stride_bk
    acc = acc.to(out_ptr3.dtype.element_ty)

    XBLOCK: tl.constexpr = BLOCK_M
    YBLOCK: tl.constexpr = BLOCK_N
    xnumel = M
    ynumel = N
    xoffset = pid_m * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    yoffset = pid_n * YBLOCK
    yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK])
    ymask = yindex < ynumel
    x0 = xindex
    y1 = yindex
    tmp0 = tl.where(0 != 0, 0, tl.where(0 > acc, 0, acc))
    tl.store(out_ptr3 + (y1 + (1024*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask)


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf1 = empty_strided((1024, 1024), (1024, 1), device='cuda', dtype=torch.float32)

        def grid_triton_template_0(META):
            return (
                triton.cdiv(1024, META["BLOCK_M"]) * triton.cdiv(1024, META["BLOCK_N"]),
                META["SPLIT_K"],
            )

        triton_template_0[grid_triton_template_0](arg0_1, arg1_1, 1024, 1024, 1024, 1024, 1, 1024, 1, 1024, 1, buf1, GROUP_M=8, ACC_TYPE=tl.float32, allow_tf32=False)
        del arg0_1
        del arg1_1
        return (buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1024, 1024), (1024, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1024, 1024), (1024, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (5.666s)
test_triu_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:36:04,879] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 905
[2023-01-11 21:36:04,985] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 905
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:36:05,023] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 906
[2023-01-11 21:36:05,121] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 906

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_triu_triu_1_triu_2_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr):
    xnumel = 200
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 10
    x1 = (xindex // 10) % 10
    x3 = xindex
    tmp3 = tl.load(in_ptr0 + (x3), xmask, eviction_policy='evict_last')
    tmp10 = tl.load(in_ptr0 + (x3), xmask)
    tmp0 = (-1) + x0 + ((-1)*x1)
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp4 = tl.where(tmp2, tmp3, tmp1)
    tmp5 = x0 + ((-1)*x1)
    tmp6 = tmp5 >= tmp1
    tmp7 = tl.where(tmp6, tmp3, tmp1)
    tmp8 = (-2) + x0 + ((-1)*x1)
    tmp9 = tmp8 >= tmp1
    tmp11 = tl.where(tmp9, tmp10, tmp1)
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
    tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
    tl.store(out_ptr2 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((2, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32)
        buf2 = empty_strided((2, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_triu_triu_1_triu_2_0.run(arg0_1, buf0, buf1, buf2, 200, grid=grid(200), stream=stream0)
        del arg0_1
        return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 10, 10), (100, 10, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_triu_triu_1_triu_2_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr):
    xnumel = 200
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 10
    x1 = (xindex // 10) % 10
    x3 = xindex
    tmp3 = tl.load(in_ptr0 + (x3), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp10 = tl.load(in_ptr0 + (x3), xmask).to(tl.float32)
    tmp0 = (-1) + x0 + ((-1)*x1)
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp4 = tl.where(tmp2, tmp3, tmp1)
    tmp5 = x0 + ((-1)*x1)
    tmp6 = tmp5 >= tmp1
    tmp7 = tl.where(tmp6, tmp3, tmp1)
    tmp8 = (-2) + x0 + ((-1)*x1)
    tmp9 = tmp8 >= tmp1
    tmp11 = tl.where(tmp9, tmp10, tmp1)
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
    tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
    tl.store(out_ptr2 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((2, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float16)
        buf2 = empty_strided((2, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_triu_triu_1_triu_2_0.run(arg0_1, buf0, buf1, buf2, 200, grid=grid(200), stream=stream0)
        del arg0_1
        return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 10, 10), (100, 10, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.282s)
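Reading the index arithmetic in the fused kernel above: x0 is the column, x1 the row, and the three guards (col - row >= 1, >= 0, >= 2) select upper-triangular regions at different diagonals. A hedged eager-mode equivalent of the three fused outputs (the traced ops are inferred from the kernel; the test source is not shown in this log):

import torch

x = torch.randn(2, 10, 10, device='cuda')
# out_ptr0 / out_ptr1 / out_ptr2 in the kernel appear to correspond to:
outs = (torch.triu(x, diagonal=1), torch.triu(x, diagonal=0), torch.triu(x, diagonal=2))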
test_unbind_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:36:05,144] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 907
[2023-01-11 21:36:05,151] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 907
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:36:05,173] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 908
[2023-01-11 21:36:05,179] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 908

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    return (as_strided(arg0_1, (4, 4), (4, 1)), as_strided(arg0_1, (4, 4), (4, 1), 16), as_strided(arg0_1, (4, 4), (4, 1), 32), as_strided(arg0_1, (4, 4), (4, 1), 48), as_strided(arg0_1, (4, 4), (16, 4)), as_strided(arg0_1, (4, 4), (16, 4), 1), as_strided(arg0_1, (4, 4), (16, 4), 2), as_strided(arg0_1, (4, 4), (16, 4), 3), )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((4, 4, 4), (16, 4, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    return (as_strided(arg0_1, (4, 4), (4, 1)), as_strided(arg0_1, (4, 4), (4, 1), 16), as_strided(arg0_1, (4, 4), (4, 1), 32), as_strided(arg0_1, (4, 4), (4, 1), 48), as_strided(arg0_1, (4, 4), (16, 4)), as_strided(arg0_1, (4, 4), (16, 4), 1), as_strided(arg0_1, (4, 4), (16, 4), 2), as_strided(arg0_1, (4, 4), (16, 4), 3), )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((4, 4, 4), (16, 4, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.059s)
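Both unbind graphs above compile to no kernels at all: call() only returns as_strided views of the input. The stride patterns, (4, 1) at offsets 0/16/32/48 and (16, 4) at offsets 0/1/2/3, match unbinding a (4, 4, 4) contiguous tensor along dims 0 and 2, so a plausible eager equivalent (an inference, since the test source is not in this log) is:

import torch

x = torch.randn(4, 4, 4, device='cuda')
views = x.unbind(0) + x.unbind(2)  # eight (4, 4) views, no copies launched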
test_unroll_small_reduction_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:36:05,220] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 909
[2023-01-11 21:36:05,500] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 909
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:36:05,541] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 910

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_amax_amin_any_1_argmax_argmin_getitem_getitem_1_getitem_2_getitem_3_logical_not_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*i64', 3: '*fp32', 4: '*i64', 5: '*fp32', 6: '*i1', 7: '*i1', 8: '*i64', 9: '*i64', 10: '*fp32', 11: '*fp32', 12: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, out_ptr9, out_ptr10, xnumel, XBLOCK : tl.constexpr):
    xnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (3*x0), xmask, eviction_policy='evict_last')
    tmp1 = tl.load(in_ptr0 + (1 + (3*x0)), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + (2 + (3*x0)), xmask, eviction_policy='evict_last')
    tmp46 = tl.load(in_ptr0 + (3*x0), xmask)
    tmp47 = tl.load(in_ptr0 + (1 + (3*x0)), xmask)
    tmp49 = tl.load(in_ptr0 + (2 + (3*x0)), xmask)
    tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 < tmp1, tmp0, tmp1))
    tmp4 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp3, tmp2, tmp3))
    tmp5 = 0
    tmp6 = 1
    tmp7 = tmp1 < tmp0
    tmp8 = tl.where(tmp7, tmp6, tmp5)
    tmp9 = 2
    tmp10 = tmp3 < tmp2
    tmp11 = tl.where(tmp10, tmp9, tmp8)
    tmp12 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1))
    tmp13 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 > tmp3, tmp12, tmp3))
    tmp14 = tmp1 > tmp0
    tmp15 = tl.where(tmp14, tmp6, tmp5)
    tmp16 = tmp3 > tmp12
    tmp17 = tl.where(tmp16, tmp9, tmp15)
    tmp18 = tmp0 + tmp1
    tmp19 = tmp18 + tmp3
    tmp20 = tmp0 > tmp6
    tmp21 = tmp20.to(tl.int64)
    tmp22 = (tmp21 != 0)
    tmp23 = tmp1 > tmp6
    tmp24 = tmp23.to(tl.int64)
    tmp25 = (tmp24 != 0)
    tmp26 = tmp22 | tmp25
    tmp27 = tmp3 > tmp6
    tmp28 = tmp27.to(tl.int64)
    tmp29 = (tmp28 != 0)
    tmp30 = tmp26 | tmp29
    tmp31 = tmp0 > tmp5
    tmp32 = tmp31 == 0
    tmp33 = tmp32.to(tl.int64)
    tmp34 = (tmp33 != 0)
    tmp35 = tmp1 > tmp5
    tmp36 = tmp35 == 0
    tmp37 = tmp36.to(tl.int64)
    tmp38 = (tmp37 != 0)
    tmp39 = tmp34 | tmp38
    tmp40 = tmp3 > tmp5
    tmp41 = tmp40 == 0
    tmp42 = tmp41.to(tl.int64)
    tmp43 = (tmp42 != 0)
    tmp44 = tmp39 | tmp43
    tmp45 = tmp44 == 0
    tmp48 = tl.where(tmp46 != tmp46, tmp46, tl.where(tmp46 > tmp47, tmp46, tmp47))
    tmp50 = tl.where(tmp48 != tmp48, tmp48, tl.where(tmp48 > tmp49, tmp48, tmp49))
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask)
    tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask)
    tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask)
    tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp19, xmask)
    tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp30, xmask)
    tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp45, xmask)
    tl.store(out_ptr7 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask)
    tl.store(out_ptr8 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask)
    tl.store(out_ptr9 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
    tl.store(out_ptr10 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp50, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64)
        buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        buf3 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64)
        buf4 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        buf5 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool)
        buf6 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool)
        buf7 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64)
        buf8 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64)
        buf9 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        buf10 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_amax_amin_any_1_argmax_argmin_getitem_getitem_1_getitem_2_getitem_3_logical_not_1_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, buf9, buf10, 8, grid=grid(8), stream=stream0)
        del arg0_1
        return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, buf9, buf10, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 3), (3, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

[2023-01-11 21:36:05,697] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 910
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:36:05,740] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 911

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_amax_amin_any_1_argmax_argmin_getitem_getitem_1_getitem_2_getitem_3_logical_not_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*i64', 3: '*fp16', 4: '*i64', 5: '*fp16', 6: '*i1', 7: '*i1', 8: '*i64', 9: '*i64', 10: '*fp16', 11: '*fp16', 12: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, out_ptr9, out_ptr10, xnumel, XBLOCK : tl.constexpr):
    xnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (3*x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp1 = tl.load(in_ptr0 + (1 + (3*x0)), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp3 = tl.load(in_ptr0 + (2 + (3*x0)), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp46 = tl.load(in_ptr0 + (3*x0), xmask).to(tl.float32)
    tmp47 = tl.load(in_ptr0 + (1 + (3*x0)), xmask).to(tl.float32)
    tmp49 = tl.load(in_ptr0 + (2 + (3*x0)), xmask).to(tl.float32)
    tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 < tmp1, tmp0, tmp1))
    tmp4 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp3, tmp2, tmp3))
    tmp5 = 0
    tmp6 = 1
    tmp7 = tmp1 < tmp0
    tmp8 = tl.where(tmp7, tmp6, tmp5)
    tmp9 = 2
    tmp10 = tmp3 < tmp2
    tmp11 = tl.where(tmp10, tmp9, tmp8)
    tmp12 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1))
    tmp13 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 > tmp3, tmp12, tmp3))
    tmp14 = tmp1 > tmp0
    tmp15 = tl.where(tmp14, tmp6, tmp5)
    tmp16 = tmp3 > tmp12
    tmp17 = tl.where(tmp16, tmp9, tmp15)
    tmp18 = tmp0 + tmp1
    tmp19 = tmp18 + tmp3
    tmp20 = tmp0 > tmp6
    tmp21 = tmp20.to(tl.int64)
    tmp22 = (tmp21 != 0)
    tmp23 = tmp1 > tmp6
    tmp24 = tmp23.to(tl.int64)
    tmp25 = (tmp24 != 0)
    tmp26 = tmp22 | tmp25
    tmp27 = tmp3 > tmp6
    tmp28 = tmp27.to(tl.int64)
    tmp29 = (tmp28 != 0)
    tmp30 = tmp26 | tmp29
    tmp31 = tmp0 > tmp5
    tmp32 = tmp31 == 0
    tmp33 = tmp32.to(tl.int64)
    tmp34 = (tmp33 != 0)
    tmp35 = tmp1 > tmp5
    tmp36 = tmp35 == 0
    tmp37 = tmp36.to(tl.int64)
    tmp38 = (tmp37 != 0)
    tmp39 = tmp34 | tmp38
    tmp40 = tmp3 > tmp5
    tmp41 = tmp40 == 0
    tmp42 = tmp41.to(tl.int64)
    tmp43 = (tmp42 != 0)
    tmp44 = tmp39 | tmp43
    tmp45 = tmp44 == 0
    tmp48 = tl.where(tmp46 != tmp46, tmp46, tl.where(tmp46 > tmp47, tmp46, tmp47))
    tmp50 = tl.where(tmp48 != tmp48, tmp48, tl.where(tmp48 > tmp49, tmp48, tmp49))
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask)
    tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask)
    tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask)
    tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp19, xmask)
    tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp30, xmask)
    tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp45, xmask)
    tl.store(out_ptr7 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask)
    tl.store(out_ptr8 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask)
    tl.store(out_ptr9 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
    tl.store(out_ptr10 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp50, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64)
        buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        buf3 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64)
        buf4 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        buf5 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool)
        buf6 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool)
        buf7 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64)
        buf8 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64)
        buf9 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        buf10 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_amax_amin_any_1_argmax_argmin_getitem_getitem_1_getitem_2_getitem_3_logical_not_1_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, buf9, buf10, 8, grid=grid(8), stream=stream0)
        del arg0_1
        return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, buf9, buf10, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 3), (3, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

[2023-01-11 21:36:06,030] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 911
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:36:06,071] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 912

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_amax_amin_any_1_any_2_argmax_argmin_gt_gt_1_logical_not_logical_not_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[8, 4],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*i1', 1: '*fp32', 2: '*fp32', 3: '*i64', 4: '*fp32', 5: '*i64', 6: '*fp32', 7: '*i1', 8: '*i64', 9: '*i64', 10: '*fp32', 11: '*fp32', 12: 'i32', 13: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), equal_to_1=())]})
@triton.jit
def triton_(in_out_ptr0, in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, out_ptr9, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 8
    rnumel = 3
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf")
    _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf")
    _tmp2_index = tl.zeros([XBLOCK, RBLOCK], tl.int64)
    _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf")
    _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf")
    _tmp4_index = tl.zeros([XBLOCK, RBLOCK], tl.int64)
    _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    _tmp10 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0
    _tmp16 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0
    _tmp21 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf")
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (3*x0)), rmask & xmask, eviction_policy='evict_last')
        tmp20 = tl.load(in_ptr0 + (r1 + (3*x0)), rmask & xmask)
        _tmp1 = tl.where(xmask & rmask & (_tmp1 > tmp0), tmp0, _tmp1)
        _tmp2_index = tl.where(xmask & rmask & (_tmp2 > tmp0), rindex, _tmp2_index)
        _tmp2 = tl.where(xmask & rmask & (_tmp2 > tmp0), tmp0, _tmp2)
        _tmp3 = tl.where(xmask & rmask & (_tmp3 < tmp0), tmp0, _tmp3)
        _tmp4_index = tl.where(xmask & rmask & (_tmp4 < tmp0), rindex, _tmp4_index)
        _tmp4 = tl.where(xmask & rmask & (_tmp4 < tmp0), tmp0, _tmp4)
        _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp0, _tmp5)
        tmp6 = 1
        tmp7 = tmp0 > tmp6
        tmp8 = tmp7.to(tl.int64)
        tmp9 = (tmp8 != 0)
        _tmp10 = tl.where(xmask & rmask & (_tmp10 < tmp9), tmp9, _tmp10)
        tmp11 = 0
        tmp12 = tmp0 > tmp11
        tmp13 = tmp12 == 0
        tmp14 = tmp13.to(tl.int64)
        tmp15 = (tmp14 != 0)
        _tmp16 = tl.where(xmask & rmask & (_tmp16 < tmp15), tmp15, _tmp16)
        _tmp21 = tl.where(xmask & rmask & (_tmp21 < tmp20), tmp20, _tmp21)
    tmp1 = tl.reshape(tl.min(_tmp1, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x0, tmp1, xmask)
    _tmp2_index_reduce = tl.reshape(
        tl.argmin(_tmp2, 1), [XBLOCK, 1]).to(tl.int32)
    _tmp2_index_mask = (tl.reshape(tl.arange(0, RBLOCK),
        [1, RBLOCK]) == _tmp2_index_reduce)
    tmp2 = tl.reshape(tl.sum(
        tl.where(_tmp2_index_mask, _tmp2_index, 0), 1), [XBLOCK, 1])
    tl.store(out_ptr1 + x0, tmp2, xmask)
    tmp3 = tl.reshape(tl.max(_tmp3, 1), [XBLOCK, 1])
    tl.store(out_ptr2 + x0, tmp3, xmask)
    _tmp4_index_reduce = tl.reshape(
        tl.argmax(_tmp4, 1), [XBLOCK, 1]).to(tl.int32)
    _tmp4_index_mask = (tl.reshape(tl.arange(0, RBLOCK),
        [1, RBLOCK]) == _tmp4_index_reduce)
    tmp4 = tl.reshape(tl.sum(
        tl.where(_tmp4_index_mask, _tmp4_index, 0), 1), [XBLOCK, 1])
    tl.store(out_ptr3 + x0, tmp4, xmask)
    tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK,
1]) 2023-01-11T21:38:06.7418169Z tl.store(out_ptr4 + x0, tmp5, xmask) 2023-01-11T21:38:06.7418293Z tmp10 = tl.reshape(tl.max(_tmp10, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7418394Z tl.store(out_ptr5 + x0, tmp10, xmask) 2023-01-11T21:38:06.7418505Z tmp16 = tl.reshape(tl.max(_tmp16, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7418582Z tmp17 = tmp2 2023-01-11T21:38:06.7418682Z tl.store(out_ptr6 + x0, tmp17, xmask) 2023-01-11T21:38:06.7418813Z tmp18 = tmp4 2023-01-11T21:38:06.7418911Z tl.store(out_ptr7 + x0, tmp18, xmask) 2023-01-11T21:38:06.7418985Z tmp19 = tmp1 2023-01-11T21:38:06.7419080Z tl.store(out_ptr8 + x0, tmp19, xmask) 2023-01-11T21:38:06.7419189Z tmp21 = tl.reshape(tl.max(_tmp21, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7419285Z tl.store(out_ptr9 + x0, tmp21, xmask) 2023-01-11T21:38:06.7419364Z tmp22 = tmp16 == 0 2023-01-11T21:38:06.7419503Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp22, xmask) 2023-01-11T21:38:06.7419596Z ''') 2023-01-11T21:38:06.7419601Z 2023-01-11T21:38:06.7419606Z 2023-01-11T21:38:06.7419701Z async_compile.wait(globals()) 2023-01-11T21:38:06.7419782Z del async_compile 2023-01-11T21:38:06.7419788Z 2023-01-11T21:38:06.7419862Z def call(args): 2023-01-11T21:38:06.7419928Z arg0_1, = args 2023-01-11T21:38:06.7420004Z args.clear() 2023-01-11T21:38:06.7420097Z with torch.cuda.device(0): 2023-01-11T21:38:06.7420298Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7420491Z buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7420689Z buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7420878Z buf3 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7421064Z buf4 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7421252Z buf5 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.7421438Z buf6 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.7421661Z buf8 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7421851Z buf9 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7422049Z buf10 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7422248Z buf11 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7422340Z buf7 = buf6; del buf6 # reuse 2023-01-11T21:38:06.7422425Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7422660Z triton_fused_amax_amin_any_1_any_2_argmax_argmin_gt_gt_1_logical_not_logical_not_1_0.run(buf7, arg0_1, buf0, buf1, buf2, buf3, buf4, buf5, buf8, buf9, buf10, buf11, 8, 3, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.7422733Z del arg0_1 2023-01-11T21:38:06.7422871Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf7, buf8, buf9, buf10, buf11, ) 2023-01-11T21:38:06.7422876Z 2023-01-11T21:38:06.7422884Z 2023-01-11T21:38:06.7422966Z if __name__ == "__main__": 2023-01-11T21:38:06.7423084Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7423211Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7423409Z arg0_1 = rand_strided((8, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7423526Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7423532Z 2023-01-11T21:38:06.7423790Z [2023-01-11 21:36:06,235] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 912 
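The reduction kernels dumped above implement argmin/argmax without a native arg-reduction over the tiled r-dimension: each lane carries a running extreme (_tmp2, _tmp4) together with the row index at which it was last improved (_tmp2_index, _tmp4_index), and the epilogue locates the winning lane with tl.argmin/tl.argmax, then extracts that lane's stored index through a one-hot equality mask and tl.sum. A minimal NumPy sketch of the same trick, assuming a 1-D row reduced with lane count RBLOCK (all names here are illustrative, not taken from the log):

import numpy as np

def argmin_via_mask(row, RBLOCK=4):
    # Per-lane running minimum plus the row index where it was last improved,
    # mirroring _tmp2 / _tmp2_index in the generated kernel above.
    _tmp = np.full(RBLOCK, np.inf, dtype=np.float32)
    _tmp_index = np.zeros(RBLOCK, dtype=np.int64)
    for roffset in range(0, len(row), RBLOCK):
        rindex = roffset + np.arange(RBLOCK)
        rmask = rindex < len(row)
        vals = np.where(rmask, row[np.minimum(rindex, len(row) - 1)], np.inf)
        update = rmask & (_tmp > vals)        # same predicate as the tl.where updates
        _tmp_index = np.where(update, rindex, _tmp_index)
        _tmp = np.where(update, vals, _tmp)
    winner = np.argmin(_tmp)                  # cross-lane reduce, like tl.argmin(_tmp2, 1)
    onehot = np.arange(RBLOCK) == winner      # plays the role of _tmp2_index_mask
    return int(np.sum(np.where(onehot, _tmp_index, 0)))  # tl.sum pulls out the stored index

assert argmin_via_mask(np.array([3.0, 0.5, 2.0, 4.0, 1.0], dtype=np.float32)) == 1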
2023-01-11T21:38:06.7423796Z 2023-01-11T21:38:06.7423897Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7423974Z import torch 2023-01-11T21:38:06.7424050Z import random 2023-01-11T21:38:06.7424170Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7431659Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7431670Z 2023-01-11T21:38:06.7431819Z aten = torch.ops.aten 2023-01-11T21:38:06.7431975Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7432067Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7432072Z 2023-01-11T21:38:06.7432153Z import triton 2023-01-11T21:38:06.7432248Z import triton.language as tl 2023-01-11T21:38:06.7432379Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7432611Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7432853Z 2023-01-11T21:38:06.7432858Z 2023-01-11T21:38:06.7433148Z triton_fused_amax_amin_any_1_any_2_argmax_argmin_gt_gt_1_logical_not_logical_not_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7433451Z import triton 2023-01-11T21:38:06.7433675Z import triton.language as tl 2023-01-11T21:38:06.7433927Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7434179Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7434458Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7434754Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7434917Z 2023-01-11T21:38:06.7435008Z @reduction(size_hints=[8, 4], 2023-01-11T21:38:06.7435258Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7435541Z filename=__file__, 2023-01-11T21:38:06.7436197Z meta={'signature': {0: '*i1', 1: '*fp16', 2: '*fp16', 3: '*i64', 4: '*fp16', 5: '*i64', 6: '*fp16', 7: '*i1', 8: '*i64', 9: '*i64', 10: '*fp16', 11: '*fp16', 12: 'i32', 13: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), equal_to_1=())]}) 2023-01-11T21:38:06.7436589Z @triton.jit 2023-01-11T21:38:06.7436968Z def triton_(in_out_ptr0, in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, out_ptr9, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7437333Z xnumel = 8 2023-01-11T21:38:06.7437524Z rnumel = 3 2023-01-11T21:38:06.7437781Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7438068Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7438331Z xmask = xindex < xnumel 2023-01-11T21:38:06.7438577Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7438842Z x0 = xindex 2023-01-11T21:38:06.7439109Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.7439407Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.7439676Z _tmp2_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.7440012Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.7440355Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.7440624Z _tmp4_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.7440888Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7441155Z _tmp10 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.7441404Z _tmp16 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.7441732Z 
_tmp21 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.7441996Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7442223Z rindex = roffset + rbase 2023-01-11T21:38:06.7442435Z rmask = rindex < rnumel 2023-01-11T21:38:06.7442627Z r1 = rindex 2023-01-11T21:38:06.7442980Z tmp0 = tl.load(in_ptr0 + (r1 + (3*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7443296Z tmp20 = tl.load(in_ptr0 + (r1 + (3*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.7443589Z _tmp1 = tl.where(xmask & rmask & (_tmp1 > tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.7443891Z _tmp2_index = tl.where(xmask & rmask & (_tmp2 > tmp0), rindex, _tmp2_index) 2023-01-11T21:38:06.7444184Z _tmp2 = tl.where(xmask & rmask & (_tmp2 > tmp0), tmp0, _tmp2) 2023-01-11T21:38:06.7444469Z _tmp3 = tl.where(xmask & rmask & (_tmp3 < tmp0), tmp0, _tmp3) 2023-01-11T21:38:06.7444759Z _tmp4_index = tl.where(xmask & rmask & (_tmp4 < tmp0), rindex, _tmp4_index) 2023-01-11T21:38:06.7445050Z _tmp4 = tl.where(xmask & rmask & (_tmp4 < tmp0), tmp0, _tmp4) 2023-01-11T21:38:06.7445353Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp0, _tmp5) 2023-01-11T21:38:06.7445581Z tmp6 = 1 2023-01-11T21:38:06.7445772Z tmp7 = tmp0 > tmp6 2023-01-11T21:38:06.7445975Z tmp8 = tmp7.to(tl.int64) 2023-01-11T21:38:06.7446183Z tmp9 = (tmp8 != 0) 2023-01-11T21:38:06.7446435Z _tmp10 = tl.where(xmask & rmask & (_tmp10 < tmp9), tmp9, _tmp10) 2023-01-11T21:38:06.7446660Z tmp11 = 0 2023-01-11T21:38:06.7446852Z tmp12 = tmp0 > tmp11 2023-01-11T21:38:06.7447052Z tmp13 = tmp12 == 0 2023-01-11T21:38:06.7447256Z tmp14 = tmp13.to(tl.int64) 2023-01-11T21:38:06.7447466Z tmp15 = (tmp14 != 0) 2023-01-11T21:38:06.7447720Z _tmp16 = tl.where(xmask & rmask & (_tmp16 < tmp15), tmp15, _tmp16) 2023-01-11T21:38:06.7448003Z _tmp21 = tl.where(xmask & rmask & (_tmp21 < tmp20), tmp20, _tmp21) 2023-01-11T21:38:06.7448277Z tmp1 = tl.reshape(tl.min(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7448519Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.7448744Z _tmp2_index_reduce = tl.reshape( 2023-01-11T21:38:06.7448992Z tl.argmin(_tmp2, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.7449257Z _tmp2_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.7449505Z [1, RBLOCK]) == _tmp2_index_reduce) 2023-01-11T21:38:06.7449716Z tmp2 = tl.reshape(tl.sum( 2023-01-11T21:38:06.7449970Z tl.where(_tmp2_index_mask, _tmp2_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7450223Z tl.store(out_ptr1 + x0, tmp2, xmask) 2023-01-11T21:38:06.7450463Z tmp3 = tl.reshape(tl.max(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7450701Z tl.store(out_ptr2 + x0, tmp3, xmask) 2023-01-11T21:38:06.7450966Z _tmp4_index_reduce = tl.reshape( 2023-01-11T21:38:06.7451202Z tl.argmax(_tmp4, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.7451466Z _tmp4_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.7451712Z [1, RBLOCK]) == _tmp4_index_reduce) 2023-01-11T21:38:06.7451923Z tmp4 = tl.reshape(tl.sum( 2023-01-11T21:38:06.7452175Z tl.where(_tmp4_index_mask, _tmp4_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7452426Z tl.store(out_ptr3 + x0, tmp4, xmask) 2023-01-11T21:38:06.7452674Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7452915Z tl.store(out_ptr4 + x0, tmp5, xmask) 2023-01-11T21:38:06.7453161Z tmp10 = tl.reshape(tl.max(_tmp10, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7453409Z tl.store(out_ptr5 + x0, tmp10, xmask) 2023-01-11T21:38:06.7453652Z tmp16 = tl.reshape(tl.max(_tmp16, 1), [XBLOCK, 1]) 
2023-01-11T21:38:06.7453869Z tmp17 = tmp2 2023-01-11T21:38:06.7454077Z tl.store(out_ptr6 + x0, tmp17, xmask) 2023-01-11T21:38:06.7454277Z tmp18 = tmp4 2023-01-11T21:38:06.7454690Z tl.store(out_ptr7 + x0, tmp18, xmask) 2023-01-11T21:38:06.7454901Z tmp19 = tmp1 2023-01-11T21:38:06.7455097Z tl.store(out_ptr8 + x0, tmp19, xmask) 2023-01-11T21:38:06.7455347Z tmp21 = tl.reshape(tl.max(_tmp21, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7455594Z tl.store(out_ptr9 + x0, tmp21, xmask) 2023-01-11T21:38:06.7455796Z tmp22 = tmp16 == 0 2023-01-11T21:38:06.7456058Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp22, xmask) 2023-01-11T21:38:06.7456319Z ''') 2023-01-11T21:38:06.7456423Z 2023-01-11T21:38:06.7456428Z 2023-01-11T21:38:06.7456528Z async_compile.wait(globals()) 2023-01-11T21:38:06.7456728Z del async_compile 2023-01-11T21:38:06.7456846Z 2023-01-11T21:38:06.7456924Z def call(args): 2023-01-11T21:38:06.7457107Z arg0_1, = args 2023-01-11T21:38:06.7457356Z args.clear() 2023-01-11T21:38:06.7457565Z with torch.cuda.device(0): 2023-01-11T21:38:06.7457901Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7458255Z buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7458613Z buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7459034Z buf3 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7459388Z buf4 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7459742Z buf5 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.7460087Z buf6 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.7460439Z buf8 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7460780Z buf9 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7461134Z buf10 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7461498Z buf11 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7461748Z buf7 = buf6; del buf6 # reuse 2023-01-11T21:38:06.7461966Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7462338Z triton_fused_amax_amin_any_1_any_2_argmax_argmin_gt_gt_1_logical_not_logical_not_1_0.run(buf7, arg0_1, buf0, buf1, buf2, buf3, buf4, buf5, buf8, buf9, buf10, buf11, 8, 3, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.7462666Z del arg0_1 2023-01-11T21:38:06.7462906Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf7, buf8, buf9, buf10, buf11, ) 2023-01-11T21:38:06.7463076Z 2023-01-11T21:38:06.7463080Z 2023-01-11T21:38:06.7463162Z if __name__ == "__main__": 2023-01-11T21:38:06.7463404Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7463675Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7464039Z arg0_1 = rand_strided((8, 3), (3, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7464361Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7464509Z 2023-01-11T21:38:06.7464582Z ok (1.056s) 2023-01-11T21:38:06.7465209Z test_unspec_inputs_cuda (__main__.CudaTests) ... 
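The test_unspec_inputs_cuda output below exercises 0-d CPU ("unspecialized") scalar inputs: the generated wrappers pass them into the kernel as plain Python numbers via arg1_1.item() (or arg0_1.item() when the scalar comes first), and the kernel promotes them to the tensor's compute dtype with tmp1.to(tl.float32) before the fused add/mul/div. A minimal PyTorch sketch of the same promotion semantics, assuming illustrative shapes and a made-up 0.5 value (the logged runs place the non-scalar tensor on CUDA):

import torch

tensor_arg = torch.randn(2, 3)                       # stands in for the (2, 3) fp32 argument
scalar_arg = torch.tensor(0.5, dtype=torch.float16)  # 0-d CPU scalar, like arg1_1

s = scalar_arg.item()     # unwrapped on the host, as in triton_fused_add_div_mul_0.run(..., arg1_1.item(), ...)
out_add = tensor_arg + s  # computed in float32, matching tmp3 = tmp0 + tmp2
out_mul = tensor_arg * s  # tmp4 = tmp0 * tmp2
out_div = tensor_arg / s  # tmp6 = tmp5 / tmp2
assert out_add.dtype == out_mul.dtype == out_div.dtype == torch.float32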
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.7465698Z warnings.warn( 2023-01-11T21:38:06.7466082Z [2023-01-11 21:36:06,255] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 913 2023-01-11T21:38:06.7466539Z [2023-01-11 21:36:06,270] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7466997Z [2023-01-11 21:36:06,270] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 913 2023-01-11T21:38:06.7467464Z [2023-01-11 21:36:06,286] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 914 2023-01-11T21:38:06.7467913Z [2023-01-11 21:36:06,299] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7468365Z [2023-01-11 21:36:06,299] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 914 2023-01-11T21:38:06.7468823Z [2023-01-11 21:36:06,315] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 915 2023-01-11T21:38:06.7469269Z [2023-01-11 21:36:06,328] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7469727Z [2023-01-11 21:36:06,328] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 915 2023-01-11T21:38:06.7470172Z [2023-01-11 21:36:06,343] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 916 2023-01-11T21:38:06.7470373Z 2023-01-11T21:38:06.7470473Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7470683Z import torch 2023-01-11T21:38:06.7470860Z import random 2023-01-11T21:38:06.7471090Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7471369Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7471532Z 2023-01-11T21:38:06.7471615Z aten = torch.ops.aten 2023-01-11T21:38:06.7471899Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7472165Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7472301Z 2023-01-11T21:38:06.7472377Z import triton 2023-01-11T21:38:06.7472570Z import triton.language as tl 2023-01-11T21:38:06.7472826Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7473122Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7473295Z 2023-01-11T21:38:06.7473300Z 2023-01-11T21:38:06.7473471Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7473691Z import triton 2023-01-11T21:38:06.7473895Z import triton.language as tl 2023-01-11T21:38:06.7474142Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7474383Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7474656Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7474943Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7475097Z 2023-01-11T21:38:06.7475554Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7475921Z @triton.jit 2023-01-11T21:38:06.7476187Z def triton_(in_ptr0,
in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7476447Z xnumel = 6 2023-01-11T21:38:06.7476645Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7476912Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7477187Z xmask = xindex < xnumel 2023-01-11T21:38:06.7477375Z x0 = xindex 2023-01-11T21:38:06.7477674Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7477914Z tmp1 = in_ptr1 2023-01-11T21:38:06.7478116Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7478342Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.7478547Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.7478736Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.7478929Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.7479179Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7479475Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7479771Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7480013Z ''') 2023-01-11T21:38:06.7480115Z 2023-01-11T21:38:06.7480120Z 2023-01-11T21:38:06.7480215Z async_compile.wait(globals()) 2023-01-11T21:38:06.7480423Z del async_compile 2023-01-11T21:38:06.7480539Z 2023-01-11T21:38:06.7480617Z def call(args): 2023-01-11T21:38:06.7480806Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7480989Z args.clear() 2023-01-11T21:38:06.7481192Z with torch.cuda.device(0): 2023-01-11T21:38:06.7481529Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7481892Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7482251Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7482508Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7482810Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7483066Z del arg0_1 2023-01-11T21:38:06.7483249Z del arg1_1 2023-01-11T21:38:06.7483450Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7483575Z 2023-01-11T21:38:06.7483583Z 2023-01-11T21:38:06.7483667Z if __name__ == "__main__": 2023-01-11T21:38:06.7483907Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7484185Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7484540Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7484931Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.float16) 2023-01-11T21:38:06.7485218Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7485384Z 2023-01-11T21:38:06.7485390Z 2023-01-11T21:38:06.7485500Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7485726Z import torch 2023-01-11T21:38:06.7485918Z import random 2023-01-11T21:38:06.7486148Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7486415Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7486575Z 2023-01-11T21:38:06.7486659Z aten = torch.ops.aten 2023-01-11T21:38:06.7486918Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7487177Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7487310Z 2023-01-11T21:38:06.7487385Z import triton 2023-01-11T21:38:06.7487586Z import triton.language as tl 2023-01-11T21:38:06.7487838Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7488137Z from torch._C import 
_cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7488313Z 2023-01-11T21:38:06.7488318Z 2023-01-11T21:38:06.7488489Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7488715Z import triton 2023-01-11T21:38:06.7488907Z import triton.language as tl 2023-01-11T21:38:06.7489151Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7489399Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7489663Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7489949Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7490105Z 2023-01-11T21:38:06.7490558Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7490957Z @triton.jit 2023-01-11T21:38:06.7491220Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7491478Z xnumel = 6 2023-01-11T21:38:06.7491683Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7491947Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7492191Z xmask = xindex < xnumel 2023-01-11T21:38:06.7492382Z x0 = xindex 2023-01-11T21:38:06.7492554Z tmp0 = in_ptr0 2023-01-11T21:38:06.7492858Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7493118Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.7493333Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7493543Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7493736Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.7493923Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.7494177Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7494669Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7494972Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7495213Z ''') 2023-01-11T21:38:06.7495314Z 2023-01-11T21:38:06.7495319Z 2023-01-11T21:38:06.7495414Z async_compile.wait(globals()) 2023-01-11T21:38:06.7495623Z del async_compile 2023-01-11T21:38:06.7495738Z 2023-01-11T21:38:06.7495807Z def call(args): 2023-01-11T21:38:06.7495995Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7496187Z args.clear() 2023-01-11T21:38:06.7496384Z with torch.cuda.device(0): 2023-01-11T21:38:06.7496715Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7497083Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7497514Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7497763Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7498108Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7498371Z del arg0_1 2023-01-11T21:38:06.7498546Z del arg1_1 2023-01-11T21:38:06.7498741Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7498871Z 2023-01-11T21:38:06.7498875Z 2023-01-11T21:38:06.7498957Z if __name__ == "__main__": 2023-01-11T21:38:06.7499185Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7499460Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7499809Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float16) 2023-01-11T21:38:06.7500171Z arg1_1 = 
rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7500451Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7500603Z 2023-01-11T21:38:06.7500608Z 2023-01-11T21:38:06.7500706Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7500910Z import torch 2023-01-11T21:38:06.7501088Z import random 2023-01-11T21:38:06.7501314Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7501586Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7501746Z 2023-01-11T21:38:06.7501821Z aten = torch.ops.aten 2023-01-11T21:38:06.7502078Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7502338Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7502471Z 2023-01-11T21:38:06.7502545Z import triton 2023-01-11T21:38:06.7502737Z import triton.language as tl 2023-01-11T21:38:06.7502992Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7503286Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7503491Z 2023-01-11T21:38:06.7503505Z 2023-01-11T21:38:06.7503670Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7503901Z import triton 2023-01-11T21:38:06.7504107Z import triton.language as tl 2023-01-11T21:38:06.7504347Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7504601Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7504878Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7505166Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7505325Z 2023-01-11T21:38:06.7505832Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7506202Z @triton.jit 2023-01-11T21:38:06.7506470Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7506727Z xnumel = 6 2023-01-11T21:38:06.7506935Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7507202Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7507447Z xmask = xindex < xnumel 2023-01-11T21:38:06.7507633Z x0 = xindex 2023-01-11T21:38:06.7507939Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7508182Z tmp1 = in_ptr1 2023-01-11T21:38:06.7508388Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7508612Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.7508822Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.7509011Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.7509205Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.7509456Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7509748Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7510038Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7510292Z ''') 2023-01-11T21:38:06.7510395Z 2023-01-11T21:38:06.7510400Z 2023-01-11T21:38:06.7510491Z async_compile.wait(globals()) 2023-01-11T21:38:06.7510699Z del async_compile 2023-01-11T21:38:06.7510817Z 2023-01-11T21:38:06.7510893Z def call(args): 2023-01-11T21:38:06.7511118Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7511306Z args.clear() 2023-01-11T21:38:06.7511513Z with torch.cuda.device(0): 2023-01-11T21:38:06.7511848Z 
buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7512210Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7512570Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7512831Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7513127Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7513393Z del arg0_1 2023-01-11T21:38:06.7513575Z del arg1_1 2023-01-11T21:38:06.7513768Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7513899Z 2023-01-11T21:38:06.7513904Z 2023-01-11T21:38:06.7513988Z if __name__ == "__main__": 2023-01-11T21:38:06.7514230Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7514515Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7514870Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7515229Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.bfloat16) 2023-01-11T21:38:06.7515516Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7515670Z 2023-01-11T21:38:06.7515919Z [2023-01-11 21:36:06,356] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7516393Z [2023-01-11 21:36:06,356] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 916 2023-01-11T21:38:06.7516899Z [2023-01-11 21:36:06,373] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 917 2023-01-11T21:38:06.7517354Z [2023-01-11 21:36:06,385] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7517819Z [2023-01-11 21:36:06,385] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 917 2023-01-11T21:38:06.7518283Z [2023-01-11 21:36:06,400] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 918 2023-01-11T21:38:06.7518729Z [2023-01-11 21:36:06,412] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7519191Z [2023-01-11 21:36:06,412] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 918 2023-01-11T21:38:06.7519644Z [2023-01-11 21:36:06,428] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 919 2023-01-11T21:38:06.7519848Z 2023-01-11T21:38:06.7519951Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7520161Z import torch 2023-01-11T21:38:06.7520341Z import random 2023-01-11T21:38:06.7520581Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7520859Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7521028Z 2023-01-11T21:38:06.7521113Z aten = torch.ops.aten 2023-01-11T21:38:06.7521369Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7521634Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7521771Z 2023-01-11T21:38:06.7521850Z import triton 2023-01-11T21:38:06.7522049Z import triton.language as tl 2023-01-11T21:38:06.7522310Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7522614Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7522790Z 2023-01-11T21:38:06.7522795Z 2023-01-11T21:38:06.7522959Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7523195Z import triton 2023-01-11T21:38:06.7523399Z 
import triton.language as tl 2023-01-11T21:38:06.7523639Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7523892Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7524171Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7524500Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7524653Z 2023-01-11T21:38:06.7525105Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7525499Z @triton.jit 2023-01-11T21:38:06.7525793Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7526055Z xnumel = 6 2023-01-11T21:38:06.7526258Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7526532Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7526778Z xmask = xindex < xnumel 2023-01-11T21:38:06.7526971Z x0 = xindex 2023-01-11T21:38:06.7527153Z tmp0 = in_ptr0 2023-01-11T21:38:06.7527464Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7527728Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.7527948Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7528154Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7528345Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.7528540Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.7528798Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7529091Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7529381Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7529632Z ''') 2023-01-11T21:38:06.7529735Z 2023-01-11T21:38:06.7529781Z 2023-01-11T21:38:06.7529880Z async_compile.wait(globals()) 2023-01-11T21:38:06.7530083Z del async_compile 2023-01-11T21:38:06.7530201Z 2023-01-11T21:38:06.7530279Z def call(args): 2023-01-11T21:38:06.7530470Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7530656Z args.clear() 2023-01-11T21:38:06.7530861Z with torch.cuda.device(0): 2023-01-11T21:38:06.7531201Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7531564Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7531923Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7532181Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7532474Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7532738Z del arg0_1 2023-01-11T21:38:06.7532920Z del arg1_1 2023-01-11T21:38:06.7533116Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7533244Z 2023-01-11T21:38:06.7533249Z 2023-01-11T21:38:06.7533332Z if __name__ == "__main__": 2023-01-11T21:38:06.7533571Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7533851Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7534202Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.bfloat16) 2023-01-11T21:38:06.7534777Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7535062Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7535216Z 2023-01-11T21:38:06.7535220Z 2023-01-11T21:38:06.7535317Z from ctypes import 
c_void_p, c_long 2023-01-11T21:38:06.7535513Z import torch 2023-01-11T21:38:06.7535693Z import random 2023-01-11T21:38:06.7535921Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7536186Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7536342Z 2023-01-11T21:38:06.7536428Z aten = torch.ops.aten 2023-01-11T21:38:06.7536680Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7536935Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7537071Z 2023-01-11T21:38:06.7537191Z import triton 2023-01-11T21:38:06.7537404Z import triton.language as tl 2023-01-11T21:38:06.7537697Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7537995Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7538167Z 2023-01-11T21:38:06.7538171Z 2023-01-11T21:38:06.7538344Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7538565Z import triton 2023-01-11T21:38:06.7538754Z import triton.language as tl 2023-01-11T21:38:06.7538994Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7539239Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7539499Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7539778Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7539935Z 2023-01-11T21:38:06.7540386Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7540745Z @triton.jit 2023-01-11T21:38:06.7541003Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7541254Z xnumel = 6 2023-01-11T21:38:06.7541453Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7541709Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7541946Z xmask = xindex < xnumel 2023-01-11T21:38:06.7542135Z x0 = xindex 2023-01-11T21:38:06.7542423Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7542658Z tmp1 = in_ptr1 2023-01-11T21:38:06.7542864Z tmp4 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7543112Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7543307Z tmp3 = tmp0 * tmp1 2023-01-11T21:38:06.7543502Z tmp5 = tmp4 / tmp1 2023-01-11T21:38:06.7543744Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.7544044Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7544332Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7544581Z ''') 2023-01-11T21:38:06.7544677Z 2023-01-11T21:38:06.7544683Z 2023-01-11T21:38:06.7544780Z async_compile.wait(globals()) 2023-01-11T21:38:06.7544986Z del async_compile 2023-01-11T21:38:06.7545100Z 2023-01-11T21:38:06.7545196Z def call(args): 2023-01-11T21:38:06.7545409Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7545605Z args.clear() 2023-01-11T21:38:06.7545810Z with torch.cuda.device(0): 2023-01-11T21:38:06.7546141Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7546510Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7546872Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7547127Z 
stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7547424Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7547687Z del arg0_1 2023-01-11T21:38:06.7547871Z del arg1_1 2023-01-11T21:38:06.7548062Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7548195Z 2023-01-11T21:38:06.7548199Z 2023-01-11T21:38:06.7548282Z if __name__ == "__main__": 2023-01-11T21:38:06.7548518Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7548787Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7549152Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7549512Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.7549794Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7549939Z 2023-01-11T21:38:06.7549943Z 2023-01-11T21:38:06.7550046Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7550252Z import torch 2023-01-11T21:38:06.7550472Z import random 2023-01-11T21:38:06.7550699Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7550976Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7551134Z 2023-01-11T21:38:06.7551218Z aten = torch.ops.aten 2023-01-11T21:38:06.7551469Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7551731Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7551867Z 2023-01-11T21:38:06.7551944Z import triton 2023-01-11T21:38:06.7552140Z import triton.language as tl 2023-01-11T21:38:06.7552398Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7552694Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7552873Z 2023-01-11T21:38:06.7552877Z 2023-01-11T21:38:06.7553050Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7553271Z import triton 2023-01-11T21:38:06.7553474Z import triton.language as tl 2023-01-11T21:38:06.7553721Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7553965Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7554236Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7554526Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7554683Z 2023-01-11T21:38:06.7555133Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7555488Z @triton.jit 2023-01-11T21:38:06.7555755Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7556047Z xnumel = 6 2023-01-11T21:38:06.7556247Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7556511Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7556756Z xmask = xindex < xnumel 2023-01-11T21:38:06.7556944Z x0 = xindex 2023-01-11T21:38:06.7557126Z tmp0 = in_ptr0 2023-01-11T21:38:06.7557429Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7557681Z tmp4 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.7557893Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7558091Z tmp3 = tmp0 * tmp1 2023-01-11T21:38:06.7558284Z tmp5 = tmp0 / tmp4 2023-01-11T21:38:06.7558525Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 
2023-01-11T21:38:06.7558820Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7559109Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7559353Z ''') 2023-01-11T21:38:06.7559454Z 2023-01-11T21:38:06.7559459Z 2023-01-11T21:38:06.7559556Z async_compile.wait(globals()) 2023-01-11T21:38:06.7559766Z del async_compile 2023-01-11T21:38:06.7559881Z 2023-01-11T21:38:06.7559952Z def call(args): 2023-01-11T21:38:06.7560143Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7560335Z args.clear() 2023-01-11T21:38:06.7560533Z with torch.cuda.device(0): 2023-01-11T21:38:06.7560863Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7561229Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7561589Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7561843Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7562141Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7562406Z del arg0_1 2023-01-11T21:38:06.7562582Z del arg1_1 2023-01-11T21:38:06.7562783Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7562914Z 2023-01-11T21:38:06.7562918Z 2023-01-11T21:38:06.7563001Z if __name__ == "__main__": 2023-01-11T21:38:06.7563233Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7563546Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7563895Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.7564262Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7564538Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7564692Z 2023-01-11T21:38:06.7564947Z [2023-01-11 21:36:06,441] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7565448Z [2023-01-11 21:36:06,441] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 919 2023-01-11T21:38:06.7565915Z [2023-01-11 21:36:06,457] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 920 2023-01-11T21:38:06.7566363Z [2023-01-11 21:36:06,471] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7566823Z [2023-01-11 21:36:06,471] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 920 2023-01-11T21:38:06.7567281Z [2023-01-11 21:36:06,486] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 921 2023-01-11T21:38:06.7567719Z [2023-01-11 21:36:06,500] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7568177Z [2023-01-11 21:36:06,500] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 921 2023-01-11T21:38:06.7568630Z [2023-01-11 21:36:06,515] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 922 2023-01-11T21:38:06.7568864Z 2023-01-11T21:38:06.7568966Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7569172Z import torch 2023-01-11T21:38:06.7569359Z import random 2023-01-11T21:38:06.7569591Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7569862Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7570025Z 2023-01-11T21:38:06.7570113Z aten = torch.ops.aten 2023-01-11T21:38:06.7570371Z 
assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7570628Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7570639Z 2023-01-11T21:38:06.7570709Z import triton 2023-01-11T21:38:06.7570804Z import triton.language as tl 2023-01-11T21:38:06.7570934Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7571076Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7571082Z 2023-01-11T21:38:06.7571087Z 2023-01-11T21:38:06.7571258Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7571343Z import triton 2023-01-11T21:38:06.7571438Z import triton.language as tl 2023-01-11T21:38:06.7571548Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7571653Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7571791Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7571921Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7571926Z 2023-01-11T21:38:06.7572374Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'fp64', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7572450Z @triton.jit 2023-01-11T21:38:06.7572614Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7572689Z xnumel = 6 2023-01-11T21:38:06.7572783Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7572919Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7573008Z xmask = xindex < xnumel 2023-01-11T21:38:06.7573082Z x0 = xindex 2023-01-11T21:38:06.7573275Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7573355Z tmp1 = in_ptr1 2023-01-11T21:38:06.7573484Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7573570Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.7573653Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.7573734Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.7573814Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.7573953Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7574089Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7574223Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7574305Z ''') 2023-01-11T21:38:06.7574317Z 2023-01-11T21:38:06.7574324Z 2023-01-11T21:38:06.7574414Z async_compile.wait(globals()) 2023-01-11T21:38:06.7574693Z del async_compile 2023-01-11T21:38:06.7574699Z 2023-01-11T21:38:06.7574778Z def call(args): 2023-01-11T21:38:06.7574858Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7574933Z args.clear() 2023-01-11T21:38:06.7575027Z with torch.cuda.device(0): 2023-01-11T21:38:06.7575239Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7575431Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7575628Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7575721Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7575888Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7575961Z del arg0_1 2023-01-11T21:38:06.7576033Z del arg1_1 2023-01-11T21:38:06.7576122Z return (buf0, buf1, buf2, ) 
2023-01-11T21:38:06.7576173Z 2023-01-11T21:38:06.7576177Z 2023-01-11T21:38:06.7576262Z if __name__ == "__main__": 2023-01-11T21:38:06.7576376Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7576505Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7576711Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7576898Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.7577018Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7577024Z 2023-01-11T21:38:06.7577028Z 2023-01-11T21:38:06.7577174Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7577262Z import torch 2023-01-11T21:38:06.7577342Z import random 2023-01-11T21:38:06.7577455Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7577582Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7577587Z 2023-01-11T21:38:06.7577672Z aten = torch.ops.aten 2023-01-11T21:38:06.7577813Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7577910Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7577915Z 2023-01-11T21:38:06.7577991Z import triton 2023-01-11T21:38:06.7578085Z import triton.language as tl 2023-01-11T21:38:06.7578209Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7578351Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7578356Z 2023-01-11T21:38:06.7578360Z 2023-01-11T21:38:06.7578530Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7578606Z import triton 2023-01-11T21:38:06.7578704Z import triton.language as tl 2023-01-11T21:38:06.7578820Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7578924Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7579059Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7579179Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7579187Z 2023-01-11T21:38:06.7579633Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'fp64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7579749Z @triton.jit 2023-01-11T21:38:06.7579915Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7579991Z xnumel = 6 2023-01-11T21:38:06.7580090Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7580225Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7580312Z xmask = xindex < xnumel 2023-01-11T21:38:06.7580378Z x0 = xindex 2023-01-11T21:38:06.7580456Z tmp0 = in_ptr0 2023-01-11T21:38:06.7580651Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7580754Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.7580848Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7580929Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7581011Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.7581085Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.7581222Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7581359Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7581495Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7581583Z ''') 
2023-01-11T21:38:06.7581589Z 2023-01-11T21:38:06.7581593Z 2023-01-11T21:38:06.7581689Z async_compile.wait(globals()) 2023-01-11T21:38:06.7581767Z del async_compile 2023-01-11T21:38:06.7581772Z 2023-01-11T21:38:06.7581851Z def call(args): 2023-01-11T21:38:06.7581927Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7582006Z args.clear() 2023-01-11T21:38:06.7582100Z with torch.cuda.device(0): 2023-01-11T21:38:06.7582302Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7582538Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7582737Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7582831Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7582995Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7583073Z del arg0_1 2023-01-11T21:38:06.7583148Z del arg1_1 2023-01-11T21:38:06.7583239Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7583245Z 2023-01-11T21:38:06.7583249Z 2023-01-11T21:38:06.7583330Z if __name__ == "__main__": 2023-01-11T21:38:06.7583451Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7583583Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7583772Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.7583968Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7584089Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7584094Z 2023-01-11T21:38:06.7584099Z 2023-01-11T21:38:06.7584199Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7584278Z import torch 2023-01-11T21:38:06.7584358Z import random 2023-01-11T21:38:06.7584478Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7584605Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7584610Z 2023-01-11T21:38:06.7584693Z aten = torch.ops.aten 2023-01-11T21:38:06.7584826Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7584925Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7584930Z 2023-01-11T21:38:06.7585005Z import triton 2023-01-11T21:38:06.7585099Z import triton.language as tl 2023-01-11T21:38:06.7585228Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7585385Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7585392Z 2023-01-11T21:38:06.7585397Z 2023-01-11T21:38:06.7585592Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7585680Z import triton 2023-01-11T21:38:06.7585767Z import triton.language as tl 2023-01-11T21:38:06.7585923Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7586032Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7586167Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7586294Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7586299Z 2023-01-11T21:38:06.7586747Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7586824Z @triton.jit 2023-01-11T21:38:06.7586990Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 
2023-01-11T21:38:06.7587060Z xnumel = 6 2023-01-11T21:38:06.7587165Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7587297Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7587385Z xmask = xindex < xnumel 2023-01-11T21:38:06.7587459Z x0 = xindex 2023-01-11T21:38:06.7587651Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7587731Z tmp1 = in_ptr1 2023-01-11T21:38:06.7587823Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7587913Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.7587997Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.7588077Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.7588158Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.7588293Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7588433Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7588585Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7588674Z ''') 2023-01-11T21:38:06.7588680Z 2023-01-11T21:38:06.7588684Z 2023-01-11T21:38:06.7588780Z async_compile.wait(globals()) 2023-01-11T21:38:06.7588860Z del async_compile 2023-01-11T21:38:06.7588868Z 2023-01-11T21:38:06.7588947Z def call(args): 2023-01-11T21:38:06.7589029Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7589109Z args.clear() 2023-01-11T21:38:06.7589205Z with torch.cuda.device(0): 2023-01-11T21:38:06.7589402Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7589604Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7589802Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7589901Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7590068Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7590148Z del arg0_1 2023-01-11T21:38:06.7590223Z del arg1_1 2023-01-11T21:38:06.7590308Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7590321Z 2023-01-11T21:38:06.7590326Z 2023-01-11T21:38:06.7590403Z if __name__ == "__main__": 2023-01-11T21:38:06.7590525Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7590655Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7590857Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7591043Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.7591163Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7591169Z 2023-01-11T21:38:06.7591419Z [2023-01-11 21:36:06,528] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7591685Z [2023-01-11 21:36:06,528] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 922 2023-01-11T21:38:06.7591932Z [2023-01-11 21:36:06,544] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 923 2023-01-11T21:38:06.7592212Z [2023-01-11 21:36:06,557] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7592474Z [2023-01-11 21:36:06,557] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 923 2023-01-11T21:38:06.7592726Z [2023-01-11 21:36:06,574] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 924 2023-01-11T21:38:06.7592970Z [2023-01-11 21:36:06,587] 
torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7593228Z [2023-01-11 21:36:06,587] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 924 2023-01-11T21:38:06.7593234Z 2023-01-11T21:38:06.7593338Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7593413Z import torch 2023-01-11T21:38:06.7593481Z import random 2023-01-11T21:38:06.7593599Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7593722Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7593727Z 2023-01-11T21:38:06.7593808Z aten = torch.ops.aten 2023-01-11T21:38:06.7593946Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7594043Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7594048Z 2023-01-11T21:38:06.7594122Z import triton 2023-01-11T21:38:06.7594216Z import triton.language as tl 2023-01-11T21:38:06.7594333Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7594471Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7594477Z 2023-01-11T21:38:06.7594481Z 2023-01-11T21:38:06.7594650Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7594724Z import triton 2023-01-11T21:38:06.7594885Z import triton.language as tl 2023-01-11T21:38:06.7595000Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7595106Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7595260Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7595402Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7595410Z 2023-01-11T21:38:06.7595857Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'i32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7595932Z @triton.jit 2023-01-11T21:38:06.7596091Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7596163Z xnumel = 6 2023-01-11T21:38:06.7596259Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7596390Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7596476Z xmask = xindex < xnumel 2023-01-11T21:38:06.7596540Z x0 = xindex 2023-01-11T21:38:06.7596615Z tmp0 = in_ptr0 2023-01-11T21:38:06.7596807Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7596904Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.7596995Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7597073Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7597151Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.7597222Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.7597357Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7597489Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7597620Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7597708Z ''') 2023-01-11T21:38:06.7597714Z 2023-01-11T21:38:06.7597718Z 2023-01-11T21:38:06.7597810Z async_compile.wait(globals()) 2023-01-11T21:38:06.7597895Z del async_compile 2023-01-11T21:38:06.7597900Z 2023-01-11T21:38:06.7597974Z def call(args): 2023-01-11T21:38:06.7598046Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7598122Z args.clear() 2023-01-11T21:38:06.7598213Z with 
torch.cuda.device(0): 2023-01-11T21:38:06.7598438Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7598637Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7598834Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7598926Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7599085Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7599158Z del arg0_1 2023-01-11T21:38:06.7599230Z del arg1_1 2023-01-11T21:38:06.7599320Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7599328Z 2023-01-11T21:38:06.7599332Z 2023-01-11T21:38:06.7599413Z if __name__ == "__main__": 2023-01-11T21:38:06.7599530Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7599655Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7599839Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.7600032Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7600151Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7600156Z 2023-01-11T21:38:06.7600160Z 2023-01-11T21:38:06.7600256Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7600332Z import torch 2023-01-11T21:38:06.7600405Z import random 2023-01-11T21:38:06.7600523Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7600645Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7600650Z 2023-01-11T21:38:06.7600733Z aten = torch.ops.aten 2023-01-11T21:38:06.7600891Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7600986Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7600991Z 2023-01-11T21:38:06.7601064Z import triton 2023-01-11T21:38:06.7601155Z import triton.language as tl 2023-01-11T21:38:06.7601280Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7601422Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7601427Z 2023-01-11T21:38:06.7601432Z 2023-01-11T21:38:06.7601598Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7601672Z import triton 2023-01-11T21:38:06.7601757Z import triton.language as tl 2023-01-11T21:38:06.7601872Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7601975Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7602107Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7602231Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7602236Z 2023-01-11T21:38:06.7602680Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i64', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7602754Z @triton.jit 2023-01-11T21:38:06.7602914Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7602980Z xnumel = 6 2023-01-11T21:38:06.7603077Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7603204Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7603289Z xmask = xindex < xnumel 2023-01-11T21:38:06.7603360Z x0 = xindex 2023-01-11T21:38:06.7603550Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 
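    # Editor's note (comment added, not in the generated source): the same
    # address is loaded twice -- here with eviction_policy='evict_last' and
    # again below as a plain tl.load -- presumably because Inductor emits one
    # load site per use cluster and leaves de-duplication to the Triton
    # compiler; the arithmetic is unaffected either way.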
2023-01-11T21:38:06.7603625Z tmp1 = in_ptr1 2023-01-11T21:38:06.7603715Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7603801Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.7603882Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.7603960Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.7604037Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.7604170Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7604335Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7604460Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7604545Z ''') 2023-01-11T21:38:06.7604551Z 2023-01-11T21:38:06.7604555Z 2023-01-11T21:38:06.7604647Z async_compile.wait(globals()) 2023-01-11T21:38:06.7604725Z del async_compile 2023-01-11T21:38:06.7604730Z 2023-01-11T21:38:06.7604806Z def call(args): 2023-01-11T21:38:06.7604886Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7604961Z args.clear() 2023-01-11T21:38:06.7605047Z with torch.cuda.device(0): 2023-01-11T21:38:06.7605243Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7605443Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7605640Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7605732Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7605904Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7605978Z del arg0_1 2023-01-11T21:38:06.7606051Z del arg1_1 2023-01-11T21:38:06.7606133Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7606138Z 2023-01-11T21:38:06.7606142Z 2023-01-11T21:38:06.7606224Z if __name__ == "__main__": 2023-01-11T21:38:06.7606341Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7606468Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7606665Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7606876Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.7607000Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7607005Z 2023-01-11T21:38:06.7607010Z 2023-01-11T21:38:06.7607109Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7607179Z import torch 2023-01-11T21:38:06.7607255Z import random 2023-01-11T21:38:06.7607379Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7607504Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7607509Z 2023-01-11T21:38:06.7607593Z aten = torch.ops.aten 2023-01-11T21:38:06.7607730Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7607827Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7607832Z 2023-01-11T21:38:06.7607907Z import triton 2023-01-11T21:38:06.7607996Z import triton.language as tl 2023-01-11T21:38:06.7608123Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7608266Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7608275Z 2023-01-11T21:38:06.7608279Z 2023-01-11T21:38:06.7608448Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7608525Z import triton 2023-01-11T21:38:06.7608620Z import triton.language as tl 2023-01-11T21:38:06.7608735Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7608836Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7608972Z from 
torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7609099Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7609104Z 2023-01-11T21:38:06.7609547Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7609622Z @triton.jit 2023-01-11T21:38:06.7609782Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7609859Z xnumel = 6 2023-01-11T21:38:06.7609958Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7610089Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7610168Z xmask = xindex < xnumel 2023-01-11T21:38:06.7610266Z x0 = xindex 2023-01-11T21:38:06.7610345Z tmp0 = in_ptr0 2023-01-11T21:38:06.7610537Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7610634Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.7610722Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7610800Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7610871Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.7610947Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.7611081Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7611213Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7611347Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7611432Z ''') 2023-01-11T21:38:06.7611438Z 2023-01-11T21:38:06.7611442Z 2023-01-11T21:38:06.7611536Z async_compile.wait(globals()) 2023-01-11T21:38:06.7611606Z del async_compile 2023-01-11T21:38:06.7611611Z 2023-01-11T21:38:06.7611689Z def call(args): 2023-01-11T21:38:06.7611766Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7611840Z args.clear() 2023-01-11T21:38:06.7611933Z with torch.cuda.device(0): 2023-01-11T21:38:06.7612133Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7612331Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7612525Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7612611Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7612776Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7612896Z del arg0_1 2023-01-11T21:38:06.7612969Z del arg1_1 2023-01-11T21:38:06.7613058Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7613064Z 2023-01-11T21:38:06.7613068Z 2023-01-11T21:38:06.7613148Z if __name__ == "__main__": 2023-01-11T21:38:06.7613270Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7613389Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7613574Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.7613770Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7613888Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7613894Z 2023-01-11T21:38:06.7613964Z ok (0.349s) 2023-01-11T21:38:06.7614426Z test_unsqueeze_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7614762Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7615033Z [2023-01-11 21:36:06,618] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 925 2023-01-11T21:38:06.7615324Z [2023-01-11 21:36:06,799] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 925 2023-01-11T21:38:06.7615768Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7615903Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7616151Z [2023-01-11 21:36:06,831] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 926 2023-01-11T21:38:06.7616465Z [2023-01-11 21:36:06,920] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 926 2023-01-11T21:38:06.7616471Z 2023-01-11T21:38:06.7616571Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7616645Z import torch 2023-01-11T21:38:06.7616719Z import random 2023-01-11T21:38:06.7616840Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7616963Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7616968Z 2023-01-11T21:38:06.7617050Z aten = torch.ops.aten 2023-01-11T21:38:06.7617231Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7617344Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7617350Z 2023-01-11T21:38:06.7617436Z import triton 2023-01-11T21:38:06.7617552Z import triton.language as tl 2023-01-11T21:38:06.7617679Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7617819Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7617825Z 2023-01-11T21:38:06.7617829Z 2023-01-11T21:38:06.7618024Z triton_fused_add_1_add_2_add_4_add_5_0 = async_compile.triton(''' 2023-01-11T21:38:06.7618099Z import triton 2023-01-11T21:38:06.7618184Z import triton.language as tl 2023-01-11T21:38:06.7618299Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7618400Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7618532Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7618657Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7618663Z 2023-01-11T21:38:06.7619115Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.7619234Z @triton.jit 2023-01-11T21:38:06.7619412Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7619483Z xnumel = 16 2023-01-11T21:38:06.7619591Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7619733Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 
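    # Editor's note: standard Inductor pointwise prologue -- each program
    # instance covers XBLOCK consecutive flat indices starting at
    # program_id(0) * XBLOCK; the xmask computed on the next line guards the
    # tail when xnumel (16 here) is not a multiple of XBLOCK.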
2023-01-11T21:38:06.7619821Z xmask = xindex < xnumel 2023-01-11T21:38:06.7619894Z x0 = xindex 2023-01-11T21:38:06.7620114Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7620218Z tmp6 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7620285Z tmp1 = 1 2023-01-11T21:38:06.7620369Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7620444Z tmp3 = 2 2023-01-11T21:38:06.7620526Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.7620607Z tmp5 = tmp0 + tmp3 2023-01-11T21:38:06.7620693Z tmp7 = tmp6 + tmp3 2023-01-11T21:38:06.7620841Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7620981Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7621123Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7621270Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.7621361Z ''') 2023-01-11T21:38:06.7621366Z 2023-01-11T21:38:06.7621371Z 2023-01-11T21:38:06.7621473Z async_compile.wait(globals()) 2023-01-11T21:38:06.7621553Z del async_compile 2023-01-11T21:38:06.7621558Z 2023-01-11T21:38:06.7621638Z def call(args): 2023-01-11T21:38:06.7621715Z arg0_1, = args 2023-01-11T21:38:06.7621787Z args.clear() 2023-01-11T21:38:06.7621885Z with torch.cuda.device(0): 2023-01-11T21:38:06.7622139Z buf0 = empty_strided((2, 2, 2, 2, 1), (8, 4, 2, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7622357Z buf1 = empty_strided((2, 2, 1, 2, 2), (8, 4, 4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7622577Z buf2 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7622817Z buf3 = empty_strided((2, 2, 2, 1, 2), (8, 4, 2, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7622911Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7623071Z triton_fused_add_1_add_2_add_4_add_5_0.run(arg0_1, buf0, buf1, buf2, buf3, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.7623146Z del arg0_1 2023-01-11T21:38:06.7623240Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.7623246Z 2023-01-11T21:38:06.7623250Z 2023-01-11T21:38:06.7623330Z if __name__ == "__main__": 2023-01-11T21:38:06.7623449Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7623575Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7623786Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7623905Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7623910Z 2023-01-11T21:38:06.7623914Z 2023-01-11T21:38:06.7624010Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7624077Z import torch 2023-01-11T21:38:06.7624156Z import random 2023-01-11T21:38:06.7624274Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7624400Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7624405Z 2023-01-11T21:38:06.7624488Z aten = torch.ops.aten 2023-01-11T21:38:06.7624623Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7624724Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7624729Z 2023-01-11T21:38:06.7624795Z import triton 2023-01-11T21:38:06.7624888Z import triton.language as tl 2023-01-11T21:38:06.7625012Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7625151Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7625183Z 2023-01-11T21:38:06.7625188Z 
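# Editor's note: the fp32 kernel above and the fp16 clone that follows fuse four
# unsqueeze-plus-add results into one pointwise pass; the unsqueeze itself costs
# nothing at runtime and only determines the 5-D shapes/strides that call()
# allocates with empty_strided. A hedged eager reference, reconstructed from the
# buffer shapes and the tmp4/tmp5/tmp7 dataflow (not the test source):
def unsqueeze_ref(x):
    # x: (2, 2, 2, 2) CUDA tensor
    buf0 = (x + 1 + 2).unsqueeze(4)  # tmp4 -> out_ptr0, (2, 2, 2, 2, 1)
    buf1 = (x + 2).unsqueeze(2)      # tmp5 -> out_ptr1, (2, 2, 1, 2, 2)
    buf2 = (x + 1 + 2).unsqueeze(0)  # tmp4 -> out_ptr2, (1, 2, 2, 2, 2)
    buf3 = (x + 2).unsqueeze(3)      # tmp7 -> out_ptr3, (2, 2, 2, 1, 2)
    return buf0, buf1, buf2, buf3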
2023-01-11T21:38:06.7625380Z triton_fused_add_1_add_2_add_4_add_5_0 = async_compile.triton(''' 2023-01-11T21:38:06.7625457Z import triton 2023-01-11T21:38:06.7625557Z import triton.language as tl 2023-01-11T21:38:06.7625694Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7625812Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7625951Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7626079Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7626084Z 2023-01-11T21:38:06.7626537Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.7626614Z @triton.jit 2023-01-11T21:38:06.7626783Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7626859Z xnumel = 16 2023-01-11T21:38:06.7626958Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7627083Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7627169Z xmask = xindex < xnumel 2023-01-11T21:38:06.7627244Z x0 = xindex 2023-01-11T21:38:06.7627461Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7627585Z tmp6 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7627659Z tmp1 = 1 2023-01-11T21:38:06.7627740Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7627806Z tmp3 = 2 2023-01-11T21:38:06.7627888Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.7627968Z tmp5 = tmp0 + tmp3 2023-01-11T21:38:06.7628050Z tmp7 = tmp6 + tmp3 2023-01-11T21:38:06.7628188Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7628325Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7628465Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7628590Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.7628679Z ''') 2023-01-11T21:38:06.7628684Z 2023-01-11T21:38:06.7628716Z 2023-01-11T21:38:06.7628813Z async_compile.wait(globals()) 2023-01-11T21:38:06.7628891Z del async_compile 2023-01-11T21:38:06.7628896Z 2023-01-11T21:38:06.7628969Z def call(args): 2023-01-11T21:38:06.7629042Z arg0_1, = args 2023-01-11T21:38:06.7629117Z args.clear() 2023-01-11T21:38:06.7629211Z with torch.cuda.device(0): 2023-01-11T21:38:06.7629424Z buf0 = empty_strided((2, 2, 2, 2, 1), (8, 4, 2, 1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7629636Z buf1 = empty_strided((2, 2, 1, 2, 2), (8, 4, 4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7629853Z buf2 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7630067Z buf3 = empty_strided((2, 2, 2, 1, 2), (8, 4, 2, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7630159Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7630328Z triton_fused_add_1_add_2_add_4_add_5_0.run(arg0_1, buf0, buf1, buf2, buf3, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.7630406Z del arg0_1 2023-01-11T21:38:06.7630502Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.7630507Z 2023-01-11T21:38:06.7630511Z 2023-01-11T21:38:06.7630584Z if __name__ == "__main__": 2023-01-11T21:38:06.7630703Z from torch._dynamo.testing import rand_strided 
2023-01-11T21:38:06.7630829Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7631042Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7631157Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7631187Z 2023-01-11T21:38:06.7631260Z ok (0.335s) 2023-01-11T21:38:06.7631729Z test_unsqueeze_inplace_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7631861Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7632121Z [2023-01-11 21:36:06,952] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 927 2023-01-11T21:38:06.7632385Z [2023-01-11 21:36:07,024] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 927 2023-01-11T21:38:06.7632791Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7632927Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7633185Z [2023-01-11 21:36:07,055] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 928 2023-01-11T21:38:06.7633448Z [2023-01-11 21:36:07,126] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 928 2023-01-11T21:38:06.7633454Z 2023-01-11T21:38:06.7633551Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7633625Z import torch 2023-01-11T21:38:06.7633702Z import random 2023-01-11T21:38:06.7633821Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7633947Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7633952Z 2023-01-11T21:38:06.7634032Z aten = torch.ops.aten 2023-01-11T21:38:06.7634169Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7634267Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7634272Z 2023-01-11T21:38:06.7634346Z import triton 2023-01-11T21:38:06.7634439Z import triton.language as tl 2023-01-11T21:38:06.7634595Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7634736Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7634741Z 2023-01-11T21:38:06.7634746Z 2023-01-11T21:38:06.7634924Z triton_fused_add_2_unsqueeze__0 = async_compile.triton(''' 2023-01-11T21:38:06.7634992Z import triton 2023-01-11T21:38:06.7635085Z import triton.language as tl 2023-01-11T21:38:06.7635198Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7635300Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7635432Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7635558Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7635565Z 2023-01-11T21:38:06.7635988Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 
'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7636065Z @triton.jit 2023-01-11T21:38:06.7636201Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7636276Z xnumel = 16 2023-01-11T21:38:06.7636376Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7636507Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7636591Z xmask = xindex < xnumel 2023-01-11T21:38:06.7636662Z x0 = xindex 2023-01-11T21:38:06.7636853Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7636943Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7637014Z tmp1 = 1 2023-01-11T21:38:06.7637131Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7637212Z tmp4 = tmp3 + tmp1 2023-01-11T21:38:06.7637282Z tmp5 = 2 2023-01-11T21:38:06.7637359Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.7637492Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.7637619Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7637704Z ''') 2023-01-11T21:38:06.7637710Z 2023-01-11T21:38:06.7637715Z 2023-01-11T21:38:06.7637809Z async_compile.wait(globals()) 2023-01-11T21:38:06.7637886Z del async_compile 2023-01-11T21:38:06.7637891Z 2023-01-11T21:38:06.7637965Z def call(args): 2023-01-11T21:38:06.7638037Z arg0_1, = args 2023-01-11T21:38:06.7638111Z args.clear() 2023-01-11T21:38:06.7638196Z with torch.cuda.device(0): 2023-01-11T21:38:06.7638417Z buf0 = empty_strided((2, 2, 1, 2, 2), (8, 4, 4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7638637Z buf1 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7638731Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7638883Z triton_fused_add_2_unsqueeze__0.run(arg0_1, buf0, buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.7638956Z del arg0_1 2023-01-11T21:38:06.7639038Z return (buf0, buf1, ) 2023-01-11T21:38:06.7639046Z 2023-01-11T21:38:06.7639051Z 2023-01-11T21:38:06.7639129Z if __name__ == "__main__": 2023-01-11T21:38:06.7639239Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7639368Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7639580Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7639692Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7639697Z 2023-01-11T21:38:06.7639702Z 2023-01-11T21:38:06.7639798Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7639871Z import torch 2023-01-11T21:38:06.7639947Z import random 2023-01-11T21:38:06.7640066Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7640183Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7640188Z 2023-01-11T21:38:06.7640270Z aten = torch.ops.aten 2023-01-11T21:38:06.7640406Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7640531Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7640537Z 2023-01-11T21:38:06.7640614Z import triton 2023-01-11T21:38:06.7640706Z import triton.language as tl 2023-01-11T21:38:06.7640831Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7640972Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7640978Z 2023-01-11T21:38:06.7640982Z 2023-01-11T21:38:06.7641153Z triton_fused_add_2_unsqueeze__0 = 
async_compile.triton(''' 2023-01-11T21:38:06.7641228Z import triton 2023-01-11T21:38:06.7641320Z import triton.language as tl 2023-01-11T21:38:06.7641434Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7641538Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7641670Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7641795Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7641800Z 2023-01-11T21:38:06.7642222Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7642289Z @triton.jit 2023-01-11T21:38:06.7642432Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7642506Z xnumel = 16 2023-01-11T21:38:06.7642603Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7642732Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7642815Z xmask = xindex < xnumel 2023-01-11T21:38:06.7642884Z x0 = xindex 2023-01-11T21:38:06.7643118Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7643237Z tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7643310Z tmp1 = 1 2023-01-11T21:38:06.7643389Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7643467Z tmp4 = tmp3 + tmp1 2023-01-11T21:38:06.7643539Z tmp5 = 2 2023-01-11T21:38:06.7643615Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.7643743Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.7643876Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7643961Z ''') 2023-01-11T21:38:06.7643967Z 2023-01-11T21:38:06.7643972Z 2023-01-11T21:38:06.7644066Z async_compile.wait(globals()) 2023-01-11T21:38:06.7644142Z del async_compile 2023-01-11T21:38:06.7644147Z 2023-01-11T21:38:06.7644221Z def call(args): 2023-01-11T21:38:06.7644297Z arg0_1, = args 2023-01-11T21:38:06.7644371Z args.clear() 2023-01-11T21:38:06.7644459Z with torch.cuda.device(0): 2023-01-11T21:38:06.7644681Z buf0 = empty_strided((2, 2, 1, 2, 2), (8, 4, 4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7644896Z buf1 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7644990Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7645143Z triton_fused_add_2_unsqueeze__0.run(arg0_1, buf0, buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.7645215Z del arg0_1 2023-01-11T21:38:06.7645296Z return (buf0, buf1, ) 2023-01-11T21:38:06.7645301Z 2023-01-11T21:38:06.7645306Z 2023-01-11T21:38:06.7645384Z if __name__ == "__main__": 2023-01-11T21:38:06.7645495Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7645621Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7645833Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7645949Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7645954Z 2023-01-11T21:38:06.7646025Z ok (0.206s) 2023-01-11T21:38:06.7646518Z test_upsample_bicubic2d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. 
This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7646650Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7646907Z [2023-01-11 21:36:08,705] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 929 2023-01-11T21:38:06.7646913Z 2023-01-11T21:38:06.7648395Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7648462Z import torch 2023-01-11T21:38:06.7648539Z import random 2023-01-11T21:38:06.7648660Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7648784Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7648789Z 2023-01-11T21:38:06.7648872Z aten = torch.ops.aten 2023-01-11T21:38:06.7649008Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7649102Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7649110Z 2023-01-11T21:38:06.7649185Z import triton 2023-01-11T21:38:06.7649270Z import triton.language as tl 2023-01-11T21:38:06.7649394Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7649532Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7649538Z 2023-01-11T21:38:06.7649542Z 2023-01-11T21:38:06.7649728Z triton_fused_upsample_bicubic2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.7649804Z import triton 2023-01-11T21:38:06.7649896Z import triton.language as tl 2023-01-11T21:38:06.7650009Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7650130Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7650263Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7650390Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7650395Z 2023-01-11T21:38:06.7650810Z @pointwise(size_hints=[262144], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7650884Z @triton.jit 2023-01-11T21:38:06.7651017Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7651092Z xnumel = 196608 2023-01-11T21:38:06.7651192Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7651313Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7651396Z xmask = xindex < xnumel 2023-01-11T21:38:06.7651472Z x0 = xindex % 128 2023-01-11T21:38:06.7651556Z x1 = (xindex // 128) % 128 2023-01-11T21:38:06.7651641Z x2 = (xindex // 16384) 2023-01-11T21:38:06.7651711Z x4 = xindex 2023-01-11T21:38:06.7651784Z tmp0 = x0 2023-01-11T21:38:06.7651863Z tmp1 = 0.2440944881889764 * tmp0 2023-01-11T21:38:06.7651960Z tmp2 = tl.libdevice.floor(tmp1) 2023-01-11T21:38:06.7652071Z tmp3 = tmp1 - tmp2 2023-01-11T21:38:06.7652145Z tmp4 = x1 2023-01-11T21:38:06.7652230Z tmp5 = 0.49606299212598426 * tmp4 2023-01-11T21:38:06.7652330Z tmp6 = tl.libdevice.floor(tmp5) 2023-01-11T21:38:06.7652438Z tmp7 = tmp5 - tmp6 2023-01-11T21:38:06.7652517Z tmp8 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7652601Z tmp9 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7652695Z tmp10 = tmp8 + -1 2023-01-11T21:38:06.7652771Z tmp11 = tmp8 + 0 2023-01-11T21:38:06.7652845Z tmp12 = tmp8 + 1 2023-01-11T21:38:06.7652920Z tmp13 = tmp8 + 2 2023-01-11T21:38:06.7653005Z tmp14 = tmp9 + -1 2023-01-11T21:38:06.7653086Z tmp15 = tmp9 + 0 
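    # Editor's note: bicubic sampling reads a 4x4 neighborhood. tmp10-tmp13
    # and tmp14-tmp17 (continued below) are the four taps at offsets -1, 0,
    # +1, +2 around the integer source row (tmp8) and column (tmp9); the
    # tl.where chains that follow clamp them to the valid input extent
    # (rows 0..63, columns 0..31 for the (4, 3, 64, 32) input).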
2023-01-11T21:38:06.7653158Z tmp16 = tmp9 + 1 2023-01-11T21:38:06.7653236Z tmp17 = tmp9 + 2 2023-01-11T21:38:06.7653361Z tmp18 = tl.where(63 != 63, 63, tl.where(63 < tmp10, 63, tmp10)) 2023-01-11T21:38:06.7653479Z tmp19 = tl.where(0 != 0, 0, tl.where(0 > tmp18, 0, tmp18)) 2023-01-11T21:38:06.7653599Z tmp20 = tl.where(31 != 31, 31, tl.where(31 < tmp14, 31, tmp14)) 2023-01-11T21:38:06.7653738Z tmp21 = tl.where(0 != 0, 0, tl.where(0 > tmp20, 0, tmp20)) 2023-01-11T21:38:06.7653971Z tmp22 = tl.load(in_ptr0 + (tmp21 + (32*tmp19) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7654091Z tmp23 = tl.where(31 != 31, 31, tl.where(31 < tmp15, 31, tmp15)) 2023-01-11T21:38:06.7654205Z tmp24 = tl.where(0 != 0, 0, tl.where(0 > tmp23, 0, tmp23)) 2023-01-11T21:38:06.7654433Z tmp25 = tl.load(in_ptr0 + (tmp24 + (32*tmp19) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7654753Z tmp26 = tl.where(31 != 31, 31, tl.where(31 < tmp16, 31, tmp16)) 2023-01-11T21:38:06.7654872Z tmp27 = tl.where(0 != 0, 0, tl.where(0 > tmp26, 0, tmp26)) 2023-01-11T21:38:06.7655106Z tmp28 = tl.load(in_ptr0 + (tmp27 + (32*tmp19) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7655225Z tmp29 = tl.where(31 != 31, 31, tl.where(31 < tmp17, 31, tmp17)) 2023-01-11T21:38:06.7655331Z tmp30 = tl.where(0 != 0, 0, tl.where(0 > tmp29, 0, tmp29)) 2023-01-11T21:38:06.7655606Z tmp31 = tl.load(in_ptr0 + (tmp30 + (32*tmp19) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7655692Z tmp32 = tmp3 + 1.0 2023-01-11T21:38:06.7655806Z tmp33 = -0.75 * tmp32 2023-01-11T21:38:06.7655921Z tmp34 = tmp33 - -3.75 2023-01-11T21:38:06.7656001Z tmp35 = tmp34 * tmp32 2023-01-11T21:38:06.7656112Z tmp36 = tmp35 + -6.0 2023-01-11T21:38:06.7656185Z tmp37 = tmp36 * tmp32 2023-01-11T21:38:06.7656295Z tmp38 = tmp37 - -3.0 2023-01-11T21:38:06.7656373Z tmp39 = 1.25 * tmp3 2023-01-11T21:38:06.7656482Z tmp40 = tmp39 - 2.25 2023-01-11T21:38:06.7656560Z tmp41 = tmp40 * tmp3 2023-01-11T21:38:06.7656690Z tmp42 = tmp41 * tmp3 2023-01-11T21:38:06.7656764Z tmp43 = tmp42 + 1.0 2023-01-11T21:38:06.7656874Z tmp44 = 1.0 - tmp3 2023-01-11T21:38:06.7656953Z tmp45 = 1.25 * tmp44 2023-01-11T21:38:06.7657063Z tmp46 = tmp45 - 2.25 2023-01-11T21:38:06.7657195Z tmp47 = tmp46 * tmp44 2023-01-11T21:38:06.7657289Z tmp48 = tmp47 * tmp44 2023-01-11T21:38:06.7657371Z tmp49 = tmp48 + 1.0 2023-01-11T21:38:06.7657445Z tmp50 = tmp44 + 1.0 2023-01-11T21:38:06.7657557Z tmp51 = -0.75 * tmp50 2023-01-11T21:38:06.7657671Z tmp52 = tmp51 - -3.75 2023-01-11T21:38:06.7657752Z tmp53 = tmp52 * tmp50 2023-01-11T21:38:06.7657867Z tmp54 = tmp53 + -6.0 2023-01-11T21:38:06.7657949Z tmp55 = tmp54 * tmp50 2023-01-11T21:38:06.7658063Z tmp56 = tmp55 - -3.0 2023-01-11T21:38:06.7658139Z tmp57 = tmp22 * tmp38 2023-01-11T21:38:06.7658220Z tmp58 = tmp25 * tmp43 2023-01-11T21:38:06.7658301Z tmp59 = tmp28 * tmp49 2023-01-11T21:38:06.7658382Z tmp60 = tmp31 * tmp56 2023-01-11T21:38:06.7658466Z tmp61 = tmp59 + tmp60 2023-01-11T21:38:06.7658547Z tmp62 = tmp58 + tmp61 2023-01-11T21:38:06.7658620Z tmp63 = tmp57 + tmp62 2023-01-11T21:38:06.7658740Z tmp64 = tl.where(63 != 63, 63, tl.where(63 < tmp11, 63, tmp11)) 2023-01-11T21:38:06.7658862Z tmp65 = tl.where(0 != 0, 0, tl.where(0 > tmp64, 0, tmp64)) 2023-01-11T21:38:06.7659092Z tmp66 = tl.load(in_ptr0 + (tmp21 + (32*tmp65) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7659316Z tmp67 = tl.load(in_ptr0 + (tmp24 + (32*tmp65) + (2048*x2)), xmask, eviction_policy='evict_last') 
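    # Editor's note: tmp32-tmp56 above are the Keys cubic-convolution weights
    # with a = -0.75, evaluated in Horner form:
    #   outer taps, 1 < t <= 2:   w(t) = ((-0.75*t + 3.75)*t - 6.0)*t + 3.0
    #   inner taps, 0 <= t <= 1:  w(t) = ((1.25*t - 2.25)*t)*t + 1.0
    # tmp57-tmp63 blend the four column taps of the first row; tmp64 onward
    # repeat the same pattern for the remaining three rows.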
2023-01-11T21:38:06.7659537Z tmp68 = tl.load(in_ptr0 + (tmp27 + (32*tmp65) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7659758Z tmp69 = tl.load(in_ptr0 + (tmp30 + (32*tmp65) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7659842Z tmp70 = tmp66 * tmp38 2023-01-11T21:38:06.7659917Z tmp71 = tmp67 * tmp43 2023-01-11T21:38:06.7660001Z tmp72 = tmp68 * tmp49 2023-01-11T21:38:06.7660084Z tmp73 = tmp69 * tmp56 2023-01-11T21:38:06.7660165Z tmp74 = tmp72 + tmp73 2023-01-11T21:38:06.7660245Z tmp75 = tmp71 + tmp74 2023-01-11T21:38:06.7660325Z tmp76 = tmp70 + tmp75 2023-01-11T21:38:06.7660444Z tmp77 = tl.where(63 != 63, 63, tl.where(63 < tmp12, 63, tmp12)) 2023-01-11T21:38:06.7660597Z tmp78 = tl.where(0 != 0, 0, tl.where(0 > tmp77, 0, tmp77)) 2023-01-11T21:38:06.7660827Z tmp79 = tl.load(in_ptr0 + (tmp21 + (32*tmp78) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7661051Z tmp80 = tl.load(in_ptr0 + (tmp24 + (32*tmp78) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7661272Z tmp81 = tl.load(in_ptr0 + (tmp27 + (32*tmp78) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7661492Z tmp82 = tl.load(in_ptr0 + (tmp30 + (32*tmp78) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7661573Z tmp83 = tmp79 * tmp38 2023-01-11T21:38:06.7661657Z tmp84 = tmp80 * tmp43 2023-01-11T21:38:06.7661737Z tmp85 = tmp81 * tmp49 2023-01-11T21:38:06.7661810Z tmp86 = tmp82 * tmp56 2023-01-11T21:38:06.7661889Z tmp87 = tmp85 + tmp86 2023-01-11T21:38:06.7661967Z tmp88 = tmp84 + tmp87 2023-01-11T21:38:06.7662046Z tmp89 = tmp83 + tmp88 2023-01-11T21:38:06.7662168Z tmp90 = tl.where(63 != 63, 63, tl.where(63 < tmp13, 63, tmp13)) 2023-01-11T21:38:06.7662286Z tmp91 = tl.where(0 != 0, 0, tl.where(0 > tmp90, 0, tmp90)) 2023-01-11T21:38:06.7662511Z tmp92 = tl.load(in_ptr0 + (tmp21 + (32*tmp91) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7662722Z tmp93 = tl.load(in_ptr0 + (tmp24 + (32*tmp91) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7662940Z tmp94 = tl.load(in_ptr0 + (tmp27 + (32*tmp91) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7663158Z tmp95 = tl.load(in_ptr0 + (tmp30 + (32*tmp91) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7663272Z tmp96 = tmp92 * tmp38 2023-01-11T21:38:06.7663355Z tmp97 = tmp93 * tmp43 2023-01-11T21:38:06.7663437Z tmp98 = tmp94 * tmp49 2023-01-11T21:38:06.7663518Z tmp99 = tmp95 * tmp56 2023-01-11T21:38:06.7663601Z tmp100 = tmp98 + tmp99 2023-01-11T21:38:06.7663676Z tmp101 = tmp97 + tmp100 2023-01-11T21:38:06.7663759Z tmp102 = tmp96 + tmp101 2023-01-11T21:38:06.7663839Z tmp103 = tmp7 + 1.0 2023-01-11T21:38:06.7663956Z tmp104 = -0.75 * tmp103 2023-01-11T21:38:06.7664072Z tmp105 = tmp104 - -3.75 2023-01-11T21:38:06.7664154Z tmp106 = tmp105 * tmp103 2023-01-11T21:38:06.7664262Z tmp107 = tmp106 + -6.0 2023-01-11T21:38:06.7664346Z tmp108 = tmp107 * tmp103 2023-01-11T21:38:06.7664462Z tmp109 = tmp108 - -3.0 2023-01-11T21:38:06.7664540Z tmp110 = 1.25 * tmp7 2023-01-11T21:38:06.7664654Z tmp111 = tmp110 - 2.25 2023-01-11T21:38:06.7664737Z tmp112 = tmp111 * tmp7 2023-01-11T21:38:06.7664820Z tmp113 = tmp112 * tmp7 2023-01-11T21:38:06.7664899Z tmp114 = tmp113 + 1.0 2023-01-11T21:38:06.7665010Z tmp115 = 1.0 - tmp7 2023-01-11T21:38:06.7665089Z tmp116 = 1.25 * tmp115 2023-01-11T21:38:06.7665202Z tmp117 = tmp116 - 2.25 2023-01-11T21:38:06.7665286Z tmp118 = tmp117 * tmp115 
2023-01-11T21:38:06.7665367Z tmp119 = tmp118 * tmp115 2023-01-11T21:38:06.7665450Z tmp120 = tmp119 + 1.0 2023-01-11T21:38:06.7665523Z tmp121 = tmp115 + 1.0 2023-01-11T21:38:06.7665638Z tmp122 = -0.75 * tmp121 2023-01-11T21:38:06.7665754Z tmp123 = tmp122 - -3.75 2023-01-11T21:38:06.7665836Z tmp124 = tmp123 * tmp121 2023-01-11T21:38:06.7665950Z tmp125 = tmp124 + -6.0 2023-01-11T21:38:06.7666032Z tmp126 = tmp125 * tmp121 2023-01-11T21:38:06.7666138Z tmp127 = tmp126 - -3.0 2023-01-11T21:38:06.7666219Z tmp128 = tmp63 * tmp109 2023-01-11T21:38:06.7666300Z tmp129 = tmp76 * tmp114 2023-01-11T21:38:06.7666381Z tmp130 = tmp89 * tmp120 2023-01-11T21:38:06.7666461Z tmp131 = tmp102 * tmp127 2023-01-11T21:38:06.7666548Z tmp132 = tmp130 + tmp131 2023-01-11T21:38:06.7666631Z tmp133 = tmp129 + tmp132 2023-01-11T21:38:06.7666705Z tmp134 = tmp128 + tmp133 2023-01-11T21:38:06.7666845Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp134, xmask) 2023-01-11T21:38:06.7666931Z ''') 2023-01-11T21:38:06.7666938Z 2023-01-11T21:38:06.7666943Z 2023-01-11T21:38:06.7667161Z triton_fused_upsample_bicubic2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7667239Z import triton 2023-01-11T21:38:06.7667332Z import triton.language as tl 2023-01-11T21:38:06.7667449Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7667552Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7667681Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7667808Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7667813Z 2023-01-11T21:38:06.7668219Z @pointwise(size_hints=[524288], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7668297Z @triton.jit 2023-01-11T21:38:06.7668431Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7668506Z xnumel = 393216 2023-01-11T21:38:06.7668606Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7668737Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7668814Z xmask = xindex < xnumel 2023-01-11T21:38:06.7668893Z x0 = xindex % 256 2023-01-11T21:38:06.7668977Z x1 = (xindex // 256) % 128 2023-01-11T21:38:06.7669057Z x2 = (xindex // 32768) 2023-01-11T21:38:06.7669128Z x4 = xindex 2023-01-11T21:38:06.7669200Z tmp0 = x0 2023-01-11T21:38:06.7669272Z tmp1 = tmp0 + 0.5 2023-01-11T21:38:06.7669351Z tmp2 = 0.125 * tmp1 2023-01-11T21:38:06.7669461Z tmp3 = tmp2 - 0.5 2023-01-11T21:38:06.7669563Z tmp4 = tl.libdevice.floor(tmp3) 2023-01-11T21:38:06.7669704Z tmp5 = tmp3 - tmp4 2023-01-11T21:38:06.7669779Z tmp6 = x1 2023-01-11T21:38:06.7669856Z tmp7 = tmp6 + 0.5 2023-01-11T21:38:06.7669926Z tmp8 = 0.5 * tmp7 2023-01-11T21:38:06.7670034Z tmp9 = tmp8 - 0.5 2023-01-11T21:38:06.7670137Z tmp10 = tl.libdevice.floor(tmp9) 2023-01-11T21:38:06.7670252Z tmp11 = tmp9 - tmp10 2023-01-11T21:38:06.7670345Z tmp12 = tmp10.to(tl.int32) 2023-01-11T21:38:06.7670434Z tmp13 = tmp4.to(tl.int32) 2023-01-11T21:38:06.7670529Z tmp14 = tmp12 + -1 2023-01-11T21:38:06.7670601Z tmp15 = tmp12 + 0 2023-01-11T21:38:06.7670679Z tmp16 = tmp12 + 1 2023-01-11T21:38:06.7670761Z tmp17 = tmp12 + 2 2023-01-11T21:38:06.7670857Z tmp18 = tmp13 + -1 2023-01-11T21:38:06.7670935Z tmp19 = tmp13 + 0 2023-01-11T21:38:06.7671012Z tmp20 = tmp13 + 1 2023-01-11T21:38:06.7671082Z tmp21 = tmp13 + 2 2023-01-11T21:38:06.7671204Z tmp22 = tl.where(63 != 
63, 63, tl.where(63 < tmp14, 63, tmp14)) 2023-01-11T21:38:06.7671321Z tmp23 = tl.where(0 != 0, 0, tl.where(0 > tmp22, 0, tmp22)) 2023-01-11T21:38:06.7671445Z tmp24 = tl.where(31 != 31, 31, tl.where(31 < tmp18, 31, tmp18)) 2023-01-11T21:38:06.7671561Z tmp25 = tl.where(0 != 0, 0, tl.where(0 > tmp24, 0, tmp24)) 2023-01-11T21:38:06.7671687Z tmp26 = tl.load(in_ptr0 + (tmp25 + (32*tmp23) + (2048*x2)), xmask) 2023-01-11T21:38:06.7671812Z tmp27 = tl.where(31 != 31, 31, tl.where(31 < tmp19, 31, tmp19)) 2023-01-11T21:38:06.7671926Z tmp28 = tl.where(0 != 0, 0, tl.where(0 > tmp27, 0, tmp27)) 2023-01-11T21:38:06.7672040Z tmp29 = tl.load(in_ptr0 + (tmp28 + (32*tmp23) + (2048*x2)), xmask) 2023-01-11T21:38:06.7672157Z tmp30 = tl.where(31 != 31, 31, tl.where(31 < tmp20, 31, tmp20)) 2023-01-11T21:38:06.7672271Z tmp31 = tl.where(0 != 0, 0, tl.where(0 > tmp30, 0, tmp30)) 2023-01-11T21:38:06.7672390Z tmp32 = tl.load(in_ptr0 + (tmp31 + (32*tmp23) + (2048*x2)), xmask) 2023-01-11T21:38:06.7672508Z tmp33 = tl.where(31 != 31, 31, tl.where(31 < tmp21, 31, tmp21)) 2023-01-11T21:38:06.7672624Z tmp34 = tl.where(0 != 0, 0, tl.where(0 > tmp33, 0, tmp33)) 2023-01-11T21:38:06.7672742Z tmp35 = tl.load(in_ptr0 + (tmp34 + (32*tmp23) + (2048*x2)), xmask) 2023-01-11T21:38:06.7672823Z tmp36 = tmp5 + 1.0 2023-01-11T21:38:06.7672931Z tmp37 = -0.75 * tmp36 2023-01-11T21:38:06.7673045Z tmp38 = tmp37 - -3.75 2023-01-11T21:38:06.7673155Z tmp39 = tmp38 * tmp36 2023-01-11T21:38:06.7673270Z tmp40 = tmp39 + -6.0 2023-01-11T21:38:06.7673352Z tmp41 = tmp40 * tmp36 2023-01-11T21:38:06.7673463Z tmp42 = tmp41 - -3.0 2023-01-11T21:38:06.7673535Z tmp43 = 1.25 * tmp5 2023-01-11T21:38:06.7673644Z tmp44 = tmp43 - 2.25 2023-01-11T21:38:06.7673724Z tmp45 = tmp44 * tmp5 2023-01-11T21:38:06.7673803Z tmp46 = tmp45 * tmp5 2023-01-11T21:38:06.7673881Z tmp47 = tmp46 + 1.0 2023-01-11T21:38:06.7673987Z tmp48 = 1.0 - tmp5 2023-01-11T21:38:06.7674064Z tmp49 = 1.25 * tmp48 2023-01-11T21:38:06.7674167Z tmp50 = tmp49 - 2.25 2023-01-11T21:38:06.7674250Z tmp51 = tmp50 * tmp48 2023-01-11T21:38:06.7674334Z tmp52 = tmp51 * tmp48 2023-01-11T21:38:06.7674412Z tmp53 = tmp52 + 1.0 2023-01-11T21:38:06.7674489Z tmp54 = tmp48 + 1.0 2023-01-11T21:38:06.7674598Z tmp55 = -0.75 * tmp54 2023-01-11T21:38:06.7674708Z tmp56 = tmp55 - -3.75 2023-01-11T21:38:06.7674781Z tmp57 = tmp56 * tmp54 2023-01-11T21:38:06.7674893Z tmp58 = tmp57 + -6.0 2023-01-11T21:38:06.7674974Z tmp59 = tmp58 * tmp54 2023-01-11T21:38:06.7675083Z tmp60 = tmp59 - -3.0 2023-01-11T21:38:06.7675163Z tmp61 = tmp26 * tmp42 2023-01-11T21:38:06.7675243Z tmp62 = tmp29 * tmp47 2023-01-11T21:38:06.7675315Z tmp63 = tmp32 * tmp53 2023-01-11T21:38:06.7675394Z tmp64 = tmp35 * tmp60 2023-01-11T21:38:06.7675473Z tmp65 = tmp63 + tmp64 2023-01-11T21:38:06.7675551Z tmp66 = tmp62 + tmp65 2023-01-11T21:38:06.7675629Z tmp67 = tmp61 + tmp66 2023-01-11T21:38:06.7675745Z tmp68 = tl.where(63 != 63, 63, tl.where(63 < tmp15, 63, tmp15)) 2023-01-11T21:38:06.7675892Z tmp69 = tl.where(0 != 0, 0, tl.where(0 > tmp68, 0, tmp68)) 2023-01-11T21:38:06.7676003Z tmp70 = tl.load(in_ptr0 + (tmp25 + (32*tmp69) + (2048*x2)), xmask) 2023-01-11T21:38:06.7676125Z tmp71 = tl.load(in_ptr0 + (tmp28 + (32*tmp69) + (2048*x2)), xmask) 2023-01-11T21:38:06.7676245Z tmp72 = tl.load(in_ptr0 + (tmp31 + (32*tmp69) + (2048*x2)), xmask) 2023-01-11T21:38:06.7676368Z tmp73 = tl.load(in_ptr0 + (tmp34 + (32*tmp69) + (2048*x2)), xmask) 2023-01-11T21:38:06.7676450Z tmp74 = tmp70 * tmp42 2023-01-11T21:38:06.7676531Z tmp75 = tmp71 * tmp47 2023-01-11T21:38:06.7676611Z tmp76 = 
tmp72 * tmp53 2023-01-11T21:38:06.7676691Z tmp77 = tmp73 * tmp60 2023-01-11T21:38:06.7676762Z tmp78 = tmp76 + tmp77 2023-01-11T21:38:06.7676839Z tmp79 = tmp75 + tmp78 2023-01-11T21:38:06.7676917Z tmp80 = tmp74 + tmp79 2023-01-11T21:38:06.7677037Z tmp81 = tl.where(63 != 63, 63, tl.where(63 < tmp16, 63, tmp16)) 2023-01-11T21:38:06.7677153Z tmp82 = tl.where(0 != 0, 0, tl.where(0 > tmp81, 0, tmp81)) 2023-01-11T21:38:06.7677277Z tmp83 = tl.load(in_ptr0 + (tmp25 + (32*tmp82) + (2048*x2)), xmask) 2023-01-11T21:38:06.7677396Z tmp84 = tl.load(in_ptr0 + (tmp28 + (32*tmp82) + (2048*x2)), xmask) 2023-01-11T21:38:06.7677505Z tmp85 = tl.load(in_ptr0 + (tmp31 + (32*tmp82) + (2048*x2)), xmask) 2023-01-11T21:38:06.7677629Z tmp86 = tl.load(in_ptr0 + (tmp34 + (32*tmp82) + (2048*x2)), xmask) 2023-01-11T21:38:06.7677709Z tmp87 = tmp83 * tmp42 2023-01-11T21:38:06.7677787Z tmp88 = tmp84 * tmp47 2023-01-11T21:38:06.7677865Z tmp89 = tmp85 * tmp53 2023-01-11T21:38:06.7677946Z tmp90 = tmp86 * tmp60 2023-01-11T21:38:06.7678024Z tmp91 = tmp89 + tmp90 2023-01-11T21:38:06.7678095Z tmp92 = tmp88 + tmp91 2023-01-11T21:38:06.7678172Z tmp93 = tmp87 + tmp92 2023-01-11T21:38:06.7678287Z tmp94 = tl.where(63 != 63, 63, tl.where(63 < tmp17, 63, tmp17)) 2023-01-11T21:38:06.7678400Z tmp95 = tl.where(0 != 0, 0, tl.where(0 > tmp94, 0, tmp94)) 2023-01-11T21:38:06.7678516Z tmp96 = tl.load(in_ptr0 + (tmp25 + (32*tmp95) + (2048*x2)), xmask) 2023-01-11T21:38:06.7678637Z tmp97 = tl.load(in_ptr0 + (tmp28 + (32*tmp95) + (2048*x2)), xmask) 2023-01-11T21:38:06.7678760Z tmp98 = tl.load(in_ptr0 + (tmp31 + (32*tmp95) + (2048*x2)), xmask) 2023-01-11T21:38:06.7678871Z tmp99 = tl.load(in_ptr0 + (tmp34 + (32*tmp95) + (2048*x2)), xmask) 2023-01-11T21:38:06.7678981Z tmp100 = tmp96 * tmp42 2023-01-11T21:38:06.7679064Z tmp101 = tmp97 * tmp47 2023-01-11T21:38:06.7679143Z tmp102 = tmp98 * tmp53 2023-01-11T21:38:06.7679223Z tmp103 = tmp99 * tmp60 2023-01-11T21:38:06.7679307Z tmp104 = tmp102 + tmp103 2023-01-11T21:38:06.7679387Z tmp105 = tmp101 + tmp104 2023-01-11T21:38:06.7679460Z tmp106 = tmp100 + tmp105 2023-01-11T21:38:06.7679538Z tmp107 = tmp11 + 1.0 2023-01-11T21:38:06.7679657Z tmp108 = -0.75 * tmp107 2023-01-11T21:38:06.7679771Z tmp109 = tmp108 - -3.75 2023-01-11T21:38:06.7679852Z tmp110 = tmp109 * tmp107 2023-01-11T21:38:06.7679965Z tmp111 = tmp110 + -6.0 2023-01-11T21:38:06.7680047Z tmp112 = tmp111 * tmp107 2023-01-11T21:38:06.7680153Z tmp113 = tmp112 - -3.0 2023-01-11T21:38:06.7680230Z tmp114 = 1.25 * tmp11 2023-01-11T21:38:06.7680343Z tmp115 = tmp114 - 2.25 2023-01-11T21:38:06.7680424Z tmp116 = tmp115 * tmp11 2023-01-11T21:38:06.7680504Z tmp117 = tmp116 * tmp11 2023-01-11T21:38:06.7680583Z tmp118 = tmp117 + 1.0 2023-01-11T21:38:06.7680687Z tmp119 = 1.0 - tmp11 2023-01-11T21:38:06.7680765Z tmp120 = 1.25 * tmp119 2023-01-11T21:38:06.7680878Z tmp121 = tmp120 - 2.25 2023-01-11T21:38:06.7680958Z tmp122 = tmp121 * tmp119 2023-01-11T21:38:06.7681038Z tmp123 = tmp122 * tmp119 2023-01-11T21:38:06.7681114Z tmp124 = tmp123 + 1.0 2023-01-11T21:38:06.7681191Z tmp125 = tmp119 + 1.0 2023-01-11T21:38:06.7681296Z tmp126 = -0.75 * tmp125 2023-01-11T21:38:06.7681410Z tmp127 = tmp126 - -3.75 2023-01-11T21:38:06.7681490Z tmp128 = tmp127 * tmp125 2023-01-11T21:38:06.7681601Z tmp129 = tmp128 + -6.0 2023-01-11T21:38:06.7681723Z tmp130 = tmp129 * tmp125 2023-01-11T21:38:06.7681837Z tmp131 = tmp130 - -3.0 2023-01-11T21:38:06.7681917Z tmp132 = tmp67 * tmp113 2023-01-11T21:38:06.7681989Z tmp133 = tmp80 * tmp118 2023-01-11T21:38:06.7682067Z tmp134 = tmp93 * tmp124 
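# [editorial annotation, not part of the captured log] tmp132-tmp138 combine the
# four horizontally-interpolated rows (tmp67, tmp80, tmp93, tmp106) with the
# vertical Keys cubic-convolution weights (tmp113, tmp118, tmp124, tmp131,
# built with a = -0.75) and sum them into tmp138, the bicubic output value.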
2023-01-11T21:38:06.7682147Z tmp135 = tmp106 * tmp131 2023-01-11T21:38:06.7682230Z tmp136 = tmp134 + tmp135 2023-01-11T21:38:06.7682312Z tmp137 = tmp133 + tmp136 2023-01-11T21:38:06.7682396Z tmp138 = tmp132 + tmp137 2023-01-11T21:38:06.7682533Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp138, xmask) 2023-01-11T21:38:06.7682611Z ''') 2023-01-11T21:38:06.7682617Z 2023-01-11T21:38:06.7682621Z 2023-01-11T21:38:06.7682716Z async_compile.wait(globals()) 2023-01-11T21:38:06.7682792Z del async_compile 2023-01-11T21:38:06.7682797Z 2023-01-11T21:38:06.7682871Z def call(args): 2023-01-11T21:38:06.7682943Z arg0_1, = args 2023-01-11T21:38:06.7683018Z args.clear() 2023-01-11T21:38:06.7683113Z with torch.cuda.device(0): 2023-01-11T21:38:06.7683340Z buf0 = empty_strided((4, 3, 128, 128), (49152, 16384, 128, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7683434Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7683595Z triton_fused_upsample_bicubic2d_0.run(arg0_1, buf0, 196608, grid=grid(196608), stream=stream0) 2023-01-11T21:38:06.7683826Z buf1 = empty_strided((4, 3, 128, 256), (98304, 32768, 256, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7683987Z triton_fused_upsample_bicubic2d_1_1.run(arg0_1, buf1, 393216, grid=grid(393216), stream=stream0) 2023-01-11T21:38:06.7684059Z del arg0_1 2023-01-11T21:38:06.7684142Z return (buf0, buf1, ) 2023-01-11T21:38:06.7684147Z 2023-01-11T21:38:06.7684151Z 2023-01-11T21:38:06.7684230Z if __name__ == "__main__": 2023-01-11T21:38:06.7684348Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7684468Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7684695Z arg0_1 = rand_strided((4, 3, 64, 32), (6144, 2048, 32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7684807Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7685071Z [2023-01-11 21:36:09,323] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 929 2023-01-11T21:38:06.7685512Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7685644Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7685925Z [2023-01-11 21:36:10,974] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 930 2023-01-11T21:38:06.7685931Z 2023-01-11T21:38:06.7685938Z 2023-01-11T21:38:06.7686046Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7686134Z import torch 2023-01-11T21:38:06.7686202Z import random 2023-01-11T21:38:06.7686321Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7686445Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7686450Z 2023-01-11T21:38:06.7686534Z aten = torch.ops.aten 2023-01-11T21:38:06.7686671Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7686767Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7686772Z 2023-01-11T21:38:06.7686845Z import triton 2023-01-11T21:38:06.7686930Z import triton.language as tl 2023-01-11T21:38:06.7687054Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7687194Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7687199Z 2023-01-11T21:38:06.7687204Z 2023-01-11T21:38:06.7687388Z triton_fused_upsample_bicubic2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.7687491Z import triton 2023-01-11T21:38:06.7687583Z import triton.language as tl 2023-01-11T21:38:06.7687698Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7687800Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7687925Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7688059Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7688065Z 2023-01-11T21:38:06.7688473Z @pointwise(size_hints=[262144], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7688546Z @triton.jit 2023-01-11T21:38:06.7688682Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7688757Z xnumel = 196608 2023-01-11T21:38:06.7688855Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7688985Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7689065Z xmask = xindex < xnumel 2023-01-11T21:38:06.7689141Z x0 = xindex % 128 2023-01-11T21:38:06.7689225Z x1 = (xindex // 128) % 128 2023-01-11T21:38:06.7689305Z x2 = (xindex // 16384) 2023-01-11T21:38:06.7689375Z x4 = xindex 2023-01-11T21:38:06.7689446Z tmp0 = x0 2023-01-11T21:38:06.7689533Z tmp1 = 0.2440944881889764 * tmp0 2023-01-11T21:38:06.7689625Z tmp2 = tl.libdevice.floor(tmp1) 2023-01-11T21:38:06.7689736Z tmp3 = tmp1 - tmp2 2023-01-11T21:38:06.7689805Z tmp4 = x1 2023-01-11T21:38:06.7689891Z tmp5 = 0.49606299212598426 * tmp4 2023-01-11T21:38:06.7689987Z tmp6 = tl.libdevice.floor(tmp5) 2023-01-11T21:38:06.7690097Z tmp7 = tmp5 - tmp6 2023-01-11T21:38:06.7690183Z tmp8 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7690261Z tmp9 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7690356Z tmp10 = tmp8 + -1 2023-01-11T21:38:06.7690432Z tmp11 = tmp8 + 0 2023-01-11T21:38:06.7690507Z tmp12 = tmp8 + 1 2023-01-11T21:38:06.7690585Z tmp13 = tmp8 + 2 2023-01-11T21:38:06.7690679Z tmp14 = tmp9 + -1 2023-01-11T21:38:06.7690747Z tmp15 = tmp9 + 0 2023-01-11T21:38:06.7690820Z tmp16 = tmp9 + 1 
2023-01-11T21:38:06.7690892Z tmp17 = tmp9 + 2 2023-01-11T21:38:06.7691015Z tmp18 = tl.where(63 != 63, 63, tl.where(63 < tmp10, 63, tmp10)) 2023-01-11T21:38:06.7691194Z tmp19 = tl.where(0 != 0, 0, tl.where(0 > tmp18, 0, tmp18)) 2023-01-11T21:38:06.7691315Z tmp20 = tl.where(31 != 31, 31, tl.where(31 < tmp14, 31, tmp14)) 2023-01-11T21:38:06.7691429Z tmp21 = tl.where(0 != 0, 0, tl.where(0 > tmp20, 0, tmp20)) 2023-01-11T21:38:06.7691684Z tmp22 = tl.load(in_ptr0 + (tmp21 + (32*tmp19) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7691795Z tmp23 = tl.where(31 != 31, 31, tl.where(31 < tmp15, 31, tmp15)) 2023-01-11T21:38:06.7691909Z tmp24 = tl.where(0 != 0, 0, tl.where(0 > tmp23, 0, tmp23)) 2023-01-11T21:38:06.7692159Z tmp25 = tl.load(in_ptr0 + (tmp24 + (32*tmp19) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7692278Z tmp26 = tl.where(31 != 31, 31, tl.where(31 < tmp16, 31, tmp16)) 2023-01-11T21:38:06.7692391Z tmp27 = tl.where(0 != 0, 0, tl.where(0 > tmp26, 0, tmp26)) 2023-01-11T21:38:06.7692640Z tmp28 = tl.load(in_ptr0 + (tmp27 + (32*tmp19) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7692759Z tmp29 = tl.where(31 != 31, 31, tl.where(31 < tmp17, 31, tmp17)) 2023-01-11T21:38:06.7692874Z tmp30 = tl.where(0 != 0, 0, tl.where(0 > tmp29, 0, tmp29)) 2023-01-11T21:38:06.7693113Z tmp31 = tl.load(in_ptr0 + (tmp30 + (32*tmp19) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7693191Z tmp32 = tmp3 + 1.0 2023-01-11T21:38:06.7693302Z tmp33 = -0.75 * tmp32 2023-01-11T21:38:06.7693416Z tmp34 = tmp33 - -3.75 2023-01-11T21:38:06.7693495Z tmp35 = tmp34 * tmp32 2023-01-11T21:38:06.7693606Z tmp36 = tmp35 + -6.0 2023-01-11T21:38:06.7693714Z tmp37 = tmp36 * tmp32 2023-01-11T21:38:06.7693817Z tmp38 = tmp37 - -3.0 2023-01-11T21:38:06.7693897Z tmp39 = 1.25 * tmp3 2023-01-11T21:38:06.7694006Z tmp40 = tmp39 - 2.25 2023-01-11T21:38:06.7694085Z tmp41 = tmp40 * tmp3 2023-01-11T21:38:06.7694163Z tmp42 = tmp41 * tmp3 2023-01-11T21:38:06.7694243Z tmp43 = tmp42 + 1.0 2023-01-11T21:38:06.7694342Z tmp44 = 1.0 - tmp3 2023-01-11T21:38:06.7694421Z tmp45 = 1.25 * tmp44 2023-01-11T21:38:06.7694740Z tmp46 = tmp45 - 2.25 2023-01-11T21:38:06.7694821Z tmp47 = tmp46 * tmp44 2023-01-11T21:38:06.7694900Z tmp48 = tmp47 * tmp44 2023-01-11T21:38:06.7694979Z tmp49 = tmp48 + 1.0 2023-01-11T21:38:06.7695058Z tmp50 = tmp44 + 1.0 2023-01-11T21:38:06.7695165Z tmp51 = -0.75 * tmp50 2023-01-11T21:38:06.7695278Z tmp52 = tmp51 - -3.75 2023-01-11T21:38:06.7695356Z tmp53 = tmp52 * tmp50 2023-01-11T21:38:06.7695468Z tmp54 = tmp53 + -6.0 2023-01-11T21:38:06.7695551Z tmp55 = tmp54 * tmp50 2023-01-11T21:38:06.7695667Z tmp56 = tmp55 - -3.0 2023-01-11T21:38:06.7695749Z tmp57 = tmp22 * tmp38 2023-01-11T21:38:06.7695821Z tmp58 = tmp25 * tmp43 2023-01-11T21:38:06.7695899Z tmp59 = tmp28 * tmp49 2023-01-11T21:38:06.7695976Z tmp60 = tmp31 * tmp56 2023-01-11T21:38:06.7696055Z tmp61 = tmp59 + tmp60 2023-01-11T21:38:06.7696136Z tmp62 = tmp58 + tmp61 2023-01-11T21:38:06.7696216Z tmp63 = tmp57 + tmp62 2023-01-11T21:38:06.7696328Z tmp64 = tl.where(63 != 63, 63, tl.where(63 < tmp11, 63, tmp11)) 2023-01-11T21:38:06.7696447Z tmp65 = tl.where(0 != 0, 0, tl.where(0 > tmp64, 0, tmp64)) 2023-01-11T21:38:06.7696697Z tmp66 = tl.load(in_ptr0 + (tmp21 + (32*tmp65) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7696974Z tmp67 = tl.load(in_ptr0 + (tmp24 + (32*tmp65) + (2048*x2)), xmask, 
eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7697294Z tmp68 = tl.load(in_ptr0 + (tmp27 + (32*tmp65) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7697537Z tmp69 = tl.load(in_ptr0 + (tmp30 + (32*tmp65) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7697620Z tmp70 = tmp66 * tmp38 2023-01-11T21:38:06.7697700Z tmp71 = tmp67 * tmp43 2023-01-11T21:38:06.7697782Z tmp72 = tmp68 * tmp49 2023-01-11T21:38:06.7697909Z tmp73 = tmp69 * tmp56 2023-01-11T21:38:06.7697991Z tmp74 = tmp72 + tmp73 2023-01-11T21:38:06.7698071Z tmp75 = tmp71 + tmp74 2023-01-11T21:38:06.7698151Z tmp76 = tmp70 + tmp75 2023-01-11T21:38:06.7698273Z tmp77 = tl.where(63 != 63, 63, tl.where(63 < tmp12, 63, tmp12)) 2023-01-11T21:38:06.7698391Z tmp78 = tl.where(0 != 0, 0, tl.where(0 > tmp77, 0, tmp77)) 2023-01-11T21:38:06.7698642Z tmp79 = tl.load(in_ptr0 + (tmp21 + (32*tmp78) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7698879Z tmp80 = tl.load(in_ptr0 + (tmp24 + (32*tmp78) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7699125Z tmp81 = tl.load(in_ptr0 + (tmp27 + (32*tmp78) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7699364Z tmp82 = tl.load(in_ptr0 + (tmp30 + (32*tmp78) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7699446Z tmp83 = tmp79 * tmp38 2023-01-11T21:38:06.7699533Z tmp84 = tmp80 * tmp43 2023-01-11T21:38:06.7699613Z tmp85 = tmp81 * tmp49 2023-01-11T21:38:06.7699695Z tmp86 = tmp82 * tmp56 2023-01-11T21:38:06.7699768Z tmp87 = tmp85 + tmp86 2023-01-11T21:38:06.7699848Z tmp88 = tmp84 + tmp87 2023-01-11T21:38:06.7699927Z tmp89 = tmp83 + tmp88 2023-01-11T21:38:06.7700049Z tmp90 = tl.where(63 != 63, 63, tl.where(63 < tmp13, 63, tmp13)) 2023-01-11T21:38:06.7700166Z tmp91 = tl.where(0 != 0, 0, tl.where(0 > tmp90, 0, tmp90)) 2023-01-11T21:38:06.7700408Z tmp92 = tl.load(in_ptr0 + (tmp21 + (32*tmp91) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7700685Z tmp93 = tl.load(in_ptr0 + (tmp24 + (32*tmp91) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7700925Z tmp94 = tl.load(in_ptr0 + (tmp27 + (32*tmp91) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7701157Z tmp95 = tl.load(in_ptr0 + (tmp30 + (32*tmp91) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7701239Z tmp96 = tmp92 * tmp38 2023-01-11T21:38:06.7701318Z tmp97 = tmp93 * tmp43 2023-01-11T21:38:06.7701399Z tmp98 = tmp94 * tmp49 2023-01-11T21:38:06.7701478Z tmp99 = tmp95 * tmp56 2023-01-11T21:38:06.7701561Z tmp100 = tmp98 + tmp99 2023-01-11T21:38:06.7701643Z tmp101 = tmp97 + tmp100 2023-01-11T21:38:06.7701718Z tmp102 = tmp96 + tmp101 2023-01-11T21:38:06.7701798Z tmp103 = tmp7 + 1.0 2023-01-11T21:38:06.7701912Z tmp104 = -0.75 * tmp103 2023-01-11T21:38:06.7702030Z tmp105 = tmp104 - -3.75 2023-01-11T21:38:06.7702116Z tmp106 = tmp105 * tmp103 2023-01-11T21:38:06.7702230Z tmp107 = tmp106 + -6.0 2023-01-11T21:38:06.7702312Z tmp108 = tmp107 * tmp103 2023-01-11T21:38:06.7702419Z tmp109 = tmp108 - -3.0 2023-01-11T21:38:06.7702497Z tmp110 = 1.25 * tmp7 2023-01-11T21:38:06.7702611Z tmp111 = tmp110 - 2.25 2023-01-11T21:38:06.7702694Z tmp112 = tmp111 * tmp7 2023-01-11T21:38:06.7702778Z tmp113 = tmp112 * tmp7 2023-01-11T21:38:06.7702859Z tmp114 = tmp113 + 1.0 2023-01-11T21:38:06.7702970Z tmp115 = 
1.0 - tmp7 2023-01-11T21:38:06.7703041Z tmp116 = 1.25 * tmp115 2023-01-11T21:38:06.7703157Z tmp117 = tmp116 - 2.25 2023-01-11T21:38:06.7703245Z tmp118 = tmp117 * tmp115 2023-01-11T21:38:06.7703331Z tmp119 = tmp118 * tmp115 2023-01-11T21:38:06.7703410Z tmp120 = tmp119 + 1.0 2023-01-11T21:38:06.7703489Z tmp121 = tmp115 + 1.0 2023-01-11T21:38:06.7703596Z tmp122 = -0.75 * tmp121 2023-01-11T21:38:06.7703713Z tmp123 = tmp122 - -3.75 2023-01-11T21:38:06.7703794Z tmp124 = tmp123 * tmp121 2023-01-11T21:38:06.7703912Z tmp125 = tmp124 + -6.0 2023-01-11T21:38:06.7703994Z tmp126 = tmp125 * tmp121 2023-01-11T21:38:06.7704108Z tmp127 = tmp126 - -3.0 2023-01-11T21:38:06.7704189Z tmp128 = tmp63 * tmp109 2023-01-11T21:38:06.7704263Z tmp129 = tmp76 * tmp114 2023-01-11T21:38:06.7704344Z tmp130 = tmp89 * tmp120 2023-01-11T21:38:06.7704452Z tmp131 = tmp102 * tmp127 2023-01-11T21:38:06.7704535Z tmp132 = tmp130 + tmp131 2023-01-11T21:38:06.7704617Z tmp133 = tmp129 + tmp132 2023-01-11T21:38:06.7704701Z tmp134 = tmp128 + tmp133 2023-01-11T21:38:06.7704841Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp134, xmask) 2023-01-11T21:38:06.7704920Z ''') 2023-01-11T21:38:06.7704926Z 2023-01-11T21:38:06.7704930Z 2023-01-11T21:38:06.7705119Z triton_fused_upsample_bicubic2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7705200Z import triton 2023-01-11T21:38:06.7705294Z import triton.language as tl 2023-01-11T21:38:06.7705413Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7705524Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7705657Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7705777Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7705790Z 2023-01-11T21:38:06.7706196Z @pointwise(size_hints=[524288], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7706272Z @triton.jit 2023-01-11T21:38:06.7706404Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7706481Z xnumel = 393216 2023-01-11T21:38:06.7706583Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7706714Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7706798Z xmask = xindex < xnumel 2023-01-11T21:38:06.7706869Z x0 = xindex % 256 2023-01-11T21:38:06.7706981Z x1 = (xindex // 256) % 128 2023-01-11T21:38:06.7707062Z x2 = (xindex // 32768) 2023-01-11T21:38:06.7707136Z x4 = xindex 2023-01-11T21:38:06.7707208Z tmp0 = x0 2023-01-11T21:38:06.7707288Z tmp1 = tmp0 + 0.5 2023-01-11T21:38:06.7707369Z tmp2 = 0.125 * tmp1 2023-01-11T21:38:06.7707471Z tmp3 = tmp2 - 0.5 2023-01-11T21:38:06.7707578Z tmp4 = tl.libdevice.floor(tmp3) 2023-01-11T21:38:06.7707686Z tmp5 = tmp3 - tmp4 2023-01-11T21:38:06.7707758Z tmp6 = x1 2023-01-11T21:38:06.7707836Z tmp7 = tmp6 + 0.5 2023-01-11T21:38:06.7707912Z tmp8 = 0.5 * tmp7 2023-01-11T21:38:06.7708018Z tmp9 = tmp8 - 0.5 2023-01-11T21:38:06.7708112Z tmp10 = tl.libdevice.floor(tmp9) 2023-01-11T21:38:06.7708227Z tmp11 = tmp9 - tmp10 2023-01-11T21:38:06.7708317Z tmp12 = tmp10.to(tl.int32) 2023-01-11T21:38:06.7708404Z tmp13 = tmp4.to(tl.int32) 2023-01-11T21:38:06.7708499Z tmp14 = tmp12 + -1 2023-01-11T21:38:06.7708578Z tmp15 = tmp12 + 0 2023-01-11T21:38:06.7708652Z tmp16 = tmp12 + 1 2023-01-11T21:38:06.7708731Z tmp17 = tmp12 + 2 2023-01-11T21:38:06.7708826Z tmp18 = tmp13 + -1 
2023-01-11T21:38:06.7708904Z tmp19 = tmp13 + 0 2023-01-11T21:38:06.7708980Z tmp20 = tmp13 + 1 2023-01-11T21:38:06.7709058Z tmp21 = tmp13 + 2 2023-01-11T21:38:06.7709185Z tmp22 = tl.where(63 != 63, 63, tl.where(63 < tmp14, 63, tmp14)) 2023-01-11T21:38:06.7709294Z tmp23 = tl.where(0 != 0, 0, tl.where(0 > tmp22, 0, tmp22)) 2023-01-11T21:38:06.7709416Z tmp24 = tl.where(31 != 31, 31, tl.where(31 < tmp18, 31, tmp18)) 2023-01-11T21:38:06.7709533Z tmp25 = tl.where(0 != 0, 0, tl.where(0 > tmp24, 0, tmp24)) 2023-01-11T21:38:06.7709677Z tmp26 = tl.load(in_ptr0 + (tmp25 + (32*tmp23) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7709796Z tmp27 = tl.where(31 != 31, 31, tl.where(31 < tmp19, 31, tmp19)) 2023-01-11T21:38:06.7709909Z tmp28 = tl.where(0 != 0, 0, tl.where(0 > tmp27, 0, tmp27)) 2023-01-11T21:38:06.7710048Z tmp29 = tl.load(in_ptr0 + (tmp28 + (32*tmp23) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7710168Z tmp30 = tl.where(31 != 31, 31, tl.where(31 < tmp20, 31, tmp20)) 2023-01-11T21:38:06.7710275Z tmp31 = tl.where(0 != 0, 0, tl.where(0 > tmp30, 0, tmp30)) 2023-01-11T21:38:06.7710442Z tmp32 = tl.load(in_ptr0 + (tmp31 + (32*tmp23) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7710560Z tmp33 = tl.where(31 != 31, 31, tl.where(31 < tmp21, 31, tmp21)) 2023-01-11T21:38:06.7710672Z tmp34 = tl.where(0 != 0, 0, tl.where(0 > tmp33, 0, tmp33)) 2023-01-11T21:38:06.7710810Z tmp35 = tl.load(in_ptr0 + (tmp34 + (32*tmp23) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7710889Z tmp36 = tmp5 + 1.0 2023-01-11T21:38:06.7711004Z tmp37 = -0.75 * tmp36 2023-01-11T21:38:06.7711113Z tmp38 = tmp37 - -3.75 2023-01-11T21:38:06.7711193Z tmp39 = tmp38 * tmp36 2023-01-11T21:38:06.7711307Z tmp40 = tmp39 + -6.0 2023-01-11T21:38:06.7711390Z tmp41 = tmp40 * tmp36 2023-01-11T21:38:06.7711503Z tmp42 = tmp41 - -3.0 2023-01-11T21:38:06.7711581Z tmp43 = 1.25 * tmp5 2023-01-11T21:38:06.7711691Z tmp44 = tmp43 - 2.25 2023-01-11T21:38:06.7711764Z tmp45 = tmp44 * tmp5 2023-01-11T21:38:06.7711845Z tmp46 = tmp45 * tmp5 2023-01-11T21:38:06.7711924Z tmp47 = tmp46 + 1.0 2023-01-11T21:38:06.7712031Z tmp48 = 1.0 - tmp5 2023-01-11T21:38:06.7712111Z tmp49 = 1.25 * tmp48 2023-01-11T21:38:06.7712224Z tmp50 = tmp49 - 2.25 2023-01-11T21:38:06.7712303Z tmp51 = tmp50 * tmp48 2023-01-11T21:38:06.7712375Z tmp52 = tmp51 * tmp48 2023-01-11T21:38:06.7712453Z tmp53 = tmp52 + 1.0 2023-01-11T21:38:06.7712531Z tmp54 = tmp48 + 1.0 2023-01-11T21:38:06.7712641Z tmp55 = -0.75 * tmp54 2023-01-11T21:38:06.7712754Z tmp56 = tmp55 - -3.75 2023-01-11T21:38:06.7712833Z tmp57 = tmp56 * tmp54 2023-01-11T21:38:06.7712936Z tmp58 = tmp57 + -6.0 2023-01-11T21:38:06.7713016Z tmp59 = tmp58 * tmp54 2023-01-11T21:38:06.7713125Z tmp60 = tmp59 - -3.0 2023-01-11T21:38:06.7713239Z tmp61 = tmp26 * tmp42 2023-01-11T21:38:06.7713317Z tmp62 = tmp29 * tmp47 2023-01-11T21:38:06.7713397Z tmp63 = tmp32 * tmp53 2023-01-11T21:38:06.7713478Z tmp64 = tmp35 * tmp60 2023-01-11T21:38:06.7713550Z tmp65 = tmp63 + tmp64 2023-01-11T21:38:06.7713629Z tmp66 = tmp62 + tmp65 2023-01-11T21:38:06.7713706Z tmp67 = tmp61 + tmp66 2023-01-11T21:38:06.7713827Z tmp68 = tl.where(63 != 63, 63, tl.where(63 < tmp15, 63, tmp15)) 2023-01-11T21:38:06.7713944Z tmp69 = tl.where(0 != 0, 0, tl.where(0 > tmp68, 0, tmp68)) 2023-01-11T21:38:06.7714079Z tmp70 = tl.load(in_ptr0 + (tmp25 + (32*tmp69) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7714211Z tmp71 = tl.load(in_ptr0 + (tmp28 + (32*tmp69) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7714340Z tmp72 = tl.load(in_ptr0 
+ (tmp31 + (32*tmp69) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7714462Z tmp73 = tl.load(in_ptr0 + (tmp34 + (32*tmp69) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7714546Z tmp74 = tmp70 * tmp42 2023-01-11T21:38:06.7714629Z tmp75 = tmp71 * tmp47 2023-01-11T21:38:06.7714709Z tmp76 = tmp72 * tmp53 2023-01-11T21:38:06.7714789Z tmp77 = tmp73 * tmp60 2023-01-11T21:38:06.7714869Z tmp78 = tmp76 + tmp77 2023-01-11T21:38:06.7714947Z tmp79 = tmp75 + tmp78 2023-01-11T21:38:06.7715022Z tmp80 = tmp74 + tmp79 2023-01-11T21:38:06.7715141Z tmp81 = tl.where(63 != 63, 63, tl.where(63 < tmp16, 63, tmp16)) 2023-01-11T21:38:06.7715282Z tmp82 = tl.where(0 != 0, 0, tl.where(0 > tmp81, 0, tmp81)) 2023-01-11T21:38:06.7715435Z tmp83 = tl.load(in_ptr0 + (tmp25 + (32*tmp82) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7715571Z tmp84 = tl.load(in_ptr0 + (tmp28 + (32*tmp82) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7715701Z tmp85 = tl.load(in_ptr0 + (tmp31 + (32*tmp82) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7715829Z tmp86 = tl.load(in_ptr0 + (tmp34 + (32*tmp82) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7715906Z tmp87 = tmp83 * tmp42 2023-01-11T21:38:06.7715987Z tmp88 = tmp84 * tmp47 2023-01-11T21:38:06.7716069Z tmp89 = tmp85 * tmp53 2023-01-11T21:38:06.7716149Z tmp90 = tmp86 * tmp60 2023-01-11T21:38:06.7716227Z tmp91 = tmp89 + tmp90 2023-01-11T21:38:06.7716304Z tmp92 = tmp88 + tmp91 2023-01-11T21:38:06.7716409Z tmp93 = tmp87 + tmp92 2023-01-11T21:38:06.7716521Z tmp94 = tl.where(63 != 63, 63, tl.where(63 < tmp17, 63, tmp17)) 2023-01-11T21:38:06.7716639Z tmp95 = tl.where(0 != 0, 0, tl.where(0 > tmp94, 0, tmp94)) 2023-01-11T21:38:06.7716770Z tmp96 = tl.load(in_ptr0 + (tmp25 + (32*tmp95) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7716899Z tmp97 = tl.load(in_ptr0 + (tmp28 + (32*tmp95) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7717026Z tmp98 = tl.load(in_ptr0 + (tmp31 + (32*tmp95) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7717152Z tmp99 = tl.load(in_ptr0 + (tmp34 + (32*tmp95) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7717239Z tmp100 = tmp96 * tmp42 2023-01-11T21:38:06.7717323Z tmp101 = tmp97 * tmp47 2023-01-11T21:38:06.7717397Z tmp102 = tmp98 * tmp53 2023-01-11T21:38:06.7717477Z tmp103 = tmp99 * tmp60 2023-01-11T21:38:06.7717560Z tmp104 = tmp102 + tmp103 2023-01-11T21:38:06.7717643Z tmp105 = tmp101 + tmp104 2023-01-11T21:38:06.7717725Z tmp106 = tmp100 + tmp105 2023-01-11T21:38:06.7717805Z tmp107 = tmp11 + 1.0 2023-01-11T21:38:06.7717922Z tmp108 = -0.75 * tmp107 2023-01-11T21:38:06.7718032Z tmp109 = tmp108 - -3.75 2023-01-11T21:38:06.7718115Z tmp110 = tmp109 * tmp107 2023-01-11T21:38:06.7718229Z tmp111 = tmp110 + -6.0 2023-01-11T21:38:06.7718310Z tmp112 = tmp111 * tmp107 2023-01-11T21:38:06.7718425Z tmp113 = tmp112 - -3.0 2023-01-11T21:38:06.7718504Z tmp114 = 1.25 * tmp11 2023-01-11T21:38:06.7718611Z tmp115 = tmp114 - 2.25 2023-01-11T21:38:06.7718693Z tmp116 = tmp115 * tmp11 2023-01-11T21:38:06.7718774Z tmp117 = tmp116 * tmp11 2023-01-11T21:38:06.7718882Z tmp118 = tmp117 + 1.0 2023-01-11T21:38:06.7718994Z tmp119 = 1.0 - tmp11 2023-01-11T21:38:06.7719072Z tmp120 = 1.25 * tmp119 2023-01-11T21:38:06.7719186Z tmp121 = tmp120 - 2.25 2023-01-11T21:38:06.7719261Z tmp122 = tmp121 * tmp119 2023-01-11T21:38:06.7719343Z tmp123 = tmp122 * tmp119 2023-01-11T21:38:06.7719425Z tmp124 = tmp123 + 1.0 2023-01-11T21:38:06.7719507Z tmp125 = tmp119 + 1.0 2023-01-11T21:38:06.7719619Z tmp126 = -0.75 * tmp125 
2023-01-11T21:38:06.7719736Z tmp127 = tmp126 - -3.75 2023-01-11T21:38:06.7719817Z tmp128 = tmp127 * tmp125 2023-01-11T21:38:06.7719924Z tmp129 = tmp128 + -6.0 2023-01-11T21:38:06.7720004Z tmp130 = tmp129 * tmp125 2023-01-11T21:38:06.7720118Z tmp131 = tmp130 - -3.0 2023-01-11T21:38:06.7720203Z tmp132 = tmp67 * tmp113 2023-01-11T21:38:06.7720284Z tmp133 = tmp80 * tmp118 2023-01-11T21:38:06.7720368Z tmp134 = tmp93 * tmp124 2023-01-11T21:38:06.7720443Z tmp135 = tmp106 * tmp131 2023-01-11T21:38:06.7720528Z tmp136 = tmp134 + tmp135 2023-01-11T21:38:06.7720612Z tmp137 = tmp133 + tmp136 2023-01-11T21:38:06.7720693Z tmp138 = tmp132 + tmp137 2023-01-11T21:38:06.7720833Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp138, xmask) 2023-01-11T21:38:06.7720921Z ''') 2023-01-11T21:38:06.7720927Z 2023-01-11T21:38:06.7720937Z 2023-01-11T21:38:06.7721031Z async_compile.wait(globals()) 2023-01-11T21:38:06.7721110Z del async_compile 2023-01-11T21:38:06.7721115Z 2023-01-11T21:38:06.7721184Z def call(args): 2023-01-11T21:38:06.7721258Z arg0_1, = args 2023-01-11T21:38:06.7721335Z args.clear() 2023-01-11T21:38:06.7721429Z with torch.cuda.device(0): 2023-01-11T21:38:06.7721663Z buf0 = empty_strided((4, 3, 128, 128), (49152, 16384, 128, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7721758Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7721918Z triton_fused_upsample_bicubic2d_0.run(arg0_1, buf0, 196608, grid=grid(196608), stream=stream0) 2023-01-11T21:38:06.7722145Z buf1 = empty_strided((4, 3, 128, 256), (98304, 32768, 256, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7722309Z triton_fused_upsample_bicubic2d_1_1.run(arg0_1, buf1, 393216, grid=grid(393216), stream=stream0) 2023-01-11T21:38:06.7722385Z del arg0_1 2023-01-11T21:38:06.7722495Z return (buf0, buf1, ) 2023-01-11T21:38:06.7722501Z 2023-01-11T21:38:06.7722505Z 2023-01-11T21:38:06.7722587Z if __name__ == "__main__": 2023-01-11T21:38:06.7722706Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7722835Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7723064Z arg0_1 = rand_strided((4, 3, 64, 32), (6144, 2048, 32, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7723170Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7723433Z [2023-01-11 21:36:11,550] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 930 2023-01-11T21:38:06.7723442Z 2023-01-11T21:38:06.7723515Z ok (4.423s) 2023-01-11T21:38:06.7723992Z test_upsample_bilinear2d_a_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7724125Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7724380Z [2023-01-11 21:36:12,104] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 931 2023-01-11T21:38:06.7724618Z [2023-01-11 21:36:12,268] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7724848Z [2023-01-11 21:36:12,285] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7725121Z [2023-01-11 21:36:12,294] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7725336Z [2023-01-11 21:36:12,299] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.7725557Z [2023-01-11 21:36:12,300] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7725786Z [2023-01-11 21:36:12,312] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7725991Z [2023-01-11 21:36:12,320] torch._inductor.scheduler: [DEBUG] remove_buffer('buf4') 2023-01-11T21:38:06.7726252Z [2023-01-11 21:36:12,548] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 931 2023-01-11T21:38:06.7726669Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7726806Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7727061Z [2023-01-11 21:36:13,169] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 932 2023-01-11T21:38:06.7727300Z [2023-01-11 21:36:13,317] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7727529Z [2023-01-11 21:36:13,328] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7727755Z [2023-01-11 21:36:13,337] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7727956Z [2023-01-11 21:36:13,340] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.7728162Z [2023-01-11 21:36:13,341] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.7728393Z [2023-01-11 21:36:13,342] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7728624Z [2023-01-11 21:36:13,350] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7728824Z [2023-01-11 21:36:13,358] torch._inductor.scheduler: [DEBUG] remove_buffer('buf3') 2023-01-11T21:38:06.7729050Z [2023-01-11 21:36:13,358] torch._inductor.scheduler: [DEBUG] remove_buffer('buf4') 2023-01-11T21:38:06.7729056Z 2023-01-11T21:38:06.7729158Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7729234Z import torch 2023-01-11T21:38:06.7729303Z import random 2023-01-11T21:38:06.7729423Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7729549Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7729554Z 2023-01-11T21:38:06.7729637Z aten = torch.ops.aten 2023-01-11T21:38:06.7729776Z assert_size_stride = 
torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7729874Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7729880Z 2023-01-11T21:38:06.7729960Z import triton 2023-01-11T21:38:06.7730054Z import triton.language as tl 2023-01-11T21:38:06.7730173Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7730314Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7730320Z 2023-01-11T21:38:06.7730324Z 2023-01-11T21:38:06.7730600Z triton_fused_add_add_2_add_3_add_4_arange_convert_element_type_floor_index_index_1_index_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.7730676Z import triton 2023-01-11T21:38:06.7730769Z import triton.language as tl 2023-01-11T21:38:06.7730883Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7730985Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7740679Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7740844Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7740850Z 2023-01-11T21:38:06.7741302Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7741479Z @triton.jit 2023-01-11T21:38:06.7741621Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7741699Z xnumel = 16200 2023-01-11T21:38:06.7741804Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7741944Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7742023Z xmask = xindex < xnumel 2023-01-11T21:38:06.7742117Z x1 = (xindex // 45) % 45 2023-01-11T21:38:06.7742197Z x0 = xindex % 45 2023-01-11T21:38:06.7742285Z x2 = (xindex // 2025) 2023-01-11T21:38:06.7742360Z x4 = xindex 2023-01-11T21:38:06.7742436Z tmp0 = x1 2023-01-11T21:38:06.7742514Z tmp1 = 0.5 2023-01-11T21:38:06.7742590Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7742674Z tmp3 = 0.8222222222222222 2023-01-11T21:38:06.7742758Z tmp4 = tmp2 * tmp3 2023-01-11T21:38:06.7742876Z tmp5 = tmp4 - tmp1 2023-01-11T21:38:06.7742952Z tmp6 = 0.0 2023-01-11T21:38:06.7743100Z tmp7 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp6, tmp5, tmp6)) 2023-01-11T21:38:06.7743206Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.7743288Z tmp9 = tmp8.to(tl.int32) 2023-01-11T21:38:06.7743368Z tmp10 = x0 2023-01-11T21:38:06.7743454Z tmp11 = tmp10 + tmp1 2023-01-11T21:38:06.7743541Z tmp12 = 0.8444444444444444 2023-01-11T21:38:06.7743627Z tmp13 = tmp11 * tmp12 2023-01-11T21:38:06.7743745Z tmp14 = tmp13 - tmp1 2023-01-11T21:38:06.7743891Z tmp15 = tl.where(tmp14 != tmp14, tmp14, tl.where(tmp14 > tmp6, tmp14, tmp6)) 2023-01-11T21:38:06.7743987Z tmp16 = tl.libdevice.floor(tmp15) 2023-01-11T21:38:06.7744077Z tmp17 = tmp16.to(tl.int32) 2023-01-11T21:38:06.7744306Z tmp18 = tl.load(in_ptr0 + (tmp17 + (38*tmp9) + (1406*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7744383Z tmp19 = 1.0 2023-01-11T21:38:06.7744479Z tmp20 = tmp9.to(tl.float32) 2023-01-11T21:38:06.7744594Z tmp21 = tmp7 - tmp20 2023-01-11T21:38:06.7744713Z tmp22 = tmp19 - tmp21 2023-01-11T21:38:06.7744789Z tmp23 = tmp18 * tmp22 2023-01-11T21:38:06.7744892Z tmp24 = tl.libdevice.ceil(tmp7) 2023-01-11T21:38:06.7744971Z tmp25 = 36.0 2023-01-11T21:38:06.7745171Z tmp26 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 < tmp25, tmp24, tmp25)) 2023-01-11T21:38:06.7745261Z tmp27 = 
tmp26.to(tl.int32) 2023-01-11T21:38:06.7745493Z tmp28 = tl.load(in_ptr0 + (tmp17 + (38*tmp27) + (1406*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7745579Z tmp29 = tmp28 * tmp21 2023-01-11T21:38:06.7745651Z tmp30 = tmp23 + tmp29 2023-01-11T21:38:06.7745750Z tmp31 = tl.libdevice.ceil(tmp15) 2023-01-11T21:38:06.7745826Z tmp32 = 37.0 2023-01-11T21:38:06.7745967Z tmp33 = tl.where(tmp31 != tmp31, tmp31, tl.where(tmp31 < tmp32, tmp31, tmp32)) 2023-01-11T21:38:06.7746057Z tmp34 = tmp33.to(tl.int32) 2023-01-11T21:38:06.7746188Z tmp35 = tl.load(in_ptr0 + (tmp34 + (38*tmp9) + (1406*x2)), xmask) 2023-01-11T21:38:06.7746271Z tmp36 = tmp35 * tmp22 2023-01-11T21:38:06.7746387Z tmp37 = tl.load(in_ptr0 + (tmp34 + (38*tmp27) + (1406*x2)), xmask) 2023-01-11T21:38:06.7746470Z tmp38 = tmp37 * tmp21 2023-01-11T21:38:06.7746557Z tmp39 = tmp36 + tmp38 2023-01-11T21:38:06.7746651Z tmp40 = tmp17.to(tl.float32) 2023-01-11T21:38:06.7746767Z tmp41 = tmp15 - tmp40 2023-01-11T21:38:06.7746882Z tmp42 = tmp19 - tmp41 2023-01-11T21:38:06.7746966Z tmp43 = tmp30 * tmp42 2023-01-11T21:38:06.7747037Z tmp44 = tmp39 * tmp41 2023-01-11T21:38:06.7747118Z tmp45 = tmp43 + tmp44 2023-01-11T21:38:06.7747261Z tl.store(in_out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp45, xmask) 2023-01-11T21:38:06.7747348Z ''') 2023-01-11T21:38:06.7747353Z 2023-01-11T21:38:06.7747358Z 2023-01-11T21:38:06.7747670Z triton_fused_add_5_add_6_add_7_arange_2_arange_3_convert_element_type_4_convert_element_type_6_floor_2_floor_3_index_4_1 = async_compile.triton(''' 2023-01-11T21:38:06.7747782Z import triton 2023-01-11T21:38:06.7747876Z import triton.language as tl 2023-01-11T21:38:06.7747985Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7748093Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7748232Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7748363Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7748368Z 2023-01-11T21:38:06.7748789Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7748867Z @triton.jit 2023-01-11T21:38:06.7749005Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7749082Z xnumel = 44992 2023-01-11T21:38:06.7749173Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7749308Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7749391Z xmask = xindex < xnumel 2023-01-11T21:38:06.7749477Z x1 = (xindex // 76) % 74 2023-01-11T21:38:06.7749552Z x0 = xindex % 76 2023-01-11T21:38:06.7749633Z x2 = (xindex // 5624) 2023-01-11T21:38:06.7749707Z x4 = xindex 2023-01-11T21:38:06.7749772Z tmp0 = x1 2023-01-11T21:38:06.7749855Z tmp1 = 0.4931506849315068 2023-01-11T21:38:06.7749935Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7750036Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.7750125Z tmp4 = tmp3.to(tl.int32) 2023-01-11T21:38:06.7750200Z tmp5 = x0 2023-01-11T21:38:06.7750283Z tmp6 = 0.49333333333333335 2023-01-11T21:38:06.7750354Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.7750453Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.7750539Z tmp9 = tmp8.to(tl.int32) 2023-01-11T21:38:06.7750768Z tmp10 = tl.load(in_ptr0 + (tmp9 + (38*tmp4) + (1406*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7750844Z tmp11 = 1.0 
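# [editorial annotation, not part of the captured log] bilinear path: tmp10 is
# the (floor_y, floor_x) neighbor; tmp12-tmp14 compute the vertical lerp weight
# (1 - frac_y), with frac_y = src_pos_y - floor(src_pos_y), and tmp15 applies it
# before the ceil-row neighbor is blended in and the result is lerped across
# columns by frac_x.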
2023-01-11T21:38:06.7750934Z tmp12 = tmp4.to(tl.float32) 2023-01-11T21:38:06.7751047Z tmp13 = tmp2 - tmp12 2023-01-11T21:38:06.7751152Z tmp14 = tmp11 - tmp13 2023-01-11T21:38:06.7751233Z tmp15 = tmp10 * tmp14 2023-01-11T21:38:06.7751328Z tmp16 = tl.libdevice.ceil(tmp2) 2023-01-11T21:38:06.7751433Z tmp17 = 36.0 2023-01-11T21:38:06.7751575Z tmp18 = tl.where(tmp16 != tmp16, tmp16, tl.where(tmp16 < tmp17, tmp16, tmp17)) 2023-01-11T21:38:06.7751663Z tmp19 = tmp18.to(tl.int32) 2023-01-11T21:38:06.7751889Z tmp20 = tl.load(in_ptr0 + (tmp9 + (38*tmp19) + (1406*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7751963Z tmp21 = tmp20 * tmp13 2023-01-11T21:38:06.7752047Z tmp22 = tmp15 + tmp21 2023-01-11T21:38:06.7752135Z tmp23 = tmp9.to(tl.float32) 2023-01-11T21:38:06.7752250Z tmp24 = tmp7 - tmp23 2023-01-11T21:38:06.7752363Z tmp25 = tmp11 - tmp24 2023-01-11T21:38:06.7752446Z tmp26 = tmp22 * tmp25 2023-01-11T21:38:06.7752547Z tmp27 = tl.libdevice.ceil(tmp7) 2023-01-11T21:38:06.7752613Z tmp28 = 37.0 2023-01-11T21:38:06.7752758Z tmp29 = tl.where(tmp27 != tmp27, tmp27, tl.where(tmp27 < tmp28, tmp27, tmp28)) 2023-01-11T21:38:06.7752846Z tmp30 = tmp29.to(tl.int32) 2023-01-11T21:38:06.7752973Z tmp31 = tl.load(in_ptr0 + (tmp30 + (38*tmp4) + (1406*x2)), xmask) 2023-01-11T21:38:06.7753055Z tmp32 = tmp31 * tmp14 2023-01-11T21:38:06.7753178Z tmp33 = tl.load(in_ptr0 + (tmp30 + (38*tmp19) + (1406*x2)), xmask) 2023-01-11T21:38:06.7753258Z tmp34 = tmp33 * tmp13 2023-01-11T21:38:06.7753330Z tmp35 = tmp32 + tmp34 2023-01-11T21:38:06.7753410Z tmp36 = tmp35 * tmp24 2023-01-11T21:38:06.7753490Z tmp37 = tmp26 + tmp36 2023-01-11T21:38:06.7753631Z tl.store(in_out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp37, xmask) 2023-01-11T21:38:06.7753715Z ''') 2023-01-11T21:38:06.7753721Z 2023-01-11T21:38:06.7753726Z 2023-01-11T21:38:06.7753819Z async_compile.wait(globals()) 2023-01-11T21:38:06.7753927Z del async_compile 2023-01-11T21:38:06.7753932Z 2023-01-11T21:38:06.7754007Z def call(args): 2023-01-11T21:38:06.7754073Z arg0_1, = args 2023-01-11T21:38:06.7754147Z args.clear() 2023-01-11T21:38:06.7754240Z with torch.cuda.device(0): 2023-01-11T21:38:06.7754470Z buf0 = empty_strided((2, 4, 45, 45), (8100, 2025, 45, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7754561Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:06.7754654Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7754864Z triton_fused_add_add_2_add_3_add_4_arange_convert_element_type_floor_index_index_1_index_2_0.run(buf2, arg0_1, 16200, grid=grid(16200), stream=stream0) 2023-01-11T21:38:06.7755082Z buf3 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7755184Z buf5 = buf3; del buf3 # reuse 2023-01-11T21:38:06.7755452Z triton_fused_add_5_add_6_add_7_arange_2_arange_3_convert_element_type_4_convert_element_type_6_floor_2_floor_3_index_4_1.run(buf5, arg0_1, 44992, grid=grid(44992), stream=stream0) 2023-01-11T21:38:06.7755532Z del arg0_1 2023-01-11T21:38:06.7755615Z return (buf2, buf5, ) 2023-01-11T21:38:06.7755621Z 2023-01-11T21:38:06.7755625Z 2023-01-11T21:38:06.7755707Z if __name__ == "__main__": 2023-01-11T21:38:06.7755830Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7755960Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7756178Z arg0_1 = rand_strided((2, 4, 37, 38), (5624, 1406, 38, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7756291Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7756297Z 
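[Editorial annotation, not part of the captured log] The module dumped above is TorchInductor's generated code for the two bilinear resizes in this test: the first kernel maps source coordinates as (i + 0.5) * scale - 0.5 clamped at zero (scales 37/45 and 38/45, i.e. the align_corners=False convention for the 45x45 output), while the second uses i * (in - 1) / (out - 1) (the constants 0.4931... = 36/73 and 0.4933... = 37/75, i.e. align_corners=True for the 74x76 output). A minimal sketch of how one might cross-check such a dump against eager PyTorch, assuming the dumped code has been saved as a module named generated_mod (a hypothetical name, not from the log):

import torch
import torch.nn.functional as F
# import generated_mod  # hypothetical: the dumped code above saved as generated_mod.py

x = torch.randn(2, 4, 37, 38, device="cuda")
# Eager references matching the two coordinate conventions seen in the kernels.
ref_false = F.interpolate(x, size=(45, 45), mode="bilinear", align_corners=False)
ref_true = F.interpolate(x, size=(74, 76), mode="bilinear", align_corners=True)
# out_false, out_true = generated_mod.call([x])
# torch.testing.assert_close(out_false, ref_false)
# torch.testing.assert_close(out_true, ref_true)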
2023-01-11T21:38:06.7756571Z [2023-01-11 21:36:13,580] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 932 2023-01-11T21:38:06.7756577Z 2023-01-11T21:38:06.7756676Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7756755Z import torch 2023-01-11T21:38:06.7756830Z import random 2023-01-11T21:38:06.7756954Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7757079Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7757084Z 2023-01-11T21:38:06.7757159Z aten = torch.ops.aten 2023-01-11T21:38:06.7757328Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7757427Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7757432Z 2023-01-11T21:38:06.7757508Z import triton 2023-01-11T21:38:06.7757602Z import triton.language as tl 2023-01-11T21:38:06.7757730Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7757874Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7757879Z 2023-01-11T21:38:06.7757884Z 2023-01-11T21:38:06.7758207Z triton_fused_add_add_2_add_3_arange_convert_element_type_convert_element_type_1_convert_element_type_5_floor_index_index_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7758287Z import triton 2023-01-11T21:38:06.7758372Z import triton.language as tl 2023-01-11T21:38:06.7758490Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7758594Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7758732Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7758862Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7758867Z 2023-01-11T21:38:06.7759274Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7759349Z @triton.jit 2023-01-11T21:38:06.7759483Z def triton_(in_ptr0, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7759551Z xnumel = 16200 2023-01-11T21:38:06.7759650Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7759778Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7759899Z xmask = xindex < xnumel 2023-01-11T21:38:06.7759985Z x1 = (xindex // 45) % 45 2023-01-11T21:38:06.7760066Z x0 = xindex % 45 2023-01-11T21:38:06.7760148Z x2 = (xindex // 2025) 2023-01-11T21:38:06.7760214Z x4 = xindex 2023-01-11T21:38:06.7760288Z tmp0 = x1 2023-01-11T21:38:06.7760368Z tmp1 = 0.5 2023-01-11T21:38:06.7760453Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7760537Z tmp3 = 0.8222222222222222 2023-01-11T21:38:06.7760620Z tmp4 = tmp2 * tmp3 2023-01-11T21:38:06.7760725Z tmp5 = tmp4 - tmp1 2023-01-11T21:38:06.7760801Z tmp6 = 0.0 2023-01-11T21:38:06.7760942Z tmp7 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp6, tmp5, tmp6)) 2023-01-11T21:38:06.7761045Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.7761134Z tmp9 = tmp8.to(tl.int32) 2023-01-11T21:38:06.7761209Z tmp10 = x0 2023-01-11T21:38:06.7761292Z tmp11 = tmp10 + tmp1 2023-01-11T21:38:06.7761367Z tmp12 = 0.8444444444444444 2023-01-11T21:38:06.7761454Z tmp13 = tmp11 * tmp12 2023-01-11T21:38:06.7761570Z tmp14 = tmp13 - tmp1 2023-01-11T21:38:06.7761715Z tmp15 = tl.where(tmp14 != tmp14, tmp14, tl.where(tmp14 > tmp6, tmp14, tmp6)) 2023-01-11T21:38:06.7761820Z tmp16 = tl.libdevice.floor(tmp15) 2023-01-11T21:38:06.7761909Z tmp17 = 
tmp16.to(tl.int32) 2023-01-11T21:38:06.7762169Z tmp18 = tl.load(in_ptr0 + (tmp17 + (38*tmp9) + (1406*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7762255Z tmp19 = tmp18.to(tl.float32) 2023-01-11T21:38:06.7762332Z tmp20 = 1.0 2023-01-11T21:38:06.7762422Z tmp21 = tmp9.to(tl.float32) 2023-01-11T21:38:06.7762538Z tmp22 = tmp7 - tmp21 2023-01-11T21:38:06.7762656Z tmp23 = tmp20 - tmp22 2023-01-11T21:38:06.7762738Z tmp24 = tmp19 * tmp23 2023-01-11T21:38:06.7762838Z tmp25 = tl.libdevice.ceil(tmp7) 2023-01-11T21:38:06.7762907Z tmp26 = 36.0 2023-01-11T21:38:06.7763051Z tmp27 = tl.where(tmp25 != tmp25, tmp25, tl.where(tmp25 < tmp26, tmp25, tmp26)) 2023-01-11T21:38:06.7763146Z tmp28 = tmp27.to(tl.int32) 2023-01-11T21:38:06.7763400Z tmp29 = tl.load(in_ptr0 + (tmp17 + (38*tmp28) + (1406*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7763492Z tmp30 = tmp29.to(tl.float32) 2023-01-11T21:38:06.7763574Z tmp31 = tmp30 * tmp22 2023-01-11T21:38:06.7763683Z tmp32 = tmp24 + tmp31 2023-01-11T21:38:06.7763775Z tmp33 = tl.libdevice.ceil(tmp15) 2023-01-11T21:38:06.7763848Z tmp34 = 37.0 2023-01-11T21:38:06.7763990Z tmp35 = tl.where(tmp33 != tmp33, tmp33, tl.where(tmp33 < tmp34, tmp33, tmp34)) 2023-01-11T21:38:06.7764077Z tmp36 = tmp35.to(tl.int32) 2023-01-11T21:38:06.7764215Z tmp37 = tl.load(in_ptr0 + (tmp36 + (38*tmp9) + (1406*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7764303Z tmp38 = tmp37.to(tl.float32) 2023-01-11T21:38:06.7764383Z tmp39 = tmp38 * tmp23 2023-01-11T21:38:06.7764510Z tmp40 = tl.load(in_ptr0 + (tmp36 + (38*tmp28) + (1406*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7764601Z tmp41 = tmp40.to(tl.float32) 2023-01-11T21:38:06.7764682Z tmp42 = tmp41 * tmp22 2023-01-11T21:38:06.7764761Z tmp43 = tmp39 + tmp42 2023-01-11T21:38:06.7764847Z tmp44 = tmp17.to(tl.float32) 2023-01-11T21:38:06.7764962Z tmp45 = tmp15 - tmp44 2023-01-11T21:38:06.7765074Z tmp46 = tmp20 - tmp45 2023-01-11T21:38:06.7765149Z tmp47 = tmp32 * tmp46 2023-01-11T21:38:06.7765227Z tmp48 = tmp43 * tmp45 2023-01-11T21:38:06.7765306Z tmp49 = tmp47 + tmp48 2023-01-11T21:38:06.7765395Z tmp50 = tmp49.to(tl.float32) 2023-01-11T21:38:06.7765531Z tl.store(out_ptr2 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp50, xmask) 2023-01-11T21:38:06.7765616Z ''') 2023-01-11T21:38:06.7765622Z 2023-01-11T21:38:06.7765626Z 2023-01-11T21:38:06.7765989Z triton_fused_add_5_add_6_arange_2_arange_3_convert_element_type_11_convert_element_type_6_convert_element_type_7_convert_element_type_9_floor_2_floor_3_1 = async_compile.triton(''' 2023-01-11T21:38:06.7766066Z import triton 2023-01-11T21:38:06.7766182Z import triton.language as tl 2023-01-11T21:38:06.7766301Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7766404Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7766538Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7766669Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7766676Z 2023-01-11T21:38:06.7767088Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7767161Z @triton.jit 2023-01-11T21:38:06.7767295Z def triton_(in_ptr0, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7767363Z xnumel = 44992 2023-01-11T21:38:06.7767465Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7767599Z 
xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7767687Z xmask = xindex < xnumel 2023-01-11T21:38:06.7767772Z x1 = (xindex // 76) % 74 2023-01-11T21:38:06.7767848Z x0 = xindex % 76 2023-01-11T21:38:06.7767927Z x2 = (xindex // 5624) 2023-01-11T21:38:06.7767990Z x4 = xindex 2023-01-11T21:38:06.7768064Z tmp0 = x1 2023-01-11T21:38:06.7768145Z tmp1 = 0.4931506849315068 2023-01-11T21:38:06.7768228Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7768330Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.7768416Z tmp4 = tmp3.to(tl.int32) 2023-01-11T21:38:06.7768481Z tmp5 = x0 2023-01-11T21:38:06.7768562Z tmp6 = 0.49333333333333335 2023-01-11T21:38:06.7768641Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.7768739Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.7768823Z tmp9 = tmp8.to(tl.int32) 2023-01-11T21:38:06.7769075Z tmp10 = tl.load(in_ptr0 + (tmp9 + (38*tmp4) + (1406*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7769166Z tmp11 = tmp10.to(tl.float32) 2023-01-11T21:38:06.7769235Z tmp12 = 1.0 2023-01-11T21:38:06.7769323Z tmp13 = tmp4.to(tl.float32) 2023-01-11T21:38:06.7769435Z tmp14 = tmp2 - tmp13 2023-01-11T21:38:06.7769551Z tmp15 = tmp12 - tmp14 2023-01-11T21:38:06.7769632Z tmp16 = tmp11 * tmp15 2023-01-11T21:38:06.7769729Z tmp17 = tl.libdevice.ceil(tmp2) 2023-01-11T21:38:06.7769801Z tmp18 = 36.0 2023-01-11T21:38:06.7769965Z tmp19 = tl.where(tmp17 != tmp17, tmp17, tl.where(tmp17 < tmp18, tmp17, tmp18)) 2023-01-11T21:38:06.7770054Z tmp20 = tmp19.to(tl.int32) 2023-01-11T21:38:06.7770302Z tmp21 = tl.load(in_ptr0 + (tmp9 + (38*tmp20) + (1406*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7770391Z tmp22 = tmp21.to(tl.float32) 2023-01-11T21:38:06.7770474Z tmp23 = tmp22 * tmp14 2023-01-11T21:38:06.7770555Z tmp24 = tmp16 + tmp23 2023-01-11T21:38:06.7770642Z tmp25 = tmp9.to(tl.float32) 2023-01-11T21:38:06.7770746Z tmp26 = tmp7 - tmp25 2023-01-11T21:38:06.7770859Z tmp27 = tmp12 - tmp26 2023-01-11T21:38:06.7770943Z tmp28 = tmp24 * tmp27 2023-01-11T21:38:06.7771040Z tmp29 = tl.libdevice.ceil(tmp7) 2023-01-11T21:38:06.7771113Z tmp30 = 37.0 2023-01-11T21:38:06.7771256Z tmp31 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 < tmp30, tmp29, tmp30)) 2023-01-11T21:38:06.7771343Z tmp32 = tmp31.to(tl.int32) 2023-01-11T21:38:06.7771478Z tmp33 = tl.load(in_ptr0 + (tmp32 + (38*tmp4) + (1406*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7771566Z tmp34 = tmp33.to(tl.float32) 2023-01-11T21:38:06.7771646Z tmp35 = tmp34 * tmp15 2023-01-11T21:38:06.7771781Z tmp36 = tl.load(in_ptr0 + (tmp32 + (38*tmp20) + (1406*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7771868Z tmp37 = tmp36.to(tl.float32) 2023-01-11T21:38:06.7771948Z tmp38 = tmp37 * tmp14 2023-01-11T21:38:06.7772030Z tmp39 = tmp35 + tmp38 2023-01-11T21:38:06.7772102Z tmp40 = tmp39 * tmp26 2023-01-11T21:38:06.7772183Z tmp41 = tmp28 + tmp40 2023-01-11T21:38:06.7772272Z tmp42 = tmp41.to(tl.float32) 2023-01-11T21:38:06.7772441Z tl.store(out_ptr2 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp42, xmask) 2023-01-11T21:38:06.7772527Z ''') 2023-01-11T21:38:06.7772533Z 2023-01-11T21:38:06.7772537Z 2023-01-11T21:38:06.7772632Z async_compile.wait(globals()) 2023-01-11T21:38:06.7772711Z del async_compile 2023-01-11T21:38:06.7772716Z 2023-01-11T21:38:06.7772793Z def call(args): 2023-01-11T21:38:06.7772859Z arg0_1, = args 2023-01-11T21:38:06.7772933Z args.clear() 2023-01-11T21:38:06.7773024Z with torch.cuda.device(0): 2023-01-11T21:38:06.7773255Z buf2 = empty_strided((2, 4, 45, 45), (8100, 2025, 
45, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7773350Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7773591Z triton_fused_add_add_2_add_3_arange_convert_element_type_convert_element_type_1_convert_element_type_5_floor_index_index_1_0.run(arg0_1, buf2, 16200, grid=grid(16200), stream=stream0) 2023-01-11T21:38:06.7773819Z buf5 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7774084Z triton_fused_add_5_add_6_arange_2_arange_3_convert_element_type_11_convert_element_type_6_convert_element_type_7_convert_element_type_9_floor_2_floor_3_1.run(arg0_1, buf5, 44992, grid=grid(44992), stream=stream0) 2023-01-11T21:38:06.7774152Z del arg0_1 2023-01-11T21:38:06.7774239Z return (buf2, buf5, ) 2023-01-11T21:38:06.7774245Z 2023-01-11T21:38:06.7774249Z 2023-01-11T21:38:06.7774329Z if __name__ == "__main__": 2023-01-11T21:38:06.7774452Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7774819Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7775053Z arg0_1 = rand_strided((2, 4, 37, 38), (5624, 1406, 38, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7775165Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7775170Z 2023-01-11T21:38:06.7775240Z ok (2.030s) 2023-01-11T21:38:06.7775809Z test_upsample_bilinear2d_b_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7775946Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7776201Z [2023-01-11 21:36:13,857] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 933 2023-01-11T21:38:06.7776441Z [2023-01-11 21:36:13,926] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7776672Z [2023-01-11 21:36:13,941] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7776881Z [2023-01-11 21:36:13,949] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.7777198Z [2023-01-11 21:36:14,144] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 933 2023-01-11T21:38:06.7777631Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7777764Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7778018Z [2023-01-11 21:36:14,452] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 934 2023-01-11T21:38:06.7778259Z [2023-01-11 21:36:14,515] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7778491Z [2023-01-11 21:36:14,524] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7778747Z [2023-01-11 21:36:14,532] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.7778954Z [2023-01-11 21:36:14,532] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.7779221Z [2023-01-11 21:36:14,729] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 934 2023-01-11T21:38:06.7779229Z 2023-01-11T21:38:06.7779330Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7779411Z import torch 2023-01-11T21:38:06.7779489Z import random 2023-01-11T21:38:06.7779612Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7779731Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7779745Z 2023-01-11T21:38:06.7779823Z aten = torch.ops.aten 2023-01-11T21:38:06.7779965Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7780064Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7780069Z 2023-01-11T21:38:06.7780146Z import triton 2023-01-11T21:38:06.7780245Z import triton.language as tl 2023-01-11T21:38:06.7780373Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7780515Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7780521Z 2023-01-11T21:38:06.7780525Z 2023-01-11T21:38:06.7780830Z triton_fused_add_add_1_add_2_arange_arange_1_convert_element_type_convert_element_type_2_floor_floor_1_index_0 = async_compile.triton(''' 2023-01-11T21:38:06.7780902Z import triton 2023-01-11T21:38:06.7780997Z import triton.language as tl 2023-01-11T21:38:06.7781117Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7781222Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7781358Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7781487Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7781493Z 2023-01-11T21:38:06.7781916Z @pointwise(size_hints=[32768], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7781998Z @triton.jit 2023-01-11T21:38:06.7782138Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7782218Z xnumel = 18880 2023-01-11T21:38:06.7782353Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7782492Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7782580Z xmask = xindex < xnumel 2023-01-11T21:38:06.7782666Z x1 = (xindex // 118) % 80 2023-01-11T21:38:06.7782746Z x0 = xindex % 118 2023-01-11T21:38:06.7782820Z x2 = (xindex // 9440) 2023-01-11T21:38:06.7782891Z x4 = xindex 2023-01-11T21:38:06.7782962Z tmp0 = x1 2023-01-11T21:38:06.7783041Z tmp1 = 0.4936708860759494 2023-01-11T21:38:06.7783121Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7783226Z tmp3 = 
tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.7783314Z tmp4 = tmp3.to(tl.int32) 2023-01-11T21:38:06.7783382Z tmp5 = x0 2023-01-11T21:38:06.7783465Z tmp6 = 0.49572649572649574 2023-01-11T21:38:06.7783545Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.7783647Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.7783735Z tmp9 = tmp8.to(tl.int32) 2023-01-11T21:38:06.7784000Z tmp10 = tl.load(in_ptr0 + (tmp9 + (59*tmp4) + (2360*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7784068Z tmp11 = 1.0 2023-01-11T21:38:06.7784162Z tmp12 = tmp4.to(tl.float32) 2023-01-11T21:38:06.7784282Z tmp13 = tmp2 - tmp12 2023-01-11T21:38:06.7784406Z tmp14 = tmp11 - tmp13 2023-01-11T21:38:06.7784487Z tmp15 = tmp10 * tmp14 2023-01-11T21:38:06.7784590Z tmp16 = tl.libdevice.ceil(tmp2) 2023-01-11T21:38:06.7784665Z tmp17 = 39.0 2023-01-11T21:38:06.7784810Z tmp18 = tl.where(tmp16 != tmp16, tmp16, tl.where(tmp16 < tmp17, tmp16, tmp17)) 2023-01-11T21:38:06.7784900Z tmp19 = tmp18.to(tl.int32) 2023-01-11T21:38:06.7785160Z tmp20 = tl.load(in_ptr0 + (tmp9 + (59*tmp19) + (2360*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7785273Z tmp21 = tmp20 * tmp13 2023-01-11T21:38:06.7785353Z tmp22 = tmp15 + tmp21 2023-01-11T21:38:06.7785443Z tmp23 = tmp9.to(tl.float32) 2023-01-11T21:38:06.7785562Z tmp24 = tmp7 - tmp23 2023-01-11T21:38:06.7785675Z tmp25 = tmp11 - tmp24 2023-01-11T21:38:06.7785759Z tmp26 = tmp22 * tmp25 2023-01-11T21:38:06.7785861Z tmp27 = tl.libdevice.ceil(tmp7) 2023-01-11T21:38:06.7785935Z tmp28 = 58.0 2023-01-11T21:38:06.7786088Z tmp29 = tl.where(tmp27 != tmp27, tmp27, tl.where(tmp27 < tmp28, tmp27, tmp28)) 2023-01-11T21:38:06.7786178Z tmp30 = tmp29.to(tl.int32) 2023-01-11T21:38:06.7786310Z tmp31 = tl.load(in_ptr0 + (tmp30 + (59*tmp4) + (2360*x2)), xmask) 2023-01-11T21:38:06.7786386Z tmp32 = tmp31 * tmp14 2023-01-11T21:38:06.7786517Z tmp33 = tl.load(in_ptr0 + (tmp30 + (59*tmp19) + (2360*x2)), xmask) 2023-01-11T21:38:06.7786597Z tmp34 = tmp33 * tmp13 2023-01-11T21:38:06.7786682Z tmp35 = tmp32 + tmp34 2023-01-11T21:38:06.7786763Z tmp36 = tmp35 * tmp24 2023-01-11T21:38:06.7786844Z tmp37 = tmp26 + tmp36 2023-01-11T21:38:06.7786995Z tl.store(in_out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp37, xmask) 2023-01-11T21:38:06.7787076Z ''') 2023-01-11T21:38:06.7787089Z 2023-01-11T21:38:06.7787093Z 2023-01-11T21:38:06.7787187Z async_compile.wait(globals()) 2023-01-11T21:38:06.7787266Z del async_compile 2023-01-11T21:38:06.7787271Z 2023-01-11T21:38:06.7787346Z def call(args): 2023-01-11T21:38:06.7787421Z arg0_1, = args 2023-01-11T21:38:06.7787498Z args.clear() 2023-01-11T21:38:06.7787594Z with torch.cuda.device(0): 2023-01-11T21:38:06.7787860Z buf0 = empty_strided((1, 2, 80, 118), (18880, 9440, 118, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7787944Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:06.7788036Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7788262Z triton_fused_add_add_1_add_2_arange_arange_1_convert_element_type_convert_element_type_2_floor_floor_1_index_0.run(buf2, arg0_1, 18880, grid=grid(18880), stream=stream0) 2023-01-11T21:38:06.7788341Z del arg0_1 2023-01-11T21:38:06.7788420Z return (buf2, ) 2023-01-11T21:38:06.7788425Z 2023-01-11T21:38:06.7788429Z 2023-01-11T21:38:06.7788509Z if __name__ == "__main__": 2023-01-11T21:38:06.7788654Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7788784Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7789004Z arg0_1 = rand_strided((1, 2, 40, 59), (4720, 2360, 59, 1), device='cuda:0', 
dtype=torch.float32) 2023-01-11T21:38:06.7789115Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7789120Z 2023-01-11T21:38:06.7789125Z 2023-01-11T21:38:06.7789223Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7789299Z import torch 2023-01-11T21:38:06.7789373Z import random 2023-01-11T21:38:06.7789491Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7789621Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7789626Z 2023-01-11T21:38:06.7789710Z aten = torch.ops.aten 2023-01-11T21:38:06.7789839Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7789936Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7789941Z 2023-01-11T21:38:06.7790015Z import triton 2023-01-11T21:38:06.7790112Z import triton.language as tl 2023-01-11T21:38:06.7790238Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7790377Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7790383Z 2023-01-11T21:38:06.7790387Z 2023-01-11T21:38:06.7790741Z triton_fused_add_add_1_arange_arange_1_convert_element_type_convert_element_type_1_convert_element_type_3_convert_element_type_5_floor_floor_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7790817Z import triton 2023-01-11T21:38:06.7790902Z import triton.language as tl 2023-01-11T21:38:06.7791017Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7791152Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7791287Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7791412Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7791418Z 2023-01-11T21:38:06.7791830Z @pointwise(size_hints=[32768], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7791906Z @triton.jit 2023-01-11T21:38:06.7792040Z def triton_(in_ptr0, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7792108Z xnumel = 18880 2023-01-11T21:38:06.7792204Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7792334Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7792419Z xmask = xindex < xnumel 2023-01-11T21:38:06.7792501Z x1 = (xindex // 118) % 80 2023-01-11T21:38:06.7792584Z x0 = xindex % 118 2023-01-11T21:38:06.7792664Z x2 = (xindex // 9440) 2023-01-11T21:38:06.7792728Z x4 = xindex 2023-01-11T21:38:06.7792798Z tmp0 = x1 2023-01-11T21:38:06.7792876Z tmp1 = 0.4936708860759494 2023-01-11T21:38:06.7792956Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7793055Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.7793144Z tmp4 = tmp3.to(tl.int32) 2023-01-11T21:38:06.7793215Z tmp5 = x0 2023-01-11T21:38:06.7793287Z tmp6 = 0.49572649572649574 2023-01-11T21:38:06.7793366Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.7793465Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.7793550Z tmp9 = tmp8.to(tl.int32) 2023-01-11T21:38:06.7793804Z tmp10 = tl.load(in_ptr0 + (tmp9 + (59*tmp4) + (2360*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7793896Z tmp11 = tmp10.to(tl.float32) 2023-01-11T21:38:06.7793968Z tmp12 = 1.0 2023-01-11T21:38:06.7794048Z tmp13 = tmp4.to(tl.float32) 2023-01-11T21:38:06.7794163Z tmp14 = tmp2 - tmp13 2023-01-11T21:38:06.7794277Z tmp15 = tmp12 - tmp14 2023-01-11T21:38:06.7794356Z tmp16 = tmp11 * tmp15 2023-01-11T21:38:06.7794454Z tmp17 = 
tl.libdevice.ceil(tmp2) 2023-01-11T21:38:06.7794526Z tmp18 = 39.0 2023-01-11T21:38:06.7794667Z tmp19 = tl.where(tmp17 != tmp17, tmp17, tl.where(tmp17 < tmp18, tmp17, tmp18)) 2023-01-11T21:38:06.7794774Z tmp20 = tmp19.to(tl.int32) 2023-01-11T21:38:06.7795026Z tmp21 = tl.load(in_ptr0 + (tmp9 + (59*tmp20) + (2360*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7795116Z tmp22 = tmp21.to(tl.float32) 2023-01-11T21:38:06.7795196Z tmp23 = tmp22 * tmp14 2023-01-11T21:38:06.7795276Z tmp24 = tmp16 + tmp23 2023-01-11T21:38:06.7795364Z tmp25 = tmp9.to(tl.float32) 2023-01-11T21:38:06.7795477Z tmp26 = tmp7 - tmp25 2023-01-11T21:38:06.7795582Z tmp27 = tmp12 - tmp26 2023-01-11T21:38:06.7795662Z tmp28 = tmp24 * tmp27 2023-01-11T21:38:06.7795759Z tmp29 = tl.libdevice.ceil(tmp7) 2023-01-11T21:38:06.7795837Z tmp30 = 58.0 2023-01-11T21:38:06.7795981Z tmp31 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 < tmp30, tmp29, tmp30)) 2023-01-11T21:38:06.7796067Z tmp32 = tmp31.to(tl.int32) 2023-01-11T21:38:06.7796207Z tmp33 = tl.load(in_ptr0 + (tmp32 + (59*tmp4) + (2360*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7796290Z tmp34 = tmp33.to(tl.float32) 2023-01-11T21:38:06.7796369Z tmp35 = tmp34 * tmp15 2023-01-11T21:38:06.7796505Z tmp36 = tl.load(in_ptr0 + (tmp32 + (59*tmp20) + (2360*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7796593Z tmp37 = tmp36.to(tl.float32) 2023-01-11T21:38:06.7796673Z tmp38 = tmp37 * tmp14 2023-01-11T21:38:06.7796752Z tmp39 = tmp35 + tmp38 2023-01-11T21:38:06.7796832Z tmp40 = tmp39 * tmp26 2023-01-11T21:38:06.7796903Z tmp41 = tmp28 + tmp40 2023-01-11T21:38:06.7796992Z tmp42 = tmp41.to(tl.float32) 2023-01-11T21:38:06.7797128Z tl.store(out_ptr2 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp42, xmask) 2023-01-11T21:38:06.7797277Z ''') 2023-01-11T21:38:06.7797282Z 2023-01-11T21:38:06.7797287Z 2023-01-11T21:38:06.7797382Z async_compile.wait(globals()) 2023-01-11T21:38:06.7797460Z del async_compile 2023-01-11T21:38:06.7797466Z 2023-01-11T21:38:06.7797540Z def call(args): 2023-01-11T21:38:06.7797605Z arg0_1, = args 2023-01-11T21:38:06.7797680Z args.clear() 2023-01-11T21:38:06.7797775Z with torch.cuda.device(0): 2023-01-11T21:38:06.7798010Z buf2 = empty_strided((1, 2, 80, 118), (18880, 9440, 118, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7798103Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7798361Z triton_fused_add_add_1_arange_arange_1_convert_element_type_convert_element_type_1_convert_element_type_3_convert_element_type_5_floor_floor_1_0.run(arg0_1, buf2, 18880, grid=grid(18880), stream=stream0) 2023-01-11T21:38:06.7798437Z del arg0_1 2023-01-11T21:38:06.7798515Z return (buf2, ) 2023-01-11T21:38:06.7798520Z 2023-01-11T21:38:06.7798527Z 2023-01-11T21:38:06.7798600Z if __name__ == "__main__": 2023-01-11T21:38:06.7798720Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7798848Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7799077Z arg0_1 = rand_strided((1, 2, 40, 59), (4720, 2360, 59, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7799194Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7799200Z 2023-01-11T21:38:06.7799270Z ok (1.149s) 2023-01-11T21:38:06.7799737Z test_upsample_nearest1d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7799869Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7800133Z [2023-01-11 21:36:15,012] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 935 2023-01-11T21:38:06.7800396Z [2023-01-11 21:36:15,143] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 935 2023-01-11T21:38:06.7800831Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7800964Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7801223Z [2023-01-11 21:36:15,538] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 936 2023-01-11T21:38:06.7801228Z 2023-01-11T21:38:06.7801326Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7801406Z import torch 2023-01-11T21:38:06.7801483Z import random 2023-01-11T21:38:06.7801603Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7801731Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7801736Z 2023-01-11T21:38:06.7801823Z aten = torch.ops.aten 2023-01-11T21:38:06.7801956Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7802053Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7802058Z 2023-01-11T21:38:06.7802132Z import triton 2023-01-11T21:38:06.7802227Z import triton.language as tl 2023-01-11T21:38:06.7802351Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7802491Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7802497Z 2023-01-11T21:38:06.7802501Z 2023-01-11T21:38:06.7802721Z triton_fused_upsample_nearest1d_upsample_nearest1d_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.7802798Z import triton 2023-01-11T21:38:06.7802911Z import triton.language as tl 2023-01-11T21:38:06.7803028Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7803129Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7803264Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7803391Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7803399Z 2023-01-11T21:38:06.7803826Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7803901Z @triton.jit 2023-01-11T21:38:06.7804045Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7804113Z xnumel = 592 2023-01-11T21:38:06.7804210Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7804340Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7804431Z xmask = xindex < xnumel 2023-01-11T21:38:06.7804508Z x0 = xindex % 74 2023-01-11T21:38:06.7804589Z x1 = (xindex // 74) 2023-01-11T21:38:06.7804661Z x2 = xindex 2023-01-11T21:38:06.7804725Z tmp0 = x0 2023-01-11T21:38:06.7804799Z tmp1 = 0.5 2023-01-11T21:38:06.7804878Z tmp2 = tmp0 * tmp1 
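# Nearest-neighbor source index: scale the output coordinate x0 by
# in_size/out_size (37/74 = 0.5), then truncate to int32 below to select
# the input element. The two loads that follow read the same address; the
# fused graph produces two identical outputs (out_ptr0 and out_ptr1).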
2023-01-11T21:38:06.7804967Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7805175Z tmp4 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7805284Z tmp5 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask) 2023-01-11T21:38:06.7805420Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7805546Z tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7805630Z ''') 2023-01-11T21:38:06.7805636Z 2023-01-11T21:38:06.7805640Z 2023-01-11T21:38:06.7805827Z triton_fused_upsample_nearest1d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7805902Z import triton 2023-01-11T21:38:06.7806000Z import triton.language as tl 2023-01-11T21:38:06.7806115Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7806217Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7806342Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7806468Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7806502Z 2023-01-11T21:38:06.7806911Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7806987Z @triton.jit 2023-01-11T21:38:06.7807120Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7807196Z xnumel = 560 2023-01-11T21:38:06.7807293Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7807422Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7807501Z xmask = xindex < xnumel 2023-01-11T21:38:06.7807577Z x0 = xindex % 70 2023-01-11T21:38:06.7807656Z x1 = (xindex // 70) 2023-01-11T21:38:06.7807727Z x2 = xindex 2023-01-11T21:38:06.7807797Z tmp0 = x0 2023-01-11T21:38:06.7807878Z tmp1 = 0.5285714285714286 2023-01-11T21:38:06.7807957Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7808038Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7808244Z tmp4 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7808378Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7808463Z ''') 2023-01-11T21:38:06.7808468Z 2023-01-11T21:38:06.7808472Z 2023-01-11T21:38:06.7808656Z triton_fused_upsample_nearest1d_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.7808730Z import triton 2023-01-11T21:38:06.7808821Z import triton.language as tl 2023-01-11T21:38:06.7808927Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7809029Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7809189Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7809316Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7809321Z 2023-01-11T21:38:06.7809723Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7809798Z @triton.jit 2023-01-11T21:38:06.7809928Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7810003Z xnumel = 360 2023-01-11T21:38:06.7810091Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7810221Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7810304Z xmask = xindex < xnumel 
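# xmask disables lanes whose flat index falls past xnumel, so a partial
# final block neither reads nor writes out of bounds.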
2023-01-11T21:38:06.7810378Z x0 = xindex % 45 2023-01-11T21:38:06.7810457Z x1 = (xindex // 45) 2023-01-11T21:38:06.7810530Z x2 = xindex 2023-01-11T21:38:06.7810605Z tmp0 = x0 2023-01-11T21:38:06.7810677Z tmp1 = 0.8222222222222222 2023-01-11T21:38:06.7810757Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7810843Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7811050Z tmp4 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7811187Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7811272Z ''') 2023-01-11T21:38:06.7811277Z 2023-01-11T21:38:06.7811281Z 2023-01-11T21:38:06.7811464Z triton_fused_upsample_nearest1d_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.7811541Z import triton 2023-01-11T21:38:06.7811626Z import triton.language as tl 2023-01-11T21:38:06.7811741Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7811842Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7811975Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7812099Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7812107Z 2023-01-11T21:38:06.7812513Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7812589Z @triton.jit 2023-01-11T21:38:06.7812748Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7812817Z xnumel = 288 2023-01-11T21:38:06.7812914Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7813043Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7813126Z xmask = xindex < xnumel 2023-01-11T21:38:06.7813204Z x0 = xindex % 36 2023-01-11T21:38:06.7813283Z x1 = (xindex // 36) 2023-01-11T21:38:06.7813355Z x2 = xindex 2023-01-11T21:38:06.7813419Z tmp0 = x0 2023-01-11T21:38:06.7813498Z tmp1 = 1.0277777777777777 2023-01-11T21:38:06.7813576Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7813666Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7813777Z tmp4 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask) 2023-01-11T21:38:06.7813912Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7813999Z ''') 2023-01-11T21:38:06.7814005Z 2023-01-11T21:38:06.7814009Z 2023-01-11T21:38:06.7814097Z async_compile.wait(globals()) 2023-01-11T21:38:06.7814174Z del async_compile 2023-01-11T21:38:06.7814181Z 2023-01-11T21:38:06.7814256Z def call(args): 2023-01-11T21:38:06.7814331Z arg0_1, = args 2023-01-11T21:38:06.7814407Z args.clear() 2023-01-11T21:38:06.7814704Z with torch.cuda.device(0): 2023-01-11T21:38:06.7814926Z buf0 = empty_strided((2, 4, 74), (296, 74, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7815129Z buf4 = empty_strided((2, 4, 74), (296, 74, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7815224Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7815410Z triton_fused_upsample_nearest1d_upsample_nearest1d_4_0.run(arg0_1, buf0, buf4, 592, grid=grid(592), stream=stream0) 2023-01-11T21:38:06.7815667Z buf1 = empty_strided((2, 4, 70), (280, 70, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7815828Z triton_fused_upsample_nearest1d_1_1.run(arg0_1, buf1, 560, grid=grid(560), stream=stream0) 2023-01-11T21:38:06.7816041Z buf2 = empty_strided((2, 4, 45), (180, 45, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7816200Z 
triton_fused_upsample_nearest1d_2_2.run(arg0_1, buf2, 360, grid=grid(360), stream=stream0) 2023-01-11T21:38:06.7816404Z buf3 = empty_strided((2, 4, 36), (144, 36, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7816561Z triton_fused_upsample_nearest1d_3_3.run(arg0_1, buf3, 288, grid=grid(288), stream=stream0) 2023-01-11T21:38:06.7816630Z del arg0_1 2023-01-11T21:38:06.7816737Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.7816742Z 2023-01-11T21:38:06.7816751Z 2023-01-11T21:38:06.7816837Z if __name__ == "__main__": 2023-01-11T21:38:06.7816958Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7817088Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7817367Z arg0_1 = rand_strided((2, 4, 37), (148, 37, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7817485Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7817492Z 2023-01-11T21:38:06.7817759Z [2023-01-11 21:36:15,656] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 936 2023-01-11T21:38:06.7817765Z 2023-01-11T21:38:06.7817858Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7817935Z import torch 2023-01-11T21:38:06.7818012Z import random 2023-01-11T21:38:06.7818134Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7818263Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7818268Z 2023-01-11T21:38:06.7818353Z aten = torch.ops.aten 2023-01-11T21:38:06.7818496Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7818599Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7818604Z 2023-01-11T21:38:06.7818674Z import triton 2023-01-11T21:38:06.7818771Z import triton.language as tl 2023-01-11T21:38:06.7818898Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7819079Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7819085Z 2023-01-11T21:38:06.7819090Z 2023-01-11T21:38:06.7819316Z triton_fused_upsample_nearest1d_upsample_nearest1d_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.7819393Z import triton 2023-01-11T21:38:06.7819493Z import triton.language as tl 2023-01-11T21:38:06.7819610Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7819708Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7819845Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7819972Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7819981Z 2023-01-11T21:38:06.7820404Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7820480Z @triton.jit 2023-01-11T21:38:06.7820629Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7820706Z xnumel = 592 2023-01-11T21:38:06.7820808Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7820934Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7821024Z xmask = xindex < xnumel 2023-01-11T21:38:06.7821103Z x0 = xindex % 74 2023-01-11T21:38:06.7821186Z x1 = (xindex // 74) 2023-01-11T21:38:06.7821261Z x2 = xindex 2023-01-11T21:38:06.7821334Z tmp0 = x0 2023-01-11T21:38:06.7821402Z tmp1 = 0.5 2023-01-11T21:38:06.7821483Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7821606Z tmp3 = tmp2.to(tl.int32) 
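# fp16 variant of the same nearest-1d kernel: loaded values are widened
# to fp32 via .to(tl.float32), and the store to the fp16 output pointer
# narrows them back.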
2023-01-11T21:38:06.7821841Z tmp4 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7821973Z tmp5 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask).to(tl.float32) 2023-01-11T21:38:06.7822113Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7822252Z tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7822338Z ''') 2023-01-11T21:38:06.7822344Z 2023-01-11T21:38:06.7822348Z 2023-01-11T21:38:06.7822529Z triton_fused_upsample_nearest1d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7822607Z import triton 2023-01-11T21:38:06.7822701Z import triton.language as tl 2023-01-11T21:38:06.7822819Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7822926Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7823061Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7823196Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7823201Z 2023-01-11T21:38:06.7823605Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7823678Z @triton.jit 2023-01-11T21:38:06.7823812Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7823891Z xnumel = 560 2023-01-11T21:38:06.7823992Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7824123Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7824210Z xmask = xindex < xnumel 2023-01-11T21:38:06.7824288Z x0 = xindex % 70 2023-01-11T21:38:06.7824363Z x1 = (xindex // 70) 2023-01-11T21:38:06.7824436Z x2 = xindex 2023-01-11T21:38:06.7824510Z tmp0 = x0 2023-01-11T21:38:06.7824593Z tmp1 = 0.5285714285714286 2023-01-11T21:38:06.7824677Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7824765Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7824996Z tmp4 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7825125Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7825211Z ''') 2023-01-11T21:38:06.7825246Z 2023-01-11T21:38:06.7825251Z 2023-01-11T21:38:06.7825467Z triton_fused_upsample_nearest1d_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.7825564Z import triton 2023-01-11T21:38:06.7825663Z import triton.language as tl 2023-01-11T21:38:06.7825780Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7825889Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7826023Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7826144Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7826149Z 2023-01-11T21:38:06.7826550Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7826628Z @triton.jit 2023-01-11T21:38:06.7826762Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7826841Z xnumel = 360 2023-01-11T21:38:06.7826938Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7827068Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7827156Z xmask = xindex < xnumel 2023-01-11T21:38:06.7827227Z 
x0 = xindex % 45 2023-01-11T21:38:06.7827309Z x1 = (xindex // 45) 2023-01-11T21:38:06.7827384Z x2 = xindex 2023-01-11T21:38:06.7827459Z tmp0 = x0 2023-01-11T21:38:06.7827542Z tmp1 = 0.8222222222222222 2023-01-11T21:38:06.7827625Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7827707Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7827937Z tmp4 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7828099Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7828188Z ''') 2023-01-11T21:38:06.7828193Z 2023-01-11T21:38:06.7828198Z 2023-01-11T21:38:06.7828385Z triton_fused_upsample_nearest1d_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.7828464Z import triton 2023-01-11T21:38:06.7828558Z import triton.language as tl 2023-01-11T21:38:06.7828675Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7828772Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7828909Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7829035Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7829040Z 2023-01-11T21:38:06.7829444Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7829524Z @triton.jit 2023-01-11T21:38:06.7829658Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7829736Z xnumel = 288 2023-01-11T21:38:06.7829835Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7829962Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7830047Z xmask = xindex < xnumel 2023-01-11T21:38:06.7830125Z x0 = xindex % 36 2023-01-11T21:38:06.7830206Z x1 = (xindex // 36) 2023-01-11T21:38:06.7830280Z x2 = xindex 2023-01-11T21:38:06.7830355Z tmp0 = x0 2023-01-11T21:38:06.7830437Z tmp1 = 1.0277777777777777 2023-01-11T21:38:06.7830511Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7830599Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7830729Z tmp4 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask).to(tl.float32) 2023-01-11T21:38:06.7830867Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7830958Z ''') 2023-01-11T21:38:06.7830963Z 2023-01-11T21:38:06.7830967Z 2023-01-11T21:38:06.7831064Z async_compile.wait(globals()) 2023-01-11T21:38:06.7831147Z del async_compile 2023-01-11T21:38:06.7831152Z 2023-01-11T21:38:06.7831222Z def call(args): 2023-01-11T21:38:06.7831297Z arg0_1, = args 2023-01-11T21:38:06.7831374Z args.clear() 2023-01-11T21:38:06.7831496Z with torch.cuda.device(0): 2023-01-11T21:38:06.7831708Z buf0 = empty_strided((2, 4, 74), (296, 74, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7831913Z buf4 = empty_strided((2, 4, 74), (296, 74, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7832006Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7832193Z triton_fused_upsample_nearest1d_upsample_nearest1d_4_0.run(arg0_1, buf0, buf4, 592, grid=grid(592), stream=stream0) 2023-01-11T21:38:06.7832389Z buf1 = empty_strided((2, 4, 70), (280, 70, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7832546Z triton_fused_upsample_nearest1d_1_1.run(arg0_1, buf1, 560, grid=grid(560), stream=stream0) 2023-01-11T21:38:06.7832752Z buf2 = empty_strided((2, 4, 45), (180, 45, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7832907Z 
triton_fused_upsample_nearest1d_2_2.run(arg0_1, buf2, 360, grid=grid(360), stream=stream0) 2023-01-11T21:38:06.7833110Z buf3 = empty_strided((2, 4, 36), (144, 36, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7833261Z triton_fused_upsample_nearest1d_3_3.run(arg0_1, buf3, 288, grid=grid(288), stream=stream0) 2023-01-11T21:38:06.7833338Z del arg0_1 2023-01-11T21:38:06.7833441Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.7833447Z 2023-01-11T21:38:06.7833451Z 2023-01-11T21:38:06.7833524Z if __name__ == "__main__": 2023-01-11T21:38:06.7833641Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7833767Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7833977Z arg0_1 = rand_strided((2, 4, 37), (148, 37, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7834118Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7834123Z 2023-01-11T21:38:06.7834194Z ok (0.928s) 2023-01-11T21:38:06.7834672Z test_upsample_nearest2d_backward_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7834803Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7835062Z [2023-01-11 21:36:15,683] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 937 2023-01-11T21:38:06.7835069Z 2023-01-11T21:38:06.7835166Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7835237Z import torch 2023-01-11T21:38:06.7835313Z import random 2023-01-11T21:38:06.7835434Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7835557Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7835562Z 2023-01-11T21:38:06.7835646Z aten = torch.ops.aten 2023-01-11T21:38:06.7835788Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7835884Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7835889Z 2023-01-11T21:38:06.7835956Z import triton 2023-01-11T21:38:06.7836049Z import triton.language as tl 2023-01-11T21:38:06.7836175Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7836313Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7836318Z 2023-01-11T21:38:06.7836323Z 2023-01-11T21:38:06.7836527Z triton_fused_upsample_nearest2d_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.7836603Z import triton 2023-01-11T21:38:06.7836696Z import triton.language as tl 2023-01-11T21:38:06.7836812Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7836906Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7837037Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7837162Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7837167Z 2023-01-11T21:38:06.7837595Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7837670Z @triton.jit 2023-01-11T21:38:06.7837802Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7837876Z xnumel 
= 162 2023-01-11T21:38:06.7837972Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7838093Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7838175Z xmask = xindex < xnumel 2023-01-11T21:38:06.7838253Z x0 = xindex % 6 2023-01-11T21:38:06.7838331Z x1 = (xindex // 6) 2023-01-11T21:38:06.7838401Z x2 = xindex 2023-01-11T21:38:06.7838608Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (24*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7838819Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (24*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7839020Z tmp3 = tl.load(in_ptr0 + (12 + (2*x0) + (24*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7839227Z tmp5 = tl.load(in_ptr0 + (13 + (2*x0) + (24*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7839307Z tmp2 = tmp1 + tmp0 2023-01-11T21:38:06.7839386Z tmp4 = tmp3 + tmp2 2023-01-11T21:38:06.7839465Z tmp6 = tmp5 + tmp4 2023-01-11T21:38:06.7839539Z tmp7 = 1.0 2023-01-11T21:38:06.7839616Z tmp8 = tmp6 * tmp7 2023-01-11T21:38:06.7839743Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7839831Z ''') 2023-01-11T21:38:06.7839863Z 2023-01-11T21:38:06.7839867Z 2023-01-11T21:38:06.7840069Z triton_fused_upsample_nearest2d_backward_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7840144Z import triton 2023-01-11T21:38:06.7840237Z import triton.language as tl 2023-01-11T21:38:06.7840353Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7840457Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7840589Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7840707Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7840712Z 2023-01-11T21:38:06.7841114Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7841186Z @triton.jit 2023-01-11T21:38:06.7841318Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7841393Z xnumel = 180 2023-01-11T21:38:06.7841494Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7841622Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7841704Z xmask = xindex < xnumel 2023-01-11T21:38:06.7841779Z x1 = (xindex // 5) % 4 2023-01-11T21:38:06.7841853Z x0 = xindex % 5 2023-01-11T21:38:06.7841934Z x2 = (xindex // 20) 2023-01-11T21:38:06.7842008Z x4 = xindex 2023-01-11T21:38:06.7842089Z tmp0 = ((3 + (6*x1)) // 4) 2023-01-11T21:38:06.7842171Z tmp1 = ((9 + (6*x1)) // 4) 2023-01-11T21:38:06.7842243Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7842327Z tmp3 = ((4 + (12*x0)) // 5) 2023-01-11T21:38:06.7842409Z tmp4 = ((16 + (12*x0)) // 5) 2023-01-11T21:38:06.7842490Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.7842567Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.7842888Z tmp7 = tl.load(in_ptr0 + ((12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7842989Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.7843073Z tmp9 = 1 + (((4 + (12*x0)) // 5)) 2023-01-11T21:38:06.7843146Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.7843226Z tmp11 = tmp2 & tmp10 2023-01-11T21:38:06.7843580Z tmp12 = tl.load(in_ptr0 + (1 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], 
tl.int32)), tmp11 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7843679Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.7843759Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.7843843Z tmp15 = 2 + (((4 + (12*x0)) // 5)) 2023-01-11T21:38:06.7843924Z tmp16 = tmp15 < tmp4 2023-01-11T21:38:06.7843997Z tmp17 = tmp2 & tmp16 2023-01-11T21:38:06.7844322Z tmp18 = tl.load(in_ptr0 + (2 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7844420Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.7844503Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.7844588Z tmp21 = 1 + (((3 + (6*x1)) // 4)) 2023-01-11T21:38:06.7844668Z tmp22 = tmp21 < tmp1 2023-01-11T21:38:06.7844748Z tmp23 = tmp22 & tmp5 2023-01-11T21:38:06.7845071Z tmp24 = tl.load(in_ptr0 + (12 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp23 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7845160Z tmp25 = tl.where(tmp23, tmp24, 0.0) 2023-01-11T21:38:06.7845240Z tmp26 = tmp25 + tmp20 2023-01-11T21:38:06.7845319Z tmp27 = tmp22 & tmp10 2023-01-11T21:38:06.7845631Z tmp28 = tl.load(in_ptr0 + (13 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7845729Z tmp29 = tl.where(tmp27, tmp28, 0.0) 2023-01-11T21:38:06.7845810Z tmp30 = tmp29 + tmp26 2023-01-11T21:38:06.7845890Z tmp31 = tmp22 & tmp16 2023-01-11T21:38:06.7846219Z tmp32 = tl.load(in_ptr0 + (14 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp31 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7846318Z tmp33 = tl.where(tmp31, tmp32, 0.0) 2023-01-11T21:38:06.7846400Z tmp34 = tmp33 + tmp30 2023-01-11T21:38:06.7846536Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp34, xmask) 2023-01-11T21:38:06.7846621Z ''') 2023-01-11T21:38:06.7846626Z 2023-01-11T21:38:06.7846631Z 2023-01-11T21:38:06.7846887Z triton_fused_upsample_nearest2d_backward_2_upsample_nearest2d_backward_3_2 = async_compile.triton(''' 2023-01-11T21:38:06.7846962Z import triton 2023-01-11T21:38:06.7847055Z import triton.language as tl 2023-01-11T21:38:06.7847164Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7847265Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7847399Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7847528Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7847533Z 2023-01-11T21:38:06.7847956Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7848031Z @triton.jit 2023-01-11T21:38:06.7848175Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7848252Z xnumel = 144 2023-01-11T21:38:06.7848341Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7848471Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7848556Z xmask = xindex < xnumel 2023-01-11T21:38:06.7848637Z x1 = (xindex // 8) % 2 2023-01-11T21:38:06.7848714Z x0 = xindex % 8 2023-01-11T21:38:06.7848791Z x3 = (xindex // 8) 2023-01-11T21:38:06.7848862Z x4 = xindex 
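# Backward of nearest upsampling expressed as a gather: each grad-input
# element sums the grad-output positions that mapped onto it. The
# integer-division bounds and `<` comparisons below mask out
# contributions that fall outside each element's window.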
2023-01-11T21:38:06.7848931Z tmp0 = 3*x1 2023-01-11T21:38:06.7849006Z tmp1 = 3 + (3*x1) 2023-01-11T21:38:06.7849086Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7849167Z tmp3 = ((7 + (12*x0)) // 8) 2023-01-11T21:38:06.7849249Z tmp4 = ((19 + (12*x0)) // 8) 2023-01-11T21:38:06.7849329Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.7849405Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.7849723Z tmp7 = tl.load(in_ptr0 + ((36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7849822Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.7849907Z tmp9 = 1 + (((7 + (12*x0)) // 8)) 2023-01-11T21:38:06.7849986Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.7850066Z tmp11 = tmp2 & tmp10 2023-01-11T21:38:06.7850363Z tmp12 = tl.load(in_ptr0 + (1 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7850457Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.7850534Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.7850609Z tmp15 = 1 + (3*x1) 2023-01-11T21:38:06.7850688Z tmp16 = tmp15 < tmp1 2023-01-11T21:38:06.7850767Z tmp17 = tmp16 & tmp5 2023-01-11T21:38:06.7851067Z tmp18 = tl.load(in_ptr0 + (12 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7851163Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.7851243Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.7851323Z tmp21 = tmp16 & tmp10 2023-01-11T21:38:06.7851609Z tmp22 = tl.load(in_ptr0 + (13 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7851704Z tmp23 = tl.where(tmp21, tmp22, 0.0) 2023-01-11T21:38:06.7851783Z tmp24 = tmp23 + tmp20 2023-01-11T21:38:06.7851858Z tmp25 = 2 + (3*x1) 2023-01-11T21:38:06.7851938Z tmp26 = tmp25 < tmp1 2023-01-11T21:38:06.7852016Z tmp27 = tmp26 & tmp5 2023-01-11T21:38:06.7852339Z tmp28 = tl.load(in_ptr0 + (24 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7852427Z tmp29 = tl.where(tmp27, tmp28, 0.0) 2023-01-11T21:38:06.7852508Z tmp30 = tmp29 + tmp24 2023-01-11T21:38:06.7852592Z tmp31 = tmp26 & tmp10 2023-01-11T21:38:06.7852879Z tmp32 = tl.load(in_ptr0 + (25 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp31 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7852973Z tmp33 = tl.where(tmp31, tmp32, 0.0) 2023-01-11T21:38:06.7853055Z tmp34 = tmp33 + tmp30 2023-01-11T21:38:06.7853213Z tmp35 = tl.load(in_ptr0 + ((36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, other=0) 2023-01-11T21:38:06.7853308Z tmp36 = tl.where(tmp6, tmp35, 0.0) 2023-01-11T21:38:06.7853464Z tmp37 = tl.load(in_ptr0 + (1 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, other=0) 2023-01-11T21:38:06.7853562Z tmp38 = tl.where(tmp11, tmp37, 0.0) 2023-01-11T21:38:06.7853643Z tmp39 = tmp38 + tmp36 2023-01-11T21:38:06.7853802Z tmp40 = tl.load(in_ptr0 + (12 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0) 2023-01-11T21:38:06.7853902Z tmp41 = tl.where(tmp17, tmp40, 0.0) 2023-01-11T21:38:06.7853982Z tmp42 = tmp41 + tmp39 2023-01-11T21:38:06.7854135Z tmp43 = tl.load(in_ptr0 + (13 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, other=0) 2023-01-11T21:38:06.7854221Z tmp44 
= tl.where(tmp21, tmp43, 0.0) 2023-01-11T21:38:06.7854302Z tmp45 = tmp44 + tmp42 2023-01-11T21:38:06.7854456Z tmp46 = tl.load(in_ptr0 + (24 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, other=0) 2023-01-11T21:38:06.7854866Z tmp47 = tl.where(tmp27, tmp46, 0.0) 2023-01-11T21:38:06.7854949Z tmp48 = tmp47 + tmp45 2023-01-11T21:38:06.7855104Z tmp49 = tl.load(in_ptr0 + (25 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp31 & xmask, other=0) 2023-01-11T21:38:06.7855197Z tmp50 = tl.where(tmp31, tmp49, 0.0) 2023-01-11T21:38:06.7855278Z tmp51 = tmp50 + tmp48 2023-01-11T21:38:06.7855406Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp34, xmask) 2023-01-11T21:38:06.7855589Z tl.store(out_ptr1 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp51, xmask) 2023-01-11T21:38:06.7855679Z ''') 2023-01-11T21:38:06.7855684Z 2023-01-11T21:38:06.7855688Z 2023-01-11T21:38:06.7855892Z triton_fused_upsample_nearest2d_backward_4_3 = async_compile.triton(''' 2023-01-11T21:38:06.7855970Z import triton 2023-01-11T21:38:06.7856062Z import triton.language as tl 2023-01-11T21:38:06.7856179Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7856281Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7856408Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7856535Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7856543Z 2023-01-11T21:38:06.7856944Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7857020Z @triton.jit 2023-01-11T21:38:06.7857203Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7857285Z xnumel = 252 2023-01-11T21:38:06.7857384Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7857515Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7857591Z xmask = xindex < xnumel 2023-01-11T21:38:06.7857675Z x1 = (xindex // 7) % 4 2023-01-11T21:38:06.7857749Z x0 = xindex % 7 2023-01-11T21:38:06.7857827Z x2 = (xindex // 28) 2023-01-11T21:38:06.7857897Z x4 = xindex 2023-01-11T21:38:06.7857978Z tmp0 = ((3 + (6*x1)) // 4) 2023-01-11T21:38:06.7858094Z tmp1 = ((9 + (6*x1)) // 4) 2023-01-11T21:38:06.7858177Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7858261Z tmp3 = ((6 + (12*x0)) // 7) 2023-01-11T21:38:06.7858345Z tmp4 = ((18 + (12*x0)) // 7) 2023-01-11T21:38:06.7858425Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.7858505Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.7858693Z tmp7 = tl.load(in_ptr0 + ((12*(((3 + (6*x1)) // 4))) + (72*x2) + (((6 + (12*x0)) // 7)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, other=0) 2023-01-11T21:38:06.7858784Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.7858870Z tmp9 = 1 + (((6 + (12*x0)) // 7)) 2023-01-11T21:38:06.7858954Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.7859037Z tmp11 = tmp2 & tmp10 2023-01-11T21:38:06.7859218Z tmp12 = tl.load(in_ptr0 + (1 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((6 + (12*x0)) // 7)) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, other=0) 2023-01-11T21:38:06.7859318Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.7859402Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.7859495Z tmp15 = 1 + (((3 + (6*x1)) // 4)) 2023-01-11T21:38:06.7859571Z tmp16 = tmp15 < tmp1 2023-01-11T21:38:06.7859652Z tmp17 = tmp16 & tmp5 2023-01-11T21:38:06.7859833Z tmp18 = 
tl.load(in_ptr0 + (12 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((6 + (12*x0)) // 7)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0) 2023-01-11T21:38:06.7859931Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.7860015Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.7860098Z tmp21 = tmp16 & tmp10 2023-01-11T21:38:06.7860277Z tmp22 = tl.load(in_ptr0 + (13 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((6 + (12*x0)) // 7)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, other=0) 2023-01-11T21:38:06.7860367Z tmp23 = tl.where(tmp21, tmp22, 0.0) 2023-01-11T21:38:06.7860450Z tmp24 = tmp23 + tmp20 2023-01-11T21:38:06.7860586Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp24, xmask) 2023-01-11T21:38:06.7860679Z ''') 2023-01-11T21:38:06.7860687Z 2023-01-11T21:38:06.7860692Z 2023-01-11T21:38:06.7860789Z async_compile.wait(globals()) 2023-01-11T21:38:06.7860869Z del async_compile 2023-01-11T21:38:06.7860876Z 2023-01-11T21:38:06.7860953Z def call(args): 2023-01-11T21:38:06.7861030Z arg0_1, = args 2023-01-11T21:38:06.7861101Z args.clear() 2023-01-11T21:38:06.7861225Z with torch.cuda.device(0): 2023-01-11T21:38:06.7861445Z buf0 = empty_strided((3, 3, 3, 6), (54, 18, 6, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7861539Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7861708Z triton_fused_upsample_nearest2d_backward_0.run(arg0_1, buf0, 162, grid=grid(162), stream=stream0) 2023-01-11T21:38:06.7861918Z buf1 = empty_strided((3, 3, 4, 5), (60, 20, 5, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7862085Z triton_fused_upsample_nearest2d_backward_1_1.run(arg0_1, buf1, 180, grid=grid(180), stream=stream0) 2023-01-11T21:38:06.7862300Z buf2 = empty_strided((3, 3, 2, 8), (48, 16, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7862502Z buf3 = empty_strided((3, 3, 2, 8), (48, 16, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7862708Z triton_fused_upsample_nearest2d_backward_2_upsample_nearest2d_backward_3_2.run(arg0_1, buf2, buf3, 144, grid=grid(144), stream=stream0) 2023-01-11T21:38:06.7862925Z buf4 = empty_strided((3, 3, 4, 7), (84, 28, 7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7863092Z triton_fused_upsample_nearest2d_backward_4_3.run(arg0_1, buf4, 252, grid=grid(252), stream=stream0) 2023-01-11T21:38:06.7863167Z del arg0_1 2023-01-11T21:38:06.7863271Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.7863276Z 2023-01-11T21:38:06.7863281Z 2023-01-11T21:38:06.7863362Z if __name__ == "__main__": 2023-01-11T21:38:06.7863483Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7863603Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7863869Z arg0_1 = rand_strided((3, 3, 6, 12), (216, 72, 12, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7863982Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7864253Z [2023-01-11 21:36:16,431] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 937 2023-01-11T21:38:06.7864674Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7864807Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7865065Z [2023-01-11 21:36:16,457] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 938 2023-01-11T21:38:06.7865071Z 2023-01-11T21:38:06.7865079Z 2023-01-11T21:38:06.7865177Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7865252Z import torch 2023-01-11T21:38:06.7865326Z import random 2023-01-11T21:38:06.7865439Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7865563Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7865568Z 2023-01-11T21:38:06.7865656Z aten = torch.ops.aten 2023-01-11T21:38:06.7865793Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7865889Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7865895Z 2023-01-11T21:38:06.7865970Z import triton 2023-01-11T21:38:06.7866064Z import triton.language as tl 2023-01-11T21:38:06.7866182Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7866321Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7866327Z 2023-01-11T21:38:06.7866331Z 2023-01-11T21:38:06.7866530Z triton_fused_upsample_nearest2d_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.7866609Z import triton 2023-01-11T21:38:06.7866701Z import triton.language as tl 2023-01-11T21:38:06.7866816Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7866917Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7867050Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7867198Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7867204Z 2023-01-11T21:38:06.7867608Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7867682Z @triton.jit 2023-01-11T21:38:06.7867819Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7867893Z xnumel = 162 2023-01-11T21:38:06.7867991Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7868121Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7868207Z xmask = xindex < xnumel 2023-01-11T21:38:06.7868274Z x0 = xindex % 6 2023-01-11T21:38:06.7868355Z x1 = (xindex // 6) 2023-01-11T21:38:06.7868427Z x2 = xindex 2023-01-11T21:38:06.7868659Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (24*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7868893Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (24*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7869126Z tmp3 = tl.load(in_ptr0 + (12 + (2*x0) + (24*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7869358Z tmp5 = tl.load(in_ptr0 + (13 + (2*x0) + (24*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7869439Z tmp2 = tmp1 + tmp0 2023-01-11T21:38:06.7869511Z tmp4 = tmp3 + tmp2 2023-01-11T21:38:06.7869589Z tmp6 = tmp5 + tmp4 2023-01-11T21:38:06.7869665Z tmp7 = 1.0 2023-01-11T21:38:06.7869742Z tmp8 = tmp6 * tmp7 2023-01-11T21:38:06.7869909Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7869993Z ''') 2023-01-11T21:38:06.7869999Z 
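# Non-integer scale case (4x5 grad-input gathering from a 6x12
# grad-output): the source window per grad-input cell is not constant,
# so the kernel below derives per-element start/end bounds with integer
# division and masks every load.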
2023-01-11T21:38:06.7870003Z 2023-01-11T21:38:06.7870205Z triton_fused_upsample_nearest2d_backward_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7870273Z import triton 2023-01-11T21:38:06.7870370Z import triton.language as tl 2023-01-11T21:38:06.7870486Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7870590Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7870724Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7870852Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7870857Z 2023-01-11T21:38:06.7871252Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7871327Z @triton.jit 2023-01-11T21:38:06.7871455Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7871530Z xnumel = 180 2023-01-11T21:38:06.7871625Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7871754Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7871837Z xmask = xindex < xnumel 2023-01-11T21:38:06.7871925Z x1 = (xindex // 5) % 4 2023-01-11T21:38:06.7871999Z x0 = xindex % 5 2023-01-11T21:38:06.7872070Z x2 = (xindex // 20) 2023-01-11T21:38:06.7872143Z x4 = xindex 2023-01-11T21:38:06.7872228Z tmp0 = ((3 + (6*x1)) // 4) 2023-01-11T21:38:06.7872312Z tmp1 = ((9 + (6*x1)) // 4) 2023-01-11T21:38:06.7872392Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7872473Z tmp3 = ((4 + (12*x0)) // 5) 2023-01-11T21:38:06.7872554Z tmp4 = ((16 + (12*x0)) // 5) 2023-01-11T21:38:06.7872625Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.7872704Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.7873047Z tmp7 = tl.load(in_ptr0 + ((12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7873149Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.7873238Z tmp9 = 1 + (((4 + (12*x0)) // 5)) 2023-01-11T21:38:06.7873319Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.7873428Z tmp11 = tmp2 & tmp10 2023-01-11T21:38:06.7873768Z tmp12 = tl.load(in_ptr0 + (1 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7873866Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.7873946Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.7874035Z tmp15 = 2 + (((4 + (12*x0)) // 5)) 2023-01-11T21:38:06.7874115Z tmp16 = tmp15 < tmp4 2023-01-11T21:38:06.7874195Z tmp17 = tmp2 & tmp16 2023-01-11T21:38:06.7874537Z tmp18 = tl.load(in_ptr0 + (2 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7874636Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.7874710Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.7874794Z tmp21 = 1 + (((3 + (6*x1)) // 4)) 2023-01-11T21:38:06.7874877Z tmp22 = tmp21 < tmp1 2023-01-11T21:38:06.7874957Z tmp23 = tmp22 & tmp5 2023-01-11T21:38:06.7875302Z tmp24 = tl.load(in_ptr0 + (12 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp23 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7875398Z tmp25 = tl.where(tmp23, tmp24, 0.0) 2023-01-11T21:38:06.7875479Z tmp26 = 
tmp25 + tmp20 2023-01-11T21:38:06.7875552Z tmp27 = tmp22 & tmp10 2023-01-11T21:38:06.7875886Z tmp28 = tl.load(in_ptr0 + (13 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7876010Z tmp29 = tl.where(tmp27, tmp28, 0.0) 2023-01-11T21:38:06.7876092Z tmp30 = tmp29 + tmp26 2023-01-11T21:38:06.7876171Z tmp31 = tmp22 & tmp16 2023-01-11T21:38:06.7876504Z tmp32 = tl.load(in_ptr0 + (14 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp31 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7876600Z tmp33 = tl.where(tmp31, tmp32, 0.0) 2023-01-11T21:38:06.7876680Z tmp34 = tmp33 + tmp30 2023-01-11T21:38:06.7876808Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp34, xmask) 2023-01-11T21:38:06.7876895Z ''') 2023-01-11T21:38:06.7876900Z 2023-01-11T21:38:06.7876905Z 2023-01-11T21:38:06.7877159Z triton_fused_upsample_nearest2d_backward_2_upsample_nearest2d_backward_3_2 = async_compile.triton(''' 2023-01-11T21:38:06.7877238Z import triton 2023-01-11T21:38:06.7877331Z import triton.language as tl 2023-01-11T21:38:06.7877451Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7877554Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7877691Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7877812Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7877817Z 2023-01-11T21:38:06.7878240Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7878315Z @triton.jit 2023-01-11T21:38:06.7878458Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7878533Z xnumel = 144 2023-01-11T21:38:06.7878629Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7878758Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7878841Z xmask = xindex < xnumel 2023-01-11T21:38:06.7878918Z x1 = (xindex // 8) % 2 2023-01-11T21:38:06.7878993Z x0 = xindex % 8 2023-01-11T21:38:06.7879070Z x3 = (xindex // 8) 2023-01-11T21:38:06.7879142Z x4 = xindex 2023-01-11T21:38:06.7879216Z tmp0 = 3*x1 2023-01-11T21:38:06.7879291Z tmp1 = 3 + (3*x1) 2023-01-11T21:38:06.7879363Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7879472Z tmp3 = ((7 + (12*x0)) // 8) 2023-01-11T21:38:06.7879557Z tmp4 = ((19 + (12*x0)) // 8) 2023-01-11T21:38:06.7879636Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.7879715Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.7880028Z tmp7 = tl.load(in_ptr0 + ((36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7880124Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.7880209Z tmp9 = 1 + (((7 + (12*x0)) // 8)) 2023-01-11T21:38:06.7880282Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.7880362Z tmp11 = tmp2 & tmp10 2023-01-11T21:38:06.7880683Z tmp12 = tl.load(in_ptr0 + (1 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7880780Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.7880861Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.7880936Z tmp15 = 1 + (3*x1) 
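# (editor note, not part of the generated kernel) Rows reduce 6 -> 2 here, an exact
# multiple, so output row x1 always covers input rows 3*x1 .. 2 + 3*x1; Inductor
# still emits its generic window guards (the compare of tmp15 = 1 + 3*x1 against
# tmp1 = 3 + 3*x1 just below is always true), while the masked
# tl.load(..., other=0) / tl.where pairs do real work in the ragged column
# direction, where the (7 + 12*x0) // 8 ceiling bounds admit one or two taps.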
2023-01-11T21:38:06.7881021Z tmp16 = tmp15 < tmp1 2023-01-11T21:38:06.7881094Z tmp17 = tmp16 & tmp5 2023-01-11T21:38:06.7881409Z tmp18 = tl.load(in_ptr0 + (12 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7881505Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.7881591Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.7881672Z tmp21 = tmp16 & tmp10 2023-01-11T21:38:06.7881988Z tmp22 = tl.load(in_ptr0 + (13 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7882115Z tmp23 = tl.where(tmp21, tmp22, 0.0) 2023-01-11T21:38:06.7882194Z tmp24 = tmp23 + tmp20 2023-01-11T21:38:06.7882263Z tmp25 = 2 + (3*x1) 2023-01-11T21:38:06.7882347Z tmp26 = tmp25 < tmp1 2023-01-11T21:38:06.7882429Z tmp27 = tmp26 & tmp5 2023-01-11T21:38:06.7882738Z tmp28 = tl.load(in_ptr0 + (24 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7882834Z tmp29 = tl.where(tmp27, tmp28, 0.0) 2023-01-11T21:38:06.7882916Z tmp30 = tmp29 + tmp24 2023-01-11T21:38:06.7882997Z tmp31 = tmp26 & tmp10 2023-01-11T21:38:06.7883295Z tmp32 = tl.load(in_ptr0 + (25 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp31 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7883396Z tmp33 = tl.where(tmp31, tmp32, 0.0) 2023-01-11T21:38:06.7883478Z tmp34 = tmp33 + tmp30 2023-01-11T21:38:06.7883653Z tmp35 = tl.load(in_ptr0 + ((36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7883748Z tmp36 = tl.where(tmp6, tmp35, 0.0) 2023-01-11T21:38:06.7883928Z tmp37 = tl.load(in_ptr0 + (1 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7884025Z tmp38 = tl.where(tmp11, tmp37, 0.0) 2023-01-11T21:38:06.7884106Z tmp39 = tmp38 + tmp36 2023-01-11T21:38:06.7884270Z tmp40 = tl.load(in_ptr0 + (12 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7884367Z tmp41 = tl.where(tmp17, tmp40, 0.0) 2023-01-11T21:38:06.7884450Z tmp42 = tmp41 + tmp39 2023-01-11T21:38:06.7884618Z tmp43 = tl.load(in_ptr0 + (13 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7884711Z tmp44 = tl.where(tmp21, tmp43, 0.0) 2023-01-11T21:38:06.7884796Z tmp45 = tmp44 + tmp42 2023-01-11T21:38:06.7884962Z tmp46 = tl.load(in_ptr0 + (24 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7885058Z tmp47 = tl.where(tmp27, tmp46, 0.0) 2023-01-11T21:38:06.7885131Z tmp48 = tmp47 + tmp45 2023-01-11T21:38:06.7885321Z tmp49 = tl.load(in_ptr0 + (25 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp31 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7885422Z tmp50 = tl.where(tmp31, tmp49, 0.0) 2023-01-11T21:38:06.7885504Z tmp51 = tmp50 + tmp48 2023-01-11T21:38:06.7885640Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp34, xmask) 2023-01-11T21:38:06.7885772Z tl.store(out_ptr1 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp51, xmask) 2023-01-11T21:38:06.7885860Z ''') 2023-01-11T21:38:06.7885866Z 2023-01-11T21:38:06.7885870Z 2023-01-11T21:38:06.7886073Z 
triton_fused_upsample_nearest2d_backward_4_3 = async_compile.triton(''' 2023-01-11T21:38:06.7886144Z import triton 2023-01-11T21:38:06.7886237Z import triton.language as tl 2023-01-11T21:38:06.7886353Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7886457Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7886594Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7886721Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7886727Z 2023-01-11T21:38:06.7887127Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7887202Z @triton.jit 2023-01-11T21:38:06.7887328Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7887403Z xnumel = 252 2023-01-11T21:38:06.7887501Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7887661Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7887744Z xmask = xindex < xnumel 2023-01-11T21:38:06.7887825Z x1 = (xindex // 7) % 4 2023-01-11T21:38:06.7887900Z x0 = xindex % 7 2023-01-11T21:38:06.7887973Z x2 = (xindex // 28) 2023-01-11T21:38:06.7888045Z x4 = xindex 2023-01-11T21:38:06.7888131Z tmp0 = ((3 + (6*x1)) // 4) 2023-01-11T21:38:06.7888211Z tmp1 = ((9 + (6*x1)) // 4) 2023-01-11T21:38:06.7888291Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7888370Z tmp3 = ((6 + (12*x0)) // 7) 2023-01-11T21:38:06.7888445Z tmp4 = ((18 + (12*x0)) // 7) 2023-01-11T21:38:06.7888525Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.7888603Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.7888791Z tmp7 = tl.load(in_ptr0 + ((12*(((3 + (6*x1)) // 4))) + (72*x2) + (((6 + (12*x0)) // 7)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7888889Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.7888972Z tmp9 = 1 + (((6 + (12*x0)) // 7)) 2023-01-11T21:38:06.7889057Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.7889131Z tmp11 = tmp2 & tmp10 2023-01-11T21:38:06.7889321Z tmp12 = tl.load(in_ptr0 + (1 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((6 + (12*x0)) // 7)) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7889420Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.7889501Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.7889587Z tmp15 = 1 + (((3 + (6*x1)) // 4)) 2023-01-11T21:38:06.7889667Z tmp16 = tmp15 < tmp1 2023-01-11T21:38:06.7889746Z tmp17 = tmp16 & tmp5 2023-01-11T21:38:06.7889934Z tmp18 = tl.load(in_ptr0 + (12 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((6 + (12*x0)) // 7)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7890023Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.7890104Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.7890185Z tmp21 = tmp16 & tmp10 2023-01-11T21:38:06.7890366Z tmp22 = tl.load(in_ptr0 + (13 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((6 + (12*x0)) // 7)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7890461Z tmp23 = tl.where(tmp21, tmp22, 0.0) 2023-01-11T21:38:06.7890542Z tmp24 = tmp23 + tmp20 2023-01-11T21:38:06.7890726Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp24, xmask) 2023-01-11T21:38:06.7890814Z ''') 2023-01-11T21:38:06.7890820Z 2023-01-11T21:38:06.7890824Z 2023-01-11T21:38:06.7890911Z async_compile.wait(globals()) 
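# (editor note, not part of the log) async_compile.triton(...) above returns
# placeholder handles while the kernels build in background workers;
# async_compile.wait(globals()) blocks until every compile finishes and swaps the
# real kernel objects into this module's globals before call() can launch them.
#
# The non-integer-scale backward kernels (_1_1: 6x12 -> 4x5, _4_3: 6x12 -> 4x7)
# gather, for each input cell, the half-open range of output pixels whose nearest
# source index rounds back to it: for rows, start = ceil(6*x1/4) = (3 + 6*x1) // 4
# and end = ceil(6*(x1 + 1)/4) = (9 + 6*x1) // 4, with masked loads for the ragged
# tail. A minimal scatter-form PyTorch sketch of the same reduction, assuming the
# integer floor(dst * in/out) indexing matches the float-constant convention used
# above (reference only, my naming):
import torch

def upsample_nearest2d_backward_ref(grad_out: torch.Tensor, in_h: int, in_w: int) -> torch.Tensor:
    # The forward reads source index floor(dst * in/out) per axis, so the backward
    # scatter-adds every output gradient back to its source cell; the map is
    # separable, so reduce rows first, then columns.
    n, c, out_h, out_w = grad_out.shape
    iy = (torch.arange(out_h) * in_h) // out_h
    ix = (torch.arange(out_w) * in_w) // out_w
    tmp = torch.zeros(n, c, in_h, out_w, dtype=grad_out.dtype)
    tmp.index_add_(2, iy, grad_out)   # collapse output rows onto input rows
    grad_in = torch.zeros(n, c, in_h, in_w, dtype=grad_out.dtype)
    grad_in.index_add_(3, ix, tmp)    # collapse output cols onto input cols
    return grad_in

grad_out = torch.randn(3, 3, 6, 12)
assert upsample_nearest2d_backward_ref(grad_out, 4, 5).shape == (3, 3, 4, 5)  # buf1
assert upsample_nearest2d_backward_ref(grad_out, 4, 7).shape == (3, 3, 4, 7)  # buf4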
2023-01-11T21:38:06.7890990Z del async_compile 2023-01-11T21:38:06.7890995Z 2023-01-11T21:38:06.7891069Z def call(args): 2023-01-11T21:38:06.7891148Z arg0_1, = args 2023-01-11T21:38:06.7891223Z args.clear() 2023-01-11T21:38:06.7891317Z with torch.cuda.device(0): 2023-01-11T21:38:06.7891533Z buf0 = empty_strided((3, 3, 3, 6), (54, 18, 6, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7891618Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7891791Z triton_fused_upsample_nearest2d_backward_0.run(arg0_1, buf0, 162, grid=grid(162), stream=stream0) 2023-01-11T21:38:06.7892001Z buf1 = empty_strided((3, 3, 4, 5), (60, 20, 5, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7892172Z triton_fused_upsample_nearest2d_backward_1_1.run(arg0_1, buf1, 180, grid=grid(180), stream=stream0) 2023-01-11T21:38:06.7892384Z buf2 = empty_strided((3, 3, 2, 8), (48, 16, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7892592Z buf3 = empty_strided((3, 3, 2, 8), (48, 16, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7892796Z triton_fused_upsample_nearest2d_backward_2_upsample_nearest2d_backward_3_2.run(arg0_1, buf2, buf3, 144, grid=grid(144), stream=stream0) 2023-01-11T21:38:06.7893008Z buf4 = empty_strided((3, 3, 4, 7), (84, 28, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7893175Z triton_fused_upsample_nearest2d_backward_4_3.run(arg0_1, buf4, 252, grid=grid(252), stream=stream0) 2023-01-11T21:38:06.7893272Z del arg0_1 2023-01-11T21:38:06.7893379Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.7893384Z 2023-01-11T21:38:06.7893389Z 2023-01-11T21:38:06.7893469Z if __name__ == "__main__": 2023-01-11T21:38:06.7893590Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7893721Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7893942Z arg0_1 = rand_strided((3, 3, 6, 12), (216, 72, 12, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7894054Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7894320Z [2023-01-11 21:36:16,977] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 938 2023-01-11T21:38:06.7894326Z 2023-01-11T21:38:06.7894391Z ok (1.321s) 2023-01-11T21:38:06.7895074Z test_upsample_nearest2d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7895219Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7895479Z [2023-01-11 21:36:17,400] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 939 2023-01-11T21:38:06.7895743Z [2023-01-11 21:36:17,570] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 939 2023-01-11T21:38:06.7896160Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7896296Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7896552Z [2023-01-11 21:36:18,105] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 940 2023-01-11T21:38:06.7896558Z 2023-01-11T21:38:06.7896714Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7896791Z import torch 2023-01-11T21:38:06.7896867Z import random 2023-01-11T21:38:06.7896980Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7897107Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7897112Z 2023-01-11T21:38:06.7897254Z aten = torch.ops.aten 2023-01-11T21:38:06.7897395Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7897492Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7897498Z 2023-01-11T21:38:06.7897571Z import triton 2023-01-11T21:38:06.7897663Z import triton.language as tl 2023-01-11T21:38:06.7897781Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7897923Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7897928Z 2023-01-11T21:38:06.7897933Z 2023-01-11T21:38:06.7898154Z triton_fused_upsample_nearest2d_upsample_nearest2d_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.7898229Z import triton 2023-01-11T21:38:06.7898323Z import triton.language as tl 2023-01-11T21:38:06.7898438Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7898546Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7898681Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7898800Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7898805Z 2023-01-11T21:38:06.7899225Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7899345Z @triton.jit 2023-01-11T21:38:06.7899490Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7899569Z xnumel = 44992 2023-01-11T21:38:06.7899667Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7899800Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7899885Z xmask = xindex < xnumel 2023-01-11T21:38:06.7899960Z x1 = (xindex // 76) % 74 2023-01-11T21:38:06.7900036Z x0 = xindex % 76 2023-01-11T21:38:06.7900116Z x2 = (xindex // 5624) 2023-01-11T21:38:06.7900188Z x4 = xindex 2023-01-11T21:38:06.7900261Z tmp0 = x1 2023-01-11T21:38:06.7900335Z tmp1 = 0.5 2023-01-11T21:38:06.7900414Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7900493Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7900563Z tmp4 = x0 2023-01-11T21:38:06.7900643Z tmp5 = tmp4 * tmp1 2023-01-11T21:38:06.7900727Z tmp6 = tmp5.to(tl.int32) 2023-01-11T21:38:06.7900956Z tmp7 = tl.load(in_ptr0 + (tmp6 + (38*tmp3) + (1406*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7901082Z tmp8 = tl.load(in_ptr0 + (tmp6 + (38*tmp3) + (1406*x2)), xmask) 2023-01-11T21:38:06.7901219Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.7901349Z tl.store(out_ptr1 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7901435Z ''') 2023-01-11T21:38:06.7901441Z 2023-01-11T21:38:06.7901445Z 2023-01-11T21:38:06.7901634Z 
triton_fused_upsample_nearest2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7901708Z import triton 2023-01-11T21:38:06.7901801Z import triton.language as tl 2023-01-11T21:38:06.7901916Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7902019Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7902144Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7902269Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7902277Z 2023-01-11T21:38:06.7902685Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7902759Z @triton.jit 2023-01-11T21:38:06.7902920Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7902997Z xnumel = 42000 2023-01-11T21:38:06.7903097Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7903226Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7903309Z xmask = xindex < xnumel 2023-01-11T21:38:06.7903384Z x1 = (xindex // 75) % 70 2023-01-11T21:38:06.7903460Z x0 = xindex % 75 2023-01-11T21:38:06.7903540Z x2 = (xindex // 5250) 2023-01-11T21:38:06.7903611Z x4 = xindex 2023-01-11T21:38:06.7903686Z tmp0 = x1 2023-01-11T21:38:06.7903764Z tmp1 = 0.5285714285714286 2023-01-11T21:38:06.7903836Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7903924Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7903995Z tmp4 = x0 2023-01-11T21:38:06.7904073Z tmp5 = 0.5066666666666667 2023-01-11T21:38:06.7904153Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7904237Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7904467Z tmp8 = tl.load(in_ptr0 + (tmp7 + (38*tmp3) + (1406*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7904596Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7904680Z ''') 2023-01-11T21:38:06.7904685Z 2023-01-11T21:38:06.7904690Z 2023-01-11T21:38:06.7904875Z triton_fused_upsample_nearest2d_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.7904952Z import triton 2023-01-11T21:38:06.7905045Z import triton.language as tl 2023-01-11T21:38:06.7905159Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7905261Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7905396Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7905577Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7905584Z 2023-01-11T21:38:06.7906013Z @pointwise(size_hints=[32768], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7906086Z @triton.jit 2023-01-11T21:38:06.7906217Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7906293Z xnumel = 26640 2023-01-11T21:38:06.7906391Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7906523Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7906607Z xmask = xindex < xnumel 2023-01-11T21:38:06.7906681Z x1 = (xindex // 74) % 45 2023-01-11T21:38:06.7906758Z x0 = xindex % 74 2023-01-11T21:38:06.7906838Z x2 = (xindex // 3330) 2023-01-11T21:38:06.7906909Z x4 = xindex 2023-01-11T21:38:06.7906983Z tmp0 = x1 2023-01-11T21:38:06.7907062Z tmp1 = 0.8222222222222222 
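# (editor note, not part of the generated kernel) 0.8222222222222222 is 37/45, the
# reciprocal scale in_h / out_h baked in as a constant for this 37x38 -> 45x74
# output: the nearest source row is floor(x1 * 37/45), implemented below as a float
# multiply followed by the truncating .to(tl.int32); 0.5135135135135135 = 38/74
# plays the same role for the columns.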
2023-01-11T21:38:06.7907134Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7907217Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7907290Z tmp4 = x0 2023-01-11T21:38:06.7907367Z tmp5 = 0.5135135135135135 2023-01-11T21:38:06.7907446Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7907532Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7907756Z tmp8 = tl.load(in_ptr0 + (tmp7 + (38*tmp3) + (1406*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7907885Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7907971Z ''') 2023-01-11T21:38:06.7907976Z 2023-01-11T21:38:06.7907981Z 2023-01-11T21:38:06.7908168Z triton_fused_upsample_nearest2d_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.7908242Z import triton 2023-01-11T21:38:06.7908335Z import triton.language as tl 2023-01-11T21:38:06.7908453Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7908555Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7908691Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7908809Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7908815Z 2023-01-11T21:38:06.7909247Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7909323Z @triton.jit 2023-01-11T21:38:06.7909456Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7909531Z xnumel = 11232 2023-01-11T21:38:06.7909627Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7909756Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7909839Z xmask = xindex < xnumel 2023-01-11T21:38:06.7909914Z x1 = (xindex // 39) % 36 2023-01-11T21:38:06.7909988Z x0 = xindex % 39 2023-01-11T21:38:06.7910069Z x2 = (xindex // 1404) 2023-01-11T21:38:06.7910141Z x4 = xindex 2023-01-11T21:38:06.7910212Z tmp0 = x1 2023-01-11T21:38:06.7910294Z tmp1 = 1.0277777777777777 2023-01-11T21:38:06.7910366Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7910455Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7910527Z tmp4 = x0 2023-01-11T21:38:06.7910612Z tmp5 = 0.9743589743589743 2023-01-11T21:38:06.7910691Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7910776Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7910899Z tmp8 = tl.load(in_ptr0 + (tmp7 + (38*tmp3) + (1406*x2)), xmask) 2023-01-11T21:38:06.7911027Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7911115Z ''') 2023-01-11T21:38:06.7911121Z 2023-01-11T21:38:06.7911125Z 2023-01-11T21:38:06.7911219Z async_compile.wait(globals()) 2023-01-11T21:38:06.7911296Z del async_compile 2023-01-11T21:38:06.7911301Z 2023-01-11T21:38:06.7911374Z def call(args): 2023-01-11T21:38:06.7911448Z arg0_1, = args 2023-01-11T21:38:06.7911552Z args.clear() 2023-01-11T21:38:06.7911646Z with torch.cuda.device(0): 2023-01-11T21:38:06.7911867Z buf0 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7912095Z buf4 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7912191Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7912381Z triton_fused_upsample_nearest2d_upsample_nearest2d_4_0.run(arg0_1, buf0, buf4, 44992, grid=grid(44992), stream=stream0) 2023-01-11T21:38:06.7912603Z buf1 = empty_strided((2, 4, 70, 75), (21000, 5250, 75, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.7912762Z triton_fused_upsample_nearest2d_1_1.run(arg0_1, buf1, 42000, grid=grid(42000), stream=stream0) 2023-01-11T21:38:06.7912987Z buf2 = empty_strided((2, 4, 45, 74), (13320, 3330, 74, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7913143Z triton_fused_upsample_nearest2d_2_2.run(arg0_1, buf2, 26640, grid=grid(26640), stream=stream0) 2023-01-11T21:38:06.7913357Z buf3 = empty_strided((2, 4, 36, 39), (5616, 1404, 39, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7913511Z triton_fused_upsample_nearest2d_3_3.run(arg0_1, buf3, 11232, grid=grid(11232), stream=stream0) 2023-01-11T21:38:06.7913590Z del arg0_1 2023-01-11T21:38:06.7913698Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.7913703Z 2023-01-11T21:38:06.7913707Z 2023-01-11T21:38:06.7913787Z if __name__ == "__main__": 2023-01-11T21:38:06.7913907Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7914033Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7914260Z arg0_1 = rand_strided((2, 4, 37, 38), (5624, 1406, 38, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7914373Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7914378Z 2023-01-11T21:38:06.7914638Z [2023-01-11 21:36:18,245] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 940 2023-01-11T21:38:06.7914651Z 2023-01-11T21:38:06.7914743Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7914818Z import torch 2023-01-11T21:38:06.7914895Z import random 2023-01-11T21:38:06.7915042Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7915172Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7915177Z 2023-01-11T21:38:06.7915262Z aten = torch.ops.aten 2023-01-11T21:38:06.7915401Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7915490Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7915494Z 2023-01-11T21:38:06.7915569Z import triton 2023-01-11T21:38:06.7915662Z import triton.language as tl 2023-01-11T21:38:06.7915787Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7915926Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7915935Z 2023-01-11T21:38:06.7915940Z 2023-01-11T21:38:06.7916160Z triton_fused_upsample_nearest2d_upsample_nearest2d_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.7916239Z import triton 2023-01-11T21:38:06.7916331Z import triton.language as tl 2023-01-11T21:38:06.7916439Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7916546Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7916680Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7916807Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7916812Z 2023-01-11T21:38:06.7917234Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7917307Z @triton.jit 2023-01-11T21:38:06.7917452Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7917556Z xnumel = 44992 2023-01-11T21:38:06.7917647Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7917778Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7917864Z xmask = xindex 
< xnumel 2023-01-11T21:38:06.7917945Z x1 = (xindex // 76) % 74 2023-01-11T21:38:06.7918023Z x0 = xindex % 76 2023-01-11T21:38:06.7918103Z x2 = (xindex // 5624) 2023-01-11T21:38:06.7918176Z x4 = xindex 2023-01-11T21:38:06.7918240Z tmp0 = x1 2023-01-11T21:38:06.7918313Z tmp1 = 0.5 2023-01-11T21:38:06.7918393Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7918478Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7918550Z tmp4 = x0 2023-01-11T21:38:06.7918629Z tmp5 = tmp4 * tmp1 2023-01-11T21:38:06.7918706Z tmp6 = tmp5.to(tl.int32) 2023-01-11T21:38:06.7918956Z tmp7 = tl.load(in_ptr0 + (tmp6 + (38*tmp3) + (1406*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7919094Z tmp8 = tl.load(in_ptr0 + (tmp6 + (38*tmp3) + (1406*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7919232Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.7919365Z tl.store(out_ptr1 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7919451Z ''') 2023-01-11T21:38:06.7919457Z 2023-01-11T21:38:06.7919461Z 2023-01-11T21:38:06.7919649Z triton_fused_upsample_nearest2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7919725Z import triton 2023-01-11T21:38:06.7919811Z import triton.language as tl 2023-01-11T21:38:06.7919926Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7920028Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7920160Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7920286Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7920291Z 2023-01-11T21:38:06.7920696Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7920774Z @triton.jit 2023-01-11T21:38:06.7920905Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7920973Z xnumel = 42000 2023-01-11T21:38:06.7921102Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7921236Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7921320Z xmask = xindex < xnumel 2023-01-11T21:38:06.7921404Z x1 = (xindex // 75) % 70 2023-01-11T21:38:06.7921482Z x0 = xindex % 75 2023-01-11T21:38:06.7921563Z x2 = (xindex // 5250) 2023-01-11T21:38:06.7921626Z x4 = xindex 2023-01-11T21:38:06.7921699Z tmp0 = x1 2023-01-11T21:38:06.7921779Z tmp1 = 0.5285714285714286 2023-01-11T21:38:06.7921858Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7921943Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7922016Z tmp4 = x0 2023-01-11T21:38:06.7922098Z tmp5 = 0.5066666666666667 2023-01-11T21:38:06.7922170Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7922256Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7922505Z tmp8 = tl.load(in_ptr0 + (tmp7 + (38*tmp3) + (1406*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7922642Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7922726Z ''') 2023-01-11T21:38:06.7922732Z 2023-01-11T21:38:06.7922736Z 2023-01-11T21:38:06.7922921Z triton_fused_upsample_nearest2d_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.7922996Z import triton 2023-01-11T21:38:06.7923081Z import triton.language as tl 2023-01-11T21:38:06.7923196Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7923297Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7923430Z from 
torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7923555Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7923588Z 2023-01-11T21:38:06.7923992Z @pointwise(size_hints=[32768], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7924065Z @triton.jit 2023-01-11T21:38:06.7924200Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7924269Z xnumel = 26640 2023-01-11T21:38:06.7924366Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7924496Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7924579Z xmask = xindex < xnumel 2023-01-11T21:38:06.7924663Z x1 = (xindex // 74) % 45 2023-01-11T21:38:06.7924743Z x0 = xindex % 74 2023-01-11T21:38:06.7924824Z x2 = (xindex // 3330) 2023-01-11T21:38:06.7924888Z x4 = xindex 2023-01-11T21:38:06.7924959Z tmp0 = x1 2023-01-11T21:38:06.7925039Z tmp1 = 0.8222222222222222 2023-01-11T21:38:06.7925119Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7925206Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7925278Z tmp4 = x0 2023-01-11T21:38:06.7925358Z tmp5 = 0.5135135135135135 2023-01-11T21:38:06.7925430Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7925514Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7925769Z tmp8 = tl.load(in_ptr0 + (tmp7 + (38*tmp3) + (1406*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7925905Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7925990Z ''') 2023-01-11T21:38:06.7925996Z 2023-01-11T21:38:06.7926000Z 2023-01-11T21:38:06.7926185Z triton_fused_upsample_nearest2d_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.7926262Z import triton 2023-01-11T21:38:06.7926355Z import triton.language as tl 2023-01-11T21:38:06.7926462Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7926564Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7926697Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7926829Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7926834Z 2023-01-11T21:38:06.7927266Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7927344Z @triton.jit 2023-01-11T21:38:06.7927476Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7927551Z xnumel = 11232 2023-01-11T21:38:06.7927641Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7927769Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7927851Z xmask = xindex < xnumel 2023-01-11T21:38:06.7927932Z x1 = (xindex // 39) % 36 2023-01-11T21:38:06.7928007Z x0 = xindex % 39 2023-01-11T21:38:06.7928091Z x2 = (xindex // 1404) 2023-01-11T21:38:06.7928154Z x4 = xindex 2023-01-11T21:38:06.7928229Z tmp0 = x1 2023-01-11T21:38:06.7928308Z tmp1 = 1.0277777777777777 2023-01-11T21:38:06.7928388Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7928473Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7928546Z tmp4 = x0 2023-01-11T21:38:06.7928623Z tmp5 = 0.9743589743589743 2023-01-11T21:38:06.7928695Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7928782Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7928921Z tmp8 = 
tl.load(in_ptr0 + (tmp7 + (38*tmp3) + (1406*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7929058Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7929144Z ''') 2023-01-11T21:38:06.7929150Z 2023-01-11T21:38:06.7929154Z 2023-01-11T21:38:06.7929247Z async_compile.wait(globals()) 2023-01-11T21:38:06.7929324Z del async_compile 2023-01-11T21:38:06.7929329Z 2023-01-11T21:38:06.7929405Z def call(args): 2023-01-11T21:38:06.7929471Z arg0_1, = args 2023-01-11T21:38:06.7929545Z args.clear() 2023-01-11T21:38:06.7929665Z with torch.cuda.device(0): 2023-01-11T21:38:06.7929893Z buf0 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7930116Z buf4 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7930209Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7930397Z triton_fused_upsample_nearest2d_upsample_nearest2d_4_0.run(arg0_1, buf0, buf4, 44992, grid=grid(44992), stream=stream0) 2023-01-11T21:38:06.7930610Z buf1 = empty_strided((2, 4, 70, 75), (21000, 5250, 75, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7930767Z triton_fused_upsample_nearest2d_1_1.run(arg0_1, buf1, 42000, grid=grid(42000), stream=stream0) 2023-01-11T21:38:06.7930993Z buf2 = empty_strided((2, 4, 45, 74), (13320, 3330, 74, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7931147Z triton_fused_upsample_nearest2d_2_2.run(arg0_1, buf2, 26640, grid=grid(26640), stream=stream0) 2023-01-11T21:38:06.7931372Z buf3 = empty_strided((2, 4, 36, 39), (5616, 1404, 39, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7931529Z triton_fused_upsample_nearest2d_3_3.run(arg0_1, buf3, 11232, grid=grid(11232), stream=stream0) 2023-01-11T21:38:06.7931604Z del arg0_1 2023-01-11T21:38:06.7931711Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.7931717Z 2023-01-11T21:38:06.7931721Z 2023-01-11T21:38:06.7931802Z if __name__ == "__main__": 2023-01-11T21:38:06.7931914Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7932039Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7932264Z arg0_1 = rand_strided((2, 4, 37, 38), (5624, 1406, 38, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7932377Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7932382Z 2023-01-11T21:38:06.7932454Z ok (1.268s) 2023-01-11T21:38:06.7932948Z test_upsample_nearest3d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7933087Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7933345Z [2023-01-11 21:36:18,814] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 941 2023-01-11T21:38:06.7933610Z [2023-01-11 21:36:19,024] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 941 2023-01-11T21:38:06.7934024Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7934158Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7934409Z [2023-01-11 21:36:19,707] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 942 2023-01-11T21:38:06.7934414Z 2023-01-11T21:38:06.7934709Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7934794Z import torch 2023-01-11T21:38:06.7934870Z import random 2023-01-11T21:38:06.7934991Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7935116Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7935122Z 2023-01-11T21:38:06.7935203Z aten = torch.ops.aten 2023-01-11T21:38:06.7935334Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7935432Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7935437Z 2023-01-11T21:38:06.7935511Z import triton 2023-01-11T21:38:06.7935657Z import triton.language as tl 2023-01-11T21:38:06.7935786Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7935929Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7935934Z 2023-01-11T21:38:06.7935939Z 2023-01-11T21:38:06.7936170Z triton_fused_upsample_nearest3d_upsample_nearest3d_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.7936248Z import triton 2023-01-11T21:38:06.7936337Z import triton.language as tl 2023-01-11T21:38:06.7936455Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7936559Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7936696Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7936824Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7936829Z 2023-01-11T21:38:06.7937319Z @pointwise(size_hints=[4194304], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7937402Z @triton.jit 2023-01-11T21:38:06.7937548Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7937627Z xnumel = 3509376 2023-01-11T21:38:06.7937723Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7937855Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7937943Z xmask = xindex < xnumel 2023-01-11T21:38:06.7938030Z x2 = (xindex // 5928) % 74 2023-01-11T21:38:06.7938114Z x1 = (xindex // 78) % 76 2023-01-11T21:38:06.7938191Z x0 = xindex % 78 2023-01-11T21:38:06.7938266Z x3 = (xindex // 438672) 2023-01-11T21:38:06.7938339Z x5 = xindex 2023-01-11T21:38:06.7938411Z tmp0 = x2 2023-01-11T21:38:06.7938488Z tmp1 = 0.5 2023-01-11T21:38:06.7938569Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7938661Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7938735Z tmp4 = x1 2023-01-11T21:38:06.7938812Z tmp5 = tmp4 * tmp1 2023-01-11T21:38:06.7938898Z tmp6 = tmp5.to(tl.int32) 2023-01-11T21:38:06.7938972Z tmp7 = x0 2023-01-11T21:38:06.7939053Z tmp8 = tmp7 * tmp1 2023-01-11T21:38:06.7939138Z tmp9 = tmp8.to(tl.int32) 2023-01-11T21:38:06.7939434Z tmp10 = tl.load(in_ptr0 + (tmp9 + (39*tmp6) + (1482*tmp3) + (54834*x3)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7939572Z tmp11 = tl.load(in_ptr0 + (tmp9 + (39*tmp6) + (1482*tmp3) + (54834*x3)), xmask) 2023-01-11T21:38:06.7939704Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), 
tmp10, xmask) 2023-01-11T21:38:06.7939839Z tl.store(out_ptr1 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.7939928Z ''') 2023-01-11T21:38:06.7939934Z 2023-01-11T21:38:06.7939938Z 2023-01-11T21:38:06.7940127Z triton_fused_upsample_nearest3d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7940205Z import triton 2023-01-11T21:38:06.7940298Z import triton.language as tl 2023-01-11T21:38:06.7940420Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7940525Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7940654Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7940783Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7940788Z 2023-01-11T21:38:06.7941196Z @pointwise(size_hints=[4194304], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7941272Z @triton.jit 2023-01-11T21:38:06.7941406Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7941484Z xnumel = 3360000 2023-01-11T21:38:06.7941584Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7941716Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7941794Z xmask = xindex < xnumel 2023-01-11T21:38:06.7941913Z x2 = (xindex // 6000) % 70 2023-01-11T21:38:06.7941996Z x1 = (xindex // 80) % 75 2023-01-11T21:38:06.7942074Z x0 = xindex % 80 2023-01-11T21:38:06.7942158Z x3 = (xindex // 420000) 2023-01-11T21:38:06.7942231Z x5 = xindex 2023-01-11T21:38:06.7942305Z tmp0 = x2 2023-01-11T21:38:06.7942380Z tmp1 = 0.5285714285714286 2023-01-11T21:38:06.7942469Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7942556Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7942631Z tmp4 = x1 2023-01-11T21:38:06.7942714Z tmp5 = 0.5066666666666667 2023-01-11T21:38:06.7942796Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7942875Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7942950Z tmp8 = x0 2023-01-11T21:38:06.7943026Z tmp9 = 0.4875 2023-01-11T21:38:06.7943110Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.7943201Z tmp11 = tmp10.to(tl.int32) 2023-01-11T21:38:06.7943454Z tmp12 = tl.load(in_ptr0 + (tmp11 + (39*tmp7) + (1482*tmp3) + (54834*x3)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7943593Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.7943673Z ''') 2023-01-11T21:38:06.7943686Z 2023-01-11T21:38:06.7943690Z 2023-01-11T21:38:06.7943868Z triton_fused_upsample_nearest3d_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.7943948Z import triton 2023-01-11T21:38:06.7944046Z import triton.language as tl 2023-01-11T21:38:06.7944162Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7944265Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7944400Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7944527Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7944533Z 2023-01-11T21:38:06.7944933Z @pointwise(size_hints=[4194304], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7945005Z @triton.jit 2023-01-11T21:38:06.7945136Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7945213Z xnumel = 2743920 
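# (editor note, not part of the generated kernel) Standard Inductor pointwise
# prologue follows: the launcher starts ceil(xnumel / XBLOCK) programs, each
# covering XBLOCK consecutive flat indices from tl.program_id(0) * XBLOCK, and
# xmask = xindex < xnumel guards the final, partially filled block; the x0/x1/x2/x3
# modulo-and-divide splits then decode the flat index into per-axis coordinates
# (here x0 = W, x1 = H, x2 = D, x3 = N*C).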
2023-01-11T21:38:06.7945313Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7945443Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7945558Z xmask = xindex < xnumel 2023-01-11T21:38:06.7945658Z x2 = (xindex // 7622) % 45 2023-01-11T21:38:06.7945745Z x1 = (xindex // 103) % 74 2023-01-11T21:38:06.7945835Z x0 = xindex % 103 2023-01-11T21:38:06.7945929Z x3 = (xindex // 342990) 2023-01-11T21:38:06.7946007Z x5 = xindex 2023-01-11T21:38:06.7946080Z tmp0 = x2 2023-01-11T21:38:06.7946161Z tmp1 = 0.8222222222222222 2023-01-11T21:38:06.7946235Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7946323Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7946397Z tmp4 = x1 2023-01-11T21:38:06.7946478Z tmp5 = 0.5135135135135135 2023-01-11T21:38:06.7946558Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7946649Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7946727Z tmp8 = x0 2023-01-11T21:38:06.7946799Z tmp9 = 0.3786407766990291 2023-01-11T21:38:06.7946880Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.7946971Z tmp11 = tmp10.to(tl.int32) 2023-01-11T21:38:06.7947224Z tmp12 = tl.load(in_ptr0 + (tmp11 + (39*tmp7) + (1482*tmp3) + (54834*x3)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7947362Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.7947449Z ''') 2023-01-11T21:38:06.7947458Z 2023-01-11T21:38:06.7947463Z 2023-01-11T21:38:06.7947650Z triton_fused_upsample_nearest3d_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.7947727Z import triton 2023-01-11T21:38:06.7947815Z import triton.language as tl 2023-01-11T21:38:06.7947932Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7948037Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7948173Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7948342Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7948347Z 2023-01-11T21:38:06.7948758Z @pointwise(size_hints=[524288], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7948839Z @triton.jit 2023-01-11T21:38:06.7948972Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7949043Z xnumel = 449280 2023-01-11T21:38:06.7949141Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7949272Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7949357Z xmask = xindex < xnumel 2023-01-11T21:38:06.7949443Z x2 = (xindex // 1560) % 36 2023-01-11T21:38:06.7949526Z x1 = (xindex // 40) % 39 2023-01-11T21:38:06.7949605Z x0 = xindex % 40 2023-01-11T21:38:06.7949680Z x3 = (xindex // 56160) 2023-01-11T21:38:06.7949757Z x5 = xindex 2023-01-11T21:38:06.7949830Z tmp0 = x2 2023-01-11T21:38:06.7949910Z tmp1 = 1.0277777777777777 2023-01-11T21:38:06.7949992Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7950081Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7950148Z tmp4 = x1 2023-01-11T21:38:06.7950231Z tmp5 = 0.9743589743589743 2023-01-11T21:38:06.7950316Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7950403Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7950477Z tmp8 = x0 2023-01-11T21:38:06.7950555Z tmp9 = 0.975 2023-01-11T21:38:06.7950637Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.7950720Z tmp11 = tmp10.to(tl.int32) 2023-01-11T21:38:06.7950857Z tmp12 = tl.load(in_ptr0 + (tmp11 + (39*tmp7) + (1482*tmp3) + (54834*x3)), xmask) 2023-01-11T21:38:06.7950994Z 
tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.7951086Z ''') 2023-01-11T21:38:06.7951091Z 2023-01-11T21:38:06.7951095Z 2023-01-11T21:38:06.7951194Z async_compile.wait(globals()) 2023-01-11T21:38:06.7951282Z del async_compile 2023-01-11T21:38:06.7951288Z 2023-01-11T21:38:06.7951366Z def call(args): 2023-01-11T21:38:06.7951441Z arg0_1, = args 2023-01-11T21:38:06.7951512Z args.clear() 2023-01-11T21:38:06.7951606Z with torch.cuda.device(0): 2023-01-11T21:38:06.7951881Z buf0 = empty_strided((2, 4, 74, 76, 78), (1754688, 438672, 5928, 78, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7952117Z buf4 = empty_strided((2, 4, 74, 76, 78), (1754688, 438672, 5928, 78, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7952215Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7952406Z triton_fused_upsample_nearest3d_upsample_nearest3d_4_0.run(arg0_1, buf0, buf4, 3509376, grid=grid(3509376), stream=stream0) 2023-01-11T21:38:06.7952643Z buf1 = empty_strided((2, 4, 70, 75, 80), (1680000, 420000, 6000, 80, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7952806Z triton_fused_upsample_nearest3d_1_1.run(arg0_1, buf1, 3360000, grid=grid(3360000), stream=stream0) 2023-01-11T21:38:06.7953045Z buf2 = empty_strided((2, 4, 45, 74, 103), (1371960, 342990, 7622, 103, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7953211Z triton_fused_upsample_nearest3d_2_2.run(arg0_1, buf2, 2743920, grid=grid(2743920), stream=stream0) 2023-01-11T21:38:06.7953451Z buf3 = empty_strided((2, 4, 36, 39, 40), (224640, 56160, 1560, 40, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7953612Z triton_fused_upsample_nearest3d_3_3.run(arg0_1, buf3, 449280, grid=grid(449280), stream=stream0) 2023-01-11T21:38:06.7953691Z del arg0_1 2023-01-11T21:38:06.7953802Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.7953807Z 2023-01-11T21:38:06.7953812Z 2023-01-11T21:38:06.7953894Z if __name__ == "__main__": 2023-01-11T21:38:06.7954014Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7954137Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7954405Z arg0_1 = rand_strided((2, 4, 37, 38, 39), (219336, 54834, 1482, 39, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7954524Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7954529Z 2023-01-11T21:38:06.7954796Z [2023-01-11 21:36:19,869] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 942 2023-01-11T21:38:06.7954801Z 2023-01-11T21:38:06.7954901Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7954978Z import torch 2023-01-11T21:38:06.7955055Z import random 2023-01-11T21:38:06.7955177Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7955322Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7955334Z 2023-01-11T21:38:06.7955418Z aten = torch.ops.aten 2023-01-11T21:38:06.7955575Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7955675Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7955680Z 2023-01-11T21:38:06.7955758Z import triton 2023-01-11T21:38:06.7955853Z import triton.language as tl 2023-01-11T21:38:06.7955981Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7956122Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7956128Z 2023-01-11T21:38:06.7956132Z 2023-01-11T21:38:06.7956357Z triton_fused_upsample_nearest3d_upsample_nearest3d_4_0 
= async_compile.triton(''' 2023-01-11T21:38:06.7956427Z import triton 2023-01-11T21:38:06.7956521Z import triton.language as tl 2023-01-11T21:38:06.7956638Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7956744Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7956882Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7957009Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7957014Z 2023-01-11T21:38:06.7957441Z @pointwise(size_hints=[4194304], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7957526Z @triton.jit 2023-01-11T21:38:06.7957664Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7957742Z xnumel = 3509376 2023-01-11T21:38:06.7957871Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7958004Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7958092Z xmask = xindex < xnumel 2023-01-11T21:38:06.7958176Z x2 = (xindex // 5928) % 74 2023-01-11T21:38:06.7958262Z x1 = (xindex // 78) % 76 2023-01-11T21:38:06.7958334Z x0 = xindex % 78 2023-01-11T21:38:06.7958415Z x3 = (xindex // 438672) 2023-01-11T21:38:06.7958489Z x5 = xindex 2023-01-11T21:38:06.7958563Z tmp0 = x2 2023-01-11T21:38:06.7958639Z tmp1 = 0.5 2023-01-11T21:38:06.7958721Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7958802Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7958879Z tmp4 = x1 2023-01-11T21:38:06.7958960Z tmp5 = tmp4 * tmp1 2023-01-11T21:38:06.7959046Z tmp6 = tmp5.to(tl.int32) 2023-01-11T21:38:06.7959120Z tmp7 = x0 2023-01-11T21:38:06.7959202Z tmp8 = tmp7 * tmp1 2023-01-11T21:38:06.7959287Z tmp9 = tmp8.to(tl.int32) 2023-01-11T21:38:06.7959556Z tmp10 = tl.load(in_ptr0 + (tmp9 + (39*tmp6) + (1482*tmp3) + (54834*x3)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7959707Z tmp11 = tl.load(in_ptr0 + (tmp9 + (39*tmp6) + (1482*tmp3) + (54834*x3)), xmask).to(tl.float32) 2023-01-11T21:38:06.7959842Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.7959976Z tl.store(out_ptr1 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.7960063Z ''') 2023-01-11T21:38:06.7960069Z 2023-01-11T21:38:06.7960074Z 2023-01-11T21:38:06.7960261Z triton_fused_upsample_nearest3d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7960368Z import triton 2023-01-11T21:38:06.7960464Z import triton.language as tl 2023-01-11T21:38:06.7960574Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7960679Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7960816Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7960945Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7960950Z 2023-01-11T21:38:06.7961363Z @pointwise(size_hints=[4194304], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7961440Z @triton.jit 2023-01-11T21:38:06.7961573Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7961652Z xnumel = 3360000 2023-01-11T21:38:06.7961745Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7961877Z xindex = xoffset + 
tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7961967Z xmask = xindex < xnumel 2023-01-11T21:38:06.7962051Z x2 = (xindex // 6000) % 70 2023-01-11T21:38:06.7962133Z x1 = (xindex // 80) % 75 2023-01-11T21:38:06.7962212Z x0 = xindex % 80 2023-01-11T21:38:06.7962295Z x3 = (xindex // 420000) 2023-01-11T21:38:06.7962361Z x5 = xindex 2023-01-11T21:38:06.7962439Z tmp0 = x2 2023-01-11T21:38:06.7962522Z tmp1 = 0.5285714285714286 2023-01-11T21:38:06.7962604Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7962692Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7962766Z tmp4 = x1 2023-01-11T21:38:06.7962847Z tmp5 = 0.5066666666666667 2023-01-11T21:38:06.7962921Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7963008Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7963082Z tmp8 = x0 2023-01-11T21:38:06.7963161Z tmp9 = 0.4875 2023-01-11T21:38:06.7963244Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.7963340Z tmp11 = tmp10.to(tl.int32) 2023-01-11T21:38:06.7963605Z tmp12 = tl.load(in_ptr0 + (tmp11 + (39*tmp7) + (1482*tmp3) + (54834*x3)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7963746Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.7963833Z ''') 2023-01-11T21:38:06.7963838Z 2023-01-11T21:38:06.7963843Z 2023-01-11T21:38:06.7964066Z triton_fused_upsample_nearest3d_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.7964145Z import triton 2023-01-11T21:38:06.7964240Z import triton.language as tl 2023-01-11T21:38:06.7964357Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7964461Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7964588Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7964716Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7964721Z 2023-01-11T21:38:06.7965126Z @pointwise(size_hints=[4194304], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7965204Z @triton.jit 2023-01-11T21:38:06.7965337Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7965417Z xnumel = 2743920 2023-01-11T21:38:06.7965516Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7965649Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7965742Z xmask = xindex < xnumel 2023-01-11T21:38:06.7965836Z x2 = (xindex // 7622) % 45 2023-01-11T21:38:06.7965933Z x1 = (xindex // 103) % 74 2023-01-11T21:38:06.7966022Z x0 = xindex % 103 2023-01-11T21:38:06.7966103Z x3 = (xindex // 342990) 2023-01-11T21:38:06.7966176Z x5 = xindex 2023-01-11T21:38:06.7966249Z tmp0 = x2 2023-01-11T21:38:06.7966323Z tmp1 = 0.8222222222222222 2023-01-11T21:38:06.7966403Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7966493Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7966568Z tmp4 = x1 2023-01-11T21:38:06.7966680Z tmp5 = 0.5135135135135135 2023-01-11T21:38:06.7966761Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7966849Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7966916Z tmp8 = x0 2023-01-11T21:38:06.7966996Z tmp9 = 0.3786407766990291 2023-01-11T21:38:06.7967077Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.7967170Z tmp11 = tmp10.to(tl.int32) 2023-01-11T21:38:06.7967444Z tmp12 = tl.load(in_ptr0 + (tmp11 + (39*tmp7) + (1482*tmp3) + (54834*x3)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7967583Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), 
tmp12, xmask) 2023-01-11T21:38:06.7967671Z ''') 2023-01-11T21:38:06.7967677Z 2023-01-11T21:38:06.7967681Z 2023-01-11T21:38:06.7967862Z triton_fused_upsample_nearest3d_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.7967939Z import triton 2023-01-11T21:38:06.7968033Z import triton.language as tl 2023-01-11T21:38:06.7968150Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7968256Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7968390Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7968518Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7968523Z 2023-01-11T21:38:06.7968933Z @pointwise(size_hints=[524288], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7969003Z @triton.jit 2023-01-11T21:38:06.7969138Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7969215Z xnumel = 449280 2023-01-11T21:38:06.7969314Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7969446Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7969532Z xmask = xindex < xnumel 2023-01-11T21:38:06.7969619Z x2 = (xindex // 1560) % 36 2023-01-11T21:38:06.7969695Z x1 = (xindex // 40) % 39 2023-01-11T21:38:06.7969775Z x0 = xindex % 40 2023-01-11T21:38:06.7969860Z x3 = (xindex // 56160) 2023-01-11T21:38:06.7969934Z x5 = xindex 2023-01-11T21:38:06.7970006Z tmp0 = x2 2023-01-11T21:38:06.7970092Z tmp1 = 1.0277777777777777 2023-01-11T21:38:06.7970174Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7970284Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7970360Z tmp4 = x1 2023-01-11T21:38:06.7970440Z tmp5 = 0.9743589743589743 2023-01-11T21:38:06.7970521Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7970606Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7970681Z tmp8 = x0 2023-01-11T21:38:06.7970757Z tmp9 = 0.975 2023-01-11T21:38:06.7970832Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.7970921Z tmp11 = tmp10.to(tl.int32) 2023-01-11T21:38:06.7971076Z tmp12 = tl.load(in_ptr0 + (tmp11 + (39*tmp7) + (1482*tmp3) + (54834*x3)), xmask).to(tl.float32) 2023-01-11T21:38:06.7971212Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.7971306Z ''') 2023-01-11T21:38:06.7971311Z 2023-01-11T21:38:06.7971315Z 2023-01-11T21:38:06.7971411Z async_compile.wait(globals()) 2023-01-11T21:38:06.7971493Z del async_compile 2023-01-11T21:38:06.7971498Z 2023-01-11T21:38:06.7971568Z def call(args): 2023-01-11T21:38:06.7971644Z arg0_1, = args 2023-01-11T21:38:06.7971721Z args.clear() 2023-01-11T21:38:06.7971818Z with torch.cuda.device(0): 2023-01-11T21:38:06.7972070Z buf0 = empty_strided((2, 4, 74, 76, 78), (1754688, 438672, 5928, 78, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7972307Z buf4 = empty_strided((2, 4, 74, 76, 78), (1754688, 438672, 5928, 78, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7972403Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7972595Z triton_fused_upsample_nearest3d_upsample_nearest3d_4_0.run(arg0_1, buf0, buf4, 3509376, grid=grid(3509376), stream=stream0) 2023-01-11T21:38:06.7972823Z buf1 = empty_strided((2, 4, 70, 75, 80), (1680000, 420000, 6000, 80, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7973014Z triton_fused_upsample_nearest3d_1_1.run(arg0_1, buf1, 3360000, grid=grid(3360000), stream=stream0) 2023-01-11T21:38:06.7973259Z buf2 = 
empty_strided((2, 4, 45, 74, 103), (1371960, 342990, 7622, 103, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7973424Z triton_fused_upsample_nearest3d_2_2.run(arg0_1, buf2, 2743920, grid=grid(2743920), stream=stream0) 2023-01-11T21:38:06.7973662Z buf3 = empty_strided((2, 4, 36, 39, 40), (224640, 56160, 1560, 40, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7973824Z triton_fused_upsample_nearest3d_3_3.run(arg0_1, buf3, 449280, grid=grid(449280), stream=stream0) 2023-01-11T21:38:06.7973901Z del arg0_1 2023-01-11T21:38:06.7974009Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.7974014Z 2023-01-11T21:38:06.7974019Z 2023-01-11T21:38:06.7974102Z if __name__ == "__main__": 2023-01-11T21:38:06.7974220Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7974349Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7974807Z arg0_1 = rand_strided((2, 4, 37, 38, 39), (219336, 54834, 1482, 39, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7974923Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7974933Z 2023-01-11T21:38:06.7975004Z ok (1.628s) 2023-01-11T21:38:06.7975463Z test_var_mean_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7975597Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7975858Z [2023-01-11 21:36:19,895] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 943 2023-01-11T21:38:06.7976071Z [2023-01-11 21:36:19,935] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.7976267Z [2023-01-11 21:36:19,941] torch._inductor.scheduler: [DEBUG] remove_buffer('buf5') 2023-01-11T21:38:06.7976584Z [2023-01-11 21:36:20,144] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 943 2023-01-11T21:38:06.7977006Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7977197Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7977460Z [2023-01-11 21:36:20,166] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 944 2023-01-11T21:38:06.7977670Z [2023-01-11 21:36:20,199] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.7977872Z [2023-01-11 21:36:20,199] torch._inductor.scheduler: [DEBUG] remove_buffer('buf2') 2023-01-11T21:38:06.7978075Z [2023-01-11 21:36:20,204] torch._inductor.scheduler: [DEBUG] remove_buffer('buf7') 2023-01-11T21:38:06.7978273Z [2023-01-11 21:36:20,204] torch._inductor.scheduler: [DEBUG] remove_buffer('buf5') 2023-01-11T21:38:06.7978278Z 2023-01-11T21:38:06.7978378Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7978445Z import torch 2023-01-11T21:38:06.7978521Z import random 2023-01-11T21:38:06.7978641Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7978766Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7978771Z 2023-01-11T21:38:06.7978853Z aten = torch.ops.aten 2023-01-11T21:38:06.7978990Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7979127Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7979133Z 2023-01-11T21:38:06.7979200Z import triton 2023-01-11T21:38:06.7979291Z import triton.language as tl 2023-01-11T21:38:06.7979416Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7979557Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7979565Z 2023-01-11T21:38:06.7979570Z 2023-01-11T21:38:06.7979764Z triton_fused_getitem_getitem_1_var_mean_0 = async_compile.triton(''' 2023-01-11T21:38:06.7979839Z import triton 2023-01-11T21:38:06.7979931Z import triton.language as tl 2023-01-11T21:38:06.7980046Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7980141Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7980273Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7980399Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7980404Z 2023-01-11T21:38:06.7980493Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.7980611Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7980698Z filename=__file__, 2023-01-11T21:38:06.7981108Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7981184Z @triton.jit 2023-01-11T21:38:06.7981380Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7981458Z xnumel = 8 2023-01-11T21:38:06.7981532Z rnumel = 8 2023-01-11T21:38:06.7981633Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7981780Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7981872Z xmask = xindex < xnumel 2023-01-11T21:38:06.7981998Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7982063Z x0 = xindex 2023-01-11T21:38:06.7982191Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7982302Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7982397Z rindex = roffset + rbase 2023-01-11T21:38:06.7982485Z rmask = rindex < rnumel 
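# Masked accumulate: tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) keeps lanes
# past rnumel/xnumel from polluting the running sum, so a partial final RBLOCK
# slice is harmless; tl.sum below then collapses the R axis to one total per row.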
2023-01-11T21:38:06.7982557Z r1 = rindex 2023-01-11T21:38:06.7982847Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7982965Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.7983081Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7983198Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7983314Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7983419Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7983508Z rindex = roffset + rbase 2023-01-11T21:38:06.7983595Z rmask = rindex < rnumel 2023-01-11T21:38:06.7983662Z r1 = rindex 2023-01-11T21:38:06.7983878Z tmp2 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7983997Z tmp8 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.7984069Z tmp3 = 8 2023-01-11T21:38:06.7984151Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.7984267Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.7984349Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.7984464Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.7984584Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.7984697Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7984807Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7984884Z tmp10 = 7 2023-01-11T21:38:06.7984967Z tmp11 = tmp7 / tmp10 2023-01-11T21:38:06.7985038Z tmp12 = 8 2023-01-11T21:38:06.7985110Z tmp13 = tmp9 / tmp12 2023-01-11T21:38:06.7985250Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.7985445Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.7985530Z ''') 2023-01-11T21:38:06.7985535Z 2023-01-11T21:38:06.7985540Z 2023-01-11T21:38:06.7985740Z triton_fused_getitem_2_getitem_3_var_mean_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7985818Z import triton 2023-01-11T21:38:06.7985912Z import triton.language as tl 2023-01-11T21:38:06.7986031Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7986127Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7986259Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7986387Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7986392Z 2023-01-11T21:38:06.7986482Z @reduction(size_hints=[4, 16], 2023-01-11T21:38:06.7986601Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.7986686Z filename=__file__, 2023-01-11T21:38:06.7987098Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7987171Z @triton.jit 2023-01-11T21:38:06.7987349Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7987424Z xnumel = 4 2023-01-11T21:38:06.7987495Z rnumel = 16 2023-01-11T21:38:06.7987593Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7987727Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7987813Z xmask = xindex < xnumel 2023-01-11T21:38:06.7987932Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7987995Z x0 = xindex 2023-01-11T21:38:06.7988111Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 
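# Same two-pass scheme as the kernel above, now reducing rnumel = 16 elements
# (dims 1 and 3 of the (1, 2, 4, 8) input) per output row: the first loop
# accumulates sum(x) so the mean is available, the second re-reads x to
# accumulate (x - mean)**2 alongside sum(x). Per output row this amounts to
#   mean = x.sum() / 16
#   var  = ((x - mean) ** 2).sum() / 15   # divisor N - 1: torch.var_mean's unbiased default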
2023-01-11T21:38:06.7988221Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7988308Z rindex = roffset + rbase 2023-01-11T21:38:06.7988393Z rmask = rindex < rnumel 2023-01-11T21:38:06.7988469Z r1 = rindex % 8 2023-01-11T21:38:06.7988549Z r2 = (rindex // 8) 2023-01-11T21:38:06.7988808Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0) + (32*r2)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7988933Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.7989048Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7989163Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7989278Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7989383Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7989474Z rindex = roffset + rbase 2023-01-11T21:38:06.7989560Z rmask = rindex < rnumel 2023-01-11T21:38:06.7989629Z r1 = rindex % 8 2023-01-11T21:38:06.7989710Z r2 = (rindex // 8) 2023-01-11T21:38:06.7989935Z tmp2 = tl.load(in_ptr0 + (r1 + (8*x0) + (32*r2)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7990059Z tmp8 = tl.load(in_ptr0 + (r1 + (8*x0) + (32*r2)), rmask & xmask) 2023-01-11T21:38:06.7990132Z tmp3 = 16 2023-01-11T21:38:06.7990216Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.7990330Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.7990403Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.7990526Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.7990643Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.7990758Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7990869Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7990942Z tmp10 = 15 2023-01-11T21:38:06.7991026Z tmp11 = tmp7 / tmp10 2023-01-11T21:38:06.7991091Z tmp12 = 16 2023-01-11T21:38:06.7991172Z tmp13 = tmp9 / tmp12 2023-01-11T21:38:06.7991349Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.7991487Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.7991572Z ''') 2023-01-11T21:38:06.7991578Z 2023-01-11T21:38:06.7991582Z 2023-01-11T21:38:06.7991680Z async_compile.wait(globals()) 2023-01-11T21:38:06.7991760Z del async_compile 2023-01-11T21:38:06.7991765Z 2023-01-11T21:38:06.7991832Z def call(args): 2023-01-11T21:38:06.7991907Z arg0_1, = args 2023-01-11T21:38:06.7991982Z args.clear() 2023-01-11T21:38:06.7992075Z with torch.cuda.device(0): 2023-01-11T21:38:06.7992285Z buf1 = empty_strided((1, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7992488Z buf2 = empty_strided((1, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7992580Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:06.7992670Z buf4 = buf2; del buf2 # reuse 2023-01-11T21:38:06.7992759Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7992929Z triton_fused_getitem_getitem_1_var_mean_0.run(buf3, buf4, arg0_1, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.7993130Z buf6 = empty_strided((1, 4), (4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7993330Z buf7 = empty_strided((1, 4), (4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7993421Z buf8 = buf6; del buf6 # reuse 2023-01-11T21:38:06.7993513Z buf9 = buf7; del buf7 # reuse 2023-01-11T21:38:06.7993681Z triton_fused_getitem_2_getitem_3_var_mean_1_1.run(buf8, buf9, arg0_1, 4, 16, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.7993747Z del arg0_1 
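# The buf3 = buf1 / buf4 = buf2 aliases above are Inductor's buffer reuse: the
# sums are finalized in place through in_out_ptr0/in_out_ptr1 rather than into
# fresh allocations, and arg0_1 is released right after the launches that read
# it, returning the memory to the caching allocator.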
2023-01-11T21:38:06.7993844Z return (buf3, buf4, buf8, buf9, ) 2023-01-11T21:38:06.7993850Z 2023-01-11T21:38:06.7993854Z 2023-01-11T21:38:06.7993933Z if __name__ == "__main__": 2023-01-11T21:38:06.7994051Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7994182Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7994398Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7994510Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7994515Z 2023-01-11T21:38:06.7994809Z [2023-01-11 21:36:20,303] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 944 2023-01-11T21:38:06.7994815Z 2023-01-11T21:38:06.7994914Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7994981Z import torch 2023-01-11T21:38:06.7995057Z import random 2023-01-11T21:38:06.7995183Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7995327Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7995332Z 2023-01-11T21:38:06.7995433Z aten = torch.ops.aten 2023-01-11T21:38:06.7995575Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7995672Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7995680Z 2023-01-11T21:38:06.7995746Z import triton 2023-01-11T21:38:06.7995838Z import triton.language as tl 2023-01-11T21:38:06.7995961Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7996100Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7996105Z 2023-01-11T21:38:06.7996109Z 2023-01-11T21:38:06.7996304Z triton_fused_getitem_getitem_1_var_mean_0 = async_compile.triton(''' 2023-01-11T21:38:06.7996384Z import triton 2023-01-11T21:38:06.7996477Z import triton.language as tl 2023-01-11T21:38:06.7996590Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7996685Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7996816Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7996942Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7996947Z 2023-01-11T21:38:06.7997036Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.7997151Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7997266Z filename=__file__, 2023-01-11T21:38:06.7997655Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7997732Z @triton.jit 2023-01-11T21:38:06.7997905Z def triton_(in_out_ptr0, in_ptr0, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7997977Z xnumel = 8 2023-01-11T21:38:06.7998048Z rnumel = 8 2023-01-11T21:38:06.7998146Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7998281Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7998364Z xmask = xindex < xnumel 2023-01-11T21:38:06.7998483Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7998547Z x0 = xindex 2023-01-11T21:38:06.7998663Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7998773Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7998860Z rindex = roffset + rbase 2023-01-11T21:38:06.7998946Z rmask = rindex < rnumel 2023-01-11T21:38:06.7999020Z r1 = rindex 2023-01-11T21:38:06.7999263Z tmp0 = tl.load(in_ptr0 + (r1 + 
(8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7999348Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7999468Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.7999583Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7999699Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7999817Z _tmp12 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7999922Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.8000009Z rindex = roffset + rbase 2023-01-11T21:38:06.8000087Z rmask = rindex < rnumel 2023-01-11T21:38:06.8000162Z r1 = rindex 2023-01-11T21:38:06.8000399Z tmp3 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.8000532Z tmp10 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.8000605Z tmp4 = 8 2023-01-11T21:38:06.8000716Z tmp5 = tmp2 / tmp4 2023-01-11T21:38:06.8000808Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.8000923Z tmp7 = tmp3 - tmp6 2023-01-11T21:38:06.8000997Z tmp8 = tmp7 * tmp7 2023-01-11T21:38:06.8001119Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.8001211Z tmp11 = tmp10.to(tl.float32) 2023-01-11T21:38:06.8001335Z _tmp12 = tl.where(xmask & rmask, _tmp12 + tmp11, _tmp12) 2023-01-11T21:38:06.8001448Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.8001563Z tmp12 = tl.reshape(tl.sum(_tmp12, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.8001637Z tmp13 = 8 2023-01-11T21:38:06.8001714Z tmp14 = tmp12 / tmp13 2023-01-11T21:38:06.8001804Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.8001875Z tmp16 = 7 2023-01-11T21:38:06.8001957Z tmp17 = tmp9 / tmp16 2023-01-11T21:38:06.8002094Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.8002236Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.8002320Z ''') 2023-01-11T21:38:06.8002325Z 2023-01-11T21:38:06.8002330Z 2023-01-11T21:38:06.8002519Z triton_fused_getitem_2_getitem_3_var_mean_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.8002594Z import triton 2023-01-11T21:38:06.8002687Z import triton.language as tl 2023-01-11T21:38:06.8002802Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8002905Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8003037Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.8003162Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8003194Z 2023-01-11T21:38:06.8003286Z @reduction(size_hints=[4, 16], 2023-01-11T21:38:06.8003397Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.8003482Z filename=__file__, 2023-01-11T21:38:06.8003871Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.8003948Z @triton.jit 2023-01-11T21:38:06.8004126Z def triton_(in_out_ptr0, in_ptr0, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.8004201Z xnumel = 4 2023-01-11T21:38:06.8004274Z rnumel = 16 2023-01-11T21:38:06.8004364Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8004497Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.8004581Z xmask = xindex < xnumel 2023-01-11T21:38:06.8004704Z rbase = 
tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.8004775Z x0 = xindex 2023-01-11T21:38:06.8004893Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.8004998Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.8005079Z rindex = roffset + rbase 2023-01-11T21:38:06.8005170Z rmask = rindex < rnumel 2023-01-11T21:38:06.8005250Z r1 = rindex % 8 2023-01-11T21:38:06.8005336Z r2 = (rindex // 8) 2023-01-11T21:38:06.8005630Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0) + (32*r2)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.8005722Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.8005845Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.8005959Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.8006069Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.8006187Z _tmp12 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.8006294Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.8006382Z rindex = roffset + rbase 2023-01-11T21:38:06.8006467Z rmask = rindex < rnumel 2023-01-11T21:38:06.8006543Z r1 = rindex % 8 2023-01-11T21:38:06.8006624Z r2 = (rindex // 8) 2023-01-11T21:38:06.8006894Z tmp3 = tl.load(in_ptr0 + (r1 + (8*x0) + (32*r2)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.8007038Z tmp10 = tl.load(in_ptr0 + (r1 + (8*x0) + (32*r2)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.8007113Z tmp4 = 16 2023-01-11T21:38:06.8007196Z tmp5 = tmp2 / tmp4 2023-01-11T21:38:06.8007287Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.8007400Z tmp7 = tmp3 - tmp6 2023-01-11T21:38:06.8007480Z tmp8 = tmp7 * tmp7 2023-01-11T21:38:06.8007594Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.8007685Z tmp11 = tmp10.to(tl.float32) 2023-01-11T21:38:06.8007812Z _tmp12 = tl.where(xmask & rmask, _tmp12 + tmp11, _tmp12) 2023-01-11T21:38:06.8007926Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.8008041Z tmp12 = tl.reshape(tl.sum(_tmp12, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.8008116Z tmp13 = 16 2023-01-11T21:38:06.8008200Z tmp14 = tmp12 / tmp13 2023-01-11T21:38:06.8008281Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.8008353Z tmp16 = 15 2023-01-11T21:38:06.8008434Z tmp17 = tmp9 / tmp16 2023-01-11T21:38:06.8008571Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.8008708Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.8008793Z ''') 2023-01-11T21:38:06.8008799Z 2023-01-11T21:38:06.8008803Z 2023-01-11T21:38:06.8008897Z async_compile.wait(globals()) 2023-01-11T21:38:06.8008967Z del async_compile 2023-01-11T21:38:06.8008979Z 2023-01-11T21:38:06.8009046Z def call(args): 2023-01-11T21:38:06.8009155Z arg0_1, = args 2023-01-11T21:38:06.8009229Z args.clear() 2023-01-11T21:38:06.8009324Z with torch.cuda.device(0): 2023-01-11T21:38:06.8009531Z buf1 = empty_strided((1, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8009740Z buf4 = empty_strided((1, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8009833Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:06.8009918Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8010083Z triton_fused_getitem_getitem_1_var_mean_0.run(buf3, arg0_1, buf4, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.8010279Z buf6 = empty_strided((1, 4), (4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8010476Z 
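# fp16 variant of the same pair of reductions: every load above upcasts via
# .to(tl.float32) and the accumulators stay tl.float32, so only the final
# stores round to half precision.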
buf9 = empty_strided((1, 4), (4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8010565Z buf8 = buf6; del buf6 # reuse 2023-01-11T21:38:06.8010735Z triton_fused_getitem_2_getitem_3_var_mean_1_1.run(buf8, arg0_1, buf9, 4, 16, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.8010812Z del arg0_1 2023-01-11T21:38:06.8010900Z return (buf3, buf4, buf8, buf9, ) 2023-01-11T21:38:06.8010913Z 2023-01-11T21:38:06.8010918Z 2023-01-11T21:38:06.8010992Z if __name__ == "__main__": 2023-01-11T21:38:06.8011115Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8011241Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8011457Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8011571Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8011576Z 2023-01-11T21:38:06.8011647Z ok (0.429s) 2023-01-11T21:38:06.8012109Z test_vdd_clamp_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8012246Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8012535Z [2023-01-11 21:36:20,341] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 945 2023-01-11T21:38:06.8012792Z [2023-01-11 21:36:20,413] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 945 2023-01-11T21:38:06.8013208Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8013340Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8013599Z [2023-01-11 21:36:20,447] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 946 2023-01-11T21:38:06.8013861Z [2023-01-11 21:36:20,523] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 946 2023-01-11T21:38:06.8013867Z 2023-01-11T21:38:06.8013968Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8014044Z import torch 2023-01-11T21:38:06.8014119Z import random 2023-01-11T21:38:06.8014239Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8014357Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8014362Z 2023-01-11T21:38:06.8014447Z aten = torch.ops.aten 2023-01-11T21:38:06.8014782Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8014881Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8014887Z 2023-01-11T21:38:06.8014962Z import triton 2023-01-11T21:38:06.8015054Z import triton.language as tl 2023-01-11T21:38:06.8015179Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8015369Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8015375Z 2023-01-11T21:38:06.8015380Z 2023-01-11T21:38:06.8015568Z triton_fused_ge_maximum_0 = async_compile.triton(''' 2023-01-11T21:38:06.8015662Z import triton 2023-01-11T21:38:06.8015766Z import triton.language as tl 2023-01-11T21:38:06.8015881Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8015983Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8016117Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8016243Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8016248Z 2023-01-11T21:38:06.8016665Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8016735Z @triton.jit 2023-01-11T21:38:06.8016879Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8016956Z xnumel = 16 2023-01-11T21:38:06.8017052Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8017233Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8017321Z xmask = xindex < xnumel 2023-01-11T21:38:06.8017392Z x0 = xindex 2023-01-11T21:38:06.8017577Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.8017677Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8017750Z tmp1 = 3.0 2023-01-11T21:38:06.8017887Z tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1)) 2023-01-11T21:38:06.8017958Z tmp4 = 3 2023-01-11T21:38:06.8018039Z tmp5 = tmp3 >= tmp4 2023-01-11T21:38:06.8018174Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8018300Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.8018391Z ''') 2023-01-11T21:38:06.8018397Z 2023-01-11T21:38:06.8018401Z 2023-01-11T21:38:06.8018493Z async_compile.wait(globals()) 2023-01-11T21:38:06.8018570Z del async_compile 2023-01-11T21:38:06.8018575Z 2023-01-11T21:38:06.8018656Z def call(args): 2023-01-11T21:38:06.8018736Z primals_1, 
= args 2023-01-11T21:38:06.8018851Z args.clear() 2023-01-11T21:38:06.8018946Z with torch.cuda.device(0): 2023-01-11T21:38:06.8019138Z buf0 = empty_strided((16, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8019330Z buf1 = empty_strided((16, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.8019423Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8019580Z triton_fused_ge_maximum_0.run(primals_1, buf0, buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8019658Z del primals_1 2023-01-11T21:38:06.8019742Z return (buf0, buf1, ) 2023-01-11T21:38:06.8019747Z 2023-01-11T21:38:06.8019755Z 2023-01-11T21:38:06.8019835Z if __name__ == "__main__": 2023-01-11T21:38:06.8019957Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8020076Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8020281Z primals_1 = rand_strided((16, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8020402Z print_performance(lambda: call([primals_1])) 2023-01-11T21:38:06.8020408Z 2023-01-11T21:38:06.8020412Z 2023-01-11T21:38:06.8020511Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8020587Z import torch 2023-01-11T21:38:06.8020662Z import random 2023-01-11T21:38:06.8020784Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8020907Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8020912Z 2023-01-11T21:38:06.8020987Z aten = torch.ops.aten 2023-01-11T21:38:06.8021122Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8021216Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8021248Z 2023-01-11T21:38:06.8021324Z import triton 2023-01-11T21:38:06.8021415Z import triton.language as tl 2023-01-11T21:38:06.8021541Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8021680Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8021685Z 2023-01-11T21:38:06.8021690Z 2023-01-11T21:38:06.8021858Z triton_fused_ge_maximum_0 = async_compile.triton(''' 2023-01-11T21:38:06.8021926Z import triton 2023-01-11T21:38:06.8022019Z import triton.language as tl 2023-01-11T21:38:06.8022131Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8022234Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8022367Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8022493Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8022498Z 2023-01-11T21:38:06.8022915Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8022994Z @triton.jit 2023-01-11T21:38:06.8023130Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8023204Z xnumel = 16 2023-01-11T21:38:06.8023307Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8023435Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8023519Z xmask = xindex < xnumel 2023-01-11T21:38:06.8023589Z x0 = xindex 2023-01-11T21:38:06.8023802Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.8023913Z tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8023986Z tmp1 = 3.0 2023-01-11T21:38:06.8024124Z tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, 
tmp0, tmp1)) 2023-01-11T21:38:06.8024196Z tmp4 = 3 2023-01-11T21:38:06.8024281Z tmp5 = tmp3 >= tmp4 2023-01-11T21:38:06.8024415Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8024548Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.8024625Z ''') 2023-01-11T21:38:06.8024631Z 2023-01-11T21:38:06.8024643Z 2023-01-11T21:38:06.8024757Z async_compile.wait(globals()) 2023-01-11T21:38:06.8024836Z del async_compile 2023-01-11T21:38:06.8024841Z 2023-01-11T21:38:06.8024915Z def call(args): 2023-01-11T21:38:06.8024994Z primals_1, = args 2023-01-11T21:38:06.8025069Z args.clear() 2023-01-11T21:38:06.8025162Z with torch.cuda.device(0): 2023-01-11T21:38:06.8025361Z buf0 = empty_strided((16, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8025546Z buf1 = empty_strided((16, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.8025639Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8025794Z triton_fused_ge_maximum_0.run(primals_1, buf0, buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8025874Z del primals_1 2023-01-11T21:38:06.8025957Z return (buf0, buf1, ) 2023-01-11T21:38:06.8025962Z 2023-01-11T21:38:06.8025967Z 2023-01-11T21:38:06.8026046Z if __name__ == "__main__": 2023-01-11T21:38:06.8026164Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8026285Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8026489Z primals_1 = rand_strided((16, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8026607Z print_performance(lambda: call([primals_1])) 2023-01-11T21:38:06.8026612Z 2023-01-11T21:38:06.8026682Z ok (0.219s) 2023-01-11T21:38:06.8027148Z test_vertical_fusion1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8027305Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8027564Z [2023-01-11 21:36:20,570] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 947 2023-01-11T21:38:06.8027829Z [2023-01-11 21:36:20,672] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 947 2023-01-11T21:38:06.8028244Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8028373Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8028629Z [2023-01-11 21:36:20,707] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 948 2023-01-11T21:38:06.8028887Z [2023-01-11 21:36:20,806] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 948 2023-01-11T21:38:06.8028900Z 2023-01-11T21:38:06.8028991Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8029068Z import torch 2023-01-11T21:38:06.8029142Z import random 2023-01-11T21:38:06.8029262Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8029385Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8029390Z 2023-01-11T21:38:06.8029472Z aten = torch.ops.aten 2023-01-11T21:38:06.8029608Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8029697Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8029702Z 2023-01-11T21:38:06.8029776Z import triton 2023-01-11T21:38:06.8029868Z import triton.language as tl 2023-01-11T21:38:06.8029993Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8030136Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8030141Z 2023-01-11T21:38:06.8030146Z 2023-01-11T21:38:06.8030303Z triton_fused_add_3_0 = async_compile.triton(''' 2023-01-11T21:38:06.8030377Z import triton 2023-01-11T21:38:06.8030471Z import triton.language as tl 2023-01-11T21:38:06.8030604Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8030707Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8030839Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8030963Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8030969Z 2023-01-11T21:38:06.8038605Z @pointwise(size_hints=[2097152], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.8038698Z @triton.jit 2023-01-11T21:38:06.8038862Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8038941Z xnumel = 1082016 2023-01-11T21:38:06.8039040Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8039168Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8039249Z xmask = xindex < xnumel 2023-01-11T21:38:06.8039326Z x2 = xindex 2023-01-11T21:38:06.8039404Z x0 = xindex % 26 2023-01-11T21:38:06.8039504Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.8039601Z tmp8 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.8039694Z tmp15 = tl.load(in_ptr2 + (x0), xmask) 2023-01-11T21:38:06.8039843Z tmp1 = -1.061519070296458e-11 2023-01-11T21:38:06.8039924Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.8040048Z tmp3 = -1.988366587925593e-08 2023-01-11T21:38:06.8040130Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8040210Z tmp5 = tmp0 * tmp4 2023-01-11T21:38:06.8040331Z tmp6 = -3.087032500374211e-07 2023-01-11T21:38:06.8040472Z tmp7 = tmp5 + tmp6 2023-01-11T21:38:06.8040594Z tmp9 = 1.55093272922008e-10 2023-01-11T21:38:06.8040675Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.8040757Z tmp11 = tmp7 + tmp10 2023-01-11T21:38:06.8040835Z tmp12 = 1 / tmp11 2023-01-11T21:38:06.8040909Z tmp13 = 1.0 
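# The tmp chain here is one fused elementwise expression, the "vertical fusion"
# the test name refers to: a Horner-style polynomial
#   t = x * (x * c1 + c2) + c3 + y * c4   # c1..c4 are the small literals above
# followed by out = 1 / t + t * w, where w (in_ptr2) broadcasts along the last
# axis via x0 = xindex % 26. One kernel launch and one trip through global
# memory instead of one per aten op.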
2023-01-11T21:38:06.8040988Z tmp14 = tmp12 * tmp13 2023-01-11T21:38:06.8041070Z tmp16 = tmp11 * tmp15 2023-01-11T21:38:06.8041152Z tmp17 = tmp14 + tmp16 2023-01-11T21:38:06.8041290Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.8041376Z ''') 2023-01-11T21:38:06.8041382Z 2023-01-11T21:38:06.8041387Z 2023-01-11T21:38:06.8041483Z async_compile.wait(globals()) 2023-01-11T21:38:06.8041562Z del async_compile 2023-01-11T21:38:06.8041568Z 2023-01-11T21:38:06.8041642Z def call(args): 2023-01-11T21:38:06.8041723Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.8041801Z args.clear() 2023-01-11T21:38:06.8041897Z with torch.cuda.device(0): 2023-01-11T21:38:06.8042122Z buf0 = empty_strided((204, 204, 26), (5304, 26, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8042216Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8042372Z triton_fused_add_3_0.run(arg1_1, arg0_1, arg2_1, buf0, 1082016, grid=grid(1082016), stream=stream0) 2023-01-11T21:38:06.8042449Z del arg0_1 2023-01-11T21:38:06.8042514Z del arg1_1 2023-01-11T21:38:06.8042587Z del arg2_1 2023-01-11T21:38:06.8042665Z return (buf0, ) 2023-01-11T21:38:06.8042671Z 2023-01-11T21:38:06.8042675Z 2023-01-11T21:38:06.8042759Z if __name__ == "__main__": 2023-01-11T21:38:06.8042878Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8043007Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8043226Z arg0_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8043445Z arg1_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8043638Z arg2_1 = rand_strided((26, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8043768Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.8043773Z 2023-01-11T21:38:06.8043777Z 2023-01-11T21:38:06.8043876Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8043979Z import torch 2023-01-11T21:38:06.8044056Z import random 2023-01-11T21:38:06.8044177Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8044302Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8044307Z 2023-01-11T21:38:06.8044389Z aten = torch.ops.aten 2023-01-11T21:38:06.8044519Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8044615Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8044620Z 2023-01-11T21:38:06.8044694Z import triton 2023-01-11T21:38:06.8044786Z import triton.language as tl 2023-01-11T21:38:06.8044911Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8045052Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8045058Z 2023-01-11T21:38:06.8045062Z 2023-01-11T21:38:06.8045219Z triton_fused_add_3_0 = async_compile.triton(''' 2023-01-11T21:38:06.8045300Z import triton 2023-01-11T21:38:06.8045389Z import triton.language as tl 2023-01-11T21:38:06.8045503Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8045608Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8045743Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8045868Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8045873Z 2023-01-11T21:38:06.8046312Z @pointwise(size_hints=[2097152], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': 
set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.8046413Z @triton.jit 2023-01-11T21:38:06.8046564Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8046634Z xnumel = 1082016 2023-01-11T21:38:06.8046732Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8046862Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8046949Z xmask = xindex < xnumel 2023-01-11T21:38:06.8047022Z x2 = xindex 2023-01-11T21:38:06.8047100Z x0 = xindex % 26 2023-01-11T21:38:06.8047219Z tmp0 = tl.load(in_ptr0 + (x2), xmask).to(tl.float32) 2023-01-11T21:38:06.8047330Z tmp8 = tl.load(in_ptr1 + (x2), xmask).to(tl.float32) 2023-01-11T21:38:06.8047449Z tmp15 = tl.load(in_ptr2 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8047574Z tmp1 = -1.061519070296458e-11 2023-01-11T21:38:06.8047657Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.8047781Z tmp3 = -1.988366587925593e-08 2023-01-11T21:38:06.8047862Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8047945Z tmp5 = tmp0 * tmp4 2023-01-11T21:38:06.8048060Z tmp6 = -3.087032500374211e-07 2023-01-11T21:38:06.8048138Z tmp7 = tmp5 + tmp6 2023-01-11T21:38:06.8048262Z tmp9 = 1.55093272922008e-10 2023-01-11T21:38:06.8048344Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.8048427Z tmp11 = tmp7 + tmp10 2023-01-11T21:38:06.8048505Z tmp12 = 1 / tmp11 2023-01-11T21:38:06.8048575Z tmp13 = 1.0 2023-01-11T21:38:06.8048657Z tmp14 = tmp12 * tmp13 2023-01-11T21:38:06.8048737Z tmp16 = tmp11 * tmp15 2023-01-11T21:38:06.8048818Z tmp17 = tmp14 + tmp16 2023-01-11T21:38:06.8048957Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.8049045Z ''') 2023-01-11T21:38:06.8049051Z 2023-01-11T21:38:06.8049056Z 2023-01-11T21:38:06.8049151Z async_compile.wait(globals()) 2023-01-11T21:38:06.8049229Z del async_compile 2023-01-11T21:38:06.8049235Z 2023-01-11T21:38:06.8049303Z def call(args): 2023-01-11T21:38:06.8049390Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.8049473Z args.clear() 2023-01-11T21:38:06.8049567Z with torch.cuda.device(0): 2023-01-11T21:38:06.8049784Z buf0 = empty_strided((204, 204, 26), (5304, 26, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8049878Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8050066Z triton_fused_add_3_0.run(arg1_1, arg0_1, arg2_1, buf0, 1082016, grid=grid(1082016), stream=stream0) 2023-01-11T21:38:06.8050135Z del arg0_1 2023-01-11T21:38:06.8050209Z del arg1_1 2023-01-11T21:38:06.8050281Z del arg2_1 2023-01-11T21:38:06.8050360Z return (buf0, ) 2023-01-11T21:38:06.8050366Z 2023-01-11T21:38:06.8050371Z 2023-01-11T21:38:06.8050453Z if __name__ == "__main__": 2023-01-11T21:38:06.8050571Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8050698Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8050917Z arg0_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8051128Z arg1_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8051324Z arg2_1 = rand_strided((26, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8051455Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.8051460Z 2023-01-11T21:38:06.8051534Z ok (0.284s) 2023-01-11T21:38:06.8051990Z test_views1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8052122Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8052383Z [2023-01-11 21:36:20,824] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 949 2023-01-11T21:38:06.8052672Z [2023-01-11 21:36:20,893] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 949 2023-01-11T21:38:06.8053090Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8053223Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8053477Z [2023-01-11 21:36:20,909] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 950 2023-01-11T21:38:06.8053733Z [2023-01-11 21:36:20,978] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 950 2023-01-11T21:38:06.8054145Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8054282Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8054852Z [2023-01-11 21:36:20,996] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 951 2023-01-11T21:38:06.8055118Z [2023-01-11 21:36:21,066] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 951 2023-01-11T21:38:06.8055532Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8055667Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8055922Z [2023-01-11 21:36:21,085] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 952 2023-01-11T21:38:06.8055927Z 2023-01-11T21:38:06.8056080Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8056157Z import torch 2023-01-11T21:38:06.8056225Z import random 2023-01-11T21:38:06.8056346Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8056470Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8056475Z 2023-01-11T21:38:06.8056561Z aten = torch.ops.aten 2023-01-11T21:38:06.8056699Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8056794Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8056800Z 2023-01-11T21:38:06.8056873Z import triton 2023-01-11T21:38:06.8056965Z import triton.language as tl 2023-01-11T21:38:06.8057086Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8057290Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8057296Z 2023-01-11T21:38:06.8057302Z 2023-01-11T21:38:06.8057481Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8057570Z import triton 2023-01-11T21:38:06.8057668Z import triton.language as tl 2023-01-11T21:38:06.8057782Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8057883Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8058009Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8058134Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8058140Z 2023-01-11T21:38:06.8058558Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8058675Z @triton.jit 2023-01-11T21:38:06.8058819Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8058896Z xnumel = 35 2023-01-11T21:38:06.8058995Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8059125Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8059204Z xmask = xindex < xnumel 2023-01-11T21:38:06.8059277Z x0 = xindex 2023-01-11T21:38:06.8059374Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8059472Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8059554Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8059691Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8059777Z ''') 2023-01-11T21:38:06.8059782Z 2023-01-11T21:38:06.8059787Z 2023-01-11T21:38:06.8059886Z async_compile.wait(globals()) 2023-01-11T21:38:06.8059956Z del async_compile 2023-01-11T21:38:06.8059962Z 2023-01-11T21:38:06.8060039Z def call(args): 2023-01-11T21:38:06.8060122Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8060197Z args.clear() 2023-01-11T21:38:06.8060292Z with torch.cuda.device(0): 2023-01-11T21:38:06.8060493Z buf0 = empty_strided((5, 7), (7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8060587Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8060727Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 35, grid=grid(35), stream=stream0) 2023-01-11T21:38:06.8060803Z del arg0_1 2023-01-11T21:38:06.8060876Z del arg1_1 
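# Note the argument shapes: arg0_1 is a flat (35,) tensor and arg1_1 is (5, 7),
# yet the kernel indexes both with the same linear x0. The view is compiled
# away into index arithmetic; no reshape copy is ever issued.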
2023-01-11T21:38:06.8060953Z return (buf0, ) 2023-01-11T21:38:06.8060958Z 2023-01-11T21:38:06.8060962Z 2023-01-11T21:38:06.8061042Z if __name__ == "__main__": 2023-01-11T21:38:06.8061159Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8061284Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8061485Z arg0_1 = rand_strided((35, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8061675Z arg1_1 = rand_strided((5, 7), (7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8061800Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8061805Z 2023-01-11T21:38:06.8061810Z 2023-01-11T21:38:06.8061907Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8061988Z import torch 2023-01-11T21:38:06.8062063Z import random 2023-01-11T21:38:06.8062214Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8062353Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8062358Z 2023-01-11T21:38:06.8062443Z aten = torch.ops.aten 2023-01-11T21:38:06.8062584Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8062686Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8062691Z 2023-01-11T21:38:06.8062768Z import triton 2023-01-11T21:38:06.8062865Z import triton.language as tl 2023-01-11T21:38:06.8063001Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8063153Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8063162Z 2023-01-11T21:38:06.8063166Z 2023-01-11T21:38:06.8063336Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8063412Z import triton 2023-01-11T21:38:06.8063502Z import triton.language as tl 2023-01-11T21:38:06.8063624Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8063737Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8063882Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8064019Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8064024Z 2023-01-11T21:38:06.8064513Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8064589Z @triton.jit 2023-01-11T21:38:06.8064731Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8064825Z xnumel = 35 2023-01-11T21:38:06.8064922Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8065051Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8065136Z xmask = xindex < xnumel 2023-01-11T21:38:06.8065207Z x0 = xindex 2023-01-11T21:38:06.8065327Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8065445Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8065518Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8065651Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8065739Z ''') 2023-01-11T21:38:06.8065745Z 2023-01-11T21:38:06.8065749Z 2023-01-11T21:38:06.8065842Z async_compile.wait(globals()) 2023-01-11T21:38:06.8065920Z del async_compile 2023-01-11T21:38:06.8065925Z 2023-01-11T21:38:06.8065999Z def call(args): 2023-01-11T21:38:06.8066078Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8066146Z args.clear() 2023-01-11T21:38:06.8066244Z with torch.cuda.device(0): 
2023-01-11T21:38:06.8066446Z buf0 = empty_strided((5, 7), (7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8066540Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8066681Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 35, grid=grid(35), stream=stream0) 2023-01-11T21:38:06.8066758Z del arg0_1 2023-01-11T21:38:06.8066834Z del arg1_1 2023-01-11T21:38:06.8066905Z return (buf0, ) 2023-01-11T21:38:06.8066918Z 2023-01-11T21:38:06.8066922Z 2023-01-11T21:38:06.8066995Z if __name__ == "__main__": 2023-01-11T21:38:06.8067112Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8067238Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8067437Z arg0_1 = rand_strided((35, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8067634Z arg1_1 = rand_strided((5, 7), (7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8067756Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8067761Z 2023-01-11T21:38:06.8067765Z 2023-01-11T21:38:06.8067864Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8067939Z import torch 2023-01-11T21:38:06.8068007Z import random 2023-01-11T21:38:06.8068127Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8068279Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8068285Z 2023-01-11T21:38:06.8068368Z aten = torch.ops.aten 2023-01-11T21:38:06.8068506Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8068600Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8068606Z 2023-01-11T21:38:06.8068680Z import triton 2023-01-11T21:38:06.8068764Z import triton.language as tl 2023-01-11T21:38:06.8068889Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8069028Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8069033Z 2023-01-11T21:38:06.8069042Z 2023-01-11T21:38:06.8069199Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8069275Z import triton 2023-01-11T21:38:06.8069367Z import triton.language as tl 2023-01-11T21:38:06.8069483Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8069585Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8069714Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8069842Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8069847Z 2023-01-11T21:38:06.8070264Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8070338Z @triton.jit 2023-01-11T21:38:06.8070480Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8070554Z xnumel = 35 2023-01-11T21:38:06.8070680Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8070810Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8070886Z xmask = xindex < xnumel 2023-01-11T21:38:06.8070957Z x0 = xindex 2023-01-11T21:38:06.8071054Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8071155Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8071227Z tmp1 = 1 2023-01-11T21:38:06.8071305Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8071383Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8071512Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8071598Z ''') 
2023-01-11T21:38:06.8071604Z 2023-01-11T21:38:06.8071608Z 2023-01-11T21:38:06.8071701Z async_compile.wait(globals()) 2023-01-11T21:38:06.8071778Z del async_compile 2023-01-11T21:38:06.8071784Z 2023-01-11T21:38:06.8071858Z def call(args): 2023-01-11T21:38:06.8071938Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8072015Z args.clear() 2023-01-11T21:38:06.8072103Z with torch.cuda.device(0): 2023-01-11T21:38:06.8072300Z buf0 = empty_strided((5, 7), (7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8072393Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8072535Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 35, grid=grid(35), stream=stream0) 2023-01-11T21:38:06.8072613Z del arg0_1 2023-01-11T21:38:06.8072687Z del arg1_1 2023-01-11T21:38:06.8072764Z return (buf0, ) 2023-01-11T21:38:06.8072770Z 2023-01-11T21:38:06.8072774Z 2023-01-11T21:38:06.8072854Z if __name__ == "__main__": 2023-01-11T21:38:06.8072964Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8073088Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8073286Z arg0_1 = rand_strided((35, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8073483Z arg1_1 = rand_strided((5, 7), (7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8073601Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8073610Z 2023-01-11T21:38:06.8073878Z [2023-01-11 21:36:21,155] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 952 2023-01-11T21:38:06.8074326Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8074460Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8074717Z [2023-01-11 21:36:21,172] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 953 2023-01-11T21:38:06.8074971Z [2023-01-11 21:36:21,253] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 953 2023-01-11T21:38:06.8075415Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8075578Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8075834Z [2023-01-11 21:36:21,270] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 954 2023-01-11T21:38:06.8076100Z [2023-01-11 21:36:21,341] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 954 2023-01-11T21:38:06.8076514Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
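The UserWarning repeated throughout this run points at a concrete migration, using exactly the replacement the warning names. A minimal sketch of the fix for the test's buffer line, assuming a PyTorch build where Tensor.untyped_storage() is available (as the warning itself indicates):

    import torch

    x = torch.randn(5, 7)

    # Deprecated: TypedStorage reports its size in elements and triggers
    # the UserWarning above.
    n_old = x.storage().size()

    # Suggested replacement: UntypedStorage is sized in bytes, so divide by
    # element_size() to recover the element count passed to as_strided().
    n_new = x.untyped_storage().nbytes() // x.element_size()

    assert n_old == n_new
    buffer = torch.as_strided(x, (n_new,), (1,), 0).clone()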
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8076669Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8076922Z [2023-01-11 21:36:21,361] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 955 2023-01-11T21:38:06.8076928Z 2023-01-11T21:38:06.8077028Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8077104Z import torch 2023-01-11T21:38:06.8077180Z import random 2023-01-11T21:38:06.8077292Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8077415Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8077420Z 2023-01-11T21:38:06.8077502Z aten = torch.ops.aten 2023-01-11T21:38:06.8077639Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8077735Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8077740Z 2023-01-11T21:38:06.8077813Z import triton 2023-01-11T21:38:06.8077905Z import triton.language as tl 2023-01-11T21:38:06.8078026Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8078167Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8078172Z 2023-01-11T21:38:06.8078177Z 2023-01-11T21:38:06.8078335Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8078410Z import triton 2023-01-11T21:38:06.8078505Z import triton.language as tl 2023-01-11T21:38:06.8078618Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8078719Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8078852Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8078970Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8078975Z 2023-01-11T21:38:06.8079392Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8079471Z @triton.jit 2023-01-11T21:38:06.8079613Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8079686Z xnumel = 35 2023-01-11T21:38:06.8079785Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8079938Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8080026Z xmask = xindex < xnumel 2023-01-11T21:38:06.8080090Z x0 = xindex 2023-01-11T21:38:06.8080208Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8080324Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8080396Z tmp1 = 1 2023-01-11T21:38:06.8080476Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8080554Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8080690Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8080768Z ''') 2023-01-11T21:38:06.8080774Z 2023-01-11T21:38:06.8080778Z 2023-01-11T21:38:06.8080871Z async_compile.wait(globals()) 2023-01-11T21:38:06.8080953Z del async_compile 2023-01-11T21:38:06.8080958Z 2023-01-11T21:38:06.8081034Z def call(args): 2023-01-11T21:38:06.8081115Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8081193Z args.clear() 2023-01-11T21:38:06.8081286Z with torch.cuda.device(0): 2023-01-11T21:38:06.8081479Z buf0 = empty_strided((5, 7), (7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8081572Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8081716Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 
35, grid=grid(35), stream=stream0) 2023-01-11T21:38:06.8081790Z del arg0_1 2023-01-11T21:38:06.8081863Z del arg1_1 2023-01-11T21:38:06.8081941Z return (buf0, ) 2023-01-11T21:38:06.8081946Z 2023-01-11T21:38:06.8081950Z 2023-01-11T21:38:06.8082030Z if __name__ == "__main__": 2023-01-11T21:38:06.8082149Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8082269Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8082505Z arg0_1 = rand_strided((35, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8082704Z arg1_1 = rand_strided((5, 7), (7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8082824Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8082830Z 2023-01-11T21:38:06.8082837Z 2023-01-11T21:38:06.8082935Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8083009Z import torch 2023-01-11T21:38:06.8083083Z import random 2023-01-11T21:38:06.8083202Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8083319Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8083325Z 2023-01-11T21:38:06.8083407Z aten = torch.ops.aten 2023-01-11T21:38:06.8083543Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8083640Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8083645Z 2023-01-11T21:38:06.8083718Z import triton 2023-01-11T21:38:06.8083811Z import triton.language as tl 2023-01-11T21:38:06.8083941Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8084073Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8084086Z 2023-01-11T21:38:06.8084090Z 2023-01-11T21:38:06.8084237Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8084314Z import triton 2023-01-11T21:38:06.8084409Z import triton.language as tl 2023-01-11T21:38:06.8084525Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8084627Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8084761Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8084887Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8084892Z 2023-01-11T21:38:06.8085314Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8085384Z @triton.jit 2023-01-11T21:38:06.8085524Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8085599Z xnumel = 5040 2023-01-11T21:38:06.8085696Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8085881Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8085967Z xmask = xindex < xnumel 2023-01-11T21:38:06.8086040Z x0 = xindex 2023-01-11T21:38:06.8086129Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8086226Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8086304Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8086439Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8086525Z ''') 2023-01-11T21:38:06.8086532Z 2023-01-11T21:38:06.8086536Z 2023-01-11T21:38:06.8086630Z async_compile.wait(globals()) 2023-01-11T21:38:06.8086707Z del async_compile 2023-01-11T21:38:06.8086712Z 2023-01-11T21:38:06.8086789Z def call(args): 2023-01-11T21:38:06.8086861Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8086937Z 
args.clear() 2023-01-11T21:38:06.8087028Z with torch.cuda.device(0): 2023-01-11T21:38:06.8087266Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8087361Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8087506Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8087580Z del arg0_1 2023-01-11T21:38:06.8087645Z del arg1_1 2023-01-11T21:38:06.8087722Z return (buf0, ) 2023-01-11T21:38:06.8087728Z 2023-01-11T21:38:06.8087733Z 2023-01-11T21:38:06.8087814Z if __name__ == "__main__": 2023-01-11T21:38:06.8087933Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8088060Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8088262Z arg0_1 = rand_strided((5040, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8088526Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8088646Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8088652Z 2023-01-11T21:38:06.8088656Z 2023-01-11T21:38:06.8088748Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8088824Z import torch 2023-01-11T21:38:06.8088897Z import random 2023-01-11T21:38:06.8089016Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8089142Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8089147Z 2023-01-11T21:38:06.8089229Z aten = torch.ops.aten 2023-01-11T21:38:06.8089364Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8089459Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8089464Z 2023-01-11T21:38:06.8089530Z import triton 2023-01-11T21:38:06.8089623Z import triton.language as tl 2023-01-11T21:38:06.8089754Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8089892Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8089898Z 2023-01-11T21:38:06.8089903Z 2023-01-11T21:38:06.8090056Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8090131Z import triton 2023-01-11T21:38:06.8090228Z import triton.language as tl 2023-01-11T21:38:06.8090336Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8090439Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8090573Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8090698Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8090703Z 2023-01-11T21:38:06.8091123Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8091200Z @triton.jit 2023-01-11T21:38:06.8091342Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8091419Z xnumel = 5040 2023-01-11T21:38:06.8091516Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8091637Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8091750Z xmask = xindex < xnumel 2023-01-11T21:38:06.8091823Z x0 = xindex 2023-01-11T21:38:06.8091939Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8092057Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8092139Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8092273Z tl.store(out_ptr0 
+ (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8092351Z ''') 2023-01-11T21:38:06.8092357Z 2023-01-11T21:38:06.8092361Z 2023-01-11T21:38:06.8092455Z async_compile.wait(globals()) 2023-01-11T21:38:06.8092532Z del async_compile 2023-01-11T21:38:06.8092538Z 2023-01-11T21:38:06.8092615Z def call(args): 2023-01-11T21:38:06.8092692Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8092768Z args.clear() 2023-01-11T21:38:06.8092860Z with torch.cuda.device(0): 2023-01-11T21:38:06.8093090Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8093188Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8093331Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8093408Z del arg0_1 2023-01-11T21:38:06.8093483Z del arg1_1 2023-01-11T21:38:06.8093560Z return (buf0, ) 2023-01-11T21:38:06.8093565Z 2023-01-11T21:38:06.8093569Z 2023-01-11T21:38:06.8093649Z if __name__ == "__main__": 2023-01-11T21:38:06.8093768Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8093887Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8094087Z arg0_1 = rand_strided((5040, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8094349Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8094660Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8094665Z 2023-01-11T21:38:06.8094937Z [2023-01-11 21:36:21,433] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 955 2023-01-11T21:38:06.8095353Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8095489Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8095781Z [2023-01-11 21:36:21,452] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 956 2023-01-11T21:38:06.8096066Z [2023-01-11 21:36:21,526] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 956 2023-01-11T21:38:06.8096480Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8096614Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8096860Z [2023-01-11 21:36:21,543] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 957 2023-01-11T21:38:06.8097119Z [2023-01-11 21:36:21,555] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 957 2023-01-11T21:38:06.8097604Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
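For orientation, the call() wrappers dumped above implement plain element-wise adds: both input pointers are read at the same linear offset x0, so a flat 5040-element operand lines up with a contiguous (2, 3, 4, 5, 6, 7) operand. An eager-mode sketch of the same computation (CPU tensors; the names arg0/arg1 are illustrative, not from the harness):

    import torch

    arg0 = torch.randn(5040)               # flat, like arg0_1 in the harness
    arg1 = torch.randn(2, 3, 4, 5, 6, 7)   # 2*3*4*5*6*7 == 5040 elements

    # Linear indexing over both inputs is equivalent to viewing the flat
    # operand with the other operand's shape and adding.
    out = arg0.view(arg1.shape) + arg1
    assert out.shape == arg1.shape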
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8097794Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8098050Z [2023-01-11 21:36:21,572] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 958 2023-01-11T21:38:06.8098056Z 2023-01-11T21:38:06.8098155Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8098233Z import torch 2023-01-11T21:38:06.8098308Z import random 2023-01-11T21:38:06.8098427Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8098543Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8098548Z 2023-01-11T21:38:06.8098632Z aten = torch.ops.aten 2023-01-11T21:38:06.8098774Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8098873Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8098879Z 2023-01-11T21:38:06.8098954Z import triton 2023-01-11T21:38:06.8099047Z import triton.language as tl 2023-01-11T21:38:06.8099172Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8099307Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8099320Z 2023-01-11T21:38:06.8099325Z 2023-01-11T21:38:06.8099475Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8099550Z import triton 2023-01-11T21:38:06.8099642Z import triton.language as tl 2023-01-11T21:38:06.8099757Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8099859Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8099991Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8100116Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8100121Z 2023-01-11T21:38:06.8100540Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8100645Z @triton.jit 2023-01-11T21:38:06.8100791Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8100870Z xnumel = 5040 2023-01-11T21:38:06.8100975Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8101107Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8101195Z xmask = xindex < xnumel 2023-01-11T21:38:06.8101270Z x0 = xindex 2023-01-11T21:38:06.8101363Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8101465Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8101539Z tmp1 = 1 2023-01-11T21:38:06.8101624Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8101707Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8101844Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8101936Z ''') 2023-01-11T21:38:06.8101941Z 2023-01-11T21:38:06.8101946Z 2023-01-11T21:38:06.8102034Z async_compile.wait(globals()) 2023-01-11T21:38:06.8102116Z del async_compile 2023-01-11T21:38:06.8102121Z 2023-01-11T21:38:06.8102198Z def call(args): 2023-01-11T21:38:06.8102283Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8102361Z args.clear() 2023-01-11T21:38:06.8102455Z with torch.cuda.device(0): 2023-01-11T21:38:06.8102694Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8102790Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8102932Z triton_fused_add_1_0.run(arg0_1, 
arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8103007Z del arg0_1 2023-01-11T21:38:06.8103089Z del arg1_1 2023-01-11T21:38:06.8103169Z return (buf0, ) 2023-01-11T21:38:06.8103174Z 2023-01-11T21:38:06.8103182Z 2023-01-11T21:38:06.8103270Z if __name__ == "__main__": 2023-01-11T21:38:06.8103391Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8103521Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8103720Z arg0_1 = rand_strided((5040, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8103990Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8104112Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8104117Z 2023-01-11T21:38:06.8104121Z 2023-01-11T21:38:06.8104219Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8104294Z import torch 2023-01-11T21:38:06.8104368Z import random 2023-01-11T21:38:06.8104487Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8104612Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8104617Z 2023-01-11T21:38:06.8104692Z aten = torch.ops.aten 2023-01-11T21:38:06.8104833Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8104932Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8104937Z 2023-01-11T21:38:06.8105011Z import triton 2023-01-11T21:38:06.8105103Z import triton.language as tl 2023-01-11T21:38:06.8105230Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8105371Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8105376Z 2023-01-11T21:38:06.8105381Z 2023-01-11T21:38:06.8105540Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8105612Z import triton 2023-01-11T21:38:06.8105724Z import triton.language as tl 2023-01-11T21:38:06.8105855Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8105968Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8106100Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8106226Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8106269Z 2023-01-11T21:38:06.8106688Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8106762Z @triton.jit 2023-01-11T21:38:06.8106897Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8106974Z xnumel = 5040 2023-01-11T21:38:06.8107074Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8107204Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8107288Z xmask = xindex < xnumel 2023-01-11T21:38:06.8107361Z x0 = xindex 2023-01-11T21:38:06.8107479Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8107588Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8107662Z tmp1 = 1 2023-01-11T21:38:06.8107742Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8107825Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8107960Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8108044Z ''') 2023-01-11T21:38:06.8108050Z 2023-01-11T21:38:06.8108054Z 2023-01-11T21:38:06.8108147Z async_compile.wait(globals()) 2023-01-11T21:38:06.8108224Z 
del async_compile 2023-01-11T21:38:06.8108232Z 2023-01-11T21:38:06.8108300Z def call(args): 2023-01-11T21:38:06.8108379Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8108456Z args.clear() 2023-01-11T21:38:06.8108548Z with torch.cuda.device(0): 2023-01-11T21:38:06.8108785Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8108877Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8109026Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8109092Z del arg0_1 2023-01-11T21:38:06.8109165Z del arg1_1 2023-01-11T21:38:06.8109245Z return (buf0, ) 2023-01-11T21:38:06.8109251Z 2023-01-11T21:38:06.8109255Z 2023-01-11T21:38:06.8109337Z if __name__ == "__main__": 2023-01-11T21:38:06.8109455Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8109582Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8109815Z arg0_1 = rand_strided((5040, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8110050Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8110163Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8110168Z 2023-01-11T21:38:06.8110181Z 2023-01-11T21:38:06.8110271Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8110345Z import torch 2023-01-11T21:38:06.8110420Z import random 2023-01-11T21:38:06.8110538Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8110665Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8110673Z 2023-01-11T21:38:06.8110756Z aten = torch.ops.aten 2023-01-11T21:38:06.8110892Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8110981Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8110986Z 2023-01-11T21:38:06.8111058Z import triton 2023-01-11T21:38:06.8111154Z import triton.language as tl 2023-01-11T21:38:06.8111281Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8111420Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8111426Z 2023-01-11T21:38:06.8111430Z 2023-01-11T21:38:06.8111583Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8111658Z import triton 2023-01-11T21:38:06.8111751Z import triton.language as tl 2023-01-11T21:38:06.8111858Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8111959Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8112092Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8112247Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8112253Z 2023-01-11T21:38:06.8112678Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8112752Z @triton.jit 2023-01-11T21:38:06.8112893Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8112968Z xnumel = 5040 2023-01-11T21:38:06.8113059Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8113188Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8113271Z xmask = xindex < xnumel 2023-01-11T21:38:06.8113341Z x0 = xindex 2023-01-11T21:38:06.8113439Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 
2023-01-11T21:38:06.8113535Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8113615Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8113745Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8113830Z ''') 2023-01-11T21:38:06.8113836Z 2023-01-11T21:38:06.8113840Z 2023-01-11T21:38:06.8113932Z async_compile.wait(globals()) 2023-01-11T21:38:06.8114008Z del async_compile 2023-01-11T21:38:06.8114013Z 2023-01-11T21:38:06.8114090Z def call(args): 2023-01-11T21:38:06.8114169Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8114247Z args.clear() 2023-01-11T21:38:06.8114332Z with torch.cuda.device(0): 2023-01-11T21:38:06.8114567Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8114659Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8114805Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8114878Z del arg0_1 2023-01-11T21:38:06.8114950Z del arg1_1 2023-01-11T21:38:06.8115027Z return (buf0, ) 2023-01-11T21:38:06.8115035Z 2023-01-11T21:38:06.8115040Z 2023-01-11T21:38:06.8115119Z if __name__ == "__main__": 2023-01-11T21:38:06.8115229Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8115378Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8115656Z arg0_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8115897Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8116017Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8116023Z 2023-01-11T21:38:06.8116290Z [2023-01-11 21:36:21,584] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 958 2023-01-11T21:38:06.8116708Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8116843Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8117103Z [2023-01-11 21:36:21,604] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 959 2023-01-11T21:38:06.8117365Z [2023-01-11 21:36:21,618] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 959 2023-01-11T21:38:06.8117770Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8117900Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8118180Z [2023-01-11 21:36:21,637] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 960 2023-01-11T21:38:06.8118444Z [2023-01-11 21:36:21,650] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 960 2023-01-11T21:38:06.8118857Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
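The triton_fused_add_1_0 variants above show inductor's pointwise fusion: a computation of the form x + 1 + y becomes a single kernel, with the constant inlined as tmp1 = 1 rather than materialized as a separate graph input. An eager sketch of the fused expression (shapes taken from the harness above; the function name f is illustrative):

    import torch

    def f(x, y):
        # Eagerly this is two separate adds; the generated kernel does both
        # in one pass: tmp2 = tmp0 + 1, then tmp4 = tmp2 + tmp3.
        return x + 1 + y

    x = torch.randn(5040)
    y = torch.randn(2, 3, 4, 5, 6, 7)
    out = f(x.view(y.shape), y)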
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8118988Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8119242Z [2023-01-11 21:36:21,666] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 961 2023-01-11T21:38:06.8119247Z 2023-01-11T21:38:06.8119345Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8119422Z import torch 2023-01-11T21:38:06.8119497Z import random 2023-01-11T21:38:06.8119610Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8119733Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8119738Z 2023-01-11T21:38:06.8119822Z aten = torch.ops.aten 2023-01-11T21:38:06.8119961Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8120056Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8120061Z 2023-01-11T21:38:06.8120137Z import triton 2023-01-11T21:38:06.8120229Z import triton.language as tl 2023-01-11T21:38:06.8120356Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8120488Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8120493Z 2023-01-11T21:38:06.8120498Z 2023-01-11T21:38:06.8120651Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8120726Z import triton 2023-01-11T21:38:06.8120818Z import triton.language as tl 2023-01-11T21:38:06.8120934Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8121040Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8121172Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8121290Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8121303Z 2023-01-11T21:38:06.8121737Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8121814Z @triton.jit 2023-01-11T21:38:06.8121956Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8122031Z xnumel = 5040 2023-01-11T21:38:06.8122128Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8122258Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8122341Z xmask = xindex < xnumel 2023-01-11T21:38:06.8122408Z x0 = xindex 2023-01-11T21:38:06.8122525Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8122640Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8122719Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8122856Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8122942Z ''') 2023-01-11T21:38:06.8122948Z 2023-01-11T21:38:06.8122952Z 2023-01-11T21:38:06.8123045Z async_compile.wait(globals()) 2023-01-11T21:38:06.8123124Z del async_compile 2023-01-11T21:38:06.8123129Z 2023-01-11T21:38:06.8123196Z def call(args): 2023-01-11T21:38:06.8123275Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8123349Z args.clear() 2023-01-11T21:38:06.8123442Z with torch.cuda.device(0): 2023-01-11T21:38:06.8123679Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8123772Z 
stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8123955Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8124021Z del arg0_1 2023-01-11T21:38:06.8124093Z del arg1_1 2023-01-11T21:38:06.8124172Z return (buf0, ) 2023-01-11T21:38:06.8124177Z 2023-01-11T21:38:06.8124182Z 2023-01-11T21:38:06.8124265Z if __name__ == "__main__": 2023-01-11T21:38:06.8124384Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8124515Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8124739Z arg0_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8124974Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8125086Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8125098Z 2023-01-11T21:38:06.8125103Z 2023-01-11T21:38:06.8125193Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8125270Z import torch 2023-01-11T21:38:06.8125348Z import random 2023-01-11T21:38:06.8125465Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8125613Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8125618Z 2023-01-11T21:38:06.8125706Z aten = torch.ops.aten 2023-01-11T21:38:06.8125866Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8125954Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8125959Z 2023-01-11T21:38:06.8126032Z import triton 2023-01-11T21:38:06.8126124Z import triton.language as tl 2023-01-11T21:38:06.8126248Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8126386Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8126392Z 2023-01-11T21:38:06.8126396Z 2023-01-11T21:38:06.8126552Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8126628Z import triton 2023-01-11T21:38:06.8126720Z import triton.language as tl 2023-01-11T21:38:06.8126829Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8126931Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8127064Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8127188Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8127193Z 2023-01-11T21:38:06.8127640Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8127718Z @triton.jit 2023-01-11T21:38:06.8127861Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8127935Z xnumel = 5040 2023-01-11T21:38:06.8128025Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8128156Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8128240Z xmask = xindex < xnumel 2023-01-11T21:38:06.8128315Z x0 = xindex 2023-01-11T21:38:06.8128412Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8128507Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8128578Z tmp1 = 1 2023-01-11T21:38:06.8128650Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8128727Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8128867Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8128952Z ''') 2023-01-11T21:38:06.8128957Z 2023-01-11T21:38:06.8128962Z 
2023-01-11T21:38:06.8129055Z async_compile.wait(globals()) 2023-01-11T21:38:06.8129131Z del async_compile 2023-01-11T21:38:06.8129137Z 2023-01-11T21:38:06.8129211Z def call(args): 2023-01-11T21:38:06.8129284Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8129359Z args.clear() 2023-01-11T21:38:06.8129451Z with torch.cuda.device(0): 2023-01-11T21:38:06.8129688Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8129816Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8129960Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8130034Z del arg0_1 2023-01-11T21:38:06.8130107Z del arg1_1 2023-01-11T21:38:06.8130177Z return (buf0, ) 2023-01-11T21:38:06.8130185Z 2023-01-11T21:38:06.8130190Z 2023-01-11T21:38:06.8130270Z if __name__ == "__main__": 2023-01-11T21:38:06.8130390Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8130518Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8130738Z arg0_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8130972Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8131091Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8131096Z 2023-01-11T21:38:06.8131104Z 2023-01-11T21:38:06.8131202Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8131270Z import torch 2023-01-11T21:38:06.8131344Z import random 2023-01-11T21:38:06.8131461Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8131587Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8131594Z 2023-01-11T21:38:06.8131680Z aten = torch.ops.aten 2023-01-11T21:38:06.8131817Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8131911Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8131917Z 2023-01-11T21:38:06.8131989Z import triton 2023-01-11T21:38:06.8132074Z import triton.language as tl 2023-01-11T21:38:06.8132198Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8132339Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8132344Z 2023-01-11T21:38:06.8132349Z 2023-01-11T21:38:06.8132505Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8132582Z import triton 2023-01-11T21:38:06.8132675Z import triton.language as tl 2023-01-11T21:38:06.8132789Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8132883Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8133016Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8133170Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8133175Z 2023-01-11T21:38:06.8133597Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8133670Z @triton.jit 2023-01-11T21:38:06.8133810Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8133885Z xnumel = 5040 2023-01-11T21:38:06.8133982Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8134105Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8134190Z xmask = xindex < xnumel 
2023-01-11T21:38:06.8134260Z x0 = xindex 2023-01-11T21:38:06.8134379Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8134755Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8134830Z tmp1 = 1 2023-01-11T21:38:06.8134917Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8134988Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8135122Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8135211Z ''') 2023-01-11T21:38:06.8135216Z 2023-01-11T21:38:06.8135220Z 2023-01-11T21:38:06.8135313Z async_compile.wait(globals()) 2023-01-11T21:38:06.8135389Z del async_compile 2023-01-11T21:38:06.8135394Z 2023-01-11T21:38:06.8135471Z def call(args): 2023-01-11T21:38:06.8135549Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8135630Z args.clear() 2023-01-11T21:38:06.8135732Z with torch.cuda.device(0): 2023-01-11T21:38:06.8136038Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8136130Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8136277Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8136349Z del arg0_1 2023-01-11T21:38:06.8136423Z del arg1_1 2023-01-11T21:38:06.8136504Z return (buf0, ) 2023-01-11T21:38:06.8136509Z 2023-01-11T21:38:06.8136513Z 2023-01-11T21:38:06.8136594Z if __name__ == "__main__": 2023-01-11T21:38:06.8136705Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8136831Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8137050Z arg0_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8137338Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8137464Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8137470Z 2023-01-11T21:38:06.8137737Z [2023-01-11 21:36:21,741] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 961 2023-01-11T21:38:06.8138155Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8138287Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8138541Z [2023-01-11 21:36:21,757] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 962 2023-01-11T21:38:06.8138796Z [2023-01-11 21:36:21,826] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 962 2023-01-11T21:38:06.8139209Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
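Note that the float16 kernels above load both inputs with .to(tl.float32) and store into a float16 output buffer: the add is computed in fp32 and rounded once on the store. A rough eager analogue of that numerics choice (a CPU sketch, not the generated code itself):

    import torch

    x = torch.randn(5040, dtype=torch.float16)
    y = torch.randn(2, 3, 4, 5, 6, 7, dtype=torch.float16)

    # Upcast for the arithmetic, downcast once for the result, mirroring the
    # .to(tl.float32) loads and the fp16 store in the kernel.
    out = (x.view(y.shape).float() + y.float()).to(torch.float16)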
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8139378Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8139636Z [2023-01-11 21:36:21,846] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 963 2023-01-11T21:38:06.8139898Z [2023-01-11 21:36:21,917] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 963 2023-01-11T21:38:06.8140312Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8140446Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8140699Z [2023-01-11 21:36:21,936] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 964 2023-01-11T21:38:06.8140710Z 2023-01-11T21:38:06.8140813Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8140888Z import torch 2023-01-11T21:38:06.8140962Z import random 2023-01-11T21:38:06.8141075Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8141197Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8141202Z 2023-01-11T21:38:06.8141284Z aten = torch.ops.aten 2023-01-11T21:38:06.8141420Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8141517Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8141522Z 2023-01-11T21:38:06.8141595Z import triton 2023-01-11T21:38:06.8141716Z import triton.language as tl 2023-01-11T21:38:06.8141837Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8141979Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8141984Z 2023-01-11T21:38:06.8141989Z 2023-01-11T21:38:06.8142145Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8142225Z import triton 2023-01-11T21:38:06.8142322Z import triton.language as tl 2023-01-11T21:38:06.8142440Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8142543Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8142678Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8142799Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8142804Z 2023-01-11T21:38:06.8143223Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8143306Z @triton.jit 2023-01-11T21:38:06.8143450Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8143527Z xnumel = 1000 2023-01-11T21:38:06.8143628Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8143764Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8143851Z xmask = xindex < xnumel 2023-01-11T21:38:06.8143917Z x0 = xindex 2023-01-11T21:38:06.8144020Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8144117Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8144199Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8144336Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 
2023-01-11T21:38:06.8144423Z ''') 2023-01-11T21:38:06.8144428Z 2023-01-11T21:38:06.8144432Z 2023-01-11T21:38:06.8144528Z async_compile.wait(globals()) 2023-01-11T21:38:06.8144601Z del async_compile 2023-01-11T21:38:06.8144615Z 2023-01-11T21:38:06.8144684Z def call(args): 2023-01-11T21:38:06.8144765Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8144844Z args.clear() 2023-01-11T21:38:06.8144940Z with torch.cuda.device(0): 2023-01-11T21:38:06.8145154Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8145285Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8145436Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8145504Z del arg0_1 2023-01-11T21:38:06.8145580Z del arg1_1 2023-01-11T21:38:06.8145662Z return (buf0, ) 2023-01-11T21:38:06.8145668Z 2023-01-11T21:38:06.8145672Z 2023-01-11T21:38:06.8145755Z if __name__ == "__main__": 2023-01-11T21:38:06.8145876Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8146006Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8146213Z arg0_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8146421Z arg1_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8146544Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8146549Z 2023-01-11T21:38:06.8146553Z 2023-01-11T21:38:06.8146653Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8146733Z import torch 2023-01-11T21:38:06.8146810Z import random 2023-01-11T21:38:06.8146930Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8147055Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8147060Z 2023-01-11T21:38:06.8147145Z aten = torch.ops.aten 2023-01-11T21:38:06.8147274Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8147371Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8147376Z 2023-01-11T21:38:06.8147452Z import triton 2023-01-11T21:38:06.8147547Z import triton.language as tl 2023-01-11T21:38:06.8147673Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8147841Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8147847Z 2023-01-11T21:38:06.8147851Z 2023-01-11T21:38:06.8148010Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8148090Z import triton 2023-01-11T21:38:06.8148178Z import triton.language as tl 2023-01-11T21:38:06.8148299Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8148404Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8148538Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8148666Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8148671Z 2023-01-11T21:38:06.8149089Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8149166Z @triton.jit 2023-01-11T21:38:06.8149311Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8149382Z xnumel = 1000 2023-01-11T21:38:06.8149481Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8149612Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8149700Z xmask = xindex 
< xnumel 2023-01-11T21:38:06.8149774Z x0 = xindex 2023-01-11T21:38:06.8149893Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8150013Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8150089Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8150227Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8150316Z ''') 2023-01-11T21:38:06.8150322Z 2023-01-11T21:38:06.8150326Z 2023-01-11T21:38:06.8150421Z async_compile.wait(globals()) 2023-01-11T21:38:06.8150502Z del async_compile 2023-01-11T21:38:06.8150507Z 2023-01-11T21:38:06.8150584Z def call(args): 2023-01-11T21:38:06.8150669Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8150747Z args.clear() 2023-01-11T21:38:06.8150835Z with torch.cuda.device(0): 2023-01-11T21:38:06.8151051Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8151147Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8151324Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8151402Z del arg0_1 2023-01-11T21:38:06.8151478Z del arg1_1 2023-01-11T21:38:06.8151559Z return (buf0, ) 2023-01-11T21:38:06.8151564Z 2023-01-11T21:38:06.8151569Z 2023-01-11T21:38:06.8151644Z if __name__ == "__main__": 2023-01-11T21:38:06.8151765Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8151894Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8152108Z arg0_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8152322Z arg1_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8152443Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8152448Z 2023-01-11T21:38:06.8152452Z 2023-01-11T21:38:06.8152552Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8152629Z import torch 2023-01-11T21:38:06.8152701Z import random 2023-01-11T21:38:06.8152822Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8152948Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8152953Z 2023-01-11T21:38:06.8153037Z aten = torch.ops.aten 2023-01-11T21:38:06.8153177Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8153275Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8153280Z 2023-01-11T21:38:06.8153356Z import triton 2023-01-11T21:38:06.8153450Z import triton.language as tl 2023-01-11T21:38:06.8153571Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8153739Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8153745Z 2023-01-11T21:38:06.8153749Z 2023-01-11T21:38:06.8153909Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8153986Z import triton 2023-01-11T21:38:06.8154080Z import triton.language as tl 2023-01-11T21:38:06.8154199Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8154302Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8154432Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8154559Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8154564Z 2023-01-11T21:38:06.8154984Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 
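# The @pointwise-decorated kernel that follows is Inductor's generated Triton
# code for the (x + 1) + y test case: 'signature' records the pointer dtypes,
# size_hints the padded problem size, and divisible_by_16 the alignment hints
# for the autotuner. Per element it computes roughly this eager-mode
# equivalent (illustrative sketch only; the names a and b are assumed, not
# taken from the log):
#   out = (a + 1) + b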
2023-01-11T21:38:06.8155061Z @triton.jit 2023-01-11T21:38:06.8155204Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8155287Z xnumel = 1000 2023-01-11T21:38:06.8155387Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8155544Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8155637Z xmask = xindex < xnumel 2023-01-11T21:38:06.8155721Z x0 = xindex 2023-01-11T21:38:06.8155826Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8155925Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8156000Z tmp1 = 1 2023-01-11T21:38:06.8156083Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8156164Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8156295Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8156384Z ''') 2023-01-11T21:38:06.8156389Z 2023-01-11T21:38:06.8156394Z 2023-01-11T21:38:06.8156491Z async_compile.wait(globals()) 2023-01-11T21:38:06.8156570Z del async_compile 2023-01-11T21:38:06.8156575Z 2023-01-11T21:38:06.8156658Z def call(args): 2023-01-11T21:38:06.8156742Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8156824Z args.clear() 2023-01-11T21:38:06.8156918Z with torch.cuda.device(0): 2023-01-11T21:38:06.8157124Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8157218Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8157392Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8157470Z del arg0_1 2023-01-11T21:38:06.8157546Z del arg1_1 2023-01-11T21:38:06.8157627Z return (buf0, ) 2023-01-11T21:38:06.8157632Z 2023-01-11T21:38:06.8157637Z 2023-01-11T21:38:06.8157720Z if __name__ == "__main__": 2023-01-11T21:38:06.8157840Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8157961Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8158170Z arg0_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8158389Z arg1_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8158510Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8158515Z 2023-01-11T21:38:06.8158784Z [2023-01-11 21:36:22,007] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 964 2023-01-11T21:38:06.8159201Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8159335Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8159593Z [2023-01-11 21:36:22,023] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 965 2023-01-11T21:38:06.8159882Z [2023-01-11 21:36:22,089] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 965 2023-01-11T21:38:06.8160298Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8160425Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8160682Z [2023-01-11 21:36:22,105] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 966 2023-01-11T21:38:06.8160945Z [2023-01-11 21:36:22,175] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 966 2023-01-11T21:38:06.8161359Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8161495Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8161753Z [2023-01-11 21:36:22,193] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 967 2023-01-11T21:38:06.8161759Z 2023-01-11T21:38:06.8161859Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8161936Z import torch 2023-01-11T21:38:06.8162015Z import random 2023-01-11T21:38:06.8162138Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8162258Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8162263Z 2023-01-11T21:38:06.8162348Z aten = torch.ops.aten 2023-01-11T21:38:06.8162490Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8162591Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8162596Z 2023-01-11T21:38:06.8162672Z import triton 2023-01-11T21:38:06.8162767Z import triton.language as tl 2023-01-11T21:38:06.8162898Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8163033Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8163071Z 2023-01-11T21:38:06.8163077Z 2023-01-11T21:38:06.8163230Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8163311Z import triton 2023-01-11T21:38:06.8163406Z import triton.language as tl 2023-01-11T21:38:06.8163524Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8163628Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8163765Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8163896Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8163901Z 2023-01-11T21:38:06.8164312Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8164393Z @triton.jit 2023-01-11T21:38:06.8164535Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8164614Z xnumel = 1000 2023-01-11T21:38:06.8164713Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8164845Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8164931Z xmask = xindex < xnumel 2023-01-11T21:38:06.8165009Z x0 = xindex 2023-01-11T21:38:06.8165121Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8165241Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8165317Z tmp1 = 1 2023-01-11T21:38:06.8165400Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8165481Z tmp4 
= tmp2 + tmp3 2023-01-11T21:38:06.8165619Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8165745Z ''') 2023-01-11T21:38:06.8165750Z 2023-01-11T21:38:06.8165755Z 2023-01-11T21:38:06.8165844Z async_compile.wait(globals()) 2023-01-11T21:38:06.8165923Z del async_compile 2023-01-11T21:38:06.8165929Z 2023-01-11T21:38:06.8166005Z def call(args): 2023-01-11T21:38:06.8166089Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8166167Z args.clear() 2023-01-11T21:38:06.8166261Z with torch.cuda.device(0): 2023-01-11T21:38:06.8166475Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8166563Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8166710Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8166785Z del arg0_1 2023-01-11T21:38:06.8166860Z del arg1_1 2023-01-11T21:38:06.8166945Z return (buf0, ) 2023-01-11T21:38:06.8166950Z 2023-01-11T21:38:06.8166955Z 2023-01-11T21:38:06.8167042Z if __name__ == "__main__": 2023-01-11T21:38:06.8167164Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8167293Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8167493Z arg0_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8167707Z arg1_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8167829Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8167834Z 2023-01-11T21:38:06.8167839Z 2023-01-11T21:38:06.8167940Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8168017Z import torch 2023-01-11T21:38:06.8168094Z import random 2023-01-11T21:38:06.8168214Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8168340Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8168345Z 2023-01-11T21:38:06.8168423Z aten = torch.ops.aten 2023-01-11T21:38:06.8168559Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8168660Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8168666Z 2023-01-11T21:38:06.8168743Z import triton 2023-01-11T21:38:06.8168839Z import triton.language as tl 2023-01-11T21:38:06.8168967Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8169137Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8169143Z 2023-01-11T21:38:06.8169147Z 2023-01-11T21:38:06.8169305Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8169375Z import triton 2023-01-11T21:38:06.8169469Z import triton.language as tl 2023-01-11T21:38:06.8169583Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8169688Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8169822Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8169949Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8169954Z 2023-01-11T21:38:06.8170374Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8170454Z @triton.jit 2023-01-11T21:38:06.8170592Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8170668Z xnumel = 10 2023-01-11T21:38:06.8170769Z xoffset = tl.program_id(0) * XBLOCK 
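# Indexing idiom shared by all of these pointwise kernels: each Triton program
# covers XBLOCK consecutive elements, xindex enumerates the lanes it owns, and
# the xmask computed next disables the tail lanes whenever xnumel (10 here) is
# not a multiple of XBLOCK.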
2023-01-11T21:38:06.8170902Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8170989Z xmask = xindex < xnumel 2023-01-11T21:38:06.8171062Z x0 = xindex 2023-01-11T21:38:06.8171160Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8171252Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8171334Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8171471Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8171559Z ''') 2023-01-11T21:38:06.8171590Z 2023-01-11T21:38:06.8171595Z 2023-01-11T21:38:06.8171692Z async_compile.wait(globals()) 2023-01-11T21:38:06.8171772Z del async_compile 2023-01-11T21:38:06.8171778Z 2023-01-11T21:38:06.8171855Z def call(args): 2023-01-11T21:38:06.8171937Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8172008Z args.clear() 2023-01-11T21:38:06.8172104Z with torch.cuda.device(0): 2023-01-11T21:38:06.8172306Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8172403Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8172546Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.8172623Z del arg0_1 2023-01-11T21:38:06.8172698Z del arg1_1 2023-01-11T21:38:06.8172771Z return (buf0, ) 2023-01-11T21:38:06.8172776Z 2023-01-11T21:38:06.8172780Z 2023-01-11T21:38:06.8172863Z if __name__ == "__main__": 2023-01-11T21:38:06.8172984Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8173114Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8173326Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8173528Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8173652Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8173657Z 2023-01-11T21:38:06.8173662Z 2023-01-11T21:38:06.8173765Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8173834Z import torch 2023-01-11T21:38:06.8173910Z import random 2023-01-11T21:38:06.8174030Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8174156Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8174161Z 2023-01-11T21:38:06.8174246Z aten = torch.ops.aten 2023-01-11T21:38:06.8174383Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8174686Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8174695Z 2023-01-11T21:38:06.8174772Z import triton 2023-01-11T21:38:06.8174858Z import triton.language as tl 2023-01-11T21:38:06.8174984Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8175122Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8175128Z 2023-01-11T21:38:06.8175133Z 2023-01-11T21:38:06.8175331Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8175407Z import triton 2023-01-11T21:38:06.8175500Z import triton.language as tl 2023-01-11T21:38:06.8175614Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8175709Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8175842Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8175966Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8175972Z 2023-01-11T21:38:06.8176391Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8176469Z @triton.jit 2023-01-11T21:38:06.8176609Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8176682Z xnumel = 10 2023-01-11T21:38:06.8176781Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8176904Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8176988Z xmask = xindex < xnumel 2023-01-11T21:38:06.8177060Z x0 = xindex 2023-01-11T21:38:06.8177232Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8177368Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8177461Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8177607Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8177686Z ''') 2023-01-11T21:38:06.8177697Z 2023-01-11T21:38:06.8177702Z 2023-01-11T21:38:06.8177788Z async_compile.wait(globals()) 2023-01-11T21:38:06.8177932Z del async_compile 2023-01-11T21:38:06.8177937Z 2023-01-11T21:38:06.8178014Z def call(args): 2023-01-11T21:38:06.8178096Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8178177Z args.clear() 2023-01-11T21:38:06.8178274Z with torch.cuda.device(0): 2023-01-11T21:38:06.8178478Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8178567Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8178712Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.8178788Z del arg0_1 2023-01-11T21:38:06.8178863Z del arg1_1 2023-01-11T21:38:06.8178943Z return (buf0, ) 2023-01-11T21:38:06.8178948Z 2023-01-11T21:38:06.8178952Z 2023-01-11T21:38:06.8179034Z if __name__ == "__main__": 2023-01-11T21:38:06.8179155Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8179276Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8179489Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8179690Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8179812Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8179817Z 2023-01-11T21:38:06.8180090Z [2023-01-11 21:36:22,261] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 967 2023-01-11T21:38:06.8180507Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8180642Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8180899Z [2023-01-11 21:36:22,279] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 968 2023-01-11T21:38:06.8181169Z [2023-01-11 21:36:22,355] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 968 2023-01-11T21:38:06.8181609Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8181746Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8181996Z [2023-01-11 21:36:22,371] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 969 2023-01-11T21:38:06.8182258Z [2023-01-11 21:36:22,382] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 969 2023-01-11T21:38:06.8182673Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8182815Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8183068Z [2023-01-11 21:36:22,397] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 970 2023-01-11T21:38:06.8183074Z 2023-01-11T21:38:06.8183179Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8183256Z import torch 2023-01-11T21:38:06.8183333Z import random 2023-01-11T21:38:06.8183458Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8183578Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8183590Z 2023-01-11T21:38:06.8183668Z aten = torch.ops.aten 2023-01-11T21:38:06.8183835Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8183933Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8183939Z 2023-01-11T21:38:06.8184015Z import triton 2023-01-11T21:38:06.8184109Z import triton.language as tl 2023-01-11T21:38:06.8184239Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8184384Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8184389Z 2023-01-11T21:38:06.8184394Z 2023-01-11T21:38:06.8184555Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8184626Z import triton 2023-01-11T21:38:06.8184720Z import triton.language as tl 2023-01-11T21:38:06.8184837Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8184943Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8185078Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8185207Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8185215Z 2023-01-11T21:38:06.8185674Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8185758Z @triton.jit 2023-01-11T21:38:06.8185895Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8185971Z xnumel = 10 2023-01-11T21:38:06.8186070Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8186202Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8186290Z xmask = xindex < xnumel 2023-01-11T21:38:06.8186364Z x0 = xindex 2023-01-11T21:38:06.8186457Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8186558Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8186630Z tmp1 = 1 2023-01-11T21:38:06.8186712Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8186794Z tmp4 = tmp2 + tmp3 
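# tmp0 and tmp3 are the two masked loads, tmp1 is the constant 1 folded in
# from the traced graph, so the adds above reproduce (arg0_1 + 1) + arg1_1.
# The fp16 variant of this kernel, dumped next in the log, differs only in
# loading with an explicit .to(tl.float32) upcast so the arithmetic still runs
# in fp32 before the masked tl.store below.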
2023-01-11T21:38:06.8186939Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8187027Z ''') 2023-01-11T21:38:06.8187032Z 2023-01-11T21:38:06.8187037Z 2023-01-11T21:38:06.8187134Z async_compile.wait(globals()) 2023-01-11T21:38:06.8187206Z del async_compile 2023-01-11T21:38:06.8187211Z 2023-01-11T21:38:06.8187317Z def call(args): 2023-01-11T21:38:06.8187401Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8187479Z args.clear() 2023-01-11T21:38:06.8187573Z with torch.cuda.device(0): 2023-01-11T21:38:06.8187774Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8187868Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8188007Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.8188084Z del arg0_1 2023-01-11T21:38:06.8188160Z del arg1_1 2023-01-11T21:38:06.8188239Z return (buf0, ) 2023-01-11T21:38:06.8188248Z 2023-01-11T21:38:06.8188252Z 2023-01-11T21:38:06.8188334Z if __name__ == "__main__": 2023-01-11T21:38:06.8188455Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8188586Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8188800Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8188992Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8189114Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8189119Z 2023-01-11T21:38:06.8189124Z 2023-01-11T21:38:06.8189225Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8189303Z import torch 2023-01-11T21:38:06.8189380Z import random 2023-01-11T21:38:06.8189500Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8189626Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8189631Z 2023-01-11T21:38:06.8189716Z aten = torch.ops.aten 2023-01-11T21:38:06.8189874Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8189973Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8189979Z 2023-01-11T21:38:06.8190055Z import triton 2023-01-11T21:38:06.8190150Z import triton.language as tl 2023-01-11T21:38:06.8190278Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8190421Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8190427Z 2023-01-11T21:38:06.8190431Z 2023-01-11T21:38:06.8190590Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8190667Z import triton 2023-01-11T21:38:06.8190755Z import triton.language as tl 2023-01-11T21:38:06.8190875Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8190980Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8191116Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8191244Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8191252Z 2023-01-11T21:38:06.8191667Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8191743Z @triton.jit 2023-01-11T21:38:06.8191890Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8191959Z xnumel = 10 2023-01-11T21:38:06.8192059Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8192191Z xindex = xoffset + 
tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8192279Z xmask = xindex < xnumel 2023-01-11T21:38:06.8192353Z x0 = xindex 2023-01-11T21:38:06.8192472Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8192591Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8192657Z tmp1 = 1 2023-01-11T21:38:06.8192740Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8192824Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8192963Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8193050Z ''') 2023-01-11T21:38:06.8193056Z 2023-01-11T21:38:06.8193060Z 2023-01-11T21:38:06.8193154Z async_compile.wait(globals()) 2023-01-11T21:38:06.8193233Z del async_compile 2023-01-11T21:38:06.8193238Z 2023-01-11T21:38:06.8193334Z def call(args): 2023-01-11T21:38:06.8193417Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8193495Z args.clear() 2023-01-11T21:38:06.8193589Z with torch.cuda.device(0): 2023-01-11T21:38:06.8193792Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8193886Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8194033Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.8194103Z del arg0_1 2023-01-11T21:38:06.8194178Z del arg1_1 2023-01-11T21:38:06.8194259Z return (buf0, ) 2023-01-11T21:38:06.8194267Z 2023-01-11T21:38:06.8194271Z 2023-01-11T21:38:06.8194354Z if __name__ == "__main__": 2023-01-11T21:38:06.8194475Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8194604Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8194816Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8195015Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8195130Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8195135Z 2023-01-11T21:38:06.8195149Z 2023-01-11T21:38:06.8195242Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8195318Z import torch 2023-01-11T21:38:06.8195395Z import random 2023-01-11T21:38:06.8195515Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8195641Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8195646Z 2023-01-11T21:38:06.8195732Z aten = torch.ops.aten 2023-01-11T21:38:06.8195897Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8195988Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8195993Z 2023-01-11T21:38:06.8196068Z import triton 2023-01-11T21:38:06.8196168Z import triton.language as tl 2023-01-11T21:38:06.8196298Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8196439Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8196444Z 2023-01-11T21:38:06.8196449Z 2023-01-11T21:38:06.8196604Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8196682Z import triton 2023-01-11T21:38:06.8196777Z import triton.language as tl 2023-01-11T21:38:06.8196886Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8196989Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8197123Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8197255Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8197263Z 2023-01-11T21:38:06.8197684Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 
'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8197760Z @triton.jit 2023-01-11T21:38:06.8197907Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8197984Z xnumel = 1000 2023-01-11T21:38:06.8198076Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8198209Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8198295Z xmask = xindex < xnumel 2023-01-11T21:38:06.8198368Z x0 = xindex 2023-01-11T21:38:06.8198469Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8198568Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8198649Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8198778Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8198867Z ''') 2023-01-11T21:38:06.8198873Z 2023-01-11T21:38:06.8198877Z 2023-01-11T21:38:06.8198972Z async_compile.wait(globals()) 2023-01-11T21:38:06.8199050Z del async_compile 2023-01-11T21:38:06.8199056Z 2023-01-11T21:38:06.8199133Z def call(args): 2023-01-11T21:38:06.8199215Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8199325Z args.clear() 2023-01-11T21:38:06.8199413Z with torch.cuda.device(0): 2023-01-11T21:38:06.8199620Z buf0 = empty_strided((10, 100), (100, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8199715Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8199860Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8199936Z del arg0_1 2023-01-11T21:38:06.8200011Z del arg1_1 2023-01-11T21:38:06.8200091Z return (buf0, ) 2023-01-11T21:38:06.8200097Z 2023-01-11T21:38:06.8200101Z 2023-01-11T21:38:06.8200186Z if __name__ == "__main__": 2023-01-11T21:38:06.8200302Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8200431Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8200668Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8200877Z arg1_1 = rand_strided((10, 100), (100, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8200999Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8201004Z 2023-01-11T21:38:06.8201271Z [2023-01-11 21:36:22,408] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 970 2023-01-11T21:38:06.8201689Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8201849Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8202108Z [2023-01-11 21:36:22,427] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 971 2023-01-11T21:38:06.8202377Z [2023-01-11 21:36:22,440] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 971 2023-01-11T21:38:06.8202785Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8202917Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8203176Z [2023-01-11 21:36:22,459] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 972 2023-01-11T21:38:06.8203440Z [2023-01-11 21:36:22,471] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 972 2023-01-11T21:38:06.8203854Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8203988Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8204243Z [2023-01-11 21:36:22,486] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 973 2023-01-11T21:38:06.8204249Z 2023-01-11T21:38:06.8204349Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8204429Z import torch 2023-01-11T21:38:06.8204507Z import random 2023-01-11T21:38:06.8204621Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8204750Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8204756Z 2023-01-11T21:38:06.8204839Z aten = torch.ops.aten 2023-01-11T21:38:06.8204981Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8205077Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8205108Z 2023-01-11T21:38:06.8205189Z import triton 2023-01-11T21:38:06.8205283Z import triton.language as tl 2023-01-11T21:38:06.8205404Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8205562Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8205568Z 2023-01-11T21:38:06.8205574Z 2023-01-11T21:38:06.8205753Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8205840Z import triton 2023-01-11T21:38:06.8205936Z import triton.language as tl 2023-01-11T21:38:06.8206052Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8206156Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8206296Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8206416Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8206428Z 2023-01-11T21:38:06.8206845Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8206923Z @triton.jit 2023-01-11T21:38:06.8207066Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8207145Z xnumel = 1000 2023-01-11T21:38:06.8207248Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8207382Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8207467Z xmask = xindex < xnumel 2023-01-11T21:38:06.8207535Z x0 = xindex 2023-01-11T21:38:06.8207656Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8207810Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8207896Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8208033Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], 
tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8208120Z ''') 2023-01-11T21:38:06.8208125Z 2023-01-11T21:38:06.8208130Z 2023-01-11T21:38:06.8208229Z async_compile.wait(globals()) 2023-01-11T21:38:06.8208310Z del async_compile 2023-01-11T21:38:06.8208315Z 2023-01-11T21:38:06.8208385Z def call(args): 2023-01-11T21:38:06.8208469Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8208547Z args.clear() 2023-01-11T21:38:06.8208642Z with torch.cuda.device(0): 2023-01-11T21:38:06.8208853Z buf0 = empty_strided((10, 100), (100, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8208948Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8209093Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8209162Z del arg0_1 2023-01-11T21:38:06.8209240Z del arg1_1 2023-01-11T21:38:06.8209322Z return (buf0, ) 2023-01-11T21:38:06.8209327Z 2023-01-11T21:38:06.8209331Z 2023-01-11T21:38:06.8209414Z if __name__ == "__main__": 2023-01-11T21:38:06.8209534Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8209666Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8209901Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8210109Z arg1_1 = rand_strided((10, 100), (100, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8210222Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8210227Z 2023-01-11T21:38:06.8210241Z 2023-01-11T21:38:06.8210333Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8210409Z import torch 2023-01-11T21:38:06.8210488Z import random 2023-01-11T21:38:06.8210609Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8210738Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8210743Z 2023-01-11T21:38:06.8210829Z aten = torch.ops.aten 2023-01-11T21:38:06.8210966Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8211056Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8211061Z 2023-01-11T21:38:06.8211163Z import triton 2023-01-11T21:38:06.8211258Z import triton.language as tl 2023-01-11T21:38:06.8211386Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8211529Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8211534Z 2023-01-11T21:38:06.8211539Z 2023-01-11T21:38:06.8211700Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8211779Z import triton 2023-01-11T21:38:06.8211880Z import triton.language as tl 2023-01-11T21:38:06.8211989Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8212092Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8212232Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8212358Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8212363Z 2023-01-11T21:38:06.8212783Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8212859Z @triton.jit 2023-01-11T21:38:06.8213002Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8213080Z xnumel = 1000 2023-01-11T21:38:06.8213173Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8213306Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 
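# size_hints=[1024] is xnumel (1000) rounded up to the next power of two for
# autotuning; the call() wrapper further down launches this kernel with
# grid(1000) on the stream returned by get_cuda_stream(0), and XBLOCK is the
# block size the @pointwise autotuner settles on.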
2023-01-11T21:38:06.8213392Z xmask = xindex < xnumel 2023-01-11T21:38:06.8213470Z x0 = xindex 2023-01-11T21:38:06.8213575Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8213674Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8213775Z tmp1 = 1 2023-01-11T21:38:06.8213850Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8213931Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8214072Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8214161Z ''') 2023-01-11T21:38:06.8214166Z 2023-01-11T21:38:06.8214175Z 2023-01-11T21:38:06.8214269Z async_compile.wait(globals()) 2023-01-11T21:38:06.8214349Z del async_compile 2023-01-11T21:38:06.8214354Z 2023-01-11T21:38:06.8214431Z def call(args): 2023-01-11T21:38:06.8214706Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8214791Z args.clear() 2023-01-11T21:38:06.8214886Z with torch.cuda.device(0): 2023-01-11T21:38:06.8215098Z buf0 = empty_strided((10, 100), (100, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8215192Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8215341Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8215421Z del arg0_1 2023-01-11T21:38:06.8215489Z del arg1_1 2023-01-11T21:38:06.8215570Z return (buf0, ) 2023-01-11T21:38:06.8215575Z 2023-01-11T21:38:06.8215579Z 2023-01-11T21:38:06.8215661Z if __name__ == "__main__": 2023-01-11T21:38:06.8215781Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8215912Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8216148Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8216355Z arg1_1 = rand_strided((10, 100), (100, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8216476Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8216482Z 2023-01-11T21:38:06.8216486Z 2023-01-11T21:38:06.8216586Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8216656Z import torch 2023-01-11T21:38:06.8216731Z import random 2023-01-11T21:38:06.8216853Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8216982Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8216987Z 2023-01-11T21:38:06.8217071Z aten = torch.ops.aten 2023-01-11T21:38:06.8217266Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8217364Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8217370Z 2023-01-11T21:38:06.8217491Z import triton 2023-01-11T21:38:06.8217590Z import triton.language as tl 2023-01-11T21:38:06.8217720Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8217862Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8217868Z 2023-01-11T21:38:06.8217872Z 2023-01-11T21:38:06.8218033Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8218111Z import triton 2023-01-11T21:38:06.8218208Z import triton.language as tl 2023-01-11T21:38:06.8218324Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8218422Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8218560Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8218689Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8218694Z 2023-01-11T21:38:06.8219116Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 
'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8219192Z @triton.jit 2023-01-11T21:38:06.8219335Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8219416Z xnumel = 1000 2023-01-11T21:38:06.8219515Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8219640Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8219725Z xmask = xindex < xnumel 2023-01-11T21:38:06.8219799Z x0 = xindex 2023-01-11T21:38:06.8219919Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8220075Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8220148Z tmp1 = 1 2023-01-11T21:38:06.8220229Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8220304Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8220443Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8220536Z ''') 2023-01-11T21:38:06.8220543Z 2023-01-11T21:38:06.8220548Z 2023-01-11T21:38:06.8220642Z async_compile.wait(globals()) 2023-01-11T21:38:06.8220723Z del async_compile 2023-01-11T21:38:06.8220728Z 2023-01-11T21:38:06.8220805Z def call(args): 2023-01-11T21:38:06.8220888Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8220958Z args.clear() 2023-01-11T21:38:06.8221052Z with torch.cuda.device(0): 2023-01-11T21:38:06.8221263Z buf0 = empty_strided((10, 100), (100, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8221357Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8221507Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8221585Z del arg0_1 2023-01-11T21:38:06.8221660Z del arg1_1 2023-01-11T21:38:06.8221733Z return (buf0, ) 2023-01-11T21:38:06.8221746Z 2023-01-11T21:38:06.8221750Z 2023-01-11T21:38:06.8221826Z if __name__ == "__main__": 2023-01-11T21:38:06.8221950Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8222079Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8222315Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8222522Z arg1_1 = rand_strided((10, 100), (100, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8222645Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8222650Z 2023-01-11T21:38:06.8222916Z [2023-01-11 21:36:22,553] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 973 2023-01-11T21:38:06.8223329Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8223506Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8223757Z [2023-01-11 21:36:22,569] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 974 2023-01-11T21:38:06.8224025Z [2023-01-11 21:36:22,639] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 974 2023-01-11T21:38:06.8224441Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. 
This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8224580Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8224838Z [2023-01-11 21:36:22,658] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 975 2023-01-11T21:38:06.8225107Z [2023-01-11 21:36:22,730] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 975 2023-01-11T21:38:06.8225520Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8225668Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8225956Z [2023-01-11 21:36:22,749] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 976 2023-01-11T21:38:06.8225988Z 2023-01-11T21:38:06.8226095Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8226173Z import torch 2023-01-11T21:38:06.8226243Z import random 2023-01-11T21:38:06.8226365Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8226497Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8226503Z 2023-01-11T21:38:06.8226588Z aten = torch.ops.aten 2023-01-11T21:38:06.8226728Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8226829Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8226834Z 2023-01-11T21:38:06.8226911Z import triton 2023-01-11T21:38:06.8226998Z import triton.language as tl 2023-01-11T21:38:06.8227129Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8227270Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8227276Z 2023-01-11T21:38:06.8227280Z 2023-01-11T21:38:06.8227440Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8227517Z import triton 2023-01-11T21:38:06.8227613Z import triton.language as tl 2023-01-11T21:38:06.8227734Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8227838Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8227968Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8228097Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8228102Z 2023-01-11T21:38:06.8228523Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8228599Z @triton.jit 2023-01-11T21:38:06.8228743Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8228820Z xnumel = 16 2023-01-11T21:38:06.8228921Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8229057Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8229136Z xmask = xindex < xnumel 2023-01-11T21:38:06.8229211Z x0 = xindex 2023-01-11T21:38:06.8229311Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8229410Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8229518Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8229657Z 
tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8229745Z ''') 2023-01-11T21:38:06.8229750Z 2023-01-11T21:38:06.8229754Z 2023-01-11T21:38:06.8229854Z async_compile.wait(globals()) 2023-01-11T21:38:06.8229927Z del async_compile 2023-01-11T21:38:06.8229932Z 2023-01-11T21:38:06.8230009Z def call(args): 2023-01-11T21:38:06.8230091Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8230168Z args.clear() 2023-01-11T21:38:06.8230261Z with torch.cuda.device(0): 2023-01-11T21:38:06.8230464Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8230565Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8230703Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8230781Z del arg0_1 2023-01-11T21:38:06.8230855Z del arg1_1 2023-01-11T21:38:06.8230943Z return (buf0, ) 2023-01-11T21:38:06.8230948Z 2023-01-11T21:38:06.8230953Z 2023-01-11T21:38:06.8231038Z if __name__ == "__main__": 2023-01-11T21:38:06.8231158Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8231288Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8231497Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8231698Z arg1_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8231821Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8231826Z 2023-01-11T21:38:06.8231856Z 2023-01-11T21:38:06.8231956Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8232035Z import torch 2023-01-11T21:38:06.8232111Z import random 2023-01-11T21:38:06.8232232Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8232358Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8232363Z 2023-01-11T21:38:06.8232442Z aten = torch.ops.aten 2023-01-11T21:38:06.8232580Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8232677Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8232682Z 2023-01-11T21:38:06.8232760Z import triton 2023-01-11T21:38:06.8232854Z import triton.language as tl 2023-01-11T21:38:06.8232982Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8233123Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8233128Z 2023-01-11T21:38:06.8233132Z 2023-01-11T21:38:06.8233289Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8233363Z import triton 2023-01-11T21:38:06.8233458Z import triton.language as tl 2023-01-11T21:38:06.8233576Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8233680Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8233815Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8233944Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8233949Z 2023-01-11T21:38:06.8234370Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8234445Z @triton.jit 2023-01-11T21:38:06.8234581Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8234659Z xnumel = 16 2023-01-11T21:38:06.8234758Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8234892Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), 
[XBLOCK]) 2023-01-11T21:38:06.8234980Z xmask = xindex < xnumel 2023-01-11T21:38:06.8235055Z x0 = xindex 2023-01-11T21:38:06.8235175Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8235286Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8235370Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8235533Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8235624Z ''') 2023-01-11T21:38:06.8235629Z 2023-01-11T21:38:06.8235634Z 2023-01-11T21:38:06.8235728Z async_compile.wait(globals()) 2023-01-11T21:38:06.8235807Z del async_compile 2023-01-11T21:38:06.8235812Z 2023-01-11T21:38:06.8235889Z def call(args): 2023-01-11T21:38:06.8235970Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8236040Z args.clear() 2023-01-11T21:38:06.8236135Z with torch.cuda.device(0): 2023-01-11T21:38:06.8236338Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8236438Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8236582Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8236658Z del arg0_1 2023-01-11T21:38:06.8236733Z del arg1_1 2023-01-11T21:38:06.8236805Z return (buf0, ) 2023-01-11T21:38:06.8236811Z 2023-01-11T21:38:06.8236826Z 2023-01-11T21:38:06.8236903Z if __name__ == "__main__": 2023-01-11T21:38:06.8237025Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8237154Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8237369Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8237569Z arg1_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8237690Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8237695Z 2023-01-11T21:38:06.8237700Z 2023-01-11T21:38:06.8237800Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8237901Z import torch 2023-01-11T21:38:06.8237969Z import random 2023-01-11T21:38:06.8238085Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8238209Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8238214Z 2023-01-11T21:38:06.8238295Z aten = torch.ops.aten 2023-01-11T21:38:06.8238434Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8238529Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8238534Z 2023-01-11T21:38:06.8238607Z import triton 2023-01-11T21:38:06.8238691Z import triton.language as tl 2023-01-11T21:38:06.8238816Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8238954Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8238960Z 2023-01-11T21:38:06.8238964Z 2023-01-11T21:38:06.8239120Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8239194Z import triton 2023-01-11T21:38:06.8239286Z import triton.language as tl 2023-01-11T21:38:06.8239404Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8239507Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8239634Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8239757Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8239763Z 2023-01-11T21:38:06.8240187Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), 
equal_to_1=())]}) 2023-01-11T21:38:06.8240265Z @triton.jit 2023-01-11T21:38:06.8240406Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8240481Z xnumel = 16 2023-01-11T21:38:06.8240579Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8240709Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8240786Z xmask = xindex < xnumel 2023-01-11T21:38:06.8240864Z x0 = xindex 2023-01-11T21:38:06.8240962Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8241060Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8241131Z tmp1 = 1 2023-01-11T21:38:06.8241210Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8241281Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8241445Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8241532Z ''') 2023-01-11T21:38:06.8241537Z 2023-01-11T21:38:06.8241542Z 2023-01-11T21:38:06.8241634Z async_compile.wait(globals()) 2023-01-11T21:38:06.8241712Z del async_compile 2023-01-11T21:38:06.8241717Z 2023-01-11T21:38:06.8241791Z def call(args): 2023-01-11T21:38:06.8241870Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8241949Z args.clear() 2023-01-11T21:38:06.8242032Z with torch.cuda.device(0): 2023-01-11T21:38:06.8242232Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8242323Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8242470Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8242544Z del arg0_1 2023-01-11T21:38:06.8242617Z del arg1_1 2023-01-11T21:38:06.8242694Z return (buf0, ) 2023-01-11T21:38:06.8242700Z 2023-01-11T21:38:06.8242704Z 2023-01-11T21:38:06.8242787Z if __name__ == "__main__": 2023-01-11T21:38:06.8242897Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8243024Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8243235Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8243431Z arg1_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8243555Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8243560Z 2023-01-11T21:38:06.8243827Z [2023-01-11 21:36:22,818] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 976 2023-01-11T21:38:06.8244276Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8244411Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8244667Z [2023-01-11 21:36:22,834] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 977 2023-01-11T21:38:06.8244920Z [2023-01-11 21:36:22,843] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 977 2023-01-11T21:38:06.8245333Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8245468Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8245724Z [2023-01-11 21:36:22,859] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 978 2023-01-11T21:38:06.8245986Z [2023-01-11 21:36:22,869] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 978 2023-01-11T21:38:06.8246396Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8246527Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8246781Z [2023-01-11 21:36:22,887] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 979 2023-01-11T21:38:06.8246787Z 2023-01-11T21:38:06.8246885Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8246961Z import torch 2023-01-11T21:38:06.8247034Z import random 2023-01-11T21:38:06.8247171Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8247297Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8247302Z 2023-01-11T21:38:06.8247384Z aten = torch.ops.aten 2023-01-11T21:38:06.8247519Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8247616Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8247621Z 2023-01-11T21:38:06.8247695Z import triton 2023-01-11T21:38:06.8247787Z import triton.language as tl 2023-01-11T21:38:06.8247904Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8248044Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8248053Z 2023-01-11T21:38:06.8248057Z 2023-01-11T21:38:06.8248216Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8248290Z import triton 2023-01-11T21:38:06.8248383Z import triton.language as tl 2023-01-11T21:38:06.8248498Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8248601Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8248735Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8248853Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8248858Z 2023-01-11T21:38:06.8249277Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8249354Z @triton.jit 2023-01-11T21:38:06.8249492Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8249606Z xnumel = 16 2023-01-11T21:38:06.8249703Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8249833Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8249917Z xmask = xindex < xnumel 2023-01-11T21:38:06.8249981Z x0 = xindex 2023-01-11T21:38:06.8250103Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8250219Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8250290Z tmp1 = 1 2023-01-11T21:38:06.8250370Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8250447Z tmp4 = 
tmp2 + tmp3 2023-01-11T21:38:06.8250582Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8250661Z ''') 2023-01-11T21:38:06.8250666Z 2023-01-11T21:38:06.8250671Z 2023-01-11T21:38:06.8250765Z async_compile.wait(globals()) 2023-01-11T21:38:06.8250841Z del async_compile 2023-01-11T21:38:06.8250846Z 2023-01-11T21:38:06.8250921Z def call(args): 2023-01-11T21:38:06.8251004Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8251082Z args.clear() 2023-01-11T21:38:06.8251175Z with torch.cuda.device(0): 2023-01-11T21:38:06.8251368Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8251464Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8251611Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8251685Z del arg0_1 2023-01-11T21:38:06.8251758Z del arg1_1 2023-01-11T21:38:06.8251835Z return (buf0, ) 2023-01-11T21:38:06.8251840Z 2023-01-11T21:38:06.8251845Z 2023-01-11T21:38:06.8251924Z if __name__ == "__main__": 2023-01-11T21:38:06.8252044Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8252163Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8252375Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8252575Z arg1_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8252698Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8252703Z 2023-01-11T21:38:06.8252708Z 2023-01-11T21:38:06.8252806Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8252882Z import torch 2023-01-11T21:38:06.8252957Z import random 2023-01-11T21:38:06.8253105Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8253222Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8253227Z 2023-01-11T21:38:06.8253310Z aten = torch.ops.aten 2023-01-11T21:38:06.8253446Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8253543Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8253548Z 2023-01-11T21:38:06.8253623Z import triton 2023-01-11T21:38:06.8253715Z import triton.language as tl 2023-01-11T21:38:06.8253841Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8253973Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8253988Z 2023-01-11T21:38:06.8253992Z 2023-01-11T21:38:06.8254139Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8254215Z import triton 2023-01-11T21:38:06.8254309Z import triton.language as tl 2023-01-11T21:38:06.8254424Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8254749Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8254884Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8255010Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8255015Z 2023-01-11T21:38:06.8255433Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8255509Z @triton.jit 2023-01-11T21:38:06.8255669Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8255825Z xnumel = 35 2023-01-11T21:38:06.8255927Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8256060Z xindex = 
xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8256148Z xmask = xindex < xnumel 2023-01-11T21:38:06.8256224Z x0 = xindex 2023-01-11T21:38:06.8256319Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8256418Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8256499Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8256638Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8256728Z ''') 2023-01-11T21:38:06.8256733Z 2023-01-11T21:38:06.8256738Z 2023-01-11T21:38:06.8256834Z async_compile.wait(globals()) 2023-01-11T21:38:06.8256915Z del async_compile 2023-01-11T21:38:06.8256920Z 2023-01-11T21:38:06.8256998Z def call(args): 2023-01-11T21:38:06.8257073Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8257219Z args.clear() 2023-01-11T21:38:06.8257338Z with torch.cuda.device(0): 2023-01-11T21:38:06.8257543Z buf0 = empty_strided((35, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8257640Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8257788Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 35, grid=grid(35), stream=stream0) 2023-01-11T21:38:06.8257869Z del arg0_1 2023-01-11T21:38:06.8257937Z del arg1_1 2023-01-11T21:38:06.8258016Z return (buf0, ) 2023-01-11T21:38:06.8258021Z 2023-01-11T21:38:06.8258026Z 2023-01-11T21:38:06.8258109Z if __name__ == "__main__": 2023-01-11T21:38:06.8258229Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8258359Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8258561Z arg0_1 = rand_strided((5, 7), (7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8258762Z arg1_1 = rand_strided((35, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8258876Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8258891Z 2023-01-11T21:38:06.8258895Z 2023-01-11T21:38:06.8258989Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8259066Z import torch 2023-01-11T21:38:06.8259143Z import random 2023-01-11T21:38:06.8259264Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8259438Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8259444Z 2023-01-11T21:38:06.8259531Z aten = torch.ops.aten 2023-01-11T21:38:06.8259670Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8259759Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8259772Z 2023-01-11T21:38:06.8259841Z import triton 2023-01-11T21:38:06.8259936Z import triton.language as tl 2023-01-11T21:38:06.8260065Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8260205Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8260211Z 2023-01-11T21:38:06.8260215Z 2023-01-11T21:38:06.8260377Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8260456Z import triton 2023-01-11T21:38:06.8260550Z import triton.language as tl 2023-01-11T21:38:06.8260659Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8260763Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8260899Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8261030Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8261035Z 2023-01-11T21:38:06.8261454Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), 
equal_to_1=())]}) 2023-01-11T21:38:06.8261531Z @triton.jit 2023-01-11T21:38:06.8261674Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8261752Z xnumel = 35 2023-01-11T21:38:06.8261844Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8262007Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8262093Z xmask = xindex < xnumel 2023-01-11T21:38:06.8262168Z x0 = xindex 2023-01-11T21:38:06.8262290Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8262412Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8262496Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8262624Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8262712Z ''') 2023-01-11T21:38:06.8262717Z 2023-01-11T21:38:06.8262721Z 2023-01-11T21:38:06.8262817Z async_compile.wait(globals()) 2023-01-11T21:38:06.8262897Z del async_compile 2023-01-11T21:38:06.8262902Z 2023-01-11T21:38:06.8262980Z def call(args): 2023-01-11T21:38:06.8263062Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8263139Z args.clear() 2023-01-11T21:38:06.8263238Z with torch.cuda.device(0): 2023-01-11T21:38:06.8263431Z buf0 = empty_strided((35, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8263530Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8263673Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 35, grid=grid(35), stream=stream0) 2023-01-11T21:38:06.8263750Z del arg0_1 2023-01-11T21:38:06.8263828Z del arg1_1 2023-01-11T21:38:06.8263913Z return (buf0, ) 2023-01-11T21:38:06.8263918Z 2023-01-11T21:38:06.8263923Z 2023-01-11T21:38:06.8264005Z if __name__ == "__main__": 2023-01-11T21:38:06.8264118Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8264246Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8264445Z arg0_1 = rand_strided((5, 7), (7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8264644Z arg1_1 = rand_strided((35, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8264765Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8264770Z 2023-01-11T21:38:06.8265041Z [2023-01-11 21:36:22,899] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 979 2023-01-11T21:38:06.8265490Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8265631Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8265888Z [2023-01-11 21:36:22,917] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 980 2023-01-11T21:38:06.8266155Z [2023-01-11 21:36:22,929] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 980 2023-01-11T21:38:06.8266568Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8266697Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8266958Z [2023-01-11 21:36:22,944] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 981 2023-01-11T21:38:06.8267222Z [2023-01-11 21:36:22,954] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 981 2023-01-11T21:38:06.8267635Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8267799Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8268054Z [2023-01-11 21:36:22,970] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 982 2023-01-11T21:38:06.8268059Z 2023-01-11T21:38:06.8268161Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8268243Z import torch 2023-01-11T21:38:06.8268320Z import random 2023-01-11T21:38:06.8268435Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8268561Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8268566Z 2023-01-11T21:38:06.8268650Z aten = torch.ops.aten 2023-01-11T21:38:06.8268790Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8268888Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8268893Z 2023-01-11T21:38:06.8268969Z import triton 2023-01-11T21:38:06.8269063Z import triton.language as tl 2023-01-11T21:38:06.8269192Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8269330Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8269336Z 2023-01-11T21:38:06.8269347Z 2023-01-11T21:38:06.8269500Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8269577Z import triton 2023-01-11T21:38:06.8269673Z import triton.language as tl 2023-01-11T21:38:06.8269792Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8269896Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8270032Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8270153Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8270168Z 2023-01-11T21:38:06.8270576Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8270653Z @triton.jit 2023-01-11T21:38:06.8270798Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8270876Z xnumel = 35 2023-01-11T21:38:06.8270975Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8271108Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8271196Z xmask = xindex < xnumel 2023-01-11T21:38:06.8271333Z x0 = xindex 2023-01-11T21:38:06.8271437Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8271537Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8271609Z tmp1 = 1 2023-01-11T21:38:06.8271691Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8271772Z tmp4 = tmp2 + tmp3 
2023-01-11T21:38:06.8271912Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8271993Z ''') 2023-01-11T21:38:06.8272009Z 2023-01-11T21:38:06.8272013Z 2023-01-11T21:38:06.8272102Z async_compile.wait(globals()) 2023-01-11T21:38:06.8272182Z del async_compile 2023-01-11T21:38:06.8272187Z 2023-01-11T21:38:06.8272269Z def call(args): 2023-01-11T21:38:06.8272351Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8272429Z args.clear() 2023-01-11T21:38:06.8272527Z with torch.cuda.device(0): 2023-01-11T21:38:06.8272731Z buf0 = empty_strided((35, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8272818Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8272968Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 35, grid=grid(35), stream=stream0) 2023-01-11T21:38:06.8273045Z del arg0_1 2023-01-11T21:38:06.8273121Z del arg1_1 2023-01-11T21:38:06.8273202Z return (buf0, ) 2023-01-11T21:38:06.8273207Z 2023-01-11T21:38:06.8273212Z 2023-01-11T21:38:06.8273293Z if __name__ == "__main__": 2023-01-11T21:38:06.8273413Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8273534Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8273736Z arg0_1 = rand_strided((5, 7), (7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8273967Z arg1_1 = rand_strided((35, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8274089Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8274094Z 2023-01-11T21:38:06.8274098Z 2023-01-11T21:38:06.8274199Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8274275Z import torch 2023-01-11T21:38:06.8274356Z import random 2023-01-11T21:38:06.8274482Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8274600Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8274613Z 2023-01-11T21:38:06.8274690Z aten = torch.ops.aten 2023-01-11T21:38:06.8274828Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8274927Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8274932Z 2023-01-11T21:38:06.8275009Z import triton 2023-01-11T21:38:06.8275104Z import triton.language as tl 2023-01-11T21:38:06.8275236Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8275382Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8275387Z 2023-01-11T21:38:06.8275391Z 2023-01-11T21:38:06.8275552Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8275622Z import triton 2023-01-11T21:38:06.8275721Z import triton.language as tl 2023-01-11T21:38:06.8275840Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8275945Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8276082Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8276208Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8276213Z 2023-01-11T21:38:06.8276626Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8276695Z @triton.jit 2023-01-11T21:38:06.8276838Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8276919Z xnumel = 35 2023-01-11T21:38:06.8277018Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8277152Z xindex = xoffset + 
tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8277238Z xmask = xindex < xnumel 2023-01-11T21:38:06.8277311Z x0 = xindex 2023-01-11T21:38:06.8277452Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8277575Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8277648Z tmp1 = 1 2023-01-11T21:38:06.8277731Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8277813Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8277951Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8278039Z ''') 2023-01-11T21:38:06.8278045Z 2023-01-11T21:38:06.8278049Z 2023-01-11T21:38:06.8278145Z async_compile.wait(globals()) 2023-01-11T21:38:06.8278218Z del async_compile 2023-01-11T21:38:06.8278223Z 2023-01-11T21:38:06.8278303Z def call(args): 2023-01-11T21:38:06.8278383Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8278463Z args.clear() 2023-01-11T21:38:06.8278557Z with torch.cuda.device(0): 2023-01-11T21:38:06.8278756Z buf0 = empty_strided((35, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8278853Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8278992Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 35, grid=grid(35), stream=stream0) 2023-01-11T21:38:06.8279068Z del arg0_1 2023-01-11T21:38:06.8279145Z del arg1_1 2023-01-11T21:38:06.8279225Z return (buf0, ) 2023-01-11T21:38:06.8279231Z 2023-01-11T21:38:06.8279235Z 2023-01-11T21:38:06.8279319Z if __name__ == "__main__": 2023-01-11T21:38:06.8279441Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8279569Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8279768Z arg0_1 = rand_strided((5, 7), (7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8279997Z arg1_1 = rand_strided((35, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8280118Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8280123Z 2023-01-11T21:38:06.8280128Z 2023-01-11T21:38:06.8280229Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8280306Z import torch 2023-01-11T21:38:06.8280387Z import random 2023-01-11T21:38:06.8280508Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8280634Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8280640Z 2023-01-11T21:38:06.8280724Z aten = torch.ops.aten 2023-01-11T21:38:06.8280856Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8280955Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8280960Z 2023-01-11T21:38:06.8281037Z import triton 2023-01-11T21:38:06.8281131Z import triton.language as tl 2023-01-11T21:38:06.8281259Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8281404Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8281409Z 2023-01-11T21:38:06.8281414Z 2023-01-11T21:38:06.8281572Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8281650Z import triton 2023-01-11T21:38:06.8281737Z import triton.language as tl 2023-01-11T21:38:06.8281856Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8281960Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8282097Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8282225Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8282230Z 2023-01-11T21:38:06.8282652Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 
'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8282728Z @triton.jit 2023-01-11T21:38:06.8282870Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8282944Z xnumel = 5040 2023-01-11T21:38:06.8283044Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8283175Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8283261Z xmask = xindex < xnumel 2023-01-11T21:38:06.8283362Z x0 = xindex 2023-01-11T21:38:06.8283462Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8283562Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8283637Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8283771Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8283862Z ''') 2023-01-11T21:38:06.8283868Z 2023-01-11T21:38:06.8283872Z 2023-01-11T21:38:06.8283967Z async_compile.wait(globals()) 2023-01-11T21:38:06.8284046Z del async_compile 2023-01-11T21:38:06.8284051Z 2023-01-11T21:38:06.8284128Z def call(args): 2023-01-11T21:38:06.8284209Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8284283Z args.clear() 2023-01-11T21:38:06.8284377Z with torch.cuda.device(0): 2023-01-11T21:38:06.8284581Z buf0 = empty_strided((5040, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8284679Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8284831Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8284909Z del arg0_1 2023-01-11T21:38:06.8284983Z del arg1_1 2023-01-11T21:38:06.8285056Z return (buf0, ) 2023-01-11T21:38:06.8285069Z 2023-01-11T21:38:06.8285074Z 2023-01-11T21:38:06.8285149Z if __name__ == "__main__": 2023-01-11T21:38:06.8285270Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8285397Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8285638Z arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8285874Z arg1_1 = rand_strided((5040, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8286039Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8286044Z 2023-01-11T21:38:06.8286312Z [2023-01-11 21:36:22,981] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 982 2023-01-11T21:38:06.8286729Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8286865Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8287116Z [2023-01-11 21:36:22,999] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 983 2023-01-11T21:38:06.8287378Z [2023-01-11 21:36:23,014] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 983 2023-01-11T21:38:06.8287794Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8287929Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8288185Z [2023-01-11 21:36:23,032] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 984 2023-01-11T21:38:06.8288448Z [2023-01-11 21:36:23,045] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 984 2023-01-11T21:38:06.8288866Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8289001Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8289287Z [2023-01-11 21:36:23,061] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 985 2023-01-11T21:38:06.8289293Z 2023-01-11T21:38:06.8289394Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8289473Z import torch 2023-01-11T21:38:06.8289544Z import random 2023-01-11T21:38:06.8289665Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8289791Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8289797Z 2023-01-11T21:38:06.8289882Z aten = torch.ops.aten 2023-01-11T21:38:06.8290021Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8290119Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8290127Z 2023-01-11T21:38:06.8290204Z import triton 2023-01-11T21:38:06.8290291Z import triton.language as tl 2023-01-11T21:38:06.8290420Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8290562Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8290567Z 2023-01-11T21:38:06.8290572Z 2023-01-11T21:38:06.8290733Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8290810Z import triton 2023-01-11T21:38:06.8290906Z import triton.language as tl 2023-01-11T21:38:06.8291023Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8291128Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8291256Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8291384Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8291389Z 2023-01-11T21:38:06.8291813Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8291940Z @triton.jit 2023-01-11T21:38:06.8292086Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8292164Z xnumel = 5040 2023-01-11T21:38:06.8292266Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8292398Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8292477Z xmask = xindex < xnumel 2023-01-11T21:38:06.8292551Z x0 = xindex 2023-01-11T21:38:06.8292670Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8292793Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8292877Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8293016Z tl.store(out_ptr0 + (x0 + 
tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8293104Z ''') 2023-01-11T21:38:06.8293110Z 2023-01-11T21:38:06.8293114Z 2023-01-11T21:38:06.8293211Z async_compile.wait(globals()) 2023-01-11T21:38:06.8293288Z del async_compile 2023-01-11T21:38:06.8293293Z 2023-01-11T21:38:06.8293371Z def call(args): 2023-01-11T21:38:06.8293454Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8293531Z args.clear() 2023-01-11T21:38:06.8293628Z with torch.cuda.device(0): 2023-01-11T21:38:06.8293834Z buf0 = empty_strided((5040, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8293929Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8294069Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8294146Z del arg0_1 2023-01-11T21:38:06.8294220Z del arg1_1 2023-01-11T21:38:06.8294303Z return (buf0, ) 2023-01-11T21:38:06.8294308Z 2023-01-11T21:38:06.8294312Z 2023-01-11T21:38:06.8294394Z if __name__ == "__main__": 2023-01-11T21:38:06.8294883Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8295025Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8295266Z arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8295471Z arg1_1 = rand_strided((5040, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8295618Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8295723Z 2023-01-11T21:38:06.8295730Z 2023-01-11T21:38:06.8295844Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8295921Z import torch 2023-01-11T21:38:06.8295998Z import random 2023-01-11T21:38:06.8296119Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8296245Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8296251Z 2023-01-11T21:38:06.8296340Z aten = torch.ops.aten 2023-01-11T21:38:06.8296472Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8296571Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8296576Z 2023-01-11T21:38:06.8296653Z import triton 2023-01-11T21:38:06.8296750Z import triton.language as tl 2023-01-11T21:38:06.8296877Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8297021Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8297026Z 2023-01-11T21:38:06.8297031Z 2023-01-11T21:38:06.8297265Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8297338Z import triton 2023-01-11T21:38:06.8297435Z import triton.language as tl 2023-01-11T21:38:06.8297552Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8297657Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8297798Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8297926Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8297931Z 2023-01-11T21:38:06.8298354Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8298478Z @triton.jit 2023-01-11T21:38:06.8298614Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8298693Z xnumel = 5040 2023-01-11T21:38:06.8298792Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8298927Z xindex = xoffset + tl.reshape(tl.arange(0, 
XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8299016Z xmask = xindex < xnumel 2023-01-11T21:38:06.8299090Z x0 = xindex 2023-01-11T21:38:06.8299189Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8299282Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8299355Z tmp1 = 1 2023-01-11T21:38:06.8299436Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8299516Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8299657Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8299749Z ''') 2023-01-11T21:38:06.8299754Z 2023-01-11T21:38:06.8299759Z 2023-01-11T21:38:06.8299862Z async_compile.wait(globals()) 2023-01-11T21:38:06.8299942Z del async_compile 2023-01-11T21:38:06.8299948Z 2023-01-11T21:38:06.8300017Z def call(args): 2023-01-11T21:38:06.8300098Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8300177Z args.clear() 2023-01-11T21:38:06.8300271Z with torch.cuda.device(0): 2023-01-11T21:38:06.8300475Z buf0 = empty_strided((5040, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8300573Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8300720Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8300788Z del arg0_1 2023-01-11T21:38:06.8300863Z del arg1_1 2023-01-11T21:38:06.8300943Z return (buf0, ) 2023-01-11T21:38:06.8300948Z 2023-01-11T21:38:06.8300952Z 2023-01-11T21:38:06.8301035Z if __name__ == "__main__": 2023-01-11T21:38:06.8301155Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8301284Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8301527Z arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8301730Z arg1_1 = rand_strided((5040, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8301877Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8301883Z 2023-01-11T21:38:06.8301896Z 2023-01-11T21:38:06.8301990Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8302072Z import torch 2023-01-11T21:38:06.8302149Z import random 2023-01-11T21:38:06.8302271Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8302399Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8302405Z 2023-01-11T21:38:06.8302489Z aten = torch.ops.aten 2023-01-11T21:38:06.8302626Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8302716Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8302721Z 2023-01-11T21:38:06.8302799Z import triton 2023-01-11T21:38:06.8302894Z import triton.language as tl 2023-01-11T21:38:06.8303021Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8303162Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8303167Z 2023-01-11T21:38:06.8303172Z 2023-01-11T21:38:06.8303334Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8303412Z import triton 2023-01-11T21:38:06.8303507Z import triton.language as tl 2023-01-11T21:38:06.8303616Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8303720Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8303855Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8303982Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8303987Z 2023-01-11T21:38:06.8304408Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': 
{}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8304512Z @triton.jit 2023-01-11T21:38:06.8304655Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8304732Z xnumel = 5040 2023-01-11T21:38:06.8304824Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8304959Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8305046Z xmask = xindex < xnumel 2023-01-11T21:38:06.8305121Z x0 = xindex 2023-01-11T21:38:06.8305242Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8305361Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8305434Z tmp1 = 1 2023-01-11T21:38:06.8305509Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8305590Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8305732Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8305822Z ''') 2023-01-11T21:38:06.8305830Z 2023-01-11T21:38:06.8305835Z 2023-01-11T21:38:06.8305931Z async_compile.wait(globals()) 2023-01-11T21:38:06.8306010Z del async_compile 2023-01-11T21:38:06.8306015Z 2023-01-11T21:38:06.8306092Z def call(args): 2023-01-11T21:38:06.8306167Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8306247Z args.clear() 2023-01-11T21:38:06.8306343Z with torch.cuda.device(0): 2023-01-11T21:38:06.8306546Z buf0 = empty_strided((5040, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8306641Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8306790Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8306865Z del arg0_1 2023-01-11T21:38:06.8306933Z del arg1_1 2023-01-11T21:38:06.8307013Z return (buf0, ) 2023-01-11T21:38:06.8307019Z 2023-01-11T21:38:06.8307023Z 2023-01-11T21:38:06.8307106Z if __name__ == "__main__": 2023-01-11T21:38:06.8307228Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8307362Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8307601Z arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8307802Z arg1_1 = rand_strided((5040, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8307959Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8307965Z 2023-01-11T21:38:06.8308230Z [2023-01-11 21:36:23,072] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 985 2023-01-11T21:38:06.8308635Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8308766Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8309023Z [2023-01-11 21:36:23,088] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 986 2023-01-11T21:38:06.8309286Z [2023-01-11 21:36:23,097] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 986 2023-01-11T21:38:06.8309699Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. 
This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8309830Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8310082Z [2023-01-11 21:36:23,116] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 987 2023-01-11T21:38:06.8310347Z [2023-01-11 21:36:23,129] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 987 2023-01-11T21:38:06.8310794Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8310925Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8311177Z [2023-01-11 21:36:23,148] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 988 2023-01-11T21:38:06.8311183Z 2023-01-11T21:38:06.8311281Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8311348Z import torch 2023-01-11T21:38:06.8311422Z import random 2023-01-11T21:38:06.8311541Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8311665Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8311674Z 2023-01-11T21:38:06.8311757Z aten = torch.ops.aten 2023-01-11T21:38:06.8311894Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8311989Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8311994Z 2023-01-11T21:38:06.8312061Z import triton 2023-01-11T21:38:06.8312156Z import triton.language as tl 2023-01-11T21:38:06.8312283Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8312423Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8312428Z 2023-01-11T21:38:06.8312432Z 2023-01-11T21:38:06.8312587Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8312662Z import triton 2023-01-11T21:38:06.8312754Z import triton.language as tl 2023-01-11T21:38:06.8312869Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8312963Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8313098Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8313225Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8313230Z 2023-01-11T21:38:06.8313680Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8313758Z @triton.jit 2023-01-11T21:38:06.8313899Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8313973Z xnumel = 5040 2023-01-11T21:38:06.8314072Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8314194Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8314277Z xmask = xindex < xnumel 2023-01-11T21:38:06.8314348Z x0 = xindex 2023-01-11T21:38:06.8314446Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8314544Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8314624Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8314759Z 
tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8314837Z ''') 2023-01-11T21:38:06.8314843Z 2023-01-11T21:38:06.8314854Z 2023-01-11T21:38:06.8314939Z async_compile.wait(globals()) 2023-01-11T21:38:06.8315019Z del async_compile 2023-01-11T21:38:06.8315027Z 2023-01-11T21:38:06.8315101Z def call(args): 2023-01-11T21:38:06.8315180Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8315255Z args.clear() 2023-01-11T21:38:06.8315349Z with torch.cuda.device(0): 2023-01-11T21:38:06.8315570Z buf0 = empty_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8315657Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8315801Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8315877Z del arg0_1 2023-01-11T21:38:06.8315949Z del arg1_1 2023-01-11T21:38:06.8316027Z return (buf0, ) 2023-01-11T21:38:06.8316059Z 2023-01-11T21:38:06.8316064Z 2023-01-11T21:38:06.8316145Z if __name__ == "__main__": 2023-01-11T21:38:06.8316264Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8316382Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8316621Z arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8316848Z arg1_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8316972Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8316977Z 2023-01-11T21:38:06.8316982Z 2023-01-11T21:38:06.8317079Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8317153Z import torch 2023-01-11T21:38:06.8317228Z import random 2023-01-11T21:38:06.8317347Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8317463Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8317479Z 2023-01-11T21:38:06.8317554Z aten = torch.ops.aten 2023-01-11T21:38:06.8317691Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8317786Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8317791Z 2023-01-11T21:38:06.8317863Z import triton 2023-01-11T21:38:06.8317955Z import triton.language as tl 2023-01-11T21:38:06.8318084Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8318222Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8318228Z 2023-01-11T21:38:06.8318232Z 2023-01-11T21:38:06.8318385Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8318453Z import triton 2023-01-11T21:38:06.8318544Z import triton.language as tl 2023-01-11T21:38:06.8318661Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8318763Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8318894Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8319023Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8319028Z 2023-01-11T21:38:06.8319484Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8319561Z @triton.jit 2023-01-11T21:38:06.8319695Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8319770Z xnumel = 5040 2023-01-11T21:38:06.8319867Z xoffset = tl.program_id(0) * XBLOCK 
2023-01-11T21:38:06.8319998Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8320081Z xmask = xindex < xnumel 2023-01-11T21:38:06.8320153Z x0 = xindex 2023-01-11T21:38:06.8320263Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8320380Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8320463Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8320598Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8320683Z ''') 2023-01-11T21:38:06.8320689Z 2023-01-11T21:38:06.8320693Z 2023-01-11T21:38:06.8320786Z async_compile.wait(globals()) 2023-01-11T21:38:06.8320863Z del async_compile 2023-01-11T21:38:06.8320871Z 2023-01-11T21:38:06.8320947Z def call(args): 2023-01-11T21:38:06.8321019Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8321094Z args.clear() 2023-01-11T21:38:06.8321188Z with torch.cuda.device(0): 2023-01-11T21:38:06.8321409Z buf0 = empty_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8321502Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8321650Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8321723Z del arg0_1 2023-01-11T21:38:06.8321789Z del arg1_1 2023-01-11T21:38:06.8321872Z return (buf0, ) 2023-01-11T21:38:06.8321906Z 2023-01-11T21:38:06.8321911Z 2023-01-11T21:38:06.8321992Z if __name__ == "__main__": 2023-01-11T21:38:06.8322110Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8322236Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8322477Z arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8322696Z arg1_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8322815Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8322820Z 2023-01-11T21:38:06.8322824Z 2023-01-11T21:38:06.8322921Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8322988Z import torch 2023-01-11T21:38:06.8323066Z import random 2023-01-11T21:38:06.8323182Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8323306Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8323314Z 2023-01-11T21:38:06.8323397Z aten = torch.ops.aten 2023-01-11T21:38:06.8323532Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8323627Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8323633Z 2023-01-11T21:38:06.8323699Z import triton 2023-01-11T21:38:06.8323791Z import triton.language as tl 2023-01-11T21:38:06.8323919Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8324060Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8324065Z 2023-01-11T21:38:06.8324069Z 2023-01-11T21:38:06.8324226Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8324301Z import triton 2023-01-11T21:38:06.8324394Z import triton.language as tl 2023-01-11T21:38:06.8324507Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8324602Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8324735Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8324861Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8324866Z 2023-01-11T21:38:06.8325314Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 
'*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8325396Z @triton.jit 2023-01-11T21:38:06.8325548Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8325637Z xnumel = 5040 2023-01-11T21:38:06.8325750Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8325882Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8325967Z xmask = xindex < xnumel 2023-01-11T21:38:06.8326038Z x0 = xindex 2023-01-11T21:38:06.8326134Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8326233Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8326305Z tmp1 = 1 2023-01-11T21:38:06.8326388Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8326459Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8326595Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8326679Z ''') 2023-01-11T21:38:06.8326685Z 2023-01-11T21:38:06.8326689Z 2023-01-11T21:38:06.8326782Z async_compile.wait(globals()) 2023-01-11T21:38:06.8326866Z del async_compile 2023-01-11T21:38:06.8326871Z 2023-01-11T21:38:06.8326946Z def call(args): 2023-01-11T21:38:06.8327026Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8327094Z args.clear() 2023-01-11T21:38:06.8327190Z with torch.cuda.device(0): 2023-01-11T21:38:06.8327410Z buf0 = empty_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8327503Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8327648Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8327721Z del arg0_1 2023-01-11T21:38:06.8327824Z del arg1_1 2023-01-11T21:38:06.8327902Z return (buf0, ) 2023-01-11T21:38:06.8327907Z 2023-01-11T21:38:06.8327912Z 2023-01-11T21:38:06.8327985Z if __name__ == "__main__": 2023-01-11T21:38:06.8328103Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8328230Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8328469Z arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8328689Z arg1_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8328809Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8328814Z 2023-01-11T21:38:06.8329077Z [2023-01-11 21:36:23,159] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 988 2023-01-11T21:38:06.8329493Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8329627Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8329888Z [2023-01-11 21:36:23,175] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 989 2023-01-11T21:38:06.8330144Z [2023-01-11 21:36:23,186] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 989 2023-01-11T21:38:06.8330556Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8330692Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8330946Z [2023-01-11 21:36:23,202] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 990 2023-01-11T21:38:06.8331233Z [2023-01-11 21:36:23,212] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 990 2023-01-11T21:38:06.8331650Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8331783Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8332035Z [2023-01-11 21:36:23,231] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 991 2023-01-11T21:38:06.8332043Z 2023-01-11T21:38:06.8332141Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8332216Z import torch 2023-01-11T21:38:06.8332283Z import random 2023-01-11T21:38:06.8332401Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8332525Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8332530Z 2023-01-11T21:38:06.8332614Z aten = torch.ops.aten 2023-01-11T21:38:06.8332753Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8332848Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8332853Z 2023-01-11T21:38:06.8332931Z import triton 2023-01-11T21:38:06.8333022Z import triton.language as tl 2023-01-11T21:38:06.8333140Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8333281Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8333287Z 2023-01-11T21:38:06.8333291Z 2023-01-11T21:38:06.8333450Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8333525Z import triton 2023-01-11T21:38:06.8333672Z import triton.language as tl 2023-01-11T21:38:06.8333788Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8333890Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8334016Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8334143Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8334149Z 2023-01-11T21:38:06.8334837Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8334912Z @triton.jit 2023-01-11T21:38:06.8335054Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8335130Z xnumel = 5040 2023-01-11T21:38:06.8335227Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8335356Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8335444Z xmask = xindex < xnumel 2023-01-11T21:38:06.8335508Z x0 = xindex 2023-01-11T21:38:06.8335626Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8335742Z tmp3 = 
tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8335812Z tmp1 = 1 2023-01-11T21:38:06.8335898Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8335978Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8336105Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8336191Z ''') 2023-01-11T21:38:06.8336196Z 2023-01-11T21:38:06.8336201Z 2023-01-11T21:38:06.8336297Z async_compile.wait(globals()) 2023-01-11T21:38:06.8336372Z del async_compile 2023-01-11T21:38:06.8336377Z 2023-01-11T21:38:06.8336453Z def call(args): 2023-01-11T21:38:06.8336531Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8336607Z args.clear() 2023-01-11T21:38:06.8336700Z with torch.cuda.device(0): 2023-01-11T21:38:06.8336915Z buf0 = empty_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8337013Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8337225Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8337323Z del arg0_1 2023-01-11T21:38:06.8337490Z del arg1_1 2023-01-11T21:38:06.8337570Z return (buf0, ) 2023-01-11T21:38:06.8337575Z 2023-01-11T21:38:06.8337580Z 2023-01-11T21:38:06.8337661Z if __name__ == "__main__": 2023-01-11T21:38:06.8337778Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8337897Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8338135Z arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8338357Z arg1_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8338477Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8338485Z 2023-01-11T21:38:06.8338489Z 2023-01-11T21:38:06.8338588Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8338663Z import torch 2023-01-11T21:38:06.8338736Z import random 2023-01-11T21:38:06.8338853Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8338972Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8338977Z 2023-01-11T21:38:06.8339061Z aten = torch.ops.aten 2023-01-11T21:38:06.8339199Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8339294Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8339299Z 2023-01-11T21:38:06.8339373Z import triton 2023-01-11T21:38:06.8339466Z import triton.language as tl 2023-01-11T21:38:06.8339590Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8339721Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8339733Z 2023-01-11T21:38:06.8339738Z 2023-01-11T21:38:06.8339941Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8340018Z import triton 2023-01-11T21:38:06.8340115Z import triton.language as tl 2023-01-11T21:38:06.8340230Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8340335Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8340473Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8348593Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8348603Z 2023-01-11T21:38:06.8349084Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8349161Z @triton.jit 
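# Annotation (assumes inductor/Triton behavior of this vintage): @pointwise, imported
# above from torch._inductor.triton_ops.autotune, wraps the @triton.jit kernel below
# and autotunes the XBLOCK launch parameter; divisible_by_16 in instance_descriptor
# marks the arguments assumed divisible by 16, which presumably lets Triton
# specialize for aligned, vectorizable memory access.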
2023-01-11T21:38:06.8349307Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8349376Z xnumel = 1000 2023-01-11T21:38:06.8349475Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8349611Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8349691Z xmask = xindex < xnumel 2023-01-11T21:38:06.8349757Z x0 = xindex 2023-01-11T21:38:06.8349856Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8349953Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8350036Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8350172Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8350259Z ''') 2023-01-11T21:38:06.8350265Z 2023-01-11T21:38:06.8350269Z 2023-01-11T21:38:06.8350364Z async_compile.wait(globals()) 2023-01-11T21:38:06.8350441Z del async_compile 2023-01-11T21:38:06.8350446Z 2023-01-11T21:38:06.8350514Z def call(args): 2023-01-11T21:38:06.8350595Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8350670Z args.clear() 2023-01-11T21:38:06.8350763Z with torch.cuda.device(0): 2023-01-11T21:38:06.8350970Z buf0 = empty_strided((50, 20), (20, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8351064Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8351209Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8351276Z del arg0_1 2023-01-11T21:38:06.8351347Z del arg1_1 2023-01-11T21:38:06.8351483Z return (buf0, ) 2023-01-11T21:38:06.8351489Z 2023-01-11T21:38:06.8351493Z 2023-01-11T21:38:06.8351576Z if __name__ == "__main__": 2023-01-11T21:38:06.8351698Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8351827Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8352041Z arg0_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8352244Z arg1_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8352357Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8352362Z 2023-01-11T21:38:06.8352369Z 2023-01-11T21:38:06.8352467Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8352540Z import torch 2023-01-11T21:38:06.8352614Z import random 2023-01-11T21:38:06.8352733Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8352857Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8352862Z 2023-01-11T21:38:06.8352945Z aten = torch.ops.aten 2023-01-11T21:38:06.8353077Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8353174Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8353179Z 2023-01-11T21:38:06.8353253Z import triton 2023-01-11T21:38:06.8353347Z import triton.language as tl 2023-01-11T21:38:06.8353473Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8353612Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8353618Z 2023-01-11T21:38:06.8353622Z 2023-01-11T21:38:06.8353773Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8353847Z import triton 2023-01-11T21:38:06.8353967Z import triton.language as tl 2023-01-11T21:38:06.8354081Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8354185Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8354319Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8354448Z from torch._inductor.utils import instance_descriptor 
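# Annotation: this prologue recurs in every async_compile.triton('''...''') payload
# in this log; the ReductionHint/TileHint/pointwise/instance_descriptor imports
# appear to exist only so the @pointwise decorator below can be evaluated when the
# source string is compiled.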
2023-01-11T21:38:06.8354453Z 2023-01-11T21:38:06.8354874Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8354949Z @triton.jit 2023-01-11T21:38:06.8355091Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8355162Z xnumel = 1000 2023-01-11T21:38:06.8355259Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8355388Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8355475Z xmask = xindex < xnumel 2023-01-11T21:38:06.8355546Z x0 = xindex 2023-01-11T21:38:06.8355666Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8355783Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8355855Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8355995Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8356082Z ''') 2023-01-11T21:38:06.8356088Z 2023-01-11T21:38:06.8356092Z 2023-01-11T21:38:06.8356185Z async_compile.wait(globals()) 2023-01-11T21:38:06.8356261Z del async_compile 2023-01-11T21:38:06.8356266Z 2023-01-11T21:38:06.8356341Z def call(args): 2023-01-11T21:38:06.8356421Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8356495Z args.clear() 2023-01-11T21:38:06.8356581Z with torch.cuda.device(0): 2023-01-11T21:38:06.8356783Z buf0 = empty_strided((50, 20), (20, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8356879Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8357022Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8357096Z del arg0_1 2023-01-11T21:38:06.8357168Z del arg1_1 2023-01-11T21:38:06.8357245Z return (buf0, ) 2023-01-11T21:38:06.8357250Z 2023-01-11T21:38:06.8357287Z 2023-01-11T21:38:06.8357369Z if __name__ == "__main__": 2023-01-11T21:38:06.8357480Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8357607Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8357820Z arg0_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8358021Z arg1_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8358141Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8358146Z 2023-01-11T21:38:06.8358410Z [2023-01-11 21:36:23,245] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 991 2023-01-11T21:38:06.8358828Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8358963Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8359218Z [2023-01-11 21:36:23,263] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 992 2023-01-11T21:38:06.8359474Z [2023-01-11 21:36:23,275] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 992 2023-01-11T21:38:06.8359885Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8360045Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8360303Z [2023-01-11 21:36:23,291] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 993 2023-01-11T21:38:06.8360563Z [2023-01-11 21:36:23,301] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 993 2023-01-11T21:38:06.8360975Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8361105Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8361363Z [2023-01-11 21:36:23,317] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 994 2023-01-11T21:38:06.8361368Z 2023-01-11T21:38:06.8361466Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8361545Z import torch 2023-01-11T21:38:06.8361620Z import random 2023-01-11T21:38:06.8361737Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8361861Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8361867Z 2023-01-11T21:38:06.8361948Z aten = torch.ops.aten 2023-01-11T21:38:06.8362086Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8362184Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8362190Z 2023-01-11T21:38:06.8362265Z import triton 2023-01-11T21:38:06.8362360Z import triton.language as tl 2023-01-11T21:38:06.8362479Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8362619Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8362626Z 2023-01-11T21:38:06.8362631Z 2023-01-11T21:38:06.8362790Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8362865Z import triton 2023-01-11T21:38:06.8362957Z import triton.language as tl 2023-01-11T21:38:06.8363071Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8363198Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8363334Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8363453Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8363458Z 2023-01-11T21:38:06.8363893Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8363967Z @triton.jit 2023-01-11T21:38:06.8364109Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8364188Z xnumel = 1000 2023-01-11T21:38:06.8364286Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8364415Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8364503Z xmask = xindex < xnumel 2023-01-11T21:38:06.8364567Z x0 = xindex 2023-01-11T21:38:06.8364666Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8364764Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 
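    # Annotation: fused "x + 1 + y" -- the scalar 1 is materialized as tmp1 and both
    # additions run inside this one kernel, so the intermediate (x + 1) is never
    # written back to memory.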
2023-01-11T21:38:06.8364837Z tmp1 = 1 2023-01-11T21:38:06.8364915Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8364992Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8365128Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8365206Z ''') 2023-01-11T21:38:06.8365212Z 2023-01-11T21:38:06.8365216Z 2023-01-11T21:38:06.8365309Z async_compile.wait(globals()) 2023-01-11T21:38:06.8365385Z del async_compile 2023-01-11T21:38:06.8365390Z 2023-01-11T21:38:06.8365465Z def call(args): 2023-01-11T21:38:06.8365543Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8365644Z args.clear() 2023-01-11T21:38:06.8365736Z with torch.cuda.device(0): 2023-01-11T21:38:06.8365934Z buf0 = empty_strided((50, 20), (20, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8366026Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8366173Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8366249Z del arg0_1 2023-01-11T21:38:06.8366321Z del arg1_1 2023-01-11T21:38:06.8366400Z return (buf0, ) 2023-01-11T21:38:06.8366405Z 2023-01-11T21:38:06.8366409Z 2023-01-11T21:38:06.8366490Z if __name__ == "__main__": 2023-01-11T21:38:06.8366608Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8366727Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8366938Z arg0_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8367139Z arg1_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8367261Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8367266Z 2023-01-11T21:38:06.8367270Z 2023-01-11T21:38:06.8367366Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8367442Z import torch 2023-01-11T21:38:06.8367517Z import random 2023-01-11T21:38:06.8367637Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8367754Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8367759Z 2023-01-11T21:38:06.8367841Z aten = torch.ops.aten 2023-01-11T21:38:06.8367977Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8368072Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8368077Z 2023-01-11T21:38:06.8368153Z import triton 2023-01-11T21:38:06.8368245Z import triton.language as tl 2023-01-11T21:38:06.8368369Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8368500Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8368518Z 2023-01-11T21:38:06.8368522Z 2023-01-11T21:38:06.8368672Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8368747Z import triton 2023-01-11T21:38:06.8368839Z import triton.language as tl 2023-01-11T21:38:06.8368953Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8369083Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8369218Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8369341Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8369346Z 2023-01-11T21:38:06.8369765Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8369832Z @triton.jit 2023-01-11T21:38:06.8369974Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
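    # Annotation: canonical 1D pointwise indexing -- each program instance covers
    # XBLOCK consecutive flat indices, and xmask disables the lanes past xnumel in
    # the final block.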
2023-01-11T21:38:06.8370050Z xnumel = 1000 2023-01-11T21:38:06.8370149Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8370278Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8370362Z xmask = xindex < xnumel 2023-01-11T21:38:06.8370433Z x0 = xindex 2023-01-11T21:38:06.8370544Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8370660Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8370731Z tmp1 = 1 2023-01-11T21:38:06.8370809Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8370886Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8371024Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8371109Z ''') 2023-01-11T21:38:06.8371115Z 2023-01-11T21:38:06.8371119Z 2023-01-11T21:38:06.8371205Z async_compile.wait(globals()) 2023-01-11T21:38:06.8371285Z del async_compile 2023-01-11T21:38:06.8371290Z 2023-01-11T21:38:06.8371368Z def call(args): 2023-01-11T21:38:06.8371501Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8371575Z args.clear() 2023-01-11T21:38:06.8371668Z with torch.cuda.device(0): 2023-01-11T21:38:06.8371872Z buf0 = empty_strided((50, 20), (20, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8371964Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8372106Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8372179Z del arg0_1 2023-01-11T21:38:06.8372253Z del arg1_1 2023-01-11T21:38:06.8372330Z return (buf0, ) 2023-01-11T21:38:06.8372335Z 2023-01-11T21:38:06.8372340Z 2023-01-11T21:38:06.8372419Z if __name__ == "__main__": 2023-01-11T21:38:06.8372537Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8372662Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8372867Z arg0_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8373072Z arg1_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8373192Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8373198Z 2023-01-11T21:38:06.8373202Z 2023-01-11T21:38:06.8373299Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8373374Z import torch 2023-01-11T21:38:06.8373452Z import random 2023-01-11T21:38:06.8373570Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8373694Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8373700Z 2023-01-11T21:38:06.8373775Z aten = torch.ops.aten 2023-01-11T21:38:06.8373910Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8374005Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8374010Z 2023-01-11T21:38:06.8374089Z import triton 2023-01-11T21:38:06.8374180Z import triton.language as tl 2023-01-11T21:38:06.8374306Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8374447Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8374452Z 2023-01-11T21:38:06.8374457Z 2023-01-11T21:38:06.8374947Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8375016Z import triton 2023-01-11T21:38:06.8375108Z import triton.language as tl 2023-01-11T21:38:06.8375289Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8375392Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8375527Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8375650Z from torch._inductor.utils import instance_descriptor 
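# Annotation: size_hints=[16] below appears to be the element count (10) rounded up
# to the next power of two; compare size_hints=[1024] for the 1000-element kernels
# and size_hints=[8192] for the 5040-element kernels earlier in this log.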
2023-01-11T21:38:06.8375656Z 2023-01-11T21:38:06.8376074Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8376148Z @triton.jit 2023-01-11T21:38:06.8376281Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8376357Z xnumel = 10 2023-01-11T21:38:06.8376454Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8376583Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8376665Z xmask = xindex < xnumel 2023-01-11T21:38:06.8376735Z x0 = xindex 2023-01-11T21:38:06.8376834Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8376924Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8377004Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8377195Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8377301Z ''') 2023-01-11T21:38:06.8377307Z 2023-01-11T21:38:06.8377311Z 2023-01-11T21:38:06.8377405Z async_compile.wait(globals()) 2023-01-11T21:38:06.8377481Z del async_compile 2023-01-11T21:38:06.8377486Z 2023-01-11T21:38:06.8377560Z def call(args): 2023-01-11T21:38:06.8377640Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8377708Z args.clear() 2023-01-11T21:38:06.8377847Z with torch.cuda.device(0): 2023-01-11T21:38:06.8378059Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8378153Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8378300Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.8378375Z del arg0_1 2023-01-11T21:38:06.8378451Z del arg1_1 2023-01-11T21:38:06.8378524Z return (buf0, ) 2023-01-11T21:38:06.8378529Z 2023-01-11T21:38:06.8378533Z 2023-01-11T21:38:06.8378615Z if __name__ == "__main__": 2023-01-11T21:38:06.8378734Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8378862Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8379064Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8379272Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8379397Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8379402Z 2023-01-11T21:38:06.8379670Z [2023-01-11 21:36:23,326] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 994 2023-01-11T21:38:06.8380094Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8380220Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8380478Z [2023-01-11 21:36:23,345] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 995 2023-01-11T21:38:06.8380743Z [2023-01-11 21:36:23,356] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 995 2023-01-11T21:38:06.8381185Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8381324Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8381579Z [2023-01-11 21:36:23,374] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 996 2023-01-11T21:38:06.8381842Z [2023-01-11 21:36:23,384] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 996 2023-01-11T21:38:06.8382255Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8382390Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8382651Z [2023-01-11 21:36:23,401] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 997 2023-01-11T21:38:06.8382657Z 2023-01-11T21:38:06.8382759Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8382829Z import torch 2023-01-11T21:38:06.8382905Z import random 2023-01-11T21:38:06.8383026Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8383152Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8383157Z 2023-01-11T21:38:06.8383240Z aten = torch.ops.aten 2023-01-11T21:38:06.8383380Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8383478Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8383483Z 2023-01-11T21:38:06.8383558Z import triton 2023-01-11T21:38:06.8383686Z import triton.language as tl 2023-01-11T21:38:06.8383813Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8383954Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8383959Z 2023-01-11T21:38:06.8383964Z 2023-01-11T21:38:06.8384123Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8384203Z import triton 2023-01-11T21:38:06.8384298Z import triton.language as tl 2023-01-11T21:38:06.8384413Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8384510Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8384647Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8384773Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8384779Z 2023-01-11T21:38:06.8385196Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8385278Z @triton.jit 2023-01-11T21:38:06.8385421Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8385500Z xnumel = 10 2023-01-11T21:38:06.8385601Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8385728Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8385815Z xmask = xindex < xnumel 2023-01-11T21:38:06.8385887Z x0 = xindex 2023-01-11T21:38:06.8386007Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8386125Z tmp1 = tl.load(in_ptr1 + 
(x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8386206Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8386344Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8386434Z ''') 2023-01-11T21:38:06.8386439Z 2023-01-11T21:38:06.8386443Z 2023-01-11T21:38:06.8386531Z async_compile.wait(globals()) 2023-01-11T21:38:06.8386613Z del async_compile 2023-01-11T21:38:06.8386618Z 2023-01-11T21:38:06.8386695Z def call(args): 2023-01-11T21:38:06.8386776Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8386853Z args.clear() 2023-01-11T21:38:06.8386948Z with torch.cuda.device(0): 2023-01-11T21:38:06.8387194Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8387283Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8387429Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.8387505Z del arg0_1 2023-01-11T21:38:06.8387579Z del arg1_1 2023-01-11T21:38:06.8387659Z return (buf0, ) 2023-01-11T21:38:06.8387665Z 2023-01-11T21:38:06.8387669Z 2023-01-11T21:38:06.8387750Z if __name__ == "__main__": 2023-01-11T21:38:06.8387870Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8387999Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8388194Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8388407Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8388533Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8388539Z 2023-01-11T21:38:06.8388543Z 2023-01-11T21:38:06.8388645Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8388721Z import torch 2023-01-11T21:38:06.8388797Z import random 2023-01-11T21:38:06.8388917Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8389042Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8389047Z 2023-01-11T21:38:06.8389125Z aten = torch.ops.aten 2023-01-11T21:38:06.8389264Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8389360Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8389366Z 2023-01-11T21:38:06.8389443Z import triton 2023-01-11T21:38:06.8389536Z import triton.language as tl 2023-01-11T21:38:06.8389690Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8389831Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8389837Z 2023-01-11T21:38:06.8389841Z 2023-01-11T21:38:06.8389999Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8390070Z import triton 2023-01-11T21:38:06.8390172Z import triton.language as tl 2023-01-11T21:38:06.8390288Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8390392Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8390533Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8390660Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8390665Z 2023-01-11T21:38:06.8391081Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8391162Z @triton.jit 2023-01-11T21:38:06.8391298Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8391378Z xnumel = 10 2023-01-11T21:38:06.8391479Z xoffset = 
tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8391610Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8391698Z xmask = xindex < xnumel 2023-01-11T21:38:06.8391772Z x0 = xindex 2023-01-11T21:38:06.8391871Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8391963Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8392037Z tmp1 = 1 2023-01-11T21:38:06.8392118Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8392198Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8392334Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8392423Z ''') 2023-01-11T21:38:06.8392428Z 2023-01-11T21:38:06.8392433Z 2023-01-11T21:38:06.8392529Z async_compile.wait(globals()) 2023-01-11T21:38:06.8392601Z del async_compile 2023-01-11T21:38:06.8392608Z 2023-01-11T21:38:06.8392685Z def call(args): 2023-01-11T21:38:06.8392766Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8392843Z args.clear() 2023-01-11T21:38:06.8392936Z with torch.cuda.device(0): 2023-01-11T21:38:06.8393176Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8393272Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8393411Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.8393486Z del arg0_1 2023-01-11T21:38:06.8393560Z del arg1_1 2023-01-11T21:38:06.8393640Z return (buf0, ) 2023-01-11T21:38:06.8393645Z 2023-01-11T21:38:06.8393649Z 2023-01-11T21:38:06.8393731Z if __name__ == "__main__": 2023-01-11T21:38:06.8393850Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8393979Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8394181Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8394385Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8394506Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8394511Z 2023-01-11T21:38:06.8394516Z 2023-01-11T21:38:06.8394618Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8394694Z import torch 2023-01-11T21:38:06.8394772Z import random 2023-01-11T21:38:06.8394896Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8395021Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8395026Z 2023-01-11T21:38:06.8395112Z aten = torch.ops.aten 2023-01-11T21:38:06.8395244Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8395359Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8395365Z 2023-01-11T21:38:06.8395450Z import triton 2023-01-11T21:38:06.8395562Z import triton.language as tl 2023-01-11T21:38:06.8395723Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8395864Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8395869Z 2023-01-11T21:38:06.8395874Z 2023-01-11T21:38:06.8396034Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8396110Z import triton 2023-01-11T21:38:06.8396200Z import triton.language as tl 2023-01-11T21:38:06.8396316Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8396421Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8396555Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8396682Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8396687Z 2023-01-11T21:38:06.8397104Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': 
{0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8397186Z @triton.jit 2023-01-11T21:38:06.8397330Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8397400Z xnumel = 10 2023-01-11T21:38:06.8397500Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8397632Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8397726Z xmask = xindex < xnumel 2023-01-11T21:38:06.8397800Z x0 = xindex 2023-01-11T21:38:06.8397918Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8398040Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8398107Z tmp1 = 1 2023-01-11T21:38:06.8398187Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8398269Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8398406Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8398493Z ''') 2023-01-11T21:38:06.8398499Z 2023-01-11T21:38:06.8398503Z 2023-01-11T21:38:06.8398599Z async_compile.wait(globals()) 2023-01-11T21:38:06.8398683Z del async_compile 2023-01-11T21:38:06.8398688Z 2023-01-11T21:38:06.8398758Z def call(args): 2023-01-11T21:38:06.8398839Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8398916Z args.clear() 2023-01-11T21:38:06.8399009Z with torch.cuda.device(0): 2023-01-11T21:38:06.8399251Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8399346Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8399490Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.8399568Z del arg0_1 2023-01-11T21:38:06.8399636Z del arg1_1 2023-01-11T21:38:06.8399719Z return (buf0, ) 2023-01-11T21:38:06.8399724Z 2023-01-11T21:38:06.8399728Z 2023-01-11T21:38:06.8399810Z if __name__ == "__main__": 2023-01-11T21:38:06.8399930Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8400059Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8400260Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8400467Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8400580Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8400592Z 2023-01-11T21:38:06.8400854Z [2023-01-11 21:36:23,413] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 997 2023-01-11T21:38:06.8401277Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8401415Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8401674Z [2023-01-11 21:36:23,429] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 998 2023-01-11T21:38:06.8401967Z [2023-01-11 21:36:23,440] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 998 2023-01-11T21:38:06.8402384Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8402518Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8402774Z [2023-01-11 21:36:23,460] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 999 2023-01-11T21:38:06.8403035Z [2023-01-11 21:36:23,472] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 999 2023-01-11T21:38:06.8403446Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8403587Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8403847Z [2023-01-11 21:36:23,491] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1000 2023-01-11T21:38:06.8403853Z 2023-01-11T21:38:06.8403946Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8404025Z import torch 2023-01-11T21:38:06.8404102Z import random 2023-01-11T21:38:06.8404223Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8404348Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8404354Z 2023-01-11T21:38:06.8404438Z aten = torch.ops.aten 2023-01-11T21:38:06.8404581Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8404672Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8404685Z 2023-01-11T21:38:06.8404755Z import triton 2023-01-11T21:38:06.8404851Z import triton.language as tl 2023-01-11T21:38:06.8404979Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8405146Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8405152Z 2023-01-11T21:38:06.8405156Z 2023-01-11T21:38:06.8405314Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8405391Z import triton 2023-01-11T21:38:06.8405488Z import triton.language as tl 2023-01-11T21:38:06.8405597Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8405702Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8405862Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8406004Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8406014Z 2023-01-11T21:38:06.8406442Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8406523Z @triton.jit 2023-01-11T21:38:06.8406671Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8406749Z xnumel = 1000 2023-01-11T21:38:06.8406842Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8406974Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8407059Z xmask = xindex < xnumel 2023-01-11T21:38:06.8407132Z x0 = xindex 2023-01-11T21:38:06.8407230Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8407330Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 
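    # Annotation: both loads use the same flat offset x0 -- the (10, 100) input and
    # the (10, 1, 10, 1, 10) input are different contiguous views over the same
    # 1000 elements, so the add collapses to a single flat 1D iteration space.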
2023-01-11T21:38:06.8407412Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8407542Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8407658Z ''') 2023-01-11T21:38:06.8407665Z 2023-01-11T21:38:06.8407670Z 2023-01-11T21:38:06.8407764Z async_compile.wait(globals()) 2023-01-11T21:38:06.8407845Z del async_compile 2023-01-11T21:38:06.8407850Z 2023-01-11T21:38:06.8407929Z def call(args): 2023-01-11T21:38:06.8408009Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8408090Z args.clear() 2023-01-11T21:38:06.8408185Z with torch.cuda.device(0): 2023-01-11T21:38:06.8408412Z buf0 = empty_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8408506Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8408653Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8408727Z del arg0_1 2023-01-11T21:38:06.8408801Z del arg1_1 2023-01-11T21:38:06.8408884Z return (buf0, ) 2023-01-11T21:38:06.8408889Z 2023-01-11T21:38:06.8408894Z 2023-01-11T21:38:06.8408975Z if __name__ == "__main__": 2023-01-11T21:38:06.8409091Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8409218Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8409428Z arg0_1 = rand_strided((10, 100), (100, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8409665Z arg1_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8409786Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8409792Z 2023-01-11T21:38:06.8409796Z 2023-01-11T21:38:06.8409896Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8409973Z import torch 2023-01-11T21:38:06.8410052Z import random 2023-01-11T21:38:06.8410166Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8410291Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8410296Z 2023-01-11T21:38:06.8410380Z aten = torch.ops.aten 2023-01-11T21:38:06.8410519Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8410621Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8410626Z 2023-01-11T21:38:06.8410703Z import triton 2023-01-11T21:38:06.8410798Z import triton.language as tl 2023-01-11T21:38:06.8410925Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8411083Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8411089Z 2023-01-11T21:38:06.8411101Z 2023-01-11T21:38:06.8411251Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8411328Z import triton 2023-01-11T21:38:06.8411422Z import triton.language as tl 2023-01-11T21:38:06.8411538Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8411643Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8411778Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8411904Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8411909Z 2023-01-11T21:38:06.8412320Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8412400Z @triton.jit 2023-01-11T21:38:06.8412547Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8412625Z xnumel = 1000 
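    # Annotation: the shape is baked in -- xnumel is reassigned to the literal 1000
    # even though it is also a runtime argument (the .run() call visibly passes
    # 1000), which presumably specializes the compiled kernel to this exact size.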
2023-01-11T21:38:06.8412728Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8412860Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8412946Z xmask = xindex < xnumel 2023-01-11T21:38:06.8413021Z x0 = xindex 2023-01-11T21:38:06.8413133Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8413251Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8413332Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8413469Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8413583Z ''') 2023-01-11T21:38:06.8413588Z 2023-01-11T21:38:06.8413593Z 2023-01-11T21:38:06.8413689Z async_compile.wait(globals()) 2023-01-11T21:38:06.8413769Z del async_compile 2023-01-11T21:38:06.8413774Z 2023-01-11T21:38:06.8413845Z def call(args): 2023-01-11T21:38:06.8413928Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8414005Z args.clear() 2023-01-11T21:38:06.8414102Z with torch.cuda.device(0): 2023-01-11T21:38:06.8414338Z buf0 = empty_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8414434Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8414780Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8414849Z del arg0_1 2023-01-11T21:38:06.8414922Z del arg1_1 2023-01-11T21:38:06.8415000Z return (buf0, ) 2023-01-11T21:38:06.8415005Z 2023-01-11T21:38:06.8415009Z 2023-01-11T21:38:06.8415092Z if __name__ == "__main__": 2023-01-11T21:38:06.8415214Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8415341Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8415576Z arg0_1 = rand_strided((10, 100), (100, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8415867Z arg1_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8415989Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8416001Z 2023-01-11T21:38:06.8416005Z 2023-01-11T21:38:06.8416101Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8416175Z import torch 2023-01-11T21:38:06.8416252Z import random 2023-01-11T21:38:06.8416380Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8416512Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8416517Z 2023-01-11T21:38:06.8416601Z aten = torch.ops.aten 2023-01-11T21:38:06.8416750Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8416845Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8416850Z 2023-01-11T21:38:06.8416927Z import triton 2023-01-11T21:38:06.8417025Z import triton.language as tl 2023-01-11T21:38:06.8417220Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8417434Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8417440Z 2023-01-11T21:38:06.8417444Z 2023-01-11T21:38:06.8417606Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8417681Z import triton 2023-01-11T21:38:06.8417774Z import triton.language as tl 2023-01-11T21:38:06.8417881Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8417983Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8418116Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8418240Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8418245Z 2023-01-11T21:38:06.8418661Z @pointwise(size_hints=[1024], 
filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8418740Z @triton.jit 2023-01-11T21:38:06.8418881Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8418959Z xnumel = 1000 2023-01-11T21:38:06.8419049Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8419177Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8419263Z xmask = xindex < xnumel 2023-01-11T21:38:06.8419336Z x0 = xindex 2023-01-11T21:38:06.8419437Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8419534Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8419605Z tmp1 = 1 2023-01-11T21:38:06.8419677Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8419754Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8419889Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8420012Z ''') 2023-01-11T21:38:06.8420017Z 2023-01-11T21:38:06.8420022Z 2023-01-11T21:38:06.8420118Z async_compile.wait(globals()) 2023-01-11T21:38:06.8420199Z del async_compile 2023-01-11T21:38:06.8420204Z 2023-01-11T21:38:06.8420281Z def call(args): 2023-01-11T21:38:06.8420356Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8420438Z args.clear() 2023-01-11T21:38:06.8420531Z with torch.cuda.device(0): 2023-01-11T21:38:06.8420766Z buf0 = empty_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8420862Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8421011Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8421089Z del arg0_1 2023-01-11T21:38:06.8421165Z del arg1_1 2023-01-11T21:38:06.8421238Z return (buf0, ) 2023-01-11T21:38:06.8421243Z 2023-01-11T21:38:06.8421247Z 2023-01-11T21:38:06.8421335Z if __name__ == "__main__": 2023-01-11T21:38:06.8421459Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8421587Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8421796Z arg0_1 = rand_strided((10, 100), (100, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8422032Z arg1_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8422156Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8422161Z 2023-01-11T21:38:06.8422430Z [2023-01-11 21:36:23,503] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1000 2023-01-11T21:38:06.8422848Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8422979Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8423237Z [2023-01-11 21:36:23,519] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1001 2023-01-11T21:38:06.8423531Z [2023-01-11 21:36:23,531] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1001 2023-01-11T21:38:06.8423948Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8424085Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8424344Z [2023-01-11 21:36:23,547] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1002 2023-01-11T21:38:06.8424610Z [2023-01-11 21:36:23,558] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1002 2023-01-11T21:38:06.8425025Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8425159Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8425419Z [2023-01-11 21:36:23,577] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1003 2023-01-11T21:38:06.8425425Z 2023-01-11T21:38:06.8425528Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8425617Z import torch 2023-01-11T21:38:06.8425700Z import random 2023-01-11T21:38:06.8425882Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8426009Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8426015Z 2023-01-11T21:38:06.8426101Z aten = torch.ops.aten 2023-01-11T21:38:06.8426240Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8426342Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8426347Z 2023-01-11T21:38:06.8426426Z import triton 2023-01-11T21:38:06.8426514Z import triton.language as tl 2023-01-11T21:38:06.8426644Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8426786Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8426792Z 2023-01-11T21:38:06.8426796Z 2023-01-11T21:38:06.8426958Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8427036Z import triton 2023-01-11T21:38:06.8427131Z import triton.language as tl 2023-01-11T21:38:06.8427249Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8427349Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8427487Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8427615Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8427620Z 2023-01-11T21:38:06.8428044Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8428121Z @triton.jit 2023-01-11T21:38:06.8428263Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8428341Z xnumel = 1000 2023-01-11T21:38:06.8428442Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8428567Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8428654Z xmask = xindex < xnumel 2023-01-11T21:38:06.8428728Z x0 = xindex 2023-01-11T21:38:06.8428848Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8428971Z tmp3 = 
tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8429044Z tmp1 = 1 2023-01-11T21:38:06.8429127Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8429200Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8429367Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8429457Z ''') 2023-01-11T21:38:06.8429462Z 2023-01-11T21:38:06.8429467Z 2023-01-11T21:38:06.8429563Z async_compile.wait(globals()) 2023-01-11T21:38:06.8429643Z del async_compile 2023-01-11T21:38:06.8429648Z 2023-01-11T21:38:06.8429726Z def call(args): 2023-01-11T21:38:06.8429808Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8429885Z args.clear() 2023-01-11T21:38:06.8429973Z with torch.cuda.device(0): 2023-01-11T21:38:06.8430206Z buf0 = empty_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8430300Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8430450Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8430526Z del arg0_1 2023-01-11T21:38:06.8430603Z del arg1_1 2023-01-11T21:38:06.8430683Z return (buf0, ) 2023-01-11T21:38:06.8430688Z 2023-01-11T21:38:06.8430693Z 2023-01-11T21:38:06.8430778Z if __name__ == "__main__": 2023-01-11T21:38:06.8430892Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8431020Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8431230Z arg0_1 = rand_strided((10, 100), (100, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8431460Z arg1_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8431581Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8431586Z 2023-01-11T21:38:06.8431590Z 2023-01-11T21:38:06.8431691Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8431796Z import torch 2023-01-11T21:38:06.8431873Z import random 2023-01-11T21:38:06.8431986Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8432112Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8432118Z 2023-01-11T21:38:06.8432202Z aten = torch.ops.aten 2023-01-11T21:38:06.8432343Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8432440Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8432446Z 2023-01-11T21:38:06.8432522Z import triton 2023-01-11T21:38:06.8432617Z import triton.language as tl 2023-01-11T21:38:06.8432737Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8432878Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8432884Z 2023-01-11T21:38:06.8432888Z 2023-01-11T21:38:06.8433045Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8433125Z import triton 2023-01-11T21:38:06.8433220Z import triton.language as tl 2023-01-11T21:38:06.8433343Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8433449Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8433584Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8433705Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8433710Z 2023-01-11T21:38:06.8434133Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8434214Z @triton.jit 2023-01-11T21:38:06.8434356Z 
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8434432Z xnumel = 16 2023-01-11T21:38:06.8434532Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8434664Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8434748Z xmask = xindex < xnumel 2023-01-11T21:38:06.8434817Z x0 = xindex 2023-01-11T21:38:06.8434917Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8435016Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8435098Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8435235Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8435344Z ''') 2023-01-11T21:38:06.8435380Z 2023-01-11T21:38:06.8435385Z 2023-01-11T21:38:06.8435495Z async_compile.wait(globals()) 2023-01-11T21:38:06.8435579Z del async_compile 2023-01-11T21:38:06.8435592Z 2023-01-11T21:38:06.8435662Z def call(args): 2023-01-11T21:38:06.8435746Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8435824Z args.clear() 2023-01-11T21:38:06.8435918Z with torch.cuda.device(0): 2023-01-11T21:38:06.8436133Z buf0 = empty_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8436228Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8436365Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8436445Z del arg0_1 2023-01-11T21:38:06.8436520Z del arg1_1 2023-01-11T21:38:06.8436601Z return (buf0, ) 2023-01-11T21:38:06.8436606Z 2023-01-11T21:38:06.8436611Z 2023-01-11T21:38:06.8436694Z if __name__ == "__main__": 2023-01-11T21:38:06.8436818Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8436946Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8437148Z arg0_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8437355Z arg1_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8437477Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8437482Z 2023-01-11T21:38:06.8437487Z 2023-01-11T21:38:06.8437586Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8437663Z import torch 2023-01-11T21:38:06.8437739Z import random 2023-01-11T21:38:06.8437890Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8438017Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8438022Z 2023-01-11T21:38:06.8438106Z aten = torch.ops.aten 2023-01-11T21:38:06.8438238Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8438339Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8438344Z 2023-01-11T21:38:06.8438420Z import triton 2023-01-11T21:38:06.8438515Z import triton.language as tl 2023-01-11T21:38:06.8438643Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8438783Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8438789Z 2023-01-11T21:38:06.8438793Z 2023-01-11T21:38:06.8438948Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8439025Z import triton 2023-01-11T21:38:06.8439113Z import triton.language as tl 2023-01-11T21:38:06.8439230Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8439334Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8439473Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8439599Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8439604Z 
2023-01-11T21:38:06.8440031Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8440108Z @triton.jit 2023-01-11T21:38:06.8440249Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8440319Z xnumel = 16 2023-01-11T21:38:06.8440420Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8440557Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8440647Z xmask = xindex < xnumel 2023-01-11T21:38:06.8440720Z x0 = xindex 2023-01-11T21:38:06.8440839Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8440960Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8441034Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8441170Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8441257Z ''') 2023-01-11T21:38:06.8441263Z 2023-01-11T21:38:06.8441267Z 2023-01-11T21:38:06.8441388Z async_compile.wait(globals()) 2023-01-11T21:38:06.8441470Z del async_compile 2023-01-11T21:38:06.8441475Z 2023-01-11T21:38:06.8441552Z def call(args): 2023-01-11T21:38:06.8441633Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8441709Z args.clear() 2023-01-11T21:38:06.8441796Z with torch.cuda.device(0): 2023-01-11T21:38:06.8442013Z buf0 = empty_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8442109Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8442257Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8442337Z del arg0_1 2023-01-11T21:38:06.8442413Z del arg1_1 2023-01-11T21:38:06.8442493Z return (buf0, ) 2023-01-11T21:38:06.8442499Z 2023-01-11T21:38:06.8442503Z 2023-01-11T21:38:06.8442578Z if __name__ == "__main__": 2023-01-11T21:38:06.8442697Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8442826Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8443030Z arg0_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8443241Z arg1_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8443361Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8443367Z 2023-01-11T21:38:06.8443639Z [2023-01-11 21:36:23,589] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1003 2023-01-11T21:38:06.8444059Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8444221Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8444482Z [2023-01-11 21:36:23,608] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1004 2023-01-11T21:38:06.8444741Z [2023-01-11 21:36:23,619] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1004 2023-01-11T21:38:06.8444748Z 2023-01-11T21:38:06.8444847Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8444926Z import torch 2023-01-11T21:38:06.8445003Z import random 2023-01-11T21:38:06.8445126Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8445254Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8445262Z 2023-01-11T21:38:06.8445346Z aten = torch.ops.aten 2023-01-11T21:38:06.8445503Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8445616Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8445622Z 2023-01-11T21:38:06.8445707Z import triton 2023-01-11T21:38:06.8445802Z import triton.language as tl 2023-01-11T21:38:06.8445933Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8446075Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8446080Z 2023-01-11T21:38:06.8446085Z 2023-01-11T21:38:06.8446246Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8446323Z import triton 2023-01-11T21:38:06.8446411Z import triton.language as tl 2023-01-11T21:38:06.8446527Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8446631Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8446767Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8446893Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8446901Z 2023-01-11T21:38:06.8447319Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8447420Z @triton.jit 2023-01-11T21:38:06.8447564Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8447633Z xnumel = 16 2023-01-11T21:38:06.8447733Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8447865Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8447950Z xmask = xindex < xnumel 2023-01-11T21:38:06.8448022Z x0 = xindex 2023-01-11T21:38:06.8448122Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8448225Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8448292Z tmp1 = 1 2023-01-11T21:38:06.8448373Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8448459Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8448596Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8448684Z ''') 2023-01-11T21:38:06.8448690Z 2023-01-11T21:38:06.8448694Z 2023-01-11T21:38:06.8448791Z async_compile.wait(globals()) 2023-01-11T21:38:06.8448873Z del async_compile 2023-01-11T21:38:06.8448879Z 2023-01-11T21:38:06.8448957Z def call(args): 2023-01-11T21:38:06.8449032Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8449110Z args.clear() 2023-01-11T21:38:06.8449204Z with torch.cuda.device(0): 2023-01-11T21:38:06.8449418Z buf0 = empty_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda', dtype=torch.float32) 
2023-01-11T21:38:06.8449514Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8449659Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8449736Z del arg0_1 2023-01-11T21:38:06.8449805Z del arg1_1 2023-01-11T21:38:06.8449915Z return (buf0, ) 2023-01-11T21:38:06.8449920Z 2023-01-11T21:38:06.8449925Z 2023-01-11T21:38:06.8450007Z if __name__ == "__main__": 2023-01-11T21:38:06.8450128Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8450256Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8450463Z arg0_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8450676Z arg1_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8450798Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8450803Z 2023-01-11T21:38:06.8450808Z 2023-01-11T21:38:06.8450900Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8450976Z import torch 2023-01-11T21:38:06.8451053Z import random 2023-01-11T21:38:06.8451172Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8451297Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8451305Z 2023-01-11T21:38:06.8451390Z aten = torch.ops.aten 2023-01-11T21:38:06.8451528Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8451626Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8451631Z 2023-01-11T21:38:06.8451701Z import triton 2023-01-11T21:38:06.8451800Z import triton.language as tl 2023-01-11T21:38:06.8451932Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8452076Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8452081Z 2023-01-11T21:38:06.8452085Z 2023-01-11T21:38:06.8452245Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8452322Z import triton 2023-01-11T21:38:06.8452418Z import triton.language as tl 2023-01-11T21:38:06.8452527Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8452631Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8452765Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8452891Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8452900Z 2023-01-11T21:38:06.8453319Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8453421Z @triton.jit 2023-01-11T21:38:06.8453565Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8453643Z xnumel = 16 2023-01-11T21:38:06.8453735Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8453867Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8453952Z xmask = xindex < xnumel 2023-01-11T21:38:06.8454026Z x0 = xindex 2023-01-11T21:38:06.8454146Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8454265Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8454338Z tmp1 = 1 2023-01-11T21:38:06.8454415Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8454768Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8454911Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8455002Z ''') 2023-01-11T21:38:06.8455007Z 
2023-01-11T21:38:06.8455011Z 2023-01-11T21:38:06.8455106Z async_compile.wait(globals()) 2023-01-11T21:38:06.8455185Z del async_compile 2023-01-11T21:38:06.8455190Z 2023-01-11T21:38:06.8455265Z def call(args): 2023-01-11T21:38:06.8455343Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8455411Z args.clear() 2023-01-11T21:38:06.8455505Z with torch.cuda.device(0): 2023-01-11T21:38:06.8455716Z buf0 = empty_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8455810Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8455952Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8456025Z del arg0_1 2023-01-11T21:38:06.8456146Z del arg1_1 2023-01-11T21:38:06.8456220Z return (buf0, ) 2023-01-11T21:38:06.8456225Z 2023-01-11T21:38:06.8456237Z 2023-01-11T21:38:06.8456312Z if __name__ == "__main__": 2023-01-11T21:38:06.8456431Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8456559Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8456764Z arg0_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8456978Z arg1_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8457099Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8457105Z 2023-01-11T21:38:06.8457259Z ok (2.813s) 2023-01-11T21:38:06.8457714Z test_views2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8457844Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8458111Z [2023-01-11 21:36:23,636] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1005 2023-01-11T21:38:06.8458381Z [2023-01-11 21:36:23,702] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1005 2023-01-11T21:38:06.8458795Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8458927Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8459190Z [2023-01-11 21:36:23,717] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1006 2023-01-11T21:38:06.8459457Z [2023-01-11 21:36:23,783] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1006 2023-01-11T21:38:06.8459912Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8460046Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8460303Z [2023-01-11 21:36:23,800] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1007 2023-01-11T21:38:06.8460565Z [2023-01-11 21:36:23,868] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1007 2023-01-11T21:38:06.8460981Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8461113Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8461363Z [2023-01-11 21:36:23,885] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1008 2023-01-11T21:38:06.8461627Z [2023-01-11 21:36:23,953] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1008 2023-01-11T21:38:06.8462037Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8462193Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8462452Z [2023-01-11 21:36:23,967] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1009 2023-01-11T21:38:06.8462458Z 2023-01-11T21:38:06.8462558Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8462635Z import torch 2023-01-11T21:38:06.8462712Z import random 2023-01-11T21:38:06.8462833Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8462952Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8462957Z 2023-01-11T21:38:06.8463043Z aten = torch.ops.aten 2023-01-11T21:38:06.8463183Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8463279Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8463285Z 2023-01-11T21:38:06.8463364Z import triton 2023-01-11T21:38:06.8463458Z import triton.language as tl 2023-01-11T21:38:06.8463585Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8463720Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8463732Z 2023-01-11T21:38:06.8463736Z 2023-01-11T21:38:06.8463888Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8463965Z import triton 2023-01-11T21:38:06.8464059Z import triton.language as tl 2023-01-11T21:38:06.8464175Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8464279Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8464416Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8464543Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8464548Z 2023-01-11T21:38:06.8464955Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8465028Z @triton.jit 2023-01-11T21:38:06.8465162Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8465237Z xnumel = 16 2023-01-11T21:38:06.8465337Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8465535Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8465623Z xmask = xindex < xnumel 2023-01-11T21:38:06.8465696Z x0 = xindex 2023-01-11T21:38:06.8465789Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8465862Z tmp1 = 1 2023-01-11T21:38:06.8465943Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8466080Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8466167Z ''') 2023-01-11T21:38:06.8466173Z 2023-01-11T21:38:06.8466177Z 2023-01-11T21:38:06.8466274Z async_compile.wait(globals()) 2023-01-11T21:38:06.8466355Z del async_compile 2023-01-11T21:38:06.8466363Z 2023-01-11T21:38:06.8466434Z def call(args): 2023-01-11T21:38:06.8466510Z arg0_1, = args 2023-01-11T21:38:06.8466590Z args.clear() 2023-01-11T21:38:06.8466683Z with torch.cuda.device(0): 2023-01-11T21:38:06.8466887Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8466985Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8467125Z triton_fused_add_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8467193Z del arg0_1 2023-01-11T21:38:06.8467274Z return (buf0, ) 2023-01-11T21:38:06.8467280Z 2023-01-11T21:38:06.8467284Z 2023-01-11T21:38:06.8467367Z if __name__ == "__main__": 2023-01-11T21:38:06.8467487Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8467614Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8467830Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8467944Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8467984Z 2023-01-11T21:38:06.8467988Z 2023-01-11T21:38:06.8468088Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8468163Z import torch 2023-01-11T21:38:06.8468233Z import random 2023-01-11T21:38:06.8468355Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8468482Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8468487Z 2023-01-11T21:38:06.8468571Z aten = torch.ops.aten 2023-01-11T21:38:06.8468708Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8468804Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8468809Z 2023-01-11T21:38:06.8468885Z import triton 2023-01-11T21:38:06.8468973Z import triton.language as tl 2023-01-11T21:38:06.8469098Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8469238Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8469243Z 2023-01-11T21:38:06.8469248Z 2023-01-11T21:38:06.8469409Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8469486Z import triton 2023-01-11T21:38:06.8469581Z import triton.language as tl 2023-01-11T21:38:06.8469696Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8469801Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8469931Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8470058Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8470063Z 2023-01-11T21:38:06.8470469Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 
1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8470547Z @triton.jit 2023-01-11T21:38:06.8470681Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8470757Z xnumel = 16 2023-01-11T21:38:06.8470857Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8470992Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8471071Z xmask = xindex < xnumel 2023-01-11T21:38:06.8471143Z x0 = xindex 2023-01-11T21:38:06.8471262Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8471336Z tmp1 = 1 2023-01-11T21:38:06.8471443Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8471582Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8471669Z ''') 2023-01-11T21:38:06.8471674Z 2023-01-11T21:38:06.8471679Z 2023-01-11T21:38:06.8471766Z async_compile.wait(globals()) 2023-01-11T21:38:06.8471845Z del async_compile 2023-01-11T21:38:06.8471851Z 2023-01-11T21:38:06.8471927Z def call(args): 2023-01-11T21:38:06.8472006Z arg0_1, = args 2023-01-11T21:38:06.8472083Z args.clear() 2023-01-11T21:38:06.8472177Z with torch.cuda.device(0): 2023-01-11T21:38:06.8472380Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8472471Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8472611Z triton_fused_add_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8472687Z del arg0_1 2023-01-11T21:38:06.8472767Z return (buf0, ) 2023-01-11T21:38:06.8472772Z 2023-01-11T21:38:06.8472777Z 2023-01-11T21:38:06.8472861Z if __name__ == "__main__": 2023-01-11T21:38:06.8472980Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8473111Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8473326Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8473434Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8473439Z 2023-01-11T21:38:06.8473452Z 2023-01-11T21:38:06.8473544Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8473622Z import torch 2023-01-11T21:38:06.8473699Z import random 2023-01-11T21:38:06.8473820Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8473973Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8473978Z 2023-01-11T21:38:06.8474062Z aten = torch.ops.aten 2023-01-11T21:38:06.8474200Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8474290Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8474295Z 2023-01-11T21:38:06.8474372Z import triton 2023-01-11T21:38:06.8474466Z import triton.language as tl 2023-01-11T21:38:06.8474594Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8474736Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8474742Z 2023-01-11T21:38:06.8474746Z 2023-01-11T21:38:06.8474902Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8474980Z import triton 2023-01-11T21:38:06.8475075Z import triton.language as tl 2023-01-11T21:38:06.8475184Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8475294Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8475453Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8475603Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8475608Z 
2023-01-11T21:38:06.8476018Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8476095Z @triton.jit 2023-01-11T21:38:06.8476230Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8476305Z xnumel = 16 2023-01-11T21:38:06.8476397Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8476528Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8476614Z xmask = xindex < xnumel 2023-01-11T21:38:06.8476687Z x0 = xindex 2023-01-11T21:38:06.8476786Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8476859Z tmp1 = 2 2023-01-11T21:38:06.8476943Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.8477009Z tmp3 = 1 2023-01-11T21:38:06.8477090Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8477227Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8477315Z ''') 2023-01-11T21:38:06.8477321Z 2023-01-11T21:38:06.8477325Z 2023-01-11T21:38:06.8477451Z async_compile.wait(globals()) 2023-01-11T21:38:06.8477533Z del async_compile 2023-01-11T21:38:06.8477538Z 2023-01-11T21:38:06.8477616Z def call(args): 2023-01-11T21:38:06.8477684Z arg0_1, = args 2023-01-11T21:38:06.8477761Z args.clear() 2023-01-11T21:38:06.8477857Z with torch.cuda.device(0): 2023-01-11T21:38:06.8478061Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8478156Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8478298Z triton_fused_add_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8478375Z del arg0_1 2023-01-11T21:38:06.8478451Z return (buf0, ) 2023-01-11T21:38:06.8478456Z 2023-01-11T21:38:06.8478467Z 2023-01-11T21:38:06.8478542Z if __name__ == "__main__": 2023-01-11T21:38:06.8478661Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8478790Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8479007Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8479126Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8479131Z 2023-01-11T21:38:06.8479136Z 2023-01-11T21:38:06.8479237Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8479314Z import torch 2023-01-11T21:38:06.8479383Z import random 2023-01-11T21:38:06.8479504Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8479631Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8479636Z 2023-01-11T21:38:06.8479720Z aten = torch.ops.aten 2023-01-11T21:38:06.8479858Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8479986Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8479991Z 2023-01-11T21:38:06.8480067Z import triton 2023-01-11T21:38:06.8480160Z import triton.language as tl 2023-01-11T21:38:06.8480281Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8480425Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8480430Z 2023-01-11T21:38:06.8480435Z 2023-01-11T21:38:06.8480590Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8480667Z import triton 2023-01-11T21:38:06.8480765Z import triton.language as tl 2023-01-11T21:38:06.8480881Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8480986Z from torch._inductor.ir import TileHint 
2023-01-11T21:38:06.8481114Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8481243Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8481248Z 2023-01-11T21:38:06.8481652Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8481731Z @triton.jit 2023-01-11T21:38:06.8481866Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8481943Z xnumel = 16 2023-01-11T21:38:06.8482046Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8482178Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8482256Z xmask = xindex < xnumel 2023-01-11T21:38:06.8482331Z x0 = xindex 2023-01-11T21:38:06.8482449Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8482522Z tmp1 = 2 2023-01-11T21:38:06.8482605Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.8482678Z tmp3 = 1 2023-01-11T21:38:06.8482757Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8482887Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8482978Z ''') 2023-01-11T21:38:06.8482984Z 2023-01-11T21:38:06.8482988Z 2023-01-11T21:38:06.8483083Z async_compile.wait(globals()) 2023-01-11T21:38:06.8483163Z del async_compile 2023-01-11T21:38:06.8483168Z 2023-01-11T21:38:06.8483246Z def call(args): 2023-01-11T21:38:06.8483323Z arg0_1, = args 2023-01-11T21:38:06.8483427Z args.clear() 2023-01-11T21:38:06.8483524Z with torch.cuda.device(0): 2023-01-11T21:38:06.8483721Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8483816Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8483953Z triton_fused_add_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8484028Z del arg0_1 2023-01-11T21:38:06.8484108Z return (buf0, ) 2023-01-11T21:38:06.8484114Z 2023-01-11T21:38:06.8484118Z 2023-01-11T21:38:06.8484199Z if __name__ == "__main__": 2023-01-11T21:38:06.8484319Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8484443Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8484658Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8484774Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8484779Z 2023-01-11T21:38:06.8485049Z [2023-01-11 21:36:24,034] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1009 2023-01-11T21:38:06.8485463Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8485597Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8485856Z [2023-01-11 21:36:24,049] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1010 2023-01-11T21:38:06.8486150Z [2023-01-11 21:36:24,116] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1010 2023-01-11T21:38:06.8486567Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8486701Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8486957Z [2023-01-11 21:36:24,133] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1011 2023-01-11T21:38:06.8487215Z [2023-01-11 21:36:24,202] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1011 2023-01-11T21:38:06.8487627Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8487766Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8488022Z [2023-01-11 21:36:24,219] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1012 2023-01-11T21:38:06.8488028Z 2023-01-11T21:38:06.8488128Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8488205Z import torch 2023-01-11T21:38:06.8488281Z import random 2023-01-11T21:38:06.8488402Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8488527Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8488532Z 2023-01-11T21:38:06.8488610Z aten = torch.ops.aten 2023-01-11T21:38:06.8488752Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8488850Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8488855Z 2023-01-11T21:38:06.8488934Z import triton 2023-01-11T21:38:06.8489030Z import triton.language as tl 2023-01-11T21:38:06.8489156Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8489324Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8489330Z 2023-01-11T21:38:06.8489334Z 2023-01-11T21:38:06.8489492Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8489562Z import triton 2023-01-11T21:38:06.8489657Z import triton.language as tl 2023-01-11T21:38:06.8489775Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8489880Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8490014Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8490142Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8490147Z 2023-01-11T21:38:06.8490558Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8490635Z @triton.jit 2023-01-11T21:38:06.8490766Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8490844Z xnumel = 1000 2023-01-11T21:38:06.8490944Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8491075Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8491160Z xmask = xindex < xnumel 2023-01-11T21:38:06.8491236Z x0 = xindex 2023-01-11T21:38:06.8491334Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8491401Z tmp1 = 1 2023-01-11T21:38:06.8491483Z tmp2 = tmp0 + tmp1 
2023-01-11T21:38:06.8491620Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8491709Z ''') 2023-01-11T21:38:06.8491740Z 2023-01-11T21:38:06.8491744Z 2023-01-11T21:38:06.8491841Z async_compile.wait(globals()) 2023-01-11T21:38:06.8491919Z del async_compile 2023-01-11T21:38:06.8491924Z 2023-01-11T21:38:06.8492000Z def call(args): 2023-01-11T21:38:06.8492076Z arg0_1, = args 2023-01-11T21:38:06.8492147Z args.clear() 2023-01-11T21:38:06.8492242Z with torch.cuda.device(0): 2023-01-11T21:38:06.8492454Z buf0 = empty_strided((10, 100), (100, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8492550Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8492692Z triton_fused_add_0.run(arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8492769Z del arg0_1 2023-01-11T21:38:06.8492849Z return (buf0, ) 2023-01-11T21:38:06.8492854Z 2023-01-11T21:38:06.8492859Z 2023-01-11T21:38:06.8492934Z if __name__ == "__main__": 2023-01-11T21:38:06.8493053Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8493184Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8493425Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8493541Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8493546Z 2023-01-11T21:38:06.8493551Z 2023-01-11T21:38:06.8493651Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8493732Z import torch 2023-01-11T21:38:06.8493810Z import random 2023-01-11T21:38:06.8493924Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8494050Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8494055Z 2023-01-11T21:38:06.8494139Z aten = torch.ops.aten 2023-01-11T21:38:06.8494281Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8494383Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8494388Z 2023-01-11T21:38:06.8494465Z import triton 2023-01-11T21:38:06.8494764Z import triton.language as tl 2023-01-11T21:38:06.8494891Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8495028Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8495034Z 2023-01-11T21:38:06.8495038Z 2023-01-11T21:38:06.8495193Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8495268Z import triton 2023-01-11T21:38:06.8495360Z import triton.language as tl 2023-01-11T21:38:06.8495519Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8495622Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8495758Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8495877Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8495889Z 2023-01-11T21:38:06.8496285Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8496359Z @triton.jit 2023-01-11T21:38:06.8496494Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8496569Z xnumel = 1000 2023-01-11T21:38:06.8496665Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8496794Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8496876Z xmask = xindex < xnumel 2023-01-11T21:38:06.8496942Z x0 = xindex 2023-01-11T21:38:06.8497063Z tmp0 = 
tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8497177Z tmp1 = 1 2023-01-11T21:38:06.8497272Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8497422Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8497526Z ''') 2023-01-11T21:38:06.8497532Z 2023-01-11T21:38:06.8497537Z 2023-01-11T21:38:06.8497630Z async_compile.wait(globals()) 2023-01-11T21:38:06.8497709Z del async_compile 2023-01-11T21:38:06.8497714Z 2023-01-11T21:38:06.8497782Z def call(args): 2023-01-11T21:38:06.8497854Z arg0_1, = args 2023-01-11T21:38:06.8497970Z args.clear() 2023-01-11T21:38:06.8498061Z with torch.cuda.device(0): 2023-01-11T21:38:06.8498271Z buf0 = empty_strided((10, 100), (100, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8498362Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8498505Z triton_fused_add_0.run(arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8498571Z del arg0_1 2023-01-11T21:38:06.8498648Z return (buf0, ) 2023-01-11T21:38:06.8498654Z 2023-01-11T21:38:06.8498658Z 2023-01-11T21:38:06.8498739Z if __name__ == "__main__": 2023-01-11T21:38:06.8498857Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8498984Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8499217Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8499330Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8499335Z 2023-01-11T21:38:06.8499342Z 2023-01-11T21:38:06.8499438Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8499505Z import torch 2023-01-11T21:38:06.8499579Z import random 2023-01-11T21:38:06.8499697Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8499820Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8499825Z 2023-01-11T21:38:06.8499908Z aten = torch.ops.aten 2023-01-11T21:38:06.8500044Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8500138Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8500143Z 2023-01-11T21:38:06.8500217Z import triton 2023-01-11T21:38:06.8500302Z import triton.language as tl 2023-01-11T21:38:06.8500426Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8500564Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8500569Z 2023-01-11T21:38:06.8500573Z 2023-01-11T21:38:06.8500728Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8500801Z import triton 2023-01-11T21:38:06.8500896Z import triton.language as tl 2023-01-11T21:38:06.8501009Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8501103Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8501237Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8501390Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8501396Z 2023-01-11T21:38:06.8501800Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8501875Z @triton.jit 2023-01-11T21:38:06.8502007Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8502083Z xnumel = 1000 2023-01-11T21:38:06.8502181Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8502301Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), 
[XBLOCK]) 2023-01-11T21:38:06.8502387Z xmask = xindex < xnumel 2023-01-11T21:38:06.8502458Z x0 = xindex 2023-01-11T21:38:06.8502557Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8502628Z tmp1 = 2 2023-01-11T21:38:06.8502710Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.8502781Z tmp3 = 1 2023-01-11T21:38:06.8502852Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8502990Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8503075Z ''') 2023-01-11T21:38:06.8503081Z 2023-01-11T21:38:06.8503085Z 2023-01-11T21:38:06.8503176Z async_compile.wait(globals()) 2023-01-11T21:38:06.8503254Z del async_compile 2023-01-11T21:38:06.8503259Z 2023-01-11T21:38:06.8503334Z def call(args): 2023-01-11T21:38:06.8503409Z arg0_1, = args 2023-01-11T21:38:06.8503485Z args.clear() 2023-01-11T21:38:06.8503570Z with torch.cuda.device(0): 2023-01-11T21:38:06.8503776Z buf0 = empty_strided((10, 100), (100, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8503895Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8504031Z triton_fused_add_0.run(arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8504106Z del arg0_1 2023-01-11T21:38:06.8504184Z return (buf0, ) 2023-01-11T21:38:06.8504189Z 2023-01-11T21:38:06.8504193Z 2023-01-11T21:38:06.8504276Z if __name__ == "__main__": 2023-01-11T21:38:06.8504387Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8504512Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8504742Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8504852Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8504858Z 2023-01-11T21:38:06.8505124Z [2023-01-11 21:36:24,291] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1012 2023-01-11T21:38:06.8505536Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8505673Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8505930Z [2023-01-11 21:36:24,305] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1013 2023-01-11T21:38:06.8506192Z [2023-01-11 21:36:24,315] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1013 2023-01-11T21:38:06.8506604Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8506737Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8506985Z [2023-01-11 21:36:24,329] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1014 2023-01-11T21:38:06.8507270Z [2023-01-11 21:36:24,339] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1014 2023-01-11T21:38:06.8507682Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8507812Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8508067Z [2023-01-11 21:36:24,356] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1015 2023-01-11T21:38:06.8508076Z 2023-01-11T21:38:06.8508173Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8508247Z import torch 2023-01-11T21:38:06.8508324Z import random 2023-01-11T21:38:06.8508442Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8508559Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8508573Z 2023-01-11T21:38:06.8508648Z aten = torch.ops.aten 2023-01-11T21:38:06.8508786Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8508881Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8508886Z 2023-01-11T21:38:06.8508962Z import triton 2023-01-11T21:38:06.8509053Z import triton.language as tl 2023-01-11T21:38:06.8509179Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8509317Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8509323Z 2023-01-11T21:38:06.8509327Z 2023-01-11T21:38:06.8509474Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8509586Z import triton 2023-01-11T21:38:06.8509682Z import triton.language as tl 2023-01-11T21:38:06.8509797Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8509900Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8510031Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8510158Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8510164Z 2023-01-11T21:38:06.8510570Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8510636Z @triton.jit 2023-01-11T21:38:06.8510767Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8510842Z xnumel = 1000 2023-01-11T21:38:06.8510939Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8511068Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8511154Z xmask = xindex < xnumel 2023-01-11T21:38:06.8511225Z x0 = xindex 2023-01-11T21:38:06.8511335Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8511409Z tmp1 = 2 2023-01-11T21:38:06.8511486Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.8511557Z tmp3 = 1 2023-01-11T21:38:06.8511637Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8511772Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8511859Z ''') 2023-01-11T21:38:06.8511865Z 2023-01-11T21:38:06.8511869Z 2023-01-11T21:38:06.8511965Z async_compile.wait(globals()) 2023-01-11T21:38:06.8512035Z del async_compile 2023-01-11T21:38:06.8512040Z 2023-01-11T21:38:06.8512114Z def call(args): 2023-01-11T21:38:06.8512186Z arg0_1, = args 2023-01-11T21:38:06.8512261Z args.clear() 2023-01-11T21:38:06.8512353Z with torch.cuda.device(0): 2023-01-11T21:38:06.8512559Z buf0 = empty_strided((10, 100), (100, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8512653Z stream0 = 
get_cuda_stream(0) 2023-01-11T21:38:06.8512785Z triton_fused_add_0.run(arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8512860Z del arg0_1 2023-01-11T21:38:06.8512936Z return (buf0, ) 2023-01-11T21:38:06.8512942Z 2023-01-11T21:38:06.8512975Z 2023-01-11T21:38:06.8513057Z if __name__ == "__main__": 2023-01-11T21:38:06.8513175Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8513301Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8513533Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8513645Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8513650Z 2023-01-11T21:38:06.8513654Z 2023-01-11T21:38:06.8513744Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8513816Z import torch 2023-01-11T21:38:06.8513890Z import random 2023-01-11T21:38:06.8514012Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8514135Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8514141Z 2023-01-11T21:38:06.8514222Z aten = torch.ops.aten 2023-01-11T21:38:06.8514360Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8514450Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8514462Z 2023-01-11T21:38:06.8514529Z import triton 2023-01-11T21:38:06.8514620Z import triton.language as tl 2023-01-11T21:38:06.8514747Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8514884Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8514889Z 2023-01-11T21:38:06.8514894Z 2023-01-11T21:38:06.8515047Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8515122Z import triton 2023-01-11T21:38:06.8515218Z import triton.language as tl 2023-01-11T21:38:06.8515324Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8515455Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8515586Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8515711Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8515716Z 2023-01-11T21:38:06.8516118Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8516194Z @triton.jit 2023-01-11T21:38:06.8516326Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8516401Z xnumel = 1000 2023-01-11T21:38:06.8516491Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8516619Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8516703Z xmask = xindex < xnumel 2023-01-11T21:38:06.8516774Z x0 = xindex 2023-01-11T21:38:06.8516874Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8516948Z tmp1 = 1 2023-01-11T21:38:06.8517025Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8517153Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8517237Z ''') 2023-01-11T21:38:06.8517243Z 2023-01-11T21:38:06.8517247Z 2023-01-11T21:38:06.8517339Z async_compile.wait(globals()) 2023-01-11T21:38:06.8517418Z del async_compile 2023-01-11T21:38:06.8517423Z 2023-01-11T21:38:06.8517501Z def call(args): 2023-01-11T21:38:06.8517574Z arg0_1, = args 2023-01-11T21:38:06.8517648Z args.clear() 2023-01-11T21:38:06.8517739Z with torch.cuda.device(0): 2023-01-11T21:38:06.8517944Z 
buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8518036Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8518175Z triton_fused_add_0.run(arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8518247Z del arg0_1 2023-01-11T21:38:06.8518324Z return (buf0, ) 2023-01-11T21:38:06.8518333Z 2023-01-11T21:38:06.8518337Z 2023-01-11T21:38:06.8518417Z if __name__ == "__main__": 2023-01-11T21:38:06.8518535Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8518653Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8518885Z arg0_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8518996Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8519001Z 2023-01-11T21:38:06.8519006Z 2023-01-11T21:38:06.8519103Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8519177Z import torch 2023-01-11T21:38:06.8519251Z import random 2023-01-11T21:38:06.8519369Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8519493Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8519498Z 2023-01-11T21:38:06.8519572Z aten = torch.ops.aten 2023-01-11T21:38:06.8519708Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8519805Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8519810Z 2023-01-11T21:38:06.8519886Z import triton 2023-01-11T21:38:06.8519977Z import triton.language as tl 2023-01-11T21:38:06.8520101Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8520242Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8520247Z 2023-01-11T21:38:06.8520252Z 2023-01-11T21:38:06.8520407Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8520474Z import triton 2023-01-11T21:38:06.8520568Z import triton.language as tl 2023-01-11T21:38:06.8520682Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8520784Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8520915Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8521040Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8521045Z 2023-01-11T21:38:06.8521446Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8521589Z @triton.jit 2023-01-11T21:38:06.8521714Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8521791Z xnumel = 1000 2023-01-11T21:38:06.8521889Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8522018Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8522101Z xmask = xindex < xnumel 2023-01-11T21:38:06.8522173Z x0 = xindex 2023-01-11T21:38:06.8522290Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8522354Z tmp1 = 1 2023-01-11T21:38:06.8522433Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8522567Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8522654Z ''') 2023-01-11T21:38:06.8522660Z 2023-01-11T21:38:06.8522664Z 2023-01-11T21:38:06.8522766Z async_compile.wait(globals()) 2023-01-11T21:38:06.8522842Z del async_compile 2023-01-11T21:38:06.8522847Z 2023-01-11T21:38:06.8522922Z def call(args): 2023-01-11T21:38:06.8522994Z arg0_1, = args 
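# Aside, not part of the generated module: each test in this log dumps the same
# graph twice, first compiled for fp32 inputs and then for fp16. The two modules
# differ only in the pointer dtypes of the @pointwise signature ('*fp32' vs
# '*fp16') and in the .to(tl.float32) applied to the fp16 loads, so the
# arithmetic itself always runs in fp32.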
2023-01-11T21:38:06.8523062Z args.clear() 2023-01-11T21:38:06.8523154Z with torch.cuda.device(0): 2023-01-11T21:38:06.8523371Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8523464Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8523606Z triton_fused_add_0.run(arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8523679Z del arg0_1 2023-01-11T21:38:06.8523756Z return (buf0, ) 2023-01-11T21:38:06.8523761Z 2023-01-11T21:38:06.8523765Z 2023-01-11T21:38:06.8523838Z if __name__ == "__main__": 2023-01-11T21:38:06.8523955Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8524082Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8524288Z arg0_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8524399Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8524405Z 2023-01-11T21:38:06.8524671Z [2023-01-11 21:36:24,367] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1015 2023-01-11T21:38:06.8525122Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8525256Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8525520Z [2023-01-11 21:36:24,384] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1016 2023-01-11T21:38:06.8525781Z [2023-01-11 21:36:24,394] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1016 2023-01-11T21:38:06.8525789Z 2023-01-11T21:38:06.8525879Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8525954Z import torch 2023-01-11T21:38:06.8526029Z import random 2023-01-11T21:38:06.8526147Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8526276Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8526282Z 2023-01-11T21:38:06.8526363Z aten = torch.ops.aten 2023-01-11T21:38:06.8526499Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8526587Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8526600Z 2023-01-11T21:38:06.8526667Z import triton 2023-01-11T21:38:06.8526757Z import triton.language as tl 2023-01-11T21:38:06.8526881Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8527020Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8527026Z 2023-01-11T21:38:06.8527055Z 2023-01-11T21:38:06.8527213Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8527288Z import triton 2023-01-11T21:38:06.8527383Z import triton.language as tl 2023-01-11T21:38:06.8527490Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8527591Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8527725Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8527852Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8527857Z 2023-01-11T21:38:06.8528256Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 
1), equal_to_1=())]}) 2023-01-11T21:38:06.8528329Z @triton.jit 2023-01-11T21:38:06.8528461Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8528537Z xnumel = 1000 2023-01-11T21:38:06.8528627Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8528759Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8528843Z xmask = xindex < xnumel 2023-01-11T21:38:06.8528914Z x0 = xindex 2023-01-11T21:38:06.8529011Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8529084Z tmp1 = 2 2023-01-11T21:38:06.8529167Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.8529230Z tmp3 = 1 2023-01-11T21:38:06.8529308Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8529445Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8529530Z ''') 2023-01-11T21:38:06.8529536Z 2023-01-11T21:38:06.8529540Z 2023-01-11T21:38:06.8529633Z async_compile.wait(globals()) 2023-01-11T21:38:06.8529710Z del async_compile 2023-01-11T21:38:06.8529715Z 2023-01-11T21:38:06.8529790Z def call(args): 2023-01-11T21:38:06.8529855Z arg0_1, = args 2023-01-11T21:38:06.8529931Z args.clear() 2023-01-11T21:38:06.8530021Z with torch.cuda.device(0): 2023-01-11T21:38:06.8530235Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8530328Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8530466Z triton_fused_add_0.run(arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8530546Z del arg0_1 2023-01-11T21:38:06.8530647Z return (buf0, ) 2023-01-11T21:38:06.8530662Z 2023-01-11T21:38:06.8530667Z 2023-01-11T21:38:06.8530741Z if __name__ == "__main__": 2023-01-11T21:38:06.8530860Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8530986Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8531188Z arg0_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8531303Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8531308Z 2023-01-11T21:38:06.8531313Z 2023-01-11T21:38:06.8531411Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8531488Z import torch 2023-01-11T21:38:06.8531565Z import random 2023-01-11T21:38:06.8531676Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8531799Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8531804Z 2023-01-11T21:38:06.8531885Z aten = torch.ops.aten 2023-01-11T21:38:06.8532023Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8532121Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8532127Z 2023-01-11T21:38:06.8532201Z import triton 2023-01-11T21:38:06.8532298Z import triton.language as tl 2023-01-11T21:38:06.8532414Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8532551Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8532556Z 2023-01-11T21:38:06.8532561Z 2023-01-11T21:38:06.8532714Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8532789Z import triton 2023-01-11T21:38:06.8532884Z import triton.language as tl 2023-01-11T21:38:06.8532997Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8533130Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8533265Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8533383Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8533388Z 2023-01-11T21:38:06.8533788Z @pointwise(size_hints=[1024], 
filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8533867Z @triton.jit 2023-01-11T21:38:06.8533998Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8534074Z xnumel = 1000 2023-01-11T21:38:06.8534170Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8534297Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8534381Z xmask = xindex < xnumel 2023-01-11T21:38:06.8534445Z x0 = xindex 2023-01-11T21:38:06.8534761Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8534834Z tmp1 = 2 2023-01-11T21:38:06.8534913Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.8534984Z tmp3 = 1 2023-01-11T21:38:06.8535062Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8535198Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8535283Z ''') 2023-01-11T21:38:06.8535288Z 2023-01-11T21:38:06.8535293Z 2023-01-11T21:38:06.8535386Z async_compile.wait(globals()) 2023-01-11T21:38:06.8535463Z del async_compile 2023-01-11T21:38:06.8535468Z 2023-01-11T21:38:06.8535541Z def call(args): 2023-01-11T21:38:06.8535614Z arg0_1, = args 2023-01-11T21:38:06.8535689Z args.clear() 2023-01-11T21:38:06.8535783Z with torch.cuda.device(0): 2023-01-11T21:38:06.8536023Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8536130Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8536266Z triton_fused_add_0.run(arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8536344Z del arg0_1 2023-01-11T21:38:06.8536422Z return (buf0, ) 2023-01-11T21:38:06.8536427Z 2023-01-11T21:38:06.8536432Z 2023-01-11T21:38:06.8536512Z if __name__ == "__main__": 2023-01-11T21:38:06.8536672Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8536800Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8536996Z arg0_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8537110Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8537115Z 2023-01-11T21:38:06.8537252Z ok (0.774s) 2023-01-11T21:38:06.8537704Z test_views3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8537841Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8538101Z [2023-01-11 21:36:24,452] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1017 2023-01-11T21:38:06.8538365Z [2023-01-11 21:36:24,564] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1017 2023-01-11T21:38:06.8538777Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8538908Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8539165Z [2023-01-11 21:36:24,620] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1018 2023-01-11T21:38:06.8539467Z [2023-01-11 21:36:24,714] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1018 2023-01-11T21:38:06.8539472Z 2023-01-11T21:38:06.8539566Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8539646Z import torch 2023-01-11T21:38:06.8539725Z import random 2023-01-11T21:38:06.8539851Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8539977Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8539982Z 2023-01-11T21:38:06.8540068Z aten = torch.ops.aten 2023-01-11T21:38:06.8540209Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8540300Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8540314Z 2023-01-11T21:38:06.8540384Z import triton 2023-01-11T21:38:06.8540479Z import triton.language as tl 2023-01-11T21:38:06.8540608Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8540753Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8540759Z 2023-01-11T21:38:06.8540763Z 2023-01-11T21:38:06.8540923Z triton_fused_view_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8541000Z import triton 2023-01-11T21:38:06.8541095Z import triton.language as tl 2023-01-11T21:38:06.8541207Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8541312Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8541447Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8541579Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8541584Z 2023-01-11T21:38:06.8542009Z @pointwise(size_hints=[262144], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8542086Z @triton.jit 2023-01-11T21:38:06.8542233Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8542312Z xnumel = 142848 2023-01-11T21:38:06.8542405Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8542538Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8542623Z xmask = xindex < xnumel 2023-01-11T21:38:06.8542730Z x0 = xindex % 192 2023-01-11T21:38:06.8542815Z x1 = (xindex // 192) 2023-01-11T21:38:06.8542889Z x2 = xindex 2023-01-11T21:38:06.8543005Z tmp0 = tl.load(in_ptr0 + ((3*x1) + (x0 // 64)), xmask) 2023-01-11T21:38:06.8543116Z tmp1 = tl.load(in_ptr1 + ((64*tmp0) + (x0 % 64)), xmask) 2023-01-11T21:38:06.8543254Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.8543343Z ''') 2023-01-11T21:38:06.8543349Z 2023-01-11T21:38:06.8543354Z 2023-01-11T21:38:06.8543449Z async_compile.wait(globals()) 2023-01-11T21:38:06.8543528Z del async_compile 2023-01-11T21:38:06.8543536Z 2023-01-11T21:38:06.8543616Z def call(args): 2023-01-11T21:38:06.8543699Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8543776Z args.clear() 2023-01-11T21:38:06.8543864Z with torch.cuda.device(0): 2023-01-11T21:38:06.8544100Z buf0 = empty_strided((1, 12, 62, 192), (142848, 11904, 192, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.8544199Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8544351Z triton_fused_view_1_0.run(arg1_1, arg0_1, buf0, 142848, grid=grid(142848), stream=stream0) 2023-01-11T21:38:06.8544426Z del arg0_1 2023-01-11T21:38:06.8544501Z del arg1_1 2023-01-11T21:38:06.8544580Z return (buf0, ) 2023-01-11T21:38:06.8544585Z 2023-01-11T21:38:06.8544590Z 2023-01-11T21:38:06.8544665Z if __name__ == "__main__": 2023-01-11T21:38:06.8544788Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8544916Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8545123Z arg0_1 = rand_strided((64, 64), (64, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8545353Z arg1_1 = rand_strided((2232, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.8545476Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8545481Z 2023-01-11T21:38:06.8545485Z 2023-01-11T21:38:06.8545588Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8545665Z import torch 2023-01-11T21:38:06.8545735Z import random 2023-01-11T21:38:06.8545856Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8545981Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8545986Z 2023-01-11T21:38:06.8546070Z aten = torch.ops.aten 2023-01-11T21:38:06.8546207Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8546308Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8546313Z 2023-01-11T21:38:06.8546388Z import triton 2023-01-11T21:38:06.8546481Z import triton.language as tl 2023-01-11T21:38:06.8546600Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8546745Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8546750Z 2023-01-11T21:38:06.8546755Z 2023-01-11T21:38:06.8546917Z triton_fused_view_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8546994Z import triton 2023-01-11T21:38:06.8547093Z import triton.language as tl 2023-01-11T21:38:06.8547208Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8547314Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8547450Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8547571Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8547576Z 2023-01-11T21:38:06.8547998Z @pointwise(size_hints=[262144], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8548078Z @triton.jit 2023-01-11T21:38:06.8548221Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8548300Z xnumel = 142848 2023-01-11T21:38:06.8548399Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8548529Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8548643Z xmask = xindex < xnumel 2023-01-11T21:38:06.8548717Z x0 = xindex % 192 2023-01-11T21:38:06.8548800Z x1 = (xindex // 192) 2023-01-11T21:38:06.8548873Z x2 = xindex 2023-01-11T21:38:06.8548989Z tmp0 = tl.load(in_ptr0 + ((3*x1) + (x0 // 64)), xmask) 2023-01-11T21:38:06.8549122Z tmp1 = tl.load(in_ptr1 + ((64*tmp0) + (x0 % 64)), xmask).to(tl.float32) 2023-01-11T21:38:06.8549258Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.8549345Z ''') 2023-01-11T21:38:06.8549351Z 
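# Aside, not part of the generated module: triton_fused_view_1_0 is an
# index_select-style gather. in_ptr0 holds the int64 indices (arg1_1 in call()
# below) and in_ptr1 the (64, 64) table (arg0_1); tmp0 selects a row and
# (64*tmp0) + (x0 % 64) addresses one element of it. In this fp16 variant the
# gathered value is upcast with .to(tl.float32) before being stored back
# through the fp16 output pointer.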
2023-01-11T21:38:06.8549355Z 2023-01-11T21:38:06.8549443Z async_compile.wait(globals()) 2023-01-11T21:38:06.8549523Z del async_compile 2023-01-11T21:38:06.8549531Z 2023-01-11T21:38:06.8549609Z def call(args): 2023-01-11T21:38:06.8549692Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8549769Z args.clear() 2023-01-11T21:38:06.8549864Z with torch.cuda.device(0): 2023-01-11T21:38:06.8550101Z buf0 = empty_strided((1, 12, 62, 192), (142848, 11904, 192, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8550189Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8550340Z triton_fused_view_1_0.run(arg1_1, arg0_1, buf0, 142848, grid=grid(142848), stream=stream0) 2023-01-11T21:38:06.8550416Z del arg0_1 2023-01-11T21:38:06.8550490Z del arg1_1 2023-01-11T21:38:06.8550572Z return (buf0, ) 2023-01-11T21:38:06.8550578Z 2023-01-11T21:38:06.8550582Z 2023-01-11T21:38:06.8550664Z if __name__ == "__main__": 2023-01-11T21:38:06.8550787Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8550915Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8551149Z arg0_1 = rand_strided((64, 64), (64, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8551349Z arg1_1 = rand_strided((2232, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.8551472Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8551478Z 2023-01-11T21:38:06.8551552Z ok (0.319s) 2023-01-11T21:38:06.8552084Z test_zero_dim_reductions_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.8552167Z warnings.warn( 2023-01-11T21:38:06.8552431Z [2023-01-11 21:36:24,772] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1019 2023-01-11T21:38:06.8552697Z [2023-01-11 21:36:24,836] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1019 2023-01-11T21:38:06.8552956Z [2023-01-11 21:36:25,089] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1020 2023-01-11T21:38:06.8553211Z [2023-01-11 21:36:25,099] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1020 2023-01-11T21:38:06.8553225Z 2023-01-11T21:38:06.8553322Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8553398Z import torch 2023-01-11T21:38:06.8553475Z import random 2023-01-11T21:38:06.8553595Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8553723Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8553728Z 2023-01-11T21:38:06.8553814Z aten = torch.ops.aten 2023-01-11T21:38:06.8553951Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8554041Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8554046Z 2023-01-11T21:38:06.8554122Z import triton 2023-01-11T21:38:06.8554218Z import triton.language as tl 2023-01-11T21:38:06.8554346Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8554488Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8554494Z 2023-01-11T21:38:06.8554498Z 2023-01-11T21:38:06.8554670Z triton_fused_logical_not_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8554749Z import triton 2023-01-11T21:38:06.8554871Z import triton.language as tl 2023-01-11T21:38:06.8554981Z from torch._inductor.ir import 
ReductionHint 2023-01-11T21:38:06.8555085Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8555220Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8555347Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8555352Z 2023-01-11T21:38:06.8555740Z @pointwise(size_hints=[2], filename=__file__, meta={'signature': {0: '*i1', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.8555816Z @triton.jit 2023-01-11T21:38:06.8555942Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8556019Z xnumel = 2 2023-01-11T21:38:06.8556112Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8556241Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8556327Z xmask = xindex < xnumel 2023-01-11T21:38:06.8556403Z x0 = xindex 2023-01-11T21:38:06.8556481Z tmp0 = False 2023-01-11T21:38:06.8556560Z tmp1 = tmp0 == 0 2023-01-11T21:38:06.8556689Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.8556780Z ''') 2023-01-11T21:38:06.8556785Z 2023-01-11T21:38:06.8556790Z 2023-01-11T21:38:06.8556884Z async_compile.wait(globals()) 2023-01-11T21:38:06.8556966Z del async_compile 2023-01-11T21:38:06.8556971Z 2023-01-11T21:38:06.8557047Z def call(args): 2023-01-11T21:38:06.8557122Z arg0_1, = args 2023-01-11T21:38:06.8557198Z args.clear() 2023-01-11T21:38:06.8557292Z with torch.cuda.device(0): 2023-01-11T21:38:06.8557538Z buf0 = empty_strided((2, 1), (1, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.8557636Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8557777Z triton_fused_logical_not_1_0.run(buf0, 2, grid=grid(2), stream=stream0) 2023-01-11T21:38:06.8557857Z return (buf0, ) 2023-01-11T21:38:06.8557862Z 2023-01-11T21:38:06.8557869Z 2023-01-11T21:38:06.8557952Z if __name__ == "__main__": 2023-01-11T21:38:06.8558074Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8558201Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8558403Z arg0_1 = rand_strided((2, 0), (1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8558510Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8558515Z 2023-01-11T21:38:06.8558527Z 2023-01-11T21:38:06.8558619Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8558694Z import torch 2023-01-11T21:38:06.8558771Z import random 2023-01-11T21:38:06.8558897Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8559024Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8559029Z 2023-01-11T21:38:06.8559113Z aten = torch.ops.aten 2023-01-11T21:38:06.8559251Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8559343Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8559348Z 2023-01-11T21:38:06.8559424Z import triton 2023-01-11T21:38:06.8559519Z import triton.language as tl 2023-01-11T21:38:06.8559646Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8559787Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8559793Z 2023-01-11T21:38:06.8559797Z 2023-01-11T21:38:06.8559968Z triton_fused_logical_not_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8560046Z import triton 2023-01-11T21:38:06.8560140Z import triton.language as tl 2023-01-11T21:38:06.8560249Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8560355Z from 
torch._inductor.ir import TileHint 2023-01-11T21:38:06.8560488Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8560615Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8560620Z 2023-01-11T21:38:06.8561028Z @pointwise(size_hints=[2], filename=__file__, meta={'signature': {0: '*i1', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.8561106Z @triton.jit 2023-01-11T21:38:06.8561232Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8561300Z xnumel = 2 2023-01-11T21:38:06.8561400Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8561530Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8561616Z xmask = xindex < xnumel 2023-01-11T21:38:06.8561690Z x0 = xindex 2023-01-11T21:38:06.8561768Z tmp0 = False 2023-01-11T21:38:06.8561847Z tmp1 = tmp0 == 0 2023-01-11T21:38:06.8561979Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.8562066Z ''') 2023-01-11T21:38:06.8562072Z 2023-01-11T21:38:06.8562076Z 2023-01-11T21:38:06.8562171Z async_compile.wait(globals()) 2023-01-11T21:38:06.8562251Z del async_compile 2023-01-11T21:38:06.8562256Z 2023-01-11T21:38:06.8562336Z def call(args): 2023-01-11T21:38:06.8562411Z arg0_1, = args 2023-01-11T21:38:06.8562487Z args.clear() 2023-01-11T21:38:06.8562582Z with torch.cuda.device(0): 2023-01-11T21:38:06.8562768Z buf0 = empty_strided((2, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.8562862Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8563002Z triton_fused_logical_not_1_0.run(buf0, 2, grid=grid(2), stream=stream0) 2023-01-11T21:38:06.8563083Z return (buf0, ) 2023-01-11T21:38:06.8563088Z 2023-01-11T21:38:06.8563093Z 2023-01-11T21:38:06.8563175Z if __name__ == "__main__": 2023-01-11T21:38:06.8563295Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8563452Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8563654Z arg0_1 = rand_strided((2, 0), (1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8563761Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8563766Z 2023-01-11T21:38:06.8563843Z ok (0.539s) 2023-01-11T21:38:06.8564298Z test_zeros_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8564431Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8564694Z [2023-01-11 21:36:25,360] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1021 2023-01-11T21:38:06.8564963Z [2023-01-11 21:36:27,038] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1021 2023-01-11T21:38:06.8565380Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8565513Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8565770Z [2023-01-11 21:36:27,146] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1022 2023-01-11T21:38:06.8565776Z 2023-01-11T21:38:06.8565877Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8565947Z import torch 2023-01-11T21:38:06.8566024Z import random 2023-01-11T21:38:06.8566145Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8566275Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8566280Z 2023-01-11T21:38:06.8566364Z aten = torch.ops.aten 2023-01-11T21:38:06.8566503Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8566601Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8566606Z 2023-01-11T21:38:06.8566704Z import triton 2023-01-11T21:38:06.8566793Z import triton.language as tl 2023-01-11T21:38:06.8566922Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8567065Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8567070Z 2023-01-11T21:38:06.8567075Z 2023-01-11T21:38:06.8567240Z triton_fused_add_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8567316Z import triton 2023-01-11T21:38:06.8567411Z import triton.language as tl 2023-01-11T21:38:06.8567526Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8567631Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8567762Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8567888Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8567893Z 2023-01-11T21:38:06.8568310Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8568386Z @triton.jit 2023-01-11T21:38:06.8568531Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8568611Z xnumel = 8 2023-01-11T21:38:06.8568712Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8568843Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8568921Z xmask = xindex < xnumel 2023-01-11T21:38:06.8568995Z x0 = xindex 2023-01-11T21:38:06.8569188Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.8569318Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8569393Z tmp1 = 1 2023-01-11T21:38:06.8569474Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8569549Z tmp4 = tmp3 + tmp1 2023-01-11T21:38:06.8569685Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8569824Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8569911Z ''') 2023-01-11T21:38:06.8569917Z 2023-01-11T21:38:06.8569921Z 2023-01-11T21:38:06.8570082Z triton_fused_zeros_1 = async_compile.triton(''' 2023-01-11T21:38:06.8570159Z import triton 2023-01-11T21:38:06.8570254Z import triton.language as tl 2023-01-11T21:38:06.8570370Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8570467Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8570601Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8570728Z from 
torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8570737Z 2023-01-11T21:38:06.8571131Z @pointwise(size_hints=[32768], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8571208Z @triton.jit 2023-01-11T21:38:06.8571334Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8571415Z xnumel = 32768 2023-01-11T21:38:06.8571515Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8571639Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8571725Z xmask = xindex < xnumel 2023-01-11T21:38:06.8571797Z x0 = xindex 2023-01-11T21:38:06.8571870Z tmp0 = 0 2023-01-11T21:38:06.8572008Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.8572095Z ''') 2023-01-11T21:38:06.8572101Z 2023-01-11T21:38:06.8572105Z 2023-01-11T21:38:06.8572244Z kernel_cpp_2 = async_compile.cpp(''' 2023-01-11T21:38:06.8572457Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8572571Z extern "C" void kernel(float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8572638Z { 2023-01-11T21:38:06.8572726Z #pragma GCC ivdep 2023-01-11T21:38:06.8572812Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:06.8572907Z { 2023-01-11T21:38:06.8572980Z { 2023-01-11T21:38:06.8573043Z { 2023-01-11T21:38:06.8573151Z auto tmp0 = static_cast<float>(0); 2023-01-11T21:38:06.8573243Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:06.8573314Z } 2023-01-11T21:38:06.8573383Z } 2023-01-11T21:38:06.8573452Z } 2023-01-11T21:38:06.8573518Z } 2023-01-11T21:38:06.8573598Z ''') 2023-01-11T21:38:06.8573603Z 2023-01-11T21:38:06.8573607Z 2023-01-11T21:38:06.8573767Z triton_fused_full_3 = async_compile.triton(''' 2023-01-11T21:38:06.8573844Z import triton 2023-01-11T21:38:06.8573938Z import triton.language as tl 2023-01-11T21:38:06.8574058Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8574162Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8574295Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8574415Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8574427Z 2023-01-11T21:38:06.8575004Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.8575083Z @triton.jit 2023-01-11T21:38:06.8575208Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8575284Z xnumel = 6 2023-01-11T21:38:06.8575382Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8575512Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8575596Z xmask = xindex < xnumel 2023-01-11T21:38:06.8575663Z x0 = xindex 2023-01-11T21:38:06.8575789Z tmp0 = 3.1416 2023-01-11T21:38:06.8575926Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.8576013Z ''') 2023-01-11T21:38:06.8576019Z 2023-01-11T21:38:06.8576024Z 2023-01-11T21:38:06.8576119Z async_compile.wait(globals()) 2023-01-11T21:38:06.8576199Z del async_compile 2023-01-11T21:38:06.8576206Z 2023-01-11T21:38:06.8576286Z def call(args): 2023-01-11T21:38:06.8576362Z arg0_1, = args 2023-01-11T21:38:06.8576433Z args.clear() 2023-01-11T21:38:06.8576528Z with torch.cuda.device(0):
2023-01-11T21:38:06.8576731Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8576929Z buf4 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8577025Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8577222Z triton_fused_add_add_1_0.run(arg0_1, buf0, buf4, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.8577310Z del arg0_1 2023-01-11T21:38:06.8577559Z buf1 = empty_strided((1, 8, 64, 64), (32768, 4096, 64, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8577697Z triton_fused_zeros_1.run(buf1, 32768, grid=grid(32768), stream=stream0) 2023-01-11T21:38:06.8577920Z buf2 = empty_strided((1, 8, 64, 64), (32768, 4096, 64, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8578052Z triton_fused_zeros_1.run(buf2, 32768, grid=grid(32768), stream=stream0) 2023-01-11T21:38:06.8578247Z buf3 = empty_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8578357Z kernel_cpp_2(c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.8578452Z with torch.cuda.device(0): 2023-01-11T21:38:06.8578655Z buf5 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8578781Z triton_fused_full_3.run(buf5, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.8578895Z return (buf0, buf1, buf2, buf3, buf4, buf5, ) 2023-01-11T21:38:06.8578903Z 2023-01-11T21:38:06.8578909Z 2023-01-11T21:38:06.8578990Z if __name__ == "__main__": 2023-01-11T21:38:06.8579109Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8579237Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8579476Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8579591Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8579597Z 2023-01-11T21:38:06.8579862Z [2023-01-11 21:36:27,234] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1022 2023-01-11T21:38:06.8579868Z 2023-01-11T21:38:06.8579968Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8580038Z import torch 2023-01-11T21:38:06.8580114Z import random 2023-01-11T21:38:06.8580235Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8580359Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8580364Z 2023-01-11T21:38:06.8580451Z aten = torch.ops.aten 2023-01-11T21:38:06.8580588Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8580685Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8580690Z 2023-01-11T21:38:06.8580760Z import triton 2023-01-11T21:38:06.8580853Z import triton.language as tl 2023-01-11T21:38:06.8580983Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8581125Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8581131Z 2023-01-11T21:38:06.8581135Z 2023-01-11T21:38:06.8581301Z triton_fused_add_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8581378Z import triton 2023-01-11T21:38:06.8581472Z import triton.language as tl 2023-01-11T21:38:06.8581587Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8581684Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8581817Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8581943Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8582006Z 2023-01-11T21:38:06.8582426Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: 
'*fp16', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8582510Z @triton.jit 2023-01-11T21:38:06.8582656Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8582732Z xnumel = 8 2023-01-11T21:38:06.8582831Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8582955Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8583045Z xmask = xindex < xnumel 2023-01-11T21:38:06.8583118Z x0 = xindex 2023-01-11T21:38:06.8583335Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.8583456Z tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8583532Z tmp1 = 1 2023-01-11T21:38:06.8583616Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8583699Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.8583782Z tmp5 = tmp4 + tmp1 2023-01-11T21:38:06.8583916Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8584055Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.8584143Z ''') 2023-01-11T21:38:06.8584149Z 2023-01-11T21:38:06.8584153Z 2023-01-11T21:38:06.8584314Z triton_fused_zeros_1 = async_compile.triton(''' 2023-01-11T21:38:06.8584392Z import triton 2023-01-11T21:38:06.8584480Z import triton.language as tl 2023-01-11T21:38:06.8584598Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8584708Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8584843Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8584971Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8584977Z 2023-01-11T21:38:06.8585372Z @pointwise(size_hints=[32768], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8585456Z @triton.jit 2023-01-11T21:38:06.8585597Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8585717Z xnumel = 32768 2023-01-11T21:38:06.8585820Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8585951Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8586038Z xmask = xindex < xnumel 2023-01-11T21:38:06.8586111Z x0 = xindex 2023-01-11T21:38:06.8586183Z tmp0 = 0 2023-01-11T21:38:06.8586317Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.8586397Z ''') 2023-01-11T21:38:06.8586403Z 2023-01-11T21:38:06.8586418Z 2023-01-11T21:38:06.8586549Z kernel_cpp_2 = async_compile.cpp(''' 2023-01-11T21:38:06.8586758Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8586881Z extern "C" void kernel(float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8586953Z { 2023-01-11T21:38:06.8587036Z #pragma GCC ivdep 2023-01-11T21:38:06.8587127Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:06.8587197Z { 2023-01-11T21:38:06.8587262Z { 2023-01-11T21:38:06.8587333Z { 2023-01-11T21:38:06.8587440Z auto tmp0 = static_cast<float>(0); 2023-01-11T21:38:06.8587528Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:06.8587601Z } 2023-01-11T21:38:06.8587669Z } 2023-01-11T21:38:06.8587730Z } 2023-01-11T21:38:06.8587798Z } 2023-01-11T21:38:06.8587883Z ''') 2023-01-11T21:38:06.8587889Z 2023-01-11T21:38:06.8587894Z 2023-01-11T21:38:06.8588053Z
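# Aside, not part of the generated module: this graph produces buffers on two
# devices, so inductor emits kernel_cpp_2 (an OpenMP C++ loop that zeroes the
# cpu tensor buf3) alongside the Triton kernels that fill the CUDA buffers.
# call() below dispatches the C++ kernel through ctypes
# (c_void_p(buf3.data_ptr())) and launches the Triton kernels on stream0.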
triton_fused_full_3 = async_compile.triton(''' 2023-01-11T21:38:06.8588131Z import triton 2023-01-11T21:38:06.8588226Z import triton.language as tl 2023-01-11T21:38:06.8588342Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8588469Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8588608Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8588734Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8588740Z 2023-01-11T21:38:06.8589127Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.8589203Z @triton.jit 2023-01-11T21:38:06.8589329Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8589402Z xnumel = 6 2023-01-11T21:38:06.8589505Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8589627Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8589712Z xmask = xindex < xnumel 2023-01-11T21:38:06.8589785Z x0 = xindex 2023-01-11T21:38:06.8589862Z tmp0 = 3.1416 2023-01-11T21:38:06.8589999Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.8590086Z ''') 2023-01-11T21:38:06.8590091Z 2023-01-11T21:38:06.8590096Z 2023-01-11T21:38:06.8590192Z async_compile.wait(globals()) 2023-01-11T21:38:06.8590272Z del async_compile 2023-01-11T21:38:06.8590277Z 2023-01-11T21:38:06.8590349Z def call(args): 2023-01-11T21:38:06.8590425Z arg0_1, = args 2023-01-11T21:38:06.8590501Z args.clear() 2023-01-11T21:38:06.8590595Z with torch.cuda.device(0): 2023-01-11T21:38:06.8590798Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8590995Z buf4 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8591091Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8591232Z triton_fused_add_add_1_0.run(arg0_1, buf0, buf4, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.8591310Z del arg0_1 2023-01-11T21:38:06.8591535Z buf1 = empty_strided((1, 8, 64, 64), (32768, 4096, 64, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8591676Z triton_fused_zeros_1.run(buf1, 32768, grid=grid(32768), stream=stream0) 2023-01-11T21:38:06.8591897Z buf2 = empty_strided((1, 8, 64, 64), (32768, 4096, 64, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8592055Z triton_fused_zeros_1.run(buf2, 32768, grid=grid(32768), stream=stream0) 2023-01-11T21:38:06.8592253Z buf3 = empty_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8592364Z kernel_cpp_2(c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.8592453Z with torch.cuda.device(0): 2023-01-11T21:38:06.8592652Z buf5 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8592784Z triton_fused_full_3.run(buf5, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.8592894Z return (buf0, buf1, buf2, buf3, buf4, buf5, ) 2023-01-11T21:38:06.8592899Z 2023-01-11T21:38:06.8592907Z 2023-01-11T21:38:06.8592993Z if __name__ == "__main__": 2023-01-11T21:38:06.8593114Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8593241Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8593441Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8593550Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8593555Z 
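All of the Triton kernels dumped above share one pointwise skeleton: compute a per-program block offset, add the lane index, build a bounds mask, then do masked loads, a scalar op, and a masked store. For reference, here is that skeleton as a minimal standalone Triton program. This is a sketch only (kernel and variable names are invented, not inductor's, and it uses plain tl.arange rather than the tl.reshape form above); it assumes a CUDA machine with torch and triton installed.

import torch
import triton
import triton.language as tl

@triton.jit
def add_one_kernel(in_ptr, out_ptr, numel, BLOCK: tl.constexpr):
    # block offset + lane index, as in the generated kernels' xoffset/xindex
    offsets = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < numel  # guards the ragged last block, like xmask
    x = tl.load(in_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + 1, mask=mask)

x = torch.randn(1000, device="cuda")
out = torch.empty_like(x)
# one program per 256-element block, analogous to grid(1000) with an
# XBLOCK picked by the @pointwise autotuner in the generated code
add_one_kernel[(triton.cdiv(1000, 256),)](x, out, 1000, BLOCK=256)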
2023-01-11T21:38:06.8593629Z ok (1.983s) 2023-01-11T21:38:06.8593762Z test_print_pow (__main__.ExprPrinterTests) ... ok (0.003s) 2023-01-11T21:38:06.8594249Z test_cpu_broadcast1_broadcast1 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8594419Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8594683Z [2023-01-11 21:36:27,252] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1023 2023-01-11T21:38:06.8594950Z [2023-01-11 21:36:27,260] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1023 2023-01-11T21:38:06.8594956Z 2023-01-11T21:38:06.8595056Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8595134Z import torch 2023-01-11T21:38:06.8595204Z import random 2023-01-11T21:38:06.8595325Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8595451Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8595456Z 2023-01-11T21:38:06.8595542Z aten = torch.ops.aten 2023-01-11T21:38:06.8595683Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8595781Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8595786Z 2023-01-11T21:38:06.8595864Z import triton 2023-01-11T21:38:06.8595962Z import triton.language as tl 2023-01-11T21:38:06.8596082Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8596223Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8596228Z 2023-01-11T21:38:06.8596233Z 2023-01-11T21:38:06.8596376Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8596585Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8596710Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8596824Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8596930Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8596999Z { 2023-01-11T21:38:06.8597096Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8597165Z { 2023-01-11T21:38:06.8597251Z #pragma omp for 2023-01-11T21:38:06.8597342Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.8597414Z { 2023-01-11T21:38:06.8597587Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.8597728Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.8597814Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8597914Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8598010Z } 2023-01-11T21:38:06.8598112Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8598199Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:06.8598265Z { 2023-01-11T21:38:06.8598352Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8598432Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8598522Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8598606Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8598672Z } 2023-01-11T21:38:06.8598737Z } 2023-01-11T21:38:06.8598801Z } 2023-01-11T21:38:06.8598879Z ''') 2023-01-11T21:38:06.8598894Z 2023-01-11T21:38:06.8598901Z 2023-01-11T21:38:06.8598987Z async_compile.wait(globals()) 2023-01-11T21:38:06.8599064Z del async_compile
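# Aside, not part of the generated module: kernel_cpp_0 above splits the
# 10-element add into one 8-lane at::vec::Vectorized<float> iteration plus a
# scalar tail loop for the remaining two elements. A plain-Python rendering of
# that split, illustrative only:
#
#     def add_split(a, b, out, lanes=8):
#         main = (len(a) // lanes) * lanes
#         for i in range(0, main, lanes):   # vector body: whole 8-wide chunks
#             out[i:i+lanes] = [p + q for p, q in zip(a[i:i+lanes], b[i:i+lanes])]
#         for i in range(main, len(a)):     # scalar tail
#             out[i] = a[i] + b[i]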
2023-01-11T21:38:06.8599069Z 2023-01-11T21:38:06.8599143Z def call(args): 2023-01-11T21:38:06.8599222Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8599298Z args.clear() 2023-01-11T21:38:06.8599497Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8599666Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8599732Z del arg0_1 2023-01-11T21:38:06.8599805Z del arg1_1 2023-01-11T21:38:06.8599881Z return (buf0, ) 2023-01-11T21:38:06.8599887Z 2023-01-11T21:38:06.8599891Z 2023-01-11T21:38:06.8599970Z if __name__ == "__main__": 2023-01-11T21:38:06.8600090Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8600216Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8600412Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8600633Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8600746Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8600752Z 2023-01-11T21:38:06.8600823Z ok (0.020s) 2023-01-11T21:38:06.8601301Z test_cpu_broadcast1_broadcast2 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8601433Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8601692Z [2023-01-11 21:36:27,273] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1024 2023-01-11T21:38:06.8601954Z [2023-01-11 21:36:28,982] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1024 2023-01-11T21:38:06.8601963Z 2023-01-11T21:38:06.8602062Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8602136Z import torch 2023-01-11T21:38:06.8602210Z import random 2023-01-11T21:38:06.8602322Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8602446Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8602451Z 2023-01-11T21:38:06.8602533Z aten = torch.ops.aten 2023-01-11T21:38:06.8602669Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8602765Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8602770Z 2023-01-11T21:38:06.8602844Z import triton 2023-01-11T21:38:06.8602937Z import triton.language as tl 2023-01-11T21:38:06.8603064Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8603197Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8603202Z 2023-01-11T21:38:06.8603209Z 2023-01-11T21:38:06.8603348Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8603553Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8603676Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8603816Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8603921Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8603987Z { 2023-01-11T21:38:06.8604088Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8604147Z { 2023-01-11T21:38:06.8604229Z #pragma omp for 2023-01-11T21:38:06.8604316Z for(long i0=0; i0<10; i0+=1) 
2023-01-11T21:38:06.8604383Z { 2023-01-11T21:38:06.8604471Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.8604539Z { 2023-01-11T21:38:06.8604672Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i1); 2023-01-11T21:38:06.8604806Z auto tmp1 = at::vec::Vectorized<float>(in_ptr1[i0]); 2023-01-11T21:38:06.8604902Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8605012Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8605080Z } 2023-01-11T21:38:06.8605176Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.8605268Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.8605336Z { 2023-01-11T21:38:06.8605442Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.8605539Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8605649Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8605747Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8605816Z } 2023-01-11T21:38:06.8605885Z } 2023-01-11T21:38:06.8605944Z } 2023-01-11T21:38:06.8606008Z } 2023-01-11T21:38:06.8606092Z ''') 2023-01-11T21:38:06.8606098Z 2023-01-11T21:38:06.8606103Z 2023-01-11T21:38:06.8606196Z async_compile.wait(globals()) 2023-01-11T21:38:06.8606300Z del async_compile 2023-01-11T21:38:06.8606306Z 2023-01-11T21:38:06.8606379Z def call(args): 2023-01-11T21:38:06.8606459Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8606534Z args.clear() 2023-01-11T21:38:06.8606735Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8606907Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8606979Z del arg0_1 2023-01-11T21:38:06.8607052Z del arg1_1 2023-01-11T21:38:06.8607127Z return (buf0, ) 2023-01-11T21:38:06.8607132Z 2023-01-11T21:38:06.8607137Z 2023-01-11T21:38:06.8607219Z if __name__ == "__main__": 2023-01-11T21:38:06.8607337Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8607457Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8607652Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8607859Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8607984Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8607989Z 2023-01-11T21:38:06.8608058Z ok (1.724s) 2023-01-11T21:38:06.8608537Z test_cpu_broadcast1_broadcast3 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8608667Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8608924Z [2023-01-11 21:36:28,998] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1025 2023-01-11T21:38:06.8609187Z [2023-01-11 21:36:30,662] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1025 2023-01-11T21:38:06.8609195Z 2023-01-11T21:38:06.8609292Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8609359Z import torch 2023-01-11T21:38:06.8609433Z import random 2023-01-11T21:38:06.8609552Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8609701Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8609707Z 2023-01-11T21:38:06.8609789Z aten = torch.ops.aten 2023-01-11T21:38:06.8609926Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8610020Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8610026Z 2023-01-11T21:38:06.8610099Z import triton 2023-01-11T21:38:06.8610184Z import triton.language as tl 2023-01-11T21:38:06.8610308Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8610447Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8610453Z 2023-01-11T21:38:06.8610457Z 2023-01-11T21:38:06.8610600Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8610806Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8610928Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8611036Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8611142Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8611200Z { 2023-01-11T21:38:06.8611302Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8611368Z { 2023-01-11T21:38:06.8611450Z #pragma omp for 2023-01-11T21:38:06.8611540Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.8611611Z { 2023-01-11T21:38:06.8611743Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.8611868Z auto tmp1 = at::vec::Vectorized(in_ptr1[0]); 2023-01-11T21:38:06.8611957Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8612053Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8612147Z } 2023-01-11T21:38:06.8612246Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8612334Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:06.8612394Z { 2023-01-11T21:38:06.8612482Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8612571Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:06.8612661Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8612746Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8612815Z } 2023-01-11T21:38:06.8612882Z } 2023-01-11T21:38:06.8612940Z } 2023-01-11T21:38:06.8613024Z ''') 2023-01-11T21:38:06.8613030Z 2023-01-11T21:38:06.8613034Z 2023-01-11T21:38:06.8613129Z async_compile.wait(globals()) 2023-01-11T21:38:06.8613205Z del async_compile 2023-01-11T21:38:06.8613210Z 2023-01-11T21:38:06.8613284Z def call(args): 2023-01-11T21:38:06.8613364Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8613439Z args.clear() 2023-01-11T21:38:06.8613634Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8613798Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8613870Z del arg0_1 2023-01-11T21:38:06.8613942Z del arg1_1 2023-01-11T21:38:06.8614017Z return (buf0, ) 
2023-01-11T21:38:06.8614022Z 2023-01-11T21:38:06.8614029Z 2023-01-11T21:38:06.8614110Z if __name__ == "__main__": 2023-01-11T21:38:06.8614228Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8614354Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8614655Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8614849Z arg1_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8614969Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8614975Z 2023-01-11T21:38:06.8615046Z ok (1.678s) 2023-01-11T21:38:06.8615565Z test_cpu_broadcast1_dense (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8615702Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8615960Z [2023-01-11 21:36:30,675] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1026 2023-01-11T21:38:06.8616219Z [2023-01-11 21:36:32,351] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1026 2023-01-11T21:38:06.8616225Z 2023-01-11T21:38:06.8616322Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8616398Z import torch 2023-01-11T21:38:06.8616466Z import random 2023-01-11T21:38:06.8616585Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8616711Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8616716Z 2023-01-11T21:38:06.8616797Z aten = torch.ops.aten 2023-01-11T21:38:06.8616932Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8617026Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8617031Z 2023-01-11T21:38:06.8617109Z import triton 2023-01-11T21:38:06.8617256Z import triton.language as tl 2023-01-11T21:38:06.8617383Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8617523Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8617528Z 2023-01-11T21:38:06.8617533Z 2023-01-11T21:38:06.8617671Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8617877Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8617999Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8618108Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8618255Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8618314Z { 2023-01-11T21:38:06.8618414Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8618480Z { 2023-01-11T21:38:06.8618563Z #pragma omp for 2023-01-11T21:38:06.8618652Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8618718Z { 2023-01-11T21:38:06.8618806Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.8618867Z { 2023-01-11T21:38:06.8619009Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i1); 2023-01-11T21:38:06.8619157Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8619249Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8619359Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8619426Z } 2023-01-11T21:38:06.8619523Z #pragma omp simd simdlen(4) 
2023-01-11T21:38:06.8619609Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.8619676Z { 2023-01-11T21:38:06.8619767Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.8619873Z auto tmp1 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8619964Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8620065Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8620133Z } 2023-01-11T21:38:06.8620193Z } 2023-01-11T21:38:06.8620259Z } 2023-01-11T21:38:06.8620322Z } 2023-01-11T21:38:06.8620408Z ''') 2023-01-11T21:38:06.8620414Z 2023-01-11T21:38:06.8620418Z 2023-01-11T21:38:06.8620513Z async_compile.wait(globals()) 2023-01-11T21:38:06.8620590Z del async_compile 2023-01-11T21:38:06.8620595Z 2023-01-11T21:38:06.8620670Z def call(args): 2023-01-11T21:38:06.8620751Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8620819Z args.clear() 2023-01-11T21:38:06.8621018Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8621189Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8621261Z del arg0_1 2023-01-11T21:38:06.8621332Z del arg1_1 2023-01-11T21:38:06.8621407Z return (buf0, ) 2023-01-11T21:38:06.8621412Z 2023-01-11T21:38:06.8621417Z 2023-01-11T21:38:06.8621526Z if __name__ == "__main__": 2023-01-11T21:38:06.8621638Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8621767Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8621962Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8622159Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8622281Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8622286Z 2023-01-11T21:38:06.8622356Z ok (1.689s) 2023-01-11T21:38:06.8622837Z test_cpu_broadcast1_double (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8622972Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8623233Z [2023-01-11 21:36:32,365] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1027 2023-01-11T21:38:06.8623488Z [2023-01-11 21:36:34,024] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1027 2023-01-11T21:38:06.8623502Z 2023-01-11T21:38:06.8623593Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8623666Z import torch 2023-01-11T21:38:06.8623740Z import random 2023-01-11T21:38:06.8623859Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8624019Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8624024Z 2023-01-11T21:38:06.8624106Z aten = torch.ops.aten 2023-01-11T21:38:06.8624243Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8624331Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8624337Z 2023-01-11T21:38:06.8624411Z import triton 2023-01-11T21:38:06.8624505Z import triton.language as tl 2023-01-11T21:38:06.8624632Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8624775Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8624780Z 2023-01-11T21:38:06.8624785Z 2023-01-11T21:38:06.8624922Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8625130Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8625252Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8625358Z const double* __restrict__ in_ptr1, 2023-01-11T21:38:06.8625467Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8625532Z { 2023-01-11T21:38:06.8625633Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8625699Z { 2023-01-11T21:38:06.8625794Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8625881Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8625944Z { 2023-01-11T21:38:06.8626034Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8626102Z { 2023-01-11T21:38:06.8626170Z { 2023-01-11T21:38:06.8626241Z { 2023-01-11T21:38:06.8626342Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.8626451Z auto tmp2 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8626562Z auto tmp1 = static_cast<double>(tmp0); 2023-01-11T21:38:06.8626660Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8626761Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8626834Z } 2023-01-11T21:38:06.8626903Z } 2023-01-11T21:38:06.8626971Z } 2023-01-11T21:38:06.8627037Z } 2023-01-11T21:38:06.8627096Z } 2023-01-11T21:38:06.8627160Z } 2023-01-11T21:38:06.8627245Z ''') 2023-01-11T21:38:06.8627250Z 2023-01-11T21:38:06.8627255Z 2023-01-11T21:38:06.8627377Z async_compile.wait(globals()) 2023-01-11T21:38:06.8627457Z del async_compile 2023-01-11T21:38:06.8627462Z 2023-01-11T21:38:06.8627536Z def call(args): 2023-01-11T21:38:06.8627615Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8627683Z args.clear() 2023-01-11T21:38:06.8627881Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8628048Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8628121Z del arg0_1 2023-01-11T21:38:06.8628192Z del arg1_1 2023-01-11T21:38:06.8628268Z return (buf0, ) 2023-01-11T21:38:06.8628277Z 2023-01-11T21:38:06.8628282Z 2023-01-11T21:38:06.8628363Z if __name__ ==
"__main__": 2023-01-11T21:38:06.8628482Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8628601Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8628795Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8628995Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8629113Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8629119Z 2023-01-11T21:38:06.8629190Z ok (1.673s) 2023-01-11T21:38:06.8629663Z test_cpu_broadcast1_int (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8629822Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8630084Z [2023-01-11 21:36:34,038] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1028 2023-01-11T21:38:06.8630349Z [2023-01-11 21:36:35,696] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1028 2023-01-11T21:38:06.8630355Z 2023-01-11T21:38:06.8630447Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8630521Z import torch 2023-01-11T21:38:06.8630595Z import random 2023-01-11T21:38:06.8630713Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8630836Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8630841Z 2023-01-11T21:38:06.8630926Z aten = torch.ops.aten 2023-01-11T21:38:06.8631062Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8631159Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8631167Z 2023-01-11T21:38:06.8631234Z import triton 2023-01-11T21:38:06.8631326Z import triton.language as tl 2023-01-11T21:38:06.8631449Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8631589Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8631595Z 2023-01-11T21:38:06.8631599Z 2023-01-11T21:38:06.8631741Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8631946Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8632069Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8632177Z const int* __restrict__ in_ptr1, 2023-01-11T21:38:06.8632274Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8632339Z { 2023-01-11T21:38:06.8632440Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8632507Z { 2023-01-11T21:38:06.8632590Z #pragma omp for 2023-01-11T21:38:06.8632676Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8632739Z { 2023-01-11T21:38:06.8632807Z { 2023-01-11T21:38:06.8632875Z { 2023-01-11T21:38:06.8632973Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8633070Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8633213Z auto tmp2 = static_cast<float>(tmp1); 2023-01-11T21:38:06.8633310Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8633393Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.8633462Z } 2023-01-11T21:38:06.8633531Z } 2023-01-11T21:38:06.8633598Z } 2023-01-11T21:38:06.8633663Z } 2023-01-11T21:38:06.8633727Z } 2023-01-11T21:38:06.8633811Z ''') 2023-01-11T21:38:06.8633817Z 2023-01-11T21:38:06.8633821Z
2023-01-11T21:38:06.8633906Z async_compile.wait(globals()) 2023-01-11T21:38:06.8633984Z del async_compile 2023-01-11T21:38:06.8633989Z 2023-01-11T21:38:06.8634063Z def call(args): 2023-01-11T21:38:06.8634146Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8634220Z args.clear() 2023-01-11T21:38:06.8634415Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8634582Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8634655Z del arg0_1 2023-01-11T21:38:06.8634722Z del arg1_1 2023-01-11T21:38:06.8634798Z return (buf0, ) 2023-01-11T21:38:06.8634803Z 2023-01-11T21:38:06.8634807Z 2023-01-11T21:38:06.8634888Z if __name__ == "__main__": 2023-01-11T21:38:06.8635005Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8635132Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8635357Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8635562Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8635674Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8635723Z 2023-01-11T21:38:06.8635790Z ok (1.671s) 2023-01-11T21:38:06.8636267Z test_cpu_broadcast1_strided (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8636399Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8636662Z [2023-01-11 21:36:35,710] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1029 2023-01-11T21:38:06.8636928Z [2023-01-11 21:36:37,377] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1029 2023-01-11T21:38:06.8636934Z 2023-01-11T21:38:06.8637031Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8637108Z import torch 2023-01-11T21:38:06.8637183Z import random 2023-01-11T21:38:06.8637302Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8637418Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8637423Z 2023-01-11T21:38:06.8637504Z aten = torch.ops.aten 2023-01-11T21:38:06.8637642Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8637739Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8637744Z 2023-01-11T21:38:06.8637818Z import triton 2023-01-11T21:38:06.8637910Z import triton.language as tl 2023-01-11T21:38:06.8638034Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8638166Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8638181Z 2023-01-11T21:38:06.8638185Z 2023-01-11T21:38:06.8638314Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8638521Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8638646Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8638755Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8638858Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8638926Z { 2023-01-11T21:38:06.8639028Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8639113Z { 
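// Annotation (editor's sketch, not emitted by Inductor): the collapse(2)
// below fuses the i0/i1 loops into a single 100-iteration space so all 8
// OpenMP threads get work even though each loop is only 10 long; the read
// in_ptr1[(2*i1) + (30*i0)] walks the (30, 2)-strided arg1_1 directly,
// with no intermediate contiguous copy.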
2023-01-11T21:38:06.8639210Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8639296Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8639364Z { 2023-01-11T21:38:06.8639455Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8639524Z { 2023-01-11T21:38:06.8639594Z { 2023-01-11T21:38:06.8639658Z { 2023-01-11T21:38:06.8639758Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.8639869Z auto tmp1 = in_ptr1[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8639969Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8640074Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8640144Z } 2023-01-11T21:38:06.8640213Z } 2023-01-11T21:38:06.8640273Z } 2023-01-11T21:38:06.8640340Z } 2023-01-11T21:38:06.8640406Z } 2023-01-11T21:38:06.8640469Z } 2023-01-11T21:38:06.8640555Z ''') 2023-01-11T21:38:06.8640563Z 2023-01-11T21:38:06.8640568Z 2023-01-11T21:38:06.8640660Z async_compile.wait(globals()) 2023-01-11T21:38:06.8640738Z del async_compile 2023-01-11T21:38:06.8640744Z 2023-01-11T21:38:06.8640811Z def call(args): 2023-01-11T21:38:06.8640889Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8640964Z args.clear() 2023-01-11T21:38:06.8641165Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8641333Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8641407Z del arg0_1 2023-01-11T21:38:06.8641478Z del arg1_1 2023-01-11T21:38:06.8649722Z return (buf0, ) 2023-01-11T21:38:06.8649733Z 2023-01-11T21:38:06.8649737Z 2023-01-11T21:38:06.8649838Z if __name__ == "__main__": 2023-01-11T21:38:06.8649962Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8650084Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8650311Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8650511Z arg1_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8650631Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8650637Z 2023-01-11T21:38:06.8650704Z ok (1.681s) 2023-01-11T21:38:06.8651193Z test_cpu_broadcast1_transposed (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8651329Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8651583Z [2023-01-11 21:36:37,391] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1030 2023-01-11T21:38:06.8651850Z [2023-01-11 21:36:39,058] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1030 2023-01-11T21:38:06.8651856Z 2023-01-11T21:38:06.8651954Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8652031Z import torch 2023-01-11T21:38:06.8652109Z import random 2023-01-11T21:38:06.8652228Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8652353Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8652358Z 2023-01-11T21:38:06.8652441Z aten = torch.ops.aten 2023-01-11T21:38:06.8652570Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8652670Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8652675Z 2023-01-11T21:38:06.8652751Z import triton 2023-01-11T21:38:06.8652843Z import triton.language as tl 2023-01-11T21:38:06.8652968Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8653108Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8653172Z 2023-01-11T21:38:06.8653178Z 2023-01-11T21:38:06.8653316Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8653524Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8653642Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8653751Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8653855Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8653922Z { 2023-01-11T21:38:06.8654025Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8654091Z { 2023-01-11T21:38:06.8654176Z #pragma omp for 2023-01-11T21:38:06.8654257Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8654325Z { 2023-01-11T21:38:06.8654413Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.8654613Z { 2023-01-11T21:38:06.8654758Z auto tmp0 = at::vec::Vectorized<float>(in_ptr0[i0]); 2023-01-11T21:38:06.8654910Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8655006Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8655110Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8655181Z } 2023-01-11T21:38:06.8655278Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.8655368Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.8655436Z { 2023-01-11T21:38:06.8655527Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8655629Z auto tmp1 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8655770Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8655868Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8655937Z } 2023-01-11T21:38:06.8656005Z } 2023-01-11T21:38:06.8656073Z } 2023-01-11T21:38:06.8656139Z } 2023-01-11T21:38:06.8656227Z ''') 2023-01-11T21:38:06.8656233Z 2023-01-11T21:38:06.8656240Z 2023-01-11T21:38:06.8656326Z async_compile.wait(globals()) 2023-01-11T21:38:06.8656407Z del async_compile 2023-01-11T21:38:06.8656413Z 2023-01-11T21:38:06.8656487Z def call(args): 2023-01-11T21:38:06.8656567Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8656642Z args.clear() 2023-01-11T21:38:06.8656845Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8657010Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()),
c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8657085Z del arg0_1 2023-01-11T21:38:06.8657204Z del arg1_1 2023-01-11T21:38:06.8657294Z return (buf0, ) 2023-01-11T21:38:06.8657304Z 2023-01-11T21:38:06.8657309Z 2023-01-11T21:38:06.8657391Z if __name__ == "__main__": 2023-01-11T21:38:06.8657509Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8657635Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8657836Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8658032Z arg1_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8658144Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8658157Z 2023-01-11T21:38:06.8658220Z ok (1.681s) 2023-01-11T21:38:06.8658698Z test_cpu_broadcast2_broadcast1 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8658834Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8659096Z [2023-01-11 21:36:39,074] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1031 2023-01-11T21:38:06.8659433Z [2023-01-11 21:36:40,735] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1031 2023-01-11T21:38:06.8659441Z 2023-01-11T21:38:06.8659542Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8659616Z import torch 2023-01-11T21:38:06.8659693Z import random 2023-01-11T21:38:06.8659812Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8659933Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8659938Z 2023-01-11T21:38:06.8660022Z aten = torch.ops.aten 2023-01-11T21:38:06.8660157Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8660253Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8660262Z 2023-01-11T21:38:06.8660337Z import triton 2023-01-11T21:38:06.8660429Z import triton.language as tl 2023-01-11T21:38:06.8660554Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8660685Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8660699Z 2023-01-11T21:38:06.8660706Z 2023-01-11T21:38:06.8660837Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8661043Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8661167Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8661276Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8661384Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8661450Z { 2023-01-11T21:38:06.8661554Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8661613Z { 2023-01-11T21:38:06.8661698Z #pragma omp for 2023-01-11T21:38:06.8661816Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8661883Z { 2023-01-11T21:38:06.8661973Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.8662041Z { 2023-01-11T21:38:06.8662175Z auto tmp0 = at::vec::Vectorized<float>(in_ptr0[i0]); 2023-01-11T21:38:06.8662309Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i1); 2023-01-11T21:38:06.8662404Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.8662512Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8662581Z } 2023-01-11T21:38:06.8662676Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.8662767Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.8662835Z { 2023-01-11T21:38:06.8662918Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8663009Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.8663100Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8663203Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8663271Z } 2023-01-11T21:38:06.8663339Z } 2023-01-11T21:38:06.8663405Z } 2023-01-11T21:38:06.8663462Z } 2023-01-11T21:38:06.8663548Z ''') 2023-01-11T21:38:06.8663554Z 2023-01-11T21:38:06.8663558Z 2023-01-11T21:38:06.8663657Z async_compile.wait(globals()) 2023-01-11T21:38:06.8663736Z del async_compile 2023-01-11T21:38:06.8663742Z 2023-01-11T21:38:06.8663818Z def call(args): 2023-01-11T21:38:06.8663898Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8663974Z args.clear() 2023-01-11T21:38:06.8664177Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8664347Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8664422Z del arg0_1 2023-01-11T21:38:06.8664493Z del arg1_1 2023-01-11T21:38:06.8664571Z return (buf0, ) 2023-01-11T21:38:06.8664576Z 2023-01-11T21:38:06.8664583Z 2023-01-11T21:38:06.8664665Z if __name__ == "__main__": 2023-01-11T21:38:06.8664783Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8664911Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8665109Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8665329Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8665449Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8665455Z 2023-01-11T21:38:06.8665528Z ok (1.676s) 2023-01-11T21:38:06.8666052Z test_cpu_broadcast2_broadcast2 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8666189Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8666451Z [2023-01-11 21:36:40,748] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1032 2023-01-11T21:38:06.8666716Z [2023-01-11 21:36:40,756] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1032 2023-01-11T21:38:06.8666722Z 2023-01-11T21:38:06.8666823Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8666899Z import torch 2023-01-11T21:38:06.8666966Z import random 2023-01-11T21:38:06.8667085Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8667209Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8667214Z 2023-01-11T21:38:06.8667297Z aten = torch.ops.aten 2023-01-11T21:38:06.8667434Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8667530Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8667536Z 2023-01-11T21:38:06.8667643Z import triton 2023-01-11T21:38:06.8667729Z import triton.language as tl 2023-01-11T21:38:06.8667854Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8667995Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8668001Z 2023-01-11T21:38:06.8668005Z 2023-01-11T21:38:06.8668146Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8668353Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8668476Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8668592Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8668697Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8668756Z { 2023-01-11T21:38:06.8668856Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8668928Z { 2023-01-11T21:38:06.8669014Z #pragma omp for 2023-01-11T21:38:06.8669102Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.8669175Z { 2023-01-11T21:38:06.8669315Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.8669444Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.8669535Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8669631Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8669699Z } 2023-01-11T21:38:06.8669798Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8669887Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:06.8669956Z { 2023-01-11T21:38:06.8670037Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8670125Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8670213Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8670301Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8670368Z } 2023-01-11T21:38:06.8670434Z } 2023-01-11T21:38:06.8670500Z } 2023-01-11T21:38:06.8670577Z ''') 2023-01-11T21:38:06.8670585Z 2023-01-11T21:38:06.8670590Z 2023-01-11T21:38:06.8670688Z async_compile.wait(globals()) 2023-01-11T21:38:06.8670765Z del async_compile 2023-01-11T21:38:06.8670771Z 2023-01-11T21:38:06.8670848Z def call(args): 2023-01-11T21:38:06.8670928Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8671006Z args.clear() 2023-01-11T21:38:06.8671241Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8671405Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8671484Z del arg0_1 2023-01-11T21:38:06.8671558Z del arg1_1
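# Annotation (editor's note, not part of the generated wrapper): call() takes
# ownership of its inputs -- args.clear() plus the del statements above drop
# the last Python references as soon as kernel_cpp_0 has consumed them, so
# input storage can be freed before the output is returned.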
2023-01-11T21:38:06.8671636Z return (buf0, ) 2023-01-11T21:38:06.8671641Z 2023-01-11T21:38:06.8671646Z 2023-01-11T21:38:06.8671732Z if __name__ == "__main__": 2023-01-11T21:38:06.8671855Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8671986Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8672197Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8672394Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8672517Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8672522Z 2023-01-11T21:38:06.8672596Z ok (0.021s) 2023-01-11T21:38:06.8673078Z test_cpu_broadcast2_broadcast3 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8673212Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8673474Z [2023-01-11 21:36:40,770] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1033 2023-01-11T21:38:06.8673765Z [2023-01-11 21:36:40,777] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1033 2023-01-11T21:38:06.8673771Z 2023-01-11T21:38:06.8673869Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8673945Z import torch 2023-01-11T21:38:06.8674013Z import random 2023-01-11T21:38:06.8674135Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8674259Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8674264Z 2023-01-11T21:38:06.8674350Z aten = torch.ops.aten 2023-01-11T21:38:06.8674485Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8674581Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8674586Z 2023-01-11T21:38:06.8674661Z import triton 2023-01-11T21:38:06.8674754Z import triton.language as tl 2023-01-11T21:38:06.8674873Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8675014Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8675022Z 2023-01-11T21:38:06.8675026Z 2023-01-11T21:38:06.8675164Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8675369Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8675496Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8675610Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8675715Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8675782Z { 2023-01-11T21:38:06.8675878Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8675944Z { 2023-01-11T21:38:06.8676026Z #pragma omp for 2023-01-11T21:38:06.8676115Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.8676182Z { 2023-01-11T21:38:06.8676321Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.8676447Z auto tmp1 = at::vec::Vectorized<float>(in_ptr1[0]); 2023-01-11T21:38:06.8676532Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8676627Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8676695Z } 2023-01-11T21:38:06.8676796Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8676885Z for(long i0=8; i0<10; i0+=1)
2023-01-11T21:38:06.8676953Z { 2023-01-11T21:38:06.8677068Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8677150Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:06.8677237Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8677324Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8677393Z } 2023-01-11T21:38:06.8677461Z } 2023-01-11T21:38:06.8677527Z } 2023-01-11T21:38:06.8677606Z ''') 2023-01-11T21:38:06.8677621Z 2023-01-11T21:38:06.8677625Z 2023-01-11T21:38:06.8677713Z async_compile.wait(globals()) 2023-01-11T21:38:06.8677793Z del async_compile 2023-01-11T21:38:06.8677798Z 2023-01-11T21:38:06.8677873Z def call(args): 2023-01-11T21:38:06.8677953Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8678034Z args.clear() 2023-01-11T21:38:06.8678238Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8678405Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8678470Z del arg0_1 2023-01-11T21:38:06.8678546Z del arg1_1 2023-01-11T21:38:06.8678622Z return (buf0, ) 2023-01-11T21:38:06.8678628Z 2023-01-11T21:38:06.8678632Z 2023-01-11T21:38:06.8678713Z if __name__ == "__main__": 2023-01-11T21:38:06.8678831Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8678959Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8679165Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8679356Z arg1_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8679469Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8679503Z 2023-01-11T21:38:06.8679574Z ok (0.021s) 2023-01-11T21:38:06.8680051Z test_cpu_broadcast2_dense (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8680186Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8680450Z [2023-01-11 21:36:40,790] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1034 2023-01-11T21:38:06.8680712Z [2023-01-11 21:36:40,799] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1034 2023-01-11T21:38:06.8680718Z 2023-01-11T21:38:06.8680816Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8680894Z import torch 2023-01-11T21:38:06.8680968Z import random 2023-01-11T21:38:06.8681079Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8681203Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8681208Z 2023-01-11T21:38:06.8681292Z aten = torch.ops.aten 2023-01-11T21:38:06.8681430Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8681526Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8681531Z 2023-01-11T21:38:06.8681605Z import triton 2023-01-11T21:38:06.8681699Z import triton.language as tl 2023-01-11T21:38:06.8681822Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8681954Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8681961Z 2023-01-11T21:38:06.8681972Z 2023-01-11T21:38:06.8682102Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8682308Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8682436Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8682547Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8682651Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8682718Z { 2023-01-11T21:38:06.8682820Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8682905Z { 2023-01-11T21:38:06.8682988Z #pragma omp for 2023-01-11T21:38:06.8683076Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8683143Z { 2023-01-11T21:38:06.8683231Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.8683299Z { 2023-01-11T21:38:06.8683425Z auto tmp0 = at::vec::Vectorized<float>(in_ptr0[i0]); 2023-01-11T21:38:06.8683576Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8683669Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8683778Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8683853Z } 2023-01-11T21:38:06.8683951Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.8684039Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.8684107Z { 2023-01-11T21:38:06.8684190Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8684297Z auto tmp1 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8684388Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8684487Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8684555Z } 2023-01-11T21:38:06.8684621Z } 2023-01-11T21:38:06.8684688Z } 2023-01-11T21:38:06.8684745Z } 2023-01-11T21:38:06.8684830Z ''') 2023-01-11T21:38:06.8684837Z 2023-01-11T21:38:06.8684841Z 2023-01-11T21:38:06.8684934Z async_compile.wait(globals()) 2023-01-11T21:38:06.8685012Z del async_compile 2023-01-11T21:38:06.8685017Z 2023-01-11T21:38:06.8685093Z def call(args): 2023-01-11T21:38:06.8685173Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8685290Z args.clear() 2023-01-11T21:38:06.8685490Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8685683Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()),
c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8685769Z del arg0_1 2023-01-11T21:38:06.8685851Z del arg1_1 2023-01-11T21:38:06.8685931Z return (buf0, ) 2023-01-11T21:38:06.8685937Z 2023-01-11T21:38:06.8685941Z 2023-01-11T21:38:06.8686028Z if __name__ == "__main__": 2023-01-11T21:38:06.8686148Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8686279Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8686477Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8686675Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8686796Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8686801Z 2023-01-11T21:38:06.8686876Z ok (0.021s) 2023-01-11T21:38:06.8687356Z test_cpu_broadcast2_double (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8687489Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8687749Z [2023-01-11 21:36:40,812] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1035 2023-01-11T21:38:06.8688016Z [2023-01-11 21:36:42,465] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1035 2023-01-11T21:38:06.8688022Z 2023-01-11T21:38:06.8688120Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8688187Z import torch 2023-01-11T21:38:06.8688264Z import random 2023-01-11T21:38:06.8688383Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8688508Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8688513Z 2023-01-11T21:38:06.8688595Z aten = torch.ops.aten 2023-01-11T21:38:06.8688733Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8688855Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8688861Z 2023-01-11T21:38:06.8688938Z import triton 2023-01-11T21:38:06.8689024Z import triton.language as tl 2023-01-11T21:38:06.8689148Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8689288Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8689293Z 2023-01-11T21:38:06.8689298Z 2023-01-11T21:38:06.8689442Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8689648Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8689772Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8689888Z const double* __restrict__ in_ptr1, 2023-01-11T21:38:06.8689995Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8690053Z { 2023-01-11T21:38:06.8690154Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8690221Z { 2023-01-11T21:38:06.8690321Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8690409Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8690476Z { 2023-01-11T21:38:06.8690560Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8690628Z { 2023-01-11T21:38:06.8690698Z { 2023-01-11T21:38:06.8690771Z { 2023-01-11T21:38:06.8690873Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8690980Z auto tmp2 = in_ptr1[i1 + (10*i0)]; 
2023-01-11T21:38:06.8691098Z auto tmp1 = static_cast<double>(tmp0); 2023-01-11T21:38:06.8691189Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8691319Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8691390Z } 2023-01-11T21:38:06.8691459Z } 2023-01-11T21:38:06.8691528Z } 2023-01-11T21:38:06.8691596Z } 2023-01-11T21:38:06.8691664Z } 2023-01-11T21:38:06.8691724Z } 2023-01-11T21:38:06.8691808Z ''') 2023-01-11T21:38:06.8691814Z 2023-01-11T21:38:06.8691818Z 2023-01-11T21:38:06.8691911Z async_compile.wait(globals()) 2023-01-11T21:38:06.8691990Z del async_compile 2023-01-11T21:38:06.8691995Z 2023-01-11T21:38:06.8692073Z def call(args): 2023-01-11T21:38:06.8692154Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8692228Z args.clear() 2023-01-11T21:38:06.8692430Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8692601Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8692677Z del arg0_1 2023-01-11T21:38:06.8692749Z del arg1_1 2023-01-11T21:38:06.8692825Z return (buf0, ) 2023-01-11T21:38:06.8692830Z 2023-01-11T21:38:06.8692835Z 2023-01-11T21:38:06.8692915Z if __name__ == "__main__": 2023-01-11T21:38:06.8693039Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8693166Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8693365Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8693571Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8693689Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8693695Z 2023-01-11T21:38:06.8693766Z ok (1.667s) 2023-01-11T21:38:06.8694238Z test_cpu_broadcast2_int (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8694374Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8694787Z [2023-01-11 21:36:42,479] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1036 2023-01-11T21:38:06.8695052Z [2023-01-11 21:36:44,140] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1036 2023-01-11T21:38:06.8695058Z 2023-01-11T21:38:06.8695156Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8695231Z import torch 2023-01-11T21:38:06.8695298Z import random 2023-01-11T21:38:06.8695421Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8695544Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8695550Z 2023-01-11T21:38:06.8695630Z aten = torch.ops.aten 2023-01-11T21:38:06.8695769Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8695866Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8695871Z 2023-01-11T21:38:06.8695944Z import triton 2023-01-11T21:38:06.8696029Z import triton.language as tl 2023-01-11T21:38:06.8696153Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8696294Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8696300Z 2023-01-11T21:38:06.8696304Z 2023-01-11T21:38:06.8696443Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8696646Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8696768Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8696877Z const int* __restrict__ in_ptr1, 2023-01-11T21:38:06.8696981Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8697040Z { 2023-01-11T21:38:06.8697189Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8697311Z { 2023-01-11T21:38:06.8697409Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8697495Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8697562Z { 2023-01-11T21:38:06.8697654Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8697714Z { 2023-01-11T21:38:06.8697787Z { 2023-01-11T21:38:06.8697862Z { 2023-01-11T21:38:06.8697962Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8698062Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.8698178Z auto tmp2 = static_cast<float>(tmp1); 2023-01-11T21:38:06.8698276Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8698369Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8698440Z } 2023-01-11T21:38:06.8698508Z } 2023-01-11T21:38:06.8698575Z } 2023-01-11T21:38:06.8698642Z } 2023-01-11T21:38:06.8698711Z } 2023-01-11T21:38:06.8698776Z } 2023-01-11T21:38:06.8698855Z ''') 2023-01-11T21:38:06.8698860Z 2023-01-11T21:38:06.8698864Z 2023-01-11T21:38:06.8698959Z async_compile.wait(globals()) 2023-01-11T21:38:06.8699034Z del async_compile 2023-01-11T21:38:06.8699039Z 2023-01-11T21:38:06.8699114Z def call(args): 2023-01-11T21:38:06.8699198Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8699274Z args.clear() 2023-01-11T21:38:06.8699484Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8699644Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8699717Z del arg0_1 2023-01-11T21:38:06.8699790Z del arg1_1 2023-01-11T21:38:06.8699866Z return (buf0, ) 2023-01-11T21:38:06.8699871Z 2023-01-11T21:38:06.8699876Z 2023-01-11T21:38:06.8699958Z if __name__ == "__main__":
2023-01-11T21:38:06.8700076Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8700207Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8700413Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8700596Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8700743Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8700749Z 2023-01-11T21:38:06.8700824Z ok (1.675s) 2023-01-11T21:38:06.8701300Z test_cpu_broadcast2_strided (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8701431Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8701696Z [2023-01-11 21:36:44,154] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1037 2023-01-11T21:38:06.8701960Z [2023-01-11 21:36:45,813] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1037 2023-01-11T21:38:06.8701966Z 2023-01-11T21:38:06.8702064Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8702140Z import torch 2023-01-11T21:38:06.8702208Z import random 2023-01-11T21:38:06.8702328Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8702452Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8702457Z 2023-01-11T21:38:06.8702538Z aten = torch.ops.aten 2023-01-11T21:38:06.8702674Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8702771Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8702776Z 2023-01-11T21:38:06.8702850Z import triton 2023-01-11T21:38:06.8702943Z import triton.language as tl 2023-01-11T21:38:06.8703061Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8703229Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8703235Z 2023-01-11T21:38:06.8703240Z 2023-01-11T21:38:06.8703378Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8703586Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8703710Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8703819Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8703921Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8703986Z { 2023-01-11T21:38:06.8704080Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8704146Z { 2023-01-11T21:38:06.8704241Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8704328Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8704395Z { 2023-01-11T21:38:06.8704487Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8704551Z { 2023-01-11T21:38:06.8704620Z { 2023-01-11T21:38:06.8704690Z { 2023-01-11T21:38:06.8704791Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8704903Z auto tmp1 = in_ptr1[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8705004Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8705107Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8705171Z } 2023-01-11T21:38:06.8705240Z } 2023-01-11T21:38:06.8705307Z } 2023-01-11T21:38:06.8705377Z } 2023-01-11T21:38:06.8705447Z } 2023-01-11T21:38:06.8705518Z } 
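// Annotation (editor's note, not part of the generated kernel): the flat
// offsets are dot(strides, indices) -- in_ptr1[(2*i1) + (30*i0)] matches the
// (30, 2) strides of arg1_1 built below via rand_strided((10, 10), (30, 2)),
// while the contiguous output uses i1 + (10*i0) for strides (10, 1).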
2023-01-11T21:38:06.8705624Z ''') 2023-01-11T21:38:06.8705630Z 2023-01-11T21:38:06.8705634Z 2023-01-11T21:38:06.8705730Z async_compile.wait(globals()) 2023-01-11T21:38:06.8705822Z del async_compile 2023-01-11T21:38:06.8705828Z 2023-01-11T21:38:06.8705903Z def call(args): 2023-01-11T21:38:06.8705983Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8706062Z args.clear() 2023-01-11T21:38:06.8706267Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8706435Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8706509Z del arg0_1 2023-01-11T21:38:06.8706607Z del arg1_1 2023-01-11T21:38:06.8706685Z return (buf0, ) 2023-01-11T21:38:06.8706690Z 2023-01-11T21:38:06.8706694Z 2023-01-11T21:38:06.8706775Z if __name__ == "__main__": 2023-01-11T21:38:06.8706895Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8707020Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8707226Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8707425Z arg1_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8707536Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8707551Z 2023-01-11T21:38:06.8707615Z ok (1.674s) 2023-01-11T21:38:06.8708100Z test_cpu_broadcast2_transposed (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8708233Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8708492Z [2023-01-11 21:36:45,827] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1038 2023-01-11T21:38:06.8708757Z [2023-01-11 21:36:45,836] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1038 2023-01-11T21:38:06.8708763Z 2023-01-11T21:38:06.8708861Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8708962Z import torch 2023-01-11T21:38:06.8709037Z import random 2023-01-11T21:38:06.8709155Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8709271Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8709277Z 2023-01-11T21:38:06.8709358Z aten = torch.ops.aten 2023-01-11T21:38:06.8709499Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8709595Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8709600Z 2023-01-11T21:38:06.8709675Z import triton 2023-01-11T21:38:06.8709767Z import triton.language as tl 2023-01-11T21:38:06.8709891Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8710023Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8710035Z 2023-01-11T21:38:06.8710040Z 2023-01-11T21:38:06.8710169Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8710375Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8710504Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8710613Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8710717Z float* __restrict__ out_ptr0) 
2023-01-11T21:38:06.8710784Z {
2023-01-11T21:38:06.8710885Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.8710947Z {
2023-01-11T21:38:06.8711030Z #pragma omp for
2023-01-11T21:38:06.8711116Z for(long i0=0; i0<10; i0+=1)
2023-01-11T21:38:06.8711184Z {
2023-01-11T21:38:06.8711273Z for(long i1=0; i1<1; i1+=1)
2023-01-11T21:38:06.8711341Z {
2023-01-11T21:38:06.8711483Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i1);
2023-01-11T21:38:06.8711622Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i1) + (10*i0));
2023-01-11T21:38:06.8711715Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.8711826Z tmp2.store(out_ptr0 + (8*i1) + (10*i0));
2023-01-11T21:38:06.8711896Z }
2023-01-11T21:38:06.8711993Z #pragma omp simd simdlen(4)
2023-01-11T21:38:06.8712083Z for(long i1=8; i1<10; i1+=1)
2023-01-11T21:38:06.8712151Z {
2023-01-11T21:38:06.8712234Z auto tmp0 = in_ptr0[i1];
2023-01-11T21:38:06.8712368Z auto tmp1 = in_ptr1[i1 + (10*i0)];
2023-01-11T21:38:06.8712461Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.8712564Z out_ptr0[i1 + (10*i0)] = tmp2;
2023-01-11T21:38:06.8712634Z }
2023-01-11T21:38:06.8712703Z }
2023-01-11T21:38:06.8712768Z }
2023-01-11T21:38:06.8712826Z }
2023-01-11T21:38:06.8712914Z ''')
2023-01-11T21:38:06.8712920Z
2023-01-11T21:38:06.8712925Z
2023-01-11T21:38:06.8713019Z async_compile.wait(globals())
2023-01-11T21:38:06.8713103Z del async_compile
2023-01-11T21:38:06.8713108Z
2023-01-11T21:38:06.8713184Z def call(args):
2023-01-11T21:38:06.8713263Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.8713341Z args.clear()
2023-01-11T21:38:06.8713549Z buf0 = empty_strided((1, 10, 10), (100, 1, 10), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.8713710Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.8713782Z del arg0_1
2023-01-11T21:38:06.8713857Z del arg1_1
2023-01-11T21:38:06.8713932Z return (buf0, )
2023-01-11T21:38:06.8713937Z
2023-01-11T21:38:06.8713942Z
2023-01-11T21:38:06.8714025Z if __name__ == "__main__":
2023-01-11T21:38:06.8714143Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.8714269Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.8714468Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.8714666Z arg1_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.8714786Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.8714820Z
2023-01-11T21:38:06.8714892Z ok (0.022s)
2023-01-11T21:38:06.8715375Z test_cpu_broadcast3_broadcast1 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.8715506Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.8715766Z [2023-01-11 21:36:45,849] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1039
2023-01-11T21:38:06.8716030Z [2023-01-11 21:36:47,508] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1039
2023-01-11T21:38:06.8716036Z
2023-01-11T21:38:06.8716134Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.8716208Z import torch
2023-01-11T21:38:06.8716278Z import random
2023-01-11T21:38:06.8716398Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.8716521Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.8716526Z
2023-01-11T21:38:06.8716610Z aten = torch.ops.aten
2023-01-11T21:38:06.8716750Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.8716846Z async_compile = AsyncCompile()
2023-01-11T21:38:06.8716851Z
2023-01-11T21:38:06.8716925Z import triton
2023-01-11T21:38:06.8717010Z import triton.language as tl
2023-01-11T21:38:06.8717136Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.8717276Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.8717282Z
2023-01-11T21:38:06.8717286Z
2023-01-11T21:38:06.8717422Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.8717628Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.8717757Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.8717865Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.8717972Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.8718030Z {
2023-01-11T21:38:06.8718132Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.8718228Z {
2023-01-11T21:38:06.8718312Z #pragma omp for
2023-01-11T21:38:06.8718402Z for(long i0=0; i0<1; i0+=1)
2023-01-11T21:38:06.8718469Z {
2023-01-11T21:38:06.8718599Z auto tmp0 = at::vec::Vectorized<float>(in_ptr0[0]);
2023-01-11T21:38:06.8718729Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.8718819Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.8718915Z tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.8718983Z }
2023-01-11T21:38:06.8719083Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.8719169Z for(long i0=8; i0<10; i0+=1)
2023-01-11T21:38:06.8719241Z {
2023-01-11T21:38:06.8719321Z auto tmp0 = in_ptr0[0];
2023-01-11T21:38:06.8719408Z auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.8719497Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.8719583Z out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.8719652Z }
2023-01-11T21:38:06.8719721Z }
2023-01-11T21:38:06.8719785Z }
2023-01-11T21:38:06.8719864Z ''')
2023-01-11T21:38:06.8719869Z
2023-01-11T21:38:06.8719874Z
2023-01-11T21:38:06.8719972Z async_compile.wait(globals())
2023-01-11T21:38:06.8720050Z del async_compile
2023-01-11T21:38:06.8720055Z
2023-01-11T21:38:06.8720130Z def call(args):
2023-01-11T21:38:06.8720209Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.8720283Z args.clear()
2023-01-11T21:38:06.8720478Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.8720645Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.8720742Z del arg0_1
2023-01-11T21:38:06.8720815Z del arg1_1
2023-01-11T21:38:06.8720891Z return (buf0, )
2023-01-11T21:38:06.8720896Z 2023-01-11T21:38:06.8720901Z 2023-01-11T21:38:06.8720983Z if __name__ == "__main__": 2023-01-11T21:38:06.8721101Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8721231Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8721423Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8721610Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8721731Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8721736Z 2023-01-11T21:38:06.8721809Z ok (1.672s) 2023-01-11T21:38:06.8722288Z test_cpu_broadcast3_broadcast2 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8722421Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8722683Z [2023-01-11 21:36:47,521] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1040 2023-01-11T21:38:06.8722948Z [2023-01-11 21:36:47,528] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1040 2023-01-11T21:38:06.8722954Z 2023-01-11T21:38:06.8723051Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8723126Z import torch 2023-01-11T21:38:06.8723200Z import random 2023-01-11T21:38:06.8723312Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8723434Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8723439Z 2023-01-11T21:38:06.8723522Z aten = torch.ops.aten 2023-01-11T21:38:06.8723661Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8723757Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8723762Z 2023-01-11T21:38:06.8723836Z import triton 2023-01-11T21:38:06.8723929Z import triton.language as tl 2023-01-11T21:38:06.8724046Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8724212Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8724218Z 2023-01-11T21:38:06.8724223Z 2023-01-11T21:38:06.8724360Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8724566Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8724692Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8724802Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8724908Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8724975Z { 2023-01-11T21:38:06.8725073Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8725138Z { 2023-01-11T21:38:06.8725220Z #pragma omp for 2023-01-11T21:38:06.8725306Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.8725374Z { 2023-01-11T21:38:06.8725501Z auto tmp0 = at::vec::Vectorized(in_ptr0[0]); 2023-01-11T21:38:06.8725642Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.8725725Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8725821Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8725892Z } 2023-01-11T21:38:06.8725991Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8726077Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:06.8726143Z { 2023-01-11T21:38:06.8726229Z auto tmp0 = 
in_ptr0[0]; 2023-01-11T21:38:06.8726311Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8726399Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8726485Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8726591Z } 2023-01-11T21:38:06.8726656Z } 2023-01-11T21:38:06.8726724Z } 2023-01-11T21:38:06.8726810Z ''') 2023-01-11T21:38:06.8726816Z 2023-01-11T21:38:06.8726820Z 2023-01-11T21:38:06.8726908Z async_compile.wait(globals()) 2023-01-11T21:38:06.8726985Z del async_compile 2023-01-11T21:38:06.8726990Z 2023-01-11T21:38:06.8727069Z def call(args): 2023-01-11T21:38:06.8727150Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8727225Z args.clear() 2023-01-11T21:38:06.8727433Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8727599Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8727665Z del arg0_1 2023-01-11T21:38:06.8727738Z del arg1_1 2023-01-11T21:38:06.8727816Z return (buf0, ) 2023-01-11T21:38:06.8727822Z 2023-01-11T21:38:06.8727826Z 2023-01-11T21:38:06.8727906Z if __name__ == "__main__": 2023-01-11T21:38:06.8728027Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8728154Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8728349Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8728556Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8728672Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8728677Z 2023-01-11T21:38:06.8728747Z ok (0.020s) 2023-01-11T21:38:06.8729229Z test_cpu_broadcast3_broadcast3 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8729361Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8729624Z [2023-01-11 21:36:47,542] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1041 2023-01-11T21:38:06.8729887Z [2023-01-11 21:36:49,187] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1041 2023-01-11T21:38:06.8729893Z 2023-01-11T21:38:06.8730017Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8730095Z import torch 2023-01-11T21:38:06.8730170Z import random 2023-01-11T21:38:06.8730283Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8730407Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8730413Z 2023-01-11T21:38:06.8730495Z aten = torch.ops.aten 2023-01-11T21:38:06.8730631Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8730726Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8730731Z 2023-01-11T21:38:06.8730806Z import triton 2023-01-11T21:38:06.8730901Z import triton.language as tl 2023-01-11T21:38:06.8731027Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8731160Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8731165Z 2023-01-11T21:38:06.8731176Z 2023-01-11T21:38:06.8731306Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8731513Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8731637Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8731748Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8731851Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8731917Z { 2023-01-11T21:38:06.8731983Z { 2023-01-11T21:38:06.8732043Z { 2023-01-11T21:38:06.8732131Z auto tmp0 = in_ptr0[0]; 2023-01-11T21:38:06.8732217Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:06.8732307Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8732394Z out_ptr0[0] = tmp2; 2023-01-11T21:38:06.8732491Z } 2023-01-11T21:38:06.8732550Z } 2023-01-11T21:38:06.8732614Z } 2023-01-11T21:38:06.8732700Z ''') 2023-01-11T21:38:06.8732706Z 2023-01-11T21:38:06.8732710Z 2023-01-11T21:38:06.8732806Z async_compile.wait(globals()) 2023-01-11T21:38:06.8732883Z del async_compile 2023-01-11T21:38:06.8732888Z 2023-01-11T21:38:06.8732965Z def call(args): 2023-01-11T21:38:06.8733046Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8733122Z args.clear() 2023-01-11T21:38:06.8733307Z buf0 = empty_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8733475Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8733547Z del arg0_1 2023-01-11T21:38:06.8733619Z del arg1_1 2023-01-11T21:38:06.8733696Z return (buf0, ) 2023-01-11T21:38:06.8733702Z 2023-01-11T21:38:06.8733706Z 2023-01-11T21:38:06.8733790Z if __name__ == "__main__": 2023-01-11T21:38:06.8733909Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8734032Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8734227Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8734417Z arg1_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8734654Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8734659Z 2023-01-11T21:38:06.8734733Z ok (1.659s) 
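The TypedStorage deprecation warning repeated throughout this run comes from the test helper at test_torchinductor.py:246, which reads the storage length through the deprecated x.storage().size(). A minimal sketch of the replacement the warning points at, assuming the UntypedStorage API it names (the helper name storage_flat_view is illustrative, not part of the test suite): an untyped storage reports its length in bytes, so the element count is nbytes() divided by the tensor's element size.

import torch

def storage_flat_view(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: builds the same flat copy of the backing storage
    # as the warned-about line, without the deprecated TypedStorage path.
    # x.storage().size() counted elements; UntypedStorage counts bytes.
    numel = x.untyped_storage().nbytes() // x.element_size()
    return torch.as_strided(x, (numel,), (1,), 0).clone()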
2023-01-11T21:38:06.8735297Z test_cpu_broadcast3_dense (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8735442Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8735742Z [2023-01-11 21:36:49,200] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1042 2023-01-11T21:38:06.8736042Z [2023-01-11 21:36:50,877] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1042 2023-01-11T21:38:06.8736048Z 2023-01-11T21:38:06.8736196Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8736266Z import torch 2023-01-11T21:38:06.8736343Z import random 2023-01-11T21:38:06.8736472Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8736607Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8736612Z 2023-01-11T21:38:06.8736696Z aten = torch.ops.aten 2023-01-11T21:38:06.8736845Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8736946Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8736951Z 2023-01-11T21:38:06.8737027Z import triton 2023-01-11T21:38:06.8737117Z import triton.language as tl 2023-01-11T21:38:06.8737309Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8737452Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8737457Z 2023-01-11T21:38:06.8737462Z 2023-01-11T21:38:06.8737601Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8737807Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8737932Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8738041Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8738143Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8738202Z { 2023-01-11T21:38:06.8738304Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8738369Z { 2023-01-11T21:38:06.8738451Z #pragma omp for 2023-01-11T21:38:06.8738538Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.8738605Z { 2023-01-11T21:38:06.8738726Z auto tmp0 = at::vec::Vectorized(in_ptr0[0]); 2023-01-11T21:38:06.8738900Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.8738992Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8739088Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8739156Z } 2023-01-11T21:38:06.8739259Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8739349Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:06.8739409Z { 2023-01-11T21:38:06.8739497Z auto tmp0 = in_ptr0[0]; 2023-01-11T21:38:06.8739590Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8739678Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8739768Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8739836Z } 2023-01-11T21:38:06.8739903Z } 2023-01-11T21:38:06.8739960Z } 2023-01-11T21:38:06.8740046Z ''') 2023-01-11T21:38:06.8740052Z 2023-01-11T21:38:06.8740056Z 2023-01-11T21:38:06.8740149Z async_compile.wait(globals()) 2023-01-11T21:38:06.8740230Z del async_compile 2023-01-11T21:38:06.8740235Z 2023-01-11T21:38:06.8740311Z def call(args): 2023-01-11T21:38:06.8740391Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8740466Z args.clear() 
2023-01-11T21:38:06.8740665Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8740827Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8740899Z del arg0_1 2023-01-11T21:38:06.8740972Z del arg1_1 2023-01-11T21:38:06.8741048Z return (buf0, ) 2023-01-11T21:38:06.8741053Z 2023-01-11T21:38:06.8741057Z 2023-01-11T21:38:06.8741139Z if __name__ == "__main__": 2023-01-11T21:38:06.8741257Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8741384Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8741568Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8741766Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8741889Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8741895Z 2023-01-11T21:38:06.8741968Z ok (1.690s) 2023-01-11T21:38:06.8742477Z test_cpu_broadcast3_double (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8742612Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8742871Z [2023-01-11 21:36:50,891] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1043 2023-01-11T21:38:06.8743134Z [2023-01-11 21:36:52,551] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1043 2023-01-11T21:38:06.8743142Z 2023-01-11T21:38:06.8743243Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8743318Z import torch 2023-01-11T21:38:06.8743386Z import random 2023-01-11T21:38:06.8743505Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8743629Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8743636Z 2023-01-11T21:38:06.8743718Z aten = torch.ops.aten 2023-01-11T21:38:06.8743854Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8743950Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8743955Z 2023-01-11T21:38:06.8744029Z import triton 2023-01-11T21:38:06.8744114Z import triton.language as tl 2023-01-11T21:38:06.8744239Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8744381Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8744386Z 2023-01-11T21:38:06.8744390Z 2023-01-11T21:38:06.8744526Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8744759Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8744880Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8744994Z const double* __restrict__ in_ptr1, 2023-01-11T21:38:06.8745104Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8745162Z { 2023-01-11T21:38:06.8745265Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8745332Z { 2023-01-11T21:38:06.8745417Z #pragma omp for 2023-01-11T21:38:06.8745507Z for(long i0=0; i0<100; i0+=1) 2023-01-11T21:38:06.8745574Z { 2023-01-11T21:38:06.8745644Z { 2023-01-11T21:38:06.8745705Z { 2023-01-11T21:38:06.8745803Z auto tmp0 = in_ptr0[0]; 
2023-01-11T21:38:06.8745904Z auto tmp2 = in_ptr1[i0];
2023-01-11T21:38:06.8746019Z auto tmp1 = static_cast<double>(tmp0);
2023-01-11T21:38:06.8746117Z auto tmp3 = tmp1 + tmp2;
2023-01-11T21:38:06.8746208Z out_ptr0[i0] = tmp3;
2023-01-11T21:38:06.8746280Z }
2023-01-11T21:38:06.8746340Z }
2023-01-11T21:38:06.8746408Z }
2023-01-11T21:38:06.8746474Z }
2023-01-11T21:38:06.8746541Z }
2023-01-11T21:38:06.8746626Z ''')
2023-01-11T21:38:06.8746637Z
2023-01-11T21:38:06.8746642Z
2023-01-11T21:38:06.8746735Z async_compile.wait(globals())
2023-01-11T21:38:06.8746812Z del async_compile
2023-01-11T21:38:06.8746817Z
2023-01-11T21:38:06.8746885Z def call(args):
2023-01-11T21:38:06.8746965Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.8747041Z args.clear()
2023-01-11T21:38:06.8747241Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64)
2023-01-11T21:38:06.8747410Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.8747483Z del arg0_1
2023-01-11T21:38:06.8747554Z del arg1_1
2023-01-11T21:38:06.8747625Z return (buf0, )
2023-01-11T21:38:06.8747631Z
2023-01-11T21:38:06.8747644Z
2023-01-11T21:38:06.8747717Z if __name__ == "__main__":
2023-01-11T21:38:06.8747836Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.8747964Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.8748186Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.8748389Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64)
2023-01-11T21:38:06.8748510Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.8748517Z
2023-01-11T21:38:06.8748588Z ok (1.674s)
2023-01-11T21:38:06.8749063Z test_cpu_broadcast3_int (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.8749190Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.8749448Z [2023-01-11 21:36:52,565] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1044
2023-01-11T21:38:06.8749714Z [2023-01-11 21:36:54,231] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1044
2023-01-11T21:38:06.8749720Z
2023-01-11T21:38:06.8749821Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.8749896Z import torch
2023-01-11T21:38:06.8749970Z import random
2023-01-11T21:38:06.8750089Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.8750214Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.8750219Z
2023-01-11T21:38:06.8750304Z aten = torch.ops.aten
2023-01-11T21:38:06.8750433Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.8750591Z async_compile = AsyncCompile()
2023-01-11T21:38:06.8750597Z
2023-01-11T21:38:06.8750673Z import triton
2023-01-11T21:38:06.8750766Z import triton.language as tl
2023-01-11T21:38:06.8750892Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.8751032Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.8751040Z
2023-01-11T21:38:06.8751044Z
2023-01-11T21:38:06.8751181Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.8751386Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.8751506Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.8751615Z const int* __restrict__ in_ptr1,
2023-01-11T21:38:06.8751719Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.8751786Z {
2023-01-11T21:38:06.8751891Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.8751956Z {
2023-01-11T21:38:06.8752040Z #pragma omp for
2023-01-11T21:38:06.8752121Z for(long i0=0; i0<10; i0+=1)
2023-01-11T21:38:06.8752187Z {
2023-01-11T21:38:06.8752256Z {
2023-01-11T21:38:06.8752325Z {
2023-01-11T21:38:06.8752421Z auto tmp0 = in_ptr0[0];
2023-01-11T21:38:06.8752522Z auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.8752627Z auto tmp2 = static_cast<float>(tmp1);
2023-01-11T21:38:06.8752725Z auto tmp3 = tmp0 + tmp2;
2023-01-11T21:38:06.8752816Z out_ptr0[i0] = tmp3;
2023-01-11T21:38:06.8752886Z }
2023-01-11T21:38:06.8752953Z }
2023-01-11T21:38:06.8753021Z }
2023-01-11T21:38:06.8753087Z }
2023-01-11T21:38:06.8753144Z }
2023-01-11T21:38:06.8753228Z ''')
2023-01-11T21:38:06.8753233Z
2023-01-11T21:38:06.8753239Z
2023-01-11T21:38:06.8753332Z async_compile.wait(globals())
2023-01-11T21:38:06.8753411Z del async_compile
2023-01-11T21:38:06.8753418Z
2023-01-11T21:38:06.8753495Z def call(args):
2023-01-11T21:38:06.8753573Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.8753649Z args.clear()
2023-01-11T21:38:06.8753835Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.8754030Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.8754107Z del arg0_1
2023-01-11T21:38:06.8754182Z del arg1_1
2023-01-11T21:38:06.8754260Z return (buf0, )
2023-01-11T21:38:06.8754265Z
2023-01-11T21:38:06.8754270Z
2023-01-11T21:38:06.8754353Z if __name__ == "__main__":
2023-01-11T21:38:06.8754473Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.8754603Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.8754791Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8754985Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8755111Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8755116Z 2023-01-11T21:38:06.8755189Z ok (1.680s) 2023-01-11T21:38:06.8755671Z test_cpu_broadcast3_strided (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8755808Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8756071Z [2023-01-11 21:36:54,245] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1045 2023-01-11T21:38:06.8756336Z [2023-01-11 21:36:55,904] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1045 2023-01-11T21:38:06.8756342Z 2023-01-11T21:38:06.8756468Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8756543Z import torch 2023-01-11T21:38:06.8756610Z import random 2023-01-11T21:38:06.8756730Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8756854Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8756859Z 2023-01-11T21:38:06.8756944Z aten = torch.ops.aten 2023-01-11T21:38:06.8757081Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8757178Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8757184Z 2023-01-11T21:38:06.8757257Z import triton 2023-01-11T21:38:06.8757342Z import triton.language as tl 2023-01-11T21:38:06.8757466Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8757605Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8757611Z 2023-01-11T21:38:06.8757615Z 2023-01-11T21:38:06.8757752Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8757957Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8758083Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8758192Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8758297Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8758355Z { 2023-01-11T21:38:06.8758462Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8758529Z { 2023-01-11T21:38:06.8758623Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8758711Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8758782Z { 2023-01-11T21:38:06.8758872Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8758932Z { 2023-01-11T21:38:06.8759000Z { 2023-01-11T21:38:06.8759071Z { 2023-01-11T21:38:06.8759171Z auto tmp0 = in_ptr0[0]; 2023-01-11T21:38:06.8759283Z auto tmp1 = in_ptr1[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8759387Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8759489Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8759553Z } 2023-01-11T21:38:06.8759621Z } 2023-01-11T21:38:06.8759689Z } 2023-01-11T21:38:06.8759755Z } 2023-01-11T21:38:06.8759822Z } 2023-01-11T21:38:06.8759916Z } 2023-01-11T21:38:06.8759994Z ''') 2023-01-11T21:38:06.8760000Z 2023-01-11T21:38:06.8760011Z 2023-01-11T21:38:06.8760099Z async_compile.wait(globals()) 2023-01-11T21:38:06.8760177Z del 
async_compile 2023-01-11T21:38:06.8760182Z 2023-01-11T21:38:06.8760257Z def call(args): 2023-01-11T21:38:06.8760336Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8760411Z args.clear() 2023-01-11T21:38:06.8760613Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8760779Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8760848Z del arg0_1 2023-01-11T21:38:06.8760919Z del arg1_1 2023-01-11T21:38:06.8760995Z return (buf0, ) 2023-01-11T21:38:06.8761000Z 2023-01-11T21:38:06.8761005Z 2023-01-11T21:38:06.8761086Z if __name__ == "__main__": 2023-01-11T21:38:06.8761203Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8761331Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8761522Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8761721Z arg1_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8761833Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8761838Z 2023-01-11T21:38:06.8761910Z ok (1.673s) 2023-01-11T21:38:06.8762393Z test_cpu_broadcast3_transposed (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8762557Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8762818Z [2023-01-11 21:36:55,918] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1046 2023-01-11T21:38:06.8763081Z [2023-01-11 21:36:55,926] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1046 2023-01-11T21:38:06.8763087Z 2023-01-11T21:38:06.8763185Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8763260Z import torch 2023-01-11T21:38:06.8763336Z import random 2023-01-11T21:38:06.8763447Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8763571Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8763577Z 2023-01-11T21:38:06.8763658Z aten = torch.ops.aten 2023-01-11T21:38:06.8763798Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8763897Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8763903Z 2023-01-11T21:38:06.8763980Z import triton 2023-01-11T21:38:06.8764073Z import triton.language as tl 2023-01-11T21:38:06.8764197Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8764334Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8764339Z 2023-01-11T21:38:06.8764344Z 2023-01-11T21:38:06.8764480Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8764683Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8764806Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8764914Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8765015Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8765084Z { 2023-01-11T21:38:06.8765178Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8765248Z { 2023-01-11T21:38:06.8765330Z #pragma omp for 2023-01-11T21:38:06.8765418Z for(long i0=0; 
i0<12; i0+=1) 2023-01-11T21:38:06.8765486Z { 2023-01-11T21:38:06.8765614Z auto tmp0 = at::vec::Vectorized(in_ptr0[0]); 2023-01-11T21:38:06.8765782Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.8765872Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8765961Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8766030Z } 2023-01-11T21:38:06.8766129Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8766216Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:06.8766282Z { 2023-01-11T21:38:06.8766371Z auto tmp0 = in_ptr0[0]; 2023-01-11T21:38:06.8766452Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8766540Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8766625Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8766696Z } 2023-01-11T21:38:06.8766764Z } 2023-01-11T21:38:06.8766829Z } 2023-01-11T21:38:06.8766914Z ''') 2023-01-11T21:38:06.8766919Z 2023-01-11T21:38:06.8766924Z 2023-01-11T21:38:06.8767011Z async_compile.wait(globals()) 2023-01-11T21:38:06.8767089Z del async_compile 2023-01-11T21:38:06.8767094Z 2023-01-11T21:38:06.8767171Z def call(args): 2023-01-11T21:38:06.8767251Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8767326Z args.clear() 2023-01-11T21:38:06.8767523Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8767688Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8767761Z del arg0_1 2023-01-11T21:38:06.8767825Z del arg1_1 2023-01-11T21:38:06.8767899Z return (buf0, ) 2023-01-11T21:38:06.8767904Z 2023-01-11T21:38:06.8767909Z 2023-01-11T21:38:06.8767988Z if __name__ == "__main__": 2023-01-11T21:38:06.8768108Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8768275Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8768466Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8768663Z arg1_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8768785Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8768791Z 2023-01-11T21:38:06.8768854Z ok (0.022s) 2023-01-11T21:38:06.8769329Z test_cpu_dense_broadcast1 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8769460Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8769720Z [2023-01-11 21:36:55,939] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1047 2023-01-11T21:38:06.8769984Z [2023-01-11 21:36:57,602] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1047 2023-01-11T21:38:06.8769989Z 2023-01-11T21:38:06.8770090Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8770166Z import torch 2023-01-11T21:38:06.8770242Z import random 2023-01-11T21:38:06.8770363Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8770479Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8770484Z 2023-01-11T21:38:06.8770565Z aten = torch.ops.aten 2023-01-11T21:38:06.8770700Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8770796Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8770801Z 2023-01-11T21:38:06.8770878Z import triton 2023-01-11T21:38:06.8770971Z import triton.language as tl 2023-01-11T21:38:06.8771098Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8771230Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8771245Z 2023-01-11T21:38:06.8771250Z 2023-01-11T21:38:06.8771378Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8771609Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8771739Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8771850Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8771956Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8772026Z { 2023-01-11T21:38:06.8772132Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8772194Z { 2023-01-11T21:38:06.8772278Z #pragma omp for 2023-01-11T21:38:06.8772369Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8772438Z { 2023-01-11T21:38:06.8772529Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.8772603Z { 2023-01-11T21:38:06.8772754Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8772889Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i1); 2023-01-11T21:38:06.8772986Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8773103Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8773174Z } 2023-01-11T21:38:06.8773273Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.8773366Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.8773439Z { 2023-01-11T21:38:06.8773536Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8773631Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.8773725Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8773827Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8773899Z } 2023-01-11T21:38:06.8773969Z } 2023-01-11T21:38:06.8774063Z } 2023-01-11T21:38:06.8774120Z } 2023-01-11T21:38:06.8774205Z ''') 2023-01-11T21:38:06.8774210Z 2023-01-11T21:38:06.8774215Z 2023-01-11T21:38:06.8774308Z async_compile.wait(globals()) 2023-01-11T21:38:06.8774385Z del async_compile 2023-01-11T21:38:06.8774390Z 2023-01-11T21:38:06.8774465Z def call(args): 2023-01-11T21:38:06.8774656Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8774733Z args.clear() 2023-01-11T21:38:06.8774937Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8775097Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), 
c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8775171Z del arg0_1 2023-01-11T21:38:06.8775241Z del arg1_1 2023-01-11T21:38:06.8775316Z return (buf0, ) 2023-01-11T21:38:06.8775321Z 2023-01-11T21:38:06.8775326Z 2023-01-11T21:38:06.8775408Z if __name__ == "__main__": 2023-01-11T21:38:06.8775525Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8775660Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8775853Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8776046Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8776170Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8776175Z 2023-01-11T21:38:06.8776247Z ok (1.678s) 2023-01-11T21:38:06.8776724Z test_cpu_dense_broadcast2 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8776856Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8777113Z [2023-01-11 21:36:57,618] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1048 2023-01-11T21:38:06.8777440Z [2023-01-11 21:36:59,284] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1048 2023-01-11T21:38:06.8777446Z 2023-01-11T21:38:06.8777543Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8777672Z import torch 2023-01-11T21:38:06.8777744Z import random 2023-01-11T21:38:06.8777864Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8777987Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8777992Z 2023-01-11T21:38:06.8778075Z aten = torch.ops.aten 2023-01-11T21:38:06.8778210Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8778307Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8778313Z 2023-01-11T21:38:06.8778391Z import triton 2023-01-11T21:38:06.8778476Z import triton.language as tl 2023-01-11T21:38:06.8778601Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8778742Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8778748Z 2023-01-11T21:38:06.8778752Z 2023-01-11T21:38:06.8778892Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8779095Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8779220Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8779332Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8779438Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8779497Z { 2023-01-11T21:38:06.8779600Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8779670Z { 2023-01-11T21:38:06.8779751Z #pragma omp for 2023-01-11T21:38:06.8779843Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8779913Z { 2023-01-11T21:38:06.8780002Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.8780062Z { 2023-01-11T21:38:06.8780248Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8780384Z auto tmp1 = at::vec::Vectorized(in_ptr1[i0]); 2023-01-11T21:38:06.8780479Z auto tmp2 = tmp0 + tmp1; 
2023-01-11T21:38:06.8780599Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8780671Z } 2023-01-11T21:38:06.8780769Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.8780854Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.8780925Z { 2023-01-11T21:38:06.8781027Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8781121Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8781212Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8781312Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8781383Z } 2023-01-11T21:38:06.8781446Z } 2023-01-11T21:38:06.8781519Z } 2023-01-11T21:38:06.8781590Z } 2023-01-11T21:38:06.8781679Z ''') 2023-01-11T21:38:06.8781684Z 2023-01-11T21:38:06.8781689Z 2023-01-11T21:38:06.8781785Z async_compile.wait(globals()) 2023-01-11T21:38:06.8781868Z del async_compile 2023-01-11T21:38:06.8781873Z 2023-01-11T21:38:06.8781949Z def call(args): 2023-01-11T21:38:06.8782032Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8782106Z args.clear() 2023-01-11T21:38:06.8782319Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8782487Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8782562Z del arg0_1 2023-01-11T21:38:06.8782636Z del arg1_1 2023-01-11T21:38:06.8782714Z return (buf0, ) 2023-01-11T21:38:06.8782719Z 2023-01-11T21:38:06.8782723Z 2023-01-11T21:38:06.8782807Z if __name__ == "__main__": 2023-01-11T21:38:06.8782919Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8783051Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8783256Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8783466Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8783589Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8783626Z 2023-01-11T21:38:06.8783701Z ok (1.680s) 2023-01-11T21:38:06.8784177Z test_cpu_dense_broadcast3 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8784312Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8784573Z [2023-01-11 21:36:59,298] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1049 2023-01-11T21:38:06.8784844Z [2023-01-11 21:37:00,970] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1049 2023-01-11T21:38:06.8784850Z 2023-01-11T21:38:06.8784943Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8785020Z import torch 2023-01-11T21:38:06.8785101Z import random 2023-01-11T21:38:06.8785224Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8785351Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8785356Z 2023-01-11T21:38:06.8785442Z aten = torch.ops.aten 2023-01-11T21:38:06.8785583Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8785674Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8785686Z 2023-01-11T21:38:06.8785756Z import triton 2023-01-11T21:38:06.8785852Z import triton.language as tl 2023-01-11T21:38:06.8785979Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8786151Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8786156Z 2023-01-11T21:38:06.8786161Z 2023-01-11T21:38:06.8786301Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8786509Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8786637Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8786742Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8786849Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8786917Z { 2023-01-11T21:38:06.8787022Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8787091Z { 2023-01-11T21:38:06.8787175Z #pragma omp for 2023-01-11T21:38:06.8787264Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.8787327Z { 2023-01-11T21:38:06.8787471Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.8787600Z auto tmp1 = at::vec::Vectorized(in_ptr1[0]); 2023-01-11T21:38:06.8787696Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8787795Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8787863Z } 2023-01-11T21:38:06.8787964Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8788054Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:06.8788125Z { 2023-01-11T21:38:06.8788221Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8788309Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:06.8788400Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8788489Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8788561Z } 2023-01-11T21:38:06.8788623Z } 2023-01-11T21:38:06.8788691Z } 2023-01-11T21:38:06.8788779Z ''') 2023-01-11T21:38:06.8788785Z 2023-01-11T21:38:06.8788790Z 2023-01-11T21:38:06.8788885Z async_compile.wait(globals()) 2023-01-11T21:38:06.8788965Z del async_compile 2023-01-11T21:38:06.8788971Z 2023-01-11T21:38:06.8789053Z def call(args): 2023-01-11T21:38:06.8789134Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8789205Z args.clear() 2023-01-11T21:38:06.8789407Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8789576Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8789680Z del arg0_1 2023-01-11T21:38:06.8789756Z del arg1_1 2023-01-11T21:38:06.8789834Z return 
(buf0, ) 2023-01-11T21:38:06.8789840Z 2023-01-11T21:38:06.8789844Z 2023-01-11T21:38:06.8789927Z if __name__ == "__main__": 2023-01-11T21:38:06.8790047Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8790169Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8790370Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8790568Z arg1_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8790692Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8790697Z 2023-01-11T21:38:06.8790773Z ok (1.687s) 2023-01-11T21:38:06.8791250Z test_cpu_dense_dense (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8791386Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8791648Z [2023-01-11 21:37:00,985] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1050 2023-01-11T21:38:06.8791913Z [2023-01-11 21:37:00,995] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1050 2023-01-11T21:38:06.8791919Z 2023-01-11T21:38:06.8792019Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8792114Z import torch 2023-01-11T21:38:06.8792189Z import random 2023-01-11T21:38:06.8792308Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8792433Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8792438Z 2023-01-11T21:38:06.8792520Z aten = torch.ops.aten 2023-01-11T21:38:06.8792659Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8792756Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8792761Z 2023-01-11T21:38:06.8792828Z import triton 2023-01-11T21:38:06.8792920Z import triton.language as tl 2023-01-11T21:38:06.8793045Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8793185Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8793190Z 2023-01-11T21:38:06.8793195Z 2023-01-11T21:38:06.8793332Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8793537Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8793663Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8793772Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8793869Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8793934Z { 2023-01-11T21:38:06.8794037Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8794104Z { 2023-01-11T21:38:06.8794185Z #pragma omp for 2023-01-11T21:38:06.8794274Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.8794343Z { 2023-01-11T21:38:06.8794475Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.8794611Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.8794703Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8794799Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8794867Z } 2023-01-11T21:38:06.8794965Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8795057Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:06.8795116Z { 2023-01-11T21:38:06.8795205Z 
auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8795293Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8795380Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8795496Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8795564Z } 2023-01-11T21:38:06.8795632Z } 2023-01-11T21:38:06.8795689Z } 2023-01-11T21:38:06.8795777Z ''') 2023-01-11T21:38:06.8795783Z 2023-01-11T21:38:06.8795787Z 2023-01-11T21:38:06.8795882Z async_compile.wait(globals()) 2023-01-11T21:38:06.8795964Z del async_compile 2023-01-11T21:38:06.8795970Z 2023-01-11T21:38:06.8796045Z def call(args): 2023-01-11T21:38:06.8796125Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8796201Z args.clear() 2023-01-11T21:38:06.8796392Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8796557Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8796633Z del arg0_1 2023-01-11T21:38:06.8796704Z del arg1_1 2023-01-11T21:38:06.8796779Z return (buf0, ) 2023-01-11T21:38:06.8796785Z 2023-01-11T21:38:06.8796789Z 2023-01-11T21:38:06.8796868Z if __name__ == "__main__": 2023-01-11T21:38:06.8796987Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8797114Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8797306Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8797504Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8797624Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8797629Z 2023-01-11T21:38:06.8797701Z ok (0.024s) 2023-01-11T21:38:06.8798176Z test_cpu_dense_double (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8798336Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8798595Z [2023-01-11 21:37:01,009] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1051 2023-01-11T21:38:06.8798858Z [2023-01-11 21:37:02,675] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1051 2023-01-11T21:38:06.8798864Z 2023-01-11T21:38:06.8798963Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8799038Z import torch 2023-01-11T21:38:06.8799105Z import random 2023-01-11T21:38:06.8799225Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8799348Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8799356Z 2023-01-11T21:38:06.8799437Z aten = torch.ops.aten 2023-01-11T21:38:06.8799573Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8799669Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8799674Z 2023-01-11T21:38:06.8799748Z import triton 2023-01-11T21:38:06.8799833Z import triton.language as tl 2023-01-11T21:38:06.8799961Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8800103Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8800108Z 2023-01-11T21:38:06.8800113Z 2023-01-11T21:38:06.8800248Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8800452Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8800574Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8800687Z const double* __restrict__ in_ptr1, 2023-01-11T21:38:06.8800793Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8800856Z { 2023-01-11T21:38:06.8800959Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8801025Z { 2023-01-11T21:38:06.8801109Z #pragma omp for 2023-01-11T21:38:06.8801197Z for(long i0=0; i0<100; i0+=1) 2023-01-11T21:38:06.8801264Z { 2023-01-11T21:38:06.8801332Z { 2023-01-11T21:38:06.8801422Z { 2023-01-11T21:38:06.8801523Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8801624Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:06.8801738Z auto tmp1 = static_cast<double>(tmp0); 2023-01-11T21:38:06.8801834Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8801925Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.8801996Z } 2023-01-11T21:38:06.8802056Z } 2023-01-11T21:38:06.8802123Z } 2023-01-11T21:38:06.8802188Z } 2023-01-11T21:38:06.8802252Z } 2023-01-11T21:38:06.8802338Z ''') 2023-01-11T21:38:06.8802346Z 2023-01-11T21:38:06.8802351Z 2023-01-11T21:38:06.8802444Z async_compile.wait(globals()) 2023-01-11T21:38:06.8802514Z del async_compile 2023-01-11T21:38:06.8802526Z 2023-01-11T21:38:06.8802593Z def call(args): 2023-01-11T21:38:06.8802675Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8802752Z args.clear() 2023-01-11T21:38:06.8802953Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8803120Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8803194Z del arg0_1 2023-01-11T21:38:06.8803266Z del arg1_1 2023-01-11T21:38:06.8803334Z return (buf0, ) 2023-01-11T21:38:06.8803340Z 2023-01-11T21:38:06.8803345Z 2023-01-11T21:38:06.8803425Z if __name__ == "__main__": 2023-01-11T21:38:06.8803543Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8803673Z from torch._inductor.utils import
print_performance 2023-01-11T21:38:06.8803871Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8804096Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8804215Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8804220Z 2023-01-11T21:38:06.8804290Z ok (1.680s) 2023-01-11T21:38:06.8804755Z test_cpu_dense_int (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8804886Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8805145Z [2023-01-11 21:37:02,690] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1052 2023-01-11T21:38:06.8805412Z [2023-01-11 21:37:04,355] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1052 2023-01-11T21:38:06.8805418Z 2023-01-11T21:38:06.8805516Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8805592Z import torch 2023-01-11T21:38:06.8805666Z import random 2023-01-11T21:38:06.8805788Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8805912Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8805917Z 2023-01-11T21:38:06.8805992Z aten = torch.ops.aten 2023-01-11T21:38:06.8806129Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8806226Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8806231Z 2023-01-11T21:38:06.8806305Z import triton 2023-01-11T21:38:06.8806397Z import triton.language as tl 2023-01-11T21:38:06.8806522Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8806661Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8806669Z 2023-01-11T21:38:06.8806673Z 2023-01-11T21:38:06.8806813Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8807011Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8807136Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8807271Z const int* __restrict__ in_ptr1, 2023-01-11T21:38:06.8807377Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8807442Z { 2023-01-11T21:38:06.8807545Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8807611Z { 2023-01-11T21:38:06.8807699Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8807786Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8807854Z { 2023-01-11T21:38:06.8807944Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8808011Z { 2023-01-11T21:38:06.8808080Z { 2023-01-11T21:38:06.8808151Z { 2023-01-11T21:38:06.8808257Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8808360Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.8808477Z auto tmp2 = static_cast<float>(tmp1); 2023-01-11T21:38:06.8808575Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8808679Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8808751Z } 2023-01-11T21:38:06.8808818Z } 2023-01-11T21:38:06.8808878Z } 2023-01-11T21:38:06.8808945Z } 2023-01-11T21:38:06.8809011Z } 2023-01-11T21:38:06.8809076Z } 2023-01-11T21:38:06.8809167Z ''') 2023-01-11T21:38:06.8809172Z 2023-01-11T21:38:06.8809177Z
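# [editor's note, not part of the log] In the dense+int kernel dumped above, the
# (10,)-shaped int input broadcasts across the rows of the (10, 10) float input;
# that is why the inner load is in_ptr1[i1] (column index only) while the store is
# out_ptr0[i1 + (10*i0)]. A minimal sketch checking that broadcast against eager
# PyTorch; the tensor names here are illustrative, not from the test suite:
import torch
a = torch.randn(10, 10)
b = torch.arange(10, dtype=torch.int32)
out = a + b                                    # b is expanded along dim 0
torch.testing.assert_close(out[3], a[3] + b)   # every row sees the same b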
2023-01-11T21:38:06.8809269Z async_compile.wait(globals()) 2023-01-11T21:38:06.8809345Z del async_compile 2023-01-11T21:38:06.8809351Z 2023-01-11T21:38:06.8809418Z def call(args): 2023-01-11T21:38:06.8809496Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8809577Z args.clear() 2023-01-11T21:38:06.8809816Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8809982Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8810060Z del arg0_1 2023-01-11T21:38:06.8810133Z del arg1_1 2023-01-11T21:38:06.8810203Z return (buf0, ) 2023-01-11T21:38:06.8810208Z 2023-01-11T21:38:06.8810220Z 2023-01-11T21:38:06.8810294Z if __name__ == "__main__": 2023-01-11T21:38:06.8810414Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8810540Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8810737Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8810929Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8811049Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8811054Z 2023-01-11T21:38:06.8811130Z ok (1.680s) 2023-01-11T21:38:06.8811603Z test_cpu_dense_strided (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8811734Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8811986Z [2023-01-11 21:37:04,369] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1053 2023-01-11T21:38:06.8812252Z [2023-01-11 21:37:06,026] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1053 2023-01-11T21:38:06.8812258Z 2023-01-11T21:38:06.8812356Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8812430Z import torch 2023-01-11T21:38:06.8812503Z import random 2023-01-11T21:38:06.8812624Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8812748Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8812754Z 2023-01-11T21:38:06.8812836Z aten = torch.ops.aten 2023-01-11T21:38:06.8812965Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8813088Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8813094Z 2023-01-11T21:38:06.8813168Z import triton 2023-01-11T21:38:06.8813262Z import triton.language as tl 2023-01-11T21:38:06.8813387Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8813528Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8813533Z 2023-01-11T21:38:06.8813538Z 2023-01-11T21:38:06.8813674Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8813880Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8813996Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8814111Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8814216Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8814284Z { 2023-01-11T21:38:06.8814386Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8814454Z { 
2023-01-11T21:38:06.8814668Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8814748Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8814817Z { 2023-01-11T21:38:06.8814908Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8814976Z { 2023-01-11T21:38:06.8815045Z { 2023-01-11T21:38:06.8815117Z { 2023-01-11T21:38:06.8815220Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8815334Z auto tmp1 = in_ptr1[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8815434Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8815537Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8815655Z } 2023-01-11T21:38:06.8815725Z } 2023-01-11T21:38:06.8815793Z } 2023-01-11T21:38:06.8815853Z } 2023-01-11T21:38:06.8815919Z } 2023-01-11T21:38:06.8815984Z } 2023-01-11T21:38:06.8816071Z ''') 2023-01-11T21:38:06.8816076Z 2023-01-11T21:38:06.8816081Z 2023-01-11T21:38:06.8816176Z async_compile.wait(globals()) 2023-01-11T21:38:06.8816252Z del async_compile 2023-01-11T21:38:06.8816257Z 2023-01-11T21:38:06.8816332Z def call(args): 2023-01-11T21:38:06.8816413Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8816481Z args.clear() 2023-01-11T21:38:06.8816679Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8816844Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8816917Z del arg0_1 2023-01-11T21:38:06.8816989Z del arg1_1 2023-01-11T21:38:06.8817063Z return (buf0, ) 2023-01-11T21:38:06.8817071Z 2023-01-11T21:38:06.8817075Z 2023-01-11T21:38:06.8817236Z if __name__ == "__main__": 2023-01-11T21:38:06.8817360Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8817487Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8817691Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8817893Z arg1_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8818011Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8818016Z 2023-01-11T21:38:06.8818086Z ok (1.671s) 2023-01-11T21:38:06.8818561Z test_cpu_dense_transposed (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8818696Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8818954Z [2023-01-11 21:37:06,040] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1054 2023-01-11T21:38:06.8819256Z [2023-01-11 21:37:07,707] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1054 2023-01-11T21:38:06.8819262Z 2023-01-11T21:38:06.8819354Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8819430Z import torch 2023-01-11T21:38:06.8819504Z import random 2023-01-11T21:38:06.8819623Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8819747Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8819752Z 2023-01-11T21:38:06.8819833Z aten = torch.ops.aten 2023-01-11T21:38:06.8819971Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8820060Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8820076Z 2023-01-11T21:38:06.8820143Z import triton 2023-01-11T21:38:06.8820238Z import triton.language as tl 2023-01-11T21:38:06.8820365Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8820507Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8820513Z 2023-01-11T21:38:06.8820519Z 2023-01-11T21:38:06.8820657Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8820862Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8820986Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8821088Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8821192Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8821260Z { 2023-01-11T21:38:06.8821360Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8821426Z { 2023-01-11T21:38:06.8821522Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8821642Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8821702Z { 2023-01-11T21:38:06.8821792Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8821862Z { 2023-01-11T21:38:06.8821930Z { 2023-01-11T21:38:06.8822000Z { 2023-01-11T21:38:06.8822112Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8822221Z auto tmp1 = in_ptr1[i0 + (10*i1)]; 2023-01-11T21:38:06.8822314Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8822415Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8822485Z } 2023-01-11T21:38:06.8822553Z } 2023-01-11T21:38:06.8822622Z } 2023-01-11T21:38:06.8822692Z } 2023-01-11T21:38:06.8822759Z } 2023-01-11T21:38:06.8822816Z } 2023-01-11T21:38:06.8822902Z ''') 2023-01-11T21:38:06.8822907Z 2023-01-11T21:38:06.8822912Z 2023-01-11T21:38:06.8823008Z async_compile.wait(globals()) 2023-01-11T21:38:06.8823088Z del async_compile 2023-01-11T21:38:06.8823093Z 2023-01-11T21:38:06.8823168Z def call(args): 2023-01-11T21:38:06.8823248Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8823323Z args.clear() 2023-01-11T21:38:06.8823512Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8823681Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8823754Z del arg0_1 2023-01-11T21:38:06.8823827Z del arg1_1 2023-01-11T21:38:06.8823902Z return (buf0, ) 2023-01-11T21:38:06.8823908Z 2023-01-11T21:38:06.8823912Z 2023-01-11T21:38:06.8823992Z if __name__ == "__main__": 2023-01-11T21:38:06.8824109Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8824239Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8824430Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8824627Z arg1_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8824747Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8824752Z 2023-01-11T21:38:06.8824823Z ok (1.681s) 2023-01-11T21:38:06.8825329Z test_cpu_double_broadcast1 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8825465Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8825724Z [2023-01-11 21:37:07,721] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1055 2023-01-11T21:38:06.8825987Z [2023-01-11 21:37:09,378] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1055 2023-01-11T21:38:06.8825997Z 2023-01-11T21:38:06.8826094Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8826162Z import torch 2023-01-11T21:38:06.8826236Z import random 2023-01-11T21:38:06.8826355Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8826481Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8826486Z 2023-01-11T21:38:06.8826567Z aten = torch.ops.aten 2023-01-11T21:38:06.8826705Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8826801Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8826806Z 2023-01-11T21:38:06.8826880Z import triton 2023-01-11T21:38:06.8826965Z import triton.language as tl 2023-01-11T21:38:06.8827090Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8827230Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8827236Z 2023-01-11T21:38:06.8827240Z 2023-01-11T21:38:06.8827403Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8827609Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8827734Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.8827843Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8827950Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8828009Z { 2023-01-11T21:38:06.8828110Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8828176Z { 2023-01-11T21:38:06.8828271Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8828358Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8828423Z { 2023-01-11T21:38:06.8828513Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8828574Z { 2023-01-11T21:38:06.8828642Z { 2023-01-11T21:38:06.8828711Z { 2023-01-11T21:38:06.8828820Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8828923Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.8829042Z auto tmp2 = static_cast<double>(tmp1); 2023-01-11T21:38:06.8829134Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8829234Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8829307Z } 2023-01-11T21:38:06.8829378Z } 2023-01-11T21:38:06.8829450Z } 2023-01-11T21:38:06.8829519Z } 2023-01-11T21:38:06.8829588Z } 2023-01-11T21:38:06.8829645Z }
2023-01-11T21:38:06.8829730Z ''') 2023-01-11T21:38:06.8829735Z 2023-01-11T21:38:06.8829740Z 2023-01-11T21:38:06.8829832Z async_compile.wait(globals()) 2023-01-11T21:38:06.8829908Z del async_compile 2023-01-11T21:38:06.8829913Z 2023-01-11T21:38:06.8829988Z def call(args): 2023-01-11T21:38:06.8830068Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8830143Z args.clear() 2023-01-11T21:38:06.8830343Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8830505Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8830577Z del arg0_1 2023-01-11T21:38:06.8830649Z del arg1_1 2023-01-11T21:38:06.8830725Z return (buf0, ) 2023-01-11T21:38:06.8830730Z 2023-01-11T21:38:06.8830762Z 2023-01-11T21:38:06.8830850Z if __name__ == "__main__": 2023-01-11T21:38:06.8830971Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8831100Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8831295Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8831491Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8831612Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8831618Z 2023-01-11T21:38:06.8831690Z ok (1.671s) 2023-01-11T21:38:06.8832167Z test_cpu_double_broadcast2 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8832305Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8832566Z [2023-01-11 21:37:09,392] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1056 2023-01-11T21:38:06.8832830Z [2023-01-11 21:37:11,054] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1056 2023-01-11T21:38:06.8832836Z 2023-01-11T21:38:06.8832937Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8833014Z import torch 2023-01-11T21:38:06.8833085Z import random 2023-01-11T21:38:06.8833205Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8833356Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8833361Z 2023-01-11T21:38:06.8833442Z aten = torch.ops.aten 2023-01-11T21:38:06.8833578Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8833673Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8833678Z 2023-01-11T21:38:06.8833754Z import triton 2023-01-11T21:38:06.8833839Z import triton.language as tl 2023-01-11T21:38:06.8833964Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8834103Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8834108Z 2023-01-11T21:38:06.8834113Z 2023-01-11T21:38:06.8834248Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8834452Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8834577Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.8834688Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8834796Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8834854Z { 
2023-01-11T21:38:06.8834955Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8835022Z { 2023-01-11T21:38:06.8835117Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8835205Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8835272Z { 2023-01-11T21:38:06.8835362Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8835423Z { 2023-01-11T21:38:06.8835492Z { 2023-01-11T21:38:06.8835563Z { 2023-01-11T21:38:06.8835671Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8835771Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8835886Z auto tmp2 = static_cast<double>(tmp1); 2023-01-11T21:38:06.8835984Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8836079Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8836153Z } 2023-01-11T21:38:06.8836223Z } 2023-01-11T21:38:06.8836292Z } 2023-01-11T21:38:06.8836359Z } 2023-01-11T21:38:06.8836425Z } 2023-01-11T21:38:06.8836490Z } 2023-01-11T21:38:06.8836568Z ''') 2023-01-11T21:38:06.8836573Z 2023-01-11T21:38:06.8836609Z 2023-01-11T21:38:06.8836704Z async_compile.wait(globals()) 2023-01-11T21:38:06.8836780Z del async_compile 2023-01-11T21:38:06.8836785Z 2023-01-11T21:38:06.8836862Z def call(args): 2023-01-11T21:38:06.8836942Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8837017Z args.clear() 2023-01-11T21:38:06.8837228Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8837387Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8837458Z del arg0_1 2023-01-11T21:38:06.8837530Z del arg1_1 2023-01-11T21:38:06.8837606Z return (buf0, ) 2023-01-11T21:38:06.8837614Z 2023-01-11T21:38:06.8837618Z 2023-01-11T21:38:06.8837698Z if __name__ == "__main__": 2023-01-11T21:38:06.8837816Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8837943Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8838145Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8838343Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8838461Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8838466Z 2023-01-11T21:38:06.8838536Z ok (1.676s) 2023-01-11T21:38:06.8839013Z test_cpu_double_broadcast3 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8839172Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8839429Z [2023-01-11 21:37:11,068] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1057 2023-01-11T21:38:06.8839694Z [2023-01-11 21:37:12,731] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1057 2023-01-11T21:38:06.8839699Z 2023-01-11T21:38:06.8839798Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8839871Z import torch 2023-01-11T21:38:06.8839938Z import random 2023-01-11T21:38:06.8840058Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8840183Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8840188Z 2023-01-11T21:38:06.8840271Z aten = torch.ops.aten 2023-01-11T21:38:06.8840409Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8840507Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8840512Z 2023-01-11T21:38:06.8840586Z import triton 2023-01-11T21:38:06.8840679Z import triton.language as tl 2023-01-11T21:38:06.8840797Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8840937Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8840945Z 2023-01-11T21:38:06.8840950Z 2023-01-11T21:38:06.8841086Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8841292Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8841417Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.8841527Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8841632Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8841698Z { 2023-01-11T21:38:06.8841792Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8841857Z { 2023-01-11T21:38:06.8841942Z #pragma omp for 2023-01-11T21:38:06.8842032Z for(long i0=0; i0<100; i0+=1) 2023-01-11T21:38:06.8842099Z { 2023-01-11T21:38:06.8842165Z { 2023-01-11T21:38:06.8842229Z { 2023-01-11T21:38:06.8842326Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8842423Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:06.8842595Z auto tmp2 = static_cast<double>(tmp1); 2023-01-11T21:38:06.8842696Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8842788Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.8842856Z } 2023-01-11T21:38:06.8842917Z } 2023-01-11T21:38:06.8842983Z } 2023-01-11T21:38:06.8843048Z } 2023-01-11T21:38:06.8843112Z } 2023-01-11T21:38:06.8843197Z ''') 2023-01-11T21:38:06.8843202Z 2023-01-11T21:38:06.8843206Z 2023-01-11T21:38:06.8843301Z async_compile.wait(globals()) 2023-01-11T21:38:06.8843378Z del async_compile 2023-01-11T21:38:06.8843383Z 2023-01-11T21:38:06.8843463Z def call(args): 2023-01-11T21:38:06.8843536Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8843610Z args.clear() 2023-01-11T21:38:06.8843813Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8843982Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8844056Z del arg0_1 2023-01-11T21:38:06.8844128Z del arg1_1 2023-01-11T21:38:06.8844205Z return (buf0, ) 2023-01-11T21:38:06.8844210Z 2023-01-11T21:38:06.8844215Z 2023-01-11T21:38:06.8844288Z if __name__ == "__main__": 2023-01-11T21:38:06.8844406Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8844532Z from torch._inductor.utils import
print_performance 2023-01-11T21:38:06.8844731Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8844922Z arg1_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8845065Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8845071Z 2023-01-11T21:38:06.8845142Z ok (1.677s) 2023-01-11T21:38:06.8845612Z test_cpu_double_dense (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8845746Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8845999Z [2023-01-11 21:37:12,745] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1058 2023-01-11T21:38:06.8846262Z [2023-01-11 21:37:14,419] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1058 2023-01-11T21:38:06.8846267Z 2023-01-11T21:38:06.8846370Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8846445Z import torch 2023-01-11T21:38:06.8846520Z import random 2023-01-11T21:38:06.8846638Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8846761Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8846766Z 2023-01-11T21:38:06.8846848Z aten = torch.ops.aten 2023-01-11T21:38:06.8846980Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8847078Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8847083Z 2023-01-11T21:38:06.8847158Z import triton 2023-01-11T21:38:06.8847252Z import triton.language as tl 2023-01-11T21:38:06.8847376Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8847515Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8847521Z 2023-01-11T21:38:06.8847525Z 2023-01-11T21:38:06.8847662Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8847868Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8847989Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.8848098Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8848204Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8848269Z { 2023-01-11T21:38:06.8848397Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8848466Z { 2023-01-11T21:38:06.8848546Z #pragma omp for 2023-01-11T21:38:06.8848627Z for(long i0=0; i0<100; i0+=1) 2023-01-11T21:38:06.8848696Z { 2023-01-11T21:38:06.8848766Z { 2023-01-11T21:38:06.8848836Z { 2023-01-11T21:38:06.8848934Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8849030Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8849143Z auto tmp2 = static_cast<double>(tmp1); 2023-01-11T21:38:06.8849231Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8849325Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.8849392Z } 2023-01-11T21:38:06.8849459Z } 2023-01-11T21:38:06.8849526Z } 2023-01-11T21:38:06.8849591Z } 2023-01-11T21:38:06.8849654Z } 2023-01-11T21:38:06.8849733Z ''') 2023-01-11T21:38:06.8849738Z 2023-01-11T21:38:06.8849743Z 2023-01-11T21:38:06.8849838Z async_compile.wait(globals()) 2023-01-11T21:38:06.8849916Z del async_compile 2023-01-11T21:38:06.8849921Z 2023-01-11T21:38:06.8849996Z def
call(args): 2023-01-11T21:38:06.8850075Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8850150Z args.clear() 2023-01-11T21:38:06.8850350Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8850509Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8850582Z del arg0_1 2023-01-11T21:38:06.8850654Z del arg1_1 2023-01-11T21:38:06.8850728Z return (buf0, ) 2023-01-11T21:38:06.8850768Z 2023-01-11T21:38:06.8850773Z 2023-01-11T21:38:06.8850856Z if __name__ == "__main__": 2023-01-11T21:38:06.8850974Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8851100Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8851301Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8851492Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8851611Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8851616Z 2023-01-11T21:38:06.8851686Z ok (1.687s) 2023-01-11T21:38:06.8852157Z test_cpu_double_double (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8852292Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8852554Z [2023-01-11 21:37:14,433] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1059 2023-01-11T21:38:06.8852821Z [2023-01-11 21:37:16,095] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1059 2023-01-11T21:38:06.8852826Z 2023-01-11T21:38:06.8852924Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8852999Z import torch 2023-01-11T21:38:06.8853066Z import random 2023-01-11T21:38:06.8853185Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8853312Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8853317Z 2023-01-11T21:38:06.8853397Z aten = torch.ops.aten 2023-01-11T21:38:06.8853533Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8853631Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8853639Z 2023-01-11T21:38:06.8853713Z import triton 2023-01-11T21:38:06.8853807Z import triton.language as tl 2023-01-11T21:38:06.8853925Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8854064Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8854070Z 2023-01-11T21:38:06.8854074Z 2023-01-11T21:38:06.8854239Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8854446Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8854703Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.8854817Z const double* __restrict__ in_ptr1, 2023-01-11T21:38:06.8854921Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8854985Z { 2023-01-11T21:38:06.8855081Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8855147Z { 2023-01-11T21:38:06.8855228Z #pragma omp for 2023-01-11T21:38:06.8855325Z for(long i0=0; i0<100; i0+=1) 2023-01-11T21:38:06.8855392Z { 2023-01-11T21:38:06.8855460Z { 
2023-01-11T21:38:06.8855522Z { 2023-01-11T21:38:06.8855619Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8855717Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8855814Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8855905Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8855975Z } 2023-01-11T21:38:06.8856043Z } 2023-01-11T21:38:06.8856102Z } 2023-01-11T21:38:06.8856167Z } 2023-01-11T21:38:06.8856230Z } 2023-01-11T21:38:06.8856317Z ''') 2023-01-11T21:38:06.8856323Z 2023-01-11T21:38:06.8856327Z 2023-01-11T21:38:06.8856422Z async_compile.wait(globals()) 2023-01-11T21:38:06.8856501Z del async_compile 2023-01-11T21:38:06.8856506Z 2023-01-11T21:38:06.8856580Z def call(args): 2023-01-11T21:38:06.8856652Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8856727Z args.clear() 2023-01-11T21:38:06.8856977Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8857196Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8857282Z del arg0_1 2023-01-11T21:38:06.8857355Z del arg1_1 2023-01-11T21:38:06.8857433Z return (buf0, ) 2023-01-11T21:38:06.8857438Z 2023-01-11T21:38:06.8857443Z 2023-01-11T21:38:06.8857523Z if __name__ == "__main__": 2023-01-11T21:38:06.8857634Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8857761Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8857958Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8858154Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8858273Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8858278Z 2023-01-11T21:38:06.8858349Z ok (1.676s) 2023-01-11T21:38:06.8858822Z test_cpu_double_int (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8858953Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8859210Z [2023-01-11 21:37:16,108] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1060 2023-01-11T21:38:06.8859463Z [2023-01-11 21:37:17,776] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1060 2023-01-11T21:38:06.8859477Z 2023-01-11T21:38:06.8859568Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8859642Z import torch 2023-01-11T21:38:06.8859717Z import random 2023-01-11T21:38:06.8859839Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8859963Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8859969Z 2023-01-11T21:38:06.8860050Z aten = torch.ops.aten 2023-01-11T21:38:06.8860185Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8860312Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8860318Z 2023-01-11T21:38:06.8860392Z import triton 2023-01-11T21:38:06.8860483Z import triton.language as tl 2023-01-11T21:38:06.8860608Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8860748Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8860753Z 2023-01-11T21:38:06.8860758Z 2023-01-11T21:38:06.8860895Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8861103Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8861226Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.8861328Z const int* __restrict__ in_ptr1, 2023-01-11T21:38:06.8861433Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8861498Z { 2023-01-11T21:38:06.8861598Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8861663Z { 2023-01-11T21:38:06.8861764Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8861850Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8861910Z { 2023-01-11T21:38:06.8862000Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8862068Z { 2023-01-11T21:38:06.8862138Z { 2023-01-11T21:38:06.8862209Z { 2023-01-11T21:38:06.8862317Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8862417Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.8862528Z auto tmp2 = static_cast<double>(tmp1); 2023-01-11T21:38:06.8862626Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8862757Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8862829Z } 2023-01-11T21:38:06.8862900Z } 2023-01-11T21:38:06.8862971Z } 2023-01-11T21:38:06.8863037Z } 2023-01-11T21:38:06.8863095Z } 2023-01-11T21:38:06.8863159Z } 2023-01-11T21:38:06.8863245Z ''') 2023-01-11T21:38:06.8863251Z 2023-01-11T21:38:06.8863255Z 2023-01-11T21:38:06.8863349Z async_compile.wait(globals()) 2023-01-11T21:38:06.8863427Z del async_compile 2023-01-11T21:38:06.8863432Z 2023-01-11T21:38:06.8863507Z def call(args): 2023-01-11T21:38:06.8863586Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8863654Z args.clear() 2023-01-11T21:38:06.8863853Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8864020Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8864092Z del arg0_1 2023-01-11T21:38:06.8864166Z del arg1_1 2023-01-11T21:38:06.8864242Z return (buf0, ) 2023-01-11T21:38:06.8864248Z 2023-01-11T21:38:06.8864252Z 2023-01-11T21:38:06.8864335Z if __name__ == "__main__":
2023-01-11T21:38:06.8864445Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8864571Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8864774Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8864965Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8865084Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8865089Z 2023-01-11T21:38:06.8865162Z ok (1.682s) 2023-01-11T21:38:06.8865642Z test_cpu_double_strided (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8865777Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8866066Z [2023-01-11 21:37:17,791] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1061 2023-01-11T21:38:06.8866330Z [2023-01-11 21:37:19,455] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1061 2023-01-11T21:38:06.8866336Z 2023-01-11T21:38:06.8866427Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8866503Z import torch 2023-01-11T21:38:06.8866578Z import random 2023-01-11T21:38:06.8866696Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8866820Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8866825Z 2023-01-11T21:38:06.8866908Z aten = torch.ops.aten 2023-01-11T21:38:06.8867045Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8867137Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8867152Z 2023-01-11T21:38:06.8867219Z import triton 2023-01-11T21:38:06.8867311Z import triton.language as tl 2023-01-11T21:38:06.8867436Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8867578Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8867583Z 2023-01-11T21:38:06.8867588Z 2023-01-11T21:38:06.8867724Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8867930Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8868054Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.8868165Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8868264Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8868329Z { 2023-01-11T21:38:06.8868430Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8868497Z { 2023-01-11T21:38:06.8868621Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8868710Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8868770Z { 2023-01-11T21:38:06.8868862Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8868931Z { 2023-01-11T21:38:06.8869000Z { 2023-01-11T21:38:06.8869074Z { 2023-01-11T21:38:06.8869184Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8869295Z auto tmp1 = in_ptr1[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8869404Z auto tmp2 = static_cast<double>(tmp1); 2023-01-11T21:38:06.8869507Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8869610Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8869682Z } 2023-01-11T21:38:06.8869752Z } 2023-01-11T21:38:06.8869819Z } 2023-01-11T21:38:06.8869887Z }
2023-01-11T21:38:06.8869951Z } 2023-01-11T21:38:06.8870013Z } 2023-01-11T21:38:06.8870098Z ''') 2023-01-11T21:38:06.8870104Z 2023-01-11T21:38:06.8870108Z 2023-01-11T21:38:06.8870203Z async_compile.wait(globals()) 2023-01-11T21:38:06.8870280Z del async_compile 2023-01-11T21:38:06.8870286Z 2023-01-11T21:38:06.8870359Z def call(args): 2023-01-11T21:38:06.8870439Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8870510Z args.clear() 2023-01-11T21:38:06.8870709Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8870874Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8870950Z del arg0_1 2023-01-11T21:38:06.8871024Z del arg1_1 2023-01-11T21:38:06.8871100Z return (buf0, ) 2023-01-11T21:38:06.8871106Z 2023-01-11T21:38:06.8871110Z 2023-01-11T21:38:06.8871191Z if __name__ == "__main__": 2023-01-11T21:38:06.8871309Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8871429Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8871629Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8871825Z arg1_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8871946Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8871981Z 2023-01-11T21:38:06.8872053Z ok (1.679s) 2023-01-11T21:38:06.8872530Z test_cpu_double_transposed (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8872665Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8872920Z [2023-01-11 21:37:19,470] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1062 2023-01-11T21:38:06.8873186Z [2023-01-11 21:37:21,136] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1062 2023-01-11T21:38:06.8873191Z 2023-01-11T21:38:06.8873289Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8873357Z import torch 2023-01-11T21:38:06.8873435Z import random 2023-01-11T21:38:06.8873555Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8873678Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8873683Z 2023-01-11T21:38:06.8873764Z aten = torch.ops.aten 2023-01-11T21:38:06.8873900Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8873994Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8873999Z 2023-01-11T21:38:06.8874066Z import triton 2023-01-11T21:38:06.8874158Z import triton.language as tl 2023-01-11T21:38:06.8874282Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8874451Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8874457Z 2023-01-11T21:38:06.8874461Z 2023-01-11T21:38:06.8874598Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8874802Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8874929Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.8875039Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8875138Z 
double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8875203Z { 2023-01-11T21:38:06.8875303Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8875369Z { 2023-01-11T21:38:06.8875465Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8875552Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8875619Z { 2023-01-11T21:38:06.8875703Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8875769Z { 2023-01-11T21:38:06.8875841Z { 2023-01-11T21:38:06.8875911Z { 2023-01-11T21:38:06.8876019Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8876125Z auto tmp1 = in_ptr1[i0 + (10*i1)]; 2023-01-11T21:38:06.8876243Z auto tmp2 = static_cast<double>(tmp1); 2023-01-11T21:38:06.8876339Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8876440Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8876513Z } 2023-01-11T21:38:06.8876582Z } 2023-01-11T21:38:06.8876650Z } 2023-01-11T21:38:06.8876717Z } 2023-01-11T21:38:06.8876783Z } 2023-01-11T21:38:06.8876840Z } 2023-01-11T21:38:06.8876924Z ''') 2023-01-11T21:38:06.8876929Z 2023-01-11T21:38:06.8876934Z 2023-01-11T21:38:06.8877028Z async_compile.wait(globals()) 2023-01-11T21:38:06.8877104Z del async_compile 2023-01-11T21:38:06.8877109Z 2023-01-11T21:38:06.8877186Z def call(args): 2023-01-11T21:38:06.8877264Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8877338Z args.clear() 2023-01-11T21:38:06.8877530Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8877692Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8877792Z del arg0_1 2023-01-11T21:38:06.8877865Z del arg1_1 2023-01-11T21:38:06.8877941Z return (buf0, ) 2023-01-11T21:38:06.8877946Z 2023-01-11T21:38:06.8877950Z 2023-01-11T21:38:06.8878030Z if __name__ == "__main__": 2023-01-11T21:38:06.8878146Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8878272Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8878464Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8878659Z arg1_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8878778Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8878786Z 2023-01-11T21:38:06.8878856Z ok (1.680s) 2023-01-11T21:38:06.8879330Z test_cpu_int_broadcast1 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8879463Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8879720Z [2023-01-11 21:37:21,149] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1063 2023-01-11T21:38:06.8879982Z [2023-01-11 21:37:22,820] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1063 2023-01-11T21:38:06.8879988Z 2023-01-11T21:38:06.8880087Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8880187Z import torch 2023-01-11T21:38:06.8880262Z import random 2023-01-11T21:38:06.8880380Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8880505Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8880510Z 2023-01-11T21:38:06.8880591Z aten = torch.ops.aten 2023-01-11T21:38:06.8880731Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8880829Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8880834Z 2023-01-11T21:38:06.8880908Z import triton 2023-01-11T21:38:06.8880993Z import triton.language as tl 2023-01-11T21:38:06.8881118Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8881258Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8881263Z 2023-01-11T21:38:06.8881268Z 2023-01-11T21:38:06.8881405Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8881608Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8881735Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:06.8881845Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8881950Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8882009Z { 2023-01-11T21:38:06.8882111Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8882179Z { 2023-01-11T21:38:06.8882259Z #pragma omp for 2023-01-11T21:38:06.8882345Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8882414Z { 2023-01-11T21:38:06.8882474Z { 2023-01-11T21:38:06.8882543Z { 2023-01-11T21:38:06.8882640Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8882738Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:06.8882851Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:06.8882946Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8883037Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.8883102Z } 2023-01-11T21:38:06.8883167Z } 2023-01-11T21:38:06.8883237Z } 2023-01-11T21:38:06.8883306Z } 2023-01-11T21:38:06.8883375Z } 2023-01-11T21:38:06.8883458Z ''') 2023-01-11T21:38:06.8883464Z 2023-01-11T21:38:06.8883468Z 2023-01-11T21:38:06.8883596Z async_compile.wait(globals()) 2023-01-11T21:38:06.8883668Z del async_compile 2023-01-11T21:38:06.8883680Z 2023-01-11T21:38:06.8883748Z def call(args): 2023-01-11T21:38:06.8883829Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8883904Z args.clear() 2023-01-11T21:38:06.8884098Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8884262Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8884339Z del arg0_1 2023-01-11T21:38:06.8884413Z del arg1_1 2023-01-11T21:38:06.8884481Z return (buf0, ) 2023-01-11T21:38:06.8884486Z 2023-01-11T21:38:06.8884494Z 2023-01-11T21:38:06.8884574Z if __name__ == "__main__": 2023-01-11T21:38:06.8884691Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8884817Z from torch._inductor.utils import print_performance
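# [editor's note, not part of the log] The static_cast<float>(tmp0) in the
# int_broadcast1 kernel above reflects PyTorch's type promotion: int32 + float32
# produces float32, so the generated C++ loads the int input and widens it to
# float before the add. A minimal check in eager mode (illustrative only):
import torch
a = torch.arange(10, dtype=torch.int32)
b = torch.randn(10, dtype=torch.float32)
assert (a + b).dtype == torch.float32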
2023-01-11T21:38:06.8885007Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8885202Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8885320Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8885326Z 2023-01-11T21:38:06.8885398Z ok (1.684s) 2023-01-11T21:38:06.8885852Z test_cpu_int_broadcast2 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8886010Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8886267Z [2023-01-11 21:37:22,834] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1064 2023-01-11T21:38:06.8886529Z [2023-01-11 21:37:24,498] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1064 2023-01-11T21:38:06.8886534Z 2023-01-11T21:38:06.8886631Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8886705Z import torch 2023-01-11T21:38:06.8886779Z import random 2023-01-11T21:38:06.8886897Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8887021Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8887026Z 2023-01-11T21:38:06.8887101Z aten = torch.ops.aten 2023-01-11T21:38:06.8887237Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8887332Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8887338Z 2023-01-11T21:38:06.8887414Z import triton 2023-01-11T21:38:06.8887507Z import triton.language as tl 2023-01-11T21:38:06.8887631Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8887769Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8887775Z 2023-01-11T21:38:06.8887779Z 2023-01-11T21:38:06.8887922Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8888116Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8888235Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:06.8888344Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8888447Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8888513Z { 2023-01-11T21:38:06.8888615Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8888681Z { 2023-01-11T21:38:06.8888768Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8888852Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8888924Z { 2023-01-11T21:38:06.8889014Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8889080Z { 2023-01-11T21:38:06.8889149Z { 2023-01-11T21:38:06.8889219Z { 2023-01-11T21:38:06.8889312Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.8889444Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:06.8889561Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:06.8889661Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8889763Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8889834Z } 2023-01-11T21:38:06.8889906Z } 2023-01-11T21:38:06.8889966Z } 2023-01-11T21:38:06.8890032Z } 2023-01-11T21:38:06.8890099Z } 2023-01-11T21:38:06.8890164Z } 2023-01-11T21:38:06.8890249Z ''') 2023-01-11T21:38:06.8890255Z 2023-01-11T21:38:06.8890259Z 2023-01-11T21:38:06.8890354Z async_compile.wait(globals())
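# NOTE (annotation): async_compile.cpp('''...''') above returns a handle while
# the C++ toolchain builds the kernel in a background worker; wait(globals())
# blocks until every pending kernel bound in this module's globals has finished
# compiling, after which the helper is dropped (del async_compile, next record).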
2023-01-11T21:38:06.8890430Z del async_compile 2023-01-11T21:38:06.8890435Z 2023-01-11T21:38:06.8890502Z def call(args): 2023-01-11T21:38:06.8890580Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8890655Z args.clear() 2023-01-11T21:38:06.8890866Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8891031Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8891105Z del arg0_1 2023-01-11T21:38:06.8891177Z del arg1_1 2023-01-11T21:38:06.8891245Z return (buf0, ) 2023-01-11T21:38:06.8891250Z 2023-01-11T21:38:06.8891261Z 2023-01-11T21:38:06.8891335Z if __name__ == "__main__": 2023-01-11T21:38:06.8891451Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8891576Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8891771Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8892015Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8892135Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8892140Z 2023-01-11T21:38:06.8892212Z ok (1.678s) 2023-01-11T21:38:06.8892688Z test_cpu_int_broadcast3 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8892813Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8893070Z [2023-01-11 21:37:24,512] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1065 2023-01-11T21:38:06.8893332Z [2023-01-11 21:37:26,168] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1065 2023-01-11T21:38:06.8893341Z 2023-01-11T21:38:06.8893438Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8893513Z import torch 2023-01-11T21:38:06.8893589Z import random 2023-01-11T21:38:06.8893708Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8893834Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8893839Z 2023-01-11T21:38:06.8893914Z aten = torch.ops.aten 2023-01-11T21:38:06.8894051Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8894146Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8894151Z 2023-01-11T21:38:06.8894228Z import triton 2023-01-11T21:38:06.8894320Z import triton.language as tl 2023-01-11T21:38:06.8894445Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8894694Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8894701Z 2023-01-11T21:38:06.8894708Z 2023-01-11T21:38:06.8894846Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8895044Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8895163Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:06.8895273Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8895424Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8895490Z { 2023-01-11T21:38:06.8895593Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8895659Z { 2023-01-11T21:38:06.8895734Z #pragma omp for 
2023-01-11T21:38:06.8895821Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8895886Z { 2023-01-11T21:38:06.8895954Z { 2023-01-11T21:38:06.8896020Z { 2023-01-11T21:38:06.8896117Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8896213Z auto tmp2 = in_ptr1[0]; 2023-01-11T21:38:06.8896319Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:06.8896419Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8896510Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.8896579Z } 2023-01-11T21:38:06.8896645Z } 2023-01-11T21:38:06.8896711Z } 2023-01-11T21:38:06.8896777Z } 2023-01-11T21:38:06.8896836Z } 2023-01-11T21:38:06.8896921Z ''') 2023-01-11T21:38:06.8896927Z 2023-01-11T21:38:06.8896931Z 2023-01-11T21:38:06.8897023Z async_compile.wait(globals()) 2023-01-11T21:38:06.8897098Z del async_compile 2023-01-11T21:38:06.8897103Z 2023-01-11T21:38:06.8897245Z def call(args): 2023-01-11T21:38:06.8897330Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8897406Z args.clear() 2023-01-11T21:38:06.8897593Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8897759Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8897831Z del arg0_1 2023-01-11T21:38:06.8897941Z del arg1_1 2023-01-11T21:38:06.8898015Z return (buf0, ) 2023-01-11T21:38:06.8898021Z 2023-01-11T21:38:06.8898026Z 2023-01-11T21:38:06.8898106Z if __name__ == "__main__": 2023-01-11T21:38:06.8898222Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8898350Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8898536Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8898724Z arg1_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8898842Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8898847Z 2023-01-11T21:38:06.8898917Z ok (1.671s) 2023-01-11T21:38:06.8899381Z test_cpu_int_dense (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8899515Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8899774Z [2023-01-11 21:37:26,182] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1066 2023-01-11T21:38:06.8900036Z [2023-01-11 21:37:27,834] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1066 2023-01-11T21:38:06.8900041Z 2023-01-11T21:38:06.8900139Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8900213Z import torch 2023-01-11T21:38:06.8900280Z import random 2023-01-11T21:38:06.8900397Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8900521Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8900526Z 2023-01-11T21:38:06.8900607Z aten = torch.ops.aten 2023-01-11T21:38:06.8900741Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8900840Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8900845Z 2023-01-11T21:38:06.8900919Z import triton 2023-01-11T21:38:06.8901004Z import triton.language as tl 2023-01-11T21:38:06.8901129Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8901294Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8901300Z 2023-01-11T21:38:06.8901304Z 2023-01-11T21:38:06.8901441Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8901645Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8901764Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:06.8901873Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8901978Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8902036Z { 2023-01-11T21:38:06.8902138Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8902207Z { 2023-01-11T21:38:06.8902301Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8902387Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8902453Z { 2023-01-11T21:38:06.8902544Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8902606Z { 2023-01-11T21:38:06.8902674Z { 2023-01-11T21:38:06.8902747Z { 2023-01-11T21:38:06.8902848Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.8902957Z auto tmp2 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8903072Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:06.8903169Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8903263Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8903333Z } 2023-01-11T21:38:06.8903401Z } 2023-01-11T21:38:06.8903469Z } 2023-01-11T21:38:06.8903534Z } 2023-01-11T21:38:06.8903628Z } 2023-01-11T21:38:06.8903685Z } 2023-01-11T21:38:06.8903769Z ''') 2023-01-11T21:38:06.8903774Z 2023-01-11T21:38:06.8903779Z 2023-01-11T21:38:06.8903871Z async_compile.wait(globals()) 2023-01-11T21:38:06.8903948Z del async_compile 2023-01-11T21:38:06.8903953Z 2023-01-11T21:38:06.8904027Z def call(args): 2023-01-11T21:38:06.8904108Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8904182Z args.clear() 2023-01-11T21:38:06.8904380Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8904537Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8904610Z del arg0_1 2023-01-11T21:38:06.8904683Z del arg1_1 2023-01-11T21:38:06.8904758Z return (buf0, ) 2023-01-11T21:38:06.8904763Z 2023-01-11T21:38:06.8904768Z 2023-01-11T21:38:06.8904846Z if __name__ == "__main__":
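# NOTE (annotation): per the kernel above, out[i0][i1] = float(a[i1]) + b[i0][i1],
# i.e. the (10,) int32 vector broadcasts across the rows of a contiguous
# (10, 10) float32 matrix, following torch's int32 + float32 -> float32 promotion.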
2023-01-11T21:38:06.8904962Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8905090Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8905273Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8905470Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8905589Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8905596Z 2023-01-11T21:38:06.8905667Z ok (1.665s) 2023-01-11T21:38:06.8906133Z test_cpu_int_double (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8906264Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8906522Z [2023-01-11 21:37:27,848] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1067 2023-01-11T21:38:06.8906784Z [2023-01-11 21:37:29,511] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1067 2023-01-11T21:38:06.8906789Z 2023-01-11T21:38:06.8906887Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8906961Z import torch 2023-01-11T21:38:06.8907057Z import random 2023-01-11T21:38:06.8907179Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8907302Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8907308Z 2023-01-11T21:38:06.8907388Z aten = torch.ops.aten 2023-01-11T21:38:06.8907525Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8907622Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8907627Z 2023-01-11T21:38:06.8907702Z import triton 2023-01-11T21:38:06.8907787Z import triton.language as tl 2023-01-11T21:38:06.8907915Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8908058Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8908063Z 2023-01-11T21:38:06.8908068Z 2023-01-11T21:38:06.8908204Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8908407Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8908532Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:06.8908644Z const double* __restrict__ in_ptr1, 2023-01-11T21:38:06.8908748Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8908807Z { 2023-01-11T21:38:06.8908911Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8908977Z { 2023-01-11T21:38:06.8909070Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8909155Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8909223Z { 2023-01-11T21:38:06.8909315Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8909375Z { 2023-01-11T21:38:06.8909478Z { 2023-01-11T21:38:06.8909550Z { 2023-01-11T21:38:06.8909649Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.8909756Z auto tmp2 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8909872Z auto tmp1 = static_cast<double>(tmp0); 2023-01-11T21:38:06.8909973Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8910068Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8910139Z } 2023-01-11T21:38:06.8910209Z } 2023-01-11T21:38:06.8910275Z } 2023-01-11T21:38:06.8910345Z } 2023-01-11T21:38:06.8910412Z }
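// NOTE (annotation): int32 + float64 promotes to float64, so the integer
// operand is widened with static_cast<double> before the add and the output
// buffer is allocated as torch.float64 in call() below.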
2023-01-11T21:38:06.8910475Z } 2023-01-11T21:38:06.8910553Z ''') 2023-01-11T21:38:06.8910560Z 2023-01-11T21:38:06.8910564Z 2023-01-11T21:38:06.8910660Z async_compile.wait(globals()) 2023-01-11T21:38:06.8910738Z del async_compile 2023-01-11T21:38:06.8910743Z 2023-01-11T21:38:06.8910816Z def call(args): 2023-01-11T21:38:06.8910899Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8910975Z args.clear() 2023-01-11T21:38:06.8911175Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8911335Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8911410Z del arg0_1 2023-01-11T21:38:06.8911482Z del arg1_1 2023-01-11T21:38:06.8911558Z return (buf0, ) 2023-01-11T21:38:06.8911563Z 2023-01-11T21:38:06.8911568Z 2023-01-11T21:38:06.8911652Z if __name__ == "__main__": 2023-01-11T21:38:06.8911770Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8911895Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8912087Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8912279Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8912399Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8912407Z 2023-01-11T21:38:06.8912477Z ok (1.677s) 2023-01-11T21:38:06.8912967Z test_cpu_int_int (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8913103Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8913361Z [2023-01-11 21:37:29,525] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1068 2023-01-11T21:38:06.8913625Z [2023-01-11 21:37:31,192] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1068 2023-01-11T21:38:06.8913631Z 2023-01-11T21:38:06.8913728Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8913805Z import torch 2023-01-11T21:38:06.8913873Z import random 2023-01-11T21:38:06.8913995Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8914123Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8914128Z 2023-01-11T21:38:06.8914210Z aten = torch.ops.aten 2023-01-11T21:38:06.8914350Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8914447Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8914452Z 2023-01-11T21:38:06.8914525Z import triton 2023-01-11T21:38:06.8914617Z import triton.language as tl 2023-01-11T21:38:06.8914734Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8914873Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8914878Z 2023-01-11T21:38:06.8914883Z 2023-01-11T21:38:06.8915019Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8915225Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8915373Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:06.8915480Z const int* __restrict__ in_ptr1, 2023-01-11T21:38:06.8915583Z int* __restrict__ out_ptr0) 
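// NOTE (annotation): both operands are int32 in this kernel, so unlike the
// mixed-dtype kernels no static_cast is emitted and the sum stays integral.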
2023-01-11T21:38:06.8915651Z { 2023-01-11T21:38:06.8915746Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8915814Z { 2023-01-11T21:38:06.8915894Z #pragma omp for 2023-01-11T21:38:06.8915980Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8916048Z { 2023-01-11T21:38:06.8916118Z { 2023-01-11T21:38:06.8916179Z { 2023-01-11T21:38:06.8916276Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8916374Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8916468Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8916559Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8916631Z } 2023-01-11T21:38:06.8916699Z } 2023-01-11T21:38:06.8916762Z } 2023-01-11T21:38:06.8916829Z } 2023-01-11T21:38:06.8916893Z } 2023-01-11T21:38:06.8916977Z ''') 2023-01-11T21:38:06.8916983Z 2023-01-11T21:38:06.8916987Z 2023-01-11T21:38:06.8917079Z async_compile.wait(globals()) 2023-01-11T21:38:06.8917155Z del async_compile 2023-01-11T21:38:06.8917160Z 2023-01-11T21:38:06.8917238Z def call(args): 2023-01-11T21:38:06.8917310Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8917385Z args.clear() 2023-01-11T21:38:06.8917575Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8917743Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8917816Z del arg0_1 2023-01-11T21:38:06.8917887Z del arg1_1 2023-01-11T21:38:06.8917962Z return (buf0, ) 2023-01-11T21:38:06.8917966Z 2023-01-11T21:38:06.8917971Z 2023-01-11T21:38:06.8918049Z if __name__ == "__main__": 2023-01-11T21:38:06.8918159Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8918288Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8918479Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8918668Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8918817Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8918823Z 2023-01-11T21:38:06.8918896Z ok (1.681s) 2023-01-11T21:38:06.8919362Z test_cpu_int_strided (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8919493Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8919752Z [2023-01-11 21:37:31,207] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1069 2023-01-11T21:38:06.8920006Z [2023-01-11 21:37:32,869] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1069 2023-01-11T21:38:06.8920019Z 2023-01-11T21:38:06.8920113Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8920186Z import torch 2023-01-11T21:38:06.8920261Z import random 2023-01-11T21:38:06.8920380Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8920505Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8920510Z 2023-01-11T21:38:06.8920591Z aten = torch.ops.aten 2023-01-11T21:38:06.8920727Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8920816Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8920821Z 2023-01-11T21:38:06.8920894Z import triton 2023-01-11T21:38:06.8920985Z import triton.language as tl 2023-01-11T21:38:06.8921111Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8921279Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8921285Z 2023-01-11T21:38:06.8921289Z 2023-01-11T21:38:06.8921428Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8921636Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8921760Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:06.8921866Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8921977Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8922044Z { 2023-01-11T21:38:06.8922148Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8922217Z { 2023-01-11T21:38:06.8922315Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8922405Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8922467Z { 2023-01-11T21:38:06.8922561Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8922633Z { 2023-01-11T21:38:06.8922705Z { 2023-01-11T21:38:06.8922775Z { 2023-01-11T21:38:06.8922877Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.8922989Z auto tmp2 = in_ptr1[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8923105Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:06.8923207Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8923310Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8923382Z } 2023-01-11T21:38:06.8929541Z } 2023-01-11T21:38:06.8929626Z } 2023-01-11T21:38:06.8929697Z } 2023-01-11T21:38:06.8929769Z } 2023-01-11T21:38:06.8929838Z } 2023-01-11T21:38:06.8929939Z ''') 2023-01-11T21:38:06.8929944Z 2023-01-11T21:38:06.8929949Z 2023-01-11T21:38:06.8930049Z async_compile.wait(globals()) 2023-01-11T21:38:06.8930124Z del async_compile 2023-01-11T21:38:06.8930134Z 2023-01-11T21:38:06.8930214Z def call(args): 2023-01-11T21:38:06.8930294Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8930365Z args.clear() 2023-01-11T21:38:06.8930567Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8930785Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8930859Z del arg0_1 2023-01-11T21:38:06.8930931Z del arg1_1 2023-01-11T21:38:06.8931007Z return (buf0, ) 2023-01-11T21:38:06.8931012Z 2023-01-11T21:38:06.8931016Z 2023-01-11T21:38:06.8931098Z if __name__ ==
"__main__": 2023-01-11T21:38:06.8931217Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8931339Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8931531Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8931732Z arg1_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8931859Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8931864Z 2023-01-11T21:38:06.8931935Z ok (1.677s) 2023-01-11T21:38:06.8932413Z test_cpu_int_transposed (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8932545Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8932805Z [2023-01-11 21:37:32,883] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1070 2023-01-11T21:38:06.8933069Z [2023-01-11 21:37:34,551] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1070 2023-01-11T21:38:06.8933075Z 2023-01-11T21:38:06.8933197Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8933272Z import torch 2023-01-11T21:38:06.8933349Z import random 2023-01-11T21:38:06.8933468Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8933591Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8933596Z 2023-01-11T21:38:06.8933679Z aten = torch.ops.aten 2023-01-11T21:38:06.8933815Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8933911Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8933917Z 2023-01-11T21:38:06.8933983Z import triton 2023-01-11T21:38:06.8934073Z import triton.language as tl 2023-01-11T21:38:06.8934198Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8934336Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8934341Z 2023-01-11T21:38:06.8934346Z 2023-01-11T21:38:06.8934622Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8934830Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8934954Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:06.8935065Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8935162Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8935227Z { 2023-01-11T21:38:06.8935334Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8935400Z { 2023-01-11T21:38:06.8935499Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8935608Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8935668Z { 2023-01-11T21:38:06.8935783Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8935853Z { 2023-01-11T21:38:06.8935922Z { 2023-01-11T21:38:06.8935992Z { 2023-01-11T21:38:06.8936092Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8936201Z auto tmp2 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8936313Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:06.8936412Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8936517Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8936588Z } 2023-01-11T21:38:06.8936655Z } 2023-01-11T21:38:06.8936774Z } 2023-01-11T21:38:06.8936843Z } 2023-01-11T21:38:06.8936904Z 
} 2023-01-11T21:38:06.8936969Z } 2023-01-11T21:38:06.8937055Z ''') 2023-01-11T21:38:06.8937060Z 2023-01-11T21:38:06.8937065Z 2023-01-11T21:38:06.8937223Z async_compile.wait(globals()) 2023-01-11T21:38:06.8937315Z del async_compile 2023-01-11T21:38:06.8937321Z 2023-01-11T21:38:06.8937395Z def call(args): 2023-01-11T21:38:06.8937475Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8937543Z args.clear() 2023-01-11T21:38:06.8937749Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8937916Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8937993Z del arg0_1 2023-01-11T21:38:06.8938063Z del arg1_1 2023-01-11T21:38:06.8938138Z return (buf0, ) 2023-01-11T21:38:06.8938143Z 2023-01-11T21:38:06.8938147Z 2023-01-11T21:38:06.8938226Z if __name__ == "__main__": 2023-01-11T21:38:06.8938345Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8938464Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8938657Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8938857Z arg1_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8938976Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8938982Z 2023-01-11T21:38:06.8939053Z ok (1.682s) 2023-01-11T21:38:06.8939528Z test_cpu_strided_broadcast1 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8939749Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8940009Z [2023-01-11 21:37:34,565] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1071 2023-01-11T21:38:06.8940273Z [2023-01-11 21:37:36,265] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1071 2023-01-11T21:38:06.8940278Z 2023-01-11T21:38:06.8940376Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8940444Z import torch 2023-01-11T21:38:06.8940518Z import random 2023-01-11T21:38:06.8940638Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8940762Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8940771Z 2023-01-11T21:38:06.8940854Z aten = torch.ops.aten 2023-01-11T21:38:06.8940989Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8941084Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8941089Z 2023-01-11T21:38:06.8941156Z import triton 2023-01-11T21:38:06.8941247Z import triton.language as tl 2023-01-11T21:38:06.8941374Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8941513Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8941519Z 2023-01-11T21:38:06.8941523Z 2023-01-11T21:38:06.8941661Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8941870Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8941995Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8942104Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8942200Z float* __restrict__ out_ptr0) 
2023-01-11T21:38:06.8942267Z { 2023-01-11T21:38:06.8942367Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8942432Z { 2023-01-11T21:38:06.8942529Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8942614Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8942679Z { 2023-01-11T21:38:06.8942796Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8942866Z { 2023-01-11T21:38:06.8942934Z { 2023-01-11T21:38:06.8943005Z { 2023-01-11T21:38:06.8943117Z auto tmp0 = in_ptr0[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8943218Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.8943316Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8943410Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8943480Z } 2023-01-11T21:38:06.8943549Z } 2023-01-11T21:38:06.8943616Z } 2023-01-11T21:38:06.8943684Z } 2023-01-11T21:38:06.8943749Z } 2023-01-11T21:38:06.8943807Z } 2023-01-11T21:38:06.8943892Z ''') 2023-01-11T21:38:06.8943897Z 2023-01-11T21:38:06.8943902Z 2023-01-11T21:38:06.8943993Z async_compile.wait(globals()) 2023-01-11T21:38:06.8944069Z del async_compile 2023-01-11T21:38:06.8944073Z 2023-01-11T21:38:06.8944147Z def call(args): 2023-01-11T21:38:06.8944228Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8944304Z args.clear() 2023-01-11T21:38:06.8944502Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8944662Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8944733Z del arg0_1 2023-01-11T21:38:06.8944803Z del arg1_1 2023-01-11T21:38:06.8944878Z return (buf0, ) 2023-01-11T21:38:06.8944883Z 2023-01-11T21:38:06.8944887Z 2023-01-11T21:38:06.8944968Z if __name__ == "__main__": 2023-01-11T21:38:06.8945084Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8945239Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8945437Z arg0_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8945622Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8945743Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8945749Z 2023-01-11T21:38:06.8945820Z ok (1.714s) 2023-01-11T21:38:06.8946293Z test_cpu_strided_broadcast2 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8946424Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8946686Z [2023-01-11 21:37:36,280] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1072 2023-01-11T21:38:06.8946949Z [2023-01-11 21:37:37,943] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1072 2023-01-11T21:38:06.8946955Z 2023-01-11T21:38:06.8947051Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8947126Z import torch 2023-01-11T21:38:06.8947193Z import random 2023-01-11T21:38:06.8947312Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8947436Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8947441Z 2023-01-11T21:38:06.8947521Z aten = torch.ops.aten 2023-01-11T21:38:06.8947656Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8947751Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8947757Z 2023-01-11T21:38:06.8947830Z import triton 2023-01-11T21:38:06.8947922Z import triton.language as tl 2023-01-11T21:38:06.8948038Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8948179Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8948184Z 2023-01-11T21:38:06.8948189Z 2023-01-11T21:38:06.8948326Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8948556Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8948681Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8948793Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8948900Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8948968Z { 2023-01-11T21:38:06.8949064Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8949132Z { 2023-01-11T21:38:06.8949231Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8949320Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8949389Z { 2023-01-11T21:38:06.8949480Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8949545Z { 2023-01-11T21:38:06.8949615Z { 2023-01-11T21:38:06.8949687Z { 2023-01-11T21:38:06.8949803Z auto tmp0 = in_ptr0[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8949905Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8950007Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8950109Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8950176Z } 2023-01-11T21:38:06.8950247Z } 2023-01-11T21:38:06.8950316Z } 2023-01-11T21:38:06.8950385Z } 2023-01-11T21:38:06.8950453Z } 2023-01-11T21:38:06.8950518Z } 2023-01-11T21:38:06.8950605Z ''') 2023-01-11T21:38:06.8950610Z 2023-01-11T21:38:06.8950615Z 2023-01-11T21:38:06.8950704Z async_compile.wait(globals()) 2023-01-11T21:38:06.8950783Z del async_compile 2023-01-11T21:38:06.8950788Z 2023-01-11T21:38:06.8950866Z def call(args): 2023-01-11T21:38:06.8950948Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8951058Z args.clear() 2023-01-11T21:38:06.8951267Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8951433Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8951505Z del arg0_1 2023-01-11T21:38:06.8951572Z del arg1_1 2023-01-11T21:38:06.8951647Z return (buf0, ) 2023-01-11T21:38:06.8951652Z 2023-01-11T21:38:06.8951657Z 2023-01-11T21:38:06.8951738Z if __name__ == "__main__": 2023-01-11T21:38:06.8951856Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8951983Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8952178Z arg0_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8952381Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8952492Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8952509Z 2023-01-11T21:38:06.8952573Z ok (1.677s) 2023-01-11T21:38:06.8953045Z test_cpu_strided_broadcast3 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8953177Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8953434Z [2023-01-11 21:37:37,957] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1073 2023-01-11T21:38:06.8953695Z [2023-01-11 21:37:39,618] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1073 2023-01-11T21:38:06.8953701Z 2023-01-11T21:38:06.8953799Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8953875Z import torch 2023-01-11T21:38:06.8953951Z import random 2023-01-11T21:38:06.8954070Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8954186Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8954191Z 2023-01-11T21:38:06.8954272Z aten = torch.ops.aten 2023-01-11T21:38:06.8954432Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8954529Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8954534Z 2023-01-11T21:38:06.8954608Z import triton 2023-01-11T21:38:06.8954700Z import triton.language as tl 2023-01-11T21:38:06.8954823Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8954955Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8954967Z 2023-01-11T21:38:06.8954972Z 2023-01-11T21:38:06.8955101Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8955316Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8955456Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8955586Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8955688Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8955753Z { 2023-01-11T21:38:06.8955852Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8955915Z { 2023-01-11T21:38:06.8956010Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8956094Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8956160Z { 2023-01-11T21:38:06.8956249Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8956319Z { 2023-01-11T21:38:06.8956389Z { 2023-01-11T21:38:06.8956453Z { 2023-01-11T21:38:06.8956562Z auto tmp0 = in_ptr0[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8956663Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:06.8956762Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8956895Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8956966Z } 2023-01-11T21:38:06.8957034Z } 2023-01-11T21:38:06.8957094Z } 2023-01-11T21:38:06.8957160Z } 2023-01-11T21:38:06.8957225Z } 2023-01-11T21:38:06.8957290Z } 2023-01-11T21:38:06.8957375Z ''') 
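# NOTE (annotation): arg1_1 is a one-element tensor here, so the kernel reads
# the scalar once as in_ptr1[0] and adds it to every element of the strided
# input; the (30, 2) input strides are flattened into a dense (10, 1) output.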
2023-01-11T21:38:06.8957383Z 2023-01-11T21:38:06.8957388Z 2023-01-11T21:38:06.8957480Z async_compile.wait(globals()) 2023-01-11T21:38:06.8957555Z del async_compile 2023-01-11T21:38:06.8957560Z 2023-01-11T21:38:06.8957628Z def call(args): 2023-01-11T21:38:06.8957706Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8957780Z args.clear() 2023-01-11T21:38:06.8957978Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8958143Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8958216Z del arg0_1 2023-01-11T21:38:06.8958286Z del arg1_1 2023-01-11T21:38:06.8958358Z return (buf0, ) 2023-01-11T21:38:06.8958363Z 2023-01-11T21:38:06.8958367Z 2023-01-11T21:38:06.8958447Z if __name__ == "__main__": 2023-01-11T21:38:06.8958563Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8958688Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8958886Z arg0_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8959075Z arg1_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8959193Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8959198Z 2023-01-11T21:38:06.8959270Z ok (1.675s) 2023-01-11T21:38:06.8959744Z test_cpu_strided_dense (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8959872Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8960133Z [2023-01-11 21:37:39,632] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1074 2023-01-11T21:38:06.8960425Z [2023-01-11 21:37:41,294] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1074 2023-01-11T21:38:06.8960432Z 2023-01-11T21:38:06.8960530Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8960605Z import torch 2023-01-11T21:38:06.8960681Z import random 2023-01-11T21:38:06.8960800Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8960924Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8960929Z 2023-01-11T21:38:06.8961003Z aten = torch.ops.aten 2023-01-11T21:38:06.8961139Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8961238Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8961243Z 2023-01-11T21:38:06.8961315Z import triton 2023-01-11T21:38:06.8961407Z import triton.language as tl 2023-01-11T21:38:06.8961530Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8961669Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8961677Z 2023-01-11T21:38:06.8961681Z 2023-01-11T21:38:06.8961818Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8962018Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8962141Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8962250Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8962354Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8962419Z { 2023-01-11T21:38:06.8962519Z #pragma omp 
parallel num_threads(8) 2023-01-11T21:38:06.8962585Z { 2023-01-11T21:38:06.8962702Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8962788Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8962854Z { 2023-01-11T21:38:06.8962946Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8963014Z { 2023-01-11T21:38:06.8963081Z { 2023-01-11T21:38:06.8963151Z { 2023-01-11T21:38:06.8963256Z auto tmp0 = in_ptr0[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8963366Z auto tmp1 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8963465Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8963566Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8963637Z } 2023-01-11T21:38:06.8963706Z } 2023-01-11T21:38:06.8963773Z } 2023-01-11T21:38:06.8963833Z } 2023-01-11T21:38:06.8963896Z } 2023-01-11T21:38:06.8963959Z } 2023-01-11T21:38:06.8964044Z ''') 2023-01-11T21:38:06.8964049Z 2023-01-11T21:38:06.8964057Z 2023-01-11T21:38:06.8964150Z async_compile.wait(globals()) 2023-01-11T21:38:06.8964226Z del async_compile 2023-01-11T21:38:06.8964231Z 2023-01-11T21:38:06.8964305Z def call(args): 2023-01-11T21:38:06.8964377Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8964450Z args.clear() 2023-01-11T21:38:06.8964649Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8964814Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8964886Z del arg0_1 2023-01-11T21:38:06.8964957Z del arg1_1 2023-01-11T21:38:06.8965031Z return (buf0, ) 2023-01-11T21:38:06.8965036Z 2023-01-11T21:38:06.8965041Z 2023-01-11T21:38:06.8965120Z if __name__ == "__main__": 2023-01-11T21:38:06.8965232Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8965358Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8965556Z arg0_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8965757Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8965875Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8965880Z 2023-01-11T21:38:06.8965950Z ok (1.676s) 2023-01-11T21:38:06.8966451Z test_cpu_strided_double (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8966582Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8966843Z [2023-01-11 21:37:41,309] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1075 2023-01-11T21:38:06.8967100Z [2023-01-11 21:37:42,971] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1075 2023-01-11T21:38:06.8967117Z 2023-01-11T21:38:06.8967207Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8967281Z import torch 2023-01-11T21:38:06.8967357Z import random 2023-01-11T21:38:06.8967477Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8967600Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8967605Z 2023-01-11T21:38:06.8967687Z aten = torch.ops.aten 2023-01-11T21:38:06.8967821Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8967910Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8967915Z 2023-01-11T21:38:06.8967991Z import triton 2023-01-11T21:38:06.8968086Z import triton.language as tl 2023-01-11T21:38:06.8968208Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8968348Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8968381Z 2023-01-11T21:38:06.8968385Z 2023-01-11T21:38:06.8968528Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8968732Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8968856Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8968966Z const double* __restrict__ in_ptr1, 2023-01-11T21:38:06.8969070Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8969138Z { 2023-01-11T21:38:06.8969238Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8969305Z { 2023-01-11T21:38:06.8969400Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8969483Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8969543Z { 2023-01-11T21:38:06.8969633Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8969700Z { 2023-01-11T21:38:06.8969769Z { 2023-01-11T21:38:06.8969838Z { 2023-01-11T21:38:06.8969953Z auto tmp0 = in_ptr0[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8970060Z auto tmp2 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8970170Z auto tmp1 = static_cast<double>(tmp0); 2023-01-11T21:38:06.8970270Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8970373Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8970444Z } 2023-01-11T21:38:06.8970513Z } 2023-01-11T21:38:06.8970579Z } 2023-01-11T21:38:06.8970646Z } 2023-01-11T21:38:06.8970705Z } 2023-01-11T21:38:06.8970768Z } 2023-01-11T21:38:06.8970854Z ''') 2023-01-11T21:38:06.8970860Z 2023-01-11T21:38:06.8970864Z 2023-01-11T21:38:06.8970959Z async_compile.wait(globals()) 2023-01-11T21:38:06.8971038Z del async_compile 2023-01-11T21:38:06.8971043Z 2023-01-11T21:38:06.8971116Z def call(args): 2023-01-11T21:38:06.8971197Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8971265Z args.clear() 2023-01-11T21:38:06.8971468Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8971637Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8971711Z del arg0_1 2023-01-11T21:38:06.8971782Z del arg1_1 2023-01-11T21:38:06.8971883Z return (buf0, ) 2023-01-11T21:38:06.8971889Z 2023-01-11T21:38:06.8971893Z 2023-01-11T21:38:06.8971973Z if
__name__ == "__main__": 2023-01-11T21:38:06.8972084Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8972209Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8972411Z arg0_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8972607Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8972726Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8972732Z 2023-01-11T21:38:06.8972805Z ok (1.676s) 2023-01-11T21:38:06.8973275Z test_cpu_strided_int (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8973405Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8973662Z [2023-01-11 21:37:42,985] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1076 2023-01-11T21:38:06.8973923Z [2023-01-11 21:37:44,652] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1076 2023-01-11T21:38:06.8973929Z 2023-01-11T21:38:06.8974019Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8974095Z import torch 2023-01-11T21:38:06.8974170Z import random 2023-01-11T21:38:06.8974317Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8974440Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8974445Z 2023-01-11T21:38:06.8974631Z aten = torch.ops.aten 2023-01-11T21:38:06.8974769Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8974862Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8974874Z 2023-01-11T21:38:06.8974941Z import triton 2023-01-11T21:38:06.8975035Z import triton.language as tl 2023-01-11T21:38:06.8975160Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8975297Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8975303Z 2023-01-11T21:38:06.8975308Z 2023-01-11T21:38:06.8975451Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8975669Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8975806Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8975936Z const int* __restrict__ in_ptr1, 2023-01-11T21:38:06.8976033Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8976098Z { 2023-01-11T21:38:06.8976198Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8976265Z { 2023-01-11T21:38:06.8976362Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8976446Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8976506Z { 2023-01-11T21:38:06.8976596Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8976665Z { 2023-01-11T21:38:06.8976735Z { 2023-01-11T21:38:06.8976805Z { 2023-01-11T21:38:06.8976918Z auto tmp0 = in_ptr0[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8977019Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.8977174Z auto tmp2 = static_cast<float>(tmp1); 2023-01-11T21:38:06.8977286Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8977388Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8977459Z } 2023-01-11T21:38:06.8977529Z } 2023-01-11T21:38:06.8977596Z } 2023-01-11T21:38:06.8977665Z }
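// NOTE (annotation): the dtype roles flip relative to the int_* kernels: the
// strided float32 tensor is in_ptr0 and the int32 vector in_ptr1 is cast to
// float before the add; reads follow the (30, 2) strides while writes land
// contiguously at i1 + (10*i0), so the compiled kernel also produces a dense result.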
2023-01-11T21:38:06.8977723Z } 2023-01-11T21:38:06.8977784Z } 2023-01-11T21:38:06.8977918Z ''') 2023-01-11T21:38:06.8977925Z 2023-01-11T21:38:06.8977929Z 2023-01-11T21:38:06.8978027Z async_compile.wait(globals()) 2023-01-11T21:38:06.8978102Z del async_compile 2023-01-11T21:38:06.8978107Z 2023-01-11T21:38:06.8978181Z def call(args): 2023-01-11T21:38:06.8978259Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8978327Z args.clear() 2023-01-11T21:38:06.8978527Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8978695Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8978770Z del arg0_1 2023-01-11T21:38:06.8978843Z del arg1_1 2023-01-11T21:38:06.8978918Z return (buf0, ) 2023-01-11T21:38:06.8978923Z 2023-01-11T21:38:06.8978927Z 2023-01-11T21:38:06.8979007Z if __name__ == "__main__": 2023-01-11T21:38:06.8979122Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8979240Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8979439Z arg0_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8979631Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8979749Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8979754Z 2023-01-11T21:38:06.8979824Z ok (1.681s) 2023-01-11T21:38:06.8980299Z test_cpu_strided_strided (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8980477Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8980743Z [2023-01-11 21:37:44,666] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1077 2023-01-11T21:38:06.8981008Z [2023-01-11 21:37:46,321] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1077 2023-01-11T21:38:06.8981014Z 2023-01-11T21:38:06.8981115Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8981185Z import torch 2023-01-11T21:38:06.8981261Z import random 2023-01-11T21:38:06.8981382Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8981508Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8981513Z 2023-01-11T21:38:06.8981597Z aten = torch.ops.aten 2023-01-11T21:38:06.8981735Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8981838Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8981843Z 2023-01-11T21:38:06.8981913Z import triton 2023-01-11T21:38:06.8982009Z import triton.language as tl 2023-01-11T21:38:06.8982141Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8982287Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8982292Z 2023-01-11T21:38:06.8982296Z 2023-01-11T21:38:06.8982434Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8982640Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8982766Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8982877Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8982975Z float* 
__restrict__ out_ptr0) 2023-01-11T21:38:06.8983044Z { 2023-01-11T21:38:06.8983147Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8983216Z { 2023-01-11T21:38:06.8983316Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8983403Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8983473Z { 2023-01-11T21:38:06.8983559Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8983628Z { 2023-01-11T21:38:06.8983698Z { 2023-01-11T21:38:06.8983799Z { 2023-01-11T21:38:06.8983917Z auto tmp0 = in_ptr0[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8984029Z auto tmp1 = in_ptr1[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8984130Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8984227Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8984301Z } 2023-01-11T21:38:06.8984371Z } 2023-01-11T21:38:06.8984442Z } 2023-01-11T21:38:06.8984511Z } 2023-01-11T21:38:06.8984579Z } 2023-01-11T21:38:06.8984639Z } 2023-01-11T21:38:06.8984726Z ''') 2023-01-11T21:38:06.8984734Z 2023-01-11T21:38:06.8984738Z 2023-01-11T21:38:06.8984833Z async_compile.wait(globals()) 2023-01-11T21:38:06.8984911Z del async_compile 2023-01-11T21:38:06.8984916Z 2023-01-11T21:38:06.8984993Z def call(args): 2023-01-11T21:38:06.8985074Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8985152Z args.clear() 2023-01-11T21:38:06.8985356Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8985520Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8985595Z del arg0_1 2023-01-11T21:38:06.8985670Z del arg1_1 2023-01-11T21:38:06.8985748Z return (buf0, ) 2023-01-11T21:38:06.8985753Z 2023-01-11T21:38:06.8985757Z 2023-01-11T21:38:06.8985839Z if __name__ == "__main__": 2023-01-11T21:38:06.8985960Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8986089Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8986289Z arg0_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8986511Z arg1_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8986634Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8986639Z 2023-01-11T21:38:06.8986711Z ok (1.669s) 2023-01-11T21:38:06.8987198Z test_cpu_strided_transposed (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8987332Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8987593Z [2023-01-11 21:37:46,335] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1078 2023-01-11T21:38:06.8987862Z [2023-01-11 21:37:47,998] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1078 2023-01-11T21:38:06.8987868Z 2023-01-11T21:38:06.8987969Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8988046Z import torch 2023-01-11T21:38:06.8988117Z import random 2023-01-11T21:38:06.8988242Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8988367Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8988372Z 2023-01-11T21:38:06.8988455Z aten = torch.ops.aten 2023-01-11T21:38:06.8988595Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8988693Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8988698Z 2023-01-11T21:38:06.8988776Z import triton 2023-01-11T21:38:06.8988878Z import triton.language as tl 2023-01-11T21:38:06.8988999Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8989141Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8989148Z 2023-01-11T21:38:06.8989152Z 2023-01-11T21:38:06.8989292Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8989501Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8989627Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8989764Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8989872Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8989940Z { 2023-01-11T21:38:06.8990036Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8990104Z { 2023-01-11T21:38:06.8990201Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8990289Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8990359Z { 2023-01-11T21:38:06.8990452Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8990515Z { 2023-01-11T21:38:06.8990588Z { 2023-01-11T21:38:06.8990661Z { 2023-01-11T21:38:06.8990774Z auto tmp0 = in_ptr0[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8990886Z auto tmp1 = in_ptr1[i0 + (10*i1)]; 2023-01-11T21:38:06.8990991Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8991097Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8991165Z } 2023-01-11T21:38:06.8991236Z } 2023-01-11T21:38:06.8991305Z } 2023-01-11T21:38:06.8991374Z } 2023-01-11T21:38:06.8991441Z } 2023-01-11T21:38:06.8991507Z } 2023-01-11T21:38:06.8991592Z ''') 2023-01-11T21:38:06.8991597Z 2023-01-11T21:38:06.8991602Z 2023-01-11T21:38:06.8991691Z async_compile.wait(globals()) 2023-01-11T21:38:06.8991772Z del async_compile 2023-01-11T21:38:06.8991777Z 2023-01-11T21:38:06.8991853Z def call(args): 2023-01-11T21:38:06.8991935Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8992012Z args.clear() 2023-01-11T21:38:06.8992213Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8992411Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8992487Z del arg0_1 2023-01-11T21:38:06.8992554Z del arg1_1 2023-01-11T21:38:06.8992631Z return (buf0, ) 2023-01-11T21:38:06.8992636Z 2023-01-11T21:38:06.8992643Z 2023-01-11T21:38:06.8992730Z if __name__ == "__main__": 2023-01-11T21:38:06.8992848Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8992976Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8993180Z arg0_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8993377Z arg1_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8993493Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8993504Z 2023-01-11T21:38:06.8993571Z ok (1.677s) 2023-01-11T21:38:06.8994054Z test_cpu_transposed_broadcast1 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8994189Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8994449Z [2023-01-11 21:37:48,012] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1079 2023-01-11T21:38:06.8994711Z [2023-01-11 21:37:48,021] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1079 2023-01-11T21:38:06.8994717Z 2023-01-11T21:38:06.8994817Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8994893Z import torch 2023-01-11T21:38:06.8994969Z import random 2023-01-11T21:38:06.8995091Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8995212Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8995217Z 2023-01-11T21:38:06.8995298Z aten = torch.ops.aten 2023-01-11T21:38:06.8995437Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8995533Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8995538Z 2023-01-11T21:38:06.8995640Z import triton 2023-01-11T21:38:06.8995737Z import triton.language as tl 2023-01-11T21:38:06.8995862Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8995996Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8996008Z 2023-01-11T21:38:06.8996012Z 2023-01-11T21:38:06.8996145Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8996351Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8996474Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8996582Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8996695Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8996764Z { 2023-01-11T21:38:06.8996869Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8996931Z { 2023-01-11T21:38:06.8997014Z #pragma omp for 2023-01-11T21:38:06.8997106Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8997174Z { 2023-01-11T21:38:06.8997264Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.8997339Z { 2023-01-11T21:38:06.8997495Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8997624Z auto tmp1 = at::vec::Vectorized<float>(in_ptr1[i0]); 2023-01-11T21:38:06.8997716Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8997828Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8997899Z } 2023-01-11T21:38:06.8997996Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.8998117Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.8998185Z { 2023-01-11T21:38:06.8998282Z auto tmp0 = in_ptr0[i1 + (10*i0)];
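// note: scalar remainder of the 8-lane vectorized loop above; i1 = 8..9
// re-reads the same per-row broadcast scalar in_ptr1[i0] that the main
// loop splatted into a Vectorized<float>.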
2023-01-11T21:38:06.8998373Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8998469Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8998575Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8998643Z } 2023-01-11T21:38:06.8998712Z } 2023-01-11T21:38:06.8998780Z } 2023-01-11T21:38:06.8998840Z } 2023-01-11T21:38:06.8998926Z ''') 2023-01-11T21:38:06.8998931Z 2023-01-11T21:38:06.8998936Z 2023-01-11T21:38:06.8999031Z async_compile.wait(globals()) 2023-01-11T21:38:06.8999110Z del async_compile 2023-01-11T21:38:06.8999115Z 2023-01-11T21:38:06.8999192Z def call(args): 2023-01-11T21:38:06.8999273Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8999350Z args.clear() 2023-01-11T21:38:06.8999546Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8999718Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8999795Z del arg0_1 2023-01-11T21:38:06.8999868Z del arg1_1 2023-01-11T21:38:06.8999945Z return (buf0, ) 2023-01-11T21:38:06.8999950Z 2023-01-11T21:38:06.8999954Z 2023-01-11T21:38:06.9000039Z if __name__ == "__main__": 2023-01-11T21:38:06.9000159Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9000287Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9000480Z arg0_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9000675Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9000795Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9000800Z 2023-01-11T21:38:06.9000876Z ok (0.022s) 2023-01-11T21:38:06.9001357Z test_cpu_transposed_broadcast2 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9001520Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9001783Z [2023-01-11 21:37:48,035] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1080 2023-01-11T21:38:06.9002050Z [2023-01-11 21:37:48,044] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1080 2023-01-11T21:38:06.9002056Z 2023-01-11T21:38:06.9002157Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9002233Z import torch 2023-01-11T21:38:06.9002303Z import random 2023-01-11T21:38:06.9002425Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9002558Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9002563Z 2023-01-11T21:38:06.9002645Z aten = torch.ops.aten 2023-01-11T21:38:06.9002783Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9002882Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9002887Z 2023-01-11T21:38:06.9002967Z import triton 2023-01-11T21:38:06.9003054Z import triton.language as tl 2023-01-11T21:38:06.9003181Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9003321Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9003327Z 2023-01-11T21:38:06.9003332Z 2023-01-11T21:38:06.9003472Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.9003679Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.9003803Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.9003912Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.9004045Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.9004104Z { 2023-01-11T21:38:06.9004203Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.9004267Z { 2023-01-11T21:38:06.9004348Z #pragma omp for 2023-01-11T21:38:06.9004436Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.9004505Z { 2023-01-11T21:38:06.9004593Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.9004654Z { 2023-01-11T21:38:06.9004800Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.9004939Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i1); 2023-01-11T21:38:06.9005032Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.9005143Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.9005210Z } 2023-01-11T21:38:06.9005304Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.9005385Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.9005457Z { 2023-01-11T21:38:06.9005557Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.9005645Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.9005735Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.9005833Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.9005899Z } 2023-01-11T21:38:06.9005959Z } 2023-01-11T21:38:06.9006023Z } 2023-01-11T21:38:06.9006087Z } 2023-01-11T21:38:06.9006171Z ''') 2023-01-11T21:38:06.9006177Z 2023-01-11T21:38:06.9006182Z 2023-01-11T21:38:06.9006274Z async_compile.wait(globals()) 2023-01-11T21:38:06.9006353Z del async_compile 2023-01-11T21:38:06.9006359Z 2023-01-11T21:38:06.9006432Z def call(args): 2023-01-11T21:38:06.9006505Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9006580Z args.clear() 2023-01-11T21:38:06.9006790Z buf0 = empty_strided((1, 10, 10), (100, 1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9006961Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()),
c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.9007033Z del arg0_1 2023-01-11T21:38:06.9007103Z del arg1_1 2023-01-11T21:38:06.9007180Z return (buf0, ) 2023-01-11T21:38:06.9007185Z 2023-01-11T21:38:06.9007189Z 2023-01-11T21:38:06.9007296Z if __name__ == "__main__": 2023-01-11T21:38:06.9007407Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9007533Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9007731Z arg0_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9007937Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9008058Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9008063Z 2023-01-11T21:38:06.9008134Z ok (0.023s) 2023-01-11T21:38:06.9008612Z test_cpu_transposed_broadcast3 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9008746Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9009004Z [2023-01-11 21:37:48,058] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1081 2023-01-11T21:38:06.9009260Z [2023-01-11 21:37:48,065] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1081 2023-01-11T21:38:06.9009271Z 2023-01-11T21:38:06.9009362Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9009437Z import torch 2023-01-11T21:38:06.9009509Z import random 2023-01-11T21:38:06.9009628Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9009751Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9009785Z 2023-01-11T21:38:06.9009867Z aten = torch.ops.aten 2023-01-11T21:38:06.9010001Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9010089Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9010094Z 2023-01-11T21:38:06.9010166Z import triton 2023-01-11T21:38:06.9010262Z import triton.language as tl 2023-01-11T21:38:06.9010387Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9010527Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9010533Z 2023-01-11T21:38:06.9010537Z 2023-01-11T21:38:06.9010674Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.9010878Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.9011001Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.9011104Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.9011209Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.9011273Z { 2023-01-11T21:38:06.9011374Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.9011439Z { 2023-01-11T21:38:06.9011520Z #pragma omp for 2023-01-11T21:38:06.9011607Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.9011667Z { 2023-01-11T21:38:06.9011812Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.9011941Z auto tmp1 = at::vec::Vectorized<float>(in_ptr1[0]); 2023-01-11T21:38:06.9012031Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.9012126Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.9012192Z }
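// note: 12 iterations x 8 float lanes cover elements 0..95 of the
// flattened 10x10 output; the simd tail below finishes elements 96..99.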
2023-01-11T21:38:06.9012291Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.9012372Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:06.9012437Z { 2023-01-11T21:38:06.9012525Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.9012614Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:06.9012704Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.9012789Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.9012857Z } 2023-01-11T21:38:06.9012916Z } 2023-01-11T21:38:06.9012980Z } 2023-01-11T21:38:06.9013064Z ''') 2023-01-11T21:38:06.9013070Z 2023-01-11T21:38:06.9013074Z 2023-01-11T21:38:06.9013197Z async_compile.wait(globals()) 2023-01-11T21:38:06.9013277Z del async_compile 2023-01-11T21:38:06.9013282Z 2023-01-11T21:38:06.9013357Z def call(args): 2023-01-11T21:38:06.9013440Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9013507Z args.clear() 2023-01-11T21:38:06.9013710Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9013875Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.9013946Z del arg0_1 2023-01-11T21:38:06.9014017Z del arg1_1 2023-01-11T21:38:06.9014091Z return (buf0, ) 2023-01-11T21:38:06.9014100Z 2023-01-11T21:38:06.9014105Z 2023-01-11T21:38:06.9014182Z if __name__ == "__main__": 2023-01-11T21:38:06.9014299Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9014419Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9014739Z arg0_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9014931Z arg1_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9015050Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9015055Z 2023-01-11T21:38:06.9015124Z ok (0.021s) 2023-01-11T21:38:06.9015630Z test_cpu_transposed_dense (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9015832Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9016089Z [2023-01-11 21:37:48,079] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1082 2023-01-11T21:38:06.9016357Z [2023-01-11 21:37:49,738] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1082 2023-01-11T21:38:06.9016362Z 2023-01-11T21:38:06.9016453Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9016526Z import torch 2023-01-11T21:38:06.9016598Z import random 2023-01-11T21:38:06.9016719Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9016842Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9016847Z 2023-01-11T21:38:06.9016929Z aten = torch.ops.aten 2023-01-11T21:38:06.9017063Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9017210Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9017219Z 2023-01-11T21:38:06.9017289Z import triton 2023-01-11T21:38:06.9017383Z import triton.language as tl 2023-01-11T21:38:06.9017508Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9017647Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9017652Z 2023-01-11T21:38:06.9017657Z 2023-01-11T21:38:06.9017802Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.9018007Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.9018132Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.9018243Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.9018342Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.9018408Z { 2023-01-11T21:38:06.9018510Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.9018578Z { 2023-01-11T21:38:06.9018675Z #pragma omp for collapse(2) 2023-01-11T21:38:06.9018764Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.9018831Z { 2023-01-11T21:38:06.9018917Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.9018985Z { 2023-01-11T21:38:06.9019055Z { 2023-01-11T21:38:06.9019126Z { 2023-01-11T21:38:06.9019277Z auto tmp0 = in_ptr0[i0 + (10*i1)]; 2023-01-11T21:38:06.9019389Z auto tmp1 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.9019490Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.9019587Z out_ptr0[i0 + (10*i1)] = tmp2; 2023-01-11T21:38:06.9019658Z } 2023-01-11T21:38:06.9019727Z } 2023-01-11T21:38:06.9019795Z } 2023-01-11T21:38:06.9019863Z } 2023-01-11T21:38:06.9019930Z } 2023-01-11T21:38:06.9019988Z } 2023-01-11T21:38:06.9020077Z ''') 2023-01-11T21:38:06.9020083Z 2023-01-11T21:38:06.9020087Z 2023-01-11T21:38:06.9020181Z async_compile.wait(globals()) 2023-01-11T21:38:06.9020263Z del async_compile 2023-01-11T21:38:06.9020268Z 2023-01-11T21:38:06.9020343Z def call(args): 2023-01-11T21:38:06.9020423Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9020499Z args.clear() 2023-01-11T21:38:06.9020699Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9020864Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.9020937Z del arg0_1 2023-01-11T21:38:06.9021009Z del arg1_1 2023-01-11T21:38:06.9021084Z return (buf0, ) 2023-01-11T21:38:06.9021089Z 2023-01-11T21:38:06.9021093Z 2023-01-11T21:38:06.9021175Z if __name__ == "__main__": 2023-01-11T21:38:06.9021294Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9021422Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9021616Z arg0_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9021862Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9021982Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9021988Z 2023-01-11T21:38:06.9022057Z ok (1.673s) 2023-01-11T21:38:06.9022539Z test_cpu_transposed_double (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9022675Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9022935Z [2023-01-11 21:37:49,752] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1083 2023-01-11T21:38:06.9023199Z [2023-01-11 21:37:51,418] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1083 2023-01-11T21:38:06.9023207Z 2023-01-11T21:38:06.9023306Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9023385Z import torch 2023-01-11T21:38:06.9023454Z import random 2023-01-11T21:38:06.9023574Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9023699Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9023704Z 2023-01-11T21:38:06.9023786Z aten = torch.ops.aten 2023-01-11T21:38:06.9023922Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9024019Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9024025Z 2023-01-11T21:38:06.9024099Z import triton 2023-01-11T21:38:06.9024185Z import triton.language as tl 2023-01-11T21:38:06.9024315Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9024457Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9024462Z 2023-01-11T21:38:06.9024467Z 2023-01-11T21:38:06.9024608Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.9024814Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.9024940Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.9025053Z const double* __restrict__ in_ptr1, 2023-01-11T21:38:06.9025189Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.9025252Z { 2023-01-11T21:38:06.9025355Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.9025435Z { 2023-01-11T21:38:06.9025545Z #pragma omp for collapse(2) 2023-01-11T21:38:06.9025647Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.9025723Z { 2023-01-11T21:38:06.9025814Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.9025875Z { 2023-01-11T21:38:06.9025946Z { 2023-01-11T21:38:06.9026018Z { 2023-01-11T21:38:06.9026127Z auto tmp0 = in_ptr0[i0 + (10*i1)]; 2023-01-11T21:38:06.9026240Z auto tmp2 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.9026357Z auto tmp1 = static_cast<double>(tmp0); 2023-01-11T21:38:06.9026457Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.9026553Z out_ptr0[i0 + (10*i1)] = tmp3; 2023-01-11T21:38:06.9026627Z } 2023-01-11T21:38:06.9026697Z } 2023-01-11T21:38:06.9026765Z } 2023-01-11T21:38:06.9026833Z } 2023-01-11T21:38:06.9026901Z }
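// note: mixed-dtype add; the float32 element is widened to double before
// the sum, matching float32 + float64 -> float64 type promotion, and the
// result is stored through the double* output.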
2023-01-11T21:38:06.9026968Z } 2023-01-11T21:38:06.9027048Z ''') 2023-01-11T21:38:06.9027053Z 2023-01-11T21:38:06.9027058Z 2023-01-11T21:38:06.9027151Z async_compile.wait(globals()) 2023-01-11T21:38:06.9027231Z del async_compile 2023-01-11T21:38:06.9027236Z 2023-01-11T21:38:06.9027310Z def call(args): 2023-01-11T21:38:06.9027390Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9027466Z args.clear() 2023-01-11T21:38:06.9027667Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.9027858Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.9027936Z del arg0_1 2023-01-11T21:38:06.9028010Z del arg1_1 2023-01-11T21:38:06.9028088Z return (buf0, ) 2023-01-11T21:38:06.9028093Z 2023-01-11T21:38:06.9028099Z 2023-01-11T21:38:06.9028182Z if __name__ == "__main__": 2023-01-11T21:38:06.9028302Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9028432Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9028636Z arg0_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9028829Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.9028953Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9028958Z 2023-01-11T21:38:06.9029033Z ok (1.680s) 2023-01-11T21:38:06.9029511Z test_cpu_transposed_int (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9029649Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9029911Z [2023-01-11 21:37:51,433] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1084 2023-01-11T21:38:06.9030175Z [2023-01-11 21:37:53,097] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1084 2023-01-11T21:38:06.9030181Z 2023-01-11T21:38:06.9030284Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9030363Z import torch 2023-01-11T21:38:06.9030434Z import random 2023-01-11T21:38:06.9030558Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9030690Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9030695Z 2023-01-11T21:38:06.9030781Z aten = torch.ops.aten 2023-01-11T21:38:06.9030922Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9031022Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9031028Z 2023-01-11T21:38:06.9031167Z import triton 2023-01-11T21:38:06.9031264Z import triton.language as tl 2023-01-11T21:38:06.9031387Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9031532Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9031537Z 2023-01-11T21:38:06.9031542Z 2023-01-11T21:38:06.9031685Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.9031891Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.9032017Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.9032129Z const int* __restrict__ in_ptr1, 2023-01-11T21:38:06.9032239Z float* __restrict__ out_ptr0) 
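// note: in_ptr1 here is the int32 operand; it is widened to float inside
// the loop before being added to the transposed float32 input.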
2023-01-11T21:38:06.9032307Z { 2023-01-11T21:38:06.9032406Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.9032477Z { 2023-01-11T21:38:06.9032575Z #pragma omp for collapse(2) 2023-01-11T21:38:06.9032668Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.9032739Z { 2023-01-11T21:38:06.9032832Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.9032903Z { 2023-01-11T21:38:06.9032969Z { 2023-01-11T21:38:06.9033043Z { 2023-01-11T21:38:06.9033157Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.9033261Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.9033382Z auto tmp2 = static_cast<float>(tmp1); 2023-01-11T21:38:06.9033484Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.9033589Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.9033682Z } 2023-01-11T21:38:06.9033752Z } 2023-01-11T21:38:06.9033819Z } 2023-01-11T21:38:06.9033886Z } 2023-01-11T21:38:06.9033952Z } 2023-01-11T21:38:06.9034018Z } 2023-01-11T21:38:06.9034098Z ''') 2023-01-11T21:38:06.9034103Z 2023-01-11T21:38:06.9034117Z 2023-01-11T21:38:06.9034206Z async_compile.wait(globals()) 2023-01-11T21:38:06.9034283Z del async_compile 2023-01-11T21:38:06.9034288Z 2023-01-11T21:38:06.9034364Z def call(args): 2023-01-11T21:38:06.9034444Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9034519Z args.clear() 2023-01-11T21:38:06.9034720Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9034888Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.9034955Z del arg0_1 2023-01-11T21:38:06.9035028Z del arg1_1 2023-01-11T21:38:06.9035102Z return (buf0, ) 2023-01-11T21:38:06.9035111Z 2023-01-11T21:38:06.9035117Z 2023-01-11T21:38:06.9035198Z if __name__ == "__main__": 2023-01-11T21:38:06.9035319Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9035447Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9035648Z arg0_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9035834Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.9035955Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9035960Z 2023-01-11T21:38:06.9036032Z ok (1.679s) 2023-01-11T21:38:06.9036512Z test_cpu_transposed_strided (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9036647Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9036907Z [2023-01-11 21:37:53,113] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1085 2023-01-11T21:38:06.9037200Z [2023-01-11 21:37:54,777] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1085 2023-01-11T21:38:06.9037206Z 2023-01-11T21:38:06.9037304Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9037379Z import torch 2023-01-11T21:38:06.9037455Z import random 2023-01-11T21:38:06.9037569Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9037693Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9037698Z 2023-01-11T21:38:06.9037781Z aten = torch.ops.aten 2023-01-11T21:38:06.9037919Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9038016Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9038023Z 2023-01-11T21:38:06.9038099Z import triton 2023-01-11T21:38:06.9038192Z import triton.language as tl 2023-01-11T21:38:06.9038310Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9038451Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9038456Z 2023-01-11T21:38:06.9038463Z 2023-01-11T21:38:06.9038599Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.9038805Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.9038929Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.9039040Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.9039143Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.9039208Z { 2023-01-11T21:38:06.9039303Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.9039368Z { 2023-01-11T21:38:06.9039464Z #pragma omp for collapse(2) 2023-01-11T21:38:06.9039576Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.9039643Z { 2023-01-11T21:38:06.9039737Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.9039805Z { 2023-01-11T21:38:06.9039870Z { 2023-01-11T21:38:06.9039941Z { 2023-01-11T21:38:06.9040053Z auto tmp0 = in_ptr0[i0 + (10*i1)]; 2023-01-11T21:38:06.9040167Z auto tmp1 = in_ptr1[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.9040266Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.9040370Z out_ptr0[i0 + (10*i1)] = tmp2; 2023-01-11T21:38:06.9040441Z } 2023-01-11T21:38:06.9040503Z } 2023-01-11T21:38:06.9040569Z } 2023-01-11T21:38:06.9040635Z } 2023-01-11T21:38:06.9040698Z } 2023-01-11T21:38:06.9040761Z } 2023-01-11T21:38:06.9040847Z ''') 2023-01-11T21:38:06.9040853Z 2023-01-11T21:38:06.9040858Z 2023-01-11T21:38:06.9040953Z async_compile.wait(globals()) 2023-01-11T21:38:06.9041024Z del async_compile 2023-01-11T21:38:06.9041029Z 2023-01-11T21:38:06.9041105Z def call(args): 2023-01-11T21:38:06.9041187Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9041261Z args.clear() 2023-01-11T21:38:06.9041461Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9041630Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.9041704Z del arg0_1 2023-01-11T21:38:06.9041769Z del arg1_1 2023-01-11T21:38:06.9041848Z return (buf0, ) 2023-01-11T21:38:06.9041853Z 2023-01-11T21:38:06.9041857Z 2023-01-11T21:38:06.9041939Z if __name__ == "__main__": 2023-01-11T21:38:06.9042057Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9042184Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9042384Z arg0_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9042585Z arg1_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9042702Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9042708Z 2023-01-11T21:38:06.9042775Z ok (1.680s) 2023-01-11T21:38:06.9043285Z test_cpu_transposed_transposed (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9043422Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9043684Z [2023-01-11 21:37:54,792] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1086 2023-01-11T21:38:06.9043950Z [2023-01-11 21:37:54,800] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1086 2023-01-11T21:38:06.9043959Z 2023-01-11T21:38:06.9044058Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9044132Z import torch 2023-01-11T21:38:06.9044207Z import random 2023-01-11T21:38:06.9044327Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9044448Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9044461Z 2023-01-11T21:38:06.9044537Z aten = torch.ops.aten 2023-01-11T21:38:06.9044673Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9044769Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9044775Z 2023-01-11T21:38:06.9044848Z import triton 2023-01-11T21:38:06.9044942Z import triton.language as tl 2023-01-11T21:38:06.9045067Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9045206Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9045212Z 2023-01-11T21:38:06.9045216Z 2023-01-11T21:38:06.9045380Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.9045580Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.9045705Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.9045814Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.9045922Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.9045988Z { 2023-01-11T21:38:06.9046090Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.9046157Z { 2023-01-11T21:38:06.9046233Z #pragma omp for 2023-01-11T21:38:06.9046322Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.9046388Z { 2023-01-11T21:38:06.9046530Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.9046670Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.9046761Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.9046860Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.9046921Z } 2023-01-11T21:38:06.9047021Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.9047110Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:06.9047176Z { 2023-01-11T21:38:06.9047264Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.9047356Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.9047442Z auto tmp2 = tmp0 + tmp1;
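// note: in_ptr0, in_ptr1 and out_ptr0 all share the transposed (1, 10)
// layout here, so the kernel can treat them as one flat 100-element array.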
2023-01-11T21:38:06.9047522Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.9047588Z } 2023-01-11T21:38:06.9047655Z } 2023-01-11T21:38:06.9047721Z } 2023-01-11T21:38:06.9047806Z ''') 2023-01-11T21:38:06.9047812Z 2023-01-11T21:38:06.9047816Z 2023-01-11T21:38:06.9047910Z async_compile.wait(globals()) 2023-01-11T21:38:06.9047988Z del async_compile 2023-01-11T21:38:06.9047993Z 2023-01-11T21:38:06.9048061Z def call(args): 2023-01-11T21:38:06.9048140Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9048217Z args.clear() 2023-01-11T21:38:06.9048421Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9048590Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.9048662Z del arg0_1 2023-01-11T21:38:06.9048734Z del arg1_1 2023-01-11T21:38:06.9048804Z return (buf0, ) 2023-01-11T21:38:06.9048835Z 2023-01-11T21:38:06.9048848Z 2023-01-11T21:38:06.9048925Z if __name__ == "__main__": 2023-01-11T21:38:06.9049047Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9049176Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9049381Z arg0_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9049581Z arg1_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9049705Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9049710Z 2023-01-11T21:38:06.9049788Z ok (0.022s) 2023-01-11T21:38:06.9050279Z test_cuda_broadcast1_broadcast1 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9050414Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9050671Z [2023-01-11 21:37:54,813] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1087 2023-01-11T21:38:06.9050939Z [2023-01-11 21:37:54,822] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1087 2023-01-11T21:38:06.9050944Z 2023-01-11T21:38:06.9051046Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9051122Z import torch 2023-01-11T21:38:06.9051200Z import random 2023-01-11T21:38:06.9051350Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9051475Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9051480Z 2023-01-11T21:38:06.9051563Z aten = torch.ops.aten 2023-01-11T21:38:06.9051694Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9051792Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9051797Z 2023-01-11T21:38:06.9051874Z import triton 2023-01-11T21:38:06.9051966Z import triton.language as tl 2023-01-11T21:38:06.9052089Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9052228Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9052233Z 2023-01-11T21:38:06.9052238Z 2023-01-11T21:38:06.9052393Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9052462Z import triton 2023-01-11T21:38:06.9052556Z import triton.language as tl 2023-01-11T21:38:06.9052670Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9052780Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9052914Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9053039Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9053044Z 2023-01-11T21:38:06.9053470Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9053546Z @triton.jit 2023-01-11T21:38:06.9053684Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9053760Z xnumel = 10 2023-01-11T21:38:06.9053858Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9053991Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9054075Z xmask = xindex < xnumel 2023-01-11T21:38:06.9054147Z x0 = xindex 2023-01-11T21:38:06.9054245Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9054339Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9054420Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9054666Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9054753Z ''') 2023-01-11T21:38:06.9054759Z 2023-01-11T21:38:06.9054763Z 2023-01-11T21:38:06.9054899Z async_compile.wait(globals()) 2023-01-11T21:38:06.9054976Z del async_compile 2023-01-11T21:38:06.9054981Z 2023-01-11T21:38:06.9055057Z def call(args): 2023-01-11T21:38:06.9055135Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9055204Z args.clear() 2023-01-11T21:38:06.9055296Z with torch.cuda.device(0): 2023-01-11T21:38:06.9055499Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9055590Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9055736Z 
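# note: the generated wrapper takes ownership of its inputs (args.clear()
# drops the caller's references), allocates the output with empty_strided,
# and launches the Triton kernel on the current CUDA stream.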
triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9055810Z del arg0_1 2023-01-11T21:38:06.9055889Z del arg1_1 2023-01-11T21:38:06.9055961Z return (buf0, ) 2023-01-11T21:38:06.9055966Z 2023-01-11T21:38:06.9055980Z 2023-01-11T21:38:06.9056054Z if __name__ == "__main__": 2023-01-11T21:38:06.9056172Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9056303Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9056505Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9056703Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9056823Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9056828Z 2023-01-11T21:38:06.9056898Z ok (0.023s) 2023-01-11T21:38:06.9057467Z test_cuda_broadcast1_broadcast2 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9057645Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9057907Z [2023-01-11 21:37:54,835] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1088 2023-01-11T21:38:06.9058172Z [2023-01-11 21:37:54,908] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1088 2023-01-11T21:38:06.9058178Z 2023-01-11T21:38:06.9058276Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9058351Z import torch 2023-01-11T21:38:06.9058424Z import random 2023-01-11T21:38:06.9058545Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9058670Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9058675Z 2023-01-11T21:38:06.9058758Z aten = torch.ops.aten 2023-01-11T21:38:06.9058892Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9058986Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9058992Z 2023-01-11T21:38:06.9059064Z import triton 2023-01-11T21:38:06.9059156Z import triton.language as tl 2023-01-11T21:38:06.9059281Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9059422Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9059428Z 2023-01-11T21:38:06.9059432Z 2023-01-11T21:38:06.9059586Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9059655Z import triton 2023-01-11T21:38:06.9059748Z import triton.language as tl 2023-01-11T21:38:06.9059864Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9059966Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9060102Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9060230Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9060235Z 2023-01-11T21:38:06.9060657Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9060731Z @triton.jit 2023-01-11T21:38:06.9060902Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
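# note: standard Inductor pointwise indexing; each program instance handles
# XBLOCK flat indices, masked against xnumel. Below, x0 = xindex % 10
# addresses the (10,) operand and x1 = xindex // 10 the (1, 10, 1) operand,
# implementing the broadcast without materializing either input.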
2023-01-11T21:38:06.9060972Z xnumel = 100 2023-01-11T21:38:06.9061070Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9061200Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9061280Z xmask = xindex < xnumel 2023-01-11T21:38:06.9061355Z x0 = xindex % 10 2023-01-11T21:38:06.9061436Z x1 = (xindex // 10) 2023-01-11T21:38:06.9061501Z x2 = xindex 2023-01-11T21:38:06.9061602Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9061701Z tmp1 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.9061784Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9061923Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9062009Z ''') 2023-01-11T21:38:06.9062015Z 2023-01-11T21:38:06.9062019Z 2023-01-11T21:38:06.9062115Z async_compile.wait(globals()) 2023-01-11T21:38:06.9062191Z del async_compile 2023-01-11T21:38:06.9062196Z 2023-01-11T21:38:06.9062268Z def call(args): 2023-01-11T21:38:06.9062348Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9062423Z args.clear() 2023-01-11T21:38:06.9062514Z with torch.cuda.device(0): 2023-01-11T21:38:06.9062729Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9062821Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9062966Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9063033Z del arg0_1 2023-01-11T21:38:06.9063104Z del arg1_1 2023-01-11T21:38:06.9063182Z return (buf0, ) 2023-01-11T21:38:06.9063187Z 2023-01-11T21:38:06.9063229Z 2023-01-11T21:38:06.9063311Z if __name__ == "__main__": 2023-01-11T21:38:06.9063430Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9063561Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9063762Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9063976Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9064090Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9064096Z 2023-01-11T21:38:06.9064167Z ok (0.086s) 2023-01-11T21:38:06.9064643Z test_cuda_broadcast1_broadcast3 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9064778Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9065038Z [2023-01-11 21:37:54,922] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1089 2023-01-11T21:38:06.9065301Z [2023-01-11 21:37:54,989] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1089 2023-01-11T21:38:06.9065307Z 2023-01-11T21:38:06.9065406Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9065480Z import torch 2023-01-11T21:38:06.9065555Z import random 2023-01-11T21:38:06.9065668Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9065791Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9065797Z 2023-01-11T21:38:06.9065881Z aten = torch.ops.aten 2023-01-11T21:38:06.9066018Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9066113Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9066121Z 2023-01-11T21:38:06.9066194Z import triton 2023-01-11T21:38:06.9066287Z import triton.language as tl 2023-01-11T21:38:06.9066413Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9066547Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9066552Z 2023-01-11T21:38:06.9066561Z 2023-01-11T21:38:06.9066734Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9066809Z import triton 2023-01-11T21:38:06.9066902Z import triton.language as tl 2023-01-11T21:38:06.9067015Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9067117Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9067250Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9067369Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9067381Z 2023-01-11T21:38:06.9067794Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9067871Z @triton.jit 2023-01-11T21:38:06.9068013Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9068087Z xnumel = 10 2023-01-11T21:38:06.9068186Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9068317Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9068399Z xmask = xindex < xnumel 2023-01-11T21:38:06.9068465Z x0 = xindex 2023-01-11T21:38:06.9068563Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9068696Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9068775Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9068910Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9068994Z ''') 2023-01-11T21:38:06.9069000Z 2023-01-11T21:38:06.9069004Z 2023-01-11T21:38:06.9069125Z async_compile.wait(globals()) 2023-01-11T21:38:06.9069203Z del async_compile 2023-01-11T21:38:06.9069208Z 2023-01-11T21:38:06.9069277Z def call(args): 2023-01-11T21:38:06.9069357Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9069432Z args.clear() 2023-01-11T21:38:06.9069525Z with torch.cuda.device(0): 2023-01-11T21:38:06.9069726Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9069818Z stream0 = get_cuda_stream(0) 
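# note: grid(10) builds the 1-D launch grid (presumably ceil(10 / XBLOCK)
# program instances) for whichever XBLOCK the pointwise autotuner selects;
# the (1,)-shaped operand is loaded with mask None since it is uniform.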
2023-01-11T21:38:06.9069961Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9070028Z del arg0_1 2023-01-11T21:38:06.9070101Z del arg1_1 2023-01-11T21:38:06.9070179Z return (buf0, ) 2023-01-11T21:38:06.9070185Z 2023-01-11T21:38:06.9070189Z 2023-01-11T21:38:06.9070268Z if __name__ == "__main__": 2023-01-11T21:38:06.9070385Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9070511Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9070713Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9070910Z arg1_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9071023Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9071028Z 2023-01-11T21:38:06.9071101Z ok (0.081s) 2023-01-11T21:38:06.9071577Z test_cuda_broadcast1_dense (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9071708Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9071966Z [2023-01-11 21:37:55,003] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1090 2023-01-11T21:38:06.9072232Z [2023-01-11 21:37:55,084] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1090 2023-01-11T21:38:06.9072238Z 2023-01-11T21:38:06.9072335Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9072411Z import torch 2023-01-11T21:38:06.9072512Z import random 2023-01-11T21:38:06.9072626Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9072753Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9072758Z 2023-01-11T21:38:06.9072842Z aten = torch.ops.aten 2023-01-11T21:38:06.9072979Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9073077Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9073082Z 2023-01-11T21:38:06.9073156Z import triton 2023-01-11T21:38:06.9073248Z import triton.language as tl 2023-01-11T21:38:06.9073373Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9073507Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9073515Z 2023-01-11T21:38:06.9073524Z 2023-01-11T21:38:06.9073673Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9073748Z import triton 2023-01-11T21:38:06.9073842Z import triton.language as tl 2023-01-11T21:38:06.9073958Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9074061Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9074193Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9074319Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9074324Z 2023-01-11T21:38:06.9074735Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9074809Z @triton.jit 2023-01-11T21:38:06.9074953Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : 
tl.constexpr): 2023-01-11T21:38:06.9075052Z xnumel = 100 2023-01-11T21:38:06.9075156Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9075287Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9075377Z xmask = xindex < xnumel 2023-01-11T21:38:06.9075452Z x0 = xindex % 10 2023-01-11T21:38:06.9075531Z x2 = xindex 2023-01-11T21:38:06.9075641Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9075759Z tmp1 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9075847Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9075980Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9076071Z ''') 2023-01-11T21:38:06.9076076Z 2023-01-11T21:38:06.9076081Z 2023-01-11T21:38:06.9076174Z async_compile.wait(globals()) 2023-01-11T21:38:06.9076245Z del async_compile 2023-01-11T21:38:06.9076250Z 2023-01-11T21:38:06.9076324Z def call(args): 2023-01-11T21:38:06.9076402Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9076482Z args.clear() 2023-01-11T21:38:06.9076575Z with torch.cuda.device(0): 2023-01-11T21:38:06.9076779Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9076872Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9077011Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9077087Z del arg0_1 2023-01-11T21:38:06.9077162Z del arg1_1 2023-01-11T21:38:06.9077238Z return (buf0, ) 2023-01-11T21:38:06.9077243Z 2023-01-11T21:38:06.9077248Z 2023-01-11T21:38:06.9077327Z if __name__ == "__main__": 2023-01-11T21:38:06.9077446Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9077572Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9077774Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9077971Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9078093Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9078099Z 2023-01-11T21:38:06.9078171Z ok (0.094s) 2023-01-11T21:38:06.9078673Z test_cuda_broadcast1_double (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9078808Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9079070Z [2023-01-11 21:37:55,097] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1091 2023-01-11T21:38:06.9079339Z [2023-01-11 21:37:55,167] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1091 2023-01-11T21:38:06.9079347Z 2023-01-11T21:38:06.9079447Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9079523Z import torch 2023-01-11T21:38:06.9079592Z import random 2023-01-11T21:38:06.9079714Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9079838Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9079846Z 2023-01-11T21:38:06.9079928Z aten = torch.ops.aten 2023-01-11T21:38:06.9080066Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9080161Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9080166Z 2023-01-11T21:38:06.9080240Z import triton 2023-01-11T21:38:06.9080332Z import triton.language as tl 2023-01-11T21:38:06.9080451Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9080590Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9080596Z 2023-01-11T21:38:06.9080600Z 2023-01-11T21:38:06.9080757Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9080863Z import triton 2023-01-11T21:38:06.9080957Z import triton.language as tl 2023-01-11T21:38:06.9081072Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9081176Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9081312Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9081436Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9081441Z 2023-01-11T21:38:06.9081862Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9081935Z @triton.jit 2023-01-11T21:38:06.9082077Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9082150Z xnumel = 100 2023-01-11T21:38:06.9082248Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9082378Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9082464Z xmask = xindex < xnumel 2023-01-11T21:38:06.9082534Z x0 = xindex % 10 2023-01-11T21:38:06.9082605Z x2 = xindex 2023-01-11T21:38:06.9082702Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9082801Z tmp2 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9082893Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.9082971Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9083102Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9083188Z ''') 2023-01-11T21:38:06.9083193Z 2023-01-11T21:38:06.9083198Z 2023-01-11T21:38:06.9083290Z async_compile.wait(globals()) 2023-01-11T21:38:06.9083367Z del async_compile 2023-01-11T21:38:06.9083372Z 2023-01-11T21:38:06.9083448Z def call(args): 2023-01-11T21:38:06.9083526Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9083604Z args.clear() 2023-01-11T21:38:06.9083695Z with torch.cuda.device(0): 2023-01-11T21:38:06.9083893Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', 
dtype=torch.float64) 2023-01-11T21:38:06.9083989Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9084131Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9084206Z del arg0_1 2023-01-11T21:38:06.9084279Z del arg1_1 2023-01-11T21:38:06.9084380Z return (buf0, ) 2023-01-11T21:38:06.9084385Z 2023-01-11T21:38:06.9084390Z 2023-01-11T21:38:06.9084472Z if __name__ == "__main__": 2023-01-11T21:38:06.9084591Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9084711Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9084910Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9085115Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9085234Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9085242Z 2023-01-11T21:38:06.9085318Z ok (0.083s) 2023-01-11T21:38:06.9085852Z test_cuda_broadcast1_int (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9085983Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9086245Z [2023-01-11 21:37:55,180] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1092 2023-01-11T21:38:06.9086510Z [2023-01-11 21:37:55,247] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1092 2023-01-11T21:38:06.9086516Z 2023-01-11T21:38:06.9086607Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9086680Z import torch 2023-01-11T21:38:06.9086783Z import random 2023-01-11T21:38:06.9086902Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9087024Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9087029Z 2023-01-11T21:38:06.9087115Z aten = torch.ops.aten 2023-01-11T21:38:06.9087254Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9087351Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9087357Z 2023-01-11T21:38:06.9087424Z import triton 2023-01-11T21:38:06.9087516Z import triton.language as tl 2023-01-11T21:38:06.9087642Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9087783Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9087789Z 2023-01-11T21:38:06.9087793Z 2023-01-11T21:38:06.9087950Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9088024Z import triton 2023-01-11T21:38:06.9088117Z import triton.language as tl 2023-01-11T21:38:06.9088225Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9088332Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9088468Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9088592Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9088597Z 2023-01-11T21:38:06.9089014Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9089089Z @triton.jit 
2023-01-11T21:38:06.9089229Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9089300Z xnumel = 10 2023-01-11T21:38:06.9089391Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9089522Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9089606Z xmask = xindex < xnumel 2023-01-11T21:38:06.9089676Z x0 = xindex 2023-01-11T21:38:06.9089775Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9089869Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9089958Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.9090030Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9090165Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9090275Z ''') 2023-01-11T21:38:06.9090281Z 2023-01-11T21:38:06.9090286Z 2023-01-11T21:38:06.9090377Z async_compile.wait(globals()) 2023-01-11T21:38:06.9090457Z del async_compile 2023-01-11T21:38:06.9090462Z 2023-01-11T21:38:06.9090537Z def call(args): 2023-01-11T21:38:06.9090620Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9090695Z args.clear() 2023-01-11T21:38:06.9090780Z with torch.cuda.device(0): 2023-01-11T21:38:06.9090978Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9091070Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9091212Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9091287Z del arg0_1 2023-01-11T21:38:06.9091361Z del arg1_1 2023-01-11T21:38:06.9091439Z return (buf0, ) 2023-01-11T21:38:06.9091444Z 2023-01-11T21:38:06.9091448Z 2023-01-11T21:38:06.9091530Z if __name__ == "__main__": 2023-01-11T21:38:06.9091644Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9091770Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9091970Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9092165Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9092284Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9092289Z 2023-01-11T21:38:06.9092359Z ok (0.080s) 2023-01-11T21:38:06.9092836Z test_cuda_broadcast1_strided (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9093000Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9093261Z [2023-01-11 21:37:55,261] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1093 2023-01-11T21:38:06.9093519Z [2023-01-11 21:37:55,335] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1093 2023-01-11T21:38:06.9093525Z 2023-01-11T21:38:06.9093626Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9093698Z import torch 2023-01-11T21:38:06.9093771Z import random 2023-01-11T21:38:06.9093889Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9094013Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9094022Z 2023-01-11T21:38:06.9094104Z aten = torch.ops.aten 2023-01-11T21:38:06.9094243Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9094332Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9094337Z 2023-01-11T21:38:06.9094410Z import triton 2023-01-11T21:38:06.9094714Z import triton.language as tl 2023-01-11T21:38:06.9094846Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9094989Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9094995Z 2023-01-11T21:38:06.9094999Z 2023-01-11T21:38:06.9095157Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9095231Z import triton 2023-01-11T21:38:06.9095316Z import triton.language as tl 2023-01-11T21:38:06.9095431Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9095534Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9095670Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9095794Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9095802Z 2023-01-11T21:38:06.9096219Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9096336Z @triton.jit 2023-01-11T21:38:06.9096481Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9096556Z xnumel = 100 2023-01-11T21:38:06.9096647Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9096775Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9096859Z xmask = xindex < xnumel 2023-01-11T21:38:06.9096935Z x0 = xindex % 10 2023-01-11T21:38:06.9097016Z x1 = (xindex // 10) 2023-01-11T21:38:06.9097087Z x2 = xindex 2023-01-11T21:38:06.9097237Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9097347Z tmp1 = tl.load(in_ptr1 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9097434Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9097572Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9097657Z ''') 2023-01-11T21:38:06.9097662Z 2023-01-11T21:38:06.9097667Z 2023-01-11T21:38:06.9097762Z async_compile.wait(globals()) 2023-01-11T21:38:06.9097841Z del async_compile 2023-01-11T21:38:06.9097846Z 2023-01-11T21:38:06.9097919Z def call(args): 2023-01-11T21:38:06.9097991Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9098069Z args.clear() 2023-01-11T21:38:06.9098161Z with torch.cuda.device(0): 2023-01-11T21:38:06.9098364Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.9098454Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9098598Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9098670Z del arg0_1 2023-01-11T21:38:06.9098735Z del arg1_1 2023-01-11T21:38:06.9098855Z return (buf0, ) 2023-01-11T21:38:06.9098860Z 2023-01-11T21:38:06.9098865Z 2023-01-11T21:38:06.9098946Z if __name__ == "__main__": 2023-01-11T21:38:06.9099063Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9099186Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9099388Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9099592Z arg1_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9099711Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9099717Z 2023-01-11T21:38:06.9099780Z ok (0.088s) 2023-01-11T21:38:06.9100264Z test_cuda_broadcast1_transposed (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9100403Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9100668Z [2023-01-11 21:37:55,349] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1094 2023-01-11T21:38:06.9100933Z [2023-01-11 21:37:55,421] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1094 2023-01-11T21:38:06.9100939Z 2023-01-11T21:38:06.9101036Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9101110Z import torch 2023-01-11T21:38:06.9101184Z import random 2023-01-11T21:38:06.9101302Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9101419Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9101432Z 2023-01-11T21:38:06.9101508Z aten = torch.ops.aten 2023-01-11T21:38:06.9101644Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9101741Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9101747Z 2023-01-11T21:38:06.9101820Z import triton 2023-01-11T21:38:06.9101911Z import triton.language as tl 2023-01-11T21:38:06.9102034Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9102199Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9102205Z 2023-01-11T21:38:06.9102209Z 2023-01-11T21:38:06.9102357Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9102430Z import triton 2023-01-11T21:38:06.9102522Z import triton.language as tl 2023-01-11T21:38:06.9102638Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9102739Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9102873Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9102998Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9103004Z 2023-01-11T21:38:06.9103422Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9103492Z 
@triton.jit 2023-01-11T21:38:06.9103638Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9103713Z xnumel = 100 2023-01-11T21:38:06.9103810Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9103938Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9104020Z xmask = xindex < xnumel 2023-01-11T21:38:06.9104098Z x1 = (xindex // 10) 2023-01-11T21:38:06.9104162Z x2 = xindex 2023-01-11T21:38:06.9104261Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.9104358Z tmp1 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9104437Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9104573Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9104701Z ''') 2023-01-11T21:38:06.9104707Z 2023-01-11T21:38:06.9104711Z 2023-01-11T21:38:06.9104804Z async_compile.wait(globals()) 2023-01-11T21:38:06.9104881Z del async_compile 2023-01-11T21:38:06.9104886Z 2023-01-11T21:38:06.9104954Z def call(args): 2023-01-11T21:38:06.9105033Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9105111Z args.clear() 2023-01-11T21:38:06.9105200Z with torch.cuda.device(0): 2023-01-11T21:38:06.9105407Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9105501Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9105644Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9105711Z del arg0_1 2023-01-11T21:38:06.9105783Z del arg1_1 2023-01-11T21:38:06.9105860Z return (buf0, ) 2023-01-11T21:38:06.9105865Z 2023-01-11T21:38:06.9105870Z 2023-01-11T21:38:06.9105949Z if __name__ == "__main__": 2023-01-11T21:38:06.9106075Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9106202Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9106401Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9106606Z arg1_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9106721Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9106727Z 2023-01-11T21:38:06.9106796Z ok (0.086s) 2023-01-11T21:38:06.9107280Z test_cuda_broadcast2_broadcast1 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9107412Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9107674Z [2023-01-11 21:37:55,435] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1095 2023-01-11T21:38:06.9107939Z [2023-01-11 21:37:55,506] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1095 2023-01-11T21:38:06.9107944Z 2023-01-11T21:38:06.9108070Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9108146Z import torch 2023-01-11T21:38:06.9108223Z import random 2023-01-11T21:38:06.9108335Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9108459Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9108464Z 2023-01-11T21:38:06.9108544Z aten = torch.ops.aten 2023-01-11T21:38:06.9108680Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9108775Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9108780Z 2023-01-11T21:38:06.9108850Z import triton 2023-01-11T21:38:06.9108945Z import triton.language as tl 2023-01-11T21:38:06.9109072Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9109206Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9109211Z 2023-01-11T21:38:06.9109224Z 2023-01-11T21:38:06.9109371Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9109447Z import triton 2023-01-11T21:38:06.9109540Z import triton.language as tl 2023-01-11T21:38:06.9109653Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9109755Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9109887Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9110013Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9110018Z 2023-01-11T21:38:06.9110427Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9110527Z @triton.jit 2023-01-11T21:38:06.9110667Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9110740Z xnumel = 100 2023-01-11T21:38:06.9110836Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9110967Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9111046Z xmask = xindex < xnumel 2023-01-11T21:38:06.9111125Z x1 = (xindex // 10) 2023-01-11T21:38:06.9111193Z x0 = xindex % 10 2023-01-11T21:38:06.9111263Z x2 = xindex 2023-01-11T21:38:06.9111361Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.9111458Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9111535Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9111669Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9111747Z ''') 2023-01-11T21:38:06.9111759Z 2023-01-11T21:38:06.9111763Z 2023-01-11T21:38:06.9111849Z async_compile.wait(globals()) 2023-01-11T21:38:06.9111926Z del async_compile 2023-01-11T21:38:06.9111931Z 2023-01-11T21:38:06.9112004Z def call(args): 2023-01-11T21:38:06.9112082Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9112157Z args.clear() 2023-01-11T21:38:06.9112250Z with torch.cuda.device(0): 2023-01-11T21:38:06.9112465Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.9112551Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9112694Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9112767Z del arg0_1 2023-01-11T21:38:06.9112839Z del arg1_1 2023-01-11T21:38:06.9112916Z return (buf0, ) 2023-01-11T21:38:06.9112921Z 2023-01-11T21:38:06.9112926Z 2023-01-11T21:38:06.9113005Z if __name__ == "__main__": 2023-01-11T21:38:06.9113122Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9113247Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9113454Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9113652Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9113773Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9113778Z 2023-01-11T21:38:06.9113876Z ok (0.085s) 2023-01-11T21:38:06.9114355Z test_cuda_broadcast2_broadcast2 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9114485Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9114747Z [2023-01-11 21:37:55,520] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1096 2023-01-11T21:38:06.9115014Z [2023-01-11 21:37:55,529] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1096 2023-01-11T21:38:06.9115020Z 2023-01-11T21:38:06.9115118Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9115186Z import torch 2023-01-11T21:38:06.9115261Z import random 2023-01-11T21:38:06.9115379Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9115503Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9115509Z 2023-01-11T21:38:06.9115591Z aten = torch.ops.aten 2023-01-11T21:38:06.9115728Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9115823Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9115829Z 2023-01-11T21:38:06.9115900Z import triton 2023-01-11T21:38:06.9115985Z import triton.language as tl 2023-01-11T21:38:06.9116110Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9116250Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9116283Z 2023-01-11T21:38:06.9116288Z 2023-01-11T21:38:06.9116442Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9116515Z import triton 2023-01-11T21:38:06.9116607Z import triton.language as tl 2023-01-11T21:38:06.9116723Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9116817Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9116950Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9117075Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9117080Z 2023-01-11T21:38:06.9117497Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9117570Z 
@triton.jit 2023-01-11T21:38:06.9117711Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9117788Z xnumel = 10 2023-01-11T21:38:06.9117887Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9118018Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9118095Z xmask = xindex < xnumel 2023-01-11T21:38:06.9118165Z x0 = xindex 2023-01-11T21:38:06.9118266Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9118364Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9118442Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9118578Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9118657Z ''') 2023-01-11T21:38:06.9118667Z 2023-01-11T21:38:06.9118671Z 2023-01-11T21:38:06.9118757Z async_compile.wait(globals()) 2023-01-11T21:38:06.9118833Z del async_compile 2023-01-11T21:38:06.9118839Z 2023-01-11T21:38:06.9118915Z def call(args): 2023-01-11T21:38:06.9118993Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9119068Z args.clear() 2023-01-11T21:38:06.9119162Z with torch.cuda.device(0): 2023-01-11T21:38:06.9119371Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9119456Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9119597Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9119701Z del arg0_1 2023-01-11T21:38:06.9119773Z del arg1_1 2023-01-11T21:38:06.9119849Z return (buf0, ) 2023-01-11T21:38:06.9119854Z 2023-01-11T21:38:06.9119858Z 2023-01-11T21:38:06.9119938Z if __name__ == "__main__": 2023-01-11T21:38:06.9120056Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9120183Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9120389Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9120597Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9120720Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9120725Z 2023-01-11T21:38:06.9120795Z ok (0.023s) 2023-01-11T21:38:06.9121270Z test_cuda_broadcast2_broadcast3 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9121401Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9121662Z [2023-01-11 21:37:55,543] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1097 2023-01-11T21:38:06.9121926Z [2023-01-11 21:37:55,552] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1097 2023-01-11T21:38:06.9121931Z 2023-01-11T21:38:06.9122091Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9122159Z import torch 2023-01-11T21:38:06.9122232Z import random 2023-01-11T21:38:06.9122351Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9122475Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9122481Z 2023-01-11T21:38:06.9122569Z aten = torch.ops.aten 2023-01-11T21:38:06.9122706Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9122803Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9122808Z 2023-01-11T21:38:06.9122883Z import triton 2023-01-11T21:38:06.9122969Z import triton.language as tl 2023-01-11T21:38:06.9123095Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9123235Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9123240Z 2023-01-11T21:38:06.9123245Z 2023-01-11T21:38:06.9123402Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9123475Z import triton 2023-01-11T21:38:06.9123574Z import triton.language as tl 2023-01-11T21:38:06.9123688Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9123791Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9123918Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9124046Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9124051Z 2023-01-11T21:38:06.9124464Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9124537Z @triton.jit 2023-01-11T21:38:06.9124682Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9124756Z xnumel = 10 2023-01-11T21:38:06.9124855Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9124982Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9125061Z xmask = xindex < xnumel 2023-01-11T21:38:06.9125132Z x0 = xindex 2023-01-11T21:38:06.9125231Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9125363Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9125446Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9125606Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9125693Z ''') 2023-01-11T21:38:06.9125699Z 2023-01-11T21:38:06.9125703Z 2023-01-11T21:38:06.9125789Z async_compile.wait(globals()) 2023-01-11T21:38:06.9125868Z del async_compile 2023-01-11T21:38:06.9125873Z 2023-01-11T21:38:06.9125950Z def call(args): 2023-01-11T21:38:06.9126029Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9126103Z args.clear() 2023-01-11T21:38:06.9126194Z with torch.cuda.device(0): 2023-01-11T21:38:06.9126403Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9126489Z stream0 = get_cuda_stream(0) 
2023-01-11T21:38:06.9126632Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9126704Z del arg0_1 2023-01-11T21:38:06.9126777Z del arg1_1 2023-01-11T21:38:06.9126854Z return (buf0, ) 2023-01-11T21:38:06.9126860Z 2023-01-11T21:38:06.9126865Z 2023-01-11T21:38:06.9126946Z if __name__ == "__main__": 2023-01-11T21:38:06.9127064Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9127191Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9127396Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9127594Z arg1_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9127713Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9127718Z 2023-01-11T21:38:06.9127787Z ok (0.023s) 2023-01-11T21:38:06.9128268Z test_cuda_broadcast2_dense (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9128425Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9128683Z [2023-01-11 21:37:55,566] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1098 2023-01-11T21:38:06.9128947Z [2023-01-11 21:37:55,577] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1098 2023-01-11T21:38:06.9128952Z 2023-01-11T21:38:06.9129049Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9129122Z import torch 2023-01-11T21:38:06.9129190Z import random 2023-01-11T21:38:06.9129310Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9129437Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9129442Z 2023-01-11T21:38:06.9129524Z aten = torch.ops.aten 2023-01-11T21:38:06.9129662Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9129759Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9129764Z 2023-01-11T21:38:06.9129840Z import triton 2023-01-11T21:38:06.9129925Z import triton.language as tl 2023-01-11T21:38:06.9130050Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9130190Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9130195Z 2023-01-11T21:38:06.9130200Z 2023-01-11T21:38:06.9130352Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9130426Z import triton 2023-01-11T21:38:06.9130518Z import triton.language as tl 2023-01-11T21:38:06.9130632Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9130734Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9130864Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9130990Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9130995Z 2023-01-11T21:38:06.9131440Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9131518Z @triton.jit 2023-01-11T21:38:06.9131658Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, 
XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9131732Z xnumel = 100 2023-01-11T21:38:06.9131836Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9131964Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9132041Z xmask = xindex < xnumel 2023-01-11T21:38:06.9132120Z x1 = (xindex // 10) 2023-01-11T21:38:06.9132191Z x2 = xindex 2023-01-11T21:38:06.9132290Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.9132388Z tmp1 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9132467Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9132602Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9132681Z ''') 2023-01-11T21:38:06.9132687Z 2023-01-11T21:38:06.9132691Z 2023-01-11T21:38:06.9132786Z async_compile.wait(globals()) 2023-01-11T21:38:06.9132861Z del async_compile 2023-01-11T21:38:06.9132866Z 2023-01-11T21:38:06.9132939Z def call(args): 2023-01-11T21:38:06.9133017Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9133093Z args.clear() 2023-01-11T21:38:06.9133184Z with torch.cuda.device(0): 2023-01-11T21:38:06.9133392Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9133483Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9133628Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9133702Z del arg0_1 2023-01-11T21:38:06.9133803Z del arg1_1 2023-01-11T21:38:06.9133880Z return (buf0, ) 2023-01-11T21:38:06.9133885Z 2023-01-11T21:38:06.9133889Z 2023-01-11T21:38:06.9133970Z if __name__ == "__main__": 2023-01-11T21:38:06.9134089Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9134212Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9134423Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9134768Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9134885Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9134890Z 2023-01-11T21:38:06.9134961Z ok (0.025s) 2023-01-11T21:38:06.9135430Z test_cuda_broadcast2_double (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9135590Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9135876Z [2023-01-11 21:37:55,591] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1099 2023-01-11T21:38:06.9136140Z [2023-01-11 21:37:55,665] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1099 2023-01-11T21:38:06.9136145Z 2023-01-11T21:38:06.9136243Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9136311Z import torch 2023-01-11T21:38:06.9136387Z import random 2023-01-11T21:38:06.9136508Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9136631Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9136636Z 2023-01-11T21:38:06.9136718Z aten = torch.ops.aten 2023-01-11T21:38:06.9136855Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9136949Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9136954Z 2023-01-11T21:38:06.9137021Z import triton 2023-01-11T21:38:06.9137113Z import triton.language as tl 2023-01-11T21:38:06.9137307Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9137494Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9137500Z 2023-01-11T21:38:06.9137505Z 2023-01-11T21:38:06.9137665Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9137738Z import triton 2023-01-11T21:38:06.9137830Z import triton.language as tl 2023-01-11T21:38:06.9137945Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9138041Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9138176Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9138302Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9138309Z 2023-01-11T21:38:06.9138723Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9138796Z @triton.jit 2023-01-11T21:38:06.9138940Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9139014Z xnumel = 100 2023-01-11T21:38:06.9139113Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9139235Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9139318Z xmask = xindex < xnumel 2023-01-11T21:38:06.9139399Z x1 = (xindex // 10) 2023-01-11T21:38:06.9139469Z x2 = xindex 2023-01-11T21:38:06.9139566Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.9139664Z tmp2 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9139752Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.9139825Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9140000Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9140088Z ''') 2023-01-11T21:38:06.9140094Z 2023-01-11T21:38:06.9140098Z 2023-01-11T21:38:06.9140193Z async_compile.wait(globals()) 2023-01-11T21:38:06.9140272Z del async_compile 2023-01-11T21:38:06.9140277Z 2023-01-11T21:38:06.9140355Z def call(args): 2023-01-11T21:38:06.9140436Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9140507Z args.clear() 2023-01-11T21:38:06.9140600Z with torch.cuda.device(0): 2023-01-11T21:38:06.9140816Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', 
dtype=torch.float64) 2023-01-11T21:38:06.9140911Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9141060Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9141136Z del arg0_1 2023-01-11T21:38:06.9141210Z del arg1_1 2023-01-11T21:38:06.9141284Z return (buf0, ) 2023-01-11T21:38:06.9141302Z 2023-01-11T21:38:06.9141306Z 2023-01-11T21:38:06.9141382Z if __name__ == "__main__": 2023-01-11T21:38:06.9141501Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9141630Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9141846Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9142053Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9142175Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9142181Z 2023-01-11T21:38:06.9142254Z ok (0.088s) 2023-01-11T21:38:06.9142731Z test_cuda_broadcast2_int (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9142871Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9143128Z [2023-01-11 21:37:55,678] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1100 2023-01-11T21:38:06.9143420Z [2023-01-11 21:37:55,751] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1100 2023-01-11T21:38:06.9143426Z 2023-01-11T21:38:06.9143527Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9143603Z import torch 2023-01-11T21:38:06.9143680Z import random 2023-01-11T21:38:06.9143800Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9143925Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9143930Z 2023-01-11T21:38:06.9144014Z aten = torch.ops.aten 2023-01-11T21:38:06.9144147Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9144240Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9144249Z 2023-01-11T21:38:06.9144323Z import triton 2023-01-11T21:38:06.9144419Z import triton.language as tl 2023-01-11T21:38:06.9144546Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9144690Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9144696Z 2023-01-11T21:38:06.9144702Z 2023-01-11T21:38:06.9144860Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9144940Z import triton 2023-01-11T21:38:06.9145028Z import triton.language as tl 2023-01-11T21:38:06.9145143Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9145248Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9145387Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9145525Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9145532Z 2023-01-11T21:38:06.9145994Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9146108Z 
@triton.jit 2023-01-11T21:38:06.9146253Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9146324Z xnumel = 100 2023-01-11T21:38:06.9146425Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9146558Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9146643Z xmask = xindex < xnumel 2023-01-11T21:38:06.9146721Z x1 = (xindex // 10) 2023-01-11T21:38:06.9146797Z x0 = xindex % 10 2023-01-11T21:38:06.9146872Z x2 = xindex 2023-01-11T21:38:06.9146967Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.9147067Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9147157Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.9147238Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9147376Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9147465Z ''') 2023-01-11T21:38:06.9147470Z 2023-01-11T21:38:06.9147475Z 2023-01-11T21:38:06.9147570Z async_compile.wait(globals()) 2023-01-11T21:38:06.9147643Z del async_compile 2023-01-11T21:38:06.9147648Z 2023-01-11T21:38:06.9147724Z def call(args): 2023-01-11T21:38:06.9147809Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9147886Z args.clear() 2023-01-11T21:38:06.9147979Z with torch.cuda.device(0): 2023-01-11T21:38:06.9148194Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9148290Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9148430Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9148507Z del arg0_1 2023-01-11T21:38:06.9148582Z del arg1_1 2023-01-11T21:38:06.9148663Z return (buf0, ) 2023-01-11T21:38:06.9148668Z 2023-01-11T21:38:06.9148672Z 2023-01-11T21:38:06.9148755Z if __name__ == "__main__": 2023-01-11T21:38:06.9148875Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9149004Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9149217Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9149437Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9149561Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9149566Z 2023-01-11T21:38:06.9149641Z ok (0.086s) 2023-01-11T21:38:06.9150123Z test_cuda_broadcast2_strided (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9150263Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9150526Z [2023-01-11 21:37:55,765] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1101 2023-01-11T21:38:06.9150796Z [2023-01-11 21:37:55,838] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1101 2023-01-11T21:38:06.9150802Z 2023-01-11T21:38:06.9150903Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9150979Z import torch 2023-01-11T21:38:06.9151056Z import random 2023-01-11T21:38:06.9151171Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9151297Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9151302Z 2023-01-11T21:38:06.9151387Z aten = torch.ops.aten 2023-01-11T21:38:06.9151527Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9151626Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9151631Z 2023-01-11T21:38:06.9151707Z import triton 2023-01-11T21:38:06.9151832Z import triton.language as tl 2023-01-11T21:38:06.9151953Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9152099Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9152104Z 2023-01-11T21:38:06.9152109Z 2023-01-11T21:38:06.9152271Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9152350Z import triton 2023-01-11T21:38:06.9152445Z import triton.language as tl 2023-01-11T21:38:06.9152562Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9152667Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9152802Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9152925Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9152930Z 2023-01-11T21:38:06.9153353Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9153431Z @triton.jit 2023-01-11T21:38:06.9153574Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9153652Z xnumel = 100 2023-01-11T21:38:06.9153756Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9153894Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9153979Z xmask = xindex < xnumel 2023-01-11T21:38:06.9154054Z x1 = (xindex // 10) 2023-01-11T21:38:06.9154130Z x0 = xindex % 10 2023-01-11T21:38:06.9154203Z x2 = xindex 2023-01-11T21:38:06.9154307Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.9154418Z tmp1 = tl.load(in_ptr1 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9154501Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9154642Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9154722Z ''') 2023-01-11T21:38:06.9154728Z 2023-01-11T21:38:06.9154735Z 2023-01-11T21:38:06.9154830Z async_compile.wait(globals()) 2023-01-11T21:38:06.9154909Z del async_compile 2023-01-11T21:38:06.9154914Z 2023-01-11T21:38:06.9154991Z def call(args): 2023-01-11T21:38:06.9155071Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9155149Z args.clear() 2023-01-11T21:38:06.9155247Z with torch.cuda.device(0): 2023-01-11T21:38:06.9155525Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.9155629Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9155772Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9155846Z del arg0_1 2023-01-11T21:38:06.9155918Z del arg1_1 2023-01-11T21:38:06.9155997Z return (buf0, ) 2023-01-11T21:38:06.9156002Z 2023-01-11T21:38:06.9156007Z 2023-01-11T21:38:06.9156088Z if __name__ == "__main__": 2023-01-11T21:38:06.9156207Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9156329Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9156540Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9156746Z arg1_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9156866Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9156871Z 2023-01-11T21:38:06.9156941Z ok (0.088s) 2023-01-11T21:38:06.9157427Z test_cuda_broadcast2_transposed (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9157559Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9157849Z [2023-01-11 21:37:55,853] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1102 2023-01-11T21:38:06.9158116Z [2023-01-11 21:37:55,865] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1102 2023-01-11T21:38:06.9158121Z 2023-01-11T21:38:06.9158224Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9158294Z import torch 2023-01-11T21:38:06.9158368Z import random 2023-01-11T21:38:06.9158487Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9158609Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9158615Z 2023-01-11T21:38:06.9158697Z aten = torch.ops.aten 2023-01-11T21:38:06.9158835Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9158928Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9158933Z 2023-01-11T21:38:06.9159001Z import triton 2023-01-11T21:38:06.9159092Z import triton.language as tl 2023-01-11T21:38:06.9159218Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9159362Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9159368Z 2023-01-11T21:38:06.9159372Z 2023-01-11T21:38:06.9159527Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9159602Z import triton 2023-01-11T21:38:06.9159694Z import triton.language as tl 2023-01-11T21:38:06.9159811Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9159906Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9160039Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9160166Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9160171Z 2023-01-11T21:38:06.9160588Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9160660Z 
@triton.jit 2023-01-11T21:38:06.9160803Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9160873Z xnumel = 100 2023-01-11T21:38:06.9160970Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9161093Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9161175Z xmask = xindex < xnumel 2023-01-11T21:38:06.9161286Z x0 = xindex % 10 2023-01-11T21:38:06.9161359Z x2 = xindex 2023-01-11T21:38:06.9161457Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9161556Z tmp1 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9161631Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9161762Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9161847Z ''') 2023-01-11T21:38:06.9161853Z 2023-01-11T21:38:06.9161857Z 2023-01-11T21:38:06.9161949Z async_compile.wait(globals()) 2023-01-11T21:38:06.9162025Z del async_compile 2023-01-11T21:38:06.9162031Z 2023-01-11T21:38:06.9162103Z def call(args): 2023-01-11T21:38:06.9162185Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9162260Z args.clear() 2023-01-11T21:38:06.9162346Z with torch.cuda.device(0): 2023-01-11T21:38:06.9162560Z buf0 = empty_strided((1, 10, 10), (100, 1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9162653Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9162801Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9162874Z del arg0_1 2023-01-11T21:38:06.9162948Z del arg1_1 2023-01-11T21:38:06.9163027Z return (buf0, ) 2023-01-11T21:38:06.9163032Z 2023-01-11T21:38:06.9163036Z 2023-01-11T21:38:06.9163116Z if __name__ == "__main__": 2023-01-11T21:38:06.9163227Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9163354Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9163563Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9163794Z arg1_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9163914Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9163919Z 2023-01-11T21:38:06.9163991Z ok (0.026s) 2023-01-11T21:38:06.9164475Z test_cuda_broadcast3_broadcast1 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9164606Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9164865Z [2023-01-11 21:37:55,878] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1103 2023-01-11T21:38:06.9165122Z [2023-01-11 21:37:55,945] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1103 2023-01-11T21:38:06.9165138Z 2023-01-11T21:38:06.9165230Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9165304Z import torch 2023-01-11T21:38:06.9165380Z import random 2023-01-11T21:38:06.9165502Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9165628Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9165633Z 2023-01-11T21:38:06.9165717Z aten = torch.ops.aten 2023-01-11T21:38:06.9165854Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9165942Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9165947Z 2023-01-11T21:38:06.9166017Z import triton 2023-01-11T21:38:06.9166107Z import triton.language as tl 2023-01-11T21:38:06.9166231Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9166368Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9166373Z 2023-01-11T21:38:06.9166380Z 2023-01-11T21:38:06.9166531Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9166606Z import triton 2023-01-11T21:38:06.9166699Z import triton.language as tl 2023-01-11T21:38:06.9166806Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9166906Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9167068Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9167197Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9167202Z 2023-01-11T21:38:06.9167620Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9167694Z @triton.jit 2023-01-11T21:38:06.9167835Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9167908Z xnumel = 10 2023-01-11T21:38:06.9167999Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9168133Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9168217Z xmask = xindex < xnumel 2023-01-11T21:38:06.9168287Z x0 = xindex 2023-01-11T21:38:06.9168419Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9168520Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9168599Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9168727Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9168811Z ''') 2023-01-11T21:38:06.9168817Z 2023-01-11T21:38:06.9168821Z 2023-01-11T21:38:06.9168916Z async_compile.wait(globals()) 2023-01-11T21:38:06.9168993Z del async_compile 2023-01-11T21:38:06.9168998Z 2023-01-11T21:38:06.9169071Z def call(args): 2023-01-11T21:38:06.9169151Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9169229Z args.clear() 2023-01-11T21:38:06.9169314Z with torch.cuda.device(0): 2023-01-11T21:38:06.9169513Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9169635Z stream0 = get_cuda_stream(0) 
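        # The 1-D launch grid comes from inductor's grid() helper, which derives
        # the number of program instances from the element count (10 here) and
        # the autotuned XBLOCK; the kernel is enqueued on the current CUDA stream.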
2023-01-11T21:38:06.9169777Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9169851Z del arg0_1 2023-01-11T21:38:06.9169924Z del arg1_1 2023-01-11T21:38:06.9170003Z return (buf0, ) 2023-01-11T21:38:06.9170008Z 2023-01-11T21:38:06.9170013Z 2023-01-11T21:38:06.9170091Z if __name__ == "__main__": 2023-01-11T21:38:06.9170203Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9170331Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9170531Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9170730Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9170852Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9170857Z 2023-01-11T21:38:06.9170931Z ok (0.080s) 2023-01-11T21:38:06.9171418Z test_cuda_broadcast3_broadcast2 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9171550Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9171809Z [2023-01-11 21:37:55,958] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1104 2023-01-11T21:38:06.9172068Z [2023-01-11 21:37:55,969] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1104 2023-01-11T21:38:06.9172079Z 2023-01-11T21:38:06.9172171Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9172245Z import torch 2023-01-11T21:38:06.9172320Z import random 2023-01-11T21:38:06.9172440Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9172562Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9172567Z 2023-01-11T21:38:06.9172649Z aten = torch.ops.aten 2023-01-11T21:38:06.9172786Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9172912Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9172917Z 2023-01-11T21:38:06.9172994Z import triton 2023-01-11T21:38:06.9173086Z import triton.language as tl 2023-01-11T21:38:06.9173211Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9173350Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9173355Z 2023-01-11T21:38:06.9173360Z 2023-01-11T21:38:06.9173516Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9173587Z import triton 2023-01-11T21:38:06.9173679Z import triton.language as tl 2023-01-11T21:38:06.9173787Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9173897Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9174032Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9174160Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9174166Z 2023-01-11T21:38:06.9174708Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9174784Z @triton.jit 2023-01-11T21:38:06.9174926Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK 
: tl.constexpr): 2023-01-11T21:38:06.9174997Z xnumel = 10 2023-01-11T21:38:06.9175088Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9175223Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9175306Z xmask = xindex < xnumel 2023-01-11T21:38:06.9175377Z x0 = xindex 2023-01-11T21:38:06.9175573Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9175680Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9175776Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9175911Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9175996Z ''') 2023-01-11T21:38:06.9176005Z 2023-01-11T21:38:06.9176009Z 2023-01-11T21:38:06.9176102Z async_compile.wait(globals()) 2023-01-11T21:38:06.9176178Z del async_compile 2023-01-11T21:38:06.9176183Z 2023-01-11T21:38:06.9176259Z def call(args): 2023-01-11T21:38:06.9176338Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9176411Z args.clear() 2023-01-11T21:38:06.9176500Z with torch.cuda.device(0): 2023-01-11T21:38:06.9176705Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9176796Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9176939Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9177017Z del arg0_1 2023-01-11T21:38:06.9177089Z del arg1_1 2023-01-11T21:38:06.9177230Z return (buf0, ) 2023-01-11T21:38:06.9177237Z 2023-01-11T21:38:06.9177241Z 2023-01-11T21:38:06.9177330Z if __name__ == "__main__": 2023-01-11T21:38:06.9177445Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9177571Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9177771Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9177979Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9178100Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9178105Z 2023-01-11T21:38:06.9178174Z ok (0.024s) 2023-01-11T21:38:06.9178656Z test_cuda_broadcast3_broadcast3 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9178789Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9179092Z [2023-01-11 21:37:55,982] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1105 2023-01-11T21:38:06.9179357Z [2023-01-11 21:37:56,046] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1105 2023-01-11T21:38:06.9179363Z 2023-01-11T21:38:06.9179456Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9179533Z import torch 2023-01-11T21:38:06.9179608Z import random 2023-01-11T21:38:06.9179729Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9179851Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9179860Z 2023-01-11T21:38:06.9179942Z aten = torch.ops.aten 2023-01-11T21:38:06.9180078Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9180167Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9180178Z 2023-01-11T21:38:06.9180245Z import triton 2023-01-11T21:38:06.9180337Z import triton.language as tl 2023-01-11T21:38:06.9180464Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9180604Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9180609Z 2023-01-11T21:38:06.9180614Z 2023-01-11T21:38:06.9180768Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9180842Z import triton 2023-01-11T21:38:06.9180934Z import triton.language as tl 2023-01-11T21:38:06.9181041Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9181142Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9181278Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9181403Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9181435Z 2023-01-11T21:38:06.9181853Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9181929Z @triton.jit 2023-01-11T21:38:06.9182073Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9182145Z xnumel = 1 2023-01-11T21:38:06.9182236Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9182364Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9182447Z xmask = xindex < xnumel 2023-01-11T21:38:06.9182580Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9182709Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9182789Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9182925Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp2, None) 2023-01-11T21:38:06.9183004Z ''') 2023-01-11T21:38:06.9183016Z 2023-01-11T21:38:06.9183021Z 2023-01-11T21:38:06.9183108Z async_compile.wait(globals()) 2023-01-11T21:38:06.9183186Z del async_compile 2023-01-11T21:38:06.9183192Z 2023-01-11T21:38:06.9183266Z def call(args): 2023-01-11T21:38:06.9183348Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9183423Z args.clear() 2023-01-11T21:38:06.9183514Z with torch.cuda.device(0): 2023-01-11T21:38:06.9183713Z buf0 = empty_strided((1, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9183799Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9183940Z 
triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.9184012Z del arg0_1 2023-01-11T21:38:06.9184084Z del arg1_1 2023-01-11T21:38:06.9184160Z return (buf0, ) 2023-01-11T21:38:06.9184167Z 2023-01-11T21:38:06.9184171Z 2023-01-11T21:38:06.9184253Z if __name__ == "__main__": 2023-01-11T21:38:06.9184371Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9184498Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9184690Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9184912Z arg1_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9185033Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9185039Z 2023-01-11T21:38:06.9185109Z ok (0.077s) 2023-01-11T21:38:06.9185586Z test_cuda_broadcast3_dense (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9185730Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9186031Z [2023-01-11 21:37:56,060] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1106 2023-01-11T21:38:06.9186300Z [2023-01-11 21:37:56,129] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1106 2023-01-11T21:38:06.9186306Z 2023-01-11T21:38:06.9186402Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9186470Z import torch 2023-01-11T21:38:06.9186544Z import random 2023-01-11T21:38:06.9186662Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9186787Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9186792Z 2023-01-11T21:38:06.9186874Z aten = torch.ops.aten 2023-01-11T21:38:06.9187011Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9187107Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9187112Z 2023-01-11T21:38:06.9187225Z import triton 2023-01-11T21:38:06.9187310Z import triton.language as tl 2023-01-11T21:38:06.9187435Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9187576Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9187581Z 2023-01-11T21:38:06.9187586Z 2023-01-11T21:38:06.9187743Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9187817Z import triton 2023-01-11T21:38:06.9187909Z import triton.language as tl 2023-01-11T21:38:06.9188023Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9188118Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9188252Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9188380Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9188385Z 2023-01-11T21:38:06.9188802Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9188877Z @triton.jit 2023-01-11T21:38:06.9189021Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
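    # Each program instance covers one XBLOCK-sized slice of the flattened
    # output; xmask turns off lanes past xnumel=100 in the final block, and the
    # scalar input is fetched through a zero offset vector so every lane reads
    # the same broadcast value.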
2023-01-11T21:38:06.9189096Z xnumel = 100 2023-01-11T21:38:06.9189194Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9189320Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9189403Z xmask = xindex < xnumel 2023-01-11T21:38:06.9189473Z x0 = xindex 2023-01-11T21:38:06.9189606Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9189703Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9189782Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9189918Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9189996Z ''') 2023-01-11T21:38:06.9190010Z 2023-01-11T21:38:06.9190014Z 2023-01-11T21:38:06.9190100Z async_compile.wait(globals()) 2023-01-11T21:38:06.9190177Z del async_compile 2023-01-11T21:38:06.9190183Z 2023-01-11T21:38:06.9190257Z def call(args): 2023-01-11T21:38:06.9190333Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9190408Z args.clear() 2023-01-11T21:38:06.9190499Z with torch.cuda.device(0): 2023-01-11T21:38:06.9190732Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9190819Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9190961Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9191035Z del arg0_1 2023-01-11T21:38:06.9191105Z del arg1_1 2023-01-11T21:38:06.9191183Z return (buf0, ) 2023-01-11T21:38:06.9191188Z 2023-01-11T21:38:06.9191192Z 2023-01-11T21:38:06.9191270Z if __name__ == "__main__": 2023-01-11T21:38:06.9191388Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9191514Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9191709Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9191914Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9192039Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9192044Z 2023-01-11T21:38:06.9192115Z ok (0.082s) 2023-01-11T21:38:06.9192596Z test_cuda_broadcast3_double (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9192729Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9192994Z [2023-01-11 21:37:56,143] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1107 2023-01-11T21:38:06.9193289Z [2023-01-11 21:37:56,211] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1107 2023-01-11T21:38:06.9193295Z 2023-01-11T21:38:06.9193393Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9193460Z import torch 2023-01-11T21:38:06.9193536Z import random 2023-01-11T21:38:06.9193656Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9193784Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9193790Z 2023-01-11T21:38:06.9193871Z aten = torch.ops.aten 2023-01-11T21:38:06.9194007Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9194106Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9194111Z 2023-01-11T21:38:06.9194184Z import triton 2023-01-11T21:38:06.9194269Z import triton.language as tl 2023-01-11T21:38:06.9194394Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9194536Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9194545Z 2023-01-11T21:38:06.9194549Z 2023-01-11T21:38:06.9194704Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9194778Z import triton 2023-01-11T21:38:06.9194870Z import triton.language as tl 2023-01-11T21:38:06.9194986Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9195084Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9195218Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9195346Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9195351Z 2023-01-11T21:38:06.9195769Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9195845Z @triton.jit 2023-01-11T21:38:06.9195986Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9196063Z xnumel = 100 2023-01-11T21:38:06.9196160Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9196290Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9196367Z xmask = xindex < xnumel 2023-01-11T21:38:06.9196438Z x0 = xindex 2023-01-11T21:38:06.9196596Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9196695Z tmp2 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9196783Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.9196861Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9196998Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9197077Z ''') 2023-01-11T21:38:06.9197082Z 2023-01-11T21:38:06.9197087Z 2023-01-11T21:38:06.9197179Z async_compile.wait(globals()) 2023-01-11T21:38:06.9197255Z del async_compile 2023-01-11T21:38:06.9197260Z 2023-01-11T21:38:06.9197333Z def call(args): 2023-01-11T21:38:06.9197412Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9197489Z args.clear() 2023-01-11T21:38:06.9197579Z with torch.cuda.device(0): 2023-01-11T21:38:06.9197779Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 
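        # buf0 is float64: adding a float32 scalar to a float64 tensor promotes
        # to double, so the kernel upcasts the scalar with .to(tl.float64)
        # before the add.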
2023-01-11T21:38:06.9197874Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9198021Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9198094Z del arg0_1 2023-01-11T21:38:06.9198165Z del arg1_1 2023-01-11T21:38:06.9198242Z return (buf0, ) 2023-01-11T21:38:06.9198247Z 2023-01-11T21:38:06.9198252Z 2023-01-11T21:38:06.9198330Z if __name__ == "__main__": 2023-01-11T21:38:06.9198448Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9198569Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9198769Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9198973Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9199125Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9199131Z 2023-01-11T21:38:06.9199201Z ok (0.082s) 2023-01-11T21:38:06.9199682Z test_cuda_broadcast3_int (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9199815Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9200076Z [2023-01-11 21:37:56,225] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1108 2023-01-11T21:38:06.9200343Z [2023-01-11 21:37:56,290] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1108 2023-01-11T21:38:06.9200351Z 2023-01-11T21:38:06.9200453Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9200521Z import torch 2023-01-11T21:38:06.9200596Z import random 2023-01-11T21:38:06.9200715Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9200838Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9200843Z 2023-01-11T21:38:06.9200929Z aten = torch.ops.aten 2023-01-11T21:38:06.9201068Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9201162Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9201168Z 2023-01-11T21:38:06.9201234Z import triton 2023-01-11T21:38:06.9201325Z import triton.language as tl 2023-01-11T21:38:06.9201450Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9201589Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9201595Z 2023-01-11T21:38:06.9201599Z 2023-01-11T21:38:06.9201752Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9201828Z import triton 2023-01-11T21:38:06.9201920Z import triton.language as tl 2023-01-11T21:38:06.9202035Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9202130Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9202262Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9202413Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9202420Z 2023-01-11T21:38:06.9202836Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9202911Z @triton.jit 
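# Broadcast add of a float32 scalar onto an int32 vector: the integer lanes
# are upcast to float32 (tmp1.to(tl.float32)) before the add, matching eager
# type promotion, and the result is stored as float32.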
2023-01-11T21:38:06.9203052Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9203125Z xnumel = 10 2023-01-11T21:38:06.9203223Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9203346Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9203432Z xmask = xindex < xnumel 2023-01-11T21:38:06.9203503Z x0 = xindex 2023-01-11T21:38:06.9203634Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9203733Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9203825Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.9203903Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9204032Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9204117Z ''') 2023-01-11T21:38:06.9204122Z 2023-01-11T21:38:06.9204127Z 2023-01-11T21:38:06.9204219Z async_compile.wait(globals()) 2023-01-11T21:38:06.9204295Z del async_compile 2023-01-11T21:38:06.9204301Z 2023-01-11T21:38:06.9204375Z def call(args): 2023-01-11T21:38:06.9204453Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9204527Z args.clear() 2023-01-11T21:38:06.9204611Z with torch.cuda.device(0): 2023-01-11T21:38:06.9204809Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9204927Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9205070Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9205144Z del arg0_1 2023-01-11T21:38:06.9205214Z del arg1_1 2023-01-11T21:38:06.9205292Z return (buf0, ) 2023-01-11T21:38:06.9205298Z 2023-01-11T21:38:06.9205302Z 2023-01-11T21:38:06.9205379Z if __name__ == "__main__": 2023-01-11T21:38:06.9205490Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9205617Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9205816Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9206012Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9206132Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9206137Z 2023-01-11T21:38:06.9206207Z ok (0.079s) 2023-01-11T21:38:06.9206695Z test_cuda_broadcast3_strided (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9206826Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9207090Z [2023-01-11 21:37:56,304] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1109 2023-01-11T21:38:06.9207349Z [2023-01-11 21:37:56,376] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1109 2023-01-11T21:38:06.9207359Z 2023-01-11T21:38:06.9207451Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9207525Z import torch 2023-01-11T21:38:06.9207600Z import random 2023-01-11T21:38:06.9207722Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9207844Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9207849Z 2023-01-11T21:38:06.9207927Z aten = torch.ops.aten 2023-01-11T21:38:06.9208060Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9208176Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9208182Z 2023-01-11T21:38:06.9208255Z import triton 2023-01-11T21:38:06.9208346Z import triton.language as tl 2023-01-11T21:38:06.9208471Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9208608Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9208614Z 2023-01-11T21:38:06.9208618Z 2023-01-11T21:38:06.9208771Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9208843Z import triton 2023-01-11T21:38:06.9208937Z import triton.language as tl 2023-01-11T21:38:06.9209045Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9209150Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9209280Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9209405Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9209410Z 2023-01-11T21:38:06.9209828Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9209903Z @triton.jit 2023-01-11T21:38:06.9210046Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9210117Z xnumel = 100 2023-01-11T21:38:06.9210209Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9210337Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9210419Z xmask = xindex < xnumel 2023-01-11T21:38:06.9210497Z x0 = xindex % 10 2023-01-11T21:38:06.9210576Z x1 = (xindex // 10) 2023-01-11T21:38:06.9210675Z x2 = xindex 2023-01-11T21:38:06.9210807Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9210910Z tmp1 = tl.load(in_ptr1 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9210988Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9211126Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9211209Z ''') 2023-01-11T21:38:06.9211215Z 2023-01-11T21:38:06.9211219Z 2023-01-11T21:38:06.9211314Z async_compile.wait(globals()) 2023-01-11T21:38:06.9211390Z del async_compile 2023-01-11T21:38:06.9211395Z 2023-01-11T21:38:06.9211468Z def call(args): 2023-01-11T21:38:06.9211540Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9211613Z args.clear() 2023-01-11T21:38:06.9211705Z with torch.cuda.device(0): 2023-01-11T21:38:06.9211909Z buf0 = empty_strided((10, 10), (10, 
1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9212000Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9212146Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9212219Z del arg0_1 2023-01-11T21:38:06.9212284Z del arg1_1 2023-01-11T21:38:06.9212360Z return (buf0, ) 2023-01-11T21:38:06.9212365Z 2023-01-11T21:38:06.9212369Z 2023-01-11T21:38:06.9212452Z if __name__ == "__main__": 2023-01-11T21:38:06.9212570Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9212697Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9212897Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9213101Z arg1_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9213218Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9213223Z 2023-01-11T21:38:06.9213292Z ok (0.086s) 2023-01-11T21:38:06.9213767Z test_cuda_broadcast3_transposed (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9213956Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9214218Z [2023-01-11 21:37:56,391] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1110 2023-01-11T21:38:06.9214589Z [2023-01-11 21:37:56,402] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1110 2023-01-11T21:38:06.9214595Z 2023-01-11T21:38:06.9214695Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9214770Z import torch 2023-01-11T21:38:06.9214844Z import random 2023-01-11T21:38:06.9214961Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9215089Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9215094Z 2023-01-11T21:38:06.9215169Z aten = torch.ops.aten 2023-01-11T21:38:06.9215305Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9215400Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9215405Z 2023-01-11T21:38:06.9215489Z import triton 2023-01-11T21:38:06.9215595Z import triton.language as tl 2023-01-11T21:38:06.9215743Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9215880Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9215885Z 2023-01-11T21:38:06.9215890Z 2023-01-11T21:38:06.9216041Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9216109Z import triton 2023-01-11T21:38:06.9216200Z import triton.language as tl 2023-01-11T21:38:06.9216313Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9216415Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9216547Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9216714Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9216719Z 2023-01-11T21:38:06.9223051Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 
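# In the @pointwise meta above, size_hints buckets the problem size for
# autotuning XBLOCK, signature records the argument dtypes, and
# divisible_by_16 lists the arguments Triton may assume are divisible by 16,
# enabling aligned, vectorized access.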
2023-01-11T21:38:06.9223148Z @triton.jit 2023-01-11T21:38:06.9223296Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9223373Z xnumel = 100 2023-01-11T21:38:06.9223465Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9223598Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9223681Z xmask = xindex < xnumel 2023-01-11T21:38:06.9223749Z x0 = xindex 2023-01-11T21:38:06.9223887Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9223978Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9224059Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9224192Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9224293Z ''') 2023-01-11T21:38:06.9224299Z 2023-01-11T21:38:06.9224304Z 2023-01-11T21:38:06.9224397Z async_compile.wait(globals()) 2023-01-11T21:38:06.9224477Z del async_compile 2023-01-11T21:38:06.9224483Z 2023-01-11T21:38:06.9224560Z def call(args): 2023-01-11T21:38:06.9224641Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9224710Z args.clear() 2023-01-11T21:38:06.9224801Z with torch.cuda.device(0): 2023-01-11T21:38:06.9225005Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9225096Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9225238Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9225310Z del arg0_1 2023-01-11T21:38:06.9225381Z del arg1_1 2023-01-11T21:38:06.9225455Z return (buf0, ) 2023-01-11T21:38:06.9225460Z 2023-01-11T21:38:06.9225465Z 2023-01-11T21:38:06.9225544Z if __name__ == "__main__": 2023-01-11T21:38:06.9225660Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9225785Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9226087Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9226302Z arg1_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9226421Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9226427Z 2023-01-11T21:38:06.9226497Z ok (0.025s) 2023-01-11T21:38:06.9226968Z test_cuda_dense_broadcast1 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9227099Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9227358Z [2023-01-11 21:37:56,415] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1111 2023-01-11T21:38:06.9227625Z [2023-01-11 21:37:56,485] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1111 2023-01-11T21:38:06.9227631Z 2023-01-11T21:38:06.9227728Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9227802Z import torch 2023-01-11T21:38:06.9227879Z import random 2023-01-11T21:38:06.9227993Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9228116Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9228121Z 2023-01-11T21:38:06.9228196Z aten = torch.ops.aten 2023-01-11T21:38:06.9228331Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9228458Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9228464Z 2023-01-11T21:38:06.9228539Z import triton 2023-01-11T21:38:06.9228635Z import triton.language as tl 2023-01-11T21:38:06.9228762Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9228904Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9228912Z 2023-01-11T21:38:06.9228916Z 2023-01-11T21:38:06.9229075Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9229145Z import triton 2023-01-11T21:38:06.9229238Z import triton.language as tl 2023-01-11T21:38:06.9229354Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9229460Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9229592Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9229720Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9229725Z 2023-01-11T21:38:06.9230142Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9230221Z @triton.jit 2023-01-11T21:38:06.9230358Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9230437Z xnumel = 100 2023-01-11T21:38:06.9230536Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9230667Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9230752Z xmask = xindex < xnumel 2023-01-11T21:38:06.9230825Z x2 = xindex 2023-01-11T21:38:06.9230902Z x0 = xindex % 10 2023-01-11T21:38:06.9230995Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9231092Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9231174Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9231309Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9231396Z ''') 2023-01-11T21:38:06.9231402Z 2023-01-11T21:38:06.9231406Z 2023-01-11T21:38:06.9231501Z async_compile.wait(globals()) 2023-01-11T21:38:06.9231580Z del async_compile 2023-01-11T21:38:06.9231585Z 2023-01-11T21:38:06.9231663Z def call(args): 2023-01-11T21:38:06.9231738Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9231816Z args.clear() 2023-01-11T21:38:06.9231938Z with torch.cuda.device(0): 2023-01-11T21:38:06.9232147Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9232242Z stream0 = 
get_cuda_stream(0) 2023-01-11T21:38:06.9232384Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9232462Z del arg0_1 2023-01-11T21:38:06.9232530Z del arg1_1 2023-01-11T21:38:06.9232611Z return (buf0, ) 2023-01-11T21:38:06.9232616Z 2023-01-11T21:38:06.9232620Z 2023-01-11T21:38:06.9232704Z if __name__ == "__main__": 2023-01-11T21:38:06.9232822Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9232951Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9233156Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9233353Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9233469Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9233481Z 2023-01-11T21:38:06.9233548Z ok (0.084s) 2023-01-11T21:38:06.9234028Z test_cuda_dense_broadcast2 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9234164Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9234462Z [2023-01-11 21:37:56,499] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1112 2023-01-11T21:38:06.9234731Z [2023-01-11 21:37:56,570] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1112 2023-01-11T21:38:06.9234737Z 2023-01-11T21:38:06.9234841Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9234917Z import torch 2023-01-11T21:38:06.9234994Z import random 2023-01-11T21:38:06.9235115Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9235234Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9235239Z 2023-01-11T21:38:06.9235323Z aten = torch.ops.aten 2023-01-11T21:38:06.9235463Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9235561Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9235566Z 2023-01-11T21:38:06.9235641Z import triton 2023-01-11T21:38:06.9235736Z import triton.language as tl 2023-01-11T21:38:06.9235865Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9236000Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9236011Z 2023-01-11T21:38:06.9236015Z 2023-01-11T21:38:06.9236164Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9236241Z import triton 2023-01-11T21:38:06.9236338Z import triton.language as tl 2023-01-11T21:38:06.9236456Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9236559Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9236692Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9236817Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9236822Z 2023-01-11T21:38:06.9237240Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9237311Z @triton.jit 2023-01-11T21:38:06.9237454Z def triton_(in_ptr0, in_ptr1, 
out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9237533Z xnumel = 100 2023-01-11T21:38:06.9237630Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9237761Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9237874Z xmask = xindex < xnumel 2023-01-11T21:38:06.9237951Z x2 = xindex 2023-01-11T21:38:06.9238026Z x1 = (xindex // 10) 2023-01-11T21:38:06.9238126Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9238223Z tmp1 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.9238302Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9238437Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9238524Z ''') 2023-01-11T21:38:06.9238529Z 2023-01-11T21:38:06.9238534Z 2023-01-11T21:38:06.9238629Z async_compile.wait(globals()) 2023-01-11T21:38:06.9238702Z del async_compile 2023-01-11T21:38:06.9238716Z 2023-01-11T21:38:06.9238786Z def call(args): 2023-01-11T21:38:06.9238867Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9238945Z args.clear() 2023-01-11T21:38:06.9239039Z with torch.cuda.device(0): 2023-01-11T21:38:06.9239249Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9239345Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9239490Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9239559Z del arg0_1 2023-01-11T21:38:06.9239636Z del arg1_1 2023-01-11T21:38:06.9239716Z return (buf0, ) 2023-01-11T21:38:06.9239721Z 2023-01-11T21:38:06.9239726Z 2023-01-11T21:38:06.9239806Z if __name__ == "__main__": 2023-01-11T21:38:06.9239924Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9240053Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9240259Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9240487Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9240612Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9240618Z 2023-01-11T21:38:06.9240691Z ok (0.085s) 2023-01-11T21:38:06.9241169Z test_cuda_dense_broadcast3 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9241303Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9241562Z [2023-01-11 21:37:56,583] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1113 2023-01-11T21:38:06.9241829Z [2023-01-11 21:37:56,652] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1113 2023-01-11T21:38:06.9241835Z 2023-01-11T21:38:06.9241936Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9242014Z import torch 2023-01-11T21:38:06.9242089Z import random 2023-01-11T21:38:06.9242207Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9242331Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9242337Z 2023-01-11T21:38:06.9242420Z aten = torch.ops.aten 2023-01-11T21:38:06.9242559Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9242656Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9242661Z 2023-01-11T21:38:06.9242735Z import triton 2023-01-11T21:38:06.9242829Z import triton.language as tl 2023-01-11T21:38:06.9242950Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9243090Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9243099Z 2023-01-11T21:38:06.9243103Z 2023-01-11T21:38:06.9243257Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9243332Z import triton 2023-01-11T21:38:06.9243427Z import triton.language as tl 2023-01-11T21:38:06.9243541Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9243646Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9243807Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9243927Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9243932Z 2023-01-11T21:38:06.9244348Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9244422Z @triton.jit 2023-01-11T21:38:06.9244563Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9244637Z xnumel = 100 2023-01-11T21:38:06.9244738Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9244868Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9244953Z xmask = xindex < xnumel 2023-01-11T21:38:06.9245020Z x0 = xindex 2023-01-11T21:38:06.9245117Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9245251Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9245335Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9245471Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9245559Z ''') 2023-01-11T21:38:06.9245564Z 2023-01-11T21:38:06.9245569Z 2023-01-11T21:38:06.9245664Z async_compile.wait(globals()) 2023-01-11T21:38:06.9245743Z del async_compile 2023-01-11T21:38:06.9245749Z 2023-01-11T21:38:06.9245819Z def call(args): 2023-01-11T21:38:06.9245900Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9245977Z args.clear() 2023-01-11T21:38:06.9246073Z with torch.cuda.device(0): 2023-01-11T21:38:06.9246308Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9246403Z stream0 = get_cuda_stream(0) 
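        # Same fused add as the scalar-first cases, with the operands mirrored:
        # here the dense (10, 10) tensor is in_ptr0 and the broadcast scalar is
        # loaded from in_ptr1.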
2023-01-11T21:38:06.9246547Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9246616Z del arg0_1 2023-01-11T21:38:06.9246694Z del arg1_1 2023-01-11T21:38:06.9246778Z return (buf0, ) 2023-01-11T21:38:06.9246784Z 2023-01-11T21:38:06.9246788Z 2023-01-11T21:38:06.9246870Z if __name__ == "__main__": 2023-01-11T21:38:06.9246990Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9247118Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9247323Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9247519Z arg1_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9247634Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9247643Z 2023-01-11T21:38:06.9247714Z ok (0.082s) 2023-01-11T21:38:06.9248193Z test_cuda_dense_dense (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9248325Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9248589Z [2023-01-11 21:37:56,665] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1114 2023-01-11T21:38:06.9248854Z [2023-01-11 21:37:56,733] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1114 2023-01-11T21:38:06.9248860Z 2023-01-11T21:38:06.9248960Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9249039Z import torch 2023-01-11T21:38:06.9249116Z import random 2023-01-11T21:38:06.9249229Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9249354Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9249359Z 2023-01-11T21:38:06.9249443Z aten = torch.ops.aten 2023-01-11T21:38:06.9249609Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9249708Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9249714Z 2023-01-11T21:38:06.9249793Z import triton 2023-01-11T21:38:06.9249888Z import triton.language as tl 2023-01-11T21:38:06.9250009Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9250149Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9250155Z 2023-01-11T21:38:06.9250159Z 2023-01-11T21:38:06.9250315Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9250393Z import triton 2023-01-11T21:38:06.9250486Z import triton.language as tl 2023-01-11T21:38:06.9250603Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9250706Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9250840Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9250961Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9250972Z 2023-01-11T21:38:06.9251384Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9251460Z @triton.jit 2023-01-11T21:38:06.9251601Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : 
tl.constexpr): 2023-01-11T21:38:06.9251677Z xnumel = 100 2023-01-11T21:38:06.9251775Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9251906Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9251991Z xmask = xindex < xnumel 2023-01-11T21:38:06.9252087Z x0 = xindex 2023-01-11T21:38:06.9252186Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9252287Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9252367Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9252501Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9252589Z ''') 2023-01-11T21:38:06.9252597Z 2023-01-11T21:38:06.9252602Z 2023-01-11T21:38:06.9252699Z async_compile.wait(globals()) 2023-01-11T21:38:06.9252782Z del async_compile 2023-01-11T21:38:06.9252787Z 2023-01-11T21:38:06.9252857Z def call(args): 2023-01-11T21:38:06.9252938Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9253017Z args.clear() 2023-01-11T21:38:06.9253111Z with torch.cuda.device(0): 2023-01-11T21:38:06.9253315Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9253409Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9253550Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9253624Z del arg0_1 2023-01-11T21:38:06.9253699Z del arg1_1 2023-01-11T21:38:06.9253779Z return (buf0, ) 2023-01-11T21:38:06.9253785Z 2023-01-11T21:38:06.9253789Z 2023-01-11T21:38:06.9253871Z if __name__ == "__main__": 2023-01-11T21:38:06.9253992Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9254119Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9254326Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9254668Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9254786Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9254792Z 2023-01-11T21:38:06.9254865Z ok (0.081s) 2023-01-11T21:38:06.9255335Z test_cuda_dense_double (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9255471Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9255774Z [2023-01-11 21:37:56,746] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1115 2023-01-11T21:38:06.9256042Z [2023-01-11 21:37:56,815] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1115 2023-01-11T21:38:06.9256048Z 2023-01-11T21:38:06.9256148Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9256225Z import torch 2023-01-11T21:38:06.9256302Z import random 2023-01-11T21:38:06.9256416Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9256542Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9256550Z 2023-01-11T21:38:06.9256636Z aten = torch.ops.aten 2023-01-11T21:38:06.9256774Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9256871Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9256876Z 2023-01-11T21:38:06.9256954Z import triton 2023-01-11T21:38:06.9257049Z import triton.language as tl 2023-01-11T21:38:06.9257288Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9257425Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9257431Z 2023-01-11T21:38:06.9257435Z 2023-01-11T21:38:06.9257591Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9257668Z import triton 2023-01-11T21:38:06.9257765Z import triton.language as tl 2023-01-11T21:38:06.9257881Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9257983Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9258117Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9258237Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9258288Z 2023-01-11T21:38:06.9258698Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9258778Z @triton.jit 2023-01-11T21:38:06.9258925Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9259001Z xnumel = 100 2023-01-11T21:38:06.9259099Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9259230Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9259316Z xmask = xindex < xnumel 2023-01-11T21:38:06.9259382Z x0 = xindex 2023-01-11T21:38:06.9259482Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9259583Z tmp2 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9259675Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.9259758Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9259897Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9259987Z ''') 2023-01-11T21:38:06.9259992Z 2023-01-11T21:38:06.9259997Z 2023-01-11T21:38:06.9260091Z async_compile.wait(globals()) 2023-01-11T21:38:06.9260164Z del async_compile 2023-01-11T21:38:06.9260169Z 2023-01-11T21:38:06.9260253Z def call(args): 2023-01-11T21:38:06.9260335Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9260416Z args.clear() 2023-01-11T21:38:06.9260511Z with torch.cuda.device(0): 2023-01-11T21:38:06.9260715Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9260810Z stream0 = 
get_cuda_stream(0) 2023-01-11T21:38:06.9260946Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9261024Z del arg0_1 2023-01-11T21:38:06.9261099Z del arg1_1 2023-01-11T21:38:06.9261179Z return (buf0, ) 2023-01-11T21:38:06.9261187Z 2023-01-11T21:38:06.9261191Z 2023-01-11T21:38:06.9261274Z if __name__ == "__main__": 2023-01-11T21:38:06.9261394Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9261522Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9261753Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9261957Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9262080Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9262086Z 2023-01-11T21:38:06.9262159Z ok (0.083s) 2023-01-11T21:38:06.9262631Z test_cuda_dense_int (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9262765Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9263028Z [2023-01-11 21:37:56,829] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1116 2023-01-11T21:38:06.9263301Z [2023-01-11 21:37:56,901] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1116 2023-01-11T21:38:06.9263307Z 2023-01-11T21:38:06.9263406Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9263482Z import torch 2023-01-11T21:38:06.9263552Z import random 2023-01-11T21:38:06.9263672Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9263797Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9263802Z 2023-01-11T21:38:06.9263884Z aten = torch.ops.aten 2023-01-11T21:38:06.9264020Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9264115Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9264146Z 2023-01-11T21:38:06.9264225Z import triton 2023-01-11T21:38:06.9264321Z import triton.language as tl 2023-01-11T21:38:06.9264440Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9264582Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9264587Z 2023-01-11T21:38:06.9264595Z 2023-01-11T21:38:06.9264752Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9264828Z import triton 2023-01-11T21:38:06.9264923Z import triton.language as tl 2023-01-11T21:38:06.9265041Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9265145Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9265272Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9265401Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9265406Z 2023-01-11T21:38:06.9265824Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9265904Z @triton.jit 2023-01-11T21:38:06.9266047Z def triton_(in_ptr0, in_ptr1, 
out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9266123Z xnumel = 100 2023-01-11T21:38:06.9266225Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9266357Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9266436Z xmask = xindex < xnumel 2023-01-11T21:38:06.9266507Z x2 = xindex 2023-01-11T21:38:06.9266590Z x0 = xindex % 10 2023-01-11T21:38:06.9266688Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9266785Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9266875Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.9266957Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9267090Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9267180Z ''') 2023-01-11T21:38:06.9267185Z 2023-01-11T21:38:06.9267190Z 2023-01-11T21:38:06.9267284Z async_compile.wait(globals()) 2023-01-11T21:38:06.9267363Z del async_compile 2023-01-11T21:38:06.9267368Z 2023-01-11T21:38:06.9267444Z def call(args): 2023-01-11T21:38:06.9267525Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9267602Z args.clear() 2023-01-11T21:38:06.9267727Z with torch.cuda.device(0): 2023-01-11T21:38:06.9267927Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9268022Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9268165Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9268240Z del arg0_1 2023-01-11T21:38:06.9268311Z del arg1_1 2023-01-11T21:38:06.9268394Z return (buf0, ) 2023-01-11T21:38:06.9268399Z 2023-01-11T21:38:06.9268403Z 2023-01-11T21:38:06.9268485Z if __name__ == "__main__": 2023-01-11T21:38:06.9268598Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9268732Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9268938Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9269136Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9269261Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9269267Z 2023-01-11T21:38:06.9269341Z ok (0.085s) 2023-01-11T21:38:06.9269813Z test_cuda_dense_strided (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9269947Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9270235Z [2023-01-11 21:37:56,915] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1117 2023-01-11T21:38:06.9270500Z [2023-01-11 21:37:56,988] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1117 2023-01-11T21:38:06.9270506Z 2023-01-11T21:38:06.9270602Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9270679Z import torch 2023-01-11T21:38:06.9270758Z import random 2023-01-11T21:38:06.9270878Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9271004Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9271009Z 2023-01-11T21:38:06.9271093Z aten = torch.ops.aten 2023-01-11T21:38:06.9271230Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9271321Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9271331Z 2023-01-11T21:38:06.9271402Z import triton 2023-01-11T21:38:06.9271496Z import triton.language as tl 2023-01-11T21:38:06.9271626Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9271768Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9271773Z 2023-01-11T21:38:06.9271777Z 2023-01-11T21:38:06.9271934Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9272012Z import triton 2023-01-11T21:38:06.9272107Z import triton.language as tl 2023-01-11T21:38:06.9272217Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9272322Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9272456Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9272582Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9272587Z 2023-01-11T21:38:06.9273003Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9273083Z @triton.jit 2023-01-11T21:38:06.9273229Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9273306Z xnumel = 100 2023-01-11T21:38:06.9273398Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9273529Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9273644Z xmask = xindex < xnumel 2023-01-11T21:38:06.9273721Z x2 = xindex 2023-01-11T21:38:06.9273798Z x0 = xindex % 10 2023-01-11T21:38:06.9273879Z x1 = (xindex // 10) 2023-01-11T21:38:06.9273980Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9274085Z tmp1 = tl.load(in_ptr1 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9274166Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9274303Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9274391Z ''') 2023-01-11T21:38:06.9274397Z 2023-01-11T21:38:06.9274401Z 2023-01-11T21:38:06.9274498Z async_compile.wait(globals()) 2023-01-11T21:38:06.9274581Z del async_compile 2023-01-11T21:38:06.9274587Z 2023-01-11T21:38:06.9274667Z def call(args): 2023-01-11T21:38:06.9274748Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9274819Z args.clear() 2023-01-11T21:38:06.9274911Z with torch.cuda.device(0): 2023-01-11T21:38:06.9275119Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.9275210Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9275354Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9275432Z del arg0_1 2023-01-11T21:38:06.9275507Z del arg1_1 2023-01-11T21:38:06.9275581Z return (buf0, ) 2023-01-11T21:38:06.9275586Z 2023-01-11T21:38:06.9275590Z 2023-01-11T21:38:06.9275675Z if __name__ == "__main__": 2023-01-11T21:38:06.9275793Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9275920Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9276161Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9276361Z arg1_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9276482Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9276487Z 2023-01-11T21:38:06.9276559Z ok (0.087s) 2023-01-11T21:38:06.9277038Z test_cuda_dense_transposed (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9277171Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9277433Z [2023-01-11 21:37:57,002] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1118 2023-01-11T21:38:06.9277704Z [2023-01-11 21:37:57,226] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1118 2023-01-11T21:38:06.9277710Z 2023-01-11T21:38:06.9277808Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9277886Z import torch 2023-01-11T21:38:06.9277963Z import random 2023-01-11T21:38:06.9278084Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9278209Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9278214Z 2023-01-11T21:38:06.9278291Z aten = torch.ops.aten 2023-01-11T21:38:06.9278428Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9278522Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9278528Z 2023-01-11T21:38:06.9278605Z import triton 2023-01-11T21:38:06.9278701Z import triton.language as tl 2023-01-11T21:38:06.9278826Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9278967Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9278975Z 2023-01-11T21:38:06.9278980Z 2023-01-11T21:38:06.9279135Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9279206Z import triton 2023-01-11T21:38:06.9279301Z import triton.language as tl 2023-01-11T21:38:06.9279416Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9279547Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9279685Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9279813Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9279818Z 2023-01-11T21:38:06.9280302Z @pointwise(size_hints=[16, 16], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), 
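# NOTE (editorial annotation, not emitted by inductor; my reading of the
# internals in this snapshot): instance_descriptor tells the autotuner which
# argument positions it may specialize on. divisible_by_16=(0, 1, 2) marks the
# three pointer args (in_ptr0, in_ptr1, out_ptr0) as having addresses divisible
# by 16, enabling aligned vectorized access, and equal_to_1 lists integer
# arguments known to be exactly 1 (none here).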
equal_to_1=())]}) 2023-01-11T21:38:06.9280380Z @triton.jit 2023-01-11T21:38:06.9280552Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.9280632Z xnumel = 10 2023-01-11T21:38:06.9280708Z ynumel = 10 2023-01-11T21:38:06.9280809Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9280947Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9281035Z xmask = xindex < xnumel 2023-01-11T21:38:06.9281133Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.9281260Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.9281345Z ymask = yindex < ynumel 2023-01-11T21:38:06.9281416Z x0 = xindex 2023-01-11T21:38:06.9281488Z y1 = yindex 2023-01-11T21:38:06.9281607Z tmp0 = tl.load(in_ptr0 + (y1 + (10*x0)), xmask & ymask) 2023-01-11T21:38:06.9281724Z tmp1 = tl.load(in_ptr1 + (x0 + (10*y1)), xmask & ymask) 2023-01-11T21:38:06.9281806Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9281959Z tl.store(out_ptr0 + (y1 + (10*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.9282073Z ''') 2023-01-11T21:38:06.9282079Z 2023-01-11T21:38:06.9282083Z 2023-01-11T21:38:06.9282181Z async_compile.wait(globals()) 2023-01-11T21:38:06.9282266Z del async_compile 2023-01-11T21:38:06.9282271Z 2023-01-11T21:38:06.9282347Z def call(args): 2023-01-11T21:38:06.9282430Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9282510Z args.clear() 2023-01-11T21:38:06.9282603Z with torch.cuda.device(0): 2023-01-11T21:38:06.9282805Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9282899Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9283047Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, 10, grid=grid(10, 10), stream=stream0) 2023-01-11T21:38:06.9283124Z del arg0_1 2023-01-11T21:38:06.9283198Z del arg1_1 2023-01-11T21:38:06.9283278Z return (buf0, ) 2023-01-11T21:38:06.9283284Z 2023-01-11T21:38:06.9283288Z 2023-01-11T21:38:06.9283370Z if __name__ == "__main__": 2023-01-11T21:38:06.9283492Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9283614Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9283821Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9284027Z arg1_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9284147Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9284152Z 2023-01-11T21:38:06.9284225Z ok (0.238s) 2023-01-11T21:38:06.9284702Z test_cuda_double_broadcast1 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9284837Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9285097Z [2023-01-11 21:37:57,240] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1119 2023-01-11T21:38:06.9285390Z [2023-01-11 21:37:57,316] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1119 2023-01-11T21:38:06.9285397Z 2023-01-11T21:38:06.9285491Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9285579Z import torch 2023-01-11T21:38:06.9285670Z import random 2023-01-11T21:38:06.9285807Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9285941Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9285946Z 2023-01-11T21:38:06.9286030Z aten = torch.ops.aten 2023-01-11T21:38:06.9286168Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9286265Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9286270Z 2023-01-11T21:38:06.9286341Z import triton 2023-01-11T21:38:06.9286437Z import triton.language as tl 2023-01-11T21:38:06.9286563Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9286705Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9286711Z 2023-01-11T21:38:06.9286715Z 2023-01-11T21:38:06.9286874Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9286951Z import triton 2023-01-11T21:38:06.9287044Z import triton.language as tl 2023-01-11T21:38:06.9287154Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9287259Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9287391Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9287517Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9287522Z 2023-01-11T21:38:06.9287938Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp32', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9288040Z @triton.jit 2023-01-11T21:38:06.9288182Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9288259Z xnumel = 100 2023-01-11T21:38:06.9288359Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9288486Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9288569Z xmask = xindex < xnumel 2023-01-11T21:38:06.9288642Z x2 = xindex 2023-01-11T21:38:06.9288720Z x0 = xindex % 10 2023-01-11T21:38:06.9288820Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9288919Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9289003Z tmp2 = tmp1.to(tl.float64) 2023-01-11T21:38:06.9289085Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9289228Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9289315Z ''') 2023-01-11T21:38:06.9289320Z 2023-01-11T21:38:06.9289327Z 2023-01-11T21:38:06.9289424Z async_compile.wait(globals()) 2023-01-11T21:38:06.9289502Z del async_compile 2023-01-11T21:38:06.9289507Z 2023-01-11T21:38:06.9289585Z def call(args): 2023-01-11T21:38:06.9289666Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9289736Z args.clear() 2023-01-11T21:38:06.9289828Z with torch.cuda.device(0): 2023-01-11T21:38:06.9290034Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', 
dtype=torch.float64) 2023-01-11T21:38:06.9290128Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9290272Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9290347Z del arg0_1 2023-01-11T21:38:06.9290422Z del arg1_1 2023-01-11T21:38:06.9290495Z return (buf0, ) 2023-01-11T21:38:06.9290501Z 2023-01-11T21:38:06.9290511Z 2023-01-11T21:38:06.9290587Z if __name__ == "__main__": 2023-01-11T21:38:06.9290706Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9290833Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9291041Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9291243Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9291363Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9291397Z 2023-01-11T21:38:06.9291473Z ok (0.089s) 2023-01-11T21:38:06.9291952Z test_cuda_double_broadcast2 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9292085Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9292340Z [2023-01-11 21:37:57,331] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1120 2023-01-11T21:38:06.9292611Z [2023-01-11 21:37:57,402] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1120 2023-01-11T21:38:06.9292616Z 2023-01-11T21:38:06.9292716Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9292793Z import torch 2023-01-11T21:38:06.9292874Z import random 2023-01-11T21:38:06.9292996Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9293121Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9293127Z 2023-01-11T21:38:06.9293212Z aten = torch.ops.aten 2023-01-11T21:38:06.9293342Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9293441Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9293447Z 2023-01-11T21:38:06.9293521Z import triton 2023-01-11T21:38:06.9293615Z import triton.language as tl 2023-01-11T21:38:06.9293739Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9293915Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9293920Z 2023-01-11T21:38:06.9293925Z 2023-01-11T21:38:06.9294080Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9294150Z import triton 2023-01-11T21:38:06.9294245Z import triton.language as tl 2023-01-11T21:38:06.9294367Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9294472Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9294721Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9294846Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9294851Z 2023-01-11T21:38:06.9295269Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp32', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9295342Z @triton.jit 
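# NOTE (editorial annotation, not part of the generated source): this kernel
# adds the (10, 10) fp64 input to a (1, 10, 1)-shaped fp32 input. The flat
# index is split as x1 = xindex // 10, so all ten elements of an output row
# re-read the same value from in_ptr1 (the broadcast), and that value is
# upcast with tmp1.to(tl.float64) before the add, matching aten type promotion.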
2023-01-11T21:38:06.9295481Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9295552Z xnumel = 100 2023-01-11T21:38:06.9295649Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9295777Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9295866Z xmask = xindex < xnumel 2023-01-11T21:38:06.9295955Z x2 = xindex 2023-01-11T21:38:06.9296038Z x1 = (xindex // 10) 2023-01-11T21:38:06.9296152Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9296245Z tmp1 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.9296334Z tmp2 = tmp1.to(tl.float64) 2023-01-11T21:38:06.9296412Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9296547Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9296634Z ''') 2023-01-11T21:38:06.9296639Z 2023-01-11T21:38:06.9296644Z 2023-01-11T21:38:06.9296737Z async_compile.wait(globals()) 2023-01-11T21:38:06.9296811Z del async_compile 2023-01-11T21:38:06.9296816Z 2023-01-11T21:38:06.9296887Z def call(args): 2023-01-11T21:38:06.9296967Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9297042Z args.clear() 2023-01-11T21:38:06.9297192Z with torch.cuda.device(0): 2023-01-11T21:38:06.9297431Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9297587Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9297750Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9297828Z del arg0_1 2023-01-11T21:38:06.9297912Z del arg1_1 2023-01-11T21:38:06.9298083Z return (buf0, ) 2023-01-11T21:38:06.9298089Z 2023-01-11T21:38:06.9298093Z 2023-01-11T21:38:06.9298173Z if __name__ == "__main__": 2023-01-11T21:38:06.9298288Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9298413Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9298618Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9298826Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9298939Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9298944Z 2023-01-11T21:38:06.9299017Z ok (0.086s) 2023-01-11T21:38:06.9299493Z test_cuda_double_broadcast3 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9299623Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9299881Z [2023-01-11 21:37:57,416] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1121 2023-01-11T21:38:06.9300145Z [2023-01-11 21:37:57,484] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1121 2023-01-11T21:38:06.9300188Z 2023-01-11T21:38:06.9300291Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9300366Z import torch 2023-01-11T21:38:06.9300444Z import random 2023-01-11T21:38:06.9300561Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9300685Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9300690Z 2023-01-11T21:38:06.9300775Z aten = torch.ops.aten 2023-01-11T21:38:06.9300913Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9301009Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9301014Z 2023-01-11T21:38:06.9301091Z import triton 2023-01-11T21:38:06.9301189Z import triton.language as tl 2023-01-11T21:38:06.9301316Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9301450Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9301456Z 2023-01-11T21:38:06.9301468Z 2023-01-11T21:38:06.9301618Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9301693Z import triton 2023-01-11T21:38:06.9301787Z import triton.language as tl 2023-01-11T21:38:06.9301905Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9302011Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9302150Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9302276Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9302282Z 2023-01-11T21:38:06.9302693Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp32', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9302767Z @triton.jit 2023-01-11T21:38:06.9302909Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9302985Z xnumel = 100 2023-01-11T21:38:06.9303089Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9303218Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9303304Z xmask = xindex < xnumel 2023-01-11T21:38:06.9303377Z x0 = xindex 2023-01-11T21:38:06.9303470Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9303635Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9303728Z tmp2 = tmp1.to(tl.float64) 2023-01-11T21:38:06.9303811Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9303948Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9304035Z ''') 2023-01-11T21:38:06.9304041Z 2023-01-11T21:38:06.9304045Z 2023-01-11T21:38:06.9304138Z async_compile.wait(globals()) 2023-01-11T21:38:06.9304211Z del async_compile 2023-01-11T21:38:06.9304216Z 2023-01-11T21:38:06.9304293Z def call(args): 2023-01-11T21:38:06.9304377Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9304453Z args.clear() 2023-01-11T21:38:06.9304549Z with torch.cuda.device(0): 2023-01-11T21:38:06.9304754Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 
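# NOTE (editorial annotation, not part of the generated source): this wrapper
# follows inductor's standard launch sequence: allocate the fp64 output with
# empty_strided, grab the current CUDA stream, launch the compiled kernel over
# all 100 flattened elements via grid(100), then `del` the input references so
# their storage can be reclaimed as early as possible.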
2023-01-11T21:38:06.9304848Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9304984Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9305063Z del arg0_1 2023-01-11T21:38:06.9305140Z del arg1_1 2023-01-11T21:38:06.9305220Z return (buf0, ) 2023-01-11T21:38:06.9305225Z 2023-01-11T21:38:06.9305230Z 2023-01-11T21:38:06.9305311Z if __name__ == "__main__": 2023-01-11T21:38:06.9305428Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9305555Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9305761Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9305950Z arg1_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9306100Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9306105Z 2023-01-11T21:38:06.9306180Z ok (0.082s) 2023-01-11T21:38:06.9306659Z test_cuda_double_dense (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9306791Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9307052Z [2023-01-11 21:37:57,498] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1122 2023-01-11T21:38:06.9307315Z [2023-01-11 21:37:57,567] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1122 2023-01-11T21:38:06.9307321Z 2023-01-11T21:38:06.9307422Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9307503Z import torch 2023-01-11T21:38:06.9307573Z import random 2023-01-11T21:38:06.9307694Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9307820Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9307825Z 2023-01-11T21:38:06.9307908Z aten = torch.ops.aten 2023-01-11T21:38:06.9308048Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9308146Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9308151Z 2023-01-11T21:38:06.9308228Z import triton 2023-01-11T21:38:06.9308323Z import triton.language as tl 2023-01-11T21:38:06.9308443Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9308585Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9308590Z 2023-01-11T21:38:06.9308595Z 2023-01-11T21:38:06.9308750Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9308825Z import triton 2023-01-11T21:38:06.9308922Z import triton.language as tl 2023-01-11T21:38:06.9309039Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9309143Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9309275Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9309396Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9309428Z 2023-01-11T21:38:06.9309847Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp32', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9309925Z @triton.jit 
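# NOTE (editorial annotation, not part of the generated source): the kernel
# below is the fused form of the eager computation
#   out = arg0_1 + arg1_1   # fp64 + fp32 promotes to fp64
# done in one elementwise pass: the fp32 operand is cast in registers via
# tmp1.to(tl.float64) rather than materialized by a separate cast kernel.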
2023-01-11T21:38:06.9310067Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9310145Z xnumel = 100 2023-01-11T21:38:06.9310244Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9310372Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9310461Z xmask = xindex < xnumel 2023-01-11T21:38:06.9310528Z x0 = xindex 2023-01-11T21:38:06.9310631Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9310731Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9310823Z tmp2 = tmp1.to(tl.float64) 2023-01-11T21:38:06.9310905Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9311043Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9311130Z ''') 2023-01-11T21:38:06.9311135Z 2023-01-11T21:38:06.9311139Z 2023-01-11T21:38:06.9311228Z async_compile.wait(globals()) 2023-01-11T21:38:06.9311307Z del async_compile 2023-01-11T21:38:06.9311312Z 2023-01-11T21:38:06.9311389Z def call(args): 2023-01-11T21:38:06.9311470Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9311549Z args.clear() 2023-01-11T21:38:06.9311640Z with torch.cuda.device(0): 2023-01-11T21:38:06.9311843Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9311931Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9312137Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9312215Z del arg0_1 2023-01-11T21:38:06.9312288Z del arg1_1 2023-01-11T21:38:06.9312370Z return (buf0, ) 2023-01-11T21:38:06.9312375Z 2023-01-11T21:38:06.9312379Z 2023-01-11T21:38:06.9312464Z if __name__ == "__main__": 2023-01-11T21:38:06.9312581Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9312709Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9312910Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9313111Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9313235Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9313240Z 2023-01-11T21:38:06.9313313Z ok (0.083s) 2023-01-11T21:38:06.9313793Z test_cuda_double_double (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9313928Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9314190Z [2023-01-11 21:37:57,580] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1123 2023-01-11T21:38:06.9314454Z [2023-01-11 21:37:57,649] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1123 2023-01-11T21:38:06.9314460Z 2023-01-11T21:38:06.9314559Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9314637Z import torch 2023-01-11T21:38:06.9314707Z import random 2023-01-11T21:38:06.9314829Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9314958Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9314964Z 2023-01-11T21:38:06.9315048Z aten = torch.ops.aten 2023-01-11T21:38:06.9315185Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9315282Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9315287Z 2023-01-11T21:38:06.9315385Z import triton 2023-01-11T21:38:06.9315474Z import triton.language as tl 2023-01-11T21:38:06.9315601Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9315742Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9315747Z 2023-01-11T21:38:06.9315752Z 2023-01-11T21:38:06.9315909Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9315982Z import triton 2023-01-11T21:38:06.9316077Z import triton.language as tl 2023-01-11T21:38:06.9316192Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9316295Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9316427Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9316552Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9316557Z 2023-01-11T21:38:06.9316974Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9317050Z @triton.jit 2023-01-11T21:38:06.9317198Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9317274Z xnumel = 100 2023-01-11T21:38:06.9317373Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9317503Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9317582Z xmask = xindex < xnumel 2023-01-11T21:38:06.9317655Z x0 = xindex 2023-01-11T21:38:06.9317753Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9317853Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9317972Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9318114Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9318199Z ''') 2023-01-11T21:38:06.9318205Z 2023-01-11T21:38:06.9318209Z 2023-01-11T21:38:06.9318297Z async_compile.wait(globals()) 2023-01-11T21:38:06.9318380Z del async_compile 2023-01-11T21:38:06.9318386Z 2023-01-11T21:38:06.9318462Z def call(args): 2023-01-11T21:38:06.9318545Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9318621Z args.clear() 2023-01-11T21:38:06.9318714Z with torch.cuda.device(0): 2023-01-11T21:38:06.9318924Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9319011Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9319155Z 
triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9319228Z del arg0_1 2023-01-11T21:38:06.9319303Z del arg1_1 2023-01-11T21:38:06.9319387Z return (buf0, ) 2023-01-11T21:38:06.9319392Z 2023-01-11T21:38:06.9319396Z 2023-01-11T21:38:06.9319480Z if __name__ == "__main__": 2023-01-11T21:38:06.9319599Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9319727Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9319929Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9320129Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9320254Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9320260Z 2023-01-11T21:38:06.9320332Z ok (0.081s) 2023-01-11T21:38:06.9320805Z test_cuda_double_int (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9320942Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9321207Z [2023-01-11 21:37:57,662] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1124 2023-01-11T21:38:06.9321502Z [2023-01-11 21:37:57,737] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1124 2023-01-11T21:38:06.9321508Z 2023-01-11T21:38:06.9321609Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9321685Z import torch 2023-01-11T21:38:06.9321755Z import random 2023-01-11T21:38:06.9321875Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9322000Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9322007Z 2023-01-11T21:38:06.9322088Z aten = torch.ops.aten 2023-01-11T21:38:06.9322226Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9322331Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9322336Z 2023-01-11T21:38:06.9322413Z import triton 2023-01-11T21:38:06.9322501Z import triton.language as tl 2023-01-11T21:38:06.9322628Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9322771Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9322777Z 2023-01-11T21:38:06.9322781Z 2023-01-11T21:38:06.9322940Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9323018Z import triton 2023-01-11T21:38:06.9323115Z import triton.language as tl 2023-01-11T21:38:06.9323230Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9323336Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9323464Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9323593Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9323598Z 2023-01-11T21:38:06.9324015Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp64', 1: '*i32', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9324117Z @triton.jit 2023-01-11T21:38:06.9324259Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
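    # NOTE (editorial annotation, not part of the generated source): x2 walks
    # the flat (10, 10) fp64 tensor while x0 = xindex % 10 indexes the
    # length-10 int32 vector by column, repeating it across rows (the
    # broadcast); the int32 value is widened to fp64 before the add.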
2023-01-11T21:38:06.9324339Z xnumel = 100 2023-01-11T21:38:06.9324439Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9324570Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9324649Z xmask = xindex < xnumel 2023-01-11T21:38:06.9324723Z x2 = xindex 2023-01-11T21:38:06.9324800Z x0 = xindex % 10 2023-01-11T21:38:06.9324898Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9324998Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9325090Z tmp2 = tmp1.to(tl.float64) 2023-01-11T21:38:06.9325172Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9325304Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9325393Z ''') 2023-01-11T21:38:06.9325399Z 2023-01-11T21:38:06.9325403Z 2023-01-11T21:38:06.9325498Z async_compile.wait(globals()) 2023-01-11T21:38:06.9325577Z del async_compile 2023-01-11T21:38:06.9325582Z 2023-01-11T21:38:06.9325659Z def call(args): 2023-01-11T21:38:06.9325741Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9325817Z args.clear() 2023-01-11T21:38:06.9325905Z with torch.cuda.device(0): 2023-01-11T21:38:06.9326108Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9326202Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9326346Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9326424Z del arg0_1 2023-01-11T21:38:06.9326499Z del arg1_1 2023-01-11T21:38:06.9326580Z return (buf0, ) 2023-01-11T21:38:06.9326585Z 2023-01-11T21:38:06.9326590Z 2023-01-11T21:38:06.9326673Z if __name__ == "__main__": 2023-01-11T21:38:06.9326789Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9326916Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9327124Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9327348Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9327471Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9327476Z 2023-01-11T21:38:06.9327552Z ok (0.088s) 2023-01-11T21:38:06.9328031Z test_cuda_double_strided (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9328167Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9328426Z [2023-01-11 21:37:57,751] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1125 2023-01-11T21:38:06.9328685Z [2023-01-11 21:37:57,824] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1125 2023-01-11T21:38:06.9328704Z 2023-01-11T21:38:06.9328798Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9328874Z import torch 2023-01-11T21:38:06.9328949Z import random 2023-01-11T21:38:06.9329068Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9329193Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9329199Z 2023-01-11T21:38:06.9329282Z aten = torch.ops.aten 2023-01-11T21:38:06.9329421Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9329512Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9329517Z 2023-01-11T21:38:06.9329592Z import triton 2023-01-11T21:38:06.9329713Z import triton.language as tl 2023-01-11T21:38:06.9329840Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9329982Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9329988Z 2023-01-11T21:38:06.9329992Z 2023-01-11T21:38:06.9330148Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9330224Z import triton 2023-01-11T21:38:06.9330318Z import triton.language as tl 2023-01-11T21:38:06.9330428Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9330531Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9330662Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9330788Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9330793Z 2023-01-11T21:38:06.9331208Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp32', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9331287Z @triton.jit 2023-01-11T21:38:06.9331431Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9331508Z xnumel = 100 2023-01-11T21:38:06.9331601Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9331734Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9331823Z xmask = xindex < xnumel 2023-01-11T21:38:06.9331896Z x2 = xindex 2023-01-11T21:38:06.9331973Z x0 = xindex % 10 2023-01-11T21:38:06.9332054Z x1 = (xindex // 10) 2023-01-11T21:38:06.9332147Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9332257Z tmp1 = tl.load(in_ptr1 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9332348Z tmp2 = tmp1.to(tl.float64) 2023-01-11T21:38:06.9332429Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9332569Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9332659Z ''') 2023-01-11T21:38:06.9332666Z 2023-01-11T21:38:06.9332670Z 2023-01-11T21:38:06.9332764Z async_compile.wait(globals()) 2023-01-11T21:38:06.9332844Z del async_compile 2023-01-11T21:38:06.9332849Z 2023-01-11T21:38:06.9332919Z def call(args): 2023-01-11T21:38:06.9333002Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9333080Z args.clear() 2023-01-11T21:38:06.9333201Z with torch.cuda.device(0): 2023-01-11T21:38:06.9333411Z buf0 = 
empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9333506Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9333650Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9333719Z del arg0_1 2023-01-11T21:38:06.9333794Z del arg1_1 2023-01-11T21:38:06.9333873Z return (buf0, ) 2023-01-11T21:38:06.9333878Z 2023-01-11T21:38:06.9333883Z 2023-01-11T21:38:06.9333968Z if __name__ == "__main__": 2023-01-11T21:38:06.9334088Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9334219Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9334423Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9334733Z arg1_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9334851Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9334856Z 2023-01-11T21:38:06.9334930Z ok (0.087s) 2023-01-11T21:38:06.9335405Z test_cuda_double_transposed (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9335539Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9335840Z [2023-01-11 21:37:57,838] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1126 2023-01-11T21:38:06.9336104Z [2023-01-11 21:37:58,043] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1126 2023-01-11T21:38:06.9336110Z 2023-01-11T21:38:06.9336212Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9336288Z import torch 2023-01-11T21:38:06.9336364Z import random 2023-01-11T21:38:06.9336479Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9336603Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9336608Z 2023-01-11T21:38:06.9336693Z aten = torch.ops.aten 2023-01-11T21:38:06.9336831Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9336929Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9336935Z 2023-01-11T21:38:06.9337010Z import triton 2023-01-11T21:38:06.9337101Z import triton.language as tl 2023-01-11T21:38:06.9337289Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9337425Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9337431Z 2023-01-11T21:38:06.9337440Z 2023-01-11T21:38:06.9337593Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9337669Z import triton 2023-01-11T21:38:06.9337768Z import triton.language as tl 2023-01-11T21:38:06.9337885Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9337990Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9338124Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9338253Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9338258Z 2023-01-11T21:38:06.9338728Z @pointwise(size_hints=[16, 16], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp32', 2: '*fp64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9338807Z @triton.jit 2023-01-11T21:38:06.9338987Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.9339064Z xnumel = 10 2023-01-11T21:38:06.9339140Z ynumel = 10 2023-01-11T21:38:06.9339241Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9339415Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9339505Z xmask = xindex < xnumel 2023-01-11T21:38:06.9339597Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.9339731Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.9339819Z ymask = yindex < ynumel 2023-01-11T21:38:06.9339891Z x0 = xindex 2023-01-11T21:38:06.9339961Z y1 = yindex 2023-01-11T21:38:06.9340078Z tmp0 = tl.load(in_ptr0 + (y1 + (10*x0)), xmask & ymask) 2023-01-11T21:38:06.9340193Z tmp1 = tl.load(in_ptr1 + (x0 + (10*y1)), xmask & ymask) 2023-01-11T21:38:06.9340282Z tmp2 = tmp1.to(tl.float64) 2023-01-11T21:38:06.9340362Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9340519Z tl.store(out_ptr0 + (y1 + (10*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp3, xmask & ymask) 2023-01-11T21:38:06.9340609Z ''') 2023-01-11T21:38:06.9340614Z 2023-01-11T21:38:06.9340618Z 2023-01-11T21:38:06.9340715Z async_compile.wait(globals()) 2023-01-11T21:38:06.9340795Z del async_compile 2023-01-11T21:38:06.9340800Z 2023-01-11T21:38:06.9340878Z def call(args): 2023-01-11T21:38:06.9340960Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9341031Z args.clear() 2023-01-11T21:38:06.9341125Z with torch.cuda.device(0): 2023-01-11T21:38:06.9341332Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9341426Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9341573Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, 10, grid=grid(10, 10), stream=stream0) 2023-01-11T21:38:06.9341648Z del arg0_1 2023-01-11T21:38:06.9341753Z del arg1_1 2023-01-11T21:38:06.9341827Z return (buf0, ) 2023-01-11T21:38:06.9341832Z 2023-01-11T21:38:06.9341837Z 2023-01-11T21:38:06.9341916Z if __name__ == "__main__": 2023-01-11T21:38:06.9342034Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9342162Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9342374Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9342573Z arg1_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9342693Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9342699Z 2023-01-11T21:38:06.9342770Z ok (0.220s) 2023-01-11T21:38:06.9343240Z test_cuda_int_broadcast1 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9343375Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9343640Z [2023-01-11 21:37:58,057] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1127 2023-01-11T21:38:06.9343907Z [2023-01-11 21:37:58,126] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1127 2023-01-11T21:38:06.9343913Z 2023-01-11T21:38:06.9344013Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9344093Z import torch 2023-01-11T21:38:06.9344167Z import random 2023-01-11T21:38:06.9344288Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9344414Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9344420Z 2023-01-11T21:38:06.9344497Z aten = torch.ops.aten 2023-01-11T21:38:06.9344634Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9344734Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9344740Z 2023-01-11T21:38:06.9344815Z import triton 2023-01-11T21:38:06.9344908Z import triton.language as tl 2023-01-11T21:38:06.9345033Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9345199Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9345205Z 2023-01-11T21:38:06.9345210Z 2023-01-11T21:38:06.9345368Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9345438Z import triton 2023-01-11T21:38:06.9345532Z import triton.language as tl 2023-01-11T21:38:06.9345647Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9345751Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9345889Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9346015Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9346020Z 2023-01-11T21:38:06.9346435Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9346513Z @triton.jit 2023-01-11T21:38:06.9346653Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9346729Z xnumel = 10 2023-01-11T21:38:06.9346830Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9346962Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9347045Z xmask = xindex < xnumel 2023-01-11T21:38:06.9347121Z x0 = xindex 2023-01-11T21:38:06.9347220Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9347313Z tmp2 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9347403Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.9347483Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9347621Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9347736Z ''') 2023-01-11T21:38:06.9347741Z 2023-01-11T21:38:06.9347746Z 2023-01-11T21:38:06.9347841Z async_compile.wait(globals()) 2023-01-11T21:38:06.9347920Z del async_compile 2023-01-11T21:38:06.9347926Z 2023-01-11T21:38:06.9348002Z def call(args): 2023-01-11T21:38:06.9348077Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9348156Z args.clear() 2023-01-11T21:38:06.9348251Z with torch.cuda.device(0): 2023-01-11T21:38:06.9348452Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9348547Z stream0 = 
get_cuda_stream(0) 2023-01-11T21:38:06.9348691Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9348767Z del arg0_1 2023-01-11T21:38:06.9348835Z del arg1_1 2023-01-11T21:38:06.9348915Z return (buf0, ) 2023-01-11T21:38:06.9348920Z 2023-01-11T21:38:06.9348925Z 2023-01-11T21:38:06.9349005Z if __name__ == "__main__": 2023-01-11T21:38:06.9349127Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9349255Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9349453Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9349657Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9349773Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9349783Z 2023-01-11T21:38:06.9349849Z ok (0.082s) 2023-01-11T21:38:06.9350326Z test_cuda_int_broadcast2 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9350460Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9350725Z [2023-01-11 21:37:58,139] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1128 2023-01-11T21:38:06.9350991Z [2023-01-11 21:37:58,213] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1128 2023-01-11T21:38:06.9351022Z 2023-01-11T21:38:06.9351124Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9351200Z import torch 2023-01-11T21:38:06.9351276Z import random 2023-01-11T21:38:06.9351398Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9351517Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9351523Z 2023-01-11T21:38:06.9351607Z aten = torch.ops.aten 2023-01-11T21:38:06.9351744Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9351843Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9351848Z 2023-01-11T21:38:06.9351927Z import triton 2023-01-11T21:38:06.9352020Z import triton.language as tl 2023-01-11T21:38:06.9352148Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9352283Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9352294Z 2023-01-11T21:38:06.9352298Z 2023-01-11T21:38:06.9352449Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9352526Z import triton 2023-01-11T21:38:06.9352621Z import triton.language as tl 2023-01-11T21:38:06.9352736Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9352840Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9352973Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9353099Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9353104Z 2023-01-11T21:38:06.9353521Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9353618Z @triton.jit 2023-01-11T21:38:06.9353761Z def triton_(in_ptr0, in_ptr1, out_ptr0, 
xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9353838Z xnumel = 100 2023-01-11T21:38:06.9353938Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9354073Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9354159Z xmask = xindex < xnumel 2023-01-11T21:38:06.9354235Z x0 = xindex % 10 2023-01-11T21:38:06.9354310Z x1 = (xindex // 10) 2023-01-11T21:38:06.9354383Z x2 = xindex 2023-01-11T21:38:06.9354481Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9354578Z tmp2 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.9354668Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.9354748Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9354887Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9354968Z ''') 2023-01-11T21:38:06.9354974Z 2023-01-11T21:38:06.9354981Z 2023-01-11T21:38:06.9355078Z async_compile.wait(globals()) 2023-01-11T21:38:06.9355157Z del async_compile 2023-01-11T21:38:06.9355162Z 2023-01-11T21:38:06.9355241Z def call(args): 2023-01-11T21:38:06.9355322Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9355399Z args.clear() 2023-01-11T21:38:06.9355490Z with torch.cuda.device(0): 2023-01-11T21:38:06.9355700Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9355794Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9355938Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9356013Z del arg0_1 2023-01-11T21:38:06.9356088Z del arg1_1 2023-01-11T21:38:06.9356171Z return (buf0, ) 2023-01-11T21:38:06.9356176Z 2023-01-11T21:38:06.9356180Z 2023-01-11T21:38:06.9356262Z if __name__ == "__main__": 2023-01-11T21:38:06.9356380Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9356503Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9356700Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9356911Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9357058Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9357064Z 2023-01-11T21:38:06.9357137Z ok (0.087s) 2023-01-11T21:38:06.9357607Z test_cuda_int_broadcast3 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9357741Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9358001Z [2023-01-11 21:37:58,226] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1129 2023-01-11T21:38:06.9358273Z [2023-01-11 21:37:58,293] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1129 2023-01-11T21:38:06.9358279Z 2023-01-11T21:38:06.9358382Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9358454Z import torch 2023-01-11T21:38:06.9358531Z import random 2023-01-11T21:38:06.9358656Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9358781Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9358787Z 2023-01-11T21:38:06.9358871Z aten = torch.ops.aten 2023-01-11T21:38:06.9359009Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9359104Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9359109Z 2023-01-11T21:38:06.9359179Z import triton 2023-01-11T21:38:06.9359277Z import triton.language as tl 2023-01-11T21:38:06.9359405Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9359579Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9359585Z 2023-01-11T21:38:06.9359590Z 2023-01-11T21:38:06.9359746Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9359824Z import triton 2023-01-11T21:38:06.9359917Z import triton.language as tl 2023-01-11T21:38:06.9360036Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9360133Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9360266Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9360392Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9360397Z 2023-01-11T21:38:06.9360812Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9360887Z @triton.jit 2023-01-11T21:38:06.9361034Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9361109Z xnumel = 10 2023-01-11T21:38:06.9361209Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9361334Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9361416Z xmask = xindex < xnumel 2023-01-11T21:38:06.9361493Z x0 = xindex 2023-01-11T21:38:06.9361592Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9361725Z tmp2 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9361819Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.9361901Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9362031Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9362119Z ''') 2023-01-11T21:38:06.9362124Z 2023-01-11T21:38:06.9362129Z 2023-01-11T21:38:06.9362223Z async_compile.wait(globals()) 2023-01-11T21:38:06.9362302Z del async_compile 2023-01-11T21:38:06.9362308Z 2023-01-11T21:38:06.9362386Z def call(args): 2023-01-11T21:38:06.9362467Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9362544Z args.clear() 2023-01-11T21:38:06.9362630Z with torch.cuda.device(0): 2023-01-11T21:38:06.9362830Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 
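        # [Editorial annotation -- not part of the captured log output.]
        # empty_strided hands back an *uninitialized* CUDA buffer described by an
        # explicit (size, stride) pair; the Triton kernel launched just below is
        # what actually fills it. A hypothetical sanity check that would hold at
        # this point in the generated wrapper:
        #   assert buf0.shape == (10,) and buf0.stride() == (1,)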
2023-01-11T21:38:06.9362950Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9363096Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9363173Z del arg0_1 2023-01-11T21:38:06.9363245Z del arg1_1 2023-01-11T21:38:06.9363324Z return (buf0, ) 2023-01-11T21:38:06.9363329Z 2023-01-11T21:38:06.9363333Z 2023-01-11T21:38:06.9363415Z if __name__ == "__main__": 2023-01-11T21:38:06.9363528Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9363654Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9363848Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9364046Z arg1_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9364166Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9364171Z 2023-01-11T21:38:06.9364243Z ok (0.080s) 2023-01-11T21:38:06.9364717Z test_cuda_int_dense (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9364848Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9365109Z [2023-01-11 21:37:58,306] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1130 2023-01-11T21:38:06.9365366Z [2023-01-11 21:37:58,383] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1130 2023-01-11T21:38:06.9365404Z 2023-01-11T21:38:06.9365499Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9365575Z import torch 2023-01-11T21:38:06.9365652Z import random 2023-01-11T21:38:06.9365774Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9365902Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9365907Z 2023-01-11T21:38:06.9365988Z aten = torch.ops.aten 2023-01-11T21:38:06.9366125Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9366217Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9366222Z 2023-01-11T21:38:06.9366297Z import triton 2023-01-11T21:38:06.9366391Z import triton.language as tl 2023-01-11T21:38:06.9366517Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9366659Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9366665Z 2023-01-11T21:38:06.9366669Z 2023-01-11T21:38:06.9366831Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9366908Z import triton 2023-01-11T21:38:06.9367003Z import triton.language as tl 2023-01-11T21:38:06.9367113Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9367216Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9367353Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9367479Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9367484Z 2023-01-11T21:38:06.9367900Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9367975Z @triton.jit 2023-01-11T21:38:06.9368117Z def 
triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9368191Z xnumel = 100 2023-01-11T21:38:06.9368284Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9368417Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9368503Z xmask = xindex < xnumel 2023-01-11T21:38:06.9368580Z x0 = xindex % 10 2023-01-11T21:38:06.9368656Z x2 = xindex 2023-01-11T21:38:06.9368759Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9368884Z tmp2 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9368970Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.9369053Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9369191Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9369278Z ''') 2023-01-11T21:38:06.9369284Z 2023-01-11T21:38:06.9369289Z 2023-01-11T21:38:06.9369385Z async_compile.wait(globals()) 2023-01-11T21:38:06.9369467Z del async_compile 2023-01-11T21:38:06.9369472Z 2023-01-11T21:38:06.9369547Z def call(args): 2023-01-11T21:38:06.9369622Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9369699Z args.clear() 2023-01-11T21:38:06.9369794Z with torch.cuda.device(0): 2023-01-11T21:38:06.9370000Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9370097Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9370240Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9370319Z del arg0_1 2023-01-11T21:38:06.9370387Z del arg1_1 2023-01-11T21:38:06.9370466Z return (buf0, ) 2023-01-11T21:38:06.9370471Z 2023-01-11T21:38:06.9370475Z 2023-01-11T21:38:06.9370556Z if __name__ == "__main__": 2023-01-11T21:38:06.9370675Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9370803Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9371002Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9371205Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9371327Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9371361Z 2023-01-11T21:38:06.9371432Z ok (0.090s) 2023-01-11T21:38:06.9371897Z test_cuda_int_double (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9372032Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9372290Z [2023-01-11 21:37:58,397] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1131 2023-01-11T21:38:06.9372554Z [2023-01-11 21:37:58,466] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1131 2023-01-11T21:38:06.9372560Z 2023-01-11T21:38:06.9372660Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9372738Z import torch 2023-01-11T21:38:06.9372813Z import random 2023-01-11T21:38:06.9372932Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9373056Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9373062Z 2023-01-11T21:38:06.9373139Z aten = torch.ops.aten 2023-01-11T21:38:06.9373279Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9373376Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9373381Z 2023-01-11T21:38:06.9373461Z import triton 2023-01-11T21:38:06.9373556Z import triton.language as tl 2023-01-11T21:38:06.9373683Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9373823Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9373828Z 2023-01-11T21:38:06.9373833Z 2023-01-11T21:38:06.9373988Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9374058Z import triton 2023-01-11T21:38:06.9374151Z import triton.language as tl 2023-01-11T21:38:06.9374271Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9374373Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9374616Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9374744Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9374749Z 2023-01-11T21:38:06.9375205Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9375283Z @triton.jit 2023-01-11T21:38:06.9375418Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9375492Z xnumel = 100 2023-01-11T21:38:06.9375588Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9375715Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9375796Z xmask = xindex < xnumel 2023-01-11T21:38:06.9375874Z x0 = xindex % 10 2023-01-11T21:38:06.9375943Z x2 = xindex 2023-01-11T21:38:06.9376034Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9376133Z tmp2 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9376220Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.9376295Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9376432Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9376518Z ''') 2023-01-11T21:38:06.9376523Z 2023-01-11T21:38:06.9376528Z 2023-01-11T21:38:06.9376622Z async_compile.wait(globals()) 2023-01-11T21:38:06.9376692Z del async_compile 2023-01-11T21:38:06.9376697Z 2023-01-11T21:38:06.9376772Z def call(args): 2023-01-11T21:38:06.9376850Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9376927Z args.clear() 2023-01-11T21:38:06.9377018Z with torch.cuda.device(0): 2023-01-11T21:38:06.9377307Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', 
dtype=torch.float64) 2023-01-11T21:38:06.9377443Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9377580Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9377658Z del arg0_1 2023-01-11T21:38:06.9377730Z del arg1_1 2023-01-11T21:38:06.9377810Z return (buf0, ) 2023-01-11T21:38:06.9377815Z 2023-01-11T21:38:06.9377822Z 2023-01-11T21:38:06.9377903Z if __name__ == "__main__": 2023-01-11T21:38:06.9378022Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9378147Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9378345Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9378543Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9378664Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9378669Z 2023-01-11T21:38:06.9378742Z ok (0.083s) 2023-01-11T21:38:06.9379214Z test_cuda_int_int (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9379349Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9379610Z [2023-01-11 21:37:58,480] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1132 2023-01-11T21:38:06.9379877Z [2023-01-11 21:37:58,544] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1132 2023-01-11T21:38:06.9379882Z 2023-01-11T21:38:06.9379981Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9380058Z import torch 2023-01-11T21:38:06.9380128Z import random 2023-01-11T21:38:06.9380250Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9380380Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9380385Z 2023-01-11T21:38:06.9380469Z aten = torch.ops.aten 2023-01-11T21:38:06.9380607Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9380705Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9380710Z 2023-01-11T21:38:06.9380816Z import triton 2023-01-11T21:38:06.9380911Z import triton.language as tl 2023-01-11T21:38:06.9381030Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9381173Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9381178Z 2023-01-11T21:38:06.9381183Z 2023-01-11T21:38:06.9381339Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9381418Z import triton 2023-01-11T21:38:06.9381511Z import triton.language as tl 2023-01-11T21:38:06.9381627Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9381731Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9381866Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9381987Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9381992Z 2023-01-11T21:38:06.9382405Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i32', 1: '*i32', 2: '*i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9382482Z @triton.jit 
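# [Editorial annotation -- not emitted by Inductor.] @triton.jit compiles lazily:
# the function below is traced to Triton IR and lowered to PTX on first launch,
# once per distinct meta-parameter configuration. XBLOCK is a tl.constexpr, so
# every block size the @pointwise autotuner tries becomes its own specialized
# kernel binary.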
2023-01-11T21:38:06.9382625Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9382699Z xnumel = 10 2023-01-11T21:38:06.9382797Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9382929Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9383013Z xmask = xindex < xnumel 2023-01-11T21:38:06.9383080Z x0 = xindex 2023-01-11T21:38:06.9383179Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9383278Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9383390Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9383527Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9383616Z ''') 2023-01-11T21:38:06.9383621Z 2023-01-11T21:38:06.9383626Z 2023-01-11T21:38:06.9383720Z async_compile.wait(globals()) 2023-01-11T21:38:06.9383795Z del async_compile 2023-01-11T21:38:06.9383801Z 2023-01-11T21:38:06.9383878Z def call(args): 2023-01-11T21:38:06.9383959Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9384037Z args.clear() 2023-01-11T21:38:06.9384131Z with torch.cuda.device(0): 2023-01-11T21:38:06.9384327Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.9384421Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9384560Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9384636Z del arg0_1 2023-01-11T21:38:06.9384709Z del arg1_1 2023-01-11T21:38:06.9384791Z return (buf0, ) 2023-01-11T21:38:06.9384797Z 2023-01-11T21:38:06.9384801Z 2023-01-11T21:38:06.9384882Z if __name__ == "__main__": 2023-01-11T21:38:06.9385002Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9385129Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9385331Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9385519Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9385641Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9385646Z 2023-01-11T21:38:06.9385717Z ok (0.077s) 2023-01-11T21:38:06.9386183Z test_cuda_int_strided (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9386318Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9386578Z [2023-01-11 21:37:58,557] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1133 2023-01-11T21:38:06.9386871Z [2023-01-11 21:37:58,631] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1133 2023-01-11T21:38:06.9386877Z 2023-01-11T21:38:06.9386976Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9387051Z import torch 2023-01-11T21:38:06.9387121Z import random 2023-01-11T21:38:06.9387242Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9387367Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9387372Z 2023-01-11T21:38:06.9387456Z aten = torch.ops.aten 2023-01-11T21:38:06.9387592Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9387691Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9387696Z 2023-01-11T21:38:06.9387771Z import triton 2023-01-11T21:38:06.9387864Z import triton.language as tl 2023-01-11T21:38:06.9387983Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9388125Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9388133Z 2023-01-11T21:38:06.9388137Z 2023-01-11T21:38:06.9388292Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9388368Z import triton 2023-01-11T21:38:06.9388462Z import triton.language as tl 2023-01-11T21:38:06.9388578Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9388680Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9388813Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9388933Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9388938Z 2023-01-11T21:38:06.9389355Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9389463Z @triton.jit 2023-01-11T21:38:06.9389605Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9389685Z xnumel = 100 2023-01-11T21:38:06.9389784Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9389913Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9389999Z xmask = xindex < xnumel 2023-01-11T21:38:06.9390071Z x0 = xindex % 10 2023-01-11T21:38:06.9390152Z x1 = (xindex // 10) 2023-01-11T21:38:06.9390222Z x2 = xindex 2023-01-11T21:38:06.9390323Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9390433Z tmp2 = tl.load(in_ptr1 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9390523Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.9390598Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9390735Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9390819Z ''') 2023-01-11T21:38:06.9390824Z 2023-01-11T21:38:06.9390829Z 2023-01-11T21:38:06.9390923Z async_compile.wait(globals()) 2023-01-11T21:38:06.9391002Z del async_compile 2023-01-11T21:38:06.9391007Z 2023-01-11T21:38:06.9391087Z def call(args): 2023-01-11T21:38:06.9391167Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9391244Z args.clear() 2023-01-11T21:38:06.9391331Z with torch.cuda.device(0): 2023-01-11T21:38:06.9391534Z buf0 = 
empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9391628Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9391769Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9391843Z del arg0_1 2023-01-11T21:38:06.9391917Z del arg1_1 2023-01-11T21:38:06.9391998Z return (buf0, ) 2023-01-11T21:38:06.9392003Z 2023-01-11T21:38:06.9392010Z 2023-01-11T21:38:06.9392093Z if __name__ == "__main__": 2023-01-11T21:38:06.9392206Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9392334Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9392531Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9392762Z arg1_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9392884Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9392889Z 2023-01-11T21:38:06.9392962Z ok (0.088s) 2023-01-11T21:38:06.9393440Z test_cuda_int_transposed (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9393576Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9393840Z [2023-01-11 21:37:58,645] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1134 2023-01-11T21:38:06.9394102Z [2023-01-11 21:37:58,718] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1134 2023-01-11T21:38:06.9394114Z 2023-01-11T21:38:06.9394209Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9394283Z import torch 2023-01-11T21:38:06.9394364Z import random 2023-01-11T21:38:06.9394484Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9394610Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9394615Z 2023-01-11T21:38:06.9394698Z aten = torch.ops.aten 2023-01-11T21:38:06.9394831Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9394923Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9394928Z 2023-01-11T21:38:06.9395037Z import triton 2023-01-11T21:38:06.9395131Z import triton.language as tl 2023-01-11T21:38:06.9395258Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9395397Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9395402Z 2023-01-11T21:38:06.9395407Z 2023-01-11T21:38:06.9395567Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9395643Z import triton 2023-01-11T21:38:06.9395732Z import triton.language as tl 2023-01-11T21:38:06.9395848Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9395952Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9396087Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9396214Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9396219Z 2023-01-11T21:38:06.9396639Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 
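# [Editorial annotation -- not part of the log.] In the meta dict above, '*i32'
# and '*fp32' describe pointer arguments and their element types, while the bare
# 'i32' is the scalar xnumel. divisible_by_16=(0, 1, 2) asserts that the three
# pointers are 16-byte aligned (xnumel, argument 3, carries no such guarantee),
# which lets Triton emit wider, vectorized loads and stores for them.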
2023-01-11T21:38:06.9396717Z @triton.jit 2023-01-11T21:38:06.9396860Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9396941Z xnumel = 100 2023-01-11T21:38:06.9397034Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9397170Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9397255Z xmask = xindex < xnumel 2023-01-11T21:38:06.9397339Z x1 = (xindex // 10) 2023-01-11T21:38:06.9397412Z x2 = xindex 2023-01-11T21:38:06.9397511Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.9397604Z tmp2 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9397695Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.9397775Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9397912Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9398000Z ''') 2023-01-11T21:38:06.9398008Z 2023-01-11T21:38:06.9398013Z 2023-01-11T21:38:06.9398108Z async_compile.wait(globals()) 2023-01-11T21:38:06.9398185Z del async_compile 2023-01-11T21:38:06.9398190Z 2023-01-11T21:38:06.9398265Z def call(args): 2023-01-11T21:38:06.9398340Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9398417Z args.clear() 2023-01-11T21:38:06.9398538Z with torch.cuda.device(0): 2023-01-11T21:38:06.9398743Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9398835Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9398974Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9399047Z del arg0_1 2023-01-11T21:38:06.9399113Z del arg1_1 2023-01-11T21:38:06.9399189Z return (buf0, ) 2023-01-11T21:38:06.9399194Z 2023-01-11T21:38:06.9399198Z 2023-01-11T21:38:06.9399279Z if __name__ == "__main__": 2023-01-11T21:38:06.9399395Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9399523Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9399718Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9399922Z arg1_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9400044Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9400049Z 2023-01-11T21:38:06.9400112Z ok (0.087s) 2023-01-11T21:38:06.9400589Z test_cuda_strided_broadcast1 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9400719Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9401015Z [2023-01-11 21:37:58,733] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1135 2023-01-11T21:38:06.9401279Z [2023-01-11 21:37:58,806] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1135 2023-01-11T21:38:06.9401285Z 2023-01-11T21:38:06.9401381Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9401455Z import torch 2023-01-11T21:38:06.9401528Z import random 2023-01-11T21:38:06.9401646Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9401763Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9401778Z 2023-01-11T21:38:06.9401853Z aten = torch.ops.aten 2023-01-11T21:38:06.9401988Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9402080Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9402086Z 2023-01-11T21:38:06.9402158Z import triton 2023-01-11T21:38:06.9402251Z import triton.language as tl 2023-01-11T21:38:06.9402374Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9402519Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9402525Z 2023-01-11T21:38:06.9402530Z 2023-01-11T21:38:06.9402676Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9402750Z import triton 2023-01-11T21:38:06.9402842Z import triton.language as tl 2023-01-11T21:38:06.9402961Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9403062Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9403199Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9403325Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9403330Z 2023-01-11T21:38:06.9403744Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9403810Z @triton.jit 2023-01-11T21:38:06.9403951Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9404025Z xnumel = 100 2023-01-11T21:38:06.9404122Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9404250Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9404395Z xmask = xindex < xnumel 2023-01-11T21:38:06.9404471Z x0 = xindex % 10 2023-01-11T21:38:06.9404544Z x1 = (xindex // 10) 2023-01-11T21:38:06.9404616Z x2 = xindex 2023-01-11T21:38:06.9404724Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9404821Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9404899Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9405033Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9405122Z ''') 2023-01-11T21:38:06.9405127Z 2023-01-11T21:38:06.9405132Z 2023-01-11T21:38:06.9405226Z async_compile.wait(globals()) 2023-01-11T21:38:06.9405296Z del async_compile 2023-01-11T21:38:06.9405304Z 2023-01-11T21:38:06.9405377Z def call(args): 2023-01-11T21:38:06.9405455Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9405530Z args.clear() 2023-01-11T21:38:06.9405645Z with torch.cuda.device(0): 2023-01-11T21:38:06.9405874Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.9405967Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9406101Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9406174Z del arg0_1 2023-01-11T21:38:06.9406246Z del arg1_1 2023-01-11T21:38:06.9406320Z return (buf0, ) 2023-01-11T21:38:06.9406325Z 2023-01-11T21:38:06.9406330Z 2023-01-11T21:38:06.9406409Z if __name__ == "__main__": 2023-01-11T21:38:06.9406524Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9406649Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9406853Z arg0_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9407071Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9407189Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9407195Z 2023-01-11T21:38:06.9407265Z ok (0.088s) 2023-01-11T21:38:06.9407741Z test_cuda_strided_broadcast2 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9407872Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9408133Z [2023-01-11 21:37:58,820] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1136 2023-01-11T21:38:06.9408400Z [2023-01-11 21:37:58,895] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1136 2023-01-11T21:38:06.9408406Z 2023-01-11T21:38:06.9408505Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9408578Z import torch 2023-01-11T21:38:06.9408646Z import random 2023-01-11T21:38:06.9408767Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9408892Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9408897Z 2023-01-11T21:38:06.9408979Z aten = torch.ops.aten 2023-01-11T21:38:06.9409116Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9409211Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9409217Z 2023-01-11T21:38:06.9409289Z import triton 2023-01-11T21:38:06.9409383Z import triton.language as tl 2023-01-11T21:38:06.9409501Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9409639Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9409647Z 2023-01-11T21:38:06.9409652Z 2023-01-11T21:38:06.9409806Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9409882Z import triton 2023-01-11T21:38:06.9409977Z import triton.language as tl 2023-01-11T21:38:06.9410088Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9410223Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9410350Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9410475Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9410480Z 2023-01-11T21:38:06.9410902Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9410976Z @triton.jit 
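# [Editorial annotation -- not emitted by Inductor.] The kernel below linearizes
# the 100-element output: x0 = xindex % 10 is the fastest-moving axis and
# x1 = xindex // 10 the slower one. The (2*x0) + (30*x1) offset walks the
# non-contiguous input laid out with strides (30, 2), while the broadcast
# operand is indexed by x1 alone, so one loaded value is reused across each
# row of the output.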
2023-01-11T21:38:06.9411117Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9411193Z xnumel = 100 2023-01-11T21:38:06.9411291Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9411420Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9411500Z xmask = xindex < xnumel 2023-01-11T21:38:06.9411569Z x0 = xindex % 10 2023-01-11T21:38:06.9411649Z x1 = (xindex // 10) 2023-01-11T21:38:06.9411724Z x2 = xindex 2023-01-11T21:38:06.9411834Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9411935Z tmp1 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.9412012Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9412141Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9412227Z ''') 2023-01-11T21:38:06.9412233Z 2023-01-11T21:38:06.9412238Z 2023-01-11T21:38:06.9412331Z async_compile.wait(globals()) 2023-01-11T21:38:06.9412408Z del async_compile 2023-01-11T21:38:06.9412413Z 2023-01-11T21:38:06.9412490Z def call(args): 2023-01-11T21:38:06.9412571Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9412676Z args.clear() 2023-01-11T21:38:06.9412769Z with torch.cuda.device(0): 2023-01-11T21:38:06.9412976Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9413066Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9413209Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9413282Z del arg0_1 2023-01-11T21:38:06.9413353Z del arg1_1 2023-01-11T21:38:06.9413432Z return (buf0, ) 2023-01-11T21:38:06.9413437Z 2023-01-11T21:38:06.9413442Z 2023-01-11T21:38:06.9413526Z if __name__ == "__main__": 2023-01-11T21:38:06.9413643Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9413761Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9413964Z arg0_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9414172Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9414295Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9414300Z 2023-01-11T21:38:06.9414370Z ok (0.088s) 2023-01-11T21:38:06.9414963Z test_cuda_strided_broadcast3 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9415094Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9415353Z [2023-01-11 21:37:58,909] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1137 2023-01-11T21:38:06.9415616Z [2023-01-11 21:37:58,982] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1137 2023-01-11T21:38:06.9415625Z 2023-01-11T21:38:06.9415719Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9415803Z import torch 2023-01-11T21:38:06.9415890Z import random 2023-01-11T21:38:06.9416024Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9416155Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9416206Z 2023-01-11T21:38:06.9416291Z aten = torch.ops.aten 2023-01-11T21:38:06.9416429Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9416525Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9416530Z 2023-01-11T21:38:06.9416597Z import triton 2023-01-11T21:38:06.9416688Z import triton.language as tl 2023-01-11T21:38:06.9416813Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9416950Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9416955Z 2023-01-11T21:38:06.9416960Z 2023-01-11T21:38:06.9417112Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9417244Z import triton 2023-01-11T21:38:06.9417339Z import triton.language as tl 2023-01-11T21:38:06.9417445Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9417546Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9417676Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9417805Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9417810Z 2023-01-11T21:38:06.9418225Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9418303Z @triton.jit 2023-01-11T21:38:06.9418443Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9418517Z xnumel = 100 2023-01-11T21:38:06.9418608Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9418736Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9418864Z xmask = xindex < xnumel 2023-01-11T21:38:06.9418940Z x0 = xindex % 10 2023-01-11T21:38:06.9419019Z x1 = (xindex // 10) 2023-01-11T21:38:06.9419089Z x2 = xindex 2023-01-11T21:38:06.9419196Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9419323Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9419403Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9419537Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9419623Z ''') 2023-01-11T21:38:06.9419629Z 2023-01-11T21:38:06.9419633Z 2023-01-11T21:38:06.9419728Z async_compile.wait(globals()) 2023-01-11T21:38:06.9419802Z del async_compile 2023-01-11T21:38:06.9419807Z 2023-01-11T21:38:06.9419879Z def call(args): 2023-01-11T21:38:06.9419959Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9420028Z args.clear() 2023-01-11T21:38:06.9420121Z with torch.cuda.device(0): 2023-01-11T21:38:06.9420328Z buf0 = empty_strided((10, 10), (10, 
1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9420422Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9420559Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9420633Z del arg0_1 2023-01-11T21:38:06.9420708Z del arg1_1 2023-01-11T21:38:06.9420779Z return (buf0, ) 2023-01-11T21:38:06.9420784Z 2023-01-11T21:38:06.9420795Z 2023-01-11T21:38:06.9420868Z if __name__ == "__main__": 2023-01-11T21:38:06.9420984Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9421108Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9421312Z arg0_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9421507Z arg1_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9421625Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9421633Z 2023-01-11T21:38:06.9421707Z ok (0.087s) 2023-01-11T21:38:06.9422204Z test_cuda_strided_dense (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9422337Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9422590Z [2023-01-11 21:37:58,996] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1138 2023-01-11T21:38:06.9422854Z [2023-01-11 21:37:59,074] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1138 2023-01-11T21:38:06.9422859Z 2023-01-11T21:38:06.9422960Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9423038Z import torch 2023-01-11T21:38:06.9423112Z import random 2023-01-11T21:38:06.9423230Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9423354Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9423359Z 2023-01-11T21:38:06.9423439Z aten = torch.ops.aten 2023-01-11T21:38:06.9423571Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9423666Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9423671Z 2023-01-11T21:38:06.9423741Z import triton 2023-01-11T21:38:06.9423835Z import triton.language as tl 2023-01-11T21:38:06.9423960Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9424099Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9424104Z 2023-01-11T21:38:06.9424109Z 2023-01-11T21:38:06.9424261Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9424335Z import triton 2023-01-11T21:38:06.9424422Z import triton.language as tl 2023-01-11T21:38:06.9424563Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9424662Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9424795Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9424918Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9424923Z 2023-01-11T21:38:06.9425340Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 
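# [Editorial annotation -- not part of the log.] size_hints=[128] rounds the
# 100-element problem up to a power of two, so a launched block can overshoot
# xnumel. The xmask = xindex < xnumel guard below keeps the tail safe: lanes
# masked off in tl.load never touch memory (their result is unspecified unless
# an `other` default is supplied), and lanes masked off in tl.store write
# nothing.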
2023-01-11T21:38:06.9425416Z @triton.jit 2023-01-11T21:38:06.9425558Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9425633Z xnumel = 100 2023-01-11T21:38:06.9425748Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9425898Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9425985Z xmask = xindex < xnumel 2023-01-11T21:38:06.9426063Z x0 = xindex % 10 2023-01-11T21:38:06.9426142Z x1 = (xindex // 10) 2023-01-11T21:38:06.9426205Z x2 = xindex 2023-01-11T21:38:06.9426314Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9426410Z tmp1 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9426489Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9426625Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9426710Z ''') 2023-01-11T21:38:06.9426715Z 2023-01-11T21:38:06.9426720Z 2023-01-11T21:38:06.9426814Z async_compile.wait(globals()) 2023-01-11T21:38:06.9426892Z del async_compile 2023-01-11T21:38:06.9426897Z 2023-01-11T21:38:06.9426966Z def call(args): 2023-01-11T21:38:06.9427045Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9427122Z args.clear() 2023-01-11T21:38:06.9427215Z with torch.cuda.device(0): 2023-01-11T21:38:06.9427418Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9427516Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9427656Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9427723Z del arg0_1 2023-01-11T21:38:06.9427796Z del arg1_1 2023-01-11T21:38:06.9427873Z return (buf0, ) 2023-01-11T21:38:06.9427878Z 2023-01-11T21:38:06.9427909Z 2023-01-11T21:38:06.9427994Z if __name__ == "__main__": 2023-01-11T21:38:06.9428113Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9428241Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9428446Z arg0_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9428647Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9428761Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9428766Z 2023-01-11T21:38:06.9428840Z ok (0.092s) 2023-01-11T21:38:06.9429316Z test_cuda_strided_double (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9429451Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9429712Z [2023-01-11 21:37:59,089] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1139 2023-01-11T21:38:06.9429975Z [2023-01-11 21:37:59,162] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1139 2023-01-11T21:38:06.9429980Z 2023-01-11T21:38:06.9430082Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9430159Z import torch 2023-01-11T21:38:06.9430235Z import random 2023-01-11T21:38:06.9430350Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9430501Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9430507Z 2023-01-11T21:38:06.9430586Z aten = torch.ops.aten 2023-01-11T21:38:06.9430722Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9430817Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9430825Z 2023-01-11T21:38:06.9430899Z import triton 2023-01-11T21:38:06.9430993Z import triton.language as tl 2023-01-11T21:38:06.9431116Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9431247Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9431253Z 2023-01-11T21:38:06.9431266Z 2023-01-11T21:38:06.9431414Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9431488Z import triton 2023-01-11T21:38:06.9431580Z import triton.language as tl 2023-01-11T21:38:06.9431691Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9431791Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9431929Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9432054Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9432059Z 2023-01-11T21:38:06.9432469Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9432543Z @triton.jit 2023-01-11T21:38:06.9432686Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9432759Z xnumel = 100 2023-01-11T21:38:06.9432858Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9432984Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9433068Z xmask = xindex < xnumel 2023-01-11T21:38:06.9433142Z x0 = xindex % 10 2023-01-11T21:38:06.9433215Z x1 = (xindex // 10) 2023-01-11T21:38:06.9433287Z x2 = xindex 2023-01-11T21:38:06.9433399Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9433495Z tmp2 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9433582Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.9433661Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9433816Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9433904Z ''') 2023-01-11T21:38:06.9433909Z 2023-01-11T21:38:06.9433914Z 2023-01-11T21:38:06.9434008Z async_compile.wait(globals()) 2023-01-11T21:38:06.9434086Z del async_compile 2023-01-11T21:38:06.9434092Z 2023-01-11T21:38:06.9434170Z def call(args): 2023-01-11T21:38:06.9434249Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9434324Z args.clear() 2023-01-11T21:38:06.9434419Z with torch.cuda.device(0): 2023-01-11T21:38:06.9434614Z buf0 = 
empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9434704Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9434845Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9434919Z del arg0_1 2023-01-11T21:38:06.9434991Z del arg1_1 2023-01-11T21:38:06.9435068Z return (buf0, ) 2023-01-11T21:38:06.9435074Z 2023-01-11T21:38:06.9435079Z 2023-01-11T21:38:06.9435157Z if __name__ == "__main__": 2023-01-11T21:38:06.9435279Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9435397Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9435600Z arg0_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9435796Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9435913Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9435918Z 2023-01-11T21:38:06.9435989Z ok (0.088s) 2023-01-11T21:38:06.9436458Z test_cuda_strided_int (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9436622Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9436880Z [2023-01-11 21:37:59,176] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1140 2023-01-11T21:38:06.9437145Z [2023-01-11 21:37:59,251] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1140 2023-01-11T21:38:06.9437151Z 2023-01-11T21:38:06.9437242Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9437318Z import torch 2023-01-11T21:38:06.9437391Z import random 2023-01-11T21:38:06.9437508Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9437629Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9437638Z 2023-01-11T21:38:06.9437717Z aten = torch.ops.aten 2023-01-11T21:38:06.9437854Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9437947Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9437952Z 2023-01-11T21:38:06.9438019Z import triton 2023-01-11T21:38:06.9438117Z import triton.language as tl 2023-01-11T21:38:06.9438240Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9438378Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9438383Z 2023-01-11T21:38:06.9438388Z 2023-01-11T21:38:06.9438541Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9438616Z import triton 2023-01-11T21:38:06.9438713Z import triton.language as tl 2023-01-11T21:38:06.9438819Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9438921Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9439055Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9439181Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9439186Z 2023-01-11T21:38:06.9439626Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), 
equal_to_1=())]}) 2023-01-11T21:38:06.9439704Z @triton.jit 2023-01-11T21:38:06.9439843Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9439917Z xnumel = 100 2023-01-11T21:38:06.9440008Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9440133Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9440215Z xmask = xindex < xnumel 2023-01-11T21:38:06.9440290Z x0 = xindex % 10 2023-01-11T21:38:06.9440370Z x1 = (xindex // 10) 2023-01-11T21:38:06.9440440Z x2 = xindex 2023-01-11T21:38:06.9440548Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9440642Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9440734Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.9440813Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9440948Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9441032Z ''') 2023-01-11T21:38:06.9441040Z 2023-01-11T21:38:06.9441044Z 2023-01-11T21:38:06.9441137Z async_compile.wait(globals()) 2023-01-11T21:38:06.9441213Z del async_compile 2023-01-11T21:38:06.9441219Z 2023-01-11T21:38:06.9441293Z def call(args): 2023-01-11T21:38:06.9441366Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9441440Z args.clear() 2023-01-11T21:38:06.9441532Z with torch.cuda.device(0): 2023-01-11T21:38:06.9441734Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9441826Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9441968Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9442081Z del arg0_1 2023-01-11T21:38:06.9442147Z del arg1_1 2023-01-11T21:38:06.9442224Z return (buf0, ) 2023-01-11T21:38:06.9442230Z 2023-01-11T21:38:06.9442234Z 2023-01-11T21:38:06.9442313Z if __name__ == "__main__": 2023-01-11T21:38:06.9442431Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9442558Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9442763Z arg0_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9442959Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9443076Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9443082Z 2023-01-11T21:38:06.9443145Z ok (0.089s) 2023-01-11T21:38:06.9443624Z test_cuda_strided_strided (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9443757Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9444018Z [2023-01-11 21:37:59,265] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1141 2023-01-11T21:38:06.9444282Z [2023-01-11 21:37:59,339] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1141 2023-01-11T21:38:06.9444287Z 2023-01-11T21:38:06.9444385Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9444461Z import torch 2023-01-11T21:38:06.9444535Z import random 2023-01-11T21:38:06.9444653Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9444768Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9444780Z 2023-01-11T21:38:06.9444858Z aten = torch.ops.aten 2023-01-11T21:38:06.9444993Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9445086Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9445091Z 2023-01-11T21:38:06.9445165Z import triton 2023-01-11T21:38:06.9445255Z import triton.language as tl 2023-01-11T21:38:06.9445411Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9445561Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9445568Z 2023-01-11T21:38:06.9445573Z 2023-01-11T21:38:06.9445743Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9445834Z import triton 2023-01-11T21:38:06.9445926Z import triton.language as tl 2023-01-11T21:38:06.9446039Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9446139Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9446271Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9446394Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9446401Z 2023-01-11T21:38:06.9446816Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9446883Z @triton.jit 2023-01-11T21:38:06.9447027Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9447101Z xnumel = 100 2023-01-11T21:38:06.9447202Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9447331Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9447415Z xmask = xindex < xnumel 2023-01-11T21:38:06.9447489Z x0 = xindex % 10 2023-01-11T21:38:06.9447562Z x1 = (xindex // 10) 2023-01-11T21:38:06.9447633Z x2 = xindex 2023-01-11T21:38:06.9447742Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9447849Z tmp1 = tl.load(in_ptr1 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9447956Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9448091Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9448174Z ''') 2023-01-11T21:38:06.9448180Z 2023-01-11T21:38:06.9448184Z 2023-01-11T21:38:06.9448276Z async_compile.wait(globals()) 2023-01-11T21:38:06.9448348Z del async_compile 2023-01-11T21:38:06.9448353Z 2023-01-11T21:38:06.9448428Z def call(args): 2023-01-11T21:38:06.9448506Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9448581Z args.clear() 2023-01-11T21:38:06.9448673Z with torch.cuda.device(0): 2023-01-11T21:38:06.9448877Z buf0 = empty_strided((10, 10), (10, 1), 
device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9448969Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9449102Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9449178Z del arg0_1 2023-01-11T21:38:06.9449250Z del arg1_1 2023-01-11T21:38:06.9449332Z return (buf0, ) 2023-01-11T21:38:06.9449337Z 2023-01-11T21:38:06.9449342Z 2023-01-11T21:38:06.9449422Z if __name__ == "__main__": 2023-01-11T21:38:06.9449537Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9449661Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9449867Z arg0_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9450057Z arg1_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9450176Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9450181Z 2023-01-11T21:38:06.9450251Z ok (0.089s) 2023-01-11T21:38:06.9450728Z test_cuda_strided_transposed (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9450860Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9451119Z [2023-01-11 21:37:59,354] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1142 2023-01-11T21:38:06.9451411Z [2023-01-11 21:37:59,573] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1142 2023-01-11T21:38:06.9451417Z 2023-01-11T21:38:06.9451515Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9451589Z import torch 2023-01-11T21:38:06.9451657Z import random 2023-01-11T21:38:06.9451775Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9451897Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9451902Z 2023-01-11T21:38:06.9451985Z aten = torch.ops.aten 2023-01-11T21:38:06.9452119Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9452217Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9452222Z 2023-01-11T21:38:06.9452298Z import triton 2023-01-11T21:38:06.9452391Z import triton.language as tl 2023-01-11T21:38:06.9452508Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9452652Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9452660Z 2023-01-11T21:38:06.9452665Z 2023-01-11T21:38:06.9452820Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9452894Z import triton 2023-01-11T21:38:06.9452985Z import triton.language as tl 2023-01-11T21:38:06.9453099Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9453202Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9453329Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9453451Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9453456Z 2023-01-11T21:38:06.9453930Z @pointwise(size_hints=[16, 16], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), 
equal_to_1=())]}) 2023-01-11T21:38:06.9454032Z @triton.jit 2023-01-11T21:38:06.9454210Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.9454287Z xnumel = 10 2023-01-11T21:38:06.9454359Z ynumel = 10 2023-01-11T21:38:06.9454457Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9454710Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9454788Z xmask = xindex < xnumel 2023-01-11T21:38:06.9454885Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.9455016Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.9455100Z ymask = yindex < ynumel 2023-01-11T21:38:06.9455171Z x0 = xindex 2023-01-11T21:38:06.9455243Z y1 = yindex 2023-01-11T21:38:06.9455363Z tmp0 = tl.load(in_ptr0 + ((2*y1) + (30*x0)), xmask & ymask) 2023-01-11T21:38:06.9455473Z tmp1 = tl.load(in_ptr1 + (x0 + (10*y1)), xmask & ymask) 2023-01-11T21:38:06.9455555Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9455715Z tl.store(out_ptr0 + (y1 + (10*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.9455806Z ''') 2023-01-11T21:38:06.9455811Z 2023-01-11T21:38:06.9455816Z 2023-01-11T21:38:06.9455909Z async_compile.wait(globals()) 2023-01-11T21:38:06.9455987Z del async_compile 2023-01-11T21:38:06.9455994Z 2023-01-11T21:38:06.9456068Z def call(args): 2023-01-11T21:38:06.9456140Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9456220Z args.clear() 2023-01-11T21:38:06.9456310Z with torch.cuda.device(0): 2023-01-11T21:38:06.9456516Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9456607Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9456756Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, 10, grid=grid(10, 10), stream=stream0) 2023-01-11T21:38:06.9456827Z del arg0_1 2023-01-11T21:38:06.9456892Z del arg1_1 2023-01-11T21:38:06.9456969Z return (buf0, ) 2023-01-11T21:38:06.9456974Z 2023-01-11T21:38:06.9456978Z 2023-01-11T21:38:06.9457059Z if __name__ == "__main__": 2023-01-11T21:38:06.9457281Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9458700Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9458920Z arg0_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9459121Z arg1_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9459242Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9459247Z 2023-01-11T21:38:06.9459319Z ok (0.234s) 2023-01-11T21:38:06.9459803Z test_cuda_transposed_broadcast1 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9459938Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9460201Z [2023-01-11 21:37:59,588] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1143 2023-01-11T21:38:06.9460492Z [2023-01-11 21:37:59,601] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1143 2023-01-11T21:38:06.9460497Z 2023-01-11T21:38:06.9460596Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9460673Z import torch 2023-01-11T21:38:06.9460748Z import random 2023-01-11T21:38:06.9460867Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9460993Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9460999Z 2023-01-11T21:38:06.9461083Z aten = torch.ops.aten 2023-01-11T21:38:06.9461221Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9461311Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9461316Z 2023-01-11T21:38:06.9461391Z import triton 2023-01-11T21:38:06.9461487Z import triton.language as tl 2023-01-11T21:38:06.9461612Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9461756Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9461762Z 2023-01-11T21:38:06.9461767Z 2023-01-11T21:38:06.9461925Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9462000Z import triton 2023-01-11T21:38:06.9462087Z import triton.language as tl 2023-01-11T21:38:06.9462206Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9462308Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9462443Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9462570Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9462575Z 2023-01-11T21:38:06.9462993Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9463068Z @triton.jit 2023-01-11T21:38:06.9463212Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9463281Z xnumel = 100 2023-01-11T21:38:06.9463380Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9463509Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9463593Z xmask = xindex < xnumel 2023-01-11T21:38:06.9463665Z x2 = xindex 2023-01-11T21:38:06.9463746Z x1 = (xindex // 10) 2023-01-11T21:38:06.9463845Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9463935Z tmp1 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.9464015Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9464151Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9464240Z ''') 2023-01-11T21:38:06.9464246Z 2023-01-11T21:38:06.9464250Z 2023-01-11T21:38:06.9464343Z async_compile.wait(globals()) 2023-01-11T21:38:06.9464459Z del async_compile 2023-01-11T21:38:06.9464465Z 2023-01-11T21:38:06.9464543Z def call(args): 2023-01-11T21:38:06.9464668Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9464738Z args.clear() 2023-01-11T21:38:06.9464833Z with torch.cuda.device(0): 2023-01-11T21:38:06.9465041Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9465134Z stream0 = 
get_cuda_stream(0) 2023-01-11T21:38:06.9465276Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9465350Z del arg0_1 2023-01-11T21:38:06.9465421Z del arg1_1 2023-01-11T21:38:06.9465493Z return (buf0, ) 2023-01-11T21:38:06.9465498Z 2023-01-11T21:38:06.9465507Z 2023-01-11T21:38:06.9465582Z if __name__ == "__main__": 2023-01-11T21:38:06.9465701Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9465827Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9466035Z arg0_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9466237Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9466357Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9466362Z 2023-01-11T21:38:06.9466433Z ok (0.028s) 2023-01-11T21:38:06.9466912Z test_cuda_transposed_broadcast2 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9467038Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9467301Z [2023-01-11 21:37:59,615] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1144 2023-01-11T21:38:06.9467565Z [2023-01-11 21:37:59,627] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1144 2023-01-11T21:38:06.9467574Z 2023-01-11T21:38:06.9467674Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9467748Z import torch 2023-01-11T21:38:06.9467823Z import random 2023-01-11T21:38:06.9467943Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9468068Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9468073Z 2023-01-11T21:38:06.9468156Z aten = torch.ops.aten 2023-01-11T21:38:06.9468287Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9468386Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9468391Z 2023-01-11T21:38:06.9468467Z import triton 2023-01-11T21:38:06.9468561Z import triton.language as tl 2023-01-11T21:38:06.9468687Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9468830Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9468835Z 2023-01-11T21:38:06.9468840Z 2023-01-11T21:38:06.9468997Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9469066Z import triton 2023-01-11T21:38:06.9469159Z import triton.language as tl 2023-01-11T21:38:06.9469273Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9469376Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9469507Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9469635Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9469640Z 2023-01-11T21:38:06.9470051Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9470129Z @triton.jit 2023-01-11T21:38:06.9470263Z def triton_(in_ptr0, 
in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9470372Z xnumel = 100 2023-01-11T21:38:06.9470471Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9470635Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9470719Z xmask = xindex < xnumel 2023-01-11T21:38:06.9470791Z x2 = xindex 2023-01-11T21:38:06.9470868Z x0 = xindex % 10 2023-01-11T21:38:06.9470959Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9471054Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9471132Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9471267Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9471352Z ''') 2023-01-11T21:38:06.9471358Z 2023-01-11T21:38:06.9471363Z 2023-01-11T21:38:06.9471453Z async_compile.wait(globals()) 2023-01-11T21:38:06.9471528Z del async_compile 2023-01-11T21:38:06.9471533Z 2023-01-11T21:38:06.9471609Z def call(args): 2023-01-11T21:38:06.9471682Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9471761Z args.clear() 2023-01-11T21:38:06.9471854Z with torch.cuda.device(0): 2023-01-11T21:38:06.9472069Z buf0 = empty_strided((1, 10, 10), (100, 1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9472166Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9472311Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9472387Z del arg0_1 2023-01-11T21:38:06.9472453Z del arg1_1 2023-01-11T21:38:06.9472535Z return (buf0, ) 2023-01-11T21:38:06.9472541Z 2023-01-11T21:38:06.9472545Z 2023-01-11T21:38:06.9472625Z if __name__ == "__main__": 2023-01-11T21:38:06.9472748Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9472874Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9473081Z arg0_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9473289Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9473411Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9473420Z 2023-01-11T21:38:06.9473485Z ok (0.026s) 2023-01-11T21:38:06.9473962Z test_cuda_transposed_broadcast3 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9474094Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9474355Z [2023-01-11 21:37:59,641] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1145 2023-01-11T21:38:06.9474617Z [2023-01-11 21:37:59,649] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1145 2023-01-11T21:38:06.9474626Z 2023-01-11T21:38:06.9474723Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9474799Z import torch 2023-01-11T21:38:06.9474878Z import random 2023-01-11T21:38:06.9474997Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9475118Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9475130Z 2023-01-11T21:38:06.9475208Z aten = torch.ops.aten 2023-01-11T21:38:06.9475344Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9475439Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9475444Z 2023-01-11T21:38:06.9475520Z import triton 2023-01-11T21:38:06.9475611Z import triton.language as tl 2023-01-11T21:38:06.9475735Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9475876Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9475882Z 2023-01-11T21:38:06.9475886Z 2023-01-11T21:38:06.9476040Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9476141Z import triton 2023-01-11T21:38:06.9476235Z import triton.language as tl 2023-01-11T21:38:06.9476352Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9476483Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9476617Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9476743Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9476748Z 2023-01-11T21:38:06.9477157Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9477225Z @triton.jit 2023-01-11T21:38:06.9477366Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9477440Z xnumel = 100 2023-01-11T21:38:06.9477536Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9477668Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9477750Z xmask = xindex < xnumel 2023-01-11T21:38:06.9477825Z x0 = xindex 2023-01-11T21:38:06.9477915Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9478047Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9478130Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9478265Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9478349Z ''') 2023-01-11T21:38:06.9478355Z 2023-01-11T21:38:06.9478359Z 2023-01-11T21:38:06.9478453Z async_compile.wait(globals()) 2023-01-11T21:38:06.9478532Z del async_compile 2023-01-11T21:38:06.9478538Z 2023-01-11T21:38:06.9478612Z def call(args): 2023-01-11T21:38:06.9478685Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9478759Z args.clear() 2023-01-11T21:38:06.9478851Z with torch.cuda.device(0): 2023-01-11T21:38:06.9479055Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9479152Z stream0 = get_cuda_stream(0) 
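# A hedged sketch of where load offsets like `(2*x0) + (30*x1)` in the strided
# listings above come from: for shape (10, 10) with strides (30, 2), element
# (x1, x0) lives at flat storage offset x1*30 + x0*2. The same arithmetic,
# checked on CPU (the log runs on CUDA, but the offset math is device-agnostic):
#
#     import torch
#     base = torch.arange(300.)                       # backing storage
#     strided = torch.as_strided(base, (10, 10), (30, 2))
#     x1, x0 = 3, 7
#     assert strided[x1, x0] == base[30*x1 + 2*x0]    # same storage element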
2023-01-11T21:38:06.9479295Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9479372Z del arg0_1 2023-01-11T21:38:06.9479438Z del arg1_1 2023-01-11T21:38:06.9479515Z return (buf0, ) 2023-01-11T21:38:06.9479520Z 2023-01-11T21:38:06.9479524Z 2023-01-11T21:38:06.9479604Z if __name__ == "__main__": 2023-01-11T21:38:06.9479722Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9479848Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9480052Z arg0_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9480247Z arg1_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9480367Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9480372Z 2023-01-11T21:38:06.9480437Z ok (0.022s) 2023-01-11T21:38:06.9480918Z test_cuda_transposed_dense (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9481052Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9481313Z [2023-01-11 21:37:59,663] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1146 2023-01-11T21:38:06.9481579Z [2023-01-11 21:37:59,867] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1146 2023-01-11T21:38:06.9481584Z 2023-01-11T21:38:06.9481684Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9481759Z import torch 2023-01-11T21:38:06.9481832Z import random 2023-01-11T21:38:06.9481979Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9482099Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9482151Z 2023-01-11T21:38:06.9482229Z aten = torch.ops.aten 2023-01-11T21:38:06.9482367Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9482464Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9482469Z 2023-01-11T21:38:06.9482546Z import triton 2023-01-11T21:38:06.9482639Z import triton.language as tl 2023-01-11T21:38:06.9482765Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9482907Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9482912Z 2023-01-11T21:38:06.9482917Z 2023-01-11T21:38:06.9483074Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9483143Z import triton 2023-01-11T21:38:06.9483235Z import triton.language as tl 2023-01-11T21:38:06.9483351Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9483451Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9483588Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9483714Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9483722Z 2023-01-11T21:38:06.9484196Z @pointwise(size_hints=[16, 16], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9484270Z @triton.jit 2023-01-11T21:38:06.9484441Z def 
triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.9484516Z xnumel = 10 2023-01-11T21:38:06.9484589Z ynumel = 10 2023-01-11T21:38:06.9484688Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9484826Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9484910Z xmask = xindex < xnumel 2023-01-11T21:38:06.9485009Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.9485136Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.9485223Z ymask = yindex < ynumel 2023-01-11T21:38:06.9485293Z x0 = xindex 2023-01-11T21:38:06.9485368Z y1 = yindex 2023-01-11T21:38:06.9485484Z tmp0 = tl.load(in_ptr0 + (x0 + (10*y1)), xmask & ymask) 2023-01-11T21:38:06.9485602Z tmp1 = tl.load(in_ptr1 + (y1 + (10*x0)), xmask & ymask) 2023-01-11T21:38:06.9485683Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9485860Z tl.store(out_ptr0 + (x0 + (10*y1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.9485965Z ''') 2023-01-11T21:38:06.9485972Z 2023-01-11T21:38:06.9485978Z 2023-01-11T21:38:06.9486074Z async_compile.wait(globals()) 2023-01-11T21:38:06.9486155Z del async_compile 2023-01-11T21:38:06.9486161Z 2023-01-11T21:38:06.9486233Z def call(args): 2023-01-11T21:38:06.9486313Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9486389Z args.clear() 2023-01-11T21:38:06.9486479Z with torch.cuda.device(0): 2023-01-11T21:38:06.9486686Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9486783Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9486931Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, 10, grid=grid(10, 10), stream=stream0) 2023-01-11T21:38:06.9487003Z del arg0_1 2023-01-11T21:38:06.9487075Z del arg1_1 2023-01-11T21:38:06.9487154Z return (buf0, ) 2023-01-11T21:38:06.9487159Z 2023-01-11T21:38:06.9487164Z 2023-01-11T21:38:06.9487244Z if __name__ == "__main__": 2023-01-11T21:38:06.9487355Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9487482Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9487688Z arg0_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9487887Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9488040Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9488046Z 2023-01-11T21:38:06.9488144Z ok (0.218s) 2023-01-11T21:38:06.9488627Z test_cuda_transposed_double (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9488759Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9489019Z [2023-01-11 21:37:59,881] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1147 2023-01-11T21:38:06.9489282Z [2023-01-11 21:38:00,085] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1147 2023-01-11T21:38:06.9489288Z 2023-01-11T21:38:06.9489383Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9489459Z import torch 2023-01-11T21:38:06.9489537Z import random 2023-01-11T21:38:06.9489659Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9489783Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9489788Z 2023-01-11T21:38:06.9489872Z aten = torch.ops.aten 2023-01-11T21:38:06.9490010Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9490099Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9490109Z 2023-01-11T21:38:06.9490177Z import triton 2023-01-11T21:38:06.9490272Z import triton.language as tl 2023-01-11T21:38:06.9490402Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9490540Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9490546Z 2023-01-11T21:38:06.9490550Z 2023-01-11T21:38:06.9490706Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9490782Z import triton 2023-01-11T21:38:06.9490878Z import triton.language as tl 2023-01-11T21:38:06.9490987Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9491091Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9491223Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9491352Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9491357Z 2023-01-11T21:38:06.9491833Z @pointwise(size_hints=[16, 16], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp64', 2: '*fp64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9491908Z @triton.jit 2023-01-11T21:38:06.9492085Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.9492161Z xnumel = 10 2023-01-11T21:38:06.9492228Z ynumel = 10 2023-01-11T21:38:06.9492332Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9492466Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9492555Z xmask = xindex < xnumel 2023-01-11T21:38:06.9492656Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.9492788Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.9492873Z ymask = yindex < ynumel 2023-01-11T21:38:06.9492938Z x0 = xindex 2023-01-11T21:38:06.9493008Z y1 = yindex 2023-01-11T21:38:06.9493125Z tmp0 = tl.load(in_ptr0 + (x0 + (10*y1)), xmask & ymask) 2023-01-11T21:38:06.9493243Z tmp2 = tl.load(in_ptr1 + (y1 + (10*x0)), xmask & ymask) 2023-01-11T21:38:06.9493334Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.9493414Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9493574Z tl.store(out_ptr0 + (x0 + (10*y1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp3, xmask & ymask) 2023-01-11T21:38:06.9493654Z ''') 2023-01-11T21:38:06.9493666Z 2023-01-11T21:38:06.9493671Z 2023-01-11T21:38:06.9493787Z 
async_compile.wait(globals()) 2023-01-11T21:38:06.9493869Z del async_compile 2023-01-11T21:38:06.9493927Z 2023-01-11T21:38:06.9494004Z def call(args): 2023-01-11T21:38:06.9494084Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9494161Z args.clear() 2023-01-11T21:38:06.9494254Z with torch.cuda.device(0): 2023-01-11T21:38:06.9494461Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9494674Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9494825Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, 10, grid=grid(10, 10), stream=stream0) 2023-01-11T21:38:06.9494898Z del arg0_1 2023-01-11T21:38:06.9494970Z del arg1_1 2023-01-11T21:38:06.9495050Z return (buf0, ) 2023-01-11T21:38:06.9495055Z 2023-01-11T21:38:06.9495059Z 2023-01-11T21:38:06.9495141Z if __name__ == "__main__": 2023-01-11T21:38:06.9495260Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9495390Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9495596Z arg0_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9495831Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9495961Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9495967Z 2023-01-11T21:38:06.9496038Z ok (0.218s) 2023-01-11T21:38:06.9496513Z test_cuda_transposed_int (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9496646Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9496908Z [2023-01-11 21:38:00,099] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1148 2023-01-11T21:38:06.9497253Z [2023-01-11 21:38:00,169] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1148 2023-01-11T21:38:06.9497262Z 2023-01-11T21:38:06.9497368Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9497437Z import torch 2023-01-11T21:38:06.9497512Z import random 2023-01-11T21:38:06.9497634Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9497760Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9497765Z 2023-01-11T21:38:06.9497850Z aten = torch.ops.aten 2023-01-11T21:38:06.9497989Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9498086Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9498092Z 2023-01-11T21:38:06.9498165Z import triton 2023-01-11T21:38:06.9498252Z import triton.language as tl 2023-01-11T21:38:06.9498377Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9498520Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9498526Z 2023-01-11T21:38:06.9498532Z 2023-01-11T21:38:06.9498686Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9498763Z import triton 2023-01-11T21:38:06.9498858Z import triton.language as tl 2023-01-11T21:38:06.9498972Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9499068Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9499202Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9499328Z from 
torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9499333Z 2023-01-11T21:38:06.9499751Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9499825Z @triton.jit 2023-01-11T21:38:06.9500020Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9500099Z xnumel = 100 2023-01-11T21:38:06.9500234Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9500359Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9500444Z xmask = xindex < xnumel 2023-01-11T21:38:06.9500516Z x2 = xindex 2023-01-11T21:38:06.9500596Z x1 = (xindex // 10) 2023-01-11T21:38:06.9500694Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9500791Z tmp1 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.9500880Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.9500952Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9501088Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9501175Z ''') 2023-01-11T21:38:06.9501181Z 2023-01-11T21:38:06.9501185Z 2023-01-11T21:38:06.9501278Z async_compile.wait(globals()) 2023-01-11T21:38:06.9501355Z del async_compile 2023-01-11T21:38:06.9501360Z 2023-01-11T21:38:06.9501438Z def call(args): 2023-01-11T21:38:06.9501518Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9501596Z args.clear() 2023-01-11T21:38:06.9501682Z with torch.cuda.device(0): 2023-01-11T21:38:06.9501884Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9501977Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9502119Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9502192Z del arg0_1 2023-01-11T21:38:06.9502265Z del arg1_1 2023-01-11T21:38:06.9502345Z return (buf0, ) 2023-01-11T21:38:06.9502350Z 2023-01-11T21:38:06.9502354Z 2023-01-11T21:38:06.9502428Z if __name__ == "__main__": 2023-01-11T21:38:06.9502547Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9502673Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9502880Z arg0_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9503075Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9503197Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9503202Z 2023-01-11T21:38:06.9503272Z ok (0.084s) 2023-01-11T21:38:06.9503751Z test_cuda_transposed_strided (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9503883Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9504142Z [2023-01-11 21:38:00,183] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1149 2023-01-11T21:38:06.9504403Z [2023-01-11 21:38:00,387] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1149 2023-01-11T21:38:06.9504408Z 2023-01-11T21:38:06.9504510Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9504584Z import torch 2023-01-11T21:38:06.9504659Z import random 2023-01-11T21:38:06.9504778Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9504902Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9504907Z 2023-01-11T21:38:06.9504990Z aten = torch.ops.aten 2023-01-11T21:38:06.9505121Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9505217Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9505222Z 2023-01-11T21:38:06.9505296Z import triton 2023-01-11T21:38:06.9505389Z import triton.language as tl 2023-01-11T21:38:06.9505514Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9505654Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9505660Z 2023-01-11T21:38:06.9505665Z 2023-01-11T21:38:06.9505849Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9505929Z import triton 2023-01-11T21:38:06.9506041Z import triton.language as tl 2023-01-11T21:38:06.9506157Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9506260Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9506397Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9506525Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9506530Z 2023-01-11T21:38:06.9507005Z @pointwise(size_hints=[16, 16], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9507083Z @triton.jit 2023-01-11T21:38:06.9507260Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.9507342Z xnumel = 10 2023-01-11T21:38:06.9507411Z ynumel = 10 2023-01-11T21:38:06.9507512Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9507655Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9507740Z xmask = xindex < xnumel 2023-01-11T21:38:06.9507840Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.9507975Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.9508061Z ymask = yindex < ynumel 2023-01-11T21:38:06.9508128Z x0 = xindex 2023-01-11T21:38:06.9508199Z y1 = yindex 2023-01-11T21:38:06.9508317Z tmp0 = tl.load(in_ptr0 + (x0 + (10*y1)), xmask & ymask) 2023-01-11T21:38:06.9508436Z tmp1 = tl.load(in_ptr1 + ((2*y1) + (30*x0)), xmask & ymask) 2023-01-11T21:38:06.9508518Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9508678Z tl.store(out_ptr0 + (x0 + (10*y1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.9508765Z ''') 2023-01-11T21:38:06.9508773Z 2023-01-11T21:38:06.9508777Z 2023-01-11T21:38:06.9508866Z async_compile.wait(globals()) 2023-01-11T21:38:06.9508951Z del 
async_compile 2023-01-11T21:38:06.9508956Z 2023-01-11T21:38:06.9509031Z def call(args): 2023-01-11T21:38:06.9509112Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9509190Z args.clear() 2023-01-11T21:38:06.9509283Z with torch.cuda.device(0): 2023-01-11T21:38:06.9509491Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9509579Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9509727Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, 10, grid=grid(10, 10), stream=stream0) 2023-01-11T21:38:06.9509802Z del arg0_1 2023-01-11T21:38:06.9509875Z del arg1_1 2023-01-11T21:38:06.9509952Z return (buf0, ) 2023-01-11T21:38:06.9509958Z 2023-01-11T21:38:06.9509962Z 2023-01-11T21:38:06.9510046Z if __name__ == "__main__": 2023-01-11T21:38:06.9510170Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9510300Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9510503Z arg0_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9510704Z arg1_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9510825Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9510830Z 2023-01-11T21:38:06.9510905Z ok (0.218s) 2023-01-11T21:38:06.9511389Z test_cuda_transposed_transposed (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9511523Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9511814Z [2023-01-11 21:38:00,401] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1150 2023-01-11T21:38:06.9512112Z [2023-01-11 21:38:00,411] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1150 2023-01-11T21:38:06.9512118Z 2023-01-11T21:38:06.9512219Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9512296Z import torch 2023-01-11T21:38:06.9512367Z import random 2023-01-11T21:38:06.9512489Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9512614Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9512619Z 2023-01-11T21:38:06.9512706Z aten = torch.ops.aten 2023-01-11T21:38:06.9512844Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9512942Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9512947Z 2023-01-11T21:38:06.9513021Z import triton 2023-01-11T21:38:06.9513109Z import triton.language as tl 2023-01-11T21:38:06.9513234Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9513374Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9513383Z 2023-01-11T21:38:06.9513388Z 2023-01-11T21:38:06.9513542Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9513617Z import triton 2023-01-11T21:38:06.9513713Z import triton.language as tl 2023-01-11T21:38:06.9513827Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9513929Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9514056Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9514182Z from torch._inductor.utils import instance_descriptor 
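# Where the explicit casts in the `_double` and `_int` listings above come
# from (`tmp1 = tmp0.to(tl.float64)`, `tmp2 = tmp1.to(tl.float32)`): the
# generated kernels bake in PyTorch's type-promotion rules for the fused add,
# which can be checked directly; a minimal sketch:
#
#     import torch
#     f32 = torch.empty((), dtype=torch.float32)
#     f64 = torch.empty((), dtype=torch.float64)
#     i32 = torch.empty((), dtype=torch.int32)
#     assert torch.result_type(f32, f64) == torch.float64  # fp32 + fp64 -> fp64
#     assert torch.result_type(f32, i32) == torch.float32  # fp32 + int32 -> fp32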
2023-01-11T21:38:06.9514187Z 2023-01-11T21:38:06.9514603Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9514681Z @triton.jit 2023-01-11T21:38:06.9514822Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9514900Z xnumel = 100 2023-01-11T21:38:06.9515001Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9515135Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9515214Z xmask = xindex < xnumel 2023-01-11T21:38:06.9515287Z x0 = xindex 2023-01-11T21:38:06.9515385Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9515484Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9515564Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9515701Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9515788Z ''') 2023-01-11T21:38:06.9515794Z 2023-01-11T21:38:06.9515798Z 2023-01-11T21:38:06.9515886Z async_compile.wait(globals()) 2023-01-11T21:38:06.9515966Z del async_compile 2023-01-11T21:38:06.9515971Z 2023-01-11T21:38:06.9516047Z def call(args): 2023-01-11T21:38:06.9516130Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9516210Z args.clear() 2023-01-11T21:38:06.9516304Z with torch.cuda.device(0): 2023-01-11T21:38:06.9516509Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9516596Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9516740Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9516815Z del arg0_1 2023-01-11T21:38:06.9516888Z del arg1_1 2023-01-11T21:38:06.9516968Z return (buf0, ) 2023-01-11T21:38:06.9516973Z 2023-01-11T21:38:06.9516978Z 2023-01-11T21:38:06.9517058Z if __name__ == "__main__": 2023-01-11T21:38:06.9517177Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9517306Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9517505Z arg0_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9517736Z arg1_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9517858Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9517887Z 2023-01-11T21:38:06.9517961Z ok (0.024s) 2023-01-11T21:38:06.9518118Z test_indexing_join (__main__.TestIndexingSimplification) ... ok (0.065s) 2023-01-11T21:38:06.9518290Z test_indexing_simplification (__main__.TestIndexingSimplification) ... ok (0.070s) 2023-01-11T21:38:06.9518832Z test_cant_optimize_compute (__main__.TritonCodeGenTests) ... 
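[editor's note] Each code dump above ends with an `if __name__ == "__main__"` harness that rebuilds the kernel's inputs with rand_strided and times call(...), so every dump is a self-contained repro. A minimal sketch of rebuilding such a strided input by hand, using only public torch APIs; the helper name strided_like_log is hypothetical (the tests themselves use torch._dynamo.testing.rand_strided), and CPU is used here only so the sketch runs anywhere:

import torch

def strided_like_log(shape, stride, device="cpu", dtype=torch.float32):
    # Hypothetical stand-in for torch._dynamo.testing.rand_strided:
    # allocate just enough storage for the given strides, then view into it.
    needed = 1 + sum((s - 1) * st for s, st in zip(shape, stride))
    storage = torch.randn(needed, device=device, dtype=dtype)
    return torch.as_strided(storage, shape, stride)

a = strided_like_log((10, 10), (1, 10))   # column-major, like arg0_1 above
b = strided_like_log((10, 10), (30, 2))   # the (30, 2)-strided arg1_1 above
out = a + b   # eager reference for what triton_fused_add_0 computes
assert out.shape == (10, 10)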
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.9518915Z warnings.warn( 2023-01-11T21:38:06.9519155Z [2023-01-11 21:38:00,552] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo start tracing fn 2023-01-11T21:38:06.9519468Z [2023-01-11 21:38:00,552] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6090 2023-01-11T21:38:06.9519695Z [2023-01-11 21:38:00,552] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF suffix [] 2023-01-11T21:38:06.9519953Z [2023-01-11 21:38:00,552] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF foo [UserFunctionVariable()] 2023-01-11T21:38:06.9520252Z [2023-01-11 21:38:00,552] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9520588Z [2023-01-11 21:38:00,552] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [UserFunctionVariable(), UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9520962Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9521071Z 6075 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9521171Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9521270Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9521360Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9521505Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9521659Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9521762Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9521844Z 14 RETURN_VALUE 2023-01-11T21:38:06.9521911Z 2023-01-11T21:38:06.9521916Z 2023-01-11T21:38:06.9522228Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6075 2023-01-11T21:38:06.9522448Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9522804Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9523121Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9523471Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9523810Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9524198Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9524644Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9524931Z [2023-01-11 21:38:00,559] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9531534Z [2023-01-11 21:38:00,559] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9531892Z [2023-01-11 21:38:00,559] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9532274Z [2023-01-11 21:38:00,559] 
torch._dynamo.symbolic_convert: [DEBUG] INLINING at 0x7fcc3e8ac9d0, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6083> 2023-01-11T21:38:06.9532374Z 6083 0 LOAD_FAST 0 (x) 2023-01-11T21:38:06.9532480Z 2 LOAD_CONST 1 (2147483657) 2023-01-11T21:38:06.9532557Z 4 BINARY_ADD 2023-01-11T21:38:06.9532645Z 6 RETURN_VALUE 2023-01-11T21:38:06.9532703Z 2023-01-11T21:38:06.9532709Z 2023-01-11T21:38:06.9532920Z [2023-01-11 21:38:00,559] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [] 2023-01-11T21:38:06.9533170Z [2023-01-11 21:38:00,559] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 2147483657 [TensorVariable()] 2023-01-11T21:38:06.9533450Z [2023-01-11 21:38:00,560] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9533699Z [2023-01-11 21:38:00,560] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9534078Z [2023-01-11 21:38:00,560] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING at 0x7fcc3e8ac9d0, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6083> 2023-01-11T21:38:06.9534366Z [2023-01-11 21:38:00,560] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9534894Z [2023-01-11 21:38:00,561] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9534994Z 6078 0 LOAD_FAST 0 (inp) 2023-01-11T21:38:06.9535085Z 2 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9535187Z 4 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9535281Z 6 LOAD_ATTR 2 (int64) 2023-01-11T21:38:06.9535376Z 8 CALL_METHOD 1 2023-01-11T21:38:06.9535471Z 10 LOAD_CONST 1 (1) 2023-01-11T21:38:06.9535549Z 12 BINARY_ADD 2023-01-11T21:38:06.9535646Z 14 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9535744Z 16 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9535837Z 18 LOAD_ATTR 3 (float64) 2023-01-11T21:38:06.9535930Z 20 CALL_METHOD 1 2023-01-11T21:38:06.9536010Z 22 RETURN_VALUE 2023-01-11T21:38:06.9536076Z 2023-01-11T21:38:06.9536081Z 2023-01-11T21:38:06.9536398Z [2023-01-11 21:38:00,561] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6078 2023-01-11T21:38:06.9536615Z [2023-01-11 21:38:00,561] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST inp [] 2023-01-11T21:38:06.9536853Z [2023-01-11 21:38:00,561] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9537186Z [2023-01-11 21:38:00,561] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9537716Z [2023-01-11 21:38:00,561] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR int64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9538072Z [2023-01-11 21:38:00,561] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9538311Z [2023-01-11 21:38:00,567] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 1 [TensorVariable()] 2023-01-11T21:38:06.9538586Z [2023-01-11 21:38:00,567] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9538816Z [2023-01-11 21:38:00,568] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9539084Z [2023-01-11 21:38:00,568] torch._dynamo.symbolic_convert: [DEBUG] 
TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9539501Z [2023-01-11 21:38:00,568] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR float64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9539819Z [2023-01-11 21:38:00,569] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9540068Z [2023-01-11 21:38:00,575] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9540441Z [2023-01-11 21:38:00,575] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9540687Z [2023-01-11 21:38:00,575] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9540935Z [2023-01-11 21:38:00,575] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo done tracing fn (RETURN_VALUE) 2023-01-11T21:38:06.9541165Z [2023-01-11 21:38:00,575] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile 2023-01-11T21:38:06.9541387Z [2023-01-11 21:38:00,575] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to None 2023-01-11T21:38:06.9541632Z [2023-01-11 21:38:00,576] torch._dynamo.output_graph: [INFO] Step 3: calling compiler function compile_fx 2023-01-11T21:38:06.9541888Z [2023-01-11 21:38:00,618] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1151 2023-01-11T21:38:06.9542127Z [2023-01-11 21:38:00,628] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.9542392Z [2023-01-11 21:38:00,862] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1151 2023-01-11T21:38:06.9542810Z [2023-01-11 21:38:00,862] torch._inductor.debug: [WARNING] model__1151_inference_1197 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_0/run_2023_01_11_21_38_00_551246/aot_torchinductor/model__1151_inference_1197.1 2023-01-11T21:38:06.9543052Z [2023-01-11 21:38:00,863] torch._dynamo.output_graph: [INFO] Step 3: done compiler function compile_fx 2023-01-11T21:38:06.9543431Z [2023-01-11 21:38:00,864] torch._inductor.debug: [WARNING] model___1198 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_0/run_2023_01_11_21_38_00_551246/aot_torchinductor/model___1197.0 2023-01-11T21:38:06.9543659Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo start tracing fn 2023-01-11T21:38:06.9543972Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6090 2023-01-11T21:38:06.9544192Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF suffix [] 2023-01-11T21:38:06.9544448Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF foo [UserFunctionVariable()] 2023-01-11T21:38:06.9544776Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9545132Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [UserFunctionVariable(), UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9545500Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9545602Z 6075 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9545700Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9545788Z 
4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9545882Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9546026Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9546176Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9546277Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9546361Z 14 RETURN_VALUE 2023-01-11T21:38:06.9546429Z 2023-01-11T21:38:06.9546434Z 2023-01-11T21:38:06.9546745Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6075 2023-01-11T21:38:06.9546957Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9547310Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9547626Z [2023-01-11 21:38:00,868] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9547977Z [2023-01-11 21:38:00,868] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9548315Z [2023-01-11 21:38:00,868] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9548702Z [2023-01-11 21:38:00,868] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9549128Z [2023-01-11 21:38:00,868] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9549373Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9549748Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9550069Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9550439Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] INLINING at 0x7fcc3e8aca80, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6084> 2023-01-11T21:38:06.9550541Z 6084 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9550634Z 2 LOAD_METHOD 1 (where) 2023-01-11T21:38:06.9550729Z 4 LOAD_FAST 0 (x) 2023-01-11T21:38:06.9550824Z 6 LOAD_CONST 1 (0) 2023-01-11T21:38:06.9550921Z 8 COMPARE_OP 0 (<) 2023-01-11T21:38:06.9551019Z 10 LOAD_DEREF 0 (ones) 2023-01-11T21:38:06.9551146Z 12 CALL_FUNCTION 0 2023-01-11T21:38:06.9551244Z 14 LOAD_DEREF 0 (ones) 2023-01-11T21:38:06.9551358Z 16 CALL_FUNCTION 0 2023-01-11T21:38:06.9551455Z 18 LOAD_CONST 2 (2) 2023-01-11T21:38:06.9551543Z 20 BINARY_SUBTRACT 2023-01-11T21:38:06.9551637Z 22 CALL_METHOD 3 2023-01-11T21:38:06.9551800Z 24 LOAD_CONST 3 (-1099511627776) 2023-01-11T21:38:06.9551889Z 26 BINARY_MULTIPLY 2023-01-11T21:38:06.9551972Z 28 RETURN_VALUE 2023-01-11T21:38:06.9552032Z 2023-01-11T21:38:06.9552045Z 2023-01-11T21:38:06.9552259Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9552620Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR where [TorchVariable()] 2023-01-11T21:38:06.9552942Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [TorchVariable()] 2023-01-11T21:38:06.9553289Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 0 [TorchVariable(), 
TensorVariable()] 2023-01-11T21:38:06.9553667Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] TRACE COMPARE_OP < [TorchVariable(), TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9554010Z [2023-01-11 21:38:00,875] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [TorchVariable(), TensorVariable()] 2023-01-11T21:38:06.9554396Z [2023-01-11 21:38:00,875] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [TorchVariable(), TensorVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9554762Z [2023-01-11 21:38:00,875] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9554870Z 6075 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9554967Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9555056Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9555154Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9555316Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9555493Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9555604Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9555687Z 14 RETURN_VALUE 2023-01-11T21:38:06.9555754Z 2023-01-11T21:38:06.9555759Z 2023-01-11T21:38:06.9556082Z [2023-01-11 21:38:00,876] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6075 2023-01-11T21:38:06.9556295Z [2023-01-11 21:38:00,876] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9556654Z [2023-01-11 21:38:00,876] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9556975Z [2023-01-11 21:38:00,876] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9557324Z [2023-01-11 21:38:00,876] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9557662Z [2023-01-11 21:38:00,876] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9558075Z [2023-01-11 21:38:00,876] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9558518Z [2023-01-11 21:38:00,876] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9558768Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9559142Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9559514Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [TorchVariable(), TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9559928Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [TorchVariable(), TensorVariable(), TensorVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9560301Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9560405Z 6075 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9560497Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9560596Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9560692Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9560836Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9560988Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9561089Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9561172Z 14 RETURN_VALUE 2023-01-11T21:38:06.9561235Z 2023-01-11T21:38:06.9561240Z 
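[editor's note] For orientation: the TRACE / INLINING records around here are TorchDynamo symbolically stepping through CPython bytecode, inlining the closed-over helpers (ones at test_torchinductor.py:6075, foo at :6078) rather than breaking the graph. The same instruction listing can be printed with the standard dis module; a minimal sketch, assuming Python 3.10 as in this job (opcodes differ across Python versions) and a helper mirroring the one traced above — dis only disassembles, so no GPU is needed to run it:

import dis
import torch

def ones():
    # mirrors the helper Dynamo inlines above (test_torchinductor.py:6075)
    return torch.ones([4], device="cuda")

# Prints the same sequence the TRACE records walk through:
# LOAD_GLOBAL (torch), LOAD_ATTR (ones), LOAD_CONST 4, BUILD_LIST 1,
# LOAD_CONST 'cuda', LOAD_CONST ('device',), CALL_FUNCTION_KW 2, RETURN_VALUE
dis.dis(ones)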
2023-01-11T21:38:06.9561550Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6075 2023-01-11T21:38:06.9561769Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9562125Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9562441Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9562793Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9563137Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9563521Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9563946Z [2023-01-11 21:38:00,883] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9564194Z [2023-01-11 21:38:00,888] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9564597Z [2023-01-11 21:38:00,888] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9565011Z [2023-01-11 21:38:00,888] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 2 [TorchVariable(), TensorVariable(), TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9565454Z [2023-01-11 21:38:00,888] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_SUBTRACT None [TorchVariable(), TensorVariable(), TensorVariable(), TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9565894Z [2023-01-11 21:38:00,889] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 3 [TorchVariable(), TensorVariable(), TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9566151Z [2023-01-11 21:38:00,890] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST -1099511627776 [TensorVariable()] 2023-01-11T21:38:06.9566444Z [2023-01-11 21:38:00,890] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_MULTIPLY None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9566694Z [2023-01-11 21:38:00,890] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9567074Z [2023-01-11 21:38:00,890] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING at 0x7fcc3e8aca80, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6084> 2023-01-11T21:38:06.9567363Z [2023-01-11 21:38:00,890] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9567732Z [2023-01-11 21:38:00,891] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9567833Z 6078 0 LOAD_FAST 0 (inp) 2023-01-11T21:38:06.9567936Z 2 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9568038Z 4 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9568133Z 6 LOAD_ATTR 2 (int64) 2023-01-11T21:38:06.9568235Z 8 CALL_METHOD 1 2023-01-11T21:38:06.9568336Z 10 LOAD_CONST 1 (1) 2023-01-11T21:38:06.9568417Z 12 BINARY_ADD 2023-01-11T21:38:06.9568516Z 14 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9568616Z 16 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9568717Z 18 LOAD_ATTR 3 (float64) 2023-01-11T21:38:06.9568805Z 20 
CALL_METHOD 1 2023-01-11T21:38:06.9568885Z 22 RETURN_VALUE 2023-01-11T21:38:06.9568953Z 2023-01-11T21:38:06.9568959Z 2023-01-11T21:38:06.9569278Z [2023-01-11 21:38:00,891] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6078 2023-01-11T21:38:06.9569497Z [2023-01-11 21:38:00,891] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST inp [] 2023-01-11T21:38:06.9569739Z [2023-01-11 21:38:00,891] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9570021Z [2023-01-11 21:38:00,891] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9570437Z [2023-01-11 21:38:00,891] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR int64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9570753Z [2023-01-11 21:38:00,891] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9570982Z [2023-01-11 21:38:00,897] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 1 [TensorVariable()] 2023-01-11T21:38:06.9571290Z [2023-01-11 21:38:00,897] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9571565Z [2023-01-11 21:38:00,898] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9571840Z [2023-01-11 21:38:00,898] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9572260Z [2023-01-11 21:38:00,898] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR float64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9572571Z [2023-01-11 21:38:00,899] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9572821Z [2023-01-11 21:38:00,905] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9573201Z [2023-01-11 21:38:00,905] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9573451Z [2023-01-11 21:38:00,905] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9573710Z [2023-01-11 21:38:00,905] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo done tracing fn (RETURN_VALUE) 2023-01-11T21:38:06.9573938Z [2023-01-11 21:38:00,905] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile 2023-01-11T21:38:06.9574151Z [2023-01-11 21:38:00,905] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to None 2023-01-11T21:38:06.9574395Z [2023-01-11 21:38:00,906] torch._dynamo.output_graph: [INFO] Step 3: calling compiler function compile_fx 2023-01-11T21:38:06.9574771Z [2023-01-11 21:38:00,976] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1152 2023-01-11T21:38:06.9575011Z [2023-01-11 21:38:00,986] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.9575273Z [2023-01-11 21:38:01,206] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1152 2023-01-11T21:38:06.9575683Z [2023-01-11 21:38:01,206] torch._inductor.debug: [WARNING] model__1152_inference_1198 debug trace: 
/var/lib/jenkins/workspace/test/torch_compile_debug_tmp_1/run_2023_01_11_21_38_00_866432/aot_torchinductor/model__1152_inference_1198.3 2023-01-11T21:38:06.9575919Z [2023-01-11 21:38:01,206] torch._dynamo.output_graph: [INFO] Step 3: done compiler function compile_fx 2023-01-11T21:38:06.9576293Z [2023-01-11 21:38:01,208] torch._inductor.debug: [WARNING] model___1199 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_1/run_2023_01_11_21_38_00_866432/aot_torchinductor/model___1198.2 2023-01-11T21:38:06.9576527Z [2023-01-11 21:38:01,210] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo start tracing fn 2023-01-11T21:38:06.9576839Z [2023-01-11 21:38:01,210] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6090 2023-01-11T21:38:06.9577054Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF suffix [] 2023-01-11T21:38:06.9577382Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF foo [UserFunctionVariable()] 2023-01-11T21:38:06.9577677Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9578010Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [UserFunctionVariable(), UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9578379Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9578526Z 6075 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9578627Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9578757Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9578850Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9578988Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9579138Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9579236Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9579317Z 14 RETURN_VALUE 2023-01-11T21:38:06.9579382Z 2023-01-11T21:38:06.9579387Z 2023-01-11T21:38:06.9579698Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6075 2023-01-11T21:38:06.9579916Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9580267Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9580584Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9580931Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9581268Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9581649Z [2023-01-11 21:38:01,212] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9582067Z [2023-01-11 21:38:01,212] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9582314Z [2023-01-11 21:38:01,217] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9582688Z [2023-01-11 21:38:01,217] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9583013Z [2023-01-11 
21:38:01,217] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9583381Z [2023-01-11 21:38:01,218] torch._dynamo.symbolic_convert: [DEBUG] INLINING at 0x7fcc3e8acb30, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6085> 2023-01-11T21:38:06.9583483Z 6085 0 LOAD_FAST 0 (x) 2023-01-11T21:38:06.9583581Z 2 LOAD_DEREF 0 (ten) 2023-01-11T21:38:06.9583656Z 4 BINARY_ADD 2023-01-11T21:38:06.9583739Z 6 RETURN_VALUE 2023-01-11T21:38:06.9583805Z 2023-01-11T21:38:06.9583810Z 2023-01-11T21:38:06.9584020Z [2023-01-11 21:38:01,218] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [] 2023-01-11T21:38:06.9584263Z [2023-01-11 21:38:01,218] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ten [TensorVariable()] 2023-01-11T21:38:06.9584537Z [2023-01-11 21:38:01,218] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9584781Z [2023-01-11 21:38:01,219] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9585189Z [2023-01-11 21:38:01,219] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING at 0x7fcc3e8acb30, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6085> 2023-01-11T21:38:06.9585475Z [2023-01-11 21:38:01,219] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9585916Z [2023-01-11 21:38:01,219] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9586019Z 6078 0 LOAD_FAST 0 (inp) 2023-01-11T21:38:06.9586119Z 2 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9586221Z 4 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9586317Z 6 LOAD_ATTR 2 (int64) 2023-01-11T21:38:06.9586410Z 8 CALL_METHOD 1 2023-01-11T21:38:06.9586508Z 10 LOAD_CONST 1 (1) 2023-01-11T21:38:06.9586589Z 12 BINARY_ADD 2023-01-11T21:38:06.9586684Z 14 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9586786Z 16 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9586895Z 18 LOAD_ATTR 3 (float64) 2023-01-11T21:38:06.9586990Z 20 CALL_METHOD 1 2023-01-11T21:38:06.9587075Z 22 RETURN_VALUE 2023-01-11T21:38:06.9587141Z 2023-01-11T21:38:06.9587146Z 2023-01-11T21:38:06.9587460Z [2023-01-11 21:38:01,219] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6078 2023-01-11T21:38:06.9587679Z [2023-01-11 21:38:01,219] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST inp [] 2023-01-11T21:38:06.9587913Z [2023-01-11 21:38:01,219] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9588193Z [2023-01-11 21:38:01,220] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9588612Z [2023-01-11 21:38:01,220] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR int64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9588929Z [2023-01-11 21:38:01,220] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9589166Z [2023-01-11 21:38:01,226] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 1 [TensorVariable()] 2023-01-11T21:38:06.9589440Z [2023-01-11 21:38:01,226] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9589672Z [2023-01-11 
21:38:01,227] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9589946Z [2023-01-11 21:38:01,227] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9590365Z [2023-01-11 21:38:01,227] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR float64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9590682Z [2023-01-11 21:38:01,227] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9590931Z [2023-01-11 21:38:01,234] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9591300Z [2023-01-11 21:38:01,234] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9591545Z [2023-01-11 21:38:01,234] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9591831Z [2023-01-11 21:38:01,234] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo done tracing fn (RETURN_VALUE) 2023-01-11T21:38:06.9592061Z [2023-01-11 21:38:01,234] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile 2023-01-11T21:38:06.9592302Z [2023-01-11 21:38:01,234] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to None 2023-01-11T21:38:06.9592549Z [2023-01-11 21:38:01,235] torch._dynamo.output_graph: [INFO] Step 3: calling compiler function compile_fx 2023-01-11T21:38:06.9592808Z [2023-01-11 21:38:01,277] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1153 2023-01-11T21:38:06.9593071Z [2023-01-11 21:38:01,348] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1153 2023-01-11T21:38:06.9593483Z [2023-01-11 21:38:01,348] torch._inductor.debug: [WARNING] model__1153_inference_1199 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_2/run_2023_01_11_21_38_01_210061/aot_torchinductor/model__1153_inference_1199.5 2023-01-11T21:38:06.9593726Z [2023-01-11 21:38:01,349] torch._dynamo.output_graph: [INFO] Step 3: done compiler function compile_fx 2023-01-11T21:38:06.9594095Z [2023-01-11 21:38:01,500] torch._inductor.debug: [WARNING] model___1200 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_2/run_2023_01_11_21_38_01_210061/aot_torchinductor/model___1199.4 2023-01-11T21:38:06.9594334Z [2023-01-11 21:38:01,503] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo start tracing fn 2023-01-11T21:38:06.9594647Z [2023-01-11 21:38:01,503] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6090 2023-01-11T21:38:06.9594871Z [2023-01-11 21:38:01,503] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF suffix [] 2023-01-11T21:38:06.9595129Z [2023-01-11 21:38:01,503] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF foo [UserFunctionVariable()] 2023-01-11T21:38:06.9595451Z [2023-01-11 21:38:01,503] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9595815Z [2023-01-11 21:38:01,503] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [UserFunctionVariable(), UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9596186Z [2023-01-11 21:38:01,503] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9596290Z 6075 0 LOAD_GLOBAL 0 (torch) 
2023-01-11T21:38:06.9596389Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9596480Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9596577Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9596722Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9596878Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9596978Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9597064Z 14 RETURN_VALUE 2023-01-11T21:38:06.9597132Z 2023-01-11T21:38:06.9597137Z 2023-01-11T21:38:06.9597445Z [2023-01-11 21:38:01,504] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6075 2023-01-11T21:38:06.9597665Z [2023-01-11 21:38:01,504] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9598017Z [2023-01-11 21:38:01,504] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9598338Z [2023-01-11 21:38:01,504] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9598689Z [2023-01-11 21:38:01,504] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9599059Z [2023-01-11 21:38:01,504] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9599460Z [2023-01-11 21:38:01,504] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9599882Z [2023-01-11 21:38:01,504] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9600132Z [2023-01-11 21:38:01,510] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9600508Z [2023-01-11 21:38:01,510] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9600835Z [2023-01-11 21:38:01,510] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9601213Z [2023-01-11 21:38:01,511] torch._dynamo.symbolic_convert: [DEBUG] INLINING at 0x7fcc3e8acbe0, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6086> 2023-01-11T21:38:06.9601308Z 6086 0 LOAD_FAST 0 (x) 2023-01-11T21:38:06.9601411Z 2 LOAD_DEREF 0 (ten) 2023-01-11T21:38:06.9601512Z 4 LOAD_METHOD 0 (sum) 2023-01-11T21:38:06.9601607Z 6 CALL_METHOD 0 2023-01-11T21:38:06.9601687Z 8 BINARY_ADD 2023-01-11T21:38:06.9601770Z 10 RETURN_VALUE 2023-01-11T21:38:06.9601837Z 2023-01-11T21:38:06.9601843Z 2023-01-11T21:38:06.9602061Z [2023-01-11 21:38:01,511] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [] 2023-01-11T21:38:06.9602300Z [2023-01-11 21:38:01,511] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ten [TensorVariable()] 2023-01-11T21:38:06.9602573Z [2023-01-11 21:38:01,511] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR sum [TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9602876Z [2023-01-11 21:38:01,511] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [TensorVariable(), GetAttrVariable(TensorVariable(), sum)] 2023-01-11T21:38:06.9603148Z [2023-01-11 21:38:01,512] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9603397Z [2023-01-11 21:38:01,512] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9603776Z 
[2023-01-11 21:38:01,513] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING at 0x7fcc3e8acbe0, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6086> 2023-01-11T21:38:06.9604062Z [2023-01-11 21:38:01,513] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9604437Z [2023-01-11 21:38:01,513] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9604539Z 6078 0 LOAD_FAST 0 (inp) 2023-01-11T21:38:06.9604639Z 2 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9604736Z 4 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9604834Z 6 LOAD_ATTR 2 (int64) 2023-01-11T21:38:06.9604928Z 8 CALL_METHOD 1 2023-01-11T21:38:06.9605026Z 10 LOAD_CONST 1 (1) 2023-01-11T21:38:06.9605105Z 12 BINARY_ADD 2023-01-11T21:38:06.9605206Z 14 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9605363Z 16 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9605460Z 18 LOAD_ATTR 3 (float64) 2023-01-11T21:38:06.9605593Z 20 CALL_METHOD 1 2023-01-11T21:38:06.9605687Z 22 RETURN_VALUE 2023-01-11T21:38:06.9605768Z 2023-01-11T21:38:06.9605775Z 2023-01-11T21:38:06.9606104Z [2023-01-11 21:38:01,513] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6078 2023-01-11T21:38:06.9606322Z [2023-01-11 21:38:01,513] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST inp [] 2023-01-11T21:38:06.9606565Z [2023-01-11 21:38:01,513] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9606849Z [2023-01-11 21:38:01,513] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9607259Z [2023-01-11 21:38:01,513] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR int64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9607576Z [2023-01-11 21:38:01,514] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9607812Z [2023-01-11 21:38:01,520] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 1 [TensorVariable()] 2023-01-11T21:38:06.9608085Z [2023-01-11 21:38:01,520] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9608317Z [2023-01-11 21:38:01,520] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9608588Z [2023-01-11 21:38:01,521] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9609007Z [2023-01-11 21:38:01,521] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR float64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9609321Z [2023-01-11 21:38:01,521] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9609568Z [2023-01-11 21:38:01,527] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9609944Z [2023-01-11 21:38:01,527] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9610192Z [2023-01-11 21:38:01,527] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9610445Z [2023-01-11 21:38:01,527] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo done tracing fn (RETURN_VALUE) 2023-01-11T21:38:06.9610675Z 
[2023-01-11 21:38:01,527] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile 2023-01-11T21:38:06.9610895Z [2023-01-11 21:38:01,527] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to None 2023-01-11T21:38:06.9611145Z [2023-01-11 21:38:01,528] torch._dynamo.output_graph: [INFO] Step 3: calling compiler function compile_fx 2023-01-11T21:38:06.9611404Z [2023-01-11 21:38:01,572] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1154 2023-01-11T21:38:06.9611615Z [2023-01-11 21:38:01,584] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.9611621Z 2023-01-11T21:38:06.9611719Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9611797Z import torch 2023-01-11T21:38:06.9611873Z import random 2023-01-11T21:38:06.9611987Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9612114Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9612120Z 2023-01-11T21:38:06.9612204Z aten = torch.ops.aten 2023-01-11T21:38:06.9612376Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9612478Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9612506Z 2023-01-11T21:38:06.9612585Z import triton 2023-01-11T21:38:06.9612680Z import triton.language as tl 2023-01-11T21:38:06.9612802Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9612946Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9612951Z 2023-01-11T21:38:06.9612956Z 2023-01-11T21:38:06.9613145Z triton_fused_convert_element_type_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.9613222Z import triton 2023-01-11T21:38:06.9613315Z import triton.language as tl 2023-01-11T21:38:06.9613434Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9613539Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9613676Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9613798Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9613803Z 2023-01-11T21:38:06.9614194Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.9614272Z @triton.jit 2023-01-11T21:38:06.9614397Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9614583Z xnumel = 4 2023-01-11T21:38:06.9614685Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9614816Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9614900Z xmask = xindex < xnumel 2023-01-11T21:38:06.9614965Z x0 = xindex 2023-01-11T21:38:06.9615035Z tmp0 = 1 2023-01-11T21:38:06.9615109Z tmp1 = 2147483657 2023-01-11T21:38:06.9615187Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9615273Z tmp3 = tmp2.to(tl.int64) 2023-01-11T21:38:06.9615354Z tmp4 = tmp3 + tmp0 2023-01-11T21:38:06.9615439Z tmp5 = tmp4.to(tl.float64) 2023-01-11T21:38:06.9615577Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.9615664Z ''') 2023-01-11T21:38:06.9615673Z 2023-01-11T21:38:06.9615678Z 2023-01-11T21:38:06.9615771Z async_compile.wait(globals()) 2023-01-11T21:38:06.9615847Z del async_compile 2023-01-11T21:38:06.9615852Z 2023-01-11T21:38:06.9615926Z def call(args): 2023-01-11T21:38:06.9616020Z with torch.cuda.device(0): 2023-01-11T21:38:06.9616216Z buf0 = empty_strided((4, ), (1, ), device='cuda', 
dtype=torch.float64) 2023-01-11T21:38:06.9616302Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9616452Z triton_fused_convert_element_type_1_0.run(buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.9616529Z return (buf0, ) 2023-01-11T21:38:06.9616534Z 2023-01-11T21:38:06.9616539Z 2023-01-11T21:38:06.9616619Z if __name__ == "__main__": 2023-01-11T21:38:06.9616737Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9616864Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9616968Z print_performance(lambda: call([])) 2023-01-11T21:38:06.9616973Z 2023-01-11T21:38:06.9616980Z 2023-01-11T21:38:06.9617080Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9617197Z import torch 2023-01-11T21:38:06.9617275Z import random 2023-01-11T21:38:06.9617395Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9617519Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9617524Z 2023-01-11T21:38:06.9617604Z aten = torch.ops.aten 2023-01-11T21:38:06.9617738Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9617830Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9617835Z 2023-01-11T21:38:06.9617909Z import triton 2023-01-11T21:38:06.9617995Z import triton.language as tl 2023-01-11T21:38:06.9618120Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9618259Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9618265Z 2023-01-11T21:38:06.9618322Z 2023-01-11T21:38:06.9618517Z triton_fused_convert_element_type_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.9618625Z import triton 2023-01-11T21:38:06.9618718Z import triton.language as tl 2023-01-11T21:38:06.9618834Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9618938Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9619066Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9619193Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9619198Z 2023-01-11T21:38:06.9619585Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.9619661Z @triton.jit 2023-01-11T21:38:06.9619786Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9619861Z xnumel = 4 2023-01-11T21:38:06.9619962Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9620094Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9620175Z xmask = xindex < xnumel 2023-01-11T21:38:06.9620251Z x0 = xindex 2023-01-11T21:38:06.9620321Z tmp0 = 1 2023-01-11T21:38:06.9620391Z tmp1 = 0 2023-01-11T21:38:06.9620470Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.9620540Z tmp3 = 2 2023-01-11T21:38:06.9620645Z tmp4 = tmp0 - tmp3 2023-01-11T21:38:06.9620743Z tmp5 = tl.where(tmp2, tmp0, tmp4) 2023-01-11T21:38:06.9620857Z tmp6 = -1099511627776 2023-01-11T21:38:06.9620936Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.9621024Z tmp8 = tmp7.to(tl.int64) 2023-01-11T21:38:06.9621103Z tmp9 = tmp8 + tmp0 2023-01-11T21:38:06.9621197Z tmp10 = tmp9.to(tl.float64) 2023-01-11T21:38:06.9621330Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.9621417Z ''') 2023-01-11T21:38:06.9621423Z 2023-01-11T21:38:06.9621427Z 2023-01-11T21:38:06.9621524Z async_compile.wait(globals()) 2023-01-11T21:38:06.9621603Z del 
async_compile 2023-01-11T21:38:06.9621608Z 2023-01-11T21:38:06.9621691Z def call(args): 2023-01-11T21:38:06.9621785Z with torch.cuda.device(0): 2023-01-11T21:38:06.9621987Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9622083Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9622228Z triton_fused_convert_element_type_1_0.run(buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.9622309Z return (buf0, ) 2023-01-11T21:38:06.9622314Z 2023-01-11T21:38:06.9622319Z 2023-01-11T21:38:06.9622401Z if __name__ == "__main__": 2023-01-11T21:38:06.9622518Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9622649Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9622752Z print_performance(lambda: call([])) 2023-01-11T21:38:06.9622758Z 2023-01-11T21:38:06.9622762Z 2023-01-11T21:38:06.9622864Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9622939Z import torch 2023-01-11T21:38:06.9623009Z import random 2023-01-11T21:38:06.9623131Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9623259Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9623264Z 2023-01-11T21:38:06.9623349Z aten = torch.ops.aten 2023-01-11T21:38:06.9623485Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9623582Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9623587Z 2023-01-11T21:38:06.9623663Z import triton 2023-01-11T21:38:06.9623751Z import triton.language as tl 2023-01-11T21:38:06.9623881Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9624022Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9624027Z 2023-01-11T21:38:06.9624032Z 2023-01-11T21:38:06.9624219Z triton_fused_convert_element_type_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.9624297Z import triton 2023-01-11T21:38:06.9624422Z import triton.language as tl 2023-01-11T21:38:06.9624540Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9624676Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9624803Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9624931Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9624936Z 2023-01-11T21:38:06.9625339Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp64', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.9625414Z @triton.jit 2023-01-11T21:38:06.9625550Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9625626Z xnumel = 4 2023-01-11T21:38:06.9625724Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9625854Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9625936Z xmask = xindex < xnumel 2023-01-11T21:38:06.9626007Z x0 = xindex 2023-01-11T21:38:06.9626107Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9626182Z tmp0 = 1 2023-01-11T21:38:06.9626265Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9626351Z tmp3 = tmp2.to(tl.int64) 2023-01-11T21:38:06.9626432Z tmp4 = tmp3 + tmp0 2023-01-11T21:38:06.9626515Z tmp5 = tmp4.to(tl.float64) 2023-01-11T21:38:06.9626654Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.9626741Z ''') 2023-01-11T21:38:06.9626746Z 2023-01-11T21:38:06.9626750Z 2023-01-11T21:38:06.9626845Z async_compile.wait(globals()) 
2023-01-11T21:38:06.9626926Z del async_compile 2023-01-11T21:38:06.9626931Z 2023-01-11T21:38:06.9627007Z def call(args): 2023-01-11T21:38:06.9627080Z arg0_1, = args 2023-01-11T21:38:06.9627151Z args.clear() 2023-01-11T21:38:06.9627243Z with torch.cuda.device(0): 2023-01-11T21:38:06.9627441Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9627538Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9627697Z triton_fused_convert_element_type_1_0.run(arg0_1, buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.9627775Z del arg0_1 2023-01-11T21:38:06.9627856Z return (buf0, ) 2023-01-11T21:38:06.9627861Z 2023-01-11T21:38:06.9627865Z 2023-01-11T21:38:06.9627946Z if __name__ == "__main__": 2023-01-11T21:38:06.9628060Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9628190Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9628391Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9628505Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.9628511Z 2023-01-11T21:38:06.9628777Z [2023-01-11 21:38:01,749] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1154 2023-01-11T21:38:06.9629194Z [2023-01-11 21:38:01,749] torch._inductor.debug: [WARNING] model__1154_inference_1200 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_3/run_2023_01_11_21_38_01_502557/aot_torchinductor/model__1154_inference_1200.7 2023-01-11T21:38:06.9629443Z [2023-01-11 21:38:01,750] torch._dynamo.output_graph: [INFO] Step 3: done compiler function compile_fx 2023-01-11T21:38:06.9629818Z [2023-01-11 21:38:01,899] torch._inductor.debug: [WARNING] model___1201 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_3/run_2023_01_11_21_38_01_502557/aot_torchinductor/model___1200.6 2023-01-11T21:38:06.9629823Z 2023-01-11T21:38:06.9629924Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9629994Z import torch 2023-01-11T21:38:06.9630071Z import random 2023-01-11T21:38:06.9630193Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9630319Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9630324Z 2023-01-11T21:38:06.9630408Z aten = torch.ops.aten 2023-01-11T21:38:06.9630575Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9630677Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9630704Z 2023-01-11T21:38:06.9630782Z import triton 2023-01-11T21:38:06.9630871Z import triton.language as tl 2023-01-11T21:38:06.9631000Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9631141Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9631147Z 2023-01-11T21:38:06.9631151Z 2023-01-11T21:38:06.9631351Z triton_fused_convert_element_type_1_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.9631431Z import triton 2023-01-11T21:38:06.9631526Z import triton.language as tl 2023-01-11T21:38:06.9631640Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9631742Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9631869Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.9631996Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9632002Z 2023-01-11T21:38:06.9632094Z @reduction(size_hints=[1, 4], 2023-01-11T21:38:06.9632209Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.9632302Z filename=__file__, 
2023-01-11T21:38:06.9632662Z meta={'signature': {0: '*fp32', 1: '*fp64', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.9632740Z @triton.jit 2023-01-11T21:38:06.9632908Z def triton_(in_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.9632977Z xnumel = 1 2023-01-11T21:38:06.9633051Z rnumel = 4 2023-01-11T21:38:06.9633152Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9633289Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9633376Z xmask = xindex < xnumel 2023-01-11T21:38:06.9633497Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.9633620Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.9633722Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.9633816Z rindex = roffset + rbase 2023-01-11T21:38:06.9633909Z rmask = rindex < rnumel 2023-01-11T21:38:06.9633982Z r0 = rindex 2023-01-11T21:38:06.9634086Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.9634207Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.9634324Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.9634424Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.9634513Z rindex = roffset + rbase 2023-01-11T21:38:06.9634600Z rmask = rindex < rnumel 2023-01-11T21:38:06.9634672Z r0 = rindex 2023-01-11T21:38:06.9634743Z tmp2 = 1 2023-01-11T21:38:06.9634826Z tmp3 = tmp2 + tmp1 2023-01-11T21:38:06.9634910Z tmp4 = tmp3.to(tl.int64) 2023-01-11T21:38:06.9634993Z tmp5 = tmp4 + tmp2 2023-01-11T21:38:06.9635091Z tmp6 = tmp5.to(tl.float64) 2023-01-11T21:38:06.9635247Z tl.store(out_ptr1 + (r0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp6, rmask & xmask) 2023-01-11T21:38:06.9635337Z ''') 2023-01-11T21:38:06.9635343Z 2023-01-11T21:38:06.9635347Z 2023-01-11T21:38:06.9635444Z async_compile.wait(globals()) 2023-01-11T21:38:06.9635523Z del async_compile 2023-01-11T21:38:06.9635528Z 2023-01-11T21:38:06.9635602Z def call(args): 2023-01-11T21:38:06.9635671Z arg0_1, = args 2023-01-11T21:38:06.9635749Z args.clear() 2023-01-11T21:38:06.9635841Z with torch.cuda.device(0): 2023-01-11T21:38:06.9636039Z buf1 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9636133Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9636297Z triton_fused_convert_element_type_1_sum_1_0.run(arg0_1, buf1, 1, 4, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.9636374Z del arg0_1 2023-01-11T21:38:06.9636448Z return (buf1, ) 2023-01-11T21:38:06.9636488Z 2023-01-11T21:38:06.9636494Z 2023-01-11T21:38:06.9636570Z if __name__ == "__main__": 2023-01-11T21:38:06.9636713Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9636839Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9637041Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9637156Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.9637161Z 2023-01-11T21:38:06.9637236Z ok (1.351s) 2023-01-11T21:38:06.9637388Z test_divisibile_by_16_covers_numel_args (__main__.TritonCodeGenTests) ... 
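The module dumped below for this test contains two reduction kernels that together implement a split sum over a 256x256 input: triton_fused_sum_1_0 writes 8 partial sums, one per 8192-element slab (grid(8)), and triton_fused_sum_1_1 folds those 8 partials into a scalar (grid(1)). A minimal eager-mode sketch of the same two-stage strategy, with sizes taken from the kernels below (the helper name split_sum is invented for illustration):

import torch

def split_sum(x: torch.Tensor, num_partials: int = 8) -> torch.Tensor:
    # stage 1: one partial sum per contiguous slab -- what the grid(8) kernel below does
    partials = x.reshape(num_partials, -1).sum(dim=1)
    # stage 2: reduce the partials to a scalar -- what the grid(1) kernel below does
    return partials.sum()

x = torch.randn(256, 256, dtype=torch.float64)
assert torch.allclose(split_sum(x.flatten()), x.sum())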
2023-01-11T21:38:06.9637482Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9637560Z import torch 2023-01-11T21:38:06.9637634Z import random 2023-01-11T21:38:06.9637754Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9637883Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9637888Z 2023-01-11T21:38:06.9637973Z aten = torch.ops.aten 2023-01-11T21:38:06.9638112Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9638204Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9638219Z 2023-01-11T21:38:06.9638289Z import triton 2023-01-11T21:38:06.9638384Z import triton.language as tl 2023-01-11T21:38:06.9638510Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9638653Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9638658Z 2023-01-11T21:38:06.9638663Z 2023-01-11T21:38:06.9638818Z triton_fused_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.9638893Z import triton 2023-01-11T21:38:06.9638989Z import triton.language as tl 2023-01-11T21:38:06.9639099Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9639202Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9639333Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.9639460Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9639468Z 2023-01-11T21:38:06.9639561Z @reduction(size_hints=[8, 8192], 2023-01-11T21:38:06.9639677Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.9639770Z filename=__file__, 2023-01-11T21:38:06.9640134Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.9640204Z @triton.jit 2023-01-11T21:38:06.9640376Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.9640449Z xnumel = 8 2023-01-11T21:38:06.9640526Z rnumel = 8192 2023-01-11T21:38:06.9640624Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9640764Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9640850Z xmask = xindex < xnumel 2023-01-11T21:38:06.9640965Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.9641041Z x0 = xindex 2023-01-11T21:38:06.9641162Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.9641273Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.9641364Z rindex = roffset + rbase 2023-01-11T21:38:06.9641450Z rmask = rindex < rnumel 2023-01-11T21:38:06.9641521Z r1 = rindex 2023-01-11T21:38:06.9641633Z tmp0 = tl.load(in_ptr0 + (r1 + (8192*x0)), rmask & xmask) 2023-01-11T21:38:06.9641756Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.9641873Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.9641974Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.9642061Z ''') 2023-01-11T21:38:06.9642067Z 2023-01-11T21:38:06.9642071Z 2023-01-11T21:38:06.9642230Z triton_fused_sum_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.9642308Z import triton 2023-01-11T21:38:06.9642402Z import triton.language as tl 2023-01-11T21:38:06.9642541Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9642647Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9642809Z from torch._inductor.triton_ops.autotune import reduction 
2023-01-11T21:38:06.9642936Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9642942Z 2023-01-11T21:38:06.9643032Z @reduction(size_hints=[1, 8], 2023-01-11T21:38:06.9643150Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.9643234Z filename=__file__, 2023-01-11T21:38:06.9643590Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.9643665Z @triton.jit 2023-01-11T21:38:06.9643835Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.9643909Z xnumel = 1 2023-01-11T21:38:06.9643982Z rnumel = 8 2023-01-11T21:38:06.9644081Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9644219Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9644307Z xmask = xindex < xnumel 2023-01-11T21:38:06.9644422Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.9644539Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.9644644Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.9644734Z rindex = roffset + rbase 2023-01-11T21:38:06.9644820Z rmask = rindex < rnumel 2023-01-11T21:38:06.9644893Z r0 = rindex 2023-01-11T21:38:06.9644998Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.9645115Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.9645230Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.9645363Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.9645449Z ''') 2023-01-11T21:38:06.9645454Z 2023-01-11T21:38:06.9645461Z 2023-01-11T21:38:06.9645557Z async_compile.wait(globals()) 2023-01-11T21:38:06.9645638Z del async_compile 2023-01-11T21:38:06.9645644Z 2023-01-11T21:38:06.9645719Z def call(args): 2023-01-11T21:38:06.9645799Z arg0_1, = args 2023-01-11T21:38:06.9645870Z args.clear() 2023-01-11T21:38:06.9645967Z with torch.cuda.device(0): 2023-01-11T21:38:06.9646165Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9646259Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9646400Z triton_fused_sum_1_0.run(arg0_1, buf0, 8, 8192, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.9646476Z del arg0_1 2023-01-11T21:38:06.9646667Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9646800Z triton_fused_sum_1_1.run(buf0, buf1, 1, 8, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.9646878Z return (buf1, ) 2023-01-11T21:38:06.9646884Z 2023-01-11T21:38:06.9646888Z 2023-01-11T21:38:06.9646973Z if __name__ == "__main__": 2023-01-11T21:38:06.9647093Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9647226Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9647437Z arg0_1 = rand_strided((256, 256), (256, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9647551Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.9647556Z 2023-01-11T21:38:06.9647628Z ok (0.211s) 2023-01-11T21:38:06.9648152Z test_optimize_compute (__main__.TritonCodeGenTests) ... 
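In the trace that follows, dynamo inlines three small closures (x + 500; torch.where(x < 0, ones(), ones() - 2) * -1048576; x / 30) through a shared int64/float64 round-trip, and because the graph input is torch.ones(...) inductor folds each computation down to compile-time constants: the three emitted kernels take only out_ptr0, with the arithmetic baked in (tmp0 = 1, tmp1 = 500, ...) and the intermediate int64 narrowed to int32. A hedged sketch of the shape of such a test, reconstructed from the bytecode below (names are illustrative, not the actual test body; assumes a CUDA device and a build with torch.compile):

import torch

def ones():
    return torch.ones([4], device="cuda")

def suffix(inp):
    # int64/float64 round-trip; the kernels below narrow the int64 step to int32
    return (inp.to(torch.int64) + 1).to(torch.float64)

payloads = [
    lambda x: x + 500,
    lambda x: torch.where(x < 0, ones(), ones() - 2) * -1048576,
    lambda x: x / 30,
]

for foo in payloads:
    fn = torch.compile(lambda foo=foo: suffix(foo(ones())))  # default arg binds foo per iteration
    print(fn())  # each call compiles a kernel with the constants already folded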
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.9648234Z warnings.warn( 2023-01-11T21:38:06.9648476Z [2023-01-11 21:38:02,114] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo start tracing fn 2023-01-11T21:38:06.9648817Z [2023-01-11 21:38:02,114] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6113 2023-01-11T21:38:06.9649060Z [2023-01-11 21:38:02,114] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF suffix [] 2023-01-11T21:38:06.9649319Z [2023-01-11 21:38:02,114] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF foo [UserFunctionVariable()] 2023-01-11T21:38:06.9649615Z [2023-01-11 21:38:02,114] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9649947Z [2023-01-11 21:38:02,114] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [UserFunctionVariable(), UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9650321Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9650429Z 6101 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9650522Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9650621Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9650719Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9650862Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9651013Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9651112Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9651194Z 14 RETURN_VALUE 2023-01-11T21:38:06.9651255Z 2023-01-11T21:38:06.9651261Z 2023-01-11T21:38:06.9651571Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6101 2023-01-11T21:38:06.9651794Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9652154Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9652471Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9652820Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9653160Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9653546Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9653968Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9654218Z [2023-01-11 21:38:02,121] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9654704Z [2023-01-11 21:38:02,121] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9655030Z [2023-01-11 21:38:02,121] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9655400Z [2023-01-11 21:38:02,121]
torch._dynamo.symbolic_convert: [DEBUG] INLINING <code object <lambda> at 0x7fcc3e8ad0b0, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6107> 2023-01-11T21:38:06.9655544Z 6107 0 LOAD_FAST 0 (x) 2023-01-11T21:38:06.9655646Z 2 LOAD_CONST 1 (500) 2023-01-11T21:38:06.9655755Z 4 BINARY_ADD 2023-01-11T21:38:06.9655838Z 6 RETURN_VALUE 2023-01-11T21:38:06.9655905Z 2023-01-11T21:38:06.9655910Z 2023-01-11T21:38:06.9656124Z [2023-01-11 21:38:02,122] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [] 2023-01-11T21:38:06.9656366Z [2023-01-11 21:38:02,122] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 500 [TensorVariable()] 2023-01-11T21:38:06.9656641Z [2023-01-11 21:38:02,122] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9656886Z [2023-01-11 21:38:02,122] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9657326Z [2023-01-11 21:38:02,122] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING <code object <lambda> at 0x7fcc3e8ad0b0, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6107> 2023-01-11T21:38:06.9657615Z [2023-01-11 21:38:02,123] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9657993Z [2023-01-11 21:38:02,123] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9658094Z 6104 0 LOAD_FAST 0 (inp) 2023-01-11T21:38:06.9658195Z 2 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9658295Z 4 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9658392Z 6 LOAD_ATTR 2 (int64) 2023-01-11T21:38:06.9658481Z 8 CALL_METHOD 1 2023-01-11T21:38:06.9658581Z 10 LOAD_CONST 1 (1) 2023-01-11T21:38:06.9658662Z 12 BINARY_ADD 2023-01-11T21:38:06.9658764Z 14 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9658866Z 16 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9658971Z 18 LOAD_ATTR 3 (float64) 2023-01-11T21:38:06.9659068Z 20 CALL_METHOD 1 2023-01-11T21:38:06.9659145Z 22 RETURN_VALUE 2023-01-11T21:38:06.9659209Z 2023-01-11T21:38:06.9659214Z 2023-01-11T21:38:06.9659529Z [2023-01-11 21:38:02,123] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6104 2023-01-11T21:38:06.9659748Z [2023-01-11 21:38:02,123] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST inp [] 2023-01-11T21:38:06.9659990Z [2023-01-11 21:38:02,123] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9660271Z [2023-01-11 21:38:02,123] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9660697Z [2023-01-11 21:38:02,123] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR int64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9661021Z [2023-01-11 21:38:02,123] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9661260Z [2023-01-11 21:38:02,129] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 1 [TensorVariable()] 2023-01-11T21:38:06.9661539Z [2023-01-11 21:38:02,130] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9661770Z [2023-01-11 21:38:02,130] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9662047Z [2023-01-11 21:38:02,130] torch._dynamo.symbolic_convert: [DEBUG] TRACE
LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9662514Z [2023-01-11 21:38:02,131] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR float64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9662847Z [2023-01-11 21:38:02,131] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9663094Z [2023-01-11 21:38:02,137] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9663471Z [2023-01-11 21:38:02,137] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9663715Z [2023-01-11 21:38:02,137] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9663972Z [2023-01-11 21:38:02,137] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo done tracing fn (RETURN_VALUE) 2023-01-11T21:38:06.9664200Z [2023-01-11 21:38:02,137] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile 2023-01-11T21:38:06.9664421Z [2023-01-11 21:38:02,137] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to None 2023-01-11T21:38:06.9664665Z [2023-01-11 21:38:02,138] torch._dynamo.output_graph: [INFO] Step 3: calling compiler function compile_fx 2023-01-11T21:38:06.9664918Z [2023-01-11 21:38:02,179] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1155 2023-01-11T21:38:06.9665155Z [2023-01-11 21:38:02,187] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.9665417Z [2023-01-11 21:38:02,402] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1155 2023-01-11T21:38:06.9665830Z [2023-01-11 21:38:02,402] torch._inductor.debug: [WARNING] model__1155_inference_1201 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_4/run_2023_01_11_21_38_02_113801/aot_torchinductor/model__1155_inference_1201.9 2023-01-11T21:38:06.9666070Z [2023-01-11 21:38:02,402] torch._dynamo.output_graph: [INFO] Step 3: done compiler function compile_fx 2023-01-11T21:38:06.9666449Z [2023-01-11 21:38:02,404] torch._inductor.debug: [WARNING] model___1202 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_4/run_2023_01_11_21_38_02_113801/aot_torchinductor/model___1201.8 2023-01-11T21:38:06.9666684Z [2023-01-11 21:38:02,406] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo start tracing fn 2023-01-11T21:38:06.9666998Z [2023-01-11 21:38:02,406] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6113 2023-01-11T21:38:06.9667222Z [2023-01-11 21:38:02,406] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF suffix [] 2023-01-11T21:38:06.9667482Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF foo [UserFunctionVariable()] 2023-01-11T21:38:06.9667780Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9668119Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [UserFunctionVariable(), UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9668495Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9668597Z 6101 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9668696Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9668795Z 4 
LOAD_CONST 1 (4) 2023-01-11T21:38:06.9668892Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9669039Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9669187Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9669318Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9669403Z 14 RETURN_VALUE 2023-01-11T21:38:06.9669505Z 2023-01-11T21:38:06.9669510Z 2023-01-11T21:38:06.9669825Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6101 2023-01-11T21:38:06.9670047Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9670401Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9670722Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9671074Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9671413Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9671792Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9672212Z [2023-01-11 21:38:02,408] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9672459Z [2023-01-11 21:38:02,413] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9672838Z [2023-01-11 21:38:02,413] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9673169Z [2023-01-11 21:38:02,413] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9673545Z [2023-01-11 21:38:02,413] torch._dynamo.symbolic_convert: [DEBUG] INLINING <code object <lambda> at 0x7fcc3e8ad160, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6108> 2023-01-11T21:38:06.9673650Z 6108 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9673751Z 2 LOAD_METHOD 1 (where) 2023-01-11T21:38:06.9673851Z 4 LOAD_FAST 0 (x) 2023-01-11T21:38:06.9673950Z 6 LOAD_CONST 1 (0) 2023-01-11T21:38:06.9674042Z 8 COMPARE_OP 0 (<) 2023-01-11T21:38:06.9674145Z 10 LOAD_DEREF 0 (ones) 2023-01-11T21:38:06.9674245Z 12 CALL_FUNCTION 0 2023-01-11T21:38:06.9674346Z 14 LOAD_DEREF 0 (ones) 2023-01-11T21:38:06.9674446Z 16 CALL_FUNCTION 0 2023-01-11T21:38:06.9674541Z 18 LOAD_CONST 2 (2) 2023-01-11T21:38:06.9674628Z 20 BINARY_SUBTRACT 2023-01-11T21:38:06.9674716Z 22 CALL_METHOD 3 2023-01-11T21:38:06.9674867Z 24 LOAD_CONST 3 (-1048576) 2023-01-11T21:38:06.9674956Z 26 BINARY_MULTIPLY 2023-01-11T21:38:06.9675037Z 28 RETURN_VALUE 2023-01-11T21:38:06.9675110Z 2023-01-11T21:38:06.9675115Z 2023-01-11T21:38:06.9675341Z [2023-01-11 21:38:02,414] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9675750Z [2023-01-11 21:38:02,414] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR where [TorchVariable()] 2023-01-11T21:38:06.9676096Z [2023-01-11 21:38:02,414] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [TorchVariable()] 2023-01-11T21:38:06.9676458Z [2023-01-11 21:38:02,414] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 0 [TorchVariable(),
TensorVariable()] 2023-01-11T21:38:06.9676832Z [2023-01-11 21:38:02,414] torch._dynamo.symbolic_convert: [DEBUG] TRACE COMPARE_OP < [TorchVariable(), TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9677179Z [2023-01-11 21:38:02,415] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [TorchVariable(), TensorVariable()] 2023-01-11T21:38:06.9677566Z [2023-01-11 21:38:02,415] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [TorchVariable(), TensorVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9677939Z [2023-01-11 21:38:02,415] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9678046Z 6101 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9678147Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9678244Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9678336Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9678481Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9678628Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9678726Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9678809Z 14 RETURN_VALUE 2023-01-11T21:38:06.9678877Z 2023-01-11T21:38:06.9678882Z 2023-01-11T21:38:06.9679196Z [2023-01-11 21:38:02,415] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6101 2023-01-11T21:38:06.9679423Z [2023-01-11 21:38:02,415] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9679785Z [2023-01-11 21:38:02,415] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9680102Z [2023-01-11 21:38:02,415] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9680448Z [2023-01-11 21:38:02,416] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9680788Z [2023-01-11 21:38:02,416] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9681167Z [2023-01-11 21:38:02,416] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9681588Z [2023-01-11 21:38:02,416] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9681834Z [2023-01-11 21:38:02,421] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9682208Z [2023-01-11 21:38:02,421] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9682615Z [2023-01-11 21:38:02,421] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [TorchVariable(), TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9683029Z [2023-01-11 21:38:02,421] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [TorchVariable(), TensorVariable(), TensorVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9683423Z [2023-01-11 21:38:02,421] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9683528Z 6101 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9683627Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9683724Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9683818Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9683957Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9684113Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9684214Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9684296Z 14 RETURN_VALUE 2023-01-11T21:38:06.9684367Z 2023-01-11T21:38:06.9684372Z 
2023-01-11T21:38:06.9684685Z [2023-01-11 21:38:02,422] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6101 2023-01-11T21:38:06.9684909Z [2023-01-11 21:38:02,422] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9685266Z [2023-01-11 21:38:02,422] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9685619Z [2023-01-11 21:38:02,422] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9685975Z [2023-01-11 21:38:02,422] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9686314Z [2023-01-11 21:38:02,422] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9686698Z [2023-01-11 21:38:02,422] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9687118Z [2023-01-11 21:38:02,422] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9687365Z [2023-01-11 21:38:02,427] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9687742Z [2023-01-11 21:38:02,427] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9688144Z [2023-01-11 21:38:02,427] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 2 [TorchVariable(), TensorVariable(), TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9688587Z [2023-01-11 21:38:02,428] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_SUBTRACT None [TorchVariable(), TensorVariable(), TensorVariable(), TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9688985Z [2023-01-11 21:38:02,428] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 3 [TorchVariable(), TensorVariable(), TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9689235Z [2023-01-11 21:38:02,429] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST -1048576 [TensorVariable()] 2023-01-11T21:38:06.9689552Z [2023-01-11 21:38:02,429] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_MULTIPLY None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9689818Z [2023-01-11 21:38:02,430] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9690190Z [2023-01-11 21:38:02,430] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING <code object <lambda> at 0x7fcc3e8ad160, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6108> 2023-01-11T21:38:06.9690472Z [2023-01-11 21:38:02,430] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9690845Z [2023-01-11 21:38:02,430] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9690948Z 6104 0 LOAD_FAST 0 (inp) 2023-01-11T21:38:06.9691051Z 2 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9691154Z 4 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9691258Z 6 LOAD_ATTR 2 (int64) 2023-01-11T21:38:06.9691356Z 8 CALL_METHOD 1 2023-01-11T21:38:06.9691454Z 10 LOAD_CONST 1 (1) 2023-01-11T21:38:06.9691530Z 12 BINARY_ADD 2023-01-11T21:38:06.9691629Z 14 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9691730Z 16 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9691833Z 18 LOAD_ATTR 3 (float64) 2023-01-11T21:38:06.9691928Z 20
CALL_METHOD 1 2023-01-11T21:38:06.9692012Z 22 RETURN_VALUE 2023-01-11T21:38:06.9692080Z 2023-01-11T21:38:06.9692086Z 2023-01-11T21:38:06.9692396Z [2023-01-11 21:38:02,430] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6104 2023-01-11T21:38:06.9692616Z [2023-01-11 21:38:02,430] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST inp [] 2023-01-11T21:38:06.9692860Z [2023-01-11 21:38:02,430] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9693141Z [2023-01-11 21:38:02,431] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9693555Z [2023-01-11 21:38:02,431] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR int64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9693874Z [2023-01-11 21:38:02,431] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9694110Z [2023-01-11 21:38:02,437] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 1 [TensorVariable()] 2023-01-11T21:38:06.9694388Z [2023-01-11 21:38:02,437] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9694743Z [2023-01-11 21:38:02,438] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9695015Z [2023-01-11 21:38:02,438] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9695424Z [2023-01-11 21:38:02,438] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR float64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9695733Z [2023-01-11 21:38:02,438] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9695980Z [2023-01-11 21:38:02,444] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9696396Z [2023-01-11 21:38:02,444] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9696674Z [2023-01-11 21:38:02,444] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9696933Z [2023-01-11 21:38:02,444] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo done tracing fn (RETURN_VALUE) 2023-01-11T21:38:06.9697207Z [2023-01-11 21:38:02,444] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile 2023-01-11T21:38:06.9697456Z [2023-01-11 21:38:02,445] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to None 2023-01-11T21:38:06.9697724Z [2023-01-11 21:38:02,446] torch._dynamo.output_graph: [INFO] Step 3: calling compiler function compile_fx 2023-01-11T21:38:06.9697982Z [2023-01-11 21:38:02,516] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1156 2023-01-11T21:38:06.9698213Z [2023-01-11 21:38:02,526] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.9698479Z [2023-01-11 21:38:02,748] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1156 2023-01-11T21:38:06.9698892Z [2023-01-11 21:38:02,748] torch._inductor.debug: [WARNING] model__1156_inference_1202 debug trace: 
/var/lib/jenkins/workspace/test/torch_compile_debug_tmp_5/run_2023_01_11_21_38_02_406037/aot_torchinductor/model__1156_inference_1202.11 2023-01-11T21:38:06.9699134Z [2023-01-11 21:38:02,749] torch._dynamo.output_graph: [INFO] Step 3: done compiler function compile_fx 2023-01-11T21:38:06.9699510Z [2023-01-11 21:38:02,750] torch._inductor.debug: [WARNING] model___1203 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_5/run_2023_01_11_21_38_02_406037/aot_torchinductor/model___1202.10 2023-01-11T21:38:06.9699743Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo start tracing fn 2023-01-11T21:38:06.9700055Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6113 2023-01-11T21:38:06.9700277Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF suffix [] 2023-01-11T21:38:06.9700536Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF foo [UserFunctionVariable()] 2023-01-11T21:38:06.9700833Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9701171Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [UserFunctionVariable(), UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9701534Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9701643Z 6101 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9701745Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9701844Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9701945Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9702093Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9702250Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9702350Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9702427Z 14 RETURN_VALUE 2023-01-11T21:38:06.9702493Z 2023-01-11T21:38:06.9702499Z 2023-01-11T21:38:06.9702809Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6101 2023-01-11T21:38:06.9703031Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9703417Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9703788Z [2023-01-11 21:38:02,754] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9704138Z [2023-01-11 21:38:02,754] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9704476Z [2023-01-11 21:38:02,754] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9704860Z [2023-01-11 21:38:02,754] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9705281Z [2023-01-11 21:38:02,754] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9705556Z [2023-01-11 21:38:02,759] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9705950Z [2023-01-11 21:38:02,759] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9706275Z [2023-01-11 
21:38:02,759] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9706657Z [2023-01-11 21:38:02,760] torch._dynamo.symbolic_convert: [DEBUG] INLINING <code object <lambda> at 0x7fcc3e8ad210, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6109> 2023-01-11T21:38:06.9706763Z 6109 0 LOAD_FAST 0 (x) 2023-01-11T21:38:06.9706863Z 2 LOAD_CONST 1 (30) 2023-01-11T21:38:06.9706957Z 4 BINARY_TRUE_DIVIDE 2023-01-11T21:38:06.9707040Z 6 RETURN_VALUE 2023-01-11T21:38:06.9707108Z 2023-01-11T21:38:06.9707113Z 2023-01-11T21:38:06.9707320Z [2023-01-11 21:38:02,760] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [] 2023-01-11T21:38:06.9707569Z [2023-01-11 21:38:02,760] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 30 [TensorVariable()] 2023-01-11T21:38:06.9707861Z [2023-01-11 21:38:02,760] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_TRUE_DIVIDE None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9708112Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9708495Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING <code object <lambda> at 0x7fcc3e8ad210, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6109> 2023-01-11T21:38:06.9708791Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9709160Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9709260Z 6104 0 LOAD_FAST 0 (inp) 2023-01-11T21:38:06.9709362Z 2 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9709464Z 4 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9709555Z 6 LOAD_ATTR 2 (int64) 2023-01-11T21:38:06.9709649Z 8 CALL_METHOD 1 2023-01-11T21:38:06.9709745Z 10 LOAD_CONST 1 (1) 2023-01-11T21:38:06.9709824Z 12 BINARY_ADD 2023-01-11T21:38:06.9709955Z 14 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9710056Z 16 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9710181Z 18 LOAD_ATTR 3 (float64) 2023-01-11T21:38:06.9710269Z 20 CALL_METHOD 1 2023-01-11T21:38:06.9710352Z 22 RETURN_VALUE 2023-01-11T21:38:06.9710419Z 2023-01-11T21:38:06.9710424Z 2023-01-11T21:38:06.9710738Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6104 2023-01-11T21:38:06.9710958Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST inp [] 2023-01-11T21:38:06.9711202Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9711481Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9711902Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR int64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9712217Z [2023-01-11 21:38:02,762] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9712447Z [2023-01-11 21:38:02,768] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 1 [TensorVariable()] 2023-01-11T21:38:06.9712721Z [2023-01-11 21:38:02,768] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9712957Z
[2023-01-11 21:38:02,768] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9713225Z [2023-01-11 21:38:02,769] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9713646Z [2023-01-11 21:38:02,769] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR float64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9713962Z [2023-01-11 21:38:02,769] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9714206Z [2023-01-11 21:38:02,775] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9714583Z [2023-01-11 21:38:02,775] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9714837Z [2023-01-11 21:38:02,775] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9715096Z [2023-01-11 21:38:02,775] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo done tracing fn (RETURN_VALUE) 2023-01-11T21:38:06.9715322Z [2023-01-11 21:38:02,775] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile 2023-01-11T21:38:06.9715537Z [2023-01-11 21:38:02,775] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to None 2023-01-11T21:38:06.9715780Z [2023-01-11 21:38:02,776] torch._dynamo.output_graph: [INFO] Step 3: calling compiler function compile_fx 2023-01-11T21:38:06.9716039Z [2023-01-11 21:38:02,818] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1157 2023-01-11T21:38:06.9716281Z [2023-01-11 21:38:02,825] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.9716548Z [2023-01-11 21:38:03,042] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1157 2023-01-11T21:38:06.9716992Z [2023-01-11 21:38:03,042] torch._inductor.debug: [WARNING] model__1157_inference_1203 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_6/run_2023_01_11_21_38_02_752239/aot_torchinductor/model__1157_inference_1203.13 2023-01-11T21:38:06.9717267Z [2023-01-11 21:38:03,042] torch._dynamo.output_graph: [INFO] Step 3: done compiler function compile_fx 2023-01-11T21:38:06.9717645Z [2023-01-11 21:38:03,043] torch._inductor.debug: [WARNING] model___1204 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_6/run_2023_01_11_21_38_02_752239/aot_torchinductor/model___1203.12 2023-01-11T21:38:06.9717651Z 2023-01-11T21:38:06.9717752Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9717829Z import torch 2023-01-11T21:38:06.9717901Z import random 2023-01-11T21:38:06.9718024Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9718151Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9718157Z 2023-01-11T21:38:06.9718242Z aten = torch.ops.aten 2023-01-11T21:38:06.9718383Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9718482Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9718488Z 2023-01-11T21:38:06.9718568Z import triton 2023-01-11T21:38:06.9718660Z import triton.language as tl 2023-01-11T21:38:06.9718789Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9718935Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9718941Z 2023-01-11T21:38:06.9718945Z 2023-01-11T21:38:06.9719134Z 
triton_fused_convert_element_type_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.9719217Z import triton 2023-01-11T21:38:06.9719312Z import triton.language as tl 2023-01-11T21:38:06.9719430Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9719535Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9719664Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9719794Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9719799Z 2023-01-11T21:38:06.9720192Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.9720273Z @triton.jit 2023-01-11T21:38:06.9720399Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9720472Z xnumel = 4 2023-01-11T21:38:06.9720573Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9720708Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9720787Z xmask = xindex < xnumel 2023-01-11T21:38:06.9720860Z x0 = xindex 2023-01-11T21:38:06.9720934Z tmp0 = 1 2023-01-11T21:38:06.9721009Z tmp1 = 500 2023-01-11T21:38:06.9721089Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9721177Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.9721251Z tmp4 = tmp3 + tmp0 2023-01-11T21:38:06.9721343Z tmp5 = tmp4.to(tl.float64) 2023-01-11T21:38:06.9721482Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.9721568Z ''') 2023-01-11T21:38:06.9721576Z 2023-01-11T21:38:06.9721581Z 2023-01-11T21:38:06.9721677Z async_compile.wait(globals()) 2023-01-11T21:38:06.9721761Z del async_compile 2023-01-11T21:38:06.9721767Z 2023-01-11T21:38:06.9721842Z def call(args): 2023-01-11T21:38:06.9721935Z with torch.cuda.device(0): 2023-01-11T21:38:06.9722127Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9722224Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9722373Z triton_fused_convert_element_type_1_0.run(buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.9722455Z return (buf0, ) 2023-01-11T21:38:06.9722461Z 2023-01-11T21:38:06.9722465Z 2023-01-11T21:38:06.9722546Z if __name__ == "__main__": 2023-01-11T21:38:06.9722665Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9722791Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9722895Z print_performance(lambda: call([])) 2023-01-11T21:38:06.9722900Z 2023-01-11T21:38:06.9722940Z 2023-01-11T21:38:06.9723035Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9723134Z import torch 2023-01-11T21:38:06.9723211Z import random 2023-01-11T21:38:06.9723333Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9723460Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9723465Z 2023-01-11T21:38:06.9723549Z aten = torch.ops.aten 2023-01-11T21:38:06.9723686Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9723783Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9723789Z 2023-01-11T21:38:06.9723858Z import triton 2023-01-11T21:38:06.9723953Z import triton.language as tl 2023-01-11T21:38:06.9724079Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9724219Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9724224Z 2023-01-11T21:38:06.9724228Z 2023-01-11T21:38:06.9724423Z 
triton_fused_convert_element_type_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.9724499Z import triton 2023-01-11T21:38:06.9724595Z import triton.language as tl 2023-01-11T21:38:06.9724707Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9724813Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9724948Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9725080Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9725085Z 2023-01-11T21:38:06.9725470Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.9725545Z @triton.jit 2023-01-11T21:38:06.9725666Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9725741Z xnumel = 4 2023-01-11T21:38:06.9725835Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9725967Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9726055Z xmask = xindex < xnumel 2023-01-11T21:38:06.9726129Z x0 = xindex 2023-01-11T21:38:06.9726206Z tmp0 = 1 2023-01-11T21:38:06.9726279Z tmp1 = 0 2023-01-11T21:38:06.9726360Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.9726425Z tmp3 = 2 2023-01-11T21:38:06.9726536Z tmp4 = tmp0 - tmp3 2023-01-11T21:38:06.9726632Z tmp5 = tl.where(tmp2, tmp0, tmp4) 2023-01-11T21:38:06.9726740Z tmp6 = -1048576 2023-01-11T21:38:06.9726819Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.9726909Z tmp8 = tmp7.to(tl.int32) 2023-01-11T21:38:06.9726990Z tmp9 = tmp8 + tmp0 2023-01-11T21:38:06.9727075Z tmp10 = tmp9.to(tl.float64) 2023-01-11T21:38:06.9727216Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.9727303Z ''') 2023-01-11T21:38:06.9727309Z 2023-01-11T21:38:06.9727313Z 2023-01-11T21:38:06.9727412Z async_compile.wait(globals()) 2023-01-11T21:38:06.9727493Z del async_compile 2023-01-11T21:38:06.9727498Z 2023-01-11T21:38:06.9727578Z def call(args): 2023-01-11T21:38:06.9727673Z with torch.cuda.device(0): 2023-01-11T21:38:06.9727868Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9727966Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9728117Z triton_fused_convert_element_type_1_0.run(buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.9728200Z return (buf0, ) 2023-01-11T21:38:06.9728205Z 2023-01-11T21:38:06.9728210Z 2023-01-11T21:38:06.9728293Z if __name__ == "__main__": 2023-01-11T21:38:06.9728412Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9728539Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9728643Z print_performance(lambda: call([])) 2023-01-11T21:38:06.9728648Z 2023-01-11T21:38:06.9728652Z 2023-01-11T21:38:06.9728754Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9728823Z import torch 2023-01-11T21:38:06.9728900Z import random 2023-01-11T21:38:06.9729053Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9729180Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9729205Z 2023-01-11T21:38:06.9729291Z aten = torch.ops.aten 2023-01-11T21:38:06.9729429Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9729527Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9729532Z 2023-01-11T21:38:06.9729601Z import triton 2023-01-11T21:38:06.9729696Z import triton.language as tl 2023-01-11T21:38:06.9729824Z from 
torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9729967Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9729972Z 2023-01-11T21:38:06.9729976Z 2023-01-11T21:38:06.9730162Z triton_fused_convert_element_type_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.9730244Z import triton 2023-01-11T21:38:06.9730337Z import triton.language as tl 2023-01-11T21:38:06.9730453Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9730553Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9730690Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9730819Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9730825Z 2023-01-11T21:38:06.9731211Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.9731285Z @triton.jit 2023-01-11T21:38:06.9731409Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9731485Z xnumel = 4 2023-01-11T21:38:06.9731586Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9731711Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9731795Z xmask = xindex < xnumel 2023-01-11T21:38:06.9731867Z x0 = xindex 2023-01-11T21:38:06.9731938Z tmp0 = 1 2023-01-11T21:38:06.9732011Z tmp1 = 30 2023-01-11T21:38:06.9732095Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.9732177Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.9732260Z tmp4 = tmp3 + tmp0 2023-01-11T21:38:06.9732349Z tmp5 = tmp4.to(tl.float64) 2023-01-11T21:38:06.9732485Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.9732571Z ''') 2023-01-11T21:38:06.9732576Z 2023-01-11T21:38:06.9733676Z 2023-01-11T21:38:06.9733785Z async_compile.wait(globals()) 2023-01-11T21:38:06.9733865Z del async_compile 2023-01-11T21:38:06.9733870Z 2023-01-11T21:38:06.9733954Z def call(args): 2023-01-11T21:38:06.9734047Z with torch.cuda.device(0): 2023-01-11T21:38:06.9734256Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9734344Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9734624Z triton_fused_convert_element_type_1_0.run(buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.9734708Z return (buf0, ) 2023-01-11T21:38:06.9734713Z 2023-01-11T21:38:06.9734717Z 2023-01-11T21:38:06.9734801Z if __name__ == "__main__": 2023-01-11T21:38:06.9734935Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9735076Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9735187Z print_performance(lambda: call([])) 2023-01-11T21:38:06.9735192Z 2023-01-11T21:38:06.9735263Z ok (0.933s) 2023-01-11T21:38:06.9735700Z test_optimize_indexing_dtype (__main__.TritonCodeGenTests) ... 
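The trace that follows calls the aten op directly rather than going through torch.nn.functional: aten.upsample_bilinear2d.vec(x, None, True, [2.0, 2.0]) on a (2, 4, 16, 16) input. Inductor lowers it to the single fused gather-and-lerp kernel dumped after the trace, where the constant 0.4838709677419355 is the align_corners=True scale (16 - 1) / (32 - 1) = 15/31. The equivalent eager call, reconstructed from the bytecode trace below (assumes a CUDA device, matching the rand_strided repro in the dump):

import torch

x = torch.randn(2, 4, 16, 16, device="cuda")
# positional args per the trace: (input, output_size=None, align_corners=True, scale_factors=[2.0, 2.0])
out = torch.ops.aten.upsample_bilinear2d.vec(x, None, True, [2.0, 2.0])
print(out.shape)  # torch.Size([2, 4, 32, 32])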
[2023-01-11 21:38:03,047] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo start tracing fn 2023-01-11T21:38:06.9736058Z [2023-01-11 21:38:03,047] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6063 2023-01-11T21:38:06.9736285Z [2023-01-11 21:38:03,047] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL aten [] 2023-01-11T21:38:06.9736616Z [2023-01-11 21:38:03,053] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR upsample_bilinear2d [TorchVariable()] 2023-01-11T21:38:06.9736909Z [2023-01-11 21:38:03,053] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR vec [TorchVariable(aten.upsample_bilinear2d)] 2023-01-11T21:38:06.9737359Z [2023-01-11 21:38:03,053] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [TorchVariable(aten.upsample_bilinear2d.vec)] 2023-01-11T21:38:06.9737677Z [2023-01-11 21:38:03,053] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST None [TorchVariable(aten.upsample_bilinear2d.vec), TensorVariable()] 2023-01-11T21:38:06.9738038Z [2023-01-11 21:38:03,053] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST True [TorchVariable(aten.upsample_bilinear2d.vec), TensorVariable(), ConstantVariable(NoneType)] 2023-01-11T21:38:06.9738431Z [2023-01-11 21:38:03,054] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 2.0 [TorchVariable(aten.upsample_bilinear2d.vec), TensorVariable(), ConstantVariable(NoneType), ConstantVariable(bool)] 2023-01-11T21:38:06.9738862Z [2023-01-11 21:38:03,054] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 2.0 [TorchVariable(aten.upsample_bilinear2d.vec), TensorVariable(), ConstantVariable(NoneType), ConstantVariable(bool), ConstantVariable(float)] 2023-01-11T21:38:06.9739333Z [2023-01-11 21:38:03,054] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 2 [TorchVariable(aten.upsample_bilinear2d.vec), TensorVariable(), ConstantVariable(NoneType), ConstantVariable(bool), ConstantVariable(float), ConstantVariable(float)] 2023-01-11T21:38:06.9739753Z [2023-01-11 21:38:03,054] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 4 [TorchVariable(aten.upsample_bilinear2d.vec), TensorVariable(), ConstantVariable(NoneType), ConstantVariable(bool), ListVariable()] 2023-01-11T21:38:06.9740003Z [2023-01-11 21:38:03,137] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9740261Z [2023-01-11 21:38:03,137] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo done tracing fn (RETURN_VALUE) 2023-01-11T21:38:06.9740494Z [2023-01-11 21:38:03,137] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile 2023-01-11T21:38:06.9740719Z [2023-01-11 21:38:03,137] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to None 2023-01-11T21:38:06.9740959Z [2023-01-11 21:38:03,138] torch._dynamo.output_graph: [INFO] Step 3: calling compiler function compile_fx 2023-01-11T21:38:06.9741275Z [2023-01-11 21:38:03,319] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1158 2023-01-11T21:38:06.9741517Z [2023-01-11 21:38:03,412] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.9741750Z [2023-01-11 21:38:03,426] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.9741964Z [2023-01-11 21:38:03,434] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.9742231Z [2023-01-11 21:38:03,622] torch._inductor.compile_fx: [INFO] Step 1: 
2023-01-11T21:38:06.9741275Z [2023-01-11 21:38:03,319] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1158
2023-01-11T21:38:06.9741517Z [2023-01-11 21:38:03,412] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name
2023-01-11T21:38:06.9741750Z [2023-01-11 21:38:03,426] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name
2023-01-11T21:38:06.9741964Z [2023-01-11 21:38:03,434] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1')
2023-01-11T21:38:06.9742231Z [2023-01-11 21:38:03,622] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1158
2023-01-11T21:38:06.9742647Z [2023-01-11 21:38:03,622] torch._inductor.debug: [WARNING] model__1158_inference_1204 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_7/run_2023_01_11_21_38_03_046733/aot_torchinductor/model__1158_inference_1204.15
2023-01-11T21:38:06.9742897Z [2023-01-11 21:38:03,623] torch._dynamo.output_graph: [INFO] Step 3: done compiler function compile_fx
2023-01-11T21:38:06.9743272Z [2023-01-11 21:38:03,779] torch._inductor.debug: [WARNING] model___1205 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_7/run_2023_01_11_21_38_03_046733/aot_torchinductor/model___1204.14
2023-01-11T21:38:06.9743278Z 
2023-01-11T21:38:06.9743380Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.9743449Z import torch
2023-01-11T21:38:06.9743528Z import random
2023-01-11T21:38:06.9743652Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.9743780Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.9743785Z 
2023-01-11T21:38:06.9743868Z aten = torch.ops.aten
2023-01-11T21:38:06.9744009Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.9744138Z async_compile = AsyncCompile()
2023-01-11T21:38:06.9744143Z 
2023-01-11T21:38:06.9744223Z import triton
2023-01-11T21:38:06.9744312Z import triton.language as tl
2023-01-11T21:38:06.9744443Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.9744584Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.9744590Z 
2023-01-11T21:38:06.9744594Z 
2023-01-11T21:38:06.9744896Z triton_fused_add_add_1_add_2_arange_arange_1_convert_element_type_convert_element_type_2_floor_floor_1_index_0 = async_compile.triton('''
2023-01-11T21:38:06.9744975Z import triton
2023-01-11T21:38:06.9745071Z import triton.language as tl
2023-01-11T21:38:06.9745187Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.9745296Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.9745425Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.9745552Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.9745560Z 
2023-01-11T21:38:06.9745985Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
2023-01-11T21:38:06.9746063Z @triton.jit
2023-01-11T21:38:06.9746202Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.9746281Z     xnumel = 8192
2023-01-11T21:38:06.9746381Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.9746515Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.9746594Z     xmask = xindex < xnumel
2023-01-11T21:38:06.9746680Z     x1 = (xindex // 32) % 32
2023-01-11T21:38:06.9746758Z     x0 = xindex % 32
2023-01-11T21:38:06.9746837Z     x2 = (xindex // 1024)
2023-01-11T21:38:06.9746908Z     x4 = xindex
2023-01-11T21:38:06.9746983Z     tmp0 = x1
2023-01-11T21:38:06.9747062Z     tmp1 = 0.4838709677419355
2023-01-11T21:38:06.9747143Z     tmp2 = tmp0 * tmp1
2023-01-11T21:38:06.9747244Z     tmp3 = tl.libdevice.floor(tmp2)
2023-01-11T21:38:06.9747333Z     tmp4 = tmp3.to(tl.int32)
2023-01-11T21:38:06.9747408Z     tmp5 = x0
2023-01-11T21:38:06.9747487Z     tmp6 = tmp5 * tmp1
2023-01-11T21:38:06.9747619Z     tmp7 = tl.libdevice.floor(tmp6)
2023-01-11T21:38:06.9747701Z     tmp8 = tmp7.to(tl.int32)
2023-01-11T21:38:06.9747930Z     tmp9 = tl.load(in_ptr0 + (tmp8 + (16*tmp4) + (256*x2)), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.9748008Z     tmp10 = 1.0
2023-01-11T21:38:06.9748099Z     tmp11 = tmp4.to(tl.float32)
2023-01-11T21:38:06.9748214Z     tmp12 = tmp2 - tmp11
2023-01-11T21:38:06.9748330Z     tmp13 = tmp10 - tmp12
2023-01-11T21:38:06.9748415Z     tmp14 = tmp9 * tmp13
2023-01-11T21:38:06.9748507Z     tmp15 = tl.libdevice.ceil(tmp2)
2023-01-11T21:38:06.9748583Z     tmp16 = 15.0
2023-01-11T21:38:06.9748730Z     tmp17 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 < tmp16, tmp15, tmp16))
2023-01-11T21:38:06.9748824Z     tmp18 = tmp17.to(tl.int32)
2023-01-11T21:38:06.9749051Z     tmp19 = tl.load(in_ptr0 + (tmp8 + (16*tmp18) + (256*x2)), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.9749134Z     tmp20 = tmp19 * tmp12
2023-01-11T21:38:06.9749221Z     tmp21 = tmp14 + tmp20
2023-01-11T21:38:06.9749305Z     tmp22 = tmp8.to(tl.float32)
2023-01-11T21:38:06.9749417Z     tmp23 = tmp6 - tmp22
2023-01-11T21:38:06.9749534Z     tmp24 = tmp10 - tmp23
2023-01-11T21:38:06.9749615Z     tmp25 = tmp21 * tmp24
2023-01-11T21:38:06.9749713Z     tmp26 = tl.libdevice.ceil(tmp6)
2023-01-11T21:38:06.9749860Z     tmp27 = tl.where(tmp26 != tmp26, tmp26, tl.where(tmp26 < tmp16, tmp26, tmp16))
2023-01-11T21:38:06.9749949Z     tmp28 = tmp27.to(tl.int32)
2023-01-11T21:38:06.9750068Z     tmp29 = tl.load(in_ptr0 + (tmp28 + (16*tmp4) + (256*x2)), xmask)
2023-01-11T21:38:06.9750151Z     tmp30 = tmp29 * tmp13
2023-01-11T21:38:06.9750277Z     tmp31 = tl.load(in_ptr0 + (tmp28 + (16*tmp18) + (256*x2)), xmask)
2023-01-11T21:38:06.9750393Z     tmp32 = tmp31 * tmp12
2023-01-11T21:38:06.9750476Z     tmp33 = tmp30 + tmp32
2023-01-11T21:38:06.9750559Z     tmp34 = tmp33 * tmp23
2023-01-11T21:38:06.9750641Z     tmp35 = tmp25 + tmp34
2023-01-11T21:38:06.9750778Z     tl.store(in_out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp35, xmask)
2023-01-11T21:38:06.9750867Z ''')
2023-01-11T21:38:06.9750873Z 
2023-01-11T21:38:06.9750877Z 
2023-01-11T21:38:06.9750975Z async_compile.wait(globals())
2023-01-11T21:38:06.9751054Z del async_compile
2023-01-11T21:38:06.9751060Z 
2023-01-11T21:38:06.9751135Z def call(args):
2023-01-11T21:38:06.9751211Z     arg0_1, = args
2023-01-11T21:38:06.9751290Z     args.clear()
2023-01-11T21:38:06.9751385Z     with torch.cuda.device(0):
2023-01-11T21:38:06.9751606Z         buf0 = empty_strided((2, 4, 32, 32), (4096, 1024, 32, 1), device='cuda', dtype=torch.float32)
2023-01-11T21:38:06.9751699Z         buf2 = buf0; del buf0  # reuse
2023-01-11T21:38:06.9751794Z         stream0 = get_cuda_stream(0)
2023-01-11T21:38:06.9752026Z         triton_fused_add_add_1_add_2_arange_arange_1_convert_element_type_convert_element_type_2_floor_floor_1_index_0.run(buf2, arg0_1, 8192, grid=grid(8192), stream=stream0)
2023-01-11T21:38:06.9752102Z         del arg0_1
2023-01-11T21:38:06.9752188Z         return (buf2, )
2023-01-11T21:38:06.9752193Z 
2023-01-11T21:38:06.9752197Z 
2023-01-11T21:38:06.9752281Z if __name__ == "__main__":
2023-01-11T21:38:06.9752403Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.9752527Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.9752756Z     arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cuda:0', dtype=torch.float32)
2023-01-11T21:38:06.9752872Z     print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.9752877Z 
2023-01-11T21:38:06.9752954Z ok (0.736s)
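The kernel above is Inductor's fused implementation of that upsample call: one pass over the 2x4x32x32 output (xnumel = 8192) that, for each output pixel, gathers the four neighboring input pixels (floor and ceil of the scaled row/column index, clamped to 15.0, the last valid index of the 16x16 input) and blends them with the fractional weights tmp12/tmp13 and tmp23/tmp24. The constant 0.4838709677419355 is the align_corners=True scale (in - 1)/(out - 1) = 15/31. A hedged eager-mode equivalent of what call() computes:

    # Hedged sketch: eager equivalent of the fused bilinear-upsample kernel.
    import torch
    import torch.nn.functional as F

    x = torch.randn(2, 4, 16, 16, device="cuda")
    # align_corners=True: output index i maps to i * 15/31 ~= i * 0.48387
    y = F.interpolate(x, scale_factor=2.0, mode="bilinear", align_corners=True)
    print(y.shape)  # torch.Size([2, 4, 32, 32]), matching buf0 in call()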
2023-01-11T21:38:06.9752959Z 
2023-01-11T21:38:06.9753157Z ----------------------------------------------------------------------
2023-01-11T21:38:06.9753249Z Ran 731 tests in 867.685s
2023-01-11T21:38:06.9753253Z 
2023-01-11T21:38:06.9753333Z OK (skipped=33)
2023-01-11T21:38:06.9753339Z 
2023-01-11T21:38:06.9753431Z Generating XML reports...
2023-01-11T21:38:06.9753759Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-CPUReproTests-20230111212336.xml
2023-01-11T21:38:06.9754050Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-CpuTests-20230111212336.xml
2023-01-11T21:38:06.9754352Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-CudaReproTests-20230111212336.xml
2023-01-11T21:38:06.9754637Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-CudaTests-20230111212336.xml
2023-01-11T21:38:06.9754944Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-ExprPrinterTests-20230111212336.xml
2023-01-11T21:38:06.9755252Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-SweepInputsCpuTest-20230111212336.xml
2023-01-11T21:38:06.9755620Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-SweepInputsCudaTest-20230111212336.xml
2023-01-11T21:38:06.9755964Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-TestIndexingSimplification-20230111212336.xml
2023-01-11T21:38:06.9756276Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-TritonCodeGenTests-20230111212336.xml
2023-01-11T21:38:06.9756282Z 
2023-01-11T21:38:06.9756619Z ##[endgroup]
2023-01-11T21:38:06.9756945Z FINISHED PRINTING LOG FILE of inductor/test_torchinductor (/var/lib/jenkins/workspace/test/test-reports/inductor-test_torchinductor_y59pervs)
2023-01-11T21:38:06.9756951Z 
2023-01-11T21:38:06.9757144Z Running test_fake_tensor ... [2023-01-11 21:38:06.158498]
2023-01-11T21:38:06.9757479Z Executing ['/opt/conda/bin/python', '-bb', 'test_fake_tensor.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:38:06.158901]
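test_fake_tensor.py, whose log follows, exercises fake tensors: tensors that carry shape, dtype, device, and stride metadata but own no storage, which is what lets Dynamo and Inductor reason about CUDA programs without launching kernels. The TypedStorage deprecation warnings sprinkled through the group come from the tests calling tensor.storage(); the replacement the warning itself recommends is tensor.untyped_storage(). A minimal hedged sketch of the mode these tests drive, using the import path of this era and assuming a CUDA build:

    # Hedged sketch: fake tensors propagate metadata without allocating memory.
    import torch
    from torch._subclasses.fake_tensor import FakeTensorMode

    with FakeTensorMode():
        x = torch.empty(8, 8, device="cuda")  # no CUDA allocation happens
        y = x @ x                             # only shape/dtype/device inference
        print(type(y), y.shape, y.device)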
2023-01-11T21:38:16.3220740Z 
2023-01-11T21:38:16.3221454Z Expand the folded group to see the log file of test_fake_tensor
2023-01-11T21:38:16.3222528Z ##[group]PRINTING LOG FILE of test_fake_tensor (/var/lib/jenkins/workspace/test/test-reports/test_fake_tensor_jebacq4z)
2023-01-11T21:38:16.3222935Z 
2023-01-11T21:38:16.3223080Z Running tests...
2023-01-11T21:38:16.3223656Z ----------------------------------------------------------------------
2023-01-11T21:38:16.3224185Z Test results will be stored in test-reports/python-unittest/test_fake_tensor
2023-01-11T21:38:16.3224645Z test_aliased_const_write (__main__.FakeTensorConstHandling) ... ok (1.098s)
2023-01-11T21:38:16.3225219Z test_constant_invalidation (__main__.FakeTensorConstHandling) ... ok (0.004s)
2023-01-11T21:38:16.3225852Z test_fake_tensor_batch_norm_cpu (__main__.FakeTensorConstHandling) ... ok (0.083s)
2023-01-11T21:38:16.3226406Z test_fake_tensor_in_intlist_repro (__main__.FakeTensorConstHandling) ... ok (0.008s)
2023-01-11T21:38:16.3226924Z test_inplace_add (__main__.FakeTensorConstHandling) ... ok (0.001s)
2023-01-11T21:38:16.3227428Z test_inplace_view_invalidation (__main__.FakeTensorConstHandling) ... ok (0.001s)
2023-01-11T21:38:16.3227854Z test_shared_storage_invalidation (__main__.FakeTensorConstHandling) ... ok (0.004s)
2023-01-11T21:38:16.3228600Z test_shared_storages (__main__.FakeTensorConstHandling) ... /var/lib/jenkins/workspace/test/test_fake_tensor.py:513: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:16.3229211Z   self.assertEqual(x.storage()._cdata, y.storage()._cdata)
2023-01-11T21:38:16.3229782Z /var/lib/jenkins/workspace/test/test_fake_tensor.py:514: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:16.3230370Z   self.assertEqual(x.constant.storage()._cdata, y.constant.storage()._cdata)
2023-01-11T21:38:16.3230844Z ok (0.001s)
2023-01-11T21:38:16.3231090Z test_simple (__main__.FakeTensorConstHandling) ... ok (0.001s)
2023-01-11T21:38:16.3231406Z test_dead_key (__main__.FakeTensorConverterTest) ... ok (0.001s)
2023-01-11T21:38:16.3231728Z test_dead_weak_ref (__main__.FakeTensorConverterTest) ... ok (0.001s)
2023-01-11T21:38:16.3232070Z test_memoized_conversion_from_meta (__main__.FakeTensorConverterTest) ... ok (0.001s)
2023-01-11T21:38:16.3232418Z test_memoized_conversion_to_meta (__main__.FakeTensorConverterTest) ... ok (0.001s)
2023-01-11T21:38:16.3232755Z test_no_active_mode (__main__.FakeTensorConverterTest) ... ok (0.007s)
2023-01-11T21:38:16.3233080Z test_no_ref_cycle (__main__.FakeTensorConverterTest) ... ok (0.001s)
2023-01-11T21:38:16.3233404Z test_separate_mode_error (__main__.FakeTensorConverterTest) ... ok (0.007s)
2023-01-11T21:38:16.3234068Z test_separate_tensor_storages_non_view (__main__.FakeTensorConverterTest) ... /var/lib/jenkins/workspace/test/test_fake_tensor.py:602: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:16.3234639Z   y.set_(x.storage())
2023-01-11T21:38:16.3234832Z ok (0.001s)
2023-01-11T21:38:16.3235102Z test_separate_tensor_storages_view (__main__.FakeTensorConverterTest) ... ok (0.001s)
2023-01-11T21:38:16.3235445Z test_like_ops (__main__.FakeTensorOperatorInvariants) ... ok (0.012s)
2023-01-11T21:38:16.3235793Z test_non_kwarg_only_device (__main__.FakeTensorOperatorInvariants) ... ok (0.057s)
2023-01-11T21:38:16.3236226Z test_sparse_new (__main__.FakeTensorOperatorInvariants) ... expected failure (0.002s)
2023-01-11T21:38:16.3236607Z test_tensor_constructors_all_have_kwarg_device (__main__.FakeTensorOperatorInvariants) ... ok (0.109s)
2023-01-11T21:38:16.3237456Z test_fake_tensor_prop_on_nn_module (__main__.FakeTensorPropTest) ... /opt/conda/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py:564: UserWarning: Was not able to add assertion to guarantee correct input value to specialized function. It is up to the user to make sure that your inputs match the inputs you specialized the function with.
2023-01-11T21:38:16.3238002Z   warnings.warn(
2023-01-11T21:38:16.3238198Z ok (0.023s)
2023-01-11T21:38:16.3238424Z test_basic (__main__.FakeTensorTest) ... ok (0.002s)
2023-01-11T21:38:16.3238721Z test_binary_op_type_promotion (__main__.FakeTensorTest) ... ok (0.008s)
2023-01-11T21:38:16.3239024Z test_constructor (__main__.FakeTensorTest) ... ok (0.004s)
2023-01-11T21:38:16.3239308Z test_cpu_fallback (__main__.FakeTensorTest) ... ok (0.145s)
2023-01-11T21:38:16.3239591Z test_cuda_lstm (__main__.FakeTensorTest) ... ok (0.115s)
2023-01-11T21:38:16.3239886Z test_cudnn_rnn_with_fallback (__main__.FakeTensorTest) ... ok (3.489s)
2023-01-11T21:38:16.3240192Z test_cudnn_rnn_without_fallback (__main__.FakeTensorTest) ... ok (2.612s)
2023-01-11T21:38:16.3240511Z test_data_dependent_operator (__main__.FakeTensorTest) ... ok (0.004s)
2023-01-11T21:38:16.3241134Z test_deepcopy (__main__.FakeTensorTest) ... /var/lib/jenkins/workspace/test/test_fake_tensor.py:466: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:16.3241753Z   self.assertEqual(mod_copied.b.storage()._cdata, mod_copied.a.storage()._cdata)
2023-01-11T21:38:16.3242016Z ok (0.005s)
2023-01-11T21:38:16.3242259Z test_fake_dispatch_keys (__main__.FakeTensorTest) ... ok (0.007s)
2023-01-11T21:38:16.3242559Z test_fake_grad_copy (__main__.FakeTensorTest) ... ok (0.001s)
2023-01-11T21:38:16.3242849Z test_fake_mode_error (__main__.FakeTensorTest) ... ok (0.001s)
2023-01-11T21:38:16.3243181Z test_fallback_memory_prop (__main__.FakeTensorTest) ... ok (0.030s)
2023-01-11T21:38:16.3243484Z test_from_numpy (__main__.FakeTensorTest) ... ok (0.010s)
2023-01-11T21:38:16.3243778Z test_index_cuda_with_cpu (__main__.FakeTensorTest) ... ok (0.011s)
2023-01-11T21:38:16.3244069Z test_like_constructor (__main__.FakeTensorTest) ... ok (0.010s)
2023-01-11T21:38:16.3244351Z test_mode (__main__.FakeTensorTest) ... ok (0.004s)
2023-01-11T21:38:16.3244629Z test_nan_to_num (__main__.FakeTensorTest) ... ok (0.013s)
2023-01-11T21:38:16.3244896Z test_new (__main__.FakeTensorTest) ... ok (0.017s)
2023-01-11T21:38:16.3245168Z test_non_kwarg_device (__main__.FakeTensorTest) ... ok (0.021s)
2023-01-11T21:38:16.3245483Z test_non_overlapping_stride_zero (__main__.FakeTensorTest) ... ok (0.013s)
2023-01-11T21:38:16.3245793Z test_non_parameter_grad (__main__.FakeTensorTest) ... ok (0.001s)
2023-01-11T21:38:16.3246082Z test_normalize_device (__main__.FakeTensorTest) ... ok (0.007s)
2023-01-11T21:38:16.3246389Z test_parameter_instantiation (__main__.FakeTensorTest) ... ok (0.004s)
2023-01-11T21:38:16.3246698Z test_print_in_fake_mode (__main__.FakeTensorTest) ... ok (0.001s)
2023-01-11T21:38:16.3246982Z test_randperm (__main__.FakeTensorTest) ... ok (0.007s)
2023-01-11T21:38:16.3247280Z test_recursive_invocation (__main__.FakeTensorTest) ... ok (0.001s)
2023-01-11T21:38:16.3247580Z test_scalar_inputs (__main__.FakeTensorTest) ... ok (0.004s)
2023-01-11T21:38:16.3247865Z test_setitem (__main__.FakeTensorTest) ... ok (0.008s)
2023-01-11T21:38:16.3248172Z test_shape_take_not_device (__main__.FakeTensorTest) ... ok (0.011s)
2023-01-11T21:38:16.3248462Z test_throw (__main__.FakeTensorTest) ... ok (0.008s)
2023-01-11T21:38:16.3248774Z test_type_as (__main__.FakeTensorTest) ... ok (0.024s)
2023-01-11T21:38:16.3249068Z test_upsample_bilinear_small_channels (__main__.FakeTensorTest) ... ok (0.081s)
2023-01-11T21:38:16.3249370Z test_zero_dim (__main__.FakeTensorTest) ... ok (0.004s)
2023-01-11T21:38:16.3249529Z 
2023-01-11T21:38:16.3249742Z ----------------------------------------------------------------------
2023-01-11T21:38:16.3249996Z Ran 57 tests in 8.109s
2023-01-11T21:38:16.3250119Z 
2023-01-11T21:38:16.3250212Z OK (expected failures=1)
2023-01-11T21:38:16.3250341Z 
2023-01-11T21:38:16.3250436Z Generating XML reports...
2023-01-11T21:38:16.3250893Z Generated XML report: test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorConstHandling-20230111213807.xml
2023-01-11T21:38:16.3251447Z Generated XML report: test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorConverterTest-20230111213807.xml
2023-01-11T21:38:16.3252028Z Generated XML report: test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorOperatorInvariants-20230111213807.xml
2023-01-11T21:38:16.3252589Z Generated XML report: test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorPropTest-20230111213807.xml
2023-01-11T21:38:16.3253108Z Generated XML report: test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorTest-20230111213807.xml
2023-01-11T21:38:16.3253334Z 
2023-01-11T21:38:16.3253582Z ##[endgroup]
2023-01-11T21:38:16.3253973Z FINISHED PRINTING LOG FILE of test_fake_tensor (/var/lib/jenkins/workspace/test/test-reports/test_fake_tensor_jebacq4z)
2023-01-11T21:38:16.3254199Z 
2023-01-11T21:38:16.3254372Z Running test_sparse_csr ... [2023-01-11 21:38:16.321931]
2023-01-11T21:38:16.3255056Z Executing ['/opt/conda/bin/python', '-bb', 'test_sparse_csr.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:38:16.322120]
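test_sparse_csr.py, whose log follows, sweeps sparse compressed-layout operations (add, addmm, addmv, their blocked variants, and autograd through unary ops) across dtypes and index types on CUDA; the UserWarning at the top of the group notes that CSR support was still in beta at this point. A minimal hedged sketch of the CSR layout and the sparse-times-dense addmm pattern the bulk of these tests exercise:

    # Hedged sketch: a 2x3 CSR matrix and the addmm the tests sweep.
    import torch

    crow_indices = torch.tensor([0, 2, 3])  # row i's values: [crow[i], crow[i+1])
    col_indices = torch.tensor([0, 2, 1])   # column of each stored value
    values = torch.tensor([1.0, 2.0, 3.0])
    a = torch.sparse_csr_tensor(crow_indices, col_indices, values,
                                size=(2, 3), device="cuda")

    b = torch.randn(3, 4, device="cuda")
    c = torch.randn(2, 4, device="cuda")
    out = torch.addmm(c, a, b)              # c + a @ b with a in CSR layout
    print(out.shape)                        # torch.Size([2, 4])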
2023-01-11T21:45:10.4308675Z 
2023-01-11T21:45:10.4309303Z Expand the folded group to see the log file of test_sparse_csr
2023-01-11T21:45:10.4310042Z ##[group]PRINTING LOG FILE of test_sparse_csr (/var/lib/jenkins/workspace/test/test-reports/test_sparse_csr_txl3rn3o)
2023-01-11T21:45:10.4312454Z 
2023-01-11T21:45:10.4314394Z Running tests...
2023-01-11T21:45:10.4315020Z ----------------------------------------------------------------------
2023-01-11T21:45:10.4315826Z Test results will be stored in test-reports/python-unittest/test_sparse_csr
2023-01-11T21:45:10.4317050Z test_add_cuda_float32 (__main__.TestSparseCSRCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:2429: UserWarning: Sparse CSR tensor support is in beta state. If you miss a functionality in the sparse tensor support, please submit a feature request to https://github.com/pytorch/pytorch/issues. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/SparseCsrTensorImpl.cpp:56.)
2023-01-11T21:45:10.4318028Z   return torch.sparse_compressed_tensor(compressed_indices, plain_indices,
2023-01-11T21:45:10.4318424Z ok (0.126s)
2023-01-11T21:45:10.4318738Z test_add_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.104s)
2023-01-11T21:45:10.4327220Z test_addmm_all_sparse_csr_SparseCSC_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.219s)
2023-01-11T21:45:10.4328127Z test_addmm_all_sparse_csr_SparseCSC_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.052s)
2023-01-11T21:45:10.4329518Z test_addmm_all_sparse_csr_SparseCSC_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.047s)
2023-01-11T21:45:10.4330057Z test_addmm_all_sparse_csr_SparseCSC_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.047s)
2023-01-11T21:45:10.4330522Z test_addmm_all_sparse_csr_SparseCSC_cuda_float32 (__main__.TestSparseCSRCUDA) ...
ok (0.046s) 2023-01-11T21:45:10.4330987Z test_addmm_all_sparse_csr_SparseCSC_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.046s) 2023-01-11T21:45:10.4331492Z test_addmm_all_sparse_csr_SparseCSR_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.040s) 2023-01-11T21:45:10.4332002Z test_addmm_all_sparse_csr_SparseCSR_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.041s) 2023-01-11T21:45:10.4332690Z test_addmm_all_sparse_csr_SparseCSR_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.040s) 2023-01-11T21:45:10.4333162Z test_addmm_all_sparse_csr_SparseCSR_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.041s) 2023-01-11T21:45:10.4333565Z test_addmm_all_sparse_csr_SparseCSR_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.039s) 2023-01-11T21:45:10.4333913Z test_addmm_all_sparse_csr_SparseCSR_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.039s) 2023-01-11T21:45:10.4334379Z test_addmm_dense_result_SparseCSC_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:45:10.4335177Z test_addmm_dense_result_SparseCSC_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:45:10.4335681Z test_addmm_dense_result_SparseCSC_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:45:10.4336098Z test_addmm_dense_result_SparseCSC_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:45:10.4336623Z test_addmm_dense_result_SparseCSR_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:45:10.4337128Z test_addmm_dense_result_SparseCSR_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:45:10.4337524Z test_addmm_dense_result_SparseCSR_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:45:10.4337977Z test_addmm_dense_result_SparseCSR_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:45:10.4338437Z test_addmm_errors_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.032s) 2023-01-11T21:45:10.4338971Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4339440Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4339816Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4340178Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4340623Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4341035Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4341462Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4341924Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4342364Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4342828Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... 
ok (0.014s) 2023-01-11T21:45:10.4343296Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4343745Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4344198Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4344649Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4345116Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4345594Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4346055Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4346605Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4347055Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4347519Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4347989Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4348458Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4348945Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4349399Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4349853Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4350328Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4350793Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4351160Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4351511Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4351864Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4352249Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4352753Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4353253Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4353785Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4354327Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4354797Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... 
ok (0.015s) 2023-01-11T21:45:10.4355265Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4355738Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4356401Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4356878Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4357347Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4357802Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4358261Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4358738Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4359214Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4359680Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4360128Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4361346Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4361882Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4362376Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4362868Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4363355Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4363819Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4364284Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4378636Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4379131Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4379512Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4379901Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4380304Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4380773Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4381320Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4381757Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... 
ok (0.014s) 2023-01-11T21:45:10.4382179Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4382719Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4383261Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4383746Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4384109Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4384572Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4385099Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4385620Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4386108Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4386627Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4387130Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4387639Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4388132Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4388660Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4389149Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4389630Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4390206Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4390716Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4391218Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4391763Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4392187Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4392730Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4393304Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4393871Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4394337Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4394797Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... 
ok (0.015s) 2023-01-11T21:45:10.4395253Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4395700Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4396159Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4396625Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4397092Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4397549Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4398007Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4398529Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4398988Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4399440Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4399912Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4400365Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4400814Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4401262Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4401727Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4402177Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4402635Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4403088Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4403549Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4403999Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4404482Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4404933Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4405397Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4405851Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4406284Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4406738Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... 
ok (0.014s) 2023-01-11T21:45:10.4407188Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4407649Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4408108Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4408567Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4409015Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4409468Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4409924Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4410388Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4410847Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4411303Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4411750Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4412245Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4412703Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4413171Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4413678Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4414140Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4414884Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4415341Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4415788Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4416259Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4416731Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4417183Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4417646Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4418089Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4418654Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4419197Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... 
ok (0.015s) 2023-01-11T21:45:10.4419682Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4420139Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4420591Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4421022Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4421478Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4421937Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4422405Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4422844Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4423303Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4423761Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4424198Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4424673Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4425143Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4425592Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4426042Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4426487Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4427002Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4427461Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4427915Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4428370Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4428828Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4429268Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4429712Z test_autograd_dense_output_addmm_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.711s) 2023-01-11T21:45:10.4430162Z test_autograd_dense_output_addmv_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.043s) 2023-01-11T21:45:10.4430609Z test_autograd_dense_output_mm_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.007s) 2023-01-11T21:45:10.4431046Z test_autograd_dense_output_mv_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.007s) 2023-01-11T21:45:10.4431490Z test_autograd_sparse_csr_unary_abs_cuda_complex128 (__main__.TestSparseCSRCUDA) ... 
ok (0.202s) 2023-01-11T21:45:10.4431952Z test_autograd_sparse_csr_unary_abs_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4432517Z test_autograd_sparse_csr_unary_angle_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op angle not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4433084Z test_autograd_sparse_csr_unary_angle_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op angle not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4433567Z test_autograd_sparse_csr_unary_asin_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op asin not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4434057Z test_autograd_sparse_csr_unary_asin_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op asin not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4434542Z test_autograd_sparse_csr_unary_asinh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op asinh not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4435021Z test_autograd_sparse_csr_unary_asinh_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op asinh not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4435495Z test_autograd_sparse_csr_unary_atan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op atan not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4435985Z test_autograd_sparse_csr_unary_atan_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op atan not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4436465Z test_autograd_sparse_csr_unary_atanh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op atanh not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4436954Z test_autograd_sparse_csr_unary_atanh_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op atanh not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4437430Z test_autograd_sparse_csr_unary_ceil_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op ceil not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4437853Z test_autograd_sparse_csr_unary_conj_physical_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4438234Z test_autograd_sparse_csr_unary_conj_physical_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4438607Z test_autograd_sparse_csr_unary_deg2rad_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4439054Z test_autograd_sparse_csr_unary_erf_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op erf not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4439537Z test_autograd_sparse_csr_unary_erfinv_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op erfinv not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4440023Z test_autograd_sparse_csr_unary_expm1_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op expm1 not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4440494Z test_autograd_sparse_csr_unary_floor_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op floor not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4440909Z test_autograd_sparse_csr_unary_frac_cuda_float64 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4441326Z test_autograd_sparse_csr_unary_isinf_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op isinf not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4441807Z test_autograd_sparse_csr_unary_isinf_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op isinf not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4442290Z test_autograd_sparse_csr_unary_isnan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op isnan not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4442767Z test_autograd_sparse_csr_unary_isnan_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op isnan not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4443252Z test_autograd_sparse_csr_unary_isneginf_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op isneginf not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4443780Z test_autograd_sparse_csr_unary_isposinf_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op isposinf not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4444796Z test_autograd_sparse_csr_unary_log1p_cuda_complex128 (__main__.TestSparseCSRCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: log1p_backward: received self with sparse layout, but backward requires materialization of a dense tensor with this shape (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/FunctionsManual.cpp:4679.) 2023-01-11T21:45:10.4445476Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2023-01-11T21:45:10.4445744Z ok (0.003s) 2023-01-11T21:45:10.4446016Z test_autograd_sparse_csr_unary_log1p_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4446381Z test_autograd_sparse_csr_unary_neg_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.156s) 2023-01-11T21:45:10.4446739Z test_autograd_sparse_csr_unary_neg_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4447117Z test_autograd_sparse_csr_unary_nn_functional_relu_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4447488Z test_autograd_sparse_csr_unary_positive_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4447857Z test_autograd_sparse_csr_unary_positive_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4448222Z test_autograd_sparse_csr_unary_rad2deg_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4448647Z test_autograd_sparse_csr_unary_round_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op round not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4449130Z test_autograd_sparse_csr_unary_sgn_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sgn not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4449642Z test_autograd_sparse_csr_unary_sgn_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sgn not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4450126Z test_autograd_sparse_csr_unary_sign_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sign not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4450614Z test_autograd_sparse_csr_unary_signbit_cuda_float64 (__main__.TestSparseCSRCUDA) ... 
skip: Skipped! Unary op signbit not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4451095Z test_autograd_sparse_csr_unary_sin_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sin not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4451570Z test_autograd_sparse_csr_unary_sin_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sin not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4452057Z test_autograd_sparse_csr_unary_sinh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sinh not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4452544Z test_autograd_sparse_csr_unary_sinh_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sinh not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4453025Z test_autograd_sparse_csr_unary_sqrt_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sqrt not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4453498Z test_autograd_sparse_csr_unary_sqrt_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sqrt not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4453973Z test_autograd_sparse_csr_unary_tan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op tan not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4454675Z test_autograd_sparse_csr_unary_tan_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op tan not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4455198Z test_autograd_sparse_csr_unary_tanh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op tanh not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4455669Z test_autograd_sparse_csr_unary_tanh_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op tanh not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4456142Z test_autograd_sparse_csr_unary_trunc_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op trunc not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4456556Z test_baddbmm_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: ROCm not available (0.002s)
2023-01-11T21:45:10.4456914Z test_baddbmm_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: ROCm not available (0.002s)
2023-01-11T21:45:10.4457264Z test_baddbmm_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: ROCm not available (0.002s)
2023-01-11T21:45:10.4457614Z test_baddbmm_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: ROCm not available (0.002s)
2023-01-11T21:45:10.4457986Z test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.072s)
2023-01-11T21:45:10.4458378Z test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.065s)
2023-01-11T21:45:10.4458832Z test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.064s)
2023-01-11T21:45:10.4459219Z test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.071s)
2023-01-11T21:45:10.4459595Z test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.061s)
2023-01-11T21:45:10.4459986Z test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.059s)
2023-01-11T21:45:10.4460416Z test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.074s)
2023-01-11T21:45:10.4460809Z test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.070s)
2023-01-11T21:45:10.4461195Z test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.071s)
2023-01-11T21:45:10.4461582Z test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.075s)
2023-01-11T21:45:10.4461952Z test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.067s)
2023-01-11T21:45:10.4462326Z test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.065s)
2023-01-11T21:45:10.4462711Z test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.072s)
2023-01-11T21:45:10.4463101Z test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.068s)
2023-01-11T21:45:10.4463534Z test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.069s)
2023-01-11T21:45:10.4463914Z test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.071s)
2023-01-11T21:45:10.4464287Z test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.064s)
2023-01-11T21:45:10.4464656Z test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.062s)
2023-01-11T21:45:10.4465026Z test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.074s)
2023-01-11T21:45:10.4465453Z test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.072s)
2023-01-11T21:45:10.4465840Z test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.074s)
2023-01-11T21:45:10.4466220Z test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.075s)
2023-01-11T21:45:10.4466597Z test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.069s)
2023-01-11T21:45:10.4466969Z test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.068s)
2023-01-11T21:45:10.4467351Z test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.072s)
2023-01-11T21:45:10.4467729Z test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.064s)
2023-01-11T21:45:10.4468119Z test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.065s)
2023-01-11T21:45:10.4468502Z test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.072s)
2023-01-11T21:45:10.4468880Z test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.061s)
2023-01-11T21:45:10.4469251Z test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.060s)
2023-01-11T21:45:10.4469627Z test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.075s)
2023-01-11T21:45:10.4470010Z test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.070s)
2023-01-11T21:45:10.4470391Z test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.071s)
2023-01-11T21:45:10.4470769Z test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.075s)
2023-01-11T21:45:10.4471143Z test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.067s)
2023-01-11T21:45:10.4471536Z test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.065s)
2023-01-11T21:45:10.4471916Z test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.072s)
2023-01-11T21:45:10.4472295Z test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.067s)
2023-01-11T21:45:10.4472681Z test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.069s)
2023-01-11T21:45:10.4473066Z test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.071s)
2023-01-11T21:45:10.4473442Z test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.063s)
2023-01-11T21:45:10.4473812Z test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.061s)
2023-01-11T21:45:10.4474191Z test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.074s)
2023-01-11T21:45:10.4474575Z test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.073s)
2023-01-11T21:45:10.4474949Z test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.075s)
2023-01-11T21:45:10.4475333Z test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.075s)
2023-01-11T21:45:10.4475706Z test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.070s)
2023-01-11T21:45:10.4476117Z test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.068s)
2023-01-11T21:45:10.4476488Z test_block_addmv_block_size_2_int32_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4476877Z test_block_addmv_block_size_2_int32_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4477258Z test_block_addmv_block_size_2_int32_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4477629Z test_block_addmv_block_size_2_int32_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4477997Z test_block_addmv_block_size_2_int32_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4478381Z test_block_addmv_block_size_2_int32_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
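The test_block_addmm / test_block_addmv runs above exercise matrix products whose sparse operand uses the blocked compressed layout (BSR). A minimal sketch of that call pattern, not the suite's own code: names, shapes, and the dense-to-CSR-to-BSR conversion path are illustrative assumptions, and these ops run here on a CUDA device.

    import torch

    device = "cuda"  # the suite above runs on linux.g5.4xlarge.nvidia.gpu
    a = torch.randn(4, 6, device=device)
    # Dense -> CSR -> BSR with 2x2 blocks, matching the blocksize_2 cases.
    a_bsr = a.to_sparse_csr().to_sparse_bsr((2, 2))

    b = torch.randn(6, 3, device=device)
    c = torch.randn(4, 3, device=device)
    out = torch.addmm(c, a_bsr, b)    # c + a_bsr @ b with a block-sparse mat1

    x = torch.randn(6, device=device)
    y = torch.randn(4, device=device)
    out_v = torch.addmv(y, a_bsr, x)  # y + a_bsr @ x, the matrix-vector variant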
2023-01-11T21:45:10.4478766Z test_block_addmv_block_size_2_int32_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4479140Z test_block_addmv_block_size_2_int32_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4479520Z test_block_addmv_block_size_2_int64_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4479904Z test_block_addmv_block_size_2_int64_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4480286Z test_block_addmv_block_size_2_int64_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4480661Z test_block_addmv_block_size_2_int64_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4481034Z test_block_addmv_block_size_2_int64_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4481423Z test_block_addmv_block_size_2_int64_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4481806Z test_block_addmv_block_size_2_int64_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4482223Z test_block_addmv_block_size_2_int64_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4482601Z test_block_addmv_block_size_3_int32_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4482987Z test_block_addmv_block_size_3_int32_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4483404Z test_block_addmv_block_size_3_int32_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4483782Z test_block_addmv_block_size_3_int32_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4484163Z test_block_addmv_block_size_3_int32_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4484548Z test_block_addmv_block_size_3_int32_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4484932Z test_block_addmv_block_size_3_int32_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4485297Z test_block_addmv_block_size_3_int32_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4485683Z test_block_addmv_block_size_3_int64_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4486072Z test_block_addmv_block_size_3_int64_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4486453Z test_block_addmv_block_size_3_int64_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4486820Z test_block_addmv_block_size_3_int64_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4487226Z test_block_addmv_block_size_3_int64_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4487618Z test_block_addmv_block_size_3_int64_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4488000Z test_block_addmv_block_size_3_int64_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4488367Z test_block_addmv_block_size_3_int64_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4488951Z test_block_triangular_solve_block_size_2_int32_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... /var/lib/jenkins/workspace/test/test_sparse_csr.py:1616: UserWarning: torch.triangular_solve is deprecated in favor of torch.linalg.solve_triangular and will be removed in a future PyTorch release.
2023-01-11T21:45:10.4489523Z torch.linalg.solve_triangular has its arguments reversed and does not return a copy of one of the inputs.
2023-01-11T21:45:10.4489835Z X = torch.triangular_solve(B, A).solution
2023-01-11T21:45:10.4490052Z should be replaced with
2023-01-11T21:45:10.4490415Z X = torch.linalg.solve_triangular(A, B). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:2225.)
2023-01-11T21:45:10.4490772Z expected_X, _ = torch.triangular_solve(
2023-01-11T21:45:10.4490984Z ok (0.150s)
2023-01-11T21:45:10.4491286Z test_block_triangular_solve_block_size_2_int32_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4491698Z test_block_triangular_solve_block_size_2_int32_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.057s)
2023-01-11T21:45:10.4492103Z test_block_triangular_solve_block_size_2_int32_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.057s)
2023-01-11T21:45:10.4492507Z test_block_triangular_solve_block_size_2_int32_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4492951Z test_block_triangular_solve_block_size_2_int32_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4493361Z test_block_triangular_solve_block_size_2_int32_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4493765Z test_block_triangular_solve_block_size_2_int32_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4494161Z test_block_triangular_solve_block_size_2_int64_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.057s)
2023-01-11T21:45:10.4494787Z test_block_triangular_solve_block_size_2_int64_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4495201Z test_block_triangular_solve_block_size_2_int64_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4495607Z test_block_triangular_solve_block_size_2_int64_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.057s)
2023-01-11T21:45:10.4496005Z test_block_triangular_solve_block_size_2_int64_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.057s)
2023-01-11T21:45:10.4496414Z test_block_triangular_solve_block_size_2_int64_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.057s)
2023-01-11T21:45:10.4496815Z test_block_triangular_solve_block_size_2_int64_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4497213Z test_block_triangular_solve_block_size_2_int64_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
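The UserWarning captured above carries its own migration guidance. As a minimal sketch of that migration (shapes invented here; A must be triangular, and torch.linalg.solve_triangular takes the arguments in the reversed order with an explicit upper keyword):

    import torch

    A = torch.triu(torch.randn(3, 3))   # upper-triangular system matrix
    B = torch.randn(3, 2)

    # Deprecated form flagged in the warning:
    X_old = torch.triangular_solve(B, A).solution

    # Replacement suggested by the warning; note A comes first and
    # `upper` must be stated (triangular_solve defaulted to upper=True).
    X_new = torch.linalg.solve_triangular(A, B, upper=True)

    assert torch.allclose(X_old, X_new)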
2023-01-11T21:45:10.4497611Z test_block_triangular_solve_block_size_3_int32_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4498079Z test_block_triangular_solve_block_size_3_int32_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4498482Z test_block_triangular_solve_block_size_3_int32_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.059s)
2023-01-11T21:45:10.4498966Z test_block_triangular_solve_block_size_3_int32_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4499361Z test_block_triangular_solve_block_size_3_int32_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4499768Z test_block_triangular_solve_block_size_3_int32_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4500167Z test_block_triangular_solve_block_size_3_int32_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4500563Z test_block_triangular_solve_block_size_3_int32_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4500964Z test_block_triangular_solve_block_size_3_int64_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4501374Z test_block_triangular_solve_block_size_3_int64_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4501779Z test_block_triangular_solve_block_size_3_int64_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4502178Z test_block_triangular_solve_block_size_3_int64_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.059s)
2023-01-11T21:45:10.4502578Z test_block_triangular_solve_block_size_3_int64_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4502985Z test_block_triangular_solve_block_size_3_int64_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.057s)
2023-01-11T21:45:10.4503388Z test_block_triangular_solve_block_size_3_int64_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.057s)
2023-01-11T21:45:10.4503786Z test_block_triangular_solve_block_size_3_int64_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4504162Z test_bmm_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s)
2023-01-11T21:45:10.4504471Z test_bmm_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.016s)
2023-01-11T21:45:10.4504771Z test_bmm_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s)
2023-01-11T21:45:10.4505063Z test_bmm_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s)
2023-01-11T21:45:10.4505412Z test_compressed_layout_conversions_coverage_SparseBSC_SparseBSC_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4505777Z This test performs a smoke test for covered conversion and verifies ... ok (0.038s)
2023-01-11T21:45:10.4506140Z test_compressed_layout_conversions_coverage_SparseBSC_SparseBSR_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4506488Z This test performs a smoke test for covered conversion and verifies ... ok (0.038s)
2023-01-11T21:45:10.4506847Z test_compressed_layout_conversions_coverage_SparseBSC_SparseCSC_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4507207Z This test performs a smoke test for covered conversion and verifies ... ok (0.019s)
2023-01-11T21:45:10.4507556Z test_compressed_layout_conversions_coverage_SparseBSC_SparseCSR_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4507894Z This test performs a smoke test for covered conversion and verifies ... ok (0.019s)
2023-01-11T21:45:10.4508253Z test_compressed_layout_conversions_coverage_SparseBSR_SparseBSC_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4508608Z This test performs a smoke test for covered conversion and verifies ... ok (0.038s)
2023-01-11T21:45:10.4508948Z test_compressed_layout_conversions_coverage_SparseBSR_SparseBSR_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4509327Z This test performs a smoke test for covered conversion and verifies ... ok (0.037s)
2023-01-11T21:45:10.4509685Z test_compressed_layout_conversions_coverage_SparseBSR_SparseCSC_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4510047Z This test performs a smoke test for covered conversion and verifies ... ok (0.019s)
2023-01-11T21:45:10.4510401Z test_compressed_layout_conversions_coverage_SparseBSR_SparseCSR_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4510755Z This test performs a smoke test for covered conversion and verifies ... ok (0.016s)
2023-01-11T21:45:10.4511112Z test_compressed_layout_conversions_coverage_SparseCSC_SparseBSC_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4511457Z This test performs a smoke test for covered conversion and verifies ... ok (0.020s)
2023-01-11T21:45:10.4511813Z test_compressed_layout_conversions_coverage_SparseCSC_SparseBSR_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4512170Z This test performs a smoke test for covered conversion and verifies ... ok (0.019s)
2023-01-11T21:45:10.4512526Z test_compressed_layout_conversions_coverage_SparseCSC_SparseCSC_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4512868Z This test performs a smoke test for covered conversion and verifies ... ok (0.005s)
2023-01-11T21:45:10.4513225Z test_compressed_layout_conversions_coverage_SparseCSC_SparseCSR_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4513577Z This test performs a smoke test for covered conversion and verifies ... ok (0.006s)
2023-01-11T21:45:10.4513929Z test_compressed_layout_conversions_coverage_SparseCSR_SparseBSC_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4514269Z This test performs a smoke test for covered conversion and verifies ... ok (0.019s)
2023-01-11T21:45:10.4514621Z test_compressed_layout_conversions_coverage_SparseCSR_SparseBSR_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4514975Z This test performs a smoke test for covered conversion and verifies ... ok (0.019s)
2023-01-11T21:45:10.4515323Z test_compressed_layout_conversions_coverage_SparseCSR_SparseCSC_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4515674Z This test performs a smoke test for covered conversion and verifies ... ok (0.006s)
2023-01-11T21:45:10.4516055Z test_compressed_layout_conversions_coverage_SparseCSR_SparseCSR_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4516409Z This test performs a smoke test for covered conversion and verifies ... ok (0.004s)
2023-01-11T21:45:10.4516729Z test_coo_csr_conversion_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
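The test_compressed_layout_conversions_coverage_* cases above smoke-test every source/target pair of the four compressed layouts (CSR, CSC, BSR, BSC). A minimal sketch of that layout surface, assuming a recent PyTorch where each compressed layout can be built from a strided tensor and round-tripped through to_dense (block sizes here are invented):

    import torch

    dense = torch.eye(4)
    csr = dense.to_sparse_csr()
    csc = dense.to_sparse_csc()
    bsr = dense.to_sparse_bsr((2, 2))
    bsc = dense.to_sparse_bsc((2, 2))

    for t in (csr, csc, bsr, bsc):
        # Each compressed layout should round-trip through dense losslessly.
        assert torch.equal(t.to_dense(), dense)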
2023-01-11T21:45:10.4517066Z test_coo_csr_conversion_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4517401Z test_coo_csr_conversion_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4517732Z test_coo_csr_conversion_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4518065Z test_coo_csr_conversion_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4518394Z test_coo_csr_conversion_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4518717Z test_coo_csr_conversion_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4519037Z test_coo_csr_conversion_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4519364Z test_coo_csr_conversion_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4519687Z test_coo_csr_conversion_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4520003Z test_coo_csr_conversion_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4520329Z test_coo_csr_conversion_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4520659Z test_coo_to_csr_convert_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s)
2023-01-11T21:45:10.4520993Z test_csr_coo_conversion_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4521342Z test_csr_coo_conversion_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4521678Z test_csr_coo_conversion_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4522021Z test_csr_coo_conversion_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4522346Z test_csr_coo_conversion_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4522669Z test_csr_coo_conversion_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4522997Z test_csr_coo_conversion_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4523325Z test_csr_coo_conversion_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4523640Z test_csr_coo_conversion_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4523962Z test_csr_coo_conversion_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4524289Z test_csr_coo_conversion_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4524606Z test_csr_coo_conversion_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4524931Z test_csr_double_to_sparse_csr_cuda (__main__.TestSparseCSRCUDA) ... ok (0.001s)
2023-01-11T21:45:10.4525252Z test_csr_is_contiguous_cuda (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4525573Z test_csr_matvec_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.044s)
2023-01-11T21:45:10.4525889Z test_csr_matvec_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.045s)
2023-01-11T21:45:10.4526213Z test_csr_matvec_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.044s)
2023-01-11T21:45:10.4526533Z test_csr_matvec_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.044s)
2023-01-11T21:45:10.4526840Z test_csr_matvec_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.044s)
2023-01-11T21:45:10.4527151Z test_csr_matvec_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.044s)
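The test_coo_csr_conversion / test_csr_coo_conversion cases above cover the COO-to-CSR round trip across dtypes. A minimal sketch of that round trip (indices and values invented here, assuming Tensor.to_sparse() on a CSR tensor yields COO as in recent PyTorch):

    import torch

    i = torch.tensor([[0, 1, 1], [2, 0, 2]])
    v = torch.tensor([3.0, 4.0, 5.0])
    coo = torch.sparse_coo_tensor(i, v, size=(2, 3)).coalesce()

    csr = coo.to_sparse_csr()   # COO -> CSR
    back = csr.to_sparse()      # CSR -> COO again

    assert torch.equal(coo.to_dense(), back.to_dense())
    # CSR exposes compressed row pointers plus column indices and values:
    print(csr.crow_indices(), csr.col_indices(), csr.values())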
2023-01-11T21:45:10.4527808Z test_csr_storage_cuda (__main__.TestSparseCSRCUDA) ... /var/lib/jenkins/workspace/test/test_sparse_csr.py:924: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:45:10.4528351Z a.storage()
2023-01-11T21:45:10.4528529Z ok (0.003s)
2023-01-11T21:45:10.4528769Z test_csr_stride_cuda (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4529100Z test_csr_to_block_csr_blocksize_2_cuda_float64_int32 (__main__.TestSparseCSRCUDA) ... ok (0.014s)
2023-01-11T21:45:10.4529456Z test_csr_to_block_csr_blocksize_2_cuda_float64_int64 (__main__.TestSparseCSRCUDA) ... ok (0.014s)
2023-01-11T21:45:10.4529799Z test_csr_to_block_csr_blocksize_4_cuda_float64_int32 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4530149Z test_csr_to_block_csr_blocksize_4_cuda_float64_int64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4530492Z test_csr_to_block_csr_errors_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4530850Z test_dense_to_from_sparse_compressed_SparseBSC_Batched_Hybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4531195Z This test tests conversion from dense to/from CSR and CSC ... ok (0.013s)
2023-01-11T21:45:10.4531546Z test_dense_to_from_sparse_compressed_SparseBSC_Batched_NonHybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4531891Z This test tests conversion from dense to/from CSR and CSC ... ok (0.378s)
2023-01-11T21:45:10.4532233Z test_dense_to_from_sparse_compressed_SparseBSC_NonBatched_Hybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4532578Z This test tests conversion from dense to/from CSR and CSC ... ok (0.011s)
2023-01-11T21:45:10.4532934Z test_dense_to_from_sparse_compressed_SparseBSC_NonBatched_NonHybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4533309Z This test tests conversion from dense to/from CSR and CSC ... ok (0.020s)
2023-01-11T21:45:10.4533643Z test_dense_to_from_sparse_compressed_SparseBSR_Batched_Hybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4533983Z This test tests conversion from dense to/from CSR and CSC ... ok (0.012s)
2023-01-11T21:45:10.4534327Z test_dense_to_from_sparse_compressed_SparseBSR_Batched_NonHybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4534848Z This test tests conversion from dense to/from CSR and CSC ... ok (0.304s)
2023-01-11T21:45:10.4535201Z test_dense_to_from_sparse_compressed_SparseBSR_NonBatched_Hybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4535542Z This test tests conversion from dense to/from CSR and CSC ... ok (0.011s)
2023-01-11T21:45:10.4535893Z test_dense_to_from_sparse_compressed_SparseBSR_NonBatched_NonHybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4536229Z This test tests conversion from dense to/from CSR and CSC ... ok (0.019s)
2023-01-11T21:45:10.4536574Z test_dense_to_from_sparse_compressed_SparseCSC_Batched_Hybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4536910Z This test tests conversion from dense to/from CSR and CSC ... ok (0.015s)
2023-01-11T21:45:10.4537252Z test_dense_to_from_sparse_compressed_SparseCSC_Batched_NonHybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4537594Z This test tests conversion from dense to/from CSR and CSC ... ok (0.083s)
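The TypedStorage deprecation captured above (from a.storage() at test_sparse_csr.py:924) also names its replacement. A minimal sketch of the two accessors (the tensor here is invented):

    import torch

    a = torch.arange(4.0)

    legacy = a.storage()       # deprecated TypedStorage accessor; emits the warning
    raw = a.untyped_storage()  # untyped, warning-free accessor named by the warning

    # The untyped storage aliases the tensor's underlying buffer.
    assert raw.data_ptr() == a.data_ptr()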
2023-01-11T21:45:10.4537940Z test_dense_to_from_sparse_compressed_SparseCSC_NonBatched_Hybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4538279Z This test tests conversion from dense to/from CSR and CSC ... ok (0.012s)
2023-01-11T21:45:10.4538678Z test_dense_to_from_sparse_compressed_SparseCSC_NonBatched_NonHybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4539025Z This test tests conversion from dense to/from CSR and CSC ... ok (0.009s)
2023-01-11T21:45:10.4539367Z test_dense_to_from_sparse_compressed_SparseCSR_Batched_Hybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4539693Z This test tests conversion from dense to/from CSR and CSC ... ok (0.013s)
2023-01-11T21:45:10.4540100Z test_dense_to_from_sparse_compressed_SparseCSR_Batched_NonHybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4540443Z This test tests conversion from dense to/from CSR and CSC ... ok (0.080s)
2023-01-11T21:45:10.4540786Z test_dense_to_from_sparse_compressed_SparseCSR_NonBatched_Hybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4541115Z This test tests conversion from dense to/from CSR and CSC ... ok (0.011s)
2023-01-11T21:45:10.4541469Z test_dense_to_from_sparse_compressed_SparseCSR_NonBatched_NonHybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4541818Z This test tests conversion from dense to/from CSR and CSC ... ok (0.009s)
2023-01-11T21:45:10.4542140Z test_direct_coo_csr_conversion_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4542499Z test_direct_coo_csr_conversion_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4542854Z test_direct_coo_csr_conversion_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4543206Z test_direct_coo_csr_conversion_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4543539Z test_direct_coo_csr_conversion_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4543883Z test_direct_coo_csr_conversion_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4544220Z test_direct_coo_csr_conversion_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4544568Z test_direct_coo_csr_conversion_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4544901Z test_direct_coo_csr_conversion_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4545235Z test_exercise_detach_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4545601Z test_exercise_detach_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4545927Z test_exercise_detach_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4546268Z test_exercise_detach_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4546602Z test_exercise_detach_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4546931Z test_exercise_detach_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4547252Z test_exercise_detach_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4547574Z test_exercise_detach_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4547900Z test_exercise_detach_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4548218Z test_exercise_detach_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4548541Z test_exercise_detach_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4548868Z test_exercise_detach_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4549207Z test_matmul_device_mismatch_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.024s)
2023-01-11T21:45:10.4549522Z test_mm_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.823s)
2023-01-11T21:45:10.4549833Z test_mm_errors_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4550145Z test_mul_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.261s)
2023-01-11T21:45:10.4550439Z test_mul_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.259s)
2023-01-11T21:45:10.4550788Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (5.375s)
2023-01-11T21:45:10.4551171Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (3.101s)
2023-01-11T21:45:10.4551560Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (5.841s)
2023-01-11T21:45:10.4551944Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (5.896s)
2023-01-11T21:45:10.4552353Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (4.886s)
2023-01-11T21:45:10.4552737Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (5.358s)
2023-01-11T21:45:10.4553106Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (5.336s)
2023-01-11T21:45:10.4553486Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (4.267s)
2023-01-11T21:45:10.4553853Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (4.272s)
2023-01-11T21:45:10.4554230Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (4.247s)
2023-01-11T21:45:10.4554596Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (4.279s)
2023-01-11T21:45:10.4554969Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (4.260s)
2023-01-11T21:45:10.4555346Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (4.009s)
2023-01-11T21:45:10.4555723Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (2.309s)
2023-01-11T21:45:10.4556097Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (4.385s)
2023-01-11T21:45:10.4556484Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (4.418s)
2023-01-11T21:45:10.4556858Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (3.643s)
2023-01-11T21:45:10.4557231Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (4.012s)
2023-01-11T21:45:10.4557647Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (3.973s)
2023-01-11T21:45:10.4558027Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (3.164s)
2023-01-11T21:45:10.4558394Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (3.174s)
2023-01-11T21:45:10.4558760Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (3.133s)
2023-01-11T21:45:10.4559129Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (3.155s)
2023-01-11T21:45:10.4559495Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (3.155s)
2023-01-11T21:45:10.4559870Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (1.456s)
2023-01-11T21:45:10.4560239Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.801s)
2023-01-11T21:45:10.4560623Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (1.602s)
2023-01-11T21:45:10.4561016Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (1.639s)
2023-01-11T21:45:10.4561396Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (1.324s)
2023-01-11T21:45:10.4561763Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (1.456s)
2023-01-11T21:45:10.4562134Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (1.420s)
2023-01-11T21:45:10.4562507Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (1.071s)
2023-01-11T21:45:10.4562883Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (1.073s)
2023-01-11T21:45:10.4563250Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (1.053s)
2023-01-11T21:45:10.4563648Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (1.072s)
2023-01-11T21:45:10.4564021Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (1.069s)
2023-01-11T21:45:10.4564395Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (1.354s)
2023-01-11T21:45:10.4564769Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.740s)
2023-01-11T21:45:10.4565154Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (1.492s)
2023-01-11T21:45:10.4565543Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (1.531s)
2023-01-11T21:45:10.4565916Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (1.230s)
2023-01-11T21:45:10.4566293Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (1.356s)
2023-01-11T21:45:10.4566666Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (1.322s)
2023-01-11T21:45:10.4567041Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.997s)
2023-01-11T21:45:10.4567401Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.994s)
2023-01-11T21:45:10.4567776Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.977s)
2023-01-11T21:45:10.4568147Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.994s)
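The test_mul_scalar_* runs above cover scalar multiplication for all four compressed layouts (BSC, BSR, CSC, CSR) across dtypes. A minimal sketch of the property being checked (values invented, CSR shown for brevity): scaling touches only the values, leaving the layout and index tensors intact.

    import torch

    csr = torch.eye(3).to_sparse_csr()
    out = csr * 2.5

    assert out.layout == torch.sparse_csr
    assert torch.equal(out.crow_indices(), csr.crow_indices())
    assert torch.equal(out.to_dense(), torch.eye(3) * 2.5)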
2023-01-11T21:45:10.4568511Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.992s)
2023-01-11T21:45:10.4569387Z test_resize_as_sparse_compressed_SparseBSC_cuda_bool (__main__.TestSparseCSRCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_creation.py:167: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/EmptyTensor.cpp:32.)
2023-01-11T21:45:10.4569940Z result = torch.empty(shape, device=device, dtype=dtype)
2023-01-11T21:45:10.4570176Z ok (0.064s)
2023-01-11T21:45:10.4570456Z test_resize_as_sparse_compressed_SparseBSC_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.063s)
2023-01-11T21:45:10.4570835Z test_resize_as_sparse_compressed_SparseBSR_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.060s)
2023-01-11T21:45:10.4571212Z test_resize_as_sparse_compressed_SparseBSR_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.060s)
2023-01-11T21:45:10.4571593Z test_resize_as_sparse_compressed_SparseCSC_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.075s)
2023-01-11T21:45:10.4571965Z test_resize_as_sparse_compressed_SparseCSC_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.075s)
2023-01-11T21:45:10.4572333Z test_resize_as_sparse_compressed_SparseCSR_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.085s)
2023-01-11T21:45:10.4572706Z test_resize_as_sparse_compressed_SparseCSR_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.086s)
2023-01-11T21:45:10.4573050Z test_resize_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4573359Z test_resize_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4573675Z test_resize_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4573993Z test_resize_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4574299Z test_resize_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4574791Z test_resize_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4575109Z test_resize_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4575409Z test_resize_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4575699Z test_resize_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4576049Z test_resize_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4576346Z test_resize_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4576635Z test_resize_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4576945Z test_resize_errors_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4577265Z test_resize_errors_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4577576Z test_resize_errors_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4577904Z test_resize_errors_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4578318Z test_resize_errors_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4578763Z test_resize_errors_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4579105Z test_resize_errors_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4579486Z test_resize_errors_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4579801Z test_resize_errors_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4580101Z test_resize_errors_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4580408Z test_resize_errors_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4580718Z test_resize_errors_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4581053Z test_sampled_addmm_autograd_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4581397Z test_sampled_addmm_autograd_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4581806Z test_sampled_addmm_autograd_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4582145Z test_sampled_addmm_autograd_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4582477Z test_sampled_addmm_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (1.174s)
2023-01-11T21:45:10.4582808Z test_sampled_addmm_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (1.205s)
2023-01-11T21:45:10.4583133Z test_sampled_addmm_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (1.147s)
2023-01-11T21:45:10.4583454Z test_sampled_addmm_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (1.118s)
2023-01-11T21:45:10.4583783Z test_sampled_addmm_errors_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.034s)
2023-01-11T21:45:10.4584128Z test_sampled_addmm_errors_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.034s)
2023-01-11T21:45:10.4584470Z test_sampled_addmm_errors_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.034s)
2023-01-11T21:45:10.4584799Z test_sampled_addmm_errors_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.034s)
2023-01-11T21:45:10.4585249Z test_sampled_addmm_zero_sized_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Causes CUDA memory exception, see https://github.com/pytorch/pytorch/issues/72177 (0.001s)
2023-01-11T21:45:10.4585774Z test_sampled_addmm_zero_sized_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Causes CUDA memory exception, see https://github.com/pytorch/pytorch/issues/72177 (0.001s)
2023-01-11T21:45:10.4586291Z test_sampled_addmm_zero_sized_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Causes CUDA memory exception, see https://github.com/pytorch/pytorch/issues/72177 (0.001s)
2023-01-11T21:45:10.4586802Z test_sampled_addmm_zero_sized_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Causes CUDA memory exception, see https://github.com/pytorch/pytorch/issues/72177 (0.001s)
2023-01-11T21:45:10.4587212Z test_select_SparseBSC_int32_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4587559Z test_select_SparseBSC_int32_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4587934Z test_select_SparseBSC_int32_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4588286Z test_select_SparseBSC_int32_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4588619Z test_select_SparseBSC_int32_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4588955Z test_select_SparseBSC_int32_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4589295Z test_select_SparseBSC_int32_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4589619Z test_select_SparseBSC_int32_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
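The test_sampled_addmm_* runs above exercise torch.sparse.sampled_addmm, which computes beta * input + alpha * (mat1 @ mat2) only at the positions present in the sparse CSR input's pattern (the zero-sized variants are skipped here per pytorch/pytorch#72177, as the log notes). A minimal sketch with invented shapes, run on CUDA as in this job:

    import torch

    device = "cuda"
    pattern = torch.eye(4, device=device).to_sparse_csr()  # sampling pattern
    mat1 = torch.randn(4, 8, device=device)
    mat2 = torch.randn(8, 4, device=device)

    out = torch.sparse.sampled_addmm(pattern, mat1, mat2)
    # The result is sparse CSR with the same sparsity pattern as `pattern`.
    assert out.layout == torch.sparse_csr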
2023-01-11T21:45:10.4589949Z test_select_SparseBSC_int32_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4590276Z test_select_SparseBSC_int32_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4590597Z test_select_SparseBSC_int32_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4590924Z test_select_SparseBSC_int32_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4591255Z test_select_SparseBSC_int64_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4591592Z test_select_SparseBSC_int64_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4591923Z test_select_SparseBSC_int64_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4592266Z test_select_SparseBSC_int64_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4592604Z test_select_SparseBSC_int64_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4592941Z test_select_SparseBSC_int64_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4593299Z test_select_SparseBSC_int64_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4593632Z test_select_SparseBSC_int64_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4593965Z test_select_SparseBSC_int64_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4594279Z test_select_SparseBSC_int64_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4594608Z test_select_SparseBSC_int64_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4594936Z test_select_SparseBSC_int64_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4595270Z test_select_SparseBSR_int32_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4595597Z test_select_SparseBSR_int32_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4595933Z test_select_SparseBSR_int32_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4596280Z test_select_SparseBSR_int32_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4596609Z test_select_SparseBSR_int32_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4596955Z test_select_SparseBSR_int32_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4597292Z test_select_SparseBSR_int32_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4597627Z test_select_SparseBSR_int32_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4597944Z test_select_SparseBSR_int32_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4598269Z test_select_SparseBSR_int32_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4598594Z test_select_SparseBSR_int32_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4598913Z test_select_SparseBSR_int32_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4599258Z test_select_SparseBSR_int64_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4599592Z test_select_SparseBSR_int64_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4599964Z test_select_SparseBSR_int64_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4600302Z test_select_SparseBSR_int64_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4600643Z test_select_SparseBSR_int64_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4600974Z test_select_SparseBSR_int64_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4601313Z test_select_SparseBSR_int64_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4601637Z test_select_SparseBSR_int64_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4601966Z test_select_SparseBSR_int64_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4602293Z test_select_SparseBSR_int64_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.019s)
2023-01-11T21:45:10.4602612Z test_select_SparseBSR_int64_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.019s)
2023-01-11T21:45:10.4602940Z test_select_SparseBSR_int64_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.019s)
2023-01-11T21:45:10.4603274Z test_select_SparseCSC_int32_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4603611Z test_select_SparseCSC_int32_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4605287Z test_select_SparseCSC_int32_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4605633Z test_select_SparseCSC_int32_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4605970Z test_select_SparseCSC_int32_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4606328Z test_select_SparseCSC_int32_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.024s)
2023-01-11T21:45:10.4606665Z test_select_SparseCSC_int32_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4606999Z test_select_SparseCSC_int32_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4607325Z test_select_SparseCSC_int32_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4607638Z test_select_SparseCSC_int32_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4607966Z test_select_SparseCSC_int32_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4608295Z test_select_SparseCSC_int32_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4608627Z test_select_SparseCSC_int64_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.024s)
2023-01-11T21:45:10.4608949Z test_select_SparseCSC_int64_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4609293Z test_select_SparseCSC_int64_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.024s)
2023-01-11T21:45:10.4609636Z test_select_SparseCSC_int64_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4609970Z test_select_SparseCSC_int64_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.024s)
2023-01-11T21:45:10.4610307Z test_select_SparseCSC_int64_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4610640Z test_select_SparseCSC_int64_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.024s)
2023-01-11T21:45:10.4610977Z test_select_SparseCSC_int64_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4611297Z test_select_SparseCSC_int64_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4611625Z test_select_SparseCSC_int64_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4611951Z test_select_SparseCSC_int64_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4612275Z test_select_SparseCSC_int64_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4612608Z test_select_SparseCSR_int32_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4612969Z test_select_SparseCSR_int32_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4613309Z test_select_SparseCSR_int32_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4613644Z test_select_SparseCSR_int32_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4613980Z test_select_SparseCSR_int32_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4614311Z test_select_SparseCSR_int32_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4614768Z test_select_SparseCSR_int32_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4615100Z test_select_SparseCSR_int32_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4615427Z test_select_SparseCSR_int32_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4615749Z test_select_SparseCSR_int32_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4616066Z test_select_SparseCSR_int32_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4616395Z test_select_SparseCSR_int32_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4616727Z test_select_SparseCSR_int64_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4617055Z test_select_SparseCSR_int64_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4617394Z test_select_SparseCSR_int64_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4617742Z test_select_SparseCSR_int64_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4618125Z test_select_SparseCSR_int64_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4618450Z test_select_SparseCSR_int64_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4618854Z test_select_SparseCSR_int64_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4619189Z test_select_SparseCSR_int64_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4619516Z test_select_SparseCSR_int64_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4619830Z test_select_SparseCSR_int64_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4620159Z test_select_SparseCSR_int64_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4620482Z test_select_SparseCSR_int64_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4620801Z test_sparse_add_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.030s)
2023-01-11T21:45:10.4621129Z test_sparse_add_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.030s)
2023-01-11T21:45:10.4621445Z test_sparse_add_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.029s)
2023-01-11T21:45:10.4621768Z test_sparse_add_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.029s)
2023-01-11T21:45:10.4622085Z test_sparse_add_errors_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.013s)
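The test_sparse_add_* cases above cover elementwise addition of two sparse CSR tensors (with the _errors variants checking shape/layout mismatches). A minimal sketch of the passing path, with invented values:

    import torch

    a = torch.tensor([[0., 1.], [2., 0.]]).to_sparse_csr()
    b = torch.tensor([[3., 0.], [0., 4.]]).to_sparse_csr()

    c = torch.add(a, b)   # equivalent to a + b
    assert c.layout == torch.sparse_csr
    assert torch.equal(c.to_dense(), a.to_dense() + b.to_dense())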
ok (0.013s) 2023-01-11T21:45:10.4622423Z test_sparse_add_errors_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.013s) 2023-01-11T21:45:10.4622755Z test_sparse_add_errors_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.013s) 2023-01-11T21:45:10.4623077Z test_sparse_add_errors_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.013s) 2023-01-11T21:45:10.4623406Z test_sparse_addmm_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.026s) 2023-01-11T21:45:10.4623735Z test_sparse_addmm_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.026s) 2023-01-11T21:45:10.4624065Z test_sparse_addmm_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.027s) 2023-01-11T21:45:10.4624382Z test_sparse_addmm_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.026s) 2023-01-11T21:45:10.4624699Z test_sparse_addmm_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.026s) 2023-01-11T21:45:10.4625062Z test_sparse_addmm_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.026s) 2023-01-11T21:45:10.4625380Z test_sparse_csc_to_dense_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.006s) 2023-01-11T21:45:10.4625711Z test_sparse_csc_to_dense_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4626045Z test_sparse_csc_to_dense_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.006s) 2023-01-11T21:45:10.4626381Z test_sparse_csc_to_dense_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.006s) 2023-01-11T21:45:10.4626711Z test_sparse_csc_to_dense_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4627036Z test_sparse_csc_to_dense_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4627370Z test_sparse_csc_to_dense_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4627687Z test_sparse_csc_to_dense_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4628013Z test_sparse_csc_to_dense_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4628339Z test_sparse_csc_to_dense_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4628661Z test_sparse_csc_to_dense_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4628980Z test_sparse_csc_to_dense_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4629315Z test_sparse_csr_from_dense_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4629650Z test_sparse_csr_from_dense_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4629980Z test_sparse_csr_from_dense_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4630356Z test_sparse_csr_from_dense_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4630697Z test_sparse_csr_from_dense_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4631030Z test_sparse_csr_from_dense_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4631356Z test_sparse_csr_from_dense_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4631690Z test_sparse_csr_from_dense_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4632020Z test_sparse_csr_from_dense_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4632339Z test_sparse_csr_from_dense_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4632668Z test_sparse_csr_from_dense_cuda_int8 (__main__.TestSparseCSRCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.4633002Z test_sparse_csr_from_dense_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4633375Z test_sparse_csr_to_dense_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4633705Z test_sparse_csr_to_dense_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4634041Z test_sparse_csr_to_dense_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4634379Z test_sparse_csr_to_dense_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4634703Z test_sparse_csr_to_dense_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4635032Z test_sparse_csr_to_dense_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4635360Z test_sparse_csr_to_dense_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4635689Z test_sparse_csr_to_dense_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4636003Z test_sparse_csr_to_dense_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4636334Z test_sparse_csr_to_dense_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4636657Z test_sparse_csr_to_dense_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4637001Z test_sparse_csr_to_dense_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4637343Z test_sparse_csr_unary_inplace_abs_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4637692Z test_sparse_csr_unary_inplace_abs_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4638047Z test_sparse_csr_unary_inplace_abs_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.006s) 2023-01-11T21:45:10.4638397Z test_sparse_csr_unary_inplace_abs_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.175s) 2023-01-11T21:45:10.4638755Z test_sparse_csr_unary_inplace_abs_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.007s) 2023-01-11T21:45:10.4639111Z test_sparse_csr_unary_inplace_abs_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4639460Z test_sparse_csr_unary_inplace_abs_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4639803Z test_sparse_csr_unary_inplace_abs_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4640152Z test_sparse_csr_unary_inplace_abs_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4640497Z test_sparse_csr_unary_inplace_abs_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4640825Z test_sparse_csr_unary_inplace_abs_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4641167Z test_sparse_csr_unary_inplace_abs_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4641517Z test_sparse_csr_unary_inplace_abs_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4641913Z test_sparse_csr_unary_inplace_angle_cuda_bool (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s) 2023-01-11T21:45:10.4642390Z test_sparse_csr_unary_inplace_angle_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s) 2023-01-11T21:45:10.4642845Z test_sparse_csr_unary_inplace_angle_cuda_complex32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! 
2023-01-11T21:45:10.4643291Z test_sparse_csr_unary_inplace_angle_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4643730Z test_sparse_csr_unary_inplace_angle_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4644160Z test_sparse_csr_unary_inplace_angle_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4644601Z test_sparse_csr_unary_inplace_angle_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4645032Z test_sparse_csr_unary_inplace_angle_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4645469Z test_sparse_csr_unary_inplace_angle_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4645897Z test_sparse_csr_unary_inplace_angle_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4646331Z test_sparse_csr_unary_inplace_angle_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4646722Z test_sparse_csr_unary_inplace_asin_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4647076Z test_sparse_csr_unary_inplace_asin_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4647422Z test_sparse_csr_unary_inplace_asin_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4647792Z test_sparse_csr_unary_inplace_asin_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4648178Z test_sparse_csr_unary_inplace_asin_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4648533Z test_sparse_csr_unary_inplace_asin_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4648877Z test_sparse_csr_unary_inplace_asin_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4649224Z test_sparse_csr_unary_inplace_asin_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4649574Z test_sparse_csr_unary_inplace_asin_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4649911Z test_sparse_csr_unary_inplace_asin_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4650261Z test_sparse_csr_unary_inplace_asin_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4650607Z test_sparse_csr_unary_inplace_asin_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4650950Z test_sparse_csr_unary_inplace_asin_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4651292Z test_sparse_csr_unary_inplace_asinh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4651643Z test_sparse_csr_unary_inplace_asinh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4651995Z test_sparse_csr_unary_inplace_asinh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4652348Z test_sparse_csr_unary_inplace_asinh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4652711Z test_sparse_csr_unary_inplace_asinh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4653069Z test_sparse_csr_unary_inplace_asinh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4653446Z test_sparse_csr_unary_inplace_asinh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4653791Z test_sparse_csr_unary_inplace_asinh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4654147Z test_sparse_csr_unary_inplace_asinh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4654628Z test_sparse_csr_unary_inplace_asinh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4655030Z test_sparse_csr_unary_inplace_asinh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4655363Z test_sparse_csr_unary_inplace_asinh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4655710Z test_sparse_csr_unary_inplace_asinh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4656056Z test_sparse_csr_unary_inplace_atan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4656398Z test_sparse_csr_unary_inplace_atan_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4656748Z test_sparse_csr_unary_inplace_atan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4657106Z test_sparse_csr_unary_inplace_atan_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4657459Z test_sparse_csr_unary_inplace_atan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4657803Z test_sparse_csr_unary_inplace_atan_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4658148Z test_sparse_csr_unary_inplace_atan_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4658575Z test_sparse_csr_unary_inplace_atan_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4658936Z test_sparse_csr_unary_inplace_atan_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4659272Z test_sparse_csr_unary_inplace_atan_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4659615Z test_sparse_csr_unary_inplace_atan_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4660005Z test_sparse_csr_unary_inplace_atan_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4660338Z test_sparse_csr_unary_inplace_atan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4660687Z test_sparse_csr_unary_inplace_atanh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4661032Z test_sparse_csr_unary_inplace_atanh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4661380Z test_sparse_csr_unary_inplace_atanh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4661729Z test_sparse_csr_unary_inplace_atanh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4662088Z test_sparse_csr_unary_inplace_atanh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4662442Z test_sparse_csr_unary_inplace_atanh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4662785Z test_sparse_csr_unary_inplace_atanh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4663136Z test_sparse_csr_unary_inplace_atanh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
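The test_sparse_csr_unary_inplace_* blocks check that zero-preserving unary ops (abs, asin, asinh, atan, atanh, ...) can be applied in place to a CSR tensor: only the values buffer is mutated, and the sparsity pattern is untouched because these functions map 0 to 0. A small illustration under those assumptions (my own example, not the harness code):

    import torch

    x = torch.tensor([[0.00, -0.50],
                      [0.25,  0.00]]).to_sparse_csr()
    pattern = x.crow_indices().clone()

    x.asin_()  # in-place variant; rewrites x.values() only

    assert torch.equal(x.crow_indices(), pattern)
    assert torch.allclose(x.to_dense(),
                          torch.asin(torch.tensor([[0.00, -0.50],
                                                   [0.25,  0.00]])))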
2023-01-11T21:45:10.4663518Z test_sparse_csr_unary_inplace_atanh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4663868Z test_sparse_csr_unary_inplace_atanh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4664202Z test_sparse_csr_unary_inplace_atanh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4664540Z test_sparse_csr_unary_inplace_atanh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4664883Z test_sparse_csr_unary_inplace_atanh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4665275Z test_sparse_csr_unary_inplace_ceil_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4665617Z test_sparse_csr_unary_inplace_ceil_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4665970Z test_sparse_csr_unary_inplace_ceil_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4666318Z test_sparse_csr_unary_inplace_ceil_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4666656Z test_sparse_csr_unary_inplace_ceil_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4667001Z test_sparse_csr_unary_inplace_ceil_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4667341Z test_sparse_csr_unary_inplace_ceil_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4667679Z test_sparse_csr_unary_inplace_ceil_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4668020Z test_sparse_csr_unary_inplace_ceil_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4668382Z test_sparse_csr_unary_inplace_conj_physical_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4668757Z test_sparse_csr_unary_inplace_conj_physical_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4669138Z test_sparse_csr_unary_inplace_conj_physical_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4669512Z test_sparse_csr_unary_inplace_conj_physical_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.153s)
2023-01-11T21:45:10.4669888Z test_sparse_csr_unary_inplace_conj_physical_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4670262Z test_sparse_csr_unary_inplace_conj_physical_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4670624Z test_sparse_csr_unary_inplace_conj_physical_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4671000Z test_sparse_csr_unary_inplace_conj_physical_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4671367Z test_sparse_csr_unary_inplace_conj_physical_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4671770Z test_sparse_csr_unary_inplace_conj_physical_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4672122Z test_sparse_csr_unary_inplace_conj_physical_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4672489Z test_sparse_csr_unary_inplace_conj_physical_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4672852Z test_sparse_csr_unary_inplace_conj_physical_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4673216Z test_sparse_csr_unary_inplace_deg2rad_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4673567Z test_sparse_csr_unary_inplace_deg2rad_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4673927Z test_sparse_csr_unary_inplace_deg2rad_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4674284Z test_sparse_csr_unary_inplace_deg2rad_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4674636Z test_sparse_csr_unary_inplace_deg2rad_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4674993Z test_sparse_csr_unary_inplace_deg2rad_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4675347Z test_sparse_csr_unary_inplace_deg2rad_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4675702Z test_sparse_csr_unary_inplace_deg2rad_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4676045Z test_sparse_csr_unary_inplace_deg2rad_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4676406Z test_sparse_csr_unary_inplace_deg2rad_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4676790Z test_sparse_csr_unary_inplace_erf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4677138Z test_sparse_csr_unary_inplace_erf_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4677482Z test_sparse_csr_unary_inplace_erf_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4677828Z test_sparse_csr_unary_inplace_erf_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4678173Z test_sparse_csr_unary_inplace_erf_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4678507Z test_sparse_csr_unary_inplace_erf_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4678848Z test_sparse_csr_unary_inplace_erf_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4679187Z test_sparse_csr_unary_inplace_erf_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4679525Z test_sparse_csr_unary_inplace_erf_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4679856Z test_sparse_csr_unary_inplace_erf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4680209Z test_sparse_csr_unary_inplace_erfinv_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.235s)
2023-01-11T21:45:10.4680561Z test_sparse_csr_unary_inplace_erfinv_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.167s)
2023-01-11T21:45:10.4680909Z test_sparse_csr_unary_inplace_erfinv_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.165s)
2023-01-11T21:45:10.4681271Z test_sparse_csr_unary_inplace_erfinv_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.217s)
2023-01-11T21:45:10.4681621Z test_sparse_csr_unary_inplace_erfinv_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4681977Z test_sparse_csr_unary_inplace_erfinv_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4682325Z test_sparse_csr_unary_inplace_erfinv_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4682672Z test_sparse_csr_unary_inplace_erfinv_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4683022Z test_sparse_csr_unary_inplace_erfinv_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4683403Z test_sparse_csr_unary_inplace_expm1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4683751Z test_sparse_csr_unary_inplace_expm1_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4684102Z test_sparse_csr_unary_inplace_expm1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4684458Z test_sparse_csr_unary_inplace_expm1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4684800Z test_sparse_csr_unary_inplace_expm1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4685152Z test_sparse_csr_unary_inplace_expm1_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4685503Z test_sparse_csr_unary_inplace_expm1_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4685849Z test_sparse_csr_unary_inplace_expm1_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4686186Z test_sparse_csr_unary_inplace_expm1_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4686535Z test_sparse_csr_unary_inplace_expm1_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4686886Z test_sparse_csr_unary_inplace_floor_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4687240Z test_sparse_csr_unary_inplace_floor_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4687581Z test_sparse_csr_unary_inplace_floor_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4687929Z test_sparse_csr_unary_inplace_floor_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4688310Z test_sparse_csr_unary_inplace_floor_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4688648Z test_sparse_csr_unary_inplace_floor_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4688995Z test_sparse_csr_unary_inplace_floor_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4689339Z test_sparse_csr_unary_inplace_floor_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4689688Z test_sparse_csr_unary_inplace_floor_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4690025Z test_sparse_csr_unary_inplace_frac_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4690375Z test_sparse_csr_unary_inplace_frac_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4690727Z test_sparse_csr_unary_inplace_frac_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4691071Z test_sparse_csr_unary_inplace_frac_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4691478Z test_sparse_csr_unary_inplace_isinf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4691930Z test_sparse_csr_unary_inplace_isinf_cuda_bool (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4692374Z test_sparse_csr_unary_inplace_isinf_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4692813Z test_sparse_csr_unary_inplace_isinf_cuda_complex32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4693274Z test_sparse_csr_unary_inplace_isinf_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4693750Z test_sparse_csr_unary_inplace_isinf_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4694187Z test_sparse_csr_unary_inplace_isinf_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4694763Z test_sparse_csr_unary_inplace_isinf_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4695186Z test_sparse_csr_unary_inplace_isinf_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4695617Z test_sparse_csr_unary_inplace_isinf_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4696044Z test_sparse_csr_unary_inplace_isinf_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4696477Z test_sparse_csr_unary_inplace_isinf_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4696908Z test_sparse_csr_unary_inplace_isinf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4697351Z test_sparse_csr_unary_inplace_isnan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4697789Z test_sparse_csr_unary_inplace_isnan_cuda_bool (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4698233Z test_sparse_csr_unary_inplace_isnan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4698719Z test_sparse_csr_unary_inplace_isnan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4699156Z test_sparse_csr_unary_inplace_isnan_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4699625Z test_sparse_csr_unary_inplace_isnan_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4700060Z test_sparse_csr_unary_inplace_isnan_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4700480Z test_sparse_csr_unary_inplace_isnan_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4700910Z test_sparse_csr_unary_inplace_isnan_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4701342Z test_sparse_csr_unary_inplace_isnan_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4701773Z test_sparse_csr_unary_inplace_isnan_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4702194Z test_sparse_csr_unary_inplace_isnan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4702639Z test_sparse_csr_unary_inplace_isneginf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
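The runs of `skip: Skipped! Inplace variant not supported!` above are expected behavior rather than failures: predicates such as isnan, isinf, isneginf and isposinf return a bool tensor, so an in-place variant is impossible (the result dtype cannot match the input dtype) and the OpInfo harness skips those dtype combinations. A quick check of that premise (my own snippet, not harness code):

    import torch

    # No in-place spellings exist for boolean-valued predicates.
    print(hasattr(torch.Tensor, "isnan_"))  # False
    print(hasattr(torch.Tensor, "isinf_"))  # False

    # The out-of-place forms return a new bool tensor.
    print(torch.isnan(torch.tensor([1.0, float("nan")])))  # tensor([False,  True])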
2023-01-11T21:45:10.4703086Z test_sparse_csr_unary_inplace_isneginf_cuda_bool (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4703528Z test_sparse_csr_unary_inplace_isneginf_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4703965Z test_sparse_csr_unary_inplace_isneginf_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4704407Z test_sparse_csr_unary_inplace_isneginf_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4704853Z test_sparse_csr_unary_inplace_isneginf_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4705315Z test_sparse_csr_unary_inplace_isneginf_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4705743Z test_sparse_csr_unary_inplace_isneginf_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4706175Z test_sparse_csr_unary_inplace_isneginf_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4706613Z test_sparse_csr_unary_inplace_isneginf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4707056Z test_sparse_csr_unary_inplace_isposinf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4707495Z test_sparse_csr_unary_inplace_isposinf_cuda_bool (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4707942Z test_sparse_csr_unary_inplace_isposinf_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4708382Z test_sparse_csr_unary_inplace_isposinf_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4708822Z test_sparse_csr_unary_inplace_isposinf_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4709251Z test_sparse_csr_unary_inplace_isposinf_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4709688Z test_sparse_csr_unary_inplace_isposinf_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4710147Z test_sparse_csr_unary_inplace_isposinf_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4710584Z test_sparse_csr_unary_inplace_isposinf_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4711022Z test_sparse_csr_unary_inplace_isposinf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4711407Z test_sparse_csr_unary_inplace_log1p_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4711765Z test_sparse_csr_unary_inplace_log1p_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4712131Z test_sparse_csr_unary_inplace_log1p_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4712484Z test_sparse_csr_unary_inplace_log1p_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4712845Z test_sparse_csr_unary_inplace_log1p_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4713199Z test_sparse_csr_unary_inplace_log1p_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4713558Z test_sparse_csr_unary_inplace_log1p_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4713900Z test_sparse_csr_unary_inplace_log1p_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4714247Z test_sparse_csr_unary_inplace_log1p_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4714590Z test_sparse_csr_unary_inplace_log1p_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4714933Z test_sparse_csr_unary_inplace_log1p_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4715270Z test_sparse_csr_unary_inplace_log1p_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4715626Z test_sparse_csr_unary_inplace_neg_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4715988Z test_sparse_csr_unary_inplace_neg_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4716370Z test_sparse_csr_unary_inplace_neg_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.154s)
2023-01-11T21:45:10.4716726Z test_sparse_csr_unary_inplace_neg_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.151s)
2023-01-11T21:45:10.4717079Z test_sparse_csr_unary_inplace_neg_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4717426Z test_sparse_csr_unary_inplace_neg_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4717760Z test_sparse_csr_unary_inplace_neg_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4718105Z test_sparse_csr_unary_inplace_neg_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4718451Z test_sparse_csr_unary_inplace_neg_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4718784Z test_sparse_csr_unary_inplace_neg_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4719125Z test_sparse_csr_unary_inplace_neg_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4719473Z test_sparse_csr_unary_inplace_neg_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4719888Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4720348Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4720809Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4721268Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4721768Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4722217Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4722663Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4723117Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4723567Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4724016Z test_sparse_csr_unary_inplace_positive_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4724463Z test_sparse_csr_unary_inplace_positive_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4724918Z test_sparse_csr_unary_inplace_positive_cuda_complex32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4725368Z test_sparse_csr_unary_inplace_positive_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4725815Z test_sparse_csr_unary_inplace_positive_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4726248Z test_sparse_csr_unary_inplace_positive_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4726686Z test_sparse_csr_unary_inplace_positive_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4727155Z test_sparse_csr_unary_inplace_positive_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4727597Z test_sparse_csr_unary_inplace_positive_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4728023Z test_sparse_csr_unary_inplace_positive_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4728461Z test_sparse_csr_unary_inplace_positive_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4728896Z test_sparse_csr_unary_inplace_positive_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4729295Z test_sparse_csr_unary_inplace_rad2deg_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4729652Z test_sparse_csr_unary_inplace_rad2deg_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4730011Z test_sparse_csr_unary_inplace_rad2deg_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4730372Z test_sparse_csr_unary_inplace_rad2deg_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4730733Z test_sparse_csr_unary_inplace_rad2deg_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4731081Z test_sparse_csr_unary_inplace_rad2deg_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4731433Z test_sparse_csr_unary_inplace_rad2deg_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4731788Z test_sparse_csr_unary_inplace_rad2deg_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4732133Z test_sparse_csr_unary_inplace_rad2deg_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4732513Z test_sparse_csr_unary_inplace_rad2deg_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4732873Z test_sparse_csr_unary_inplace_round_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4733230Z test_sparse_csr_unary_inplace_round_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4733583Z test_sparse_csr_unary_inplace_round_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4733973Z test_sparse_csr_unary_inplace_round_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4734324Z test_sparse_csr_unary_inplace_round_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4734853Z test_sparse_csr_unary_inplace_round_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4735238Z test_sparse_csr_unary_inplace_round_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4735589Z test_sparse_csr_unary_inplace_round_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4735934Z test_sparse_csr_unary_inplace_round_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4736281Z test_sparse_csr_unary_inplace_sgn_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4736635Z test_sparse_csr_unary_inplace_sgn_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4736989Z test_sparse_csr_unary_inplace_sgn_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4737349Z test_sparse_csr_unary_inplace_sgn_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.192s)
2023-01-11T21:45:10.4737696Z test_sparse_csr_unary_inplace_sgn_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.188s)
2023-01-11T21:45:10.4738046Z test_sparse_csr_unary_inplace_sgn_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4738392Z test_sparse_csr_unary_inplace_sgn_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4738794Z test_sparse_csr_unary_inplace_sgn_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4739200Z test_sparse_csr_unary_inplace_sgn_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4739558Z test_sparse_csr_unary_inplace_sgn_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4739903Z test_sparse_csr_unary_inplace_sgn_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4740245Z test_sparse_csr_unary_inplace_sgn_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4740597Z test_sparse_csr_unary_inplace_sgn_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4740950Z test_sparse_csr_unary_inplace_sign_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4741302Z test_sparse_csr_unary_inplace_sign_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4741465Z test_sparse_csr_unary_inplace_sign_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4741636Z test_sparse_csr_unary_inplace_sign_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4741805Z test_sparse_csr_unary_inplace_sign_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4741972Z test_sparse_csr_unary_inplace_sign_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4742138Z test_sparse_csr_unary_inplace_sign_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4742300Z test_sparse_csr_unary_inplace_sign_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4742465Z test_sparse_csr_unary_inplace_sign_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4742629Z test_sparse_csr_unary_inplace_sign_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4742876Z test_sparse_csr_unary_inplace_signbit_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4743097Z test_sparse_csr_unary_inplace_signbit_cuda_bool (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4743316Z test_sparse_csr_unary_inplace_signbit_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4743534Z test_sparse_csr_unary_inplace_signbit_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4743747Z test_sparse_csr_unary_inplace_signbit_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4743961Z test_sparse_csr_unary_inplace_signbit_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4744181Z test_sparse_csr_unary_inplace_signbit_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4744399Z test_sparse_csr_unary_inplace_signbit_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4744611Z test_sparse_csr_unary_inplace_signbit_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4744825Z test_sparse_csr_unary_inplace_signbit_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4744992Z test_sparse_csr_unary_inplace_sin_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4745148Z test_sparse_csr_unary_inplace_sin_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4745322Z test_sparse_csr_unary_inplace_sin_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4745497Z test_sparse_csr_unary_inplace_sin_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4745667Z test_sparse_csr_unary_inplace_sin_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4745856Z test_sparse_csr_unary_inplace_sin_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4746023Z test_sparse_csr_unary_inplace_sin_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4746186Z test_sparse_csr_unary_inplace_sin_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4746350Z test_sparse_csr_unary_inplace_sin_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4746507Z test_sparse_csr_unary_inplace_sin_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4746664Z test_sparse_csr_unary_inplace_sin_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4746832Z test_sparse_csr_unary_inplace_sin_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4746995Z test_sparse_csr_unary_inplace_sin_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4747166Z test_sparse_csr_unary_inplace_sinh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4747326Z test_sparse_csr_unary_inplace_sinh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4747502Z test_sparse_csr_unary_inplace_sinh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4747673Z test_sparse_csr_unary_inplace_sinh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4747845Z test_sparse_csr_unary_inplace_sinh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4748003Z test_sparse_csr_unary_inplace_sinh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4748172Z test_sparse_csr_unary_inplace_sinh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4748363Z test_sparse_csr_unary_inplace_sinh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4748531Z test_sparse_csr_unary_inplace_sinh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4748696Z test_sparse_csr_unary_inplace_sinh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4748855Z test_sparse_csr_unary_inplace_sinh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4749019Z test_sparse_csr_unary_inplace_sinh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4749178Z test_sparse_csr_unary_inplace_sinh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4749340Z test_sparse_csr_unary_inplace_sqrt_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4749505Z test_sparse_csr_unary_inplace_sqrt_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4749679Z test_sparse_csr_unary_inplace_sqrt_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.418s)
2023-01-11T21:45:10.4749854Z test_sparse_csr_unary_inplace_sqrt_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.430s)
2023-01-11T21:45:10.4750029Z test_sparse_csr_unary_inplace_sqrt_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.424s)
2023-01-11T21:45:10.4750193Z test_sparse_csr_unary_inplace_sqrt_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4750356Z test_sparse_csr_unary_inplace_sqrt_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4750524Z test_sparse_csr_unary_inplace_sqrt_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4750690Z test_sparse_csr_unary_inplace_sqrt_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4750847Z test_sparse_csr_unary_inplace_sqrt_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4751012Z test_sparse_csr_unary_inplace_sqrt_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4751171Z test_sparse_csr_unary_inplace_sqrt_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4751356Z test_sparse_csr_unary_inplace_sqrt_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4751521Z test_sparse_csr_unary_inplace_tan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4751685Z test_sparse_csr_unary_inplace_tan_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4751856Z test_sparse_csr_unary_inplace_tan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4752026Z test_sparse_csr_unary_inplace_tan_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4752195Z test_sparse_csr_unary_inplace_tan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4752358Z test_sparse_csr_unary_inplace_tan_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4752524Z test_sparse_csr_unary_inplace_tan_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4752694Z test_sparse_csr_unary_inplace_tan_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4752857Z test_sparse_csr_unary_inplace_tan_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4753018Z test_sparse_csr_unary_inplace_tan_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4753175Z test_sparse_csr_unary_inplace_tan_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4753338Z test_sparse_csr_unary_inplace_tan_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4753503Z test_sparse_csr_unary_inplace_tan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4753662Z test_sparse_csr_unary_inplace_tanh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4753848Z test_sparse_csr_unary_inplace_tanh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4754023Z test_sparse_csr_unary_inplace_tanh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4754196Z test_sparse_csr_unary_inplace_tanh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4754370Z test_sparse_csr_unary_inplace_tanh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4754535Z test_sparse_csr_unary_inplace_tanh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4754701Z test_sparse_csr_unary_inplace_tanh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4754864Z test_sparse_csr_unary_inplace_tanh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4755029Z test_sparse_csr_unary_inplace_tanh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4755185Z test_sparse_csr_unary_inplace_tanh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4755350Z test_sparse_csr_unary_inplace_tanh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4755515Z test_sparse_csr_unary_inplace_tanh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4755677Z test_sparse_csr_unary_inplace_tanh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4755849Z test_sparse_csr_unary_inplace_trunc_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4756019Z test_sparse_csr_unary_inplace_trunc_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4756188Z test_sparse_csr_unary_inplace_trunc_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4756357Z test_sparse_csr_unary_inplace_trunc_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4756517Z test_sparse_csr_unary_inplace_trunc_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4756682Z test_sparse_csr_unary_inplace_trunc_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4756873Z test_sparse_csr_unary_inplace_trunc_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4757038Z test_sparse_csr_unary_inplace_trunc_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4757201Z test_sparse_csr_unary_inplace_trunc_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4757364Z test_sparse_csr_unary_out_abs_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4757525Z test_sparse_csr_unary_out_abs_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4758153Z test_sparse_csr_unary_out_abs_cuda_complex128 (__main__.TestSparseCSRCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/opinfo/core.py:1068: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Copy.cpp:276.)
2023-01-11T21:45:10.4758258Z return self.op(*args, **kwargs)
2023-01-11T21:45:10.4758334Z ok (0.006s)
2023-01-11T21:45:10.4758496Z test_sparse_csr_unary_out_abs_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4758656Z test_sparse_csr_unary_out_abs_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4758815Z test_sparse_csr_unary_out_abs_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4758975Z test_sparse_csr_unary_out_abs_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4759133Z test_sparse_csr_unary_out_abs_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4759292Z test_sparse_csr_unary_out_abs_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4759444Z test_sparse_csr_unary_out_abs_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4759629Z test_sparse_csr_unary_out_abs_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4759782Z test_sparse_csr_unary_out_abs_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4759941Z test_sparse_csr_unary_out_abs_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4760102Z test_sparse_csr_unary_out_angle_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4760269Z test_sparse_csr_unary_out_angle_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.202s)
2023-01-11T21:45:10.4760435Z test_sparse_csr_unary_out_angle_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.200s)
2023-01-11T21:45:10.4760598Z test_sparse_csr_unary_out_angle_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.197s)
2023-01-11T21:45:10.4760761Z test_sparse_csr_unary_out_angle_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4760927Z test_sparse_csr_unary_out_angle_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
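The UserWarning interleaved with test_sparse_csr_unary_out_abs_cuda_complex128 above is informational, not a failure: abs of a complex tensor produces real values, and the out= test path appears to copy complex data into a real-dtype tensor at some point, which is what trips the warning in Copy.cpp. The same message can be reproduced with a dense one-liner (a minimal repro of the warning only, unrelated to the suite's inputs):

    import torch

    z = torch.tensor([3.0 + 4.0j])
    r = z.to(torch.float64)  # UserWarning: Casting complex values to real discards the imaginary part
    print(r)                 # tensor([3.], dtype=torch.float64)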
2023-01-11T21:45:10.4761077Z test_sparse_csr_unary_out_angle_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4761241Z test_sparse_csr_unary_out_angle_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4761400Z test_sparse_csr_unary_out_angle_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4761561Z test_sparse_csr_unary_out_angle_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4761719Z test_sparse_csr_unary_out_angle_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4761884Z test_sparse_csr_unary_out_asin_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4762045Z test_sparse_csr_unary_out_asin_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4762212Z test_sparse_csr_unary_out_asin_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4762384Z test_sparse_csr_unary_out_asin_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4762564Z test_sparse_csr_unary_out_asin_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4762728Z test_sparse_csr_unary_out_asin_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4762890Z test_sparse_csr_unary_out_asin_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4763056Z test_sparse_csr_unary_out_asin_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4763219Z test_sparse_csr_unary_out_asin_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4763400Z test_sparse_csr_unary_out_asin_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4763577Z test_sparse_csr_unary_out_asin_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4763738Z test_sparse_csr_unary_out_asin_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4763884Z test_sparse_csr_unary_out_asin_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4764054Z test_sparse_csr_unary_out_asinh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4764211Z test_sparse_csr_unary_out_asinh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4764381Z test_sparse_csr_unary_out_asinh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4764546Z test_sparse_csr_unary_out_asinh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4764713Z test_sparse_csr_unary_out_asinh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4764873Z test_sparse_csr_unary_out_asinh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4765035Z test_sparse_csr_unary_out_asinh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4765225Z test_sparse_csr_unary_out_asinh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4765375Z test_sparse_csr_unary_out_asinh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4765538Z test_sparse_csr_unary_out_asinh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4765696Z test_sparse_csr_unary_out_asinh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4765857Z test_sparse_csr_unary_out_asinh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4766014Z test_sparse_csr_unary_out_asinh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4766177Z test_sparse_csr_unary_out_atan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4766336Z test_sparse_csr_unary_out_atan_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4766505Z test_sparse_csr_unary_out_atan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4766661Z test_sparse_csr_unary_out_atan_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4766829Z test_sparse_csr_unary_out_atan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4766990Z test_sparse_csr_unary_out_atan_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4767151Z test_sparse_csr_unary_out_atan_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4767309Z test_sparse_csr_unary_out_atan_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4767468Z test_sparse_csr_unary_out_atan_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4767625Z test_sparse_csr_unary_out_atan_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4767782Z test_sparse_csr_unary_out_atan_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4767943Z test_sparse_csr_unary_out_atan_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4768093Z test_sparse_csr_unary_out_atan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4768280Z test_sparse_csr_unary_out_atanh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4768439Z test_sparse_csr_unary_out_atanh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4768607Z test_sparse_csr_unary_out_atanh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4768773Z test_sparse_csr_unary_out_atanh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4768937Z test_sparse_csr_unary_out_atanh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4769099Z test_sparse_csr_unary_out_atanh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4769264Z test_sparse_csr_unary_out_atanh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4769417Z test_sparse_csr_unary_out_atanh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4769578Z test_sparse_csr_unary_out_atanh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4769736Z test_sparse_csr_unary_out_atanh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4769898Z test_sparse_csr_unary_out_atanh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4770054Z test_sparse_csr_unary_out_atanh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4770213Z test_sparse_csr_unary_out_atanh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4770371Z test_sparse_csr_unary_out_ceil_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4770531Z test_sparse_csr_unary_out_ceil_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4770730Z test_sparse_csr_unary_out_ceil_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4770882Z test_sparse_csr_unary_out_ceil_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4771043Z test_sparse_csr_unary_out_ceil_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4771201Z test_sparse_csr_unary_out_ceil_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4771360Z test_sparse_csr_unary_out_ceil_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4771517Z test_sparse_csr_unary_out_ceil_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4771672Z test_sparse_csr_unary_out_ceil_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4771849Z test_sparse_csr_unary_out_conj_physical_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4772021Z test_sparse_csr_unary_out_conj_physical_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4772194Z test_sparse_csr_unary_out_conj_physical_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4772375Z test_sparse_csr_unary_out_conj_physical_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4772553Z test_sparse_csr_unary_out_conj_physical_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4772724Z test_sparse_csr_unary_out_conj_physical_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4772893Z test_sparse_csr_unary_out_conj_physical_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4773060Z test_sparse_csr_unary_out_conj_physical_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4773230Z test_sparse_csr_unary_out_conj_physical_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4773397Z test_sparse_csr_unary_out_conj_physical_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4773561Z test_sparse_csr_unary_out_conj_physical_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4773744Z test_sparse_csr_unary_out_conj_physical_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4773908Z test_sparse_csr_unary_out_conj_physical_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4774074Z test_sparse_csr_unary_out_deg2rad_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4774239Z test_sparse_csr_unary_out_deg2rad_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4774404Z test_sparse_csr_unary_out_deg2rad_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4774690Z test_sparse_csr_unary_out_deg2rad_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4774855Z test_sparse_csr_unary_out_deg2rad_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4775017Z test_sparse_csr_unary_out_deg2rad_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4775179Z test_sparse_csr_unary_out_deg2rad_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4775329Z test_sparse_csr_unary_out_deg2rad_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4775489Z test_sparse_csr_unary_out_deg2rad_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4775646Z test_sparse_csr_unary_out_deg2rad_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
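conj_physical, which appears in both the inplace and out= lists, is distinct from torch.conj: conj() returns a lazy view with the conjugate bit set, while conj_physical() materializes the negated imaginary parts, which is why it gets its own OpInfo entry. An illustration of the difference (my own example, not from the suite):

    import torch

    z = torch.tensor([1 + 2j, 3 - 4j])

    lazy = z.conj()            # no copy; just flips the conjugate bit
    print(lazy.is_conj())      # True

    phys = z.conj_physical()   # new tensor with imaginary parts negated
    print(phys.is_conj())      # False
    print(phys)                # tensor([1.-2.j, 3.+4.j])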
ok (0.005s) 2023-01-11T21:45:10.4775805Z test_sparse_csr_unary_out_erf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4775963Z test_sparse_csr_unary_out_erf_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4776120Z test_sparse_csr_unary_out_erf_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4776323Z test_sparse_csr_unary_out_erf_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4776477Z test_sparse_csr_unary_out_erf_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4776629Z test_sparse_csr_unary_out_erf_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4776785Z test_sparse_csr_unary_out_erf_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4776937Z test_sparse_csr_unary_out_erf_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4777091Z test_sparse_csr_unary_out_erf_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4777244Z test_sparse_csr_unary_out_erf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4777400Z test_sparse_csr_unary_out_erfinv_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4777564Z test_sparse_csr_unary_out_erfinv_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4777728Z test_sparse_csr_unary_out_erfinv_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4777891Z test_sparse_csr_unary_out_erfinv_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4778044Z test_sparse_csr_unary_out_erfinv_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4778203Z test_sparse_csr_unary_out_erfinv_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4778358Z test_sparse_csr_unary_out_erfinv_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4778573Z test_sparse_csr_unary_out_erfinv_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4778730Z test_sparse_csr_unary_out_erfinv_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4778892Z test_sparse_csr_unary_out_expm1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4779054Z test_sparse_csr_unary_out_expm1_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4779212Z test_sparse_csr_unary_out_expm1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4779393Z test_sparse_csr_unary_out_expm1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4779552Z test_sparse_csr_unary_out_expm1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4779706Z test_sparse_csr_unary_out_expm1_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4779858Z test_sparse_csr_unary_out_expm1_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4780011Z test_sparse_csr_unary_out_expm1_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4780167Z test_sparse_csr_unary_out_expm1_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4780320Z test_sparse_csr_unary_out_expm1_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4780482Z test_sparse_csr_unary_out_floor_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... 
ok (0.005s) 2023-01-11T21:45:10.4780637Z test_sparse_csr_unary_out_floor_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4780794Z test_sparse_csr_unary_out_floor_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4780951Z test_sparse_csr_unary_out_floor_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4781106Z test_sparse_csr_unary_out_floor_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4781260Z test_sparse_csr_unary_out_floor_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4781413Z test_sparse_csr_unary_out_floor_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4781572Z test_sparse_csr_unary_out_floor_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4781753Z test_sparse_csr_unary_out_floor_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4781919Z test_sparse_csr_unary_out_frac_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4782075Z test_sparse_csr_unary_out_frac_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4782237Z test_sparse_csr_unary_out_frac_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4782397Z test_sparse_csr_unary_out_frac_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4782603Z test_sparse_csr_unary_out_isinf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4782800Z test_sparse_csr_unary_out_isinf_cuda_bool (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4783006Z test_sparse_csr_unary_out_isinf_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4783211Z test_sparse_csr_unary_out_isinf_cuda_complex32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4783416Z test_sparse_csr_unary_out_isinf_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4783614Z test_sparse_csr_unary_out_isinf_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4783812Z test_sparse_csr_unary_out_isinf_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4783996Z test_sparse_csr_unary_out_isinf_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4784190Z test_sparse_csr_unary_out_isinf_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4784384Z test_sparse_csr_unary_out_isinf_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4784575Z test_sparse_csr_unary_out_isinf_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4784794Z test_sparse_csr_unary_out_isinf_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4784984Z test_sparse_csr_unary_out_isinf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4785181Z test_sparse_csr_unary_out_isnan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! 
Out not supported (0.001s) 2023-01-11T21:45:10.4785375Z test_sparse_csr_unary_out_isnan_cuda_bool (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4785576Z test_sparse_csr_unary_out_isnan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4785766Z test_sparse_csr_unary_out_isnan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4785966Z test_sparse_csr_unary_out_isnan_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4786166Z test_sparse_csr_unary_out_isnan_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4786359Z test_sparse_csr_unary_out_isnan_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4786555Z test_sparse_csr_unary_out_isnan_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4786748Z test_sparse_csr_unary_out_isnan_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4786936Z test_sparse_csr_unary_out_isnan_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4787129Z test_sparse_csr_unary_out_isnan_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4787349Z test_sparse_csr_unary_out_isnan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4787522Z test_sparse_csr_unary_out_isneginf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4787680Z test_sparse_csr_unary_out_isneginf_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4787849Z test_sparse_csr_unary_out_isneginf_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4788013Z test_sparse_csr_unary_out_isneginf_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4788174Z test_sparse_csr_unary_out_isneginf_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4788338Z test_sparse_csr_unary_out_isneginf_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4788504Z test_sparse_csr_unary_out_isneginf_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4788671Z test_sparse_csr_unary_out_isneginf_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4788835Z test_sparse_csr_unary_out_isneginf_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4788989Z test_sparse_csr_unary_out_isneginf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4789160Z test_sparse_csr_unary_out_isposinf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4789322Z test_sparse_csr_unary_out_isposinf_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4789490Z test_sparse_csr_unary_out_isposinf_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4789653Z test_sparse_csr_unary_out_isposinf_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4789813Z test_sparse_csr_unary_out_isposinf_cuda_float64 (__main__.TestSparseCSRCUDA) ... 
ok (0.005s) 2023-01-11T21:45:10.4789978Z test_sparse_csr_unary_out_isposinf_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4790144Z test_sparse_csr_unary_out_isposinf_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4790329Z test_sparse_csr_unary_out_isposinf_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4790484Z test_sparse_csr_unary_out_isposinf_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4790647Z test_sparse_csr_unary_out_isposinf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4790809Z test_sparse_csr_unary_out_log1p_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4790973Z test_sparse_csr_unary_out_log1p_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4791136Z test_sparse_csr_unary_out_log1p_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4791308Z test_sparse_csr_unary_out_log1p_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4791472Z test_sparse_csr_unary_out_log1p_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4791635Z test_sparse_csr_unary_out_log1p_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4791798Z test_sparse_csr_unary_out_log1p_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4791951Z test_sparse_csr_unary_out_log1p_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4792112Z test_sparse_csr_unary_out_log1p_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4792268Z test_sparse_csr_unary_out_log1p_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4792429Z test_sparse_csr_unary_out_log1p_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4792586Z test_sparse_csr_unary_out_log1p_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4792776Z test_sparse_csr_unary_out_neg_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4792940Z test_sparse_csr_unary_out_neg_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4807776Z test_sparse_csr_unary_out_neg_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4807970Z test_sparse_csr_unary_out_neg_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4808131Z test_sparse_csr_unary_out_neg_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4808284Z test_sparse_csr_unary_out_neg_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4808443Z test_sparse_csr_unary_out_neg_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4808598Z test_sparse_csr_unary_out_neg_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4808756Z test_sparse_csr_unary_out_neg_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4808912Z test_sparse_csr_unary_out_neg_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4809067Z test_sparse_csr_unary_out_neg_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4809222Z test_sparse_csr_unary_out_neg_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4809426Z test_sparse_csr_unary_out_nn_functional_relu_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... 
skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4809633Z test_sparse_csr_unary_out_nn_functional_relu_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4809831Z test_sparse_csr_unary_out_nn_functional_relu_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4810037Z test_sparse_csr_unary_out_nn_functional_relu_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4810243Z test_sparse_csr_unary_out_nn_functional_relu_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4810498Z test_sparse_csr_unary_out_nn_functional_relu_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4810702Z test_sparse_csr_unary_out_nn_functional_relu_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4810902Z test_sparse_csr_unary_out_nn_functional_relu_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4811099Z test_sparse_csr_unary_out_nn_functional_relu_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4811293Z test_sparse_csr_unary_out_positive_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4811490Z test_sparse_csr_unary_out_positive_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4811688Z test_sparse_csr_unary_out_positive_cuda_complex32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4811886Z test_sparse_csr_unary_out_positive_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4812081Z test_sparse_csr_unary_out_positive_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4812274Z test_sparse_csr_unary_out_positive_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4812467Z test_sparse_csr_unary_out_positive_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4812660Z test_sparse_csr_unary_out_positive_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4812879Z test_sparse_csr_unary_out_positive_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4813073Z test_sparse_csr_unary_out_positive_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4813265Z test_sparse_csr_unary_out_positive_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4813447Z test_sparse_csr_unary_out_positive_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4813612Z test_sparse_csr_unary_out_rad2deg_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4813770Z test_sparse_csr_unary_out_rad2deg_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4813932Z test_sparse_csr_unary_out_rad2deg_cuda_float16 (__main__.TestSparseCSRCUDA) ... 
ok (0.005s) 2023-01-11T21:45:10.4814095Z test_sparse_csr_unary_out_rad2deg_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4814249Z test_sparse_csr_unary_out_rad2deg_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4814409Z test_sparse_csr_unary_out_rad2deg_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4814905Z test_sparse_csr_unary_out_rad2deg_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4815080Z test_sparse_csr_unary_out_rad2deg_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4815238Z test_sparse_csr_unary_out_rad2deg_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4815393Z test_sparse_csr_unary_out_rad2deg_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4815554Z test_sparse_csr_unary_out_round_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4815715Z test_sparse_csr_unary_out_round_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4815869Z test_sparse_csr_unary_out_round_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4816081Z test_sparse_csr_unary_out_round_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4816241Z test_sparse_csr_unary_out_round_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4816396Z test_sparse_csr_unary_out_round_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4816540Z test_sparse_csr_unary_out_round_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4816697Z test_sparse_csr_unary_out_round_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4816852Z test_sparse_csr_unary_out_round_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4817010Z test_sparse_csr_unary_out_sgn_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4817169Z test_sparse_csr_unary_out_sgn_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4817333Z test_sparse_csr_unary_out_sgn_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4817494Z test_sparse_csr_unary_out_sgn_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4817651Z test_sparse_csr_unary_out_sgn_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4817799Z test_sparse_csr_unary_out_sgn_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4817952Z test_sparse_csr_unary_out_sgn_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4818100Z test_sparse_csr_unary_out_sgn_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4818255Z test_sparse_csr_unary_out_sgn_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4818409Z test_sparse_csr_unary_out_sgn_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4818706Z test_sparse_csr_unary_out_sgn_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4818871Z test_sparse_csr_unary_out_sgn_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4819025Z test_sparse_csr_unary_out_sgn_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4819179Z test_sparse_csr_unary_out_sign_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... 
ok (0.005s) 2023-01-11T21:45:10.4819326Z test_sparse_csr_unary_out_sign_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4819480Z test_sparse_csr_unary_out_sign_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4819635Z test_sparse_csr_unary_out_sign_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4819790Z test_sparse_csr_unary_out_sign_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4819947Z test_sparse_csr_unary_out_sign_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4820102Z test_sparse_csr_unary_out_sign_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4820260Z test_sparse_csr_unary_out_sign_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4820409Z test_sparse_csr_unary_out_sign_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4820558Z test_sparse_csr_unary_out_sign_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4820722Z test_sparse_csr_unary_out_signbit_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4820878Z test_sparse_csr_unary_out_signbit_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4821036Z test_sparse_csr_unary_out_signbit_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4821195Z test_sparse_csr_unary_out_signbit_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4821355Z test_sparse_csr_unary_out_signbit_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4821511Z test_sparse_csr_unary_out_signbit_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4821695Z test_sparse_csr_unary_out_signbit_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4821850Z test_sparse_csr_unary_out_signbit_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4822001Z test_sparse_csr_unary_out_signbit_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4822158Z test_sparse_csr_unary_out_signbit_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4822314Z test_sparse_csr_unary_out_sin_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4822469Z test_sparse_csr_unary_out_sin_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4822632Z test_sparse_csr_unary_out_sin_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4822787Z test_sparse_csr_unary_out_sin_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4822946Z test_sparse_csr_unary_out_sin_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4823103Z test_sparse_csr_unary_out_sin_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4823255Z test_sparse_csr_unary_out_sin_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4823417Z test_sparse_csr_unary_out_sin_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4823595Z test_sparse_csr_unary_out_sin_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4823763Z test_sparse_csr_unary_out_sin_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4823916Z test_sparse_csr_unary_out_sin_cuda_int64 (__main__.TestSparseCSRCUDA) ... 
ok (0.005s) 2023-01-11T21:45:10.4824093Z test_sparse_csr_unary_out_sin_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4824247Z test_sparse_csr_unary_out_sin_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4824406Z test_sparse_csr_unary_out_sinh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4824557Z test_sparse_csr_unary_out_sinh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4824712Z test_sparse_csr_unary_out_sinh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4824869Z test_sparse_csr_unary_out_sinh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4825027Z test_sparse_csr_unary_out_sinh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4825183Z test_sparse_csr_unary_out_sinh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4825339Z test_sparse_csr_unary_out_sinh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4825494Z test_sparse_csr_unary_out_sinh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4825651Z test_sparse_csr_unary_out_sinh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4825806Z test_sparse_csr_unary_out_sinh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4825955Z test_sparse_csr_unary_out_sinh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4826107Z test_sparse_csr_unary_out_sinh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4826259Z test_sparse_csr_unary_out_sinh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4826417Z test_sparse_csr_unary_out_sqrt_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4826566Z test_sparse_csr_unary_out_sqrt_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4826729Z test_sparse_csr_unary_out_sqrt_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4826892Z test_sparse_csr_unary_out_sqrt_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4827076Z test_sparse_csr_unary_out_sqrt_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4827229Z test_sparse_csr_unary_out_sqrt_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4827377Z test_sparse_csr_unary_out_sqrt_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4827531Z test_sparse_csr_unary_out_sqrt_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4827682Z test_sparse_csr_unary_out_sqrt_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4827836Z test_sparse_csr_unary_out_sqrt_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4827987Z test_sparse_csr_unary_out_sqrt_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4828145Z test_sparse_csr_unary_out_sqrt_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4828297Z test_sparse_csr_unary_out_sqrt_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4828454Z test_sparse_csr_unary_out_tan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4828602Z test_sparse_csr_unary_out_tan_cuda_bool (__main__.TestSparseCSRCUDA) ... 
ok (0.005s) 2023-01-11T21:45:10.4828761Z test_sparse_csr_unary_out_tan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4828918Z test_sparse_csr_unary_out_tan_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4829075Z test_sparse_csr_unary_out_tan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4829229Z test_sparse_csr_unary_out_tan_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4829408Z test_sparse_csr_unary_out_tan_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4829563Z test_sparse_csr_unary_out_tan_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4829718Z test_sparse_csr_unary_out_tan_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4829868Z test_sparse_csr_unary_out_tan_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4830011Z test_sparse_csr_unary_out_tan_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4830162Z test_sparse_csr_unary_out_tan_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4830314Z test_sparse_csr_unary_out_tan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4830467Z test_sparse_csr_unary_out_tanh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4830619Z test_sparse_csr_unary_out_tanh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4830781Z test_sparse_csr_unary_out_tanh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4830941Z test_sparse_csr_unary_out_tanh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4831099Z test_sparse_csr_unary_out_tanh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4831247Z test_sparse_csr_unary_out_tanh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4831402Z test_sparse_csr_unary_out_tanh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4831557Z test_sparse_csr_unary_out_tanh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4831711Z test_sparse_csr_unary_out_tanh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4831862Z test_sparse_csr_unary_out_tanh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4832019Z test_sparse_csr_unary_out_tanh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4832169Z test_sparse_csr_unary_out_tanh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4832344Z test_sparse_csr_unary_out_tanh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4832498Z test_sparse_csr_unary_out_trunc_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4832655Z test_sparse_csr_unary_out_trunc_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4832809Z test_sparse_csr_unary_out_trunc_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4832961Z test_sparse_csr_unary_out_trunc_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4833115Z test_sparse_csr_unary_out_trunc_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4833265Z test_sparse_csr_unary_out_trunc_cuda_int32 (__main__.TestSparseCSRCUDA) ... 
ok (0.005s) 2023-01-11T21:45:10.4833423Z test_sparse_csr_unary_out_trunc_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4833574Z test_sparse_csr_unary_out_trunc_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4833731Z test_sparse_csr_unary_out_trunc_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4833869Z test_sparse_mm_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.008s) 2023-01-11T21:45:10.4834014Z test_sparse_mm_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.008s) 2023-01-11T21:45:10.4834158Z test_sparse_mm_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.009s) 2023-01-11T21:45:10.4834301Z test_sparse_mm_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.009s) 2023-01-11T21:45:10.4834440Z test_sparse_mm_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.008s) 2023-01-11T21:45:10.4834577Z test_sparse_mm_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.008s) 2023-01-11T21:45:10.4834763Z test_sparse_to_sparse_compressed_SparseBSC_cuda_float64 (__main__.TestSparseCSRCUDA) 2023-01-11T21:45:10.4834936Z This test tests conversion from COO to CSR and CSC and CSC to CSR and CSC ... skip: NOT IMPL (0.002s) 2023-01-11T21:45:10.4835100Z test_sparse_to_sparse_compressed_SparseBSR_cuda_float64 (__main__.TestSparseCSRCUDA) 2023-01-11T21:45:10.4835271Z This test tests conversion from COO to CSR and CSC and CSC to CSR and CSC ... skip: NOT IMPL (0.002s) 2023-01-11T21:45:10.4835435Z test_sparse_to_sparse_compressed_SparseCSC_cuda_float64 (__main__.TestSparseCSRCUDA) 2023-01-11T21:45:10.4835583Z This test tests conversion from COO to CSR and CSC and CSC to CSR and CSC ... ok (0.015s) 2023-01-11T21:45:10.4835747Z test_sparse_to_sparse_compressed_SparseCSR_cuda_float64 (__main__.TestSparseCSRCUDA) 2023-01-11T21:45:10.4835896Z This test tests conversion from COO to CSR and CSC and CSC to CSR and CSC ... ok (0.016s) 2023-01-11T21:45:10.4836057Z test_sparse_triangular_solve_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.189s) 2023-01-11T21:45:10.4836220Z test_sparse_triangular_solve_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.188s) 2023-01-11T21:45:10.4836376Z test_sparse_triangular_solve_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.186s) 2023-01-11T21:45:10.4836528Z test_sparse_triangular_solve_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.185s) 2023-01-11T21:45:10.4836665Z test_sum_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.025s) 2023-01-11T21:45:10.4836793Z test_sum_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.024s) 2023-01-11T21:45:10.4836934Z test_sum_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.026s) 2023-01-11T21:45:10.4837072Z test_sum_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.025s) 2023-01-11T21:45:10.4837205Z test_sum_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.026s) 2023-01-11T21:45:10.4837335Z test_sum_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.031s) 2023-01-11T21:45:10.4837462Z test_sum_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.031s) 2023-01-11T21:45:10.4837593Z test_sum_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.023s) 2023-01-11T21:45:10.4837720Z test_sum_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.023s) 2023-01-11T21:45:10.4837869Z test_sum_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.024s) 2023-01-11T21:45:10.4837998Z test_sum_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.023s) 2023-01-11T21:45:10.4838126Z test_sum_cuda_uint8 (__main__.TestSparseCSRCUDA) ... 
ok (0.023s) 2023-01-11T21:45:10.4838283Z test_transpose_SparseBSC_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (3.379s) 2023-01-11T21:45:10.4838433Z test_transpose_SparseBSC_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (3.283s) 2023-01-11T21:45:10.4838587Z test_transpose_SparseBSC_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (3.385s) 2023-01-11T21:45:10.4838745Z test_transpose_SparseBSC_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (3.394s) 2023-01-11T21:45:10.4838902Z test_transpose_SparseBSC_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (3.380s) 2023-01-11T21:45:10.4839054Z test_transpose_SparseBSC_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (3.378s) 2023-01-11T21:45:10.4839205Z test_transpose_SparseBSC_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (3.364s) 2023-01-11T21:45:10.4839356Z test_transpose_SparseBSC_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (3.295s) 2023-01-11T21:45:10.4839504Z test_transpose_SparseBSC_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (3.302s) 2023-01-11T21:45:10.4839650Z test_transpose_SparseBSC_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (3.282s) 2023-01-11T21:45:10.4839798Z test_transpose_SparseBSC_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (3.293s) 2023-01-11T21:45:10.4839945Z test_transpose_SparseBSC_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (3.295s) 2023-01-11T21:45:10.4840098Z test_transpose_SparseBSR_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (3.451s) 2023-01-11T21:45:10.4840270Z test_transpose_SparseBSR_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (3.361s) 2023-01-11T21:45:10.4840428Z test_transpose_SparseBSR_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (3.469s) 2023-01-11T21:45:10.4840586Z test_transpose_SparseBSR_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (3.474s) 2023-01-11T21:45:10.4840742Z test_transpose_SparseBSR_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (3.451s) 2023-01-11T21:45:10.4840894Z test_transpose_SparseBSR_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (3.452s) 2023-01-11T21:45:10.4841042Z test_transpose_SparseBSR_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (3.440s) 2023-01-11T21:45:10.4841186Z test_transpose_SparseBSR_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (3.370s) 2023-01-11T21:45:10.4841333Z test_transpose_SparseBSR_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (3.370s) 2023-01-11T21:45:10.4841477Z test_transpose_SparseBSR_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (3.363s) 2023-01-11T21:45:10.4841626Z test_transpose_SparseBSR_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (3.374s) 2023-01-11T21:45:10.4841775Z test_transpose_SparseBSR_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (3.369s) 2023-01-11T21:45:10.4841933Z test_transpose_SparseCSC_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (1.590s) 2023-01-11T21:45:10.4842081Z test_transpose_SparseCSC_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (1.545s) 2023-01-11T21:45:10.4842236Z test_transpose_SparseCSC_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (1.600s) 2023-01-11T21:45:10.4842387Z test_transpose_SparseCSC_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (1.601s) 2023-01-11T21:45:10.4842541Z test_transpose_SparseCSC_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (1.588s) 2023-01-11T21:45:10.4842689Z test_transpose_SparseCSC_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (1.586s) 2023-01-11T21:45:10.4842836Z test_transpose_SparseCSC_cuda_float64 (__main__.TestSparseCSRCUDA) ... 
ok (1.583s) 2023-01-11T21:45:10.4842989Z test_transpose_SparseCSC_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (1.545s) 2023-01-11T21:45:10.4843137Z test_transpose_SparseCSC_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (1.546s) 2023-01-11T21:45:10.4843304Z test_transpose_SparseCSC_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (1.543s) 2023-01-11T21:45:10.4843455Z test_transpose_SparseCSC_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (1.546s) 2023-01-11T21:45:10.4843602Z test_transpose_SparseCSC_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (1.550s) 2023-01-11T21:45:10.4843756Z test_transpose_SparseCSR_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (1.645s) 2023-01-11T21:45:10.4843907Z test_transpose_SparseCSR_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (1.597s) 2023-01-11T21:45:10.4844066Z test_transpose_SparseCSR_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (1.653s) 2023-01-11T21:45:10.4844223Z test_transpose_SparseCSR_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (1.657s) 2023-01-11T21:45:10.4844376Z test_transpose_SparseCSR_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (1.645s) 2023-01-11T21:45:10.4844523Z test_transpose_SparseCSR_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (1.640s) 2023-01-11T21:45:10.4844672Z test_transpose_SparseCSR_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (1.641s) 2023-01-11T21:45:10.4844821Z test_transpose_SparseCSR_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (1.603s) 2023-01-11T21:45:10.4844963Z test_transpose_SparseCSR_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (1.601s) 2023-01-11T21:45:10.4845109Z test_transpose_SparseCSR_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (1.596s) 2023-01-11T21:45:10.4845260Z test_transpose_SparseCSR_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (1.603s) 2023-01-11T21:45:10.4845410Z test_transpose_SparseCSR_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (1.601s) 2023-01-11T21:45:10.4845586Z test_zero_to_zero_correspondence_unary_abs_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4845794Z test_zero_to_zero_correspondence_unary_abs_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4845971Z test_zero_to_zero_correspondence_unary_abs_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4846148Z test_zero_to_zero_correspondence_unary_abs_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4846321Z test_zero_to_zero_correspondence_unary_abs_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4846496Z test_zero_to_zero_correspondence_unary_abs_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4846671Z test_zero_to_zero_correspondence_unary_abs_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4846838Z test_zero_to_zero_correspondence_unary_abs_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4847009Z test_zero_to_zero_correspondence_unary_abs_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4847176Z test_zero_to_zero_correspondence_unary_abs_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4847340Z test_zero_to_zero_correspondence_unary_abs_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4847511Z test_zero_to_zero_correspondence_unary_abs_cuda_int8 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4847675Z test_zero_to_zero_correspondence_unary_abs_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4847843Z test_zero_to_zero_correspondence_unary_angle_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4848024Z test_zero_to_zero_correspondence_unary_angle_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4848205Z test_zero_to_zero_correspondence_unary_angle_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4848382Z test_zero_to_zero_correspondence_unary_angle_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4848563Z test_zero_to_zero_correspondence_unary_angle_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4848770Z test_zero_to_zero_correspondence_unary_angle_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4848942Z test_zero_to_zero_correspondence_unary_angle_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4849115Z test_zero_to_zero_correspondence_unary_angle_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4849285Z test_zero_to_zero_correspondence_unary_angle_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4849452Z test_zero_to_zero_correspondence_unary_angle_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4849620Z test_zero_to_zero_correspondence_unary_angle_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4849795Z test_zero_to_zero_correspondence_unary_asin_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4849969Z test_zero_to_zero_correspondence_unary_asin_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4850153Z test_zero_to_zero_correspondence_unary_asin_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4850330Z test_zero_to_zero_correspondence_unary_asin_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4850503Z test_zero_to_zero_correspondence_unary_asin_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4850679Z test_zero_to_zero_correspondence_unary_asin_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4850851Z test_zero_to_zero_correspondence_unary_asin_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4851019Z test_zero_to_zero_correspondence_unary_asin_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4851217Z test_zero_to_zero_correspondence_unary_asin_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4851389Z test_zero_to_zero_correspondence_unary_asin_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4851563Z test_zero_to_zero_correspondence_unary_asin_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4851732Z test_zero_to_zero_correspondence_unary_asin_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4851899Z test_zero_to_zero_correspondence_unary_asin_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4852074Z test_zero_to_zero_correspondence_unary_asinh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4852241Z test_zero_to_zero_correspondence_unary_asinh_cuda_bool (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4852422Z test_zero_to_zero_correspondence_unary_asinh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4852596Z test_zero_to_zero_correspondence_unary_asinh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4852779Z test_zero_to_zero_correspondence_unary_asinh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4852956Z test_zero_to_zero_correspondence_unary_asinh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4853125Z test_zero_to_zero_correspondence_unary_asinh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4853301Z test_zero_to_zero_correspondence_unary_asinh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4853499Z test_zero_to_zero_correspondence_unary_asinh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4853695Z test_zero_to_zero_correspondence_unary_asinh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4853868Z test_zero_to_zero_correspondence_unary_asinh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4854040Z test_zero_to_zero_correspondence_unary_asinh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4854224Z test_zero_to_zero_correspondence_unary_asinh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4854404Z test_zero_to_zero_correspondence_unary_atan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4854686Z test_zero_to_zero_correspondence_unary_atan_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4854866Z test_zero_to_zero_correspondence_unary_atan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4855045Z test_zero_to_zero_correspondence_unary_atan_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4855221Z test_zero_to_zero_correspondence_unary_atan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4855407Z test_zero_to_zero_correspondence_unary_atan_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4855584Z test_zero_to_zero_correspondence_unary_atan_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4855763Z test_zero_to_zero_correspondence_unary_atan_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4855926Z test_zero_to_zero_correspondence_unary_atan_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4856091Z test_zero_to_zero_correspondence_unary_atan_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4856257Z test_zero_to_zero_correspondence_unary_atan_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4856430Z test_zero_to_zero_correspondence_unary_atan_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4856598Z test_zero_to_zero_correspondence_unary_atan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4856829Z test_zero_to_zero_correspondence_unary_atanh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4856999Z test_zero_to_zero_correspondence_unary_atanh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4857184Z test_zero_to_zero_correspondence_unary_atanh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4857364Z test_zero_to_zero_correspondence_unary_atanh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4857535Z test_zero_to_zero_correspondence_unary_atanh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4857709Z test_zero_to_zero_correspondence_unary_atanh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4857881Z test_zero_to_zero_correspondence_unary_atanh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4858054Z test_zero_to_zero_correspondence_unary_atanh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4858234Z test_zero_to_zero_correspondence_unary_atanh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4858406Z test_zero_to_zero_correspondence_unary_atanh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4858627Z test_zero_to_zero_correspondence_unary_atanh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4858801Z test_zero_to_zero_correspondence_unary_atanh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4858971Z test_zero_to_zero_correspondence_unary_atanh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4859137Z test_zero_to_zero_correspondence_unary_ceil_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4859314Z test_zero_to_zero_correspondence_unary_ceil_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4859487Z test_zero_to_zero_correspondence_unary_ceil_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4859659Z test_zero_to_zero_correspondence_unary_ceil_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4859863Z test_zero_to_zero_correspondence_unary_ceil_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4860031Z test_zero_to_zero_correspondence_unary_ceil_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4860195Z test_zero_to_zero_correspondence_unary_ceil_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4860364Z test_zero_to_zero_correspondence_unary_ceil_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4860534Z test_zero_to_zero_correspondence_unary_ceil_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4860712Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4860898Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4861088Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4861277Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4861456Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4861639Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4861821Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_float32 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4862005Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4862213Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4862389Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4862571Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4862750Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4862929Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4863116Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4863312Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4863519Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4863700Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4863878Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4864047Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4864217Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4864391Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4864566Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4864745Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4864921Z test_zero_to_zero_correspondence_unary_erf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4865091Z test_zero_to_zero_correspondence_unary_erf_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4865288Z test_zero_to_zero_correspondence_unary_erf_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4865460Z test_zero_to_zero_correspondence_unary_erf_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4865626Z test_zero_to_zero_correspondence_unary_erf_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4865796Z test_zero_to_zero_correspondence_unary_erf_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4865967Z test_zero_to_zero_correspondence_unary_erf_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4866133Z test_zero_to_zero_correspondence_unary_erf_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4866303Z test_zero_to_zero_correspondence_unary_erf_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4866466Z test_zero_to_zero_correspondence_unary_erf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4866639Z test_zero_to_zero_correspondence_unary_erfinv_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4866817Z test_zero_to_zero_correspondence_unary_erfinv_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4866995Z test_zero_to_zero_correspondence_unary_erfinv_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4867164Z test_zero_to_zero_correspondence_unary_erfinv_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4867338Z test_zero_to_zero_correspondence_unary_erfinv_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4867511Z test_zero_to_zero_correspondence_unary_erfinv_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4867707Z test_zero_to_zero_correspondence_unary_erfinv_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4867881Z test_zero_to_zero_correspondence_unary_erfinv_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4868053Z test_zero_to_zero_correspondence_unary_erfinv_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4868231Z test_zero_to_zero_correspondence_unary_expm1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4868403Z test_zero_to_zero_correspondence_unary_expm1_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4868577Z test_zero_to_zero_correspondence_unary_expm1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4868745Z test_zero_to_zero_correspondence_unary_expm1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4868921Z test_zero_to_zero_correspondence_unary_expm1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4869095Z test_zero_to_zero_correspondence_unary_expm1_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4869268Z test_zero_to_zero_correspondence_unary_expm1_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4869433Z test_zero_to_zero_correspondence_unary_expm1_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4869602Z test_zero_to_zero_correspondence_unary_expm1_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4869772Z test_zero_to_zero_correspondence_unary_expm1_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4869949Z test_zero_to_zero_correspondence_unary_floor_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4870123Z test_zero_to_zero_correspondence_unary_floor_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4870289Z test_zero_to_zero_correspondence_unary_floor_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4870460Z test_zero_to_zero_correspondence_unary_floor_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4870652Z test_zero_to_zero_correspondence_unary_floor_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4870827Z test_zero_to_zero_correspondence_unary_floor_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4870996Z test_zero_to_zero_correspondence_unary_floor_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4871166Z test_zero_to_zero_correspondence_unary_floor_cuda_int8 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4871336Z test_zero_to_zero_correspondence_unary_floor_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4871510Z test_zero_to_zero_correspondence_unary_frac_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4871688Z test_zero_to_zero_correspondence_unary_frac_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4871854Z test_zero_to_zero_correspondence_unary_frac_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4872024Z test_zero_to_zero_correspondence_unary_frac_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4872200Z test_zero_to_zero_correspondence_unary_isinf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4872372Z test_zero_to_zero_correspondence_unary_isinf_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4872551Z test_zero_to_zero_correspondence_unary_isinf_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4872728Z test_zero_to_zero_correspondence_unary_isinf_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4872908Z test_zero_to_zero_correspondence_unary_isinf_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4873110Z test_zero_to_zero_correspondence_unary_isinf_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4873286Z test_zero_to_zero_correspondence_unary_isinf_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4873450Z test_zero_to_zero_correspondence_unary_isinf_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4873620Z test_zero_to_zero_correspondence_unary_isinf_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4873786Z test_zero_to_zero_correspondence_unary_isinf_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4873957Z test_zero_to_zero_correspondence_unary_isinf_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4874128Z test_zero_to_zero_correspondence_unary_isinf_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4874300Z test_zero_to_zero_correspondence_unary_isinf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4874475Z test_zero_to_zero_correspondence_unary_isnan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4874645Z test_zero_to_zero_correspondence_unary_isnan_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4874825Z test_zero_to_zero_correspondence_unary_isnan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4874996Z test_zero_to_zero_correspondence_unary_isnan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4875167Z test_zero_to_zero_correspondence_unary_isnan_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4875336Z test_zero_to_zero_correspondence_unary_isnan_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4875507Z test_zero_to_zero_correspondence_unary_isnan_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4875677Z test_zero_to_zero_correspondence_unary_isnan_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4875847Z test_zero_to_zero_correspondence_unary_isnan_cuda_int32 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4876039Z test_zero_to_zero_correspondence_unary_isnan_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4876208Z test_zero_to_zero_correspondence_unary_isnan_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4876375Z test_zero_to_zero_correspondence_unary_isnan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4876551Z test_zero_to_zero_correspondence_unary_isneginf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4876727Z test_zero_to_zero_correspondence_unary_isneginf_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4876906Z test_zero_to_zero_correspondence_unary_isneginf_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4877090Z test_zero_to_zero_correspondence_unary_isneginf_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4877271Z test_zero_to_zero_correspondence_unary_isneginf_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4877448Z test_zero_to_zero_correspondence_unary_isneginf_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4877629Z test_zero_to_zero_correspondence_unary_isneginf_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4877804Z test_zero_to_zero_correspondence_unary_isneginf_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4877980Z test_zero_to_zero_correspondence_unary_isneginf_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4878150Z test_zero_to_zero_correspondence_unary_isneginf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4878355Z test_zero_to_zero_correspondence_unary_isposinf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4878528Z test_zero_to_zero_correspondence_unary_isposinf_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4878709Z test_zero_to_zero_correspondence_unary_isposinf_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4878886Z test_zero_to_zero_correspondence_unary_isposinf_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4879062Z test_zero_to_zero_correspondence_unary_isposinf_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4879234Z test_zero_to_zero_correspondence_unary_isposinf_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4879407Z test_zero_to_zero_correspondence_unary_isposinf_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4879580Z test_zero_to_zero_correspondence_unary_isposinf_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4879748Z test_zero_to_zero_correspondence_unary_isposinf_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4879925Z test_zero_to_zero_correspondence_unary_isposinf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4880097Z test_zero_to_zero_correspondence_unary_log1p_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4880269Z test_zero_to_zero_correspondence_unary_log1p_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4880448Z test_zero_to_zero_correspondence_unary_log1p_cuda_complex128 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4880628Z test_zero_to_zero_correspondence_unary_log1p_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4880800Z test_zero_to_zero_correspondence_unary_log1p_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4880975Z test_zero_to_zero_correspondence_unary_log1p_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4881146Z test_zero_to_zero_correspondence_unary_log1p_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4881333Z test_zero_to_zero_correspondence_unary_log1p_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4881502Z test_zero_to_zero_correspondence_unary_log1p_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4881676Z test_zero_to_zero_correspondence_unary_log1p_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4881847Z test_zero_to_zero_correspondence_unary_log1p_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4882015Z test_zero_to_zero_correspondence_unary_log1p_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4882188Z test_zero_to_zero_correspondence_unary_neg_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4882364Z test_zero_to_zero_correspondence_unary_neg_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4882540Z test_zero_to_zero_correspondence_unary_neg_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4882719Z test_zero_to_zero_correspondence_unary_neg_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4882880Z test_zero_to_zero_correspondence_unary_neg_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4883053Z test_zero_to_zero_correspondence_unary_neg_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4883229Z test_zero_to_zero_correspondence_unary_neg_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4883427Z test_zero_to_zero_correspondence_unary_neg_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4883609Z test_zero_to_zero_correspondence_unary_neg_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4883800Z test_zero_to_zero_correspondence_unary_neg_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4883973Z test_zero_to_zero_correspondence_unary_neg_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4884144Z test_zero_to_zero_correspondence_unary_neg_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4884334Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4884518Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4884703Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4884892Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4885082Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_int16 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4885263Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4885444Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4885632Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4885818Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4885999Z test_zero_to_zero_correspondence_unary_positive_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4886187Z test_zero_to_zero_correspondence_unary_positive_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4886365Z test_zero_to_zero_correspondence_unary_positive_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4886545Z test_zero_to_zero_correspondence_unary_positive_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4886749Z test_zero_to_zero_correspondence_unary_positive_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4886928Z test_zero_to_zero_correspondence_unary_positive_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4887100Z test_zero_to_zero_correspondence_unary_positive_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4887276Z test_zero_to_zero_correspondence_unary_positive_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4887453Z test_zero_to_zero_correspondence_unary_positive_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4887631Z test_zero_to_zero_correspondence_unary_positive_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4887807Z test_zero_to_zero_correspondence_unary_positive_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4887977Z test_zero_to_zero_correspondence_unary_positive_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4888156Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4888333Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4888509Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4888685Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4888857Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4889059Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4889237Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4889413Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4889580Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4889749Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_uint8 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4889920Z test_zero_to_zero_correspondence_unary_round_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4890093Z test_zero_to_zero_correspondence_unary_round_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4890265Z test_zero_to_zero_correspondence_unary_round_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4890441Z test_zero_to_zero_correspondence_unary_round_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4890614Z test_zero_to_zero_correspondence_unary_round_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4890782Z test_zero_to_zero_correspondence_unary_round_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4890954Z test_zero_to_zero_correspondence_unary_round_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4891119Z test_zero_to_zero_correspondence_unary_round_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4891288Z test_zero_to_zero_correspondence_unary_round_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4891461Z test_zero_to_zero_correspondence_unary_sgn_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4891630Z test_zero_to_zero_correspondence_unary_sgn_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4891810Z test_zero_to_zero_correspondence_unary_sgn_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4892006Z test_zero_to_zero_correspondence_unary_sgn_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4892181Z test_zero_to_zero_correspondence_unary_sgn_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4892351Z test_zero_to_zero_correspondence_unary_sgn_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4892520Z test_zero_to_zero_correspondence_unary_sgn_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4892680Z test_zero_to_zero_correspondence_unary_sgn_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4892854Z test_zero_to_zero_correspondence_unary_sgn_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4893023Z test_zero_to_zero_correspondence_unary_sgn_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4893189Z test_zero_to_zero_correspondence_unary_sgn_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4893364Z test_zero_to_zero_correspondence_unary_sgn_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4893534Z test_zero_to_zero_correspondence_unary_sgn_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4893707Z test_zero_to_zero_correspondence_unary_sign_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4893876Z test_zero_to_zero_correspondence_unary_sign_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4894049Z test_zero_to_zero_correspondence_unary_sign_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4894211Z test_zero_to_zero_correspondence_unary_sign_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4894420Z test_zero_to_zero_correspondence_unary_sign_cuda_float64 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4894765Z test_zero_to_zero_correspondence_unary_sign_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4894977Z test_zero_to_zero_correspondence_unary_sign_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4895144Z test_zero_to_zero_correspondence_unary_sign_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4895315Z test_zero_to_zero_correspondence_unary_sign_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4895483Z test_zero_to_zero_correspondence_unary_sign_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4895661Z test_zero_to_zero_correspondence_unary_signbit_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4895835Z test_zero_to_zero_correspondence_unary_signbit_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4896008Z test_zero_to_zero_correspondence_unary_signbit_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4896185Z test_zero_to_zero_correspondence_unary_signbit_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4896362Z test_zero_to_zero_correspondence_unary_signbit_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4896534Z test_zero_to_zero_correspondence_unary_signbit_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4896711Z test_zero_to_zero_correspondence_unary_signbit_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4896882Z test_zero_to_zero_correspondence_unary_signbit_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4897053Z test_zero_to_zero_correspondence_unary_signbit_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4897224Z test_zero_to_zero_correspondence_unary_signbit_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4897396Z test_zero_to_zero_correspondence_unary_sin_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4897631Z test_zero_to_zero_correspondence_unary_sin_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4897808Z test_zero_to_zero_correspondence_unary_sin_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4897980Z test_zero_to_zero_correspondence_unary_sin_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4898152Z test_zero_to_zero_correspondence_unary_sin_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4898321Z test_zero_to_zero_correspondence_unary_sin_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4898493Z test_zero_to_zero_correspondence_unary_sin_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4898735Z test_zero_to_zero_correspondence_unary_sin_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4898903Z test_zero_to_zero_correspondence_unary_sin_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4899072Z test_zero_to_zero_correspondence_unary_sin_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4899228Z test_zero_to_zero_correspondence_unary_sin_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4899395Z test_zero_to_zero_correspondence_unary_sin_cuda_int8 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4899562Z test_zero_to_zero_correspondence_unary_sin_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4899740Z test_zero_to_zero_correspondence_unary_sinh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4899904Z test_zero_to_zero_correspondence_unary_sinh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4900120Z test_zero_to_zero_correspondence_unary_sinh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4900299Z test_zero_to_zero_correspondence_unary_sinh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4900474Z test_zero_to_zero_correspondence_unary_sinh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4900646Z test_zero_to_zero_correspondence_unary_sinh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4900814Z test_zero_to_zero_correspondence_unary_sinh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4900989Z test_zero_to_zero_correspondence_unary_sinh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4901158Z test_zero_to_zero_correspondence_unary_sinh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4901328Z test_zero_to_zero_correspondence_unary_sinh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4901498Z test_zero_to_zero_correspondence_unary_sinh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4901672Z test_zero_to_zero_correspondence_unary_sinh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4901839Z test_zero_to_zero_correspondence_unary_sinh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4902013Z test_zero_to_zero_correspondence_unary_sqrt_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4902182Z test_zero_to_zero_correspondence_unary_sqrt_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4902355Z test_zero_to_zero_correspondence_unary_sqrt_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4902532Z test_zero_to_zero_correspondence_unary_sqrt_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4902708Z test_zero_to_zero_correspondence_unary_sqrt_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4902884Z test_zero_to_zero_correspondence_unary_sqrt_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4903078Z test_zero_to_zero_correspondence_unary_sqrt_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4903254Z test_zero_to_zero_correspondence_unary_sqrt_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4903427Z test_zero_to_zero_correspondence_unary_sqrt_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4903593Z test_zero_to_zero_correspondence_unary_sqrt_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4903758Z test_zero_to_zero_correspondence_unary_sqrt_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4903923Z test_zero_to_zero_correspondence_unary_sqrt_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4904095Z test_zero_to_zero_correspondence_unary_sqrt_cuda_uint8 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4904269Z test_zero_to_zero_correspondence_unary_tan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4904445Z test_zero_to_zero_correspondence_unary_tan_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4904624Z test_zero_to_zero_correspondence_unary_tan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4904804Z test_zero_to_zero_correspondence_unary_tan_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4904980Z test_zero_to_zero_correspondence_unary_tan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4905152Z test_zero_to_zero_correspondence_unary_tan_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4905317Z test_zero_to_zero_correspondence_unary_tan_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4905513Z test_zero_to_zero_correspondence_unary_tan_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4905683Z test_zero_to_zero_correspondence_unary_tan_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4905852Z test_zero_to_zero_correspondence_unary_tan_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4906017Z test_zero_to_zero_correspondence_unary_tan_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4906185Z test_zero_to_zero_correspondence_unary_tan_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4906355Z test_zero_to_zero_correspondence_unary_tan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4906530Z test_zero_to_zero_correspondence_unary_tanh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4906698Z test_zero_to_zero_correspondence_unary_tanh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4906874Z test_zero_to_zero_correspondence_unary_tanh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4907051Z test_zero_to_zero_correspondence_unary_tanh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4907227Z test_zero_to_zero_correspondence_unary_tanh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4907399Z test_zero_to_zero_correspondence_unary_tanh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4907571Z test_zero_to_zero_correspondence_unary_tanh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4907742Z test_zero_to_zero_correspondence_unary_tanh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4907914Z test_zero_to_zero_correspondence_unary_tanh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4908081Z test_zero_to_zero_correspondence_unary_tanh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4908254Z test_zero_to_zero_correspondence_unary_tanh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4908438Z test_zero_to_zero_correspondence_unary_tanh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4908607Z test_zero_to_zero_correspondence_unary_tanh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4908784Z test_zero_to_zero_correspondence_unary_trunc_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4908960Z test_zero_to_zero_correspondence_unary_trunc_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4909128Z test_zero_to_zero_correspondence_unary_trunc_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4909301Z test_zero_to_zero_correspondence_unary_trunc_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4909475Z test_zero_to_zero_correspondence_unary_trunc_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4909647Z test_zero_to_zero_correspondence_unary_trunc_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4909821Z test_zero_to_zero_correspondence_unary_trunc_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4909983Z test_zero_to_zero_correspondence_unary_trunc_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4910150Z test_zero_to_zero_correspondence_unary_trunc_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4910296Z test_make_crow_indices (__main__.TestSparseCSRSampler) ... ok (0.411s) 2023-01-11T21:45:10.4910467Z test_clone_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.188s) 2023-01-11T21:45:10.4910630Z test_clone_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.177s) 2023-01-11T21:45:10.4910800Z test_clone_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.185s) 2023-01-11T21:45:10.4910993Z test_clone_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.185s) 2023-01-11T21:45:10.4911161Z test_clone_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.186s) 2023-01-11T21:45:10.4911327Z test_clone_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.185s) 2023-01-11T21:45:10.4911479Z test_clone_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.183s) 2023-01-11T21:45:10.4911642Z test_clone_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.178s) 2023-01-11T21:45:10.4911800Z test_clone_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.177s) 2023-01-11T21:45:10.4911955Z test_clone_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.174s) 2023-01-11T21:45:10.4912117Z test_clone_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.175s) 2023-01-11T21:45:10.4912275Z test_clone_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.178s) 2023-01-11T21:45:10.4912442Z test_clone_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.143s) 2023-01-11T21:45:10.4912603Z test_clone_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.134s) 2023-01-11T21:45:10.4912766Z test_clone_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.144s) 2023-01-11T21:45:10.4912933Z test_clone_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.145s) 2023-01-11T21:45:10.4913095Z test_clone_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.143s) 2023-01-11T21:45:10.4913256Z test_clone_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.143s) 2023-01-11T21:45:10.4913418Z test_clone_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.142s) 2023-01-11T21:45:10.4913580Z test_clone_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.135s) 2023-01-11T21:45:10.4913741Z test_clone_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... 
ok (0.135s) 2023-01-11T21:45:10.4913894Z test_clone_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.133s) 2023-01-11T21:45:10.4914069Z test_clone_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.135s) 2023-01-11T21:45:10.4914226Z test_clone_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.134s) 2023-01-11T21:45:10.4914387Z test_clone_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.123s) 2023-01-11T21:45:10.4914549Z test_clone_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.4914714Z test_clone_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.121s) 2023-01-11T21:45:10.4914875Z test_clone_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.122s) 2023-01-11T21:45:10.4915034Z test_clone_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.121s) 2023-01-11T21:45:10.4915195Z test_clone_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.121s) 2023-01-11T21:45:10.4915356Z test_clone_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.119s) 2023-01-11T21:45:10.4915511Z test_clone_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.112s) 2023-01-11T21:45:10.4915665Z test_clone_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.4915817Z test_clone_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.112s) 2023-01-11T21:45:10.4915974Z test_clone_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.113s) 2023-01-11T21:45:10.4916128Z test_clone_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.4916290Z test_clone_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.088s) 2023-01-11T21:45:10.4916447Z test_clone_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.079s) 2023-01-11T21:45:10.4916640Z test_clone_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.089s) 2023-01-11T21:45:10.4916797Z test_clone_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.090s) 2023-01-11T21:45:10.4916957Z test_clone_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.087s) 2023-01-11T21:45:10.4917119Z test_clone_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.087s) 2023-01-11T21:45:10.4917275Z test_clone_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.086s) 2023-01-11T21:45:10.4917434Z test_clone_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.080s) 2023-01-11T21:45:10.4917588Z test_clone_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.080s) 2023-01-11T21:45:10.4917742Z test_clone_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.078s) 2023-01-11T21:45:10.4917900Z test_clone_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.079s) 2023-01-11T21:45:10.4918056Z test_clone_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.079s) 2023-01-11T21:45:10.4918230Z test_consistency_SparseBSC_abs_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4918401Z test_consistency_SparseBSC_abs_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4918585Z test_consistency_SparseBSC_abs_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.4918760Z test_consistency_SparseBSC_abs_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4918932Z test_consistency_SparseBSC_abs_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4919109Z test_consistency_SparseBSC_abs_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4919283Z test_consistency_SparseBSC_abs_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4919459Z test_consistency_SparseBSC_abs_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4919650Z test_consistency_SparseBSC_abs_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4919824Z test_consistency_SparseBSC_abs_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4920000Z test_consistency_SparseBSC_abs_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4920170Z test_consistency_SparseBSC_abs_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4920338Z test_consistency_SparseBSC_abs_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4920515Z test_consistency_SparseBSC_angle_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4920696Z test_consistency_SparseBSC_angle_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4920877Z test_consistency_SparseBSC_angle_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4921053Z test_consistency_SparseBSC_angle_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4921224Z test_consistency_SparseBSC_angle_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4921394Z test_consistency_SparseBSC_angle_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4921566Z test_consistency_SparseBSC_angle_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4921739Z test_consistency_SparseBSC_angle_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4921911Z test_consistency_SparseBSC_angle_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4922084Z test_consistency_SparseBSC_angle_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4922290Z test_consistency_SparseBSC_angle_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4922469Z test_consistency_SparseBSC_asin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4922643Z test_consistency_SparseBSC_asin_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4922813Z test_consistency_SparseBSC_asin_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4922989Z test_consistency_SparseBSC_asin_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4923162Z test_consistency_SparseBSC_asin_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4923346Z test_consistency_SparseBSC_asin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4923553Z test_consistency_SparseBSC_asin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.4923739Z test_consistency_SparseBSC_asin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4923915Z test_consistency_SparseBSC_asin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4924088Z test_consistency_SparseBSC_asin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4924258Z test_consistency_SparseBSC_asin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4924422Z test_consistency_SparseBSC_asin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4924590Z test_consistency_SparseBSC_asin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4924771Z test_consistency_SparseBSC_asinh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4924940Z test_consistency_SparseBSC_asinh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4925123Z test_consistency_SparseBSC_asinh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4925324Z test_consistency_SparseBSC_asinh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4925501Z test_consistency_SparseBSC_asinh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4925676Z test_consistency_SparseBSC_asinh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4925839Z test_consistency_SparseBSC_asinh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4926013Z test_consistency_SparseBSC_asinh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4926185Z test_consistency_SparseBSC_asinh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4926355Z test_consistency_SparseBSC_asinh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4926532Z test_consistency_SparseBSC_asinh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4926705Z test_consistency_SparseBSC_asinh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4926874Z test_consistency_SparseBSC_asinh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4927048Z test_consistency_SparseBSC_atan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4927219Z test_consistency_SparseBSC_atan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4927389Z test_consistency_SparseBSC_atan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4927568Z test_consistency_SparseBSC_atan_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4927749Z test_consistency_SparseBSC_atan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4927947Z test_consistency_SparseBSC_atan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4928118Z test_consistency_SparseBSC_atan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4928285Z test_consistency_SparseBSC_atan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4928455Z test_consistency_SparseBSC_atan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.4928626Z test_consistency_SparseBSC_atan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4928794Z test_consistency_SparseBSC_atan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4928958Z test_consistency_SparseBSC_atan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4929127Z test_consistency_SparseBSC_atan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4929308Z test_consistency_SparseBSC_atanh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4929479Z test_consistency_SparseBSC_atanh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4929661Z test_consistency_SparseBSC_atanh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4929843Z test_consistency_SparseBSC_atanh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4930018Z test_consistency_SparseBSC_atanh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4930189Z test_consistency_SparseBSC_atanh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4930365Z test_consistency_SparseBSC_atanh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4930529Z test_consistency_SparseBSC_atanh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4930702Z test_consistency_SparseBSC_atanh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4930903Z test_consistency_SparseBSC_atanh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4931077Z test_consistency_SparseBSC_atanh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4931244Z test_consistency_SparseBSC_atanh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4931411Z test_consistency_SparseBSC_atanh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4931586Z test_consistency_SparseBSC_ceil_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4931758Z test_consistency_SparseBSC_ceil_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4931927Z test_consistency_SparseBSC_ceil_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4932098Z test_consistency_SparseBSC_ceil_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4932270Z test_consistency_SparseBSC_ceil_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4932439Z test_consistency_SparseBSC_ceil_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4932608Z test_consistency_SparseBSC_ceil_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4932779Z test_consistency_SparseBSC_ceil_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4932944Z test_consistency_SparseBSC_ceil_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4933132Z test_consistency_SparseBSC_conj_physical_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4933315Z test_consistency_SparseBSC_conj_physical_cuda_bool (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4933534Z test_consistency_SparseBSC_conj_physical_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4933739Z test_consistency_SparseBSC_conj_physical_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4933961Z test_consistency_SparseBSC_conj_physical_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4934149Z test_consistency_SparseBSC_conj_physical_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4934332Z test_consistency_SparseBSC_conj_physical_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4934642Z test_consistency_SparseBSC_conj_physical_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4934834Z test_consistency_SparseBSC_conj_physical_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4935022Z test_consistency_SparseBSC_conj_physical_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4935206Z test_consistency_SparseBSC_conj_physical_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4935388Z test_consistency_SparseBSC_conj_physical_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4935567Z test_consistency_SparseBSC_conj_physical_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4935748Z test_consistency_SparseBSC_deg2rad_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4935925Z test_consistency_SparseBSC_deg2rad_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4936105Z test_consistency_SparseBSC_deg2rad_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4936279Z test_consistency_SparseBSC_deg2rad_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4936454Z test_consistency_SparseBSC_deg2rad_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4936629Z test_consistency_SparseBSC_deg2rad_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4936841Z test_consistency_SparseBSC_deg2rad_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4937004Z test_consistency_SparseBSC_deg2rad_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4937176Z test_consistency_SparseBSC_deg2rad_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4937343Z test_consistency_SparseBSC_deg2rad_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4937524Z test_consistency_SparseBSC_erf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4937695Z test_consistency_SparseBSC_erf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4937870Z test_consistency_SparseBSC_erf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4938042Z test_consistency_SparseBSC_erf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4938215Z test_consistency_SparseBSC_erf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4938385Z test_consistency_SparseBSC_erf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.4938607Z test_consistency_SparseBSC_erf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4938786Z test_consistency_SparseBSC_erf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4938956Z test_consistency_SparseBSC_erf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4939123Z test_consistency_SparseBSC_erf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4939330Z test_consistency_SparseBSC_erfinv_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4939511Z test_consistency_SparseBSC_erfinv_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4939687Z test_consistency_SparseBSC_erfinv_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4939859Z test_consistency_SparseBSC_erfinv_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4940032Z test_consistency_SparseBSC_erfinv_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4940200Z test_consistency_SparseBSC_erfinv_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4940368Z test_consistency_SparseBSC_erfinv_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4940539Z test_consistency_SparseBSC_erfinv_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4940716Z test_consistency_SparseBSC_erfinv_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4940894Z test_consistency_SparseBSC_expm1_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4941071Z test_consistency_SparseBSC_expm1_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4941245Z test_consistency_SparseBSC_expm1_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4941414Z test_consistency_SparseBSC_expm1_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4941583Z test_consistency_SparseBSC_expm1_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4941747Z test_consistency_SparseBSC_expm1_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4941920Z test_consistency_SparseBSC_expm1_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4942090Z test_consistency_SparseBSC_expm1_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4942264Z test_consistency_SparseBSC_expm1_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4942459Z test_consistency_SparseBSC_expm1_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4942637Z test_consistency_SparseBSC_floor_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4942809Z test_consistency_SparseBSC_floor_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4942978Z test_consistency_SparseBSC_floor_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4943140Z test_consistency_SparseBSC_floor_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4943310Z test_consistency_SparseBSC_floor_cuda_int16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4943505Z test_consistency_SparseBSC_floor_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4943699Z test_consistency_SparseBSC_floor_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4943873Z test_consistency_SparseBSC_floor_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4944043Z test_consistency_SparseBSC_floor_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4944219Z test_consistency_SparseBSC_frac_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4944391Z test_consistency_SparseBSC_frac_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4944564Z test_consistency_SparseBSC_frac_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4944727Z test_consistency_SparseBSC_frac_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4944927Z test_consistency_SparseBSC_isinf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4945097Z test_consistency_SparseBSC_isinf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4945280Z test_consistency_SparseBSC_isinf_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4945458Z test_consistency_SparseBSC_isinf_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4945636Z test_consistency_SparseBSC_isinf_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4945807Z test_consistency_SparseBSC_isinf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4945977Z test_consistency_SparseBSC_isinf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4946143Z test_consistency_SparseBSC_isinf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4946309Z test_consistency_SparseBSC_isinf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4946480Z test_consistency_SparseBSC_isinf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4946651Z test_consistency_SparseBSC_isinf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4946821Z test_consistency_SparseBSC_isinf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4946989Z test_consistency_SparseBSC_isinf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4947166Z test_consistency_SparseBSC_isnan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4947337Z test_consistency_SparseBSC_isnan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4947517Z test_consistency_SparseBSC_isnan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4947700Z test_consistency_SparseBSC_isnan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4947865Z test_consistency_SparseBSC_isnan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4948058Z test_consistency_SparseBSC_isnan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4948229Z test_consistency_SparseBSC_isnan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4948399Z test_consistency_SparseBSC_isnan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4948570Z test_consistency_SparseBSC_isnan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4948738Z test_consistency_SparseBSC_isnan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4948909Z test_consistency_SparseBSC_isnan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4949086Z test_consistency_SparseBSC_isnan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4949269Z test_consistency_SparseBSC_isneginf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4949442Z test_consistency_SparseBSC_isneginf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4949623Z test_consistency_SparseBSC_isneginf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4949802Z test_consistency_SparseBSC_isneginf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4949981Z test_consistency_SparseBSC_isneginf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4950159Z test_consistency_SparseBSC_isneginf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4950331Z test_consistency_SparseBSC_isneginf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4950529Z test_consistency_SparseBSC_isneginf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4950708Z test_consistency_SparseBSC_isneginf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4950885Z test_consistency_SparseBSC_isneginf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4951059Z test_consistency_SparseBSC_isposinf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4951234Z test_consistency_SparseBSC_isposinf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4951411Z test_consistency_SparseBSC_isposinf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4951585Z test_consistency_SparseBSC_isposinf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4951765Z test_consistency_SparseBSC_isposinf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4951943Z test_consistency_SparseBSC_isposinf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4952118Z test_consistency_SparseBSC_isposinf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4952288Z test_consistency_SparseBSC_isposinf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4952456Z test_consistency_SparseBSC_isposinf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4952638Z test_consistency_SparseBSC_isposinf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4952812Z test_consistency_SparseBSC_log1p_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4952985Z test_consistency_SparseBSC_log1p_cuda_bool (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s)
2023-01-11T21:45:10.4953166Z test_consistency_SparseBSC_log1p_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4953362Z test_consistency_SparseBSC_log1p_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4953596Z test_consistency_SparseBSC_log1p_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4953767Z test_consistency_SparseBSC_log1p_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4953935Z test_consistency_SparseBSC_log1p_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4954099Z test_consistency_SparseBSC_log1p_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4954269Z test_consistency_SparseBSC_log1p_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4954441Z test_consistency_SparseBSC_log1p_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4954612Z test_consistency_SparseBSC_log1p_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4954788Z test_consistency_SparseBSC_log1p_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4955046Z test_consistency_SparseBSC_masked_amax_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4955301Z test_consistency_SparseBSC_masked_amax_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4955556Z test_consistency_SparseBSC_masked_amax_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4955802Z test_consistency_SparseBSC_masked_amax_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4956078Z test_consistency_SparseBSC_masked_amax_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4956326Z test_consistency_SparseBSC_masked_amax_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4956559Z test_consistency_SparseBSC_masked_amax_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4956805Z test_consistency_SparseBSC_masked_amax_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4957045Z test_consistency_SparseBSC_masked_amax_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4957296Z test_consistency_SparseBSC_masked_amin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4957546Z test_consistency_SparseBSC_masked_amin_cuda_float16 (__main__.TestSparseCompressedCUDA) ...
skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4957790Z test_consistency_SparseBSC_masked_amin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4958037Z test_consistency_SparseBSC_masked_amin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4958284Z test_consistency_SparseBSC_masked_amin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4958528Z test_consistency_SparseBSC_masked_amin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4958796Z test_consistency_SparseBSC_masked_amin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4959044Z test_consistency_SparseBSC_masked_amin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4959286Z test_consistency_SparseBSC_masked_amin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4959535Z test_consistency_SparseBSC_masked_mean_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4959769Z test_consistency_SparseBSC_masked_mean_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4960030Z test_consistency_SparseBSC_masked_mean_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4960282Z test_consistency_SparseBSC_masked_mean_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4960527Z test_consistency_SparseBSC_masked_mean_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4960776Z test_consistency_SparseBSC_masked_mean_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4961014Z test_consistency_SparseBSC_masked_mean_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4961283Z test_consistency_SparseBSC_masked_mean_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4961523Z test_consistency_SparseBSC_masked_mean_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4961759Z test_consistency_SparseBSC_masked_mean_cuda_int64 (__main__.TestSparseCompressedCUDA) ...
skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4961999Z test_consistency_SparseBSC_masked_mean_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4962239Z test_consistency_SparseBSC_masked_mean_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4962492Z test_consistency_SparseBSC_masked_prod_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4962725Z test_consistency_SparseBSC_masked_prod_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4962977Z test_consistency_SparseBSC_masked_prod_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4963231Z test_consistency_SparseBSC_masked_prod_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4963475Z test_consistency_SparseBSC_masked_prod_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4963723Z test_consistency_SparseBSC_masked_prod_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4963985Z test_consistency_SparseBSC_masked_prod_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4964231Z test_consistency_SparseBSC_masked_prod_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4964470Z test_consistency_SparseBSC_masked_prod_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4964708Z test_consistency_SparseBSC_masked_prod_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4964953Z test_consistency_SparseBSC_masked_prod_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4965196Z test_consistency_SparseBSC_masked_prod_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4965446Z test_consistency_SparseBSC_masked_sum_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4965691Z test_consistency_SparseBSC_masked_sum_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4965937Z test_consistency_SparseBSC_masked_sum_cuda_complex128 (__main__.TestSparseCompressedCUDA) ...
skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4966268Z test_consistency_SparseBSC_masked_sum_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4966513Z test_consistency_SparseBSC_masked_sum_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4966754Z test_consistency_SparseBSC_masked_sum_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4966991Z test_consistency_SparseBSC_masked_sum_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4967233Z test_consistency_SparseBSC_masked_sum_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4967479Z test_consistency_SparseBSC_masked_sum_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4967725Z test_consistency_SparseBSC_masked_sum_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4967967Z test_consistency_SparseBSC_masked_sum_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4968212Z test_consistency_SparseBSC_masked_sum_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4968390Z test_consistency_SparseBSC_neg_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4968570Z test_consistency_SparseBSC_neg_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4968744Z test_consistency_SparseBSC_neg_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4968948Z test_consistency_SparseBSC_neg_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4969122Z test_consistency_SparseBSC_neg_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4969298Z test_consistency_SparseBSC_neg_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4969473Z test_consistency_SparseBSC_neg_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4969644Z test_consistency_SparseBSC_neg_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4969814Z test_consistency_SparseBSC_neg_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4969984Z test_consistency_SparseBSC_neg_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4970157Z test_consistency_SparseBSC_neg_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4970320Z test_consistency_SparseBSC_neg_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4970514Z test_consistency_SparseBSC_nn_functional_relu_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.4970703Z test_consistency_SparseBSC_nn_functional_relu_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4970894Z test_consistency_SparseBSC_nn_functional_relu_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4971083Z test_consistency_SparseBSC_nn_functional_relu_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4971268Z test_consistency_SparseBSC_nn_functional_relu_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4971492Z test_consistency_SparseBSC_nn_functional_relu_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4971677Z test_consistency_SparseBSC_nn_functional_relu_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4971865Z test_consistency_SparseBSC_nn_functional_relu_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4972042Z test_consistency_SparseBSC_nn_functional_relu_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4972225Z test_consistency_SparseBSC_positive_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4972410Z test_consistency_SparseBSC_positive_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4972596Z test_consistency_SparseBSC_positive_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4972779Z test_consistency_SparseBSC_positive_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4972960Z test_consistency_SparseBSC_positive_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4973143Z test_consistency_SparseBSC_positive_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4973321Z test_consistency_SparseBSC_positive_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4973499Z test_consistency_SparseBSC_positive_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4973665Z test_consistency_SparseBSC_positive_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4973869Z test_consistency_SparseBSC_positive_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4974067Z test_consistency_SparseBSC_positive_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4974240Z test_consistency_SparseBSC_positive_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4974420Z test_consistency_SparseBSC_rad2deg_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4974838Z test_consistency_SparseBSC_rad2deg_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4975030Z test_consistency_SparseBSC_rad2deg_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4975205Z test_consistency_SparseBSC_rad2deg_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4975380Z test_consistency_SparseBSC_rad2deg_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4975547Z test_consistency_SparseBSC_rad2deg_cuda_int16 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.4975717Z test_consistency_SparseBSC_rad2deg_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4975895Z test_consistency_SparseBSC_rad2deg_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4976076Z test_consistency_SparseBSC_rad2deg_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4976254Z test_consistency_SparseBSC_rad2deg_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4976506Z test_consistency_SparseBSC_randn_like_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4976760Z test_consistency_SparseBSC_randn_like_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4977011Z test_consistency_SparseBSC_randn_like_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4977260Z test_consistency_SparseBSC_randn_like_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4977552Z test_consistency_SparseBSC_randn_like_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4977793Z test_consistency_SparseBSC_randn_like_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4978031Z test_consistency_SparseBSC_randn_like_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4978212Z test_consistency_SparseBSC_round_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4978389Z test_consistency_SparseBSC_round_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4978629Z test_consistency_SparseBSC_round_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4978810Z test_consistency_SparseBSC_round_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4978988Z test_consistency_SparseBSC_round_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4979162Z test_consistency_SparseBSC_round_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4979335Z test_consistency_SparseBSC_round_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4979511Z test_consistency_SparseBSC_round_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4979679Z test_consistency_SparseBSC_round_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4979856Z test_consistency_SparseBSC_sgn_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4980030Z test_consistency_SparseBSC_sgn_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4980210Z test_consistency_SparseBSC_sgn_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4980413Z test_consistency_SparseBSC_sgn_cuda_complex32 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.4980587Z test_consistency_SparseBSC_sgn_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4980761Z test_consistency_SparseBSC_sgn_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4980934Z test_consistency_SparseBSC_sgn_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4981102Z test_consistency_SparseBSC_sgn_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4981275Z test_consistency_SparseBSC_sgn_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4981449Z test_consistency_SparseBSC_sgn_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4981624Z test_consistency_SparseBSC_sgn_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4981798Z test_consistency_SparseBSC_sgn_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4981968Z test_consistency_SparseBSC_sgn_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4982146Z test_consistency_SparseBSC_sign_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4982317Z test_consistency_SparseBSC_sign_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4982493Z test_consistency_SparseBSC_sign_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4982663Z test_consistency_SparseBSC_sign_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4982834Z test_consistency_SparseBSC_sign_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4983037Z test_consistency_SparseBSC_sign_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4983213Z test_consistency_SparseBSC_sign_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4983388Z test_consistency_SparseBSC_sign_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4983559Z test_consistency_SparseBSC_sign_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4983730Z test_consistency_SparseBSC_sign_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4983913Z test_consistency_SparseBSC_signbit_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4984095Z test_consistency_SparseBSC_signbit_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4984270Z test_consistency_SparseBSC_signbit_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4984453Z test_consistency_SparseBSC_signbit_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4984631Z test_consistency_SparseBSC_signbit_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4984811Z test_consistency_SparseBSC_signbit_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4984984Z test_consistency_SparseBSC_signbit_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4985155Z test_consistency_SparseBSC_signbit_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4985328Z test_consistency_SparseBSC_signbit_cuda_int8 (__main__.TestSparseCompressedCUDA) ...
ok (0.002s)
2023-01-11T21:45:10.4985504Z test_consistency_SparseBSC_signbit_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4985674Z test_consistency_SparseBSC_sin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4985843Z test_consistency_SparseBSC_sin_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4986049Z test_consistency_SparseBSC_sin_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4986223Z test_consistency_SparseBSC_sin_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4986394Z test_consistency_SparseBSC_sin_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4986568Z test_consistency_SparseBSC_sin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4986740Z test_consistency_SparseBSC_sin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4986918Z test_consistency_SparseBSC_sin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4987089Z test_consistency_SparseBSC_sin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4987258Z test_consistency_SparseBSC_sin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4987431Z test_consistency_SparseBSC_sin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4987603Z test_consistency_SparseBSC_sin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4987775Z test_consistency_SparseBSC_sin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4987952Z test_consistency_SparseBSC_sinh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4988120Z test_consistency_SparseBSC_sinh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4988300Z test_consistency_SparseBSC_sinh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4988479Z test_consistency_SparseBSC_sinh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4988681Z test_consistency_SparseBSC_sinh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4988853Z test_consistency_SparseBSC_sinh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4989024Z test_consistency_SparseBSC_sinh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4989194Z test_consistency_SparseBSC_sinh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4989369Z test_consistency_SparseBSC_sinh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4989542Z test_consistency_SparseBSC_sinh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4989713Z test_consistency_SparseBSC_sinh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4989885Z test_consistency_SparseBSC_sinh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4990062Z test_consistency_SparseBSC_sinh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4990238Z test_consistency_SparseBSC_sqrt_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.4990406Z test_consistency_SparseBSC_sqrt_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4990588Z test_consistency_SparseBSC_sqrt_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4990767Z test_consistency_SparseBSC_sqrt_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4990943Z test_consistency_SparseBSC_sqrt_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4991119Z test_consistency_SparseBSC_sqrt_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4991289Z test_consistency_SparseBSC_sqrt_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4991466Z test_consistency_SparseBSC_sqrt_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4991638Z test_consistency_SparseBSC_sqrt_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4991832Z test_consistency_SparseBSC_sqrt_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4991999Z test_consistency_SparseBSC_sqrt_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4992172Z test_consistency_SparseBSC_sqrt_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4992342Z test_consistency_SparseBSC_sqrt_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4992516Z test_consistency_SparseBSC_tan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4992689Z test_consistency_SparseBSC_tan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4992870Z test_consistency_SparseBSC_tan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4993047Z test_consistency_SparseBSC_tan_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4993231Z test_consistency_SparseBSC_tan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4993417Z test_consistency_SparseBSC_tan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4993620Z test_consistency_SparseBSC_tan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4993800Z test_consistency_SparseBSC_tan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4993971Z test_consistency_SparseBSC_tan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4994144Z test_consistency_SparseBSC_tan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4994346Z test_consistency_SparseBSC_tan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4994517Z test_consistency_SparseBSC_tan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4994688Z test_consistency_SparseBSC_tan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4994867Z test_consistency_SparseBSC_tanh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4995031Z test_consistency_SparseBSC_tanh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4995212Z test_consistency_SparseBSC_tanh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.4995392Z test_consistency_SparseBSC_tanh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4995569Z test_consistency_SparseBSC_tanh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4995745Z test_consistency_SparseBSC_tanh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4995922Z test_consistency_SparseBSC_tanh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4996094Z test_consistency_SparseBSC_tanh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4996267Z test_consistency_SparseBSC_tanh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4996440Z test_consistency_SparseBSC_tanh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4996606Z test_consistency_SparseBSC_tanh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4996774Z test_consistency_SparseBSC_tanh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4996948Z test_consistency_SparseBSC_tanh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4997201Z test_consistency_SparseBSC_to_sparse_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4997473Z test_consistency_SparseBSC_to_sparse_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4997726Z test_consistency_SparseBSC_to_sparse_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4997977Z test_consistency_SparseBSC_to_sparse_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4998223Z test_consistency_SparseBSC_to_sparse_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4998464Z test_consistency_SparseBSC_to_sparse_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4998714Z test_consistency_SparseBSC_to_sparse_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4998956Z test_consistency_SparseBSC_to_sparse_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4999186Z test_consistency_SparseBSC_to_sparse_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4999431Z test_consistency_SparseBSC_to_sparse_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4999673Z test_consistency_SparseBSC_to_sparse_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4999937Z test_consistency_SparseBSC_to_sparse_cuda_uint8 (__main__.TestSparseCompressedCUDA) ...
skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.5000116Z test_consistency_SparseBSC_trunc_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5000290Z test_consistency_SparseBSC_trunc_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5000460Z test_consistency_SparseBSC_trunc_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5000637Z test_consistency_SparseBSC_trunc_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5000807Z test_consistency_SparseBSC_trunc_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5000982Z test_consistency_SparseBSC_trunc_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5001151Z test_consistency_SparseBSC_trunc_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5001323Z test_consistency_SparseBSC_trunc_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5001492Z test_consistency_SparseBSC_trunc_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5001665Z test_consistency_SparseBSR_abs_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5001833Z test_consistency_SparseBSR_abs_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5002009Z test_consistency_SparseBSR_abs_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5002182Z test_consistency_SparseBSR_abs_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5002354Z test_consistency_SparseBSR_abs_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5002531Z test_consistency_SparseBSR_abs_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5002719Z test_consistency_SparseBSR_abs_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5002892Z test_consistency_SparseBSR_abs_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5003064Z test_consistency_SparseBSR_abs_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5003234Z test_consistency_SparseBSR_abs_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5003401Z test_consistency_SparseBSR_abs_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5003569Z test_consistency_SparseBSR_abs_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5003737Z test_consistency_SparseBSR_abs_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5003914Z test_consistency_SparseBSR_angle_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5004093Z test_consistency_SparseBSR_angle_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5004272Z test_consistency_SparseBSR_angle_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5004445Z test_consistency_SparseBSR_angle_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5004617Z test_consistency_SparseBSR_angle_cuda_float32 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.5004787Z test_consistency_SparseBSR_angle_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5004957Z test_consistency_SparseBSR_angle_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5005131Z test_consistency_SparseBSR_angle_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5005332Z test_consistency_SparseBSR_angle_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5005510Z test_consistency_SparseBSR_angle_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5005676Z test_consistency_SparseBSR_angle_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5005854Z test_consistency_SparseBSR_asin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5006027Z test_consistency_SparseBSR_asin_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5006206Z test_consistency_SparseBSR_asin_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5006385Z test_consistency_SparseBSR_asin_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5006559Z test_consistency_SparseBSR_asin_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5006739Z test_consistency_SparseBSR_asin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5006912Z test_consistency_SparseBSR_asin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5007082Z test_consistency_SparseBSR_asin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5007249Z test_consistency_SparseBSR_asin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5007424Z test_consistency_SparseBSR_asin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5007597Z test_consistency_SparseBSR_asin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5007769Z test_consistency_SparseBSR_asin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5007940Z test_consistency_SparseBSR_asin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5008123Z test_consistency_SparseBSR_asinh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5008318Z test_consistency_SparseBSR_asinh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5008503Z test_consistency_SparseBSR_asinh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5008683Z test_consistency_SparseBSR_asinh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5008855Z test_consistency_SparseBSR_asinh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5009033Z test_consistency_SparseBSR_asinh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5009205Z test_consistency_SparseBSR_asinh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5009384Z test_consistency_SparseBSR_asinh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5009559Z test_consistency_SparseBSR_asinh_cuda_int16 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.5009736Z test_consistency_SparseBSR_asinh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5009911Z test_consistency_SparseBSR_asinh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5010084Z test_consistency_SparseBSR_asinh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5010259Z test_consistency_SparseBSR_asinh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5010429Z test_consistency_SparseBSR_atan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5010601Z test_consistency_SparseBSR_atan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5010780Z test_consistency_SparseBSR_atan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5010999Z test_consistency_SparseBSR_atan_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5011183Z test_consistency_SparseBSR_atan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5011359Z test_consistency_SparseBSR_atan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5011533Z test_consistency_SparseBSR_atan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5011702Z test_consistency_SparseBSR_atan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5011868Z test_consistency_SparseBSR_atan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5012039Z test_consistency_SparseBSR_atan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5012209Z test_consistency_SparseBSR_atan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5012385Z test_consistency_SparseBSR_atan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5012562Z test_consistency_SparseBSR_atan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5012742Z test_consistency_SparseBSR_atanh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5012911Z test_consistency_SparseBSR_atanh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5013091Z test_consistency_SparseBSR_atanh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5013269Z test_consistency_SparseBSR_atanh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5013439Z test_consistency_SparseBSR_atanh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5013614Z test_consistency_SparseBSR_atanh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5013789Z test_consistency_SparseBSR_atanh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5013987Z test_consistency_SparseBSR_atanh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5014165Z test_consistency_SparseBSR_atanh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5014340Z test_consistency_SparseBSR_atanh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5014631Z test_consistency_SparseBSR_atanh_cuda_int64 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.5014805Z test_consistency_SparseBSR_atanh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5014973Z test_consistency_SparseBSR_atanh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5015142Z test_consistency_SparseBSR_ceil_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5015319Z test_consistency_SparseBSR_ceil_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5015491Z test_consistency_SparseBSR_ceil_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5015661Z test_consistency_SparseBSR_ceil_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5015831Z test_consistency_SparseBSR_ceil_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5015999Z test_consistency_SparseBSR_ceil_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5016165Z test_consistency_SparseBSR_ceil_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5016334Z test_consistency_SparseBSR_ceil_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5016502Z test_consistency_SparseBSR_ceil_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5016726Z test_consistency_SparseBSR_conj_physical_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5016913Z test_consistency_SparseBSR_conj_physical_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5017106Z test_consistency_SparseBSR_conj_physical_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5017297Z test_consistency_SparseBSR_conj_physical_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5017488Z test_consistency_SparseBSR_conj_physical_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5017674Z test_consistency_SparseBSR_conj_physical_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5017858Z test_consistency_SparseBSR_conj_physical_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5018046Z test_consistency_SparseBSR_conj_physical_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5018232Z test_consistency_SparseBSR_conj_physical_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5018409Z test_consistency_SparseBSR_conj_physical_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5018652Z test_consistency_SparseBSR_conj_physical_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5018839Z test_consistency_SparseBSR_conj_physical_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5019020Z test_consistency_SparseBSR_conj_physical_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5019205Z test_consistency_SparseBSR_deg2rad_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5019382Z test_consistency_SparseBSR_deg2rad_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5019564Z test_consistency_SparseBSR_deg2rad_cuda_float16 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.5019740Z test_consistency_SparseBSR_deg2rad_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5019954Z test_consistency_SparseBSR_deg2rad_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5020124Z test_consistency_SparseBSR_deg2rad_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5020299Z test_consistency_SparseBSR_deg2rad_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5020468Z test_consistency_SparseBSR_deg2rad_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5020643Z test_consistency_SparseBSR_deg2rad_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5020811Z test_consistency_SparseBSR_deg2rad_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5020985Z test_consistency_SparseBSR_erf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5021158Z test_consistency_SparseBSR_erf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5021330Z test_consistency_SparseBSR_erf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5021501Z test_consistency_SparseBSR_erf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5021665Z test_consistency_SparseBSR_erf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5021836Z test_consistency_SparseBSR_erf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5022003Z test_consistency_SparseBSR_erf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5022170Z test_consistency_SparseBSR_erf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5022367Z test_consistency_SparseBSR_erf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5022539Z test_consistency_SparseBSR_erf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5022718Z test_consistency_SparseBSR_erfinv_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5022897Z test_consistency_SparseBSR_erfinv_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5023065Z test_consistency_SparseBSR_erfinv_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5023242Z test_consistency_SparseBSR_erfinv_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5023447Z test_consistency_SparseBSR_erfinv_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5023637Z test_consistency_SparseBSR_erfinv_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5023812Z test_consistency_SparseBSR_erfinv_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5023984Z test_consistency_SparseBSR_erfinv_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5024160Z test_consistency_SparseBSR_erfinv_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5024338Z test_consistency_SparseBSR_expm1_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5024513Z test_consistency_SparseBSR_expm1_cuda_bool (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.5024683Z test_consistency_SparseBSR_expm1_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5024856Z test_consistency_SparseBSR_expm1_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5025026Z test_consistency_SparseBSR_expm1_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5025201Z test_consistency_SparseBSR_expm1_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5025374Z test_consistency_SparseBSR_expm1_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5025572Z test_consistency_SparseBSR_expm1_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5025748Z test_consistency_SparseBSR_expm1_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5025917Z test_consistency_SparseBSR_expm1_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5026097Z test_consistency_SparseBSR_floor_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5026265Z test_consistency_SparseBSR_floor_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5026442Z test_consistency_SparseBSR_floor_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5026618Z test_consistency_SparseBSR_floor_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5026793Z test_consistency_SparseBSR_floor_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5026968Z test_consistency_SparseBSR_floor_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5027139Z test_consistency_SparseBSR_floor_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5027316Z test_consistency_SparseBSR_floor_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5027485Z test_consistency_SparseBSR_floor_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5027659Z test_consistency_SparseBSR_frac_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5027827Z test_consistency_SparseBSR_frac_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5028026Z test_consistency_SparseBSR_frac_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5028202Z test_consistency_SparseBSR_frac_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5028383Z test_consistency_SparseBSR_isinf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5028555Z test_consistency_SparseBSR_isinf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5028739Z test_consistency_SparseBSR_isinf_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5028919Z test_consistency_SparseBSR_isinf_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5029101Z test_consistency_SparseBSR_isinf_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5029268Z test_consistency_SparseBSR_isinf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5029442Z test_consistency_SparseBSR_isinf_cuda_float32 (__main__.TestSparseCompressedCUDA) ...
ok (0.002s)
2023-01-11T21:45:10.5029617Z test_consistency_SparseBSR_isinf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5029790Z test_consistency_SparseBSR_isinf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5029963Z test_consistency_SparseBSR_isinf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5030135Z test_consistency_SparseBSR_isinf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5030306Z test_consistency_SparseBSR_isinf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5030478Z test_consistency_SparseBSR_isinf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5030659Z test_consistency_SparseBSR_isnan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5030828Z test_consistency_SparseBSR_isnan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5031010Z test_consistency_SparseBSR_isnan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5031218Z test_consistency_SparseBSR_isnan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5031396Z test_consistency_SparseBSR_isnan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5031567Z test_consistency_SparseBSR_isnan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5031741Z test_consistency_SparseBSR_isnan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5031912Z test_consistency_SparseBSR_isnan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5032084Z test_consistency_SparseBSR_isnan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5032258Z test_consistency_SparseBSR_isnan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5032424Z test_consistency_SparseBSR_isnan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5032598Z test_consistency_SparseBSR_isnan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5032783Z test_consistency_SparseBSR_isneginf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5032959Z test_consistency_SparseBSR_isneginf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5033141Z test_consistency_SparseBSR_isneginf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5033324Z test_consistency_SparseBSR_isneginf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5033505Z test_consistency_SparseBSR_isneginf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5033740Z test_consistency_SparseBSR_isneginf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5033939Z test_consistency_SparseBSR_isneginf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5034106Z test_consistency_SparseBSR_isneginf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5034281Z test_consistency_SparseBSR_isneginf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5034460Z test_consistency_SparseBSR_isneginf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ...
ok (0.002s) 2023-01-11T21:45:10.5034643Z test_consistency_SparseBSR_isposinf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5034816Z test_consistency_SparseBSR_isposinf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5034992Z test_consistency_SparseBSR_isposinf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5035178Z test_consistency_SparseBSR_isposinf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5035359Z test_consistency_SparseBSR_isposinf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5035537Z test_consistency_SparseBSR_isposinf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5035704Z test_consistency_SparseBSR_isposinf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5035878Z test_consistency_SparseBSR_isposinf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5036055Z test_consistency_SparseBSR_isposinf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5036235Z test_consistency_SparseBSR_isposinf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5036412Z test_consistency_SparseBSR_log1p_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5036590Z test_consistency_SparseBSR_log1p_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5036794Z test_consistency_SparseBSR_log1p_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5036979Z test_consistency_SparseBSR_log1p_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5037157Z test_consistency_SparseBSR_log1p_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5037324Z test_consistency_SparseBSR_log1p_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5037494Z test_consistency_SparseBSR_log1p_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5037667Z test_consistency_SparseBSR_log1p_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5037843Z test_consistency_SparseBSR_log1p_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5038015Z test_consistency_SparseBSR_log1p_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5038192Z test_consistency_SparseBSR_log1p_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5038364Z test_consistency_SparseBSR_log1p_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5038624Z test_consistency_SparseBSR_masked_amax_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5038880Z test_consistency_SparseBSR_masked_amax_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5039128Z test_consistency_SparseBSR_masked_amax_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5039401Z test_consistency_SparseBSR_masked_amax_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5039655Z test_consistency_SparseBSR_masked_amax_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5039904Z test_consistency_SparseBSR_masked_amax_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5040145Z test_consistency_SparseBSR_masked_amax_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5040393Z test_consistency_SparseBSR_masked_amax_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5040639Z test_consistency_SparseBSR_masked_amax_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5040892Z test_consistency_SparseBSR_masked_amin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5041140Z test_consistency_SparseBSR_masked_amin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5041381Z test_consistency_SparseBSR_masked_amin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5041623Z test_consistency_SparseBSR_masked_amin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5041867Z test_consistency_SparseBSR_masked_amin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5042133Z test_consistency_SparseBSR_masked_amin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5042366Z test_consistency_SparseBSR_masked_amin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5042614Z test_consistency_SparseBSR_masked_amin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5042854Z test_consistency_SparseBSR_masked_amin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5043109Z test_consistency_SparseBSR_masked_mean_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5043355Z test_consistency_SparseBSR_masked_mean_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5043615Z test_consistency_SparseBSR_masked_mean_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... 
skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5043869Z test_consistency_SparseBSR_masked_mean_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5044115Z test_consistency_SparseBSR_masked_mean_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5044387Z test_consistency_SparseBSR_masked_mean_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5044628Z test_consistency_SparseBSR_masked_mean_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5044866Z test_consistency_SparseBSR_masked_mean_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5045106Z test_consistency_SparseBSR_masked_mean_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5045343Z test_consistency_SparseBSR_masked_mean_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5045578Z test_consistency_SparseBSR_masked_mean_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5045826Z test_consistency_SparseBSR_masked_mean_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5046073Z test_consistency_SparseBSR_masked_prod_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5046319Z test_consistency_SparseBSR_masked_prod_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5046574Z test_consistency_SparseBSR_masked_prod_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5046827Z test_consistency_SparseBSR_masked_prod_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5047109Z test_consistency_SparseBSR_masked_prod_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5047360Z test_consistency_SparseBSR_masked_prod_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5047604Z test_consistency_SparseBSR_masked_prod_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5047851Z test_consistency_SparseBSR_masked_prod_cuda_int16 (__main__.TestSparseCompressedCUDA) ... 
skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5048096Z test_consistency_SparseBSR_masked_prod_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5048340Z test_consistency_SparseBSR_masked_prod_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5048584Z test_consistency_SparseBSR_masked_prod_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5048817Z test_consistency_SparseBSR_masked_prod_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5049062Z test_consistency_SparseBSR_masked_sum_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5049310Z test_consistency_SparseBSR_masked_sum_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5049601Z test_consistency_SparseBSR_masked_sum_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5049848Z test_consistency_SparseBSR_masked_sum_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5050091Z test_consistency_SparseBSR_masked_sum_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5050337Z test_consistency_SparseBSR_masked_sum_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5050575Z test_consistency_SparseBSR_masked_sum_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5050823Z test_consistency_SparseBSR_masked_sum_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5051072Z test_consistency_SparseBSR_masked_sum_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5051314Z test_consistency_SparseBSR_masked_sum_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5051558Z test_consistency_SparseBSR_masked_sum_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5051794Z test_consistency_SparseBSR_masked_sum_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5051973Z test_consistency_SparseBSR_neg_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5052224Z test_consistency_SparseBSR_neg_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5052406Z test_consistency_SparseBSR_neg_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5052578Z test_consistency_SparseBSR_neg_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5052753Z test_consistency_SparseBSR_neg_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5052929Z test_consistency_SparseBSR_neg_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5053102Z test_consistency_SparseBSR_neg_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5053276Z test_consistency_SparseBSR_neg_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5053442Z test_consistency_SparseBSR_neg_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5053615Z test_consistency_SparseBSR_neg_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5053786Z test_consistency_SparseBSR_neg_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5053954Z test_consistency_SparseBSR_neg_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5054146Z test_consistency_SparseBSR_nn_functional_relu_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5054337Z test_consistency_SparseBSR_nn_functional_relu_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5054674Z test_consistency_SparseBSR_nn_functional_relu_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5054911Z test_consistency_SparseBSR_nn_functional_relu_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5055155Z test_consistency_SparseBSR_nn_functional_relu_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5055341Z test_consistency_SparseBSR_nn_functional_relu_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5055532Z test_consistency_SparseBSR_nn_functional_relu_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5055720Z test_consistency_SparseBSR_nn_functional_relu_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5055907Z test_consistency_SparseBSR_nn_functional_relu_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5056093Z test_consistency_SparseBSR_positive_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5056281Z test_consistency_SparseBSR_positive_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5056470Z test_consistency_SparseBSR_positive_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5056658Z test_consistency_SparseBSR_positive_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5056840Z test_consistency_SparseBSR_positive_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5057015Z test_consistency_SparseBSR_positive_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5057199Z test_consistency_SparseBSR_positive_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5057378Z test_consistency_SparseBSR_positive_cuda_int16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5057553Z test_consistency_SparseBSR_positive_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5057734Z test_consistency_SparseBSR_positive_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5057915Z test_consistency_SparseBSR_positive_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5058123Z test_consistency_SparseBSR_positive_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5058310Z test_consistency_SparseBSR_rad2deg_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5058490Z test_consistency_SparseBSR_rad2deg_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5058743Z test_consistency_SparseBSR_rad2deg_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5058921Z test_consistency_SparseBSR_rad2deg_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5059096Z test_consistency_SparseBSR_rad2deg_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5059274Z test_consistency_SparseBSR_rad2deg_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5059449Z test_consistency_SparseBSR_rad2deg_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5059629Z test_consistency_SparseBSR_rad2deg_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5059808Z test_consistency_SparseBSR_rad2deg_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5059984Z test_consistency_SparseBSR_rad2deg_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5060236Z test_consistency_SparseBSR_randn_like_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5060487Z test_consistency_SparseBSR_randn_like_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5060759Z test_consistency_SparseBSR_randn_like_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5061013Z test_consistency_SparseBSR_randn_like_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5061263Z test_consistency_SparseBSR_randn_like_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5061509Z test_consistency_SparseBSR_randn_like_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5061753Z test_consistency_SparseBSR_randn_like_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5061933Z test_consistency_SparseBSR_round_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5062114Z test_consistency_SparseBSR_round_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5062290Z test_consistency_SparseBSR_round_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5062461Z test_consistency_SparseBSR_round_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5062635Z test_consistency_SparseBSR_round_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5062804Z test_consistency_SparseBSR_round_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5062978Z test_consistency_SparseBSR_round_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5063152Z test_consistency_SparseBSR_round_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5063329Z test_consistency_SparseBSR_round_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5063508Z test_consistency_SparseBSR_sgn_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5063710Z test_consistency_SparseBSR_sgn_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5063894Z test_consistency_SparseBSR_sgn_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5064069Z test_consistency_SparseBSR_sgn_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5064241Z test_consistency_SparseBSR_sgn_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5064407Z test_consistency_SparseBSR_sgn_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5064580Z test_consistency_SparseBSR_sgn_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5064754Z test_consistency_SparseBSR_sgn_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5064929Z test_consistency_SparseBSR_sgn_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5065104Z test_consistency_SparseBSR_sgn_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5065276Z test_consistency_SparseBSR_sgn_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5065449Z test_consistency_SparseBSR_sgn_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5065618Z test_consistency_SparseBSR_sgn_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5065793Z test_consistency_SparseBSR_sign_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5065958Z test_consistency_SparseBSR_sign_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5066135Z test_consistency_SparseBSR_sign_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5066344Z test_consistency_SparseBSR_sign_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5066520Z test_consistency_SparseBSR_sign_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5066691Z test_consistency_SparseBSR_sign_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5066863Z test_consistency_SparseBSR_sign_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5067038Z test_consistency_SparseBSR_sign_cuda_int64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5067208Z test_consistency_SparseBSR_sign_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5067373Z test_consistency_SparseBSR_sign_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5067555Z test_consistency_SparseBSR_signbit_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5067737Z test_consistency_SparseBSR_signbit_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5067920Z test_consistency_SparseBSR_signbit_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5068101Z test_consistency_SparseBSR_signbit_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5068279Z test_consistency_SparseBSR_signbit_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5068455Z test_consistency_SparseBSR_signbit_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5068626Z test_consistency_SparseBSR_signbit_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5068804Z test_consistency_SparseBSR_signbit_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5068972Z test_consistency_SparseBSR_signbit_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5069150Z test_consistency_SparseBSR_signbit_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5069347Z test_consistency_SparseBSR_sin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5069519Z test_consistency_SparseBSR_sin_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5069700Z test_consistency_SparseBSR_sin_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5069873Z test_consistency_SparseBSR_sin_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5070044Z test_consistency_SparseBSR_sin_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5070215Z test_consistency_SparseBSR_sin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5070391Z test_consistency_SparseBSR_sin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5070561Z test_consistency_SparseBSR_sin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5070735Z test_consistency_SparseBSR_sin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5070909Z test_consistency_SparseBSR_sin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5071081Z test_consistency_SparseBSR_sin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5071254Z test_consistency_SparseBSR_sin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5071422Z test_consistency_SparseBSR_sin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5071597Z test_consistency_SparseBSR_sinh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5071766Z test_consistency_SparseBSR_sinh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5071975Z test_consistency_SparseBSR_sinh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5072151Z test_consistency_SparseBSR_sinh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5072329Z test_consistency_SparseBSR_sinh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5072506Z test_consistency_SparseBSR_sinh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5072677Z test_consistency_SparseBSR_sinh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5072852Z test_consistency_SparseBSR_sinh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5073025Z test_consistency_SparseBSR_sinh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5073200Z test_consistency_SparseBSR_sinh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5073398Z test_consistency_SparseBSR_sinh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5073589Z test_consistency_SparseBSR_sinh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5073764Z test_consistency_SparseBSR_sinh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5073942Z test_consistency_SparseBSR_sqrt_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5074112Z test_consistency_SparseBSR_sqrt_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5074293Z test_consistency_SparseBSR_sqrt_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5074473Z test_consistency_SparseBSR_sqrt_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5074651Z test_consistency_SparseBSR_sqrt_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5074830Z test_consistency_SparseBSR_sqrt_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5075025Z test_consistency_SparseBSR_sqrt_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5075195Z test_consistency_SparseBSR_sqrt_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5075367Z test_consistency_SparseBSR_sqrt_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5075538Z test_consistency_SparseBSR_sqrt_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5075711Z test_consistency_SparseBSR_sqrt_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5075882Z test_consistency_SparseBSR_sqrt_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5076052Z test_consistency_SparseBSR_sqrt_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5076229Z test_consistency_SparseBSR_tan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5076405Z test_consistency_SparseBSR_tan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5076583Z test_consistency_SparseBSR_tan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5076754Z test_consistency_SparseBSR_tan_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5076935Z test_consistency_SparseBSR_tan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5077110Z test_consistency_SparseBSR_tan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5077285Z test_consistency_SparseBSR_tan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5077460Z test_consistency_SparseBSR_tan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5077662Z test_consistency_SparseBSR_tan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5077834Z test_consistency_SparseBSR_tan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5078006Z test_consistency_SparseBSR_tan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5078178Z test_consistency_SparseBSR_tan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5078344Z test_consistency_SparseBSR_tan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5078521Z test_consistency_SparseBSR_tanh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5078692Z test_consistency_SparseBSR_tanh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5078876Z test_consistency_SparseBSR_tanh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5079057Z test_consistency_SparseBSR_tanh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5079230Z test_consistency_SparseBSR_tanh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5079407Z test_consistency_SparseBSR_tanh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5079584Z test_consistency_SparseBSR_tanh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5079748Z test_consistency_SparseBSR_tanh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5079920Z test_consistency_SparseBSR_tanh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5080092Z test_consistency_SparseBSR_tanh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5080265Z test_consistency_SparseBSR_tanh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5080438Z test_consistency_SparseBSR_tanh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5080608Z test_consistency_SparseBSR_tanh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5080883Z test_consistency_SparseBSR_to_sparse_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5081132Z test_consistency_SparseBSR_to_sparse_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5081382Z test_consistency_SparseBSR_to_sparse_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5081631Z test_consistency_SparseBSR_to_sparse_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5081873Z test_consistency_SparseBSR_to_sparse_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5082112Z test_consistency_SparseBSR_to_sparse_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5082358Z test_consistency_SparseBSR_to_sparse_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5082603Z test_consistency_SparseBSR_to_sparse_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5082845Z test_consistency_SparseBSR_to_sparse_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5083113Z test_consistency_SparseBSR_to_sparse_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5083359Z test_consistency_SparseBSR_to_sparse_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5083640Z test_consistency_SparseBSR_to_sparse_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5083832Z test_consistency_SparseBSR_trunc_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5084007Z test_consistency_SparseBSR_trunc_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5084178Z test_consistency_SparseBSR_trunc_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5084352Z test_consistency_SparseBSR_trunc_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5084530Z test_consistency_SparseBSR_trunc_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5084707Z test_consistency_SparseBSR_trunc_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5084883Z test_consistency_SparseBSR_trunc_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5085056Z test_consistency_SparseBSR_trunc_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5085229Z test_consistency_SparseBSR_trunc_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5085406Z test_consistency_SparseCSC_abs_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5085581Z test_consistency_SparseCSC_abs_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5085760Z test_consistency_SparseCSC_abs_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5085932Z test_consistency_SparseCSC_abs_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5086136Z test_consistency_SparseCSC_abs_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5086312Z test_consistency_SparseCSC_abs_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5086484Z test_consistency_SparseCSC_abs_cuda_float32 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5086657Z test_consistency_SparseCSC_abs_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5086831Z test_consistency_SparseCSC_abs_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5087002Z test_consistency_SparseCSC_abs_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5087174Z test_consistency_SparseCSC_abs_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5087346Z test_consistency_SparseCSC_abs_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5087515Z test_consistency_SparseCSC_abs_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5087690Z test_consistency_SparseCSC_angle_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5087875Z test_consistency_SparseCSC_angle_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5088056Z test_consistency_SparseCSC_angle_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5088232Z test_consistency_SparseCSC_angle_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5088410Z test_consistency_SparseCSC_angle_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5088583Z test_consistency_SparseCSC_angle_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5088790Z test_consistency_SparseCSC_angle_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5088968Z test_consistency_SparseCSC_angle_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5089134Z test_consistency_SparseCSC_angle_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5089308Z test_consistency_SparseCSC_angle_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5089482Z test_consistency_SparseCSC_angle_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5089660Z test_consistency_SparseCSC_asin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5089832Z test_consistency_SparseCSC_asin_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5090011Z test_consistency_SparseCSC_asin_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5090193Z test_consistency_SparseCSC_asin_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5090371Z test_consistency_SparseCSC_asin_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5090547Z test_consistency_SparseCSC_asin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5090713Z test_consistency_SparseCSC_asin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5090883Z test_consistency_SparseCSC_asin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5091056Z test_consistency_SparseCSC_asin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5091228Z test_consistency_SparseCSC_asin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5091401Z test_consistency_SparseCSC_asin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5091576Z test_consistency_SparseCSC_asin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5091769Z test_consistency_SparseCSC_asin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5091952Z test_consistency_SparseCSC_asinh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5092120Z test_consistency_SparseCSC_asinh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5092299Z test_consistency_SparseCSC_asinh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5092479Z test_consistency_SparseCSC_asinh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5092655Z test_consistency_SparseCSC_asinh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5092832Z test_consistency_SparseCSC_asinh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5093007Z test_consistency_SparseCSC_asinh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5093187Z test_consistency_SparseCSC_asinh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5093361Z test_consistency_SparseCSC_asinh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5093542Z test_consistency_SparseCSC_asinh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5093735Z test_consistency_SparseCSC_asinh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5093924Z test_consistency_SparseCSC_asinh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5094093Z test_consistency_SparseCSC_asinh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5094270Z test_consistency_SparseCSC_atan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5094580Z test_consistency_SparseCSC_atan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5094764Z test_consistency_SparseCSC_atan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5094943Z test_consistency_SparseCSC_atan_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5095126Z test_consistency_SparseCSC_atan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5095301Z test_consistency_SparseCSC_atan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5095465Z test_consistency_SparseCSC_atan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5095635Z test_consistency_SparseCSC_atan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5095806Z test_consistency_SparseCSC_atan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5095979Z test_consistency_SparseCSC_atan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5096150Z test_consistency_SparseCSC_atan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5096321Z test_consistency_SparseCSC_atan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5096493Z test_consistency_SparseCSC_atan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5096671Z test_consistency_SparseCSC_atanh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5096841Z test_consistency_SparseCSC_atanh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5097016Z test_consistency_SparseCSC_atanh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5097197Z test_consistency_SparseCSC_atanh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5097374Z test_consistency_SparseCSC_atanh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5097582Z test_consistency_SparseCSC_atanh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5097761Z test_consistency_SparseCSC_atanh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5097931Z test_consistency_SparseCSC_atanh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5098102Z test_consistency_SparseCSC_atanh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5098276Z test_consistency_SparseCSC_atanh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5098440Z test_consistency_SparseCSC_atanh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5098662Z test_consistency_SparseCSC_atanh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5098836Z test_consistency_SparseCSC_atanh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5099012Z test_consistency_SparseCSC_ceil_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5099184Z test_consistency_SparseCSC_ceil_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5099354Z test_consistency_SparseCSC_ceil_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5099526Z test_consistency_SparseCSC_ceil_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5099695Z test_consistency_SparseCSC_ceil_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5099863Z test_consistency_SparseCSC_ceil_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5100027Z test_consistency_SparseCSC_ceil_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5100230Z test_consistency_SparseCSC_ceil_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5100404Z test_consistency_SparseCSC_ceil_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5100594Z test_consistency_SparseCSC_conj_physical_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5100779Z test_consistency_SparseCSC_conj_physical_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5100968Z test_consistency_SparseCSC_conj_physical_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5101156Z test_consistency_SparseCSC_conj_physical_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5101343Z test_consistency_SparseCSC_conj_physical_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5101531Z test_consistency_SparseCSC_conj_physical_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5101708Z test_consistency_SparseCSC_conj_physical_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5101893Z test_consistency_SparseCSC_conj_physical_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5102075Z test_consistency_SparseCSC_conj_physical_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5102256Z test_consistency_SparseCSC_conj_physical_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5102434Z test_consistency_SparseCSC_conj_physical_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5102617Z test_consistency_SparseCSC_conj_physical_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5102798Z test_consistency_SparseCSC_conj_physical_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5102984Z test_consistency_SparseCSC_deg2rad_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5103157Z test_consistency_SparseCSC_deg2rad_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5103352Z test_consistency_SparseCSC_deg2rad_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5103554Z test_consistency_SparseCSC_deg2rad_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5103749Z test_consistency_SparseCSC_deg2rad_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5103924Z test_consistency_SparseCSC_deg2rad_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5104094Z test_consistency_SparseCSC_deg2rad_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5104264Z test_consistency_SparseCSC_deg2rad_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5104441Z test_consistency_SparseCSC_deg2rad_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5104613Z test_consistency_SparseCSC_deg2rad_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5104789Z test_consistency_SparseCSC_erf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5104955Z test_consistency_SparseCSC_erf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5105130Z test_consistency_SparseCSC_erf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5105302Z test_consistency_SparseCSC_erf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5105473Z test_consistency_SparseCSC_erf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5105643Z test_consistency_SparseCSC_erf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5105838Z test_consistency_SparseCSC_erf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5106009Z test_consistency_SparseCSC_erf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5106180Z test_consistency_SparseCSC_erf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5106348Z test_consistency_SparseCSC_erf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5106514Z test_consistency_SparseCSC_erfinv_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5106694Z test_consistency_SparseCSC_erfinv_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5106872Z test_consistency_SparseCSC_erfinv_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5107047Z test_consistency_SparseCSC_erfinv_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5107225Z test_consistency_SparseCSC_erfinv_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5107398Z test_consistency_SparseCSC_erfinv_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5107572Z test_consistency_SparseCSC_erfinv_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5107743Z test_consistency_SparseCSC_erfinv_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5107915Z test_consistency_SparseCSC_erfinv_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5108086Z test_consistency_SparseCSC_expm1_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5108258Z test_consistency_SparseCSC_expm1_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5108431Z test_consistency_SparseCSC_expm1_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5108603Z test_consistency_SparseCSC_expm1_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5108772Z test_consistency_SparseCSC_expm1_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5108968Z test_consistency_SparseCSC_expm1_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5109139Z test_consistency_SparseCSC_expm1_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5109311Z test_consistency_SparseCSC_expm1_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5109476Z test_consistency_SparseCSC_expm1_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5109647Z test_consistency_SparseCSC_expm1_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5109822Z test_consistency_SparseCSC_floor_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5110004Z test_consistency_SparseCSC_floor_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5110175Z test_consistency_SparseCSC_floor_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5110345Z test_consistency_SparseCSC_floor_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5110514Z test_consistency_SparseCSC_floor_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5110684Z test_consistency_SparseCSC_floor_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5110853Z test_consistency_SparseCSC_floor_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5111017Z test_consistency_SparseCSC_floor_cuda_int8 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5111184Z test_consistency_SparseCSC_floor_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5111385Z test_consistency_SparseCSC_frac_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5111557Z test_consistency_SparseCSC_frac_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5111731Z test_consistency_SparseCSC_frac_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5111900Z test_consistency_SparseCSC_frac_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5112077Z test_consistency_SparseCSC_isinf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5112248Z test_consistency_SparseCSC_isinf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5112430Z test_consistency_SparseCSC_isinf_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5112602Z test_consistency_SparseCSC_isinf_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5112780Z test_consistency_SparseCSC_isinf_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5112955Z test_consistency_SparseCSC_isinf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5113122Z test_consistency_SparseCSC_isinf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5113291Z test_consistency_SparseCSC_isinf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5113486Z test_consistency_SparseCSC_isinf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5113680Z test_consistency_SparseCSC_isinf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5113849Z test_consistency_SparseCSC_isinf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5114022Z test_consistency_SparseCSC_isinf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5114187Z test_consistency_SparseCSC_isinf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5114362Z test_consistency_SparseCSC_isnan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5114559Z test_consistency_SparseCSC_isnan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5114742Z test_consistency_SparseCSC_isnan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5114922Z test_consistency_SparseCSC_isnan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5115094Z test_consistency_SparseCSC_isnan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5115265Z test_consistency_SparseCSC_isnan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5115434Z test_consistency_SparseCSC_isnan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5115607Z test_consistency_SparseCSC_isnan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5115772Z test_consistency_SparseCSC_isnan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5115943Z test_consistency_SparseCSC_isnan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5116111Z test_consistency_SparseCSC_isnan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5116276Z test_consistency_SparseCSC_isnan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5116459Z test_consistency_SparseCSC_isneginf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5116637Z test_consistency_SparseCSC_isneginf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5116818Z test_consistency_SparseCSC_isneginf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5117018Z test_consistency_SparseCSC_isneginf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5117194Z test_consistency_SparseCSC_isneginf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5117370Z test_consistency_SparseCSC_isneginf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5117541Z test_consistency_SparseCSC_isneginf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5117712Z test_consistency_SparseCSC_isneginf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5117886Z test_consistency_SparseCSC_isneginf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5118064Z test_consistency_SparseCSC_isneginf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5118244Z test_consistency_SparseCSC_isposinf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5118420Z test_consistency_SparseCSC_isposinf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5118600Z test_consistency_SparseCSC_isposinf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5118769Z test_consistency_SparseCSC_isposinf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5118948Z test_consistency_SparseCSC_isposinf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5119122Z test_consistency_SparseCSC_isposinf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5119295Z test_consistency_SparseCSC_isposinf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5119464Z test_consistency_SparseCSC_isposinf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5119639Z test_consistency_SparseCSC_isposinf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5119816Z test_consistency_SparseCSC_isposinf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5120016Z test_consistency_SparseCSC_log1p_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5120192Z test_consistency_SparseCSC_log1p_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5120363Z test_consistency_SparseCSC_log1p_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5120541Z test_consistency_SparseCSC_log1p_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5120716Z test_consistency_SparseCSC_log1p_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5120888Z test_consistency_SparseCSC_log1p_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5121065Z test_consistency_SparseCSC_log1p_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5121237Z test_consistency_SparseCSC_log1p_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5121412Z test_consistency_SparseCSC_log1p_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5121584Z test_consistency_SparseCSC_log1p_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5121755Z test_consistency_SparseCSC_log1p_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5121918Z test_consistency_SparseCSC_log1p_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5122173Z test_consistency_SparseCSC_masked_amax_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5122422Z test_consistency_SparseCSC_masked_amax_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5122700Z test_consistency_SparseCSC_masked_amax_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5122945Z test_consistency_SparseCSC_masked_amax_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5123195Z test_consistency_SparseCSC_masked_amax_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5123443Z test_consistency_SparseCSC_masked_amax_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5123680Z test_consistency_SparseCSC_masked_amax_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5123930Z test_consistency_SparseCSC_masked_amax_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5124170Z test_consistency_SparseCSC_masked_amax_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5124419Z test_consistency_SparseCSC_masked_amin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5124661Z test_consistency_SparseCSC_masked_amin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5124896Z test_consistency_SparseCSC_masked_amin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5125145Z test_consistency_SparseCSC_masked_amin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5125410Z test_consistency_SparseCSC_masked_amin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5125655Z test_consistency_SparseCSC_masked_amin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5125892Z test_consistency_SparseCSC_masked_amin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5126136Z test_consistency_SparseCSC_masked_amin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5126384Z test_consistency_SparseCSC_masked_amin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5126636Z test_consistency_SparseCSC_masked_mean_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5126882Z test_consistency_SparseCSC_masked_mean_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5127137Z test_consistency_SparseCSC_masked_mean_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5127393Z test_consistency_SparseCSC_masked_mean_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5127673Z test_consistency_SparseCSC_masked_mean_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5127927Z test_consistency_SparseCSC_masked_mean_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5128161Z test_consistency_SparseCSC_masked_mean_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5128401Z test_consistency_SparseCSC_masked_mean_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5128644Z test_consistency_SparseCSC_masked_mean_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5128881Z test_consistency_SparseCSC_masked_mean_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5129124Z test_consistency_SparseCSC_masked_mean_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5129361Z test_consistency_SparseCSC_masked_mean_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... 
skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5129609Z test_consistency_SparseCSC_masked_prod_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5129852Z test_consistency_SparseCSC_masked_prod_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5130107Z test_consistency_SparseCSC_masked_prod_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5130380Z test_consistency_SparseCSC_masked_prod_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5130627Z test_consistency_SparseCSC_masked_prod_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5130879Z test_consistency_SparseCSC_masked_prod_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5131120Z test_consistency_SparseCSC_masked_prod_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5131356Z test_consistency_SparseCSC_masked_prod_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5131599Z test_consistency_SparseCSC_masked_prod_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5131838Z test_consistency_SparseCSC_masked_prod_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5132078Z test_consistency_SparseCSC_masked_prod_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5132316Z test_consistency_SparseCSC_masked_prod_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5132590Z test_consistency_SparseCSC_masked_sum_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5132840Z test_consistency_SparseCSC_masked_sum_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5133094Z test_consistency_SparseCSC_masked_sum_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5133344Z test_consistency_SparseCSC_masked_sum_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5133591Z test_consistency_SparseCSC_masked_sum_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5133832Z test_consistency_SparseCSC_masked_sum_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5134077Z test_consistency_SparseCSC_masked_sum_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5134320Z test_consistency_SparseCSC_masked_sum_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5134654Z test_consistency_SparseCSC_masked_sum_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5134946Z test_consistency_SparseCSC_masked_sum_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5135225Z test_consistency_SparseCSC_masked_sum_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5135525Z test_consistency_SparseCSC_masked_sum_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5135706Z test_consistency_SparseCSC_neg_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5135886Z test_consistency_SparseCSC_neg_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5136062Z test_consistency_SparseCSC_neg_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5136235Z test_consistency_SparseCSC_neg_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5136405Z test_consistency_SparseCSC_neg_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5136580Z test_consistency_SparseCSC_neg_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5136749Z test_consistency_SparseCSC_neg_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5136922Z test_consistency_SparseCSC_neg_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5137093Z test_consistency_SparseCSC_neg_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5137263Z test_consistency_SparseCSC_neg_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5137433Z test_consistency_SparseCSC_neg_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5137601Z test_consistency_SparseCSC_neg_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5137793Z test_consistency_SparseCSC_nn_functional_relu_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5137982Z test_consistency_SparseCSC_nn_functional_relu_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5138270Z test_consistency_SparseCSC_nn_functional_relu_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
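A note on the block of skips above: every masked reduction (masked.amax, masked.amin, masked.mean, masked.prod, masked.sum) reports the same gap, namely that inputs with torch.sparse_csc layout are rejected, while the SparseCSR_masked_amax runs further down in this shard do execute for the floating dtypes. A minimal sketch of the gap and a conservative workaround, assuming the prototype torch.masked API and the Tensor.to_sparse_csc/to_sparse_csr conversions; the exact exception type raised for the unsupported layout is an assumption here, hence the broad catch:

    import torch

    dense = torch.randn(4, 4)
    mask = dense > 0

    # Dense baseline: reduce over dim=1, honoring only the masked-in elements.
    expected = torch.masked.amax(dense, dim=1, mask=mask)

    csc = dense.to_sparse_csc()
    try:
        # The call the skipped tests above would exercise; the CSC layout
        # is rejected in this build.
        torch.masked.amax(csc, dim=1, mask=mask.to_sparse_csc())
    except Exception:
        # Workaround sketch: round-trip through strided into the CSR layout,
        # which the SparseCSR masked_amax float runs below show is accepted.
        csr = csc.to_dense().to_sparse_csr()
        result = torch.masked.amax(csr, dim=1, mask=mask.to_sparse_csr())
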
2023-01-11T21:45:10.5138459Z test_consistency_SparseCSC_nn_functional_relu_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5138724Z test_consistency_SparseCSC_nn_functional_relu_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5138913Z test_consistency_SparseCSC_nn_functional_relu_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5139102Z test_consistency_SparseCSC_nn_functional_relu_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5139294Z test_consistency_SparseCSC_nn_functional_relu_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5139480Z test_consistency_SparseCSC_nn_functional_relu_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5139668Z test_consistency_SparseCSC_positive_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5139858Z test_consistency_SparseCSC_positive_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5140044Z test_consistency_SparseCSC_positive_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5140223Z test_consistency_SparseCSC_positive_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5140404Z test_consistency_SparseCSC_positive_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5140587Z test_consistency_SparseCSC_positive_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5140768Z test_consistency_SparseCSC_positive_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5140948Z test_consistency_SparseCSC_positive_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5141128Z test_consistency_SparseCSC_positive_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5141338Z test_consistency_SparseCSC_positive_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5141517Z test_consistency_SparseCSC_positive_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5141696Z test_consistency_SparseCSC_positive_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5141869Z test_consistency_SparseCSC_rad2deg_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5142052Z test_consistency_SparseCSC_rad2deg_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5142231Z test_consistency_SparseCSC_rad2deg_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5142406Z test_consistency_SparseCSC_rad2deg_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5142584Z test_consistency_SparseCSC_rad2deg_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5142763Z test_consistency_SparseCSC_rad2deg_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5142937Z test_consistency_SparseCSC_rad2deg_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5143117Z test_consistency_SparseCSC_rad2deg_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5143289Z test_consistency_SparseCSC_rad2deg_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
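For context on what an "ok" certifies in these runs: each test_consistency_* case checks, roughly, that the operator produces the same values through the compressed layout as it does on the equivalent strided tensor. An editorial sketch of that check under those assumptions, not the actual TestSparseCompressed harness:

    import torch

    def check_consistency(op, dense, to_layout):
        # The op applied through the compressed layout must match the strided
        # reference elementwise (for zero-preserving unary ops like these).
        sparse = to_layout(dense)
        torch.testing.assert_close(op(sparse).to_dense(), op(dense))

    t = torch.randn(4, 4).abs()
    t[t < 0.5] = 0  # introduce explicit zeros so the compressed form is sparse
    check_consistency(torch.sqrt, t, torch.Tensor.to_sparse_csc)
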
2023-01-11T21:45:10.5143488Z test_consistency_SparseCSC_rad2deg_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5143765Z test_consistency_SparseCSC_randn_like_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5144047Z test_consistency_SparseCSC_randn_like_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5144302Z test_consistency_SparseCSC_randn_like_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5144553Z test_consistency_SparseCSC_randn_like_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5144804Z test_consistency_SparseCSC_randn_like_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5145052Z test_consistency_SparseCSC_randn_like_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5145294Z test_consistency_SparseCSC_randn_like_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5145478Z test_consistency_SparseCSC_round_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5145657Z test_consistency_SparseCSC_round_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5145823Z test_consistency_SparseCSC_round_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5145995Z test_consistency_SparseCSC_round_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5146174Z test_consistency_SparseCSC_round_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5146350Z test_consistency_SparseCSC_round_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5146530Z test_consistency_SparseCSC_round_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5146710Z test_consistency_SparseCSC_round_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5146905Z test_consistency_SparseCSC_round_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5147085Z test_consistency_SparseCSC_sgn_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5147260Z test_consistency_SparseCSC_sgn_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5147436Z test_consistency_SparseCSC_sgn_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5147609Z test_consistency_SparseCSC_sgn_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5147782Z test_consistency_SparseCSC_sgn_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5147957Z test_consistency_SparseCSC_sgn_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5148129Z test_consistency_SparseCSC_sgn_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
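The randn_like skips above carry the same message as the masked ones: the factory does not accept inputs with torch.sparse_csc layout. A sketch of the failure and a dense round-trip workaround, with the exception type again an assumption:

    import torch

    csc = torch.eye(3).to_sparse_csc()
    try:
        noise = torch.randn_like(csc)  # rejected for this layout, per the skips above
    except Exception:
        # Round-trip through a strided tensor, then restore the compressed layout.
        noise = torch.randn_like(csc.to_dense()).to_sparse_csc()
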
2023-01-11T21:45:10.5148303Z test_consistency_SparseCSC_sgn_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5148478Z test_consistency_SparseCSC_sgn_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5148651Z test_consistency_SparseCSC_sgn_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5148822Z test_consistency_SparseCSC_sgn_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5148989Z test_consistency_SparseCSC_sgn_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5149158Z test_consistency_SparseCSC_sgn_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5149334Z test_consistency_SparseCSC_sign_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5149533Z test_consistency_SparseCSC_sign_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5149714Z test_consistency_SparseCSC_sign_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5149891Z test_consistency_SparseCSC_sign_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5150062Z test_consistency_SparseCSC_sign_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5150238Z test_consistency_SparseCSC_sign_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5150412Z test_consistency_SparseCSC_sign_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5150580Z test_consistency_SparseCSC_sign_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5150751Z test_consistency_SparseCSC_sign_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5150923Z test_consistency_SparseCSC_sign_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5151113Z test_consistency_SparseCSC_signbit_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5151289Z test_consistency_SparseCSC_signbit_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5151472Z test_consistency_SparseCSC_signbit_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5151653Z test_consistency_SparseCSC_signbit_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5151828Z test_consistency_SparseCSC_signbit_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5152005Z test_consistency_SparseCSC_signbit_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5152171Z test_consistency_SparseCSC_signbit_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5152345Z test_consistency_SparseCSC_signbit_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5152552Z test_consistency_SparseCSC_signbit_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5152734Z test_consistency_SparseCSC_signbit_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5152908Z test_consistency_SparseCSC_sin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5153083Z test_consistency_SparseCSC_sin_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5153262Z test_consistency_SparseCSC_sin_cuda_complex128 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s) 2023-01-11T21:45:10.5153437Z test_consistency_SparseCSC_sin_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5153602Z test_consistency_SparseCSC_sin_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5153782Z test_consistency_SparseCSC_sin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5153962Z test_consistency_SparseCSC_sin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5154137Z test_consistency_SparseCSC_sin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5154310Z test_consistency_SparseCSC_sin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5154484Z test_consistency_SparseCSC_sin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5154655Z test_consistency_SparseCSC_sin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5154827Z test_consistency_SparseCSC_sin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5154996Z test_consistency_SparseCSC_sin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5155198Z test_consistency_SparseCSC_sinh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5155375Z test_consistency_SparseCSC_sinh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5155553Z test_consistency_SparseCSC_sinh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5155733Z test_consistency_SparseCSC_sinh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5155913Z test_consistency_SparseCSC_sinh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5156092Z test_consistency_SparseCSC_sinh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5156262Z test_consistency_SparseCSC_sinh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5156435Z test_consistency_SparseCSC_sinh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5156610Z test_consistency_SparseCSC_sinh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5156779Z test_consistency_SparseCSC_sinh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5156952Z test_consistency_SparseCSC_sinh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5157124Z test_consistency_SparseCSC_sinh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5157296Z test_consistency_SparseCSC_sinh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5157471Z test_consistency_SparseCSC_sqrt_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5157641Z test_consistency_SparseCSC_sqrt_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5157821Z test_consistency_SparseCSC_sqrt_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5158003Z test_consistency_SparseCSC_sqrt_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5158203Z test_consistency_SparseCSC_sqrt_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5158374Z test_consistency_SparseCSC_sqrt_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5158546Z test_consistency_SparseCSC_sqrt_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5158724Z test_consistency_SparseCSC_sqrt_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5158899Z test_consistency_SparseCSC_sqrt_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5159070Z test_consistency_SparseCSC_sqrt_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5159243Z test_consistency_SparseCSC_sqrt_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5159421Z test_consistency_SparseCSC_sqrt_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5159593Z test_consistency_SparseCSC_sqrt_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5159762Z test_consistency_SparseCSC_tan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5159936Z test_consistency_SparseCSC_tan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5160113Z test_consistency_SparseCSC_tan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5160290Z test_consistency_SparseCSC_tan_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5160466Z test_consistency_SparseCSC_tan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5160639Z test_consistency_SparseCSC_tan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5160843Z test_consistency_SparseCSC_tan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5161027Z test_consistency_SparseCSC_tan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5161200Z test_consistency_SparseCSC_tan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5161365Z test_consistency_SparseCSC_tan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5161546Z test_consistency_SparseCSC_tan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5161719Z test_consistency_SparseCSC_tan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5161889Z test_consistency_SparseCSC_tan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5162067Z test_consistency_SparseCSC_tanh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5162241Z test_consistency_SparseCSC_tanh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5162423Z test_consistency_SparseCSC_tanh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5162602Z test_consistency_SparseCSC_tanh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5162778Z test_consistency_SparseCSC_tanh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5162947Z test_consistency_SparseCSC_tanh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5163122Z test_consistency_SparseCSC_tanh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5163300Z test_consistency_SparseCSC_tanh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5163475Z test_consistency_SparseCSC_tanh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5163650Z test_consistency_SparseCSC_tanh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5163825Z test_consistency_SparseCSC_tanh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5164020Z test_consistency_SparseCSC_tanh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5164191Z test_consistency_SparseCSC_tanh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5164372Z test_consistency_SparseCSC_to_sparse_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5164553Z test_consistency_SparseCSC_to_sparse_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5164739Z test_consistency_SparseCSC_to_sparse_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5164927Z test_consistency_SparseCSC_to_sparse_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5165117Z test_consistency_SparseCSC_to_sparse_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5165303Z test_consistency_SparseCSC_to_sparse_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5165483Z test_consistency_SparseCSC_to_sparse_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5165666Z test_consistency_SparseCSC_to_sparse_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5165847Z test_consistency_SparseCSC_to_sparse_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5166014Z test_consistency_SparseCSC_to_sparse_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5166194Z test_consistency_SparseCSC_to_sparse_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5166368Z test_consistency_SparseCSC_to_sparse_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5166589Z test_consistency_SparseCSC_trunc_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5166772Z test_consistency_SparseCSC_trunc_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5166946Z test_consistency_SparseCSC_trunc_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5167121Z test_consistency_SparseCSC_trunc_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5167297Z test_consistency_SparseCSC_trunc_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5167473Z test_consistency_SparseCSC_trunc_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5167639Z test_consistency_SparseCSC_trunc_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5167814Z test_consistency_SparseCSC_trunc_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5167988Z test_consistency_SparseCSC_trunc_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5168170Z test_consistency_SparseCSR_abs_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5168343Z test_consistency_SparseCSR_abs_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5168522Z test_consistency_SparseCSR_abs_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5168699Z test_consistency_SparseCSR_abs_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5168871Z test_consistency_SparseCSR_abs_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5169046Z test_consistency_SparseCSR_abs_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5169212Z test_consistency_SparseCSR_abs_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5169392Z test_consistency_SparseCSR_abs_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5169594Z test_consistency_SparseCSR_abs_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5169770Z test_consistency_SparseCSR_abs_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5169944Z test_consistency_SparseCSR_abs_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5170119Z test_consistency_SparseCSR_abs_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5170297Z test_consistency_SparseCSR_abs_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5170475Z test_consistency_SparseCSR_angle_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5170660Z test_consistency_SparseCSR_angle_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5170835Z test_consistency_SparseCSR_angle_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5171021Z test_consistency_SparseCSR_angle_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5171200Z test_consistency_SparseCSR_angle_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5171373Z test_consistency_SparseCSR_angle_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5171554Z test_consistency_SparseCSR_angle_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5171735Z test_consistency_SparseCSR_angle_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5171910Z test_consistency_SparseCSR_angle_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5172086Z test_consistency_SparseCSR_angle_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5172288Z test_consistency_SparseCSR_angle_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5172464Z test_consistency_SparseCSR_asin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5172640Z test_consistency_SparseCSR_asin_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5172824Z test_consistency_SparseCSR_asin_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5173009Z test_consistency_SparseCSR_asin_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5173189Z test_consistency_SparseCSR_asin_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5173371Z test_consistency_SparseCSR_asin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5173550Z test_consistency_SparseCSR_asin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5173728Z test_consistency_SparseCSR_asin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5173899Z test_consistency_SparseCSR_asin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5174078Z test_consistency_SparseCSR_asin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5174261Z test_consistency_SparseCSR_asin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5174438Z test_consistency_SparseCSR_asin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5174732Z test_consistency_SparseCSR_asin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5174914Z test_consistency_SparseCSR_asinh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5175088Z test_consistency_SparseCSR_asinh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5175279Z test_consistency_SparseCSR_asinh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5175499Z test_consistency_SparseCSR_asinh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5175665Z test_consistency_SparseCSR_asinh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5175844Z test_consistency_SparseCSR_asinh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5176021Z test_consistency_SparseCSR_asinh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5176195Z test_consistency_SparseCSR_asinh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5176371Z test_consistency_SparseCSR_asinh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5176546Z test_consistency_SparseCSR_asinh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5176726Z test_consistency_SparseCSR_asinh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5176903Z test_consistency_SparseCSR_asinh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5177072Z test_consistency_SparseCSR_asinh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5177243Z test_consistency_SparseCSR_atan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5177419Z test_consistency_SparseCSR_atan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5177599Z test_consistency_SparseCSR_atan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5177776Z test_consistency_SparseCSR_atan_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5177949Z test_consistency_SparseCSR_atan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5178157Z test_consistency_SparseCSR_atan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5178333Z test_consistency_SparseCSR_atan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5178577Z test_consistency_SparseCSR_atan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5178758Z test_consistency_SparseCSR_atan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5178923Z test_consistency_SparseCSR_atan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5179093Z test_consistency_SparseCSR_atan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5179267Z test_consistency_SparseCSR_atan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5179435Z test_consistency_SparseCSR_atan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5179618Z test_consistency_SparseCSR_atanh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5179795Z test_consistency_SparseCSR_atanh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5179982Z test_consistency_SparseCSR_atanh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5180164Z test_consistency_SparseCSR_atanh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5180334Z test_consistency_SparseCSR_atanh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5180510Z test_consistency_SparseCSR_atanh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5180684Z test_consistency_SparseCSR_atanh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5180850Z test_consistency_SparseCSR_atanh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5181024Z test_consistency_SparseCSR_atanh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5181224Z test_consistency_SparseCSR_atanh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5181398Z test_consistency_SparseCSR_atanh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5181570Z test_consistency_SparseCSR_atanh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5181746Z test_consistency_SparseCSR_atanh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5181914Z test_consistency_SparseCSR_ceil_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5182087Z test_consistency_SparseCSR_ceil_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5182256Z test_consistency_SparseCSR_ceil_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5182427Z test_consistency_SparseCSR_ceil_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5182607Z test_consistency_SparseCSR_ceil_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5182781Z test_consistency_SparseCSR_ceil_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5182949Z test_consistency_SparseCSR_ceil_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5183120Z test_consistency_SparseCSR_ceil_cuda_int8 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5183291Z test_consistency_SparseCSR_ceil_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5183473Z test_consistency_SparseCSR_conj_physical_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5183659Z test_consistency_SparseCSR_conj_physical_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5183876Z test_consistency_SparseCSR_conj_physical_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5184067Z test_consistency_SparseCSR_conj_physical_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5184259Z test_consistency_SparseCSR_conj_physical_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5184447Z test_consistency_SparseCSR_conj_physical_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5184634Z test_consistency_SparseCSR_conj_physical_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5184821Z test_consistency_SparseCSR_conj_physical_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5185007Z test_consistency_SparseCSR_conj_physical_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5185188Z test_consistency_SparseCSR_conj_physical_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5185369Z test_consistency_SparseCSR_conj_physical_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5185555Z test_consistency_SparseCSR_conj_physical_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5185741Z test_consistency_SparseCSR_conj_physical_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5185924Z test_consistency_SparseCSR_deg2rad_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5186103Z test_consistency_SparseCSR_deg2rad_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5186285Z test_consistency_SparseCSR_deg2rad_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5186462Z test_consistency_SparseCSR_deg2rad_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5186637Z test_consistency_SparseCSR_deg2rad_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5186829Z test_consistency_SparseCSR_deg2rad_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5187005Z test_consistency_SparseCSR_deg2rad_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5187184Z test_consistency_SparseCSR_deg2rad_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5187362Z test_consistency_SparseCSR_deg2rad_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5187537Z test_consistency_SparseCSR_deg2rad_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5187711Z test_consistency_SparseCSR_erf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5187880Z test_consistency_SparseCSR_erf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5188056Z test_consistency_SparseCSR_erf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5188232Z test_consistency_SparseCSR_erf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5188397Z test_consistency_SparseCSR_erf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5188568Z test_consistency_SparseCSR_erf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5188737Z test_consistency_SparseCSR_erf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5188906Z test_consistency_SparseCSR_erf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5189076Z test_consistency_SparseCSR_erf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5189241Z test_consistency_SparseCSR_erf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5189441Z test_consistency_SparseCSR_erfinv_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5189620Z test_consistency_SparseCSR_erfinv_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5189801Z test_consistency_SparseCSR_erfinv_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5189969Z test_consistency_SparseCSR_erfinv_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5190146Z test_consistency_SparseCSR_erfinv_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5190316Z test_consistency_SparseCSR_erfinv_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5190485Z test_consistency_SparseCSR_erfinv_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5190656Z test_consistency_SparseCSR_erfinv_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5190834Z test_consistency_SparseCSR_erfinv_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5191017Z test_consistency_SparseCSR_expm1_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5191193Z test_consistency_SparseCSR_expm1_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5191359Z test_consistency_SparseCSR_expm1_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5191531Z test_consistency_SparseCSR_expm1_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5191702Z test_consistency_SparseCSR_expm1_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5191872Z test_consistency_SparseCSR_expm1_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5192042Z test_consistency_SparseCSR_expm1_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5192214Z test_consistency_SparseCSR_expm1_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5192384Z test_consistency_SparseCSR_expm1_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5192583Z test_consistency_SparseCSR_expm1_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5192762Z test_consistency_SparseCSR_floor_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5192927Z test_consistency_SparseCSR_floor_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5193092Z test_consistency_SparseCSR_floor_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5193269Z test_consistency_SparseCSR_floor_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5193464Z test_consistency_SparseCSR_floor_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5193661Z test_consistency_SparseCSR_floor_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5193829Z test_consistency_SparseCSR_floor_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5194004Z test_consistency_SparseCSR_floor_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5194170Z test_consistency_SparseCSR_floor_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5194345Z test_consistency_SparseCSR_frac_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5194512Z test_consistency_SparseCSR_frac_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5194680Z test_consistency_SparseCSR_frac_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5194849Z test_consistency_SparseCSR_frac_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5195051Z test_consistency_SparseCSR_isinf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5195222Z test_consistency_SparseCSR_isinf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5195406Z test_consistency_SparseCSR_isinf_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5195585Z test_consistency_SparseCSR_isinf_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5195761Z test_consistency_SparseCSR_isinf_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5195935Z test_consistency_SparseCSR_isinf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5196102Z test_consistency_SparseCSR_isinf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5196272Z test_consistency_SparseCSR_isinf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5196443Z test_consistency_SparseCSR_isinf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5196615Z test_consistency_SparseCSR_isinf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5196788Z test_consistency_SparseCSR_isinf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5196955Z test_consistency_SparseCSR_isinf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5197124Z test_consistency_SparseCSR_isinf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5197298Z test_consistency_SparseCSR_isnan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5197472Z test_consistency_SparseCSR_isnan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5197647Z test_consistency_SparseCSR_isnan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5197829Z test_consistency_SparseCSR_isnan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5198025Z test_consistency_SparseCSR_isnan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5198200Z test_consistency_SparseCSR_isnan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5198368Z test_consistency_SparseCSR_isnan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5198537Z test_consistency_SparseCSR_isnan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5198706Z test_consistency_SparseCSR_isnan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5198877Z test_consistency_SparseCSR_isnan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5199043Z test_consistency_SparseCSR_isnan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5199215Z test_consistency_SparseCSR_isnan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5199399Z test_consistency_SparseCSR_isneginf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5199577Z test_consistency_SparseCSR_isneginf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5199758Z test_consistency_SparseCSR_isneginf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5199937Z test_consistency_SparseCSR_isneginf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5200117Z test_consistency_SparseCSR_isneginf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5200294Z test_consistency_SparseCSR_isneginf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5200467Z test_consistency_SparseCSR_isneginf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5200665Z test_consistency_SparseCSR_isneginf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5200840Z test_consistency_SparseCSR_isneginf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5201017Z test_consistency_SparseCSR_isneginf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5201199Z test_consistency_SparseCSR_isposinf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5201376Z test_consistency_SparseCSR_isposinf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5201554Z test_consistency_SparseCSR_isposinf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5201732Z test_consistency_SparseCSR_isposinf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5201911Z test_consistency_SparseCSR_isposinf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5202089Z test_consistency_SparseCSR_isposinf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5202259Z test_consistency_SparseCSR_isposinf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5202430Z test_consistency_SparseCSR_isposinf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5202607Z test_consistency_SparseCSR_isposinf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5202780Z test_consistency_SparseCSR_isposinf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5202956Z test_consistency_SparseCSR_log1p_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5203130Z test_consistency_SparseCSR_log1p_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5203313Z test_consistency_SparseCSR_log1p_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5203493Z test_consistency_SparseCSR_log1p_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5203721Z test_consistency_SparseCSR_log1p_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5203911Z test_consistency_SparseCSR_log1p_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5204082Z test_consistency_SparseCSR_log1p_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5204251Z test_consistency_SparseCSR_log1p_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5204421Z test_consistency_SparseCSR_log1p_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5204591Z test_consistency_SparseCSR_log1p_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5204762Z test_consistency_SparseCSR_log1p_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5204933Z test_consistency_SparseCSR_log1p_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5205677Z test_consistency_SparseCSR_masked_amax_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/masked/_ops.py:767: UserWarning: scatter_reduce() is in beta and the API may change at any time. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/TensorAdvancedIndexing.cpp:1739.) 2023-01-11T21:45:10.5205772Z new_values.scatter_reduce_( 2023-01-11T21:45:10.5205837Z ok (0.023s) 2023-01-11T21:45:10.5206021Z test_consistency_SparseCSR_masked_amax_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s) 2023-01-11T21:45:10.5206207Z test_consistency_SparseCSR_masked_amax_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.022s) 2023-01-11T21:45:10.5206390Z test_consistency_SparseCSR_masked_amax_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.022s) 2023-01-11T21:45:10.5206682Z test_consistency_SparseCSR_masked_amax_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5206942Z test_consistency_SparseCSR_masked_amax_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5207191Z test_consistency_SparseCSR_masked_amax_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5207438Z test_consistency_SparseCSR_masked_amax_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5207680Z test_consistency_SparseCSR_masked_amax_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csr layout (0.002s)
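The UserWarning interleaved above comes from torch/masked/_ops.py: for a sparse CSR input, masked.amax reduces the stored values with Tensor.scatter_reduce_, which is still a beta API in this build. A standalone sketch of that reduction pattern (illustrative tensors only, not PyTorch's internal code):

    import torch

    # A small 3x5 CSR tensor; crow compresses the per-row entry counts.
    crow = torch.tensor([0, 2, 3, 5])
    col = torch.tensor([0, 3, 2, 1, 4])
    val = torch.tensor([1.0, 5.0, 2.0, 4.0, 3.0])
    csr = torch.sparse_csr_tensor(crow, col, val, size=(3, 5))

    # Recover one row id per stored value from the compressed pointers.
    rows = torch.repeat_interleave(torch.arange(3), crow.diff())

    # Row-wise amax over the stored values via the beta scatter_reduce_ API;
    # include_self=False keeps the -inf initializers out of the reduction.
    out = torch.full((3,), float("-inf"))
    out.scatter_reduce_(0, rows, csr.values(), reduce="amax", include_self=False)
    print(out)  # tensor([5., 2., 4.])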
2023-01-11T21:45:10.5207869Z test_consistency_SparseCSR_masked_amin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.022s) 2023-01-11T21:45:10.5208053Z test_consistency_SparseCSR_masked_amin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.022s) 2023-01-11T21:45:10.5208228Z test_consistency_SparseCSR_masked_amin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.022s) 2023-01-11T21:45:10.5208406Z test_consistency_SparseCSR_masked_amin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.022s) 2023-01-11T21:45:10.5208653Z test_consistency_SparseCSR_masked_amin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5208893Z test_consistency_SparseCSR_masked_amin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5209133Z test_consistency_SparseCSR_masked_amin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5209404Z test_consistency_SparseCSR_masked_amin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5209644Z test_consistency_SparseCSR_masked_amin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5209828Z test_consistency_SparseCSR_masked_mean_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s) 2023-01-11T21:45:10.5210069Z test_consistency_SparseCSR_masked_mean_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5210395Z test_consistency_SparseCSR_masked_mean_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5210669Z test_consistency_SparseCSR_masked_mean_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5210847Z test_consistency_SparseCSR_masked_mean_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s) 2023-01-11T21:45:10.5211041Z test_consistency_SparseCSR_masked_mean_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.023s) 2023-01-11T21:45:10.5211288Z test_consistency_SparseCSR_masked_mean_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.023s) 2023-01-11T21:45:10.5211551Z test_consistency_SparseCSR_masked_mean_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5211812Z test_consistency_SparseCSR_masked_mean_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5212099Z test_consistency_SparseCSR_masked_mean_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5212479Z test_consistency_SparseCSR_masked_mean_cuda_int8 (__main__.TestSparseCompressedCUDA) ... 
skip: masked.mean does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5212737Z test_consistency_SparseCSR_masked_mean_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5212935Z test_consistency_SparseCSR_masked_prod_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.674s) 2023-01-11T21:45:10.5213138Z test_consistency_SparseCSR_masked_prod_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.646s) 2023-01-11T21:45:10.5213350Z test_consistency_SparseCSR_masked_prod_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (2.104s) 2023-01-11T21:45:10.5213562Z test_consistency_SparseCSR_masked_prod_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (1.421s) 2023-01-11T21:45:10.5213770Z test_consistency_SparseCSR_masked_prod_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.676s) 2023-01-11T21:45:10.5213964Z test_consistency_SparseCSR_masked_prod_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.646s) 2023-01-11T21:45:10.5214182Z test_consistency_SparseCSR_masked_prod_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.649s) 2023-01-11T21:45:10.5214359Z test_consistency_SparseCSR_masked_prod_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.009s) 2023-01-11T21:45:10.5214727Z test_consistency_SparseCSR_masked_prod_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.009s) 2023-01-11T21:45:10.5214980Z test_consistency_SparseCSR_masked_prod_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5215234Z test_consistency_SparseCSR_masked_prod_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5215429Z test_consistency_SparseCSR_masked_prod_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5215702Z test_consistency_SparseCSR_masked_sum_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5215906Z test_consistency_SparseCSR_masked_sum_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.009s) 2023-01-11T21:45:10.5216107Z test_consistency_SparseCSR_masked_sum_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5216339Z test_consistency_SparseCSR_masked_sum_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5216516Z test_consistency_SparseCSR_masked_sum_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5216711Z test_consistency_SparseCSR_masked_sum_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5216908Z test_consistency_SparseCSR_masked_sum_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5217103Z test_consistency_SparseCSR_masked_sum_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5217307Z test_consistency_SparseCSR_masked_sum_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5217507Z test_consistency_SparseCSR_masked_sum_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5217704Z test_consistency_SparseCSR_masked_sum_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5217901Z test_consistency_SparseCSR_masked_sum_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5218107Z test_consistency_SparseCSR_neg_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
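The pattern in the masked_* results above: masked.amax, masked.amin, and masked.mean on SparseCSR pass for floating-point dtypes but are skipped as unsupported for bool, integer, and complex inputs, while masked.prod and masked.sum cover every dtype. A hedged sketch of that support matrix, assuming the prototype torch.masked namespace of this build exposes these reductions as callables, as the test names suggest:

    import torch

    dense = torch.tensor([[0.0, 1.0, 5.0], [2.0, 0.0, 3.0]])
    csr = dense.to_sparse_csr()

    # Floating-point CSR input: all of the reductions above are exercised.
    print(torch.masked.sum(csr, dim=1))
    print(torch.masked.amax(csr, dim=1))

    # Integer CSR input: amax/amin/mean are skipped as unsupported in this
    # build (that is what the skip messages record), but sum still passes.
    icsr = dense.to(torch.int32).to_sparse_csr()
    print(torch.masked.sum(icsr, dim=1))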
2023-01-11T21:45:10.5218281Z test_consistency_SparseCSR_neg_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5218575Z test_consistency_SparseCSR_neg_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5218832Z test_consistency_SparseCSR_neg_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5219022Z test_consistency_SparseCSR_neg_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5219212Z test_consistency_SparseCSR_neg_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5219402Z test_consistency_SparseCSR_neg_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5219587Z test_consistency_SparseCSR_neg_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5219777Z test_consistency_SparseCSR_neg_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5219990Z test_consistency_SparseCSR_neg_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5220164Z test_consistency_SparseCSR_neg_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5220359Z test_consistency_SparseCSR_neg_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5220572Z test_consistency_SparseCSR_nn_functional_relu_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5220780Z test_consistency_SparseCSR_nn_functional_relu_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5220990Z test_consistency_SparseCSR_nn_functional_relu_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5221193Z test_consistency_SparseCSR_nn_functional_relu_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5221404Z test_consistency_SparseCSR_nn_functional_relu_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5221613Z test_consistency_SparseCSR_nn_functional_relu_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5221837Z test_consistency_SparseCSR_nn_functional_relu_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5222049Z test_consistency_SparseCSR_nn_functional_relu_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5222299Z test_consistency_SparseCSR_nn_functional_relu_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5222497Z test_consistency_SparseCSR_positive_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5222700Z test_consistency_SparseCSR_positive_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5222901Z test_consistency_SparseCSR_positive_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5223094Z test_consistency_SparseCSR_positive_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5223296Z test_consistency_SparseCSR_positive_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5223500Z test_consistency_SparseCSR_positive_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5223714Z test_consistency_SparseCSR_positive_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5223889Z test_consistency_SparseCSR_positive_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5224085Z test_consistency_SparseCSR_positive_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5224278Z test_consistency_SparseCSR_positive_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5224472Z test_consistency_SparseCSR_positive_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5224662Z test_consistency_SparseCSR_positive_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5224890Z test_consistency_SparseCSR_rad2deg_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5225089Z test_consistency_SparseCSR_rad2deg_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5225288Z test_consistency_SparseCSR_rad2deg_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5225534Z test_consistency_SparseCSR_rad2deg_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5225703Z test_consistency_SparseCSR_rad2deg_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5225898Z test_consistency_SparseCSR_rad2deg_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5226091Z test_consistency_SparseCSR_rad2deg_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5226278Z test_consistency_SparseCSR_rad2deg_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5226482Z test_consistency_SparseCSR_rad2deg_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5226677Z test_consistency_SparseCSR_rad2deg_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5226878Z test_consistency_SparseCSR_randn_like_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5227079Z test_consistency_SparseCSR_randn_like_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5227295Z test_consistency_SparseCSR_randn_like_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5227476Z test_consistency_SparseCSR_randn_like_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5227675Z test_consistency_SparseCSR_randn_like_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5227879Z test_consistency_SparseCSR_randn_like_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5228080Z test_consistency_SparseCSR_randn_like_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5228304Z test_consistency_SparseCSR_round_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5228491Z test_consistency_SparseCSR_round_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5228675Z test_consistency_SparseCSR_round_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5228897Z test_consistency_SparseCSR_round_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5229096Z test_consistency_SparseCSR_round_cuda_int16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5229262Z test_consistency_SparseCSR_round_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5229452Z test_consistency_SparseCSR_round_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5229649Z test_consistency_SparseCSR_round_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5229841Z test_consistency_SparseCSR_round_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5230035Z test_consistency_SparseCSR_sgn_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5230226Z test_consistency_SparseCSR_sgn_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5230423Z test_consistency_SparseCSR_sgn_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5230612Z test_consistency_SparseCSR_sgn_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5230778Z test_consistency_SparseCSR_sgn_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5231013Z test_consistency_SparseCSR_sgn_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5231199Z test_consistency_SparseCSR_sgn_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5231388Z test_consistency_SparseCSR_sgn_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5231572Z test_consistency_SparseCSR_sgn_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5231756Z test_consistency_SparseCSR_sgn_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5231940Z test_consistency_SparseCSR_sgn_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5232163Z test_consistency_SparseCSR_sgn_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5232336Z test_consistency_SparseCSR_sgn_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5239087Z test_consistency_SparseCSR_sign_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5239284Z test_consistency_SparseCSR_sign_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5239469Z test_consistency_SparseCSR_sign_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5239643Z test_consistency_SparseCSR_sign_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5239817Z test_consistency_SparseCSR_sign_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5239986Z test_consistency_SparseCSR_sign_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5240161Z test_consistency_SparseCSR_sign_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5240334Z test_consistency_SparseCSR_sign_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5240512Z test_consistency_SparseCSR_sign_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5240684Z test_consistency_SparseCSR_sign_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5240930Z test_consistency_SparseCSR_signbit_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5241112Z test_consistency_SparseCSR_signbit_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5241295Z test_consistency_SparseCSR_signbit_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5241464Z test_consistency_SparseCSR_signbit_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5241639Z test_consistency_SparseCSR_signbit_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5241814Z test_consistency_SparseCSR_signbit_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5241990Z test_consistency_SparseCSR_signbit_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5242166Z test_consistency_SparseCSR_signbit_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5242349Z test_consistency_SparseCSR_signbit_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5242518Z test_consistency_SparseCSR_signbit_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5242686Z test_consistency_SparseCSR_sin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5242856Z test_consistency_SparseCSR_sin_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5243030Z test_consistency_SparseCSR_sin_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5243205Z test_consistency_SparseCSR_sin_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5243403Z test_consistency_SparseCSR_sin_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5243601Z test_consistency_SparseCSR_sin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5243804Z test_consistency_SparseCSR_sin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5243976Z test_consistency_SparseCSR_sin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5244147Z test_consistency_SparseCSR_sin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5244319Z test_consistency_SparseCSR_sin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5244488Z test_consistency_SparseCSR_sin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5244654Z test_consistency_SparseCSR_sin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5244826Z test_consistency_SparseCSR_sin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5244999Z test_consistency_SparseCSR_sinh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5245173Z test_consistency_SparseCSR_sinh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5245353Z test_consistency_SparseCSR_sinh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5245535Z test_consistency_SparseCSR_sinh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5245717Z test_consistency_SparseCSR_sinh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5245892Z test_consistency_SparseCSR_sinh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5246063Z test_consistency_SparseCSR_sinh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5246225Z test_consistency_SparseCSR_sinh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5246395Z test_consistency_SparseCSR_sinh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5246589Z test_consistency_SparseCSR_sinh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5246764Z test_consistency_SparseCSR_sinh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5246931Z test_consistency_SparseCSR_sinh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5247102Z test_consistency_SparseCSR_sinh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5247273Z test_consistency_SparseCSR_sqrt_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5247444Z test_consistency_SparseCSR_sqrt_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5247615Z test_consistency_SparseCSR_sqrt_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5247797Z test_consistency_SparseCSR_sqrt_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5247974Z test_consistency_SparseCSR_sqrt_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5248150Z test_consistency_SparseCSR_sqrt_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5248326Z test_consistency_SparseCSR_sqrt_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5248496Z test_consistency_SparseCSR_sqrt_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5248666Z test_consistency_SparseCSR_sqrt_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5248834Z test_consistency_SparseCSR_sqrt_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5249029Z test_consistency_SparseCSR_sqrt_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5249191Z test_consistency_SparseCSR_sqrt_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5249361Z test_consistency_SparseCSR_sqrt_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5249533Z test_consistency_SparseCSR_tan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5249704Z test_consistency_SparseCSR_tan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5249883Z test_consistency_SparseCSR_tan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5250058Z test_consistency_SparseCSR_tan_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5250230Z test_consistency_SparseCSR_tan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5250402Z test_consistency_SparseCSR_tan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5250579Z test_consistency_SparseCSR_tan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5250748Z test_consistency_SparseCSR_tan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5250916Z test_consistency_SparseCSR_tan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5251086Z test_consistency_SparseCSR_tan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5251255Z test_consistency_SparseCSR_tan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5251427Z test_consistency_SparseCSR_tan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5251595Z test_consistency_SparseCSR_tan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5251772Z test_consistency_SparseCSR_tanh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5251941Z test_consistency_SparseCSR_tanh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5252144Z test_consistency_SparseCSR_tanh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5252314Z test_consistency_SparseCSR_tanh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5252494Z test_consistency_SparseCSR_tanh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5252669Z test_consistency_SparseCSR_tanh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5252840Z test_consistency_SparseCSR_tanh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5253009Z test_consistency_SparseCSR_tanh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5253181Z test_consistency_SparseCSR_tanh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5253354Z test_consistency_SparseCSR_tanh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5253530Z test_consistency_SparseCSR_tanh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5253717Z test_consistency_SparseCSR_tanh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5253910Z test_consistency_SparseCSR_tanh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5254096Z test_consistency_SparseCSR_to_sparse_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5254301Z test_consistency_SparseCSR_to_sparse_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5254689Z test_consistency_SparseCSR_to_sparse_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5254945Z test_consistency_SparseCSR_to_sparse_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5255289Z test_consistency_SparseCSR_to_sparse_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5255483Z test_consistency_SparseCSR_to_sparse_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5255665Z test_consistency_SparseCSR_to_sparse_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5255836Z test_consistency_SparseCSR_to_sparse_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5256014Z test_consistency_SparseCSR_to_sparse_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5256184Z test_consistency_SparseCSR_to_sparse_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
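The test_consistency_SparseCSR_to_sparse_* cases compare Tensor.to_sparse applied to a CSR input against the strided reference. A small sketch of the round trip these tests exercise, under the assumption that to_sparse() maps a compressed input to the COO layout as in current PyTorch:

    import torch

    dense = torch.tensor([[1.0, 0.0], [0.0, 2.0]])
    csr = dense.to_sparse_csr()   # strided -> CSR
    coo = csr.to_sparse()         # CSR -> COO, same entries
    assert coo.layout == torch.sparse_coo
    assert torch.equal(coo.to_dense(), dense)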
2023-01-11T21:45:10.5256360Z test_consistency_SparseCSR_to_sparse_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5256534Z test_consistency_SparseCSR_to_sparse_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5256710Z test_consistency_SparseCSR_trunc_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5256885Z test_consistency_SparseCSR_trunc_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5257056Z test_consistency_SparseCSR_trunc_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5257227Z test_consistency_SparseCSR_trunc_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5257393Z test_consistency_SparseCSR_trunc_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5257568Z test_consistency_SparseCSR_trunc_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5257737Z test_consistency_SparseCSR_trunc_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5257914Z test_consistency_SparseCSR_trunc_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5258083Z test_consistency_SparseCSR_trunc_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5258286Z test_copy_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.030s) 2023-01-11T21:45:10.5258450Z test_copy_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5258707Z test_copy_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5258877Z test_copy_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5259034Z test_copy_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5259197Z test_copy_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5259353Z test_copy_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5259515Z test_copy_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5259668Z test_copy_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5259823Z test_copy_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5259980Z test_copy_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5260133Z test_copy_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5260289Z test_copy_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5260444Z test_copy_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.020s) 2023-01-11T21:45:10.5260606Z test_copy_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5260769Z test_copy_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5260962Z test_copy_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5261122Z test_copy_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5261287Z test_copy_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.020s) 2023-01-11T21:45:10.5261444Z test_copy_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.020s) 2023-01-11T21:45:10.5261593Z test_copy_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.020s) 2023-01-11T21:45:10.5261744Z test_copy_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.020s) 2023-01-11T21:45:10.5261904Z test_copy_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.020s) 2023-01-11T21:45:10.5262069Z test_copy_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5262234Z test_copy_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5262395Z test_copy_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5262560Z test_copy_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.022s) 2023-01-11T21:45:10.5262728Z test_copy_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.022s) 2023-01-11T21:45:10.5262894Z test_copy_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5263046Z test_copy_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5263210Z test_copy_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5263366Z test_copy_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5263528Z test_copy_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5263685Z test_copy_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5263847Z test_copy_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5264004Z test_copy_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.020s) 2023-01-11T21:45:10.5264203Z test_copy_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5264356Z test_copy_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5264524Z test_copy_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5264692Z test_copy_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5264854Z test_copy_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5265015Z test_copy_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5265181Z test_copy_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5265338Z test_copy_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.020s) 2023-01-11T21:45:10.5265498Z test_copy_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5265647Z test_copy_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5265805Z test_copy_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5265961Z test_copy_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5266134Z test_copy_errors_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.039s) 2023-01-11T21:45:10.5266302Z test_copy_errors_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... 
ok (0.038s) 2023-01-11T21:45:10.5266478Z test_copy_errors_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5266680Z test_copy_errors_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5266851Z test_copy_errors_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5267022Z test_copy_errors_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5267187Z test_copy_errors_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5267357Z test_copy_errors_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5267524Z test_copy_errors_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5267688Z test_copy_errors_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5267856Z test_copy_errors_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5268024Z test_copy_errors_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5268202Z test_copy_errors_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5268372Z test_copy_errors_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5268540Z test_copy_errors_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5268715Z test_copy_errors_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5268883Z test_copy_errors_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5269053Z test_copy_errors_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5269219Z test_copy_errors_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5269386Z test_copy_errors_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5269554Z test_copy_errors_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5269719Z test_copy_errors_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5269909Z test_copy_errors_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5270068Z test_copy_errors_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5270242Z test_copy_errors_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5270410Z test_copy_errors_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5270584Z test_copy_errors_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5270758Z test_copy_errors_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5270932Z test_copy_errors_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5271101Z test_copy_errors_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5271268Z test_copy_errors_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.026s) 2023-01-11T21:45:10.5271433Z test_copy_errors_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5271591Z test_copy_errors_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5271752Z test_copy_errors_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5271918Z test_copy_errors_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5272082Z test_copy_errors_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5272254Z test_copy_errors_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5272443Z test_copy_errors_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5272619Z test_copy_errors_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5272789Z test_copy_errors_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5272950Z test_copy_errors_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5273118Z test_copy_errors_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5273294Z test_copy_errors_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5273482Z test_copy_errors_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5273663Z test_copy_errors_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5273829Z test_copy_errors_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5273994Z test_copy_errors_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5274158Z test_copy_errors_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5274336Z test_dim_SparseBSC_cuda (__main__.TestSparseCompressedCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:45:10.5274507Z test_dim_SparseBSR_cuda (__main__.TestSparseCompressedCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:45:10.5274683Z test_dim_SparseCSC_cuda (__main__.TestSparseCompressedCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:45:10.5274854Z test_dim_SparseCSR_cuda (__main__.TestSparseCompressedCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:45:10.5275023Z test_empty_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5275191Z test_empty_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5275363Z test_empty_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5275559Z test_empty_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5275723Z test_empty_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5275881Z test_empty_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5276045Z test_empty_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5276207Z test_empty_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.018s)
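Note that test_empty_* here covers only the CSR and CSC layouts, while the empty_like tests that follow cover all four compressed layouts. A hedged sketch of what these constructor tests exercise, assuming torch.empty accepts the sparse compressed layouts the way the passing results suggest (CPU used here for a self-contained run; the tests above run on CUDA):

    import torch

    # Assumption: torch.empty supports layout=torch.sparse_csr in this build,
    # producing a CSR tensor with no stored values yet.
    x = torch.empty((2, 3), layout=torch.sparse_csr)
    print(x.layout, x.shape)

    # Same-layout empty_like, as the later test_empty_like_* results cover.
    y = torch.empty_like(x)
    assert y.layout == torch.sparse_csr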
2023-01-11T21:45:10.5276366Z test_empty_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5276529Z test_empty_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5276688Z test_empty_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5276848Z test_empty_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5277015Z test_empty_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5277173Z test_empty_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5277336Z test_empty_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5277504Z test_empty_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5277666Z test_empty_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5277830Z test_empty_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5277992Z test_empty_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5278175Z test_empty_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5278329Z test_empty_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5278485Z test_empty_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5278635Z test_empty_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5278788Z test_empty_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5278960Z test_empty_errors_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5279124Z test_empty_errors_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5279294Z test_empty_errors_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5279469Z test_empty_errors_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5279643Z test_empty_errors_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5279814Z test_empty_errors_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5279980Z test_empty_errors_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5280141Z test_empty_errors_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5280308Z test_empty_errors_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5280471Z test_empty_errors_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5280637Z test_empty_errors_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5280803Z test_empty_errors_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5280975Z test_empty_errors_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5281141Z test_empty_errors_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... 
ok (0.004s) 2023-01-11T21:45:10.5281353Z test_empty_errors_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5281521Z test_empty_errors_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5281694Z test_empty_errors_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5281861Z test_empty_errors_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5282029Z test_empty_errors_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5282195Z test_empty_errors_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5282364Z test_empty_errors_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5282530Z test_empty_errors_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5282695Z test_empty_errors_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5282857Z test_empty_errors_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5283034Z test_empty_like_SparseBSC_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5283213Z test_empty_like_SparseBSC_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5283400Z test_empty_like_SparseBSC_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5283585Z test_empty_like_SparseBSC_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5283792Z test_empty_like_SparseBSC_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5283974Z test_empty_like_SparseBSC_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5284156Z test_empty_like_SparseBSC_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5284333Z test_empty_like_SparseBSC_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5284508Z test_empty_like_SparseBSC_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5284673Z test_empty_like_SparseBSC_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5284849Z test_empty_like_SparseBSC_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5285020Z test_empty_like_SparseBSC_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5285203Z test_empty_like_SparseBSC_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5285377Z test_empty_like_SparseBSC_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5285562Z test_empty_like_SparseBSC_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5285743Z test_empty_like_SparseBSC_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5285920Z test_empty_like_SparseBSC_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5286087Z test_empty_like_SparseBSC_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... 
ok (0.111s) 2023-01-11T21:45:10.5286265Z test_empty_like_SparseBSC_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5286439Z test_empty_like_SparseBSC_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5286612Z test_empty_like_SparseBSC_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5286805Z test_empty_like_SparseBSC_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5286982Z test_empty_like_SparseBSC_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5287159Z test_empty_like_SparseBSC_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5287337Z test_empty_like_SparseBSC_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5287512Z test_empty_like_SparseBSC_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5287686Z test_empty_like_SparseBSC_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5287868Z test_empty_like_SparseBSC_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5288051Z test_empty_like_SparseBSC_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5288231Z test_empty_like_SparseBSC_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5288406Z test_empty_like_SparseBSC_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5288578Z test_empty_like_SparseBSC_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5288753Z test_empty_like_SparseBSC_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5288924Z test_empty_like_SparseBSC_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5289099Z test_empty_like_SparseBSC_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5289289Z test_empty_like_SparseBSC_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5289470Z test_empty_like_SparseBSC_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5289646Z test_empty_like_SparseBSC_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5289830Z test_empty_like_SparseBSC_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5290010Z test_empty_like_SparseBSC_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5290186Z test_empty_like_SparseBSC_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5290360Z test_empty_like_SparseBSC_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5290532Z test_empty_like_SparseBSC_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5290708Z test_empty_like_SparseBSC_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5290879Z test_empty_like_SparseBSC_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... 
2023-01-11T21:45:10.5291056Z test_empty_like_SparseBSC_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5291233Z test_empty_like_SparseBSC_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5291413Z test_empty_like_SparseBSC_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5291591Z test_empty_like_SparseBSR_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5291765Z test_empty_like_SparseBSR_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5291949Z test_empty_like_SparseBSR_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5292133Z test_empty_like_SparseBSR_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5292307Z test_empty_like_SparseBSR_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5292502Z test_empty_like_SparseBSR_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5292684Z test_empty_like_SparseBSR_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5292858Z test_empty_like_SparseBSR_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5293030Z test_empty_like_SparseBSR_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5293206Z test_empty_like_SparseBSR_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5293382Z test_empty_like_SparseBSR_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.112s)
2023-01-11T21:45:10.5293561Z test_empty_like_SparseBSR_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.112s)
2023-01-11T21:45:10.5293751Z test_empty_like_SparseBSR_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s)
2023-01-11T21:45:10.5293952Z test_empty_like_SparseBSR_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5294142Z test_empty_like_SparseBSR_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5294327Z test_empty_like_SparseBSR_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5294652Z test_empty_like_SparseBSR_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5294868Z test_empty_like_SparseBSR_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5295047Z test_empty_like_SparseBSR_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5295278Z test_empty_like_SparseBSR_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5295456Z test_empty_like_SparseBSR_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5295627Z test_empty_like_SparseBSR_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5295807Z test_empty_like_SparseBSR_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5295979Z test_empty_like_SparseBSR_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5296162Z test_empty_like_SparseBSR_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5296338Z test_empty_like_SparseBSR_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.115s)
2023-01-11T21:45:10.5296530Z test_empty_like_SparseBSR_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5296713Z test_empty_like_SparseBSR_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5296897Z test_empty_like_SparseBSR_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5297076Z test_empty_like_SparseBSR_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5297253Z test_empty_like_SparseBSR_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5297426Z test_empty_like_SparseBSR_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5297592Z test_empty_like_SparseBSR_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5297764Z test_empty_like_SparseBSR_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5297944Z test_empty_like_SparseBSR_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5298125Z test_empty_like_SparseBSR_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5298335Z test_empty_like_SparseBSR_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5298577Z test_empty_like_SparseBSR_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5298777Z test_empty_like_SparseBSR_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5298957Z test_empty_like_SparseBSR_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5299133Z test_empty_like_SparseBSR_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5299301Z test_empty_like_SparseBSR_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5299477Z test_empty_like_SparseBSR_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5299648Z test_empty_like_SparseBSR_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5299822Z test_empty_like_SparseBSR_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5299992Z test_empty_like_SparseBSR_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5300166Z test_empty_like_SparseBSR_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5300341Z test_empty_like_SparseBSR_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5300519Z test_empty_like_SparseCSC_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5300688Z test_empty_like_SparseCSC_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5300906Z test_empty_like_SparseCSC_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5301090Z test_empty_like_SparseCSC_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5301268Z test_empty_like_SparseCSC_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5301442Z test_empty_like_SparseCSC_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5301618Z test_empty_like_SparseCSC_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5301792Z test_empty_like_SparseCSC_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5301962Z test_empty_like_SparseCSC_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5302131Z test_empty_like_SparseCSC_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5302300Z test_empty_like_SparseCSC_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5302477Z test_empty_like_SparseCSC_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5302658Z test_empty_like_SparseCSC_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5302829Z test_empty_like_SparseCSC_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5303010Z test_empty_like_SparseCSC_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5303189Z test_empty_like_SparseCSC_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5303364Z test_empty_like_SparseCSC_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5303544Z test_empty_like_SparseCSC_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5303717Z test_empty_like_SparseCSC_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5303905Z test_empty_like_SparseCSC_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.108s)
2023-01-11T21:45:10.5304075Z test_empty_like_SparseCSC_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5304249Z test_empty_like_SparseCSC_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5304422Z test_empty_like_SparseCSC_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5304595Z test_empty_like_SparseCSC_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5304771Z test_empty_like_SparseCSC_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5304950Z test_empty_like_SparseCSC_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5305131Z test_empty_like_SparseCSC_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5305317Z test_empty_like_SparseCSC_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5305485Z test_empty_like_SparseCSC_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5305663Z test_empty_like_SparseCSC_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5305837Z test_empty_like_SparseCSC_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5306007Z test_empty_like_SparseCSC_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5306176Z test_empty_like_SparseCSC_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5306430Z test_empty_like_SparseCSC_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5306622Z test_empty_like_SparseCSC_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5306796Z test_empty_like_SparseCSC_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5306976Z test_empty_like_SparseCSC_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5307141Z test_empty_like_SparseCSC_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5307323Z test_empty_like_SparseCSC_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5307505Z test_empty_like_SparseCSC_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5307681Z test_empty_like_SparseCSC_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5307855Z test_empty_like_SparseCSC_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5308031Z test_empty_like_SparseCSC_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5308202Z test_empty_like_SparseCSC_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5308371Z test_empty_like_SparseCSC_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5308540Z test_empty_like_SparseCSC_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5308709Z test_empty_like_SparseCSC_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5308885Z test_empty_like_SparseCSC_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5309063Z test_empty_like_SparseCSR_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5309236Z test_empty_like_SparseCSR_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5309454Z test_empty_like_SparseCSR_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5309636Z test_empty_like_SparseCSR_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5309811Z test_empty_like_SparseCSR_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5309988Z test_empty_like_SparseCSR_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5310164Z test_empty_like_SparseCSR_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5310331Z test_empty_like_SparseCSR_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5310506Z test_empty_like_SparseCSR_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5310674Z test_empty_like_SparseCSR_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5310852Z test_empty_like_SparseCSR_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5311020Z test_empty_like_SparseCSR_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5311199Z test_empty_like_SparseCSR_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5311373Z test_empty_like_SparseCSR_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5311556Z test_empty_like_SparseCSR_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5311738Z test_empty_like_SparseCSR_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5311935Z test_empty_like_SparseCSR_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.108s)
2023-01-11T21:45:10.5312108Z test_empty_like_SparseCSR_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5312288Z test_empty_like_SparseCSR_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5312459Z test_empty_like_SparseCSR_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5312629Z test_empty_like_SparseCSR_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5312799Z test_empty_like_SparseCSR_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5312973Z test_empty_like_SparseCSR_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5313145Z test_empty_like_SparseCSR_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5313325Z test_empty_like_SparseCSR_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5313492Z test_empty_like_SparseCSR_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5313674Z test_empty_like_SparseCSR_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5313854Z test_empty_like_SparseCSR_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5314031Z test_empty_like_SparseCSR_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5314203Z test_empty_like_SparseCSR_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5314374Z test_empty_like_SparseCSR_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5314547Z test_empty_like_SparseCSR_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5314719Z test_empty_like_SparseCSR_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5314907Z test_empty_like_SparseCSR_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5315081Z test_empty_like_SparseCSR_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5315254Z test_empty_like_SparseCSR_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5315432Z test_empty_like_SparseCSR_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5315603Z test_empty_like_SparseCSR_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5315784Z test_empty_like_SparseCSR_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5315965Z test_empty_like_SparseCSR_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5316144Z test_empty_like_SparseCSR_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5316321Z test_empty_like_SparseCSR_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5316491Z test_empty_like_SparseCSR_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5316661Z test_empty_like_SparseCSR_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5316829Z test_empty_like_SparseCSR_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5316998Z test_empty_like_SparseCSR_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5317173Z test_empty_like_SparseCSR_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5317375Z test_empty_like_SparseCSR_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5317577Z test_invalid_input_SparseBSC_target_sparse_compressed_tensor_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.034s)
2023-01-11T21:45:10.5317790Z test_invalid_input_SparseBSC_target_sparse_compressed_tensor_no_size_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.020s)
2023-01-11T21:45:10.5318009Z test_invalid_input_SparseBSC_target_validate_sparse_compressed_tensor_args_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.024s)
2023-01-11T21:45:10.5318209Z test_invalid_input_SparseBSR_target_sparse_compressed_tensor_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.033s)
2023-01-11T21:45:10.5318409Z test_invalid_input_SparseBSR_target_sparse_compressed_tensor_no_size_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.020s)
2023-01-11T21:45:10.5318619Z test_invalid_input_SparseBSR_target_validate_sparse_compressed_tensor_args_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.024s)
2023-01-11T21:45:10.5318822Z test_invalid_input_SparseCSC_target_sparse_compressed_tensor_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.030s)
2023-01-11T21:45:10.5319029Z test_invalid_input_SparseCSC_target_sparse_compressed_tensor_no_size_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.020s)
2023-01-11T21:45:10.5319244Z test_invalid_input_SparseCSC_target_validate_sparse_compressed_tensor_args_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.022s)
2023-01-11T21:45:10.5319439Z test_invalid_input_SparseCSR_target_sparse_compressed_tensor_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.030s)
2023-01-11T21:45:10.5319645Z test_invalid_input_SparseCSR_target_sparse_compressed_tensor_no_size_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.020s)
2023-01-11T21:45:10.5319860Z test_invalid_input_SparseCSR_target_validate_sparse_compressed_tensor_args_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.022s)
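[Editor's annotation, not part of the captured log] The test_empty_like_* entries above exercise torch.empty_like over every ordered pair of compressed sparse layouts (CSR, CSC, BSR, BSC) and all dtypes, and the test_invalid_input_* entries exercise constructor argument validation. A minimal sketch of the kind of call under test, assuming a recent PyTorch build; the shapes and values are illustrative, not taken from the suite:

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # A 2x3 CSR tensor: crow_indices has len(rows) + 1 entries and marks
    # where each row's column indices/values begin and end.
    crow = torch.tensor([0, 2, 3], device=device)
    col = torch.tensor([0, 2, 1], device=device)
    val = torch.tensor([1.0, 2.0, 3.0], device=device)
    csr = torch.sparse_csr_tensor(crow, col, val, size=(2, 3))

    # empty_like preserves shape, dtype, device, and the compressed layout.
    out = torch.empty_like(csr)
    assert out.shape == csr.shape and out.layout == torch.sparse_csr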
2023-01-11T21:45:10.5320036Z test_layout_SparseBSC_cuda (__main__.TestSparseCompressedCUDA) ... skip: Only runs on cpu (0.001s)
2023-01-11T21:45:10.5320206Z test_layout_SparseBSR_cuda (__main__.TestSparseCompressedCUDA) ... skip: Only runs on cpu (0.000s)
2023-01-11T21:45:10.5320463Z test_layout_SparseCSC_cuda (__main__.TestSparseCompressedCUDA) ... skip: Only runs on cpu (0.000s)
2023-01-11T21:45:10.5320638Z test_layout_SparseCSR_cuda (__main__.TestSparseCompressedCUDA) ... skip: Only runs on cpu (0.000s)
2023-01-11T21:45:10.5320804Z test_pickle_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.220s)
2023-01-11T21:45:10.5320968Z test_pickle_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.170s)
2023-01-11T21:45:10.5321132Z test_pickle_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.153s)
2023-01-11T21:45:10.5321288Z test_pickle_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.118s)
2023-01-11T21:45:10.5321446Z test_print_SparseBSC_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.253s)
2023-01-11T21:45:10.5321606Z test_print_SparseBSR_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.246s)
2023-01-11T21:45:10.5321754Z test_print_SparseCSC_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.223s)
2023-01-11T21:45:10.5321914Z test_print_SparseCSR_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.223s)
2023-01-11T21:45:10.5322092Z test_select_copy_SparseBSC_int32_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.722s)
2023-01-11T21:45:10.5322267Z test_select_copy_SparseBSC_int32_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.675s)
2023-01-11T21:45:10.5322446Z test_select_copy_SparseBSC_int32_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.728s)
2023-01-11T21:45:10.5322627Z test_select_copy_SparseBSC_int32_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.735s)
2023-01-11T21:45:10.5322799Z test_select_copy_SparseBSC_int32_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.722s)
2023-01-11T21:45:10.5322997Z test_select_copy_SparseBSC_int32_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.723s)
2023-01-11T21:45:10.5323169Z test_select_copy_SparseBSC_int32_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.718s)
2023-01-11T21:45:10.5323336Z test_select_copy_SparseBSC_int32_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.674s)
2023-01-11T21:45:10.5323505Z test_select_copy_SparseBSC_int32_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.675s)
2023-01-11T21:45:10.5323671Z test_select_copy_SparseBSC_int32_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.666s)
2023-01-11T21:45:10.5323842Z test_select_copy_SparseBSC_int32_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.675s)
2023-01-11T21:45:10.5324018Z test_select_copy_SparseBSC_int32_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.675s)
2023-01-11T21:45:10.5324197Z test_select_copy_SparseBSC_int64_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.721s)
2023-01-11T21:45:10.5324369Z test_select_copy_SparseBSC_int64_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.673s)
2023-01-11T21:45:10.5324550Z test_select_copy_SparseBSC_int64_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.724s)
2023-01-11T21:45:10.5324723Z test_select_copy_SparseBSC_int64_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.732s)
2023-01-11T21:45:10.5324895Z test_select_copy_SparseBSC_int64_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.720s)
2023-01-11T21:45:10.5325064Z test_select_copy_SparseBSC_int64_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.724s)
2023-01-11T21:45:10.5325230Z test_select_copy_SparseBSC_int64_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.714s)
2023-01-11T21:45:10.5325399Z test_select_copy_SparseBSC_int64_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.674s)
2023-01-11T21:45:10.5325567Z test_select_copy_SparseBSC_int64_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.675s)
2023-01-11T21:45:10.5325736Z test_select_copy_SparseBSC_int64_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.667s)
2023-01-11T21:45:10.5325908Z test_select_copy_SparseBSC_int64_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.674s)
2023-01-11T21:45:10.5326102Z test_select_copy_SparseBSC_int64_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.673s)
2023-01-11T21:45:10.5326274Z test_select_copy_SparseBSR_int32_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.640s)
2023-01-11T21:45:10.5326441Z test_select_copy_SparseBSR_int32_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.592s)
2023-01-11T21:45:10.5326619Z test_select_copy_SparseBSR_int32_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.643s)
2023-01-11T21:45:10.5326797Z test_select_copy_SparseBSR_int32_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.650s)
2023-01-11T21:45:10.5326971Z test_select_copy_SparseBSR_int32_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.639s)
2023-01-11T21:45:10.5327141Z test_select_copy_SparseBSR_int32_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.640s)
2023-01-11T21:45:10.5327309Z test_select_copy_SparseBSR_int32_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.633s)
2023-01-11T21:45:10.5327479Z test_select_copy_SparseBSR_int32_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.592s)
2023-01-11T21:45:10.5327646Z test_select_copy_SparseBSR_int32_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.591s)
2023-01-11T21:45:10.5327807Z test_select_copy_SparseBSR_int32_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.583s)
2023-01-11T21:45:10.5327977Z test_select_copy_SparseBSR_int32_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.592s)
2023-01-11T21:45:10.5328142Z test_select_copy_SparseBSR_int32_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.592s)
2023-01-11T21:45:10.5328316Z test_select_copy_SparseBSR_int64_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.624s)
2023-01-11T21:45:10.5328513Z test_select_copy_SparseBSR_int64_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.577s)
2023-01-11T21:45:10.5328691Z test_select_copy_SparseBSR_int64_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.630s)
2023-01-11T21:45:10.5328871Z test_select_copy_SparseBSR_int64_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.637s)
2023-01-11T21:45:10.5329043Z test_select_copy_SparseBSR_int64_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.628s)
2023-01-11T21:45:10.5329210Z test_select_copy_SparseBSR_int64_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.628s)
2023-01-11T21:45:10.5329371Z test_select_copy_SparseBSR_int64_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.620s)
2023-01-11T21:45:10.5329539Z test_select_copy_SparseBSR_int64_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.578s)
2023-01-11T21:45:10.5329707Z test_select_copy_SparseBSR_int64_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.578s)
2023-01-11T21:45:10.5329876Z test_select_copy_SparseBSR_int64_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.572s)
2023-01-11T21:45:10.5330044Z test_select_copy_SparseBSR_int64_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.578s)
2023-01-11T21:45:10.5330212Z test_select_copy_SparseBSR_int64_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.579s)
2023-01-11T21:45:10.5330393Z test_select_copy_SparseCSC_int32_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.584s)
2023-01-11T21:45:10.5330563Z test_select_copy_SparseCSC_int32_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.537s)
2023-01-11T21:45:10.5330739Z test_select_copy_SparseCSC_int32_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.590s)
2023-01-11T21:45:10.5330908Z test_select_copy_SparseCSC_int32_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.596s)
2023-01-11T21:45:10.5331081Z test_select_copy_SparseCSC_int32_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.585s)
2023-01-11T21:45:10.5331253Z test_select_copy_SparseCSC_int32_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.586s)
2023-01-11T21:45:10.5331420Z test_select_copy_SparseCSC_int32_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.577s)
2023-01-11T21:45:10.5331614Z test_select_copy_SparseCSC_int32_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.536s)
2023-01-11T21:45:10.5331784Z test_select_copy_SparseCSC_int32_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.537s)
2023-01-11T21:45:10.5331952Z test_select_copy_SparseCSC_int32_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.528s)
2023-01-11T21:45:10.5332120Z test_select_copy_SparseCSC_int32_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.536s)
2023-01-11T21:45:10.5332280Z test_select_copy_SparseCSC_int32_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.536s)
2023-01-11T21:45:10.5332453Z test_select_copy_SparseCSC_int64_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.583s)
2023-01-11T21:45:10.5332623Z test_select_copy_SparseCSC_int64_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.536s)
2023-01-11T21:45:10.5332803Z test_select_copy_SparseCSC_int64_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.587s)
2023-01-11T21:45:10.5332982Z test_select_copy_SparseCSC_int64_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.597s)
2023-01-11T21:45:10.5333150Z test_select_copy_SparseCSC_int64_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.584s)
2023-01-11T21:45:10.5333320Z test_select_copy_SparseCSC_int64_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.584s)
2023-01-11T21:45:10.5333494Z test_select_copy_SparseCSC_int64_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.575s)
2023-01-11T21:45:10.5333680Z test_select_copy_SparseCSC_int64_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.534s)
2023-01-11T21:45:10.5333863Z test_select_copy_SparseCSC_int64_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.536s)
2023-01-11T21:45:10.5334056Z test_select_copy_SparseCSC_int64_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.528s)
2023-01-11T21:45:10.5334225Z test_select_copy_SparseCSC_int64_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.534s)
2023-01-11T21:45:10.5334392Z test_select_copy_SparseCSC_int64_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.535s)
2023-01-11T21:45:10.5334803Z test_select_copy_SparseCSR_int32_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.523s)
2023-01-11T21:45:10.5334974Z test_select_copy_SparseCSR_int32_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.474s)
2023-01-11T21:45:10.5335151Z test_select_copy_SparseCSR_int32_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.527s)
2023-01-11T21:45:10.5335326Z test_select_copy_SparseCSR_int32_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.535s)
2023-01-11T21:45:10.5335495Z test_select_copy_SparseCSR_int32_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.524s)
2023-01-11T21:45:10.5335661Z test_select_copy_SparseCSR_int32_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.523s)
2023-01-11T21:45:10.5335832Z test_select_copy_SparseCSR_int32_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.518s)
2023-01-11T21:45:10.5336004Z test_select_copy_SparseCSR_int32_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.475s)
2023-01-11T21:45:10.5336169Z test_select_copy_SparseCSR_int32_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.476s)
2023-01-11T21:45:10.5336335Z test_select_copy_SparseCSR_int32_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.468s)
2023-01-11T21:45:10.5336503Z test_select_copy_SparseCSR_int32_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.475s)
2023-01-11T21:45:10.5336669Z test_select_copy_SparseCSR_int32_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.474s)
2023-01-11T21:45:10.5336846Z test_select_copy_SparseCSR_int64_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.517s)
2023-01-11T21:45:10.5337017Z test_select_copy_SparseCSR_int64_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.469s)
2023-01-11T21:45:10.5337189Z test_select_copy_SparseCSR_int64_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.521s)
2023-01-11T21:45:10.5337415Z test_select_copy_SparseCSR_int64_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.528s)
2023-01-11T21:45:10.5337588Z test_select_copy_SparseCSR_int64_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.516s)
2023-01-11T21:45:10.5337759Z test_select_copy_SparseCSR_int64_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.518s)
2023-01-11T21:45:10.5337927Z test_select_copy_SparseCSR_int64_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.511s)
2023-01-11T21:45:10.5338097Z test_select_copy_SparseCSR_int64_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.469s)
2023-01-11T21:45:10.5338262Z test_select_copy_SparseCSR_int64_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.470s)
2023-01-11T21:45:10.5338435Z test_select_copy_SparseCSR_int64_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.462s)
2023-01-11T21:45:10.5338649Z test_select_copy_SparseCSR_int64_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.469s)
2023-01-11T21:45:10.5338812Z test_select_copy_SparseCSR_int64_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.470s)
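[Editor's annotation, not part of the captured log] The test_select_copy_* entries above, now complete for all four compressed layouts and both index dtypes (int32/int64), exercise torch.select_copy on compressed sparse inputs. A rough sketch of the operation, under the same assumptions as the previous note (illustrative values; the suite's actual checks compare against the densified tensor):

    import torch

    crow = torch.tensor([0, 2, 3])
    col = torch.tensor([0, 2, 1])
    val = torch.tensor([1.0, 2.0, 3.0])
    csr = torch.sparse_csr_tensor(crow, col, val, size=(2, 3))

    # select_copy returns an out-of-place copy of one slice along a dim;
    # the result's layout depends on the input layout and the chosen dim.
    row = torch.select_copy(csr, 0, 1)
    # Dense reference for the same slice.
    ref = csr.to_dense()[1]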
2023-01-11T21:45:10.5339021Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5339222Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.054s)
2023-01-11T21:45:10.5339432Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.063s)
2023-01-11T21:45:10.5339636Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.064s)
2023-01-11T21:45:10.5339874Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5340076Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5340277Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5340474Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5340666Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5340859Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.052s)
2023-01-11T21:45:10.5341055Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5341257Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5341460Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5341656Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5341859Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5342061Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.063s)
2023-01-11T21:45:10.5342258Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5342455Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5342646Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5342864Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5343066Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5343262Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.052s)
2023-01-11T21:45:10.5343459Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5343661Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5343863Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5344063Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5344269Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.060s)
2023-01-11T21:45:10.5344479Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5344669Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5344866Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5345059Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.058s)
2023-01-11T21:45:10.5345284Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5345480Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.050s)
2023-01-11T21:45:10.5345669Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.049s)
2023-01-11T21:45:10.5345865Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5346059Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5346256Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5346454Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5346653Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.060s)
2023-01-11T21:45:10.5346860Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5347058Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5347256Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5347449Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.058s)
2023-01-11T21:45:10.5347642Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5347841Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5348056Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.049s)
2023-01-11T21:45:10.5348253Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5348450Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5348649Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.120s)
2023-01-11T21:45:10.5348848Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5349054Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.122s)
2023-01-11T21:45:10.5349263Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.125s)
2023-01-11T21:45:10.5349467Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.121s)
2023-01-11T21:45:10.5349671Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.121s)
2023-01-11T21:45:10.5349870Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.118s)
2023-01-11T21:45:10.5350066Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5350263Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.102s)
2023-01-11T21:45:10.5350484Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.100s)
2023-01-11T21:45:10.5350677Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.102s)
2023-01-11T21:45:10.5350876Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5351078Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.120s)
2023-01-11T21:45:10.5351276Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5351484Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.122s)
2023-01-11T21:45:10.5351691Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.125s)
2023-01-11T21:45:10.5351891Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.121s)
2023-01-11T21:45:10.5352092Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.121s)
2023-01-11T21:45:10.5352293Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.118s)
2023-01-11T21:45:10.5352490Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5352681Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5352876Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.100s)
2023-01-11T21:45:10.5353078Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5353276Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.102s)
2023-01-11T21:45:10.5353505Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.114s)
2023-01-11T21:45:10.5353710Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.097s)
2023-01-11T21:45:10.5353915Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.118s)
2023-01-11T21:45:10.5354116Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.120s)
2023-01-11T21:45:10.5354315Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s)
2023-01-11T21:45:10.5354516Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.115s)
2023-01-11T21:45:10.5354713Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.113s)
2023-01-11T21:45:10.5354906Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.098s)
2023-01-11T21:45:10.5355105Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.098s)
2023-01-11T21:45:10.5355298Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.096s)
2023-01-11T21:45:10.5355497Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s)
2023-01-11T21:45:10.5355692Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s)
2023-01-11T21:45:10.5355918Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.116s)
2023-01-11T21:45:10.5356120Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.098s)
2023-01-11T21:45:10.5356325Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.118s)
2023-01-11T21:45:10.5356528Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.120s)
2023-01-11T21:45:10.5356720Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.116s)
2023-01-11T21:45:10.5356916Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.116s)
2023-01-11T21:45:10.5357116Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.113s)
2023-01-11T21:45:10.5357313Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.098s)
2023-01-11T21:45:10.5357511Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.099s)
2023-01-11T21:45:10.5357704Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.096s)
2023-01-11T21:45:10.5357900Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s)
2023-01-11T21:45:10.5358094Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s)
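[Editor's annotation, not part of the captured log] The two test_sparse_compressed_constructor_* blocks above parameterize construction from Python lists (____from_list) and from pre-made index/value tensors (____from_tensor) for each layout. A small sketch of tensor-based construction for the blocked layouts; the block size and values are made up for illustration:

    import torch

    # BSR: values holds one dense block per specified element, so with 2x2
    # blocks a (4, 4) tensor has 2 block rows and crow_indices of length 3.
    crow = torch.tensor([0, 1, 2])
    col = torch.tensor([1, 0])
    blocks = torch.ones(2, 2, 2)
    bsr = torch.sparse_bsr_tensor(crow, col, blocks, size=(4, 4))

    # The same index/value tensors, read as ccol/row indices, build the
    # column-compressed blocked variant.
    bsc = torch.sparse_bsc_tensor(crow, col, blocks, size=(4, 4))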
2023-01-11T21:45:10.5358304Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5358512Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5358758Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.063s)
2023-01-11T21:45:10.5358971Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.064s)
2023-01-11T21:45:10.5359178Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5359381Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5359582Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.060s)
2023-01-11T21:45:10.5359792Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.054s)
2023-01-11T21:45:10.5360004Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.054s)
2023-01-11T21:45:10.5360206Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.052s)
2023-01-11T21:45:10.5360410Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.054s)
2023-01-11T21:45:10.5360611Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5360811Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5361040Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.054s)
2023-01-11T21:45:10.5361250Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5361461Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.064s)
2023-01-11T21:45:10.5361662Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5361862Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5362059Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5362262Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5362467Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5362671Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.052s)
2023-01-11T21:45:10.5362868Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5363067Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5363273Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5363499Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5363736Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5363964Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5364167Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5364366Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5364564Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.058s)
2023-01-11T21:45:10.5364765Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5364964Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5365164Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.049s)
2023-01-11T21:45:10.5365371Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5365573Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5365779Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5365981Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5366191Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5366424Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5366631Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5366831Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5367031Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.058s)
2023-01-11T21:45:10.5367225Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5367425Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5367632Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.049s)
2023-01-11T21:45:10.5367835Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5368035Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
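[Editor's annotation, not part of the captured log] The ___factory_* variants above appear to route construction through the generic torch.sparse_compressed_tensor factory, which selects the layout via an explicit layout= keyword rather than a layout-specific function; treat this sketch as an assumption about the parameterization, not a statement of the suite's internals:

    import torch

    crow = torch.tensor([0, 2, 3])
    col = torch.tensor([0, 2, 1])
    val = torch.tensor([1.0, 2.0, 3.0])

    # One factory covers all four compressed layouts, chosen via layout=.
    csr = torch.sparse_compressed_tensor(
        crow, col, val, size=(2, 3), layout=torch.sparse_csr
    )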
2023-01-11T21:45:10.5368246Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.121s)
2023-01-11T21:45:10.5368452Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.102s)
2023-01-11T21:45:10.5368666Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.122s)
2023-01-11T21:45:10.5368876Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.124s)
2023-01-11T21:45:10.5369089Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.120s)
2023-01-11T21:45:10.5369312Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.120s)
2023-01-11T21:45:10.5369517Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.117s)
2023-01-11T21:45:10.5369725Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5369925Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5370124Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.100s)
2023-01-11T21:45:10.5370331Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5370538Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5370744Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.121s)
2023-01-11T21:45:10.5370947Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5371158Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.122s)
2023-01-11T21:45:10.5371361Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.125s)
2023-01-11T21:45:10.5371589Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.121s)
2023-01-11T21:45:10.5371794Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.120s)
ok (0.120s) 2023-01-11T21:45:10.5371995Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.117s) 2023-01-11T21:45:10.5372196Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.103s) 2023-01-11T21:45:10.5372396Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.103s) 2023-01-11T21:45:10.5372593Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.100s) 2023-01-11T21:45:10.5372795Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.103s) 2023-01-11T21:45:10.5373004Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.102s) 2023-01-11T21:45:10.5373210Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5373413Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5373649Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.117s) 2023-01-11T21:45:10.5373877Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.120s) 2023-01-11T21:45:10.5374084Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5374293Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.116s) 2023-01-11T21:45:10.5374641Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.112s) 2023-01-11T21:45:10.5374846Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.096s) 2023-01-11T21:45:10.5375048Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5375243Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.095s) 2023-01-11T21:45:10.5375447Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5375654Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5375860Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5376064Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5376280Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.117s) 2023-01-11T21:45:10.5376491Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.119s) 2023-01-11T21:45:10.5376695Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5376895Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5377125Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.113s) 2023-01-11T21:45:10.5377329Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5377528Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5377727Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.095s) 2023-01-11T21:45:10.5377923Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5378126Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5378376Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5378743Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5378993Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5379238Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5379479Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5379716Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5379990Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5380232Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5380466Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5380709Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5380949Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... 
skip: nothing to test (0.002s) 2023-01-11T21:45:10.5381183Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5381433Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5381674Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5381922Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5382194Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5382438Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5382676Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5382909Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5383146Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5383378Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5383622Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5383860Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5384088Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5384332Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5384574Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5384824Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5385092Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5385334Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
skip: nothing to test (0.002s) 2023-01-11T21:45:10.5385568Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5385811Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5386052Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5386289Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5386527Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5386763Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5387000Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5387263Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5387505Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5387748Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5387993Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5388230Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5388468Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5388708Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5388944Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5389174Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5389410Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5389649Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... 
skip: nothing to test (0.002s) 2023-01-11T21:45:10.5389912Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5390139Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5390368Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5390602Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.116s) 2023-01-11T21:45:10.5390833Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.118s) 2023-01-11T21:45:10.5391065Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5391291Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5391521Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5391748Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5391965Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5392192Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.094s) 2023-01-11T21:45:10.5392452Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5392674Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5392896Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5393122Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5393353Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5393583Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.119s) 2023-01-11T21:45:10.5393810Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5394029Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... 
ok (0.114s) 2023-01-11T21:45:10.5394254Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.112s) 2023-01-11T21:45:10.5394475Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5394692Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5394916Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.095s) 2023-01-11T21:45:10.5395175Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5395391Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5395621Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.110s) 2023-01-11T21:45:10.5395844Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.094s) 2023-01-11T21:45:10.5396070Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.112s) 2023-01-11T21:45:10.5396303Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5396523Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.110s) 2023-01-11T21:45:10.5396743Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s) 2023-01-11T21:45:10.5396967Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.108s) 2023-01-11T21:45:10.5397185Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.094s) 2023-01-11T21:45:10.5397428Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.094s) 2023-01-11T21:45:10.5397651Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.091s) 2023-01-11T21:45:10.5397873Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.094s) 2023-01-11T21:45:10.5398086Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5398312Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.110s) 2023-01-11T21:45:10.5398536Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5398764Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.112s) 2023-01-11T21:45:10.5398992Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5399215Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.110s) 2023-01-11T21:45:10.5399435Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.110s) 2023-01-11T21:45:10.5399659Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.108s) 2023-01-11T21:45:10.5399881Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5400121Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.094s) 2023-01-11T21:45:10.5400338Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.090s) 2023-01-11T21:45:10.5400556Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5400772Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.094s) 2023-01-11T21:45:10.5401027Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5401279Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5401535Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5401784Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5402033Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5402286Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5402606Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
skip: nothing to test (0.002s) 2023-01-11T21:45:10.5402852Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5403099Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5403347Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5403620Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5403886Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5404138Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5404386Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5404638Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5404887Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5405140Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5405406Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5405651Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5405895Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5406137Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5406381Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5406628Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5406872Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... 
skip: nothing to test (0.002s) 2023-01-11T21:45:10.5407115Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5407363Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5407640Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5407892Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5408138Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5408381Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5408630Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5408872Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5409113Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5409355Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5409597Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5409842Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5410094Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5410356Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5410610Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5410859Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5411103Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
skip: nothing to test (0.002s) 2023-01-11T21:45:10.5411351Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5411605Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5411846Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5412089Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5412330Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5412599Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5412844Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5413081Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5413319Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5413559Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.116s) 2023-01-11T21:45:10.5413845Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.119s) 2023-01-11T21:45:10.5414082Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5414316Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5414662Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.112s) 2023-01-11T21:45:10.5414899Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.096s) 2023-01-11T21:45:10.5415138Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5415377Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.095s) 2023-01-11T21:45:10.5415645Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... 
ok (0.098s) 2023-01-11T21:45:10.5415880Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5416115Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5416348Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5416580Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.116s) 2023-01-11T21:45:10.5416818Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.119s) 2023-01-11T21:45:10.5417051Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5417283Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5417516Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.112s) 2023-01-11T21:45:10.5417745Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5418001Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5418234Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.095s) 2023-01-11T21:45:10.5418467Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5418763Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5419025Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s) 2023-01-11T21:45:10.5419275Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5419566Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5419802Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5420034Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s) 2023-01-11T21:45:10.5420268Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... 
ok (0.110s) 2023-01-11T21:45:10.5420501Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.107s) 2023-01-11T21:45:10.5420730Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5420989Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5421216Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.090s) 2023-01-11T21:45:10.5421444Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.092s) 2023-01-11T21:45:10.5421676Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5421913Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.110s) 2023-01-11T21:45:10.5422138Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5422375Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5422613Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5422844Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s) 2023-01-11T21:45:10.5423075Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.110s) 2023-01-11T21:45:10.5423335Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.107s) 2023-01-11T21:45:10.5423565Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5423793Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5424021Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.091s) 2023-01-11T21:45:10.5424248Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5424480Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5424652Z test_to_dtype_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.166s) 2023-01-11T21:45:10.5424810Z test_to_dtype_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5424981Z test_to_dtype_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.166s) 2023-01-11T21:45:10.5425148Z test_to_dtype_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.167s) 2023-01-11T21:45:10.5425313Z test_to_dtype_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.166s) 2023-01-11T21:45:10.5425477Z test_to_dtype_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.166s) 2023-01-11T21:45:10.5425643Z test_to_dtype_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.165s) 2023-01-11T21:45:10.5425809Z test_to_dtype_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5425971Z test_to_dtype_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5426144Z test_to_dtype_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5426307Z test_to_dtype_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5426472Z test_to_dtype_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5426636Z test_to_dtype_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.163s) 2023-01-11T21:45:10.5426796Z test_to_dtype_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.162s) 2023-01-11T21:45:10.5426965Z test_to_dtype_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.165s) 2023-01-11T21:45:10.5427134Z test_to_dtype_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.166s) 2023-01-11T21:45:10.5427297Z test_to_dtype_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5427461Z test_to_dtype_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.166s) 2023-01-11T21:45:10.5427614Z test_to_dtype_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.165s) 2023-01-11T21:45:10.5427778Z test_to_dtype_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.165s) 2023-01-11T21:45:10.5427939Z test_to_dtype_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.165s) 2023-01-11T21:45:10.5428099Z test_to_dtype_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.162s) 2023-01-11T21:45:10.5428260Z test_to_dtype_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5428418Z test_to_dtype_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5428612Z test_to_dtype_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.099s) 2023-01-11T21:45:10.5428772Z test_to_dtype_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5428938Z test_to_dtype_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.099s) 2023-01-11T21:45:10.5429104Z test_to_dtype_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.099s) 2023-01-11T21:45:10.5429264Z test_to_dtype_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.099s) 2023-01-11T21:45:10.5429424Z test_to_dtype_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5429585Z test_to_dtype_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.098s) 2023-01-11T21:45:10.5429743Z test_to_dtype_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5429904Z test_to_dtype_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5430065Z test_to_dtype_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5430221Z test_to_dtype_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.096s) 2023-01-11T21:45:10.5430382Z test_to_dtype_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.096s) 2023-01-11T21:45:10.5430546Z test_to_dtype_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5430707Z test_to_dtype_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.092s) 2023-01-11T21:45:10.5430876Z test_to_dtype_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.094s) 2023-01-11T21:45:10.5431045Z test_to_dtype_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5431206Z test_to_dtype_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5431369Z test_to_dtype_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5431527Z test_to_dtype_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.092s) 2023-01-11T21:45:10.5431716Z test_to_dtype_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.092s) 2023-01-11T21:45:10.5431879Z test_to_dtype_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.092s) 2023-01-11T21:45:10.5432034Z test_to_dtype_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.091s) 2023-01-11T21:45:10.5432194Z test_to_dtype_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.092s) 2023-01-11T21:45:10.5432356Z test_to_dtype_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.091s) 2023-01-11T21:45:10.5432523Z test_validate_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.053s) 2023-01-11T21:45:10.5432686Z test_validate_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5432862Z test_validate_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.053s) 2023-01-11T21:45:10.5433022Z test_validate_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.053s) 2023-01-11T21:45:10.5433188Z test_validate_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.053s) 2023-01-11T21:45:10.5433366Z test_validate_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.053s) 2023-01-11T21:45:10.5433553Z test_validate_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.052s) 2023-01-11T21:45:10.5433721Z test_validate_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5433882Z test_validate_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5434041Z test_validate_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.046s) 2023-01-11T21:45:10.5434227Z test_validate_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5434390Z test_validate_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5434551Z test_validate_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.052s) 2023-01-11T21:45:10.5434712Z test_validate_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5434880Z test_validate_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.053s) 2023-01-11T21:45:10.5435047Z test_validate_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.054s) 2023-01-11T21:45:10.5435206Z test_validate_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.052s) 2023-01-11T21:45:10.5435369Z test_validate_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.052s) 2023-01-11T21:45:10.5435530Z test_validate_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.052s) 2023-01-11T21:45:10.5435691Z test_validate_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5435844Z test_validate_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5436003Z test_validate_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.046s) 2023-01-11T21:45:10.5436163Z test_validate_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5436323Z test_validate_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.046s) 2023-01-11T21:45:10.5436488Z test_validate_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5436648Z test_validate_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.042s) 2023-01-11T21:45:10.5436819Z test_validate_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.048s) 2023-01-11T21:45:10.5436986Z test_validate_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.049s) 2023-01-11T21:45:10.5437151Z test_validate_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.048s) 2023-01-11T21:45:10.5437328Z test_validate_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.048s) 2023-01-11T21:45:10.5437488Z test_validate_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.048s) 2023-01-11T21:45:10.5437647Z test_validate_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.043s) 2023-01-11T21:45:10.5437806Z test_validate_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.043s) 2023-01-11T21:45:10.5437966Z test_validate_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.042s) 2023-01-11T21:45:10.5438123Z test_validate_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.043s) 2023-01-11T21:45:10.5438284Z test_validate_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.043s) 2023-01-11T21:45:10.5438452Z test_validate_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.048s) 2023-01-11T21:45:10.5438609Z test_validate_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.043s) 2023-01-11T21:45:10.5438781Z test_validate_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.049s) 2023-01-11T21:45:10.5438950Z test_validate_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.049s) 2023-01-11T21:45:10.5439109Z test_validate_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5439274Z test_validate_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.048s) 2023-01-11T21:45:10.5439435Z test_validate_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.047s) 2023-01-11T21:45:10.5439598Z test_validate_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.043s) 2023-01-11T21:45:10.5439757Z test_validate_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.043s) 2023-01-11T21:45:10.5439946Z test_validate_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.041s) 2023-01-11T21:45:10.5440099Z test_validate_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.043s) 2023-01-11T21:45:10.5440264Z test_validate_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.042s) 2023-01-11T21:45:10.5440272Z 2023-01-11T21:45:10.5440526Z ---------------------------------------------------------------------- 2023-01-11T21:45:10.5440616Z Ran 4617 tests in 409.424s 2023-01-11T21:45:10.5440622Z 2023-01-11T21:45:10.5440701Z OK (skipped=517) 2023-01-11T21:45:10.5440707Z 2023-01-11T21:45:10.5440794Z Generating XML reports... 2023-01-11T21:45:10.5441102Z Generated XML report: test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCSRCUDA-20230111213819.xml 2023-01-11T21:45:10.5441406Z Generated XML report: test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCSRSampler-20230111213819.xml 2023-01-11T21:45:10.5441714Z Generated XML report: test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCompressedCUDA-20230111213819.xml 2023-01-11T21:45:10.5441728Z 2023-01-11T21:45:10.5442098Z ##[endgroup] 2023-01-11T21:45:10.5442377Z FINISHED PRINTING LOG FILE of test_sparse_csr (/var/lib/jenkins/workspace/test/test-reports/test_sparse_csr_txl3rn3o) 2023-01-11T21:45:10.5442383Z
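For orientation: the TestSparseCompressedCUDA cases above parametrize a few constructor paths over the four compressed layouts (SparseCSR, SparseCSC, SparseBSR, SparseBSC) and the usual dtypes: building from Python lists ("factory_from_list"), building from pre-made index/value tensors ("factory_from_tensor"), shape-and-device inference, dtype conversion ("to_dtype"), and invariant validation ("validate"). Below is a minimal sketch of those paths through the public factory functions, run on a CUDA-enabled build; the shapes and values are made up for illustration and this is not code from test_sparse_csr.py.

import torch

# "factory_from_list": a 2x2 CSR tensor built directly from Python lists.
# Omitting size/dtype/device exercises the shape-and-device-inference paths.
a = torch.sparse_csr_tensor(
    [0, 1, 2],    # crow_indices: compressed row pointers, length nrows + 1
    [1, 0],       # col_indices: one column index per stored value
    [1.0, 2.0],   # values
    size=(2, 2),
    dtype=torch.float32,
    device="cuda",
)

# "factory_from_tensor": the same constructor fed pre-built tensors.
crow = torch.tensor([0, 1, 2], device="cuda")
col = torch.tensor([1, 0], device="cuda")
vals = torch.tensor([1.0, 2.0], device="cuda")
b = torch.sparse_csr_tensor(crow, col, vals, size=(2, 2))

# "to_dtype": conversion preserves the compressed layout.
assert a.to(torch.float64).layout == torch.sparse_csr

# "validate": recent builds expose eager checking of the compressed-index
# invariants (monotone crow_indices, in-range col_indices, matching lengths).
c = torch.sparse_csr_tensor(crow, col, vals, size=(2, 2), check_invariants=True)

The SparseCSC, SparseBSR and SparseBSC variants go through torch.sparse_csc_tensor, torch.sparse_bsr_tensor and torch.sparse_bsc_tensor; the blocked layouts take block-shaped values (e.g. shape (nnz, 2, 2)) in place of the flat values above.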
2023-01-11T21:45:10.5442581Z Running test_cpp_extensions_aot_no_ninja ... [2023-01-11 21:45:10.435941] 2023-01-11T21:45:12.0501046Z running install 2023-01-11T21:45:12.0501915Z /opt/conda/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools. 2023-01-11T21:45:12.0502310Z warnings.warn( 2023-01-11T21:45:12.0604262Z running build 2023-01-11T21:45:12.0604828Z running build_py 2023-01-11T21:45:12.0645343Z creating build 2023-01-11T21:45:12.0646159Z creating build/lib.linux-x86_64-cpython-310 2023-01-11T21:45:12.0646948Z creating build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension 2023-01-11T21:45:12.0647818Z copying torch_test_cpp_extension/__init__.py -> build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension 2023-01-11T21:45:12.0648389Z running build_ext 2023-01-11T21:45:12.0676497Z building 'torch_test_cpp_extension.cpp' extension 2023-01-11T21:45:12.0677302Z creating build/temp.linux-x86_64-cpython-310 2023-01-11T21:45:12.0681062Z gcc -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -Iself_compiler_include_dirs_test -I/opt/conda/include/python3.10 -c extension.cpp -o build/temp.linux-x86_64-cpython-310/extension.o -g -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=cpp -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17 2023-01-11T21:45:12.9823337Z In file included from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/Exceptions.h:14:0, 2023-01-11T21:45:12.9824153Z from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/python.h:11, 2023-01-11T21:45:12.9824693Z from /opt/conda/lib/python3.10/site-packages/torch/include/torch/extension.h:6, 2023-01-11T21:45:12.9824978Z from extension.cpp:1: 2023-01-11T21:45:12.9826394Z /opt/conda/lib/python3.10/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<MatrixMultiplier>’: 2023-01-11T21:45:12.9826747Z extension.cpp:40:53: required from here 2023-01-11T21:45:12.9827480Z /opt/conda/lib/python3.10/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<MatrixMultiplier>’ declared with greater visibility than the type of its field ‘pybind11::class_<MatrixMultiplier>::<anonymous>’ [-Wattributes] 2023-01-11T21:45:12.9828251Z class class_ : public detail::generic_type { 2023-01-11T21:45:12.9828470Z ^~~~~~ 2023-01-11T21:45:12.9829088Z /opt/conda/lib/python3.10/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<MatrixMultiplier>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes] 2023-01-11T21:45:12.9833682Z g++ -pthread -B /opt/conda/compiler_compat -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib build/temp.linux-x86_64-cpython-310/extension.o -L/opt/conda/lib/python3.10/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/cpp.cpython-310-x86_64-linux-gnu.so 2023-01-11T21:45:13.3393235Z building 'torch_test_cpp_extension.ort' extension 2023-01-11T21:45:13.3395304Z gcc -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/opt/conda/lib/python3.10/site-packages/torch/include
-I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -Iself_compiler_include_dirs_test -I/opt/conda/include/python3.10 -c ort_extension.cpp -o build/temp.linux-x86_64-cpython-310/ort_extension.o -g -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=ort -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17 2023-01-11T21:45:14.3347780Z g++ -pthread -B /opt/conda/compiler_compat -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib build/temp.linux-x86_64-cpython-310/ort_extension.o -L/opt/conda/lib/python3.10/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/ort.cpython-310-x86_64-linux-gnu.so 2023-01-11T21:45:14.6620768Z building 'torch_test_cpp_extension.rng' extension 2023-01-11T21:45:14.6622466Z gcc -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -Iself_compiler_include_dirs_test -I/opt/conda/include/python3.10 -c rng_extension.cpp -o build/temp.linux-x86_64-cpython-310/rng_extension.o -g -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=rng -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17 2023-01-11T21:45:15.8051636Z In file included from /opt/conda/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec256/vec256.h:8:0, 2023-01-11T21:45:15.8052946Z from /opt/conda/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:6, 2023-01-11T21:45:15.8053858Z from /opt/conda/lib/python3.10/site-packages/torch/include/ATen/native/cpu/Loops.h:37, 2023-01-11T21:45:15.8055024Z from /opt/conda/lib/python3.10/site-packages/torch/include/ATen/native/cpu/DistributionTemplates.h:8, 2023-01-11T21:45:15.8055368Z from rng_extension.cpp:6: 2023-01-11T21:45:15.8055816Z /opt/conda/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_base.h:1008:0: warning: ignoring #pragma unroll [-Wunknown-pragmas] 2023-01-11T21:45:15.8056303Z # pragma unroll 2023-01-11T21:45:15.8056485Z 2023-01-11T21:45:15.8059702Z g++ -pthread -B /opt/conda/compiler_compat -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib build/temp.linux-x86_64-cpython-310/rng_extension.o -L/opt/conda/lib/python3.10/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/rng.cpython-310-x86_64-linux-gnu.so 2023-01-11T21:45:16.1552202Z building 'torch_test_cpp_extension.cuda' extension 2023-01-11T21:45:16.1554042Z gcc -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/opt/conda/lib/python3.10/site-packages/torch/include 
2023-01-11T21:45:17.1416779Z /usr/local/cuda/bin/nvcc -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -Iself_compiler_include_dirs_test -I/opt/conda/include/python3.10 -c cuda_extension_kernel.cu -o build/temp.linux-x86_64-cpython-310/cuda_extension_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O2 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=cuda -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=compute_52 -gencode=arch=compute_52,code=sm_52 -std=c++17
2023-01-11T21:45:20.8485706Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:45:20.8486109Z           detected during:
2023-01-11T21:45:20.8486771Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:45:20.8487173Z (61): here
2023-01-11T21:45:20.8487711Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:45:20.8488243Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/core/TensorImpl.h(77): here
2023-01-11T21:45:20.8488442Z
2023-01-11T21:45:20.8989327Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:45:20.8989687Z           detected during:
2023-01-11T21:45:20.8990436Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:45:20.8990851Z (61): here
2023-01-11T21:45:20.8991396Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:45:20.8992198Z /opt/conda/lib/python3.10/site-packages/torch/include/ATen/core/qualified_name.h(73): here
2023-01-11T21:45:20.8992403Z
2023-01-11T21:45:27.1778869Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:45:27.1779753Z           detected during:
2023-01-11T21:45:27.1781170Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:45:27.1781956Z (61): here
2023-01-11T21:45:27.1783029Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:45:27.1784097Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/core/TensorImpl.h(77): here
2023-01-11T21:45:27.1784478Z
2023-01-11T21:45:27.2285697Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:45:27.2286648Z           detected during:
2023-01-11T21:45:27.2288038Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:45:27.2288597Z (61): here
2023-01-11T21:45:27.2289140Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:45:27.2289685Z /opt/conda/lib/python3.10/site-packages/torch/include/ATen/core/qualified_name.h(73): here
2023-01-11T21:45:27.2289889Z
2023-01-11T21:45:36.3905525Z /usr/local/cuda/bin/nvcc -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -Iself_compiler_include_dirs_test -I/opt/conda/include/python3.10 -c cuda_extension_kernel2.cu -o build/temp.linux-x86_64-cpython-310/cuda_extension_kernel2.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O2 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=cuda -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=compute_52 -gencode=arch=compute_52,code=sm_52 -std=c++17
2023-01-11T21:45:40.0969467Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:45:40.0970417Z           detected during:
2023-01-11T21:45:40.0971639Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:45:40.0972095Z (61): here
2023-01-11T21:45:40.0972634Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:45:40.0973424Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/core/TensorImpl.h(77): here
2023-01-11T21:45:40.0973620Z
2023-01-11T21:45:40.1473095Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:45:40.1473657Z           detected during:
2023-01-11T21:45:40.1474301Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:45:40.1474707Z (61): here
2023-01-11T21:45:40.1475260Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:45:40.1475790Z /opt/conda/lib/python3.10/site-packages/torch/include/ATen/core/qualified_name.h(73): here
2023-01-11T21:45:40.1475995Z
2023-01-11T21:45:46.4372198Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:45:46.4372581Z           detected during:
2023-01-11T21:45:46.4373373Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:45:46.4373770Z (61): here
2023-01-11T21:45:46.4374314Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:45:46.4375075Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/core/TensorImpl.h(77): here
2023-01-11T21:45:46.4375272Z
2023-01-11T21:45:46.4874350Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:45:46.4874719Z           detected during:
2023-01-11T21:45:46.4875451Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:45:46.4875856Z (61): here
2023-01-11T21:45:46.4876408Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:45:46.4876943Z /opt/conda/lib/python3.10/site-packages/torch/include/ATen/core/qualified_name.h(73): here
2023-01-11T21:45:46.4877149Z
2023-01-11T21:45:55.6657231Z g++ -pthread -B /opt/conda/compiler_compat -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib build/temp.linux-x86_64-cpython-310/cuda_extension.o build/temp.linux-x86_64-cpython-310/cuda_extension_kernel.o build/temp.linux-x86_64-cpython-310/cuda_extension_kernel2.o -L/opt/conda/lib/python3.10/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/cuda.cpython-310-x86_64-linux-gnu.so
2023-01-11T21:45:56.0536909Z building 'torch_test_cpp_extension.torch_library' extension
2023-01-11T21:45:56.0538848Z /usr/local/cuda/bin/nvcc -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -Iself_compiler_include_dirs_test -I/opt/conda/include/python3.10 -c torch_library.cu -o build/temp.linux-x86_64-cpython-310/torch_library.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O2 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=torch_library -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=compute_52 -gencode=arch=compute_52,code=sm_52 -std=c++17
2023-01-11T21:46:03.4096591Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:46:03.4097450Z           detected during:
2023-01-11T21:46:03.4098795Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:46:03.4099362Z (61): here
2023-01-11T21:46:03.4099962Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:46:03.4100489Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/core/TensorImpl.h(77): here
2023-01-11T21:46:03.4100692Z
2023-01-11T21:46:03.5384066Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:46:03.5384949Z           detected during:
2023-01-11T21:46:03.5386146Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:46:03.5387305Z (61): here
2023-01-11T21:46:03.5388413Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:46:03.5389020Z /opt/conda/lib/python3.10/site-packages/torch/include/ATen/core/qualified_name.h(73): here
2023-01-11T21:46:03.5389221Z
2023-01-11T21:46:27.2342296Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:46:27.2342888Z           detected during:
2023-01-11T21:46:27.2343891Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:46:27.2344577Z (61): here
2023-01-11T21:46:27.2345495Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:46:27.2346483Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/core/TensorImpl.h(77): here
2023-01-11T21:46:27.2346793Z
2023-01-11T21:46:27.3632420Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:46:27.3633008Z           detected during:
2023-01-11T21:46:27.3633966Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:46:27.3634816Z (61): here
2023-01-11T21:46:27.3635386Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:46:27.3635913Z /opt/conda/lib/python3.10/site-packages/torch/include/ATen/core/qualified_name.h(73): here
2023-01-11T21:46:27.3636115Z
2023-01-11T21:46:59.6070642Z g++ -pthread -B /opt/conda/compiler_compat -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib build/temp.linux-x86_64-cpython-310/torch_library.o -L/opt/conda/lib/python3.10/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/torch_library.cpython-310-x86_64-linux-gnu.so
2023-01-11T21:46:59.8416846Z building 'torch_test_cpp_extension.cublas_extension' extension
2023-01-11T21:46:59.8418538Z gcc -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -Iself_compiler_include_dirs_test -I/opt/conda/include/python3.10 -c cublas_extension.cpp -o build/temp.linux-x86_64-cpython-310/cublas_extension.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=cublas_extension -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
2023-01-11T21:47:00.6737421Z g++ -pthread -B /opt/conda/compiler_compat -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib build/temp.linux-x86_64-cpython-310/cublas_extension.o -L/opt/conda/lib/python3.10/site-packages/torch/lib -L/usr/local/cuda/lib64 -lcublas -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/cublas_extension.cpython-310-x86_64-linux-gnu.so
2023-01-11T21:47:00.9284969Z building 'torch_test_cpp_extension.cusolver_extension' extension
2023-01-11T21:47:00.9286899Z gcc -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -Iself_compiler_include_dirs_test -I/opt/conda/include/python3.10 -c cusolver_extension.cpp -o build/temp.linux-x86_64-cpython-310/cusolver_extension.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=cusolver_extension -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
2023-01-11T21:47:01.7701225Z g++ -pthread -B /opt/conda/compiler_compat -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib build/temp.linux-x86_64-cpython-310/cusolver_extension.o -L/opt/conda/lib/python3.10/site-packages/torch/lib -L/usr/local/cuda/lib64 -lcusolver -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/cusolver_extension.cpython-310-x86_64-linux-gnu.so
2023-01-11T21:47:02.0159500Z running install_lib
2023-01-11T21:47:02.0207028Z creating install
2023-01-11T21:47:02.0207296Z creating install/opt
2023-01-11T21:47:02.0207593Z creating install/opt/conda
2023-01-11T21:47:02.0207928Z creating install/opt/conda/lib
2023-01-11T21:47:02.0208267Z creating install/opt/conda/lib/python3.10
2023-01-11T21:47:02.0208735Z creating install/opt/conda/lib/python3.10/site-packages
2023-01-11T21:47:02.0212991Z creating install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0213641Z copying build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/__init__.py -> ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0214284Z copying build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/cpp.cpython-310-x86_64-linux-gnu.so -> ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0269192Z copying build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/ort.cpython-310-x86_64-linux-gnu.so -> ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0325243Z copying build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/rng.cpython-310-x86_64-linux-gnu.so -> ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0382134Z copying build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/cuda.cpython-310-x86_64-linux-gnu.so -> ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0432693Z copying build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/torch_library.cpython-310-x86_64-linux-gnu.so -> ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0434212Z copying build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/cublas_extension.cpython-310-x86_64-linux-gnu.so -> ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0436097Z copying build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/cusolver_extension.cpython-310-x86_64-linux-gnu.so -> ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0443178Z byte-compiling ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension/__init__.py to __init__.cpython-310.pyc
2023-01-11T21:47:02.0444169Z running install_egg_info
2023-01-11T21:47:02.0541780Z running egg_info
2023-01-11T21:47:02.0542119Z creating torch_test_cpp_extension.egg-info
2023-01-11T21:47:02.0575720Z writing torch_test_cpp_extension.egg-info/PKG-INFO
2023-01-11T21:47:02.0576228Z writing dependency_links to torch_test_cpp_extension.egg-info/dependency_links.txt
2023-01-11T21:47:02.0578619Z writing top-level names to torch_test_cpp_extension.egg-info/top_level.txt
2023-01-11T21:47:02.0579178Z writing manifest file 'torch_test_cpp_extension.egg-info/SOURCES.txt'
2023-01-11T21:47:02.0616893Z reading manifest file 'torch_test_cpp_extension.egg-info/SOURCES.txt'
2023-01-11T21:47:02.0624084Z writing manifest file 'torch_test_cpp_extension.egg-info/SOURCES.txt'
2023-01-11T21:47:02.0625112Z Copying torch_test_cpp_extension.egg-info to ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension-0.0.0-py3.10.egg-info
2023-01-11T21:47:02.0626413Z running install_scripts
2023-01-11T21:47:03.9471606Z running install
2023-01-11T21:47:03.9472336Z /opt/conda/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
2023-01-11T21:47:03.9472726Z   warnings.warn(
2023-01-11T21:47:03.9567733Z running build
2023-01-11T21:47:03.9568134Z running build_ext
2023-01-11T21:47:03.9846487Z building 'no_python_abi_suffix_test' extension
2023-01-11T21:47:03.9847013Z creating /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test/build
2023-01-11T21:47:03.9847801Z creating /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test/build/temp.linux-x86_64-cpython-310
2023-01-11T21:47:04.0116079Z Emitting ninja build file /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test/build/temp.linux-x86_64-cpython-310/build.ninja...
2023-01-11T21:47:04.0116580Z Compiling objects...
2023-01-11T21:47:04.0116828Z Using envvar MAX_JOBS (14) as the number of workers...
2023-01-11T21:47:04.0712693Z [1/1] c++ -MMD -MF /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test/build/temp.linux-x86_64-cpython-310/no_python_abi_suffix_test.o.d -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/include/python3.10 -c -c /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test/no_python_abi_suffix_test.cpp -o /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test/build/temp.linux-x86_64-cpython-310/no_python_abi_suffix_test.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=no_python_abi_suffix_test -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
2023-01-11T21:47:04.0754193Z creating build/lib.linux-x86_64-cpython-310
2023-01-11T21:47:04.0756579Z g++ -pthread -B /opt/conda/compiler_compat -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test/build/temp.linux-x86_64-cpython-310/no_python_abi_suffix_test.o -L/opt/conda/lib/python3.10/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-cpython-310/no_python_abi_suffix_test.so
2023-01-11T21:47:04.1243306Z running install_lib
2023-01-11T21:47:04.1282339Z creating install
2023-01-11T21:47:04.1282879Z creating install/opt
2023-01-11T21:47:04.1283139Z creating install/opt/conda
2023-01-11T21:47:04.1283410Z creating install/opt/conda/lib
2023-01-11T21:47:04.1283645Z creating install/opt/conda/lib/python3.10
2023-01-11T21:47:04.1284123Z creating install/opt/conda/lib/python3.10/site-packages
2023-01-11T21:47:04.1284653Z copying build/lib.linux-x86_64-cpython-310/no_python_abi_suffix_test.so -> ./install/opt/conda/lib/python3.10/site-packages
2023-01-11T21:47:04.1290275Z running install_egg_info
2023-01-11T21:47:04.1384153Z running egg_info
2023-01-11T21:47:04.1384626Z creating no_python_abi_suffix_test.egg-info
2023-01-11T21:47:04.1417881Z writing no_python_abi_suffix_test.egg-info/PKG-INFO
2023-01-11T21:47:04.1418915Z writing dependency_links to no_python_abi_suffix_test.egg-info/dependency_links.txt
2023-01-11T21:47:04.1422603Z writing top-level names to no_python_abi_suffix_test.egg-info/top_level.txt
2023-01-11T21:47:04.1423137Z writing manifest file 'no_python_abi_suffix_test.egg-info/SOURCES.txt'
2023-01-11T21:47:04.1457122Z reading manifest file 'no_python_abi_suffix_test.egg-info/SOURCES.txt'
2023-01-11T21:47:04.1463381Z writing manifest file 'no_python_abi_suffix_test.egg-info/SOURCES.txt'
2023-01-11T21:47:04.1464041Z Copying no_python_abi_suffix_test.egg-info to ./install/opt/conda/lib/python3.10/site-packages/no_python_abi_suffix_test-0.0.0-py3.10.egg-info
2023-01-11T21:47:04.1466224Z running install_scripts
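The point of this second build is the artifact name: the g++ -o argument above produces no_python_abi_suffix_test.so rather than no_python_abi_suffix_test.cpython-310-x86_64-linux-gnu.so, which is what test_no_python_abi_suffix_sets_the_correct_library_name later verifies. A hedged sketch of the setup() call that requests this behavior (file names mirror the log but are assumptions):

    # Hedged sketch; not the actual test script.
    from setuptools import setup
    from torch.utils.cpp_extension import BuildExtension, CppExtension

    setup(
        name="no_python_abi_suffix_test",
        ext_modules=[
            CppExtension("no_python_abi_suffix_test",
                         sources=["no_python_abi_suffix_test.cpp"]),
        ],
        # Drops the ".cpython-310-x86_64-linux-gnu" ABI tag from the .so name.
        cmdclass={"build_ext": BuildExtension.with_options(no_python_abi_suffix=True)},
    )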
2023-01-11T21:47:04.4664431Z Executing ['/opt/conda/bin/python', '-bb', 'test_cpp_extensions_aot_no_ninja.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:47:04.465798]
2023-01-11T21:47:08.5466928Z
2023-01-11T21:47:08.5467645Z Expand the folded group to see the log file of test_cpp_extensions_aot_no_ninja
2023-01-11T21:47:08.5468885Z ##[group]PRINTING LOG FILE of test_cpp_extensions_aot_no_ninja (/var/lib/jenkins/workspace/test/test-reports/test_cpp_extensions_aot_no_ninja_hsmbdv80)
2023-01-11T21:47:08.5469334Z
2023-01-11T21:47:08.5469449Z Running tests...
2023-01-11T21:47:08.5470019Z ----------------------------------------------------------------------
2023-01-11T21:47:08.5472014Z Test results will be stored in test-reports/python-unittest/test_cpp_extensions_aot_no_ninja
2023-01-11T21:47:08.5472716Z test_backward (__main__.TestCppExtensionAOT) ... ok (0.011s)
2023-01-11T21:47:08.5473095Z test_cublas_extension (__main__.TestCppExtensionAOT) ... ok (0.653s)
2023-01-11T21:47:08.5473655Z test_cuda_dlink_libs (__main__.TestCppExtensionAOT) ... skip: cuda extension with dlink requires ninja to build (0.001s)
2023-01-11T21:47:08.5474018Z test_cuda_extension (__main__.TestCppExtensionAOT) ... ok (0.070s)
2023-01-11T21:47:08.5474325Z test_cusolver_extension (__main__.TestCppExtensionAOT) ... ok (0.060s)
2023-01-11T21:47:08.5474646Z test_extension_function (__main__.TestCppExtensionAOT) ... ok (0.001s)
2023-01-11T21:47:08.5474958Z test_extension_module (__main__.TestCppExtensionAOT) ... ok (0.001s)
2023-01-11T21:47:08.5475309Z test_no_python_abi_suffix_sets_the_correct_library_name (__main__.TestCppExtensionAOT) ... ok (0.001s)
2023-01-11T21:47:08.5476548Z test_optional (__main__.TestCppExtensionAOT) ... ok (0.000s)
2023-01-11T21:47:08.5476826Z test_add (__main__.TestORTTensor) ... ok (0.001s)
2023-01-11T21:47:08.5477196Z test_conv_backend_override (__main__.TestORTTensor) ... ok (0.001s)
2023-01-11T21:47:08.5477619Z test_unregistered (__main__.TestORTTensor) ... ok (0.005s)
2023-01-11T21:47:08.5477885Z test_zeros (__main__.TestORTTensor) ... ok (0.001s)
2023-01-11T21:47:08.5478181Z test_pybind_return_types (__main__.TestPybindTypeCasters) ... ok (0.001s)
2023-01-11T21:47:08.5478483Z test_rng (__main__.TestRNGExtension) ... ok (0.002s)
2023-01-11T21:47:08.5478764Z test_torch_library (__main__.TestTorchLibrary) ... ok (0.045s)
2023-01-11T21:47:08.5478923Z
2023-01-11T21:47:08.5479153Z ----------------------------------------------------------------------
2023-01-11T21:47:08.5480569Z Ran 16 tests in 0.883s
2023-01-11T21:47:08.5480759Z
2023-01-11T21:47:08.5481032Z OK (skipped=1)
2023-01-11T21:47:08.5481230Z
2023-01-11T21:47:08.5481363Z Generating XML reports...
2023-01-11T21:47:08.5482066Z Generated XML report: test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestCppExtensionAOT-20230111214707.xml
2023-01-11T21:47:08.5482637Z Generated XML report: test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestORTTensor-20230111214707.xml
2023-01-11T21:47:08.5483204Z Generated XML report: test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestPybindTypeCasters-20230111214707.xml
2023-01-11T21:47:08.5483764Z Generated XML report: test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestRNGExtension-20230111214707.xml
2023-01-11T21:47:08.5484315Z Generated XML report: test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestTorchLibrary-20230111214707.xml
2023-01-11T21:47:08.5484555Z
2023-01-11T21:47:08.5484845Z ##[endgroup]
2023-01-11T21:47:08.5485296Z FINISHED PRINTING LOG FILE of test_cpp_extensions_aot_no_ninja (/var/lib/jenkins/workspace/test/test-reports/test_cpp_extensions_aot_no_ninja_hsmbdv80)
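What the TestCppExtensionAOT cases above exercise is simply importing the ahead-of-time-built package and calling into its pybind11 bindings. A minimal sketch of that pattern, assuming the sigmoid_add binding that extension.cpp is conventionally given in these tests (the binding name and its semantics are assumptions here):

    # Hedged sketch of the AOT usage pattern, not the actual test body.
    import torch  # import torch first so the extension can resolve libtorch symbols
    import torch_test_cpp_extension.cpp as cpp_extension

    x = torch.randn(4)
    y = torch.randn(4)
    z = cpp_extension.sigmoid_add(x, y)  # C++ kernel bound via pybind11
    assert torch.allclose(z, x.sigmoid() + y.sigmoid())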
2023-01-11T21:47:08.5485551Z
2023-01-11T21:47:08.5485738Z Running test_cuda_nvml_based_avail ... [2023-01-11 21:47:08.546680]
2023-01-11T21:47:08.5486256Z Executing ['/opt/conda/bin/python', '-bb', 'test_cuda_nvml_based_avail.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:47:08.546872]
2023-01-11T21:47:19.7502141Z
2023-01-11T21:47:19.7502719Z Expand the folded group to see the log file of test_cuda_nvml_based_avail
2023-01-11T21:47:19.7503684Z ##[group]PRINTING LOG FILE of test_cuda_nvml_based_avail (/var/lib/jenkins/workspace/test/test-reports/test_cuda_nvml_based_avail_1g39c3hx)
2023-01-11T21:47:19.7504457Z
2023-01-11T21:47:19.7505703Z <unittest.suite.TestSuite tests=[<__main__.TestExtendedCUDAIsAvail testMethod=test_cuda_is_available_nvml_avail_False_avoid_init_0>, <__main__.TestExtendedCUDAIsAvail testMethod=test_cuda_is_available_nvml_avail_False_avoid_init_1>, <__main__.TestExtendedCUDAIsAvail testMethod=test_cuda_is_available_nvml_avail_False_avoid_init_None>, <__main__.TestExtendedCUDAIsAvail testMethod=test_cuda_is_available_nvml_avail_True_avoid_init_0>, <__main__.TestExtendedCUDAIsAvail testMethod=test_cuda_is_available_nvml_avail_True_avoid_init_1>, <__main__.TestExtendedCUDAIsAvail testMethod=test_cuda_is_available_nvml_avail_True_avoid_init_None>]>
2023-01-11T21:47:19.7506749Z test_cuda_is_available_nvml_avail_False_avoid_init_0 (__main__.TestExtendedCUDAIsAvail)
2023-01-11T21:47:19.7507195Z test_cuda_is_available_nvml_avail_False_avoid_init_1 (__main__.TestExtendedCUDAIsAvail)
2023-01-11T21:47:19.7507691Z test_cuda_is_available_nvml_avail_False_avoid_init_None (__main__.TestExtendedCUDAIsAvail)
2023-01-11T21:47:19.7508106Z test_cuda_is_available_nvml_avail_True_avoid_init_0 (__main__.TestExtendedCUDAIsAvail)
2023-01-11T21:47:19.7508674Z test_cuda_is_available_nvml_avail_True_avoid_init_1 (__main__.TestExtendedCUDAIsAvail)
2023-01-11T21:47:19.7509243Z test_cuda_is_available_nvml_avail_True_avoid_init_None (__main__.TestExtendedCUDAIsAvail)
2023-01-11T21:47:19.7509918Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:47:19.7510368Z   warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:47:19.7511149Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:47:19.7511743Z   warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:47:19.7512018Z
2023-01-11T21:47:19.7512147Z Running tests...
2023-01-11T21:47:19.7512656Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7513296Z Test results will be stored in test-reports/python-unittest/test_cuda_nvml_based_avail
2023-01-11T21:47:19.7513841Z test_cuda_is_available_nvml_avail_False_avoid_init_0 (__main__.TestExtendedCUDAIsAvail) ... ok (0.032s)
2023-01-11T21:47:19.7514191Z
2023-01-11T21:47:19.7514588Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7515004Z Ran 1 test in 0.032s
2023-01-11T21:47:19.7515201Z
2023-01-11T21:47:19.7515311Z OK
2023-01-11T21:47:19.7515483Z
2023-01-11T21:47:19.7515645Z Generating XML reports...
2023-01-11T21:47:19.7516243Z Generated XML report: test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214711.xml
2023-01-11T21:47:19.7517136Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:47:19.7517517Z   warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:47:19.7518298Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:47:19.7518825Z   warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:47:19.7519142Z
2023-01-11T21:47:19.7519287Z Running tests...
2023-01-11T21:47:19.7519802Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7520289Z Test results will be stored in test-reports/python-unittest/test_cuda_nvml_based_avail
2023-01-11T21:47:19.7520675Z test_cuda_is_available_nvml_avail_False_avoid_init_1 (__main__.TestExtendedCUDAIsAvail) ... ok (0.029s)
2023-01-11T21:47:19.7520888Z
2023-01-11T21:47:19.7521091Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7521350Z Ran 1 test in 0.029s
2023-01-11T21:47:19.7521475Z
2023-01-11T21:47:19.7521539Z OK
2023-01-11T21:47:19.7521642Z
2023-01-11T21:47:19.7521736Z Generating XML reports...
2023-01-11T21:47:19.7522264Z Generated XML report: test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214712.xml
2023-01-11T21:47:19.7522810Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:47:19.7523158Z   warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:47:19.7523603Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:47:19.7523972Z   warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:47:19.7524148Z
2023-01-11T21:47:19.7524222Z Running tests...
2023-01-11T21:47:19.7524541Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7524948Z Test results will be stored in test-reports/python-unittest/test_cuda_nvml_based_avail
2023-01-11T21:47:19.7525336Z test_cuda_is_available_nvml_avail_False_avoid_init_None (__main__.TestExtendedCUDAIsAvail) ... ok (0.029s)
2023-01-11T21:47:19.7525544Z
2023-01-11T21:47:19.7525743Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7526001Z Ran 1 test in 0.029s
2023-01-11T21:47:19.7526121Z
2023-01-11T21:47:19.7526191Z OK
2023-01-11T21:47:19.7526293Z
2023-01-11T21:47:19.7526380Z Generating XML reports...
2023-01-11T21:47:19.7526857Z Generated XML report: test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214714.xml
2023-01-11T21:47:19.7527397Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:47:19.7527754Z   warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:47:19.7528195Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:47:19.7528564Z   warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:47:19.7539869Z
2023-01-11T21:47:19.7539984Z Running tests...
2023-01-11T21:47:19.7540336Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7540755Z Test results will be stored in test-reports/python-unittest/test_cuda_nvml_based_avail
2023-01-11T21:47:19.7541244Z test_cuda_is_available_nvml_avail_True_avoid_init_0 (__main__.TestExtendedCUDAIsAvail) ... ok (0.024s)
2023-01-11T21:47:19.7541459Z
2023-01-11T21:47:19.7541654Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7541917Z Ran 1 test in 0.024s
2023-01-11T21:47:19.7542040Z
2023-01-11T21:47:19.7542111Z OK
2023-01-11T21:47:19.7542212Z
2023-01-11T21:47:19.7542307Z Generating XML reports...
2023-01-11T21:47:19.7542762Z Generated XML report: test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214715.xml
2023-01-11T21:47:19.7543315Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:47:19.7543677Z   warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:47:19.7544116Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:47:19.7544495Z   warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:47:19.7544671Z
2023-01-11T21:47:19.7544752Z Running tests...
2023-01-11T21:47:19.7545068Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7545466Z Test results will be stored in test-reports/python-unittest/test_cuda_nvml_based_avail
2023-01-11T21:47:19.7545848Z test_cuda_is_available_nvml_avail_True_avoid_init_1 (__main__.TestExtendedCUDAIsAvail) ... ok (0.013s)
2023-01-11T21:47:19.7546060Z
2023-01-11T21:47:19.7546262Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7546514Z Ran 1 test in 0.013s
2023-01-11T21:47:19.7546635Z
2023-01-11T21:47:19.7546755Z OK
2023-01-11T21:47:19.7546859Z
2023-01-11T21:47:19.7546957Z Generating XML reports...
2023-01-11T21:47:19.7547417Z Generated XML report: test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214717.xml
2023-01-11T21:47:19.7547957Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:47:19.7548316Z   warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:47:19.7548764Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:47:19.7549128Z   warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:47:19.7549306Z
2023-01-11T21:47:19.7549404Z Running tests...
2023-01-11T21:47:19.7549754Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7550162Z Test results will be stored in test-reports/python-unittest/test_cuda_nvml_based_avail
2023-01-11T21:47:19.7550546Z test_cuda_is_available_nvml_avail_True_avoid_init_None (__main__.TestExtendedCUDAIsAvail) ... ok (0.028s)
2023-01-11T21:47:19.7550758Z
2023-01-11T21:47:19.7550956Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7551218Z Ran 1 test in 0.028s
2023-01-11T21:47:19.7551342Z
2023-01-11T21:47:19.7551403Z OK
2023-01-11T21:47:19.7551508Z
2023-01-11T21:47:19.7551601Z Generating XML reports...
2023-01-11T21:47:19.7552061Z Generated XML report: test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214719.xml
2023-01-11T21:47:19.7552327Z
2023-01-11T21:47:19.7552646Z ##[endgroup]
2023-01-11T21:47:19.7553077Z FINISHED PRINTING LOG FILE of test_cuda_nvml_based_avail (/var/lib/jenkins/workspace/test/test-reports/test_cuda_nvml_based_avail_1g39c3hx)
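Each block above runs in its own subprocess (the --subprocess flag in the Executing line), because the property under test is process-global: whether torch.cuda.is_available() answers via an NVML query instead of initializing a CUDA context. A hedged sketch of what one parametrization does, assuming only the documented PYTORCH_NVML_BASED_CUDA_CHECK switch; the exact test body is not reproduced here:

    # Hedged sketch; must run in a fresh process, as the harness above arranges.
    import os
    os.environ["PYTORCH_NVML_BASED_CUDA_CHECK"] = "1"  # opt in before torch is imported

    import torch

    assert torch.cuda.is_available()        # answered via NVML
    assert not torch.cuda.is_initialized()  # no CUDA context was created ("avoid_init")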
2023-01-11T21:47:19.7553323Z
2023-01-11T21:47:19.7553495Z Running test_dispatch ... [2023-01-11 21:47:19.750185]
2023-01-11T21:47:19.7553978Z Executing ['/opt/conda/bin/python', '-bb', 'test_dispatch.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:47:19.750383]
2023-01-11T21:48:07.9746943Z
2023-01-11T21:48:07.9747629Z Expand the folded group to see the log file of test_dispatch
2023-01-11T21:48:07.9748839Z ##[group]PRINTING LOG FILE of test_dispatch (/var/lib/jenkins/workspace/test/test-reports/test_dispatch_riow3h5c)
2023-01-11T21:48:07.9749103Z
2023-01-11T21:48:07.9749190Z Running tests...
2023-01-11T21:48:07.9749670Z ----------------------------------------------------------------------
2023-01-11T21:48:07.9750209Z Test results will be stored in test-reports/python-unittest/test_dispatch
2023-01-11T21:48:07.9750622Z test_all_invariants (__main__.TestDispatch) ... ok (1.121s)
2023-01-11T21:48:07.9751027Z test_computed_table (__main__.TestDispatch) ... ok (8.818s)
2023-01-11T21:48:07.9751449Z test_computed_table_with_ambiguous_autogradother (__main__.TestDispatch) ... ok (0.013s)
2023-01-11T21:48:07.9754850Z test_computed_table_with_autograd (__main__.TestDispatch) ... ok (0.002s)
2023-01-11T21:48:07.9755351Z test_computed_table_with_cpu_autograd_defaultbackend (__main__.TestDispatch) ... ok (0.244s)
2023-01-11T21:48:07.9755911Z test_computed_table_with_cpu_autograd_math (__main__.TestDispatch) ... ok (0.264s)
2023-01-11T21:48:07.9756509Z test_computed_table_with_cpu_autograd_math_defaultbackend (__main__.TestDispatch) ... ok (8.273s)
2023-01-11T21:48:07.9757049Z test_computed_table_with_cpu_defaultbackend (__main__.TestDispatch) ... ok (0.012s)
2023-01-11T21:48:07.9757431Z test_computed_table_with_cpu_math (__main__.TestDispatch) ... ok (0.013s)
2023-01-11T21:48:07.9757879Z test_computed_table_with_cpu_math_autogradcpu_fallthrough (__main__.TestDispatch) ... ok (0.002s)
2023-01-11T21:48:07.9758397Z test_computed_table_with_math (__main__.TestDispatch) ... ok (0.002s)
2023-01-11T21:48:07.9758699Z test_def (__main__.TestDispatch) ... ok (8.489s)
2023-01-11T21:48:07.9758977Z test_def_impl_schema_mismatch (__main__.TestDispatch) ... ok (0.010s)
2023-01-11T21:48:07.9759388Z test_def_only (__main__.TestDispatch) ... ok (0.001s)
2023-01-11T21:48:07.9760335Z test_def_with_explicit_alias (__main__.TestDispatch) ... ok (0.001s)
2023-01-11T21:48:07.9760659Z test_def_with_inference (__main__.TestDispatch) ... ok (0.282s)
2023-01-11T21:48:07.9761126Z test_dispatch_print_registrations_for_dispatch_key_invalid (__main__.TestDispatch) ... ok (0.001s)
2023-01-11T21:48:07.9761539Z test_find_dangling_impls (__main__.TestDispatch) ... ok (0.001s)
2023-01-11T21:48:07.9762010Z test_find_dangling_impls_ext (__main__.TestDispatch) ... Using /var/lib/jenkins/.cache/torch_extensions/py310_cu116 as PyTorch extensions root...
2023-01-11T21:48:07.9762430Z Creating extension directory /var/lib/jenkins/.cache/torch_extensions/py310_cu116/dangling_impl_extension...
2023-01-11T21:48:07.9762880Z Emitting ninja build file /var/lib/jenkins/.cache/torch_extensions/py310_cu116/dangling_impl_extension/build.ninja...
2023-01-11T21:48:07.9763457Z Building extension module dangling_impl_extension...
2023-01-11T21:48:07.9763745Z Using envvar MAX_JOBS (14) as the number of workers...
2023-01-11T21:48:07.9765672Z [1/2] c++ -MMD -MF dangling_impl_extension.o.d -DTORCH_EXTENSION_NAME=dangling_impl_extension -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.10/site-packages/torch/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.10/site-packages/torch/include/THC -isystem /opt/conda/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -g -c /var/lib/jenkins/workspace/test/cpp_extensions/dangling_impl_extension.cpp -o dangling_impl_extension.o
2023-01-11T21:48:07.9767323Z [2/2] c++ dangling_impl_extension.o -shared -L/opt/conda/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o dangling_impl_extension.so
2023-01-11T21:48:07.9767862Z Loading extension module dangling_impl_extension...
2023-01-11T21:48:07.9768130Z ok (18.290s)
2023-01-11T21:48:07.9768386Z test_impl_only (__main__.TestDispatch) ... ok (0.270s)
2023-01-11T21:48:07.9768843Z test_multiple_def_alias_defaulting (__main__.TestDispatch) ... ok (0.007s)
2023-01-11T21:48:07.9769239Z test_multiple_def_alias_mismatch (__main__.TestDispatch) ... ok (0.007s)
2023-01-11T21:48:07.9769648Z test_multiple_def_error (__main__.TestDispatch) ... ok (0.007s)
2023-01-11T21:48:07.9770047Z test_multiple_fallback (__main__.TestDispatch) ... ok (0.008s)
2023-01-11T21:48:07.9770608Z test_overwrite_math (__main__.TestDispatch) ... [W OperatorEntry.cpp:159] Warning: Overriding a previously registered kernel for the same operator and the same dispatch key
2023-01-11T21:48:07.9771126Z   operator: __test45643__::foo
2023-01-11T21:48:07.9771400Z   no debug info
2023-01-11T21:48:07.9771651Z   dispatch key: (catch all)
2023-01-11T21:48:07.9771871Z   previous kernel: fn1
2023-01-11T21:48:07.9772106Z   new kernel: fn2 (function registerKernel)
2023-01-11T21:48:07.9772453Z [W OperatorEntry.cpp:159] Warning: Overriding a previously registered kernel for the same operator and the same dispatch key
2023-01-11T21:48:07.9772769Z   operator: __test45644__::foo
2023-01-11T21:48:07.9772965Z   no debug info
2023-01-11T21:48:07.9773153Z   dispatch key: (catch all)
2023-01-11T21:48:07.9773362Z   previous kernel: fn1
2023-01-11T21:48:07.9773592Z   new kernel: fn2 (function registerKernel)
2023-01-11T21:48:07.9773796Z ok (0.001s)
2023-01-11T21:48:07.9774052Z test_autogradother (__main__.TestPythonDispatcher) ... ok (0.001s)
2023-01-11T21:48:07.9774356Z test_basic (__main__.TestPythonDispatcher) ... ok (0.001s)
2023-01-11T21:48:07.9774931Z test_defaultbackend_autogradcpu (__main__.TestPythonDispatcher) ... ok (0.001s)
2023-01-11T21:48:07.9775258Z test_defaultbackend_math (__main__.TestPythonDispatcher) ... ok (0.001s)
2023-01-11T21:48:07.9775722Z test_duplicate_registrations (__main__.TestPythonDispatcher) ... ok (0.000s)
2023-01-11T21:48:07.9776063Z test_math_autogradcpu (__main__.TestPythonDispatcher) ... ok (0.001s)
2023-01-11T21:48:07.9776398Z test_quantized_structured_not_implemented (__main__.TestPythonDispatcher) ... ok (0.028s)
2023-01-11T21:48:07.9776600Z
2023-01-11T21:48:07.9776824Z ----------------------------------------------------------------------
2023-01-11T21:48:07.9777084Z Ran 32 tests in 46.172s
2023-01-11T21:48:07.9777207Z
2023-01-11T21:48:07.9777277Z OK
2023-01-11T21:48:07.9777369Z
2023-01-11T21:48:07.9777462Z Generating XML reports...
2023-01-11T21:48:07.9777868Z Generated XML report: test-reports/python-unittest/test_dispatch/TEST-TestDispatch-20230111214721.xml
2023-01-11T21:48:07.9778390Z Generated XML report: test-reports/python-unittest/test_dispatch/TEST-TestPythonDispatcher-20230111214721.xml
2023-01-11T21:48:07.9778639Z
2023-01-11T21:48:07.9778893Z ##[endgroup]
2023-01-11T21:48:07.9779380Z FINISHED PRINTING LOG FILE of test_dispatch (/var/lib/jenkins/workspace/test/test-reports/test_dispatch_riow3h5c)
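One detail worth noting in the group above: test_find_dangling_impls_ext compiles its helper extension just in time, which is where the "Emitting ninja build file ... Loading extension module dangling_impl_extension" lines come from. That path is torch.utils.cpp_extension.load(); a hedged sketch, with the source path mirroring the log but treated as an assumption:

    # Hedged sketch of the JIT build path seen in test_find_dangling_impls_ext.
    from torch.utils.cpp_extension import load

    module = load(
        name="dangling_impl_extension",
        sources=["cpp_extensions/dangling_impl_extension.cpp"],
        verbose=True,  # prints the "Emitting ninja build file ..." lines seen above
    )

The build is cached under ~/.cache/torch_extensions and parallelized by ninja; the "Using envvar MAX_JOBS (14)" line shows this job capping the worker count via the MAX_JOBS environment variable.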
2023-01-11T21:48:07.9779602Z
2023-01-11T21:48:07.9779771Z Running test_linalg ... [2023-01-11 21:48:07.974645]
2023-01-11T21:48:07.9780237Z Executing ['/opt/conda/bin/python', '-bb', 'test_linalg.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:48:07.974838]
2023-01-11T21:53:15.8029629Z
2023-01-11T21:53:15.8032703Z Expand the folded group to see the log file of test_linalg
2023-01-11T21:53:15.8047619Z ##[group]PRINTING LOG FILE of test_linalg (/var/lib/jenkins/workspace/test/test-reports/test_linalg_lzuigkoh)
2023-01-11T21:53:15.8047841Z
2023-01-11T21:53:15.8049779Z Running tests...
2023-01-11T21:53:15.8050487Z ----------------------------------------------------------------------
2023-01-11T21:53:15.8051017Z Test results will be stored in test-reports/python-unittest/test_linalg
2023-01-11T21:53:15.8051347Z test_addbmm_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (1.152s)
2023-01-11T21:53:15.8051680Z test_addbmm_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.525s)
2023-01-11T21:53:15.8051979Z test_addbmm_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (1.104s)
2023-01-11T21:53:15.8097072Z test_addbmm_cuda_float32 (__main__.TestLinalgCUDA) ... ok (1.003s)
2023-01-11T21:53:15.8097436Z test_addbmm_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.476s)
2023-01-11T21:53:15.8097753Z test_addmm_activation_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.016s)
2023-01-11T21:53:15.8098083Z test_addmm_activation_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.027s)
2023-01-11T21:53:15.8098399Z test_addmm_activation_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.014s)
2023-01-11T21:53:15.8098721Z test_addmm_baddbmm_overflow_cuda_float16 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8099045Z test_addmm_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.015s)
2023-01-11T21:53:15.8099354Z test_addmm_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.016s)
2023-01-11T21:53:15.8099662Z test_addmm_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.030s)
2023-01-11T21:53:15.8099968Z test_addmm_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.027s)
2023-01-11T21:53:15.8100352Z test_addmm_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.014s)
2023-01-11T21:53:15.8100654Z test_addmm_sizes_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.218s)
2023-01-11T21:53:15.8100969Z test_addmm_sizes_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.434s)
2023-01-11T21:53:15.8101281Z test_addmm_sizes_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.429s)
2023-01-11T21:53:15.8101589Z test_addmm_sizes_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.216s)
2023-01-11T21:53:15.8101883Z test_addmv_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.020s)
2023-01-11T21:53:15.8102181Z test_addmv_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.020s)
2023-01-11T21:53:15.8103446Z test_addmv_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.021s)
2023-01-11T21:53:15.8104143Z test_addmv_cuda_float16 (__main__.TestLinalgCUDA) ... ok (0.020s)
2023-01-11T21:53:15.8104557Z test_addmv_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.019s)
2023-01-11T21:53:15.8104877Z test_addmv_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.020s)
2023-01-11T21:53:15.8105198Z test_addmv_rowmajor_colmajor_incx_incy_lda_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.013s)
2023-01-11T21:53:15.8105553Z test_addmv_rowmajor_colmajor_incx_incy_lda_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.013s)
2023-01-11T21:53:15.8105908Z test_addmv_rowmajor_colmajor_incx_incy_lda_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.012s)
2023-01-11T21:53:15.8106304Z test_addr_bool_cuda_bool (__main__.TestLinalgCUDA) ... ok (0.009s)
2023-01-11T21:53:15.8106757Z test_addr_float_and_complex_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.016s)
2023-01-11T21:53:15.8107218Z test_addr_float_and_complex_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.019s)
2023-01-11T21:53:15.8107688Z test_addr_float_and_complex_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.019s)
2023-01-11T21:53:15.8108143Z test_addr_float_and_complex_cuda_float16 (__main__.TestLinalgCUDA) ... ok (0.016s)
2023-01-11T21:53:15.8108618Z test_addr_float_and_complex_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.015s)
2023-01-11T21:53:15.8109099Z test_addr_float_and_complex_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.015s)
2023-01-11T21:53:15.8109439Z test_addr_integral_cuda_int16 (__main__.TestLinalgCUDA) ... ok (0.022s)
2023-01-11T21:53:15.8109776Z test_addr_integral_cuda_int32 (__main__.TestLinalgCUDA) ... ok (0.022s)
2023-01-11T21:53:15.8110076Z test_addr_integral_cuda_int64 (__main__.TestLinalgCUDA) ... ok (0.022s)
2023-01-11T21:53:15.8110370Z test_addr_integral_cuda_int8 (__main__.TestLinalgCUDA) ... ok (0.022s)
2023-01-11T21:53:15.8110667Z test_addr_integral_cuda_uint8 (__main__.TestLinalgCUDA) ... ok (0.022s)
2023-01-11T21:53:15.8110958Z test_addr_type_promotion_cuda (__main__.TestLinalgCUDA) ... ok (0.330s)
2023-01-11T21:53:15.8111257Z test_baddbmm_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.995s)
2023-01-11T21:53:15.8111557Z test_baddbmm_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (1.049s)
2023-01-11T21:53:15.8111927Z test_baddbmm_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (2.191s)
2023-01-11T21:53:15.8112230Z test_baddbmm_cuda_float32 (__main__.TestLinalgCUDA) ... ok (1.994s)
2023-01-11T21:53:15.8112523Z test_baddbmm_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.938s)
2023-01-11T21:53:15.8144992Z test_blas_alpha_beta_empty_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.003s)
2023-01-11T21:53:15.8145355Z test_blas_alpha_beta_empty_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.003s)
2023-01-11T21:53:15.8145690Z test_blas_alpha_beta_empty_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.003s)
2023-01-11T21:53:15.8146024Z test_blas_alpha_beta_empty_cuda_float16 (__main__.TestLinalgCUDA) ... ok (0.003s)
2023-01-11T21:53:15.8146361Z test_blas_alpha_beta_empty_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.003s)
2023-01-11T21:53:15.8146679Z test_blas_alpha_beta_empty_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.003s)
2023-01-11T21:53:15.8147065Z test_blas_empty_cuda (__main__.TestLinalgCUDA) ... ok (0.014s)
2023-01-11T21:53:15.8147388Z test_blas_mv_large_input_cuda (__main__.TestLinalgCUDA) ... skip: Only runs on cpu (0.001s)
2023-01-11T21:53:15.8148164Z test_blas_nan_out_cuda_bfloat16 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:4652: UserWarning: An output with one or more elements was resized since it had shape [7], which does not match the required output shape [5]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8149038Z   self.assertEqual(torch.mv(nm, _m), torch.mv(nm, _m, out=_m_out))
2023-01-11T21:53:15.8149276Z ok (0.003s)
2023-01-11T21:53:15.8149653Z test_blas_nan_out_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8149971Z test_blas_nan_out_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8150277Z test_blas_nan_out_cuda_float16 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8150577Z test_blas_nan_out_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8150865Z test_blas_nan_out_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8151607Z test_bmm_cuda_bfloat16 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [1, 23, 12], which does not match the required output shape [1, 23, 0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8152274Z   torch.bmm(b1, b2, out=res2)
2023-01-11T21:53:15.8152933Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [1, 23, 12], which does not match the required output shape [1, 0, 12]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8153547Z   torch.bmm(b1, b2, out=res2)
2023-01-11T21:53:15.8154234Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [1, 23, 12], which does not match the required output shape [1, 0, 0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8154841Z   torch.bmm(b1, b2, out=res2)
2023-01-11T21:53:15.8155486Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [1, 23, 12], which does not match the required output shape [0, 23, 12]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8156101Z   torch.bmm(b1, b2, out=res2)
2023-01-11T21:53:15.8156749Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [1, 23, 12], which does not match the required output shape [0, 23, 0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8157349Z   torch.bmm(b1, b2, out=res2)
2023-01-11T21:53:15.8157978Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [1, 23, 12], which does not match the required output shape [0, 0, 12]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8158603Z   torch.bmm(b1, b2, out=res2)
2023-01-11T21:53:15.8159237Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [1, 23, 12], which does not match the required output shape [0, 0, 0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8159833Z   torch.bmm(b1, b2, out=res2)
2023-01-11T21:53:15.8160481Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [10, 23, 12], which does not match the required output shape [10, 23, 0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8161098Z torch.bmm(b1, b2, out=res2) 2023-01-11T21:53:15.8161746Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [10, 23, 12], which does not match the required output shape [10, 0, 12]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8162366Z torch.bmm(b1, b2, out=res2) 2023-01-11T21:53:15.8163030Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [10, 23, 12], which does not match the required output shape [10, 0, 0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8163641Z torch.bmm(b1, b2, out=res2) 2023-01-11T21:53:15.8164280Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [10, 23, 12], which does not match the required output shape [0, 23, 12]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8164885Z torch.bmm(b1, b2, out=res2) 2023-01-11T21:53:15.8165534Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [10, 23, 12], which does not match the required output shape [0, 23, 0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8166129Z torch.bmm(b1, b2, out=res2) 2023-01-11T21:53:15.8166766Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [10, 23, 12], which does not match the required output shape [0, 0, 12]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8167390Z torch.bmm(b1, b2, out=res2) 2023-01-11T21:53:15.8168022Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [10, 23, 12], which does not match the required output shape [0, 0, 0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). 
(Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8168637Z torch.bmm(b1, b2, out=res2)
2023-01-11T21:53:15.8168862Z ok (15.701s)
2023-01-11T21:53:15.8169107Z test_bmm_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (15.734s)
2023-01-11T21:53:15.8169398Z test_bmm_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (31.414s)
2023-01-11T21:53:15.8169694Z test_bmm_cuda_float32 (__main__.TestLinalgCUDA) ... ok (31.163s)
2023-01-11T21:53:15.8169979Z test_bmm_cuda_float64 (__main__.TestLinalgCUDA) ... ok (15.592s)
2023-01-11T21:53:15.8170714Z test_broadcast_batched_matmul_cuda (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:6856: UserWarning: An output with one or more elements was resized since it had shape [3, 8], which does not match the required output shape [3, 8, 1]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8171381Z torch.matmul(l, r, out=out)
2023-01-11T21:53:15.8172053Z /var/lib/jenkins/workspace/test/test_linalg.py:6856: UserWarning: An output with one or more elements was resized since it had shape [3, 1], which does not match the required output shape [3, 1, 1]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8172665Z torch.matmul(l, r, out=out)
2023-01-11T21:53:15.8173297Z /var/lib/jenkins/workspace/test/test_linalg.py:6856: UserWarning: An output with one or more elements was resized since it had shape [1, 2], which does not match the required output shape [1, 2, 1]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8173920Z torch.matmul(l, r, out=out)
2023-01-11T21:53:15.8174852Z /var/lib/jenkins/workspace/test/test_linalg.py:6856: UserWarning: An output with one or more elements was resized since it had shape [1, 6], which does not match the required output shape [1, 1, 6]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8175489Z torch.matmul(l, r, out=out)
2023-01-11T21:53:15.8175674Z ok (0.048s)
2023-01-11T21:53:15.8175924Z test_broadcast_fused_matmul_cuda (__main__.TestLinalgCUDA) ... ok (0.003s)
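The repeated UserWarnings above are all one deprecation: passing an out= tensor whose shape does not match the result, which PyTorch currently resizes implicitly. A minimal sketch of the reuse pattern the warning text itself recommends, assuming only a recent PyTorch build; the shapes and names (b1, b2, out) are illustrative and not taken from the test source:

    import torch

    b1 = torch.randn(10, 3, 4)
    b2 = torch.randn(10, 4, 5)

    # An out tensor whose shape does not match the result (10, 3, 5)
    # triggers the implicit-resize warning seen in this log.
    out = torch.empty(10, 23, 12)

    # Pattern recommended by the warning: resize the tensor to zero
    # elements in place before reuse; zero-element outputs may always
    # be resized, so no warning is emitted.
    out.resize_(0)
    torch.bmm(b1, b2, out=out)
    print(out.shape)  # torch.Size([10, 3, 5])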
2023-01-11T21:53:15.8176943Z test_chain_matmul_cuda_float64 (__main__.TestLinalgCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/functional.py:1626: UserWarning: torch.chain_matmul is deprecated and will be removed in a future PyTorch release. Use torch.linalg.multi_dot instead, which accepts a list of two or more tensors rather than multiple parameters. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/LinearAlgebra.cpp:1077.)
2023-01-11T21:53:15.8177700Z return _VF.chain_matmul(matrices) # type: ignore[attr-defined]
2023-01-11T21:53:15.8177933Z ok (0.005s)
2023-01-11T21:53:15.8178191Z test_cholesky_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.159s)
2023-01-11T21:53:15.8178503Z test_cholesky_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.016s)
2023-01-11T21:53:15.8178812Z test_cholesky_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.013s)
2023-01-11T21:53:15.8179107Z test_cholesky_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.011s)
2023-01-11T21:53:15.8179436Z test_cholesky_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.040s)
2023-01-11T21:53:15.8179792Z test_cholesky_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.039s)
2023-01-11T21:53:15.8180243Z test_cholesky_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.039s)
2023-01-11T21:53:15.8180592Z test_cholesky_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.039s)
2023-01-11T21:53:15.8180914Z test_cholesky_ex_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.006s)
2023-01-11T21:53:15.8181231Z test_cholesky_ex_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8181533Z test_cholesky_ex_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.006s)
2023-01-11T21:53:15.8181834Z test_cholesky_ex_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.005s)
2023-01-11T21:53:15.8182149Z test_cholesky_ex_non_pd_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.010s)
2023-01-11T21:53:15.8182470Z test_cholesky_ex_non_pd_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.010s)
2023-01-11T21:53:15.8182797Z test_cholesky_ex_non_pd_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.010s)
2023-01-11T21:53:15.8183109Z test_cholesky_ex_non_pd_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.010s)
2023-01-11T21:53:15.8183496Z test_cholesky_inverse_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.222s)
2023-01-11T21:53:15.8183812Z test_cholesky_inverse_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.032s)
2023-01-11T21:53:15.8184126Z test_cholesky_inverse_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.027s)
2023-01-11T21:53:15.8184442Z test_cholesky_inverse_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.027s)
2023-01-11T21:53:15.8184778Z test_cholesky_inverse_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.019s)
2023-01-11T21:53:15.8185143Z test_cholesky_inverse_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.019s)
2023-01-11T21:53:15.8185506Z test_cholesky_inverse_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.019s)
2023-01-11T21:53:15.8185864Z test_cholesky_inverse_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.019s)
2023-01-11T21:53:15.8186224Z test_cholesky_solve_batched_broadcasting_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.010s)
2023-01-11T21:53:15.8186593Z test_cholesky_solve_batched_broadcasting_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.010s)
2023-01-11T21:53:15.8186957Z test_cholesky_solve_batched_broadcasting_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.009s)
2023-01-11T21:53:15.8187310Z test_cholesky_solve_batched_broadcasting_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.009s)
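The torch.chain_matmul deprecation warning above names its replacement directly. A minimal sketch of that migration, assuming only PyTorch itself; the operand names A, B, C are illustrative:

    import torch

    A = torch.randn(3, 4, dtype=torch.double)
    B = torch.randn(4, 5, dtype=torch.double)
    C = torch.randn(5, 6, dtype=torch.double)

    # Deprecated form: two or more tensors as separate parameters.
    old = torch.chain_matmul(A, B, C)

    # Replacement named in the warning: a single list of tensors.
    new = torch.linalg.multi_dot([A, B, C])

    # Both pick an efficient multiplication order; results agree up
    # to floating-point round-off.
    assert torch.allclose(old, new)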
2023-01-11T21:53:15.8187824Z test_cholesky_solve_batched_cuda_complex128 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:2483: UserWarning: torch.cholesky is deprecated in favor of torch.linalg.cholesky and will be removed in a future PyTorch release.
2023-01-11T21:53:15.8188236Z L = torch.cholesky(A)
2023-01-11T21:53:15.8188520Z should be replaced with
2023-01-11T21:53:15.8188806Z L = torch.linalg.cholesky(A)
2023-01-11T21:53:15.8189029Z and
2023-01-11T21:53:15.8189224Z U = torch.cholesky(A, upper=True)
2023-01-11T21:53:15.8189436Z should be replaced with
2023-01-11T21:53:15.8189658Z U = torch.linalg.cholesky(A).mH().
2023-01-11T21:53:15.8190083Z This transform will produce equivalent results for all valid (symmetric positive definite) inputs. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:1730.)
2023-01-11T21:53:15.8190481Z L = torch.cholesky(A, upper=upper)
2023-01-11T21:53:15.8190682Z ok (0.007s)
2023-01-11T21:53:15.8190963Z test_cholesky_solve_batched_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8191300Z test_cholesky_solve_batched_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.006s)
2023-01-11T21:53:15.8191621Z test_cholesky_solve_batched_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8192034Z test_cholesky_solve_batched_many_batches_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
2023-01-11T21:53:15.8192516Z test_cholesky_solve_batched_many_batches_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
2023-01-11T21:53:15.8192990Z test_cholesky_solve_batched_many_batches_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
2023-01-11T21:53:15.8193443Z test_cholesky_solve_batched_many_batches_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
2023-01-11T21:53:15.8193832Z test_cholesky_solve_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.004s)
2023-01-11T21:53:15.8194152Z test_cholesky_solve_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.004s)
2023-01-11T21:53:15.8194467Z test_cholesky_solve_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.004s)
2023-01-11T21:53:15.8194771Z test_cholesky_solve_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.004s)
2023-01-11T21:53:15.8195152Z test_cholesky_solve_out_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8195520Z test_cholesky_solve_out_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8195882Z test_cholesky_solve_out_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8196238Z test_cholesky_solve_out_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8196564Z test_cond_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.254s)
2023-01-11T21:53:15.8196864Z test_cond_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.065s)
2023-01-11T21:53:15.8197148Z test_cond_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.061s)
2023-01-11T21:53:15.8197435Z test_cond_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.113s)
2023-01-11T21:53:15.8197761Z test_cond_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.047s)
2023-01-11T21:53:15.8198092Z test_cond_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ...
ok (0.046s) 2023-01-11T21:53:15.8198430Z test_cond_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.046s) 2023-01-11T21:53:15.8198781Z test_cond_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.046s) 2023-01-11T21:53:15.8199149Z test_corner_cases_of_cublasltmatmul_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.316s) 2023-01-11T21:53:15.8199495Z test_corner_cases_of_cublasltmatmul_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.451s) 2023-01-11T21:53:15.8199846Z test_corner_cases_of_cublasltmatmul_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.390s) 2023-01-11T21:53:15.8200192Z test_corner_cases_of_cublasltmatmul_cuda_float16 (__main__.TestLinalgCUDA) ... ok (0.318s) 2023-01-11T21:53:15.8200567Z test_corner_cases_of_cublasltmatmul_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.309s) 2023-01-11T21:53:15.8200904Z test_corner_cases_of_cublasltmatmul_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.385s) 2023-01-11T21:53:15.8201231Z test_cross_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8201528Z test_cross_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8201843Z test_cross_with_and_without_dim_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8202183Z test_cross_with_and_without_dim_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8202501Z test_det_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.024s) 2023-01-11T21:53:15.8202794Z test_det_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.019s) 2023-01-11T21:53:15.8203102Z test_det_logdet_slogdet_batched_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.188s) 2023-01-11T21:53:15.8203588Z test_det_logdet_slogdet_cuda_float64 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:6457: UserWarning: torch.qr is deprecated in favor of torch.linalg.qr and will be removed in a future PyTorch release. 2023-01-11T21:53:15.8204137Z The boolean parameter 'some' has been replaced with a string parameter 'mode'. 2023-01-11T21:53:15.8204401Z Q, R = torch.qr(A, some) 2023-01-11T21:53:15.8204611Z should be replaced with 2023-01-11T21:53:15.8205089Z Q, R = torch.linalg.qr(A, 'reduced' if some else 'complete') (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:2459.) 2023-01-11T21:53:15.8205437Z q, _ = torch.qr(mat) 2023-01-11T21:53:15.8205618Z ok (2.621s) 2023-01-11T21:53:15.8205861Z test_dot_invalid_args_cuda (__main__.TestLinalgCUDA) ... ok (0.033s) 2023-01-11T21:53:15.8206173Z test_dot_vs_numpy_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.003s) 2023-01-11T21:53:15.8206479Z test_dot_vs_numpy_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.003s) 2023-01-11T21:53:15.8206863Z test_eig_check_magma_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8207246Z test_eig_compare_backends_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.008s) 2023-01-11T21:53:15.8207620Z test_eig_compare_backends_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.008s) 2023-01-11T21:53:15.8207948Z test_eig_compare_backends_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8208278Z test_eig_compare_backends_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8208618Z test_eig_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... 
ok (0.036s) 2023-01-11T21:53:15.8208954Z test_eig_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.036s) 2023-01-11T21:53:15.8209289Z test_eig_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.047s) 2023-01-11T21:53:15.8209624Z test_eig_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.049s) 2023-01-11T21:53:15.8209941Z test_eig_numpy_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.010s) 2023-01-11T21:53:15.8210244Z test_eig_numpy_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.009s) 2023-01-11T21:53:15.8210553Z test_eig_removed_error_cuda (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8210864Z test_eig_with_nan_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.023s) 2023-01-11T21:53:15.8211173Z test_eig_with_nan_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.023s) 2023-01-11T21:53:15.8211483Z test_eig_with_nan_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.022s) 2023-01-11T21:53:15.8211788Z test_eig_with_nan_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.022s) 2023-01-11T21:53:15.8212092Z test_eigh_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.034s) 2023-01-11T21:53:15.8212389Z test_eigh_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.034s) 2023-01-11T21:53:15.8212719Z test_eigh_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.030s) 2023-01-11T21:53:15.8213013Z test_eigh_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.029s) 2023-01-11T21:53:15.8213327Z test_eigh_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.043s) 2023-01-11T21:53:15.8213674Z test_eigh_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.043s) 2023-01-11T21:53:15.8214012Z test_eigh_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.043s) 2023-01-11T21:53:15.8214349Z test_eigh_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.043s) 2023-01-11T21:53:15.8214914Z test_eigh_lower_uplo_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8215325Z test_eigh_lower_uplo_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8215710Z test_eigh_lower_uplo_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8216012Z test_eigh_lower_uplo_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8216339Z test_eigvals_compare_backends_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8216679Z test_eigvals_compare_backends_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8217017Z test_eigvals_compare_backends_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.014s) 2023-01-11T21:53:15.8217335Z test_eigvals_compare_backends_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8217668Z test_eigvals_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.012s) 2023-01-11T21:53:15.8218008Z test_eigvals_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.011s) 2023-01-11T21:53:15.8218334Z test_eigvals_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8218686Z test_eigvals_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8219033Z test_eigvals_numpy_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.006s) 2023-01-11T21:53:15.8219337Z test_eigvals_numpy_cuda_float64 (__main__.TestLinalgCUDA) ... 
ok (0.006s) 2023-01-11T21:53:15.8219634Z test_eigvalsh_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.019s) 2023-01-11T21:53:15.8219995Z test_eigvalsh_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.016s) 2023-01-11T21:53:15.8220400Z test_eigvalsh_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.015s) 2023-01-11T21:53:15.8220700Z test_eigvalsh_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.015s) 2023-01-11T21:53:15.8221226Z test_eigvalsh_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:1042: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Copy.cpp:276.) 2023-01-11T21:53:15.8221695Z out = torch.empty_like(t).to(real_dtype) 2023-01-11T21:53:15.8221911Z ok (0.029s) 2023-01-11T21:53:15.8222170Z test_eigvalsh_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.028s) 2023-01-11T21:53:15.8222510Z test_eigvalsh_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.029s) 2023-01-11T21:53:15.8222852Z test_eigvalsh_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.029s) 2023-01-11T21:53:15.8223159Z test_einsum_corner_cases_cuda (__main__.TestLinalgCUDA) ... ok (0.010s) 2023-01-11T21:53:15.8223464Z test_einsum_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.038s) 2023-01-11T21:53:15.8223761Z test_einsum_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.035s) 2023-01-11T21:53:15.8224062Z test_einsum_error_cases_cuda (__main__.TestLinalgCUDA) ... ok (0.036s) 2023-01-11T21:53:15.8224367Z test_einsum_random_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (3.122s) 2023-01-11T21:53:15.8224678Z test_einsum_random_cuda_float64 (__main__.TestLinalgCUDA) ... ok (2.959s) 2023-01-11T21:53:15.8225003Z test_einsum_sublist_format_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.015s) 2023-01-11T21:53:15.8225365Z test_einsum_sublist_format_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8225672Z test_geqrf_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.020s) 2023-01-11T21:53:15.8225971Z test_geqrf_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.020s) 2023-01-11T21:53:15.8226259Z test_geqrf_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.019s) 2023-01-11T21:53:15.8226535Z test_geqrf_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.018s) 2023-01-11T21:53:15.8226843Z test_householder_product_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.014s) 2023-01-11T21:53:15.8227174Z test_householder_product_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.014s) 2023-01-11T21:53:15.8227491Z test_householder_product_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8227802Z test_householder_product_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8228136Z test_householder_product_errors_and_warnings_cuda (__main__.TestLinalgCUDA) ... ok (0.032s) 2023-01-11T21:53:15.8228454Z test_inner_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.033s) 2023-01-11T21:53:15.8228738Z test_inner_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.030s) 2023-01-11T21:53:15.8229046Z test_inv_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.050s) 2023-01-11T21:53:15.8229378Z test_inv_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.049s) 2023-01-11T21:53:15.8229694Z test_inv_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... 
ok (0.049s) 2023-01-11T21:53:15.8230013Z test_inv_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.049s) 2023-01-11T21:53:15.8230328Z test_inv_ex_info_device_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8230646Z test_inv_ex_info_device_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8230956Z test_inv_ex_info_device_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8231260Z test_inv_ex_info_device_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8231610Z test_inv_ex_singular_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.011s) 2023-01-11T21:53:15.8231918Z test_inv_ex_singular_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.011s) 2023-01-11T21:53:15.8232224Z test_inv_ex_singular_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.011s) 2023-01-11T21:53:15.8232530Z test_inv_ex_singular_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.011s) 2023-01-11T21:53:15.8232870Z test_invariance_error_spectral_decompositions_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8233199Z test_inverse_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.150s) 2023-01-11T21:53:15.8233495Z test_inverse_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.103s) 2023-01-11T21:53:15.8233790Z test_inverse_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.095s) 2023-01-11T21:53:15.8234077Z test_inverse_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.101s) 2023-01-11T21:53:15.8234378Z test_inverse_errors_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.039s) 2023-01-11T21:53:15.8234692Z test_inverse_errors_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.038s) 2023-01-11T21:53:15.8235002Z test_inverse_errors_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.038s) 2023-01-11T21:53:15.8235300Z test_inverse_errors_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.038s) 2023-01-11T21:53:15.8235662Z test_inverse_errors_large_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8236059Z test_inverse_errors_large_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8236445Z test_inverse_errors_large_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8236867Z test_inverse_errors_large_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8237288Z test_inverse_many_batches_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8237733Z test_inverse_many_batches_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8238173Z test_inverse_many_batches_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8238601Z test_inverse_many_batches_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8238959Z test_kron_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.015s) 2023-01-11T21:53:15.8239250Z test_kron_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.016s) 2023-01-11T21:53:15.8239533Z test_kron_cuda_float32 (__main__.TestLinalgCUDA) ... 
ok (0.014s) 2023-01-11T21:53:15.8239817Z test_kron_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8240112Z test_kron_empty_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8240420Z test_kron_empty_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8240721Z test_kron_empty_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8241018Z test_kron_empty_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8241776Z test_kron_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:1117: UserWarning: An output with one or more elements was resized since it had shape [3, 3], which does not match the required output shape [6, 6]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8242466Z torch.kron(a, b, out=out) 2023-01-11T21:53:15.8242652Z ok (0.006s) 2023-01-11T21:53:15.8243343Z test_kron_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:1117: UserWarning: An output with one or more elements was resized since it had shape [3, 3], which does not match the required output shape [6, 6]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8244010Z torch.kron(a, b, out=out) 2023-01-11T21:53:15.8244202Z ok (0.005s) 2023-01-11T21:53:15.8244887Z test_kron_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:1117: UserWarning: An output with one or more elements was resized since it had shape [3, 3], which does not match the required output shape [6, 6]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8245537Z torch.kron(a, b, out=out) 2023-01-11T21:53:15.8245721Z ok (0.005s) 2023-01-11T21:53:15.8246385Z test_kron_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:1117: UserWarning: An output with one or more elements was resized since it had shape [3, 3], which does not match the required output shape [6, 6]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8247082Z torch.kron(a, b, out=out) 2023-01-11T21:53:15.8247276Z ok (0.005s) 2023-01-11T21:53:15.8247507Z test_lapack_empty_cuda (__main__.TestLinalgCUDA) ... ok (0.003s) 2023-01-11T21:53:15.8247896Z test_ldl_factor_cuda_complex128 (__main__.TestLinalgCUDA) ... 
/var/lib/jenkins/workspace/test/test_linalg.py:7290: ComplexWarning: scipy.linalg.ldl():
2023-01-11T21:53:15.8248348Z The imaginary parts of the diagonal are ignored. Use "hermitian=False" for factorization of complex symmetric arrays.
2023-01-11T21:53:15.8248740Z lambda x: scipy_ldl(x, hermitian=hermitian, lower=True),
2023-01-11T21:53:15.8248963Z ok (0.047s)
2023-01-11T21:53:15.8249211Z test_ldl_factor_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.017s)
2023-01-11T21:53:15.8249520Z test_ldl_factor_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.005s)
2023-01-11T21:53:15.8249817Z test_ldl_factor_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.005s)
2023-01-11T21:53:15.8250123Z test_ldl_solve_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.008s)
2023-01-11T21:53:15.8250432Z test_ldl_solve_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.008s)
2023-01-11T21:53:15.8250727Z test_ldl_solve_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8251024Z test_ldl_solve_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.008s)
2023-01-11T21:53:15.8251329Z test_linalg_cross_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8251638Z test_linalg_cross_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8251964Z test_linalg_cross_with_and_without_dim_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s)
2023-01-11T21:53:15.8252323Z test_linalg_cross_with_and_without_dim_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s)
2023-01-11T21:53:15.8252675Z test_linalg_lstsq_batch_broadcasting_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.252s)
2023-01-11T21:53:15.8253053Z test_linalg_lstsq_batch_broadcasting_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.060s)
2023-01-11T21:53:15.8253407Z test_linalg_lstsq_batch_broadcasting_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.050s)
2023-01-11T21:53:15.8253752Z test_linalg_lstsq_batch_broadcasting_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.091s)
2023-01-11T21:53:15.8254081Z test_linalg_lstsq_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.505s)
2023-01-11T21:53:15.8254388Z test_linalg_lstsq_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.146s)
2023-01-11T21:53:15.8254958Z test_linalg_lstsq_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.127s)
2023-01-11T21:53:15.8255291Z test_linalg_lstsq_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.197s)
2023-01-11T21:53:15.8255612Z test_linalg_lstsq_input_checks_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.039s)
2023-01-11T21:53:15.8255957Z test_linalg_lstsq_input_checks_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.039s)
2023-01-11T21:53:15.8256297Z test_linalg_lstsq_input_checks_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.039s)
2023-01-11T21:53:15.8256629Z test_linalg_lstsq_input_checks_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.038s)
2023-01-11T21:53:15.8256975Z test_linalg_lu_cpu_errors_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:53:15.8257345Z test_linalg_lu_cpu_errors_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:53:15.8257708Z test_linalg_lu_cpu_errors_cuda_float32 (__main__.TestLinalgCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:53:15.8258070Z test_linalg_lu_cpu_errors_cuda_float64 (__main__.TestLinalgCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:53:15.8258897Z test_linalg_lu_family_cuda_complex128 (__main__.TestLinalgCUDA) ...
/opt/conda/lib/python3.10/site-packages/torch/functional.py:1728: UserWarning: torch.lu is deprecated in favor of torch.linalg.lu_factor / torch.linalg.lu_factor_ex and will be removed in a future PyTorch release. 2023-01-11T21:53:15.8259365Z LU, pivots = torch.lu(A, compute_pivots) 2023-01-11T21:53:15.8259590Z should be replaced with 2023-01-11T21:53:15.8259838Z LU, pivots = torch.linalg.lu_factor(A, compute_pivots) 2023-01-11T21:53:15.8260130Z and 2023-01-11T21:53:15.8260371Z LU, pivots, info = torch.lu(A, compute_pivots, get_infos=True) 2023-01-11T21:53:15.8260615Z should be replaced with 2023-01-11T21:53:15.8260983Z LU, pivots, info = torch.linalg.lu_factor_ex(A, compute_pivots) (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:2029.) 2023-01-11T21:53:15.8261393Z return torch._lu_with_info(A, pivot=pivot, check_errors=(not get_infos)) 2023-01-11T21:53:15.8261637Z ok (1.124s) 2023-01-11T21:53:15.8261884Z test_linalg_lu_family_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.994s) 2023-01-11T21:53:15.8262207Z test_linalg_lu_family_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.901s) 2023-01-11T21:53:15.8262522Z test_linalg_lu_family_cuda_float64 (__main__.TestLinalgCUDA) ... ok (1.285s) 2023-01-11T21:53:15.8263059Z test_linalg_lu_solve_cuda_complex128 (__main__.TestLinalgCUDA) ... [W Context.cpp:241] Warning: torch.backends.cuda.preferred_linalg_library is an experimental feature. If you see any error or unexpected behavior when this flag is set please file an issue on GitHub. (function operator()) 2023-01-11T21:53:15.8263491Z ok (1.968s) 2023-01-11T21:53:15.8263743Z test_linalg_lu_solve_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (1.926s) 2023-01-11T21:53:15.8264057Z test_linalg_lu_solve_cuda_float32 (__main__.TestLinalgCUDA) ... ok (1.919s) 2023-01-11T21:53:15.8264361Z test_linalg_lu_solve_cuda_float64 (__main__.TestLinalgCUDA) ... ok (1.925s) 2023-01-11T21:53:15.8264756Z test_linalg_matrix_exp_analytic_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.003s) 2023-01-11T21:53:15.8265280Z test_linalg_matrix_exp_analytic_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.003s) 2023-01-11T21:53:15.8265740Z test_linalg_matrix_exp_analytic_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.003s) 2023-01-11T21:53:15.8266190Z test_linalg_matrix_exp_analytic_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.003s) 2023-01-11T21:53:15.8266573Z test_linalg_matrix_exp_batch_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.045s) 2023-01-11T21:53:15.8266900Z test_linalg_matrix_exp_batch_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.045s) 2023-01-11T21:53:15.8267239Z test_linalg_matrix_exp_boundary_cases_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.012s) 2023-01-11T21:53:15.8267589Z test_linalg_matrix_exp_boundary_cases_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.012s) 2023-01-11T21:53:15.8267943Z test_linalg_matrix_exp_boundary_cases_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.012s) 2023-01-11T21:53:15.8268282Z test_linalg_matrix_exp_boundary_cases_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.012s) 2023-01-11T21:53:15.8268637Z test_linalg_matrix_exp_compare_with_taylor_cuda_complex128 (__main__.TestLinalgCUDA) ... 
ok (0.100s) 2023-01-11T21:53:15.8268991Z test_linalg_matrix_exp_compare_with_taylor_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.099s) 2023-01-11T21:53:15.8269348Z test_linalg_matrix_exp_compare_with_taylor_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.100s) 2023-01-11T21:53:15.8269703Z test_linalg_matrix_exp_compare_with_taylor_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.093s) 2023-01-11T21:53:15.8270073Z test_linalg_matrix_exp_no_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:53:15.8270460Z test_linalg_matrix_exp_utils_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8270801Z test_linalg_matrix_exp_utils_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.011s) 2023-01-11T21:53:15.8271140Z test_linalg_qr_autograd_errors_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.010s) 2023-01-11T21:53:15.8271488Z test_linalg_solve_triangular_broadcasting_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.918s) 2023-01-11T21:53:15.8271861Z test_linalg_solve_triangular_broadcasting_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.917s) 2023-01-11T21:53:15.8272231Z test_linalg_solve_triangular_broadcasting_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8272596Z test_linalg_solve_triangular_broadcasting_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.016s) 2023-01-11T21:53:15.8272941Z test_linalg_solve_triangular_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (1.954s) 2023-01-11T21:53:15.8273285Z test_linalg_solve_triangular_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (1.991s) 2023-01-11T21:53:15.8273621Z test_linalg_solve_triangular_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.442s) 2023-01-11T21:53:15.8273947Z test_linalg_solve_triangular_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.422s) 2023-01-11T21:53:15.8274327Z test_linalg_solve_triangular_large_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8274751Z test_linalg_solve_triangular_large_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8275170Z test_linalg_solve_triangular_large_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8275577Z test_linalg_solve_triangular_large_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8275954Z test_linear_algebra_scalar_raises_cuda (__main__.TestLinalgCUDA) ... ok (0.009s) 2023-01-11T21:53:15.8276470Z test_lobpcg_basic_cuda_float64 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:5098: UserWarning: torch.symeig is deprecated in favor of torch.linalg.eigh and will be removed in a future PyTorch release. 2023-01-11T21:53:15.8277010Z The default behavior has changed from using the upper triangular portion of the matrix by default to using the lower triangular portion. 
2023-01-11T21:53:15.8277346Z L, _ = torch.symeig(A, upper=upper)
2023-01-11T21:53:15.8277560Z should be replaced with
2023-01-11T21:53:15.8277868Z L = torch.linalg.eigvalsh(A, UPLO='U' if upper else 'L')
2023-01-11T21:53:15.8278099Z and
2023-01-11T21:53:15.8278304Z L, V = torch.symeig(A, eigenvectors=True)
2023-01-11T21:53:15.8278530Z should be replaced with
2023-01-11T21:53:15.8278997Z L, V = torch.linalg.eigh(A, UPLO='U' if upper else 'L') (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:2910.)
2023-01-11T21:53:15.8279341Z e = torch.symeig(A)[0]
2023-01-11T21:53:15.8279536Z ok (2.730s)
2023-01-11T21:53:15.8279782Z test_lobpcg_ortho_cuda_float64 (__main__.TestLinalgCUDA) ... ok (12.638s)
2023-01-11T21:53:15.8280083Z test_lobpcg_scipy_cuda_float64 (__main__.TestLinalgCUDA)
2023-01-11T21:53:15.8280427Z Compare torch and scipy.sparse.linalg implementations of lobpcg ... skip: Only runs on cpu (0.005s)
2023-01-11T21:53:15.8280807Z test_lobpcg_torchscript_cuda_float64 (__main__.TestLinalgCUDA) ... skip: Only runs on cpu (0.001s)
2023-01-11T21:53:15.8281141Z test_lstsq_removed_error_cuda (__main__.TestLinalgCUDA) ... ok (0.001s)
2023-01-11T21:53:15.8281635Z test_lu_solve_batched_broadcasting_cuda_complex128 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:6953: UserWarning: torch.lu_solve is deprecated in favor of torch.linalg.lu_solve and will be removed in a future PyTorch release.
2023-01-11T21:53:15.8282147Z Note that torch.linalg.lu_solve has its arguments reversed.
2023-01-11T21:53:15.8282403Z X = torch.lu_solve(B, LU, pivots)
2023-01-11T21:53:15.8282613Z should be replaced with
2023-01-11T21:53:15.8282972Z X = torch.linalg.lu_solve(LU, pivots, B) (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:2183.)
2023-01-11T21:53:15.8283332Z x = torch.lu_solve(b, LU_data, LU_pivots)
2023-01-11T21:53:15.8283541Z ok (0.012s)
2023-01-11T21:53:15.8283804Z test_lu_solve_batched_broadcasting_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.005s)
2023-01-11T21:53:15.8284155Z test_lu_solve_batched_broadcasting_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.004s)
2023-01-11T21:53:15.8284498Z test_lu_solve_batched_broadcasting_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.005s)
2023-01-11T21:53:15.8284821Z test_lu_solve_batched_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.024s)
2023-01-11T21:53:15.8285147Z test_lu_solve_batched_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.011s)
2023-01-11T21:53:15.8285469Z test_lu_solve_batched_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.010s)
2023-01-11T21:53:15.8285780Z test_lu_solve_batched_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.012s)
2023-01-11T21:53:15.8286177Z test_lu_solve_batched_many_batches_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
2023-01-11T21:53:15.8286641Z test_lu_solve_batched_many_batches_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
2023-01-11T21:53:15.8287103Z test_lu_solve_batched_many_batches_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
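The torch.lu_solve warning above is the one migration in this run where the call does not map one-to-one: the argument order flips. A minimal sketch of the change it describes, assuming a PyTorch build that already ships torch.linalg.lu_solve (as the warning implies); the names A, b, LU, pivots are illustrative:

    import torch

    A = torch.randn(3, 3, dtype=torch.double)
    b = torch.randn(3, 2, dtype=torch.double)
    LU, pivots = torch.linalg.lu_factor(A)

    # Deprecated call: the right-hand side comes first.
    x_old = torch.lu_solve(b, LU, pivots)

    # Replacement named in the warning: the factorization comes first.
    x_new = torch.linalg.lu_solve(LU, pivots, b)

    assert torch.allclose(x_old, x_new)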
2023-01-11T21:53:15.8287557Z test_lu_solve_batched_many_batches_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
2023-01-11T21:53:15.8287922Z test_lu_solve_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.021s)
2023-01-11T21:53:15.8288230Z test_lu_solve_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8288534Z test_lu_solve_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.006s)
2023-01-11T21:53:15.8288856Z test_lu_solve_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.009s)
2023-01-11T21:53:15.8289181Z test_lu_solve_large_matrices_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8289524Z test_lu_solve_large_matrices_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8289858Z test_lu_solve_large_matrices_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s)
2023-01-11T21:53:15.8290179Z test_lu_solve_large_matrices_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.001s)
2023-01-11T21:53:15.8290510Z test_lu_unpack_check_input_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.005s)
2023-01-11T21:53:15.8290816Z test_matmul_45724_cuda (__main__.TestLinalgCUDA) ... ok (0.283s)
2023-01-11T21:53:15.8291585Z test_matmul_small_brute_force_1d_Nd_cuda_complex64 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:4287: UserWarning: An output with one or more elements was resized since it had shape [1], which does not match the required output shape [1, 1]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8292273Z ans = torch.matmul(x, y, out=out)
2023-01-11T21:53:15.8292928Z /var/lib/jenkins/workspace/test/test_linalg.py:4287: UserWarning: An output with one or more elements was resized since it had shape [2], which does not match the required output shape [1, 2]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8293573Z ans = torch.matmul(x, y, out=out)
2023-01-11T21:53:15.8294244Z /var/lib/jenkins/workspace/test/test_linalg.py:4287: UserWarning: An output with one or more elements was resized since it had shape [3], which does not match the required output shape [1, 3]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8295159Z ans = torch.matmul(x, y, out=out)
2023-01-11T21:53:15.8295362Z ok (0.464s)
2023-01-11T21:53:15.8295627Z test_matmul_small_brute_force_1d_Nd_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.425s)
2023-01-11T21:53:15.8295974Z test_matmul_small_brute_force_2d_Nd_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (1.815s)
2023-01-11T21:53:15.8296310Z test_matmul_small_brute_force_2d_Nd_cuda_float32 (__main__.TestLinalgCUDA) ... ok (1.668s)
2023-01-11T21:53:15.8296647Z test_matmul_small_brute_force_3d_Nd_cuda_complex64 (__main__.TestLinalgCUDA) ...
ok (2.067s) 2023-01-11T21:53:15.8296970Z test_matmul_small_brute_force_3d_Nd_cuda_float32 (__main__.TestLinalgCUDA) ... ok (1.896s) 2023-01-11T21:53:15.8297279Z test_matrix_norm_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.011s) 2023-01-11T21:53:15.8297575Z test_matrix_norm_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.010s) 2023-01-11T21:53:15.8297896Z test_matrix_power_negative_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.023s) 2023-01-11T21:53:15.8298215Z test_matrix_power_negative_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8298552Z test_matrix_power_non_negative_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.010s) 2023-01-11T21:53:15.8298912Z test_matrix_power_non_negative_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.009s) 2023-01-11T21:53:15.8299255Z test_matrix_rank_atol_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.158s) 2023-01-11T21:53:15.8299668Z test_matrix_rank_atol_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.033s) 2023-01-11T21:53:15.8300075Z test_matrix_rank_atol_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.028s) 2023-01-11T21:53:15.8300415Z test_matrix_rank_atol_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.053s) 2023-01-11T21:53:15.8300759Z test_matrix_rank_atol_rtol_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.008s) 2023-01-11T21:53:15.8301095Z test_matrix_rank_basic_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8301411Z test_matrix_rank_basic_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8301718Z test_matrix_rank_basic_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8302028Z test_matrix_rank_basic_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8302338Z test_matrix_rank_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.086s) 2023-01-11T21:53:15.8302649Z test_matrix_rank_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.020s) 2023-01-11T21:53:15.8302945Z test_matrix_rank_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8303242Z test_matrix_rank_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.031s) 2023-01-11T21:53:15.8303589Z test_matrix_rank_empty_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8303978Z test_matrix_rank_empty_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8304364Z test_matrix_rank_empty_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8304793Z test_matrix_rank_empty_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8305165Z test_matrix_rank_out_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8305518Z test_matrix_rank_out_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8305869Z test_matrix_rank_out_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8306214Z test_matrix_rank_out_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8306544Z test_matrix_rank_removed_error_cuda (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8306845Z test_mm_bmm_non_memory_dense_cuda (__main__.TestLinalgCUDA) ... 
ok (0.004s) 2023-01-11T21:53:15.8307205Z test_mm_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.002s) 2023-01-11T21:53:15.8307625Z test_mm_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.002s) 2023-01-11T21:53:15.8308034Z test_mm_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.002s) 2023-01-11T21:53:15.8308423Z test_mm_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.002s) 2023-01-11T21:53:15.8308769Z test_multi_dot_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8309070Z test_multi_dot_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8309364Z test_multi_dot_errors_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.028s) 2023-01-11T21:53:15.8309676Z test_norm_complex_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.116s) 2023-01-11T21:53:15.8309987Z test_norm_complex_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.022s) 2023-01-11T21:53:15.8310297Z test_norm_complex_old_cuda (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8310591Z test_norm_dtype_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.072s) 2023-01-11T21:53:15.8310896Z test_norm_dtype_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.150s) 2023-01-11T21:53:15.8311243Z test_norm_dtype_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.172s) 2023-01-11T21:53:15.8311544Z test_norm_dtype_cuda_float16 (__main__.TestLinalgCUDA) ... ok (0.072s) 2023-01-11T21:53:15.8311846Z test_norm_dtype_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.116s) 2023-01-11T21:53:15.8312142Z test_norm_dtype_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.070s) 2023-01-11T21:53:15.8312437Z test_norm_errors_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.144s) 2023-01-11T21:53:15.8312730Z test_norm_errors_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.144s) 2023-01-11T21:53:15.8313283Z test_norm_extreme_values_cuda (__main__.TestLinalgCUDA) ... /opt/conda/lib/python3.10/site-packages/numpy/linalg/linalg.py:2570: RuntimeWarning: divide by zero encountered in power 2023-01-11T21:53:15.8313637Z ret **= (1 / ord) 2023-01-11T21:53:15.8314029Z /opt/conda/lib/python3.10/site-packages/numpy/linalg/linalg.py:2568: RuntimeWarning: divide by zero encountered in reciprocal 2023-01-11T21:53:15.8314330Z absx **= ord 2023-01-11T21:53:15.8314722Z /opt/conda/lib/python3.10/site-packages/numpy/linalg/linalg.py:2568: RuntimeWarning: divide by zero encountered in power 2023-01-11T21:53:15.8315016Z absx **= ord 2023-01-11T21:53:15.8315187Z ok (0.052s) 2023-01-11T21:53:15.8315429Z test_norm_fastpaths_cuda (__main__.TestLinalgCUDA) ... ok (0.003s) 2023-01-11T21:53:15.8315747Z test_norm_fro_2_equivalence_old_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.126s) 2023-01-11T21:53:15.8316283Z test_norm_fused_type_promotion_cuda_bfloat16 (__main__.TestLinalgCUDA) ... 
STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:53:15.8316832Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:53:15.8317296Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:53:15.8317758Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:53:15.8318200Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:53:15.8318709Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:53:15.8318987Z ok (0.361s) 2023-01-11T21:53:15.8319455Z test_norm_fused_type_promotion_cuda_float16 (__main__.TestLinalgCUDA) ... STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:53:15.8319955Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:53:15.8320413Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:53:15.8320861Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:53:15.8321306Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:53:15.8321754Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:53:15.8322030Z ok (0.004s) 2023-01-11T21:53:15.8322281Z test_norm_matrix_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.117s) 2023-01-11T21:53:15.8322579Z test_norm_matrix_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.237s) 2023-01-11T21:53:15.8322914Z test_norm_matrix_degenerate_shapes_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.521s) 2023-01-11T21:53:15.8323268Z test_norm_matrix_degenerate_shapes_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.523s) 2023-01-11T21:53:15.8323632Z test_norm_matrix_degenerate_shapes_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.521s) 2023-01-11T21:53:15.8323967Z test_norm_matrix_degenerate_shapes_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.521s) 2023-01-11T21:53:15.8324331Z test_norm_old_cuda (__main__.TestLinalgCUDA) ... ok (0.040s) 2023-01-11T21:53:15.8324640Z test_norm_old_nan_propagation_cuda (__main__.TestLinalgCUDA) ... ok (0.008s) 2023-01-11T21:53:15.8324946Z test_norm_vector_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.099s) 2023-01-11T21:53:15.8325246Z test_norm_vector_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.641s) 2023-01-11T21:53:15.8325577Z test_norm_vector_degenerate_shapes_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.195s) 2023-01-11T21:53:15.8325930Z test_norm_vector_degenerate_shapes_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.194s) 2023-01-11T21:53:15.8326268Z test_norm_vector_degenerate_shapes_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.194s) 2023-01-11T21:53:15.8326619Z test_norm_vector_degenerate_shapes_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.194s) 2023-01-11T21:53:15.8326966Z test_nuclear_norm_axes_small_brute_force_old_cuda (__main__.TestLinalgCUDA) ... ok (0.075s) 2023-01-11T21:53:15.8327295Z test_nuclear_norm_exceptions_old_cuda (__main__.TestLinalgCUDA) ... 
ok (0.026s) 2023-01-11T21:53:15.8327618Z test_nuclear_norm_out_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8327930Z test_nuclear_norm_out_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.069s) 2023-01-11T21:53:15.8328256Z test_old_cholesky_batched_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.014s) 2023-01-11T21:53:15.8328588Z test_old_cholesky_batched_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8328922Z test_old_cholesky_batched_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8329246Z test_old_cholesky_batched_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.016s) 2023-01-11T21:53:15.8329673Z test_old_cholesky_batched_many_batches_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8330084Z test_old_cholesky_batched_upper_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8330437Z test_old_cholesky_batched_upper_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8330784Z test_old_cholesky_batched_upper_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8331116Z test_old_cholesky_batched_upper_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8331446Z test_old_cholesky_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.003s) 2023-01-11T21:53:15.8331768Z test_old_cholesky_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8332081Z test_old_cholesky_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.004s) 2023-01-11T21:53:15.8332380Z test_old_cholesky_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.003s) 2023-01-11T21:53:15.8332702Z test_old_cholesky_empty_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8333030Z test_old_cholesky_empty_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8333352Z test_old_cholesky_empty_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8333670Z test_old_cholesky_empty_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8333979Z test_ormqr_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.143s) 2023-01-11T21:53:15.8334285Z test_ormqr_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.150s) 2023-01-11T21:53:15.8334800Z test_ormqr_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.136s) 2023-01-11T21:53:15.8335158Z test_ormqr_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.129s) 2023-01-11T21:53:15.8335472Z test_ormqr_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.027s) 2023-01-11T21:53:15.8335809Z test_ormqr_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.026s) 2023-01-11T21:53:15.8336138Z test_ormqr_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.026s) 2023-01-11T21:53:15.8336523Z test_ormqr_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.026s) 2023-01-11T21:53:15.8337087Z test_outer_cuda_bfloat16 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:132: UserWarning: torch.ger is deprecated and will be removed in a future PyTorch release. Use torch.outer instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/LinearAlgebra.cpp:1242.) 2023-01-11T21:53:15.8337556Z torch.ger(a, b, out=out) 2023-01-11T21:53:15.8337747Z ok (0.006s) 2023-01-11T21:53:15.8337983Z test_outer_cuda_bool (__main__.TestLinalgCUDA) ... 
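
The UserWarning above is the torch.ger deprecation notice; the replacement it names, torch.outer, is a drop-in for the call the test exercises. A minimal sketch:

    import torch

    a = torch.randn(3)
    b = torch.randn(4)
    out = torch.empty(3, 4)

    # Deprecated spelling from the warning:
    # torch.ger(a, b, out=out)
    # Replacement -- same outer product, same out= support:
    torch.outer(a, b, out=out)
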
ok (0.005s) 2023-01-11T21:53:15.8338277Z test_outer_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8338566Z test_outer_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8338859Z test_outer_cuda_float16 (__main__.TestLinalgCUDA) ... ok (0.006s) 2023-01-11T21:53:15.8339147Z test_outer_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.006s) 2023-01-11T21:53:15.8339433Z test_outer_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.006s) 2023-01-11T21:53:15.8339713Z test_outer_cuda_int16 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8339991Z test_outer_cuda_int32 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8340354Z test_outer_cuda_int64 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8340636Z test_outer_cuda_int8 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8340914Z test_outer_cuda_uint8 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8341213Z test_outer_ger_addr_legacy_tests_cuda (__main__.TestLinalgCUDA) ... ok (0.022s) 2023-01-11T21:53:15.8341537Z test_outer_type_promotion_cuda_bfloat16_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8341923Z test_outer_type_promotion_cuda_bfloat16_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8342281Z test_outer_type_promotion_cuda_bfloat16_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8342627Z test_outer_type_promotion_cuda_bfloat16_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8342973Z test_outer_type_promotion_cuda_bfloat16_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8343317Z test_outer_type_promotion_cuda_bfloat16_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8343656Z test_outer_type_promotion_cuda_bfloat16_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8343986Z test_outer_type_promotion_cuda_bfloat16_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8344322Z test_outer_type_promotion_cuda_bfloat16_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8344659Z test_outer_type_promotion_cuda_bfloat16_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8344991Z test_outer_type_promotion_cuda_bfloat16_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8345318Z test_outer_type_promotion_cuda_bfloat16_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8345653Z test_outer_type_promotion_cuda_bool_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8345992Z test_outer_type_promotion_cuda_bool_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8346319Z test_outer_type_promotion_cuda_bool_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8346661Z test_outer_type_promotion_cuda_bool_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8346998Z test_outer_type_promotion_cuda_bool_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8347334Z test_outer_type_promotion_cuda_bool_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8347660Z test_outer_type_promotion_cuda_bool_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8347994Z test_outer_type_promotion_cuda_bool_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8348363Z test_outer_type_promotion_cuda_bool_int32 (__main__.TestLinalgCUDA) ... 
ok (0.001s) 2023-01-11T21:53:15.8348686Z test_outer_type_promotion_cuda_bool_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8349059Z test_outer_type_promotion_cuda_bool_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8349388Z test_outer_type_promotion_cuda_bool_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8349733Z test_outer_type_promotion_cuda_complex128_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8350069Z test_outer_type_promotion_cuda_complex128_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8350423Z test_outer_type_promotion_cuda_complex128_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8350782Z test_outer_type_promotion_cuda_complex128_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8351131Z test_outer_type_promotion_cuda_complex128_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8351469Z test_outer_type_promotion_cuda_complex128_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8351821Z test_outer_type_promotion_cuda_complex128_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8352165Z test_outer_type_promotion_cuda_complex128_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8352499Z test_outer_type_promotion_cuda_complex128_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8352838Z test_outer_type_promotion_cuda_complex128_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8353171Z test_outer_type_promotion_cuda_complex128_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8353540Z test_outer_type_promotion_cuda_complex128_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8353875Z test_outer_type_promotion_cuda_complex64_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8354218Z test_outer_type_promotion_cuda_complex64_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8354567Z test_outer_type_promotion_cuda_complex64_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8354923Z test_outer_type_promotion_cuda_complex64_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8355261Z test_outer_type_promotion_cuda_complex64_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8355604Z test_outer_type_promotion_cuda_complex64_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8355950Z test_outer_type_promotion_cuda_complex64_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8356280Z test_outer_type_promotion_cuda_complex64_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8356624Z test_outer_type_promotion_cuda_complex64_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8356963Z test_outer_type_promotion_cuda_complex64_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8357299Z test_outer_type_promotion_cuda_complex64_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8357627Z test_outer_type_promotion_cuda_complex64_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8357964Z test_outer_type_promotion_cuda_float16_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8358302Z test_outer_type_promotion_cuda_float16_bool (__main__.TestLinalgCUDA) ... 
ok (0.001s) 2023-01-11T21:53:15.8358640Z test_outer_type_promotion_cuda_float16_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8358976Z test_outer_type_promotion_cuda_float16_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8359319Z test_outer_type_promotion_cuda_float16_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8359656Z test_outer_type_promotion_cuda_float16_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8360008Z test_outer_type_promotion_cuda_float16_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8360346Z test_outer_type_promotion_cuda_float16_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8360678Z test_outer_type_promotion_cuda_float16_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8361011Z test_outer_type_promotion_cuda_float16_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8361331Z test_outer_type_promotion_cuda_float16_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8361672Z test_outer_type_promotion_cuda_float16_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8362004Z test_outer_type_promotion_cuda_float32_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8362334Z test_outer_type_promotion_cuda_float32_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8362677Z test_outer_type_promotion_cuda_float32_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8363027Z test_outer_type_promotion_cuda_float32_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8363371Z test_outer_type_promotion_cuda_float32_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8363700Z test_outer_type_promotion_cuda_float32_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8364033Z test_outer_type_promotion_cuda_float32_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8364366Z test_outer_type_promotion_cuda_float32_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8380909Z test_outer_type_promotion_cuda_float32_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8381384Z test_outer_type_promotion_cuda_float32_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8381717Z test_outer_type_promotion_cuda_float32_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8382064Z test_outer_type_promotion_cuda_float32_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8382404Z test_outer_type_promotion_cuda_float64_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8382735Z test_outer_type_promotion_cuda_float64_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8383092Z test_outer_type_promotion_cuda_float64_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8383436Z test_outer_type_promotion_cuda_float64_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8383786Z test_outer_type_promotion_cuda_float64_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8384119Z test_outer_type_promotion_cuda_float64_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8384462Z test_outer_type_promotion_cuda_float64_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8384794Z test_outer_type_promotion_cuda_float64_int16 (__main__.TestLinalgCUDA) ... 
ok (0.001s) 2023-01-11T21:53:15.8385128Z test_outer_type_promotion_cuda_float64_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8385458Z test_outer_type_promotion_cuda_float64_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8385787Z test_outer_type_promotion_cuda_float64_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8386124Z test_outer_type_promotion_cuda_float64_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8386449Z test_outer_type_promotion_cuda_int16_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8386785Z test_outer_type_promotion_cuda_int16_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8387127Z test_outer_type_promotion_cuda_int16_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8387467Z test_outer_type_promotion_cuda_int16_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8387838Z test_outer_type_promotion_cuda_int16_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8388176Z test_outer_type_promotion_cuda_int16_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8388599Z test_outer_type_promotion_cuda_int16_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8389010Z test_outer_type_promotion_cuda_int16_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8389399Z test_outer_type_promotion_cuda_int16_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8389730Z test_outer_type_promotion_cuda_int16_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8390060Z test_outer_type_promotion_cuda_int16_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8390387Z test_outer_type_promotion_cuda_int16_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8390720Z test_outer_type_promotion_cuda_int32_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8391057Z test_outer_type_promotion_cuda_int32_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8391384Z test_outer_type_promotion_cuda_int32_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8391730Z test_outer_type_promotion_cuda_int32_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8392067Z test_outer_type_promotion_cuda_int32_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8392397Z test_outer_type_promotion_cuda_int32_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8392719Z test_outer_type_promotion_cuda_int32_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8393049Z test_outer_type_promotion_cuda_int32_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8393428Z test_outer_type_promotion_cuda_int32_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8393762Z test_outer_type_promotion_cuda_int32_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8394081Z test_outer_type_promotion_cuda_int32_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8394416Z test_outer_type_promotion_cuda_int32_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8394746Z test_outer_type_promotion_cuda_int64_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8395078Z test_outer_type_promotion_cuda_int64_bool (__main__.TestLinalgCUDA) ... 
ok (0.001s) 2023-01-11T21:53:15.8395418Z test_outer_type_promotion_cuda_int64_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8395767Z test_outer_type_promotion_cuda_int64_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8396110Z test_outer_type_promotion_cuda_int64_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8396446Z test_outer_type_promotion_cuda_int64_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8396783Z test_outer_type_promotion_cuda_int64_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8397118Z test_outer_type_promotion_cuda_int64_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8397435Z test_outer_type_promotion_cuda_int64_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8397770Z test_outer_type_promotion_cuda_int64_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8398099Z test_outer_type_promotion_cuda_int64_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8398434Z test_outer_type_promotion_cuda_int64_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8398758Z test_outer_type_promotion_cuda_int8_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8399099Z test_outer_type_promotion_cuda_int8_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8399438Z test_outer_type_promotion_cuda_int8_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8399807Z test_outer_type_promotion_cuda_int8_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8400142Z test_outer_type_promotion_cuda_int8_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8400481Z test_outer_type_promotion_cuda_int8_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8400817Z test_outer_type_promotion_cuda_int8_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8401138Z test_outer_type_promotion_cuda_int8_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8401470Z test_outer_type_promotion_cuda_int8_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8401793Z test_outer_type_promotion_cuda_int8_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8402124Z test_outer_type_promotion_cuda_int8_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8402447Z test_outer_type_promotion_cuda_int8_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8402785Z test_outer_type_promotion_cuda_uint8_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8403117Z test_outer_type_promotion_cuda_uint8_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8403445Z test_outer_type_promotion_cuda_uint8_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8403794Z test_outer_type_promotion_cuda_uint8_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8404133Z test_outer_type_promotion_cuda_uint8_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8404470Z test_outer_type_promotion_cuda_uint8_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8404800Z test_outer_type_promotion_cuda_uint8_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8405163Z test_outer_type_promotion_cuda_uint8_int16 (__main__.TestLinalgCUDA) ... 
ok (0.001s) 2023-01-11T21:53:15.8405498Z test_outer_type_promotion_cuda_uint8_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8405836Z test_outer_type_promotion_cuda_uint8_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8406158Z test_outer_type_promotion_cuda_uint8_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8406482Z test_outer_type_promotion_cuda_uint8_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8406796Z test_pca_lowrank_cuda (__main__.TestLinalgCUDA) ... ok (19.177s) 2023-01-11T21:53:15.8407092Z test_permute_matmul_cuda (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8407402Z test_pinv_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.448s) 2023-01-11T21:53:15.8407708Z test_pinv_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.240s) 2023-01-11T21:53:15.8408000Z test_pinv_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.211s) 2023-01-11T21:53:15.8408298Z test_pinv_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.237s) 2023-01-11T21:53:15.8408627Z test_pinv_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.035s) 2023-01-11T21:53:15.8408985Z test_pinv_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.031s) 2023-01-11T21:53:15.8409316Z test_pinv_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.030s) 2023-01-11T21:53:15.8409651Z test_pinv_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.031s) 2023-01-11T21:53:15.8409975Z test_pinverse_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.043s) 2023-01-11T21:53:15.8410290Z test_pinverse_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.018s) 2023-01-11T21:53:15.8410594Z test_pinverse_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.016s) 2023-01-11T21:53:15.8410896Z test_pinverse_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.019s) 2023-01-11T21:53:15.8411220Z test_preferred_linalg_library_cuda (__main__.TestLinalgCUDA) ... ok (0.004s) 2023-01-11T21:53:15.8411524Z test_qr_batched_cuda_complex128 (__main__.TestLinalgCUDA) 2023-01-11T21:53:15.8411875Z test torch.linalg.qr vs numpy.linalg.qr. We need some special logic ... ok (0.004s) 2023-01-11T21:53:15.8412195Z test_qr_batched_cuda_complex64 (__main__.TestLinalgCUDA) 2023-01-11T21:53:15.8412507Z test torch.linalg.qr vs numpy.linalg.qr. We need some special logic ... ok (0.004s) 2023-01-11T21:53:15.8412819Z test_qr_batched_cuda_float32 (__main__.TestLinalgCUDA) 2023-01-11T21:53:15.8413130Z test torch.linalg.qr vs numpy.linalg.qr. We need some special logic ... ok (0.004s) 2023-01-11T21:53:15.8413440Z test_qr_batched_cuda_float64 (__main__.TestLinalgCUDA) 2023-01-11T21:53:15.8413742Z test torch.linalg.qr vs numpy.linalg.qr. We need some special logic ... ok (0.003s) 2023-01-11T21:53:15.8414225Z test_qr_cuda_complex128 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:3516: UserWarning: torch.qr is deprecated in favor of torch.linalg.qr and will be removed in a future PyTorch release. 2023-01-11T21:53:15.8415125Z The boolean parameter 'some' has been replaced with a string parameter 'mode'. 2023-01-11T21:53:15.8415401Z Q, R = torch.qr(A, some) 2023-01-11T21:53:15.8415598Z should be replaced with 2023-01-11T21:53:15.8416079Z Q, R = torch.linalg.qr(A, 'reduced' if some else 'complete') (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:2471.) 
2023-01-11T21:53:15.8416455Z torch.qr(A, some=some, out=(Q_out, R_out)) 2023-01-11T21:53:15.8416659Z ok (0.077s) 2023-01-11T21:53:15.8416897Z test_qr_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.079s) 2023-01-11T21:53:15.8417181Z test_qr_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.070s) 2023-01-11T21:53:15.8417453Z test_qr_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.069s) 2023-01-11T21:53:15.8417750Z test_qr_error_cases_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.009s) 2023-01-11T21:53:15.8418112Z test_qr_vs_numpy_cuda_complex128 (__main__.TestLinalgCUDA) 2023-01-11T21:53:15.8418395Z test torch.linalg.qr vs numpy.linalg.qr ... ok (0.007s) 2023-01-11T21:53:15.8418676Z test_qr_vs_numpy_cuda_complex64 (__main__.TestLinalgCUDA) 2023-01-11T21:53:15.8418958Z test torch.linalg.qr vs numpy.linalg.qr ... ok (0.007s) 2023-01-11T21:53:15.8419231Z test_qr_vs_numpy_cuda_float32 (__main__.TestLinalgCUDA) 2023-01-11T21:53:15.8419496Z test torch.linalg.qr vs numpy.linalg.qr ... ok (0.007s) 2023-01-11T21:53:15.8419774Z test_qr_vs_numpy_cuda_float64 (__main__.TestLinalgCUDA) 2023-01-11T21:53:15.8420124Z test torch.linalg.qr vs numpy.linalg.qr ... ok (0.007s) 2023-01-11T21:53:15.8420410Z test_renorm_cuda (__main__.TestLinalgCUDA) ... ok (0.004s) 2023-01-11T21:53:15.8420714Z test_renorm_ps_cuda (__main__.TestLinalgCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:53:15.8421037Z test_slogdet_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (1.151s) 2023-01-11T21:53:15.8421351Z test_slogdet_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (1.050s) 2023-01-11T21:53:15.8421645Z test_slogdet_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.121s) 2023-01-11T21:53:15.8421945Z test_slogdet_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.128s) 2023-01-11T21:53:15.8422261Z test_slogdet_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.025s) 2023-01-11T21:53:15.8422597Z test_slogdet_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.025s) 2023-01-11T21:53:15.8422954Z test_slogdet_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.025s) 2023-01-11T21:53:15.8423293Z test_slogdet_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.025s) 2023-01-11T21:53:15.8423635Z test_solve_batched_broadcasting_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8423970Z test_solve_batched_broadcasting_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.006s) 2023-01-11T21:53:15.8424318Z test_solve_batched_broadcasting_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8424663Z test_solve_batched_broadcasting_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.006s) 2023-01-11T21:53:15.8425027Z test_solve_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.179s) 2023-01-11T21:53:15.8425321Z test_solve_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.040s) 2023-01-11T21:53:15.8425611Z test_solve_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.035s) 2023-01-11T21:53:15.8425896Z test_solve_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.079s) 2023-01-11T21:53:15.8426184Z test_solve_removed_error_cuda (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8426495Z test_strided_mm_bmm_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8426800Z test_strided_mm_bmm_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8427089Z test_svd_cuda_complex128 (__main__.TestLinalgCUDA) ... 
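
The deprecation notice above spells out the torch.qr migration: the boolean `some` argument becomes the string `mode` argument of torch.linalg.qr. A minimal sketch of the mapping it describes:

    import torch

    A = torch.randn(5, 3)

    # Deprecated: Q, R = torch.qr(A, some=some)
    Q, R = torch.linalg.qr(A, mode='reduced')   # some=True (the old default)
    Q, R = torch.linalg.qr(A, mode='complete')  # some=False
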
ok (3.709s) 2023-01-11T21:53:15.8427382Z test_svd_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (2.658s) 2023-01-11T21:53:15.8427669Z test_svd_cuda_float32 (__main__.TestLinalgCUDA) ... ok (2.408s) 2023-01-11T21:53:15.8427952Z test_svd_cuda_float64 (__main__.TestLinalgCUDA) ... ok (2.644s) 2023-01-11T21:53:15.8429010Z test_svd_lowrank_cuda_float64 (__main__.TestLinalgCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/_lowrank.py:184: UserWarning: torch.linalg.svd: During SVD computation with the selected cusolver driver, batches 0 failed to converge. A more accurate method will be used to compute the SVD as a fallback. Check doc at https://pytorch.org/docs/stable/generated/torch.linalg.svd.html (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp:907.) 2023-01-11T21:53:15.8429709Z U, S, Vh = torch.linalg.svd(B_t, full_matrices=False) 2023-01-11T21:53:15.8429935Z ok (64.932s) 2023-01-11T21:53:15.8430199Z test_svd_memory_allocation_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.059s) 2023-01-11T21:53:15.8430572Z test_svd_memory_allocation_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8430906Z test_svd_memory_allocation_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8431236Z test_svd_memory_allocation_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.026s) 2023-01-11T21:53:15.8431719Z test_symeig_cuda_complex128 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:6990: UserWarning: torch.symeig is deprecated in favor of torch.linalg.eigh and will be removed in a future PyTorch release. 2023-01-11T21:53:15.8432248Z The default behavior has changed from using the upper triangular portion of the matrix by default to using the lower triangular portion. 2023-01-11T21:53:15.8432588Z L, _ = torch.symeig(A, upper=upper) 2023-01-11T21:53:15.8432813Z should be replaced with 2023-01-11T21:53:15.8433112Z L = torch.linalg.eigvalsh(A, UPLO='U' if upper else 'L') 2023-01-11T21:53:15.8433348Z and 2023-01-11T21:53:15.8433551Z L, V = torch.symeig(A, eigenvectors=True) 2023-01-11T21:53:15.8433781Z should be replaced with 2023-01-11T21:53:15.8434253Z L, V = torch.linalg.eigh(A, UPLO='U' if upper else 'L') (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:2928.) 2023-01-11T21:53:15.8434675Z torch.symeig(x, eigenvectors=eigenvectors, upper=upper, out=(oute, outv)) 2023-01-11T21:53:15.8435403Z /var/lib/jenkins/workspace/test/test_linalg.py:6990: UserWarning: An output with one or more elements was resized since it had shape [5, 5], which does not match the required output shape [0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8436109Z torch.symeig(x, eigenvectors=eigenvectors, upper=upper, out=(oute, outv)) 2023-01-11T21:53:15.8436863Z /var/lib/jenkins/workspace/test/test_linalg.py:6990: UserWarning: An output with one or more elements was resized since it had shape [3, 5, 5], which does not match the required output shape [0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. 
You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8437545Z torch.symeig(x, eigenvectors=eigenvectors, upper=upper, out=(oute, outv)) 2023-01-11T21:53:15.8438251Z /var/lib/jenkins/workspace/test/test_linalg.py:6990: UserWarning: An output with one or more elements was resized since it had shape [3, 5, 5, 5], which does not match the required output shape [0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8438943Z torch.symeig(x, eigenvectors=eigenvectors, upper=upper, out=(oute, outv)) 2023-01-11T21:53:15.8439645Z /var/lib/jenkins/workspace/test/test_linalg.py:6990: UserWarning: An output with one or more elements was resized since it had shape [5, 3, 5, 5, 5], which does not match the required output shape [0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8440314Z torch.symeig(x, eigenvectors=eigenvectors, upper=upper, out=(oute, outv)) 2023-01-11T21:53:15.8440627Z ok (1.392s) 2023-01-11T21:53:15.8440930Z test_symeig_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (1.392s) 2023-01-11T21:53:15.8441247Z test_symeig_cuda_float32 (__main__.TestLinalgCUDA) ... ok (1.561s) 2023-01-11T21:53:15.8441562Z test_symeig_cuda_float64 (__main__.TestLinalgCUDA) ... ok (1.377s) 2023-01-11T21:53:15.8441887Z test_symeig_out_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.024s) 2023-01-11T21:53:15.8442249Z test_symeig_out_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.024s) 2023-01-11T21:53:15.8442599Z test_symeig_out_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.023s) 2023-01-11T21:53:15.8442946Z test_symeig_out_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.023s) 2023-01-11T21:53:15.8443250Z test_tensordot_cuda (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8443552Z test_tensorinv_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.016s) 2023-01-11T21:53:15.8443870Z test_tensorinv_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8444173Z test_tensorinv_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.015s) 2023-01-11T21:53:15.8444476Z test_tensorinv_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.015s) 2023-01-11T21:53:15.8444799Z test_tensorinv_empty_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8445126Z test_tensorinv_empty_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8445441Z test_tensorinv_empty_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8445760Z test_tensorinv_empty_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8446100Z test_tensorinv_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.018s) 2023-01-11T21:53:15.8446446Z test_tensorinv_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... 
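
The block above is the torch.symeig deprecation notice plus a series of out= resize warnings (the preallocated out tensors have the wrong shape; resizing them to zero elements with t.resize_(0) beforehand would silence those, as the message says). A minimal sketch of the eigh/eigvalsh replacement the notice describes; note that torch.linalg.eigh defaults to the lower triangle (UPLO='L'), where symeig defaulted to the upper:

    import torch

    A = torch.randn(5, 5, dtype=torch.complex128)
    A = A + A.mH  # Hermitian input, as symeig assumed

    # Deprecated: L, V = torch.symeig(A, eigenvectors=True, upper=True)
    L = torch.linalg.eigvalsh(A, UPLO='U')   # eigenvalues only
    L, V = torch.linalg.eigh(A, UPLO='U')    # eigenvalues and eigenvectors
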
ok (0.017s) 2023-01-11T21:53:15.8446799Z test_tensorinv_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8447148Z test_tensorinv_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8447488Z test_tensorinv_singular_input_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.009s) 2023-01-11T21:53:15.8447863Z test_tensorinv_singular_input_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.009s) 2023-01-11T21:53:15.8448205Z test_tensorinv_singular_input_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.009s) 2023-01-11T21:53:15.8448543Z test_tensorinv_singular_input_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.009s) 2023-01-11T21:53:15.8448862Z test_tensorsolve_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8449203Z test_tensorsolve_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8449543Z test_tensorsolve_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8449849Z test_tensorsolve_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8450165Z test_tensorsolve_empty_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8450493Z test_tensorsolve_empty_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8450824Z test_tensorsolve_empty_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8451139Z test_tensorsolve_empty_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8451473Z test_tensorsolve_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8452017Z test_triangular_solve_batched_broadcasting_cuda_complex128 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:4212: UserWarning: torch.triangular_solve is deprecated in favor of torch.linalg.solve_triangularand will be removed in a future PyTorch release. 2023-01-11T21:53:15.8452571Z torch.linalg.solve_triangular has its arguments reversed and does not return a copy of one of the inputs. 2023-01-11T21:53:15.8452917Z X = torch.triangular_solve(B, A).solution 2023-01-11T21:53:15.8453140Z should be replaced with 2023-01-11T21:53:15.8453503Z X = torch.linalg.solve_triangular(A, B). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:2225.) 2023-01-11T21:53:15.8453943Z x = torch.triangular_solve(b, A, upper=upper, transpose=transpose, unitriangular=unitriangular)[0] 2023-01-11T21:53:15.8454218Z ok (0.031s) 2023-01-11T21:53:15.8454671Z test_triangular_solve_batched_broadcasting_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.031s) 2023-01-11T21:53:15.8455174Z test_triangular_solve_batched_broadcasting_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.029s) 2023-01-11T21:53:15.8455533Z test_triangular_solve_batched_broadcasting_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.028s) 2023-01-11T21:53:15.8455884Z test_triangular_solve_batched_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.021s) 2023-01-11T21:53:15.8456228Z test_triangular_solve_batched_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.021s) 2023-01-11T21:53:15.8456559Z test_triangular_solve_batched_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.020s) 2023-01-11T21:53:15.8456883Z test_triangular_solve_batched_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.019s) 2023-01-11T21:53:15.8457297Z test_triangular_solve_batched_many_batches_cuda_complex128 (__main__.TestLinalgCUDA) ... 
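
The warning at the top of this block documents the torch.triangular_solve migration: torch.linalg.solve_triangular takes its operands in the reverse order (matrix first) and returns the solution directly rather than a named tuple. The *_many_batches variants that follow are marked slow and skipped; setting PYTORCH_TEST_WITH_SLOW=1 in the environment enables them, as the skip message says. A minimal sketch of the migration:

    import torch

    A = torch.randn(3, 3).triu()  # upper-triangular coefficient matrix
    B = torch.randn(3, 2)

    # Deprecated: X = torch.triangular_solve(B, A, upper=True).solution
    X = torch.linalg.solve_triangular(A, B, upper=True)  # operand order reversed
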
skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8457777Z test_triangular_solve_batched_many_batches_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8458253Z test_triangular_solve_batched_many_batches_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8458707Z test_triangular_solve_batched_many_batches_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8459098Z test_triangular_solve_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8459497Z test_triangular_solve_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8459821Z test_triangular_solve_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.016s) 2023-01-11T21:53:15.8460185Z test_triangular_solve_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.016s) 2023-01-11T21:53:15.8460525Z test_triangular_solve_out_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.025s) 2023-01-11T21:53:15.8460895Z test_triangular_solve_out_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.025s) 2023-01-11T21:53:15.8461259Z test_triangular_solve_out_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.024s) 2023-01-11T21:53:15.8461608Z test_triangular_solve_out_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.025s) 2023-01-11T21:53:15.8461941Z test_vdot_invalid_args_cuda (__main__.TestLinalgCUDA) ... ok (0.033s) 2023-01-11T21:53:15.8462251Z test_vdot_vs_numpy_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.004s) 2023-01-11T21:53:15.8462554Z test_vdot_vs_numpy_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.003s) 2023-01-11T21:53:15.8462857Z test_vector_norm_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (1.339s) 2023-01-11T21:53:15.8463163Z test_vector_norm_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.845s) 2023-01-11T21:53:15.8463471Z test_vector_norm_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (1.342s) 2023-01-11T21:53:15.8463765Z test_vector_norm_cuda_float16 (__main__.TestLinalgCUDA) ... ok (1.338s) 2023-01-11T21:53:15.8464064Z test_vector_norm_cuda_float32 (__main__.TestLinalgCUDA) ... ok (1.339s) 2023-01-11T21:53:15.8464368Z test_vector_norm_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.844s) 2023-01-11T21:53:15.8464667Z test_vector_norm_dim_tuple_arg_cuda (__main__.TestLinalgCUDA) ... ok (0.038s) 2023-01-11T21:53:15.8465281Z test_vector_norm_extreme_values_cuda (__main__.TestLinalgCUDA) ... 
/opt/conda/lib/python3.10/site-packages/numpy/linalg/linalg.py:2570: RuntimeWarning: divide by zero encountered in power 2023-01-11T21:53:15.8465638Z ret **= (1 / ord) 2023-01-11T21:53:15.8466042Z /opt/conda/lib/python3.10/site-packages/numpy/linalg/linalg.py:2568: RuntimeWarning: divide by zero encountered in reciprocal 2023-01-11T21:53:15.8466336Z absx **= ord 2023-01-11T21:53:15.8466733Z /opt/conda/lib/python3.10/site-packages/numpy/linalg/linalg.py:2568: RuntimeWarning: divide by zero encountered in power 2023-01-11T21:53:15.8467030Z absx **= ord 2023-01-11T21:53:15.8467200Z ok (0.025s) 2023-01-11T21:53:15.8467309Z 2023-01-11T21:53:15.8467512Z ---------------------------------------------------------------------- 2023-01-11T21:53:15.8467772Z Ran 718 tests in 304.716s 2023-01-11T21:53:15.8467895Z 2023-01-11T21:53:15.8467967Z OK (skipped=47) 2023-01-11T21:53:15.8468087Z 2023-01-11T21:53:15.8468177Z Generating XML reports... 2023-01-11T21:53:15.8468576Z Generated XML report: test-reports/python-unittest/test_linalg/TEST-TestLinalgCUDA-20230111214810.xml 2023-01-11T21:53:15.8468807Z 2023-01-11T21:53:15.8469178Z ##[endgroup] 2023-01-11T21:53:15.8469595Z FINISHED PRINTING LOG FILE of test_linalg (/var/lib/jenkins/workspace/test/test-reports/test_linalg_lzuigkoh) 2023-01-11T21:53:15.8469830Z 2023-01-11T21:53:15.8470042Z Running test_multiprocessing_spawn ... [2023-01-11 21:53:15.804278] 2023-01-11T21:53:15.8470618Z Executing ['/opt/conda/bin/python', '-bb', 'test_multiprocessing_spawn.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:53:15.804502] 2023-01-11T21:53:35.8970148Z 2023-01-11T21:53:35.8970669Z Expand the folded group to see the log file of test_multiprocessing_spawn 2023-01-11T21:53:35.8971665Z ##[group]PRINTING LOG FILE of test_multiprocessing_spawn (/var/lib/jenkins/workspace/test/test-reports/test_multiprocessing_spawn_a6hjddgv) 2023-01-11T21:53:35.8976318Z 2023-01-11T21:53:35.8976965Z Running tests... 2023-01-11T21:53:35.8977596Z ---------------------------------------------------------------------- 2023-01-11T21:53:35.8978488Z Test results will be stored in test-reports/python-unittest/test_multiprocessing_spawn 2023-01-11T21:53:35.8979010Z test_errors_pickleable (__main__.ErrorTest) ... ok (1.085s) 2023-01-11T21:53:35.8979431Z test_exception_all (__main__.ForkTest) ... ok (0.057s) 2023-01-11T21:53:35.8979801Z test_exception_single (__main__.ForkTest) ... ok (0.113s) 2023-01-11T21:53:35.8980270Z test_first_argument_index (__main__.ForkTest) ... ok (0.050s) 2023-01-11T21:53:35.8980709Z test_success (__main__.ForkTest) ... ok (0.049s) 2023-01-11T21:53:35.8981175Z test_success_first_then_exception (__main__.ForkTest) ... ok (0.150s) 2023-01-11T21:53:35.8982886Z test_success_non_blocking (__main__.ForkTest) ... ok (0.050s) 2023-01-11T21:53:35.8983295Z test_terminate_exit (__main__.ForkTest) ... ok (0.060s) 2023-01-11T21:53:35.8983682Z test_terminate_signal (__main__.ForkTest) ... ok (0.739s) 2023-01-11T21:53:35.8984283Z test_exception_all (__main__.SpawnTest) ... 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8984727Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.8985178Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.8985601Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.8986045Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8986439Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.8987060Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.8987711Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.8987965Z ok (1.664s) 2023-01-11T21:53:35.8988472Z test_exception_raises (__main__.SpawnTest) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8988872Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.8989343Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.8989717Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.8989985Z ok (1.547s) 2023-01-11T21:53:35.8990429Z test_exception_single (__main__.SpawnTest) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8990863Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.8991302Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.8991702Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.8992146Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8992539Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.8992964Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.8993362Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.8993799Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8994191Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.8994661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.8995071Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.8995566Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8995956Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.8996384Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.8996787Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T21:53:35.8997024Z ok (3.138s) 2023-01-11T21:53:35.8997494Z test_first_argument_index (__main__.SpawnTest) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8997893Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.8998361Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.8998730Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.8999203Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8999547Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9000003Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9000369Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9000602Z ok (1.615s) 2023-01-11T21:53:35.9000865Z test_signal_raises (__main__.SpawnTest) ... ok (0.001s) 2023-01-11T21:53:35.9001354Z test_success (__main__.SpawnTest) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9001820Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9002260Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9002668Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9003101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9003487Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9003927Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9004329Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9004559Z ok (1.599s) 2023-01-11T21:53:35.9005046Z test_success_first_then_exception (__main__.SpawnTest) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9005466Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9005934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9006307Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9006790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9007138Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9007600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9007968Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9008207Z ok (1.693s) 2023-01-11T21:53:35.9008714Z test_success_non_blocking (__main__.SpawnTest) ... 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9009104Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9009630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9010002Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9010479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9010839Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9011311Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9011678Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9011943Z ok (1.603s) 2023-01-11T21:53:35.9012383Z test_terminate_exit (__main__.SpawnTest) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9012818Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9013244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9013649Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9014083Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9014441Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9015095Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9015448Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9015683Z ok (1.605s) 2023-01-11T21:53:35.9016270Z test_terminate_signal (__main__.SpawnTest) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9016645Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9017083Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9017433Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9017858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9018235Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9018683Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9019034Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9019262Z ok (1.515s) 2023-01-11T21:53:35.9019369Z 2023-01-11T21:53:35.9019572Z ---------------------------------------------------------------------- 2023-01-11T21:53:35.9019828Z Ran 19 tests in 18.338s 2023-01-11T21:53:35.9019947Z 2023-01-11T21:53:35.9020007Z OK 2023-01-11T21:53:35.9020172Z 2023-01-11T21:53:35.9020295Z Generating XML reports... 
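
The ForkTest/SpawnTest cases above exercise torch.multiprocessing's process-launching API: the worker always receives the process index as its first argument (test_first_argument_index), and join=False returns a context that can be polled instead of blocking (test_success_non_blocking). A minimal usage sketch, assuming a toy worker function:

    import torch.multiprocessing as mp

    def worker(i, msg):
        # `i`, the process index, is always prepended to args
        print(f"process {i}: {msg}")

    if __name__ == '__main__':
        # Blocking form: returns once all processes exit cleanly.
        mp.spawn(worker, args=('hello',), nprocs=2)

        # Non-blocking form: poll the returned context.
        ctx = mp.spawn(worker, args=('hello',), nprocs=2, join=False)
        while not ctx.join(timeout=1):
            pass  # ctx.join returns True once every process has exited

        # The ForkTest variants drive the same machinery via
        # mp.start_processes(..., start_method='fork').
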
2023-01-11T21:53:35.9023199Z Running test_ops ... [2023-01-11 21:53:35.896996]
2023-01-11T21:53:38.4215613Z Ignoring disabled issues: []
2023-01-11T21:53:38.4370306Z Executing ['/opt/conda/bin/python', '-bb', 'test_ops.py', '-v', '--use-pytest', '-vv', '-rfEX', '-x', '--reruns=2', '--shard-id=0', '--num-shards=2', '-k=not _linalg_cholesky_', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:53:38.436388]
2023-01-11T21:53:38.4744365Z Ignoring disabled issues: []
2023-01-11T21:53:38.4896607Z Executing ['/opt/conda/bin/python', '-bb', 'test_ops.py', '-v', '--use-pytest', '-vv', '-rfEX', '-x', '--reruns=2', '--shard-id=1', '--num-shards=2', '-k=not _linalg_cholesky_', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:53:38.489123]
2023-01-11T23:10:16.6583981Z
2023-01-11T23:10:16.6584774Z Expand the folded group to see the log file of test_ops
2023-01-11T23:10:16.6586080Z ##[group]PRINTING LOG FILE of test_ops (/var/lib/jenkins/workspace/test/test-reports/test_ops_xj2vdchj)
2023-01-11T23:10:16.6601333Z Test results will be stored in test-reports/python-pytest/test_ops/test_ops-8308d40cbcb1066e.xml
2023-01-11T23:10:16.6601794Z ============================= test session starts ==============================
2023-01-11T23:10:16.6602396Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python
2023-01-11T23:10:16.6604173Z cachedir: .pytest_cache
2023-01-11T23:10:16.6604976Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2023-01-11T23:10:16.6605596Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini
2023-01-11T23:10:16.6606392Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0
2023-01-11T23:10:16.6606965Z collecting ...
collected 30861 items / 25 deselected / 30836 selected 2023-01-11T23:10:16.8170649Z Running 15672 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rdiv___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rxor___cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_add_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_addcmul_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bincount_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_gather_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_geqrf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_gradient_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mH_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mT_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_softmin_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_native_dropout_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_area_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ops_nvprims_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_randint_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_resolve_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_hermite_polynomial_h_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_split_list_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_symeig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unique_consecutive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_T_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing___getitem___cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_acosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_angle_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_scatter_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_asinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atan_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_bfloat16_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_block_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cdouble_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_char_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_conj_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_contiguous_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_double_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_dstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_empty_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_exp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_hfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_hfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_flatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_float_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_hsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_imag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isreal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_long_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mH_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv2d_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv_transpose1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_randn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_ravel_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_repeat_interleave_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_roll_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sgn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_split_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sum_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_tanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_tril_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_triu_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unflatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unfold_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_zeros_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes___getitem___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___radd___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rand___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rmod___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rmul___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___ror___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rsub___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__native_batch_norm_legit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_T_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_chalf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_double_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_float_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_half_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_short_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_acos_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_acosh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_addr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_all_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_allclose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_arange_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_asinh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atan2_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_chunk_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_conj_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_conj_physical_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_constant_pad_nd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cos_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_dstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_erfc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expm1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fftshift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_floor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_frac_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_gt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_hsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_i0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_igamma_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_index_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_index_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isfinite_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isinf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isnan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isneginf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isreal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_le_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_lerp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_matrix_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_svd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_vector_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_not_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_mul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nan_to_num_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ne_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_neg_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_full_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_elu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_group_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hardtanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_leaky_relu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_pairwise_distance_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_relu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softplus_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ones_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ravel_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_repeat_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_roll_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_rsqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_signbit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sinh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_entr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i0e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_logit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_ndtri_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_spherical_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_squeeze_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_stack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_std_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sub_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_to_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_transpose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_triu_indices_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_trunc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unflatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unfold_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_view_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_where_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_acosh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addcmul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addmv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_all_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_aminmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_any_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_arange_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argsort_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_as_strided_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_as_strided_partial_views_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_as_strided_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_asinh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atleast_1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bernoulli_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bfloat16_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bool_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_broadcast_to_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_byte_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cartesian_prod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cdist_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ceil_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_chalf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_char_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cholesky_inverse_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clone_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_combinations_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_conj_physical_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cosh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_count_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cov_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cross_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cummax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cumprod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cumulative_trapezoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_digamma_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_dot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_equal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_erf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_float_power_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_frac_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_full_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_full_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_geqrf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_gradient_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_grid_sampler_2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_gt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_histogram_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_i0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_reduce_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_int_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isfinite_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isinf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isnan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isneginf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isreal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_binary_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_binary_return_by_ref_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_unary_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_kron_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_kthvalue_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ldexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lerp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_cross_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_det_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_eig_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigvalsh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lstsq_grad_oriented_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_factor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_power_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_rank_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_rank_hermitian_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_triangular_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_svd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_svdvals_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_vander_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_vecdot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_vector_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logaddexp2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logaddexp_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_logcumsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logdet_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logical_not_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_long_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_log_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_median_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_softmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_std_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_sum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_var_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_matrix_exp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_max_reduction_no_dim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_meshgrid_variadic_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_min_binary_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_min_reduction_no_dim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_msort_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_5_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nanmean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_native_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_native_dropout_backward_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_new_full_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_new_ones_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_new_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_binary_cross_entropy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_binary_cross_entropy_with_logits_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cosine_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cross_entropy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_ctc_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_dropout2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_dropout3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_embedding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_fractional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_gaussian_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_glu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardtanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_instance_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_area_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_bicubic_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_linear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_leaky_relu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool2d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool3d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_mish_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_multilabel_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_circular_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_reflect_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pairwise_distance_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_silu_complex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_smooth_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softplus_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_threshold_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_norm_inf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_normal_number_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ones_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ops_nvprims_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ops_nvprims_view_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_pca_lowrank_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_pinverse_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_4_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_qr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_quantile_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_rad2deg_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_rand_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_randint_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_real_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_reciprocal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_repeat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_resize__cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_resolve_conj_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_resolve_neg_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_roll_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_round_decimals_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scalar_tensor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_prod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_sum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_searchsorted_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_segment_reduce_lengths_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_select_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_short_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_bartlett_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_general_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_hann_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sinc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sinh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_slice_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sort_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_i0e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_i1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_laguerre_polynomial_l_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_legendre_polynomial_p_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_i1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_scaled_modified_bessel_k1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_v_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_split_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sqrt_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_square_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_std_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_std_mean_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sub_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_svd_lowrank_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_symeig_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_take_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tensor_split_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tensordot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_to_sparse_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_transpose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_triangular_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tril_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_triu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unfold_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_uniform_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unique_consecutive_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unsqueeze_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_view_as_real_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_where_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_zero__cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_zeros_like_cuda, test/test_ops.py::TestCommonCUDA::test_errors___radd___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rdiv___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rmul___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rpow___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rsub___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rxor___cuda, test/test_ops.py::TestCommonCUDA::test_errors_amax_cuda, test/test_ops.py::TestCommonCUDA::test_errors_amin_cuda, test/test_ops.py::TestCommonCUDA::test_errors_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bernoulli_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_errors_complex_cuda, test/test_ops.py::TestCommonCUDA::test_errors_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_errors_cov_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diag_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_errors_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_errors_dstack_cuda, test/test_ops.py::TestCommonCUDA::test_errors_eq_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ihfftn_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_errors_gather_cuda, test/test_ops.py::TestCommonCUDA::test_errors_gradient_cuda, test/test_ops.py::TestCommonCUDA::test_errors_hsplit_cuda, test/test_ops.py::TestCommonCUDA::test_errors_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_errors_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_errors_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_errors_jiterator_binary_cuda, test/test_ops.py::TestCommonCUDA::test_errors_ldexp_cuda, test/test_ops.py::TestCommonCUDA::test_errors_le_cuda, test/test_ops.py::TestCommonCUDA::test_errors_linalg_cross_cuda, test/test_ops.py::TestCommonCUDA::test_errors_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logcumsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_errors_masked_select_cuda, test/test_ops.py::TestCommonCUDA::test_errors_max_binary_cuda, test/test_ops.py::TestCommonCUDA::test_errors_mean_cuda, test/test_ops.py::TestCommonCUDA::test_errors_median_cuda, test/test_ops.py::TestCommonCUDA::test_errors_min_binary_cuda, test/test_ops.py::TestCommonCUDA::test_errors_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_errors_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_errors_mul_cuda, test/test_ops.py::TestCommonCUDA::test_errors_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_errors_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_errors_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_errors_ne_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_embedding_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_group_norm_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_triplet_margin_with_distance_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_ormqr_cuda, test/test_ops.py::TestCommonCUDA::test_errors_pow_cuda, test/test_ops.py::TestCommonCUDA::test_errors_renorm_cuda, test/test_ops.py::TestCommonCUDA::test_errors_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_errors_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_errors_roll_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_errors_scatter_add_cuda, test/test_ops.py::TestCommonCUDA::test_errors_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_gaussian_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_general_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_kaiser_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_nuttall_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_hermite_polynomial_h_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_legendre_polynomial_p_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sub_cuda, test/test_ops.py::TestCommonCUDA::test_errors_t_cuda, test/test_ops.py::TestCommonCUDA::test_errors_tril_cuda, test/test_ops.py::TestCommonCUDA::test_errors_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_errors_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_errors_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_errors_view_copy_cuda, test/test_ops.py::TestCommonCUDA::test_errors_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_multiple_devices_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___radd___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rdiv___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rdiv___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmul___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__softmax_backward_data_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_abs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amin_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_aminmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_angle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argwhere_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_partial_views_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_block_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bool_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bucketize_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_max_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_combinations_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cummax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_deg2rad_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_trunc_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_double_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_eq_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfinv_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_exp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gradient_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_half_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_half_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_histc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_4inputs_with_extra_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_return_by_ref_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_unary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_kthvalue_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_det_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_norm_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vander_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mH_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_mean_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_scatter_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_var_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_no_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nan_to_num_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanmedian_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nansum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional__scaled_dot_product_attention_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_mish_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_silu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softmin_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softsign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nonzero_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ormqr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_pinverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_1_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_4_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_put_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rand_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_randint_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_real_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_interleave_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resize__cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resize_as__cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resize_as__cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_round_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scalar_tensor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_prod_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_select_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_hann_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_slice_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_slice_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_airy_ai_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_airy_ai_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_h_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_he_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtr_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_spherical_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_list_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_symeig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_take_along_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_take_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tile_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_transpose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapz_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tril_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trunc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_consecutive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_consecutive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_xlogy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___getitem___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rand___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rmul___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rxor___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_abs_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_aminmax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_angle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_argwhere_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_as_strided_partial_views_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_as_strided_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atan2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atleast_2d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bfloat16_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_xor_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_block_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_broadcast_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_byte_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cdouble_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_clamp_min_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_clone_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_column_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_combinations_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_conj_physical_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_constant_pad_nd_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_contiguous_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_copysign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_count_nonzero_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_deg2rad_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagonal_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diff_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_digamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_dsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_empty_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_eq_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_equal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_exp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expand_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expm1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_rfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_flip_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fliplr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fmin_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_full_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_gather_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_gt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_heaviside_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isclose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isfinite_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isnan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_binary_return_by_ref_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_unary_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_kron_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logsumexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_long_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_lt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_mean_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_max_binary_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_max_reduction_no_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_binary_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_reduction_no_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_reduction_with_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nan_to_num_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nansum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ne_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_cosine_embedding_loss_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pad_constant_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pixel_unshuffle_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nonzero_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ones_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_outer_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_permute_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_put_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ravel_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_repeat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_repeat_interleave_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resize__cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resolve_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resolve_neg_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_roll_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_rot90_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_rsqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_select_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sgn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_slice_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_y0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_y1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_v_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_hermite_polynomial_h_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_hermite_polynomial_he_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i0e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_laguerre_polynomial_l_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_legendre_polynomial_p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_i0_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_k1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_ndtri_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_polygamma_special_polygamma_n_0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_scaled_modified_bessel_k0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tensor_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_triu_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unbind_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_vsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zero__cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zeros_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___getitem___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___getitem___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmod___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___ror___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rpow___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rpow___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rxor___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acos_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addbmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_decomposed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_partial_views_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_baddbmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bincount_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cartesian_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_inverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_combinations_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_contiguous_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_count_nonzero_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diff_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dist_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_trunc_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_einsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_equal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eye_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eye_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft2_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flipud_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ge_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gradient_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gradient_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_histc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_imag_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_inner_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_return_by_ref_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_return_by_ref_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_unary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_unary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ldexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lerp_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cond_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_singular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvalsh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_householder_product_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_multi_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_qr_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_triangular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorsolve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vecdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_unpack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_mean_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_normalize_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matrix_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matrix_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_no_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_with_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mean_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_median_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_list_of_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_no_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_msort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanmedian_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nansum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_dropout_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_strided_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout3d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_poisson_nll_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_silu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_fro_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ormqr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_4_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_put_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_put_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rand_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_interleave_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_interleave_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_conj_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_searchsorted_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sgn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_short_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_hann_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_kaiser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signbit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_airy_ai_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_airy_ai_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_v_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_he_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_laguerre_polynomial_l_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtri_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_spherical_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_symeig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_symeig_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensordot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapezoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triangular_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trunc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unflatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_uniform_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_addbmm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_allclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_allclose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_aminmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_clone_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diagflat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diagflat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diagflat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorinv_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorinv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vander_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vander_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_native_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose1d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose2d_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pdist_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_permute_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_permute_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_ravel_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_searchsorted_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_searchsorted_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_blackman_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_cosine_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_exponential_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_cosine_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_hamming_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_hamming_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tile_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tile_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_view_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___rand___cuda_int64, test/test_ops.py::TestCommonCUDA::test_out___rdiv___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___ror___cuda_int64, test/test_ops.py::TestCommonCUDA::test_out___rxor___cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_complex_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifftshift_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_elu_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_multigammaln_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_tan_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_argwhere_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cross_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diagflat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diff_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_dist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_geqrf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_int_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_jiterator_unary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_det_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_scatter_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_masked_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardshrink_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_silu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ops_nvprims_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ops_nvprims_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ops_nvprims_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randint_like_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_repeat_interleave_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_resize__cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_resize_as__cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_searchsorted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_hann_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_airy_ai_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_w_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_topk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unique_consecutive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unique_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning___rand___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rdiv___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rmatmul___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___ror___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__native_batch_norm_legit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_T_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_bfloat16_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_byte_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_chalf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_double_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_float_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_int_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_long_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_short_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_abs_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_addr_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_all_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_allclose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_any_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_arange_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_as_strided_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_to_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_cat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_column_stack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_conj_physical_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_constant_pad_nd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_dstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_empty_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_eq_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_erf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_exp2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_expand_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_eye_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fftshift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_flip_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ge_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_gt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_hsplit_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_igamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isfinite_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isinf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isneginf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isreal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_le_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_matrix_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_lt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_meshgrid_list_of_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_meshgrid_variadic_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_neg_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_ones_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_elu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_glu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hardshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_leaky_relu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_relu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_threshold_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ones_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_positive_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_prod_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_randn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_real_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_repeat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_round_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_rsqrt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sgn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_signbit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_entr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_i1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_logit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_multigammaln_mvlgamma_p_5_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_ndtri_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sqrt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_square_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_stack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_std_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sub_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_t_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tensor_split_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_trace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_transpose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tril_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_triu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_triu_indices_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unfold_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_where_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__softmax_backward_data_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_acos_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_acosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addmm_decomposed_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addmv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_allclose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_aminmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_argmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_argsort_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_as_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_as_strided_partial_views_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_asin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_asinh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atleast_1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_baddbmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bfloat16_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_broadcast_to_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cartesian_prod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_chalf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_char_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cholesky_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cholesky_inverse_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cholesky_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_clamp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_column_stack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_combinations_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_conj_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cos_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cross_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cummax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cumulative_trapezoid_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_deg2rad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_digamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_dist_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_dot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_dstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_einsum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_erfc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_exp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_expand_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_expm1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_full_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gather_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ge_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_geqrf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_grid_sampler_2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_histogramdd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_igamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_imag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_inner_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_int_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isfinite_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_istft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_jiterator_2inputs_2outputs_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_jiterator_unary_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_kron_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_kthvalue_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ldexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lerp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_cond_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_det_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_det_singular_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_eigh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_eigvals_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_eigvalsh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_inv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_factor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lstsq_grad_oriented_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_rank_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_hermitian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_singular_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_svd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_svdvals_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_tensorinv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_vander_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log1p_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logcumsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logdet_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_long_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_std_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_sum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_matrix_exp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_max_pool2d_with_indices_backward_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_max_reduction_no_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_max_reduction_with_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_maximum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_min_binary_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_min_reduction_no_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_multinomial_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mvlgamma_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nanmedian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nanquantile_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nansum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_native_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ne_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_new_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_new_full_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_new_ones_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional__scaled_dot_product_attention_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_max_pool2d_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv_transpose2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv_transpose3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_cosine_similarity_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_elu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_embedding_bag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_feature_alpha_dropout_with_train_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_fractional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_glu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_grid_sample_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardswish_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_nearest_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_linear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_local_response_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_logsigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool1d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool2d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_one_hot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pairwise_distance_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pixel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_rrelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_selu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_silu_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_silu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_smooth_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_soft_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softplus_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_threshold_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_upsample_nearest_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_norm_fro_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_norm_nuc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_normal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_normal_number_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ones_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ones_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_outer_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_permute_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_3_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_4_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_positive_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_prod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rad2deg_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rand_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_randint_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_randint_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_randn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_reciprocal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_renorm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_repeat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_resize_as__cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_resolve_conj_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_roll_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_round_decimals_3_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_round_decimals_neg_3_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rsqrt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scalar_tensor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_searchsorted_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_segment_reduce_offsets_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sgn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_short_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_blackman_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_gaussian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_general_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_hamming_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_signbit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sinh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_slice_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_y0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_y1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_hermite_polynomial_h_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_i0e_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_i1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_legendre_polynomial_p_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_i0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_i1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_k1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_ndtri_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_polygamma_special_polygamma_n_0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_scaled_modified_bessel_k1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_spherical_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sqrt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_squeeze_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_stack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_std_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_std_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_stft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_svd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_symeig_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_take_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tensordot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tile_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_topk_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_trace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_transpose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_trapezoid_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_triangular_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tril_indices_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_triu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_triu_indices_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_trunc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unflatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unfold_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unfold_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_uniform_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unique_consecutive_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unique_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_var_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_vdot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_as_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_where_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_out_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_pointwise_tag_coverage_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pdist_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_add_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_arange_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_dstack_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_eq_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_hfft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_ge_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_gt_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_igamma_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_lcm_cuda, 
test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_lt_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_mean_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_mul_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_neg_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sub_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_triu_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_shapes_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_imag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_imag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_native_batch_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_native_batch_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_var_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_native_batch_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_native_batch_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_native_batch_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_var_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nextafter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_native_batch_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_native_batch_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_var_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___getitem___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___radd___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rdiv___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rsub___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addbmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcdiv_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_decomposed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_allclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_angle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_char_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_column_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_physical_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_corrcoef_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagflat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diff_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dist_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_double_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dsplit_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_equal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_geqrf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gradient_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isnan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_2inputs_2outputs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_return_by_ref_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kron_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ldexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigh_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_householder_product_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_multi_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_qr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorsolve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vander_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vecdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log1p_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logdet_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_and_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mH_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mH_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mT_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_normalize_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_prod_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matrix_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool3d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool1d_grad_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softsign_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_nuc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_nuc_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ormqr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ormqr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_permute_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_put_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rand_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reciprocal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_interleave_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_interleave_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize__cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize_as__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_roll_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rot90_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scalar_tensor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_y1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i0e_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_list_args_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_to_size_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_symeig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensor_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensordot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tile_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_sparse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_topk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_H_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_T_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___radd___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rmatmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rsub___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward__softmax_backward_data_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_abs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_acos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_acosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addcdiv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addmm_decomposed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addmv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_angle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_asin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_baddbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_bernoulli_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_bmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_broadcast_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_broadcast_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cartesian_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_ceil_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cfloat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_combinations_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_conj_physical_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_contiguous_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cummin_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_deg2rad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diagflat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_double_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_erfc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_erfinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_exp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ihfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ihfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_irfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_rfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_rfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_flipud_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_float_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_gather_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_half_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_hypot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_reduce_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_inner_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_kron_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_kthvalue_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lerp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_det_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_det_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eigh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_householder_product_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_pinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_vecdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log10_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log1p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logcumsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mH_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mT_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_var_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_msort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nanquantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nansum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_celu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_elu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_feature_alpha_dropout_without_train_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_glu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_mish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_relu_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_selu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_normal_number_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_outer_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_pca_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polar_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_quantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_rad2deg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_real_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_repeat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_reshape_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_reshape_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sgn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sin_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sinc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_slice_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_erfcx_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_i0e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_i1e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_log_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_xlog1py_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_squeeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_take_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tensor_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_topk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_true_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_trunc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unflatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unsqueeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_view_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_view_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_view_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_zero__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___getitem___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rdiv___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rmod___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rpow___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__softmax_backward_data_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_abs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_acos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addcdiv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_allclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_any_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_arange_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argsort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argwhere_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_asin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atan2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bernoulli_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cartesian_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cfloat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clone_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_conj_physical_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_deg2rad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diag_embed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagonal_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diff_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_digamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_div_floor_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_double_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_empty_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_erfc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expand_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expm1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_hfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_hfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ihfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_irfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_rfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_rfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flip_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_float_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_float_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_floor_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_frac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_frexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gather_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_geqrf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gradient_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_histc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_int_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isclose_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isfinite_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isinf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isnan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isposinf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_unary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_kthvalue_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lgamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_det_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_householder_product_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_vander_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log10_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logaddexp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logcumsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logdet_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_and_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_not_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_xor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lu_unpack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_maximum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_msort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nan_to_num_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nanmedian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_narrow_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ne_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_empty_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool1d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_celu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_glu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_logsigmoid_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_mish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_selu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_fro_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_nuc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_normal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_permute_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randint_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randint_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randn_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randn_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_reciprocal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_repeat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_reshape_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resize__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resize_as__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rsqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_short_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signbit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sinc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_airy_ai_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_y1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_erfcx_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i0e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i1e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_log_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_ndtr_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_xlog1py_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_split_with_sizes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_svd_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_symeig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_take_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tensor_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_to_sparse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_triangular_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_true_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unbind_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unfold_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_as_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_zero__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_H_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_T_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___radd___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rdiv___cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rpow___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rsub___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__softmax_backward_data_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_abs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_acos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addcmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addmm_decomposed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addmv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_allclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_angle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_argsort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_argwhere_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_asin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atleast_1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atleast_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_baddbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_bfloat16_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_block_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_bmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_bool_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_shapes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_bucketize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_byte_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cartesian_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cdouble_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_char_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_max_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_combinations_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_constant_pad_nd_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_copysign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_count_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cov_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_deg2rad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagflat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diff_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_digamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_dist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_einsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_erf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_expand_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_expm1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ihfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_irfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_irfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_irfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_flip_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fliplr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_float_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_float_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_floor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_floor_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_frac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_gather_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_geqrf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_gt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_half_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_histc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isnan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isneginf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ldexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_le_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_householder_product_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_inv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lstsq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_qr_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_svdvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_vander_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log1p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logcumsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_and_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_or_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_long_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lu_unpack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mT_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_maximum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mm_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_msort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nanmedian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nanquantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nansum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_narrow_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_narrow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_native_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ne_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_zeros_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nextafter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_celu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_elu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_glu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardtanh_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_mish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_selu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_normal_number_mean_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_outer_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polar_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_pow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rand_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_randint_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_randn_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_remainder_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_repeat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_resize__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_resize_as__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_resolve_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rot90_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rsqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scalar_tensor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sgn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_short_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_hann_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sinc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_airy_ai_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_erfcx_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_i0e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_i1e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_ndtri_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_xlog1py_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_symeig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_take_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tensor_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_topk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_triangular_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_true_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_trunc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unbind_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unflatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unfold_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unique_consecutive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unique_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unsqueeze_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_vdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_view_as_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_view_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_view_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_where_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zeros_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zeros_like_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view_H_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___rsub___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_T_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_bfloat16_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_cdouble_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_chalf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_char_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_double_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_float_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_half_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_long_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_short_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_abs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_acosh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_allclose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_as_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atanh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atleast_2d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atleast_3d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_chunk_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_clone_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_column_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_conj_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_conj_physical_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_constant_pad_nd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cos_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cosh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cumsum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_dsplit_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_dstack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_expand_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_float_power_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_hsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_imag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isreal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_lerp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_svdvals_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log10_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log1p_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_and_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_or_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_masked_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_movedim_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_neg_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_empty_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_zeros_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_triplet_margin_loss_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ones_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_positive_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ravel_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_real_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reciprocal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_repeat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reshape_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reshape_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_roll_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rot90_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rsqrt_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sgn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sin_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sinc_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sinh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_special_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_t_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_trace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_transpose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_true_divide_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unbind_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unfold_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_var_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_view_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_vsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_zeros_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_abs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_acos_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addbmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addcdiv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addmv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_allclose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_any_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_asin_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atanh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_1d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_2d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_3d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_bfloat16_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_bmm_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_bool_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_broadcast_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_byte_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cfloat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_chalf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_char_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cholesky_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_chunk_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_conj_physical_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_constant_pad_nd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cosh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_count_nonzero_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cov_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cumsum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cumulative_trapezoid_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diff_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_double_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dstack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_einsum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_empty_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_empty_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_equal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_expand_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_eye_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fftshift_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_irfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_irfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_flatten_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fliplr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_float_power_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_full_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_half_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_hsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_imag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_index_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_index_put_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_isinf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_isnan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_isreal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_istft_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_2inputs_2outputs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_binary_return_by_ref_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_unary_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_kron_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_lerp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cond_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eig_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigvals_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_inv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_ldl_factor_ex_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_ldl_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lstsq_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lstsq_grad_oriented_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_power_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_rank_hermitian_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_multi_dot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_ex_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_triangular_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_svd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_tensorinv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_tensorsolve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_vander_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_vecdot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log1p_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logdet_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_and_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_not_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_or_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_xor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_long_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_lu_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_lu_unpack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mH_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_fill_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_normalize_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_select_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_matmul_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mul_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ne_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_new_empty_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_new_ones_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose1d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_circular_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_reflect_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nonzero_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_norm_inf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ones_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ormqr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_pinverse_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_prod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_put_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_rand_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_randn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_randn_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ravel_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_renorm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_resize_as__cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_resolve_conj_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_roll_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_rsub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sigmoid_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sin_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sinh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sparse_sampled_addmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sqrt_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_squeeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_std_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_std_mean_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_std_mean_unbiased_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sum_to_size_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_take_along_dim_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_take_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tensor_split_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tile_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_trace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_trapz_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_triangular_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tril_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_triu_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_true_divide_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unfold_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_var_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_var_mean_unbiased_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_view_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_view_as_real_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_zeros_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_H_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_T_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___getitem___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___radd___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rsub___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_T_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_bfloat16_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_bool_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_byte_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_chalf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_float_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_long_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_short_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_abs_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_acos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_addcdiv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_addr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_all_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_asinh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_1d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_2d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_3d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_chunk_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_clone_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_column_stack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_conj_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_contiguous_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diag_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diag_embed_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_eq_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_exp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expand_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expand_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_eye_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_flatten_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_flip_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fliplr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_flipud_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_float_power_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_hstack_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_imag_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isclose_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isinf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isnan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linspace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log10_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log1p_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_or_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_narrow_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_ne_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_empty_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_full_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_zeros_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_pow_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_ravel_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_real_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reciprocal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reshape_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reshape_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_roll_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rsub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sgn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sigmoid_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sin_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sinc_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_squeeze_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_std_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sum_to_size_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tan_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_trace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_transpose_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unbind_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unflatten_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unfold_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unfold_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unsqueeze_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_var_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_var_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_view_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_view_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_abs_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_acos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_acosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addcdiv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addcmul_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmm_decomposed_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_angle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_argwhere_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_as_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_as_strided_partial_views_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_as_strided_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_1d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_baddbmm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bmm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_broadcast_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_byte_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cfloat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_chunk_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_conj_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_conj_physical_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_contiguous_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_corrcoef_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_count_nonzero_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cross_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumsum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumulative_trapezoid_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dist_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_div_no_rounding_mode_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_einsum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_equal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_expand_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_hfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifftshift_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_irfft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flatten_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flip_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flipud_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_float_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_float_power_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_full_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_full_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_gradient_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_half_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_imag_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_put_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_inner_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_int_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isclose_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isinf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isreal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_binary_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_unary_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_kron_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ldexp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cond_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_det_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigvalsh_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_householder_product_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_factor_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_factor_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_rank_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_hermitian_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_singular_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_slogdet_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_ex_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_tensorinv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_tensorsolve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vander_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linspace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log1p_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_and_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logspace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_long_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mH_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mT_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_cumprod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_normalize_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_prod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_std_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_sum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_var_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_matrix_exp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_narrow_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_empty_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_zeros_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv2d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose2d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_linear_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_normalize_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_replicate_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_unfold_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_inf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_nuc_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ones_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ones_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_permute_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_pinverse_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_prod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_put_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rand_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_randn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_randn_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ravel_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_real_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reciprocal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_repeat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize__cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resolve_conj_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resolve_neg_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rot90_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rsub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_scalar_tensor_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_scatter_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sin_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sinc_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sinh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_slice_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_softmax_with_dtype_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sparse_sampled_addmm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_list_args_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_with_sizes_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_stack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_unbiased_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_stft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sum_to_size_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_symeig_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_t_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_to_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_to_sparse_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_trace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_transpose_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unbind_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unflatten_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_mean_unbiased_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_zeros_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_zeros_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view___getitem___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rmod___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__native_batch_norm_legit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bool_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_complex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_double_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_float_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_long_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_abs_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_acos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_acosh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_add_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_addcdiv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_addcmul_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_amin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_any_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_arange_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atleast_1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atleast_2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_broadcast_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_bucketize_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_chunk_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clamp_min_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clone_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_constant_pad_nd_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_contiguous_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_copysign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cosh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diag_embed_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_empty_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_eq_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_exp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_exp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_expand_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_expm1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_eye_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fftshift_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifftshift_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_rfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_rfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flip_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flipud_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_gt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_heaviside_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hypot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_i0_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_igamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_igammac_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_add_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_select_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isclose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isfinite_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isposinf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isreal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_le_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lgamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log10_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_or_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_masked_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_minimum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_movedim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_mul_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_narrow_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_native_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_neg_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_empty_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_full_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_gelu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_mish_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softplus_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_permute_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_positive_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_pow_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_randn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ravel_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_reciprocal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_remainder_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_repeat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_reshape_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_reshape_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_round_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sgn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sigmoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_signbit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sinc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_i0e_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_i1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_i1e_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_log_ndtr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_logit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_multigammaln_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_multigammaln_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_ndtri_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_squeeze_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tensor_split_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_to_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_trace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tril_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_triu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_true_divide_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unbind_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unflatten_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unfold_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unfold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unsqueeze_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_var_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_vsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_vstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_where_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_xlogy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_acos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_acosh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_add_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addcmul_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addmm_decomposed_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addmv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_all_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_allclose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_amin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_angle_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_arange_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_argmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_argmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_argsort_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_partial_views_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_asin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bfloat16_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_broadcast_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cartesian_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ceil_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_char_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cholesky_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_chunk_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_clamp_max_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_clone_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_column_stack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_combinations_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_complex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_copysign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_corrcoef_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_count_nonzero_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cov_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cross_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_cummin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cumprod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cumsum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cumulative_trapezoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_deg2rad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diag_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_digamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_div_floor_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_dstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_einsum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_exp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_exp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_expand_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_eye_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fftshift_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_hfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_hfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifftshift_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_flatten_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_flip_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fliplr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_flipud_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_float_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_frac_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_frexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_full_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ge_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_grid_sampler_2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_half_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_heaviside_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_hsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_hstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_hypot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_i0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_igamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_fill_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_put_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isfinite_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_2inputs_2outputs_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_binary_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_binary_return_by_ref_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_unary_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_kthvalue_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ldexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_le_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lerp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lgamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eig_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eigvals_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eigvalsh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_inv_ex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_factor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_factor_ex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_factor_ex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_rank_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_pinv_hermitian_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_slogdet_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_tensorinv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vander_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vecdot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log1p_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logcumsumexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logdet_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_and_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_not_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_xor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lu_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mH_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_mT_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_logaddexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_logsumexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_normalize_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_select_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_softmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_softmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_std_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_matmul_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_pool2d_with_indices_backward_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_reduction_no_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_reduction_with_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_median_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_min_reduction_no_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_min_reduction_with_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_minimum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mode_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_movedim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mul_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_multinomial_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nanmean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_narrow_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_native_batch_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_native_dropout_backward_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_native_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ne_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_full_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_alpha_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_avg_pool2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_binary_cross_entropy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv1d_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_cosine_embedding_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_cosine_similarity_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_cross_entropy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_elu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_embedding_bag_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_embedding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_grid_sample_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardsigmoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_instance_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_area_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_nearest_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_linear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_local_response_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_logsigmoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool1d_grad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool2d_grad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_mish_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_circular_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_constant_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_reflect_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pixel_shuffle_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_prelu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_relu6_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_rrelu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_soft_margin_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softmin_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softplus_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_threshold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_upsample_bilinear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nonzero_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_nuc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_normal_number_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ones_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ops_nvprims_native_batch_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ops_nvprims_var_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ormqr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_pca_lowrank_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_permute_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polar_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_positive_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_pow_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_qr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_quantile_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rad2deg_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rand_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_randint_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_reshape_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_resize__cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_resolve_conj_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_round_decimals_0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_round_decimals_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rsub_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_amax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_sum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_segment_reduce_offsets_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_select_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sgn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_short_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sigmoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_bartlett_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_gaussian_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_general_cosine_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_hann_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_kaiser_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_nuttall_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sinc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sinh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_slice_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_slice_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_j1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_u_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_v_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_erfcx_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_hermite_polynomial_he_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_i1e_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_laguerre_polynomial_l_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_i0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_i1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_k0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_k1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_ndtr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_ndtri_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_xlog1py_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_zeta_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_split_list_args_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_stack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_unbiased_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sub_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_take_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tensordot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tile_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_topk_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_trace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_trapezoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tril_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_true_divide_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unbind_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unfold_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unique_consecutive_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unsqueeze_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_vsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_vstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_zero__cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_zeros_like_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___rand___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___rmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___ror___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake___rsub___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_aminmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_asinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atleast_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atleast_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rand___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rdiv___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rmod___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rpow___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_abs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addcdiv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_all_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_amax_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_any_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_arange_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argsort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_asinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bincount_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_left_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_not_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_right_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_xor_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_shapes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bucketize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_char_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clone_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_combinations_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_constant_pad_nd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cummin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagonal_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dist_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_einsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_eq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expand_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expand_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expm1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ihfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_flatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_float_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_float_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_frexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_geqrf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_heaviside_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_igammac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_reduce_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_int_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isneginf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isreal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_unary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lcm_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_le_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_householder_product_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_slogdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logcumsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_not_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_or_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_xor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_long_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mT_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_maximum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nan_to_num_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanquantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ne_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_alpha_dropout_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_elu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_embedding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_gelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_multilabel_soft_margin_loss_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softsign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nonzero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_quantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randint_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_reshape_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resize_as__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_short_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_blackman_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_hann_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_slice_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_slice_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_airy_ai_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_j1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_y1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_erfcx_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_i1e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_t_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_xlog1py_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_zeta_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_symeig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_transpose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triu_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unbind_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_uniform_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unique_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsqueeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_zeros_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_and_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_left_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_right_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bucketize_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_clamp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_column_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_conj_physical_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_count_nonzero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rmod___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_abs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addmv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_asinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_broadcast_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cdouble_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_solve_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clone_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_column_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_combinations_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_conj_physical_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_constant_pad_nd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cummin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_trunc_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_einsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_exp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expand_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_hfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_hfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfft2_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_flatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_frexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_hsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_hstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_reduce_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cond_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lstsq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_slogdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_svdvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vector_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_maximum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_min_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_native_dropout_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_avg_pool2d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_gelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool2d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softplus_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_fro_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polar_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rad2deg_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reciprocal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_renorm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_roll_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rsub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_erfcx_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_i0e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_svd_lowrank_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_symeig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_take_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_topk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_transpose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_as_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_zero__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rdiv___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rsub___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addcdiv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_asinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_2d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_bmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_broadcast_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cdouble_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_column_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_conj_physical_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_div_trunc_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_exp_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expand_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expm1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ihfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_float_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_float_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_hstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_householder_product_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lstsq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_solve_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_matrix_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mT_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_maximum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_meshgrid_variadic_tensors_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nan_to_num_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_native_dropout_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_dropout3d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_embedding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_prelu_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softplus_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_inf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_pinverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rad2deg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reciprocal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_renorm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reshape_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_resolve_neg_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_roll_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rsub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_erfcx_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i1e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_xlog1py_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sum_to_size_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_svd_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tensor_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_to_sparse_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_topk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_transpose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trunc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_as_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_vstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_zero__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cummin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_digamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_dstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_equal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_exp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_expand_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_flatten_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_floor_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_full_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ge_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_geqrf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_heaviside_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_histc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_hsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_hstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_reduce_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isneginf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isposinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_le_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lstsq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_slogdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logical_not_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mT_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_max_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_maximum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_min_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_multinomial_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ne_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nextafter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool1d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_elu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_embedding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_reflect_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softplus_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_pinverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_pow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_quantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rad2deg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_randint_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_randint_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_randn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_renorm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_reshape_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resize_as__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_roll_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_neg_3_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_scalar_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_slice_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_slice_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_j1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_y0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_erfcx_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_xlog1py_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_take_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_tan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tril_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_triu_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unique_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_as_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_vstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_zeros_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___ror___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rpow___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_abs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addmv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_all_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_allclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_aminmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argsort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bincount_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_left_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_not_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_right_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bool_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bucketize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_byte_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cdouble_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_column_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_combinations_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_constant_pad_nd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_count_nonzero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_digamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_empty_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_eq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfftn_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_float_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_floor_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_frexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_full_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ge_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_geqrf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_igammac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_imag_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_reduce_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_int_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isfinite_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cond_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_householder_product_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_solve_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_svdvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_vander_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logcumsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_and_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_not_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_long_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_matrix_exp_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_multinomial_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nanquantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ne_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_elu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_embedding_bag_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_embedding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_gelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nonzero_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_inf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ones_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_pinverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polar_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_pow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_quantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rad2deg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randint_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randint_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randn_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reciprocal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resize__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resize_as__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_roll_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_decimals_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_searchsorted_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_nuttall_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_slice_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_y1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_erfcx_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_i0e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_zeta_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_svd_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_symeig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tensor_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_topk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tril_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_triu_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trunc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_uniform_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unique_consecutive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unsqueeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zeros_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_T_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rmatmul___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rmod___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___ror___cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags___rsub___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rxor___cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_bool_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_byte_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_complex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_double_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_float_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_short_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_acos_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_add_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_addcdiv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_addcmul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_all_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_allclose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_any_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_and_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_not_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_to_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_ceil_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_chunk_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_clamp_max_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_conj_physical_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_cumsum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_dsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_dstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_empty_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_eq_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_erfinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_exp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_expand_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_expm1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_fft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_rfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_rfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_flatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_flip_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fliplr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_flipud_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_gt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_heaviside_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_hsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_hypot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_i0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_igamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_igammac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_imag_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags__refs_index_add_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_index_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isclose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isinf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isreal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_lcm_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_lerp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_lgamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_svd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_vector_norm_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_log10_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_log_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_and_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_not_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_or_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_lt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_movedim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_neg_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_empty_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_full_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_ones_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_zeros_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nextafter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_mish_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_threshold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_positive_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ravel_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_real_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_reciprocal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_repeat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_reshape_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_round_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_rsqrt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_rsub_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sigmoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sinc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sinh_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_bessel_j1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_erfcx_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_log_ndtr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sqrt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_square_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_squeeze_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_stack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_std_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_t_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_trace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_transpose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tril_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_triu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_trunc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_var_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_where_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_xlogy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_zeros_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_acos_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_add_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addcdiv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addmm_decomposed_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_all_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_any_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argsort_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argwhere_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_as_strided_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_as_strided_partial_views_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_as_strided_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atleast_1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atleast_3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bernoulli_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bfloat16_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bincount_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_right_shift_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_xor_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_block_diag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bool_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_broadcast_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bucketize_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_cat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cdist_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ceil_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cfloat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cholesky_inverse_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cholesky_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_chunk_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clamp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clamp_max_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clone_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_combinations_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_conj_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_constant_pad_nd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_copysign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_corrcoef_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cos_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cosh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_count_nonzero_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cov_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cross_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cumprod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diag_embed_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diagonal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diff_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_dist_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_dot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_dsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_dstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_empty_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_eq_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_equal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_erf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_erfc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_erfinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_exp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_expand_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_expm1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_eye_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_hfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_hfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_hfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ihfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_irfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_irfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_rfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_rfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_rfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_flatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_float_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fmin_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_fmod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_frexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_full_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_full_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_gather_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_gcd_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_ge_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_grid_sampler_2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_half_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_hsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_i0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_imag_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags_index_add_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_reduce_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_int_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isreal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_istft_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_binary_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_kron_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_kthvalue_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lcm_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_ldexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lgamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_cross_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_eigh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_eigvals_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_householder_product_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lu_factor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_power_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_multi_dot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_qr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_solve_ex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_tensorinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_vander_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_vecdot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linspace_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_log1p_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logaddexp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logaddexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logcumsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logdet_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mH_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mT_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_argmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_log_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_logaddexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_logsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_median_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_normalize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_std_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_max_binary_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_maximum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_min_binary_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_minimum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_multinomial_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nan_to_num_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nanmean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nanquantile_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nansum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_narrow_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_native_batch_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_native_layer_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ne_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_new_full_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nextafter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool1d_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_celu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_embedding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_gelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_linear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_pool1d_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_normalize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pdist_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_prelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_relu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_silu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_softplus_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_threshold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_norm_fro_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_norm_nuc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_normal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ones_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ones_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ops_nvprims_view_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_pinverse_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polar_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_positive_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_pow_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_qr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_quantile_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rand_like_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_randint_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_randn_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ravel_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_resolve_conj_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_resolve_neg_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rot90_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_round_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_round_decimals_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rsqrt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rsub_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scalar_tensor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_searchsorted_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_select_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sgn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sigmoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_hamming_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_slice_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_j0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_y0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_y1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_entr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_erfcx_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_i0e_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_i1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_i1e_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_log_ndtr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_i1_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_xlog1py_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_zeta_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_split_list_args_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_square_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_std_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_std_mean_unbiased_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_std_unbiased_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_stft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sum_to_size_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_svd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_symeig_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_take_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tensor_split_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_to_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_triangular_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_trunc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unbind_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unique_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unsqueeze_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_var_mean_unbiased_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_var_unbiased_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_vsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_vstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_zero__cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_zeros_cuda_float32 2023-01-11T23:10:16.9480845Z 2023-01-11T23:10:16.9489663Z test_ops.py::TestCommonCUDA::test_compare_cpu_T_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9490164Z test_ops.py::TestCommonCUDA::test_compare_cpu___radd___cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9490649Z test_ops.py::TestCommonCUDA::test_compare_cpu___rdiv___cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9491103Z test_ops.py::TestCommonCUDA::test_compare_cpu___rmod___cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9491556Z test_ops.py::TestCommonCUDA::test_compare_cpu___rxor___cuda_int64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9492021Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_T_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9492502Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_bfloat16_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9492988Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_byte_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 0%] 2023-01-11T23:10:16.9493468Z 
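The long comma-separated run above is the list of tests collected for this shard, and each entry is a standard pytest node ID (file::class::test), so any single case can be reproduced outside CI. A minimal sketch, assuming a local PyTorch checkout with a CUDA device; the node ID is copied verbatim from the list above:

    # Re-run one collected test in isolation; the node ID comes from the log.
    import subprocess

    subprocess.run(
        [
            "python", "-m", "pytest", "-v",
            "test/test_ops.py::TestTagsCUDA::test_tags_cat_cuda_float32",
        ],
        check=False,  # inspect the output and exit code rather than raising
    )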
test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_cdouble_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9493958Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_chalf_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9494441Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_char_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 0%] 2023-01-11T23:10:16.9495332Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_double_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9495812Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_half_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9496295Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_short_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 0%] 2023-01-11T23:10:16.9496765Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addcdiv_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9497228Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addcmul_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9497694Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addr_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9498145Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_arange_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9498615Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_scatter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9499084Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atan2_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9499541Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_1d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9500046Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bitwise_left_shift_cuda_int64 SKIPPED (Some inputs produce undefined outputs) [ 0%] 2023-01-11T23:10:16.9500503Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bucketize_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9500972Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_column_stack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9501443Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_constant_pad_nd_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9501903Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_contiguous_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9502366Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diag_embed_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9502833Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_copy_cuda_float32 SKIPPED 
(test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9503309Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_div_floor_rounding_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9503785Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_div_no_rounding_mode_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9504245Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_dstack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9504882Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_like_cuda_float32 SKIPPED (output is non-deterministic) [ 0%] 2023-01-11T23:10:16.9505327Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_eye_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9505790Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fft_ifftshift_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9506279Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_flipud_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9506734Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fmin_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9507191Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_hstack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9507648Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_hypot_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9508093Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_igamma_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9508563Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_igammac_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9509113Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_add_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9509577Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_fill_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9510036Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_select_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9510529Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_svd_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9511061Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_log_softmax_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9511538Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logspace_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9512002Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logsumexp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9512462Z 
test_ops.py::TestCommonCUDA::test_compare_cpu__refs_masked_fill_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9512930Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_movedim_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9513384Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_mul_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9513841Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_narrow_copy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9514296Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_narrow_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9514819Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_empty_cuda_float32 SKIPPED (output is non-deterministic) [ 0%] 2023-01-11T23:10:16.9515333Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_empty_strided_cuda_float32 SKIPPED (output is non-deterministic) [ 0%] 2023-01-11T23:10:16.9515778Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_full_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9516228Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_ones_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9516768Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_dropout_cuda_float32 SKIPPED (output is non-deterministic) [ 0%] 2023-01-11T23:10:16.9517252Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_glu_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9517736Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hardshrink_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9518227Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hinge_embedding_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9518725Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_leaky_relu_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9519215Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_poisson_nll_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9519716Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_softmax_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9520223Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_ones_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9520691Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_randn_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9521148Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_repeat_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9521606Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_reshape_cuda_float32 SKIPPED (test is slow; run with 
PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9522089Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_rsub_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9522545Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_softmax_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9523037Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_log_softmax_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9523527Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_softmax_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9524008Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_xlog1py_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9524488Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_zeta_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9524947Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_stack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9525399Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_to_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9525848Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_tril_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9526304Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unflatten_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9526759Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unfold_copy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9527256Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unsqueeze_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9527711Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_view_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9528164Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_zeros_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9528608Z test_ops.py::TestCommonCUDA::test_compare_cpu_addcdiv_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9529058Z test_ops.py::TestCommonCUDA::test_compare_cpu_addcmul_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9529514Z test_ops.py::TestCommonCUDA::test_compare_cpu_addmm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9529964Z test_ops.py::TestCommonCUDA::test_compare_cpu_addmv_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9530451Z test_ops.py::TestCommonCUDA::test_compare_cpu_addr_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9530899Z 
test_ops.py::TestCommonCUDA::test_compare_cpu_arange_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9531353Z test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9531816Z test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_scatter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9532286Z test_ops.py::TestCommonCUDA::test_compare_cpu_atan2_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9532739Z test_ops.py::TestCommonCUDA::test_compare_cpu_baddbmm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9533252Z test_ops.py::TestCommonCUDA::test_compare_cpu_bernoulli_cuda_float32 SKIPPED (output is non-deterministic) [ 0%] 2023-01-11T23:10:16.9533687Z test_ops.py::TestCommonCUDA::test_compare_cpu_bfloat16_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9534123Z test_ops.py::TestCommonCUDA::test_compare_cpu_bincount_cuda_int64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9534682Z test_ops.py::TestCommonCUDA::test_compare_cpu_bitwise_left_shift_cuda_int64 SKIPPED (Some inputs produce undefined outputs) [ 0%] 2023-01-11T23:10:16.9535124Z test_ops.py::TestCommonCUDA::test_compare_cpu_bmm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9535580Z test_ops.py::TestCommonCUDA::test_compare_cpu_bucketize_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9536027Z test_ops.py::TestCommonCUDA::test_compare_cpu_byte_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 0%] 2023-01-11T23:10:16.9536463Z test_ops.py::TestCommonCUDA::test_compare_cpu_cdist_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9536910Z test_ops.py::TestCommonCUDA::test_compare_cpu_cdouble_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9537355Z test_ops.py::TestCommonCUDA::test_compare_cpu_char_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 0%] 2023-01-11T23:10:16.9537800Z test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9538291Z test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_solve_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9538752Z test_ops.py::TestCommonCUDA::test_compare_cpu_chunk_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9539211Z test_ops.py::TestCommonCUDA::test_compare_cpu_combinations_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9539678Z test_ops.py::TestCommonCUDA::test_compare_cpu_constant_pad_nd_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9540127Z test_ops.py::TestCommonCUDA::test_compare_cpu_cov_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 
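Several entries above (byte, char, and the other narrowing conversions) are skipped with "Overflow when downcasting signed type is undefined": PyTorch documents float-to-integer conversion as undefined when the value does not fit the target type, so CPU and CUDA are free to disagree and an elementwise comparison would be meaningless. A minimal illustration, using a hypothetical out-of-range value:

    import torch

    # 3e9 does not fit in int8, so the result of .char() is unspecified and
    # may differ between backends; this is why the comparison test is skipped.
    x = torch.tensor([3.0e9], dtype=torch.float32)
    print(x.char())
    if torch.cuda.is_available():
        print(x.cuda().char().cpu())  # may legitimately differ from the CPU value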
2023-01-11T23:10:16.9540580Z test_ops.py::TestCommonCUDA::test_compare_cpu_cummax_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9541027Z test_ops.py::TestCommonCUDA::test_compare_cpu_cummin_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9541481Z test_ops.py::TestCommonCUDA::test_compare_cpu_cumprod_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9541921Z test_ops.py::TestCommonCUDA::test_compare_cpu_cumsum_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9542387Z test_ops.py::TestCommonCUDA::test_compare_cpu_cumulative_trapezoid_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9542865Z test_ops.py::TestCommonCUDA::test_compare_cpu_diag_embed_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9543353Z test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9543817Z test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_scatter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9544291Z test_ops.py::TestCommonCUDA::test_compare_cpu_div_floor_rounding_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9544771Z test_ops.py::TestCommonCUDA::test_compare_cpu_div_no_rounding_mode_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9545248Z test_ops.py::TestCommonCUDA::test_compare_cpu_div_trunc_rounding_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9545710Z test_ops.py::TestCommonCUDA::test_compare_cpu_dsplit_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9546150Z test_ops.py::TestCommonCUDA::test_compare_cpu_dstack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9546654Z test_ops.py::TestCommonCUDA::test_compare_cpu_empty_cuda_float32 SKIPPED (output is non-deterministic) [ 0%] 2023-01-11T23:10:16.9580087Z test_ops.py::TestCommonCUDA::test_compare_cpu_expand_as_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9580561Z test_ops.py::TestCommonCUDA::test_compare_cpu_expand_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9581023Z test_ops.py::TestCommonCUDA::test_compare_cpu_fft_fftshift_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9581475Z test_ops.py::TestCommonCUDA::test_compare_cpu_fliplr_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9581935Z test_ops.py::TestCommonCUDA::test_compare_cpu_fmax_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9582507Z test_ops.py::TestCommonCUDA::test_compare_cpu_fmin_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9582963Z 
test_ops.py::TestCommonCUDA::test_compare_cpu_full_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9583404Z test_ops.py::TestCommonCUDA::test_compare_cpu_full_like_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9583862Z test_ops.py::TestCommonCUDA::test_compare_cpu_gather_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9584318Z test_ops.py::TestCommonCUDA::test_compare_cpu_geqrf_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9584781Z test_ops.py::TestCommonCUDA::test_compare_cpu_gradient_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9585229Z test_ops.py::TestCommonCUDA::test_compare_cpu_half_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9585674Z test_ops.py::TestCommonCUDA::test_compare_cpu_hypot_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9586125Z test_ops.py::TestCommonCUDA::test_compare_cpu_igamma_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9586587Z test_ops.py::TestCommonCUDA::test_compare_cpu_igammac_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9587046Z test_ops.py::TestCommonCUDA::test_compare_cpu_index_add_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9587534Z test_ops.py::TestCommonCUDA::test_compare_cpu_index_put_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9588003Z test_ops.py::TestCommonCUDA::test_compare_cpu_index_select_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9588464Z test_ops.py::TestCommonCUDA::test_compare_cpu_inner_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9588995Z test_ops.py::TestCommonCUDA::test_compare_cpu_kron_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9589440Z test_ops.py::TestCommonCUDA::test_compare_cpu_kthvalue_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9589898Z test_ops.py::TestCommonCUDA::test_compare_cpu_lerp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9590402Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_det_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9590871Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eigvals_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9591341Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eigvalsh_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9591828Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lstsq_grad_oriented_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9592315Z 
test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_factor_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9592797Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_factor_ex_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9593299Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9593778Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_power_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9594267Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_rank_hermitian_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9594746Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_multi_dot_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9595231Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_norm_subgradients_at_zero_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9595726Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_hermitian_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9596211Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9596676Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_qr_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9597142Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_ex_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9597616Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_triangular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9598121Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_svd_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9598593Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_svdvals_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9599058Z test_ops.py::TestCommonCUDA::test_compare_cpu_linspace_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9599514Z test_ops.py::TestCommonCUDA::test_compare_cpu_logaddexp2_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9600022Z test_ops.py::TestCommonCUDA::test_compare_cpu_logsumexp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9600526Z test_ops.py::TestCommonCUDA::test_compare_cpu_long_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 1%] 2023-01-11T23:10:16.9600977Z test_ops.py::TestCommonCUDA::test_compare_cpu_lu_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9601430Z test_ops.py::TestCommonCUDA::test_compare_cpu_lu_solve_cuda_float32 SKIPPED (test is slow; run with 
PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9601880Z test_ops.py::TestCommonCUDA::test_compare_cpu_lu_unpack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9602329Z test_ops.py::TestCommonCUDA::test_compare_cpu_mH_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9602770Z test_ops.py::TestCommonCUDA::test_compare_cpu_mT_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9603222Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_cumprod_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9603692Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_logaddexp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9604202Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_median_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9604679Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_scatter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9605148Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_softmin_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9605602Z test_ops.py::TestCommonCUDA::test_compare_cpu_matmul_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9606083Z test_ops.py::TestCommonCUDA::test_compare_cpu_max_pool2d_with_indices_backward_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9606575Z test_ops.py::TestCommonCUDA::test_compare_cpu_max_reduction_no_dim_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9607055Z test_ops.py::TestCommonCUDA::test_compare_cpu_max_reduction_with_dim_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9607534Z test_ops.py::TestCommonCUDA::test_compare_cpu_meshgrid_list_of_tensors_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9607995Z test_ops.py::TestCommonCUDA::test_compare_cpu_mm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9608447Z test_ops.py::TestCommonCUDA::test_compare_cpu_movedim_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9608930Z test_ops.py::TestCommonCUDA::test_compare_cpu_msort_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9609383Z test_ops.py::TestCommonCUDA::test_compare_cpu_mul_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9609822Z test_ops.py::TestCommonCUDA::test_compare_cpu_mv_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9610282Z test_ops.py::TestCommonCUDA::test_compare_cpu_nanmedian_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9610797Z test_ops.py::TestCommonCUDA::test_compare_cpu_nanquantile_cuda_float32 SKIPPED (test is slow; run with 
PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9611280Z test_ops.py::TestCommonCUDA::test_compare_cpu_native_dropout_backward_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9611871Z test_ops.py::TestCommonCUDA::test_compare_cpu_new_empty_strided_cuda_float32 SKIPPED (output is non-deterministic) [ 1%] 2023-01-11T23:10:16.9612322Z test_ops.py::TestCommonCUDA::test_compare_cpu_new_ones_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9612852Z test_ops.py::TestCommonCUDA::test_compare_cpu_nextafter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9613370Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_avg_pool2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9613864Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_avg_pool3d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9614358Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_max_pool1d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9615020Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_max_pool2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9615515Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_max_pool3d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9615989Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_alpha_dropout_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9616472Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_avg_pool3d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9616950Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_bilinear_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9617443Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_cosine_embedding_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9617942Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_cross_entropy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9618484Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout2d_cuda_float32 SKIPPED (output is non-deterministic) [ 1%] 2023-01-11T23:10:16.9619037Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_feature_alpha_dropout_with_train_cuda_float32 SKIPPED (output is non-deterministic) [ 1%] 2023-01-11T23:10:16.9619539Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_feature_alpha_dropout_without_train_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9620181Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_fractional_max_pool2d_cuda_float32 SKIPPED (output is non-deterministic) [ 1%] 2023-01-11T23:10:16.9620745Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_fractional_max_pool3d_cuda_float32 SKIPPED (output is 
non-deterministic) [ 1%] 2023-01-11T23:10:16.9621219Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_gaussian_nll_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9621709Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_grid_sample_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9622188Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hardshrink_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9622666Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hardtanh_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9623154Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hinge_embedding_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9623640Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_huber_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9624121Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_instance_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9624610Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_area_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9625098Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_bilinear_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9625599Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_linear_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9626109Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_kl_div_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9626587Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_leaky_relu_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9627073Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_margin_ranking_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9627549Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_pool2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9628030Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_pool3d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9628514Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool2d_grad_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9629732Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_nll_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9630228Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_circular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9630731Z 
test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_reflect_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9631217Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pixel_unshuffle_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9631740Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_poisson_nll_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9632213Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_relu6_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9632749Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_rrelu_cuda_float32 SKIPPED (output is non-deterministic) [ 1%] 2023-01-11T23:10:16.9633206Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softshrink_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9633683Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_unfold_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9634171Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_upsample_nearest_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9634639Z test_ops.py::TestCommonCUDA::test_compare_cpu_norm_fro_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9635081Z test_ops.py::TestCommonCUDA::test_compare_cpu_ones_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9635540Z test_ops.py::TestCommonCUDA::test_compare_cpu_ops_nvprims_var_mean_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9635998Z test_ops.py::TestCommonCUDA::test_compare_cpu_outer_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9636435Z test_ops.py::TestCommonCUDA::test_compare_cpu_put_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9636951Z test_ops.py::TestCommonCUDA::test_compare_cpu_randint_like_cuda_float32 SKIPPED (output is non-deterministic) [ 1%] 2023-01-11T23:10:16.9637413Z test_ops.py::TestCommonCUDA::test_compare_cpu_randn_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9637922Z test_ops.py::TestCommonCUDA::test_compare_cpu_randn_like_cuda_float32 SKIPPED (output is non-deterministic) [ 1%] 2023-01-11T23:10:16.9638351Z test_ops.py::TestCommonCUDA::test_compare_cpu_resolve_conj_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9638805Z test_ops.py::TestCommonCUDA::test_compare_cpu_rot90_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9639262Z test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_amax_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9639742Z test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_amin_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9640202Z 
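The "output is non-deterministic" skips above (bernoulli, the dropout variants, rrelu, randint_like, randn_like, and the empty/new_empty family) cover ops whose results depend on random sampling or uninitialized memory, so a CPU/CUDA elementwise comparison cannot be expected to pass. A short sketch of the sampling case: the default CPU generator (MT19937) and the CUDA generator (Philox) are different algorithms, so equal seeds still produce different streams.

    import torch

    # Equal seeds do not align CPU and CUDA RNG streams, because the two
    # devices use different generator algorithms.
    torch.manual_seed(0)
    cpu = torch.randn(4)
    if torch.cuda.is_available():
        torch.manual_seed(0)
        gpu = torch.randn(4, device="cuda").cpu()
        print(torch.equal(cpu, gpu))  # expected False despite identical seeds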
test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_sum_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9640703Z test_ops.py::TestCommonCUDA::test_compare_cpu_segment_reduce_lengths_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9641199Z test_ops.py::TestCommonCUDA::test_compare_cpu_select_scatter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9641651Z test_ops.py::TestCommonCUDA::test_compare_cpu_short_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 1%] 2023-01-11T23:10:16.9642095Z test_ops.py::TestCommonCUDA::test_compare_cpu_slice_scatter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9642578Z test_ops.py::TestCommonCUDA::test_compare_cpu_softmax_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9643026Z test_ops.py::TestCommonCUDA::test_compare_cpu_sort_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9643486Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_t_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%] 2023-01-11T23:10:16.9643958Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_u_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%] 2023-01-11T23:10:16.9644535Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 1%] 2023-01-11T23:10:16.9645031Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_hermite_polynomial_h_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9645509Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_hermite_polynomial_he_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%] 2023-01-11T23:10:16.9645976Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_laguerre_polynomial_l_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%] 2023-01-11T23:10:16.9646432Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_legendre_polynomial_p_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%] 2023-01-11T23:10:16.9646913Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%] 2023-01-11T23:10:16.9647392Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%] 2023-01-11T23:10:16.9647871Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_xlog1py_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9648353Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_zeta_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9648817Z test_ops.py::TestCommonCUDA::test_compare_cpu_split_list_args_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9649271Z test_ops.py::TestCommonCUDA::test_compare_cpu_squeeze_cuda_float32 SKIPPED (test is slow; run 
with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9649720Z test_ops.py::TestCommonCUDA::test_compare_cpu_std_mean_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9650203Z test_ops.py::TestCommonCUDA::test_compare_cpu_std_mean_unbiased_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9650693Z test_ops.py::TestCommonCUDA::test_compare_cpu_std_unbiased_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9651152Z test_ops.py::TestCommonCUDA::test_compare_cpu_sum_to_size_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9651597Z test_ops.py::TestCommonCUDA::test_compare_cpu_svd_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9652044Z test_ops.py::TestCommonCUDA::test_compare_cpu_symeig_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9652492Z test_ops.py::TestCommonCUDA::test_compare_cpu_take_along_dim_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9652978Z test_ops.py::TestCommonCUDA::test_compare_cpu_tensordot_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9653432Z test_ops.py::TestCommonCUDA::test_compare_cpu_to_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9653876Z test_ops.py::TestCommonCUDA::test_compare_cpu_trace_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9654319Z test_ops.py::TestCommonCUDA::test_compare_cpu_trapezoid_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9654874Z test_ops.py::TestCommonCUDA::test_compare_cpu_trapz_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9655320Z test_ops.py::TestCommonCUDA::test_compare_cpu_tril_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9655772Z test_ops.py::TestCommonCUDA::test_compare_cpu_true_divide_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9656218Z test_ops.py::TestCommonCUDA::test_compare_cpu_unfold_copy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9656731Z test_ops.py::TestCommonCUDA::test_compare_cpu_uniform_cuda_float32 SKIPPED (output is non-deterministic) [ 1%] 2023-01-11T23:10:16.9657178Z test_ops.py::TestCommonCUDA::test_compare_cpu_unique_consecutive_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9657640Z test_ops.py::TestCommonCUDA::test_compare_cpu_unsqueeze_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9658091Z test_ops.py::TestCommonCUDA::test_compare_cpu_var_unbiased_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9658545Z test_ops.py::TestCommonCUDA::test_compare_cpu_vdot_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9659038Z 
test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_complex_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9659509Z test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_real_cuda_complex64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9659966Z test_ops.py::TestCommonCUDA::test_compare_cpu_vsplit_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9660410Z test_ops.py::TestCommonCUDA::test_compare_cpu_zeros_like_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9660819Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_T_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9661200Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing___getitem___cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9661569Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_abs_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9661944Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_acosh_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9662316Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_add_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9662685Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_angle_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9663102Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_cuda_complex32 SKIPPED (Errors when storage_offset is included) [ 1%] 2023-01-11T23:10:16.9663555Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_scatter_cuda_complex32 SKIPPED (Fails on cuda + rocm) [ 1%] 2023-01-11T23:10:16.9663996Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_asinh_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9664372Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atan_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9664740Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atanh_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9665120Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_1d_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9665499Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_2d_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9665881Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_bfloat16_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9666256Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_block_diag_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9666634Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cdouble_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9667012Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_char_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9667379Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_conj_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9667762Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_contiguous_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9668138Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cosh_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9668508Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_double_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9668948Z 
test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_dstack_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9669364Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_empty_cuda_complex32 SKIPPED (Expected: empty is not comparable) [ 1%] 2023-01-11T23:10:16.9669796Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_empty_like_cuda_complex32 SKIPPED (Skipped!) [ 1%] 2023-01-11T23:10:16.9670192Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_exp_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9670590Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fft2_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9670971Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_hfft2_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9671352Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_hfft_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9671732Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifftn_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9672109Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifftshift_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9672497Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_irfft2_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9672874Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fill_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9673240Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_flatten_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9673622Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_float_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9673993Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_full_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9674368Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_hsplit_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9674733Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_imag_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9675108Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_add_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9675494Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_select_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9675906Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isinf_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9676272Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isreal_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9676655Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_long_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9677024Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mH_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9677383Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_neg_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9677760Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_zeros_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9678155Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv1d_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9678560Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv2d_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9678970Z 
test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv_transpose1d_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9679369Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_ones_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9679745Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_randn_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9680136Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_ravel_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9680516Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_repeat_interleave_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9680902Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_roll_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9681275Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_select_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9681644Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sgn_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9682008Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sin_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9682416Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_split_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9682791Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sqrt_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9683153Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_stack_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9683519Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sum_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9683886Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_tanh_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9684254Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_tril_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9684612Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_triu_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9684993Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unflatten_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9685379Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unfold_copy_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9685759Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_vstack_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9686131Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_zeros_like_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9686486Z test_ops.py::TestCommonCUDA::test_dtypes___getitem___cuda PASSED [ 2%] 2023-01-11T23:10:16.9686806Z test_ops.py::TestCommonCUDA::test_dtypes___radd___cuda PASSED [ 2%] 2023-01-11T23:10:16.9687118Z test_ops.py::TestCommonCUDA::test_dtypes___rand___cuda PASSED [ 2%] 2023-01-11T23:10:16.9687429Z test_ops.py::TestCommonCUDA::test_dtypes___rmod___cuda PASSED [ 2%] 2023-01-11T23:10:16.9687769Z test_ops.py::TestCommonCUDA::test_dtypes___rmul___cuda PASSED [ 2%] 2023-01-11T23:10:16.9688084Z test_ops.py::TestCommonCUDA::test_dtypes___ror___cuda PASSED [ 2%] 2023-01-11T23:10:16.9688393Z test_ops.py::TestCommonCUDA::test_dtypes___rsub___cuda PASSED [ 2%] 2023-01-11T23:10:16.9688723Z test_ops.py::TestCommonCUDA::test_dtypes__native_batch_norm_legit_cuda PASSED [ 2%] 2023-01-11T23:10:16.9689054Z test_ops.py::TestCommonCUDA::test_dtypes__refs_T_cuda PASSED [ 2%] 2023-01-11T23:10:16.9689380Z 
test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_chalf_cuda PASSED [ 2%] 2023-01-11T23:10:16.9689726Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_double_cuda PASSED [ 2%] 2023-01-11T23:10:16.9690072Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_float_cuda PASSED [ 2%] 2023-01-11T23:10:16.9690411Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_half_cuda PASSED [ 2%] 2023-01-11T23:10:16.9690746Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_short_cuda PASSED [ 2%] 2023-01-11T23:10:16.9691083Z test_ops.py::TestCommonCUDA::test_dtypes__refs_acos_cuda PASSED [ 2%] 2023-01-11T23:10:16.9691407Z test_ops.py::TestCommonCUDA::test_dtypes__refs_acosh_cuda PASSED [ 2%] 2023-01-11T23:10:16.9691716Z test_ops.py::TestCommonCUDA::test_dtypes__refs_add_cuda PASSED [ 2%] 2023-01-11T23:10:16.9692029Z test_ops.py::TestCommonCUDA::test_dtypes__refs_addr_cuda PASSED [ 2%] 2023-01-11T23:10:16.9692346Z test_ops.py::TestCommonCUDA::test_dtypes__refs_all_cuda PASSED [ 2%] 2023-01-11T23:10:16.9692668Z test_ops.py::TestCommonCUDA::test_dtypes__refs_allclose_cuda PASSED [ 2%] 2023-01-11T23:10:16.9692979Z test_ops.py::TestCommonCUDA::test_dtypes__refs_amax_cuda PASSED [ 2%] 2023-01-11T23:10:16.9693293Z test_ops.py::TestCommonCUDA::test_dtypes__refs_amin_cuda PASSED [ 2%] 2023-01-11T23:10:16.9693613Z test_ops.py::TestCommonCUDA::test_dtypes__refs_arange_cuda PASSED [ 2%] 2023-01-11T23:10:16.9693921Z test_ops.py::TestCommonCUDA::test_dtypes__refs_asinh_cuda PASSED [ 2%] 2023-01-11T23:10:16.9694233Z test_ops.py::TestCommonCUDA::test_dtypes__refs_atan2_cuda PASSED [ 2%] 2023-01-11T23:10:16.9694731Z test_ops.py::TestCommonCUDA::test_dtypes__refs_atanh_cuda PASSED [ 2%] 2023-01-11T23:10:16.9695054Z test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_2d_cuda PASSED [ 2%] 2023-01-11T23:10:16.9695365Z test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_3d_cuda PASSED [ 2%] 2023-01-11T23:10:16.9695682Z test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_and_cuda PASSED [ 2%] 2023-01-11T23:10:16.9696015Z test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_left_shift_cuda PASSED [ 2%] 2023-01-11T23:10:16.9696337Z test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_not_cuda PASSED [ 2%] 2023-01-11T23:10:16.9696661Z test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_or_cuda PASSED [ 2%] 2023-01-11T23:10:16.9697005Z test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_right_shift_cuda PASSED [ 2%] 2023-01-11T23:10:16.9697350Z test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_shapes_cuda PASSED [ 2%] 2023-01-11T23:10:16.9697687Z test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_tensors_cuda PASSED [ 2%] 2023-01-11T23:10:16.9698019Z test_ops.py::TestCommonCUDA::test_dtypes__refs_bucketize_cuda PASSED [ 2%] 2023-01-11T23:10:16.9698333Z test_ops.py::TestCommonCUDA::test_dtypes__refs_cat_cuda PASSED [ 2%] 2023-01-11T23:10:16.9698637Z test_ops.py::TestCommonCUDA::test_dtypes__refs_chunk_cuda PASSED [ 2%] 2023-01-11T23:10:16.9698952Z test_ops.py::TestCommonCUDA::test_dtypes__refs_clamp_max_cuda PASSED [ 2%] 2023-01-11T23:10:16.9699274Z test_ops.py::TestCommonCUDA::test_dtypes__refs_clamp_min_cuda PASSED [ 2%] 2023-01-11T23:10:16.9699584Z test_ops.py::TestCommonCUDA::test_dtypes__refs_conj_cuda PASSED [ 2%] 2023-01-11T23:10:16.9699925Z test_ops.py::TestCommonCUDA::test_dtypes__refs_conj_physical_cuda PASSED [ 2%] 2023-01-11T23:10:16.9700262Z test_ops.py::TestCommonCUDA::test_dtypes__refs_constant_pad_nd_cuda PASSED [ 2%] 
2023-01-11T23:10:16.9700628Z test_ops.py::TestCommonCUDA::test_dtypes__refs_cos_cuda PASSED [ 2%] 2023-01-11T23:10:16.9700933Z test_ops.py::TestCommonCUDA::test_dtypes__refs_diag_embed_cuda PASSED [ 2%] 2023-01-11T23:10:16.9701259Z test_ops.py::TestCommonCUDA::test_dtypes__refs_div_floor_rounding_cuda PASSED [ 2%] 2023-01-11T23:10:16.9701595Z test_ops.py::TestCommonCUDA::test_dtypes__refs_div_trunc_rounding_cuda PASSED [ 2%] 2023-01-11T23:10:16.9701923Z test_ops.py::TestCommonCUDA::test_dtypes__refs_dsplit_cuda PASSED [ 2%] 2023-01-11T23:10:16.9702227Z test_ops.py::TestCommonCUDA::test_dtypes__refs_dstack_cuda PASSED [ 2%] 2023-01-11T23:10:16.9702533Z test_ops.py::TestCommonCUDA::test_dtypes__refs_erfc_cuda PASSED [ 2%] 2023-01-11T23:10:16.9702847Z test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_as_cuda PASSED [ 2%] 2023-01-11T23:10:16.9703157Z test_ops.py::TestCommonCUDA::test_dtypes__refs_expm1_cuda PASSED [ 2%] 2023-01-11T23:10:16.9703472Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fft2_cuda PASSED [ 2%] 2023-01-11T23:10:16.9703784Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fft_cuda PASSED [ 2%] 2023-01-11T23:10:16.9704100Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fftshift_cuda PASSED [ 2%] 2023-01-11T23:10:16.9704412Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_hfft_cuda PASSED [ 2%] 2023-01-11T23:10:16.9704724Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_hfftn_cuda PASSED [ 2%] 2023-01-11T23:10:16.9705304Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifft2_cuda PASSED [ 2%] 2023-01-11T23:10:16.9705610Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifft_cuda PASSED [ 2%] 2023-01-11T23:10:16.9705919Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifftn_cuda PASSED [ 2%] 2023-01-11T23:10:16.9706233Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfft_cuda PASSED [ 2%] 2023-01-11T23:10:16.9706546Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_irfft2_cuda PASSED [ 2%] 2023-01-11T23:10:16.9706887Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_irfft_cuda PASSED [ 2%] 2023-01-11T23:10:16.9707201Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfft2_cuda PASSED [ 2%] 2023-01-11T23:10:16.9707668Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fill_cuda PASSED [ 2%] 2023-01-11T23:10:16.9707975Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fliplr_cuda PASSED [ 2%] 2023-01-11T23:10:16.9708280Z test_ops.py::TestCommonCUDA::test_dtypes__refs_floor_cuda PASSED [ 2%] 2023-01-11T23:10:16.9708586Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fmin_cuda PASSED [ 2%] 2023-01-11T23:10:16.9708956Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fmod_cuda PASSED [ 2%] 2023-01-11T23:10:16.9709258Z test_ops.py::TestCommonCUDA::test_dtypes__refs_frac_cuda PASSED [ 2%] 2023-01-11T23:10:16.9709562Z test_ops.py::TestCommonCUDA::test_dtypes__refs_gcd_cuda PASSED [ 2%] 2023-01-11T23:10:16.9709865Z test_ops.py::TestCommonCUDA::test_dtypes__refs_gt_cuda PASSED [ 2%] 2023-01-11T23:10:16.9710165Z test_ops.py::TestCommonCUDA::test_dtypes__refs_hsplit_cuda PASSED [ 2%] 2023-01-11T23:10:16.9710473Z test_ops.py::TestCommonCUDA::test_dtypes__refs_hstack_cuda PASSED [ 2%] 2023-01-11T23:10:16.9710779Z test_ops.py::TestCommonCUDA::test_dtypes__refs_hypot_cuda PASSED [ 2%] 2023-01-11T23:10:16.9711080Z test_ops.py::TestCommonCUDA::test_dtypes__refs_i0_cuda PASSED [ 2%] 2023-01-11T23:10:16.9711379Z test_ops.py::TestCommonCUDA::test_dtypes__refs_igamma_cuda PASSED [ 2%] 2023-01-11T23:10:16.9711688Z 
test_ops.py::TestCommonCUDA::test_dtypes__refs_index_copy_cuda PASSED [ 2%] 2023-01-11T23:10:16.9712000Z test_ops.py::TestCommonCUDA::test_dtypes__refs_index_fill_cuda PASSED [ 2%] 2023-01-11T23:10:16.9712342Z test_ops.py::TestCommonCUDA::test_dtypes__refs_index_select_cuda PASSED [ 2%] 2023-01-11T23:10:16.9712652Z test_ops.py::TestCommonCUDA::test_dtypes__refs_isfinite_cuda PASSED [ 2%] 2023-01-11T23:10:16.9712965Z test_ops.py::TestCommonCUDA::test_dtypes__refs_isinf_cuda PASSED [ 2%] 2023-01-11T23:10:16.9713268Z test_ops.py::TestCommonCUDA::test_dtypes__refs_isnan_cuda PASSED [ 2%] 2023-01-11T23:10:16.9713570Z test_ops.py::TestCommonCUDA::test_dtypes__refs_isneginf_cuda PASSED [ 2%] 2023-01-11T23:10:16.9713878Z test_ops.py::TestCommonCUDA::test_dtypes__refs_isreal_cuda PASSED [ 2%] 2023-01-11T23:10:16.9714181Z test_ops.py::TestCommonCUDA::test_dtypes__refs_le_cuda PASSED [ 2%] 2023-01-11T23:10:16.9714480Z test_ops.py::TestCommonCUDA::test_dtypes__refs_lerp_cuda PASSED [ 2%] 2023-01-11T23:10:16.9714787Z test_ops.py::TestCommonCUDA::test_dtypes__refs_lgamma_cuda PASSED [ 2%] 2023-01-11T23:10:16.9715106Z test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_matrix_norm_cuda PASSED [ 2%] 2023-01-11T23:10:16.9715427Z test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_svd_cuda PASSED [ 2%] 2023-01-11T23:10:16.9715753Z test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_vector_norm_cuda PASSED [ 2%] 2023-01-11T23:10:16.9716076Z test_ops.py::TestCommonCUDA::test_dtypes__refs_linspace_cuda PASSED [ 2%] 2023-01-11T23:10:16.9716384Z test_ops.py::TestCommonCUDA::test_dtypes__refs_log2_cuda PASSED [ 2%] 2023-01-11T23:10:16.9716681Z test_ops.py::TestCommonCUDA::test_dtypes__refs_log_cuda PASSED [ 2%] 2023-01-11T23:10:16.9717006Z test_ops.py::TestCommonCUDA::test_dtypes__refs_log_softmax_with_dtype_cuda PASSED [ 2%] 2023-01-11T23:10:16.9717337Z test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_not_cuda PASSED [ 2%] 2023-01-11T23:10:16.9717650Z test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_or_cuda PASSED [ 2%] 2023-01-11T23:10:16.9717965Z test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_xor_cuda PASSED [ 2%] 2023-01-11T23:10:16.9718294Z test_ops.py::TestCommonCUDA::test_dtypes__refs_logsumexp_cuda PASSED [ 2%] 2023-01-11T23:10:16.9718636Z test_ops.py::TestCommonCUDA::test_dtypes__refs_masked_fill_cuda PASSED [ 2%] 2023-01-11T23:10:16.9718944Z test_ops.py::TestCommonCUDA::test_dtypes__refs_movedim_cuda PASSED [ 2%] 2023-01-11T23:10:16.9719253Z test_ops.py::TestCommonCUDA::test_dtypes__refs_mul_cuda PASSED [ 2%] 2023-01-11T23:10:16.9719556Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nan_to_num_cuda PASSED [ 2%] 2023-01-11T23:10:16.9719857Z test_ops.py::TestCommonCUDA::test_dtypes__refs_ne_cuda PASSED [ 2%] 2023-01-11T23:10:16.9720161Z test_ops.py::TestCommonCUDA::test_dtypes__refs_neg_cuda PASSED [ 2%] 2023-01-11T23:10:16.9720507Z test_ops.py::TestCommonCUDA::test_dtypes__refs_new_empty_cuda PASSED [ 2%] 2023-01-11T23:10:16.9720836Z test_ops.py::TestCommonCUDA::test_dtypes__refs_new_empty_strided_cuda PASSED [ 2%] 2023-01-11T23:10:16.9721155Z test_ops.py::TestCommonCUDA::test_dtypes__refs_new_full_cuda PASSED [ 2%] 2023-01-11T23:10:16.9721465Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nextafter_cuda PASSED [ 2%] 2023-01-11T23:10:16.9721794Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_dropout_cuda PASSED [ 2%] 2023-01-11T23:10:16.9722131Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_elu_cuda PASSED [ 2%] 2023-01-11T23:10:16.9722474Z 
test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_group_norm_cuda PASSED [ 2%] 2023-01-11T23:10:16.9722821Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hardtanh_cuda PASSED [ 2%] 2023-01-11T23:10:16.9723169Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_huber_loss_cuda PASSED [ 2%] 2023-01-11T23:10:16.9723514Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_layer_norm_cuda PASSED [ 2%] 2023-01-11T23:10:16.9723886Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_leaky_relu_cuda PASSED [ 3%] 2023-01-11T23:10:16.9724221Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_nll_loss_cuda PASSED [ 3%] 2023-01-11T23:10:16.9724579Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_pairwise_distance_cuda PASSED [ 3%] 2023-01-11T23:10:16.9724924Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_relu_cuda PASSED [ 3%] 2023-01-11T23:10:16.9725263Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softplus_cuda PASSED [ 3%] 2023-01-11T23:10:16.9725607Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softshrink_cuda PASSED [ 3%] 2023-01-11T23:10:16.9725959Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_triplet_margin_loss_cuda PASSED [ 3%] 2023-01-11T23:10:16.9726298Z test_ops.py::TestCommonCUDA::test_dtypes__refs_ones_cuda PASSED [ 3%] 2023-01-11T23:10:16.9726610Z test_ops.py::TestCommonCUDA::test_dtypes__refs_ravel_cuda PASSED [ 3%] 2023-01-11T23:10:16.9726930Z test_ops.py::TestCommonCUDA::test_dtypes__refs_remainder_cuda PASSED [ 3%] 2023-01-11T23:10:16.9727243Z test_ops.py::TestCommonCUDA::test_dtypes__refs_repeat_cuda PASSED [ 3%] 2023-01-11T23:10:16.9727555Z test_ops.py::TestCommonCUDA::test_dtypes__refs_reshape_cuda PASSED [ 3%] 2023-01-11T23:10:16.9727870Z test_ops.py::TestCommonCUDA::test_dtypes__refs_roll_cuda PASSED [ 3%] 2023-01-11T23:10:16.9728173Z test_ops.py::TestCommonCUDA::test_dtypes__refs_rsqrt_cuda PASSED [ 3%] 2023-01-11T23:10:16.9728477Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sigmoid_cuda PASSED [ 3%] 2023-01-11T23:10:16.9728789Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sign_cuda PASSED [ 3%] 2023-01-11T23:10:16.9729097Z test_ops.py::TestCommonCUDA::test_dtypes__refs_signbit_cuda PASSED [ 3%] 2023-01-11T23:10:16.9729401Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sin_cuda PASSED [ 3%] 2023-01-11T23:10:16.9729708Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sinh_cuda PASSED [ 3%] 2023-01-11T23:10:16.9730031Z test_ops.py::TestCommonCUDA::test_dtypes__refs_softmax_with_dtype_cuda PASSED [ 3%] 2023-01-11T23:10:16.9730385Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_bessel_j0_cuda PASSED [ 3%] 2023-01-11T23:10:16.9730763Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_bessel_j1_cuda PASSED [ 3%] 2023-01-11T23:10:16.9731088Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_entr_cuda PASSED [ 3%] 2023-01-11T23:10:16.9731405Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_erfcx_cuda PASSED [ 3%] 2023-01-11T23:10:16.9731716Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i0e_cuda PASSED [ 3%] 2023-01-11T23:10:16.9732051Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_log_softmax_with_dtype_cuda PASSED [ 3%] 2023-01-11T23:10:16.9732387Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_logit_cuda PASSED [ 3%] 2023-01-11T23:10:16.9732727Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_1_cuda PASSED [ 3%] 2023-01-11T23:10:16.9733098Z 
test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_3_cuda PASSED [ 3%] 2023-01-11T23:10:16.9733440Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_ndtri_cuda PASSED [ 3%] 2023-01-11T23:10:16.9733778Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_spherical_bessel_j0_cuda PASSED [ 3%] 2023-01-11T23:10:16.9734116Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_xlog1py_cuda PASSED [ 3%] 2023-01-11T23:10:16.9734443Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_zeta_cuda PASSED [ 3%] 2023-01-11T23:10:16.9734854Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sqrt_cuda PASSED [ 3%] 2023-01-11T23:10:16.9735159Z test_ops.py::TestCommonCUDA::test_dtypes__refs_squeeze_cuda PASSED [ 3%] 2023-01-11T23:10:16.9735468Z test_ops.py::TestCommonCUDA::test_dtypes__refs_stack_cuda PASSED [ 3%] 2023-01-11T23:10:16.9735821Z test_ops.py::TestCommonCUDA::test_dtypes__refs_std_cuda PASSED [ 3%] 2023-01-11T23:10:16.9736124Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sub_cuda PASSED [ 3%] 2023-01-11T23:10:16.9736423Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sum_cuda PASSED [ 3%] 2023-01-11T23:10:16.9736729Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sum_to_size_cuda PASSED [ 3%] 2023-01-11T23:10:16.9737036Z test_ops.py::TestCommonCUDA::test_dtypes__refs_tan_cuda PASSED [ 3%] 2023-01-11T23:10:16.9737336Z test_ops.py::TestCommonCUDA::test_dtypes__refs_tanh_cuda PASSED [ 3%] 2023-01-11T23:10:16.9737640Z test_ops.py::TestCommonCUDA::test_dtypes__refs_to_cuda PASSED [ 3%] 2023-01-11T23:10:16.9737945Z test_ops.py::TestCommonCUDA::test_dtypes__refs_transpose_cuda PASSED [ 3%] 2023-01-11T23:10:16.9738259Z test_ops.py::TestCommonCUDA::test_dtypes__refs_triu_indices_cuda PASSED [ 3%] 2023-01-11T23:10:16.9738567Z test_ops.py::TestCommonCUDA::test_dtypes__refs_trunc_cuda PASSED [ 3%] 2023-01-11T23:10:16.9738877Z test_ops.py::TestCommonCUDA::test_dtypes__refs_unflatten_cuda PASSED [ 3%] 2023-01-11T23:10:16.9739191Z test_ops.py::TestCommonCUDA::test_dtypes__refs_unfold_copy_cuda PASSED [ 3%] 2023-01-11T23:10:16.9739500Z test_ops.py::TestCommonCUDA::test_dtypes__refs_unfold_cuda PASSED [ 3%] 2023-01-11T23:10:16.9739801Z test_ops.py::TestCommonCUDA::test_dtypes__refs_view_cuda PASSED [ 3%] 2023-01-11T23:10:16.9740104Z test_ops.py::TestCommonCUDA::test_dtypes__refs_vstack_cuda PASSED [ 3%] 2023-01-11T23:10:16.9740409Z test_ops.py::TestCommonCUDA::test_dtypes__refs_where_cuda PASSED [ 3%] 2023-01-11T23:10:16.9740710Z test_ops.py::TestCommonCUDA::test_dtypes__refs_xlogy_cuda PASSED [ 3%] 2023-01-11T23:10:16.9741011Z test_ops.py::TestCommonCUDA::test_dtypes_acosh_cuda PASSED [ 3%] 2023-01-11T23:10:16.9741314Z test_ops.py::TestCommonCUDA::test_dtypes_add_cuda PASSED [ 3%] 2023-01-11T23:10:16.9741611Z test_ops.py::TestCommonCUDA::test_dtypes_addcmul_cuda PASSED [ 3%] 2023-01-11T23:10:16.9741913Z test_ops.py::TestCommonCUDA::test_dtypes_addmv_cuda PASSED [ 3%] 2023-01-11T23:10:16.9742242Z test_ops.py::TestCommonCUDA::test_dtypes_all_cuda PASSED [ 3%] 2023-01-11T23:10:16.9742544Z test_ops.py::TestCommonCUDA::test_dtypes_amax_cuda PASSED [ 3%] 2023-01-11T23:10:16.9742839Z test_ops.py::TestCommonCUDA::test_dtypes_aminmax_cuda PASSED [ 3%] 2023-01-11T23:10:16.9743138Z test_ops.py::TestCommonCUDA::test_dtypes_any_cuda PASSED [ 3%] 2023-01-11T23:10:16.9743435Z test_ops.py::TestCommonCUDA::test_dtypes_arange_cuda PASSED [ 3%] 2023-01-11T23:10:16.9743733Z test_ops.py::TestCommonCUDA::test_dtypes_argmin_cuda PASSED [ 3%] 2023-01-11T23:10:16.9744034Z 
test_ops.py::TestCommonCUDA::test_dtypes_argsort_cuda PASSED [ 3%] 2023-01-11T23:10:16.9744337Z test_ops.py::TestCommonCUDA::test_dtypes_as_strided_cuda PASSED [ 3%] 2023-01-11T23:10:16.9744665Z test_ops.py::TestCommonCUDA::test_dtypes_as_strided_partial_views_cuda PASSED [ 3%] 2023-01-11T23:10:16.9744991Z test_ops.py::TestCommonCUDA::test_dtypes_as_strided_scatter_cuda PASSED [ 3%] 2023-01-11T23:10:16.9745300Z test_ops.py::TestCommonCUDA::test_dtypes_asinh_cuda PASSED [ 3%] 2023-01-11T23:10:16.9745597Z test_ops.py::TestCommonCUDA::test_dtypes_atan_cuda PASSED [ 3%] 2023-01-11T23:10:16.9745896Z test_ops.py::TestCommonCUDA::test_dtypes_atleast_1d_cuda PASSED [ 3%] 2023-01-11T23:10:16.9746201Z test_ops.py::TestCommonCUDA::test_dtypes_atleast_2d_cuda PASSED [ 3%] 2023-01-11T23:10:16.9746506Z test_ops.py::TestCommonCUDA::test_dtypes_bernoulli_cuda PASSED [ 3%] 2023-01-11T23:10:16.9746811Z test_ops.py::TestCommonCUDA::test_dtypes_bfloat16_cuda PASSED [ 3%] 2023-01-11T23:10:16.9747118Z test_ops.py::TestCommonCUDA::test_dtypes_bitwise_left_shift_cuda PASSED [ 3%] 2023-01-11T23:10:16.9747464Z test_ops.py::TestCommonCUDA::test_dtypes_bitwise_right_shift_cuda PASSED [ 3%] 2023-01-11T23:10:16.9747787Z test_ops.py::TestCommonCUDA::test_dtypes_bmm_cuda PASSED [ 3%] 2023-01-11T23:10:16.9748083Z test_ops.py::TestCommonCUDA::test_dtypes_bool_cuda PASSED [ 3%] 2023-01-11T23:10:16.9748408Z test_ops.py::TestCommonCUDA::test_dtypes_broadcast_shapes_cuda SKIPPED (Skipped!) [ 3%] 2023-01-11T23:10:16.9748811Z test_ops.py::TestCommonCUDA::test_dtypes_broadcast_to_cuda PASSED [ 3%] 2023-01-11T23:10:16.9749117Z test_ops.py::TestCommonCUDA::test_dtypes_bucketize_cuda PASSED [ 3%] 2023-01-11T23:10:16.9749415Z test_ops.py::TestCommonCUDA::test_dtypes_byte_cuda PASSED [ 3%] 2023-01-11T23:10:16.9749722Z test_ops.py::TestCommonCUDA::test_dtypes_cartesian_prod_cuda PASSED [ 3%] 2023-01-11T23:10:16.9750059Z test_ops.py::TestCommonCUDA::test_dtypes_cat_cuda PASSED [ 3%] 2023-01-11T23:10:16.9750384Z test_ops.py::TestCommonCUDA::test_dtypes_cdist_cuda PASSED [ 3%] 2023-01-11T23:10:16.9750687Z test_ops.py::TestCommonCUDA::test_dtypes_ceil_cuda PASSED [ 3%] 2023-01-11T23:10:16.9750992Z test_ops.py::TestCommonCUDA::test_dtypes_chalf_cuda PASSED [ 3%] 2023-01-11T23:10:16.9751291Z test_ops.py::TestCommonCUDA::test_dtypes_char_cuda PASSED [ 3%] 2023-01-11T23:10:16.9751595Z test_ops.py::TestCommonCUDA::test_dtypes_cholesky_inverse_cuda PASSED [ 3%] 2023-01-11T23:10:16.9751903Z test_ops.py::TestCommonCUDA::test_dtypes_clamp_min_cuda PASSED [ 3%] 2023-01-11T23:10:16.9752206Z test_ops.py::TestCommonCUDA::test_dtypes_clone_cuda PASSED [ 3%] 2023-01-11T23:10:16.9752505Z test_ops.py::TestCommonCUDA::test_dtypes_combinations_cuda PASSED [ 3%] 2023-01-11T23:10:16.9752816Z test_ops.py::TestCommonCUDA::test_dtypes_conj_physical_cuda PASSED [ 3%] 2023-01-11T23:10:16.9753124Z test_ops.py::TestCommonCUDA::test_dtypes_copysign_cuda PASSED [ 3%] 2023-01-11T23:10:16.9753434Z test_ops.py::TestCommonCUDA::test_dtypes_cosh_cuda PASSED [ 3%] 2023-01-11T23:10:16.9753733Z test_ops.py::TestCommonCUDA::test_dtypes_count_nonzero_cuda PASSED [ 3%] 2023-01-11T23:10:16.9754066Z test_ops.py::TestCommonCUDA::test_dtypes_cov_cuda PASSED [ 3%] 2023-01-11T23:10:16.9754364Z test_ops.py::TestCommonCUDA::test_dtypes_cross_cuda PASSED [ 3%] 2023-01-11T23:10:16.9754660Z test_ops.py::TestCommonCUDA::test_dtypes_cummax_cuda PASSED [ 3%] 2023-01-11T23:10:16.9754963Z test_ops.py::TestCommonCUDA::test_dtypes_cumprod_cuda PASSED [ 3%] 2023-01-11T23:10:16.9755284Z 
test_ops.py::TestCommonCUDA::test_dtypes_cumulative_trapezoid_cuda PASSED [ 3%] 2023-01-11T23:10:16.9755604Z test_ops.py::TestCommonCUDA::test_dtypes_diag_embed_cuda PASSED [ 3%] 2023-01-11T23:10:16.9755905Z test_ops.py::TestCommonCUDA::test_dtypes_diagonal_cuda PASSED [ 3%] 2023-01-11T23:10:16.9756213Z test_ops.py::TestCommonCUDA::test_dtypes_digamma_cuda PASSED [ 3%] 2023-01-11T23:10:16.9756521Z test_ops.py::TestCommonCUDA::test_dtypes_div_floor_rounding_cuda PASSED [ 3%] 2023-01-11T23:10:16.9756826Z test_ops.py::TestCommonCUDA::test_dtypes_dot_cuda PASSED [ 3%] 2023-01-11T23:10:16.9757127Z test_ops.py::TestCommonCUDA::test_dtypes_dsplit_cuda PASSED [ 3%] 2023-01-11T23:10:16.9757433Z test_ops.py::TestCommonCUDA::test_dtypes_empty_cuda PASSED [ 3%] 2023-01-11T23:10:16.9757735Z test_ops.py::TestCommonCUDA::test_dtypes_equal_cuda PASSED [ 3%] 2023-01-11T23:10:16.9758030Z test_ops.py::TestCommonCUDA::test_dtypes_erf_cuda PASSED [ 3%] 2023-01-11T23:10:16.9758327Z test_ops.py::TestCommonCUDA::test_dtypes_fft_fftn_cuda PASSED [ 3%] 2023-01-11T23:10:16.9758631Z test_ops.py::TestCommonCUDA::test_dtypes_fft_hfftn_cuda PASSED [ 3%] 2023-01-11T23:10:16.9758958Z test_ops.py::TestCommonCUDA::test_dtypes_fft_ifftn_cuda PASSED [ 3%] 2023-01-11T23:10:16.9759267Z test_ops.py::TestCommonCUDA::test_dtypes_fft_ifftshift_cuda PASSED [ 3%] 2023-01-11T23:10:16.9759572Z test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfft2_cuda PASSED [ 3%] 2023-01-11T23:10:16.9759884Z test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfftn_cuda PASSED [ 3%] 2023-01-11T23:10:16.9760184Z test_ops.py::TestCommonCUDA::test_dtypes_fft_irfftn_cuda PASSED [ 3%] 2023-01-11T23:10:16.9760491Z test_ops.py::TestCommonCUDA::test_dtypes_fft_rfftn_cuda PASSED [ 3%] 2023-01-11T23:10:16.9760840Z test_ops.py::TestCommonCUDA::test_dtypes_flatten_cuda PASSED [ 3%] 2023-01-11T23:10:16.9761139Z test_ops.py::TestCommonCUDA::test_dtypes_fliplr_cuda PASSED [ 3%] 2023-01-11T23:10:16.9761440Z test_ops.py::TestCommonCUDA::test_dtypes_flipud_cuda PASSED [ 3%] 2023-01-11T23:10:16.9761746Z test_ops.py::TestCommonCUDA::test_dtypes_float_power_cuda PASSED [ 3%] 2023-01-11T23:10:16.9762053Z test_ops.py::TestCommonCUDA::test_dtypes_floor_divide_cuda PASSED [ 3%] 2023-01-11T23:10:16.9762357Z test_ops.py::TestCommonCUDA::test_dtypes_fmax_cuda PASSED [ 3%] 2023-01-11T23:10:16.9762656Z test_ops.py::TestCommonCUDA::test_dtypes_fmin_cuda PASSED [ 3%] 2023-01-11T23:10:16.9762954Z test_ops.py::TestCommonCUDA::test_dtypes_frac_cuda PASSED [ 3%] 2023-01-11T23:10:16.9763248Z test_ops.py::TestCommonCUDA::test_dtypes_full_cuda PASSED [ 3%] 2023-01-11T23:10:16.9763545Z test_ops.py::TestCommonCUDA::test_dtypes_full_like_cuda PASSED [ 3%] 2023-01-11T23:10:16.9763847Z test_ops.py::TestCommonCUDA::test_dtypes_geqrf_cuda XFAIL [ 3%] 2023-01-11T23:10:16.9764142Z test_ops.py::TestCommonCUDA::test_dtypes_gradient_cuda PASSED [ 3%] 2023-01-11T23:10:16.9764449Z test_ops.py::TestCommonCUDA::test_dtypes_grid_sampler_2d_cuda PASSED [ 3%] 2023-01-11T23:10:16.9764754Z test_ops.py::TestCommonCUDA::test_dtypes_gt_cuda PASSED [ 3%] 2023-01-11T23:10:16.9765057Z test_ops.py::TestCommonCUDA::test_dtypes_histogram_cuda PASSED [ 3%] 2023-01-11T23:10:16.9765391Z test_ops.py::TestCommonCUDA::test_dtypes_hstack_cuda PASSED [ 3%] 2023-01-11T23:10:16.9765690Z test_ops.py::TestCommonCUDA::test_dtypes_hypot_cuda PASSED [ 3%] 2023-01-11T23:10:16.9765987Z test_ops.py::TestCommonCUDA::test_dtypes_i0_cuda PASSED [ 3%] 2023-01-11T23:10:16.9766280Z test_ops.py::TestCommonCUDA::test_dtypes_index_add_cuda PASSED [ 3%] 
2023-01-11T23:10:16.9766585Z test_ops.py::TestCommonCUDA::test_dtypes_index_reduce_cuda PASSED [ 3%] 2023-01-11T23:10:16.9766891Z test_ops.py::TestCommonCUDA::test_dtypes_int_cuda PASSED [ 3%] 2023-01-11T23:10:16.9767191Z test_ops.py::TestCommonCUDA::test_dtypes_isfinite_cuda PASSED [ 3%] 2023-01-11T23:10:16.9767492Z test_ops.py::TestCommonCUDA::test_dtypes_isin_cuda PASSED [ 3%] 2023-01-11T23:10:16.9767796Z test_ops.py::TestCommonCUDA::test_dtypes_isinf_cuda PASSED [ 3%] 2023-01-11T23:10:16.9768096Z test_ops.py::TestCommonCUDA::test_dtypes_isnan_cuda PASSED [ 3%] 2023-01-11T23:10:16.9768400Z test_ops.py::TestCommonCUDA::test_dtypes_isneginf_cuda PASSED [ 3%] 2023-01-11T23:10:16.9768700Z test_ops.py::TestCommonCUDA::test_dtypes_isreal_cuda PASSED [ 3%] 2023-01-11T23:10:16.9769008Z test_ops.py::TestCommonCUDA::test_dtypes_jiterator_binary_cuda PASSED [ 3%] 2023-01-11T23:10:16.9769343Z test_ops.py::TestCommonCUDA::test_dtypes_jiterator_binary_return_by_ref_cuda PASSED [ 3%] 2023-01-11T23:10:16.9769669Z test_ops.py::TestCommonCUDA::test_dtypes_jiterator_unary_cuda PASSED [ 3%] 2023-01-11T23:10:16.9769978Z test_ops.py::TestCommonCUDA::test_dtypes_kron_cuda PASSED [ 3%] 2023-01-11T23:10:16.9770294Z test_ops.py::TestCommonCUDA::test_dtypes_kthvalue_cuda PASSED [ 3%] 2023-01-11T23:10:16.9770667Z test_ops.py::TestCommonCUDA::test_dtypes_lcm_cuda PASSED [ 3%] 2023-01-11T23:10:16.9770970Z test_ops.py::TestCommonCUDA::test_dtypes_ldexp_cuda PASSED [ 3%] 2023-01-11T23:10:16.9771276Z test_ops.py::TestCommonCUDA::test_dtypes_lerp_cuda PASSED [ 3%] 2023-01-11T23:10:16.9771586Z test_ops.py::TestCommonCUDA::test_dtypes_lgamma_cuda PASSED [ 3%] 2023-01-11T23:10:16.9771889Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_cross_cuda PASSED [ 3%] 2023-01-11T23:10:16.9772194Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_det_cuda PASSED [ 3%] 2023-01-11T23:10:16.9772504Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_eig_cuda PASSED [ 3%] 2023-01-11T23:10:16.9772807Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigvalsh_cuda PASSED [ 4%] 2023-01-11T23:10:16.9773134Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_lstsq_grad_oriented_cuda PASSED [ 4%] 2023-01-11T23:10:16.9773457Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_cuda PASSED [ 4%] 2023-01-11T23:10:16.9773762Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_factor_cuda PASSED [ 4%] 2023-01-11T23:10:16.9774072Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_factor_ex_cuda PASSED [ 4%] 2023-01-11T23:10:16.9774383Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_solve_cuda PASSED [ 4%] 2023-01-11T23:10:16.9774796Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_norm_cuda PASSED [ 4%] 2023-01-11T23:10:16.9775108Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_power_cuda PASSED [ 4%] 2023-01-11T23:10:16.9775425Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_rank_cuda PASSED [ 4%] 2023-01-11T23:10:16.9775753Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_rank_hermitian_cuda PASSED [ 4%] 2023-01-11T23:10:16.9776078Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_cuda PASSED [ 4%] 2023-01-11T23:10:16.9776393Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_triangular_cuda PASSED [ 4%] 2023-01-11T23:10:16.9776718Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_svd_cuda PASSED [ 4%] 2023-01-11T23:10:16.9777027Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_svdvals_cuda PASSED [ 4%] 2023-01-11T23:10:16.9777378Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorinv_cuda PASSED [ 4%] 
2023-01-11T23:10:16.9777694Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_vander_cuda PASSED [ 4%] 2023-01-11T23:10:16.9777999Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_vecdot_cuda PASSED [ 4%] 2023-01-11T23:10:16.9778313Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_vector_norm_cuda PASSED [ 4%] 2023-01-11T23:10:16.9778620Z test_ops.py::TestCommonCUDA::test_dtypes_logaddexp2_cuda PASSED [ 4%] 2023-01-11T23:10:16.9778926Z test_ops.py::TestCommonCUDA::test_dtypes_logaddexp_cuda PASSED [ 4%] 2023-01-11T23:10:16.9779239Z test_ops.py::TestCommonCUDA::test_dtypes_logcumsumexp_cuda PASSED [ 4%] 2023-01-11T23:10:16.9779547Z test_ops.py::TestCommonCUDA::test_dtypes_logdet_cuda PASSED [ 4%] 2023-01-11T23:10:16.9779860Z test_ops.py::TestCommonCUDA::test_dtypes_logical_and_cuda PASSED [ 4%] 2023-01-11T23:10:16.9780213Z test_ops.py::TestCommonCUDA::test_dtypes_logical_not_cuda PASSED [ 4%] 2023-01-11T23:10:16.9780518Z test_ops.py::TestCommonCUDA::test_dtypes_logical_xor_cuda PASSED [ 4%] 2023-01-11T23:10:16.9780825Z test_ops.py::TestCommonCUDA::test_dtypes_logsumexp_cuda PASSED [ 4%] 2023-01-11T23:10:16.9781127Z test_ops.py::TestCommonCUDA::test_dtypes_long_cuda PASSED [ 4%] 2023-01-11T23:10:16.9781423Z test_ops.py::TestCommonCUDA::test_dtypes_lt_cuda PASSED [ 4%] 2023-01-11T23:10:16.9781714Z test_ops.py::TestCommonCUDA::test_dtypes_lu_cuda PASSED [ 4%] 2023-01-11T23:10:16.9782010Z test_ops.py::TestCommonCUDA::test_dtypes_lu_solve_cuda PASSED [ 4%] 2023-01-11T23:10:16.9782348Z test_ops.py::TestCommonCUDA::test_dtypes_masked_amin_cuda PASSED [ 4%] 2023-01-11T23:10:16.9782661Z test_ops.py::TestCommonCUDA::test_dtypes_masked_fill_cuda PASSED [ 4%] 2023-01-11T23:10:16.9782971Z test_ops.py::TestCommonCUDA::test_dtypes_masked_log_softmax_cuda PASSED [ 4%] 2023-01-11T23:10:16.9783285Z test_ops.py::TestCommonCUDA::test_dtypes_masked_logaddexp_cuda PASSED [ 4%] 2023-01-11T23:10:16.9783605Z test_ops.py::TestCommonCUDA::test_dtypes_masked_logsumexp_cuda PASSED [ 4%] 2023-01-11T23:10:16.9783909Z test_ops.py::TestCommonCUDA::test_dtypes_masked_mean_cuda PASSED [ 4%] 2023-01-11T23:10:16.9784216Z test_ops.py::TestCommonCUDA::test_dtypes_masked_median_cuda PASSED [ 4%] 2023-01-11T23:10:16.9784521Z test_ops.py::TestCommonCUDA::test_dtypes_masked_norm_cuda PASSED [ 4%] 2023-01-11T23:10:16.9784829Z test_ops.py::TestCommonCUDA::test_dtypes_masked_softmax_cuda PASSED [ 4%] 2023-01-11T23:10:16.9785137Z test_ops.py::TestCommonCUDA::test_dtypes_masked_softmin_cuda PASSED [ 4%] 2023-01-11T23:10:16.9785445Z test_ops.py::TestCommonCUDA::test_dtypes_masked_std_cuda PASSED [ 4%] 2023-01-11T23:10:16.9785749Z test_ops.py::TestCommonCUDA::test_dtypes_masked_sum_cuda PASSED [ 4%] 2023-01-11T23:10:16.9786050Z test_ops.py::TestCommonCUDA::test_dtypes_masked_var_cuda PASSED [ 4%] 2023-01-11T23:10:16.9786354Z test_ops.py::TestCommonCUDA::test_dtypes_matrix_exp_cuda PASSED [ 4%] 2023-01-11T23:10:16.9786673Z test_ops.py::TestCommonCUDA::test_dtypes_max_reduction_no_dim_cuda PASSED [ 4%] 2023-01-11T23:10:16.9786989Z test_ops.py::TestCommonCUDA::test_dtypes_mean_cuda PASSED [ 4%] 2023-01-11T23:10:16.9787308Z test_ops.py::TestCommonCUDA::test_dtypes_meshgrid_variadic_tensors_cuda PASSED [ 4%] 2023-01-11T23:10:16.9787632Z test_ops.py::TestCommonCUDA::test_dtypes_min_binary_cuda PASSED [ 4%] 2023-01-11T23:10:16.9787944Z test_ops.py::TestCommonCUDA::test_dtypes_min_reduction_no_dim_cuda PASSED [ 4%] 2023-01-11T23:10:16.9788261Z test_ops.py::TestCommonCUDA::test_dtypes_minimum_cuda PASSED [ 4%] 2023-01-11T23:10:16.9788561Z 
test_ops.py::TestCommonCUDA::test_dtypes_mm_cuda PASSED [ 4%] 2023-01-11T23:10:16.9788972Z test_ops.py::TestCommonCUDA::test_dtypes_movedim_cuda PASSED [ 4%] 2023-01-11T23:10:16.9789278Z test_ops.py::TestCommonCUDA::test_dtypes_msort_cuda PASSED [ 4%] 2023-01-11T23:10:16.9789590Z test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_5_cuda PASSED [ 4%] 2023-01-11T23:10:16.9789908Z test_ops.py::TestCommonCUDA::test_dtypes_nanmean_cuda PASSED [ 4%] 2023-01-11T23:10:16.9790212Z test_ops.py::TestCommonCUDA::test_dtypes_narrow_cuda PASSED [ 4%] 2023-01-11T23:10:16.9790515Z test_ops.py::TestCommonCUDA::test_dtypes_native_batch_norm_cuda PASSED [ 4%] 2023-01-11T23:10:16.9790840Z test_ops.py::TestCommonCUDA::test_dtypes_native_dropout_backward_cuda PASSED [ 4%] 2023-01-11T23:10:16.9791164Z test_ops.py::TestCommonCUDA::test_dtypes_new_empty_cuda PASSED [ 4%] 2023-01-11T23:10:16.9791469Z test_ops.py::TestCommonCUDA::test_dtypes_new_full_cuda PASSED [ 4%] 2023-01-11T23:10:16.9791766Z test_ops.py::TestCommonCUDA::test_dtypes_new_ones_cuda PASSED [ 4%] 2023-01-11T23:10:16.9792065Z test_ops.py::TestCommonCUDA::test_dtypes_new_zeros_cuda PASSED [ 4%] 2023-01-11T23:10:16.9792371Z test_ops.py::TestCommonCUDA::test_dtypes_nextafter_cuda PASSED [ 4%] 2023-01-11T23:10:16.9792699Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool1d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9793062Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool2d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9793417Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool3d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9793763Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_alpha_dropout_cuda PASSED [ 4%] 2023-01-11T23:10:16.9794127Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_avg_pool1d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9794459Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_batch_norm_cuda PASSED [ 4%] 2023-01-11T23:10:16.9794800Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_bilinear_cuda PASSED [ 4%] 2023-01-11T23:10:16.9795142Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_binary_cross_entropy_cuda PASSED [ 4%] 2023-01-11T23:10:16.9795514Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_binary_cross_entropy_with_logits_cuda PASSED [ 4%] 2023-01-11T23:10:16.9795868Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv1d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9796207Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose2d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9796555Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose3d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9796910Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cosine_embedding_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9797264Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cross_entropy_cuda PASSED [ 4%] 2023-01-11T23:10:16.9797605Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_ctc_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9797940Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_dropout2d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9798274Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_dropout3d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9798611Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_embedding_cuda PASSED [ 4%] 2023-01-11T23:10:16.9798953Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_fractional_max_pool2d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9799307Z 
test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_gaussian_nll_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9799643Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_gelu_cuda PASSED [ 4%] 2023-01-11T23:10:16.9799964Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_glu_cuda PASSED [ 4%] 2023-01-11T23:10:16.9800290Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardtanh_cuda PASSED [ 4%] 2023-01-11T23:10:16.9800659Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hinge_embedding_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9801096Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_instance_norm_cuda PASSED [ 4%] 2023-01-11T23:10:16.9801441Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_area_cuda PASSED [ 4%] 2023-01-11T23:10:16.9801793Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_bicubic_cuda PASSED [ 4%] 2023-01-11T23:10:16.9802152Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_linear_cuda PASSED [ 4%] 2023-01-11T23:10:16.9802493Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_l1_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9802825Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_leaky_relu_cuda PASSED [ 4%] 2023-01-11T23:10:16.9803170Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_margin_ranking_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9803522Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool2d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9803861Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool2d_grad_cuda PASSED [ 4%] 2023-01-11T23:10:16.9804208Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool3d_grad_cuda PASSED [ 4%] 2023-01-11T23:10:16.9804544Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_mish_cuda PASSED [ 4%] 2023-01-11T23:10:16.9804885Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_multilabel_margin_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9805228Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_nll_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9805568Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_circular_cuda PASSED [ 4%] 2023-01-11T23:10:16.9805907Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_reflect_cuda PASSED [ 4%] 2023-01-11T23:10:16.9806274Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pairwise_distance_cuda PASSED [ 4%] 2023-01-11T23:10:16.9806611Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_prelu_cuda PASSED [ 4%] 2023-01-11T23:10:16.9806948Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_silu_complex_cuda XFAIL [ 4%] 2023-01-11T23:10:16.9807286Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_smooth_l1_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9807620Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softplus_cuda PASSED [ 4%] 2023-01-11T23:10:16.9807955Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_threshold_cuda PASSED [ 4%] 2023-01-11T23:10:16.9808283Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_unfold_cuda PASSED [ 4%] 2023-01-11T23:10:16.9808597Z test_ops.py::TestCommonCUDA::test_dtypes_norm_inf_cuda PASSED [ 4%] 2023-01-11T23:10:16.9808910Z test_ops.py::TestCommonCUDA::test_dtypes_normal_number_mean_cuda PASSED [ 4%] 2023-01-11T23:10:16.9809223Z test_ops.py::TestCommonCUDA::test_dtypes_ones_like_cuda PASSED [ 4%] 2023-01-11T23:10:16.9809541Z test_ops.py::TestCommonCUDA::test_dtypes_ops_nvprims_var_mean_cuda PASSED [ 4%] 2023-01-11T23:10:16.9809868Z 
test_ops.py::TestCommonCUDA::test_dtypes_ops_nvprims_view_cuda PASSED [ 4%] 2023-01-11T23:10:16.9810212Z test_ops.py::TestCommonCUDA::test_dtypes_pca_lowrank_cuda PASSED [ 4%] 2023-01-11T23:10:16.9810551Z test_ops.py::TestCommonCUDA::test_dtypes_pinverse_cuda PASSED [ 4%] 2023-01-11T23:10:16.9810895Z test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_1_cuda SKIPPED (Skipped!) [ 4%] 2023-01-11T23:10:16.9811254Z test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_4_cuda SKIPPED (Skipped!) [ 4%] 2023-01-11T23:10:16.9811575Z test_ops.py::TestCommonCUDA::test_dtypes_qr_cuda PASSED [ 4%] 2023-01-11T23:10:16.9811879Z test_ops.py::TestCommonCUDA::test_dtypes_quantile_cuda PASSED [ 4%] 2023-01-11T23:10:16.9812186Z test_ops.py::TestCommonCUDA::test_dtypes_rad2deg_cuda PASSED [ 4%] 2023-01-11T23:10:16.9812493Z test_ops.py::TestCommonCUDA::test_dtypes_rand_like_cuda PASSED [ 4%] 2023-01-11T23:10:16.9812825Z test_ops.py::TestCommonCUDA::test_dtypes_randint_like_cuda PASSED [ 4%] 2023-01-11T23:10:16.9813129Z test_ops.py::TestCommonCUDA::test_dtypes_real_cuda PASSED [ 4%] 2023-01-11T23:10:16.9813439Z test_ops.py::TestCommonCUDA::test_dtypes_reciprocal_cuda PASSED [ 4%] 2023-01-11T23:10:16.9813752Z test_ops.py::TestCommonCUDA::test_dtypes_remainder_cuda PASSED [ 4%] 2023-01-11T23:10:16.9814063Z test_ops.py::TestCommonCUDA::test_dtypes_repeat_cuda PASSED [ 4%] 2023-01-11T23:10:16.9814363Z test_ops.py::TestCommonCUDA::test_dtypes_resize__cuda XFAIL [ 4%] 2023-01-11T23:10:16.9814769Z test_ops.py::TestCommonCUDA::test_dtypes_resolve_conj_cuda PASSED [ 4%] 2023-01-11T23:10:16.9815083Z test_ops.py::TestCommonCUDA::test_dtypes_resolve_neg_cuda PASSED [ 4%] 2023-01-11T23:10:16.9815387Z test_ops.py::TestCommonCUDA::test_dtypes_roll_cuda PASSED [ 4%] 2023-01-11T23:10:16.9815691Z test_ops.py::TestCommonCUDA::test_dtypes_rot90_cuda PASSED [ 4%] 2023-01-11T23:10:16.9816022Z test_ops.py::TestCommonCUDA::test_dtypes_round_decimals_3_cuda SKIPPED (Skipped!) 
[ 4%] 2023-01-11T23:10:16.9816357Z test_ops.py::TestCommonCUDA::test_dtypes_scalar_tensor_cuda PASSED [ 4%] 2023-01-11T23:10:16.9816662Z test_ops.py::TestCommonCUDA::test_dtypes_scatter_add_cuda PASSED [ 4%] 2023-01-11T23:10:16.9816975Z test_ops.py::TestCommonCUDA::test_dtypes_scatter_cuda PASSED [ 4%] 2023-01-11T23:10:16.9817291Z test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_amax_cuda PASSED [ 4%] 2023-01-11T23:10:16.9817609Z test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_prod_cuda PASSED [ 4%] 2023-01-11T23:10:16.9817934Z test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_sum_cuda PASSED [ 4%] 2023-01-11T23:10:16.9818300Z test_ops.py::TestCommonCUDA::test_dtypes_searchsorted_cuda PASSED [ 4%] 2023-01-11T23:10:16.9818640Z test_ops.py::TestCommonCUDA::test_dtypes_segment_reduce_lengths_cuda PASSED [ 4%] 2023-01-11T23:10:16.9818970Z test_ops.py::TestCommonCUDA::test_dtypes_select_scatter_cuda PASSED [ 4%] 2023-01-11T23:10:16.9819287Z test_ops.py::TestCommonCUDA::test_dtypes_short_cuda PASSED [ 4%] 2023-01-11T23:10:16.9819599Z test_ops.py::TestCommonCUDA::test_dtypes_sigmoid_cuda PASSED [ 4%] 2023-01-11T23:10:16.9819924Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_bartlett_cuda PASSED [ 4%] 2023-01-11T23:10:16.9820323Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_general_cosine_cuda PASSED [ 4%] 2023-01-11T23:10:16.9820683Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_general_hamming_cuda PASSED [ 4%] 2023-01-11T23:10:16.9821030Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_hann_cuda PASSED [ 4%] 2023-01-11T23:10:16.9821346Z test_ops.py::TestCommonCUDA::test_dtypes_sinc_cuda PASSED [ 4%] 2023-01-11T23:10:16.9821656Z test_ops.py::TestCommonCUDA::test_dtypes_sinh_cuda PASSED [ 4%] 2023-01-11T23:10:16.9821966Z test_ops.py::TestCommonCUDA::test_dtypes_slice_cuda PASSED [ 4%] 2023-01-11T23:10:16.9822279Z test_ops.py::TestCommonCUDA::test_dtypes_softmax_with_dtype_cuda PASSED [ 4%] 2023-01-11T23:10:16.9822598Z test_ops.py::TestCommonCUDA::test_dtypes_sort_cuda PASSED [ 4%] 2023-01-11T23:10:16.9822913Z test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_j1_cuda PASSED [ 4%] 2023-01-11T23:10:16.9823259Z test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_u_cuda PASSED [ 4%] 2023-01-11T23:10:16.9823597Z test_ops.py::TestCommonCUDA::test_dtypes_special_erfcx_cuda PASSED [ 4%] 2023-01-11T23:10:16.9823920Z test_ops.py::TestCommonCUDA::test_dtypes_special_i0e_cuda PASSED [ 4%] 2023-01-11T23:10:16.9824241Z test_ops.py::TestCommonCUDA::test_dtypes_special_i1_cuda PASSED [ 5%] 2023-01-11T23:10:16.9824555Z test_ops.py::TestCommonCUDA::test_dtypes_special_i1e_cuda PASSED [ 5%] 2023-01-11T23:10:16.9824894Z test_ops.py::TestCommonCUDA::test_dtypes_special_laguerre_polynomial_l_cuda PASSED [ 5%] 2023-01-11T23:10:16.9825474Z test_ops.py::TestCommonCUDA::test_dtypes_special_legendre_polynomial_p_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:10:16.9825874Z test_ops.py::TestCommonCUDA::test_dtypes_special_log_ndtr_cuda PASSED [ 5%] 2023-01-11T23:10:16.9826202Z test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_i1_cuda PASSED [ 5%] 2023-01-11T23:10:16.9826533Z test_ops.py::TestCommonCUDA::test_dtypes_special_ndtr_cuda PASSED [ 5%] 2023-01-11T23:10:16.9826873Z test_ops.py::TestCommonCUDA::test_dtypes_special_scaled_modified_bessel_k1_cuda PASSED [ 5%] 2023-01-11T23:10:16.9827412Z test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_u_cuda 
SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:10:16.9827999Z test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_v_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:10:16.9828397Z test_ops.py::TestCommonCUDA::test_dtypes_special_xlog1py_cuda PASSED [ 5%] 2023-01-11T23:10:16.9828778Z test_ops.py::TestCommonCUDA::test_dtypes_split_cuda PASSED [ 5%] 2023-01-11T23:10:16.9829090Z test_ops.py::TestCommonCUDA::test_dtypes_sqrt_cuda PASSED [ 5%] 2023-01-11T23:10:16.9829394Z test_ops.py::TestCommonCUDA::test_dtypes_square_cuda PASSED [ 5%] 2023-01-11T23:10:16.9829698Z test_ops.py::TestCommonCUDA::test_dtypes_std_cuda PASSED [ 5%] 2023-01-11T23:10:16.9830015Z test_ops.py::TestCommonCUDA::test_dtypes_std_mean_unbiased_cuda PASSED [ 5%] 2023-01-11T23:10:16.9830326Z test_ops.py::TestCommonCUDA::test_dtypes_sub_cuda PASSED [ 5%] 2023-01-11T23:10:16.9830693Z test_ops.py::TestCommonCUDA::test_dtypes_svd_lowrank_cuda PASSED [ 5%] 2023-01-11T23:10:16.9831028Z test_ops.py::TestCommonCUDA::test_dtypes_symeig_cuda PASSED [ 5%] 2023-01-11T23:10:16.9831338Z test_ops.py::TestCommonCUDA::test_dtypes_take_cuda PASSED [ 5%] 2023-01-11T23:10:16.9831636Z test_ops.py::TestCommonCUDA::test_dtypes_tanh_cuda PASSED [ 5%] 2023-01-11T23:10:16.9831946Z test_ops.py::TestCommonCUDA::test_dtypes_tensor_split_cuda PASSED [ 5%] 2023-01-11T23:10:16.9832259Z test_ops.py::TestCommonCUDA::test_dtypes_tensordot_cuda PASSED [ 5%] 2023-01-11T23:10:16.9832576Z test_ops.py::TestCommonCUDA::test_dtypes_to_sparse_cuda SKIPPED (Skipped!) [ 5%] 2023-01-11T23:10:16.9832900Z test_ops.py::TestCommonCUDA::test_dtypes_transpose_cuda PASSED [ 5%] 2023-01-11T23:10:16.9833215Z test_ops.py::TestCommonCUDA::test_dtypes_triangular_solve_cuda PASSED [ 5%] 2023-01-11T23:10:16.9833529Z test_ops.py::TestCommonCUDA::test_dtypes_tril_cuda PASSED [ 5%] 2023-01-11T23:10:16.9833826Z test_ops.py::TestCommonCUDA::test_dtypes_triu_cuda PASSED [ 5%] 2023-01-11T23:10:16.9834137Z test_ops.py::TestCommonCUDA::test_dtypes_unfold_copy_cuda PASSED [ 5%] 2023-01-11T23:10:16.9834450Z test_ops.py::TestCommonCUDA::test_dtypes_unfold_cuda PASSED [ 5%] 2023-01-11T23:10:16.9834750Z test_ops.py::TestCommonCUDA::test_dtypes_uniform_cuda PASSED [ 5%] 2023-01-11T23:10:16.9835073Z test_ops.py::TestCommonCUDA::test_dtypes_unique_consecutive_cuda PASSED [ 5%] 2023-01-11T23:10:16.9835391Z test_ops.py::TestCommonCUDA::test_dtypes_unsqueeze_cuda PASSED [ 5%] 2023-01-11T23:10:16.9835703Z test_ops.py::TestCommonCUDA::test_dtypes_var_mean_cuda PASSED [ 5%] 2023-01-11T23:10:16.9836006Z test_ops.py::TestCommonCUDA::test_dtypes_view_as_real_cuda PASSED [ 5%] 2023-01-11T23:10:16.9836316Z test_ops.py::TestCommonCUDA::test_dtypes_vsplit_cuda PASSED [ 5%] 2023-01-11T23:10:16.9836624Z test_ops.py::TestCommonCUDA::test_dtypes_vstack_cuda PASSED [ 5%] 2023-01-11T23:10:16.9836921Z test_ops.py::TestCommonCUDA::test_dtypes_where_cuda PASSED [ 5%] 2023-01-11T23:10:16.9837254Z test_ops.py::TestCommonCUDA::test_dtypes_xlogy_cuda PASSED [ 5%] 2023-01-11T23:10:16.9837562Z test_ops.py::TestCommonCUDA::test_dtypes_zero__cuda PASSED [ 5%] 2023-01-11T23:10:16.9837861Z test_ops.py::TestCommonCUDA::test_dtypes_zeros_like_cuda PASSED [ 5%] 2023-01-11T23:10:16.9838170Z test_ops.py::TestCommonCUDA::test_errors___radd___cuda PASSED [ 5%] 2023-01-11T23:10:16.9838473Z test_ops.py::TestCommonCUDA::test_errors___rdiv___cuda PASSED [ 5%] 2023-01-11T23:10:16.9838775Z 
test_ops.py::TestCommonCUDA::test_errors___rmul___cuda PASSED [ 5%] 2023-01-11T23:10:16.9839070Z test_ops.py::TestCommonCUDA::test_errors___rpow___cuda PASSED [ 5%] 2023-01-11T23:10:16.9839376Z test_ops.py::TestCommonCUDA::test_errors___rsub___cuda PASSED [ 5%] 2023-01-11T23:10:16.9839676Z test_ops.py::TestCommonCUDA::test_errors___rxor___cuda PASSED [ 5%] 2023-01-11T23:10:16.9839974Z test_ops.py::TestCommonCUDA::test_errors_amax_cuda PASSED [ 5%] 2023-01-11T23:10:16.9840275Z test_ops.py::TestCommonCUDA::test_errors_amin_cuda PASSED [ 5%] 2023-01-11T23:10:16.9840575Z test_ops.py::TestCommonCUDA::test_errors_atan2_cuda PASSED [ 5%] 2023-01-11T23:10:16.9840885Z test_ops.py::TestCommonCUDA::test_errors_bernoulli_cuda PASSED [ 5%] 2023-01-11T23:10:16.9841192Z test_ops.py::TestCommonCUDA::test_errors_bitwise_left_shift_cuda PASSED [ 5%] 2023-01-11T23:10:16.9841504Z test_ops.py::TestCommonCUDA::test_errors_bitwise_or_cuda PASSED [ 5%] 2023-01-11T23:10:16.9841817Z test_ops.py::TestCommonCUDA::test_errors_bitwise_right_shift_cuda PASSED [ 5%] 2023-01-11T23:10:16.9842151Z test_ops.py::TestCommonCUDA::test_errors_bitwise_xor_cuda PASSED [ 5%] 2023-01-11T23:10:16.9842462Z test_ops.py::TestCommonCUDA::test_errors_complex_cuda PASSED [ 5%] 2023-01-11T23:10:16.9842770Z test_ops.py::TestCommonCUDA::test_errors_copysign_cuda PASSED [ 5%] 2023-01-11T23:10:16.9843079Z test_ops.py::TestCommonCUDA::test_errors_cov_cuda PASSED [ 5%] 2023-01-11T23:10:16.9843374Z test_ops.py::TestCommonCUDA::test_errors_diag_cuda PASSED [ 5%] 2023-01-11T23:10:16.9843680Z test_ops.py::TestCommonCUDA::test_errors_diag_embed_cuda PASSED [ 5%] 2023-01-11T23:10:16.9843990Z test_ops.py::TestCommonCUDA::test_errors_diagonal_copy_cuda PASSED [ 5%] 2023-01-11T23:10:16.9844309Z test_ops.py::TestCommonCUDA::test_errors_div_no_rounding_mode_cuda PASSED [ 5%] 2023-01-11T23:10:16.9844632Z test_ops.py::TestCommonCUDA::test_errors_dstack_cuda PASSED [ 5%] 2023-01-11T23:10:16.9844934Z test_ops.py::TestCommonCUDA::test_errors_eq_cuda PASSED [ 5%] 2023-01-11T23:10:16.9845242Z test_ops.py::TestCommonCUDA::test_errors_fft_fft2_cuda PASSED [ 5%] 2023-01-11T23:10:16.9845542Z test_ops.py::TestCommonCUDA::test_errors_fft_fft_cuda PASSED [ 5%] 2023-01-11T23:10:16.9845849Z test_ops.py::TestCommonCUDA::test_errors_fft_hfftn_cuda PASSED [ 5%] 2023-01-11T23:10:16.9846159Z test_ops.py::TestCommonCUDA::test_errors_fft_ifftn_cuda PASSED [ 5%] 2023-01-11T23:10:16.9846459Z test_ops.py::TestCommonCUDA::test_errors_fft_ihfft_cuda PASSED [ 5%] 2023-01-11T23:10:16.9846767Z test_ops.py::TestCommonCUDA::test_errors_fft_ihfftn_cuda PASSED [ 5%] 2023-01-11T23:10:16.9847073Z test_ops.py::TestCommonCUDA::test_errors_fft_irfft2_cuda PASSED [ 5%] 2023-01-11T23:10:16.9847380Z test_ops.py::TestCommonCUDA::test_errors_fft_irfft_cuda PASSED [ 5%] 2023-01-11T23:10:16.9847681Z test_ops.py::TestCommonCUDA::test_errors_fft_irfftn_cuda PASSED [ 5%] 2023-01-11T23:10:16.9847992Z test_ops.py::TestCommonCUDA::test_errors_fft_rfft2_cuda PASSED [ 5%] 2023-01-11T23:10:16.9848300Z test_ops.py::TestCommonCUDA::test_errors_fft_rfft_cuda PASSED [ 5%] 2023-01-11T23:10:16.9848662Z test_ops.py::TestCommonCUDA::test_errors_fft_rfftn_cuda PASSED [ 5%] 2023-01-11T23:10:16.9848973Z test_ops.py::TestCommonCUDA::test_errors_floor_divide_cuda PASSED [ 5%] 2023-01-11T23:10:16.9849284Z test_ops.py::TestCommonCUDA::test_errors_fmin_cuda PASSED [ 5%] 2023-01-11T23:10:16.9849590Z test_ops.py::TestCommonCUDA::test_errors_gather_cuda PASSED [ 5%] 2023-01-11T23:10:16.9849891Z 
test_ops.py::TestCommonCUDA::test_errors_gradient_cuda PASSED [ 5%] 2023-01-11T23:10:16.9850228Z test_ops.py::TestCommonCUDA::test_errors_hsplit_cuda PASSED [ 5%] 2023-01-11T23:10:16.9850559Z test_ops.py::TestCommonCUDA::test_errors_hstack_cuda PASSED [ 5%] 2023-01-11T23:10:16.9850855Z test_ops.py::TestCommonCUDA::test_errors_hypot_cuda PASSED [ 5%] 2023-01-11T23:10:16.9851165Z test_ops.py::TestCommonCUDA::test_errors_isclose_cuda PASSED [ 5%] 2023-01-11T23:10:16.9851476Z test_ops.py::TestCommonCUDA::test_errors_jiterator_binary_cuda PASSED [ 5%] 2023-01-11T23:10:16.9851793Z test_ops.py::TestCommonCUDA::test_errors_ldexp_cuda PASSED [ 5%] 2023-01-11T23:10:16.9852089Z test_ops.py::TestCommonCUDA::test_errors_le_cuda PASSED [ 5%] 2023-01-11T23:10:16.9852394Z test_ops.py::TestCommonCUDA::test_errors_linalg_cross_cuda PASSED [ 5%] 2023-01-11T23:10:16.9852705Z test_ops.py::TestCommonCUDA::test_errors_linspace_cuda PASSED [ 5%] 2023-01-11T23:10:16.9853011Z test_ops.py::TestCommonCUDA::test_errors_logcumsumexp_cuda PASSED [ 5%] 2023-01-11T23:10:16.9853328Z test_ops.py::TestCommonCUDA::test_errors_logical_and_cuda PASSED [ 5%] 2023-01-11T23:10:16.9853636Z test_ops.py::TestCommonCUDA::test_errors_logical_or_cuda PASSED [ 5%] 2023-01-11T23:10:16.9853970Z test_ops.py::TestCommonCUDA::test_errors_logical_xor_cuda PASSED [ 5%] 2023-01-11T23:10:16.9854272Z test_ops.py::TestCommonCUDA::test_errors_masked_select_cuda PASSED [ 5%] 2023-01-11T23:10:16.9854708Z test_ops.py::TestCommonCUDA::test_errors_max_binary_cuda PASSED [ 5%] 2023-01-11T23:10:16.9855015Z test_ops.py::TestCommonCUDA::test_errors_mean_cuda PASSED [ 5%] 2023-01-11T23:10:16.9855311Z test_ops.py::TestCommonCUDA::test_errors_median_cuda PASSED [ 5%] 2023-01-11T23:10:16.9855616Z test_ops.py::TestCommonCUDA::test_errors_min_binary_cuda PASSED [ 5%] 2023-01-11T23:10:16.9855924Z test_ops.py::TestCommonCUDA::test_errors_minimum_cuda PASSED [ 5%] 2023-01-11T23:10:16.9856228Z test_ops.py::TestCommonCUDA::test_errors_movedim_cuda PASSED [ 5%] 2023-01-11T23:10:16.9856523Z test_ops.py::TestCommonCUDA::test_errors_mul_cuda PASSED [ 5%] 2023-01-11T23:10:16.9856832Z test_ops.py::TestCommonCUDA::test_errors_narrow_copy_cuda PASSED [ 5%] 2023-01-11T23:10:16.9857139Z test_ops.py::TestCommonCUDA::test_errors_narrow_cuda PASSED [ 5%] 2023-01-11T23:10:16.9857445Z test_ops.py::TestCommonCUDA::test_errors_native_layer_norm_cuda PASSED [ 5%] 2023-01-11T23:10:16.9857751Z test_ops.py::TestCommonCUDA::test_errors_ne_cuda PASSED [ 5%] 2023-01-11T23:10:16.9858066Z test_ops.py::TestCommonCUDA::test_errors_nextafter_cuda PASSED [ 5%] 2023-01-11T23:10:16.9858394Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool1d_cuda PASSED [ 5%] 2023-01-11T23:10:16.9858729Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool2d_cuda PASSED [ 5%] 2023-01-11T23:10:16.9859071Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_embedding_cuda PASSED [ 5%] 2023-01-11T23:10:16.9859401Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_gelu_cuda PASSED [ 5%] 2023-01-11T23:10:16.9859726Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_group_norm_cuda PASSED [ 5%] 2023-01-11T23:10:16.9860065Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_huber_loss_cuda PASSED [ 5%] 2023-01-11T23:10:16.9860402Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_l1_loss_cuda PASSED [ 5%] 2023-01-11T23:10:16.9860777Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_max_pool1d_cuda PASSED [ 5%] 2023-01-11T23:10:16.9861104Z 
test_ops.py::TestCommonCUDA::test_errors_nn_functional_max_pool2d_cuda PASSED [ 5%] 2023-01-11T23:10:16.9861446Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_poisson_nll_loss_cuda PASSED [ 5%] 2023-01-11T23:10:16.9861785Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_prelu_cuda PASSED [ 5%] 2023-01-11T23:10:16.9862138Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_triplet_margin_with_distance_loss_cuda PASSED [ 5%] 2023-01-11T23:10:16.9862483Z test_ops.py::TestCommonCUDA::test_errors_ormqr_cuda PASSED [ 5%] 2023-01-11T23:10:16.9862793Z test_ops.py::TestCommonCUDA::test_errors_pow_cuda PASSED [ 5%] 2023-01-11T23:10:16.9863100Z test_ops.py::TestCommonCUDA::test_errors_renorm_cuda PASSED [ 5%] 2023-01-11T23:10:16.9863411Z test_ops.py::TestCommonCUDA::test_errors_reshape_as_cuda PASSED [ 5%] 2023-01-11T23:10:16.9863723Z test_ops.py::TestCommonCUDA::test_errors_reshape_cuda PASSED [ 5%] 2023-01-11T23:10:16.9864032Z test_ops.py::TestCommonCUDA::test_errors_roll_cuda PASSED [ 5%] 2023-01-11T23:10:16.9864329Z test_ops.py::TestCommonCUDA::test_errors_rot90_cuda PASSED [ 5%] 2023-01-11T23:10:16.9864641Z test_ops.py::TestCommonCUDA::test_errors_scatter_add_cuda PASSED [ 5%] 2023-01-11T23:10:16.9864951Z test_ops.py::TestCommonCUDA::test_errors_scatter_cuda PASSED [ 5%] 2023-01-11T23:10:16.9865275Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_cosine_cuda PASSED [ 5%] 2023-01-11T23:10:16.9865622Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_gaussian_cuda PASSED [ 5%] 2023-01-11T23:10:16.9866006Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_general_cosine_cuda PASSED [ 5%] 2023-01-11T23:10:16.9866349Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_kaiser_cuda PASSED [ 5%] 2023-01-11T23:10:16.9866683Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_nuttall_cuda PASSED [ 5%] 2023-01-11T23:10:16.9867030Z test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_u_cuda PASSED [ 5%] 2023-01-11T23:10:16.9867385Z test_ops.py::TestCommonCUDA::test_errors_special_hermite_polynomial_h_cuda PASSED [ 5%] 2023-01-11T23:10:16.9867914Z test_ops.py::TestCommonCUDA::test_errors_special_legendre_polynomial_p_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:10:16.9868490Z test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_t_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:10:16.9869155Z test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_w_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:10:16.9869560Z test_ops.py::TestCommonCUDA::test_errors_special_xlog1py_cuda PASSED [ 5%] 2023-01-11T23:10:16.9869883Z test_ops.py::TestCommonCUDA::test_errors_sub_cuda PASSED [ 5%] 2023-01-11T23:10:16.9870184Z test_ops.py::TestCommonCUDA::test_errors_t_cuda PASSED [ 5%] 2023-01-11T23:10:16.9870486Z test_ops.py::TestCommonCUDA::test_errors_tril_cuda PASSED [ 5%] 2023-01-11T23:10:16.9870808Z test_ops.py::TestCommonCUDA::test_errors_true_divide_cuda PASSED [ 5%] 2023-01-11T23:10:16.9871157Z test_ops.py::TestCommonCUDA::test_errors_unbind_cuda PASSED [ 5%] 2023-01-11T23:10:16.9871466Z test_ops.py::TestCommonCUDA::test_errors_view_as_cuda PASSED [ 5%] 2023-01-11T23:10:16.9871771Z test_ops.py::TestCommonCUDA::test_errors_view_copy_cuda PASSED [ 5%] 2023-01-11T23:10:16.9872083Z test_ops.py::TestCommonCUDA::test_errors_xlogy_cuda PASSED [ 5%] 2023-01-11T23:10:16.9872432Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_H_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9872857Z test_ops.py::TestCommonCUDA::test_multiple_devices_T_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9873258Z test_ops.py::TestCommonCUDA::test_multiple_devices___radd___cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9873660Z test_ops.py::TestCommonCUDA::test_multiple_devices___radd___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9874048Z test_ops.py::TestCommonCUDA::test_multiple_devices___rdiv___cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9874441Z test_ops.py::TestCommonCUDA::test_multiple_devices___rdiv___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9874844Z test_ops.py::TestCommonCUDA::test_multiple_devices___rmatmul___cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9875251Z test_ops.py::TestCommonCUDA::test_multiple_devices___rmul___cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9875645Z test_ops.py::TestCommonCUDA::test_multiple_devices___rmul___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9876037Z test_ops.py::TestCommonCUDA::test_multiple_devices___rsub___cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9876453Z test_ops.py::TestCommonCUDA::test_multiple_devices__softmax_backward_data_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9876867Z test_ops.py::TestCommonCUDA::test_multiple_devices_abs_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9877253Z test_ops.py::TestCommonCUDA::test_multiple_devices_abs_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9877670Z test_ops.py::TestCommonCUDA::test_multiple_devices_acos_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9878064Z test_ops.py::TestCommonCUDA::test_multiple_devices_acosh_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9878452Z test_ops.py::TestCommonCUDA::test_multiple_devices_add_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9878854Z test_ops.py::TestCommonCUDA::test_multiple_devices_addcdiv_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9879255Z test_ops.py::TestCommonCUDA::test_multiple_devices_addcmul_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9879652Z test_ops.py::TestCommonCUDA::test_multiple_devices_addmm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9880043Z test_ops.py::TestCommonCUDA::test_multiple_devices_addr_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9880438Z test_ops.py::TestCommonCUDA::test_multiple_devices_addr_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9880827Z test_ops.py::TestCommonCUDA::test_multiple_devices_all_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9881227Z test_ops.py::TestCommonCUDA::test_multiple_devices_allclose_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9881622Z test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9882012Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9882401Z test_ops.py::TestCommonCUDA::test_multiple_devices_amin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9882800Z test_ops.py::TestCommonCUDA::test_multiple_devices_aminmax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9883199Z test_ops.py::TestCommonCUDA::test_multiple_devices_aminmax_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9883594Z test_ops.py::TestCommonCUDA::test_multiple_devices_angle_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9884025Z test_ops.py::TestCommonCUDA::test_multiple_devices_any_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9884418Z test_ops.py::TestCommonCUDA::test_multiple_devices_any_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9884809Z test_ops.py::TestCommonCUDA::test_multiple_devices_arange_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9885206Z test_ops.py::TestCommonCUDA::test_multiple_devices_arange_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9885599Z test_ops.py::TestCommonCUDA::test_multiple_devices_argmax_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9885987Z test_ops.py::TestCommonCUDA::test_multiple_devices_argmin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9886379Z test_ops.py::TestCommonCUDA::test_multiple_devices_argwhere_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9886781Z test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9887199Z test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_partial_views_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9887625Z test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_partial_views_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9888039Z test_ops.py::TestCommonCUDA::test_multiple_devices_asin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9888438Z test_ops.py::TestCommonCUDA::test_multiple_devices_atan2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9888859Z test_ops.py::TestCommonCUDA::test_multiple_devices_atanh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9889249Z test_ops.py::TestCommonCUDA::test_multiple_devices_atanh_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9889652Z test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_1d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9890054Z test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_1d_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9890459Z test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9890902Z test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_2d_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9891302Z test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9891704Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_3d_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9892108Z test_ops.py::TestCommonCUDA::test_multiple_devices_bernoulli_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9892509Z test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_and_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9892925Z test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_right_shift_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9893340Z test_ops.py::TestCommonCUDA::test_multiple_devices_block_diag_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9893743Z test_ops.py::TestCommonCUDA::test_multiple_devices_block_diag_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9894135Z test_ops.py::TestCommonCUDA::test_multiple_devices_bmm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9894692Z test_ops.py::TestCommonCUDA::test_multiple_devices_bool_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9895143Z test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_tensors_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9895576Z test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_tensors_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9895992Z test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_to_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9896406Z test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_to_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9896815Z test_ops.py::TestCommonCUDA::test_multiple_devices_bucketize_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9897218Z test_ops.py::TestCommonCUDA::test_multiple_devices_byte_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9897624Z test_ops.py::TestCommonCUDA::test_multiple_devices_cartesian_prod_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9898037Z test_ops.py::TestCommonCUDA::test_multiple_devices_ceil_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9898434Z test_ops.py::TestCommonCUDA::test_multiple_devices_ceil_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9898837Z test_ops.py::TestCommonCUDA::test_multiple_devices_cfloat_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9899229Z test_ops.py::TestCommonCUDA::test_multiple_devices_cfloat_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9899635Z test_ops.py::TestCommonCUDA::test_multiple_devices_chalf_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9900055Z test_ops.py::TestCommonCUDA::test_multiple_devices_cholesky_inverse_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9900549Z test_ops.py::TestCommonCUDA::test_multiple_devices_chunk_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9900947Z test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_max_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9901355Z test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_min_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 
2023-01-11T23:10:16.9901765Z test_ops.py::TestCommonCUDA::test_multiple_devices_clone_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9902168Z test_ops.py::TestCommonCUDA::test_multiple_devices_column_stack_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9902581Z test_ops.py::TestCommonCUDA::test_multiple_devices_combinations_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9902988Z test_ops.py::TestCommonCUDA::test_multiple_devices_complex_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9903390Z test_ops.py::TestCommonCUDA::test_multiple_devices_conj_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9903793Z test_ops.py::TestCommonCUDA::test_multiple_devices_conj_physical_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9904210Z test_ops.py::TestCommonCUDA::test_multiple_devices_conj_physical_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9904621Z test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9905027Z test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9905419Z test_ops.py::TestCommonCUDA::test_multiple_devices_cos_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9905818Z test_ops.py::TestCommonCUDA::test_multiple_devices_cosh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9906217Z test_ops.py::TestCommonCUDA::test_multiple_devices_cosh_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9906644Z test_ops.py::TestCommonCUDA::test_multiple_devices_count_nonzero_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9907042Z test_ops.py::TestCommonCUDA::test_multiple_devices_cov_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9907442Z test_ops.py::TestCommonCUDA::test_multiple_devices_cross_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9907846Z test_ops.py::TestCommonCUDA::test_multiple_devices_cummax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9908244Z test_ops.py::TestCommonCUDA::test_multiple_devices_cummax_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9908634Z test_ops.py::TestCommonCUDA::test_multiple_devices_cumprod_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9909148Z test_ops.py::TestCommonCUDA::test_multiple_devices_deg2rad_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9909555Z test_ops.py::TestCommonCUDA::test_multiple_devices_deg2rad_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9909951Z test_ops.py::TestCommonCUDA::test_multiple_devices_diag_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9910348Z test_ops.py::TestCommonCUDA::test_multiple_devices_diag_embed_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9910762Z test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_copy_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9910963Z test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_copy_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 
2023-01-11T23:10:16.9911188Z test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9911395Z test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_scatter_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9911605Z test_ops.py::TestCommonCUDA::test_multiple_devices_div_no_rounding_mode_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9911813Z test_ops.py::TestCommonCUDA::test_multiple_devices_div_trunc_rounding_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9912006Z test_ops.py::TestCommonCUDA::test_multiple_devices_div_trunc_rounding_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9912201Z test_ops.py::TestCommonCUDA::test_multiple_devices_double_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9912392Z test_ops.py::TestCommonCUDA::test_multiple_devices_double_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9912586Z test_ops.py::TestCommonCUDA::test_multiple_devices_dsplit_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9912782Z test_ops.py::TestCommonCUDA::test_multiple_devices_dsplit_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9912976Z test_ops.py::TestCommonCUDA::test_multiple_devices_einsum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9913167Z test_ops.py::TestCommonCUDA::test_multiple_devices_empty_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9913365Z test_ops.py::TestCommonCUDA::test_multiple_devices_empty_like_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9913552Z test_ops.py::TestCommonCUDA::test_multiple_devices_eq_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9913746Z test_ops.py::TestCommonCUDA::test_multiple_devices_equal_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9913933Z test_ops.py::TestCommonCUDA::test_multiple_devices_equal_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9914144Z test_ops.py::TestCommonCUDA::test_multiple_devices_erf_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9914335Z test_ops.py::TestCommonCUDA::test_multiple_devices_erf_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9914524Z test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9914712Z test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9914905Z test_ops.py::TestCommonCUDA::test_multiple_devices_erfinv_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9915098Z test_ops.py::TestCommonCUDA::test_multiple_devices_exp2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9915289Z test_ops.py::TestCommonCUDA::test_multiple_devices_exp2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9915476Z test_ops.py::TestCommonCUDA::test_multiple_devices_exp_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9915670Z test_ops.py::TestCommonCUDA::test_multiple_devices_expand_as_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9915865Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_expand_as_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9916061Z test_ops.py::TestCommonCUDA::test_multiple_devices_expand_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9916254Z test_ops.py::TestCommonCUDA::test_multiple_devices_expand_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9916446Z test_ops.py::TestCommonCUDA::test_multiple_devices_expm1_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9916672Z test_ops.py::TestCommonCUDA::test_multiple_devices_eye_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9916864Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9917057Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9917247Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftn_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9917450Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftshift_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9917639Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9917832Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9918027Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfftn_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9918224Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9918419Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9918613Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9918806Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftn_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9919009Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftshift_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9919211Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftshift_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9919405Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9919628Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9919827Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfftn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9920023Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfftn_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9920238Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9920463Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfftn_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9920656Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9920851Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9921051Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfftn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9921245Z test_ops.py::TestCommonCUDA::test_multiple_devices_fill_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9921431Z test_ops.py::TestCommonCUDA::test_multiple_devices_flipud_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9921624Z test_ops.py::TestCommonCUDA::test_multiple_devices_flipud_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9921819Z test_ops.py::TestCommonCUDA::test_multiple_devices_floor_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9922018Z test_ops.py::TestCommonCUDA::test_multiple_devices_floor_divide_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9922229Z test_ops.py::TestCommonCUDA::test_multiple_devices_fmax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9922417Z test_ops.py::TestCommonCUDA::test_multiple_devices_fmod_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9922605Z test_ops.py::TestCommonCUDA::test_multiple_devices_fmod_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9922795Z test_ops.py::TestCommonCUDA::test_multiple_devices_full_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9922985Z test_ops.py::TestCommonCUDA::test_multiple_devices_ge_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9923170Z test_ops.py::TestCommonCUDA::test_multiple_devices_gradient_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9923375Z test_ops.py::TestCommonCUDA::test_multiple_devices_grid_sampler_2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9923568Z test_ops.py::TestCommonCUDA::test_multiple_devices_gt_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9923761Z test_ops.py::TestCommonCUDA::test_multiple_devices_half_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9923950Z test_ops.py::TestCommonCUDA::test_multiple_devices_half_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9924149Z test_ops.py::TestCommonCUDA::test_multiple_devices_heaviside_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9924343Z test_ops.py::TestCommonCUDA::test_multiple_devices_heaviside_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9924539Z test_ops.py::TestCommonCUDA::test_multiple_devices_histc_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9924729Z test_ops.py::TestCommonCUDA::test_multiple_devices_histc_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9924924Z test_ops.py::TestCommonCUDA::test_multiple_devices_hstack_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9925124Z test_ops.py::TestCommonCUDA::test_multiple_devices_i0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9925312Z test_ops.py::TestCommonCUDA::test_multiple_devices_i0_cuda_int64 
SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9925507Z test_ops.py::TestCommonCUDA::test_multiple_devices_igamma_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9925704Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_add_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9925899Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_fill_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9926095Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_put_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9926298Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9926500Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9926695Z test_ops.py::TestCommonCUDA::test_multiple_devices_inner_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9926886Z test_ops.py::TestCommonCUDA::test_multiple_devices_int_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9927072Z test_ops.py::TestCommonCUDA::test_multiple_devices_isin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9927258Z test_ops.py::TestCommonCUDA::test_multiple_devices_isin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9927449Z test_ops.py::TestCommonCUDA::test_multiple_devices_isinf_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9927665Z test_ops.py::TestCommonCUDA::test_multiple_devices_isnan_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9927864Z test_ops.py::TestCommonCUDA::test_multiple_devices_isneginf_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9928059Z test_ops.py::TestCommonCUDA::test_multiple_devices_isneginf_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9928248Z test_ops.py::TestCommonCUDA::test_multiple_devices_isposinf_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9928441Z test_ops.py::TestCommonCUDA::test_multiple_devices_isreal_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9928634Z test_ops.py::TestCommonCUDA::test_multiple_devices_isreal_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9928850Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_4inputs_with_extra_args_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9929076Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_4inputs_with_extra_args_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9929285Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9929503Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_return_by_ref_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9929720Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_return_by_ref_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9929923Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_unary_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9930116Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_kthvalue_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9930315Z test_ops.py::TestCommonCUDA::test_multiple_devices_ldexp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9930527Z test_ops.py::TestCommonCUDA::test_multiple_devices_le_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9930754Z test_ops.py::TestCommonCUDA::test_multiple_devices_lgamma_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9930973Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_det_singular_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9931170Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eig_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9931377Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_ldl_factor_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9931582Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_ldl_solve_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9931799Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lstsq_grad_oriented_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9931999Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9932202Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_factor_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9932410Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9932627Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_rank_hermitian_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9932832Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_multi_dot_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9933049Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9933272Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_norm_subgradients_at_zero_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9933469Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9933682Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_hermitian_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9933881Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_qr_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9934084Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_slogdet_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9934284Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_solve_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9934586Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_solve_ex_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9934794Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vander_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9934997Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vecdot_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9935199Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vector_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9935393Z test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9935584Z test_ops.py::TestCommonCUDA::test_multiple_devices_log1p_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9935774Z test_ops.py::TestCommonCUDA::test_multiple_devices_log2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9935965Z test_ops.py::TestCommonCUDA::test_multiple_devices_log_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9936204Z test_ops.py::TestCommonCUDA::test_multiple_devices_logcumsumexp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9936400Z test_ops.py::TestCommonCUDA::test_multiple_devices_logdet_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9936597Z test_ops.py::TestCommonCUDA::test_multiple_devices_logical_or_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9936795Z test_ops.py::TestCommonCUDA::test_multiple_devices_logical_xor_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9936994Z test_ops.py::TestCommonCUDA::test_multiple_devices_logical_xor_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9937188Z test_ops.py::TestCommonCUDA::test_multiple_devices_logit_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9937379Z test_ops.py::TestCommonCUDA::test_multiple_devices_logit_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9937577Z test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9937772Z test_ops.py::TestCommonCUDA::test_multiple_devices_logsumexp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9937963Z test_ops.py::TestCommonCUDA::test_multiple_devices_long_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9938155Z test_ops.py::TestCommonCUDA::test_multiple_devices_lt_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9938351Z test_ops.py::TestCommonCUDA::test_multiple_devices_lu_solve_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9938569Z test_ops.py::TestCommonCUDA::test_multiple_devices_mH_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9938767Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9938957Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amax_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9939147Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9939348Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmax_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9939549Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9939753Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumprod_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9939958Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumprod_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9940158Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumsum_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9940363Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logsumexp_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9940558Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_mean_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9940763Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_normalize_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9940955Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_prod_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9941157Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_scatter_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9941359Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_scatter_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9941586Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_select_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9941786Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_var_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9941982Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_var_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9942177Z test_ops.py::TestCommonCUDA::test_multiple_devices_matmul_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9942367Z test_ops.py::TestCommonCUDA::test_multiple_devices_max_binary_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9942574Z test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_no_dim_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9942774Z test_ops.py::TestCommonCUDA::test_multiple_devices_maximum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9942965Z test_ops.py::TestCommonCUDA::test_multiple_devices_maximum_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9943178Z test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_list_of_tensors_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9943395Z test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_variadic_tensors_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9943595Z test_ops.py::TestCommonCUDA::test_multiple_devices_min_binary_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9943794Z test_ops.py::TestCommonCUDA::test_multiple_devices_min_binary_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9944028Z test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_no_dim_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9944243Z test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_with_dim_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9944439Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_minimum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9944632Z test_ops.py::TestCommonCUDA::test_multiple_devices_minimum_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9944824Z test_ops.py::TestCommonCUDA::test_multiple_devices_mode_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9945014Z test_ops.py::TestCommonCUDA::test_multiple_devices_movedim_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9945213Z test_ops.py::TestCommonCUDA::test_multiple_devices_msort_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9945404Z test_ops.py::TestCommonCUDA::test_multiple_devices_mv_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9945616Z test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_3_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9945826Z test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_5_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9946029Z test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_5_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9946225Z test_ops.py::TestCommonCUDA::test_multiple_devices_nan_to_num_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9946422Z test_ops.py::TestCommonCUDA::test_multiple_devices_nanmedian_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9946622Z test_ops.py::TestCommonCUDA::test_multiple_devices_nanquantile_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9946816Z test_ops.py::TestCommonCUDA::test_multiple_devices_nansum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9947023Z test_ops.py::TestCommonCUDA::test_multiple_devices_nansum_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9947226Z test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_copy_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9947623Z test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_copy_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9947880Z test_ops.py::TestCommonCUDA::test_multiple_devices_native_batch_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9948106Z test_ops.py::TestCommonCUDA::test_multiple_devices_native_layer_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9948313Z test_ops.py::TestCommonCUDA::test_multiple_devices_ne_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9948524Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9948817Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_strided_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9949012Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_full_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9949218Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_full_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9949426Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_ones_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 
7%] 2023-01-11T23:10:16.9949662Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_ones_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9949901Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_zeros_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9950150Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional__scaled_dot_product_attention_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9950387Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_avg_pool1d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9950624Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_max_pool2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9950856Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_max_pool3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9951085Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_alpha_dropout_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9951295Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_avg_pool1d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9951530Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_avg_pool2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9951772Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_avg_pool3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9951991Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_batch_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9952215Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_bilinear_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9952462Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_binary_cross_entropy_with_logits_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9952700Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose1d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9952964Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_similarity_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9953195Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_ctc_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9953421Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9953659Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9953860Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_elu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9954110Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_feature_alpha_dropout_without_train_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9954401Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_fractional_max_pool2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9954635Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_fractional_max_pool3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9954867Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_gelu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9955086Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_glu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9955329Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hinge_embedding_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9955566Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_area_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9955849Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_linear_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9956087Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_trilinear_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9956294Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_layer_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9956518Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_leaky_relu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9956746Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_linear_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9956982Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_local_response_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9957227Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_margin_ranking_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9957463Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool1d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9957692Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool1d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9957939Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool1d_grad_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9958207Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool2d_grad_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9958434Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_mish_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9958662Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_mse_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9958914Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_multilabel_soft_margin_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9959146Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_reflect_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9959378Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9959609Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pairwise_distance_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9959838Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pairwise_distance_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9960085Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pdist_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9960320Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_shuffle_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9960554Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_unshuffle_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9960775Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu6_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9960995Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9961224Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_rrelu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9961451Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_selu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9961676Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_silu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9961915Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_smooth_l1_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9962211Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_soft_margin_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9962452Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softmin_with_dtype_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9962683Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softmin_with_dtype_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9962914Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softsign_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9963143Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softsign_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9963375Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_tanhshrink_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9963607Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_tanhshrink_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9963816Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_threshold_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9964047Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_threshold_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9964314Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_with_distance_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9964569Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_unfold_cuda_float32 
SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9964806Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_upsample_bilinear_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9965017Z test_ops.py::TestCommonCUDA::test_multiple_devices_nonzero_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9965234Z test_ops.py::TestCommonCUDA::test_multiple_devices_nonzero_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9965440Z test_ops.py::TestCommonCUDA::test_multiple_devices_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9965646Z test_ops.py::TestCommonCUDA::test_multiple_devices_norm_fro_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9965898Z test_ops.py::TestCommonCUDA::test_multiple_devices_ones_like_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9966091Z test_ops.py::TestCommonCUDA::test_multiple_devices_ormqr_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9966315Z test_ops.py::TestCommonCUDA::test_multiple_devices_outer_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9966526Z test_ops.py::TestCommonCUDA::test_multiple_devices_pinverse_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9966732Z test_ops.py::TestCommonCUDA::test_multiple_devices_polar_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9966970Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9967180Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_1_cuda_int64 SKIPPED (Skipped!) [ 8%] 2023-01-11T23:10:16.9967412Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_2_cuda_int64 SKIPPED (Skipped!) [ 8%] 2023-01-11T23:10:16.9967622Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_4_cuda_float32 SKIPPED (Skipped!) [ 8%] 2023-01-11T23:10:16.9967836Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_4_cuda_int64 SKIPPED (Skipped!) 
[ 8%] 2023-01-11T23:10:16.9968056Z test_ops.py::TestCommonCUDA::test_multiple_devices_pow_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9968241Z test_ops.py::TestCommonCUDA::test_multiple_devices_prod_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9968444Z test_ops.py::TestCommonCUDA::test_multiple_devices_put_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9968657Z test_ops.py::TestCommonCUDA::test_multiple_devices_put_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9968865Z test_ops.py::TestCommonCUDA::test_multiple_devices_qr_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9969077Z test_ops.py::TestCommonCUDA::test_multiple_devices_rad2deg_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9969289Z test_ops.py::TestCommonCUDA::test_multiple_devices_rand_like_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9969548Z test_ops.py::TestCommonCUDA::test_multiple_devices_randint_like_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9969764Z test_ops.py::TestCommonCUDA::test_multiple_devices_randn_like_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9969986Z test_ops.py::TestCommonCUDA::test_multiple_devices_real_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9970173Z test_ops.py::TestCommonCUDA::test_multiple_devices_real_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9970402Z test_ops.py::TestCommonCUDA::test_multiple_devices_remainder_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9970671Z test_ops.py::TestCommonCUDA::test_multiple_devices_remainder_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9970905Z test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9971129Z test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_interleave_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9971342Z test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_as_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9971554Z test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_as_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9971771Z test_ops.py::TestCommonCUDA::test_multiple_devices_resize__cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9972003Z test_ops.py::TestCommonCUDA::test_multiple_devices_resize_as__cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9972217Z test_ops.py::TestCommonCUDA::test_multiple_devices_resize_as__cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9972413Z test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_conj_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9972627Z test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_neg_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9972836Z test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_neg_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9973039Z test_ops.py::TestCommonCUDA::test_multiple_devices_roll_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 
2023-01-11T23:10:16.9973285Z test_ops.py::TestCommonCUDA::test_multiple_devices_rot90_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9973523Z test_ops.py::TestCommonCUDA::test_multiple_devices_rot90_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9973734Z test_ops.py::TestCommonCUDA::test_multiple_devices_round_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9973956Z test_ops.py::TestCommonCUDA::test_multiple_devices_round_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9974179Z test_ops.py::TestCommonCUDA::test_multiple_devices_round_decimals_0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9974363Z test_ops.py::TestCommonCUDA::test_multiple_devices_round_decimals_3_cuda_float32 SKIPPED (Skipped!) [ 8%] 2023-01-11T23:10:16.9974667Z test_ops.py::TestCommonCUDA::test_multiple_devices_rsqrt_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9974876Z test_ops.py::TestCommonCUDA::test_multiple_devices_rsub_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9975098Z test_ops.py::TestCommonCUDA::test_multiple_devices_scalar_tensor_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9975326Z test_ops.py::TestCommonCUDA::test_multiple_devices_scalar_tensor_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9975541Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_add_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9975754Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9975984Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9976208Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9976432Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amax_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9976634Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9976947Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_prod_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9977158Z test_ops.py::TestCommonCUDA::test_multiple_devices_select_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9977376Z test_ops.py::TestCommonCUDA::test_multiple_devices_select_scatter_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9977584Z test_ops.py::TestCommonCUDA::test_multiple_devices_sigmoid_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9977791Z test_ops.py::TestCommonCUDA::test_multiple_devices_sigmoid_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9978021Z test_ops.py::TestCommonCUDA::test_multiple_devices_sign_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9978224Z test_ops.py::TestCommonCUDA::test_multiple_devices_sign_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9978459Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_bartlett_cuda_float32 SKIPPED (fewer than 2 devices 
detected) [ 8%] 2023-01-11T23:10:16.9978690Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_exponential_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9978904Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_general_cosine_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9979128Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_hamming_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9979349Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_hann_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9979607Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_kaiser_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9979830Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_nuttall_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9980047Z test_ops.py::TestCommonCUDA::test_multiple_devices_sin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9980258Z test_ops.py::TestCommonCUDA::test_multiple_devices_sin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9980498Z test_ops.py::TestCommonCUDA::test_multiple_devices_sinc_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9980710Z test_ops.py::TestCommonCUDA::test_multiple_devices_slice_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9980922Z test_ops.py::TestCommonCUDA::test_multiple_devices_slice_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9981119Z test_ops.py::TestCommonCUDA::test_multiple_devices_slice_scatter_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9981330Z test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9981539Z test_ops.py::TestCommonCUDA::test_multiple_devices_sort_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9981743Z test_ops.py::TestCommonCUDA::test_multiple_devices_sort_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9981981Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_airy_ai_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9982196Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_airy_ai_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9982419Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j0_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9982660Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y0_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9982883Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y1_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9983093Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y1_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9983311Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_t_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9983714Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an 
unreasonably long time, #79528) [ 8%] 2023-01-11T23:10:16.9984084Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_w_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:10:16.9984319Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_erfcx_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9984565Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_erfcx_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9984797Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_h_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9985028Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_he_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9985240Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9985459Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9985698Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1e_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9985900Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_log_ndtr_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9986128Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k1_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9986357Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtr_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9986573Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtri_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9986788Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtri_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9987030Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9987273Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k0_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9987662Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_t_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:10:16.9988123Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:10:16.9988497Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:10:16.9988858Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_spherical_bessel_j0_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9989062Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_xlog1py_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9989308Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_xlog1py_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 
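The chebyshev_polynomial and shifted_chebyshev_polynomial entries above are skipped unconditionally, with a reason string that cites tracking issue #79528 ("testing takes an unreasonably long time"). A minimal sketch of what an unconditional, issue-linked skip looks like in pytest; illustrative only, since the real suite applies these skips through its OpInfo skip machinery rather than a bare decorator:

import pytest

@pytest.mark.skip(reason="Skipping - testing takes an unreasonably long time, #79528")
def test_special_chebyshev_polynomial_v_example():
    ...  # body never runs; pytest reports the test as SKIPPED with the reason above
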
2023-01-11T23:10:16.9989538Z test_ops.py::TestCommonCUDA::test_multiple_devices_split_list_args_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9989757Z test_ops.py::TestCommonCUDA::test_multiple_devices_split_with_sizes_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9989965Z test_ops.py::TestCommonCUDA::test_multiple_devices_sqrt_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9990180Z test_ops.py::TestCommonCUDA::test_multiple_devices_squeeze_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9990391Z test_ops.py::TestCommonCUDA::test_multiple_devices_stack_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9990605Z test_ops.py::TestCommonCUDA::test_multiple_devices_std_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9990840Z test_ops.py::TestCommonCUDA::test_multiple_devices_std_mean_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9991063Z test_ops.py::TestCommonCUDA::test_multiple_devices_std_unbiased_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9991252Z test_ops.py::TestCommonCUDA::test_multiple_devices_sub_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9991453Z test_ops.py::TestCommonCUDA::test_multiple_devices_sum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9991670Z test_ops.py::TestCommonCUDA::test_multiple_devices_sum_to_size_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9991907Z test_ops.py::TestCommonCUDA::test_multiple_devices_symeig_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9992111Z test_ops.py::TestCommonCUDA::test_multiple_devices_t_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9992331Z test_ops.py::TestCommonCUDA::test_multiple_devices_take_along_dim_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9992584Z test_ops.py::TestCommonCUDA::test_multiple_devices_take_along_dim_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9992812Z test_ops.py::TestCommonCUDA::test_multiple_devices_take_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9993017Z test_ops.py::TestCommonCUDA::test_multiple_devices_take_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9993203Z test_ops.py::TestCommonCUDA::test_multiple_devices_tan_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9993409Z test_ops.py::TestCommonCUDA::test_multiple_devices_tanh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9993612Z test_ops.py::TestCommonCUDA::test_multiple_devices_tanh_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9993826Z test_ops.py::TestCommonCUDA::test_multiple_devices_tensor_split_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9994038Z test_ops.py::TestCommonCUDA::test_multiple_devices_tensordot_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9994239Z test_ops.py::TestCommonCUDA::test_multiple_devices_tile_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9994461Z test_ops.py::TestCommonCUDA::test_multiple_devices_to_sparse_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9994686Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_trace_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9994902Z test_ops.py::TestCommonCUDA::test_multiple_devices_transpose_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9995141Z test_ops.py::TestCommonCUDA::test_multiple_devices_transpose_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9995349Z test_ops.py::TestCommonCUDA::test_multiple_devices_trapezoid_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9995557Z test_ops.py::TestCommonCUDA::test_multiple_devices_trapz_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9995762Z test_ops.py::TestCommonCUDA::test_multiple_devices_trapz_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9995991Z test_ops.py::TestCommonCUDA::test_multiple_devices_triangular_solve_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9996228Z test_ops.py::TestCommonCUDA::test_multiple_devices_tril_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9996435Z test_ops.py::TestCommonCUDA::test_multiple_devices_tril_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9996664Z test_ops.py::TestCommonCUDA::test_multiple_devices_triu_indices_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9996878Z test_ops.py::TestCommonCUDA::test_multiple_devices_true_divide_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9997089Z test_ops.py::TestCommonCUDA::test_multiple_devices_true_divide_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9997294Z test_ops.py::TestCommonCUDA::test_multiple_devices_trunc_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9997480Z test_ops.py::TestCommonCUDA::test_multiple_devices_unbind_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9997695Z test_ops.py::TestCommonCUDA::test_multiple_devices_unflatten_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9997938Z test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_copy_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9998148Z test_ops.py::TestCommonCUDA::test_multiple_devices_uniform_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9998370Z test_ops.py::TestCommonCUDA::test_multiple_devices_unique_consecutive_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9998604Z test_ops.py::TestCommonCUDA::test_multiple_devices_unique_consecutive_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9998807Z test_ops.py::TestCommonCUDA::test_multiple_devices_unique_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9999020Z test_ops.py::TestCommonCUDA::test_multiple_devices_unsqueeze_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9999246Z test_ops.py::TestCommonCUDA::test_multiple_devices_var_mean_unbiased_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9999447Z test_ops.py::TestCommonCUDA::test_multiple_devices_var_unbiased_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9999670Z test_ops.py::TestCommonCUDA::test_multiple_devices_view_as_complex_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 
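Every test_multiple_devices_* case in the long run above (which finishes just below) reports SKIPPED with "fewer than 2 devices detected": the runner for this shard exposes only one CUDA device, so the cross-device variants cannot execute. A minimal sketch of such a device-count guard, assuming a plain pytest skipif rather than the exact decorator test_ops.py uses:

import pytest
import torch

@pytest.mark.skipif(torch.cuda.device_count() < 2,
                    reason="fewer than 2 devices detected")
def test_minimum_on_two_devices():
    # Illustrative cross-device check: the same op, run on cuda:0 and
    # cuda:1, must produce identical results.
    x0 = torch.randn(8, device="cuda:0")
    x1 = x0.to("cuda:1")
    torch.testing.assert_close(torch.minimum(x0, x0).cpu(),
                               torch.minimum(x1, x1).cpu())
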
2023-01-11T23:10:16.9999913Z test_ops.py::TestCommonCUDA::test_multiple_devices_view_as_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:17.0000131Z test_ops.py::TestCommonCUDA::test_multiple_devices_view_copy_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:17.0000377Z test_ops.py::TestCommonCUDA::test_multiple_devices_vsplit_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:17.0000605Z test_ops.py::TestCommonCUDA::test_multiple_devices_vsplit_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:17.0000810Z test_ops.py::TestCommonCUDA::test_multiple_devices_where_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:17.0001022Z test_ops.py::TestCommonCUDA::test_multiple_devices_xlogy_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:17.0001252Z test_ops.py::TestCommonCUDA::test_multiple_devices_xlogy_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:17.0001465Z test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_like_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:17.0001633Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values___getitem___cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0001814Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rand___cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0001992Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rmul___cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0002171Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rxor___cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0002373Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_abs_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0002549Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_acos_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0002727Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_add_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0002907Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_aminmax_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0003067Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_angle_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0003284Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_argwhere_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0003531Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_as_strided_partial_views_cuda_bool SKIPPED (Modifies input strides and storage_offset) [ 9%] 2023-01-11T23:10:17.0003727Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_as_strided_scatter_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0003940Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_asin_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0004130Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atan2_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0004310Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atan_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0004495Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atleast_2d_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0004674Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bfloat16_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0004841Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_and_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0005026Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_not_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0005210Z 
test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_xor_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0005400Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_block_diag_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0005591Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_broadcast_to_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0005788Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_byte_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0005969Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cdouble_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0006155Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_clamp_min_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0006373Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_clone_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0006537Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_column_stack_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0006724Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_combinations_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0006924Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_conj_physical_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0007121Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_constant_pad_nd_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0007309Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_contiguous_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0007531Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_copysign_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0007711Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cos_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0007898Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_count_nonzero_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0008060Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_deg2rad_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0008255Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagonal_copy_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0008438Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diff_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0008625Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_digamma_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0008826Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_div_no_rounding_mode_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0009011Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_double_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0009207Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_dsplit_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0009407Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_empty_like_cuda_bool SKIPPED (Skipped!) 
[ 9%] 2023-01-11T23:10:17.0009627Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_eq_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0009789Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_equal_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0009966Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_exp_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0010164Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expand_as_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0010400Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expm1_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0010575Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_eye_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0010760Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fft2_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0010952Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fft_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0011138Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fftn_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0011299Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfft_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0011484Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfftn_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0011664Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifftn_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0011857Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfftn_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0012035Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfft_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0012218Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfftn_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0012397Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_rfftn_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0012595Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_flip_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0012807Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fliplr_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0012968Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_float_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0013146Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fmin_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0013321Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_full_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0013507Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_full_like_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0013688Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_gather_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0013896Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_gt_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0014075Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_heaviside_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0014265Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_hstack_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0014420Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_i0_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0014714Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_fill_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0014897Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isclose_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0015084Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isfinite_cuda_bool 
PASSED [ 9%] 2023-01-11T23:10:17.0015261Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isinf_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0015447Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isnan_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0015626Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isreal_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0015831Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_binary_return_by_ref_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0016078Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_unary_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0016236Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_kron_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0016421Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_and_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0016602Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logsumexp_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0016824Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_long_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0017006Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_lt_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0017195Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_mean_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0017381Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_prod_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0017574Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_scatter_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0017746Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_select_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0017946Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_max_binary_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0018141Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_max_reduction_no_dim_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0018343Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_meshgrid_list_of_tensors_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0018555Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_meshgrid_variadic_tensors_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0018742Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_binary_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0018938Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_reduction_no_dim_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0019140Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_reduction_with_dim_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0019351Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mode_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0019513Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nan_to_num_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0019711Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nansum_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0019898Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ne_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0020183Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_empty_strided_cuda_bool SKIPPED (Expected: new_empty_strided is not comparable) [ 9%] 2023-01-11T23:10:17.0020399Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_cosine_embedding_loss_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0020630Z 
test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_feature_alpha_dropout_without_train_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0020833Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pad_constant_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0021039Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pixel_unshuffle_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0021221Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nonzero_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0021430Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ones_like_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0021592Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_outer_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0021776Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_permute_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0021979Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_0_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0022191Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_1_cuda_bool SKIPPED (Skipped!) [ 9%] 2023-01-11T23:10:17.0022370Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_prod_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0022550Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_put_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0022761Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ravel_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0022974Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_repeat_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0023151Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_repeat_interleave_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0023354Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_reshape_as_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0023535Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_reshape_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0023713Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resize__cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0023900Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resolve_conj_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0024085Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resolve_neg_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0024264Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_roll_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0024447Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_rot90_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0024630Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_rsqrt_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0024797Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_add_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0024991Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0025196Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_select_scatter_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0025374Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sgn_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0051635Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_short_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0051862Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sigmoid_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0052032Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sin_cuda_bool PASSED [ 
10%] 2023-01-11T23:10:17.0052204Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_slice_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0052389Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_y0_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0052659Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_y1_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0052854Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_t_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0053271Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_v_cuda_bool SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 10%] 2023-01-11T23:10:17.0053451Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_entr_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0053645Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_hermite_polynomial_h_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0053837Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_hermite_polynomial_he_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0054011Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i0e_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0054182Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0054371Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_laguerre_polynomial_l_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0055103Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_legendre_polynomial_p_cuda_bool SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 10%] 2023-01-11T23:10:17.0055295Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_i0_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0055478Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_k1_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0055648Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_ndtri_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0055969Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_polygamma_special_polygamma_n_0_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0056167Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_scaled_modified_bessel_k0_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0056544Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_v_cuda_bool SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 10%] 2023-01-11T23:10:17.0056912Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_w_cuda_bool SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 10%] 2023-01-11T23:10:17.0057099Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_spherical_bessel_j0_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0057262Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0057426Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sqrt_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0057585Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_stack_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0057748Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sum_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0057918Z 
test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tensor_split_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0058082Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_trace_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0058241Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_triu_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0058409Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_true_divide_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0058570Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unbind_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0058734Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_as_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0058899Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_copy_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0059111Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_vsplit_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0059274Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zero__cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0059456Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zeros_like_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0059620Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_H_cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0059783Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_complex64 PASSED [ 10%] 2023-01-11T23:10:17.0059943Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0060119Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___getitem___cuda_complex64 PASSED [ 10%] 2023-01-11T23:10:17.0060313Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___getitem___cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0060502Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___getitem___cuda_int64 PASSED [ 10%] 2023-01-11T23:10:17.0060672Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0060838Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmod___cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0061000Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmod___cuda_int64 PASSED [ 10%] 2023-01-11T23:10:17.0061162Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___ror___cuda_int64 PASSED [ 10%] 2023-01-11T23:10:17.0061331Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rpow___cuda_complex64 PASSED [ 10%] 2023-01-11T23:10:17.0061493Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rpow___cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0061654Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rsub___cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0061836Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rxor___cuda_int64 PASSED [ 10%] 2023-01-11T23:10:17.0062027Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples__native_batch_norm_legit_cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0062193Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_complex64 PASSED [ 10%] 2023-01-11T23:10:17.0062355Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0062515Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_int64 PASSED [ 10%] 2023-01-11T23:10:17.0062677Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acos_cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0062845Z 
2023-01-11T23:10:17.0062845Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0063008Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0063169Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0063334Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addbmm_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0063500Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addbmm_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0063671Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0063836Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0063999Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0064184Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_decomposed_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0064346Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmv_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0064512Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addr_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0064673Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addr_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0064836Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0065069Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0065238Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_aminmax_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0065399Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0065558Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0065716Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0065877Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_arange_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0066043Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmax_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0066202Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmin_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0066373Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0066547Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_cuda_complex64 XFAIL [ 10%]
2023-01-11T23:10:17.0066709Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_cuda_int64 XFAIL [ 10%]
2023-01-11T23:10:17.0066893Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_partial_views_cuda_float32 XFAIL [ 10%]
2023-01-11T23:10:17.0067074Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_partial_views_cuda_int64 XFAIL [ 10%]
2023-01-11T23:10:17.0067299Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_scatter_cuda_int64 SKIPPED (Works for int64, fails for everything else) [ 10%]
2023-01-11T23:10:17.0067491Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0067659Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0067824Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asinh_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0067987Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asinh_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0068145Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asinh_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0068310Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0068474Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0068635Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0068881Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atanh_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0069053Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0069213Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0069387Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_2d_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0069549Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_2d_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0069715Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0069874Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0070042Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_baddbmm_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0070200Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0070363Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bincount_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0070526Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_and_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0070723Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_left_shift_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0070889Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0071052Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0071212Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bmm_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0071369Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bmm_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0071530Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0071688Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0071873Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_tensors_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0072035Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0072201Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bucketize_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0072360Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0072520Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0072697Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cartesian_prod_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0072868Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cartesian_prod_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0073030Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0073219Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0073383Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0073541Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ceil_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0073699Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ceil_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0073864Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0074028Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0074188Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0074346Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0074504Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_char_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0074673Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0074847Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_inverse_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0075021Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_inverse_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0075191Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_solve_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0075355Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0075515Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0075674Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0075834Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0075994Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0076155Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_max_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0076335Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0076495Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0076651Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0076821Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0076985Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0077156Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_combinations_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0077333Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0077506Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0077674Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0077848Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0078018Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0078188Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_contiguous_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0078356Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_contiguous_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0078521Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_contiguous_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0078684Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0078873Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cosh_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0079036Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cosh_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0079205Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_count_nonzero_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0079369Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0079529Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0079692Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0079855Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummax_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0080025Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0080193Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0080371Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0080587Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0080776Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0080955Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0081125Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0081290Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0081468Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0081641Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0081813Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0082004Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0082169Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0082333Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diff_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0082497Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_digamma_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0082662Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dist_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0082842Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_trunc_rounding_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0083020Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_trunc_rounding_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0083186Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0083348Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0083518Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_einsum_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0083676Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_einsum_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0083859Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_cuda_complex64 SKIPPED (Skipped!) [ 11%]
2023-01-11T23:10:17.0084045Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_like_cuda_complex64 SKIPPED (Skipped!) [ 11%]
2023-01-11T23:10:17.0084225Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_like_cuda_int64 SKIPPED (Skipped!) [ 11%]
2023-01-11T23:10:17.0084389Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0084573Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0084731Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0084900Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_equal_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0085058Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp2_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0085216Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp2_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0085376Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0085548Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_as_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0085712Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0085874Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0086056Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eye_cuda_complex64 SKIPPED (Skipped!) [ 11%]
2023-01-11T23:10:17.0086228Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eye_cuda_int64 SKIPPED (Skipped!) [ 11%]
2023-01-11T23:10:17.0086395Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft2_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0086554Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft2_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0086717Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0086878Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0087043Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftn_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0087212Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0087385Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft2_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0087548Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft2_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0087740Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0087899Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0088068Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0088233Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0088396Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0088555Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0088724Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0088898Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0089067Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0089245Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftshift_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0089409Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft2_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0089572Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft2_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0089737Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfftn_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0089898Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfftn_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0090072Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0090265Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0090432Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0090596Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0090783Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0090975Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft2_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0091138Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft2_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0091301Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0091463Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0091628Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fill_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0091791Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fill_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0091955Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fill_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0092123Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0092284Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0092447Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0092617Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flipud_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0092778Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flipud_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0092940Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flipud_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0093106Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0093282Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0093474Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0093641Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0093803Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0093972Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_divide_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0094131Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmod_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0094293Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_frac_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0094457Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_frexp_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0094743Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0094902Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0095074Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_like_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0095235Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_like_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0095391Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ge_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0095547Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ge_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0095714Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gradient_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0095874Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gradient_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0096084Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0096246Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0096419Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_heaviside_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0096579Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_histc_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0096740Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_histc_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0096909Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hsplit_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0097072Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hsplit_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0097232Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hsplit_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0097395Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hstack_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0097558Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hstack_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0097719Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hypot_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0097880Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_i0_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0098042Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_igammac_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0098204Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_imag_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0098374Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0098539Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0098701Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0098879Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0099048Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0099244Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0099409Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0099577Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0099751Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_select_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0099917Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_inner_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0100107Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0100294Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0100459Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0100627Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0100791Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0100949Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0101109Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0101272Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isinf_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0101434Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isinf_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0101596Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isnan_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0101757Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isnan_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0101950Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isneginf_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0102116Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isneginf_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0102281Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isposinf_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0102442Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0102602Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0102766Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_istft_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0102955Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0103140Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0103339Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_4inputs_with_extra_args_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0103539Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_return_by_ref_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0103729Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_return_by_ref_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0103913Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_return_by_ref_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0104091Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_unary_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0104264Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_unary_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0104429Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0104595Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0104754Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lcm_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0104951Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ldexp_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0105117Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ldexp_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0105280Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lerp_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0105442Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lgamma_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0105616Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cond_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0105784Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0106011Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_singular_cuda_complex64 SKIPPED (The backward may give different results) [ 11%]
2023-01-11T23:10:17.0106187Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eig_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0106361Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigh_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0106531Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigh_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0106710Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvals_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0106889Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvalsh_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0107061Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvalsh_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0107254Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_householder_product_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0107443Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_householder_product_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0107641Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0107812Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0107988Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_ex_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0108155Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_ex_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0108333Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0108508Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_solve_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0108749Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_solve_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0108925Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0109099Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0109287Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0109459Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0109627Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0109807Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0109988Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_ex_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0110161Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_solve_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0110344Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_norm_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0110524Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_norm_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0110733Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_power_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0110915Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0111107Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_hermitian_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0111294Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_multi_dot_cuda_complex64 SKIPPED (67470!) [ 12%]
2023-01-11T23:10:17.0111490Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_subgradients_at_zero_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0111683Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0111855Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0112041Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0112208Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_qr_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0112379Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0112558Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_ex_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0112731Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_ex_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0112919Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_triangular_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0113088Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svd_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0113287Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svdvals_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0113456Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svdvals_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0113634Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorinv_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0113814Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorsolve_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0113990Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorsolve_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0114162Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0114330Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vecdot_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0114509Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vector_norm_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0114687Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vector_norm_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0114858Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0115020Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0115182Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log2_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0115343Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log2_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0115501Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log2_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0115660Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0115836Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_with_dtype_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0116007Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logaddexp2_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0116178Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logcumsumexp_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0116363Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logdet_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0116528Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0116693Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0116865Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0117029Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0117201Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0117368Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0117538Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0117704Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0117864Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0118032Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logsumexp_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0118188Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lt_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0118343Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0118510Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_solve_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0118673Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_solve_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0118866Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_unpack_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0119030Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0119193Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0119357Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amin_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0119530Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmax_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0119700Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmax_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0119872Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmin_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0120050Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0120223Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0120403Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0120601Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0120791Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0120963Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0121134Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0121302Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0121480Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logsumexp_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0121648Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_mean_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0121818Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_median_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0122023Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_normalize_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0122199Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_prod_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0122366Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_prod_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0122540Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0122710Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0122879Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_select_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0123045Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_select_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0123218Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_softmax_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0123391Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0123561Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0123727Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0123896Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0124062Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0124230Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matmul_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0124393Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matmul_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0124589Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matrix_exp_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0124757Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matrix_exp_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0124922Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_binary_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0125117Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0125291Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_no_dim_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0125472Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_with_dim_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0125640Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_maximum_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0125804Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_maximum_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0125969Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mean_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0126135Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_median_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0126301Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_median_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0126485Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_list_of_tensors_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0126672Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0126850Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_no_dim_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0127028Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_no_dim_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0127210Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_with_dim_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0127381Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_minimum_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0127545Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_minimum_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0127728Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mm_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0127892Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mode_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0128052Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mode_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0128216Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_movedim_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0128375Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_msort_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0128537Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_msort_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0128699Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mul_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0128865Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mv_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0129048Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_1_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0129229Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_5_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0129399Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nan_to_num_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0129565Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanmean_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0129733Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanmedian_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0129900Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanmedian_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0130078Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanquantile_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0130298Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nansum_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0130472Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nansum_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0130650Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0130818Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0130986Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0131154Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0131314Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0131474Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0131655Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_batch_norm_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0131843Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_dropout_backward_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0132023Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_layer_norm_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0132184Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0132349Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0132533Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_float32 SKIPPED (Skipped!) [ 12%]
2023-01-11T23:10:17.0132711Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_int64 SKIPPED (Skipped!) [ 12%]
2023-01-11T23:10:17.0132946Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_strided_cuda_complex64 SKIPPED (Expected: new_empty_strided is not comparable) [ 12%]
2023-01-11T23:10:17.0133179Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_strided_cuda_int64 SKIPPED (Expected: new_empty_strided is not comparable) [ 12%]
2023-01-11T23:10:17.0133372Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_full_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0133536Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_full_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0133707Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_ones_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0133871Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_ones_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0134038Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0134202Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_int64 PASSED [ 13%]
2023-01-11T23:10:17.0134422Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional__scaled_dot_product_attention_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:10:17.0134726Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0134921Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0135109Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0135296Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0135481Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0135667Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0135854Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool1d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0136079Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0136261Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool3d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0136435Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_batch_norm_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0136632Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0136815Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_bilinear_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0137020Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0137197Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_celu_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0137379Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv1d_cuda_complex64 PASSED [ 13%]
2023-01-11T23:10:17.0137558Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv1d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0137739Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0137932Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose1d_cuda_complex64 PASSED [ 13%]
2023-01-11T23:10:17.0138115Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0138309Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose2d_cuda_complex64 PASSED [ 13%]
2023-01-11T23:10:17.0138496Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0138691Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose3d_cuda_complex64 PASSED [ 13%]
2023-01-11T23:10:17.0138879Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0139098Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_similarity_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0139287Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cross_entropy_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0139467Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_ctc_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0139651Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0139836Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout3d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0140024Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_embedding_bag_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0140243Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_embedding_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0140470Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_without_train_cuda_complex64 PASSED [ 13%]
2023-01-11T23:10:17.0140677Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_without_train_cuda_int64 PASSED [ 13%]
2023-01-11T23:10:17.0140870Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0141064Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0141253Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0141431Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_gelu_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0143160Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardswish_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0143339Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardtanh_cuda_int64 PASSED [ 13%]
2023-01-11T23:10:17.0143521Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_huber_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0143706Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_instance_norm_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0143896Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_area_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0144087Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0144279Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0144465Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_layer_norm_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0144650Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_leaky_relu_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0144830Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_linear_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0145020Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_local_response_norm_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0145201Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_pool1d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0145381Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_pool3d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0145571Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0145756Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0145941Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool3d_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0146166Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0146370Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0146547Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_one_hot_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0146736Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0146918Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0147100Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0147291Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0147476Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0147658Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0147853Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0148042Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0148236Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0148423Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0148637Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_unshuffle_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0148892Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0149085Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0149272Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_poisson_nll_loss_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0149454Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_prelu_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0149636Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_silu_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0149820Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0150012Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0150198Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0150396Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_with_dtype_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0150612Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0150821Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0151014Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0151200Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0151394Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0151607Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0151818Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_unfold_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0152002Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_unfold_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0152190Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0152362Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0152531Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0152696Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0152871Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0153042Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_fro_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0153213Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_fro_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0153383Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_inf_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0153547Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0153704Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0153876Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0154042Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0154210Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ormqr_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0154436Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0154602Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0154768Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0154941Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0155100Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0155266Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polar_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0155452Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_0_cuda_float32 PASSED [ 13%] 
2023-01-11T23:10:17.0155652Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_1_cuda_float32 SKIPPED (Skipped!) [ 13%] 2023-01-11T23:10:17.0155855Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_3_cuda_float32 SKIPPED (Skipped!) [ 13%] 2023-01-11T23:10:17.0156051Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_3_cuda_int64 SKIPPED (Skipped!) [ 13%] 2023-01-11T23:10:17.0156246Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_4_cuda_int64 SKIPPED (Skipped!) [ 13%] 2023-01-11T23:10:17.0156419Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_positive_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0156587Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_positive_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0156748Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0156912Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0157082Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_put_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0157253Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_put_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0157414Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_put_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0157606Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_quantile_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0157775Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0157945Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rand_like_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0158172Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_cuda_float32 SKIPPED (Test expects tensor input) [ 13%] 2023-01-11T23:10:17.0158337Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_like_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0158537Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_cuda_complex64 SKIPPED (Test expects tensor input) [ 13%] 2023-01-11T23:10:17.0158714Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_like_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0158885Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_like_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0159054Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0159221Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0159394Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reciprocal_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0159564Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_remainder_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0159732Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_remainder_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0159893Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0160082Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0160289Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_interleave_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0160494Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_interleave_cuda_int64 PASSED [ 13%] 
2023-01-11T23:10:17.0160669Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_as_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0160838Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_as_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0161001Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_as_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0161166Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0161325Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0161496Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize__cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0161669Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0161847Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_conj_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0162018Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_conj_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0162193Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0162365Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0162533Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0162697Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0162853Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rot90_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0163023Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0163200Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_0_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0163415Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_3_cuda_float32 SKIPPED (Skipped!) [ 13%] 2023-01-11T23:10:17.0163588Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsqrt_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0163750Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsqrt_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0163917Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0164079Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0164265Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_complex64 SKIPPED (Skipped!) [ 13%] 2023-01-11T23:10:17.0164457Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_float32 SKIPPED (Skipped!) [ 13%] 2023-01-11T23:10:17.0164643Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_int64 SKIPPED (Skipped!) 
[ 14%] 2023-01-11T23:10:17.0164816Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_add_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0164986Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_add_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0165154Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0165321Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0165500Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amax_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0165674Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amin_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0165876Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_prod_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0166056Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_prod_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0166229Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_searchsorted_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0166417Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_segment_reduce_offsets_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0166582Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0166748Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0166925Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_scatter_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0167088Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sgn_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0167251Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sgn_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0167416Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_short_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0167581Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_short_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0167742Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_short_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0167941Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_bartlett_cuda_float32 SKIPPED (Skipped!) [ 14%] 2023-01-11T23:10:17.0168138Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_blackman_cuda_float32 SKIPPED (Skipped!) [ 14%] 2023-01-11T23:10:17.0168333Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_cosine_cuda_float32 SKIPPED (Skipped!) [ 14%] 2023-01-11T23:10:17.0168532Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_gaussian_cuda_float32 SKIPPED (Skipped!) [ 14%] 2023-01-11T23:10:17.0168729Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_hann_cuda_float32 SKIPPED (Skipped!) [ 14%] 2023-01-11T23:10:17.0168923Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_kaiser_cuda_float32 SKIPPED (Skipped!) [ 14%] 2023-01-11T23:10:17.0169136Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_nuttall_cuda_float32 SKIPPED (Skipped!) 
[ 14%] 2023-01-11T23:10:17.0169304Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signbit_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0169469Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signbit_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0169632Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sin_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0169794Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sin_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0169962Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinc_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0170131Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinc_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0170293Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinc_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0170468Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0170655Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0170836Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0170998Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0171165Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0171328Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sort_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0171490Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sort_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0171698Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_airy_ai_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0171872Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_airy_ai_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0172044Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j0_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0172221Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y0_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0172393Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y1_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0172588Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_t_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0172783Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_u_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0173162Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%] 2023-01-11T23:10:17.0173527Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_v_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%] 2023-01-11T23:10:17.0173704Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_entr_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0173878Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_erfcx_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0174050Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_erfcx_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0174239Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_he_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0174431Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_he_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0174717Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1e_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0174889Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1e_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0175119Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_laguerre_polynomial_l_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0175300Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_log_ndtr_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0175476Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_log_ndtr_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0175664Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k0_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0175845Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k1_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0176019Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtr_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0176194Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtr_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0176367Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtri_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0176571Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_polygamma_special_polygamma_n_0_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0176769Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k0_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0176963Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k0_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0177150Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k1_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0177528Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%] 2023-01-11T23:10:17.0177928Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_t_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%] 2023-01-11T23:10:17.0178292Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%] 2023-01-11T23:10:17.0178660Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_u_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%] 2023-01-11T23:10:17.0179020Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_v_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%] 2023-01-11T23:10:17.0179213Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_spherical_bessel_j0_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0179408Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_spherical_bessel_j0_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0179574Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0179742Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_int64 PASSED [ 
14%] 2023-01-11T23:10:17.0179924Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0180100Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0180284Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0180456Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0180629Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0180794Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sqrt_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0180969Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0181165Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0181329Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0181496Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stack_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0181659Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stack_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0181831Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0181992Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0182178Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_unbiased_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0182360Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_unbiased_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0182537Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_unbiased_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0182706Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stft_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0182874Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sub_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0183036Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sub_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0183200Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0183365Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0183533Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0183721Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0183891Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_symeig_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0184060Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_symeig_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0184224Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0184389Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0184548Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0184726Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_complex64 PASSED [ 14%] 
2023-01-11T23:10:17.0184891Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0185064Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0185233Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0185399Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0185569Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tanh_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0185733Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tanh_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0185907Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensor_split_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0186082Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensordot_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0186240Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tile_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0186403Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0186574Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_float32 SKIPPED [ 14%] 2023-01-11T23:10:17.0186751Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0186957Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapezoid_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0187131Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapezoid_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0187300Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapezoid_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0187481Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triangular_solve_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0187642Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0187792Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0187962Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0188130Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trunc_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0188292Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trunc_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0188460Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0188632Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unflatten_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0188872Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unflatten_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0189044Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_uniform_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0189201Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0189367Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0189534Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0189731Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0189905Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0190073Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0190256Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_unbiased_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0190433Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_unbiased_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0190600Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vdot_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0190765Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_complex_cuda_float32 XFAIL [ 14%] 2023-01-11T23:10:17.0190936Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0191105Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0191266Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0191435Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0191598Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0191767Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0191931Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0192087Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0192250Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0192422Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0192583Z test_ops.py::TestCommonCUDA::test_numpy_ref_addbmm_cuda_complex128 PASSED [ 14%] 2023-01-11T23:10:17.0192771Z test_ops.py::TestCommonCUDA::test_numpy_ref_allclose_cuda_complex128 PASSED [ 14%] 2023-01-11T23:10:17.0192927Z test_ops.py::TestCommonCUDA::test_numpy_ref_allclose_cuda_float64 PASSED [ 14%] 2023-01-11T23:10:17.0193083Z test_ops.py::TestCommonCUDA::test_numpy_ref_aminmax_cuda_float64 PASSED [ 14%] 2023-01-11T23:10:17.0193236Z test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_float64 PASSED [ 14%] 2023-01-11T23:10:17.0193398Z test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_tensors_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0193541Z test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0193688Z test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0193836Z test_ops.py::TestCommonCUDA::test_numpy_ref_clamp_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0193990Z test_ops.py::TestCommonCUDA::test_numpy_ref_clone_cuda_float64 XFAIL [ 15%] 2023-01-11T23:10:17.0194138Z test_ops.py::TestCommonCUDA::test_numpy_ref_diag_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0194299Z test_ops.py::TestCommonCUDA::test_numpy_ref_diagflat_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0194450Z test_ops.py::TestCommonCUDA::test_numpy_ref_diagflat_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0194597Z test_ops.py::TestCommonCUDA::test_numpy_ref_diagflat_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0194742Z test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_complex128 PASSED [ 15%] 
2023-01-11T23:10:17.0194888Z test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0195033Z test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0195181Z test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0195353Z test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0195500Z test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0195652Z test_ops.py::TestCommonCUDA::test_numpy_ref_flatten_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0195831Z test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0195998Z test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0196182Z test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0196364Z test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0196542Z test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0196708Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0196873Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorinv_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0197040Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorinv_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0197202Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vander_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0197361Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vander_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0197531Z test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0197704Z test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0197864Z test_ops.py::TestCommonCUDA::test_numpy_ref_native_layer_norm_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0198047Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose1d_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0198231Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose1d_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0198433Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose2d_cuda_complex128 XFAIL [ 15%] 2023-01-11T23:10:17.0198607Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_group_norm_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0198772Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_l1_loss_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0198933Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_layer_norm_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0199100Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_mse_loss_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0199264Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pdist_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0199437Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_smooth_l1_loss_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0199599Z test_ops.py::TestCommonCUDA::test_numpy_ref_permute_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0199751Z 
test_ops.py::TestCommonCUDA::test_numpy_ref_permute_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0199906Z test_ops.py::TestCommonCUDA::test_numpy_ref_permute_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0200060Z test_ops.py::TestCommonCUDA::test_numpy_ref_ravel_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0200246Z test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0200412Z test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0200573Z test_ops.py::TestCommonCUDA::test_numpy_ref_searchsorted_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0200731Z test_ops.py::TestCommonCUDA::test_numpy_ref_searchsorted_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0200901Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_blackman_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0201093Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_cosine_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0201269Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_exponential_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0201450Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_cosine_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0201627Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_hamming_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0201788Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_hamming_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0201952Z test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0202106Z test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0202257Z test_ops.py::TestCommonCUDA::test_numpy_ref_tile_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0202404Z test_ops.py::TestCommonCUDA::test_numpy_ref_tile_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0202569Z test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0202727Z test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0202883Z test_ops.py::TestCommonCUDA::test_numpy_ref_triu_indices_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0203033Z test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0203183Z test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0203333Z test_ops.py::TestCommonCUDA::test_numpy_ref_view_copy_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0203483Z test_ops.py::TestCommonCUDA::test_numpy_ref_where_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0203625Z test_ops.py::TestCommonCUDA::test_out_T_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0203768Z test_ops.py::TestCommonCUDA::test_out___radd___cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0203914Z test_ops.py::TestCommonCUDA::test_out___rand___cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0204058Z test_ops.py::TestCommonCUDA::test_out___rdiv___cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0204216Z test_ops.py::TestCommonCUDA::test_out___ror___cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0204359Z test_ops.py::TestCommonCUDA::test_out___rxor___cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0204521Z test_ops.py::TestCommonCUDA::test_out__native_batch_norm_legit_cuda_float32 XFAIL [ 15%] 2023-01-11T23:10:17.0204665Z test_ops.py::TestCommonCUDA::test_out__refs_T_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0204832Z 
test_ops.py::TestCommonCUDA::test_out__refs__conversions_bfloat16_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0204994Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_byte_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0205160Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_cdouble_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0205328Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_cfloat_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0205490Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_chalf_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0205647Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_char_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0205816Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_complex_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0205978Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_float_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0206142Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_int_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0206289Z test_ops.py::TestCommonCUDA::test_out__refs_abs_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0206437Z test_ops.py::TestCommonCUDA::test_out__refs_acosh_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0206670Z test_ops.py::TestCommonCUDA::test_out__refs_allclose_cuda_float32 SKIPPED (Skipped! Only supports single tensor or iterable of tensor outputs.) [ 15%] 2023-01-11T23:10:17.0206843Z test_ops.py::TestCommonCUDA::test_out__refs_amax_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0206987Z test_ops.py::TestCommonCUDA::test_out__refs_any_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0207137Z test_ops.py::TestCommonCUDA::test_out__refs_arange_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0207285Z test_ops.py::TestCommonCUDA::test_out__refs_asinh_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0207429Z test_ops.py::TestCommonCUDA::test_out__refs_atan_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0207576Z test_ops.py::TestCommonCUDA::test_out__refs_atanh_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0207731Z test_ops.py::TestCommonCUDA::test_out__refs_atleast_2d_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0207884Z test_ops.py::TestCommonCUDA::test_out__refs_atleast_3d_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0208036Z test_ops.py::TestCommonCUDA::test_out__refs_bitwise_and_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0208191Z test_ops.py::TestCommonCUDA::test_out__refs_bitwise_left_shift_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0208353Z test_ops.py::TestCommonCUDA::test_out__refs_bitwise_right_shift_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0208506Z test_ops.py::TestCommonCUDA::test_out__refs_bitwise_xor_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0208667Z test_ops.py::TestCommonCUDA::test_out__refs_broadcast_shapes_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0208827Z test_ops.py::TestCommonCUDA::test_out__refs_broadcast_tensors_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0208984Z test_ops.py::TestCommonCUDA::test_out__refs_broadcast_to_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0209137Z test_ops.py::TestCommonCUDA::test_out__refs_bucketize_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0209285Z test_ops.py::TestCommonCUDA::test_out__refs_chunk_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0209437Z test_ops.py::TestCommonCUDA::test_out__refs_clamp_max_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0209592Z test_ops.py::TestCommonCUDA::test_out__refs_column_stack_cuda_float32 PASSED [ 15%] 
2023-01-11T23:10:17.0209753Z test_ops.py::TestCommonCUDA::test_out__refs_constant_pad_nd_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0209931Z test_ops.py::TestCommonCUDA::test_out__refs_contiguous_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0210081Z test_ops.py::TestCommonCUDA::test_out__refs_cosh_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0210231Z test_ops.py::TestCommonCUDA::test_out__refs_diag_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0210384Z test_ops.py::TestCommonCUDA::test_out__refs_diag_embed_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0210538Z test_ops.py::TestCommonCUDA::test_out__refs_diagonal_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0210705Z test_ops.py::TestCommonCUDA::test_out__refs_diagonal_scatter_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0210867Z test_ops.py::TestCommonCUDA::test_out__refs_digamma_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0211055Z test_ops.py::TestCommonCUDA::test_out__refs_div_no_rounding_mode_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0211203Z test_ops.py::TestCommonCUDA::test_out__refs_dstack_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0211398Z test_ops.py::TestCommonCUDA::test_out__refs_empty_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 15%] 2023-01-11T23:10:17.0211595Z test_ops.py::TestCommonCUDA::test_out__refs_empty_like_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 15%] 2023-01-11T23:10:17.0211741Z test_ops.py::TestCommonCUDA::test_out__refs_eq_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0211888Z test_ops.py::TestCommonCUDA::test_out__refs_erf_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0212037Z test_ops.py::TestCommonCUDA::test_out__refs_expand_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0212177Z test_ops.py::TestCommonCUDA::test_out__refs_expm1_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0212346Z test_ops.py::TestCommonCUDA::test_out__refs_eye_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0212494Z test_ops.py::TestCommonCUDA::test_out__refs_fft_fft2_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0212644Z test_ops.py::TestCommonCUDA::test_out__refs_fft_fftn_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0212799Z test_ops.py::TestCommonCUDA::test_out__refs_fft_hfft2_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0212949Z test_ops.py::TestCommonCUDA::test_out__refs_fft_ifft2_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0213095Z test_ops.py::TestCommonCUDA::test_out__refs_fft_ifft_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0213247Z test_ops.py::TestCommonCUDA::test_out__refs_fft_ifftn_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0213400Z test_ops.py::TestCommonCUDA::test_out__refs_fft_ifftshift_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0213556Z test_ops.py::TestCommonCUDA::test_out__refs_fft_ihfftn_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0213705Z test_ops.py::TestCommonCUDA::test_out__refs_fft_irfft_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0213862Z test_ops.py::TestCommonCUDA::test_out__refs_fft_irfftn_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0214011Z test_ops.py::TestCommonCUDA::test_out__refs_fft_rfft_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0214162Z test_ops.py::TestCommonCUDA::test_out__refs_fft_rfftn_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0214309Z test_ops.py::TestCommonCUDA::test_out__refs_fill_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0214458Z test_ops.py::TestCommonCUDA::test_out__refs_flipud_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0214719Z 
test_ops.py::TestCommonCUDA::test_out__refs_float_power_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0214859Z test_ops.py::TestCommonCUDA::test_out__refs_floor_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0215017Z test_ops.py::TestCommonCUDA::test_out__refs_floor_divide_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0215163Z test_ops.py::TestCommonCUDA::test_out__refs_fmax_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0215313Z test_ops.py::TestCommonCUDA::test_out__refs_fmin_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0215456Z test_ops.py::TestCommonCUDA::test_out__refs_gcd_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0215637Z test_ops.py::TestCommonCUDA::test_out__refs_gt_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0215796Z test_ops.py::TestCommonCUDA::test_out__refs_hsplit_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0215941Z test_ops.py::TestCommonCUDA::test_out__refs_hstack_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0216080Z test_ops.py::TestCommonCUDA::test_out__refs_hypot_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0216222Z test_ops.py::TestCommonCUDA::test_out__refs_i0_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0216368Z test_ops.py::TestCommonCUDA::test_out__refs_igamma_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0216515Z test_ops.py::TestCommonCUDA::test_out__refs_imag_cuda_complex64 PASSED [ 15%] 2023-01-11T23:10:17.0216671Z test_ops.py::TestCommonCUDA::test_out__refs_index_add_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0216832Z test_ops.py::TestCommonCUDA::test_out__refs_index_select_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0216981Z test_ops.py::TestCommonCUDA::test_out__refs_isinf_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0217129Z test_ops.py::TestCommonCUDA::test_out__refs_isneginf_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0217271Z test_ops.py::TestCommonCUDA::test_out__refs_isposinf_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0217417Z test_ops.py::TestCommonCUDA::test_out__refs_isreal_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0217560Z test_ops.py::TestCommonCUDA::test_out__refs_lcm_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0217705Z test_ops.py::TestCommonCUDA::test_out__refs_lgamma_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0217868Z test_ops.py::TestCommonCUDA::test_out__refs_linalg_matrix_norm_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0218021Z test_ops.py::TestCommonCUDA::test_out__refs_linalg_svd_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0218213Z test_ops.py::TestCommonCUDA::test_out__refs_linalg_vector_norm_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0218362Z test_ops.py::TestCommonCUDA::test_out__refs_linspace_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0218514Z test_ops.py::TestCommonCUDA::test_out__refs_logical_not_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0218660Z test_ops.py::TestCommonCUDA::test_out__refs_lt_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0218805Z test_ops.py::TestCommonCUDA::test_out__refs_mean_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0218976Z test_ops.py::TestCommonCUDA::test_out__refs_meshgrid_list_of_tensors_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0219149Z test_ops.py::TestCommonCUDA::test_out__refs_meshgrid_variadic_tensors_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0219298Z test_ops.py::TestCommonCUDA::test_out__refs_minimum_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0219442Z test_ops.py::TestCommonCUDA::test_out__refs_mul_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0219617Z 
test_ops.py::TestCommonCUDA::test_out__refs_narrow_copy_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0219772Z test_ops.py::TestCommonCUDA::test_out__refs_native_layer_norm_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0219921Z test_ops.py::TestCommonCUDA::test_out__refs_neg_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0220093Z test_ops.py::TestCommonCUDA::test_out__refs_new_full_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0220263Z test_ops.py::TestCommonCUDA::test_out__refs_new_ones_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0220417Z test_ops.py::TestCommonCUDA::test_out__refs_nextafter_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0220592Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_alpha_dropout_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0220755Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_celu_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0220920Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_elu_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0221085Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_gelu_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0221264Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_glu_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0221438Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_hardshrink_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0221605Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_huber_loss_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0221769Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_l1_loss_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0221935Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_layer_norm_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0222097Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_mish_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0222263Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_mse_loss_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0222430Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_prelu_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0222584Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_selu_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0222765Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softmax_with_dtype_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0222944Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0223112Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softplus_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0223278Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_tanhshrink_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0223457Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0223605Z test_ops.py::TestCommonCUDA::test_out__refs_norm_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0223776Z test_ops.py::TestCommonCUDA::test_out__refs_ones_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0223927Z test_ops.py::TestCommonCUDA::test_out__refs_permute_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0224071Z test_ops.py::TestCommonCUDA::test_out__refs_real_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0224225Z test_ops.py::TestCommonCUDA::test_out__refs_remainder_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0224373Z test_ops.py::TestCommonCUDA::test_out__refs_repeat_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0224528Z 
test_ops.py::TestCommonCUDA::test_out__refs_reshape_as_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0224680Z test_ops.py::TestCommonCUDA::test_out__refs_reshape_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0224824Z test_ops.py::TestCommonCUDA::test_out__refs_rot90_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0224968Z test_ops.py::TestCommonCUDA::test_out__refs_rsub_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0225116Z test_ops.py::TestCommonCUDA::test_out__refs_signbit_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0225256Z test_ops.py::TestCommonCUDA::test_out__refs_sin_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0225414Z test_ops.py::TestCommonCUDA::test_out__refs_sinc_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0225631Z test_ops.py::TestCommonCUDA::test_out__refs_special_bessel_j0_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0225844Z test_ops.py::TestCommonCUDA::test_out__refs_special_entr_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0226047Z test_ops.py::TestCommonCUDA::test_out__refs_special_i0e_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0226217Z test_ops.py::TestCommonCUDA::test_out__refs_special_i1e_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0226375Z test_ops.py::TestCommonCUDA::test_out__refs_special_logit_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0226556Z test_ops.py::TestCommonCUDA::test_out__refs_special_multigammaln_mvlgamma_p_3_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0226714Z test_ops.py::TestCommonCUDA::test_out__refs_special_ndtri_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0226887Z test_ops.py::TestCommonCUDA::test_out__refs_special_spherical_bessel_j0_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0227095Z test_ops.py::TestCommonCUDA::test_out__refs_special_xlog1py_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0227253Z test_ops.py::TestCommonCUDA::test_out__refs_special_zeta_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0227402Z test_ops.py::TestCommonCUDA::test_out__refs_square_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0227551Z test_ops.py::TestCommonCUDA::test_out__refs_squeeze_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0227698Z test_ops.py::TestCommonCUDA::test_out__refs_std_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0227842Z test_ops.py::TestCommonCUDA::test_out__refs_sub_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0227990Z test_ops.py::TestCommonCUDA::test_out__refs_sum_to_size_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0228133Z test_ops.py::TestCommonCUDA::test_out__refs_t_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0228280Z test_ops.py::TestCommonCUDA::test_out__refs_tan_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0228436Z test_ops.py::TestCommonCUDA::test_out__refs_tensor_split_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0228595Z test_ops.py::TestCommonCUDA::test_out__refs_transpose_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0228828Z test_ops.py::TestCommonCUDA::test_out__refs_tril_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0228978Z test_ops.py::TestCommonCUDA::test_out__refs_unbind_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0229131Z test_ops.py::TestCommonCUDA::test_out__refs_unflatten_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0229285Z test_ops.py::TestCommonCUDA::test_out__refs_unfold_copy_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0229426Z test_ops.py::TestCommonCUDA::test_out__refs_unfold_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0229579Z test_ops.py::TestCommonCUDA::test_out__refs_unsqueeze_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0229758Z 
test_ops.py::TestCommonCUDA::test_out__refs_view_as_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0229907Z test_ops.py::TestCommonCUDA::test_out__refs_vstack_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0230058Z test_ops.py::TestCommonCUDA::test_out__refs_zeros_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0230204Z test_ops.py::TestCommonCUDA::test_out_abs_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0230350Z test_ops.py::TestCommonCUDA::test_out_addcmul_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0230496Z test_ops.py::TestCommonCUDA::test_out_addmv_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0230635Z test_ops.py::TestCommonCUDA::test_out_addr_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0230864Z test_ops.py::TestCommonCUDA::test_out_allclose_cuda_float32 SKIPPED (Skipped! Only supports single tensor or iterable of tensor outputs.) [ 16%] 2023-01-11T23:10:17.0231008Z test_ops.py::TestCommonCUDA::test_out_amax_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0231154Z test_ops.py::TestCommonCUDA::test_out_any_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0231298Z test_ops.py::TestCommonCUDA::test_out_arange_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0231446Z test_ops.py::TestCommonCUDA::test_out_argmin_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0231592Z test_ops.py::TestCommonCUDA::test_out_argwhere_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0231736Z test_ops.py::TestCommonCUDA::test_out_asinh_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0231873Z test_ops.py::TestCommonCUDA::test_out_atan_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0232019Z test_ops.py::TestCommonCUDA::test_out_bitwise_and_cuda_int64 PASSED [ 16%] 2023-01-11T23:10:17.0232164Z test_ops.py::TestCommonCUDA::test_out_bitwise_or_cuda_int64 PASSED [ 16%] 2023-01-11T23:10:17.0232321Z test_ops.py::TestCommonCUDA::test_out_bitwise_right_shift_cuda_int64 PASSED [ 16%] 2023-01-11T23:10:17.0232471Z test_ops.py::TestCommonCUDA::test_out_bitwise_xor_cuda_int64 PASSED [ 16%] 2023-01-11T23:10:17.0232617Z test_ops.py::TestCommonCUDA::test_out_block_diag_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0232784Z test_ops.py::TestCommonCUDA::test_out_bmm_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0232934Z test_ops.py::TestCommonCUDA::test_out_broadcast_to_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0233071Z test_ops.py::TestCommonCUDA::test_out_byte_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0233221Z test_ops.py::TestCommonCUDA::test_out_cartesian_prod_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0233364Z test_ops.py::TestCommonCUDA::test_out_cat_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0233508Z test_ops.py::TestCommonCUDA::test_out_cdist_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0233653Z test_ops.py::TestCommonCUDA::test_out_cdouble_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0233800Z test_ops.py::TestCommonCUDA::test_out_ceil_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0233944Z test_ops.py::TestCommonCUDA::test_out_cfloat_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0234092Z test_ops.py::TestCommonCUDA::test_out_chalf_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0234230Z test_ops.py::TestCommonCUDA::test_out_char_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0234379Z test_ops.py::TestCommonCUDA::test_out_cholesky_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0234532Z test_ops.py::TestCommonCUDA::test_out_cholesky_inverse_cuda_float32 XFAIL [ 16%] 2023-01-11T23:10:17.0234677Z test_ops.py::TestCommonCUDA::test_out_chunk_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0234822Z 
test_ops.py::TestCommonCUDA::test_out_clamp_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0234972Z test_ops.py::TestCommonCUDA::test_out_clamp_max_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0235120Z test_ops.py::TestCommonCUDA::test_out_clamp_min_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0235290Z test_ops.py::TestCommonCUDA::test_out_clone_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0235434Z test_ops.py::TestCommonCUDA::test_out_combinations_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0235584Z test_ops.py::TestCommonCUDA::test_out_complex_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0235738Z test_ops.py::TestCommonCUDA::test_out_constant_pad_nd_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0235887Z test_ops.py::TestCommonCUDA::test_out_contiguous_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0236036Z test_ops.py::TestCommonCUDA::test_out_copysign_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0236180Z test_ops.py::TestCommonCUDA::test_out_cos_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0236327Z test_ops.py::TestCommonCUDA::test_out_cross_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0236474Z test_ops.py::TestCommonCUDA::test_out_cummin_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0236613Z test_ops.py::TestCommonCUDA::test_out_cumsum_cuda_float32 XFAIL [ 16%] 2023-01-11T23:10:17.0236757Z test_ops.py::TestCommonCUDA::test_out_diag_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0236908Z test_ops.py::TestCommonCUDA::test_out_diagflat_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0237059Z test_ops.py::TestCommonCUDA::test_out_diagonal_copy_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0237203Z test_ops.py::TestCommonCUDA::test_out_diff_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0237345Z test_ops.py::TestCommonCUDA::test_out_dist_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0237509Z test_ops.py::TestCommonCUDA::test_out_div_no_rounding_mode_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0237669Z test_ops.py::TestCommonCUDA::test_out_div_trunc_rounding_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0237813Z test_ops.py::TestCommonCUDA::test_out_dot_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0237955Z test_ops.py::TestCommonCUDA::test_out_dstack_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0238144Z test_ops.py::TestCommonCUDA::test_out_empty_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 16%] 2023-01-11T23:10:17.0238309Z test_ops.py::TestCommonCUDA::test_out_eq_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0238453Z test_ops.py::TestCommonCUDA::test_out_erf_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0238599Z test_ops.py::TestCommonCUDA::test_out_erfc_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0238742Z test_ops.py::TestCommonCUDA::test_out_exp2_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0238886Z test_ops.py::TestCommonCUDA::test_out_expm1_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0239029Z test_ops.py::TestCommonCUDA::test_out_eye_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0239168Z test_ops.py::TestCommonCUDA::test_out_fft_fft2_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0239315Z test_ops.py::TestCommonCUDA::test_out_fft_fft_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0239465Z test_ops.py::TestCommonCUDA::test_out_fft_fftn_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0239611Z test_ops.py::TestCommonCUDA::test_out_fft_hfft2_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0239759Z test_ops.py::TestCommonCUDA::test_out_fft_hfft_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0239906Z 
test_ops.py::TestCommonCUDA::test_out_fft_hfftn_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0240051Z test_ops.py::TestCommonCUDA::test_out_fft_ifft2_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0240222Z test_ops.py::TestCommonCUDA::test_out_fft_ifft_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0240384Z test_ops.py::TestCommonCUDA::test_out_fft_ifftn_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0240537Z test_ops.py::TestCommonCUDA::test_out_fft_ifftshift_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0240683Z test_ops.py::TestCommonCUDA::test_out_fft_ihfft2_cuda_float32 XFAIL [ 16%] 2023-01-11T23:10:17.0240854Z test_ops.py::TestCommonCUDA::test_out_fft_ihfft_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0241001Z test_ops.py::TestCommonCUDA::test_out_fft_irfft2_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0241149Z test_ops.py::TestCommonCUDA::test_out_fft_irfft_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0241295Z test_ops.py::TestCommonCUDA::test_out_fft_irfftn_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0241442Z test_ops.py::TestCommonCUDA::test_out_fft_rfft2_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0241581Z test_ops.py::TestCommonCUDA::test_out_fft_rfft_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0241726Z test_ops.py::TestCommonCUDA::test_out_fill_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0241873Z test_ops.py::TestCommonCUDA::test_out_flatten_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0242016Z test_ops.py::TestCommonCUDA::test_out_flip_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0242166Z test_ops.py::TestCommonCUDA::test_out_fliplr_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0242310Z test_ops.py::TestCommonCUDA::test_out_flipud_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0242457Z test_ops.py::TestCommonCUDA::test_out_float_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0242603Z test_ops.py::TestCommonCUDA::test_out_float_power_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0242741Z test_ops.py::TestCommonCUDA::test_out_fmax_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0242885Z test_ops.py::TestCommonCUDA::test_out_fmod_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0243031Z test_ops.py::TestCommonCUDA::test_out_gather_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0243175Z test_ops.py::TestCommonCUDA::test_out_geqrf_cuda_float32 XFAIL [ 17%] 2023-01-11T23:10:17.0243331Z test_ops.py::TestCommonCUDA::test_out_grid_sampler_2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0243475Z test_ops.py::TestCommonCUDA::test_out_half_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0243626Z test_ops.py::TestCommonCUDA::test_out_heaviside_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0243768Z test_ops.py::TestCommonCUDA::test_out_histc_cuda_float32 XFAIL [ 17%] 2023-01-11T23:10:17.0243966Z test_ops.py::TestCommonCUDA::test_out_hstack_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0244112Z test_ops.py::TestCommonCUDA::test_out_hypot_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0244262Z test_ops.py::TestCommonCUDA::test_out_index_reduce_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0244407Z test_ops.py::TestCommonCUDA::test_out_int_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0244554Z test_ops.py::TestCommonCUDA::test_out_isclose_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0244701Z test_ops.py::TestCommonCUDA::test_out_isfinite_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0244845Z test_ops.py::TestCommonCUDA::test_out_isin_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0244992Z test_ops.py::TestCommonCUDA::test_out_isnan_cuda_float32 PASSED [ 17%] 
2023-01-11T23:10:17.0245135Z test_ops.py::TestCommonCUDA::test_out_isposinf_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0245282Z test_ops.py::TestCommonCUDA::test_out_isreal_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0245452Z test_ops.py::TestCommonCUDA::test_out_jiterator_2inputs_2outputs_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0245630Z test_ops.py::TestCommonCUDA::test_out_jiterator_4inputs_with_extra_args_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0245802Z test_ops.py::TestCommonCUDA::test_out_jiterator_binary_return_by_ref_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0245958Z test_ops.py::TestCommonCUDA::test_out_jiterator_unary_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0246101Z test_ops.py::TestCommonCUDA::test_out_le_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0246248Z test_ops.py::TestCommonCUDA::test_out_lgamma_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0246422Z test_ops.py::TestCommonCUDA::test_out_linalg_cross_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0246563Z test_ops.py::TestCommonCUDA::test_out_linalg_det_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0246729Z test_ops.py::TestCommonCUDA::test_out_linalg_det_singular_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0246900Z test_ops.py::TestCommonCUDA::test_out_linalg_householder_product_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0247059Z test_ops.py::TestCommonCUDA::test_out_linalg_lu_factor_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0247220Z test_ops.py::TestCommonCUDA::test_out_linalg_lu_factor_ex_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0247374Z test_ops.py::TestCommonCUDA::test_out_linalg_lu_solve_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0247534Z test_ops.py::TestCommonCUDA::test_out_linalg_matrix_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0247689Z test_ops.py::TestCommonCUDA::test_out_linalg_matrix_rank_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0247857Z test_ops.py::TestCommonCUDA::test_out_linalg_matrix_rank_hermitian_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0248013Z test_ops.py::TestCommonCUDA::test_out_linalg_multi_dot_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0248181Z test_ops.py::TestCommonCUDA::test_out_linalg_pinv_hermitian_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0248406Z test_ops.py::TestCommonCUDA::test_out_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 17%] 2023-01-11T23:10:17.0248556Z test_ops.py::TestCommonCUDA::test_out_linalg_qr_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0248707Z test_ops.py::TestCommonCUDA::test_out_linalg_solve_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0248862Z test_ops.py::TestCommonCUDA::test_out_linalg_solve_ex_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0249020Z test_ops.py::TestCommonCUDA::test_out_linalg_tensorinv_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0249165Z test_ops.py::TestCommonCUDA::test_out_linalg_vecdot_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0249317Z test_ops.py::TestCommonCUDA::test_out_linspace_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0249463Z test_ops.py::TestCommonCUDA::test_out_log2_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0249636Z test_ops.py::TestCommonCUDA::test_out_logaddexp_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0249791Z test_ops.py::TestCommonCUDA::test_out_logcumsumexp_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0249937Z test_ops.py::TestCommonCUDA::test_out_logdet_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0250086Z 
test_ops.py::TestCommonCUDA::test_out_logical_or_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0250235Z test_ops.py::TestCommonCUDA::test_out_logical_xor_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0250380Z test_ops.py::TestCommonCUDA::test_out_logit_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0250521Z test_ops.py::TestCommonCUDA::test_out_logspace_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0250672Z test_ops.py::TestCommonCUDA::test_out_logsumexp_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0250819Z test_ops.py::TestCommonCUDA::test_out_masked_amax_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0250975Z test_ops.py::TestCommonCUDA::test_out_masked_cumsum_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0251125Z test_ops.py::TestCommonCUDA::test_out_masked_fill_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0251287Z test_ops.py::TestCommonCUDA::test_out_masked_log_softmax_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0251445Z test_ops.py::TestCommonCUDA::test_out_masked_logaddexp_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0251597Z test_ops.py::TestCommonCUDA::test_out_masked_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0251749Z test_ops.py::TestCommonCUDA::test_out_masked_normalize_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0251897Z test_ops.py::TestCommonCUDA::test_out_masked_prod_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0252073Z test_ops.py::TestCommonCUDA::test_out_masked_scatter_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0252224Z test_ops.py::TestCommonCUDA::test_out_masked_std_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0252374Z test_ops.py::TestCommonCUDA::test_out_max_binary_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0252540Z test_ops.py::TestCommonCUDA::test_out_max_reduction_no_dim_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0252705Z test_ops.py::TestCommonCUDA::test_out_max_reduction_with_dim_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0252851Z test_ops.py::TestCommonCUDA::test_out_maximum_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0252992Z test_ops.py::TestCommonCUDA::test_out_median_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0253158Z test_ops.py::TestCommonCUDA::test_out_meshgrid_list_of_tensors_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0253306Z test_ops.py::TestCommonCUDA::test_out_min_binary_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0253467Z test_ops.py::TestCommonCUDA::test_out_min_reduction_no_dim_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0253633Z test_ops.py::TestCommonCUDA::test_out_min_reduction_with_dim_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0253779Z test_ops.py::TestCommonCUDA::test_out_minimum_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0253926Z test_ops.py::TestCommonCUDA::test_out_mode_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0254078Z test_ops.py::TestCommonCUDA::test_out_movedim_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0254216Z test_ops.py::TestCommonCUDA::test_out_msort_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0254384Z test_ops.py::TestCommonCUDA::test_out_multinomial_cuda_float32 SKIPPED (Skipped!) 
[ 17%] 2023-01-11T23:10:17.0254631Z test_ops.py::TestCommonCUDA::test_out_mv_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0254796Z test_ops.py::TestCommonCUDA::test_out_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0254939Z test_ops.py::TestCommonCUDA::test_out_nan_to_num_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0255090Z test_ops.py::TestCommonCUDA::test_out_nanmedian_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0255237Z test_ops.py::TestCommonCUDA::test_out_nansum_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0255426Z test_ops.py::TestCommonCUDA::test_out_narrow_copy_cuda_float32 XFAIL [ 17%] 2023-01-11T23:10:17.0255565Z test_ops.py::TestCommonCUDA::test_out_narrow_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0255722Z test_ops.py::TestCommonCUDA::test_out_native_batch_norm_cuda_float32 XFAIL [ 17%] 2023-01-11T23:10:17.0255878Z test_ops.py::TestCommonCUDA::test_out_native_layer_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0256018Z test_ops.py::TestCommonCUDA::test_out_ne_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0256159Z test_ops.py::TestCommonCUDA::test_out_neg_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0256317Z test_ops.py::TestCommonCUDA::test_out_new_empty_strided_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0256466Z test_ops.py::TestCommonCUDA::test_out_nextafter_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0256655Z test_ops.py::TestCommonCUDA::test_out_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0256835Z test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0256998Z test_ops.py::TestCommonCUDA::test_out_nn_functional_alpha_dropout_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0257161Z test_ops.py::TestCommonCUDA::test_out_nn_functional_batch_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0257342Z test_ops.py::TestCommonCUDA::test_out_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0257504Z test_ops.py::TestCommonCUDA::test_out_nn_functional_bilinear_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0257662Z test_ops.py::TestCommonCUDA::test_out_nn_functional_celu_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0257858Z test_ops.py::TestCommonCUDA::test_out_nn_functional_conv2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0258031Z test_ops.py::TestCommonCUDA::test_out_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0258207Z test_ops.py::TestCommonCUDA::test_out_nn_functional_cosine_similarity_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0258366Z test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0258529Z test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0258720Z test_ops.py::TestCommonCUDA::test_out_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0258899Z test_ops.py::TestCommonCUDA::test_out_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0259057Z test_ops.py::TestCommonCUDA::test_out_nn_functional_glu_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0259227Z test_ops.py::TestCommonCUDA::test_out_nn_functional_grid_sample_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0259396Z test_ops.py::TestCommonCUDA::test_out_nn_functional_group_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0259562Z 
test_ops.py::TestCommonCUDA::test_out_nn_functional_hardshrink_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0259730Z test_ops.py::TestCommonCUDA::test_out_nn_functional_hardswish_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0259888Z test_ops.py::TestCommonCUDA::test_out_nn_functional_hardtanh_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0260062Z test_ops.py::TestCommonCUDA::test_out_nn_functional_huber_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0260261Z test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_area_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0260448Z test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0260627Z test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0260805Z test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_linear_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0260965Z test_ops.py::TestCommonCUDA::test_out_nn_functional_kl_div_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0261153Z test_ops.py::TestCommonCUDA::test_out_nn_functional_l1_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0261310Z test_ops.py::TestCommonCUDA::test_out_nn_functional_leaky_relu_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0261475Z test_ops.py::TestCommonCUDA::test_out_nn_functional_logsigmoid_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0261651Z test_ops.py::TestCommonCUDA::test_out_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0261815Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_pool1d_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0261982Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool3d_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0262155Z test_ops.py::TestCommonCUDA::test_out_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0262319Z test_ops.py::TestCommonCUDA::test_out_nn_functional_one_hot_cuda_int64 PASSED [ 17%] 2023-01-11T23:10:17.0262490Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_circular_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0262658Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_reflect_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0262822Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0262981Z test_ops.py::TestCommonCUDA::test_out_nn_functional_relu_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0263138Z test_ops.py::TestCommonCUDA::test_out_nn_functional_silu_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0263305Z test_ops.py::TestCommonCUDA::test_out_nn_functional_softmin_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0263482Z test_ops.py::TestCommonCUDA::test_out_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0263671Z test_ops.py::TestCommonCUDA::test_out_nn_functional_softshrink_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0263834Z test_ops.py::TestCommonCUDA::test_out_nn_functional_softsign_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0264013Z test_ops.py::TestCommonCUDA::test_out_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0264179Z test_ops.py::TestCommonCUDA::test_out_nn_functional_upsample_nearest_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0264329Z test_ops.py::TestCommonCUDA::test_out_norm_inf_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0264478Z test_ops.py::TestCommonCUDA::test_out_norm_nuc_cuda_float32 PASSED [ 
17%] 2023-01-11T23:10:17.0264639Z test_ops.py::TestCommonCUDA::test_out_normal_number_mean_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0264789Z test_ops.py::TestCommonCUDA::test_out_ones_like_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0264959Z test_ops.py::TestCommonCUDA::test_out_ops_nvprims_native_batch_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0265122Z test_ops.py::TestCommonCUDA::test_out_ops_nvprims_var_mean_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0265279Z test_ops.py::TestCommonCUDA::test_out_ops_nvprims_view_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0265423Z test_ops.py::TestCommonCUDA::test_out_outer_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0265590Z test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_0_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0265770Z test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_1_cuda_float32 SKIPPED (Skipped!) [ 17%] 2023-01-11T23:10:17.0265946Z test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_2_cuda_float32 SKIPPED (Skipped!) [ 17%] 2023-01-11T23:10:17.0266118Z test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_3_cuda_float32 SKIPPED (Skipped!) [ 17%] 2023-01-11T23:10:17.0266290Z test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_4_cuda_float32 SKIPPED (Skipped!) [ 17%] 2023-01-11T23:10:17.0266443Z test_ops.py::TestCommonCUDA::test_out_positive_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0266592Z test_ops.py::TestCommonCUDA::test_out_qr_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0266744Z test_ops.py::TestCommonCUDA::test_out_quantile_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0266929Z test_ops.py::TestCommonCUDA::test_out_randint_like_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0267076Z test_ops.py::TestCommonCUDA::test_out_randn_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0267224Z test_ops.py::TestCommonCUDA::test_out_randn_like_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0267370Z test_ops.py::TestCommonCUDA::test_out_real_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0267518Z test_ops.py::TestCommonCUDA::test_out_reciprocal_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0267663Z test_ops.py::TestCommonCUDA::test_out_renorm_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0267809Z test_ops.py::TestCommonCUDA::test_out_repeat_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0267975Z test_ops.py::TestCommonCUDA::test_out_repeat_interleave_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0268115Z test_ops.py::TestCommonCUDA::test_out_resize__cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0268265Z test_ops.py::TestCommonCUDA::test_out_resize_as__cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0268412Z test_ops.py::TestCommonCUDA::test_out_resolve_neg_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0268556Z test_ops.py::TestCommonCUDA::test_out_round_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0268792Z test_ops.py::TestCommonCUDA::test_out_round_decimals_3_cuda_float32 SKIPPED (Skipped!) [ 18%] 2023-01-11T23:10:17.0268988Z test_ops.py::TestCommonCUDA::test_out_round_decimals_neg_3_cuda_float32 SKIPPED (Skipped!) 
[ 18%] 2023-01-11T23:10:17.0269134Z test_ops.py::TestCommonCUDA::test_out_rsqrt_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0269284Z test_ops.py::TestCommonCUDA::test_out_scatter_add_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0269456Z test_ops.py::TestCommonCUDA::test_out_scatter_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0269618Z test_ops.py::TestCommonCUDA::test_out_scatter_reduce_amax_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0269781Z test_ops.py::TestCommonCUDA::test_out_scatter_reduce_sum_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0269935Z test_ops.py::TestCommonCUDA::test_out_searchsorted_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0270103Z test_ops.py::TestCommonCUDA::test_out_segment_reduce_lengths_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0270251Z test_ops.py::TestCommonCUDA::test_out_select_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0270398Z test_ops.py::TestCommonCUDA::test_out_short_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0270547Z test_ops.py::TestCommonCUDA::test_out_sigmoid_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0270712Z test_ops.py::TestCommonCUDA::test_out_signal_windows_exponential_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0270879Z test_ops.py::TestCommonCUDA::test_out_signal_windows_hann_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0271042Z test_ops.py::TestCommonCUDA::test_out_signal_windows_kaiser_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0271192Z test_ops.py::TestCommonCUDA::test_out_signbit_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0271337Z test_ops.py::TestCommonCUDA::test_out_sin_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0271484Z test_ops.py::TestCommonCUDA::test_out_sinc_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0271631Z test_ops.py::TestCommonCUDA::test_out_slice_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0271775Z test_ops.py::TestCommonCUDA::test_out_sort_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0271932Z test_ops.py::TestCommonCUDA::test_out_special_airy_ai_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0272087Z test_ops.py::TestCommonCUDA::test_out_special_bessel_j1_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0272267Z test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_t_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0272443Z test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_u_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0272827Z test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 18%] 2023-01-11T23:10:17.0272984Z test_ops.py::TestCommonCUDA::test_out_special_entr_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0273159Z test_ops.py::TestCommonCUDA::test_out_special_hermite_polynomial_h_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0273310Z test_ops.py::TestCommonCUDA::test_out_special_i0e_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0273487Z test_ops.py::TestCommonCUDA::test_out_special_laguerre_polynomial_l_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0273650Z test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_k0_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0273824Z test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_k1_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0274001Z test_ops.py::TestCommonCUDA::test_out_special_scaled_modified_bessel_k0_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0274177Z test_ops.py::TestCommonCUDA::test_out_special_scaled_modified_bessel_k1_cuda_float32 PASSED [ 18%] 
2023-01-11T23:10:17.0274530Z test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 18%] 2023-01-11T23:10:17.0274870Z test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 18%] 2023-01-11T23:10:17.0275204Z test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 18%] 2023-01-11T23:10:17.0275380Z test_ops.py::TestCommonCUDA::test_out_special_spherical_bessel_j0_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0275561Z test_ops.py::TestCommonCUDA::test_out_special_zeta_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0275712Z test_ops.py::TestCommonCUDA::test_out_split_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0275868Z test_ops.py::TestCommonCUDA::test_out_split_with_sizes_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0276015Z test_ops.py::TestCommonCUDA::test_out_sqrt_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0276166Z test_ops.py::TestCommonCUDA::test_out_square_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0276313Z test_ops.py::TestCommonCUDA::test_out_stack_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0276460Z test_ops.py::TestCommonCUDA::test_out_std_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0276612Z test_ops.py::TestCommonCUDA::test_out_std_unbiased_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0276756Z test_ops.py::TestCommonCUDA::test_out_sub_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0276911Z test_ops.py::TestCommonCUDA::test_out_sum_to_size_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0277051Z test_ops.py::TestCommonCUDA::test_out_svd_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0277204Z test_ops.py::TestCommonCUDA::test_out_svd_lowrank_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0277354Z test_ops.py::TestCommonCUDA::test_out_tensor_split_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0277506Z test_ops.py::TestCommonCUDA::test_out_tensordot_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0277652Z test_ops.py::TestCommonCUDA::test_out_tile_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0277798Z test_ops.py::TestCommonCUDA::test_out_topk_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0277944Z test_ops.py::TestCommonCUDA::test_out_trace_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0278095Z test_ops.py::TestCommonCUDA::test_out_transpose_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0278240Z test_ops.py::TestCommonCUDA::test_out_trapezoid_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0278402Z test_ops.py::TestCommonCUDA::test_out_triangular_solve_cuda_float32 XFAIL [ 18%] 2023-01-11T23:10:17.0278548Z test_ops.py::TestCommonCUDA::test_out_tril_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0278718Z test_ops.py::TestCommonCUDA::test_out_triu_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0278870Z test_ops.py::TestCommonCUDA::test_out_true_divide_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0279020Z test_ops.py::TestCommonCUDA::test_out_unflatten_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0279169Z test_ops.py::TestCommonCUDA::test_out_unfold_copy_cuda_float32 XFAIL [ 18%] 2023-01-11T23:10:17.0279316Z test_ops.py::TestCommonCUDA::test_out_unfold_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0279473Z test_ops.py::TestCommonCUDA::test_out_unique_consecutive_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0279621Z 
test_ops.py::TestCommonCUDA::test_out_unique_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0279769Z test_ops.py::TestCommonCUDA::test_out_var_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0279921Z test_ops.py::TestCommonCUDA::test_out_var_unbiased_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0280070Z test_ops.py::TestCommonCUDA::test_out_view_as_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0280216Z test_ops.py::TestCommonCUDA::test_out_view_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0280362Z test_ops.py::TestCommonCUDA::test_out_vstack_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0280524Z test_ops.py::TestCommonCUDA::test_out_warning___rand___cuda PASSED [ 18%] 2023-01-11T23:10:17.0280684Z test_ops.py::TestCommonCUDA::test_out_warning___rdiv___cuda PASSED [ 18%] 2023-01-11T23:10:17.0280850Z test_ops.py::TestCommonCUDA::test_out_warning___rmatmul___cuda PASSED [ 18%] 2023-01-11T23:10:17.0280995Z test_ops.py::TestCommonCUDA::test_out_warning___ror___cuda PASSED [ 18%] 2023-01-11T23:10:17.0281190Z test_ops.py::TestCommonCUDA::test_out_warning__native_batch_norm_legit_cuda PASSED [ 18%] 2023-01-11T23:10:17.0281337Z test_ops.py::TestCommonCUDA::test_out_warning__refs_T_cuda PASSED [ 18%] 2023-01-11T23:10:17.0281514Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_bfloat16_cuda PASSED [ 18%] 2023-01-11T23:10:17.0281684Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_byte_cuda PASSED [ 18%] 2023-01-11T23:10:17.0281857Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_cdouble_cuda PASSED [ 18%] 2023-01-11T23:10:17.0282026Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_chalf_cuda PASSED [ 18%] 2023-01-11T23:10:17.0282190Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_double_cuda PASSED [ 18%] 2023-01-11T23:10:17.0282355Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_float_cuda PASSED [ 18%] 2023-01-11T23:10:17.0282522Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_int_cuda PASSED [ 18%] 2023-01-11T23:10:17.0282691Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_long_cuda PASSED [ 18%] 2023-01-11T23:10:17.0282855Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_short_cuda PASSED [ 18%] 2023-01-11T23:10:17.0283013Z test_ops.py::TestCommonCUDA::test_out_warning__refs_abs_cuda PASSED [ 18%] 2023-01-11T23:10:17.0283166Z test_ops.py::TestCommonCUDA::test_out_warning__refs_addr_cuda PASSED [ 18%] 2023-01-11T23:10:17.0283316Z test_ops.py::TestCommonCUDA::test_out_warning__refs_all_cuda PASSED [ 18%] 2023-01-11T23:10:17.0283544Z test_ops.py::TestCommonCUDA::test_out_warning__refs_allclose_cuda SKIPPED (Skipped! Only supports single tensor or iterable of tensor outputs.) 
[ 18%] 2023-01-11T23:10:17.0283695Z test_ops.py::TestCommonCUDA::test_out_warning__refs_any_cuda PASSED [ 18%] 2023-01-11T23:10:17.0283849Z test_ops.py::TestCommonCUDA::test_out_warning__refs_arange_cuda PASSED [ 18%] 2023-01-11T23:10:17.0284015Z test_ops.py::TestCommonCUDA::test_out_warning__refs_as_strided_scatter_cuda PASSED [ 18%] 2023-01-11T23:10:17.0284172Z test_ops.py::TestCommonCUDA::test_out_warning__refs_atan2_cuda PASSED [ 18%] 2023-01-11T23:10:17.0284321Z test_ops.py::TestCommonCUDA::test_out_warning__refs_atan_cuda PASSED [ 18%] 2023-01-11T23:10:17.0284513Z test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_1d_cuda PASSED [ 18%] 2023-01-11T23:10:17.0284673Z test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_2d_cuda PASSED [ 18%] 2023-01-11T23:10:17.0284826Z test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_3d_cuda PASSED [ 18%] 2023-01-11T23:10:17.0284985Z test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_or_cuda PASSED [ 18%] 2023-01-11T23:10:17.0285154Z test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_right_shift_cuda PASSED [ 18%] 2023-01-11T23:10:17.0285323Z test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_tensors_cuda PASSED [ 18%] 2023-01-11T23:10:17.0285490Z test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_to_cuda PASSED [ 18%] 2023-01-11T23:10:17.0285644Z test_ops.py::TestCommonCUDA::test_out_warning__refs_cat_cuda PASSED [ 18%] 2023-01-11T23:10:17.0285797Z test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_cuda PASSED [ 18%] 2023-01-11T23:10:17.0285958Z test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_min_cuda PASSED [ 18%] 2023-01-11T23:10:17.0286121Z test_ops.py::TestCommonCUDA::test_out_warning__refs_column_stack_cuda PASSED [ 18%] 2023-01-11T23:10:17.0286279Z test_ops.py::TestCommonCUDA::test_out_warning__refs_conj_physical_cuda PASSED [ 18%] 2023-01-11T23:10:17.0286443Z test_ops.py::TestCommonCUDA::test_out_warning__refs_constant_pad_nd_cuda PASSED [ 18%] 2023-01-11T23:10:17.0286598Z test_ops.py::TestCommonCUDA::test_out_warning__refs_copysign_cuda PASSED [ 18%] 2023-01-11T23:10:17.0286752Z test_ops.py::TestCommonCUDA::test_out_warning__refs_cumsum_cuda PASSED [ 18%] 2023-01-11T23:10:17.0286921Z test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_scatter_cuda PASSED [ 18%] 2023-01-11T23:10:17.0287100Z test_ops.py::TestCommonCUDA::test_out_warning__refs_dstack_cuda PASSED [ 18%] 2023-01-11T23:10:17.0287306Z test_ops.py::TestCommonCUDA::test_out_warning__refs_empty_like_cuda SKIPPED (Expected: empty is not comparable) [ 18%] 2023-01-11T23:10:17.0287457Z test_ops.py::TestCommonCUDA::test_out_warning__refs_eq_cuda PASSED [ 18%] 2023-01-11T23:10:17.0287602Z test_ops.py::TestCommonCUDA::test_out_warning__refs_erf_cuda PASSED [ 18%] 2023-01-11T23:10:17.0287754Z test_ops.py::TestCommonCUDA::test_out_warning__refs_exp2_cuda PASSED [ 18%] 2023-01-11T23:10:17.0287907Z test_ops.py::TestCommonCUDA::test_out_warning__refs_expand_cuda PASSED [ 18%] 2023-01-11T23:10:17.0288056Z test_ops.py::TestCommonCUDA::test_out_warning__refs_eye_cuda PASSED [ 18%] 2023-01-11T23:10:17.0288207Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fft_cuda PASSED [ 18%] 2023-01-11T23:10:17.0288360Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fftn_cuda PASSED [ 18%] 2023-01-11T23:10:17.0288524Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fftshift_cuda PASSED [ 18%] 2023-01-11T23:10:17.0288681Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfft2_cuda PASSED [ 
18%] 2023-01-11T23:10:17.0288831Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfft_cuda PASSED [ 18%] 2023-01-11T23:10:17.0288990Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ihfft_cuda PASSED [ 18%] 2023-01-11T23:10:17.0289150Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_irfft_cuda PASSED [ 18%] 2023-01-11T23:10:17.0289303Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_rfft_cuda PASSED [ 18%] 2023-01-11T23:10:17.0289459Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_rfftn_cuda PASSED [ 18%] 2023-01-11T23:10:17.0289611Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fill_cuda PASSED [ 18%] 2023-01-11T23:10:17.0289762Z test_ops.py::TestCommonCUDA::test_out_warning__refs_flatten_cuda PASSED [ 18%] 2023-01-11T23:10:17.0289915Z test_ops.py::TestCommonCUDA::test_out_warning__refs_flip_cuda PASSED [ 18%] 2023-01-11T23:10:17.0290060Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fliplr_cuda PASSED [ 18%] 2023-01-11T23:10:17.0290232Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fmin_cuda PASSED [ 18%] 2023-01-11T23:10:17.0290383Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fmod_cuda PASSED [ 18%] 2023-01-11T23:10:17.0290531Z test_ops.py::TestCommonCUDA::test_out_warning__refs_ge_cuda PASSED [ 18%] 2023-01-11T23:10:17.0290677Z test_ops.py::TestCommonCUDA::test_out_warning__refs_gt_cuda PASSED [ 18%] 2023-01-11T23:10:17.0290836Z test_ops.py::TestCommonCUDA::test_out_warning__refs_heaviside_cuda PASSED [ 18%] 2023-01-11T23:10:17.0290987Z test_ops.py::TestCommonCUDA::test_out_warning__refs_hsplit_cuda PASSED [ 18%] 2023-01-11T23:10:17.0291136Z test_ops.py::TestCommonCUDA::test_out_warning__refs_hstack_cuda PASSED [ 18%] 2023-01-11T23:10:17.0291282Z test_ops.py::TestCommonCUDA::test_out_warning__refs_hypot_cuda PASSED [ 18%] 2023-01-11T23:10:17.0291436Z test_ops.py::TestCommonCUDA::test_out_warning__refs_igamma_cuda PASSED [ 18%] 2023-01-11T23:10:17.0291588Z test_ops.py::TestCommonCUDA::test_out_warning__refs_igammac_cuda PASSED [ 18%] 2023-01-11T23:10:17.0291742Z test_ops.py::TestCommonCUDA::test_out_warning__refs_isfinite_cuda PASSED [ 18%] 2023-01-11T23:10:17.0291891Z test_ops.py::TestCommonCUDA::test_out_warning__refs_isinf_cuda PASSED [ 18%] 2023-01-11T23:10:17.0292044Z test_ops.py::TestCommonCUDA::test_out_warning__refs_isneginf_cuda PASSED [ 18%] 2023-01-11T23:10:17.0292196Z test_ops.py::TestCommonCUDA::test_out_warning__refs_isposinf_cuda PASSED [ 18%] 2023-01-11T23:10:17.0292350Z test_ops.py::TestCommonCUDA::test_out_warning__refs_isreal_cuda PASSED [ 18%] 2023-01-11T23:10:17.0292492Z test_ops.py::TestCommonCUDA::test_out_warning__refs_lcm_cuda PASSED [ 18%] 2023-01-11T23:10:17.0292641Z test_ops.py::TestCommonCUDA::test_out_warning__refs_le_cuda PASSED [ 18%] 2023-01-11T23:10:17.0292816Z test_ops.py::TestCommonCUDA::test_out_warning__refs_lgamma_cuda PASSED [ 18%] 2023-01-11T23:10:17.0292986Z test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_matrix_norm_cuda PASSED [ 18%] 2023-01-11T23:10:17.0293151Z test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_norm_cuda PASSED [ 18%] 2023-01-11T23:10:17.0293307Z test_ops.py::TestCommonCUDA::test_out_warning__refs_linspace_cuda PASSED [ 18%] 2023-01-11T23:10:17.0293456Z test_ops.py::TestCommonCUDA::test_out_warning__refs_log_cuda PASSED [ 19%] 2023-01-11T23:10:17.0293616Z test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_or_cuda PASSED [ 19%] 2023-01-11T23:10:17.0293777Z test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_xor_cuda PASSED [ 
19%] 2023-01-11T23:10:17.0293930Z test_ops.py::TestCommonCUDA::test_out_warning__refs_logsumexp_cuda PASSED [ 19%] 2023-01-11T23:10:17.0294078Z test_ops.py::TestCommonCUDA::test_out_warning__refs_lt_cuda PASSED [ 19%] 2023-01-11T23:10:17.0294256Z test_ops.py::TestCommonCUDA::test_out_warning__refs_meshgrid_list_of_tensors_cuda PASSED [ 19%] 2023-01-11T23:10:17.0294436Z test_ops.py::TestCommonCUDA::test_out_warning__refs_meshgrid_variadic_tensors_cuda PASSED [ 19%] 2023-01-11T23:10:17.0294691Z test_ops.py::TestCommonCUDA::test_out_warning__refs_minimum_cuda PASSED [ 19%] 2023-01-11T23:10:17.0294844Z test_ops.py::TestCommonCUDA::test_out_warning__refs_narrow_cuda PASSED [ 19%] 2023-01-11T23:10:17.0294992Z test_ops.py::TestCommonCUDA::test_out_warning__refs_neg_cuda PASSED [ 19%] 2023-01-11T23:10:17.0295194Z test_ops.py::TestCommonCUDA::test_out_warning__refs_new_empty_cuda SKIPPED (Expected: empty is not comparable) [ 19%] 2023-01-11T23:10:17.0295341Z test_ops.py::TestCommonCUDA::test_out_warning__refs_new_ones_cuda PASSED [ 19%] 2023-01-11T23:10:17.0295500Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nextafter_cuda PASSED [ 19%] 2023-01-11T23:10:17.0295683Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_alpha_dropout_cuda PASSED [ 19%] 2023-01-11T23:10:17.0295855Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_celu_cuda PASSED [ 19%] 2023-01-11T23:10:17.0296116Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_dropout_cuda SKIPPED (Expected: dropout is not comparable) [ 19%] 2023-01-11T23:10:17.0296288Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_elu_cuda PASSED [ 19%] 2023-01-11T23:10:17.0296459Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_gelu_cuda PASSED [ 19%] 2023-01-11T23:10:17.0296625Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_glu_cuda PASSED [ 19%] 2023-01-11T23:10:17.0296801Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hardshrink_cuda PASSED [ 19%] 2023-01-11T23:10:17.0296983Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hinge_embedding_loss_cuda PASSED [ 19%] 2023-01-11T23:10:17.0297156Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_huber_loss_cuda PASSED [ 19%] 2023-01-11T23:10:17.0297333Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_leaky_relu_cuda PASSED [ 19%] 2023-01-11T23:10:17.0297508Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_nll_loss_cuda PASSED [ 19%] 2023-01-11T23:10:17.0297675Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_relu_cuda PASSED [ 19%] 2023-01-11T23:10:17.0297851Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softshrink_cuda PASSED [ 19%] 2023-01-11T23:10:17.0298026Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_threshold_cuda PASSED [ 19%] 2023-01-11T23:10:17.0298180Z test_ops.py::TestCommonCUDA::test_out_warning__refs_norm_cuda PASSED [ 19%] 2023-01-11T23:10:17.0298327Z test_ops.py::TestCommonCUDA::test_out_warning__refs_ones_cuda PASSED [ 19%] 2023-01-11T23:10:17.0298483Z test_ops.py::TestCommonCUDA::test_out_warning__refs_positive_cuda PASSED [ 19%] 2023-01-11T23:10:17.0298669Z test_ops.py::TestCommonCUDA::test_out_warning__refs_prod_cuda PASSED [ 19%] 2023-01-11T23:10:17.0298822Z test_ops.py::TestCommonCUDA::test_out_warning__refs_randn_cuda PASSED [ 19%] 2023-01-11T23:10:17.0298975Z test_ops.py::TestCommonCUDA::test_out_warning__refs_real_cuda PASSED [ 19%] 
2023-01-11T23:10:17.0299128Z test_ops.py::TestCommonCUDA::test_out_warning__refs_repeat_cuda PASSED [ 19%] 2023-01-11T23:10:17.0299279Z test_ops.py::TestCommonCUDA::test_out_warning__refs_rot90_cuda PASSED [ 19%] 2023-01-11T23:10:17.0299431Z test_ops.py::TestCommonCUDA::test_out_warning__refs_round_cuda PASSED [ 19%] 2023-01-11T23:10:17.0299574Z test_ops.py::TestCommonCUDA::test_out_warning__refs_rsqrt_cuda PASSED [ 19%] 2023-01-11T23:10:17.0299725Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sgn_cuda PASSED [ 19%] 2023-01-11T23:10:17.0299874Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sign_cuda PASSED [ 19%] 2023-01-11T23:10:17.0300026Z test_ops.py::TestCommonCUDA::test_out_warning__refs_signbit_cuda PASSED [ 19%] 2023-01-11T23:10:17.0300192Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_entr_cuda PASSED [ 19%] 2023-01-11T23:10:17.0300356Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_erfcx_cuda PASSED [ 19%] 2023-01-11T23:10:17.0300519Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_i1_cuda PASSED [ 19%] 2023-01-11T23:10:17.0300682Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_i1e_cuda PASSED [ 19%] 2023-01-11T23:10:17.0300842Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_log_ndtr_cuda PASSED [ 19%] 2023-01-11T23:10:17.0301025Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_log_softmax_with_dtype_cuda PASSED [ 19%] 2023-01-11T23:10:17.0301191Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_logit_cuda PASSED [ 19%] 2023-01-11T23:10:17.0301380Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_multigammaln_mvlgamma_p_5_cuda PASSED [ 19%] 2023-01-11T23:10:17.0301546Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_ndtri_cuda PASSED [ 19%] 2023-01-11T23:10:17.0301700Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sqrt_cuda PASSED [ 19%] 2023-01-11T23:10:17.0301854Z test_ops.py::TestCommonCUDA::test_out_warning__refs_square_cuda PASSED [ 19%] 2023-01-11T23:10:17.0302029Z test_ops.py::TestCommonCUDA::test_out_warning__refs_stack_cuda PASSED [ 19%] 2023-01-11T23:10:17.0302182Z test_ops.py::TestCommonCUDA::test_out_warning__refs_std_cuda PASSED [ 19%] 2023-01-11T23:10:17.0302325Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sub_cuda PASSED [ 19%] 2023-01-11T23:10:17.0302484Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sum_to_size_cuda PASSED [ 19%] 2023-01-11T23:10:17.0302633Z test_ops.py::TestCommonCUDA::test_out_warning__refs_t_cuda PASSED [ 19%] 2023-01-11T23:10:17.0302783Z test_ops.py::TestCommonCUDA::test_out_warning__refs_tan_cuda PASSED [ 19%] 2023-01-11T23:10:17.0302933Z test_ops.py::TestCommonCUDA::test_out_warning__refs_tanh_cuda PASSED [ 19%] 2023-01-11T23:10:17.0303100Z test_ops.py::TestCommonCUDA::test_out_warning__refs_tensor_split_cuda PASSED [ 19%] 2023-01-11T23:10:17.0303249Z test_ops.py::TestCommonCUDA::test_out_warning__refs_trace_cuda PASSED [ 19%] 2023-01-11T23:10:17.0303416Z test_ops.py::TestCommonCUDA::test_out_warning__refs_transpose_cuda PASSED [ 19%] 2023-01-11T23:10:17.0303559Z test_ops.py::TestCommonCUDA::test_out_warning__refs_tril_cuda PASSED [ 19%] 2023-01-11T23:10:17.0303709Z test_ops.py::TestCommonCUDA::test_out_warning__refs_triu_cuda PASSED [ 19%] 2023-01-11T23:10:17.0303872Z test_ops.py::TestCommonCUDA::test_out_warning__refs_triu_indices_cuda PASSED [ 19%] 2023-01-11T23:10:17.0304024Z test_ops.py::TestCommonCUDA::test_out_warning__refs_unbind_cuda PASSED [ 19%] 2023-01-11T23:10:17.0304183Z 
test_ops.py::TestCommonCUDA::test_out_warning__refs_unfold_copy_cuda PASSED [ 19%] 2023-01-11T23:10:17.0304341Z test_ops.py::TestCommonCUDA::test_out_warning__refs_view_as_cuda PASSED [ 19%] 2023-01-11T23:10:17.0304519Z test_ops.py::TestCommonCUDA::test_out_warning__refs_vsplit_cuda PASSED [ 19%] 2023-01-11T23:10:17.0304671Z test_ops.py::TestCommonCUDA::test_out_warning__refs_where_cuda PASSED [ 19%] 2023-01-11T23:10:17.0304820Z test_ops.py::TestCommonCUDA::test_out_warning__refs_xlogy_cuda PASSED [ 19%] 2023-01-11T23:10:17.0304969Z test_ops.py::TestCommonCUDA::test_out_warning__refs_zeros_cuda PASSED [ 19%] 2023-01-11T23:10:17.0305137Z test_ops.py::TestCommonCUDA::test_out_warning__softmax_backward_data_cuda PASSED [ 19%] 2023-01-11T23:10:17.0305287Z test_ops.py::TestCommonCUDA::test_out_warning_acos_cuda PASSED [ 19%] 2023-01-11T23:10:17.0305436Z test_ops.py::TestCommonCUDA::test_out_warning_acosh_cuda PASSED [ 19%] 2023-01-11T23:10:17.0305586Z test_ops.py::TestCommonCUDA::test_out_warning_addmm_cuda PASSED [ 19%] 2023-01-11T23:10:17.0305750Z test_ops.py::TestCommonCUDA::test_out_warning_addmm_decomposed_cuda PASSED [ 19%] 2023-01-11T23:10:17.0305901Z test_ops.py::TestCommonCUDA::test_out_warning_addmv_cuda PASSED [ 19%] 2023-01-11T23:10:17.0306126Z test_ops.py::TestCommonCUDA::test_out_warning_allclose_cuda SKIPPED (Skipped! Only supports single tensor or iterable of tensor outputs.) [ 19%] 2023-01-11T23:10:17.0306280Z test_ops.py::TestCommonCUDA::test_out_warning_aminmax_cuda PASSED [ 19%] 2023-01-11T23:10:17.0306430Z test_ops.py::TestCommonCUDA::test_out_warning_argmax_cuda PASSED [ 19%] 2023-01-11T23:10:17.0306578Z test_ops.py::TestCommonCUDA::test_out_warning_argsort_cuda PASSED [ 19%] 2023-01-11T23:10:17.0306729Z test_ops.py::TestCommonCUDA::test_out_warning_as_strided_cuda PASSED [ 19%] 2023-01-11T23:10:17.0306899Z test_ops.py::TestCommonCUDA::test_out_warning_as_strided_partial_views_cuda PASSED [ 19%] 2023-01-11T23:10:17.0307046Z test_ops.py::TestCommonCUDA::test_out_warning_asin_cuda PASSED [ 19%] 2023-01-11T23:10:17.0307193Z test_ops.py::TestCommonCUDA::test_out_warning_asinh_cuda PASSED [ 19%] 2023-01-11T23:10:17.0307342Z test_ops.py::TestCommonCUDA::test_out_warning_atan2_cuda PASSED [ 19%] 2023-01-11T23:10:17.0307486Z test_ops.py::TestCommonCUDA::test_out_warning_atleast_1d_cuda PASSED [ 19%] 2023-01-11T23:10:17.0307669Z test_ops.py::TestCommonCUDA::test_out_warning_atleast_3d_cuda PASSED [ 19%] 2023-01-11T23:10:17.0307822Z test_ops.py::TestCommonCUDA::test_out_warning_baddbmm_cuda PASSED [ 19%] 2023-01-11T23:10:17.0307971Z test_ops.py::TestCommonCUDA::test_out_warning_bfloat16_cuda PASSED [ 19%] 2023-01-11T23:10:17.0308120Z test_ops.py::TestCommonCUDA::test_out_warning_bitwise_or_cuda PASSED [ 19%] 2023-01-11T23:10:17.0308286Z test_ops.py::TestCommonCUDA::test_out_warning_bitwise_right_shift_cuda PASSED [ 19%] 2023-01-11T23:10:17.0308432Z test_ops.py::TestCommonCUDA::test_out_warning_bmm_cuda PASSED [ 19%] 2023-01-11T23:10:17.0308595Z test_ops.py::TestCommonCUDA::test_out_warning_broadcast_shapes_cuda PASSED [ 19%] 2023-01-11T23:10:17.0308820Z test_ops.py::TestCommonCUDA::test_out_warning_broadcast_tensors_cuda PASSED [ 19%] 2023-01-11T23:10:17.0308985Z test_ops.py::TestCommonCUDA::test_out_warning_broadcast_to_cuda PASSED [ 19%] 2023-01-11T23:10:17.0309137Z test_ops.py::TestCommonCUDA::test_out_warning_bucketize_cuda PASSED [ 19%] 2023-01-11T23:10:17.0309294Z test_ops.py::TestCommonCUDA::test_out_warning_cartesian_prod_cuda PASSED [ 19%] 2023-01-11T23:10:17.0309444Z 
test_ops.py::TestCommonCUDA::test_out_warning_cdouble_cuda PASSED [ 19%] 2023-01-11T23:10:17.0309590Z test_ops.py::TestCommonCUDA::test_out_warning_chalf_cuda PASSED [ 19%] 2023-01-11T23:10:17.0309736Z test_ops.py::TestCommonCUDA::test_out_warning_char_cuda PASSED [ 19%] 2023-01-11T23:10:17.0309887Z test_ops.py::TestCommonCUDA::test_out_warning_cholesky_cuda PASSED [ 19%] 2023-01-11T23:10:17.0310041Z test_ops.py::TestCommonCUDA::test_out_warning_cholesky_inverse_cuda PASSED [ 19%] 2023-01-11T23:10:17.0310198Z test_ops.py::TestCommonCUDA::test_out_warning_cholesky_solve_cuda PASSED [ 19%] 2023-01-11T23:10:17.0310373Z test_ops.py::TestCommonCUDA::test_out_warning_clamp_cuda PASSED [ 19%] 2023-01-11T23:10:17.0310524Z test_ops.py::TestCommonCUDA::test_out_warning_clamp_max_cuda PASSED [ 19%] 2023-01-11T23:10:17.0310676Z test_ops.py::TestCommonCUDA::test_out_warning_clamp_min_cuda PASSED [ 19%] 2023-01-11T23:10:17.0310826Z test_ops.py::TestCommonCUDA::test_out_warning_column_stack_cuda PASSED [ 19%] 2023-01-11T23:10:17.0310981Z test_ops.py::TestCommonCUDA::test_out_warning_combinations_cuda PASSED [ 19%] 2023-01-11T23:10:17.0311130Z test_ops.py::TestCommonCUDA::test_out_warning_complex_cuda PASSED [ 19%] 2023-01-11T23:10:17.0311271Z test_ops.py::TestCommonCUDA::test_out_warning_conj_cuda PASSED [ 19%] 2023-01-11T23:10:17.0311420Z test_ops.py::TestCommonCUDA::test_out_warning_cos_cuda PASSED [ 19%] 2023-01-11T23:10:17.0311568Z test_ops.py::TestCommonCUDA::test_out_warning_cosh_cuda PASSED [ 19%] 2023-01-11T23:10:17.0311717Z test_ops.py::TestCommonCUDA::test_out_warning_cross_cuda PASSED [ 19%] 2023-01-11T23:10:17.0311865Z test_ops.py::TestCommonCUDA::test_out_warning_cummax_cuda PASSED [ 19%] 2023-01-11T23:10:17.0312015Z test_ops.py::TestCommonCUDA::test_out_warning_cumsum_cuda PASSED [ 19%] 2023-01-11T23:10:17.0312184Z test_ops.py::TestCommonCUDA::test_out_warning_cumulative_trapezoid_cuda PASSED [ 19%] 2023-01-11T23:10:17.0312332Z test_ops.py::TestCommonCUDA::test_out_warning_deg2rad_cuda PASSED [ 19%] 2023-01-11T23:10:17.0312473Z test_ops.py::TestCommonCUDA::test_out_warning_diag_cuda PASSED [ 19%] 2023-01-11T23:10:17.0312623Z test_ops.py::TestCommonCUDA::test_out_warning_diag_embed_cuda PASSED [ 19%] 2023-01-11T23:10:17.0312775Z test_ops.py::TestCommonCUDA::test_out_warning_diagonal_copy_cuda PASSED [ 19%] 2023-01-11T23:10:17.0312924Z test_ops.py::TestCommonCUDA::test_out_warning_digamma_cuda PASSED [ 19%] 2023-01-11T23:10:17.0313070Z test_ops.py::TestCommonCUDA::test_out_warning_dist_cuda PASSED [ 19%] 2023-01-11T23:10:17.0313237Z test_ops.py::TestCommonCUDA::test_out_warning_div_floor_rounding_cuda PASSED [ 19%] 2023-01-11T23:10:17.0313402Z test_ops.py::TestCommonCUDA::test_out_warning_div_no_rounding_mode_cuda PASSED [ 19%] 2023-01-11T23:10:17.0313573Z test_ops.py::TestCommonCUDA::test_out_warning_dot_cuda PASSED [ 19%] 2023-01-11T23:10:17.0313717Z test_ops.py::TestCommonCUDA::test_out_warning_dsplit_cuda PASSED [ 19%] 2023-01-11T23:10:17.0313867Z test_ops.py::TestCommonCUDA::test_out_warning_dstack_cuda PASSED [ 19%] 2023-01-11T23:10:17.0314015Z test_ops.py::TestCommonCUDA::test_out_warning_einsum_cuda PASSED [ 19%] 2023-01-11T23:10:17.0314209Z test_ops.py::TestCommonCUDA::test_out_warning_empty_cuda SKIPPED (Expected: empty is not comparable) [ 19%] 2023-01-11T23:10:17.0314356Z test_ops.py::TestCommonCUDA::test_out_warning_erfc_cuda PASSED [ 19%] 2023-01-11T23:10:17.0314502Z test_ops.py::TestCommonCUDA::test_out_warning_exp_cuda PASSED [ 19%] 2023-01-11T23:10:17.0314655Z 
2023-01-11T23:10:17.0314655Z test_ops.py::TestCommonCUDA::test_out_warning_expand_as_cuda PASSED [ 19%]
2023-01-11T23:10:17.0314802Z test_ops.py::TestCommonCUDA::test_out_warning_expand_cuda PASSED [ 19%]
2023-01-11T23:10:17.0314945Z test_ops.py::TestCommonCUDA::test_out_warning_expm1_cuda PASSED [ 19%]
2023-01-11T23:10:17.0315095Z test_ops.py::TestCommonCUDA::test_out_warning_fft_fftn_cuda PASSED [ 19%]
2023-01-11T23:10:17.0315243Z test_ops.py::TestCommonCUDA::test_out_warning_fft_hfft_cuda PASSED [ 19%]
2023-01-11T23:10:17.0315392Z test_ops.py::TestCommonCUDA::test_out_warning_fft_ifft2_cuda PASSED [ 19%]
2023-01-11T23:10:17.0315539Z test_ops.py::TestCommonCUDA::test_out_warning_fft_ifftn_cuda PASSED [ 19%]
2023-01-11T23:10:17.0315688Z test_ops.py::TestCommonCUDA::test_out_warning_fft_ihfft2_cuda XFAIL [ 19%]
2023-01-11T23:10:17.0315837Z test_ops.py::TestCommonCUDA::test_out_warning_fft_irfft2_cuda PASSED [ 19%]
2023-01-11T23:10:17.0316011Z test_ops.py::TestCommonCUDA::test_out_warning_fft_irfft_cuda PASSED [ 19%]
2023-01-11T23:10:17.0316159Z test_ops.py::TestCommonCUDA::test_out_warning_fft_rfft2_cuda PASSED [ 19%]
2023-01-11T23:10:17.0316305Z test_ops.py::TestCommonCUDA::test_out_warning_flatten_cuda PASSED [ 19%]
2023-01-11T23:10:17.0316453Z test_ops.py::TestCommonCUDA::test_out_warning_fliplr_cuda PASSED [ 19%]
2023-01-11T23:10:17.0316604Z test_ops.py::TestCommonCUDA::test_out_warning_float_power_cuda PASSED [ 19%]
2023-01-11T23:10:17.0316750Z test_ops.py::TestCommonCUDA::test_out_warning_fmax_cuda PASSED [ 19%]
2023-01-11T23:10:17.0316895Z test_ops.py::TestCommonCUDA::test_out_warning_fmin_cuda PASSED [ 19%]
2023-01-11T23:10:17.0317043Z test_ops.py::TestCommonCUDA::test_out_warning_fmod_cuda PASSED [ 19%]
2023-01-11T23:10:17.0317188Z test_ops.py::TestCommonCUDA::test_out_warning_full_cuda XFAIL [ 19%]
2023-01-11T23:10:17.0317340Z test_ops.py::TestCommonCUDA::test_out_warning_gather_cuda PASSED [ 19%]
2023-01-11T23:10:17.0317479Z test_ops.py::TestCommonCUDA::test_out_warning_ge_cuda PASSED [ 19%]
2023-01-11T23:10:17.0317628Z test_ops.py::TestCommonCUDA::test_out_warning_geqrf_cuda PASSED [ 19%]
2023-01-11T23:10:17.0317790Z test_ops.py::TestCommonCUDA::test_out_warning_grid_sampler_2d_cuda PASSED [ 19%]
2023-01-11T23:10:17.0317934Z test_ops.py::TestCommonCUDA::test_out_warning_gt_cuda PASSED [ 19%]
2023-01-11T23:10:17.0318155Z test_ops.py::TestCommonCUDA::test_out_warning_histogramdd_cuda SKIPPED (Skipped! Op has not supported dtypes on this device.) [ 19%]
2023-01-11T23:10:17.0318303Z test_ops.py::TestCommonCUDA::test_out_warning_hstack_cuda PASSED [ 19%]
2023-01-11T23:10:17.0318450Z test_ops.py::TestCommonCUDA::test_out_warning_hypot_cuda PASSED [ 20%]
2023-01-11T23:10:17.0318598Z test_ops.py::TestCommonCUDA::test_out_warning_igamma_cuda PASSED [ 20%]
2023-01-11T23:10:17.0318742Z test_ops.py::TestCommonCUDA::test_out_warning_imag_cuda PASSED [ 20%]
2023-01-11T23:10:17.0318891Z test_ops.py::TestCommonCUDA::test_out_warning_index_add_cuda PASSED [ 20%]
2023-01-11T23:10:17.0319066Z test_ops.py::TestCommonCUDA::test_out_warning_index_copy_cuda PASSED [ 20%]
2023-01-11T23:10:17.0319222Z test_ops.py::TestCommonCUDA::test_out_warning_index_select_cuda PASSED [ 20%]
2023-01-11T23:10:17.0319367Z test_ops.py::TestCommonCUDA::test_out_warning_inner_cuda PASSED [ 20%]
2023-01-11T23:10:17.0319511Z test_ops.py::TestCommonCUDA::test_out_warning_int_cuda PASSED [ 20%]
2023-01-11T23:10:17.0319661Z test_ops.py::TestCommonCUDA::test_out_warning_isfinite_cuda PASSED [ 20%]
2023-01-11T23:10:17.0319813Z test_ops.py::TestCommonCUDA::test_out_warning_isposinf_cuda PASSED [ 20%]
2023-01-11T23:10:17.0319953Z test_ops.py::TestCommonCUDA::test_out_warning_istft_cuda PASSED [ 20%]
2023-01-11T23:10:17.0320126Z test_ops.py::TestCommonCUDA::test_out_warning_jiterator_2inputs_2outputs_cuda PASSED [ 20%]
2023-01-11T23:10:17.0320293Z test_ops.py::TestCommonCUDA::test_out_warning_jiterator_unary_cuda PASSED [ 20%]
2023-01-11T23:10:17.0320438Z test_ops.py::TestCommonCUDA::test_out_warning_kron_cuda PASSED [ 20%]
2023-01-11T23:10:17.0320618Z test_ops.py::TestCommonCUDA::test_out_warning_kthvalue_cuda PASSED [ 20%]
2023-01-11T23:10:17.0320788Z test_ops.py::TestCommonCUDA::test_out_warning_lcm_cuda PASSED [ 20%]
2023-01-11T23:10:17.0320935Z test_ops.py::TestCommonCUDA::test_out_warning_ldexp_cuda PASSED [ 20%]
2023-01-11T23:10:17.0321082Z test_ops.py::TestCommonCUDA::test_out_warning_lerp_cuda PASSED [ 20%]
2023-01-11T23:10:17.0321224Z test_ops.py::TestCommonCUDA::test_out_warning_lgamma_cuda PASSED [ 20%]
2023-01-11T23:10:17.0321376Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_cond_cuda PASSED [ 20%]
2023-01-11T23:10:17.0321526Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_det_cuda PASSED [ 20%]
2023-01-11T23:10:17.0321716Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_det_singular_cuda PASSED [ 20%]
2023-01-11T23:10:17.0321867Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_eigh_cuda PASSED [ 20%]
2023-01-11T23:10:17.0322025Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_eigvals_cuda PASSED [ 20%]
2023-01-11T23:10:17.0322186Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_eigvalsh_cuda PASSED [ 20%]
2023-01-11T23:10:17.0322337Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_inv_cuda PASSED [ 20%]
2023-01-11T23:10:17.0322494Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_factor_cuda PASSED [ 20%]
2023-01-11T23:10:17.0322666Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_lstsq_grad_oriented_cuda PASSED [ 20%]
2023-01-11T23:10:17.0322831Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_factor_ex_cuda PASSED [ 20%]
2023-01-11T23:10:17.0322992Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_rank_cuda PASSED [ 20%]
2023-01-11T23:10:17.0323147Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_cuda PASSED [ 20%]
2023-01-11T23:10:17.0323318Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_hermitian_cuda PASSED [ 20%]
2023-01-11T23:10:17.0323550Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_singular_cuda SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 20%]
2023-01-11T23:10:17.0323703Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_svd_cuda PASSED [ 20%]
2023-01-11T23:10:17.0323857Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_svdvals_cuda PASSED [ 20%]
2023-01-11T23:10:17.0324012Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_tensorinv_cuda PASSED [ 20%]
2023-01-11T23:10:17.0324165Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_vander_cuda PASSED [ 20%]
2023-01-11T23:10:17.0324315Z test_ops.py::TestCommonCUDA::test_out_warning_linspace_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0324462Z test_ops.py::TestCommonCUDA::test_out_warning_log1p_cuda PASSED [ 20%]
2023-01-11T23:10:17.0324622Z test_ops.py::TestCommonCUDA::test_out_warning_logcumsumexp_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0324772Z test_ops.py::TestCommonCUDA::test_out_warning_logdet_cuda PASSED [ 20%]
2023-01-11T23:10:17.0324949Z test_ops.py::TestCommonCUDA::test_out_warning_logical_and_cuda PASSED [ 20%]
2023-01-11T23:10:17.0325104Z test_ops.py::TestCommonCUDA::test_out_warning_logical_xor_cuda PASSED [ 20%]
2023-01-11T23:10:17.0325247Z test_ops.py::TestCommonCUDA::test_out_warning_logspace_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0325402Z test_ops.py::TestCommonCUDA::test_out_warning_logsumexp_cuda PASSED [ 20%]
2023-01-11T23:10:17.0325550Z test_ops.py::TestCommonCUDA::test_out_warning_long_cuda PASSED [ 20%]
2023-01-11T23:10:17.0325700Z test_ops.py::TestCommonCUDA::test_out_warning_masked_amax_cuda PASSED [ 20%]
2023-01-11T23:10:17.0325853Z test_ops.py::TestCommonCUDA::test_out_warning_masked_argmin_cuda PASSED [ 20%]
2023-01-11T23:10:17.0326006Z test_ops.py::TestCommonCUDA::test_out_warning_masked_fill_cuda PASSED [ 20%]
2023-01-11T23:10:17.0326170Z test_ops.py::TestCommonCUDA::test_out_warning_masked_logaddexp_cuda PASSED [ 20%]
2023-01-11T23:10:17.0326325Z test_ops.py::TestCommonCUDA::test_out_warning_masked_scatter_cuda PASSED [ 20%]
2023-01-11T23:10:17.0326470Z test_ops.py::TestCommonCUDA::test_out_warning_masked_std_cuda PASSED [ 20%]
2023-01-11T23:10:17.0326620Z test_ops.py::TestCommonCUDA::test_out_warning_masked_sum_cuda PASSED [ 20%]
2023-01-11T23:10:17.0326771Z test_ops.py::TestCommonCUDA::test_out_warning_matrix_exp_cuda PASSED [ 20%]
2023-01-11T23:10:17.0326950Z test_ops.py::TestCommonCUDA::test_out_warning_max_pool2d_with_indices_backward_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0327117Z test_ops.py::TestCommonCUDA::test_out_warning_max_reduction_no_dim_cuda PASSED [ 20%]
2023-01-11T23:10:17.0327284Z test_ops.py::TestCommonCUDA::test_out_warning_max_reduction_with_dim_cuda PASSED [ 20%]
2023-01-11T23:10:17.0327461Z test_ops.py::TestCommonCUDA::test_out_warning_maximum_cuda PASSED [ 20%]
2023-01-11T23:10:17.0327609Z test_ops.py::TestCommonCUDA::test_out_warning_mean_cuda PASSED [ 20%]
2023-01-11T23:10:17.0327755Z test_ops.py::TestCommonCUDA::test_out_warning_min_binary_cuda PASSED [ 20%]
2023-01-11T23:10:17.0327922Z test_ops.py::TestCommonCUDA::test_out_warning_min_reduction_no_dim_cuda PASSED [ 20%]
2023-01-11T23:10:17.0328071Z test_ops.py::TestCommonCUDA::test_out_warning_minimum_cuda PASSED [ 20%]
2023-01-11T23:10:17.0328216Z test_ops.py::TestCommonCUDA::test_out_warning_mm_cuda PASSED [ 20%]
2023-01-11T23:10:17.0328364Z test_ops.py::TestCommonCUDA::test_out_warning_movedim_cuda PASSED [ 20%]
2023-01-11T23:10:17.0328519Z test_ops.py::TestCommonCUDA::test_out_warning_multinomial_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0328688Z test_ops.py::TestCommonCUDA::test_out_warning_mvlgamma_mvlgamma_p_3_cuda PASSED [ 20%]
2023-01-11T23:10:17.0328840Z test_ops.py::TestCommonCUDA::test_out_warning_nanmedian_cuda PASSED [ 20%]
2023-01-11T23:10:17.0328987Z test_ops.py::TestCommonCUDA::test_out_warning_nanquantile_cuda PASSED [ 20%]
2023-01-11T23:10:17.0329135Z test_ops.py::TestCommonCUDA::test_out_warning_nansum_cuda PASSED [ 20%]
2023-01-11T23:10:17.0329286Z test_ops.py::TestCommonCUDA::test_out_warning_narrow_cuda PASSED [ 20%]
2023-01-11T23:10:17.0329450Z test_ops.py::TestCommonCUDA::test_out_warning_native_batch_norm_cuda PASSED [ 20%]
2023-01-11T23:10:17.0329596Z test_ops.py::TestCommonCUDA::test_out_warning_ne_cuda PASSED [ 20%]
2023-01-11T23:10:17.0329758Z test_ops.py::TestCommonCUDA::test_out_warning_new_empty_strided_cuda PASSED [ 20%]
2023-01-11T23:10:17.0329907Z test_ops.py::TestCommonCUDA::test_out_warning_new_full_cuda PASSED [ 20%]
2023-01-11T23:10:17.0330060Z test_ops.py::TestCommonCUDA::test_out_warning_new_ones_cuda PASSED [ 20%]
2023-01-11T23:10:17.0330213Z test_ops.py::TestCommonCUDA::test_out_warning_nextafter_cuda PASSED [ 20%]
2023-01-11T23:10:17.0330437Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional__scaled_dot_product_attention_cuda PASSED [ 20%]
2023-01-11T23:10:17.0330656Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_avg_pool1d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0330835Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_max_pool2d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0331011Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_alpha_dropout_cuda PASSED [ 20%]
2023-01-11T23:10:17.0331181Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_batch_norm_cuda PASSED [ 20%]
2023-01-11T23:10:17.0331345Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_celu_cuda PASSED [ 20%]
2023-01-11T23:10:17.0331510Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv1d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0331671Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv2d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0331844Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv_transpose2d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0332023Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv_transpose3d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0332210Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_cosine_similarity_cuda PASSED [ 20%]
2023-01-11T23:10:17.0332380Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout3d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0332542Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_elu_cuda PASSED [ 20%]
2023-01-11T23:10:17.0332714Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_embedding_bag_cuda PASSED [ 20%]
2023-01-11T23:10:17.0332910Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_feature_alpha_dropout_with_train_cuda PASSED [ 20%]
2023-01-11T23:10:17.0333092Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_fractional_max_pool2d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0333311Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_glu_cuda PASSED [ 20%]
2023-01-11T23:10:17.0333476Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_grid_sample_cuda PASSED [ 20%]
2023-01-11T23:10:17.0333646Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardswish_cuda PASSED [ 20%]
2023-01-11T23:10:17.0333832Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_bilinear_cuda PASSED [ 20%]
2023-01-11T23:10:17.0334014Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_nearest_cuda PASSED [ 20%]
2023-01-11T23:10:17.0334178Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_linear_cuda PASSED [ 20%]
2023-01-11T23:10:17.0334355Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_local_response_norm_cuda PASSED [ 20%]
2023-01-11T23:10:17.0334618Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_logsigmoid_cuda PASSED [ 20%]
2023-01-11T23:10:17.0334797Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_margin_ranking_loss_cuda PASSED [ 20%]
2023-01-11T23:10:17.0334966Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool1d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0335143Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool1d_grad_cuda PASSED [ 20%]
2023-01-11T23:10:17.0335316Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool2d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0335494Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool2d_grad_cuda PASSED [ 20%]
2023-01-11T23:10:17.0335663Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool3d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0335827Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_mse_loss_cuda PASSED [ 20%]
2023-01-11T23:10:17.0335990Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_nll_loss_cuda PASSED [ 20%]
2023-01-11T23:10:17.0336153Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_one_hot_cuda PASSED [ 20%]
2023-01-11T23:10:17.0336331Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pairwise_distance_cuda PASSED [ 20%]
2023-01-11T23:10:17.0336495Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pixel_shuffle_cuda PASSED [ 20%]
2023-01-11T23:10:17.0336699Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_prelu_cuda PASSED [ 20%]
2023-01-11T23:10:17.0336858Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_rrelu_cuda PASSED [ 20%]
2023-01-11T23:10:17.0337018Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_selu_cuda PASSED [ 20%]
2023-01-11T23:10:17.0337252Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_silu_complex_cuda SKIPPED (Skipped! Op has not supported dtypes on this device.) [ 20%]
2023-01-11T23:10:17.0337410Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_silu_cuda PASSED [ 20%]
2023-01-11T23:10:17.0337581Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_smooth_l1_loss_cuda PASSED [ 20%]
2023-01-11T23:10:17.0337752Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_soft_margin_loss_cuda PASSED [ 20%]
2023-01-11T23:10:17.0337920Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softplus_cuda PASSED [ 20%]
2023-01-11T23:10:17.0338084Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softshrink_cuda PASSED [ 20%]
2023-01-11T23:10:17.0338252Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_threshold_cuda PASSED [ 20%]
2023-01-11T23:10:17.0338430Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_triplet_margin_loss_cuda PASSED [ 20%]
2023-01-11T23:10:17.0338592Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_unfold_cuda PASSED [ 20%]
2023-01-11T23:10:17.0338765Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_upsample_nearest_cuda PASSED [ 20%]
2023-01-11T23:10:17.0338915Z test_ops.py::TestCommonCUDA::test_out_warning_norm_cuda PASSED [ 20%]
2023-01-11T23:10:17.0339065Z test_ops.py::TestCommonCUDA::test_out_warning_norm_fro_cuda PASSED [ 20%]
2023-01-11T23:10:17.0339249Z test_ops.py::TestCommonCUDA::test_out_warning_norm_nuc_cuda PASSED [ 20%]
2023-01-11T23:10:17.0339389Z test_ops.py::TestCommonCUDA::test_out_warning_normal_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0339553Z test_ops.py::TestCommonCUDA::test_out_warning_normal_number_mean_cuda PASSED [ 20%]
2023-01-11T23:10:17.0339697Z test_ops.py::TestCommonCUDA::test_out_warning_ones_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0339847Z test_ops.py::TestCommonCUDA::test_out_warning_ones_like_cuda PASSED [ 20%]
2023-01-11T23:10:17.0339993Z test_ops.py::TestCommonCUDA::test_out_warning_outer_cuda PASSED [ 20%]
2023-01-11T23:10:17.0340142Z test_ops.py::TestCommonCUDA::test_out_warning_permute_cuda PASSED [ 20%]
2023-01-11T23:10:17.0340325Z test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_1_cuda SKIPPED (Skipped!) [ 20%]
2023-01-11T23:10:17.0340500Z test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_2_cuda SKIPPED (Skipped!) [ 20%]
2023-01-11T23:10:17.0340671Z test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_3_cuda SKIPPED (Skipped!) [ 20%]
2023-01-11T23:10:17.0340844Z test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_4_cuda SKIPPED (Skipped!) [ 20%]
2023-01-11T23:10:17.0341001Z test_ops.py::TestCommonCUDA::test_out_warning_positive_cuda PASSED [ 20%]
2023-01-11T23:10:17.0341148Z test_ops.py::TestCommonCUDA::test_out_warning_prod_cuda PASSED [ 20%]
2023-01-11T23:10:17.0341296Z test_ops.py::TestCommonCUDA::test_out_warning_rad2deg_cuda PASSED [ 20%]
2023-01-11T23:10:17.0341444Z test_ops.py::TestCommonCUDA::test_out_warning_rand_like_cuda PASSED [ 20%]
2023-01-11T23:10:17.0341590Z test_ops.py::TestCommonCUDA::test_out_warning_randint_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0341740Z test_ops.py::TestCommonCUDA::test_out_warning_randint_like_cuda PASSED [ 20%]
2023-01-11T23:10:17.0341877Z test_ops.py::TestCommonCUDA::test_out_warning_randn_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0342031Z test_ops.py::TestCommonCUDA::test_out_warning_reciprocal_cuda PASSED [ 20%]
2023-01-11T23:10:17.0342181Z test_ops.py::TestCommonCUDA::test_out_warning_remainder_cuda PASSED [ 20%]
2023-01-11T23:10:17.0342352Z test_ops.py::TestCommonCUDA::test_out_warning_renorm_cuda PASSED [ 20%]
2023-01-11T23:10:17.0342503Z test_ops.py::TestCommonCUDA::test_out_warning_repeat_cuda PASSED [ 20%]
2023-01-11T23:10:17.0342652Z test_ops.py::TestCommonCUDA::test_out_warning_reshape_as_cuda PASSED [ 20%]
2023-01-11T23:10:17.0342799Z test_ops.py::TestCommonCUDA::test_out_warning_resize_as__cuda PASSED [ 20%]
2023-01-11T23:10:17.0342947Z test_ops.py::TestCommonCUDA::test_out_warning_resolve_conj_cuda PASSED [ 20%]
2023-01-11T23:10:17.0343090Z test_ops.py::TestCommonCUDA::test_out_warning_roll_cuda PASSED [ 20%]
2023-01-11T23:10:17.0343256Z test_ops.py::TestCommonCUDA::test_out_warning_round_decimals_3_cuda SKIPPED (Skipped!) [ 20%]
2023-01-11T23:10:17.0343435Z test_ops.py::TestCommonCUDA::test_out_warning_round_decimals_neg_3_cuda SKIPPED (Skipped!) [ 20%]
2023-01-11T23:10:17.0343581Z test_ops.py::TestCommonCUDA::test_out_warning_rsqrt_cuda PASSED [ 20%]
2023-01-11T23:10:17.0343733Z test_ops.py::TestCommonCUDA::test_out_warning_scalar_tensor_cuda PASSED [ 20%]
2023-01-11T23:10:17.0343882Z test_ops.py::TestCommonCUDA::test_out_warning_scatter_add_cuda PASSED [ 20%]
2023-01-11T23:10:17.0344045Z test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_mean_cuda PASSED [ 21%]
2023-01-11T23:10:17.0344197Z test_ops.py::TestCommonCUDA::test_out_warning_searchsorted_cuda PASSED [ 21%]
2023-01-11T23:10:17.0344365Z test_ops.py::TestCommonCUDA::test_out_warning_segment_reduce_offsets_cuda PASSED [ 21%]
2023-01-11T23:10:17.0344503Z test_ops.py::TestCommonCUDA::test_out_warning_sgn_cuda PASSED [ 21%]
2023-01-11T23:10:17.0344649Z test_ops.py::TestCommonCUDA::test_out_warning_short_cuda PASSED [ 21%]
2023-01-11T23:10:17.0344793Z test_ops.py::TestCommonCUDA::test_out_warning_sign_cuda PASSED [ 21%]
2023-01-11T23:10:17.0344985Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_blackman_cuda PASSED [ 21%]
2023-01-11T23:10:17.0345154Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_cosine_cuda PASSED [ 21%]
2023-01-11T23:10:17.0345325Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_exponential_cuda PASSED [ 21%]
2023-01-11T23:10:17.0345489Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_gaussian_cuda PASSED [ 21%]
2023-01-11T23:10:17.0345662Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_general_cosine_cuda PASSED [ 21%]
2023-01-11T23:10:17.0345829Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_general_hamming_cuda PASSED [ 21%]
2023-01-11T23:10:17.0345996Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_hamming_cuda PASSED [ 21%]
2023-01-11T23:10:17.0346145Z test_ops.py::TestCommonCUDA::test_out_warning_signbit_cuda PASSED [ 21%]
2023-01-11T23:10:17.0346292Z test_ops.py::TestCommonCUDA::test_out_warning_sin_cuda PASSED [ 21%]
2023-01-11T23:10:17.0346437Z test_ops.py::TestCommonCUDA::test_out_warning_sinh_cuda PASSED [ 21%]
2023-01-11T23:10:17.0346591Z test_ops.py::TestCommonCUDA::test_out_warning_slice_scatter_cuda PASSED [ 21%]
2023-01-11T23:10:17.0346739Z test_ops.py::TestCommonCUDA::test_out_warning_softmax_cuda PASSED [ 21%]
2023-01-11T23:10:17.0346900Z test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_y0_cuda PASSED [ 21%]
2023-01-11T23:10:17.0347050Z test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_y1_cuda PASSED [ 21%]
2023-01-11T23:10:17.0347226Z test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_u_cuda PASSED [ 21%]
2023-01-11T23:10:17.0347573Z test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_w_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 21%]
2023-01-11T23:10:17.0347726Z test_ops.py::TestCommonCUDA::test_out_warning_special_erfcx_cuda PASSED [ 21%]
2023-01-11T23:10:17.0347903Z test_ops.py::TestCommonCUDA::test_out_warning_special_hermite_polynomial_h_cuda PASSED [ 21%]
2023-01-11T23:10:17.0348054Z test_ops.py::TestCommonCUDA::test_out_warning_special_i0e_cuda PASSED [ 21%]
2023-01-11T23:10:17.0348237Z test_ops.py::TestCommonCUDA::test_out_warning_special_i1_cuda PASSED [ 21%]
2023-01-11T23:10:17.0348392Z test_ops.py::TestCommonCUDA::test_out_warning_special_i1e_cuda PASSED [ 21%]
2023-01-11T23:10:17.0348784Z test_ops.py::TestCommonCUDA::test_out_warning_special_legendre_polynomial_p_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 21%]
2023-01-11T23:10:17.0348946Z test_ops.py::TestCommonCUDA::test_out_warning_special_log_ndtr_cuda PASSED [ 21%]
2023-01-11T23:10:17.0349118Z test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_i0_cuda PASSED [ 21%]
2023-01-11T23:10:17.0349288Z test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_i1_cuda PASSED [ 21%]
2023-01-11T23:10:17.0349456Z test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_k1_cuda PASSED [ 21%]
2023-01-11T23:10:17.0349606Z test_ops.py::TestCommonCUDA::test_out_warning_special_ndtr_cuda PASSED [ 21%]
2023-01-11T23:10:17.0349762Z test_ops.py::TestCommonCUDA::test_out_warning_special_ndtri_cuda PASSED [ 21%]
2023-01-11T23:10:17.0349949Z test_ops.py::TestCommonCUDA::test_out_warning_special_polygamma_special_polygamma_n_0_cuda PASSED [ 21%]
2023-01-11T23:10:17.0350131Z test_ops.py::TestCommonCUDA::test_out_warning_special_scaled_modified_bessel_k1_cuda PASSED [ 21%]
2023-01-11T23:10:17.0350535Z test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_w_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 21%]
2023-01-11T23:10:17.0350701Z test_ops.py::TestCommonCUDA::test_out_warning_special_spherical_bessel_j0_cuda PASSED [ 21%]
2023-01-11T23:10:17.0350849Z test_ops.py::TestCommonCUDA::test_out_warning_sqrt_cuda PASSED [ 21%]
2023-01-11T23:10:17.0351028Z test_ops.py::TestCommonCUDA::test_out_warning_squeeze_cuda PASSED [ 21%]
2023-01-11T23:10:17.0351177Z test_ops.py::TestCommonCUDA::test_out_warning_stack_cuda PASSED [ 21%]
2023-01-11T23:10:17.0351329Z test_ops.py::TestCommonCUDA::test_out_warning_std_mean_cuda PASSED [ 21%]
2023-01-11T23:10:17.0351481Z test_ops.py::TestCommonCUDA::test_out_warning_std_unbiased_cuda PASSED [ 21%]
2023-01-11T23:10:17.0351628Z test_ops.py::TestCommonCUDA::test_out_warning_stft_cuda PASSED [ 21%]
2023-01-11T23:10:17.0351775Z test_ops.py::TestCommonCUDA::test_out_warning_svd_cuda PASSED [ 21%]
2023-01-11T23:10:17.0351915Z test_ops.py::TestCommonCUDA::test_out_warning_symeig_cuda PASSED [ 21%]
2023-01-11T23:10:17.0352061Z test_ops.py::TestCommonCUDA::test_out_warning_take_cuda PASSED [ 21%]
2023-01-11T23:10:17.0352205Z test_ops.py::TestCommonCUDA::test_out_warning_tanh_cuda PASSED [ 21%]
2023-01-11T23:10:17.0352355Z test_ops.py::TestCommonCUDA::test_out_warning_tensordot_cuda PASSED [ 21%]
2023-01-11T23:10:17.0352501Z test_ops.py::TestCommonCUDA::test_out_warning_tile_cuda PASSED [ 21%]
2023-01-11T23:10:17.0352648Z test_ops.py::TestCommonCUDA::test_out_warning_topk_cuda PASSED [ 21%]
2023-01-11T23:10:17.0352794Z test_ops.py::TestCommonCUDA::test_out_warning_trace_cuda PASSED [ 21%]
2023-01-11T23:10:17.0352942Z test_ops.py::TestCommonCUDA::test_out_warning_transpose_cuda PASSED [ 21%]
2023-01-11T23:10:17.0353084Z test_ops.py::TestCommonCUDA::test_out_warning_trapezoid_cuda PASSED [ 21%]
2023-01-11T23:10:17.0353244Z test_ops.py::TestCommonCUDA::test_out_warning_triangular_solve_cuda PASSED [ 21%]
2023-01-11T23:10:17.0353395Z test_ops.py::TestCommonCUDA::test_out_warning_tril_indices_cuda PASSED [ 21%]
2023-01-11T23:10:17.0353538Z test_ops.py::TestCommonCUDA::test_out_warning_triu_cuda PASSED [ 21%]
2023-01-11T23:10:17.0353687Z test_ops.py::TestCommonCUDA::test_out_warning_triu_indices_cuda PASSED [ 21%]
2023-01-11T23:10:17.0353834Z test_ops.py::TestCommonCUDA::test_out_warning_trunc_cuda PASSED [ 21%]
2023-01-11T23:10:17.0353981Z test_ops.py::TestCommonCUDA::test_out_warning_unflatten_cuda PASSED [ 21%]
2023-01-11T23:10:17.0354158Z test_ops.py::TestCommonCUDA::test_out_warning_unfold_copy_cuda XFAIL [ 21%]
2023-01-11T23:10:17.0354299Z test_ops.py::TestCommonCUDA::test_out_warning_unfold_cuda PASSED [ 21%]
2023-01-11T23:10:17.0354447Z test_ops.py::TestCommonCUDA::test_out_warning_uniform_cuda PASSED [ 21%]
2023-01-11T23:10:17.0354613Z test_ops.py::TestCommonCUDA::test_out_warning_unique_consecutive_cuda PASSED [ 21%]
2023-01-11T23:10:17.0354758Z test_ops.py::TestCommonCUDA::test_out_warning_unique_cuda PASSED [ 21%]
2023-01-11T23:10:17.0354905Z test_ops.py::TestCommonCUDA::test_out_warning_var_mean_cuda PASSED [ 21%]
2023-01-11T23:10:17.0355055Z test_ops.py::TestCommonCUDA::test_out_warning_var_unbiased_cuda PASSED [ 21%]
2023-01-11T23:10:17.0355203Z test_ops.py::TestCommonCUDA::test_out_warning_vdot_cuda PASSED [ 21%]
2023-01-11T23:10:17.0355362Z test_ops.py::TestCommonCUDA::test_out_warning_view_as_complex_cuda PASSED [ 21%]
2023-01-11T23:10:17.0355510Z test_ops.py::TestCommonCUDA::test_out_warning_view_copy_cuda XFAIL [ 21%]
2023-01-11T23:10:17.0355649Z test_ops.py::TestCommonCUDA::test_out_warning_view_cuda PASSED [ 21%]
2023-01-11T23:10:17.0355795Z test_ops.py::TestCommonCUDA::test_out_warning_vsplit_cuda PASSED [ 21%]
2023-01-11T23:10:17.0355941Z test_ops.py::TestCommonCUDA::test_out_warning_where_cuda PASSED [ 21%]
2023-01-11T23:10:17.0356084Z test_ops.py::TestCommonCUDA::test_out_warning_zeros_cuda XFAIL [ 21%]
2023-01-11T23:10:17.0356230Z test_ops.py::TestCommonCUDA::test_out_where_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0356373Z test_ops.py::TestCommonCUDA::test_out_xlogy_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0356520Z test_ops.py::TestCommonCUDA::test_out_zeros_like_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0356695Z test_ops.py::TestCommonCUDA::test_pointwise_tag_coverage_cuda PASSED [ 21%]
2023-01-11T23:10:17.0356846Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:10:17.0357010Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex128 PASSED [ 21%]
2023-01-11T23:10:17.0357167Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex32 PASSED [ 21%]
2023-01-11T23:10:17.0357323Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex64 PASSED [ 21%]
2023-01-11T23:10:17.0357472Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float16 PASSED [ 21%]
2023-01-11T23:10:17.0357623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0357771Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float64 PASSED [ 21%]
2023-01-11T23:10:17.0357919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int16 PASSED [ 21%]
2023-01-11T23:10:17.0358062Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int32 PASSED [ 21%]
2023-01-11T23:10:17.0358209Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int8 PASSED [ 21%]
2023-01-11T23:10:17.0358359Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_uint8 PASSED [ 21%]
2023-01-11T23:10:17.0358534Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_bool PASSED [ 21%]
2023-01-11T23:10:17.0358718Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_complex128 PASSED [ 21%]
2023-01-11T23:10:17.0358899Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_complex64 PASSED [ 21%]
2023-01-11T23:10:17.0359078Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0359251Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int16 PASSED [ 21%]
2023-01-11T23:10:17.0359422Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int32 PASSED [ 21%]
2023-01-11T23:10:17.0359598Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:10:17.0359794Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex32 PASSED [ 21%]
2023-01-11T23:10:17.0359968Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex64 PASSED [ 21%]
2023-01-11T23:10:17.0360152Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_float64 PASSED [ 21%]
2023-01-11T23:10:17.0360322Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int32 PASSED [ 21%]
2023-01-11T23:10:17.0360490Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int64 PASSED [ 21%]
2023-01-11T23:10:17.0360661Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:10:17.0360834Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_bool PASSED [ 21%]
2023-01-11T23:10:17.0361005Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_complex128 PASSED [ 21%]
2023-01-11T23:10:17.0361180Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_complex64 PASSED [ 21%]
2023-01-11T23:10:17.0361348Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int16 PASSED [ 21%]
2023-01-11T23:10:17.0361515Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int32 PASSED [ 21%]
2023-01-11T23:10:17.0361680Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_uint8 PASSED [ 21%]
2023-01-11T23:10:17.0361857Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:10:17.0362030Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_bool PASSED [ 21%]
2023-01-11T23:10:17.0362203Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0362402Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_uint8 PASSED [ 21%]
2023-01-11T23:10:17.0362575Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:10:17.0362753Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex32 PASSED [ 21%]
2023-01-11T23:10:17.0362926Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float16 PASSED [ 21%]
2023-01-11T23:10:17.0363097Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int16 PASSED [ 21%]
2023-01-11T23:10:17.0363267Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_uint8 PASSED [ 21%]
2023-01-11T23:10:17.0363438Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_bool PASSED [ 21%]
2023-01-11T23:10:17.0363614Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex128 PASSED [ 21%]
2023-01-11T23:10:17.0363790Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_float16 PASSED [ 21%]
2023-01-11T23:10:17.0363957Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int32 PASSED [ 21%]
2023-01-11T23:10:17.0364127Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int64 PASSED [ 21%]
2023-01-11T23:10:17.0364296Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int8 PASSED [ 21%]
2023-01-11T23:10:17.0364464Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_uint8 PASSED [ 21%]
2023-01-11T23:10:17.0364635Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:10:17.0364808Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex32 PASSED [ 21%]
2023-01-11T23:10:17.0364975Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_float64 PASSED [ 21%]
2023-01-11T23:10:17.0365145Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int32 PASSED [ 21%]
2023-01-11T23:10:17.0365310Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int64 PASSED [ 21%]
2023-01-11T23:10:17.0365489Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_uint8 PASSED [ 21%]
2023-01-11T23:10:17.0365668Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float16 PASSED [ 21%]
2023-01-11T23:10:17.0365843Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0366014Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float64 PASSED [ 21%]
2023-01-11T23:10:17.0366184Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0366351Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int16 PASSED [ 21%]
2023-01-11T23:10:17.0366528Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int8 PASSED [ 21%]
2023-01-11T23:10:17.0366706Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_complex128 PASSED [ 21%]
2023-01-11T23:10:17.0366881Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_complex64 PASSED [ 21%]
2023-01-11T23:10:17.0367046Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float16 PASSED [ 21%]
2023-01-11T23:10:17.0367218Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float64 PASSED [ 21%]
2023-01-11T23:10:17.0367387Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int16 PASSED [ 21%]
2023-01-11T23:10:17.0367553Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int32 PASSED [ 21%]
2023-01-11T23:10:17.0367720Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int8 PASSED [ 21%]
2023-01-11T23:10:17.0367912Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_bool PASSED [ 21%]
2023-01-11T23:10:17.0368089Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_complex128 PASSED [ 21%]
2023-01-11T23:10:17.0368263Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_complex64 PASSED [ 21%]
2023-01-11T23:10:17.0368435Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_float16 PASSED [ 21%]
2023-01-11T23:10:17.0368597Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0368766Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_float64 PASSED [ 21%]
2023-01-11T23:10:17.0368932Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int64 PASSED [ 21%]
2023-01-11T23:10:17.0369097Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int8 PASSED [ 21%]
2023-01-11T23:10:17.0369272Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:10:17.0369442Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float16 PASSED [ 21%]
2023-01-11T23:10:17.0369612Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0369778Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float64 PASSED [ 21%]
2023-01-11T23:10:17.0369938Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int32 PASSED [ 21%]
2023-01-11T23:10:17.0370108Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0370277Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0370450Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex32 PASSED [ 22%]
2023-01-11T23:10:17.0370616Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0370791Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0370988Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_complex64 PASSED [ 22%]
2023-01-11T23:10:17.0371163Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0371330Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0371491Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0371651Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0371805Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0371962Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0372122Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0372278Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int16 PASSED [ 22%]
2023-01-11T23:10:17.0372435Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0372598Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0372745Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0372907Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex32 PASSED [ 22%]
2023-01-11T23:10:17.0373065Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0373218Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0373373Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0373554Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0373710Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0373878Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0374032Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0374189Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0374344Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0374597Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0374758Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0374909Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int16 PASSED [ 22%]
2023-01-11T23:10:17.0375070Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0375235Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0375393Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0375555Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0375717Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0375873Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0376029Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0376189Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0376349Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0376506Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0376670Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_complex128 XFAIL [ 22%]
2023-01-11T23:10:17.0376821Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float32 XFAIL [ 22%]
2023-01-11T23:10:17.0377016Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float64 XFAIL [ 22%]
2023-01-11T23:10:17.0377171Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int16 XFAIL [ 22%]
2023-01-11T23:10:17.0377323Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int32 XFAIL [ 22%]
2023-01-11T23:10:17.0377474Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int8 XFAIL [ 22%]
2023-01-11T23:10:17.0377632Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0377794Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0377953Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_complex64 PASSED [ 22%]
2023-01-11T23:10:17.0378102Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int16 PASSED [ 22%]
2023-01-11T23:10:17.0378253Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0378421Z test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0378589Z test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_complex64 PASSED [ 22%]
2023-01-11T23:10:17.0378751Z test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0378904Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0379062Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0379215Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0379364Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0379545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0379700Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0379860Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0380011Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0380166Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int16 PASSED [ 22%]
2023-01-11T23:10:17.0380321Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0380472Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0380622Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0380776Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0380934Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0381093Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0381252Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0381406Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0381562Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0381722Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0381890Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_complex64 PASSED [ 22%]
2023-01-11T23:10:17.0382048Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0382211Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0382374Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0382552Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0382762Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0382944Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0383124Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int16 PASSED [ 22%]
2023-01-11T23:10:17.0383298Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0383462Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0383637Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0383809Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0383985Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0384159Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0384329Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0384489Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0384644Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0384806Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0384960Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex32 PASSED [ 22%]
2023-01-11T23:10:17.0385117Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0385296Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0385449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0385607Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0385761Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0385925Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0386087Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex32 PASSED [ 22%]
2023-01-11T23:10:17.0386238Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0386391Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0386546Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0386703Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0386859Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0387021Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0387173Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0387327Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0387475Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0387630Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0387788Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0387948Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_complex32 PASSED [ 22%]
2023-01-11T23:10:17.0388108Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_complex64 PASSED [ 22%]
2023-01-11T23:10:17.0388266Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0388451Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0388606Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0388821Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0388987Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0389147Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0389303Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0389457Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0389623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0389792Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0389961Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex32 PASSED [ 22%]
2023-01-11T23:10:17.0390127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex64 PASSED [ 22%]
2023-01-11T23:10:17.0390284Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0390446Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0390605Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0390768Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0390933Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex32 PASSED [ 22%]
2023-01-11T23:10:17.0391124Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0391283Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0391443Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0391603Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0391769Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_complex64 PASSED [ 22%]
2023-01-11T23:10:17.0391929Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0392091Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0392253Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0392413Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0392577Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0392749Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0392910Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0393065Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0393228Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0393387Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0393545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int16 PASSED [ 22%]
2023-01-11T23:10:17.0393700Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0393873Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0394044Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0394228Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0394393Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_shapes_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0394571Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0394741Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0394916Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0395083Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0395250Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0395418Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0395586Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0395751Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_complex128 PASSED [ 23%]
2023-01-11T23:10:17.0395910Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_float32 PASSED [ 23%]
2023-01-11T23:10:17.0396073Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int16 PASSED [ 23%]
2023-01-11T23:10:17.0396234Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int8 PASSED [ 23%]
2023-01-11T23:10:17.0396397Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_uint8 PASSED [ 23%]
2023-01-11T23:10:17.0396559Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float32 PASSED [ 23%]
2023-01-11T23:10:17.0396754Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float64 PASSED [ 23%]
2023-01-11T23:10:17.0396914Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int32 PASSED [ 23%]
2023-01-11T23:10:17.0397079Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int8 PASSED [ 23%]
2023-01-11T23:10:17.0397230Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_uint8 PASSED [ 23%]
2023-01-11T23:10:17.0397384Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_bool PASSED [ 23%]
2023-01-11T23:10:17.0397545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex128 PASSED [ 23%]
2023-01-11T23:10:17.0397703Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex32 PASSED [ 23%]
2023-01-11T23:10:17.0397860Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex64 PASSED [ 23%]
2023-01-11T23:10:17.0398016Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_float32 PASSED [ 23%]
2023-01-11T23:10:17.0398173Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int16 PASSED [ 23%]
2023-01-11T23:10:17.0398326Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int32 PASSED [ 23%]
2023-01-11T23:10:17.0398474Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_uint8 PASSED [ 23%]
test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float64 PASSED [ 23%] 2023-01-11T23:10:17.0398786Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0398939Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0399093Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0399250Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_bool PASSED [ 23%] 2023-01-11T23:10:17.0399414Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0399577Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0399729Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0399910Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_float16 PASSED [ 23%] 2023-01-11T23:10:17.0400067Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0400222Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0400377Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0400556Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0400734Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0400895Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0401055Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0401202Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0401357Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0401510Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0401664Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0401829Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0401992Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0402152Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0402310Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0402492Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0402652Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_bool PASSED [ 23%] 2023-01-11T23:10:17.0402816Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_float64 PASSED [ 23%] 2023-01-11T23:10:17.0402973Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0403130Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int8 PASSED [ 23%] 2023-01-11T23:10:17.0403288Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0403442Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0403602Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float16 PASSED [ 23%] 2023-01-11T23:10:17.0403750Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float64 PASSED [ 23%] 2023-01-11T23:10:17.0403907Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0404061Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0404218Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int8 PASSED [ 23%] 2023-01-11T23:10:17.0404389Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0404561Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0404722Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0404883Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0405043Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0405195Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0405354Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0405512Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0405689Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float64 PASSED [ 23%] 2023-01-11T23:10:17.0405849Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0406003Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0406169Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0406335Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_bool PASSED [ 23%] 2023-01-11T23:10:17.0406500Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0406671Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0406842Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_float16 PASSED [ 23%] 2023-01-11T23:10:17.0407008Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0407174Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0407337Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0407499Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int8 PASSED [ 23%] 2023-01-11T23:10:17.0407663Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0407827Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0407993Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_bool PASSED [ 23%] 2023-01-11T23:10:17.0408167Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0408357Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0408527Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0408695Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0408857Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_float64 PASSED [ 23%] 2023-01-11T23:10:17.0409019Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0409181Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0409338Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0409501Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0409663Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0409819Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0409975Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_bool PASSED [ 23%] 2023-01-11T23:10:17.0410135Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0410290Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0410444Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0410593Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0410744Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0410901Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0411064Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0411223Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0411403Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0411564Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float64 PASSED [ 23%] 2023-01-11T23:10:17.0411719Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0411866Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int8 PASSED [ 23%] 2023-01-11T23:10:17.0412028Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0412190Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0412349Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float16 PASSED [ 23%] 2023-01-11T23:10:17.0412508Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0412663Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0412819Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_bool PASSED [ 23%] 2023-01-11T23:10:17.0412978Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0413137Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0413287Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0413464Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_bool PASSED [ 23%] 2023-01-11T23:10:17.0413629Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0413794Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0413985Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0414148Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float16 PASSED [ 23%] 2023-01-11T23:10:17.0414313Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0414571Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0414728Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0414889Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int8 PASSED [ 23%] 2023-01-11T23:10:17.0415061Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0415232Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0415398Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float16 PASSED [ 23%] 2023-01-11T23:10:17.0415565Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0415727Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float64 PASSED [ 23%] 2023-01-11T23:10:17.0415894Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int8 PASSED [ 23%] 2023-01-11T23:10:17.0416053Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0416216Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0416381Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0416542Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0416704Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0416864Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0417042Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0417255Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0417428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float16 PASSED [ 23%] 2023-01-11T23:10:17.0417593Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0417761Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0417927Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0418090Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int8 PASSED [ 23%] 2023-01-11T23:10:17.0418249Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0418425Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float16 
PASSED [ 23%] 2023-01-11T23:10:17.0418598Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0418772Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float64 PASSED [ 23%] 2023-01-11T23:10:17.0418941Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0419111Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0419283Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_complex32 XFAIL [ 23%] 2023-01-11T23:10:17.0419461Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0419635Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0419836Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0420005Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0420175Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0420346Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0420507Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0420676Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int8 PASSED [ 23%] 2023-01-11T23:10:17.0420835Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_bool PASSED [ 23%] 2023-01-11T23:10:17.0421003Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0421169Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0421327Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0421483Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0421646Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:10:17.0421805Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0421963Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex128 PASSED [ 24%] 2023-01-11T23:10:17.0422123Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex32 PASSED [ 24%] 2023-01-11T23:10:17.0422284Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex64 PASSED [ 24%] 2023-01-11T23:10:17.0422443Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0422604Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0422760Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0422970Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0423127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0423324Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_bfloat16 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0423523Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_bool SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0423724Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0423926Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex32 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0424127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0424329Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0424526Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int16 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0424720Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int64 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0424916Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int8 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0425114Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_uint8 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0425315Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0425546Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float16 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0425752Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0425952Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int16 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0426151Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int32 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0426346Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int64 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0426504Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:10:17.0426663Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex64 PASSED [ 24%] 2023-01-11T23:10:17.0426820Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0426969Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0427125Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0427277Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0427428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0427578Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0427734Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0427890Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0428039Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0428182Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int8 PASSED [ 24%] 
2023-01-11T23:10:17.0428346Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:10:17.0428501Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0428743Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0428917Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0429080Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0429242Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0429398Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0429546Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0429703Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:10:17.0429863Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0430016Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0430173Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0430329Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0430482Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0430634Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0430781Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0430944Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex128 PASSED [ 24%] 2023-01-11T23:10:17.0431104Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex32 PASSED [ 24%] 2023-01-11T23:10:17.0431287Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex64 PASSED [ 24%] 2023-01-11T23:10:17.0431444Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_float16 PASSED [ 24%] 2023-01-11T23:10:17.0431602Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0431756Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0431911Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0432065Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0432213Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0432380Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:10:17.0432539Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0432709Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_complex64 PASSED [ 24%] 2023-01-11T23:10:17.0432873Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float16 PASSED [ 24%] 2023-01-11T23:10:17.0433038Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0433199Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0433359Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0433509Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0433666Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0433822Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:10:17.0433980Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0434143Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_complex128 PASSED [ 24%] 2023-01-11T23:10:17.0434302Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float16 PASSED [ 24%] 2023-01-11T23:10:17.0434480Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0434640Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0434788Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0434942Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0435100Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0435261Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:10:17.0435417Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0435577Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0435732Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0435892Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:10:17.0436041Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0436193Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0436346Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0436498Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0436649Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0436803Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0436991Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex128 PASSED [ 24%] 2023-01-11T23:10:17.0437154Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex32 PASSED [ 24%] 2023-01-11T23:10:17.0437320Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0437474Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0437627Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0437782Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0437938Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0438100Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_complex128 PASSED [ 24%] 2023-01-11T23:10:17.0438257Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0438413Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0438565Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0438722Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0438877Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0439030Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0439182Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0439348Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0439519Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex32 PASSED [ 24%] 2023-01-11T23:10:17.0439686Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_float16 PASSED [ 24%] 2023-01-11T23:10:17.0439853Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0440009Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0440193Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0440361Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0440527Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_complex128 PASSED [ 24%] 2023-01-11T23:10:17.0440689Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_complex64 PASSED [ 24%] 2023-01-11T23:10:17.0440847Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0441006Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0441163Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0441324Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0441480Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex32 PASSED [ 24%] 2023-01-11T23:10:17.0441644Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex64 PASSED [ 24%] 2023-01-11T23:10:17.0441804Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float16 PASSED [ 24%] 2023-01-11T23:10:17.0441964Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0442120Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0442274Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0447932Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0448129Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0448346Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0448509Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0448671Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0448831Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0448986Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0449139Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0449291Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0449445Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0449592Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0449757Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0449919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0450075Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0450229Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0450382Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0450536Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0450721Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex128 PASSED [ 24%] 2023-01-11T23:10:17.0450906Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0451071Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0451228Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0451407Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0451563Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0451726Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0451898Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex128 PASSED [ 24%] 2023-01-11T23:10:17.0452070Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex64 PASSED [ 24%] 2023-01-11T23:10:17.0452235Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float16 PASSED [ 24%] 2023-01-11T23:10:17.0452395Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0452562Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0452722Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0452887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0453049Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0453211Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0453373Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0453535Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0453688Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0453847Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0454031Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0454189Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0454347Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0454723Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0454888Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0455050Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0455214Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0455372Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0455534Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0455700Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0455860Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0456016Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0456183Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex128 PASSED [ 25%] 2023-01-11T23:10:17.0456346Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex32 PASSED [ 25%] 2023-01-11T23:10:17.0456508Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex64 PASSED [ 25%] 2023-01-11T23:10:17.0456660Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0456817Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0456976Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0457132Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0457333Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0457490Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0457643Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0457795Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0457959Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex128 PASSED [ 25%] 2023-01-11T23:10:17.0458117Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex64 PASSED [ 25%] 2023-01-11T23:10:17.0458274Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0458431Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0458587Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0458748Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0458900Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0459055Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0459211Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0459356Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0459512Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0459668Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0459855Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0460012Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0460171Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0460321Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0460470Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0460613Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0460767Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0460922Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_complex128 PASSED [ 25%] 2023-01-11T23:10:17.0461077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_complex64 PASSED [ 25%] 2023-01-11T23:10:17.0461233Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0461387Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0461546Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0461697Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0461845Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0462003Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:10:17.0462161Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0462322Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex128 PASSED [ 25%] 2023-01-11T23:10:17.0462480Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex64 PASSED [ 25%] 2023-01-11T23:10:17.0462638Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0462792Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0462968Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0463119Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0463262Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:10:17.0463417Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0463569Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0463720Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0463871Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0464027Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0464188Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_complex128 PASSED [ 25%] 2023-01-11T23:10:17.0464347Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_complex64 PASSED [ 25%] 2023-01-11T23:10:17.0464498Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0464649Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0464801Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0464953Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0465107Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0465262Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0465440Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:10:17.0465596Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_complex64 PASSED [ 25%] 2023-01-11T23:10:17.0465761Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_complex128 PASSED [ 25%] 2023-01-11T23:10:17.0465923Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_complex64 PASSED [ 25%] 2023-01-11T23:10:17.0466084Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0466241Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0466398Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0466554Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0466704Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0466860Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0467011Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0467188Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_bfloat16 SKIPPED (Skipped!) 
[ 25%] 2023-01-11T23:10:17.0467349Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0467508Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0467667Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0467824Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0467978Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0468128Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0468283Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0468428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0468604Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0468839Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0468996Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0469151Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0469300Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0469454Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0469605Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0469751Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0469900Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0470051Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0470205Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0470356Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0470505Z test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0470652Z test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0470801Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0470946Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0471122Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0471274Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0471423Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0471570Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0471714Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0471866Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:10:17.0472012Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0472166Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0472322Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0472480Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0472633Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0472789Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0472940Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0473091Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0473250Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex64 PASSED [ 25%] 2023-01-11T23:10:17.0473407Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0473556Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0473709Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0473866Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:10:17.0474014Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0474199Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0474352Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0474500Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0474652Z test_ops.py::TestCommonCUDA::test_python_ref__refs_igamma_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0474802Z test_ops.py::TestCommonCUDA::test_python_ref__refs_igammac_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0474954Z test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex128 PASSED [ 25%] 2023-01-11T23:10:17.0475106Z test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex32 PASSED [ 25%] 2023-01-11T23:10:17.0475264Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_bfloat16 XFAIL [ 25%] 2023-01-11T23:10:17.0475419Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_bool XFAIL [ 25%] 2023-01-11T23:10:17.0475580Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_float32 XFAIL [ 25%] 2023-01-11T23:10:17.0475733Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int32 XFAIL [ 25%] 2023-01-11T23:10:17.0475885Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int8 XFAIL [ 25%] 2023-01-11T23:10:17.0476041Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_complex128 XFAIL [ 25%] 2023-01-11T23:10:17.0476197Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float64 XFAIL [ 25%] 2023-01-11T23:10:17.0476354Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int8 XFAIL [ 25%] 2023-01-11T23:10:17.0476514Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_bfloat16 XFAIL [ 25%] 2023-01-11T23:10:17.0476696Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_bool XFAIL [ 25%] 2023-01-11T23:10:17.0476851Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_float32 XFAIL [ 25%] 2023-01-11T23:10:17.0477012Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_float64 XFAIL [ 25%] 2023-01-11T23:10:17.0477165Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int64 XFAIL [ 25%] 2023-01-11T23:10:17.0477325Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_bfloat16 XFAIL [ 25%] 2023-01-11T23:10:17.0477476Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_bool XFAIL [ 25%] 2023-01-11T23:10:17.0477640Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_complex32 XFAIL [ 26%] 2023-01-11T23:10:17.0477802Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_complex64 XFAIL [ 26%] 2023-01-11T23:10:17.0477962Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float32 XFAIL [ 26%] 2023-01-11T23:10:17.0478118Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int64 XFAIL [ 26%] 2023-01-11T23:10:17.0478273Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int8 XFAIL [ 26%] 2023-01-11T23:10:17.0478431Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_uint8 XFAIL [ 26%] 2023-01-11T23:10:17.0478585Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0478740Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0478900Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0479056Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0479210Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0479361Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0479510Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0479665Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int8 PASSED [ 26%] 2023-01-11T23:10:17.0479845Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0480002Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0480170Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_complex32 PASSED [ 26%] 2023-01-11T23:10:17.0480352Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0480531Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0480686Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0480841Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0480998Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0481151Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int8 PASSED [ 26%] 2023-01-11T23:10:17.0481310Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0481459Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0481611Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0481763Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int8 PASSED [ 26%] 
2023-01-11T23:10:17.0481914Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0482069Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0482229Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0482407Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0482562Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int8 PASSED [ 26%] 2023-01-11T23:10:17.0482710Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0482866Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0483024Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0483180Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0483330Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0483482Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0483632Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0483783Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0483925Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0484075Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int8 PASSED [ 26%] 2023-01-11T23:10:17.0484223Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0484382Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0484541Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex32 PASSED [ 26%] 2023-01-11T23:10:17.0484695Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0484849Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0484999Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0485147Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0485299Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0485470Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0485646Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0485819Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0485987Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0486153Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0486318Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0486477Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0486633Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_float64 PASSED [ 26%] 
2023-01-11T23:10:17.0486796Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0486957Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0487116Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0487286Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0487450Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0487622Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0487794Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0487982Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0488145Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0488307Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0488467Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0488627Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0488785Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int16 XFAIL [ 26%] 2023-01-11T23:10:17.0488936Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int32 XFAIL [ 26%] 2023-01-11T23:10:17.0489089Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int8 XFAIL [ 26%] 2023-01-11T23:10:17.0489238Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_uint8 XFAIL [ 26%] 2023-01-11T23:10:17.0489392Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0489550Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0489706Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0489855Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0490010Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0490187Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0490370Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0490521Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0490663Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0490813Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0490966Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0491145Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0491300Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0491449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0491604Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0491755Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0491907Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0492057Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_complex32 PASSED [ 26%] 2023-01-11T23:10:17.0492212Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0492363Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0492516Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0492667Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0492816Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0492966Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0493111Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0493260Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0493430Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0493630Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0493808Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0493986Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0494157Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0494324Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int8 PASSED [ 26%] 2023-01-11T23:10:17.0494596Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0494751Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0494919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0495085Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0495246Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0495407Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0495565Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0495723Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int8 PASSED [ 26%] 2023-01-11T23:10:17.0495887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0496038Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0496208Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0496367Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0496531Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0496689Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0496883Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0497044Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0497204Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0497364Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0497513Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0497669Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0497832Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0497989Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0498146Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0498306Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0498465Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0498623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0498776Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0498933Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int8 PASSED [ 26%] 2023-01-11T23:10:17.0499093Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0499279Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0499437Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0499597Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int8 PASSED [ 26%] 2023-01-11T23:10:17.0499754Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0499913Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0500073Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0500224Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0500381Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0500532Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0500688Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0500840Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0500996Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0501147Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0501301Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float16 
PASSED [ 26%] 2023-01-11T23:10:17.0501448Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0501598Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0501760Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0501924Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0502090Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_complex32 PASSED [ 26%] 2023-01-11T23:10:17.0502247Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0502426Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0502583Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0502733Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0502887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0503038Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0503192Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0503343Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_uint8 PASSED [ 27%] 2023-01-11T23:10:17.0503496Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0503650Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_complex128 PASSED [ 27%] 2023-01-11T23:10:17.0503809Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_complex64 PASSED [ 27%] 2023-01-11T23:10:17.0503954Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0504106Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0504280Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0504456Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0504633Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0504804Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0504999Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_uint8 PASSED [ 27%] 2023-01-11T23:10:17.0505183Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0505357Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0505529Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0505706Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int16 PASSED [ 27%] 2023-01-11T23:10:17.0505866Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0506019Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0506175Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0506337Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0506500Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex128 PASSED [ 27%] 2023-01-11T23:10:17.0506664Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex64 PASSED [ 27%] 2023-01-11T23:10:17.0506823Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0506971Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int16 PASSED [ 27%] 2023-01-11T23:10:17.0507123Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0507273Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0507428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0507581Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0507737Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0507892Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_complex32 XFAIL [ 27%] 2023-01-11T23:10:17.0508067Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0508214Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0508365Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int16 PASSED [ 27%] 2023-01-11T23:10:17.0508513Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0508723Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0508887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0509046Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0509208Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0509367Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0509520Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_uint8 PASSED [ 27%] 2023-01-11T23:10:17.0509679Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0509843Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex64 PASSED [ 27%] 2023-01-11T23:10:17.0510002Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0510160Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0510316Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0510474Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0510652Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0510810Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_complex128 PASSED [ 27%] 2023-01-11T23:10:17.0510963Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_complex64 PASSED [ 27%] 2023-01-11T23:10:17.0511122Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0511275Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0511428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0511600Z test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0511753Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0511905Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int16 PASSED [ 27%] 2023-01-11T23:10:17.0512058Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0512201Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0512354Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0512506Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex32 PASSED [ 27%] 2023-01-11T23:10:17.0512653Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0512808Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0512957Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int16 PASSED [ 27%] 2023-01-11T23:10:17.0513103Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0513252Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0513397Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0513600Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_bfloat16 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:10:17.0513843Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex32 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:10:17.0514046Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float16 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:10:17.0514244Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:10:17.0514440Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int16 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:10:17.0514635Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int32 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:10:17.0514832Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int8 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:10:17.0515029Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_uint8 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:10:17.0515246Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_bfloat16 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0515459Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex128 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0515678Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex32 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0515892Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_float16 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0516162Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_float64 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0516376Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int16 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0516585Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int32 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0516794Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int8 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0517007Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_uint8 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0517166Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0517328Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_complex128 PASSED [ 27%] 2023-01-11T23:10:17.0517485Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_complex64 PASSED [ 27%] 2023-01-11T23:10:17.0517644Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0517803Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0517957Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0518115Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_uint8 PASSED [ 27%] 2023-01-11T23:10:17.0518274Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0518428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0518596Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_complex128 PASSED [ 27%] 2023-01-11T23:10:17.0518750Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_complex64 PASSED [ 27%] 2023-01-11T23:10:17.0518909Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0519063Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0519240Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0519396Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0519556Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex32 PASSED [ 27%] 2023-01-11T23:10:17.0519716Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0519872Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0520026Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_uint8 PASSED [ 27%] 2023-01-11T23:10:17.0520179Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0520404Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_bfloat16 SKIPPED (Expected: dropout is not comparable) [ 27%] 2023-01-11T23:10:17.0520627Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_float32 SKIPPED (Expected: dropout is not comparable) [ 27%] 2023-01-11T23:10:17.0520871Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_float64 SKIPPED (Expected: dropout is not comparable) [ 27%] 2023-01-11T23:10:17.0521069Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0521240Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0521458Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_bfloat16 SKIPPED (Expected: dropout is not comparable) [ 27%] 2023-01-11T23:10:17.0521698Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_float16 SKIPPED (Expected: dropout is not comparable) [ 27%] 2023-01-11T23:10:17.0521868Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0522033Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0522201Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0522367Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0522530Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0522692Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0522870Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0523048Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0523226Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0523402Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0523570Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0523744Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0523915Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int16 PASSED [ 27%] 2023-01-11T23:10:17.0524084Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0524272Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0524463Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0524675Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0524855Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0525025Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0525195Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_bfloat16 PASSED [ 27%] 
2023-01-11T23:10:17.0525370Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0525542Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0525718Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0525908Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0526095Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0526291Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_complex128 PASSED [ 27%] 2023-01-11T23:10:17.0526483Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_complex32 PASSED [ 27%] 2023-01-11T23:10:17.0526669Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_complex64 PASSED [ 27%] 2023-01-11T23:10:17.0526851Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0527034Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0527246Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_uint8 PASSED [ 27%] 2023-01-11T23:10:17.0527433Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0527615Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0527799Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0527970Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0528140Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0528315Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0528481Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0528651Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0528817Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0529002Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_complex128 PASSED [ 27%] 2023-01-11T23:10:17.0529185Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0529366Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int16 PASSED [ 27%] 2023-01-11T23:10:17.0529545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0529713Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pdist_cuda_float64 XFAIL [ 27%] 2023-01-11T23:10:17.0529896Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float16 PASSED 
[ 28%] 2023-01-11T23:10:17.0530096Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0530298Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0530502Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0530681Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0530854Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0531032Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0531210Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0531382Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0531555Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0531717Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0531889Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0532056Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0532224Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0532388Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0532580Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0532746Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0532916Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0533082Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0533244Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0533409Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0533594Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0533775Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_bool PASSED [ 28%] 2023-01-11T23:10:17.0533963Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0534145Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0534325Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0534626Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_complex128 PASSED [ 28%] 2023-01-11T23:10:17.0534819Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0534997Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0535180Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0535361Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0535542Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0535814Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0535995Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0536171Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0536350Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0536525Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0536696Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0536882Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0537058Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0537233Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0537406Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0537574Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0537749Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0537919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0538087Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0538282Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0538455Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0538627Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0538815Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0539005Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0539192Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0539376Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0539558Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0539742Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0539918Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0540075Z test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0540234Z test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_complex128 PASSED [ 28%] 2023-01-11T23:10:17.0540390Z test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0540542Z test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0540696Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_bool PASSED [ 28%] 2023-01-11T23:10:17.0540855Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex32 PASSED [ 28%] 2023-01-11T23:10:17.0541010Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0541160Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0541337Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0541490Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0541641Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0541803Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0541958Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_bool PASSED [ 28%] 2023-01-11T23:10:17.0542122Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex128 PASSED [ 28%] 2023-01-11T23:10:17.0542283Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex32 PASSED [ 28%] 2023-01-11T23:10:17.0542446Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0542598Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0542759Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0542913Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0543063Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0543225Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0543389Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_complex128 PASSED [ 28%] 2023-01-11T23:10:17.0543549Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0543707Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0543878Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0544034Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0544188Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0544338Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int64 PASSED [ 
28%] 2023-01-11T23:10:17.0544489Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0544641Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0544797Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0544952Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_bool PASSED [ 28%] 2023-01-11T23:10:17.0545108Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex128 PASSED [ 28%] 2023-01-11T23:10:17.0545268Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0545425Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0545579Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0545734Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0545887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0546049Z test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_complex32 PASSED [ 28%] 2023-01-11T23:10:17.0546203Z test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0546353Z test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0546502Z test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0546661Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0546821Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex128 PASSED [ 28%] 2023-01-11T23:10:17.0547005Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0547160Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0547309Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0547462Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0547618Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex32 PASSED [ 28%] 2023-01-11T23:10:17.0547765Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0547917Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0548071Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0548222Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0548375Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0548534Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_bool PASSED [ 28%] 2023-01-11T23:10:17.0548769Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0548934Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0549090Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0549253Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0549411Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0549599Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0549760Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0549925Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0550083Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0550238Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0550384Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0550540Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0550696Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0550881Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_complex128 PASSED [ 28%] 2023-01-11T23:10:17.0551066Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0551223Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0551379Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0551530Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0551685Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0551831Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0551982Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0552140Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_bool PASSED [ 28%] 2023-01-11T23:10:17.0552305Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_complex128 PASSED [ 28%] 2023-01-11T23:10:17.0552472Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_complex32 PASSED [ 28%] 2023-01-11T23:10:17.0552635Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0552829Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0552987Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0553141Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0553299Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_complex32 PASSED [ 28%] 2023-01-11T23:10:17.0553454Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0553612Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0553766Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0553925Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_complex32 PASSED [ 28%] 2023-01-11T23:10:17.0554080Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0554233Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int8 PASSED [ 28%] 
2023-01-11T23:10:17.0554379Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0554533Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0554684Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0554835Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0554985Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0555137Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0555320Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0555471Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0555619Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0555777Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0555931Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int32 PASSED [ 29%] 2023-01-11T23:10:17.0556083Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int64 PASSED [ 29%] 2023-01-11T23:10:17.0556232Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_uint8 PASSED [ 29%] 2023-01-11T23:10:17.0556387Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:10:17.0556545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_complex128 PASSED [ 29%] 2023-01-11T23:10:17.0556705Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_complex64 PASSED [ 29%] 2023-01-11T23:10:17.0556859Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float16 PASSED [ 29%] 2023-01-11T23:10:17.0557007Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int16 PASSED [ 29%] 2023-01-11T23:10:17.0557157Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int32 PASSED [ 29%] 2023-01-11T23:10:17.0557308Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int64 PASSED [ 29%] 2023-01-11T23:10:17.0557461Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_uint8 PASSED [ 29%] 2023-01-11T23:10:17.0557613Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:10:17.0557768Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex32 PASSED [ 29%] 2023-01-11T23:10:17.0557917Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int16 PASSED [ 29%] 2023-01-11T23:10:17.0558069Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int32 PASSED [ 29%] 2023-01-11T23:10:17.0558214Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int8 PASSED [ 29%] 2023-01-11T23:10:17.0558401Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_complex64 PASSED [ 29%] 2023-01-11T23:10:17.0558561Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float64 PASSED [ 29%] 2023-01-11T23:10:17.0558718Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int16 PASSED [ 29%] 2023-01-11T23:10:17.0558872Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int8 PASSED [ 29%] 2023-01-11T23:10:17.0559025Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:10:17.0559174Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0559325Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0559469Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0559629Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_float16 PASSED [ 29%]
2023-01-11T23:10:17.0559784Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0559937Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0560092Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0560244Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0560399Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_bfloat16 PASSED [ 29%]
2023-01-11T23:10:17.0560550Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0560701Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex128 PASSED [ 29%]
2023-01-11T23:10:17.0560855Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex32 PASSED [ 29%]
2023-01-11T23:10:17.0561050Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0561224Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0561385Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0561533Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0561685Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_bfloat16 PASSED [ 29%]
2023-01-11T23:10:17.0561841Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_complex128 PASSED [ 29%]
2023-01-11T23:10:17.0561989Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_complex64 PASSED [ 29%]
2023-01-11T23:10:17.0562140Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0562287Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0562441Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0562597Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex64 PASSED [ 29%]
2023-01-11T23:10:17.0562756Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0562906Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0563053Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0563204Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0563349Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0563519Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0563697Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_complex128 PASSED [ 29%]
2023-01-11T23:10:17.0563873Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_complex64 PASSED [ 29%]
2023-01-11T23:10:17.0564068Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0564236Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0564400Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0564564Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0564721Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0564886Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0565048Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0565212Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0565372Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0565542Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0565704Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0565864Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0566021Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0566174Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0566334Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0566524Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_bfloat16 PASSED [ 29%]
2023-01-11T23:10:17.0566686Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0566852Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float16 PASSED [ 29%]
2023-01-11T23:10:17.0567014Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0567178Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0567342Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0567497Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0567658Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0567817Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0567980Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0568141Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_bfloat16 PASSED [ 29%]
2023-01-11T23:10:17.0568302Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0568458Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0568618Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0568775Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0568927Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0569086Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0569244Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0569404Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0569563Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0569745Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0569905Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0570072Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0570252Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0570444Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0570615Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0570781Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0570942Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0571126Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0571314Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_complex32 PASSED [ 29%]
2023-01-11T23:10:17.0571498Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0571681Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0571856Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0572036Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0572238Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0572402Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_float16 PASSED [ 29%]
2023-01-11T23:10:17.0572567Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0572728Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0572888Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0573077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float16 PASSED [ 29%]
2023-01-11T23:10:17.0573263Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0573440Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0573628Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0573808Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0573986Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0574170Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0574354Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0574653Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0574833Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0575019Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0575176Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_bfloat16 PASSED [ 29%]
2023-01-11T23:10:17.0575376Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0575545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0575706Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0575870Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0576029Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0576191Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0576355Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0576512Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0576673Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0576836Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0577022Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_complex64 PASSED [ 29%]
2023-01-11T23:10:17.0577202Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0577379Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0577556Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0577736Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0577944Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0578119Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0578300Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0578478Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0578659Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0578823Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0578990Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0579155Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0579321Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0579487Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0579641Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0579801Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0579954Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0580112Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_complex128 PASSED [ 29%]
2023-01-11T23:10:17.0580269Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_complex64 PASSED [ 29%]
2023-01-11T23:10:17.0580424Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_float16 PASSED [ 29%]
2023-01-11T23:10:17.0580577Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0580730Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0580875Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0581054Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0581211Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0581372Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_complex64 PASSED [ 29%]
2023-01-11T23:10:17.0581528Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float16 PASSED [ 29%]
2023-01-11T23:10:17.0581680Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0581830Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0581984Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0582138Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0582282Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_uint8 PASSED [ 30%]
2023-01-11T23:10:17.0582442Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0582603Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0582764Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_complex32 PASSED [ 30%]
2023-01-11T23:10:17.0582922Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0583075Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0583229Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0583385Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0583556Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_bool PASSED [ 30%]
2023-01-11T23:10:17.0583715Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0583871Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0584023Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0584176Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0584332Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0584489Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0584643Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0584788Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0584951Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0585107Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0585270Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0585424Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex32 PASSED [ 30%]
2023-01-11T23:10:17.0585575Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0585725Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0585877Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0586022Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0586174Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0586328Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0586483Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0586665Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_complex32 PASSED [ 30%]
2023-01-11T23:10:17.0586820Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0586972Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0587122Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0587271Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0587413Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0587580Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0587745Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0587906Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0588065Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0588218Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0588372Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0588523Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0588724Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0588892Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0589052Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0589233Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0589390Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0589543Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_bool PASSED [ 30%]
2023-01-11T23:10:17.0589701Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0589855Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0590000Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0590150Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0590308Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0590468Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0590625Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0590782Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0590962Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_uint8 PASSED [ 30%]
2023-01-11T23:10:17.0591147Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_bool XFAIL [ 30%]
2023-01-11T23:10:17.0591309Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_complex128 XFAIL [ 30%]
2023-01-11T23:10:17.0591476Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_complex64 XFAIL [ 30%]
2023-01-11T23:10:17.0591639Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_float32 XFAIL [ 30%]
2023-01-11T23:10:17.0591798Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int16 XFAIL [ 30%]
2023-01-11T23:10:17.0591958Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int32 XFAIL [ 30%]
2023-01-11T23:10:17.0592119Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int64 XFAIL [ 30%]
2023-01-11T23:10:17.0592279Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int8 XFAIL [ 30%]
2023-01-11T23:10:17.0592470Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_uint8 XFAIL [ 30%]
2023-01-11T23:10:17.0592625Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0592768Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_bool PASSED [ 30%]
2023-01-11T23:10:17.0592925Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0593078Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0593229Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0593382Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0593534Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0593682Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0593833Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0593987Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0594148Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex32 PASSED [ 30%]
2023-01-11T23:10:17.0594304Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0594457Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_uint8 PASSED [ 30%]
2023-01-11T23:10:17.0594621Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0594783Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0594967Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0595125Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0595277Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0595428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_bool PASSED [ 30%]
2023-01-11T23:10:17.0595588Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0595743Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0595897Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0596047Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0596199Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0596350Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0596495Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_uint8 PASSED [ 30%]
2023-01-11T23:10:17.0596659Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_indices_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0596808Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_bool PASSED [ 30%]
2023-01-11T23:10:17.0596962Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0597124Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_complex32 PASSED [ 30%]
2023-01-11T23:10:17.0597277Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0597427Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0597578Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0597728Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0597879Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_uint8 PASSED [ 30%]
2023-01-11T23:10:17.0598062Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_bool PASSED [ 30%]
2023-01-11T23:10:17.0598230Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0598392Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0598555Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0598718Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0598877Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0599034Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0599190Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0599348Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0599508Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_uint8 PASSED [ 30%]
2023-01-11T23:10:17.0599666Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0599816Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0599967Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0600123Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0600278Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_bool PASSED [ 30%]
2023-01-11T23:10:17.0600431Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex32 PASSED [ 30%]
2023-01-11T23:10:17.0600611Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0600767Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0600919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0601069Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0601230Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0601396Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0601556Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0601715Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0601868Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0602027Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0602183Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0602341Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0602504Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0602669Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0602827Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0602983Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_uint8 PASSED [ 30%]
2023-01-11T23:10:17.0603134Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0603292Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0603453Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex32 PASSED [ 30%]
2023-01-11T23:10:17.0603607Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0603822Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0603975Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0604127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_uint8 PASSED [ 30%]
2023-01-11T23:10:17.0604290Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0604448Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0604604Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0604764Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0604924Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0605079Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0605236Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0605387Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0605546Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0605702Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0605853Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0606013Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0606167Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_bool PASSED [ 30%]
2023-01-11T23:10:17.0606346Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex32 PASSED [ 30%]
2023-01-11T23:10:17.0606505Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0606657Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float32 PASSED [ 31%]
2023-01-11T23:10:17.0606807Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float64 PASSED [ 31%]
2023-01-11T23:10:17.0606957Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int8 PASSED [ 31%]
2023-01-11T23:10:17.0607104Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_uint8 PASSED [ 31%]
2023-01-11T23:10:17.0607254Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_bfloat16 PASSED [ 31%]
2023-01-11T23:10:17.0607406Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex128 PASSED [ 31%]
2023-01-11T23:10:17.0607564Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex64 PASSED [ 31%]
2023-01-11T23:10:17.0607713Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_float16 PASSED [ 31%]
2023-01-11T23:10:17.0607867Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_float32 PASSED [ 31%]
2023-01-11T23:10:17.0608019Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int64 PASSED [ 31%]
2023-01-11T23:10:17.0608172Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int8 PASSED [ 31%]
2023-01-11T23:10:17.0608324Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_bfloat16 PASSED [ 31%]
2023-01-11T23:10:17.0608476Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_bool PASSED [ 31%]
2023-01-11T23:10:17.0608635Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex64 PASSED [ 31%]
2023-01-11T23:10:17.0608794Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float16 PASSED [ 31%]
2023-01-11T23:10:17.0608949Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float32 PASSED [ 31%]
2023-01-11T23:10:17.0609107Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int32 PASSED [ 31%]
2023-01-11T23:10:17.0609262Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_uint8 PASSED [ 31%]
2023-01-11T23:10:17.0609442Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_complex32 PASSED [ 31%]
2023-01-11T23:10:17.0609597Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_complex64 PASSED [ 31%]
2023-01-11T23:10:17.0609750Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float16 PASSED [ 31%]
2023-01-11T23:10:17.0609902Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float32 PASSED [ 31%]
2023-01-11T23:10:17.0610051Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int16 PASSED [ 31%]
2023-01-11T23:10:17.0610200Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int32 PASSED [ 31%]
2023-01-11T23:10:17.0610348Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int64 PASSED [ 31%]
2023-01-11T23:10:17.0610504Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int8 PASSED [ 31%]
2023-01-11T23:10:17.0610657Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_uint8 PASSED [ 31%]
2023-01-11T23:10:17.0610816Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_bfloat16 PASSED [ 31%]
2023-01-11T23:10:17.0610963Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_bool PASSED [ 31%]
2023-01-11T23:10:17.0611122Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex128 PASSED [ 31%]
2023-01-11T23:10:17.0611280Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex64 PASSED [ 31%]
2023-01-11T23:10:17.0611433Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_uint8 PASSED [ 31%]
2023-01-11T23:10:17.0611587Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_bfloat16 PASSED [ 31%]
2023-01-11T23:10:17.0611741Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_bool PASSED [ 31%]
2023-01-11T23:10:17.0611919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int32 PASSED [ 31%]
2023-01-11T23:10:17.0612069Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int64 PASSED [ 31%]
2023-01-11T23:10:17.0612217Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_uint8 PASSED [ 31%]
2023-01-11T23:10:17.0612372Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_bfloat16 PASSED [ 31%]
2023-01-11T23:10:17.0612523Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_bool PASSED [ 31%]
2023-01-11T23:10:17.0612682Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex64 PASSED [ 31%]
2023-01-11T23:10:17.0612840Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float16 PASSED [ 31%]
2023-01-11T23:10:17.0612993Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int8 PASSED [ 31%]
2023-01-11T23:10:17.0613143Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_uint8 PASSED [ 31%]
2023-01-11T23:10:17.0613298Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_add_cuda PASSED [ 31%]
2023-01-11T23:10:17.0613450Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_arange_cuda PASSED [ 31%]
2023-01-11T23:10:17.0613622Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_right_shift_cuda PASSED [ 31%]
2023-01-11T23:10:17.0613784Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_xor_cuda PASSED [ 31%]
2023-01-11T23:10:17.0613941Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_clamp_min_cuda XFAIL [ 31%]
2023-01-11T23:10:17.0614097Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_copysign_cuda PASSED [ 31%]
2023-01-11T23:10:17.0614257Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diag_embed_cuda PASSED [ 31%]
2023-01-11T23:10:17.0614423Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diagonal_copy_cuda PASSED [ 31%]
2023-01-11T23:10:17.0614935Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diagonal_cuda PASSED [ 31%]
2023-01-11T23:10:17.0615143Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_div_floor_rounding_cuda PASSED [ 31%]
2023-01-11T23:10:17.0615296Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_dsplit_cuda PASSED [ 31%]
2023-01-11T23:10:17.0615507Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_dstack_cuda XFAIL [ 31%]
2023-01-11T23:10:17.0615671Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_eq_cuda PASSED [ 31%]
2023-01-11T23:10:17.0615830Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fft2_cuda PASSED [ 31%]
2023-01-11T23:10:17.0615985Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fftn_cuda PASSED [ 31%]
2023-01-11T23:10:17.0616144Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_hfft2_cuda PASSED [ 31%]
2023-01-11T23:10:17.0616299Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_hfft_cuda PASSED [ 31%]
2023-01-11T23:10:17.0616455Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_hfftn_cuda PASSED [ 31%]
2023-01-11T23:10:17.0616611Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifft2_cuda PASSED [ 31%]
2023-01-11T23:10:17.0616771Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifft_cuda PASSED [ 31%]
2023-01-11T23:10:17.0616928Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifftn_cuda PASSED [ 31%]
2023-01-11T23:10:17.0617083Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfft_cuda PASSED [ 31%]
2023-01-11T23:10:17.0617241Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfftn_cuda PASSED [ 31%]
2023-01-11T23:10:17.0617399Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fliplr_cuda PASSED [ 31%]
2023-01-11T23:10:17.0617565Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_floor_divide_cuda PASSED [ 31%]
2023-01-11T23:10:17.0617719Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmax_cuda PASSED [ 31%]
2023-01-11T23:10:17.0617902Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmin_cuda PASSED [ 31%]
2023-01-11T23:10:17.0618055Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmod_cuda PASSED [ 31%]
2023-01-11T23:10:17.0618210Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_ge_cuda PASSED [ 31%]
2023-01-11T23:10:17.0618363Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_gt_cuda PASSED [ 31%]
2023-01-11T23:10:17.0618521Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hypot_cuda PASSED [ 31%]
2023-01-11T23:10:17.0618675Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_igamma_cuda PASSED [ 31%]
2023-01-11T23:10:17.0618833Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_igammac_cuda PASSED [ 31%]
2023-01-11T23:10:17.0618993Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_index_select_cuda XFAIL [ 31%]
2023-01-11T23:10:17.0619139Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_lcm_cuda PASSED [ 31%]
2023-01-11T23:10:17.0619298Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logical_and_cuda PASSED [ 31%]
2023-01-11T23:10:17.0619464Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logical_or_cuda PASSED [ 31%]
2023-01-11T23:10:17.0619615Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_lt_cuda PASSED [ 31%]
2023-01-11T23:10:17.0619772Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_masked_fill_cuda XFAIL [ 31%]
2023-01-11T23:10:17.0619926Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_mean_cuda PASSED [ 31%]
2023-01-11T23:10:17.0620081Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_minimum_cuda XFAIL [ 31%]
2023-01-11T23:10:17.0620232Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_mul_cuda PASSED [ 31%]
2023-01-11T23:10:17.0620384Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_neg_cuda PASSED [ 31%]
2023-01-11T23:10:17.0620538Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nextafter_cuda PASSED [ 31%]
2023-01-11T23:10:17.0620707Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_gelu_cuda PASSED [ 31%]
2023-01-11T23:10:17.0620895Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_hinge_embedding_loss_cuda PASSED [ 31%]
2023-01-11T23:10:17.0621094Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_huber_loss_cuda PASSED [ 31%]
2023-01-11T23:10:17.0621282Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_margin_ranking_loss_cuda PASSED [ 31%]
2023-01-11T23:10:17.0621453Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_prelu_cuda PASSED [ 31%]
2023-01-11T23:10:17.0621612Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_reshape_as_cuda PASSED [ 31%]
2023-01-11T23:10:17.0621774Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_reshape_cuda PASSED [ 31%]
2023-01-11T23:10:17.0621929Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_special_zeta_cuda PASSED [ 31%]
2023-01-11T23:10:17.0622085Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sub_cuda PASSED [ 31%]
2023-01-11T23:10:17.0622248Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sum_to_size_cuda PASSED [ 31%]
2023-01-11T23:10:17.0622407Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_triu_cuda PASSED [ 31%]
2023-01-11T23:10:17.0622565Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_unbind_cuda PASSED [ 31%]
2023-01-11T23:10:17.0622720Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_vsplit_cuda PASSED [ 31%]
2023-01-11T23:10:17.0622876Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_xlogy_cuda PASSED [ 31%]
2023-01-11T23:10:17.0623055Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_bfloat16 PASSED [ 31%]
2023-01-11T23:10:17.0623231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_bool PASSED [ 31%]
2023-01-11T23:10:17.0623402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_float16 PASSED [ 31%]
2023-01-11T23:10:17.0623611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_float32 PASSED [ 31%]
2023-01-11T23:10:17.0623787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int16 PASSED [ 31%]
2023-01-11T23:10:17.0623960Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int32 PASSED [ 31%]
2023-01-11T23:10:17.0624133Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int8 PASSED [ 31%]
2023-01-11T23:10:17.0624337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0624517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_float32 PASSED [ 31%]
2023-01-11T23:10:17.0624719Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0624894Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_int32 PASSED [ 31%]
2023-01-11T23:10:17.0625245Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 31%]
2023-01-11T23:10:17.0625447Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0625647Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex128 PASSED [ 31%]
2023-01-11T23:10:17.0625844Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex64 PASSED [ 31%]
2023-01-11T23:10:17.0626038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int16 PASSED [ 31%]
2023-01-11T23:10:17.0626229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int64 PASSED [ 31%]
2023-01-11T23:10:17.0626426Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_uint8 PASSED [ 31%]
2023-01-11T23:10:17.0626672Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0626891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0627110Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0627478Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.bfloat16 doesn't support nvfuser) [ 31%]
2023-01-11T23:10:17.0627697Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0628047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 31%]
2023-01-11T23:10:17.0628418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.bfloat16 doesn't support nvfuser) [ 31%]
2023-01-11T23:10:17.0628835Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 31%]
2023-01-11T23:10:17.0629070Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0629264Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_bfloat16 PASSED [ 31%]
2023-01-11T23:10:17.0629456Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_bool PASSED [ 31%]
2023-01-11T23:10:17.0629681Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex32 PASSED [ 31%]
2023-01-11T23:10:17.0629877Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex64 PASSED [ 31%]
2023-01-11T23:10:17.0630072Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float64 PASSED [ 31%]
2023-01-11T23:10:17.0630256Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int16 PASSED [ 31%]
2023-01-11T23:10:17.0630440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int64 PASSED [ 31%]
2023-01-11T23:10:17.0630627Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int8 PASSED [ 31%]
2023-01-11T23:10:17.0630843Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0631061Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0631285Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0631649Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 31%]
2023-01-11T23:10:17.0631869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0632086Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0632425Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 31%]
2023-01-11T23:10:17.0632650Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_complex128 PASSED [ 31%]
2023-01-11T23:10:17.0632846Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_float16 PASSED [ 31%]
2023-01-11T23:10:17.0633035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_float32 PASSED [ 31%]
2023-01-11T23:10:17.0633227Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int32 PASSED [ 31%]
2023-01-11T23:10:17.0633416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int8 PASSED [ 31%]
2023-01-11T23:10:17.0633634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0633850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0634050Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_bfloat16 PASSED [ 31%]
2023-01-11T23:10:17.0634240Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_bool PASSED [ 31%]
2023-01-11T23:10:17.0634445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex128 PASSED [ 31%]
2023-01-11T23:10:17.0634640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex64 PASSED [ 31%]
2023-01-11T23:10:17.0634837Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float64 PASSED [ 31%]
2023-01-11T23:10:17.0635029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int16 PASSED [ 31%]
2023-01-11T23:10:17.0635246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int64 PASSED [ 31%]
2023-01-11T23:10:17.0635438Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int8 PASSED [ 32%]
2023-01-11T23:10:17.0635661Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0635882Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0636254Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.cdouble doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0636599Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%]
2023-01-11T23:10:17.0636970Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.cdouble doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0637185Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0637398Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0637593Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_bfloat16 PASSED [ 32%]
2023-01-11T23:10:17.0637787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_bool PASSED [ 32%]
2023-01-11T23:10:17.0637978Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int64 PASSED [ 32%]
2023-01-11T23:10:17.0638172Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int8 PASSED [ 32%]
2023-01-11T23:10:17.0638418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0638784Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.cfloat doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0639005Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0639364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.cfloat doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0639585Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0639803Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0639992Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_bfloat16 PASSED [ 32%]
2023-01-11T23:10:17.0640180Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_bool PASSED [ 32%]
2023-01-11T23:10:17.0640374Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_float64 PASSED [ 32%]
2023-01-11T23:10:17.0640566Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int16 PASSED [ 32%]
2023-01-11T23:10:17.0640756Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int8 PASSED [ 32%]
2023-01-11T23:10:17.0640969Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_uint8 PASSED [ 32%]
2023-01-11T23:10:17.0641193Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0641406Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0641629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0641848Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0642059Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0642405Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%]
2023-01-11T23:10:17.0642767Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.chalf doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0643109Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%]
2023-01-11T23:10:17.0643297Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_bool PASSED [ 32%]
2023-01-11T23:10:17.0643493Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex64 PASSED [ 32%]
2023-01-11T23:10:17.0643686Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float16 PASSED [ 32%]
2023-01-11T23:10:17.0643878Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float32 PASSED [ 32%]
2023-01-11T23:10:17.0644095Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int16 PASSED [ 32%]
2023-01-11T23:10:17.0644280Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int32 PASSED [ 32%]
2023-01-11T23:10:17.0644491Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0644709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0644928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0645296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%]
2023-01-11T23:10:17.0645519Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0645873Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.char doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0646223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.char doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0646436Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0646633Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_aten_cuda_float16 PASSED [ 32%]
2023-01-11T23:10:17.0646859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_aten_cuda_float64 PASSED [ 32%]
2023-01-11T23:10:17.0647229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.complex doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0647425Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_bfloat16 PASSED [ 32%]
2023-01-11T23:10:17.0647613Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_bool PASSED [ 32%]
2023-01-11T23:10:17.0647813Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex128 PASSED [ 32%]
2023-01-11T23:10:17.0648011Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex32 PASSED [ 32%]
2023-01-11T23:10:17.0648208Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float16 PASSED [ 32%]
2023-01-11T23:10:17.0648403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float32 PASSED [ 32%]
2023-01-11T23:10:17.0648596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int64 PASSED [ 32%]
2023-01-11T23:10:17.0648787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int8 PASSED [ 32%]
2023-01-11T23:10:17.0649008Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0649367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%]
2023-01-11T23:10:17.0649586Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0649971Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.double doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0650334Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.double doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0650681Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%]
2023-01-11T23:10:17.0650877Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_bfloat16 PASSED [ 32%]
2023-01-11T23:10:17.0651097Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_bool PASSED [ 32%]
2023-01-11T23:10:17.0651320Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex64 PASSED [ 32%]
2023-01-11T23:10:17.0651514Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float16 PASSED [ 32%]
2023-01-11T23:10:17.0651709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float64 PASSED [ 32%]
2023-01-11T23:10:17.0651897Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_int32 PASSED [ 32%]
2023-01-11T23:10:17.0652083Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_int8 PASSED [ 32%]
2023-01-11T23:10:17.0652296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0652656Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%]
2023-01-11T23:10:17.0652905Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0653125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0653483Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.float doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0653698Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0653913Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0654251Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%]
2023-01-11T23:10:17.0654449Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_bfloat16 PASSED [ 32%]
2023-01-11T23:10:17.0654762Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_bool PASSED [ 32%]
2023-01-11T23:10:17.0654958Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_complex64 PASSED [ 32%]
2023-01-11T23:10:17.0655144Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_float16 PASSED [ 32%]
2023-01-11T23:10:17.0655336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_float32 PASSED [ 32%]
2023-01-11T23:10:17.0655526Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int32 PASSED [ 32%]
2023-01-11T23:10:17.0655716Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_uint8 PASSED [ 32%]
2023-01-11T23:10:17.0655973Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0656195Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0656412Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0656626Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0656972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%]
2023-01-11T23:10:17.0657165Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_bool PASSED [ 32%]
2023-01-11T23:10:17.0657357Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_complex128 PASSED [ 32%]
2023-01-11T23:10:17.0657551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_complex64 PASSED [ 32%]
2023-01-11T23:10:17.0657742Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_uint8 PASSED [ 32%]
2023-01-11T23:10:17.0657959Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0658171Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0658389Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0658642Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0658859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0659205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:10:17.0659415Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0659627Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0659816Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex32 PASSED [ 32%] 2023-01-11T23:10:17.0660012Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex64 PASSED [ 32%] 2023-01-11T23:10:17.0660207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:10:17.0660397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:10:17.0660584Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:10:17.0660799Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0661046Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0661431Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%] 2023-01-11T23:10:17.0661676Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0661894Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0662238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.long doesn't support nvfuser) [ 32%] 2023-01-11T23:10:17.0662579Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:10:17.0662791Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_int64 
SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0663138Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%] 2023-01-11T23:10:17.0663357Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0663549Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_bfloat16 PASSED [ 32%] 2023-01-11T23:10:17.0663739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_bool PASSED [ 32%] 2023-01-11T23:10:17.0663931Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_float16 PASSED [ 32%] 2023-01-11T23:10:17.0664121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:10:17.0664338Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_float64 PASSED [ 32%] 2023-01-11T23:10:17.0664528Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:10:17.0664706Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:10:17.0664889Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:10:17.0665111Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0665324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0665540Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0665756Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0665936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_bfloat16 PASSED [ 32%] 2023-01-11T23:10:17.0666121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex32 PASSED [ 32%] 2023-01-11T23:10:17.0666305Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex64 PASSED [ 32%] 2023-01-11T23:10:17.0666477Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_float16 PASSED [ 32%] 2023-01-11T23:10:17.0666654Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:10:17.0666828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:10:17.0667004Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:10:17.0667231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:10:17.0667440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0667779Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%] 2023-01-11T23:10:17.0667988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0668191Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0668517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:10:17.0668754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_int32 PASSED [ 32%] 2023-01-11T23:10:17.0668958Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0669276Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%] 2023-01-11T23:10:17.0669457Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_bfloat16 PASSED [ 32%] 2023-01-11T23:10:17.0669638Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_complex64 PASSED [ 32%] 2023-01-11T23:10:17.0669815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:10:17.0670021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:10:17.0670232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0670436Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0670810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:10:17.0670984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_int32 PASSED [ 32%] 2023-01-11T23:10:17.0671186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0671366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_bfloat16 PASSED [ 32%] 2023-01-11T23:10:17.0671548Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:10:17.0671728Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0671905Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int16 PASSED [ 33%] 2023-01-11T23:10:17.0672083Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int64 PASSED [ 33%] 2023-01-11T23:10:17.0672260Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int8 PASSED [ 33%] 2023-01-11T23:10:17.0672437Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_uint8 PASSED [ 33%] 2023-01-11T23:10:17.0672637Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for 
speed) [ 33%] 2023-01-11T23:10:17.0672841Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0673089Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0673295Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0673492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0673810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_int32 SKIPPED (_refs.acosh doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0673996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_complex128 PASSED [ 33%] 2023-01-11T23:10:17.0674178Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_complex32 PASSED [ 33%] 2023-01-11T23:10:17.0674363Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_complex64 PASSED [ 33%] 2023-01-11T23:10:17.0674542Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_float16 PASSED [ 33%] 2023-01-11T23:10:17.0674714Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0674890Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int64 PASSED [ 33%] 2023-01-11T23:10:17.0675066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_uint8 PASSED [ 33%] 2023-01-11T23:10:17.0675269Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0675478Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0675709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0675895Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0676096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0676278Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_int32 PASSED [ 33%] 2023-01-11T23:10:17.0676474Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0676674Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0676865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_complex128 PASSED [ 33%] 2023-01-11T23:10:17.0677052Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0677265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0677479Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0677666Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0677852Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_complex64 PASSED [ 33%] 2023-01-11T23:10:17.0678035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_float16 PASSED [ 33%] 2023-01-11T23:10:17.0678216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0678388Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0678589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int32 PASSED [ 33%] 2023-01-11T23:10:17.0678801Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0679010Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0679344Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%] 2023-01-11T23:10:17.0679526Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_complex64 PASSED [ 33%] 2023-01-11T23:10:17.0679706Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float16 PASSED [ 33%] 2023-01-11T23:10:17.0679890Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0680072Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0680250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int32 PASSED [ 33%] 2023-01-11T23:10:17.0680422Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int64 PASSED [ 33%] 2023-01-11T23:10:17.0680599Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_uint8 PASSED [ 33%] 2023-01-11T23:10:17.0680805Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0681036Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0681402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_float32 SKIPPED (_refs.addr doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0681610Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0681938Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%] 2023-01-11T23:10:17.0682250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_int32 SKIPPED (_refs.addr doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0682428Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_bfloat16 PASSED [ 
33%] 2023-01-11T23:10:17.0682605Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_complex64 PASSED [ 33%] 2023-01-11T23:10:17.0682782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_uint8 PASSED [ 33%] 2023-01-11T23:10:17.0682992Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0683176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0683499Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%] 2023-01-11T23:10:17.0683699Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0683896Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0684083Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_complex64 PASSED [ 33%] 2023-01-11T23:10:17.0684272Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_float16 PASSED [ 33%] 2023-01-11T23:10:17.0684453Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0684650Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0684870Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0685080Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0685288Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0685616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_float32 SKIPPED (_refs.allclose doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0685826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0686007Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_bool PASSED [ 33%] 2023-01-11T23:10:17.0686185Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0686366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0686541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int64 PASSED [ 33%] 2023-01-11T23:10:17.0686709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int8 PASSED [ 33%] 2023-01-11T23:10:17.0686883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_uint8 PASSED [ 33%] 2023-01-11T23:10:17.0687110Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0687316Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_float16 SKIPPED 
(skipped for speed) [ 33%] 2023-01-11T23:10:17.0687498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0687703Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0688028Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%] 2023-01-11T23:10:17.0688229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0688411Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_bfloat16 PASSED [ 33%] 2023-01-11T23:10:17.0688592Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float16 PASSED [ 33%] 2023-01-11T23:10:17.0688766Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0688942Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int32 PASSED [ 33%] 2023-01-11T23:10:17.0689119Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_uint8 PASSED [ 33%] 2023-01-11T23:10:17.0689444Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%] 2023-01-11T23:10:17.0689765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%] 2023-01-11T23:10:17.0689966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0690148Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_bfloat16 PASSED [ 33%] 2023-01-11T23:10:17.0690345Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_bool PASSED [ 33%] 2023-01-11T23:10:17.0690526Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_complex128 PASSED [ 33%] 2023-01-11T23:10:17.0690701Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_complex64 PASSED [ 33%] 2023-01-11T23:10:17.0690878Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0691058Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0691232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int16 PASSED [ 33%] 2023-01-11T23:10:17.0691414Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int32 PASSED [ 33%] 2023-01-11T23:10:17.0691589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int64 PASSED [ 33%] 2023-01-11T23:10:17.0691798Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0692002Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0692322Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%] 
2023-01-11T23:10:17.0692502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_int32 PASSED [ 33%] 2023-01-11T23:10:17.0692695Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0692956Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0693141Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_bfloat16 PASSED [ 33%] 2023-01-11T23:10:17.0693323Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0693499Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int16 PASSED [ 33%] 2023-01-11T23:10:17.0693677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int8 PASSED [ 33%] 2023-01-11T23:10:17.0693888Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0694211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_float32 SKIPPED (_refs.arange doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0694635Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_int32 SKIPPED (_refs.arange doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0694957Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%] 2023-01-11T23:10:17.0695146Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_bfloat16 PASSED [ 33%] 2023-01-11T23:10:17.0695331Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int16 PASSED [ 33%] 2023-01-11T23:10:17.0695514Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int32 PASSED [ 33%] 2023-01-11T23:10:17.0695697Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int8 PASSED [ 33%] 2023-01-11T23:10:17.0695874Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_uint8 PASSED [ 33%] 2023-01-11T23:10:17.0696091Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0696347Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0696564Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0696776Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0697102Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_float32 SKIPPED (_refs.as_strided doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0697308Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0697520Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_int64 SKIPPED (skipped 
for speed) [ 33%] 2023-01-11T23:10:17.0697723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0697923Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_bfloat16 PASSED [ 33%] 2023-01-11T23:10:17.0698128Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_complex128 PASSED [ 33%] 2023-01-11T23:10:17.0698330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_complex64 PASSED [ 33%] 2023-01-11T23:10:17.0698530Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0698761Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0698990Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0699219Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0699439Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0699795Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_float32 SKIPPED (_refs.as_strided doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0700147Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%] 2023-01-11T23:10:17.0700397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0700623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_bfloat16 PASSED [ 33%] 2023-01-11T23:10:17.0700821Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_complex128 PASSED [ 33%] 2023-01-11T23:10:17.0701013Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float16 PASSED [ 33%] 2023-01-11T23:10:17.0701204Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0701396Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0701619Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0701853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0702076Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0702292Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 
2023-01-11T23:10:17.0702652Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_float32 SKIPPED (_refs.as_strided_scatter doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0702870Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0703215Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%] 2023-01-11T23:10:17.0703568Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_int32 SKIPPED (_refs.as_strided_scatter doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0703782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0703965Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex32 PASSED [ 33%] 2023-01-11T23:10:17.0704145Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0704321Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int16 PASSED [ 33%] 2023-01-11T23:10:17.0704523Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int32 PASSED [ 33%] 2023-01-11T23:10:17.0704699Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int8 PASSED [ 33%] 2023-01-11T23:10:17.0704878Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_uint8 PASSED [ 33%] 2023-01-11T23:10:17.0705084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0705284Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0705492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0705833Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 34%] 2023-01-11T23:10:17.0706038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0706362Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%] 2023-01-11T23:10:17.0706544Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex128 PASSED [ 34%] 2023-01-11T23:10:17.0706722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex64 PASSED [ 34%] 2023-01-11T23:10:17.0706901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int32 PASSED [ 34%] 2023-01-11T23:10:17.0707078Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int64 PASSED [ 34%] 2023-01-11T23:10:17.0707252Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int8 PASSED [ 34%] 2023-01-11T23:10:17.0707574Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_float32 SKIPPED (_refs.asinh doesn't support nvfuser) [ 34%] 2023-01-11T23:10:17.0707808Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0708133Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%] 2023-01-11T23:10:17.0708450Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_int32 SKIPPED (_refs.asinh doesn't support nvfuser) [ 34%] 2023-01-11T23:10:17.0708715Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0708904Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int16 PASSED [ 34%] 2023-01-11T23:10:17.0709076Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int64 PASSED [ 34%] 2023-01-11T23:10:17.0709253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int8 PASSED [ 34%] 2023-01-11T23:10:17.0709458Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0709658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0709842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_float32 PASSED [ 34%] 2023-01-11T23:10:17.0710166Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%] 2023-01-11T23:10:17.0710346Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_bfloat16 PASSED [ 34%] 2023-01-11T23:10:17.0710555Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex32 PASSED [ 34%] 2023-01-11T23:10:17.0710731Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int16 PASSED [ 34%] 2023-01-11T23:10:17.0710903Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int64 PASSED [ 34%] 2023-01-11T23:10:17.0711080Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int8 PASSED [ 34%] 2023-01-11T23:10:17.0711259Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_uint8 PASSED [ 34%] 2023-01-11T23:10:17.0711464Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0711664Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0711871Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0712051Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_int32 PASSED [ 34%] 2023-01-11T23:10:17.0712376Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%] 2023-01-11T23:10:17.0712577Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0712760Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_bfloat16 PASSED [ 34%] 2023-01-11T23:10:17.0712938Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_complex128 PASSED [ 34%] 2023-01-11T23:10:17.0713124Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_complex32 PASSED [ 34%] 2023-01-11T23:10:17.0713309Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_complex64 PASSED [ 34%] 2023-01-11T23:10:17.0713489Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float64 PASSED [ 34%] 2023-01-11T23:10:17.0713700Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int64 PASSED [ 34%] 2023-01-11T23:10:17.0713881Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int8 PASSED [ 34%] 2023-01-11T23:10:17.0714059Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_uint8 PASSED [ 34%] 2023-01-11T23:10:17.0714265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0714609Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 34%] 2023-01-11T23:10:17.0714815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0715022Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0715206Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_float32 PASSED [ 34%] 2023-01-11T23:10:17.0715412Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0715598Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_int32 PASSED [ 34%] 2023-01-11T23:10:17.0715922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%] 2023-01-11T23:10:17.0716126Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0716339Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_bool PASSED [ 34%] 2023-01-11T23:10:17.0716534Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_complex32 PASSED [ 34%] 2023-01-11T23:10:17.0716723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_complex64 PASSED [ 34%] 2023-01-11T23:10:17.0716902Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float16 PASSED [ 34%] 2023-01-11T23:10:17.0717087Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float64 PASSED [ 34%] 2023-01-11T23:10:17.0717267Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int16 PASSED [ 34%] 2023-01-11T23:10:17.0717447Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int32 PASSED [ 34%] 2023-01-11T23:10:17.0717624Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int64 PASSED [ 34%] 2023-01-11T23:10:17.0717803Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int8 PASSED [ 34%] 2023-01-11T23:10:17.0717982Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_uint8 PASSED [ 34%] 2023-01-11T23:10:17.0718193Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0718402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0718616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0718818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0719028Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0719252Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0719442Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_bfloat16 PASSED [ 34%] 2023-01-11T23:10:17.0719623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_bool PASSED [ 34%] 2023-01-11T23:10:17.0719810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex128 PASSED [ 34%] 2023-01-11T23:10:17.0719993Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_float16 PASSED [ 34%] 2023-01-11T23:10:17.0720176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_float32 PASSED [ 34%] 2023-01-11T23:10:17.0720391Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int32 PASSED [ 34%] 2023-01-11T23:10:17.0720587Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int64 PASSED [ 34%] 2023-01-11T23:10:17.0720765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int8 PASSED [ 34%] 2023-01-11T23:10:17.0720940Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_uint8 PASSED [ 34%] 2023-01-11T23:10:17.0721153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0721368Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0721706Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_float32 SKIPPED (_refs.atleast_2d doesn't support nvfuser) [ 34%] 2023-01-11T23:10:17.0722072Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%] 2023-01-11T23:10:17.0722283Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0722469Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bfloat16 PASSED [ 34%] 2023-01-11T23:10:17.0722654Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bool PASSED [ 34%] 2023-01-11T23:10:17.0722834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex64 PASSED [ 34%] 2023-01-11T23:10:17.0723017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float16 PASSED [ 34%] 2023-01-11T23:10:17.0723201Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float32 PASSED [ 34%] 2023-01-11T23:10:17.0723384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float64 PASSED [ 34%] 2023-01-11T23:10:17.0723565Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int16 PASSED [ 34%] 2023-01-11T23:10:17.0723739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int32 PASSED [ 34%] 2023-01-11T23:10:17.0723916Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_uint8 PASSED [ 34%] 2023-01-11T23:10:17.0724127Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0724481Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 34%] 2023-01-11T23:10:17.0724692Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0725051Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_float32 SKIPPED (_refs.atleast_3d doesn't support nvfuser) [ 34%] 2023-01-11T23:10:17.0725255Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0725585Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%] 2023-01-11T23:10:17.0725792Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0726122Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%] 2023-01-11T23:10:17.0726331Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0726517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int16 PASSED [ 34%] 2023-01-11T23:10:17.0726702Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int32 PASSED [ 34%] 2023-01-11T23:10:17.0726884Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int8 PASSED [ 34%] 2023-01-11T23:10:17.0727064Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_uint8 PASSED [ 34%] 
2023-01-11T23:10:17.0727391Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:10:17.0727607Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_int32 PASSED [ 34%]
2023-01-11T23:10:17.0727814Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0728028Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0728219Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:10:17.0728401Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:10:17.0728581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:10:17.0728929Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:10:17.0729145Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0729330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:10:17.0729512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:10:17.0729714Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0730044Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:10:17.0730250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0730436Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:10:17.0730626Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:10:17.0730867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:10:17.0731078Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0731412Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:10:17.0731744Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:10:17.0731948Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0732137Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:10:17.0732331Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:10:17.0732523Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:10:17.0732867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:10:17.0733210Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:10:17.0733392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:10:17.0733600Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:10:17.0733787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:10:17.0733996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0734201Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0734656Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_shapes_executor_nvfuser_cuda_float32 SKIPPED (_refs.broadcast_shapes doesn't support nvfuser) [ 34%]
2023-01-11T23:10:17.0734857Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_bfloat16 PASSED [ 34%]
2023-01-11T23:10:17.0735049Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:10:17.0735244Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_complex64 PASSED [ 34%]
2023-01-11T23:10:17.0735440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float16 PASSED [ 34%]
2023-01-11T23:10:17.0735631Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:10:17.0735815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:10:17.0736002Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:10:17.0736185Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:10:17.0736403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0736662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0736888Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0737081Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_int32 PASSED [ 34%]
2023-01-11T23:10:17.0737296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0737481Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_float16 PASSED [ 34%]
2023-01-11T23:10:17.0737666Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_float64 PASSED [ 34%]
2023-01-11T23:10:17.0737854Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:10:17.0738041Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:10:17.0738254Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0738468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0738677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0738867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_float32 PASSED [ 34%]
2023-01-11T23:10:17.0739236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:10:17.0739451Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0739634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float16 XFAIL [ 35%]
2023-01-11T23:10:17.0739814Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int16 XFAIL [ 35%]
2023-01-11T23:10:17.0739991Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int32 XFAIL [ 35%]
2023-01-11T23:10:17.0740162Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int64 XFAIL [ 35%]
2023-01-11T23:10:17.0740337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_uint8 XFAIL [ 35%]
2023-01-11T23:10:17.0740524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_float32 XFAIL [ 35%]
2023-01-11T23:10:17.0740703Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_int32 XFAIL [ 35%]
2023-01-11T23:10:17.0741036Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:10:17.0741244Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0741422Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0741599Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:10:17.0741782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex32 PASSED [ 35%]
2023-01-11T23:10:17.0741966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0742140Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:10:17.0742342Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:10:17.0742512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:10:17.0742689Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:10:17.0743029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 35%]
2023-01-11T23:10:17.0743236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0743558Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:10:17.0743762Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0744085Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:10:17.0744283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0744464Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0744645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0744815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:10:17.0745021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:10:17.0745202Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:10:17.0745378Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:10:17.0745581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0745896Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_float32 SKIPPED (_refs.ceil doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0746099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0746418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:10:17.0746731Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_int32 SKIPPED (_refs.ceil doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0746930Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0747243Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:10:17.0747443Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0747623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0747807Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex64 PASSED [ 35%]
2023-01-11T23:10:17.0747988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0748166Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:10:17.0748367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:10:17.0748545Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:10:17.0748797Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:10:17.0749006Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0749359Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 35%]
2023-01-11T23:10:17.0749572Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0749891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_float32 SKIPPED (_refs.chunk doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0750214Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:10:17.0750527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_int32 SKIPPED (_refs.chunk doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0750728Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0750911Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0751118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:10:17.0751293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:10:17.0751468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:10:17.0751644Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:10:17.0751967Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_float32 SKIPPED (_refs.clamp doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0752174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0752487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_int32 SKIPPED (_refs.clamp doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0752810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:10:17.0753017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0753203Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0753387Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0753570Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:10:17.0753746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:10:17.0753924Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:10:17.0754138Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0754380Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0754714Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_float32 SKIPPED (_refs.clamp_max doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0755040Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_int32 SKIPPED (_refs.clamp_max doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0755246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0755448Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0755638Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0755819Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:10:17.0755997Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:10:17.0756179Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:10:17.0756388Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0756596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0756921Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_int32 SKIPPED (_refs.clamp_min doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0757153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0757354Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0757534Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex64 PASSED [ 35%]
2023-01-11T23:10:17.0757711Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:10:17.0757888Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:10:17.0758090Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0758290Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0758493Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0758678Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_float32 PASSED [ 35%]
2023-01-11T23:10:17.0758881Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0759084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0759268Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:10:17.0759461Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_complex128 PASSED [ 35%]
2023-01-11T23:10:17.0759646Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0759833Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:10:17.0760062Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0760272Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0760512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0760876Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_float32 SKIPPED (_refs.column_stack doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0761209Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_int32 SKIPPED (_refs.column_stack doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0761549Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:10:17.0761738Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex64 PASSED [ 35%]
2023-01-11T23:10:17.0761918Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0762099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:10:17.0762275Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:10:17.0762445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:10:17.0762655Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0762885Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0763204Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_float32 SKIPPED (_refs.conj doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0763525Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:10:17.0763836Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_int32 SKIPPED (_refs.conj doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0764035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0764226Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0764420Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_complex32 PASSED [ 35%]
2023-01-11T23:10:17.0764611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0764787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:10:17.0764971Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:10:17.0765155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:10:17.0765340Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:10:17.0765554Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0765771Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0766137Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:10:17.0766484Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_int32 SKIPPED (_refs.conj_physical doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0766818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:10:17.0767029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0767223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0767409Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:10:17.0767601Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0767792Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:10:17.0767978Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:10:17.0768196Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0768407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0768651Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0769005Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_float32 SKIPPED (_refs.constant_pad_nd doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0769218Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0769430Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0769764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:10:17.0769950Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0770135Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:10:17.0770323Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_complex128 PASSED [ 35%]
2023-01-11T23:10:17.0770512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_complex64 PASSED [ 35%]
2023-01-11T23:10:17.0770698Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0770883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:10:17.0771067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:10:17.0771247Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:10:17.0771457Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0771664Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0772039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 35%]
2023-01-11T23:10:17.0772252Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0772465Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0772804Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_float32 SKIPPED (_refs.contiguous doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0772990Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0773172Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:10:17.0773354Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:10:17.0773541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:10:17.0773749Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0773949Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0774277Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_float32 SKIPPED (_refs.copysign doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0774614Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0774947Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%]
2023-01-11T23:10:17.0775155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0775331Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_bool PASSED [ 36%]
2023-01-11T23:10:17.0775513Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex128 PASSED [ 36%]
2023-01-11T23:10:17.0775697Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex64 PASSED [ 36%]
2023-01-11T23:10:17.0775877Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float16 PASSED [ 36%]
2023-01-11T23:10:17.0776051Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0776232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int16 PASSED [ 36%]
2023-01-11T23:10:17.0776404Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int8 PASSED [ 36%]
2023-01-11T23:10:17.0776609Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0776811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0777134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%]
2023-01-11T23:10:17.0777332Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0777535Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0777754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_bool PASSED [ 36%]
2023-01-11T23:10:17.0777943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_complex32 PASSED [ 36%]
2023-01-11T23:10:17.0778117Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_complex64 PASSED [ 36%]
2023-01-11T23:10:17.0778296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0778475Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_uint8 PASSED [ 36%]
2023-01-11T23:10:17.0778680Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0778888Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0779072Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0779253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_int32 PASSED [ 36%]
2023-01-11T23:10:17.0779575Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0779755Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0779936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float64 PASSED [ 36%]
2023-01-11T23:10:17.0780106Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int16 PASSED [ 36%]
2023-01-11T23:10:17.0780318Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int8 PASSED [ 36%]
2023-01-11T23:10:17.0780530Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0780741Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0781066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_float32 SKIPPED (_refs.cumsum doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0781270Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0781596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%]
2023-01-11T23:10:17.0781923Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_int32 SKIPPED (_refs.cumsum doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0782249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0782435Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0782618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex32 PASSED [ 36%]
2023-01-11T23:10:17.0782802Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0782984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int32 PASSED [ 36%]
2023-01-11T23:10:17.0783166Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int64 PASSED [ 36%]
2023-01-11T23:10:17.0783352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int8 PASSED [ 36%]
2023-01-11T23:10:17.0783590Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_uint8 PASSED [ 36%]
2023-01-11T23:10:17.0783945Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 36%]
2023-01-11T23:10:17.0784156Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0784490Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_float32 SKIPPED (_refs.diag_embed doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0784671Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0784853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex128 PASSED [ 36%]
2023-01-11T23:10:17.0785033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex32 PASSED [ 36%]
2023-01-11T23:10:17.0785217Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex64 PASSED [ 36%]
2023-01-11T23:10:17.0785398Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_float16 PASSED [ 36%]
2023-01-11T23:10:17.0785575Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_float64 PASSED [ 36%]
2023-01-11T23:10:17.0785751Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int16 PASSED [ 36%]
2023-01-11T23:10:17.0785929Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int32 PASSED [ 36%]
2023-01-11T23:10:17.0786104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int64 PASSED [ 36%]
2023-01-11T23:10:17.0786305Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int8 PASSED [ 36%]
2023-01-11T23:10:17.0786472Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_uint8 PASSED [ 36%]
2023-01-11T23:10:17.0786683Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0786886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0787205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_float32 SKIPPED (_refs.diag doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0787524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0787726Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0787918Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0788106Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_bool PASSED [ 36%]
2023-01-11T23:10:17.0788301Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_complex64 PASSED [ 36%]
2023-01-11T23:10:17.0788491Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0788733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int16 PASSED [ 36%]
2023-01-11T23:10:17.0788922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_uint8 PASSED [ 36%]
2023-01-11T23:10:17.0789136Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0789358Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0789602Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0789943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%]
2023-01-11T23:10:17.0790286Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_int32 SKIPPED (_refs.diagonal_copy doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0790497Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0790834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0791022Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0791204Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_complex64 PASSED [ 36%]
2023-01-11T23:10:17.0791386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_int64 PASSED [ 36%]
2023-01-11T23:10:17.0791595Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0791803Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0792149Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 36%]
2023-01-11T23:10:17.0792386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0792717Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_float32 SKIPPED (_refs.diagonal doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0792925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0793254Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%]
2023-01-11T23:10:17.0793458Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0793655Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0793844Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0794043Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_complex128 PASSED [ 36%]
2023-01-11T23:10:17.0794237Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0794422Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_float64 PASSED [ 36%]
2023-01-11T23:10:17.0794611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int16 PASSED [ 36%]
2023-01-11T23:10:17.0794792Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int64 PASSED [ 36%]
2023-01-11T23:10:17.0794979Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int8 PASSED [ 36%]
2023-01-11T23:10:17.0795164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_uint8 PASSED [ 36%]
2023-01-11T23:10:17.0795415Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0795767Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_float32 SKIPPED (_refs.diagonal_scatter doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0795987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0796336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_int32 SKIPPED (_refs.diagonal_scatter doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0796675Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0796893Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0797079Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_bool PASSED [ 36%]
2023-01-11T23:10:17.0797265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0797448Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float64 PASSED [ 36%]
2023-01-11T23:10:17.0797629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int32 PASSED [ 36%]
2023-01-11T23:10:17.0797810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int64 PASSED [ 36%]
2023-01-11T23:10:17.0798014Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0798368Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_float32 SKIPPED (_refs.digamma doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0798701Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%]
2023-01-11T23:10:17.0799026Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_int32 SKIPPED (_refs.digamma doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0799352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0799559Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0799755Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0799950Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0800146Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_float64 PASSED [ 36%]
2023-01-11T23:10:17.0800337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int16 PASSED [ 36%]
2023-01-11T23:10:17.0800521Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int64 PASSED [ 36%]
2023-01-11T23:10:17.0800705Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int8 PASSED [ 36%]
2023-01-11T23:10:17.0800922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0801143Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0801385Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0801716Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_int32 SKIPPED (_refs.div doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0801931Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0802281Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0802495Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0802694Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0802906Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex32 SKIPPED (Skipped!) [ 36%]
2023-01-11T23:10:17.0803113Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex64 SKIPPED (Skipped!) [ 36%]
2023-01-11T23:10:17.0803300Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float16 PASSED [ 36%]
2023-01-11T23:10:17.0803493Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0803684Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float64 PASSED [ 36%]
2023-01-11T23:10:17.0803899Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int64 PASSED [ 36%]
2023-01-11T23:10:17.0804088Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_uint8 PASSED [ 36%]
2023-01-11T23:10:17.0804311Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0804523Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_complex128 SKIPPED (Skipped!) [ 36%]
2023-01-11T23:10:17.0804737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_complex64 SKIPPED (Skipped!) [ 36%]
2023-01-11T23:10:17.0804955Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0805298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_float32 SKIPPED (_refs.div doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0805509Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0805853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0806069Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0806261Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0806450Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int16 PASSED [ 36%]
2023-01-11T23:10:17.0806636Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int32 PASSED [ 36%]
2023-01-11T23:10:17.0806822Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_uint8 PASSED [ 36%]
2023-01-11T23:10:17.0807060Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0807392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_int32 SKIPPED (_refs.div doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0807604Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0807941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0808118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0808308Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex128 PASSED [ 36%]
2023-01-11T23:10:17.0808498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex32 PASSED [ 36%]
2023-01-11T23:10:17.0808680Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex64 PASSED [ 36%]
2023-01-11T23:10:17.0808862Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_float16 PASSED [ 36%]
2023-01-11T23:10:17.0809044Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int16 PASSED [ 36%]
2023-01-11T23:10:17.0809228Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int64 PASSED [ 37%]
2023-01-11T23:10:17.0809405Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_uint8 PASSED [ 37%]
2023-01-11T23:10:17.0809636Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0809960Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_float32 SKIPPED (_refs.dsplit doesn't support nvfuser) [ 37%]
2023-01-11T23:10:17.0810285Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_int32 SKIPPED (_refs.dsplit doesn't support nvfuser) [ 37%]
2023-01-11T23:10:17.0810606Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%]
2023-01-11T23:10:17.0810812Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0810997Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_bfloat16 PASSED [ 37%]
2023-01-11T23:10:17.0811187Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex128 PASSED [ 37%]
2023-01-11T23:10:17.0811370Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex32 PASSED [ 37%]
2023-01-11T23:10:17.0811557Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex64 PASSED [ 37%]
2023-01-11T23:10:17.0811739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_float16 PASSED [ 37%]
2023-01-11T23:10:17.0811919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_uint8 PASSED [ 37%]
2023-01-11T23:10:17.0812123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0812327Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0812533Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0812738Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0813084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_int32 SKIPPED (_refs.dstack doesn't support nvfuser) [ 37%]
2023-01-11T23:10:17.0813287Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0813618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%]
2023-01-11T23:10:17.0813815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0814118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_bool SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0814428Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_complex128 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0814835Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_complex32 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0815139Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_complex64 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0815443Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_float32 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0815742Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int32 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0816045Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_uint8 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0816395Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_float64 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0816697Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_int64 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0816997Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_int8 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0817294Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_uint8 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0817614Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_bfloat16 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0817925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_bool SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0818247Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_complex64 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0818558Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int32 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0818862Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int64 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0819163Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int8 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0819471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_uint8 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0819796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_complex64 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0820176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_float16 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0820519Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_float32 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0820835Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_int64 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0821153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_int8 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0821468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_uint8 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0821641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_bool PASSED [ 37%]
2023-01-11T23:10:17.0821823Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_complex32 PASSED [ 37%]
2023-01-11T23:10:17.0822003Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float16 PASSED [ 37%]
2023-01-11T23:10:17.0822180Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float64 PASSED [ 37%]
2023-01-11T23:10:17.0822359Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int16 PASSED [ 37%]
2023-01-11T23:10:17.0822528Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int64 PASSED [ 37%]
2023-01-11T23:10:17.0822732Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0822940Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0823169Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0823346Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_int32 PASSED [ 37%]
2023-01-11T23:10:17.0823665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%]
2023-01-11T23:10:17.0823846Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_bfloat16 PASSED [ 37%]
2023-01-11T23:10:17.0824021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_bool PASSED [ 37%]
2023-01-11T23:10:17.0824198Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_float16 PASSED [ 37%]
2023-01-11T23:10:17.0824376Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_float32 PASSED [ 37%]
2023-01-11T23:10:17.0824551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:10:17.0824729Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:10:17.0824901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:10:17.0825106Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0825283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_float32 PASSED [ 37%] 2023-01-11T23:10:17.0825489Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0825667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0825871Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0826217Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%] 2023-01-11T23:10:17.0826394Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:10:17.0826576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_float16 PASSED [ 37%] 2023-01-11T23:10:17.0826750Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:10:17.0826928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0827126Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0827330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0827517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_float32 PASSED [ 37%] 2023-01-11T23:10:17.0827844Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:10:17.0828162Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%] 2023-01-11T23:10:17.0828364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0828545Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:10:17.0828786Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:10:17.0828994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0829173Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:10:17.0829343Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:10:17.0829516Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:10:17.0829724Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0829926Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0830253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%] 2023-01-11T23:10:17.0830458Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0830634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float16 PASSED [ 37%] 2023-01-11T23:10:17.0830811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:10:17.0830987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0831164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:10:17.0831334Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:10:17.0831536Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0831743Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0832093Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:10:17.0832407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_int32 SKIPPED (_refs.exp2 doesn't support nvfuser) [ 37%] 2023-01-11T23:10:17.0832609Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0832811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0832994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_complex64 PASSED [ 37%] 2023-01-11T23:10:17.0833176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float16 PASSED [ 37%] 2023-01-11T23:10:17.0833347Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:10:17.0833525Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0833702Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:10:17.0833906Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0834243Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 37%] 2023-01-11T23:10:17.0834450Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0834658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_float32 PASSED [ 37%] 2023-01-11T23:10:17.0834864Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0835186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:10:17.0835365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0835680Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%] 2023-01-11T23:10:17.0835882Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0836070Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_bfloat16 PASSED [ 37%] 2023-01-11T23:10:17.0836263Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_complex64 PASSED [ 37%] 2023-01-11T23:10:17.0836450Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:10:17.0836634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:10:17.0836816Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0836994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:10:17.0837174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:10:17.0837387Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0837590Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0837831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0838167Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_float32 SKIPPED (_refs.expand_as doesn't support nvfuser) [ 37%] 2023-01-11T23:10:17.0838500Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:10:17.0838827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_int32 SKIPPED (_refs.expand_as doesn't support nvfuser) [ 37%] 2023-01-11T23:10:17.0839035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0839223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_bfloat16 PASSED [ 37%] 2023-01-11T23:10:17.0839411Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_bool PASSED [ 37%] 
2023-01-11T23:10:17.0839597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:10:17.0839773Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0839943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:10:17.0840120Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:10:17.0840327Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0840560Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0840897Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_float32 SKIPPED (_refs.expand doesn't support nvfuser) [ 37%] 2023-01-11T23:10:17.0841266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:10:17.0841586Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_int32 SKIPPED (_refs.expand doesn't support nvfuser) [ 37%] 2023-01-11T23:10:17.0841786Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0841992Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0842175Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_bfloat16 PASSED [ 37%] 2023-01-11T23:10:17.0842348Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:10:17.0842528Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:10:17.0842706Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:10:17.0842910Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0843091Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0843293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0843621Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%] 2023-01-11T23:10:17.0843844Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0844026Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:10:17.0844202Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_complex64 PASSED [ 37%] 2023-01-11T23:10:17.0844380Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float16 PASSED [ 37%] 2023-01-11T23:10:17.0844556Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:10:17.0844732Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0844909Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:10:17.0845082Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:10:17.0845260Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:10:17.0845460Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0845779Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_float32 SKIPPED (_refs.eye doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0846087Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_int32 SKIPPED (_refs.eye doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0846282Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0846626Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:10:17.0846831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0847021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:10:17.0847203Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0847383Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:10:17.0847559Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:10:17.0847737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:10:17.0847916Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:10:17.0848087Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:10:17.0848434Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:10:17.0848640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0848967Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.fft2 doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0849172Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0849502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:10:17.0849735Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0849917Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:10:17.0850105Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:10:17.0850289Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:10:17.0850471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:10:17.0850645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:10:17.0850824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:10:17.0851029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0851241Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0851587Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:10:17.0851797Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0851998Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0852325Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:10:17.0852561Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0852763Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0852935Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:10:17.0853121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:10:17.0853307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:10:17.0853491Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0853673Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:10:17.0853851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:10:17.0854034Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:10:17.0854212Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:10:17.0854388Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_uint8 PASSED [ 38%] 
2023-01-11T23:10:17.0854806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.fftn doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0855015Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0855345Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.fftn doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0855537Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_bfloat16 PASSED [ 38%] 2023-01-11T23:10:17.0855774Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:10:17.0855966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:10:17.0856150Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:10:17.0856333Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:10:17.0856520Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:10:17.0856739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0856943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0857159Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0857501Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.fftshift doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0857709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0857918Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0858135Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:10:17.0858324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:10:17.0858510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0858693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:10:17.0858872Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:10:17.0859047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:10:17.0859228Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:10:17.0859440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) 
[ 38%] 2023-01-11T23:10:17.0859794Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:10:17.0860001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0860206Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0860535Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:10:17.0860721Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:10:17.0860905Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:10:17.0861090Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0861292Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:10:17.0861478Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:10:17.0861658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:10:17.0861863Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0862066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0862392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:10:17.0862581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:10:17.0862769Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0862950Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:10:17.0863131Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:10:17.0863306Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:10:17.0863516Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0863727Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0863960Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0864294Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.hfftn doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0864501Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0864830Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:10:17.0865035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0865364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:10:17.0865557Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:10:17.0865739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:10:17.0865924Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0866104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:10:17.0866282Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:10:17.0866462Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:10:17.0866637Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:10:17.0866989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:10:17.0867223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0867554Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.ifft2 doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0867763Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0868082Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.ifft2 doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0868407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:10:17.0868616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0868859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:10:17.0869045Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:10:17.0869232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:10:17.0869416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:10:17.0869599Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0869809Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:10:17.0869988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:10:17.0870336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:10:17.0870570Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0870922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:10:17.0871249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.ifft doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0871454Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0871782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:10:17.0871965Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:10:17.0872153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:10:17.0872340Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:10:17.0872524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:10:17.0872701Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:10:17.0872883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:10:17.0873061Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:10:17.0873292Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0873505Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0873837Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.ifftn doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0874045Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0874250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0874452Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0874646Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_bfloat16 PASSED [ 38%] 2023-01-11T23:10:17.0874832Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:10:17.0875016Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:10:17.0875202Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0875384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:10:17.0875567Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:10:17.0875812Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:10:17.0876002Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:10:17.0876218Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0876428Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0876636Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0876847Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0877187Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:10:17.0877397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0877581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0877767Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:10:17.0877951Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:10:17.0878134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:10:17.0878313Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:10:17.0878518Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0878750Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0879087Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.ihfft2 doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0879417Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.ihfft2 doesn't support nvfuser) [ 38%] 
2023-01-11T23:10:17.0879628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0879954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:10:17.0880164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0880347Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:10:17.0880529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0880737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0880941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:10:17.0881122Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:10:17.0881293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:10:17.0881650Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.ihfft doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0881859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0882044Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:10:17.0882228Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0882409Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:10:17.0882613Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0882820Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0883033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0883356Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:10:17.0883562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0883742Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:10:17.0883929Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex32 PASSED [ 39%] 2023-01-11T23:10:17.0884115Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex64 PASSED [ 39%] 2023-01-11T23:10:17.0884301Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:10:17.0884480Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0884686Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:10:17.0884867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0885046Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:10:17.0885215Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:10:17.0885393Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:10:17.0885567Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:10:17.0885774Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0886131Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 39%] 2023-01-11T23:10:17.0886342Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0886678Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:10:17.0887005Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:10:17.0887218Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_complex32 PASSED [ 39%] 2023-01-11T23:10:17.0887405Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_complex64 PASSED [ 39%] 2023-01-11T23:10:17.0887586Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:10:17.0887768Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0887948Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0888128Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:10:17.0888306Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:10:17.0888516Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0888725Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0888934Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0889118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0889297Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0889468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:10:17.0889641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:10:17.0889855Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0890229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 39%] 2023-01-11T23:10:17.0890466Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0890700Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0891029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:10:17.0891210Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:10:17.0891393Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:10:17.0891576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0891754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:10:17.0891929Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:10:17.0892139Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0892343Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0892676Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.rfft2 doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0892909Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0893238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:10:17.0893423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0893605Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0893785Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:10:17.0893982Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0894307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.rfft doesn't 
support nvfuser) [ 39%] 2023-01-11T23:10:17.0894614Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0894942Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.rfft doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0895266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:10:17.0895453Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:10:17.0895635Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:10:17.0909596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:10:17.0909863Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0910312Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:10:17.0910643Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.rfftn doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0910860Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0911079Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_bfloat16 PASSED [ 39%] 2023-01-11T23:10:17.0911290Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:10:17.0911484Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex32 PASSED [ 39%] 2023-01-11T23:10:17.0911679Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:10:17.0911870Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0912060Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:10:17.0912246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:10:17.0912425Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:10:17.0912611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:10:17.0912827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0913073Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0913295Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0913641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 39%] 
2023-01-11T23:10:17.0913861Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0914075Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0914398Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_float32 SKIPPED (_refs.fill doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0914731Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:10:17.0915050Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_int32 SKIPPED (_refs.fill doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0915242Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:10:17.0915441Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex32 PASSED [ 39%] 2023-01-11T23:10:17.0915640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex64 PASSED [ 39%] 2023-01-11T23:10:17.0915832Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0916024Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:10:17.0916212Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:10:17.0916402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:10:17.0916648Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0916863Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0917207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 39%] 2023-01-11T23:10:17.0917433Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0917649Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0917981Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_float32 SKIPPED (_refs.flatten doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0918321Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:10:17.0918649Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_int32 SKIPPED (_refs.flatten doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0918864Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0919058Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_complex64 PASSED [ 39%] 2023-01-11T23:10:17.0919245Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0919460Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0919642Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:10:17.0919858Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0920077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0920295Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0920534Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0920881Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_float32 SKIPPED (_refs.flip doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0921097Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0921429Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:10:17.0921750Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_int32 SKIPPED (_refs.flip doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0922075Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:10:17.0922269Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_bfloat16 PASSED [ 39%] 2023-01-11T23:10:17.0922459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:10:17.0922659Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_complex64 PASSED [ 39%] 2023-01-11T23:10:17.0922875Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:10:17.0923067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0923252Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:10:17.0923441Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0923631Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:10:17.0923818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:10:17.0924038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0924248Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0924582Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:10:17.0924910Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_int32 SKIPPED (_refs.fliplr doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0925109Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:10:17.0925304Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_complex64 PASSED [ 39%] 2023-01-11T23:10:17.0925496Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:10:17.0925714Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:10:17.0925907Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0926097Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:10:17.0926315Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0926522Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0926744Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0926962Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0927175Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0927503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_int32 SKIPPED (_refs.flipud doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0927717Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0928048Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:10:17.0928246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_bfloat16 PASSED [ 39%] 2023-01-11T23:10:17.0928442Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:10:17.0928641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0928868Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:10:17.0929063Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:10:17.0929252Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:10:17.0929472Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0929693Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0929914Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0930345Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_int32 SKIPPED (_refs.float_power doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0930557Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0930752Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:10:17.0930972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0931307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_float32 SKIPPED (_refs.floor_divide doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0931641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:10:17.0932012Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_int32 SKIPPED (_refs.floor_divide doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0932236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0932459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0932648Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0932836Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:10:17.0933052Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0933377Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_float32 SKIPPED (_refs.floor doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0933707Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:10:17.0934082Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_int32 SKIPPED (_refs.floor doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0934782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:10:17.0935032Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0938754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0938946Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:10:17.0939128Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float32 PASSED [ 40%] 
2023-01-11T23:10:17.0939368Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:10:17.0939552Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int32 PASSED [ 40%] 2023-01-11T23:10:17.0939724Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:10:17.0939927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0940238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_int32 SKIPPED (_refs.fmax doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0940419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0940597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_bool PASSED [ 40%] 2023-01-11T23:10:17.0940770Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:10:17.0940941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int32 PASSED [ 40%] 2023-01-11T23:10:17.0941109Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:10:17.0941311Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0941510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0941820Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_float32 SKIPPED (_refs.fmin doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0942054Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0942231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0942406Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:10:17.0942583Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:10:17.0942756Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:10:17.0942926Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:10:17.0943125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0943305Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_int32 PASSED [ 40%] 2023-01-11T23:10:17.0943507Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0943705Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0943876Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:10:17.0944048Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:10:17.0944248Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0944558Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_nvfuser_cuda_float32 SKIPPED (_refs.frac doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0944733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:10:17.0944931Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:10:17.0945105Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:10:17.0945423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:10:17.0945727Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_int32 SKIPPED (_refs.gcd doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0945904Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0946074Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_bool PASSED [ 40%] 2023-01-11T23:10:17.0946249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:10:17.0946425Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:10:17.0946596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:10:17.0946767Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:10:17.0946934Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:10:17.0947099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:10:17.0947300Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0947503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_float32 PASSED [ 40%] 2023-01-11T23:10:17.0947817Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:10:17.0947994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_int32 PASSED [ 40%] 2023-01-11T23:10:17.0948306Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:10:17.0948502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0969426Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:10:17.0969606Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:10:17.0969781Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:10:17.0969949Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:10:17.0970125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:10:17.0970287Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:10:17.0970487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0970685Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0971055Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:10:17.0971366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:10:17.0971600Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0971789Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:10:17.0971975Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:10:17.0972158Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:10:17.0972369Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0972572Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0972902Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_float32 SKIPPED (_refs.heaviside doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0973114Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0973444Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:10:17.0973650Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0973828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_bool PASSED [ 40%] 2023-01-11T23:10:17.0974014Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex128 PASSED [ 40%] 2023-01-11T23:10:17.0974225Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex64 PASSED [ 40%] 2023-01-11T23:10:17.0974403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:10:17.0974705Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:10:17.0974879Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:10:17.0975086Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0975427Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 40%] 2023-01-11T23:10:17.0975632Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0975946Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_int32 SKIPPED (_refs.hsplit doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0976264Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:10:17.0976466Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0976641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_bool PASSED [ 40%] 2023-01-11T23:10:17.0976819Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:10:17.0976995Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int32 PASSED [ 40%] 2023-01-11T23:10:17.0977165Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:10:17.0977373Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0977626Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0977835Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0978170Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 40%] 2023-01-11T23:10:17.0978372Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0978685Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_float32 SKIPPED (_refs.hstack doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0979008Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:10:17.0979323Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_int32 SKIPPED (_refs.hstack doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0979502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0979680Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:10:17.0979878Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0980053Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0980261Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:10:17.0980433Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:10:17.0980631Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0980828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0981133Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_float32 SKIPPED (_refs.i0 doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0981330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0981631Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_int32 SKIPPED (_refs.i0 doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0981826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0982140Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:10:17.0982321Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:10:17.0982503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:10:17.0982827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_nvfuser_cuda_float32 SKIPPED (_refs.igammac doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0983037Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0983218Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex32 PASSED [ 40%] 2023-01-11T23:10:17.0983402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex64 PASSED [ 40%] 2023-01-11T23:10:17.0983634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0983815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_bool PASSED [ 40%] 2023-01-11T23:10:17.0984005Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_complex32 PASSED [ 40%] 2023-01-11T23:10:17.0984185Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_complex64 PASSED [ 40%] 2023-01-11T23:10:17.0984373Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:10:17.0984556Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:10:17.0984771Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0984983Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0985189Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0985517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:10:17.0985709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_complex64 PASSED [ 40%] 2023-01-11T23:10:17.0985896Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:10:17.0986109Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:10:17.0986317Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0986536Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0986746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0986955Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0987165Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0987503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:10:17.0987836Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_int32 SKIPPED (_refs.index_copy doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0988023Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0988211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_bool PASSED [ 40%] 2023-01-11T23:10:17.0988402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_complex128 PASSED [ 40%] 2023-01-11T23:10:17.0988581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:10:17.0988847Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_float64 PASSED [ 41%] 2023-01-11T23:10:17.0989038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_uint8 PASSED [ 41%] 2023-01-11T23:10:17.0989278Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0989492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0989704Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0990036Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_float32 SKIPPED (_refs.index_fill doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.0990363Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%] 2023-01-11T23:10:17.0990690Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_int32 SKIPPED (_refs.index_fill doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.0990927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0991148Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_bfloat16 PASSED [ 41%] 2023-01-11T23:10:17.0991330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex64 PASSED [ 41%] 2023-01-11T23:10:17.0991516Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.0991702Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float32 PASSED [ 41%] 2023-01-11T23:10:17.0991887Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_uint8 PASSED [ 41%] 2023-01-11T23:10:17.0992126Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0992343Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0992555Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0992764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0993098Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_int32 SKIPPED (_refs.index_select doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.0993307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0993629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%] 2023-01-11T23:10:17.0993841Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0994031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_complex64 PASSED [ 41%] 2023-01-11T23:10:17.0994212Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int16 PASSED [ 41%] 2023-01-11T23:10:17.0994392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int8 PASSED [ 41%] 2023-01-11T23:10:17.0994574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_uint8 PASSED [ 41%] 2023-01-11T23:10:17.0994784Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0994997Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0995245Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0995454Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0995651Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0995973Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%] 2023-01-11T23:10:17.0996178Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0996365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_bool PASSED [ 41%] 2023-01-11T23:10:17.0996556Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex64 PASSED [ 41%] 2023-01-11T23:10:17.0996739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int16 PASSED [ 41%] 2023-01-11T23:10:17.0996922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int32 PASSED [ 41%] 2023-01-11T23:10:17.0997104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int8 PASSED [ 41%] 2023-01-11T23:10:17.0997314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0997527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0997755Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0998084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_float32 SKIPPED (_refs.isfinite doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.0998294Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0998614Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_int32 SKIPPED (_refs.isfinite doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.0998795Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.0998975Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float32 PASSED [ 41%] 2023-01-11T23:10:17.0999155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int64 PASSED [ 41%] 2023-01-11T23:10:17.0999335Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int8 PASSED [ 41%] 2023-01-11T23:10:17.0999517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_uint8 PASSED [ 41%] 2023-01-11T23:10:17.0999723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0999925Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1000128Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1000496Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_float32 SKIPPED (_refs.isinf doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1000699Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1001077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%] 2023-01-11T23:10:17.1001390Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_int32 SKIPPED (_refs.isinf doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1001573Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_bfloat16 PASSED [ 41%] 2023-01-11T23:10:17.1001752Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_bool PASSED [ 41%] 2023-01-11T23:10:17.1001934Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.1002115Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float64 PASSED [ 41%] 2023-01-11T23:10:17.1002289Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int64 PASSED [ 41%] 2023-01-11T23:10:17.1002469Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_uint8 PASSED [ 41%] 2023-01-11T23:10:17.1002677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1002885Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1003088Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1003273Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_float32 PASSED [ 41%] 2023-01-11T23:10:17.1003593Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%] 2023-01-11T23:10:17.1003801Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_int32 PASSED [ 41%] 2023-01-11T23:10:17.1004125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%] 2023-01-11T23:10:17.1004306Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_bfloat16 PASSED [ 41%] 2023-01-11T23:10:17.1004491Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.1004674Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_float64 PASSED [ 41%] 2023-01-11T23:10:17.1004853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int16 PASSED [ 41%] 2023-01-11T23:10:17.1005030Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int64 PASSED [ 41%] 2023-01-11T23:10:17.1005212Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int8 PASSED [ 41%] 2023-01-11T23:10:17.1005390Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_uint8 PASSED [ 41%] 2023-01-11T23:10:17.1005600Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1005806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1006014Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1006332Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%] 2023-01-11T23:10:17.1006520Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_bool PASSED [ 41%] 2023-01-11T23:10:17.1006700Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int8 PASSED [ 41%] 2023-01-11T23:10:17.1006950Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1007279Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_float32 SKIPPED (_refs.isposinf doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1007603Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_int32 SKIPPED (_refs.isposinf doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1007808Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1008129Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%] 2023-01-11T23:10:17.1008320Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex64 PASSED [ 41%] 2023-01-11T23:10:17.1008504Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_float32 PASSED [ 41%] 2023-01-11T23:10:17.1008676Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_float64 PASSED [ 41%] 2023-01-11T23:10:17.1008856Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_uint8 PASSED [ 41%] 2023-01-11T23:10:17.1009063Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1009277Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1009625Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%] 2023-01-11T23:10:17.1009834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1010015Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int16 PASSED [ 41%] 
2023-01-11T23:10:17.1010213Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int32 PASSED [ 41%] 2023-01-11T23:10:17.1010416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_uint8 PASSED [ 41%] 2023-01-11T23:10:17.1010737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%] 2023-01-11T23:10:17.1011036Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_int32 SKIPPED (_refs.lcm doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1011243Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1011562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%] 2023-01-11T23:10:17.1011741Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_bfloat16 PASSED [ 41%] 2023-01-11T23:10:17.1011919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.1012124Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1012303Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_int32 PASSED [ 41%] 2023-01-11T23:10:17.1012616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%] 2023-01-11T23:10:17.1012817Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1013026Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_bfloat16 PASSED [ 41%] 2023-01-11T23:10:17.1013205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_complex128 PASSED [ 41%] 2023-01-11T23:10:17.1013387Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_complex32 PASSED [ 41%] 2023-01-11T23:10:17.1013568Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.1013773Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1013976Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1014159Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_bool PASSED [ 41%] 2023-01-11T23:10:17.1014343Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_float64 PASSED [ 41%] 2023-01-11T23:10:17.1014621Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int32 PASSED [ 41%] 2023-01-11T23:10:17.1014800Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int8 PASSED [ 41%] 2023-01-11T23:10:17.1014998Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1015187Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_float32 PASSED [ 41%] 
2023-01-11T23:10:17.1015515Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%] 2023-01-11T23:10:17.1015764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1015975Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1016174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_complex128 PASSED [ 41%] 2023-01-11T23:10:17.1016369Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_float64 PASSED [ 41%] 2023-01-11T23:10:17.1016590Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1016814Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1017034Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1017394Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_float32 SKIPPED (_refs.linalg.matrix_norm doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1017605Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1017799Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_complex128 PASSED [ 41%] 2023-01-11T23:10:17.1017987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.1018199Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1018418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1018661Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1019002Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_float32 SKIPPED (_refs.linalg.norm doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1019214Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1019403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_complex128 PASSED [ 41%] 2023-01-11T23:10:17.1019590Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_complex64 PASSED [ 41%] 2023-01-11T23:10:17.1019770Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_float64 PASSED [ 41%] 2023-01-11T23:10:17.1019985Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1020326Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_nvfuser_cuda_float32 SKIPPED (_refs.linalg.svd doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1020522Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_complex64 PASSED [ 41%] 2023-01-11T23:10:17.1020739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1020958Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1021310Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_nvfuser_cuda_float32 SKIPPED (_refs.linalg.svdvals doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1021560Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1021755Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.1021976Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1022186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1022544Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_float32 SKIPPED (_refs.linalg.vector_norm doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1022761Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1022951Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_bfloat16 PASSED [ 41%] 2023-01-11T23:10:17.1023142Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_complex128 PASSED [ 41%] 2023-01-11T23:10:17.1023333Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_complex64 PASSED [ 41%] 2023-01-11T23:10:17.1023518Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.1023700Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_float64 PASSED [ 41%] 2023-01-11T23:10:17.1023880Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int16 XFAIL [ 42%] 2023-01-11T23:10:17.1024058Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int64 XFAIL [ 42%] 2023-01-11T23:10:17.1024229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_uint8 XFAIL [ 42%] 2023-01-11T23:10:17.1024465Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1024684Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1025015Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_float32 SKIPPED (_refs.linspace doesn't support nvfuser) [ 42%] 2023-01-11T23:10:17.1025341Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%]
2023-01-11T23:10:17.1025662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_int32 SKIPPED (_refs.linspace doesn't support nvfuser) [ 42%]
2023-01-11T23:10:17.1025875Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1026202Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%]
2023-01-11T23:10:17.1026388Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_complex128 PASSED [ 42%]
2023-01-11T23:10:17.1026570Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:10:17.1026748Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int32 PASSED [ 42%]
2023-01-11T23:10:17.1026920Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int8 PASSED [ 42%]
2023-01-11T23:10:17.1027099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:10:17.1027329Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1027536Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1027744Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1027928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_float32 PASSED [ 42%]
2023-01-11T23:10:17.1028131Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1028312Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_int32 PASSED [ 42%]
2023-01-11T23:10:17.1028517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1028787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:10:17.1028977Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_bool PASSED [ 42%]
2023-01-11T23:10:17.1029163Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:10:17.1029344Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:10:17.1029524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:10:17.1029702Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int16 PASSED [ 42%]
2023-01-11T23:10:17.1029907Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1030239Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%]
2023-01-11T23:10:17.1030473Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1030657Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_bool PASSED [ 42%]
2023-01-11T23:10:17.1030831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float16 PASSED [ 42%]
2023-01-11T23:10:17.1031011Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:10:17.1031191Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int16 PASSED [ 42%]
2023-01-11T23:10:17.1031366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int32 PASSED [ 42%]
2023-01-11T23:10:17.1031548Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int8 PASSED [ 42%]
2023-01-11T23:10:17.1031758Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1031966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1032172Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1032494Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%]
2023-01-11T23:10:17.1032689Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1032895Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:10:17.1033072Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_bool PASSED [ 42%]
2023-01-11T23:10:17.1033257Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_complex128 PASSED [ 42%]
2023-01-11T23:10:17.1033441Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_complex32 PASSED [ 42%]
2023-01-11T23:10:17.1033618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:10:17.1033796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int16 PASSED [ 42%]
2023-01-11T23:10:17.1033971Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int32 PASSED [ 42%]
2023-01-11T23:10:17.1034149Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:10:17.1034318Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int8 PASSED [ 42%]
2023-01-11T23:10:17.1034494Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:10:17.1034702Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1034908Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1035110Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1035293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_float32 PASSED [ 42%] 2023-01-11T23:10:17.1035614Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%] 2023-01-11T23:10:17.1035818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1036054Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1036256Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_complex32 PASSED [ 42%] 2023-01-11T23:10:17.1036447Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_float16 PASSED [ 42%] 2023-01-11T23:10:17.1036641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_float32 PASSED [ 42%] 2023-01-11T23:10:17.1036835Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int16 PASSED [ 42%] 2023-01-11T23:10:17.1037065Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1037289Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1037512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1037711Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_float32 PASSED [ 42%] 2023-01-11T23:10:17.1037911Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_int32 PASSED [ 42%] 2023-01-11T23:10:17.1038129Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1038351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1038565Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_bfloat16 PASSED [ 42%] 2023-01-11T23:10:17.1038746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_bool PASSED [ 42%] 2023-01-11T23:10:17.1038941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_complex128 PASSED [ 42%] 2023-01-11T23:10:17.1039129Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_complex64 PASSED [ 42%] 2023-01-11T23:10:17.1039315Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float16 PASSED [ 42%] 2023-01-11T23:10:17.1039499Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int16 PASSED [ 42%] 2023-01-11T23:10:17.1039683Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int32 PASSED [ 42%] 
2023-01-11T23:10:17.1039869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int8 PASSED [ 42%] 2023-01-11T23:10:17.1040088Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1040306Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1040510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1040728Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_float32 PASSED [ 42%] 2023-01-11T23:10:17.1040968Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1041153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_bool PASSED [ 42%] 2023-01-11T23:10:17.1041365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_complex128 PASSED [ 42%] 2023-01-11T23:10:17.1041551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float16 PASSED [ 42%] 2023-01-11T23:10:17.1041739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float64 PASSED [ 42%] 2023-01-11T23:10:17.1041922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int16 PASSED [ 42%] 2023-01-11T23:10:17.1042104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int8 PASSED [ 42%] 2023-01-11T23:10:17.1042320Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1042530Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1042747Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1042955Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1043143Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_float32 PASSED [ 42%] 2023-01-11T23:10:17.1043355Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1043687Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%] 2023-01-11T23:10:17.1043903Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_bfloat16 PASSED [ 42%] 2023-01-11T23:10:17.1044090Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_bool PASSED [ 42%] 2023-01-11T23:10:17.1044280Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_float32 PASSED [ 42%] 2023-01-11T23:10:17.1044461Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int32 PASSED [ 42%] 2023-01-11T23:10:17.1044636Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int64 PASSED [ 42%] 2023-01-11T23:10:17.1044845Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1045061Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1045275Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1045487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1045819Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%] 2023-01-11T23:10:17.1046005Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_int32 PASSED [ 42%] 2023-01-11T23:10:17.1046333Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%] 2023-01-11T23:10:17.1046539Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1046730Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_bfloat16 PASSED [ 42%] 2023-01-11T23:10:17.1046906Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_bool PASSED [ 42%] 2023-01-11T23:10:17.1047123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_complex128 PASSED [ 42%] 2023-01-11T23:10:17.1047314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_complex64 PASSED [ 42%] 2023-01-11T23:10:17.1047502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_float16 PASSED [ 42%] 2023-01-11T23:10:17.1047687Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int16 PASSED [ 42%] 2023-01-11T23:10:17.1047873Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int64 PASSED [ 42%] 2023-01-11T23:10:17.1048056Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int8 PASSED [ 42%] 2023-01-11T23:10:17.1048241Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_uint8 PASSED [ 42%] 2023-01-11T23:10:17.1048460Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1048677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1048880Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1049066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_complex128 PASSED [ 42%] 2023-01-11T23:10:17.1049247Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int16 XFAIL [ 42%] 2023-01-11T23:10:17.1049456Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int8 PASSED [ 42%] 2023-01-11T23:10:17.1049639Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_uint8 PASSED [ 42%] 2023-01-11T23:10:17.1049855Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1050065Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1050392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_float32 SKIPPED (_refs.logspace doesn't support nvfuser) [ 42%] 2023-01-11T23:10:17.1050601Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1050800Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1050990Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_bfloat16 PASSED [ 42%] 2023-01-11T23:10:17.1051180Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float16 PASSED [ 42%] 2023-01-11T23:10:17.1051367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int16 PASSED [ 42%] 2023-01-11T23:10:17.1051551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int32 PASSED [ 42%] 2023-01-11T23:10:17.1051730Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int64 PASSED [ 42%] 2023-01-11T23:10:17.1051908Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_uint8 PASSED [ 42%] 2023-01-11T23:10:17.1052120Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1052334Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1052567Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1052894Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%] 2023-01-11T23:10:17.1053073Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_float64 PASSED [ 42%] 2023-01-11T23:10:17.1053249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int16 PASSED [ 42%] 2023-01-11T23:10:17.1053419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int32 PASSED [ 42%] 2023-01-11T23:10:17.1053589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int64 PASSED [ 42%] 2023-01-11T23:10:17.1053909Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%] 2023-01-11T23:10:17.1054112Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1054294Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_bool PASSED [ 42%] 2023-01-11T23:10:17.1054597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex128 PASSED [ 42%] 2023-01-11T23:10:17.1054792Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex32 PASSED [ 42%] 2023-01-11T23:10:17.1054967Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int32 PASSED [ 42%] 2023-01-11T23:10:17.1055149Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int64 PASSED [ 42%] 2023-01-11T23:10:17.1055411Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1055770Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 42%] 2023-01-11T23:10:17.1056106Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_float32 SKIPPED (_refs.masked_fill doesn't support nvfuser) [ 42%] 2023-01-11T23:10:17.1056317Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1056653Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%] 2023-01-11T23:10:17.1056863Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1057071Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1057257Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_bfloat16 PASSED [ 43%] 2023-01-11T23:10:17.1057431Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:10:17.1057611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1057789Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int64 PASSED [ 43%] 2023-01-11T23:10:17.1057969Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int8 PASSED [ 43%] 2023-01-11T23:10:17.1058178Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1058505Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_float32 SKIPPED (_refs.maximum doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1058745Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1059070Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%] 2023-01-11T23:10:17.1059276Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1059478Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1059652Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_complex128 PASSED [ 43%] 2023-01-11T23:10:17.1059838Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:10:17.1060021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:10:17.1060229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1060432Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1060615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1060842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_bool PASSED [ 43%] 2023-01-11T23:10:17.1061068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_complex128 PASSED [ 43%] 2023-01-11T23:10:17.1061296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_complex64 PASSED [ 43%] 2023-01-11T23:10:17.1061502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:10:17.1061692Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:10:17.1061889Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int32 PASSED [ 43%] 2023-01-11T23:10:17.1062086Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int64 PASSED [ 43%] 2023-01-11T23:10:17.1062276Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:10:17.1062505Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1062729Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1062951Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1063166Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1063516Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%] 2023-01-11T23:10:17.1063737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1063932Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_bool PASSED [ 43%] 2023-01-11T23:10:17.1064160Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_complex128 PASSED [ 43%] 2023-01-11T23:10:17.1064367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_complex64 PASSED [ 43%] 2023-01-11T23:10:17.1064571Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:10:17.1064770Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int32 PASSED [ 43%] 2023-01-11T23:10:17.1064966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int8 PASSED [ 43%] 2023-01-11T23:10:17.1065196Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1065422Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1065775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_int32 SKIPPED (_refs.meshgrid doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1066123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%] 2023-01-11T23:10:17.1066347Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1066525Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_bfloat16 PASSED [ 43%] 2023-01-11T23:10:17.1066733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_bool PASSED [ 43%] 2023-01-11T23:10:17.1066918Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:10:17.1067103Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1067283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int16 PASSED [ 43%] 2023-01-11T23:10:17.1067464Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int64 PASSED [ 43%] 2023-01-11T23:10:17.1067647Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int8 PASSED [ 43%] 2023-01-11T23:10:17.1067828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:10:17.1068150Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_float32 SKIPPED (_refs.minimum doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1068485Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%] 2023-01-11T23:10:17.1068876Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_int32 SKIPPED (_refs.minimum doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1069085Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1069266Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:10:17.1069445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int16 PASSED [ 43%] 2023-01-11T23:10:17.1069654Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1069868Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1070242Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 43%] 2023-01-11T23:10:17.1070455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1070665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1071029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_int32 SKIPPED (_refs.movedim doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1071203Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_bfloat16 PASSED [ 43%] 2023-01-11T23:10:17.1071382Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_bool PASSED [ 43%] 2023-01-11T23:10:17.1071566Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex128 PASSED [ 43%] 2023-01-11T23:10:17.1071745Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1071923Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int16 PASSED [ 43%] 2023-01-11T23:10:17.1072098Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int8 PASSED [ 43%] 2023-01-11T23:10:17.1072272Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:10:17.1072457Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1072660Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1072994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%] 2023-01-11T23:10:17.1073183Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_bfloat16 PASSED [ 43%] 2023-01-11T23:10:17.1073366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_bool PASSED [ 43%] 2023-01-11T23:10:17.1073551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1073730Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int32 PASSED [ 43%] 2023-01-11T23:10:17.1073911Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int64 PASSED [ 43%] 2023-01-11T23:10:17.1074090Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:10:17.1074300Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1074510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1074840Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_float32 SKIPPED (_refs.nan_to_num doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1075042Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1075248Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1075441Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_complex128 PASSED [ 43%] 2023-01-11T23:10:17.1075636Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_complex64 PASSED [ 43%] 2023-01-11T23:10:17.1075859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int16 PASSED [ 43%] 2023-01-11T23:10:17.1076044Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int8 PASSED [ 43%] 2023-01-11T23:10:17.1076231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:10:17.1076445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1076656Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1076995Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_float32 SKIPPED (_refs.narrow_copy doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1077322Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%] 2023-01-11T23:10:17.1077659Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_int32 SKIPPED (_refs.narrow_copy doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1077870Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1078198Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%] 2023-01-11T23:10:17.1078409Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1078618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_bool PASSED [ 43%] 2023-01-11T23:10:17.1078805Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex64 PASSED [ 43%] 2023-01-11T23:10:17.1078993Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:10:17.1079175Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1079351Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:10:17.1079523Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int16 PASSED [ 43%] 2023-01-11T23:10:17.1079701Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int32 PASSED [ 43%] 2023-01-11T23:10:17.1079882Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int8 PASSED [ 43%] 2023-01-11T23:10:17.1080063Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:10:17.1080275Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1080618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 43%] 2023-01-11T23:10:17.1080835Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1081074Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1081397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_float32 SKIPPED (_refs.narrow doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1081722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%] 2023-01-11T23:10:17.1082065Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%] 2023-01-11T23:10:17.1082273Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1082470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_bfloat16 PASSED [ 43%] 2023-01-11T23:10:17.1082667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1082859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:10:17.1083081Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1083276Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_nvfuser_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1083493Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1083674Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_bfloat16 PASSED [ 43%] 2023-01-11T23:10:17.1083850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_bool PASSED [ 43%] 2023-01-11T23:10:17.1084025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_complex128 PASSED [ 43%] 2023-01-11T23:10:17.1084202Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int64 PASSED [ 43%] 2023-01-11T23:10:17.1084400Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:10:17.1084602Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1084810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1085017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1085201Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1085403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1085580Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_int32 PASSED [ 43%] 2023-01-11T23:10:17.1085781Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1085973Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1086158Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex64 PASSED [ 43%] 2023-01-11T23:10:17.1086337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:10:17.1086510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1086689Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:10:17.1086863Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int16 PASSED [ 43%] 2023-01-11T23:10:17.1087039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int32 PASSED [ 43%] 2023-01-11T23:10:17.1087216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int64 PASSED [ 43%] 2023-01-11T23:10:17.1087449Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1087649Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1087854Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1088039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1088241Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1088419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_int32 PASSED [ 43%] 2023-01-11T23:10:17.1088623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1088831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1089141Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_bool SKIPPED (Can't check result for new_empty) [ 43%] 2023-01-11T23:10:17.1089451Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_float16 SKIPPED (Can't check result for new_empty) [ 43%] 2023-01-11T23:10:17.1089758Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int32 SKIPPED (Can't check result for new_empty) [ 43%] 2023-01-11T23:10:17.1090056Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int64 SKIPPED (Can't check result for new_empty) [ 43%] 2023-01-11T23:10:17.1090453Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_bfloat16 SKIPPED (Can't check result for new_empty) [ 43%] 2023-01-11T23:10:17.1090773Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_complex64 SKIPPED (Can't check result for new_empty) [ 43%] 2023-01-11T23:10:17.1091141Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_float32 SKIPPED (Can't check result for new_empty) [ 43%] 2023-01-11T23:10:17.1091447Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_float64 SKIPPED (Can't check result for new_empty) [ 43%] 2023-01-11T23:10:17.1091757Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_int16 SKIPPED (Can't check result for new_empty) [ 44%] 2023-01-11T23:10:17.1092064Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_int8 SKIPPED (Can't check result for new_empty) [ 44%] 2023-01-11T23:10:17.1092372Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_uint8 SKIPPED (Can't check result for new_empty) [ 44%] 2023-01-11T23:10:17.1092620Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_bfloat16 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1092854Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_uint8 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1093096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_bfloat16 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1093327Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_bool SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1093573Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_complex128 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1093841Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_complex64 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1094084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_float32 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1094324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_float64 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1094660Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_int16 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1094895Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_int64 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1095084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_bfloat16 PASSED [ 44%] 2023-01-11T23:10:17.1095273Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_complex64 PASSED [ 44%] 2023-01-11T23:10:17.1095459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_float16 PASSED [ 44%] 2023-01-11T23:10:17.1095642Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_float32 PASSED [ 44%] 2023-01-11T23:10:17.1095815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int16 PASSED [ 44%] 2023-01-11T23:10:17.1095994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int32 PASSED [ 44%] 2023-01-11T23:10:17.1096273Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_uint8 PASSED [ 44%] 2023-01-11T23:10:17.1096488Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1096696Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1096903Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1097104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1097434Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%] 2023-01-11T23:10:17.1097638Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1097825Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_bfloat16 PASSED [ 44%] 2023-01-11T23:10:17.1098003Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_bool PASSED [ 44%] 2023-01-11T23:10:17.1098190Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex128 PASSED [ 44%] 2023-01-11T23:10:17.1098369Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int16 PASSED [ 44%] 2023-01-11T23:10:17.1098548Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int64 PASSED [ 44%] 2023-01-11T23:10:17.1098729Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int8 PASSED [ 44%] 2023-01-11T23:10:17.1098938Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1099158Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1099397Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1099722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_float32 SKIPPED (_refs.new_ones doesn't support nvfuser) [ 44%] 2023-01-11T23:10:17.1100048Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%] 2023-01-11T23:10:17.1100244Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1100441Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_bfloat16 PASSED [ 44%] 2023-01-11T23:10:17.1100670Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_complex128 PASSED [ 44%] 2023-01-11T23:10:17.1100865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_complex32 PASSED [ 44%] 2023-01-11T23:10:17.1101050Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_complex64 PASSED [ 44%] 2023-01-11T23:10:17.1101235Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_float32 PASSED [ 44%] 2023-01-11T23:10:17.1101420Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_float64 PASSED [ 44%] 2023-01-11T23:10:17.1101604Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int32 PASSED [ 44%] 2023-01-11T23:10:17.1101784Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int64 PASSED [ 44%] 2023-01-11T23:10:17.1101980Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int8 PASSED [ 44%] 2023-01-11T23:10:17.1102186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1102399Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1102750Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 44%] 2023-01-11T23:10:17.1102960Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1103285Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_float32 SKIPPED (_refs.new_zeros doesn't support nvfuser) [ 44%] 2023-01-11T23:10:17.1103612Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%] 2023-01-11T23:10:17.1103939Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_int32 SKIPPED (_refs.new_zeros doesn't support nvfuser) [ 44%] 2023-01-11T23:10:17.1104125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_bfloat16 PASSED [ 44%] 2023-01-11T23:10:17.1104313Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_float64 PASSED [ 44%] 2023-01-11T23:10:17.1104637Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_nvfuser_cuda_float32 SKIPPED (_refs.nextafter doesn't support nvfuser) [ 44%] 2023-01-11T23:10:17.1104875Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_bfloat16 SKIPPED (Expected: dropout is not comparable) [ 44%] 2023-01-11T23:10:17.1105135Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_nvfuser_cuda_bfloat16 SKIPPED (Expected: dropout is not comparable) [ 44%] 2023-01-11T23:10:17.1105415Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_nvfuser_cuda_float16 SKIPPED (Expected: dropout is not comparable) [ 44%] 2023-01-11T23:10:17.1105615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float32 PASSED [ 44%] 2023-01-11T23:10:17.1105811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float64 PASSED [ 44%] 2023-01-11T23:10:17.1106030Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1106250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1106470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1106672Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_bfloat16 XFAIL [ 44%] 2023-01-11T23:10:17.1106874Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float16 XFAIL [ 44%] 2023-01-11T23:10:17.1107074Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float64 XFAIL [ 44%] 2023-01-11T23:10:17.1107291Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1107510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1107743Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_nvfuser_cuda_float32 XFAIL [ 44%] 2023-01-11T23:10:17.1107937Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_bfloat16 PASSED [ 44%] 2023-01-11T23:10:17.1108133Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_float16 PASSED [ 44%] 2023-01-11T23:10:17.1108324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_float32 PASSED [ 44%] 2023-01-11T23:10:17.1108539Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1108837Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1109034Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_nvfuser_cuda_float32 PASSED [ 44%] 2023-01-11T23:10:17.1109232Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1109447Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1109665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1110504Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:10:17.1110733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:10:17.1110927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1111123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:10:17.1111384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1111601Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1111817Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1112018Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:10:17.1112211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1112438Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1112830Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.group_norm doesn't support nvfuser) [ 44%]
2023-01-11T23:10:17.1113054Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1113255Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:10:17.1113456Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:10:17.1113658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1113856Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:10:17.1114266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.hardshrink doesn't support nvfuser) [ 44%]
2023-01-11T23:10:17.1114467Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:10:17.1114667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:10:17.1114853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int16 PASSED [ 44%]
2023-01-11T23:10:17.1115047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int32 PASSED [ 44%]
2023-01-11T23:10:17.1115243Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int8 PASSED [ 44%]
2023-01-11T23:10:17.1115473Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1115694Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1116068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.hardtanh doesn't support nvfuser) [ 44%]
2023-01-11T23:10:17.1116293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1116638Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%]
2023-01-11T23:10:17.1116986Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 44%]
2023-01-11T23:10:17.1117236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:10:17.1117474Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1117701Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1117936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1118134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:10:17.1118335Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1118561Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1118782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1118984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_nvfuser_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1119205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1119402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:10:17.1119633Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_complex128 PASSED [ 44%]
2023-01-11T23:10:17.1119834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_complex64 PASSED [ 44%]
2023-01-11T23:10:17.1120047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1120272Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1120494Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1120910Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.l1_loss doesn't support nvfuser) [ 44%]
2023-01-11T23:10:17.1121130Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1121333Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:10:17.1121530Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:10:17.1121754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1122381Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1122653Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:10:17.1122900Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1123164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1123369Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_nvfuser_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1123588Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1123800Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_bool PASSED [ 44%]
2023-01-11T23:10:17.1124016Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex128 PASSED [ 44%]
2023-01-11T23:10:17.1124234Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex32 PASSED [ 44%]
2023-01-11T23:10:17.1124453Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex64 PASSED [ 44%]
2023-01-11T23:10:17.1124663Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1124874Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:10:17.1125084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_uint8 PASSED [ 44%]
2023-01-11T23:10:17.1125319Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1125744Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 44%]
2023-01-11T23:10:17.1125962Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1126198Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1126562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%]
2023-01-11T23:10:17.1126775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_int32 PASSED [ 44%]
2023-01-11T23:10:17.1127009Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1127223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int16 PASSED [ 44%]
2023-01-11T23:10:17.1127423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int64 PASSED [ 44%]
2023-01-11T23:10:17.1127621Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_uint8 PASSED [ 44%]
2023-01-11T23:10:17.1127982Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%]
2023-01-11T23:10:17.1128385Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_int32 SKIPPED (_refs.nn.functional.margin_ranking_loss doesn't support nvfuser) [ 44%]
2023-01-11T23:10:17.1128618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1128994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 44%]
2023-01-11T23:10:17.1129224Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1129419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:10:17.1129615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1129833Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1130058Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1130261Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:10:17.1130459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1130655Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1130878Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1131090Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1131487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.mse_loss doesn't support nvfuser) [ 45%]
2023-01-11T23:10:17.1131709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1131905Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_float16 XFAIL [ 45%]
2023-01-11T23:10:17.1132126Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1132344Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1132709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.nll_loss doesn't support nvfuser) [ 45%]
2023-01-11T23:10:17.1132930Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1133279Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_complex128 PASSED [ 45%]
2023-01-11T23:10:17.1133494Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_complex64 PASSED [ 45%]
2023-01-11T23:10:17.1133703Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:10:17.1133900Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:10:17.1134106Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_uint8 PASSED [ 45%]
2023-01-11T23:10:17.1134343Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1134747Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1134982Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1135351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:10:17.1135581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1135786Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1135986Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1136194Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:10:17.1136397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:10:17.1136587Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1136790Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:10:17.1136996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:10:17.1137233Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:10:17.1137445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1137673Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1137897Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1138253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:10:17.1138474Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1138674Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1138901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1139258Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.prelu doesn't support nvfuser) [ 45%]
2023-01-11T23:10:17.1139480Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1139675Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1139873Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:10:17.1140065Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:10:17.1140283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_uint8 PASSED [ 45%]
2023-01-11T23:10:17.1140528Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1140761Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1140980Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1141323Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:10:17.1141517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_int32 PASSED [ 45%]
2023-01-11T23:10:17.1141737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1142079Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:10:17.1142295Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1142489Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:10:17.1142683Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1142906Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:10:17.1143098Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:10:17.1143285Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_uint8 PASSED [ 45%]
2023-01-11T23:10:17.1143645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.relu doesn't support nvfuser) [ 45%]
2023-01-11T23:10:17.1143987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:10:17.1144195Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1144417Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1144634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1144831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_nvfuser_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1145047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1145257Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:10:17.1145469Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_complex64 PASSED [ 45%]
2023-01-11T23:10:17.1145679Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1146373Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:10:17.1146588Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:10:17.1146778Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:10:17.1146984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:10:17.1147184Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_uint8 PASSED [ 45%]
2023-01-11T23:10:17.1147419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1147655Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1147885Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1148095Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_int32 PASSED [ 45%]
2023-01-11T23:10:17.1148324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1148531Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:10:17.1148865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_complex128 PASSED [ 45%]
2023-01-11T23:10:17.1149080Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_complex64 PASSED [ 45%]
2023-01-11T23:10:17.1149279Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1149480Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1149684Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:10:17.1149884Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:10:17.1150079Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_uint8 PASSED [ 45%]
2023-01-11T23:10:17.1150319Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1150577Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1150827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1151035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_int32 PASSED [ 45%]
2023-01-11T23:10:17.1151408Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:10:17.1151639Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1151862Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:10:17.1152061Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1152259Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1152482Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1152682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:10:17.1152885Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:10:17.1153083Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1153310Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1153515Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_complex128 PASSED [ 45%]
2023-01-11T23:10:17.1153717Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_complex64 PASSED [ 45%]
2023-01-11T23:10:17.1153909Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1154107Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:10:17.1154357Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1154713Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:10:17.1154918Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_int32 PASSED [ 45%]
2023-01-11T23:10:17.1155141Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1155488Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:10:17.1155689Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:10:17.1155891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1156087Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1156284Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:10:17.1156503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1156723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1156940Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1157292Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:10:17.1157551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1157765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_complex128 PASSED [ 45%]
2023-01-11T23:10:17.1157974Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1158182Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1158390Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:10:17.1158598Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:10:17.1158834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1159064Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1159299Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1159531Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1159785Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1159970Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:10:17.1160155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_complex64 PASSED [ 45%]
2023-01-11T23:10:17.1160334Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1160555Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1160800Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1161004Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1161201Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1161519Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_float32 SKIPPED (_refs.norm doesn't support nvfuser) [ 45%]
2023-01-11T23:10:17.1161703Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_complex64 PASSED [ 45%]
2023-01-11T23:10:17.1161881Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:10:17.1162061Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1162240Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:10:17.1162419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:10:17.1162599Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:10:17.1162834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1163172Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 45%]
2023-01-11T23:10:17.1163371Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1163572Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1163889Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_float32 SKIPPED (_refs.ones doesn't support nvfuser) [ 45%]
2023-01-11T23:10:17.1164212Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:10:17.1164529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:10:17.1164712Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_bool PASSED [ 45%]
2023-01-11T23:10:17.1164898Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1165084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1165266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:10:17.1165446Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:10:17.1165809Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 45%]
2023-01-11T23:10:17.1166025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1166236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1166423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1166629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1166958Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%]
2023-01-11T23:10:17.1167148Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1167353Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1167543Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex64 PASSED [ 46%]
2023-01-11T23:10:17.1167730Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1167906Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int16 PASSED [ 46%]
2023-01-11T23:10:17.1168089Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1168271Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int8 PASSED [ 46%]
2023-01-11T23:10:17.1168616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 46%]
2023-01-11T23:10:17.1168976Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_float32 SKIPPED (_refs.positive doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1169305Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%]
2023-01-11T23:10:17.1169511Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1169833Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%]
2023-01-11T23:10:17.1170038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1170223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_bfloat16 PASSED [ 46%]
2023-01-11T23:10:17.1170397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex64 PASSED [ 46%]
2023-01-11T23:10:17.1170586Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1170761Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int16 PASSED [ 46%]
2023-01-11T23:10:17.1170939Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1171116Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_uint8 PASSED [ 46%]
2023-01-11T23:10:17.1171452Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 46%]
2023-01-11T23:10:17.1171657Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1171993Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_float32 SKIPPED (_refs.pow doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1172201Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1172508Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_int32 SKIPPED (_refs.pow doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1172701Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1172883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1173063Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float64 PASSED [ 46%]
2023-01-11T23:10:17.1173247Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int16 PASSED [ 46%]
2023-01-11T23:10:17.1173426Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1173610Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int8 PASSED [ 46%]
2023-01-11T23:10:17.1173819Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1174028Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1174340Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_float32 SKIPPED (_refs.prod doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1174723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1175092Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_int32 SKIPPED (_refs.prod doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1175471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%]
2023-01-11T23:10:17.1175804Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_bfloat16 SKIPPED (make_traced() doesn't set seed properly!) [ 46%]
2023-01-11T23:10:17.1176126Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_complex128 SKIPPED (make_traced() doesn't set seed properly!) [ 46%]
2023-01-11T23:10:17.1176445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_complex32 SKIPPED (make_traced() doesn't set seed properly!) [ 46%]
2023-01-11T23:10:17.1176763Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_complex64 SKIPPED (make_traced() doesn't set seed properly!) [ 46%]
2023-01-11T23:10:17.1177080Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_float16 SKIPPED (make_traced() doesn't set seed properly!) [ 46%]
2023-01-11T23:10:17.1177408Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_bfloat16 SKIPPED (make_traced() doesn't set seed properly!) [ 46%]
2023-01-11T23:10:17.1177732Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_complex128 SKIPPED (make_traced() doesn't set seed properly!) [ 46%]
2023-01-11T23:10:17.1178045Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_float16 SKIPPED (make_traced() doesn't set seed properly!) [ 46%]
2023-01-11T23:10:17.1178231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_bfloat16 PASSED [ 46%]
2023-01-11T23:10:17.1178409Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_complex128 PASSED [ 46%]
2023-01-11T23:10:17.1178630Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1178817Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1178995Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float64 PASSED [ 46%]
2023-01-11T23:10:17.1179176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int16 PASSED [ 46%]
2023-01-11T23:10:17.1179359Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1179539Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int8 PASSED [ 46%]
2023-01-11T23:10:17.1179745Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1179959Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1180157Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1180475Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_int32 SKIPPED (_refs.ravel doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1180794Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%]
2023-01-11T23:10:17.1181000Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1181190Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex128 PASSED [ 46%]
2023-01-11T23:10:17.1181378Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex64 PASSED [ 46%]
2023-01-11T23:10:17.1181560Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1181824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1182010Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int64 PASSED [ 46%]
2023-01-11T23:10:17.1182351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 46%]
2023-01-11T23:10:17.1182662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_float32 SKIPPED (_refs.real doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1182869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1183078Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1183402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%]
2023-01-11T23:10:17.1183593Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1183781Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int16 PASSED [ 46%]
2023-01-11T23:10:17.1183972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1184161Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int8 PASSED [ 46%]
2023-01-11T23:10:17.1184346Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_uint8 PASSED [ 46%]
2023-01-11T23:10:17.1184590Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1184798Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1184990Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1185203Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1185538Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%]
2023-01-11T23:10:17.1185725Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_bfloat16 PASSED [ 46%]
2023-01-11T23:10:17.1185916Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float64 PASSED [ 46%]
2023-01-11T23:10:17.1186099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1186284Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int8 PASSED [ 46%]
2023-01-11T23:10:17.1186465Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_uint8 PASSED [ 46%]
2023-01-11T23:10:17.1186677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1186879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1187068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1187277Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1187457Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_bool PASSED [ 46%]
2023-01-11T23:10:17.1187665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1187848Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1188025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float64 PASSED [ 46%]
2023-01-11T23:10:17.1188209Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int64 PASSED [ 46%]
2023-01-11T23:10:17.1188414Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1188612Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1188915Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1189132Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1189455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_float32 SKIPPED (_refs.repeat doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1189774Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_int32 SKIPPED (_refs.repeat doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1189980Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1190170Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_bfloat16 PASSED [ 46%]
2023-01-11T23:10:17.1190392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex128 PASSED [ 46%]
2023-01-11T23:10:17.1190583Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1190767Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float64 PASSED [ 46%]
2023-01-11T23:10:17.1190942Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int16 PASSED [ 46%]
2023-01-11T23:10:17.1191123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1191304Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_uint8 PASSED [ 46%]
2023-01-11T23:10:17.1191520Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1191855Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_int32 SKIPPED (_refs.reshape_as doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1192188Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%]
2023-01-11T23:10:17.1192371Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_bool PASSED [ 46%]
2023-01-11T23:10:17.1192556Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_complex128 PASSED [ 46%]
2023-01-11T23:10:17.1192741Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1192923Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1193094Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int8 PASSED [ 46%]
2023-01-11T23:10:17.1193279Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_uint8 PASSED [ 46%]
2023-01-11T23:10:17.1193647Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 46%]
2023-01-11T23:10:17.1193966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_int32 SKIPPED (_refs.reshape doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1194175Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1194359Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_complex128 PASSED [ 46%]
2023-01-11T23:10:17.1194541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_complex32 PASSED [ 46%]
2023-01-11T23:10:17.1194725Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_complex64 PASSED [ 46%]
2023-01-11T23:10:17.1194902Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1195083Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1195254Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1195430Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int64 PASSED [ 46%]
2023-01-11T23:10:17.1195639Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1195840Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1196069Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1196396Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%]
2023-01-11T23:10:17.1196705Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_int32 SKIPPED (_refs.roll doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1196886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_bfloat16 PASSED [ 46%]
2023-01-11T23:10:17.1197065Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_bool PASSED [ 46%]
2023-01-11T23:10:17.1197238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1197418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1197598Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int16 PASSED [ 46%]
2023-01-11T23:10:17.1197777Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1197989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1198307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_float32 SKIPPED (_refs.rot90 doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1198618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_int32 SKIPPED (_refs.rot90 doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1198934Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%]
2023-01-11T23:10:17.1199116Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1199301Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float64 PASSED [ 46%]
2023-01-11T23:10:17.1199512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int16 PASSED [ 46%]
2023-01-11T23:10:17.1199691Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int64 PASSED [ 46%]
2023-01-11T23:10:17.1199869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int8 PASSED [ 46%]
2023-01-11T23:10:17.1200076Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1200397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_float32 SKIPPED (_refs.round doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1200605Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1200927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%]
2023-01-11T23:10:17.1201108Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_bool PASSED [ 46%]
2023-01-11T23:10:17.1201295Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_complex128 PASSED [ 46%]
2023-01-11T23:10:17.1201483Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_complex32 PASSED [ 47%]
2023-01-11T23:10:17.1201661Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_complex64 PASSED [ 47%]
2023-01-11T23:10:17.1201841Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_float16 PASSED [ 47%]
2023-01-11T23:10:17.1202020Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int16 PASSED [ 47%]
2023-01-11T23:10:17.1202222Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int32 PASSED [ 47%]
2023-01-11T23:10:17.1202402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int64 PASSED [ 47%]
2023-01-11T23:10:17.1202576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int8 PASSED [ 47%]
2023-01-11T23:10:17.1202903Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1203117Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1203462Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 47%]
2023-01-11T23:10:17.1203660Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1203988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%]
2023-01-11T23:10:17.1204311Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%]
2023-01-11T23:10:17.1204496Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_complex128 PASSED [ 47%]
2023-01-11T23:10:17.1204678Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_complex64 PASSED [ 47%]
2023-01-11T23:10:17.1204859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float32 PASSED [ 47%]
2023-01-11T23:10:17.1205038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float64 PASSED [ 47%]
2023-01-11T23:10:17.1205216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int32 PASSED [ 47%]
2023-01-11T23:10:17.1205395Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int8 PASSED [ 47%]
2023-01-11T23:10:17.1205633Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1205949Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%]
2023-01-11T23:10:17.1206152Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1206354Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1206537Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_bfloat16 PASSED [ 47%]
2023-01-11T23:10:17.1206723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_bool PASSED [ 47%]
2023-01-11T23:10:17.1206907Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex128 PASSED [ 47%]
2023-01-11T23:10:17.1207090Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex32 PASSED [ 47%]
2023-01-11T23:10:17.1207269Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_float32 PASSED [ 47%]
2023-01-11T23:10:17.1207449Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int32 PASSED [ 47%]
2023-01-11T23:10:17.1207619Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int64 PASSED [ 47%]
2023-01-11T23:10:17.1207794Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int8 PASSED [ 47%]
2023-01-11T23:10:17.1207997Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1208362Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 47%]
2023-01-11T23:10:17.1208574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1208892Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%]
2023-01-11T23:10:17.1209095Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1209278Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_bool PASSED [ 47%]
2023-01-11T23:10:17.1209466Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_complex128 PASSED [ 47%]
2023-01-11T23:10:17.1209656Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_complex64 PASSED [ 47%]
2023-01-11T23:10:17.1209836Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float16 PASSED [ 47%]
2023-01-11T23:10:17.1210022Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float64 PASSED [ 47%]
2023-01-11T23:10:17.1210205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_uint8 PASSED [ 47%]
2023-01-11T23:10:17.1210411Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1210628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1211018Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 47%]
2023-01-11T23:10:17.1211234Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1211469Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1211795Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%]
2023-01-11T23:10:17.1211975Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_bfloat16 PASSED [ 47%]
2023-01-11T23:10:17.1212146Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_bool PASSED [ 47%]
2023-01-11T23:10:17.1212331Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_float32 PASSED [ 47%]
2023-01-11T23:10:17.1212510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int16 PASSED [ 47%]
2023-01-11T23:10:17.1212691Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_uint8 PASSED [ 47%]
2023-01-11T23:10:17.1212897Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1213083Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_float32 PASSED [ 47%]
2023-01-11T23:10:17.1213287Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1213610Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%]
2023-01-11T23:10:17.1213928Z
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:10:17.1214150Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:10:17.1214325Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:10:17.1214682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:10:17.1214868Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:10:17.1215077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1215285Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1215609Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_float32 SKIPPED (_refs.signbit doesn't support nvfuser) [ 47%] 2023-01-11T23:10:17.1215937Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:10:17.1216145Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1216467Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:10:17.1216672Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1216843Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:10:17.1217024Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_complex64 PASSED [ 47%] 2023-01-11T23:10:17.1217205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1217385Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:10:17.1217827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:10:17.1218082Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:10:17.1218293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1218502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1218856Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 47%] 2023-01-11T23:10:17.1219066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1219242Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1219561Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:10:17.1219743Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_int32 PASSED [ 47%] 2023-01-11T23:10:17.1219943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1220261Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:10:17.1220439Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_bool PASSED [ 47%] 2023-01-11T23:10:17.1220675Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_complex64 PASSED [ 47%] 2023-01-11T23:10:17.1220859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:10:17.1221039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1221211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:10:17.1221393Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:10:17.1221576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:10:17.1221782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1221988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1222198Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1222405Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1222594Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1222799Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1223121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:10:17.1223317Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1223638Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:10:17.1223868Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1224050Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_bool PASSED [ 47%] 2023-01-11T23:10:17.1224235Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex128 PASSED [ 47%] 2023-01-11T23:10:17.1224417Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float32 
PASSED [ 47%] 2023-01-11T23:10:17.1224597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int32 PASSED [ 47%] 2023-01-11T23:10:17.1224775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:10:17.1224983Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1225324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 47%] 2023-01-11T23:10:17.1225518Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1225705Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1226024Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:10:17.1226205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_int32 PASSED [ 47%] 2023-01-11T23:10:17.1226431Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1226626Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:10:17.1226825Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_bool PASSED [ 47%] 2023-01-11T23:10:17.1227044Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1227304Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:10:17.1227541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:10:17.1227734Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int32 PASSED [ 47%] 2023-01-11T23:10:17.1227922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:10:17.1228146Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1228366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1228583Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1228865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1229085Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1229435Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:10:17.1229669Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_bool PASSED [ 47%] 2023-01-11T23:10:17.1229864Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1230048Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:10:17.1230240Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:10:17.1230429Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:10:17.1230786Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.bessel_j0 doesn't support nvfuser) [ 47%] 2023-01-11T23:10:17.1231130Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:10:17.1231468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:10:17.1231662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:10:17.1231850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:10:17.1232065Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1232283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1232648Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:10:17.1232840Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:10:17.1233029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:10:17.1233217Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:10:17.1233403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:10:17.1233616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1233951Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:10:17.1234164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1234352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_bool PASSED [ 47%] 2023-01-11T23:10:17.1234542Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1234717Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:10:17.1235025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:10:17.1235238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1235457Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1235832Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:10:17.1236176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.erfcx doesn't support nvfuser) [ 47%] 2023-01-11T23:10:17.1236389Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1236579Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1236764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:10:17.1236952Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:10:17.1237155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1237493Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.i0e doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1237824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1238035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1238364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:10:17.1238574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:10:17.1238763Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1238949Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1239134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1239318Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1239649Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.i1 doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1239827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1240013Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1240199Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1240407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1240741Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1241078Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.i1e doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1241273Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1241466Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:10:17.1241653Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:10:17.1241879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1242056Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1242406Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.log_ndtr doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1242627Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1242834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_bfloat16 PASSED [ 48%] 2023-01-11T23:10:17.1243048Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex128 PASSED [ 48%] 2023-01-11T23:10:17.1243260Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex64 PASSED [ 48%] 2023-01-11T23:10:17.1243464Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:10:17.1243666Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1243869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1244102Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1244357Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1244568Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1244799Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_float64 SKIPPED (skipped 
for speed) [ 48%] 2023-01-11T23:10:17.1245157Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:10:17.1245347Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:10:17.1245536Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:10:17.1245723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1245908Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1246096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:10:17.1246311Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1246519Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1246869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.logit doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1247210Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1247581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.logit doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1247910Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:10:17.1248122Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:10:17.1248328Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1248562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1248800Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1249165Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1249552Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.multigammaln doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1249782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1249984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1250222Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1250431Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1250828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.multigammaln doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1251102Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1251478Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1251860Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.multigammaln doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1252093Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1252453Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:10:17.1252682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1252893Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1253098Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1253296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:10:17.1253527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1253732Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1253963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1254348Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.multigammaln doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1254826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1255213Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.multigammaln doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1255445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 
2023-01-11T23:10:17.1255814Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:10:17.1256000Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:10:17.1256192Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:10:17.1256375Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1256611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1256802Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1257017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1257351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1257695Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.ndtr doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1257910Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1258104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:10:17.1258298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1258486Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1258669Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:10:17.1258887Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1259232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.ndtri doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1259574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.ndtri doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1259826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1260039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_complex128 PASSED [ 48%] 2023-01-11T23:10:17.1260249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_complex64 PASSED [ 48%] 2023-01-11T23:10:17.1260452Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1260653Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int64 PASSED [ 48%] 
2023-01-11T23:10:17.1260885Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1261122Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1261347Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1261557Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1261761Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_int32 PASSED [ 48%] 2023-01-11T23:10:17.1261963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:10:17.1262172Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1262393Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1262593Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1262795Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1263019Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1263377Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1263766Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.spherical_bessel_j0 doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1263990Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1264186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_bfloat16 PASSED [ 48%] 2023-01-11T23:10:17.1264383Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:10:17.1264567Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1264753Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1264941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:10:17.1265131Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1265320Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1265563Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1265924Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.xlog1py doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1266265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.xlog1py doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1266481Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1266698Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1266889Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1267077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1267423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.zeta doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1267635Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1267972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1268206Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1268419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1268597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_complex32 PASSED [ 48%] 2023-01-11T23:10:17.1268858Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_complex64 PASSED [ 48%] 2023-01-11T23:10:17.1269044Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1269223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1269568Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 48%] 2023-01-11T23:10:17.1269777Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1269983Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1270169Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1270534Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1270747Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 
48%] 2023-01-11T23:10:17.1270940Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1271121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:10:17.1271327Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_complex64 SKIPPED (Skipped!) [ 48%] 2023-01-11T23:10:17.1271538Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:10:17.1271721Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1271899Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1272078Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1272256Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:10:17.1272434Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1272608Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1272821Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1273024Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1273209Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1273541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1273726Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_int32 PASSED [ 48%] 2023-01-11T23:10:17.1273931Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1274200Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1274389Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_bfloat16 PASSED [ 48%] 2023-01-11T23:10:17.1274579Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex32 PASSED [ 49%] 2023-01-11T23:10:17.1274754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float16 PASSED [ 49%] 2023-01-11T23:10:17.1274938Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float32 PASSED [ 49%] 2023-01-11T23:10:17.1275114Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int8 PASSED [ 49%] 2023-01-11T23:10:17.1275297Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_uint8 PASSED [ 49%] 2023-01-11T23:10:17.1275506Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:10:17.1275714Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:10:17.1276047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%] 2023-01-11T23:10:17.1276251Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:10:17.1276452Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:10:17.1276637Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_bfloat16 PASSED [ 49%] 2023-01-11T23:10:17.1276813Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_bool PASSED [ 49%] 2023-01-11T23:10:17.1276997Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:10:17.1277204Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_complex64 PASSED [ 49%] 2023-01-11T23:10:17.1277385Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float16 PASSED [ 49%] 2023-01-11T23:10:17.1277568Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float32 PASSED [ 49%] 2023-01-11T23:10:17.1277746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:10:17.1277925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int8 PASSED [ 49%] 2023-01-11T23:10:17.1278137Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:10:17.1278486Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 49%] 2023-01-11T23:10:17.1278803Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_float32 SKIPPED (_refs.stack doesn't support nvfuser) [ 49%] 2023-01-11T23:10:17.1279010Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:10:17.1279336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%] 2023-01-11T23:10:17.1279653Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_int32 SKIPPED (_refs.stack doesn't support nvfuser) [ 49%] 2023-01-11T23:10:17.1279857Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:10:17.1280065Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_bfloat16 PASSED [ 49%] 2023-01-11T23:10:17.1280255Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:10:17.1280470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:10:17.1280682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:10:17.1280877Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_complex32 PASSED [ 49%]
2023-01-11T23:10:17.1281048Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_float16 PASSED [ 49%]
2023-01-11T23:10:17.1281225Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1281405Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int16 PASSED [ 49%]
2023-01-11T23:10:17.1281582Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int32 PASSED [ 49%]
2023-01-11T23:10:17.1281762Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int64 PASSED [ 49%]
2023-01-11T23:10:17.1281937Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int8 PASSED [ 49%]
2023-01-11T23:10:17.1282114Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_uint8 PASSED [ 49%]
2023-01-11T23:10:17.1282321Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1282664Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 49%]
2023-01-11T23:10:17.1282874Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1283109Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1283437Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%]
2023-01-11T23:10:17.1283638Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1283961Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%]
2023-01-11T23:10:17.1284161Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1284340Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_bool PASSED [ 49%]
2023-01-11T23:10:17.1284525Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex32 PASSED [ 49%]
2023-01-11T23:10:17.1284709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex64 PASSED [ 49%]
2023-01-11T23:10:17.1284888Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_float16 PASSED [ 49%]
2023-01-11T23:10:17.1285068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1285236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int64 PASSED [ 49%]
2023-01-11T23:10:17.1285436Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1285644Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1286012Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 49%]
2023-01-11T23:10:17.1286222Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1286423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1286626Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1286822Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1287145Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%]
2023-01-11T23:10:17.1287351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1287533Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_bfloat16 PASSED [ 49%]
2023-01-11T23:10:17.1287719Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_bool PASSED [ 49%]
2023-01-11T23:10:17.1287909Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_complex128 PASSED [ 49%]
2023-01-11T23:10:17.1288095Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_complex64 PASSED [ 49%]
2023-01-11T23:10:17.1288277Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int16 PASSED [ 49%]
2023-01-11T23:10:17.1288459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int8 PASSED [ 49%]
2023-01-11T23:10:17.1288645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_uint8 PASSED [ 49%]
2023-01-11T23:10:17.1288881Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1289096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1289298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1289488Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1289676Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_int32 PASSED [ 49%]
2023-01-11T23:10:17.1289883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1290244Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%]
2023-01-11T23:10:17.1290454Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_bfloat16 PASSED [ 49%]
2023-01-11T23:10:17.1290628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_bool PASSED [ 49%]
2023-01-11T23:10:17.1290808Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_complex64 PASSED [ 49%]
2023-01-11T23:10:17.1290992Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1291167Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_uint8 PASSED [ 49%]
2023-01-11T23:10:17.1291366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1291592Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1291775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1291976Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1292175Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1292362Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_complex128 PASSED [ 49%]
2023-01-11T23:10:17.1292543Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_complex32 PASSED [ 49%]
2023-01-11T23:10:17.1292724Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_complex64 PASSED [ 49%]
2023-01-11T23:10:17.1292903Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_float16 PASSED [ 49%]
2023-01-11T23:10:17.1293076Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1293252Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int16 PASSED [ 49%]
2023-01-11T23:10:17.1293428Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int32 PASSED [ 49%]
2023-01-11T23:10:17.1293633Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1293842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1294045Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1294249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1294470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1294757Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_bfloat16 PASSED [ 49%]
2023-01-11T23:10:17.1294941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_complex128 PASSED [ 49%]
2023-01-11T23:10:17.1295113Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float16 PASSED [ 49%]
2023-01-11T23:10:17.1295293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1295471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float64 PASSED [ 49%]
2023-01-11T23:10:17.1295651Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int16 PASSED [ 49%]
2023-01-11T23:10:17.1295860Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1296205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 49%]
2023-01-11T23:10:17.1296414Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1296616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1296798Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1297124Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%]
2023-01-11T23:10:17.1297343Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_int32 PASSED [ 49%]
2023-01-11T23:10:17.1297548Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1297878Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%]
2023-01-11T23:10:17.1298064Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_bool PASSED [ 49%]
2023-01-11T23:10:17.1298253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float64 PASSED [ 49%]
2023-01-11T23:10:17.1298440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int16 PASSED [ 49%]
2023-01-11T23:10:17.1298627Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int64 PASSED [ 49%]
2023-01-11T23:10:17.1298807Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int8 PASSED [ 49%]
2023-01-11T23:10:17.1299025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1299236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1299447Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1299793Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_float32 SKIPPED (_refs.tensor_split doesn't support nvfuser) [ 49%]
2023-01-11T23:10:17.1300128Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%]
2023-01-11T23:10:17.1300512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_int32 SKIPPED (_refs.tensor_split doesn't support nvfuser) [ 49%]
2023-01-11T23:10:17.1300693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_bool PASSED [ 49%]
2023-01-11T23:10:17.1300875Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_complex128 PASSED [ 49%]
2023-01-11T23:10:17.1301053Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_float16 PASSED [ 49%]
2023-01-11T23:10:17.1301231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1301407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int32 PASSED [ 49%]
2023-01-11T23:10:17.1301570Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int64 PASSED [ 49%]
2023-01-11T23:10:17.1301748Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_uint8 PASSED [ 49%]
2023-01-11T23:10:17.1301957Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1302161Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1302362Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1302687Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%]
2023-01-11T23:10:17.1302870Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_bfloat16 PASSED [ 49%]
2023-01-11T23:10:17.1303076Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_bool PASSED [ 49%]
2023-01-11T23:10:17.1303255Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int32 PASSED [ 49%]
2023-01-11T23:10:17.1303434Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int8 PASSED [ 49%]
2023-01-11T23:10:17.1303607Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_uint8 PASSED [ 49%]
2023-01-11T23:10:17.1303814Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1304015Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1304227Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1304574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 49%]
2023-01-11T23:10:17.1304786Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1305104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_int32 SKIPPED (_refs.trace doesn't support nvfuser) [ 49%]
2023-01-11T23:10:17.1305430Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%]
2023-01-11T23:10:17.1305618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_bool PASSED [ 49%]
2023-01-11T23:10:17.1305809Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex64 PASSED [ 49%]
2023-01-11T23:10:17.1305989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float64 PASSED [ 49%]
2023-01-11T23:10:17.1306177Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int16 PASSED [ 49%]
2023-01-11T23:10:17.1306388Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_uint8 PASSED [ 49%]
2023-01-11T23:10:17.1306602Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1306809Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1307024Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1307236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1307576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%]
2023-01-11T23:10:17.1307765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_int32 PASSED [ 49%]
2023-01-11T23:10:17.1307974Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1308146Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_bool PASSED [ 50%]
2023-01-11T23:10:17.1308328Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex64 PASSED [ 50%]
2023-01-11T23:10:17.1308503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int16 PASSED [ 50%]
2023-01-11T23:10:17.1308742Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int8 PASSED [ 50%]
2023-01-11T23:10:17.1308965Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_uint8 PASSED [ 50%]
2023-01-11T23:10:17.1309168Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1309375Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1309577Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1309894Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_int32 SKIPPED (_refs.tril doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1310095Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1310408Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%]
2023-01-11T23:10:17.1310610Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1310787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_bool PASSED [ 50%]
2023-01-11T23:10:17.1310972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex128 PASSED [ 50%]
2023-01-11T23:10:17.1311176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_float32 PASSED [ 50%]
2023-01-11T23:10:17.1311378Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int16 PASSED [ 50%]
2023-01-11T23:10:17.1311553Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int64 PASSED [ 50%]
2023-01-11T23:10:17.1311730Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int8 PASSED [ 50%]
2023-01-11T23:10:17.1311933Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1312157Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1312483Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%]
2023-01-11T23:10:17.1312796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_int32 SKIPPED (_refs.triu doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1313003Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1313193Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_bfloat16 PASSED [ 50%]
2023-01-11T23:10:17.1313386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex128 PASSED [ 50%]
2023-01-11T23:10:17.1313571Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1313760Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float64 PASSED [ 50%]
2023-01-11T23:10:17.1313943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int8 PASSED [ 50%]
2023-01-11T23:10:17.1314154Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1314364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1314575Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1314811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1314998Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_bfloat16 PASSED [ 50%]
2023-01-11T23:10:17.1315181Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1315360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float64 PASSED [ 50%]
2023-01-11T23:10:17.1315539Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int8 PASSED [ 50%]
2023-01-11T23:10:17.1315719Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_uint8 PASSED [ 50%]
2023-01-11T23:10:17.1315927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1316251Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_float32 SKIPPED (_refs.trunc doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1316451Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1316777Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%]
2023-01-11T23:10:17.1317095Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_int32 SKIPPED (_refs.trunc doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1317417Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%]
2023-01-11T23:10:17.1317602Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_bfloat16 PASSED [ 50%]
2023-01-11T23:10:17.1317792Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex128 PASSED [ 50%]
2023-01-11T23:10:17.1317978Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex32 PASSED [ 50%]
2023-01-11T23:10:17.1318183Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1318365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_float32 PASSED [ 50%]
2023-01-11T23:10:17.1318544Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int32 PASSED [ 50%]
2023-01-11T23:10:17.1318746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1318952Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1319301Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 50%]
2023-01-11T23:10:17.1319629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_float32 SKIPPED (_refs.unbind doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1319836Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1320167Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%]
2023-01-11T23:10:17.1320370Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1320572Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1320783Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_bool PASSED [ 50%]
2023-01-11T23:10:17.1320973Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_complex64 PASSED [ 50%]
2023-01-11T23:10:17.1321157Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_float64 PASSED [ 50%]
2023-01-11T23:10:17.1321341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int16 PASSED [ 50%]
2023-01-11T23:10:17.1321521Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int32 PASSED [ 50%]
2023-01-11T23:10:17.1321733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1321938Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1322150Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1322363Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1322689Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_int32 SKIPPED (_refs.unflatten doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1322879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex64 PASSED [ 50%]
2023-01-11T23:10:17.1323066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1323247Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float64 PASSED [ 50%]
2023-01-11T23:10:17.1323432Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int32 PASSED [ 50%]
2023-01-11T23:10:17.1323622Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_uint8 PASSED [ 50%]
2023-01-11T23:10:17.1323868Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1324087Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1324298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1324480Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_bfloat16 PASSED [ 50%]
2023-01-11T23:10:17.1324664Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex128 PASSED [ 50%]
2023-01-11T23:10:17.1324849Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex32 PASSED [ 50%]
2023-01-11T23:10:17.1325029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex64 PASSED [ 50%]
2023-01-11T23:10:17.1325215Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float32 PASSED [ 50%]
2023-01-11T23:10:17.1325394Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float64 PASSED [ 50%]
2023-01-11T23:10:17.1325576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int32 PASSED [ 50%]
2023-01-11T23:10:17.1325756Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int64 PASSED [ 50%]
2023-01-11T23:10:17.1325935Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int8 PASSED [ 50%]
2023-01-11T23:10:17.1326142Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1326374Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1326702Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%]
2023-01-11T23:10:17.1327018Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_int32 SKIPPED (_refs.unfold doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1327215Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1327398Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_bool PASSED [ 50%]
2023-01-11T23:10:17.1327585Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1327772Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_float64 PASSED [ 50%]
2023-01-11T23:10:17.1327958Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int64 PASSED [ 50%]
2023-01-11T23:10:17.1328142Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int8 PASSED [ 50%]
2023-01-11T23:10:17.1328354Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1328566Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1328919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 50%]
2023-01-11T23:10:17.1329130Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1329314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_float32 PASSED [ 50%]
2023-01-11T23:10:17.1329527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_int32 PASSED [ 50%]
2023-01-11T23:10:17.1329737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1330068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%]
2023-01-11T23:10:17.1330316Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1330502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1330680Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float32 PASSED [ 50%]
2023-01-11T23:10:17.1330860Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float64 PASSED [ 50%]
2023-01-11T23:10:17.1331070Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1331275Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1331472Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1331655Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_bfloat16 PASSED [ 50%]
2023-01-11T23:10:17.1331843Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_complex128 PASSED [ 50%]
2023-01-11T23:10:17.1332029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_complex64 PASSED [ 50%]
2023-01-11T23:10:17.1332242Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1332428Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float64 PASSED [ 50%]
2023-01-11T23:10:17.1332639Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1332848Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1333031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_bfloat16 PASSED [ 50%]
2023-01-11T23:10:17.1333210Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_complex128 PASSED [ 50%]
2023-01-11T23:10:17.1333395Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_complex32 PASSED [ 50%]
2023-01-11T23:10:17.1333586Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_complex64 PASSED [ 50%]
2023-01-11T23:10:17.1333769Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int16 PASSED [ 50%]
2023-01-11T23:10:17.1333951Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int32 PASSED [ 50%]
2023-01-11T23:10:17.1334158Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1334369Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1334680Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1335004Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_float32 SKIPPED (_refs.view_as doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1335258Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1335576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%]
2023-01-11T23:10:17.1335784Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1335963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_bfloat16 PASSED [ 50%]
2023-01-11T23:10:17.1336142Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_bool PASSED [ 50%]
2023-01-11T23:10:17.1336325Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_complex32 PASSED [ 50%]
2023-01-11T23:10:17.1336508Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1336688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int64 PASSED [ 50%]
2023-01-11T23:10:17.1336868Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int8 PASSED [ 50%]
2023-01-11T23:10:17.1337190Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%]
2023-01-11T23:10:17.1337506Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%]
2023-01-11T23:10:17.1337683Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_bfloat16 PASSED [ 50%]
2023-01-11T23:10:17.1337865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_bool PASSED [ 50%]
2023-01-11T23:10:17.1338089Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex32 PASSED [ 50%]
2023-01-11T23:10:17.1338275Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex64 PASSED [ 50%]
2023-01-11T23:10:17.1338455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1338638Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_float32 PASSED [ 50%]
2023-01-11T23:10:17.1338817Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int16 PASSED [ 50%]
2023-01-11T23:10:17.1338996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_uint8 PASSED [ 50%]
2023-01-11T23:10:17.1339203Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1339406Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1339615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1339814Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1340132Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_int32 SKIPPED (_refs.vsplit doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1340360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1340707Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%]
2023-01-11T23:10:17.1340891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_bool PASSED [ 50%]
2023-01-11T23:10:17.1341075Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_complex64 PASSED [ 50%]
2023-01-11T23:10:17.1341289Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1341471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float32 PASSED [ 50%]
2023-01-11T23:10:17.1341641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1341819Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1341996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1342174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1342386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1342594Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1342806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1343017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1343221Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1343541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_int32 SKIPPED (_refs.vstack doesn't support nvfuser) [ 51%]
2023-01-11T23:10:17.1343765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1343952Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex128 PASSED [ 51%]
2023-01-11T23:10:17.1344140Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1344320Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1344528Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1344849Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%]
2023-01-11T23:10:17.1345162Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_int32 SKIPPED (_refs.where doesn't support nvfuser) [ 51%]
2023-01-11T23:10:17.1345368Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1345573Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1345755Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_bfloat16 PASSED [ 51%]
2023-01-11T23:10:17.1345929Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1346116Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1346296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1346470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int64 PASSED [ 51%]
2023-01-11T23:10:17.1346677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1347019Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_float32 SKIPPED (_refs.xlogy doesn't support nvfuser) [ 51%]
2023-01-11T23:10:17.1347225Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1347550Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%]
2023-01-11T23:10:17.1347862Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_int32 SKIPPED (_refs.xlogy doesn't support nvfuser) [ 51%]
2023-01-11T23:10:17.1348184Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 51%]
2023-01-11T23:10:17.1348360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1348549Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex128 PASSED [ 51%]
2023-01-11T23:10:17.1348793Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1348988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1349165Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1349371Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1349574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1349821Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1350031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1350341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_float32 SKIPPED (_refs.zeros doesn't support nvfuser) [ 51%]
2023-01-11T23:10:17.1350660Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%]
2023-01-11T23:10:17.1350979Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_int32 SKIPPED (_refs.zeros doesn't support nvfuser) [ 51%]
2023-01-11T23:10:17.1351184Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1351391Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1351598Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_aten_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1351799Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_aten_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1351994Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_aten_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1352223Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1352426Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_nvfuser_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1352620Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_aten_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1352831Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1353046Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_bfloat16 PASSED [ 51%]
2023-01-11T23:10:17.1353233Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1353429Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_complex32 PASSED [ 51%]
2023-01-11T23:10:17.1353617Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1353804Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1353990Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1354177Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1354364Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1354537Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1354752Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1354963Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1355178Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1355394Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1355628Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1355824Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1356157Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%]
2023-01-11T23:10:17.1356349Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1356560Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1356715Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_bfloat16 PASSED [ 51%]
2023-01-11T23:10:17.1356881Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_complex32 PASSED [ 51%]
2023-01-11T23:10:17.1357044Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1357207Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1357362Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1357519Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1357675Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int64 PASSED [ 51%]
2023-01-11T23:10:17.1357831Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1358013Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1358194Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_complex128 PASSED [ 51%]
2023-01-11T23:10:17.1358385Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1358590Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1358769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1358946Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1359122Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1359296Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1359471Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_bfloat16 PASSED [ 51%]
2023-01-11T23:10:17.1359648Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex32 PASSED [ 51%]
2023-01-11T23:10:17.1359817Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1359996Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1360168Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1360361Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1360562Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1360735Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1360904Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1361101Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1361262Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1361432Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1361603Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1361786Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex128 PASSED [ 51%]
2023-01-11T23:10:17.1361969Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1362149Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1362328Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1362508Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1362681Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1362846Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int64 PASSED [ 51%]
2023-01-11T23:10:17.1363022Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1363193Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1363368Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1363549Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex128 PASSED [ 51%]
2023-01-11T23:10:17.1363729Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex32 PASSED [ 51%]
2023-01-11T23:10:17.1363914Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1364093Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1364341Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1364511Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1364682Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1364866Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_complex32 PASSED [ 51%]
2023-01-11T23:10:17.1365043Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1365220Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1365394Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1365569Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1365739Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1365911Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1366079Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1366250Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1366422Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1366596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1366799Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1366978Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_bfloat16 PASSED [ 51%]
2023-01-11T23:10:17.1367155Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1367339Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex128 PASSED [ 51%]
2023-01-11T23:10:17.1367520Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex32 PASSED [ 51%]
2023-01-11T23:10:17.1367691Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1367865Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1368038Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int64 PASSED [ 51%]
2023-01-11T23:10:17.1368209Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1368391Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex128 PASSED [ 51%]
2023-01-11T23:10:17.1368570Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex32 PASSED [ 51%]
2023-01-11T23:10:17.1368752Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1368928Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1369099Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1369265Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1369436Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int64 PASSED [ 51%]
2023-01-11T23:10:17.1369606Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1369799Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1369972Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_bfloat16 PASSED [ 51%]
2023-01-11T23:10:17.1370170Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_complex128 PASSED [ 51%]
2023-01-11T23:10:17.1370368Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1370541Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1370715Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1370879Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1371047Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1371220Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int64 PASSED [ 51%]
2023-01-11T23:10:17.1371390Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1371567Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_bfloat16 PASSED [ 51%]
2023-01-11T23:10:17.1371746Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1371918Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1372099Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_bfloat16 PASSED [ 51%]
2023-01-11T23:10:17.1372301Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_bool PASSED [ 52%]
2023-01-11T23:10:17.1372474Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_complex128 PASSED [ 52%]
2023-01-11T23:10:17.1372651Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1372819Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int16 PASSED [ 52%]
2023-01-11T23:10:17.1372986Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int32 PASSED [ 52%]
2023-01-11T23:10:17.1373151Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int64 PASSED [ 52%]
2023-01-11T23:10:17.1373321Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int8 PASSED [ 52%]
2023-01-11T23:10:17.1373488Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_uint8 PASSED [ 52%]
2023-01-11T23:10:17.1373651Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_bfloat16 PASSED [ 52%]
2023-01-11T23:10:17.1373809Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_complex128 PASSED [ 52%]
2023-01-11T23:10:17.1373973Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_complex64 PASSED [ 52%]
2023-01-11T23:10:17.1374133Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1374291Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int32 PASSED [ 52%]
2023-01-11T23:10:17.1374445Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int64 PASSED [ 52%]
2023-01-11T23:10:17.1374704Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int8 PASSED [ 52%]
2023-01-11T23:10:17.1374874Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex128 PASSED [ 52%]
2023-01-11T23:10:17.1375036Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex32 PASSED [ 52%]
2023-01-11T23:10:17.1375200Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_float16 PASSED [ 52%]
2023-01-11T23:10:17.1375352Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1375553Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int8 PASSED [ 52%]
2023-01-11T23:10:17.1375712Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_uint8 PASSED [ 52%]
2023-01-11T23:10:17.1375879Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_bfloat16 PASSED [ 52%]
2023-01-11T23:10:17.1376046Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_complex128 PASSED [ 52%]
2023-01-11T23:10:17.1376210Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_complex32 PASSED [ 52%]
2023-01-11T23:10:17.1376374Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_complex64 PASSED [ 52%]
2023-01-11T23:10:17.1376538Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_float64 PASSED [ 52%]
2023-01-11T23:10:17.1376691Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int16 PASSED [ 52%]
2023-01-11T23:10:17.1376853Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int32 PASSED [ 52%]
2023-01-11T23:10:17.1377014Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int8 PASSED [ 52%]
2023-01-11T23:10:17.1377172Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_bool PASSED [ 52%]
2023-01-11T23:10:17.1377334Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_complex128 PASSED [ 52%]
2023-01-11T23:10:17.1377495Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_complex64 PASSED [ 52%]
2023-01-11T23:10:17.1377655Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1377812Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_float64 PASSED [ 52%]
2023-01-11T23:10:17.1377999Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int16 PASSED [ 52%]
2023-01-11T23:10:17.1378146Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int32 PASSED [ 52%]
2023-01-11T23:10:17.1378317Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_complex64 PASSED [ 52%]
2023-01-11T23:10:17.1378481Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float64 PASSED [ 52%]
2023-01-11T23:10:17.1378647Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_bfloat16 PASSED [ 52%]
2023-01-11T23:10:17.1378817Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_complex64 PASSED [ 52%]
2023-01-11T23:10:17.1378980Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float64 PASSED [ 52%]
2023-01-11T23:10:17.1379142Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int16 PASSED [ 52%]
2023-01-11T23:10:17.1379305Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int32 PASSED [ 52%]
2023-01-11T23:10:17.1379462Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int64 PASSED [ 52%]
2023-01-11T23:10:17.1379619Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_uint8 PASSED [ 52%]
2023-01-11T23:10:17.1379786Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_complex128 PASSED [ 52%]
2023-01-11T23:10:17.1379948Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_complex64 PASSED [ 52%]
2023-01-11T23:10:17.1380108Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float16 PASSED [ 52%]
2023-01-11T23:10:17.1380268Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1380429Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float64 PASSED [ 52%]
2023-01-11T23:10:17.1380587Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int8 PASSED [ 52%]
2023-01-11T23:10:17.1380738Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_uint8 PASSED [ 52%]
2023-01-11T23:10:17.1380899Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_bool PASSED [ 52%]
2023-01-11T23:10:17.1381081Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float16 PASSED [ 52%]
2023-01-11T23:10:17.1381244Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1381403Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int16 PASSED [ 52%]
2023-01-11T23:10:17.1381556Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int64 PASSED [ 52%]
2023-01-11T23:10:17.1381709Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_uint8 PASSED [ 52%]
2023-01-11T23:10:17.1381879Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_complex64 PASSED [ 52%]
2023-01-11T23:10:17.1382046Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_float16 PASSED [ 52%]
2023-01-11T23:10:17.1382207Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1382363Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_bool PASSED [ 52%]
2023-01-11T23:10:17.1382525Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float16 PASSED [ 52%]
2023-01-11T23:10:17.1382682Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1382847Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_uint8 PASSED [ 52%]
2023-01-11T23:10:17.1383006Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_bfloat16 PASSED [ 52%]
2023-01-11T23:10:17.1383164Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_bool PASSED [ 52%]
2023-01-11T23:10:17.1383324Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_float16 PASSED [ 52%]
2023-01-11T23:10:17.1383474Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1383654Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_float64 PASSED [ 52%]
2023-01-11T23:10:17.1383811Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int64 PASSED [ 52%]
2023-01-11T23:10:17.1383971Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int8 PASSED [ 52%]
2023-01-11T23:10:17.1384128Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_uint8 PASSED [ 52%]
2023-01-11T23:10:17.1384290Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_bfloat16 PASSED [ 52%]
2023-01-11T23:10:17.1384451Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_complex64 PASSED [ 52%]
2023-01-11T23:10:17.1384609Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float64 PASSED [ 52%]
2023-01-11T23:10:17.1384759Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int16 PASSED [ 52%]
2023-01-11T23:10:17.1384913Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int64 PASSED [ 52%]
2023-01-11T23:10:17.1385070Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_uint8
PASSED [ 52%] 2023-01-11T23:10:17.1385235Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:10:17.1385400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_float32 PASSED [ 52%] 2023-01-11T23:10:17.1385557Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int16 PASSED [ 52%] 2023-01-11T23:10:17.1385715Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1385873Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int8 PASSED [ 52%] 2023-01-11T23:10:17.1386031Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_uint8 PASSED [ 52%] 2023-01-11T23:10:17.1386197Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_complex128 PASSED [ 52%] 2023-01-11T23:10:17.1386368Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_complex64 PASSED [ 52%] 2023-01-11T23:10:17.1386537Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int64 PASSED [ 52%] 2023-01-11T23:10:17.1386723Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int8 PASSED [ 52%] 2023-01-11T23:10:17.1386910Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:10:17.1387090Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_bool PASSED [ 52%] 2023-01-11T23:10:17.1387277Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float16 PASSED [ 52%] 2023-01-11T23:10:17.1387457Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1387636Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int8 PASSED [ 52%] 2023-01-11T23:10:17.1387810Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex64 PASSED [ 52%] 2023-01-11T23:10:17.1387986Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float16 PASSED [ 52%] 2023-01-11T23:10:17.1388162Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1388334Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int64 PASSED [ 52%] 2023-01-11T23:10:17.1388507Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_uint8 PASSED [ 52%] 2023-01-11T23:10:17.1388746Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:10:17.1388925Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_bool PASSED [ 52%] 2023-01-11T23:10:17.1389090Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex128 PASSED [ 52%] 2023-01-11T23:10:17.1389246Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex32 PASSED [ 52%] 2023-01-11T23:10:17.1389436Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_float64 PASSED [ 52%] 2023-01-11T23:10:17.1389593Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int8 PASSED [ 52%] 2023-01-11T23:10:17.1389758Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_bool PASSED [ 52%] 2023-01-11T23:10:17.1389923Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex128 PASSED [ 52%] 2023-01-11T23:10:17.1390095Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex32 PASSED [ 52%] 2023-01-11T23:10:17.1390281Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_float32 PASSED [ 52%] 2023-01-11T23:10:17.1390466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1390625Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int64 PASSED [ 52%] 2023-01-11T23:10:17.1390779Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int8 PASSED [ 52%] 2023-01-11T23:10:17.1390939Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_uint8 PASSED [ 52%] 2023-01-11T23:10:17.1391102Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_bool PASSED [ 52%] 2023-01-11T23:10:17.1391262Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float16 PASSED [ 52%] 2023-01-11T23:10:17.1391418Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int16 PASSED [ 52%] 2023-01-11T23:10:17.1391572Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int64 PASSED [ 52%] 2023-01-11T23:10:17.1391728Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int8 PASSED [ 52%] 2023-01-11T23:10:17.1391885Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_uint8 PASSED [ 52%] 2023-01-11T23:10:17.1392034Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_bool PASSED [ 52%] 2023-01-11T23:10:17.1392202Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex128 PASSED [ 52%] 2023-01-11T23:10:17.1392365Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex32 PASSED [ 52%] 2023-01-11T23:10:17.1392548Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float64 PASSED [ 52%] 2023-01-11T23:10:17.1392706Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int16 PASSED [ 52%] 2023-01-11T23:10:17.1392859Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1393013Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_uint8 PASSED [ 52%] 2023-01-11T23:10:17.1393178Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_complex128 PASSED [ 52%] 2023-01-11T23:10:17.1393334Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_complex32 PASSED [ 52%] 2023-01-11T23:10:17.1393494Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_float64 PASSED [ 52%] 2023-01-11T23:10:17.1393653Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int16 PASSED [ 52%] 2023-01-11T23:10:17.1393810Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1393968Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int64 PASSED [ 52%] 2023-01-11T23:10:17.1394136Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float32 PASSED [ 52%] 2023-01-11T23:10:17.1394299Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int16 PASSED [ 52%] 2023-01-11T23:10:17.1394461Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1394620Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int8 PASSED [ 52%] 2023-01-11T23:10:17.1394772Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_uint8 PASSED [ 52%] 2023-01-11T23:10:17.1394930Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_bool PASSED [ 52%] 2023-01-11T23:10:17.1395123Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_complex32 PASSED [ 52%] 2023-01-11T23:10:17.1395292Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float16 PASSED [ 52%] 2023-01-11T23:10:17.1395456Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int16 PASSED [ 52%] 2023-01-11T23:10:17.1395624Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1395783Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int64 PASSED [ 52%] 2023-01-11T23:10:17.1395943Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_uint8 PASSED [ 52%] 2023-01-11T23:10:17.1396105Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:10:17.1396266Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_bool PASSED [ 52%] 2023-01-11T23:10:17.1396440Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex32 PASSED [ 52%] 2023-01-11T23:10:17.1396612Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex64 PASSED [ 52%] 2023-01-11T23:10:17.1396779Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float16 PASSED [ 52%] 2023-01-11T23:10:17.1396942Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float32 PASSED [ 52%] 2023-01-11T23:10:17.1397104Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1397265Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int64 PASSED [ 52%] 2023-01-11T23:10:17.1397431Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1397583Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int8 PASSED [ 52%] 2023-01-11T23:10:17.1397746Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_uint8 PASSED [ 52%] 2023-01-11T23:10:17.1397919Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int16 PASSED [ 52%] 2023-01-11T23:10:17.1398108Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int64 PASSED [ 52%] 2023-01-11T23:10:17.1398272Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int16 PASSED [ 53%] 2023-01-11T23:10:17.1398435Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1398595Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1398756Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1398920Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1399071Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int16 PASSED [ 53%] 2023-01-11T23:10:17.1399250Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int16 PASSED [ 53%] 2023-01-11T23:10:17.1399413Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1399570Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int64 PASSED [ 53%] 
2023-01-11T23:10:17.1399747Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:10:17.1399926Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1400105Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1400275Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1400441Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1400637Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1400814Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1400984Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1401152Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1401319Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1401485Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1401650Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1401817Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1401976Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int16 PASSED [ 53%] 2023-01-11T23:10:17.1402137Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1402301Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1402459Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1402621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1402782Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1402946Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1403107Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1403256Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1403419Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1403611Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1403771Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1403924Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1404093Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1404257Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1404415Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1404576Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1404733Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1404896Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int16 PASSED [ 53%] 2023-01-11T23:10:17.1405063Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1405219Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1405377Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1405541Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1405706Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1405868Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int16 PASSED [ 53%] 2023-01-11T23:10:17.1406022Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1406206Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1406365Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1406528Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1406692Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1406858Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex32 PASSED [ 53%] 2023-01-11T23:10:17.1407021Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1407180Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1407339Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1407491Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1407657Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1407830Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:10:17.1408004Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1408171Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1408341Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1408504Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1408670Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_complex32 PASSED [ 53%] 2023-01-11T23:10:17.1408826Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1408992Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1409154Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1409333Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int16 PASSED [ 
53%] 2023-01-11T23:10:17.1409507Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:10:17.1409675Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1409846Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1410016Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1410183Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1410349Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1410528Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1410708Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1410878Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1411046Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1411216Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1411382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1411550Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1411734Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex32 PASSED [ 53%] 2023-01-11T23:10:17.1411905Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1412073Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1412238Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1412401Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1412567Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1412730Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1412895Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1413053Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1413210Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1413369Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1413531Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1413693Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1413851Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1414009Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1414180Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_complex128 PASSED [ 53%] 
2023-01-11T23:10:17.1414341Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1414582Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1414750Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1414911Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:10:17.1415113Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1415277Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1415434Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1415602Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:10:17.1415772Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1415940Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1416098Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int16 PASSED [ 53%] 2023-01-11T23:10:17.1416261Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1416423Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1416589Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1416749Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1416923Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1417103Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_complex32 PASSED [ 53%] 2023-01-11T23:10:17.1417274Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1417438Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1417636Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1417808Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1417974Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1418144Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_complex32 PASSED [ 53%] 2023-01-11T23:10:17.1418307Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1418470Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1418631Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1418791Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1418948Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1419110Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1419293Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1419466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1419634Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1419805Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1419977Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1420138Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1420302Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1420456Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1420640Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1420820Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:10:17.1420996Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1421167Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1421342Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1421510Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1421695Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:10:17.1421864Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1422049Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_complex32 PASSED [ 53%] 2023-01-11T23:10:17.1422231Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1429391Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1429601Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1429784Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1429962Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1430194Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1430364Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:10:17.1430544Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1430717Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1430888Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1431053Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_bool 
PASSED [ 53%] 2023-01-11T23:10:17.1431221Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1431382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1431546Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1431711Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1431872Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex64 PASSED [ 54%] 2023-01-11T23:10:17.1432035Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1432198Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1432356Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1432515Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1432675Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_bfloat16 PASSED [ 54%] 2023-01-11T23:10:17.1432834Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1433003Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1433160Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex64 PASSED [ 54%] 2023-01-11T23:10:17.1433349Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1433512Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1433673Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1433833Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1433995Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1434159Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1434335Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1434503Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1434659Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1434816Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1434978Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1435139Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_complex64 PASSED [ 54%] 2023-01-11T23:10:17.1435296Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1435451Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1435604Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1435783Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1435935Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_bfloat16 PASSED [ 54%] 
2023-01-11T23:10:17.1436099Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1436259Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1436415Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1436574Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_bfloat16 PASSED [ 54%] 2023-01-11T23:10:17.1436731Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1436886Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1437042Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1437190Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1437349Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1437513Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1437669Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int16 PASSED [ 54%] 2023-01-11T23:10:17.1437822Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1437979Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1438136Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_bfloat16 PASSED [ 54%] 2023-01-11T23:10:17.1438290Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1438443Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1438596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1438752Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int16 PASSED [ 54%] 2023-01-11T23:10:17.1438933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1439094Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_bfloat16 PASSED [ 54%] 2023-01-11T23:10:17.1439256Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1439417Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex64 PASSED [ 54%] 2023-01-11T23:10:17.1439574Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1439731Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1439882Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1440040Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1440196Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1440351Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1440513Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1440679Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1440843Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1441005Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1441164Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1441344Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1441512Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1441674Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1441833Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1441997Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int16 PASSED [ 54%] 2023-01-11T23:10:17.1442155Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1442312Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1442472Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1442621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1442784Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1442944Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1443100Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1443256Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1443413Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1443572Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1443725Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1443876Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1444032Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1444189Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1444350Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1444544Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1444711Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1444874Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1445035Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1445192Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1445343Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1445505Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1445664Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1445831Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1445991Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1446148Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int16 PASSED [ 54%] 2023-01-11T23:10:17.1446308Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1446463Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1446621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1446786Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex64 PASSED [ 54%] 2023-01-11T23:10:17.1446974Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1447136Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1447295Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1447453Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1447619Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1447788Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1447956Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1448112Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int16 PASSED [ 54%] 2023-01-11T23:10:17.1448279Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1448438Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1448598Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1448768Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1448932Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1449098Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex64 PASSED [ 54%] 2023-01-11T23:10:17.1449258Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1449409Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1449571Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1449732Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1449932Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1450124Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1450303Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int16 PASSED [ 54%] 2023-01-11T23:10:17.1450462Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1450621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1450780Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1450942Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1451109Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1451269Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1451430Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1451592Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1451753Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1451916Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1452083Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1452235Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1452397Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1452582Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int16 PASSED [ 54%] 2023-01-11T23:10:17.1452746Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1452905Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1453063Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1453222Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1453389Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1453556Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1453715Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex64 PASSED [ 54%] 2023-01-11T23:10:17.1453882Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1454046Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1454207Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1454365Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1454738Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1454902Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1455070Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1455232Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1455384Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1455547Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1455755Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int16 PASSED [ 54%] 2023-01-11T23:10:17.1455917Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1456076Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1456235Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1456408Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_bfloat16 PASSED [ 54%] 2023-01-11T23:10:17.1456582Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1456747Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1456919Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1457088Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1457257Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1457426Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1457589Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1457752Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1457913Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1458073Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1458298Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1458456Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1458625Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1458788Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1458947Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1459108Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1459267Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1459427Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1459581Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1459749Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1459911Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1460069Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1460228Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1460399Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_complex128 PASSED [ 55%] 2023-01-11T23:10:17.1460567Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_complex32 PASSED [ 55%] 2023-01-11T23:10:17.1460727Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1460884Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1461038Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1461204Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex128 PASSED [ 55%] 2023-01-11T23:10:17.1461430Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1461596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1461755Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1461915Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1462085Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex128 PASSED [ 55%] 2023-01-11T23:10:17.1462254Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex32 PASSED [ 55%] 2023-01-11T23:10:17.1462417Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex64 PASSED [ 55%] 2023-01-11T23:10:17.1462582Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1462741Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1462902Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1463062Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1463221Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1463381Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1463539Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1463694Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1463876Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1464033Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1464196Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1464354Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1464511Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1464668Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1464829Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1464989Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1465145Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex128 PASSED [ 55%] 2023-01-11T23:10:17.1465307Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex32 PASSED [ 55%] 2023-01-11T23:10:17.1465472Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1465631Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1465788Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1465943Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1466096Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1466267Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex128 PASSED [ 55%] 2023-01-11T23:10:17.1466430Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex64 PASSED [ 55%] 2023-01-11T23:10:17.1466585Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1466744Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1466933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1467093Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1467253Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1467411Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1467567Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1467722Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1467877Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_complex64 PASSED [ 55%] 2023-01-11T23:10:17.1468040Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1468203Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1468364Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1468524Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1468757Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1468937Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1469117Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1469295Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1469462Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1469660Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1469825Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1469985Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1470153Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1470321Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1470486Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1470646Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1470798Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1470960Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1471119Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1471283Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1471455Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1471623Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1471787Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1471951Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1472112Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1472264Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1472423Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1472578Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1472758Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1472920Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1473079Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1473235Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1473389Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1473541Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1473703Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1473859Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1474019Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1474176Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1474332Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1474485Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1474643Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1474794Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1474948Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1475129Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1475282Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1475438Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1475589Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1475741Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1475899Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1476048Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1476190Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1476338Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1476491Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1476660Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1476823Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1476989Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1477152Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1477316Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1477467Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1477629Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1477793Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_complex32 PASSED [ 55%] 2023-01-11T23:10:17.1477956Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1478118Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1478300Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1478460Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1478622Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1478777Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1478937Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1479100Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_complex64 PASSED [ 55%] 2023-01-11T23:10:17.1479259Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1479420Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1479578Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1479738Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1479897Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1480059Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1480210Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1480366Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1480520Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1480696Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1480853Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1481009Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1481164Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1481325Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igamma_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1481480Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_imag_cuda_complex128 PASSED [ 55%] 2023-01-11T23:10:17.1481642Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_imag_cuda_complex32 PASSED [ 55%] 2023-01-11T23:10:17.1481807Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1481976Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_complex128 PASSED [ 55%] 2023-01-11T23:10:17.1482148Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_complex32 PASSED [ 55%] 2023-01-11T23:10:17.1482309Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1482470Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1482631Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1482789Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1482940Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1483110Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1483282Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1483448Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1483614Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1483798Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1483959Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1484119Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1484276Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_bfloat16 PASSED [ 56%] 
2023-01-11T23:10:17.1484436Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1484604Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1484769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1484930Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1485087Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1485248Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1485405Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1485576Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1485743Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1485915Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex32 PASSED [ 56%] 2023-01-11T23:10:17.1486082Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1486275Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1486441Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1486604Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1486769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1486933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1487086Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1487248Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1487409Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1487574Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1487742Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1487910Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1488076Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1488238Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1488398Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1488550Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1488711Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1488869Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1489037Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1489196Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1489389Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1489551Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1489709Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1489859Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1490017Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1490177Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1490340Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1490500Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1490659Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1490817Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1490974Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1491139Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1491294Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1491452Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1491610Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1491773Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1491955Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1492114Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1492278Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1492439Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1492588Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1492745Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1492904Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1493062Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1493223Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1493383Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1493540Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1493697Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1493849Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1493993Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1494150Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1494304Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1494466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1494733Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1494892Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_complex32 PASSED [ 56%] 2023-01-11T23:10:17.1495089Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1495250Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1495400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1495566Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1495723Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1495901Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1496074Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1496246Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1496424Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1496596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1496764Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1496926Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1497093Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1497269Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1497443Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1497647Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1497830Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1498008Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1498182Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1498343Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1498509Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1498680Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1498845Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float32 PASSED [ 56%] 
2023-01-11T23:10:17.1499009Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1499171Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1499335Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1499498Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1499657Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1499809Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1499968Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1500128Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1500284Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1500449Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1500632Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1500792Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1500948Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1501095Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1501251Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1501417Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1501577Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1501739Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1501899Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1502060Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1502214Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1502366Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1502514Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1502672Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1502829Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1502993Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1503181Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1503340Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1503500Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1503659Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1503809Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1503989Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1504174Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1504354Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1504537Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1504714Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1504894Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1505069Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1505238Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1505395Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1505567Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1505734Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1505903Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1506069Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1506256Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1506420Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1506579Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1506751Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1506915Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1507081Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1507245Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1507411Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1507574Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1507739Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1507901Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1508069Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1508234Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1508400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1508565Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1508828Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1509011Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1509178Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1509337Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1509498Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1509664Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1509817Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1509980Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1510146Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1510307Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1510476Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1510637Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1510800Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1510961Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1511111Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1511269Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1511426Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1511582Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1511736Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1511912Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1512082Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1512248Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1512418Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1512582Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex32 PASSED [ 57%] 2023-01-11T23:10:17.1512750Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1512915Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1513080Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1513244Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1513404Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1513567Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1513728Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1513881Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1514043Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1514208Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1514397Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1514557Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1514722Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1514908Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1515092Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1515279Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1515453Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1515637Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1515819Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1515997Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1516176Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1516355Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1516537Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1516722Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1516906Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1517080Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1517267Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1517473Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1517658Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1517819Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1517981Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1518144Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1518303Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1518464Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1518619Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1518785Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1518956Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1519123Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex32 PASSED [ 57%] 2023-01-11T23:10:17.1519287Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1519450Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1519614Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1519773Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1519953Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1520138Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1520326Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1520485Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_complex32 PASSED [ 57%] 2023-01-11T23:10:17.1520647Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1520803Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1520955Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1521107Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1521275Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1521433Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1521596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1521756Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1521914Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1522075Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1522247Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex32 PASSED [ 57%] 2023-01-11T23:10:17.1522417Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1522585Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1522744Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1522908Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1523093Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1523256Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1523415Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1523577Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1523738Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1523902Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1524062Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1524216Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1524375Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1524552Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1524723Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1524891Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1525051Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1525210Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1525364Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1525538Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1525725Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1525879Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1526039Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1526202Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1526363Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex32 PASSED [ 57%] 2023-01-11T23:10:17.1526522Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1526681Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1526829Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1526985Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1527142Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1527304Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1527473Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1527641Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex32 PASSED [ 57%] 
2023-01-11T23:10:17.1527809Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1527971Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1528132Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1528283Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1528465Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1528676Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1528855Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1529024Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1529195Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1529364Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1529529Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1529701Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1529860Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1530026Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1530210Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1530394Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1530555Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1530716Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1530880Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1531039Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1531223Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1531383Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1531544Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1531713Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1531884Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex32 PASSED [ 57%] 2023-01-11T23:10:17.1532049Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1532213Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1532375Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1532533Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float64 PASSED [ 57%]
2023-01-11T23:10:17.1532687Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1532850Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1533009Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1533174Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1533363Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1533549Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1533726Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1533898Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1534081Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1534269Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1534444Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1534715Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1534888Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1535056Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1535222Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1535413Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1535596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1535771Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1535950Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1536130Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1536308Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1536502Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1536695Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1536922Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1537110Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1537291Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1537471Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1537644Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1537822Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1537999Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1538185Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1538363Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1538542Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1538718Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1538917Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1539115Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex32 PASSED [ 58%] 2023-01-11T23:10:17.1539305Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1539505Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1539697Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1539918Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1540104Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1540287Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1540477Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1540661Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1540851Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1541027Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1541200Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1541382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1541560Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1541753Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1541944Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1542134Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1542351Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1542546Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1542733Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1542913Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_uint8 PASSED [ 58%] 2023-01-11T23:10:17.1543092Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pdist_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1543281Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1543468Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1543653Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1543839Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1544022Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1544202Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1544380Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1544547Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1544719Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1544897Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1545067Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1545305Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1545486Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1545658Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1545830Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1546000Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1546183Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1546380Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1546571Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1546762Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float32 PASSED [ 58%]
2023-01-11T23:10:17.1546943Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1547129Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1547312Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int64 PASSED [ 58%] 2023-01-11T23:10:17.1547499Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_uint8 PASSED [ 58%] 2023-01-11T23:10:17.1547721Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1547905Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1548096Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1548279Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1548460Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1548644Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int64 PASSED [ 58%] 2023-01-11T23:10:17.1548928Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1549119Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_uint8 PASSED [ 58%] 2023-01-11T23:10:17.1549300Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1549488Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1549671Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1549843Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1550030Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1550211Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1550391Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1550572Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1550778Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_uint8 PASSED [ 58%] 2023-01-11T23:10:17.1550961Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1551140Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1551318Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1551486Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1551664Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1551857Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1552043Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1552236Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1552419Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1552606Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_uint8 PASSED [ 58%] 2023-01-11T23:10:17.1552767Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1552930Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1553084Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1553268Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1553427Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1553590Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1553755Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1553917Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_complex32 PASSED [ 58%] 2023-01-11T23:10:17.1554078Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1554237Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1554385Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1554546Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1554714Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1554884Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex32 PASSED [ 58%] 2023-01-11T23:10:17.1555052Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1555216Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1555380Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1555546Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1555713Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1555873Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_complex32 PASSED [ 58%] 2023-01-11T23:10:17.1556045Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1556211Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1556400Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1556562Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1556723Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1556884Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int64 PASSED [ 58%] 2023-01-11T23:10:17.1557045Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1557200Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_complex32 PASSED [ 58%] 2023-01-11T23:10:17.1557363Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1557522Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1557679Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int64 PASSED [ 58%] 2023-01-11T23:10:17.1557835Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1557988Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_bool PASSED [ 58%] 2023-01-11T23:10:17.1558154Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1558316Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1558478Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1558631Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1558813Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1558971Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1559127Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1559279Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int64 PASSED [ 58%] 2023-01-11T23:10:17.1559432Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_uint8 PASSED [ 58%] 2023-01-11T23:10:17.1559596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_complex32 PASSED [ 58%] 2023-01-11T23:10:17.1559757Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1559910Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1560071Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1560231Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_bool PASSED [ 58%] 2023-01-11T23:10:17.1560395Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1560554Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1560713Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1560874Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1561033Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_uint8 PASSED [ 58%] 2023-01-11T23:10:17.1561191Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_bool PASSED [ 58%] 2023-01-11T23:10:17.1561347Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1561507Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex32 PASSED [ 58%] 2023-01-11T23:10:17.1561672Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1561835Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1562017Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1562179Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1562331Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1562482Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int64 PASSED [ 58%] 2023-01-11T23:10:17.1562630Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1562800Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1562964Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1563140Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_complex128 PASSED [ 59%] 2023-01-11T23:10:17.1563306Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1563470Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1563637Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1563797Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1563951Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1564109Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1564265Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1564448Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1564607Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1564771Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1564932Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1565102Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1565274Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_complex128 PASSED [ 59%] 2023-01-11T23:10:17.1565432Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1565598Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1565763Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1565923Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_bfloat16 PASSED [ 
59%] 2023-01-11T23:10:17.1566085Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1566253Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_complex128 PASSED [ 59%] 2023-01-11T23:10:17.1566418Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_complex32 PASSED [ 59%] 2023-01-11T23:10:17.1566582Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_complex64 PASSED [ 59%] 2023-01-11T23:10:17.1566745Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1566899Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1567061Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1567226Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1567387Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_complex64 PASSED [ 59%] 2023-01-11T23:10:17.1567580Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1567739Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1567897Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1568052Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1568204Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1568361Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1568520Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1568679Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1568840Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1569001Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1569158Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1569314Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1569462Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1569626Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1569782Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1569939Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1570122Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1570289Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_complex128 PASSED [ 59%] 2023-01-11T23:10:17.1570449Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1570605Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1570774Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_bool PASSED [ 59%] 
2023-01-11T23:10:17.1570946Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1571115Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1571271Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1571430Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1571594Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1571765Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_complex128 PASSED [ 59%] 2023-01-11T23:10:17.1571932Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_complex64 PASSED [ 59%] 2023-01-11T23:10:17.1572094Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1572247Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1572405Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1572565Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1572724Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1572885Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1573045Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1573233Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1573394Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1573548Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1573710Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1573869Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1574028Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_complex64 PASSED [ 59%] 2023-01-11T23:10:17.1574184Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1574345Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1574608Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1574773Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1574924Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1575077Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_complex128 PASSED [ 59%] 2023-01-11T23:10:17.1575235Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1575393Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1575546Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1575755Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1575916Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex64 PASSED [ 59%] 2023-01-11T23:10:17.1576081Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1576238Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1576385Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1576540Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1576721Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_complex64 PASSED [ 59%] 2023-01-11T23:10:17.1576901Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1577075Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1577247Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1577414Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1577590Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1577759Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1577921Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1578085Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1578250Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1578419Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1578597Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1578765Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1578958Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1579123Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1579290Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1579453Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1579621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1579793Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1579964Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1580132Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1580301Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1580466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1580634Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1580792Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1580956Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1581126Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1581314Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1581478Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1581644Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1581805Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1581965Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1582124Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1582281Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1582449Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1582611Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1582773Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1582944Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1583116Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1583285Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1583454Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1583640Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1583825Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_complex64 PASSED [ 59%] 2023-01-11T23:10:17.1584014Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1584205Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1584417Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1584604Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1584790Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1584960Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1585131Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1585302Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float32 PASSED 
[ 59%] 2023-01-11T23:10:17.1585465Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1585634Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1585833Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1586022Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1586211Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1586395Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1586576Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1586781Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1586968Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1587148Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1587334Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1587518Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1587703Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1587890Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1588058Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1588228Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1588399Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1588566Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1588799Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1588965Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1589128Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1589299Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1589470Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1589638Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1589807Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1590015Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1590206Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_complex128 PASSED [ 60%] 2023-01-11T23:10:17.1590387Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1590573Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1590754Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1590939Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1591119Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1591293Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1591461Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1591637Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1591806Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1591968Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1592141Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1592334Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1592500Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1592672Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1592838Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1593003Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1593167Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1593322Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1593485Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_complex32 PASSED [ 60%] 2023-01-11T23:10:17.1593649Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1593810Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1593966Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1594121Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1594285Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1594446Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1594609Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_complex128 PASSED [ 60%] 2023-01-11T23:10:17.1594766Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1594924Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1595085Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1595245Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1595433Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_complex128 PASSED [ 60%] 2023-01-11T23:10:17.1595600Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_complex32 PASSED [ 60%] 2023-01-11T23:10:17.1595764Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1595925Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1596077Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1596238Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1596397Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1596566Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex128 PASSED [ 60%] 2023-01-11T23:10:17.1596728Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1596888Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1597046Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1597209Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1597370Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1597521Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_complex128 PASSED [ 60%] 2023-01-11T23:10:17.1597679Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1597846Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1598034Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1598197Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1598355Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1598513Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1598670Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1598822Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1598980Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex32 PASSED [ 60%] 2023-01-11T23:10:17.1599137Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1599293Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1599466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1599629Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1599801Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_complex128 PASSED [ 60%] 2023-01-11T23:10:17.1599967Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1600131Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1600283Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1600440Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1600592Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1600747Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1600904Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1601081Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1601232Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1601382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1601531Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1601692Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1601851Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1602009Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1602169Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1602328Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1602484Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1602634Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1602788Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex32 PASSED [ 60%] 2023-01-11T23:10:17.1602947Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1603107Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1603263Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1603418Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1603611Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_bfloat16 XFAIL [ 60%] 2023-01-11T23:10:17.1603784Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_complex128 XFAIL [ 60%] 2023-01-11T23:10:17.1603959Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_complex64 XFAIL [ 60%] 2023-01-11T23:10:17.1604126Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float16 XFAIL [ 60%] 2023-01-11T23:10:17.1604283Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int16 XFAIL [ 60%] 2023-01-11T23:10:17.1604447Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int8 XFAIL [ 60%] 2023-01-11T23:10:17.1604608Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_uint8 XFAIL [ 60%] 2023-01-11T23:10:17.1604774Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1604936Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1605091Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1605248Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1605398Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1605547Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1605700Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1605854Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1606007Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1606170Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex32 PASSED [ 60%] 2023-01-11T23:10:17.1606332Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1606490Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1606674Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1606825Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1606985Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1607152Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1607316Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1607479Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1607641Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1607800Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1607962Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_complex32 PASSED [ 60%] 2023-01-11T23:10:17.1608126Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1608279Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1608436Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1608593Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1608747Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1608914Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_indices_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1609096Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1609258Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex32 PASSED [ 60%] 2023-01-11T23:10:17.1609419Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1609571Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1609730Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1609893Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_indices_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1610059Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_indices_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1610227Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1610393Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1610555Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1610721Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1610906Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1611073Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1611237Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1611395Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1611561Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1611719Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1611880Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1612040Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1612200Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1612382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1612551Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1612715Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1612875Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1613034Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1613193Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1613361Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1613523Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1613698Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1613860Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1614026Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1614191Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1614349Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1614610Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex128 PASSED [ 60%] 2023-01-11T23:10:17.1614777Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1614980Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1615146Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1615299Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1615466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1615633Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1615803Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1615968Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1616131Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1616297Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1616461Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1616624Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int64 PASSED [ 61%] 2023-01-11T23:10:17.1616780Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1616943Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1617107Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1617272Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1617440Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_complex128 PASSED [ 61%] 2023-01-11T23:10:17.1617605Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1617773Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1617965Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1618120Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1618283Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1618443Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1618603Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1618767Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1618926Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1619092Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1619254Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1619417Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1619571Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1619729Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1619883Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int64 PASSED [ 61%] 2023-01-11T23:10:17.1620040Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1620205Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1620365Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1620553Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1620717Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1620873Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1621032Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1621193Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1621358Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_complex128 PASSED [ 61%] 2023-01-11T23:10:17.1621523Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_complex32 PASSED [ 61%] 2023-01-11T23:10:17.1621684Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1621842Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1622010Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_complex128 PASSED [ 61%] 2023-01-11T23:10:17.1622175Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1622330Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1622491Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1622652Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1622810Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1622971Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1623130Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1623291Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1623450Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1623626Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1623787Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1623951Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1624108Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1624271Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_complex32 PASSED [ 61%] 2023-01-11T23:10:17.1624433Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1624593Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1624758Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1624937Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_native_batch_norm_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1625126Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_native_batch_norm_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1625300Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_var_mean_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1625474Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_var_mean_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1625640Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1625813Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_complex32 PASSED [ 61%] 2023-01-11T23:10:17.1625985Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1626177Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1626341Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1626497Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1626660Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1626838Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_native_batch_norm_cuda_bfloat16 XFAIL [ 61%] 2023-01-11T23:10:17.1627014Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_native_batch_norm_cuda_float16 XFAIL [ 61%] 2023-01-11T23:10:17.1627186Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_native_batch_norm_cuda_float32 XFAIL [ 61%] 2023-01-11T23:10:17.1627355Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_native_batch_norm_cuda_float64 XFAIL [ 61%] 2023-01-11T23:10:17.1627521Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_var_mean_cuda_float16 XFAIL [ 61%] 2023-01-11T23:10:17.1627679Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_var_mean_cuda_float32 XFAIL [ 61%] 2023-01-11T23:10:17.1627844Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_var_mean_cuda_float64 XFAIL [ 61%] 2023-01-11T23:10:17.1627999Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_bool XFAIL [ 61%] 2023-01-11T23:10:17.1628164Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_complex32 XFAIL [ 61%] 2023-01-11T23:10:17.1628325Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_float32 XFAIL [ 61%] 2023-01-11T23:10:17.1628485Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_int64 XFAIL [ 61%] 2023-01-11T23:10:17.1628643Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_int8 XFAIL [ 61%] 2023-01-11T23:10:17.1628885Z 
test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_uint8 XFAIL [ 61%] 2023-01-11T23:10:17.1629063Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1629231Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1629416Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1629583Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1629748Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int64 PASSED [ 61%] 2023-01-11T23:10:17.1629911Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1630106Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex32 PASSED [ 61%] 2023-01-11T23:10:17.1630298Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1630489Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1630679Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1630867Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1631048Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1631236Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1631422Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1631607Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1631861Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int64 PASSED [ 61%] 2023-01-11T23:10:17.1632045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1632229Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1632407Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1632596Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_complex128 PASSED [ 61%] 2023-01-11T23:10:17.1632773Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1632954Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1633138Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1633319Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1633514Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1633702Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1633893Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex32 PASSED [ 61%] 2023-01-11T23:10:17.1634081Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1634268Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1634457Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1634641Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1634854Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1635050Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex32 PASSED [ 61%] 2023-01-11T23:10:17.1635236Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1635422Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1635606Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int64 PASSED [ 61%] 2023-01-11T23:10:17.1635792Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1635987Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex128 PASSED [ 61%] 2023-01-11T23:10:17.1636177Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex32 PASSED [ 61%] 2023-01-11T23:10:17.1636358Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1636543Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1636730Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1636915Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1637103Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_complex128 PASSED [ 61%] 2023-01-11T23:10:17.1637293Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1637504Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1637693Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1637880Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1638053Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int64 PASSED [ 61%] 2023-01-11T23:10:17.1638236Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1638423Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1638615Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1638806Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1638994Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1639196Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1639379Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1639563Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex32 PASSED [ 61%] 2023-01-11T23:10:17.1639754Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1639934Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1640123Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1640312Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int64 PASSED [ 61%] 2023-01-11T23:10:17.1640519Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1640708Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1640920Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1641132Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1641319Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1641505Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1641682Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1641875Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1642060Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1642244Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1642425Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1642609Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1642796Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1643008Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_complex32 PASSED [ 62%] 2023-01-11T23:10:17.1643201Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1643380Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float16 PASSED [ 62%] 2023-01-11T23:10:17.1643564Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1643747Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1643934Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1644123Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1644307Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1644491Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1644685Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1644871Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1645047Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1645229Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1645409Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1645588Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1645762Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex32 PASSED [ 62%] 2023-01-11T23:10:17.1645967Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1646140Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_float16 PASSED [ 62%] 2023-01-11T23:10:17.1646314Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1646485Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1646646Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1646817Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1646988Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1647172Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1647346Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1647518Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1647688Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1647870Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_complex128 PASSED 
[ 62%] 2023-01-11T23:10:17.1648048Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1648210Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1648377Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1648574Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1648749Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1648918Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1649092Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1649262Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1649431Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1649597Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1649757Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1649942Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1650125Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1650307Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1650484Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1650665Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1650844Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float16 PASSED [ 62%] 2023-01-11T23:10:17.1651019Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1651191Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1651361Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1651553Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1651725Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1651892Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1652066Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1652242Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1652414Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1652583Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1652748Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1652921Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1653093Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1653261Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1653425Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1653594Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1653761Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1654450Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1654844Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1655071Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1655283Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1655455Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1655626Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1655796Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1655970Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1656143Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1656325Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1656499Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float16 PASSED [ 62%] 2023-01-11T23:10:17.1656664Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1656833Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1657001Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1657181Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1657357Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1657530Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1657712Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1657884Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1658124Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1658297Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1658469Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1658665Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1658858Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1659058Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1659255Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1659450Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1659639Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1659828Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_complex32 PASSED [ 62%] 2023-01-11T23:10:17.1660011Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_float16 PASSED [ 62%] 2023-01-11T23:10:17.1660197Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1660380Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1660587Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1660769Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1660946Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1661142Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float16 PASSED [ 62%] 2023-01-11T23:10:17.1661341Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1661507Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1661669Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1661840Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1662022Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex32 PASSED [ 62%] 2023-01-11T23:10:17.1662193Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1662364Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1662534Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1662698Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1662864Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1663032Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1663193Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_float64 
PASSED [ 62%] 2023-01-11T23:10:17.1663361Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1663550Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1663722Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1663889Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1664066Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex32 PASSED [ 62%] 2023-01-11T23:10:17.1664239Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1664402Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1664563Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1664733Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1664911Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1665091Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_complex32 PASSED [ 62%] 2023-01-11T23:10:17.1665264Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1665438Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float16 PASSED [ 62%] 2023-01-11T23:10:17.1665606Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1665772Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1665944Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1666133Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1666303Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1666473Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1666641Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1666823Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1667008Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1667192Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_complex32 PASSED [ 62%] 2023-01-11T23:10:17.1667373Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1667557Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1667732Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1667909Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1668088Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1668266Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1668436Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1668606Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1668881Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1669058Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1669266Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex32 PASSED [ 62%] 2023-01-11T23:10:17.1669444Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1669622Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float16 PASSED [ 62%] 2023-01-11T23:10:17.1669800Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1669974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1670146Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1670319Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1670495Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1670676Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1670853Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1671031Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1671215Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1671395Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1671576Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1671802Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1671982Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1672156Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1672328Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1672519Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1672699Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1672886Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int8 
PASSED [ 63%] 2023-01-11T23:10:17.1673071Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1673247Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1673423Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1673598Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1673784Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_shapes_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1673974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_bfloat16 PASSED [ 63%] 2023-01-11T23:10:17.1674158Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1674335Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1674524Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1674733Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1674919Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1675103Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1675287Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_bfloat16 PASSED [ 63%] 2023-01-11T23:10:17.1675464Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1675644Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1675826Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1676001Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1676185Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1676363Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1676541Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1676717Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1676889Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1677059Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1677257Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1677432Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_complex32 PASSED [ 63%] 2023-01-11T23:10:17.1677599Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1677772Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1677941Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1678108Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1678281Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1678448Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1678621Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1678795Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1678974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1679135Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1679304Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1679473Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1679648Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1679815Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1679988Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1680160Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1680350Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1680523Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1680697Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1680869Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1681037Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1681213Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1681393Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1681564Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1681738Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1681915Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex128 PASSED [ 63%] 2023-01-11T23:10:17.1682081Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex32 PASSED [ 63%] 2023-01-11T23:10:17.1682251Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1682420Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int16 PASSED [ 63%] 
2023-01-11T23:10:17.1682589Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1682781Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1682951Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1683139Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_bfloat16 PASSED [ 63%] 2023-01-11T23:10:17.1683317Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1683503Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_complex128 PASSED [ 63%] 2023-01-11T23:10:17.1683678Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1683856Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1684033Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1684216Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1684386Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1684559Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1684731Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1684898Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1685064Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1685235Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1685424Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex128 PASSED [ 63%] 2023-01-11T23:10:17.1685613Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1685798Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1686023Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1686205Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1686386Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1686564Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1686753Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1686930Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1687118Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1687300Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1687479Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1687655Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1687835Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_bfloat16 PASSED [ 63%] 2023-01-11T23:10:17.1688019Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex128 PASSED [ 63%] 2023-01-11T23:10:17.1688201Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex32 PASSED [ 63%] 2023-01-11T23:10:17.1688410Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1688583Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1688763Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1688943Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1689120Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_bfloat16 PASSED [ 63%] 2023-01-11T23:10:17.1689299Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1689471Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1689641Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1689821Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1689998Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1690165Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1690361Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex128 PASSED [ 63%] 2023-01-11T23:10:17.1690554Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1690725Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1690894Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1691062Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1691234Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1691400Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1691589Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1691749Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1691928Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex128 PASSED [ 63%] 2023-01-11T23:10:17.1692103Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1692281Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1692455Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1692628Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1692804Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_bfloat16 PASSED [ 63%] 2023-01-11T23:10:17.1692980Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1693142Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1693309Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1693479Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1693653Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1693823Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_bfloat16 PASSED [ 63%] 2023-01-11T23:10:17.1694020Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1694190Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1694364Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1694652Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1694817Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1695001Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex32 PASSED [ 63%] 2023-01-11T23:10:17.1695188Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1695369Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1695550Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1695731Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1695923Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex128 PASSED [ 63%] 2023-01-11T23:10:17.1696106Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1696290Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1696462Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1696640Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1696824Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1697002Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1697224Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex32 PASSED [ 63%] 2023-01-11T23:10:17.1697407Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1697582Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1697755Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1697928Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1698092Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1698287Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_bfloat16 PASSED [ 63%] 2023-01-11T23:10:17.1698480Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1698665Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1698849Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1699030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1699207Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1699380Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1699550Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int32 PASSED [ 64%] 2023-01-11T23:10:17.1699745Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int64 PASSED [ 64%] 2023-01-11T23:10:17.1699920Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int8 PASSED [ 64%] 2023-01-11T23:10:17.1700113Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_float16 PASSED [ 64%] 2023-01-11T23:10:17.1700303Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_float64 PASSED [ 64%] 2023-01-11T23:10:17.1700490Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int64 PASSED [ 64%] 2023-01-11T23:10:17.1700677Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_uint8 PASSED [ 64%] 2023-01-11T23:10:17.1700871Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_bfloat16 PASSED [ 64%] 2023-01-11T23:10:17.1701059Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_bool PASSED [ 64%] 2023-01-11T23:10:17.1701254Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_complex32 XFAIL [ 64%] 2023-01-11T23:10:17.1701438Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float32 PASSED [ 64%] 2023-01-11T23:10:17.1701628Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float64 PASSED [ 64%] 2023-01-11T23:10:17.1701814Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int64 PASSED [ 64%]
2023-01-11T23:10:17.1701996Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1702188Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1702376Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1702562Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1702772Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1702959Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1703137Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1703305Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1703490Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_complex128 PASSED [ 64%]
2023-01-11T23:10:17.1703671Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_complex32 PASSED [ 64%]
2023-01-11T23:10:17.1703847Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1704023Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1704195Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1704367Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1704547Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1704719Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_complex128 PASSED [ 64%]
2023-01-11T23:10:17.1704893Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_complex32 PASSED [ 64%]
2023-01-11T23:10:17.1705067Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1705265Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1705434Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float64 PASSED [ 64%]
2023-01-11T23:10:17.1705609Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1705781Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1705952Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int64 PASSED [ 64%]
2023-01-11T23:10:17.1706124Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1706287Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1706509Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_bfloat16 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1706734Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1706955Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1707172Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float16 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1707385Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1707596Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int16 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1707804Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int32 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1708016Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int8 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1708247Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_uint8 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1708469Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_bfloat16 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1708754Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1708982Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1709202Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1709419Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int32 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1709592Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1709775Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_complex128 PASSED [ 64%]
2023-01-11T23:10:17.1709949Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1710119Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1710290Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1710454Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float64 PASSED [ 64%]
2023-01-11T23:10:17.1710623Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1710816Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int64 PASSED [ 64%]
2023-01-11T23:10:17.1710983Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1711155Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1711328Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1711496Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1711668Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1711838Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1712000Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int64 PASSED [ 64%]
2023-01-11T23:10:17.1712180Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1712349Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1712522Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1712689Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int64 PASSED [ 64%]
2023-01-11T23:10:17.1712854Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1713030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1713200Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float64 PASSED [ 64%]
2023-01-11T23:10:17.1713374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1713545Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1713714Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1713909Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1714083Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float64 PASSED [ 64%]
2023-01-11T23:10:17.1714254Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1714422Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1714590Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1714761Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1714930Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1715103Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1715274Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1715440Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1715625Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_complex128 PASSED [ 64%]
2023-01-11T23:10:17.1715809Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1715997Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1716177Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1716356Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1716573Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1716750Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1716923Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1717093Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1717267Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1717439Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1717612Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int64 PASSED [ 64%]
2023-01-11T23:10:17.1717785Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1717957Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1718125Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1718294Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1718465Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int64 PASSED [ 64%]
2023-01-11T23:10:17.1718633Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1718805Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1718972Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1719146Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1719322Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1719552Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1719729Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex32 PASSED [ 64%]
2023-01-11T23:10:17.1719910Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1720087Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1720259Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1720436Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex128 PASSED [ 64%]
2023-01-11T23:10:17.1720614Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex32 PASSED [ 64%]
2023-01-11T23:10:17.1720796Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1720980Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1721147Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1721317Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1721488Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1721661Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1721832Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1722040Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_complex128 PASSED [ 64%]
2023-01-11T23:10:17.1722219Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_complex32 PASSED [ 64%]
2023-01-11T23:10:17.1722399Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1722572Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_float64 PASSED [ 64%]
2023-01-11T23:10:17.1722737Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1722910Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1723096Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1723274Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1723465Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex128 PASSED [ 64%]
2023-01-11T23:10:17.1723656Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1723840Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1724025Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1724205Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1724374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1724551Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1724725Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1724908Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1725113Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1725290Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1725466Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float64 PASSED [ 64%]
2023-01-11T23:10:17.1725643Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1725817Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int64 PASSED [ 64%]
2023-01-11T23:10:17.1725981Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1726155Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1726336Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex32 PASSED [ 64%]
2023-01-11T23:10:17.1726516Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1726692Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_float64 PASSED [ 64%]
2023-01-11T23:10:17.1726867Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1727038Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1727206Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1727389Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex128 PASSED [ 64%]
2023-01-11T23:10:17.1727593Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1727772Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1727950Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1728128Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1728303Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1728473Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1728649Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1728826Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1729001Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1729164Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1729336Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1729505Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1729680Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1729863Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex128 PASSED [ 65%]
2023-01-11T23:10:17.1730041Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex32 PASSED [ 65%]
2023-01-11T23:10:17.1730218Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1730394Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1730565Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1730762Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_complex32 PASSED [ 65%]
2023-01-11T23:10:17.1730946Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_complex64 PASSED [ 65%]
2023-01-11T23:10:17.1731123Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1731301Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1731490Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex64 PASSED [ 65%]
2023-01-11T23:10:17.1731676Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1731863Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1732041Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1732220Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1732388Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1732565Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1732740Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1732917Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1733118Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1733293Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1733469Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1733652Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1733832Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1734000Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1734176Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1734347Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1734662Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1734903Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_complex32 PASSED [ 65%]
2023-01-11T23:10:17.1735094Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1735274Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1735450Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1735628Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1735797Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1735974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1736166Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex32 PASSED [ 65%]
2023-01-11T23:10:17.1736350Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex64 PASSED [ 65%]
2023-01-11T23:10:17.1736586Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1736765Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1736940Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1737116Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1737287Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1737454Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1737626Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1737810Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex64 PASSED [ 65%]
2023-01-11T23:10:17.1737988Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1738167Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1738342Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1738514Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1738693Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1738866Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1739062Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1739235Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1739405Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1739574Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1739742Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1739919Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1740093Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1740264Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1740431Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1740606Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_complex128 PASSED [ 65%]
2023-01-11T23:10:17.1740777Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_complex64 PASSED [ 65%]
2023-01-11T23:10:17.1740948Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1741127Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1741301Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1741482Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex128 PASSED [ 65%]
2023-01-11T23:10:17.1741659Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1741834Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1742024Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1742198Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_complex128 PASSED [ 65%]
2023-01-11T23:10:17.1742371Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1742541Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1742717Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1742891Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1743067Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1743237Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1743413Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1743576Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1743749Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1743924Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1744102Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_complex128 PASSED [ 65%]
2023-01-11T23:10:17.1744282Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_complex64 PASSED [ 65%]
2023-01-11T23:10:17.1744479Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1744652Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1744828Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1745001Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1745177Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1745366Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_complex128 PASSED [ 65%]
2023-01-11T23:10:17.1745551Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_complex64 PASSED [ 65%]
2023-01-11T23:10:17.1745732Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1745915Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1746096Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1746278Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1746457Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1746635Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1746803Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1746973Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1747143Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1747316Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1747489Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1747712Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_bfloat16 SKIPPED (Skipped!) [ 65%]
2023-01-11T23:10:17.1747895Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1748080Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1748263Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1748433Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1748603Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1748850Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1749026Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1749197Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1749369Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1749539Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1749709Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1749870Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1750038Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1750237Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1750406Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1750584Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1750754Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1750926Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1751092Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1751261Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1751424Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1751600Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1751770Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1751943Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1752111Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1752276Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1752449Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1752619Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1752785Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1752945Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1753112Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1753300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1753486Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1753669Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1753851Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1754034Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1754211Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1754391Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1754562Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1754744Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1754915Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1755086Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1755258Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1755432Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1755609Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1755807Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1755973Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_bool PASSED [ 66%]
2023-01-11T23:10:17.1756158Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1756335Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1756508Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1756680Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1756856Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1757030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1757204Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1757374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1757539Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1757706Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1757876Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int8 PASSED [ 66%]
2023-01-11T23:10:17.1758041Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1758216Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igamma_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1758392Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igamma_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1758578Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1758765Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1758973Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_complex32 PASSED [ 66%]
2023-01-11T23:10:17.1759150Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1759333Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1759513Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1759683Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1759858Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int8 PASSED [ 66%]
2023-01-11T23:10:17.1760033Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1760218Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1760410Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1760592Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1760766Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1760945Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1761147Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1761343Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int8 PASSED [ 66%]
2023-01-11T23:10:17.1761537Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1761711Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_bool PASSED [ 66%]
2023-01-11T23:10:17.1761902Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1762087Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1762266Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1762442Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1762628Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1762808Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1762984Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1763162Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1763342Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_bool PASSED [ 66%]
2023-01-11T23:10:17.1763531Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1763720Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex32 PASSED [ 66%]
2023-01-11T23:10:17.1763906Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1764084Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1764269Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1764447Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1764661Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1764843Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int8 PASSED [ 66%]
2023-01-11T23:10:17.1765021Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1765203Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_bool PASSED [ 66%]
2023-01-11T23:10:17.1765383Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1765564Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1765737Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1765910Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1766089Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1766263Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1766444Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1766623Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_bool PASSED [ 66%]
2023-01-11T23:10:17.1766810Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1766992Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1767195Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1767363Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1767542Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1767717Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1767894Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1768071Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1768245Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1768421Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1768598Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1768765Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1768942Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1769117Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1769289Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1769470Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1769649Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_bool PASSED [ 66%]
2023-01-11T23:10:17.1769829Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1770025Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1770202Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1770405Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int8 PASSED [ 66%]
2023-01-11T23:10:17.1770581Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1770753Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_bool PASSED [ 66%]
2023-01-11T23:10:17.1770931Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1771102Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1771274Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1771454Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1771630Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_bool PASSED [ 66%]
2023-01-11T23:10:17.1771812Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1771982Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1772153Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1772325Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1772499Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1772675Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1772871Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1773045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1773219Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1773389Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1773552Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int8 PASSED [ 66%]
2023-01-11T23:10:17.1773721Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1773899Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1774072Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1774248Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1774427Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1774722Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1774900Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1775067Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1775267Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1775458Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1775646Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1775843Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1776066Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1776250Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1776435Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1776617Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1776811Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1776995Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1777184Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1777379Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1777574Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1777765Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1777949Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1778132Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1778309Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1778485Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1778687Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int16 XFAIL [ 66%]
2023-01-11T23:10:17.1778863Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int32 XFAIL [ 66%]
2023-01-11T23:10:17.1779038Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int8 XFAIL [ 66%]
2023-01-11T23:10:17.1779218Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1779398Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1779573Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1779747Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1779931Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1780110Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1780281Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1780460Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1780632Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1780814Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1780988Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1781160Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1781333Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1781509Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1781697Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1781870Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1782042Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int8 PASSED [ 66%]
2023-01-11T23:10:17.1782221Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1782396Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex32 PASSED [ 66%]
2023-01-11T23:10:17.1782570Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1782744Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1782919Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1783092Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1783257Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1783456Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1783656Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1783854Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex32 PASSED [ 66%]
2023-01-11T23:10:17.1784051Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1784268Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1784461Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1784655Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int64 PASSED [ 67%]
2023-01-11T23:10:17.1784841Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_bfloat16 PASSED [ 67%]
2023-01-11T23:10:17.1785022Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_complex128 PASSED [ 67%]
2023-01-11T23:10:17.1785209Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1785392Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int16 PASSED [ 67%]
2023-01-11T23:10:17.1785574Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1785757Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int64 PASSED [ 67%]
2023-01-11T23:10:17.1785938Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1786118Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1786303Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_bfloat16 PASSED [ 67%]
2023-01-11T23:10:17.1786488Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1786659Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int16 PASSED [ 67%]
2023-01-11T23:10:17.1786839Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1787019Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int64 PASSED [ 67%]
2023-01-11T23:10:17.1787202Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1787408Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_bool PASSED [ 67%]
2023-01-11T23:10:17.1787596Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_complex128 PASSED [ 67%]
2023-01-11T23:10:17.1787776Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_float16 PASSED [ 67%]
2023-01-11T23:10:17.1787957Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_float32 PASSED [ 67%]
2023-01-11T23:10:17.1788138Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1788309Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int16 PASSED [ 67%]
2023-01-11T23:10:17.1788491Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1788731Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_bfloat16 PASSED [ 67%]
2023-01-11T23:10:17.1788932Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_bool PASSED [ 67%]
2023-01-11T23:10:17.1789129Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1789307Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1789481Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1789657Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int64 XFAIL [ 67%]
2023-01-11T23:10:17.1789833Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1790030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_bool PASSED [ 67%]
2023-01-11T23:10:17.1790214Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float16 PASSED [ 67%]
2023-01-11T23:10:17.1790397Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1790576Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1790754Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1790926Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float16 PASSED [ 67%]
2023-01-11T23:10:17.1791096Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1791283Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_bfloat16 PASSED [ 67%]
2023-01-11T23:10:17.1791462Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_bool PASSED [ 67%]
2023-01-11T23:10:17.1791641Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_complex64 PASSED [ 67%]
2023-01-11T23:10:17.1791827Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_float16 PASSED [ 67%]
2023-01-11T23:10:17.1792009Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_float32 PASSED [ 67%]
2023-01-11T23:10:17.1792188Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1792371Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1792547Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1792723Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_bool PASSED [ 67%]
2023-01-11T23:10:17.1792904Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_float32 PASSED [ 67%]
2023-01-11T23:10:17.1793106Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1793272Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int64 PASSED [ 67%]
2023-01-11T23:10:17.1793446Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1793620Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1793791Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_bfloat16 PASSED [ 67%]
2023-01-11T23:10:17.1793967Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_complex64 PASSED [ 67%]
2023-01-11T23:10:17.1794140Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1794346Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_complex128 PASSED [ 67%]
2023-01-11T23:10:17.1794548Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_complex64 PASSED [ 67%]
2023-01-11T23:10:17.1794746Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1794926Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int64 PASSED [ 67%]
2023-01-11T23:10:17.1795116Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1795318Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_bfloat16 PASSED [ 67%]
2023-01-11T23:10:17.1795513Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_bool PASSED [ 67%]
2023-01-11T23:10:17.1795747Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_complex64 PASSED [ 67%]
2023-01-11T23:10:17.1795949Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_float32 PASSED [ 67%]
2023-01-11T23:10:17.1796146Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1796340Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1796532Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1796710Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_float16 PASSED [ 67%]
2023-01-11T23:10:17.1796878Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1797054Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1797233Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_bfloat16 PASSED [ 67%]
2023-01-11T23:10:17.1797418Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex128 PASSED [ 67%]
2023-01-11T23:10:17.1797601Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex32 PASSED [ 67%]
2023-01-11T23:10:17.1797779Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float16 PASSED [ 67%]
2023-01-11T23:10:17.1797956Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float32 PASSED [ 67%]
2023-01-11T23:10:17.1798128Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1798297Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1798477Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_complex128 PASSED [ 67%]
2023-01-11T23:10:17.1798673Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_complex32 XFAIL [ 67%]
2023-01-11T23:10:17.1798851Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_complex64 PASSED [ 67%]
2023-01-11T23:10:17.1799024Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1799197Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_bool PASSED [ 67%]
2023-01-11T23:10:17.1799374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float32 PASSED [ 67%]
2023-01-11T23:10:17.1799554Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1799730Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1799900Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1800087Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_bfloat16 PASSED [ 67%]
2023-01-11T23:10:17.1800269Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_bool PASSED [ 67%]
2023-01-11T23:10:17.1800480Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex64 PASSED [ 67%]
2023-01-11T23:10:17.1800694Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float16 PASSED [ 67%]
2023-01-11T23:10:17.1800875Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float32 PASSED [ 67%]
2023-01-11T23:10:17.1801056Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1801261Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1801457Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex128 PASSED [ 67%]
2023-01-11T23:10:17.1801631Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex32 PASSED [ 67%]
2023-01-11T23:10:17.1801808Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex64 PASSED [ 67%] 2023-01-11T23:10:17.1801984Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float16 PASSED [ 67%] 2023-01-11T23:10:17.1802154Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float32 PASSED [ 67%] 2023-01-11T23:10:17.1802325Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int64 PASSED [ 67%] 2023-01-11T23:10:17.1802517Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:10:17.1802708Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_float16 PASSED [ 67%] 2023-01-11T23:10:17.1802882Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:10:17.1803055Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_bool PASSED [ 67%] 2023-01-11T23:10:17.1803224Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_complex128 PASSED [ 67%] 2023-01-11T23:10:17.1803396Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_float64 PASSED [ 67%] 2023-01-11T23:10:17.1803566Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int16 PASSED [ 67%] 2023-01-11T23:10:17.1803734Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int64 PASSED [ 67%] 2023-01-11T23:10:17.1803899Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_uint8 PASSED [ 67%] 2023-01-11T23:10:17.1804075Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:10:17.1804251Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex128 PASSED [ 67%] 2023-01-11T23:10:17.1804455Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float16 PASSED [ 67%] 2023-01-11T23:10:17.1804630Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float64 PASSED [ 67%] 2023-01-11T23:10:17.1804793Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int16 PASSED [ 67%] 2023-01-11T23:10:17.1804963Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int32 PASSED [ 67%] 2023-01-11T23:10:17.1805185Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_bool SKIPPED (Expected: empty is not comparable) [ 67%] 2023-01-11T23:10:17.1805412Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 67%] 2023-01-11T23:10:17.1805636Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex32 SKIPPED (Expected: empty is not comparable) [ 67%] 2023-01-11T23:10:17.1805859Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 67%] 2023-01-11T23:10:17.1806078Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_float16 SKIPPED (Expected: empty is not comparable) [ 67%] 2023-01-11T23:10:17.1806292Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int16 SKIPPED (Expected: empty is not comparable) [ 67%] 2023-01-11T23:10:17.1806507Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int32 
SKIPPED (Expected: empty is not comparable) [ 67%] 2023-01-11T23:10:17.1806745Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_complex32 SKIPPED (Expected: empty_strided is not comparable) [ 67%] 2023-01-11T23:10:17.1807033Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_float16 SKIPPED (Expected: empty_strided is not comparable) [ 67%] 2023-01-11T23:10:17.1807267Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_float32 SKIPPED (Expected: empty_strided is not comparable) [ 67%] 2023-01-11T23:10:17.1807493Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_float64 SKIPPED (Expected: empty_strided is not comparable) [ 67%] 2023-01-11T23:10:17.1807725Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_uint8 SKIPPED (Expected: empty_strided is not comparable) [ 67%] 2023-01-11T23:10:17.1807903Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_bool PASSED [ 67%] 2023-01-11T23:10:17.1808087Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex32 PASSED [ 67%] 2023-01-11T23:10:17.1808268Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_float16 PASSED [ 67%] 2023-01-11T23:10:17.1808445Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_float32 PASSED [ 67%] 2023-01-11T23:10:17.1808620Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_float64 PASSED [ 67%] 2023-01-11T23:10:17.1808793Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int32 PASSED [ 67%] 2023-01-11T23:10:17.1808958Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int64 PASSED [ 67%] 2023-01-11T23:10:17.1809128Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_uint8 PASSED [ 67%] 2023-01-11T23:10:17.1809309Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_complex32 PASSED [ 67%] 2023-01-11T23:10:17.1809490Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_complex64 PASSED [ 67%] 2023-01-11T23:10:17.1809671Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float16 PASSED [ 67%] 2023-01-11T23:10:17.1809868Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float64 PASSED [ 67%] 2023-01-11T23:10:17.1810045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int32 PASSED [ 67%] 2023-01-11T23:10:17.1810218Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int8 PASSED [ 67%] 2023-01-11T23:10:17.1810398Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:10:17.1810577Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex128 PASSED [ 67%] 2023-01-11T23:10:17.1810790Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex32 PASSED [ 67%] 2023-01-11T23:10:17.1810990Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_float16 PASSED [ 67%] 2023-01-11T23:10:17.1811176Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int16 PASSED [ 67%] 2023-01-11T23:10:17.1811352Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int32 PASSED [ 67%] 2023-01-11T23:10:17.1811522Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int64 PASSED [ 67%] 2023-01-11T23:10:17.1811701Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nextafter_cuda_float32 PASSED [ 67%] 2023-01-11T23:10:17.1811882Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nextafter_cuda_float64 PASSED [ 67%] 2023-01-11T23:10:17.1812124Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_float64 SKIPPED (Expected: dropout is not comparable) [ 67%] 2023-01-11T23:10:17.1812308Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float16 PASSED [ 67%] 2023-01-11T23:10:17.1812528Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float64 PASSED [ 67%] 2023-01-11T23:10:17.1812767Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_bfloat16 SKIPPED (Expected: dropout is not comparable) [ 67%] 2023-01-11T23:10:17.1813000Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float32 SKIPPED (Expected: dropout is not comparable) [ 67%] 2023-01-11T23:10:17.1813190Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:10:17.1813380Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float16 PASSED [ 67%] 2023-01-11T23:10:17.1813568Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float32 PASSED [ 67%] 2023-01-11T23:10:17.1813756Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1813948Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1814137Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1814318Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1814610Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1814810Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1815006Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1815203Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1815402Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1815634Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1815827Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1816021Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1816223Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1816431Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1816636Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1816836Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1817030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1817219Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_complex128 PASSED [ 68%] 2023-01-11T23:10:17.1817411Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1817596Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1817786Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1817974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1818183Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1818376Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1818607Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_bool PASSED [ 68%] 2023-01-11T23:10:17.1818819Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex128 PASSED [ 68%] 2023-01-11T23:10:17.1819029Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex32 PASSED [ 68%] 2023-01-11T23:10:17.1819235Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1819445Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1819649Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1819852Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1820056Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1820253Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1820450Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1820651Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1820853Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1821081Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1821285Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1821488Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1821680Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1821870Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1822066Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1822248Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1822456Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1822661Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_complex128 PASSED [ 68%] 2023-01-11T23:10:17.1822865Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1823067Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1823267Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1823488Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1823693Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1823891Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1824083Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1824275Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1824466Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1824655Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1824844Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1825031Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1825219Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1825406Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1825597Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1825782Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1825961Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1826155Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1826362Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1826587Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_bool PASSED [ 68%] 2023-01-11T23:10:17.1826794Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_complex64 PASSED [ 68%] 2023-01-11T23:10:17.1826995Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1827193Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1827392Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1827588Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1827781Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1827977Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1828186Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_complex128 PASSED [ 68%] 2023-01-11T23:10:17.1828390Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1828589Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1828863Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1829091Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1829289Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1829485Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1829675Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1829855Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1830057Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1830253Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1830447Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1830656Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1830888Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_complex64 PASSED [ 68%] 2023-01-11T23:10:17.1831085Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1831279Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1831469Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1831655Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1831842Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1832053Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1832243Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1832437Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1832648Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1832855Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1833060Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1833271Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1833452Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1833624Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_complex64 PASSED [ 68%] 2023-01-11T23:10:17.1833800Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1833976Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1839689Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1839889Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1840122Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_complex128 PASSED [ 68%] 2023-01-11T23:10:17.1840297Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_complex64 PASSED [ 68%] 2023-01-11T23:10:17.1840480Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1840682Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1840855Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1841024Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1841203Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1841374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_bool PASSED [ 68%] 2023-01-11T23:10:17.1841557Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_complex32 PASSED [ 68%] 2023-01-11T23:10:17.1841732Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1841908Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1842080Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1842258Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1842436Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex128 PASSED [ 68%] 2023-01-11T23:10:17.1842619Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex32 PASSED [ 68%] 2023-01-11T23:10:17.1842799Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex64 PASSED [ 68%] 2023-01-11T23:10:17.1842974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1843142Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1843338Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1843509Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex128 PASSED [ 68%] 2023-01-11T23:10:17.1843678Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex32 XFAIL [ 68%] 2023-01-11T23:10:17.1843850Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1844010Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1844177Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1844348Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1844527Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex128 PASSED [ 68%] 2023-01-11T23:10:17.1844704Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex32 PASSED [ 68%] 2023-01-11T23:10:17.1844877Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1845050Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1845220Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1845380Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int8 PASSED [ 
68%] 2023-01-11T23:10:17.1845548Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1845754Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_complex32 PASSED [ 68%] 2023-01-11T23:10:17.1845928Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1846104Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1846279Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1846456Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex32 PASSED [ 68%] 2023-01-11T23:10:17.1846626Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex64 PASSED [ 68%] 2023-01-11T23:10:17.1846799Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1846964Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1847138Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1847307Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1847493Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1847677Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_bool PASSED [ 68%] 2023-01-11T23:10:17.1847861Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1848041Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1848216Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1848388Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1848561Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1848735Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1848952Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1849134Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1849313Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1849492Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1849667Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1849839Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1850018Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1850184Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1850358Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:10:17.1850539Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_complex128 PASSED [ 69%] 2023-01-11T23:10:17.1850736Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_complex64 PASSED [ 69%] 2023-01-11T23:10:17.1850940Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1851118Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1851290Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1851499Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:10:17.1851680Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1851859Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex64 PASSED [ 69%] 2023-01-11T23:10:17.1852041Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_float16 PASSED [ 69%] 2023-01-11T23:10:17.1852221Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1852396Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1852570Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1852739Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1852914Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1853097Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex64 PASSED [ 69%] 2023-01-11T23:10:17.1853272Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1853435Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1853606Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:10:17.1853780Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_complex32 PASSED [ 69%] 2023-01-11T23:10:17.1853949Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float16 PASSED [ 69%] 2023-01-11T23:10:17.1854122Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1854297Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1854737Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1854917Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:10:17.1855083Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_complex64 PASSED [ 69%] 2023-01-11T23:10:17.1855250Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float16 PASSED [ 69%] 2023-01-11T23:10:17.1855415Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1855582Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1855757Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:10:17.1855934Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1856106Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1856275Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1856442Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1856604Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1856775Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:10:17.1856941Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1857113Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float16 PASSED [ 69%] 2023-01-11T23:10:17.1857316Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1857487Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1857655Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1857823Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1857994Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1858153Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1858320Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1858493Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_complex128 PASSED [ 69%] 2023-01-11T23:10:17.1858663Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1858829Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1859001Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1859171Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1859343Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1859511Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1859674Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1859843Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_float16 PASSED [ 69%] 2023-01-11T23:10:17.1860016Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1860187Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1860396Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1860593Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1860776Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:10:17.1860945Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1861113Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1861283Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1861455Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex32 PASSED [ 69%] 2023-01-11T23:10:17.1861624Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex64 PASSED [ 69%] 2023-01-11T23:10:17.1861797Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_float16 PASSED [ 69%] 2023-01-11T23:10:17.1861967Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1862135Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1862306Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1862484Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_complex128 PASSED [ 69%] 2023-01-11T23:10:17.1862648Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1862843Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1863012Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1863180Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1863349Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1863517Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1863694Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_complex32 PASSED [ 69%] 2023-01-11T23:10:17.1863866Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_complex64 PASSED [ 69%] 2023-01-11T23:10:17.1864036Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1864199Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1864363Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1864531Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1864700Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1864889Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1865085Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_complex128 PASSED [ 69%] 2023-01-11T23:10:17.1865279Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_complex64 PASSED [ 69%] 2023-01-11T23:10:17.1865466Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1865655Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1865859Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1866051Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1866235Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1866418Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1866601Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1866781Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1866964Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1867142Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1867322Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1867494Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1867675Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1867859Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_float16 PASSED [ 69%] 2023-01-11T23:10:17.1868041Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1868223Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1868432Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1868621Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1868887Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1869073Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1869246Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1869425Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1869604Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1869786Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1869969Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_bool PASSED [ 69%]
2023-01-11T23:10:17.1870756Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_float32 PASSED [ 69%]
2023-01-11T23:10:17.1870936Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int16 PASSED [ 69%]
2023-01-11T23:10:17.1871109Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int64 PASSED [ 69%]
2023-01-11T23:10:17.1871280Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_uint8 PASSED [ 69%]
2023-01-11T23:10:17.1871445Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_bool PASSED [ 69%]
2023-01-11T23:10:17.1871626Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_float64 PASSED [ 69%]
2023-01-11T23:10:17.1871803Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int64 PASSED [ 69%]
2023-01-11T23:10:17.1871982Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int8 PASSED [ 69%]
2023-01-11T23:10:17.1872186Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_uint8 PASSED [ 69%]
2023-01-11T23:10:17.1872372Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_bool PASSED [ 69%]
2023-01-11T23:10:17.1872557Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int64 PASSED [ 69%]
2023-01-11T23:10:17.1872739Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int8 PASSED [ 69%]
2023-01-11T23:10:17.1872940Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_bool PASSED [ 69%]
2023-01-11T23:10:17.1873136Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float32 PASSED [ 69%]
2023-01-11T23:10:17.1873341Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float64 PASSED [ 69%]
2023-01-11T23:10:17.1873539Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int16 PASSED [ 69%]
2023-01-11T23:10:17.1873735Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int64 PASSED [ 69%]
2023-01-11T23:10:17.1873932Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_uint8 PASSED [ 69%]
2023-01-11T23:10:17.1874114Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_bfloat16 PASSED [ 69%]
2023-01-11T23:10:17.1874297Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int16 PASSED [ 69%]
2023-01-11T23:10:17.1874479Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int32 PASSED [ 69%]
2023-01-11T23:10:17.1874683Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int64 PASSED [ 69%]
2023-01-11T23:10:17.1874883Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int16 PASSED [ 69%]
2023-01-11T23:10:17.1875089Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int8 PASSED [ 69%]
2023-01-11T23:10:17.1875288Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_uint8 PASSED [ 69%]
2023-01-11T23:10:17.1875499Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_float16 PASSED [ 69%]
2023-01-11T23:10:17.1875704Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_float32 PASSED [ 69%]
2023-01-11T23:10:17.1875899Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int16 PASSED [ 69%]
2023-01-11T23:10:17.1876104Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int64 PASSED [ 69%]
2023-01-11T23:10:17.1876305Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int8 PASSED [ 69%]
2023-01-11T23:10:17.1876502Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8 PASSED [ 69%]
2023-01-11T23:10:17.1876706Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_float32 PASSED [ 69%]
2023-01-11T23:10:17.1876900Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int32 PASSED [ 69%]
2023-01-11T23:10:17.1877100Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int64 PASSED [ 69%]
2023-01-11T23:10:17.1877302Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int8 PASSED [ 69%]
2023-01-11T23:10:17.1877485Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float16 PASSED [ 69%]
2023-01-11T23:10:17.1877691Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float32 PASSED [ 69%]
2023-01-11T23:10:17.1877877Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int16 PASSED [ 69%]
2023-01-11T23:10:17.1878062Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int32 PASSED [ 69%]
2023-01-11T23:10:17.1878247Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int64 PASSED [ 69%]
2023-01-11T23:10:17.1878427Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int8 PASSED [ 69%]
2023-01-11T23:10:17.1878604Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_uint8 PASSED [ 69%]
2023-01-11T23:10:17.1878787Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1878971Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1879169Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1879374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1879568Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1879764Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1879962Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1880184Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1880383Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1880589Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1880802Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1880985Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1881171Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1881415Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_float64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 70%]
2023-01-11T23:10:17.1881599Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1881781Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1881961Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1882135Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1882306Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex32 PASSED [ 70%]
2023-01-11T23:10:17.1882481Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1882653Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1882825Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1883001Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1883201Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1883383Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1883561Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1883737Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1883900Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1884070Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1884241Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1884416Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1884591Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1884774Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1884957Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1885131Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1885301Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1885469Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1885671Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1885852Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1886028Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1886202Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1886376Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1886551Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1886724Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1886898Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1887076Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1887256Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1887432Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1887603Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1887779Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1887953Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex32 PASSED [ 70%]
2023-01-11T23:10:17.1888122Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1888294Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1888469Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1888635Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1888844Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1889022Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1889198Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1889369Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1889538Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1889709Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1889879Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1890045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1890233Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1890440Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1890643Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1890821Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1891000Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1891177Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1891375Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1891549Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1891718Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1891892Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1892064Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1892231Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1892400Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1892573Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1892751Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1892924Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1893096Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1893259Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1893427Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1893594Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1893774Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1893946Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1894119Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1894292Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1894616Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1894789Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1894948Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1895114Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1895300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1895478Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1895662Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1895851Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1896031Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1896208Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1896370Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1896540Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1896707Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1896874Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1897040Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1897255Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1897424Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1897601Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1897776Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_complex32 PASSED [ 70%]
2023-01-11T23:10:17.1897940Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1898109Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1898279Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1898451Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1898635Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1898812Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1898996Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex32 PASSED [ 70%]
2023-01-11T23:10:17.1899182Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1899361Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1899534Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1899712Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1899888Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1900061Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1900302Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1900474Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1900646Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1900818Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1901000Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_indices_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1901170Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_indices_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1901341Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1901517Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1901692Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex32 PASSED [ 70%]
2023-01-11T23:10:17.1901863Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1902031Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1902199Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1902367Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1902543Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_indices_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1902745Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1902931Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1903114Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1903294Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1903473Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1903651Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1903828Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1904004Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1904172Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1904343Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1904519Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1904697Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1904875Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1905047Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1905291Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1905461Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1905636Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1905800Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1905999Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1906175Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1906348Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1906532Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1906717Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_complex32 PASSED [ 70%]
2023-01-11T23:10:17.1906896Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1907081Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int16 PASSED [ 71%]
2023-01-11T23:10:17.1907255Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int32 PASSED [ 71%]
2023-01-11T23:10:17.1907425Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int8 PASSED [ 71%]
2023-01-11T23:10:17.1907612Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_bfloat16 PASSED [ 71%]
2023-01-11T23:10:17.1907791Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_bool PASSED [ 71%]
2023-01-11T23:10:17.1907977Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1908160Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex32 PASSED [ 71%]
2023-01-11T23:10:17.1908341Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1908547Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1908804Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int16 PASSED [ 71%]
2023-01-11T23:10:17.1908987Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int32 PASSED [ 71%]
2023-01-11T23:10:17.1909157Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int8 PASSED [ 71%]
2023-01-11T23:10:17.1909336Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_uint8 PASSED [ 71%]
2023-01-11T23:10:17.1909514Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_complex32 PASSED [ 71%]
2023-01-11T23:10:17.1909689Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1909864Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1910040Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1910211Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float64 PASSED [ 71%]
2023-01-11T23:10:17.1910384Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int8 PASSED [ 71%]
2023-01-11T23:10:17.1910559Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_bool PASSED [ 71%]
2023-01-11T23:10:17.1910735Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1910920Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex32 PASSED [ 71%]
2023-01-11T23:10:17.1911104Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1911284Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float64 PASSED [ 71%]
2023-01-11T23:10:17.1911461Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int16 PASSED [ 71%]
2023-01-11T23:10:17.1911670Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int32 PASSED [ 71%]
2023-01-11T23:10:17.1911841Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int64 PASSED [ 71%]
2023-01-11T23:10:17.1912013Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int8 PASSED [ 71%]
2023-01-11T23:10:17.1912182Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_uint8 PASSED [ 71%]
2023-01-11T23:10:17.1912348Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_bfloat16 PASSED [ 71%]
2023-01-11T23:10:17.1912523Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1912699Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float64 PASSED [ 71%]
2023-01-11T23:10:17.1912881Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1913060Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1913233Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1913404Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float64 PASSED [ 71%]
2023-01-11T23:10:17.1913581Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex32 PASSED [ 71%]
2023-01-11T23:10:17.1913755Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int64 PASSED [ 71%]
2023-01-11T23:10:17.1913921Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1914122Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1914294Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1914472Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int16 PASSED [ 71%]
2023-01-11T23:10:17.1914648Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int64 PASSED [ 71%]
2023-01-11T23:10:17.1914819Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int8 PASSED [ 71%]
2023-01-11T23:10:17.1914990Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_uint8 PASSED [ 71%]
2023-01-11T23:10:17.1915172Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1915341Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex32 PASSED [ 71%]
2023-01-11T23:10:17.1915518Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1915692Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1915867Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1916039Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float64 PASSED [ 71%]
2023-01-11T23:10:17.1916209Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int16 PASSED [ 71%]
2023-01-11T23:10:17.1916380Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int64 PASSED [ 71%]
2023-01-11T23:10:17.1916553Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int8 PASSED [ 71%]
2023-01-11T23:10:17.1916733Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1916907Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex32 PASSED [ 71%]
2023-01-11T23:10:17.1917080Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1917280Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1917454Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1917625Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_float64 PASSED [ 71%]
2023-01-11T23:10:17.1917799Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int16 PASSED [ 71%]
2023-01-11T23:10:17.1917971Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int64 PASSED [ 71%]
2023-01-11T23:10:17.1918141Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_uint8 PASSED [ 71%]
2023-01-11T23:10:17.1918320Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_bool PASSED [ 71%]
2023-01-11T23:10:17.1918489Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1918671Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_complex32 PASSED [ 71%]
2023-01-11T23:10:17.1918845Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1919018Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1919191Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int16 PASSED [ 71%]
2023-01-11T23:10:17.1919363Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int32 PASSED [ 71%]
2023-01-11T23:10:17.1919538Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_bool PASSED [ 71%]
2023-01-11T23:10:17.1919741Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1919911Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float64 PASSED [ 71%]
2023-01-11T23:10:17.1920078Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int32 PASSED [ 71%]
2023-01-11T23:10:17.1920254Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int8 PASSED [ 71%]
2023-01-11T23:10:17.1920454Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_uint8 PASSED [ 71%]
2023-01-11T23:10:17.1920649Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1920824Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1920995Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int16 PASSED [ 71%]
2023-01-11T23:10:17.1921167Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_uint8 PASSED [ 71%]
2023-01-11T23:10:17.1921364Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_native_batch_norm_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1921558Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_native_batch_norm_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1921739Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_native_batch_norm_cuda_float64 PASSED [ 71%]
2023-01-11T23:10:17.1921928Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_var_mean_cuda_bfloat16 PASSED [ 71%]
2023-01-11T23:10:17.1922115Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_var_mean_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1922301Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_var_mean_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1922489Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1922677Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1922883Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1923061Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_int64 PASSED [ 71%]
2023-01-11T23:10:17.1923234Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_T_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1923407Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___getitem___cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1923581Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___radd___cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1923753Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rdiv___cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1923934Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmatmul___cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1924103Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmul___cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1924279Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rsub___cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1924471Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager__native_batch_norm_legit_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1924645Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_abs_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1924813Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_abs_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1924979Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acos_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1925147Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acos_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1925320Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acosh_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1925534Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addbmm_cuda_complex64 SKIPPED (Skipped!) [ 71%]
2023-01-11T23:10:17.1925727Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addbmm_cuda_float32 SKIPPED (Skipped!) [ 71%]
2023-01-11T23:10:17.1925902Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcdiv_cuda_float32 XFAIL [ 71%]
2023-01-11T23:10:17.1926075Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcmul_cuda_complex64 XFAIL [ 71%]
2023-01-11T23:10:17.1926266Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_decomposed_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1926428Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmv_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1926614Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addr_cuda_complex64 SKIPPED (Skipped!) [ 71%]
2023-01-11T23:10:17.1926798Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addr_cuda_float32 SKIPPED (Skipped!) [ 71%]
2023-01-11T23:10:17.1926968Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_all_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1927147Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_allclose_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1927323Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_allclose_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1927498Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_angle_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1927671Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_angle_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1927841Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_any_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1928002Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_any_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1928173Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_arange_cuda_float32 XFAIL [ 71%]
2023-01-11T23:10:17.1928401Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_cuda_complex64 SKIPPED (Errors when storage_offset is included) [ 71%]
2023-01-11T23:10:17.1928633Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_partial_views_cuda_complex64 XFAIL [ 71%]
2023-01-11T23:10:17.1928826Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_partial_views_cuda_float32 XFAIL [ 71%]
2023-01-11T23:10:17.1929001Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asin_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1929175Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asinh_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1929342Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asinh_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1929511Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atan2_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1929678Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atan_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1929850Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atanh_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1930020Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atanh_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1930194Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_baddbmm_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1930397Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bernoulli_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1930598Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bfloat16_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1930769Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bmm_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1930938Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bmm_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1931105Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bool_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1931304Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_to_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1931478Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_byte_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1931648Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_byte_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1931830Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cartesian_prod_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1931999Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cat_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1932167Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cat_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1932343Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdouble_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1932514Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ceil_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1932685Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cfloat_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1932856Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chalf_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1933027Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_char_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1933206Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1933379Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1933551Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chunk_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1933718Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chunk_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1933891Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_min_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1934064Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clone_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1934223Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clone_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1934427Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_column_stack_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1934710Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_combinations_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1934882Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_complex_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1935047Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1935231Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_physical_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1935409Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_physical_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1935593Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_contiguous_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1935771Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_contiguous_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1935938Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_copysign_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1936112Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_corrcoef_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1936284Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cosh_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1936464Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_count_nonzero_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1936631Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cov_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1936800Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cross_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1937013Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumsum_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1937184Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumsum_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1937349Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1937529Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_embed_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1937702Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_embed_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1937873Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagflat_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1938051Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1938221Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1938411Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_scatter_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1938595Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_scatter_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1938767Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diff_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1938932Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_digamma_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1939101Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dist_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1939289Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_floor_rounding_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1939476Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_no_rounding_mode_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1939648Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_double_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1939815Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_double_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1939989Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dsplit_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1940193Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dsplit_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1940364Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dstack_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1940542Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_cuda_float32 SKIPPED (Skipped!) [ 72%]
2023-01-11T23:10:17.1940719Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_like_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1940883Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_eq_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1941055Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_equal_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1941222Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_equal_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1941392Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_erfinv_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1941572Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_as_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1941747Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_as_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1941910Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1942093Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_eye_cuda_float32 SKIPPED (Skipped!) [ 72%]
2023-01-11T23:10:17.1942271Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft2_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1942443Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft2_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1942617Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1942812Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1942983Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftn_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1943164Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftshift_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1943336Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft2_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1943504Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1943673Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1943851Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfftn_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1944025Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft2_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1944200Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftn_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1944384Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftshift_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1944561Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ihfft2_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1944733Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ihfft_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1944908Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft2_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1945075Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1945244Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfftn_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1945414Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfft_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1945589Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfftn_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1945769Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flatten_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1945961Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flip_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1946137Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fliplr_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1946316Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_power_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1946485Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_floor_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1946643Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fmin_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1946808Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_frexp_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1946976Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_cuda_complex64 XFAIL [ 72%]
2023-01-11T23:10:17.1947143Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_cuda_float32 XFAIL [ 72%]
2023-01-11T23:10:17.1947316Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_like_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1947490Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_geqrf_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1947664Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gradient_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1947847Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_grid_sampler_2d_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1948008Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_half_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1948181Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hsplit_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1948374Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hsplit_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1948543Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hstack_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1948785Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hypot_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1948954Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_i0_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1949121Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_igamma_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1949291Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_igammac_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1949459Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_imag_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1949633Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_add_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1949806Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_fill_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1949985Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_reduce_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1950161Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_select_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1950327Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_inner_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1950495Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_int_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1950664Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isclose_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1950844Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isfinite_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1951016Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isnan_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1951179Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isnan_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1951355Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isneginf_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1951521Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isreal_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1951743Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_2inputs_2outputs_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1951928Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1952128Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_return_by_ref_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1952295Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kron_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1952464Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ldexp_cuda_complex64 XFAIL [ 72%]
2023-01-11T23:10:17.1952623Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ldexp_cuda_float32 XFAIL [ 72%]
2023-01-11T23:10:17.1952791Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_le_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1952963Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lerp_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1953134Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lerp_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1953302Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lgamma_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1953480Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cross_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1953666Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_singular_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1953848Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigh_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1954031Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvals_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1954230Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvals_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1954430Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_householder_product_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1954606Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_ex_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1954795Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1954981Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_ex_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1955162Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_solve_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1955343Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1955518Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1955713Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1955886Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1956176Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1956360Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1956548Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_ex_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1956734Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_ex_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1956915Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_solve_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1957107Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_norm_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1957295Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_power_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1957503Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1957679Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_multi_dot_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1957861Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_multi_dot_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1958045Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1958248Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_subgradients_at_zero_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1958428Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1958607Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1958797Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_hermitian_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1958973Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_qr_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1959148Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_slogdet_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1959323Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_ex_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1959506Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_ex_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1959696Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_triangular_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1959897Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svd_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1960078Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svdvals_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1960260Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svdvals_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1960447Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorsolve_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1960627Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vander_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1960810Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vecdot_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1960979Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vecdot_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1961159Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vector_norm_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1961335Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log1p_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1961504Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log1p_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1961675Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log2_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1961841Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1962009Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1962185Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logaddexp2_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1962363Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logaddexp_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1962537Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logcumsumexp_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1962740Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logdet_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1962929Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logdet_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1963131Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_and_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1963309Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_and_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1963499Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_not_cuda_float32 SKIPPED (Skipped!) [ 73%]
2023-01-11T23:10:17.1963675Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_or_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1963847Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_or_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1964024Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_xor_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1964189Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logit_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1964360Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logspace_cuda_float32 XFAIL [ 73%]
2023-01-11T23:10:17.1964530Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lt_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1964699Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1964862Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1965031Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_solve_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1965203Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mH_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1965367Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mH_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1965529Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mT_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1965735Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumprod_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1965915Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumsum_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1966097Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_fill_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1966273Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_fill_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1966456Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_log_softmax_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1966638Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_logsumexp_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1966815Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_mean_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1966994Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_median_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1967176Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_normalize_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1967359Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_prod_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1967534Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_prod_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1967715Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_select_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1967895Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_softmin_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1968072Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_std_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1968250Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_var_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1968428Z
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matrix_exp_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1968625Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1968841Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_reduction_no_dim_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1969017Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_maximum_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1969189Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_median_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1969384Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_list_of_tensors_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1969578Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_variadic_tensors_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1969745Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mm_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1969918Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mode_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1970093Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_movedim_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1970270Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_movedim_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1970458Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mul_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1970662Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_multinomial_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1970848Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1971036Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1971210Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nan_to_num_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1971411Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanmean_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1971591Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanquantile_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1971762Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nansum_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1971941Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_copy_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1972106Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1972275Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1972458Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_batch_norm_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1972640Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_layer_norm_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1972812Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ne_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1972984Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_neg_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1973177Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_cuda_float32 SKIPPED (Skipped!) 
[ 73%] 2023-01-11T23:10:17.1973419Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_strided_cuda_complex64 SKIPPED (Expected: new_empty_strided is not comparable) [ 73%] 2023-01-11T23:10:17.1973655Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_strided_cuda_float32 SKIPPED (Expected: new_empty_strided is not comparable) [ 73%] 2023-01-11T23:10:17.1973820Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_full_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1973998Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_ones_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1974176Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_zeros_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1974377Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1974727Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1974932Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1975131Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1975320Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool2d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1975509Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool3d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1975696Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_batch_norm_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1975895Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1976111Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1976309Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1976507Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose2d_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1976703Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose3d_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1976902Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1977124Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_ctc_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1977315Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout2d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1977505Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout3d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1977687Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1977873Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_embedding_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1978088Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 73%] 
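Note: the test_variant_consistency_eager_* cases above verify that the functional, method, and (where defined) in-place variants of each operator agree in eager mode. A minimal sketch of the idea follows, using torch.add as a stand-in; check_variant_consistency is a hypothetical helper for illustration only, not the actual TestCommonCUDA harness.

    import torch

    def check_variant_consistency(a, b):
        # Illustration: the three variants of an op should produce
        # identical results when run eagerly.
        functional = torch.add(a, b)     # functional variant: torch.add
        method = a.add(b)                # method variant: Tensor.add
        inplace = a.clone().add_(b)      # in-place variant: Tensor.add_ (mutates its input)
        assert torch.equal(functional, method)
        assert torch.equal(functional, inplace)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    check_variant_consistency(torch.randn(3, device=device),
                              torch.randn(3, device=device))
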
2023-01-11T23:10:17.1978303Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1978503Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1978686Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_gelu_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1978878Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_group_norm_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1979068Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardswish_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1979256Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardtanh_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1979447Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1979644Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_instance_norm_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1979847Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1980037Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_l1_loss_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1980252Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_layer_norm_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1980440Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_linear_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1980639Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1980832Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool2d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1981029Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1981221Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool2d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1981408Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1981598Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool3d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1981786Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_mish_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1981973Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_mse_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1982169Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1982371Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1982561Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_nll_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1982773Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_normalize_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1982973Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_constant_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1983158Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_reflect_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1983353Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_replicate_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1983557Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pairwise_distance_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1983756Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pairwise_distance_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1983948Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1984149Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_unshuffle_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1984337Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_prelu_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1984522Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rrelu_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1984702Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_selu_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1984890Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1985073Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softplus_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1985264Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softsign_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1985460Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_tanhshrink_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1985683Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_loss_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1985885Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1986102Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_with_distance_loss_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1986291Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_unfold_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1986493Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1986690Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_nearest_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1986858Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nonzero_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1987027Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1987201Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_fro_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1987377Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_nuc_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1987548Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_nuc_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1987719Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_normal_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1987896Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_like_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1988068Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_like_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1988310Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ormqr_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1988472Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ormqr_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1988645Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_outer_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1988886Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_permute_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1989061Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_permute_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1989265Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_1_cuda_float32 SKIPPED (Skipped!) [ 73%] 2023-01-11T23:10:17.1989470Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_4_cuda_float32 SKIPPED (Skipped!) [ 73%] 2023-01-11T23:10:17.1989638Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pow_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1989811Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pow_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1989983Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_prod_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1990147Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_put_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1990316Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_qr_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1990484Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rad2deg_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1990663Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rand_like_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1990831Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randint_cuda_float32 XFAIL [ 73%] 2023-01-11T23:10:17.1991001Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_cuda_complex64 XFAIL [ 73%] 2023-01-11T23:10:17.1991170Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_cuda_float32 XFAIL [ 73%] 2023-01-11T23:10:17.1991346Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_like_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1991537Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ravel_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1991706Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_real_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1991890Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reciprocal_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1992066Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_remainder_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1992240Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1992411Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1992600Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_interleave_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1992789Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_interleave_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1992967Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1993132Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1993305Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize__cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1993474Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize__cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1993651Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize_as__cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1993830Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_conj_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1994034Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_neg_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1994203Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_roll_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1994376Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rot90_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1994546Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rot90_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1994721Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_decimals_0_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1994893Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsqrt_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1995063Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsub_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1995258Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scalar_tensor_cuda_complex64 SKIPPED (Skipped!) 
[ 74%] 2023-01-11T23:10:17.1995444Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_add_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1995623Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_add_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1995801Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1995987Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_prod_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1996170Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_sum_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1996352Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_segment_reduce_lengths_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1996542Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_segment_reduce_offsets_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1996716Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sgn_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1996888Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sgn_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1997055Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_short_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1997251Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sigmoid_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1997421Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sign_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1997623Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_bartlett_cuda_float32 SKIPPED (Skipped!) [ 74%] 2023-01-11T23:10:17.1997817Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_blackman_cuda_float32 SKIPPED (Skipped!) [ 74%] 2023-01-11T23:10:17.1998015Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_cosine_cuda_float32 SKIPPED (Skipped!) [ 74%] 2023-01-11T23:10:17.1998222Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_exponential_cuda_float32 SKIPPED (Skipped!) [ 74%] 2023-01-11T23:10:17.1998424Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_gaussian_cuda_float32 SKIPPED (Skipped!) [ 74%] 2023-01-11T23:10:17.1998624Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_hamming_cuda_float32 SKIPPED (Skipped!) 
[ 74%] 2023-01-11T23:10:17.1998796Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signbit_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1998965Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sin_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1999133Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinh_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1999316Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_scatter_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1999484Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1999691Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_with_dtype_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1999875Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_with_dtype_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2000062Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_j0_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2000243Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_j1_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2000424Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_y1_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2000623Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_t_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2000804Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_entr_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2000981Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_erfcx_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2001181Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_hermite_polynomial_he_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2001374Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i0e_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2001575Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i1e_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2002065Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 74%] 2023-01-11T23:10:17.2002255Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_i0_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2002447Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_i1_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2002638Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_k1_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2002849Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2003073Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_scaled_modified_bessel_k0_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2003272Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_scaled_modified_bessel_k1_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2003445Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_xlog1py_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2003622Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_zeta_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2003790Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2003973Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_list_args_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2004148Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sqrt_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2004315Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sqrt_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2004490Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2004661Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stack_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2004827Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stack_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2004987Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2005152Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2005325Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2005507Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_unbiased_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2005708Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_unbiased_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2005877Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stft_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2006043Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sub_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2006209Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sub_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2006390Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_to_size_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2006549Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2006718Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_symeig_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2006882Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_t_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2007066Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_along_dim_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2007235Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2007403Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tan_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2007567Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tanh_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2007747Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensor_split_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2007919Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensordot_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2008095Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensordot_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2008267Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tile_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2008434Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2008596Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2008806Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_sparse_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2008979Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_sparse_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2009145Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_topk_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2009311Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trace_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2009479Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2009646Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapz_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2009831Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triangular_solve_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2010003Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tril_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2010176Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triu_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2010344Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triu_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2010551Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_true_divide_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2010742Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_true_divide_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2010909Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trunc_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2011079Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unflatten_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2011272Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_copy_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2011440Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_uniform_cuda_float32 XFAIL [ 74%] 2023-01-11T23:10:17.2011616Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsqueeze_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2011788Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsqueeze_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2011955Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2012121Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2012297Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2012460Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2012646Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_unbiased_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2012823Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_unbiased_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2012994Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vdot_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2013161Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vdot_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2013336Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2013505Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2013683Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_real_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2013853Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2014011Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2014179Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vsplit_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2014345Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vstack_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2014720Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_where_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2014889Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_xlogy_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2015054Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zero__cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2015219Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_cuda_float32 XFAIL [ 74%] 2023-01-11T23:10:17.2015390Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_like_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2015564Z test_ops.py::TestCompositeComplianceCUDA::test_backward_H_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2015729Z test_ops.py::TestCompositeComplianceCUDA::test_backward_T_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2015905Z test_ops.py::TestCompositeComplianceCUDA::test_backward___radd___cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2016091Z test_ops.py::TestCompositeComplianceCUDA::test_backward___rmatmul___cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2016262Z test_ops.py::TestCompositeComplianceCUDA::test_backward___rmul___cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2016433Z test_ops.py::TestCompositeComplianceCUDA::test_backward___rsub___cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2016627Z test_ops.py::TestCompositeComplianceCUDA::test_backward__softmax_backward_data_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2016800Z test_ops.py::TestCompositeComplianceCUDA::test_backward_abs_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2016976Z test_ops.py::TestCompositeComplianceCUDA::test_backward_acos_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2017145Z test_ops.py::TestCompositeComplianceCUDA::test_backward_acosh_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2017346Z test_ops.py::TestCompositeComplianceCUDA::test_backward_add_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2017526Z test_ops.py::TestCompositeComplianceCUDA::test_backward_addcdiv_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2017716Z test_ops.py::TestCompositeComplianceCUDA::test_backward_addmm_decomposed_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2017918Z test_ops.py::TestCompositeComplianceCUDA::test_backward_addmv_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2018091Z test_ops.py::TestCompositeComplianceCUDA::test_backward_angle_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2018270Z test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2018464Z test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_partial_views_cuda_float32 XFAIL [ 74%] 2023-01-11T23:10:17.2018653Z test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_scatter_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2018821Z 
test_ops.py::TestCompositeComplianceCUDA::test_backward_asin_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2018993Z test_ops.py::TestCompositeComplianceCUDA::test_backward_atan_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2019166Z test_ops.py::TestCompositeComplianceCUDA::test_backward_atanh_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2019345Z test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_1d_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2019521Z test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_3d_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2019697Z test_ops.py::TestCompositeComplianceCUDA::test_backward_baddbmm_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2019877Z test_ops.py::TestCompositeComplianceCUDA::test_backward_bernoulli_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2020048Z test_ops.py::TestCompositeComplianceCUDA::test_backward_bmm_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2020235Z test_ops.py::TestCompositeComplianceCUDA::test_backward_broadcast_tensors_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2020422Z test_ops.py::TestCompositeComplianceCUDA::test_backward_broadcast_to_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2020656Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cartesian_prod_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2020855Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cat_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2021030Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cdist_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2021204Z test_ops.py::TestCompositeComplianceCUDA::test_backward_ceil_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2021378Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cfloat_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2021555Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2021731Z test_ops.py::TestCompositeComplianceCUDA::test_backward_chunk_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2021903Z test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_min_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2022083Z test_ops.py::TestCompositeComplianceCUDA::test_backward_column_stack_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2022269Z test_ops.py::TestCompositeComplianceCUDA::test_backward_combinations_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2022441Z test_ops.py::TestCompositeComplianceCUDA::test_backward_conj_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2022625Z test_ops.py::TestCompositeComplianceCUDA::test_backward_conj_physical_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2022802Z test_ops.py::TestCompositeComplianceCUDA::test_backward_contiguous_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2022975Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cosh_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2023149Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cummin_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2023349Z test_ops.py::TestCompositeComplianceCUDA::test_backward_deg2rad_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2023522Z test_ops.py::TestCompositeComplianceCUDA::test_backward_diag_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2023702Z test_ops.py::TestCompositeComplianceCUDA::test_backward_diagflat_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2023880Z test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2024071Z 
test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_scatter_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2024242Z test_ops.py::TestCompositeComplianceCUDA::test_backward_dist_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2024436Z test_ops.py::TestCompositeComplianceCUDA::test_backward_div_no_rounding_mode_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2024609Z test_ops.py::TestCompositeComplianceCUDA::test_backward_dot_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2024783Z test_ops.py::TestCompositeComplianceCUDA::test_backward_double_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2024950Z test_ops.py::TestCompositeComplianceCUDA::test_backward_erfc_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2025124Z test_ops.py::TestCompositeComplianceCUDA::test_backward_erfinv_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2025300Z test_ops.py::TestCompositeComplianceCUDA::test_backward_exp2_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2025477Z test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_as_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2025652Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fft2_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2025825Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fft_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2025998Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fftn_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2026171Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfft_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2026342Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifft2_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2026513Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifft_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2026717Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifftshift_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2026898Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ihfft2_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2027075Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ihfft_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2027252Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_irfft_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2027425Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_rfft2_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2027599Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_rfftn_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2027769Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fill_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2027940Z test_ops.py::TestCompositeComplianceCUDA::test_backward_flipud_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2028115Z test_ops.py::TestCompositeComplianceCUDA::test_backward_float_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2028288Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fmin_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2028459Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fmod_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2028630Z test_ops.py::TestCompositeComplianceCUDA::test_backward_gather_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2028876Z test_ops.py::TestCompositeComplianceCUDA::test_backward_half_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2029052Z test_ops.py::TestCompositeComplianceCUDA::test_backward_hstack_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2029227Z 
test_ops.py::TestCompositeComplianceCUDA::test_backward_hypot_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2029424Z test_ops.py::TestCompositeComplianceCUDA::test_backward_index_add_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2029603Z test_ops.py::TestCompositeComplianceCUDA::test_backward_index_copy_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2029782Z test_ops.py::TestCompositeComplianceCUDA::test_backward_index_fill_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2029959Z test_ops.py::TestCompositeComplianceCUDA::test_backward_index_reduce_cuda_float32 XFAIL [ 75%] 2023-01-11T23:10:17.2030141Z test_ops.py::TestCompositeComplianceCUDA::test_backward_index_select_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2030313Z test_ops.py::TestCompositeComplianceCUDA::test_backward_inner_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2030484Z test_ops.py::TestCompositeComplianceCUDA::test_backward_kron_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2030658Z test_ops.py::TestCompositeComplianceCUDA::test_backward_kthvalue_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2030829Z test_ops.py::TestCompositeComplianceCUDA::test_backward_lerp_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2031006Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cond_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2031184Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cross_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2031365Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_det_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2031558Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_det_singular_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2031734Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eig_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2031909Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eigh_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2032111Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_householder_product_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2032313Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_factor_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2032504Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_factor_ex_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2032677Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_pinv_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2032884Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_qr_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2033068Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2033255Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_ex_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2033450Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_triangular_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2033626Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_svd_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2033811Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_tensorinv_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2033997Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_vecdot_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2034179Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_vector_norm_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2034354Z 
test_ops.py::TestCompositeComplianceCUDA::test_backward_log10_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2034528Z test_ops.py::TestCompositeComplianceCUDA::test_backward_log1p_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2034699Z test_ops.py::TestCompositeComplianceCUDA::test_backward_log2_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2034871Z test_ops.py::TestCompositeComplianceCUDA::test_backward_log_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2035048Z test_ops.py::TestCompositeComplianceCUDA::test_backward_logaddexp_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2035235Z test_ops.py::TestCompositeComplianceCUDA::test_backward_logcumsumexp_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2035432Z test_ops.py::TestCompositeComplianceCUDA::test_backward_logit_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2035613Z test_ops.py::TestCompositeComplianceCUDA::test_backward_logsumexp_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2035786Z test_ops.py::TestCompositeComplianceCUDA::test_backward_lu_solve_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2035957Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mH_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2036126Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mT_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2036311Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_cumsum_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2036500Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_logsumexp_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2036683Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_mean_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2036872Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_median_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2037063Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_normalize_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2037243Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_scatter_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2037426Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_select_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2037611Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_softmin_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2037793Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_sum_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2037971Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_var_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2038147Z test_ops.py::TestCompositeComplianceCUDA::test_backward_matmul_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2038324Z test_ops.py::TestCompositeComplianceCUDA::test_backward_max_binary_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2038535Z test_ops.py::TestCompositeComplianceCUDA::test_backward_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2038757Z test_ops.py::TestCompositeComplianceCUDA::test_backward_max_reduction_with_dim_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2038926Z test_ops.py::TestCompositeComplianceCUDA::test_backward_median_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2039120Z test_ops.py::TestCompositeComplianceCUDA::test_backward_min_reduction_no_dim_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2039296Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mode_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2039474Z test_ops.py::TestCompositeComplianceCUDA::test_backward_movedim_cuda_float32 PASSED [ 75%] 
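Note: the TestCompositeComplianceCUDA::test_backward_* cases in this block roughly check that composite operators remain well-behaved under tensor-subclass dispatch in the backward pass. To rerun a single case from this shard locally, standard pytest filtering works; the sketch below assumes a pytorch checkout with the tests at test/test_ops.py and uses the real pytest.main API.

    import pytest

    # -k selects tests whose id matches the expression; -v mirrors the
    # verbose per-test output seen in this log.
    pytest.main([
        "test/test_ops.py",
        "-v",
        "-k", "TestCompositeComplianceCUDA and test_backward_linalg_svd_cuda_float32",
    ])
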
2023-01-11T23:10:17.2039647Z test_ops.py::TestCompositeComplianceCUDA::test_backward_msort_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2039842Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2040018Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nanmean_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2040201Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nanquantile_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2040381Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nansum_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2040589Z test_ops.py::TestCompositeComplianceCUDA::test_backward_native_batch_norm_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2040812Z test_ops.py::TestCompositeComplianceCUDA::test_backward_native_dropout_backward_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2040985Z test_ops.py::TestCompositeComplianceCUDA::test_backward_neg_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2041194Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2041398Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2041630Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_alpha_dropout_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2041834Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool2d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2042027Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_batch_norm_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2042236Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2042450Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2042639Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_celu_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2042829Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv1d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2043024Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv2d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2043228Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2043441Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2043647Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cosine_similarity_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2043838Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_ctc_loss_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2044035Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout2d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2044225Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout3d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2044420Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2044614Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_elu_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2044856Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2045082Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2045291Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2045501Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2045693Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_glu_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2045889Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_group_norm_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2046094Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardtanh_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2046287Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_huber_loss_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2046491Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_instance_norm_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2046697Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_area_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2046905Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2047113Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2047319Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2047534Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_kl_div_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2047731Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_l1_loss_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2047926Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_linear_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2048121Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2048317Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_pool3d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2048515Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool2d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2048719Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2048919Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool3d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2049120Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2049314Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_mish_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2049506Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_mse_loss_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2049714Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2049920Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2050116Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_normalize_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2050311Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_circular_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2050509Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_constant_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2050762Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_reflect_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2050992Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_replicate_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2051199Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pairwise_distance_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2051392Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pdist_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2051584Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_prelu_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2051769Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_relu6_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2051963Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_relu_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2052147Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_selu_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2052335Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_silu_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2052534Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softshrink_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2052734Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_tanhshrink_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2052931Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_threshold_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2053140Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2053361Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2053590Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2053787Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_upsample_nearest_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2053963Z test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2054150Z test_ops.py::TestCompositeComplianceCUDA::test_backward_normal_number_mean_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2054324Z test_ops.py::TestCompositeComplianceCUDA::test_backward_ormqr_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2054594Z test_ops.py::TestCompositeComplianceCUDA::test_backward_outer_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2054845Z test_ops.py::TestCompositeComplianceCUDA::test_backward_pca_lowrank_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 76%]
2023-01-11T23:10:17.2055022Z test_ops.py::TestCompositeComplianceCUDA::test_backward_polar_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2055220Z test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_0_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2055419Z test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_2_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2055603Z test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_3_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2055795Z test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_4_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2055970Z test_ops.py::TestCompositeComplianceCUDA::test_backward_put_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2056150Z test_ops.py::TestCompositeComplianceCUDA::test_backward_quantile_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2056329Z test_ops.py::TestCompositeComplianceCUDA::test_backward_rad2deg_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2056506Z test_ops.py::TestCompositeComplianceCUDA::test_backward_real_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2056683Z test_ops.py::TestCompositeComplianceCUDA::test_backward_repeat_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2056906Z test_ops.py::TestCompositeComplianceCUDA::test_backward_reshape_as_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2057079Z test_ops.py::TestCompositeComplianceCUDA::test_backward_reshape_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2057254Z test_ops.py::TestCompositeComplianceCUDA::test_backward_roll_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2057431Z test_ops.py::TestCompositeComplianceCUDA::test_backward_round_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2057621Z test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_0_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2057811Z test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_neg_3_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2057988Z test_ops.py::TestCompositeComplianceCUDA::test_backward_rsub_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2058180Z test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amin_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2058371Z test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_prod_cuda_float32 XFAIL [ 76%]
2023-01-11T23:10:17.2058556Z test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_sum_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2058744Z test_ops.py::TestCompositeComplianceCUDA::test_backward_segment_reduce_lengths_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2058944Z test_ops.py::TestCompositeComplianceCUDA::test_backward_segment_reduce_offsets_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2059119Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sgn_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2059299Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sin_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2059508Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sinc_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2059685Z test_ops.py::TestCompositeComplianceCUDA::test_backward_slice_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2059866Z test_ops.py::TestCompositeComplianceCUDA::test_backward_softmax_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2060036Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sort_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2060216Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_erfcx_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2060399Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_i0e_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2060579Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_i1_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2060760Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_i1e_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2060948Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_log_ndtr_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2061131Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_ndtr_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2061320Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_xlog1py_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2061496Z test_ops.py::TestCompositeComplianceCUDA::test_backward_split_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2061672Z test_ops.py::TestCompositeComplianceCUDA::test_backward_squeeze_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2061840Z test_ops.py::TestCompositeComplianceCUDA::test_backward_stack_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2062031Z test_ops.py::TestCompositeComplianceCUDA::test_backward_std_mean_unbiased_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2062204Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sub_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2062375Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sum_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2062550Z test_ops.py::TestCompositeComplianceCUDA::test_backward_t_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2062724Z test_ops.py::TestCompositeComplianceCUDA::test_backward_take_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2062922Z test_ops.py::TestCompositeComplianceCUDA::test_backward_tan_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2063109Z test_ops.py::TestCompositeComplianceCUDA::test_backward_tensor_split_cuda_float32 XFAIL [ 76%]
2023-01-11T23:10:17.2063289Z test_ops.py::TestCompositeComplianceCUDA::test_backward_tensordot_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2063454Z test_ops.py::TestCompositeComplianceCUDA::test_backward_tile_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2063627Z test_ops.py::TestCompositeComplianceCUDA::test_backward_to_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2063798Z test_ops.py::TestCompositeComplianceCUDA::test_backward_topk_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2063975Z test_ops.py::TestCompositeComplianceCUDA::test_backward_trapezoid_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2064159Z test_ops.py::TestCompositeComplianceCUDA::test_backward_true_divide_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2064335Z test_ops.py::TestCompositeComplianceCUDA::test_backward_trunc_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2064520Z test_ops.py::TestCompositeComplianceCUDA::test_backward_unflatten_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2064697Z test_ops.py::TestCompositeComplianceCUDA::test_backward_unsqueeze_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2064861Z test_ops.py::TestCompositeComplianceCUDA::test_backward_var_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2065054Z test_ops.py::TestCompositeComplianceCUDA::test_backward_var_mean_unbiased_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2065231Z test_ops.py::TestCompositeComplianceCUDA::test_backward_view_as_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2065410Z test_ops.py::TestCompositeComplianceCUDA::test_backward_view_copy_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2065610Z test_ops.py::TestCompositeComplianceCUDA::test_backward_view_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2065787Z test_ops.py::TestCompositeComplianceCUDA::test_backward_zero__cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2065974Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___getitem___cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2066152Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rdiv___cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2066331Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rmod___cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2066498Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rmul___cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2066671Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rpow___cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2066870Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__softmax_backward_data_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2067048Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_abs_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2067226Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_acos_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2067401Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_add_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2067580Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addbmm_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2067758Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addcdiv_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2067927Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2068099Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addr_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2068317Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_allclose_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2068491Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_amin_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2068775Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_any_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2069017Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_arange_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2069227Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argmax_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2069442Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argsort_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2069657Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argwhere_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2069833Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2070035Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_partial_views_cuda_float32 XFAIL [ 76%]
2023-01-11T23:10:17.2070230Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_scatter_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2070407Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_asin_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2070590Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atan2_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2070767Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atanh_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2070948Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_2d_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2071128Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_3d_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2071311Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bernoulli_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2071477Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bmm_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2071693Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cartesian_prod_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2071867Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cat_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2072080Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cdist_cuda_float32 SKIPPED (Does not support forward_ad) [ 76%]
2023-01-11T23:10:17.2072290Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cfloat_cuda_float32 SKIPPED (Does not support forward_ad) [ 76%]
2023-01-11T23:10:17.2072505Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_cuda_float32 SKIPPED (Does not support forward_ad) [ 76%]
2023-01-11T23:10:17.2072693Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_solve_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2072878Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_min_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2073054Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clone_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2073225Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_conj_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2073414Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_conj_physical_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2073605Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_constant_pad_nd_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2073777Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cos_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2073951Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cross_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2074128Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumsum_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2074306Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_deg2rad_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2074488Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diag_embed_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2074674Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagonal_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2074843Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diff_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2075046Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_digamma_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2075220Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dist_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2075413Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_div_floor_rounding_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2075610Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_div_no_rounding_mode_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2075791Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_double_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2076000Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_empty_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2076215Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_empty_like_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2076387Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_erfc_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2076562Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expand_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2076737Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expm1_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2076913Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fft2_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2077093Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_hfft2_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2077272Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_hfftn_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2077452Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifftn_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2077658Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ihfft2_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2077839Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_irfftn_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2078011Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_rfft2_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2078189Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_rfft_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2078365Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flip_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2078575Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_float_cuda_float32 SKIPPED (Does not support forward_ad) [ 76%]
2023-01-11T23:10:17.2078759Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_float_power_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2078974Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_floor_divide_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2079150Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmax_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2079324Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmod_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2079501Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_frac_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2079671Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_frexp_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2079878Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_full_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2080091Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_full_like_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2080273Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gather_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2080481Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_geqrf_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2080664Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gradient_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2080945Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_grid_sampler_2d_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2081150Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gt_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2081361Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_histc_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2081532Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hsplit_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2081718Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_copy_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2081897Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_put_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2082084Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_select_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2082294Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_int_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2082503Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isclose_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2082716Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isfinite_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2082926Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isin_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2083130Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isinf_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2083328Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isnan_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2083538Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isposinf_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2083797Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_4inputs_with_extra_args_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2084019Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_binary_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2084250Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_binary_return_by_ref_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2084470Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_unary_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2084651Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_kthvalue_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2084828Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lgamma_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2085016Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cond_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2085211Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_det_singular_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2085396Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eigvalsh_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2085599Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_householder_product_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2085787Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_inv_ex_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2086007Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_factor_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2086227Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_factor_ex_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2086449Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_solve_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2086647Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2086850Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2087042Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_solve_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2087226Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_norm_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2087420Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_power_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2087609Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_multi_dot_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2087792Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_norm_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2088001Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2088202Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_pinv_hermitian_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2088388Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_solve_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2088585Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_solve_triangular_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2088772Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_vander_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2088975Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linspace_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2089151Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log10_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2089326Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2089552Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_softmax_with_dtype_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2089741Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logaddexp2_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2089963Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logcumsumexp_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2090146Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logdet_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2090361Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_and_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2090608Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_not_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2090831Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_xor_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2091007Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lu_solve_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2091188Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lu_unpack_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2091371Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_amin_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2091589Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_argmax_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2091804Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_argmin_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2091994Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_cumprod_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2092187Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_logsumexp_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2092372Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_mean_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2092554Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_median_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2092772Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_normalize_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2092956Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_var_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2093137Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_matmul_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2093318Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_binary_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2093514Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_reduction_no_dim_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2093710Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_reduction_with_dim_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2093892Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_maximum_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2094073Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_binary_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2094256Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_reduction_no_dim_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2094435Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_msort_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2094919Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mul_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2095096Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mv_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2095292Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2095487Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2095678Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2095908Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nan_to_num_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2096092Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nanmean_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2096266Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nanmedian_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2096482Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_narrow_copy_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2096712Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_dropout_backward_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2096935Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_layer_norm_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2097141Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ne_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2097363Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_empty_strided_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2097576Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_ones_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2097825Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional__scaled_dot_product_attention_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2098034Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2098243Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2098445Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2098657Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2098860Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2099093Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_avg_pool2d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2099337Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_batch_norm_without_cudnn_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2099535Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_bilinear_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2099744Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2099939Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_celu_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2100148Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2100347Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2100546Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_cosine_similarity_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2100746Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_cross_entropy_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2100971Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_ctc_loss_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2101173Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_dropout2d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2101371Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_dropout3d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2101590Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_dropout_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2101824Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_embedding_bag_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2102021Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_embedding_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2102241Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2102442Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2102653Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2102855Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_gaussian_nll_loss_cuda_float32 XFAIL [ 77%]
2023-01-11T23:10:17.2103047Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_gelu_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2103239Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_glu_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2103438Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardswish_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2103635Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardtanh_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2103838Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_area_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2104048Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2104257Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_linear_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2104447Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_kl_div_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2104644Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_leaky_relu_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2104864Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_logsigmoid_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2105074Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2105274Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_pool1d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2105488Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool2d_cuda_float32 SKIPPED (Skipped!) [ 77%]
2023-01-11T23:10:17.2105693Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2105887Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_mish_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2106125Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_multi_margin_loss_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2106315Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_nll_loss_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2106517Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_circular_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2106717Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_constant_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2106914Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_reflect_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2107142Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pdist_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2107368Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2107557Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_relu6_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2107754Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_rrelu_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2107943Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_selu_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2108127Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_silu_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2108324Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softmin_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2108525Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_threshold_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2108790Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2109017Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2109214Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_unfold_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2109421Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_upsample_nearest_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2109598Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_fro_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2109781Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_inf_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2109953Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_nuc_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2110158Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_normal_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2110371Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ones_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2110585Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ormqr_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2110793Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_permute_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2110990Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_1_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2111186Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_3_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2111362Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_positive_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2111535Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_qr_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2111747Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randint_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2111958Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randint_like_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2112166Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randn_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2112377Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randn_like_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2112559Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_reciprocal_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2112740Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_repeat_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2112921Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_reshape_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2113128Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resize__cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2113364Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resize_as__cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2113541Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_roll_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2113728Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_3_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2113922Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_neg_3_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2114100Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rsqrt_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2114283Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_add_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2114477Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_amin_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2114668Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_sum_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2114845Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_select_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2115054Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_short_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2115233Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sigmoid_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2115400Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sign_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2115627Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_bartlett_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2115857Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_exponential_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2116088Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_general_hamming_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2116316Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_hamming_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2116548Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signbit_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2116727Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sin_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2116902Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sinc_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2117092Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_slice_scatter_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2117278Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_softmax_with_dtype_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2117495Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_airy_ai_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2117717Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_y1_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2117956Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_u_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2118144Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_erfcx_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2118379Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_hermite_polynomial_h_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2118612Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_hermite_polynomial_he_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2118796Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i0e_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2118980Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i1_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2119193Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i1e_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2119377Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_log_ndtr_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2119610Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_k0_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2119795Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_ndtr_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2120009Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2120242Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_scaled_modified_bessel_k0_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2120686Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 78%]
2023-01-11T23:10:17.2121085Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 78%]
2023-01-11T23:10:17.2121313Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_spherical_bessel_j0_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2121499Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_xlog1py_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2121675Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_split_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2121856Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_split_with_sizes_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2122029Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sqrt_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2122208Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_stack_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2122385Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2122580Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_stft_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2122835Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_svd_lowrank_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 78%]
2023-01-11T23:10:17.2123047Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_symeig_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%]
2023-01-11T23:10:17.2123217Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_t_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2123404Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_take_along_dim_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2123572Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_take_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2123745Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tan_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2123929Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tensor_split_cuda_float32 XFAIL [ 78%]
2023-01-11T23:10:17.2124142Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_to_sparse_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%]
2023-01-11T23:10:17.2124318Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trace_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2124500Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_transpose_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2124683Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trapezoid_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2124875Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_triangular_solve_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2125046Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_triu_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2125248Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_true_divide_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2125423Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unbind_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2125606Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unfold_copy_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2125781Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unfold_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2125955Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2126144Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_mean_unbiased_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2126329Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_unbiased_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2126515Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_as_complex_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2126682Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2126857Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_zero__cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2127031Z test_ops.py::TestCompositeComplianceCUDA::test_operator_H_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2127197Z test_ops.py::TestCompositeComplianceCUDA::test_operator_T_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2127374Z test_ops.py::TestCompositeComplianceCUDA::test_operator___radd___cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2127550Z test_ops.py::TestCompositeComplianceCUDA::test_operator___rdiv___cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2127723Z test_ops.py::TestCompositeComplianceCUDA::test_operator___rmul___cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2127894Z test_ops.py::TestCompositeComplianceCUDA::test_operator___rpow___cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2128066Z test_ops.py::TestCompositeComplianceCUDA::test_operator___rsub___cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2128257Z test_ops.py::TestCompositeComplianceCUDA::test_operator__softmax_backward_data_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2128430Z test_ops.py::TestCompositeComplianceCUDA::test_operator_abs_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2128629Z test_ops.py::TestCompositeComplianceCUDA::test_operator_acos_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2128805Z test_ops.py::TestCompositeComplianceCUDA::test_operator_addcmul_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2128979Z test_ops.py::TestCompositeComplianceCUDA::test_operator_addmm_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2129170Z test_ops.py::TestCompositeComplianceCUDA::test_operator_addmm_decomposed_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2129344Z test_ops.py::TestCompositeComplianceCUDA::test_operator_addmv_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2129520Z test_ops.py::TestCompositeComplianceCUDA::test_operator_allclose_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2129682Z test_ops.py::TestCompositeComplianceCUDA::test_operator_angle_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2129861Z test_ops.py::TestCompositeComplianceCUDA::test_operator_argmax_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2130037Z test_ops.py::TestCompositeComplianceCUDA::test_operator_argsort_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2130214Z test_ops.py::TestCompositeComplianceCUDA::test_operator_argwhere_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2130393Z test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2130581Z test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_scatter_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2130762Z test_ops.py::TestCompositeComplianceCUDA::test_operator_asin_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2130964Z test_ops.py::TestCompositeComplianceCUDA::test_operator_asinh_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2131149Z test_ops.py::TestCompositeComplianceCUDA::test_operator_atan_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2131349Z test_ops.py::TestCompositeComplianceCUDA::test_operator_atleast_1d_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2131529Z test_ops.py::TestCompositeComplianceCUDA::test_operator_atleast_3d_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2131708Z test_ops.py::TestCompositeComplianceCUDA::test_operator_baddbmm_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2131883Z test_ops.py::TestCompositeComplianceCUDA::test_operator_bfloat16_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2132059Z test_ops.py::TestCompositeComplianceCUDA::test_operator_block_diag_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2132231Z test_ops.py::TestCompositeComplianceCUDA::test_operator_bmm_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2132402Z test_ops.py::TestCompositeComplianceCUDA::test_operator_bool_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2132592Z test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_shapes_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2132786Z test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_tensors_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2132959Z test_ops.py::TestCompositeComplianceCUDA::test_operator_bucketize_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2133134Z test_ops.py::TestCompositeComplianceCUDA::test_operator_byte_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2133317Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cartesian_prod_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2133490Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cat_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2133668Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cdouble_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2133840Z test_ops.py::TestCompositeComplianceCUDA::test_operator_char_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2134016Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2134191Z test_ops.py::TestCompositeComplianceCUDA::test_operator_chunk_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2134365Z test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_max_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2134646Z test_ops.py::TestCompositeComplianceCUDA::test_operator_combinations_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2134876Z test_ops.py::TestCompositeComplianceCUDA::test_operator_complex_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2135055Z test_ops.py::TestCompositeComplianceCUDA::test_operator_conj_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2135242Z test_ops.py::TestCompositeComplianceCUDA::test_operator_constant_pad_nd_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2135419Z test_ops.py::TestCompositeComplianceCUDA::test_operator_copysign_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2135600Z test_ops.py::TestCompositeComplianceCUDA::test_operator_count_nonzero_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2135788Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cov_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2135965Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cross_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2136136Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cumprod_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2136331Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cumulative_trapezoid_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2136505Z test_ops.py::TestCompositeComplianceCUDA::test_operator_deg2rad_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2136676Z test_ops.py::TestCompositeComplianceCUDA::test_operator_diag_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2136851Z test_ops.py::TestCompositeComplianceCUDA::test_operator_diagflat_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2137036Z test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_copy_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2137211Z test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2137396Z test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_scatter_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2137590Z test_ops.py::TestCompositeComplianceCUDA::test_operator_diff_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2137770Z test_ops.py::TestCompositeComplianceCUDA::test_operator_digamma_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2137943Z test_ops.py::TestCompositeComplianceCUDA::test_operator_dist_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2138134Z test_ops.py::TestCompositeComplianceCUDA::test_operator_div_no_rounding_mode_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2138325Z test_ops.py::TestCompositeComplianceCUDA::test_operator_div_trunc_rounding_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2138500Z test_ops.py::TestCompositeComplianceCUDA::test_operator_dot_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2138674Z test_ops.py::TestCompositeComplianceCUDA::test_operator_einsum_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2138892Z test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 78%]
2023-01-11T23:10:17.2139121Z test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_like_cuda_float32 SKIPPED (Expected: empty_like is not comparable) [ 78%]
2023-01-11T23:10:17.2139290Z test_ops.py::TestCompositeComplianceCUDA::test_operator_equal_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2139463Z test_ops.py::TestCompositeComplianceCUDA::test_operator_erf_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2139637Z test_ops.py::TestCompositeComplianceCUDA::test_operator_exp_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2139812Z test_ops.py::TestCompositeComplianceCUDA::test_operator_expand_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2139983Z test_ops.py::TestCompositeComplianceCUDA::test_operator_expm1_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2140159Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fft2_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2140333Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fftn_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2140513Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfft_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2140685Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfftn_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2140902Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifft2_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2141077Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifftn_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2141257Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ihfft2_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2141434Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_irfft2_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2141607Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_irfft_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2141779Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_irfftn_cuda_float32
PASSED [ 78%] 2023-01-11T23:10:17.2141956Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfft2_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2142129Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfftn_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2142301Z test_ops.py::TestCompositeComplianceCUDA::test_operator_flatten_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2142475Z test_ops.py::TestCompositeComplianceCUDA::test_operator_flip_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2142650Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fliplr_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2142827Z test_ops.py::TestCompositeComplianceCUDA::test_operator_float_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2143006Z test_ops.py::TestCompositeComplianceCUDA::test_operator_float_power_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2143178Z test_ops.py::TestCompositeComplianceCUDA::test_operator_floor_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2143363Z test_ops.py::TestCompositeComplianceCUDA::test_operator_floor_divide_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2143562Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fmod_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2143732Z test_ops.py::TestCompositeComplianceCUDA::test_operator_frac_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2143899Z test_ops.py::TestCompositeComplianceCUDA::test_operator_full_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2144075Z test_ops.py::TestCompositeComplianceCUDA::test_operator_full_like_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2144248Z test_ops.py::TestCompositeComplianceCUDA::test_operator_gather_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2144423Z test_ops.py::TestCompositeComplianceCUDA::test_operator_geqrf_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2144595Z test_ops.py::TestCompositeComplianceCUDA::test_operator_gt_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2144770Z test_ops.py::TestCompositeComplianceCUDA::test_operator_half_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2144942Z test_ops.py::TestCompositeComplianceCUDA::test_operator_histc_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2145115Z test_ops.py::TestCompositeComplianceCUDA::test_operator_i0_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2145291Z test_ops.py::TestCompositeComplianceCUDA::test_operator_index_copy_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2145468Z test_ops.py::TestCompositeComplianceCUDA::test_operator_index_put_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2145641Z test_ops.py::TestCompositeComplianceCUDA::test_operator_isnan_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2145819Z test_ops.py::TestCompositeComplianceCUDA::test_operator_isneginf_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2146026Z test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_2inputs_2outputs_cuda_float32 SKIPPED (skip) [ 79%] 2023-01-11T23:10:17.2146238Z test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_4inputs_with_extra_args_cuda_float32 SKIPPED (skip) [ 79%] 2023-01-11T23:10:17.2146413Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ldexp_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2146582Z test_ops.py::TestCompositeComplianceCUDA::test_operator_le_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2146786Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigh_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2146983Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_householder_product_cuda_float32 PASSED [ 79%] 
2023-01-11T23:10:17.2147159Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_inv_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2147341Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_inv_ex_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2147531Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_factor_ex_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2147719Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_solve_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2147900Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lstsq_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2148614Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_factor_cuda_float32 SKIPPED (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/91685 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 79%] 2023-01-11T23:10:17.2148886Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_solve_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2149076Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_norm_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2149268Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_power_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2149451Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_rank_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2149682Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_rank_hermitian_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2149869Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_multi_dot_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2150050Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_norm_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2150231Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2150422Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_hermitian_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2150677Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 79%] 2023-01-11T23:10:17.2150853Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_qr_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2151037Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_slogdet_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2151219Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_solve_ex_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2151404Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_svdvals_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2151586Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_vander_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2151761Z test_ops.py::TestCompositeComplianceCUDA::test_operator_log1p_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2151954Z test_ops.py::TestCompositeComplianceCUDA::test_operator_log_softmax_with_dtype_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2152135Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logaddexp_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2152318Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logcumsumexp_cuda_float32 PASSED 
[ 79%] 2023-01-11T23:10:17.2152499Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_and_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2152683Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_or_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2152852Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logspace_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2153054Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logsumexp_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2153229Z test_ops.py::TestCompositeComplianceCUDA::test_operator_long_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2153397Z test_ops.py::TestCompositeComplianceCUDA::test_operator_lu_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2153574Z test_ops.py::TestCompositeComplianceCUDA::test_operator_lu_solve_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2153751Z test_ops.py::TestCompositeComplianceCUDA::test_operator_lu_unpack_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2153919Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mT_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2154102Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_amin_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2154279Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_argmax_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2154465Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_argmin_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2154651Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_cumprod_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2154838Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_logsumexp_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2155019Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_median_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2155205Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_normalize_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2155387Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_prod_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2155569Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_scatter_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2155784Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_select_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2155961Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_softmin_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2156140Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_std_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2156317Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_sum_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2156494Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_var_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2156671Z test_ops.py::TestCompositeComplianceCUDA::test_operator_matmul_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2156849Z test_ops.py::TestCompositeComplianceCUDA::test_operator_matrix_exp_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2157026Z test_ops.py::TestCompositeComplianceCUDA::test_operator_max_binary_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2157233Z test_ops.py::TestCompositeComplianceCUDA::test_operator_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2157430Z test_ops.py::TestCompositeComplianceCUDA::test_operator_max_reduction_no_dim_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2157600Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_maximum_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2157790Z test_ops.py::TestCompositeComplianceCUDA::test_operator_min_reduction_no_dim_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2157983Z test_ops.py::TestCompositeComplianceCUDA::test_operator_min_reduction_with_dim_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2158155Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mm_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2158330Z test_ops.py::TestCompositeComplianceCUDA::test_operator_movedim_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2158504Z test_ops.py::TestCompositeComplianceCUDA::test_operator_msort_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2158682Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mul_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2158852Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mv_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2159062Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2159257Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2159437Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nanmedian_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2159619Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nanquantile_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2159792Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nansum_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2159969Z test_ops.py::TestCompositeComplianceCUDA::test_operator_narrow_copy_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2160144Z test_ops.py::TestCompositeComplianceCUDA::test_operator_narrow_cuda_float32 XFAIL [ 79%] 2023-01-11T23:10:17.2160334Z test_ops.py::TestCompositeComplianceCUDA::test_operator_native_batch_norm_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2160539Z test_ops.py::TestCompositeComplianceCUDA::test_operator_native_layer_norm_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2160702Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ne_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2160874Z test_ops.py::TestCompositeComplianceCUDA::test_operator_neg_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2161050Z test_ops.py::TestCompositeComplianceCUDA::test_operator_new_ones_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2161226Z test_ops.py::TestCompositeComplianceCUDA::test_operator_new_zeros_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2161404Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nextafter_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2161622Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2161856Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2162060Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2162264Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2162446Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_celu_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2162634Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv2d_cuda_float32 PASSED [ 79%] 
2023-01-11T23:10:17.2162837Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2163038Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2163241Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cross_entropy_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2163437Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_ctc_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2163631Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout3d_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2163822Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2164008Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_elu_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2164225Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_embedding_bag_cuda_float32 SKIPPED (Allowed exemption) [ 79%] 2023-01-11T23:10:17.2164440Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_embedding_cuda_float32 SKIPPED (Allowed exemption) [ 79%] 2023-01-11T23:10:17.2164660Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2164908Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2165099Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_glu_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2165291Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardtanh_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2165494Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2165687Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_huber_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2165883Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_instance_norm_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2166082Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_area_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2166288Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2166477Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_kl_div_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2166673Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_layer_norm_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2166876Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_local_response_norm_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2167072Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_pool1d_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2167263Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_pool3d_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2167487Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool1d_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2167677Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_mish_cuda_float32 PASSED [ 79%] 
2023-01-11T23:10:17.2167871Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_mse_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2168066Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2168272Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2168465Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_nll_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2168662Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_circular_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2168858Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_reflect_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2169052Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pdist_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2169256Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2169444Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_prelu_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2169636Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_relu6_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2169817Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_selu_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2170001Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_silu_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2170193Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softmin_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2170400Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2170591Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softplus_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2170812Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softshrink_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2171005Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softsign_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2171208Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2171432Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2171629Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2171829Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_upsample_nearest_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2172012Z test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_inf_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2172203Z test_ops.py::TestCompositeComplianceCUDA::test_operator_normal_number_mean_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2172378Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ormqr_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2172553Z test_ops.py::TestCompositeComplianceCUDA::test_operator_outer_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2172727Z test_ops.py::TestCompositeComplianceCUDA::test_operator_polar_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2172924Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_0_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2173122Z test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_3_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2173288Z test_ops.py::TestCompositeComplianceCUDA::test_operator_pow_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2173522Z test_ops.py::TestCompositeComplianceCUDA::test_operator_prod_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2173697Z test_ops.py::TestCompositeComplianceCUDA::test_operator_put_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2173875Z test_ops.py::TestCompositeComplianceCUDA::test_operator_rand_like_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2174052Z test_ops.py::TestCompositeComplianceCUDA::test_operator_randint_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2174230Z test_ops.py::TestCompositeComplianceCUDA::test_operator_randn_like_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2174410Z test_ops.py::TestCompositeComplianceCUDA::test_operator_remainder_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2174677Z test_ops.py::TestCompositeComplianceCUDA::test_operator_repeat_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2174876Z test_ops.py::TestCompositeComplianceCUDA::test_operator_resize__cuda_float32 SKIPPED (Allowed exception) [ 80%] 2023-01-11T23:10:17.2175084Z test_ops.py::TestCompositeComplianceCUDA::test_operator_resize_as__cuda_float32 SKIPPED (Allowed exemption) [ 80%] 2023-01-11T23:10:17.2175265Z test_ops.py::TestCompositeComplianceCUDA::test_operator_resolve_neg_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2175443Z test_ops.py::TestCompositeComplianceCUDA::test_operator_roll_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2175614Z test_ops.py::TestCompositeComplianceCUDA::test_operator_rot90_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2175804Z test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_3_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2175978Z test_ops.py::TestCompositeComplianceCUDA::test_operator_rsqrt_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2176161Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scalar_tensor_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2176350Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_sum_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2176540Z test_ops.py::TestCompositeComplianceCUDA::test_operator_segment_reduce_lengths_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2176713Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sgn_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2176935Z test_ops.py::TestCompositeComplianceCUDA::test_operator_short_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2177115Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sigmoid_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2177288Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sign_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2177483Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_cosine_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2177685Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_general_hamming_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2177877Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_hamming_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2178071Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_hann_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2178256Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_kaiser_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2178431Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sinc_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2178604Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sinh_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2178793Z test_ops.py::TestCompositeComplianceCUDA::test_operator_softmax_with_dtype_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2178978Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_airy_ai_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2179165Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_j0_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2179365Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_t_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2179793Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 80%] 2023-01-11T23:10:17.2179982Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_erfcx_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2180192Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_hermite_polynomial_he_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2180403Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_i0e_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2180590Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_i1_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2180772Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_i1e_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2180972Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_laguerre_polynomial_l_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2181346Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 80%] 2023-01-11T23:10:17.2181549Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_k0_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2181746Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_k1_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2181928Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_ndtri_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2182305Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 80%] 2023-01-11T23:10:17.2182689Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 80%] 2023-01-11T23:10:17.2182893Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_spherical_bessel_j0_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2183081Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_xlog1py_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2183283Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sqrt_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2183461Z test_ops.py::TestCompositeComplianceCUDA::test_operator_stack_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2183639Z test_ops.py::TestCompositeComplianceCUDA::test_operator_std_mean_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2183827Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_std_mean_unbiased_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2184008Z test_ops.py::TestCompositeComplianceCUDA::test_operator_std_unbiased_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2184181Z test_ops.py::TestCompositeComplianceCUDA::test_operator_stft_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2184345Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sub_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2184530Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sum_to_size_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2184706Z test_ops.py::TestCompositeComplianceCUDA::test_operator_symeig_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2184878Z test_ops.py::TestCompositeComplianceCUDA::test_operator_t_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2185051Z test_ops.py::TestCompositeComplianceCUDA::test_operator_take_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2185222Z test_ops.py::TestCompositeComplianceCUDA::test_operator_tan_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2185400Z test_ops.py::TestCompositeComplianceCUDA::test_operator_tensor_split_cuda_float32 XFAIL [ 80%] 2023-01-11T23:10:17.2185580Z test_ops.py::TestCompositeComplianceCUDA::test_operator_tensordot_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2185744Z test_ops.py::TestCompositeComplianceCUDA::test_operator_topk_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2185953Z test_ops.py::TestCompositeComplianceCUDA::test_operator_transpose_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2186141Z test_ops.py::TestCompositeComplianceCUDA::test_operator_triangular_solve_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2186317Z test_ops.py::TestCompositeComplianceCUDA::test_operator_triu_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2186497Z test_ops.py::TestCompositeComplianceCUDA::test_operator_true_divide_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2186672Z test_ops.py::TestCompositeComplianceCUDA::test_operator_trunc_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2186844Z test_ops.py::TestCompositeComplianceCUDA::test_operator_unbind_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2187022Z test_ops.py::TestCompositeComplianceCUDA::test_operator_unflatten_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2187202Z test_ops.py::TestCompositeComplianceCUDA::test_operator_unfold_copy_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2187371Z test_ops.py::TestCompositeComplianceCUDA::test_operator_unfold_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2187565Z test_ops.py::TestCompositeComplianceCUDA::test_operator_unique_consecutive_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2187741Z test_ops.py::TestCompositeComplianceCUDA::test_operator_unique_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2187918Z test_ops.py::TestCompositeComplianceCUDA::test_operator_unsqueeze_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2188089Z test_ops.py::TestCompositeComplianceCUDA::test_operator_var_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2188279Z test_ops.py::TestCompositeComplianceCUDA::test_operator_var_mean_unbiased_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2188452Z test_ops.py::TestCompositeComplianceCUDA::test_operator_vdot_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2188638Z test_ops.py::TestCompositeComplianceCUDA::test_operator_view_as_complex_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2188878Z test_ops.py::TestCompositeComplianceCUDA::test_operator_view_as_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2189062Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_view_copy_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2189237Z test_ops.py::TestCompositeComplianceCUDA::test_operator_where_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2189443Z test_ops.py::TestCompositeComplianceCUDA::test_operator_zeros_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2189624Z test_ops.py::TestCompositeComplianceCUDA::test_operator_zeros_like_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2189781Z test_ops.py::TestMathBitsCUDA::test_conj_view_H_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2189939Z test_ops.py::TestMathBitsCUDA::test_conj_view___rsub___cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2190097Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_T_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2190280Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_bfloat16_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2190451Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_cdouble_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2190626Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_chalf_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2190801Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_char_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2190976Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_double_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2191145Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_float_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2191314Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_half_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2191479Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_long_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2191647Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_short_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2191799Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_abs_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2191984Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_acosh_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2192152Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_allclose_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2192365Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_as_strided_cuda_complex64 SKIPPED (Errors when storage_offset is included) [ 80%] 2023-01-11T23:10:17.2192525Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atanh_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2192690Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atleast_2d_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2192854Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atleast_3d_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2193022Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_broadcast_to_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2193180Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_chunk_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2193333Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_clone_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2193496Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_column_stack_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2193657Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_conj_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2193827Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_conj_physical_cuda_complex64 PASSED [ 80%] 
2023-01-11T23:10:17.2194000Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_constant_pad_nd_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2194160Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cos_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2194319Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cosh_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2194477Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cumsum_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2194642Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diag_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2194803Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_copy_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2194991Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2195155Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_dsplit_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2195319Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_dstack_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2195484Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_expand_as_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2195644Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fft2_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2195807Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fft_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2195969Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fftn_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2196130Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfft2_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2196289Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfft_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2196449Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifft_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2196609Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifftn_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2196773Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfft2_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2196940Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_float_power_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2197101Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_hsplit_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2197261Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_imag_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2197419Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_add_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2197605Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isreal_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2197765Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_lerp_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2197941Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_matrix_norm_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2198108Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_norm_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2198278Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_svdvals_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2198449Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_vector_norm_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2198606Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log10_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2198768Z 
test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log1p_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2198922Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log2_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2199099Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log_softmax_with_dtype_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2199268Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_and_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2199434Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_or_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2199596Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_masked_fill_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2199783Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_meshgrid_variadic_tensors_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2199945Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_movedim_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2200103Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_neg_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2200309Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_empty_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 80%] 2023-01-11T23:10:17.2200527Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_empty_strided_cuda_complex64 SKIPPED (Expected: empty_strided is not comparable) [ 80%] 2023-01-11T23:10:17.2200719Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_zeros_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2200898Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_l1_loss_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2201093Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_log_softmax_with_dtype_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2201283Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_pairwise_distance_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2201473Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_softmax_with_dtype_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2201665Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_softmin_with_dtype_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2201852Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_tanhshrink_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2202046Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_triplet_margin_loss_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2202198Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ones_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:10:17.2202363Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_positive_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2202524Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ravel_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2202685Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_real_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2202850Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reciprocal_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2203012Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_repeat_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2203204Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reshape_as_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2203367Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reshape_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2203522Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_roll_cuda_complex64 PASSED [ 
81%] 2023-01-11T23:10:17.2203681Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rot90_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2203839Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rsqrt_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2203998Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sgn_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2204150Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sin_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2204307Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sinc_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2204461Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sinh_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2204656Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_special_log_softmax_with_dtype_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2204815Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sub_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2204965Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_t_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2205123Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tan_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2205281Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_trace_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2205447Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_transpose_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2205612Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_true_divide_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2205773Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unbind_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2205946Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unfold_copy_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2206106Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unfold_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2206289Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_var_mean_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2206448Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_view_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2206607Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_vsplit_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2206764Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_zeros_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:10:17.2206918Z test_ops.py::TestMathBitsCUDA::test_conj_view_abs_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2207070Z test_ops.py::TestMathBitsCUDA::test_conj_view_acos_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2207225Z test_ops.py::TestMathBitsCUDA::test_conj_view_add_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2207384Z test_ops.py::TestMathBitsCUDA::test_conj_view_addbmm_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2207544Z test_ops.py::TestMathBitsCUDA::test_conj_view_addcdiv_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2207695Z test_ops.py::TestMathBitsCUDA::test_conj_view_addmv_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2207856Z test_ops.py::TestMathBitsCUDA::test_conj_view_allclose_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2208006Z test_ops.py::TestMathBitsCUDA::test_conj_view_any_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2208158Z test_ops.py::TestMathBitsCUDA::test_conj_view_asin_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2208312Z test_ops.py::TestMathBitsCUDA::test_conj_view_atan_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2208465Z test_ops.py::TestMathBitsCUDA::test_conj_view_atanh_cuda_complex64 
PASSED [ 81%] 2023-01-11T23:10:17.2208623Z test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_1d_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2208805Z test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_2d_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2208952Z test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_3d_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2209110Z test_ops.py::TestMathBitsCUDA::test_conj_view_bfloat16_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2209263Z test_ops.py::TestMathBitsCUDA::test_conj_view_bmm_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2209412Z test_ops.py::TestMathBitsCUDA::test_conj_view_bool_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2209582Z test_ops.py::TestMathBitsCUDA::test_conj_view_broadcast_tensors_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2209734Z test_ops.py::TestMathBitsCUDA::test_conj_view_byte_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2209892Z test_ops.py::TestMathBitsCUDA::test_conj_view_cfloat_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2210047Z test_ops.py::TestMathBitsCUDA::test_conj_view_chalf_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2210195Z test_ops.py::TestMathBitsCUDA::test_conj_view_char_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2210378Z test_ops.py::TestMathBitsCUDA::test_conj_view_cholesky_solve_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2210557Z test_ops.py::TestMathBitsCUDA::test_conj_view_chunk_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2210732Z test_ops.py::TestMathBitsCUDA::test_conj_view_conj_physical_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2210899Z test_ops.py::TestMathBitsCUDA::test_conj_view_constant_pad_nd_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2211051Z test_ops.py::TestMathBitsCUDA::test_conj_view_cosh_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2211214Z test_ops.py::TestMathBitsCUDA::test_conj_view_count_nonzero_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2211364Z test_ops.py::TestMathBitsCUDA::test_conj_view_cov_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2211512Z test_ops.py::TestMathBitsCUDA::test_conj_view_cumsum_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2211690Z test_ops.py::TestMathBitsCUDA::test_conj_view_cumulative_trapezoid_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2211840Z test_ops.py::TestMathBitsCUDA::test_conj_view_diag_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2212014Z test_ops.py::TestMathBitsCUDA::test_conj_view_diff_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2212187Z test_ops.py::TestMathBitsCUDA::test_conj_view_div_no_rounding_mode_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2212344Z test_ops.py::TestMathBitsCUDA::test_conj_view_double_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2212500Z test_ops.py::TestMathBitsCUDA::test_conj_view_dsplit_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2212657Z test_ops.py::TestMathBitsCUDA::test_conj_view_dstack_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2212804Z test_ops.py::TestMathBitsCUDA::test_conj_view_einsum_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2212973Z test_ops.py::TestMathBitsCUDA::test_conj_view_empty_cuda_complex64 SKIPPED (Skipped!) [ 81%] 2023-01-11T23:10:17.2213152Z test_ops.py::TestMathBitsCUDA::test_conj_view_empty_like_cuda_complex64 SKIPPED (Skipped!) 
[ 81%] 2023-01-11T23:10:17.2213307Z test_ops.py::TestMathBitsCUDA::test_conj_view_equal_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2213463Z test_ops.py::TestMathBitsCUDA::test_conj_view_expand_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2213630Z test_ops.py::TestMathBitsCUDA::test_conj_view_eye_cuda_complex64 SKIPPED (Skipped!) [ 81%] 2023-01-11T23:10:17.2213791Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fft2_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2213953Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fftshift_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2214115Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfftn_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2214265Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifftn_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2214424Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_irfft2_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2214703Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_irfftn_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2214873Z test_ops.py::TestMathBitsCUDA::test_conj_view_flatten_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2215033Z test_ops.py::TestMathBitsCUDA::test_conj_view_fliplr_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2215197Z test_ops.py::TestMathBitsCUDA::test_conj_view_float_power_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2215348Z test_ops.py::TestMathBitsCUDA::test_conj_view_full_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:10:17.2215502Z test_ops.py::TestMathBitsCUDA::test_conj_view_half_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2215651Z test_ops.py::TestMathBitsCUDA::test_conj_view_hsplit_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2215803Z test_ops.py::TestMathBitsCUDA::test_conj_view_imag_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2215960Z test_ops.py::TestMathBitsCUDA::test_conj_view_index_fill_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2216124Z test_ops.py::TestMathBitsCUDA::test_conj_view_index_put_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2216279Z test_ops.py::TestMathBitsCUDA::test_conj_view_isinf_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2216434Z test_ops.py::TestMathBitsCUDA::test_conj_view_isnan_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2216591Z test_ops.py::TestMathBitsCUDA::test_conj_view_isreal_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2216747Z test_ops.py::TestMathBitsCUDA::test_conj_view_istft_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2216920Z test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_2inputs_2outputs_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:10:17.2217099Z test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_binary_return_by_ref_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:10:17.2217262Z test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_unary_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:10:17.2217416Z test_ops.py::TestMathBitsCUDA::test_conj_view_kron_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2217572Z test_ops.py::TestMathBitsCUDA::test_conj_view_lerp_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2217734Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cond_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2217942Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eig_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2218103Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigh_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2218268Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigvals_cuda_complex64 PASSED [ 81%] 
2023-01-11T23:10:17.2218421Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_inv_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2218590Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_ldl_factor_ex_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2218758Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_ldl_solve_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2218924Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lstsq_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2219108Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lstsq_grad_oriented_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2219272Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_solve_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2219444Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_norm_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2219614Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_power_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2219786Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_rank_hermitian_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2219951Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_multi_dot_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2220112Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_norm_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2220297Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_norm_subgradients_at_zero_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2220490Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2220655Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_ex_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2220833Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_triangular_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2220992Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_svd_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2221157Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_tensorinv_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2221319Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_tensorsolve_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2221482Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_vander_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2221645Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_vecdot_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2221802Z test_ops.py::TestMathBitsCUDA::test_conj_view_log1p_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2221958Z test_ops.py::TestMathBitsCUDA::test_conj_view_log2_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2222111Z test_ops.py::TestMathBitsCUDA::test_conj_view_log_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2222270Z test_ops.py::TestMathBitsCUDA::test_conj_view_logdet_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2222434Z test_ops.py::TestMathBitsCUDA::test_conj_view_logical_and_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2222589Z test_ops.py::TestMathBitsCUDA::test_conj_view_logical_not_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2222748Z test_ops.py::TestMathBitsCUDA::test_conj_view_logical_or_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2222907Z test_ops.py::TestMathBitsCUDA::test_conj_view_logical_xor_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2223069Z test_ops.py::TestMathBitsCUDA::test_conj_view_logspace_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:10:17.2223221Z test_ops.py::TestMathBitsCUDA::test_conj_view_long_cuda_complex64 
PASSED [ 81%] 2023-01-11T23:10:17.2223383Z test_ops.py::TestMathBitsCUDA::test_conj_view_lu_solve_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2223540Z test_ops.py::TestMathBitsCUDA::test_conj_view_lu_unpack_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2223719Z test_ops.py::TestMathBitsCUDA::test_conj_view_mH_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2223875Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_fill_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2224040Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_normalize_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2224205Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_select_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2224362Z test_ops.py::TestMathBitsCUDA::test_conj_view_matmul_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2224539Z test_ops.py::TestMathBitsCUDA::test_conj_view_meshgrid_list_of_tensors_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2224693Z test_ops.py::TestMathBitsCUDA::test_conj_view_mm_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2224844Z test_ops.py::TestMathBitsCUDA::test_conj_view_mul_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2224994Z test_ops.py::TestMathBitsCUDA::test_conj_view_mv_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2225150Z test_ops.py::TestMathBitsCUDA::test_conj_view_ne_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2225364Z test_ops.py::TestMathBitsCUDA::test_conj_view_new_empty_strided_cuda_complex64 SKIPPED (Expected: new_empty_strided is not comparable) [ 81%] 2023-01-11T23:10:17.2225523Z test_ops.py::TestMathBitsCUDA::test_conj_view_new_ones_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2225706Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose1d_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2225886Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_circular_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2226066Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_reflect_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2226280Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pairwise_distance_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2226469Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_softmin_with_dtype_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2226646Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_tanhshrink_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2226835Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_triplet_margin_loss_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2227030Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_triplet_margin_with_distance_loss_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2227201Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_unfold_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2227360Z test_ops.py::TestMathBitsCUDA::test_conj_view_nonzero_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2227516Z test_ops.py::TestMathBitsCUDA::test_conj_view_norm_inf_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2227669Z test_ops.py::TestMathBitsCUDA::test_conj_view_ones_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:10:17.2227828Z test_ops.py::TestMathBitsCUDA::test_conj_view_ormqr_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2227993Z test_ops.py::TestMathBitsCUDA::test_conj_view_pinverse_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2228146Z test_ops.py::TestMathBitsCUDA::test_conj_view_prod_cuda_complex64 PASSED [ 82%] 
2023-01-11T23:10:17.2228291Z test_ops.py::TestMathBitsCUDA::test_conj_view_put_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2228464Z test_ops.py::TestMathBitsCUDA::test_conj_view_rand_like_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2228711Z test_ops.py::TestMathBitsCUDA::test_conj_view_randn_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:10:17.2228956Z test_ops.py::TestMathBitsCUDA::test_conj_view_randn_like_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2229149Z test_ops.py::TestMathBitsCUDA::test_conj_view_ravel_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2229353Z test_ops.py::TestMathBitsCUDA::test_conj_view_renorm_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2229515Z test_ops.py::TestMathBitsCUDA::test_conj_view_resize_as__cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2229719Z test_ops.py::TestMathBitsCUDA::test_conj_view_resolve_conj_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2229867Z test_ops.py::TestMathBitsCUDA::test_conj_view_roll_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2230019Z test_ops.py::TestMathBitsCUDA::test_conj_view_rsub_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2230178Z test_ops.py::TestMathBitsCUDA::test_conj_view_sigmoid_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2230328Z test_ops.py::TestMathBitsCUDA::test_conj_view_sin_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2230482Z test_ops.py::TestMathBitsCUDA::test_conj_view_sinh_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2230666Z test_ops.py::TestMathBitsCUDA::test_conj_view_sparse_sampled_addmm_cuda_complex64 SKIPPED (Skipped!) [ 82%] 2023-01-11T23:10:17.2230824Z test_ops.py::TestMathBitsCUDA::test_conj_view_sqrt_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2230981Z test_ops.py::TestMathBitsCUDA::test_conj_view_squeeze_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2231133Z test_ops.py::TestMathBitsCUDA::test_conj_view_stack_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:10:17.2231276Z test_ops.py::TestMathBitsCUDA::test_conj_view_std_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2231440Z test_ops.py::TestMathBitsCUDA::test_conj_view_std_mean_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2231612Z test_ops.py::TestMathBitsCUDA::test_conj_view_std_mean_unbiased_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2231760Z test_ops.py::TestMathBitsCUDA::test_conj_view_sub_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2231910Z test_ops.py::TestMathBitsCUDA::test_conj_view_sum_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2232074Z test_ops.py::TestMathBitsCUDA::test_conj_view_sum_to_size_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2232263Z test_ops.py::TestMathBitsCUDA::test_conj_view_take_along_dim_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2232411Z test_ops.py::TestMathBitsCUDA::test_conj_view_take_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2232555Z test_ops.py::TestMathBitsCUDA::test_conj_view_tan_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2232719Z test_ops.py::TestMathBitsCUDA::test_conj_view_tensor_split_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2232867Z test_ops.py::TestMathBitsCUDA::test_conj_view_tile_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2233017Z test_ops.py::TestMathBitsCUDA::test_conj_view_to_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2233173Z test_ops.py::TestMathBitsCUDA::test_conj_view_trace_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2233327Z test_ops.py::TestMathBitsCUDA::test_conj_view_trapz_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2233493Z 
test_ops.py::TestMathBitsCUDA::test_conj_view_triangular_solve_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2233646Z test_ops.py::TestMathBitsCUDA::test_conj_view_tril_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2233788Z test_ops.py::TestMathBitsCUDA::test_conj_view_triu_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2233952Z test_ops.py::TestMathBitsCUDA::test_conj_view_true_divide_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2234113Z test_ops.py::TestMathBitsCUDA::test_conj_view_unfold_copy_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2234270Z test_ops.py::TestMathBitsCUDA::test_conj_view_unfold_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2234430Z test_ops.py::TestMathBitsCUDA::test_conj_view_var_mean_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2234599Z test_ops.py::TestMathBitsCUDA::test_conj_view_var_mean_unbiased_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2234758Z test_ops.py::TestMathBitsCUDA::test_conj_view_view_as_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2235077Z test_ops.py::TestMathBitsCUDA::test_conj_view_view_as_real_cuda_complex64 SKIPPED (Operation doesn't support conjugated inputs.) [ 82%] 2023-01-11T23:10:17.2235233Z test_ops.py::TestMathBitsCUDA::test_conj_view_zeros_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:10:17.2235383Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_H_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2235563Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_T_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2235734Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view___getitem___cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2235898Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view___radd___cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2236061Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rsub___cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2236224Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_T_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2236411Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_bfloat16_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2236594Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_bool_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2236766Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_byte_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2236957Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_cdouble_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2237138Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_chalf_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2237320Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_float_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2237498Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_long_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2237682Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_short_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2237848Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_abs_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2238037Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_acos_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2238199Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_add_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2238362Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_addcdiv_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2238524Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_addr_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2238683Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_all_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2238898Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_cuda_complex128 SKIPPED (Errors when storage_offset is included) [ 82%] 2023-01-11T23:10:17.2239079Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_scatter_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2239245Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_asinh_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2239409Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atan_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2239579Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_1d_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2239753Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_2d_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2239917Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_3d_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2240082Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_chunk_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2240247Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_clone_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2240420Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_column_stack_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2240582Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_conj_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2240764Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_constant_pad_nd_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2240934Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_contiguous_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2241120Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cos_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2241280Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diag_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2241454Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diag_embed_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2241622Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2241801Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_scatter_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2241967Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dsplit_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2242135Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dstack_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2242341Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_empty_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 82%] 2023-01-11T23:10:17.2242505Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_eq_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2242668Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_exp_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2242832Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expand_as_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2242995Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expand_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2243173Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_eye_cuda_complex128 SKIPPED (Skipped!) 
[ 82%] 2023-01-11T23:10:17.2243340Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fft2_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2243504Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fft_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2243695Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfft2_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2243866Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfftn_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2244031Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifft2_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2244194Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifft_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2244352Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifftn_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2244521Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfft2_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2244684Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fill_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2244851Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_flatten_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2245013Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_flip_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2245180Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fliplr_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2245349Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_flipud_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2245520Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_float_power_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2245677Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_hstack_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2245840Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_imag_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2246012Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_fill_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2246185Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_select_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2246353Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isclose_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2246519Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isinf_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2246707Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isnan_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2246875Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_norm_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2247052Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_vector_norm_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2247211Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linspace_cuda_complex128 XFAIL [ 82%] 2023-01-11T23:10:17.2247368Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log10_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2247531Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log1p_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2247696Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log2_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2247858Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2248029Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_or_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2248194Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_narrow_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2248354Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_ne_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2248581Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_empty_strided_cuda_complex128 SKIPPED (Expected: empty_strided is not comparable) [ 82%] 2023-01-11T23:10:17.2248741Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_full_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2248909Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_zeros_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2249125Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_pairwise_distance_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2249318Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_softmax_with_dtype_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2249518Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_triplet_margin_loss_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2249681Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_pow_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2249845Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_ravel_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2250005Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_real_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2250177Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reciprocal_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2250340Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reshape_as_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2250510Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reshape_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2250673Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_roll_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2250834Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rsub_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2250995Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sgn_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2251162Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sigmoid_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2251319Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sin_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2251481Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sinc_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2251654Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_softmax_with_dtype_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2251818Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sqrt_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2251982Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_squeeze_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2252172Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_std_mean_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2252344Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sum_to_size_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2252506Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tan_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2252668Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tanh_cuda_complex128 PASSED [ 82%] 
2023-01-11T23:10:17.2252830Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_trace_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2253003Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_transpose_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2253160Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unbind_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2253334Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unflatten_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2253508Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unfold_copy_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2253671Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unfold_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2253837Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unsqueeze_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2253999Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_var_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2254164Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_var_mean_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2254333Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_view_as_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2254604Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_view_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2254817Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_abs_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2254978Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_acos_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2255139Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_acosh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2255304Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addcdiv_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2255466Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addcmul_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2255642Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmm_decomposed_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2255801Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addr_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2255960Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_angle_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2256115Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_argwhere_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2256280Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_as_strided_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2256501Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_as_strided_partial_views_cuda_complex128 SKIPPED (Test changes in memory layout) [ 83%] 2023-01-11T23:10:17.2256675Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_as_strided_scatter_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2256832Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atanh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2256994Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_1d_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2257156Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_baddbmm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2257313Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bmm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2257483Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_broadcast_tensors_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2257637Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_byte_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2257797Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cat_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2258043Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cfloat_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2258208Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_chunk_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2258370Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_conj_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2258538Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_conj_physical_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2258702Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_contiguous_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2258864Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_corrcoef_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2259024Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_count_nonzero_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2259184Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cross_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2259343Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumsum_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2259527Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumulative_trapezoid_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2259688Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2259859Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_scatter_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2260017Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dist_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2260191Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_div_no_rounding_mode_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2260347Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dsplit_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2260535Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dstack_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2260698Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_einsum_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2260902Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_equal_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2261067Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_expand_as_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2261231Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_hfft2_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2261392Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifft2_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2261553Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifft_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2261720Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifftshift_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2261872Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_irfft_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2262038Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flatten_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2262198Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flip_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2262364Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flipud_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2262523Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_float_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2262694Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_float_power_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2262852Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_full_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2263013Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_full_like_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2263170Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_gradient_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2263328Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_half_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2263485Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_imag_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2263648Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_add_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2269343Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_fill_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2269600Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_put_cuda_complex128 SKIPPED (Operation not tested with tensors with negative bit.) [ 83%] 2023-01-11T23:10:17.2269774Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_select_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2269939Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_inner_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2270099Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_int_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2270277Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isclose_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2270472Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isinf_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2270636Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isreal_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2270820Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_2inputs_2outputs_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2271012Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_4inputs_with_extra_args_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2271182Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_binary_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2271367Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_binary_return_by_ref_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2271537Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_unary_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2271699Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_kron_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2271877Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ldexp_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2272045Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cond_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2272218Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_det_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2272391Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigvalsh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2272581Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_householder_product_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2272755Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_factor_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2272929Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_solve_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2273100Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_factor_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2273267Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_solve_cuda_complex128 PASSED [ 
83%] 2023-01-11T23:10:17.2273443Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_norm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2273614Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_rank_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2273803Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_rank_hermitian_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2273969Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_norm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2274135Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2274313Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_hermitian_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2274553Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_singular_cuda_complex128 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 83%] 2023-01-11T23:10:17.2274729Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_slogdet_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2274890Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2275080Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_ex_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2275255Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_tensorinv_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2275434Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_tensorsolve_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2275603Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vander_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2275768Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linspace_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2275929Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log1p_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2276094Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2276275Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log_softmax_with_dtype_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2276438Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_and_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2276600Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logspace_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2276762Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_long_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2276921Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mH_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2277073Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mT_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2277245Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_cumprod_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2277412Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_mean_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2277610Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_normalize_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2277769Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_prod_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2277937Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_std_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2278103Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_sum_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2278264Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_var_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2278424Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_matrix_exp_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2278610Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_meshgrid_variadic_tensors_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2278768Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2278936Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_narrow_copy_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2279116Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_empty_cuda_complex128 SKIPPED (Skipped!) [ 83%] 2023-01-11T23:10:17.2279342Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_empty_strided_cuda_complex128 SKIPPED (Expected: new_empty_strided is not comparable) [ 83%] 2023-01-11T23:10:17.2279508Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_zeros_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2279690Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv2d_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2279881Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose2d_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2280092Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_feature_alpha_dropout_without_train_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2280275Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_linear_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2280486Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_normalize_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2280722Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_replicate_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2280912Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pairwise_distance_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2281091Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pixel_shuffle_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2281281Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pixel_unshuffle_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2281464Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_tanhshrink_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2281655Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_triplet_margin_loss_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2281837Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_unfold_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2282001Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2282167Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_inf_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2282333Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_nuc_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2282490Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ones_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2282646Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ones_like_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2282810Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_permute_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2282973Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_pinverse_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2283134Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_prod_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2283317Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_put_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2283482Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rand_like_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2283644Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_randn_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2283811Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_randn_like_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2283964Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ravel_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2284124Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_real_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2284288Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reciprocal_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2284451Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_repeat_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2284674Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize__cuda_complex128 SKIPPED (Operation not tested with tensors with negative bit.) [ 83%] 2023-01-11T23:10:17.2284848Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resolve_conj_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2285015Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resolve_neg_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2285177Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rot90_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2285332Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rsub_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2285508Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_scalar_tensor_cuda_complex128 SKIPPED (Skipped!) [ 83%] 2023-01-11T23:10:17.2285675Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_scatter_add_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2285838Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_scatter_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2285995Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sin_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2286155Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sinc_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2286308Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sinh_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2286493Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_slice_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2286671Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_softmax_with_dtype_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2286855Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sparse_sampled_addmm_cuda_complex128 SKIPPED (Skipped!) 
[ 84%] 2023-01-11T23:10:17.2287023Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_list_args_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2287195Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_with_sizes_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2287356Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_stack_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2287518Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2287686Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_unbiased_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2287846Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_stft_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2288002Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sub_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2288157Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sum_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2288316Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sum_to_size_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2288481Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_symeig_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2288638Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_t_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2288798Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tan_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2288986Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_to_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2289154Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_to_sparse_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2289317Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_trace_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2289482Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_transpose_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2289638Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unbind_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2289803Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unflatten_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2289970Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_copy_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2290132Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2290317Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_mean_unbiased_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2290506Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_as_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2290681Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2290836Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_zeros_cuda_complex128 XFAIL [ 84%] 2023-01-11T23:10:17.2291000Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_zeros_like_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2291151Z test_ops.py::TestMathBitsCUDA::test_neg_view___getitem___cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2291306Z test_ops.py::TestMathBitsCUDA::test_neg_view___rmod___cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2291479Z test_ops.py::TestMathBitsCUDA::test_neg_view__native_batch_norm_legit_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2291655Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bfloat16_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2291829Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bool_cuda_float64 PASSED [ 84%] 
2023-01-11T23:10:17.2292006Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_cdouble_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2292201Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_chalf_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2292379Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_complex_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2292543Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_double_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2292715Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_float_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2292882Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_long_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2293041Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_abs_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2293204Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_acos_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2293359Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_acosh_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2293514Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_add_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2293676Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_addcdiv_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2293826Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_addcmul_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2293985Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_amin_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2294142Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_any_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2294297Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_arange_cuda_float64 XFAIL [ 84%] 2023-01-11T23:10:17.2294760Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_cuda_float64 SKIPPED (Errors when storage_offset is included) [ 84%] 2023-01-11T23:10:17.2295110Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_partial_views_cuda_float64 SKIPPED (Errors when storage_offset is included) [ 84%] 2023-01-11T23:10:17.2295273Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atanh_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2295443Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atleast_1d_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2295606Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atleast_2d_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2295772Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_broadcast_tensors_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2295934Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_bucketize_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2296091Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_chunk_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2296257Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clamp_min_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2296415Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clone_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2296584Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_constant_pad_nd_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2296748Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_contiguous_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2296915Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_copysign_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2297070Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cos_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2297221Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cosh_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2297384Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diag_embed_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2297554Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_scatter_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2297729Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_div_no_rounding_mode_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2297903Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_div_trunc_rounding_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2298106Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_empty_like_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 84%] 2023-01-11T23:10:17.2298299Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_eq_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2298459Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_exp2_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2298607Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_exp_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2298770Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_expand_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2298927Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_expm1_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2299094Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_eye_cuda_float64 SKIPPED (Skipped!) [ 84%] 2023-01-11T23:10:17.2299260Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fftshift_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2299423Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfft_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2299594Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifftshift_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2299760Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfftn_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2299922Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfft_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2300077Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfftn_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2300263Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_rfft2_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2300450Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_rfft_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2300607Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_rfftn_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2300790Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flip_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2300950Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flipud_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2301109Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_gt_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2301267Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_heaviside_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2301419Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hypot_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2301571Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_i0_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2301728Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_igamma_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2301886Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_igammac_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2302043Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_add_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2302209Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_fill_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2302371Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_select_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2302531Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isclose_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2302683Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isfinite_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2302845Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isposinf_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2303005Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isreal_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2303159Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_le_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2303318Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lgamma_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2303489Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_matrix_norm_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2303655Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_norm_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2303816Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log10_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2304005Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log2_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2304154Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2304330Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log_softmax_with_dtype_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2304493Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_or_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2304656Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_masked_fill_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2304814Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_mean_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2304993Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_meshgrid_list_of_tensors_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2305178Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_meshgrid_variadic_tensors_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2305341Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_minimum_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2305490Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_movedim_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2305645Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_mul_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2305809Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_narrow_copy_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2305977Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_native_layer_norm_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2306134Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_neg_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2306350Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_empty_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 84%] 2023-01-11T23:10:17.2306596Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_full_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2306866Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_dropout_cuda_float64 SKIPPED (Expected: dropout is not comparable) [ 84%] 2023-01-11T23:10:17.2307049Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_gelu_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2307221Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_group_norm_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2307400Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_huber_loss_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2307577Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_l1_loss_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2307768Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_log_softmax_with_dtype_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2307943Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_mish_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2308125Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_mse_loss_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2308314Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_pairwise_distance_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2308505Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softmin_with_dtype_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2308762Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softplus_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2308937Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softshrink_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2309123Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_tanhshrink_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2309310Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_triplet_margin_loss_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2309474Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ones_cuda_float64 XFAIL [ 84%] 2023-01-11T23:10:17.2309634Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_permute_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2309837Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_positive_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2309998Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_pow_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2310187Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_randn_cuda_float64 SKIPPED (Test expects tensor input) [ 84%] 2023-01-11T23:10:17.2310340Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ravel_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2310506Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_reciprocal_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2310668Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_remainder_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2310830Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_repeat_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2310995Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_reshape_as_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2311156Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_reshape_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2311316Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_round_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2311474Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sgn_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2311636Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sigmoid_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2311786Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_signbit_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2311942Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sin_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2312101Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sinc_cuda_float64 PASSED [ 84%] 
2023-01-11T23:10:17.2312271Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_bessel_j0_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2312460Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_i0e_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2312623Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_i1_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2312793Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_i1e_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2312964Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_log_ndtr_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2313143Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_log_softmax_with_dtype_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2313314Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_logit_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2313506Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_multigammaln_mvlgamma_p_3_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2313694Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_multigammaln_mvlgamma_p_5_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2313858Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_ndtri_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2314041Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_softmax_with_dtype_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2314212Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_xlog1py_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2314371Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_squeeze_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2314527Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sum_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2314678Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tanh_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2314843Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tensor_split_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2314998Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_to_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2315155Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_trace_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2315313Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tril_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2315466Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_triu_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2315657Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_true_divide_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2315821Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unbind_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2315976Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unflatten_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2316143Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unfold_copy_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2316304Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unfold_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2316466Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unsqueeze_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2316628Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_var_mean_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2316791Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_as_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2316952Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2317112Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_vsplit_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2317267Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_vstack_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2317419Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_where_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2317575Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_xlogy_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2317731Z test_ops.py::TestMathBitsCUDA::test_neg_view_acos_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2317886Z test_ops.py::TestMathBitsCUDA::test_neg_view_acosh_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2318039Z test_ops.py::TestMathBitsCUDA::test_neg_view_add_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2318219Z test_ops.py::TestMathBitsCUDA::test_neg_view_addcmul_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2318387Z test_ops.py::TestMathBitsCUDA::test_neg_view_addmm_decomposed_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2318543Z test_ops.py::TestMathBitsCUDA::test_neg_view_addmv_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2318687Z test_ops.py::TestMathBitsCUDA::test_neg_view_all_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2318842Z test_ops.py::TestMathBitsCUDA::test_neg_view_allclose_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2318993Z test_ops.py::TestMathBitsCUDA::test_neg_view_amin_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2319143Z test_ops.py::TestMathBitsCUDA::test_neg_view_angle_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2319293Z test_ops.py::TestMathBitsCUDA::test_neg_view_arange_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2319444Z test_ops.py::TestMathBitsCUDA::test_neg_view_argmax_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2319599Z test_ops.py::TestMathBitsCUDA::test_neg_view_argmin_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2319750Z test_ops.py::TestMathBitsCUDA::test_neg_view_argsort_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2319956Z test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_partial_views_cuda_float64 SKIPPED (Test changes in memory layout) [ 85%] 2023-01-11T23:10:17.2320106Z test_ops.py::TestMathBitsCUDA::test_neg_view_asin_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2320254Z test_ops.py::TestMathBitsCUDA::test_neg_view_atan_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2320404Z test_ops.py::TestMathBitsCUDA::test_neg_view_atanh_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2320564Z test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_2d_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2320720Z test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_3d_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2320876Z test_ops.py::TestMathBitsCUDA::test_neg_view_bfloat16_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2321052Z test_ops.py::TestMathBitsCUDA::test_neg_view_broadcast_tensors_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2321207Z test_ops.py::TestMathBitsCUDA::test_neg_view_cartesian_prod_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2321383Z test_ops.py::TestMathBitsCUDA::test_neg_view_ceil_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2321537Z test_ops.py::TestMathBitsCUDA::test_neg_view_char_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2321695Z test_ops.py::TestMathBitsCUDA::test_neg_view_cholesky_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2321844Z test_ops.py::TestMathBitsCUDA::test_neg_view_chunk_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2322005Z test_ops.py::TestMathBitsCUDA::test_neg_view_clamp_max_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2322159Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_clone_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2322317Z test_ops.py::TestMathBitsCUDA::test_neg_view_column_stack_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2322471Z test_ops.py::TestMathBitsCUDA::test_neg_view_combinations_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2322625Z test_ops.py::TestMathBitsCUDA::test_neg_view_complex_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2322784Z test_ops.py::TestMathBitsCUDA::test_neg_view_copysign_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2322941Z test_ops.py::TestMathBitsCUDA::test_neg_view_corrcoef_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2323089Z test_ops.py::TestMathBitsCUDA::test_neg_view_cos_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2323252Z test_ops.py::TestMathBitsCUDA::test_neg_view_count_nonzero_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2323401Z test_ops.py::TestMathBitsCUDA::test_neg_view_cov_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2323551Z test_ops.py::TestMathBitsCUDA::test_neg_view_cross_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2323704Z test_ops.py::TestMathBitsCUDA::test_neg_view_cummin_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2323870Z test_ops.py::TestMathBitsCUDA::test_neg_view_cumprod_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2324019Z test_ops.py::TestMathBitsCUDA::test_neg_view_cumsum_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2324194Z test_ops.py::TestMathBitsCUDA::test_neg_view_cumulative_trapezoid_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2324345Z test_ops.py::TestMathBitsCUDA::test_neg_view_deg2rad_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2324491Z test_ops.py::TestMathBitsCUDA::test_neg_view_diag_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2324654Z test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_copy_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2324819Z test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_scatter_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2324970Z test_ops.py::TestMathBitsCUDA::test_neg_view_digamma_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2325132Z test_ops.py::TestMathBitsCUDA::test_neg_view_div_floor_rounding_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2325304Z test_ops.py::TestMathBitsCUDA::test_neg_view_div_no_rounding_mode_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2325457Z test_ops.py::TestMathBitsCUDA::test_neg_view_dstack_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2325615Z test_ops.py::TestMathBitsCUDA::test_neg_view_einsum_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2325764Z test_ops.py::TestMathBitsCUDA::test_neg_view_exp2_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2325912Z test_ops.py::TestMathBitsCUDA::test_neg_view_exp_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2326064Z test_ops.py::TestMathBitsCUDA::test_neg_view_expand_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2326233Z test_ops.py::TestMathBitsCUDA::test_neg_view_eye_cuda_float64 SKIPPED (Skipped!) 
[ 85%] 2023-01-11T23:10:17.2326380Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fft2_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2326532Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fft_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2326696Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fftshift_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2326850Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_hfft2_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2327029Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_hfft_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2327188Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifft_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2327351Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifftshift_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2327510Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfft2_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2327660Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfft_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2327823Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfft2_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2327977Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfft_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2328133Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfftn_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2328288Z test_ops.py::TestMathBitsCUDA::test_neg_view_flatten_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2328440Z test_ops.py::TestMathBitsCUDA::test_neg_view_flip_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2328597Z test_ops.py::TestMathBitsCUDA::test_neg_view_fliplr_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2328749Z test_ops.py::TestMathBitsCUDA::test_neg_view_flipud_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2328891Z test_ops.py::TestMathBitsCUDA::test_neg_view_float_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2329038Z test_ops.py::TestMathBitsCUDA::test_neg_view_frac_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2329189Z test_ops.py::TestMathBitsCUDA::test_neg_view_frexp_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2329336Z test_ops.py::TestMathBitsCUDA::test_neg_view_full_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2329482Z test_ops.py::TestMathBitsCUDA::test_neg_view_ge_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2329671Z test_ops.py::TestMathBitsCUDA::test_neg_view_grid_sampler_2d_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2329820Z test_ops.py::TestMathBitsCUDA::test_neg_view_half_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2329984Z test_ops.py::TestMathBitsCUDA::test_neg_view_heaviside_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2330134Z test_ops.py::TestMathBitsCUDA::test_neg_view_hsplit_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2330276Z test_ops.py::TestMathBitsCUDA::test_neg_view_hstack_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2330425Z test_ops.py::TestMathBitsCUDA::test_neg_view_hypot_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2330574Z test_ops.py::TestMathBitsCUDA::test_neg_view_i0_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2330724Z test_ops.py::TestMathBitsCUDA::test_neg_view_igamma_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2330884Z test_ops.py::TestMathBitsCUDA::test_neg_view_index_copy_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2331044Z test_ops.py::TestMathBitsCUDA::test_neg_view_index_fill_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2331267Z test_ops.py::TestMathBitsCUDA::test_neg_view_index_put_cuda_float64 SKIPPED (Operation not tested with tensors with negative 
bit.) [ 85%] 2023-01-11T23:10:17.2331429Z test_ops.py::TestMathBitsCUDA::test_neg_view_isfinite_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2331597Z test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_2inputs_2outputs_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2331780Z test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_4inputs_with_extra_args_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2331943Z test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_binary_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2332122Z test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_binary_return_by_ref_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2332285Z test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_unary_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2332446Z test_ops.py::TestMathBitsCUDA::test_neg_view_kthvalue_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2332599Z test_ops.py::TestMathBitsCUDA::test_neg_view_ldexp_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2332773Z test_ops.py::TestMathBitsCUDA::test_neg_view_le_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2332918Z test_ops.py::TestMathBitsCUDA::test_neg_view_lerp_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2333070Z test_ops.py::TestMathBitsCUDA::test_neg_view_lgamma_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2333225Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eig_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2333386Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eigvals_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2333551Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eigvalsh_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2333712Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_inv_ex_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2333878Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_factor_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2334050Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_factor_ex_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2334219Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_solve_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2334368Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2334652Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_factor_ex_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2334816Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_solve_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2334981Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_rank_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2335162Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_rank_hermitian_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2335319Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_norm_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2335549Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_norm_subgradients_at_zero_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2335721Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_pinv_hermitian_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2335880Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_slogdet_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2336047Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_tensorinv_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2336209Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vander_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2336367Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vecdot_cuda_float64 PASSED [ 85%] 
2023-01-11T23:10:17.2336524Z test_ops.py::TestMathBitsCUDA::test_neg_view_linspace_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2336678Z test_ops.py::TestMathBitsCUDA::test_neg_view_log1p_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2336829Z test_ops.py::TestMathBitsCUDA::test_neg_view_log_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2336992Z test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp2_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2337143Z test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2337308Z test_ops.py::TestMathBitsCUDA::test_neg_view_logcumsumexp_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2337460Z test_ops.py::TestMathBitsCUDA::test_neg_view_logdet_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2337618Z test_ops.py::TestMathBitsCUDA::test_neg_view_logical_and_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2337776Z test_ops.py::TestMathBitsCUDA::test_neg_view_logical_not_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2337929Z test_ops.py::TestMathBitsCUDA::test_neg_view_logical_xor_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2338081Z test_ops.py::TestMathBitsCUDA::test_neg_view_logit_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2338231Z test_ops.py::TestMathBitsCUDA::test_neg_view_lu_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2338391Z test_ops.py::TestMathBitsCUDA::test_neg_view_lu_solve_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2338532Z test_ops.py::TestMathBitsCUDA::test_neg_view_mH_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2338709Z test_ops.py::TestMathBitsCUDA::test_neg_view_mT_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2338872Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_fill_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2339037Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_logaddexp_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2339201Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_logsumexp_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2339368Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_normalize_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2339526Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_prod_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2339689Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_scatter_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2339846Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_select_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2340007Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_softmax_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2340170Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_softmin_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2340333Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_std_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2340509Z test_ops.py::TestMathBitsCUDA::test_neg_view_matmul_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2340715Z test_ops.py::TestMathBitsCUDA::test_neg_view_max_pool2d_with_indices_backward_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2340884Z test_ops.py::TestMathBitsCUDA::test_neg_view_max_reduction_no_dim_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2341057Z test_ops.py::TestMathBitsCUDA::test_neg_view_max_reduction_with_dim_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2341202Z test_ops.py::TestMathBitsCUDA::test_neg_view_mean_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2341382Z test_ops.py::TestMathBitsCUDA::test_neg_view_median_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2341557Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_meshgrid_variadic_tensors_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2341730Z test_ops.py::TestMathBitsCUDA::test_neg_view_min_reduction_no_dim_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2341901Z test_ops.py::TestMathBitsCUDA::test_neg_view_min_reduction_with_dim_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2342054Z test_ops.py::TestMathBitsCUDA::test_neg_view_minimum_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2342206Z test_ops.py::TestMathBitsCUDA::test_neg_view_mode_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2342359Z test_ops.py::TestMathBitsCUDA::test_neg_view_movedim_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2342510Z test_ops.py::TestMathBitsCUDA::test_neg_view_mul_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2342662Z test_ops.py::TestMathBitsCUDA::test_neg_view_multinomial_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2342835Z test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_1_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2343005Z test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_3_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2343172Z test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_5_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2343325Z test_ops.py::TestMathBitsCUDA::test_neg_view_nanmean_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2343474Z test_ops.py::TestMathBitsCUDA::test_neg_view_narrow_cuda_float64 XFAIL [ 86%] 2023-01-11T23:10:17.2343638Z test_ops.py::TestMathBitsCUDA::test_neg_view_native_batch_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2343811Z test_ops.py::TestMathBitsCUDA::test_neg_view_native_dropout_backward_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2343965Z test_ops.py::TestMathBitsCUDA::test_neg_view_native_layer_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2344117Z test_ops.py::TestMathBitsCUDA::test_neg_view_ne_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2344275Z test_ops.py::TestMathBitsCUDA::test_neg_view_new_full_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2344494Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool1d_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2344675Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_alpha_dropout_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2344847Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_avg_pool2d_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2345021Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2345208Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_without_cudnn_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2345394Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_binary_cross_entropy_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2345585Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_binary_cross_entropy_with_logits_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2345755Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv1d_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2345939Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose1d_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2346125Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_cosine_embedding_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2346306Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_cosine_similarity_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2346484Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_cross_entropy_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2346668Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:10:17.2346835Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_elu_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2347036Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_embedding_bag_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2347215Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_embedding_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:10:17.2347418Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_feature_alpha_dropout_without_train_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2347660Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_fractional_max_pool3d_cuda_float64 SKIPPED (Operation not tested with tensors with negative bit.) [ 86%] 2023-01-11T23:10:17.2347837Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_grid_sample_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2348012Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardsigmoid_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2348187Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_instance_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2348367Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_area_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2348552Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_nearest_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2348794Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_l1_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2348962Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_leaky_relu_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2349129Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_linear_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2349309Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_local_response_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2349485Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_logsigmoid_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2349658Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool2d_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2349835Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool1d_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2350021Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool1d_grad_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2350227Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool2d_grad_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2350394Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_mish_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2350556Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_mse_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2350736Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_circular_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2350907Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_constant_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2351081Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_reflect_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2351263Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pairwise_distance_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2351445Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pixel_shuffle_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2351627Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pixel_unshuffle_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2351795Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_prelu_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2351954Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_relu6_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2352116Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_rrelu_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2352296Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_smooth_l1_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2352472Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_soft_margin_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2352640Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softmin_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2352834Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softplus_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2353013Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softshrink_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2353186Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_tanhshrink_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2353358Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_threshold_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2353534Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_triplet_margin_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2353732Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_triplet_margin_with_distance_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2353915Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_upsample_bilinear_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2354073Z test_ops.py::TestMathBitsCUDA::test_neg_view_nonzero_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2354225Z test_ops.py::TestMathBitsCUDA::test_neg_view_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2354381Z test_ops.py::TestMathBitsCUDA::test_neg_view_norm_nuc_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2354551Z test_ops.py::TestMathBitsCUDA::test_neg_view_normal_number_mean_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2354708Z test_ops.py::TestMathBitsCUDA::test_neg_view_ones_like_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2354888Z test_ops.py::TestMathBitsCUDA::test_neg_view_ops_nvprims_native_batch_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2355048Z test_ops.py::TestMathBitsCUDA::test_neg_view_ops_nvprims_var_mean_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2355202Z test_ops.py::TestMathBitsCUDA::test_neg_view_ormqr_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2355363Z test_ops.py::TestMathBitsCUDA::test_neg_view_pca_lowrank_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2355520Z test_ops.py::TestMathBitsCUDA::test_neg_view_permute_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2355674Z test_ops.py::TestMathBitsCUDA::test_neg_view_polar_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2355870Z test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_0_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2356046Z test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_3_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2356204Z test_ops.py::TestMathBitsCUDA::test_neg_view_positive_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2356347Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_pow_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2356501Z test_ops.py::TestMathBitsCUDA::test_neg_view_prod_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2356650Z test_ops.py::TestMathBitsCUDA::test_neg_view_qr_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2356807Z test_ops.py::TestMathBitsCUDA::test_neg_view_quantile_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2356962Z test_ops.py::TestMathBitsCUDA::test_neg_view_rad2deg_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2357117Z test_ops.py::TestMathBitsCUDA::test_neg_view_rand_like_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2357284Z test_ops.py::TestMathBitsCUDA::test_neg_view_randint_like_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2357435Z test_ops.py::TestMathBitsCUDA::test_neg_view_reshape_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2357644Z test_ops.py::TestMathBitsCUDA::test_neg_view_resize__cuda_float64 SKIPPED (Operation not tested with tensors with negative bit.) [ 86%] 2023-01-11T23:10:17.2357804Z test_ops.py::TestMathBitsCUDA::test_neg_view_resolve_conj_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2357985Z test_ops.py::TestMathBitsCUDA::test_neg_view_round_decimals_0_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2358166Z test_ops.py::TestMathBitsCUDA::test_neg_view_round_decimals_3_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:10:17.2358317Z test_ops.py::TestMathBitsCUDA::test_neg_view_rsub_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2358541Z test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_amax_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2358706Z test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_prod_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2358873Z test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_sum_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2359044Z test_ops.py::TestMathBitsCUDA::test_neg_view_segment_reduce_offsets_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2359200Z test_ops.py::TestMathBitsCUDA::test_neg_view_select_scatter_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2359348Z test_ops.py::TestMathBitsCUDA::test_neg_view_sgn_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2359498Z test_ops.py::TestMathBitsCUDA::test_neg_view_short_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2359652Z test_ops.py::TestMathBitsCUDA::test_neg_view_sigmoid_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2359804Z test_ops.py::TestMathBitsCUDA::test_neg_view_sign_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2359991Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_bartlett_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:10:17.2360181Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_gaussian_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:10:17.2360368Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_general_cosine_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:10:17.2360543Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_hann_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:10:17.2360738Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_kaiser_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:10:17.2360957Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_nuttall_cuda_float64 SKIPPED (Skipped!) 
[ 86%] 2023-01-11T23:10:17.2361110Z test_ops.py::TestMathBitsCUDA::test_neg_view_sinc_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2361262Z test_ops.py::TestMathBitsCUDA::test_neg_view_sinh_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2361418Z test_ops.py::TestMathBitsCUDA::test_neg_view_slice_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2361578Z test_ops.py::TestMathBitsCUDA::test_neg_view_slice_scatter_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2361765Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_j0_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2361931Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_j1_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2362108Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_u_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2362486Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_v_cuda_float64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 86%] 2023-01-11T23:10:17.2362649Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_erfcx_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2362828Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_hermite_polynomial_he_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2362989Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_i1e_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2363175Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_laguerre_polynomial_l_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2363350Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_i0_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2363519Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_i1_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2363685Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_k0_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2363846Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_k1_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2364007Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_ndtr_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2364166Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_ndtri_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2364384Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_polygamma_special_polygamma_n_0_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2364569Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_scaled_modified_bessel_k1_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2364932Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_u_cuda_float64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 86%] 2023-01-11T23:10:17.2365290Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_v_cuda_float64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 86%] 2023-01-11T23:10:17.2365452Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_xlog1py_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2365611Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_zeta_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2365764Z test_ops.py::TestMathBitsCUDA::test_neg_view_split_list_args_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2365917Z test_ops.py::TestMathBitsCUDA::test_neg_view_stack_cuda_float64 XFAIL [ 87%] 2023-01-11T23:10:17.2366067Z test_ops.py::TestMathBitsCUDA::test_neg_view_std_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2366225Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_std_mean_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2366387Z test_ops.py::TestMathBitsCUDA::test_neg_view_std_unbiased_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2366537Z test_ops.py::TestMathBitsCUDA::test_neg_view_sub_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2366684Z test_ops.py::TestMathBitsCUDA::test_neg_view_sum_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2366831Z test_ops.py::TestMathBitsCUDA::test_neg_view_take_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2366982Z test_ops.py::TestMathBitsCUDA::test_neg_view_tensordot_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2367130Z test_ops.py::TestMathBitsCUDA::test_neg_view_tile_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2367282Z test_ops.py::TestMathBitsCUDA::test_neg_view_topk_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2367428Z test_ops.py::TestMathBitsCUDA::test_neg_view_trace_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2367608Z test_ops.py::TestMathBitsCUDA::test_neg_view_trapezoid_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2367756Z test_ops.py::TestMathBitsCUDA::test_neg_view_tril_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2367917Z test_ops.py::TestMathBitsCUDA::test_neg_view_true_divide_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2368067Z test_ops.py::TestMathBitsCUDA::test_neg_view_unbind_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2368219Z test_ops.py::TestMathBitsCUDA::test_neg_view_unfold_copy_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2368385Z test_ops.py::TestMathBitsCUDA::test_neg_view_unique_consecutive_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2368542Z test_ops.py::TestMathBitsCUDA::test_neg_view_unsqueeze_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2368693Z test_ops.py::TestMathBitsCUDA::test_neg_view_view_as_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2368840Z test_ops.py::TestMathBitsCUDA::test_neg_view_view_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2368994Z test_ops.py::TestMathBitsCUDA::test_neg_view_vsplit_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2369143Z test_ops.py::TestMathBitsCUDA::test_neg_view_vstack_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2369291Z test_ops.py::TestMathBitsCUDA::test_neg_view_zero__cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2369450Z test_ops.py::TestMathBitsCUDA::test_neg_view_zeros_like_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2369593Z test_ops.py::TestFakeTensorCUDA::test_fake_T_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2369744Z test_ops.py::TestFakeTensorCUDA::test_fake___radd___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2369891Z test_ops.py::TestFakeTensorCUDA::test_fake___rand___cuda_int64 PASSED [ 87%] 2023-01-11T23:10:17.2370047Z test_ops.py::TestFakeTensorCUDA::test_fake___rmatmul___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2370218Z test_ops.py::TestFakeTensorCUDA::test_fake___rmul___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2370363Z test_ops.py::TestFakeTensorCUDA::test_fake___ror___cuda_int64 PASSED [ 87%] 2023-01-11T23:10:17.2370518Z test_ops.py::TestFakeTensorCUDA::test_fake___rsub___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2370689Z test_ops.py::TestFakeTensorCUDA::test_fake__native_batch_norm_legit_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2370831Z test_ops.py::TestFakeTensorCUDA::test_fake_acos_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2370980Z test_ops.py::TestFakeTensorCUDA::test_fake_addbmm_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2371147Z 
test_ops.py::TestFakeTensorCUDA::test_fake_addmm_decomposed_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2371295Z test_ops.py::TestFakeTensorCUDA::test_fake_addr_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2371444Z test_ops.py::TestFakeTensorCUDA::test_fake_amax_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2371626Z test_ops.py::TestFakeTensorCUDA::test_fake_aminmax_cuda_float32 SKIPPED (Skip failing test) [ 87%] 2023-01-11T23:10:17.2371776Z test_ops.py::TestFakeTensorCUDA::test_fake_angle_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2371931Z test_ops.py::TestFakeTensorCUDA::test_fake_argwhere_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2372080Z test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2372247Z test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_scatter_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2372396Z test_ops.py::TestFakeTensorCUDA::test_fake_asinh_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2372545Z test_ops.py::TestFakeTensorCUDA::test_fake_atan2_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2372692Z test_ops.py::TestFakeTensorCUDA::test_fake_atan_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2372846Z test_ops.py::TestFakeTensorCUDA::test_fake_atleast_2d_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2373003Z test_ops.py::TestFakeTensorCUDA::test_fake_atleast_3d_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2373169Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___getitem___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2373354Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rand___cuda_int64 PASSED [ 87%] 2023-01-11T23:10:17.2373518Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rdiv___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2373686Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rmatmul___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2373850Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rmod___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2374008Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rpow___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2374191Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast__native_batch_norm_legit_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2374354Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_abs_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2374621Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_acos_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2374788Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addcdiv_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2374946Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addr_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2375104Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_all_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2375264Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_amax_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2375420Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_amin_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2375577Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_any_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2375739Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_arange_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2375897Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argmin_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2376101Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argsort_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2376256Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argwhere_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2376439Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_partial_views_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2376602Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_asinh_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2376758Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atan_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2376920Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bincount_cuda_int64 PASSED [ 87%] 2023-01-11T23:10:17.2377092Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_left_shift_cuda_int64 PASSED [ 87%] 2023-01-11T23:10:17.2377257Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_not_cuda_int64 PASSED [ 87%] 2023-01-11T23:10:17.2377435Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_right_shift_cuda_int64 PASSED [ 87%] 2023-01-11T23:10:17.2377589Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_xor_cuda_int64 PASSED [ 87%] 2023-01-11T23:10:17.2377758Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_block_diag_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2377918Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bmm_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2378091Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_shapes_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2378259Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_to_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2378423Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bucketize_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2378595Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cartesian_prod_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2378750Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ceil_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2378912Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cfloat_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2379066Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_chalf_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2379251Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_char_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2379445Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_cuda_float32 SKIPPED (Skip failing test) [ 87%] 2023-01-11T23:10:17.2379613Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_solve_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2379774Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2379942Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_min_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2380102Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clone_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2380271Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_combinations_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2380439Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_constant_pad_nd_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2380609Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_contiguous_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2380774Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_copysign_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2380931Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cosh_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2381114Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cov_cuda_float32 SKIPPED (Skip failing test) [ 
87%] 2023-01-11T23:10:17.2381277Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cross_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2381439Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cummin_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2381601Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cumprod_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2381788Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diag_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2381946Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagflat_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2382114Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagonal_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2382287Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagonal_scatter_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2382447Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diff_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2382606Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dist_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2382767Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dot_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2382929Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_double_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2383091Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dsplit_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2383243Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dstack_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2383399Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_einsum_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2383563Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2383728Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_like_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2383885Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_eq_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2384043Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_erfc_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2384200Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_erfinv_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2384359Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_exp_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2384517Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expand_as_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2384682Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expand_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2384841Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expm1_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2385039Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fft2_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2385202Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fftn_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2385369Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fftshift_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2385534Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfft2_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2385701Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifftn_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2385870Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifftshift_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2386027Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ihfft2_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2386191Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ihfft_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2386358Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ihfftn_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2386524Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfftn_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2386685Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_rfft_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2386848Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_flatten_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2387010Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_float_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2387182Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_float_power_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2387335Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_floor_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2387518Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmin_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2387673Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_frac_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2387834Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_frexp_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2387994Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_full_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2388150Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_geqrf_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2388313Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gradient_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2388486Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_grid_sampler_2d_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2388639Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gt_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2388875Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_half_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2389043Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_heaviside_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2389203Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hstack_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2389365Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hypot_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2389521Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_i0_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2389684Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_igammac_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2389850Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_add_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2390018Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_reduce_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2390178Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_select_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2390336Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_inner_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2390496Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_int_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2390657Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isin_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2390838Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isinf_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2391003Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isneginf_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2391167Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isreal_cuda_float32 PASSED [ 
87%] 2023-01-11T23:10:17.2391375Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_2inputs_2outputs_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2391582Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_4inputs_with_extra_args_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2391780Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_binary_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2391978Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_unary_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2392140Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_kron_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2392298Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lcm_cuda_int64 PASSED [ 88%] 2023-01-11T23:10:17.2392459Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ldexp_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2392616Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_le_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2392779Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lgamma_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2392944Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_det_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2393105Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eigh_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2393292Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_householder_product_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2393491Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_factor_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2393668Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_factor_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2393845Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_factor_ex_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2394015Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_solve_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2394191Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2394392Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_power_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2394556Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_qr_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2394719Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_slogdet_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2394892Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_ex_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2395080Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_triangular_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2395246Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linspace_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2395407Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2395574Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_softmax_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2395741Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logaddexp_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2395913Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logcumsumexp_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2396072Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_not_cuda_float32 PASSED [ 88%] 
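The "SKIPPED (Skip failing test)" entries above are pytest's standard rendering of a skipped test: the skip decorator's message becomes the parenthesized reason. A minimal sketch of that mechanism, using hypothetical stand-in names rather than the OpInfo-generated tests actually running in this job:

    import unittest

    class TestFakeTensorExample(unittest.TestCase):
        # pytest reports this as "SKIPPED (Skip failing test)"; the body is
        # never executed because the decorator short-circuits the test.
        @unittest.skip("Skip failing test")
        def test_jiterator_unary(self):
            self.fail("unreachable: the skip decorator prevents execution")

    if __name__ == "__main__":
        unittest.main()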
2023-01-11T23:10:17.2396239Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_or_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2396402Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_xor_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2396586Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logit_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2396752Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logsumexp_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2396914Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_long_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2397073Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lt_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2397231Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2397418Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_solve_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2397566Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mH_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2397723Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mT_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2397888Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_amin_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2398061Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmax_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2398229Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmin_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2398395Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_cumsum_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2398556Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_fill_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2398731Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_log_softmax_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2398899Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_logaddexp_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2399090Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_median_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2399258Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_softmax_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2399428Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_softmin_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2399591Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_binary_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2399773Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_reduction_with_dim_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2399942Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_maximum_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2400102Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mean_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2400262Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_median_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2400439Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_meshgrid_list_of_tensors_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2400608Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_binary_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2400773Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_minimum_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2400934Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mm_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2401117Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mul_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2401296Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mv_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2401499Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mvlgamma_mvlgamma_p_1_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2401701Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mvlgamma_mvlgamma_p_3_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2401895Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mvlgamma_mvlgamma_p_5_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2402068Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nan_to_num_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2402237Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanmedian_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2402450Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanquantile_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2402614Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nansum_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2402801Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_narrow_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2402992Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_batch_norm_cuda_float32 SKIPPED (Skipped!) [ 88%] 2023-01-11T23:10:17.2403147Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ne_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2403308Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_neg_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2403472Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_empty_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2403634Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_full_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2403799Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_ones_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2403993Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2404185Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2404379Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2404567Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_alpha_dropout_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2404749Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool1d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2404952Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool2d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2405129Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool3d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2405335Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2405510Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_celu_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2405700Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2405894Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2406094Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_ctc_loss_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2406280Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout2d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2406464Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout3d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2406646Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2406813Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_elu_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2407025Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_embedding_bag_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2407204Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_embedding_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2407403Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2407591Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2407768Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_gelu_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2407969Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_glu_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2408155Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_grid_sample_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2408339Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_group_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2408517Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardsigmoid_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2408699Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardswish_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2408879Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardtanh_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2409075Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2409265Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_area_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2409463Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2409641Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_l1_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2409822Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_layer_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2410002Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_leaky_relu_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2410172Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_linear_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2410354Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool2d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2410561Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool3d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2410747Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool1d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2410969Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2411187Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2411384Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2411589Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_nll_loss_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2411769Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_normalize_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2411957Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_circular_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2412135Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_reflect_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2412325Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_replicate_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2412516Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pairwise_distance_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2412694Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pdist_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2412883Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2413058Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_selu_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2413232Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_silu_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2413421Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2413632Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2413807Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softmin_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2413997Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2414184Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softshrink_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2414365Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softsign_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2414656Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_tanhshrink_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2414841Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_threshold_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2415039Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2415223Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_unfold_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2415411Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2415569Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nonzero_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2415730Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ormqr_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2415894Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_permute_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2416078Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_2_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2416263Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_3_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2416462Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_qr_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2416656Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_quantile_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2416820Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randint_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2416979Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ravel_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2417143Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_real_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2417311Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_remainder_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2417478Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_repeat_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2417680Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_repeat_interleave_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2417854Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_reshape_as_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2418019Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_reshape_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2418192Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resize_as__cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2418355Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rot90_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2418509Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2418685Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_decimals_3_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2418852Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_add_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2419031Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_amin_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2419206Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_mean_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2419414Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_segment_reduce_lengths_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2419607Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_select_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2419778Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_select_scatter_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2419931Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sgn_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2420091Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_short_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2420274Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_blackman_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2420454Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_cosine_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2420637Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_gaussian_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2420830Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_general_cosine_cuda_float32 PASSED [ 89%] 
2023-01-11T23:10:17.2421021Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_general_hamming_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2421200Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_hamming_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2421381Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_hann_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2421551Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_kaiser_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2421727Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_nuttall_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2421887Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sinh_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2422044Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_slice_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2422237Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_slice_scatter_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2422400Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sort_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2422600Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sparse_sampled_addmm_cuda_float32 SKIPPED (Skip failing test) [ 89%] 2023-01-11T23:10:17.2422774Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_airy_ai_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2422946Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_j1_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2423112Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_y1_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2423304Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_u_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2423671Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 89%] 2023-01-11T23:10:17.2424031Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 89%] 2023-01-11T23:10:17.2424201Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_entr_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2424374Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_erfcx_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2424562Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_hermite_polynomial_h_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2424727Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_i1e_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2425081Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 89%] 2023-01-11T23:10:17.2425275Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_i0_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2425453Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_k0_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2425660Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_ndtri_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2425853Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_scaled_modified_bessel_k0_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2426223Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_t_cuda_float32 
SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 89%] 2023-01-11T23:10:17.2426580Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 89%] 2023-01-11T23:10:17.2426766Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_spherical_bessel_j0_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2426941Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_xlog1py_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2427111Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_zeta_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2427284Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_with_sizes_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2427440Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_squeeze_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2427602Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_stack_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2427764Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_mean_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2427933Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_unbiased_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2428092Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_stft_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2428280Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sum_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2428434Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_svd_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2428598Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_symeig_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2428832Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_take_along_dim_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2428992Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tan_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2429160Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tensordot_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2429320Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tile_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2429478Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_to_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2429642Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_transpose_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2429815Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trapezoid_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2429977Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trapz_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2430139Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tril_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2430321Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triu_indices_cuda_int64 PASSED [ 89%] 2023-01-11T23:10:17.2430507Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unbind_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2430679Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unfold_copy_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2430842Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_uniform_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2431003Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unique_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2431167Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsqueeze_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2431333Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_cuda_float32 
PASSED [ 89%] 2023-01-11T23:10:17.2431509Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_mean_unbiased_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2431697Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_unbiased_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2431861Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vdot_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2432023Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2432195Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_real_cuda_complex64 PASSED [ 89%] 2023-01-11T23:10:17.2432361Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_copy_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2432523Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vsplit_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2432684Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vstack_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2432844Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_where_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2433011Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_zeros_like_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2433159Z test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_and_cuda_int64 PASSED [ 89%] 2023-01-11T23:10:17.2433323Z test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_left_shift_cuda_int64 PASSED [ 89%] 2023-01-11T23:10:17.2433486Z test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_right_shift_cuda_int64 PASSED [ 89%] 2023-01-11T23:10:17.2433646Z test_ops.py::TestFakeTensorCUDA::test_fake_broadcast_to_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2433804Z test_ops.py::TestFakeTensorCUDA::test_fake_bucketize_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2433966Z test_ops.py::TestFakeTensorCUDA::test_fake_cartesian_prod_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2434144Z test_ops.py::TestFakeTensorCUDA::test_fake_cat_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2434297Z test_ops.py::TestFakeTensorCUDA::test_fake_cdist_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2434443Z test_ops.py::TestFakeTensorCUDA::test_fake_ceil_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2434601Z test_ops.py::TestFakeTensorCUDA::test_fake_cfloat_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2434751Z test_ops.py::TestFakeTensorCUDA::test_fake_chalf_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2434930Z test_ops.py::TestFakeTensorCUDA::test_fake_cholesky_cuda_float32 SKIPPED (Skip failing test) [ 89%] 2023-01-11T23:10:17.2435121Z test_ops.py::TestFakeTensorCUDA::test_fake_cholesky_inverse_cuda_float32 SKIPPED (Skip failing test) [ 89%] 2023-01-11T23:10:17.2435272Z test_ops.py::TestFakeTensorCUDA::test_fake_chunk_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2435425Z test_ops.py::TestFakeTensorCUDA::test_fake_clamp_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2435582Z test_ops.py::TestFakeTensorCUDA::test_fake_clamp_max_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2435727Z test_ops.py::TestFakeTensorCUDA::test_fake_clamp_min_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2435886Z test_ops.py::TestFakeTensorCUDA::test_fake_column_stack_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2436046Z test_ops.py::TestFakeTensorCUDA::test_fake_conj_physical_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2436203Z test_ops.py::TestFakeTensorCUDA::test_fake_contiguous_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2436363Z test_ops.py::TestFakeTensorCUDA::test_fake_count_nonzero_cuda_float32 PASSED [ 89%] 
2023-01-11T23:10:17.2436532Z test_ops.py::TestFakeTensorCUDA::test_fake_cov_cuda_float32 SKIPPED (Skip failing test) [ 89%] 2023-01-11T23:10:17.2436702Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_H_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2436885Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___getitem___cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2437061Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rmod___cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2437233Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_abs_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2437434Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_acosh_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2437613Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addbmm_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2437788Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addmv_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2437964Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addr_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2438139Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_amax_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2438313Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_amin_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2438516Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_angle_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2438702Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_scatter_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2438878Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_asinh_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2439049Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atan_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2439229Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_2d_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2439409Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_baddbmm_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2439589Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_block_diag_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2439778Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_broadcast_tensors_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2439991Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cartesian_prod_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2440166Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cdist_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2440340Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cdouble_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2440519Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cfloat_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2440693Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2440882Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_inverse_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2441067Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_solve_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2441245Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_max_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2441421Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clone_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2441606Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_column_stack_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2441790Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_combinations_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2441968Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_conj_physical_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2442154Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_constant_pad_nd_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2442337Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_contiguous_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2442517Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_copysign_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2442693Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cummin_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2442876Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumprod_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2443067Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_deg2rad_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2443241Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diag_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2443418Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diag_embed_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2443589Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagflat_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2443758Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2443945Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_scatter_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2444124Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diff_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2444315Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_floor_rounding_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2444509Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_no_rounding_mode_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2444700Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_trunc_rounding_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2444878Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dstack_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2445050Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_einsum_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2445215Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erf_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2445394Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erfinv_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2445595Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_exp2_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2445768Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_exp_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2445947Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expand_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2446117Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fft_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2446290Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fftn_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2446465Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_hfft2_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2446639Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_hfft_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2446804Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifft2_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2446975Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifft_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2447146Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifftn_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2447331Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifftshift_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2447504Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfft_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2447683Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfft2_cuda_float32 XFAIL [ 90%] 2023-01-11T23:10:17.2447857Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfftn_cuda_float32 XFAIL [ 90%] 2023-01-11T23:10:17.2448030Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfftn_cuda_float32 XFAIL [ 90%] 2023-01-11T23:10:17.2448204Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fill_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2448373Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_flatten_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2448545Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fliplr_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2448775Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmax_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2448949Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmin_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2449126Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_frexp_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2449300Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_gradient_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2449472Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_half_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2449647Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_hsplit_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2449813Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_hstack_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2449989Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_hypot_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2450170Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_add_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2450350Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_copy_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2450559Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_reduce_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2450757Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_kthvalue_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2450928Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ldexp_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2451101Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lerp_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2451305Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cond_cuda_float32 XFAIL [ 90%] 2023-01-11T23:10:17.2451486Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigvalsh_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2451670Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_inv_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2451855Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_inv_ex_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2452039Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lstsq_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2452236Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2452421Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_solve_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2452606Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_matrix_norm_cuda_float32 XFAIL [ 90%] 2023-01-11T23:10:17.2452814Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_pinv_hermitian_cuda_float32 SKIPPED (Skipped!) [ 90%] 2023-01-11T23:10:17.2453002Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_slogdet_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2453191Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_triangular_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2453373Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_svdvals_cuda_float32 XFAIL [ 90%] 2023-01-11T23:10:17.2453561Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_tensorinv_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2453752Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_tensorsolve_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2453939Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vector_norm_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2454117Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log10_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2454315Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log1p_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2454592Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2454789Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_softmax_with_dtype_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2454964Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logaddexp_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2455142Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logsumexp_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2455319Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_solve_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2455490Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_unpack_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2455664Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mH_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2455849Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_amax_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2456031Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_amin_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2456215Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_cumsum_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2456404Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_log_softmax_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2456584Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_logaddexp_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2456771Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_logsumexp_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2456992Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_median_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2457177Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_prod_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2457366Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_scatter_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2457551Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_softmax_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2457729Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_sum_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2457909Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_var_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2458088Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_matmul_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2458271Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_reduction_no_dim_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2458449Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_maximum_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2458623Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mean_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2458799Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_median_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2458995Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_meshgrid_list_of_tensors_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2459172Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_min_binary_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2459364Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_min_reduction_no_dim_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2459556Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_min_reduction_with_dim_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2459736Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_minimum_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2459899Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mm_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2460105Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mul_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2460325Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2460527Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanmedian_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2460701Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_narrow_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2460890Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_native_batch_norm_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2461086Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_native_dropout_backward_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2461294Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2461503Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2461703Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2461893Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool1d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2462101Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2462295Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_bilinear_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2462510Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2462721Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_celu_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2462913Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv1d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2463103Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv2d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2463308Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2463512Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2463711Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2463912Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_cross_entropy_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2464133Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2464335Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2464523Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_gelu_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2464721Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardshrink_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2464916Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardswish_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2465109Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardtanh_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2465314Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2465511Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_huber_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2465740Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2465949Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2466152Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_linear_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2466362Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2466553Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_l1_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2466753Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_layer_norm_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2466943Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_leaky_relu_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2467139Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_logsigmoid_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2467326Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool1d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2467514Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool2d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2467704Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool1d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2467904Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2468128Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2468326Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2468517Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mish_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2468770Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mse_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2468978Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2469185Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2469400Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2469600Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_circular_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2469792Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_constant_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2469991Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_reflect_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2470184Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pdist_cuda_float32 PASSED 
[ 90%] 2023-01-11T23:10:17.2470382Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2470570Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_prelu_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2470759Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_selu_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2470949Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_silu_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2471194Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softmin_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2471404Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2471591Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softplus_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2471787Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softshrink_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2472004Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2472196Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_unfold_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2472376Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_fro_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2472554Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_nuc_cuda_float32 XFAIL [ 90%] 2023-01-11T23:10:17.2472732Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_outer_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2472908Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polar_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2473105Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_1_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2473288Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_4_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2473465Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_prod_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2473637Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_put_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2473834Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_qr_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2474014Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rad2deg_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2474195Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reciprocal_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2474374Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_remainder_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2474548Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_renorm_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2474721Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_repeat_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2474906Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_repeat_interleave_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2475091Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_resolve_neg_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2475262Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_roll_cuda_float32 XFAIL [ 90%] 2023-01-11T23:10:17.2475441Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rot90_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2475631Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_decimals_neg_3_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2475806Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rsqrt_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2475980Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rsub_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2476157Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2476348Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_amax_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2476531Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_amin_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2476721Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_mean_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2476933Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_prod_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2477122Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_sum_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2477300Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_select_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2477474Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sign_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2477646Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sinh_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2477823Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_softmax_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2478016Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_softmax_with_dtype_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2478185Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sort_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2478371Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_erfcx_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2478552Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_i0e_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2478736Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_ndtr_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2478920Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_ndtri_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2479095Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_split_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2479280Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_split_list_args_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2479479Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_squeeze_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2479656Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_stack_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2479821Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_cuda_float32 PASSED [ 91%] 
2023-01-11T23:10:17.2480009Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_mean_unbiased_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2480191Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_unbiased_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2480363Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sum_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2480533Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_svd_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2480716Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_svd_lowrank_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2480891Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_symeig_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2481078Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_take_along_dim_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2481251Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_take_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2481424Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_to_sparse_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2481599Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_topk_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2481780Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_transpose_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2481956Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trapezoid_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2482134Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tril_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2482308Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_triu_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2482515Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_true_divide_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2482700Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unfold_copy_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2482877Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unfold_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2483043Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2483221Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_mean_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2483407Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_mean_unbiased_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2483592Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_unbiased_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2483766Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vdot_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2483955Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_as_complex_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2484135Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_copy_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2484311Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vstack_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2484483Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_zero__cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2484647Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_H_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2484819Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_T_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2485027Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___radd___cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2485205Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rdiv___cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2485384Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmul___cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2485562Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rsub___cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2485758Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp__softmax_backward_data_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2485933Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_acos_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2486110Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_acosh_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2486283Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addcdiv_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2486461Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addr_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2486641Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_angle_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2486832Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_as_strided_scatter_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2487008Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_asinh_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2487181Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atan2_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2487367Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_2d_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2487552Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_3d_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2487733Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_baddbmm_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2487913Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_bernoulli_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2488115Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_bfloat16_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2488291Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_bmm_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2488483Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_broadcast_tensors_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2488672Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cartesian_prod_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2488849Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cdist_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2489029Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cdouble_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2489213Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cfloat_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2489402Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_inverse_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2489574Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2489758Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_max_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2489940Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_min_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2490125Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_column_stack_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2490303Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_complex_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2490490Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_conj_physical_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2490696Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_contiguous_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2490881Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_copysign_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2491056Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cos_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2491226Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cosh_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2491402Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cross_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2491584Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cumprod_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2491760Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_deg2rad_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2491940Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diag_embed_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2492122Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagflat_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2492312Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_copy_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2492493Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2492676Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_scatter_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2492865Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_div_floor_rounding_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2493056Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_div_trunc_rounding_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2493234Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_double_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2493411Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dsplit_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2493585Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erf_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2493784Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erfc_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2493960Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_exp_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2494145Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expand_as_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2494325Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expm1_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2494587Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fft2_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2494768Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fft_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2494949Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fftn_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2495130Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifft2_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2495309Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifft_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2495489Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifftn_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2495678Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifftshift_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2495856Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ihfft2_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2496025Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfft_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2496205Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfftn_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2496416Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_rfft2_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2496593Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_rfft_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2496772Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fill_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2496945Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_flip_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2497127Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_flipud_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2497302Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_float_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2497487Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_float_power_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2497660Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_floor_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2497835Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fmax_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2498011Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fmod_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2498181Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_half_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2498360Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_hstack_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2498536Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_hypot_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2498723Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_add_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2498899Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lerp_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2499078Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lgamma_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2499253Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_det_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2499465Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eigh_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2499660Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eigvalsh_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2499863Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_householder_product_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2500050Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_inv_ex_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2500234Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lstsq_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2500423Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_factor_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2500615Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_solve_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2500813Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_matrix_power_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2501003Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_multi_dot_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2501201Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_norm_subgradients_at_zero_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2501457Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 91%] 2023-01-11T23:10:17.2501644Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2501858Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_ex_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2502037Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_svd_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2502230Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_tensorinv_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2502428Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_tensorsolve_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2502613Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_vecdot_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2502789Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log10_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2502957Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log1p_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2503134Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log2_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2503323Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logaddexp2_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2503512Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logaddexp_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2503692Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logdet_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2503866Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2504047Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_solve_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2504230Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_unpack_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2504402Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mT_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2504580Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_amax_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2504765Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_amin_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2504989Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_logaddexp_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2505180Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_logsumexp_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2505368Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_scatter_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2505556Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_select_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2505740Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_softmax_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2505929Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_softmin_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2506116Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_std_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2506290Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_matmul_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2506477Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_matrix_exp_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2506674Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_reduction_with_dim_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2506853Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_maximum_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2507055Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_meshgrid_list_of_tensors_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2507258Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_meshgrid_variadic_tensors_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2507476Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_min_reduction_no_dim_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2507659Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_minimum_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2507835Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mm_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2508005Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mode_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2508187Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_movedim_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2508367Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_msort_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2508541Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mul_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2508789Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mv_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2509009Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2509199Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2509386Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2509570Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nan_to_num_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2509743Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanmean_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2509922Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nansum_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2510117Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_native_dropout_backward_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2510303Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_neg_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2510513Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2510749Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2510960Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2511165Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2511366Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2511571Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_alpha_dropout_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2511763Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool1d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2511964Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool2d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2512176Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2512369Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_bilinear_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2512561Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv1d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2512749Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv2d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2512959Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2513188Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2513398Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2513606Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_ctc_loss_cuda_float32 SKIPPED (Skipped!) 
[ 92%] 2023-01-11T23:10:17.2513798Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_dropout3d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2513996Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_embedding_bag_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2514193Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_embedding_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2514414Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2514642Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2514849Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2515040Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_glu_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2515237Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_group_norm_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2515438Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardshrink_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2515638Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardsigmoid_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2515828Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardswish_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2516051Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardtanh_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2516253Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_instance_norm_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2516461Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_linear_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2516672Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2516883Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2517077Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_layer_norm_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2517271Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_linear_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2517478Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_local_response_norm_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2517672Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_logsigmoid_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2517866Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2518065Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool2d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2518269Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 92%] 
2023-01-11T23:10:17.2518463Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_mse_loss_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2518692Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2518891Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_normalize_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2519089Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_circular_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2519289Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_constant_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2519493Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pairwise_distance_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2519679Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pdist_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2519872Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_prelu_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2520058Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_relu6_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2520254Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_relu_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2520446Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_rrelu_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2520646Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2520847Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2521043Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softmin_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2521239Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softplus_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2521439Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_threshold_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2521660Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2521856Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_unfold_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2522062Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2522267Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_upsample_nearest_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2522443Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2522627Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_inf_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2522807Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_nuc_cuda_float32 XFAIL [ 92%] 2023-01-11T23:10:17.2522987Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ormqr_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2523171Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_pca_lowrank_cuda_float32 XFAIL [ 92%] 2023-01-11T23:10:17.2523345Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_permute_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2523524Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_pinverse_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2523725Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_0_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2523922Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_1_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2524142Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_2_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2524343Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_3_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2524543Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_4_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2524719Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_qr_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2524899Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rad2deg_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2525075Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_real_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2525253Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reciprocal_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2525438Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_remainder_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2525616Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_renorm_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2525792Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_repeat_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2525975Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reshape_as_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2526165Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_resolve_conj_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2526351Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_resolve_neg_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2526525Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_roll_cuda_float32 XFAIL [ 92%] 2023-01-11T23:10:17.2526705Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2526890Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_3_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2527090Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rsub_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2527277Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_add_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2527458Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2527655Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_segment_reduce_offsets_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2527833Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_select_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2528021Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_select_scatter_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2528199Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sgn_cuda_float32 XFAIL [ 92%] 2023-01-11T23:10:17.2528380Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sigmoid_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2528552Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sign_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2528722Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sinc_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2528891Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sinh_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2529071Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_softmax_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2529265Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_softmax_with_dtype_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2529463Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sparse_sampled_addmm_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2529675Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_erfcx_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2529863Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i1e_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2530045Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_ndtri_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2530256Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2530443Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_xlog1py_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2530628Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2530820Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sqrt_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2531026Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2531218Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_mean_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2531410Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_mean_unbiased_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2531597Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_unbiased_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2531778Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sum_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2531954Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sum_to_size_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2532129Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_svd_cuda_float32 XFAIL [ 92%] 2023-01-11T23:10:17.2532311Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_svd_lowrank_cuda_float32 XFAIL [ 92%] 2023-01-11T23:10:17.2532489Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tan_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2532666Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tanh_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2532876Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tensor_split_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2533060Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_to_sparse_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2533237Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_topk_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2533414Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trace_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2533590Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_transpose_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2533782Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_triangular_solve_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2533958Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tril_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2534136Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trunc_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2534317Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unflatten_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2534662Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unfold_copy_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2534846Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unfold_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2535019Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2535200Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_mean_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2535380Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_unbiased_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2535660Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_as_complex_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2535847Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_copy_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2536031Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_vstack_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2536207Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_zero__cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2536363Z test_ops.py::TestFakeTensorCUDA::test_fake_cummax_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2536518Z test_ops.py::TestFakeTensorCUDA::test_fake_cummin_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2536672Z test_ops.py::TestFakeTensorCUDA::test_fake_diag_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2536821Z test_ops.py::TestFakeTensorCUDA::test_fake_diagflat_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2536981Z test_ops.py::TestFakeTensorCUDA::test_fake_diagonal_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2537134Z test_ops.py::TestFakeTensorCUDA::test_fake_diff_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2537290Z test_ops.py::TestFakeTensorCUDA::test_fake_digamma_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2537457Z test_ops.py::TestFakeTensorCUDA::test_fake_div_no_rounding_mode_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2537604Z test_ops.py::TestFakeTensorCUDA::test_fake_dot_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2537753Z test_ops.py::TestFakeTensorCUDA::test_fake_double_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2537904Z test_ops.py::TestFakeTensorCUDA::test_fake_dstack_cuda_float32 PASSED [ 93%] 
2023-01-11T23:10:17.2538055Z test_ops.py::TestFakeTensorCUDA::test_fake_equal_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2538197Z test_ops.py::TestFakeTensorCUDA::test_fake_exp2_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2538353Z test_ops.py::TestFakeTensorCUDA::test_fake_expand_as_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2538504Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_fft2_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2538691Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfft_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2538842Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifft2_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2538992Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifft_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2539155Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifftshift_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2539306Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfft_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2539457Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfftn_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2539615Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfft2_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2539768Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfft_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2539924Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfftn_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2540078Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfft_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2540227Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfftn_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2540377Z test_ops.py::TestFakeTensorCUDA::test_fake_fill_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2540529Z test_ops.py::TestFakeTensorCUDA::test_fake_flatten_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2540672Z test_ops.py::TestFakeTensorCUDA::test_fake_fliplr_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2540823Z test_ops.py::TestFakeTensorCUDA::test_fake_flipud_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2540986Z test_ops.py::TestFakeTensorCUDA::test_fake_floor_divide_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2541133Z test_ops.py::TestFakeTensorCUDA::test_fake_fmin_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2541308Z test_ops.py::TestFakeTensorCUDA::test_fake_fmod_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2541457Z test_ops.py::TestFakeTensorCUDA::test_fake_frac_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2541607Z test_ops.py::TestFakeTensorCUDA::test_fake_full_like_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2541755Z test_ops.py::TestFakeTensorCUDA::test_fake_ge_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2541897Z test_ops.py::TestFakeTensorCUDA::test_fake_geqrf_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2542051Z test_ops.py::TestFakeTensorCUDA::test_fake_gradient_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2542200Z test_ops.py::TestFakeTensorCUDA::test_fake_half_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2542353Z test_ops.py::TestFakeTensorCUDA::test_fake_heaviside_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2542502Z test_ops.py::TestFakeTensorCUDA::test_fake_histc_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2542654Z test_ops.py::TestFakeTensorCUDA::test_fake_hsplit_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2542802Z test_ops.py::TestFakeTensorCUDA::test_fake_hstack_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2542954Z test_ops.py::TestFakeTensorCUDA::test_fake_hypot_cuda_float32 PASSED [ 93%] 
2023-01-11T23:10:17.2543099Z test_ops.py::TestFakeTensorCUDA::test_fake_index_add_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2543256Z test_ops.py::TestFakeTensorCUDA::test_fake_index_copy_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2543415Z test_ops.py::TestFakeTensorCUDA::test_fake_index_fill_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2543568Z test_ops.py::TestFakeTensorCUDA::test_fake_index_put_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2543730Z test_ops.py::TestFakeTensorCUDA::test_fake_index_reduce_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2543889Z test_ops.py::TestFakeTensorCUDA::test_fake_index_select_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2544042Z test_ops.py::TestFakeTensorCUDA::test_fake_inner_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2544193Z test_ops.py::TestFakeTensorCUDA::test_fake_isinf_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2544338Z test_ops.py::TestFakeTensorCUDA::test_fake_isneginf_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2544526Z test_ops.py::TestFakeTensorCUDA::test_fake_isposinf_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2544736Z test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_4inputs_with_extra_args_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2544929Z test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_binary_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2545130Z test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_binary_return_by_ref_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2545278Z test_ops.py::TestFakeTensorCUDA::test_fake_le_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2545429Z test_ops.py::TestFakeTensorCUDA::test_fake_lerp_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2545594Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cross_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2545751Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eig_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2545934Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigvalsh_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2546093Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_inv_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2546255Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_inv_ex_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2546413Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lstsq_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2546576Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_solve_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2546743Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_rank_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2546945Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_rank_hermitian_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2547136Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_multi_dot_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2547291Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_norm_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2547473Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2547631Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2547826Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_hermitian_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2547981Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_qr_cuda_float32 PASSED [ 93%] 
2023-01-11T23:10:17.2548144Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_slogdet_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2548325Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2548500Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_triangular_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2548740Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_vecdot_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2548905Z test_ops.py::TestFakeTensorCUDA::test_fake_log10_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2549075Z test_ops.py::TestFakeTensorCUDA::test_fake_log1p_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2549225Z test_ops.py::TestFakeTensorCUDA::test_fake_log2_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2549385Z test_ops.py::TestFakeTensorCUDA::test_fake_logaddexp2_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2549542Z test_ops.py::TestFakeTensorCUDA::test_fake_logaddexp_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2549694Z test_ops.py::TestFakeTensorCUDA::test_fake_logdet_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2549853Z test_ops.py::TestFakeTensorCUDA::test_fake_logical_not_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2550007Z test_ops.py::TestFakeTensorCUDA::test_fake_logit_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2550153Z test_ops.py::TestFakeTensorCUDA::test_fake_logspace_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2550327Z test_ops.py::TestFakeTensorCUDA::test_fake_lt_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2550481Z test_ops.py::TestFakeTensorCUDA::test_fake_lu_unpack_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2550629Z test_ops.py::TestFakeTensorCUDA::test_fake_mH_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2550776Z test_ops.py::TestFakeTensorCUDA::test_fake_mT_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2550938Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_amax_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2551101Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_argmin_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2551258Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_cumsum_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2551412Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_fill_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2551579Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_log_softmax_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2551750Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_logsumexp_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2551915Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_normalize_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2552073Z test_ops.py::TestFakeTensorCUDA::test_fake_max_binary_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2552244Z test_ops.py::TestFakeTensorCUDA::test_fake_max_reduction_with_dim_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2552398Z test_ops.py::TestFakeTensorCUDA::test_fake_maximum_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2552548Z test_ops.py::TestFakeTensorCUDA::test_fake_median_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2552717Z test_ops.py::TestFakeTensorCUDA::test_fake_meshgrid_variadic_tensors_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2552901Z test_ops.py::TestFakeTensorCUDA::test_fake_min_binary_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2553072Z test_ops.py::TestFakeTensorCUDA::test_fake_min_reduction_no_dim_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2553230Z test_ops.py::TestFakeTensorCUDA::test_fake_minimum_cuda_float32 
PASSED [ 93%] 2023-01-11T23:10:17.2553381Z test_ops.py::TestFakeTensorCUDA::test_fake_mm_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2553533Z test_ops.py::TestFakeTensorCUDA::test_fake_mode_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2553684Z test_ops.py::TestFakeTensorCUDA::test_fake_movedim_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2553835Z test_ops.py::TestFakeTensorCUDA::test_fake_msort_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2554018Z test_ops.py::TestFakeTensorCUDA::test_fake_multinomial_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2554158Z test_ops.py::TestFakeTensorCUDA::test_fake_mv_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2554354Z test_ops.py::TestFakeTensorCUDA::test_fake_mvlgamma_mvlgamma_p_3_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2554532Z test_ops.py::TestFakeTensorCUDA::test_fake_nanmean_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2554689Z test_ops.py::TestFakeTensorCUDA::test_fake_nanmedian_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2554841Z test_ops.py::TestFakeTensorCUDA::test_fake_nansum_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2555020Z test_ops.py::TestFakeTensorCUDA::test_fake_native_batch_norm_cuda_float32 SKIPPED (Skipped!) [ 93%] 2023-01-11T23:10:17.2555169Z test_ops.py::TestFakeTensorCUDA::test_fake_ne_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2555321Z test_ops.py::TestFakeTensorCUDA::test_fake_new_empty_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2555481Z test_ops.py::TestFakeTensorCUDA::test_fake_new_empty_strided_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2555633Z test_ops.py::TestFakeTensorCUDA::test_fake_new_full_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2555788Z test_ops.py::TestFakeTensorCUDA::test_fake_new_zeros_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2555938Z test_ops.py::TestFakeTensorCUDA::test_fake_nextafter_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2556148Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2556335Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2556519Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2556706Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2556880Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_avg_pool2d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2557046Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_avg_pool3d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2557239Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2557408Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_celu_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2557594Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2557773Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2557960Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2558135Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cross_entropy_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2558329Z 
test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_ctc_loss_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2558495Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout2d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2558693Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2558861Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_elu_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2559035Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_embedding_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2559221Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2559385Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_glu_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2559561Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_grid_sample_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2559739Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardsigmoid_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2559929Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2560095Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_huber_loss_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2560281Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_area_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2560471Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2560640Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_l1_loss_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2560809Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_leaky_relu_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2560978Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_linear_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2561157Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2561335Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2561520Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2561689Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2561878Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_mish_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2562050Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_mse_loss_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2562231Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2562420Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2562614Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_nll_loss_cuda_float32 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:10:17.2562804Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_one_hot_cuda_int64 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:10:17.2562984Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_circular_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2563158Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_reflect_cuda_float32 PASSED [ 94%] 
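A minimal sketch of what these TestFakeTensorCUDA cases exercise, not taken from the log itself: under FakeTensorMode, tensor factories yield "fake" tensors that carry shape/dtype/device metadata without allocating real CUDA memory, and each test_fake_* case checks that running an op on fake inputs reproduces the metadata of the real run. The import path and the 8x8 mm example are assumptions about a torch build of this vintage, not the suite's own code.

    import torch
    from torch._subclasses.fake_tensor import FakeTensorMode  # assumed import path for builds of this era

    with FakeTensorMode():
        # Factories called under the mode return FakeTensors: correct
        # shape/dtype/device metadata, no real CUDA allocation behind them.
        a = torch.empty(8, 8, device="cuda")
        b = torch.empty(8, 8, device="cuda")
        out = torch.mm(a, b)  # roughly the shape of test_fake_mm_cuda_float32's check
        print(out.shape, out.dtype, out.device)  # torch.Size([8, 8]) torch.float32 cuda:0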
2023-01-11T23:10:17.2563330Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_replicate_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2563508Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pairwise_distance_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2563675Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pdist_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2563847Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2564028Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2564200Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_prelu_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2564391Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_silu_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2564573Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2564740Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softplus_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2564915Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softshrink_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2565087Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_tanhshrink_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2565272Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2565471Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2565640Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_unfold_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2565796Z test_ops.py::TestFakeTensorCUDA::test_fake_norm_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2565952Z test_ops.py::TestFakeTensorCUDA::test_fake_norm_nuc_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2566109Z test_ops.py::TestFakeTensorCUDA::test_fake_normal_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2566271Z test_ops.py::TestFakeTensorCUDA::test_fake_normal_number_mean_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2566422Z test_ops.py::TestFakeTensorCUDA::test_fake_ones_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2566577Z test_ops.py::TestFakeTensorCUDA::test_fake_outer_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2566737Z test_ops.py::TestFakeTensorCUDA::test_fake_pca_lowrank_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2566896Z test_ops.py::TestFakeTensorCUDA::test_fake_pinverse_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2567069Z test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_0_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2567242Z test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_2_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2567416Z test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_3_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2567601Z test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_4_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2567754Z test_ops.py::TestFakeTensorCUDA::test_fake_pow_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2567909Z test_ops.py::TestFakeTensorCUDA::test_fake_prod_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2568060Z test_ops.py::TestFakeTensorCUDA::test_fake_qr_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2568238Z test_ops.py::TestFakeTensorCUDA::test_fake_quantile_cuda_float32 SKIPPED (Skip failing 
test) [ 94%] 2023-01-11T23:10:17.2568390Z test_ops.py::TestFakeTensorCUDA::test_fake_rad2deg_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2568541Z test_ops.py::TestFakeTensorCUDA::test_fake_randint_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2568708Z test_ops.py::TestFakeTensorCUDA::test_fake_randint_like_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2568860Z test_ops.py::TestFakeTensorCUDA::test_fake_randn_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2569006Z test_ops.py::TestFakeTensorCUDA::test_fake_ravel_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2569162Z test_ops.py::TestFakeTensorCUDA::test_fake_remainder_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2569313Z test_ops.py::TestFakeTensorCUDA::test_fake_renorm_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2569474Z test_ops.py::TestFakeTensorCUDA::test_fake_reshape_as_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2569625Z test_ops.py::TestFakeTensorCUDA::test_fake_reshape_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2569783Z test_ops.py::TestFakeTensorCUDA::test_fake_resize_as__cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2569942Z test_ops.py::TestFakeTensorCUDA::test_fake_resolve_neg_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2570094Z test_ops.py::TestFakeTensorCUDA::test_fake_roll_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2570263Z test_ops.py::TestFakeTensorCUDA::test_fake_rot90_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2570430Z test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_0_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2570602Z test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_neg_3_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2570769Z test_ops.py::TestFakeTensorCUDA::test_fake_scalar_tensor_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2570954Z test_ops.py::TestFakeTensorCUDA::test_fake_scatter_add_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2571148Z test_ops.py::TestFakeTensorCUDA::test_fake_segment_reduce_offsets_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2571300Z test_ops.py::TestFakeTensorCUDA::test_fake_select_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2571472Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_bartlett_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2571633Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_cosine_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2571815Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_exponential_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2571988Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_gaussian_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2572166Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_general_hamming_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2572337Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_hamming_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2572501Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_nuttall_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2572652Z test_ops.py::TestFakeTensorCUDA::test_fake_sin_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2572804Z test_ops.py::TestFakeTensorCUDA::test_fake_slice_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2572959Z test_ops.py::TestFakeTensorCUDA::test_fake_slice_scatter_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2573127Z test_ops.py::TestFakeTensorCUDA::test_fake_softmax_with_dtype_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2573278Z test_ops.py::TestFakeTensorCUDA::test_fake_sort_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2573465Z 
test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_j1_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2573635Z test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_y0_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2573819Z test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_t_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2573994Z test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_u_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2574357Z test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 94%] 2023-01-11T23:10:17.2574616Z test_ops.py::TestFakeTensorCUDA::test_fake_special_erfcx_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2574792Z test_ops.py::TestFakeTensorCUDA::test_fake_special_hermite_polynomial_h_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2574970Z test_ops.py::TestFakeTensorCUDA::test_fake_special_hermite_polynomial_he_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2575146Z test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_k0_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2575309Z test_ops.py::TestFakeTensorCUDA::test_fake_special_ndtr_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2575469Z test_ops.py::TestFakeTensorCUDA::test_fake_special_ndtri_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2575664Z test_ops.py::TestFakeTensorCUDA::test_fake_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2575850Z test_ops.py::TestFakeTensorCUDA::test_fake_special_scaled_modified_bessel_k0_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2576207Z test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 94%] 2023-01-11T23:10:17.2576426Z test_ops.py::TestFakeTensorCUDA::test_fake_special_spherical_bessel_j0_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2576583Z test_ops.py::TestFakeTensorCUDA::test_fake_special_xlog1py_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2576738Z test_ops.py::TestFakeTensorCUDA::test_fake_split_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2576904Z test_ops.py::TestFakeTensorCUDA::test_fake_split_with_sizes_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2577054Z test_ops.py::TestFakeTensorCUDA::test_fake_sqrt_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2577207Z test_ops.py::TestFakeTensorCUDA::test_fake_squeeze_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2577358Z test_ops.py::TestFakeTensorCUDA::test_fake_stack_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2577510Z test_ops.py::TestFakeTensorCUDA::test_fake_std_mean_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2577676Z test_ops.py::TestFakeTensorCUDA::test_fake_std_mean_unbiased_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2577831Z test_ops.py::TestFakeTensorCUDA::test_fake_std_unbiased_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2577980Z test_ops.py::TestFakeTensorCUDA::test_fake_stft_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2578130Z test_ops.py::TestFakeTensorCUDA::test_fake_sum_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2578292Z test_ops.py::TestFakeTensorCUDA::test_fake_take_along_dim_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2578443Z test_ops.py::TestFakeTensorCUDA::test_fake_take_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2578593Z test_ops.py::TestFakeTensorCUDA::test_fake_tan_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2578745Z 
test_ops.py::TestFakeTensorCUDA::test_fake_tensordot_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2578893Z test_ops.py::TestFakeTensorCUDA::test_fake_to_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2579035Z test_ops.py::TestFakeTensorCUDA::test_fake_trace_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2579192Z test_ops.py::TestFakeTensorCUDA::test_fake_trapezoid_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2579339Z test_ops.py::TestFakeTensorCUDA::test_fake_trapz_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2579529Z test_ops.py::TestFakeTensorCUDA::test_fake_tril_indices_cuda_int64 PASSED [ 94%] 2023-01-11T23:10:17.2579676Z test_ops.py::TestFakeTensorCUDA::test_fake_triu_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2579834Z test_ops.py::TestFakeTensorCUDA::test_fake_triu_indices_cuda_int64 PASSED [ 94%] 2023-01-11T23:10:17.2579986Z test_ops.py::TestFakeTensorCUDA::test_fake_unflatten_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2580134Z test_ops.py::TestFakeTensorCUDA::test_fake_unfold_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2580284Z test_ops.py::TestFakeTensorCUDA::test_fake_unique_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2580438Z test_ops.py::TestFakeTensorCUDA::test_fake_var_unbiased_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2580604Z test_ops.py::TestFakeTensorCUDA::test_fake_view_as_complex_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2580757Z test_ops.py::TestFakeTensorCUDA::test_fake_view_as_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2580921Z test_ops.py::TestFakeTensorCUDA::test_fake_view_as_real_cuda_complex64 PASSED [ 94%] 2023-01-11T23:10:17.2581073Z test_ops.py::TestFakeTensorCUDA::test_fake_view_copy_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2581222Z test_ops.py::TestFakeTensorCUDA::test_fake_vsplit_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2581370Z test_ops.py::TestFakeTensorCUDA::test_fake_vstack_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2581521Z test_ops.py::TestFakeTensorCUDA::test_fake_where_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2581670Z test_ops.py::TestFakeTensorCUDA::test_fake_zeros_like_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2581824Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_H_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2581982Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_T_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2582174Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___radd___cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2582346Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rmatmul___cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2582508Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___ror___cuda_int64 PASSED [ 94%] 2023-01-11T23:10:17.2582676Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rpow___cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2582858Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__native_batch_norm_legit_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2583035Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__softmax_backward_data_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2583197Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_abs_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2583359Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_acos_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2583525Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_acosh_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2583690Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addbmm_cuda_float32 PASSED [ 94%] 
2023-01-11T23:10:17.2583856Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addcmul_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2584038Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addmm_decomposed_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2584198Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addmv_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2584351Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addr_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2584511Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_all_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2584678Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_allclose_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2584840Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_amin_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2585031Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_aminmax_cuda_float32 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:10:17.2585195Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_angle_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2585401Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argmin_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2585570Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argsort_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2585737Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argwhere_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2585899Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2586085Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_partial_views_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2586247Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atan_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2586414Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atleast_1d_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2586589Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bernoulli_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2586754Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bincount_cuda_int64 PASSED [ 94%] 2023-01-11T23:10:17.2586928Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_left_shift_cuda_int64 PASSED [ 94%] 2023-01-11T23:10:17.2587095Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_not_cuda_int64 PASSED [ 94%] 2023-01-11T23:10:17.2587265Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_right_shift_cuda_int64 PASSED [ 94%] 2023-01-11T23:10:17.2587432Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_block_diag_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2587592Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bool_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2587759Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bucketize_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2587945Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_byte_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2588105Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cat_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2588271Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cdist_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2588437Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cdouble_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2588599Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ceil_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2588856Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cholesky_cuda_float32 SKIPPED (Skip failing test) [ 94%] 
2023-01-11T23:10:17.2589030Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_chunk_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2589194Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clamp_max_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2589360Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clamp_min_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2589536Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_column_stack_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2589706Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_combinations_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2589869Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_conj_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2590045Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_constant_pad_nd_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2590206Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_contiguous_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2590376Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_copysign_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2590538Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cosh_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2590708Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_count_nonzero_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2590869Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cross_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2591038Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cummax_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2591204Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cumprod_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2591410Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cumulative_trapezoid_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2591579Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diag_embed_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2591743Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagonal_copy_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2591907Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_digamma_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2592068Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dist_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2592231Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_double_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2592398Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_empty_like_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2592558Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_eq_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2592723Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erf_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2592883Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erfc_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2593045Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fftshift_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2593210Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfft2_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2593375Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfftn_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2593542Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfft2_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2593705Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfft_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2593897Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfftn_cuda_float32 PASSED [ 95%] 
2023-01-11T23:10:17.2594063Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_irfftn_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2594230Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_rfft2_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2594388Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_rfft_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2594549Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fill_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2594714Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flatten_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2594879Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flip_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2595047Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_float_power_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2595217Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_floor_divide_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2595384Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmin_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2595543Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmod_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2595700Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_frac_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2595855Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_frexp_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2596020Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_full_like_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2596182Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ge_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2596347Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_geqrf_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2596511Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gradient_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2596685Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_grid_sampler_2d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2596848Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gt_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2597012Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hstack_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2597202Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_igammac_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2597369Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_imag_cuda_complex64 PASSED [ 95%] 2023-01-11T23:10:17.2597534Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_put_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2597702Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_reduce_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2597866Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_inner_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2598029Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_int_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2598191Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isclose_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2598360Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isfinite_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2598563Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_2inputs_2outputs_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2598762Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_binary_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2598924Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_kron_cuda_float32 PASSED [ 95%] 
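Several cases in the stretch below are gated as slow and skipped with "run with PYTORCH_TEST_WITH_SLOW to enable test" (e.g. test_pointwise_ops_linalg_pinv_singular_cuda_float32). A hedged sketch of re-running one such case locally; driving the file through pytest.main and the -k keyword filter are assumptions about a typical local setup, not the CI invocation used here.

    import os
    # torch's test harness reads this flag at import time, so set it before importing the tests.
    os.environ["PYTORCH_TEST_WITH_SLOW"] = "1"

    import pytest
    # Select the slow-gated case by keyword; exit with pytest's status code.
    raise SystemExit(pytest.main(["test_ops.py", "-v", "-k", "linalg_pinv_singular"]))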
2023-01-11T23:10:17.2599085Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lerp_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2599246Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lgamma_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2599415Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cond_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2599582Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_det_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2599776Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eig_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2599969Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_householder_product_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2600134Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_inv_ex_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2600309Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_factor_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2600487Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_factor_ex_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2600661Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_solve_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2600848Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2601028Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_factor_ex_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2601228Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_norm_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2601466Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_rank_hermitian_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2601638Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_multi_dot_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2601802Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_norm_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2601994Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2602161Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2602363Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_hermitian_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2602604Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 95%] 2023-01-11T23:10:17.2602775Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_qr_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2602992Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_solve_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2603178Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_solve_triangular_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2603345Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_svd_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2603511Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_svdvals_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2603686Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_tensorinv_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2603859Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_vander_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2604032Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_vecdot_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2604192Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log2_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2604357Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2604527Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logaddexp2_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2604700Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logcumsumexp_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2604863Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logdet_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2605025Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_and_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2605191Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_not_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2605360Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logspace_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2605551Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_long_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2605714Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2605883Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_amax_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2606046Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_amin_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2606218Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_argmin_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2606381Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_cumsum_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2606558Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_logsumexp_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2606727Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_median_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2606897Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_prod_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2607068Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_scatter_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2607241Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_softmax_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2607407Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_softmin_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2607572Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_std_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2607740Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_var_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2607899Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_matrix_exp_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2608067Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_binary_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2608263Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2608445Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_reduction_no_dim_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2608650Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_reduction_with_dim_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2608815Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_median_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2608983Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_minimum_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2609144Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mm_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2609299Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mode_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2609462Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_movedim_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2609629Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_msort_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2609824Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_multinomial_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2609988Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mv_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2610193Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_1_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2610395Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_3_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2610584Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nanquantile_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2610774Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_narrow_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2610946Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_native_batch_norm_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2611113Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ne_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2611333Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_neg_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2611511Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_full_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2611679Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_ones_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2611847Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_zeros_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2612051Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2612246Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2612443Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2612621Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_avg_pool1d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2612807Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_avg_pool2d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2612991Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_batch_norm_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2613175Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_bilinear_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2613370Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2613547Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv1d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2613738Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2613935Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2614126Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cosine_similarity_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2614311Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cross_entropy_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2614644Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_ctc_loss_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2614833Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_dropout2d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2615011Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_elu_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2615219Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_embedding_bag_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2615403Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_embedding_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2615608Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2615807Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2616002Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2616173Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_gelu_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2616362Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_grid_sample_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2616542Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_group_norm_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2616729Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardsigmoid_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2616910Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardtanh_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2617135Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2617331Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2617515Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_l1_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2617693Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_linear_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2617878Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_local_response_norm_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2618061Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_logsigmoid_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2618251Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2618431Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool2d_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2618621Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool1d_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2618814Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 
96%] 2023-01-11T23:10:17.2619003Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2619190Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool3d_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2619376Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2619546Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_mish_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2619749Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_nll_loss_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:10:17.2619937Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_circular_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2620124Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_reflect_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2620333Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pdist_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2620532Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_prelu_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2620734Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_relu6_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2620910Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_relu_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2621098Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_tanhshrink_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2621275Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_threshold_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2621486Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2621675Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_unfold_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2621870Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2622036Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nonzero_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2622199Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2622368Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_inf_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2622534Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_nuc_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2622698Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ones_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2622883Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ones_like_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2623048Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ormqr_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2623223Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_pca_lowrank_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2623387Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_pinverse_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2623553Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polar_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2623737Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_0_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2623920Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_2_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2624083Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_pow_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2624240Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_put_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2624395Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_qr_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2624584Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_quantile_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:10:17.2624753Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rad2deg_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2624919Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randint_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2625092Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randint_like_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2625257Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randn_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2625425Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randn_like_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2625588Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ravel_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2625747Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_real_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2625921Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reciprocal_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2626159Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_remainder_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2626360Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_repeat_interleave_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:10:17.2626526Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reshape_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2626690Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resize__cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2626855Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resize_as__cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2627024Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resolve_conj_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2627188Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_roll_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2627346Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rot90_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2627519Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_decimals_0_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2627695Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_decimals_3_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2627862Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2628040Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_amax_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2628217Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_mean_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2628392Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_sum_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2628563Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_searchsorted_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2628839Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_gaussian_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2629029Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_general_cosine_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2629216Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_hamming_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2629395Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_nuttall_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2629556Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sin_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2629718Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sinc_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2629878Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sinh_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2630043Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_slice_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2630204Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sort_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2630377Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_j0_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2630558Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_y1_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2630753Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_t_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2630925Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_erfcx_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2631115Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_hermite_polynomial_h_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2631309Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_hermite_polynomial_he_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2631480Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_i0e_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2631647Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_i1_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2632021Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 96%] 2023-01-11T23:10:17.2632230Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_i0_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2632419Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_i1_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2632610Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_scaled_modified_bessel_k0_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2632983Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 96%] 2023-01-11T23:10:17.2633171Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_spherical_bessel_j0_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2633343Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_zeta_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2633507Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2633674Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_square_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2633835Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2633986Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_svd_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2634154Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_svd_lowrank_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2634318Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_symeig_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2634479Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_t_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2634651Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_take_along_dim_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2634837Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tan_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2635031Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tensor_split_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:10:17.2635193Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_to_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2635348Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_topk_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2635509Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trace_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2635677Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trapezoid_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2635853Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_triangular_solve_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2636020Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tril_indices_cuda_int64 PASSED [ 96%] 2023-01-11T23:10:17.2636186Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_triu_indices_cuda_int64 PASSED [ 96%] 2023-01-11T23:10:17.2636353Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trunc_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2636521Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unfold_copy_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2636686Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unfold_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2636843Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_uniform_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2637022Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unique_consecutive_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2637188Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unsqueeze_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2637348Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2637514Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_mean_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2637695Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_mean_unbiased_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2637865Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_unbiased_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2638052Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vdot_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2638217Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_real_cuda_complex64 PASSED [ 96%] 2023-01-11T23:10:17.2638377Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2638544Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vsplit_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2638703Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vstack_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2638867Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zeros_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2639034Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zeros_like_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2639197Z 
test_ops.py::TestTagsCUDA::test_tags_T_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2639366Z test_ops.py::TestTagsCUDA::test_tags___rmatmul___cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2639533Z test_ops.py::TestTagsCUDA::test_tags___rmod___cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2639684Z test_ops.py::TestTagsCUDA::test_tags___ror___cuda_int64 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2639848Z test_ops.py::TestTagsCUDA::test_tags___rsub___cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2640009Z test_ops.py::TestTagsCUDA::test_tags___rxor___cuda_int64 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2640193Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_bfloat16_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2640375Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_bool_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2640585Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_byte_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2640771Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_cdouble_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2640956Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_cfloat_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2641132Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_complex_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2641317Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_double_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2641502Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_float_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2641683Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_short_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2641848Z test_ops.py::TestTagsCUDA::test_tags__refs_acos_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2642015Z test_ops.py::TestTagsCUDA::test_tags__refs_add_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2642188Z test_ops.py::TestTagsCUDA::test_tags__refs_addcdiv_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2642356Z test_ops.py::TestTagsCUDA::test_tags__refs_addcmul_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2642519Z test_ops.py::TestTagsCUDA::test_tags__refs_all_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2642683Z test_ops.py::TestTagsCUDA::test_tags__refs_allclose_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2642848Z test_ops.py::TestTagsCUDA::test_tags__refs_amax_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2643012Z test_ops.py::TestTagsCUDA::test_tags__refs_any_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2643200Z test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_partial_views_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2643386Z test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_scatter_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2643578Z test_ops.py::TestTagsCUDA::test_tags__refs_atanh_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2643752Z test_ops.py::TestTagsCUDA::test_tags__refs_atleast_1d_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2643924Z 
test_ops.py::TestTagsCUDA::test_tags__refs_atleast_2d_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2644086Z test_ops.py::TestTagsCUDA::test_tags__refs_atleast_3d_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2644249Z test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_and_cuda_int64 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2644427Z test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_left_shift_cuda_int64 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2644597Z test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_not_cuda_int64 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2644782Z test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_right_shift_cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2644959Z test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_to_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2645126Z test_ops.py::TestTagsCUDA::test_tags__refs_ceil_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2645298Z test_ops.py::TestTagsCUDA::test_tags__refs_chunk_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2645472Z test_ops.py::TestTagsCUDA::test_tags__refs_clamp_max_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2645649Z test_ops.py::TestTagsCUDA::test_tags__refs_conj_physical_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2645820Z test_ops.py::TestTagsCUDA::test_tags__refs_constant_pad_nd_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2646014Z test_ops.py::TestTagsCUDA::test_tags__refs_cumsum_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2646180Z test_ops.py::TestTagsCUDA::test_tags__refs_diag_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2646362Z test_ops.py::TestTagsCUDA::test_tags__refs_div_floor_rounding_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2646529Z test_ops.py::TestTagsCUDA::test_tags__refs_dsplit_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2646698Z test_ops.py::TestTagsCUDA::test_tags__refs_dstack_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2646873Z test_ops.py::TestTagsCUDA::test_tags__refs_empty_like_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2647039Z test_ops.py::TestTagsCUDA::test_tags__refs_eq_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2647205Z test_ops.py::TestTagsCUDA::test_tags__refs_erfinv_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2647363Z test_ops.py::TestTagsCUDA::test_tags__refs_exp_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2647535Z test_ops.py::TestTagsCUDA::test_tags__refs_expand_as_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2647704Z test_ops.py::TestTagsCUDA::test_tags__refs_expm1_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2647871Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_fft_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2648046Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifftshift_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2648219Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfft2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2648389Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfft_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2648558Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfftn_cuda_float32 
SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2648728Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfft_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2648890Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_rfft_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2649078Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_rfftn_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2649245Z test_ops.py::TestTagsCUDA::test_tags__refs_fill_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2649414Z test_ops.py::TestTagsCUDA::test_tags__refs_flatten_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2649578Z test_ops.py::TestTagsCUDA::test_tags__refs_flip_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2649748Z test_ops.py::TestTagsCUDA::test_tags__refs_fliplr_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2649914Z test_ops.py::TestTagsCUDA::test_tags__refs_flipud_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2650078Z test_ops.py::TestTagsCUDA::test_tags__refs_fmax_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2650238Z test_ops.py::TestTagsCUDA::test_tags__refs_gt_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2650410Z test_ops.py::TestTagsCUDA::test_tags__refs_heaviside_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2650599Z test_ops.py::TestTagsCUDA::test_tags__refs_hsplit_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2650787Z test_ops.py::TestTagsCUDA::test_tags__refs_hypot_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2650948Z test_ops.py::TestTagsCUDA::test_tags__refs_i0_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2651115Z test_ops.py::TestTagsCUDA::test_tags__refs_igamma_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2651284Z test_ops.py::TestTagsCUDA::test_tags__refs_igammac_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2651455Z test_ops.py::TestTagsCUDA::test_tags__refs_imag_cuda_complex64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2651653Z test_ops.py::TestTagsCUDA::test_tags__refs_index_add_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2651820Z test_ops.py::TestTagsCUDA::test_tags__refs_index_fill_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2651987Z test_ops.py::TestTagsCUDA::test_tags__refs_isclose_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2652154Z test_ops.py::TestTagsCUDA::test_tags__refs_isinf_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2652321Z test_ops.py::TestTagsCUDA::test_tags__refs_isreal_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2652482Z test_ops.py::TestTagsCUDA::test_tags__refs_lcm_cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2652650Z test_ops.py::TestTagsCUDA::test_tags__refs_lerp_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2652815Z test_ops.py::TestTagsCUDA::test_tags__refs_lgamma_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2652999Z test_ops.py::TestTagsCUDA::test_tags__refs_linalg_matrix_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2653172Z test_ops.py::TestTagsCUDA::test_tags__refs_linalg_svd_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2653342Z test_ops.py::TestTagsCUDA::test_tags__refs_linalg_svdvals_cuda_float32 
SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2653523Z test_ops.py::TestTagsCUDA::test_tags__refs_linalg_vector_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2653686Z test_ops.py::TestTagsCUDA::test_tags__refs_log10_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2653850Z test_ops.py::TestTagsCUDA::test_tags__refs_log_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2654025Z test_ops.py::TestTagsCUDA::test_tags__refs_logical_and_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2654194Z test_ops.py::TestTagsCUDA::test_tags__refs_logical_not_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2654369Z test_ops.py::TestTagsCUDA::test_tags__refs_logical_or_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2654664Z test_ops.py::TestTagsCUDA::test_tags__refs_lt_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2654834Z test_ops.py::TestTagsCUDA::test_tags__refs_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2655015Z test_ops.py::TestTagsCUDA::test_tags__refs_meshgrid_list_of_tensors_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2655203Z test_ops.py::TestTagsCUDA::test_tags__refs_meshgrid_variadic_tensors_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2655370Z test_ops.py::TestTagsCUDA::test_tags__refs_movedim_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2655534Z test_ops.py::TestTagsCUDA::test_tags__refs_neg_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2655705Z test_ops.py::TestTagsCUDA::test_tags__refs_new_empty_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2655881Z test_ops.py::TestTagsCUDA::test_tags__refs_new_full_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2656050Z test_ops.py::TestTagsCUDA::test_tags__refs_new_ones_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2656215Z test_ops.py::TestTagsCUDA::test_tags__refs_new_zeros_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2656374Z test_ops.py::TestTagsCUDA::test_tags__refs_nextafter_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2656566Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_alpha_dropout_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2656751Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_dropout_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2656931Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_glu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2657149Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_group_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2657334Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_hardshrink_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2657515Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_huber_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2657714Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_log_softmax_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2657914Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_margin_ranking_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2658097Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_mish_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 
2023-01-11T23:10:17.2658274Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_mse_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2658473Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_pairwise_distance_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2658655Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_pdist_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2658844Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_relu6_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2659023Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_selu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2659218Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softmax_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2659403Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_threshold_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2659596Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_triplet_margin_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2659771Z test_ops.py::TestTagsCUDA::test_tags__refs_positive_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2659928Z test_ops.py::TestTagsCUDA::test_tags__refs_ravel_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2660119Z test_ops.py::TestTagsCUDA::test_tags__refs_real_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2660295Z test_ops.py::TestTagsCUDA::test_tags__refs_reciprocal_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2660463Z test_ops.py::TestTagsCUDA::test_tags__refs_repeat_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2660635Z test_ops.py::TestTagsCUDA::test_tags__refs_reshape_as_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2660801Z test_ops.py::TestTagsCUDA::test_tags__refs_round_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2660968Z test_ops.py::TestTagsCUDA::test_tags__refs_rsqrt_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2661136Z test_ops.py::TestTagsCUDA::test_tags__refs_rsub_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2661320Z test_ops.py::TestTagsCUDA::test_tags__refs_sigmoid_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2661480Z test_ops.py::TestTagsCUDA::test_tags__refs_sinc_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2661643Z test_ops.py::TestTagsCUDA::test_tags__refs_sinh_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2661823Z test_ops.py::TestTagsCUDA::test_tags__refs_special_bessel_j0_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2662004Z test_ops.py::TestTagsCUDA::test_tags__refs_special_bessel_j1_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2662179Z test_ops.py::TestTagsCUDA::test_tags__refs_special_erfcx_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2662358Z test_ops.py::TestTagsCUDA::test_tags__refs_special_log_ndtr_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2662578Z test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_1_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2662782Z test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_3_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2662972Z 
test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_5_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2663143Z test_ops.py::TestTagsCUDA::test_tags__refs_special_xlog1py_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2663309Z test_ops.py::TestTagsCUDA::test_tags__refs_sqrt_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2663478Z test_ops.py::TestTagsCUDA::test_tags__refs_square_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2663650Z test_ops.py::TestTagsCUDA::test_tags__refs_squeeze_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2663816Z test_ops.py::TestTagsCUDA::test_tags__refs_stack_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2663984Z test_ops.py::TestTagsCUDA::test_tags__refs_std_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2664148Z test_ops.py::TestTagsCUDA::test_tags__refs_sum_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2664308Z test_ops.py::TestTagsCUDA::test_tags__refs_t_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2664472Z test_ops.py::TestTagsCUDA::test_tags__refs_trace_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2664637Z test_ops.py::TestTagsCUDA::test_tags__refs_transpose_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2664799Z test_ops.py::TestTagsCUDA::test_tags__refs_tril_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2664962Z test_ops.py::TestTagsCUDA::test_tags__refs_triu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2665125Z test_ops.py::TestTagsCUDA::test_tags__refs_trunc_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2665299Z test_ops.py::TestTagsCUDA::test_tags__refs_var_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2665463Z test_ops.py::TestTagsCUDA::test_tags__refs_where_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2665659Z test_ops.py::TestTagsCUDA::test_tags__refs_xlogy_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2665825Z test_ops.py::TestTagsCUDA::test_tags__refs_zeros_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2665985Z test_ops.py::TestTagsCUDA::test_tags_acos_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2666137Z test_ops.py::TestTagsCUDA::test_tags_add_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2666302Z test_ops.py::TestTagsCUDA::test_tags_addcdiv_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2666464Z test_ops.py::TestTagsCUDA::test_tags_addmm_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2666645Z test_ops.py::TestTagsCUDA::test_tags_addmm_decomposed_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2666803Z test_ops.py::TestTagsCUDA::test_tags_all_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2666966Z test_ops.py::TestTagsCUDA::test_tags_amax_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2667127Z test_ops.py::TestTagsCUDA::test_tags_amin_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2667283Z test_ops.py::TestTagsCUDA::test_tags_any_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2667439Z test_ops.py::TestTagsCUDA::test_tags_argmin_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2667604Z test_ops.py::TestTagsCUDA::test_tags_argsort_cuda_float32 SKIPPED (Only 
runs on cpu) [ 97%] 2023-01-11T23:10:17.2667770Z test_ops.py::TestTagsCUDA::test_tags_argwhere_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2667935Z test_ops.py::TestTagsCUDA::test_tags_as_strided_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2668144Z test_ops.py::TestTagsCUDA::test_tags_as_strided_partial_views_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2668325Z test_ops.py::TestTagsCUDA::test_tags_as_strided_scatter_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2668488Z test_ops.py::TestTagsCUDA::test_tags_atanh_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2668655Z test_ops.py::TestTagsCUDA::test_tags_atleast_1d_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2668894Z test_ops.py::TestTagsCUDA::test_tags_atleast_3d_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2669053Z test_ops.py::TestTagsCUDA::test_tags_bernoulli_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2669216Z test_ops.py::TestTagsCUDA::test_tags_bfloat16_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2669378Z test_ops.py::TestTagsCUDA::test_tags_bincount_cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2669556Z test_ops.py::TestTagsCUDA::test_tags_bitwise_right_shift_cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2669722Z test_ops.py::TestTagsCUDA::test_tags_bitwise_xor_cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2669890Z test_ops.py::TestTagsCUDA::test_tags_block_diag_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2670051Z test_ops.py::TestTagsCUDA::test_tags_bool_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2670228Z test_ops.py::TestTagsCUDA::test_tags_broadcast_tensors_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2670386Z test_ops.py::TestTagsCUDA::test_tags_bucketize_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2670544Z test_ops.py::TestTagsCUDA::test_tags_cat_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2670707Z test_ops.py::TestTagsCUDA::test_tags_cdist_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2670875Z test_ops.py::TestTagsCUDA::test_tags_ceil_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2671036Z test_ops.py::TestTagsCUDA::test_tags_cfloat_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2671235Z test_ops.py::TestTagsCUDA::test_tags_cholesky_inverse_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2671409Z test_ops.py::TestTagsCUDA::test_tags_cholesky_solve_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2671571Z test_ops.py::TestTagsCUDA::test_tags_chunk_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2671734Z test_ops.py::TestTagsCUDA::test_tags_clamp_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2671893Z test_ops.py::TestTagsCUDA::test_tags_clamp_max_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2672047Z test_ops.py::TestTagsCUDA::test_tags_clone_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2672217Z test_ops.py::TestTagsCUDA::test_tags_combinations_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2672379Z test_ops.py::TestTagsCUDA::test_tags_conj_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2672555Z 
test_ops.py::TestTagsCUDA::test_tags_constant_pad_nd_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2672721Z test_ops.py::TestTagsCUDA::test_tags_copysign_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2672886Z test_ops.py::TestTagsCUDA::test_tags_corrcoef_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2673048Z test_ops.py::TestTagsCUDA::test_tags_cos_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2673202Z test_ops.py::TestTagsCUDA::test_tags_cosh_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2673373Z test_ops.py::TestTagsCUDA::test_tags_count_nonzero_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2673532Z test_ops.py::TestTagsCUDA::test_tags_cov_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2673720Z test_ops.py::TestTagsCUDA::test_tags_cross_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2673886Z test_ops.py::TestTagsCUDA::test_tags_cumprod_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2674070Z test_ops.py::TestTagsCUDA::test_tags_cumulative_trapezoid_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2674232Z test_ops.py::TestTagsCUDA::test_tags_diag_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2674399Z test_ops.py::TestTagsCUDA::test_tags_diag_embed_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2674566Z test_ops.py::TestTagsCUDA::test_tags_diagonal_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2674718Z test_ops.py::TestTagsCUDA::test_tags_diff_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2674878Z test_ops.py::TestTagsCUDA::test_tags_dist_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2675058Z test_ops.py::TestTagsCUDA::test_tags_div_no_rounding_mode_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2675218Z test_ops.py::TestTagsCUDA::test_tags_dot_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2675384Z test_ops.py::TestTagsCUDA::test_tags_dsplit_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2675546Z test_ops.py::TestTagsCUDA::test_tags_dstack_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2675707Z test_ops.py::TestTagsCUDA::test_tags_empty_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2675866Z test_ops.py::TestTagsCUDA::test_tags_eq_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2676018Z test_ops.py::TestTagsCUDA::test_tags_equal_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2681471Z test_ops.py::TestTagsCUDA::test_tags_erf_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2681636Z test_ops.py::TestTagsCUDA::test_tags_erfc_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2681808Z test_ops.py::TestTagsCUDA::test_tags_erfinv_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2681968Z test_ops.py::TestTagsCUDA::test_tags_exp2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2682185Z test_ops.py::TestTagsCUDA::test_tags_expand_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2682347Z test_ops.py::TestTagsCUDA::test_tags_expm1_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2682505Z test_ops.py::TestTagsCUDA::test_tags_eye_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2682668Z 
test_ops.py::TestTagsCUDA::test_tags_fft_fftn_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2682826Z test_ops.py::TestTagsCUDA::test_tags_fft_hfft2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2682985Z test_ops.py::TestTagsCUDA::test_tags_fft_hfft_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2683145Z test_ops.py::TestTagsCUDA::test_tags_fft_hfftn_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2683309Z test_ops.py::TestTagsCUDA::test_tags_fft_ihfft2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2683478Z test_ops.py::TestTagsCUDA::test_tags_fft_irfft2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2683643Z test_ops.py::TestTagsCUDA::test_tags_fft_irfftn_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2683806Z test_ops.py::TestTagsCUDA::test_tags_fft_rfft2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2683971Z test_ops.py::TestTagsCUDA::test_tags_fft_rfft_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2684124Z test_ops.py::TestTagsCUDA::test_tags_fft_rfftn_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2684285Z test_ops.py::TestTagsCUDA::test_tags_fill_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2684450Z test_ops.py::TestTagsCUDA::test_tags_flatten_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2684637Z test_ops.py::TestTagsCUDA::test_tags_float_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2684792Z test_ops.py::TestTagsCUDA::test_tags_fmin_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2684953Z test_ops.py::TestTagsCUDA::test_tags_fmod_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2685117Z test_ops.py::TestTagsCUDA::test_tags_frexp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2685277Z test_ops.py::TestTagsCUDA::test_tags_full_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2685444Z test_ops.py::TestTagsCUDA::test_tags_full_like_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2685600Z test_ops.py::TestTagsCUDA::test_tags_gather_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2685756Z test_ops.py::TestTagsCUDA::test_tags_gcd_cuda_int64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2685917Z test_ops.py::TestTagsCUDA::test_tags_ge_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2686092Z test_ops.py::TestTagsCUDA::test_tags_grid_sampler_2d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2686255Z test_ops.py::TestTagsCUDA::test_tags_half_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2686415Z test_ops.py::TestTagsCUDA::test_tags_hsplit_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2686569Z test_ops.py::TestTagsCUDA::test_tags_i0_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2686735Z test_ops.py::TestTagsCUDA::test_tags_imag_cuda_complex64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2686892Z test_ops.py::TestTagsCUDA::test_tags_index_add_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2687057Z test_ops.py::TestTagsCUDA::test_tags_index_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2687222Z test_ops.py::TestTagsCUDA::test_tags_index_fill_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2687395Z 
test_ops.py::TestTagsCUDA::test_tags_index_reduce_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2687556Z test_ops.py::TestTagsCUDA::test_tags_int_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2687740Z test_ops.py::TestTagsCUDA::test_tags_isin_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2687904Z test_ops.py::TestTagsCUDA::test_tags_isreal_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2688071Z test_ops.py::TestTagsCUDA::test_tags_istft_cuda_complex64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2688257Z test_ops.py::TestTagsCUDA::test_tags_jiterator_2inputs_2outputs_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2688441Z test_ops.py::TestTagsCUDA::test_tags_jiterator_4inputs_with_extra_args_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2688618Z test_ops.py::TestTagsCUDA::test_tags_jiterator_binary_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2688808Z test_ops.py::TestTagsCUDA::test_tags_jiterator_binary_return_by_ref_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2688967Z test_ops.py::TestTagsCUDA::test_tags_kron_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2689135Z test_ops.py::TestTagsCUDA::test_tags_kthvalue_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2689292Z test_ops.py::TestTagsCUDA::test_tags_lcm_cuda_int64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2689454Z test_ops.py::TestTagsCUDA::test_tags_ldexp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2689618Z test_ops.py::TestTagsCUDA::test_tags_lgamma_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2689787Z test_ops.py::TestTagsCUDA::test_tags_linalg_cross_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2689949Z test_ops.py::TestTagsCUDA::test_tags_linalg_eigh_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2690145Z test_ops.py::TestTagsCUDA::test_tags_linalg_eigvals_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2690330Z test_ops.py::TestTagsCUDA::test_tags_linalg_householder_product_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2690508Z test_ops.py::TestTagsCUDA::test_tags_linalg_ldl_factor_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2690685Z test_ops.py::TestTagsCUDA::test_tags_linalg_ldl_factor_ex_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2690856Z test_ops.py::TestTagsCUDA::test_tags_linalg_lu_factor_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2691030Z test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2691210Z test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_power_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2691383Z test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_rank_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2691565Z test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_rank_hermitian_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2691745Z test_ops.py::TestTagsCUDA::test_tags_linalg_multi_dot_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2691910Z test_ops.py::TestTagsCUDA::test_tags_linalg_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2692080Z test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_cuda_float32 SKIPPED (Only runs on cpu) 
[ 98%] 2023-01-11T23:10:17.2692245Z test_ops.py::TestTagsCUDA::test_tags_linalg_qr_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2692411Z test_ops.py::TestTagsCUDA::test_tags_linalg_solve_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2692583Z test_ops.py::TestTagsCUDA::test_tags_linalg_solve_ex_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2692764Z test_ops.py::TestTagsCUDA::test_tags_linalg_solve_triangular_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2692942Z test_ops.py::TestTagsCUDA::test_tags_linalg_tensorinv_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2693128Z test_ops.py::TestTagsCUDA::test_tags_linalg_vander_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2693297Z test_ops.py::TestTagsCUDA::test_tags_linalg_vecdot_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2693462Z test_ops.py::TestTagsCUDA::test_tags_linspace_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2693621Z test_ops.py::TestTagsCUDA::test_tags_log1p_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2693780Z test_ops.py::TestTagsCUDA::test_tags_log2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2693937Z test_ops.py::TestTagsCUDA::test_tags_log_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2694102Z test_ops.py::TestTagsCUDA::test_tags_log_softmax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2694286Z test_ops.py::TestTagsCUDA::test_tags_log_softmax_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2694448Z test_ops.py::TestTagsCUDA::test_tags_logaddexp2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2694841Z test_ops.py::TestTagsCUDA::test_tags_logaddexp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2695013Z test_ops.py::TestTagsCUDA::test_tags_logcumsumexp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2695177Z test_ops.py::TestTagsCUDA::test_tags_logdet_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2695337Z test_ops.py::TestTagsCUDA::test_tags_logit_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2695503Z test_ops.py::TestTagsCUDA::test_tags_logspace_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2695670Z test_ops.py::TestTagsCUDA::test_tags_logsumexp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2695927Z test_ops.py::TestTagsCUDA::test_tags_mH_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2696092Z test_ops.py::TestTagsCUDA::test_tags_mT_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2696254Z test_ops.py::TestTagsCUDA::test_tags_masked_amax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2696423Z test_ops.py::TestTagsCUDA::test_tags_masked_argmin_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2696599Z test_ops.py::TestTagsCUDA::test_tags_masked_log_softmax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2696777Z test_ops.py::TestTagsCUDA::test_tags_masked_logaddexp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2696948Z test_ops.py::TestTagsCUDA::test_tags_masked_logsumexp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2697114Z test_ops.py::TestTagsCUDA::test_tags_masked_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2697281Z 
test_ops.py::TestTagsCUDA::test_tags_masked_median_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2697449Z test_ops.py::TestTagsCUDA::test_tags_masked_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2697619Z test_ops.py::TestTagsCUDA::test_tags_masked_normalize_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2697785Z test_ops.py::TestTagsCUDA::test_tags_masked_prod_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2697956Z test_ops.py::TestTagsCUDA::test_tags_masked_softmax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2698121Z test_ops.py::TestTagsCUDA::test_tags_masked_std_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2698284Z test_ops.py::TestTagsCUDA::test_tags_max_binary_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2698446Z test_ops.py::TestTagsCUDA::test_tags_maximum_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2698603Z test_ops.py::TestTagsCUDA::test_tags_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2698787Z test_ops.py::TestTagsCUDA::test_tags_meshgrid_list_of_tensors_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2698981Z test_ops.py::TestTagsCUDA::test_tags_min_binary_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2699152Z test_ops.py::TestTagsCUDA::test_tags_minimum_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2699314Z test_ops.py::TestTagsCUDA::test_tags_mm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2699478Z test_ops.py::TestTagsCUDA::test_tags_multinomial_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2699631Z test_ops.py::TestTagsCUDA::test_tags_mv_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2699812Z test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_1_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2699993Z test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_5_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2700161Z test_ops.py::TestTagsCUDA::test_tags_nan_to_num_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2700319Z test_ops.py::TestTagsCUDA::test_tags_nanmean_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2700491Z test_ops.py::TestTagsCUDA::test_tags_nanquantile_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2700646Z test_ops.py::TestTagsCUDA::test_tags_nansum_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2700810Z test_ops.py::TestTagsCUDA::test_tags_narrow_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2700988Z test_ops.py::TestTagsCUDA::test_tags_native_batch_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2701158Z test_ops.py::TestTagsCUDA::test_tags_native_layer_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2701313Z test_ops.py::TestTagsCUDA::test_tags_ne_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2701500Z test_ops.py::TestTagsCUDA::test_tags_new_full_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2701666Z test_ops.py::TestTagsCUDA::test_tags_nextafter_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2701860Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_avg_pool1d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2702046Z 
test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_avg_pool2d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2702222Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_avg_pool3d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2702403Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool1d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2702585Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool2d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2702766Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool3d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2702954Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_alpha_dropout_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2703141Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool1d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2703325Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool2d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2703504Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool3d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2703701Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_batch_norm_without_cudnn_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2703887Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_binary_cross_entropy_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2704064Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_celu_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2704247Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv2d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2704460Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv_transpose2d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2704648Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv_transpose3d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2704842Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_cosine_embedding_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2705033Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_cosine_similarity_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2705215Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_cross_entropy_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2705392Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_ctc_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2705570Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout2d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2705752Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2705933Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_embedding_bag_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2706115Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_embedding_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2706308Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_fractional_max_pool2d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2706502Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_fractional_max_pool3d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 
2023-01-11T23:10:17.2706688Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_gaussian_nll_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2706891Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_gelu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2707075Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_group_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2707262Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardshrink_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2707441Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardsigmoid_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2707625Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardswish_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2707807Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardtanh_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2708000Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_hinge_embedding_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2708183Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_huber_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2708377Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_area_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2708570Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_bicubic_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2708828Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_bilinear_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2709024Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_linear_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2709211Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_nearest_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2709405Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_trilinear_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2709581Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_l1_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2709765Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_linear_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2709987Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_logsigmoid_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2710178Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_margin_ranking_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2710355Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_pool1d_cuda_float32 SKIPPED (Skipped!) 
[ 99%] 2023-01-11T23:10:17.2710535Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_pool3d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2710721Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool1d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2710900Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool2d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2711092Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool2d_grad_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2711277Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool3d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2711471Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool3d_grad_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2711655Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_multi_margin_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2711852Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_multilabel_margin_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2712032Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_nll_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2712214Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_normalize_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2712426Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_one_hot_cuda_int64 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2712609Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_constant_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2712790Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_reflect_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2712967Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_pdist_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2713153Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_pixel_shuffle_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2713339Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_pixel_unshuffle_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2713518Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_prelu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2713694Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_relu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2713876Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_silu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2714062Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_smooth_l1_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2714250Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_soft_margin_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2714425Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_softplus_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2714606Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_softshrink_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2714790Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_tanhshrink_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2714972Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_threshold_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2715166Z 
test_ops.py::TestTagsCUDA::test_tags_nn_functional_triplet_margin_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2715396Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_triplet_margin_with_distance_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2715588Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_upsample_bilinear_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2715776Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_upsample_nearest_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2715937Z test_ops.py::TestTagsCUDA::test_tags_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2716093Z test_ops.py::TestTagsCUDA::test_tags_norm_fro_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2716256Z test_ops.py::TestTagsCUDA::test_tags_norm_nuc_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2716419Z test_ops.py::TestTagsCUDA::test_tags_normal_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2716582Z test_ops.py::TestTagsCUDA::test_tags_ones_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2716750Z test_ops.py::TestTagsCUDA::test_tags_ones_like_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2716925Z test_ops.py::TestTagsCUDA::test_tags_ops_nvprims_view_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2717091Z test_ops.py::TestTagsCUDA::test_tags_pinverse_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2717255Z test_ops.py::TestTagsCUDA::test_tags_polar_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2717440Z test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2717615Z test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_4_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2717780Z test_ops.py::TestTagsCUDA::test_tags_positive_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2718006Z test_ops.py::TestTagsCUDA::test_tags_pow_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2718167Z test_ops.py::TestTagsCUDA::test_tags_prod_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2718329Z test_ops.py::TestTagsCUDA::test_tags_qr_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2718492Z test_ops.py::TestTagsCUDA::test_tags_quantile_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2718656Z test_ops.py::TestTagsCUDA::test_tags_rand_like_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2718827Z test_ops.py::TestTagsCUDA::test_tags_randint_like_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2718986Z test_ops.py::TestTagsCUDA::test_tags_randn_like_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2719147Z test_ops.py::TestTagsCUDA::test_tags_ravel_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2719320Z test_ops.py::TestTagsCUDA::test_tags_resolve_conj_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2719485Z test_ops.py::TestTagsCUDA::test_tags_resolve_neg_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2719648Z test_ops.py::TestTagsCUDA::test_tags_rot90_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2719806Z test_ops.py::TestTagsCUDA::test_tags_round_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2719981Z 
test_ops.py::TestTagsCUDA::test_tags_round_decimals_3_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2720158Z test_ops.py::TestTagsCUDA::test_tags_round_decimals_neg_3_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2720319Z test_ops.py::TestTagsCUDA::test_tags_rsqrt_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2720472Z test_ops.py::TestTagsCUDA::test_tags_rsub_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2720669Z test_ops.py::TestTagsCUDA::test_tags_scalar_tensor_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2720860Z test_ops.py::TestTagsCUDA::test_tags_scatter_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2721060Z test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_amax_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2721242Z test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_amin_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2721421Z test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2721598Z test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_sum_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2721768Z test_ops.py::TestTagsCUDA::test_tags_searchsorted_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2721950Z test_ops.py::TestTagsCUDA::test_tags_segment_reduce_lengths_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2722110Z test_ops.py::TestTagsCUDA::test_tags_select_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2722271Z test_ops.py::TestTagsCUDA::test_tags_sgn_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2722436Z test_ops.py::TestTagsCUDA::test_tags_sigmoid_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2722618Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_bartlett_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2722799Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_gaussian_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2722986Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_general_cosine_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2723168Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_hamming_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2723351Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_kaiser_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2723552Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_nuttall_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2723711Z test_ops.py::TestTagsCUDA::test_tags_sin_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2723874Z test_ops.py::TestTagsCUDA::test_tags_slice_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2724034Z test_ops.py::TestTagsCUDA::test_tags_softmax_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2724215Z test_ops.py::TestTagsCUDA::test_tags_softmax_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2724392Z test_ops.py::TestTagsCUDA::test_tags_sparse_sampled_addmm_cuda_float32 SKIPPED (Skipped!) [ 99%]
2023-01-11T23:10:17.2724566Z test_ops.py::TestTagsCUDA::test_tags_special_bessel_j0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2724737Z test_ops.py::TestTagsCUDA::test_tags_special_bessel_y0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2724911Z test_ops.py::TestTagsCUDA::test_tags_special_bessel_y1_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2725094Z test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_t_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2725280Z test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_u_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2725637Z test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 99%]
2023-01-11T23:10:17.2725808Z test_ops.py::TestTagsCUDA::test_tags_special_entr_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2725979Z test_ops.py::TestTagsCUDA::test_tags_special_erfcx_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2726149Z test_ops.py::TestTagsCUDA::test_tags_special_i0e_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2726316Z test_ops.py::TestTagsCUDA::test_tags_special_i1_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2726483Z test_ops.py::TestTagsCUDA::test_tags_special_i1e_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2726697Z test_ops.py::TestTagsCUDA::test_tags_special_laguerre_polynomial_l_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2726869Z test_ops.py::TestTagsCUDA::test_tags_special_log_ndtr_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2727055Z test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_i0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2727238Z test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_i1_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2727429Z test_ops.py::TestTagsCUDA::test_tags_special_scaled_modified_bessel_k1_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2727779Z test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 99%]
2023-01-11T23:10:17.2728123Z test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 99%]
2023-01-11T23:10:17.2728302Z test_ops.py::TestTagsCUDA::test_tags_special_xlog1py_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2728468Z test_ops.py::TestTagsCUDA::test_tags_special_zeta_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2728640Z test_ops.py::TestTagsCUDA::test_tags_split_list_args_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2728806Z test_ops.py::TestTagsCUDA::test_tags_square_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2728962Z test_ops.py::TestTagsCUDA::test_tags_std_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2729135Z test_ops.py::TestTagsCUDA::test_tags_std_mean_unbiased_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2729329Z test_ops.py::TestTagsCUDA::test_tags_std_unbiased_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2729490Z test_ops.py::TestTagsCUDA::test_tags_stft_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2729662Z test_ops.py::TestTagsCUDA::test_tags_sum_to_size_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2729819Z test_ops.py::TestTagsCUDA::test_tags_svd_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2729981Z test_ops.py::TestTagsCUDA::test_tags_symeig_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2730140Z test_ops.py::TestTagsCUDA::test_tags_take_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2730287Z test_ops.py::TestTagsCUDA::test_tags_tan_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2730454Z test_ops.py::TestTagsCUDA::test_tags_tanh_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2730651Z test_ops.py::TestTagsCUDA::test_tags_tensor_split_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2730829Z test_ops.py::TestTagsCUDA::test_tags_to_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2731004Z test_ops.py::TestTagsCUDA::test_tags_triangular_solve_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2731166Z test_ops.py::TestTagsCUDA::test_tags_trunc_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2731328Z test_ops.py::TestTagsCUDA::test_tags_unbind_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2731489Z test_ops.py::TestTagsCUDA::test_tags_unique_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2731655Z test_ops.py::TestTagsCUDA::test_tags_unsqueeze_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2731821Z test_ops.py::TestTagsCUDA::test_tags_var_mean_unbiased_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2731992Z test_ops.py::TestTagsCUDA::test_tags_var_unbiased_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2732161Z test_ops.py::TestTagsCUDA::test_tags_vsplit_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2732325Z test_ops.py::TestTagsCUDA::test_tags_vstack_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2732505Z test_ops.py::TestTagsCUDA::test_tags_zero__cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2732668Z test_ops.py::TestTagsCUDA::test_tags_zeros_cuda_float32 SKIPPED (Only runs on cpu) [100%]
2023-01-11T23:10:17.2732675Z 
2023-01-11T23:10:17.2732798Z =============================== warnings summary ===============================
2023-01-11T23:10:17.2733029Z ../../../../../opt/conda/lib/python3.10/site-packages/_pytest/config/__init__.py:1171
2023-01-11T23:10:17.2733392Z /opt/conda/lib/python3.10/site-packages/_pytest/config/__init__.py:1171: PytestAssertRewriteWarning: Module already imported so cannot be rewritten: hypothesis
2023-01-11T23:10:17.2733490Z self._mark_plugins_for_rewrite(hook)
2023-01-11T23:10:17.2733499Z 
2023-01-11T23:10:17.2733729Z -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
2023-01-11T23:10:17.2734025Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-8308d40cbcb1066e.xml -
2023-01-11T23:10:17.2734179Z = 12240 passed, 3207 skipped, 25 deselected, 225 xfailed, 1 warning in 4589.34s (1:16:29) =
2023-01-11T23:10:17.2734358Z If in CI, skip info is located in the xml test reports, please either go to s3 or the hud to download them
2023-01-11T23:10:17.2734364Z 
2023-01-11T23:10:17.2734834Z ##[endgroup]
2023-01-11T23:10:17.2735104Z FINISHED PRINTING LOG FILE of test_ops (/var/lib/jenkins/workspace/test/test-reports/test_ops_xj2vdchj)
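The run summary above notes that skip details live in the generated junit-xml report. A minimal sketch for tallying skip reasons offline, once the report has been downloaded (Python stdlib only; the report path is copied from the log line above, and reading the reason from the "message" attribute of the <skipped> element is an assumption based on the junit format pytest emits):

    import xml.etree.ElementTree as ET
    from collections import Counter

    # Hypothetical local copy of the report named in the log above.
    report = "test-reports/python-pytest/test_ops/test_ops-8308d40cbcb1066e.xml"
    reasons = Counter(
        skipped.get("message", "<no message>")           # skip reason, if recorded
        for case in ET.parse(report).getroot().iter("testcase")
        for skipped in case.iter("skipped")               # only skipped test cases
    )
    for reason, count in reasons.most_common(10):
        print(f"{count:5d}  {reason}")

The session header that follows reports hypothesis profile 'pytorch_ci' with database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]. A sketch of how such a profile is typically registered with the hypothesis settings API (where exactly the PyTorch test harness does this is not shown in this log):

    from hypothesis import HealthCheck, settings

    # Register and activate a CI profile matching the values printed below.
    settings.register_profile(
        "pytorch_ci",
        database=None,                                   # no example database on CI
        max_examples=50,                                 # cap examples per property
        derandomize=True,                                 # deterministic example generation
        suppress_health_check=[HealthCheck.too_slow],     # tolerate slow properties
    )
    settings.load_profile("pytorch_ci")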
2023-01-11T23:10:17.2735109Z 
2023-01-11T23:13:46.7161921Z 
2023-01-11T23:13:46.7162736Z Expand the folded group to see the log file of test_ops
2023-01-11T23:13:46.7163497Z ##[group]PRINTING LOG FILE of test_ops (/var/lib/jenkins/workspace/test/test-reports/test_ops_oa0bw8mk)
2023-01-11T23:13:46.7178787Z Test results will be stored in test-reports/python-pytest/test_ops/test_ops-621a128d9f5db79e.xml
2023-01-11T23:13:46.7179139Z ============================= test session starts ==============================
2023-01-11T23:13:46.7179536Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python
2023-01-11T23:13:46.7179816Z cachedir: .pytest_cache
2023-01-11T23:13:46.7180238Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2023-01-11T23:13:46.7182221Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini
2023-01-11T23:13:46.7183013Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0
2023-01-11T23:13:46.7183494Z collecting ... collected 30861 items / 17 deselected / 30844 selected
2023-01-11T23:13:46.8672140Z Running 15147 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rand___cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___ror___cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rpow___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_dsplit_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_view_as_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__softmax_backward_data_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_argsort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_dist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_det_singular_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eigh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lstsq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_matrix_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional__scaled_dot_product_attention_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softmin_with_dtype_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ops_nvprims_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ops_nvprims_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ormqr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_pinverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_rand_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_repeat_interleave_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_resize__cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_resize_as__cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_stack_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_topk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unique_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_H_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_acos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_partial_views_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_chalf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_column_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diag_embed_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diagonal_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_div_no_rounding_mode_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_dsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_eq_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fft_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_hfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_irfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_irfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_hstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_put_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isfinite_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mT_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_masked_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_movedim_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mul_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_narrow_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_narrow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_empty_strided_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv_transpose2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv_transpose3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nonzero_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_ones_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_permute_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_positive_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_pow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_rand_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_randn_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_real_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_reshape_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_reshape_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_resolve_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_scalar_tensor_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sigmoid_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_slice_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_split_with_sizes_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_squeeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sub_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_tan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_trace_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_true_divide_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unfold_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unsqueeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_view_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_vsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_where_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes_H_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_T_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rdiv___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rmatmul___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rpow___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rxor___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_bfloat16_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_bool_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_byte_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_char_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_complex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_int_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_long_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_abs_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_addcmul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_any_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_as_strided_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_as_strided_partial_views_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_as_strided_scatter_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_asin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_to_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ceil_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_clamp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_clone_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_column_stack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cosh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_diag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_diagonal_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_digamma_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_empty_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_eq_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_erf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_erfinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_exp2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_exp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_eye_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_hfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_flip_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ge_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_imag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_svdvals_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log10_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log1p_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_lt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_maximum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_meshgrid_list_of_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_meshgrid_variadic_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_ones_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_glu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hardshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_mish_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_pdist_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_relu6_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_selu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softmin_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_tanhshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_threshold_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_permute_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_positive_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_pow_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_prod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_randn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_real_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_reciprocal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_round_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_rsub_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sgn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sinc_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_5_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_square_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_std_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_t_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tensor_split_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_trace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tril_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tril_indices_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_triu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unsqueeze_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_var_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__softmax_backward_data_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_abs_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_acos_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addbmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addmm_decomposed_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_allclose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_angle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argwhere_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_asin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_baddbmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bincount_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_block_diag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cholesky_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cholesky_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_chunk_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clamp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_column_stack_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_complex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_conj_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_constant_pad_nd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_corrcoef_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cos_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cummin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_deg2rad_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diagflat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diagonal_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diff_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_dist_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_double_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_dstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_einsum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_empty_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_eq_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_erfc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_erfinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_exp2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_exp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_expand_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_expm1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_eye_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_fftshift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_hfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_flip_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_float_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_floor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_frexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_gather_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ge_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_half_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_histc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_histogramdd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_hsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_igamma_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_imag_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_index_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_put_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_inner_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_istft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_2inputs_2outputs_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_4inputs_with_extra_args_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_le_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_cond_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_det_singular_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigvals_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_householder_product_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_inv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_inv_ex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_factor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lstsq_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_multi_dot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_norm_subgradients_at_zero_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_pinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_pinv_hermitian_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_pinv_singular_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_qr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_slogdet_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_ex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorsolve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log10_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log1p_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lu_unpack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mH_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mT_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_argmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_cumprod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_prod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_select_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_matmul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_max_binary_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_max_pool2d_with_indices_backward_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_max_reduction_with_dim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_maximum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_median_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_meshgrid_list_of_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_min_reduction_with_dim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mode_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_multinomial_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nan_to_num_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nanmedian_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nanquantile_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nansum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ne_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_neg_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_new_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional__scaled_dot_product_attention_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_batch_norm_without_cudnn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cosine_similarity_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_elu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_embedding_bag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_feature_alpha_dropout_with_train_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_feature_alpha_dropout_without_train_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_fractional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_grid_sample_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_group_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardsigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardswish_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_nearest_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_trilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_kl_div_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_linear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_local_response_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_logsigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool1d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_multi_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_multilabel_soft_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_one_hot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_constant_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_replicate_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pdist_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pixel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pixel_unshuffle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_relu6_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_relu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_rrelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_selu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_silu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_soft_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softmin_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softsign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_tanhshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_triplet_margin_with_distance_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_upsample_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_upsample_nearest_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_norm_fro_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_norm_nuc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_normal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ones_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ops_nvprims_native_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ormqr_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_outer_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_permute_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polar_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_positive_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_pow_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_prod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_put_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_randint_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_randn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_randn_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ravel_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_renorm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_repeat_interleave_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_resize_as__cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_round_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_round_decimals_0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_round_decimals_neg_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_rsqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_rsub_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_segment_reduce_offsets_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_select_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sgn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_blackman_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_gaussian_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_kaiser_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_nuttall_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signbit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_slice_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sparse_sampled_addmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_airy_ai_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_y0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_y1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_v_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_entr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_hermite_polynomial_h_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_hermite_polynomial_he_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_i0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_k0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_k1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_ndtri_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_polygamma_special_polygamma_n_0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_scaled_modified_bessel_k0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_spherical_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_split_list_args_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_split_with_sizes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_squeeze_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_stack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_std_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_std_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_stft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_svd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_t_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_take_along_dim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tile_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_to_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_topk_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_trace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_trapezoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_trapz_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tril_indices_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_triu_indices_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_trunc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unflatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unique_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_var_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_var_mean_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_var_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_vdot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_view_as_complex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_view_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_view_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_errors_T_cuda, test/test_ops.py::TestCommonCUDA::test_errors___rand___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rmod___cuda, test/test_ops.py::TestCommonCUDA::test_errors___ror___cuda, test/test_ops.py::TestCommonCUDA::test_errors_add_cuda, test/test_ops.py::TestCommonCUDA::test_errors_aminmax_cuda, test/test_ops.py::TestCommonCUDA::test_errors_arange_cuda, test/test_ops.py::TestCommonCUDA::test_errors_as_strided_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_errors_cat_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_errors_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_errors_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_errors_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_errors_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_errors_eye_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_hfft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_errors_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_errors_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_errors_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_errors_ge_cuda, test/test_ops.py::TestCommonCUDA::test_errors_gt_cuda, test/test_ops.py::TestCommonCUDA::test_errors_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_errors_igamma_cuda, test/test_ops.py::TestCommonCUDA::test_errors_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_errors_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_errors_jiterator_binary_return_by_ref_cuda, test/test_ops.py::TestCommonCUDA::test_errors_kthvalue_cuda, test/test_ops.py::TestCommonCUDA::test_errors_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_errors_linalg_lstsq_cuda, test/test_ops.py::TestCommonCUDA::test_errors_linalg_lstsq_grad_oriented_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_errors_lt_cuda, test/test_ops.py::TestCommonCUDA::test_errors_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_errors_maximum_cuda, test/test_ops.py::TestCommonCUDA::test_errors_multinomial_cuda, test/test_ops.py::TestCommonCUDA::test_errors_neg_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_conv1d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_conv2d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_gaussian_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_rrelu_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_soft_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_polar_cuda, test/test_ops.py::TestCommonCUDA::test_errors_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_errors_rsub_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_bartlett_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_blackman_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_exponential_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_hann_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_v_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_hermite_polynomial_he_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_laguerre_polynomial_l_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_v_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_errors_take_cuda, test/test_ops.py::TestCommonCUDA::test_errors_trace_cuda, test/test_ops.py::TestCommonCUDA::test_errors_triu_cuda, test/test_ops.py::TestCommonCUDA::test_errors_uniform_cuda, test/test_ops.py::TestCommonCUDA::test_errors_view_cuda, test/test_ops.py::TestCommonCUDA::test_errors_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_errors_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_errors_where_cuda, test/test_ops.py::TestCommonCUDA::test_multiple_devices_H_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___getitem___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rand___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmod___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___ror___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rpow___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rpow___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rsub___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rxor___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argsort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argsort_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_argwhere_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bincount_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_byte_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cartesian_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cdouble_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_chalf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_column_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_contiguous_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_contiguous_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_count_nonzero_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cov_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cummin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumulative_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diag_embed_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagflat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagflat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diff_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diff_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_floor_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_eye_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfftn_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_full_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gather_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ge_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_geqrf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gradient_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_copy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_put_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isclose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_2inputs_2outputs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_unary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_kron_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ldexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lstsq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_solve_triangular_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vander_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mH_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mT_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mT_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_prod_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_std_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_matrix_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_with_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_median_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_list_of_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_no_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_with_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_movedim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_msort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_native_dropout_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nextafter_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_embedding_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_margin_ranking_loss_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_circular_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_constant_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_unshuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_poisson_nll_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ones_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_outer_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_permute_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_positive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_randint_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_randint_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reciprocal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_interleave_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resize__cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_mean_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_searchsorted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_searchsorted_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_select_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sgn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signbit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_u_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_v_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i0e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_laguerre_polynomial_l_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_legendre_polynomial_p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i0_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_list_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_with_sizes_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_squeeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_to_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_to_sparse_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_topk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_topk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unflatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_vstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zero__cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_H_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_T_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___radd___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rdiv___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___ror___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_acosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_all_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_amax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_any_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_as_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_asinh_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atleast_1d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atleast_3d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_broadcast_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cartesian_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_chalf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_char_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_clamp_max_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cummax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cummin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diag_embed_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagflat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagonal_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_dstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_exp2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expand_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_rfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_rfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_flatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_flipud_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fmax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ge_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_half_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_hsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_put_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isneginf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isposinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_2inputs_2outputs_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_4inputs_with_extra_args_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_binary_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ldexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_le_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log10_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mH_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mT_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_max_reduction_with_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_maximum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_minimum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_movedim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mul_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_narrow_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_ones_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pad_circular_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pixel_shuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_softsign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_3_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_4_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_rad2deg_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_real_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_reciprocal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resize_as__cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scalar_tensor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_reduce_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_signbit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sinc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_slice_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_airy_ai_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_j1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_u_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_erfcx_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_log_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_k0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_scaled_modified_bessel_k1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_xlog1py_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_zeta_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_list_args_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_with_sizes_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_square_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_squeeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sum_to_size_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_take_along_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_take_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tile_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_to_sparse_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_transpose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tril_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unflatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unique_consecutive_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unique_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unsqueeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_vstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_where_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_H_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_H_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rand___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rdiv___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rdiv___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rdiv___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmatmul___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmul___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmul___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rpow___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rsub___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rsub___cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__softmax_backward_data_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_allclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_aminmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argsort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argsort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_2d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bucketize_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cartesian_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_char_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_combinations_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_copysign_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_count_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_deg2rad_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagflat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagflat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagflat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diff_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diff_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_floor_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dot_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gather_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gather_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_geqrf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_geqrf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gradient_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isnan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_4inputs_with_extra_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_unary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kthvalue_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ldexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_grad_oriented_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_singular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_qr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_slogdet_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log1p_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logdet_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mT_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mT_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mT_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mean_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_with_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_movedim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nan_to_num_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_embedding_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_without_train_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_linear_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_normalize_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_reflect_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_unshuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_inf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_nuc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ormqr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pinverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pinverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_2_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_positive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_qr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rand_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reciprocal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reciprocal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_renorm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_interleave_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize__cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize__cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_mean_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_mean_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_searchsorted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sparse_sampled_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_u_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_h_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_h_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i0e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_legendre_polynomial_p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensor_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tile_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tile_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_topk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_topk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapz_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapz_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_copy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_consecutive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_consecutive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_xlogy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_addbmm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_aminmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_clamp_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_numpy_ref_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_flatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_flatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorsolve_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorsolve_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vander_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vecdot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vecdot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose2d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose3d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_gelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_ravel_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_bartlett_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_gaussian_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_hann_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_kaiser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_nuttall_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tile_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_view_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_where_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_where_cuda_float64, test/test_ops.py::TestCommonCUDA::test_out_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___getitem___cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___rpow___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ihfft2_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_nll_loss_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_multigammaln_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_where_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__softmax_backward_data_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_argsort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bincount_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_double_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gradient_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_jiterator_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_eigh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lstsq_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_vander_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mH_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mT_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_matrix_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nanmean_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_native_dropout_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool1d_grad_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ormqr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_pinverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_rand_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_reshape_as_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_resolve_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_bessel_y1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_split_list_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_std_mean_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_symeig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_as_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning_H_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_T_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___getitem___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___radd___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rmod___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rmul___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rpow___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rsub___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rxor___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_bool_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_char_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_half_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_acos_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_acosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_addcmul_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_as_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_as_strided_partial_views_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_asin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_asinh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atanh_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ceil_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_chunk_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_clone_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_conj_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_cos_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_cosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_digamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_erfc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_erfinv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_exp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_expm1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_floor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_frac_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_i0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_imag_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isnan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_lerp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_svd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_svdvals_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_vector_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log10_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log1p_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_maximum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_mul_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nan_to_num_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ne_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_full_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_group_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hardtanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_mish_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_pairwise_distance_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_pdist_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_relu6_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_selu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softmin_with_dtype_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softplus_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_tanhshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_permute_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_pow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ravel_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_reciprocal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_roll_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_rsub_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sinc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sinh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_i0e_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_multigammaln_mvlgamma_p_1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_multigammaln_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_spherical_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_squeeze_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_std_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_to_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tril_indices_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_trunc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unflatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unsqueeze_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_var_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_view_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_abs_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addbmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addcmul_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_all_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_angle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_any_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_arange_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_argwhere_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_as_strided_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bernoulli_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bincount_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_block_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bool_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_byte_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cdist_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ceil_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_chunk_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_clone_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_conj_physical_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_constant_pad_nd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_corrcoef_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_count_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cov_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cummin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cumprod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diagflat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diagonal_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diff_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_double_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_empty_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_eq_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_equal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_erf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_erfinv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_exp2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_eye_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fftshift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_hfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ifft_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flip_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_float_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_floor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_frac_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_frexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_full_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gradient_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_half_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_histc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_histogram_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_hsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_i0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_put_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isinf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isnan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isneginf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isreal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_jiterator_4inputs_with_extra_args_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_jiterator_binary_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_jiterator_binary_return_by_ref_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_le_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_cross_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_eig_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_householder_product_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_inv_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lstsq_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_factor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_power_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_rank_hermitian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_multi_dot_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_norm_subgradients_at_zero_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_qr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_slogdet_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_triangular_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_tensorsolve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_vecdot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_vector_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log10_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logaddexp2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lu_unpack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mH_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mT_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_argmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_cumprod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_log_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_median_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_prod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_softmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_var_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_matmul_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_max_binary_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_median_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_meshgrid_list_of_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_meshgrid_variadic_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_min_reduction_with_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mode_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_msort_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mul_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mv_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_mvlgamma_mvlgamma_p_1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mvlgamma_mvlgamma_p_5_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nan_to_num_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nanmean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_native_dropout_backward_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_neg_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_new_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_batch_norm_without_cudnn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_binary_cross_entropy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_binary_cross_entropy_with_logits_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv_transpose1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_cosine_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_cross_entropy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_ctc_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_embedding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_feature_alpha_dropout_without_train_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_fractional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_gaussian_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_group_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardsigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardtanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_instance_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_area_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_bicubic_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_linear_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_trilinear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_kl_div_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_leaky_relu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool3d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_mish_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multi_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multilabel_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multilabel_soft_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_circular_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_constant_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_reflect_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_replicate_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pdist_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pixel_unshuffle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_relu6_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_relu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softmin_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softsign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_tanhshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_triplet_margin_with_distance_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_upsample_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_norm_inf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ops_nvprims_native_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ops_nvprims_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ops_nvprims_view_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ormqr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_pca_lowrank_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_pinverse_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polar_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_pow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_put_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_qr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_quantile_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_randn_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ravel_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_real_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_repeat_interleave_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_resize__cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_resolve_neg_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_round_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_round_decimals_0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rsub_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_prod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_sum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_segment_reduce_lengths_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_select_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_bartlett_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_hann_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_kaiser_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_nuttall_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sinc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_slice_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sort_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sparse_sampled_addmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_airy_ai_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_v_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_entr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_hermite_polynomial_he_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_laguerre_polynomial_l_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_k0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_scaled_modified_bessel_k0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_v_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_split_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_split_list_args_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_split_with_sizes_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_square_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_std_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_std_mean_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sub_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_svd_lowrank_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_t_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_take_along_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tensor_split_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_to_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_to_sparse_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_trapz_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tril_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unsqueeze_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_var_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_var_mean_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_as_real_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_zero__cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_zeros_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_igamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_igammac_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_T_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs__conversions_complex_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_amax_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_amin_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_as_strided_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_cat_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diag_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_eye_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfftn_cuda, 
test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hsplit_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_le_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_maximum_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_ne_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_group_norm_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_pow_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_roll_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_rsub_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_t_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_trace_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_tril_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_view_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_where_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors_ops_nvprims_view_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_shapes_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igammac_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pdist_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_native_batch_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_var_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_var_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igammac_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nextafter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pdist_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_native_batch_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_var_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_H_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rdiv___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmatmul___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmul___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rpow___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rpow___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__softmax_backward_data_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argsort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argwhere_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argwhere_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_1d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_baddbmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cartesian_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_char_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_inverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_combinations_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_count_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cov_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cummax_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumulative_trapezoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagflat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diff_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_einsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_eye_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftshift_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flip_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flipud_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gather_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_geqrf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gradient_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_put_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_inner_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isclose_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_unary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_unary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cond_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_singular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvalsh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_grad_oriented_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_solve_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_singular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_slogdet_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_triangular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vander_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_xor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_unpack_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mT_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matrix_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mv_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_dropout_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_glu_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_linear_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_normalize_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_circular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_constant_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_reflect_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_replicate_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_silu_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_fro_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_inf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_outer_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pinverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pinverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_qr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rand_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randint_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_real_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_renorm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize_as__cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_searchsorted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_short_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_hann_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sparse_sampled_addmm_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_airy_ai_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_list_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_symeig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_t_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapezoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapz_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triangular_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_uniform_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unique_consecutive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unique_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zero__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_like_cuda_complex64, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___getitem___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rdiv___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rmod___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rpow___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addcmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addmm_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atan2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_bfloat16_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_block_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cdouble_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_chalf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_inverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_max_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_clone_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_copysign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_corrcoef_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cov_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cummax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diag_embed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diff_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_digamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_div_floor_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_einsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_erf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expm1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfftn_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ihfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_irfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_irfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_rfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_flip_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fliplr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_float_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_floor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_frac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_frexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_gradient_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_hsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_ldexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lgamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eigvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_inv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lstsq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_svdvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_vander_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log_softmax_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logaddexp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lu_unpack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_maximum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_min_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nan_to_num_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nanmedian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_narrow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_native_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_max_pool2d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_soft_margin_loss_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_fro_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_nuc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_normal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_permute_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_pinverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_pow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_ravel_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_reciprocal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_remainder_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_renorm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_repeat_interleave_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_resolve_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_resolve_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_rot90_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_rsqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_select_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_entr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_ndtri_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_split_list_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_split_with_sizes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_std_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_svd_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_symeig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_to_sparse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_trace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_trapz_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_triangular_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tril_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unbind_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unfold_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_var_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_var_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_vdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_view_as_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_vsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_vstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_where_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_xlogy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_H_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_T_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___radd___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rmatmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rsub___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_acosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addcmul_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_decomposed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_all_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_aminmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_angle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_baddbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bfloat16_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_block_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bool_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_broadcast_shapes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_broadcast_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_broadcast_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bucketize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_byte_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cdouble_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ceil_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_chalf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_char_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_inverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_max_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_combinations_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_contiguous_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_copysign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_corrcoef_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_count_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cov_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cummax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cummin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diag_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagflat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagonal_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagonal_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_einsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_eq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_erf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_erfinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_exp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expand_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_eye_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_hfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ihfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ihfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_irfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_irfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_rfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fliplr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flipud_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_floor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ge_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_half_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_heaviside_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hypot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_igamma_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_igammac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_reduce_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_inner_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isneginf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isreal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_kron_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ldexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_le_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lerp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_det_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eigh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eigvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_inv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lstsq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_pinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_svdvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_vecdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log1p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_or_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_long_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mH_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mT_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_multinomial_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nanquantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nansum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_narrow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_full_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_zeros_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nextafter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_elu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool3d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_normal_number_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ones_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_outer_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_pca_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_pinverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polar_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_pow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_quantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rad2deg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rand_like_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ravel_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_real_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_remainder_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_renorm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_repeat_interleave_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_reshape_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resolve_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resolve_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rot90_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scalar_tensor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_searchsorted_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_select_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sgn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_hann_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_slice_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_j1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_y0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_t_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_entr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_ndtri_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_zeta_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_split_list_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_squeeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_topk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trapz_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tril_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trunc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unflatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_uniform_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unique_consecutive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unique_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unsqueeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vdot_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_where_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_xlogy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_zeros_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_zeros_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___getitem___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rmatmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rmod___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_acosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addcdiv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_all_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_aminmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_any_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_arange_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atan2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_bernoulli_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ceil_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cfloat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_chalf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_inverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_clone_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_conj_physical_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_contiguous_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_corrcoef_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cummax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cummin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diag_embed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_div_floor_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_double_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_dsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_dstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_eq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_erfc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_erfinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_exp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_expand_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_eye_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ihfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ihfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_flipud_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_frexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ge_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_gradient_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_heaviside_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_hsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_hypot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_igamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_igammac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_reduce_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_inner_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_int_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isfinite_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isinf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isposinf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isreal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_unary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_kron_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_kthvalue_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lerp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lgamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_det_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_det_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_vecdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log10_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log_softmax_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logaddexp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_not_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_xor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mH_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_min_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_multinomial_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nan_to_num_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_empty_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_avg_pool3d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool3d_grad_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_fro_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_nuc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_normal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ones_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_pca_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_permute_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_pinverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_quantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rad2deg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_randint_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_randn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ravel_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_real_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_reciprocal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_renorm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_repeat_interleave_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_reshape_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_reshape_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_resolve_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_round_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_searchsorted_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_select_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signbit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_slice_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_j1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_y0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_y1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_entr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_log_ndtr_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_zeta_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_split_list_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_split_with_sizes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_squeeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_svd_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_to_sparse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_trace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_trapz_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tril_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_uniform_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_var_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_var_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_view_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_vsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_vstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_xlogy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zero__cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view_T_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___getitem___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___radd___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___rdiv___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___rmatmul___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___rmul___cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view___rpow___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_bool_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_byte_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_cfloat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_int_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_acos_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_addcdiv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_addcmul_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_addr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_all_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_any_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_as_strided_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_asin_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_asinh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atleast_1d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_broadcast_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_contiguous_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diag_embed_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_empty_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_empty_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_eq_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_exp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_expand_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_eye_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fftshift_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifftshift_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_flatten_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_flip_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fliplr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_flipud_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_hstack_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_select_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isclose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isfinite_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isinf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isnan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_svd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_not_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_xor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_mul_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_narrow_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_narrow_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ne_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_full_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_ones_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_permute_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_pow_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_prod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_randn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rsub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sigmoid_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_special_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sqrt_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_square_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_squeeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_std_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_std_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sum_to_size_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tanh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tensor_split_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tril_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_triu_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unflatten_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_var_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_view_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_vstack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_where_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_acosh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addcmul_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addmm_decomposed_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_all_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_angle_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_argwhere_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_asinh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_baddbmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_block_diag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_broadcast_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cartesian_prod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cdouble_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cholesky_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cholesky_inverse_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_clone_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_column_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_combinations_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_conj_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_contiguous_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_corrcoef_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cos_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cross_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cumprod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diag_embed_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diagflat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diagonal_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diagonal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diagonal_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dist_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_eq_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_exp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_expand_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fft_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifftshift_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_irfft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_flip_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_flipud_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_float_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_full_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_gather_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_geqrf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_gradient_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_hstack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_index_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_index_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_index_select_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_inner_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_int_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_isclose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_isfinite_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_binary_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ldexp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cross_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_det_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_det_singular_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigvalsh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_householder_product_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_inv_ex_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_ldl_factor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_factor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_factor_ex_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_rank_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_hermitian_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_singular_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_qr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_slogdet_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_svdvals_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_vector_norm_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_linspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log10_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_lu_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mT_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_cumprod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_cumsum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_prod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_std_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_sum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_var_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_matrix_exp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_movedim_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_narrow_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_narrow_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_neg_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_new_empty_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_new_full_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_new_zeros_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv1d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv2d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose2d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose3d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_linear_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_normalize_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_constant_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_replicate_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_softsign_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_norm_fro_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_norm_nuc_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ones_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ops_nvprims_view_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_outer_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_permute_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_positive_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_pow_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_qr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_real_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_reciprocal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_repeat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_repeat_interleave_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_reshape_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_reshape_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_resize__cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_resolve_neg_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_rot90_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_rsqrt_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_scalar_tensor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_scatter_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_select_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sgn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_short_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sinc_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_slice_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_split_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_split_list_args_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_split_with_sizes_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_square_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_std_unbiased_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_stft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_svd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_symeig_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_t_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tanh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tensordot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_to_sparse_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_transpose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_trapezoid_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unbind_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unflatten_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_uniform_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unsqueeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_var_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_var_unbiased_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_vdot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_view_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_vsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_vstack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_where_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_zero__cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_zeros_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rdiv___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rmatmul___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rmul___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rpow___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_cfloat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_char_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_double_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_half_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_int_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_acosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_addcmul_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_allclose_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_any_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_partial_views_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_asin_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_broadcast_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_broadcast_to_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_conj_physical_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cumsum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_div_no_rounding_mode_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_empty_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fftshift_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_hsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isfinite_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isreal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_lerp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_matrix_norm_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_svd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_svdvals_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_and_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_not_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_xor_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logspace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_masked_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_meshgrid_list_of_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_movedim_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_mul_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_narrow_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_neg_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_ones_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_ones_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_permute_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_positive_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_prod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_randn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_repeat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rot90_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rsqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sinh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_special_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_special_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_square_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_stack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_std_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_t_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tensor_split_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_to_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tril_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_triu_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_true_divide_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_vsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_vstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_where_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_zeros_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addbmm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_all_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_allclose_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_any_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_asin_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_asinh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_2d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_3d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bfloat16_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_block_diag_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bool_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_broadcast_to_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cartesian_prod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cdouble_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_chalf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_char_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cholesky_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cholesky_inverse_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cholesky_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_clone_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_column_stack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_combinations_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_constant_pad_nd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cov_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumprod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diag_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diag_embed_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagflat_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diff_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_double_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_eq_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_exp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_expand_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_eye_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fftshift_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_hfft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_hfftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_irfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_irfftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fliplr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_gather_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_geqrf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_hsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_hstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isfinite_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isnan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_istft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lerp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cross_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_det_singular_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eig_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigvals_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_inv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_inv_ex_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_factor_ex_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lstsq_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lstsq_grad_oriented_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_factor_ex_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_power_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_multi_dot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_qr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_triangular_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_svd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_svdvals_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vecdot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log10_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logdet_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_not_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_or_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_xor_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lu_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lu_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lu_unpack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_cumsum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_matmul_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_meshgrid_list_of_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_movedim_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mul_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_narrow_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ne_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_neg_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_full_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_ones_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv1d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_circular_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_constant_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_reflect_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_softsign_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nonzero_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_fro_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ops_nvprims_view_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ormqr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_outer_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_positive_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_pow_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_qr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_renorm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_repeat_interleave_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reshape_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reshape_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize_as__cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_roll_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rsqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sgn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_short_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sigmoid_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_square_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_squeeze_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_mean_unbiased_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_svd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_take_along_dim_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_take_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tensor_split_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tensordot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tile_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_trapezoid_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_trapz_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_triangular_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tril_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_triu_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_true_divide_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_uniform_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unsqueeze_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_mean_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_unbiased_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vdot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_as_real_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_where_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_zero__cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view_H_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_T_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view___radd___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rdiv___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rmatmul___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rmul___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rpow___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rsub___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_T_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_byte_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_cfloat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_char_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_half_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_int_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_short_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_addr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_all_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_allclose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_amax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_asin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_asinh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atan2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atleast_3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_broadcast_to_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ceil_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clamp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clamp_max_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_column_stack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_conj_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_conj_physical_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cumsum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diag_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_digamma_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_div_floor_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_dsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_dstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_empty_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_erf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_erfc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_erfinv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_expand_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flatten_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fliplr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_float_power_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_floor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_floor_divide_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fmod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_frac_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ge_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isinf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isnan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isneginf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lerp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_svd_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_svdvals_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_vector_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log1p_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_and_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_not_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_xor_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logsumexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_maximum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nan_to_num_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_narrow_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ne_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_zeros_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nextafter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_alpha_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hardshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_pdist_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_prelu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_relu6_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_selu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_threshold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_real_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_roll_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_rot90_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_rsqrt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_rsub_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sinh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_bessel_j1_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_entr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_erfcx_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_multigammaln_mvlgamma_p_1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_ndtr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_zeta_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sqrt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_square_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_stack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_std_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_std_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sub_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sum_to_size_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_transpose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_trunc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_var_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_zeros_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__softmax_backward_data_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_abs_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addbmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addcdiv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_amax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_aminmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_any_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_argwhere_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_asinh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atan2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_baddbmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bernoulli_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_block_diag_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bool_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_broadcast_to_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bucketize_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_byte_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cdist_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cdouble_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cfloat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_chalf_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_cholesky_inverse_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cholesky_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_clamp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_clamp_min_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_conj_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_conj_physical_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_constant_pad_nd_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_contiguous_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cosh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cummax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diag_embed_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diagflat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diff_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_dist_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_div_trunc_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_dot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_double_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_dsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_empty_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_empty_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_eq_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_equal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_erf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_erfc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_erfinv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_expand_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_expm1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_hfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_float_power_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_floor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_floor_divide_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fmod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_full_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_gather_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_geqrf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_gradient_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_gt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_histc_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_igammac_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_add_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_reduce_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_select_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_inner_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_int_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isclose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isinf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isnan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isneginf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isposinf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isreal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_kron_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cond_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cross_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_det_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_det_singular_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eigh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_householder_product_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_inv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lstsq_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_factor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_power_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_multi_dot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_pinv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_pinv_singular_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_qr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_solve_ex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_solve_triangular_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_svd_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_svdvals_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_tensorsolve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vector_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log10_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log_softmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_or_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logsumexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_long_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lt_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_lu_unpack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_amax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_amin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_argmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_argmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_cumprod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_cumsum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_log_softmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_median_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_sum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_var_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_matrix_exp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_binary_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_maximum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_min_binary_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_msort_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nan_to_num_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nanmedian_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nanquantile_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nansum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_narrow_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_neg_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_empty_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_empty_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_zeros_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nextafter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional__scaled_dot_product_attention_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_avg_pool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_avg_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_bilinear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_celu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose2d_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_ctc_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_fractional_max_pool2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_gaussian_nll_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_gelu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_glu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardswish_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_bicubic_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_linear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_trilinear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_kl_div_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool3d_grad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_multi_margin_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_multilabel_margin_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_normalize_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_replicate_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pdist_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_selu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_silu_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softsign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_unfold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_upsample_nearest_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_fro_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_inf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_normal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ops_nvprims_view_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_outer_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_pinverse_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_4_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_put_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_randint_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_randn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_randn_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ravel_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_real_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_reciprocal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_remainder_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_renorm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_repeat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_repeat_interleave_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_reshape_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_resize_as__cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_resolve_neg_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_roll_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rot90_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_round_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_round_decimals_neg_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rsqrt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scalar_tensor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_add_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_amin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_searchsorted_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_segment_reduce_lengths_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_select_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_blackman_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_cosine_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_exponential_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_general_hamming_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_hamming_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signbit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_softmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sort_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sparse_sampled_addmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_airy_ai_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_y0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_y1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_w_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_entr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_hermite_polynomial_h_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_i0e_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_i1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_legendre_polynomial_p_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_log_ndtr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_scaled_modified_bessel_k0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_split_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_split_with_sizes_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sqrt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_square_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_squeeze_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_mean_unbiased_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_stft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sum_to_size_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_svd_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_svd_lowrank_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_symeig_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_take_along_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tensor_split_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_to_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_to_sparse_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_transpose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_trapz_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_triangular_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_triu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_trunc_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_unflatten_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unfold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_uniform_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unique_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_var_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_var_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_var_mean_unbiased_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_var_unbiased_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_vdot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_as_complex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_where_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_xlogy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_zeros_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___rdiv___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___rmod___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___rpow___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___rxor___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_abs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addcdiv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addmv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_all_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_allclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_any_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_arange_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_argsort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_asin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___ror___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rsub___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rxor___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__softmax_backward_data_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_allclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_aminmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_asin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atleast_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atleast_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_and_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_or_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bool_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_byte_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cdouble_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_column_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_conj_physical_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_count_nonzero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_deg2rad_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_digamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_trunc_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_equal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_exp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_eye_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_rfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_floor_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_full_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gather_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gcd_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ge_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_histc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_igamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_imag_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isfinite_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isnan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isposinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_istft_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_kthvalue_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cond_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_det_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eigvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lstsq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_svdvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_vander_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_vector_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_and_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_cumprod_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_multinomial_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_narrow_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_dropout_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nextafter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose1d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softplus_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_triplet_margin_with_distance_loss_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_fro_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_inf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ones_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_pinverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polar_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_pow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rad2deg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rand_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randint_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randn_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_reciprocal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_renorm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resize__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_roll_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_decimals_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rsub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scalar_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_searchsorted_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_segment_reduce_offsets_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signbit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_y0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_i0e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_log_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sum_to_size_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_svd_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_take_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tensor_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_topk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triangular_solve_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tril_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trunc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unique_consecutive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_zero__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bincount_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_not_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_or_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_xor_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bool_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_broadcast_shapes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_broadcast_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_byte_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cdouble_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_char_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cholesky_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_clone_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_combinations_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_constant_pad_nd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rdiv___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rpow___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rsub___cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addcdiv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_asin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_digamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dist_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expand_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expm1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_float_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_float_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_gather_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_det_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_householder_product_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_matrix_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_pinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vander_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logcumsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mT_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_meshgrid_variadic_tensors_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nan_to_num_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanquantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_elu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_embedding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_fractional_max_pool3d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softsign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_inf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pinverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_quantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reshape_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_decimals_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_slice_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_slice_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_i1e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_log_ndtr_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_xlog1py_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sum_to_size_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tensor_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trunc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unbind_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unsqueeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmod___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rpow___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_abs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addcmul_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addmv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_as_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_asin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clone_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_combinations_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_constant_pad_nd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cummin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_digamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_einsum_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_exp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expand_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_rfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_flatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_frexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_gather_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_hsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_reduce_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cond_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_det_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eigvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_inv_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_pinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_slogdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_svdvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_vander_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_vector_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logcumsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mean_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_min_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanquantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_elu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_gelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_kl_div_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softsign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_fro_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polar_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_pow_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_quantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_slice_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_slice_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i0e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_log_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_symeig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_take_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unbind_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unsqueeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diagonal_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_dist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_div_trunc_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_dsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_einsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_empty_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_eq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_expand_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_expm1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_eye_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_fft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_fftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifftn_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_float_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_float_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_frexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_gather_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_gcd_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_gt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_igamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_igammac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_imag_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_int_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isfinite_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isnan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isreal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_istft_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_unary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lcm_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cond_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_det_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_householder_product_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_ex_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_svdvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_vander_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_vector_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logcumsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logical_and_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logical_or_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logical_xor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_long_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nan_to_num_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nanquantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_narrow_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_narrow_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_native_dropout_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_gelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_pool1d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softsign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nonzero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_norm_fro_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_norm_inf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ones_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polar_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rand_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_randn_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_reciprocal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resize__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rsub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_amin_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_searchsorted_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_short_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_blackman_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_hann_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signbit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_airy_ai_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_y1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_i0e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_i1e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_log_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_zeta_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sum_to_size_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_svd_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_svd_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_symeig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tensor_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_topk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_transpose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_trunc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unbind_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_uniform_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unique_consecutive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unsqueeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_zero__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rand___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rdiv___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rmod___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rsub___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rxor___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addcdiv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_any_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_arange_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_asin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_asinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atleast_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atleast_3d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_and_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_or_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_xor_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_shapes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_char_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cholesky_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clamp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clone_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_conj_physical_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cummin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagonal_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_div_trunc_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_einsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_equal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_exp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_expand_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_expand_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_expm1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_eye_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_irfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_rfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_float_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gather_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gcd_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_heaviside_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_histc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_igamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isnan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isneginf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isposinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isreal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_istft_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_unary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lcm_cuda_int64, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_le_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_det_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lstsq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_slogdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_solve_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_vector_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_or_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_xor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mT_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_select_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_maximum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_min_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nan_to_num_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_narrow_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_native_dropout_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nextafter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_glu_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softplus_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softshrink_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softsign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_fro_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rand_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_renorm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reshape_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rsub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scalar_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_short_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_blackman_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_hann_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_kaiser_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signbit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_slice_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_airy_ai_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_j1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_y0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_i1e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_log_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_xlog1py_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sum_to_size_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_take_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tensordot_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_transpose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unbind_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unique_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zero__cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_H_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___getitem___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___radd___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rand___cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags___rdiv___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rmul___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rpow___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_T_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_char_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_half_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_int_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_long_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_abs_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_acosh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_addr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_arange_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_asin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_asinh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atan2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_or_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_bucketize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_cat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_clamp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_clamp_min_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_clone_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_column_stack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_conj_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_contiguous_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_copysign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_cos_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_cosh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diag_embed_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diagonal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_digamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_empty_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_erf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_erfc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_exp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_expand_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_eye_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_fft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_fftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_rfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_float_power_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_floor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_floor_divide_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fmod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_frac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_gcd_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_ge_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_hstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_index_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_index_select_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isfinite_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isnan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isneginf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isposinf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_le_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linspace_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_log1p_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_log2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_xor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_masked_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_maximum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_minimum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_mul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nan_to_num_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_narrow_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_narrow_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_native_layer_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ne_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_empty_strided_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_gelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_relu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ones_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_permute_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_pow_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_randn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_remainder_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_reshape_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_roll_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_rot90_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sgn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_signbit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sin_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_entr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_i0e_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_i1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_i1e_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_logit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_ndtr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_ndtri_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_zeta_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_std_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sub_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sum_to_size_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tensor_split_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_to_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tril_indices_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_triu_indices_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_true_divide_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unbind_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unflatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unfold_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unfold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unsqueeze_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_var_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_view_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_view_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_vsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_vstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__softmax_backward_data_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_abs_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_acosh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addbmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addcmul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addmv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_allclose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_aminmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_angle_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_arange_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_asin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_asinh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atan2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atleast_2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_baddbmm_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_bitwise_and_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_left_shift_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_not_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_or_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_broadcast_shapes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_broadcast_to_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_byte_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cartesian_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cdouble_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_chalf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_char_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cholesky_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clamp_min_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_column_stack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_complex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_conj_physical_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_contiguous_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cummax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cummin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cumsum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_deg2rad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diagflat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diagonal_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diagonal_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_digamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_div_floor_rounding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_div_trunc_rounding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_double_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_einsum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_empty_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_exp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_expand_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fftshift_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ifft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ifft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ifftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ifftshift_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ihfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ihfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_irfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_flip_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fliplr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_flipud_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_float_power_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_floor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_floor_divide_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_frac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_geqrf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_gradient_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_gt_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_heaviside_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_histc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_hstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_hypot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_igamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_igammac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_put_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_select_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_inner_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isclose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isfinite_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isinf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isnan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isneginf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isposinf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_unary_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_le_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lerp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_cond_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_det_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_det_singular_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_eig_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_inv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_inv_ex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lstsq_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lu_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_slogdet_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_svd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_svdvals_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_vector_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log10_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logical_and_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logical_not_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logical_or_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logical_xor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_long_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lu_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lu_unpack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_argmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_cumprod_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_masked_cumsum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_select_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_softmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_sum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_var_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_matmul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_matrix_exp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_median_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mode_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_movedim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_msort_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nanmedian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_narrow_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_native_dropout_backward_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_neg_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_new_empty_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_new_empty_strided_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_new_ones_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_new_zeros_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_elu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_glu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_local_response_norm_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_mish_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_relu6_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_selu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_softmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_softsign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_unfold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nonzero_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_norm_inf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_normal_number_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ops_nvprims_native_batch_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ops_nvprims_var_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ormqr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_outer_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_pca_lowrank_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_permute_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_put_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rad2deg_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_randint_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_randn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_real_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_reciprocal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_remainder_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_renorm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_repeat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_repeat_interleave_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_reshape_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_reshape_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_resize__cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_resize_as__cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_roll_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_round_decimals_0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_add_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_select_scatter_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_short_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_blackman_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_cosine_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_exponential_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_hann_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signbit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sinc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sinh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_slice_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sort_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_airy_ai_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_j1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_ndtr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_ndtri_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_split_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_split_with_sizes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sqrt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_squeeze_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_stack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_std_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sub_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_svd_lowrank_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_t_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_take_along_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tensordot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tile_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_to_sparse_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_topk_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_trace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_transpose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_trapezoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_trapz_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tril_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tril_indices_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_triu_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_triu_indices_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_true_divide_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unflatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unfold_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unfold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_uniform_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unique_consecutive_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_var_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_var_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_vdot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_view_as_complex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_view_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_view_as_real_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags_view_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_view_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_where_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_xlogy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_zeros_like_cuda_float32 2023-01-11T23:13:46.9946776Z 2023-01-11T23:13:46.9947071Z test_ops.py::TestCommonCUDA::test_compare_cpu_H_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9947542Z test_ops.py::TestCommonCUDA::test_compare_cpu___getitem___cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9948009Z test_ops.py::TestCommonCUDA::test_compare_cpu___rand___cuda_int64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9948462Z test_ops.py::TestCommonCUDA::test_compare_cpu___rmatmul___cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9949082Z test_ops.py::TestCommonCUDA::test_compare_cpu___rmul___cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9949529Z test_ops.py::TestCommonCUDA::test_compare_cpu___ror___cuda_int64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9950064Z test_ops.py::TestCommonCUDA::test_compare_cpu___rpow___cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9950502Z test_ops.py::TestCommonCUDA::test_compare_cpu___rsub___cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9950965Z test_ops.py::TestCommonCUDA::test_compare_cpu__native_batch_norm_legit_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9951439Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_bool_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9951923Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_cfloat_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9952413Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_complex_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9952890Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_float_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 
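The skip reasons in the pytest output below are informative in their own right. The vast majority of entries are SKIPPED as "test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test", i.e. these tests are collected by this shard but run only when that environment variable is set. The remaining skips fall into two groups for which a CPU-vs-CUDA output comparison cannot be meaningful: ops whose output is non-deterministic (dropout, multinomial, empty/new_empty, normal) and ops whose sample inputs hit undefined behavior (overflow when downcasting signed types, bitwise shifts). Below is a minimal sketch of re-running one of the skipped slow node IDs locally; the node ID, the PYTORCH_TEST_WITH_SLOW gate, and the test/ working directory are taken from this log, while the CUDA-enabled local build is an assumption:

import os
import subprocess

# Re-enable tests gated as "test is slow". Assumes the current directory is
# the pytorch checkout's test/ directory (matching the node IDs in this log)
# and a CUDA-enabled build, so the *_cuda_* variants are not device-skipped.
env = dict(os.environ, PYTORCH_TEST_WITH_SLOW="1")
subprocess.run(
    ["python", "-m", "pytest", "-v",
     "test_ops.py::TestCommonCUDA::test_compare_cpu_flip_cuda_float32"],
    env=env,
    check=False,  # the test may still be skipped/xfailed for other reasons
)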
2023-01-11T23:13:46.9953357Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_int_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 0%] 2023-01-11T23:13:46.9953818Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_long_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 0%] 2023-01-11T23:13:46.9954285Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9954812Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_partial_views_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9955286Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9955739Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_3d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9956198Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bitwise_right_shift_cuda_int64 SKIPPED (Skipped some inputs produce undefined outputs) [ 0%] 2023-01-11T23:13:46.9956657Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_chunk_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9957126Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_copysign_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9957579Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_cumsum_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9958037Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9958510Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_scatter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9958989Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_div_trunc_rounding_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9959504Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_dsplit_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9960155Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_cuda_float32 SKIPPED (output is non-deterministic) [ 0%] 2023-01-11T23:13:46.9960603Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_expand_as_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9961062Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_expand_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9961519Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fft_fftshift_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9961963Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_flip_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9962417Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fliplr_cuda_float32 SKIPPED (test is slow; run with 
PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9962871Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fmax_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9963323Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_hsplit_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9963769Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_copy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9964225Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_lerp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9964687Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_matrix_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9965161Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9965665Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_svdvals_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9966143Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_vector_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9966605Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linspace_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9967080Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_meshgrid_list_of_tensors_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9967556Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_zeros_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9968016Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nextafter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9968483Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_alpha_dropout_cuda_float32 SKIPPED (Expected: dropout is not comparable) [ 0%] 2023-01-11T23:13:46.9968984Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hardtanh_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9969491Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_huber_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9969979Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_log_softmax_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9970516Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_margin_ranking_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9971005Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_nll_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9971485Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_relu6_cuda_float32 SKIPPED (test is slow; 
run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9971979Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_softmin_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9972464Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_softshrink_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9972966Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_triplet_margin_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9973443Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9973901Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_reshape_as_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9974349Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_rot90_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9974921Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_squeeze_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9975384Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_std_mean_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9975887Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_sum_to_size_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9976337Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_t_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9976788Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_trace_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9977238Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_triu_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9977692Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_true_divide_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9978152Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unfold_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9978606Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_var_mean_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9979061Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_view_as_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9979514Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_vsplit_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9979969Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_vstack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9980408Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_xlogy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9980913Z 
test_ops.py::TestCommonCUDA::test_compare_cpu__softmax_backward_data_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9981389Z test_ops.py::TestCommonCUDA::test_compare_cpu_addmm_decomposed_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9981847Z test_ops.py::TestCommonCUDA::test_compare_cpu_argsort_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9982305Z test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_partial_views_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9982769Z test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_1d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9983224Z test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9983675Z test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_3d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9984110Z test_ops.py::TestCommonCUDA::test_compare_cpu_bitwise_right_shift_cuda_int64 SKIPPED (Some inputs produce undefined outputs) [ 0%] 2023-01-11T23:13:46.9984559Z test_ops.py::TestCommonCUDA::test_compare_cpu_block_diag_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9985008Z test_ops.py::TestCommonCUDA::test_compare_cpu_bool_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9985459Z test_ops.py::TestCommonCUDA::test_compare_cpu_cartesian_prod_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9985918Z test_ops.py::TestCommonCUDA::test_compare_cpu_cfloat_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9986380Z test_ops.py::TestCommonCUDA::test_compare_cpu_chalf_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9986836Z test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_inverse_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9987299Z test_ops.py::TestCommonCUDA::test_compare_cpu_column_stack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9987753Z test_ops.py::TestCommonCUDA::test_compare_cpu_complex_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9988198Z test_ops.py::TestCommonCUDA::test_compare_cpu_contiguous_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9988652Z test_ops.py::TestCommonCUDA::test_compare_cpu_copysign_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9989097Z test_ops.py::TestCommonCUDA::test_compare_cpu_corrcoef_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9989544Z test_ops.py::TestCommonCUDA::test_compare_cpu_cross_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9990080Z 
test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_copy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9990529Z test_ops.py::TestCommonCUDA::test_compare_cpu_dist_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9990970Z test_ops.py::TestCommonCUDA::test_compare_cpu_dot_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9991455Z test_ops.py::TestCommonCUDA::test_compare_cpu_double_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9991896Z test_ops.py::TestCommonCUDA::test_compare_cpu_einsum_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9992422Z test_ops.py::TestCommonCUDA::test_compare_cpu_empty_like_cuda_float32 SKIPPED (output is non-deterministic) [ 0%] 2023-01-11T23:13:46.9992857Z test_ops.py::TestCommonCUDA::test_compare_cpu_eye_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9993314Z test_ops.py::TestCommonCUDA::test_compare_cpu_fft_ifftshift_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9993762Z test_ops.py::TestCommonCUDA::test_compare_cpu_flip_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9994215Z test_ops.py::TestCommonCUDA::test_compare_cpu_flipud_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9994664Z test_ops.py::TestCommonCUDA::test_compare_cpu_float_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9995128Z test_ops.py::TestCommonCUDA::test_compare_cpu_grid_sampler_2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9995579Z test_ops.py::TestCommonCUDA::test_compare_cpu_histc_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9996026Z test_ops.py::TestCommonCUDA::test_compare_cpu_hsplit_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9996471Z test_ops.py::TestCommonCUDA::test_compare_cpu_hstack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9996931Z test_ops.py::TestCommonCUDA::test_compare_cpu_index_copy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9997465Z test_ops.py::TestCommonCUDA::test_compare_cpu_index_fill_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9997921Z test_ops.py::TestCommonCUDA::test_compare_cpu_index_reduce_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9998375Z test_ops.py::TestCommonCUDA::test_compare_cpu_int_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 0%] 2023-01-11T23:13:46.9998825Z test_ops.py::TestCommonCUDA::test_compare_cpu_isin_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9999328Z test_ops.py::TestCommonCUDA::test_compare_cpu_istft_cuda_complex64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable 
test) [ 0%] 2023-01-11T23:13:46.9999778Z test_ops.py::TestCommonCUDA::test_compare_cpu_ldexp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0000232Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_cond_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0000705Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_det_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0001175Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eig_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0001629Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eigh_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0002107Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_householder_product_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0002609Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_inv_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0003072Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_inv_ex_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0003529Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_ldl_factor_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0004005Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_ldl_factor_ex_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0004479Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_ldl_solve_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0004945Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lstsq_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0005402Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0005856Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_solve_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0006325Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_rank_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0006790Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0007247Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0007706Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_slogdet_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0008195Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0008664Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_vector_norm_cuda_float32 SKIPPED (test is 
slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0009131Z test_ops.py::TestCommonCUDA::test_compare_cpu_log_softmax_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0009591Z test_ops.py::TestCommonCUDA::test_compare_cpu_log_softmax_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0010058Z test_ops.py::TestCommonCUDA::test_compare_cpu_logaddexp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0010526Z test_ops.py::TestCommonCUDA::test_compare_cpu_logcumsumexp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0010987Z test_ops.py::TestCommonCUDA::test_compare_cpu_logdet_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0011433Z test_ops.py::TestCommonCUDA::test_compare_cpu_logspace_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0011895Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_cumsum_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0012357Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_fill_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0012846Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_log_softmax_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0013321Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_logsumexp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0013778Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0014239Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_normalize_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0014804Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_select_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0015264Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_softmax_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0015713Z test_ops.py::TestCommonCUDA::test_compare_cpu_matrix_exp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0016167Z test_ops.py::TestCommonCUDA::test_compare_cpu_median_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0016624Z test_ops.py::TestCommonCUDA::test_compare_cpu_min_reduction_no_dim_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0017100Z test_ops.py::TestCommonCUDA::test_compare_cpu_min_reduction_with_dim_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0017548Z test_ops.py::TestCommonCUDA::test_compare_cpu_mode_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0018063Z 
test_ops.py::TestCommonCUDA::test_compare_cpu_multinomial_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0018503Z test_ops.py::TestCommonCUDA::test_compare_cpu_narrow_copy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0018994Z test_ops.py::TestCommonCUDA::test_compare_cpu_narrow_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0019444Z test_ops.py::TestCommonCUDA::test_compare_cpu_native_batch_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0019957Z test_ops.py::TestCommonCUDA::test_compare_cpu_new_empty_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0020386Z test_ops.py::TestCommonCUDA::test_compare_cpu_new_full_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0020836Z test_ops.py::TestCommonCUDA::test_compare_cpu_new_zeros_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0021418Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional__scaled_dot_product_attention_cuda_float32 SKIPPED (output is non-deterministic (when dropout_p > 0)) [ 1%]
2023-01-11T23:13:47.0021923Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_avg_pool1d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0022413Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_avg_pool1d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0022883Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_avg_pool2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0023353Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_batch_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0023876Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_binary_cross_entropy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0024390Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_binary_cross_entropy_with_logits_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0024884Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_conv1d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0025360Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_conv2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0025841Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_cosine_similarity_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0026331Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_ctc_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0026878Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout3d_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0027399Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0027864Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_embedding_bag_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0028303Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_embedding_cuda_float32 SKIPPED (Skipped!) [ 1%]
2023-01-11T23:13:47.0028739Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_glu_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0029218Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hardswish_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0029817Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_bicubic_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0030318Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_nearest_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0030827Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_trilinear_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0031321Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_linear_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0031806Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_local_response_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0032293Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_pool1d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0032773Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool1d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0033267Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool1d_grad_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0033760Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0034244Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool3d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0034764Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool3d_grad_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0035257Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multi_margin_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0035749Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multilabel_margin_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0036252Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multilabel_soft_margin_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0036736Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_normalize_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0037212Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_constant_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0037700Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_replicate_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0038184Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pixel_shuffle_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0038666Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_soft_margin_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0039186Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softmin_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0039675Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softmin_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0040205Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_triplet_margin_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0040715Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_triplet_margin_with_distance_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0041214Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_upsample_bilinear_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0041692Z test_ops.py::TestCommonCUDA::test_compare_cpu_nonzero_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0042138Z test_ops.py::TestCommonCUDA::test_compare_cpu_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0042587Z test_ops.py::TestCommonCUDA::test_compare_cpu_norm_inf_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0043034Z test_ops.py::TestCommonCUDA::test_compare_cpu_norm_nuc_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0043538Z test_ops.py::TestCommonCUDA::test_compare_cpu_normal_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0044038Z test_ops.py::TestCommonCUDA::test_compare_cpu_normal_number_mean_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0044476Z test_ops.py::TestCommonCUDA::test_compare_cpu_ones_like_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0044951Z test_ops.py::TestCommonCUDA::test_compare_cpu_ops_nvprims_native_batch_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0045452Z test_ops.py::TestCommonCUDA::test_compare_cpu_ops_nvprims_view_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0045916Z test_ops.py::TestCommonCUDA::test_compare_cpu_ormqr_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0046430Z test_ops.py::TestCommonCUDA::test_compare_cpu_pca_lowrank_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0046869Z test_ops.py::TestCommonCUDA::test_compare_cpu_pinverse_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0047318Z test_ops.py::TestCommonCUDA::test_compare_cpu_polar_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0047768Z test_ops.py::TestCommonCUDA::test_compare_cpu_qr_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0048226Z test_ops.py::TestCommonCUDA::test_compare_cpu_quantile_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0048743Z test_ops.py::TestCommonCUDA::test_compare_cpu_rand_like_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0049170Z test_ops.py::TestCommonCUDA::test_compare_cpu_randint_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0049622Z test_ops.py::TestCommonCUDA::test_compare_cpu_renorm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0050085Z test_ops.py::TestCommonCUDA::test_compare_cpu_repeat_interleave_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0050557Z test_ops.py::TestCommonCUDA::test_compare_cpu_reshape_as_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0051010Z test_ops.py::TestCommonCUDA::test_compare_cpu_reshape_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0051485Z test_ops.py::TestCommonCUDA::test_compare_cpu_resize__cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0051935Z test_ops.py::TestCommonCUDA::test_compare_cpu_resize_as__cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0052399Z test_ops.py::TestCommonCUDA::test_compare_cpu_resolve_neg_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0052842Z test_ops.py::TestCommonCUDA::test_compare_cpu_rsub_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0053301Z test_ops.py::TestCommonCUDA::test_compare_cpu_scalar_tensor_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0053776Z test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_add_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0054235Z test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0054800Z test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_mean_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0055278Z test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_prod_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0055760Z test_ops.py::TestCommonCUDA::test_compare_cpu_segment_reduce_offsets_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0056274Z test_ops.py::TestCommonCUDA::test_compare_cpu_select_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0056726Z test_ops.py::TestCommonCUDA::test_compare_cpu_slice_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0057182Z test_ops.py::TestCommonCUDA::test_compare_cpu_softmax_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0057657Z test_ops.py::TestCommonCUDA::test_compare_cpu_sparse_sampled_addmm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0058265Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 1%]
2023-01-11T23:13:47.0058766Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%]
2023-01-11T23:13:47.0059302Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%]
2023-01-11T23:13:47.0059774Z test_ops.py::TestCommonCUDA::test_compare_cpu_split_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0060236Z test_ops.py::TestCommonCUDA::test_compare_cpu_split_with_sizes_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0060694Z test_ops.py::TestCommonCUDA::test_compare_cpu_stack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0061134Z test_ops.py::TestCommonCUDA::test_compare_cpu_stft_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0061653Z test_ops.py::TestCommonCUDA::test_compare_cpu_svd_lowrank_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0062083Z test_ops.py::TestCommonCUDA::test_compare_cpu_t_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0062559Z test_ops.py::TestCommonCUDA::test_compare_cpu_take_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0063003Z test_ops.py::TestCommonCUDA::test_compare_cpu_to_sparse_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0063452Z test_ops.py::TestCommonCUDA::test_compare_cpu_topk_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0063910Z test_ops.py::TestCommonCUDA::test_compare_cpu_triangular_solve_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0064364Z test_ops.py::TestCommonCUDA::test_compare_cpu_triu_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0064822Z test_ops.py::TestCommonCUDA::test_compare_cpu_unflatten_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0065272Z test_ops.py::TestCommonCUDA::test_compare_cpu_unfold_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0065713Z test_ops.py::TestCommonCUDA::test_compare_cpu_unique_cuda_float32 SKIPPED (Output order is undefined when sorted=False) [ 1%]
2023-01-11T23:13:47.0066153Z test_ops.py::TestCommonCUDA::test_compare_cpu_var_mean_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0066615Z test_ops.py::TestCommonCUDA::test_compare_cpu_var_mean_unbiased_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0067092Z test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0067542Z test_ops.py::TestCommonCUDA::test_compare_cpu_view_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0067990Z test_ops.py::TestCommonCUDA::test_compare_cpu_vstack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0068438Z test_ops.py::TestCommonCUDA::test_compare_cpu_xlogy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0068872Z test_ops.py::TestCommonCUDA::test_compare_cpu_zero__cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0069305Z test_ops.py::TestCommonCUDA::test_compare_cpu_zeros_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0069777Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_H_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0070169Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_acos_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0070560Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_partial_views_cuda_complex32 XFAIL [ 1%]
2023-01-11T23:13:47.0070955Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_asin_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0071343Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_3d_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0071723Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_bool_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0072090Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cat_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0072466Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cfloat_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0072848Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_chalf_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0108747Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_chunk_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0109303Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_clone_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0109786Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_column_stack_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0110204Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_conj_physical_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0110595Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cos_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0110968Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diag_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0111358Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diag_embed_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0111756Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diagonal_copy_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0112138Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diagonal_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0112544Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_div_no_rounding_mode_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0112943Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_dsplit_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0113322Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_eq_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0113694Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fft_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0114072Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fftn_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0114460Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fftshift_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0114901Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_hfftn_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0115287Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifft2_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0115673Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifft_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0116059Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_irfft_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0116439Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_irfftn_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0116832Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_hstack_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0117220Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_put_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0117612Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isfinite_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0117989Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_lerp_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0118369Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_log_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0118768Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_log_softmax_with_dtype_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0119166Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mT_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0119547Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_masked_fill_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0119935Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_movedim_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0120314Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mul_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0120700Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_narrow_copy_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0121078Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_narrow_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0121543Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_empty_cuda_complex32 SKIPPED (Expected: new_empty is not comparable) [ 1%]
2023-01-11T23:13:47.0122038Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_empty_strided_cuda_complex32 SKIPPED (Expected: new_empty_strided is not comparable) [ 1%]
2023-01-11T23:13:47.0122483Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_full_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0122865Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_ones_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0123275Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv_transpose2d_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0123717Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv_transpose3d_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0124119Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nonzero_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0124509Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_ones_like_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0124893Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_permute_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0125280Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_positive_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0125654Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_pow_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0126036Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_prod_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0126480Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_rand_like_cuda_complex32 SKIPPED (Expected: randn_like is not comparable between dtypes) [ 2%]
2023-01-11T23:13:47.0127016Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_randn_like_cuda_complex32 SKIPPED (Expected: randn_like is not comparable between dtypes) [ 2%]
2023-01-11T23:13:47.0127447Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_real_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0127838Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_reshape_as_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0128225Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_reshape_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0128616Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_resolve_neg_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0129002Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_rsqrt_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0129440Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_scalar_tensor_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0129837Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sigmoid_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0130218Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sinh_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0130600Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_slice_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0130995Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_split_with_sizes_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0131388Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_squeeze_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0131759Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sub_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0132136Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_tan_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0132518Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_trace_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0132905Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_transpose_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0133318Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_true_divide_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0133709Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unbind_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0134092Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unfold_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0134601Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unsqueeze_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0134982Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_view_as_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0135364Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_view_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0135748Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_vsplit_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0136125Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_where_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0136512Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_zeros_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0136863Z test_ops.py::TestCommonCUDA::test_dtypes_H_cuda PASSED [ 2%]
2023-01-11T23:13:47.0137189Z test_ops.py::TestCommonCUDA::test_dtypes_T_cuda PASSED [ 2%]
2023-01-11T23:13:47.0137505Z test_ops.py::TestCommonCUDA::test_dtypes___rdiv___cuda PASSED [ 2%]
2023-01-11T23:13:47.0137827Z test_ops.py::TestCommonCUDA::test_dtypes___rmatmul___cuda PASSED [ 2%]
2023-01-11T23:13:47.0138157Z test_ops.py::TestCommonCUDA::test_dtypes___rpow___cuda PASSED [ 2%]
2023-01-11T23:13:47.0138472Z test_ops.py::TestCommonCUDA::test_dtypes___rxor___cuda PASSED [ 2%]
2023-01-11T23:13:47.0138875Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_bfloat16_cuda PASSED [ 2%]
2023-01-11T23:13:47.0139224Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_bool_cuda PASSED [ 2%]
2023-01-11T23:13:47.0139567Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_byte_cuda PASSED [ 2%]
2023-01-11T23:13:47.0139910Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_cdouble_cuda PASSED [ 2%]
2023-01-11T23:13:47.0140261Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_cfloat_cuda PASSED [ 2%]
2023-01-11T23:13:47.0140604Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_char_cuda PASSED [ 2%]
2023-01-11T23:13:47.0140942Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_complex_cuda PASSED [ 2%]
2023-01-11T23:13:47.0141291Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_int_cuda PASSED [ 2%]
2023-01-11T23:13:47.0141633Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_long_cuda PASSED [ 2%]
2023-01-11T23:13:47.0141969Z test_ops.py::TestCommonCUDA::test_dtypes__refs_abs_cuda PASSED [ 2%]
2023-01-11T23:13:47.0142286Z test_ops.py::TestCommonCUDA::test_dtypes__refs_addcdiv_cuda PASSED [ 2%]
2023-01-11T23:13:47.0142616Z test_ops.py::TestCommonCUDA::test_dtypes__refs_addcmul_cuda PASSED [ 2%]
2023-01-11T23:13:47.0142939Z test_ops.py::TestCommonCUDA::test_dtypes__refs_any_cuda PASSED [ 2%]
2023-01-11T23:13:47.0143256Z test_ops.py::TestCommonCUDA::test_dtypes__refs_as_strided_cuda PASSED [ 2%]
2023-01-11T23:13:47.0143606Z test_ops.py::TestCommonCUDA::test_dtypes__refs_as_strided_partial_views_cuda PASSED [ 2%]
2023-01-11T23:13:47.0143967Z test_ops.py::TestCommonCUDA::test_dtypes__refs_as_strided_scatter_cuda PASSED [ 2%]
2023-01-11T23:13:47.0144304Z test_ops.py::TestCommonCUDA::test_dtypes__refs_asin_cuda PASSED [ 2%]
2023-01-11T23:13:47.0144618Z test_ops.py::TestCommonCUDA::test_dtypes__refs_atan_cuda PASSED [ 2%]
2023-01-11T23:13:47.0144945Z test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_1d_cuda PASSED [ 2%]
2023-01-11T23:13:47.0145270Z test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_xor_cuda PASSED [ 2%]
2023-01-11T23:13:47.0145625Z test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_to_cuda PASSED [ 2%]
2023-01-11T23:13:47.0145953Z test_ops.py::TestCommonCUDA::test_dtypes__refs_ceil_cuda PASSED [ 2%]
2023-01-11T23:13:47.0146270Z test_ops.py::TestCommonCUDA::test_dtypes__refs_clamp_cuda PASSED [ 2%]
2023-01-11T23:13:47.0146593Z test_ops.py::TestCommonCUDA::test_dtypes__refs_clone_cuda PASSED [ 2%]
2023-01-11T23:13:47.0146912Z test_ops.py::TestCommonCUDA::test_dtypes__refs_column_stack_cuda PASSED [ 2%]
2023-01-11T23:13:47.0147239Z test_ops.py::TestCommonCUDA::test_dtypes__refs_contiguous_cuda PASSED [ 2%]
2023-01-11T23:13:47.0147564Z test_ops.py::TestCommonCUDA::test_dtypes__refs_copysign_cuda PASSED [ 2%]
2023-01-11T23:13:47.0147876Z test_ops.py::TestCommonCUDA::test_dtypes__refs_cosh_cuda PASSED [ 2%]
2023-01-11T23:13:47.0148202Z test_ops.py::TestCommonCUDA::test_dtypes__refs_cumsum_cuda PASSED [ 2%]
2023-01-11T23:13:47.0148516Z test_ops.py::TestCommonCUDA::test_dtypes__refs_diag_cuda PASSED [ 2%]
2023-01-11T23:13:47.0148841Z test_ops.py::TestCommonCUDA::test_dtypes__refs_diagonal_copy_cuda PASSED [ 2%]
2023-01-11T23:13:47.0149158Z test_ops.py::TestCommonCUDA::test_dtypes__refs_diagonal_cuda PASSED [ 2%]
2023-01-11T23:13:47.0149491Z test_ops.py::TestCommonCUDA::test_dtypes__refs_diagonal_scatter_cuda PASSED [ 2%]
2023-01-11T23:13:47.0149908Z test_ops.py::TestCommonCUDA::test_dtypes__refs_digamma_cuda PASSED [ 2%]
2023-01-11T23:13:47.0150238Z test_ops.py::TestCommonCUDA::test_dtypes__refs_div_no_rounding_mode_cuda PASSED [ 2%]
2023-01-11T23:13:47.0150573Z test_ops.py::TestCommonCUDA::test_dtypes__refs_empty_cuda PASSED [ 2%]
2023-01-11T23:13:47.0150896Z test_ops.py::TestCommonCUDA::test_dtypes__refs_empty_like_cuda PASSED [ 2%]
2023-01-11T23:13:47.0151259Z test_ops.py::TestCommonCUDA::test_dtypes__refs_eq_cuda PASSED [ 2%]
2023-01-11T23:13:47.0151571Z test_ops.py::TestCommonCUDA::test_dtypes__refs_erf_cuda PASSED [ 2%]
2023-01-11T23:13:47.0151894Z test_ops.py::TestCommonCUDA::test_dtypes__refs_erfinv_cuda PASSED [ 2%]
2023-01-11T23:13:47.0152220Z test_ops.py::TestCommonCUDA::test_dtypes__refs_exp2_cuda PASSED [ 2%]
2023-01-11T23:13:47.0152534Z test_ops.py::TestCommonCUDA::test_dtypes__refs_exp_cuda PASSED [ 2%]
2023-01-11T23:13:47.0152859Z test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_cuda PASSED [ 2%]
2023-01-11T23:13:47.0153180Z test_ops.py::TestCommonCUDA::test_dtypes__refs_eye_cuda PASSED [ 2%]
2023-01-11T23:13:47.0153504Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fftn_cuda PASSED [ 2%]
2023-01-11T23:13:47.0153823Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_hfft2_cuda PASSED [ 2%]
2023-01-11T23:13:47.0154157Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifftshift_cuda PASSED [ 2%]
2023-01-11T23:13:47.0154484Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfft2_cuda PASSED [ 2%]
2023-01-11T23:13:47.0154805Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfftn_cuda PASSED [ 2%]
2023-01-11T23:13:47.0155129Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_irfftn_cuda PASSED [ 2%]
2023-01-11T23:13:47.0155452Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfft_cuda PASSED [ 2%]
2023-01-11T23:13:47.0155776Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfftn_cuda PASSED [ 2%]
2023-01-11T23:13:47.0156095Z test_ops.py::TestCommonCUDA::test_dtypes__refs_flatten_cuda PASSED [ 2%]
2023-01-11T23:13:47.0156417Z test_ops.py::TestCommonCUDA::test_dtypes__refs_flip_cuda PASSED [ 2%]
2023-01-11T23:13:47.0156741Z test_ops.py::TestCommonCUDA::test_dtypes__refs_flipud_cuda PASSED [ 2%]
2023-01-11T23:13:47.0157059Z test_ops.py::TestCommonCUDA::test_dtypes__refs_float_power_cuda PASSED [ 2%]
2023-01-11T23:13:47.0157394Z test_ops.py::TestCommonCUDA::test_dtypes__refs_floor_divide_cuda PASSED [ 2%]
2023-01-11T23:13:47.0157719Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fmax_cuda PASSED [ 2%]
2023-01-11T23:13:47.0158065Z test_ops.py::TestCommonCUDA::test_dtypes__refs_ge_cuda PASSED [ 2%]
2023-01-11T23:13:47.0158383Z test_ops.py::TestCommonCUDA::test_dtypes__refs_heaviside_cuda PASSED [ 2%]
2023-01-11T23:13:47.0158708Z test_ops.py::TestCommonCUDA::test_dtypes__refs_igammac_cuda PASSED [ 2%]
2023-01-11T23:13:47.0159030Z test_ops.py::TestCommonCUDA::test_dtypes__refs_imag_cuda PASSED [ 2%]
2023-01-11T23:13:47.0159344Z test_ops.py::TestCommonCUDA::test_dtypes__refs_index_add_cuda PASSED [ 2%]
2023-01-11T23:13:47.0159670Z test_ops.py::TestCommonCUDA::test_dtypes__refs_isclose_cuda PASSED [ 2%]
2023-01-11T23:13:47.0159997Z test_ops.py::TestCommonCUDA::test_dtypes__refs_isposinf_cuda PASSED [ 2%]
2023-01-11T23:13:47.0160322Z test_ops.py::TestCommonCUDA::test_dtypes__refs_lcm_cuda PASSED [ 2%]
2023-01-11T23:13:47.0160635Z test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_norm_cuda PASSED [ 2%]
2023-01-11T23:13:47.0160975Z test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_svdvals_cuda PASSED [ 2%]
2023-01-11T23:13:47.0161312Z test_ops.py::TestCommonCUDA::test_dtypes__refs_log10_cuda PASSED [ 2%]
2023-01-11T23:13:47.0161629Z test_ops.py::TestCommonCUDA::test_dtypes__refs_log1p_cuda PASSED [ 2%]
2023-01-11T23:13:47.0161953Z test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_and_cuda PASSED [ 2%]
2023-01-11T23:13:47.0162281Z test_ops.py::TestCommonCUDA::test_dtypes__refs_logspace_cuda PASSED [ 2%]
2023-01-11T23:13:47.0162602Z test_ops.py::TestCommonCUDA::test_dtypes__refs_lt_cuda PASSED [ 2%]
2023-01-11T23:13:47.0162917Z test_ops.py::TestCommonCUDA::test_dtypes__refs_maximum_cuda PASSED [ 2%]
2023-01-11T23:13:47.0163266Z test_ops.py::TestCommonCUDA::test_dtypes__refs_mean_cuda PASSED [ 2%]
2023-01-11T23:13:47.0163610Z test_ops.py::TestCommonCUDA::test_dtypes__refs_meshgrid_list_of_tensors_cuda PASSED [ 2%]
2023-01-11T23:13:47.0163970Z test_ops.py::TestCommonCUDA::test_dtypes__refs_meshgrid_variadic_tensors_cuda PASSED [ 2%]
2023-01-11T23:13:47.0164317Z test_ops.py::TestCommonCUDA::test_dtypes__refs_minimum_cuda PASSED [ 2%]
2023-01-11T23:13:47.0164645Z test_ops.py::TestCommonCUDA::test_dtypes__refs_narrow_copy_cuda PASSED [ 2%]
2023-01-11T23:13:47.0164973Z test_ops.py::TestCommonCUDA::test_dtypes__refs_narrow_cuda PASSED [ 2%]
2023-01-11T23:13:47.0165299Z test_ops.py::TestCommonCUDA::test_dtypes__refs_native_layer_norm_cuda PASSED [ 2%]
2023-01-11T23:13:47.0165636Z test_ops.py::TestCommonCUDA::test_dtypes__refs_new_ones_cuda PASSED [ 2%]
2023-01-11T23:13:47.0165961Z test_ops.py::TestCommonCUDA::test_dtypes__refs_new_zeros_cuda PASSED [ 2%]
2023-01-11T23:13:47.0166303Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_alpha_dropout_cuda PASSED [ 2%]
2023-01-11T23:13:47.0166670Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_celu_cuda PASSED [ 2%]
2023-01-11T23:13:47.0167023Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_gelu_cuda PASSED [ 2%]
2023-01-11T23:13:47.0167377Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_glu_cuda PASSED [ 2%]
2023-01-11T23:13:47.0167727Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hardshrink_cuda PASSED [ 2%]
2023-01-11T23:13:47.0168102Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hinge_embedding_loss_cuda PASSED [ 2%]
2023-01-11T23:13:47.0168479Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_l1_loss_cuda PASSED [ 2%]
2023-01-11T23:13:47.0168846Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_log_softmax_with_dtype_cuda PASSED [ 2%]
2023-01-11T23:13:47.0169232Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_margin_ranking_loss_cuda PASSED [ 2%]
2023-01-11T23:13:47.0169602Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_mish_cuda PASSED [ 2%]
2023-01-11T23:13:47.0169956Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_mse_loss_cuda PASSED [ 2%]
2023-01-11T23:13:47.0170333Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_pdist_cuda PASSED [ 2%]
2023-01-11T23:13:47.0170702Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_poisson_nll_loss_cuda PASSED [ 2%]
2023-01-11T23:13:47.0171066Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_prelu_cuda PASSED [ 2%]
2023-01-11T23:13:47.0171417Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_relu6_cuda PASSED [ 2%]
2023-01-11T23:13:47.0171756Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_selu_cuda PASSED [ 2%]
2023-01-11T23:13:47.0172124Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softmax_with_dtype_cuda PASSED [ 2%]
2023-01-11T23:13:47.0172499Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softmin_with_dtype_cuda PASSED [ 2%]
2023-01-11T23:13:47.0172869Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_tanhshrink_cuda PASSED [ 2%]
2023-01-11T23:13:47.0173237Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_threshold_cuda PASSED [ 2%]
2023-01-11T23:13:47.0173581Z test_ops.py::TestCommonCUDA::test_dtypes__refs_norm_cuda PASSED [ 2%]
2023-01-11T23:13:47.0173915Z test_ops.py::TestCommonCUDA::test_dtypes__refs_permute_cuda PASSED [ 2%]
2023-01-11T23:13:47.0174242Z test_ops.py::TestCommonCUDA::test_dtypes__refs_positive_cuda PASSED [ 2%]
2023-01-11T23:13:47.0174670Z test_ops.py::TestCommonCUDA::test_dtypes__refs_pow_cuda PASSED [ 2%]
2023-01-11T23:13:47.0174995Z test_ops.py::TestCommonCUDA::test_dtypes__refs_prod_cuda PASSED [ 2%]
2023-01-11T23:13:47.0175314Z test_ops.py::TestCommonCUDA::test_dtypes__refs_randn_cuda PASSED [ 2%]
2023-01-11T23:13:47.0175634Z test_ops.py::TestCommonCUDA::test_dtypes__refs_real_cuda PASSED [ 2%]
2023-01-11T23:13:47.0176015Z test_ops.py::TestCommonCUDA::test_dtypes__refs_reciprocal_cuda PASSED [ 3%]
2023-01-11T23:13:47.0176350Z test_ops.py::TestCommonCUDA::test_dtypes__refs_reshape_as_cuda PASSED [ 3%]
2023-01-11T23:13:47.0176674Z test_ops.py::TestCommonCUDA::test_dtypes__refs_rot90_cuda PASSED [ 3%]
2023-01-11T23:13:47.0176996Z test_ops.py::TestCommonCUDA::test_dtypes__refs_round_cuda PASSED [ 3%]
2023-01-11T23:13:47.0177318Z test_ops.py::TestCommonCUDA::test_dtypes__refs_rsub_cuda PASSED [ 3%]
2023-01-11T23:13:47.0177633Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sgn_cuda PASSED [ 3%]
2023-01-11T23:13:47.0177954Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sinc_cuda PASSED [ 3%]
2023-01-11T23:13:47.0178280Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i1_cuda PASSED [ 3%]
2023-01-11T23:13:47.0178612Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i1e_cuda PASSED [ 3%]
2023-01-11T23:13:47.0178950Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_log_ndtr_cuda PASSED [ 3%]
2023-01-11T23:13:47.0179317Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_5_cuda PASSED [ 3%]
2023-01-11T23:13:47.0179677Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_ndtr_cuda PASSED [ 3%]
2023-01-11T23:13:47.0180020Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_softmax_with_dtype_cuda PASSED [ 3%]
2023-01-11T23:13:47.0180368Z test_ops.py::TestCommonCUDA::test_dtypes__refs_square_cuda PASSED [ 3%]
2023-01-11T23:13:47.0180692Z test_ops.py::TestCommonCUDA::test_dtypes__refs_std_mean_cuda PASSED [ 3%]
2023-01-11T23:13:47.0181017Z test_ops.py::TestCommonCUDA::test_dtypes__refs_t_cuda PASSED [ 3%]
2023-01-11T23:13:47.0181337Z test_ops.py::TestCommonCUDA::test_dtypes__refs_tensor_split_cuda PASSED [ 3%]
2023-01-11T23:13:47.0181662Z test_ops.py::TestCommonCUDA::test_dtypes__refs_trace_cuda PASSED [ 3%]
2023-01-11T23:13:47.0181987Z test_ops.py::TestCommonCUDA::test_dtypes__refs_tril_cuda PASSED [ 3%]
2023-01-11T23:13:47.0182303Z test_ops.py::TestCommonCUDA::test_dtypes__refs_tril_indices_cuda PASSED [ 3%]
2023-01-11T23:13:47.0182663Z test_ops.py::TestCommonCUDA::test_dtypes__refs_triu_cuda PASSED [ 3%]
2023-01-11T23:13:47.0182984Z test_ops.py::TestCommonCUDA::test_dtypes__refs_true_divide_cuda PASSED [ 3%]
2023-01-11T23:13:47.0183305Z test_ops.py::TestCommonCUDA::test_dtypes__refs_unbind_cuda PASSED [ 3%]
2023-01-11T23:13:47.0183618Z test_ops.py::TestCommonCUDA::test_dtypes__refs_unsqueeze_cuda PASSED [ 3%]
2023-01-11T23:13:47.0183938Z test_ops.py::TestCommonCUDA::test_dtypes__refs_var_cuda PASSED [ 3%]
2023-01-11T23:13:47.0184253Z test_ops.py::TestCommonCUDA::test_dtypes__refs_var_mean_cuda PASSED [ 3%]
2023-01-11T23:13:47.0184566Z test_ops.py::TestCommonCUDA::test_dtypes__refs_view_as_cuda PASSED [ 3%]
2023-01-11T23:13:47.0184883Z test_ops.py::TestCommonCUDA::test_dtypes__refs_vsplit_cuda PASSED [ 3%]
2023-01-11T23:13:47.0185205Z test_ops.py::TestCommonCUDA::test_dtypes__refs_zeros_cuda PASSED [ 3%]
2023-01-11T23:13:47.0185535Z test_ops.py::TestCommonCUDA::test_dtypes__softmax_backward_data_cuda PASSED [ 3%]
2023-01-11T23:13:47.0185856Z test_ops.py::TestCommonCUDA::test_dtypes_abs_cuda PASSED [ 3%]
2023-01-11T23:13:47.0186168Z test_ops.py::TestCommonCUDA::test_dtypes_acos_cuda PASSED [ 3%]
2023-01-11T23:13:47.0186480Z test_ops.py::TestCommonCUDA::test_dtypes_addbmm_cuda PASSED [ 3%]
2023-01-11T23:13:47.0186786Z test_ops.py::TestCommonCUDA::test_dtypes_addcdiv_cuda PASSED [ 3%]
2023-01-11T23:13:47.0187100Z test_ops.py::TestCommonCUDA::test_dtypes_addmm_cuda PASSED [ 3%]
2023-01-11T23:13:47.0187421Z test_ops.py::TestCommonCUDA::test_dtypes_addmm_decomposed_cuda PASSED [ 3%]
2023-01-11T23:13:47.0187742Z test_ops.py::TestCommonCUDA::test_dtypes_addr_cuda PASSED [ 3%]
2023-01-11T23:13:47.0188076Z test_ops.py::TestCommonCUDA::test_dtypes_allclose_cuda PASSED [ 3%]
2023-01-11T23:13:47.0188387Z test_ops.py::TestCommonCUDA::test_dtypes_amin_cuda PASSED [ 3%]
2023-01-11T23:13:47.0188700Z test_ops.py::TestCommonCUDA::test_dtypes_angle_cuda PASSED [ 3%]
2023-01-11T23:13:47.0189035Z test_ops.py::TestCommonCUDA::test_dtypes_argmax_cuda PASSED [ 3%]
2023-01-11T23:13:47.0189373Z test_ops.py::TestCommonCUDA::test_dtypes_argwhere_cuda PASSED [ 3%]
2023-01-11T23:13:47.0189757Z test_ops.py::TestCommonCUDA::test_dtypes_asin_cuda PASSED [ 3%]
2023-01-11T23:13:47.0190077Z test_ops.py::TestCommonCUDA::test_dtypes_atan2_cuda PASSED [ 3%]
2023-01-11T23:13:47.0190383Z test_ops.py::TestCommonCUDA::test_dtypes_atanh_cuda PASSED [ 3%]
2023-01-11T23:13:47.0190702Z test_ops.py::TestCommonCUDA::test_dtypes_atleast_3d_cuda PASSED [ 3%]
2023-01-11T23:13:47.0191023Z test_ops.py::TestCommonCUDA::test_dtypes_baddbmm_cuda PASSED [ 3%]
2023-01-11T23:13:47.0191332Z test_ops.py::TestCommonCUDA::test_dtypes_bincount_cuda PASSED [ 3%]
2023-01-11T23:13:47.0191651Z test_ops.py::TestCommonCUDA::test_dtypes_bitwise_and_cuda PASSED [ 3%]
2023-01-11T23:13:47.0191973Z test_ops.py::TestCommonCUDA::test_dtypes_bitwise_not_cuda PASSED [ 3%]
2023-01-11T23:13:47.0192294Z test_ops.py::TestCommonCUDA::test_dtypes_bitwise_or_cuda PASSED [ 3%]
2023-01-11T23:13:47.0192604Z test_ops.py::TestCommonCUDA::test_dtypes_bitwise_xor_cuda PASSED [ 3%]
2023-01-11T23:13:47.0192923Z test_ops.py::TestCommonCUDA::test_dtypes_block_diag_cuda PASSED [ 3%]
2023-01-11T23:13:47.0193247Z test_ops.py::TestCommonCUDA::test_dtypes_broadcast_tensors_cuda PASSED [ 3%]
2023-01-11T23:13:47.0193563Z test_ops.py::TestCommonCUDA::test_dtypes_cdouble_cuda PASSED [ 3%]
2023-01-11T23:13:47.0193876Z test_ops.py::TestCommonCUDA::test_dtypes_cfloat_cuda PASSED [ 3%]
2023-01-11T23:13:47.0194194Z test_ops.py::TestCommonCUDA::test_dtypes_cholesky_cuda PASSED [ 3%]
2023-01-11T23:13:47.0194517Z test_ops.py::TestCommonCUDA::test_dtypes_cholesky_solve_cuda PASSED [ 3%]
2023-01-11T23:13:47.0194860Z test_ops.py::TestCommonCUDA::test_dtypes_chunk_cuda PASSED [ 3%]
2023-01-11T23:13:47.0195179Z test_ops.py::TestCommonCUDA::test_dtypes_clamp_cuda PASSED [ 3%]
2023-01-11T23:13:47.0195496Z test_ops.py::TestCommonCUDA::test_dtypes_clamp_max_cuda PASSED [ 3%]
2023-01-11T23:13:47.0195807Z test_ops.py::TestCommonCUDA::test_dtypes_column_stack_cuda PASSED [ 3%]
2023-01-11T23:13:47.0196123Z test_ops.py::TestCommonCUDA::test_dtypes_complex_cuda PASSED [ 3%]
2023-01-11T23:13:47.0196439Z test_ops.py::TestCommonCUDA::test_dtypes_conj_cuda PASSED [ 3%]
2023-01-11T23:13:47.0196750Z test_ops.py::TestCommonCUDA::test_dtypes_constant_pad_nd_cuda PASSED [ 3%]
2023-01-11T23:13:47.0197077Z test_ops.py::TestCommonCUDA::test_dtypes_contiguous_cuda PASSED [ 3%]
2023-01-11T23:13:47.0197397Z test_ops.py::TestCommonCUDA::test_dtypes_corrcoef_cuda PASSED [ 3%]
2023-01-11T23:13:47.0197719Z test_ops.py::TestCommonCUDA::test_dtypes_cos_cuda PASSED [ 3%]
2023-01-11T23:13:47.0198022Z test_ops.py::TestCommonCUDA::test_dtypes_cummin_cuda PASSED [ 3%]
2023-01-11T23:13:47.0198335Z test_ops.py::TestCommonCUDA::test_dtypes_cumsum_cuda PASSED [ 3%]
2023-01-11T23:13:47.0198650Z test_ops.py::TestCommonCUDA::test_dtypes_deg2rad_cuda PASSED [ 3%]
2023-01-11T23:13:47.0198957Z test_ops.py::TestCommonCUDA::test_dtypes_diag_cuda PASSED [ 3%]
2023-01-11T23:13:47.0199273Z test_ops.py::TestCommonCUDA::test_dtypes_diagflat_cuda PASSED [ 3%]
2023-01-11T23:13:47.0199593Z test_ops.py::TestCommonCUDA::test_dtypes_diagonal_copy_cuda PASSED [ 3%]
2023-01-11T23:13:47.0199996Z test_ops.py::TestCommonCUDA::test_dtypes_diagonal_scatter_cuda PASSED [ 3%]
2023-01-11T23:13:47.0200307Z test_ops.py::TestCommonCUDA::test_dtypes_diff_cuda PASSED [ 3%]
2023-01-11T23:13:47.0200618Z test_ops.py::TestCommonCUDA::test_dtypes_dist_cuda PASSED [ 3%]
2023-01-11T23:13:47.0200944Z test_ops.py::TestCommonCUDA::test_dtypes_div_no_rounding_mode_cuda PASSED [ 3%]
2023-01-11T23:13:47.0201272Z test_ops.py::TestCommonCUDA::test_dtypes_div_trunc_rounding_cuda PASSED [ 3%]
2023-01-11T23:13:47.0201588Z test_ops.py::TestCommonCUDA::test_dtypes_double_cuda PASSED [ 3%]
2023-01-11T23:13:47.0201899Z test_ops.py::TestCommonCUDA::test_dtypes_dstack_cuda PASSED [ 3%]
2023-01-11T23:13:47.0202209Z test_ops.py::TestCommonCUDA::test_dtypes_einsum_cuda PASSED [ 3%]
2023-01-11T23:13:47.0202517Z test_ops.py::TestCommonCUDA::test_dtypes_empty_like_cuda PASSED [ 3%]
2023-01-11T23:13:47.0202833Z test_ops.py::TestCommonCUDA::test_dtypes_eq_cuda PASSED [ 3%]
2023-01-11T23:13:47.0203139Z test_ops.py::TestCommonCUDA::test_dtypes_erfc_cuda PASSED [ 3%]
2023-01-11T23:13:47.0203441Z test_ops.py::TestCommonCUDA::test_dtypes_erfinv_cuda PASSED [ 3%]
2023-01-11T23:13:47.0203751Z test_ops.py::TestCommonCUDA::test_dtypes_exp2_cuda PASSED [ 3%]
2023-01-11T23:13:47.0204054Z test_ops.py::TestCommonCUDA::test_dtypes_exp_cuda PASSED [ 3%]
2023-01-11T23:13:47.0204361Z test_ops.py::TestCommonCUDA::test_dtypes_expand_as_cuda PASSED [ 3%]
2023-01-11T23:13:47.0204668Z test_ops.py::TestCommonCUDA::test_dtypes_expand_cuda PASSED [ 3%]
2023-01-11T23:13:47.0204978Z test_ops.py::TestCommonCUDA::test_dtypes_expm1_cuda PASSED [ 3%]
2023-01-11T23:13:47.0205289Z test_ops.py::TestCommonCUDA::test_dtypes_eye_cuda PASSED [ 3%]
2023-01-11T23:13:47.0205592Z test_ops.py::TestCommonCUDA::test_dtypes_fft_fft2_cuda PASSED [ 3%]
2023-01-11T23:13:47.0205904Z test_ops.py::TestCommonCUDA::test_dtypes_fft_fft_cuda PASSED [ 3%]
2023-01-11T23:13:47.0206213Z test_ops.py::TestCommonCUDA::test_dtypes_fft_fftshift_cuda PASSED [ 3%]
2023-01-11T23:13:47.0206559Z test_ops.py::TestCommonCUDA::test_dtypes_fft_hfft2_cuda PASSED [ 3%]
2023-01-11T23:13:47.0206864Z test_ops.py::TestCommonCUDA::test_dtypes_fft_hfft_cuda PASSED [ 3%]
2023-01-11T23:13:47.0207174Z test_ops.py::TestCommonCUDA::test_dtypes_fft_ifft2_cuda PASSED [ 3%]
2023-01-11T23:13:47.0207486Z test_ops.py::TestCommonCUDA::test_dtypes_fft_ifft_cuda PASSED [ 3%]
2023-01-11T23:13:47.0207789Z test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfft_cuda PASSED [ 3%]
2023-01-11T23:13:47.0208104Z test_ops.py::TestCommonCUDA::test_dtypes_fft_irfft2_cuda PASSED [ 3%]
2023-01-11T23:13:47.0208411Z test_ops.py::TestCommonCUDA::test_dtypes_fft_irfft_cuda PASSED [ 3%]
2023-01-11T23:13:47.0208720Z test_ops.py::TestCommonCUDA::test_dtypes_fft_rfft2_cuda PASSED [ 3%]
2023-01-11T23:13:47.0209029Z test_ops.py::TestCommonCUDA::test_dtypes_fft_rfft_cuda PASSED [ 3%]
2023-01-11T23:13:47.0209378Z test_ops.py::TestCommonCUDA::test_dtypes_fill_cuda PASSED [ 3%]
2023-01-11T23:13:47.0209686Z test_ops.py::TestCommonCUDA::test_dtypes_flip_cuda PASSED [ 3%]
2023-01-11T23:13:47.0209989Z test_ops.py::TestCommonCUDA::test_dtypes_float_cuda PASSED [ 3%]
2023-01-11T23:13:47.0210296Z test_ops.py::TestCommonCUDA::test_dtypes_floor_cuda PASSED [ 3%]
2023-01-11T23:13:47.0210602Z test_ops.py::TestCommonCUDA::test_dtypes_fmod_cuda PASSED [ 3%]
2023-01-11T23:13:47.0210912Z test_ops.py::TestCommonCUDA::test_dtypes_frexp_cuda PASSED [ 3%]
2023-01-11T23:13:47.0211215Z test_ops.py::TestCommonCUDA::test_dtypes_gather_cuda PASSED [ 3%]
2023-01-11T23:13:47.0211519Z test_ops.py::TestCommonCUDA::test_dtypes_gcd_cuda PASSED [ 3%]
2023-01-11T23:13:47.0211850Z test_ops.py::TestCommonCUDA::test_dtypes_ge_cuda PASSED [ 3%]
2023-01-11T23:13:47.0212151Z test_ops.py::TestCommonCUDA::test_dtypes_half_cuda PASSED [ 3%]
2023-01-11T23:13:47.0212462Z test_ops.py::TestCommonCUDA::test_dtypes_heaviside_cuda PASSED [ 3%]
2023-01-11T23:13:47.0212770Z test_ops.py::TestCommonCUDA::test_dtypes_histc_cuda PASSED [ 3%]
2023-01-11T23:13:47.0213083Z test_ops.py::TestCommonCUDA::test_dtypes_histogramdd_cuda PASSED [ 3%]
2023-01-11T23:13:47.0213392Z test_ops.py::TestCommonCUDA::test_dtypes_hsplit_cuda PASSED [ 3%]
2023-01-11T23:13:47.0213699Z test_ops.py::TestCommonCUDA::test_dtypes_igamma_cuda PASSED [ 3%]
2023-01-11T23:13:47.0214009Z test_ops.py::TestCommonCUDA::test_dtypes_igammac_cuda PASSED [ 3%]
2023-01-11T23:13:47.0214311Z test_ops.py::TestCommonCUDA::test_dtypes_imag_cuda PASSED [ 3%]
2023-01-11T23:13:47.0214746Z test_ops.py::TestCommonCUDA::test_dtypes_index_copy_cuda PASSED [ 3%]
2023-01-11T23:13:47.0215057Z test_ops.py::TestCommonCUDA::test_dtypes_index_fill_cuda PASSED [ 3%]
2023-01-11T23:13:47.0215365Z test_ops.py::TestCommonCUDA::test_dtypes_index_put_cuda PASSED [ 3%]
2023-01-11T23:13:47.0215674Z test_ops.py::TestCommonCUDA::test_dtypes_index_select_cuda PASSED [ 3%]
2023-01-11T23:13:47.0215980Z test_ops.py::TestCommonCUDA::test_dtypes_inner_cuda PASSED [ 3%]
2023-01-11T23:13:47.0216285Z test_ops.py::TestCommonCUDA::test_dtypes_isclose_cuda PASSED [ 3%]
2023-01-11T23:13:47.0216585Z test_ops.py::TestCommonCUDA::test_dtypes_isposinf_cuda PASSED [ 3%]
2023-01-11T23:13:47.0216892Z test_ops.py::TestCommonCUDA::test_dtypes_istft_cuda PASSED [ 3%]
2023-01-11T23:13:47.0217219Z test_ops.py::TestCommonCUDA::test_dtypes_jiterator_2inputs_2outputs_cuda PASSED [ 3%]
2023-01-11T23:13:47.0217574Z test_ops.py::TestCommonCUDA::test_dtypes_jiterator_4inputs_with_extra_args_cuda PASSED [ 3%]
2023-01-11T23:13:47.0217900Z test_ops.py::TestCommonCUDA::test_dtypes_le_cuda PASSED [ 3%]
2023-01-11T23:13:47.0218202Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_cond_cuda PASSED [ 3%]
2023-01-11T23:13:47.0218563Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_det_singular_cuda PASSED [ 3%]
2023-01-11T23:13:47.0218884Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigh_cuda PASSED [ 3%]
2023-01-11T23:13:47.0219220Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigvals_cuda PASSED [ 3%]
2023-01-11T23:13:47.0219576Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_householder_product_cuda PASSED [ 3%]
2023-01-11T23:13:47.0219906Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_inv_cuda PASSED [ 3%]
2023-01-11T23:13:47.0220211Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_inv_ex_cuda PASSED [ 3%]
2023-01-11T23:13:47.0220524Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_factor_cuda PASSED [ 3%]
2023-01-11T23:13:47.0220853Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_factor_ex_cuda PASSED [ 3%]
2023-01-11T23:13:47.0221172Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_solve_cuda PASSED [ 3%]
2023-01-11T23:13:47.0221488Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_lstsq_cuda PASSED [ 3%]
2023-01-11T23:13:47.0221801Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_multi_dot_cuda PASSED [ 3%]
2023-01-11T23:13:47.0222111Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_norm_cuda PASSED [ 3%]
2023-01-11T23:13:47.0222438Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_norm_subgradients_at_zero_cuda PASSED [ 3%]
2023-01-11T23:13:47.0222771Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_pinv_cuda PASSED [ 3%]
2023-01-11T23:13:47.0223095Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_pinv_hermitian_cuda PASSED [ 3%]
2023-01-11T23:13:47.0223492Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_pinv_singular_cuda SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 3%]
2023-01-11T23:13:47.0223932Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_qr_cuda PASSED [ 3%]
2023-01-11T23:13:47.0224245Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_slogdet_cuda PASSED [ 3%]
2023-01-11T23:13:47.0224568Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_ex_cuda PASSED [ 4%]
2023-01-11T23:13:47.0224882Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorsolve_cuda PASSED [ 4%]
2023-01-11T23:13:47.0225200Z test_ops.py::TestCommonCUDA::test_dtypes_linspace_cuda PASSED [ 4%]
2023-01-11T23:13:47.0225507Z test_ops.py::TestCommonCUDA::test_dtypes_log10_cuda PASSED [ 4%]
2023-01-11T23:13:47.0225803Z test_ops.py::TestCommonCUDA::test_dtypes_log1p_cuda PASSED [ 4%]
2023-01-11T23:13:47.0226104Z test_ops.py::TestCommonCUDA::test_dtypes_log2_cuda PASSED [ 4%]
2023-01-11T23:13:47.0226403Z test_ops.py::TestCommonCUDA::test_dtypes_log_cuda PASSED [ 4%]
2023-01-11T23:13:47.0226710Z test_ops.py::TestCommonCUDA::test_dtypes_log_softmax_cuda PASSED [ 4%]
2023-01-11T23:13:47.0227030Z test_ops.py::TestCommonCUDA::test_dtypes_log_softmax_with_dtype_cuda PASSED [ 4%]
2023-01-11T23:13:47.0227361Z test_ops.py::TestCommonCUDA::test_dtypes_logical_or_cuda PASSED [ 4%]
2023-01-11T23:13:47.0227663Z test_ops.py::TestCommonCUDA::test_dtypes_logit_cuda PASSED [ 4%]
2023-01-11T23:13:47.0227963Z test_ops.py::TestCommonCUDA::test_dtypes_logspace_cuda PASSED [ 4%]
2023-01-11T23:13:47.0228271Z test_ops.py::TestCommonCUDA::test_dtypes_lu_unpack_cuda PASSED [ 4%]
2023-01-11T23:13:47.0228575Z test_ops.py::TestCommonCUDA::test_dtypes_mH_cuda PASSED [ 4%]
2023-01-11T23:13:47.0228873Z test_ops.py::TestCommonCUDA::test_dtypes_mT_cuda PASSED [ 4%]
2023-01-11T23:13:47.0229197Z test_ops.py::TestCommonCUDA::test_dtypes_masked_amax_cuda PASSED [ 4%]
2023-01-11T23:13:47.0229534Z test_ops.py::TestCommonCUDA::test_dtypes_masked_argmax_cuda PASSED [ 4%]
2023-01-11T23:13:47.0229931Z test_ops.py::TestCommonCUDA::test_dtypes_masked_argmin_cuda PASSED [ 4%]
2023-01-11T23:13:47.0230273Z test_ops.py::TestCommonCUDA::test_dtypes_masked_cumprod_cuda PASSED [ 4%]
2023-01-11T23:13:47.0230586Z test_ops.py::TestCommonCUDA::test_dtypes_masked_cumsum_cuda PASSED [ 4%]
2023-01-11T23:13:47.0230900Z test_ops.py::TestCommonCUDA::test_dtypes_masked_normalize_cuda PASSED [ 4%]
2023-01-11T23:13:47.0231218Z test_ops.py::TestCommonCUDA::test_dtypes_masked_prod_cuda PASSED [ 4%]
2023-01-11T23:13:47.0231526Z test_ops.py::TestCommonCUDA::test_dtypes_masked_scatter_cuda PASSED [ 4%]
2023-01-11T23:13:47.0231837Z test_ops.py::TestCommonCUDA::test_dtypes_masked_select_cuda PASSED [ 4%]
2023-01-11T23:13:47.0232148Z test_ops.py::TestCommonCUDA::test_dtypes_matmul_cuda PASSED [ 4%]
2023-01-11T23:13:47.0232449Z test_ops.py::TestCommonCUDA::test_dtypes_max_binary_cuda PASSED [ 4%]
2023-01-11T23:13:47.0232785Z test_ops.py::TestCommonCUDA::test_dtypes_max_pool2d_with_indices_backward_cuda PASSED [ 4%]
2023-01-11T23:13:47.0233138Z test_ops.py::TestCommonCUDA::test_dtypes_max_reduction_with_dim_cuda PASSED [ 4%]
2023-01-11T23:13:47.0233461Z test_ops.py::TestCommonCUDA::test_dtypes_maximum_cuda PASSED [ 4%]
2023-01-11T23:13:47.0233762Z test_ops.py::TestCommonCUDA::test_dtypes_median_cuda PASSED [ 4%]
2023-01-11T23:13:47.0234084Z test_ops.py::TestCommonCUDA::test_dtypes_meshgrid_list_of_tensors_cuda PASSED [ 4%]
2023-01-11T23:13:47.0234422Z test_ops.py::TestCommonCUDA::test_dtypes_min_reduction_with_dim_cuda PASSED [ 4%]
2023-01-11T23:13:47.0234736Z test_ops.py::TestCommonCUDA::test_dtypes_mode_cuda PASSED [ 4%]
2023-01-11T23:13:47.0235041Z test_ops.py::TestCommonCUDA::test_dtypes_mul_cuda PASSED [ 4%]
2023-01-11T23:13:47.0235347Z test_ops.py::TestCommonCUDA::test_dtypes_multinomial_cuda PASSED [ 4%]
2023-01-11T23:13:47.0235691Z test_ops.py::TestCommonCUDA::test_dtypes_mv_cuda PASSED [ 4%]
2023-01-11T23:13:47.0236005Z test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_1_cuda PASSED [ 4%]
2023-01-11T23:13:47.0236338Z test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_3_cuda PASSED [ 4%]
2023-01-11T23:13:47.0236659Z test_ops.py::TestCommonCUDA::test_dtypes_nan_to_num_cuda PASSED [ 4%]
2023-01-11T23:13:47.0236963Z test_ops.py::TestCommonCUDA::test_dtypes_nanmedian_cuda PASSED [ 4%]
2023-01-11T23:13:47.0237279Z test_ops.py::TestCommonCUDA::test_dtypes_nanquantile_cuda PASSED [ 4%]
2023-01-11T23:13:47.0237585Z test_ops.py::TestCommonCUDA::test_dtypes_nansum_cuda PASSED [ 4%]
2023-01-11T23:13:47.0237895Z test_ops.py::TestCommonCUDA::test_dtypes_narrow_copy_cuda PASSED [ 4%]
2023-01-11T23:13:47.0238206Z test_ops.py::TestCommonCUDA::test_dtypes_native_layer_norm_cuda PASSED [ 4%]
2023-01-11T23:13:47.0238516Z test_ops.py::TestCommonCUDA::test_dtypes_ne_cuda PASSED [ 4%]
2023-01-11T23:13:47.0238819Z test_ops.py::TestCommonCUDA::test_dtypes_neg_cuda PASSED [ 4%]
2023-01-11T23:13:47.0239126Z test_ops.py::TestCommonCUDA::test_dtypes_new_empty_strided_cuda PASSED [ 4%]
2023-01-11T23:13:47.0239492Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional__scaled_dot_product_attention_cuda SKIPPED (Skipped!) [ 4%]
2023-01-11T23:13:47.0239874Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool2d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0240232Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool3d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0240583Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool1d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0240930Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_avg_pool2d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0241264Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_avg_pool3d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0241619Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_batch_norm_without_cudnn_cuda PASSED [ 4%]
2023-01-11T23:13:47.0241961Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_celu_cuda PASSED [ 4%]
2023-01-11T23:13:47.0242315Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv2d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0242663Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose1d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0243011Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cosine_similarity_cuda PASSED [ 4%]
2023-01-11T23:13:47.0243354Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_dropout_cuda PASSED [ 4%]
2023-01-11T23:13:47.0243684Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_elu_cuda PASSED [ 4%]
2023-01-11T23:13:47.0244020Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_embedding_bag_cuda PASSED [ 4%]
2023-01-11T23:13:47.0244383Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_feature_alpha_dropout_with_train_cuda PASSED [ 4%]
2023-01-11T23:13:47.0244776Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_feature_alpha_dropout_without_train_cuda PASSED [ 4%]
2023-01-11T23:13:47.0245153Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_fractional_max_pool3d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0245499Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_grid_sample_cuda PASSED [ 4%]
2023-01-11T23:13:47.0245838Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_group_norm_cuda PASSED [ 4%]
2023-01-11T23:13:47.0246174Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardshrink_cuda PASSED [ 4%]
2023-01-11T23:13:47.0246514Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardsigmoid_cuda PASSED [ 4%]
2023-01-11T23:13:47.0246847Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardswish_cuda PASSED [ 4%]
2023-01-11T23:13:47.0247180Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_huber_loss_cuda PASSED [ 4%]
2023-01-11T23:13:47.0247555Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_bilinear_cuda PASSED [ 4%]
2023-01-11T23:13:47.0247913Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_nearest_cuda PASSED [ 4%]
2023-01-11T23:13:47.0248279Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_trilinear_cuda PASSED [ 4%]
2023-01-11T23:13:47.0248627Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_kl_div_cuda PASSED [ 4%]
2023-01-11T23:13:47.0248961Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_layer_norm_cuda PASSED [ 4%]
2023-01-11T23:13:47.0249288Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_linear_cuda PASSED [ 4%]
2023-01-11T23:13:47.0249630Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_local_response_norm_cuda PASSED [ 4%]
2023-01-11T23:13:47.0249975Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_logsigmoid_cuda PASSED [ 4%]
2023-01-11T23:13:47.0250303Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool1d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0250635Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool3d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0250978Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool1d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0251329Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool1d_grad_cuda PASSED [ 4%]
2023-01-11T23:13:47.0251672Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool2d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0252051Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool3d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0252451Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_mse_loss_cuda PASSED [ 4%]
2023-01-11T23:13:47.0252795Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_multi_margin_loss_cuda PASSED [ 4%]
2023-01-11T23:13:47.0253153Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_multilabel_soft_margin_loss_cuda PASSED [ 4%]
2023-01-11T23:13:47.0253506Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_normalize_cuda PASSED [ 4%]
2023-01-11T23:13:47.0253844Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_one_hot_cuda PASSED [ 4%]
2023-01-11T23:13:47.0254177Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_constant_cuda PASSED [ 4%]
2023-01-11T23:13:47.0254686Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_replicate_cuda PASSED [ 4%]
2023-01-11T23:13:47.0255031Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pdist_cuda PASSED [ 4%]
2023-01-11T23:13:47.0255366Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pixel_shuffle_cuda PASSED [ 4%]
2023-01-11T23:13:47.0255706Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pixel_unshuffle_cuda PASSED [ 4%]
2023-01-11T23:13:47.0256051Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_poisson_nll_loss_cuda PASSED [ 4%]
2023-01-11T23:13:47.0256389Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_relu6_cuda PASSED [ 4%]
2023-01-11T23:13:47.0256707Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_relu_cuda PASSED [ 4%]
2023-01-11T23:13:47.0257033Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_rrelu_cuda PASSED [ 4%]
2023-01-11T23:13:47.0257356Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_selu_cuda PASSED [ 4%]
2023-01-11T23:13:47.0257680Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_silu_cuda PASSED [ 4%]
2023-01-11T23:13:47.0258013Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_soft_margin_loss_cuda PASSED [ 4%]
2023-01-11T23:13:47.0258355Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softmin_cuda PASSED [ 4%]
2023-01-11T23:13:47.0258701Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softmin_with_dtype_cuda PASSED [ 4%]
2023-01-11T23:13:47.0259039Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softshrink_cuda PASSED [ 4%]
2023-01-11T23:13:47.0259383Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softsign_cuda PASSED [ 4%]
2023-01-11T23:13:47.0259768Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_tanhshrink_cuda PASSED [ 4%]
2023-01-11T23:13:47.0260156Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_triplet_margin_loss_cuda PASSED [ 4%]
2023-01-11T23:13:47.0260529Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_triplet_margin_with_distance_loss_cuda PASSED [ 4%]
2023-01-11T23:13:47.0260901Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_upsample_bilinear_cuda PASSED [ 4%]
2023-01-11T23:13:47.0261255Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_upsample_nearest_cuda PASSED [ 4%] 2023-01-11T23:13:47.0261586Z test_ops.py::TestCommonCUDA::test_dtypes_nonzero_cuda PASSED [ 4%] 2023-01-11T23:13:47.0261900Z test_ops.py::TestCommonCUDA::test_dtypes_norm_cuda PASSED [ 4%] 2023-01-11T23:13:47.0262210Z test_ops.py::TestCommonCUDA::test_dtypes_norm_fro_cuda PASSED [ 4%] 2023-01-11T23:13:47.0262526Z test_ops.py::TestCommonCUDA::test_dtypes_norm_nuc_cuda PASSED [ 4%] 2023-01-11T23:13:47.0262838Z test_ops.py::TestCommonCUDA::test_dtypes_normal_cuda PASSED [ 4%] 2023-01-11T23:13:47.0263147Z test_ops.py::TestCommonCUDA::test_dtypes_ones_cuda PASSED [ 4%] 2023-01-11T23:13:47.0263483Z test_ops.py::TestCommonCUDA::test_dtypes_ops_nvprims_native_batch_norm_cuda PASSED [ 4%] 2023-01-11T23:13:47.0263806Z test_ops.py::TestCommonCUDA::test_dtypes_ormqr_cuda PASSED [ 4%] 2023-01-11T23:13:47.0264112Z test_ops.py::TestCommonCUDA::test_dtypes_outer_cuda PASSED [ 4%] 2023-01-11T23:13:47.0264417Z test_ops.py::TestCommonCUDA::test_dtypes_permute_cuda PASSED [ 4%] 2023-01-11T23:13:47.0264726Z test_ops.py::TestCommonCUDA::test_dtypes_polar_cuda PASSED [ 4%] 2023-01-11T23:13:47.0265050Z test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_0_cuda PASSED [ 4%] 2023-01-11T23:13:47.0265406Z test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_2_cuda SKIPPED (Skipped!) [ 4%] 2023-01-11T23:13:47.0265774Z test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_3_cuda SKIPPED (Skipped!) [ 4%] 2023-01-11T23:13:47.0266112Z test_ops.py::TestCommonCUDA::test_dtypes_positive_cuda PASSED [ 4%] 2023-01-11T23:13:47.0266422Z test_ops.py::TestCommonCUDA::test_dtypes_pow_cuda PASSED [ 4%] 2023-01-11T23:13:47.0266757Z test_ops.py::TestCommonCUDA::test_dtypes_prod_cuda PASSED [ 4%] 2023-01-11T23:13:47.0267064Z test_ops.py::TestCommonCUDA::test_dtypes_put_cuda PASSED [ 4%] 2023-01-11T23:13:47.0267363Z test_ops.py::TestCommonCUDA::test_dtypes_randint_cuda PASSED [ 4%] 2023-01-11T23:13:47.0267669Z test_ops.py::TestCommonCUDA::test_dtypes_randn_cuda PASSED [ 4%] 2023-01-11T23:13:47.0267976Z test_ops.py::TestCommonCUDA::test_dtypes_randn_like_cuda PASSED [ 4%] 2023-01-11T23:13:47.0268275Z test_ops.py::TestCommonCUDA::test_dtypes_ravel_cuda PASSED [ 4%] 2023-01-11T23:13:47.0268575Z test_ops.py::TestCommonCUDA::test_dtypes_renorm_cuda PASSED [ 4%] 2023-01-11T23:13:47.0268899Z test_ops.py::TestCommonCUDA::test_dtypes_repeat_interleave_cuda PASSED [ 4%] 2023-01-11T23:13:47.0269214Z test_ops.py::TestCommonCUDA::test_dtypes_reshape_as_cuda PASSED [ 4%] 2023-01-11T23:13:47.0269520Z test_ops.py::TestCommonCUDA::test_dtypes_reshape_cuda PASSED [ 4%] 2023-01-11T23:13:47.0269889Z test_ops.py::TestCommonCUDA::test_dtypes_resize_as__cuda XFAIL [ 4%] 2023-01-11T23:13:47.0270193Z test_ops.py::TestCommonCUDA::test_dtypes_round_cuda PASSED [ 4%] 2023-01-11T23:13:47.0270495Z test_ops.py::TestCommonCUDA::test_dtypes_round_decimals_0_cuda PASSED [ 4%] 2023-01-11T23:13:47.0270828Z test_ops.py::TestCommonCUDA::test_dtypes_round_decimals_neg_3_cuda SKIPPED (Skipped!) 
[ 4%] 2023-01-11T23:13:47.0271157Z test_ops.py::TestCommonCUDA::test_dtypes_rsqrt_cuda PASSED [ 4%] 2023-01-11T23:13:47.0271457Z test_ops.py::TestCommonCUDA::test_dtypes_rsub_cuda PASSED [ 4%] 2023-01-11T23:13:47.0271792Z test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_amin_cuda PASSED [ 4%] 2023-01-11T23:13:47.0272108Z test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_mean_cuda PASSED [ 4%] 2023-01-11T23:13:47.0272439Z test_ops.py::TestCommonCUDA::test_dtypes_segment_reduce_offsets_cuda PASSED [ 4%] 2023-01-11T23:13:47.0272751Z test_ops.py::TestCommonCUDA::test_dtypes_select_cuda PASSED [ 4%] 2023-01-11T23:13:47.0273050Z test_ops.py::TestCommonCUDA::test_dtypes_sgn_cuda PASSED [ 4%] 2023-01-11T23:13:47.0273347Z test_ops.py::TestCommonCUDA::test_dtypes_sign_cuda PASSED [ 4%] 2023-01-11T23:13:47.0273667Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_blackman_cuda PASSED [ 4%] 2023-01-11T23:13:47.0274001Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_cosine_cuda PASSED [ 4%] 2023-01-11T23:13:47.0274338Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_exponential_cuda PASSED [ 4%] 2023-01-11T23:13:47.0274682Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_gaussian_cuda PASSED [ 4%] 2023-01-11T23:13:47.0275013Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_hamming_cuda PASSED [ 5%] 2023-01-11T23:13:47.0275347Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_kaiser_cuda PASSED [ 5%] 2023-01-11T23:13:47.0275680Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_nuttall_cuda PASSED [ 5%] 2023-01-11T23:13:47.0276000Z test_ops.py::TestCommonCUDA::test_dtypes_signbit_cuda PASSED [ 5%] 2023-01-11T23:13:47.0276298Z test_ops.py::TestCommonCUDA::test_dtypes_sin_cuda PASSED [ 5%] 2023-01-11T23:13:47.0276605Z test_ops.py::TestCommonCUDA::test_dtypes_slice_scatter_cuda PASSED [ 5%] 2023-01-11T23:13:47.0276918Z test_ops.py::TestCommonCUDA::test_dtypes_softmax_cuda PASSED [ 5%] 2023-01-11T23:13:47.0277236Z test_ops.py::TestCommonCUDA::test_dtypes_sparse_sampled_addmm_cuda PASSED [ 5%] 2023-01-11T23:13:47.0277556Z test_ops.py::TestCommonCUDA::test_dtypes_special_airy_ai_cuda PASSED [ 5%] 2023-01-11T23:13:47.0277874Z test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_j0_cuda PASSED [ 5%] 2023-01-11T23:13:47.0278188Z test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_y0_cuda PASSED [ 5%] 2023-01-11T23:13:47.0278528Z test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_y1_cuda PASSED [ 5%] 2023-01-11T23:13:47.0278866Z test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_t_cuda PASSED [ 5%] 2023-01-11T23:13:47.0279470Z test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_v_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:13:47.0280031Z test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_w_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:13:47.0280416Z test_ops.py::TestCommonCUDA::test_dtypes_special_entr_cuda PASSED [ 5%] 2023-01-11T23:13:47.0280752Z test_ops.py::TestCommonCUDA::test_dtypes_special_hermite_polynomial_h_cuda PASSED [ 5%] 2023-01-11T23:13:47.0281104Z test_ops.py::TestCommonCUDA::test_dtypes_special_hermite_polynomial_he_cuda PASSED [ 5%] 2023-01-11T23:13:47.0281448Z test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_i0_cuda PASSED [ 5%] 2023-01-11T23:13:47.0281789Z test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_k0_cuda PASSED [ 5%] 
2023-01-11T23:13:47.0282131Z test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_k1_cuda PASSED [ 5%] 2023-01-11T23:13:47.0282458Z test_ops.py::TestCommonCUDA::test_dtypes_special_ndtri_cuda PASSED [ 5%] 2023-01-11T23:13:47.0282803Z test_ops.py::TestCommonCUDA::test_dtypes_special_polygamma_special_polygamma_n_0_cuda PASSED [ 5%] 2023-01-11T23:13:47.0283166Z test_ops.py::TestCommonCUDA::test_dtypes_special_scaled_modified_bessel_k0_cuda PASSED [ 5%] 2023-01-11T23:13:47.0283688Z test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_t_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:13:47.0284302Z test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_w_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:13:47.0284708Z test_ops.py::TestCommonCUDA::test_dtypes_special_spherical_bessel_j0_cuda PASSED [ 5%] 2023-01-11T23:13:47.0285041Z test_ops.py::TestCommonCUDA::test_dtypes_special_zeta_cuda PASSED [ 5%] 2023-01-11T23:13:47.0285359Z test_ops.py::TestCommonCUDA::test_dtypes_split_list_args_cuda PASSED [ 5%] 2023-01-11T23:13:47.0285673Z test_ops.py::TestCommonCUDA::test_dtypes_split_with_sizes_cuda PASSED [ 5%] 2023-01-11T23:13:47.0285986Z test_ops.py::TestCommonCUDA::test_dtypes_squeeze_cuda PASSED [ 5%] 2023-01-11T23:13:47.0286297Z test_ops.py::TestCommonCUDA::test_dtypes_stack_cuda PASSED [ 5%] 2023-01-11T23:13:47.0286600Z test_ops.py::TestCommonCUDA::test_dtypes_std_mean_cuda PASSED [ 5%] 2023-01-11T23:13:47.0286909Z test_ops.py::TestCommonCUDA::test_dtypes_std_unbiased_cuda PASSED [ 5%] 2023-01-11T23:13:47.0287210Z test_ops.py::TestCommonCUDA::test_dtypes_stft_cuda PASSED [ 5%] 2023-01-11T23:13:47.0287512Z test_ops.py::TestCommonCUDA::test_dtypes_sum_cuda PASSED [ 5%] 2023-01-11T23:13:47.0287809Z test_ops.py::TestCommonCUDA::test_dtypes_sum_to_size_cuda PASSED [ 5%] 2023-01-11T23:13:47.0288110Z test_ops.py::TestCommonCUDA::test_dtypes_svd_cuda PASSED [ 5%] 2023-01-11T23:13:47.0288405Z test_ops.py::TestCommonCUDA::test_dtypes_t_cuda PASSED [ 5%] 2023-01-11T23:13:47.0288706Z test_ops.py::TestCommonCUDA::test_dtypes_take_along_dim_cuda PASSED [ 5%] 2023-01-11T23:13:47.0289022Z test_ops.py::TestCommonCUDA::test_dtypes_tan_cuda PASSED [ 5%] 2023-01-11T23:13:47.0289354Z test_ops.py::TestCommonCUDA::test_dtypes_tile_cuda PASSED [ 5%] 2023-01-11T23:13:47.0289649Z test_ops.py::TestCommonCUDA::test_dtypes_to_cuda PASSED [ 5%] 2023-01-11T23:13:47.0289946Z test_ops.py::TestCommonCUDA::test_dtypes_topk_cuda PASSED [ 5%] 2023-01-11T23:13:47.0290246Z test_ops.py::TestCommonCUDA::test_dtypes_trace_cuda PASSED [ 5%] 2023-01-11T23:13:47.0290577Z test_ops.py::TestCommonCUDA::test_dtypes_trapezoid_cuda PASSED [ 5%] 2023-01-11T23:13:47.0290884Z test_ops.py::TestCommonCUDA::test_dtypes_trapz_cuda PASSED [ 5%] 2023-01-11T23:13:47.0291185Z test_ops.py::TestCommonCUDA::test_dtypes_tril_indices_cuda PASSED [ 5%] 2023-01-11T23:13:47.0291492Z test_ops.py::TestCommonCUDA::test_dtypes_triu_indices_cuda PASSED [ 5%] 2023-01-11T23:13:47.0291798Z test_ops.py::TestCommonCUDA::test_dtypes_true_divide_cuda PASSED [ 5%] 2023-01-11T23:13:47.0292097Z test_ops.py::TestCommonCUDA::test_dtypes_trunc_cuda PASSED [ 5%] 2023-01-11T23:13:47.0292396Z test_ops.py::TestCommonCUDA::test_dtypes_unbind_cuda PASSED [ 5%] 2023-01-11T23:13:47.0292702Z test_ops.py::TestCommonCUDA::test_dtypes_unflatten_cuda PASSED [ 5%] 2023-01-11T23:13:47.0293005Z test_ops.py::TestCommonCUDA::test_dtypes_unique_cuda 
PASSED [ 5%] 2023-01-11T23:13:47.0293300Z test_ops.py::TestCommonCUDA::test_dtypes_var_cuda PASSED [ 5%] 2023-01-11T23:13:47.0293605Z test_ops.py::TestCommonCUDA::test_dtypes_var_mean_unbiased_cuda PASSED [ 5%] 2023-01-11T23:13:47.0293916Z test_ops.py::TestCommonCUDA::test_dtypes_var_unbiased_cuda PASSED [ 5%] 2023-01-11T23:13:47.0294212Z test_ops.py::TestCommonCUDA::test_dtypes_vdot_cuda PASSED [ 5%] 2023-01-11T23:13:47.0294618Z test_ops.py::TestCommonCUDA::test_dtypes_view_as_complex_cuda PASSED [ 5%] 2023-01-11T23:13:47.0294928Z test_ops.py::TestCommonCUDA::test_dtypes_view_as_cuda PASSED [ 5%] 2023-01-11T23:13:47.0295230Z test_ops.py::TestCommonCUDA::test_dtypes_view_copy_cuda PASSED [ 5%] 2023-01-11T23:13:47.0295576Z test_ops.py::TestCommonCUDA::test_dtypes_view_cuda PASSED [ 5%] 2023-01-11T23:13:47.0295877Z test_ops.py::TestCommonCUDA::test_dtypes_zeros_cuda PASSED [ 5%] 2023-01-11T23:13:47.0296177Z test_ops.py::TestCommonCUDA::test_errors_T_cuda PASSED [ 5%] 2023-01-11T23:13:47.0296470Z test_ops.py::TestCommonCUDA::test_errors___rand___cuda PASSED [ 5%] 2023-01-11T23:13:47.0296765Z test_ops.py::TestCommonCUDA::test_errors___rmod___cuda PASSED [ 5%] 2023-01-11T23:13:47.0297060Z test_ops.py::TestCommonCUDA::test_errors___ror___cuda PASSED [ 5%] 2023-01-11T23:13:47.0297360Z test_ops.py::TestCommonCUDA::test_errors_add_cuda PASSED [ 5%] 2023-01-11T23:13:47.0297654Z test_ops.py::TestCommonCUDA::test_errors_aminmax_cuda PASSED [ 5%] 2023-01-11T23:13:47.0297951Z test_ops.py::TestCommonCUDA::test_errors_arange_cuda PASSED [ 5%] 2023-01-11T23:13:47.0298259Z test_ops.py::TestCommonCUDA::test_errors_as_strided_scatter_cuda PASSED [ 5%] 2023-01-11T23:13:47.0298568Z test_ops.py::TestCommonCUDA::test_errors_bitwise_and_cuda PASSED [ 5%] 2023-01-11T23:13:47.0298867Z test_ops.py::TestCommonCUDA::test_errors_cat_cuda PASSED [ 5%] 2023-01-11T23:13:47.0299196Z test_ops.py::TestCommonCUDA::test_errors_clamp_max_cuda XFAIL [ 5%] 2023-01-11T23:13:47.0299520Z test_ops.py::TestCommonCUDA::test_errors_clamp_min_cuda XFAIL [ 5%] 2023-01-11T23:13:47.0299822Z test_ops.py::TestCommonCUDA::test_errors_diagonal_cuda PASSED [ 5%] 2023-01-11T23:13:47.0300130Z test_ops.py::TestCommonCUDA::test_errors_div_floor_rounding_cuda PASSED [ 5%] 2023-01-11T23:13:47.0300444Z test_ops.py::TestCommonCUDA::test_errors_div_trunc_rounding_cuda PASSED [ 5%] 2023-01-11T23:13:47.0300750Z test_ops.py::TestCommonCUDA::test_errors_dsplit_cuda PASSED [ 5%] 2023-01-11T23:13:47.0301051Z test_ops.py::TestCommonCUDA::test_errors_eye_cuda PASSED [ 5%] 2023-01-11T23:13:47.0301350Z test_ops.py::TestCommonCUDA::test_errors_fft_fftn_cuda PASSED [ 5%] 2023-01-11T23:13:47.0301652Z test_ops.py::TestCommonCUDA::test_errors_fft_hfft2_cuda PASSED [ 5%] 2023-01-11T23:13:47.0301985Z test_ops.py::TestCommonCUDA::test_errors_fft_hfft_cuda PASSED [ 5%] 2023-01-11T23:13:47.0302288Z test_ops.py::TestCommonCUDA::test_errors_fft_ifft2_cuda PASSED [ 5%] 2023-01-11T23:13:47.0302586Z test_ops.py::TestCommonCUDA::test_errors_fft_ifft_cuda PASSED [ 5%] 2023-01-11T23:13:47.0302882Z test_ops.py::TestCommonCUDA::test_errors_fft_ihfft2_cuda PASSED [ 5%] 2023-01-11T23:13:47.0303182Z test_ops.py::TestCommonCUDA::test_errors_fliplr_cuda PASSED [ 5%] 2023-01-11T23:13:47.0303482Z test_ops.py::TestCommonCUDA::test_errors_flipud_cuda PASSED [ 5%] 2023-01-11T23:13:47.0303784Z test_ops.py::TestCommonCUDA::test_errors_float_power_cuda PASSED [ 5%] 2023-01-11T23:13:47.0304086Z test_ops.py::TestCommonCUDA::test_errors_fmax_cuda PASSED [ 5%] 2023-01-11T23:13:47.0304382Z 
test_ops.py::TestCommonCUDA::test_errors_fmod_cuda PASSED [ 5%] 2023-01-11T23:13:47.0304678Z test_ops.py::TestCommonCUDA::test_errors_gcd_cuda PASSED [ 5%] 2023-01-11T23:13:47.0304973Z test_ops.py::TestCommonCUDA::test_errors_ge_cuda PASSED [ 5%] 2023-01-11T23:13:47.0305267Z test_ops.py::TestCommonCUDA::test_errors_gt_cuda PASSED [ 5%] 2023-01-11T23:13:47.0305566Z test_ops.py::TestCommonCUDA::test_errors_heaviside_cuda PASSED [ 5%] 2023-01-11T23:13:47.0305869Z test_ops.py::TestCommonCUDA::test_errors_igamma_cuda PASSED [ 5%] 2023-01-11T23:13:47.0306164Z test_ops.py::TestCommonCUDA::test_errors_igammac_cuda PASSED [ 5%] 2023-01-11T23:13:47.0306466Z test_ops.py::TestCommonCUDA::test_errors_index_select_cuda PASSED [ 5%] 2023-01-11T23:13:47.0306793Z test_ops.py::TestCommonCUDA::test_errors_jiterator_binary_return_by_ref_cuda PASSED [ 5%] 2023-01-11T23:13:47.0307153Z test_ops.py::TestCommonCUDA::test_errors_kthvalue_cuda PASSED [ 5%] 2023-01-11T23:13:47.0307453Z test_ops.py::TestCommonCUDA::test_errors_lcm_cuda PASSED [ 5%] 2023-01-11T23:13:47.0307757Z test_ops.py::TestCommonCUDA::test_errors_linalg_lstsq_cuda PASSED [ 5%] 2023-01-11T23:13:47.0308082Z test_ops.py::TestCommonCUDA::test_errors_linalg_lstsq_grad_oriented_cuda PASSED [ 5%] 2023-01-11T23:13:47.0308400Z test_ops.py::TestCommonCUDA::test_errors_logspace_cuda PASSED [ 5%] 2023-01-11T23:13:47.0308698Z test_ops.py::TestCommonCUDA::test_errors_lt_cuda PASSED [ 5%] 2023-01-11T23:13:47.0308999Z test_ops.py::TestCommonCUDA::test_errors_masked_fill_cuda PASSED [ 5%] 2023-01-11T23:13:47.0309298Z test_ops.py::TestCommonCUDA::test_errors_maximum_cuda PASSED [ 5%] 2023-01-11T23:13:47.0309601Z test_ops.py::TestCommonCUDA::test_errors_multinomial_cuda PASSED [ 5%] 2023-01-11T23:13:47.0309977Z test_ops.py::TestCommonCUDA::test_errors_neg_cuda PASSED [ 5%] 2023-01-11T23:13:47.0310300Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool3d_cuda PASSED [ 5%] 2023-01-11T23:13:47.0310635Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_conv1d_cuda PASSED [ 5%] 2023-01-11T23:13:47.0310965Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_conv2d_cuda PASSED [ 5%] 2023-01-11T23:13:47.0311301Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_gaussian_nll_loss_cuda PASSED [ 5%] 2023-01-11T23:13:47.0311649Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_hinge_embedding_loss_cuda PASSED [ 5%] 2023-01-11T23:13:47.0312001Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_margin_ranking_loss_cuda PASSED [ 5%] 2023-01-11T23:13:47.0312340Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_max_pool3d_cuda PASSED [ 5%] 2023-01-11T23:13:47.0312667Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_rrelu_cuda PASSED [ 5%] 2023-01-11T23:13:47.0312998Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_soft_margin_loss_cuda PASSED [ 5%] 2023-01-11T23:13:47.0313337Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_softshrink_cuda PASSED [ 5%] 2023-01-11T23:13:47.0313724Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_triplet_margin_loss_cuda PASSED [ 5%] 2023-01-11T23:13:47.0314057Z test_ops.py::TestCommonCUDA::test_errors_polar_cuda PASSED [ 5%] 2023-01-11T23:13:47.0314373Z test_ops.py::TestCommonCUDA::test_errors_remainder_cuda PASSED [ 5%] 2023-01-11T23:13:47.0314684Z test_ops.py::TestCommonCUDA::test_errors_rsub_cuda PASSED [ 5%] 2023-01-11T23:13:47.0315012Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_bartlett_cuda PASSED [ 5%] 2023-01-11T23:13:47.0315343Z 
test_ops.py::TestCommonCUDA::test_errors_signal_windows_blackman_cuda PASSED [ 5%] 2023-01-11T23:13:47.0315689Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_exponential_cuda PASSED [ 5%] 2023-01-11T23:13:47.0316045Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_general_hamming_cuda PASSED [ 5%] 2023-01-11T23:13:47.0316382Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_hamming_cuda PASSED [ 5%] 2023-01-11T23:13:47.0316718Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_hann_cuda PASSED [ 5%] 2023-01-11T23:13:47.0317058Z test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_t_cuda PASSED [ 5%] 2023-01-11T23:13:47.0317586Z test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_v_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:13:47.0318150Z test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_w_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:13:47.0318561Z test_ops.py::TestCommonCUDA::test_errors_special_hermite_polynomial_he_cuda PASSED [ 5%] 2023-01-11T23:13:47.0318915Z test_ops.py::TestCommonCUDA::test_errors_special_laguerre_polynomial_l_cuda PASSED [ 5%] 2023-01-11T23:13:47.0319463Z test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_u_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:13:47.0320041Z test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_v_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:13:47.0320440Z test_ops.py::TestCommonCUDA::test_errors_special_zeta_cuda PASSED [ 5%] 2023-01-11T23:13:47.0320763Z test_ops.py::TestCommonCUDA::test_errors_sum_to_size_cuda PASSED [ 5%] 2023-01-11T23:13:47.0321077Z test_ops.py::TestCommonCUDA::test_errors_take_cuda PASSED [ 5%] 2023-01-11T23:13:47.0321384Z test_ops.py::TestCommonCUDA::test_errors_trace_cuda PASSED [ 5%] 2023-01-11T23:13:47.0321687Z test_ops.py::TestCommonCUDA::test_errors_triu_cuda PASSED [ 5%] 2023-01-11T23:13:47.0321999Z test_ops.py::TestCommonCUDA::test_errors_uniform_cuda PASSED [ 5%] 2023-01-11T23:13:47.0322303Z test_ops.py::TestCommonCUDA::test_errors_view_cuda PASSED [ 5%] 2023-01-11T23:13:47.0322611Z test_ops.py::TestCommonCUDA::test_errors_vsplit_cuda PASSED [ 5%] 2023-01-11T23:13:47.0322920Z test_ops.py::TestCommonCUDA::test_errors_vstack_cuda PASSED [ 5%] 2023-01-11T23:13:47.0323231Z test_ops.py::TestCommonCUDA::test_errors_where_cuda PASSED [ 5%] 2023-01-11T23:13:47.0323577Z test_ops.py::TestCommonCUDA::test_multiple_devices_H_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:13:47.0323967Z test_ops.py::TestCommonCUDA::test_multiple_devices_T_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:13:47.0324363Z test_ops.py::TestCommonCUDA::test_multiple_devices___getitem___cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:13:47.0324760Z test_ops.py::TestCommonCUDA::test_multiple_devices___getitem___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:13:47.0325162Z test_ops.py::TestCommonCUDA::test_multiple_devices___rand___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:13:47.0325600Z test_ops.py::TestCommonCUDA::test_multiple_devices___rmod___cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0325997Z 
test_ops.py::TestCommonCUDA::test_multiple_devices___rmod___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0326388Z test_ops.py::TestCommonCUDA::test_multiple_devices___ror___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0326785Z test_ops.py::TestCommonCUDA::test_multiple_devices___rpow___cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0327176Z test_ops.py::TestCommonCUDA::test_multiple_devices___rpow___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0327565Z test_ops.py::TestCommonCUDA::test_multiple_devices___rsub___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0327943Z test_ops.py::TestCommonCUDA::test_multiple_devices___rxor___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0328366Z test_ops.py::TestCommonCUDA::test_multiple_devices__native_batch_norm_legit_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0328781Z test_ops.py::TestCommonCUDA::test_multiple_devices_acos_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0329224Z test_ops.py::TestCommonCUDA::test_multiple_devices_acosh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0329614Z test_ops.py::TestCommonCUDA::test_multiple_devices_add_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0330009Z test_ops.py::TestCommonCUDA::test_multiple_devices_addbmm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0330409Z test_ops.py::TestCommonCUDA::test_multiple_devices_addcmul_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0330853Z test_ops.py::TestCommonCUDA::test_multiple_devices_addmm_decomposed_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0331261Z test_ops.py::TestCommonCUDA::test_multiple_devices_addmv_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0331658Z test_ops.py::TestCommonCUDA::test_multiple_devices_all_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0332053Z test_ops.py::TestCommonCUDA::test_multiple_devices_amin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0332450Z test_ops.py::TestCommonCUDA::test_multiple_devices_angle_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0332842Z test_ops.py::TestCommonCUDA::test_multiple_devices_argmax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0333239Z test_ops.py::TestCommonCUDA::test_multiple_devices_argmin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0333640Z test_ops.py::TestCommonCUDA::test_multiple_devices_argsort_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0334041Z test_ops.py::TestCommonCUDA::test_multiple_devices_argsort_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0334428Z test_ops.py::TestCommonCUDA::test_multiple_devices_argwhere_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0334933Z test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0335347Z test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_scatter_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0335764Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_scatter_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0336177Z test_ops.py::TestCommonCUDA::test_multiple_devices_asin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0336569Z test_ops.py::TestCommonCUDA::test_multiple_devices_asinh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0337004Z test_ops.py::TestCommonCUDA::test_multiple_devices_asinh_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0337389Z test_ops.py::TestCommonCUDA::test_multiple_devices_atan2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0337783Z test_ops.py::TestCommonCUDA::test_multiple_devices_atan_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0338171Z test_ops.py::TestCommonCUDA::test_multiple_devices_atan_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0338570Z test_ops.py::TestCommonCUDA::test_multiple_devices_baddbmm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0338969Z test_ops.py::TestCommonCUDA::test_multiple_devices_bfloat16_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0339414Z test_ops.py::TestCommonCUDA::test_multiple_devices_bfloat16_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0339810Z test_ops.py::TestCommonCUDA::test_multiple_devices_bincount_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0340213Z test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_left_shift_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0340616Z test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_not_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0341015Z test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_or_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0341421Z test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_xor_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0341852Z test_ops.py::TestCommonCUDA::test_multiple_devices_bool_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0342254Z test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_shapes_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0342672Z test_ops.py::TestCommonCUDA::test_multiple_devices_bucketize_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0343070Z test_ops.py::TestCommonCUDA::test_multiple_devices_byte_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0343475Z test_ops.py::TestCommonCUDA::test_multiple_devices_cartesian_prod_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0343870Z test_ops.py::TestCommonCUDA::test_multiple_devices_cat_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0344262Z test_ops.py::TestCommonCUDA::test_multiple_devices_cat_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0344660Z test_ops.py::TestCommonCUDA::test_multiple_devices_cdist_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0345062Z test_ops.py::TestCommonCUDA::test_multiple_devices_cdouble_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0345459Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_cdouble_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0345855Z test_ops.py::TestCommonCUDA::test_multiple_devices_chalf_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0346249Z test_ops.py::TestCommonCUDA::test_multiple_devices_char_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0346641Z test_ops.py::TestCommonCUDA::test_multiple_devices_char_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0347030Z test_ops.py::TestCommonCUDA::test_multiple_devices_cholesky_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0347440Z test_ops.py::TestCommonCUDA::test_multiple_devices_cholesky_solve_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0347841Z test_ops.py::TestCommonCUDA::test_multiple_devices_chunk_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0348256Z test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0348654Z test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0349057Z test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_max_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0349462Z test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_min_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0349902Z test_ops.py::TestCommonCUDA::test_multiple_devices_clone_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0350304Z test_ops.py::TestCommonCUDA::test_multiple_devices_column_stack_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0350722Z test_ops.py::TestCommonCUDA::test_multiple_devices_combinations_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0351128Z test_ops.py::TestCommonCUDA::test_multiple_devices_conj_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0351527Z test_ops.py::TestCommonCUDA::test_multiple_devices_constant_pad_nd_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0351942Z test_ops.py::TestCommonCUDA::test_multiple_devices_constant_pad_nd_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0352355Z test_ops.py::TestCommonCUDA::test_multiple_devices_contiguous_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0352764Z test_ops.py::TestCommonCUDA::test_multiple_devices_contiguous_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0353191Z test_ops.py::TestCommonCUDA::test_multiple_devices_copysign_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0353594Z test_ops.py::TestCommonCUDA::test_multiple_devices_copysign_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0353986Z test_ops.py::TestCommonCUDA::test_multiple_devices_cos_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0354384Z test_ops.py::TestCommonCUDA::test_multiple_devices_count_nonzero_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0354770Z test_ops.py::TestCommonCUDA::test_multiple_devices_cov_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0355159Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_cross_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0355556Z test_ops.py::TestCommonCUDA::test_multiple_devices_cummin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0355955Z test_ops.py::TestCommonCUDA::test_multiple_devices_cummin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0356350Z test_ops.py::TestCommonCUDA::test_multiple_devices_cumprod_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0356750Z test_ops.py::TestCommonCUDA::test_multiple_devices_cumsum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0357140Z test_ops.py::TestCommonCUDA::test_multiple_devices_cumsum_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0357548Z test_ops.py::TestCommonCUDA::test_multiple_devices_cumulative_trapezoid_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0357968Z test_ops.py::TestCommonCUDA::test_multiple_devices_cumulative_trapezoid_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0358377Z test_ops.py::TestCommonCUDA::test_multiple_devices_diag_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0358778Z test_ops.py::TestCommonCUDA::test_multiple_devices_diag_embed_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0359233Z test_ops.py::TestCommonCUDA::test_multiple_devices_diagflat_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0359660Z test_ops.py::TestCommonCUDA::test_multiple_devices_diagflat_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0360058Z test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0360465Z test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_scatter_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0360860Z test_ops.py::TestCommonCUDA::test_multiple_devices_diff_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0361253Z test_ops.py::TestCommonCUDA::test_multiple_devices_diff_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0361658Z test_ops.py::TestCommonCUDA::test_multiple_devices_digamma_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0362064Z test_ops.py::TestCommonCUDA::test_multiple_devices_digamma_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0362456Z test_ops.py::TestCommonCUDA::test_multiple_devices_dist_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0362864Z test_ops.py::TestCommonCUDA::test_multiple_devices_div_floor_rounding_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0363283Z test_ops.py::TestCommonCUDA::test_multiple_devices_div_floor_rounding_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0363710Z test_ops.py::TestCommonCUDA::test_multiple_devices_div_no_rounding_mode_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0364133Z test_ops.py::TestCommonCUDA::test_multiple_devices_dot_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0364531Z test_ops.py::TestCommonCUDA::test_multiple_devices_dstack_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 
2023-01-11T23:13:47.0364933Z test_ops.py::TestCommonCUDA::test_multiple_devices_dstack_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0365331Z test_ops.py::TestCommonCUDA::test_multiple_devices_empty_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0365729Z test_ops.py::TestCommonCUDA::test_multiple_devices_empty_like_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0366130Z test_ops.py::TestCommonCUDA::test_multiple_devices_eq_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0366520Z test_ops.py::TestCommonCUDA::test_multiple_devices_erfinv_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0366915Z test_ops.py::TestCommonCUDA::test_multiple_devices_exp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0367295Z test_ops.py::TestCommonCUDA::test_multiple_devices_expm1_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0367691Z test_ops.py::TestCommonCUDA::test_multiple_devices_eye_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0368086Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0368489Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0368884Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0369338Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftshift_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0369745Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0370146Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0370567Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfftn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0370972Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0371370Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0371769Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0372161Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0372360Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0372552Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0372753Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0372945Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfftn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0373136Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 
2023-01-11T23:13:47.0373327Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0373520Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfftn_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0373726Z test_ops.py::TestCommonCUDA::test_multiple_devices_fill_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0373921Z test_ops.py::TestCommonCUDA::test_multiple_devices_flatten_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0374114Z test_ops.py::TestCommonCUDA::test_multiple_devices_flatten_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0374307Z test_ops.py::TestCommonCUDA::test_multiple_devices_flip_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0374596Z test_ops.py::TestCommonCUDA::test_multiple_devices_flip_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0374790Z test_ops.py::TestCommonCUDA::test_multiple_devices_fliplr_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0374981Z test_ops.py::TestCommonCUDA::test_multiple_devices_fliplr_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0375175Z test_ops.py::TestCommonCUDA::test_multiple_devices_float_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0375364Z test_ops.py::TestCommonCUDA::test_multiple_devices_float_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0375557Z test_ops.py::TestCommonCUDA::test_multiple_devices_float_power_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0375753Z test_ops.py::TestCommonCUDA::test_multiple_devices_float_power_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0375942Z test_ops.py::TestCommonCUDA::test_multiple_devices_floor_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0376138Z test_ops.py::TestCommonCUDA::test_multiple_devices_floor_divide_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0376326Z test_ops.py::TestCommonCUDA::test_multiple_devices_fmax_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0376519Z test_ops.py::TestCommonCUDA::test_multiple_devices_fmin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0376788Z test_ops.py::TestCommonCUDA::test_multiple_devices_fmin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0376983Z test_ops.py::TestCommonCUDA::test_multiple_devices_frac_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0377175Z test_ops.py::TestCommonCUDA::test_multiple_devices_frexp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0377363Z test_ops.py::TestCommonCUDA::test_multiple_devices_full_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0377551Z test_ops.py::TestCommonCUDA::test_multiple_devices_full_like_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0377743Z test_ops.py::TestCommonCUDA::test_multiple_devices_full_like_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0377937Z test_ops.py::TestCommonCUDA::test_multiple_devices_gather_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0378127Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_gather_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0378315Z test_ops.py::TestCommonCUDA::test_multiple_devices_gcd_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0378501Z test_ops.py::TestCommonCUDA::test_multiple_devices_ge_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0378692Z test_ops.py::TestCommonCUDA::test_multiple_devices_geqrf_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0378890Z test_ops.py::TestCommonCUDA::test_multiple_devices_gradient_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0379077Z test_ops.py::TestCommonCUDA::test_multiple_devices_gt_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0379330Z test_ops.py::TestCommonCUDA::test_multiple_devices_hsplit_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0379544Z test_ops.py::TestCommonCUDA::test_multiple_devices_hsplit_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0379732Z test_ops.py::TestCommonCUDA::test_multiple_devices_hstack_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0379923Z test_ops.py::TestCommonCUDA::test_multiple_devices_hypot_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0380116Z test_ops.py::TestCommonCUDA::test_multiple_devices_igammac_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0380303Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_add_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0380500Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_copy_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0380698Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_copy_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0380895Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_fill_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0381087Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_put_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0381279Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_select_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0381478Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_select_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0381667Z test_ops.py::TestCommonCUDA::test_multiple_devices_int_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0381862Z test_ops.py::TestCommonCUDA::test_multiple_devices_isclose_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0382055Z test_ops.py::TestCommonCUDA::test_multiple_devices_isclose_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0382274Z test_ops.py::TestCommonCUDA::test_multiple_devices_isfinite_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0382468Z test_ops.py::TestCommonCUDA::test_multiple_devices_isfinite_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0382655Z test_ops.py::TestCommonCUDA::test_multiple_devices_isinf_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0382842Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_isnan_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0383031Z test_ops.py::TestCommonCUDA::test_multiple_devices_isposinf_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0383240Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_2inputs_2outputs_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0383460Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_2inputs_2outputs_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0383663Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0383861Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_unary_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0384051Z test_ops.py::TestCommonCUDA::test_multiple_devices_kron_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0384239Z test_ops.py::TestCommonCUDA::test_multiple_devices_kron_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0384430Z test_ops.py::TestCommonCUDA::test_multiple_devices_kthvalue_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0384617Z test_ops.py::TestCommonCUDA::test_multiple_devices_lcm_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0384830Z test_ops.py::TestCommonCUDA::test_multiple_devices_ldexp_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0385013Z test_ops.py::TestCommonCUDA::test_multiple_devices_le_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0385200Z test_ops.py::TestCommonCUDA::test_multiple_devices_lerp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0385385Z test_ops.py::TestCommonCUDA::test_multiple_devices_lgamma_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0385582Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cond_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0385782Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cross_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0385978Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cross_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0386175Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_det_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0386375Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0386574Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigvals_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0386773Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigvalsh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0386982Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_householder_product_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0387173Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_inv_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0387374Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_inv_ex_cuda_float32 SKIPPED (fewer than 2 
devices detected) [ 7%] 2023-01-11T23:13:47.0387606Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_ldl_factor_ex_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0387803Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lstsq_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0388008Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_factor_ex_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0388209Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_solve_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0388415Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_power_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0388616Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_rank_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0388862Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 7%] 2023-01-11T23:13:47.0389069Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_solve_triangular_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0389266Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_svd_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0389464Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_svdvals_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0389667Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_tensorinv_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0389927Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_tensorsolve_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0390156Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vander_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0390354Z test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0390544Z test_ops.py::TestCommonCUDA::test_multiple_devices_log10_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0390733Z test_ops.py::TestCommonCUDA::test_multiple_devices_log10_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0390927Z test_ops.py::TestCommonCUDA::test_multiple_devices_log1p_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0391111Z test_ops.py::TestCommonCUDA::test_multiple_devices_log2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0391298Z test_ops.py::TestCommonCUDA::test_multiple_devices_log_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0391496Z test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0391709Z test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_with_dtype_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0391915Z test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_with_dtype_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0392113Z test_ops.py::TestCommonCUDA::test_multiple_devices_logaddexp2_cuda_float32 SKIPPED (fewer than 
2 devices detected) [ 7%] 2023-01-11T23:13:47.0392317Z test_ops.py::TestCommonCUDA::test_multiple_devices_logaddexp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0392515Z test_ops.py::TestCommonCUDA::test_multiple_devices_logical_and_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0392710Z test_ops.py::TestCommonCUDA::test_multiple_devices_logical_and_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0392913Z test_ops.py::TestCommonCUDA::test_multiple_devices_logical_not_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0393122Z test_ops.py::TestCommonCUDA::test_multiple_devices_logical_not_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0393320Z test_ops.py::TestCommonCUDA::test_multiple_devices_logical_or_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0393513Z test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0393714Z test_ops.py::TestCommonCUDA::test_multiple_devices_logsumexp_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0393904Z test_ops.py::TestCommonCUDA::test_multiple_devices_long_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0394092Z test_ops.py::TestCommonCUDA::test_multiple_devices_lt_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0394282Z test_ops.py::TestCommonCUDA::test_multiple_devices_lu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0394483Z test_ops.py::TestCommonCUDA::test_multiple_devices_lu_unpack_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0394673Z test_ops.py::TestCommonCUDA::test_multiple_devices_mH_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0394859Z test_ops.py::TestCommonCUDA::test_multiple_devices_mT_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0395038Z test_ops.py::TestCommonCUDA::test_multiple_devices_mT_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0395233Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0395434Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0395659Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0395856Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumsum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0396053Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_fill_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0396248Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_fill_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0396454Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_log_softmax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0396657Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logaddexp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0396856Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logsumexp_cuda_float32 
SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0397055Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_mean_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0397252Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_median_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0397444Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0397642Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_prod_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0397836Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_select_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0398037Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_softmax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0398237Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_softmin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0398455Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_std_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0398648Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_std_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0398835Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_sum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0399028Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_sum_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0399224Z test_ops.py::TestCommonCUDA::test_multiple_devices_matrix_exp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0399414Z test_ops.py::TestCommonCUDA::test_multiple_devices_max_binary_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0399637Z test_ops.py::TestCommonCUDA::test_multiple_devices_max_pool2d_with_indices_backward_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0399847Z test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_no_dim_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0400054Z test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_with_dim_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0400259Z test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_with_dim_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0400451Z test_ops.py::TestCommonCUDA::test_multiple_devices_mean_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0400640Z test_ops.py::TestCommonCUDA::test_multiple_devices_median_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0400843Z test_ops.py::TestCommonCUDA::test_multiple_devices_median_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0401056Z test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_list_of_tensors_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0401268Z test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_variadic_tensors_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0401473Z test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_no_dim_cuda_int64 SKIPPED (fewer than 2 
devices detected) [ 7%] 2023-01-11T23:13:47.0401680Z test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_with_dim_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0401866Z test_ops.py::TestCommonCUDA::test_multiple_devices_mm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0402057Z test_ops.py::TestCommonCUDA::test_multiple_devices_mode_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0402252Z test_ops.py::TestCommonCUDA::test_multiple_devices_movedim_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0402442Z test_ops.py::TestCommonCUDA::test_multiple_devices_msort_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0402631Z test_ops.py::TestCommonCUDA::test_multiple_devices_mul_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0402809Z test_ops.py::TestCommonCUDA::test_multiple_devices_mul_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0403006Z test_ops.py::TestCommonCUDA::test_multiple_devices_multinomial_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0403217Z test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_1_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0403424Z test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_1_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0403633Z test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_3_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0403854Z test_ops.py::TestCommonCUDA::test_multiple_devices_nan_to_num_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0404051Z test_ops.py::TestCommonCUDA::test_multiple_devices_nanmean_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0404249Z test_ops.py::TestCommonCUDA::test_multiple_devices_nanmedian_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0404443Z test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0404631Z test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0404836Z test_ops.py::TestCommonCUDA::test_multiple_devices_native_dropout_backward_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0405020Z test_ops.py::TestCommonCUDA::test_multiple_devices_ne_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0405211Z test_ops.py::TestCommonCUDA::test_multiple_devices_neg_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0405398Z test_ops.py::TestCommonCUDA::test_multiple_devices_neg_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0405589Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0405790Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_strided_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0405982Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_zeros_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0406199Z test_ops.py::TestCommonCUDA::test_multiple_devices_nextafter_cuda_float32 
SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0406425Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_avg_pool2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0406647Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_avg_pool3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0406860Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_max_pool1d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0407083Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_batch_norm_without_cudnn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0407302Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_binary_cross_entropy_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0407508Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_celu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0407716Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv1d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0407915Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0408132Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0408350Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0408570Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_embedding_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0408792Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_embedding_loss_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0409056Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cross_entropy_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0409292Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0409506Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_embedding_bag_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0409714Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_embedding_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0409947Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_feature_alpha_dropout_with_train_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0410185Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_feature_alpha_dropout_without_train_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0410404Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_gaussian_nll_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0410615Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_grid_sample_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0410830Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_group_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0411036Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardshrink_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0411248Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardsigmoid_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0411475Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardswish_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0411684Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardtanh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0411895Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardtanh_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0412105Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_huber_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0412316Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_instance_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0412536Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_bicubic_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0412756Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_bilinear_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0412982Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_nearest_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0413189Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_kl_div_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0413395Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_l1_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0413603Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_logsigmoid_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0413812Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_margin_ranking_loss_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0414014Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0414225Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0414454Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0414775Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0414993Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool3d_grad_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0415208Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_multi_margin_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0415431Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_multilabel_margin_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0415640Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_nll_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0415854Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_normalize_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0416050Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_one_hot_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0416263Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_circular_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0416472Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_circular_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0416685Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_constant_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0416945Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_constant_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0417155Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_shuffle_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0417369Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_unshuffle_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0417586Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_poisson_nll_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0417798Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_poisson_nll_loss_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0418003Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_prelu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0418203Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu6_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0418402Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0418609Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softmin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0418814Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softplus_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0419049Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softshrink_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0419288Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0419504Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_loss_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0419762Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_with_distance_loss_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0419978Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_upsample_nearest_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0420171Z test_ops.py::TestCommonCUDA::test_multiple_devices_norm_inf_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0420366Z test_ops.py::TestCommonCUDA::test_multiple_devices_norm_nuc_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0420551Z test_ops.py::TestCommonCUDA::test_multiple_devices_normal_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0420756Z test_ops.py::TestCommonCUDA::test_multiple_devices_normal_number_mean_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0420947Z test_ops.py::TestCommonCUDA::test_multiple_devices_ones_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0421134Z test_ops.py::TestCommonCUDA::test_multiple_devices_ones_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0421323Z test_ops.py::TestCommonCUDA::test_multiple_devices_ones_like_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0421515Z test_ops.py::TestCommonCUDA::test_multiple_devices_outer_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0421715Z test_ops.py::TestCommonCUDA::test_multiple_devices_pca_lowrank_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0421908Z test_ops.py::TestCommonCUDA::test_multiple_devices_permute_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0422100Z test_ops.py::TestCommonCUDA::test_multiple_devices_permute_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0422331Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_0_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0422519Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_1_cuda_float32 SKIPPED (Skipped!) [ 8%] 2023-01-11T23:13:47.0422709Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_2_cuda_float32 SKIPPED (Skipped!) [ 8%] 2023-01-11T23:13:47.0422898Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_3_cuda_float32 SKIPPED (Skipped!) [ 8%] 2023-01-11T23:13:47.0423088Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_3_cuda_int64 SKIPPED (Skipped!) 
[ 8%] 2023-01-11T23:13:47.0423282Z test_ops.py::TestCommonCUDA::test_multiple_devices_positive_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0423474Z test_ops.py::TestCommonCUDA::test_multiple_devices_positive_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0423666Z test_ops.py::TestCommonCUDA::test_multiple_devices_pow_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0423851Z test_ops.py::TestCommonCUDA::test_multiple_devices_prod_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0424046Z test_ops.py::TestCommonCUDA::test_multiple_devices_quantile_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0424232Z test_ops.py::TestCommonCUDA::test_multiple_devices_rad2deg_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0424419Z test_ops.py::TestCommonCUDA::test_multiple_devices_randint_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0424609Z test_ops.py::TestCommonCUDA::test_multiple_devices_randint_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0424808Z test_ops.py::TestCommonCUDA::test_multiple_devices_randint_like_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0425004Z test_ops.py::TestCommonCUDA::test_multiple_devices_randn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0425219Z test_ops.py::TestCommonCUDA::test_multiple_devices_ravel_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0425406Z test_ops.py::TestCommonCUDA::test_multiple_devices_ravel_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0425607Z test_ops.py::TestCommonCUDA::test_multiple_devices_reciprocal_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0425802Z test_ops.py::TestCommonCUDA::test_multiple_devices_reciprocal_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0425993Z test_ops.py::TestCommonCUDA::test_multiple_devices_renorm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0426176Z test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0426381Z test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_interleave_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0426576Z test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0426764Z test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0426946Z test_ops.py::TestCommonCUDA::test_multiple_devices_resize__cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0427139Z test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_conj_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0427326Z test_ops.py::TestCommonCUDA::test_multiple_devices_roll_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0427537Z test_ops.py::TestCommonCUDA::test_multiple_devices_round_decimals_neg_3_cuda_float32 SKIPPED (Skipped!) 
[ 8%] 2023-01-11T23:13:47.0427725Z test_ops.py::TestCommonCUDA::test_multiple_devices_rsqrt_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0427910Z test_ops.py::TestCommonCUDA::test_multiple_devices_rsub_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0428108Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_add_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0428311Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0428515Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_mean_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0428720Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_mean_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0428925Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_prod_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0429135Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_sum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0429337Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_sum_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0429536Z test_ops.py::TestCommonCUDA::test_multiple_devices_searchsorted_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0429791Z test_ops.py::TestCommonCUDA::test_multiple_devices_searchsorted_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0429998Z test_ops.py::TestCommonCUDA::test_multiple_devices_segment_reduce_lengths_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0430198Z test_ops.py::TestCommonCUDA::test_multiple_devices_segment_reduce_offsets_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0430392Z test_ops.py::TestCommonCUDA::test_multiple_devices_select_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0430614Z test_ops.py::TestCommonCUDA::test_multiple_devices_select_scatter_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0430805Z test_ops.py::TestCommonCUDA::test_multiple_devices_sgn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0430991Z test_ops.py::TestCommonCUDA::test_multiple_devices_sgn_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0431178Z test_ops.py::TestCommonCUDA::test_multiple_devices_short_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0431364Z test_ops.py::TestCommonCUDA::test_multiple_devices_short_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0431575Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_blackman_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0431784Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_cosine_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0431987Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_gaussian_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0432204Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_general_hamming_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 
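Note on the long run of skips above: this runner evidently exposes a single CUDA device (every test_multiple_devices_* case reports "fewer than 2 devices detected"), and slow-gated tests skip unless PYTORCH_TEST_WITH_SLOW is set, exactly as the skip reasons say. Below is a minimal sketch of how such guards can be written; require_devices and slow are hypothetical helpers for illustration, not PyTorch's actual decorators (the real ones live under torch.testing._internal), and the exact "1" semantics of the env-var check is an assumption.

    import os
    import unittest

    import torch


    def require_devices(n):
        # Mirrors the "fewer than 2 devices detected" skips: on a
        # single-GPU host every @require_devices(2) test is skipped.
        return unittest.skipUnless(
            torch.cuda.is_available() and torch.cuda.device_count() >= n,
            f"fewer than {n} devices detected",
        )


    def slow(fn):
        # Mirrors the slow-test gate; assumes the flag is the string "1"
        # when enabled.
        return unittest.skipUnless(
            os.environ.get("PYTORCH_TEST_WITH_SLOW") == "1",
            "test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test",
        )(fn)


    class TestMultiDevice(unittest.TestCase):
        @require_devices(2)
        def test_add_on_second_device(self):
            # Only runs on hosts with >= 2 GPUs; skipped on this runner.
            x = torch.ones(4, device="cuda:1")
            self.assertTrue(torch.equal(x + x, torch.full((4,), 2.0, device="cuda:1")))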
2023-01-11T23:13:47.0432396Z test_ops.py::TestCommonCUDA::test_multiple_devices_signbit_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0432586Z test_ops.py::TestCommonCUDA::test_multiple_devices_signbit_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0432777Z test_ops.py::TestCommonCUDA::test_multiple_devices_sinc_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0432987Z test_ops.py::TestCommonCUDA::test_multiple_devices_sinh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0433177Z test_ops.py::TestCommonCUDA::test_multiple_devices_sinh_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0433377Z test_ops.py::TestCommonCUDA::test_multiple_devices_slice_scatter_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0433582Z test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_with_dtype_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0433784Z test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_with_dtype_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0433983Z test_ops.py::TestCommonCUDA::test_multiple_devices_sparse_sampled_addmm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0434184Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0434394Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j1_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0434595Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j1_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0434791Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0435012Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_t_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0435230Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_u_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0435447Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_u_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0435818Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_v_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:13:47.0436195Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:13:47.0436398Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_entr_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0436588Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_entr_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0436810Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_h_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0437029Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_he_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0437229Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_special_i0e_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0437427Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_i0e_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0437627Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1e_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0437845Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_laguerre_polynomial_l_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0438061Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_laguerre_polynomial_l_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0438411Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:13:47.0438787Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_legendre_polynomial_p_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:13:47.0438985Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_log_ndtr_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0439203Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0439417Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i0_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0439628Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i1_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0439842Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i1_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0440055Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0440268Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k0_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0440477Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k1_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0440675Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtr_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0440903Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_polygamma_special_polygamma_n_0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0441130Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_polygamma_special_polygamma_n_0_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0441345Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k1_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0441584Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k1_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0441956Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:13:47.0442310Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 9%] 2023-01-11T23:13:47.0442668Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_u_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 9%] 2023-01-11T23:13:47.0443032Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_v_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 9%] 2023-01-11T23:13:47.0443396Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_w_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 9%] 2023-01-11T23:13:47.0443614Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_spherical_bessel_j0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0443819Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_zeta_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0444017Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_zeta_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0444213Z test_ops.py::TestCommonCUDA::test_multiple_devices_split_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0444395Z test_ops.py::TestCommonCUDA::test_multiple_devices_split_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0444618Z test_ops.py::TestCommonCUDA::test_multiple_devices_split_list_args_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0444819Z test_ops.py::TestCommonCUDA::test_multiple_devices_split_with_sizes_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0445008Z test_ops.py::TestCommonCUDA::test_multiple_devices_sqrt_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0445201Z test_ops.py::TestCommonCUDA::test_multiple_devices_square_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0445398Z test_ops.py::TestCommonCUDA::test_multiple_devices_square_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0445590Z test_ops.py::TestCommonCUDA::test_multiple_devices_squeeze_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0445779Z test_ops.py::TestCommonCUDA::test_multiple_devices_stack_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0445985Z test_ops.py::TestCommonCUDA::test_multiple_devices_std_mean_unbiased_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0446179Z test_ops.py::TestCommonCUDA::test_multiple_devices_stft_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0446360Z test_ops.py::TestCommonCUDA::test_multiple_devices_sub_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0446548Z test_ops.py::TestCommonCUDA::test_multiple_devices_sum_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0446741Z test_ops.py::TestCommonCUDA::test_multiple_devices_sum_to_size_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0446936Z test_ops.py::TestCommonCUDA::test_multiple_devices_svd_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0447134Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_svd_lowrank_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0447322Z test_ops.py::TestCommonCUDA::test_multiple_devices_t_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0447533Z test_ops.py::TestCommonCUDA::test_multiple_devices_tan_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0447736Z test_ops.py::TestCommonCUDA::test_multiple_devices_tensor_split_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0447926Z test_ops.py::TestCommonCUDA::test_multiple_devices_tile_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0448106Z test_ops.py::TestCommonCUDA::test_multiple_devices_to_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0448293Z test_ops.py::TestCommonCUDA::test_multiple_devices_to_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0448486Z test_ops.py::TestCommonCUDA::test_multiple_devices_to_sparse_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0448681Z test_ops.py::TestCommonCUDA::test_multiple_devices_topk_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0448873Z test_ops.py::TestCommonCUDA::test_multiple_devices_topk_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0449092Z test_ops.py::TestCommonCUDA::test_multiple_devices_trace_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0449306Z test_ops.py::TestCommonCUDA::test_multiple_devices_trapezoid_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0449502Z test_ops.py::TestCommonCUDA::test_multiple_devices_tril_indices_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0449691Z test_ops.py::TestCommonCUDA::test_multiple_devices_triu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0449875Z test_ops.py::TestCommonCUDA::test_multiple_devices_triu_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0450084Z test_ops.py::TestCommonCUDA::test_multiple_devices_trunc_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0450280Z test_ops.py::TestCommonCUDA::test_multiple_devices_unbind_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0450473Z test_ops.py::TestCommonCUDA::test_multiple_devices_unflatten_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0450669Z test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_copy_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0450862Z test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0451051Z test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0451237Z test_ops.py::TestCommonCUDA::test_multiple_devices_unique_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0451430Z test_ops.py::TestCommonCUDA::test_multiple_devices_unsqueeze_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0451620Z test_ops.py::TestCommonCUDA::test_multiple_devices_var_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0451805Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_var_mean_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0451995Z test_ops.py::TestCommonCUDA::test_multiple_devices_vdot_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0452183Z test_ops.py::TestCommonCUDA::test_multiple_devices_view_as_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0452370Z test_ops.py::TestCommonCUDA::test_multiple_devices_view_copy_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0452556Z test_ops.py::TestCommonCUDA::test_multiple_devices_view_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0452744Z test_ops.py::TestCommonCUDA::test_multiple_devices_view_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0452955Z test_ops.py::TestCommonCUDA::test_multiple_devices_vstack_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0453149Z test_ops.py::TestCommonCUDA::test_multiple_devices_vstack_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0453337Z test_ops.py::TestCommonCUDA::test_multiple_devices_where_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0453525Z test_ops.py::TestCommonCUDA::test_multiple_devices_zero__cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0453703Z test_ops.py::TestCommonCUDA::test_multiple_devices_zero__cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0453894Z test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0454084Z test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0454280Z test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_like_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0454443Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_H_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0454713Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_T_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0454886Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values___radd___cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0455049Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rdiv___cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0455212Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values___ror___cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0455368Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_acosh_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0455590Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_addr_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0455754Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_all_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0455919Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_amax_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0456081Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_amin_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0456241Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_any_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0456414Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_as_strided_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0456576Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_asinh_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0456732Z 
test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atanh_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0456905Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atleast_1d_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0457074Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atleast_3d_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0457239Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_or_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0457401Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bool_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0457582Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_broadcast_tensors_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0457755Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cartesian_prod_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0457917Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cat_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0458080Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cfloat_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0458233Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_chalf_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0458398Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_char_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0458562Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_chunk_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0458764Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_clamp_max_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0458924Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_conj_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0459081Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cosh_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0459244Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cummax_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0459408Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cummin_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0459561Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diag_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0459735Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diag_embed_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0459901Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagflat_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0460069Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagonal_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0460245Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagonal_scatter_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0460407Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_dstack_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0460589Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_empty_cuda_bool SKIPPED (Skipped!) 
[ 9%] 2023-01-11T23:13:47.0460749Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erf_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0460914Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erfc_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0461068Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erfinv_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0461293Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_exp2_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0461453Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expand_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0461630Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fftshift_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0461798Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfft2_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0461968Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifft2_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0462133Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifft_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0462308Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifftshift_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0462467Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfft2_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0462633Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfft_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0462795Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfft2_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0462959Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_rfft2_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0463121Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_rfft_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0463285Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fill_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0463450Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_flatten_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0463617Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_flipud_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0463786Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_float_power_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0463938Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fmax_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0464100Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ge_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0464258Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_half_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0464443Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_hsplit_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0464609Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_add_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0464776Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_copy_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0464944Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_put_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0465116Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_select_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0465270Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_int_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0465442Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isneginf_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0465606Z 
test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isposinf_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0465800Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_2inputs_2outputs_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0465995Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_4inputs_with_extra_args_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0466175Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_binary_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0466338Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ldexp_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0466500Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_le_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0466663Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_lgamma_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0466844Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log10_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0467009Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log1p_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0467172Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log2_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0467333Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0467517Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log_softmax_with_dtype_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0467690Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_not_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0467857Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_or_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0468028Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_xor_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0468190Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logit_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0468343Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mH_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0468499Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mT_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0468667Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_fill_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0468837Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_sum_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0469038Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_max_reduction_with_dim_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0469229Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_maximum_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0469404Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_minimum_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0469565Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_movedim_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0469785Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mul_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0469955Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_narrow_copy_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0470146Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_narrow_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0470333Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_empty_cuda_bool SKIPPED (Skipped!) 
[ 10%] 2023-01-11T23:13:47.0470498Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_full_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0470661Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_ones_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0470827Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_zeros_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0471020Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pad_circular_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0471215Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pixel_shuffle_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0471391Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_softsign_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0471558Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ones_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0471758Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_2_cuda_bool SKIPPED (Skipped!) [ 10%] 2023-01-11T23:13:47.0471949Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_3_cuda_bool SKIPPED (Skipped!) [ 10%] 2023-01-11T23:13:47.0472145Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_4_cuda_bool SKIPPED (Skipped!) [ 10%] 2023-01-11T23:13:47.0472311Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_rad2deg_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0472476Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_real_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0472675Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_reciprocal_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0472853Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resize_as__cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0473075Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scalar_tensor_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0473275Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_reduce_sum_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0473440Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_select_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0473600Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sign_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0473764Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_signbit_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0473923Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sinc_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0474086Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sinh_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0474259Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_slice_scatter_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0474431Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_softmax_with_dtype_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0474608Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_airy_ai_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0474784Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_j0_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0474956Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_j1_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0475151Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_u_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0475531Z 
test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_w_cuda_bool SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 10%] 2023-01-11T23:13:47.0475710Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_erfcx_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0475920Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1e_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0476104Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_log_ndtr_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0476295Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_i1_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0476476Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_k0_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0476650Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_ndtr_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0476849Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_scaled_modified_bessel_k1_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0477224Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_t_cuda_bool SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 10%] 2023-01-11T23:13:47.0477599Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_u_cuda_bool SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 10%] 2023-01-11T23:13:47.0477779Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_xlog1py_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0477952Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_zeta_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0478132Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_list_args_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0478310Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_with_sizes_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0478469Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_square_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0478666Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_squeeze_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0478842Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sum_to_size_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0479034Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_t_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0479231Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_take_along_dim_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0479396Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_take_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0479561Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tan_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0479725Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tanh_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0479888Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tile_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0480044Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_to_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0480216Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_to_sparse_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0480389Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_transpose_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0480553Z 
test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tril_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0480721Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unflatten_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0480894Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unfold_copy_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0481060Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unfold_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0481242Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unique_consecutive_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0481398Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unique_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0481571Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unsqueeze_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0481735Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0481922Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_vstack_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0482091Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_where_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0482257Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_xlogy_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0482422Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zeros_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0482590Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_H_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0482758Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_H_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0482913Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0483090Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0483257Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0483424Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rand___cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0483594Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rdiv___cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0483766Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rdiv___cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0483934Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rdiv___cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0484111Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmatmul___cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0484277Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmatmul___cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0484475Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmul___cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0484646Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmul___cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0484814Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmul___cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0484977Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rpow___cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0485150Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rsub___cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0485311Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rsub___cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0485502Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples__softmax_backward_data_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0485670Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acos_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0485829Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acos_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0486000Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0486169Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0486339Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0486514Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcdiv_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0486687Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcdiv_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0486859Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0487027Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0487204Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_decomposed_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0487378Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmv_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0487548Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addr_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0487736Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0487912Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_allclose_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0488084Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_allclose_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0488249Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amax_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0488416Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amax_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0488582Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amin_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0488741Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amin_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0488930Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_aminmax_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0489125Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0489302Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0489467Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0489636Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_arange_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0489806Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmax_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0489977Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmin_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0490138Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argsort_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0490305Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argsort_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0490508Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_complex64 PASSED [ 10%] 
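Note: the long run of test_non_standard_bool_values_* cases that ends just above exercises ops on bool tensors whose storage bytes are not the canonical 0/1. A minimal sketch of the idea, assuming the harness checks something of this shape (not the exact code in test_ops.py):

    import torch

    # Reinterpret raw bytes as bool: bytes 0, 3 and 7 yield a bool tensor
    # whose True values are "non-standard" (not stored as 1).
    raw = torch.tensor([0, 3, 7], dtype=torch.uint8)
    weird = raw.view(torch.bool)
    # Canonical 0/1 reference tensor with the same truth values.
    std = torch.tensor([False, True, True])
    # Ops are expected to treat any nonzero byte as True, so results agree.
    assert torch.equal(torch.logical_not(weird), torch.logical_not(std))
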
2023-01-11T23:13:47.0490676Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0490850Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_cuda_float32 XFAIL [ 10%] 2023-01-11T23:13:47.0491046Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_partial_views_cuda_complex64 XFAIL [ 10%] 2023-01-11T23:13:47.0491281Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_scatter_cuda_complex64 SKIPPED (Works for int64, fails for everything else) [ 10%] 2023-01-11T23:13:47.0491507Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_scatter_cuda_float32 SKIPPED (Works for int64, fails for everything else) [ 10%] 2023-01-11T23:13:47.0491674Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0491837Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan2_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0492002Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan2_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0492179Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atanh_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0492344Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atanh_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0492523Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0492696Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_2d_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0492875Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0493046Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_baddbmm_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0493220Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bernoulli_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0493387Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0493588Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0493760Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_not_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0493927Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_or_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0494112Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_right_shift_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0494282Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_xor_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0494460Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0494747Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0494943Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_shapes_cuda_float32 SKIPPED (Skipped!) 
[ 10%] 2023-01-11T23:13:47.0495130Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_tensors_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0495315Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_tensors_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0495502Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0495677Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0495850Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bucketize_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0496018Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0496193Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cartesian_prod_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0496420Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0496575Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0496747Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdist_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0496913Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0497078Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0497242Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0497410Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_char_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0497574Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_char_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0497741Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0497917Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_solve_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0498087Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_max_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0498261Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_min_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0498428Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_min_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0498605Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0498782Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_combinations_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0498980Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_combinations_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0499174Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_complex_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0499344Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0499501Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0499697Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0499882Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0500051Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_copysign_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0500217Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_copysign_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0500388Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0500554Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0500721Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0500881Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0501042Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0501207Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cosh_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0501387Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_count_nonzero_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0501563Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_count_nonzero_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0501728Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0501897Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0502065Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0502254Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummax_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0502413Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummin_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0502585Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummin_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0502751Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0502924Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0503087Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0503254Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_deg2rad_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0503419Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_deg2rad_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0503582Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0503740Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0503902Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0504079Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0504257Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagflat_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0504427Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagflat_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0504592Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagflat_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0504757Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0504943Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_scatter_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0505125Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_scatter_cuda_float32 
PASSED [ 11%] 2023-01-11T23:13:47.0505293Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_scatter_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0505485Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diff_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0505651Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diff_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0505820Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_digamma_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0505982Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dist_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0506165Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_floor_rounding_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0506342Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_floor_rounding_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0506530Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_no_rounding_mode_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0506708Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_no_rounding_mode_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0506892Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_no_rounding_mode_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0507061Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dot_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0507227Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dot_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0507398Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0507561Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0507732Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dsplit_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0507899Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dsplit_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0508086Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dsplit_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0508247Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0508414Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0508598Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_cuda_float32 SKIPPED (Skipped!) [ 11%] 2023-01-11T23:13:47.0508782Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_cuda_int64 SKIPPED (Skipped!) [ 11%] 2023-01-11T23:13:47.0508991Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_like_cuda_float32 SKIPPED (Skipped!) 
[ 11%] 2023-01-11T23:13:47.0509182Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_equal_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0509345Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_equal_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0509514Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erf_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0509729Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erf_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0509903Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfc_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0510065Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfc_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0510231Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfinv_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0510396Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfinv_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0510563Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0510729Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0510899Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_as_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0511071Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_as_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0511232Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0511424Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expm1_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0511593Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expm1_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0511772Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eye_cuda_float32 SKIPPED (Skipped!) 
[ 11%] 2023-01-11T23:13:47.0511945Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft2_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0512118Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0512291Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftn_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0512456Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftn_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0512629Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0512806Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0512979Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft2_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0513147Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0513322Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0513494Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0513660Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0513824Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0514016Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0514184Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftshift_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0514357Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftshift_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0514527Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0514693Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0514860Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0515033Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0515208Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0515381Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0515551Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfftn_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0515714Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfftn_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0515881Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0516050Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0516216Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0516379Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0516548Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0516713Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_int64 PASSED [ 11%] 
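Note: the test_noncontiguous_samples_* cases running through this stretch apply each op to noncontiguous inputs and compare against the contiguous equivalent; the XFAIL entries (e.g. as_strided above) are pytest's expected-failure marker, i.e. known, tracked failures rather than regressions. A minimal sketch of the comparison, assuming the check takes roughly this shape:

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(4, 4, device=device).t()  # transpose -> noncontiguous view
    assert not a.is_contiguous()
    # The op on the noncontiguous view should match it on a contiguous copy.
    assert torch.allclose(torch.cos(a), torch.cos(a.contiguous()))
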
2023-01-11T23:13:47.0516883Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0517040Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0517231Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0517407Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_divide_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0517571Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmax_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0517735Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmax_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0517896Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmin_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0518060Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmin_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0518222Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmod_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0518394Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0518555Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_like_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0518725Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gather_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0518891Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gather_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0519059Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gather_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0519222Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gcd_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0519391Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_geqrf_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0519558Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_geqrf_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0519765Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gradient_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0519936Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_grid_sampler_2d_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0520102Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gt_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0520265Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gt_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0520427Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0520593Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_heaviside_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0520761Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hstack_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0520920Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_i0_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0521086Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_igamma_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0521254Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_fill_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0521426Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_fill_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0521599Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_fill_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0521775Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_complex64 
PASSED [ 11%] 2023-01-11T23:13:47.0521941Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0522116Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0522288Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_select_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0522461Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_select_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0522631Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_inner_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0522790Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0522980Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0523157Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0523325Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0523491Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isinf_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0523657Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isnan_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0523824Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isposinf_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0523992Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0524190Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0524382Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_4inputs_with_extra_args_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0524577Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_4inputs_with_extra_args_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0524758Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0524936Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0525114Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0525297Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_unary_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0525487Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0525657Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kthvalue_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0525818Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kthvalue_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0525991Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ldexp_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0526157Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_le_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0526319Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_le_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0526485Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lerp_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0526650Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lgamma_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0526825Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cond_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0527006Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0527177Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0527342Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0527517Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0527743Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_singular_cuda_float32 SKIPPED (The backward may give different results) [ 12%] 2023-01-11T23:13:47.0527911Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eig_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0528087Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvals_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0528272Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0528461Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_ex_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0528666Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_ex_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0528862Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_grad_oriented_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0529058Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0529263Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_ex_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0529447Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_solve_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0529629Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_power_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0529813Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0530006Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_hermitian_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0530194Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_multi_dot_cuda_float32 SKIPPED (67470!) 
[ 12%] 2023-01-11T23:13:47.0530374Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0530545Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0530712Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0530902Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0531150Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_singular_cuda_complex64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 12%] 2023-01-11T23:13:47.0541264Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 12%] 2023-01-11T23:13:47.0541481Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_qr_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0541667Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_slogdet_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0541842Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_slogdet_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0542020Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0542206Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_triangular_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0542374Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svd_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0542556Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorinv_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0542730Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0542904Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0543081Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vecdot_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0543249Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0543418Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log10_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0543583Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log10_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0543745Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log10_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0543905Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log1p_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0544072Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log1p_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0544316Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log1p_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0544485Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0544650Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0544822Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0545012Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_with_dtype_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0545200Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_with_dtype_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0545365Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logaddexp_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0545541Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logdet_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0545715Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0545888Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0546059Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0546230Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0546401Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0546567Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logit_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0546732Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logit_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0546953Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0547122Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logsumexp_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0547294Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0547462Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0547627Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0547790Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lt_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0547957Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0548129Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_unpack_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0548286Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0548458Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mT_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0548618Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mT_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0548783Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mT_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0548955Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amax_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0549127Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amax_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0549302Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amin_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0549473Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmin_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0549651Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0549922Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_log_softmax_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0550109Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logaddexp_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0550322Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logsumexp_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0550503Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_mean_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0550679Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_mean_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0550852Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_norm_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0551032Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_normalize_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0551204Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_prod_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0551385Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0551561Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_select_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0551738Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_softmin_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0551915Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0552087Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0552256Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0552427Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0552595Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_binary_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0552779Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_no_dim_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0552989Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_with_dim_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0553161Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mean_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0553354Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_list_of_tensors_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0553543Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_list_of_tensors_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0553737Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0553929Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0554102Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_binary_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0554275Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_binary_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0554459Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_with_dim_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0554622Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mm_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0554796Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_movedim_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0554965Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_movedim_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0555133Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mul_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0555300Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mul_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0555475Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_multinomial_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0555640Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mv_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0555831Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0556018Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0556217Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_3_cuda_int64 PASSED [ 12%]
2023-01-11T23:13:47.0556402Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0556568Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nan_to_num_cuda_int64 PASSED [ 12%]
2023-01-11T23:13:47.0556730Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0556892Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_int64 PASSED [ 12%]
2023-01-11T23:13:47.0557055Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0557221Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_int64 PASSED [ 12%]
2023-01-11T23:13:47.0557410Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_complex64 SKIPPED (Skipped!) [ 12%]
2023-01-11T23:13:47.0557653Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_strided_cuda_float32 SKIPPED (Expected: new_empty_strided is not comparable) [ 12%]
2023-01-11T23:13:47.0557816Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_full_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0557984Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_ones_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0558157Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_complex64 PASSED [ 12%]
2023-01-11T23:13:47.0558330Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nextafter_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0558524Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_alpha_dropout_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0558745Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0558949Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv2d_cuda_complex64 PASSED [ 12%]
2023-01-11T23:13:47.0559181Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0559379Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_embedding_loss_cuda_int64 PASSED [ 12%]
2023-01-11T23:13:47.0559557Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0559736Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_elu_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0559946Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0560166Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0560346Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_glu_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0560539Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_grid_sample_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0560727Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_group_norm_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0560917Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardshrink_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0561107Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardsigmoid_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0561287Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardtanh_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0561483Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0561684Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0561945Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0562142Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_linear_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0562325Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_kl_div_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0562509Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_l1_loss_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0562691Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_l1_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0562876Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_linear_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0563059Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_logsigmoid_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0563257Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0563451Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_margin_ranking_loss_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0563639Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_pool2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0563830Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool1d_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0564016Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0564207Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0564415Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_mish_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0564599Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_mse_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0564786Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0564970Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_nll_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0565162Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_normalize_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0565347Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_normalize_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0565539Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_reflect_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0565733Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_reflect_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0565936Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0566130Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0566321Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0566511Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pdist_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0566691Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0566882Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_unshuffle_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0567066Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu6_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0567247Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu6_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0567449Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0567631Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0567814Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_rrelu_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0567991Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_selu_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0568188Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0568373Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_with_dtype_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0568564Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softplus_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0568752Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softshrink_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0568940Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0569153Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0569362Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_threshold_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0569547Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_threshold_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0569744Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0569941Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0570172Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0570383Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0570581Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_upsample_nearest_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0570748Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0570923Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_inf_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0571101Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_nuc_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0571272Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_nuc_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0571440Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_normal_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0571620Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_normal_number_mean_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0571778Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0571948Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0572113Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ormqr_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0572285Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pca_lowrank_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0572451Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0572622Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pinverse_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0572792Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pinverse_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0572981Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_0_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0573197Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_1_cuda_int64 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0573398Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_2_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0573595Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_2_cuda_int64 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0573793Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_4_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0573959Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_positive_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0574126Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0574299Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_prod_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0574465Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_prod_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0574749Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_prod_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0574905Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_qr_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0575065Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_qr_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0575232Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0575404Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rand_like_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0575601Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_cuda_int64 SKIPPED (Test expects tensor input) [ 13%]
2023-01-11T23:13:47.0575777Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_like_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0576066Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_cuda_float32 SKIPPED (Test expects tensor input) [ 13%]
2023-01-11T23:13:47.0576239Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0576402Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0576563Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0576723Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0576899Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reciprocal_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0577069Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reciprocal_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0577238Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_renorm_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0577405Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_renorm_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0577567Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0577756Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_interleave_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0577927Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0578088Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize__cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0578251Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize__cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0578423Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0578589Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0578762Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_conj_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0578936Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0579132Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0579302Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rot90_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0579456Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rot90_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0579621Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0579815Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_neg_3_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0579978Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsqrt_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0580141Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0580320Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_add_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0580487Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0580674Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amax_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0580861Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amin_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0581031Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_mean_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0581211Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_mean_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0581389Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_sum_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0581566Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_sum_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0581765Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_searchsorted_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0581950Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_segment_reduce_lengths_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0582126Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0582301Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_scatter_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0582459Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sgn_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0582632Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sigmoid_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0582801Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sigmoid_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0582963Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sigmoid_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0583131Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sign_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0583292Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sign_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0583500Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_exponential_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0583702Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_general_cosine_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0583907Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_general_hamming_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0584095Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_hamming_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0584262Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sin_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0584433Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0584599Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0584776Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_scatter_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0584977Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_scatter_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0585168Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_with_dtype_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0585347Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_with_dtype_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0585524Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_with_dtype_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0585717Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sparse_sampled_addmm_cuda_complex64 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0585913Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sparse_sampled_addmm_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0586096Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j0_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0586274Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j1_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0586447Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j1_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0586617Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y0_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0586785Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y1_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0586979Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_t_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0587172Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_u_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0587565Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 13%]
2023-01-11T23:13:47.0587964Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_w_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 13%]
2023-01-11T23:13:47.0588144Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_entr_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0588337Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_h_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0588527Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_h_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0588700Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i0e_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0588870Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i0e_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0589043Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0589214Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0589410Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_laguerre_polynomial_l_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0589823Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%]
2023-01-11T23:13:47.0590180Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_legendre_polynomial_p_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%]
2023-01-11T23:13:47.0590372Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i0_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0590557Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i0_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0590746Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i1_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0590964Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i1_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0591154Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k0_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0591337Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k1_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0591512Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtri_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0591712Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0591909Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k1_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0592286Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%]
2023-01-11T23:13:47.0592660Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%]
2023-01-11T23:13:47.0593023Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_w_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%]
2023-01-11T23:13:47.0593202Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_xlog1py_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0593376Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_xlog1py_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0593550Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_zeta_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0593748Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_zeta_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0593921Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0594096Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0594265Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sqrt_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0594428Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sqrt_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0594604Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0594772Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0594939Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0595112Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stack_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0595281Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0595444Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0595616Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_unbiased_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0595779Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stft_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0595943Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sub_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0596106Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0596267Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0596445Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0596619Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0596792Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_lowrank_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0596979Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0597146Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0597307Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0597470Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0597636Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tanh_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0597814Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensor_split_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0597984Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensor_split_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0598159Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensordot_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0598328Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tile_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0598487Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tile_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0598654Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0598819Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0598995Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_complex64 SKIPPED [ 14%]
2023-01-11T23:13:47.0599164Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_int64 SKIPPED [ 14%]
2023-01-11T23:13:47.0599329Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_topk_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0599494Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_topk_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0599691Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0599849Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0600016Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0600188Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0600356Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0600524Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapz_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0600690Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapz_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0600853Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapz_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0601036Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triangular_solve_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0601205Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0601364Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0601551Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_indices_cuda_int64 SKIPPED (Skipped!) [ 14%]
2023-01-11T23:13:47.0601718Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0601887Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0602075Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_indices_cuda_int64 SKIPPED (Skipped!) [ 14%]
2023-01-11T23:13:47.0602253Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0602423Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0602594Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0602772Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0602947Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unflatten_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0603124Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_copy_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0603297Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_copy_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0603465Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_copy_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0603632Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0603798Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0603963Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0604131Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_uniform_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0604311Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_consecutive_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0604489Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_consecutive_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0604658Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0604834Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0605003Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0605183Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_unbiased_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0605381Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_unbiased_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0605546Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vdot_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0605709Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0605875Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0606051Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_real_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0606223Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_copy_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0606388Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_copy_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0606550Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0606721Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0606888Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0607048Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0607211Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0607377Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0607543Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_xlogy_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0607707Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_xlogy_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0607875Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0608035Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0608206Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0608368Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0608557Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0608726Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0608880Z test_ops.py::TestCommonCUDA::test_numpy_ref_addbmm_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0609031Z test_ops.py::TestCommonCUDA::test_numpy_ref_aminmax_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0609191Z test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0609343Z test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0609509Z test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_tensors_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0609676Z test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_tensors_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0609837Z test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0609992Z test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0610148Z test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0610299Z test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0610450Z test_ops.py::TestCommonCUDA::test_numpy_ref_clamp_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0610600Z test_ops.py::TestCommonCUDA::test_numpy_ref_clone_cuda_complex128 XFAIL [ 14%]
2023-01-11T23:13:47.0610750Z test_ops.py::TestCommonCUDA::test_numpy_ref_clone_cuda_int64 XFAIL [ 14%]
2023-01-11T23:13:47.0610901Z test_ops.py::TestCommonCUDA::test_numpy_ref_diag_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0611075Z test_ops.py::TestCommonCUDA::test_numpy_ref_diag_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0611225Z test_ops.py::TestCommonCUDA::test_numpy_ref_flatten_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0611379Z test_ops.py::TestCommonCUDA::test_numpy_ref_flatten_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0611552Z test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0611713Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0611865Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0612038Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorsolve_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0612203Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorsolve_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0612364Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vander_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0612521Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vecdot_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0612678Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vecdot_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0612851Z test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0613031Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose2d_cuda_float64 XFAIL [ 14%]
2023-01-11T23:13:47.0613212Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose3d_cuda_complex128 XFAIL [ 14%]
2023-01-11T23:13:47.0613392Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose3d_cuda_float64 XFAIL [ 14%]
2023-01-11T23:13:47.0613556Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_gelu_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0613724Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_l1_loss_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0613889Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_one_hot_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0614071Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0614278Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0614459Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0614719Z test_ops.py::TestCommonCUDA::test_numpy_ref_ravel_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0614872Z test_ops.py::TestCommonCUDA::test_numpy_ref_ravel_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0615025Z test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0615176Z test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0615327Z test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0615470Z test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0615642Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_bartlett_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0615819Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_gaussian_cuda_float64 PASSED [ 15%]
2023-01-11T23:13:47.0615985Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_hann_cuda_float64 PASSED [ 15%]
2023-01-11T23:13:47.0616153Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_kaiser_cuda_float64 PASSED [ 15%]
2023-01-11T23:13:47.0616321Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_nuttall_cuda_float64 PASSED [ 15%]
2023-01-11T23:13:47.0616480Z test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_float64 PASSED [ 15%]
2023-01-11T23:13:47.0616630Z test_ops.py::TestCommonCUDA::test_numpy_ref_tile_cuda_float64 PASSED [ 15%]
2023-01-11T23:13:47.0616784Z test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_int64 PASSED [ 15%]
2023-01-11T23:13:47.0616974Z test_ops.py::TestCommonCUDA::test_numpy_ref_tril_indices_cuda_int64 PASSED [ 15%]
2023-01-11T23:13:47.0617125Z test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_cuda_int64 PASSED [ 15%]
2023-01-11T23:13:47.0617285Z test_ops.py::TestCommonCUDA::test_numpy_ref_view_copy_cuda_float64 PASSED [ 15%]
2023-01-11T23:13:47.0617437Z test_ops.py::TestCommonCUDA::test_numpy_ref_where_cuda_complex128 PASSED [ 15%]
2023-01-11T23:13:47.0617587Z test_ops.py::TestCommonCUDA::test_numpy_ref_where_cuda_float64 PASSED [ 15%]
2023-01-11T23:13:47.0617729Z test_ops.py::TestCommonCUDA::test_out_H_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0617876Z test_ops.py::TestCommonCUDA::test_out___getitem___cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0618021Z test_ops.py::TestCommonCUDA::test_out___rmatmul___cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0618157Z test_ops.py::TestCommonCUDA::test_out___rmod___cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0618300Z test_ops.py::TestCommonCUDA::test_out___rmul___cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0618443Z test_ops.py::TestCommonCUDA::test_out___rpow___cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0618586Z test_ops.py::TestCommonCUDA::test_out___rsub___cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0618751Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_bool_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0618917Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_double_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0619085Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_half_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0619248Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_long_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0619403Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_short_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0619553Z test_ops.py::TestCommonCUDA::test_out__refs_acos_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0619703Z test_ops.py::TestCommonCUDA::test_out__refs_add_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0619851Z test_ops.py::TestCommonCUDA::test_out__refs_addcdiv_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0620031Z test_ops.py::TestCommonCUDA::test_out__refs_addcmul_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0620177Z test_ops.py::TestCommonCUDA::test_out__refs_addr_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0620323Z test_ops.py::TestCommonCUDA::test_out__refs_all_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0620467Z test_ops.py::TestCommonCUDA::test_out__refs_amin_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0620616Z test_ops.py::TestCommonCUDA::test_out__refs_as_strided_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0620788Z test_ops.py::TestCommonCUDA::test_out__refs_as_strided_partial_views_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0620950Z test_ops.py::TestCommonCUDA::test_out__refs_as_strided_scatter_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0621100Z test_ops.py::TestCommonCUDA::test_out__refs_asin_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0621245Z test_ops.py::TestCommonCUDA::test_out__refs_atan2_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0621399Z test_ops.py::TestCommonCUDA::test_out__refs_atleast_1d_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0621558Z test_ops.py::TestCommonCUDA::test_out__refs_bitwise_not_cuda_int64 PASSED [ 15%]
2023-01-11T23:13:47.0621704Z test_ops.py::TestCommonCUDA::test_out__refs_bitwise_or_cuda_int64 PASSED [ 15%]
2023-01-11T23:13:47.0621843Z test_ops.py::TestCommonCUDA::test_out__refs_cat_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0621989Z test_ops.py::TestCommonCUDA::test_out__refs_ceil_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0622138Z test_ops.py::TestCommonCUDA::test_out__refs_clamp_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0622292Z test_ops.py::TestCommonCUDA::test_out__refs_clamp_min_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0622437Z test_ops.py::TestCommonCUDA::test_out__refs_clone_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0622607Z test_ops.py::TestCommonCUDA::test_out__refs_conj_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0622767Z test_ops.py::TestCommonCUDA::test_out__refs_conj_physical_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0622920Z test_ops.py::TestCommonCUDA::test_out__refs_copysign_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0623057Z test_ops.py::TestCommonCUDA::test_out__refs_cos_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0623205Z test_ops.py::TestCommonCUDA::test_out__refs_cumsum_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0623365Z test_ops.py::TestCommonCUDA::test_out__refs_diagonal_copy_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0623528Z test_ops.py::TestCommonCUDA::test_out__refs_div_floor_rounding_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0623687Z test_ops.py::TestCommonCUDA::test_out__refs_div_trunc_rounding_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0623833Z test_ops.py::TestCommonCUDA::test_out__refs_dsplit_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0623982Z test_ops.py::TestCommonCUDA::test_out__refs_erfc_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0624129Z test_ops.py::TestCommonCUDA::test_out__refs_erfinv_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0624278Z test_ops.py::TestCommonCUDA::test_out__refs_exp2_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0624417Z test_ops.py::TestCommonCUDA::test_out__refs_exp_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0624570Z test_ops.py::TestCommonCUDA::test_out__refs_expand_as_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0624716Z test_ops.py::TestCommonCUDA::test_out__refs_fft_fft_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0624872Z test_ops.py::TestCommonCUDA::test_out__refs_fft_fftshift_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0625020Z test_ops.py::TestCommonCUDA::test_out__refs_fft_hfft_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0625174Z test_ops.py::TestCommonCUDA::test_out__refs_fft_hfftn_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0625332Z test_ops.py::TestCommonCUDA::test_out__refs_fft_ihfft2_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0625485Z test_ops.py::TestCommonCUDA::test_out__refs_fft_ihfft_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0625656Z test_ops.py::TestCommonCUDA::test_out__refs_fft_irfft2_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0625808Z test_ops.py::TestCommonCUDA::test_out__refs_fft_rfft2_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0625958Z test_ops.py::TestCommonCUDA::test_out__refs_flatten_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0626104Z test_ops.py::TestCommonCUDA::test_out__refs_flip_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0626254Z test_ops.py::TestCommonCUDA::test_out__refs_fliplr_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0626399Z test_ops.py::TestCommonCUDA::test_out__refs_fmod_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0626543Z test_ops.py::TestCommonCUDA::test_out__refs_frac_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0626687Z test_ops.py::TestCommonCUDA::test_out__refs_ge_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0626836Z test_ops.py::TestCommonCUDA::test_out__refs_heaviside_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0626984Z test_ops.py::TestCommonCUDA::test_out__refs_igammac_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0627141Z test_ops.py::TestCommonCUDA::test_out__refs_index_copy_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0627295Z test_ops.py::TestCommonCUDA::test_out__refs_index_fill_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0627441Z test_ops.py::TestCommonCUDA::test_out__refs_isclose_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0627589Z test_ops.py::TestCommonCUDA::test_out__refs_isfinite_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0627734Z test_ops.py::TestCommonCUDA::test_out__refs_isnan_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0627878Z test_ops.py::TestCommonCUDA::test_out__refs_le_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0628016Z test_ops.py::TestCommonCUDA::test_out__refs_lerp_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0628206Z test_ops.py::TestCommonCUDA::test_out__refs_linalg_norm_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0628365Z test_ops.py::TestCommonCUDA::test_out__refs_linalg_svdvals_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0628514Z test_ops.py::TestCommonCUDA::test_out__refs_log10_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0628664Z test_ops.py::TestCommonCUDA::test_out__refs_log1p_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0628809Z test_ops.py::TestCommonCUDA::test_out__refs_log2_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0628955Z test_ops.py::TestCommonCUDA::test_out__refs_log_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0629123Z test_ops.py::TestCommonCUDA::test_out__refs_log_softmax_with_dtype_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0629275Z test_ops.py::TestCommonCUDA::test_out__refs_logical_and_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0629429Z test_ops.py::TestCommonCUDA::test_out__refs_logical_or_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0629587Z test_ops.py::TestCommonCUDA::test_out__refs_logical_xor_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0629801Z test_ops.py::TestCommonCUDA::test_out__refs_logspace_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0629967Z test_ops.py::TestCommonCUDA::test_out__refs_logsumexp_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0630120Z test_ops.py::TestCommonCUDA::test_out__refs_masked_fill_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0630269Z test_ops.py::TestCommonCUDA::test_out__refs_maximum_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0630417Z test_ops.py::TestCommonCUDA::test_out__refs_movedim_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0630561Z test_ops.py::TestCommonCUDA::test_out__refs_nan_to_num_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0630708Z test_ops.py::TestCommonCUDA::test_out__refs_narrow_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0630854Z test_ops.py::TestCommonCUDA::test_out__refs_ne_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0631053Z test_ops.py::TestCommonCUDA::test_out__refs_new_empty_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 15%]
2023-01-11T23:13:47.0631214Z test_ops.py::TestCommonCUDA::test_out__refs_new_empty_strided_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0631395Z test_ops.py::TestCommonCUDA::test_out__refs_new_zeros_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0631613Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_dropout_cuda_float32 SKIPPED (Expected: dropout is not comparable) [ 15%]
2023-01-11T23:13:47.0631782Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_group_norm_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0631950Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_hardtanh_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0632124Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0632297Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_leaky_relu_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0632482Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_log_softmax_with_dtype_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0632663Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0632829Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_nll_loss_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0633007Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_pairwise_distance_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0633174Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_pdist_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0633349Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0633512Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_relu6_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0633667Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_relu_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0633861Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softshrink_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0634029Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_threshold_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0634183Z test_ops.py::TestCommonCUDA::test_out__refs_positive_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0634330Z test_ops.py::TestCommonCUDA::test_out__refs_pow_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0634479Z test_ops.py::TestCommonCUDA::test_out__refs_prod_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0634626Z test_ops.py::TestCommonCUDA::test_out__refs_randn_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0634770Z test_ops.py::TestCommonCUDA::test_out__refs_ravel_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0634920Z test_ops.py::TestCommonCUDA::test_out__refs_reciprocal_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0635066Z test_ops.py::TestCommonCUDA::test_out__refs_roll_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0635215Z test_ops.py::TestCommonCUDA::test_out__refs_round_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0635362Z test_ops.py::TestCommonCUDA::test_out__refs_rsqrt_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0635505Z test_ops.py::TestCommonCUDA::test_out__refs_sgn_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0635656Z test_ops.py::TestCommonCUDA::test_out__refs_sigmoid_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0635800Z test_ops.py::TestCommonCUDA::test_out__refs_sign_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0635945Z test_ops.py::TestCommonCUDA::test_out__refs_sinh_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0636100Z test_ops.py::TestCommonCUDA::test_out__refs_softmax_with_dtype_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0636264Z test_ops.py::TestCommonCUDA::test_out__refs_special_bessel_j1_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0636425Z test_ops.py::TestCommonCUDA::test_out__refs_special_erfcx_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0636577Z test_ops.py::TestCommonCUDA::test_out__refs_special_i1_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0636740Z test_ops.py::TestCommonCUDA::test_out__refs_special_log_ndtr_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0636941Z test_ops.py::TestCommonCUDA::test_out__refs_special_log_softmax_with_dtype_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0637123Z test_ops.py::TestCommonCUDA::test_out__refs_special_multigammaln_mvlgamma_p_1_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0637303Z test_ops.py::TestCommonCUDA::test_out__refs_special_multigammaln_mvlgamma_p_5_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0637461Z test_ops.py::TestCommonCUDA::test_out__refs_special_ndtr_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0637625Z test_ops.py::TestCommonCUDA::test_out__refs_special_softmax_with_dtype_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0637773Z test_ops.py::TestCommonCUDA::test_out__refs_sqrt_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0637918Z test_ops.py::TestCommonCUDA::test_out__refs_stack_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0638069Z test_ops.py::TestCommonCUDA::test_out__refs_std_mean_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0638215Z test_ops.py::TestCommonCUDA::test_out__refs_sum_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0638365Z test_ops.py::TestCommonCUDA::test_out__refs_tanh_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0638510Z test_ops.py::TestCommonCUDA::test_out__refs_to_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0638654Z test_ops.py::TestCommonCUDA::test_out__refs_trace_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0638802Z test_ops.py::TestCommonCUDA::test_out__refs_tril_indices_cuda_int64 PASSED [ 15%]
2023-01-11T23:13:47.0638945Z test_ops.py::TestCommonCUDA::test_out__refs_triu_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0639100Z test_ops.py::TestCommonCUDA::test_out__refs_triu_indices_cuda_int64 PASSED [ 15%]
2023-01-11T23:13:47.0639258Z test_ops.py::TestCommonCUDA::test_out__refs_true_divide_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0639427Z test_ops.py::TestCommonCUDA::test_out__refs_trunc_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0639572Z test_ops.py::TestCommonCUDA::test_out__refs_var_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0639721Z test_ops.py::TestCommonCUDA::test_out__refs_var_mean_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0639869Z test_ops.py::TestCommonCUDA::test_out__refs_view_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0640009Z test_ops.py::TestCommonCUDA::test_out__refs_vsplit_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0640154Z test_ops.py::TestCommonCUDA::test_out__refs_where_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0640299Z test_ops.py::TestCommonCUDA::test_out__refs_xlogy_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0640462Z test_ops.py::TestCommonCUDA::test_out__softmax_backward_data_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0640606Z test_ops.py::TestCommonCUDA::test_out_acos_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0640747Z test_ops.py::TestCommonCUDA::test_out_acosh_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0640892Z test_ops.py::TestCommonCUDA::test_out_add_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0641035Z test_ops.py::TestCommonCUDA::test_out_addbmm_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0641175Z test_ops.py::TestCommonCUDA::test_out_addcdiv_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0641320Z test_ops.py::TestCommonCUDA::test_out_addmm_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0641474Z test_ops.py::TestCommonCUDA::test_out_addmm_decomposed_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0641617Z test_ops.py::TestCommonCUDA::test_out_all_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0641763Z test_ops.py::TestCommonCUDA::test_out_amin_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0641907Z test_ops.py::TestCommonCUDA::test_out_aminmax_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0642049Z test_ops.py::TestCommonCUDA::test_out_angle_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0642196Z test_ops.py::TestCommonCUDA::test_out_argmax_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0642333Z test_ops.py::TestCommonCUDA::test_out_argsort_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0642513Z test_ops.py::TestCommonCUDA::test_out_as_strided_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0642679Z test_ops.py::TestCommonCUDA::test_out_as_strided_partial_views_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0642836Z test_ops.py::TestCommonCUDA::test_out_as_strided_scatter_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0642978Z test_ops.py::TestCommonCUDA::test_out_asin_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0643122Z test_ops.py::TestCommonCUDA::test_out_atan2_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0643267Z test_ops.py::TestCommonCUDA::test_out_atanh_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0643416Z test_ops.py::TestCommonCUDA::test_out_atleast_1d_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0643567Z test_ops.py::TestCommonCUDA::test_out_atleast_2d_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0643705Z test_ops.py::TestCommonCUDA::test_out_atleast_3d_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0643851Z test_ops.py::TestCommonCUDA::test_out_baddbmm_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0643996Z test_ops.py::TestCommonCUDA::test_out_bernoulli_cuda_float32 XFAIL [ 16%]
2023-01-11T23:13:47.0644140Z test_ops.py::TestCommonCUDA::test_out_bfloat16_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0644283Z test_ops.py::TestCommonCUDA::test_out_bincount_cuda_int64 PASSED [ 16%]
2023-01-11T23:13:47.0644436Z test_ops.py::TestCommonCUDA::test_out_bitwise_left_shift_cuda_int64 PASSED [ 16%]
2023-01-11T23:13:47.0644582Z test_ops.py::TestCommonCUDA::test_out_bitwise_not_cuda_int64 PASSED [ 16%]
2023-01-11T23:13:47.0644723Z test_ops.py::TestCommonCUDA::test_out_bool_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0644877Z test_ops.py::TestCommonCUDA::test_out_broadcast_shapes_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0645062Z test_ops.py::TestCommonCUDA::test_out_broadcast_tensors_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0645208Z test_ops.py::TestCommonCUDA::test_out_bucketize_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0645359Z test_ops.py::TestCommonCUDA::test_out_cholesky_solve_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0645507Z test_ops.py::TestCommonCUDA::test_out_column_stack_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0645648Z test_ops.py::TestCommonCUDA::test_out_conj_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0645796Z test_ops.py::TestCommonCUDA::test_out_conj_physical_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0645942Z test_ops.py::TestCommonCUDA::test_out_corrcoef_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0646078Z test_ops.py::TestCommonCUDA::test_out_cosh_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0646225Z test_ops.py::TestCommonCUDA::test_out_count_nonzero_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0646370Z test_ops.py::TestCommonCUDA::test_out_cov_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0646513Z test_ops.py::TestCommonCUDA::test_out_cummax_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0646657Z test_ops.py::TestCommonCUDA::test_out_cumprod_cuda_float32 XFAIL [ 16%]
2023-01-11T23:13:47.0646821Z test_ops.py::TestCommonCUDA::test_out_cumulative_trapezoid_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0646965Z test_ops.py::TestCommonCUDA::test_out_deg2rad_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0647112Z test_ops.py::TestCommonCUDA::test_out_diag_embed_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0647251Z test_ops.py::TestCommonCUDA::test_out_diagonal_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0647408Z test_ops.py::TestCommonCUDA::test_out_diagonal_scatter_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0647554Z test_ops.py::TestCommonCUDA::test_out_digamma_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0647714Z test_ops.py::TestCommonCUDA::test_out_div_floor_rounding_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0647859Z test_ops.py::TestCommonCUDA::test_out_double_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0648029Z test_ops.py::TestCommonCUDA::test_out_dsplit_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0648173Z test_ops.py::TestCommonCUDA::test_out_einsum_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0648318Z test_ops.py::TestCommonCUDA::test_out_empty_like_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0648534Z test_ops.py::TestCommonCUDA::test_out_equal_cuda_float32 SKIPPED (Skipped! Only supports single tensor or iterable of tensor outputs.) [ 16%]
2023-01-11T23:13:47.0648680Z test_ops.py::TestCommonCUDA::test_out_erfinv_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0648821Z test_ops.py::TestCommonCUDA::test_out_exp_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0648969Z test_ops.py::TestCommonCUDA::test_out_expand_as_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0649115Z test_ops.py::TestCommonCUDA::test_out_expand_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0649285Z test_ops.py::TestCommonCUDA::test_out_fft_fftshift_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0649456Z test_ops.py::TestCommonCUDA::test_out_fft_ihfftn_cuda_float32 XFAIL [ 16%]
2023-01-11T23:13:47.0649600Z test_ops.py::TestCommonCUDA::test_out_fft_rfftn_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0649737Z test_ops.py::TestCommonCUDA::test_out_floor_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0649884Z test_ops.py::TestCommonCUDA::test_out_floor_divide_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0650027Z test_ops.py::TestCommonCUDA::test_out_fmin_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0650170Z test_ops.py::TestCommonCUDA::test_out_frac_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0650313Z test_ops.py::TestCommonCUDA::test_out_frexp_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0650457Z test_ops.py::TestCommonCUDA::test_out_full_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0650675Z test_ops.py::TestCommonCUDA::test_out_full_like_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0650814Z test_ops.py::TestCommonCUDA::test_out_gcd_cuda_int64 PASSED [ 16%]
2023-01-11T23:13:47.0650956Z test_ops.py::TestCommonCUDA::test_out_ge_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0651096Z test_ops.py::TestCommonCUDA::test_out_gradient_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0651238Z test_ops.py::TestCommonCUDA::test_out_gt_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0651378Z test_ops.py::TestCommonCUDA::test_out_hsplit_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0651520Z test_ops.py::TestCommonCUDA::test_out_i0_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0651664Z test_ops.py::TestCommonCUDA::test_out_igamma_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0651809Z test_ops.py::TestCommonCUDA::test_out_igammac_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0651956Z test_ops.py::TestCommonCUDA::test_out_imag_cuda_complex64 PASSED [ 16%]
2023-01-11T23:13:47.0652103Z test_ops.py::TestCommonCUDA::test_out_index_add_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0652245Z test_ops.py::TestCommonCUDA::test_out_index_copy_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0652391Z test_ops.py::TestCommonCUDA::test_out_index_fill_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0652536Z test_ops.py::TestCommonCUDA::test_out_index_put_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0652682Z test_ops.py::TestCommonCUDA::test_out_index_select_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0652824Z test_ops.py::TestCommonCUDA::test_out_inner_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0652964Z test_ops.py::TestCommonCUDA::test_out_isinf_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0653109Z test_ops.py::TestCommonCUDA::test_out_isneginf_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0653256Z test_ops.py::TestCommonCUDA::test_out_istft_cuda_complex64 PASSED [ 16%]
2023-01-11T23:13:47.0653403Z test_ops.py::TestCommonCUDA::test_out_jiterator_binary_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0653571Z test_ops.py::TestCommonCUDA::test_out_kron_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0653718Z test_ops.py::TestCommonCUDA::test_out_kthvalue_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0653857Z test_ops.py::TestCommonCUDA::test_out_lcm_cuda_int64 PASSED [ 16%]
2023-01-11T23:13:47.0654003Z test_ops.py::TestCommonCUDA::test_out_ldexp_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0654147Z test_ops.py::TestCommonCUDA::test_out_lerp_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0654296Z test_ops.py::TestCommonCUDA::test_out_linalg_cond_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0654442Z test_ops.py::TestCommonCUDA::test_out_linalg_eig_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0654689Z test_ops.py::TestCommonCUDA::test_out_linalg_eigh_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0654843Z test_ops.py::TestCommonCUDA::test_out_linalg_eigvals_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0654997Z test_ops.py::TestCommonCUDA::test_out_linalg_eigvalsh_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0655146Z test_ops.py::TestCommonCUDA::test_out_linalg_inv_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0655292Z test_ops.py::TestCommonCUDA::test_out_linalg_inv_ex_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0655449Z test_ops.py::TestCommonCUDA::test_out_linalg_ldl_factor_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0655610Z test_ops.py::TestCommonCUDA::test_out_linalg_ldl_factor_ex_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0655770Z test_ops.py::TestCommonCUDA::test_out_linalg_ldl_solve_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0655931Z test_ops.py::TestCommonCUDA::test_out_linalg_lstsq_cuda_float32 SKIPPED (Skipped!)
[ 16%] 2023-01-11T23:13:47.0656098Z test_ops.py::TestCommonCUDA::test_out_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0656284Z test_ops.py::TestCommonCUDA::test_out_linalg_lu_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0656446Z test_ops.py::TestCommonCUDA::test_out_linalg_matrix_power_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0656597Z test_ops.py::TestCommonCUDA::test_out_linalg_norm_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0656772Z test_ops.py::TestCommonCUDA::test_out_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0656919Z test_ops.py::TestCommonCUDA::test_out_linalg_pinv_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0657069Z test_ops.py::TestCommonCUDA::test_out_linalg_slogdet_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0657227Z test_ops.py::TestCommonCUDA::test_out_linalg_solve_triangular_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0657374Z test_ops.py::TestCommonCUDA::test_out_linalg_svd_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0657524Z test_ops.py::TestCommonCUDA::test_out_linalg_svdvals_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0657688Z test_ops.py::TestCommonCUDA::test_out_linalg_tensorsolve_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0657838Z test_ops.py::TestCommonCUDA::test_out_linalg_vander_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0657998Z test_ops.py::TestCommonCUDA::test_out_linalg_vector_norm_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0658140Z test_ops.py::TestCommonCUDA::test_out_log10_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0658284Z test_ops.py::TestCommonCUDA::test_out_log1p_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0658421Z test_ops.py::TestCommonCUDA::test_out_log_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0658567Z test_ops.py::TestCommonCUDA::test_out_log_softmax_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0658727Z test_ops.py::TestCommonCUDA::test_out_log_softmax_with_dtype_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0658875Z test_ops.py::TestCommonCUDA::test_out_logaddexp2_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0659029Z test_ops.py::TestCommonCUDA::test_out_logical_and_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0659178Z test_ops.py::TestCommonCUDA::test_out_logical_not_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0659351Z test_ops.py::TestCommonCUDA::test_out_long_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0659493Z test_ops.py::TestCommonCUDA::test_out_lt_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0659633Z test_ops.py::TestCommonCUDA::test_out_lu_cuda_float32 XFAIL [ 16%] 2023-01-11T23:13:47.0659773Z test_ops.py::TestCommonCUDA::test_out_lu_solve_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0659918Z test_ops.py::TestCommonCUDA::test_out_lu_unpack_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0660059Z test_ops.py::TestCommonCUDA::test_out_mH_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0660201Z test_ops.py::TestCommonCUDA::test_out_mT_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0660345Z test_ops.py::TestCommonCUDA::test_out_masked_amin_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0660496Z test_ops.py::TestCommonCUDA::test_out_masked_argmax_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0660644Z test_ops.py::TestCommonCUDA::test_out_masked_argmin_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0660798Z test_ops.py::TestCommonCUDA::test_out_masked_cumprod_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0660949Z test_ops.py::TestCommonCUDA::test_out_masked_logsumexp_cuda_float32 PASSED [ 16%] 
2023-01-11T23:13:47.0661094Z test_ops.py::TestCommonCUDA::test_out_masked_mean_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0661242Z test_ops.py::TestCommonCUDA::test_out_masked_median_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0661389Z test_ops.py::TestCommonCUDA::test_out_masked_select_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0661537Z test_ops.py::TestCommonCUDA::test_out_masked_softmax_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0661685Z test_ops.py::TestCommonCUDA::test_out_masked_softmin_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0661856Z test_ops.py::TestCommonCUDA::test_out_masked_sum_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0662004Z test_ops.py::TestCommonCUDA::test_out_masked_var_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0662141Z test_ops.py::TestCommonCUDA::test_out_matmul_cuda_float32 XFAIL [ 17%] 2023-01-11T23:13:47.0662286Z test_ops.py::TestCommonCUDA::test_out_matrix_exp_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0662476Z test_ops.py::TestCommonCUDA::test_out_max_pool2d_with_indices_backward_cuda_float32 SKIPPED (Skipped!) [ 17%] 2023-01-11T23:13:47.0662618Z test_ops.py::TestCommonCUDA::test_out_mean_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0662784Z test_ops.py::TestCommonCUDA::test_out_meshgrid_variadic_tensors_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0662925Z test_ops.py::TestCommonCUDA::test_out_mm_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0663067Z test_ops.py::TestCommonCUDA::test_out_mul_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0663232Z test_ops.py::TestCommonCUDA::test_out_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0663387Z test_ops.py::TestCommonCUDA::test_out_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0663533Z test_ops.py::TestCommonCUDA::test_out_nanmean_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0663681Z test_ops.py::TestCommonCUDA::test_out_nanquantile_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0663845Z test_ops.py::TestCommonCUDA::test_out_native_dropout_backward_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0663992Z test_ops.py::TestCommonCUDA::test_out_new_empty_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0664135Z test_ops.py::TestCommonCUDA::test_out_new_full_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0664279Z test_ops.py::TestCommonCUDA::test_out_new_ones_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0664423Z test_ops.py::TestCommonCUDA::test_out_new_zeros_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0664597Z test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0664769Z test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0664970Z test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0665144Z test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0665314Z test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0665479Z test_ops.py::TestCommonCUDA::test_out_nn_functional_avg_pool1d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0665639Z test_ops.py::TestCommonCUDA::test_out_nn_functional_avg_pool2d_cuda_float32 XFAIL [ 17%] 2023-01-11T23:13:47.0665802Z test_ops.py::TestCommonCUDA::test_out_nn_functional_avg_pool3d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0665982Z 
test_ops.py::TestCommonCUDA::test_out_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0666162Z test_ops.py::TestCommonCUDA::test_out_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0666326Z test_ops.py::TestCommonCUDA::test_out_nn_functional_conv1d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0666500Z test_ops.py::TestCommonCUDA::test_out_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0666670Z test_ops.py::TestCommonCUDA::test_out_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0666847Z test_ops.py::TestCommonCUDA::test_out_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0667013Z test_ops.py::TestCommonCUDA::test_out_nn_functional_cross_entropy_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0667175Z test_ops.py::TestCommonCUDA::test_out_nn_functional_ctc_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0667337Z test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout3d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0667532Z test_ops.py::TestCommonCUDA::test_out_nn_functional_elu_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0667698Z test_ops.py::TestCommonCUDA::test_out_nn_functional_embedding_bag_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0667863Z test_ops.py::TestCommonCUDA::test_out_nn_functional_embedding_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0668058Z test_ops.py::TestCommonCUDA::test_out_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0668235Z test_ops.py::TestCommonCUDA::test_out_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0668408Z test_ops.py::TestCommonCUDA::test_out_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0668577Z test_ops.py::TestCommonCUDA::test_out_nn_functional_gelu_cuda_float32 SKIPPED (Skipped!) 
[ 17%] 2023-01-11T23:13:47.0668744Z test_ops.py::TestCommonCUDA::test_out_nn_functional_hardsigmoid_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0668923Z test_ops.py::TestCommonCUDA::test_out_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0669084Z test_ops.py::TestCommonCUDA::test_out_nn_functional_instance_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0669259Z test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0669437Z test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0669599Z test_ops.py::TestCommonCUDA::test_out_nn_functional_layer_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0669826Z test_ops.py::TestCommonCUDA::test_out_nn_functional_linear_cuda_float32 XFAIL [ 17%] 2023-01-11T23:13:47.0670004Z test_ops.py::TestCommonCUDA::test_out_nn_functional_local_response_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0670163Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_pool2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0670330Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_pool3d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0670496Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool1d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0670692Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0670863Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0671031Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0671203Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0671363Z test_ops.py::TestCommonCUDA::test_out_nn_functional_mish_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0671524Z test_ops.py::TestCommonCUDA::test_out_nn_functional_mse_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0671703Z test_ops.py::TestCommonCUDA::test_out_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0671890Z test_ops.py::TestCommonCUDA::test_out_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0672052Z test_ops.py::TestCommonCUDA::test_out_nn_functional_nll_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0672211Z test_ops.py::TestCommonCUDA::test_out_nn_functional_normalize_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0672376Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_constant_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0672543Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_replicate_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0672717Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pairwise_distance_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0672878Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pdist_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0673048Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0673247Z test_ops.py::TestCommonCUDA::test_out_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0673405Z test_ops.py::TestCommonCUDA::test_out_nn_functional_prelu_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0673560Z test_ops.py::TestCommonCUDA::test_out_nn_functional_relu6_cuda_float32 PASSED [ 17%] 
2023-01-11T23:13:47.0673719Z test_ops.py::TestCommonCUDA::test_out_nn_functional_rrelu_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0673877Z test_ops.py::TestCommonCUDA::test_out_nn_functional_selu_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0674043Z test_ops.py::TestCommonCUDA::test_out_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0674210Z test_ops.py::TestCommonCUDA::test_out_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0674375Z test_ops.py::TestCommonCUDA::test_out_nn_functional_softplus_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0674541Z test_ops.py::TestCommonCUDA::test_out_nn_functional_tanhshrink_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0674707Z test_ops.py::TestCommonCUDA::test_out_nn_functional_threshold_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0674895Z test_ops.py::TestCommonCUDA::test_out_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0675058Z test_ops.py::TestCommonCUDA::test_out_nn_functional_unfold_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0675229Z test_ops.py::TestCommonCUDA::test_out_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0675377Z test_ops.py::TestCommonCUDA::test_out_nonzero_cuda_float32 XFAIL [ 17%] 2023-01-11T23:13:47.0675526Z test_ops.py::TestCommonCUDA::test_out_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0675675Z test_ops.py::TestCommonCUDA::test_out_norm_fro_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0675821Z test_ops.py::TestCommonCUDA::test_out_normal_cuda_float32 XFAIL [ 17%] 2023-01-11T23:13:47.0675968Z test_ops.py::TestCommonCUDA::test_out_ones_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0676112Z test_ops.py::TestCommonCUDA::test_out_ormqr_cuda_float32 XFAIL [ 17%] 2023-01-11T23:13:47.0676283Z test_ops.py::TestCommonCUDA::test_out_pca_lowrank_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0676432Z test_ops.py::TestCommonCUDA::test_out_permute_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0676579Z test_ops.py::TestCommonCUDA::test_out_pinverse_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0676723Z test_ops.py::TestCommonCUDA::test_out_polar_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0676867Z test_ops.py::TestCommonCUDA::test_out_pow_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0677012Z test_ops.py::TestCommonCUDA::test_out_prod_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0677154Z test_ops.py::TestCommonCUDA::test_out_put_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0677299Z test_ops.py::TestCommonCUDA::test_out_rad2deg_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0677442Z test_ops.py::TestCommonCUDA::test_out_rand_like_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0677585Z test_ops.py::TestCommonCUDA::test_out_randint_cuda_float32 XFAIL [ 17%] 2023-01-11T23:13:47.0677731Z test_ops.py::TestCommonCUDA::test_out_ravel_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0677877Z test_ops.py::TestCommonCUDA::test_out_remainder_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0678023Z test_ops.py::TestCommonCUDA::test_out_reshape_as_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0678169Z test_ops.py::TestCommonCUDA::test_out_reshape_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0678316Z test_ops.py::TestCommonCUDA::test_out_resolve_conj_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0678457Z test_ops.py::TestCommonCUDA::test_out_roll_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0678592Z test_ops.py::TestCommonCUDA::test_out_rot90_cuda_float32 PASSED [ 17%] 
2023-01-11T23:13:47.0678777Z test_ops.py::TestCommonCUDA::test_out_round_decimals_0_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0678923Z test_ops.py::TestCommonCUDA::test_out_rsub_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0679097Z test_ops.py::TestCommonCUDA::test_out_scalar_tensor_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0679282Z test_ops.py::TestCommonCUDA::test_out_scatter_reduce_amin_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0679439Z test_ops.py::TestCommonCUDA::test_out_scatter_reduce_mean_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0679594Z test_ops.py::TestCommonCUDA::test_out_scatter_reduce_prod_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0679757Z test_ops.py::TestCommonCUDA::test_out_segment_reduce_offsets_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0679901Z test_ops.py::TestCommonCUDA::test_out_select_scatter_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0680043Z test_ops.py::TestCommonCUDA::test_out_sgn_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0680187Z test_ops.py::TestCommonCUDA::test_out_sign_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0680352Z test_ops.py::TestCommonCUDA::test_out_signal_windows_bartlett_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0680518Z test_ops.py::TestCommonCUDA::test_out_signal_windows_blackman_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0680681Z test_ops.py::TestCommonCUDA::test_out_signal_windows_cosine_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0680843Z test_ops.py::TestCommonCUDA::test_out_signal_windows_gaussian_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0681014Z test_ops.py::TestCommonCUDA::test_out_signal_windows_general_cosine_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0681177Z test_ops.py::TestCommonCUDA::test_out_signal_windows_general_hamming_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0681337Z test_ops.py::TestCommonCUDA::test_out_signal_windows_hamming_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0681493Z test_ops.py::TestCommonCUDA::test_out_signal_windows_nuttall_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0681642Z test_ops.py::TestCommonCUDA::test_out_sinh_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0681792Z test_ops.py::TestCommonCUDA::test_out_slice_scatter_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0681968Z test_ops.py::TestCommonCUDA::test_out_softmax_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0682130Z test_ops.py::TestCommonCUDA::test_out_softmax_with_dtype_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0682306Z test_ops.py::TestCommonCUDA::test_out_sparse_sampled_addmm_cuda_float32 SKIPPED (Skipped!) 
[ 17%] 2023-01-11T23:13:47.0682463Z test_ops.py::TestCommonCUDA::test_out_special_bessel_j0_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0682612Z test_ops.py::TestCommonCUDA::test_out_special_bessel_y0_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0682766Z test_ops.py::TestCommonCUDA::test_out_special_bessel_y1_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0683114Z test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 17%] 2023-01-11T23:13:47.0683269Z test_ops.py::TestCommonCUDA::test_out_special_erfcx_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0683444Z test_ops.py::TestCommonCUDA::test_out_special_hermite_polynomial_he_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0683592Z test_ops.py::TestCommonCUDA::test_out_special_i1_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0683743Z test_ops.py::TestCommonCUDA::test_out_special_i1e_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0684077Z test_ops.py::TestCommonCUDA::test_out_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 17%] 2023-01-11T23:13:47.0684235Z test_ops.py::TestCommonCUDA::test_out_special_log_ndtr_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0684393Z test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_i0_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0684559Z test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_i1_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0684739Z test_ops.py::TestCommonCUDA::test_out_special_ndtr_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0684890Z test_ops.py::TestCommonCUDA::test_out_special_ndtri_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0685078Z test_ops.py::TestCommonCUDA::test_out_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0685426Z test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 17%] 2023-01-11T23:13:47.0685581Z test_ops.py::TestCommonCUDA::test_out_special_xlog1py_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0685735Z test_ops.py::TestCommonCUDA::test_out_split_list_args_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0685875Z test_ops.py::TestCommonCUDA::test_out_squeeze_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0686021Z test_ops.py::TestCommonCUDA::test_out_std_mean_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0686183Z test_ops.py::TestCommonCUDA::test_out_std_mean_unbiased_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0686328Z test_ops.py::TestCommonCUDA::test_out_stft_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0686470Z test_ops.py::TestCommonCUDA::test_out_sum_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0686615Z test_ops.py::TestCommonCUDA::test_out_symeig_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0686758Z test_ops.py::TestCommonCUDA::test_out_t_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0686907Z test_ops.py::TestCommonCUDA::test_out_take_along_dim_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0687043Z test_ops.py::TestCommonCUDA::test_out_take_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0687184Z test_ops.py::TestCommonCUDA::test_out_tan_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0687326Z test_ops.py::TestCommonCUDA::test_out_tanh_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0687471Z test_ops.py::TestCommonCUDA::test_out_to_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0687615Z 
test_ops.py::TestCommonCUDA::test_out_to_sparse_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0687787Z test_ops.py::TestCommonCUDA::test_out_trapz_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0687934Z test_ops.py::TestCommonCUDA::test_out_tril_indices_cuda_int64 PASSED [ 18%] 2023-01-11T23:13:47.0688080Z test_ops.py::TestCommonCUDA::test_out_triu_indices_cuda_int64 PASSED [ 18%] 2023-01-11T23:13:47.0688218Z test_ops.py::TestCommonCUDA::test_out_trunc_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0688362Z test_ops.py::TestCommonCUDA::test_out_unbind_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0688508Z test_ops.py::TestCommonCUDA::test_out_uniform_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0688652Z test_ops.py::TestCommonCUDA::test_out_unsqueeze_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0688797Z test_ops.py::TestCommonCUDA::test_out_var_mean_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0688957Z test_ops.py::TestCommonCUDA::test_out_var_mean_unbiased_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0689099Z test_ops.py::TestCommonCUDA::test_out_vdot_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0689256Z test_ops.py::TestCommonCUDA::test_out_view_as_complex_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0689403Z test_ops.py::TestCommonCUDA::test_out_view_as_real_cuda_complex64 PASSED [ 18%] 2023-01-11T23:13:47.0689542Z test_ops.py::TestCommonCUDA::test_out_view_copy_cuda_float32 XFAIL [ 18%] 2023-01-11T23:13:47.0689687Z test_ops.py::TestCommonCUDA::test_out_vsplit_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0689828Z test_ops.py::TestCommonCUDA::test_out_warning_H_cuda PASSED [ 18%] 2023-01-11T23:13:47.0689969Z test_ops.py::TestCommonCUDA::test_out_warning_T_cuda PASSED [ 18%] 2023-01-11T23:13:47.0690114Z test_ops.py::TestCommonCUDA::test_out_warning___getitem___cuda PASSED [ 18%] 2023-01-11T23:13:47.0690285Z test_ops.py::TestCommonCUDA::test_out_warning___radd___cuda PASSED [ 18%] 2023-01-11T23:13:47.0690428Z test_ops.py::TestCommonCUDA::test_out_warning___rmod___cuda PASSED [ 18%] 2023-01-11T23:13:47.0690573Z test_ops.py::TestCommonCUDA::test_out_warning___rmul___cuda PASSED [ 18%] 2023-01-11T23:13:47.0690710Z test_ops.py::TestCommonCUDA::test_out_warning___rpow___cuda PASSED [ 18%] 2023-01-11T23:13:47.0690854Z test_ops.py::TestCommonCUDA::test_out_warning___rsub___cuda PASSED [ 18%] 2023-01-11T23:13:47.0690997Z test_ops.py::TestCommonCUDA::test_out_warning___rxor___cuda PASSED [ 18%] 2023-01-11T23:13:47.0691162Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_bool_cuda PASSED [ 18%] 2023-01-11T23:13:47.0691329Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_cfloat_cuda PASSED [ 18%] 2023-01-11T23:13:47.0691493Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_char_cuda PASSED [ 18%] 2023-01-11T23:13:47.0691664Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_complex_cuda PASSED [ 18%] 2023-01-11T23:13:47.0691827Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_half_cuda PASSED [ 18%] 2023-01-11T23:13:47.0691970Z test_ops.py::TestCommonCUDA::test_out_warning__refs_acos_cuda PASSED [ 18%] 2023-01-11T23:13:47.0692120Z test_ops.py::TestCommonCUDA::test_out_warning__refs_acosh_cuda PASSED [ 18%] 2023-01-11T23:13:47.0692266Z test_ops.py::TestCommonCUDA::test_out_warning__refs_add_cuda PASSED [ 18%] 2023-01-11T23:13:47.0692413Z test_ops.py::TestCommonCUDA::test_out_warning__refs_addcdiv_cuda PASSED [ 18%] 2023-01-11T23:13:47.0692562Z 
test_ops.py::TestCommonCUDA::test_out_warning__refs_addcmul_cuda PASSED [ 18%] 2023-01-11T23:13:47.0692708Z test_ops.py::TestCommonCUDA::test_out_warning__refs_amax_cuda PASSED [ 18%] 2023-01-11T23:13:47.0692854Z test_ops.py::TestCommonCUDA::test_out_warning__refs_amin_cuda PASSED [ 18%] 2023-01-11T23:13:47.0693012Z test_ops.py::TestCommonCUDA::test_out_warning__refs_as_strided_cuda PASSED [ 18%] 2023-01-11T23:13:47.0693178Z test_ops.py::TestCommonCUDA::test_out_warning__refs_as_strided_partial_views_cuda PASSED [ 18%] 2023-01-11T23:13:47.0693349Z test_ops.py::TestCommonCUDA::test_out_warning__refs_asin_cuda PASSED [ 18%] 2023-01-11T23:13:47.0693498Z test_ops.py::TestCommonCUDA::test_out_warning__refs_asinh_cuda PASSED [ 18%] 2023-01-11T23:13:47.0693644Z test_ops.py::TestCommonCUDA::test_out_warning__refs_atanh_cuda PASSED [ 18%] 2023-01-11T23:13:47.0693801Z test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_and_cuda PASSED [ 18%] 2023-01-11T23:13:47.0693968Z test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_left_shift_cuda PASSED [ 18%] 2023-01-11T23:13:47.0694123Z test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_not_cuda PASSED [ 18%] 2023-01-11T23:13:47.0694274Z test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_xor_cuda PASSED [ 18%] 2023-01-11T23:13:47.0694433Z test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_shapes_cuda PASSED [ 18%] 2023-01-11T23:13:47.0694696Z test_ops.py::TestCommonCUDA::test_out_warning__refs_bucketize_cuda PASSED [ 18%] 2023-01-11T23:13:47.0694849Z test_ops.py::TestCommonCUDA::test_out_warning__refs_ceil_cuda PASSED [ 18%] 2023-01-11T23:13:47.0694997Z test_ops.py::TestCommonCUDA::test_out_warning__refs_chunk_cuda PASSED [ 18%] 2023-01-11T23:13:47.0695151Z test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_max_cuda PASSED [ 18%] 2023-01-11T23:13:47.0695300Z test_ops.py::TestCommonCUDA::test_out_warning__refs_clone_cuda PASSED [ 18%] 2023-01-11T23:13:47.0695446Z test_ops.py::TestCommonCUDA::test_out_warning__refs_conj_cuda PASSED [ 18%] 2023-01-11T23:13:47.0695602Z test_ops.py::TestCommonCUDA::test_out_warning__refs_contiguous_cuda PASSED [ 18%] 2023-01-11T23:13:47.0695752Z test_ops.py::TestCommonCUDA::test_out_warning__refs_cos_cuda PASSED [ 18%] 2023-01-11T23:13:47.0695892Z test_ops.py::TestCommonCUDA::test_out_warning__refs_cosh_cuda PASSED [ 18%] 2023-01-11T23:13:47.0696082Z test_ops.py::TestCommonCUDA::test_out_warning__refs_diag_cuda PASSED [ 18%] 2023-01-11T23:13:47.0696238Z test_ops.py::TestCommonCUDA::test_out_warning__refs_diag_embed_cuda PASSED [ 18%] 2023-01-11T23:13:47.0696400Z test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_copy_cuda PASSED [ 18%] 2023-01-11T23:13:47.0696551Z test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_cuda PASSED [ 18%] 2023-01-11T23:13:47.0696698Z test_ops.py::TestCommonCUDA::test_out_warning__refs_digamma_cuda PASSED [ 18%] 2023-01-11T23:13:47.0696863Z test_ops.py::TestCommonCUDA::test_out_warning__refs_div_floor_rounding_cuda PASSED [ 18%] 2023-01-11T23:13:47.0697028Z test_ops.py::TestCommonCUDA::test_out_warning__refs_div_no_rounding_mode_cuda PASSED [ 18%] 2023-01-11T23:13:47.0697183Z test_ops.py::TestCommonCUDA::test_out_warning__refs_div_trunc_rounding_cuda PASSED [ 18%] 2023-01-11T23:13:47.0697332Z test_ops.py::TestCommonCUDA::test_out_warning__refs_dsplit_cuda PASSED [ 18%] 2023-01-11T23:13:47.0697529Z test_ops.py::TestCommonCUDA::test_out_warning__refs_empty_cuda SKIPPED (Expected: empty is not comparable) [ 18%] 
2023-01-11T23:13:47.0697680Z test_ops.py::TestCommonCUDA::test_out_warning__refs_erfc_cuda PASSED [ 18%] 2023-01-11T23:13:47.0697828Z test_ops.py::TestCommonCUDA::test_out_warning__refs_erfinv_cuda PASSED [ 18%] 2023-01-11T23:13:47.0697979Z test_ops.py::TestCommonCUDA::test_out_warning__refs_exp_cuda PASSED [ 18%] 2023-01-11T23:13:47.0698135Z test_ops.py::TestCommonCUDA::test_out_warning__refs_expand_as_cuda PASSED [ 18%] 2023-01-11T23:13:47.0698283Z test_ops.py::TestCommonCUDA::test_out_warning__refs_expm1_cuda PASSED [ 18%] 2023-01-11T23:13:47.0698424Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fft2_cuda PASSED [ 18%] 2023-01-11T23:13:47.0698578Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfftn_cuda PASSED [ 18%] 2023-01-11T23:13:47.0698730Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifft2_cuda PASSED [ 18%] 2023-01-11T23:13:47.0698880Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifft_cuda PASSED [ 18%] 2023-01-11T23:13:47.0699031Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifftn_cuda PASSED [ 18%] 2023-01-11T23:13:47.0699230Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifftshift_cuda PASSED [ 18%] 2023-01-11T23:13:47.0699399Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ihfft2_cuda PASSED [ 18%] 2023-01-11T23:13:47.0699573Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ihfftn_cuda PASSED [ 18%] 2023-01-11T23:13:47.0699735Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_irfft2_cuda PASSED [ 18%] 2023-01-11T23:13:47.0699888Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_irfftn_cuda PASSED [ 18%] 2023-01-11T23:13:47.0700038Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_rfft2_cuda PASSED [ 18%] 2023-01-11T23:13:47.0700193Z test_ops.py::TestCommonCUDA::test_out_warning__refs_flipud_cuda PASSED [ 18%] 2023-01-11T23:13:47.0700352Z test_ops.py::TestCommonCUDA::test_out_warning__refs_float_power_cuda PASSED [ 18%] 2023-01-11T23:13:47.0700499Z test_ops.py::TestCommonCUDA::test_out_warning__refs_floor_cuda PASSED [ 18%] 2023-01-11T23:13:47.0700659Z test_ops.py::TestCommonCUDA::test_out_warning__refs_floor_divide_cuda PASSED [ 18%] 2023-01-11T23:13:47.0700808Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fmax_cuda PASSED [ 18%] 2023-01-11T23:13:47.0700948Z test_ops.py::TestCommonCUDA::test_out_warning__refs_frac_cuda PASSED [ 18%] 2023-01-11T23:13:47.0701096Z test_ops.py::TestCommonCUDA::test_out_warning__refs_gcd_cuda PASSED [ 18%] 2023-01-11T23:13:47.0701242Z test_ops.py::TestCommonCUDA::test_out_warning__refs_i0_cuda PASSED [ 18%] 2023-01-11T23:13:47.0701388Z test_ops.py::TestCommonCUDA::test_out_warning__refs_imag_cuda PASSED [ 18%] 2023-01-11T23:13:47.0701540Z test_ops.py::TestCommonCUDA::test_out_warning__refs_index_add_cuda PASSED [ 18%] 2023-01-11T23:13:47.0701718Z test_ops.py::TestCommonCUDA::test_out_warning__refs_index_copy_cuda PASSED [ 18%] 2023-01-11T23:13:47.0701869Z test_ops.py::TestCommonCUDA::test_out_warning__refs_index_fill_cuda PASSED [ 18%] 2023-01-11T23:13:47.0702031Z test_ops.py::TestCommonCUDA::test_out_warning__refs_index_select_cuda PASSED [ 18%] 2023-01-11T23:13:47.0702180Z test_ops.py::TestCommonCUDA::test_out_warning__refs_isclose_cuda PASSED [ 18%] 2023-01-11T23:13:47.0702322Z test_ops.py::TestCommonCUDA::test_out_warning__refs_isnan_cuda PASSED [ 18%] 2023-01-11T23:13:47.0702468Z test_ops.py::TestCommonCUDA::test_out_warning__refs_lerp_cuda PASSED [ 18%] 2023-01-11T23:13:47.0702622Z 
test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_svd_cuda PASSED [ 18%] 2023-01-11T23:13:47.0702783Z test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_svdvals_cuda PASSED [ 18%] 2023-01-11T23:13:47.0702944Z test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_vector_norm_cuda PASSED [ 18%] 2023-01-11T23:13:47.0703093Z test_ops.py::TestCommonCUDA::test_out_warning__refs_log10_cuda PASSED [ 18%] 2023-01-11T23:13:47.0703240Z test_ops.py::TestCommonCUDA::test_out_warning__refs_log1p_cuda PASSED [ 18%] 2023-01-11T23:13:47.0703386Z test_ops.py::TestCommonCUDA::test_out_warning__refs_log2_cuda PASSED [ 18%] 2023-01-11T23:13:47.0703551Z test_ops.py::TestCommonCUDA::test_out_warning__refs_log_softmax_with_dtype_cuda PASSED [ 18%] 2023-01-11T23:13:47.0703708Z test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_and_cuda PASSED [ 18%] 2023-01-11T23:13:47.0703864Z test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_not_cuda PASSED [ 18%] 2023-01-11T23:13:47.0704016Z test_ops.py::TestCommonCUDA::test_out_warning__refs_logspace_cuda PASSED [ 18%] 2023-01-11T23:13:47.0704173Z test_ops.py::TestCommonCUDA::test_out_warning__refs_masked_fill_cuda PASSED [ 18%] 2023-01-11T23:13:47.0704323Z test_ops.py::TestCommonCUDA::test_out_warning__refs_maximum_cuda PASSED [ 18%] 2023-01-11T23:13:47.0704470Z test_ops.py::TestCommonCUDA::test_out_warning__refs_mean_cuda PASSED [ 18%] 2023-01-11T23:13:47.0704621Z test_ops.py::TestCommonCUDA::test_out_warning__refs_movedim_cuda PASSED [ 18%] 2023-01-11T23:13:47.0704761Z test_ops.py::TestCommonCUDA::test_out_warning__refs_mul_cuda PASSED [ 18%] 2023-01-11T23:13:47.0704940Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nan_to_num_cuda PASSED [ 18%] 2023-01-11T23:13:47.0705098Z test_ops.py::TestCommonCUDA::test_out_warning__refs_narrow_copy_cuda PASSED [ 18%] 2023-01-11T23:13:47.0705262Z test_ops.py::TestCommonCUDA::test_out_warning__refs_native_layer_norm_cuda PASSED [ 18%] 2023-01-11T23:13:47.0705407Z test_ops.py::TestCommonCUDA::test_out_warning__refs_ne_cuda PASSED [ 18%] 2023-01-11T23:13:47.0705567Z test_ops.py::TestCommonCUDA::test_out_warning__refs_new_empty_strided_cuda PASSED [ 18%] 2023-01-11T23:13:47.0705714Z test_ops.py::TestCommonCUDA::test_out_warning__refs_new_full_cuda PASSED [ 18%] 2023-01-11T23:13:47.0705869Z test_ops.py::TestCommonCUDA::test_out_warning__refs_new_zeros_cuda PASSED [ 18%] 2023-01-11T23:13:47.0706038Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_group_norm_cuda PASSED [ 18%] 2023-01-11T23:13:47.0706207Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hardtanh_cuda PASSED [ 18%] 2023-01-11T23:13:47.0706377Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_l1_loss_cuda PASSED [ 18%] 2023-01-11T23:13:47.0706548Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_layer_norm_cuda PASSED [ 18%] 2023-01-11T23:13:47.0706738Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_log_softmax_with_dtype_cuda PASSED [ 18%] 2023-01-11T23:13:47.0706920Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_margin_ranking_loss_cuda PASSED [ 18%] 2023-01-11T23:13:47.0707086Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_mish_cuda PASSED [ 18%] 2023-01-11T23:13:47.0707253Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_mse_loss_cuda PASSED [ 18%] 2023-01-11T23:13:47.0707472Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_pairwise_distance_cuda PASSED [ 18%] 
2023-01-11T23:13:47.0707633Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_pdist_cuda PASSED [ 18%] 2023-01-11T23:13:47.0707814Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_poisson_nll_loss_cuda PASSED [ 18%] 2023-01-11T23:13:47.0707981Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_prelu_cuda PASSED [ 18%] 2023-01-11T23:13:47.0708144Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_relu6_cuda PASSED [ 18%] 2023-01-11T23:13:47.0708307Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_selu_cuda PASSED [ 18%] 2023-01-11T23:13:47.0708488Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softmax_with_dtype_cuda PASSED [ 18%] 2023-01-11T23:13:47.0708668Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softmin_with_dtype_cuda PASSED [ 18%] 2023-01-11T23:13:47.0708838Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softplus_cuda PASSED [ 18%] 2023-01-11T23:13:47.0709009Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_tanhshrink_cuda PASSED [ 18%] 2023-01-11T23:13:47.0709186Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_triplet_margin_loss_cuda PASSED [ 18%] 2023-01-11T23:13:47.0709339Z test_ops.py::TestCommonCUDA::test_out_warning__refs_permute_cuda PASSED [ 18%] 2023-01-11T23:13:47.0709488Z test_ops.py::TestCommonCUDA::test_out_warning__refs_pow_cuda PASSED [ 18%] 2023-01-11T23:13:47.0709638Z test_ops.py::TestCommonCUDA::test_out_warning__refs_ravel_cuda PASSED [ 18%] 2023-01-11T23:13:47.0709853Z test_ops.py::TestCommonCUDA::test_out_warning__refs_reciprocal_cuda PASSED [ 18%] 2023-01-11T23:13:47.0710014Z test_ops.py::TestCommonCUDA::test_out_warning__refs_remainder_cuda PASSED [ 18%] 2023-01-11T23:13:47.0710172Z test_ops.py::TestCommonCUDA::test_out_warning__refs_reshape_as_cuda PASSED [ 18%] 2023-01-11T23:13:47.0710328Z test_ops.py::TestCommonCUDA::test_out_warning__refs_reshape_cuda PASSED [ 18%] 2023-01-11T23:13:47.0710470Z test_ops.py::TestCommonCUDA::test_out_warning__refs_roll_cuda PASSED [ 18%] 2023-01-11T23:13:47.0710646Z test_ops.py::TestCommonCUDA::test_out_warning__refs_rsub_cuda PASSED [ 18%] 2023-01-11T23:13:47.0710797Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sigmoid_cuda PASSED [ 18%] 2023-01-11T23:13:47.0710943Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sin_cuda PASSED [ 19%] 2023-01-11T23:13:47.0711093Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sinc_cuda PASSED [ 19%] 2023-01-11T23:13:47.0711239Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sinh_cuda PASSED [ 19%] 2023-01-11T23:13:47.0711406Z test_ops.py::TestCommonCUDA::test_out_warning__refs_softmax_with_dtype_cuda PASSED [ 19%] 2023-01-11T23:13:47.0711575Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_bessel_j0_cuda PASSED [ 19%] 2023-01-11T23:13:47.0711733Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_bessel_j1_cuda PASSED [ 19%] 2023-01-11T23:13:47.0711896Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_i0e_cuda PASSED [ 19%] 2023-01-11T23:13:47.0712083Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_multigammaln_mvlgamma_p_1_cuda PASSED [ 19%] 2023-01-11T23:13:47.0712266Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_multigammaln_mvlgamma_p_3_cuda PASSED [ 19%] 2023-01-11T23:13:47.0712424Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_ndtr_cuda PASSED [ 19%] 2023-01-11T23:13:47.0712597Z 
test_ops.py::TestCommonCUDA::test_out_warning__refs_special_softmax_with_dtype_cuda PASSED [ 19%] 2023-01-11T23:13:47.0712777Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_spherical_bessel_j0_cuda PASSED [ 19%] 2023-01-11T23:13:47.0712944Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_xlog1py_cuda PASSED [ 19%] 2023-01-11T23:13:47.0713101Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_zeta_cuda PASSED [ 19%] 2023-01-11T23:13:47.0713276Z test_ops.py::TestCommonCUDA::test_out_warning__refs_squeeze_cuda PASSED [ 19%] 2023-01-11T23:13:47.0713426Z test_ops.py::TestCommonCUDA::test_out_warning__refs_std_mean_cuda PASSED [ 19%] 2023-01-11T23:13:47.0713577Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sum_cuda PASSED [ 19%] 2023-01-11T23:13:47.0713723Z test_ops.py::TestCommonCUDA::test_out_warning__refs_to_cuda PASSED [ 19%] 2023-01-11T23:13:47.0713883Z test_ops.py::TestCommonCUDA::test_out_warning__refs_tril_indices_cuda PASSED [ 19%] 2023-01-11T23:13:47.0714038Z test_ops.py::TestCommonCUDA::test_out_warning__refs_true_divide_cuda PASSED [ 19%] 2023-01-11T23:13:47.0714187Z test_ops.py::TestCommonCUDA::test_out_warning__refs_trunc_cuda PASSED [ 19%] 2023-01-11T23:13:47.0714344Z test_ops.py::TestCommonCUDA::test_out_warning__refs_unflatten_cuda PASSED [ 19%] 2023-01-11T23:13:47.0714486Z test_ops.py::TestCommonCUDA::test_out_warning__refs_unfold_cuda PASSED [ 19%] 2023-01-11T23:13:47.0714645Z test_ops.py::TestCommonCUDA::test_out_warning__refs_unsqueeze_cuda PASSED [ 19%] 2023-01-11T23:13:47.0714791Z test_ops.py::TestCommonCUDA::test_out_warning__refs_var_cuda PASSED [ 19%] 2023-01-11T23:13:47.0714946Z test_ops.py::TestCommonCUDA::test_out_warning__refs_var_mean_cuda PASSED [ 19%] 2023-01-11T23:13:47.0715093Z test_ops.py::TestCommonCUDA::test_out_warning__refs_view_cuda PASSED [ 19%] 2023-01-11T23:13:47.0715240Z test_ops.py::TestCommonCUDA::test_out_warning__refs_vstack_cuda PASSED [ 19%] 2023-01-11T23:13:47.0715384Z test_ops.py::TestCommonCUDA::test_out_warning_abs_cuda PASSED [ 19%] 2023-01-11T23:13:47.0715529Z test_ops.py::TestCommonCUDA::test_out_warning_add_cuda PASSED [ 19%] 2023-01-11T23:13:47.0715667Z test_ops.py::TestCommonCUDA::test_out_warning_addbmm_cuda XFAIL [ 19%] 2023-01-11T23:13:47.0715815Z test_ops.py::TestCommonCUDA::test_out_warning_addcdiv_cuda PASSED [ 19%] 2023-01-11T23:13:47.0715963Z test_ops.py::TestCommonCUDA::test_out_warning_addcmul_cuda PASSED [ 19%] 2023-01-11T23:13:47.0716105Z test_ops.py::TestCommonCUDA::test_out_warning_addr_cuda PASSED [ 19%] 2023-01-11T23:13:47.0716271Z test_ops.py::TestCommonCUDA::test_out_warning_all_cuda PASSED [ 19%] 2023-01-11T23:13:47.0716417Z test_ops.py::TestCommonCUDA::test_out_warning_amax_cuda PASSED [ 19%] 2023-01-11T23:13:47.0716560Z test_ops.py::TestCommonCUDA::test_out_warning_amin_cuda PASSED [ 19%] 2023-01-11T23:13:47.0716704Z test_ops.py::TestCommonCUDA::test_out_warning_angle_cuda PASSED [ 19%] 2023-01-11T23:13:47.0716839Z test_ops.py::TestCommonCUDA::test_out_warning_any_cuda PASSED [ 19%] 2023-01-11T23:13:47.0716979Z test_ops.py::TestCommonCUDA::test_out_warning_arange_cuda XFAIL [ 19%] 2023-01-11T23:13:47.0717126Z test_ops.py::TestCommonCUDA::test_out_warning_argmin_cuda PASSED [ 19%] 2023-01-11T23:13:47.0717272Z test_ops.py::TestCommonCUDA::test_out_warning_argwhere_cuda PASSED [ 19%] 2023-01-11T23:13:47.0717436Z test_ops.py::TestCommonCUDA::test_out_warning_as_strided_scatter_cuda PASSED [ 19%] 2023-01-11T23:13:47.0717579Z 
test_ops.py::TestCommonCUDA::test_out_warning_atan_cuda PASSED [ 19%] 2023-01-11T23:13:47.0717724Z test_ops.py::TestCommonCUDA::test_out_warning_atanh_cuda PASSED [ 19%] 2023-01-11T23:13:47.0717870Z test_ops.py::TestCommonCUDA::test_out_warning_atleast_2d_cuda PASSED [ 19%] 2023-01-11T23:13:47.0718013Z test_ops.py::TestCommonCUDA::test_out_warning_bernoulli_cuda XFAIL [ 19%] 2023-01-11T23:13:47.0718160Z test_ops.py::TestCommonCUDA::test_out_warning_bincount_cuda PASSED [ 19%] 2023-01-11T23:13:47.0718306Z test_ops.py::TestCommonCUDA::test_out_warning_bitwise_and_cuda PASSED [ 19%] 2023-01-11T23:13:47.0718465Z test_ops.py::TestCommonCUDA::test_out_warning_bitwise_left_shift_cuda PASSED [ 19%] 2023-01-11T23:13:47.0718612Z test_ops.py::TestCommonCUDA::test_out_warning_bitwise_not_cuda PASSED [ 19%] 2023-01-11T23:13:47.0718785Z test_ops.py::TestCommonCUDA::test_out_warning_bitwise_xor_cuda PASSED [ 19%] 2023-01-11T23:13:47.0718932Z test_ops.py::TestCommonCUDA::test_out_warning_block_diag_cuda PASSED [ 19%] 2023-01-11T23:13:47.0719078Z test_ops.py::TestCommonCUDA::test_out_warning_bool_cuda PASSED [ 19%] 2023-01-11T23:13:47.0719214Z test_ops.py::TestCommonCUDA::test_out_warning_byte_cuda PASSED [ 19%] 2023-01-11T23:13:47.0719355Z test_ops.py::TestCommonCUDA::test_out_warning_cat_cuda PASSED [ 19%] 2023-01-11T23:13:47.0719498Z test_ops.py::TestCommonCUDA::test_out_warning_cdist_cuda PASSED [ 19%] 2023-01-11T23:13:47.0719640Z test_ops.py::TestCommonCUDA::test_out_warning_ceil_cuda PASSED [ 19%] 2023-01-11T23:13:47.0719784Z test_ops.py::TestCommonCUDA::test_out_warning_cfloat_cuda PASSED [ 19%] 2023-01-11T23:13:47.0719928Z test_ops.py::TestCommonCUDA::test_out_warning_chunk_cuda PASSED [ 19%] 2023-01-11T23:13:47.0720073Z test_ops.py::TestCommonCUDA::test_out_warning_clone_cuda PASSED [ 19%] 2023-01-11T23:13:47.0720222Z test_ops.py::TestCommonCUDA::test_out_warning_conj_physical_cuda PASSED [ 19%] 2023-01-11T23:13:47.0720373Z test_ops.py::TestCommonCUDA::test_out_warning_constant_pad_nd_cuda PASSED [ 19%] 2023-01-11T23:13:47.0720522Z test_ops.py::TestCommonCUDA::test_out_warning_contiguous_cuda PASSED [ 19%] 2023-01-11T23:13:47.0720668Z test_ops.py::TestCommonCUDA::test_out_warning_copysign_cuda PASSED [ 19%] 2023-01-11T23:13:47.0720815Z test_ops.py::TestCommonCUDA::test_out_warning_corrcoef_cuda PASSED [ 19%] 2023-01-11T23:13:47.0720965Z test_ops.py::TestCommonCUDA::test_out_warning_count_nonzero_cuda PASSED [ 19%] 2023-01-11T23:13:47.0721105Z test_ops.py::TestCommonCUDA::test_out_warning_cov_cuda PASSED [ 19%] 2023-01-11T23:13:47.0721252Z test_ops.py::TestCommonCUDA::test_out_warning_cummin_cuda PASSED [ 19%] 2023-01-11T23:13:47.0721400Z test_ops.py::TestCommonCUDA::test_out_warning_cumprod_cuda PASSED [ 19%] 2023-01-11T23:13:47.0721543Z test_ops.py::TestCommonCUDA::test_out_warning_diagflat_cuda PASSED [ 19%] 2023-01-11T23:13:47.0721690Z test_ops.py::TestCommonCUDA::test_out_warning_diagonal_cuda PASSED [ 19%] 2023-01-11T23:13:47.0721882Z test_ops.py::TestCommonCUDA::test_out_warning_diagonal_scatter_cuda PASSED [ 19%] 2023-01-11T23:13:47.0722033Z test_ops.py::TestCommonCUDA::test_out_warning_diff_cuda PASSED [ 19%] 2023-01-11T23:13:47.0722191Z test_ops.py::TestCommonCUDA::test_out_warning_div_trunc_rounding_cuda PASSED [ 19%] 2023-01-11T23:13:47.0722336Z test_ops.py::TestCommonCUDA::test_out_warning_double_cuda PASSED [ 19%] 2023-01-11T23:13:47.0722485Z test_ops.py::TestCommonCUDA::test_out_warning_empty_like_cuda PASSED [ 19%] 2023-01-11T23:13:47.0722627Z 
test_ops.py::TestCommonCUDA::test_out_warning_eq_cuda PASSED [ 19%]
2023-01-11T23:13:47.0722849Z test_ops.py::TestCommonCUDA::test_out_warning_equal_cuda SKIPPED (Skipped! Only supports single tensor or iterable of tensor outputs.) [ 19%]
2023-01-11T23:13:47.0722989Z test_ops.py::TestCommonCUDA::test_out_warning_erf_cuda PASSED [ 19%]
2023-01-11T23:13:47.0723133Z test_ops.py::TestCommonCUDA::test_out_warning_erfinv_cuda PASSED [ 19%]
2023-01-11T23:13:47.0723275Z test_ops.py::TestCommonCUDA::test_out_warning_exp2_cuda PASSED [ 19%]
2023-01-11T23:13:47.0723416Z test_ops.py::TestCommonCUDA::test_out_warning_eye_cuda XFAIL [ 19%]
2023-01-11T23:13:47.0723560Z test_ops.py::TestCommonCUDA::test_out_warning_fft_fft2_cuda PASSED [ 19%]
2023-01-11T23:13:47.0723705Z test_ops.py::TestCommonCUDA::test_out_warning_fft_fft_cuda PASSED [ 19%]
2023-01-11T23:13:47.0723853Z test_ops.py::TestCommonCUDA::test_out_warning_fft_fftshift_cuda PASSED [ 19%]
2023-01-11T23:13:47.0723997Z test_ops.py::TestCommonCUDA::test_out_warning_fft_hfft2_cuda PASSED [ 19%]
2023-01-11T23:13:47.0724136Z test_ops.py::TestCommonCUDA::test_out_warning_fft_hfftn_cuda PASSED [ 19%]
2023-01-11T23:13:47.0724310Z test_ops.py::TestCommonCUDA::test_out_warning_fft_ifft_cuda PASSED [ 19%]
2023-01-11T23:13:47.0724459Z test_ops.py::TestCommonCUDA::test_out_warning_fft_ifftshift_cuda PASSED [ 19%]
2023-01-11T23:13:47.0724608Z test_ops.py::TestCommonCUDA::test_out_warning_fft_ihfft_cuda PASSED [ 19%]
2023-01-11T23:13:47.0724753Z test_ops.py::TestCommonCUDA::test_out_warning_fft_ihfftn_cuda XFAIL [ 19%]
2023-01-11T23:13:47.0724898Z test_ops.py::TestCommonCUDA::test_out_warning_fft_irfftn_cuda PASSED [ 19%]
2023-01-11T23:13:47.0725042Z test_ops.py::TestCommonCUDA::test_out_warning_fft_rfft_cuda PASSED [ 19%]
2023-01-11T23:13:47.0725189Z test_ops.py::TestCommonCUDA::test_out_warning_fft_rfftn_cuda PASSED [ 19%]
2023-01-11T23:13:47.0725325Z test_ops.py::TestCommonCUDA::test_out_warning_fill_cuda PASSED [ 19%]
2023-01-11T23:13:47.0725469Z test_ops.py::TestCommonCUDA::test_out_warning_flip_cuda PASSED [ 19%]
2023-01-11T23:13:47.0725617Z test_ops.py::TestCommonCUDA::test_out_warning_flipud_cuda PASSED [ 19%]
2023-01-11T23:13:47.0725759Z test_ops.py::TestCommonCUDA::test_out_warning_float_cuda PASSED [ 19%]
2023-01-11T23:13:47.0725904Z test_ops.py::TestCommonCUDA::test_out_warning_floor_cuda PASSED [ 19%]
2023-01-11T23:13:47.0726055Z test_ops.py::TestCommonCUDA::test_out_warning_floor_divide_cuda PASSED [ 19%]
2023-01-11T23:13:47.0726197Z test_ops.py::TestCommonCUDA::test_out_warning_frac_cuda PASSED [ 19%]
2023-01-11T23:13:47.0726340Z test_ops.py::TestCommonCUDA::test_out_warning_frexp_cuda PASSED [ 19%]
2023-01-11T23:13:47.0726478Z test_ops.py::TestCommonCUDA::test_out_warning_full_like_cuda PASSED [ 19%]
2023-01-11T23:13:47.0726619Z test_ops.py::TestCommonCUDA::test_out_warning_gcd_cuda PASSED [ 19%]
2023-01-11T23:13:47.0726765Z test_ops.py::TestCommonCUDA::test_out_warning_gradient_cuda PASSED [ 19%]
2023-01-11T23:13:47.0726908Z test_ops.py::TestCommonCUDA::test_out_warning_half_cuda PASSED [ 19%]
2023-01-11T23:13:47.0727058Z test_ops.py::TestCommonCUDA::test_out_warning_heaviside_cuda PASSED [ 19%]
2023-01-11T23:13:47.0727200Z test_ops.py::TestCommonCUDA::test_out_warning_histc_cuda PASSED [ 19%]
2023-01-11T23:13:47.0727442Z test_ops.py::TestCommonCUDA::test_out_warning_histogram_cuda SKIPPED (Skipped! Op has not supported dtypes on this device.) [ 19%]
2023-01-11T23:13:47.0727589Z test_ops.py::TestCommonCUDA::test_out_warning_hsplit_cuda PASSED [ 19%]
2023-01-11T23:13:47.0727725Z test_ops.py::TestCommonCUDA::test_out_warning_i0_cuda PASSED [ 19%]
2023-01-11T23:13:47.0727870Z test_ops.py::TestCommonCUDA::test_out_warning_igammac_cuda PASSED [ 19%]
2023-01-11T23:13:47.0728017Z test_ops.py::TestCommonCUDA::test_out_warning_index_fill_cuda PASSED [ 19%]
2023-01-11T23:13:47.0728163Z test_ops.py::TestCommonCUDA::test_out_warning_index_put_cuda PASSED [ 19%]
2023-01-11T23:13:47.0728310Z test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_cuda PASSED [ 19%]
2023-01-11T23:13:47.0728458Z test_ops.py::TestCommonCUDA::test_out_warning_isclose_cuda PASSED [ 19%]
2023-01-11T23:13:47.0728600Z test_ops.py::TestCommonCUDA::test_out_warning_isin_cuda PASSED [ 19%]
2023-01-11T23:13:47.0728746Z test_ops.py::TestCommonCUDA::test_out_warning_isinf_cuda PASSED [ 19%]
2023-01-11T23:13:47.0728882Z test_ops.py::TestCommonCUDA::test_out_warning_isnan_cuda PASSED [ 19%]
2023-01-11T23:13:47.0729029Z test_ops.py::TestCommonCUDA::test_out_warning_isneginf_cuda PASSED [ 19%]
2023-01-11T23:13:47.0729195Z test_ops.py::TestCommonCUDA::test_out_warning_isreal_cuda PASSED [ 19%]
2023-01-11T23:13:47.0729396Z test_ops.py::TestCommonCUDA::test_out_warning_jiterator_4inputs_with_extra_args_cuda PASSED [ 19%]
2023-01-11T23:13:47.0729551Z test_ops.py::TestCommonCUDA::test_out_warning_jiterator_binary_cuda PASSED [ 19%]
2023-01-11T23:13:47.0729724Z test_ops.py::TestCommonCUDA::test_out_warning_jiterator_binary_return_by_ref_cuda PASSED [ 19%]
2023-01-11T23:13:47.0729893Z test_ops.py::TestCommonCUDA::test_out_warning_le_cuda PASSED [ 19%]
2023-01-11T23:13:47.0730041Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_cross_cuda PASSED [ 19%]
2023-01-11T23:13:47.0730192Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_eig_cuda PASSED [ 19%]
2023-01-11T23:13:47.0730356Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_householder_product_cuda PASSED [ 19%]
2023-01-11T23:13:47.0730504Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_inv_ex_cuda PASSED [ 19%]
2023-01-11T23:13:47.0730665Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_factor_ex_cuda PASSED [ 19%]
2023-01-11T23:13:47.0730821Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_solve_cuda PASSED [ 19%]
2023-01-11T23:13:47.0730971Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_lstsq_cuda PASSED [ 19%]
2023-01-11T23:13:47.0731116Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_cuda PASSED [ 19%]
2023-01-11T23:13:47.0731276Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_factor_cuda PASSED [ 19%]
2023-01-11T23:13:47.0731431Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_solve_cuda PASSED [ 19%]
2023-01-11T23:13:47.0731586Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_norm_cuda PASSED [ 19%]
2023-01-11T23:13:47.0731748Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_power_cuda PASSED [ 19%]
2023-01-11T23:13:47.0731917Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_rank_hermitian_cuda PASSED [ 19%]
2023-01-11T23:13:47.0732070Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_multi_dot_cuda PASSED [ 19%]
2023-01-11T23:13:47.0732220Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_norm_cuda PASSED [ 19%]
2023-01-11T23:13:47.0732396Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_norm_subgradients_at_zero_cuda PASSED [ 19%]
2023-01-11T23:13:47.0732545Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_qr_cuda PASSED [ 19%]
2023-01-11T23:13:47.0732698Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_slogdet_cuda PASSED [ 19%]
2023-01-11T23:13:47.0732841Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_cuda PASSED [ 19%]
2023-01-11T23:13:47.0733019Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_ex_cuda PASSED [ 19%]
2023-01-11T23:13:47.0733186Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_triangular_cuda PASSED [ 19%]
2023-01-11T23:13:47.0733345Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_tensorsolve_cuda PASSED [ 19%]
2023-01-11T23:13:47.0733494Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_vecdot_cuda PASSED [ 19%]
2023-01-11T23:13:47.0733651Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_vector_norm_cuda PASSED [ 19%]
2023-01-11T23:13:47.0733796Z test_ops.py::TestCommonCUDA::test_out_warning_log10_cuda PASSED [ 19%]
2023-01-11T23:13:47.0733940Z test_ops.py::TestCommonCUDA::test_out_warning_log2_cuda PASSED [ 19%]
2023-01-11T23:13:47.0734081Z test_ops.py::TestCommonCUDA::test_out_warning_log_cuda PASSED [ 20%]
2023-01-11T23:13:47.0734230Z test_ops.py::TestCommonCUDA::test_out_warning_log_softmax_cuda PASSED [ 20%]
2023-01-11T23:13:47.0734396Z test_ops.py::TestCommonCUDA::test_out_warning_log_softmax_with_dtype_cuda PASSED [ 20%]
2023-01-11T23:13:47.0734670Z test_ops.py::TestCommonCUDA::test_out_warning_logaddexp2_cuda PASSED [ 20%]
2023-01-11T23:13:47.0734822Z test_ops.py::TestCommonCUDA::test_out_warning_logaddexp_cuda PASSED [ 20%]
2023-01-11T23:13:47.0734972Z test_ops.py::TestCommonCUDA::test_out_warning_logical_not_cuda PASSED [ 20%]
2023-01-11T23:13:47.0735119Z test_ops.py::TestCommonCUDA::test_out_warning_logical_or_cuda PASSED [ 20%]
2023-01-11T23:13:47.0735265Z test_ops.py::TestCommonCUDA::test_out_warning_logit_cuda PASSED [ 20%]
2023-01-11T23:13:47.0735407Z test_ops.py::TestCommonCUDA::test_out_warning_lt_cuda PASSED [ 20%]
2023-01-11T23:13:47.0735581Z test_ops.py::TestCommonCUDA::test_out_warning_lu_cuda XFAIL [ 20%]
2023-01-11T23:13:47.0735726Z test_ops.py::TestCommonCUDA::test_out_warning_lu_solve_cuda PASSED [ 20%]
2023-01-11T23:13:47.0735875Z test_ops.py::TestCommonCUDA::test_out_warning_lu_unpack_cuda PASSED [ 20%]
2023-01-11T23:13:47.0736025Z test_ops.py::TestCommonCUDA::test_out_warning_mH_cuda PASSED [ 20%]
2023-01-11T23:13:47.0736168Z test_ops.py::TestCommonCUDA::test_out_warning_mT_cuda PASSED [ 20%]
2023-01-11T23:13:47.0736315Z test_ops.py::TestCommonCUDA::test_out_warning_masked_amin_cuda PASSED [ 20%]
2023-01-11T23:13:47.0736465Z test_ops.py::TestCommonCUDA::test_out_warning_masked_argmax_cuda PASSED [ 20%]
2023-01-11T23:13:47.0736615Z test_ops.py::TestCommonCUDA::test_out_warning_masked_cumprod_cuda PASSED [ 20%]
2023-01-11T23:13:47.0736758Z test_ops.py::TestCommonCUDA::test_out_warning_masked_cumsum_cuda PASSED [ 20%]
2023-01-11T23:13:47.0736918Z test_ops.py::TestCommonCUDA::test_out_warning_masked_log_softmax_cuda PASSED [ 20%]
2023-01-11T23:13:47.0737079Z test_ops.py::TestCommonCUDA::test_out_warning_masked_logsumexp_cuda PASSED [ 20%]
2023-01-11T23:13:47.0737226Z test_ops.py::TestCommonCUDA::test_out_warning_masked_mean_cuda PASSED [ 20%]
2023-01-11T23:13:47.0737377Z test_ops.py::TestCommonCUDA::test_out_warning_masked_median_cuda PASSED [ 20%]
2023-01-11T23:13:47.0737525Z test_ops.py::TestCommonCUDA::test_out_warning_masked_norm_cuda PASSED [ 20%]
2023-01-11T23:13:47.0737684Z test_ops.py::TestCommonCUDA::test_out_warning_masked_normalize_cuda PASSED [ 20%]
2023-01-11T23:13:47.0737831Z test_ops.py::TestCommonCUDA::test_out_warning_masked_prod_cuda PASSED [ 20%]
2023-01-11T23:13:47.0737974Z test_ops.py::TestCommonCUDA::test_out_warning_masked_select_cuda PASSED [ 20%]
2023-01-11T23:13:47.0738131Z test_ops.py::TestCommonCUDA::test_out_warning_masked_softmax_cuda PASSED [ 20%]
2023-01-11T23:13:47.0738279Z test_ops.py::TestCommonCUDA::test_out_warning_masked_softmin_cuda PASSED [ 20%]
2023-01-11T23:13:47.0738429Z test_ops.py::TestCommonCUDA::test_out_warning_masked_var_cuda PASSED [ 20%]
2023-01-11T23:13:47.0738575Z test_ops.py::TestCommonCUDA::test_out_warning_matmul_cuda PASSED [ 20%]
2023-01-11T23:13:47.0738802Z test_ops.py::TestCommonCUDA::test_out_warning_max_binary_cuda PASSED [ 20%]
2023-01-11T23:13:47.0738950Z test_ops.py::TestCommonCUDA::test_out_warning_median_cuda PASSED [ 20%]
2023-01-11T23:13:47.0739115Z test_ops.py::TestCommonCUDA::test_out_warning_meshgrid_list_of_tensors_cuda PASSED [ 20%]
2023-01-11T23:13:47.0739277Z test_ops.py::TestCommonCUDA::test_out_warning_meshgrid_variadic_tensors_cuda PASSED [ 20%]
2023-01-11T23:13:47.0739441Z test_ops.py::TestCommonCUDA::test_out_warning_min_reduction_with_dim_cuda PASSED [ 20%]
2023-01-11T23:13:47.0739584Z test_ops.py::TestCommonCUDA::test_out_warning_mode_cuda XFAIL [ 20%]
2023-01-11T23:13:47.0739728Z test_ops.py::TestCommonCUDA::test_out_warning_msort_cuda PASSED [ 20%]
2023-01-11T23:13:47.0739873Z test_ops.py::TestCommonCUDA::test_out_warning_mul_cuda PASSED [ 20%]
2023-01-11T23:13:47.0740016Z test_ops.py::TestCommonCUDA::test_out_warning_mv_cuda PASSED [ 20%]
2023-01-11T23:13:47.0740183Z test_ops.py::TestCommonCUDA::test_out_warning_mvlgamma_mvlgamma_p_1_cuda PASSED [ 20%]
2023-01-11T23:13:47.0740345Z test_ops.py::TestCommonCUDA::test_out_warning_mvlgamma_mvlgamma_p_5_cuda PASSED [ 20%]
2023-01-11T23:13:47.0740485Z test_ops.py::TestCommonCUDA::test_out_warning_nan_to_num_cuda PASSED [ 20%]
2023-01-11T23:13:47.0740631Z test_ops.py::TestCommonCUDA::test_out_warning_nanmean_cuda PASSED [ 20%]
2023-01-11T23:13:47.0740779Z test_ops.py::TestCommonCUDA::test_out_warning_narrow_copy_cuda XFAIL [ 20%]
2023-01-11T23:13:47.0740946Z test_ops.py::TestCommonCUDA::test_out_warning_native_dropout_backward_cuda PASSED [ 20%]
2023-01-11T23:13:47.0741104Z test_ops.py::TestCommonCUDA::test_out_warning_native_layer_norm_cuda PASSED [ 20%]
2023-01-11T23:13:47.0741273Z test_ops.py::TestCommonCUDA::test_out_warning_neg_cuda PASSED [ 20%]
2023-01-11T23:13:47.0741421Z test_ops.py::TestCommonCUDA::test_out_warning_new_empty_cuda PASSED [ 20%]
2023-01-11T23:13:47.0741566Z test_ops.py::TestCommonCUDA::test_out_warning_new_zeros_cuda PASSED [ 20%]
2023-01-11T23:13:47.0741742Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_avg_pool2d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0741917Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_avg_pool3d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0742089Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_max_pool1d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0742259Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_max_pool3d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0742425Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool1d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0742586Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool2d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0742753Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool3d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0742935Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_batch_norm_without_cudnn_cuda PASSED [ 20%]
2023-01-11T23:13:47.0743104Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_bilinear_cuda PASSED [ 20%]
2023-01-11T23:13:47.0743275Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_binary_cross_entropy_cuda PASSED [ 20%]
2023-01-11T23:13:47.0743468Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_binary_cross_entropy_with_logits_cuda PASSED [ 20%]
2023-01-11T23:13:47.0743640Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv_transpose1d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0743822Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_cosine_embedding_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0743993Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_cross_entropy_cuda PASSED [ 20%]
2023-01-11T23:13:47.0744160Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_ctc_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0744327Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout2d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0744515Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout_cuda PASSED [ 20%]
2023-01-11T23:13:47.0744683Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_embedding_cuda PASSED [ 20%]
2023-01-11T23:13:47.0744871Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_feature_alpha_dropout_without_train_cuda PASSED [ 20%]
2023-01-11T23:13:47.0745050Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_fractional_max_pool3d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0745229Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_gaussian_nll_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0745387Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_gelu_cuda PASSED [ 20%]
2023-01-11T23:13:47.0745552Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_group_norm_cuda PASSED [ 20%]
2023-01-11T23:13:47.0745718Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardshrink_cuda PASSED [ 20%]
2023-01-11T23:13:47.0745887Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardsigmoid_cuda PASSED [ 20%]
2023-01-11T23:13:47.0746049Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardtanh_cuda PASSED [ 20%]
2023-01-11T23:13:47.0746218Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hinge_embedding_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0746379Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_huber_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0746549Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_instance_norm_cuda PASSED [ 20%]
2023-01-11T23:13:47.0746724Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_area_cuda PASSED [ 20%]
2023-01-11T23:13:47.0746904Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_bicubic_cuda PASSED [ 20%]
2023-01-11T23:13:47.0747113Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_linear_cuda PASSED [ 20%]
2023-01-11T23:13:47.0747298Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_trilinear_cuda PASSED [ 20%]
2023-01-11T23:13:47.0747459Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_kl_div_cuda PASSED [ 20%]
2023-01-11T23:13:47.0747619Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_l1_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0747772Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_layer_norm_cuda PASSED [ 20%]
2023-01-11T23:13:47.0747936Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_leaky_relu_cuda PASSED [ 20%]
2023-01-11T23:13:47.0748099Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool1d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0748264Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool2d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0748425Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool3d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0748599Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool3d_grad_cuda PASSED [ 20%]
2023-01-11T23:13:47.0748760Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_mish_cuda PASSED [ 20%]
2023-01-11T23:13:47.0748934Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multi_margin_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0749107Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multilabel_margin_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0749295Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multilabel_soft_margin_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0749460Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_normalize_cuda PASSED [ 20%]
2023-01-11T23:13:47.0749628Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_circular_cuda PASSED [ 20%]
2023-01-11T23:13:47.0749875Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_constant_cuda PASSED [ 20%]
2023-01-11T23:13:47.0750048Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_reflect_cuda PASSED [ 20%]
2023-01-11T23:13:47.0750250Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_replicate_cuda PASSED [ 20%]
2023-01-11T23:13:47.0750415Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pdist_cuda PASSED [ 20%]
2023-01-11T23:13:47.0750588Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pixel_unshuffle_cuda PASSED [ 20%]
2023-01-11T23:13:47.0750753Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_poisson_nll_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0750915Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_relu6_cuda PASSED [ 20%]
2023-01-11T23:13:47.0751077Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_relu_cuda PASSED [ 20%]
2023-01-11T23:13:47.0751239Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softmin_cuda PASSED [ 20%]
2023-01-11T23:13:47.0751417Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softmin_with_dtype_cuda PASSED [ 20%]
2023-01-11T23:13:47.0751582Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softsign_cuda PASSED [ 20%]
2023-01-11T23:13:47.0751750Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_tanhshrink_cuda PASSED [ 20%]
2023-01-11T23:13:47.0751944Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_triplet_margin_with_distance_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0752120Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_upsample_bilinear_cuda PASSED [ 20%]
2023-01-11T23:13:47.0752263Z test_ops.py::TestCommonCUDA::test_out_warning_nonzero_cuda XFAIL [ 20%]
2023-01-11T23:13:47.0752412Z test_ops.py::TestCommonCUDA::test_out_warning_norm_inf_cuda PASSED [ 20%]
2023-01-11T23:13:47.0752585Z test_ops.py::TestCommonCUDA::test_out_warning_ops_nvprims_native_batch_norm_cuda PASSED [ 20%]
2023-01-11T23:13:47.0752750Z test_ops.py::TestCommonCUDA::test_out_warning_ops_nvprims_var_mean_cuda PASSED [ 20%]
2023-01-11T23:13:47.0752944Z test_ops.py::TestCommonCUDA::test_out_warning_ops_nvprims_view_cuda PASSED [ 20%]
2023-01-11T23:13:47.0753090Z test_ops.py::TestCommonCUDA::test_out_warning_ormqr_cuda PASSED [ 20%]
2023-01-11T23:13:47.0753247Z test_ops.py::TestCommonCUDA::test_out_warning_pca_lowrank_cuda PASSED [ 20%]
2023-01-11T23:13:47.0753398Z test_ops.py::TestCommonCUDA::test_out_warning_pinverse_cuda PASSED [ 20%]
2023-01-11T23:13:47.0753541Z test_ops.py::TestCommonCUDA::test_out_warning_polar_cuda PASSED [ 20%]
2023-01-11T23:13:47.0753708Z test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_0_cuda PASSED [ 20%]
2023-01-11T23:13:47.0753851Z test_ops.py::TestCommonCUDA::test_out_warning_pow_cuda PASSED [ 20%]
2023-01-11T23:13:47.0753998Z test_ops.py::TestCommonCUDA::test_out_warning_put_cuda PASSED [ 20%]
2023-01-11T23:13:47.0754141Z test_ops.py::TestCommonCUDA::test_out_warning_qr_cuda PASSED [ 20%]
2023-01-11T23:13:47.0754292Z test_ops.py::TestCommonCUDA::test_out_warning_quantile_cuda PASSED [ 20%]
2023-01-11T23:13:47.0754439Z test_ops.py::TestCommonCUDA::test_out_warning_randn_like_cuda PASSED [ 20%]
2023-01-11T23:13:47.0754587Z test_ops.py::TestCommonCUDA::test_out_warning_ravel_cuda PASSED [ 20%]
2023-01-11T23:13:47.0754725Z test_ops.py::TestCommonCUDA::test_out_warning_real_cuda PASSED [ 20%]
2023-01-11T23:13:47.0754886Z test_ops.py::TestCommonCUDA::test_out_warning_repeat_interleave_cuda PASSED [ 20%]
2023-01-11T23:13:47.0755034Z test_ops.py::TestCommonCUDA::test_out_warning_reshape_cuda PASSED [ 20%]
2023-01-11T23:13:47.0755180Z test_ops.py::TestCommonCUDA::test_out_warning_resize__cuda PASSED [ 20%]
2023-01-11T23:13:47.0755328Z test_ops.py::TestCommonCUDA::test_out_warning_resolve_neg_cuda PASSED [ 20%]
2023-01-11T23:13:47.0755472Z test_ops.py::TestCommonCUDA::test_out_warning_rot90_cuda PASSED [ 20%]
2023-01-11T23:13:47.0755619Z test_ops.py::TestCommonCUDA::test_out_warning_round_cuda PASSED [ 20%]
2023-01-11T23:13:47.0755776Z test_ops.py::TestCommonCUDA::test_out_warning_round_decimals_0_cuda PASSED [ 20%]
2023-01-11T23:13:47.0755937Z test_ops.py::TestCommonCUDA::test_out_warning_rsub_cuda PASSED [ 20%]
2023-01-11T23:13:47.0756086Z test_ops.py::TestCommonCUDA::test_out_warning_scatter_cuda PASSED [ 20%]
2023-01-11T23:13:47.0756249Z test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_amax_cuda PASSED [ 20%]
2023-01-11T23:13:47.0756406Z test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_amin_cuda PASSED [ 20%]
2023-01-11T23:13:47.0756570Z test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_prod_cuda PASSED [ 20%]
2023-01-11T23:13:47.0756730Z test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_sum_cuda PASSED [ 20%]
2023-01-11T23:13:47.0756895Z test_ops.py::TestCommonCUDA::test_out_warning_segment_reduce_lengths_cuda PASSED [ 20%]
2023-01-11T23:13:47.0757047Z test_ops.py::TestCommonCUDA::test_out_warning_select_cuda PASSED [ 20%]
2023-01-11T23:13:47.0757192Z test_ops.py::TestCommonCUDA::test_out_warning_select_scatter_cuda PASSED [ 20%]
2023-01-11T23:13:47.0757344Z test_ops.py::TestCommonCUDA::test_out_warning_sigmoid_cuda PASSED [ 20%]
2023-01-11T23:13:47.0757511Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_bartlett_cuda PASSED [ 20%]
2023-01-11T23:13:47.0757674Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_hann_cuda PASSED [ 20%]
2023-01-11T23:13:47.0757836Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_kaiser_cuda PASSED [ 20%]
2023-01-11T23:13:47.0758000Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_nuttall_cuda PASSED [ 20%]
2023-01-11T23:13:47.0758146Z test_ops.py::TestCommonCUDA::test_out_warning_sinc_cuda PASSED [ 20%]
2023-01-11T23:13:47.0758292Z test_ops.py::TestCommonCUDA::test_out_warning_slice_cuda PASSED [ 20%]
2023-01-11T23:13:47.0758472Z test_ops.py::TestCommonCUDA::test_out_warning_softmax_with_dtype_cuda PASSED [ 20%]
2023-01-11T23:13:47.0758618Z test_ops.py::TestCommonCUDA::test_out_warning_sort_cuda PASSED [ 21%]
2023-01-11T23:13:47.0758798Z test_ops.py::TestCommonCUDA::test_out_warning_sparse_sampled_addmm_cuda SKIPPED (Skipped!) [ 21%]
2023-01-11T23:13:47.0758957Z test_ops.py::TestCommonCUDA::test_out_warning_special_airy_ai_cuda PASSED [ 21%]
2023-01-11T23:13:47.0759113Z test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_j0_cuda PASSED [ 21%]
2023-01-11T23:13:47.0759276Z test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_j1_cuda PASSED [ 21%]
2023-01-11T23:13:47.0759477Z test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_t_cuda PASSED [ 21%]
2023-01-11T23:13:47.0759851Z test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_v_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 21%]
2023-01-11T23:13:47.0760006Z test_ops.py::TestCommonCUDA::test_out_warning_special_entr_cuda PASSED [ 21%]
2023-01-11T23:13:47.0760176Z test_ops.py::TestCommonCUDA::test_out_warning_special_hermite_polynomial_he_cuda PASSED [ 21%]
2023-01-11T23:13:47.0760350Z test_ops.py::TestCommonCUDA::test_out_warning_special_laguerre_polynomial_l_cuda PASSED [ 21%]
2023-01-11T23:13:47.0760519Z test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_k0_cuda PASSED [ 21%]
2023-01-11T23:13:47.0760697Z test_ops.py::TestCommonCUDA::test_out_warning_special_scaled_modified_bessel_k0_cuda PASSED [ 21%]
2023-01-11T23:13:47.0761039Z test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_t_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 21%]
2023-01-11T23:13:47.0761384Z test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_u_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 21%]
2023-01-11T23:13:47.0761719Z test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_v_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 21%]
2023-01-11T23:13:47.0761881Z test_ops.py::TestCommonCUDA::test_out_warning_special_xlog1py_cuda PASSED [ 21%]
2023-01-11T23:13:47.0762060Z test_ops.py::TestCommonCUDA::test_out_warning_special_zeta_cuda PASSED [ 21%]
2023-01-11T23:13:47.0762203Z test_ops.py::TestCommonCUDA::test_out_warning_split_cuda PASSED [ 21%]
2023-01-11T23:13:47.0762359Z test_ops.py::TestCommonCUDA::test_out_warning_split_list_args_cuda PASSED [ 21%]
2023-01-11T23:13:47.0762516Z test_ops.py::TestCommonCUDA::test_out_warning_split_with_sizes_cuda PASSED [ 21%]
2023-01-11T23:13:47.0762666Z test_ops.py::TestCommonCUDA::test_out_warning_square_cuda PASSED [ 21%]
2023-01-11T23:13:47.0762809Z test_ops.py::TestCommonCUDA::test_out_warning_std_cuda PASSED [ 21%]
2023-01-11T23:13:47.0762967Z test_ops.py::TestCommonCUDA::test_out_warning_std_mean_unbiased_cuda PASSED [ 21%]
2023-01-11T23:13:47.0763112Z test_ops.py::TestCommonCUDA::test_out_warning_sub_cuda PASSED [ 21%]
2023-01-11T23:13:47.0763260Z test_ops.py::TestCommonCUDA::test_out_warning_sum_cuda PASSED [ 21%]
2023-01-11T23:13:47.0763406Z test_ops.py::TestCommonCUDA::test_out_warning_sum_to_size_cuda PASSED [ 21%]
2023-01-11T23:13:47.0763553Z test_ops.py::TestCommonCUDA::test_out_warning_svd_lowrank_cuda PASSED [ 21%]
2023-01-11T23:13:47.0763694Z test_ops.py::TestCommonCUDA::test_out_warning_t_cuda PASSED [ 21%]
2023-01-11T23:13:47.0763845Z test_ops.py::TestCommonCUDA::test_out_warning_take_along_dim_cuda PASSED [ 21%]
2023-01-11T23:13:47.0763992Z test_ops.py::TestCommonCUDA::test_out_warning_tan_cuda PASSED [ 21%]
2023-01-11T23:13:47.0764140Z test_ops.py::TestCommonCUDA::test_out_warning_tensor_split_cuda PASSED [ 21%]
2023-01-11T23:13:47.0764281Z test_ops.py::TestCommonCUDA::test_out_warning_to_cuda PASSED [ 21%]
2023-01-11T23:13:47.0764427Z test_ops.py::TestCommonCUDA::test_out_warning_to_sparse_cuda PASSED [ 21%]
2023-01-11T23:13:47.0764599Z test_ops.py::TestCommonCUDA::test_out_warning_trapz_cuda PASSED [ 21%]
2023-01-11T23:13:47.0764734Z test_ops.py::TestCommonCUDA::test_out_warning_tril_cuda PASSED [ 21%]
2023-01-11T23:13:47.0764884Z test_ops.py::TestCommonCUDA::test_out_warning_true_divide_cuda PASSED [ 21%]
2023-01-11T23:13:47.0765030Z test_ops.py::TestCommonCUDA::test_out_warning_unbind_cuda PASSED [ 21%]
2023-01-11T23:13:47.0765176Z test_ops.py::TestCommonCUDA::test_out_warning_unsqueeze_cuda PASSED [ 21%]
2023-01-11T23:13:47.0765318Z test_ops.py::TestCommonCUDA::test_out_warning_var_cuda PASSED [ 21%]
2023-01-11T23:13:47.0765478Z test_ops.py::TestCommonCUDA::test_out_warning_var_mean_unbiased_cuda PASSED [ 21%]
2023-01-11T23:13:47.0765624Z test_ops.py::TestCommonCUDA::test_out_warning_view_as_cuda PASSED [ 21%]
2023-01-11T23:13:47.0765771Z test_ops.py::TestCommonCUDA::test_out_warning_view_as_real_cuda PASSED [ 21%]
2023-01-11T23:13:47.0765911Z test_ops.py::TestCommonCUDA::test_out_warning_vstack_cuda PASSED [ 21%]
2023-01-11T23:13:47.0766055Z test_ops.py::TestCommonCUDA::test_out_warning_xlogy_cuda PASSED [ 21%]
2023-01-11T23:13:47.0766201Z test_ops.py::TestCommonCUDA::test_out_warning_zero__cuda PASSED [ 21%]
2023-01-11T23:13:47.0766346Z test_ops.py::TestCommonCUDA::test_out_warning_zeros_like_cuda PASSED [ 21%]
2023-01-11T23:13:47.0766489Z test_ops.py::TestCommonCUDA::test_out_zero__cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0766631Z test_ops.py::TestCommonCUDA::test_out_zeros_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0766777Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_bool PASSED [ 21%]
2023-01-11T23:13:47.0766924Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0767097Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:13:47.0767276Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_complex32 PASSED [ 21%]
2023-01-11T23:13:47.0767456Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float16 PASSED [ 21%]
2023-01-11T23:13:47.0767651Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0767825Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0767995Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0768165Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_uint8 PASSED [ 21%]
2023-01-11T23:13:47.0768334Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_bool PASSED [ 21%]
2023-01-11T23:13:47.0768513Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0768676Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_float16 PASSED [ 21%]
2023-01-11T23:13:47.0768852Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0769023Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int16 PASSED [ 21%]
2023-01-11T23:13:47.0769190Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0769357Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_uint8 PASSED [ 21%]
2023-01-11T23:13:47.0769524Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float16 PASSED [ 21%]
2023-01-11T23:13:47.0769694Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0769862Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0770021Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0770212Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0770392Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0770570Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex32 PASSED [ 21%]
2023-01-11T23:13:47.0770744Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0770923Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float16 PASSED [ 21%]
2023-01-11T23:13:47.0771093Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0771266Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int16 PASSED [ 21%]
2023-01-11T23:13:47.0771435Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int32 PASSED [ 21%]
2023-01-11T23:13:47.0771602Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0771771Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0771942Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_bool PASSED [ 21%]
2023-01-11T23:13:47.0772123Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0772297Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0772469Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0772641Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0772810Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int32 PASSED [ 21%]
2023-01-11T23:13:47.0772983Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0773148Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0773345Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:13:47.0773523Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex32 PASSED [ 21%]
2023-01-11T23:13:47.0773693Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0773864Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0774035Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0774205Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int16 PASSED [ 21%]
2023-01-11T23:13:47.0774376Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_bool PASSED [ 21%]
2023-01-11T23:13:47.0774681Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0774850Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0775020Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_float16 PASSED [ 21%]
2023-01-11T23:13:47.0775189Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0775359Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int16 PASSED [ 21%]
2023-01-11T23:13:47.0775525Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0775696Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:13:47.0775926Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_bool PASSED [ 21%]
2023-01-11T23:13:47.0776106Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0776278Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_complex32 PASSED [ 21%]
2023-01-11T23:13:47.0776451Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0776622Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_float16 PASSED [ 21%]
2023-01-11T23:13:47.0776794Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0776965Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int32 PASSED [ 21%]
2023-01-11T23:13:47.0777134Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0777306Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_uint8 PASSED [ 21%]
2023-01-11T23:13:47.0777474Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:13:47.0777645Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_bool PASSED [ 21%]
2023-01-11T23:13:47.0777813Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_complex32 PASSED [ 21%]
2023-01-11T23:13:47.0777984Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0778152Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0778322Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_uint8 PASSED [ 21%]
2023-01-11T23:13:47.0778495Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:13:47.0778666Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int16 PASSED [ 21%]
2023-01-11T23:13:47.0778837Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int32 PASSED [ 21%]
2023-01-11T23:13:47.0779055Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_uint8 PASSED [ 21%]
2023-01-11T23:13:47.0779249Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_bool PASSED [ 21%]
2023-01-11T23:13:47.0779417Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0779589Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0779756Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int16 PASSED [ 21%]
2023-01-11T23:13:47.0779921Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0780088Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0780254Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_uint8 PASSED [ 21%]
2023-01-11T23:13:47.0780428Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0780602Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0780766Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float16 PASSED [ 21%]
2023-01-11T23:13:47.0780935Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0781103Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0781266Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int16 PASSED [ 21%]
2023-01-11T23:13:47.0781431Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int32 PASSED [ 21%]
2023-01-11T23:13:47.0781620Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0781782Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_uint8 PASSED [ 21%]
2023-01-11T23:13:47.0781953Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_bool PASSED [ 21%]
2023-01-11T23:13:47.0782130Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0782293Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0782464Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0782631Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int16 PASSED [ 21%]
2023-01-11T23:13:47.0782800Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0782967Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0783131Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0783294Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex32 PASSED [ 21%]
2023-01-11T23:13:47.0783450Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0783600Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float16 PASSED [ 21%]
2023-01-11T23:13:47.0783753Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int32 PASSED [ 21%]
2023-01-11T23:13:47.0783905Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0784057Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_uint8 PASSED [ 21%]
2023-01-11T23:13:47.0784219Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0784383Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0784541Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0784719Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0784876Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0785025Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0785182Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0785341Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex32 PASSED [ 22%]
2023-01-11T23:13:47.0785496Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0785655Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0785813Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0785971Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0786133Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0786280Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0786436Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0786589Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_bool PASSED [ 22%]
2023-01-11T23:13:47.0786754Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex128 PASSED [ 22%]
2023-01-11T23:13:47.0786913Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex32 PASSED [ 22%]
2023-01-11T23:13:47.0787067Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0787222Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0787410Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0787553Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0787706Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0787860Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0788026Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_complex128 PASSED [ 22%]
2023-01-11T23:13:47.0788191Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0788352Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0788520Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_complex128 PASSED [ 22%]
2023-01-11T23:13:47.0788683Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0788839Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0788992Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0789153Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_bfloat16 XFAIL [ 22%]
2023-01-11T23:13:47.0789304Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_bool XFAIL [ 22%]
2023-01-11T23:13:47.0789460Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_complex64 XFAIL [ 22%]
2023-01-11T23:13:47.0789617Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float16 XFAIL [ 22%]
2023-01-11T23:13:47.0789837Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int64 XFAIL [ 22%]
2023-01-11T23:13:47.0790005Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_uint8 XFAIL [ 22%]
2023-01-11T23:13:47.0790151Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_bool PASSED [ 22%]
2023-01-11T23:13:47.0790310Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0790465Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0790649Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0790806Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0790956Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0791106Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0791272Z test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_complex128 PASSED [ 22%]
2023-01-11T23:13:47.0791437Z test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0791590Z test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0791751Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0791906Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0792062Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0792218Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0792375Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0792531Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0792682Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0792828Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0792979Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0793156Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0793311Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0793476Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_complex128 PASSED [ 22%]
2023-01-11T23:13:47.0793634Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0793790Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0793940Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0794084Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0794234Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0794382Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0794547Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0794707Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0794865Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0795021Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0795177Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0795333Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0795504Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_complex128 PASSED [ 22%]
2023-01-11T23:13:47.0795671Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0795831Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0795994Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0796150Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0796333Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0796518Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0796703Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0796876Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0797050Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0797230Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0797409Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0797582Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex32 PASSED [ 22%]
2023-01-11T23:13:47.0797753Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0797927Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0798098Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0798266Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0798429Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0798597Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0798763Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0798947Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0799108Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0799262Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0799416Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0799570Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0799723Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0799884Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0800043Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0800199Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0800354Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0800513Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0800671Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0800823Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0800969Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0801119Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0801273Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_bool PASSED [ 22%]
2023-01-11T23:13:47.0801433Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_complex128 PASSED [ 22%]
2023-01-11T23:13:47.0801589Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0801746Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0801904Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0802079Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0802224Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0802379Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_bool PASSED [ 22%]
2023-01-11T23:13:47.0802541Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex32 PASSED [ 22%]
2023-01-11T23:13:47.0802701Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0802858Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0803012Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0803170Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0803327Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0803485Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0803641Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0803806Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0803966Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0804127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0804287Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0804445Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0804629Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_bool PASSED [ 22%]
2023-01-11T23:13:47.0804795Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex128 PASSED [ 22%]
2023-01-11T23:13:47.0804955Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0805120Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0805281Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0805440Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0805595Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0805751Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0805910Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0806077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0806240Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_complex32 PASSED [ 22%]
2023-01-11T23:13:47.0806403Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0806561Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0806719Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0806875Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0807029Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0807186Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0807345Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0807509Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0807688Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0807859Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0808026Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0808193Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0808358Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0808518Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0808677Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0808839Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0808987Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0809146Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0809300Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0809472Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0809641Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0809808Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0809968Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0810156Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0810317Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int32 PASSED [ 23%]
2023-01-11T23:13:47.0810469Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0810627Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0810801Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0810971Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0811147Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_complex64 PASSED [ 23%]
2023-01-11T23:13:47.0811318Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float32 PASSED [ 23%]
2023-01-11T23:13:47.0811487Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0811658Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0811813Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0811981Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_complex64 PASSED [ 23%]
2023-01-11T23:13:47.0812149Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0812313Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0812477Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int32 PASSED [ 23%]
2023-01-11T23:13:47.0812638Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0812797Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0812962Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0813116Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0813292Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0813449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0813602Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0813759Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0813920Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0814079Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0814235Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0814387Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float32 PASSED [ 23%]
2023-01-11T23:13:47.0814647Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0814801Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0814968Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0815124Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0815279Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0815438Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0815594Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0815750Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0815903Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0816110Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0816267Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0816428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0816581Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int32 PASSED [ 23%]
2023-01-11T23:13:47.0816741Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0816903Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0817059Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_float32 PASSED [ 23%]
2023-01-11T23:13:47.0817211Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0817367Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int32 PASSED [ 23%]
2023-01-11T23:13:47.0817523Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0817681Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0817839Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0818000Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex128 PASSED [ 23%]
2023-01-11T23:13:47.0818156Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float32 PASSED [ 23%]
2023-01-11T23:13:47.0818310Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0818465Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0818623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0818781Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0818952Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex128 PASSED [ 23%]
2023-01-11T23:13:47.0819155Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0819347Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_float32 PASSED [ 23%]
2023-01-11T23:13:47.0819524Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0819686Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0819842Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int32 PASSED [ 23%]
2023-01-11T23:13:47.0819996Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0820155Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0820310Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0820468Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0820623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0820775Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0820929Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0821100Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex128 PASSED [ 23%]
2023-01-11T23:13:47.0821260Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0821420Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0821589Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_complex64 PASSED [ 23%]
2023-01-11T23:13:47.0821784Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0821951Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float32 PASSED [ 23%]
2023-01-11T23:13:47.0822115Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0822281Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int32 PASSED [ 23%]
2023-01-11T23:13:47.0822441Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0822602Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0822755Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0822917Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0823080Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0823248Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_complex32 PASSED [ 23%]
2023-01-11T23:13:47.0823410Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0823579Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_float32 PASSED [ 23%]
2023-01-11T23:13:47.0823738Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int32 PASSED [ 23%]
2023-01-11T23:13:47.0823897Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0824048Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0824211Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0824368Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0824529Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0824694Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0824879Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int32 PASSED [ 23%]
2023-01-11T23:13:47.0825040Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0825198Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0825356Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_complex128 PASSED [ 23%]
2023-01-11T23:13:47.0825504Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_complex32 PASSED [ 23%]
2023-01-11T23:13:47.0825659Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0825814Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0825970Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0826124Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0826279Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0826438Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0826591Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float32 PASSED [ 23%]
2023-01-11T23:13:47.0826738Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0826890Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0827046Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0827207Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_complex64 PASSED [ 23%]
2023-01-11T23:13:47.0827365Z
test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float32 PASSED [ 23%] 2023-01-11T23:13:47.0827594Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float64 PASSED [ 23%] 2023-01-11T23:13:47.0827746Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int32 PASSED [ 23%] 2023-01-11T23:13:47.0827901Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int64 PASSED [ 23%] 2023-01-11T23:13:47.0828048Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int8 PASSED [ 23%] 2023-01-11T23:13:47.0828206Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:13:47.0828364Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex64 PASSED [ 23%] 2023-01-11T23:13:47.0828522Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_float16 PASSED [ 23%] 2023-01-11T23:13:47.0828679Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_float32 PASSED [ 23%] 2023-01-11T23:13:47.0828831Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_float64 PASSED [ 23%] 2023-01-11T23:13:47.0828986Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int16 PASSED [ 23%] 2023-01-11T23:13:47.0829139Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int32 PASSED [ 23%] 2023-01-11T23:13:47.0829289Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int64 PASSED [ 23%] 2023-01-11T23:13:47.0829441Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int8 PASSED [ 23%] 2023-01-11T23:13:47.0829605Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:13:47.0829837Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float64 PASSED [ 23%] 2023-01-11T23:13:47.0830011Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int16 PASSED [ 23%] 2023-01-11T23:13:47.0830172Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_uint8 PASSED [ 23%] 2023-01-11T23:13:47.0830340Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:13:47.0830509Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_bool PASSED [ 23%] 2023-01-11T23:13:47.0830707Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex32 PASSED [ 23%] 2023-01-11T23:13:47.0830863Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int16 PASSED [ 23%] 2023-01-11T23:13:47.0831027Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int32 PASSED [ 23%] 2023-01-11T23:13:47.0831187Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int64 PASSED [ 23%] 2023-01-11T23:13:47.0831343Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_bool PASSED [ 23%] 2023-01-11T23:13:47.0831502Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_float16 PASSED [ 23%] 2023-01-11T23:13:47.0831662Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_float32 PASSED [ 23%] 2023-01-11T23:13:47.0831826Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_float64 PASSED [ 23%] 2023-01-11T23:13:47.0831985Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int16 PASSED [ 23%] 2023-01-11T23:13:47.0832138Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int32 PASSED [ 23%] 2023-01-11T23:13:47.0832293Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int8 PASSED [ 23%] 
2023-01-11T23:13:47.0832449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_uint8 PASSED [ 23%] 2023-01-11T23:13:47.0832621Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:13:47.0832790Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_bool PASSED [ 23%] 2023-01-11T23:13:47.0832959Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float64 PASSED [ 23%] 2023-01-11T23:13:47.0833127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int16 PASSED [ 23%] 2023-01-11T23:13:47.0833320Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_uint8 PASSED [ 23%] 2023-01-11T23:13:47.0833478Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_bool PASSED [ 23%] 2023-01-11T23:13:47.0833633Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_float16 PASSED [ 23%] 2023-01-11T23:13:47.0833791Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0833946Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0834102Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0834253Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int32 PASSED [ 24%] 2023-01-11T23:13:47.0834410Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0834565Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0834742Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:13:47.0834901Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0835071Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int32 PASSED [ 24%] 2023-01-11T23:13:47.0835237Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0835398Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0835566Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0835746Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0835922Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0836093Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0836263Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0836452Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0836623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:13:47.0836793Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0836963Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0837132Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0837301Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0837470Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0837631Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:13:47.0837791Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0837950Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0838109Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0838264Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0838422Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0838579Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0838740Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0838922Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0839071Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0839227Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0839433Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float16 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0839634Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0839833Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int32 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0840039Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_bfloat16 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0840241Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_bool SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0840452Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0840660Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex32 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0840863Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0841056Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int8 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0841258Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_uint8 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0841412Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0841572Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0841733Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex32 PASSED [ 24%] 2023-01-11T23:13:47.0841889Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0842068Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_uint8 PASSED [ 24%] 
2023-01-11T23:13:47.0842228Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:13:47.0842380Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0842530Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0842682Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0842833Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0842986Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0843142Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0843300Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0843456Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0843609Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0843756Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0843914Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0844075Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0844230Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0844384Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int32 PASSED [ 24%] 2023-01-11T23:13:47.0844562Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0844717Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0844870Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0845019Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0845170Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0845323Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:13:47.0845475Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0845623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0845791Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0845957Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0846114Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0846272Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0846432Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0846586Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0846743Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0846895Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0847048Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0847202Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int32 PASSED [ 24%] 2023-01-11T23:13:47.0847357Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0847507Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0847658Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0847843Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0848001Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0848156Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0848308Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0848466Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0848630Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0848794Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0848977Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0849157Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0849315Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0849476Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_complex32 PASSED [ 24%] 2023-01-11T23:13:47.0849640Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0849800Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0849958Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0850113Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0850260Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0850442Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0850598Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0850763Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0850927Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex32 PASSED [ 24%] 2023-01-11T23:13:47.0851090Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0851249Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0851407Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0851557Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0851716Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0851883Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:13:47.0852056Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0852224Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0852390Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0852552Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int32 PASSED [ 24%] 2023-01-11T23:13:47.0852715Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0852872Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0853025Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_complex32 PASSED [ 24%] 2023-01-11T23:13:47.0853189Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0853345Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0853525Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0853680Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0853838Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0854001Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0854159Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0854312Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0854467Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0854739Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0854905Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0855072Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex32 PASSED [ 24%] 2023-01-11T23:13:47.0855232Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0855392Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0855549Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int32 PASSED [ 24%] 2023-01-11T23:13:47.0855696Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0855856Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0856019Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0856219Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex32 PASSED [ 24%] 2023-01-11T23:13:47.0856379Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0856537Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0856693Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0856853Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0857016Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0857171Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex32 PASSED [ 24%] 2023-01-11T23:13:47.0857333Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0857494Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0857653Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0857815Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0857974Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0858134Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex32 PASSED [ 24%] 2023-01-11T23:13:47.0858292Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0858445Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0858605Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0858762Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0858947Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0859142Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex32 PASSED [ 25%] 2023-01-11T23:13:47.0859341Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0859506Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0859667Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0859829Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0859980Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0860137Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0860294Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0860457Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0860616Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0860777Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0860934Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0861092Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0861242Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0861406Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0861563Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0861719Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0861903Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0862072Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_complex128 PASSED [ 25%] 2023-01-11T23:13:47.0862234Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_complex32 PASSED [ 25%] 2023-01-11T23:13:47.0862393Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_complex64 PASSED [ 25%] 2023-01-11T23:13:47.0862544Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0862704Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0862860Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0863013Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0863178Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0863335Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0863501Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex32 PASSED [ 25%] 2023-01-11T23:13:47.0863662Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0863822Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0863972Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0864128Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0864288Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0864446Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0864608Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0864769Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0864961Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0865127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0865278Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0865435Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0865593Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0865746Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0865900Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0866055Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0866216Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0866384Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0866531Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0866688Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_bfloat16 PASSED [ 25%]
2023-01-11T23:13:47.0866842Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_complex32 PASSED [ 25%] 2023-01-11T23:13:47.0867000Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0867156Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0867309Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0867511Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex32 PASSED [ 25%] 2023-01-11T23:13:47.0867670Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0867828Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0867979Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0868137Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0868292Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0868451Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_complex128 PASSED [ 25%] 2023-01-11T23:13:47.0868611Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_complex64 PASSED [ 25%] 2023-01-11T23:13:47.0868770Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0868925Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0869082Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0869230Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0869388Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0869545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0869751Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0869909Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0870074Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_complex128 PASSED [ 25%] 2023-01-11T23:13:47.0870232Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0870396Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0870545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0870730Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0870887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0871041Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0871201Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0871361Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0871529Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0871694Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0871854Z
test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0872021Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0872185Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0872347Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0872511Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0872670Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0872829Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0872990Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0873147Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0873326Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0873493Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0873661Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0873823Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0873982Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0874145Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0874301Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0874460Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0874608Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0874768Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0874927Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0875087Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0875240Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0875396Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0875551Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0875706Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0875860Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0876014Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0876176Z test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0876334Z test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0876512Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0876664Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0876818Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0876967Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0877111Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0877265Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0877418Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0877566Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0877721Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0877873Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0878028Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0878176Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0878316Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0878465Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0878612Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0878763Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0878950Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0879161Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0879322Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0879489Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0879649Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0879803Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0879959Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0880121Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0880276Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0880434Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0880596Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_complex128 PASSED [ 25%] 2023-01-11T23:13:47.0880755Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_complex32 PASSED [ 25%] 2023-01-11T23:13:47.0880924Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_complex64 PASSED [ 25%] 2023-01-11T23:13:47.0881075Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0881231Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0881388Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0881551Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0881706Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_bool PASSED [ 25%]
2023-01-11T23:13:47.0881864Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex128 PASSED [ 25%] 2023-01-11T23:13:47.0882026Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex32 PASSED [ 25%] 2023-01-11T23:13:47.0882186Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0882360Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0882519Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0882670Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0882820Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0882971Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0883128Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0883287Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0883445Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0883598Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0883747Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0883897Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0884046Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0884198Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0884353Z test_ops.py::TestCommonCUDA::test_python_ref__refs_igamma_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0884510Z test_ops.py::TestCommonCUDA::test_python_ref__refs_igammac_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0884665Z test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex64 PASSED [ 26%] 2023-01-11T23:13:47.0884851Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex128 XFAIL [ 26%] 2023-01-11T23:13:47.0885007Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex32 XFAIL [ 26%] 2023-01-11T23:13:47.0885170Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex64 XFAIL [ 26%] 2023-01-11T23:13:47.0885329Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_float16 XFAIL [ 26%] 2023-01-11T23:13:47.0885489Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_float64 XFAIL [ 26%] 2023-01-11T23:13:47.0885641Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int16 XFAIL [ 26%] 2023-01-11T23:13:47.0885792Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int64 XFAIL [ 26%] 2023-01-11T23:13:47.0885948Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_uint8 XFAIL [ 26%] 2023-01-11T23:13:47.0886109Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_bfloat16 XFAIL [ 26%] 2023-01-11T23:13:47.0886257Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_bool XFAIL [ 26%] 2023-01-11T23:13:47.0886419Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_complex64 XFAIL [ 26%] 2023-01-11T23:13:47.0886582Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float16 XFAIL [ 26%]
2023-01-11T23:13:47.0886740Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float32 XFAIL [ 26%] 2023-01-11T23:13:47.0886898Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int16 XFAIL [ 26%] 2023-01-11T23:13:47.0887052Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int32 XFAIL [ 26%] 2023-01-11T23:13:47.0887202Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int64 XFAIL [ 26%] 2023-01-11T23:13:47.0887358Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_uint8 XFAIL [ 26%] 2023-01-11T23:13:47.0887514Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_complex128 XFAIL [ 26%] 2023-01-11T23:13:47.0887682Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_complex64 XFAIL [ 26%] 2023-01-11T23:13:47.0887862Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_float16 XFAIL [ 26%] 2023-01-11T23:13:47.0888021Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int16 XFAIL [ 26%] 2023-01-11T23:13:47.0888176Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int32 XFAIL [ 26%] 2023-01-11T23:13:47.0888333Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int8 XFAIL [ 26%] 2023-01-11T23:13:47.0888489Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_uint8 XFAIL [ 26%] 2023-01-11T23:13:47.0888656Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_complex128 XFAIL [ 26%] 2023-01-11T23:13:47.0888816Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float16 XFAIL [ 26%] 2023-01-11T23:13:47.0888973Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float64 XFAIL [ 26%] 2023-01-11T23:13:47.0889129Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int16 XFAIL [ 26%] 2023-01-11T23:13:47.0889284Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int32 XFAIL [ 26%] 2023-01-11T23:13:47.0889446Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0889602Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0889756Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0889914Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0890074Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0890231Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_complex64 PASSED [ 26%] 2023-01-11T23:13:47.0890414Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0890577Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0890737Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0890899Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0891055Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0891212Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0891366Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0891518Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_complex128 PASSED [ 26%] 2023-01-11T23:13:47.0891678Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_complex32 PASSED [ 26%] 2023-01-11T23:13:47.0891835Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_complex64 PASSED [ 26%] 2023-01-11T23:13:47.0891990Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0892144Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0892299Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0892453Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0892608Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0892767Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_complex128 PASSED [ 26%] 2023-01-11T23:13:47.0892919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_complex64 PASSED [ 26%] 2023-01-11T23:13:47.0893075Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0893231Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0893382Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0893565Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0893719Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0893880Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0894041Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0894191Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0894350Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0894610Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0894770Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0894931Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0895090Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0895252Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0895409Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0895561Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0895721Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0895883Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0896039Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0896237Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0896395Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0896556Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_complex32 PASSED [ 26%] 2023-01-11T23:13:47.0896719Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_complex64 PASSED [ 26%] 2023-01-11T23:13:47.0896874Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0897020Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0897173Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0897329Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0897480Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0897638Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0897795Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0897949Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0898103Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0898247Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0898397Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0898550Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0898706Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0898856Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0899008Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0899166Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0899358Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0899509Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0899664Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0899815Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0899969Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0900123Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0900280Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0900454Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0900631Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0900799Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0900965Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0901126Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0901291Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_complex128 PASSED [ 26%] 2023-01-11T23:13:47.0901463Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_complex128 PASSED [ 26%] 2023-01-11T23:13:47.0901630Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0901805Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_complex128 PASSED [ 26%] 2023-01-11T23:13:47.0902000Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0902173Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0902334Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_complex128 PASSED [ 26%] 2023-01-11T23:13:47.0902503Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_complex64 PASSED [ 26%] 2023-01-11T23:13:47.0902661Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int64 XFAIL [ 26%] 2023-01-11T23:13:47.0902815Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0902967Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0903123Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0903278Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0903434Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0903578Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0903733Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0903887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0904054Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_complex128 PASSED [ 26%] 2023-01-11T23:13:47.0904207Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0904359Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0904512Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0904671Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_complex128 PASSED [ 26%] 2023-01-11T23:13:47.0904826Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_complex64 PASSED [ 26%] 2023-01-11T23:13:47.0904981Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0905156Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0905306Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0905457Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0905609Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0905763Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0905921Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0906074Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0906219Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0906401Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0906585Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex32 PASSED [ 26%] 2023-01-11T23:13:47.0906762Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0906933Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0907106Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0907274Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0907441Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0907639Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0907803Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0907966Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0908126Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0908291Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0908452Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_float16 PASSED [ 27%] 2023-01-11T23:13:47.0908608Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0908763Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0908926Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0909082Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0909241Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0909407Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0909569Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0909784Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0909946Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0910105Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0910273Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0910433Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0910591Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0910775Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0910938Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0911102Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0911261Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0911419Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int16 XFAIL [ 27%] 2023-01-11T23:13:47.0911572Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int32 XFAIL [ 27%] 2023-01-11T23:13:47.0911719Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int64 XFAIL [ 27%] 2023-01-11T23:13:47.0911885Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0912046Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_bool PASSED [ 27%] 2023-01-11T23:13:47.0912206Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0912362Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0912514Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0912666Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0912816Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0912968Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0913121Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_bool PASSED [ 27%] 2023-01-11T23:13:47.0913314Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0913476Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0913643Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0913801Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0913959Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0914120Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0914277Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0914429Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0914590Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float16 PASSED [ 27%] 2023-01-11T23:13:47.0914747Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0914905Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0915064Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0915244Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0915428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0915612Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0915792Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0915964Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0916145Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0916318Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0916566Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0916752Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0916936Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float16 PASSED [ 27%] 2023-01-11T23:13:47.0917118Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0917293Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0917465Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0917637Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0917807Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0917968Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0918127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_bool PASSED [ 27%] 2023-01-11T23:13:47.0918286Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0918441Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0918595Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0918750Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0918919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0919128Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_bool PASSED [ 27%] 2023-01-11T23:13:47.0919293Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex32 PASSED [ 27%] 2023-01-11T23:13:47.0919451Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float16 PASSED [ 27%] 2023-01-11T23:13:47.0919605Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0919758Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0919916Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0920074Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0920222Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_float16 PASSED [ 27%] 2023-01-11T23:13:47.0920374Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0920539Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0920696Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_bool PASSED [ 27%] 2023-01-11T23:13:47.0920856Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0921009Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0921167Z
test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0921320Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0921485Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0921646Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0946166Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex32 PASSED [ 27%] 2023-01-11T23:13:47.0946360Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_float16 PASSED [ 27%] 2023-01-11T23:13:47.0946645Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0946811Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0946974Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0947134Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0947295Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_complex32 PASSED [ 27%] 2023-01-11T23:13:47.0947449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0947604Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0947762Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0947920Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0948077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0948247Z test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0948417Z test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0948583Z test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0948734Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0948887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_bool PASSED [ 27%] 2023-01-11T23:13:47.0949044Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0949238Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0949391Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0949545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0949785Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0949939Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0950100Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0950251Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0950402Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_float16 PASSED [ 27%] 2023-01-11T23:13:47.0950552Z
test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0950754Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_bool SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:13:47.0950958Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:13:47.0951162Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:13:47.0951362Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:13:47.0951556Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int64 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:13:47.0951769Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_bool SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:13:47.0951984Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex64 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:13:47.0952203Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_float32 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:13:47.0952444Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int64 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:13:47.0952607Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0952769Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_complex32 PASSED [ 27%] 2023-01-11T23:13:47.0952928Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0953083Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0953232Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0953381Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0953542Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_complex32 PASSED [ 27%] 2023-01-11T23:13:47.0953704Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float16 PASSED [ 27%] 2023-01-11T23:13:47.0953862Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0954017Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0954167Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0954316Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0954475Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0954638Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0954825Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0954986Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0955150Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0955309Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0955463Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0955619Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0955779Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0955942Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0956168Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_float16 SKIPPED (Expected: dropout is not comparable) [ 27%] 2023-01-11T23:13:47.0956340Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0956513Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0956731Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_float32 SKIPPED (Expected: dropout is not comparable) [ 27%] 2023-01-11T23:13:47.0956946Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_float64 SKIPPED (Expected: dropout is not comparable) [ 27%] 2023-01-11T23:13:47.0957117Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0957289Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0957458Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0957629Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0957795Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0957988Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0958170Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0958348Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0958527Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0958705Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0958882Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0959057Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0959230Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0959400Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0959587Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0959767Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0959943Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0960115Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0960292Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0960492Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0960670Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_complex64 PASSED [ 28%] 2023-01-11T23:13:47.0960841Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0961008Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0961180Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0961354Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0961531Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0961703Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0961898Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0962086Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0962274Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0962460Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0962647Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int8 PASSED [ 28%] 2023-01-11T23:13:47.0962832Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0963016Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0963204Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0963415Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0963595Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0963773Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0963946Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0964116Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0964295Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0964468Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0964641Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0964814Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0965002Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0965188Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_complex64 PASSED [ 28%] 2023-01-11T23:13:47.0965370Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0965553Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0965739Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0965946Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0966124Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0966297Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pdist_cuda_float32 XFAIL [ 28%] 2023-01-11T23:13:47.0966481Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0966664Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0966834Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0967008Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0967177Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0967348Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0967521Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0967688Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0967858Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0968025Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0968192Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0968356Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int8 PASSED [ 28%] 2023-01-11T23:13:47.0968518Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0968690Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0968879Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0969091Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0969275Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0969459Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int16 PASSED [ 28%] 
2023-01-11T23:13:47.0969642Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0969819Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0970001Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0970202Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0970447Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0970673Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0970891Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0971069Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0971246Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0971418Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0971654Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0971841Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0972019Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0972194Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0972368Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int8 PASSED [ 28%] 2023-01-11T23:13:47.0972541Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0972716Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0972893Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0973065Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0973260Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0973447Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0973630Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0973813Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0973974Z test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_complex64 PASSED [ 28%] 2023-01-11T23:13:47.0974132Z test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0974287Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0974449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0974732Z
test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0974949Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0975156Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0975349Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0975509Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0975667Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0975818Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int8 PASSED [ 28%] 2023-01-11T23:13:47.0975975Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0976142Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_complex32 PASSED [ 28%] 2023-01-11T23:13:47.0976303Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_complex64 PASSED [ 28%] 2023-01-11T23:13:47.0976464Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0976621Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0976777Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0976929Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0977077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int8 PASSED [ 28%] 2023-01-11T23:13:47.0977231Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0977384Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0977585Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_complex32 XFAIL [ 28%] 2023-01-11T23:13:47.0977737Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_complex64 PASSED [ 28%] 2023-01-11T23:13:47.0977891Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0978042Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0978197Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0978352Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex32 PASSED [ 28%] 2023-01-11T23:13:47.0978505Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0978654Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0978806Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0978962Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int8 PASSED [ 28%] 2023-01-11T23:13:47.0979138Z test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0979321Z test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0979483Z test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0979634Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_bool PASSED [ 28%] 2023-01-11T23:13:47.0979786Z
test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex32 PASSED [ 28%] 2023-01-11T23:13:47.0979938Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0980087Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0980237Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0980390Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0980538Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0980713Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0980866Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_bool PASSED [ 28%] 2023-01-11T23:13:47.0981021Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0981177Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex64 PASSED [ 28%] 2023-01-11T23:13:47.0981331Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0981482Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0981633Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0981804Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0981971Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0982142Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_complex64 PASSED [ 28%] 2023-01-11T23:13:47.0982299Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0982460Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0982623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0982780Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0982933Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0983087Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_bool PASSED [ 28%] 2023-01-11T23:13:47.0983273Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0983426Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0983592Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0983750Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0983910Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.0984065Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.0984221Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0984377Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.0984535Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0984693Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0984855Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_complex128 PASSED [ 29%] 2023-01-11T23:13:47.0985015Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_complex64 PASSED [ 29%] 2023-01-11T23:13:47.0985171Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0985323Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0985480Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.0985636Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0985790Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.0985942Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0986097Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0986252Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_complex128 PASSED [ 29%] 2023-01-11T23:13:47.0986435Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_complex64 PASSED [ 29%] 2023-01-11T23:13:47.0986591Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0986742Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0986894Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.0987048Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.0987200Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.0987351Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0987509Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_complex128 PASSED [ 29%] 2023-01-11T23:13:47.0987664Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_complex64 PASSED [ 29%] 2023-01-11T23:13:47.0987816Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0987966Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0988114Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0988267Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.0988422Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0988577Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0988729Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0988901Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.0989053Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0989208Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.0989365Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0989518Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0989676Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex128 PASSED [ 29%] 2023-01-11T23:13:47.0989906Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex32 PASSED [ 29%] 2023-01-11T23:13:47.0990060Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex64 PASSED [ 29%] 2023-01-11T23:13:47.0990209Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0990366Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0990519Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.0990673Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0990827Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0990978Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0991130Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0991282Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0991437Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex128 PASSED [ 29%] 2023-01-11T23:13:47.0991590Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex64 PASSED [ 29%] 2023-01-11T23:13:47.0991746Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0991899Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0992077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0992228Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.0992379Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.0992537Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0992689Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0992851Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_complex128 PASSED [ 29%] 2023-01-11T23:13:47.0993013Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_complex32 PASSED [ 29%] 2023-01-11T23:13:47.0993174Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0993327Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0993486Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.0993638Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.0993790Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.0993941Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0994092Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0994243Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0994391Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0994565Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0994718Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.0994877Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0995030Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0995184Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.0995331Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.0995482Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0995632Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex64 PASSED [ 29%] 2023-01-11T23:13:47.0995784Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0995938Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.0996088Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.0996239Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0996389Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0996540Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0996691Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0996840Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0996991Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.0997143Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.0997295Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.0997452Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0997608Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex128 PASSED [ 29%] 2023-01-11T23:13:47.0997791Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex32 PASSED [ 29%] 2023-01-11T23:13:47.0997941Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0998089Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0998236Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0998406Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0998574Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0998742Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0998911Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.0999076Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.0999238Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.0999402Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.0999566Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0999727Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.0999888Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.1000048Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.1000242Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.1000400Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.1000558Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.1000715Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.1000870Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.1001033Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.1001195Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.1001353Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.1001513Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.1001670Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.1001829Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.1001986Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.1002143Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.1002298Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.1002454Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.1002607Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.1002762Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.1002920Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.1003077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.1003255Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.1003411Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.1003575Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.1003738Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.1003922Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.1004106Z
test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_complex128 PASSED [ 29%] 2023-01-11T23:13:47.1004294Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_complex64 PASSED [ 29%] 2023-01-11T23:13:47.1004475Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.1004655Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.1004827Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.1004991Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.1005148Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.1005309Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.1005468Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.1005651Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.1005809Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.1005998Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.1006180Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.1006362Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.1006540Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.1006719Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.1006901Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.1007085Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.1007267Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.1007444Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.1007618Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.1007800Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1007976Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1008151Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1008316Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1008477Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1008665Z
test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1008831Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1009018Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1009197Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1009351Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1009528Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1009703Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1009887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1010067Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1010243Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1010417Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1010592Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1010770Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1010942Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1011150Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1011316Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1011481Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1011644Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1011804Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1011964Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1012123Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1012278Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1012504Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_float64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 30%] 2023-01-11T23:13:47.1012666Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1012827Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1012984Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1013140Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1013295Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_complex32 PASSED [ 30%] 2023-01-11T23:13:47.1013448Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1013598Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1013746Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1013903Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1014061Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1014235Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1014388Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1014649Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1014808Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_complex64 PASSED [ 30%] 2023-01-11T23:13:47.1014964Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1015115Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1015261Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1015418Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1015568Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1015728Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1015884Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_complex32 PASSED [ 30%] 2023-01-11T23:13:47.1016037Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1016188Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1016337Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1016485Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1016635Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1016834Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1016982Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1017145Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1017305Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_complex64 PASSED [ 30%] 2023-01-11T23:13:47.1017460Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1017612Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1017763Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1017913Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1018060Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1018211Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1018359Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1018515Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1018665Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1018814Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1018980Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1019143Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1019313Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1019469Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1019633Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_complex64 PASSED [ 30%] 2023-01-11T23:13:47.1019789Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1019979Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1020136Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1020291Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1020443Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1020589Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1020735Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1020881Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1021031Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1021174Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1021334Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1021488Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_complex32 PASSED [ 30%] 2023-01-11T23:13:47.1021636Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_complex64 PASSED [ 30%] 2023-01-11T23:13:47.1021786Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1021933Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1022077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1022222Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1022394Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1022551Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1022709Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_complex32 PASSED [ 30%] 2023-01-11T23:13:47.1022858Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1023007Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1023154Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_float64 
PASSED [ 30%] 2023-01-11T23:13:47.1023303Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1023449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1023610Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_bfloat16 XFAIL [ 30%] 2023-01-11T23:13:47.1023768Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_float16 XFAIL [ 30%] 2023-01-11T23:13:47.1023928Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_float64 XFAIL [ 30%] 2023-01-11T23:13:47.1024082Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1024225Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1024371Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1024527Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1024676Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1024829Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1024979Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1025127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1025281Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1025427Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1025602Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1025751Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1025913Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1026068Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1026229Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex32 PASSED [ 30%] 2023-01-11T23:13:47.1026387Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1026545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1026701Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1026855Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1027010Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1027161Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex32 PASSED [ 30%] 2023-01-11T23:13:47.1027313Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1027462Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1027612Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1027770Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_indices_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1027919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_complex64 PASSED [ 30%] 
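[editor's note] One entry above was skipped with the reason "test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test". The environment variable name and the test id come verbatim from the log; everything else below is a minimal sketch of how that single test could be re-run locally from a pytorch checkout, assuming pytest is used as the runner (the log's output format is pytest's), not something this log itself shows.

    import os
    # PYTORCH_TEST_WITH_SLOW must be set before pytest imports the test module,
    # since PyTorch's test harness reads it at import time.
    os.environ["PYTORCH_TEST_WITH_SLOW"] = "1"

    import pytest
    # Re-run only the slow-skipped test above, selected by its name substring.
    pytest.main(["test_ops.py", "-k", "test_python_ref__refs_special_zeta_cuda_float64", "-v"])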
2023-01-11T23:13:47.1028144Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1028291Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1028444Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1028602Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_indices_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1028757Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_indices_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1028918Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1029078Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex32 XFAIL [ 30%] 2023-01-11T23:13:47.1029235Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1029387Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1029539Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1029750Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1029904Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1030052Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1030202Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1030359Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1030514Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1030661Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1030810Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1030963Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1031119Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1031301Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1031464Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_complex32 PASSED [ 30%] 2023-01-11T23:13:47.1031619Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1031770Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1031918Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_uint8 PASSED [ 31%] 2023-01-11T23:13:47.1032077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_bool PASSED [ 31%] 2023-01-11T23:13:47.1032243Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_complex32 PASSED [ 31%] 2023-01-11T23:13:47.1032405Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_complex64 PASSED [ 31%] 2023-01-11T23:13:47.1032569Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_float16 PASSED [ 31%] 2023-01-11T23:13:47.1032730Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_float32 PASSED [ 31%] 2023-01-11T23:13:47.1032892Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1033051Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int32 PASSED [ 31%] 2023-01-11T23:13:47.1033210Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int64 PASSED [ 31%] 2023-01-11T23:13:47.1033364Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int8 PASSED [ 31%] 2023-01-11T23:13:47.1033519Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_bool PASSED [ 31%] 2023-01-11T23:13:47.1033706Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex64 PASSED [ 31%] 2023-01-11T23:13:47.1033861Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_float16 PASSED [ 31%] 2023-01-11T23:13:47.1034018Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_float32 PASSED [ 31%] 2023-01-11T23:13:47.1034175Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int32 PASSED [ 31%] 2023-01-11T23:13:47.1034330Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int8 PASSED [ 31%] 2023-01-11T23:13:47.1034488Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_bool PASSED [ 31%] 2023-01-11T23:13:47.1034643Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex32 PASSED [ 31%] 2023-01-11T23:13:47.1034805Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex64 PASSED [ 31%] 2023-01-11T23:13:47.1034966Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1035127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int32 PASSED [ 31%] 2023-01-11T23:13:47.1035282Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int64 PASSED [ 31%] 2023-01-11T23:13:47.1035442Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int8 PASSED [ 31%] 2023-01-11T23:13:47.1035597Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_uint8 PASSED [ 31%] 2023-01-11T23:13:47.1035754Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_bfloat16 PASSED [ 31%] 2023-01-11T23:13:47.1035901Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float16 PASSED [ 31%] 2023-01-11T23:13:47.1036055Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float32 PASSED [ 31%] 2023-01-11T23:13:47.1036217Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_bfloat16 PASSED [ 31%] 2023-01-11T23:13:47.1036380Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_complex128 PASSED [ 31%] 2023-01-11T23:13:47.1036542Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float32 PASSED [ 31%] 2023-01-11T23:13:47.1036705Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex128 PASSED [ 31%] 2023-01-11T23:13:47.1036892Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex64 PASSED [ 31%] 2023-01-11T23:13:47.1037052Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int16 PASSED [ 31%] 2023-01-11T23:13:47.1037209Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int32 PASSED [ 31%] 2023-01-11T23:13:47.1037353Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int64 PASSED [ 31%] 2023-01-11T23:13:47.1037507Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_bool PASSED [ 31%] 2023-01-11T23:13:47.1037666Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex32 PASSED [ 31%] 
2023-01-11T23:13:47.1037822Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1037979Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int16 PASSED [ 31%] 2023-01-11T23:13:47.1038133Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int32 PASSED [ 31%] 2023-01-11T23:13:47.1038292Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_uint8 PASSED [ 31%] 2023-01-11T23:13:47.1038457Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex128 PASSED [ 31%] 2023-01-11T23:13:47.1038611Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex32 PASSED [ 31%] 2023-01-11T23:13:47.1038769Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1038947Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int16 PASSED [ 31%] 2023-01-11T23:13:47.1039121Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int64 PASSED [ 31%] 2023-01-11T23:13:47.1039282Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int8 PASSED [ 31%] 2023-01-11T23:13:47.1039469Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_bfloat16 PASSED [ 31%] 2023-01-11T23:13:47.1039623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_bool PASSED [ 31%] 2023-01-11T23:13:47.1039789Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_complex128 PASSED [ 31%] 2023-01-11T23:13:47.1039939Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1040097Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex32 PASSED [ 31%] 2023-01-11T23:13:47.1040254Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float16 PASSED [ 31%] 2023-01-11T23:13:47.1040410Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float32 PASSED [ 31%] 2023-01-11T23:13:47.1040562Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1040714Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int16 PASSED [ 31%] 2023-01-11T23:13:47.1040872Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int32 PASSED [ 31%] 2023-01-11T23:13:47.1041022Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int64 PASSED [ 31%] 2023-01-11T23:13:47.1041171Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int8 PASSED [ 31%] 2023-01-11T23:13:47.1041328Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float16 PASSED [ 31%] 2023-01-11T23:13:47.1041481Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float32 PASSED [ 31%] 2023-01-11T23:13:47.1041630Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1041780Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int16 PASSED [ 31%] 2023-01-11T23:13:47.1041937Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int8 PASSED [ 31%] 2023-01-11T23:13:47.1042100Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex128 PASSED [ 31%] 2023-01-11T23:13:47.1042263Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex32 PASSED [ 31%] 2023-01-11T23:13:47.1042417Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float32 PASSED [ 31%] 2023-01-11T23:13:47.1042588Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1042743Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int16 PASSED [ 31%] 2023-01-11T23:13:47.1042896Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int32 PASSED [ 31%] 2023-01-11T23:13:47.1043050Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int64 PASSED [ 31%] 2023-01-11T23:13:47.1043204Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_T_cuda PASSED [ 31%] 2023-01-11T23:13:47.1043379Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs__conversions_complex_cuda PASSED [ 31%] 2023-01-11T23:13:47.1043533Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_amax_cuda PASSED [ 31%] 2023-01-11T23:13:47.1043689Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_amin_cuda PASSED [ 31%] 2023-01-11T23:13:47.1043852Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_as_strided_scatter_cuda PASSED [ 31%] 2023-01-11T23:13:47.1044012Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_atan2_cuda PASSED [ 31%] 2023-01-11T23:13:47.1044176Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_and_cuda PASSED [ 31%] 2023-01-11T23:13:47.1044342Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_left_shift_cuda PASSED [ 31%] 2023-01-11T23:13:47.1044503Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_or_cuda PASSED [ 31%] 2023-01-11T23:13:47.1044655Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_cat_cuda XFAIL [ 31%] 2023-01-11T23:13:47.1044813Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_clamp_max_cuda XFAIL [ 31%] 2023-01-11T23:13:47.1044967Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diag_cuda PASSED [ 31%] 2023-01-11T23:13:47.1045159Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_div_no_rounding_mode_cuda PASSED [ 31%] 2023-01-11T23:13:47.1045331Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_div_trunc_rounding_cuda PASSED [ 31%] 2023-01-11T23:13:47.1045490Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_eye_cuda PASSED [ 31%] 2023-01-11T23:13:47.1045648Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fft_cuda PASSED [ 31%] 2023-01-11T23:13:47.1045807Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfft2_cuda PASSED [ 31%] 2023-01-11T23:13:47.1045967Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfft_cuda PASSED [ 31%] 2023-01-11T23:13:47.1046129Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfftn_cuda PASSED [ 31%] 2023-01-11T23:13:47.1046288Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfft2_cuda PASSED [ 31%] 2023-01-11T23:13:47.1046448Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_rfft2_cuda PASSED [ 31%] 2023-01-11T23:13:47.1046602Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_rfft_cuda PASSED [ 31%] 2023-01-11T23:13:47.1046761Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_rfftn_cuda PASSED [ 31%] 2023-01-11T23:13:47.1046923Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_flipud_cuda PASSED [ 31%] 2023-01-11T23:13:47.1047084Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_float_power_cuda PASSED [ 31%] 2023-01-11T23:13:47.1047240Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_gcd_cuda PASSED [ 31%] 2023-01-11T23:13:47.1047401Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_heaviside_cuda PASSED [ 31%] 2023-01-11T23:13:47.1047560Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hsplit_cuda PASSED [ 31%] 
2023-01-11T23:13:47.1047717Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hstack_cuda XFAIL [ 31%] 2023-01-11T23:13:47.1047867Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_isclose_cuda PASSED [ 31%] 2023-01-11T23:13:47.1048024Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_le_cuda PASSED [ 31%] 2023-01-11T23:13:47.1048187Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_linspace_cuda PASSED [ 31%] 2023-01-11T23:13:47.1048375Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logical_xor_cuda PASSED [ 31%] 2023-01-11T23:13:47.1048538Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logspace_cuda PASSED [ 31%] 2023-01-11T23:13:47.1048694Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_maximum_cuda XFAIL [ 31%] 2023-01-11T23:13:47.1048849Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_movedim_cuda PASSED [ 31%] 2023-01-11T23:13:47.1049011Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_narrow_copy_cuda PASSED [ 31%] 2023-01-11T23:13:47.1049161Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_narrow_cuda PASSED [ 31%] 2023-01-11T23:13:47.1049333Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_native_layer_norm_cuda PASSED [ 31%] 2023-01-11T23:13:47.1049490Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_ne_cuda PASSED [ 31%] 2023-01-11T23:13:47.1049669Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_group_norm_cuda PASSED [ 31%] 2023-01-11T23:13:47.1049848Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_l1_loss_cuda PASSED [ 31%] 2023-01-11T23:13:47.1050032Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_poisson_nll_loss_cuda PASSED [ 31%] 2023-01-11T23:13:47.1050211Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_softshrink_cuda PASSED [ 31%] 2023-01-11T23:13:47.1050401Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_triplet_margin_loss_cuda PASSED [ 31%] 2023-01-11T23:13:47.1050557Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_pow_cuda PASSED [ 31%] 2023-01-11T23:13:47.1050711Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_remainder_cuda PASSED [ 31%] 2023-01-11T23:13:47.1050904Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_roll_cuda PASSED [ 31%] 2023-01-11T23:13:47.1051058Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_rot90_cuda PASSED [ 31%] 2023-01-11T23:13:47.1051216Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_rsub_cuda PASSED [ 31%] 2023-01-11T23:13:47.1051385Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_special_xlog1py_cuda PASSED [ 31%] 2023-01-11T23:13:47.1051537Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_t_cuda PASSED [ 31%] 2023-01-11T23:13:47.1051693Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_trace_cuda PASSED [ 31%] 2023-01-11T23:13:47.1051847Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_tril_cuda PASSED [ 31%] 2023-01-11T23:13:47.1052002Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_true_divide_cuda PASSED [ 31%] 2023-01-11T23:13:47.1052164Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_view_as_cuda PASSED [ 31%] 2023-01-11T23:13:47.1052319Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_view_cuda PASSED [ 31%] 2023-01-11T23:13:47.1052475Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_vstack_cuda XFAIL [ 31%] 2023-01-11T23:13:47.1052630Z 
test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_where_cuda XFAIL [ 31%] 2023-01-11T23:13:47.1052793Z test_ops.py::TestCommonCUDA::test_python_ref_errors_ops_nvprims_view_cuda PASSED [ 31%] 2023-01-11T23:13:47.1052978Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_complex128 PASSED [ 31%] 2023-01-11T23:13:47.1053160Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_complex32 PASSED [ 31%] 2023-01-11T23:13:47.1053341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_complex64 PASSED [ 31%] 2023-01-11T23:13:47.1053513Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1053692Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int64 PASSED [ 31%] 2023-01-11T23:13:47.1053863Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_uint8 PASSED [ 31%] 2023-01-11T23:13:47.1054095Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 31%] 2023-01-11T23:13:47.1054305Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 31%] 2023-01-11T23:13:47.1054889Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 31%] 2023-01-11T23:13:47.1055099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 31%] 2023-01-11T23:13:47.1055301Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 31%] 2023-01-11T23:13:47.1055629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 31%] 2023-01-11T23:13:47.1055825Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 31%] 2023-01-11T23:13:47.1056025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_bfloat16 PASSED [ 31%] 2023-01-11T23:13:47.1056221Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_bool PASSED [ 31%] 2023-01-11T23:13:47.1056420Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex32 PASSED [ 31%] 2023-01-11T23:13:47.1056615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_float16 PASSED [ 31%] 2023-01-11T23:13:47.1056811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_float32 PASSED [ 31%] 2023-01-11T23:13:47.1057047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1057245Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int32 PASSED [ 31%] 2023-01-11T23:13:47.1057439Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int8 PASSED [ 31%] 2023-01-11T23:13:47.1057666Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 
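[editor's note] The test_python_ref_errors entries above feed invalid inputs to each op. A minimal sketch of the idea being checked (an assumption about intent, not this suite's actual code): the eager op and its Python reference under torch._refs should reject the same bad input with a comparable error. torch.t / torch._refs.t is used here only because its error test appears in the listing above.

    import torch
    import torch._refs as refs  # the Python reference implementations exercised above

    x = torch.ones(2, 2, 2)  # invalid for t(), which expects a tensor with <= 2 dimensions
    for name, fn in (("torch.t", torch.t), ("torch._refs.t", refs.t)):
        try:
            fn(x)
        except RuntimeError as err:
            # both the eager op and its reference are expected to reject this input
            print(name, "raised RuntimeError:", err)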
2023-01-11T23:13:47.1058026Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%] 2023-01-11T23:13:47.1058255Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1058475Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1058673Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex128 PASSED [ 32%] 2023-01-11T23:13:47.1058867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float16 PASSED [ 32%] 2023-01-11T23:13:47.1059059Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:13:47.1059252Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:13:47.1059439Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:13:47.1059662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1060056Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.bool doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1060402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:13:47.1060754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.bool doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1060972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1061186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1061384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_bfloat16 PASSED [ 32%] 2023-01-11T23:13:47.1061580Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_bool PASSED [ 32%] 2023-01-11T23:13:47.1061776Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_complex64 PASSED [ 32%] 2023-01-11T23:13:47.1061969Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_float64 PASSED [ 32%] 2023-01-11T23:13:47.1062161Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:13:47.1062349Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:13:47.1062532Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:13:47.1062774Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1062995Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1063216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1063440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1063658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1064017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.byte doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1064367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:13:47.1064721Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.byte doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1064938Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1065278Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%] 2023-01-11T23:13:47.1065484Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex32 PASSED [ 32%] 2023-01-11T23:13:47.1065678Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float16 PASSED [ 32%] 2023-01-11T23:13:47.1065902Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:13:47.1066099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:13:47.1066294Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:13:47.1066516Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1066740Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1067103Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%] 2023-01-11T23:13:47.1067339Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1067563Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1067910Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%] 2023-01-11T23:13:47.1068110Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex128 PASSED [ 32%] 2023-01-11T23:13:47.1068301Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex32 PASSED [ 32%] 2023-01-11T23:13:47.1068497Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex64 PASSED [ 32%] 2023-01-11T23:13:47.1068719Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float16 PASSED [ 32%] 2023-01-11T23:13:47.1068952Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:13:47.1069163Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float64 PASSED [ 32%] 2023-01-11T23:13:47.1069355Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:13:47.1069546Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:13:47.1069808Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:13:47.1070035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1070256Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1070472Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1070839Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%] 2023-01-11T23:13:47.1071062Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1071410Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:13:47.1071759Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%] 2023-01-11T23:13:47.1071989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex128 PASSED [ 32%] 2023-01-11T23:13:47.1072189Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex32 PASSED [ 32%] 2023-01-11T23:13:47.1072386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex64 PASSED [ 32%] 2023-01-11T23:13:47.1072581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_float16 PASSED [ 32%] 2023-01-11T23:13:47.1072775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_float32 PASSED [ 
32%] 2023-01-11T23:13:47.1072963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:13:47.1073147Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:13:47.1073512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%] 2023-01-11T23:13:47.1073733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1074096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.chalf doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1074312Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1074561Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1074760Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_bfloat16 PASSED [ 32%] 2023-01-11T23:13:47.1074959Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex128 PASSED [ 32%] 2023-01-11T23:13:47.1075155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex32 PASSED [ 32%] 2023-01-11T23:13:47.1075353Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float64 PASSED [ 32%] 2023-01-11T23:13:47.1075538Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:13:47.1075727Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int8 PASSED [ 32%] 2023-01-11T23:13:47.1075920Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:13:47.1076146Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1076368Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1076712Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:13:47.1076928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1077277Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%] 2023-01-11T23:13:47.1077478Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:13:47.1077729Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1077951Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1078145Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex64 PASSED [ 32%] 2023-01-11T23:13:47.1078341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float64 PASSED [ 32%] 2023-01-11T23:13:47.1078534Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:13:47.1078729Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:13:47.1078923Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:13:47.1079146Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1079364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1079589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1079807Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1080154Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:13:47.1080395Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1080614Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1080813Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex128 PASSED [ 32%] 2023-01-11T23:13:47.1081010Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex32 PASSED [ 32%] 2023-01-11T23:13:47.1081207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:13:47.1081397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:13:47.1081584Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:13:47.1081772Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:13:47.1081989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1082210Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1082548Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) 
[ 32%] 2023-01-11T23:13:47.1082907Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.float doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1083125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1083350Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_complex128 PASSED [ 32%] 2023-01-11T23:13:47.1083545Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_float64 PASSED [ 32%] 2023-01-11T23:13:47.1083738Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:13:47.1083925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:13:47.1084114Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int8 PASSED [ 32%] 2023-01-11T23:13:47.1084335Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1084559Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1084782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1085134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.half doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1085479Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:13:47.1085835Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.half doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1086079Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1086278Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_bfloat16 PASSED [ 32%] 2023-01-11T23:13:47.1086470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float16 PASSED [ 32%] 2023-01-11T23:13:47.1086657Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:13:47.1086842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float64 PASSED [ 32%] 2023-01-11T23:13:47.1087028Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:13:47.1087212Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:13:47.1087389Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:13:47.1087580Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int8 PASSED [ 32%] 2023-01-11T23:13:47.1087939Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.int doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1088153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1088500Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.int doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1088846Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%] 2023-01-11T23:13:47.1089046Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_bfloat16 PASSED [ 32%] 2023-01-11T23:13:47.1089269Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_bool PASSED [ 32%] 2023-01-11T23:13:47.1089471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex128 PASSED [ 32%] 2023-01-11T23:13:47.1089666Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float16 PASSED [ 32%] 2023-01-11T23:13:47.1089851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float64 PASSED [ 32%] 2023-01-11T23:13:47.1090043Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:13:47.1090232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:13:47.1090419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int8 PASSED [ 32%] 2023-01-11T23:13:47.1090640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1090859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1091216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.long doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1091416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_complex128 PASSED [ 32%] 2023-01-11T23:13:47.1091612Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_complex64 PASSED [ 32%] 2023-01-11T23:13:47.1091842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:13:47.1092030Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int8 PASSED [ 32%] 2023-01-11T23:13:47.1092250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1092467Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%] 
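[editor's note] Most executor_nvfuser variants above are skipped, either "skipped for speed" or because "nvfuser doesn't support dtype ..." (torch.int8, torch.int16, and torch.complex32 all appear as reasons). A hedged sketch for reproducing just those outcomes locally with standard pytest options; -k keyword selection and the -rs skip-reason report are stock pytest behavior, but running this outside CI is an assumption.

    import pytest
    # "-k executor_nvfuser" narrows collection to the nvfuser-executor parametrizations;
    # "-rs" prints the reason for every skipped test in the summary.
    pytest.main(["test_ops.py", "-k", "executor_nvfuser", "-rs"])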
2023-01-11T23:13:47.1092689Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:13:47.1093057Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.short doesn't support nvfuser) [ 33%]
2023-01-11T23:13:47.1093405Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%]
2023-01-11T23:13:47.1093763Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.short doesn't support nvfuser) [ 33%]
2023-01-11T23:13:47.1093979Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1094324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%]
2023-01-11T23:13:47.1094629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_bool PASSED [ 33%]
2023-01-11T23:13:47.1094817Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex128 PASSED [ 33%]
2023-01-11T23:13:47.1094994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1095221Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_float64 PASSED [ 33%]
2023-01-11T23:13:47.1095399Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int8 PASSED [ 33%]
2023-01-11T23:13:47.1095608Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1095817Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1096024Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1096207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1096418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1096602Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_bool PASSED [ 33%]
2023-01-11T23:13:47.1096787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_complex128 PASSED [ 33%]
2023-01-11T23:13:47.1096960Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_complex32 PASSED [ 33%]
2023-01-11T23:13:47.1097141Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_float16 PASSED [ 33%]
2023-01-11T23:13:47.1097324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_float64 PASSED [ 33%]
2023-01-11T23:13:47.1097503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int16 PASSED [ 33%]
2023-01-11T23:13:47.1097714Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1097893Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int8 PASSED [ 33%]
2023-01-11T23:13:47.1098077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_uint8 PASSED [ 33%]
2023-01-11T23:13:47.1098288Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1098492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1098831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 33%]
2023-01-11T23:13:47.1099042Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1099229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1099437Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1099641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1099964Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%]
2023-01-11T23:13:47.1100148Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_bool PASSED [ 33%]
2023-01-11T23:13:47.1100335Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_complex128 PASSED [ 33%]
2023-01-11T23:13:47.1100522Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_complex32 PASSED [ 33%]
2023-01-11T23:13:47.1100710Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_complex64 PASSED [ 33%]
2023-01-11T23:13:47.1100909Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_float16 PASSED [ 33%]
2023-01-11T23:13:47.1101091Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1101307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1101657Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 33%]
2023-01-11T23:13:47.1101983Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_float32 SKIPPED (_refs.acosh doesn't support nvfuser) [ 33%]
2023-01-11T23:13:47.1102314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%]
2023-01-11T23:13:47.1102522Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1102847Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%]
2023-01-11T23:13:47.1103049Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1103232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_bfloat16 PASSED [ 33%]
2023-01-11T23:13:47.1103403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_bool PASSED [ 33%]
2023-01-11T23:13:47.1103583Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_float64 PASSED [ 33%]
2023-01-11T23:13:47.1103788Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int16 PASSED [ 33%]
2023-01-11T23:13:47.1103970Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1104147Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int8 PASSED [ 33%]
2023-01-11T23:13:47.1104351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1104562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1104955Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 33%]
2023-01-11T23:13:47.1105283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%]
2023-01-11T23:13:47.1105610Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%]
2023-01-11T23:13:47.1105793Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_bfloat16 PASSED [ 33%]
2023-01-11T23:13:47.1105984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_complex64 PASSED [ 33%]
2023-01-11T23:13:47.1106170Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float16 PASSED [ 33%]
2023-01-11T23:13:47.1106353Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float64 PASSED [ 33%]
2023-01-11T23:13:47.1106566Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1106778Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1106981Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1107193Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_bfloat16 PASSED [ 33%]
2023-01-11T23:13:47.1107384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_complex128 PASSED [ 33%]
2023-01-11T23:13:47.1107562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int16 PASSED [ 33%]
2023-01-11T23:13:47.1107745Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int64 PASSED [ 33%]
2023-01-11T23:13:47.1107927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int8 PASSED [ 33%]
2023-01-11T23:13:47.1108108Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_uint8 PASSED [ 33%]
2023-01-11T23:13:47.1108318Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1108532Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1108721Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1108925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1109107Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1109315Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1109664Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%]
2023-01-11T23:13:47.1109950Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1110154Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_bfloat16 PASSED [ 33%]
2023-01-11T23:13:47.1110336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_bool PASSED [ 33%]
2023-01-11T23:13:47.1110524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_complex128 PASSED [ 33%]
2023-01-11T23:13:47.1110706Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int16 PASSED [ 33%]
2023-01-11T23:13:47.1110884Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int8 PASSED [ 33%]
2023-01-11T23:13:47.1111096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1111308Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1111512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1111709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1112034Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%]
2023-01-11T23:13:47.1112238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1112416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_bool PASSED [ 33%]
2023-01-11T23:13:47.1112604Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_complex128 PASSED [ 33%]
2023-01-11T23:13:47.1112826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_float16 PASSED [ 33%]
2023-01-11T23:13:47.1113007Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1113183Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_float64 PASSED [ 33%]
2023-01-11T23:13:47.1113359Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int16 PASSED [ 33%]
2023-01-11T23:13:47.1113527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1113704Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int64 PASSED [ 33%]
2023-01-11T23:13:47.1113883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int8 PASSED [ 33%]
2023-01-11T23:13:47.1114090Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1114297Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1114505Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1114709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1114913Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1115093Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1115443Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%]
2023-01-11T23:13:47.1115628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_bfloat16 PASSED [ 33%]
2023-01-11T23:13:47.1115819Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_complex128 PASSED [ 33%]
2023-01-11T23:13:47.1116033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1116214Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_bfloat16 PASSED [ 33%]
2023-01-11T23:13:47.1116394Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float16 PASSED [ 33%]
2023-01-11T23:13:47.1116573Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int16 PASSED [ 33%]
2023-01-11T23:13:47.1116754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1116965Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1117294Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%]
2023-01-11T23:13:47.1117470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1117675Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1117852Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_bool PASSED [ 33%]
2023-01-11T23:13:47.1118033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float64 PASSED [ 33%]
2023-01-11T23:13:47.1118216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int16 PASSED [ 33%]
2023-01-11T23:13:47.1118397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int64 PASSED [ 33%]
2023-01-11T23:13:47.1118640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int8 PASSED [ 33%]
2023-01-11T23:13:47.1118848Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1119077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1119309Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1119487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1119692Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1119877Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1120084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1120266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float16 PASSED [ 33%]
2023-01-11T23:13:47.1120443Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int8 PASSED [ 33%]
2023-01-11T23:13:47.1120620Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_uint8 PASSED [ 33%]
2023-01-11T23:13:47.1120828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1121056Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1121261Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1121450Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1121655Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1121980Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%]
2023-01-11T23:13:47.1122165Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_float16 PASSED [ 33%]
2023-01-11T23:13:47.1122350Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1122530Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1122709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int64 PASSED [ 33%]
2023-01-11T23:13:47.1122894Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_uint8 PASSED [ 33%]
2023-01-11T23:13:47.1123101Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1123294Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1123625Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%]
2023-01-11T23:13:47.1123831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1124053Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1124332Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_bool PASSED [ 33%]
2023-01-11T23:13:47.1124585Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_complex128 PASSED [ 33%]
2023-01-11T23:13:47.1124826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_complex64 PASSED [ 33%]
2023-01-11T23:13:47.1125049Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_float16 PASSED [ 33%]
2023-01-11T23:13:47.1125265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1125475Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_float64 PASSED [ 34%]
2023-01-11T23:13:47.1125693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:13:47.1125972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1126375Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:13:47.1126738Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_int32 SKIPPED (_refs.as_strided doesn't support nvfuser) [ 34%]
2023-01-11T23:13:47.1127067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:13:47.1127268Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1127520Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_float16 PASSED [ 34%]
2023-01-11T23:13:47.1127722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:13:47.1127924Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1128122Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:13:47.1128361Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:13:47.1128611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:13:47.1128866Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1129125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1129385Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1129811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:13:47.1130165Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_int32 SKIPPED (_refs.as_strided doesn't support nvfuser) [ 34%]
2023-01-11T23:13:47.1130386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1130628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1130890Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_complex32 PASSED [ 34%]
2023-01-11T23:13:47.1131120Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_complex64 PASSED [ 34%]
2023-01-11T23:13:47.1131312Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:13:47.1131497Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1131681Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:13:47.1131870Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:13:47.1132061Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:13:47.1132429Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 34%]
2023-01-11T23:13:47.1132651Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1132869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1133215Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:13:47.1133403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_bfloat16 PASSED [ 34%]
2023-01-11T23:13:47.1133623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1133812Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex128 PASSED [ 34%]
2023-01-11T23:13:47.1133995Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex64 PASSED [ 34%]
2023-01-11T23:13:47.1134174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float16 PASSED [ 34%]
2023-01-11T23:13:47.1134352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float32 PASSED [ 34%]
2023-01-11T23:13:47.1134708Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:13:47.1134967Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1135157Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_float32 PASSED [ 34%]
2023-01-11T23:13:47.1135364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1135543Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1135746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1136074Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:13:47.1136278Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1136461Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_bfloat16 PASSED [ 34%]
2023-01-11T23:13:47.1136645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1136829Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex32 PASSED [ 34%]
2023-01-11T23:13:47.1137067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_float16 PASSED [ 34%]
2023-01-11T23:13:47.1137251Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_float32 PASSED [ 34%]
2023-01-11T23:13:47.1137427Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_float64 PASSED [ 34%]
2023-01-11T23:13:47.1137599Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:13:47.1137779Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:13:47.1137988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1138191Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1138406Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1138750Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 34%]
2023-01-11T23:13:47.1138987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1139217Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1139420Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1139776Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:13:47.1139957Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_bfloat16 PASSED [ 34%]
2023-01-11T23:13:47.1140137Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1140319Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_float16 PASSED [ 34%]
2023-01-11T23:13:47.1140500Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_float32 PASSED [ 34%]
2023-01-11T23:13:47.1140673Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_float64 PASSED [ 34%]
2023-01-11T23:13:47.1140849Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1141032Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:13:47.1141241Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1141440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1141764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:13:47.1141947Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1142150Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1142356Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1142540Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1142750Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex128 PASSED [ 34%]
2023-01-11T23:13:47.1142936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex64 PASSED [ 34%]
2023-01-11T23:13:47.1143117Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float16 PASSED [ 34%]
2023-01-11T23:13:47.1143298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float32 PASSED [ 34%]
2023-01-11T23:13:47.1143476Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float64 PASSED [ 34%]
2023-01-11T23:13:47.1143648Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1143863Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1144207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 34%]
2023-01-11T23:13:47.1144421Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1144626Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1144811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_float32 PASSED [ 34%]
2023-01-11T23:13:47.1145132Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:13:47.1145365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1145546Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1145731Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float16 PASSED [ 34%]
2023-01-11T23:13:47.1145906Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float32 PASSED [ 34%]
2023-01-11T23:13:47.1146087Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:13:47.1146266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1146471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1146680Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1147010Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:13:47.1147216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1147403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_bfloat16 PASSED [ 34%]
2023-01-11T23:13:47.1147595Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_complex128 PASSED [ 34%]
2023-01-11T23:13:47.1147774Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float32 PASSED [ 34%]
2023-01-11T23:13:47.1148121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 34%]
2023-01-11T23:13:47.1148337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1148575Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1148911Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_float32 SKIPPED (_refs.atleast_1d doesn't support nvfuser) [ 34%]
2023-01-11T23:13:47.1149249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:13:47.1149579Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_int32 SKIPPED (_refs.atleast_1d doesn't support nvfuser) [ 34%]
2023-01-11T23:13:47.1150025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:13:47.1150222Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex32 PASSED [ 34%]
2023-01-11T23:13:47.1150421Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex64 PASSED [ 34%]
2023-01-11T23:13:47.1150605Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_float64 PASSED [ 34%]
2023-01-11T23:13:47.1150782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:13:47.1150993Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1151207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1151591Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 34%]
2023-01-11T23:13:47.1151807Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1152018Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1152347Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:13:47.1152682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_int32 SKIPPED (_refs.atleast_2d doesn't support nvfuser) [ 34%]
2023-01-11T23:13:47.1152887Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1153081Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex128 PASSED [ 34%]
2023-01-11T23:13:47.1153263Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex32 PASSED [ 34%]
2023-01-11T23:13:47.1153449Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:13:47.1153635Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:13:47.1153848Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1154056Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1154270Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1154609Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_int32 SKIPPED (_refs.atleast_3d doesn't support nvfuser) [ 34%]
2023-01-11T23:13:47.1154819Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1155009Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:13:47.1155219Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1155542Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:13:47.1155739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:13:47.1155935Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:13:47.1156293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_int32 SKIPPED (_refs.bitwise_left_shift doesn't support nvfuser) [ 34%]
2023-01-11T23:13:47.1156511Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1156849Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:13:47.1157035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1157221Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1157407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:13:47.1157615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:13:47.1157796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1158126Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:13:47.1158342Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1158529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1158713Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:13:47.1158894Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:13:47.1159080Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1159293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1159491Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:13:47.1159684Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1160037Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_int32 SKIPPED (_refs.bitwise_right_shift doesn't support nvfuser) [ 34%]
2023-01-11T23:13:47.1160257Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1160480Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1160693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:13:47.1160879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:13:47.1161064Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:13:47.1161275Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1161608Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1161802Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_int32 PASSED [ 35%]
2023-01-11T23:13:47.1162132Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:13:47.1162328Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_shapes_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:13:47.1162519Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_complex128 PASSED [ 35%]
2023-01-11T23:13:47.1162715Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:13:47.1162912Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:13:47.1163107Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:13:47.1163358Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1163584Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1163782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_float32 PASSED [ 35%]
2023-01-11T23:13:47.1163996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1164340Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1164559Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1164894Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:13:47.1165093Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:13:47.1165280Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:13:47.1165474Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_complex128 PASSED [ 35%]
2023-01-11T23:13:47.1165667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_complex64 PASSED [ 35%]
2023-01-11T23:13:47.1165853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:13:47.1166041Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:13:47.1166231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:13:47.1166440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:13:47.1166655Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1166864Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1167081Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1167418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1167614Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_int32 PASSED [ 35%]
2023-01-11T23:13:47.1167829Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1168014Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float32 XFAIL [ 35%]
2023-01-11T23:13:47.1168199Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float64 XFAIL [ 35%]
2023-01-11T23:13:47.1168384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int8 XFAIL [ 35%]
2023-01-11T23:13:47.1168596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1168810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1169207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1169426Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1169613Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex128 PASSED [ 35%]
2023-01-11T23:13:47.1169796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex64 PASSED [ 35%]
2023-01-11T23:13:47.1169976Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:13:47.1170156Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:13:47.1170336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:13:47.1170545Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1170749Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1170952Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1171156Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1171468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_float32 SKIPPED (_refs.cat doesn't support nvfuser) [ 35%]
2023-01-11T23:13:47.1171676Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1171989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_int32 SKIPPED (_refs.cat doesn't support nvfuser) [ 35%]
2023-01-11T23:13:47.1172172Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:13:47.1172382Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:13:47.1172565Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:13:47.1172773Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1172954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:13:47.1173133Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex128 PASSED [ 35%]
2023-01-11T23:13:47.1173325Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex32 PASSED [ 35%]
2023-01-11T23:13:47.1173507Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:13:47.1173690Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:13:47.1173869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:13:47.1174078Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1174283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1174781Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1175017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1175576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:13:47.1175778Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1175962Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:13:47.1176145Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:13:47.1176324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:13:47.1176503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:13:47.1176716Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1176921Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1177250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1177456Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1177636Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:13:47.1177824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:13:47.1178007Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:13:47.1178194Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:13:47.1178413Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:13:47.1178628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1178839Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1179171Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1179497Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:13:47.1179687Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:13:47.1179864Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:13:47.1180047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:13:47.1180227Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:13:47.1180409Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:13:47.1180589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:13:47.1180800Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1181008Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1181367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_float32 SKIPPED (_refs.clamp_min doesn't support nvfuser) [ 35%]
2023-01-11T23:13:47.1181695Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1182016Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:13:47.1182195Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:13:47.1182375Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:13:47.1182562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex128 PASSED [ 35%]
2023-01-11T23:13:47.1182746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex32 PASSED [ 35%]
2023-01-11T23:13:47.1182928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:13:47.1183107Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:13:47.1183284Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:13:47.1183463Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:13:47.1183642Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:13:47.1183813Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:13:47.1184025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1184229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1184599Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 35%]
2023-01-11T23:13:47.1184810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1185132Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1185317Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_int32 PASSED [ 35%]
2023-01-11T23:13:47.1185637Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:13:47.1185834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:13:47.1186028Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_complex32 PASSED [ 35%]
2023-01-11T23:13:47.1186206Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_complex64 PASSED [ 35%]
2023-01-11T23:13:47.1186394Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:13:47.1186582Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:13:47.1186767Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:13:47.1186954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:13:47.1187175Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:13:47.1187367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:13:47.1187717Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 35%]
2023-01-11T23:13:47.1187936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1188150Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1188355Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1188688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1188901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1189111Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1189295Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:13:47.1189474Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:13:47.1189658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex128 PASSED [ 35%]
2023-01-11T23:13:47.1189939Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex32 PASSED [ 
35%] 2023-01-11T23:13:47.1190125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float64 PASSED [ 35%] 2023-01-11T23:13:47.1190339Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int64 PASSED [ 35%] 2023-01-11T23:13:47.1190512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int8 PASSED [ 35%] 2023-01-11T23:13:47.1190693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_uint8 PASSED [ 35%] 2023-01-11T23:13:47.1190904Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 35%] 2023-01-11T23:13:47.1191245Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 35%] 2023-01-11T23:13:47.1191454Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 35%] 2023-01-11T23:13:47.1191664Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%] 2023-01-11T23:13:47.1191871Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%] 2023-01-11T23:13:47.1192074Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%] 2023-01-11T23:13:47.1192393Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%] 2023-01-11T23:13:47.1192581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_bool PASSED [ 35%] 2023-01-11T23:13:47.1192769Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_complex128 PASSED [ 35%] 2023-01-11T23:13:47.1192992Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_complex64 PASSED [ 35%] 2023-01-11T23:13:47.1193189Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_float64 PASSED [ 35%] 2023-01-11T23:13:47.1193381Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int32 PASSED [ 35%] 2023-01-11T23:13:47.1193571Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int8 PASSED [ 35%] 2023-01-11T23:13:47.1193789Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1194004Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1194226Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1194586Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 36%] 2023-01-11T23:13:47.1194931Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_float32 SKIPPED (_refs.conj_physical doesn't support nvfuser) [ 36%] 2023-01-11T23:13:47.1195140Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1195358Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1195559Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_complex128 PASSED [ 36%] 2023-01-11T23:13:47.1195759Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_complex64 PASSED [ 36%] 2023-01-11T23:13:47.1195957Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_float32 PASSED [ 36%] 2023-01-11T23:13:47.1196173Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_float64 PASSED [ 36%] 2023-01-11T23:13:47.1196367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int16 PASSED [ 36%] 2023-01-11T23:13:47.1196553Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int64 PASSED [ 36%] 2023-01-11T23:13:47.1196741Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_uint8 PASSED [ 36%] 2023-01-11T23:13:47.1196968Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1197181Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1197527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%] 2023-01-11T23:13:47.1197872Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_int32 SKIPPED (_refs.constant_pad_nd doesn't support nvfuser) [ 36%] 2023-01-11T23:13:47.1198093Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1198286Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_complex32 PASSED [ 36%] 2023-01-11T23:13:47.1198478Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_float32 PASSED [ 36%] 2023-01-11T23:13:47.1198665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int16 PASSED [ 36%] 2023-01-11T23:13:47.1198880Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int8 PASSED [ 36%] 2023-01-11T23:13:47.1199070Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_uint8 PASSED [ 36%] 2023-01-11T23:13:47.1199289Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1199498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1199839Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%] 2023-01-11T23:13:47.1200172Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_int32 SKIPPED (_refs.contiguous doesn't support nvfuser) [ 36%] 2023-01-11T23:13:47.1200386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1200717Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%] 2023-01-11T23:13:47.1200929Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1201120Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_bfloat16 PASSED [ 36%] 2023-01-11T23:13:47.1201307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_bool PASSED [ 36%] 2023-01-11T23:13:47.1201496Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int16 PASSED [ 36%] 2023-01-11T23:13:47.1201682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int32 PASSED [ 36%] 2023-01-11T23:13:47.1201859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int64 PASSED [ 36%] 2023-01-11T23:13:47.1202067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int8 PASSED [ 36%] 2023-01-11T23:13:47.1202286Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1202615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_int32 SKIPPED (_refs.copysign doesn't support nvfuser) [ 36%] 2023-01-11T23:13:47.1202827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1203155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%] 2023-01-11T23:13:47.1203344Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_bfloat16 PASSED [ 36%] 2023-01-11T23:13:47.1203529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex32 PASSED [ 36%] 2023-01-11T23:13:47.1203714Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float64 PASSED [ 36%] 2023-01-11T23:13:47.1203896Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int32 PASSED [ 36%] 2023-01-11T23:13:47.1204068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int64 PASSED [ 36%] 2023-01-11T23:13:47.1204249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_uint8 PASSED [ 36%] 2023-01-11T23:13:47.1204456Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1204662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1205032Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 36%] 2023-01-11T23:13:47.1205245Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1205451Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1205640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_float32 PASSED [ 36%] 2023-01-11T23:13:47.1205826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_int32 PASSED [ 36%] 2023-01-11T23:13:47.1206148Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%] 2023-01-11T23:13:47.1206327Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_bfloat16 PASSED [ 36%] 2023-01-11T23:13:47.1206516Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_complex128 PASSED [ 36%] 2023-01-11T23:13:47.1206698Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float16 PASSED [ 36%] 2023-01-11T23:13:47.1206881Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float64 PASSED [ 36%] 2023-01-11T23:13:47.1207062Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int16 PASSED [ 36%] 2023-01-11T23:13:47.1207242Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int32 PASSED [ 36%] 2023-01-11T23:13:47.1207422Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int64 PASSED [ 36%] 2023-01-11T23:13:47.1207603Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int8 PASSED [ 36%] 2023-01-11T23:13:47.1207816Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1208042Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1208257Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1208597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 36%] 2023-01-11T23:13:47.1208806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1209182Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%] 2023-01-11T23:13:47.1209391Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1209598Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1209789Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_complex128 PASSED [ 36%] 2023-01-11T23:13:47.1209978Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_complex64 PASSED [ 36%] 2023-01-11T23:13:47.1210164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float16 PASSED [ 36%] 
2023-01-11T23:13:47.1210342Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float32 PASSED [ 36%] 2023-01-11T23:13:47.1210524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int32 PASSED [ 36%] 2023-01-11T23:13:47.1210755Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int64 PASSED [ 36%] 2023-01-11T23:13:47.1210943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_uint8 PASSED [ 36%] 2023-01-11T23:13:47.1211155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1211365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1211574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1211782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1211972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_bool PASSED [ 36%] 2023-01-11T23:13:47.1212169Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex128 PASSED [ 36%] 2023-01-11T23:13:47.1212355Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex64 PASSED [ 36%] 2023-01-11T23:13:47.1212545Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float16 PASSED [ 36%] 2023-01-11T23:13:47.1212734Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float64 PASSED [ 36%] 2023-01-11T23:13:47.1212922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int16 PASSED [ 36%] 2023-01-11T23:13:47.1213140Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1213352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1213574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1213816Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1214031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1214366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%] 2023-01-11T23:13:47.1214977Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_int32 SKIPPED (_refs.diag_embed doesn't support nvfuser) [ 36%] 2023-01-11T23:13:47.1215193Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1215532Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support 
dtype torch.int8) [ 36%] 2023-01-11T23:13:47.1215748Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1215928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_bool PASSED [ 36%] 2023-01-11T23:13:47.1216113Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_float32 PASSED [ 36%] 2023-01-11T23:13:47.1216315Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1216527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1217073Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 36%] 2023-01-11T23:13:47.1217288Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1217487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1217811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%] 2023-01-11T23:13:47.1218123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_int32 SKIPPED (_refs.diag doesn't support nvfuser) [ 36%] 2023-01-11T23:13:47.1218327Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1218526Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_complex128 PASSED [ 36%] 2023-01-11T23:13:47.1218725Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_complex32 PASSED [ 36%] 2023-01-11T23:13:47.1218919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float16 PASSED [ 36%] 2023-01-11T23:13:47.1219106Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float64 PASSED [ 36%] 2023-01-11T23:13:47.1219298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int32 PASSED [ 36%] 2023-01-11T23:13:47.1219488Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int64 PASSED [ 36%] 2023-01-11T23:13:47.1219670Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int8 PASSED [ 36%] 2023-01-11T23:13:47.1219890Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1220284Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 36%] 2023-01-11T23:13:47.1220506Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1220849Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_float32 SKIPPED (_refs.diagonal_copy doesn't support nvfuser) [ 36%] 2023-01-11T23:13:47.1221067Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1221280Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1221467Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_bool PASSED [ 36%] 2023-01-11T23:13:47.1221661Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_complex128 PASSED [ 36%] 2023-01-11T23:13:47.1221854Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_complex32 PASSED [ 36%] 2023-01-11T23:13:47.1222031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_float16 PASSED [ 36%] 2023-01-11T23:13:47.1222217Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_float32 PASSED [ 36%] 2023-01-11T23:13:47.1222396Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_float64 PASSED [ 36%] 2023-01-11T23:13:47.1222579Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_int16 PASSED [ 36%] 2023-01-11T23:13:47.1222789Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_int32 PASSED [ 36%] 2023-01-11T23:13:47.1222975Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_int8 PASSED [ 36%] 2023-01-11T23:13:47.1223156Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_uint8 PASSED [ 36%] 2023-01-11T23:13:47.1223372Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1223584Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1223910Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_int32 SKIPPED (_refs.diagonal doesn't support nvfuser) [ 36%] 2023-01-11T23:13:47.1224227Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%] 2023-01-11T23:13:47.1224427Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_bool PASSED [ 36%] 2023-01-11T23:13:47.1224628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_complex64 PASSED [ 36%] 2023-01-11T23:13:47.1224823Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_float16 PASSED [ 36%] 2023-01-11T23:13:47.1225013Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int32 PASSED [ 36%] 2023-01-11T23:13:47.1225233Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1225455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1225681Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1225902Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1226272Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%] 2023-01-11T23:13:47.1226480Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1226665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float16 PASSED [ 36%] 2023-01-11T23:13:47.1226851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int16 PASSED [ 36%] 2023-01-11T23:13:47.1227033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int8 PASSED [ 36%] 2023-01-11T23:13:47.1227223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_uint8 PASSED [ 36%] 2023-01-11T23:13:47.1227438Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1227643Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1227851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1228046Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_float16 PASSED [ 37%] 2023-01-11T23:13:47.1228240Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:13:47.1228420Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:13:47.1228802Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_float32 SKIPPED (_refs.div doesn't support nvfuser) [ 37%] 2023-01-11T23:13:47.1229146Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:13:47.1229341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:13:47.1229554Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex128 SKIPPED (Skipped!) [ 37%] 2023-01-11T23:13:47.1229846Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:13:47.1230049Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:13:47.1230245Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:13:47.1230468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1230681Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_complex32 SKIPPED (Skipped!) 
[ 37%] 2023-01-11T23:13:47.1230892Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1231236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:13:47.1231565Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_int32 SKIPPED (_refs.div doesn't support nvfuser) [ 37%] 2023-01-11T23:13:47.1231766Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_bfloat16 PASSED [ 37%] 2023-01-11T23:13:47.1231995Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float16 PASSED [ 37%] 2023-01-11T23:13:47.1232188Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:13:47.1232380Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:13:47.1232571Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:13:47.1232792Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1233130Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_float32 SKIPPED (_refs.div doesn't support nvfuser) [ 37%] 2023-01-11T23:13:47.1233351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1233685Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:13:47.1233903Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1234087Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:13:47.1234273Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1234457Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:13:47.1234663Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:13:47.1234848Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:13:47.1235057Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1235263Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1235476Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1235811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype 
torch.complex32) [ 37%] 2023-01-11T23:13:47.1236026Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1236235Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1236560Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:13:47.1236765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1236947Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:13:47.1237134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1237358Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:13:47.1237582Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:13:47.1237804Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:13:47.1238001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:13:47.1238181Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:13:47.1238396Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1238753Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 37%] 2023-01-11T23:13:47.1238963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1239286Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_float32 SKIPPED (_refs.dstack doesn't support nvfuser) [ 37%] 2023-01-11T23:13:47.1239618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:13:47.1239921Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_bfloat16 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1240221Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_float16 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1240524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_float64 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1240816Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int16 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1241147Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int64 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1241445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int8 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1241749Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_bfloat16 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1242046Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_bool SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1242358Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_complex128 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1242665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_complex32 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1242976Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_complex64 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1243282Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_float16 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1243583Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_float32 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1243883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_int16 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1244174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_int32 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1244492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_complex128 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1244821Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_complex32 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1245162Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float16 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1245471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float32 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1245775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float64 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1246083Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int16 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1246408Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_bfloat16 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1246722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_bool SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1247047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_complex128 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1247373Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_complex32 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1247687Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_float64 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1247997Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_int16 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1248339Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_int32 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1248524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_bfloat16 PASSED [ 37%] 2023-01-11T23:13:47.1248709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_complex128 PASSED [ 37%] 2023-01-11T23:13:47.1248892Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_complex64 PASSED [ 37%] 2023-01-11T23:13:47.1249070Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1249248Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:13:47.1249426Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:13:47.1249605Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:13:47.1249806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1250144Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 37%] 2023-01-11T23:13:47.1250348Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1250528Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1250730Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1251050Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:13:47.1251250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1251481Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1251661Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:13:47.1251840Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:13:47.1252009Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:13:47.1252211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1252415Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1252738Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:13:47.1252943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_uint8 
SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1253125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_bfloat16 PASSED [ 37%] 2023-01-11T23:13:47.1253304Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1253483Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:13:47.1253663Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:13:47.1253831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:13:47.1254037Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:13:47.1254248Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1254452Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1254867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_int32 PASSED [ 37%] 2023-01-11T23:13:47.1255077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1255258Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_float16 PASSED [ 37%] 2023-01-11T23:13:47.1255442Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1255622Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:13:47.1255833Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1256034Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1256364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_float32 SKIPPED (_refs.erfinv doesn't support nvfuser) [ 37%] 2023-01-11T23:13:47.1256688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:13:47.1257015Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_int32 SKIPPED (_refs.erfinv doesn't support nvfuser) [ 37%] 2023-01-11T23:13:47.1257259Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_bfloat16 PASSED [ 37%] 2023-01-11T23:13:47.1257514Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:13:47.1257818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1258002Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:13:47.1258180Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:13:47.1258380Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 
2023-01-11T23:13:47.1258710Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_float32 SKIPPED (_refs.exp2 doesn't support nvfuser) [ 37%] 2023-01-11T23:13:47.1258916Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1259262Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%] 2023-01-11T23:13:47.1259468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_bfloat16 PASSED [ 37%] 2023-01-11T23:13:47.1259645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:13:47.1259830Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_complex128 PASSED [ 37%] 2023-01-11T23:13:47.1260011Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_complex32 PASSED [ 37%] 2023-01-11T23:13:47.1260192Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1260434Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:13:47.1260606Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:13:47.1260791Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:13:47.1261001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1261207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1261416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1261619Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1261806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:13:47.1262001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_complex128 PASSED [ 37%] 2023-01-11T23:13:47.1262194Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float16 PASSED [ 37%] 2023-01-11T23:13:47.1262384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1262565Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:13:47.1262781Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1262999Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1263216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1263454Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_int64 SKIPPED (skipped 
for speed) [ 37%] 2023-01-11T23:13:47.1263786Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%] 2023-01-11T23:13:47.1263977Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_complex128 PASSED [ 37%] 2023-01-11T23:13:47.1264168Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_complex64 PASSED [ 37%] 2023-01-11T23:13:47.1264352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:13:47.1264529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1264715Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:13:47.1264899Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:13:47.1265114Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1265326Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1265539Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1265748Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1266072Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:13:47.1266283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:13:47.1266472Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:13:47.1266649Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1266831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:13:47.1267012Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:13:47.1267192Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:13:47.1267401Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1267607Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1267798Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_float32 PASSED [ 38%] 2023-01-11T23:13:47.1268006Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1268336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:13:47.1268520Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_bfloat16 PASSED [ 38%] 2023-01-11T23:13:47.1268700Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:13:47.1268880Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1269061Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:13:47.1269298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1269510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1269805Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1270038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1270247Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1270578Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:13:47.1270765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:13:47.1270951Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:13:47.1271142Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:13:47.1271325Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:13:47.1271508Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:13:47.1271714Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1271962Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1272180Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1272509Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.fft2 doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1272719Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1273038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:13:47.1273227Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:13:47.1273416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:13:47.1273604Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:13:47.1273787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1273969Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:13:47.1274151Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:13:47.1274362Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1274687Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.fft doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1275010Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.fft doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1275360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:13:47.1275553Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:13:47.1275738Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:13:47.1275923Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:13:47.1276134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1276348Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1276700Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:13:47.1276916Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1277127Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1277459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:13:47.1277665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1277986Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:13:47.1278220Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1278413Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:13:47.1278604Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:13:47.1278797Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:13:47.1278987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1279174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:13:47.1279361Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:13:47.1279551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:13:47.1279898Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:13:47.1280117Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1280332Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1280681Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.fftshift doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1280896Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1281236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:13:47.1281592Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:13:47.1281780Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:13:47.1281974Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:13:47.1282158Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:13:47.1282344Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:13:47.1282522Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:13:47.1282732Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1282948Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1283283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.hfft2 doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1283498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1283829Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:13:47.1284156Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.hfft2 doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1284390Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1284578Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:13:47.1284764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:13:47.1284942Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:13:47.1285124Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:13:47.1285303Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:13:47.1285483Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:13:47.1285696Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1286043Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:13:47.1286251Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1286461Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1286782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.hfft doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1286990Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1287312Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:13:47.1287660Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.hfft doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1287867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1288049Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:13:47.1288235Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:13:47.1288424Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:13:47.1288615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:13:47.1288802Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:13:47.1288985Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int32 PASSED [ 38%] 
2023-01-11T23:13:47.1289167Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:13:47.1289368Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1289713Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:13:47.1290036Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.hfftn doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1290274Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1290459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:13:47.1290645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:13:47.1290829Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:13:47.1291014Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1291194Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:13:47.1291399Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1291606Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1291824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1292152Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:13:47.1292359Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1292541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:13:47.1292726Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1292913Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:13:47.1293094Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:13:47.1293297Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:13:47.1293504Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1293708Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1293919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 
2023-01-11T23:13:47.1294243Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.ifft doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1294453Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1294981Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1295174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:13:47.1295358Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:13:47.1295541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1295722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:13:47.1295896Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:13:47.1296186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1296550Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:13:47.1296760Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1297092Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:13:47.1297417Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.ifftn doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1297748Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:13:47.1297942Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:13:47.1298139Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:13:47.1298333Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1298521Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:13:47.1298703Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:13:47.1298922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1299281Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 39%] 2023-01-11T23:13:47.1299537Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 
2023-01-11T23:13:47.1299886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.ifftshift doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1300225Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:13:47.1300566Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.ifftshift doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1300777Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1300963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1301147Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1301322Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:13:47.1301500Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:13:47.1301713Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1302047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:13:47.1302417Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:13:47.1302602Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:13:47.1302788Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1302996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1303204Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1303412Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1303731Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:13:47.1304057Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.ihfft doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1304266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1304589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:13:47.1304775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1304959Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int16 PASSED [ 39%] 
2023-01-11T23:13:47.1305141Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1305317Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:13:47.1305500Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:13:47.1305728Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:13:47.1305934Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1306265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.ihfftn doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1306594Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:13:47.1306925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.ihfftn doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1307120Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:13:47.1307334Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1307546Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1307875Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.irfft2 doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1308081Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1308407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.irfft2 doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1308641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1308839Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1309021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1309211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:13:47.1309399Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:13:47.1309578Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:13:47.1309839Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:13:47.1310055Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1310403Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 39%] 2023-01-11T23:13:47.1310614Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1310939Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.irfft doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1311140Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1311463Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:13:47.1311788Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.irfft doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1312022Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1312344Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:13:47.1312526Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1312717Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:13:47.1312904Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex32 PASSED [ 39%] 2023-01-11T23:13:47.1313100Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex64 PASSED [ 39%] 2023-01-11T23:13:47.1313291Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:13:47.1313470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:13:47.1313652Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:13:47.1313831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:13:47.1314038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1314371Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.irfftn doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1314610Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1314936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.irfftn doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1315144Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1315469Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype 
torch.int8) [ 39%] 2023-01-11T23:13:47.1315677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1315861Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:13:47.1316041Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:13:47.1316221Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1316404Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:13:47.1316613Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1316943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.rfft2 doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1317271Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:13:47.1317480Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1317667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1317882Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:13:47.1318059Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:13:47.1318237Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1318415Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:13:47.1318593Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:13:47.1318800Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1319133Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:13:47.1319342Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1319543Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1319720Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1319907Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:13:47.1320086Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:13:47.1320296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:13:47.1320475Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1320654Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:13:47.1320862Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1321198Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.rfftn doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1321407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1321611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1321937Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:13:47.1322123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:13:47.1322298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex64 PASSED [ 39%] 2023-01-11T23:13:47.1322479Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:13:47.1322659Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1322864Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1323067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1323389Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:13:47.1323620Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1323807Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_bfloat16 PASSED [ 39%] 2023-01-11T23:13:47.1323990Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1324173Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:13:47.1324348Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:13:47.1324527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1324708Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:13:47.1324926Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1325136Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1325462Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:13:47.1325673Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1325854Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_bfloat16 PASSED [ 39%] 2023-01-11T23:13:47.1326071Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1326250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:13:47.1326433Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:13:47.1326615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:13:47.1326797Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1326978Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:13:47.1327156Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:13:47.1327363Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1327575Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1327780Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1327963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1328134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1328315Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:13:47.1328523Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1328729Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1328942Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1329176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1329501Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_float32 SKIPPED (_refs.fliplr doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1329706Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1330027Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:13:47.1330229Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1330410Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_bfloat16 PASSED [ 39%] 2023-01-11T23:13:47.1330596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1330780Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:13:47.1330962Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1331143Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:13:47.1331325Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:13:47.1331646Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_float32 SKIPPED (_refs.flipud doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1331883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1332208Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:13:47.1332407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1332597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1332788Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:13:47.1332977Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_complex64 PASSED [ 40%] 2023-01-11T23:13:47.1333163Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:13:47.1333351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:13:47.1333539Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:13:47.1333748Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1333964Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1334181Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1334782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_float32 SKIPPED (_refs.float_power doesn't support nvfuser) [ 40%] 2023-01-11T23:13:47.1335164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:13:47.1335437Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1335777Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:13:47.1335989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1336180Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:13:47.1336370Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:13:47.1336560Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:13:47.1336748Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:13:47.1336937Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int32 PASSED [ 40%] 2023-01-11T23:13:47.1337116Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:13:47.1337301Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:13:47.1337516Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1337732Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1338102Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:13:47.1338291Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:13:47.1338476Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:13:47.1338655Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:13:47.1338836Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:13:47.1339020Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int32 PASSED [ 40%] 2023-01-11T23:13:47.1339193Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:13:47.1339377Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:13:47.1339587Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1339797Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1340006Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1340187Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_bool PASSED [ 40%] 2023-01-11T23:13:47.1340370Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:13:47.1340553Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:13:47.1340737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:13:47.1340943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1341175Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1341384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1341705Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_float32 SKIPPED (_refs.fmax doesn't support nvfuser) [ 40%] 2023-01-11T23:13:47.1342031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:13:47.1342237Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1342563Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:13:47.1342770Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1342952Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:13:47.1343136Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:13:47.1343311Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:13:47.1343490Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:13:47.1343670Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:13:47.1343968Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1344300Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:13:47.1344616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_int32 SKIPPED (_refs.fmin doesn't support nvfuser) [ 40%] 2023-01-11T23:13:47.1344820Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1345140Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:13:47.1345344Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1345529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:13:47.1345702Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:13:47.1345885Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int32 PASSED [ 40%] 2023-01-11T23:13:47.1346065Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:13:47.1346272Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1346459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_float32 PASSED [ 40%] 2023-01-11T23:13:47.1346665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1346994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:13:47.1347365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:13:47.1347555Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:13:47.1347737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:13:47.1347933Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1348142Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1348324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int32 PASSED [ 40%] 2023-01-11T23:13:47.1348509Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:13:47.1348716Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1349042Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:13:47.1349246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1349427Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int32 PASSED [ 40%] 2023-01-11T23:13:47.1349606Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:13:47.1349893Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1350119Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1350325Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1350526Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1350708Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:13:47.1350886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_bool PASSED [ 40%] 
2023-01-11T23:13:47.1351067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_float32 PASSED [ 40%]
2023-01-11T23:13:47.1351242Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int32 PASSED [ 40%]
2023-01-11T23:13:47.1351448Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1351633Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_float32 PASSED [ 40%]
2023-01-11T23:13:47.1351826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1352006Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_int32 PASSED [ 40%]
2023-01-11T23:13:47.1352207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1352397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_bfloat16 PASSED [ 40%]
2023-01-11T23:13:47.1352582Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_bool PASSED [ 40%]
2023-01-11T23:13:47.1352773Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float16 PASSED [ 40%]
2023-01-11T23:13:47.1352960Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float64 PASSED [ 40%]
2023-01-11T23:13:47.1353173Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int16 PASSED [ 40%]
2023-01-11T23:13:47.1353359Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int32 PASSED [ 40%]
2023-01-11T23:13:47.1353529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int64 PASSED [ 40%]
2023-01-11T23:13:47.1353742Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1353954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1354293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%]
2023-01-11T23:13:47.1354629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_int32 SKIPPED (_refs.heaviside doesn't support nvfuser) [ 40%]
2023-01-11T23:13:47.1354809Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_bfloat16 PASSED [ 40%]
2023-01-11T23:13:47.1354996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex32 PASSED [ 40%]
2023-01-11T23:13:47.1355178Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_float32 PASSED [ 40%]
2023-01-11T23:13:47.1355358Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int32 PASSED [ 40%]
2023-01-11T23:13:47.1355540Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int64 PASSED [ 40%]
2023-01-11T23:13:47.1355741Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int8 PASSED [ 40%]
2023-01-11T23:13:47.1355925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_uint8 PASSED [ 40%]
2023-01-11T23:13:47.1356136Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1356351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1356559Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1356886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_float32 SKIPPED (_refs.hsplit doesn't support nvfuser) [ 40%]
2023-01-11T23:13:47.1357094Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1357423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%]
2023-01-11T23:13:47.1357634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1357821Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_bfloat16 PASSED [ 40%]
2023-01-11T23:13:47.1357999Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex128 PASSED [ 40%]
2023-01-11T23:13:47.1358186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex32 PASSED [ 40%]
2023-01-11T23:13:47.1358371Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex64 PASSED [ 40%]
2023-01-11T23:13:47.1358555Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float16 PASSED [ 40%]
2023-01-11T23:13:47.1358737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float64 PASSED [ 40%]
2023-01-11T23:13:47.1358946Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int16 PASSED [ 40%]
2023-01-11T23:13:47.1359155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int64 PASSED [ 40%]
2023-01-11T23:13:47.1359352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int8 PASSED [ 40%]
2023-01-11T23:13:47.1359565Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1359771Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1359973Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1360300Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%]
2023-01-11T23:13:47.1360501Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1360685Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_float16 PASSED [ 40%]
2023-01-11T23:13:47.1360870Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_float32 PASSED [ 40%]
2023-01-11T23:13:47.1361077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1361394Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_nvfuser_cuda_float32 SKIPPED (_refs.hypot doesn't support nvfuser) [ 40%]
2023-01-11T23:13:47.1361621Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1361801Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_bool PASSED [ 40%]
2023-01-11T23:13:47.1361983Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float16 PASSED [ 40%]
2023-01-11T23:13:47.1362155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float32 PASSED [ 40%]
2023-01-11T23:13:47.1362330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int16 PASSED [ 40%]
2023-01-11T23:13:47.1362499Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int32 PASSED [ 40%]
2023-01-11T23:13:47.1362667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int64 PASSED [ 40%]
2023-01-11T23:13:47.1362844Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int8 PASSED [ 40%]
2023-01-11T23:13:47.1363048Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1363373Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%]
2023-01-11T23:13:47.1363574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1363895Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_nvfuser_cuda_float32 SKIPPED (_refs.igamma doesn't support nvfuser) [ 40%]
2023-01-11T23:13:47.1364096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1364282Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_aten_cuda_float32 PASSED [ 40%]
2023-01-11T23:13:47.1364467Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_aten_cuda_float64 PASSED [ 40%]
2023-01-11T23:13:47.1364681Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex128 PASSED [ 40%]
2023-01-11T23:13:47.1365021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 40%]
2023-01-11T23:13:47.1365231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1365419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_bfloat16 PASSED [ 40%]
2023-01-11T23:13:47.1365610Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_complex128 PASSED [ 40%]
2023-01-11T23:13:47.1365796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float32 PASSED [ 40%]
2023-01-11T23:13:47.1365985Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float64 PASSED [ 41%]
2023-01-11T23:13:47.1366163Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int16 PASSED [ 41%]
2023-01-11T23:13:47.1366346Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1366527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int8 PASSED [ 41%]
2023-01-11T23:13:47.1366706Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_uint8 PASSED [ 41%]
2023-01-11T23:13:47.1366919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1367128Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1367504Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 41%]
2023-01-11T23:13:47.1367722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1367936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1368265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_float32 SKIPPED (_refs.index_add doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1368585Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%]
2023-01-11T23:13:47.1368914Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_int32 SKIPPED (_refs.index_add doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1369122Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1369312Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_bfloat16 PASSED [ 41%]
2023-01-11T23:13:47.1369500Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_bool PASSED [ 41%]
2023-01-11T23:13:47.1369691Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_complex128 PASSED [ 41%]
2023-01-11T23:13:47.1369879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_float64 PASSED [ 41%]
2023-01-11T23:13:47.1370063Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int16 PASSED [ 41%]
2023-01-11T23:13:47.1370250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1370430Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int64 PASSED [ 41%]
2023-01-11T23:13:47.1370633Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int8 PASSED [ 41%]
2023-01-11T23:13:47.1370813Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_uint8 PASSED [ 41%]
2023-01-11T23:13:47.1371024Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1371360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_float32 SKIPPED (_refs.index_copy doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1371572Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1371905Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%]
2023-01-11T23:13:47.1372118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1372308Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_complex64 PASSED [ 41%]
2023-01-11T23:13:47.1372496Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_float16 PASSED [ 41%]
2023-01-11T23:13:47.1372680Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int16 PASSED [ 41%]
2023-01-11T23:13:47.1372857Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1373035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int64 PASSED [ 41%]
2023-01-11T23:13:47.1373246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int8 PASSED [ 41%]
2023-01-11T23:13:47.1373459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1373669Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1373879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1374088Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1374418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%]
2023-01-11T23:13:47.1374899Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_bool PASSED [ 41%]
2023-01-11T23:13:47.1375106Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex128 PASSED [ 41%]
2023-01-11T23:13:47.1375298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex32 PASSED [ 41%]
2023-01-11T23:13:47.1375489Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float64 PASSED [ 41%]
2023-01-11T23:13:47.1375679Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int16 PASSED [ 41%]
2023-01-11T23:13:47.1375864Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1376051Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int64 PASSED [ 41%]
2023-01-11T23:13:47.1376245Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int8 PASSED [ 41%]
2023-01-11T23:13:47.1376463Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1376751Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1377114Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 41%]
2023-01-11T23:13:47.1377444Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_float32 SKIPPED (_refs.index_select doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1377775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%]
2023-01-11T23:13:47.1377966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_bfloat16 PASSED [ 41%]
2023-01-11T23:13:47.1378148Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_bool PASSED [ 41%]
2023-01-11T23:13:47.1378337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_complex128 PASSED [ 41%]
2023-01-11T23:13:47.1378522Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_float16 PASSED [ 41%]
2023-01-11T23:13:47.1378710Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_float32 PASSED [ 41%]
2023-01-11T23:13:47.1378891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_float64 PASSED [ 41%]
2023-01-11T23:13:47.1379075Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1379256Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int64 PASSED [ 41%]
2023-01-11T23:13:47.1379492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1379709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1380038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_float32 SKIPPED (_refs.isclose doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1380368Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%]
2023-01-11T23:13:47.1380693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_int32 SKIPPED (_refs.isclose doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1380883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_bfloat16 PASSED [ 41%]
2023-01-11T23:13:47.1381077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex128 PASSED [ 41%]
2023-01-11T23:13:47.1381273Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex32 PASSED [ 41%]
2023-01-11T23:13:47.1381460Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float16 PASSED [ 41%]
2023-01-11T23:13:47.1381646Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float32 PASSED [ 41%]
2023-01-11T23:13:47.1381821Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float64 PASSED [ 41%]
2023-01-11T23:13:47.1382005Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int64 PASSED [ 41%]
2023-01-11T23:13:47.1382192Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_uint8 PASSED [ 41%]
2023-01-11T23:13:47.1382407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1382783Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 41%]
2023-01-11T23:13:47.1383000Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1383327Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%]
2023-01-11T23:13:47.1383535Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1383859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%]
2023-01-11T23:13:47.1384073Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1384253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_bfloat16 PASSED [ 41%]
2023-01-11T23:13:47.1384436Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_bool PASSED [ 41%]
2023-01-11T23:13:47.1384624Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_complex128 PASSED [ 41%]
2023-01-11T23:13:47.1384811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_complex32 PASSED [ 41%]
2023-01-11T23:13:47.1384995Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_complex64 PASSED [ 41%]
2023-01-11T23:13:47.1385178Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float64 PASSED [ 41%]
2023-01-11T23:13:47.1385427Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int16 PASSED [ 41%]
2023-01-11T23:13:47.1385606Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1385814Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1386017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1386360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 41%]
2023-01-11T23:13:47.1386569Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1386891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%]
2023-01-11T23:13:47.1387093Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1387280Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_complex128 PASSED [ 41%]
2023-01-11T23:13:47.1387465Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_complex64 PASSED [ 41%]
2023-01-11T23:13:47.1387649Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float32 PASSED [ 41%]
2023-01-11T23:13:47.1387828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int16 PASSED [ 41%]
2023-01-11T23:13:47.1388007Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1388178Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int8 PASSED [ 41%]
2023-01-11T23:13:47.1388387Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1388626Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1388836Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1389040Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1389250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1389439Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_bool PASSED [ 41%]
2023-01-11T23:13:47.1389624Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_float32 PASSED [ 41%]
2023-01-11T23:13:47.1389913Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1390143Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1390468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_float32 SKIPPED (_refs.isneginf doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1390797Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%]
2023-01-11T23:13:47.1391121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_int32 SKIPPED (_refs.isneginf doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1391329Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1391562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1391753Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_bfloat16 PASSED [ 41%]
2023-01-11T23:13:47.1391936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float16 PASSED [ 41%]
2023-01-11T23:13:47.1392122Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float32 PASSED [ 41%]
2023-01-11T23:13:47.1392302Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float64 PASSED [ 41%]
2023-01-11T23:13:47.1392483Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int16 PASSED [ 41%]
2023-01-11T23:13:47.1392658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1392838Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int64 PASSED [ 41%]
2023-01-11T23:13:47.1393021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_uint8 PASSED [ 41%]
2023-01-11T23:13:47.1393235Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1393442Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1393651Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1393980Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%]
2023-01-11T23:13:47.1394182Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1394369Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_bfloat16 PASSED [ 41%]
2023-01-11T23:13:47.1394575Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_bool PASSED [ 41%]
2023-01-11T23:13:47.1394758Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex128 PASSED [ 41%]
2023-01-11T23:13:47.1394941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex32 PASSED [ 41%]
2023-01-11T23:13:47.1395123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_float16 PASSED [ 41%]
2023-01-11T23:13:47.1395303Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int16 PASSED [ 41%]
2023-01-11T23:13:47.1395483Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1395667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int64 PASSED [ 41%]
2023-01-11T23:13:47.1395851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int8 PASSED [ 41%]
2023-01-11T23:13:47.1396056Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1396400Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 41%]
2023-01-11T23:13:47.1396603Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1396810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1397181Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_float32 SKIPPED (_refs.isreal doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1397392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1397709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_int32 SKIPPED (_refs.isreal doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1398031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%]
2023-01-11T23:13:47.1398237Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1398416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int64 PASSED [ 41%]
2023-01-11T23:13:47.1398595Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int8 PASSED [ 41%]
2023-01-11T23:13:47.1398806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1399004Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_bool PASSED [ 41%]
2023-01-11T23:13:47.1399209Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_float32 PASSED [ 41%]
2023-01-11T23:13:47.1399387Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_float64 PASSED [ 41%]
2023-01-11T23:13:47.1399563Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int16 PASSED [ 42%]
2023-01-11T23:13:47.1399736Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1399906Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:13:47.1400084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int8 PASSED [ 42%]
2023-01-11T23:13:47.1400255Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:13:47.1400482Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1400690Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1400865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1401066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1401385Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%]
2023-01-11T23:13:47.1401589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1401773Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1401957Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1402138Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1402345Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1402555Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1402887Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 42%]
2023-01-11T23:13:47.1403121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1403440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_float32 SKIPPED (_refs.lerp doesn't support nvfuser) [ 42%]
2023-01-11T23:13:47.1403625Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_float16 PASSED [ 42%]
2023-01-11T23:13:47.1403808Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1403987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int16 PASSED [ 42%]
2023-01-11T23:13:47.1404166Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:13:47.1404345Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:13:47.1404555Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1404758Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1404937Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1405261Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%]
2023-01-11T23:13:47.1405459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:13:47.1405656Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1405854Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_float16 PASSED [ 42%]
2023-01-11T23:13:47.1406049Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1406297Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1406489Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:13:47.1406683Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1406871Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1407051Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1407266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1407455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1407673Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1407886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1408082Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_complex128 PASSED [ 42%]
2023-01-11T23:13:47.1408275Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1408462Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1408682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:13:47.1408883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_complex128 PASSED [ 42%]
2023-01-11T23:13:47.1409073Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1409266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1409458Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1409678Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1409896Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1410085Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1410270Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int32 XFAIL [ 42%]
2023-01-11T23:13:47.1410451Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int8 XFAIL [ 42%]
2023-01-11T23:13:47.1410664Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1410877Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1411073Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1411283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1411466Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:13:47.1411668Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_bool PASSED [ 42%]
2023-01-11T23:13:47.1411851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float16 PASSED [ 42%]
2023-01-11T23:13:47.1412031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1412205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1412382Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int16 PASSED [ 42%]
2023-01-11T23:13:47.1412561Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:13:47.1412765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1412970Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1413302Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%]
2023-01-11T23:13:47.1413507Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1413827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%]
2023-01-11T23:13:47.1414014Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_complex128 PASSED [ 42%]
2023-01-11T23:13:47.1414228Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float16 PASSED [ 42%]
2023-01-11T23:13:47.1414409Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1414903Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:13:47.1415093Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int8 PASSED [ 42%]
2023-01-11T23:13:47.1415265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:13:47.1415473Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1415686Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1415898Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1416105Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1416290Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1416496Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1416679Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1417004Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%]
2023-01-11T23:13:47.1417208Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1417384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:13:47.1417634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_complex128 PASSED [ 42%]
2023-01-11T23:13:47.1417818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1417997Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1418175Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:13:47.1418351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:13:47.1418556Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1418762Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1418950Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1419149Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1419328Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1419648Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%]
2023-01-11T23:13:47.1419850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1420033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1420251Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float16 PASSED [ 42%]
2023-01-11T23:13:47.1420435Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1420639Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1420979Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 42%]
2023-01-11T23:13:47.1421187Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1421384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1429001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1429364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%]
2023-01-11T23:13:47.1429568Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:13:47.1429900Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_bool PASSED [ 42%]
2023-01-11T23:13:47.1430108Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_complex128 PASSED [ 42%]
2023-01-11T23:13:47.1430308Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1430505Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1430704Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1430975Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:13:47.1431166Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int8 PASSED [ 42%]
2023-01-11T23:13:47.1431363Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:13:47.1431590Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1431963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 42%]
2023-01-11T23:13:47.1432191Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1432416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1432768Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%]
2023-01-11T23:13:47.1433117Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%]
2023-01-11T23:13:47.1433308Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1433497Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1433723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:13:47.1433910Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:13:47.1434125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1434340Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1434552Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1434891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%]
2023-01-11T23:13:47.1435079Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1435291Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1435623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%]
2023-01-11T23:13:47.1435812Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:13:47.1436004Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1436183Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1436367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1436552Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:13:47.1436736Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:13:47.1436974Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1437309Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%]
2023-01-11T23:13:47.1437498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1437705Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1437915Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1438104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_complex128 PASSED [ 42%]
2023-01-11T23:13:47.1438296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1438481Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_float16 PASSED [ 42%]
2023-01-11T23:13:47.1438668Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1438850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int16 PASSED [ 42%]
2023-01-11T23:13:47.1439033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int8 PASSED [ 42%]
2023-01-11T23:13:47.1439211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:13:47.1439448Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1439665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1439853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1440055Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1440239Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1440423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_float64 PASSED [ 43%]
2023-01-11T23:13:47.1440606Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int32 PASSED [ 43%]
2023-01-11T23:13:47.1440824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1441036Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1441227Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1441439Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1441774Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%]
2023-01-11T23:13:47.1441965Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_int32 PASSED [ 43%]
2023-01-11T23:13:47.1442170Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1442535Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%]
2023-01-11T23:13:47.1442747Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1442932Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_bfloat16 PASSED [ 43%]
2023-01-11T23:13:47.1443121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_complex64 PASSED [ 43%]
2023-01-11T23:13:47.1443306Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_float16 PASSED [ 43%]
2023-01-11T23:13:47.1443487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1443670Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_float64 PASSED [ 43%]
2023-01-11T23:13:47.1443853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int32 XFAIL [ 43%]
2023-01-11T23:13:47.1444034Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int64 XFAIL [ 43%]
2023-01-11T23:13:47.1444243Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1444455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1444787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%]
2023-01-11T23:13:47.1445115Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_int32 SKIPPED (_refs.logspace doesn't support nvfuser) [ 43%]
2023-01-11T23:13:47.1445350Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1445683Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%]
2023-01-11T23:13:47.1445870Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_bool PASSED [ 43%]
2023-01-11T23:13:47.1446058Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1446245Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float64 PASSED [ 43%]
2023-01-11T23:13:47.1446424Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int8 PASSED [ 43%]
2023-01-11T23:13:47.1446629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1446966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_float32 SKIPPED (_refs.logsumexp doesn't support nvfuser) [ 43%]
2023-01-11T23:13:47.1447298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%]
2023-01-11T23:13:47.1447629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_int32 SKIPPED (_refs.logsumexp doesn't support nvfuser) [ 43%]
2023-01-11T23:13:47.1447837Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1448042Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1448223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_bfloat16 PASSED [ 43%]
2023-01-11T23:13:47.1448402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_bool PASSED [ 43%]
2023-01-11T23:13:47.1448609Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_float16 PASSED [ 43%]
2023-01-11T23:13:47.1448790Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1448961Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int8 PASSED [ 43%]
2023-01-11T23:13:47.1449163Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_uint8 PASSED [ 43%]
2023-01-11T23:13:47.1449392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1449594Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1449799Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1449985Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1450187Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1450508Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%]
2023-01-11T23:13:47.1450688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_int32 PASSED [ 43%]
2023-01-11T23:13:47.1450880Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1451098Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_bfloat16 PASSED [ 43%]
2023-01-11T23:13:47.1451290Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex64 PASSED [ 43%]
2023-01-11T23:13:47.1451479Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_float16 PASSED [ 43%]
2023-01-11T23:13:47.1451667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1451851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_float64 PASSED [ 43%]
2023-01-11T23:13:47.1452036Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int16 PASSED [ 43%]
2023-01-11T23:13:47.1452221Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int8 PASSED [ 43%]
2023-01-11T23:13:47.1452403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_uint8 PASSED [ 43%]
2023-01-11T23:13:47.1452621Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1452824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1453039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1453252Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1453591Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_int32 SKIPPED (_refs.masked_fill doesn't support nvfuser) [ 43%]
2023-01-11T23:13:47.1453925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%]
2023-01-11T23:13:47.1454108Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_bool PASSED [ 43%]
2023-01-11T23:13:47.1454325Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_float64 PASSED [ 43%]
2023-01-11T23:13:47.1454747Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int16 PASSED [ 43%]
2023-01-11T23:13:47.1454988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int32 PASSED [ 43%]
2023-01-11T23:13:47.1455173Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_uint8 PASSED [ 43%]
2023-01-11T23:13:47.1455377Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1455584Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1455925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_int32 SKIPPED (_refs.maximum doesn't support nvfuser) [ 43%]
2023-01-11T23:13:47.1456263Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%]
2023-01-11T23:13:47.1456447Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_bfloat16 PASSED [ 43%]
2023-01-11T23:13:47.1456628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_complex64 PASSED [ 43%]
2023-01-11T23:13:47.1456808Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1457015Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1457330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1457538Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1457736Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_bfloat16 PASSED [ 43%]
2023-01-11T23:13:47.1457938Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1458140Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int16 PASSED [ 43%]
2023-01-11T23:13:47.1458345Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int8 PASSED [ 43%]
2023-01-11T23:13:47.1458576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1458809Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1459169Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_float32 SKIPPED (_refs.meshgrid doesn't support nvfuser) [ 43%]
2023-01-11T23:13:47.1459522Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_int32 SKIPPED (_refs.meshgrid doesn't support nvfuser) [ 43%]
2023-01-11T23:13:47.1459744Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1460095Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%] 2023-01-11T23:13:47.1460306Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_bfloat16 PASSED [ 43%] 2023-01-11T23:13:47.1460545Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:13:47.1460754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:13:47.1460954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int16 PASSED [ 43%] 2023-01-11T23:13:47.1461155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int64 PASSED [ 43%] 2023-01-11T23:13:47.1461354Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:13:47.1461580Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1461813Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1462047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1462277Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1462640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_float32 SKIPPED (_refs.meshgrid doesn't support nvfuser) [ 43%] 2023-01-11T23:13:47.1462999Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%] 2023-01-11T23:13:47.1463244Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1463435Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:13:47.1463619Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int32 PASSED [ 43%] 2023-01-11T23:13:47.1463826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1464032Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1464239Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1464440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1464649Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1464977Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%] 2023-01-11T23:13:47.1465169Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_bfloat16 PASSED [ 43%] 2023-01-11T23:13:47.1465344Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_bool PASSED [ 43%] 2023-01-11T23:13:47.1465534Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_complex128 PASSED [ 43%] 2023-01-11T23:13:47.1465723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_complex32 PASSED [ 43%] 2023-01-11T23:13:47.1465913Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_complex64 PASSED [ 43%] 2023-01-11T23:13:47.1466097Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:13:47.1466314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:13:47.1466498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int32 PASSED [ 43%] 2023-01-11T23:13:47.1466678Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int64 PASSED [ 43%] 2023-01-11T23:13:47.1466857Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int8 PASSED [ 43%] 2023-01-11T23:13:47.1467030Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:13:47.1467236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1467568Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_float32 SKIPPED (_refs.movedim doesn't support nvfuser) [ 43%] 2023-01-11T23:13:47.1467776Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1468107Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%] 2023-01-11T23:13:47.1468312Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1468640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%] 2023-01-11T23:13:47.1468855Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1469084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex32 XFAIL [ 43%] 2023-01-11T23:13:47.1469271Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex64 PASSED [ 43%] 2023-01-11T23:13:47.1469443Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:13:47.1469623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:13:47.1469887Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int32 PASSED [ 43%] 
2023-01-11T23:13:47.1470067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int64 PASSED [ 43%]
2023-01-11T23:13:47.1470271Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1470475Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1470689Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1471032Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 43%]
2023-01-11T23:13:47.1471236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1471441Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1471637Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1471963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%]
2023-01-11T23:13:47.1472144Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_int32 PASSED [ 43%]
2023-01-11T23:13:47.1472374Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1472563Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float16 PASSED [ 43%]
2023-01-11T23:13:47.1472749Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float64 PASSED [ 43%]
2023-01-11T23:13:47.1472931Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int16 PASSED [ 43%]
2023-01-11T23:13:47.1473113Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int8 PASSED [ 43%]
2023-01-11T23:13:47.1473329Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1473666Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%]
2023-01-11T23:13:47.1473987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_int32 SKIPPED (_refs.nan_to_num doesn't support nvfuser) [ 44%]
2023-01-11T23:13:47.1474198Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1474525Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 44%]
2023-01-11T23:13:47.1474715Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:13:47.1474945Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_bool PASSED [ 44%]
2023-01-11T23:13:47.1475139Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_complex32 PASSED [ 44%]
2023-01-11T23:13:47.1475336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1475521Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:13:47.1475706Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:13:47.1475890Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int32 PASSED [ 44%]
2023-01-11T23:13:47.1476067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int64 PASSED [ 44%]
2023-01-11T23:13:47.1476276Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1476494Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1476853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 44%]
2023-01-11T23:13:47.1477075Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1477287Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1477473Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:13:47.1477660Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex128 PASSED [ 44%]
2023-01-11T23:13:47.1477847Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex32 PASSED [ 44%]
2023-01-11T23:13:47.1478058Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int64 PASSED [ 44%]
2023-01-11T23:13:47.1478260Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1478471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1478677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1479016Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_int32 SKIPPED (_refs.narrow doesn't support nvfuser) [ 44%]
2023-01-11T23:13:47.1479259Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1479504Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1479748Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1479933Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_complex64 PASSED [ 44%]
2023-01-11T23:13:47.1480112Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1480281Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:13:47.1480459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:13:47.1480682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int16 PASSED [ 44%]
2023-01-11T23:13:47.1480852Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int32 PASSED [ 44%]
2023-01-11T23:13:47.1481030Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int8 PASSED [ 44%]
2023-01-11T23:13:47.1481237Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1481443Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1481775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%]
2023-01-11T23:13:47.1482096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 44%]
2023-01-11T23:13:47.1482279Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:13:47.1482458Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex128 PASSED [ 44%]
2023-01-11T23:13:47.1482643Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex32 PASSED [ 44%]
2023-01-11T23:13:47.1482818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int8 PASSED [ 44%]
2023-01-11T23:13:47.1482997Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_uint8 PASSED [ 44%]
2023-01-11T23:13:47.1483202Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1483544Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 44%]
2023-01-11T23:13:47.1483873Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%]
2023-01-11T23:13:47.1484232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 44%]
2023-01-11T23:13:47.1484550Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_bfloat16 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1484868Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex128 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1485176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex32 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1485485Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex64 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1485801Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_float32 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1486109Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_float64 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1486419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int16 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1486730Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int8 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1487042Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_uint8 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1487354Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_bool SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1487704Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_complex128 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1488033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_complex32 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1488351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_float16 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1488657Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_int32 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1488961Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_int64 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1489200Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_bool SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1489450Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_complex128 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1489691Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_complex32 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1489924Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_complex64 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1490157Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float16 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1490387Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float32 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1490617Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float64 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1490883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int16 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1491124Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int32 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1491361Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int64 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1491592Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int8 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1491842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_complex32 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1492090Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_float16 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1492330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_int32 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1492567Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_int8 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1492802Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_uint8 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1493012Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_bool PASSED [ 44%]
2023-01-11T23:13:47.1493201Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_complex128 PASSED [ 44%]
2023-01-11T23:13:47.1493392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_complex32 PASSED [ 44%]
2023-01-11T23:13:47.1493577Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:13:47.1493757Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int64 PASSED [ 44%]
2023-01-11T23:13:47.1493928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int8 PASSED [ 44%]
2023-01-11T23:13:47.1494141Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1494777Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 44%]
2023-01-11T23:13:47.1495037Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1495380Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_float32 SKIPPED (_refs.new_full doesn't support nvfuser) [ 44%]
2023-01-11T23:13:47.1495709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_int32 SKIPPED (_refs.new_full doesn't support nvfuser) [ 44%]
2023-01-11T23:13:47.1495925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1496256Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 44%]
2023-01-11T23:13:47.1496451Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex32 PASSED [ 44%]
2023-01-11T23:13:47.1496639Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex64 PASSED [ 44%]
2023-01-11T23:13:47.1496950Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1497139Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:13:47.1497316Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:13:47.1497497Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int32 PASSED [ 44%]
2023-01-11T23:13:47.1497676Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_uint8 PASSED [ 44%]
2023-01-11T23:13:47.1497883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1498230Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 44%]
2023-01-11T23:13:47.1498443Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1498654Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1498982Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_int32 SKIPPED (_refs.new_ones doesn't support nvfuser) [ 44%]
2023-01-11T23:13:47.1499208Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1499553Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 44%]
2023-01-11T23:13:47.1499777Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_bool PASSED [ 44%]
2023-01-11T23:13:47.1499967Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1500153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int16 PASSED [ 44%]
2023-01-11T23:13:47.1500337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_uint8 PASSED [ 44%]
2023-01-11T23:13:47.1500545Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1500760Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1501025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1501246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1501574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 44%]
2023-01-11T23:13:47.1501785Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1501971Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:13:47.1502180Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1502391Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1502640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_float16 SKIPPED (Expected: dropout is not comparable) [ 44%]
2023-01-11T23:13:47.1502937Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_float32 SKIPPED (Expected: dropout is not comparable) [ 44%]
2023-01-11T23:13:47.1503186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_float64 SKIPPED (Expected: dropout is not comparable) [ 44%]
2023-01-11T23:13:47.1503434Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_nvfuser_cuda_float32 SKIPPED (Expected: dropout is not comparable) [ 44%]
2023-01-11T23:13:47.1503679Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_nvfuser_cuda_float64 SKIPPED (Expected: dropout is not comparable) [ 44%]
2023-01-11T23:13:47.1503878Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:13:47.1504067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1504271Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_nvfuser_cuda_float32 PASSED [ 44%]
2023-01-11T23:13:47.1504469Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float32 XFAIL [ 44%]
2023-01-11T23:13:47.1504694Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1504886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:13:47.1505107Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1505330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:13:47.1505528Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1505719Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:13:47.1505919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_nvfuser_cuda_float32 PASSED [ 44%]
2023-01-11T23:13:47.1506131Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1506494Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.glu doesn't support nvfuser) [ 44%]
2023-01-11T23:13:47.1506699Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1506901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:13:47.1507124Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1507349Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1507576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1507794Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1508001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:13:47.1508227Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:13:47.1508424Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int64 PASSED [ 44%]
2023-01-11T23:13:47.1508799Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_int32 SKIPPED (_refs.nn.functional.hardtanh doesn't support nvfuser) [ 44%]
2023-01-11T23:13:47.1509023Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1509238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:13:47.1509455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1509669Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:13:47.1510184Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.hinge_embedding_loss doesn't support nvfuser) [ 44%]
2023-01-11T23:13:47.1510393Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:13:47.1510594Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1510791Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1511021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1511217Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1511428Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1511625Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1511823Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1512046Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1512254Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_nvfuser_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1512457Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1512656Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1512882Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1513095Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:13:47.1513309Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1513517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:13:47.1513751Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:13:47.1513962Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:13:47.1514172Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:13:47.1514406Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1514653Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1514892Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1515124Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1515357Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1515727Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:13:47.1515941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:13:47.1516151Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1516396Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1516606Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1516813Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:13:47.1517016Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:13:47.1517250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1517482Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1517901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.margin_ranking_loss doesn't support nvfuser) [ 45%]
2023-01-11T23:13:47.1518132Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1518332Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:13:47.1518529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1518751Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1518943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_nvfuser_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1519145Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1519376Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_bfloat16 XFAIL [ 45%]
2023-01-11T23:13:47.1519571Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_float32 XFAIL [ 45%]
2023-01-11T23:13:47.1519763Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_float64 XFAIL [ 45%]
2023-01-11T23:13:47.1519972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:13:47.1520179Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1520384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1520587Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1520791Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:13:47.1520984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:13:47.1521216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1521452Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1521693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1521927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1522297Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:13:47.1522506Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_int32 PASSED [ 45%]
2023-01-11T23:13:47.1522878Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.pdist doesn't support nvfuser) [ 45%]
2023-01-11T23:13:47.1523101Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1523307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1523514Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:13:47.1523712Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_uint8 PASSED [ 45%]
2023-01-11T23:13:47.1523941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1524168Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1524532Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:13:47.1524745Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_int32 PASSED [ 45%]
2023-01-11T23:13:47.1524968Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:13:47.1525169Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1525365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1525585Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1525780Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:13:47.1525970Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1526166Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1526357Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:13:47.1526549Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:13:47.1526768Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1526960Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1527157Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1527382Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:13:47.1527574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:13:47.1527796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1528006Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1528228Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1528589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_int32 SKIPPED (_refs.nn.functional.relu doesn't support nvfuser) [ 45%]
2023-01-11T23:13:47.1528937Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:13:47.1529136Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:13:47.1529337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1529532Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1529723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1529941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1530153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_bool PASSED [ 45%]
2023-01-11T23:13:47.1530391Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_complex128 PASSED [ 45%]
2023-01-11T23:13:47.1530593Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1530796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1531027Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1531263Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1531508Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1531723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1532098Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:13:47.1532328Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1532692Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:13:47.1532902Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1533136Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:13:47.1533336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:13:47.1533568Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1533801Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1534012Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1534380Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:13:47.1534900Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1535123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1535352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1535559Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_nvfuser_cuda_float32 PASSED [ 45%] 2023-01-11T23:13:47.1535782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1535987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float64 PASSED [ 45%] 2023-01-11T23:13:47.1536317Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1536717Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.softshrink doesn't support nvfuser) [ 45%] 2023-01-11T23:13:47.1536941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1537143Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_bfloat16 PASSED [ 45%] 2023-01-11T23:13:47.1537345Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_float16 PASSED [ 45%] 2023-01-11T23:13:47.1537550Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_float64 PASSED [ 45%] 2023-01-11T23:13:47.1537749Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int16 PASSED [ 45%] 2023-01-11T23:13:47.1537946Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int64 PASSED [ 45%] 2023-01-11T23:13:47.1538142Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int8 PASSED [ 45%] 2023-01-11T23:13:47.1538335Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_uint8 PASSED [ 45%] 2023-01-11T23:13:47.1538562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1538782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1539051Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1539263Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_float32 PASSED [ 45%] 2023-01-11T23:13:47.1539485Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1539705Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1539904Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_bfloat16 PASSED [ 45%] 
2023-01-11T23:13:47.1540099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int16 PASSED [ 45%] 2023-01-11T23:13:47.1540299Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int32 PASSED [ 45%] 2023-01-11T23:13:47.1540503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int8 PASSED [ 45%] 2023-01-11T23:13:47.1540701Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_uint8 PASSED [ 45%] 2023-01-11T23:13:47.1541081Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.threshold doesn't support nvfuser) [ 45%] 2023-01-11T23:13:47.1541463Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_int32 SKIPPED (_refs.nn.functional.threshold doesn't support nvfuser) [ 45%] 2023-01-11T23:13:47.1541812Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%] 2023-01-11T23:13:47.1542067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1542280Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_bfloat16 PASSED [ 45%] 2023-01-11T23:13:47.1542492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_complex64 PASSED [ 45%] 2023-01-11T23:13:47.1542702Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float16 PASSED [ 45%] 2023-01-11T23:13:47.1542908Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int16 PASSED [ 45%] 2023-01-11T23:13:47.1543109Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int32 PASSED [ 45%] 2023-01-11T23:13:47.1543314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_uint8 PASSED [ 45%] 2023-01-11T23:13:47.1543738Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.triplet_margin_loss doesn't support nvfuser) [ 45%] 2023-01-11T23:13:47.1543973Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1544339Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%] 2023-01-11T23:13:47.1544746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_int32 SKIPPED (_refs.nn.functional.triplet_margin_loss doesn't support nvfuser) [ 45%] 2023-01-11T23:13:47.1545140Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%] 2023-01-11T23:13:47.1545372Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1545558Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_complex128 PASSED [ 45%] 2023-01-11T23:13:47.1545739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float16 PASSED [ 45%] 2023-01-11T23:13:47.1545919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float32 PASSED [ 46%] 2023-01-11T23:13:47.1546125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1546308Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_bfloat16 PASSED [ 46%] 2023-01-11T23:13:47.1546491Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_bool PASSED [ 46%] 2023-01-11T23:13:47.1546670Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_complex128 PASSED [ 46%] 2023-01-11T23:13:47.1546853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_complex32 PASSED [ 46%] 2023-01-11T23:13:47.1547029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float64 PASSED [ 46%] 2023-01-11T23:13:47.1547208Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int32 PASSED [ 46%] 2023-01-11T23:13:47.1547386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_uint8 PASSED [ 46%] 2023-01-11T23:13:47.1547595Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1547796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1548026Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1548341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_int32 SKIPPED (_refs.ones doesn't support nvfuser) [ 46%] 2023-01-11T23:13:47.1548545Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1548740Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1548933Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_bfloat16 PASSED [ 46%] 2023-01-11T23:13:47.1549149Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_complex128 PASSED [ 46%] 2023-01-11T23:13:47.1549364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_complex32 PASSED [ 46%] 2023-01-11T23:13:47.1549552Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_complex64 PASSED [ 46%] 2023-01-11T23:13:47.1549818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float16 PASSED [ 46%] 2023-01-11T23:13:47.1550001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int32 PASSED [ 46%] 2023-01-11T23:13:47.1550185Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int64 PASSED [ 46%] 2023-01-11T23:13:47.1550360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_uint8 PASSED [ 46%] 2023-01-11T23:13:47.1550596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1550804Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1551017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1551346Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%] 2023-01-11T23:13:47.1551555Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1551741Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_bfloat16 PASSED [ 46%] 2023-01-11T23:13:47.1551928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex128 PASSED [ 46%] 2023-01-11T23:13:47.1552118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex32 PASSED [ 46%] 2023-01-11T23:13:47.1552310Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float32 PASSED [ 46%] 2023-01-11T23:13:47.1552492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float64 PASSED [ 46%] 2023-01-11T23:13:47.1552667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int64 PASSED [ 46%] 2023-01-11T23:13:47.1552849Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_uint8 PASSED [ 46%] 2023-01-11T23:13:47.1553060Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1553276Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1553490Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1553728Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1553933Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1554265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_int32 SKIPPED (_refs.positive doesn't support nvfuser) [ 46%] 2023-01-11T23:13:47.1554447Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex128 PASSED [ 46%] 2023-01-11T23:13:47.1554633Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex32 XFAIL [ 46%] 2023-01-11T23:13:47.1554812Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_float32 PASSED [ 46%] 2023-01-11T23:13:47.1554992Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_float64 PASSED [ 46%] 2023-01-11T23:13:47.1555174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int64 PASSED [ 46%] 2023-01-11T23:13:47.1555350Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int8 PASSED [ 46%] 2023-01-11T23:13:47.1555554Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1555763Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1555971Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1556332Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%] 2023-01-11T23:13:47.1556538Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1556859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%] 2023-01-11T23:13:47.1557036Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_bfloat16 PASSED [ 46%] 2023-01-11T23:13:47.1557217Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_bool PASSED [ 46%] 2023-01-11T23:13:47.1557403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_complex128 PASSED [ 46%] 2023-01-11T23:13:47.1557585Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_complex32 PASSED [ 46%] 2023-01-11T23:13:47.1557769Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_complex64 PASSED [ 46%] 2023-01-11T23:13:47.1557948Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float16 PASSED [ 46%] 2023-01-11T23:13:47.1558129Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int64 PASSED [ 46%] 2023-01-11T23:13:47.1558307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_uint8 PASSED [ 46%] 2023-01-11T23:13:47.1558517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1558713Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1559055Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 46%] 2023-01-11T23:13:47.1559263Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1559623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%] 2023-01-11T23:13:47.1559830Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1560032Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1560353Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_float32 SKIPPED (make_traced() doesn't set seed properly!) [ 46%] 2023-01-11T23:13:47.1560674Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_float64 SKIPPED (make_traced() doesn't set seed properly!) [ 46%] 2023-01-11T23:13:47.1561007Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_complex32 SKIPPED (make_traced() doesn't set seed properly!) [ 46%] 2023-01-11T23:13:47.1561337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_complex64 SKIPPED (make_traced() doesn't set seed properly!) [ 46%] 2023-01-11T23:13:47.1561657Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_float32 SKIPPED (make_traced() doesn't set seed properly!) [ 46%] 2023-01-11T23:13:47.1561972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_float64 SKIPPED (make_traced() doesn't set seed properly!) [ 46%] 2023-01-11T23:13:47.1562155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_bool PASSED [ 46%] 2023-01-11T23:13:47.1562341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_complex32 PASSED [ 46%] 2023-01-11T23:13:47.1562553Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_complex64 PASSED [ 46%] 2023-01-11T23:13:47.1562733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int64 PASSED [ 46%] 2023-01-11T23:13:47.1562917Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_uint8 PASSED [ 46%] 2023-01-11T23:13:47.1563127Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1563335Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1563677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 46%] 2023-01-11T23:13:47.1563886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1564201Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_float32 SKIPPED (_refs.ravel doesn't support nvfuser) [ 46%] 2023-01-11T23:13:47.1564529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%] 2023-01-11T23:13:47.1564734Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1564917Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_bfloat16 PASSED [ 46%] 2023-01-11T23:13:47.1565096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_bool PASSED [ 46%] 2023-01-11T23:13:47.1565279Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex32 PASSED [ 46%] 2023-01-11T23:13:47.1565461Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float64 PASSED [ 46%] 2023-01-11T23:13:47.1565639Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int16 PASSED [ 46%] 2023-01-11T23:13:47.1565842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int32 PASSED [ 46%] 2023-01-11T23:13:47.1566016Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int8 PASSED [ 46%] 2023-01-11T23:13:47.1566195Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_uint8 PASSED [ 46%] 2023-01-11T23:13:47.1566401Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1566602Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1566811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1567019Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1567226Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1567550Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%] 2023-01-11T23:13:47.1567865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_int32 SKIPPED (_refs.real doesn't support nvfuser) [ 46%] 2023-01-11T23:13:47.1568068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1568250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_bfloat16 PASSED [ 46%] 2023-01-11T23:13:47.1568468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_bool PASSED [ 46%] 2023-01-11T23:13:47.1568662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_complex128 PASSED [ 46%] 2023-01-11T23:13:47.1568851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_complex64 PASSED [ 46%] 2023-01-11T23:13:47.1569039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float32 PASSED [ 46%] 2023-01-11T23:13:47.1569226Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float64 PASSED [ 46%] 2023-01-11T23:13:47.1569411Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int64 PASSED [ 46%] 2023-01-11T23:13:47.1569628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1569846Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1570061Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1570241Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_int32 PASSED [ 46%] 2023-01-11T23:13:47.1570450Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1570786Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%] 2023-01-11T23:13:47.1570996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1571186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float16 PASSED [ 46%] 2023-01-11T23:13:47.1571369Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float32 PASSED [ 46%] 2023-01-11T23:13:47.1571577Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int16 PASSED [ 46%] 2023-01-11T23:13:47.1571764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int64 PASSED [ 46%] 2023-01-11T23:13:47.1571979Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1572168Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_float32 PASSED [ 46%] 2023-01-11T23:13:47.1572494Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%] 2023-01-11T23:13:47.1572708Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1573039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%] 2023-01-11T23:13:47.1573222Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_bfloat16 PASSED [ 46%] 2023-01-11T23:13:47.1573408Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_complex128 PASSED [ 46%] 2023-01-11T23:13:47.1573593Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_complex64 PASSED [ 46%] 2023-01-11T23:13:47.1573772Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int16 PASSED [ 46%] 2023-01-11T23:13:47.1573954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int32 PASSED [ 46%] 2023-01-11T23:13:47.1574162Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int8 PASSED [ 46%] 2023-01-11T23:13:47.1574335Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_uint8 PASSED [ 46%] 2023-01-11T23:13:47.1574722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1574993Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1575341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%] 2023-01-11T23:13:47.1575548Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1575873Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%] 2023-01-11T23:13:47.1576060Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_bool PASSED [ 46%] 2023-01-11T23:13:47.1576255Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex32 PASSED [ 46%] 2023-01-11T23:13:47.1576442Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex64 PASSED [ 46%] 2023-01-11T23:13:47.1576628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float32 PASSED [ 46%] 2023-01-11T23:13:47.1576805Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int64 PASSED [ 46%] 2023-01-11T23:13:47.1576988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int8 PASSED [ 46%] 2023-01-11T23:13:47.1577199Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1577410Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1577830Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 46%] 2023-01-11T23:13:47.1578054Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1578268Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1578607Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_float32 SKIPPED (_refs.reshape_as doesn't support nvfuser) [ 46%] 2023-01-11T23:13:47.1578815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1579169Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%] 2023-01-11T23:13:47.1579418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1579621Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1579809Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:13:47.1580001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_complex32 PASSED [ 47%] 2023-01-11T23:13:47.1580190Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_complex64 PASSED [ 47%] 2023-01-11T23:13:47.1580410Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:13:47.1580597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1580778Z
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:13:47.1580959Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:13:47.1581163Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1581361Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1581572Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1581786Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1581995Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1582326Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_float32 SKIPPED (_refs.reshape doesn't support nvfuser) [ 47%] 2023-01-11T23:13:47.1582536Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1582862Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:13:47.1583194Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:13:47.1583403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1583625Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:13:47.1583806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_bool PASSED [ 47%] 2023-01-11T23:13:47.1583981Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1584164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:13:47.1584341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:13:47.1584517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:13:47.1584728Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1585074Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 47%] 2023-01-11T23:13:47.1585281Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1585598Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_float32 SKIPPED (_refs.roll doesn't support nvfuser) [ 47%] 2023-01-11T23:13:47.1585801Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1586006Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1586365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:13:47.1586573Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1586759Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_complex128 PASSED [ 47%] 2023-01-11T23:13:47.1586943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_complex64 PASSED [ 47%] 2023-01-11T23:13:47.1587125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1587304Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:13:47.1587479Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:13:47.1587657Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:13:47.1587864Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1588059Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1588272Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1588475Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1588672Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1589001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:13:47.1589207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1589436Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1589622Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:13:47.1589872Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:13:47.1590053Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1590226Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:13:47.1590431Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1590633Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1590966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:13:47.1591288Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_int32 SKIPPED (_refs.round doesn't support nvfuser) [ 47%] 2023-01-11T23:13:47.1591492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1591674Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:13:47.1591856Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:13:47.1592066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1592244Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:13:47.1592445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1592658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1592841Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_float32 PASSED [ 47%] 2023-01-11T23:13:47.1593042Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1593224Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1593427Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1593632Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1593813Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:13:47.1593992Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:13:47.1594163Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:13:47.1594339Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:13:47.1594517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:13:47.1594724Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1594931Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1595159Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1595345Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_float32 PASSED [ 47%] 
2023-01-11T23:13:47.1595551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1595733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1596057Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:13:47.1596238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex64 PASSED [ 47%] 2023-01-11T23:13:47.1596416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:13:47.1596597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1596774Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:13:47.1596952Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:13:47.1597157Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1597367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1597616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1597798Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_float32 PASSED [ 47%] 2023-01-11T23:13:47.1597997Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1598324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:13:47.1598504Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1598708Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1598894Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:13:47.1599085Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_complex32 PASSED [ 47%] 2023-01-11T23:13:47.1599271Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:13:47.1599456Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:13:47.1599637Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1599815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:13:47.1599988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:13:47.1600199Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1600409Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1600596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_float32 PASSED [ 47%] 2023-01-11T23:13:47.1600958Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:13:47.1601151Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1601360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1601563Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1601742Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:13:47.1601927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1602102Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1602286Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:13:47.1602467Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:13:47.1602675Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1602879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1603062Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1603294Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1603498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1603682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_bool PASSED [ 47%] 2023-01-11T23:13:47.1603861Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:13:47.1604048Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:13:47.1604227Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1604410Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1604589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:13:47.1604803Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1605015Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1605345Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_int32 SKIPPED (_refs.signbit doesn't support nvfuser) [ 47%] 2023-01-11T23:13:47.1605525Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_bool PASSED [ 47%] 2023-01-11T23:13:47.1605710Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_complex128 PASSED [ 47%] 2023-01-11T23:13:47.1605884Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_complex32 PASSED [ 47%] 2023-01-11T23:13:47.1606067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:13:47.1606246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:13:47.1606455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1606637Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:13:47.1606842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1607052Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1607260Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1607461Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1607641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:13:47.1607831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_complex128 PASSED [ 47%] 2023-01-11T23:13:47.1608010Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1608188Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:13:47.1608367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1608582Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1608764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1608996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:13:47.1609207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex32 PASSED [ 47%] 2023-01-11T23:13:47.1609382Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex64 PASSED [ 47%] 2023-01-11T23:13:47.1609565Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:13:47.1609744Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1609921Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:13:47.1610095Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:13:47.1610275Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:13:47.1610482Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1610693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1610901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1611104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1611301Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1611629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1611833Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_complex128 PASSED [ 48%] 2023-01-11T23:13:47.1612062Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_complex64 PASSED [ 48%] 2023-01-11T23:13:47.1612262Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:13:47.1612454Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1612645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:13:47.1612866Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1613092Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1613442Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:13:47.1613635Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1613856Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1614072Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1614265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1614456Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1614873Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:13:47.1615095Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1615314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1615683Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.bessel_j0 doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1615903Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1616113Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1616310Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:13:47.1616510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:13:47.1616698Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:13:47.1616882Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1617064Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1617247Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:13:47.1617611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.bessel_j1 doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1618021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.bessel_j1 doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1618241Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1618585Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1618794Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1618986Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_bfloat16 PASSED [ 48%] 2023-01-11T23:13:47.1619177Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:13:47.1619366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:13:47.1619557Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:13:47.1619746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1619934Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:13:47.1620152Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1620367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1620754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.entr doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1620963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1621314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.entr doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1621650Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1621865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1622058Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1622250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1622439Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:13:47.1622630Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:13:47.1622984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.erfcx doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1623199Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1623529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1623724Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_bfloat16 PASSED [ 48%] 2023-01-11T23:13:47.1623913Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:13:47.1624127Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:13:47.1624317Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:13:47.1624503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1624688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:13:47.1624869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:13:47.1625086Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1625301Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1625509Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1625856Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.i0e doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1626069Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1626253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1626438Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1626647Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:13:47.1626867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1627206Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.i1 doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1627420Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1627751Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:13:47.1627954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1628290Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1628503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1628689Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:13:47.1628901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:13:47.1629118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1629303Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1629488Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:13:47.1629910Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.i1e doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1630153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1630360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1630700Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1630913Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1631108Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:13:47.1631311Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:13:47.1631507Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:13:47.1631727Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1632096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.log_ndtr doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1632320Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1632668Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:13:47.1633043Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1633261Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1633470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:13:47.1633685Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex32 PASSED [ 48%] 2023-01-11T23:13:47.1633892Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:13:47.1634096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:13:47.1634303Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1634515Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1634724Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:13:47.1634954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1635194Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1635581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 48%] 
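The skip reasons in this stretch of the log fall into two groups: blanket "skipped for speed" skips and dtype gates such as "nvfuser doesn't support dtype torch.complex32". Below is a hypothetical Python sketch of how a dtype gate like that can be expressed with pytest parametrization; the UNSUPPORTED set and the toy test body are illustrative assumptions, not PyTorch's actual test harness.

```python
# Hypothetical sketch (not PyTorch's real machinery): a dtype-gated skip in the
# style of the "nvfuser doesn't support dtype torch.int8" messages above.
import pytest
import torch

# Illustrative assumption: the dtypes the log shows being skipped for nvfuser.
UNSUPPORTED_NVFUSER_DTYPES = {torch.int8, torch.int16, torch.complex32}

@pytest.mark.parametrize("dtype", [torch.int8, torch.int32, torch.float32])
def test_nvfuser_dtype_gate(dtype):
    # Emit a skip message matching the wording seen in the log.
    if dtype in UNSUPPORTED_NVFUSER_DTYPES:
        pytest.skip(f"nvfuser doesn't support dtype {dtype}")
    t = torch.ones(4, dtype=dtype)  # stand-in for the real op under test
    assert t.sum().item() == 4
```

Run under pytest, the int8 variant reports SKIPPED with the same reason string as the records above, while the int32 and float32 variants run to PASSED.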
2023-01-11T23:13:47.1635809Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1636206Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:13:47.1636421Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1636651Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1636879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1637074Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_bfloat16 PASSED [ 48%] 2023-01-11T23:13:47.1637270Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:13:47.1637464Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1637654Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:13:47.1637843Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:13:47.1638051Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1638270Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1638521Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1638738Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1638978Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:13:47.1639214Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:13:47.1639424Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:13:47.1639635Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:13:47.1639840Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1640050Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:13:47.1640455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.multigammaln doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1640820Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1641058Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1641268Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:13:47.1641481Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:13:47.1641715Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:13:47.1641923Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1642126Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1642361Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1642569Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:13:47.1642780Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:13:47.1642988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1643218Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1643449Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1643640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_bfloat16 PASSED [ 48%] 2023-01-11T23:13:47.1643829Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:13:47.1644019Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1644233Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1644425Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:13:47.1644640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1644857Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1645211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.ndtr doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1645421Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1645761Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1645976Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1646165Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:13:47.1646355Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1646542Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int8 PASSED [ 49%] 2023-01-11T23:13:47.1646734Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_uint8 PASSED [ 49%] 2023-01-11T23:13:47.1646952Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1647296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%] 2023-01-11T23:13:47.1647544Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1647879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1648088Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_bfloat16 PASSED [ 49%] 2023-01-11T23:13:47.1648293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_bool PASSED [ 49%] 2023-01-11T23:13:47.1648498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_float16 PASSED [ 49%] 2023-01-11T23:13:47.1648704Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1648909Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1649111Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1649309Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int8 PASSED [ 49%] 2023-01-11T23:13:47.1649506Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_uint8 PASSED [ 49%] 2023-01-11T23:13:47.1649733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1649981Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1650219Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:13:47.1650580Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%] 2023-01-11T23:13:47.1650804Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1651164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1651390Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1651599Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1651807Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1652011Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int64 PASSED [ 49%] 2023-01-11T23:13:47.1652409Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.spherical_bessel_j0 doesn't support nvfuser) [ 49%] 2023-01-11T23:13:47.1652640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1653001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1653253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1653447Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_bool PASSED [ 49%] 2023-01-11T23:13:47.1653640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int16 PASSED [ 49%] 2023-01-11T23:13:47.1653828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1654049Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1654267Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1654688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1655040Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%] 2023-01-11T23:13:47.1655380Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1655573Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_bool PASSED [ 49%] 2023-01-11T23:13:47.1655821Z
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_float64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 49%] 2023-01-11T23:13:47.1656060Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int16 PASSED [ 49%] 2023-01-11T23:13:47.1656253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1656444Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int64 PASSED [ 49%] 2023-01-11T23:13:47.1656634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_uint8 PASSED [ 49%] 2023-01-11T23:13:47.1656850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1657198Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.zeta doesn't support nvfuser) [ 49%] 2023-01-11T23:13:47.1657538Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1657731Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_bfloat16 PASSED [ 49%] 2023-01-11T23:13:47.1657919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_bool PASSED [ 49%] 2023-01-11T23:13:47.1658102Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:13:47.1658288Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_float16 PASSED [ 49%] 2023-01-11T23:13:47.1658470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1658652Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1658834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int64 PASSED [ 49%] 2023-01-11T23:13:47.1659021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int8 PASSED [ 49%] 2023-01-11T23:13:47.1659201Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_uint8 PASSED [ 49%] 2023-01-11T23:13:47.1659448Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1659656Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1659869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1660068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1660254Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1660582Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1660770Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_bfloat16 PASSED [ 49%] 2023-01-11T23:13:47.1660958Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:13:47.1661139Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int64 PASSED [ 49%] 2023-01-11T23:13:47.1661354Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1661557Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_complex64 SKIPPED (Skipped!) [ 49%] 2023-01-11T23:13:47.1661765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1661988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1662319Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1662507Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_bool PASSED [ 49%] 2023-01-11T23:13:47.1662697Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:13:47.1662886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex64 PASSED [ 49%] 2023-01-11T23:13:47.1663072Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1663257Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int16 PASSED [ 49%] 2023-01-11T23:13:47.1663440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1663619Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int64 PASSED [ 49%] 2023-01-11T23:13:47.1663827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1664035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1664385Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 49%] 2023-01-11T23:13:47.1664598Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1664810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1665001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1665211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1665541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1665732Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_complex32 PASSED [ 49%] 2023-01-11T23:13:47.1665918Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1666099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int16 PASSED [ 49%] 2023-01-11T23:13:47.1666277Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int64 PASSED [ 49%] 2023-01-11T23:13:47.1666458Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_uint8 PASSED [ 49%] 2023-01-11T23:13:47.1666669Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1666874Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1667085Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1667292Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1667620Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1667852Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1668044Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:13:47.1668230Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_complex64 PASSED [ 49%] 2023-01-11T23:13:47.1668403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_float16 PASSED [ 49%] 2023-01-11T23:13:47.1668583Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1668761Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1668968Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1669181Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1669390Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1669596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1669847Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1670054Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1670237Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_bfloat16 PASSED [ 49%] 2023-01-11T23:13:47.1670428Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_complex64 PASSED [ 49%] 2023-01-11T23:13:47.1670618Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_float16 PASSED [ 49%] 2023-01-11T23:13:47.1670832Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1671015Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1671229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1671446Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1671662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1671850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1672064Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1672243Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_bfloat16 PASSED [ 49%] 2023-01-11T23:13:47.1672427Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_complex64 PASSED [ 49%] 2023-01-11T23:13:47.1672608Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1672819Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1673027Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1673211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1673424Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1673609Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_bfloat16 PASSED [ 49%] 2023-01-11T23:13:47.1673791Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:13:47.1673963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1674145Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int16 PASSED [ 49%] 2023-01-11T23:13:47.1674324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1674504Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int8 PASSED [ 49%] 2023-01-11T23:13:47.1674686Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_uint8 PASSED [ 49%] 2023-01-11T23:13:47.1674895Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1675083Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1675414Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%]
2023-01-11T23:13:47.1675596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1675784Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float16 PASSED [ 49%] 2023-01-11T23:13:47.1675966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1676159Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1676346Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1676574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int64 PASSED [ 49%] 2023-01-11T23:13:47.1676790Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1677007Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1677219Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1677556Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%] 2023-01-11T23:13:47.1677769Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1677954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:13:47.1678127Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float16 PASSED [ 49%] 2023-01-11T23:13:47.1678306Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1678487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int16 PASSED [ 49%] 2023-01-11T23:13:47.1678658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1678828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int64 PASSED [ 49%] 2023-01-11T23:13:47.1679043Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int8 PASSED [ 49%] 2023-01-11T23:13:47.1679248Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1679453Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1679662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1679979Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%] 2023-01-11T23:13:47.1680162Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1680364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:13:47.1680686Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%] 2023-01-11T23:13:47.1680872Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_bfloat16 PASSED [ 50%] 2023-01-11T23:13:47.1681050Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_bool PASSED [ 50%] 2023-01-11T23:13:47.1681232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_float64 PASSED [ 50%] 2023-01-11T23:13:47.1681410Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1681588Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int8 PASSED [ 50%] 2023-01-11T23:13:47.1681758Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_uint8 PASSED [ 50%] 2023-01-11T23:13:47.1681965Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1682205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1682547Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 50%] 2023-01-11T23:13:47.1682733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1683059Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%] 2023-01-11T23:13:47.1683240Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1683564Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%] 2023-01-11T23:13:47.1683772Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1683955Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_bool PASSED [ 50%] 2023-01-11T23:13:47.1684132Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_complex32 PASSED [ 50%] 2023-01-11T23:13:47.1684316Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_complex64 PASSED [ 50%] 2023-01-11T23:13:47.1684498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1684681Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1684860Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int8 PASSED [ 50%] 2023-01-11T23:13:47.1685069Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_uint8 PASSED [ 50%] 2023-01-11T23:13:47.1685279Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1685492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%]
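One skip reason seen earlier in this shard, "test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test" on the _refs_special_zeta aten float64 variant, names its own opt-in. A minimal sketch for re-running that single test locally follows, assuming a PyTorch source checkout with test/test_ops.py present and a CUDA device; the node ID is copied verbatim from the log.

```python
# Minimal sketch: re-run one slow-skipped test from this log, assuming a local
# PyTorch checkout (run from its test/ directory) and an available CUDA GPU.
import os
import pytest

# Opt in to tests the harness skips as slow; set before pytest imports the
# test module so the flag is seen at collection time.
os.environ["PYTORCH_TEST_WITH_SLOW"] = "1"

pytest.main([
    "test_ops.py::TestCommonCUDA::"
    "test_python_ref_executor__refs_special_zeta_executor_aten_cuda_float64",
    "-v",  # verbose per-test lines, matching the format of this log
])
```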
2023-01-11T23:13:47.1685695Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1685901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1686089Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_bfloat16 PASSED [ 50%] 2023-01-11T23:13:47.1686283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_complex128 PASSED [ 50%] 2023-01-11T23:13:47.1686478Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_complex64 PASSED [ 50%] 2023-01-11T23:13:47.1686668Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float16 PASSED [ 50%] 2023-01-11T23:13:47.1686858Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1687044Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1687233Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_uint8 PASSED [ 50%] 2023-01-11T23:13:47.1687446Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1687666Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1687876Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1688115Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1688455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%] 2023-01-11T23:13:47.1688669Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1688851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_bfloat16 PASSED [ 50%] 2023-01-11T23:13:47.1689034Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_complex64 PASSED [ 50%] 2023-01-11T23:13:47.1689216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_float64 PASSED [ 50%] 2023-01-11T23:13:47.1689398Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int16 PASSED [ 50%] 2023-01-11T23:13:47.1689576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int8 PASSED [ 50%] 2023-01-11T23:13:47.1689783Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1689980Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1690186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1690499Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_float32 SKIPPED (_refs.to doesn't support nvfuser) [ 50%] 2023-01-11T23:13:47.1690835Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_int32 SKIPPED (_refs.to doesn't support nvfuser) [ 50%] 2023-01-11T23:13:47.1691040Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1691360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%] 2023-01-11T23:13:47.1691562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1691750Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_complex128 PASSED [ 50%] 2023-01-11T23:13:47.1691936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_complex32 PASSED [ 50%] 2023-01-11T23:13:47.1692124Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_complex64 PASSED [ 50%] 2023-01-11T23:13:47.1692302Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float16 PASSED [ 50%] 2023-01-11T23:13:47.1692489Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1692668Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float64 PASSED [ 50%] 2023-01-11T23:13:47.1692847Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int16 PASSED [ 50%] 2023-01-11T23:13:47.1693027Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1693239Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1693445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1693777Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_float32 SKIPPED (_refs.trace doesn't support nvfuser) [ 50%] 2023-01-11T23:13:47.1694134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%] 2023-01-11T23:13:47.1694341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1694727Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1694923Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_bfloat16 PASSED [ 50%] 2023-01-11T23:13:47.1695116Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex128 PASSED [ 50%] 2023-01-11T23:13:47.1695312Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex32 PASSED [ 50%] 2023-01-11T23:13:47.1695500Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float16 PASSED [ 50%] 2023-01-11T23:13:47.1695692Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1695876Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1696070Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1696253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int8 PASSED [ 50%] 2023-01-11T23:13:47.1696462Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1696882Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 50%] 2023-01-11T23:13:47.1697102Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1697304Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1697527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1697861Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%] 2023-01-11T23:13:47.1698056Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex128 PASSED [ 50%] 2023-01-11T23:13:47.1698249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex32 PASSED [ 50%] 2023-01-11T23:13:47.1698438Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float16 PASSED [ 50%] 2023-01-11T23:13:47.1698628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1698806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float64 PASSED [ 50%] 2023-01-11T23:13:47.1699008Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1699222Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1699434Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1699783Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 50%] 2023-01-11T23:13:47.1700003Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1700360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_float32 SKIPPED (_refs.tril doesn't support nvfuser) [ 50%] 2023-01-11T23:13:47.1700690Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%] 2023-01-11T23:13:47.1700882Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1701072Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1701406Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_nvfuser_cuda_int32 SKIPPED (_refs.tril_indices doesn't support nvfuser) [ 50%] 2023-01-11T23:13:47.1701628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1701818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex32 PASSED [ 50%] 2023-01-11T23:13:47.1702005Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex64 PASSED [ 50%] 2023-01-11T23:13:47.1702187Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_float16 PASSED [ 50%] 2023-01-11T23:13:47.1702371Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_float64 PASSED [ 50%] 2023-01-11T23:13:47.1702551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1702734Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_uint8 PASSED [ 50%] 2023-01-11T23:13:47.1702975Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1703322Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 50%] 2023-01-11T23:13:47.1703524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1703728Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1704046Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_float32 SKIPPED (_refs.triu doesn't support nvfuser) [ 50%] 2023-01-11T23:13:47.1704250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1704576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%] 2023-01-11T23:13:47.1704783Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1704971Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1705160Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1705499Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_nvfuser_cuda_int32 SKIPPED (_refs.triu_indices doesn't support nvfuser) [ 50%] 2023-01-11T23:13:47.1705688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_bool PASSED [ 50%] 2023-01-11T23:13:47.1705871Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex32 XFAIL [ 50%] 2023-01-11T23:13:47.1706068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex64 PASSED [ 50%] 2023-01-11T23:13:47.1706284Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1706474Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int16 PASSED [ 50%] 2023-01-11T23:13:47.1706664Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1706850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1707032Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_uint8 PASSED [ 50%] 2023-01-11T23:13:47.1707246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1707603Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 50%] 2023-01-11T23:13:47.1707824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1708033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1708228Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1708569Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%] 2023-01-11T23:13:47.1708759Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1709123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%] 2023-01-11T23:13:47.1709339Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1709523Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1709761Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int16 PASSED [ 50%] 2023-01-11T23:13:47.1709965Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1710147Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1710352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1710560Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1710768Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1710946Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_bool PASSED [ 50%] 2023-01-11T23:13:47.1711136Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex64 PASSED [ 50%] 2023-01-11T23:13:47.1711323Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_float64 PASSED [ 50%] 2023-01-11T23:13:47.1711507Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int16 PASSED [ 50%] 2023-01-11T23:13:47.1711688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1711871Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int8 PASSED [ 50%] 2023-01-11T23:13:47.1712046Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_uint8 PASSED [ 50%] 2023-01-11T23:13:47.1712289Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1712508Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1712715Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1713041Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_int32 SKIPPED (_refs.unbind doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1713369Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 51%] 2023-01-11T23:13:47.1713563Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_bfloat16 PASSED [ 51%] 2023-01-11T23:13:47.1713758Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_complex128 PASSED [ 51%] 2023-01-11T23:13:47.1713947Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_complex32 PASSED [ 51%] 2023-01-11T23:13:47.1714135Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_float16 PASSED [ 51%] 2023-01-11T23:13:47.1714314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1714502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int64 PASSED [ 51%] 2023-01-11T23:13:47.1714687Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int8 PASSED [ 51%] 2023-01-11T23:13:47.1714897Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_uint8 PASSED [ 51%] 2023-01-11T23:13:47.1715120Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1715473Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 51%] 2023-01-11T23:13:47.1715684Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1716020Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_float32 SKIPPED (_refs.unflatten doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1716354Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%] 2023-01-11T23:13:47.1716564Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1716891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 51%] 2023-01-11T23:13:47.1717103Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1717292Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_bfloat16 PASSED [ 51%] 2023-01-11T23:13:47.1717482Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_bool PASSED [ 51%] 2023-01-11T23:13:47.1717675Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex128 PASSED [ 51%] 2023-01-11T23:13:47.1717869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex32 PASSED [ 51%] 2023-01-11T23:13:47.1718057Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1718268Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int16 PASSED [ 51%] 2023-01-11T23:13:47.1718457Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int64 PASSED [ 51%] 2023-01-11T23:13:47.1718644Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int8 PASSED [ 51%] 2023-01-11T23:13:47.1718850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1719204Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 51%] 2023-01-11T23:13:47.1719426Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1719644Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1719989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_float32 SKIPPED (_refs.unfold_copy doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1720199Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1720534Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%] 2023-01-11T23:13:47.1720868Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_int32 SKIPPED (_refs.unfold_copy doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1721118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1721451Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 51%] 2023-01-11T23:13:47.1721636Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_bool PASSED [ 51%] 2023-01-11T23:13:47.1721817Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float16 PASSED [ 51%] 2023-01-11T23:13:47.1722002Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int16 PASSED [ 51%] 2023-01-11T23:13:47.1722183Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_uint8 PASSED [ 51%] 2023-01-11T23:13:47.1722391Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1722607Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1722956Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 51%] 2023-01-11T23:13:47.1723168Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1723491Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_float32 SKIPPED (_refs.unfold doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1723701Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1723906Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1724226Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 51%] 2023-01-11T23:13:47.1724445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_bfloat16 PASSED [ 51%] 2023-01-11T23:13:47.1724640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex128 PASSED [ 51%] 2023-01-11T23:13:47.1724830Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex32 PASSED [ 51%] 2023-01-11T23:13:47.1725020Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex64 PASSED [ 51%] 2023-01-11T23:13:47.1725206Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1725389Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int16 PASSED [ 51%] 2023-01-11T23:13:47.1725580Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int32 PASSED [ 51%] 2023-01-11T23:13:47.1725764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_uint8 PASSED [ 51%] 2023-01-11T23:13:47.1725968Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1726180Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1726390Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1726736Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%] 
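Annotation on the block above: every test_python_ref_executor ID is one test expanded over an (op, executor, dtype) grid, with the executor ("aten" or "nvfuser") and dtype suffixed onto the name, and unsupported combinations reported as SKIPPED with messages like "nvfuser doesn't support dtype torch.int16". A minimal sketch of how such IDs and conditional skips can be produced with plain pytest; the EXECUTORS and UNSUPPORTED tables are illustrative assumptions drawn from the skip messages in this log, not PyTorch's actual OpInfo machinery.

import pytest
import torch

EXECUTORS = ("aten", "nvfuser")                    # illustrative executor names, as seen in the log
DTYPES = (torch.float32, torch.int16, torch.int8)  # a few of the dtypes exercised above

# dtypes the illustrative "nvfuser" executor rejects, mirroring the
# "nvfuser doesn't support dtype torch.int16/int8" skip reasons in the log
UNSUPPORTED = {"nvfuser": {torch.int16, torch.int8}}

@pytest.mark.parametrize("executor", EXECUTORS)
@pytest.mark.parametrize("dtype", DTYPES, ids=lambda d: str(d).replace("torch.", ""))
def test_python_ref_executor_unsqueeze(executor, dtype):
    # skip before doing any work, so the reason shows up in the report
    if dtype in UNSUPPORTED.get(executor, ()):
        pytest.skip(f"{executor} doesn't support dtype {dtype}")
    x = torch.zeros(2, 3, dtype=dtype)
    # the op under test must produce the expected result regardless of executor
    assert torch.unsqueeze(x, 0).shape == (1, 2, 3)

Running this sketch with pytest -v yields one PASSED or SKIPPED line per dtype/executor combination, analogous to the suffixed entries above; the "skipped for speed" entries in the log reflect an additional sampling policy in the real suite that this sketch does not model.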
2023-01-11T23:13:47.1726948Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_bfloat16 PASSED [ 51%] 2023-01-11T23:13:47.1727134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_complex128 PASSED [ 51%] 2023-01-11T23:13:47.1727321Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_complex64 PASSED [ 51%] 2023-01-11T23:13:47.1727527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1727734Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1727918Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1728096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1728311Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1728529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1728716Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1728927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1729109Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_bool PASSED [ 51%] 2023-01-11T23:13:47.1729296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_float16 PASSED [ 51%] 2023-01-11T23:13:47.1729478Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1729659Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_float64 PASSED [ 51%] 2023-01-11T23:13:47.1729838Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int64 PASSED [ 51%] 2023-01-11T23:13:47.1730034Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int8 PASSED [ 51%] 2023-01-11T23:13:47.1730218Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_uint8 PASSED [ 51%] 2023-01-11T23:13:47.1730425Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1730774Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 51%] 2023-01-11T23:13:47.1730984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1731311Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_int32 SKIPPED (_refs.view_as doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1731645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 51%] 2023-01-11T23:13:47.1731854Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1732038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_complex128 PASSED [ 51%] 2023-01-11T23:13:47.1732223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_complex64 PASSED [ 51%] 2023-01-11T23:13:47.1732398Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1732580Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_float64 PASSED [ 51%] 2023-01-11T23:13:47.1732785Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int16 PASSED [ 51%] 2023-01-11T23:13:47.1732967Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int32 PASSED [ 51%] 2023-01-11T23:13:47.1733147Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_uint8 PASSED [ 51%] 2023-01-11T23:13:47.1733358Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1733561Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1733772Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1734115Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 51%] 2023-01-11T23:13:47.1734326Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1734637Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1734964Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_float32 SKIPPED (_refs.view doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1735168Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1735485Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_int32 SKIPPED (_refs.view doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1735690Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1735901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1736138Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex128 PASSED [ 51%] 2023-01-11T23:13:47.1736328Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_float64 PASSED [ 51%] 2023-01-11T23:13:47.1736510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int32 PASSED [ 51%] 2023-01-11T23:13:47.1736683Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int64 PASSED [ 51%] 2023-01-11T23:13:47.1736867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int8 
PASSED [ 51%] 2023-01-11T23:13:47.1737076Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1737294Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1737646Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 51%] 2023-01-11T23:13:47.1737976Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_float32 SKIPPED (_refs.vsplit doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1738302Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%] 2023-01-11T23:13:47.1738508Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1738695Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_bfloat16 PASSED [ 51%] 2023-01-11T23:13:47.1738935Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_complex128 PASSED [ 51%] 2023-01-11T23:13:47.1739155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_complex32 PASSED [ 51%] 2023-01-11T23:13:47.1739337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float64 PASSED [ 51%] 2023-01-11T23:13:47.1739516Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int64 PASSED [ 51%] 2023-01-11T23:13:47.1739865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 51%] 2023-01-11T23:13:47.1740192Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_float32 SKIPPED (_refs.vstack doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1740402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1740736Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%] 2023-01-11T23:13:47.1740945Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1741270Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 51%] 2023-01-11T23:13:47.1741454Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_bfloat16 PASSED [ 51%] 2023-01-11T23:13:47.1741636Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_bool PASSED [ 51%] 2023-01-11T23:13:47.1741817Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex32 PASSED [ 51%] 2023-01-11T23:13:47.1742006Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex64 PASSED [ 51%] 2023-01-11T23:13:47.1742189Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_float16 PASSED [ 51%] 2023-01-11T23:13:47.1742398Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1742583Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int16 PASSED [ 51%] 2023-01-11T23:13:47.1742763Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int32 PASSED [ 51%] 2023-01-11T23:13:47.1742944Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int64 PASSED [ 51%] 2023-01-11T23:13:47.1743125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_uint8 PASSED [ 51%] 2023-01-11T23:13:47.1743337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1743538Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1743752Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1744099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 51%] 2023-01-11T23:13:47.1744309Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1744517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1744837Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_float32 SKIPPED (_refs.where doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1745191Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 51%] 2023-01-11T23:13:47.1745379Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1745561Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int16 PASSED [ 51%] 2023-01-11T23:13:47.1745742Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int32 PASSED [ 51%] 2023-01-11T23:13:47.1745914Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int8 PASSED [ 51%] 2023-01-11T23:13:47.1746094Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_uint8 PASSED [ 51%] 2023-01-11T23:13:47.1746307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1746518Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1746726Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1746934Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1747117Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1747305Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex32 PASSED [ 52%] 2023-01-11T23:13:47.1747487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1747662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1747844Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1748050Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int16 PASSED [ 52%] 2023-01-11T23:13:47.1748231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1748409Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1748758Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 52%] 2023-01-11T23:13:47.1748964Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1749164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1749498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 52%] 2023-01-11T23:13:47.1749775Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_aten_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1749998Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1750220Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1750416Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_aten_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1750611Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_aten_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1750833Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_aten_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1751055Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1751252Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_nvfuser_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1751470Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1751664Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1751855Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_complex64 PASSED [ 52%] 2023-01-11T23:13:47.1752039Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1752229Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1752586Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 52%] 2023-01-11T23:13:47.1752800Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1753011Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1753350Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 52%] 2023-01-11T23:13:47.1753511Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1753679Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1753841Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1754025Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int16 PASSED [ 52%] 2023-01-11T23:13:47.1754174Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_uint8 PASSED [ 52%] 2023-01-11T23:13:47.1754361Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1754547Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_complex32 PASSED [ 52%] 2023-01-11T23:13:47.1754726Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1754906Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1755086Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1755261Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1755443Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1755621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1755789Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1755962Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1756132Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1756305Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_uint8 PASSED [ 52%] 2023-01-11T23:13:47.1756516Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1756689Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1756874Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1757050Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1757214Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1757384Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1757566Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1757749Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex32 PASSED [ 52%] 2023-01-11T23:13:47.1757933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1758112Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1758295Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1758472Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1758645Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1758810Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_uint8 PASSED [ 52%] 2023-01-11T23:13:47.1759014Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1759215Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1759401Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1759607Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1759783Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int32 PASSED [ 52%] 2023-01-11T23:13:47.1759953Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1760121Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_uint8 PASSED [ 52%] 2023-01-11T23:13:47.1760294Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1760466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1760645Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_complex32 PASSED [ 52%] 2023-01-11T23:13:47.1760825Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_complex64 PASSED [ 52%] 2023-01-11T23:13:47.1761004Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1761176Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1761347Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int32 PASSED [ 52%] 2023-01-11T23:13:47.1761515Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1761695Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1761873Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1762048Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex64 PASSED [ 52%] 2023-01-11T23:13:47.1762258Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1762441Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1762617Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int32 PASSED [ 52%] 2023-01-11T23:13:47.1762789Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1762967Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1763145Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1763326Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1763505Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int32 PASSED [ 52%] 2023-01-11T23:13:47.1763672Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1763851Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_complex64 PASSED [ 52%] 2023-01-11T23:13:47.1764029Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1764201Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1764373Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1764546Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int16 PASSED [ 52%] 2023-01-11T23:13:47.1764716Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1764890Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1765062Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_uint8 PASSED [ 52%] 2023-01-11T23:13:47.1765261Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1765440Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1765614Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1765785Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1765953Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int32 PASSED [ 52%] 2023-01-11T23:13:47.1766126Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1766302Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1766481Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1766666Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex32 PASSED [ 52%] 2023-01-11T23:13:47.1766834Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1767007Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1767177Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1767348Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int16 PASSED [ 52%] 2023-01-11T23:13:47.1767519Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int32 PASSED [ 52%] 2023-01-11T23:13:47.1767724Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1767896Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1768082Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_complex64 PASSED [ 52%] 2023-01-11T23:13:47.1768253Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1768429Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1768589Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1768758Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_complex32 PASSED [ 52%] 2023-01-11T23:13:47.1768937Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1769124Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1769292Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int16 PASSED [ 52%] 2023-01-11T23:13:47.1769452Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_uint8 PASSED [ 52%] 2023-01-11T23:13:47.1769616Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1769768Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1769933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex64 PASSED [ 52%] 2023-01-11T23:13:47.1770095Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1770255Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int16 PASSED [ 52%] 2023-01-11T23:13:47.1770411Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int32 PASSED [ 52%] 2023-01-11T23:13:47.1770566Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1770721Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1770912Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1771067Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1771230Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1771392Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_uint8 PASSED [ 52%] 2023-01-11T23:13:47.1771556Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1771717Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_complex32 PASSED [ 52%] 2023-01-11T23:13:47.1771878Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1772040Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1772199Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1772361Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_uint8 PASSED [ 52%] 2023-01-11T23:13:47.1772522Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1772693Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1772859Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1773027Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1773198Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1773359Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1773550Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1773718Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1773874Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1774033Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1774192Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int16 PASSED [ 52%] 2023-01-11T23:13:47.1774347Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int32 PASSED [ 52%] 2023-01-11T23:13:47.1774681Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int64 PASSED [ 53%] 2023-01-11T23:13:47.1774848Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:13:47.1775015Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_complex128 PASSED [ 53%] 2023-01-11T23:13:47.1775177Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_complex64 PASSED [ 53%] 2023-01-11T23:13:47.1775331Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float64 PASSED [ 53%] 2023-01-11T23:13:47.1775488Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int32 PASSED [ 53%] 2023-01-11T23:13:47.1775647Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int8 PASSED [ 53%] 2023-01-11T23:13:47.1775816Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:13:47.1775988Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_complex128 PASSED [ 53%] 2023-01-11T23:13:47.1776153Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_float64 PASSED [ 53%] 2023-01-11T23:13:47.1776315Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:13:47.1776478Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float64 PASSED [ 53%] 2023-01-11T23:13:47.1776636Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int16 PASSED [ 53%] 2023-01-11T23:13:47.1776825Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int32 PASSED [ 53%] 2023-01-11T23:13:47.1776980Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int64 PASSED [ 53%] 2023-01-11T23:13:47.1777140Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int8 PASSED [ 53%] 2023-01-11T23:13:47.1777293Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int16 PASSED [ 53%] 2023-01-11T23:13:47.1777446Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int32 PASSED [ 53%] 2023-01-11T23:13:47.1777603Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_bool PASSED [ 53%] 2023-01-11T23:13:47.1777767Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_complex128 PASSED [ 53%] 2023-01-11T23:13:47.1777933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float16 PASSED [ 53%] 2023-01-11T23:13:47.1778084Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float32 PASSED [ 53%] 2023-01-11T23:13:47.1778244Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int32 PASSED [ 53%] 2023-01-11T23:13:47.1778400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int8 PASSED [ 53%] 2023-01-11T23:13:47.1778563Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_float16 PASSED [ 53%] 2023-01-11T23:13:47.1778724Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_float64 PASSED [ 53%] 2023-01-11T23:13:47.1778881Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int64 XFAIL [ 53%] 2023-01-11T23:13:47.1779053Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:13:47.1779217Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_bool PASSED [ 53%] 2023-01-11T23:13:47.1779427Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float16 PASSED [ 53%] 2023-01-11T23:13:47.1779591Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float32 PASSED [ 53%] 2023-01-11T23:13:47.1779756Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float64 PASSED [ 53%] 2023-01-11T23:13:47.1779921Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int16 PASSED [ 53%] 2023-01-11T23:13:47.1780087Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int32 PASSED [ 53%] 2023-01-11T23:13:47.1780249Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_uint8 PASSED [ 53%] 2023-01-11T23:13:47.1780439Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_complex128 PASSED [ 53%] 2023-01-11T23:13:47.1780627Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_complex64 PASSED [ 53%] 2023-01-11T23:13:47.1780813Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float32 PASSED [ 53%] 2023-01-11T23:13:47.1780991Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float64 PASSED [ 53%] 2023-01-11T23:13:47.1781173Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int16 PASSED [ 53%] 2023-01-11T23:13:47.1781355Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int64 PASSED [ 53%] 2023-01-11T23:13:47.1781535Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_uint8 PASSED [ 53%] 2023-01-11T23:13:47.1781714Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1781887Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_bool PASSED [ 53%]
2023-01-11T23:13:47.1782073Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex128 PASSED [ 53%]
2023-01-11T23:13:47.1782254Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex32 PASSED [ 53%]
2023-01-11T23:13:47.1782457Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float32 PASSED [ 53%]
2023-01-11T23:13:47.1782623Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float64 PASSED [ 53%]
2023-01-11T23:13:47.1782797Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1782969Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1783134Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1783296Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_float16 PASSED [ 53%]
2023-01-11T23:13:47.1783463Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_float32 PASSED [ 53%]
2023-01-11T23:13:47.1783623Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1783780Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int32 PASSED [ 53%]
2023-01-11T23:13:47.1783932Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int64 PASSED [ 53%]
2023-01-11T23:13:47.1784079Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1784241Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1784408Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1784576Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_float16 PASSED [ 53%]
2023-01-11T23:13:47.1784739Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_float64 PASSED [ 53%]
2023-01-11T23:13:47.1784928Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1785092Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1785259Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float32 PASSED [ 53%]
2023-01-11T23:13:47.1785411Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float64 PASSED [ 53%]
2023-01-11T23:13:47.1785567Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int32 PASSED [ 53%]
2023-01-11T23:13:47.1785730Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1785894Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1786055Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float16 PASSED [ 53%]
2023-01-11T23:13:47.1786217Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float32 PASSED [ 53%]
2023-01-11T23:13:47.1786379Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int64 PASSED [ 53%]
2023-01-11T23:13:47.1786538Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1786701Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1786854Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_bool PASSED [ 53%]
2023-01-11T23:13:47.1787021Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1787181Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_float16 PASSED [ 53%]
2023-01-11T23:13:47.1787340Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_float32 PASSED [ 53%]
2023-01-11T23:13:47.1787497Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1787658Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1787833Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1787997Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_bool PASSED [ 53%]
2023-01-11T23:13:47.1788188Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_complex128 PASSED [ 53%]
2023-01-11T23:13:47.1788365Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_complex32 PASSED [ 53%]
2023-01-11T23:13:47.1788536Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1788704Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float16 PASSED [ 53%]
2023-01-11T23:13:47.1788869Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float64 PASSED [ 53%]
2023-01-11T23:13:47.1789035Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int64 PASSED [ 53%]
2023-01-11T23:13:47.1789205Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1789382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_complex128 PASSED [ 53%]
2023-01-11T23:13:47.1789579Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1789818Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float32 PASSED [ 53%]
2023-01-11T23:13:47.1790001Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float64 PASSED [ 53%]
2023-01-11T23:13:47.1790167Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1790335Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex128 PASSED [ 53%]
2023-01-11T23:13:47.1790500Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float64 PASSED [ 53%]
2023-01-11T23:13:47.1790695Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1790860Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1791024Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1791181Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_bool PASSED [ 53%]
2023-01-11T23:13:47.1791351Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1791516Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int64 PASSED [ 53%]
2023-01-11T23:13:47.1791692Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int32 PASSED [ 53%]
2023-01-11T23:13:47.1791865Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1792036Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1792201Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_bool PASSED [ 53%]
2023-01-11T23:13:47.1792369Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1792530Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int32 PASSED [ 53%]
2023-01-11T23:13:47.1792684Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int64 PASSED [ 53%]
2023-01-11T23:13:47.1792848Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1793011Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1793191Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int32 PASSED [ 53%]
2023-01-11T23:13:47.1793364Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int64 PASSED [ 53%]
2023-01-11T23:13:47.1793536Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1793710Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1793903Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1794069Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int32 PASSED [ 53%]
2023-01-11T23:13:47.1794224Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1794386Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1794560Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_shapes_cuda_float32 PASSED [ 53%]
2023-01-11T23:13:47.1794736Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_bool PASSED [ 53%]
2023-01-11T23:13:47.1794920Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1795093Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float16 PASSED [ 53%]
2023-01-11T23:13:47.1795268Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1795439Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int64 PASSED [ 53%]
2023-01-11T23:13:47.1795601Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1795776Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1795946Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1796121Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1796322Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float32 PASSED [ 53%]
2023-01-11T23:13:47.1796490Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1796658Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1796824Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1796988Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_float16 PASSED [ 53%]
2023-01-11T23:13:47.1797146Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int32 PASSED [ 53%]
2023-01-11T23:13:47.1797311Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1797474Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1797639Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex32 PASSED [ 53%]
2023-01-11T23:13:47.1797802Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_float64 PASSED [ 53%]
2023-01-11T23:13:47.1797964Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1798120Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int64 PASSED [ 53%]
2023-01-11T23:13:47.1798277Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1798423Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1798588Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1798749Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float64 PASSED [ 53%]
2023-01-11T23:13:47.1798909Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1799072Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1799228Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1799418Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1799583Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_bool PASSED [ 53%]
2023-01-11T23:13:47.1799748Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_complex32 PASSED [ 53%]
2023-01-11T23:13:47.1799905Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1800067Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float16 PASSED [ 53%]
2023-01-11T23:13:47.1800227Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1800387Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1800550Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1800709Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int8 PASSED [ 54%]
2023-01-11T23:13:47.1800870Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1801036Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1801193Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1801349Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1801520Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1801684Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1801850Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1802051Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1802213Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int8 PASSED [ 54%]
2023-01-11T23:13:47.1802383Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1802545Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1802702Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1802868Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1803031Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1803192Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1803356Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1803515Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1803678Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1803837Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1803989Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1804149Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1804309Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1804468Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1804636Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1804813Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex32 PASSED [ 54%]
2023-01-11T23:13:47.1804987Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex64 PASSED [ 54%]
2023-01-11T23:13:47.1805184Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1805353Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1805512Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1805678Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1805841Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1806005Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1806170Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1806335Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_complex128 PASSED [ 54%]
2023-01-11T23:13:47.1806499Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1806658Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1806807Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1806966Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int8 PASSED [ 54%]
2023-01-11T23:13:47.1807122Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1807299Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_complex128 PASSED [ 54%]
2023-01-11T23:13:47.1807477Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_complex32 PASSED [ 54%]
2023-01-11T23:13:47.1807676Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_complex64 PASSED [ 54%]
2023-01-11T23:13:47.1807849Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1808022Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1808193Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1808358Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1808531Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1808708Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1808880Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1809055Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1809226Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1809400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1809571Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1809742Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1809903Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1810074Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1810239Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1810408Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1810576Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1810772Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1810942Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1811103Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1811259Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1811422Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1811587Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1811753Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1811917Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1812076Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_complex128 PASSED [ 54%]
2023-01-11T23:13:47.1812243Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_complex32 PASSED [ 54%]
2023-01-11T23:13:47.1812403Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1812566Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1812717Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1812874Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1813024Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1813182Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int8 PASSED [ 54%]
2023-01-11T23:13:47.1813373Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1813532Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1813703Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex128 PASSED [ 54%]
2023-01-11T23:13:47.1813867Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex32 PASSED [ 54%]
2023-01-11T23:13:47.1814022Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex64 PASSED [ 54%]
2023-01-11T23:13:47.1814183Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1814342Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1814596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1814756Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1814908Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1815074Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1815239Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_complex64 PASSED [ 54%]
2023-01-11T23:13:47.1815392Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1815552Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1815715Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1815873Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1816035Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1816198Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1816359Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_complex32 PASSED [ 54%]
2023-01-11T23:13:47.1816557Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1816721Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1816872Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1817033Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1817188Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1817344Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1817500Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int8 PASSED [ 54%]
2023-01-11T23:13:47.1817668Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1817842Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex32 PASSED [ 54%]
2023-01-11T23:13:47.1818016Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex64 PASSED [ 54%]
2023-01-11T23:13:47.1818178Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1818346Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1818520Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1818687Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1818863Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_complex64 PASSED [ 54%]
2023-01-11T23:13:47.1819070Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1819245Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1819420Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1819588Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1819746Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1819917Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_complex128 PASSED [ 54%]
2023-01-11T23:13:47.1820081Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1820247Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1820410Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1820591Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1820766Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1820950Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_complex128 PASSED [ 54%]
2023-01-11T23:13:47.1821118Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1821297Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1821471Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1821643Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int8 PASSED [ 54%]
2023-01-11T23:13:47.1821806Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1821976Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1822171Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1822336Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1822499Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1822672Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1822847Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1823020Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1823194Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int8 PASSED [ 54%]
2023-01-11T23:13:47.1823382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_complex128 PASSED [ 54%]
2023-01-11T23:13:47.1823561Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1823733Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1823908Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int8 PASSED [ 54%]
2023-01-11T23:13:47.1824080Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1824243Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1824417Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1824588Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1824786Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1824956Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1825125Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex32 PASSED [ 54%]
2023-01-11T23:13:47.1825291Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex64 PASSED [ 54%]
2023-01-11T23:13:47.1825456Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1825623Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1825779Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1825939Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1826102Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1826264Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1826429Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_bfloat16 PASSED [ 55%]
2023-01-11T23:13:47.1826592Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1826760Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex32 PASSED [ 55%]
2023-01-11T23:13:47.1826921Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1827076Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1827238Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1827402Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1827570Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex32 PASSED [ 55%]
2023-01-11T23:13:47.1827732Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1827916Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1828078Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1828246Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_bfloat16 PASSED [ 55%]
2023-01-11T23:13:47.1828414Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex128 PASSED [ 55%]
2023-01-11T23:13:47.1828585Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1828753Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1828937Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1829129Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1829313Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1829478Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1829638Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_uint8 PASSED [ 55%]
2023-01-11T23:13:47.1829860Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_bfloat16 PASSED [ 55%]
2023-01-11T23:13:47.1830017Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_complex32 PASSED [ 55%]
2023-01-11T23:13:47.1830178Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1830334Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1830525Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1830686Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1830853Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1831016Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1831179Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1831331Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1831491Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1831649Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_uint8 PASSED [ 55%]
2023-01-11T23:13:47.1831806Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1831971Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1832132Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1832293Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1832455Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1832625Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1832786Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1832949Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1833112Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1833274Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1833434Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1833591Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1833775Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_uint8 PASSED [ 55%]
2023-01-11T23:13:47.1833939Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1834104Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex32 PASSED [ 55%]
2023-01-11T23:13:47.1834267Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1834430Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1834604Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_bfloat16 PASSED [ 55%]
2023-01-11T23:13:47.1834778Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_complex128 PASSED [ 55%]
2023-01-11T23:13:47.1834956Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1835130Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1835300Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1835460Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1835628Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_bfloat16 PASSED [ 55%]
2023-01-11T23:13:47.1835792Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1835963Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1836129Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1836323Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_bfloat16 PASSED [ 55%]
2023-01-11T23:13:47.1836488Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1836656Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1836820Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1836974Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1837136Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1837300Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_bfloat16 PASSED [ 55%]
2023-01-11T23:13:47.1837465Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1837631Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_complex128 PASSED [ 55%]
2023-01-11T23:13:47.1837800Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1837962Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1838123Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1838288Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1838454Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1838617Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1838779Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_uint8 PASSED [ 55%]
2023-01-11T23:13:47.1838950Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_complex128 PASSED [ 55%]
2023-01-11T23:13:47.1839122Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1839289Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1839483Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1839650Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1839807Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_uint8 PASSED [ 55%]
2023-01-11T23:13:47.1839971Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1840144Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex128 PASSED [ 55%]
2023-01-11T23:13:47.1840310Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1840476Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1840642Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1840806Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_uint8 PASSED [ 55%]
2023-01-11T23:13:47.1840986Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_bfloat16 PASSED [ 55%]
2023-01-11T23:13:47.1841157Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex128 PASSED [ 55%]
2023-01-11T23:13:47.1841334Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex32 PASSED [ 55%]
2023-01-11T23:13:47.1841510Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1841683Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1841855Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1842066Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_uint8 PASSED [ 55%]
2023-01-11T23:13:47.1842233Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1842401Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1842569Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1842727Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1842893Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1843065Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_complex128 PASSED [ 55%]
2023-01-11T23:13:47.1843237Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1843406Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1843573Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1843741Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1843907Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1844073Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex32 PASSED [ 55%]
2023-01-11T23:13:47.1844246Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1844416Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1844584Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1844751Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1844920Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1845087Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1845278Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1845443Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1845603Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1845769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1845932Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1846102Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_complex32 PASSED [ 55%]
2023-01-11T23:13:47.1846274Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1846440Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1846613Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1846791Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1846969Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1847130Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1847299Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1847468Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1847636Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1847828Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1847996Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1848165Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1848331Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1848485Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1848654Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1848820Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1848985Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1849148Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1849312Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_uint8 PASSED [ 55%]
2023-01-11T23:13:47.1849488Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1849657Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1849823Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1849983Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1850149Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1850312Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1850477Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1850649Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex32 PASSED [ 55%]
2023-01-11T23:13:47.1850845Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1851015Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1851180Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int16 PASSED [ 56%]
2023-01-11T23:13:47.1851335Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1851501Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1851664Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1851832Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1852004Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1852168Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1852337Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1852501Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1852668Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1852823Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1852985Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1853147Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1853313Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1853508Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1853680Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1853843Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int16 PASSED [ 56%]
2023-01-11T23:13:47.1854008Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1854167Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1854330Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1854619Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1854790Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1854960Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1855123Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1855286Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1855448Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1855613Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1855771Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1855932Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1856100Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1856262Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1856436Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex32 PASSED [ 56%]
2023-01-11T23:13:47.1856600Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1856815Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1856982Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1857138Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1857304Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1857469Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1857630Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1857799Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_complex128 PASSED [ 56%]
2023-01-11T23:13:47.1857963Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1858129Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1858289Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1858446Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1858604Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1858768Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1858940Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_complex128 PASSED [ 56%]
2023-01-11T23:13:47.1859102Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1859298Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int16 PASSED [ 56%]
2023-01-11T23:13:47.1859461Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1859628Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1859799Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_complex128 PASSED [ 56%]
2023-01-11T23:13:47.1859959Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1860120Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1860282Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1860448Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1860625Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_complex128 PASSED [ 56%]
2023-01-11T23:13:47.1860801Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1860974Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1861145Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1861312Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1861473Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1861640Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1861805Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1861969Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1862136Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1862299Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1862484Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1862655Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1862819Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1862986Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1863150Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1863316Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1863475Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1863641Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1863804Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1863964Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1864120Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1864271Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1864433Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1864595Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1864752Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1864934Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1865088Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1865248Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1865408Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1865559Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1865717Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int16 PASSED [ 56%]
2023-01-11T23:13:47.1865871Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1866026Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1866189Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1866349Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1866509Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1866669Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1866818Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1866981Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1867137Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1867294Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1867453Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1867609Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1867765Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1867923Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1868114Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1868277Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1868442Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1868607Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1868767Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1868936Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_complex128 PASSED [ 56%]
2023-01-11T23:13:47.1869101Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1869268Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int16 PASSED [ 56%]
2023-01-11T23:13:47.1869426Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1869583Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1869809Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_complex128 PASSED [ 56%]
2023-01-11T23:13:47.1869976Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_complex32 PASSED [ 56%]
2023-01-11T23:13:47.1870138Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1870299Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1870456Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1870646Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1870808Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1870969Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1871130Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1871288Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int16 PASSED [ 56%]
2023-01-11T23:13:47.1871442Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1871604Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igamma_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1871769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igammac_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1871934Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igammac_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1872099Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_imag_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1872259Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1872425Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1872594Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1872758Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1872921Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1873078Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1873247Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1873414Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1873585Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1873774Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1873951Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_complex128 PASSED [ 56%]
2023-01-11T23:13:47.1874117Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1874281Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1874449Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1874617Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1874790Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1874965Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1875136Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1875297Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int16 PASSED [ 56%]
2023-01-11T23:13:47.1875461Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1875623Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1875787Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1875954Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1876119Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1876311Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_float64 PASSED [ 57%]
2023-01-11T23:13:47.1876475Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int32 PASSED [ 57%]
2023-01-11T23:13:47.1876639Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int8 PASSED [ 57%]
2023-01-11T23:13:47.1876796Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_bool PASSED [ 57%]
2023-01-11T23:13:47.1876967Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex128 PASSED [ 57%]
2023-01-11T23:13:47.1877132Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex32 PASSED [ 57%]
2023-01-11T23:13:47.1877297Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float64 PASSED [ 57%]
2023-01-11T23:13:47.1877459Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int8 PASSED [ 57%]
2023-01-11T23:13:47.1877624Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_uint8 PASSED [ 57%]
2023-01-11T23:13:47.1877791Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex128 PASSED [ 57%]
2023-01-11T23:13:47.1877959Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex32 PASSED [ 57%]
2023-01-11T23:13:47.1878115Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_float32 PASSED [ 57%]
2023-01-11T23:13:47.1878278Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_float64 PASSED [ 57%]
2023-01-11T23:13:47.1878438Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int32 PASSED [ 57%]
2023-01-11T23:13:47.1878598Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_bool PASSED [ 57%]
2023-01-11T23:13:47.1878766Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_complex128 PASSED [ 57%]
2023-01-11T23:13:47.1878926Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float16 PASSED [ 57%]
2023-01-11T23:13:47.1879090Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float32 PASSED [ 57%]
2023-01-11T23:13:47.1879247Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int8 PASSED [ 57%]
2023-01-11T23:13:47.1879436Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_bfloat16 PASSED [ 57%]
2023-01-11T23:13:47.1879591Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_bool PASSED [ 57%]
2023-01-11T23:13:47.1879758Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_float16 PASSED [ 57%]
2023-01-11T23:13:47.1879922Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_float64
PASSED [ 57%] 2023-01-11T23:13:47.1880084Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1880245Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1880411Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1880577Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1880740Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1880896Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1881059Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1881223Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1881386Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1881546Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_bool PASSED [ 57%] 2023-01-11T23:13:47.1881712Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1881877Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex32 PASSED [ 57%] 2023-01-11T23:13:47.1882079Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1882235Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1882400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1882559Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1882714Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1882872Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int8 PASSED [ 57%] 2023-01-11T23:13:47.1883033Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1883196Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_bool PASSED [ 57%] 2023-01-11T23:13:47.1883357Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1883513Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int8 PASSED [ 57%] 2023-01-11T23:13:47.1883665Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1883830Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1883992Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1884153Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1884314Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1884474Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1884635Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1884795Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int16 PASSED [ 57%] 
2023-01-11T23:13:47.1884946Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1885151Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1885340Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1885520Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1885689Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1885862Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1886029Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1886203Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1886374Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1886542Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1886727Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1886902Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1887071Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1887236Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1887400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1887587Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1887750Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1887915Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1888070Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1888231Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_bool PASSED [ 57%] 2023-01-11T23:13:47.1888392Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1888553Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int32 PASSED [ 57%] 2023-01-11T23:13:47.1888714Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1888881Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1889049Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1889210Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1889366Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1889527Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1889684Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int64 PASSED [ 57%] 
2023-01-11T23:13:47.1889844Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1890004Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1890163Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1890318Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1890483Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_complex32 PASSED [ 57%] 2023-01-11T23:13:47.1890633Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1890822Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1890978Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int32 PASSED [ 57%] 2023-01-11T23:13:47.1891131Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1891314Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1891503Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1891689Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_complex32 PASSED [ 57%] 2023-01-11T23:13:47.1891871Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1892050Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1892225Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1892399Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1892566Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1892731Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int8 PASSED [ 57%] 2023-01-11T23:13:47.1892901Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1893066Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_bool PASSED [ 57%] 2023-01-11T23:13:47.1893261Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1893430Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1893600Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int32 PASSED [ 57%] 2023-01-11T23:13:47.1893757Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int8 PASSED [ 57%] 2023-01-11T23:13:47.1893927Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1894094Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_bool PASSED [ 57%] 2023-01-11T23:13:47.1894267Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1894436Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1894708Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1909820Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1910060Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int32 PASSED [ 57%] 2023-01-11T23:13:47.1910233Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1910407Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int8 PASSED [ 57%] 2023-01-11T23:13:47.1910576Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1910745Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_bool PASSED [ 57%] 2023-01-11T23:13:47.1910926Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1911095Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1911266Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int8 PASSED [ 57%] 2023-01-11T23:13:47.1911442Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1911697Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1911867Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1912036Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1912207Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int32 PASSED [ 57%] 2023-01-11T23:13:47.1912380Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_bool PASSED [ 57%] 2023-01-11T23:13:47.1912557Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1912735Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1912908Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1913081Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1913242Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1913411Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1913578Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1913741Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1913906Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int32 PASSED [ 57%] 2023-01-11T23:13:47.1914091Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1914310Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1914487Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1914663Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1914827Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_bool PASSED [ 57%] 2023-01-11T23:13:47.1914997Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1915167Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1915334Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int32 PASSED [ 57%] 2023-01-11T23:13:47.1915505Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1915674Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1915846Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1916017Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1916207Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1916400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1916592Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1916787Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1916986Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1917187Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_complex64 PASSED [ 58%] 2023-01-11T23:13:47.1917381Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1917599Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1917773Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1917935Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1918104Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1918272Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1918440Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_bool PASSED [ 58%] 2023-01-11T23:13:47.1918608Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1918781Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1918950Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1919115Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1919284Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_bool PASSED [ 58%] 2023-01-11T23:13:47.1919445Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1919614Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1919782Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_float64 PASSED [ 58%] 
2023-01-11T23:13:47.1919947Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1920140Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1920311Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_bool PASSED [ 58%] 2023-01-11T23:13:47.1920490Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1920664Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1920824Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1920993Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1921172Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1921353Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex128 PASSED [ 58%] 2023-01-11T23:13:47.1921528Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1921705Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1921880Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1922053Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1922224Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex128 PASSED [ 58%] 2023-01-11T23:13:47.1922386Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex32 PASSED [ 58%] 2023-01-11T23:13:47.1922556Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1922724Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1922889Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1923077Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1923242Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1923434Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_bool PASSED [ 58%] 2023-01-11T23:13:47.1923605Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_complex64 PASSED [ 58%] 2023-01-11T23:13:47.1923765Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1923927Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1924091Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1924258Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1924425Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1924592Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1924754Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1924935Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1925103Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex64 PASSED [ 58%] 2023-01-11T23:13:47.1925276Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1925450Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1925620Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1925791Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1925971Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_bool PASSED [ 58%] 2023-01-11T23:13:47.1926189Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex128 PASSED [ 58%] 2023-01-11T23:13:47.1926379Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex32 PASSED [ 58%] 2023-01-11T23:13:47.1926563Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1926735Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1926914Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1927097Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1927267Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_bool PASSED [ 58%] 2023-01-11T23:13:47.1927444Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex32 PASSED [ 58%] 2023-01-11T23:13:47.1927623Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex64 PASSED [ 58%] 2023-01-11T23:13:47.1927794Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1927964Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1928140Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_complex128 PASSED [ 58%] 2023-01-11T23:13:47.1928308Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_complex32 PASSED [ 58%] 2023-01-11T23:13:47.1928479Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1928650Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1928820Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1929017Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1929208Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1929402Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1929578Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1929738Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_bool PASSED [ 58%] 2023-01-11T23:13:47.1929902Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int8 PASSED [ 58%] 
2023-01-11T23:13:47.1930074Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1930252Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1930425Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1930630Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1930875Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1931061Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1931246Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1931430Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1931621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1931812Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1932039Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1932218Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1932400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1932581Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1932764Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1932941Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1933127Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1933318Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1933511Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1933706Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1933894Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1934139Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1934331Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1934891Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1935113Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1939835Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1940025Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1940284Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1940467Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1940641Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_complex128 PASSED [ 58%] 2023-01-11T23:13:47.1940819Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1941001Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1941182Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1941361Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1941536Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1941727Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1941921Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_bool PASSED [ 58%] 2023-01-11T23:13:47.1942114Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1942306Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1942498Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1963514Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1963742Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1963943Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1964124Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1964302Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1964474Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1964651Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1964830Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1965001Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1965186Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1965366Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1965543Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1965721Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1965898Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1966079Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1966266Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1966520Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1966711Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1966889Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pdist_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1967074Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1967256Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1967440Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1967626Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1967797Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1967974Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1968149Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1968322Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1968499Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1968670Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1968858Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1969088Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1969263Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1969432Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1969606Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1969778Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1969944Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1970114Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1970280Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1970469Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1970662Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1970848Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1971027Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1971217Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1971402Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1971587Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1971773Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1971980Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1972161Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1972343Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1972524Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1972701Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1972882Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1973063Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1973242Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1973420Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1973600Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1973780Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1973963Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1974205Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1974412Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1974908Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1975111Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1975304Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1975489Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1975673Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1975836Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1975998Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1976160Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1976312Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1976469Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1976626Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1976780Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1976945Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1977107Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1977270Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1977429Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1977595Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1977749Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1977965Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1978127Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1978292Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1978452Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1978617Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1978778Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1978941Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1979117Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1979298Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1979461Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1979619Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1979779Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1979945Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_complex32 PASSED [ 59%] 2023-01-11T23:13:47.1980102Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int8 
PASSED [ 59%] 2023-01-11T23:13:47.1980263Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1980429Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1980663Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1980827Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1980990Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1981151Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex32 PASSED [ 59%] 2023-01-11T23:13:47.1981309Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1981466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1981622Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1981781Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1981933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1982092Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1982251Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1982428Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1982595Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1982760Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1982921Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1983089Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1983244Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1983408Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1983567Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1983756Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1983924Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1984083Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1984240Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1984403Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1984566Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1984733Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1984950Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1985169Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1985354Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1985557Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1985723Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1985892Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_complex32 PASSED [ 59%] 2023-01-11T23:13:47.1986061Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1986220Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1986434Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1986601Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1986769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1986933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1987096Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1987259Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1987422Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1987585Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1987744Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1987905Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1988073Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1988239Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_complex32 PASSED [ 59%] 2023-01-11T23:13:47.1988399Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1988559Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1988719Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1988877Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1989023Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1989192Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1989356Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1989543Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1989787Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1989951Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1990111Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int64 PASSED [ 59%] 
2023-01-11T23:13:47.1990270Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1990429Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1990580Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1990740Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1990900Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1991070Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1991233Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_complex32 PASSED [ 59%] 2023-01-11T23:13:47.1991398Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1991560Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1991721Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1991875Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1992061Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1992221Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1992382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1992541Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1992704Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1992864Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1993023Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1993178Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1993325Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1993480Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1993632Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1993795Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1993957Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1994120Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_complex32 PASSED [ 59%] 2023-01-11T23:13:47.1994284Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1994443Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1994595Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1994758Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1994920Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1995082Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1995283Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_complex32 PASSED [ 59%] 2023-01-11T23:13:47.1995450Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1995610Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1995769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1995922Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1996078Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1996235Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1996399Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.1996554Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.1996712Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.1996865Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.1997017Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.1997176Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.1997329Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.1997490Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.1997649Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.1997846Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.1998013Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.1998173Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.1998341Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_complex128 PASSED [ 60%] 2023-01-11T23:13:47.1998501Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_complex32 PASSED [ 60%] 2023-01-11T23:13:47.1998653Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.1998815Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.1998976Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.1999134Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.1999299Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.1999464Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_complex64 PASSED [ 60%] 2023-01-11T23:13:47.1999629Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float16 PASSED [ 60%] 2023-01-11T23:13:47.1999788Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.1999940Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2000095Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2000251Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2000407Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2000576Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex128 PASSED [ 60%] 2023-01-11T23:13:47.2000741Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex32 PASSED [ 60%] 2023-01-11T23:13:47.2000929Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2001093Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2001251Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2001400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2001583Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.2001759Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2001943Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_complex128 PASSED [ 60%] 2023-01-11T23:13:47.2002124Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2002301Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2002478Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2002652Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2002824Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2002988Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2003158Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2003335Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2003531Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.2003703Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2003871Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2004034Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2004201Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2004357Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2004528Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2004702Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_float32 PASSED [ 60%] 
2023-01-11T23:13:47.2004878Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2005044Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2005215Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2005386Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.2005558Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2005727Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2005886Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2006053Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2006219Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2006382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2006568Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2006737Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2006899Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2007070Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2007230Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2007394Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2007571Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2007746Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2007919Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2008092Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2008286Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.2008483Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_complex128 PASSED [ 60%] 2023-01-11T23:13:47.2008675Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_complex32 PASSED [ 60%] 2023-01-11T23:13:47.2008855Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_float16 PASSED [ 60%] 2023-01-11T23:13:47.2009069Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2009260Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2009430Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2009604Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2009775Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2009944Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2010115Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2010310Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2010496Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2010691Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2010883Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2011069Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2011252Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2011442Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2011632Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2011828Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_float16 PASSED [ 60%] 2023-01-11T23:13:47.2012037Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2012222Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2012398Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2012568Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2012740Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2012907Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2013085Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2013257Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2013426Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2013592Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2013778Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.2013956Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float16 PASSED [ 60%] 2023-01-11T23:13:47.2014143Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2014326Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2014672Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2014861Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2015052Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2015236Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2015421Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2015606Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2015777Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2015960Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2016140Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2016318Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2016491Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2016664Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2016832Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2016998Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2017166Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2017319Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2017488Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_complex128 PASSED [ 60%] 2023-01-11T23:13:47.2017651Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_float16 PASSED [ 60%] 2023-01-11T23:13:47.2017855Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2018019Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2018179Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2018338Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2018503Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_float16 PASSED [ 60%] 2023-01-11T23:13:47.2018660Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2018822Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2018986Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2019151Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2019310Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2019479Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.2019647Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_complex64 PASSED [ 60%] 2023-01-11T23:13:47.2019812Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2019969Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2020134Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2020325Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2020484Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2020651Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex32 PASSED [ 60%] 2023-01-11T23:13:47.2020811Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2020974Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2021133Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2021293Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2021447Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2021610Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_complex64 PASSED [ 60%] 2023-01-11T23:13:47.2021769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float16 PASSED [ 60%] 2023-01-11T23:13:47.2021929Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2022100Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.2022269Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_complex128 PASSED [ 60%] 2023-01-11T23:13:47.2022434Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2022600Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2022753Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2022919Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2023083Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2023245Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2023431Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2023589Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2023746Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2023902Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2024060Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_bool PASSED [ 61%] 2023-01-11T23:13:47.2024218Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2024382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex64 PASSED [ 61%] 2023-01-11T23:13:47.2024545Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2024703Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2024865Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2025024Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2025178Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2025335Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2025501Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_complex64 PASSED [ 61%] 2023-01-11T23:13:47.2025671Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2025839Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2026028Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2026193Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2026358Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2026519Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2026684Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2026834Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_bool PASSED [ 61%] 2023-01-11T23:13:47.2026997Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2027150Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2027308Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2027475Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2027640Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2027802Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2027960Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2028117Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2028272Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2028438Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2028602Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex64 PASSED [ 61%] 2023-01-11T23:13:47.2028764Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2028921Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2029102Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2029262Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2029428Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_bool XFAIL [ 61%] 2023-01-11T23:13:47.2029589Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float32 XFAIL [ 61%] 2023-01-11T23:13:47.2029844Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float64 XFAIL [ 61%] 2023-01-11T23:13:47.2030011Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int32 XFAIL [ 61%] 2023-01-11T23:13:47.2030176Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int64 XFAIL [ 61%] 2023-01-11T23:13:47.2030337Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_bool PASSED [ 61%] 2023-01-11T23:13:47.2030502Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2030664Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2030820Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2031157Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2031318Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2031483Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex64 PASSED [ 61%] 2023-01-11T23:13:47.2031644Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2031804Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2031998Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2032159Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2032332Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2032506Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2032670Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2032839Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex64 PASSED [ 61%] 2023-01-11T23:13:47.2033007Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2033173Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2033341Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2033505Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2033671Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2033835Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2033994Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2034143Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int8 PASSED [ 61%] 
2023-01-11T23:13:47.2034301Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2034467Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_indices_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2034631Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2034797Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex64 PASSED [ 61%] 2023-01-11T23:13:47.2034962Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2035149Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2035306Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2035454Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2035613Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2035780Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_bool PASSED [ 61%] 2023-01-11T23:13:47.2035953Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2036126Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2036303Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_complex64 PASSED [ 61%] 2023-01-11T23:13:47.2036475Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2036643Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2036812Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2036970Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2037130Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2037296Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2037455Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2037655Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2037816Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2037981Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2038143Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2038296Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2038462Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2038629Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2038792Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2038957Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2039121Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2039281Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2039443Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_bool PASSED [ 61%] 2023-01-11T23:13:47.2039621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2039784Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2039950Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2040112Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2040275Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2040446Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2040621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2040820Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2040989Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2041149Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2041314Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2041483Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2041647Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2041814Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2041977Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2042141Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2042304Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2042466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2042621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2042781Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2042953Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2043120Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2043321Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2043486Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2043649Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2043811Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2043964Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2044124Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2044287Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2044449Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_bool PASSED [ 61%] 2023-01-11T23:13:47.2044615Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2044782Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2044948Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2045110Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2045277Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2045431Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2045593Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2045751Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2045915Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2046085Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2046245Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2046427Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2046596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2046756Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2046922Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_complex64 PASSED [ 61%] 2023-01-11T23:13:47.2047084Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2047246Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2047405Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2047571Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2047732Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_bool PASSED [ 62%] 2023-01-11T23:13:47.2047899Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_complex64 PASSED [ 62%] 2023-01-11T23:13:47.2048054Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2048215Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2048374Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2048532Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2048692Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int8 PASSED [ 62%] 2023-01-11T23:13:47.2048880Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2049059Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2049244Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_bool PASSED [ 62%] 2023-01-11T23:13:47.2049419Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_complex32 PASSED [ 62%] 2023-01-11T23:13:47.2049572Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2049732Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2049899Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2050059Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2050225Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2050386Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2050546Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2050705Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2050864Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2051026Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2051184Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2051342Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2051499Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2051660Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int8 PASSED [ 62%] 2023-01-11T23:13:47.2051823Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2052009Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_native_batch_norm_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2052227Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_native_batch_norm_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2052398Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_var_mean_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2052572Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_var_mean_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2052742Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2052917Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2053087Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2053258Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2053425Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2053594Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2053753Z 
test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_var_mean_cuda_bfloat16 XFAIL [ 62%] 2023-01-11T23:13:47.2053916Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_bfloat16 XFAIL [ 62%] 2023-01-11T23:13:47.2054085Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_complex128 XFAIL [ 62%] 2023-01-11T23:13:47.2054249Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_complex64 XFAIL [ 62%] 2023-01-11T23:13:47.2054410Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_float16 XFAIL [ 62%] 2023-01-11T23:13:47.2054706Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_float64 XFAIL [ 62%] 2023-01-11T23:13:47.2054868Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_int16 XFAIL [ 62%] 2023-01-11T23:13:47.2055029Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_int32 XFAIL [ 62%] 2023-01-11T23:13:47.2055201Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2055362Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_bool PASSED [ 62%] 2023-01-11T23:13:47.2055535Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2055708Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex32 PASSED [ 62%] 2023-01-11T23:13:47.2055879Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2056045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2056217Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2056413Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2056607Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_bool PASSED [ 62%] 2023-01-11T23:13:47.2056803Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2056989Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex64 PASSED [ 62%] 2023-01-11T23:13:47.2057178Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2057365Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2057557Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2057747Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2057985Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2058177Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_complex32 PASSED [ 62%] 2023-01-11T23:13:47.2058364Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_complex64 PASSED [ 62%] 2023-01-11T23:13:47.2058551Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2058730Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2058921Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2059106Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2059300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_complex64 PASSED [ 62%] 2023-01-11T23:13:47.2059490Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2059676Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2059860Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2060043Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2060238Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2060456Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex64 PASSED [ 62%] 2023-01-11T23:13:47.2060648Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2060836Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2061025Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2061211Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int8 PASSED [ 62%] 2023-01-11T23:13:47.2061399Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_bool PASSED [ 62%] 2023-01-11T23:13:47.2061591Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2061782Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex64 PASSED [ 62%] 2023-01-11T23:13:47.2061972Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2062160Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2062337Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2062524Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int8 PASSED [ 62%] 2023-01-11T23:13:47.2062707Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2062895Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2063086Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex64 PASSED [ 62%] 2023-01-11T23:13:47.2063271Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2063483Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2063668Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2063852Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2064023Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2064207Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2064393Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_complex32 PASSED [ 62%] 2023-01-11T23:13:47.2064579Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2064767Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2064950Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2065136Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2065325Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2065510Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_bool PASSED [ 62%] 2023-01-11T23:13:47.2065695Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2065914Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_complex32 PASSED [ 62%] 2023-01-11T23:13:47.2066102Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_complex64 PASSED [ 62%] 2023-01-11T23:13:47.2066289Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2066475Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2066657Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2066838Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2067023Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2067208Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2067400Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2067578Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2067759Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2067942Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2068126Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int8 PASSED [ 62%] 2023-01-11T23:13:47.2068303Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2068493Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2068680Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2068922Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2069106Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2069281Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2069458Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2069639Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2069896Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2070084Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2070269Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2070448Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2070627Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2070805Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_bool PASSED [ 62%] 2023-01-11T23:13:47.2070987Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2071164Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2071374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int8 PASSED [ 62%] 2023-01-11T23:13:47.2071561Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2071753Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2071938Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2072126Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2072307Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2072487Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2072654Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2072822Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_bool 
PASSED [ 62%]
2023-01-11T23:13:47.2072999Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_float64 PASSED [ 62%]
2023-01-11T23:13:47.2073164Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int32 PASSED [ 62%]
2023-01-11T23:13:47.2073327Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int8 PASSED [ 62%]
2023-01-11T23:13:47.2073494Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_uint8 PASSED [ 62%]
2023-01-11T23:13:47.2073670Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_complex32 PASSED [ 62%]
2023-01-11T23:13:47.2073840Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_complex64 PASSED [ 62%]
2023-01-11T23:13:47.2074010Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float16 PASSED [ 62%]
2023-01-11T23:13:47.2074172Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int16 PASSED [ 62%]
2023-01-11T23:13:47.2074374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int32 PASSED [ 62%]
2023-01-11T23:13:47.2074543Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int64 PASSED [ 62%]
2023-01-11T23:13:47.2074708Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int8 PASSED [ 62%]
2023-01-11T23:13:47.2074881Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_bfloat16 PASSED [ 62%]
2023-01-11T23:13:47.2075048Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2075226Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_complex32 PASSED [ 63%]
2023-01-11T23:13:47.2075399Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2075570Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2075734Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int32 PASSED [ 63%]
2023-01-11T23:13:47.2075907Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2076076Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2076250Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2076421Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_complex32 PASSED [ 63%]
2023-01-11T23:13:47.2076589Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2076755Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2076956Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2077118Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2077295Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2077467Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2077643Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2077823Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2077993Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2078162Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2078332Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2078506Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2078669Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2078837Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2079005Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2079172Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2079345Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2079515Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2079687Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2079879Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2080048Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2080208Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2080387Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2080567Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2080743Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2080917Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2081087Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2081258Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2081431Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2081600Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2081757Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2081924Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2082091Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2082256Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2082453Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2082619Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2082788Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int32 PASSED [ 63%]
2023-01-11T23:13:47.2082957Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2083121Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2083278Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2083445Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2083618Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2083788Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2083956Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2084124Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int32 PASSED [ 63%]
2023-01-11T23:13:47.2084290Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2084462Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2084621Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2084789Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2084955Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2085141Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2085317Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2085524Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2085706Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2085882Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2086058Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2086224Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2086393Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int32 PASSED [ 63%]
2023-01-11T23:13:47.2086565Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2086768Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2086963Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2087153Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2087337Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2087523Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2087709Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2087924Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2088102Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2088286Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2088476Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2088662Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2088847Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2089030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2089215Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2089397Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2089566Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2089733Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_complex32 PASSED [ 63%]
2023-01-11T23:13:47.2089906Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2090073Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2090240Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2090412Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2090583Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2090762Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2090963Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2091137Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2091298Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2091467Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2091637Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2091810Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2091985Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2092154Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2092324Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2092492Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2092659Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2092828Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2092999Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2093168Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2093357Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2093526Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2093701Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2093871Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2094045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2094215Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2094387Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2094749Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int32 PASSED [ 63%]
2023-01-11T23:13:47.2094924Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2095098Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2095268Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2095446Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2095632Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2095814Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex32 PASSED [ 63%]
2023-01-11T23:13:47.2095983Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2096163Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2096342Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2096559Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2096739Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2096920Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2097093Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2097267Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2097437Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2097600Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2097777Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2097955Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2098141Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2098324Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2098505Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2098678Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2098849Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2099058Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2099223Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2099399Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2099568Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2099756Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2099928Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2100099Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int32 PASSED [ 63%]
2023-01-11T23:13:47.2100276Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2100473Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2100667Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2100847Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2101034Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2101220Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2101400Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2101589Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2101774Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2101955Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2102159Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2102335Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2102506Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2102685Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2102855Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2103021Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2103197Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2103369Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_complex64 PASSED [ 64%]
2023-01-11T23:13:47.2103540Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2103708Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2103874Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2104034Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2104204Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2104375Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2104568Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2104738Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2104908Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2105075Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2105250Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2105416Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2105586Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2105762Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_complex32 PASSED [ 64%]
2023-01-11T23:13:47.2105935Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2106103Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_float32 PASSED [ 64%]
2023-01-11T23:13:47.2106276Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2106447Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2106619Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2106787Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2106948Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2107117Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2107300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2107475Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2107675Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2107856Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2108030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2108199Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2108368Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2108538Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2108720Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2108896Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_float32 PASSED [ 64%]
2023-01-11T23:13:47.2109072Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2109257Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2109454Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2109634Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2109869Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex64 PASSED [ 64%]
2023-01-11T23:13:47.2110041Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2110228Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2110397Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2110587Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_complex32 PASSED [ 64%]
2023-01-11T23:13:47.2110771Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_complex64 PASSED [ 64%]
2023-01-11T23:13:47.2110951Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2111131Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2111306Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2111480Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2111653Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2111825Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2111997Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_complex32 PASSED [ 64%]
2023-01-11T23:13:47.2112167Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_complex64 PASSED [ 64%]
2023-01-11T23:13:47.2112337Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2112508Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2112675Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2112844Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2113030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2113239Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex32 PASSED [ 64%]
2023-01-11T23:13:47.2113415Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2113593Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2113775Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2113960Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2114141Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2114333Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2114514Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float32 PASSED [ 64%]
2023-01-11T23:13:47.2114697Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2114872Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2115043Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2115219Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float32 PASSED [ 64%]
2023-01-11T23:13:47.2115395Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2115567Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2115779Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2115950Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2116126Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2116296Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2116464Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2116626Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2116793Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2116964Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex32 PASSED [ 64%]
2023-01-11T23:13:47.2117133Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float32 PASSED [ 64%]
2023-01-11T23:13:47.2117300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2117475Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2117647Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex32 PASSED [ 64%]
2023-01-11T23:13:47.2117814Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2117981Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2118142Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2118308Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2118477Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2118656Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2118857Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_complex64 PASSED [ 64%]
2023-01-11T23:13:47.2119025Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2119193Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2119363Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2119524Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2119700Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2119875Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_complex32 PASSED [ 64%]
2023-01-11T23:13:47.2120043Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2120211Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2120377Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2120542Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2120721Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2120895Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2121070Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2121276Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2121452Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2121627Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2121796Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2121969Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2122138Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2122320Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2122505Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex32 PASSED [ 64%]
2023-01-11T23:13:47.2122684Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex64 PASSED [ 64%]
2023-01-11T23:13:47.2122868Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2123048Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2123227Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2123402Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2123581Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2123756Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2123929Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2124101Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2124287Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2124471Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2124661Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2124845Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2125028Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float32 PASSED [ 64%]
2023-01-11T23:13:47.2125207Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2125391Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2125574Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2125746Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2125914Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float32 PASSED [ 64%]
2023-01-11T23:13:47.2126085Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2126259Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2126446Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2126632Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_float32 PASSED [ 64%]
2023-01-11T23:13:47.2126842Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2127026Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2127211Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2127404Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2127588Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_complex64 PASSED [ 64%]
2023-01-11T23:13:47.2127776Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2127962Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2128150Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2128336Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2128523Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2128703Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2128886Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2129069Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2129246Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2129412Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2129583Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2129778Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2129950Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2130122Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2130294Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2130506Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_bool SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2130724Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_complex32 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2130940Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2131146Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int64 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2131360Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_bool SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2131581Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex32 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2131794Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_float16 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2132010Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2132245Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int16 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2132461Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int64 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2132670Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int8 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2132884Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_uint8 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2133051Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2133217Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_complex32 PASSED [ 65%]
2023-01-11T23:13:47.2133385Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2133558Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2133730Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2133897Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2134062Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2134229Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2134400Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2134737Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2134902Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2135075Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2135245Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2135457Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2135632Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2135798Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2135965Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2136132Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2136301Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2136464Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2136631Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2136799Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2136966Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2137143Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2137316Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex32 PASSED [ 65%]
2023-01-11T23:13:47.2137487Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2137659Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2137854Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2138022Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2138191Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2138373Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:13:47.2138547Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2138724Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2138898Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2139068Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2139240Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2139409Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2139590Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_complex64 PASSED [ 65%]
2023-01-11T23:13:47.2139763Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2139932Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2140102Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2140273Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:13:47.2140445Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2140616Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2140785Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2140974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:13:47.2141150Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2141316Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2141484Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2141651Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2141815Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2141984Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2142147Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2142315Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2142495Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2142669Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2142839Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2143010Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2143177Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2143369Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2143539Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2143709Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2143874Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2144043Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2144213Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2144391Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_complex64 PASSED [ 65%]
2023-01-11T23:13:47.2144563Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2144731Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2144895Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2145062Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2145247Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex32 PASSED [ 65%]
2023-01-11T23:13:47.2145421Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2145597Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2145775Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2145956Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2146139Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_complex32 PASSED [ 65%]
2023-01-11T23:13:47.2146336Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2146508Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2146685Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2146855Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2147016Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2147187Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2147362Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2147534Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2147716Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex32 PASSED [ 65%]
2023-01-11T23:13:47.2147893Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2148066Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2148240Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2148412Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2148584Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2148792Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex32 PASSED [ 65%]
2023-01-11T23:13:47.2148968Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex64 PASSED [ 65%]
2023-01-11T23:13:47.2149146Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2149321Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2149502Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex64 PASSED [ 65%]
2023-01-11T23:13:47.2149759Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2149938Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2150110Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2150274Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2150442Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2150614Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2150793Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2150971Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2151147Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2151319Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2151487Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2151658Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2151818Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2152015Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2152204Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:13:47.2152391Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2152576Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex32 PASSED [ 65%]
2023-01-11T23:13:47.2152758Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2152939Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2153118Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2153300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2153471Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2153644Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2153822Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2153994Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2154164Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2154405Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2154577Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2154754Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2154928Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2155091Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2155260Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2155430Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2155607Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2155784Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2155958Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2156141Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_complex128 PASSED [ 66%]
2023-01-11T23:13:47.2156320Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_complex64 PASSED [ 66%]
2023-01-11T23:13:47.2156491Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2156654Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2156821Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2156998Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex128 PASSED [ 66%]
2023-01-11T23:13:47.2157175Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2157369Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_bool PASSED [ 66%]
2023-01-11T23:13:47.2157553Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex128 PASSED [ 66%]
2023-01-11T23:13:47.2157732Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex32 PASSED [ 66%]
2023-01-11T23:13:47.2157908Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_float16 PASSED [ 66%]
2023-01-11T23:13:47.2158085Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2158252Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int16 PASSED [ 66%]
2023-01-11T23:13:47.2158423Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2158596Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2158771Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_bool PASSED [ 66%]
2023-01-11T23:13:47.2158944Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float16 PASSED [ 66%]
2023-01-11T23:13:47.2159117Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2159287Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2159460Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2159632Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_bool PASSED [ 66%]
2023-01-11T23:13:47.2159799Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float16 PASSED [ 66%]
2023-01-11T23:13:47.2159993Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2160163Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2160333Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2160500Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2160679Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2160857Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2161028Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2161192Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2161367Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:13:47.2161539Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_bool PASSED [ 66%]
2023-01-11T23:13:47.2161713Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_complex32 PASSED [ 66%]
2023-01-11T23:13:47.2161883Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_float16 PASSED [ 66%]
2023-01-11T23:13:47.2162052Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2162222Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2162390Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int16 PASSED [ 66%]
2023-01-11T23:13:47.2162560Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2162720Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2162912Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2163095Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex32 PASSED [ 66%]
2023-01-11T23:13:47.2163274Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex64 PASSED [ 66%]
2023-01-11T23:13:47.2163449Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2163620Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int16 PASSED [ 66%]
2023-01-11T23:13:47.2163790Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2163964Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2164137Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2164302Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:13:47.2164475Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_bool PASSED [ 66%]
2023-01-11T23:13:47.2164653Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_complex64 PASSED [ 66%]
2023-01-11T23:13:47.2164825Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float16 PASSED [ 66%]
2023-01-11T23:13:47.2165000Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2165171Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2165366Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2165536Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2165707Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2165881Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_complex128 PASSED [ 66%]
2023-01-11T23:13:47.2166061Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_complex64 PASSED [ 66%]
2023-01-11T23:13:47.2166233Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2166402Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2166572Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2166744Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_bool PASSED [ 66%]
2023-01-11T23:13:47.2166922Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2167094Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2167263Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int16 PASSED [ 66%]
2023-01-11T23:13:47.2167428Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2167605Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_bool PASSED [ 66%]
2023-01-11T23:13:47.2167780Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2167963Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2168138Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:13:47.2168309Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2168506Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int16 PASSED [ 66%]
2023-01-11T23:13:47.2168678Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2168853Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2169033Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2169216Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2169395Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2169577Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2169748Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2169922Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int16 PASSED [ 66%]
2023-01-11T23:13:47.2170094Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2170263Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2170429Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:13:47.2170599Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2170769Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2170963Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2171130Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2171301Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2171478Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:13:47.2171646Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float16 PASSED [ 66%]
2023-01-11T23:13:47.2171816Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2171975Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int16 PASSED [ 66%]
2023-01-11T23:13:47.2172139Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2172307Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2172477Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2172644Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2172812Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int64
PASSED [ 66%] 2023-01-11T23:13:47.2172978Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int8 PASSED [ 66%] 2023-01-11T23:13:47.2173148Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_bool PASSED [ 66%] 2023-01-11T23:13:47.2173315Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_float64 PASSED [ 66%] 2023-01-11T23:13:47.2173475Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int16 PASSED [ 66%] 2023-01-11T23:13:47.2173641Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int32 PASSED [ 66%] 2023-01-11T23:13:47.2173808Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_uint8 PASSED [ 66%] 2023-01-11T23:13:47.2174001Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_float16 PASSED [ 66%] 2023-01-11T23:13:47.2174169Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_float64 PASSED [ 66%] 2023-01-11T23:13:47.2174332Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int32 PASSED [ 66%] 2023-01-11T23:13:47.2174608Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int8 PASSED [ 66%] 2023-01-11T23:13:47.2174789Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_bool PASSED [ 66%] 2023-01-11T23:13:47.2174962Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float16 PASSED [ 66%] 2023-01-11T23:13:47.2175142Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int16 PASSED [ 66%] 2023-01-11T23:13:47.2175320Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int64 PASSED [ 66%] 2023-01-11T23:13:47.2175494Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_bfloat16 PASSED [ 66%] 2023-01-11T23:13:47.2175676Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex128 PASSED [ 66%] 2023-01-11T23:13:47.2175854Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex32 PASSED [ 66%] 2023-01-11T23:13:47.2176027Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex64 PASSED [ 66%] 2023-01-11T23:13:47.2176197Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int16 PASSED [ 66%] 2023-01-11T23:13:47.2176368Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_bfloat16 PASSED [ 66%] 2023-01-11T23:13:47.2176537Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_complex128 PASSED [ 66%] 2023-01-11T23:13:47.2176755Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_complex32 PASSED [ 66%] 2023-01-11T23:13:47.2176930Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float32 PASSED [ 66%] 2023-01-11T23:13:47.2177098Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int16 PASSED [ 66%] 2023-01-11T23:13:47.2177268Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int32 PASSED [ 66%] 2023-01-11T23:13:47.2177438Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int64 PASSED [ 66%] 2023-01-11T23:13:47.2177609Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int8 PASSED [ 66%] 2023-01-11T23:13:47.2177775Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_uint8 PASSED [ 66%] 2023-01-11T23:13:47.2177952Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float16 PASSED [ 66%] 2023-01-11T23:13:47.2178117Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float64 PASSED [ 66%] 2023-01-11T23:13:47.2178288Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_bool PASSED [ 66%] 2023-01-11T23:13:47.2178455Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_float16 PASSED [ 66%] 2023-01-11T23:13:47.2178622Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_float64 PASSED [ 66%] 2023-01-11T23:13:47.2178797Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igammac_cuda_float32 PASSED [ 66%] 2023-01-11T23:13:47.2178968Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igammac_cuda_float64 PASSED [ 66%] 2023-01-11T23:13:47.2179138Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex128 PASSED [ 66%] 2023-01-11T23:13:47.2179316Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex32 PASSED [ 66%] 2023-01-11T23:13:47.2179517Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex64 PASSED [ 66%] 2023-01-11T23:13:47.2179731Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_bool PASSED [ 66%] 2023-01-11T23:13:47.2180019Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_complex64 PASSED [ 66%] 2023-01-11T23:13:47.2180235Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float64 PASSED [ 66%] 2023-01-11T23:13:47.2180426Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int16 PASSED [ 66%] 2023-01-11T23:13:47.2180629Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_bool PASSED [ 66%] 2023-01-11T23:13:47.2180814Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_complex64 PASSED [ 66%] 2023-01-11T23:13:47.2180994Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float16 PASSED [ 66%] 2023-01-11T23:13:47.2181179Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float64 PASSED [ 66%] 2023-01-11T23:13:47.2181363Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_bfloat16 PASSED [ 66%] 2023-01-11T23:13:47.2181530Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int64 PASSED [ 66%] 2023-01-11T23:13:47.2181793Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int8 PASSED [ 66%] 2023-01-11T23:13:47.2181992Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_bfloat16 PASSED [ 66%] 2023-01-11T23:13:47.2182196Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float16 PASSED [ 66%] 2023-01-11T23:13:47.2182377Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float64 PASSED [ 66%] 2023-01-11T23:13:47.2182640Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_bfloat16 PASSED [ 66%] 2023-01-11T23:13:47.2182817Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2182990Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2183159Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2183321Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2183495Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_complex32 PASSED [ 67%] 2023-01-11T23:13:47.2183671Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2183843Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2184015Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2184184Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2184355Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2184523Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2184692Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_complex32 PASSED [ 67%] 2023-01-11T23:13:47.2184866Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2185035Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2185202Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2185374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2185540Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2185737Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2185908Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2186075Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2186243Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2186414Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2186581Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2186750Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2186917Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2187092Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float64 PASSED [ 67%] 2023-01-11T23:13:47.2187262Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2187431Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2187608Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2187776Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2187946Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2188143Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2188310Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2188478Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2188650Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_complex32 PASSED [ 67%] 2023-01-11T23:13:47.2188816Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2188980Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float64 PASSED [ 67%] 2023-01-11T23:13:47.2189149Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2189312Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2189480Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2189645Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2189879Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2190044Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2190215Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2190381Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2190547Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2190704Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2190872Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2191048Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2191244Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_complex32 PASSED [ 67%] 2023-01-11T23:13:47.2191415Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2191584Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2191752Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2191923Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2192093Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2192259Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int8 PASSED [ 67%] 
2023-01-11T23:13:47.2192426Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2192619Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2192812Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2192999Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2193184Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2193364Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2193542Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2193759Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2193931Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2194112Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2194387Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2194592Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2194806Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_float64 PASSED [ 67%] 2023-01-11T23:13:47.2194989Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2195177Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2195350Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int64 XFAIL [ 67%] 2023-01-11T23:13:47.2195522Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_uint8 XFAIL [ 67%] 2023-01-11T23:13:47.2195686Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2195855Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2196025Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2196195Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_float64 PASSED [ 67%] 2023-01-11T23:13:47.2196364Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2196540Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2196712Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2196922Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2197097Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2197257Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2197435Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2197606Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2197772Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float64 PASSED [ 67%] 2023-01-11T23:13:47.2197942Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2198111Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2198284Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2198453Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2198622Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2198781Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2198951Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2199120Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2199318Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2199486Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2199656Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_float64 PASSED [ 67%] 2023-01-11T23:13:47.2199825Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2200016Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2200211Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2200394Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2200583Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2200774Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2200963Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2201140Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2201323Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2201507Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2201686Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2201860Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2202036Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2202224Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2202428Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2202609Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2202785Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2202968Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2203149Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2203328Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2203502Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2203664Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2203849Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2204031Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2204207Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2204383Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2204561Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2204740Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2204942Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2205120Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2205286Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2205466Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2205648Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2205825Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2205994Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2206165Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int16 XFAIL [ 67%] 2023-01-11T23:13:47.2206339Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int32 XFAIL [ 67%] 2023-01-11T23:13:47.2206509Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2206690Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_bfloat16 PASSED [ 67%] 
2023-01-11T23:13:47.2206855Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2207030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2207202Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2207371Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2207540Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2207712Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2207904Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2208072Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float64 PASSED [ 67%] 2023-01-11T23:13:47.2208232Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2208398Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2208560Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2208723Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2208910Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2209109Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_complex32 PASSED [ 67%] 2023-01-11T23:13:47.2209316Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_float64 PASSED [ 67%] 2023-01-11T23:13:47.2209501Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2209678Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2209844Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2210019Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2210191Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2210360Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2210554Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_complex128 PASSED [ 68%] 2023-01-11T23:13:47.2210727Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2210895Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2211095Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2211286Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_bool PASSED [ 68%] 2023-01-11T23:13:47.2211471Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2211659Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2211844Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2212037Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2212225Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_uint8 PASSED [ 68%] 2023-01-11T23:13:47.2212423Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_complex128 PASSED [ 68%] 2023-01-11T23:13:47.2212620Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2212813Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2213000Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2213188Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_uint8 PASSED [ 68%] 2023-01-11T23:13:47.2213383Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2213556Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_bool PASSED [ 68%] 2023-01-11T23:13:47.2213730Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2213900Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2214067Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int32 PASSED [ 68%] 2023-01-11T23:13:47.2214233Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2214401Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int8 PASSED [ 68%] 2023-01-11T23:13:47.2214781Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_bool PASSED [ 68%] 2023-01-11T23:13:47.2214965Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex64 PASSED [ 68%] 2023-01-11T23:13:47.2215130Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2215297Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int32 PASSED [ 68%] 2023-01-11T23:13:47.2215462Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2215628Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int8 PASSED [ 68%] 2023-01-11T23:13:47.2215800Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2215965Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_bool PASSED [ 68%] 2023-01-11T23:13:47.2216191Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2216361Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2216523Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2216690Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2216856Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int32 PASSED [ 68%] 2023-01-11T23:13:47.2217024Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2217186Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int8 PASSED [ 68%] 2023-01-11T23:13:47.2217365Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2217544Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2217719Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2217894Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2218057Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2218241Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex128 PASSED [ 68%] 2023-01-11T23:13:47.2218425Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex32 PASSED [ 68%] 2023-01-11T23:13:47.2218601Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2218777Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int32 PASSED [ 68%] 2023-01-11T23:13:47.2218953Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2219160Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_uint8 PASSED [ 68%] 2023-01-11T23:13:47.2219335Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2219503Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_bool PASSED [ 68%] 2023-01-11T23:13:47.2219670Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2219838Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2220007Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int32 PASSED [ 68%] 2023-01-11T23:13:47.2220179Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int8 PASSED [ 68%] 2023-01-11T23:13:47.2220352Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_uint8 PASSED [ 68%] 2023-01-11T23:13:47.2220541Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2220729Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2220901Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_complex64 PASSED [ 68%] 2023-01-11T23:13:47.2221068Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2221227Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2221392Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int32 PASSED [ 68%] 2023-01-11T23:13:47.2221558Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int8 PASSED [ 68%] 2023-01-11T23:13:47.2221756Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex32 PASSED [ 68%] 2023-01-11T23:13:47.2221929Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex64 PASSED [ 68%] 2023-01-11T23:13:47.2222099Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2222267Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2222435Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int8 PASSED [ 68%] 2023-01-11T23:13:47.2222602Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_uint8 PASSED [ 68%] 2023-01-11T23:13:47.2222814Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_bfloat16 SKIPPED (Expected: empty is not comparable) [ 68%] 2023-01-11T23:13:47.2223033Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 68%] 2023-01-11T23:13:47.2223249Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 68%] 2023-01-11T23:13:47.2223460Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int64 SKIPPED (Expected: empty is not comparable) [ 68%] 2023-01-11T23:13:47.2223674Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int8 SKIPPED (Expected: empty is not comparable) [ 68%] 2023-01-11T23:13:47.2223885Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_uint8 SKIPPED (Expected: empty is not comparable) [ 68%] 2023-01-11T23:13:47.2224119Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_bfloat16 SKIPPED (Expected: empty_strided is not comparable) [ 68%] 2023-01-11T23:13:47.2224352Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_bool SKIPPED (Expected: empty_strided is not comparable) [ 68%] 2023-01-11T23:13:47.2224610Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_complex128 SKIPPED (Expected: empty_strided is not comparable) [ 68%] 2023-01-11T23:13:47.2224852Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_complex64 SKIPPED (Expected: empty_strided is not comparable) [ 68%] 2023-01-11T23:13:47.2225079Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int16 SKIPPED (Expected: empty_strided is not comparable) [ 68%] 2023-01-11T23:13:47.2225300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int32 SKIPPED (Expected: empty_strided is not comparable) [ 68%] 2023-01-11T23:13:47.2225526Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int64 SKIPPED (Expected: empty_strided is not comparable) [ 68%] 2023-01-11T23:13:47.2225756Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int8 SKIPPED (Expected: empty_strided is 
not comparable) [ 68%] 2023-01-11T23:13:47.2225937Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2226118Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex128 PASSED [ 68%] 2023-01-11T23:13:47.2226300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex64 PASSED [ 68%] 2023-01-11T23:13:47.2226475Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2226648Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int8 PASSED [ 68%] 2023-01-11T23:13:47.2226824Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2227011Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_bool PASSED [ 68%] 2023-01-11T23:13:47.2227191Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_complex128 PASSED [ 68%] 2023-01-11T23:13:47.2227372Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2227542Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2227708Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2227875Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_uint8 PASSED [ 68%] 2023-01-11T23:13:47.2228042Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_bool PASSED [ 68%] 2023-01-11T23:13:47.2228220Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex64 PASSED [ 68%] 2023-01-11T23:13:47.2228398Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2228564Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2228742Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int8 PASSED [ 68%] 2023-01-11T23:13:47.2228917Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_uint8 PASSED [ 68%] 2023-01-11T23:13:47.2229094Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nextafter_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2229337Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_bfloat16 SKIPPED (Expected: dropout is not comparable) [ 68%] 2023-01-11T23:13:47.2229576Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_float16 SKIPPED (Expected: dropout is not comparable) [ 68%] 2023-01-11T23:13:47.2229900Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_float32 SKIPPED (Expected: dropout is not comparable) [ 68%] 2023-01-11T23:13:47.2230143Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2230355Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2230637Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float16 SKIPPED (Expected: dropout is not comparable) [ 68%] 
2023-01-11T23:13:47.2230860Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float64 SKIPPED (Expected: dropout is not comparable) [ 68%] 2023-01-11T23:13:47.2231046Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2231232Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2231420Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2231605Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2231789Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2231981Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2232168Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2232355Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2232582Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2232765Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2232955Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2233141Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int32 PASSED [ 68%] 2023-01-11T23:13:47.2233328Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2233531Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2233725Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2233912Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2234102Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2234294Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_complex64 PASSED [ 68%] 2023-01-11T23:13:47.2234473Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2234667Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2234856Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2235047Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2235240Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2235449Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2235685Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex64 PASSED [ 68%] 2023-01-11T23:13:47.2235890Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2236092Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2236292Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2236485Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2236691Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_uint8 PASSED [ 68%] 2023-01-11T23:13:47.2236878Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2237065Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2237255Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2237444Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2237631Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2237818Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2238024Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2238209Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2238405Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2238604Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2238802Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2238999Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2239190Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2239389Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2239582Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pdist_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2239768Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pdist_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2239964Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2240154Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2240346Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2240536Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2240732Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2240985Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2241174Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2241359Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2241547Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2241732Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2241921Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2242099Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2242286Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2242470Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2242653Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2242835Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2243018Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2243200Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2243405Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2243595Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2243793Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2243994Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2244192Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2244391Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2244594Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2244793Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2244997Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2245196Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2245390Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2245584Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2245774Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2245978Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2246228Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2246496Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2246694Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2246885Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2247077Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2247266Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2247451Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2247645Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2247846Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2248058Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2248264Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2248466Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2248666Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2248889Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2249086Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2249266Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_complex128 
PASSED [ 69%] 2023-01-11T23:13:47.2249435Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_bool PASSED [ 69%] 2023-01-11T23:13:47.2249601Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_complex32 PASSED [ 69%] 2023-01-11T23:13:47.2249770Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2249944Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2250113Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2250280Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2250460Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2250643Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2250823Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2250994Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2251157Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2251328Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2251501Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2251680Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2251878Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2252050Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2252224Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2252393Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2252566Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2252732Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2252904Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2253078Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2253248Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2253417Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2253592Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2253760Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_bool PASSED [ 69%] 2023-01-11T23:13:47.2253938Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2254100Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2254267Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2254464Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2254780Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2254961Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2255139Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2255309Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2255475Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2255645Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2255807Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_bool PASSED [ 69%] 2023-01-11T23:13:47.2255986Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2256161Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_complex32 PASSED [ 69%] 2023-01-11T23:13:47.2256332Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2256501Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2256667Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2256854Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2257082Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2257304Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2257471Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2257641Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2257866Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_bool PASSED [ 69%] 2023-01-11T23:13:47.2258045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2258216Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2258383Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2258551Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2258718Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2258909Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2259088Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_complex64 PASSED [ 69%] 
2023-01-11T23:13:47.2259275Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2259460Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2259642Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2259814Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_bool PASSED [ 69%] 2023-01-11T23:13:47.2259990Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2260160Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2260365Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2260540Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2260705Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2260887Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2261066Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex32 PASSED [ 69%] 2023-01-11T23:13:47.2261246Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2261426Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2261601Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2261776Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2261955Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2262133Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex32 PASSED [ 69%] 2023-01-11T23:13:47.2262303Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2262475Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2262647Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2262821Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2262993Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2263168Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2263358Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_bool PASSED [ 69%] 2023-01-11T23:13:47.2263533Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2263696Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2263864Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2264030Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2264198Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2264365Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2264536Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_bool PASSED [ 69%] 2023-01-11T23:13:47.2264717Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2264888Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2265055Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2265215Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2265382Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2265549Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2265722Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2265918Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2266087Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2266265Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_complex128 PASSED [ 70%] 2023-01-11T23:13:47.2266439Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_complex32 PASSED [ 70%] 2023-01-11T23:13:47.2266609Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_complex64 PASSED [ 70%] 2023-01-11T23:13:47.2266772Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2266939Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2267106Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2267278Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2267457Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_complex128 PASSED [ 70%] 2023-01-11T23:13:47.2267631Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_complex64 PASSED [ 70%] 2023-01-11T23:13:47.2267801Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2267969Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2268138Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2268298Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2268465Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2268636Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_bfloat16 PASSED [ 70%] 
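(Aside: each test_python_ref_torch_fallback entry above exercises one torch._refs Python reference implementation for one op and dtype, checking that the reference, executed with ordinary torch operations, agrees with the eager op. A minimal sketch of that comparison, simplified and hypothetical rather than the actual harness in test_ops.py:

    import torch
    import torch._refs as refs

    # Pick CUDA when available, matching the TestCommonCUDA runs above.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(4, 5, device=device)

    eager = torch.nn.functional.relu(x)  # the eager ATen op
    ref = refs.nn.functional.relu(x)     # the Python reference, run via torch ops
    torch.testing.assert_close(ref, eager)

The real tests iterate over OpInfo sample inputs per dtype, which is why the same op appears once per dtype in this log.)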
2023-01-11T23:13:47.2268825Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2269002Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_complex32 PASSED [ 70%] 2023-01-11T23:13:47.2269196Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_complex64 PASSED [ 70%] 2023-01-11T23:13:47.2269389Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2269556Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2269795Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2269969Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2270139Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2270316Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2270494Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex128 PASSED [ 70%] 2023-01-11T23:13:47.2270675Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex32 PASSED [ 70%] 2023-01-11T23:13:47.2270854Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex64 PASSED [ 70%] 2023-01-11T23:13:47.2271027Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2271197Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2271357Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2271565Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2271735Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2271900Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2272066Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2272238Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2272406Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2272574Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2272746Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2272910Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2273081Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2273248Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2273418Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2273585Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2273751Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2273920Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2274092Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex128 PASSED [ 70%] 2023-01-11T23:13:47.2274262Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2274453Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2274622Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2274787Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2274961Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2275125Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2275298Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_complex64 PASSED [ 70%] 2023-01-11T23:13:47.2275471Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2275643Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2275803Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2275970Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2276144Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_complex128 PASSED [ 70%] 2023-01-11T23:13:47.2276311Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2276479Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2276645Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2276834Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2277049Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2277240Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2277419Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2277605Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2277788Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2277973Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2278155Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2278341Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2278526Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2278716Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2278896Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2279077Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2279278Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2279480Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2279658Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2279841Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2280042Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2280223Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2280397Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2280577Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2280756Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2280923Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2281107Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2281285Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2281464Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2281639Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2281815Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2281990Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2282164Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2282336Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2282531Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2282708Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2282890Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2283060Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2283246Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2283432Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2283616Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2283797Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2283976Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2284173Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2284376Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex128 PASSED [ 70%] 2023-01-11T23:13:47.2284578Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex32 PASSED [ 70%] 2023-01-11T23:13:47.2284778Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex64 PASSED [ 70%] 2023-01-11T23:13:47.2284980Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2285175Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2285373Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2285572Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2285756Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2285931Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2286112Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2286292Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2286470Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2286676Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2286880Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2287082Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2287281Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2287478Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2287678Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2287871Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2288096Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2288296Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2288491Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2288691Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2288870Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2289049Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2289238Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2289419Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2289596Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2289779Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2289959Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2290136Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2290314Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2290506Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2290707Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_complex128 PASSED [ 70%] 2023-01-11T23:13:47.2290936Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2291134Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2291328Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2291513Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2291701Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2291884Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2292073Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2292260Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2292456Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2292650Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2292844Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2293029Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2293204Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2293415Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2293598Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2293779Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2293956Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2294136Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2294316Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2294643Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2294824Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2294994Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2295175Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2295348Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2295519Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2295688Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2295855Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2296025Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2296197Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2296371Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2296574Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2296745Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2296922Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2297103Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex32 PASSED [ 71%] 2023-01-11T23:13:47.2297282Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2297454Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2297623Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2297792Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2297963Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2298124Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2298301Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex32 PASSED [ 71%] 2023-01-11T23:13:47.2298472Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2298643Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2298810Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2298977Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2299182Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2299355Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2299526Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2299688Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2299856Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2300032Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2300208Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2300383Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2300554Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2300722Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2300891Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2301049Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2301215Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2301377Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2301548Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_complex32 PASSED [ 71%] 2023-01-11T23:13:47.2301715Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_float16 PASSED [ 71%] 
2023-01-11T23:13:47.2301884Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2302075Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2302252Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2302435Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2302610Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2302787Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2302961Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2303125Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2303295Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2303461Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2303628Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2303787Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2303953Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2304113Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2304290Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2304465Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex32 PASSED [ 71%] 2023-01-11T23:13:47.2304660Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2304825Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2304990Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2305157Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2305329Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2305495Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_complex32 PASSED [ 71%] 2023-01-11T23:13:47.2305665Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2305836Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2306024Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2306207Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2306388Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2306567Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2306746Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2306920Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2307082Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2307255Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2307427Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2307616Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2307782Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2307948Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2308112Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2308290Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2308460Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2308620Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2308796Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2308966Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2309157Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2309361Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2309561Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2309796Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2309966Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2310138Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex32 PASSED [ 71%] 2023-01-11T23:13:47.2310340Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2310513Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2310681Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2310849Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2311014Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2311180Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2311352Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex64 PASSED [ 71%] 
2023-01-11T23:13:47.2311521Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2311692Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2311849Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2312019Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2312198Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_indices_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2312373Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2312554Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2312732Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_complex32 XFAIL [ 71%] 2023-01-11T23:13:47.2312910Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2313091Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2313288Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2313459Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2313635Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2313805Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2313975Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2314144Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2314317Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2314498Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2314678Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex32 PASSED [ 71%] 2023-01-11T23:13:47.2314838Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2315017Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2315198Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2315373Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2315552Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2315752Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2315925Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2316112Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2316293Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2316460Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2316631Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2316802Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2316974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2317146Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2317315Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2317487Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2317658Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2317836Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2318006Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2318183Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2318354Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2318524Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2318694Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2318894Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2319075Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2319250Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2319418Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2319585Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2319759Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2319937Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2320105Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2320275Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2320447Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int16 PASSED [ 72%] 2023-01-11T23:13:47.2320617Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int32 PASSED [ 72%] 2023-01-11T23:13:47.2320785Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int8 PASSED [ 72%] 2023-01-11T23:13:47.2320953Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_uint8 PASSED [ 72%] 2023-01-11T23:13:47.2321112Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_bfloat16 PASSED [ 72%] 2023-01-11T23:13:47.2321305Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_bool PASSED [ 72%] 2023-01-11T23:13:47.2321482Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_complex128 PASSED [ 72%] 2023-01-11T23:13:47.2321659Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_complex32 PASSED [ 72%] 2023-01-11T23:13:47.2321828Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float64 PASSED [ 72%] 2023-01-11T23:13:47.2321998Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int32 PASSED [ 72%] 2023-01-11T23:13:47.2322170Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_bfloat16 PASSED [ 72%] 2023-01-11T23:13:47.2322339Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_bool PASSED [ 72%] 2023-01-11T23:13:47.2322509Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int32 PASSED [ 72%] 2023-01-11T23:13:47.2322673Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_uint8 PASSED [ 72%] 2023-01-11T23:13:47.2322850Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_bfloat16 PASSED [ 72%] 2023-01-11T23:13:47.2323022Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_bool PASSED [ 72%] 2023-01-11T23:13:47.2323190Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int32 PASSED [ 72%] 2023-01-11T23:13:47.2323359Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int8 PASSED [ 72%] 2023-01-11T23:13:47.2323530Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_bfloat16 PASSED [ 72%] 2023-01-11T23:13:47.2323699Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float16 PASSED [ 72%] 2023-01-11T23:13:47.2323866Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float64 PASSED [ 72%] 2023-01-11T23:13:47.2324030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int64 PASSED [ 72%] 2023-01-11T23:13:47.2324199Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int8 PASSED [ 72%] 2023-01-11T23:13:47.2324390Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_uint8 PASSED [ 72%] 2023-01-11T23:13:47.2324569Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_bfloat16 PASSED [ 72%] 2023-01-11T23:13:47.2324739Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float16 PASSED [ 72%] 2023-01-11T23:13:47.2324905Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int16 PASSED [ 72%] 2023-01-11T23:13:47.2325073Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int64 PASSED [ 72%] 2023-01-11T23:13:47.2325245Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_bfloat16 PASSED [ 72%] 2023-01-11T23:13:47.2325413Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_bool PASSED [ 72%] 2023-01-11T23:13:47.2325579Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_complex32 PASSED [ 72%] 
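(Aside: besides PASSED, this shard reports XFAIL for tests the suite expects to fail, e.g. the true_divide complex32 entry above, and SKIPPED with the skip reason in parentheses, as in the as_strided entries below. A minimal, hypothetical pytest sketch, not from PyTorch's suite, that yields all three statuses under pytest -v:

    import pytest

    def test_passes():                    # reported as PASSED
        assert 1 + 1 == 2

    @pytest.mark.xfail(reason="known divergence", strict=False)
    def test_expected_failure():          # reported as XFAIL when it fails
        assert 1 + 1 == 3

    @pytest.mark.skip(reason="Errors when storage_offset is included")
    def test_skipped():                   # reported as SKIPPED (reason shown)
        pass

PyTorch's suite typically applies the equivalent skip/expected-failure decorators through its OpInfo metadata rather than bare pytest marks.)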
2023-01-11T23:13:47.2325754Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2325925Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_float16 PASSED [ 72%]
2023-01-11T23:13:47.2326094Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_float64 PASSED [ 72%]
2023-01-11T23:13:47.2326266Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int32 PASSED [ 72%]
2023-01-11T23:13:47.2326435Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int64 PASSED [ 72%]
2023-01-11T23:13:47.2326602Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int8 PASSED [ 72%]
2023-01-11T23:13:47.2326864Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_native_batch_norm_cuda_bfloat16 PASSED [ 72%]
2023-01-11T23:13:47.2327050Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_var_mean_cuda_float64 PASSED [ 72%]
2023-01-11T23:13:47.2327227Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_bfloat16 PASSED [ 72%]
2023-01-11T23:13:47.2327400Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_bool PASSED [ 72%]
2023-01-11T23:13:47.2327583Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_complex32 PASSED [ 72%]
2023-01-11T23:13:47.2327765Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2327945Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_float64 PASSED [ 72%]
2023-01-11T23:13:47.2328123Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_int16 PASSED [ 72%]
2023-01-11T23:13:47.2328300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_int32 PASSED [ 72%]
2023-01-11T23:13:47.2328478Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_int8 PASSED [ 72%]
2023-01-11T23:13:47.2328648Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_uint8 PASSED [ 72%]
2023-01-11T23:13:47.2328810Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_H_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2328978Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_H_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2329142Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_T_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2329320Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___getitem___cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2329490Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___radd___cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2329665Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rdiv___cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2329841Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmatmul___cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2330036Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmod___cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2330202Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmul___cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2330375Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rpow___cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2330543Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rpow___cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2330708Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rsub___cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2330897Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager__softmax_backward_data_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2331070Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acosh_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2331239Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_add_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2331408Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_add_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2331578Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcdiv_cuda_complex64 XFAIL [ 72%]
2023-01-11T23:13:47.2331738Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcmul_cuda_float32 XFAIL [ 72%]
2023-01-11T23:13:47.2331910Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2332076Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2332258Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_decomposed_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2332456Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmv_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2332619Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_all_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2332788Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_amax_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2332952Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_amin_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2333122Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_aminmax_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2333281Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argmax_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2333448Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argmin_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2333618Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argsort_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2333793Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argwhere_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2333964Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argwhere_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2334185Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_cuda_float32 SKIPPED (Errors when storage_offset is included) [ 72%]
2023-01-11T23:13:47.2334430Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_scatter_cuda_complex64 SKIPPED (Fails in most cases, passes on LAZY for some reason) [ 72%]
2023-01-11T23:13:47.2334911Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_scatter_cuda_float32 SKIPPED (Fails in most cases, passes on LAZY for some reason) [ 72%]
2023-01-11T23:13:47.2335083Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asin_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2335245Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atan_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2335429Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_1d_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2335603Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_1d_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2335828Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_2d_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2336005Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_2d_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2336183Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_3d_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2336355Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_3d_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2336532Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_baddbmm_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2336702Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bfloat16_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2336873Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_block_diag_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2337045Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_block_diag_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2337215Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bool_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2337415Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_shapes_cuda_float32 SKIPPED (Skipped!) [ 72%]
2023-01-11T23:13:47.2337603Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_tensors_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2337787Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_tensors_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2337969Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_to_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2338143Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bucketize_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2338359Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cartesian_prod_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2338520Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdist_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2338696Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdouble_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2338866Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cfloat_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2339034Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chalf_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2339202Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_char_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2339388Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_inverse_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2339571Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_inverse_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2339756Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_solve_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2339937Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_solve_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2340101Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2340273Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_max_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2340449Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_column_stack_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2340629Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_combinations_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2340800Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2340983Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_constant_pad_nd_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2341167Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_constant_pad_nd_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2341338Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_corrcoef_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2341524Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cos_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2341693Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cos_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2341862Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cosh_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2342045Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_count_nonzero_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2342215Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cov_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2342384Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cross_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2342553Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cummax_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2342722Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cummin_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2342902Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumprod_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2343067Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumprod_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2343260Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumulative_trapezoid_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2343449Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumulative_trapezoid_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2343619Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_deg2rad_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2343787Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2343963Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagflat_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2344172Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_copy_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2344354Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_copy_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2344525Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diff_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2344683Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dist_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2344874Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_no_rounding_mode_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2345060Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_trunc_rounding_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2345230Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dot_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2345399Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dot_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2345574Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dstack_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2345750Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_einsum_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2345919Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_einsum_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2346112Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_cuda_complex64 SKIPPED (Skipped!) [ 72%]
2023-01-11T23:13:47.2346281Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_like_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2354216Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_eq_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2354399Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_erf_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2354561Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_erfc_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2354733Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp2_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2354954Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2355119Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2355295Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2355457Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expm1_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2355640Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_eye_cuda_complex64 SKIPPED (Skipped!) [ 73%]
2023-01-11T23:13:47.2355814Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftn_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2355997Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftshift_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2356178Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft2_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2356351Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfftn_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2356534Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft2_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2356709Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2356883Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2357054Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftn_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2357234Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftshift_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2357409Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ihfftn_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2357633Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft2_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2357810Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2357992Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfftn_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2358169Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfft2_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2358343Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fill_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2358512Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fill_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2358680Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flatten_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2358852Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flip_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2359030Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fliplr_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2359204Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flipud_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2359376Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flipud_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2359550Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2359722Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2359902Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_power_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2360072Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_floor_divide_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2360242Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fmax_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2360418Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fmod_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2360583Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_frac_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2360790Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_like_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2360967Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gather_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2361137Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gather_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2361303Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ge_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2361473Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_geqrf_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2361645Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gradient_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2361814Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gt_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2361988Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_half_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2362167Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_heaviside_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2362337Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_histc_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2362514Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hstack_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2362687Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_add_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2362867Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_copy_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2363035Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_copy_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2363216Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_fill_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2363419Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_put_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2363594Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_put_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2363778Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_select_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2363952Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_inner_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2364122Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_int_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2364297Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isclose_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2364469Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isfinite_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2364630Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isin_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2364806Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isinf_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2364979Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isinf_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2365154Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isposinf_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2365329Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isreal_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2365502Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_istft_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2365697Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_2inputs_2outputs_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2365902Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_4inputs_with_extra_args_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2366106Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_4inputs_with_extra_args_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2366289Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2366512Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_return_by_ref_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2366702Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_unary_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2366887Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_unary_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2367058Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kron_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2367229Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kthvalue_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2367411Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cond_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2367587Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cond_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2367776Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cross_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2367950Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2368126Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2368317Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_singular_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2368496Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eig_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2368669Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eig_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2368847Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigh_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2369059Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvalsh_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2369245Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvalsh_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2369444Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_householder_product_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2369616Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2369790Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2369973Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_ex_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2370159Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2370352Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_ex_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2370536Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_solve_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2370734Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_grad_oriented_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2370918Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2371100Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_solve_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2371278Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_norm_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2371468Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_power_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2371653Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2371851Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_hermitian_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2372052Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_hermitian_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2372252Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2372457Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2372652Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_hermitian_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2372910Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_singular_cuda_complex64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 73%]
2023-01-11T23:13:47.2373160Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 73%]
2023-01-11T23:13:47.2373334Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_qr_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2373520Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_slogdet_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2373704Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2373881Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2374076Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_triangular_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2374258Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svd_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2374445Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorinv_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2374756Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorinv_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2375000Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorsolve_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2375180Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vander_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2375369Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vector_norm_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2375549Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linspace_cuda_complex64 XFAIL [ 73%]
2023-01-11T23:13:47.2375723Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linspace_cuda_float32 XFAIL [ 73%]
2023-01-11T23:13:47.2375897Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log10_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2376066Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log10_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2376235Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log2_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2376419Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2376607Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_with_dtype_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2376799Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_with_dtype_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2376995Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_not_cuda_complex64 SKIPPED (Skipped!) [ 73%]
2023-01-11T23:13:47.2377178Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_xor_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2377356Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logspace_cuda_complex64 XFAIL [ 73%]
2023-01-11T23:13:47.2377535Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logsumexp_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2377712Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_long_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2377881Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_long_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2378093Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_solve_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2378269Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_unpack_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2378447Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_unpack_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2378613Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mT_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2378788Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_amax_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2378960Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_amin_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2379142Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_argmax_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2379326Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_argmin_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2379516Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumprod_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2379698Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumsum_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2379875Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_logaddexp_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2380058Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_mean_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2380235Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_norm_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2380419Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_normalize_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2380631Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_scatter_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2380814Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_scatter_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2380995Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_select_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2381172Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_softmax_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2381347Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_std_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2381519Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_sum_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2381692Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_sum_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2381870Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_var_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2382047Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matmul_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2382220Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matmul_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2382405Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matrix_exp_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2382577Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_binary_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2382768Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_reduction_with_dim_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2382939Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mean_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2383100Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mean_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2383291Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_list_of_tensors_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2383494Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_variadic_tensors_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2383667Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_min_binary_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2383879Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_min_reduction_no_dim_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2384070Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_min_reduction_with_dim_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2384242Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_minimum_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2384412Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mm_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2384583Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_msort_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2384744Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mul_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2384918Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mv_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2385087Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mv_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2385278Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2385458Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanmedian_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2385640Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_copy_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2385831Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_dropout_backward_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2385997Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ne_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2386158Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_neg_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2386377Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_cuda_complex64 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2386552Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_full_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2386732Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_ones_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2386912Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_zeros_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2387090Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nextafter_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2387305Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2387508Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2387710Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2387911Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_alpha_dropout_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2388097Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool1d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2388305Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2388495Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_bilinear_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2388681Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_celu_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2388871Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv1d_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2389058Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv1d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2389251Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv2d_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2389461Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv2d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2389669Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose1d_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2389936Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2390131Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2390329Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_cosine_similarity_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2390523Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_cross_entropy_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2390711Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_elu_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2390908Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_embedding_bag_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2391134Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_feature_alpha_dropout_without_train_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2391339Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2391539Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2391719Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_glu_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2391914Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_grid_sample_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2392133Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardshrink_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2392326Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardsigmoid_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2392517Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_huber_loss_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2392713Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_area_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2392916Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2393115Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_linear_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2393315Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2393522Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2393699Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_kl_div_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2393892Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_l1_loss_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2394082Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_leaky_relu_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2394273Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_linear_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2394471Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_local_response_norm_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2394661Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_logsigmoid_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2394855Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool1d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2395041Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool3d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2395258Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool1d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2395451Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2395662Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2395856Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_normalize_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2396059Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_circular_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2396256Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_circular_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2396453Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_constant_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2396650Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_reflect_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2396849Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_replicate_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2397036Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pdist_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2397235Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_shuffle_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2397420Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2397616Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2397838Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_relu6_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2398026Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_relu_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2398214Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_silu_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2398406Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2398599Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2398802Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_with_dtype_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2399002Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2399190Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softshrink_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2399386Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softsign_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2399579Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_tanhshrink_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2399769Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_threshold_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2399985Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2400175Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_unfold_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2400353Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nonzero_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2400529Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2400707Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_fro_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2400898Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_inf_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2401074Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_inf_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2401260Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_normal_number_mean_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2401434Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_cuda_complex64 XFAIL [ 74%]
2023-01-11T23:13:47.2401602Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_cuda_float32 XFAIL [ 74%]
2023-01-11T23:13:47.2401779Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_outer_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2401961Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pca_lowrank_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2402138Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pinverse_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2402315Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pinverse_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2402480Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polar_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2402672Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_0_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2402882Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_2_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2403088Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_3_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2403266Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_positive_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2403468Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_positive_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2403643Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_prod_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2403811Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_put_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2403978Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_qr_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2404143Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_quantile_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2404320Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rand_like_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2404501Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randint_like_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2404676Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_like_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2404851Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ravel_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2405025Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_real_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2405206Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reciprocal_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2405380Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_renorm_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2405549Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_renorm_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2405723Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_as_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2405898Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_as_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2406075Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize_as__cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2406256Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_conj_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2406435Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_neg_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2406627Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_roll_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2406803Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2406998Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_decimals_3_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2407190Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_decimals_neg_3_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2407361Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsqrt_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2407531Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsub_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2407728Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scalar_tensor_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2407905Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2408092Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_amax_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2408278Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_amin_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2408460Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_mean_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2408641Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_searchsorted_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2408809Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_select_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2408978Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_select_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2409195Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_select_scatter_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2409373Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_short_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2409544Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sigmoid_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2409758Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_general_cosine_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2409973Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_general_hamming_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2410175Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_hann_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2410375Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_kaiser_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2410577Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_nuttall_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2410742Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sin_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2410917Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinc_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2411088Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinc_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2411259Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinh_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2411431Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2411603Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2411774Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sort_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2411981Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sparse_sampled_addmm_cuda_complex64 SKIPPED (Skipped!) [ 75%]
2023-01-11T23:13:47.2412199Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sparse_sampled_addmm_cuda_float32 SKIPPED (Skipped!) [ 75%]
2023-01-11T23:13:47.2412384Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_airy_ai_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2412570Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_y0_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2412770Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_u_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2413277Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 75%]
2023-01-11T23:13:47.2413644Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 75%]
2023-01-11T23:13:47.2413847Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_hermite_polynomial_h_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2414029Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i1_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2414229Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_laguerre_polynomial_l_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2414414Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_log_ndtr_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2414790Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_k0_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2414974Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_ndtr_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2415156Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_ndtri_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2415603Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 75%]
2023-01-11T23:13:47.2415991Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 75%]
2023-01-11T23:13:47.2416366Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 75%]
an unreasonably long time, #79528) [ 75%] 2023-01-11T23:13:47.2416738Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 75%] 2023-01-11T23:13:47.2416935Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_spherical_bessel_j0_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2417112Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2417294Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_list_args_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2417485Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2417660Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2417832Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_square_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2418001Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_square_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2418171Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2418348Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2418538Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_unbiased_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2418757Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_unbiased_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2418933Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stft_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2419104Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2419264Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2419443Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_to_size_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2419608Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2419784Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_lowrank_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2419962Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_symeig_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2420128Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_t_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2420308Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_along_dim_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2420476Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2420630Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tan_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2420800Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tanh_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2420976Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensor_split_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2421142Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tile_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2421337Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trace_cuda_complex64 PASSED [ 75%] 
2023-01-11T23:13:47.2421519Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2421697Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapezoid_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2421874Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapezoid_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2422045Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapz_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2422224Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triangular_solve_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2422390Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tril_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2422568Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2422740Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2422914Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unflatten_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2423097Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_copy_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2423266Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2423433Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2423606Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_uniform_cuda_complex64 XFAIL [ 75%]
2023-01-11T23:13:47.2423784Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unique_consecutive_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2423955Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unique_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2424143Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_unbiased_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2424322Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_unbiased_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2424568Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_complex_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2424746Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_copy_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2424918Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vsplit_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2425093Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vstack_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2425255Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_where_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2425424Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zero__cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2425591Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_cuda_complex64 XFAIL [ 75%]
2023-01-11T23:13:47.2425773Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_like_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2425958Z test_ops.py::TestCompositeComplianceCUDA::test_backward___getitem___cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2426135Z test_ops.py::TestCompositeComplianceCUDA::test_backward___rdiv___cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2426311Z test_ops.py::TestCompositeComplianceCUDA::test_backward___rmod___cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2426487Z test_ops.py::TestCompositeComplianceCUDA::test_backward___rpow___cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2426687Z test_ops.py::TestCompositeComplianceCUDA::test_backward__native_batch_norm_legit_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2426855Z test_ops.py::TestCompositeComplianceCUDA::test_backward_addbmm_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2427035Z test_ops.py::TestCompositeComplianceCUDA::test_backward_addcmul_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2427238Z test_ops.py::TestCompositeComplianceCUDA::test_backward_addmm_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2427414Z test_ops.py::TestCompositeComplianceCUDA::test_backward_addr_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2427592Z test_ops.py::TestCompositeComplianceCUDA::test_backward_amax_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2427766Z test_ops.py::TestCompositeComplianceCUDA::test_backward_amin_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2427940Z test_ops.py::TestCompositeComplianceCUDA::test_backward_asinh_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2428116Z test_ops.py::TestCompositeComplianceCUDA::test_backward_atan2_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2428291Z test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_2d_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2428470Z test_ops.py::TestCompositeComplianceCUDA::test_backward_bfloat16_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2428651Z test_ops.py::TestCompositeComplianceCUDA::test_backward_block_diag_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2428834Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cdouble_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2429012Z test_ops.py::TestCompositeComplianceCUDA::test_backward_chalf_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2429205Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_inverse_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2429416Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_solve_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2429619Z test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2429872Z test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_max_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2430042Z test_ops.py::TestCompositeComplianceCUDA::test_backward_clone_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2430223Z test_ops.py::TestCompositeComplianceCUDA::test_backward_complex_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2430417Z test_ops.py::TestCompositeComplianceCUDA::test_backward_constant_pad_nd_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2430596Z test_ops.py::TestCompositeComplianceCUDA::test_backward_copysign_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2430805Z test_ops.py::TestCompositeComplianceCUDA::test_backward_corrcoef_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2430985Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cos_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2431160Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cov_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2431335Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cross_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2431504Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cummax_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2431682Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cumprod_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2431861Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cumsum_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2432058Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cumulative_trapezoid_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2432242Z test_ops.py::TestCompositeComplianceCUDA::test_backward_diag_embed_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2432428Z test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_copy_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2432603Z test_ops.py::TestCompositeComplianceCUDA::test_backward_diff_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2432780Z test_ops.py::TestCompositeComplianceCUDA::test_backward_digamma_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2432973Z test_ops.py::TestCompositeComplianceCUDA::test_backward_div_floor_rounding_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2433153Z test_ops.py::TestCompositeComplianceCUDA::test_backward_div_trunc_rounding_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2433331Z test_ops.py::TestCompositeComplianceCUDA::test_backward_dsplit_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2433534Z test_ops.py::TestCompositeComplianceCUDA::test_backward_dstack_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2433711Z test_ops.py::TestCompositeComplianceCUDA::test_backward_einsum_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2433889Z test_ops.py::TestCompositeComplianceCUDA::test_backward_erf_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2434064Z test_ops.py::TestCompositeComplianceCUDA::test_backward_exp_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2434239Z test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2434413Z test_ops.py::TestCompositeComplianceCUDA::test_backward_expm1_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2434598Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fftshift_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2434771Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfft2_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2434954Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfftn_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2435129Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifftn_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2435313Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ihfftn_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2435493Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_irfft2_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2435669Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_irfftn_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2435846Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_rfft_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2436023Z test_ops.py::TestCompositeComplianceCUDA::test_backward_flatten_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2436191Z test_ops.py::TestCompositeComplianceCUDA::test_backward_flip_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2436370Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fliplr_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2436557Z test_ops.py::TestCompositeComplianceCUDA::test_backward_float_power_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2436758Z test_ops.py::TestCompositeComplianceCUDA::test_backward_floor_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2436936Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fmax_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2437114Z test_ops.py::TestCompositeComplianceCUDA::test_backward_frac_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2437291Z test_ops.py::TestCompositeComplianceCUDA::test_backward_frexp_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2437472Z test_ops.py::TestCompositeComplianceCUDA::test_backward_gradient_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2437662Z test_ops.py::TestCompositeComplianceCUDA::test_backward_grid_sampler_2d_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2437830Z test_ops.py::TestCompositeComplianceCUDA::test_backward_hsplit_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2438008Z test_ops.py::TestCompositeComplianceCUDA::test_backward_i0_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2438187Z test_ops.py::TestCompositeComplianceCUDA::test_backward_index_put_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2438366Z test_ops.py::TestCompositeComplianceCUDA::test_backward_ldexp_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2438541Z test_ops.py::TestCompositeComplianceCUDA::test_backward_lgamma_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2438729Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eigvals_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2438915Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eigvalsh_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2439096Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_inv_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2439273Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_inv_ex_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2439491Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lstsq_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2439691Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2439873Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2440057Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_solve_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2440246Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_matrix_norm_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2440438Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_matrix_power_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2440627Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_multi_dot_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2440810Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_norm_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2441011Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2441207Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_pinv_hermitian_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2441468Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 76%]
2023-01-11T23:13:47.2441656Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_slogdet_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2441841Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_svdvals_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2442034Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_tensorsolve_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2442220Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_vander_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2442404Z test_ops.py::TestCompositeComplianceCUDA::test_backward_log_softmax_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2442603Z test_ops.py::TestCompositeComplianceCUDA::test_backward_log_softmax_with_dtype_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2442802Z test_ops.py::TestCompositeComplianceCUDA::test_backward_logaddexp2_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2442979Z test_ops.py::TestCompositeComplianceCUDA::test_backward_logdet_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2443151Z test_ops.py::TestCompositeComplianceCUDA::test_backward_lu_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2443329Z test_ops.py::TestCompositeComplianceCUDA::test_backward_lu_unpack_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2443509Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_amax_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2443688Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_amin_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2443874Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_cumprod_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2444061Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_fill_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2444250Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_log_softmax_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2444434Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_logaddexp_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2444612Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_norm_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2444792Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_prod_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2444973Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_softmax_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2445156Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_std_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2445338Z test_ops.py::TestCompositeComplianceCUDA::test_backward_matrix_exp_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2445558Z test_ops.py::TestCompositeComplianceCUDA::test_backward_max_reduction_no_dim_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2445736Z test_ops.py::TestCompositeComplianceCUDA::test_backward_maximum_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2445914Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mean_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2446106Z test_ops.py::TestCompositeComplianceCUDA::test_backward_meshgrid_list_of_tensors_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2446307Z test_ops.py::TestCompositeComplianceCUDA::test_backward_meshgrid_variadic_tensors_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2446486Z test_ops.py::TestCompositeComplianceCUDA::test_backward_min_binary_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2446682Z test_ops.py::TestCompositeComplianceCUDA::test_backward_min_reduction_with_dim_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2446865Z test_ops.py::TestCompositeComplianceCUDA::test_backward_minimum_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2447041Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mm_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2447216Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mul_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2447392Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mv_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2447579Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2447774Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2447956Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nan_to_num_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2448141Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nanmedian_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2448317Z test_ops.py::TestCompositeComplianceCUDA::test_backward_narrow_cuda_float32 XFAIL [ 76%]
2023-01-11T23:13:47.2448508Z test_ops.py::TestCompositeComplianceCUDA::test_backward_native_layer_norm_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2448734Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2448968Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2449175Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2449369Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2449571Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2449769Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool1d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2449970Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool3d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2450216Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_bilinear_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2450432Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2450682Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2450892Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2451096Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cross_entropy_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2451299Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_embedding_bag_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2451492Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_embedding_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2451824Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_gaussian_nll_loss_cuda_float32 XFAIL [ 76%]
2023-01-11T23:13:47.2452037Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_gelu_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2452276Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_grid_sample_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2452479Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardshrink_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2452679Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardsigmoid_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2452880Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardswish_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2453093Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2453307Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2453510Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_linear_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2453711Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_layer_norm_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2453908Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_leaky_relu_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2454115Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_local_response_norm_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2454312Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_logsigmoid_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2454626Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_pool1d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2454823Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_pool2d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2455028Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool1d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2455297Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2455494Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2455692Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_nll_loss_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2455893Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2456098Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2456300Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2456500Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_rrelu_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2456704Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2456911Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2457106Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softmin_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2457305Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2457500Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softplus_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2457694Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softsign_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2457922Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_unfold_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2458103Z test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_fro_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2458286Z test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_inf_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2458463Z test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_nuc_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2458639Z test_ops.py::TestCompositeComplianceCUDA::test_backward_normal_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2458819Z test_ops.py::TestCompositeComplianceCUDA::test_backward_permute_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2458990Z test_ops.py::TestCompositeComplianceCUDA::test_backward_pinverse_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2459192Z test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_1_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2459391Z test_ops.py::TestCompositeComplianceCUDA::test_backward_positive_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2459598Z test_ops.py::TestCompositeComplianceCUDA::test_backward_pow_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2459781Z test_ops.py::TestCompositeComplianceCUDA::test_backward_prod_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2459958Z test_ops.py::TestCompositeComplianceCUDA::test_backward_qr_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2460134Z test_ops.py::TestCompositeComplianceCUDA::test_backward_ravel_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2460316Z test_ops.py::TestCompositeComplianceCUDA::test_backward_reciprocal_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2460498Z test_ops.py::TestCompositeComplianceCUDA::test_backward_remainder_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2460665Z test_ops.py::TestCompositeComplianceCUDA::test_backward_renorm_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2460858Z test_ops.py::TestCompositeComplianceCUDA::test_backward_repeat_interleave_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2461051Z test_ops.py::TestCompositeComplianceCUDA::test_backward_resolve_conj_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2461235Z test_ops.py::TestCompositeComplianceCUDA::test_backward_resolve_neg_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2461437Z test_ops.py::TestCompositeComplianceCUDA::test_backward_rot90_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2461628Z test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_3_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2461806Z test_ops.py::TestCompositeComplianceCUDA::test_backward_rsqrt_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2461991Z test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_add_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2462161Z test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2462353Z test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amax_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2462547Z test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_mean_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2462721Z test_ops.py::TestCompositeComplianceCUDA::test_backward_select_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2462911Z test_ops.py::TestCompositeComplianceCUDA::test_backward_select_scatter_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2463087Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sigmoid_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2463260Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sign_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2463437Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sinh_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2463622Z test_ops.py::TestCompositeComplianceCUDA::test_backward_slice_scatter_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2463806Z test_ops.py::TestCompositeComplianceCUDA::test_backward_softmax_with_dtype_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2464014Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sparse_sampled_addmm_cuda_float32 SKIPPED (Skipped!) [ 76%]
2023-01-11T23:13:47.2464223Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_entr_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2464411Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_ndtri_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2464628Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2464814Z test_ops.py::TestCompositeComplianceCUDA::test_backward_split_list_args_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2465003Z test_ops.py::TestCompositeComplianceCUDA::test_backward_split_with_sizes_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2465177Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sqrt_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2465355Z test_ops.py::TestCompositeComplianceCUDA::test_backward_square_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2465523Z test_ops.py::TestCompositeComplianceCUDA::test_backward_std_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2465702Z test_ops.py::TestCompositeComplianceCUDA::test_backward_std_mean_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2465888Z test_ops.py::TestCompositeComplianceCUDA::test_backward_std_unbiased_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2466061Z test_ops.py::TestCompositeComplianceCUDA::test_backward_stft_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2466240Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sum_to_size_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2466414Z test_ops.py::TestCompositeComplianceCUDA::test_backward_svd_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2466753Z test_ops.py::TestCompositeComplianceCUDA::test_backward_svd_lowrank_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 76%]
2023-01-11T23:13:47.2466941Z test_ops.py::TestCompositeComplianceCUDA::test_backward_symeig_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2467149Z test_ops.py::TestCompositeComplianceCUDA::test_backward_take_along_dim_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2467336Z test_ops.py::TestCompositeComplianceCUDA::test_backward_tanh_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2467584Z test_ops.py::TestCompositeComplianceCUDA::test_backward_to_sparse_cuda_float32 SKIPPED (Allowed exception) [ 76%]
2023-01-11T23:13:47.2467762Z test_ops.py::TestCompositeComplianceCUDA::test_backward_trace_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2467944Z test_ops.py::TestCompositeComplianceCUDA::test_backward_transpose_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2468120Z test_ops.py::TestCompositeComplianceCUDA::test_backward_trapz_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2468309Z test_ops.py::TestCompositeComplianceCUDA::test_backward_triangular_solve_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2468479Z test_ops.py::TestCompositeComplianceCUDA::test_backward_tril_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2468654Z test_ops.py::TestCompositeComplianceCUDA::test_backward_triu_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2468827Z test_ops.py::TestCompositeComplianceCUDA::test_backward_unbind_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2469011Z test_ops.py::TestCompositeComplianceCUDA::test_backward_unfold_copy_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2469188Z test_ops.py::TestCompositeComplianceCUDA::test_backward_unfold_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2469367Z test_ops.py::TestCompositeComplianceCUDA::test_backward_var_mean_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2469550Z test_ops.py::TestCompositeComplianceCUDA::test_backward_var_unbiased_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2469789Z test_ops.py::TestCompositeComplianceCUDA::test_backward_vdot_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2469979Z test_ops.py::TestCompositeComplianceCUDA::test_backward_view_as_complex_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2470154Z test_ops.py::TestCompositeComplianceCUDA::test_backward_vsplit_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2470356Z test_ops.py::TestCompositeComplianceCUDA::test_backward_vstack_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2470524Z test_ops.py::TestCompositeComplianceCUDA::test_backward_where_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2470703Z test_ops.py::TestCompositeComplianceCUDA::test_backward_xlogy_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2470878Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_H_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2471053Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_T_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2471235Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___radd___cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2471419Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rmatmul___cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2471597Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rsub___cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2471795Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__native_batch_norm_legit_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2471968Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_acosh_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2472150Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addcmul_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2472341Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_decomposed_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2472516Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmv_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2472726Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_all_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2472899Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_amax_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2473113Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_aminmax_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2473285Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_angle_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2473500Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argmin_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2473694Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_asinh_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2473867Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atan_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2474048Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_1d_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2474228Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_baddbmm_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2474440Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bfloat16_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:13:47.2474626Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_block_diag_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2474835Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bool_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2475061Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_broadcast_shapes_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2475256Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_broadcast_tensors_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2475435Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_broadcast_to_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2475646Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bucketize_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2475853Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_byte_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2476033Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cdouble_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2476205Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ceil_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2476444Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_chalf_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:13:47.2476654Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_char_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2476846Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_inverse_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2477023Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_chunk_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2477192Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2477373Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_max_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2477556Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_column_stack_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2477759Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_combinations_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2477996Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_complex_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2478224Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_contiguous_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2478408Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_copysign_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2478589Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_corrcoef_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2478764Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cosh_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2478974Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_count_nonzero_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2479149Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cov_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2479328Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cummax_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2479509Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cummin_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2479689Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumprod_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2479929Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumulative_trapezoid_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2480109Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diag_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2480295Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagflat_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2480486Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagonal_copy_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2480673Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagonal_scatter_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2480869Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_div_trunc_rounding_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2481047Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dot_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2481226Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dsplit_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2481407Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dstack_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2481583Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_einsum_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2481799Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_eq_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2482009Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_equal_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2482178Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_erf_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2482353Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_erfinv_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2482566Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_exp2_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2482736Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_exp_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2482921Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expand_as_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2483127Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_eye_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2483305Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fft_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2483483Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fftn_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2483667Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fftshift_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2483839Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_hfft_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2484019Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifft2_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2484199Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifft_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2484388Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifftshift_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2484567Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ihfft_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2484748Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ihfftn_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2484929Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_irfft2_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2485109Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_irfft_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2485287Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_rfftn_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2485459Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fill_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2485638Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flatten_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2485841Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fliplr_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2486021Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flipud_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2486198Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_floor_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2486371Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmin_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2486577Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ge_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2486787Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_half_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:13:47.2487005Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_heaviside_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2487174Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hstack_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2487359Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hypot_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2487534Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_i0_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2487746Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_igamma_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2487955Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_igammac_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2488135Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_add_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2488317Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_fill_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2488576Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_reduce_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:13:47.2488798Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_inner_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2489088Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isneginf_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2489358Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isreal_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2489592Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_2inputs_2outputs_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2489767Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_kron_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2489942Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ldexp_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2490148Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_le_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2490324Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lerp_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2490510Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cross_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2490690Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_det_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2490864Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eig_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2491048Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eigh_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2491236Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eigvals_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2491417Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_inv_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2491638Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lstsq_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:13:47.2491828Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_factor_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2492059Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_factor_ex_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2492282Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_rank_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2492515Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_rank_hermitian_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2492692Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_pinv_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2492952Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 77%]
2023-01-11T23:13:47.2493138Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_qr_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2493324Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_slogdet_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2493517Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_solve_ex_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2493699Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_svd_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2493884Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_svdvals_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2494073Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_tensorinv_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2494265Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_tensorsolve_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2494443Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_vecdot_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2494878Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_vector_norm_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2495058Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log1p_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2495240Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log2_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2495423Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_softmax_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2495607Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logaddexp_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2495822Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_or_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2495998Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logit_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2496211Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logspace_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2496389Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logsumexp_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2496598Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_long_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2496804Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lt_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2496978Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lu_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2497152Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mH_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2497327Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mT_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2497510Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_amax_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2497696Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_cumsum_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2497882Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_fill_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2498113Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_log_softmax_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2498309Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_logaddexp_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2498490Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_norm_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2498675Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_prod_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2498860Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_scatter_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2499045Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_select_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2499227Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_softmax_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2499413Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_softmin_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2499615Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_std_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2499821Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_sum_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2500002Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_matrix_exp_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2500208Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2500383Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mean_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2500562Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_median_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2500759Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_meshgrid_list_of_tensors_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2500991Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_meshgrid_variadic_tensors_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2501192Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_reduction_with_dim_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2501365Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_minimum_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2501544Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mm_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2501721Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mode_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2501900Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_movedim_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2502118Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_multinomial_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:13:47.2502307Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nanquantile_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2502484Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nansum_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2502665Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_narrow_cuda_float32 XFAIL [ 78%]
2023-01-11T23:13:47.2502858Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_batch_norm_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2503025Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_neg_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2503239Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_empty_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:13:47.2503447Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_full_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:13:47.2503654Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_zeros_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:13:47.2503870Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nextafter_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:13:47.2504106Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2504313Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_alpha_dropout_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2504510Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_avg_pool1d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2504707Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_avg_pool3d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2504892Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_batch_norm_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2505112Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2505308Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv1d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2505507Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv2d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2505717Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2505925Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2506149Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_elu_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2506428Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2506726Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_grid_sample_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%]
2023-01-11T23:13:47.2507034Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_group_norm_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2507231Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardshrink_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2507439Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardsigmoid_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2507653Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2507851Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_huber_loss_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2508052Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_instance_norm_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2508263Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2508475Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2508687Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2508884Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_l1_loss_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2509073Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_layer_norm_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2509275Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_linear_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2509485Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_local_response_norm_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2509779Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_pool2d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2509997Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_pool3d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2510202Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool1d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2510445Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2510664Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool3d_cuda_float32 SKIPPED (Skipped!) [ 78%]
2023-01-11T23:13:47.2510876Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2511076Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_mse_loss_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2511314Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_multilabel_margin_loss_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%]
2023-01-11T23:13:47.2511538Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2511740Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_normalize_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2511949Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_replicate_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2512161Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pairwise_distance_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2512364Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2512572Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2512768Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_prelu_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2512966Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_relu_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2513189Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2513403Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2513613Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2513816Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softplus_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2514016Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softshrink_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2514215Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softsign_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2514411Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_tanhshrink_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2514627Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2514846Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nonzero_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:13:47.2515031Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2515251Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_normal_number_mean_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%]
2023-01-11T23:13:47.2515466Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ones_like_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:13:47.2515648Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_outer_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2515900Z
test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_pca_lowrank_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 78%] 2023-01-11T23:13:47.2516090Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_pinverse_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2516304Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polar_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2516506Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_0_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2516709Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_2_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2516907Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_4_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2517077Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_pow_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2517253Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_prod_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2517430Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_put_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2517619Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_quantile_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2517801Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rad2deg_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2518013Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rand_like_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2518190Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ravel_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2518367Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_real_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2518552Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_remainder_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2518756Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_renorm_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%] 2023-01-11T23:13:47.2518950Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_repeat_interleave_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2519205Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_reshape_as_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2519411Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resolve_conj_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2519626Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resolve_neg_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2519805Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rot90_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2519981Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2520173Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_0_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2520340Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rsub_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2520561Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scalar_tensor_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2520745Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2520942Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_amax_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2521139Z 
test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_mean_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2521365Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_prod_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%] 2023-01-11T23:13:47.2521586Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_searchsorted_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2521816Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_segment_reduce_lengths_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%] 2023-01-11T23:13:47.2522046Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_segment_reduce_offsets_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%] 2023-01-11T23:13:47.2522238Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_select_scatter_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2522432Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sgn_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2522661Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_blackman_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2522887Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_cosine_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2523116Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_gaussian_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2523352Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_general_cosine_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2523577Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_hann_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2523803Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_kaiser_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2524028Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_nuttall_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2524206Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sinh_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2524384Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_slice_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2524557Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_softmax_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2524732Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sort_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2524960Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sparse_sampled_addmm_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%] 2023-01-11T23:13:47.2525217Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_j0_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2525433Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_j1_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2525645Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_y0_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2525883Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_t_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2526334Z 
test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 78%] 2023-01-11T23:13:47.2526713Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 78%] 2023-01-11T23:13:47.2526903Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_entr_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2527131Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_laguerre_polynomial_l_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2527507Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 78%] 2023-01-11T23:13:47.2527737Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_i0_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2527966Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_i1_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2528193Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_k1_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2528381Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_ndtri_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2528651Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_scaled_modified_bessel_k1_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2529040Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 78%] 2023-01-11T23:13:47.2529435Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 78%] 2023-01-11T23:13:47.2529653Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_zeta_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2529849Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_split_list_args_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2530023Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_square_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2530207Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_squeeze_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2530387Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_mean_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2530579Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_mean_unbiased_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2530766Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_unbiased_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2530942Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sub_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2531116Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sum_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2531329Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sum_to_size_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2531492Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_svd_cuda_float32 PASSED [ 78%] 
2023-01-11T23:13:47.2531673Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tanh_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2531860Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tensordot_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2532035Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tile_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2532210Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_to_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2532388Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_topk_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2532567Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trapz_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2532745Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tril_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2532926Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trunc_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2533102Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unflatten_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2533319Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_uniform_cuda_float32 SKIPPED (Does not support autograd) [ 79%] 2023-01-11T23:13:47.2533546Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unique_consecutive_cuda_float32 SKIPPED (Does not support autograd) [ 79%] 2023-01-11T23:13:47.2533754Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unique_cuda_float32 SKIPPED (Does not support autograd) [ 79%] 2023-01-11T23:13:47.2533938Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unsqueeze_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2534120Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_mean_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2534298Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vdot_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2534612Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_as_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2534839Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_copy_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2535014Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vsplit_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2535194Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vstack_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2535374Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_where_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2535548Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_xlogy_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2535758Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_zeros_cuda_float32 SKIPPED (Does not support autograd) [ 79%] 2023-01-11T23:13:47.2535972Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_zeros_like_cuda_float32 SKIPPED (Does not support autograd) [ 79%] 2023-01-11T23:13:47.2536156Z test_ops.py::TestCompositeComplianceCUDA::test_operator___getitem___cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2536339Z test_ops.py::TestCompositeComplianceCUDA::test_operator___rmatmul___cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2536517Z test_ops.py::TestCompositeComplianceCUDA::test_operator___rmod___cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2536706Z test_ops.py::TestCompositeComplianceCUDA::test_operator__native_batch_norm_legit_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2536882Z test_ops.py::TestCompositeComplianceCUDA::test_operator_acosh_cuda_float32 
PASSED [ 79%] 2023-01-11T23:13:47.2537056Z test_ops.py::TestCompositeComplianceCUDA::test_operator_add_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2537231Z test_ops.py::TestCompositeComplianceCUDA::test_operator_addbmm_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2537410Z test_ops.py::TestCompositeComplianceCUDA::test_operator_addcdiv_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2537621Z test_ops.py::TestCompositeComplianceCUDA::test_operator_addr_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2537796Z test_ops.py::TestCompositeComplianceCUDA::test_operator_all_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2537977Z test_ops.py::TestCompositeComplianceCUDA::test_operator_amax_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2538143Z test_ops.py::TestCompositeComplianceCUDA::test_operator_amin_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2538326Z test_ops.py::TestCompositeComplianceCUDA::test_operator_aminmax_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2538500Z test_ops.py::TestCompositeComplianceCUDA::test_operator_any_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2538679Z test_ops.py::TestCompositeComplianceCUDA::test_operator_arange_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2538856Z test_ops.py::TestCompositeComplianceCUDA::test_operator_argmin_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2539058Z test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_partial_views_cuda_float32 XFAIL [ 79%] 2023-01-11T23:13:47.2539234Z test_ops.py::TestCompositeComplianceCUDA::test_operator_atan2_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2539439Z test_ops.py::TestCompositeComplianceCUDA::test_operator_atanh_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2539646Z test_ops.py::TestCompositeComplianceCUDA::test_operator_atleast_2d_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2539820Z test_ops.py::TestCompositeComplianceCUDA::test_operator_bernoulli_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2540005Z test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_to_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2540180Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cdist_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2540355Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ceil_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2540532Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cfloat_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2540714Z test_ops.py::TestCompositeComplianceCUDA::test_operator_chalf_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2540905Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_inverse_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2541116Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_solve_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2541285Z test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2541466Z test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_min_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2541644Z test_ops.py::TestCompositeComplianceCUDA::test_operator_clone_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2541828Z test_ops.py::TestCompositeComplianceCUDA::test_operator_column_stack_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2542015Z test_ops.py::TestCompositeComplianceCUDA::test_operator_conj_physical_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2542199Z test_ops.py::TestCompositeComplianceCUDA::test_operator_contiguous_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2542380Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_corrcoef_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2542557Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cos_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2542731Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cosh_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2542900Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cummax_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2543075Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cummin_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2543250Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cumsum_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2543436Z test_ops.py::TestCompositeComplianceCUDA::test_operator_diag_embed_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2543627Z test_ops.py::TestCompositeComplianceCUDA::test_operator_div_floor_rounding_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2543832Z test_ops.py::TestCompositeComplianceCUDA::test_operator_double_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2544008Z test_ops.py::TestCompositeComplianceCUDA::test_operator_dsplit_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2544184Z test_ops.py::TestCompositeComplianceCUDA::test_operator_dstack_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2544351Z test_ops.py::TestCompositeComplianceCUDA::test_operator_eq_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2544526Z test_ops.py::TestCompositeComplianceCUDA::test_operator_erfc_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2544701Z test_ops.py::TestCompositeComplianceCUDA::test_operator_erfinv_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2544877Z test_ops.py::TestCompositeComplianceCUDA::test_operator_exp2_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2545060Z test_ops.py::TestCompositeComplianceCUDA::test_operator_expand_as_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2545237Z test_ops.py::TestCompositeComplianceCUDA::test_operator_eye_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2545414Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fft_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2545599Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fftshift_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2545777Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfft2_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2545949Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifft_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2546136Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifftshift_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2546317Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ihfft_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2546497Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ihfftn_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2546675Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfft_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2546853Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fill_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2547053Z test_ops.py::TestCompositeComplianceCUDA::test_operator_flipud_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2547229Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fmax_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2547396Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fmin_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2547575Z test_ops.py::TestCompositeComplianceCUDA::test_operator_frexp_cuda_float32 
PASSED [ 79%] 2023-01-11T23:13:47.2547748Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ge_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2547926Z test_ops.py::TestCompositeComplianceCUDA::test_operator_gradient_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2548117Z test_ops.py::TestCompositeComplianceCUDA::test_operator_grid_sampler_2d_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2548301Z test_ops.py::TestCompositeComplianceCUDA::test_operator_heaviside_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2548474Z test_ops.py::TestCompositeComplianceCUDA::test_operator_hsplit_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2548650Z test_ops.py::TestCompositeComplianceCUDA::test_operator_hstack_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2548827Z test_ops.py::TestCompositeComplianceCUDA::test_operator_hypot_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2548996Z test_ops.py::TestCompositeComplianceCUDA::test_operator_igamma_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2549174Z test_ops.py::TestCompositeComplianceCUDA::test_operator_igammac_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2549351Z test_ops.py::TestCompositeComplianceCUDA::test_operator_index_add_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2549533Z test_ops.py::TestCompositeComplianceCUDA::test_operator_index_fill_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2549779Z test_ops.py::TestCompositeComplianceCUDA::test_operator_index_reduce_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2550002Z test_ops.py::TestCompositeComplianceCUDA::test_operator_index_select_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2550184Z test_ops.py::TestCompositeComplianceCUDA::test_operator_inner_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2550360Z test_ops.py::TestCompositeComplianceCUDA::test_operator_int_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2550532Z test_ops.py::TestCompositeComplianceCUDA::test_operator_isclose_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2550710Z test_ops.py::TestCompositeComplianceCUDA::test_operator_isfinite_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2550886Z test_ops.py::TestCompositeComplianceCUDA::test_operator_isin_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2551062Z test_ops.py::TestCompositeComplianceCUDA::test_operator_isinf_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2551237Z test_ops.py::TestCompositeComplianceCUDA::test_operator_isposinf_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2551418Z test_ops.py::TestCompositeComplianceCUDA::test_operator_isreal_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2551619Z test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_binary_cuda_float32 SKIPPED (skip) [ 79%] 2023-01-11T23:13:47.2551831Z test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_binary_return_by_ref_cuda_float32 SKIPPED (skip) [ 79%] 2023-01-11T23:13:47.2552034Z test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_unary_cuda_float32 SKIPPED (skip) [ 79%] 2023-01-11T23:13:47.2552202Z test_ops.py::TestCompositeComplianceCUDA::test_operator_kron_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2552381Z test_ops.py::TestCompositeComplianceCUDA::test_operator_kthvalue_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2552555Z test_ops.py::TestCompositeComplianceCUDA::test_operator_lerp_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2552729Z test_ops.py::TestCompositeComplianceCUDA::test_operator_lgamma_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2552917Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cond_cuda_float32 PASSED 
[ 79%] 2023-01-11T23:13:47.2553102Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cross_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2553310Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_det_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2553507Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_det_singular_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2553690Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eig_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2553870Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigvals_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2554061Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigvalsh_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2554251Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_factor_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2554456Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2554635Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2554833Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_factor_ex_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2555045Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2555229Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_solve_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2555429Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_solve_triangular_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2555603Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_svd_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2555794Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_tensorinv_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2556015Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_tensorsolve_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2556202Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_vecdot_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2556396Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_vector_norm_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2556582Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linspace_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2556760Z test_ops.py::TestCompositeComplianceCUDA::test_operator_log10_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2556935Z test_ops.py::TestCompositeComplianceCUDA::test_operator_log2_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2557103Z test_ops.py::TestCompositeComplianceCUDA::test_operator_log_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2557287Z test_ops.py::TestCompositeComplianceCUDA::test_operator_log_softmax_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2557469Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logaddexp2_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2557648Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logdet_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2557834Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_not_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2558017Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_xor_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2558195Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logit_cuda_float32 PASSED [ 79%] 
2023-01-11T23:13:47.2558369Z test_ops.py::TestCompositeComplianceCUDA::test_operator_lt_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2558539Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mH_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2558713Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_amax_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2558902Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_cumsum_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2559090Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_fill_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2559281Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_log_softmax_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2559527Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_logaddexp_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2559726Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_mean_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2559906Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_norm_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2560094Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_softmax_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2560284Z test_ops.py::TestCompositeComplianceCUDA::test_operator_max_reduction_with_dim_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2560460Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mean_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2560641Z test_ops.py::TestCompositeComplianceCUDA::test_operator_median_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2560841Z test_ops.py::TestCompositeComplianceCUDA::test_operator_meshgrid_list_of_tensors_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2561046Z test_ops.py::TestCompositeComplianceCUDA::test_operator_meshgrid_variadic_tensors_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2561227Z test_ops.py::TestCompositeComplianceCUDA::test_operator_min_binary_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2561406Z test_ops.py::TestCompositeComplianceCUDA::test_operator_minimum_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2561582Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mode_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2561767Z test_ops.py::TestCompositeComplianceCUDA::test_operator_multinomial_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2561956Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2562161Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nan_to_num_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2562341Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nanmean_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2562542Z test_ops.py::TestCompositeComplianceCUDA::test_operator_native_dropout_backward_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2562770Z test_ops.py::TestCompositeComplianceCUDA::test_operator_new_empty_cuda_float32 SKIPPED (Expected: new_empty is not comparable) [ 80%] 2023-01-11T23:13:47.2563015Z test_ops.py::TestCompositeComplianceCUDA::test_operator_new_empty_strided_cuda_float32 SKIPPED (Expected: new_empty_strided is not comparable) [ 80%] 2023-01-11T23:13:47.2563196Z test_ops.py::TestCompositeComplianceCUDA::test_operator_new_full_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2563408Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2563616Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2563812Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2564015Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2564220Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_alpha_dropout_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2564419Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_avg_pool1d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2564616Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_avg_pool2d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2564809Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_avg_pool3d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2565002Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_batch_norm_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2565220Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2565444Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_bilinear_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2565667Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2565855Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv1d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2566061Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2566272Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2566483Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cosine_similarity_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2566684Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout2d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2566896Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2567104Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2567305Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_gaussian_nll_loss_cuda_float32 XFAIL [ 80%] 2023-01-11T23:13:47.2567499Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_gelu_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2567693Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_grid_sample_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2567893Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_group_norm_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2568128Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardshrink_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2568330Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardsigmoid_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2568530Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardswish_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2568740Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2568950Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_linear_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2569155Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2569367Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2569557Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_l1_loss_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2569754Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_leaky_relu_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2569954Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_linear_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2570151Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_logsigmoid_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2570358Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2570554Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_pool2d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2570760Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2570960Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool2d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2571169Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2571390Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool3d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2571585Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2571801Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2571998Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_normalize_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2572197Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_constant_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2572399Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_replicate_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2572609Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pairwise_distance_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2572813Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2573016Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2573207Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_relu_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2573393Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_rrelu_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2573596Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2573797Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2574022Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_tanhshrink_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2574220Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_threshold_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2574419Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_unfold_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2574699Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nonzero_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2574877Z test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2575056Z test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_fro_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2575225Z test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_nuc_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2575403Z test_ops.py::TestCompositeComplianceCUDA::test_operator_normal_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2575586Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ones_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2575766Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ones_like_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2575954Z test_ops.py::TestCompositeComplianceCUDA::test_operator_pca_lowrank_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2576134Z test_ops.py::TestCompositeComplianceCUDA::test_operator_permute_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2576313Z test_ops.py::TestCompositeComplianceCUDA::test_operator_pinverse_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2576512Z test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_1_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2576710Z test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_2_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2576894Z test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_4_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2577071Z test_ops.py::TestCompositeComplianceCUDA::test_operator_positive_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2577251Z test_ops.py::TestCompositeComplianceCUDA::test_operator_qr_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2577471Z test_ops.py::TestCompositeComplianceCUDA::test_operator_quantile_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2577652Z test_ops.py::TestCompositeComplianceCUDA::test_operator_rad2deg_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2577837Z test_ops.py::TestCompositeComplianceCUDA::test_operator_randint_like_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2578014Z test_ops.py::TestCompositeComplianceCUDA::test_operator_randn_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2578191Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ravel_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2578358Z test_ops.py::TestCompositeComplianceCUDA::test_operator_real_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2578544Z test_ops.py::TestCompositeComplianceCUDA::test_operator_reciprocal_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2578724Z test_ops.py::TestCompositeComplianceCUDA::test_operator_renorm_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2578920Z test_ops.py::TestCompositeComplianceCUDA::test_operator_repeat_interleave_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2579107Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_reshape_as_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2579287Z test_ops.py::TestCompositeComplianceCUDA::test_operator_reshape_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2579472Z test_ops.py::TestCompositeComplianceCUDA::test_operator_resolve_conj_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2579647Z test_ops.py::TestCompositeComplianceCUDA::test_operator_round_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2579834Z test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_0_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2580019Z test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_neg_3_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2580228Z test_ops.py::TestCompositeComplianceCUDA::test_operator_rsub_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2580413Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_add_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2580596Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2580790Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_amax_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2580981Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_amin_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2581174Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_mean_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2581366Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_prod_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2581544Z test_ops.py::TestCompositeComplianceCUDA::test_operator_searchsorted_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2581747Z test_ops.py::TestCompositeComplianceCUDA::test_operator_segment_reduce_offsets_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2581927Z test_ops.py::TestCompositeComplianceCUDA::test_operator_select_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2582116Z test_ops.py::TestCompositeComplianceCUDA::test_operator_select_scatter_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2582314Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_bartlett_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2582512Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_blackman_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2582714Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_exponential_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2582907Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_gaussian_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2583111Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_general_cosine_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2583301Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_nuttall_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2583483Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signbit_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2583681Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sin_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2583862Z test_ops.py::TestCompositeComplianceCUDA::test_operator_slice_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2584048Z test_ops.py::TestCompositeComplianceCUDA::test_operator_slice_scatter_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2584227Z test_ops.py::TestCompositeComplianceCUDA::test_operator_softmax_cuda_float32 PASSED [ 80%] 
2023-01-11T23:13:47.2584400Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sort_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2584610Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sparse_sampled_addmm_cuda_float32 SKIPPED (Skipped!) [ 80%] 2023-01-11T23:13:47.2584805Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_j1_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2584984Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_y0_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2585172Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_y1_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2585381Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_u_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2585773Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 80%] 2023-01-11T23:13:47.2585958Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_entr_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2586161Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_hermite_polynomial_h_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2586378Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_log_ndtr_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2586576Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_i0_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2586776Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_i1_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2586953Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_ndtr_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2587168Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2587374Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_scaled_modified_bessel_k0_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2587574Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_scaled_modified_bessel_k1_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2587962Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 80%] 2023-01-11T23:13:47.2588357Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 80%] 2023-01-11T23:13:47.2588541Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_zeta_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2588720Z test_ops.py::TestCompositeComplianceCUDA::test_operator_split_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2588907Z test_ops.py::TestCompositeComplianceCUDA::test_operator_split_list_args_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2589095Z test_ops.py::TestCompositeComplianceCUDA::test_operator_split_with_sizes_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2589263Z test_ops.py::TestCompositeComplianceCUDA::test_operator_square_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2589443Z test_ops.py::TestCompositeComplianceCUDA::test_operator_squeeze_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2589622Z test_ops.py::TestCompositeComplianceCUDA::test_operator_std_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2589858Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_sum_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2590064Z test_ops.py::TestCompositeComplianceCUDA::test_operator_svd_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2590249Z test_ops.py::TestCompositeComplianceCUDA::test_operator_svd_lowrank_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2590435Z test_ops.py::TestCompositeComplianceCUDA::test_operator_take_along_dim_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2590609Z test_ops.py::TestCompositeComplianceCUDA::test_operator_tanh_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2590785Z test_ops.py::TestCompositeComplianceCUDA::test_operator_tile_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2590951Z test_ops.py::TestCompositeComplianceCUDA::test_operator_to_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2591158Z test_ops.py::TestCompositeComplianceCUDA::test_operator_to_sparse_cuda_float32 SKIPPED (Allowed exception) [ 81%] 2023-01-11T23:13:47.2591335Z test_ops.py::TestCompositeComplianceCUDA::test_operator_trace_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2591518Z test_ops.py::TestCompositeComplianceCUDA::test_operator_trapezoid_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2591692Z test_ops.py::TestCompositeComplianceCUDA::test_operator_trapz_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2591865Z test_ops.py::TestCompositeComplianceCUDA::test_operator_tril_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2592043Z test_ops.py::TestCompositeComplianceCUDA::test_operator_uniform_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2592220Z test_ops.py::TestCompositeComplianceCUDA::test_operator_var_mean_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2592397Z test_ops.py::TestCompositeComplianceCUDA::test_operator_var_unbiased_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2592567Z test_ops.py::TestCompositeComplianceCUDA::test_operator_view_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2592774Z test_ops.py::TestCompositeComplianceCUDA::test_operator_vsplit_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2592953Z test_ops.py::TestCompositeComplianceCUDA::test_operator_vstack_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2593127Z test_ops.py::TestCompositeComplianceCUDA::test_operator_xlogy_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2593301Z test_ops.py::TestCompositeComplianceCUDA::test_operator_zero__cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2593458Z test_ops.py::TestMathBitsCUDA::test_conj_view_T_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2593620Z test_ops.py::TestMathBitsCUDA::test_conj_view___getitem___cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2593779Z test_ops.py::TestMathBitsCUDA::test_conj_view___radd___cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2593926Z test_ops.py::TestMathBitsCUDA::test_conj_view___rdiv___cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2594090Z test_ops.py::TestMathBitsCUDA::test_conj_view___rmatmul___cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2594251Z test_ops.py::TestMathBitsCUDA::test_conj_view___rmul___cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2594407Z test_ops.py::TestMathBitsCUDA::test_conj_view___rpow___cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2594586Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_bool_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2594760Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_byte_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2594940Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_cfloat_cuda_complex64 PASSED [ 81%] 
2023-01-11T23:13:47.2595115Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_int_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2595267Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_acos_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2595427Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_add_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2595595Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_addcdiv_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2595762Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_addcmul_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2595944Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_addr_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2596104Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_all_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2596259Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_any_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2596489Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_as_strided_partial_views_cuda_complex64 SKIPPED (Errors when storage_offset is included) [ 81%] 2023-01-11T23:13:47.2596667Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_as_strided_scatter_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2596819Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_asin_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2596984Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_asinh_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2597144Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atan_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2597313Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atleast_1d_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2597488Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_broadcast_tensors_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2597649Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cat_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2597817Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_contiguous_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2597982Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diag_embed_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2598150Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_scatter_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2598330Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_div_no_rounding_mode_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2598563Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_empty_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 81%] 2023-01-11T23:13:47.2598777Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_empty_like_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 81%] 2023-01-11T23:13:47.2598937Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_eq_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2599099Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_exp_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2599261Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_expand_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2599437Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_eye_cuda_complex64 SKIPPED (Skipped!) 
[ 81%] 2023-01-11T23:13:47.2599605Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fftshift_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2599762Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfftn_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2599929Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifft2_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2600106Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifftshift_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2600272Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfft_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2600438Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfftn_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2600598Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fill_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2600763Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_flatten_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2600923Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_flip_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2601079Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fliplr_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2601249Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_flipud_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2601411Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_hstack_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2601607Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_copy_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2601772Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_fill_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2601940Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_select_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2602104Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isclose_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2602269Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isfinite_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2602431Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isinf_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2602586Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isnan_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2602752Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_svd_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2602917Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linspace_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:13:47.2603081Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2603250Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_not_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2603419Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_xor_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2603583Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logspace_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:13:47.2603743Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_mean_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2603919Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_meshgrid_list_of_tensors_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2604105Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_mul_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2604271Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_narrow_copy_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2604438Z 
test_ops.py::TestMathBitsCUDA::test_conj_view__refs_narrow_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2604597Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ne_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2604759Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_full_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2604922Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_ones_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2605084Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_norm_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2605247Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_permute_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2605400Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_pow_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2605563Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_prod_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2605761Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_randn_cuda_complex64 SKIPPED (Test expects tensor input) [ 81%] 2023-01-11T23:13:47.2605924Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rsub_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2606088Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sigmoid_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2606266Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_softmax_with_dtype_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2606451Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_special_softmax_with_dtype_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2606610Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sqrt_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2606764Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_square_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2606928Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_squeeze_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2607092Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_stack_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2607273Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_std_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2607438Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_std_mean_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2607598Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sum_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2607764Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sum_to_size_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2607924Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tanh_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2608093Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tensor_split_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2608244Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_to_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2608404Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tril_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2608560Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_triu_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2608733Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unflatten_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2608902Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unsqueeze_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2609065Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_var_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2609225Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_view_as_cuda_complex64 PASSED [ 81%] 
2023-01-11T23:13:47.2609387Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_vstack_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2609542Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_where_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2609700Z test_ops.py::TestMathBitsCUDA::test_conj_view_acosh_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2609898Z test_ops.py::TestMathBitsCUDA::test_conj_view_addcmul_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2610057Z test_ops.py::TestMathBitsCUDA::test_conj_view_addmm_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2610230Z test_ops.py::TestMathBitsCUDA::test_conj_view_addmm_decomposed_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2610388Z test_ops.py::TestMathBitsCUDA::test_conj_view_addr_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2610547Z test_ops.py::TestMathBitsCUDA::test_conj_view_all_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2610704Z test_ops.py::TestMathBitsCUDA::test_conj_view_angle_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2610857Z test_ops.py::TestMathBitsCUDA::test_conj_view_argwhere_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2611066Z test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_cuda_complex64 SKIPPED (Errors when storage_offset is included) [ 81%] 2023-01-11T23:13:47.2611280Z test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_partial_views_cuda_complex64 SKIPPED (Test changes in memory layout) [ 81%] 2023-01-11T23:13:47.2611454Z test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_scatter_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2611617Z test_ops.py::TestMathBitsCUDA::test_conj_view_asinh_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2611779Z test_ops.py::TestMathBitsCUDA::test_conj_view_baddbmm_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2611940Z test_ops.py::TestMathBitsCUDA::test_conj_view_block_diag_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2612104Z test_ops.py::TestMathBitsCUDA::test_conj_view_broadcast_to_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2612270Z test_ops.py::TestMathBitsCUDA::test_conj_view_cartesian_prod_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:13:47.2612417Z test_ops.py::TestMathBitsCUDA::test_conj_view_cat_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2612579Z test_ops.py::TestMathBitsCUDA::test_conj_view_cdouble_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2612743Z test_ops.py::TestMathBitsCUDA::test_conj_view_cholesky_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2612911Z test_ops.py::TestMathBitsCUDA::test_conj_view_cholesky_inverse_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2613139Z test_ops.py::TestMathBitsCUDA::test_conj_view_clone_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2613304Z test_ops.py::TestMathBitsCUDA::test_conj_view_column_stack_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2613471Z test_ops.py::TestMathBitsCUDA::test_conj_view_combinations_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2613627Z test_ops.py::TestMathBitsCUDA::test_conj_view_conj_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2613782Z test_ops.py::TestMathBitsCUDA::test_conj_view_contiguous_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2613944Z test_ops.py::TestMathBitsCUDA::test_conj_view_corrcoef_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2614098Z test_ops.py::TestMathBitsCUDA::test_conj_view_cos_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2614257Z test_ops.py::TestMathBitsCUDA::test_conj_view_cross_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2614418Z 
test_ops.py::TestMathBitsCUDA::test_conj_view_cumprod_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2614684Z test_ops.py::TestMathBitsCUDA::test_conj_view_diag_embed_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2614846Z test_ops.py::TestMathBitsCUDA::test_conj_view_diagflat_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2615012Z test_ops.py::TestMathBitsCUDA::test_conj_view_diagonal_copy_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2615165Z test_ops.py::TestMathBitsCUDA::test_conj_view_diagonal_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2615334Z test_ops.py::TestMathBitsCUDA::test_conj_view_diagonal_scatter_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2615490Z test_ops.py::TestMathBitsCUDA::test_conj_view_dist_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2615643Z test_ops.py::TestMathBitsCUDA::test_conj_view_dot_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2615843Z test_ops.py::TestMathBitsCUDA::test_conj_view_eq_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2615999Z test_ops.py::TestMathBitsCUDA::test_conj_view_exp_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2616165Z test_ops.py::TestMathBitsCUDA::test_conj_view_expand_as_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2616325Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fft_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2616486Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fftn_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2616638Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfft2_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2616796Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfft_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2616953Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifft2_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2617110Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifft_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2617278Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifftshift_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2617440Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_irfft_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2617598Z test_ops.py::TestMathBitsCUDA::test_conj_view_fill_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2617754Z test_ops.py::TestMathBitsCUDA::test_conj_view_flip_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2617909Z test_ops.py::TestMathBitsCUDA::test_conj_view_flipud_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2618068Z test_ops.py::TestMathBitsCUDA::test_conj_view_float_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2618229Z test_ops.py::TestMathBitsCUDA::test_conj_view_full_like_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2618388Z test_ops.py::TestMathBitsCUDA::test_conj_view_gather_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2618546Z test_ops.py::TestMathBitsCUDA::test_conj_view_geqrf_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2618710Z test_ops.py::TestMathBitsCUDA::test_conj_view_gradient_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2618867Z test_ops.py::TestMathBitsCUDA::test_conj_view_hstack_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2619065Z test_ops.py::TestMathBitsCUDA::test_conj_view_index_add_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2619221Z test_ops.py::TestMathBitsCUDA::test_conj_view_index_copy_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2619410Z test_ops.py::TestMathBitsCUDA::test_conj_view_index_select_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2619592Z 
test_ops.py::TestMathBitsCUDA::test_conj_view_inner_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2619747Z test_ops.py::TestMathBitsCUDA::test_conj_view_int_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2619908Z test_ops.py::TestMathBitsCUDA::test_conj_view_isclose_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2620070Z test_ops.py::TestMathBitsCUDA::test_conj_view_isfinite_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2620258Z test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_4inputs_with_extra_args_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:13:47.2620431Z test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_binary_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:13:47.2620579Z test_ops.py::TestMathBitsCUDA::test_conj_view_ldexp_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:13:47.2620747Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cross_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2620909Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_det_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2621082Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_det_singular_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2621250Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigvalsh_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2621434Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_householder_product_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2621599Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_inv_ex_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2621795Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_ldl_factor_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2621962Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2622126Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_factor_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2622298Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_factor_ex_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2622469Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_rank_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2622635Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2622815Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_hermitian_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2623056Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_singular_cuda_complex64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 82%] 2023-01-11T23:13:47.2623222Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_qr_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2623389Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_slogdet_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2623560Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_svdvals_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2623724Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_vector_norm_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2623886Z test_ops.py::TestMathBitsCUDA::test_conj_view_linspace_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:13:47.2624042Z test_ops.py::TestMathBitsCUDA::test_conj_view_log10_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2624220Z test_ops.py::TestMathBitsCUDA::test_conj_view_log_softmax_with_dtype_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2624375Z test_ops.py::TestMathBitsCUDA::test_conj_view_lu_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2624530Z 
test_ops.py::TestMathBitsCUDA::test_conj_view_mT_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2624702Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_cumprod_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2624869Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_cumsum_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2625055Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_mean_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2625221Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_prod_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2625385Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_scatter_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2625548Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_std_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2625708Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_sum_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2625866Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_var_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2626026Z test_ops.py::TestMathBitsCUDA::test_conj_view_matrix_exp_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2626186Z test_ops.py::TestMathBitsCUDA::test_conj_view_mean_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2626366Z test_ops.py::TestMathBitsCUDA::test_conj_view_meshgrid_variadic_tensors_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2626527Z test_ops.py::TestMathBitsCUDA::test_conj_view_movedim_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2626692Z test_ops.py::TestMathBitsCUDA::test_conj_view_narrow_copy_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2626851Z test_ops.py::TestMathBitsCUDA::test_conj_view_narrow_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:13:47.2627008Z test_ops.py::TestMathBitsCUDA::test_conj_view_neg_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2627186Z test_ops.py::TestMathBitsCUDA::test_conj_view_new_empty_cuda_complex64 SKIPPED (Skipped!) 
[ 82%] 2023-01-11T23:13:47.2627348Z test_ops.py::TestMathBitsCUDA::test_conj_view_new_full_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2627543Z test_ops.py::TestMathBitsCUDA::test_conj_view_new_zeros_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2627717Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv1d_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2627887Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv2d_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2628079Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose2d_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2628268Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose3d_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2628479Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_feature_alpha_dropout_without_train_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2628655Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_l1_loss_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2628830Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_linear_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2629013Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_normalize_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2629198Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_constant_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2629386Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_replicate_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2629563Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pixel_shuffle_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2629817Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pixel_unshuffle_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2629998Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_softsign_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2630156Z test_ops.py::TestMathBitsCUDA::test_conj_view_norm_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2630318Z test_ops.py::TestMathBitsCUDA::test_conj_view_norm_fro_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2630480Z test_ops.py::TestMathBitsCUDA::test_conj_view_norm_nuc_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2630644Z test_ops.py::TestMathBitsCUDA::test_conj_view_ones_like_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2630842Z test_ops.py::TestMathBitsCUDA::test_conj_view_ops_nvprims_view_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2630995Z test_ops.py::TestMathBitsCUDA::test_conj_view_outer_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2631159Z test_ops.py::TestMathBitsCUDA::test_conj_view_permute_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2631321Z test_ops.py::TestMathBitsCUDA::test_conj_view_positive_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2631479Z test_ops.py::TestMathBitsCUDA::test_conj_view_pow_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2631634Z test_ops.py::TestMathBitsCUDA::test_conj_view_qr_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2631792Z test_ops.py::TestMathBitsCUDA::test_conj_view_real_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2631958Z test_ops.py::TestMathBitsCUDA::test_conj_view_reciprocal_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2632122Z test_ops.py::TestMathBitsCUDA::test_conj_view_repeat_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2632290Z test_ops.py::TestMathBitsCUDA::test_conj_view_repeat_interleave_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2632455Z 
test_ops.py::TestMathBitsCUDA::test_conj_view_reshape_as_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2632618Z test_ops.py::TestMathBitsCUDA::test_conj_view_reshape_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2632776Z test_ops.py::TestMathBitsCUDA::test_conj_view_resize__cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2632942Z test_ops.py::TestMathBitsCUDA::test_conj_view_resolve_neg_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2633099Z test_ops.py::TestMathBitsCUDA::test_conj_view_rot90_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2633259Z test_ops.py::TestMathBitsCUDA::test_conj_view_rsqrt_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2633443Z test_ops.py::TestMathBitsCUDA::test_conj_view_scalar_tensor_cuda_complex64 SKIPPED (Skipped!) [ 82%] 2023-01-11T23:13:47.2633635Z test_ops.py::TestMathBitsCUDA::test_conj_view_scatter_add_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2633789Z test_ops.py::TestMathBitsCUDA::test_conj_view_scatter_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2633950Z test_ops.py::TestMathBitsCUDA::test_conj_view_select_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2634107Z test_ops.py::TestMathBitsCUDA::test_conj_view_sgn_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2634266Z test_ops.py::TestMathBitsCUDA::test_conj_view_short_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2634424Z test_ops.py::TestMathBitsCUDA::test_conj_view_sinc_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2634580Z test_ops.py::TestMathBitsCUDA::test_conj_view_slice_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2634753Z test_ops.py::TestMathBitsCUDA::test_conj_view_softmax_with_dtype_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2634914Z test_ops.py::TestMathBitsCUDA::test_conj_view_split_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2635076Z test_ops.py::TestMathBitsCUDA::test_conj_view_split_list_args_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2635251Z test_ops.py::TestMathBitsCUDA::test_conj_view_split_with_sizes_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2635410Z test_ops.py::TestMathBitsCUDA::test_conj_view_square_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2635578Z test_ops.py::TestMathBitsCUDA::test_conj_view_std_unbiased_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2635734Z test_ops.py::TestMathBitsCUDA::test_conj_view_stft_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2635891Z test_ops.py::TestMathBitsCUDA::test_conj_view_svd_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2636051Z test_ops.py::TestMathBitsCUDA::test_conj_view_symeig_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2636205Z test_ops.py::TestMathBitsCUDA::test_conj_view_t_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2636352Z test_ops.py::TestMathBitsCUDA::test_conj_view_tanh_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2636523Z test_ops.py::TestMathBitsCUDA::test_conj_view_tensordot_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2636682Z test_ops.py::TestMathBitsCUDA::test_conj_view_to_sparse_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2636867Z test_ops.py::TestMathBitsCUDA::test_conj_view_transpose_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2637028Z test_ops.py::TestMathBitsCUDA::test_conj_view_trapezoid_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2637191Z test_ops.py::TestMathBitsCUDA::test_conj_view_unbind_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2637356Z test_ops.py::TestMathBitsCUDA::test_conj_view_unflatten_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2637514Z 
test_ops.py::TestMathBitsCUDA::test_conj_view_uniform_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:13:47.2637669Z test_ops.py::TestMathBitsCUDA::test_conj_view_unsqueeze_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2637823Z test_ops.py::TestMathBitsCUDA::test_conj_view_var_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2637992Z test_ops.py::TestMathBitsCUDA::test_conj_view_var_unbiased_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2638149Z test_ops.py::TestMathBitsCUDA::test_conj_view_vdot_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2638304Z test_ops.py::TestMathBitsCUDA::test_conj_view_view_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2638463Z test_ops.py::TestMathBitsCUDA::test_conj_view_vsplit_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2638620Z test_ops.py::TestMathBitsCUDA::test_conj_view_vstack_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2638777Z test_ops.py::TestMathBitsCUDA::test_conj_view_where_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2638933Z test_ops.py::TestMathBitsCUDA::test_conj_view_zero__cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2639089Z test_ops.py::TestMathBitsCUDA::test_conj_view_zeros_like_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2639254Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rdiv___cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2639451Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rmatmul___cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2639615Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rmul___cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2639781Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rpow___cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2639969Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_cfloat_cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2640152Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_char_cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2640337Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_double_cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2640514Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_half_cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2640696Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_int_cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2640867Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_acosh_cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2641038Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_addcmul_cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2641213Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_allclose_cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2641379Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_any_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2641616Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_partial_views_cuda_complex128 SKIPPED (Errors when storage_offset is included) [ 83%] 2023-01-11T23:13:47.2641786Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_asin_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2641953Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atanh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2642131Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_broadcast_tensors_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2642313Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_broadcast_to_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2642508Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cat_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2642688Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_conj_physical_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2642859Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cosh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2643029Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cumsum_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2643205Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_copy_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2643392Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_div_no_rounding_mode_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2643609Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_empty_like_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 83%] 2023-01-11T23:13:47.2643777Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fftn_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2643957Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fftshift_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2644131Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfft_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2644306Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifftshift_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2644478Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfft_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2644652Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfftn_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2644826Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_hsplit_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2644997Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_add_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2645206Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_copy_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2645372Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isfinite_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2645543Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isreal_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2645712Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_lerp_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2645896Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_matrix_norm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2646070Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_svd_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2646248Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_svdvals_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2646435Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log_softmax_with_dtype_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2646613Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_and_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2646783Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_not_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2646961Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_xor_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2647133Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logspace_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:13:47.2647305Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_masked_fill_cuda_complex128 PASSED [ 83%] 
2023-01-11T23:13:47.2647473Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_mean_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2647663Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_meshgrid_list_of_tensors_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2647856Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_meshgrid_variadic_tensors_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2648031Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_movedim_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2648225Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_mul_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2648394Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_narrow_copy_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2648558Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_neg_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2648775Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_empty_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 83%] 2023-01-11T23:13:47.2648947Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_ones_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2649133Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_l1_loss_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2649335Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_log_softmax_with_dtype_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2649537Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_softmin_with_dtype_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2649729Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_tanhshrink_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2649896Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_norm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2650053Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_ones_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:13:47.2650226Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_permute_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2650397Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_positive_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2650561Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_prod_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2650801Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_randn_cuda_complex128 SKIPPED (Test expects tensor input) [ 83%] 2023-01-11T23:13:47.2650973Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_repeat_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2651144Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rot90_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2651313Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rsqrt_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2651481Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sinh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2651670Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_special_log_softmax_with_dtype_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2651863Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_special_softmax_with_dtype_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2652031Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_square_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2652203Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_stack_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2652369Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_std_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2652537Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sub_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2652698Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sum_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2652862Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_t_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2653041Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tensor_split_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2653198Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_to_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2653365Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tril_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2653530Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_triu_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2653709Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_true_divide_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2653904Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_vsplit_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2654074Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_vstack_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2654240Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_where_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2654404Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_zeros_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:13:47.2654662Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_add_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2654829Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addbmm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2654993Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2655156Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmv_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2655317Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_all_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2655486Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_allclose_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2655648Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_any_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2655811Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_asin_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2655972Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_asinh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2656124Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atan_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2656291Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_2d_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2656460Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_3d_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2656669Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bfloat16_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2656834Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_block_diag_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2656999Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bool_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2657170Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_broadcast_to_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2657344Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cartesian_prod_cuda_complex128 PASSED [ 83%] 
2023-01-11T23:13:47.2657503Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cdouble_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2657665Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_chalf_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:13:47.2657825Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_char_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2657991Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cholesky_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2658171Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cholesky_inverse_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2658341Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cholesky_solve_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2658506Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_clone_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2658674Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_column_stack_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2658837Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_combinations_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2659012Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_constant_pad_nd_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2659174Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cos_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2659336Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cosh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2659495Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cov_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2659664Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumprod_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2659865Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diag_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2660033Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diag_embed_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2660199Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagflat_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2660365Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_copy_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2660525Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diff_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2660687Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dot_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2660850Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_double_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2661030Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_cuda_complex128 SKIPPED (Skipped!) [ 83%] 2023-01-11T23:13:47.2661215Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_like_cuda_complex128 SKIPPED (Skipped!) [ 83%] 2023-01-11T23:13:47.2661379Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_eq_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2661541Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_exp_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2661699Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_expand_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2661875Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_eye_cuda_complex128 SKIPPED (Skipped!) 
[ 83%] 2023-01-11T23:13:47.2662040Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fft2_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2662204Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fft_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2662368Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fftn_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2662569Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fftshift_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2662735Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_hfft_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2662903Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_hfftn_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2663070Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifftn_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2663230Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_irfft2_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2663396Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_irfftn_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2663557Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fill_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2663719Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fliplr_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2663885Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_gather_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2664052Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_geqrf_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2664218Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_hsplit_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2664384Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_hstack_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2664543Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_copy_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2664708Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isfinite_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2664875Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isnan_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2665033Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_istft_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2665194Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lerp_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2665367Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cross_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2665546Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_det_singular_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2665741Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eig_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2665914Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2666080Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigvals_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2666246Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_inv_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2666419Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_inv_ex_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2666599Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_factor_ex_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2666770Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lstsq_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2666958Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lstsq_grad_oriented_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2667129Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2667308Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_factor_ex_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2667476Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_power_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2667652Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_multi_dot_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2667848Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_norm_subgradients_at_zero_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2668015Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_qr_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2668225Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_triangular_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2668393Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_svd_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2668570Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_svdvals_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2668741Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vecdot_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2668919Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vector_norm_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2669074Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log10_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2669235Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log2_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2669399Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logdet_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2669568Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_not_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2669795Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_or_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2669961Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_xor_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2670127Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lu_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2670295Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lu_solve_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2670455Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lu_unpack_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2670629Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_cumsum_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2670797Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_fill_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2670969Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_scatter_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2671145Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_select_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2671310Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_matmul_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2671499Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mean_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2671688Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_meshgrid_list_of_tensors_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2671856Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_movedim_cuda_complex128 PASSED [ 84%] 
2023-01-11T23:13:47.2672011Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mul_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2672172Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mv_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2672338Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_narrow_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2672502Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ne_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2672667Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_neg_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2672832Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_full_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2672999Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_ones_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2673180Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv1d_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2673365Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose1d_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2673560Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose3d_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2673743Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_l1_loss_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2673929Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_circular_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2674144Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_constant_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2674330Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_reflect_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2674528Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_softmin_with_dtype_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2674712Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_softsign_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2674922Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_triplet_margin_with_distance_loss_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2675082Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nonzero_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2675247Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_fro_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2675424Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ops_nvprims_view_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2675593Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ormqr_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2675753Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_outer_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2675920Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_positive_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2676084Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_pow_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2676247Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_qr_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2676412Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_renorm_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2676583Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_repeat_interleave_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2676752Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reshape_as_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2676919Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reshape_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2677089Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize_as__cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2677279Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_roll_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2677443Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rsqrt_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2677607Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_select_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2677768Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sgn_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2677923Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_short_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2678088Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sigmoid_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2678251Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2678414Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sqrt_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2678578Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_square_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2678746Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_squeeze_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2678913Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_mean_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2679092Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_mean_unbiased_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2679253Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_svd_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2679418Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_take_along_dim_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2679580Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_take_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2679739Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tanh_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2679934Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tensor_split_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2680103Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tensordot_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2680266Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tile_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2680435Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_trapezoid_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2680597Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_trapz_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2680767Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_triangular_solve_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2680927Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tril_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2681083Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_triu_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2681253Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_true_divide_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2681422Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_uniform_cuda_complex128 XFAIL [ 84%] 2023-01-11T23:13:47.2681591Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unsqueeze_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2681757Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2681921Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_mean_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2682083Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_unbiased_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2682246Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vdot_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2682576Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_as_real_cuda_complex128 SKIPPED (Operation doesn't support conjugated inputs.) [ 84%] 2023-01-11T23:13:47.2682743Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vsplit_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2682910Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vstack_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2683072Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_where_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2683259Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_zero__cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2683417Z test_ops.py::TestMathBitsCUDA::test_neg_view_H_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2683570Z test_ops.py::TestMathBitsCUDA::test_neg_view_T_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2683719Z test_ops.py::TestMathBitsCUDA::test_neg_view___radd___cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2683876Z test_ops.py::TestMathBitsCUDA::test_neg_view___rdiv___cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2684036Z test_ops.py::TestMathBitsCUDA::test_neg_view___rmatmul___cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2684191Z test_ops.py::TestMathBitsCUDA::test_neg_view___rmul___cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2684349Z test_ops.py::TestMathBitsCUDA::test_neg_view___rpow___cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2684500Z test_ops.py::TestMathBitsCUDA::test_neg_view___rsub___cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2684658Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_T_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2684834Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_byte_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2685001Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_cfloat_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2685176Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_char_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2685347Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_half_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2685517Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_int_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2685690Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_short_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2685876Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_addr_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2686032Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_all_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2686198Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_allclose_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2686355Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_amax_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2686520Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_scatter_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2686679Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_asin_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2686838Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_asinh_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2686993Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atan2_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2687149Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atan_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2687317Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atleast_3d_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2687486Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_broadcast_to_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2687646Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cat_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2687796Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ceil_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2687957Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clamp_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2688119Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clamp_max_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2688284Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_column_stack_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2688440Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_conj_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2688608Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_conj_physical_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2688772Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cumsum_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2688929Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diag_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2689115Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_copy_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2689282Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2689445Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_digamma_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2689615Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_div_floor_rounding_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2689776Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_dsplit_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2689932Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_dstack_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2690139Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_empty_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 84%] 2023-01-11T23:13:47.2690299Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_erf_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2690458Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_erfc_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2690608Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_erfinv_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2690770Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_expand_as_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2690933Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fft2_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2691092Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fft_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2691252Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fftn_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2691414Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfft2_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2691613Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfftn_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2691775Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifft2_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2691928Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifft_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2692088Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifftn_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2692252Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfft2_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2692410Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfft_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2692576Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfft2_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2692733Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fill_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2692893Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flatten_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2693056Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fliplr_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2693213Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_float_power_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2693376Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_floor_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2693544Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_floor_divide_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2693702Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fmax_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2693856Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fmin_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2694009Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fmod_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2694161Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_frac_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2694318Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ge_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2694589Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hsplit_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2694741Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hstack_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2694945Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_copy_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2695108Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isinf_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2695262Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isnan_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2695424Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isneginf_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2695587Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lerp_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2695751Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_svd_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2695918Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_svdvals_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2696086Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_vector_norm_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2696251Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linspace_cuda_float64 XFAIL [ 85%] 2023-01-11T23:13:47.2696410Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log1p_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2696574Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_and_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2696737Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_not_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2696899Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_xor_cuda_float64 PASSED [ 
85%] 2023-01-11T23:13:47.2697060Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logspace_cuda_float64 XFAIL [ 85%] 2023-01-11T23:13:47.2697224Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logsumexp_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2697409Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lt_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2697568Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_maximum_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2697734Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nan_to_num_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2697894Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_narrow_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2698047Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ne_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2698274Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_empty_strided_cuda_float64 SKIPPED (Expected: empty_strided is not comparable) [ 85%] 2023-01-11T23:13:47.2698435Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_ones_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2698597Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_zeros_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2698759Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nextafter_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2698983Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_alpha_dropout_cuda_float64 SKIPPED (Expected: dropout is not comparable) [ 85%] 2023-01-11T23:13:47.2699159Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_celu_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2699331Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_elu_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2699502Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_glu_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2699685Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hardshrink_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2699863Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hardtanh_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2700054Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hinge_embedding_loss_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2700234Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_layer_norm_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2700416Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_leaky_relu_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2700624Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_margin_ranking_loss_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2700801Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_nll_loss_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2700975Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_pdist_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2701163Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_poisson_nll_loss_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2701338Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_prelu_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2701508Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_relu6_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2701688Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_relu_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2701862Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_selu_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2702054Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softmax_with_dtype_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2702226Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_threshold_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2702383Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_norm_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2702540Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_prod_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2702692Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_real_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2702845Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_roll_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2703001Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_rot90_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2703231Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_rsqrt_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2703386Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_rsub_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2703533Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sign_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2703686Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sinh_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2703859Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_softmax_with_dtype_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2704030Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_bessel_j1_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2704197Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_entr_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2704365Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_erfcx_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2704559Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_multigammaln_mvlgamma_p_1_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2704728Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_ndtr_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2704914Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_spherical_bessel_j0_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2705073Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_zeta_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2705230Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sqrt_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2705390Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_square_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2705550Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_stack_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2705710Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_std_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2705872Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_std_mean_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2706031Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sub_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2706196Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sum_to_size_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2706372Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_t_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2706532Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tan_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2706696Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_transpose_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2706858Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_trunc_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2707013Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_var_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2707168Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_zeros_cuda_float64 XFAIL [ 85%] 2023-01-11T23:13:47.2707340Z test_ops.py::TestMathBitsCUDA::test_neg_view__softmax_backward_data_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2707495Z test_ops.py::TestMathBitsCUDA::test_neg_view_abs_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2707642Z test_ops.py::TestMathBitsCUDA::test_neg_view_addbmm_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2707803Z test_ops.py::TestMathBitsCUDA::test_neg_view_addcdiv_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2707959Z test_ops.py::TestMathBitsCUDA::test_neg_view_addmm_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2708111Z test_ops.py::TestMathBitsCUDA::test_neg_view_addr_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2708260Z test_ops.py::TestMathBitsCUDA::test_neg_view_amax_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2708413Z test_ops.py::TestMathBitsCUDA::test_neg_view_aminmax_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2708563Z test_ops.py::TestMathBitsCUDA::test_neg_view_any_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2708726Z test_ops.py::TestMathBitsCUDA::test_neg_view_argwhere_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2708958Z test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_cuda_float64 SKIPPED (Errors when storage_offset is included) [ 85%] 2023-01-11T23:13:47.2709137Z test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_scatter_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2709314Z test_ops.py::TestMathBitsCUDA::test_neg_view_asinh_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2709472Z test_ops.py::TestMathBitsCUDA::test_neg_view_atan2_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2709630Z test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_1d_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2709846Z test_ops.py::TestMathBitsCUDA::test_neg_view_baddbmm_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2710007Z test_ops.py::TestMathBitsCUDA::test_neg_view_bernoulli_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2710166Z test_ops.py::TestMathBitsCUDA::test_neg_view_block_diag_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2710314Z test_ops.py::TestMathBitsCUDA::test_neg_view_bmm_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2710462Z test_ops.py::TestMathBitsCUDA::test_neg_view_bool_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2710628Z test_ops.py::TestMathBitsCUDA::test_neg_view_broadcast_to_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2710789Z test_ops.py::TestMathBitsCUDA::test_neg_view_bucketize_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2710941Z test_ops.py::TestMathBitsCUDA::test_neg_view_byte_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2711089Z test_ops.py::TestMathBitsCUDA::test_neg_view_cat_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2711239Z test_ops.py::TestMathBitsCUDA::test_neg_view_cdist_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2711391Z test_ops.py::TestMathBitsCUDA::test_neg_view_cdouble_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2711541Z test_ops.py::TestMathBitsCUDA::test_neg_view_cfloat_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2711684Z test_ops.py::TestMathBitsCUDA::test_neg_view_chalf_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2711855Z test_ops.py::TestMathBitsCUDA::test_neg_view_cholesky_inverse_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2712020Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_cholesky_solve_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2712197Z test_ops.py::TestMathBitsCUDA::test_neg_view_clamp_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2712354Z test_ops.py::TestMathBitsCUDA::test_neg_view_clamp_min_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2712505Z test_ops.py::TestMathBitsCUDA::test_neg_view_conj_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2712668Z test_ops.py::TestMathBitsCUDA::test_neg_view_conj_physical_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2712833Z test_ops.py::TestMathBitsCUDA::test_neg_view_constant_pad_nd_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2712987Z test_ops.py::TestMathBitsCUDA::test_neg_view_contiguous_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2713136Z test_ops.py::TestMathBitsCUDA::test_neg_view_cosh_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2713291Z test_ops.py::TestMathBitsCUDA::test_neg_view_cummax_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2713451Z test_ops.py::TestMathBitsCUDA::test_neg_view_diag_embed_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2713611Z test_ops.py::TestMathBitsCUDA::test_neg_view_diagflat_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2713770Z test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2713920Z test_ops.py::TestMathBitsCUDA::test_neg_view_diff_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2714070Z test_ops.py::TestMathBitsCUDA::test_neg_view_dist_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2714230Z test_ops.py::TestMathBitsCUDA::test_neg_view_div_trunc_rounding_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2714380Z test_ops.py::TestMathBitsCUDA::test_neg_view_dot_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2714532Z test_ops.py::TestMathBitsCUDA::test_neg_view_double_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2714685Z test_ops.py::TestMathBitsCUDA::test_neg_view_dsplit_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2714880Z test_ops.py::TestMathBitsCUDA::test_neg_view_empty_cuda_float64 SKIPPED (Skipped!) [ 85%] 2023-01-11T23:13:47.2715056Z test_ops.py::TestMathBitsCUDA::test_neg_view_empty_like_cuda_float64 SKIPPED (Skipped!) 
[ 85%] 2023-01-11T23:13:47.2715208Z test_ops.py::TestMathBitsCUDA::test_neg_view_eq_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2715359Z test_ops.py::TestMathBitsCUDA::test_neg_view_equal_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2715508Z test_ops.py::TestMathBitsCUDA::test_neg_view_erf_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2715651Z test_ops.py::TestMathBitsCUDA::test_neg_view_erfc_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2715803Z test_ops.py::TestMathBitsCUDA::test_neg_view_erfinv_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2715961Z test_ops.py::TestMathBitsCUDA::test_neg_view_expand_as_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2716112Z test_ops.py::TestMathBitsCUDA::test_neg_view_expm1_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2716274Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fftn_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2716433Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_hfftn_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2716592Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifft2_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2716752Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifftn_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2716904Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfftn_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2717061Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfftn_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2717218Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfft2_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2717375Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfft_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2717526Z test_ops.py::TestMathBitsCUDA::test_neg_view_fill_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2717692Z test_ops.py::TestMathBitsCUDA::test_neg_view_float_power_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2717844Z test_ops.py::TestMathBitsCUDA::test_neg_view_floor_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2718033Z test_ops.py::TestMathBitsCUDA::test_neg_view_floor_divide_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2718179Z test_ops.py::TestMathBitsCUDA::test_neg_view_fmax_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2718330Z test_ops.py::TestMathBitsCUDA::test_neg_view_fmin_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2718482Z test_ops.py::TestMathBitsCUDA::test_neg_view_fmod_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2718639Z test_ops.py::TestMathBitsCUDA::test_neg_view_full_like_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2718791Z test_ops.py::TestMathBitsCUDA::test_neg_view_gather_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2718944Z test_ops.py::TestMathBitsCUDA::test_neg_view_geqrf_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2719106Z test_ops.py::TestMathBitsCUDA::test_neg_view_gradient_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2719256Z test_ops.py::TestMathBitsCUDA::test_neg_view_gt_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2719401Z test_ops.py::TestMathBitsCUDA::test_neg_view_histc_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2719555Z test_ops.py::TestMathBitsCUDA::test_neg_view_igammac_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2719711Z test_ops.py::TestMathBitsCUDA::test_neg_view_index_add_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2719875Z test_ops.py::TestMathBitsCUDA::test_neg_view_index_reduce_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2720035Z test_ops.py::TestMathBitsCUDA::test_neg_view_index_select_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2720187Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_inner_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2720338Z test_ops.py::TestMathBitsCUDA::test_neg_view_int_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2720489Z test_ops.py::TestMathBitsCUDA::test_neg_view_isclose_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2720656Z test_ops.py::TestMathBitsCUDA::test_neg_view_isin_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2720810Z test_ops.py::TestMathBitsCUDA::test_neg_view_isinf_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2720963Z test_ops.py::TestMathBitsCUDA::test_neg_view_isnan_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2721121Z test_ops.py::TestMathBitsCUDA::test_neg_view_isneginf_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2721280Z test_ops.py::TestMathBitsCUDA::test_neg_view_isposinf_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2721433Z test_ops.py::TestMathBitsCUDA::test_neg_view_isreal_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2721584Z test_ops.py::TestMathBitsCUDA::test_neg_view_kron_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2721748Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cond_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2721902Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cross_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2722064Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_det_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2722234Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_det_singular_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2722398Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eigh_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2722578Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_householder_product_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2722738Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_inv_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2722898Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lstsq_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2723075Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lstsq_grad_oriented_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2723241Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_factor_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2723400Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2723574Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_power_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2723761Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_multi_dot_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2723922Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_pinv_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2724159Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_pinv_singular_cuda_float64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 86%] 2023-01-11T23:13:47.2724317Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_qr_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2724481Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_solve_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2724644Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_solve_ex_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2724811Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_solve_triangular_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2724976Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_svd_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2725143Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_svdvals_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2725315Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_tensorsolve_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2725481Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vector_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2725634Z test_ops.py::TestMathBitsCUDA::test_neg_view_log10_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2725788Z test_ops.py::TestMathBitsCUDA::test_neg_view_log2_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2725948Z test_ops.py::TestMathBitsCUDA::test_neg_view_log_softmax_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2726120Z test_ops.py::TestMathBitsCUDA::test_neg_view_log_softmax_with_dtype_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2726274Z test_ops.py::TestMathBitsCUDA::test_neg_view_logical_or_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2726459Z test_ops.py::TestMathBitsCUDA::test_neg_view_logspace_cuda_float64 XFAIL [ 86%] 2023-01-11T23:13:47.2726620Z test_ops.py::TestMathBitsCUDA::test_neg_view_logsumexp_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2726782Z test_ops.py::TestMathBitsCUDA::test_neg_view_long_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2726939Z test_ops.py::TestMathBitsCUDA::test_neg_view_lt_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2727100Z test_ops.py::TestMathBitsCUDA::test_neg_view_lu_unpack_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2727264Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_amax_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2727424Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_amin_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2727581Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_argmax_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2727745Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_argmin_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2727916Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_cumprod_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2728080Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_cumsum_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2728253Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_log_softmax_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2728414Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_mean_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2728580Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_median_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2728740Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2728892Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_sum_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2729049Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_var_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2729204Z test_ops.py::TestMathBitsCUDA::test_neg_view_matrix_exp_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2729361Z test_ops.py::TestMathBitsCUDA::test_neg_view_max_binary_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2729520Z test_ops.py::TestMathBitsCUDA::test_neg_view_maximum_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2729722Z test_ops.py::TestMathBitsCUDA::test_neg_view_meshgrid_list_of_tensors_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2729883Z test_ops.py::TestMathBitsCUDA::test_neg_view_min_binary_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2730035Z test_ops.py::TestMathBitsCUDA::test_neg_view_mm_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2730190Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_msort_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2730334Z test_ops.py::TestMathBitsCUDA::test_neg_view_mv_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2730493Z test_ops.py::TestMathBitsCUDA::test_neg_view_nan_to_num_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2730652Z test_ops.py::TestMathBitsCUDA::test_neg_view_nanmedian_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2730818Z test_ops.py::TestMathBitsCUDA::test_neg_view_nanquantile_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2730972Z test_ops.py::TestMathBitsCUDA::test_neg_view_nansum_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2731135Z test_ops.py::TestMathBitsCUDA::test_neg_view_narrow_copy_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2731287Z test_ops.py::TestMathBitsCUDA::test_neg_view_neg_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2731461Z test_ops.py::TestMathBitsCUDA::test_neg_view_new_empty_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:13:47.2731678Z test_ops.py::TestMathBitsCUDA::test_neg_view_new_empty_strided_cuda_float64 SKIPPED (Expected: new_empty_strided is not comparable) [ 86%] 2023-01-11T23:13:47.2731836Z test_ops.py::TestMathBitsCUDA::test_neg_view_new_ones_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2731995Z test_ops.py::TestMathBitsCUDA::test_neg_view_new_zeros_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2732190Z test_ops.py::TestMathBitsCUDA::test_neg_view_nextafter_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2732389Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional__scaled_dot_product_attention_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2732578Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool2d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2732767Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool3d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2732952Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_max_pool1d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2733136Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_max_pool2d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2733312Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_max_pool3d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2733489Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_avg_pool1d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2733667Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_avg_pool3d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2733845Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_bilinear_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2734015Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_celu_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2734184Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv2d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2734368Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose2d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2734746Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose3d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2734921Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_ctc_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2735089Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout2d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2735266Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout3d_cuda_float64 PASSED [ 86%] 
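[Editor's sketch] The test_neg_view_* entries for float64 cover the same machinery restricted to real dtypes, where only the negative bit applies, since conjugation is the identity on real tensors. A sketch under the same torch._neg_view assumption:

```python
import torch

# Same torch._neg_view assumption as above, restricted to a real dtype:
# only the negative bit is in play for float64.
base = torch.arange(4, dtype=torch.float64)
view = torch._neg_view(base)          # shares storage, negative bit set
assert view.is_neg() and not view.is_conj()

# Reads see the negated values lazily; resolve_neg() makes an eager copy.
torch.testing.assert_close(view.abs(), base.abs())
torch.testing.assert_close(view.resolve_neg(), -base)
```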
2023-01-11T23:13:47.2735468Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_feature_alpha_dropout_with_train_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2735753Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_fractional_max_pool2d_cuda_float64 SKIPPED (Operation not tested with tensors with negative bit.) [ 86%] 2023-01-11T23:13:47.2735939Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_gaussian_nll_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2736108Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_gelu_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2736277Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_glu_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2736453Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_group_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2736630Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardshrink_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2736801Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardswish_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2736974Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardtanh_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2737162Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hinge_embedding_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2737336Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_huber_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2737522Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_bicubic_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2737711Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_bilinear_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2737901Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_linear_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2738091Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_trilinear_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2738286Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_kl_div_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2738464Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_layer_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2738651Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_margin_ranking_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2738831Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool1d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2739020Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool3d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2739227Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool2d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2739410Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool3d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2739594Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool3d_grad_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2739778Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_multi_margin_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2739962Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_multilabel_margin_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2740156Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_multilabel_soft_margin_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2740329Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_nll_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2740503Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_normalize_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2740686Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_replicate_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2740855Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pdist_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2741042Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_poisson_nll_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2741213Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_relu_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2741405Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_selu_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2741564Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_silu_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2741751Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softmin_with_dtype_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2741923Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softsign_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2742095Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_unfold_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2742279Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_upsample_nearest_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2742439Z test_ops.py::TestMathBitsCUDA::test_neg_view_norm_fro_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2742601Z test_ops.py::TestMathBitsCUDA::test_neg_view_norm_inf_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2742759Z test_ops.py::TestMathBitsCUDA::test_neg_view_normal_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2742907Z test_ops.py::TestMathBitsCUDA::test_neg_view_ones_cuda_float64 XFAIL [ 86%] 2023-01-11T23:13:47.2743076Z test_ops.py::TestMathBitsCUDA::test_neg_view_ops_nvprims_view_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2743229Z test_ops.py::TestMathBitsCUDA::test_neg_view_outer_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2743386Z test_ops.py::TestMathBitsCUDA::test_neg_view_pinverse_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2743563Z test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_1_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2743740Z test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_2_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2743912Z test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_4_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2744094Z test_ops.py::TestMathBitsCUDA::test_neg_view_put_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2744251Z test_ops.py::TestMathBitsCUDA::test_neg_view_randint_cuda_float64 XFAIL [ 87%] 2023-01-11T23:13:47.2744400Z test_ops.py::TestMathBitsCUDA::test_neg_view_randn_cuda_float64 XFAIL [ 87%] 2023-01-11T23:13:47.2744565Z test_ops.py::TestMathBitsCUDA::test_neg_view_randn_like_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2744717Z test_ops.py::TestMathBitsCUDA::test_neg_view_ravel_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2744869Z test_ops.py::TestMathBitsCUDA::test_neg_view_real_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2745033Z test_ops.py::TestMathBitsCUDA::test_neg_view_reciprocal_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2745192Z test_ops.py::TestMathBitsCUDA::test_neg_view_remainder_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2745345Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_renorm_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2745501Z test_ops.py::TestMathBitsCUDA::test_neg_view_repeat_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2745662Z test_ops.py::TestMathBitsCUDA::test_neg_view_repeat_interleave_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2745825Z test_ops.py::TestMathBitsCUDA::test_neg_view_reshape_as_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2745980Z test_ops.py::TestMathBitsCUDA::test_neg_view_resize_as__cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2746142Z test_ops.py::TestMathBitsCUDA::test_neg_view_resolve_neg_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2746293Z test_ops.py::TestMathBitsCUDA::test_neg_view_roll_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2746449Z test_ops.py::TestMathBitsCUDA::test_neg_view_rot90_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2746603Z test_ops.py::TestMathBitsCUDA::test_neg_view_round_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2746790Z test_ops.py::TestMathBitsCUDA::test_neg_view_round_decimals_neg_3_cuda_float64 SKIPPED (Skipped!) [ 87%] 2023-01-11T23:13:47.2746938Z test_ops.py::TestMathBitsCUDA::test_neg_view_rsqrt_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2747116Z test_ops.py::TestMathBitsCUDA::test_neg_view_scalar_tensor_cuda_float64 SKIPPED (Skipped!) [ 87%] 2023-01-11T23:13:47.2747301Z test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_add_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2747456Z test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2747625Z test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_amin_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2747795Z test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_mean_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2747963Z test_ops.py::TestMathBitsCUDA::test_neg_view_searchsorted_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2748137Z test_ops.py::TestMathBitsCUDA::test_neg_view_segment_reduce_lengths_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2748290Z test_ops.py::TestMathBitsCUDA::test_neg_view_select_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2748475Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_blackman_cuda_float64 SKIPPED (Skipped!) [ 87%] 2023-01-11T23:13:47.2748663Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_cosine_cuda_float64 SKIPPED (Skipped!) [ 87%] 2023-01-11T23:13:47.2748861Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_exponential_cuda_float64 SKIPPED (Skipped!) [ 87%] 2023-01-11T23:13:47.2749057Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_general_hamming_cuda_float64 SKIPPED (Skipped!) [ 87%] 2023-01-11T23:13:47.2749243Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_hamming_cuda_float64 SKIPPED (Skipped!) [ 87%] 2023-01-11T23:13:47.2749399Z test_ops.py::TestMathBitsCUDA::test_neg_view_signbit_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2749550Z test_ops.py::TestMathBitsCUDA::test_neg_view_sin_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2749768Z test_ops.py::TestMathBitsCUDA::test_neg_view_softmax_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2749975Z test_ops.py::TestMathBitsCUDA::test_neg_view_softmax_with_dtype_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2750129Z test_ops.py::TestMathBitsCUDA::test_neg_view_sort_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2750322Z test_ops.py::TestMathBitsCUDA::test_neg_view_sparse_sampled_addmm_cuda_float64 SKIPPED (Skipped!) 
[ 87%] 2023-01-11T23:13:47.2750490Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_airy_ai_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2750657Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_y0_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2750823Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_y1_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2751010Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_t_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2751379Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_w_cuda_float64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 87%] 2023-01-11T23:13:47.2751546Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_entr_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2751725Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_hermite_polynomial_h_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2751887Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_i0e_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2752045Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_i1_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2752395Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_legendre_polynomial_p_cuda_float64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 87%] 2023-01-11T23:13:47.2752562Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_log_ndtr_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2752747Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_scaled_modified_bessel_k0_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2753107Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_t_cuda_float64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 87%] 2023-01-11T23:13:47.2753496Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_w_cuda_float64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 87%] 2023-01-11T23:13:47.2753675Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_spherical_bessel_j0_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2753833Z test_ops.py::TestMathBitsCUDA::test_neg_view_split_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2753993Z test_ops.py::TestMathBitsCUDA::test_neg_view_split_with_sizes_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2754144Z test_ops.py::TestMathBitsCUDA::test_neg_view_sqrt_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2754301Z test_ops.py::TestMathBitsCUDA::test_neg_view_square_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2754455Z test_ops.py::TestMathBitsCUDA::test_neg_view_squeeze_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2754625Z test_ops.py::TestMathBitsCUDA::test_neg_view_std_mean_unbiased_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2754775Z test_ops.py::TestMathBitsCUDA::test_neg_view_stft_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2754941Z test_ops.py::TestMathBitsCUDA::test_neg_view_sum_to_size_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2755093Z test_ops.py::TestMathBitsCUDA::test_neg_view_svd_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2755248Z test_ops.py::TestMathBitsCUDA::test_neg_view_svd_lowrank_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2755402Z test_ops.py::TestMathBitsCUDA::test_neg_view_symeig_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2755553Z test_ops.py::TestMathBitsCUDA::test_neg_view_t_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2755717Z test_ops.py::TestMathBitsCUDA::test_neg_view_take_along_dim_cuda_float64 
PASSED [ 87%] 2023-01-11T23:13:47.2755865Z test_ops.py::TestMathBitsCUDA::test_neg_view_tan_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2756042Z test_ops.py::TestMathBitsCUDA::test_neg_view_tanh_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2756206Z test_ops.py::TestMathBitsCUDA::test_neg_view_tensor_split_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2756362Z test_ops.py::TestMathBitsCUDA::test_neg_view_to_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2756515Z test_ops.py::TestMathBitsCUDA::test_neg_view_to_sparse_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2756676Z test_ops.py::TestMathBitsCUDA::test_neg_view_transpose_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2756829Z test_ops.py::TestMathBitsCUDA::test_neg_view_trapz_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2756997Z test_ops.py::TestMathBitsCUDA::test_neg_view_triangular_solve_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2757149Z test_ops.py::TestMathBitsCUDA::test_neg_view_triu_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2757301Z test_ops.py::TestMathBitsCUDA::test_neg_view_trunc_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2757460Z test_ops.py::TestMathBitsCUDA::test_neg_view_unflatten_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2757613Z test_ops.py::TestMathBitsCUDA::test_neg_view_unfold_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2757759Z test_ops.py::TestMathBitsCUDA::test_neg_view_uniform_cuda_float64 XFAIL [ 87%] 2023-01-11T23:13:47.2757910Z test_ops.py::TestMathBitsCUDA::test_neg_view_unique_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2758061Z test_ops.py::TestMathBitsCUDA::test_neg_view_var_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2758221Z test_ops.py::TestMathBitsCUDA::test_neg_view_var_mean_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2758392Z test_ops.py::TestMathBitsCUDA::test_neg_view_var_mean_unbiased_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2758556Z test_ops.py::TestMathBitsCUDA::test_neg_view_var_unbiased_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2758708Z test_ops.py::TestMathBitsCUDA::test_neg_view_vdot_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2758937Z test_ops.py::TestMathBitsCUDA::test_neg_view_view_as_complex_cuda_float64 SKIPPED (Operation not tested with tensors with negative bit.) 
[ 87%] 2023-01-11T23:13:47.2759095Z test_ops.py::TestMathBitsCUDA::test_neg_view_view_copy_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2759273Z test_ops.py::TestMathBitsCUDA::test_neg_view_where_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2759427Z test_ops.py::TestMathBitsCUDA::test_neg_view_xlogy_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2759582Z test_ops.py::TestMathBitsCUDA::test_neg_view_zeros_cuda_float64 XFAIL [ 87%] 2023-01-11T23:13:47.2759733Z test_ops.py::TestFakeTensorCUDA::test_fake_H_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2759890Z test_ops.py::TestFakeTensorCUDA::test_fake___getitem___cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2760042Z test_ops.py::TestFakeTensorCUDA::test_fake___rdiv___cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2760197Z test_ops.py::TestFakeTensorCUDA::test_fake___rmod___cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2760351Z test_ops.py::TestFakeTensorCUDA::test_fake___rpow___cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2760493Z test_ops.py::TestFakeTensorCUDA::test_fake___rxor___cuda_int64 PASSED [ 87%] 2023-01-11T23:13:47.2760670Z test_ops.py::TestFakeTensorCUDA::test_fake__softmax_backward_data_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2760823Z test_ops.py::TestFakeTensorCUDA::test_fake_abs_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2760975Z test_ops.py::TestFakeTensorCUDA::test_fake_acosh_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2761125Z test_ops.py::TestFakeTensorCUDA::test_fake_add_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2761278Z test_ops.py::TestFakeTensorCUDA::test_fake_addcdiv_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2761429Z test_ops.py::TestFakeTensorCUDA::test_fake_addcmul_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2761580Z test_ops.py::TestFakeTensorCUDA::test_fake_addmm_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2761724Z test_ops.py::TestFakeTensorCUDA::test_fake_addmv_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2761904Z test_ops.py::TestFakeTensorCUDA::test_fake_all_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2762056Z test_ops.py::TestFakeTensorCUDA::test_fake_allclose_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2762212Z test_ops.py::TestFakeTensorCUDA::test_fake_amin_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2762366Z test_ops.py::TestFakeTensorCUDA::test_fake_any_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2762518Z test_ops.py::TestFakeTensorCUDA::test_fake_arange_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2762669Z test_ops.py::TestFakeTensorCUDA::test_fake_argmax_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2762819Z test_ops.py::TestFakeTensorCUDA::test_fake_argmin_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2762965Z test_ops.py::TestFakeTensorCUDA::test_fake_argsort_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2763143Z test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_partial_views_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2763294Z test_ops.py::TestFakeTensorCUDA::test_fake_asin_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2763445Z test_ops.py::TestFakeTensorCUDA::test_fake_atanh_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2763607Z test_ops.py::TestFakeTensorCUDA::test_fake_atleast_1d_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2763766Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_H_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2763925Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_T_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2764095Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___radd___cuda_float32 PASSED [ 
87%] 2023-01-11T23:13:47.2764253Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rmul___cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2764418Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___ror___cuda_int64 PASSED [ 87%] 2023-01-11T23:13:47.2764586Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rsub___cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2764751Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rxor___cuda_int64 PASSED [ 87%] 2023-01-11T23:13:47.2764935Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast__softmax_backward_data_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2765124Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_acosh_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2765288Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_add_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2765452Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addbmm_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2765620Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addcmul_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2765778Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmm_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2765955Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmm_decomposed_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2766117Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmv_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2766285Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_allclose_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2766478Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_aminmax_cuda_float32 SKIPPED (Skip failing test) [ 87%] 2023-01-11T23:13:47.2766646Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_angle_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2766811Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argmax_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2766980Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2767151Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_scatter_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2767314Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_asin_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2767475Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atan2_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2767659Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atanh_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2767828Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atleast_1d_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2767998Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atleast_2d_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2768167Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atleast_3d_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2768335Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_baddbmm_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2768496Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bernoulli_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2768663Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bfloat16_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2768830Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_and_cuda_int64 PASSED [ 87%] 2023-01-11T23:13:47.2768995Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_or_cuda_int64 PASSED [ 87%] 2023-01-11T23:13:47.2769163Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bool_cuda_float32 PASSED [ 87%] 
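[annotation] The TestMathBitsCUDA::test_neg_view_* entries above exercise PyTorch's negative-bit views: rather than materializing a negated copy, a view records the pending negation in a tensor flag and each kernel resolves it lazily. A minimal sketch of the mechanism follows; note that _neg_view() is a private helper used by these tests (the public surface is Tensor.is_neg() and torch.resolve_neg), so treat the exact spelling as version-dependent.

```python
import torch

x = torch.tensor([1.0, -2.0, 3.0])

# _neg_view() is the private helper the MathBits tests use: it returns a view
# that shares x's storage and merely sets the "neg bit" -- no data is copied.
v = x._neg_view()
print(v.is_neg())                      # True: the negation is still pending

# Any kernel that consumes v resolves the bit on the fly ...
print(v + 0)                           # tensor([-1.,  2., -3.])

# ... or it can be materialized explicitly via the public API.
print(torch.resolve_neg(v).is_neg())   # False: values are now physically negated
```

The suite runs each op once on a plain input and once on such a view, then checks the results agree; XFAIL entries (e.g. test_neg_view_uniform_cuda_float64 above) mark ops where this check is expected to fail.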
2023-01-11T23:13:47.2769343Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_tensors_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2769508Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_byte_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2769674Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cat_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2769838Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cdist_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2769996Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cdouble_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2770198Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_inverse_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2770360Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_chunk_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2770527Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_max_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2770700Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_column_stack_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2770889Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_complex_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2771053Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_conj_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2771228Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_conj_physical_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2771387Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_corrcoef_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2771549Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cos_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2771722Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_count_nonzero_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2771886Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cummax_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2772053Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cumsum_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2772238Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cumulative_trapezoid_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2772407Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_deg2rad_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2772575Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diag_embed_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2772749Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagonal_copy_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2772906Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_digamma_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2773086Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_floor_rounding_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2773265Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_no_rounding_mode_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2773442Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_trunc_rounding_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2773648Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_equal_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2773811Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_erf_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2773976Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_exp2_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2774134Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_eye_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2774293Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fft_cuda_float32 
PASSED [ 88%] 2023-01-11T23:13:47.2774459Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfft_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2774750Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfftn_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2774916Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifft2_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2775084Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifft_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2775254Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfft2_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2775419Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfft_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2775584Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_rfft2_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2775748Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_rfftn_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2775902Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fill_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2782735Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_flip_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2782922Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fliplr_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2783083Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_flipud_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2783260Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_floor_divide_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2783425Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmax_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2783655Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmod_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2783825Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_full_like_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2783984Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gather_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2784147Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gcd_cuda_int64 PASSED [ 88%] 2023-01-11T23:13:47.2784310Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ge_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2784474Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_histc_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2784637Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hsplit_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2784798Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_igamma_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2784958Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_imag_cuda_complex64 PASSED [ 88%] 2023-01-11T23:13:47.2785129Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_copy_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2785289Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_fill_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2785455Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_put_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2785618Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isclose_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2785786Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isfinite_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2785948Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isnan_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2786113Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isposinf_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2786342Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_istft_cuda_complex64 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2786567Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_binary_return_by_ref_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2786736Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_kthvalue_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2786893Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lerp_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2787067Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cond_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2787240Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cross_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2787423Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_det_singular_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2787592Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eig_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2787794Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eigvals_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2787995Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eigvalsh_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2788167Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_inv_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2788333Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_inv_ex_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2788516Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_factor_ex_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2788691Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_solve_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2788869Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lstsq_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2789060Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2789236Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2789419Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_rank_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2789662Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_rank_hermitian_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2789929Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_multi_dot_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2790093Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2790289Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2790460Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2790665Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_hermitian_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2790914Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 88%] 2023-01-11T23:13:47.2791112Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2791281Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_svd_cuda_float32 PASSED [ 88%] 
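[annotation] The test_fake_* names come from TestFakeTensorCUDA, which reruns each op on "fake" tensors: stand-ins that carry shape, dtype, and device metadata but no storage, so the test validates shape and dtype propagation without doing real work. A rough sketch, assuming the internal FakeTensorMode API of this era (torch._subclasses.fake_tensor); the import path and behavior are not stable across releases:

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode  # internal API; assumed import path

with FakeTensorMode():
    # Factory calls made under the mode produce FakeTensors: metadata only, no storage.
    a = torch.empty(8, 16)
    w = torch.empty(16, 4)
    y = a @ w                  # dispatched through meta kernels; no arithmetic actually runs
    print(type(y).__name__)    # FakeTensor
    print(y.shape, y.dtype)    # torch.Size([8, 4]) torch.float32
```

The test_fake_autocast_* variants repeat the same check with torch.autocast active, so autocast's dtype promotion rules are exercised through the fake kernels as well.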
2023-01-11T23:13:47.2791455Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_svdvals_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2791632Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_tensorinv_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2791837Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_tensorsolve_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2792003Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_vander_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2792206Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_vecdot_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2792387Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_vector_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2792556Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log10_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2792719Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log1p_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2792887Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log2_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2793075Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_softmax_with_dtype_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2793248Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logaddexp2_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2793406Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logdet_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2793578Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_and_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2793752Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logspace_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2793922Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_unpack_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2794094Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_amax_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2794269Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_cumprod_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2794449Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_logsumexp_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2794620Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_mean_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2794793Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2794963Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_normalize_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2795139Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_prod_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2795313Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_scatter_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2795514Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_select_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2795684Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_std_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2795852Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_sum_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2796019Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_var_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2796187Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_matmul_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2796348Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_matrix_exp_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2796547Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2796734Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_reduction_no_dim_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2796928Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_meshgrid_variadic_tensors_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2797109Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_reduction_no_dim_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2797292Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_reduction_with_dim_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2797456Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mode_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2797626Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_movedim_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2797794Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_msort_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2797984Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_multinomial_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2798205Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanmean_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2798379Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_narrow_copy_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2798565Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_dropout_backward_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2798744Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_layer_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2798924Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_empty_strided_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2799093Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_zeros_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2799262Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nextafter_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2799471Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2799666Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2799866Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2800064Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2800251Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_batch_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2800452Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2800639Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_bilinear_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2800836Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2801020Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv1d_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2801196Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv2d_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2801451Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2801645Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2801839Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cosine_similarity_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2802030Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cross_entropy_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2802241Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2802458Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2802661Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2802852Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardshrink_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2803039Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_huber_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2803223Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_instance_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2803418Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2803615Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_linear_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2803814Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2804041Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2804226Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_kl_div_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2804420Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_local_response_norm_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2804607Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_logsigmoid_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2804796Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2804997Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool1d_cuda_float32 SKIPPED (Skip failing test) [ 89%] 2023-01-11T23:13:47.2805190Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2805384Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool2d_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2805573Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2805761Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool3d_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2805942Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_mish_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2806127Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_mse_loss_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2806320Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2806525Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_one_hot_cuda_int64 SKIPPED (Skip failing test) [ 89%] 2023-01-11T23:13:47.2806703Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_constant_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2806896Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2807117Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2807300Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_prelu_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2807476Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_relu6_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2807652Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_relu_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2807832Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_rrelu_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2808018Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softplus_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2808237Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2808431Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_upsample_nearest_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2808592Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2808762Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_fro_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2808931Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_inf_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2809100Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_nuc_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2809280Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_normal_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2809488Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_normal_number_mean_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2809691Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ones_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2809947Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ones_like_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2810143Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_outer_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2810317Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_pca_lowrank_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2810482Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_pinverse_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2810644Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polar_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2810828Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_0_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2811010Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_1_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2811193Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_4_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2811358Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_positive_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2811519Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_pow_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2811675Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_prod_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2811834Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_put_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2811999Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rad2deg_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2812164Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rand_like_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2812333Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randint_like_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2812498Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randn_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2812664Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randn_like_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2812835Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_reciprocal_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2812991Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_renorm_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2813188Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resize__cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2813364Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resolve_conj_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2813531Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resolve_neg_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2813693Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_roll_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2813866Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_decimals_0_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2814043Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_decimals_neg_3_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2814208Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rsqrt_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2814362Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rsub_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2814656Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scalar_tensor_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2814822Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2815001Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_amax_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2815174Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_prod_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2815349Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_sum_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2815521Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_searchsorted_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2815706Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_segment_reduce_offsets_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2815976Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sigmoid_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2816129Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sign_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2816317Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_bartlett_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2816506Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_exponential_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2816672Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signbit_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2816834Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sin_cuda_float32 
PASSED [ 89%] 2023-01-11T23:13:47.2816994Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sinc_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2817156Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_softmax_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2817338Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_softmax_with_dtype_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2817506Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_j0_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2817685Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_y0_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2817878Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_t_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2818068Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_hermite_polynomial_he_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2818238Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_i0e_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2818403Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_i1_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2818594Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_laguerre_polynomial_l_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2818773Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_log_ndtr_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2818959Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_i1_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2819168Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_k1_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2819339Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_ndtr_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2819540Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2819733Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_scaled_modified_bessel_k1_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2820140Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 89%] 2023-01-11T23:13:47.2820533Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 89%] 2023-01-11T23:13:47.2820755Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2820981Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_list_args_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2821195Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sqrt_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2821403Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_square_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2821594Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2821775Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_mean_unbiased_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2821937Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sub_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2822172Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sum_to_size_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2822343Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_svd_lowrank_cuda_float32 PASSED [ 
89%] 2023-01-11T23:13:47.2822505Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_t_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2822673Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_take_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2822835Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tanh_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2823023Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tensor_split_cuda_float32 SKIPPED (Skip failing test) [ 89%] 2023-01-11T23:13:47.2823214Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_to_sparse_cuda_float32 SKIPPED (Skip failing test) [ 89%] 2023-01-11T23:13:47.2823377Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_topk_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2823539Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trace_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2823716Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triangular_solve_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2823885Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tril_indices_cuda_int64 PASSED [ 89%] 2023-01-11T23:13:47.2824048Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triu_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2824216Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_true_divide_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2824378Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trunc_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2824538Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unflatten_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2824704Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unfold_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2824883Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unique_consecutive_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2825051Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_mean_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2825230Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_complex_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2825418Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2825581Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_xlogy_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2825740Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_zero__cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2825890Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_zeros_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2826049Z test_ops.py::TestFakeTensorCUDA::test_fake_baddbmm_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2826209Z test_ops.py::TestFakeTensorCUDA::test_fake_bernoulli_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2826367Z test_ops.py::TestFakeTensorCUDA::test_fake_bfloat16_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2826527Z test_ops.py::TestFakeTensorCUDA::test_fake_bincount_cuda_int64 PASSED [ 89%] 2023-01-11T23:13:47.2826681Z test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_not_cuda_int64 PASSED [ 89%] 2023-01-11T23:13:47.2826836Z test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_or_cuda_int64 PASSED [ 89%] 2023-01-11T23:13:47.2826992Z test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_xor_cuda_int64 PASSED [ 89%] 2023-01-11T23:13:47.2827145Z test_ops.py::TestFakeTensorCUDA::test_fake_block_diag_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2827295Z test_ops.py::TestFakeTensorCUDA::test_fake_bmm_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2827446Z test_ops.py::TestFakeTensorCUDA::test_fake_bool_cuda_float32 
PASSED [ 89%] 2023-01-11T23:13:47.2827614Z test_ops.py::TestFakeTensorCUDA::test_fake_broadcast_shapes_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2827782Z test_ops.py::TestFakeTensorCUDA::test_fake_broadcast_tensors_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2827932Z test_ops.py::TestFakeTensorCUDA::test_fake_byte_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2828113Z test_ops.py::TestFakeTensorCUDA::test_fake_cdouble_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2828266Z test_ops.py::TestFakeTensorCUDA::test_fake_char_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2828426Z test_ops.py::TestFakeTensorCUDA::test_fake_cholesky_solve_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2828583Z test_ops.py::TestFakeTensorCUDA::test_fake_clone_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2828748Z test_ops.py::TestFakeTensorCUDA::test_fake_combinations_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2828902Z test_ops.py::TestFakeTensorCUDA::test_fake_complex_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2829052Z test_ops.py::TestFakeTensorCUDA::test_fake_conj_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2829218Z test_ops.py::TestFakeTensorCUDA::test_fake_constant_pad_nd_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2829372Z test_ops.py::TestFakeTensorCUDA::test_fake_copysign_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2829529Z test_ops.py::TestFakeTensorCUDA::test_fake_corrcoef_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2829672Z test_ops.py::TestFakeTensorCUDA::test_fake_cos_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2829911Z test_ops.py::TestFakeTensorCUDA::test_fake_cosh_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2830064Z test_ops.py::TestFakeTensorCUDA::test_fake_cross_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2830242Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_T_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2830424Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___radd___cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2830606Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rdiv___cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2830789Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rmatmul___cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2830969Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rmul___cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2831150Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rpow___cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2831345Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rsub___cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2831547Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__native_batch_norm_legit_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2831745Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__softmax_backward_data_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2831922Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_acos_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2832100Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_add_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2832284Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addcdiv_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2832466Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addcmul_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2832642Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addmm_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2832842Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addmm_decomposed_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2833021Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2833219Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_partial_views_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2833395Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_asin_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2833572Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atan2_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2833747Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atanh_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2833960Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_1d_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2834145Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_3d_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2834334Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bernoulli_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2834515Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bfloat16_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2834682Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bmm_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2834871Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_broadcast_to_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2835044Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cat_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2835222Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ceil_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2835403Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_chalf_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2835580Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_chunk_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2835758Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2835942Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_min_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2836122Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_complex_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2836291Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_conj_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2836474Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_corrcoef_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2836648Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cos_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2836827Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cosh_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2837003Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cov_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2837202Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cross_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2837384Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cummax_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2837563Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumsum_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2837753Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumulative_trapezoid_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2837942Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_copy_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2838125Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_digamma_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2838303Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dist_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2838478Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dot_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2838660Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_double_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2838836Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dsplit_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2839018Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erfc_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2839209Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expand_as_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2839403Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expm1_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2839595Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fft2_cuda_float32 XFAIL [ 90%] 2023-01-11T23:13:47.2839811Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fftshift_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2839995Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_hfftn_cuda_float32 XFAIL [ 90%] 2023-01-11T23:13:47.2840173Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfft2_cuda_float32 XFAIL [ 90%] 2023-01-11T23:13:47.2840348Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfftn_cuda_float32 XFAIL [ 90%] 2023-01-11T23:13:47.2840524Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfft_cuda_float32 XFAIL [ 90%] 2023-01-11T23:13:47.2840698Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfft2_cuda_float32 XFAIL [ 90%] 2023-01-11T23:13:47.2840874Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfft_cuda_float32 XFAIL [ 90%] 2023-01-11T23:13:47.2841043Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_flip_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2841231Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_flipud_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2841409Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_float_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2841596Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_float_power_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2841770Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_floor_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2841949Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmod_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2842125Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_frac_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2842305Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_gather_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2842497Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_grid_sampler_2d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2842666Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_i0_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2842876Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_fill_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2843060Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_put_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2843245Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_select_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2843420Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_inner_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2843596Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_kron_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2843775Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lgamma_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2843961Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cross_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2844147Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_det_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2844338Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_det_singular_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2844519Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eig_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2844707Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigh_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2844899Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigvals_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2845104Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_householder_product_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2845287Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2845503Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_factor_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2845698Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_factor_ex_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2845895Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_matrix_power_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2846075Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_multi_dot_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2846260Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_norm_cuda_float32 XFAIL [ 90%]
2023-01-11T23:13:47.2846467Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_norm_subgradients_at_zero_cuda_float32 XFAIL [ 90%]
2023-01-11T23:13:47.2846668Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_pinv_cuda_float32 SKIPPED (Skipped!) [ 90%]
2023-01-11T23:13:47.2846879Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_pinv_singular_cuda_float32 SKIPPED (Skipped!) [ 90%]
2023-01-11T23:13:47.2847062Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_qr_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2847250Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2847439Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_ex_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2847620Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_svd_cuda_float32 XFAIL [ 90%]
2023-01-11T23:13:47.2847801Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vander_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2847989Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vecdot_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2848167Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log2_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2848354Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_softmax_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2848540Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logaddexp2_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2848754Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logcumsumexp_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2848937Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logdet_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2849136Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logit_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2849336Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2849504Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mT_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2849696Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_cumprod_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2849883Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_fill_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2850072Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_mean_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2850256Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_norm_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2850446Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_normalize_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2850634Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_select_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2850823Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_softmin_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2851011Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_std_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2851189Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_matrix_exp_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2851406Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_binary_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2851618Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2851816Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_reduction_with_dim_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2852017Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_meshgrid_variadic_tensors_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2852195Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mode_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2852376Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_movedim_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2852554Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_msort_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2852731Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mv_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2852920Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2853120Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2853305Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nan_to_num_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2853487Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanmean_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2853674Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanquantile_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2853854Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nansum_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2854044Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_native_layer_norm_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2854223Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_neg_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2854470Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2854806Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2855018Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2855222Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2855429Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_alpha_dropout_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2855634Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool2d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2855841Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool3d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2856039Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_batch_norm_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2856265Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_binary_cross_entropy_cuda_float32 SKIPPED (Skipped!) [ 90%]
2023-01-11T23:13:47.2856476Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2856685Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_cosine_similarity_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2856890Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_ctc_loss_cuda_float32 SKIPPED (Skipped!) [ 90%]
2023-01-11T23:13:47.2857138Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_dropout2d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2857335Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_dropout3d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2857535Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_dropout_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2857730Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_elu_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2857935Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_embedding_bag_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2858135Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_embedding_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2858363Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2858579Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2858794Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2858980Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_glu_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2859181Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_grid_sample_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2859380Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_group_norm_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2859579Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardsigmoid_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2859782Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_instance_norm_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2859993Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_area_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2860233Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2860433Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_kl_div_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2860628Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_linear_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2860827Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_local_response_norm_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2861038Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2861237Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool3d_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2861443Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool2d_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2861650Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool3d_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2861851Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_nll_loss_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2862052Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_normalize_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2862259Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_replicate_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2862466Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pairwise_distance_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2862671Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2862904Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2863102Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_relu6_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2863296Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_relu_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2863491Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_rrelu_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2863694Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2863897Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2864099Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softsign_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2864300Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_tanhshrink_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2864500Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_threshold_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2864704Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2864904Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2865111Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_upsample_nearest_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2865290Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2865474Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_inf_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2865658Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_normal_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2865880Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_normal_number_mean_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2866062Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ormqr_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2866248Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pca_lowrank_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2866431Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_permute_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2866621Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pinverse_cuda_float32 SKIPPED (Skipped!) [ 91%]
2023-01-11T23:13:47.2866821Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_0_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2867015Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_2_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2867211Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_3_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2867397Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_positive_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2867574Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pow_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2867759Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_quantile_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2867940Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ravel_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2868118Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_real_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2868300Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reshape_as_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2868507Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reshape_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2868696Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_resolve_conj_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2868883Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2869075Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_decimals_0_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2869266Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_decimals_3_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2869455Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_add_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2869652Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_segment_reduce_lengths_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2869924Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_segment_reduce_offsets_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2870123Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_select_scatter_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2870303Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sgn_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2870485Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sigmoid_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2870663Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sin_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2870841Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sinc_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2871021Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_slice_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2871212Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_slice_scatter_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2871425Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sparse_sampled_addmm_cuda_float32 SKIPPED (Skipped!) [ 91%]
2023-01-11T23:13:47.2871618Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_entr_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2871824Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_i1_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2872015Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_i1e_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2872213Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_log_ndtr_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2872439Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2872634Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_xlog1py_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2872824Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_split_with_sizes_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2873010Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sqrt_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2873196Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_square_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2873387Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_mean_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2873559Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_stft_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2873740Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sub_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2873930Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sum_to_size_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2874110Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_t_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2874288Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tan_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2874494Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tanh_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2874686Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tensor_split_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2874875Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tensordot_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2875053Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tile_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2875223Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_to_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2875406Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trace_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2875589Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trapz_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2875784Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_triangular_solve_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2875965Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trunc_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2876148Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unbind_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2876337Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unflatten_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2876519Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unsqueeze_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2876694Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_as_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2876875Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2877059Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vsplit_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2877238Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_where_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2877426Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_xlogy_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2877645Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___getitem___cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2877841Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmatmul___cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2878026Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmod___cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2878213Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rpow___cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2878414Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp__native_batch_norm_legit_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2878598Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_abs_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2878776Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_add_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2878962Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addbmm_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2879163Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addcmul_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2879380Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addmm_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2879588Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addmm_decomposed_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2879772Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addmv_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2879953Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_amax_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2880123Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_amin_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2880347Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_as_strided_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2880551Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_as_strided_partial_views_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2880735Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_asin_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2880910Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atan_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2881091Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atanh_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2881279Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_1d_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2881467Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_block_diag_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2881656Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_broadcast_to_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2881830Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cat_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2882010Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ceil_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2882190Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_chalf_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2882373Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2882567Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_solve_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2882749Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_chunk_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2882927Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clone_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2883121Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_combinations_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2883305Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_conj_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2883516Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_constant_pad_nd_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2883705Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_corrcoef_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2883886Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cov_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2884070Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cummax_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2884249Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cummin_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2884427Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cumsum_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2884633Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cumulative_trapezoid_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2884814Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diag_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2884993Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diff_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2885172Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_digamma_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2885351Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dist_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2885550Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_div_no_rounding_mode_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2885729Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dot_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2885914Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dstack_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2886120Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_einsum_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2886300Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erfinv_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2886481Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_exp2_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2886659Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expand_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2886842Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fftshift_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2887024Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfft2_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2887206Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfft_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2887389Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfftn_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2887574Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ihfft_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2887762Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ihfftn_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2887949Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfft2_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2888129Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_rfftn_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2888316Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_flatten_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2888493Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fliplr_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2888672Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fmin_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2888848Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_frac_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2889033Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_frexp_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2889236Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_gather_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2889421Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_gradient_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2889618Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_grid_sampler_2d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2889797Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_hsplit_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2889973Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_i0_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2890155Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_copy_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2890346Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_fill_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2890536Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_put_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2890728Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_reduce_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2890922Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_select_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2891105Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_inner_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2891283Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_kron_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2891467Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_kthvalue_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2891650Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ldexp_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2891907Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cond_cuda_float32 XFAIL [ 92%]
2023-01-11T23:13:47.2892095Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cross_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2892300Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_det_singular_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2892486Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eig_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2892680Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eigvals_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2892867Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_inv_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2893073Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2893264Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2893460Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_factor_ex_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2893647Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_matrix_norm_cuda_float32 XFAIL [ 92%]
2023-01-11T23:13:47.2893833Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_norm_cuda_float32 XFAIL [ 92%]
2023-01-11T23:13:47.2894024Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_pinv_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2894224Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_pinv_hermitian_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2894409Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_qr_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2894707Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_slogdet_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2894916Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_triangular_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2895146Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_svdvals_cuda_float32 XFAIL [ 92%]
2023-01-11T23:13:47.2895336Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_vander_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2895532Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_vector_norm_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2895705Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2895894Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_softmax_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2896092Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_softmax_with_dtype_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2896285Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logcumsumexp_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2896472Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logit_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2896662Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logsumexp_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2896838Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mH_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2897029Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_cumprod_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2897219Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_cumsum_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2897400Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_fill_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2897594Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_log_softmax_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2897814Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_mean_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2898005Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_median_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2898189Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_norm_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2898384Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_normalize_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2898572Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_prod_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2898758Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_sum_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2898944Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_var_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2899127Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_binary_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2899343Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2899540Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_reduction_no_dim_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2899723Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mean_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2899905Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_median_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2900092Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_min_binary_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2900292Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_min_reduction_with_dim_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2900482Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanmedian_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2900669Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanquantile_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2900868Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_narrow_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2901064Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_native_batch_norm_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2901256Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_native_layer_norm_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2901478Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2901691Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2901895Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool3d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2902100Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_batch_norm_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2902314Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2902537Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2902734Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_celu_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2902934Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2903143Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_cosine_similarity_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2903389Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_cross_entropy_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2903595Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_dropout2d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2903795Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_dropout_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2903989Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_elu_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2904203Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2904412Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2904607Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_gelu_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2904813Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_grid_sample_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2905022Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2905223Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_huber_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2905431Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_area_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2905644Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2905858Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2906055Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_kl_div_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2906254Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_l1_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2906476Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_leaky_relu_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2906680Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool1d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2906874Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool2d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2907078Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool3d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2907282Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool1d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2907491Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2907703Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2907911Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool3d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2908105Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_mish_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2908323Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2908542Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2908740Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_nll_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2908961Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_reflect_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2909173Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_replicate_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2909380Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2909589Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2909858Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2910055Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_selu_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2910250Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_silu_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2910464Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2910669Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softshrink_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2910869Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softsign_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2911063Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_tanhshrink_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2911291Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2911478Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_fro_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2911664Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_normal_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2911861Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_normal_number_mean_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2912071Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_outer_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2912253Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polar_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2912439Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_positive_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2912622Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_pow_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2912796Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_prod_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2912978Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_put_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2913165Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_quantile_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2913352Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ravel_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2913551Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_repeat_interleave_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2913736Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reshape_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2913917Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rot90_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2914115Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_0_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2914312Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_neg_3_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2914486Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rsqrt_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2914709Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_amax_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2914901Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_amin_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2915099Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_mean_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2915290Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_prod_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2915481Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_sum_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2915683Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_segment_reduce_lengths_cuda_float32 XFAIL [ 92%]
2023-01-11T23:13:47.2915865Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sin_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2916048Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_slice_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2916232Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_slice_scatter_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2916417Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sort_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2916610Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_entr_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2916800Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i0e_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2916988Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i1_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2917183Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_log_ndtr_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2917371Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_ndtr_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2917569Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_list_args_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2917785Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_with_sizes_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2917965Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_square_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2918153Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_squeeze_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2918336Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_stack_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2918515Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_stft_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2918694Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sub_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2918876Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_symeig_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2919056Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_t_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2919251Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_take_along_dim_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2919431Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_take_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2919614Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tensordot_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2919797Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tile_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2919973Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_to_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2920158Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trapezoid_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2920366Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trapz_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2920545Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_triu_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2920738Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_true_divide_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2920922Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unbind_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2921109Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unsqueeze_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2921297Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_mean_unbiased_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2921478Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_vdot_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2921663Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_as_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2921845Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2922028Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_vsplit_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2922214Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_where_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2922391Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_xlogy_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2922552Z test_ops.py::TestFakeTensorCUDA::test_fake_cumprod_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2922711Z test_ops.py::TestFakeTensorCUDA::test_fake_cumsum_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2922879Z test_ops.py::TestFakeTensorCUDA::test_fake_cumulative_trapezoid_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2923037Z test_ops.py::TestFakeTensorCUDA::test_fake_deg2rad_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2923199Z test_ops.py::TestFakeTensorCUDA::test_fake_diag_embed_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2923369Z test_ops.py::TestFakeTensorCUDA::test_fake_diagonal_copy_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2923539Z test_ops.py::TestFakeTensorCUDA::test_fake_diagonal_scatter_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2923720Z test_ops.py::TestFakeTensorCUDA::test_fake_dist_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2923893Z test_ops.py::TestFakeTensorCUDA::test_fake_div_floor_rounding_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2924062Z test_ops.py::TestFakeTensorCUDA::test_fake_div_trunc_rounding_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2924211Z test_ops.py::TestFakeTensorCUDA::test_fake_dsplit_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2924366Z test_ops.py::TestFakeTensorCUDA::test_fake_einsum_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2924525Z test_ops.py::TestFakeTensorCUDA::test_fake_empty_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2924687Z test_ops.py::TestFakeTensorCUDA::test_fake_empty_like_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2924843Z test_ops.py::TestFakeTensorCUDA::test_fake_eq_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2924995Z test_ops.py::TestFakeTensorCUDA::test_fake_erf_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2925152Z test_ops.py::TestFakeTensorCUDA::test_fake_erfc_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2925309Z test_ops.py::TestFakeTensorCUDA::test_fake_erfinv_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2925453Z test_ops.py::TestFakeTensorCUDA::test_fake_exp_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2925607Z test_ops.py::TestFakeTensorCUDA::test_fake_expand_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2925761Z test_ops.py::TestFakeTensorCUDA::test_fake_expm1_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2925912Z test_ops.py::TestFakeTensorCUDA::test_fake_eye_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2926070Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_fft_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2926248Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_fftn_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2926413Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_fftshift_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2926570Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfft2_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2926717Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfftn_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2926877Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifftn_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2927038Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfft2_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2927191Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfft2_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2927342Z test_ops.py::TestFakeTensorCUDA::test_fake_flip_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2927497Z test_ops.py::TestFakeTensorCUDA::test_fake_float_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2927660Z test_ops.py::TestFakeTensorCUDA::test_fake_float_power_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2927814Z test_ops.py::TestFakeTensorCUDA::test_fake_floor_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2927965Z test_ops.py::TestFakeTensorCUDA::test_fake_fmax_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2928112Z test_ops.py::TestFakeTensorCUDA::test_fake_frexp_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2928264Z test_ops.py::TestFakeTensorCUDA::test_fake_full_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2928419Z test_ops.py::TestFakeTensorCUDA::test_fake_gather_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2928567Z test_ops.py::TestFakeTensorCUDA::test_fake_gcd_cuda_int64 PASSED [ 93%]
2023-01-11T23:13:47.2928735Z test_ops.py::TestFakeTensorCUDA::test_fake_grid_sampler_2d_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2928887Z test_ops.py::TestFakeTensorCUDA::test_fake_gt_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2929041Z test_ops.py::TestFakeTensorCUDA::test_fake_i0_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2929196Z test_ops.py::TestFakeTensorCUDA::test_fake_igamma_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2929342Z test_ops.py::TestFakeTensorCUDA::test_fake_igammac_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2929525Z test_ops.py::TestFakeTensorCUDA::test_fake_imag_cuda_complex64 PASSED [ 93%]
2023-01-11T23:13:47.2929677Z test_ops.py::TestFakeTensorCUDA::test_fake_int_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2929829Z test_ops.py::TestFakeTensorCUDA::test_fake_isclose_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2929981Z test_ops.py::TestFakeTensorCUDA::test_fake_isfinite_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2930133Z test_ops.py::TestFakeTensorCUDA::test_fake_isin_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2930287Z test_ops.py::TestFakeTensorCUDA::test_fake_isnan_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2930438Z test_ops.py::TestFakeTensorCUDA::test_fake_isreal_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2930618Z test_ops.py::TestFakeTensorCUDA::test_fake_istft_cuda_complex64 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2930821Z test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_2inputs_2outputs_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2931016Z test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_unary_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2931168Z test_ops.py::TestFakeTensorCUDA::test_fake_kron_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2931322Z test_ops.py::TestFakeTensorCUDA::test_fake_kthvalue_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2931475Z test_ops.py::TestFakeTensorCUDA::test_fake_lcm_cuda_int64 PASSED [ 93%]
2023-01-11T23:13:47.2931627Z test_ops.py::TestFakeTensorCUDA::test_fake_ldexp_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2931779Z test_ops.py::TestFakeTensorCUDA::test_fake_lgamma_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2931935Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cond_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2932121Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_det_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2932293Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_det_singular_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2932460Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigh_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2932651Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigvals_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2932832Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_householder_product_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2933002Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_ldl_factor_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2933171Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_ldl_factor_ex_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2933338Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_ldl_solve_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2933513Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2933667Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2933836Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_factor_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2934006Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_factor_ex_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2934174Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_norm_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2934368Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_power_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2934711Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 93%]
2023-01-11T23:13:47.2934877Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_ex_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2935032Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_svd_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2935196Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_svdvals_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2935400Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_tensorinv_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2935593Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_tensorsolve_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2935756Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_vander_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2935923Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_vector_norm_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2936080Z test_ops.py::TestFakeTensorCUDA::test_fake_linspace_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2936232Z test_ops.py::TestFakeTensorCUDA::test_fake_log_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2936392Z test_ops.py::TestFakeTensorCUDA::test_fake_log_softmax_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2936558Z test_ops.py::TestFakeTensorCUDA::test_fake_log_softmax_with_dtype_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2936723Z test_ops.py::TestFakeTensorCUDA::test_fake_logcumsumexp_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2936882Z test_ops.py::TestFakeTensorCUDA::test_fake_logical_and_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2937039Z test_ops.py::TestFakeTensorCUDA::test_fake_logical_or_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2937196Z test_ops.py::TestFakeTensorCUDA::test_fake_logical_xor_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2937353Z test_ops.py::TestFakeTensorCUDA::test_fake_logsumexp_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2937507Z test_ops.py::TestFakeTensorCUDA::test_fake_long_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2937657Z test_ops.py::TestFakeTensorCUDA::test_fake_lu_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2937831Z test_ops.py::TestFakeTensorCUDA::test_fake_lu_solve_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2938024Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_amin_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2938188Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_argmax_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2938355Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_cumprod_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2938520Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_logaddexp_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2938678Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_mean_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2938839Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_median_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2939000Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_norm_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2939171Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_prod_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2939359Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_scatter_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2939526Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_select_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2939684Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_softmax_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2939844Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_softmin_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2940001Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_std_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2940162Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_sum_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2940319Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_var_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2940471Z test_ops.py::TestFakeTensorCUDA::test_fake_matmul_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2940619Z test_ops.py::TestFakeTensorCUDA::test_fake_matrix_exp_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2940804Z test_ops.py::TestFakeTensorCUDA::test_fake_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2940977Z test_ops.py::TestFakeTensorCUDA::test_fake_max_reduction_no_dim_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2941128Z test_ops.py::TestFakeTensorCUDA::test_fake_mean_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2941328Z test_ops.py::TestFakeTensorCUDA::test_fake_meshgrid_list_of_tensors_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2941499Z test_ops.py::TestFakeTensorCUDA::test_fake_min_reduction_with_dim_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2941649Z test_ops.py::TestFakeTensorCUDA::test_fake_mul_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2941844Z test_ops.py::TestFakeTensorCUDA::test_fake_mvlgamma_mvlgamma_p_1_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2942029Z test_ops.py::TestFakeTensorCUDA::test_fake_mvlgamma_mvlgamma_p_5_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2942183Z test_ops.py::TestFakeTensorCUDA::test_fake_nan_to_num_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2942368Z test_ops.py::TestFakeTensorCUDA::test_fake_nanquantile_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2942528Z test_ops.py::TestFakeTensorCUDA::test_fake_narrow_copy_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2942703Z test_ops.py::TestFakeTensorCUDA::test_fake_narrow_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2942878Z test_ops.py::TestFakeTensorCUDA::test_fake_native_dropout_backward_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2943043Z test_ops.py::TestFakeTensorCUDA::test_fake_native_layer_norm_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2943199Z test_ops.py::TestFakeTensorCUDA::test_fake_neg_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2943354Z test_ops.py::TestFakeTensorCUDA::test_fake_new_ones_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2943542Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2943727Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2943948Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2944126Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_alpha_dropout_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2944301Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_avg_pool1d_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2944473Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_batch_norm_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2944645Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_bilinear_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2944832Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 94%]
2023-01-11T23:13:47.2945021Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 94%]
2023-01-11T23:13:47.2945190Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv1d_cuda_float32 PASSED [ 94%]
2023-01-11T23:13:47.2945358Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv2d_cuda_float32 PASSED [ 94%]
2023-01-11T23:13:47.2945540Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 94%]
2023-01-11T23:13:47.2945721Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cosine_similarity_cuda_float32 PASSED [ 94%]
2023-01-11T23:13:47.2945892Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout3d_cuda_float32 PASSED [ 94%]
2023-01-11T23:13:47.2946094Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_embedding_bag_cuda_float32 SKIPPED (Skip failing test) [ 94%]
2023-01-11T23:13:47.2946288Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 94%]
2023-01-11T23:13:47.2946486Z
test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2946664Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2946847Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2947036Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_gelu_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2947209Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_group_norm_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2947383Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardshrink_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2947553Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardswish_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2947726Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardtanh_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2947902Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_instance_norm_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2948085Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2948267Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2948452Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_linear_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2948639Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2948805Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_kl_div_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2948977Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_layer_norm_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2949166Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_local_response_norm_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2949360Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_logsigmoid_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2949556Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2949841Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_pool1d_cuda_float32 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:13:47.2950007Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_pool2d_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2950181Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_pool3d_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2950356Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool1d_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2950532Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool3d_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2950722Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2950895Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_normalize_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2951067Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_constant_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2951248Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2951416Z 
test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_relu6_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2951575Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_relu_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2951741Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_rrelu_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2951907Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_selu_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2952083Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2952252Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softmin_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2952433Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2952607Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softsign_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2952780Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_threshold_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2952984Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2953168Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_upsample_nearest_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2953322Z test_ops.py::TestFakeTensorCUDA::test_fake_nonzero_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2953477Z test_ops.py::TestFakeTensorCUDA::test_fake_norm_fro_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2953630Z test_ops.py::TestFakeTensorCUDA::test_fake_norm_inf_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2953785Z test_ops.py::TestFakeTensorCUDA::test_fake_ones_like_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2953936Z test_ops.py::TestFakeTensorCUDA::test_fake_ormqr_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2954094Z test_ops.py::TestFakeTensorCUDA::test_fake_permute_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2954247Z test_ops.py::TestFakeTensorCUDA::test_fake_polar_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2954416Z test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_1_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2954567Z test_ops.py::TestFakeTensorCUDA::test_fake_positive_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2954715Z test_ops.py::TestFakeTensorCUDA::test_fake_put_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2954868Z test_ops.py::TestFakeTensorCUDA::test_fake_rand_like_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2955027Z test_ops.py::TestFakeTensorCUDA::test_fake_randn_like_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2955174Z test_ops.py::TestFakeTensorCUDA::test_fake_real_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2955333Z test_ops.py::TestFakeTensorCUDA::test_fake_reciprocal_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2955511Z test_ops.py::TestFakeTensorCUDA::test_fake_repeat_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2955693Z test_ops.py::TestFakeTensorCUDA::test_fake_repeat_interleave_cuda_float32 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:13:47.2955846Z test_ops.py::TestFakeTensorCUDA::test_fake_resize__cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2956007Z test_ops.py::TestFakeTensorCUDA::test_fake_resolve_conj_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2956155Z test_ops.py::TestFakeTensorCUDA::test_fake_round_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2956324Z test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_3_cuda_float32 PASSED [ 94%] 
2023-01-11T23:13:47.2956472Z test_ops.py::TestFakeTensorCUDA::test_fake_rsqrt_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2956624Z test_ops.py::TestFakeTensorCUDA::test_fake_rsub_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2956773Z test_ops.py::TestFakeTensorCUDA::test_fake_scatter_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2956936Z test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_amax_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2957103Z test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_amin_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2957271Z test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_mean_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2957437Z test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_prod_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2957600Z test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_sum_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2957763Z test_ops.py::TestFakeTensorCUDA::test_fake_searchsorted_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2957956Z test_ops.py::TestFakeTensorCUDA::test_fake_segment_reduce_lengths_cuda_float32 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:13:47.2958121Z test_ops.py::TestFakeTensorCUDA::test_fake_select_scatter_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2958263Z test_ops.py::TestFakeTensorCUDA::test_fake_sgn_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2958417Z test_ops.py::TestFakeTensorCUDA::test_fake_short_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2958571Z test_ops.py::TestFakeTensorCUDA::test_fake_sigmoid_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2958748Z test_ops.py::TestFakeTensorCUDA::test_fake_sign_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2958921Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_blackman_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2959103Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_general_cosine_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2959293Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_hann_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2959485Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_kaiser_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2959636Z test_ops.py::TestFakeTensorCUDA::test_fake_signbit_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2959783Z test_ops.py::TestFakeTensorCUDA::test_fake_sinc_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2959932Z test_ops.py::TestFakeTensorCUDA::test_fake_sinh_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2960087Z test_ops.py::TestFakeTensorCUDA::test_fake_softmax_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2960280Z test_ops.py::TestFakeTensorCUDA::test_fake_sparse_sampled_addmm_cuda_float32 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:13:47.2960442Z test_ops.py::TestFakeTensorCUDA::test_fake_special_airy_ai_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2960607Z test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_j0_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2960769Z test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_y1_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2961141Z test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 94%] 2023-01-11T23:13:47.2961296Z test_ops.py::TestFakeTensorCUDA::test_fake_special_entr_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2961481Z test_ops.py::TestFakeTensorCUDA::test_fake_special_i0e_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2961638Z 
test_ops.py::TestFakeTensorCUDA::test_fake_special_i1_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2961799Z test_ops.py::TestFakeTensorCUDA::test_fake_special_i1e_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2961979Z test_ops.py::TestFakeTensorCUDA::test_fake_special_laguerre_polynomial_l_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2962327Z test_ops.py::TestFakeTensorCUDA::test_fake_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 94%] 2023-01-11T23:13:47.2962494Z test_ops.py::TestFakeTensorCUDA::test_fake_special_log_ndtr_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2962670Z test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_i0_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2962848Z test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_i1_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2963017Z test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_k1_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2963208Z test_ops.py::TestFakeTensorCUDA::test_fake_special_scaled_modified_bessel_k1_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2963567Z test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 94%] 2023-01-11T23:13:47.2963927Z test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 94%] 2023-01-11T23:13:47.2964275Z test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 94%] 2023-01-11T23:13:47.2964435Z test_ops.py::TestFakeTensorCUDA::test_fake_special_zeta_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2964602Z test_ops.py::TestFakeTensorCUDA::test_fake_split_list_args_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2964757Z test_ops.py::TestFakeTensorCUDA::test_fake_square_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2964940Z test_ops.py::TestFakeTensorCUDA::test_fake_std_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2965092Z test_ops.py::TestFakeTensorCUDA::test_fake_sub_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2965243Z test_ops.py::TestFakeTensorCUDA::test_fake_sum_to_size_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2965393Z test_ops.py::TestFakeTensorCUDA::test_fake_svd_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2965550Z test_ops.py::TestFakeTensorCUDA::test_fake_svd_lowrank_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2965703Z test_ops.py::TestFakeTensorCUDA::test_fake_symeig_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2965854Z test_ops.py::TestFakeTensorCUDA::test_fake_t_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2966008Z test_ops.py::TestFakeTensorCUDA::test_fake_tanh_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2966194Z test_ops.py::TestFakeTensorCUDA::test_fake_tensor_split_cuda_float32 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:13:47.2966345Z test_ops.py::TestFakeTensorCUDA::test_fake_tile_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2966519Z test_ops.py::TestFakeTensorCUDA::test_fake_to_sparse_cuda_float32 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:13:47.2966672Z test_ops.py::TestFakeTensorCUDA::test_fake_topk_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2966825Z test_ops.py::TestFakeTensorCUDA::test_fake_transpose_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2966992Z 
test_ops.py::TestFakeTensorCUDA::test_fake_triangular_solve_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2967139Z test_ops.py::TestFakeTensorCUDA::test_fake_tril_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2967297Z test_ops.py::TestFakeTensorCUDA::test_fake_true_divide_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2967472Z test_ops.py::TestFakeTensorCUDA::test_fake_trunc_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2967623Z test_ops.py::TestFakeTensorCUDA::test_fake_unbind_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2967778Z test_ops.py::TestFakeTensorCUDA::test_fake_unfold_copy_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2967931Z test_ops.py::TestFakeTensorCUDA::test_fake_uniform_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2968100Z test_ops.py::TestFakeTensorCUDA::test_fake_unique_consecutive_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2968253Z test_ops.py::TestFakeTensorCUDA::test_fake_unsqueeze_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2968402Z test_ops.py::TestFakeTensorCUDA::test_fake_var_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2968556Z test_ops.py::TestFakeTensorCUDA::test_fake_var_mean_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2968723Z test_ops.py::TestFakeTensorCUDA::test_fake_var_mean_unbiased_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2968877Z test_ops.py::TestFakeTensorCUDA::test_fake_vdot_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2969024Z test_ops.py::TestFakeTensorCUDA::test_fake_view_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2969195Z test_ops.py::TestFakeTensorCUDA::test_fake_xlogy_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2969369Z test_ops.py::TestFakeTensorCUDA::test_fake_zero__cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2969517Z test_ops.py::TestFakeTensorCUDA::test_fake_zeros_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2969686Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___getitem___cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2969849Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rand___cuda_int64 PASSED [ 94%] 2023-01-11T23:13:47.2970014Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rdiv___cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2970178Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rmod___cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2970338Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rmul___cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2970498Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rsub___cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2970686Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rxor___cuda_int64 PASSED [ 94%] 2023-01-11T23:13:47.2970852Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_add_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2971018Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addcdiv_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2971183Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addmm_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2971345Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_amax_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2971510Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_any_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2971673Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_arange_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2971831Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argmax_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2972008Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_scatter_cuda_float32 PASSED [ 95%] 
2023-01-11T23:13:47.2972172Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_asin_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2972335Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_asinh_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2972494Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atan2_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2972650Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atanh_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2972818Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atleast_2d_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2972986Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atleast_3d_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2973144Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_baddbmm_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2973336Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bfloat16_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2973501Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_and_cuda_int64 PASSED [ 95%] 2023-01-11T23:13:47.2973669Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_or_cuda_int64 PASSED [ 95%] 2023-01-11T23:13:47.2973839Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_xor_cuda_int64 PASSED [ 95%] 2023-01-11T23:13:47.2974002Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bmm_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2974179Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_shapes_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2974356Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_tensors_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2974636Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_to_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2974805Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cartesian_prod_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2974974Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cfloat_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2975138Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_chalf_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2975303Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_char_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2975501Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cholesky_inverse_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2975677Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cholesky_solve_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2975842Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clamp_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2976002Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clone_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2976161Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_complex_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2976334Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_conj_physical_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2976500Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_corrcoef_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2976702Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cos_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2976889Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cov_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2977059Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cummin_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2977223Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cumsum_cuda_float32 PASSED [ 95%] 
2023-01-11T23:13:47.2977391Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_deg2rad_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2977556Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diag_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2977718Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagflat_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2977888Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagonal_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2978074Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagonal_scatter_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2978240Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diff_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2978418Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_div_floor_rounding_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2978600Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_div_no_rounding_mode_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2978778Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_div_trunc_rounding_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2978941Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dot_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2979099Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dsplit_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2979260Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dstack_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2979454Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_einsum_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2979620Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_empty_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2979786Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_equal_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2979950Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erfinv_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2980116Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_exp2_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2980280Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_exp_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2980445Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_expand_as_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2980613Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_expand_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2980779Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_expm1_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2980947Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_eye_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2981117Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fft2_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2981284Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fft_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2981455Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fftn_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2981623Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfft_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2981794Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifft2_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2981953Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifft_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2982124Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifftn_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2982301Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifftshift_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2982473Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_irfft2_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2982719Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_irfft_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2982888Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_rfftn_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2983053Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fliplr_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2983215Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flipud_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2983371Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_float_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2983532Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_floor_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2983695Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmax_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2983858Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_full_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2984022Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gather_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2984186Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gcd_cuda_int64 PASSED [ 95%] 2023-01-11T23:13:47.2984349Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_half_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2984518Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_heaviside_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2984671Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_histc_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2984833Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hsplit_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2984995Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hypot_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2985157Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_i0_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2985357Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_igamma_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2985525Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_add_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2985698Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_copy_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2985866Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_fill_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2986040Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_select_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2986195Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isin_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2986356Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isinf_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2986514Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isnan_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2986680Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isneginf_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2986852Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isposinf_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2987018Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isreal_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2987211Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_istft_cuda_complex64 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2987428Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_4inputs_with_extra_args_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2987631Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_binary_return_by_ref_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2987827Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_unary_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2987996Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_kthvalue_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2988156Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lcm_cuda_int64 PASSED [ 95%] 2023-01-11T23:13:47.2988319Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ldexp_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2988481Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_le_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2988678Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cross_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2988865Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_det_singular_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2989036Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigh_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2989226Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigvals_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2989422Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigvalsh_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2989590Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_inv_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2989837Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lstsq_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2990007Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2990184Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_factor_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2990360Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_solve_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2990561Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_power_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2990738Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_rank_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2990904Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_slogdet_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2991077Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_solve_ex_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2991310Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_tensorsolve_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2991493Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_vector_norm_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2991661Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linspace_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2991824Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log10_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2991984Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log1p_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2992156Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_softmax_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2992341Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_softmax_with_dtype_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2992511Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logaddexp_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2992734Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_or_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2992960Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_xor_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2993135Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logit_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2993302Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logsumexp_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2993467Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lt_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2993655Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_solve_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2993823Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_unpack_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2993977Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mH_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2994138Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mT_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2994316Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_argmax_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2994489Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_cumprod_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2994709Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_fill_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2994889Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_log_softmax_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2995067Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_logaddexp_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2995236Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_mean_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2995405Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_norm_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2995575Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_normalize_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2995748Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_select_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2995916Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_sum_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2996083Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_matmul_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2996251Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_maximum_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2996414Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mean_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2996600Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_meshgrid_list_of_tensors_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2996788Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_meshgrid_variadic_tensors_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2996947Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_min_binary_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2997130Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_min_reduction_no_dim_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2997339Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_min_reduction_with_dim_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2997505Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mul_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2997709Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_5_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:13:47.2997876Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nan_to_num_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2998066Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nanmean_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:13:47.2998235Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nanmedian_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2998400Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nansum_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2998562Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_narrow_copy_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2998753Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_native_dropout_backward_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2998935Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_native_layer_norm_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2999103Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_empty_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2999276Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_empty_strided_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2999441Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nextafter_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2999637Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2999833Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3000029Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3000219Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3000434Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_alpha_dropout_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3000621Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_avg_pool3d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3000820Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3001027Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3001205Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_celu_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3001386Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv2d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3001582Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3001774Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3001955Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_dropout3d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3002139Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_dropout_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3002350Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3002547Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3002724Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_glu_cuda_float32 
PASSED [ 96%] 2023-01-11T23:13:47.3002912Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardshrink_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3003124Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardswish_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3003323Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3003507Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_huber_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3003688Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_instance_norm_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3003881Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_area_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3004074Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3004268Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_linear_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3004465Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3004648Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_kl_div_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3004831Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_layer_norm_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3005014Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_leaky_relu_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3005222Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool1d_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:13:47.3005396Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool3d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3005584Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool2d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3005770Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_mse_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3005960Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3006180Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3006384Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3006568Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_normalize_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3006769Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_one_hot_cuda_int64 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:13:47.3006960Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_constant_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3007142Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_replicate_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3007339Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pairwise_distance_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3007532Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3007724Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3007913Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3008094Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_rrelu_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3008272Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_selu_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3008449Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_silu_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3008663Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3008845Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3009033Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softmin_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3009226Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3009410Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softplus_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3009595Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softshrink_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3009780Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softsign_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3009975Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3010169Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_upsample_nearest_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3010338Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_fro_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3010500Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_normal_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3010677Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_normal_number_mean_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3010842Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_outer_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3011009Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_permute_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3011194Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_1_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3011383Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_3_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3011570Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_4_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3011737Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_positive_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3011940Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_prod_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3012102Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rand_like_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3012269Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_renorm_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3012431Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_repeat_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3012602Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reshape_as_cuda_float32 PASSED [ 96%] 
2023-01-11T23:13:47.3012771Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resolve_neg_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3012936Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3013121Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_decimals_neg_3_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3013283Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rsqrt_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3013442Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rsub_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3013619Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scalar_tensor_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3013788Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_add_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3013968Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_amin_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3014143Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_prod_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3014356Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_segment_reduce_lengths_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:13:47.3014772Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_segment_reduce_offsets_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3014977Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_select_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3015155Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_select_scatter_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3015310Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sgn_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3015473Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_short_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3015642Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sigmoid_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3015808Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sign_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3015992Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_bartlett_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3016175Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_blackman_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3016360Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_cosine_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3016548Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_exponential_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3016739Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_general_hamming_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3016909Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_hann_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3017089Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_kaiser_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3017258Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signbit_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3017428Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_slice_scatter_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3017595Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_softmax_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3017770Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_softmax_with_dtype_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3018030Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sparse_sampled_addmm_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:13:47.3018210Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_airy_ai_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3018380Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_j1_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3018558Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_y0_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3018751Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_u_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3019140Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 96%] 2023-01-11T23:13:47.3019508Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 96%] 2023-01-11T23:13:47.3019686Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_entr_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3019857Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_i1e_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3020052Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_laguerre_polynomial_l_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3020231Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_log_ndtr_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3020420Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_k0_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3020600Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_k1_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3020810Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_ndtr_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3020987Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_ndtri_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3021196Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3021390Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_scaled_modified_bessel_k1_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3021766Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 96%] 2023-01-11T23:13:47.3022129Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 96%] 2023-01-11T23:13:47.3022487Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 96%] 2023-01-11T23:13:47.3022668Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_xlog1py_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3022847Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_list_args_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3023018Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_with_sizes_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3023187Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sqrt_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3023354Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_squeeze_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3023524Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_stack_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3023693Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_mean_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3023875Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_mean_unbiased_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3024053Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_unbiased_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3024250Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_stft_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3024406Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sub_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3024564Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sum_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3024735Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sum_to_size_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3024897Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_take_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3025056Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tanh_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3025226Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tensordot_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3025393Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tile_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3025583Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_to_sparse_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:13:47.3025756Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_transpose_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3025914Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trapz_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3026074Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tril_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3026235Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_triu_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3026407Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_true_divide_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3026573Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unbind_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3026742Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unflatten_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3026946Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unique_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3027121Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_complex_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3027282Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3027449Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_copy_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3027613Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_where_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3027774Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_xlogy_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3027932Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zero__cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3028094Z test_ops.py::TestTagsCUDA::test_tags_H_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3028262Z test_ops.py::TestTagsCUDA::test_tags___getitem___cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3028434Z 
test_ops.py::TestTagsCUDA::test_tags___radd___cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3028588Z test_ops.py::TestTagsCUDA::test_tags___rand___cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3028751Z test_ops.py::TestTagsCUDA::test_tags___rdiv___cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3028914Z test_ops.py::TestTagsCUDA::test_tags___rmul___cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3029070Z test_ops.py::TestTagsCUDA::test_tags___rpow___cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3029258Z test_ops.py::TestTagsCUDA::test_tags__native_batch_norm_legit_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3029445Z test_ops.py::TestTagsCUDA::test_tags__refs_T_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3029649Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_chalf_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3029920Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_char_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3030103Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_half_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3030305Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_int_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3030483Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_long_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3030649Z test_ops.py::TestTagsCUDA::test_tags__refs_abs_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3030816Z test_ops.py::TestTagsCUDA::test_tags__refs_acosh_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3030985Z test_ops.py::TestTagsCUDA::test_tags__refs_addr_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3031149Z test_ops.py::TestTagsCUDA::test_tags__refs_amin_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3031324Z test_ops.py::TestTagsCUDA::test_tags__refs_arange_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3031498Z test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3031665Z test_ops.py::TestTagsCUDA::test_tags__refs_asin_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3031826Z test_ops.py::TestTagsCUDA::test_tags__refs_asinh_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3031992Z test_ops.py::TestTagsCUDA::test_tags__refs_atan2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3032160Z test_ops.py::TestTagsCUDA::test_tags__refs_atan_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3032331Z test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_or_cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3032501Z test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_xor_cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3032711Z test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_shapes_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3032893Z test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_tensors_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3033067Z test_ops.py::TestTagsCUDA::test_tags__refs_bucketize_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3033232Z test_ops.py::TestTagsCUDA::test_tags__refs_cat_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 
2023-01-11T23:13:47.3033390Z test_ops.py::TestTagsCUDA::test_tags__refs_clamp_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3033566Z test_ops.py::TestTagsCUDA::test_tags__refs_clamp_min_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3033735Z test_ops.py::TestTagsCUDA::test_tags__refs_clone_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3033911Z test_ops.py::TestTagsCUDA::test_tags__refs_column_stack_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3034082Z test_ops.py::TestTagsCUDA::test_tags__refs_conj_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3034257Z test_ops.py::TestTagsCUDA::test_tags__refs_contiguous_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3034430Z test_ops.py::TestTagsCUDA::test_tags__refs_copysign_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3034592Z test_ops.py::TestTagsCUDA::test_tags__refs_cos_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3034750Z test_ops.py::TestTagsCUDA::test_tags__refs_cosh_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3034928Z test_ops.py::TestTagsCUDA::test_tags__refs_diag_embed_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3035107Z test_ops.py::TestTagsCUDA::test_tags__refs_diagonal_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3035279Z test_ops.py::TestTagsCUDA::test_tags__refs_diagonal_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3035465Z test_ops.py::TestTagsCUDA::test_tags__refs_diagonal_scatter_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3035636Z test_ops.py::TestTagsCUDA::test_tags__refs_digamma_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3035849Z test_ops.py::TestTagsCUDA::test_tags__refs_div_no_rounding_mode_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3036032Z test_ops.py::TestTagsCUDA::test_tags__refs_div_trunc_rounding_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3036198Z test_ops.py::TestTagsCUDA::test_tags__refs_empty_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3036356Z test_ops.py::TestTagsCUDA::test_tags__refs_erf_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3036521Z test_ops.py::TestTagsCUDA::test_tags__refs_erfc_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3036689Z test_ops.py::TestTagsCUDA::test_tags__refs_exp2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3036864Z test_ops.py::TestTagsCUDA::test_tags__refs_expand_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3037028Z test_ops.py::TestTagsCUDA::test_tags__refs_eye_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3037199Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_fft2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3037364Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_fftn_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3037539Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_fftshift_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3037715Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfft2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3037878Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfft_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3038045Z 
test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfftn_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3038232Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifft2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3038403Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifft_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3038570Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifftn_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3038741Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfft2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3038911Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfftn_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3039076Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_rfft2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3039253Z test_ops.py::TestTagsCUDA::test_tags__refs_float_power_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3039413Z test_ops.py::TestTagsCUDA::test_tags__refs_floor_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3039588Z test_ops.py::TestTagsCUDA::test_tags__refs_floor_divide_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3039755Z test_ops.py::TestTagsCUDA::test_tags__refs_fmin_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3039921Z test_ops.py::TestTagsCUDA::test_tags__refs_fmod_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3040088Z test_ops.py::TestTagsCUDA::test_tags__refs_frac_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3040252Z test_ops.py::TestTagsCUDA::test_tags__refs_gcd_cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3040417Z test_ops.py::TestTagsCUDA::test_tags__refs_ge_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3040587Z test_ops.py::TestTagsCUDA::test_tags__refs_hstack_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3040761Z test_ops.py::TestTagsCUDA::test_tags__refs_index_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3040934Z test_ops.py::TestTagsCUDA::test_tags__refs_index_select_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3041108Z test_ops.py::TestTagsCUDA::test_tags__refs_isfinite_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3041303Z test_ops.py::TestTagsCUDA::test_tags__refs_isnan_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3041475Z test_ops.py::TestTagsCUDA::test_tags__refs_isneginf_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3041645Z test_ops.py::TestTagsCUDA::test_tags__refs_isposinf_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3041812Z test_ops.py::TestTagsCUDA::test_tags__refs_le_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3041990Z test_ops.py::TestTagsCUDA::test_tags__refs_linalg_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3042168Z test_ops.py::TestTagsCUDA::test_tags__refs_linspace_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3042332Z test_ops.py::TestTagsCUDA::test_tags__refs_log1p_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3042498Z test_ops.py::TestTagsCUDA::test_tags__refs_log2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3042689Z test_ops.py::TestTagsCUDA::test_tags__refs_log_softmax_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 
2023-01-11T23:13:47.3042865Z test_ops.py::TestTagsCUDA::test_tags__refs_logical_xor_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3043034Z test_ops.py::TestTagsCUDA::test_tags__refs_logspace_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3043207Z test_ops.py::TestTagsCUDA::test_tags__refs_logsumexp_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3043379Z test_ops.py::TestTagsCUDA::test_tags__refs_masked_fill_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3043547Z test_ops.py::TestTagsCUDA::test_tags__refs_maximum_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3043746Z test_ops.py::TestTagsCUDA::test_tags__refs_minimum_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3043902Z test_ops.py::TestTagsCUDA::test_tags__refs_mul_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3044077Z test_ops.py::TestTagsCUDA::test_tags__refs_nan_to_num_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3044248Z test_ops.py::TestTagsCUDA::test_tags__refs_narrow_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3044417Z test_ops.py::TestTagsCUDA::test_tags__refs_narrow_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3044602Z test_ops.py::TestTagsCUDA::test_tags__refs_native_layer_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3044768Z test_ops.py::TestTagsCUDA::test_tags__refs_ne_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3044948Z test_ops.py::TestTagsCUDA::test_tags__refs_new_empty_strided_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3045137Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_celu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3045315Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_elu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3045487Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_gelu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3045678Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_hardtanh_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3045878Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_hinge_embedding_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3046063Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_l1_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3046253Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_layer_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3046442Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_leaky_relu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3046654Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_nll_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3046851Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_poisson_nll_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3047038Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_prelu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3047209Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_relu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3047408Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softmin_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 
2023-01-11T23:13:47.3047596Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softplus_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3047790Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softshrink_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3047977Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_tanhshrink_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3048145Z test_ops.py::TestTagsCUDA::test_tags__refs_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3048314Z test_ops.py::TestTagsCUDA::test_tags__refs_ones_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3048487Z test_ops.py::TestTagsCUDA::test_tags__refs_permute_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3048655Z test_ops.py::TestTagsCUDA::test_tags__refs_pow_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3048821Z test_ops.py::TestTagsCUDA::test_tags__refs_prod_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3048981Z test_ops.py::TestTagsCUDA::test_tags__refs_randn_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3049184Z test_ops.py::TestTagsCUDA::test_tags__refs_remainder_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3049355Z test_ops.py::TestTagsCUDA::test_tags__refs_reshape_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3049522Z test_ops.py::TestTagsCUDA::test_tags__refs_roll_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3049689Z test_ops.py::TestTagsCUDA::test_tags__refs_rot90_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3049854Z test_ops.py::TestTagsCUDA::test_tags__refs_sgn_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3050017Z test_ops.py::TestTagsCUDA::test_tags__refs_sign_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3050188Z test_ops.py::TestTagsCUDA::test_tags__refs_signbit_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3050341Z test_ops.py::TestTagsCUDA::test_tags__refs_sin_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3050526Z test_ops.py::TestTagsCUDA::test_tags__refs_softmax_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3050704Z test_ops.py::TestTagsCUDA::test_tags__refs_special_entr_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3050882Z test_ops.py::TestTagsCUDA::test_tags__refs_special_i0e_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3051056Z test_ops.py::TestTagsCUDA::test_tags__refs_special_i1_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3051228Z test_ops.py::TestTagsCUDA::test_tags__refs_special_i1e_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3051425Z test_ops.py::TestTagsCUDA::test_tags__refs_special_log_softmax_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3051605Z test_ops.py::TestTagsCUDA::test_tags__refs_special_logit_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3051783Z test_ops.py::TestTagsCUDA::test_tags__refs_special_ndtr_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3051957Z test_ops.py::TestTagsCUDA::test_tags__refs_special_ndtri_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3052176Z test_ops.py::TestTagsCUDA::test_tags__refs_special_softmax_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 
2023-01-11T23:13:47.3052373Z test_ops.py::TestTagsCUDA::test_tags__refs_special_spherical_bessel_j0_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3052549Z test_ops.py::TestTagsCUDA::test_tags__refs_special_zeta_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3052719Z test_ops.py::TestTagsCUDA::test_tags__refs_std_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3052883Z test_ops.py::TestTagsCUDA::test_tags__refs_sub_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3053056Z test_ops.py::TestTagsCUDA::test_tags__refs_sum_to_size_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3053220Z test_ops.py::TestTagsCUDA::test_tags__refs_tan_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3053383Z test_ops.py::TestTagsCUDA::test_tags__refs_tanh_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3053551Z test_ops.py::TestTagsCUDA::test_tags__refs_tensor_split_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3053716Z test_ops.py::TestTagsCUDA::test_tags__refs_to_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3053891Z test_ops.py::TestTagsCUDA::test_tags__refs_tril_indices_cuda_int64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3054061Z test_ops.py::TestTagsCUDA::test_tags__refs_triu_indices_cuda_int64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3054233Z test_ops.py::TestTagsCUDA::test_tags__refs_true_divide_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3054404Z test_ops.py::TestTagsCUDA::test_tags__refs_unbind_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3054703Z test_ops.py::TestTagsCUDA::test_tags__refs_unflatten_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3054931Z test_ops.py::TestTagsCUDA::test_tags__refs_unfold_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3055104Z test_ops.py::TestTagsCUDA::test_tags__refs_unfold_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3055272Z test_ops.py::TestTagsCUDA::test_tags__refs_unsqueeze_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3055443Z test_ops.py::TestTagsCUDA::test_tags__refs_var_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3055615Z test_ops.py::TestTagsCUDA::test_tags__refs_view_as_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3055783Z test_ops.py::TestTagsCUDA::test_tags__refs_view_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3055954Z test_ops.py::TestTagsCUDA::test_tags__refs_vsplit_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3056123Z test_ops.py::TestTagsCUDA::test_tags__refs_vstack_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3056312Z test_ops.py::TestTagsCUDA::test_tags__softmax_backward_data_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3056476Z test_ops.py::TestTagsCUDA::test_tags_abs_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3056642Z test_ops.py::TestTagsCUDA::test_tags_acosh_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3056801Z test_ops.py::TestTagsCUDA::test_tags_addbmm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3056968Z test_ops.py::TestTagsCUDA::test_tags_addcmul_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3057131Z test_ops.py::TestTagsCUDA::test_tags_addmv_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 
2023-01-11T23:13:47.3057295Z test_ops.py::TestTagsCUDA::test_tags_addr_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3057460Z test_ops.py::TestTagsCUDA::test_tags_allclose_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3057629Z test_ops.py::TestTagsCUDA::test_tags_aminmax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3057792Z test_ops.py::TestTagsCUDA::test_tags_angle_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3057993Z test_ops.py::TestTagsCUDA::test_tags_arange_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3058150Z test_ops.py::TestTagsCUDA::test_tags_argmax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3058310Z test_ops.py::TestTagsCUDA::test_tags_asin_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3058466Z test_ops.py::TestTagsCUDA::test_tags_asinh_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3058626Z test_ops.py::TestTagsCUDA::test_tags_atan2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3058787Z test_ops.py::TestTagsCUDA::test_tags_atan_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3058954Z test_ops.py::TestTagsCUDA::test_tags_atleast_2d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3059121Z test_ops.py::TestTagsCUDA::test_tags_baddbmm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3059292Z test_ops.py::TestTagsCUDA::test_tags_bitwise_and_cuda_int64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3059466Z test_ops.py::TestTagsCUDA::test_tags_bitwise_left_shift_cuda_int64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3059621Z test_ops.py::TestTagsCUDA::test_tags_bitwise_not_cuda_int64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3059783Z test_ops.py::TestTagsCUDA::test_tags_bitwise_or_cuda_int64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3059941Z test_ops.py::TestTagsCUDA::test_tags_bmm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3060120Z test_ops.py::TestTagsCUDA::test_tags_broadcast_shapes_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3060290Z test_ops.py::TestTagsCUDA::test_tags_broadcast_to_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3060477Z test_ops.py::TestTagsCUDA::test_tags_byte_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3060655Z test_ops.py::TestTagsCUDA::test_tags_cartesian_prod_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3060824Z test_ops.py::TestTagsCUDA::test_tags_cdouble_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3060978Z test_ops.py::TestTagsCUDA::test_tags_chalf_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3061137Z test_ops.py::TestTagsCUDA::test_tags_char_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3061303Z test_ops.py::TestTagsCUDA::test_tags_cholesky_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3061470Z test_ops.py::TestTagsCUDA::test_tags_clamp_min_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3061640Z test_ops.py::TestTagsCUDA::test_tags_column_stack_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3061804Z test_ops.py::TestTagsCUDA::test_tags_complex_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3061978Z test_ops.py::TestTagsCUDA::test_tags_conj_physical_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 
2023-01-11T23:13:47.3062148Z test_ops.py::TestTagsCUDA::test_tags_contiguous_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3062315Z test_ops.py::TestTagsCUDA::test_tags_cummax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3062472Z test_ops.py::TestTagsCUDA::test_tags_cummin_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3062631Z test_ops.py::TestTagsCUDA::test_tags_cumsum_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3062796Z test_ops.py::TestTagsCUDA::test_tags_deg2rad_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3062959Z test_ops.py::TestTagsCUDA::test_tags_diagflat_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3063130Z test_ops.py::TestTagsCUDA::test_tags_diagonal_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3063313Z test_ops.py::TestTagsCUDA::test_tags_diagonal_scatter_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3063502Z test_ops.py::TestTagsCUDA::test_tags_digamma_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3063680Z test_ops.py::TestTagsCUDA::test_tags_div_floor_rounding_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3063859Z test_ops.py::TestTagsCUDA::test_tags_div_trunc_rounding_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3064016Z test_ops.py::TestTagsCUDA::test_tags_double_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3064177Z test_ops.py::TestTagsCUDA::test_tags_einsum_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3064344Z test_ops.py::TestTagsCUDA::test_tags_empty_like_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3064504Z test_ops.py::TestTagsCUDA::test_tags_exp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3064675Z test_ops.py::TestTagsCUDA::test_tags_expand_as_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3064839Z test_ops.py::TestTagsCUDA::test_tags_fft_fft2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3065002Z test_ops.py::TestTagsCUDA::test_tags_fft_fft_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3065172Z test_ops.py::TestTagsCUDA::test_tags_fft_fftshift_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3065329Z test_ops.py::TestTagsCUDA::test_tags_fft_ifft2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3065490Z test_ops.py::TestTagsCUDA::test_tags_fft_ifft_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3065657Z test_ops.py::TestTagsCUDA::test_tags_fft_ifftn_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3065827Z test_ops.py::TestTagsCUDA::test_tags_fft_ifftshift_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3066029Z test_ops.py::TestTagsCUDA::test_tags_fft_ihfft_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3066199Z test_ops.py::TestTagsCUDA::test_tags_fft_ihfftn_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3066366Z test_ops.py::TestTagsCUDA::test_tags_fft_irfft_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3066526Z test_ops.py::TestTagsCUDA::test_tags_flip_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3066690Z test_ops.py::TestTagsCUDA::test_tags_fliplr_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3066846Z test_ops.py::TestTagsCUDA::test_tags_flipud_cuda_float32 SKIPPED (Only runs on cpu) [ 98%]
2023-01-11T23:13:47.3067018Z test_ops.py::TestTagsCUDA::test_tags_float_power_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3067179Z test_ops.py::TestTagsCUDA::test_tags_floor_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3067349Z test_ops.py::TestTagsCUDA::test_tags_floor_divide_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3067511Z test_ops.py::TestTagsCUDA::test_tags_fmax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3067674Z test_ops.py::TestTagsCUDA::test_tags_frac_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3067835Z test_ops.py::TestTagsCUDA::test_tags_geqrf_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3068002Z test_ops.py::TestTagsCUDA::test_tags_gradient_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3068155Z test_ops.py::TestTagsCUDA::test_tags_gt_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3068320Z test_ops.py::TestTagsCUDA::test_tags_heaviside_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3068477Z test_ops.py::TestTagsCUDA::test_tags_histc_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3068638Z test_ops.py::TestTagsCUDA::test_tags_hstack_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3068801Z test_ops.py::TestTagsCUDA::test_tags_hypot_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3068963Z test_ops.py::TestTagsCUDA::test_tags_igamma_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3069152Z test_ops.py::TestTagsCUDA::test_tags_igammac_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3069320Z test_ops.py::TestTagsCUDA::test_tags_index_put_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3069491Z test_ops.py::TestTagsCUDA::test_tags_index_select_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3069645Z test_ops.py::TestTagsCUDA::test_tags_inner_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3069873Z test_ops.py::TestTagsCUDA::test_tags_isclose_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3070040Z test_ops.py::TestTagsCUDA::test_tags_isfinite_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3070198Z test_ops.py::TestTagsCUDA::test_tags_isinf_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3070361Z test_ops.py::TestTagsCUDA::test_tags_isnan_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3070529Z test_ops.py::TestTagsCUDA::test_tags_isneginf_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3070693Z test_ops.py::TestTagsCUDA::test_tags_isposinf_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3070868Z test_ops.py::TestTagsCUDA::test_tags_jiterator_unary_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3071019Z test_ops.py::TestTagsCUDA::test_tags_le_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3071178Z test_ops.py::TestTagsCUDA::test_tags_lerp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3071350Z test_ops.py::TestTagsCUDA::test_tags_linalg_cond_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3071520Z test_ops.py::TestTagsCUDA::test_tags_linalg_det_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3071775Z test_ops.py::TestTagsCUDA::test_tags_linalg_det_singular_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 
2023-01-11T23:13:47.3071942Z test_ops.py::TestTagsCUDA::test_tags_linalg_eig_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3072117Z test_ops.py::TestTagsCUDA::test_tags_linalg_eigvalsh_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3072286Z test_ops.py::TestTagsCUDA::test_tags_linalg_inv_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3072457Z test_ops.py::TestTagsCUDA::test_tags_linalg_inv_ex_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3072625Z test_ops.py::TestTagsCUDA::test_tags_linalg_ldl_solve_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3072795Z test_ops.py::TestTagsCUDA::test_tags_linalg_lstsq_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3072981Z test_ops.py::TestTagsCUDA::test_tags_linalg_lstsq_grad_oriented_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3073151Z test_ops.py::TestTagsCUDA::test_tags_linalg_lu_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3073328Z test_ops.py::TestTagsCUDA::test_tags_linalg_lu_factor_ex_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3073505Z test_ops.py::TestTagsCUDA::test_tags_linalg_lu_solve_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3073699Z test_ops.py::TestTagsCUDA::test_tags_linalg_norm_subgradients_at_zero_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3073881Z test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_hermitian_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3074105Z test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 98%] 2023-01-11T23:13:47.3074271Z test_ops.py::TestTagsCUDA::test_tags_linalg_slogdet_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3074440Z test_ops.py::TestTagsCUDA::test_tags_linalg_svd_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3074611Z test_ops.py::TestTagsCUDA::test_tags_linalg_svdvals_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3074818Z test_ops.py::TestTagsCUDA::test_tags_linalg_tensorsolve_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3074994Z test_ops.py::TestTagsCUDA::test_tags_linalg_vector_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3075155Z test_ops.py::TestTagsCUDA::test_tags_log10_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3075324Z test_ops.py::TestTagsCUDA::test_tags_logical_and_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3075494Z test_ops.py::TestTagsCUDA::test_tags_logical_not_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3075658Z test_ops.py::TestTagsCUDA::test_tags_logical_or_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3075824Z test_ops.py::TestTagsCUDA::test_tags_logical_xor_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3075987Z test_ops.py::TestTagsCUDA::test_tags_long_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3076149Z test_ops.py::TestTagsCUDA::test_tags_lt_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3076309Z test_ops.py::TestTagsCUDA::test_tags_lu_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3076475Z test_ops.py::TestTagsCUDA::test_tags_lu_solve_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3076648Z 
test_ops.py::TestTagsCUDA::test_tags_lu_unpack_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3076814Z test_ops.py::TestTagsCUDA::test_tags_masked_amin_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3076988Z test_ops.py::TestTagsCUDA::test_tags_masked_argmax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3077155Z test_ops.py::TestTagsCUDA::test_tags_masked_cumprod_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3077351Z test_ops.py::TestTagsCUDA::test_tags_masked_cumsum_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3077523Z test_ops.py::TestTagsCUDA::test_tags_masked_fill_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3077697Z test_ops.py::TestTagsCUDA::test_tags_masked_scatter_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3077868Z test_ops.py::TestTagsCUDA::test_tags_masked_select_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3078040Z test_ops.py::TestTagsCUDA::test_tags_masked_softmin_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3078205Z test_ops.py::TestTagsCUDA::test_tags_masked_sum_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3078373Z test_ops.py::TestTagsCUDA::test_tags_masked_var_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3078537Z test_ops.py::TestTagsCUDA::test_tags_matmul_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3078694Z test_ops.py::TestTagsCUDA::test_tags_matrix_exp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3078888Z test_ops.py::TestTagsCUDA::test_tags_max_pool2d_with_indices_backward_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3079071Z test_ops.py::TestTagsCUDA::test_tags_max_reduction_no_dim_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3079255Z test_ops.py::TestTagsCUDA::test_tags_max_reduction_with_dim_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3079442Z test_ops.py::TestTagsCUDA::test_tags_median_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3079653Z test_ops.py::TestTagsCUDA::test_tags_meshgrid_variadic_tensors_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3079833Z test_ops.py::TestTagsCUDA::test_tags_min_reduction_no_dim_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3080014Z test_ops.py::TestTagsCUDA::test_tags_min_reduction_with_dim_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3080179Z test_ops.py::TestTagsCUDA::test_tags_mode_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3080361Z test_ops.py::TestTagsCUDA::test_tags_movedim_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3080523Z test_ops.py::TestTagsCUDA::test_tags_msort_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3080684Z test_ops.py::TestTagsCUDA::test_tags_mul_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3080865Z test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_3_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3081035Z test_ops.py::TestTagsCUDA::test_tags_nanmedian_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3081198Z test_ops.py::TestTagsCUDA::test_tags_narrow_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3081382Z test_ops.py::TestTagsCUDA::test_tags_native_dropout_backward_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 
2023-01-11T23:13:47.3081542Z test_ops.py::TestTagsCUDA::test_tags_neg_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3081711Z test_ops.py::TestTagsCUDA::test_tags_new_empty_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3081882Z test_ops.py::TestTagsCUDA::test_tags_new_empty_strided_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3082046Z test_ops.py::TestTagsCUDA::test_tags_new_ones_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3082208Z test_ops.py::TestTagsCUDA::test_tags_new_zeros_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3082413Z test_ops.py::TestTagsCUDA::test_tags_nn_functional__scaled_dot_product_attention_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3082601Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_batch_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3082784Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_bilinear_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3083019Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_binary_cross_entropy_with_logits_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3083205Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv1d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3083400Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv_transpose1d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3083578Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout3d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3083758Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_elu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3083967Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_feature_alpha_dropout_with_train_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3084174Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_feature_alpha_dropout_without_train_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3084351Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_glu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3084539Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_grid_sample_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3084726Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_instance_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3084910Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_kl_div_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3085093Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_layer_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3085268Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_leaky_relu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3085463Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_local_response_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3085645Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_pool2d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3085837Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool1d_grad_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3086042Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_mish_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3086225Z 
test_ops.py::TestTagsCUDA::test_tags_nn_functional_mse_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3086428Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_multilabel_soft_margin_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3086620Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_circular_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3086809Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_replicate_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3086998Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_pairwise_distance_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3087187Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_poisson_nll_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3087369Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_relu6_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3087547Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_rrelu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3087726Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_selu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3087906Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_softmin_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3088100Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_softmin_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3088281Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_softsign_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3088486Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_unfold_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3088654Z test_ops.py::TestTagsCUDA::test_tags_nonzero_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3088812Z test_ops.py::TestTagsCUDA::test_tags_norm_inf_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3088990Z test_ops.py::TestTagsCUDA::test_tags_normal_number_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3089196Z test_ops.py::TestTagsCUDA::test_tags_ops_nvprims_native_batch_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3089409Z test_ops.py::TestTagsCUDA::test_tags_ops_nvprims_var_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3089575Z test_ops.py::TestTagsCUDA::test_tags_ormqr_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3089737Z test_ops.py::TestTagsCUDA::test_tags_outer_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3089911Z test_ops.py::TestTagsCUDA::test_tags_pca_lowrank_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3090076Z test_ops.py::TestTagsCUDA::test_tags_permute_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3090256Z test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_1_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3090439Z test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_2_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3090617Z test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_3_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3090777Z test_ops.py::TestTagsCUDA::test_tags_put_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3090946Z test_ops.py::TestTagsCUDA::test_tags_rad2deg_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:13:47.3091113Z test_ops.py::TestTagsCUDA::test_tags_randint_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3091279Z test_ops.py::TestTagsCUDA::test_tags_randn_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3091443Z test_ops.py::TestTagsCUDA::test_tags_real_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3091640Z test_ops.py::TestTagsCUDA::test_tags_reciprocal_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3091802Z test_ops.py::TestTagsCUDA::test_tags_remainder_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3091965Z test_ops.py::TestTagsCUDA::test_tags_renorm_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3092132Z test_ops.py::TestTagsCUDA::test_tags_repeat_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3092312Z test_ops.py::TestTagsCUDA::test_tags_repeat_interleave_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3092483Z test_ops.py::TestTagsCUDA::test_tags_reshape_as_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3092657Z test_ops.py::TestTagsCUDA::test_tags_reshape_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3092823Z test_ops.py::TestTagsCUDA::test_tags_resize__cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3092990Z test_ops.py::TestTagsCUDA::test_tags_resize_as__cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3093152Z test_ops.py::TestTagsCUDA::test_tags_roll_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3093324Z test_ops.py::TestTagsCUDA::test_tags_round_decimals_0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3093493Z test_ops.py::TestTagsCUDA::test_tags_scatter_add_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3093674Z test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_prod_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3093857Z test_ops.py::TestTagsCUDA::test_tags_segment_reduce_offsets_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3094059Z test_ops.py::TestTagsCUDA::test_tags_select_scatter_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3094219Z test_ops.py::TestTagsCUDA::test_tags_short_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3094385Z test_ops.py::TestTagsCUDA::test_tags_sign_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3094674Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_blackman_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3094851Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_cosine_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3095042Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_exponential_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3095234Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_general_hamming_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3095414Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_hann_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3095588Z test_ops.py::TestTagsCUDA::test_tags_signbit_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3095752Z test_ops.py::TestTagsCUDA::test_tags_sinc_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3095916Z test_ops.py::TestTagsCUDA::test_tags_sinh_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 
2023-01-11T23:13:47.3096088Z test_ops.py::TestTagsCUDA::test_tags_slice_scatter_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3096247Z test_ops.py::TestTagsCUDA::test_tags_sort_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3096415Z test_ops.py::TestTagsCUDA::test_tags_special_airy_ai_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3096594Z test_ops.py::TestTagsCUDA::test_tags_special_bessel_j1_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3096937Z test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 99%] 2023-01-11T23:13:47.3097130Z test_ops.py::TestTagsCUDA::test_tags_special_hermite_polynomial_h_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3097361Z test_ops.py::TestTagsCUDA::test_tags_special_hermite_polynomial_he_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3097697Z test_ops.py::TestTagsCUDA::test_tags_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 99%] 2023-01-11T23:13:47.3097882Z test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_k0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3098068Z test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_k1_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3098240Z test_ops.py::TestTagsCUDA::test_tags_special_ndtr_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3098412Z test_ops.py::TestTagsCUDA::test_tags_special_ndtri_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3098609Z test_ops.py::TestTagsCUDA::test_tags_special_polygamma_special_polygamma_n_0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3098802Z test_ops.py::TestTagsCUDA::test_tags_special_scaled_modified_bessel_k0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3099146Z test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 99%] 2023-01-11T23:13:47.3099479Z test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 99%] 2023-01-11T23:13:47.3099666Z test_ops.py::TestTagsCUDA::test_tags_special_spherical_bessel_j0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3099828Z test_ops.py::TestTagsCUDA::test_tags_split_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3100005Z test_ops.py::TestTagsCUDA::test_tags_split_with_sizes_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3100199Z test_ops.py::TestTagsCUDA::test_tags_sqrt_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3100369Z test_ops.py::TestTagsCUDA::test_tags_squeeze_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3100521Z test_ops.py::TestTagsCUDA::test_tags_stack_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3100682Z test_ops.py::TestTagsCUDA::test_tags_std_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3100840Z test_ops.py::TestTagsCUDA::test_tags_sub_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3100999Z test_ops.py::TestTagsCUDA::test_tags_sum_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3101169Z 
test_ops.py::TestTagsCUDA::test_tags_svd_lowrank_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3101329Z test_ops.py::TestTagsCUDA::test_tags_t_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3101505Z test_ops.py::TestTagsCUDA::test_tags_take_along_dim_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3101672Z test_ops.py::TestTagsCUDA::test_tags_tensordot_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3101834Z test_ops.py::TestTagsCUDA::test_tags_tile_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3101997Z test_ops.py::TestTagsCUDA::test_tags_to_sparse_cuda_float32 SKIPPED (Allowed exception) [ 99%] 2023-01-11T23:13:47.3102155Z test_ops.py::TestTagsCUDA::test_tags_topk_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3102317Z test_ops.py::TestTagsCUDA::test_tags_trace_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3102483Z test_ops.py::TestTagsCUDA::test_tags_transpose_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3102651Z test_ops.py::TestTagsCUDA::test_tags_trapezoid_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3102814Z test_ops.py::TestTagsCUDA::test_tags_trapz_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3102979Z test_ops.py::TestTagsCUDA::test_tags_tril_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3103174Z test_ops.py::TestTagsCUDA::test_tags_tril_indices_cuda_int64 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3103325Z test_ops.py::TestTagsCUDA::test_tags_triu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3103486Z test_ops.py::TestTagsCUDA::test_tags_triu_indices_cuda_int64 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3103654Z test_ops.py::TestTagsCUDA::test_tags_true_divide_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3103819Z test_ops.py::TestTagsCUDA::test_tags_unflatten_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3103990Z test_ops.py::TestTagsCUDA::test_tags_unfold_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3104152Z test_ops.py::TestTagsCUDA::test_tags_unfold_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3104319Z test_ops.py::TestTagsCUDA::test_tags_uniform_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3104498Z test_ops.py::TestTagsCUDA::test_tags_unique_consecutive_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3104662Z test_ops.py::TestTagsCUDA::test_tags_var_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3104820Z test_ops.py::TestTagsCUDA::test_tags_var_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3104979Z test_ops.py::TestTagsCUDA::test_tags_vdot_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3105154Z test_ops.py::TestTagsCUDA::test_tags_view_as_complex_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3105316Z test_ops.py::TestTagsCUDA::test_tags_view_as_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3105488Z test_ops.py::TestTagsCUDA::test_tags_view_as_real_cuda_complex64 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3105699Z test_ops.py::TestTagsCUDA::test_tags_view_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3105858Z test_ops.py::TestTagsCUDA::test_tags_view_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3106022Z 
test_ops.py::TestTagsCUDA::test_tags_where_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:13:47.3106174Z test_ops.py::TestTagsCUDA::test_tags_xlogy_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:13:47.3106343Z test_ops.py::TestTagsCUDA::test_tags_zeros_like_cuda_float32 SKIPPED (Only runs on cpu) [100%]
2023-01-11T23:13:47.3106351Z 
2023-01-11T23:13:47.3106472Z =============================== warnings summary ===============================
2023-01-11T23:13:47.3106697Z ../../../../../opt/conda/lib/python3.10/site-packages/_pytest/config/__init__.py:1171
2023-01-11T23:13:47.3107053Z /opt/conda/lib/python3.10/site-packages/_pytest/config/__init__.py:1171: PytestAssertRewriteWarning: Module already imported so cannot be rewritten: hypothesis
2023-01-11T23:13:47.3107162Z self._mark_plugins_for_rewrite(hook)
2023-01-11T23:13:47.3107168Z 
2023-01-11T23:13:47.3107401Z -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
2023-01-11T23:13:47.3107695Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-621a128d9f5db79e.xml -
2023-01-11T23:13:47.3107847Z = 11898 passed, 3070 skipped, 17 deselected, 179 xfailed, 1 warning in 4799.59s (1:19:59) =
2023-01-11T23:13:47.3108027Z If in CI, skip info is located in the xml test reports, please either go to s3 or the hud to download them
2023-01-11T23:13:47.3108033Z 
2023-01-11T23:13:47.3108397Z ##[endgroup]
2023-01-11T23:13:47.3108654Z FINISHED PRINTING LOG FILE of test_ops (/var/lib/jenkins/workspace/test/test-reports/test_ops_oa0bw8mk)
2023-01-11T23:13:47.3108660Z 
2023-01-11T23:13:47.3109119Z Executing ['/opt/conda/bin/python', '-bb', 'test_ops.py', '-v', '--use-pytest', '-vv', '-rfEX', '-x', '--reruns=2', '-k=_linalg_cholesky_', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 23:13:47.250262]
2023-01-11T23:14:02.8990976Z 
2023-01-11T23:14:02.8991624Z Expand the folded group to see the log file of test_ops
2023-01-11T23:14:02.8993583Z ##[group]PRINTING LOG FILE of test_ops (/var/lib/jenkins/workspace/test/test-reports/test_ops_3nn1zy0z)
2023-01-11T23:14:02.8994311Z Test results will be stored in test-reports/python-pytest/test_ops/test_ops-001e362a3915126e.xml
2023-01-11T23:14:02.8994645Z ============================= test session starts ==============================
2023-01-11T23:14:02.8995016Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python
2023-01-11T23:14:02.8995291Z cachedir: .pytest_cache
2023-01-11T23:14:02.8995718Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2023-01-11T23:14:02.8996090Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini
2023-01-11T23:14:02.8996517Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0
2023-01-11T23:14:02.8996874Z collecting ... 
collected 30861 items / 30819 deselected / 42 selected 2023-01-11T23:14:02.9000780Z Running 42 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_cholesky_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_cholesky_ex_cuda, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_cholesky_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_cholesky_ex_cuda, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cholesky_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cholesky_ex_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cholesky_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cholesky_ex_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cholesky_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cholesky_ex_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cholesky_ex_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_cholesky_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_cholesky_ex_cuda_float32
2023-01-11T23:14:02.9004453Z 
2023-01-11T23:14:02.9004687Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_cholesky_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 2%]
2023-01-11T23:14:02.9005171Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_cholesky_ex_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 4%]
2023-01-11T23:14:02.9005577Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_cholesky_cuda PASSED [ 7%]
2023-01-11T23:14:02.9005912Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_cholesky_ex_cuda PASSED [ 9%]
2023-01-11T23:14:02.9006294Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cholesky_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 11%]
2023-01-11T23:14:02.9006724Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cholesky_ex_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 14%]
2023-01-11T23:14:02.9007130Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_cuda_complex64 PASSED [ 16%]
2023-01-11T23:14:02.9007513Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_cuda_float32 PASSED [ 19%]
2023-01-11T23:14:02.9007918Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_ex_cuda_complex64 PASSED [ 21%]
2023-01-11T23:14:02.9008307Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_ex_cuda_float32 PASSED [ 23%]
2023-01-11T23:14:02.9008671Z test_ops.py::TestCommonCUDA::test_out_linalg_cholesky_cuda_float32 PASSED [ 26%]
2023-01-11T23:14:02.9009002Z test_ops.py::TestCommonCUDA::test_out_linalg_cholesky_ex_cuda_float32 PASSED [ 28%]
2023-01-11T23:14:02.9009345Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_cholesky_cuda PASSED [ 30%]
2023-01-11T23:14:02.9009688Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_cholesky_ex_cuda PASSED [ 33%]
2023-01-11T23:14:02.9010060Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_complex64 PASSED [ 35%]
2023-01-11T23:14:02.9010443Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_float32 PASSED [ 38%]
2023-01-11T23:14:02.9010835Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_ex_cuda_complex64 PASSED [ 40%]
2023-01-11T23:14:02.9011231Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_ex_cuda_float32 PASSED [ 42%]
2023-01-11T23:14:02.9011626Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cholesky_cuda_float32 PASSED [ 45%]
2023-01-11T23:14:02.9012021Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cholesky_ex_cuda_float32 PASSED [ 47%]
2023-01-11T23:14:02.9012421Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cholesky_cuda_float32 PASSED [ 50%]
2023-01-11T23:14:02.9012828Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cholesky_ex_cuda_float32 PASSED [ 52%]
2023-01-11T23:14:02.9013220Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cholesky_cuda_float32 PASSED [ 54%]
2023-01-11T23:14:02.9013621Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cholesky_ex_cuda_float32 PASSED [ 57%]
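
For context on what this shard exercises: torch.linalg.cholesky factors a Hermitian positive-definite matrix and raises on a non-PD input, while torch.linalg.cholesky_ex reports failure through an `info` tensor instead of an exception. A minimal sketch of the two APIs (illustrative only, not taken from the test suite; it falls back to CPU when no GPU is present):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(4, 4, device=device)
a = x @ x.mT + 4 * torch.eye(4, device=device)  # symmetric positive definite

L = torch.linalg.cholesky(a)                    # lower-triangular factor
assert torch.allclose(L @ L.mT, a, atol=1e-4)   # a == L @ L^T (up to fp32 rounding)

L_ex, info = torch.linalg.cholesky_ex(a)        # no exception path:
assert info.item() == 0                         # info == 0 signals success
```
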
2023-01-11T23:14:02.9014001Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cholesky_cuda_complex64 PASSED [ 59%]
2023-01-11T23:14:02.9014364Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cholesky_ex_cuda_complex64 PASSED [ 61%]
2023-01-11T23:14:02.9015022Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cholesky_cuda_complex128 PASSED [ 64%]
2023-01-11T23:14:02.9015399Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cholesky_ex_cuda_complex128 PASSED [ 66%]
2023-01-11T23:14:02.9015759Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cholesky_cuda_float64 PASSED [ 69%]
2023-01-11T23:14:02.9016112Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cholesky_ex_cuda_float64 PASSED [ 71%]
2023-01-11T23:14:02.9016468Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cholesky_cuda_float32 PASSED [ 73%]
2023-01-11T23:14:02.9016840Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cholesky_ex_cuda_float32 PASSED [ 76%]
2023-01-11T23:14:02.9017226Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cholesky_cuda_float32 PASSED [ 78%]
2023-01-11T23:14:02.9017620Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cholesky_ex_cuda_float32 PASSED [ 80%]
2023-01-11T23:14:02.9018024Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cholesky_cuda_float32 PASSED [ 83%]
2023-01-11T23:14:02.9018422Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cholesky_ex_cuda_float32 PASSED [ 85%]
2023-01-11T23:14:02.9018799Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cholesky_cuda_float32 PASSED [ 88%]
2023-01-11T23:14:02.9019144Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cholesky_ex_cuda_float32 PASSED [ 90%]
2023-01-11T23:14:02.9019509Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cholesky_cuda_float32 PASSED [ 92%]
2023-01-11T23:14:02.9019878Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cholesky_ex_cuda_float32 PASSED [ 95%]
2023-01-11T23:14:02.9020300Z test_ops.py::TestTagsCUDA::test_tags_linalg_cholesky_cuda_float32 SKIPPED (Only runs on cpu) [ 97%]
2023-01-11T23:14:02.9020670Z test_ops.py::TestTagsCUDA::test_tags_linalg_cholesky_ex_cuda_float32 SKIPPED (Only runs on cpu) [100%]
2023-01-11T23:14:02.9020873Z 
2023-01-11T23:14:02.9020999Z =============================== warnings summary ===============================
2023-01-11T23:14:02.9021379Z ../../../../../opt/conda/lib/python3.10/site-packages/_pytest/config/__init__.py:1171
2023-01-11T23:14:02.9021913Z /opt/conda/lib/python3.10/site-packages/_pytest/config/__init__.py:1171: PytestAssertRewriteWarning: Module already imported so cannot be rewritten: hypothesis
2023-01-11T23:14:02.9022275Z self._mark_plugins_for_rewrite(hook)
2023-01-11T23:14:02.9022416Z 
2023-01-11T23:14:02.9022649Z -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
2023-01-11T23:14:02.9023128Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-001e362a3915126e.xml -
2023-01-11T23:14:02.9023494Z ========== 36 passed, 6 skipped, 30819 deselected, 1 warning in 9.16s ==========
2023-01-11T23:14:02.9023871Z If in CI, skip info is located in the xml test reports, please either go to s3 or the hud to download them
2023-01-11T23:14:02.9024086Z 
2023-01-11T23:14:02.9024327Z ##[endgroup]
2023-01-11T23:14:02.9024686Z FINISHED PRINTING LOG FILE of test_ops (/var/lib/jenkins/workspace/test/test-reports/test_ops_3nn1zy0z)
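
The 42-item selection in the session above comes from pytest keyword filtering (`-k=_linalg_cholesky_`) combined with the rerunfailures plugin (`--reruns=2`), per the `Executing [...]` line that launched it. A rough local equivalent is sketched below; note it invokes pytest directly rather than going through PyTorch's own test harness, so the `--use-pytest`/`--import-*-tests` plumbing is omitted, and `cwd` is assumed to be a PyTorch checkout:

```python
import subprocess
import sys

cmd = [
    sys.executable, "-bb",      # -bb: raise on implicit bytes/str comparisons
    "-m", "pytest", "test_ops.py",
    "-v", "-rfEX",              # verbose; summarize failures/errors/xfail/xpass
    "-x",                       # stop at the first failure
    "--reruns=2",               # pytest-rerunfailures: retry flaky tests twice
    "-k", "_linalg_cholesky_",  # keyword-select only the cholesky tests
]
subprocess.run(cmd, cwd="test", check=True)  # raises if the run fails
```
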
2023-01-11T23:14:02.9024894Z 2023-01-11T23:14:02.9025057Z Running test_prims ... [2023-01-11 23:14:02.899032] 2023-01-11T23:14:02.9025514Z Executing ['/opt/conda/bin/python', '-bb', 'test_prims.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 23:14:02.899225] 2023-01-11T23:14:49.4054788Z 2023-01-11T23:14:49.4055243Z Expand the folded group to see the log file of test_prims 2023-01-11T23:14:49.4056183Z ##[group]PRINTING LOG FILE of test_prims (/var/lib/jenkins/workspace/test/test-reports/test_prims_f_kgo3dr) 2023-01-11T23:14:49.4059068Z 2023-01-11T23:14:49.4059382Z Running tests... 2023-01-11T23:14:49.4059927Z ---------------------------------------------------------------------- 2023-01-11T23:14:49.4060409Z Test results will be stored in test-reports/python-unittest/test_prims 2023-01-11T23:14:49.4060947Z test_decomposition_method_vararg_ones_cuda_float32 (__main__.TestDecompCUDA) ... ok (0.019s) 2023-01-11T23:14:49.4061320Z test_decomposition_method_vararg_permute_cuda_float32 (__main__.TestDecompCUDA) ... ok (0.015s) 2023-01-11T23:14:49.4061699Z test_decomposition_type_promotion_nvprim_amp_cuda_float16 (__main__.TestDecompCUDA) ... ok (1.316s) 2023-01-11T23:14:49.4064281Z test_decomposition_type_promotion_nvprim_amp_cuda_float32 (__main__.TestDecompCUDA) ... ok (0.679s) 2023-01-11T23:14:49.4064829Z test_masked_fill_decomposition_under_nvprim_context_cuda_float16 (__main__.TestDecompCUDA) ... ok (0.118s) 2023-01-11T23:14:49.4065369Z test_masked_fill_decomposition_under_nvprim_context_cuda_float32 (__main__.TestDecompCUDA) ... ok (0.120s) 2023-01-11T23:14:49.4065709Z test_mul_complex (__main__.TestPrimsBasic) ... ok (0.001s) 2023-01-11T23:14:49.4066088Z test_torch_ops (__main__.TestPrimsBasic) ... ok (0.002s) 2023-01-11T23:14:49.4066486Z test_aten_overload_to_prims_cuda (__main__.TestPrimsCUDA) ... ok (0.038s) 2023-01-11T23:14:49.4066931Z test_batch_norm_backward_nvprims_cuda_float16 (__main__.TestPrimsCUDA) ... ok (3.815s) 2023-01-11T23:14:49.4067385Z test_batch_norm_backward_nvprims_cuda_float32 (__main__.TestPrimsCUDA) ... ok (3.585s) 2023-01-11T23:14:49.4067804Z test_broadcast_in_dim_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.927s) 2023-01-11T23:14:49.4068177Z test_broadcast_in_dim_sum_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.258s) 2023-01-11T23:14:49.4068486Z test_cbrt_prim_cuda_float64 (__main__.TestPrimsCUDA) ... ok (0.010s) 2023-01-11T23:14:49.4068777Z test_cbrt_prim_cuda_int64 (__main__.TestPrimsCUDA) ... ok (0.009s) 2023-01-11T23:14:49.4069075Z test_cpu_tensor_cuda_float16 (__main__.TestPrimsCUDA) ... ok (0.434s) 2023-01-11T23:14:49.4069501Z test_cpu_tensor_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.365s) 2023-01-11T23:14:49.4069818Z test_cudnn_batch_norm_nvprims_cuda_float32 (__main__.TestPrimsCUDA) ... ok (6.611s) 2023-01-11T23:14:49.4070232Z test_cudnn_batch_norm_nvprims_cuda_float64 (__main__.TestPrimsCUDA) ... ok (6.536s) 2023-01-11T23:14:49.4070567Z test_full_cuda_float32 (__main__.TestPrimsCUDA) ... ok (2.199s) 2023-01-11T23:14:49.4070871Z test_memory_format_strides_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.020s) 2023-01-11T23:14:49.4071190Z test_native_batch_norm_nvprims_cuda_float32 (__main__.TestPrimsCUDA) ... ok (5.339s) 2023-01-11T23:14:49.4071528Z test_native_batch_norm_nvprims_cuda_float64 (__main__.TestPrimsCUDA) ... ok (5.271s) 2023-01-11T23:14:49.4071849Z test_nvfuser_capability_context_cuda (__main__.TestPrimsCUDA) ... 
ok (0.147s) 2023-01-11T23:14:49.4072167Z test_nvfuser_constant_tensors_cuda (__main__.TestPrimsCUDA) ... ok (0.057s) 2023-01-11T23:14:49.4072929Z test_nvfuser_empty_fusion_cuda (__main__.TestPrimsCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/_prims/nvfuser_executor.py:414: RuntimeWarning: No partition found for the graph. This is likely because the graph is not supported by nvFuser. Please use the eager ATen mode to execute the graph. 2023-01-11T23:14:49.4073427Z warn( 2023-01-11T23:14:49.4073598Z ok (0.016s) 2023-01-11T23:14:49.4073868Z test_nvfuser_executor_cached_noncontiguous_cuda (__main__.TestPrimsCUDA) ... ok (0.450s) 2023-01-11T23:14:49.4074198Z test_nvfuser_executor_parameters_cuda (__main__.TestPrimsCUDA) ... ok (0.199s) 2023-01-11T23:14:49.4074523Z test_nvfuser_executor_partitioned_cuda (__main__.TestPrimsCUDA) ... ok (0.548s) 2023-01-11T23:14:49.4077380Z test_nvfuser_executor_partitioned_no_partitions_error_cuda (__main__.TestPrimsCUDA) ... ok (0.048s) 2023-01-11T23:14:49.4077776Z test_nvfuser_impl_is_used_cuda (__main__.TestPrimsCUDA) ... ok (0.001s) 2023-01-11T23:14:49.4078345Z test_nvfuser_no_args_cuda (__main__.TestPrimsCUDA) ... ok (0.068s) 2023-01-11T23:14:49.4078796Z test_nvfuser_rand_like_fusion_cuda (__main__.TestPrimsCUDA) ... ok (0.208s) 2023-01-11T23:14:49.4079910Z test_nvprim_convert_element_type_cuda_float16 (__main__.TestPrimsCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/_prims/context.py:238: UserWarning: get_isolated_graphmodule failed on decomposition: empty_like with error message: Unexpected type when checking for same shape, ! 2023-01-11T23:14:49.4080582Z warn( 2023-01-11T23:14:49.4081126Z /opt/conda/lib/python3.10/site-packages/torch/_prims/context.py:238: UserWarning: get_isolated_graphmodule failed on decomposition: fill_scalar with error message: full_like(): argument 'fill_value' (position 2) must be Number, not Proxy 2023-01-11T23:14:49.4081519Z warn( 2023-01-11T23:14:49.4081688Z ok (0.251s) 2023-01-11T23:14:49.4081950Z test_nvprim_convert_element_type_cuda_uint8 (__main__.TestPrimsCUDA) ... ok (0.251s) 2023-01-11T23:14:49.4082252Z test_nvprims_cuda (__main__.TestPrimsCUDA) ... ok (0.012s) 2023-01-11T23:14:49.4082539Z test_nvprims_view_cuda_float16 (__main__.TestPrimsCUDA) ... ok (0.329s) 2023-01-11T23:14:49.4082837Z test_nvprims_view_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.330s) 2023-01-11T23:14:49.4083548Z test_nvprims_view_partitioner_cuda_float16 (__main__.TestPrimsCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/_prims/nvfuser_executor.py:414: RuntimeWarning: No partition found for the graph. This is likely because the graph is not supported by nvFuser. Please use the eager ATen mode to execute the graph. 2023-01-11T23:14:49.4083993Z warn( 2023-01-11T23:14:49.4084162Z ok (0.084s) 2023-01-11T23:14:49.4084417Z test_nvprims_view_partitioner_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.073s) 2023-01-11T23:14:49.4084732Z test_partitioner_tuple_output_cuda (__main__.TestPrimsCUDA) ... ok (0.038s) 2023-01-11T23:14:49.4085047Z test_pytree_input_output_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.258s) 2023-01-11T23:14:49.4085406Z test_reshape_view_method_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.002s) 2023-01-11T23:14:49.4085738Z test_silu_backward_no_filled_tensor_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.829s) 2023-01-11T23:14:49.4086068Z test_skip_ops_nvfuser_capability_mode_cuda (__main__.TestPrimsCUDA) ... 
ok (0.022s)
2023-01-11T23:14:49.4086388Z test_skip_ops_nvfuser_prims_mode_cuda (__main__.TestPrimsCUDA) ... ok (0.011s)
2023-01-11T23:14:49.4086696Z test_var_correction_0_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.273s)
2023-01-11T23:14:49.4087012Z test_var_correction_1_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.273s)
2023-01-11T23:14:49.4087371Z test_var_mean_correction_0_keepdim_False_cuda_float16 (__main__.TestPrimsCUDA) ... ok (0.055s)
2023-01-11T23:14:49.4087710Z test_var_mean_correction_0_keepdim_False_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.040s)
2023-01-11T23:14:49.4088055Z test_var_mean_correction_0_keepdim_True_cuda_float16 (__main__.TestPrimsCUDA) ... ok (0.065s)
2023-01-11T23:14:49.4088387Z test_var_mean_correction_0_keepdim_True_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.049s)
2023-01-11T23:14:49.4088721Z test_var_mean_correction_1_keepdim_False_cuda_float16 (__main__.TestPrimsCUDA) ... ok (0.054s)
2023-01-11T23:14:49.4089063Z test_var_mean_correction_1_keepdim_False_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.039s)
2023-01-11T23:14:49.4089391Z test_var_mean_correction_1_keepdim_True_cuda_float16 (__main__.TestPrimsCUDA) ... ok (0.064s)
2023-01-11T23:14:49.4089724Z test_var_mean_correction_1_keepdim_True_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.049s)
2023-01-11T23:14:49.4090051Z test_constant_pad_nd_memory_format_cuda_float32 (__main__.TestRefsCUDA) ... ok (0.003s)
2023-01-11T23:14:49.4090236Z 
2023-01-11T23:14:49.4090443Z ----------------------------------------------------------------------
2023-01-11T23:14:49.4090690Z Ran 57 tests in 42.887s
2023-01-11T23:14:49.4090813Z 
2023-01-11T23:14:49.4090879Z OK
2023-01-11T23:14:49.4090982Z 
2023-01-11T23:14:49.4091071Z Generating XML reports...
2023-01-11T23:14:49.4091469Z Generated XML report: test-reports/python-unittest/test_prims/TEST-TestDecompCUDA-20230111231405.xml
2023-01-11T23:14:49.4092007Z Generated XML report: test-reports/python-unittest/test_prims/TEST-TestPrimsBasic-20230111231405.xml
2023-01-11T23:14:49.4092495Z Generated XML report: test-reports/python-unittest/test_prims/TEST-TestPrimsCUDA-20230111231405.xml
2023-01-11T23:14:49.4092976Z Generated XML report: test-reports/python-unittest/test_prims/TEST-TestRefsCUDA-20230111231405.xml
2023-01-11T23:14:49.4093193Z 
2023-01-11T23:14:49.4093447Z ##[endgroup]
2023-01-11T23:14:49.4093812Z FINISHED PRINTING LOG FILE of test_prims (/var/lib/jenkins/workspace/test/test-reports/test_prims_f_kgo3dr)
2023-01-11T23:14:49.4094023Z 
2023-01-11T23:14:49.4094205Z Running test_tensor_creation_ops ... [2023-01-11 23:14:49.402429]
2023-01-11T23:14:49.4094905Z Executing ['/opt/conda/bin/python', '-bb', 'test_tensor_creation_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 23:14:49.402619]
2023-01-11T23:16:18.3492561Z 
2023-01-11T23:16:18.3492786Z Expand the folded group to see the log file of test_tensor_creation_ops
2023-01-11T23:16:18.3496061Z ##[group]PRINTING LOG FILE of test_tensor_creation_ops (/var/lib/jenkins/workspace/test/test-reports/test_tensor_creation_ops_hsc_5c7d)
2023-01-11T23:16:18.3496399Z 
2023-01-11T23:16:18.3496527Z Running tests...
2023-01-11T23:16:18.3496996Z ----------------------------------------------------------------------
2023-01-11T23:16:18.3497473Z Test results will be stored in test-reports/python-unittest/test_tensor_creation_ops
2023-01-11T23:16:18.3497970Z test_alias_from_buffer_cuda_bool (__main__.TestAsArrayCUDA) ... 
skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3499458Z test_alias_from_buffer_cuda_complex128 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3499997Z test_alias_from_buffer_cuda_complex64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3500687Z test_alias_from_buffer_cuda_float16 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3501059Z test_alias_from_buffer_cuda_float32 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3501429Z test_alias_from_buffer_cuda_float64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3501821Z test_alias_from_buffer_cuda_int16 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3502171Z test_alias_from_buffer_cuda_int32 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3502516Z test_alias_from_buffer_cuda_int64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3502849Z test_alias_from_buffer_cuda_int8 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3536871Z test_alias_from_buffer_cuda_uint8 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3537365Z test_alias_from_dlpack_cuda_bfloat16 (__main__.TestAsArrayCUDA) ... ok (0.006s) 2023-01-11T23:16:18.3537806Z test_alias_from_dlpack_cuda_complex128 (__main__.TestAsArrayCUDA) ... ok (0.007s) 2023-01-11T23:16:18.3538149Z test_alias_from_dlpack_cuda_complex64 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3538465Z test_alias_from_dlpack_cuda_float16 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3538796Z test_alias_from_dlpack_cuda_float32 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3539101Z test_alias_from_dlpack_cuda_float64 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3542702Z test_alias_from_dlpack_cuda_int16 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3543934Z test_alias_from_dlpack_cuda_int32 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3544324Z test_alias_from_dlpack_cuda_int64 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3544688Z test_alias_from_dlpack_cuda_int8 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3545032Z test_alias_from_dlpack_cuda_uint8 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3545580Z test_alias_from_numpy_cuda_bool (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3545954Z test_alias_from_numpy_cuda_complex128 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3546326Z test_alias_from_numpy_cuda_complex64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3546693Z test_alias_from_numpy_cuda_float16 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3547044Z test_alias_from_numpy_cuda_float32 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3547448Z test_alias_from_numpy_cuda_float64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3547822Z test_alias_from_numpy_cuda_int16 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3548175Z test_alias_from_numpy_cuda_int32 (__main__.TestAsArrayCUDA) ... 
skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3548525Z test_alias_from_numpy_cuda_int64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3548873Z test_alias_from_numpy_cuda_int8 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3549216Z test_alias_from_numpy_cuda_uint8 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3549554Z test_alias_from_tensor_cuda_bfloat16 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3549867Z test_alias_from_tensor_cuda_bool (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3550192Z test_alias_from_tensor_cuda_complex128 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3550643Z test_alias_from_tensor_cuda_complex64 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3551056Z test_alias_from_tensor_cuda_float16 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3551362Z test_alias_from_tensor_cuda_float32 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3551670Z test_alias_from_tensor_cuda_float64 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3552031Z test_alias_from_tensor_cuda_int16 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3552335Z test_alias_from_tensor_cuda_int32 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3552641Z test_alias_from_tensor_cuda_int64 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3552948Z test_alias_from_tensor_cuda_int8 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3553275Z test_alias_from_tensor_cuda_uint8 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3553636Z test_astensor_consistency_cuda (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3554114Z test_copy_from_buffer_cuda_bool (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3554611Z test_copy_from_buffer_cuda_complex128 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3555074Z test_copy_from_buffer_cuda_complex64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3555561Z test_copy_from_buffer_cuda_float16 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3556033Z test_copy_from_buffer_cuda_float32 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3556489Z test_copy_from_buffer_cuda_float64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3556872Z test_copy_from_buffer_cuda_int16 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3557225Z test_copy_from_buffer_cuda_int32 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3557588Z test_copy_from_buffer_cuda_int64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3557934Z test_copy_from_buffer_cuda_int8 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3558338Z test_copy_from_buffer_cuda_uint8 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3558682Z test_copy_from_dlpack_cuda_bfloat16 (__main__.TestAsArrayCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3559220Z test_copy_from_dlpack_cuda_complex128 (__main__.TestAsArrayCUDA) ... 
/var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:3701: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Copy.cpp:276.) 2023-01-11T23:16:18.3559765Z result = torch.asarray(cvt(original), **kwargs) 2023-01-11T23:16:18.3560006Z ok (0.010s) 2023-01-11T23:16:18.3560272Z test_copy_from_dlpack_cuda_complex64 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3560612Z test_copy_from_dlpack_cuda_float16 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3560995Z test_copy_from_dlpack_cuda_float32 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3561404Z test_copy_from_dlpack_cuda_float64 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3561816Z test_copy_from_dlpack_cuda_int16 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3562245Z test_copy_from_dlpack_cuda_int32 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3562656Z test_copy_from_dlpack_cuda_int64 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3563044Z test_copy_from_dlpack_cuda_int8 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3563420Z test_copy_from_dlpack_cuda_uint8 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3563789Z test_copy_from_dlpack_mult_devices_cuda_bfloat16 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3564251Z test_copy_from_dlpack_mult_devices_cuda_complex128 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3564663Z test_copy_from_dlpack_mult_devices_cuda_complex64 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3565065Z test_copy_from_dlpack_mult_devices_cuda_float16 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3565466Z test_copy_from_dlpack_mult_devices_cuda_float32 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3565863Z test_copy_from_dlpack_mult_devices_cuda_float64 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3566260Z test_copy_from_dlpack_mult_devices_cuda_int16 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3566655Z test_copy_from_dlpack_mult_devices_cuda_int32 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3567040Z test_copy_from_dlpack_mult_devices_cuda_int64 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3567420Z test_copy_from_dlpack_mult_devices_cuda_int8 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3567809Z test_copy_from_dlpack_mult_devices_cuda_uint8 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3568176Z test_copy_from_numpy_cuda_bool (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3568526Z test_copy_from_numpy_cuda_complex128 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3568880Z test_copy_from_numpy_cuda_complex64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3569237Z test_copy_from_numpy_cuda_float16 (__main__.TestAsArrayCUDA) ... 
skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3569590Z test_copy_from_numpy_cuda_float32 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3569938Z test_copy_from_numpy_cuda_float64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3570312Z test_copy_from_numpy_cuda_int16 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3570655Z test_copy_from_numpy_cuda_int32 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3570996Z test_copy_from_numpy_cuda_int64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3571332Z test_copy_from_numpy_cuda_int8 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3571684Z test_copy_from_numpy_cuda_uint8 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3572094Z test_copy_from_tensor_mult_devices_cuda_bfloat16 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3572505Z test_copy_from_tensor_mult_devices_cuda_complex128 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3572909Z test_copy_from_tensor_mult_devices_cuda_complex64 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3573309Z test_copy_from_tensor_mult_devices_cuda_float16 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3573708Z test_copy_from_tensor_mult_devices_cuda_float32 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3574102Z test_copy_from_tensor_mult_devices_cuda_float64 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3574684Z test_copy_from_tensor_mult_devices_cuda_int16 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3575158Z test_copy_from_tensor_mult_devices_cuda_int32 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3575753Z test_copy_from_tensor_mult_devices_cuda_int64 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3576146Z test_copy_from_tensor_mult_devices_cuda_int8 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3576533Z test_copy_from_tensor_mult_devices_cuda_uint8 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3576880Z test_copy_list_cuda_bfloat16 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3577186Z test_copy_list_cuda_bool (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3577495Z test_copy_list_cuda_complex128 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3577802Z test_copy_list_cuda_complex64 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3578117Z test_copy_list_cuda_float16 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3578425Z test_copy_list_cuda_float32 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3578722Z test_copy_list_cuda_float64 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3579028Z test_copy_list_cuda_int16 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3579333Z test_copy_list_cuda_int32 (__main__.TestAsArrayCUDA) ... 
ok (0.002s) 2023-01-11T23:16:18.3579625Z test_copy_list_cuda_int64 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3579927Z test_copy_list_cuda_int8 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3580227Z test_copy_list_cuda_uint8 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3580532Z test_copy_tensor_cuda_bfloat16 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3580835Z test_copy_tensor_cuda_bool (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3581148Z test_copy_tensor_cuda_complex128 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3581472Z test_copy_tensor_cuda_complex64 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3581779Z test_copy_tensor_cuda_float16 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3582130Z test_copy_tensor_cuda_float32 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3582442Z test_copy_tensor_cuda_float64 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3582749Z test_copy_tensor_cuda_int16 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3583050Z test_copy_tensor_cuda_int32 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3583352Z test_copy_tensor_cuda_int64 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3583659Z test_copy_tensor_cuda_int8 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3583954Z test_copy_tensor_cuda_uint8 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3584285Z test_retain_autograd_history_cuda_complex64 (__main__.TestAsArrayCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3584627Z test_retain_autograd_history_cuda_float32 (__main__.TestAsArrayCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3584969Z test_unsupported_alias_cuda_float32 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3585340Z test_unsupported_alias_mult_devices_cuda_float32 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.001s) 2023-01-11T23:16:18.3585735Z test_empty_like_cuda (__main__.TestLikeTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3586100Z test_full_like_inference_cuda (__main__.TestLikeTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3586434Z test_ones_like_cuda (__main__.TestLikeTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3586819Z test_ones_like_multiple_device_cuda (__main__.TestLikeTensorCreationCUDA) ... skip: fewer than 2 devices detected (0.001s) 2023-01-11T23:16:18.3587195Z test_zeros_like_cuda (__main__.TestLikeTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3587613Z test_zeros_like_multiple_device_cuda (__main__.TestLikeTensorCreationCUDA) ... skip: fewer than 2 devices detected (0.001s) 2023-01-11T23:16:18.3587995Z test_normal_cuda_float32 (__main__.TestRandomTensorCreationCUDA) ... ok (0.011s) 2023-01-11T23:16:18.3588347Z test_normal_cuda_float64 (__main__.TestRandomTensorCreationCUDA) ... ok (0.011s) 2023-01-11T23:16:18.3588701Z test_normal_std_error_cuda (__main__.TestRandomTensorCreationCUDA) ... ok (0.020s) 2023-01-11T23:16:18.3589059Z test_rand_cuda_complex128 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3589932Z test_rand_cuda_complex32 (__main__.TestRandomTensorCreationCUDA) ... /var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:3349: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/EmptyTensor.cpp:32.) 
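
The TestAsArrayCUDA pattern above is worth decoding: the numpy- and buffer-backed cases (both `alias_from_*` and `copy_from_*`) are skipped with "Only runs on cpu" because those sources live in host memory, while the dlpack-, tensor-, and list-backed cases run on the GPU. A minimal sketch of the aliasing-versus-copying semantics of torch.asarray that these tests probe (illustrative, not the test code):

```python
import numpy as np
import torch

arr = np.arange(4, dtype=np.float32)

alias = torch.asarray(arr)              # CPU: shares memory with `arr`
arr[0] = 42.0
assert alias[0].item() == 42.0          # mutation is visible through the alias

copied = torch.asarray(arr, copy=True)  # always a fresh buffer
arr[0] = 0.0
assert copied[0].item() == 42.0         # unaffected by later mutation

if torch.cuda.is_available():
    # Host memory cannot be aliased by a CUDA tensor, so asarray must copy --
    # which is why the alias-from-numpy/buffer tests only run on CPU.
    on_gpu = torch.asarray(arr, device="cuda")
    assert on_gpu.device.type == "cuda"
```
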
2023-01-11T23:16:18.3590554Z res1 = torch.rand(size, size, dtype=dtype, device=device) 2023-01-11T23:16:18.3590802Z ok (0.001s) 2023-01-11T23:16:18.3591064Z test_rand_cuda_complex64 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3591417Z test_rand_cuda_float32 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3591759Z test_rand_cuda_float64 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3592170Z test_randint_cuda (__main__.TestRandomTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3592551Z test_randint_inference_cuda (__main__.TestRandomTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3592921Z test_randn_cuda_bfloat16 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3593271Z test_randn_cuda_complex128 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3593623Z test_randn_cuda_complex32 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3593970Z test_randn_cuda_complex64 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3594315Z test_randn_cuda_float16 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3594692Z test_randn_cuda_float32 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3595028Z test_randn_cuda_float64 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3595376Z test_random_neg_values_cuda (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3595723Z test_randperm_cuda (__main__.TestRandomTensorCreationCUDA) ... ok (0.044s) 2023-01-11T23:16:18.3596091Z test_randperm_device_compatibility_cuda (__main__.TestRandomTensorCreationCUDA) ... ok (0.167s) 2023-01-11T23:16:18.3596458Z test_uniform_from_to_cuda_bfloat16 (__main__.TestRandomTensorCreationCUDA) ... ok (0.017s) 2023-01-11T23:16:18.3596819Z test_uniform_from_to_cuda_float16 (__main__.TestRandomTensorCreationCUDA) ... ok (0.018s) 2023-01-11T23:16:18.3597181Z test_uniform_from_to_cuda_float32 (__main__.TestRandomTensorCreationCUDA) ... ok (0.058s) 2023-01-11T23:16:18.3597530Z test_uniform_from_to_cuda_float64 (__main__.TestRandomTensorCreationCUDA) ... ok (0.112s) 2023-01-11T23:16:18.3597867Z test_arange_bfloat16_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3598186Z test_arange_cuda (__main__.TestTensorCreationCUDA) ... ok (0.063s) 2023-01-11T23:16:18.3598520Z test_arange_device_vs_cpu_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3598865Z test_arange_device_vs_cpu_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3599213Z test_arange_device_vs_cpu_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3599563Z test_arange_device_vs_cpu_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3599920Z test_arange_inference_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T23:16:18.3600312Z test_as_strided_neg_cuda (__main__.TestTensorCreationCUDA) ... ok (0.012s) 2023-01-11T23:16:18.3600653Z test_as_tensor_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T23:16:18.3600994Z test_block_diag_cuda (__main__.TestTensorCreationCUDA) ... ok (0.040s) 2023-01-11T23:16:18.3601313Z test_block_diag_scipy_cuda (__main__.TestTensorCreationCUDA) ... 
ok (0.031s) 2023-01-11T23:16:18.3601643Z test_cartesian_prod_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3602017Z test_cat2_cuda_float16 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T23:16:18.3602386Z test_cat2_cuda_float64 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T23:16:18.3602737Z test_cat2_cuda_int32 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3603088Z test_cat_all_dtypes_and_devices_cuda (__main__.TestTensorCreationCUDA) ... ok (0.007s) 2023-01-11T23:16:18.3603488Z test_cat_big_cuda (__main__.TestTensorCreationCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T23:16:18.3603845Z test_cat_cuda (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3604154Z test_cat_empty_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3604477Z test_cat_empty_legacy_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3604807Z test_cat_in_channels_last_cuda (__main__.TestTensorCreationCUDA) ... ok (0.025s) 2023-01-11T23:16:18.3605141Z test_cat_mem_overlap_cuda (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3605475Z test_cat_out_channels_last_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3605797Z test_cat_out_cuda (__main__.TestTensorCreationCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3606114Z test_cat_out_memory_format_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3606463Z test_cat_preserve_channels_last_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3606810Z test_cat_stack_cross_devices_cuda (__main__.TestTensorCreationCUDA) ... ok (0.011s) 2023-01-11T23:16:18.3607167Z test_combinations_cuda (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3607556Z test_complex_type_conversions_cuda (__main__.TestTensorCreationCUDA) ... skip: real and imag not implemented for complex (0.001s) 2023-01-11T23:16:18.3607972Z test_constructor_device_legacy_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T23:16:18.3608358Z test_constructor_dtypes_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T23:16:18.3608987Z test_ctor_with_numpy_array_cuda (__main__.TestTensorCreationCUDA) ... /var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:1447: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here. 2023-01-11T23:16:18.3609756Z Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations 2023-01-11T23:16:18.3610060Z np.float, 2023-01-11T23:16:18.3610538Z /var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:1454: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here. 
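
The DeprecationWarning above is mechanical to fix: NumPy 1.20 deprecated the `np.float`/`np.bool` aliases (removed in NumPy 1.24) in favor of the builtins or the explicit scalar types, exactly as the warning text suggests. A sketch of the substitution:

```python
import numpy as np

# Before (warns on NumPy >= 1.20, breaks on >= 1.24):
#   a = np.zeros(3, dtype=np.float)
#   b = np.ones(3, dtype=np.bool)

# After: the builtin types, or the explicit NumPy scalar types.
a = np.zeros(3, dtype=float)   # or np.float64 for the exact scalar type
b = np.ones(3, dtype=bool)     # or np.bool_
```
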
2023-01-11T23:16:18.3611201Z Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations 2023-01-11T23:16:18.3611486Z np.bool, 2023-01-11T23:16:18.3611668Z ok (0.007s) 2023-01-11T23:16:18.3611964Z test_device_rounding_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3612366Z test_device_rounding_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3612710Z test_device_rounding_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3613077Z test_diag_embed_cuda_float32 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3613419Z test_diagflat_cuda (__main__.TestTensorCreationCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3613740Z test_dsplit_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.020s) 2023-01-11T23:16:18.3614069Z test_dsplit_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.020s) 2023-01-11T23:16:18.3614394Z test_dsplit_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.019s) 2023-01-11T23:16:18.3614836Z test_dstack_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (0.040s) 2023-01-11T23:16:18.3615171Z test_dstack_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.040s) 2023-01-11T23:16:18.3615504Z test_dstack_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.040s) 2023-01-11T23:16:18.3615829Z test_dstack_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.040s) 2023-01-11T23:16:18.3616146Z test_dstack_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.039s) 2023-01-11T23:16:18.3616463Z test_dstack_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.023s) 2023-01-11T23:16:18.3616782Z test_dstack_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.023s) 2023-01-11T23:16:18.3617084Z test_dstack_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.023s) 2023-01-11T23:16:18.3617406Z test_dstack_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.023s) 2023-01-11T23:16:18.3617722Z test_dstack_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.023s) 2023-01-11T23:16:18.3618035Z test_empty_full_cuda (__main__.TestTensorCreationCUDA) ... ok (0.052s) 2023-01-11T23:16:18.3618352Z test_empty_overflow_cuda (__main__.TestTensorCreationCUDA) ... ok (0.016s) 2023-01-11T23:16:18.3619069Z test_empty_strided_cuda (__main__.TestTensorCreationCUDA) ... /var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:2445: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T23:16:18.3619688Z as_strided = torch.empty(empty_strided.storage().size(), 2023-01-11T23:16:18.3619923Z ok (0.002s) 2023-01-11T23:16:18.3620179Z test_empty_tensor_props_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3620497Z test_eye_cuda (__main__.TestTensorCreationCUDA) ... ok (0.184s) 2023-01-11T23:16:18.3620827Z test_fill_all_dtypes_and_devices_cuda (__main__.TestTensorCreationCUDA) ... ok (0.363s) 2023-01-11T23:16:18.3621180Z test_float_to_int_conversion_finite_cuda_bool (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3621551Z test_float_to_int_conversion_finite_cuda_int16 (__main__.TestTensorCreationCUDA) ... 
ok (0.001s) 2023-01-11T23:16:18.3621949Z test_float_to_int_conversion_finite_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3622335Z test_float_to_int_conversion_finite_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3622688Z test_float_to_int_conversion_finite_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3623054Z test_float_to_int_conversion_finite_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3623439Z test_float_to_int_conversion_nonfinite_cuda_bool (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3623849Z test_float_to_int_conversion_nonfinite_cuda_int16 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3624333Z test_float_to_int_conversion_nonfinite_cuda_int32 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3624730Z test_float_to_int_conversion_nonfinite_cuda_int64 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3625138Z test_float_to_int_conversion_nonfinite_cuda_int8 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3625537Z test_float_to_int_conversion_nonfinite_cuda_uint8 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3625902Z test_full_inference_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3626249Z test_full_inference_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3626590Z test_full_inference_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3626907Z test_full_out_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3627236Z test_hsplit_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.025s) 2023-01-11T23:16:18.3627564Z test_hsplit_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.025s) 2023-01-11T23:16:18.3627892Z test_hsplit_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.023s) 2023-01-11T23:16:18.3628228Z test_hstack_column_stack_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (0.081s) 2023-01-11T23:16:18.3628586Z test_hstack_column_stack_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.081s) 2023-01-11T23:16:18.3628935Z test_hstack_column_stack_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.080s) 2023-01-11T23:16:18.3629274Z test_hstack_column_stack_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.080s) 2023-01-11T23:16:18.3629610Z test_hstack_column_stack_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.080s) 2023-01-11T23:16:18.3629956Z test_hstack_column_stack_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.059s) 2023-01-11T23:16:18.3630300Z test_hstack_column_stack_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.059s) 2023-01-11T23:16:18.3630730Z test_hstack_column_stack_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.059s) 2023-01-11T23:16:18.3631126Z test_hstack_column_stack_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.060s) 2023-01-11T23:16:18.3631504Z test_hstack_column_stack_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.059s) 2023-01-11T23:16:18.3631851Z test_kaiser_window_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.790s) 2023-01-11T23:16:18.3632188Z test_kaiser_window_cuda_float16 (__main__.TestTensorCreationCUDA) ... 
ok (0.626s) 2023-01-11T23:16:18.3632525Z test_kaiser_window_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.624s) 2023-01-11T23:16:18.3632860Z test_kaiser_window_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.653s) 2023-01-11T23:16:18.3633184Z test_kaiser_window_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.177s) 2023-01-11T23:16:18.3633523Z test_large_linspace_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3633863Z test_large_linspace_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3634245Z test_like_fn_stride_proparation_vs_tensoriterator_unary_op_cuda (__main__.TestTensorCreationCUDA) ... ok (0.106s) 2023-01-11T23:16:18.3634627Z test_linlogspace_mem_overlap_cuda (__main__.TestTensorCreationCUDA) ... ok (0.014s) 2023-01-11T23:16:18.3634970Z test_linspace_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3635308Z test_linspace_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (12.919s) 2023-01-11T23:16:18.3635643Z test_linspace_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (12.564s) 2023-01-11T23:16:18.3635975Z test_linspace_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (12.549s) 2023-01-11T23:16:18.3636300Z test_linspace_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (12.579s) 2023-01-11T23:16:18.3636655Z test_linspace_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3636984Z test_linspace_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (12.714s) 2023-01-11T23:16:18.3637313Z test_linspace_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (12.611s) 2023-01-11T23:16:18.3637637Z test_linspace_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3637954Z test_linspace_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3638288Z test_linspace_deduction_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3638638Z test_linspace_device_vs_cpu_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3638996Z test_linspace_device_vs_cpu_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3639360Z test_linspace_device_vs_cpu_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3639719Z test_linspace_device_vs_cpu_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3640078Z test_linspace_device_vs_cpu_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3640425Z test_linspace_device_vs_cpu_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3640782Z test_linspace_special_steps_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3641145Z test_linspace_special_steps_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3641508Z test_linspace_special_steps_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3641866Z test_linspace_special_steps_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3642224Z test_linspace_special_steps_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3642575Z test_linspace_special_steps_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3642930Z test_linspace_vs_numpy_complex_cuda_complex64 (__main__.TestTensorCreationCUDA) ... 
ok (0.087s) 2023-01-11T23:16:18.3643326Z test_linspace_vs_numpy_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (0.076s) 2023-01-11T23:16:18.3643682Z test_linspace_vs_numpy_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.087s) 2023-01-11T23:16:18.3644033Z test_linspace_vs_numpy_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.047s) 2023-01-11T23:16:18.3644372Z test_linspace_vs_numpy_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.042s) 2023-01-11T23:16:18.3644727Z test_linspace_vs_numpy_integral_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3645081Z test_linspace_vs_numpy_integral_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3645428Z test_linspace_vs_numpy_integral_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3645793Z test_linspace_vs_numpy_integral_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3646152Z test_linspace_vs_numpy_integral_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3646501Z test_logspace_base2_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3646838Z test_logspace_base2_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3647179Z test_logspace_base2_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3647518Z test_logspace_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3647850Z test_logspace_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3648171Z test_logspace_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.017s) 2023-01-11T23:16:18.3648493Z test_logspace_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.007s) 2023-01-11T23:16:18.3648856Z test_logspace_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.007s) 2023-01-11T23:16:18.3649175Z test_logspace_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.007s) 2023-01-11T23:16:18.3649502Z test_logspace_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.007s) 2023-01-11T23:16:18.3649824Z test_logspace_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.007s) 2023-01-11T23:16:18.3650141Z test_logspace_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.007s) 2023-01-11T23:16:18.3650471Z test_logspace_deduction_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3650818Z test_logspace_device_vs_cpu_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3651174Z test_logspace_device_vs_cpu_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3651522Z test_logspace_device_vs_cpu_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3651883Z test_logspace_special_steps_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3652236Z test_logspace_special_steps_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3652589Z test_logspace_special_steps_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3652941Z test_logspace_vs_numpy_complex_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.090s) 2023-01-11T23:16:18.3653300Z test_logspace_vs_numpy_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.216s) 2023-01-11T23:16:18.3653647Z test_logspace_vs_numpy_cuda_float64 (__main__.TestTensorCreationCUDA) ... 
ok (0.211s) 2023-01-11T23:16:18.3654447Z test_meshgrid_default_indexing_cuda (__main__.TestTensorCreationCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/TensorShape.cpp:3452.) 2023-01-11T23:16:18.3655257Z return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined] 2023-01-11T23:16:18.3655506Z ok (0.003s) 2023-01-11T23:16:18.3655819Z test_meshgrid_empty_cuda (__main__.TestTensorCreationCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3656169Z test_meshgrid_ij_indexing_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3656557Z test_meshgrid_ij_indexing_is_default_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3656955Z test_meshgrid_inconsistent_device_cuda (__main__.TestTensorCreationCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3657349Z test_meshgrid_inconsistent_dtype_cuda (__main__.TestTensorCreationCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3657725Z test_meshgrid_non_1d_tensor_cuda (__main__.TestTensorCreationCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3658113Z test_meshgrid_unsupported_indexing_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3658503Z test_meshgrid_vs_numpy_cuda (__main__.TestTensorCreationCUDA) ... ok (0.013s) 2023-01-11T23:16:18.3658876Z test_meshgrid_warns_if_no_indexing_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3659264Z test_meshgrid_xy_indexing_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3659628Z test_new_empty_strided_cuda (__main__.TestTensorCreationCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3660031Z test_new_methods_requires_grad_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3660431Z test_new_tensor_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T23:16:18.3660834Z test_offset_scalar_cast_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3661201Z test_ones_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3661535Z test_random_bool_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3661948Z test_random_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3662333Z test_random_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3662685Z test_random_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3663027Z test_random_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3663377Z test_random_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3663722Z test_random_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3664077Z test_random_default_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3664457Z test_random_default_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3664829Z test_random_default_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3665205Z test_random_default_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3665565Z test_random_default_cuda_int16 (__main__.TestTensorCreationCUDA) ... 
ok (0.001s) 2023-01-11T23:16:18.3665936Z test_random_default_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3666305Z test_random_default_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3666663Z test_random_default_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3667029Z test_random_default_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3667390Z test_random_from_to_bool_cuda (__main__.TestTensorCreationCUDA) ... ok (0.217s) 2023-01-11T23:16:18.3667756Z test_random_from_to_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.047s) 2023-01-11T23:16:18.3668119Z test_random_from_to_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.046s) 2023-01-11T23:16:18.3668493Z test_random_from_to_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.046s) 2023-01-11T23:16:18.3668862Z test_random_from_to_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.046s) 2023-01-11T23:16:18.3669250Z test_random_from_to_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.233s) 2023-01-11T23:16:18.3669615Z test_random_from_to_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.232s) 2023-01-11T23:16:18.3669983Z test_random_from_to_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.031s) 2023-01-11T23:16:18.3670347Z test_random_from_to_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.233s) 2023-01-11T23:16:18.3670795Z test_random_from_to_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.203s) 2023-01-11T23:16:18.3671135Z test_random_full_range_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3671482Z test_random_full_range_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3671823Z test_random_full_range_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3672167Z test_random_full_range_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3672511Z test_random_full_range_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3672849Z test_random_full_range_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3673175Z test_random_full_range_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3673510Z test_random_full_range_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3673848Z test_random_full_range_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3674972Z test_random_to_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... /var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:1714: UserWarning: to - 1 is out of bounds [-(2^8), 2^8]. Due to precision limitations c10::BFloat16 can support discrete uniform distribution only within this range. This warning will become an error in version 1.7 release, please fix the code in advance (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/DistributionTemplates.h:106.) 2023-01-11T23:16:18.3675630Z t.random_(to_) 2023-01-11T23:16:18.3675820Z ok (0.011s) 2023-01-11T23:16:18.3676792Z test_random_to_cuda_float16 (__main__.TestTensorCreationCUDA) ... /var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:1714: UserWarning: to - 1 is out of bounds [-(2^11), 2^11]. Due to precision limitations c10::Half can support discrete uniform distribution only within this range. 
This warning will become an error in version 1.7 release, please fix the code in advance (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/DistributionTemplates.h:106.) 2023-01-11T23:16:18.3677391Z t.random_(to_) 2023-01-11T23:16:18.3677575Z ok (0.011s) 2023-01-11T23:16:18.3678457Z test_random_to_cuda_float32 (__main__.TestTensorCreationCUDA) ... /var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:1714: UserWarning: to - 1 is out of bounds [-(2^24), 2^24]. Due to precision limitations float can support discrete uniform distribution only within this range. This warning will become an error in version 1.7 release, please fix the code in advance (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/DistributionTemplates.h:106.) 2023-01-11T23:16:18.3679049Z t.random_(to_) 2023-01-11T23:16:18.3679227Z ok (0.011s) 2023-01-11T23:16:18.3680111Z test_random_to_cuda_float64 (__main__.TestTensorCreationCUDA) ... /var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:1714: UserWarning: to - 1 is out of bounds [-(2^53), 2^53]. Due to precision limitations double can support discrete uniform distribution only within this range. This warning will become an error in version 1.7 release, please fix the code in advance (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/DistributionTemplates.h:106.) 2023-01-11T23:16:18.3680709Z t.random_(to_) 2023-01-11T23:16:18.3680893Z ok (0.011s) 2023-01-11T23:16:18.3681142Z test_random_to_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.025s) 2023-01-11T23:16:18.3681500Z test_random_to_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.025s) 2023-01-11T23:16:18.3681830Z test_random_to_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.011s) 2023-01-11T23:16:18.3682148Z test_random_to_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.025s) 2023-01-11T23:16:18.3682474Z test_random_to_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.021s) 2023-01-11T23:16:18.3682792Z test_range_cuda (__main__.TestTensorCreationCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3683127Z test_range_factories_64bit_indexing_cuda (__main__.TestTensorCreationCUDA) ... ok (0.240s) 2023-01-11T23:16:18.3683461Z test_range_warning_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3683791Z test_repeat_interleave_cuda (__main__.TestTensorCreationCUDA) ... ok (0.020s) 2023-01-11T23:16:18.3684114Z test_roll_cuda (__main__.TestTensorCreationCUDA) ... ok (0.023s) 2023-01-11T23:16:18.3684461Z test_signal_window_functions_window_bartlett_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3684854Z test_signal_window_functions_window_bartlett_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3685241Z test_signal_window_functions_window_bartlett_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3685626Z test_signal_window_functions_window_bartlett_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3685999Z test_signal_window_functions_window_bartlett_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3686384Z test_signal_window_functions_window_blackman_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3686774Z test_signal_window_functions_window_blackman_cuda_float16 (__main__.TestTensorCreationCUDA) ... 
ok (0.010s) 2023-01-11T23:16:18.3687191Z test_signal_window_functions_window_blackman_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3687566Z test_signal_window_functions_window_blackman_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3687952Z test_signal_window_functions_window_blackman_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3688335Z test_signal_window_functions_window_hamming_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3688728Z test_signal_window_functions_window_hamming_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3689111Z test_signal_window_functions_window_hamming_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3689496Z test_signal_window_functions_window_hamming_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3689876Z test_signal_window_functions_window_hamming_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3690259Z test_signal_window_functions_window_hann_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3690627Z test_signal_window_functions_window_hann_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3691005Z test_signal_window_functions_window_hann_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3691377Z test_signal_window_functions_window_hann_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3691748Z test_signal_window_functions_window_hann_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3692131Z test_simple_scalar_cast_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3692493Z test_stack_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3692850Z test_stack_out_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3693238Z test_strided_mismatched_stride_shape_cuda (__main__.TestTensorCreationCUDA) ... ok (0.022s) 2023-01-11T23:16:18.3693603Z test_tensor_ctor_device_inference_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3693942Z test_tensor_device_cuda (__main__.TestTensorCreationCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3694273Z test_tensor_factories_empty_cuda (__main__.TestTensorCreationCUDA) ... ok (0.198s) 2023-01-11T23:16:18.3694761Z test_tensor_factory_copy_var_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3695138Z test_tensor_factory_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.003s) 2023-01-11T23:16:18.3695492Z test_tensor_factory_gpu_type_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3695855Z test_tensor_factory_gpu_type_inference_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3696236Z test_tensor_factory_type_inference_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T23:16:18.3696610Z test_tensor_from_non_writable_numpy_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3696978Z test_tensor_from_sequence_cuda (__main__.TestTensorCreationCUDA) ... 
skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3697331Z test_torch_complex_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3697668Z test_torch_complex_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3698004Z test_torch_complex_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3698356Z test_torch_complex_floating_dtype_error_cuda_bool (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3698788Z test_torch_complex_floating_dtype_error_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3699184Z test_torch_complex_floating_dtype_error_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3699567Z test_torch_complex_floating_dtype_error_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3699939Z test_torch_complex_floating_dtype_error_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3700309Z test_torch_complex_floating_dtype_error_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3700684Z test_torch_complex_floating_dtype_error_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3701059Z test_torch_complex_floating_dtype_error_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3701426Z test_torch_complex_out_dtype_error_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3701796Z test_torch_complex_out_dtype_error_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3702167Z test_torch_complex_same_dtype_error_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.011s) 2023-01-11T23:16:18.3702534Z test_torch_complex_same_dtype_error_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.011s) 2023-01-11T23:16:18.3702884Z test_torch_polar_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3703216Z test_torch_polar_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3703555Z test_unpack_double_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3703885Z test_unpack_double_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3704209Z test_vander_cuda (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3704534Z test_vander_types_cuda_bool (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3704871Z test_vander_types_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3705215Z test_vander_types_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3705604Z test_vander_types_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3705947Z test_vander_types_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3706280Z test_vander_types_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3706607Z test_vander_types_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3706935Z test_vander_types_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3707261Z test_vander_types_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3707593Z test_vander_types_cuda_uint8 (__main__.TestTensorCreationCUDA) ... 
ok (0.005s) 2023-01-11T23:16:18.3707927Z test_vsplit_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.016s) 2023-01-11T23:16:18.3708256Z test_vsplit_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.015s) 2023-01-11T23:16:18.3708574Z test_vsplit_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.014s) 2023-01-11T23:16:18.3708911Z test_vstack_row_stack_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (0.089s) 2023-01-11T23:16:18.3709260Z test_vstack_row_stack_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.089s) 2023-01-11T23:16:18.3709601Z test_vstack_row_stack_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.088s) 2023-01-11T23:16:18.3709942Z test_vstack_row_stack_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.088s) 2023-01-11T23:16:18.3710279Z test_vstack_row_stack_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.088s) 2023-01-11T23:16:18.3710697Z test_vstack_row_stack_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.057s) 2023-01-11T23:16:18.3711084Z test_vstack_row_stack_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.057s) 2023-01-11T23:16:18.3711420Z test_vstack_row_stack_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.057s) 2023-01-11T23:16:18.3711758Z test_vstack_row_stack_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.057s) 2023-01-11T23:16:18.3712090Z test_vstack_row_stack_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.057s) 2023-01-11T23:16:18.3712409Z test_zeros_cuda (__main__.TestTensorCreationCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3712748Z test_zeros_dtype_layout_device_match_cuda_bool (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3713125Z test_zeros_dtype_layout_device_match_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3713496Z test_zeros_dtype_layout_device_match_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3713867Z test_zeros_dtype_layout_device_match_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3714236Z test_zeros_dtype_layout_device_match_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3714602Z test_zeros_dtype_layout_device_match_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3714961Z test_zeros_dtype_layout_device_match_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3715300Z test_zeros_out_cuda (__main__.TestTensorCreationCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3715476Z 2023-01-11T23:16:18.3715691Z ---------------------------------------------------------------------- 2023-01-11T23:16:18.3715947Z Ran 459 tests in 85.967s 2023-01-11T23:16:18.3716073Z 2023-01-11T23:16:18.3716156Z OK (skipped=99) 2023-01-11T23:16:18.3716273Z 2023-01-11T23:16:18.3716366Z Generating XML reports... 
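The *_cuda_<dtype> names throughout the run above are not hand-written tests: PyTorch's device/dtype parametrization expands a single test template once per enabled device and dtype. A minimal sketch of that pattern, assuming the torch.testing._internal helpers (illustrative, not taken from this log):

    import torch
    from torch.testing._internal.common_device_type import (
        dtypes,
        instantiate_device_type_tests,
    )
    from torch.testing._internal.common_utils import TestCase, run_tests

    class TestTensorCreation(TestCase):
        # One template; the framework expands it into names like
        # TestTensorCreationCUDA.test_hsplit_cuda_float32 / ..._cuda_int64.
        @dtypes(torch.float32, torch.int64)
        def test_hsplit(self, device, dtype):
            t = torch.arange(16, device=device, dtype=dtype).reshape(4, 4)
            parts = torch.hsplit(t, 2)
            self.assertEqual(len(parts), 2)

    # Generates per-device classes (CPU, CUDA, ...) into this module's globals.
    instantiate_device_type_tests(TestTensorCreation, globals())

    if __name__ == "__main__":
        run_tests()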
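The torch.meshgrid UserWarning recorded in this run fires whenever a call omits the indexing argument. A short sketch of the call pattern that triggers it and the explicit form that does not (illustrative only):

    import torch

    x = torch.arange(3)
    y = torch.arange(2)

    # Omitting indexing= emits the UserWarning seen in the log:
    # gx, gy = torch.meshgrid(x, y)

    # Passing it explicitly is silent; "ij" matches the current default behavior.
    gx, gy = torch.meshgrid(x, y, indexing="ij")
    assert gx.shape == (3, 2) and gy.shape == (3, 2)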
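The t.random_(to_) warnings above concern discrete uniform sampling bounds: for reduced-precision floating dtypes, to - 1 must lie inside the exactly representable integer range the warning quotes (2^8 for bfloat16, 2^11 for float16, 2^24 for float32, 2^53 for float64). A hedged sketch of that boundary (illustrative, not from this log):

    import torch

    t = torch.empty(8, dtype=torch.bfloat16)

    # In range: to - 1 == 255 <= 2**8, so no warning.
    t.random_(0, 2**8)

    # Out of range: to - 1 == 257 > 2**8, emits the UserWarning quoted above.
    t.random_(0, 2**8 + 2)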
2023-01-11T23:16:18.3716796Z Generated XML report: test-reports/python-unittest/test_tensor_creation_ops/TEST-TestAsArrayCUDA-20230111231451.xml 2023-01-11T23:16:18.3717365Z Generated XML report: test-reports/python-unittest/test_tensor_creation_ops/TEST-TestLikeTensorCreationCUDA-20230111231451.xml 2023-01-11T23:16:18.3718025Z Generated XML report: test-reports/python-unittest/test_tensor_creation_ops/TEST-TestRandomTensorCreationCUDA-20230111231451.xml 2023-01-11T23:16:18.3718611Z Generated XML report: test-reports/python-unittest/test_tensor_creation_ops/TEST-TestTensorCreationCUDA-20230111231451.xml 2023-01-11T23:16:18.3718870Z 2023-01-11T23:16:18.3719216Z ##[endgroup] 2023-01-11T23:16:18.3719638Z FINISHED PRINTING LOG FILE of test_tensor_creation_ops (/var/lib/jenkins/workspace/test/test-reports/test_tensor_creation_ops_hsc_5c7d) 2023-01-11T23:16:18.3719880Z 2023-01-11T23:16:18.3720038Z Running doctests ... [2023-01-11 23:16:18.349520] 2023-01-11T23:16:18.3720388Z Start doctest_module('/opt/conda/lib/python3.10/site-packages/torch') 2023-01-11T23:16:18.3720634Z Listing tests 2023-01-11T23:16:23.9128805Z gathering tests 2023-01-11T23:16:23.9143112Z running 663 test(s) 2023-01-11T23:16:23.9150245Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/__init__.py::is_tensor:0, line 429 <- wrt source file 2023-01-11T23:16:23.9155646Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/__init__.py::is_tensor:0 2023-01-11T23:16:23.9156323Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/__init__.py::set_default_tensor_type:0, line 458 <- wrt source file 2023-01-11T23:16:23.9156926Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/__init__.py::set_default_tensor_type:0 2023-01-11T23:16:23.9157480Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/__init__.py::set_default_dtype:0, line 496 <- wrt source file 2023-01-11T23:16:23.9159956Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/__init__.py::set_default_dtype:0 2023-01-11T23:16:23.9160723Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/__init__.py::use_deterministic_algorithms:0, line 629 <- wrt source file 2023-01-11T23:16:23.9161810Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/__init__.py::use_deterministic_algorithms:0 2023-01-11T23:16:23.9162470Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/__init__.py::compile:0, line 1221 <- wrt source file 2023-01-11T23:16:23.9162953Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/__init__.py::compile:0 2023-01-11T23:16:23.9163590Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_C.cpython-310-x86_64-linux-gnu.so::Generator:0, line 15 <- wrt source file 2023-01-11T23:16:23.9164647Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_C.cpython-310-x86_64-linux-gnu.so::Generator:0 2023-01-11T23:16:23.9165343Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_C.cpython-310-x86_64-linux-gnu.so::_LinAlgError:0, line 5 <- wrt source file 2023-01-11T23:16:23.9166024Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_C.cpython-310-x86_64-linux-gnu.so::_LinAlgError:0 2023-01-11T23:16:23.9166641Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_namedtensor_internals.py::update_names:0, line 125 <- wrt source file 2023-01-11T23:16:23.9168738Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_namedtensor_internals.py::update_names:0 2023-01-11T23:16:23.9169321Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.register_hook:0, line 508 <- wrt 
source file 2023-01-11T23:16:23.9388816Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.register_hook:0 2023-01-11T23:16:23.9389556Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.refine_names:0, line 1096 <- wrt source file 2023-01-11T23:16:23.9498355Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.refine_names:0 2023-01-11T23:16:23.9499847Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.align_to:0, line 1141 <- wrt source file 2023-01-11T23:16:23.9505071Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.align_to:0 2023-01-11T23:16:23.9505614Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.rename:0, line 1214 <- wrt source file 2023-01-11T23:16:23.9511003Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.rename:0 2023-01-11T23:16:23.9511568Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.to_sparse_coo:0, line 1244 <- wrt source file 2023-01-11T23:16:23.9521271Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.to_sparse_coo:0 2023-01-11T23:16:23.9521913Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_tensor_str.py::set_printoptions:0, line 49 <- wrt source file 2023-01-11T23:16:23.9548588Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_tensor_str.py::set_printoptions:0 2023-01-11T23:16:23.9549224Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::broadcast_tensors:0, line 61 <- wrt source file 2023-01-11T23:16:23.9554985Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::broadcast_tensors:0 2023-01-11T23:16:23.9555604Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::broadcast_shapes:0, line 89 <- wrt source file 2023-01-11T23:16:23.9557796Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::broadcast_shapes:0 2023-01-11T23:16:23.9558403Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::split:0, line 161 <- wrt source file 2023-01-11T23:16:23.9571502Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::split:0 2023-01-11T23:16:23.9572234Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::einsum:0, line 269 <- wrt source file 2023-01-11T23:16:23.9590221Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::einsum:0 2023-01-11T23:16:23.9590889Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::meshgrid:0, line 450 <- wrt source file 2023-01-11T23:16:23.9625764Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::meshgrid:0 2023-01-11T23:16:23.9626374Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::_unique_impl:0, line 764 <- wrt source file 2023-01-11T23:16:23.9640047Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::_unique_impl:0 2023-01-11T23:16:23.9640672Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::_unique_consecutive_impl:0, line 842 <- wrt source file 2023-01-11T23:16:23.9651895Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::_unique_consecutive_impl:0 2023-01-11T23:16:23.9652444Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::tensordot:0, line 1040 <- wrt source file 2023-01-11T23:16:23.9663179Z * SUCCESS: 
/opt/conda/lib/python3.10/site-packages/torch/functional.py::tensordot:0 2023-01-11T23:16:23.9663707Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::cartesian_prod:0, line 1118 <- wrt source file 2023-01-11T23:16:23.9670660Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::cartesian_prod:0 2023-01-11T23:16:23.9671190Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::block_diag:0, line 1152 <- wrt source file 2023-01-11T23:16:23.9681500Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::block_diag:0 2023-01-11T23:16:23.9682030Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::cdist:0, line 1203 <- wrt source file 2023-01-11T23:16:23.9696106Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::cdist:0 2023-01-11T23:16:23.9696652Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::atleast_1d:0, line 1243 <- wrt source file 2023-01-11T23:16:23.9713443Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::atleast_1d:0 2023-01-11T23:16:23.9713973Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::atleast_2d:0, line 1279 <- wrt source file 2023-01-11T23:16:23.9729084Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::atleast_2d:0 2023-01-11T23:16:23.9729629Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::atleast_3d:0, line 1317 <- wrt source file 2023-01-11T23:16:23.9750875Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::atleast_3d:0 2023-01-11T23:16:23.9751408Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::norm:0, line 1455 <- wrt source file 2023-01-11T23:16:23.9785539Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::norm:0 2023-01-11T23:16:23.9786173Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::chain_matmul:0, line 1606 <- wrt source file 2023-01-11T23:16:23.9786794Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/functional.py::chain_matmul:0 2023-01-11T23:16:23.9787348Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::_lu_impl:0, line 1706 <- wrt source file 2023-01-11T23:16:23.9789623Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/functional.py::_lu_impl:0 2023-01-11T23:16:23.9790571Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/hub.py::list:0, line 391 <- wrt source file 2023-01-11T23:16:23.9791222Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/hub.py::list:0 2023-01-11T23:16:23.9791844Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/hub.py::help:0, line 444 <- wrt source file 2023-01-11T23:16:23.9792374Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/hub.py::help:0 2023-01-11T23:16:23.9793031Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/hub.py::load:0, line 524 <- wrt source file 2023-01-11T23:16:23.9793685Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/hub.py::load:0 2023-01-11T23:16:23.9794351Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/hub.py::_load_local:0, line 563 <- wrt source file 2023-01-11T23:16:23.9795007Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/hub.py::_load_local:0 2023-01-11T23:16:23.9795630Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/hub.py::download_url_to_file:0, line 592 <- wrt source file 2023-01-11T23:16:23.9796136Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/hub.py::download_url_to_file:0 2023-01-11T23:16:23.9796658Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/hub.py::load_state_dict_from_url:0, line 701 <- wrt source file 2023-01-11T23:16:23.9797237Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/hub.py::load_state_dict_from_url:0 2023-01-11T23:16:23.9797766Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/library.py::Library.define:0, line 61 <- wrt source file 2023-01-11T23:16:23.9798342Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/library.py::Library.define:0 2023-01-11T23:16:23.9799042Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/library.py::Library.impl:0, line 81 <- wrt source file 2023-01-11T23:16:23.9799718Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/library.py::Library.impl:0 2023-01-11T23:16:23.9800366Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/overrides.py::get_ignored_functions:0, line 67 <- wrt source file 2023-01-11T23:16:23.9800905Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/overrides.py::get_ignored_functions:0 2023-01-11T23:16:23.9801449Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/overrides.py::get_testing_overrides:0, line 336 <- wrt source file 2023-01-11T23:16:23.9839047Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/overrides.py::get_testing_overrides:0 2023-01-11T23:16:23.9839690Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/overrides.py::wrap_torch_function:0, line 1391 <- wrt source file 2023-01-11T23:16:23.9842961Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/overrides.py::wrap_torch_function:0 2023-01-11T23:16:23.9843608Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/overrides.py::handle_torch_function:0, line 1508 <- wrt source file 2023-01-11T23:16:23.9846540Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/overrides.py::handle_torch_function:0 2023-01-11T23:16:23.9847202Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/overrides.py::is_tensor_method_or_property:0, line 1732 <- wrt source file 2023-01-11T23:16:23.9889522Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/overrides.py::is_tensor_method_or_property:0 2023-01-11T23:16:23.9890170Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/overrides.py::is_tensor_like:0, line 1750 <- wrt source file 2023-01-11T23:16:23.9898075Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/overrides.py::is_tensor_like:0 2023-01-11T23:16:23.9898943Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/quasirandom.py::SobolEngine:0, line 37 <- wrt source file 2023-01-11T23:16:23.9899529Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/quasirandom.py::SobolEngine:0 2023-01-11T23:16:23.9900062Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/serialization.py::save:0, line 429 <- wrt source file 2023-01-11T23:16:23.9900558Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/serialization.py::save:0 2023-01-11T23:16:23.9901069Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/serialization.py::load:0, line 754 <- wrt source file 2023-01-11T23:16:23.9904409Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/serialization.py::load:0 2023-01-11T23:16:23.9905157Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/torch_version.py::TorchVersion:0, line 49 <- wrt source file 2023-01-11T23:16:23.9905758Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/torch_version.py::TorchVersion:0 2023-01-11T23:16:23.9906336Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_prims_common/__init__.py::compute_required_storage_length:0, line 1495 <- wrt source file 2023-01-11T23:16:23.9913350Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_prims_common/__init__.py::compute_required_storage_length:0 2023-01-11T23:16:23.9914046Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::Future.then:0, line 147 <- wrt source file 2023-01-11T23:16:23.9914590Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::Future.then:0 2023-01-11T23:16:23.9915256Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::Future.add_done_callback:0, line 195 <- wrt source file 2023-01-11T23:16:23.9915872Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::Future.add_done_callback:0 2023-01-11T23:16:23.9916618Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::Future.set_result:0, line 228 <- wrt source file 2023-01-11T23:16:23.9917299Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::Future.set_result:0 2023-01-11T23:16:23.9918025Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::Future.set_exception:0, line 257 <- wrt source file 2023-01-11T23:16:23.9918745Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::Future.set_exception:0 2023-01-11T23:16:23.9919423Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::collect_all:0, line 288 <- wrt source file 2023-01-11T23:16:23.9920132Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::collect_all:0 2023-01-11T23:16:23.9920850Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/__init__.py::annotate:0, line 103 <- wrt source file 2023-01-11T23:16:23.9921533Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/__init__.py::annotate:0 2023-01-11T23:16:23.9922209Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/__init__.py::strict_fusion:0, line 202 <- wrt source file 2023-01-11T23:16:23.9922879Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/__init__.py::strict_fusion:0 2023-01-11T23:16:23.9923435Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/monitor/__init__.py::TensorboardEventHandler:0, line 21 <- wrt source file 2023-01-11T23:16:23.9938528Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/monitor/__init__.py::TensorboardEventHandler:0 2023-01-11T23:16:23.9939279Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nested/__init__.py::as_nested_tensor:0, line 39 <- wrt source file 2023-01-11T23:16:23.9953555Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nested/__init__.py::as_nested_tensor:0 2023-01-11T23:16:23.9954116Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/sparse/__init__.py::sum:0, line 175 <- wrt source file 2023-01-11T23:16:23.9963310Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/sparse/__init__.py::sum:0 2023-01-11T23:16:23.9963867Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py::aot_function:0, line 2139 <- wrt source file 2023-01-11T23:16:24.1259916Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py::aot_function:0 2023-01-11T23:16:24.1261101Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/_functorch/benchmark_utils.py::benchmark_utilization:0, line 162 <- wrt source file 2023-01-11T23:16:24.1262265Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_functorch/benchmark_utils.py::benchmark_utilization:0 2023-01-11T23:16:24.1263167Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::vjp:0, line 195 <- wrt source file 2023-01-11T23:16:24.1327748Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::vjp:0 2023-01-11T23:16:24.1328338Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jacrev:0, line 382 <- wrt source file 2023-01-11T23:16:24.1457091Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jacrev:0 2023-01-11T23:16:24.1458255Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jvp:0, line 882 <- wrt source file 2023-01-11T23:16:24.2111479Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jvp:0 2023-01-11T23:16:24.2112523Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jacfwd:0, line 1024 <- wrt source file 2023-01-11T23:16:24.2233333Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jacfwd:0 2023-01-11T23:16:24.2234032Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::hessian:0, line 1173 <- wrt source file 2023-01-11T23:16:24.2270227Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::hessian:0 2023-01-11T23:16:24.2271705Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::grad:0, line 1290 <- wrt source file 2023-01-11T23:16:24.2273076Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::grad:0 2023-01-11T23:16:24.2273689Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::functionalize:0, line 1441 <- wrt source file 2023-01-11T23:16:24.2274265Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::functionalize:0 2023-01-11T23:16:24.2274815Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/fx_minifier.py::minifier:0, line 72 <- wrt source file 2023-01-11T23:16:24.2275345Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_functorch/fx_minifier.py::minifier:0 2023-01-11T23:16:24.2275870Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/vmap.py::vmap:0, line 306 <- wrt source file 2023-01-11T23:16:24.2317414Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_functorch/vmap.py::vmap:0 2023-01-11T23:16:24.2318924Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_prims/context.py::NvfuserPrimsMode:0, line 90 <- wrt source file 2023-01-11T23:16:24.2320673Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_prims/context.py::NvfuserPrimsMode:0 2023-01-11T23:16:24.2322031Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_prims/context.py::TorchRefsMode:0, line 141 <- wrt source file 2023-01-11T23:16:24.2323265Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_prims/context.py::TorchRefsMode:0 2023-01-11T23:16:24.2323833Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/qat/modules/linear_relu.py::LinearReLU:0, line 21 <- wrt source file 2023-01-11T23:16:24.2324410Z * 
SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/qat/modules/linear_relu.py::LinearReLU:0 2023-01-11T23:16:24.2325022Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/dynamic/modules/linear_relu.py::LinearReLU:0, line 21 <- wrt source file 2023-01-11T23:16:24.2325644Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/dynamic/modules/linear_relu.py::LinearReLU:0 2023-01-11T23:16:24.2326252Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearReLU:0, line 22 <- wrt source file 2023-01-11T23:16:24.2326857Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearReLU:0 2023-01-11T23:16:24.2327468Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearLeakyReLU:0, line 59 <- wrt source file 2023-01-11T23:16:24.2328075Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearLeakyReLU:0 2023-01-11T23:16:24.2328687Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearTanh:0, line 126 <- wrt source file 2023-01-11T23:16:24.2329338Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearTanh:0 2023-01-11T23:16:24.2329916Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantizable/modules/rnn.py::LSTMCell:0, line 24 <- wrt source file 2023-01-11T23:16:24.2347656Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantizable/modules/rnn.py::LSTMCell:0 2023-01-11T23:16:24.2348813Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantizable/modules/rnn.py::LSTM:0, line 274 <- wrt source file 2023-01-11T23:16:24.2376574Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantizable/modules/rnn.py::LSTM:0 2023-01-11T23:16:24.2377641Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv1d:0, line 166 <- wrt source file 2023-01-11T23:16:24.2378354Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv1d:0 2023-01-11T23:16:24.2379134Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv2d:0, line 226 <- wrt source file 2023-01-11T23:16:24.2379734Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv2d:0 2023-01-11T23:16:24.2380426Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv3d:0, line 287 <- wrt source file 2023-01-11T23:16:24.2381069Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv3d:0 2023-01-11T23:16:24.2381626Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/__init__.py::Quantize:0, line 74 <- wrt source file 2023-01-11T23:16:24.2386859Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/__init__.py::Quantize:0 2023-01-11T23:16:24.2387466Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/__init__.py::DeQuantize:0, line 114 <- wrt source file 2023-01-11T23:16:24.2394299Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/__init__.py::DeQuantize:0 2023-01-11T23:16:24.2395100Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv1d:0, line 34 <- wrt source file 2023-01-11T23:16:24.2395864Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv1d:0 2023-01-11T23:16:24.2396514Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv2d:0, line 105 <- wrt source file 2023-01-11T23:16:24.2397131Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv2d:0 2023-01-11T23:16:24.2397878Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv3d:0, line 170 <- wrt source file 2023-01-11T23:16:24.2398533Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv3d:0 2023-01-11T23:16:24.2399265Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose1d:0, line 236 <- wrt source file 2023-01-11T23:16:24.2399998Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose1d:0 2023-01-11T23:16:24.2400842Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose2d:0, line 297 <- wrt source file 2023-01-11T23:16:24.2401686Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose2d:0 2023-01-11T23:16:24.2402609Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose3d:0, line 358 <- wrt source file 2023-01-11T23:16:24.2403455Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose3d:0 2023-01-11T23:16:24.2404279Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/linear.py::Linear:0, line 28 <- wrt source file 2023-01-11T23:16:24.2404846Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/linear.py::Linear:0 2023-01-11T23:16:24.2405413Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::LSTM:0, line 391 <- wrt source file 2023-01-11T23:16:24.2405956Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::LSTM:0 2023-01-11T23:16:24.2406519Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::GRU:0, line 618 <- wrt source file 2023-01-11T23:16:24.2407075Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::GRU:0 2023-01-11T23:16:24.2407639Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::RNNCell:0, line 940 <- wrt source file 2023-01-11T23:16:24.2408353Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::RNNCell:0 2023-01-11T23:16:24.2409113Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::LSTMCell:0, line 993 <- wrt source file 2023-01-11T23:16:24.2409864Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::LSTMCell:0 2023-01-11T23:16:24.2410559Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::GRUCell:0, line 1036 <- wrt source file 2023-01-11T23:16:24.2411186Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::GRUCell:0 2023-01-11T23:16:24.2411752Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/activation.py::ReLU6:0, line 31 <- wrt source file 2023-01-11T23:16:24.2412440Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/activation.py::ReLU6:0 2023-01-11T23:16:24.2413108Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv1d:0, line 295 <- wrt source file 2023-01-11T23:16:24.2413709Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv1d:0 2023-01-11T23:16:24.2414386Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv2d:0, line 403 <- wrt source file 2023-01-11T23:16:24.2415127Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv2d:0 2023-01-11T23:16:24.2415774Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv3d:0, line 502 <- wrt source file 2023-01-11T23:16:24.2416314Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv3d:0 2023-01-11T23:16:24.2416874Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose1d:0, line 685 <- wrt source file 2023-01-11T23:16:24.2417452Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose1d:0 2023-01-11T23:16:24.2418038Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose2d:0, line 775 <- wrt source file 2023-01-11T23:16:24.2418678Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose2d:0 2023-01-11T23:16:24.2419255Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose3d:0, line 869 <- wrt source file 2023-01-11T23:16:24.2419813Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose3d:0 2023-01-11T23:16:24.2420386Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/embedding_ops.py::Embedding:0, line 84 <- wrt source file 2023-01-11T23:16:24.2439845Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/embedding_ops.py::Embedding:0 2023-01-11T23:16:24.2440467Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/embedding_ops.py::EmbeddingBag:0, line 209 <- wrt source file 2023-01-11T23:16:24.2460790Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/embedding_ops.py::EmbeddingBag:0 2023-01-11T23:16:24.2461414Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/functional_modules.py::FloatFunctional:0, line 21 <- wrt source file 2023-01-11T23:16:24.2465696Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/functional_modules.py::FloatFunctional:0 2023-01-11T23:16:24.2466320Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/functional_modules.py::QFunctional:0, line 141 <- wrt source file 2023-01-11T23:16:24.2469564Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/functional_modules.py::QFunctional:0 2023-01-11T23:16:24.2470517Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/linear.py::Linear:0, line 117 <- wrt source file 2023-01-11T23:16:24.2471295Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/linear.py::Linear:0 2023-01-11T23:16:24.2471857Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/rnn.py::LSTM:0, line 20 <- wrt source file 2023-01-11T23:16:24.2472486Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/rnn.py::LSTM:0 2023-01-11T23:16:24.2473168Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/_experimental/activation_sparsifier/activation_sparsifier.py::ActivationSparsifier:0, line 59 <- wrt source file 2023-01-11T23:16:24.2473975Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/_experimental/activation_sparsifier/activation_sparsifier.py::ActivationSparsifier:0 2023-01-11T23:16:24.2474706Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/_experimental/data_scheduler/base_data_scheduler.py::BaseDataScheduler.get_schedule_param:0, line 92 <- wrt source file 2023-01-11T23:16:24.2504229Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/_experimental/data_scheduler/base_data_scheduler.py::BaseDataScheduler.get_schedule_param:0 2023-01-11T23:16:24.2505069Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/_experimental/data_sparsifier/base_data_sparsifier.py::BaseDataSparsifier:0, line 54 <- wrt source file 2023-01-11T23:16:24.2505836Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/_experimental/data_sparsifier/base_data_sparsifier.py::BaseDataSparsifier:0 2023-01-11T23:16:24.2506567Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/scheduler/lambda_scheduler.py::LambdaSL:0, line 19 <- wrt source file 2023-01-11T23:16:24.2508255Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/scheduler/lambda_scheduler.py::LambdaSL:0 2023-01-11T23:16:24.2509067Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/sparsifier/base_sparsifier.py::BaseSparsifier:0, line 45 <- wrt source file 2023-01-11T23:16:24.2509762Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/sparsifier/base_sparsifier.py::BaseSparsifier:0 2023-01-11T23:16:24.2510541Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/sparsifier/base_sparsifier.py::BaseSparsifier.squash_mask:0, line 237 <- wrt source file 2023-01-11T23:16:24.2514951Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/sparsifier/base_sparsifier.py::BaseSparsifier.squash_mask:0 2023-01-11T23:16:24.2515653Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuse_modules.py::fuse_modules:0, line 143 <- wrt source file 2023-01-11T23:16:24.2516437Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuse_modules.py::fuse_modules:0 2023-01-11T23:16:24.2517105Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_conv_bn:0, line 27 <- wrt source file 2023-01-11T23:16:24.2524453Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_conv_bn:0 2023-01-11T23:16:24.2525117Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_conv_bn_relu:0, line 64 <- wrt source file 2023-01-11T23:16:24.2529460Z * SUCCESS: 
/opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_conv_bn_relu:0 2023-01-11T23:16:24.2530320Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_linear_bn:0, line 111 <- wrt source file 2023-01-11T23:16:24.2534713Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_linear_bn:0 2023-01-11T23:16:24.2535340Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_convtranspose_bn:0, line 139 <- wrt source file 2023-01-11T23:16:24.2540259Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_convtranspose_bn:0 2023-01-11T23:16:24.2540843Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/observer.py::_with_args:0, line 85 <- wrt source file 2023-01-11T23:16:24.2541535Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/observer.py::_with_args:0 2023-01-11T23:16:24.2542117Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/observer.py::_with_callable_args:0, line 106 <- wrt source file 2023-01-11T23:16:24.2542686Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/observer.py::_with_callable_args:0 2023-01-11T23:16:24.2543252Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::fuse_fx:0, line 242 <- wrt source file 2023-01-11T23:16:24.2543795Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::fuse_fx:0 2023-01-11T23:16:24.2546573Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::prepare_fx:0, line 301 <- wrt source file 2023-01-11T23:16:24.2547134Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::prepare_fx:0 2023-01-11T23:16:24.2549155Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::prepare_qat_fx:0, line 439 <- wrt source file 2023-01-11T23:16:24.2549823Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::prepare_qat_fx:0 2023-01-11T23:16:24.2550571Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::convert_fx:0, line 604 <- wrt source file 2023-01-11T23:16:24.2551148Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::convert_fx:0 2023-01-11T23:16:24.2551735Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::convert_to_reference_fx:0, line 663 <- wrt source file 2023-01-11T23:16:24.2552589Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::convert_to_reference_fx:0 2023-01-11T23:16:24.2553249Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::_convert_to_reference_decomposed_fx:0, line 715 <- wrt source file 2023-01-11T23:16:24.2554113Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::_convert_to_reference_decomposed_fx:0 2023-01-11T23:16:24.2554927Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_path_of_module:0, line 408 <- wrt source file 2023-01-11T23:16:24.2555663Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_path_of_module:0 2023-01-11T23:16:24.2556404Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_signature_locals:0, line 429 <- wrt source file 2023-01-11T23:16:24.2556966Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_signature_locals:0 2023-01-11T23:16:24.2557536Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_default_kwargs:0, line 442 <- wrt source file 2023-01-11T23:16:24.2558163Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_default_kwargs:0 2023-01-11T23:16:24.2558742Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_normalize_kwargs:0, line 463 <- wrt source file 2023-01-11T23:16:24.2559292Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_normalize_kwargs:0 2023-01-11T23:16:24.2559921Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_num_pos_args:0, line 483 <- wrt source file 2023-01-11T23:16:24.2560606Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_num_pos_args:0 2023-01-11T23:16:24.2561259Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/backend_config/backend_config.py::DTypeConfig:0, line 131 <- wrt source file 2023-01-11T23:16:24.2561877Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/backend_config/backend_config.py::DTypeConfig:0 2023-01-11T23:16:24.2562506Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/backend_config/onednn.py::_fuse_linear_bn_leaky_relu:0, line 80 <- wrt source file 2023-01-11T23:16:24.2563113Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/backend_config/onednn.py::_fuse_linear_bn_leaky_relu:0 2023-01-11T23:16:24.2563764Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report.py::ModelReport:0, line 79 <- wrt source file 2023-01-11T23:16:24.2565799Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report.py::ModelReport:0 2023-01-11T23:16:24.2566600Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_filtered_tables:0, line 324 <- wrt source file 2023-01-11T23:16:24.2567472Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_filtered_tables:0 2023-01-11T23:16:24.2568233Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_table_visualization:0, line 407 <- wrt source file 2023-01-11T23:16:24.2568970Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_table_visualization:0 2023-01-11T23:16:24.2569727Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_plot_visualization:0, line 557 <- wrt source file 2023-01-11T23:16:24.2570482Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_plot_visualization:0 2023-01-11T23:16:24.2571445Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_histogram_visualization:0, line 619 <- wrt source file 2023-01-11T23:16:24.2572254Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_histogram_visualization:0 2023-01-11T23:16:24.2573081Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/anomaly_mode.py::detect_anomaly:0, line 25 <- wrt source file 2023-01-11T23:16:24.2573693Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/anomaly_mode.py::detect_anomaly:0 2023-01-11T23:16:24.2574789Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/forward_ad.py::make_dual:0, line 63 <- wrt source file 2023-01-11T23:16:24.2575516Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/forward_ad.py::make_dual:0 2023-01-11T23:16:24.2576238Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/forward_ad.py::unpack_dual:0, line 126 <- wrt source file 2023-01-11T23:16:24.2576801Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/forward_ad.py::unpack_dual:0 2023-01-11T23:16:24.2577342Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/forward_ad.py::dual_level:0, line 163 <- wrt source file 2023-01-11T23:16:24.2577866Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/forward_ad.py::dual_level:0 2023-01-11T23:16:24.2578436Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.save_for_backward:0, line 51 <- wrt source file 2023-01-11T23:16:24.2579006Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.save_for_backward:0 2023-01-11T23:16:24.2579591Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.save_for_forward:0, line 93 <- wrt source file 2023-01-11T23:16:24.2580153Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.save_for_forward:0 2023-01-11T23:16:24.2580725Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.mark_dirty:0, line 143 <- wrt source file 2023-01-11T23:16:24.2581265Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.mark_dirty:0 2023-01-11T23:16:24.2581869Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.mark_non_differentiable:0, line 187 <- wrt source file 2023-01-11T23:16:24.2582758Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.mark_non_differentiable:0 2023-01-11T23:16:24.2583632Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.set_materialize_grads:0, line 215 <- wrt source file 2023-01-11T23:16:24.2584301Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.set_materialize_grads:0 2023-01-11T23:16:24.2584860Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::Function:0, line 387 <- wrt source file 2023-01-11T23:16:24.2585381Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::Function:0 2023-01-11T23:16:24.2585911Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::vjp:0, line 248 <- wrt source file 2023-01-11T23:16:24.2587954Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::vjp:0 2023-01-11T23:16:24.2588623Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::jvp:0, line 346 <- wrt source file 2023-01-11T23:16:24.2593092Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::jvp:0 2023-01-11T23:16:24.2593778Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::jacobian:0, line 548 <- wrt source file 2023-01-11T23:16:24.2594832Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::jacobian:0 2023-01-11T23:16:24.2595458Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::hessian:0, line 760 <- wrt source file 2023-01-11T23:16:24.2598491Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::hessian:0 2023-01-11T23:16:24.2599167Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::vhp:0, line 864 <- wrt source file 2023-01-11T23:16:24.2602499Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::vhp:0 2023-01-11T23:16:24.2603167Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::hvp:0, line 955 <- wrt source file 2023-01-11T23:16:24.2606467Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::hvp:0 2023-01-11T23:16:24.2607153Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py::no_grad:0, line 120 <- wrt source file 2023-01-11T23:16:24.2607992Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py::no_grad:0 2023-01-11T23:16:24.2608614Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py::enable_grad:0, line 166 <- wrt source file 2023-01-11T23:16:24.2610571Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py::enable_grad:0 2023-01-11T23:16:24.2611268Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py::set_grad_enabled:0, line 216 <- wrt source file 2023-01-11T23:16:24.2612362Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py::set_grad_enabled:0 2023-01-11T23:16:24.2613047Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py::inference_mode:0, line 273 <- wrt source file 2023-01-11T23:16:24.2614952Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py::inference_mode:0 2023-01-11T23:16:24.2615700Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::saved_tensors_hooks:0, line 50 <- wrt source file 2023-01-11T23:16:24.2616458Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::saved_tensors_hooks:0 2023-01-11T23:16:24.2617230Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::save_on_cpu:0, line 109 <- wrt source file 2023-01-11T23:16:24.2617942Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::save_on_cpu:0 2023-01-11T23:16:24.2618524Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::disable_saved_tensors_hooks:0, line 164 <- wrt source file 2023-01-11T23:16:24.2619080Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::disable_saved_tensors_hooks:0 2023-01-11T23:16:24.2619643Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::register_multi_grad_hook:0, line 204 <- wrt source file 
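The run of records above covers the torch.autograd.functional and grad-mode doctests (vjp, jvp, jacobian, hessian, vhp, hvp, no_grad, enable_grad, set_grad_enabled, inference_mode). As a hedged sketch of the call pattern those doctests exercise, not the doctest bodies themselves (those live at the listed source lines), something like the following should run on any recent PyTorch build; the function f and the shapes are arbitrary illustrations, while the torch calls are the public API the records name:

    import torch
    from torch.autograd.functional import jacobian

    # Full Jacobian of a vector-valued function, as functional.py documents.
    def f(x):
        return 2 * x.exp()

    x = torch.randn(3, requires_grad=True)
    J = jacobian(f, x)        # shape (3, 3); the diagonal holds 2 * exp(x)

    # The grad_mode helpers listed above gate autograd recording:
    with torch.no_grad():
        y = f(x)              # no graph is built; y.requires_grad is False
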
2023-01-11T23:16:24.2633333Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::register_multi_grad_hook:0 2023-01-11T23:16:24.2634045Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::allow_mutation_on_saved_tensors:0, line 406 <- wrt source file 2023-01-11T23:16:24.2647389Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::allow_mutation_on_saved_tensors:0 2023-01-11T23:16:24.2648088Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py::profile:0, line 123 <- wrt source file 2023-01-11T23:16:24.2648866Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py::profile:0 2023-01-11T23:16:24.2649560Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py::record_function:0, line 457 <- wrt source file 2023-01-11T23:16:24.2650365Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py::record_function:0 2023-01-11T23:16:24.2651270Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py::emit_itt:0, line 582 <- wrt source file 2023-01-11T23:16:24.2651887Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py::emit_itt:0 2023-01-11T23:16:24.2652585Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py::emit_nvtx:0, line 651 <- wrt source file 2023-01-11T23:16:24.2653548Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py::emit_nvtx:0 2023-01-11T23:16:24.2654416Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:0, line 95 <- wrt source file 2023-01-11T23:16:24.2655272Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:0 2023-01-11T23:16:24.2655993Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:1, line 106 <- wrt source file 2023-01-11T23:16:24.2656721Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:1 2023-01-11T23:16:24.2657492Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:2, line 119 <- wrt source file 2023-01-11T23:16:24.2658067Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:2 2023-01-11T23:16:24.2658747Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_multi_output_jit_fn:0, line 151 <- wrt source file 2023-01-11T23:16:24.2659431Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_multi_output_jit_fn:0 2023-01-11T23:16:24.2659991Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/argparse_util.py::env:0, line 23 <- wrt source file 2023-01-11T23:16:24.2660530Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/argparse_util.py::env:0 2023-01-11T23:16:24.2661156Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/argparse_util.py::check_env:0, line 73 <- wrt source file 2023-01-11T23:16:24.2661808Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/argparse_util.py::check_env:0 2023-01-11T23:16:24.2662634Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::batch_isend_irecv:0, line 1377 <- wrt source file 2023-01-11T23:16:24.2663358Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::batch_isend_irecv:0 2023-01-11T23:16:24.2664211Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_reduce:0, line 1633 <- wrt source file 2023-01-11T23:16:24.2664794Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_reduce:0 2023-01-11T23:16:24.2665556Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_object:0, line 1997 <- wrt source file 2023-01-11T23:16:24.2666213Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_object:0 2023-01-11T23:16:24.2666792Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::gather_object:0, line 2085 <- wrt source file 2023-01-11T23:16:24.2667352Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::gather_object:0 2023-01-11T23:16:24.2667930Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::broadcast_object_list:0, line 2192 <- wrt source file 2023-01-11T23:16:24.2681383Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::broadcast_object_list:0 2023-01-11T23:16:24.2682428Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::scatter_object_list:0, line 2288 <- wrt source file 2023-01-11T23:16:24.2683059Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::scatter_object_list:0 2023-01-11T23:16:24.2683653Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather:0, line 2375 <- wrt source file 2023-01-11T23:16:24.2684229Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather:0 2023-01-11T23:16:24.2684812Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_into_tensor:0, line 2455 <- wrt source file 2023-01-11T23:16:24.2685392Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_into_tensor:0 2023-01-11T23:16:24.2685976Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_coalesced:0, line 2564 <- wrt source file 2023-01-11T23:16:24.2686549Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_coalesced:0 2023-01-11T23:16:24.2687116Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::scatter:0, line 2722 <- wrt source file 2023-01-11T23:16:24.2687664Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::scatter:0 2023-01-11T23:16:24.2688237Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::reduce_scatter_tensor:0, line 2929 <- wrt source file 2023-01-11T23:16:24.2688817Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::reduce_scatter_tensor:0 2023-01-11T23:16:24.2689501Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_to_all_single:0, line 3046 <- wrt source file 2023-01-11T23:16:24.2690164Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_to_all_single:0 2023-01-11T23:16:24.2690725Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_to_all:0, line 3164 <- wrt source file 2023-01-11T23:16:24.2691290Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_to_all:0 2023-01-11T23:16:24.2691866Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::monitored_barrier:0, line 3346 <- wrt source file 2023-01-11T23:16:24.2692494Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::monitored_barrier:0 2023-01-11T23:16:24.2693099Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::new_subgroups:0, line 3598 <- wrt source file 2023-01-11T23:16:24.2693690Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::new_subgroups:0 2023-01-11T23:16:24.2694285Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::new_subgroups_by_enumeration:0, line 3713 <- wrt source file 2023-01-11T23:16:24.2695025Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::new_subgroups_by_enumeration:0 2023-01-11T23:16:24.2695582Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/launch.py::__doc__:0, line 81 <- wrt source file 2023-01-11T23:16:24.2696311Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/launch.py::__doc__:0 2023-01-11T23:16:24.2696914Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/run.py::__doc__:0, line 297 <- wrt source file 2023-01-11T23:16:24.2697593Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/run.py::__doc__:0 2023-01-11T23:16:24.2698262Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/autograd/__init__.py::context:0, line 39 <- wrt source file 2023-01-11T23:16:24.2698805Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/autograd/__init__.py::context:0 2023-01-11T23:16:24.2699409Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel.no_sync:0, line 509 <- wrt source file 2023-01-11T23:16:24.2700036Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel.no_sync:0 2023-01-11T23:16:24.2700702Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel.register_comm_hook:0, line 826 <- wrt source file 2023-01-11T23:16:24.2722142Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel.register_comm_hook:0 2023-01-11T23:16:24.2722916Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel.register_comm_hook:1, line 838 <- wrt source file 2023-01-11T23:16:24.2723580Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel.register_comm_hook:1 2023-01-11T23:16:24.2724405Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel._register_builtin_comm_hook:0, line 874 <- wrt source file 2023-01-11T23:16:24.2725175Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel._register_builtin_comm_hook:0 2023-01-11T23:16:24.2725922Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel._register_fused_optim:0, line 930 <- wrt source file 2023-01-11T23:16:24.2726577Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel._register_fused_optim:0 2023-01-11T23:16:24.2727212Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/checkpoint_activation.py::checkpoint:0, line 198 <- wrt source file 2023-01-11T23:16:24.2727820Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/checkpoint_activation.py::checkpoint:0 2023-01-11T23:16:24.2728410Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/contract.py::contract:0, line 44 <- wrt source file 2023-01-11T23:16:24.2728962Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/contract.py::contract:0 2023-01-11T23:16:24.2729541Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/replicate.py::replicate:0, line 21 <- wrt source file 2023-01-11T23:16:24.2730236Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/replicate.py::replicate:0 2023-01-11T23:16:24.2730830Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/partial_tensor.py::_PartialTensor:0, line 61 <- wrt source file 2023-01-11T23:16:24.2731397Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/partial_tensor.py::_PartialTensor:0 2023-01-11T23:16:24.2732118Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_optim/__init__.py::named_params_with_sharded_tensor:0, line 32 <- wrt source file 2023-01-11T23:16:24.2732827Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_optim/__init__.py::named_params_with_sharded_tensor:0 2023-01-11T23:16:24.2733458Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/__init__.py::init_from_local_shards:0, line 366 <- wrt source file 2023-01-11T23:16:24.2734053Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/__init__.py::init_from_local_shards:0 2023-01-11T23:16:24.2734931Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/__init__.py::custom_sharded_op_impl:0, line 430 <- wrt source file 2023-01-11T23:16:24.2736017Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/__init__.py::custom_sharded_op_impl:0 2023-01-11T23:16:24.2736697Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/api.py::ShardedTensor._init_from_local_tensor:0, line 808 <- wrt source file 2023-01-11T23:16:24.2737361Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/api.py::ShardedTensor._init_from_local_tensor:0 2023-01-11T23:16:24.2738002Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/api.py::ShardedTensor.reshard:0, line 958 <- wrt source file 2023-01-11T23:16:24.2739598Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/api.py::ShardedTensor.reshard:0 2023-01-11T23:16:24.2740244Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/_ops/_common.py::_sharded_op_common:0, line 15 <- wrt source file 2023-01-11T23:16:24.2740863Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/_ops/_common.py::_sharded_op_common:0 2023-01-11T23:16:24.2741603Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharding_plan/api.py::ShardingPlan:0, line 36 <- wrt source file 2023-01-11T23:16:24.2742283Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharding_plan/api.py::ShardingPlan:0 2023-01-11T23:16:24.2742881Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_tools/memory_tracker.py::MemoryTracker:0, line 57 <- wrt source file 2023-01-11T23:16:24.2743605Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_tools/memory_tracker.py::MemoryTracker:0 2023-01-11T23:16:24.2744166Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/join.py::Join:0, line 148 <- wrt source file 2023-01-11T23:16:24.2747650Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/join.py::Join:0 2023-01-11T23:16:24.2748515Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/__init__.py::register_ddp_comm_hook:0, line 99 <- wrt source file 2023-01-11T23:16:24.2749147Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/__init__.py::register_ddp_comm_hook:0 2023-01-11T23:16:24.2749909Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/debugging_hooks.py::noop_hook:0, line 23 <- wrt source file 2023-01-11T23:16:24.2750683Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/debugging_hooks.py::noop_hook:0 2023-01-11T23:16:24.2751387Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::allreduce_hook:0, line 37 <- wrt source file 2023-01-11T23:16:24.2752473Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::allreduce_hook:0 2023-01-11T23:16:24.2753243Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::fp16_compress_hook:0, line 54 <- wrt source file 2023-01-11T23:16:24.2754085Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::fp16_compress_hook:0 2023-01-11T23:16:24.2754770Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::bf16_compress_hook:0, line 90 <- wrt source file 2023-01-11T23:16:24.2755508Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::bf16_compress_hook:0 2023-01-11T23:16:24.2756162Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::fp16_compress_wrapper:0, line 123 <- wrt source file 2023-01-11T23:16:24.2756958Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::fp16_compress_wrapper:0 2023-01-11T23:16:24.2757695Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::bf16_compress_wrapper:0, line 161 <- wrt source file 2023-01-11T23:16:24.2758408Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::bf16_compress_wrapper:0 2023-01-11T23:16:24.2759282Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/post_localSGD_hook.py::post_localSGD_hook:0, line 85 <- wrt source file 
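The ddp_comm_hooks records above (allreduce_hook, the fp16/bf16 compress hooks, and their wrappers) all follow one registration pattern. A hedged single-process sketch of that pattern, assuming a gloo-enabled build; the MASTER_ADDR/MASTER_PORT values are arbitrary placeholders, not taken from the log:

    import os
    import torch
    import torch.distributed as dist
    from torch.distributed.algorithms.ddp_comm_hooks import default_hooks
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Single-process group so the sketch runs without a launcher.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    model = DDP(torch.nn.Linear(4, 4))
    model.register_comm_hook(state=None, hook=default_hooks.allreduce_hook)

    model(torch.randn(2, 4)).sum().backward()   # gradients pass through the hook
    dist.destroy_process_group()
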
2023-01-11T23:16:24.2759935Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/post_localSGD_hook.py::post_localSGD_hook:0 2023-01-11T23:16:24.2760873Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py::powerSGD_hook:0, line 382 <- wrt source file 2023-01-11T23:16:24.2761563Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py::powerSGD_hook:0 2023-01-11T23:16:24.2762202Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py::batched_powerSGD_hook:0, line 691 <- wrt source file 2023-01-11T23:16:24.2763044Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py::batched_powerSGD_hook:0 2023-01-11T23:16:24.2763937Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py::quantization_pertensor_hook:0, line 62 <- wrt source file 2023-01-11T23:16:24.2764752Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py::quantization_pertensor_hook:0 2023-01-11T23:16:24.2765585Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py::quantization_perchannel_hook:0, line 142 <- wrt source file 2023-01-11T23:16:24.2766256Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py::quantization_perchannel_hook:0 2023-01-11T23:16:24.2767040Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/model_averaging/averagers.py::PeriodicModelAverager:0, line 51 <- wrt source file 2023-01-11T23:16:24.2767790Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/model_averaging/averagers.py::PeriodicModelAverager:0 2023-01-11T23:16:24.2768777Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py::HierarchicalModelAverager:0, line 50 <- wrt source file 2023-01-11T23:16:24.2769654Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py::HierarchicalModelAverager:0 2023-01-11T23:16:24.2770492Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/optimizer.py::load_sharded_optimizer_state_dict:0, line 205 <- wrt source file 2023-01-11T23:16:24.2771281Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/optimizer.py::load_sharded_optimizer_state_dict:0 2023-01-11T23:16:24.2771957Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/planner.py::SavePlanner:0, line 126 <- wrt source file 2023-01-11T23:16:24.2772662Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/planner.py::SavePlanner:0 2023-01-11T23:16:24.2773407Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/planner.py::LoadPlanner:0, line 281 <- wrt source file 2023-01-11T23:16:24.2774076Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/planner.py::LoadPlanner:0 2023-01-11T23:16:24.2775046Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_loader.py::load_state_dict:0, line 63 <- wrt source file 2023-01-11T23:16:24.2775799Z 
* SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_loader.py::load_state_dict:0 2023-01-11T23:16:24.2776471Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_saver.py::save_state_dict:0, line 60 <- wrt source file 2023-01-11T23:16:24.2777193Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_saver.py::save_state_dict:0 2023-01-11T23:16:24.2777914Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/__init__.py::start_processes:0, line 132 <- wrt source file 2023-01-11T23:16:24.2778661Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/__init__.py::start_processes:0 2023-01-11T23:16:24.2779264Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py::Std.from_str:0, line 110 <- wrt source file 2023-01-11T23:16:24.2779853Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py::Std.from_str:0 2023-01-11T23:16:24.2780448Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py::to_map:0, line 150 <- wrt source file 2023-01-11T23:16:24.2781031Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py::to_map:0 2023-01-11T23:16:24.2781757Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py::ChildFailedError:0, line 203 <- wrt source file 2023-01-11T23:16:24.2782399Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py::ChildFailedError:0 2023-01-11T23:16:24.2783161Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/rendezvous/api.py::RendezvousHandler.shutdown:0, line 112 <- wrt source file 2023-01-11T23:16:24.2783846Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/rendezvous/api.py::RendezvousHandler.shutdown:0 2023-01-11T23:16:24.2784547Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/api.py::StateDictType:0, line 221 <- wrt source file 2023-01-11T23:16:24.2785196Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/api.py::StateDictType:0 2023-01-11T23:16:24.2785774Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/api.py::FullStateDictConfig:0, line 258 <- wrt source file 2023-01-11T23:16:24.2786506Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/api.py::FullStateDictConfig:0 2023-01-11T23:16:24.2787307Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel:0, line 125 <- wrt source file 2023-01-11T23:16:24.2788085Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel:0 2023-01-11T23:16:24.2788965Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.set_state_dict_type:0, line 551 <- wrt source file 2023-01-11T23:16:24.2789917Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.set_state_dict_type:0 2023-01-11T23:16:24.2790764Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.state_dict_type:0, line 619 <- wrt source file 2023-01-11T23:16:24.2791549Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.state_dict_type:0 2023-01-11T23:16:24.2792540Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.shard_full_optim_state_dict:0, line 1346 <- wrt source file 2023-01-11T23:16:24.2793493Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.shard_full_optim_state_dict:0 2023-01-11T23:16:24.2794238Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.scatter_full_optim_state_dict:0, line 1455 <- wrt source file 2023-01-11T23:16:24.2795067Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.scatter_full_optim_state_dict:0 2023-01-11T23:16:24.2795809Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.rekey_optim_state_dict:0, line 1586 <- wrt source file 2023-01-11T23:16:24.2796504Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.rekey_optim_state_dict:0 2023-01-11T23:16:24.2797168Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/sharded_grad_scaler.py::ShardedGradScaler:0, line 45 <- wrt source file 2023-01-11T23:16:24.2797764Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/sharded_grad_scaler.py::ShardedGradScaler:0 2023-01-11T23:16:24.2798349Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/nn/functional.py::_all_gather_base:0, line 130 <- wrt source file 2023-01-11T23:16:24.2798895Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/nn/functional.py::_all_gather_base:0 2023-01-11T23:16:24.2799480Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::_RemoteModule.__init__:0, line 201 <- wrt source file 2023-01-11T23:16:24.2800192Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::_RemoteModule.__init__:0 2023-01-11T23:16:24.2800870Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::_RemoteModule.init_from_module_rref:0, line 524 <- wrt source file 2023-01-11T23:16:24.2801540Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::_RemoteModule.init_from_module_rref:0 2023-01-11T23:16:24.2802142Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::RemoteModule:0, line 646 <- wrt source file 2023-01-11T23:16:24.2802731Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::RemoteModule:0 2023-01-11T23:16:24.2803368Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/apply_optimizer_in_backward.py::_apply_optimizer_in_backward:0, line 27 <- wrt source file 2023-01-11T23:16:24.2804010Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/apply_optimizer_in_backward.py::_apply_optimizer_in_backward:0 
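The records that follow move on to torch.distributed.optim and the torch.distributed.rpc doctests (_wait_all, shutdown, remote, rpc_sync, rpc_async). As a hedged sketch of the rpc_async pattern those doctests exercise, runnable in a single process that acts as its own peer; the worker name and port are chosen arbitrarily:

    import os
    import torch
    import torch.distributed.rpc as rpc

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    rpc.init_rpc("worker0", rank=0, world_size=1)

    # Dispatch torch.add "remotely" (here, back to the same worker).
    fut = rpc.rpc_async("worker0", torch.add, args=(torch.ones(2), 1))
    print(fut.wait())          # tensor([2., 2.])
    rpc.shutdown()
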
2023-01-11T23:16:24.2804724Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/named_optimizer.py::_NamedOptimizer:0, line 38 <- wrt source file 2023-01-11T23:16:24.2805396Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/named_optimizer.py::_NamedOptimizer:0 2023-01-11T23:16:24.2806097Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/optimizer.py::DistributedOptimizer:0, line 160 <- wrt source file 2023-01-11T23:16:24.2806686Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/optimizer.py::DistributedOptimizer:0 2023-01-11T23:16:24.2807304Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/post_localSGD_optimizer.py::PostLocalSGDOptimizer:0, line 18 <- wrt source file 2023-01-11T23:16:24.2807998Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/post_localSGD_optimizer.py::PostLocalSGDOptimizer:0 2023-01-11T23:16:24.2808607Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/utils.py::register_functional_optim:0, line 35 <- wrt source file 2023-01-11T23:16:24.2809176Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/utils.py::register_functional_optim:0 2023-01-11T23:16:24.2809874Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/zero_redundancy_optimizer.py::ZeroRedundancyOptimizer:0, line 325 <- wrt source file 2023-01-11T23:16:24.2810570Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/zero_redundancy_optimizer.py::ZeroRedundancyOptimizer:0 2023-01-11T23:16:24.2811243Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/pipeline/sync/pipe.py::WithDevice:0, line 152 <- wrt source file 2023-01-11T23:16:24.2811805Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/pipeline/sync/pipe.py::WithDevice:0 2023-01-11T23:16:24.2812372Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/pipeline/sync/pipe.py::Pipe:0, line 274 <- wrt source file 2023-01-11T23:16:24.2812915Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/pipeline/sync/pipe.py::Pipe:0 2023-01-11T23:16:24.2813462Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::_wait_all:0, line 160 <- wrt source file 2023-01-11T23:16:24.2813979Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::_wait_all:0 2023-01-11T23:16:24.2814649Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::shutdown:0, line 333 <- wrt source file 2023-01-11T23:16:24.2815240Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::shutdown:0 2023-01-11T23:16:24.2815778Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::remote:0, line 582 <- wrt source file 2023-01-11T23:16:24.2816295Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::remote:0 2023-01-11T23:16:24.2816824Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::rpc_sync:0, line 766 <- wrt source file 2023-01-11T23:16:24.2817341Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::rpc_sync:0 2023-01-11T23:16:24.2817877Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::rpc_async:0, line 858 <- wrt source file 2023-01-11T23:16:24.2818400Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::rpc_async:0 2023-01-11T23:16:24.2818956Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/functions.py::async_execution:0, line 33 <- wrt source file 2023-01-11T23:16:24.2830020Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/functions.py::async_execution:0 2023-01-11T23:16:24.2830724Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/options.py::TensorPipeRpcBackendOptions.set_device_map:0, line 117 <- wrt source file 2023-01-11T23:16:24.2831378Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/options.py::TensorPipeRpcBackendOptions.set_device_map:0 2023-01-11T23:16:24.2832039Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/server_process_global_profiler.py::_server_process_global_profile:0, line 58 <- wrt source file 2023-01-11T23:16:24.2835175Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/server_process_global_profiler.py::_server_process_global_profile:0 2023-01-11T23:16:24.2835912Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/tensor/parallel/_utils.py::_prepare_input_validate:0, line 33 <- wrt source file 2023-01-11T23:16:24.2836512Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/tensor/parallel/_utils.py::_prepare_input_validate:0 2023-01-11T23:16:24.2837126Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/tensor/parallel/_utils.py::_prepare_output_validate:0, line 78 <- wrt source file 2023-01-11T23:16:24.2837721Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/tensor/parallel/_utils.py::_prepare_output_validate:0 2023-01-11T23:16:24.2838320Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/tensor/parallel/api.py::parallelize_module:0, line 63 <- wrt source file 2023-01-11T23:16:24.2838901Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/tensor/parallel/api.py::parallelize_module:0 2023-01-11T23:16:24.2839510Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/bernoulli.py::Bernoulli:0, line 21 <- wrt source file 2023-01-11T23:16:24.2841221Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/bernoulli.py::Bernoulli:0 2023-01-11T23:16:24.2841753Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/beta.py::Beta:0, line 16 <- wrt source file 2023-01-11T23:16:24.2846957Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/beta.py::Beta:0 2023-01-11T23:16:24.2847499Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/binomial.py::Binomial:0, line 20 <- wrt source file 2023-01-11T23:16:24.2855046Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/binomial.py::Binomial:0 2023-01-11T23:16:24.2855616Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/categorical.py::Categorical:0, line 37 <- wrt source file 2023-01-11T23:16:24.2861005Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/categorical.py::Categorical:0 2023-01-11T23:16:24.2861562Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/cauchy.py::Cauchy:0, line 19 <- wrt source file 2023-01-11T23:16:24.2865505Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/cauchy.py::Cauchy:0 2023-01-11T23:16:24.2866037Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/distributions/chi2.py::Chi2:0, line 12 <- wrt source file 2023-01-11T23:16:24.2870193Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/chi2.py::Chi2:0 2023-01-11T23:16:24.2870847Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/constraints.py::_DependentProperty:0, line 152 <- wrt source file 2023-01-11T23:16:24.2873151Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/constraints.py::_DependentProperty:0 2023-01-11T23:16:24.2873771Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/continuous_bernoulli.py::ContinuousBernoulli:0, line 24 <- wrt source file 2023-01-11T23:16:24.2878581Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/continuous_bernoulli.py::ContinuousBernoulli:0 2023-01-11T23:16:24.2879162Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/dirichlet.py::Dirichlet:0, line 35 <- wrt source file 2023-01-11T23:16:24.2882880Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/dirichlet.py::Dirichlet:0 2023-01-11T23:16:24.2883456Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/exponential.py::Exponential:0, line 15 <- wrt source file 2023-01-11T23:16:24.2887158Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/exponential.py::Exponential:0 2023-01-11T23:16:24.2887761Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/fishersnedecor.py::FisherSnedecor:0, line 16 <- wrt source file 2023-01-11T23:16:24.2893020Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/fishersnedecor.py::FisherSnedecor:0 2023-01-11T23:16:24.2893616Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/gamma.py::Gamma:0, line 19 <- wrt source file 2023-01-11T23:16:24.2897379Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/gamma.py::Gamma:0 2023-01-11T23:16:24.2897938Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/geometric.py::Geometric:0, line 21 <- wrt source file 2023-01-11T23:16:24.2902070Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/geometric.py::Geometric:0 2023-01-11T23:16:24.2902621Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/gumbel.py::Gumbel:0, line 17 <- wrt source file 2023-01-11T23:16:24.2908121Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/gumbel.py::Gumbel:0 2023-01-11T23:16:24.2908682Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/half_cauchy.py::HalfCauchy:0, line 20 <- wrt source file 2023-01-11T23:16:24.2913900Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/half_cauchy.py::HalfCauchy:0 2023-01-11T23:16:24.2914456Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/half_normal.py::HalfNormal:0, line 20 <- wrt source file 2023-01-11T23:16:24.2917983Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/half_normal.py::HalfNormal:0 2023-01-11T23:16:24.2918551Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/independent.py::Independent:0, line 18 <- wrt source file 2023-01-11T23:16:24.2929707Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/independent.py::Independent:0 2023-01-11T23:16:24.2930374Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/kumaraswamy.py::Kumaraswamy:0, line 25 <- wrt source file 
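The surrounding records sweep the torch.distributions doctests, and nearly all of them share the same construct/sample/log_prob shape. A minimal sketch of that shape; the parameter values here are arbitrary and not taken from the doctests:

    import torch
    from torch.distributions import Bernoulli, Normal

    b = Bernoulli(torch.tensor([0.3]))
    s = b.sample()                  # 0. or 1., with P(1) = 0.3
    lp = b.log_prob(s)              # log-probability of the drawn value

    n = Normal(torch.tensor([0.0]), torch.tensor([1.0]))
    draws = n.sample((5,))          # five draws from N(0, 1)
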
2023-01-11T23:16:24.2935864Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/kumaraswamy.py::Kumaraswamy:0
2023-01-11T23:16:24.2937381Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/laplace.py::Laplace:0, line 14 <- wrt source file
2023-01-11T23:16:24.2941222Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/laplace.py::Laplace:0
2023-01-11T23:16:24.2942098Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/lkj_cholesky.py::LKJCholesky:0, line 38 <- wrt source file
2023-01-11T23:16:24.2950178Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/lkj_cholesky.py::LKJCholesky:0
2023-01-11T23:16:24.2951216Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/log_normal.py::LogNormal:0, line 17 <- wrt source file
2023-01-11T23:16:24.2956328Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/log_normal.py::LogNormal:0
2023-01-11T23:16:24.2957279Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/logistic_normal.py::LogisticNormal:0, line 22 <- wrt source file
2023-01-11T23:16:24.2964280Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/logistic_normal.py::LogisticNormal:0
2023-01-11T23:16:24.2965313Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/lowrank_multivariate_normal.py::LowRankMultivariateNormal:0, line 56 <- wrt source file
2023-01-11T23:16:24.2966558Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/lowrank_multivariate_normal.py::LowRankMultivariateNormal:0
2023-01-11T23:16:24.2967559Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/mixture_same_family.py::MixtureSameFamily:0, line 19 <- wrt source file
2023-01-11T23:16:24.2970358Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/mixture_same_family.py::MixtureSameFamily:0
2023-01-11T23:16:24.2971279Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/multinomial.py::Multinomial:0, line 34 <- wrt source file
2023-01-11T23:16:24.2972176Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/multinomial.py::Multinomial:0
2023-01-11T23:16:24.2973119Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/multivariate_normal.py::MultivariateNormal:0, line 94 <- wrt source file
2023-01-11T23:16:24.2974084Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/multivariate_normal.py::MultivariateNormal:0
2023-01-11T23:16:24.2975438Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/normal.py::Normal:0, line 18 <- wrt source file
2023-01-11T23:16:24.2979631Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/normal.py::Normal:0
2023-01-11T23:16:24.2980542Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/one_hot_categorical.py::OneHotCategorical:0, line 27 <- wrt source file
2023-01-11T23:16:24.2987031Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/one_hot_categorical.py::OneHotCategorical:0
2023-01-11T23:16:24.2988112Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/pareto.py::Pareto:0, line 14 <- wrt source file
2023-01-11T23:16:24.2993883Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/pareto.py::Pareto:0
2023-01-11T23:16:24.2994763Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/poisson.py::Poisson:0, line 20 <- wrt source file
2023-01-11T23:16:24.2995613Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/poisson.py::Poisson:0
2023-01-11T23:16:24.2996554Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/relaxed_bernoulli.py::RelaxedBernoulli:0, line 102 <- wrt source file
2023-01-11T23:16:24.3001127Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/relaxed_bernoulli.py::RelaxedBernoulli:0
2023-01-11T23:16:24.3002119Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/relaxed_categorical.py::RelaxedOneHotCategorical:0, line 96 <- wrt source file
2023-01-11T23:16:24.3009514Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/relaxed_categorical.py::RelaxedOneHotCategorical:0
2023-01-11T23:16:24.3010460Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/studentT.py::StudentT:0, line 17 <- wrt source file
2023-01-11T23:16:24.3016776Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/studentT.py::StudentT:0
2023-01-11T23:16:24.3017685Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/transforms.py::CatTransform:0, line 1002 <- wrt source file
2023-01-11T23:16:24.3018573Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/transforms.py::CatTransform:0
2023-01-11T23:16:24.3019456Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/transforms.py::StackTransform:0, line 1105 <- wrt source file
2023-01-11T23:16:24.3020372Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/transforms.py::StackTransform:0
2023-01-11T23:16:24.3021483Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/transforms.py::CumulativeDistributionTransform:0, line 1178 <- wrt source file
2023-01-11T23:16:24.3022495Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/transforms.py::CumulativeDistributionTransform:0
2023-01-11T23:16:24.3023444Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/uniform.py::Uniform:0, line 17 <- wrt source file
2023-01-11T23:16:24.3025041Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/uniform.py::Uniform:0
2023-01-11T23:16:24.3025930Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/von_mises.py::VonMises:0, line 79 <- wrt source file
2023-01-11T23:16:24.3035101Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/von_mises.py::VonMises:0
2023-01-11T23:16:24.3036007Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/weibull.py::Weibull:0, line 16 <- wrt source file
2023-01-11T23:16:24.3040440Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/weibull.py::Weibull:0
2023-01-11T23:16:24.3041328Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/wishart.py::Wishart:0, line 36 <- wrt source file
2023-01-11T23:16:24.3042163Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/wishart.py::Wishart:0
2023-01-11T23:16:24.3042983Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py::_snake_case:0, line 79 <- wrt source file
2023-01-11T23:16:24.3043934Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py::_snake_case:0
2023-01-11T23:16:24.3044818Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py::Graph.eliminate_dead_code:0, line 1363 <- wrt source file
2023-01-11T23:16:24.3045683Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py::Graph.eliminate_dead_code:0
2023-01-11T23:16:24.3046530Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py::Graph.on_generate_code:0, line 1431 <- wrt source file
2023-01-11T23:16:24.3047349Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py::Graph.on_generate_code:0
2023-01-11T23:16:24.3048197Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py::Interpreter:0, line 37 <- wrt source file
2023-01-11T23:16:24.3049036Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py::Interpreter:0
2023-01-11T23:16:24.3049897Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py::Transformer:0, line 380 <- wrt source file
2023-01-11T23:16:24.3050710Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py::Transformer:0
2023-01-11T23:16:24.3051611Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/subgraph_rewriter.py::replace_pattern:0, line 108 <- wrt source file
2023-01-11T23:16:24.3052441Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/subgraph_rewriter.py::replace_pattern:0
2023-01-11T23:16:24.3053278Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/tensor_type.py::TensorType:0, line 11 <- wrt source file
2023-01-11T23:16:24.3054151Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/tensor_type.py::TensorType:0
2023-01-11T23:16:24.3055258Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/tensor_type.py::is_consistent:0, line 62 <- wrt source file
2023-01-11T23:16:24.3056061Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/tensor_type.py::is_consistent:0
2023-01-11T23:16:24.3057037Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/tensor_type.py::is_more_precise:0, line 88 <- wrt source file
2023-01-11T23:16:24.3057872Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/tensor_type.py::is_more_precise:0
2023-01-11T23:16:24.3058793Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/rewriter.py::AST_Rewriter.visit_AnnAssign:0, line 87 <- wrt source file
2023-01-11T23:16:24.3059720Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/rewriter.py::AST_Rewriter.visit_AnnAssign:0
2023-01-11T23:16:24.3060635Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/core.py::reify:0, line 42 <- wrt source file
2023-01-11T23:16:24.3061526Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/core.py::reify:0
2023-01-11T23:16:24.3062415Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/match.py::VarDispatcher:0, line 42 <- wrt source file
2023-01-11T23:16:24.3063355Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/match.py::VarDispatcher:0
2023-01-11T23:16:24.3064279Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::unifiable:0, line 10 <- wrt source file
2023-01-11T23:16:24.3065146Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::unifiable:0
2023-01-11T23:16:24.3066025Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::reify_object:0, line 36 <- wrt source file
2023-01-11T23:16:24.3067024Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::reify_object:0
2023-01-11T23:16:24.3067893Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::unify_object:0, line 91 <- wrt source file
2023-01-11T23:16:24.3068732Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::unify_object:0
2023-01-11T23:16:24.3069598Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::merge:0, line 22 <- wrt source file
2023-01-11T23:16:24.3092780Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::merge:0
2023-01-11T23:16:24.3093794Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::merge_with:0, line 49 <- wrt source file
2023-01-11T23:16:24.3097851Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::merge_with:0
2023-01-11T23:16:24.3098816Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::valmap:0, line 75 <- wrt source file
2023-01-11T23:16:24.3101234Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::valmap:0
2023-01-11T23:16:24.3102232Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::keymap:0, line 91 <- wrt source file
2023-01-11T23:16:24.3105717Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::keymap:0
2023-01-11T23:16:24.3106690Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::itemmap:0, line 107 <- wrt source file
2023-01-11T23:16:24.3109444Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::itemmap:0
2023-01-11T23:16:24.3110652Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::valfilter:0, line 123 <- wrt source file
2023-01-11T23:16:24.3115032Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::valfilter:0
2023-01-11T23:16:24.3116041Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::keyfilter:0, line 143 <- wrt source file
2023-01-11T23:16:24.3119898Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::keyfilter:0
2023-01-11T23:16:24.3120896Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::itemfilter:0, line 163 <- wrt source file
2023-01-11T23:16:24.3125792Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::itemfilter:0
2023-01-11T23:16:24.3126801Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::assoc:0, line 189 <- wrt source file
2023-01-11T23:16:24.3129422Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::assoc:0
2023-01-11T23:16:24.3130402Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::dissoc:0, line 206 <- wrt source file
2023-01-11T23:16:24.3135386Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::dissoc:0
2023-01-11T23:16:24.3136358Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::assoc_in:0, line 232 <- wrt source file
2023-01-11T23:16:24.3140004Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::assoc_in:0
2023-01-11T23:16:24.3141014Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::update_in:0, line 259 <- wrt source file
2023-01-11T23:16:24.3148206Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::update_in:0
2023-01-11T23:16:24.3149221Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::get_in:0, line 311 <- wrt source file
2023-01-11T23:16:24.3158244Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::get_in:0
2023-01-11T23:16:24.3159244Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::groupby:0, line 357 <- wrt source file
2023-01-11T23:16:24.3162844Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::groupby:0
2023-01-11T23:16:24.3163543Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::first:0, line 393 <- wrt source file
2023-01-11T23:16:24.3165603Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::first:0
2023-01-11T23:16:24.3166198Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::transitive_get:0, line 12 <- wrt source file
2023-01-11T23:16:24.3169308Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::transitive_get:0
2023-01-11T23:16:24.3169908Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::_toposort:0, line 39 <- wrt source file
2023-01-11T23:16:24.3170607Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::_toposort:0
2023-01-11T23:16:24.3171205Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::reverse_dict:0, line 67 <- wrt source file
2023-01-11T23:16:24.3173589Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::reverse_dict:0
2023-01-11T23:16:24.3174230Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::freeze:0, line 92 <- wrt source file
2023-01-11T23:16:24.3177353Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::freeze:0
2023-01-11T23:16:24.3177948Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/variable.py::variables:0, line 62 <- wrt source file
2023-01-11T23:16:24.3179971Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/variable.py::variables:0
2023-01-11T23:16:24.3180590Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/core.py::dispatch:0, line 18 <- wrt source file
2023-01-11T23:16:24.3183906Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/core.py::dispatch:0
2023-01-11T23:16:24.3184567Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher:0, line 100 <- wrt source file
2023-01-11T23:16:24.3185212Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher:0
2023-01-11T23:16:24.3185972Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.register:0, line 124 <- wrt source file
2023-01-11T23:16:24.3186659Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.register:0
2023-01-11T23:16:24.3187345Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.add:0, line 176 <- wrt source file
2023-01-11T23:16:24.3188006Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.add:0
2023-01-11T23:16:24.3188686Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.dispatch:0, line 288 <- wrt source file
2023-01-11T23:16:24.3189363Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.dispatch:0
2023-01-11T23:16:24.3190035Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::str_signature:0, line 418 <- wrt source file
2023-01-11T23:16:24.3193546Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::str_signature:0
2023-01-11T23:16:24.3194196Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::expand_tuples:0, line 15 <- wrt source file
2023-01-11T23:16:24.3195845Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::expand_tuples:0
2023-01-11T23:16:24.3196487Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::_toposort:0, line 38 <- wrt source file
2023-01-11T23:16:24.3199541Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::_toposort:0
2023-01-11T23:16:24.3200187Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::reverse_dict:0, line 66 <- wrt source file
2023-01-11T23:16:24.3202684Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::reverse_dict:0
2023-01-11T23:16:24.3203320Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::groupby:0, line 85 <- wrt source file
2023-01-11T23:16:24.3206459Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::groupby:0
2023-01-11T23:16:24.3207085Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::typename:0, line 115 <- wrt source file
2023-01-11T23:16:24.3209596Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::typename:0
2023-01-11T23:16:24.3210237Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/variadic.py::isvariadic:0, line 47 <- wrt source file
2023-01-11T23:16:24.3210886Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/variadic.py::isvariadic:0
2023-01-11T23:16:24.3211533Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/variadic.py::Variadic:0, line 80 <- wrt source file
2023-01-11T23:16:24.3213470Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/variadic.py::Variadic:0
2023-01-11T23:16:24.3214663Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/passes/shape_prop.py::ShapeProp:0, line 76 <- wrt source file
2023-01-11T23:16:24.3215216Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/passes/shape_prop.py::ShapeProp:0
2023-01-11T23:16:24.3217290Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/passes/split_module.py::split_module:0, line 68 <- wrt source file
2023-01-11T23:16:24.3217838Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/passes/split_module.py::split_module:0
2023-01-11T23:16:24.3218413Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/_check.py::AttributeTypeIsSupportedChecker:0, line 35 <- wrt source file
2023-01-11T23:16:24.3219010Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/_check.py::AttributeTypeIsSupportedChecker:0
2023-01-11T23:16:24.3219783Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/_serialization.py::save:0, line 53 <- wrt source file
2023-01-11T23:16:24.3220384Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/_serialization.py::save:0
2023-01-11T23:16:24.3221068Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/_serialization.py::load:0, line 111 <- wrt source file
2023-01-11T23:16:24.3221663Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/_serialization.py::load:0
2023-01-11T23:16:24.3222558Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/_serialization.py::save_jit_module_to_flatbuffer:0, line 235 <- wrt source file
2023-01-11T23:16:24.3223261Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/_serialization.py::save_jit_module_to_flatbuffer:0
2023-01-11T23:16:24.3224054Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_load_for_lite_interpreter:0, line 23 <- wrt source file
2023-01-11T23:16:24.3224772Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_load_for_lite_interpreter:0
2023-01-11T23:16:24.3225515Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_get_model_bytecode_version:0, line 89 <- wrt source file
2023-01-11T23:16:24.3226217Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_get_model_bytecode_version:0
2023-01-11T23:16:24.3226846Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_get_mobile_model_contained_types:0, line 119 <- wrt source file
2023-01-11T23:16:24.3227616Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_get_mobile_model_contained_types:0
2023-01-11T23:16:24.3228274Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_get_model_ops_and_info:0, line 199 <- wrt source file
2023-01-11T23:16:24.3228830Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_get_model_ops_and_info:0
2023-01-11T23:16:24.3229406Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/masked/maskedtensor/core.py::is_masked_tensor:0, line 22 <- wrt source file
2023-01-11T23:16:24.3229962Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/masked/maskedtensor/core.py::is_masked_tensor:0
2023-01-11T23:16:24.3230607Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::fractional_max_pool2d_with_indices:0, line 452 <- wrt source file
2023-01-11T23:16:24.3275972Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::fractional_max_pool2d_with_indices:0
2023-01-11T23:16:24.3276670Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::fractional_max_pool3d_with_indices:0, line 553 <- wrt source file
2023-01-11T23:16:24.4076984Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::fractional_max_pool3d_with_indices:0
2023-01-11T23:16:24.4089803Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::gumbel_softmax:0, line 1878 <- wrt source file
2023-01-11T23:16:24.4099145Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::gumbel_softmax:0
2023-01-11T23:16:24.4099765Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::embedding:0, line 2149 <- wrt source file
2023-01-11T23:16:24.4108542Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::embedding:0
2023-01-11T23:16:24.4109171Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::embedding_bag:0, line 2286 <- wrt source file
2023-01-11T23:16:24.4119947Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::embedding_bag:0
2023-01-11T23:16:24.4120571Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::ctc_loss:0, line 2615 <- wrt source file
2023-01-11T23:16:24.4145943Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::ctc_loss:0
2023-01-11T23:16:24.4146544Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::nll_loss:0, line 2681 <- wrt source file
2023-01-11T23:16:24.4153454Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::nll_loss:0
2023-01-11T23:16:24.4154056Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::cross_entropy:0, line 3000 <- wrt source file
2023-01-11T23:16:24.4161087Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::cross_entropy:0
2023-01-11T23:16:24.4161727Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::binary_cross_entropy:0, line 3066 <- wrt source file
2023-01-11T23:16:24.4168267Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::binary_cross_entropy:0
2023-01-11T23:16:24.4169600Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::binary_cross_entropy_with_logits:0, line 3138 <- wrt source file
2023-01-11T23:16:24.4175384Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::binary_cross_entropy_with_logits:0
2023-01-11T23:16:24.4176251Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv1d_input:0, line 23 <- wrt source file
2023-01-11T23:16:24.4183763Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv1d_input:0
2023-01-11T23:16:24.4184608Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv1d_weight:0, line 53 <- wrt source file
2023-01-11T23:16:24.4189241Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv1d_weight:0
2023-01-11T23:16:24.4190073Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv2d_input:0, line 86 <- wrt source file
2023-01-11T23:16:24.4197590Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv2d_input:0
2023-01-11T23:16:24.4198437Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv2d_weight:0, line 116 <- wrt source file
2023-01-11T23:16:24.4203200Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv2d_weight:0
2023-01-11T23:16:24.4204033Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv3d_input:0, line 149 <- wrt source file
2023-01-11T23:16:24.4229988Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv3d_input:0
2023-01-11T23:16:24.4231096Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv3d_weight:0, line 179 <- wrt source file
2023-01-11T23:16:24.4249884Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv3d_weight:0
2023-01-11T23:16:24.4250708Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::calculate_gain:0, line 96 <- wrt source file
2023-01-11T23:16:24.4253172Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::calculate_gain:0
2023-01-11T23:16:24.4253992Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::uniform_:0, line 132 <- wrt source file
2023-01-11T23:16:24.4258100Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::uniform_:0
2023-01-11T23:16:24.4258881Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::normal_:0, line 150 <- wrt source file
2023-01-11T23:16:24.4261892Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::normal_:0
2023-01-11T23:16:24.4262522Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::trunc_normal_:0, line 173 <- wrt source file
2023-01-11T23:16:24.4265971Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::trunc_normal_:0
2023-01-11T23:16:24.4266566Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::constant_:0, line 187 <- wrt source file
2023-01-11T23:16:24.4269815Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::constant_:0
2023-01-11T23:16:24.4270391Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::ones_:0, line 202 <- wrt source file
2023-01-11T23:16:24.4274136Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::ones_:0
2023-01-11T23:16:24.4274724Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::zeros_:0, line 215 <- wrt source file
2023-01-11T23:16:24.4276397Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::zeros_:0
2023-01-11T23:16:24.4277112Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::eye_:0, line 230 <- wrt source file
2023-01-11T23:16:24.4279955Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::eye_:0
2023-01-11T23:16:24.4280527Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::dirac_:0, line 251 <- wrt source file
2023-01-11T23:16:24.4285336Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::dirac_:0
2023-01-11T23:16:24.4285917Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::xavier_uniform_:0, line 320 <- wrt source file
2023-01-11T23:16:24.4289288Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::xavier_uniform_:0
2023-01-11T23:16:24.4289892Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::xavier_normal_:0, line 347 <- wrt source file
2023-01-11T23:16:24.4292834Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::xavier_normal_:0
2023-01-11T23:16:24.4293504Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::kaiming_uniform_:0, line 392 <- wrt source file
2023-01-11T23:16:24.4296720Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::kaiming_uniform_:0
2023-01-11T23:16:24.4297351Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::kaiming_normal_:0, line 441 <- wrt source file
2023-01-11T23:16:24.4300836Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::kaiming_normal_:0
2023-01-11T23:16:24.4301452Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::orthogonal_:0, line 466 <- wrt source file
2023-01-11T23:16:24.4302131Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::orthogonal_:0
2023-01-11T23:16:24.4302736Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::sparse_:0, line 512 <- wrt source file
2023-01-11T23:16:24.4306524Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::sparse_:0
2023-01-11T23:16:24.4307140Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Threshold:0, line 40 <- wrt source file
2023-01-11T23:16:24.4311850Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Threshold:0
2023-01-11T23:16:24.4313776Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::ReLU:0, line 83 <- wrt source file
2023-01-11T23:16:24.4318553Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::ReLU:0
2023-01-11T23:16:24.4319425Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::RReLU:0, line 142 <- wrt source file
2023-01-11T23:16:24.4324046Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::RReLU:0
2023-01-11T23:16:24.4324961Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardtanh:0, line 202 <- wrt source file
2023-01-11T23:16:24.4328794Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardtanh:0
2023-01-11T23:16:24.4329688Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::ReLU6:0, line 260 <- wrt source file
2023-01-11T23:16:24.4333363Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::ReLU6:0
2023-01-11T23:16:24.4334270Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Sigmoid:0, line 288 <- wrt source file
2023-01-11T23:16:24.4338074Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Sigmoid:0
2023-01-11T23:16:24.4338977Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardsigmoid:0, line 320 <- wrt source file
2023-01-11T23:16:24.4342437Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardsigmoid:0
2023-01-11T23:16:24.4343325Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Tanh:0, line 352 <- wrt source file
2023-01-11T23:16:24.4346816Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Tanh:0
2023-01-11T23:16:24.4347681Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::SiLU:0, line 383 <- wrt source file
2023-01-11T23:16:24.4351479Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::SiLU:0
2023-01-11T23:16:24.4352463Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Mish:0, line 419 <- wrt source file
2023-01-11T23:16:24.4355839Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Mish:0
2023-01-11T23:16:24.4356731Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardswish:0, line 461 <- wrt source file
2023-01-11T23:16:24.4360481Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardswish:0
2023-01-11T23:16:24.4361349Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::ELU:0, line 502 <- wrt source file
2023-01-11T23:16:24.4365011Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::ELU:0
2023-01-11T23:16:24.4365998Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::CELU:0, line 543 <- wrt source file
2023-01-11T23:16:24.4369643Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::CELU:0
2023-01-11T23:16:24.4370507Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::SELU:0, line 595 <- wrt source file
2023-01-11T23:16:24.4374096Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::SELU:0
2023-01-11T23:16:24.4375218Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::GLU:0, line 631 <- wrt source file
2023-01-11T23:16:24.4378880Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::GLU:0
2023-01-11T23:16:24.4379755Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::GELU:0, line 672 <- wrt source file
2023-01-11T23:16:24.4385522Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::GELU:0
2023-01-11T23:16:24.4386416Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardshrink:0, line 714 <- wrt source file
2023-01-11T23:16:24.4390225Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardshrink:0
2023-01-11T23:16:24.4391221Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::LeakyReLU:0, line 761 <- wrt source file
2023-01-11T23:16:24.4395480Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::LeakyReLU:0
2023-01-11T23:16:24.4396323Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::LogSigmoid:0, line 796 <- wrt source file
2023-01-11T23:16:24.4399172Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::LogSigmoid:0
2023-01-11T23:16:24.4400172Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softplus:0, line 827 <- wrt source file
2023-01-11T23:16:24.4403703Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softplus:0
2023-01-11T23:16:24.4404592Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softshrink:0, line 869 <- wrt source file
2023-01-11T23:16:24.4408869Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softshrink:0
2023-01-11T23:16:24.4409780Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::MultiheadAttention:0, line 938 <- wrt source file
2023-01-11T23:16:24.4410649Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::MultiheadAttention:0
2023-01-11T23:16:24.4411520Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::PReLU:0, line 1274 <- wrt source file
2023-01-11T23:16:24.4414280Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::PReLU:0
2023-01-11T23:16:24.4415468Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softsign:0, line 1309 <- wrt source file
2023-01-11T23:16:24.4419469Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softsign:0
2023-01-11T23:16:24.4420351Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Tanhshrink:0, line 1332 <- wrt source file
2023-01-11T23:16:24.4423758Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Tanhshrink:0
2023-01-11T23:16:24.4424808Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmin:0, line 1366 <- wrt source file
2023-01-11T23:16:24.4428696Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmin:0
2023-01-11T23:16:24.4429595Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmax:0, line 1421 <- wrt source file
2023-01-11T23:16:24.4434862Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmax:0
2023-01-11T23:16:24.4435761Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmax2d:0, line 1461 <- wrt source file
2023-01-11T23:16:24.4437287Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmax2d:0
2023-01-11T23:16:24.4438184Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::LogSoftmax:0, line 1493 <- wrt source file
2023-01-11T23:16:24.4442367Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::LogSoftmax:0
2023-01-11T23:16:24.4443291Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm1d:0, line 290 <- wrt source file
2023-01-11T23:16:24.4450994Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm1d:0
2023-01-11T23:16:24.4451870Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm2d:0, line 399 <- wrt source file
2023-01-11T23:16:24.4672352Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm2d:0
2023-01-11T23:16:24.4673360Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm3d:0, line 505 <- wrt source file
2023-01-11T23:16:24.6851095Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm3d:0
2023-01-11T23:16:24.6961369Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::SyncBatchNorm:0, line 627 <- wrt source file
2023-01-11T23:16:24.6963198Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::SyncBatchNorm:0
2023-01-11T23:16:24.6963845Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::SyncBatchNorm.convert_sync_batchnorm:0, line 782 <- wrt source file
2023-01-11T23:16:24.6964474Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::SyncBatchNorm.convert_sync_batchnorm:0
2023-01-11T23:16:24.6965076Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/channelshuffle.py::ChannelShuffle:0, line 17 <- wrt source file
2023-01-11T23:16:24.6984367Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/channelshuffle.py::ChannelShuffle:0
2023-01-11T23:16:24.6985057Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::Sequential:0, line 63 <- wrt source file
2023-01-11T23:16:24.6985672Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::Sequential:0
2023-01-11T23:16:24.6986557Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::ModuleList:0, line 261 <- wrt source file
2023-01-11T23:16:24.6987183Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::ModuleList:0
2023-01-11T23:16:24.6987963Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::ModuleDict:0, line 433 <- wrt source file
2023-01-11T23:16:24.6988567Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::ModuleDict:0
2023-01-11T23:16:24.6989374Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::ParameterList:0, line 567 <- wrt source file
2023-01-11T23:16:24.6990013Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::ParameterList:0
2023-01-11T23:16:24.6993044Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::ParameterDict:0, line 707 <- wrt source file
2023-01-11T23:16:24.6993729Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::ParameterDict:0
2023-01-11T23:16:24.6994368Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/distance.py::PairwiseDistance:0, line 36 <- wrt source file
2023-01-11T23:16:24.6999428Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/distance.py::PairwiseDistance:0
2023-01-11T23:16:24.7000083Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/distance.py::CosineSimilarity:0, line 72 <- wrt source file
2023-01-11T23:16:24.7006884Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/distance.py::CosineSimilarity:0
2023-01-11T23:16:24.7007510Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout:0, line 49 <- wrt source file
2023-01-11T23:16:24.7012459Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout:0
2023-01-11T23:16:24.7013148Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout1d:0, line 91 <- wrt source file
2023-01-11T23:16:24.7017810Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout1d:0
2023-01-11T23:16:24.7018665Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout2d:0, line 140 <- wrt source file
2023-01-11T23:16:24.7040891Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout2d:0
2023-01-11T23:16:24.7041549Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout3d:0, line 182 <- wrt source file
2023-01-11T23:16:24.7128084Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout3d:0
2023-01-11T23:16:24.7128761Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::AlphaDropout:0, line 225 <- wrt source file
2023-01-11T23:16:24.7133076Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::AlphaDropout:0
2023-01-11T23:16:24.7133838Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::FeatureAlphaDropout:0, line 272 <- wrt source file
2023-01-11T23:16:24.7249745Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::FeatureAlphaDropout:0
2023-01-11T23:16:24.7250423Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/flatten.py::Flatten:0, line 24 <- wrt source file
2023-01-11T23:16:24.7257427Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/flatten.py::Flatten:0
2023-01-11T23:16:24.7258041Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/flatten.py::Unflatten:0, line 76 <- wrt source file
2023-01-11T23:16:24.7274309Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/flatten.py::Unflatten:0
2023-01-11T23:16:24.7274917Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/fold.py::Fold:0, line 111 <- wrt source file
2023-01-11T23:16:24.7279071Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/fold.py::Fold:0
2023-01-11T23:16:24.7279662Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/fold.py::Unfold:0, line 253 <- wrt source file
2023-01-11T23:16:24.7299441Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/fold.py::Unfold:0
2023-01-11T23:16:24.7300079Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm1d:0, line 135 <- wrt source file
2023-01-11T23:16:24.7313209Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm1d:0
2023-01-11T23:16:24.7313921Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm2d:0, line 251 <- wrt source file
2023-01-11T23:16:24.7541725Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm2d:0
2023-01-11T23:16:24.7542419Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm3d:0, line 367 <- wrt source file
2023-01-11T23:16:24.9689311Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm3d:0
2023-01-11T23:16:24.9785801Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/lazy.py::LazyModuleMixin:0, line 77 <- wrt source file
2023-01-11T23:16:24.9790913Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/lazy.py::LazyModuleMixin:0
2023-01-11T23:16:24.9791716Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py::Identity:0, line 33 <- wrt source file
2023-01-11T23:16:24.9798888Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py::Identity:0
2023-01-11T23:16:24.9799759Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py::Linear:0, line 78 <- wrt source file
2023-01-11T23:16:24.9807786Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py::Linear:0
2023-01-11T23:16:24.9808635Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py::Bilinear:0, line 164 <- wrt source file
2023-01-11T23:16:24.9831811Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py::Bilinear:0
2023-01-11T23:16:24.9832591Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::L1Loss:0, line 88 <- wrt source file
2023-01-11T23:16:24.9839547Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::L1Loss:0
2023-01-11T23:16:24.9840161Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::NLLLoss:0, line 184 <- wrt source file
2023-01-11T23:16:24.9865627Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::NLLLoss:0
2023-01-11T23:16:24.9866259Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::PoissonNLLLoss:0, line 271 <- wrt source file
2023-01-11T23:16:24.9872824Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::PoissonNLLLoss:0
2023-01-11T23:16:24.9873538Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::GaussianNLLLoss:0, line 343 <- wrt source file
2023-01-11T23:16:24.9887056Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::GaussianNLLLoss:0
2023-01-11T23:16:24.9887707Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::KLDivLoss:0, line 451 <- wrt source file
2023-01-11T23:16:24.9896475Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::KLDivLoss:0
2023-01-11T23:16:24.9897091Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::MSELoss:0, line 523 <- wrt source file
2023-01-11T23:16:24.9902917Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::MSELoss:0
2023-01-11T23:16:24.9903751Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::BCELoss:0, line 605 <- wrt source file
2023-01-11T23:16:24.9909623Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::BCELoss:0
2023-01-11T23:16:24.9910249Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::BCEWithLogitsLoss:0, line 668 <- wrt source file
2023-01-11T23:16:24.9921369Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::BCEWithLogitsLoss:0
2023-01-11T23:16:24.9922014Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::MultiLabelMarginLoss:0, line 831 <- wrt source file
2023-01-11T23:16:24.9929610Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::MultiLabelMarginLoss:0
2023-01-11T23:16:24.9930187Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::CrossEntropyLoss:0, line 1149 <- wrt source file
2023-01-11T23:16:24.9939955Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::CrossEntropyLoss:0
2023-01-11T23:16:24.9940619Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::MarginRankingLoss:0, line 1317 <- wrt source file
2023-01-11T23:16:24.9947105Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::MarginRankingLoss:0
2023-01-11T23:16:24.9947910Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::MultiMarginLoss:0, line 1388 <- wrt source file
2023-01-11T23:16:24.9955242Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::MultiMarginLoss:0
2023-01-11T23:16:24.9955885Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::TripletMarginLoss:0, line 1468 <- wrt source file
2023-01-11T23:16:24.9966793Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::TripletMarginLoss:0
2023-01-11T23:16:24.9967603Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::TripletMarginWithDistanceLoss:0, line 1559 <- wrt source file
2023-01-11T23:16:24.9987797Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::TripletMarginWithDistanceLoss:0
2023-01-11T23:16:24.9988448Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::CTCLoss:0, line 1670 <- wrt source file
2023-01-11T23:16:25.0015246Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::CTCLoss:0
2023-01-11T23:16:25.0015879Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.register_buffer:0, line 491 <- wrt source file
2023-01-11T23:16:25.0016519Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.register_buffer:0
2023-01-11T23:16:25.0017160Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.apply:0, line 847 <- wrt source file
2023-01-11T23:16:25.0030509Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.apply:0
2023-01-11T23:16:25.0031134Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.to:0, line 1072 <- wrt source file
2023-01-11T23:16:25.0039341Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.to:0
2023-01-11T23:16:25.0039986Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.state_dict:0, line 1763 <- wrt source file
2023-01-11T23:16:25.0040616Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.state_dict:0
2023-01-11T23:16:25.0041407Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.parameters:0, line 2050 <- wrt source file
2023-01-11T23:16:25.0042023Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.parameters:0
2023-01-11T23:16:25.0042673Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_parameters:0, line 2082 <- wrt source file
2023-01-11T23:16:25.0043313Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_parameters:0
2023-01-11T23:16:25.0043953Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.buffers:0, line 2107 <- wrt source file
2023-01-11T23:16:25.0044553Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.buffers:0
2023-01-11T23:16:25.0045201Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_buffers:0, line 2133 <- wrt source file
2023-01-11T23:16:25.0045831Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_buffers:0
2023-01-11T23:16:25.0046477Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_children:0, line 2163 <- wrt source file
2023-01-11T23:16:25.0047090Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_children:0
2023-01-11T23:16:25.0047722Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.modules:0, line 2187 <- wrt source file
2023-01-11T23:16:25.0050817Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.modules:0
2023-01-11T23:16:25.0051460Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_modules:0, line 2221 <- wrt source file
2023-01-11T23:16:25.0056898Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_modules:0
2023-01-11T23:16:25.0057638Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/normalization.py::LocalResponseNorm:0, line 34 <- wrt source file
2023-01-11T23:16:25.0096339Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/normalization.py::LocalResponseNorm:0
2023-01-11T23:16:25.0097252Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/normalization.py::LayerNorm:0, line 140 <- wrt source file
2023-01-11T23:16:25.0106275Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/normalization.py::LayerNorm:0
2023-01-11T23:16:25.0106940Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/normalization.py::GroupNorm:0, line 230 <- wrt source file
2023-01-11T23:16:25.0114062Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/normalization.py::GroupNorm:0
2023-01-11T23:16:25.0114755Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad1d:0, line 48 <- wrt source file
2023-01-11T23:16:25.0123120Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad1d:0
2023-01-11T23:16:25.0123771Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad2d:0, line 101 <- wrt source file
2023-01-11T23:16:25.0129200Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad2d:0
2023-01-11T23:16:25.0129850Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad3d:0, line 157 <- wrt source file
2023-01-11T23:16:25.0155212Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad3d:0
2023-01-11T23:16:25.0155871Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad1d:0, line 201 <- wrt source file
2023-01-11T23:16:25.0161500Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad1d:0
2023-01-11T23:16:25.0162406Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad2d:0, line 244 <- wrt source file
2023-01-11T23:16:25.0168368Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad2d:0
2023-01-11T23:16:25.0169032Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad3d:0, line 301 <- wrt source file
2023-01-11T23:16:25.0172250Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad3d:0
2023-01-11T23:16:25.0172925Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad1d:0, line 358 <- wrt source file
2023-01-11T23:16:25.0178159Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad1d:0
2023-01-11T23:16:25.0178826Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad2d:0, line 401 <- wrt source file
2023-01-11T23:16:25.0184370Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad2d:0
2023-01-11T23:16:25.0185023Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad3d:0, line 458 <- wrt source file
2023-01-11T23:16:25.4954538Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad3d:0
2023-01-11T23:16:25.5128459Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ZeroPad2d:0, line 494 <- wrt source file
2023-01-11T23:16:25.5136467Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ZeroPad2d:0
2023-01-11T23:16:25.5137335Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pixelshuffle.py::PixelShuffle:0, line 36 <- wrt source file
2023-01-11T23:16:25.5143322Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pixelshuffle.py::PixelShuffle:0
2023-01-11T23:16:25.5144084Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pixelshuffle.py::PixelUnshuffle:0, line 86 <- wrt source file
2023-01-11T23:16:25.5149529Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pixelshuffle.py::PixelUnshuffle:0
2023-01-11T23:16:25.5150179Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool1d:0, line 76 <- wrt source file
2023-01-11T23:16:25.5156042Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool1d:0
2023-01-11T23:16:25.5156660Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool2d:0, line 148 <- wrt source file
2023-01-11T23:16:25.5199855Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool2d:0
2023-01-11T23:16:25.5200540Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool3d:0, line 226 <- wrt source file
2023-01-11T23:16:25.7005103Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool3d:0
2023-01-11T23:16:25.7042663Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool1d:0, line 293 <- wrt source file
2023-01-11T23:16:25.7057485Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool1d:0
2023-01-11T23:16:25.7058493Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool2d:0, line 366 <- wrt source file
2023-01-11T23:16:25.7077513Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool2d:0
2023-01-11T23:16:25.7078146Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool3d:0, line 451 <- wrt source file
2023-01-11T23:16:25.7692868Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool3d:0
2023-01-11T23:16:25.7693573Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool1d:0, line 524 <- wrt source file
2023-01-11T23:16:25.7703183Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool1d:0
2023-01-11T23:16:25.7703870Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool2d:0, line 600 <- wrt source file
2023-01-11T23:16:25.7742998Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool2d:0
2023-01-11T23:16:25.7743630Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool3d:0, line 686 <- wrt source file
2023-01-11T23:16:25.9248715Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool3d:0
2023-01-11T23:16:25.9284484Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::FractionalMaxPool2d:0, line 749 <- wrt source file
2023-01-11T23:16:25.9329498Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::FractionalMaxPool2d:0
2023-01-11T23:16:25.9330191Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::FractionalMaxPool3d:0, line 819 <- wrt source file
2023-01-11T23:16:25.9904942Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::FractionalMaxPool3d:0
2023-01-11T23:16:25.9905972Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::LPPool1d:0, line 909 <- wrt source file
2023-01-11T23:16:25.9914538Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::LPPool1d:0
2023-01-11T23:16:25.9915174Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::LPPool2d:0, line 960 <- wrt source file
2023-01-11T23:16:25.9959820Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::LPPool2d:0
2023-01-11T23:16:25.9960463Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool1d:0, line 1011 <- wrt source file
2023-01-11T23:16:25.9964978Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool1d:0
2023-01-11T23:16:25.9965663Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool2d:0, line 1045 <- wrt source file
2023-01-11T23:16:25.9973426Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool2d:0
2023-01-11T23:16:25.9974095Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool3d:0, line 1088 <- wrt source file
2023-01-11T23:16:25.9993458Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool3d:0
2023-01-11T23:16:25.9994184Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool1d:0, line 1135 <- wrt source file
2023-01-11T23:16:25.9997691Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool1d:0
2023-01-11T23:16:25.9998503Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool2d:0, line 1166 <- wrt source file
2023-01-11T23:16:26.0005937Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool2d:0
2023-01-11T23:16:26.0006597Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool3d:0, line 1205 <- wrt source file
2023-01-11T23:16:26.0025896Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool3d:0
2023-01-11T23:16:26.0026524Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::RNN:0, line 436 <- wrt source file
2023-01-11T23:16:26.0038204Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::RNN:0
2023-01-11T23:16:26.0038798Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::LSTM:0, line 702 <- wrt source file
2023-01-11T23:16:26.0054269Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::LSTM:0
2023-01-11T23:16:26.0055005Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::GRU:0, line 933 <- wrt source file
2023-01-11T23:16:26.0072240Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::GRU:0
2023-01-11T23:16:26.0073001Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::RNNCell:0, line 1106 <- wrt source file
2023-01-11T23:16:26.0081979Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::RNNCell:0
2023-01-11T23:16:26.0082572Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::LSTMCell:0, line 1207 <- wrt source file
2023-01-11T23:16:26.0092617Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::LSTMCell:0
2023-01-11T23:16:26.0093240Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::GRUCell:0, line 1300 <- wrt source file
2023-01-11T23:16:26.0105762Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::GRUCell:0
2023-01-11T23:16:26.0106397Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py::Embedding:0, line 67 <- wrt source file
2023-01-11T23:16:26.0119849Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py::Embedding:0
2023-01-11T23:16:26.0120503Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py::Embedding.from_pretrained:0, line 200 <- wrt source file
2023-01-11T23:16:26.0125963Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py::Embedding.from_pretrained:0
2023-01-11T23:16:26.0126615Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py::EmbeddingBag:0, line 278 <- wrt source file
2023-01-11T23:16:26.0141587Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py::EmbeddingBag:0
2023-01-11T23:16:26.0142264Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py::EmbeddingBag.from_pretrained:0, line 429 <- wrt source file
2023-01-11T23:16:26.0148519Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py::EmbeddingBag.from_pretrained:0
2023-01-11T23:16:26.0149183Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::Transformer:0, line 42 <- wrt source file
2023-01-11T23:16:26.5985142Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::Transformer:0
2023-01-11T23:16:26.5997447Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::Transformer.forward:0, line 134 <- wrt source file
2023-01-11T23:16:26.5998777Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::Transformer.forward:0
2023-01-11T23:16:26.5999683Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerEncoder:0, line 181 <- wrt source file
2023-01-11T23:16:26.6689092Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerEncoder:0
2023-01-11T23:16:26.6691749Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerDecoder:0, line 325 <- wrt source file
2023-01-11T23:16:26.8154586Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerDecoder:0
2023-01-11T23:16:26.8160305Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerEncoderLayer:0, line 391 <- wrt source file
2023-01-11T23:16:26.8409200Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerEncoderLayer:0
2023-01-11T23:16:26.8450693Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerDecoderLayer:0, line 608 <- wrt source file
2023-01-11T23:16:26.8963546Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerDecoderLayer:0
2023-01-11T23:16:26.8964316Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::Upsample:0, line 74 <- wrt source file
2023-01-11T23:16:26.8990184Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::Upsample:0
2023-01-11T23:16:26.8990928Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::UpsamplingNearest2d:0, line 196 <- wrt source file
2023-01-11T23:16:26.9004264Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::UpsamplingNearest2d:0
2023-01-11T23:16:26.9005239Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::UpsamplingBilinear2d:0, line 242 <- wrt source file
2023-01-11T23:16:26.9014226Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::UpsamplingBilinear2d:0
2023-01-11T23:16:26.9015208Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py::DataParallel:0, line 116 <- wrt source file
2023-01-11T23:16:26.9015874Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py::DataParallel:0
2023-01-11T23:16:26.9016560Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel:0, line 534 <- wrt source file
2023-01-11T23:16:26.9017263Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel:0
2023-01-11T23:16:26.9017994Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.no_sync:0, line 1051 <- wrt source file
2023-01-11T23:16:26.9018770Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.no_sync:0
2023-01-11T23:16:26.9019435Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.join:0, line 1377 <- wrt source file
2023-01-11T23:16:26.9020151Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.join:0
2023-01-11T23:16:26.9020898Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.register_comm_hook:0, line 1550 <- wrt source file
2023-01-11T23:16:26.9021732Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.register_comm_hook:0
2023-01-11T23:16:26.9022411Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.register_comm_hook:1, line 1560 <- wrt source file
2023-01-11T23:16:26.9023053Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.register_comm_hook:1
2023-01-11T23:16:26.9023734Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel._register_builtin_comm_hook:0, line 1594 <- wrt source file
2023-01-11T23:16:26.9024460Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel._register_builtin_comm_hook:0
2023-01-11T23:16:26.9025139Z * DOCTEST :
/opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel._register_fused_optim:0, line 1652 <- wrt source file 2023-01-11T23:16:26.9025793Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel._register_fused_optim:0 2023-01-11T23:16:26.9026509Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/_per_sample_grad.py::call_for_per_sample_grads:0, line 32 <- wrt source file 2023-01-11T23:16:26.9027177Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/_per_sample_grad.py::call_for_per_sample_grads:0 2023-01-11T23:16:26.9027805Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/init.py::skip_init:0, line 30 <- wrt source file 2023-01-11T23:16:26.9033948Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/init.py::skip_init:0 2023-01-11T23:16:26.9035786Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/memory_format.py::convert_conv2d_weight_memory_format:0, line 54 <- wrt source file 2023-01-11T23:16:26.9036872Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/memory_format.py::convert_conv2d_weight_memory_format:0 2023-01-11T23:16:26.9037481Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/parametrizations.py::orthogonal:0, line 245 <- wrt source file 2023-01-11T23:16:26.9038049Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/parametrizations.py::orthogonal:0 2023-01-11T23:16:26.9038634Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/parametrizations.py::spectral_norm:0, line 462 <- wrt source file 2023-01-11T23:16:26.9039204Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/parametrizations.py::spectral_norm:0 2023-01-11T23:16:26.9039803Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/parametrize.py::register_parametrization:0, line 463 <- wrt source file 2023-01-11T23:16:26.9044427Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/parametrize.py::register_parametrization:0 2023-01-11T23:16:26.9045050Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::identity:0, line 846 <- wrt source file 2023-01-11T23:16:26.9045561Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::identity:0 2023-01-11T23:16:26.9046098Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::random_unstructured:0, line 880 <- wrt source file 2023-01-11T23:16:26.9046640Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::random_unstructured:0 2023-01-11T23:16:26.9047190Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::l1_unstructured:0, line 921 <- wrt source file 2023-01-11T23:16:26.9048035Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::l1_unstructured:0 2023-01-11T23:16:26.9048583Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::random_structured:0, line 959 <- wrt source file 2023-01-11T23:16:26.9050327Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::random_structured:0 2023-01-11T23:16:26.9050879Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::ln_structured:0, line 1005 <- wrt source file 2023-01-11T23:16:26.9062955Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::ln_structured:0 2023-01-11T23:16:26.9063506Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::global_unstructured:0, line 1058 <- wrt source file 2023-01-11T23:16:26.9079697Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::global_unstructured:0 2023-01-11T23:16:26.9080256Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::custom_from_mask:0, line 1160 <- wrt source file 2023-01-11T23:16:26.9088763Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::custom_from_mask:0 2023-01-11T23:16:26.9089302Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::remove:0, line 1188 <- wrt source file 2023-01-11T23:16:26.9094950Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::remove:0 2023-01-11T23:16:26.9095485Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::is_pruned:0, line 1215 <- wrt source file 2023-01-11T23:16:26.9103446Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::is_pruned:0 2023-01-11T23:16:26.9104017Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pad_packed_sequence:0, line 282 <- wrt source file 2023-01-11T23:16:26.9118934Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pad_packed_sequence:0 2023-01-11T23:16:26.9119473Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pad_sequence:0, line 359 <- wrt source file 2023-01-11T23:16:26.9124803Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pad_sequence:0 2023-01-11T23:16:26.9125335Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::unpad_sequence:0, line 412 <- wrt source file 2023-01-11T23:16:26.9139825Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::unpad_sequence:0 2023-01-11T23:16:26.9140443Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pack_sequence:0, line 467 <- wrt source file 2023-01-11T23:16:26.9147236Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pack_sequence:0 2023-01-11T23:16:26.9147863Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::unpack_sequence:0, line 495 <- wrt source file 2023-01-11T23:16:26.9163848Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::unpack_sequence:0 2023-01-11T23:16:26.9164545Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/spectral_norm.py::spectral_norm:0, line 267 <- wrt source file 2023-01-11T23:16:26.9170835Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/spectral_norm.py::spectral_norm:0 2023-01-11T23:16:26.9171502Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/spectral_norm.py::remove_spectral_norm:0, line 294 <- wrt source file 2023-01-11T23:16:26.9177853Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/spectral_norm.py::remove_spectral_norm:0 2023-01-11T23:16:26.9178599Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/stateless.py::functional_call:0, line 123 <- wrt source file 2023-01-11T23:16:26.9181269Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/stateless.py::functional_call:0 2023-01-11T23:16:26.9181907Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py::weight_norm:0, line 99 <- wrt source file 2023-01-11T23:16:26.9188407Z * SUCCESS: 
/opt/conda/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py::weight_norm:0 2023-01-11T23:16:26.9189052Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py::remove_weight_norm:0, line 121 <- wrt source file 2023-01-11T23:16:26.9194496Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py::remove_weight_norm:0 2023-01-11T23:16:26.9195163Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/_expanded_weights/conv_utils.py::unfold3d:0, line 203 <- wrt source file 2023-01-11T23:16:26.9195838Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/_expanded_weights/conv_utils.py::unfold3d:0 2023-01-11T23:16:26.9196582Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/_expanded_weights/expanded_weights_utils.py::sum_over_all_but_batch_and_last_n:0, line 108 <- wrt source file 2023-01-11T23:16:26.9217045Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/_expanded_weights/expanded_weights_utils.py::sum_over_all_but_batch_and_last_n:0 2023-01-11T23:16:26.9217741Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/onnx/_type_utils.py::JitScalarType:0, line 66 <- wrt source file 2023-01-11T23:16:26.9219925Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/onnx/_type_utils.py::JitScalarType:0 2023-01-11T23:16:26.9220586Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/onnx/verification.py::find_mismatch:0, line 1746 <- wrt source file 2023-01-11T23:16:26.9221285Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/onnx/verification.py::find_mismatch:0 2023-01-11T23:16:26.9221976Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/onnx/_internal/diagnostics/infra/engine.py::DiagnosticEngine:0, line 20 <- wrt source file 2023-01-11T23:16:26.9222965Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/onnx/_internal/diagnostics/infra/engine.py::DiagnosticEngine:0 2023-01-11T23:16:26.9223639Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::LambdaLR:0, line 200 <- wrt source file 2023-01-11T23:16:26.9224238Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::LambdaLR:0 2023-01-11T23:16:26.9224873Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::MultiplicativeLR:0, line 286 <- wrt source file 2023-01-11T23:16:26.9225499Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::MultiplicativeLR:0 2023-01-11T23:16:26.9226114Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::StepLR:0, line 369 <- wrt source file 2023-01-11T23:16:26.9226705Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::StepLR:0 2023-01-11T23:16:26.9227315Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::MultiStepLR:0, line 418 <- wrt source file 2023-01-11T23:16:26.9227917Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::MultiStepLR:0 2023-01-11T23:16:26.9228527Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ConstantLR:0, line 467 <- wrt source file 2023-01-11T23:16:26.9229184Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ConstantLR:0 2023-01-11T23:16:26.9229798Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::LinearLR:0, line 529 <- wrt source file 2023-01-11T23:16:26.9230406Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::LinearLR:0 2023-01-11T23:16:26.9231109Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::SequentialLR:0, line 621 <- wrt source file 2023-01-11T23:16:26.9231710Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::SequentialLR:0 2023-01-11T23:16:26.9232332Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::PolynomialLR:0, line 729 <- wrt source file 2023-01-11T23:16:26.9232942Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::PolynomialLR:0 2023-01-11T23:16:26.9233568Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ChainedScheduler:0, line 849 <- wrt source file 2023-01-11T23:16:26.9234187Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ChainedScheduler:0 2023-01-11T23:16:26.9234840Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ReduceLROnPlateau:0, line 953 <- wrt source file 2023-01-11T23:16:26.9235466Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ReduceLROnPlateau:0 2023-01-11T23:16:26.9236105Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CyclicLR:0, line 1168 <- wrt source file 2023-01-11T23:16:26.9236696Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CyclicLR:0 2023-01-11T23:16:26.9237369Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CosineAnnealingWarmRestarts.step:0, line 1389 <- wrt source file 2023-01-11T23:16:26.9238112Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CosineAnnealingWarmRestarts.step:0 2023-01-11T23:16:26.9238822Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CosineAnnealingWarmRestarts.step:1, line 1405 <- wrt source file 2023-01-11T23:16:26.9239517Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CosineAnnealingWarmRestarts.step:1 2023-01-11T23:16:26.9240172Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::OneCycleLR:0, line 1547 <- wrt source file 2023-01-11T23:16:26.9240767Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::OneCycleLR:0 2023-01-11T23:16:26.9241343Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/sgd.py::SGD:0, line 58 <- wrt source file 2023-01-11T23:16:26.9241889Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/sgd.py::SGD:0 2023-01-11T23:16:26.9242487Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/swa_utils.py::AveragedModel:0, line 38 <- wrt source file 2023-01-11T23:16:26.9243082Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/swa_utils.py::AveragedModel:0 2023-01-11T23:16:26.9243698Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/swa_utils.py::AveragedModel:1, line 64 <- wrt source file 2023-01-11T23:16:26.9244289Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/swa_utils.py::AveragedModel:1 2023-01-11T23:16:26.9244894Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/swa_utils.py::update_bn:0, line 161 <- wrt source file 2023-01-11T23:16:26.9245520Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/swa_utils.py::update_bn:0 2023-01-11T23:16:26.9246111Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/optim/swa_utils.py::SWALR:0, line 222 <- wrt source file 2023-01-11T23:16:26.9246679Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/swa_utils.py::SWALR:0 2023-01-11T23:16:26.9247283Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/package/glob_group.py::GlobGroup:0, line 19 <- wrt source file 2023-01-11T23:16:26.9247884Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/package/glob_group.py::GlobGroup:0 2023-01-11T23:16:26.9248495Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/profiler/profiler.py::profile:0, line 363 <- wrt source file 2023-01-11T23:16:26.9249081Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/profiler/profiler.py::profile:0 2023-01-11T23:16:26.9249705Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py::assert_close:0, line 1395 <- wrt source file 2023-01-11T23:16:26.9291845Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py::assert_close:0 2023-01-11T23:16:26.9292436Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/testing/_creation.py::make_tensor:0, line 93 <- wrt source file 2023-01-11T23:16:26.9293155Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/testing/_creation.py::make_tensor:0 2023-01-11T23:16:26.9293774Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::parametrize:0, line 305 <- wrt source file 2023-01-11T23:16:26.9294342Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::parametrize:0 2023-01-11T23:16:26.9295147Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_symmetric_psd_matrix:0, line 3447 <- wrt source file 2023-01-11T23:16:26.9296041Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_symmetric_psd_matrix:0 2023-01-11T23:16:26.9296649Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_hermitian_psd_matrix:0, line 3461 <- wrt source file 2023-01-11T23:16:26.9297343Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_hermitian_psd_matrix:0 2023-01-11T23:16:26.9298091Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_hermitian_pd_matrix:0, line 3491 <- wrt source file 2023-01-11T23:16:26.9298706Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_hermitian_pd_matrix:0 2023-01-11T23:16:26.9299424Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/distributed/_tensor/common_dtensor.py::skip_unless_torch_gpu:0, line 57 <- wrt source file 2023-01-11T23:16:26.9300168Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/distributed/_tensor/common_dtensor.py::skip_unless_torch_gpu:0 2023-01-11T23:16:26.9300995Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/backend_registration.py::rename_privateuse1_backend:0, line 24 <- wrt source file 2023-01-11T23:16:26.9301749Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/backend_registration.py::rename_privateuse1_backend:0 2023-01-11T23:16:26.9302459Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py::checkpoint_sequential:0, line 306 <- wrt source file 2023-01-11T23:16:26.9303021Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py::checkpoint_sequential:0 2023-01-11T23:16:26.9303717Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CppExtension:0, line 912 <- wrt source file 2023-01-11T23:16:26.9304273Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CppExtension:0 2023-01-11T23:16:26.9304829Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CUDAExtension:0, line 960 <- wrt source file 2023-01-11T23:16:26.9305366Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CUDAExtension:0 2023-01-11T23:16:26.9305918Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CUDAExtension:1, line 1037 <- wrt source file 2023-01-11T23:16:26.9306442Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CUDAExtension:1 2023-01-11T23:16:26.9306992Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::load:0, line 1273 <- wrt source file 2023-01-11T23:16:26.9307510Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::load:0 2023-01-11T23:16:26.9308055Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::load_inline:0, line 1364 <- wrt source file 2023-01-11T23:16:26.9308576Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::load_inline:0 2023-01-11T23:16:26.9309105Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/dlpack.py::from_dlpack:0, line 71 <- wrt source file 2023-01-11T23:16:26.9320070Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/utils/dlpack.py::from_dlpack:0 2023-01-11T23:16:26.9320651Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/throughput_benchmark.py::ThroughputBenchmark:0, line 77 <- wrt source file 2023-01-11T23:16:26.9321247Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/throughput_benchmark.py::ThroughputBenchmark:0 2023-01-11T23:16:26.9321911Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/dataset.py::IterableDataset:0, line 84 <- wrt source file 2023-01-11T23:16:26.9329113Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/dataset.py::IterableDataset:0 2023-01-11T23:16:26.9329720Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/dataset.py::random_split:0, line 320 <- wrt source file 2023-01-11T23:16:26.9330239Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/dataset.py::random_split:0 2023-01-11T23:16:26.9330802Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/distributed.py::DistributedSampler:0, line 51 <- wrt source file 2023-01-11T23:16:26.9331377Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/distributed.py::DistributedSampler:0 2023-01-11T23:16:26.9331961Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/sampler.py::WeightedRandomSampler:0, line 172 <- wrt source file 2023-01-11T23:16:26.9337035Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/utils/data/sampler.py::WeightedRandomSampler:0 2023-01-11T23:16:26.9337603Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/sampler.py::BatchSampler:0, line 220 <- wrt source file 2023-01-11T23:16:26.9341713Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/utils/data/sampler.py::BatchSampler:0 2023-01-11T23:16:26.9342283Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::default_convert:0, line 36 <- wrt source file 2023-01-11T23:16:26.9345404Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::default_convert:0 2023-01-11T23:16:26.9346047Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::collate:0, line 102 <- wrt source file 2023-01-11T23:16:26.9350409Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::collate:0 2023-01-11T23:16:26.9351055Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::default_collate:0, line 231 <- wrt source file 2023-01-11T23:16:26.9357199Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::default_collate:0 2023-01-11T23:16:26.9357821Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/datapipe.py::IterDataPipe:0, line 84 <- wrt source file 2023-01-11T23:16:26.9360919Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/datapipe.py::IterDataPipe:0 2023-01-11T23:16:26.9361553Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/datapipe.py::MapDataPipe:0, line 232 <- wrt source file 2023-01-11T23:16:26.9362444Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/datapipe.py::MapDataPipe:0 2023-01-11T23:16:26.9363047Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py::MapperIterDataPipe:0, line 46 <- wrt source file 2023-01-11T23:16:26.9365274Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py::MapperIterDataPipe:0 2023-01-11T23:16:26.9365909Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py::CollatorIterDataPipe:0, line 187 <- wrt source file 2023-01-11T23:16:26.9367919Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py::CollatorIterDataPipe:0 2023-01-11T23:16:26.9368602Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combinatorics.py::ShufflerIterDataPipe:0, line 80 <- wrt source file 2023-01-11T23:16:26.9369531Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combinatorics.py::ShufflerIterDataPipe:0 2023-01-11T23:16:26.9370163Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ConcaterIterDataPipe:0, line 33 <- wrt source file 2023-01-11T23:16:26.9395239Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ConcaterIterDataPipe:0 2023-01-11T23:16:26.9395906Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ForkerIterDataPipe:0, line 75 <- wrt source file 2023-01-11T23:16:26.9396506Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ForkerIterDataPipe:0 2023-01-11T23:16:26.9397112Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::_ChildDataPipe:0, line 250 <- wrt source file 2023-01-11T23:16:26.9397695Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::_ChildDataPipe:0 2023-01-11T23:16:26.9398369Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::DemultiplexerIterDataPipe:0, line 329 <- wrt 
source file 2023-01-11T23:16:26.9399231Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::DemultiplexerIterDataPipe:0 2023-01-11T23:16:26.9399872Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::MultiplexerIterDataPipe:0, line 507 <- wrt source file 2023-01-11T23:16:26.9400888Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::MultiplexerIterDataPipe:0 2023-01-11T23:16:26.9401592Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ZipperIterDataPipe:0, line 572 <- wrt source file 2023-01-11T23:16:26.9402368Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ZipperIterDataPipe:0 2023-01-11T23:16:26.9402983Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/filelister.py::FileListerIterDataPipe:0, line 29 <- wrt source file 2023-01-11T23:16:26.9404149Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/filelister.py::FileListerIterDataPipe:0 2023-01-11T23:16:26.9404967Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/fileopener.py::FileOpenerIterDataPipe:0, line 33 <- wrt source file 2023-01-11T23:16:26.9405606Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/fileopener.py::FileOpenerIterDataPipe:0 2023-01-11T23:16:26.9406345Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::BatcherIterDataPipe:0, line 102 <- wrt source file 2023-01-11T23:16:26.9407146Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::BatcherIterDataPipe:0 2023-01-11T23:16:26.9407767Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::UnBatcherIterDataPipe:0, line 159 <- wrt source file 2023-01-11T23:16:26.9408369Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::UnBatcherIterDataPipe:0 2023-01-11T23:16:26.9408993Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::GrouperIterDataPipe:0, line 226 <- wrt source file 2023-01-11T23:16:26.9412776Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::GrouperIterDataPipe:0 2023-01-11T23:16:26.9413567Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/selecting.py::FilterIterDataPipe:0, line 34 <- wrt source file 2023-01-11T23:16:26.9414201Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/selecting.py::FilterIterDataPipe:0 2023-01-11T23:16:26.9415145Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/streamreader.py::StreamReaderIterDataPipe:0, line 20 <- wrt source file 2023-01-11T23:16:26.9415798Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/streamreader.py::StreamReaderIterDataPipe:0 2023-01-11T23:16:26.9416553Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/utils.py::IterableWrapperIterDataPipe:0, line 23 <- wrt source file 2023-01-11T23:16:26.9417188Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/utils.py::IterableWrapperIterDataPipe:0 2023-01-11T23:16:26.9417875Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/callable.py::MapperMapDataPipe:0, line 30 <- wrt source file 2023-01-11T23:16:26.9418871Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/callable.py::MapperMapDataPipe:0 2023-01-11T23:16:26.9419684Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combinatorics.py::ShufflerIterDataPipe:0, line 31 <- wrt source file 2023-01-11T23:16:26.9420847Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combinatorics.py::ShufflerIterDataPipe:0 2023-01-11T23:16:26.9421582Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combining.py::ConcaterMapDataPipe:0, line 24 <- wrt source file 2023-01-11T23:16:26.9422286Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combining.py::ConcaterMapDataPipe:0 2023-01-11T23:16:26.9422932Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combining.py::ZipperMapDataPipe:0, line 66 <- wrt source file 2023-01-11T23:16:26.9423566Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combining.py::ZipperMapDataPipe:0 2023-01-11T23:16:26.9424181Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/grouping.py::BatcherMapDataPipe:0, line 23 <- wrt source file 2023-01-11T23:16:26.9424845Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/grouping.py::BatcherMapDataPipe:0 2023-01-11T23:16:26.9425538Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/utils.py::SequenceWrapperMapDataPipe:0, line 23 <- wrt source file 2023-01-11T23:16:26.9426170Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/utils.py::SequenceWrapperMapDataPipe:0 2023-01-11T23:16:26.9426782Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/utils/common.py::validate_input_col:0, line 33 <- wrt source file 2023-01-11T23:16:26.9427362Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/utils/common.py::validate_input_col:0 2023-01-11T23:16:26.9427936Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/hipify/hipify_python.py::find_closure_group:0, line 415 <- wrt source file 2023-01-11T23:16:26.9430139Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/utils/hipify/hipify_python.py::find_closure_group:0 2023-01-11T23:16:26.9430987Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/hipify/hipify_python.py::replace_extern_shared:0, line 511 <- wrt source file 2023-01-11T23:16:26.9431578Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/hipify/hipify_python.py::replace_extern_shared:0 2023-01-11T23:16:26.9432200Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.__init__:0, line 213 <- wrt source file 2023-01-11T23:16:26.9433540Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.__init__:0 2023-01-11T23:16:26.9434247Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_hparams:0, line 320 <- wrt source file 2023-01-11T23:16:26.9434976Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_hparams:0 2023-01-11T23:16:26.9435777Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_scalar:0, line 368 <- wrt source file 2023-01-11T23:16:26.9436491Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_scalar:0 2023-01-11T23:16:26.9437199Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_scalars:0, line 404 <- wrt source file 2023-01-11T23:16:26.9437940Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_scalars:0 2023-01-11T23:16:26.9438536Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_histogram:0, line 462 <- wrt source file 2023-01-11T23:16:26.9439352Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_histogram:0 2023-01-11T23:16:26.9440015Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_histogram_raw:0, line 519 <- wrt source file 2023-01-11T23:16:26.9440731Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_histogram_raw:0 2023-01-11T23:16:26.9441365Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_image:0, line 585 <- wrt source file 2023-01-11T23:16:26.9442007Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_image:0 2023-01-11T23:16:26.9442659Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_images:0, line 638 <- wrt source file 2023-01-11T23:16:26.9443322Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_images:0 2023-01-11T23:16:26.9443913Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_text:0, line 810 <- wrt source file 2023-01-11T23:16:26.9444558Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_text:0 2023-01-11T23:16:26.9445184Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_embedding:0, line 896 <- wrt source file 2023-01-11T23:16:26.9445811Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_embedding:0 2023-01-11T23:16:26.9446489Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_pr_curve:0, line 1001 <- wrt source file 2023-01-11T23:16:26.9447072Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_pr_curve:0 2023-01-11T23:16:26.9447757Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars_multilinechart:0, line 1076 <- wrt source file 2023-01-11T23:16:26.9448390Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars_multilinechart:0 2023-01-11T23:16:26.9449134Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars_marginchart:0, line 1095 <- wrt source file 2023-01-11T23:16:26.9449817Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars_marginchart:0 
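(All of the SummaryWriter doctests above report SKIPPED; they are presumably gated in this environment since they assume at least a writable log directory and the separate tensorboard package. A minimal sketch of the add_scalar pattern they cover, where "runs/demo" is an arbitrary log directory chosen for this sketch:

    from torch.utils.tensorboard import SummaryWriter  # requires the tensorboard package

    writer = SummaryWriter("runs/demo")  # arbitrary log directory
    for step in range(100):
        writer.add_scalar("loss/train", 1.0 / (step + 1), step)  # tag, value, global step
    writer.close()

)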
2023-01-11T23:16:26.9450550Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars:0, line 1117 <- wrt source file
2023-01-11T23:16:26.9451143Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars:0
2023-01-11T23:16:26.9451746Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_mesh:0, line 1161 <- wrt source file
2023-01-11T23:16:26.9452421Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_mesh:0
2023-01-11T23:16:26.9452763Z ============
2023-01-11T23:16:26.9452954Z Finished doctests
2023-01-11T23:16:26.9453165Z 287 / 663 passed
2023-01-11T23:16:26.9453421Z
2023-01-11T23:16:26.9453723Z === Found 3 run-time warnings ===
2023-01-11T23:16:26.9454093Z --- Runtime Warning: 1 / 3 ---
2023-01-11T23:16:26.9454452Z example =
2023-01-11T23:16:26.9455410Z /opt/conda/lib/python3.10/site-packages/torch/_tensor.py:1114: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /var/lib/jenkins/workspace/c10/core/TensorImpl.h:1816.)
2023-01-11T23:16:26.9456007Z return super(Tensor, self).refine_names(names)
2023-01-11T23:16:26.9456217Z
2023-01-11T23:16:26.9456508Z --- Runtime Warning: 2 / 3 ---
2023-01-11T23:16:26.9456857Z example =
2023-01-11T23:16:26.9457527Z /opt/conda/lib/python3.10/site-packages/torch/nested/__init__.py:58: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/NestedTensorImpl.cpp:179.)
2023-01-11T23:16:26.9458047Z return torch._nested_tensor_from_tensor_list(tensor_list, dtype, None, device, None)
2023-01-11T23:16:26.9458300Z
2023-01-11T23:16:26.9458547Z --- Runtime Warning: 3 / 3 ---
2023-01-11T23:16:26.9458842Z example =
2023-01-11T23:16:26.9459687Z /opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py:921: UserWarning: Your compiler for AOTAutograd is returning a function that doesn't take boxed arguments. Please wrap it with functorch.compile.make_boxed_func or handle the boxed arguments yourself. See https://github.com/pytorch/pytorch/pull/83137#issuecomment-1211320670 for rationale.
2023-01-11T23:16:26.9460220Z warnings.warn(
2023-01-11T23:16:26.9460404Z
2023-01-11T23:16:26.9460687Z === 287 passed, 376 skipped, 3 warnings in 8.58 seconds ===
2023-01-11T23:16:27.3529291Z
2023-01-11T23:16:27.3529846Z real 113m44.950s
2023-01-11T23:16:27.3530179Z user 222m9.096s
2023-01-11T23:16:27.3530459Z sys 10m41.357s
2023-01-11T23:16:27.3530715Z + assert_git_not_dirty
2023-01-11T23:16:27.3531185Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *rocm* ]]
2023-01-11T23:16:27.3531886Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *xla* ]]
2023-01-11T23:16:27.3532187Z ++ git status --porcelain
2023-01-11T23:16:28.5459094Z + git_status=
2023-01-11T23:16:28.5459701Z + [[ -n '' ]]
2023-01-11T23:16:28.5460071Z + test_libtorch
2023-01-11T23:16:28.5460700Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *rocm* ]]
2023-01-11T23:16:28.5461262Z + echo 'Testing libtorch'
2023-01-11T23:16:28.5461659Z Testing libtorch
2023-01-11T23:16:28.5462499Z + ln -sf /opt/conda/lib/python3.10/site-packages/torch/lib/libbackend_with_compiler.so /opt/conda/lib/python3.10/site-packages/torch/bin
2023-01-11T23:16:28.5471376Z + ln -sf /opt/conda/lib/python3.10/site-packages/torch/lib/libjitbackend_test.so /opt/conda/lib/python3.10/site-packages/torch/bin
2023-01-11T23:16:28.5480779Z + ln -sf /opt/conda/lib/python3.10/site-packages/torch/lib/libc10.so /opt/conda/lib/python3.10/site-packages/torch/lib/libc10_cuda.so /opt/conda/lib/python3.10/site-packages/torch/lib/libc10d_cuda_test.so /opt/conda/lib/python3.10/site-packages/torch/bin
2023-01-11T23:16:28.5487348Z + ln -sf /opt/conda/lib/python3.10/site-packages/torch/lib/libshm.so /opt/conda/lib/python3.10/site-packages/torch/bin
2023-01-11T23:16:28.5495126Z + ln -sf /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda_linalg.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_global_deps.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_python.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorchbind_test.so /opt/conda/lib/python3.10/site-packages/torch/bin
2023-01-11T23:16:28.5502134Z + ln -sf '/opt/conda/lib/python3.10/site-packages/torch/lib/libtbb*' /opt/conda/lib/python3.10/site-packages/torch/bin
2023-01-11T23:16:28.5511148Z + TEST_REPORTS_DIR=test/test-reports/cpp-unittest/test_libtorch
2023-01-11T23:16:28.5513071Z + mkdir -p test/test-reports/cpp-unittest/test_libtorch
2023-01-11T23:16:28.5513427Z + python tools/download_mnist.py --quiet -d test/cpp/api/mnist
2023-01-11T23:16:28.5521614Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *-tsan* ]]
2023-01-11T23:16:28.5521921Z + python test/cpp/jit/tests_setup.py setup
2023-01-11T23:16:28.5903864Z Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz ...
2023-01-11T23:16:28.9486862Z Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz ...
2023-01-11T23:16:28.9673841Z Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz ...
2023-01-11T23:16:29.0232727Z Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz ...
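(Two of the three runtime warnings in the doctest summary above come from prototype APIs that the examples touch: named tensors and nested tensors. A minimal sketch that exercises the nested-tensor path on a build from this era, and so would emit the "prototype stage" UserWarning quoted above:

    import torch

    # building a nested tensor from variably-sized constituents triggers the
    # prototype-stage UserWarning shown in the doctest summary
    nt = torch.nested.nested_tensor([torch.randn(2, 5), torch.randn(3, 5)])
    print(nt.is_nested)  # True

)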
2023-01-11T23:16:30.0466737Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *cuda* ]] 2023-01-11T23:16:30.0467285Z + /opt/conda/lib/python3.10/site-packages/torch/bin/test_jit --gtest_output=xml:test/test-reports/cpp-unittest/test_libtorch/test_jit.xml 2023-01-11T23:16:30.4573488Z Only one CUDA device detected. Disabling MultiCUDA tests 2023-01-11T23:16:30.4581096Z Note: Google Test filter = *-*_MultiCUDA 2023-01-11T23:16:30.4581466Z [==========] Running 1340 tests from 122 test suites. 2023-01-11T23:16:30.4581841Z [----------] Global test environment set-up. 2023-01-11T23:16:30.4582168Z [----------] 2 tests from AddIfThenElseOpTest 2023-01-11T23:16:30.4582524Z [ RUN ] AddIfThenElseOpTest.AddIfThenElseOpSimple 2023-01-11T23:16:30.4648517Z [ OK ] AddIfThenElseOpTest.AddIfThenElseOpSimple (6 ms) 2023-01-11T23:16:30.4649106Z [ RUN ] AddIfThenElseOpTest.NoIfThenElseOpMultipleOutputs 2023-01-11T23:16:30.4649662Z [ OK ] AddIfThenElseOpTest.NoIfThenElseOpMultipleOutputs (0 ms) 2023-01-11T23:16:30.4650240Z [----------] 2 tests from AddIfThenElseOpTest (6 ms total) 2023-01-11T23:16:30.4650492Z 2023-01-11T23:16:30.4650689Z [----------] 15 tests from TopologicalMoveTest 2023-01-11T23:16:30.4660614Z [ RUN ] TopologicalMoveTest.SplitsDeps 2023-01-11T23:16:30.4661140Z [ OK ] TopologicalMoveTest.SplitsDeps (0 ms) 2023-01-11T23:16:30.4661610Z [ RUN ] TopologicalMoveTest.MoveAfterBackwardSimple 2023-01-11T23:16:30.4662184Z [ OK ] TopologicalMoveTest.MoveAfterBackwardSimple (0 ms) 2023-01-11T23:16:30.4662751Z [ RUN ] TopologicalMoveTest.MoveAfterBackwardInvalid 2023-01-11T23:16:30.4663317Z [ OK ] TopologicalMoveTest.MoveAfterBackwardInvalid (0 ms) 2023-01-11T23:16:30.4663751Z [ RUN ] TopologicalMoveTest.MoveAfterNoOp 2023-01-11T23:16:30.4664103Z [ OK ] TopologicalMoveTest.MoveAfterNoOp (0 ms) 2023-01-11T23:16:30.4664507Z [ RUN ] TopologicalMoveTest.MoveAfterBackwardMultipleDeps 2023-01-11T23:16:30.4664981Z [ OK ] TopologicalMoveTest.MoveAfterBackwardMultipleDeps (0 ms) 2023-01-11T23:16:30.4665511Z [ RUN ] TopologicalMoveTest.MoveAfterBackwardNonZeroWorkingSet 2023-01-11T23:16:30.4665997Z [ OK ] TopologicalMoveTest.MoveAfterBackwardNonZeroWorkingSet (0 ms) 2023-01-11T23:16:30.4666562Z [ RUN ] TopologicalMoveTest.MoveAfterForwardSimple 2023-01-11T23:16:30.4666987Z [ OK ] TopologicalMoveTest.MoveAfterForwardSimple (0 ms) 2023-01-11T23:16:30.4667418Z [ RUN ] TopologicalMoveTest.MoveAfterForwardNonZeroWorkingSet 2023-01-11T23:16:30.4667893Z [ OK ] TopologicalMoveTest.MoveAfterForwardNonZeroWorkingSet (0 ms) 2023-01-11T23:16:30.4668324Z [ RUN ] TopologicalMoveTest.MoveBeforeForwardSimple 2023-01-11T23:16:30.4668735Z [ OK ] TopologicalMoveTest.MoveBeforeForwardSimple (0 ms) 2023-01-11T23:16:30.4669239Z [ RUN ] TopologicalMoveTest.MoveBeforeBackwardSimple 2023-01-11T23:16:30.4669656Z [ OK ] TopologicalMoveTest.MoveBeforeBackwardSimple (0 ms) 2023-01-11T23:16:30.4670040Z [ RUN ] TopologicalMoveTest.MoveBeforeNoOp 2023-01-11T23:16:30.4670399Z [ OK ] TopologicalMoveTest.MoveBeforeNoOp (0 ms) 2023-01-11T23:16:30.4670896Z [ RUN ] TopologicalMoveTest.MoveBeforeForwardWithDeps 2023-01-11T23:16:30.4671321Z [ OK ] TopologicalMoveTest.MoveBeforeForwardWithDeps (0 ms) 2023-01-11T23:16:30.4671743Z [ RUN ] TopologicalMoveTest.MoveBeforeBackwardWithDeps 2023-01-11T23:16:30.4672164Z [ OK ] TopologicalMoveTest.MoveBeforeBackwardWithDeps (0 ms) 2023-01-11T23:16:30.4672557Z [ RUN ] TopologicalMoveTest.DepsDisallowMove 2023-01-11T23:16:30.4672927Z [ OK ] TopologicalMoveTest.DepsDisallowMove (0 ms) 2023-01-11T23:16:30.4673311Z [ RUN ] 
TopologicalMoveTest.MoveAfterBeforeWithDeps 2023-01-11T23:16:30.4673723Z [ OK ] TopologicalMoveTest.MoveAfterBeforeWithDeps (0 ms) 2023-01-11T23:16:30.4674119Z [----------] 15 tests from TopologicalMoveTest (1 ms total) 2023-01-11T23:16:30.4674292Z 2023-01-11T23:16:30.4674462Z [----------] 6 tests from AliasAnalysisTest 2023-01-11T23:16:30.4674820Z [ RUN ] AliasAnalysisTest.AliasingMutationBlocksMoves 2023-01-11T23:16:30.4705083Z [ OK ] AliasAnalysisTest.AliasingMutationBlocksMoves (3 ms) 2023-01-11T23:16:30.4705548Z [ RUN ] AliasAnalysisTest.AliasingMutationBlocksMoves2 2023-01-11T23:16:30.4706118Z [ OK ] AliasAnalysisTest.AliasingMutationBlocksMoves2 (0 ms) 2023-01-11T23:16:30.4706557Z [ RUN ] AliasAnalysisTest.SideEffectsBlockMoves 2023-01-11T23:16:30.4707061Z [ OK ] AliasAnalysisTest.SideEffectsBlockMoves (0 ms) 2023-01-11T23:16:30.4707533Z [ RUN ] AliasAnalysisTest.MovingAcrossInnerBlocks 2023-01-11T23:16:30.4707936Z [ OK ] AliasAnalysisTest.MovingAcrossInnerBlocks (0 ms) 2023-01-11T23:16:30.4708311Z [ RUN ] AliasAnalysisTest.NoneHasNoWriters 2023-01-11T23:16:30.4708694Z [ OK ] AliasAnalysisTest.NoneHasNoWriters (0 ms) 2023-01-11T23:16:30.4709356Z [ RUN ] AliasAnalysisTest.SafeToChangeAliasingRelationship 2023-01-11T23:16:30.4710018Z [ OK ] AliasAnalysisTest.SafeToChangeAliasingRelationship (0 ms) 2023-01-11T23:16:30.4710566Z [----------] 6 tests from AliasAnalysisTest (4 ms total) 2023-01-11T23:16:30.4710756Z 2023-01-11T23:16:30.4710947Z [----------] 4 tests from WriteTrackingTest 2023-01-11T23:16:30.4711273Z [ RUN ] WriteTrackingTest.Basic 2023-01-11T23:16:30.4711653Z [ OK ] WriteTrackingTest.Basic (0 ms) 2023-01-11T23:16:30.4712075Z [ RUN ] WriteTrackingTest.IsMutable 2023-01-11T23:16:30.4712483Z [ OK ] WriteTrackingTest.IsMutable (0 ms) 2023-01-11T23:16:30.4712885Z [ RUN ] WriteTrackingTest.IsImmutable 2023-01-11T23:16:30.4713231Z [ OK ] WriteTrackingTest.IsImmutable (0 ms) 2023-01-11T23:16:30.4713582Z [ RUN ] WriteTrackingTest.HasWriters 2023-01-11T23:16:30.4714027Z [ OK ] WriteTrackingTest.HasWriters (0 ms) 2023-01-11T23:16:30.4714497Z [----------] 4 tests from WriteTrackingTest (0 ms total) 2023-01-11T23:16:30.4714728Z 2023-01-11T23:16:30.4714990Z [----------] 13 tests from ContainerAliasingTest 2023-01-11T23:16:30.4715476Z [ RUN ] ContainerAliasingTest.MayContainAlias 2023-01-11T23:16:30.4716004Z [ OK ] ContainerAliasingTest.MayContainAlias (0 ms) 2023-01-11T23:16:30.4716489Z [ RUN ] ContainerAliasingTest.MayContainAlias_cast 2023-01-11T23:16:30.4716937Z [ OK ] ContainerAliasingTest.MayContainAlias_cast (0 ms) 2023-01-11T23:16:30.4717499Z [ RUN ] ContainerAliasingTest.PrimitveValuesDontAliasContainers 2023-01-11T23:16:30.4718174Z [ OK ] ContainerAliasingTest.PrimitveValuesDontAliasContainers (0 ms) 2023-01-11T23:16:30.4718594Z [ RUN ] ContainerAliasingTest.UnionAliasing 2023-01-11T23:16:30.4718954Z [ OK ] ContainerAliasingTest.UnionAliasing (0 ms) 2023-01-11T23:16:30.4719341Z [ RUN ] ContainerAliasingTest.InputsCanAliasOutputs 2023-01-11T23:16:30.4719831Z [ OK ] ContainerAliasingTest.InputsCanAliasOutputs (0 ms) 2023-01-11T23:16:30.4720244Z [ RUN ] ContainerAliasingTest.NestedTupleConstruct 2023-01-11T23:16:30.4720672Z [ OK ] ContainerAliasingTest.NestedTupleConstruct (0 ms) 2023-01-11T23:16:30.4721165Z [ RUN ] ContainerAliasingTest.NestedTypes 2023-01-11T23:16:30.4721539Z [ OK ] ContainerAliasingTest.NestedTypes (0 ms) 2023-01-11T23:16:30.4721876Z [ RUN ] ContainerAliasingTest.Simple 2023-01-11T23:16:30.4722280Z [ OK ] ContainerAliasingTest.Simple (0 ms) 
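(The TopologicalMoveTest, AliasAnalysisTest, WriteTrackingTest, and ContainerAliasingTest suites are C++ unit tests over TorchScript IR: they check when nodes may be reordered given aliasing and in-place mutation. For orientation only, the IR they reason about can be inspected from Python; a minimal sketch:

    import torch

    @torch.jit.script
    def f(x: torch.Tensor) -> torch.Tensor:
        y = x + 1
        y.add_(2)   # in-place op: a write the alias analysis must account for
        return y

    print(f.graph)  # the TorchScript IR graph these alias-analysis tests operate on

)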
2023-01-11T23:16:30.4722648Z [ RUN ] ContainerAliasingTest.Lists
2023-01-11T23:16:30.4723047Z [ OK ] ContainerAliasingTest.Lists (0 ms)
2023-01-11T23:16:30.4723483Z [ RUN ] ContainerAliasingTest.Lists2
2023-01-11T23:16:30.4723875Z [ OK ] ContainerAliasingTest.Lists2 (0 ms)
2023-01-11T23:16:30.4724328Z [ RUN ] ContainerAliasingTest.Conservative
2023-01-11T23:16:30.4724793Z [ OK ] ContainerAliasingTest.Conservative (0 ms)
2023-01-11T23:16:30.4725194Z [ RUN ] ContainerAliasingTest.MovesAcrossContainedWrites
2023-01-11T23:16:30.4725761Z [ OK ] ContainerAliasingTest.MovesAcrossContainedWrites (0 ms)
2023-01-11T23:16:30.4726364Z [ RUN ] ContainerAliasingTest.MovesAcrossContainedWritesNested
2023-01-11T23:16:30.4726943Z [ OK ] ContainerAliasingTest.MovesAcrossContainedWritesNested (0 ms)
2023-01-11T23:16:30.4727444Z [----------] 13 tests from ContainerAliasingTest (1 ms total)
2023-01-11T23:16:30.4727639Z
2023-01-11T23:16:30.4727853Z [----------] 3 tests from WildcardsTest
2023-01-11T23:16:30.4728226Z [ RUN ] WildcardsTest.Basic
2023-01-11T23:16:30.4728660Z [ OK ] WildcardsTest.Basic (0 ms)
2023-01-11T23:16:30.4728989Z [ RUN ] WildcardsTest.TypeIsolation
2023-01-11T23:16:30.4729430Z [ OK ] WildcardsTest.TypeIsolation (0 ms)
2023-01-11T23:16:30.4729881Z [ RUN ] WildcardsTest.InvariantContainerAliasing
2023-01-11T23:16:30.4730420Z [ OK ] WildcardsTest.InvariantContainerAliasing (0 ms)
2023-01-11T23:16:30.4730841Z [----------] 3 tests from WildcardsTest (0 ms total)
2023-01-11T23:16:30.4730998Z
2023-01-11T23:16:30.4731178Z [----------] 18 tests from AliasRegistrationTest
2023-01-11T23:16:30.4731572Z [ RUN ] AliasRegistrationTest.ConservativeWithInferredSchema
2023-01-11T23:16:30.4732109Z [ OK ] AliasRegistrationTest.ConservativeWithInferredSchema (0 ms)
2023-01-11T23:16:30.4732583Z [ RUN ] AliasRegistrationTest.ConservativeWithSpecifiedSchema
2023-01-11T23:16:30.4733062Z [ OK ] AliasRegistrationTest.ConservativeWithSpecifiedSchema (0 ms)
2023-01-11T23:16:30.4733587Z [ RUN ] AliasRegistrationTest.ConservativeWithAliasingAnnotationsShouldError
2023-01-11T23:16:30.4768990Z [ OK ] AliasRegistrationTest.ConservativeWithAliasingAnnotationsShouldError (3 ms)
2023-01-11T23:16:30.4769569Z [ RUN ] AliasRegistrationTest.ConservativeWithAliasingAnnotationsShouldError2
2023-01-11T23:16:30.4802993Z [ OK ] AliasRegistrationTest.ConservativeWithAliasingAnnotationsShouldError2 (3 ms)
2023-01-11T23:16:30.4803548Z [ RUN ] AliasRegistrationTest.FromSchemaWithInferredSchemaShouldError
2023-01-11T23:16:30.4816892Z [ OK ] AliasRegistrationTest.FromSchemaWithInferredSchemaShouldError (1 ms)
2023-01-11T23:16:30.4817598Z [ RUN ] AliasRegistrationTest.FromSchemaInferredPure
2023-01-11T23:16:30.4818028Z [ OK ] AliasRegistrationTest.FromSchemaInferredPure (0 ms)
2023-01-11T23:16:30.4818434Z [ RUN ] AliasRegistrationTest.FromSchemaAliased
2023-01-11T23:16:30.4819164Z [ OK ] AliasRegistrationTest.FromSchemaAliased (0 ms)
2023-01-11T23:16:30.4819984Z [ RUN ] AliasRegistrationTest.FromSchemaPure
2023-01-11T23:16:30.4820503Z [ OK ] AliasRegistrationTest.FromSchemaPure (0 ms)
2023-01-11T23:16:30.4820967Z [ RUN ] AliasRegistrationTest.PureNoSchema
2023-01-11T23:16:30.4821819Z [ OK ] AliasRegistrationTest.PureNoSchema (0 ms)
2023-01-11T23:16:30.4822393Z [ RUN ] AliasRegistrationTest.PureWithSchema
2023-01-11T23:16:30.4822883Z [ OK ] AliasRegistrationTest.PureWithSchema (0 ms)
2023-01-11T23:16:30.4823300Z [ RUN ] AliasRegistrationTest.PureWithAnnotationsShouldError
2023-01-11T23:16:30.4860672Z [ OK ] AliasRegistrationTest.PureWithAnnotationsShouldError (3 ms)
2023-01-11T23:16:30.4861320Z [ RUN ] AliasRegistrationTest.AliasMoveAtenListOp
2023-01-11T23:16:30.4861880Z [ OK ] AliasRegistrationTest.AliasMoveAtenListOp (0 ms)
2023-01-11T23:16:30.4862507Z [ RUN ] AliasRegistrationTest.AliasMoveForTupleConstructWithSingleUseAsGraphOutput
2023-01-11T23:16:30.4863133Z [ OK ] AliasRegistrationTest.AliasMoveForTupleConstructWithSingleUseAsGraphOutput (0 ms)
2023-01-11T23:16:30.4863686Z [ RUN ] AliasRegistrationTest.RecursiveSubgraphTupleContainment
2023-01-11T23:16:30.4864198Z [ OK ] AliasRegistrationTest.RecursiveSubgraphTupleContainment (0 ms)
2023-01-11T23:16:30.4864693Z [ RUN ] AliasRegistrationTest.WildcardAliasForTupleConstructWithUses
2023-01-11T23:16:30.4865250Z [ OK ] AliasRegistrationTest.WildcardAliasForTupleConstructWithUses (0 ms)
2023-01-11T23:16:30.4865722Z [ RUN ] AliasRegistrationTest.ATenSplitIntListAliasCheck
2023-01-11T23:16:30.4866167Z [ OK ] AliasRegistrationTest.ATenSplitIntListAliasCheck (0 ms)
2023-01-11T23:16:30.4866699Z [ RUN ] AliasRegistrationTest.ATenSplitIntAliasCheck
2023-01-11T23:16:30.4867117Z [ OK ] AliasRegistrationTest.ATenSplitIntAliasCheck (0 ms)
2023-01-11T23:16:30.4867556Z [ RUN ] AliasRegistrationTest.PureWithAnnotationsShouldError2
2023-01-11T23:16:30.4897984Z [ OK ] AliasRegistrationTest.PureWithAnnotationsShouldError2 (3 ms)
2023-01-11T23:16:30.4898598Z [----------] 18 tests from AliasRegistrationTest (16 ms total)
2023-01-11T23:16:30.4898956Z
2023-01-11T23:16:30.4899202Z [----------] 2 tests from IRNonDeterminismTest
2023-01-11T23:16:30.4899539Z [ RUN ] IRNonDeterminismTest.Basic
2023-01-11T23:16:30.4899858Z [ OK ] IRNonDeterminismTest.Basic (0 ms)
2023-01-11T23:16:30.4900276Z [ RUN ] IRNonDeterminismTest.DropoutSpecialCase
2023-01-11T23:16:30.4900668Z [ OK ] IRNonDeterminismTest.DropoutSpecialCase (0 ms)
2023-01-11T23:16:30.4901138Z [----------] 2 tests from IRNonDeterminismTest (0 ms total)
2023-01-11T23:16:30.4901368Z
2023-01-11T23:16:30.4901654Z [----------] 1 test from NonDeterminismBackwardsCompatibility
2023-01-11T23:16:30.4902109Z [ RUN ] NonDeterminismBackwardsCompatibility.BackwardsCompatibility
2023-01-11T23:16:30.4902628Z [ OK ] NonDeterminismBackwardsCompatibility.BackwardsCompatibility (0 ms)
2023-01-11T23:16:30.4903107Z [----------] 1 test from NonDeterminismBackwardsCompatibility (0 ms total)
2023-01-11T23:16:30.4903308Z
2023-01-11T23:16:30.4903474Z [----------] 2 tests from ArgumentSpecTest
2023-01-11T23:16:30.4903832Z [ RUN ] ArgumentSpecTest.CompleteArgumentSpec_CUDA
2023-01-11T23:16:31.3385778Z [ OK ] ArgumentSpecTest.CompleteArgumentSpec_CUDA (847 ms)
2023-01-11T23:16:31.3386454Z [ RUN ] ArgumentSpecTest.Basic_CUDA
2023-01-11T23:16:31.3386796Z [ OK ] ArgumentSpecTest.Basic_CUDA (0 ms)
2023-01-11T23:16:31.3387251Z [----------] 2 tests from ArgumentSpecTest (848 ms total)
2023-01-11T23:16:31.3387431Z
2023-01-11T23:16:31.3387588Z [----------] 3 tests from AutodiffTest
2023-01-11T23:16:31.3387906Z [ RUN ] AutodiffTest.ADFormulas
2023-01-11T23:16:31.3862934Z [ OK ] AutodiffTest.ADFormulas (47 ms)
2023-01-11T23:16:31.3863741Z [ RUN ] AutodiffTest.Differentiate
2023-01-11T23:16:31.3865563Z [ OK ] AutodiffTest.Differentiate (0 ms)
2023-01-11T23:16:31.3866466Z [ RUN ] AutodiffTest.DifferentiateWithRequiresGrad
2023-01-11T23:16:31.3885800Z [ OK ] AutodiffTest.DifferentiateWithRequiresGrad (2 ms)
2023-01-11T23:16:31.3886339Z [----------] 3 tests from AutodiffTest (50 ms total)
2023-01-11T23:16:31.3886511Z
2023-01-11T23:16:31.3886724Z [----------] 1 test from AutodiffRemoveUnusedGradientsTest
2023-01-11T23:16:31.3887107Z [ RUN ] AutodiffRemoveUnusedGradientsTest.Linear
2023-01-11T23:16:31.3898284Z [ OK ] AutodiffRemoveUnusedGradientsTest.Linear (1 ms)
2023-01-11T23:16:31.3898740Z [----------] 1 test from AutodiffRemoveUnusedGradientsTest (1 ms total)
2023-01-11T23:16:31.3898945Z
2023-01-11T23:16:31.3899109Z [----------] 1 test from UpgraderLoad
2023-01-11T23:16:31.3899446Z [ RUN ] UpgraderLoad.CanPopulateUpgradersGraph
2023-01-11T23:16:31.3946977Z [ OK ] UpgraderLoad.CanPopulateUpgradersGraph (4 ms)
2023-01-11T23:16:31.3947941Z [----------] 1 test from UpgraderLoad (4 ms total)
2023-01-11T23:16:31.3948269Z
2023-01-11T23:16:31.3948649Z [----------] 4 tests from OpReplacementTest
2023-01-11T23:16:31.3949490Z [ RUN ] OpReplacementTest.ReplaceDivInSimpleFunction
2023-01-11T23:16:31.3950336Z [ OK ] OpReplacementTest.ReplaceDivInSimpleFunction (0 ms)
2023-01-11T23:16:31.3951425Z [ RUN ] OpReplacementTest.ReplaceTwoOpsInSimpleFunction
2023-01-11T23:16:31.3952643Z [ OK ] OpReplacementTest.ReplaceTwoOpsInSimpleFunction (0 ms)
2023-01-11T23:16:31.3953729Z [ RUN ] OpReplacementTest.ReplaceDivInNestedFunction
2023-01-11T23:16:31.3955004Z [ OK ] OpReplacementTest.ReplaceDivInNestedFunction (0 ms)
2023-01-11T23:16:31.3955485Z [ RUN ] OpReplacementTest.ReplaceTestSubcmulInSimpleFunction
2023-01-11T23:16:31.3955945Z [ OK ] OpReplacementTest.ReplaceTestSubcmulInSimpleFunction (0 ms)
2023-01-11T23:16:31.3956360Z [----------] 4 tests from OpReplacementTest (0 ms total)
2023-01-11T23:16:31.3956531Z
2023-01-11T23:16:31.3956694Z [----------] 4 tests from UpgraderUtils
2023-01-11T23:16:31.3957029Z [ RUN ] UpgraderUtils.FindCorrectUpgrader
2023-01-11T23:16:31.3957383Z [ OK ] UpgraderUtils.FindCorrectUpgrader (0 ms)
2023-01-11T23:16:31.3957735Z [ RUN ] UpgraderUtils.IsVersionMapSorted
2023-01-11T23:16:31.3958094Z [ OK ] UpgraderUtils.IsVersionMapSorted (0 ms)
2023-01-11T23:16:31.3958430Z [ RUN ] UpgraderUtils.FindIfOpIsCurrent
2023-01-11T23:16:31.3958778Z [ OK ] UpgraderUtils.FindIfOpIsCurrent (0 ms)
2023-01-11T23:16:31.3959120Z [ RUN ] UpgraderUtils.CanLoadHistoricOp
2023-01-11T23:16:31.3959468Z [ OK ] UpgraderUtils.CanLoadHistoricOp (0 ms)
2023-01-11T23:16:31.3959809Z [----------] 4 tests from UpgraderUtils (0 ms total)
2023-01-11T23:16:31.3959972Z
2023-01-11T23:16:31.3960123Z [----------] 9 tests from BackendTest
2023-01-11T23:16:31.3960408Z [ RUN ] BackendTest.ToBackend
2023-01-11T23:16:31.4008991Z [ OK ] BackendTest.ToBackend (5 ms)
2023-01-11T23:16:31.4009487Z [ RUN ] BackendTest.ToBackendNotAvailable
2023-01-11T23:16:31.4035727Z [W backend_detail.cpp:393] Warning: Backend [test_backend_unavailable] is not available. Execution of this Module is still possible by saving and loading on a device where the backend is available. (function codegen_backend_module)
2023-01-11T23:16:31.4052385Z [ OK ] BackendTest.ToBackendNotAvailable (4 ms)
2023-01-11T23:16:31.4052750Z [ RUN ] BackendTest.TestCompiler
2023-01-11T23:16:31.4117008Z [ OK ] BackendTest.TestCompiler (6 ms)
2023-01-11T23:16:31.4117399Z [ RUN ] BackendTest.TestCompilerWithStringTable
2023-01-11T23:16:31.4175021Z [ OK ] BackendTest.TestCompilerWithStringTable (5 ms)
2023-01-11T23:16:31.4175462Z [ RUN ] BackendTest.TestComposite
2023-01-11T23:16:31.4290612Z [ OK ] BackendTest.TestComposite (11 ms)
2023-01-11T23:16:31.4290986Z [ RUN ] BackendTest.TestPrimDtype
2023-01-11T23:16:31.4296661Z [ OK ] BackendTest.TestPrimDtype (0 ms)
2023-01-11T23:16:31.4297053Z [ RUN ] BackendTest.TestCompositeWithSetStates
2023-01-11T23:16:31.4411695Z [ OK ] BackendTest.TestCompositeWithSetStates (11 ms)
2023-01-11T23:16:31.4412146Z [ RUN ] BackendTest.TestConsistencyOfCompositeWithSetStates
2023-01-11T23:16:31.4615128Z [ OK ] BackendTest.TestConsistencyOfCompositeWithSetStates (20 ms)
2023-01-11T23:16:31.4615977Z [ RUN ] BackendTest.TestCompilerNotSupport
2023-01-11T23:16:31.4635680Z [ OK ] BackendTest.TestCompilerNotSupport (1 ms)
2023-01-11T23:16:31.4636078Z [----------] 9 tests from BackendTest (67 ms total)
2023-01-11T23:16:31.4636247Z
2023-01-11T23:16:31.4636446Z [----------] 6 tests from BackendTestDebugInfo
2023-01-11T23:16:31.4636887Z [ RUN ] BackendTestDebugInfo.TestCompiler
2023-01-11T23:16:31.4766215Z [ OK ] BackendTestDebugInfo.TestCompiler (13 ms)
2023-01-11T23:16:31.4766661Z [ RUN ] BackendTestDebugInfo.TestCompilerWithStringTable
2023-01-11T23:16:31.4899983Z [ OK ] BackendTestDebugInfo.TestCompilerWithStringTable (13 ms)
2023-01-11T23:16:31.4900780Z [ RUN ] BackendTestDebugInfo.TestExceptionStackForCompilerWithModuleHierarchy
2023-01-11T23:16:31.5035699Z [ OK ] BackendTestDebugInfo.TestExceptionStackForCompilerWithModuleHierarchy (13 ms)
2023-01-11T23:16:31.5036365Z [ RUN ] BackendTestDebugInfo.TestExceptionStackForCompilerWithTwoLevelModuleHierarchy
2023-01-11T23:16:31.5168402Z [ OK ] BackendTestDebugInfo.TestExceptionStackForCompilerWithTwoLevelModuleHierarchy (13 ms)
2023-01-11T23:16:31.5169051Z [ RUN ] BackendTestDebugInfo.TestExceptionStackForCompilerWithLoweredSubModule
2023-01-11T23:16:31.5307867Z [ OK ] BackendTestDebugInfo.TestExceptionStackForCompilerWithLoweredSubModule (13 ms)
2023-01-11T23:16:31.5309178Z [ RUN ] BackendTestDebugInfo.TestExceptionStackForCompilerWithSelectiveLoweredSubModule
2023-01-11T23:16:31.5446897Z [ OK ] BackendTestDebugInfo.TestExceptionStackForCompilerWithSelectiveLoweredSubModule (13 ms)
2023-01-11T23:16:31.5448168Z [----------] 6 tests from BackendTestDebugInfo (81 ms total)
2023-01-11T23:16:31.5448514Z
2023-01-11T23:16:31.5448834Z [----------] 4 tests from ClassImportTest
2023-01-11T23:16:31.5449430Z [ RUN ] ClassImportTest.Basic
2023-01-11T23:16:31.5453405Z [ OK ] ClassImportTest.Basic (0 ms)
2023-01-11T23:16:31.5454281Z [ RUN ] ClassImportTest.ScriptObject
2023-01-11T23:16:31.5478513Z [ OK ] ClassImportTest.ScriptObject (2 ms)
2023-01-11T23:16:31.5478989Z [ RUN ] ClassImportTest.ClassDerive
2023-01-11T23:16:31.5479429Z [ OK ] ClassImportTest.ClassDerive (0 ms)
2023-01-11T23:16:31.5479960Z [ RUN ] ClassImportTest.CustomClass
2023-01-11T23:16:31.5480413Z [ OK ] ClassImportTest.CustomClass (0 ms)
2023-01-11T23:16:31.5480904Z [----------] 4 tests from ClassImportTest (3 ms total)
2023-01-11T23:16:31.5481123Z
2023-01-11T23:16:31.5481345Z [----------] 1 test from ClassParserTest
2023-01-11T23:16:31.5481705Z [ RUN ] ClassParserTest.Basic
2023-01-11T23:16:31.5482020Z [ OK ] ClassParserTest.Basic (0 ms)
2023-01-11T23:16:31.5482365Z [----------] 1 test from ClassParserTest (0 ms total)
2023-01-11T23:16:31.5482530Z
2023-01-11T23:16:31.5482680Z [----------] 3 tests from ClassTypeTest
2023-01-11T23:16:31.5482988Z [ RUN ] ClassTypeTest.AddRemoveAttr
2023-01-11T23:16:31.5483325Z [ OK ] ClassTypeTest.AddRemoveAttr (0 ms)
2023-01-11T23:16:31.5483655Z [ RUN ] ClassTypeTest.AddRemoveConstant
2023-01-11T23:16:31.5484009Z [ OK ] ClassTypeTest.AddRemoveConstant (0 ms)
2023-01-11T23:16:31.5484385Z [ RUN ] ClassTypeTest.IdenticalTypesDifferentCus
2023-01-11T23:16:31.5494095Z [ OK ] ClassTypeTest.IdenticalTypesDifferentCus (1 ms)
2023-01-11T23:16:31.5494869Z [----------] 3 tests from ClassTypeTest (1 ms total)
2023-01-11T23:16:31.5495144Z
2023-01-11T23:16:31.5495385Z [----------] 2 tests from TestCodeTemplate
2023-01-11T23:16:31.5495795Z [ RUN ] TestCodeTemplate.Copying
2023-01-11T23:16:31.5496115Z [ OK ] TestCodeTemplate.Copying (0 ms)
2023-01-11T23:16:31.5496425Z [ RUN ] TestCodeTemplate.Formatting
2023-01-11T23:16:31.5496751Z [ OK ] TestCodeTemplate.Formatting (0 ms)
2023-01-11T23:16:31.5497094Z [----------] 2 tests from TestCodeTemplate (0 ms total)
2023-01-11T23:16:31.5497255Z
2023-01-11T23:16:31.5497413Z [----------] 13 tests from ConcatOptTest
2023-01-11T23:16:31.5497788Z [ RUN ] ConcatOptTest.SimpleCommonInputsEliminationPrefix
2023-01-11T23:16:31.5531333Z [ OK ] ConcatOptTest.SimpleCommonInputsEliminationPrefix (3 ms)
2023-01-11T23:16:31.5531854Z [ RUN ] ConcatOptTest.SimpleCommonInputsEliminationSuffix
2023-01-11T23:16:31.5569671Z [ OK ] ConcatOptTest.SimpleCommonInputsEliminationSuffix (3 ms)
2023-01-11T23:16:31.5570244Z [ RUN ] ConcatOptTest.CommonInputsEliminationWithDifferentOrderInputs
2023-01-11T23:16:31.5608227Z [ OK ] ConcatOptTest.CommonInputsEliminationWithDifferentOrderInputs (3 ms)
2023-01-11T23:16:31.5608749Z [ RUN ] ConcatOptTest.MoreCommonInputsElimination
2023-01-11T23:16:31.5673938Z [ OK ] ConcatOptTest.MoreCommonInputsElimination (6 ms)
2023-01-11T23:16:31.5674384Z [ RUN ] ConcatOptTest.ExpandConcat
2023-01-11T23:16:31.5702642Z [ OK ] ConcatOptTest.ExpandConcat (2 ms)
2023-01-11T23:16:31.5703705Z [ RUN ] ConcatOptTest.ConcatWithoutResultShape
2023-01-11T23:16:31.5725101Z [ OK ] ConcatOptTest.ConcatWithoutResultShape (2 ms)
2023-01-11T23:16:31.5725567Z [ RUN ] ConcatOptTest.ConcatWithoutInputShape
2023-01-11T23:16:31.5755320Z [ OK ] ConcatOptTest.ConcatWithoutInputShape (2 ms)
2023-01-11T23:16:31.5755739Z [ RUN ] ConcatOptTest.UseVariadicCat
2023-01-11T23:16:31.5818803Z [ OK ] ConcatOptTest.UseVariadicCat (6 ms)
2023-01-11T23:16:31.5819275Z [ RUN ] ConcatOptTest.UseVariadicCatWithMultipleListUses
2023-01-11T23:16:31.5848277Z [ OK ] ConcatOptTest.UseVariadicCatWithMultipleListUses (2 ms)
2023-01-11T23:16:31.5849226Z [ RUN ] ConcatOptTest.UseVariadicCatWithListMutationAfterCat
2023-01-11T23:16:31.5884752Z [ OK ] ConcatOptTest.UseVariadicCatWithListMutationAfterCat (3 ms)
2023-01-11T23:16:31.5885311Z [ RUN ] ConcatOptTest.UseVariadicCatWithListMutationBeforeCat
2023-01-11T23:16:31.5921501Z [ OK ] ConcatOptTest.UseVariadicCatWithListMutationBeforeCat (3 ms)
2023-01-11T23:16:31.5922058Z [ RUN ] ConcatOptTest.UseVariadicCatWithMultipleListMutations
2023-01-11T23:16:31.5988253Z [ OK ] ConcatOptTest.UseVariadicCatWithMultipleListMutations (6 ms)
2023-01-11T23:16:31.5988885Z [ RUN ] ConcatOptTest.RemoveListMutationUseVariadicCatAndCommonInputsElimination
2023-01-11T23:16:31.6027899Z [ OK ] ConcatOptTest.RemoveListMutationUseVariadicCatAndCommonInputsElimination (3 ms)
2023-01-11T23:16:31.6028457Z [----------] 13 tests from ConcatOptTest (53 ms total)
2023-01-11T23:16:31.6028660Z
2023-01-11T23:16:31.6028858Z [----------] 1 test from OptimizeConcatTest
2023-01-11T23:16:31.6029303Z [ RUN ] OptimizeConcatTest.UseVariadicCatReplaceMultiple
2023-01-11T23:16:31.6079894Z [ OK ] OptimizeConcatTest.UseVariadicCatReplaceMultiple (5 ms)
2023-01-11T23:16:31.6080371Z [----------] 1 test from OptimizeConcatTest (5 ms total)
2023-01-11T23:16:31.6080576Z
2023-01-11T23:16:31.6080752Z [----------] 3 tests from ConcatOpt
2023-01-11T23:16:31.6081136Z [ RUN ] ConcatOpt.CombineConcatsSimpleCase
2023-01-11T23:16:31.6081559Z [ OK ] ConcatOpt.CombineConcatsSimpleCase (0 ms)
2023-01-11T23:16:31.6081980Z [ RUN ] ConcatOpt.CombineConcatsLongChain
2023-01-11T23:16:31.6084864Z [ OK ] ConcatOpt.CombineConcatsLongChain (0 ms)
2023-01-11T23:16:31.6085478Z [ RUN ] ConcatOpt.CombineConcatsMutation
2023-01-11T23:16:31.6085882Z [ OK ] ConcatOpt.CombineConcatsMutation (0 ms)
2023-01-11T23:16:31.6086286Z [----------] 3 tests from ConcatOpt (0 ms total)
2023-01-11T23:16:31.6086470Z
2023-01-11T23:16:31.6086691Z [----------] 4 tests from ConstantPoolingTest
2023-01-11T23:16:31.6087044Z [ RUN ] ConstantPoolingTest.Int
2023-01-11T23:16:31.6087411Z [ OK ] ConstantPoolingTest.Int (0 ms)
2023-01-11T23:16:31.6087831Z [ RUN ] ConstantPoolingTest.PoolingAcrossBlocks
2023-01-11T23:16:31.6088281Z [ OK ] ConstantPoolingTest.PoolingAcrossBlocks (0 ms)
2023-01-11T23:16:31.6088946Z [ RUN ] ConstantPoolingTest.PoolingDifferentDevices
2023-01-11T23:16:31.6090340Z [ OK ] ConstantPoolingTest.PoolingDifferentDevices (0 ms)
2023-01-11T23:16:31.6090806Z [ RUN ] ConstantPoolingTest.DictConstantPooling
2023-01-11T23:16:31.6091252Z [ OK ] ConstantPoolingTest.DictConstantPooling (0 ms)
2023-01-11T23:16:31.6095395Z [----------] 4 tests from ConstantPoolingTest (0 ms total)
2023-01-11T23:16:31.6095645Z
2023-01-11T23:16:31.6095855Z [----------] 1 test from CleanupPassTest
2023-01-11T23:16:31.6096213Z [ RUN ] CleanupPassTest.Basic
2023-01-11T23:16:31.6096578Z [ OK ] CleanupPassTest.Basic (0 ms)
2023-01-11T23:16:31.6096956Z [----------] 1 test from CleanupPassTest (0 ms total)
2023-01-11T23:16:31.6097151Z
2023-01-11T23:16:31.6097374Z [----------] 1 test from CreateAutodiffSubgraphsTest
2023-01-11T23:16:31.6097782Z [ RUN ] CreateAutodiffSubgraphsTest.Basic
2023-01-11T23:16:31.6098481Z [ OK ] CreateAutodiffSubgraphsTest.Basic (0 ms)
2023-01-11T23:16:31.6098930Z [----------] 1 test from CreateAutodiffSubgraphsTest (0 ms total)
2023-01-11T23:16:31.6099150Z
2023-01-11T23:16:31.6099334Z [----------] 4 tests from CustomClassTest
2023-01-11T23:16:31.6099714Z [ RUN ] CustomClassTest.TorchbindIValueAPI
2023-01-11T23:16:31.6105798Z [ OK ] CustomClassTest.TorchbindIValueAPI (0 ms)
2023-01-11T23:16:31.6106199Z [ RUN ] CustomClassTest.ScalarTypeClass
2023-01-11T23:16:31.6108342Z [ OK ] CustomClassTest.ScalarTypeClass (0 ms)
2023-01-11T23:16:31.6108725Z [ RUN ] CustomClassTest.TestDocString
2023-01-11T23:16:31.6109298Z [ OK ] CustomClassTest.TestDocString (0 ms)
2023-01-11T23:16:31.6109677Z [ RUN ] CustomClassTest.Serialization
2023-01-11T23:16:31.6123649Z [ OK ] CustomClassTest.Serialization (1 ms)
2023-01-11T23:16:31.6124055Z [----------] 4 tests from CustomClassTest (2 ms total)
2023-01-11T23:16:31.6124246Z
2023-01-11T23:16:31.6124444Z [----------] 5 tests from CustomOperatorTest
2023-01-11T23:16:31.6124833Z [ RUN ] CustomOperatorTest.InferredSchema
2023-01-11T23:16:31.6126336Z [ OK ] CustomOperatorTest.InferredSchema (0 ms)
2023-01-11T23:16:31.6126767Z [ RUN ] CustomOperatorTest.ExplicitSchema
2023-01-11T23:16:31.6129166Z [ OK ] CustomOperatorTest.ExplicitSchema (0 ms)
2023-01-11T23:16:31.6130980Z [ RUN ] CustomOperatorTest.ListParameters
2023-01-11T23:16:31.6131550Z [ OK ] CustomOperatorTest.ListParameters (0 ms)
2023-01-11T23:16:31.6132098Z [ RUN ] CustomOperatorTest.ListParameters2
2023-01-11T23:16:31.6132670Z [ OK ] CustomOperatorTest.ListParameters2 (0 ms)
2023-01-11T23:16:31.6133158Z [ RUN ] CustomOperatorTest.Aliasing
2023-01-11T23:16:31.6135336Z [ OK ] CustomOperatorTest.Aliasing (0 ms)
2023-01-11T23:16:31.6136074Z [----------] 5 tests from CustomOperatorTest (1 ms total)
2023-01-11T23:16:31.6136260Z
2023-01-11T23:16:31.6136441Z [----------] 2 tests from TestCustomOperator
2023-01-11T23:16:31.6136905Z [ RUN ] TestCustomOperator.OperatorGeneratorUndeclared
2023-01-11T23:16:31.6137344Z [ OK ] TestCustomOperator.OperatorGeneratorUndeclared (0 ms)
2023-01-11T23:16:31.6137764Z [ RUN ] TestCustomOperator.OperatorGeneratorBasic
2023-01-11T23:16:31.6138171Z [ OK ] TestCustomOperator.OperatorGeneratorBasic (0 ms)
2023-01-11T23:16:31.6138559Z [----------] 2 tests from TestCustomOperator (0 ms total)
2023-01-11T23:16:31.6138750Z
2023-01-11T23:16:31.6138973Z [----------] 1 test from EliminateDeadCodeTest
2023-01-11T23:16:31.6139301Z [ RUN ] EliminateDeadCodeTest.Basic
2023-01-11T23:16:31.6139657Z [ OK ] EliminateDeadCodeTest.Basic (0 ms)
2023-01-11T23:16:31.6140341Z [----------] 1 test from EliminateDeadCodeTest (0 ms total)
2023-01-11T23:16:31.6140526Z
2023-01-11T23:16:31.6140677Z [----------] 5 tests from FuserTest
2023-01-11T23:16:31.6140980Z [ RUN ] FuserTest.TestSimple_CUDA
2023-01-11T23:16:31.7485821Z [ OK ] FuserTest.TestSimple_CUDA (134 ms)
2023-01-11T23:16:31.7486176Z [ RUN ] FuserTest.TestOne_CUDA
2023-01-11T23:16:32.3000667Z [ OK ] FuserTest.TestOne_CUDA (551 ms)
2023-01-11T23:16:32.3001034Z [ RUN ] FuserTest.FusedConcat_CUDA
2023-01-11T23:16:32.7379216Z [ OK ] FuserTest.FusedConcat_CUDA (437 ms)
2023-01-11T23:16:32.7379633Z [ RUN ] FuserTest.FusionAliasing
2023-01-11T23:16:32.7384753Z [ OK ] FuserTest.FusionAliasing (0 ms)
2023-01-11T23:16:32.7385896Z [ RUN ] FuserTest.KernelCaching
2023-01-11T23:16:32.7386779Z [ OK ] FuserTest.KernelCaching (0 ms)
2023-01-11T23:16:32.7387270Z [----------] 5 tests from FuserTest (1124 ms total)
2023-01-11T23:16:32.7387504Z
2023-01-11T23:16:32.7387747Z [----------] 2 tests from GraphExecutorTest
2023-01-11T23:16:32.7388182Z [ RUN ] GraphExecutorTest.Basic_CUDA
2023-01-11T23:16:33.3595824Z [ OK ] GraphExecutorTest.Basic_CUDA (620 ms)
2023-01-11T23:16:33.3596424Z [ RUN ] GraphExecutorTest.runAsync_executor
2023-01-11T23:16:33.3640659Z [ OK ] GraphExecutorTest.runAsync_executor (4 ms)
2023-01-11T23:16:33.3641219Z [----------] 2 tests from GraphExecutorTest (625 ms total)
2023-01-11T23:16:33.3641483Z
2023-01-11T23:16:33.3641728Z [----------] 5 tests from GraphIteratorTest
2023-01-11T23:16:33.3642481Z [ RUN ] GraphIteratorTest.ConstantReturnGraph
2023-01-11T23:16:33.3642913Z [ OK ] GraphIteratorTest.ConstantReturnGraph (0 ms)
2023-01-11T23:16:33.3643440Z [ RUN ] GraphIteratorTest.GraphWithParameters
2023-01-11T23:16:33.3643868Z [ OK ] GraphIteratorTest.GraphWithParameters (0 ms)
2023-01-11T23:16:33.3644232Z [ RUN ] GraphIteratorTest.GraphWithIf
2023-01-11T23:16:33.3644586Z [ OK ] GraphIteratorTest.GraphWithIf (0 ms)
2023-01-11T23:16:33.3644931Z [ RUN ] GraphIteratorTest.GraphWithNestedIf
2023-01-11T23:16:33.3645298Z [ OK ] GraphIteratorTest.GraphWithNestedIf (0 ms)
2023-01-11T23:16:33.3645651Z [ RUN ] GraphIteratorTest.GraphWithLoop
2023-01-11T23:16:33.3645998Z [ OK ] GraphIteratorTest.GraphWithLoop (0 ms)
2023-01-11T23:16:33.3646350Z [----------] 5 tests from GraphIteratorTest (0 ms total)
2023-01-11T23:16:33.3646526Z
2023-01-11T23:16:33.3646723Z [----------] 1 test from CSDebugInfoSerializaitionTest
2023-01-11T23:16:33.3647102Z [ RUN ] CSDebugInfoSerializaitionTest.TwoSubmodules
2023-01-11T23:16:33.3650661Z [ OK ] CSDebugInfoSerializaitionTest.TwoSubmodules (0 ms)
2023-01-11T23:16:33.3651180Z [----------] 1 test from CSDebugInfoSerializaitionTest (0 ms total)
2023-01-11T23:16:33.3651382Z
2023-01-11T23:16:33.3651532Z [----------] 1 test from InlinerTest
2023-01-11T23:16:33.3651813Z [ RUN ] InlinerTest.Basic
2023-01-11T23:16:33.3652679Z [ OK ] InlinerTest.Basic (0 ms)
2023-01-11T23:16:33.3653071Z [----------] 1 test from InlinerTest (0 ms total)
2023-01-11T23:16:33.3653293Z
2023-01-11T23:16:33.3653473Z [----------] 1 test from InterfaceTest
2023-01-11T23:16:33.3653823Z [ RUN ] InterfaceTest.ModuleInterfaceSerialization
2023-01-11T23:16:33.3668181Z [ OK ] InterfaceTest.ModuleInterfaceSerialization (1 ms)
2023-01-11T23:16:33.3668593Z [----------] 1 test from InterfaceTest (1 ms total)
2023-01-11T23:16:33.3668855Z
2023-01-11T23:16:33.3669154Z [----------] 5 tests from TypeCheckTest
2023-01-11T23:16:33.3669740Z [ RUN ] TypeCheckTest.MatchingType
2023-01-11T23:16:33.3670187Z [ OK ] TypeCheckTest.MatchingType (0 ms)
2023-01-11T23:16:33.3670590Z [ RUN ] TypeCheckTest.SizeMismatch
2023-01-11T23:16:33.3670944Z [ OK ] TypeCheckTest.SizeMismatch (0 ms)
2023-01-11T23:16:33.3671310Z [ RUN ] TypeCheckTest.GradientMismatch
2023-01-11T23:16:33.3671823Z [ OK ] TypeCheckTest.GradientMismatch (0 ms)
2023-01-11T23:16:33.3672423Z [ RUN ] TypeCheckTest.ScalarTypeMismatch
2023-01-11T23:16:33.3673645Z [ OK ] TypeCheckTest.ScalarTypeMismatch (0 ms)
2023-01-11T23:16:33.3674094Z [ RUN ] TypeCheckTest.DeviceMismatch_CUDA
2023-01-11T23:16:33.3675008Z [ OK ] TypeCheckTest.DeviceMismatch_CUDA (0 ms)
2023-01-11T23:16:33.3675403Z [----------] 5 tests from TypeCheckTest (0 ms total)
2023-01-11T23:16:33.3675609Z
2023-01-11T23:16:33.3675834Z [----------] 4 tests from InterpreterTest
2023-01-11T23:16:33.3676160Z [ RUN ] InterpreterTest.Basic_CUDA
2023-01-11T23:16:33.3681836Z [ OK ] InterpreterTest.Basic_CUDA (0 ms)
2023-01-11T23:16:33.3682197Z [ RUN ] InterpreterTest.IgnorableArgsInSchema
2023-01-11T23:16:33.3685238Z [ OK ] InterpreterTest.IgnorableArgsInSchema (0 ms)
2023-01-11T23:16:33.3685673Z [ RUN ] InterpreterTest.IgnorableArgsInSchemaWithOut
2023-01-11T23:16:33.3686177Z [ OK ] InterpreterTest.IgnorableArgsInSchemaWithOut (0 ms)
2023-01-11T23:16:33.3686559Z [ RUN ] InterpreterTest.runAsyncBasicTest
2023-01-11T23:16:33.3718780Z [ OK ] InterpreterTest.runAsyncBasicTest (3 ms)
2023-01-11T23:16:33.3719325Z [----------] 4 tests from InterpreterTest (4 ms total)
2023-01-11T23:16:33.3719536Z
2023-01-11T23:16:33.3719757Z [----------] 1 test from EnableRethrowCaughtExceptionTest
2023-01-11T23:16:33.3720324Z [ RUN ] EnableRethrowCaughtExceptionTest.EnableRethrowCaughtExceptionTestRethrowsCaughtException
2023-01-11T23:16:33.3921075Z [ OK ] EnableRethrowCaughtExceptionTest.EnableRethrowCaughtExceptionTestRethrowsCaughtException (20 ms)
2023-01-11T23:16:33.3921809Z [----------] 1 test from EnableRethrowCaughtExceptionTest (20 ms total)
2023-01-11T23:16:33.3922094Z
2023-01-11T23:16:33.3922302Z [----------] 4 tests from IRTest
2023-01-11T23:16:33.3922663Z [ RUN ] IRTest.Attributes
2023-01-11T23:16:33.3923035Z [ OK ] IRTest.Attributes (0 ms)
2023-01-11T23:16:33.3923405Z [ RUN ] IRTest.Blocks
2023-01-11T23:16:33.3923805Z [ OK ] IRTest.Blocks (0 ms)
2023-01-11T23:16:33.3924110Z [ RUN ] IRTest.CommonAncestor
2023-01-11T23:16:33.3924417Z [ OK ] IRTest.CommonAncestor (0 ms)
2023-01-11T23:16:33.3924709Z [ RUN ] IRTest.OperatorMap
2023-01-11T23:16:33.3924994Z [ OK ] IRTest.OperatorMap (0 ms)
2023-01-11T23:16:33.3925378Z [----------] 4 tests from IRTest (0 ms total)
2023-01-11T23:16:33.3925537Z
2023-01-11T23:16:33.3925701Z [----------] 21 tests from IRParserTest
2023-01-11T23:16:33.3926040Z [ RUN ] IRParserTest.Basic
2023-01-11T23:16:33.3926336Z [ OK ] IRParserTest.Basic (0 ms)
2023-01-11T23:16:33.3926638Z [ RUN ] IRParserTest.NestedBlock
2023-01-11T23:16:33.3927015Z [ OK ] IRParserTest.NestedBlock (0 ms)
2023-01-11T23:16:33.3927403Z [ RUN ] IRParserTest.If
2023-01-11T23:16:33.3927798Z [ OK ] IRParserTest.If (0 ms)
2023-01-11T23:16:33.3928081Z [ RUN ] IRParserTest.If2
2023-01-11T23:16:33.3928354Z [ OK ] IRParserTest.If2 (0 ms)
2023-01-11T23:16:33.3928688Z [ RUN ] IRParserTest.InferredTypeIsTensor
2023-01-11T23:16:33.3929046Z [ OK ] IRParserTest.InferredTypeIsTensor (0 ms)
2023-01-11T23:16:33.3929368Z [ RUN ] IRParserTest.ValueReuse
2023-01-11T23:16:33.3929929Z [ OK ] IRParserTest.ValueReuse (0 ms)
2023-01-11T23:16:33.3930380Z [ RUN ] IRParserTest.Attributes
2023-01-11T23:16:33.3930796Z [ OK ] IRParserTest.Attributes (0 ms)
2023-01-11T23:16:33.3931233Z [ RUN ] IRParserTest.OptionalTypes
2023-01-11T23:16:33.3931626Z [ OK ] IRParserTest.OptionalTypes (0 ms)
2023-01-11T23:16:33.3932046Z [ RUN ] IRParserTest.StarTensor
2023-01-11T23:16:33.3932417Z [ OK ] IRParserTest.StarTensor (0 ms)
2023-01-11T23:16:33.3932730Z [ RUN ] IRParserTest.UnshapedTensor
2023-01-11T23:16:33.3933062Z [ OK ] IRParserTest.UnshapedTensor (0 ms)
2023-01-11T23:16:33.3933402Z [ RUN ] IRParserTest.ShapedTensor
2023-01-11T23:16:33.3933862Z [ OK ] IRParserTest.ShapedTensor (0 ms)
2023-01-11T23:16:33.3934310Z [ RUN ] IRParserTest.NestedContrainer
2023-01-11T23:16:33.3934962Z [ OK ] IRParserTest.NestedContrainer (0 ms)
2023-01-11T23:16:33.3935318Z [ RUN ] IRParserTest.MalformedShapeAnnotation
2023-01-11T23:16:33.3935888Z [ OK ] IRParserTest.MalformedShapeAnnotation (0 ms)
2023-01-11T23:16:33.3936339Z [ RUN ] IRParserTest.FileCheck
2023-01-11T23:16:33.3936650Z [ OK ] IRParserTest.FileCheck (0 ms)
2023-01-11T23:16:33.3936943Z [ RUN ] IRParserTest.Strides
2023-01-11T23:16:33.3937238Z [ OK ] IRParserTest.Strides (0 ms)
2023-01-11T23:16:33.3937544Z [ RUN ] IRParserTest.MalformedStrides
2023-01-11T23:16:33.3937879Z [ OK ] IRParserTest.MalformedStrides (0 ms)
2023-01-11T23:16:33.3938284Z [ RUN ] IRParserTest.TensorShapes
2023-01-11T23:16:33.3938595Z [ OK ] IRParserTest.TensorShapes (0 ms)
2023-01-11T23:16:33.3938952Z [ RUN ] IRParserTest.DeviceAndRequiresGradTensors
2023-01-11T23:16:33.3939372Z [ OK ] IRParserTest.DeviceAndRequiresGradTensors (0 ms)
2023-01-11T23:16:33.3939723Z [ RUN ] IRParserTest.ListConstant
2023-01-11T23:16:33.3940049Z [ OK ] IRParserTest.ListConstant (0 ms)
2023-01-11T23:16:33.3940430Z [ RUN ] IRParserTest.PartialStarTensor
2023-01-11T23:16:33.3940776Z [ OK ] IRParserTest.PartialStarTensor (0 ms)
2023-01-11T23:16:33.3941135Z [ RUN ] IRParserTest.ComplexTensorAttributes
2023-01-11T23:16:33.3941559Z [ OK ] IRParserTest.ComplexTensorAttributes (0 ms)
2023-01-11T23:16:33.3941930Z [----------] 21 tests from IRParserTest (1 ms total)
2023-01-11T23:16:33.3942094Z
2023-01-11T23:16:33.3942240Z [----------] 2 tests from JitTypeTest
2023-01-11T23:16:33.3942539Z [ RUN ] JitTypeTest.IsComplete
2023-01-11T23:16:33.3942845Z [ OK ] JitTypeTest.IsComplete (0 ms)
2023-01-11T23:16:33.3943149Z [ RUN ] JitTypeTest.UnifyTypes
2023-01-11T23:16:33.3943446Z [ OK ] JitTypeTest.UnifyTypes (0 ms)
2023-01-11T23:16:33.3943774Z [----------] 2 tests from JitTypeTest (0 ms total)
2023-01-11T23:16:33.3943944Z
2023-01-11T23:16:33.3944136Z [----------] 42 tests from LiteInterpreterTest
2023-01-11T23:16:33.3944517Z [ RUN ] LiteInterpreterTest.UpsampleNearest2d
2023-01-11T23:16:33.3950902Z [ OK ] LiteInterpreterTest.UpsampleNearest2d (1 ms)
2023-01-11T23:16:33.3951323Z [ RUN ] LiteInterpreterTest.CheckAttrAccess
2023-01-11T23:16:33.3952697Z [ OK ] LiteInterpreterTest.CheckAttrAccess (0 ms)
2023-01-11T23:16:33.3953129Z [ RUN ] LiteInterpreterTest.MethodInvocation
2023-01-11T23:16:33.3981699Z [ OK ] LiteInterpreterTest.MethodInvocation (2 ms)
2023-01-11T23:16:33.3982425Z [ RUN ] LiteInterpreterTest.Conv
2023-01-11T23:16:33.4014251Z [ OK ] LiteInterpreterTest.Conv (3 ms)
2023-01-11T23:16:33.4015019Z [ RUN ] LiteInterpreterTest.Inline
2023-01-11T23:16:33.4022721Z [ OK ] LiteInterpreterTest.Inline (0 ms)
2023-01-11T23:16:33.4023074Z [ RUN ] LiteInterpreterTest.Tuple
2023-01-11T23:16:33.4028834Z [ OK ] LiteInterpreterTest.Tuple (0 ms)
2023-01-11T23:16:33.4029178Z [ RUN ] LiteInterpreterTest.AtenFormat
2023-01-11T23:16:33.4035231Z [ OK ] LiteInterpreterTest.AtenFormat (0 ms)
2023-01-11T23:16:33.4035608Z [ RUN ] LiteInterpreterTest.PrimDevice
2023-01-11T23:16:33.4039395Z [ OK ] LiteInterpreterTest.PrimDevice (0 ms)
2023-01-11T23:16:33.4039728Z [ RUN ] LiteInterpreterTest.Dict
2023-01-11T23:16:33.4046196Z [ OK ] LiteInterpreterTest.Dict (0 ms)
2023-01-11T23:16:33.4046526Z [ RUN ] LiteInterpreterTest.List
2023-01-11T23:16:33.4054654Z [ OK ] LiteInterpreterTest.List (0 ms)
2023-01-11T23:16:33.4055010Z [ RUN ] LiteInterpreterTest.PrimOverload
2023-01-11T23:16:33.4055374Z [ OK ] LiteInterpreterTest.PrimOverload (0 ms)
2023-01-11T23:16:33.4055698Z [ RUN ] LiteInterpreterTest.Prim
2023-01-11T23:16:33.4059394Z [ OK ] LiteInterpreterTest.Prim (0 ms)
2023-01-11T23:16:33.4059731Z [ RUN ] LiteInterpreterTest.PrimScalar
2023-01-11T23:16:33.4063948Z [ OK ] LiteInterpreterTest.PrimScalar (0 ms)
2023-01-11T23:16:33.4064289Z [ RUN ] LiteInterpreterTest.LoadOrigJit
2023-01-11T23:16:33.4121650Z [ OK ] LiteInterpreterTest.LoadOrigJit (5 ms)
2023-01-11T23:16:33.4122010Z [ RUN ] LiteInterpreterTest.WrongMethodName
2023-01-11T23:16:33.4142432Z [ OK ] LiteInterpreterTest.WrongMethodName (2 ms)
2023-01-11T23:16:33.4142908Z [ RUN ] LiteInterpreterTest.SetState
2023-01-11T23:16:33.4169027Z [ OK ] LiteInterpreterTest.SetState (2 ms)
2023-01-11T23:16:33.4169378Z [ RUN ] LiteInterpreterTest.BuiltinClass
2023-01-11T23:16:33.4176967Z [ OK ] LiteInterpreterTest.BuiltinClass (0 ms)
2023-01-11T23:16:33.4177341Z [ RUN ] LiteInterpreterTest.BuiltinFunction
2023-01-11T23:16:33.4180283Z [ OK ] LiteInterpreterTest.BuiltinFunction (0 ms)
2023-01-11T23:16:33.4180687Z [ RUN ] LiteInterpreterTest.GetRuntimeByteCodeVersion
2023-01-11T23:16:33.4181194Z [ OK ] LiteInterpreterTest.GetRuntimeByteCodeVersion (0 ms)
2023-01-11T23:16:33.4181671Z [ RUN ] LiteInterpreterTest.GetRuntimeOperatorsVersion
2023-01-11T23:16:33.4182103Z [ OK ] LiteInterpreterTest.GetRuntimeOperatorsVersion (0 ms)
2023-01-11T23:16:33.4182507Z [ RUN ] LiteInterpreterTest.GetByteCodeVersion
2023-01-11T23:16:33.4182891Z [ OK ] LiteInterpreterTest.GetByteCodeVersion (0 ms)
2023-01-11T23:16:33.4183263Z [ RUN ] LiteInterpreterTest.GetContainTypes
2023-01-11T23:16:33.4183836Z [ OK ] LiteInterpreterTest.GetContainTypes (0 ms)
2023-01-11T23:16:33.4184300Z [ RUN ] LiteInterpreterTest.BackPortByteCodeModelAllVersions
2023-01-11T23:16:33.5107250Z [ OK ] LiteInterpreterTest.BackPortByteCodeModelAllVersions (92 ms)
2023-01-11T23:16:33.5107714Z [ RUN ] LiteInterpreterTest.GetRuntimeOpsAndInfo
2023-01-11T23:16:33.5164467Z [ OK ] LiteInterpreterTest.GetRuntimeOpsAndInfo (5 ms)
2023-01-11T23:16:33.5164864Z [ RUN ] LiteInterpreterTest.isCompatibleSuccess
2023-01-11T23:16:33.5214165Z [ OK ] LiteInterpreterTest.isCompatibleSuccess (4 ms)
2023-01-11T23:16:33.5215408Z [ RUN ] LiteInterpreterTest.isCompatibleFail
2023-01-11T23:16:33.5304884Z [ OK ] LiteInterpreterTest.isCompatibleFail (8 ms)
2023-01-11T23:16:33.5305599Z [ RUN ] LiteInterpreterTest.Eval
2023-01-11T23:16:33.5315644Z [ OK ] LiteInterpreterTest.Eval (1 ms)
2023-01-11T23:16:33.5316356Z [ RUN ] LiteInterpreterTest.FindWrongMethodName
2023-01-11T23:16:33.5318032Z [ OK ] LiteInterpreterTest.FindWrongMethodName (0 ms)
2023-01-11T23:16:33.5318417Z [ RUN ] LiteInterpreterTest.FindAndRunMethod
2023-01-11T23:16:33.5326173Z [ OK ] LiteInterpreterTest.FindAndRunMethod (0 ms)
2023-01-11T23:16:33.5326560Z [ RUN ] LiteInterpreterTest.RunMethodVariadic
2023-01-11T23:16:33.5331607Z [ OK ] LiteInterpreterTest.RunMethodVariadic (0 ms)
2023-01-11T23:16:33.5332006Z [ RUN ] LiteInterpreterTest.DuplicateSetState
2023-01-11T23:16:33.5343244Z [ OK ] LiteInterpreterTest.DuplicateSetState (1 ms)
2023-01-11T23:16:33.5343806Z [ RUN ] LiteInterpreterTest.ExtraFiles
2023-01-11T23:16:33.5347876Z [ OK ] LiteInterpreterTest.ExtraFiles (0 ms)
2023-01-11T23:16:33.5348290Z [ RUN ] LiteInterpreterTest.OpNameExportFetchRootOperators
2023-01-11T23:16:33.5356698Z [ OK ] LiteInterpreterTest.OpNameExportFetchRootOperators (0 ms)
2023-01-11T23:16:33.5357240Z [ RUN ] LiteInterpreterTest.DefaultArgsConv
2023-01-11T23:16:33.5375207Z [ OK ] LiteInterpreterTest.DefaultArgsConv (1 ms)
2023-01-11T23:16:33.5375645Z [ RUN ] LiteInterpreterTest.DefaultArgsPinv
2023-01-11T23:16:33.5470175Z [ OK ] LiteInterpreterTest.DefaultArgsPinv (9 ms)
2023-01-11T23:16:33.5470710Z [ RUN ] LiteInterpreterTest.DefaultArgsTensorinvSpecifyDefault
2023-01-11T23:16:33.5480126Z [ OK ] LiteInterpreterTest.DefaultArgsTensorinvSpecifyDefault (1 ms)
2023-01-11T23:16:33.5480589Z [ RUN ] LiteInterpreterTest.DefaultArgsPinvWithOutArg
2023-01-11T23:16:33.5513966Z [ OK ] LiteInterpreterTest.DefaultArgsPinvWithOutArg (3 ms)
2023-01-11T23:16:33.5514395Z [ RUN ] LiteInterpreterTest.DefaultArgsWithOutArg
2023-01-11T23:16:33.5518323Z [ OK ] LiteInterpreterTest.DefaultArgsWithOutArg (0 ms)
2023-01-11T23:16:33.5518811Z [ RUN ] LiteInterpreterTest.TestExceptionStackWithTwoLevelModuleHierarchy
2023-01-11T23:16:33.5619252Z [ OK ] LiteInterpreterTest.TestExceptionStackWithTwoLevelModuleHierarchy (10 ms)
2023-01-11T23:16:33.5619852Z [ RUN ] LiteInterpreterTest.OperatorCacheDifferentiatesDefaultArgs
2023-01-11T23:16:33.5652508Z [ OK ] LiteInterpreterTest.OperatorCacheDifferentiatesDefaultArgs (3 ms)
2023-01-11T23:16:33.5653006Z [ RUN ] LiteInterpreterTest.OperatorSize1
2023-01-11T23:16:33.5656437Z [ OK ] LiteInterpreterTest.OperatorSize1 (0 ms)
2023-01-11T23:16:33.5656873Z [ RUN ] LiteInterpreterTest.OperatorTest2
2023-01-11T23:16:33.5673703Z [ OK ] LiteInterpreterTest.OperatorTest2 (1 ms)
2023-01-11T23:16:33.5674220Z [----------] 42 tests from LiteInterpreterTest (173 ms total)
2023-01-11T23:16:33.5674457Z
2023-01-11T23:16:33.5674668Z [----------] 3 tests from RunTimeTest
2023-01-11T23:16:33.5675051Z [ RUN ] RunTimeTest.ParseBytecode
2023-01-11T23:16:33.5675428Z [ OK ] RunTimeTest.ParseBytecode (0 ms)
2023-01-11T23:16:33.5675743Z [ RUN ] RunTimeTest.ParseOperator
2023-01-11T23:16:33.5676151Z [ OK ] RunTimeTest.ParseOperator (0 ms)
2023-01-11T23:16:33.5676568Z [ RUN ] RunTimeTest.RuntimeCall
2023-01-11T23:16:33.5676978Z [ OK ] RunTimeTest.RuntimeCall (0 ms)
2023-01-11T23:16:33.5677402Z [----------] 3 tests from RunTimeTest (0 ms total)
2023-01-11T23:16:33.5677617Z
2023-01-11T23:16:33.5677878Z [----------] 11 tests from LiteInterpreterUpgraderTest
2023-01-11T23:16:33.5678336Z [ RUN ] LiteInterpreterUpgraderTest.DivTensorV2
2023-01-11T23:16:33.5678740Z [ OK ] LiteInterpreterUpgraderTest.DivTensorV2 (0 ms)
2023-01-11T23:16:33.5679259Z [ RUN ] LiteInterpreterUpgraderTest.DivTensorOutV2
2023-01-11T23:16:33.5679668Z [ OK ] LiteInterpreterUpgraderTest.DivTensorOutV2 (0 ms)
2023-01-11T23:16:33.5680171Z [ RUN ] LiteInterpreterUpgraderTest.DivTensorInplaceV2
2023-01-11T23:16:33.5680764Z [ OK ] LiteInterpreterUpgraderTest.DivTensorInplaceV2 (0 ms)
2023-01-11T23:16:33.5681281Z [ RUN ] LiteInterpreterUpgraderTest.DivScalarFloatV2
2023-01-11T23:16:33.5681842Z [ OK ] LiteInterpreterUpgraderTest.DivScalarFloatV2 (0 ms)
2023-01-11T23:16:33.5682446Z [ RUN ] LiteInterpreterUpgraderTest.DivScalarReciprocalFloatV2
2023-01-11T23:16:33.5682755Z expect output: 0.5000
2023-01-11T23:16:33.5682975Z [ CPUFloatType{1} ]actual output: 0.5000
2023-01-11T23:16:33.5683506Z [ CPUFloatType{1} ][ OK ] LiteInterpreterUpgraderTest.DivScalarReciprocalFloatV2 (0 ms)
2023-01-11T23:16:33.5684160Z [ RUN ] LiteInterpreterUpgraderTest.DivScalarReciprocalIntV2
2023-01-11T23:16:33.5684636Z [ OK ] LiteInterpreterUpgraderTest.DivScalarReciprocalIntV2 (0 ms)
2023-01-11T23:16:33.5685076Z [ RUN ] LiteInterpreterUpgraderTest.DivScalarScalarV2
2023-01-11T23:16:33.5685503Z [ OK ] LiteInterpreterUpgraderTest.DivScalarScalarV2 (0 ms)
2023-01-11T23:16:33.5685938Z [ RUN ] LiteInterpreterUpgraderTest.DivScalarIntV2
2023-01-11T23:16:33.5686496Z [ OK ] LiteInterpreterUpgraderTest.DivScalarIntV2 (0 ms)
2023-01-11T23:16:33.5687047Z [ RUN ] LiteInterpreterUpgraderTest.DivScalarInplaceFloatV2
2023-01-11T23:16:33.5687859Z [ OK ] LiteInterpreterUpgraderTest.DivScalarInplaceFloatV2 (0 ms)
2023-01-11T23:16:33.5688586Z [ RUN ] LiteInterpreterUpgraderTest.DivScalarInplaceIntV2
2023-01-11T23:16:33.5689134Z [ OK ] LiteInterpreterUpgraderTest.DivScalarInplaceIntV2 (0 ms)
2023-01-11T23:16:33.5689543Z [ RUN ] LiteInterpreterUpgraderTest.Upgrader
2023-01-11T23:16:33.5689922Z [ OK ] LiteInterpreterUpgraderTest.Upgrader (0 ms)
2023-01-11T23:16:33.5690318Z [----------] 11 tests from LiteInterpreterUpgraderTest (1 ms total)
2023-01-11T23:16:33.5690511Z
2023-01-11T23:16:33.5690706Z [----------] 29 tests from LiteInterpreterDirectTest
2023-01-11T23:16:33.5691083Z [ RUN ] LiteInterpreterDirectTest.UpsampleNearest2d
2023-01-11T23:16:33.5700127Z [ OK ] LiteInterpreterDirectTest.UpsampleNearest2d (1 ms)
2023-01-11T23:16:33.5700658Z [ RUN ] LiteInterpreterDirectTest.CheckAttrAccess
2023-01-11T23:16:33.5701153Z [ OK ] LiteInterpreterDirectTest.CheckAttrAccess (0 ms)
2023-01-11T23:16:33.5701553Z [ RUN ] LiteInterpreterDirectTest.MethodInvocation
2023-01-11T23:16:33.5705695Z hello
2023-01-11T23:16:33.5705959Z hello 3
2023-01-11T23:16:33.5713482Z hello
2023-01-11T23:16:33.5713925Z hello 3
2023-01-11T23:16:33.5718804Z hello
2023-01-11T23:16:33.5719254Z hello 3
2023-01-11T23:16:33.5719738Z [ OK ] LiteInterpreterDirectTest.MethodInvocation (1 ms)
2023-01-11T23:16:33.5720246Z [ RUN ] LiteInterpreterDirectTest.Conv
2023-01-11T23:16:33.5740275Z [ OK ] LiteInterpreterDirectTest.Conv (2 ms)
2023-01-11T23:16:33.5740766Z [ RUN ] LiteInterpreterDirectTest.Inline
2023-01-11T23:16:33.5745394Z [ OK ] LiteInterpreterDirectTest.Inline (0 ms)
2023-01-11T23:16:33.5745922Z [ RUN ] LiteInterpreterDirectTest.Tuple
2023-01-11T23:16:33.5749957Z [ OK ] LiteInterpreterDirectTest.Tuple (0 ms)
2023-01-11T23:16:33.5750433Z [ RUN ] LiteInterpreterDirectTest.Dict
2023-01-11T23:16:33.5754721Z [ OK ] LiteInterpreterDirectTest.Dict (0 ms)
2023-01-11T23:16:33.5755221Z [ RUN ] LiteInterpreterDirectTest.Prim
2023-01-11T23:16:33.5756199Z [ OK ] LiteInterpreterDirectTest.Prim (0 ms)
2023-01-11T23:16:33.5756832Z [ RUN ] LiteInterpreterDirectTest.PrimScalar
2023-01-11T23:16:33.5759548Z [ OK ] LiteInterpreterDirectTest.PrimScalar (0 ms)
2023-01-11T23:16:33.5760109Z [ RUN ] LiteInterpreterDirectTest.WrongMethodName
2023-01-11T23:16:33.5778379Z [ OK ] LiteInterpreterDirectTest.WrongMethodName (1 ms)
2023-01-11T23:16:33.5778888Z [ RUN ] LiteInterpreterDirectTest.SetState
2023-01-11T23:16:33.5796282Z [ OK ] LiteInterpreterDirectTest.SetState (1 ms)
2023-01-11T23:16:33.5796884Z [ RUN ] LiteInterpreterDirectTest.BuiltinFunction
2023-01-11T23:16:33.5797816Z [ OK ] LiteInterpreterDirectTest.BuiltinFunction (0 ms)
2023-01-11T23:16:33.5798477Z [ RUN ] LiteInterpreterDirectTest.GetRuntimeByteCodeVersion
2023-01-11T23:16:33.5798961Z [ OK ] LiteInterpreterDirectTest.GetRuntimeByteCodeVersion (0 ms)
2023-01-11T23:16:33.5799422Z [ RUN ] LiteInterpreterDirectTest.GetRuntimeOperatorsVersion
2023-01-11T23:16:33.5799886Z [ OK ] LiteInterpreterDirectTest.GetRuntimeOperatorsVersion (0 ms)
2023-01-11T23:16:33.5800322Z [ RUN ] LiteInterpreterDirectTest.GetByteCodeVersion
2023-01-11T23:16:33.5800738Z [ OK ] LiteInterpreterDirectTest.GetByteCodeVersion (0 ms)
2023-01-11T23:16:33.5801160Z [ RUN ] LiteInterpreterDirectTest.GetRuntimeOpsAndInfo
2023-01-11T23:16:33.5854140Z [ OK ] LiteInterpreterDirectTest.GetRuntimeOpsAndInfo (5 ms)
2023-01-11T23:16:33.5854959Z [ RUN ] LiteInterpreterDirectTest.Eval
2023-01-11T23:16:33.5860767Z [ OK ] LiteInterpreterDirectTest.Eval (0 ms)
2023-01-11T23:16:33.5861295Z [ RUN ] LiteInterpreterDirectTest.FindWrongMethodName
2023-01-11T23:16:33.5862536Z [ OK ] LiteInterpreterDirectTest.FindWrongMethodName (0 ms)
2023-01-11T23:16:33.5863100Z [ RUN ] LiteInterpreterDirectTest.FindAndRunMethod
2023-01-11T23:16:33.5868854Z [ OK ] LiteInterpreterDirectTest.FindAndRunMethod (0 ms)
2023-01-11T23:16:33.5869410Z [ RUN ] LiteInterpreterDirectTest.RunMethodVariadic
2023-01-11T23:16:33.5874840Z [ OK ] LiteInterpreterDirectTest.RunMethodVariadic (0 ms)
2023-01-11T23:16:33.5875396Z [ RUN ] LiteInterpreterDirectTest.DuplicateSetState
2023-01-11T23:16:33.5879170Z [ OK ] LiteInterpreterDirectTest.DuplicateSetState (0 ms)
2023-01-11T23:16:33.5879798Z [ RUN ] LiteInterpreterDirectTest.OpNameExportFetchRootOperators
2023-01-11T23:16:33.5884098Z [ OK ] LiteInterpreterDirectTest.OpNameExportFetchRootOperators (0 ms)
2023-01-11T23:16:33.5884711Z [ RUN ] LiteInterpreterDirectTest.DefaultArgsConv
2023-01-11T23:16:33.5897423Z [ OK ] LiteInterpreterDirectTest.DefaultArgsConv (1 ms)
2023-01-11T23:16:33.5897979Z [ RUN ] LiteInterpreterDirectTest.DefaultArgsPinv
2023-01-11T23:16:33.5983820Z [ OK ] LiteInterpreterDirectTest.DefaultArgsPinv (8 ms)
2023-01-11T23:16:33.5984308Z [ RUN ] LiteInterpreterDirectTest.DefaultArgsTensorinvSpecifyDefault
2023-01-11T23:16:33.5990220Z [ OK ] LiteInterpreterDirectTest.DefaultArgsTensorinvSpecifyDefault (0 ms)
2023-01-11T23:16:33.5990780Z [ RUN ] LiteInterpreterDirectTest.DefaultArgsPinvWithOutArg
2023-01-11T23:16:33.6019564Z [ OK ] LiteInterpreterDirectTest.DefaultArgsPinvWithOutArg (2 ms)
2023-01-11T23:16:33.6020027Z [ RUN ] LiteInterpreterDirectTest.DefaultArgsWithOutArg
2023-01-11T23:16:33.6024095Z [ OK ] LiteInterpreterDirectTest.DefaultArgsWithOutArg (0 ms)
2023-01-11T23:16:33.6024623Z [ RUN ] LiteInterpreterDirectTest.TestExceptionStackWithTwoLevelModuleHierarchy
2023-01-11T23:16:33.6119584Z [ OK ] LiteInterpreterDirectTest.TestExceptionStackWithTwoLevelModuleHierarchy (9 ms)
2023-01-11T23:16:33.6120373Z [ RUN ] LiteInterpreterDirectTest.OperatorCacheDifferentiatesDefaultArgs
2023-01-11T23:16:33.6140562Z [ OK ] LiteInterpreterDirectTest.OperatorCacheDifferentiatesDefaultArgs (2 ms)
2023-01-11T23:16:33.6141058Z [----------] 29 tests from LiteInterpreterDirectTest (45 ms total)
2023-01-11T23:16:33.6141239Z
2023-01-11T23:16:33.6141402Z [----------] 7 tests from LiteTrainerTest
2023-01-11T23:16:33.6141706Z [ RUN ] LiteTrainerTest.Params
2023-01-11T23:16:33.6224224Z [ OK ] LiteTrainerTest.Params (8 ms)
2023-01-11T23:16:33.6224525Z [ RUN ] LiteTrainerTest.SGD
2023-01-11T23:16:33.6302632Z [ OK ] LiteTrainerTest.SGD (7 ms)
2023-01-11T23:16:33.6302970Z [ RUN ] LiteTrainerTest.SequentialSampler
2023-01-11T23:16:33.6303358Z [ OK ] LiteTrainerTest.SequentialSampler (0 ms)
2023-01-11T23:16:33.6303779Z [ RUN ] LiteTrainerTest.RandomSamplerReturnsIndicesInCorrectRange
2023-01-11T23:16:33.6304283Z [ OK ] LiteTrainerTest.RandomSamplerReturnsIndicesInCorrectRange (0 ms)
2023-01-11T23:16:33.6304779Z [ RUN ] LiteTrainerTest.RandomSamplerReturnsLessValuesForLastBatch
2023-01-11T23:16:33.6305276Z [ OK ] LiteTrainerTest.RandomSamplerReturnsLessValuesForLastBatch (0 ms)
2023-01-11T23:16:33.6305716Z [ RUN ] LiteTrainerTest.RandomSamplerResetsWell
2023-01-11T23:16:33.6306104Z [ OK ] LiteTrainerTest.RandomSamplerResetsWell (0 ms)
2023-01-11T23:16:33.6306531Z [ RUN ] LiteTrainerTest.RandomSamplerResetsWithNewSizeWell
2023-01-11T23:16:33.6306980Z [ OK ] LiteTrainerTest.RandomSamplerResetsWithNewSizeWell (0 ms)
2023-01-11T23:16:33.6307590Z [----------] 7 tests from LiteTrainerTest (16 ms total)
2023-01-11T23:16:33.6307765Z
2023-01-11T23:16:33.6307917Z [----------] 6 tests from MobileTest
2023-01-11T23:16:33.6308232Z [ RUN ] MobileTest.SaveLoadParametersEmpty
2023-01-11T23:16:33.6308786Z [ OK ] MobileTest.SaveLoadParametersEmpty (0 ms)
2023-01-11T23:16:33.6309158Z [ RUN ] MobileTest.SaveParametersDefaultsToZip
2023-01-11T23:16:33.6309602Z [ OK ] MobileTest.SaveParametersDefaultsToZip (0 ms)
2023-01-11T23:16:33.6310015Z [ RUN ] MobileTest.SaveParametersCanUseFlatbuffer
2023-01-11T23:16:33.6310419Z [ OK ] MobileTest.SaveParametersCanUseFlatbuffer (0 ms)
2023-01-11T23:16:33.6310899Z [ RUN ] MobileTest.SaveLoadParametersUsingFlatbuffers
2023-01-11T23:16:33.6311710Z [ OK ] MobileTest.SaveLoadParametersUsingFlatbuffers (0 ms)
2023-01-11T23:16:33.6312360Z [ RUN ] MobileTest.LoadParametersUnexpectedFormatShouldThrow
2023-01-11T23:16:33.6333215Z [ OK ] MobileTest.LoadParametersUnexpectedFormatShouldThrow (2 ms)
2023-01-11T23:16:33.6333667Z [ RUN ] MobileTest.LoadParametersEmptyDataShouldThrow
2023-01-11T23:16:33.6355558Z [ OK ] MobileTest.LoadParametersEmptyDataShouldThrow (2 ms)
2023-01-11T23:16:33.6356138Z [----------] 6 tests from MobileTest (4 ms total)
2023-01-11T23:16:33.6356356Z
2023-01-11T23:16:33.6356522Z [----------] 1 test from MemoryDAGTest
2023-01-11T23:16:33.6356874Z [ RUN ] MemoryDAGTest.Basic
2023-01-11T23:16:33.6357208Z [ OK ] MemoryDAGTest.Basic (0 ms)
2023-01-11T23:16:33.6357570Z [----------] 1 test from MemoryDAGTest (0 ms total)
2023-01-11T23:16:33.6357769Z
2023-01-11T23:16:33.6357973Z [----------] 1 test from InternedStringsTest
2023-01-11T23:16:33.6358285Z [ RUN ] InternedStringsTest.Basic
2023-01-11T23:16:33.6358597Z [ OK ] InternedStringsTest.Basic (0 ms)
2023-01-11T23:16:33.6358952Z [----------] 1 test from InternedStringsTest (0 ms total)
2023-01-11T23:16:33.6359149Z
2023-01-11T23:16:33.6359350Z [----------] 1 test from FromQualStringTest
2023-01-11T23:16:33.6359818Z [ RUN ] FromQualStringTest.Basic
2023-01-11T23:16:33.6360178Z [ OK ] FromQualStringTest.Basic (0 ms)
2023-01-11T23:16:33.6360520Z [----------] 1 test from FromQualStringTest (0 ms total)
2023-01-11T23:16:33.6360709Z
2023-01-11T23:16:33.6360905Z [----------] 1 test from THNNConvTest
2023-01-11T23:16:33.6361186Z [ RUN ] THNNConvTest.Basic
2023-01-11T23:16:33.6383926Z [ OK ] THNNConvTest.Basic (2 ms)
2023-01-11T23:16:33.6384335Z [----------] 1 test from THNNConvTest (2 ms total)
2023-01-11T23:16:33.6384539Z
2023-01-11T23:16:33.6384760Z [----------] 1 test from ATenNativeBatchNormTest
2023-01-11T23:16:33.6385172Z [ RUN ] ATenNativeBatchNormTest.Basic
2023-01-11T23:16:33.6401275Z [ OK ] ATenNativeBatchNormTest.Basic (1 ms)
2023-01-11T23:16:33.6401669Z [----------] 1 test from ATenNativeBatchNormTest (1 ms total)
2023-01-11T23:16:33.6401849Z
2023-01-11T23:16:33.6402023Z [----------] 2 tests from CustomFusionTest
2023-01-11T23:16:33.6402331Z [ RUN ] CustomFusionTest.Basic
2023-01-11T23:16:33.6402797Z [ OK ] CustomFusionTest.Basic (0 ms)
2023-01-11T23:16:33.6403111Z [ RUN ] CustomFusionTest.NestedBlocks
2023-01-11T23:16:33.6404098Z [ OK ] CustomFusionTest.NestedBlocks (0 ms)
2023-01-11T23:16:33.6404458Z [----------] 2 tests from CustomFusionTest (0 ms total)
2023-01-11T23:16:33.6404629Z
2023-01-11T23:16:33.6404787Z [----------] 1 test from ControlFlowTest
2023-01-11T23:16:33.6405075Z [ RUN ] ControlFlowTest.Basic
2023-01-11T23:16:33.6411239Z [ OK ] ControlFlowTest.Basic (0 ms)
2023-01-11T23:16:33.6411702Z [----------] 1 test from ControlFlowTest (0 ms total)
2023-01-11T23:16:33.6411924Z
2023-01-11T23:16:33.6412108Z [----------] 1 test from ProtoTest
2023-01-11T23:16:33.6412445Z [ RUN ] ProtoTest.Basic
2023-01-11T23:16:33.6412803Z [ OK ] ProtoTest.Basic (0 ms)
2023-01-11T23:16:33.6413199Z [----------] 1 test from ProtoTest (0 ms total)
2023-01-11T23:16:33.6413381Z
2023-01-11T23:16:33.6413589Z [----------] 9 tests from SchemaParserTest
2023-01-11T23:16:33.6414015Z [ RUN ] SchemaParserTest.NestedArrays
2023-01-11T23:16:33.6414431Z [ OK ] SchemaParserTest.NestedArrays (0 ms)
2023-01-11T23:16:33.6414966Z [ RUN ] SchemaParserTest.OutVariant
2023-01-11T23:16:33.6415290Z [ OK ] SchemaParserTest.OutVariant (0 ms)
2023-01-11T23:16:33.6415629Z [ RUN ] SchemaParserTest.NamedReturns
2023-01-11T23:16:33.6416047Z [ OK ] SchemaParserTest.NamedReturns (0 ms)
2023-01-11T23:16:33.6416392Z [ RUN ] SchemaParserTest.Futures
2023-01-11T23:16:33.6416765Z [ OK ] SchemaParserTest.Futures (0 ms)
2023-01-11T23:16:33.6417154Z [ RUN ] SchemaParserTest.AnnotatedAliasSets
2023-01-11T23:16:33.6417543Z [ OK ] SchemaParserTest.AnnotatedAliasSets (0 ms)
2023-01-11T23:16:33.6418078Z [ RUN ] SchemaParserTest.TensorListAnnotatedAliasSets
2023-01-11T23:16:33.6418627Z [ OK ] SchemaParserTest.TensorListAnnotatedAliasSets (0 ms)
2023-01-11T23:16:33.6419060Z [ RUN ] SchemaParserTest.AnnotatedAliasWithoutBeforeSet
2023-01-11T23:16:33.6419490Z [ OK ] SchemaParserTest.AnnotatedAliasWithoutBeforeSet (0 ms)
2023-01-11T23:16:33.6419877Z [ RUN ] SchemaParserTest.BeforeAfterSets
2023-01-11T23:16:33.6420231Z [ OK ] SchemaParserTest.BeforeAfterSets (0 ms)
2023-01-11T23:16:33.6420577Z [ RUN ] SchemaParserTest.BeforeAfterSets2
2023-01-11T23:16:33.6420939Z [ OK ] SchemaParserTest.BeforeAfterSets2 (0 ms)
2023-01-11T23:16:33.6421301Z [----------] 9 tests from SchemaParserTest (0 ms total)
2023-01-11T23:16:33.6421470Z
2023-01-11T23:16:33.6421728Z [----------] 2 tests from TopologicalIndexTest
2023-01-11T23:16:33.6422064Z [ RUN ] TopologicalIndexTest.Basic
2023-01-11T23:16:33.6422439Z [ OK ] TopologicalIndexTest.Basic (0 ms)
2023-01-11T23:16:33.6422785Z [ RUN ] TopologicalIndexTest.Reindex
2023-01-11T23:16:33.6423189Z [ OK ] TopologicalIndexTest.Reindex (0 ms)
2023-01-11T23:16:33.6423553Z [----------] 2 tests from TopologicalIndexTest (0 ms total)
2023-01-11T23:16:33.6423730Z
2023-01-11T23:16:33.6423900Z [----------] 7 tests from RecordFunctionTest
2023-01-11T23:16:33.6424258Z [ RUN ] RecordFunctionTest.TracedTestInputsOutputs
2023-01-11T23:16:33.6424665Z [ OK ] RecordFunctionTest.TracedTestInputsOutputs (0 ms)
2023-01-11T23:16:33.6425053Z [ RUN ] RecordFunctionTest.SampledCallbacks
2023-01-11T23:16:33.6499564Z [ OK ] RecordFunctionTest.SampledCallbacks (7 ms)
2023-01-11T23:16:33.6500168Z [ RUN ] RecordFunctionTest.RecordFunctionGuard
2023-01-11T23:16:33.6500655Z [ OK ] RecordFunctionTest.RecordFunctionGuard (0 ms)
2023-01-11T23:16:33.6501011Z [ RUN ] RecordFunctionTest.Callbacks
2023-01-11T23:16:33.6503098Z [ OK ] RecordFunctionTest.Callbacks (0 ms)
2023-01-11T23:16:33.6503512Z [ RUN ] RecordFunctionTest.ShouldRun
2023-01-11T23:16:33.6503857Z [ OK ] RecordFunctionTest.ShouldRun (0 ms)
2023-01-11T23:16:33.6504173Z [ RUN ] RecordFunctionTest.Basic
2023-01-11T23:16:33.6505045Z [ OK ] RecordFunctionTest.Basic (0 ms)
2023-01-11T23:16:33.6505551Z [ RUN ] RecordFunctionTest.OperatorNameOverload
2023-01-11T23:16:33.6506168Z [ OK ] RecordFunctionTest.OperatorNameOverload (0 ms)
2023-01-11T23:16:33.6506590Z [----------] 7 tests from RecordFunctionTest (8 ms total)
2023-01-11T23:16:33.6506777Z
2023-01-11T23:16:33.6507004Z [----------] 1 test from ThreadLocalDebugInfoTest
2023-01-11T23:16:33.6507473Z [ RUN ] ThreadLocalDebugInfoTest.Basic
2023-01-11T23:16:33.6507917Z [ OK ] ThreadLocalDebugInfoTest.Basic (0 ms)
2023-01-11T23:16:33.6508291Z [----------] 1 test from ThreadLocalDebugInfoTest (0 ms total)
2023-01-11T23:16:33.6508475Z
2023-01-11T23:16:33.6508651Z [----------] 1 test from TestSymIntArrayRef
2023-01-11T23:16:33.6508988Z [ RUN ] TestSymIntArrayRef.BasicConversion
2023-01-11T23:16:33.6509413Z [ OK ] TestSymIntArrayRef.BasicConversion (0 ms)
2023-01-11T23:16:33.6509902Z [----------] 1 test from TestSymIntArrayRef (0 ms total)
2023-01-11T23:16:33.6510128Z
2023-01-11T23:16:33.6510309Z [----------] 4 tests from TestSymInt
2023-01-11T23:16:33.6510778Z [ RUN ] TestSymInt.NarrowCopyWithSymbolicInt
2023-01-11T23:16:33.6511159Z [ OK ] TestSymInt.NarrowCopyWithSymbolicInt (0 ms)
2023-01-11T23:16:33.6511503Z [ RUN ] TestSymInt.NarrowCopy
2023-01-11T23:16:33.6511844Z [ OK ] TestSymInt.NarrowCopy (0 ms)
2023-01-11T23:16:33.6512175Z [ RUN ] TestSymInt.AddSymbolicInt
2023-01-11T23:16:33.6512559Z [ OK ] TestSymInt.AddSymbolicInt (0 ms)
2023-01-11T23:16:33.6512912Z [ RUN ] TestSymInt.TestSymIntToSymNodeDispatch
2023-01-11T23:16:33.6513303Z [ OK ] TestSymInt.TestSymIntToSymNodeDispatch (0 ms)
2023-01-11T23:16:33.6513659Z [----------] 4 tests from TestSymInt (0 ms total)
2023-01-11T23:16:33.6513816Z
2023-01-11T23:16:33.6513983Z [----------] 1 test from FallbackGraphsTest
2023-01-11T23:16:33.6514291Z [ RUN ] FallbackGraphsTest.Basic
2023-01-11T23:16:33.6519260Z [ OK ] FallbackGraphsTest.Basic (0 ms)
2023-01-11T23:16:33.6519633Z [----------] 1 test from FallbackGraphsTest (0 ms total)
2023-01-11T23:16:33.6519838Z
2023-01-11T23:16:33.6520055Z [----------] 1 test from NoneSchemaMatchTest
2023-01-11T23:16:33.6520473Z [ RUN ] NoneSchemaMatchTest.Basic
2023-01-11T23:16:33.6520798Z [ OK ] NoneSchemaMatchTest.Basic (0 ms)
2023-01-11T23:16:33.6521148Z [----------] 1 test from NoneSchemaMatchTest (0 ms total)
2023-01-11T23:16:33.6521321Z
2023-01-11T23:16:33.6521489Z [----------] 1 test from PassManagementTest
2023-01-11T23:16:33.6521794Z [ RUN ] PassManagementTest.Basic
2023-01-11T23:16:33.6522118Z [ OK ] PassManagementTest.Basic (0 ms)
2023-01-11T23:16:33.6522463Z [----------] 1 test from PassManagementTest (0 ms total)
2023-01-11T23:16:33.6522669Z
2023-01-11T23:16:33.6522885Z [----------] 5 tests from LoopPeelerTest
2023-01-11T23:16:33.6523233Z [ RUN ] LoopPeelerTest.NoInductionVariableUse
2023-01-11T23:16:33.6524967Z [ OK ] LoopPeelerTest.NoInductionVariableUse (0 ms)
2023-01-11T23:16:33.6525359Z [ RUN ] LoopPeelerTest.YesInductionVariableUse
2023-01-11T23:16:33.6528281Z [ OK ] LoopPeelerTest.YesInductionVariableUse (0 ms)
2023-01-11T23:16:33.6528684Z [ RUN ] LoopPeelerTest.LoopWithTerminationCondition
2023-01-11T23:16:33.6533027Z [ OK ] LoopPeelerTest.LoopWithTerminationCondition (0 ms)
2023-01-11T23:16:33.6533413Z [ RUN ] LoopPeelerTest.SimpleNestedLoops
2023-01-11T23:16:33.6539779Z [ OK ] LoopPeelerTest.SimpleNestedLoops (0 ms)
2023-01-11T23:16:33.6540144Z [ RUN ] LoopPeelerTest.SimpleNestedLoops2
2023-01-11T23:16:33.6548310Z [ OK ] LoopPeelerTest.SimpleNestedLoops2 (0 ms)
2023-01-11T23:16:33.6548679Z [----------] 5 tests from LoopPeelerTest (2 ms total)
2023-01-11T23:16:33.6548934Z
2023-01-11T23:16:33.6549075Z [----------] 1 test from JitTracing
2023-01-11T23:16:33.6549348Z [ RUN ] JitTracing.Basic
2023-01-11T23:16:33.6681478Z [ OK ] JitTracing.Basic (13 ms)
2023-01-11T23:16:33.6681806Z [----------] 1 test from JitTracing (13 ms total)
2023-01-11T23:16:33.6681954Z
2023-01-11T23:16:33.6682175Z [----------] 1 test from InsertAndEliminateRedundantGuardsTest
2023-01-11T23:16:33.6682580Z [ RUN ] InsertAndEliminateRedundantGuardsTest.Basic
2023-01-11T23:16:33.6687411Z [ OK ] InsertAndEliminateRedundantGuardsTest.Basic (0 ms)
2023-01-11T23:16:33.6687859Z [----------] 1 test from InsertAndEliminateRedundantGuardsTest (0 ms total)
2023-01-11T23:16:33.6688073Z
2023-01-11T23:16:33.6688238Z [----------] 1 test from InsertBailOutsTest
2023-01-11T23:16:33.6688544Z [ RUN ] InsertBailOutsTest.Basic
2023-01-11T23:16:33.6697274Z [ OK ] InsertBailOutsTest.Basic (0 ms)
2023-01-11T23:16:33.6697623Z [----------] 1 test from InsertBailOutsTest (0 ms total)
2023-01-11T23:16:33.6697795Z
2023-01-11T23:16:33.6697950Z [----------] 2 tests from ProfilerTest
2023-01-11T23:16:33.6698233Z [ RUN ] ProfilerTest.Basic
2023-01-11T23:16:33.6795995Z [ OK ] ProfilerTest.Basic (9 ms)
2023-01-11T23:16:33.6796450Z [ RUN ] ProfilerTest.OptionalProfiling
2023-01-11T23:16:33.6797466Z [ OK ] ProfilerTest.OptionalProfiling (0 ms)
2023-01-11T23:16:33.6797824Z [----------] 2 tests from ProfilerTest (10 ms total)
2023-01-11T23:16:33.6797987Z
2023-01-11T23:16:33.6798143Z [----------] 2 tests from CallStackTest
2023-01-11T23:16:33.6798418Z [ RUN ] CallStackTest.Basic
2023-01-11T23:16:33.6802930Z [ OK ] CallStackTest.Basic (0 ms)
2023-01-11T23:16:33.6803228Z [ RUN ] CallStackTest.Caching
2023-01-11T23:16:33.6806650Z [ OK ] CallStackTest.Caching (0 ms)
2023-01-11T23:16:33.6806995Z [----------] 2 tests from CallStackTest (0 ms total)
2023-01-11T23:16:33.6807155Z
2023-01-11T23:16:33.6807330Z [----------] 2 tests from InlinedCallStackTest
2023-01-11T23:16:33.6807773Z [ RUN ] InlinedCallStackTest.BlockAnnotation
2023-01-11T23:16:33.6813331Z [ OK ] InlinedCallStackTest.BlockAnnotation (0 ms)
2023-01-11T23:16:33.6813708Z [ RUN ] InlinedCallStackTest.SelfCallMethods
2023-01-11T23:16:33.6822932Z [ OK ] InlinedCallStackTest.SelfCallMethods (0 ms)
2023-01-11T23:16:33.6823353Z [----------] 2 tests from InlinedCallStackTest (1 ms total)
2023-01-11T23:16:33.6823575Z
2023-01-11T23:16:33.6823756Z [----------] 1 test from AutogradSymbolsTest
2023-01-11T23:16:33.6824131Z [ RUN ] AutogradSymbolsTest.Basic
2023-01-11T23:16:33.6824452Z [ OK ] AutogradSymbolsTest.Basic (0 ms)
2023-01-11T23:16:33.6824791Z [----------] 1 test from AutogradSymbolsTest (0 ms total)
2023-01-11T23:16:33.6824969Z
2023-01-11T23:16:33.6825159Z [----------] 1 test from DefaultArgTypeHintingTest
2023-01-11T23:16:33.6825502Z [ RUN ] DefaultArgTypeHintingTest.Basic
2023-01-11T23:16:33.6825853Z [ OK ] DefaultArgTypeHintingTest.Basic (0 ms)
2023-01-11T23:16:33.6826221Z [----------] 1 test from DefaultArgTypeHintingTest (0 ms total)
2023-01-11T23:16:33.6826412Z
2023-01-11T23:16:33.6826563Z [----------] 5 tests from FuturesTest
2023-01-11T23:16:33.6826839Z [ RUN ] FuturesTest.Basic
2023-01-11T23:16:33.6827115Z [ OK ] FuturesTest.Basic (0 ms)
2023-01-11T23:16:33.6827390Z [ RUN ] FuturesTest.Error
2023-01-11T23:16:33.6838050Z [ OK ] FuturesTest.Error (1 ms)
2023-01-11T23:16:33.6838334Z [ RUN ] FuturesTest.Then
2023-01-11T23:16:33.6839270Z [ OK ] FuturesTest.Then (0 ms)
2023-01-11T23:16:33.6839976Z [ RUN ] FuturesTest.CollectAll
2023-01-11T23:16:33.6840290Z [ OK ] FuturesTest.CollectAll (0 ms)
2023-01-11T23:16:33.6840627Z [ RUN ] FuturesTest.CollectAny
2023-01-11T23:16:33.6841071Z [ OK ] FuturesTest.CollectAny (0 ms)
2023-01-11T23:16:33.6841494Z [----------] 5 tests from FuturesTest (1 ms total)
2023-01-11T23:16:33.6841662Z
2023-01-11T23:16:33.6841845Z [----------] 1 test from TLSFutureCallbacksTest
2023-01-11T23:16:33.6842167Z [ RUN ] TLSFutureCallbacksTest.Basic
2023-01-11T23:16:33.6842864Z [ OK ] TLSFutureCallbacksTest.Basic (0 ms)
2023-01-11T23:16:33.6843385Z [----------] 1 test from TLSFutureCallbacksTest (0 ms total)
2023-01-11T23:16:33.6843649Z
2023-01-11T23:16:33.6844077Z [----------] 1 test from ProfilerDisableInCallbackTest
2023-01-11T23:16:33.6844500Z [ RUN ] ProfilerDisableInCallbackTest.Basic
2023-01-11T23:16:33.6845323Z [ OK ] ProfilerDisableInCallbackTest.Basic (0 ms)
2023-01-11T23:16:33.6845957Z [----------] 1 test from ProfilerDisableInCallbackTest (0 ms total)
2023-01-11T23:16:33.6846242Z
2023-01-11T23:16:33.6846427Z [----------] 2 tests from RecordDebugHandles
2023-01-11T23:16:33.6846746Z [ RUN ] RecordDebugHandles.Basic
2023-01-11T23:16:33.9639832Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:300] Completed Stage: Warm Up
2023-01-11T23:16:33.9640929Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:306] Completed Stage: Collection
2023-01-11T23:16:33.9641507Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:310] Completed Stage: Post Processing
2023-01-11T23:16:33.9642004Z [ OK ] RecordDebugHandles.Basic (279 ms)
2023-01-11T23:16:33.9642342Z [ RUN ] RecordDebugHandles.ScopedCallbacks
2023-01-11T23:16:33.9642765Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:300] Completed Stage: Warm Up
2023-01-11T23:16:33.9647796Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:306] Completed Stage: Collection
2023-01-11T23:16:33.9648295Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:310] Completed Stage: Post Processing
2023-01-11T23:16:33.9648890Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:300] Completed Stage: Warm Up
2023-01-11T23:16:33.9652661Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:306] Completed Stage: Collection
2023-01-11T23:16:33.9653156Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:310] Completed Stage: Post Processing
2023-01-11T23:16:33.9653613Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:300] Completed Stage: Warm Up
2023-01-11T23:16:33.9658818Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:306] Completed Stage: Collection
2023-01-11T23:16:33.9659315Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:310] Completed Stage: Post Processing
2023-01-11T23:16:33.9659734Z [ OK ] RecordDebugHandles.ScopedCallbacks (1 ms)
2023-01-11T23:16:33.9660108Z [----------] 2 tests from RecordDebugHandles (281 ms total)
2023-01-11T23:16:33.9660286Z
2023-01-11T23:16:33.9660457Z [----------] 1 test from IValueKWargsTest
2023-01-11T23:16:33.9660757Z [ RUN ] IValueKWargsTest.Basic
2023-01-11T23:16:33.9666054Z [ OK ] IValueKWargsTest.Basic (0 ms)
2023-01-11T23:16:33.9666416Z [----------] 1 test from IValueKWargsTest (0 ms total)
2023-01-11T23:16:33.9666590Z
2023-01-11T23:16:33.9666757Z [----------] 1 test from ComputeFlopsTest
2023-01-11T23:16:33.9667064Z [ RUN ] ComputeFlopsTest.Basic
2023-01-11T23:16:33.9667548Z [W util.cpp:501] Warning: Failed to compute flops for op aten::conv2d because both input and weight must be size 4. (function computeFlops)
2023-01-11T23:16:33.9668080Z [W util.cpp:516] Warning: Failed to compute flops for op aten::conv2d because stride must be size 2 and cannot be 0. (function computeFlops)
2023-01-11T23:16:33.9668685Z [W util.cpp:472] Warning: Calculating flops for aten::conv2d requires groups, padding, stride, dilation, input_size, and weight_size in saved arguments. (function computeFlops)
2023-01-11T23:16:33.9669213Z [W util.cpp:545] Warning: Calculating flops for aten::mm requires mat1_size and mat2_size in saved arguments. (function computeFlops)
2023-01-11T23:16:33.9669690Z [ OK ] ComputeFlopsTest.Basic (0 ms)
2023-01-11T23:16:33.9670025Z [----------] 1 test from ComputeFlopsTest (0 ms total)
2023-01-11T23:16:33.9670196Z
2023-01-11T23:16:33.9670348Z [----------] 1 test from TestConstant
2023-01-11T23:16:33.9670751Z [ RUN ] TestConstant.TensorGrad
2023-01-11T23:16:33.9671055Z [ OK ] TestConstant.TensorGrad (0 ms)
2023-01-11T23:16:33.9671389Z [----------] 1 test from TestConstant (0 ms total)
2023-01-11T23:16:33.9671550Z
2023-01-11T23:16:33.9671713Z [----------] 1 test from TestMutation
2023-01-11T23:16:33.9672043Z [ RUN ] TestMutation.Basic
2023-01-11T23:16:33.9672355Z [ OK ] TestMutation.Basic (0 ms)
2023-01-11T23:16:33.9672789Z [----------] 1 test from TestMutation (0 ms total)
2023-01-11T23:16:33.9672972Z
2023-01-11T23:16:33.9673183Z [----------] 1 test from TestInplaceToFunctionalActivation
2023-01-11T23:16:33.9673564Z [ RUN ] TestInplaceToFunctionalActivation.Basic
2023-01-11T23:16:33.9673955Z [ OK ] TestInplaceToFunctionalActivation.Basic (0 ms)
2023-01-11T23:16:33.9674375Z [----------] 1 test from TestInplaceToFunctionalActivation (0 ms total)
2023-01-11T23:16:33.9674579Z
2023-01-11T23:16:33.9674747Z [----------] 1 test from TestRegisterShapeOp
2023-01-11T23:16:33.9675053Z [ RUN ] TestRegisterShapeOp.Basic
2023-01-11T23:16:34.0818388Z [ OK ] TestRegisterShapeOp.Basic (114 ms)
2023-01-11T23:16:34.0819163Z [----------] 1 test from TestRegisterShapeOp (114 ms total)
2023-01-11T23:16:34.0819520Z
2023-01-11T23:16:34.0819944Z [----------] 1 test from TestFunctionalToInplaceActivation
2023-01-11T23:16:34.0821040Z [ RUN ] TestFunctionalToInplaceActivation.Basic
2023-01-11T23:16:34.0822146Z [ OK ] TestFunctionalToInplaceActivation.Basic (0 ms)
2023-01-11T23:16:34.0822994Z [----------] 1 test from TestFunctionalToInplaceActivation (0 ms total)
2023-01-11T23:16:34.0823399Z
2023-01-11T23:16:34.0823735Z [----------] 2 tests from TestFunctionExecutor
2023-01-11T23:16:34.0824440Z [ RUN ] TestFunctionExecutor.SimpleExecutorTest
2023-01-11T23:16:34.0825222Z [ OK ] TestFunctionExecutor.SimpleExecutorTest (0 ms)
2023-01-11T23:16:34.0825999Z [ RUN ] TestFunctionExecutor.RunDecompositionTest
2023-01-11T23:16:34.0839371Z [ OK ] TestFunctionExecutor.RunDecompositionTest (1 ms)
2023-01-11T23:16:34.0839807Z [----------] 2 tests from TestFunctionExecutor (2 ms total)
2023-01-11T23:16:34.0839982Z
2023-01-11T23:16:34.0840159Z [----------] 1 test from TestShapeGraphLinting
2023-01-11T23:16:34.0840493Z [ RUN ] TestShapeGraphLinting.Basic
2023-01-11T23:16:34.0843085Z [ OK ] TestShapeGraphLinting.Basic (0 ms)
2023-01-11T23:16:34.0843470Z [----------] 1 test from TestShapeGraphLinting (0 ms total)
2023-01-11T23:16:34.0843657Z
2023-01-11T23:16:34.0843802Z [----------] 1 test from Composed
2023-01-11T23:16:34.0844085Z [ RUN ] Composed.ComposedOp
2023-01-11T23:16:34.3429621Z [ OK ] Composed.ComposedOp (258 ms)
2023-01-11T23:16:34.3430628Z [----------] 1 test from Composed (258 ms total)
2023-01-11T23:16:34.3430980Z
2023-01-11T23:16:34.3431512Z [----------] 1 test from ConstantPropagation
2023-01-11T23:16:34.3432415Z [ RUN ] ConstantPropagation.CustomClassesCanBePropagated
2023-01-11T23:16:34.3434061Z [ OK ] ConstantPropagation.CustomClassesCanBePropagated (0 ms)
2023-01-11T23:16:34.3435018Z [----------] 1 test from ConstantPropagation (0 ms total)
2023-01-11T23:16:34.3435372Z
2023-01-11T23:16:34.3435712Z [----------] 19 tests from MobileTypeParserTest
2023-01-11T23:16:34.3436071Z [ RUN ] MobileTypeParserTest.Int
2023-01-11T23:16:34.3436401Z [ OK ] MobileTypeParserTest.Int (0 ms) 2023-01-11T23:16:34.3436793Z [ RUN ] MobileTypeParserTest.NestedContainersAnnotationStr 2023-01-11T23:16:34.3437259Z [ OK ] MobileTypeParserTest.NestedContainersAnnotationStr (0 ms) 2023-01-11T23:16:34.3437663Z [ RUN ] MobileTypeParserTest.TorchBindClass 2023-01-11T23:16:34.3438033Z [ OK ] MobileTypeParserTest.TorchBindClass (0 ms) 2023-01-11T23:16:34.3438419Z [ RUN ] MobileTypeParserTest.ListOfTorchBindClass 2023-01-11T23:16:34.3438820Z [ OK ] MobileTypeParserTest.ListOfTorchBindClass (0 ms) 2023-01-11T23:16:34.3439285Z [ RUN ] MobileTypeParserTest.NestedContainersAnnotationStrWithSpaces 2023-01-11T23:16:34.3439807Z [ OK ] MobileTypeParserTest.NestedContainersAnnotationStrWithSpaces (0 ms) 2023-01-11T23:16:34.3440223Z [ RUN ] MobileTypeParserTest.NamedTuple 2023-01-11T23:16:34.3440577Z [ OK ] MobileTypeParserTest.NamedTuple (0 ms) 2023-01-11T23:16:34.3440978Z [ RUN ] MobileTypeParserTest.DictNestedNamedTupleTypeList 2023-01-11T23:16:34.3441430Z [ OK ] MobileTypeParserTest.DictNestedNamedTupleTypeList (0 ms) 2023-01-11T23:16:34.3441889Z [ RUN ] MobileTypeParserTest.NamedTupleNestedNamedTupleTypeList 2023-01-11T23:16:34.3442372Z [ OK ] MobileTypeParserTest.NamedTupleNestedNamedTupleTypeList (0 ms) 2023-01-11T23:16:34.3442830Z [ RUN ] MobileTypeParserTest.NamedTupleNestedNamedTuple 2023-01-11T23:16:34.3443265Z [ OK ] MobileTypeParserTest.NamedTupleNestedNamedTuple (0 ms) 2023-01-11T23:16:34.3443638Z [ RUN ] MobileTypeParserTest.Empty 2023-01-11T23:16:34.3468316Z [ OK ] MobileTypeParserTest.Empty (3 ms) 2023-01-11T23:16:34.3469278Z [ RUN ] MobileTypeParserTest.TypoRaises 2023-01-11T23:16:34.3505197Z [ OK ] MobileTypeParserTest.TypoRaises (3 ms) 2023-01-11T23:16:34.3505901Z [ RUN ] MobileTypeParserTest.MismatchBracketRaises 2023-01-11T23:16:34.3542646Z [ OK ] MobileTypeParserTest.MismatchBracketRaises (3 ms) 2023-01-11T23:16:34.3543079Z [ RUN ] MobileTypeParserTest.MismatchBracketRaises2 2023-01-11T23:16:34.3581789Z [ OK ] MobileTypeParserTest.MismatchBracketRaises2 (3 ms) 2023-01-11T23:16:34.3582224Z [ RUN ] MobileTypeParserTest.DictWithoutValueRaises 2023-01-11T23:16:34.3615884Z [ OK ] MobileTypeParserTest.DictWithoutValueRaises (3 ms) 2023-01-11T23:16:34.3616363Z [ RUN ] MobileTypeParserTest.ListArgCountMismatchRaises 2023-01-11T23:16:34.3655913Z [ OK ] MobileTypeParserTest.ListArgCountMismatchRaises (3 ms) 2023-01-11T23:16:34.3656429Z [ RUN ] MobileTypeParserTest.DictArgCountMismatchRaises 2023-01-11T23:16:34.3685424Z [ OK ] MobileTypeParserTest.DictArgCountMismatchRaises (3 ms) 2023-01-11T23:16:34.3685994Z [ RUN ] MobileTypeParserTest.ValidTypeWithExtraStuffRaises 2023-01-11T23:16:34.3708316Z [ OK ] MobileTypeParserTest.ValidTypeWithExtraStuffRaises (2 ms) 2023-01-11T23:16:34.3709320Z [ RUN ] MobileTypeParserTest.NonIdentifierRaises 2023-01-11T23:16:34.3729404Z [ OK ] MobileTypeParserTest.NonIdentifierRaises (2 ms) 2023-01-11T23:16:34.3730462Z [ RUN ] MobileTypeParserTest.DictNestedNamedTupleTypeListRaises 2023-01-11T23:16:34.3773671Z [ OK ] MobileTypeParserTest.DictNestedNamedTupleTypeListRaises (4 ms) 2023-01-11T23:16:34.3774420Z [----------] 19 tests from MobileTypeParserTest (34 ms total) 2023-01-11T23:16:34.3774807Z 2023-01-11T23:16:34.3775038Z [----------] 14 tests from ModuleAPITest 2023-01-11T23:16:34.3775389Z [ RUN ] ModuleAPITest.MethodRunAsync 2023-01-11T23:16:34.3810649Z [ OK ] ModuleAPITest.MethodRunAsync (3 ms) 2023-01-11T23:16:34.3811391Z [ RUN ] ModuleAPITest.Clone 2023-01-11T23:16:34.3811983Z [ OK ] 
ModuleAPITest.Clone (0 ms) 2023-01-11T23:16:34.3812644Z [ RUN ] ModuleAPITest.CloneWithModuleInterface 2023-01-11T23:16:34.3817596Z [ OK ] ModuleAPITest.CloneWithModuleInterface (0 ms) 2023-01-11T23:16:34.3818113Z [ RUN ] ModuleAPITest.Copy 2023-01-11T23:16:34.3818489Z [ OK ] ModuleAPITest.Copy (0 ms) 2023-01-11T23:16:34.3818914Z [ RUN ] ModuleAPITest.DeepCopy 2023-01-11T23:16:34.3819343Z [ OK ] ModuleAPITest.DeepCopy (0 ms) 2023-01-11T23:16:34.3819678Z [ RUN ] ModuleAPITest.DeepCopyString 2023-01-11T23:16:34.3820017Z [ OK ] ModuleAPITest.DeepCopyString (0 ms) 2023-01-11T23:16:34.3820346Z [ RUN ] ModuleAPITest.DeepCopyEnum 2023-01-11T23:16:34.3820675Z [ OK ] ModuleAPITest.DeepCopyEnum (0 ms) 2023-01-11T23:16:34.3821026Z [ RUN ] ModuleAPITest.DeepCopyPreservesAliasing 2023-01-11T23:16:34.3821422Z [ OK ] ModuleAPITest.DeepCopyPreservesAliasing (0 ms) 2023-01-11T23:16:34.3821765Z [ RUN ] ModuleAPITest.Constants 2023-01-11T23:16:34.3822068Z [ OK ] ModuleAPITest.Constants (0 ms) 2023-01-11T23:16:34.3822378Z [ RUN ] ModuleAPITest.Parameters 2023-01-11T23:16:34.3822692Z [ OK ] ModuleAPITest.Parameters (0 ms) 2023-01-11T23:16:34.3822992Z [ RUN ] ModuleAPITest.Define 2023-01-11T23:16:34.3824883Z [ OK ] ModuleAPITest.Define (0 ms) 2023-01-11T23:16:34.3825248Z [ RUN ] ModuleAPITest.Freezing 2023-01-11T23:16:34.3855818Z [ OK ] ModuleAPITest.Freezing (3 ms) 2023-01-11T23:16:34.3856268Z [ RUN ] ModuleAPITest.OfiFreezesTraining 2023-01-11T23:16:34.3888648Z [ OK ] ModuleAPITest.OfiFreezesTraining (3 ms) 2023-01-11T23:16:34.3889370Z [ RUN ] ModuleAPITest.To_CUDA 2023-01-11T23:16:34.3890791Z [ OK ] ModuleAPITest.To_CUDA (0 ms) 2023-01-11T23:16:34.3891701Z [----------] 14 tests from ModuleAPITest (11 ms total) 2023-01-11T23:16:34.3892165Z 2023-01-11T23:16:34.3892646Z [----------] 6 tests from PeepholeOptimizeTest 2023-01-11T23:16:34.3893496Z [ RUN ] PeepholeOptimizeTest.IsAndIsNot 2023-01-11T23:16:34.3894403Z [ OK ] PeepholeOptimizeTest.IsAndIsNot (0 ms) 2023-01-11T23:16:34.3895615Z [ RUN ] PeepholeOptimizeTest.IsAndIsNot2 2023-01-11T23:16:34.3896141Z [ OK ] PeepholeOptimizeTest.IsAndIsNot2 (0 ms) 2023-01-11T23:16:34.3896532Z [ RUN ] PeepholeOptimizeTest.IsAndIsNot3 2023-01-11T23:16:34.3896878Z [ OK ] PeepholeOptimizeTest.IsAndIsNot3 (0 ms) 2023-01-11T23:16:34.3897238Z [ RUN ] PeepholeOptimizeTest.UnwrapOptional 2023-01-11T23:16:34.3897609Z [ OK ] PeepholeOptimizeTest.UnwrapOptional (0 ms) 2023-01-11T23:16:34.3897969Z [ RUN ] PeepholeOptimizeTest.UnwrapOptional2 2023-01-11T23:16:34.3898343Z [ OK ] PeepholeOptimizeTest.UnwrapOptional2 (0 ms) 2023-01-11T23:16:34.3898698Z [ RUN ] PeepholeOptimizeTest.AddMMFusion 2023-01-11T23:16:34.3899048Z [ OK ] PeepholeOptimizeTest.AddMMFusion (0 ms) 2023-01-11T23:16:34.3899409Z [----------] 6 tests from PeepholeOptimizeTest (0 ms total) 2023-01-11T23:16:34.3899589Z 2023-01-11T23:16:34.3899753Z [----------] 5 tests from QualifiedNameTest 2023-01-11T23:16:34.3900091Z [ RUN ] QualifiedNameTest.PrefixConstruction 2023-01-11T23:16:34.3900551Z [ OK ] QualifiedNameTest.PrefixConstruction (0 ms) 2023-01-11T23:16:34.3900918Z [ RUN ] QualifiedNameTest.DottedConstruction 2023-01-11T23:16:34.3901292Z [ OK ] QualifiedNameTest.DottedConstruction (0 ms) 2023-01-11T23:16:34.3901650Z [ RUN ] QualifiedNameTest.BadInputRaises 2023-01-11T23:16:34.3943770Z [ OK ] QualifiedNameTest.BadInputRaises (4 ms) 2023-01-11T23:16:34.3945089Z [ RUN ] QualifiedNameTest.Equality 2023-01-11T23:16:34.3946006Z [ OK ] QualifiedNameTest.Equality (0 ms) 2023-01-11T23:16:34.3946867Z [ RUN ] 
QualifiedNameTest.IsPrefixOf 2023-01-11T23:16:34.3947526Z [ OK ] QualifiedNameTest.IsPrefixOf (0 ms) 2023-01-11T23:16:34.3948231Z [----------] 5 tests from QualifiedNameTest (4 ms total) 2023-01-11T23:16:34.3948570Z 2023-01-11T23:16:34.3948895Z [----------] 7 tests from SerializationTest 2023-01-11T23:16:34.3949613Z [ RUN ] SerializationTest.ExtraFilesHookPreference 2023-01-11T23:16:34.3950851Z [W export_module.cpp:587] Warning: An extra files hook attempted to write metadata.json but this is already written in extra files and so will be skipped. This warning will only appear once per process. (function operator()) 2023-01-11T23:16:34.3952117Z [ OK ] SerializationTest.ExtraFilesHookPreference (0 ms) 2023-01-11T23:16:34.3952997Z [ RUN ] SerializationTest.ExtraFileHooksNoSecret 2023-01-11T23:16:34.3953871Z [ OK ] SerializationTest.ExtraFileHooksNoSecret (0 ms) 2023-01-11T23:16:34.3954738Z [ RUN ] SerializationTest.ExtraFileHooksWithSecret 2023-01-11T23:16:34.3955856Z [ OK ] SerializationTest.ExtraFileHooksWithSecret (0 ms) 2023-01-11T23:16:34.3956428Z [ RUN ] SerializationTest.TypeTags 2023-01-11T23:16:34.3956752Z [ OK ] SerializationTest.TypeTags (0 ms) 2023-01-11T23:16:34.3957101Z [ RUN ] SerializationTest.TestJitStream_CUDA 2023-01-11T23:16:34.4075869Z [ OK ] SerializationTest.TestJitStream_CUDA (11 ms) 2023-01-11T23:16:34.4076417Z [ RUN ] SerializationTest.ParentDirNotExist 2023-01-11T23:16:34.4108324Z [ OK ] SerializationTest.ParentDirNotExist (3 ms) 2023-01-11T23:16:34.4108876Z [ RUN ] SerializationTest.CalculateNecessaryArgsTest 2023-01-11T23:16:34.4109387Z [ OK ] SerializationTest.CalculateNecessaryArgsTest (0 ms) 2023-01-11T23:16:34.4109787Z [----------] 7 tests from SerializationTest (16 ms total) 2023-01-11T23:16:34.4109953Z 2023-01-11T23:16:34.4110125Z [----------] 3 tests from TestSourceRoundTrip 2023-01-11T23:16:34.4110472Z [ RUN ] TestSourceRoundTrip.UpsampleNearest2d 2023-01-11T23:16:34.4126316Z [ OK ] TestSourceRoundTrip.UpsampleNearest2d (1 ms) 2023-01-11T23:16:34.4126837Z [ RUN ] TestSourceRoundTrip.CheckAttrAccess 2023-01-11T23:16:34.4127309Z [ OK ] TestSourceRoundTrip.CheckAttrAccess (0 ms) 2023-01-11T23:16:34.4127679Z [ RUN ] TestSourceRoundTrip.MethodInvocation 2023-01-11T23:16:34.4181249Z [ OK ] TestSourceRoundTrip.MethodInvocation (5 ms) 2023-01-11T23:16:34.4181774Z [----------] 3 tests from TestSourceRoundTrip (7 ms total) 2023-01-11T23:16:34.4182014Z 2023-01-11T23:16:34.4182170Z [----------] 1 test from TestSaveLoad 2023-01-11T23:16:34.4182493Z [ RUN ] TestSaveLoad.LoadWithoutDebugInfo 2023-01-11T23:16:34.4201938Z [ OK ] TestSaveLoad.LoadWithoutDebugInfo (2 ms) 2023-01-11T23:16:34.4202504Z [----------] 1 test from TestSaveLoad (2 ms total) 2023-01-11T23:16:34.4202738Z 2023-01-11T23:16:34.4203025Z [----------] 2 tests from FunctionSchemaIsAliasingTest 2023-01-11T23:16:34.4203466Z [ RUN ] FunctionSchemaIsAliasingTest.Basic 2023-01-11T23:16:34.4203950Z [ OK ] FunctionSchemaIsAliasingTest.Basic (0 ms) 2023-01-11T23:16:34.4204344Z [ RUN ] FunctionSchemaIsAliasingTest.InvalidArgument 2023-01-11T23:16:34.4225299Z [ OK ] FunctionSchemaIsAliasingTest.InvalidArgument (2 ms) 2023-01-11T23:16:34.4225731Z [----------] 2 tests from FunctionSchemaIsAliasingTest (2 ms total) 2023-01-11T23:16:34.4225930Z 2023-01-11T23:16:34.4226185Z [----------] 2 tests from FunctionSchemaIsMutableTest 2023-01-11T23:16:34.4226568Z [ RUN ] FunctionSchemaIsMutableTest.Basic 2023-01-11T23:16:34.4226926Z [ OK ] FunctionSchemaIsMutableTest.Basic (0 ms) 2023-01-11T23:16:34.4227296Z [ RUN ] 
FunctionSchemaIsMutableTest.InvalidArgument 2023-01-11T23:16:34.4257536Z [ OK ] FunctionSchemaIsMutableTest.InvalidArgument (3 ms) 2023-01-11T23:16:34.4257958Z [----------] 2 tests from FunctionSchemaIsMutableTest (3 ms total) 2023-01-11T23:16:34.4258153Z 2023-01-11T23:16:34.4258338Z [----------] 5 tests from SchemaInfoIsMutableTest 2023-01-11T23:16:34.4258662Z [ RUN ] SchemaInfoIsMutableTest.Basic 2023-01-11T23:16:34.4259417Z [ OK ] SchemaInfoIsMutableTest.Basic (0 ms) 2023-01-11T23:16:34.4259780Z [ RUN ] SchemaInfoIsMutableTest.InvalidArgument 2023-01-11T23:16:34.4293857Z [ OK ] SchemaInfoIsMutableTest.InvalidArgument (3 ms) 2023-01-11T23:16:34.4294243Z [ RUN ] SchemaInfoIsMutableTest.AliasingInputs 2023-01-11T23:16:34.4294961Z [ OK ] SchemaInfoIsMutableTest.AliasingInputs (0 ms) 2023-01-11T23:16:34.4295453Z [ RUN ] SchemaInfoIsMutableTest.InstanceNorm 2023-01-11T23:16:34.4295871Z [ OK ] SchemaInfoIsMutableTest.InstanceNorm (0 ms) 2023-01-11T23:16:34.4296226Z [ RUN ] SchemaInfoIsMutableTest.BatchNorm 2023-01-11T23:16:34.4296591Z [ OK ] SchemaInfoIsMutableTest.BatchNorm (0 ms) 2023-01-11T23:16:34.4296968Z [----------] 5 tests from SchemaInfoIsMutableTest (3 ms total) 2023-01-11T23:16:34.4297151Z 2023-01-11T23:16:34.4297352Z [----------] 2 tests from SchemaInfoIsNonDeterministicTest 2023-01-11T23:16:34.4297727Z [ RUN ] SchemaInfoIsNonDeterministicTest.Basic 2023-01-11T23:16:34.4298194Z [ OK ] SchemaInfoIsNonDeterministicTest.Basic (0 ms) 2023-01-11T23:16:34.4298583Z [ RUN ] SchemaInfoIsNonDeterministicTest.Dropout 2023-01-11T23:16:34.4298967Z [ OK ] SchemaInfoIsNonDeterministicTest.Dropout (0 ms) 2023-01-11T23:16:34.4299382Z [----------] 2 tests from SchemaInfoIsNonDeterministicTest (0 ms total) 2023-01-11T23:16:34.4299583Z 2023-01-11T23:16:34.4299773Z [----------] 3 tests from FunctionSchemaMayAliasTest 2023-01-11T23:16:34.4300107Z [ RUN ] FunctionSchemaMayAliasTest.Basic 2023-01-11T23:16:34.4300456Z [ OK ] FunctionSchemaMayAliasTest.Basic (0 ms) 2023-01-11T23:16:34.4300832Z [ RUN ] FunctionSchemaMayAliasTest.InvalidArgument 2023-01-11T23:16:34.4322873Z [ OK ] FunctionSchemaMayAliasTest.InvalidArgument (2 ms) 2023-01-11T23:16:34.4323353Z [ RUN ] FunctionSchemaMayAliasTest.Wildcard 2023-01-11T23:16:34.4323791Z [ OK ] FunctionSchemaMayAliasTest.Wildcard (0 ms) 2023-01-11T23:16:34.4324289Z [----------] 3 tests from FunctionSchemaMayAliasTest (2 ms total) 2023-01-11T23:16:34.4324538Z 2023-01-11T23:16:34.4324809Z [----------] 7 tests from SchemaInfoMayAliasTest 2023-01-11T23:16:34.4325205Z [ RUN ] SchemaInfoMayAliasTest.AliasingInputs 2023-01-11T23:16:34.4325673Z [ OK ] SchemaInfoMayAliasTest.AliasingInputs (0 ms) 2023-01-11T23:16:34.4326060Z [ RUN ] SchemaInfoMayAliasTest.AliasingOutputs 2023-01-11T23:16:34.4326569Z [ OK ] SchemaInfoMayAliasTest.AliasingOutputs (0 ms) 2023-01-11T23:16:34.4327107Z [ RUN ] SchemaInfoMayAliasTest.AliasingInputOutput 2023-01-11T23:16:34.4327617Z [ OK ] SchemaInfoMayAliasTest.AliasingInputOutput (0 ms) 2023-01-11T23:16:34.4328112Z [ RUN ] SchemaInfoMayAliasTest.MultipleWildcardInputs 2023-01-11T23:16:34.4328728Z [ OK ] SchemaInfoMayAliasTest.MultipleWildcardInputs (0 ms) 2023-01-11T23:16:34.4329224Z [ RUN ] SchemaInfoMayAliasTest.MultipleNonWildcardInputs 2023-01-11T23:16:34.4329693Z [W schema_info.cpp:333] Warning: alias::a appears twice in same argument list which will make aliasing checks more conservative. 
(function operator()) 2023-01-11T23:16:34.4330174Z [ OK ] SchemaInfoMayAliasTest.MultipleNonWildcardInputs (0 ms) 2023-01-11T23:16:34.4330622Z [ RUN ] SchemaInfoMayAliasTest.MultipleNonWildcardOutputs 2023-01-11T23:16:34.4331050Z [W schema_info.cpp:333] Warning: alias::a appears twice in same argument list which will make aliasing checks more conservative. (function operator()) 2023-01-11T23:16:34.4331547Z [ OK ] SchemaInfoMayAliasTest.MultipleNonWildcardOutputs (0 ms) 2023-01-11T23:16:34.4331958Z [ RUN ] SchemaInfoMayAliasTest.MismatchingTypes 2023-01-11T23:16:34.4332350Z [ OK ] SchemaInfoMayAliasTest.MismatchingTypes (0 ms) 2023-01-11T23:16:34.4332750Z [----------] 7 tests from SchemaInfoMayAliasTest (0 ms total) 2023-01-11T23:16:34.4332931Z 2023-01-11T23:16:34.4333135Z [----------] 3 tests from FunctionSchemaMayContainAliasTest 2023-01-11T23:16:34.4333518Z [ RUN ] FunctionSchemaMayContainAliasTest.Basic 2023-01-11T23:16:34.4333927Z [ OK ] FunctionSchemaMayContainAliasTest.Basic (0 ms) 2023-01-11T23:16:34.4334330Z [ RUN ] FunctionSchemaMayContainAliasTest.Wildcard 2023-01-11T23:16:34.4335094Z [ OK ] FunctionSchemaMayContainAliasTest.Wildcard (0 ms) 2023-01-11T23:16:34.4335744Z [ RUN ] FunctionSchemaMayContainAliasTest.InputAndOutputContainers 2023-01-11T23:16:34.4336334Z [ OK ] FunctionSchemaMayContainAliasTest.InputAndOutputContainers (0 ms) 2023-01-11T23:16:34.4336819Z [----------] 3 tests from FunctionSchemaMayContainAliasTest (0 ms total) 2023-01-11T23:16:34.4337024Z 2023-01-11T23:16:34.4337309Z [----------] 6 tests from SchemaInfoMayContainAliasTest 2023-01-11T23:16:34.4337741Z [ RUN ] SchemaInfoMayContainAliasTest.ContainAliasInputsEqual 2023-01-11T23:16:34.4338220Z [ OK ] SchemaInfoMayContainAliasTest.ContainAliasInputsEqual (0 ms) 2023-01-11T23:16:34.4338821Z [ RUN ] SchemaInfoMayContainAliasTest.ContainAliasInputsContained 2023-01-11T23:16:34.4339446Z [ OK ] SchemaInfoMayContainAliasTest.ContainAliasInputsContained (0 ms) 2023-01-11T23:16:34.4339932Z [ RUN ] SchemaInfoMayContainAliasTest.ContainAliasOutputs 2023-01-11T23:16:34.4349662Z [ OK ] SchemaInfoMayContainAliasTest.ContainAliasOutputs (0 ms) 2023-01-11T23:16:34.4350178Z [ RUN ] SchemaInfoMayContainAliasTest.ContainAliasInputOutput 2023-01-11T23:16:34.4350741Z [ OK ] SchemaInfoMayContainAliasTest.ContainAliasInputOutput (0 ms) 2023-01-11T23:16:34.4351226Z [ RUN ] SchemaInfoMayContainAliasTest.InputAndOutputContainers 2023-01-11T23:16:34.4351721Z [ OK ] SchemaInfoMayContainAliasTest.InputAndOutputContainers (0 ms) 2023-01-11T23:16:34.4352154Z [ RUN ] SchemaInfoMayContainAliasTest.Wildcard 2023-01-11T23:16:34.4352537Z [ OK ] SchemaInfoMayContainAliasTest.Wildcard (0 ms) 2023-01-11T23:16:34.4352958Z [----------] 6 tests from SchemaInfoMayContainAliasTest (0 ms total) 2023-01-11T23:16:34.4353158Z 2023-01-11T23:16:34.4353329Z [----------] 2 tests from SchemaMatchingTest 2023-01-11T23:16:34.4353645Z [ RUN ] SchemaMatchingTest.VarType 2023-01-11T23:16:34.4353992Z [ OK ] SchemaMatchingTest.VarType (0 ms) 2023-01-11T23:16:34.4354319Z [ RUN ] SchemaMatchingTest.VarType2 2023-01-11T23:16:34.4354762Z [ OK ] SchemaMatchingTest.VarType2 (0 ms) 2023-01-11T23:16:34.4355116Z [----------] 2 tests from SchemaMatchingTest (0 ms total) 2023-01-11T23:16:34.4355291Z 2023-01-11T23:16:34.4355451Z [----------] 6 tests from StackOptTest 2023-01-11T23:16:34.4355771Z [ RUN ] StackOptTest.UseVariadicStack 2023-01-11T23:16:34.4422262Z [ OK ] StackOptTest.UseVariadicStack (7 ms) 2023-01-11T23:16:34.4422726Z [ RUN ] 
StackOptTest.UseVariadicStackReplaceMultiple 2023-01-11T23:16:34.4470437Z [ OK ] StackOptTest.UseVariadicStackReplaceMultiple (4 ms) 2023-01-11T23:16:34.4471019Z [ RUN ] StackOptTest.UseVariadicStackWithMultipleListUses 2023-01-11T23:16:34.4498373Z [ OK ] StackOptTest.UseVariadicStackWithMultipleListUses (2 ms) 2023-01-11T23:16:34.4498878Z [ RUN ] StackOptTest.UseVariadicStackWithListMutationAfterCat 2023-01-11T23:16:34.4535402Z [ OK ] StackOptTest.UseVariadicStackWithListMutationAfterCat (3 ms) 2023-01-11T23:16:34.4535991Z [ RUN ] StackOptTest.UseVariadicStackWithListMutationBeforeCat 2023-01-11T23:16:34.4584533Z [ OK ] StackOptTest.UseVariadicStackWithListMutationBeforeCat (4 ms) 2023-01-11T23:16:34.4585051Z [ RUN ] StackOptTest.UseVariadicStackWithMultipleListMutations 2023-01-11T23:16:34.4671288Z [ OK ] StackOptTest.UseVariadicStackWithMultipleListMutations (8 ms) 2023-01-11T23:16:34.4671856Z [----------] 6 tests from StackOptTest (33 ms total) 2023-01-11T23:16:34.4672068Z 2023-01-11T23:16:34.4672314Z [----------] 16 tests from SubgraphMatcherTest 2023-01-11T23:16:34.4672696Z [ RUN ] SubgraphMatcherTest.Trivial1 2023-01-11T23:16:34.4673037Z [ OK ] SubgraphMatcherTest.Trivial1 (0 ms) 2023-01-11T23:16:34.4673360Z [ RUN ] SubgraphMatcherTest.Trivial2 2023-01-11T23:16:34.4673713Z [ OK ] SubgraphMatcherTest.Trivial2 (0 ms) 2023-01-11T23:16:34.4674050Z [ RUN ] SubgraphMatcherTest.Trivial3 2023-01-11T23:16:34.4674382Z [ OK ] SubgraphMatcherTest.Trivial3 (0 ms) 2023-01-11T23:16:34.4675041Z [ RUN ] SubgraphMatcherTest.Trivial4 2023-01-11T23:16:34.4675464Z [ OK ] SubgraphMatcherTest.Trivial4 (0 ms) 2023-01-11T23:16:34.4675788Z [ RUN ] SubgraphMatcherTest.Linear1 2023-01-11T23:16:34.4676241Z [ OK ] SubgraphMatcherTest.Linear1 (0 ms) 2023-01-11T23:16:34.4676571Z [ RUN ] SubgraphMatcherTest.Linear2 2023-01-11T23:16:34.4676898Z [ OK ] SubgraphMatcherTest.Linear2 (0 ms) 2023-01-11T23:16:34.4677217Z [ RUN ] SubgraphMatcherTest.Diamond1 2023-01-11T23:16:34.4677598Z [ OK ] SubgraphMatcherTest.Diamond1 (0 ms) 2023-01-11T23:16:34.4677968Z [ RUN ] SubgraphMatcherTest.Diamond2 2023-01-11T23:16:34.4678319Z [ OK ] SubgraphMatcherTest.Diamond2 (0 ms) 2023-01-11T23:16:34.4678708Z [ RUN ] SubgraphMatcherTest.XPattern 2023-01-11T23:16:34.4679047Z [ OK ] SubgraphMatcherTest.XPattern (0 ms) 2023-01-11T23:16:34.4679408Z [ RUN ] SubgraphMatcherTest.MultipleMatches 2023-01-11T23:16:34.4679769Z [ OK ] SubgraphMatcherTest.MultipleMatches (0 ms) 2023-01-11T23:16:34.4680148Z [ RUN ] SubgraphMatcherTest.OverlappingMatches 2023-01-11T23:16:34.4680543Z [ OK ] SubgraphMatcherTest.OverlappingMatches (0 ms) 2023-01-11T23:16:34.4680927Z [ RUN ] SubgraphMatcherTest.MatchInBasicBlocks1 2023-01-11T23:16:34.4681321Z [ OK ] SubgraphMatcherTest.MatchInBasicBlocks1 (0 ms) 2023-01-11T23:16:34.4681711Z [ RUN ] SubgraphMatcherTest.MatchInBasicBlocks2 2023-01-11T23:16:34.4682102Z [ OK ] SubgraphMatcherTest.MatchInBasicBlocks2 (0 ms) 2023-01-11T23:16:34.4682480Z [ RUN ] SubgraphMatcherTest.MatchesAttributes 2023-01-11T23:16:34.4682953Z [ OK ] SubgraphMatcherTest.MatchesAttributes (0 ms) 2023-01-11T23:16:34.4683311Z [ RUN ] SubgraphMatcherTest.BadPattern 2023-01-11T23:16:34.4713042Z [ OK ] SubgraphMatcherTest.BadPattern (3 ms) 2023-01-11T23:16:34.4714114Z [ RUN ] SubgraphMatcherTest.MultiOutput 2023-01-11T23:16:34.4715023Z [ OK ] SubgraphMatcherTest.MultiOutput (0 ms) 2023-01-11T23:16:34.4715906Z [----------] 16 tests from SubgraphMatcherTest (4 ms total) 2023-01-11T23:16:34.4716115Z 2023-01-11T23:16:34.4716327Z [----------] 4 tests from 
SubgraphRewriterTest 2023-01-11T23:16:34.4716651Z [ RUN ] SubgraphRewriterTest.FilterMatch 2023-01-11T23:16:34.4717008Z [ OK ] SubgraphRewriterTest.FilterMatch (0 ms) 2023-01-11T23:16:34.4717470Z [ RUN ] SubgraphRewriterTest.FilterNoMatch 2023-01-11T23:16:34.4717968Z [ OK ] SubgraphRewriterTest.FilterNoMatch (0 ms) 2023-01-11T23:16:34.4718452Z [ RUN ] SubgraphRewriterTest.MultiOutput 2023-01-11T23:16:34.4718976Z [ OK ] SubgraphRewriterTest.MultiOutput (0 ms) 2023-01-11T23:16:34.4719326Z [ RUN ] SubgraphRewriterTest.OutputType 2023-01-11T23:16:34.4719677Z [ OK ] SubgraphRewriterTest.OutputType (0 ms) 2023-01-11T23:16:34.4720041Z [----------] 4 tests from SubgraphRewriterTest (0 ms total) 2023-01-11T23:16:34.4720216Z 2023-01-11T23:16:34.4720384Z [----------] 3 tests from SubgraphUtilsTest 2023-01-11T23:16:34.4720681Z [ RUN ] SubgraphUtilsTest.Basic 2023-01-11T23:16:34.4723866Z [ OK ] SubgraphUtilsTest.Basic (0 ms) 2023-01-11T23:16:34.4724281Z [ RUN ] SubgraphUtilsTest.MergeSubgraphs 2023-01-11T23:16:34.4726025Z [ OK ] SubgraphUtilsTest.MergeSubgraphs (0 ms) 2023-01-11T23:16:34.4726563Z [ RUN ] SubgraphUtilsTest.GraphName 2023-01-11T23:16:34.4727044Z [ OK ] SubgraphUtilsTest.GraphName (0 ms) 2023-01-11T23:16:34.4727535Z [----------] 3 tests from SubgraphUtilsTest (0 ms total) 2023-01-11T23:16:34.4727790Z 2023-01-11T23:16:34.4727955Z [----------] 8 tests from UnionTypeTest 2023-01-11T23:16:34.4728399Z [ RUN ] UnionTypeTest.UnionOperatorEquals 2023-01-11T23:16:34.4728763Z [ OK ] UnionTypeTest.UnionOperatorEquals (0 ms) 2023-01-11T23:16:34.4729150Z [ RUN ] UnionTypeTest.UnionCreate_OptionalT1AndOptionalT2 2023-01-11T23:16:34.4729551Z [ OK ] UnionTypeTest.UnionCreate_OptionalT1AndOptionalT2 (0 ms) 2023-01-11T23:16:34.4729934Z [ RUN ] UnionTypeTest.UnionCreate_OptionalTAndT 2023-01-11T23:16:34.4730297Z [ OK ] UnionTypeTest.UnionCreate_OptionalTAndT (0 ms) 2023-01-11T23:16:34.4730690Z [ RUN ] UnionTypeTest.UnionCreate_TupleWithSubtypingRelationship 2023-01-11T23:16:34.4731126Z [ OK ] UnionTypeTest.UnionCreate_TupleWithSubtypingRelationship (0 ms) 2023-01-11T23:16:34.4731522Z [ RUN ] UnionTypeTest.UnionCreate_ContainerTAndT 2023-01-11T23:16:34.4731890Z [ OK ] UnionTypeTest.UnionCreate_ContainerTAndT (0 ms) 2023-01-11T23:16:34.4732305Z [ RUN ] UnionTypeTest.UnionCreate_OptionalContainerTAndContainerTAndT 2023-01-11T23:16:34.4732762Z [ OK ] UnionTypeTest.UnionCreate_OptionalContainerTAndContainerTAndT (0 ms) 2023-01-11T23:16:34.4733153Z [ RUN ] UnionTypeTest.Subtyping_NumberType 2023-01-11T23:16:34.4733491Z [ OK ] UnionTypeTest.Subtyping_NumberType (0 ms) 2023-01-11T23:16:34.4733842Z [ RUN ] UnionTypeTest.Subtyping_OptionalType 2023-01-11T23:16:34.4734193Z [ OK ] UnionTypeTest.Subtyping_OptionalType (0 ms) 2023-01-11T23:16:34.4734729Z [----------] 8 tests from UnionTypeTest (0 ms total) 2023-01-11T23:16:34.4734889Z 2023-01-11T23:16:34.4735124Z [----------] 2 tests from ScriptProfileTest 2023-01-11T23:16:34.4735433Z [ RUN ] ScriptProfileTest.Basic 2023-01-11T23:16:34.4735748Z [ OK ] ScriptProfileTest.Basic (0 ms) 2023-01-11T23:16:34.4736069Z [ RUN ] ScriptProfileTest.CallingOrder 2023-01-11T23:16:34.4760906Z [ OK ] ScriptProfileTest.CallingOrder (3 ms) 2023-01-11T23:16:34.4761393Z [----------] 2 tests from ScriptProfileTest (3 ms total) 2023-01-11T23:16:34.4761603Z 2023-01-11T23:16:34.4761814Z [----------] 7 tests from ShapeAnalysisTest 2023-01-11T23:16:34.4762193Z [ RUN ] ShapeAnalysisTest.DynamicShapesFusion 2023-01-11T23:16:34.4839518Z [ OK ] ShapeAnalysisTest.DynamicShapesFusion (7 ms) 
2023-01-11T23:16:34.4839949Z [ RUN ] ShapeAnalysisTest.MovingConstantOutOfFusionGroups 2023-01-11T23:16:34.4856401Z [ OK ] ShapeAnalysisTest.MovingConstantOutOfFusionGroups (1 ms) 2023-01-11T23:16:34.4856830Z [ RUN ] ShapeAnalysisTest.SymbolicShapeAPI 2023-01-11T23:16:34.4933714Z [ OK ] ShapeAnalysisTest.SymbolicShapeAPI (7 ms) 2023-01-11T23:16:34.4934108Z [ RUN ] ShapeAnalysisTest.BoundedSymbolicShapes 2023-01-11T23:16:34.4940968Z [ OK ] ShapeAnalysisTest.BoundedSymbolicShapes (0 ms) 2023-01-11T23:16:34.4941375Z [ RUN ] ShapeAnalysisTest.SymbolicShapeCaching 2023-01-11T23:16:34.4949201Z [ OK ] ShapeAnalysisTest.SymbolicShapeCaching (0 ms) 2023-01-11T23:16:34.4949603Z [ RUN ] ShapeAnalysisTest.ShapeCacheMultipleFns 2023-01-11T23:16:34.4983529Z [ OK ] ShapeAnalysisTest.ShapeCacheMultipleFns (3 ms) 2023-01-11T23:16:34.4983947Z [ RUN ] ShapeAnalysisTest.TestShapeMultipleReturns 2023-01-11T23:16:34.4998269Z [ OK ] ShapeAnalysisTest.TestShapeMultipleReturns (1 ms) 2023-01-11T23:16:34.4998683Z [----------] 7 tests from ShapeAnalysisTest (23 ms total) 2023-01-11T23:16:34.4998863Z 2023-01-11T23:16:34.4999014Z [----------] 5 tests from JitLoggingTest 2023-01-11T23:16:34.4999349Z [ RUN ] JitLoggingTest.CheckSetLoggingLevel 2023-01-11T23:16:34.4999717Z [ OK ] JitLoggingTest.CheckSetLoggingLevel (0 ms) 2023-01-11T23:16:34.5000177Z [ RUN ] JitLoggingTest.CheckSetMultipleLogLevels 2023-01-11T23:16:34.5000600Z [ OK ] JitLoggingTest.CheckSetMultipleLogLevels (0 ms) 2023-01-11T23:16:34.5000996Z [ RUN ] JitLoggingTest.CheckLoggingLevelAfterUnset 2023-01-11T23:16:34.5001402Z [ OK ] JitLoggingTest.CheckLoggingLevelAfterUnset (0 ms) 2023-01-11T23:16:34.5001784Z [ RUN ] JitLoggingTest.CheckAfterChangingLevel 2023-01-11T23:16:34.5002168Z [ OK ] JitLoggingTest.CheckAfterChangingLevel (0 ms) 2023-01-11T23:16:34.5002549Z [ RUN ] JitLoggingTest.CheckOutputStreamSetting 2023-01-11T23:16:34.5002938Z [ OK ] JitLoggingTest.CheckOutputStreamSetting (0 ms) 2023-01-11T23:16:34.5003307Z [----------] 5 tests from JitLoggingTest (0 ms total) 2023-01-11T23:16:34.5003470Z 2023-01-11T23:16:34.5003628Z [----------] 9 tests from FileFormatTest 2023-01-11T23:16:34.5003979Z [ RUN ] FileFormatTest.IdentifiesFlatbufferStream 2023-01-11T23:16:34.5004370Z [ OK ] FileFormatTest.IdentifiesFlatbufferStream (0 ms) 2023-01-11T23:16:34.5004746Z [ RUN ] FileFormatTest.IdentifiesZipStream 2023-01-11T23:16:34.5005110Z [ OK ] FileFormatTest.IdentifiesZipStream (0 ms) 2023-01-11T23:16:34.5005475Z [ RUN ] FileFormatTest.FlatbufferTakesPrecedence 2023-01-11T23:16:34.5005876Z [ OK ] FileFormatTest.FlatbufferTakesPrecedence (0 ms) 2023-01-11T23:16:34.5006309Z [ RUN ] FileFormatTest.HandlesUnknownStream 2023-01-11T23:16:34.5006673Z [ OK ] FileFormatTest.HandlesUnknownStream (0 ms) 2023-01-11T23:16:34.5007026Z [ RUN ] FileFormatTest.ShortStreamIsUnknown 2023-01-11T23:16:34.5007439Z [ OK ] FileFormatTest.ShortStreamIsUnknown (0 ms) 2023-01-11T23:16:34.5007805Z [ RUN ] FileFormatTest.EmptyStreamIsUnknown 2023-01-11T23:16:34.5008168Z [ OK ] FileFormatTest.EmptyStreamIsUnknown (0 ms) 2023-01-11T23:16:34.5008529Z [ RUN ] FileFormatTest.BadStreamIsUnknown 2023-01-11T23:16:34.5008888Z [ OK ] FileFormatTest.BadStreamIsUnknown (0 ms) 2023-01-11T23:16:34.5009279Z [ RUN ] FileFormatTest.StreamOffsetIsObservedAndRestored 2023-01-11T23:16:34.5009725Z [ OK ] FileFormatTest.StreamOffsetIsObservedAndRestored (0 ms) 2023-01-11T23:16:34.5010116Z [ RUN ] FileFormatTest.HandlesMissingFile 2023-01-11T23:16:34.5010479Z [ OK ] FileFormatTest.HandlesMissingFile (0 ms) 
2023-01-11T23:16:34.5010829Z [----------] 9 tests from FileFormatTest (0 ms total) 2023-01-11T23:16:34.5010988Z 2023-01-11T23:16:34.5011151Z [----------] 35 tests from FlatbufferTest 2023-01-11T23:16:34.5011479Z [ RUN ] FlatbufferTest.UpsampleNearest2d 2023-01-11T23:16:34.5012831Z [ OK ] FlatbufferTest.UpsampleNearest2d (1 ms) 2023-01-11T23:16:34.5013259Z [ RUN ] FlatbufferTest.UpsampleNearest2dWithCopyTensorMemory 2023-01-11T23:16:34.5023119Z [ OK ] FlatbufferTest.UpsampleNearest2dWithCopyTensorMemory (1 ms) 2023-01-11T23:16:34.5023528Z [ RUN ] FlatbufferTest.CheckAttrAccess 2023-01-11T23:16:34.5023865Z [ OK ] FlatbufferTest.CheckAttrAccess (0 ms) 2023-01-11T23:16:34.5024204Z [ RUN ] FlatbufferTest.MethodInvocation 2023-01-11T23:16:34.5044438Z [ OK ] FlatbufferTest.MethodInvocation (2 ms) 2023-01-11T23:16:34.5044807Z [ RUN ] FlatbufferTest.FlatbufferBackPortTest 2023-01-11T23:16:34.5074501Z [ OK ] FlatbufferTest.FlatbufferBackPortTest (2 ms) 2023-01-11T23:16:34.5074863Z [ RUN ] FlatbufferTest.ExtraFiles 2023-01-11T23:16:34.5076918Z [ OK ] FlatbufferTest.ExtraFiles (0 ms) 2023-01-11T23:16:34.5077210Z [ RUN ] FlatbufferTest.Conv 2023-01-11T23:16:34.5107787Z [ OK ] FlatbufferTest.Conv (3 ms) 2023-01-11T23:16:34.5108218Z [ RUN ] FlatbufferTest.ConvWithCopyTensorMemory 2023-01-11T23:16:34.5138231Z [ OK ] FlatbufferTest.ConvWithCopyTensorMemory (3 ms) 2023-01-11T23:16:34.5138570Z [ RUN ] FlatbufferTest.Inline 2023-01-11T23:16:34.5144088Z [ OK ] FlatbufferTest.Inline (0 ms) 2023-01-11T23:16:34.5144450Z [ RUN ] FlatbufferTest.InlineWithCopyTensorMemory 2023-01-11T23:16:34.5150151Z [ OK ] FlatbufferTest.InlineWithCopyTensorMemory (0 ms) 2023-01-11T23:16:34.5150541Z [ RUN ] FlatbufferTest.Tuple 2023-01-11T23:16:34.5154825Z [ OK ] FlatbufferTest.Tuple (0 ms) 2023-01-11T23:16:34.5155124Z [ RUN ] FlatbufferTest.Dict 2023-01-11T23:16:34.5158915Z [ OK ] FlatbufferTest.Dict (0 ms) 2023-01-11T23:16:34.5159206Z [ RUN ] FlatbufferTest.Prim 2023-01-11T23:16:34.5162084Z [ OK ] FlatbufferTest.Prim (0 ms) 2023-01-11T23:16:34.5162394Z [ RUN ] FlatbufferTest.PrimScalar 2023-01-11T23:16:34.5165925Z [ OK ] FlatbufferTest.PrimScalar (0 ms) 2023-01-11T23:16:34.5166297Z [ RUN ] FlatbufferTest.WrongMethodName 2023-01-11T23:16:34.5200826Z [ OK ] FlatbufferTest.WrongMethodName (3 ms) 2023-01-11T23:16:34.5201152Z [ RUN ] FlatbufferTest.SetState 2023-01-11T23:16:34.5221931Z [ OK ] FlatbufferTest.SetState (2 ms) 2023-01-11T23:16:34.5222247Z [ RUN ] FlatbufferTest.BuiltinClass 2023-01-11T23:16:34.5227621Z [ OK ] FlatbufferTest.BuiltinClass (0 ms) 2023-01-11T23:16:34.5227962Z [ RUN ] FlatbufferTest.BuiltinFunction 2023-01-11T23:16:34.5229991Z [ OK ] FlatbufferTest.BuiltinFunction (0 ms) 2023-01-11T23:16:34.5230303Z [ RUN ] FlatbufferTest.Eval 2023-01-11T23:16:34.5236425Z [ OK ] FlatbufferTest.Eval (0 ms) 2023-01-11T23:16:34.5236752Z [ RUN ] FlatbufferTest.FindWrongMethodName 2023-01-11T23:16:34.5239339Z [ OK ] FlatbufferTest.FindWrongMethodName (0 ms) 2023-01-11T23:16:34.5239694Z [ RUN ] FlatbufferTest.FindAndRunMethod 2023-01-11T23:16:34.5246120Z [ OK ] FlatbufferTest.FindAndRunMethod (0 ms) 2023-01-11T23:16:34.5246460Z [ RUN ] FlatbufferTest.RunMethodVariadic 2023-01-11T23:16:34.5252441Z [ OK ] FlatbufferTest.RunMethodVariadic (0 ms) 2023-01-11T23:16:34.5252792Z [ RUN ] FlatbufferTest.DuplicateSetState 2023-01-11T23:16:34.5261674Z [ OK ] FlatbufferTest.DuplicateSetState (0 ms) 2023-01-11T23:16:34.5262075Z [ RUN ] FlatbufferTest.OpNameExportFetchRootOperators 2023-01-11T23:16:34.5267953Z [ OK ] 
FlatbufferTest.OpNameExportFetchRootOperators (0 ms) 2023-01-11T23:16:34.5268342Z [ RUN ] FlatbufferTest.DefaultArgsConv 2023-01-11T23:16:34.5285522Z [ OK ] FlatbufferTest.DefaultArgsConv (1 ms) 2023-01-11T23:16:34.5285863Z [ RUN ] FlatbufferTest.DefaultArgsPinv 2023-01-11T23:16:34.5427160Z [ OK ] FlatbufferTest.DefaultArgsPinv (13 ms) 2023-01-11T23:16:34.5427595Z [ RUN ] FlatbufferTest.DefaultArgsTensorinvSpecifyDefault 2023-01-11T23:16:34.5434712Z [ OK ] FlatbufferTest.DefaultArgsTensorinvSpecifyDefault (0 ms) 2023-01-11T23:16:34.5435144Z [ RUN ] FlatbufferTest.DefaultArgsPinvWithOutArg 2023-01-11T23:16:34.5462723Z [ OK ] FlatbufferTest.DefaultArgsPinvWithOutArg (2 ms) 2023-01-11T23:16:34.5463116Z [ RUN ] FlatbufferTest.DefaultArgsWithOutArg 2023-01-11T23:16:34.5472123Z [ OK ] FlatbufferTest.DefaultArgsWithOutArg (0 ms) 2023-01-11T23:16:34.5473019Z [ RUN ] FlatbufferTest.OperatorCacheDifferentiatesDefaultArgs 2023-01-11T23:16:34.5496059Z [ OK ] FlatbufferTest.OperatorCacheDifferentiatesDefaultArgs (2 ms) 2023-01-11T23:16:34.5496685Z [ RUN ] FlatbufferTest.OperatorSize1 2023-01-11T23:16:34.5498896Z [ OK ] FlatbufferTest.OperatorSize1 (0 ms) 2023-01-11T23:16:34.5499252Z [ RUN ] FlatbufferTest.BoolAndDoubleList 2023-01-11T23:16:34.5499606Z [ OK ] FlatbufferTest.BoolAndDoubleList (0 ms) 2023-01-11T23:16:34.5499939Z [ RUN ] FlatbufferTest.OperatorTest2 2023-01-11T23:16:34.5509637Z [ OK ] FlatbufferTest.OperatorTest2 (0 ms) 2023-01-11T23:16:34.5509990Z [ RUN ] FlatbufferTest.DetachedBufferSmoke 2023-01-11T23:16:34.5510355Z [ OK ] FlatbufferTest.DetachedBufferSmoke (0 ms) 2023-01-11T23:16:34.5510800Z [ RUN ] FlatbufferTest.DetachedBufferNullOwner 2023-01-11T23:16:34.5511190Z [ OK ] FlatbufferTest.DetachedBufferNullOwner (0 ms) 2023-01-11T23:16:34.5511562Z [----------] 35 tests from FlatbufferTest (51 ms total) 2023-01-11T23:16:34.5511728Z 2023-01-11T23:16:34.5511910Z [----------] 3 tests from TestSourceFlatbuffer 2023-01-11T23:16:34.5512256Z [ RUN ] TestSourceFlatbuffer.UpsampleNearest2d 2023-01-11T23:16:34.5527991Z [ OK ] TestSourceFlatbuffer.UpsampleNearest2d (1 ms) 2023-01-11T23:16:34.5528495Z [ RUN ] TestSourceFlatbuffer.CheckAttrAccess 2023-01-11T23:16:34.5528872Z [ OK ] TestSourceFlatbuffer.CheckAttrAccess (0 ms) 2023-01-11T23:16:34.5529237Z [ RUN ] TestSourceFlatbuffer.MethodInvocation 2023-01-11T23:16:34.5586117Z [ OK ] TestSourceFlatbuffer.MethodInvocation (5 ms) 2023-01-11T23:16:34.5586654Z [----------] 3 tests from TestSourceFlatbuffer (7 ms total) 2023-01-11T23:16:34.5586838Z 2023-01-11T23:16:34.5587254Z [----------] 10 tests from FlatbufferUpgraderTest 2023-01-11T23:16:34.5587688Z [ RUN ] FlatbufferUpgraderTest.DivTensorV2 2023-01-11T23:16:34.5588147Z [ OK ] FlatbufferUpgraderTest.DivTensorV2 (0 ms) 2023-01-11T23:16:34.5588584Z [ RUN ] FlatbufferUpgraderTest.DivTensorOutV2 2023-01-11T23:16:34.5588979Z [ OK ] FlatbufferUpgraderTest.DivTensorOutV2 (0 ms) 2023-01-11T23:16:34.5589486Z [ RUN ] FlatbufferUpgraderTest.DivTensorInplaceV2 2023-01-11T23:16:34.5589899Z [ OK ] FlatbufferUpgraderTest.DivTensorInplaceV2 (0 ms) 2023-01-11T23:16:34.5590424Z [ RUN ] FlatbufferUpgraderTest.DivScalarFloatV2 2023-01-11T23:16:34.5591034Z [ OK ] FlatbufferUpgraderTest.DivScalarFloatV2 (0 ms) 2023-01-11T23:16:34.5591572Z [ RUN ] FlatbufferUpgraderTest.DivScalarReciprocalFloatV2 2023-01-11T23:16:34.5592036Z [ OK ] FlatbufferUpgraderTest.DivScalarReciprocalFloatV2 (0 ms) 2023-01-11T23:16:34.5592519Z [ RUN ] FlatbufferUpgraderTest.DivScalarReciprocalIntV2 2023-01-11T23:16:34.5593102Z [ OK ] 
FlatbufferUpgraderTest.DivScalarReciprocalIntV2 (0 ms) 2023-01-11T23:16:34.5593615Z [ RUN ] FlatbufferUpgraderTest.DivScalarScalarV2 2023-01-11T23:16:34.5594145Z [ OK ] FlatbufferUpgraderTest.DivScalarScalarV2 (0 ms) 2023-01-11T23:16:34.5594557Z [ RUN ] FlatbufferUpgraderTest.DivScalarIntV2 2023-01-11T23:16:34.5594945Z [ OK ] FlatbufferUpgraderTest.DivScalarIntV2 (0 ms) 2023-01-11T23:16:34.5595346Z [ RUN ] FlatbufferUpgraderTest.DivScalarInplaceFloatV2 2023-01-11T23:16:34.5595769Z [ OK ] FlatbufferUpgraderTest.DivScalarInplaceFloatV2 (0 ms) 2023-01-11T23:16:34.5596186Z [ RUN ] FlatbufferUpgraderTest.DivScalarInplaceIntV2 2023-01-11T23:16:34.5596601Z [ OK ] FlatbufferUpgraderTest.DivScalarInplaceIntV2 (0 ms) 2023-01-11T23:16:34.5597008Z [----------] 10 tests from FlatbufferUpgraderTest (0 ms total) 2023-01-11T23:16:34.5597189Z 2023-01-11T23:16:34.5597345Z [----------] 759 tests from NVFuserTest 2023-01-11T23:16:34.5597662Z [ RUN ] NVFuserTest.FusionDefinition_CUDA 2023-01-11T23:16:34.5657114Z [ OK ] NVFuserTest.FusionDefinition_CUDA (6 ms) 2023-01-11T23:16:34.5657487Z [ RUN ] NVFuserTest.PyFusionCache_CUDA 2023-01-11T23:16:34.5771702Z [ OK ] NVFuserTest.PyFusionCache_CUDA (11 ms) 2023-01-11T23:16:34.5772132Z [ RUN ] NVFuserTest.RecordFunctorEquality_CUDA 2023-01-11T23:16:34.5772514Z [ OK ] NVFuserTest.RecordFunctorEquality_CUDA (0 ms) 2023-01-11T23:16:34.5772885Z [ RUN ] NVFuserTest.FusionIrGraphGenerator_CUDA 2023-01-11T23:16:34.5781369Z [ OK ] NVFuserTest.FusionIrGraphGenerator_CUDA (0 ms) 2023-01-11T23:16:34.5781860Z [ RUN ] NVFuserTest.FusionDispatch_CUDA 2023-01-11T23:16:34.5782274Z [ OK ] NVFuserTest.FusionDispatch_CUDA (0 ms) 2023-01-11T23:16:34.5782637Z [ RUN ] NVFuserTest.FusionExprEvalConstants_CUDA 2023-01-11T23:16:34.5783021Z [ OK ] NVFuserTest.FusionExprEvalConstants_CUDA (0 ms) 2023-01-11T23:16:34.5783407Z [ RUN ] NVFuserTest.FusionExprEvalDouble_CUDA 2023-01-11T23:16:34.5783775Z [ OK ] NVFuserTest.FusionExprEvalDouble_CUDA (0 ms) 2023-01-11T23:16:34.5784144Z [ RUN ] NVFuserTest.FusionExprEvalBindings_CUDA 2023-01-11T23:16:34.5811252Z [ OK ] NVFuserTest.FusionExprEvalBindings_CUDA (2 ms) 2023-01-11T23:16:34.5811669Z [ RUN ] NVFuserTest.FusionExprEvalBasic_CUDA 2023-01-11T23:16:34.5818675Z [ OK ] NVFuserTest.FusionExprEvalBasic_CUDA (0 ms) 2023-01-11T23:16:34.5819124Z [ RUN ] NVFuserTest.FusionExprEvalComplex_CUDA 2023-01-11T23:16:34.5819504Z [ OK ] NVFuserTest.FusionExprEvalComplex_CUDA (0 ms) 2023-01-11T23:16:34.5819960Z [ RUN ] NVFuserTest.FusionExprEvalPostLower_CUDA 2023-01-11T23:16:34.5855570Z [ OK ] NVFuserTest.FusionExprEvalPostLower_CUDA (3 ms) 2023-01-11T23:16:34.5856004Z [ RUN ] NVFuserTest.FusionKernelExprEvalConstants_CUDA 2023-01-11T23:16:34.5856436Z [ OK ] NVFuserTest.FusionKernelExprEvalConstants_CUDA (0 ms) 2023-01-11T23:16:34.5856844Z [ RUN ] NVFuserTest.FusionKernelExprEvalBindings_CUDA 2023-01-11T23:16:34.5884321Z [ OK ] NVFuserTest.FusionKernelExprEvalBindings_CUDA (2 ms) 2023-01-11T23:16:34.5884689Z [ RUN ] NVFuserTest.FusionClear_CUDA 2023-01-11T23:16:34.7691445Z [ OK ] NVFuserTest.FusionClear_CUDA (180 ms) 2023-01-11T23:16:34.7692131Z [ RUN ] NVFuserTest.FusionCopy_CUDA 2023-01-11T23:16:34.7786425Z [ OK ] NVFuserTest.FusionCopy_CUDA (9 ms) 2023-01-11T23:16:34.7788072Z [ RUN ] NVFuserTest.FusionMove_CUDA 2023-01-11T23:16:34.7838279Z [ OK ] NVFuserTest.FusionMove_CUDA (5 ms) 2023-01-11T23:16:34.7838650Z [ RUN ] NVFuserTest.FusionSimpleArith_CUDA 2023-01-11T23:16:34.7839011Z [ OK ] NVFuserTest.FusionSimpleArith_CUDA (0 ms) 2023-01-11T23:16:34.7839385Z [ RUN 
] NVFuserTest.FusionScalarTypePromote_CUDA 2023-01-11T23:16:34.7839773Z [ OK ] NVFuserTest.FusionScalarTypePromote_CUDA (0 ms) 2023-01-11T23:16:34.7840152Z [ RUN ] NVFuserTest.FusionComplexAbsTypes_CUDA 2023-01-11T23:16:34.7865894Z [ OK ] NVFuserTest.FusionComplexAbsTypes_CUDA (2 ms) 2023-01-11T23:16:34.7866273Z [ RUN ] NVFuserTest.FusionRegister_CUDA 2023-01-11T23:16:34.7866627Z [ OK ] NVFuserTest.FusionRegister_CUDA (0 ms) 2023-01-11T23:16:34.7867010Z [ RUN ] NVFuserTest.FusionTopoSort_CUDA 2023-01-11T23:16:34.7867344Z [ OK ] NVFuserTest.FusionTopoSort_CUDA (0 ms) 2023-01-11T23:16:34.7867685Z [ RUN ] NVFuserTest.FusionTensor_CUDA 2023-01-11T23:16:34.7868016Z [ OK ] NVFuserTest.FusionTensor_CUDA (0 ms) 2023-01-11T23:16:34.7868342Z [ RUN ] NVFuserTest.FusionFilterVals_CUDA 2023-01-11T23:16:34.7868861Z [ OK ] NVFuserTest.FusionFilterVals_CUDA (0 ms) 2023-01-11T23:16:34.7869205Z [ RUN ] NVFuserTest.FusionTVSplit_CUDA 2023-01-11T23:16:34.7869551Z [ OK ] NVFuserTest.FusionTVSplit_CUDA (0 ms) 2023-01-11T23:16:34.7869876Z [ RUN ] NVFuserTest.FusionTVMerge_CUDA 2023-01-11T23:16:34.7870214Z [ OK ] NVFuserTest.FusionTVMerge_CUDA (0 ms) 2023-01-11T23:16:34.7870661Z [ RUN ] NVFuserTest.FusionTVReorder_CUDA 2023-01-11T23:16:34.7871003Z [ OK ] NVFuserTest.FusionTVReorder_CUDA (0 ms) 2023-01-11T23:16:34.7871336Z [ RUN ] NVFuserTest.FusionEquality_CUDA 2023-01-11T23:16:34.7871678Z [ OK ] NVFuserTest.FusionEquality_CUDA (0 ms) 2023-01-11T23:16:34.7872013Z [ RUN ] NVFuserTest.FusionDependency_CUDA 2023-01-11T23:16:34.7872363Z [ OK ] NVFuserTest.FusionDependency_CUDA (0 ms) 2023-01-11T23:16:34.7872700Z [ RUN ] NVFuserTest.FusionParser_CUDA 2023-01-11T23:16:34.9679934Z [ OK ] NVFuserTest.FusionParser_CUDA (181 ms) 2023-01-11T23:16:34.9681381Z [ RUN ] NVFuserTest.FusionOuterSplit_CUDA 2023-01-11T23:16:35.1497269Z [ OK ] NVFuserTest.FusionOuterSplit_CUDA (181 ms) 2023-01-11T23:16:35.1497853Z [ RUN ] NVFuserTest.FusionCodeGen_CUDA 2023-01-11T23:16:35.3329368Z [ OK ] NVFuserTest.FusionCodeGen_CUDA (183 ms) 2023-01-11T23:16:35.3329945Z [ RUN ] NVFuserTest.FusionCodeGen2_CUDA 2023-01-11T23:16:35.5211775Z [ OK ] NVFuserTest.FusionCodeGen2_CUDA (188 ms) 2023-01-11T23:16:35.5212370Z [ RUN ] NVFuserTest.FusionSimplePWise_CUDA 2023-01-11T23:16:35.7042093Z [ OK ] NVFuserTest.FusionSimplePWise_CUDA (183 ms) 2023-01-11T23:16:35.7043227Z [ RUN ] NVFuserTest.FusionSimplePWiseDtypeComplex_CUDA 2023-01-11T23:16:35.8937026Z [ OK ] NVFuserTest.FusionSimplePWiseDtypeComplex_CUDA (189 ms) 2023-01-11T23:16:35.8937712Z [ RUN ] NVFuserTest.FusionExecKernel_CUDA 2023-01-11T23:16:36.0993617Z [ OK ] NVFuserTest.FusionExecKernel_CUDA (205 ms) 2023-01-11T23:16:36.0994062Z [ RUN ] NVFuserTest.FusionAdvancedComputeAt1_CUDA 2023-01-11T23:16:36.3142123Z [ OK ] NVFuserTest.FusionAdvancedComputeAt1_CUDA (214 ms) 2023-01-11T23:16:36.3142956Z [ RUN ] NVFuserTest.FusionAdvancedComputeAt2_CUDA 2023-01-11T23:16:36.5576750Z [ OK ] NVFuserTest.FusionAdvancedComputeAt2_CUDA (243 ms) 2023-01-11T23:16:36.5577528Z [ RUN ] NVFuserTest.FusionAdvancedComputeAt3_CUDA 2023-01-11T23:16:36.8182527Z [ OK ] NVFuserTest.FusionAdvancedComputeAt3_CUDA (260 ms) 2023-01-11T23:16:36.8183415Z [ RUN ] NVFuserTest.FusionAdvancedComputeAt4_CUDA 2023-01-11T23:16:37.1170848Z [ OK ] NVFuserTest.FusionAdvancedComputeAt4_CUDA (298 ms) 2023-01-11T23:16:37.1171689Z [ RUN ] NVFuserTest.FusionAdvancedComputeAt5_CUDA 2023-01-11T23:16:37.3187026Z [ OK ] NVFuserTest.FusionAdvancedComputeAt5_CUDA (201 ms) 2023-01-11T23:16:37.3187524Z [ RUN ] NVFuserTest.FusionAdvancedComputeAt6_CUDA 
2023-01-11T23:16:37.5188241Z [ OK ] NVFuserTest.FusionAdvancedComputeAt6_CUDA (200 ms) 2023-01-11T23:16:37.5188692Z [ RUN ] NVFuserTest.FusionAdvancedComputeAt7_CUDA 2023-01-11T23:16:37.7307637Z [ OK ] NVFuserTest.FusionAdvancedComputeAt7_CUDA (211 ms) 2023-01-11T23:16:37.7308067Z [ RUN ] NVFuserTest.FusionAdvancedComputeAt8_CUDA 2023-01-11T23:16:37.9156484Z [ OK ] NVFuserTest.FusionAdvancedComputeAt8_CUDA (184 ms) 2023-01-11T23:16:37.9157141Z [ RUN ] NVFuserTest.FusionAdvancedComputeWith1_CUDA 2023-01-11T23:16:38.1479810Z [ OK ] NVFuserTest.FusionAdvancedComputeWith1_CUDA (232 ms) 2023-01-11T23:16:38.1480260Z [ RUN ] NVFuserTest.FusionAdvancedComputeWith2_CUDA 2023-01-11T23:16:38.4079363Z [ OK ] NVFuserTest.FusionAdvancedComputeWith2_CUDA (259 ms) 2023-01-11T23:16:38.4079827Z [ RUN ] NVFuserTest.FusionAdvancedComputeWith3_CUDA 2023-01-11T23:16:38.6678113Z [ OK ] NVFuserTest.FusionAdvancedComputeWith3_CUDA (259 ms) 2023-01-11T23:16:38.6678564Z [ RUN ] NVFuserTest.FusionAdvancedComputeWith4_CUDA 2023-01-11T23:16:38.9657229Z [ OK ] NVFuserTest.FusionAdvancedComputeWith4_CUDA (297 ms) 2023-01-11T23:16:38.9657735Z [ RUN ] NVFuserTest.FusionAdvancedComputeWith5_CUDA 2023-01-11T23:16:39.1661624Z [ OK ] NVFuserTest.FusionAdvancedComputeWith5_CUDA (200 ms) 2023-01-11T23:16:39.1662526Z [ RUN ] NVFuserTest.FusionAdvancedComputeWith6_CUDA 2023-01-11T23:16:39.3659477Z [ OK ] NVFuserTest.FusionAdvancedComputeWith6_CUDA (199 ms) 2023-01-11T23:16:39.3659956Z [ RUN ] NVFuserTest.FusionComputeAtMultiConsumers_CUDA 2023-01-11T23:16:39.5378716Z [ OK ] NVFuserTest.FusionComputeAtMultiConsumers_CUDA (171 ms) 2023-01-11T23:16:39.5379230Z [ RUN ] NVFuserTest.FusionComputeAtCommonConsumer1_CUDA 2023-01-11T23:16:39.7109687Z [ OK ] NVFuserTest.FusionComputeAtCommonConsumer1_CUDA (173 ms) 2023-01-11T23:16:39.7110196Z [ RUN ] NVFuserTest.FusionComputeAtCommonConsumer2_CUDA 2023-01-11T23:16:39.9122725Z [ OK ] NVFuserTest.FusionComputeAtCommonConsumer2_CUDA (201 ms) 2023-01-11T23:16:39.9123205Z [ RUN ] NVFuserTest.FusionComputeAtCommonConsumer3_CUDA 2023-01-11T23:16:40.1222227Z [ OK ] NVFuserTest.FusionComputeAtCommonConsumer3_CUDA (209 ms) 2023-01-11T23:16:40.1223184Z [ RUN ] NVFuserTest.FusionComputeAtNoCommonConsumer_CUDA 2023-01-11T23:16:40.2967152Z [ OK ] NVFuserTest.FusionComputeAtNoCommonConsumer_CUDA (174 ms) 2023-01-11T23:16:40.2967763Z [ RUN ] NVFuserTest.FusionRootMappingBasic_CUDA 2023-01-11T23:16:40.2968186Z [ OK ] NVFuserTest.FusionRootMappingBasic_CUDA (0 ms) 2023-01-11T23:16:40.2968574Z [ RUN ] NVFuserTest.FusionRootMappingRfactor_CUDA 2023-01-11T23:16:40.2975409Z [ OK ] NVFuserTest.FusionRootMappingRfactor_CUDA (0 ms) 2023-01-11T23:16:40.2976265Z [ RUN ] NVFuserTest.FusionRootMappingReductionDependency1_CUDA 2023-01-11T23:16:40.2977146Z [ OK ] NVFuserTest.FusionRootMappingReductionDependency1_CUDA (0 ms) 2023-01-11T23:16:40.2977810Z [ RUN ] NVFuserTest.FusionRootMappingReductionDependency2_CUDA 2023-01-11T23:16:40.2978270Z [ OK ] NVFuserTest.FusionRootMappingReductionDependency2_CUDA (0 ms) 2023-01-11T23:16:40.2978731Z [ RUN ] NVFuserTest.FusionRootMappingReductionDependency3_CUDA 2023-01-11T23:16:40.2980556Z [ OK ] NVFuserTest.FusionRootMappingReductionDependency3_CUDA (0 ms) 2023-01-11T23:16:40.2981021Z [ RUN ] NVFuserTest.FusionRootMappingReductionDependency4_CUDA 2023-01-11T23:16:40.2985619Z [ OK ] NVFuserTest.FusionRootMappingReductionDependency4_CUDA (0 ms) 2023-01-11T23:16:40.2986122Z [ RUN ] NVFuserTest.FusionRootMappingReductionDependency5_CUDA_CUDA 2023-01-11T23:16:40.2989556Z [ OK ] 
NVFuserTest.FusionRootMappingReductionDependency5_CUDA_CUDA (0 ms) 2023-01-11T23:16:40.2990124Z [ RUN ] NVFuserTest.FusionRootMappingReductionDependency6_CUDA_CUDA 2023-01-11T23:16:40.2997615Z [ OK ] NVFuserTest.FusionRootMappingReductionDependency6_CUDA_CUDA (0 ms) 2023-01-11T23:16:40.2998207Z [ RUN ] NVFuserTest.FusionRootMappingMultipleBroadcastWithNoCommonConsumer_CUDA 2023-01-11T23:16:40.2998820Z [ OK ] NVFuserTest.FusionRootMappingMultipleBroadcastWithNoCommonConsumer_CUDA (0 ms) 2023-01-11T23:16:40.2999344Z [ RUN ] NVFuserTest.FusionRootMappingBroadcastNonUniqueSize_CUDA 2023-01-11T23:16:40.3000373Z [ OK ] NVFuserTest.FusionRootMappingBroadcastNonUniqueSize_CUDA (0 ms) 2023-01-11T23:16:40.3000825Z [ RUN ] NVFuserTest.FusionRootMappingBroadcast_CUDA 2023-01-11T23:16:40.3001234Z [ OK ] NVFuserTest.FusionRootMappingBroadcast_CUDA (0 ms) 2023-01-11T23:16:40.3001646Z [ RUN ] NVFuserTest.FusionRootMappingTrivialReduction_CUDA 2023-01-11T23:16:40.4740307Z [ OK ] NVFuserTest.FusionRootMappingTrivialReduction_CUDA (173 ms) 2023-01-11T23:16:40.4741566Z [ RUN ] NVFuserTest.FusionRootMappingRepro1950_CUDA 2023-01-11T23:16:40.4742535Z [ OK ] NVFuserTest.FusionRootMappingRepro1950_CUDA (0 ms) 2023-01-11T23:16:40.4743464Z [ RUN ] NVFuserTest.FusionDetectSelfMappedDomains_CUDA 2023-01-11T23:16:40.4755867Z [ OK ] NVFuserTest.FusionDetectSelfMappedDomains_CUDA (1 ms) 2023-01-11T23:16:40.4756381Z [ RUN ] NVFuserTest.FusionScalarInputs_CUDA 2023-01-11T23:16:40.6890942Z [ OK ] NVFuserTest.FusionScalarInputs_CUDA (213 ms) 2023-01-11T23:16:40.6891724Z [ RUN ] NVFuserTest.FusionLoopUnroll_CUDA 2023-01-11T23:16:40.9147962Z [ OK ] NVFuserTest.FusionLoopUnroll_CUDA (225 ms) 2023-01-11T23:16:40.9148434Z [ RUN ] NVFuserTest.FusionUnaryOps_CUDA 2023-01-11T23:16:51.4682647Z [W Copy.cpp:276] Warning: Casting complex values to real discards the imaginary part (function operator()) 2023-01-11T23:16:56.7834009Z [ OK ] NVFuserTest.FusionUnaryOps_CUDA (15868 ms) 2023-01-11T23:16:56.7834511Z [ RUN ] NVFuserTest.FusionBinaryOps_CUDA 2023-01-11T23:17:05.0885093Z [ OK ] NVFuserTest.FusionBinaryOps_CUDA (8305 ms) 2023-01-11T23:17:05.0885512Z [ RUN ] NVFuserTest.FusionTernaryOps_CUDA 2023-01-11T23:17:06.4424365Z [ OK ] NVFuserTest.FusionTernaryOps_CUDA (1353 ms) 2023-01-11T23:17:06.4425076Z [ RUN ] NVFuserTest.FusionCompoundOps_CUDA 2023-01-11T23:17:07.9387583Z [ OK ] NVFuserTest.FusionCompoundOps_CUDA (1496 ms) 2023-01-11T23:17:07.9387989Z [ RUN ] NVFuserTest.FusionCastOps_CUDA 2023-01-11T23:17:08.1073864Z [ OK ] NVFuserTest.FusionCastOps_CUDA (168 ms) 2023-01-11T23:17:08.1074681Z [ RUN ] NVFuserTest.FusionReduction1_CUDA 2023-01-11T23:17:08.3095523Z [ OK ] NVFuserTest.FusionReduction1_CUDA (202 ms) 2023-01-11T23:17:08.3095929Z [ RUN ] NVFuserTest.FusionReduction2_CUDA 2023-01-11T23:17:08.5123733Z [ OK ] NVFuserTest.FusionReduction2_CUDA (202 ms) 2023-01-11T23:17:08.5124128Z [ RUN ] NVFuserTest.FusionReduction3_CUDA 2023-01-11T23:17:08.6999406Z [ OK ] NVFuserTest.FusionReduction3_CUDA (187 ms) 2023-01-11T23:17:08.6999826Z [ RUN ] NVFuserTest.FusionReduction4_CUDA 2023-01-11T23:17:08.8885910Z [ OK ] NVFuserTest.FusionReduction4_CUDA (188 ms) 2023-01-11T23:17:08.8886323Z [ RUN ] NVFuserTest.FusionReduction5_CUDA 2023-01-11T23:17:09.0754521Z [ OK ] NVFuserTest.FusionReduction5_CUDA (186 ms) 2023-01-11T23:17:09.0755422Z [ RUN ] NVFuserTest.FusionReduction6_CUDA 2023-01-11T23:17:09.2664697Z [ OK ] NVFuserTest.FusionReduction6_CUDA (191 ms) 2023-01-11T23:17:09.2665152Z [ RUN ] NVFuserTest.FusionMultiGridReduction_CUDA 
2023-01-11T23:17:09.5059842Z [ OK ] NVFuserTest.FusionMultiGridReduction_CUDA (239 ms) 2023-01-11T23:17:09.5060298Z [ RUN ] NVFuserTest.FusionMultiGridReduction2_CUDA 2023-01-11T23:17:09.5102718Z [ OK ] NVFuserTest.FusionMultiGridReduction2_CUDA (4 ms) 2023-01-11T23:17:09.5103275Z [ RUN ] NVFuserTest.FusionReductionTFT_CUDA 2023-01-11T23:17:09.7042825Z [ OK ] NVFuserTest.FusionReductionTFT_CUDA (193 ms) 2023-01-11T23:17:09.7043249Z [ RUN ] NVFuserTest.FusionReductionOuterSplit_CUDA 2023-01-11T23:17:09.9174937Z [ OK ] NVFuserTest.FusionReductionOuterSplit_CUDA (212 ms) 2023-01-11T23:17:09.9175379Z [ RUN ] NVFuserTest.FusionBranches_CUDA 2023-01-11T23:17:10.1472011Z [ OK ] NVFuserTest.FusionBranches_CUDA (229 ms) 2023-01-11T23:17:10.1473511Z [ RUN ] NVFuserTest.FusionSimpleBCast1_CUDA 2023-01-11T23:17:10.3641186Z [ OK ] NVFuserTest.FusionSimpleBCast1_CUDA (217 ms) 2023-01-11T23:17:10.3641676Z [ RUN ] NVFuserTest.FusionSimpleBCast2_CUDA 2023-01-11T23:17:10.5415611Z [ OK ] NVFuserTest.FusionSimpleBCast2_CUDA (177 ms) 2023-01-11T23:17:10.5416267Z [ RUN ] NVFuserTest.FusionSimpleBCast3_CUDA 2023-01-11T23:17:10.7163825Z [ OK ] NVFuserTest.FusionSimpleBCast3_CUDA (175 ms) 2023-01-11T23:17:10.7164631Z [ RUN ] NVFuserTest.FusionSimpleBCast4_CUDA 2023-01-11T23:17:10.9262201Z [ OK ] NVFuserTest.FusionSimpleBCast4_CUDA (209 ms) 2023-01-11T23:17:10.9262647Z [ RUN ] NVFuserTest.FusionSimpleBCast5_CUDA 2023-01-11T23:17:11.1457002Z [ OK ] NVFuserTest.FusionSimpleBCast5_CUDA (219 ms) 2023-01-11T23:17:11.1457517Z [ RUN ] NVFuserTest.FusionComplexBCast1_CUDA 2023-01-11T23:17:11.3816614Z [ OK ] NVFuserTest.FusionComplexBCast1_CUDA (235 ms) 2023-01-11T23:17:11.3817122Z [ RUN ] NVFuserTest.FusionComplexBCast2_CUDA 2023-01-11T23:17:11.5663638Z [ OK ] NVFuserTest.FusionComplexBCast2_CUDA (184 ms) 2023-01-11T23:17:11.5664702Z [ RUN ] NVFuserTest.FusionAdvancedIndexing1_CUDA 2023-01-11T23:17:11.8023978Z [ OK ] NVFuserTest.FusionAdvancedIndexing1_CUDA (235 ms) 2023-01-11T23:17:11.8025013Z [ RUN ] NVFuserTest.FusionAdvancedIndexing2_CUDA 2023-01-11T23:17:12.0402543Z [ OK ] NVFuserTest.FusionAdvancedIndexing2_CUDA (238 ms) 2023-01-11T23:17:12.0403034Z [ RUN ] NVFuserTest.FusionAdvancedIndexing3_CUDA 2023-01-11T23:17:12.2358010Z [ OK ] NVFuserTest.FusionAdvancedIndexing3_CUDA (195 ms) 2023-01-11T23:17:12.2358443Z [ RUN ] NVFuserTest.FusionAdvancedIndexing4_CUDA 2023-01-11T23:17:12.6260083Z [ OK ] NVFuserTest.FusionAdvancedIndexing4_CUDA (390 ms) 2023-01-11T23:17:12.6260589Z [ RUN ] NVFuserTest.FusionAdvancedIndexing5_CUDA 2023-01-11T23:17:12.8428877Z [ OK ] NVFuserTest.FusionAdvancedIndexing5_CUDA (216 ms) 2023-01-11T23:17:12.8429437Z [ RUN ] NVFuserTest.FusionAdvancedIndexing6_CUDA 2023-01-11T23:17:13.0942453Z [ OK ] NVFuserTest.FusionAdvancedIndexing6_CUDA (251 ms) 2023-01-11T23:17:13.0943269Z [ RUN ] NVFuserTest.FusionAdvancedIndexing7_CUDA 2023-01-11T23:17:13.2997273Z [ OK ] NVFuserTest.FusionAdvancedIndexing7_CUDA (205 ms) 2023-01-11T23:17:13.2997707Z [ RUN ] NVFuserTest.FusionAdvancedIndexing8_CUDA 2023-01-11T23:17:14.2782686Z [ OK ] NVFuserTest.FusionAdvancedIndexing8_CUDA (978 ms) 2023-01-11T23:17:14.2783537Z [ RUN ] NVFuserTest.FusionAdvancedIndexing9_CUDA 2023-01-11T23:17:14.5280418Z [ OK ] NVFuserTest.FusionAdvancedIndexing9_CUDA (249 ms) 2023-01-11T23:17:14.5280850Z [ RUN ] NVFuserTest.FusionAdvancedIndexing10_CUDA 2023-01-11T23:17:14.7217755Z [ OK ] NVFuserTest.FusionAdvancedIndexing10_CUDA (193 ms) 2023-01-11T23:17:14.7218440Z [ RUN ] NVFuserTest.FusionAdvancedIndexing11_CUDA 2023-01-11T23:17:14.9001324Z [ 
OK ] NVFuserTest.FusionAdvancedIndexing11_CUDA (178 ms) 2023-01-11T23:17:14.9001968Z [ RUN ] NVFuserTest.FusionAdvancedLowering1_CUDA 2023-01-11T23:17:15.1918816Z [ OK ] NVFuserTest.FusionAdvancedLowering1_CUDA (291 ms) 2023-01-11T23:17:15.1919515Z [ RUN ] NVFuserTest.FusionAdvancedLowering2_CUDA 2023-01-11T23:17:15.3736828Z [ OK ] NVFuserTest.FusionAdvancedLowering2_CUDA (181 ms) 2023-01-11T23:17:15.3737955Z [ RUN ] NVFuserTest.FusionAdvancedLowering3_CUDA 2023-01-11T23:17:15.5507692Z [ OK ] NVFuserTest.FusionAdvancedLowering3_CUDA (177 ms) 2023-01-11T23:17:15.5508340Z [ RUN ] NVFuserTest.FusionAdvancedLowering4_CUDA 2023-01-11T23:17:15.7677333Z [ OK ] NVFuserTest.FusionAdvancedLowering4_CUDA (216 ms) 2023-01-11T23:17:15.7677970Z [ RUN ] NVFuserTest.FusionAdvancedLowering5_CUDA 2023-01-11T23:17:16.0147738Z [ OK ] NVFuserTest.FusionAdvancedLowering5_CUDA (247 ms) 2023-01-11T23:17:16.0148337Z [ RUN ] NVFuserTest.FusionAdvancedLowering6_CUDA 2023-01-11T23:17:16.2678712Z [ OK ] NVFuserTest.FusionAdvancedLowering6_CUDA (253 ms) 2023-01-11T23:17:16.2679139Z [ RUN ] NVFuserTest.FusionSimpleGemm_CUDA 2023-01-11T23:17:16.4681704Z [ OK ] NVFuserTest.FusionSimpleGemm_CUDA (200 ms) 2023-01-11T23:17:16.4683294Z [ RUN ] NVFuserTest.FusionSoftmax1D_CUDA 2023-01-11T23:17:16.6706327Z [ OK ] NVFuserTest.FusionSoftmax1D_CUDA (202 ms) 2023-01-11T23:17:16.6707059Z [ RUN ] NVFuserTest.FusionSoftmax1DNormalized_CUDA 2023-01-11T23:17:16.8979592Z [ OK ] NVFuserTest.FusionSoftmax1DNormalized_CUDA (227 ms) 2023-01-11T23:17:16.8980358Z [ RUN ] NVFuserTest.FusionSoftmax3D_CUDA 2023-01-11T23:17:17.0929909Z [ OK ] NVFuserTest.FusionSoftmax3D_CUDA (195 ms) 2023-01-11T23:17:17.0930372Z [ RUN ] NVFuserTest.FusionSoftmax3DNormalized_CUDA 2023-01-11T23:17:17.3048968Z [ OK ] NVFuserTest.FusionSoftmax3DNormalized_CUDA (211 ms) 2023-01-11T23:17:17.3049592Z [ RUN ] NVFuserTest.FusionSoftmaxComputeAt_CUDA 2023-01-11T23:17:17.3084722Z [ OK ] NVFuserTest.FusionSoftmaxComputeAt_CUDA (3 ms) 2023-01-11T23:17:17.3085268Z [ RUN ] NVFuserTest.FusionGridReduction1_CUDA 2023-01-11T23:17:17.5560196Z [ OK ] NVFuserTest.FusionGridReduction1_CUDA (247 ms) 2023-01-11T23:17:17.5560599Z [ RUN ] NVFuserTest.FusionGridReduction2_CUDA 2023-01-11T23:17:17.8040299Z [ OK ] NVFuserTest.FusionGridReduction2_CUDA (247 ms) 2023-01-11T23:17:17.8040692Z [ RUN ] NVFuserTest.FusionGridReduction3dim1_CUDA 2023-01-11T23:17:18.0161262Z [ OK ] NVFuserTest.FusionGridReduction3dim1_CUDA (212 ms) 2023-01-11T23:17:18.0161701Z [ RUN ] NVFuserTest.FusionGridReduction3dim0_CUDA 2023-01-11T23:17:18.2275224Z [ OK ] NVFuserTest.FusionGridReduction3dim0_CUDA (211 ms) 2023-01-11T23:17:18.2275947Z [ RUN ] NVFuserTest.FusionGridReduction4_CUDA 2023-01-11T23:17:18.4526313Z [ OK ] NVFuserTest.FusionGridReduction4_CUDA (225 ms) 2023-01-11T23:17:18.4526746Z [ RUN ] NVFuserTest.FusionGridReduction5_CUDA 2023-01-11T23:17:18.6667660Z [ OK ] NVFuserTest.FusionGridReduction5_CUDA (213 ms) 2023-01-11T23:17:18.6668385Z [ RUN ] NVFuserTest.FusionGridReduction6_CUDA 2023-01-11T23:17:18.8983119Z [ OK ] NVFuserTest.FusionGridReduction6_CUDA (231 ms) 2023-01-11T23:17:18.8983906Z [ RUN ] NVFuserTest.FusionGridReduction7_CUDA 2023-01-11T23:17:19.1041619Z [ OK ] NVFuserTest.FusionGridReduction7_CUDA (206 ms) 2023-01-11T23:17:19.1042032Z [ RUN ] NVFuserTest.FusionGridReduction8_CUDA 2023-01-11T23:17:19.3081267Z [ OK ] NVFuserTest.FusionGridReduction8_CUDA (203 ms) 2023-01-11T23:17:19.3081673Z [ RUN ] NVFuserTest.FusionGridReduction9_CUDA 2023-01-11T23:17:19.5229240Z [ OK ] 
NVFuserTest.FusionGridReduction9_CUDA (214 ms) 2023-01-11T23:17:19.5229658Z [ RUN ] NVFuserTest.FusionGridReduction10_CUDA 2023-01-11T23:17:19.7519099Z [ OK ] NVFuserTest.FusionGridReduction10_CUDA (228 ms) 2023-01-11T23:17:19.7519508Z [ RUN ] NVFuserTest.FusionNonRedAxisBind_CUDA 2023-01-11T23:17:19.9229605Z [ OK ] NVFuserTest.FusionNonRedAxisBind_CUDA (170 ms) 2023-01-11T23:17:19.9230042Z [ RUN ] NVFuserTest.FusionSplitBCast_CUDA 2023-01-11T23:17:20.1129679Z [ OK ] NVFuserTest.FusionSplitBCast_CUDA (190 ms) 2023-01-11T23:17:20.1130085Z [ RUN ] NVFuserTest.FusionBCastInnerDim_CUDA 2023-01-11T23:17:20.1130449Z [ OK ] NVFuserTest.FusionBCastInnerDim_CUDA (0 ms) 2023-01-11T23:17:20.1130804Z [ RUN ] NVFuserTest.FusionBCastReduce_CUDA 2023-01-11T23:17:20.1131161Z [ OK ] NVFuserTest.FusionBCastReduce_CUDA (0 ms) 2023-01-11T23:17:20.1134002Z [ RUN ] NVFuserTest.FusionReductionMultiConsumer_CUDA 2023-01-11T23:17:20.1135237Z [ OK ] NVFuserTest.FusionReductionMultiConsumer_CUDA (0 ms) 2023-01-11T23:17:20.1136065Z [ RUN ] NVFuserTest.FusionComputeAtExprOrder1_CUDA 2023-01-11T23:17:20.4582809Z [ OK ] NVFuserTest.FusionComputeAtExprOrder1_CUDA (344 ms) 2023-01-11T23:17:20.4583335Z [ RUN ] NVFuserTest.FusionComputeAtExprOrder2_CUDA 2023-01-11T23:17:20.7298450Z [ OK ] NVFuserTest.FusionComputeAtExprOrder2_CUDA (271 ms) 2023-01-11T23:17:20.7298874Z [ RUN ] NVFuserTest.FusionComputeAtExprOrder3_CUDA 2023-01-11T23:17:21.4906425Z [ OK ] NVFuserTest.FusionComputeAtExprOrder3_CUDA (760 ms) 2023-01-11T23:17:21.4907296Z [ RUN ] NVFuserTest.FusionZeroDimComputeAt_CUDA 2023-01-11T23:17:21.6597543Z [ OK ] NVFuserTest.FusionZeroDimComputeAt_CUDA (169 ms) 2023-01-11T23:17:21.6598356Z [ RUN ] NVFuserTest.FusionZeroDimBroadcast_CUDA 2023-01-11T23:17:21.8411430Z [ OK ] NVFuserTest.FusionZeroDimBroadcast_CUDA (181 ms) 2023-01-11T23:17:21.8412681Z [ RUN ] NVFuserTest.FusionZeroDimReduction_CUDA 2023-01-11T23:17:22.0533418Z [ OK ] NVFuserTest.FusionZeroDimReduction_CUDA (212 ms) 2023-01-11T23:17:22.0534241Z [ RUN ] NVFuserTest.FusionBCastAfterReduce_CUDA 2023-01-11T23:17:22.2475883Z [ OK ] NVFuserTest.FusionBCastAfterReduce_CUDA (194 ms) 2023-01-11T23:17:22.2476725Z [ RUN ] NVFuserTest.FusionOutputBroadcast_CUDA 2023-01-11T23:17:22.4204328Z [ OK ] NVFuserTest.FusionOutputBroadcast_CUDA (173 ms) 2023-01-11T23:17:22.4204827Z [ RUN ] NVFuserTest.FusionReductionKeepDimBasic_CUDA 2023-01-11T23:17:26.1359962Z [ OK ] NVFuserTest.FusionReductionKeepDimBasic_CUDA (3715 ms) 2023-01-11T23:17:26.1360680Z [ RUN ] NVFuserTest.FusionReductionKeepDimScheduler_CUDA 2023-01-11T23:17:26.3538212Z [ OK ] NVFuserTest.FusionReductionKeepDimScheduler_CUDA (217 ms) 2023-01-11T23:17:26.3539047Z [ RUN ] NVFuserTest.FusionSumTo_CUDA 2023-01-11T23:17:30.0736290Z [ OK ] NVFuserTest.FusionSumTo_CUDA (3719 ms) 2023-01-11T23:17:30.0736832Z [ RUN ] NVFuserTest.FusionSumToNoop_CUDA 2023-01-11T23:17:30.4026884Z [ OK ] NVFuserTest.FusionSumToNoop_CUDA (329 ms) 2023-01-11T23:17:30.4027488Z [ RUN ] NVFuserTest.FusionReductionScheduler_CUDA 2023-01-11T23:17:30.6199089Z [ OK ] NVFuserTest.FusionReductionScheduler_CUDA (217 ms) 2023-01-11T23:17:30.6200031Z [ RUN ] NVFuserTest.FusionReductionWithTrivialReduction_CUDA 2023-01-11T23:17:35.0989311Z [ OK ] NVFuserTest.FusionReductionWithTrivialReduction_CUDA (4478 ms) 2023-01-11T23:17:35.0990236Z [ RUN ] NVFuserTest.FusionSymbolicReduction_CUDA 2023-01-11T23:17:35.2864778Z [ OK ] NVFuserTest.FusionSymbolicReduction_CUDA (187 ms) 2023-01-11T23:17:35.2865283Z [ RUN ] 
NVFuserTest.FusionReductionSchedulerMultiDimNonFastest_CUDA 2023-01-11T23:17:35.5094744Z [ OK ] NVFuserTest.FusionReductionSchedulerMultiDimNonFastest_CUDA (223 ms) 2023-01-11T23:17:35.5095463Z [ RUN ] NVFuserTest.FusionReductionSchedulerMultiDimFastest_CUDA 2023-01-11T23:17:35.7274522Z [ OK ] NVFuserTest.FusionReductionSchedulerMultiDimFastest_CUDA (217 ms) 2023-01-11T23:17:35.7275077Z [ RUN ] NVFuserTest.FusionReductionSchedulerNoODimShmoo_CUDA 2023-01-11T23:17:45.1729172Z [ OK ] NVFuserTest.FusionReductionSchedulerNoODimShmoo_CUDA (9445 ms) 2023-01-11T23:17:45.1729663Z [ RUN ] NVFuserTest.FusionReductionSchedulerDimShmoo_CUDA 2023-01-11T23:18:25.7474117Z [ OK ] NVFuserTest.FusionReductionSchedulerDimShmoo_CUDA (40574 ms) 2023-01-11T23:18:25.7475087Z [ RUN ] NVFuserTest.FusionCacheBefore_CUDA 2023-01-11T23:18:25.9231969Z [ OK ] NVFuserTest.FusionCacheBefore_CUDA (175 ms) 2023-01-11T23:18:25.9232933Z [ RUN ] NVFuserTest.FusionCacheAfter_CUDA 2023-01-11T23:18:26.0982720Z [ OK ] NVFuserTest.FusionCacheAfter_CUDA (174 ms) 2023-01-11T23:18:26.0983515Z [ RUN ] NVFuserTest.FusionCacheFork_CUDA 2023-01-11T23:18:26.2759374Z [ OK ] NVFuserTest.FusionCacheFork_CUDA (177 ms) 2023-01-11T23:18:26.2759757Z [ RUN ] NVFuserTest.FusionCacheIndirect_CUDA 2023-01-11T23:18:26.4572162Z [ OK ] NVFuserTest.FusionCacheIndirect_CUDA (181 ms) 2023-01-11T23:18:26.4572983Z [ RUN ] NVFuserTest.FusionCacheBcast_CUDA 2023-01-11T23:18:26.9200552Z [ OK ] NVFuserTest.FusionCacheBcast_CUDA (462 ms) 2023-01-11T23:18:26.9201030Z [ RUN ] NVFuserTest.FusionCacheMultiConsumer_CUDA 2023-01-11T23:18:27.1002785Z [ OK ] NVFuserTest.FusionCacheMultiConsumer_CUDA (180 ms) 2023-01-11T23:18:27.1003180Z [ RUN ] NVFuserTest.FusionSmem_CUDA 2023-01-11T23:18:27.3894992Z [ OK ] NVFuserTest.FusionSmem_CUDA (288 ms) 2023-01-11T23:18:27.3896324Z [ RUN ] NVFuserTest.FusionSmemReduce_CUDA 2023-01-11T23:18:28.6751796Z [ OK ] NVFuserTest.FusionSmemReduce_CUDA (1285 ms) 2023-01-11T23:18:28.6752300Z [ RUN ] NVFuserTest.FusionSmemBlockGemm_CUDA 2023-01-11T23:18:28.9314097Z [ OK ] NVFuserTest.FusionSmemBlockGemm_CUDA (256 ms) 2023-01-11T23:18:28.9315115Z [ RUN ] NVFuserTest.FusionSmemBlockGemmCache_CUDA 2023-01-11T23:18:29.1845230Z [ OK ] NVFuserTest.FusionSmemBlockGemmCache_CUDA (253 ms) 2023-01-11T23:18:29.1845764Z [ RUN ] NVFuserTest.FusionSmemDynamicPersistentSoftmax2D_CUDA 2023-01-11T23:18:29.4242309Z [ OK ] NVFuserTest.FusionSmemDynamicPersistentSoftmax2D_CUDA (239 ms) 2023-01-11T23:18:29.4242820Z [ RUN ] NVFuserTest.FusionMagicSchedulerSoftmax_CUDA 2023-01-11T23:18:29.8499499Z [ OK ] NVFuserTest.FusionMagicSchedulerSoftmax_CUDA (425 ms) 2023-01-11T23:18:29.8500031Z [ RUN ] NVFuserTest.FusionTestMaskSoftmax_CUDA 2023-01-11T23:18:30.1598962Z [ OK ] NVFuserTest.FusionTestMaskSoftmax_CUDA (309 ms) 2023-01-11T23:18:30.1599472Z [ RUN ] NVFuserTest.FusionMagicSchedulerLayerNormBackward_CUDA 2023-01-11T23:18:31.2048010Z [ OK ] NVFuserTest.FusionMagicSchedulerLayerNormBackward_CUDA (1045 ms) 2023-01-11T23:18:31.2048509Z [ RUN ] NVFuserTest.FusionMagicSchedulerRMSNormBackward_CUDA 2023-01-11T23:18:31.8839860Z [ OK ] NVFuserTest.FusionMagicSchedulerRMSNormBackward_CUDA (678 ms) 2023-01-11T23:18:31.8840841Z [ RUN ] NVFuserTest.FusionMagicSchedulerLayerNormalization_CUDA 2023-01-11T23:18:32.4080621Z [ OK ] NVFuserTest.FusionMagicSchedulerLayerNormalization_CUDA (524 ms) 2023-01-11T23:18:32.4081142Z [ RUN ] NVFuserTest.FusionMagicSchedulerRMSNormalization_CUDA 2023-01-11T23:18:32.6479052Z [ OK ] NVFuserTest.FusionMagicSchedulerRMSNormalization_CUDA (239 ms) 
2023-01-11T23:18:32.6479616Z [ RUN ] NVFuserTest.FusionMagicSchedulerBatchNormalization_CUDA 2023-01-11T23:18:34.9093561Z [ OK ] NVFuserTest.FusionMagicSchedulerBatchNormalization_CUDA (2261 ms) 2023-01-11T23:18:34.9095467Z [ RUN ] NVFuserTest.FusionMagicSchedulerInstanceNormalization_CUDA 2023-01-11T23:18:36.1120765Z [ OK ] NVFuserTest.FusionMagicSchedulerInstanceNormalization_CUDA (1202 ms) 2023-01-11T23:18:36.1121408Z [ RUN ] NVFuserTest.FusionMagicSchedulerInstanceNormalizationBackward_CUDA 2023-01-11T23:18:37.6478406Z [ OK ] NVFuserTest.FusionMagicSchedulerInstanceNormalizationBackward_CUDA (1535 ms) 2023-01-11T23:18:37.6479537Z [ RUN ] NVFuserTest.FusionPersistentSoftmaxLocalShared_CUDA 2023-01-11T23:18:40.6363863Z [ OK ] NVFuserTest.FusionPersistentSoftmaxLocalShared_CUDA (2988 ms) 2023-01-11T23:18:40.6364357Z [ RUN ] NVFuserTest.FusionPersistentNormLocalShared_CUDA 2023-01-11T23:18:41.1261651Z [ OK ] NVFuserTest.FusionPersistentNormLocalShared_CUDA (489 ms) 2023-01-11T23:18:41.1262145Z [ RUN ] NVFuserTest.FusionSmemDynamicPersistentNorm_CUDA 2023-01-11T23:18:41.3656038Z [ OK ] NVFuserTest.FusionSmemDynamicPersistentNorm_CUDA (239 ms) 2023-01-11T23:18:41.3657002Z [ RUN ] NVFuserTest.FusionSmemDynamicReductionSymbolic_CUDA 2023-01-11T23:18:41.5608517Z [ OK ] NVFuserTest.FusionSmemDynamicReductionSymbolic_CUDA (195 ms) 2023-01-11T23:18:41.5609058Z [ RUN ] NVFuserTest.FusionSmemDynamicReductionSymbolicArg_CUDA 2023-01-11T23:18:42.8851544Z [ OK ] NVFuserTest.FusionSmemDynamicReductionSymbolicArg_CUDA (1324 ms) 2023-01-11T23:18:42.8852117Z [ RUN ] NVFuserTest.FusionSmemDynamicPwiseMulSymbolicArgWAR_CUDA 2023-01-11T23:18:45.8315205Z [ OK ] NVFuserTest.FusionSmemDynamicPwiseMulSymbolicArgWAR_CUDA (2946 ms) 2023-01-11T23:18:45.8316109Z [ RUN ] NVFuserTest.FusionSmemDynamicTiledGemm_CUDA 2023-01-11T23:18:46.0798515Z [ OK ] NVFuserTest.FusionSmemDynamicTiledGemm_CUDA (248 ms) 2023-01-11T23:18:46.0799553Z [ RUN ] NVFuserTest.FusionGlobalIntermediate_CUDA 2023-01-11T23:18:46.2765075Z [ OK ] NVFuserTest.FusionGlobalIntermediate_CUDA (197 ms) 2023-01-11T23:18:46.2765635Z [ RUN ] NVFuserTest.FusionGlobalIntermediateDefaultSchedule_CUDA 2023-01-11T23:18:46.4631536Z [ OK ] NVFuserTest.FusionGlobalIntermediateDefaultSchedule_CUDA (186 ms) 2023-01-11T23:18:46.4632560Z [ RUN ] NVFuserTest.FusionConstCheck_CUDA 2023-01-11T23:18:46.4633263Z [ OK ] NVFuserTest.FusionConstCheck_CUDA (0 ms) 2023-01-11T23:18:46.4633682Z [ RUN ] NVFuserTest.FusionUnrollWithAlloc_CUDA 2023-01-11T23:18:46.6661322Z [ OK ] NVFuserTest.FusionUnrollWithAlloc_CUDA (202 ms) 2023-01-11T23:18:46.6662242Z [ RUN ] NVFuserTest.FusionIsZeroInt_CUDA 2023-01-11T23:18:46.6662993Z [ OK ] NVFuserTest.FusionIsZeroInt_CUDA (0 ms) 2023-01-11T23:18:46.6664397Z [ RUN ] NVFuserTest.FusionIsOneInt_CUDA 2023-01-11T23:18:46.6664775Z [ OK ] NVFuserTest.FusionIsOneInt_CUDA (0 ms) 2023-01-11T23:18:46.6665178Z [ RUN ] NVFuserTest.FusionComputeAtNonterminatingOutput_CUDA 2023-01-11T23:18:46.8390415Z [ OK ] NVFuserTest.FusionComputeAtNonterminatingOutput_CUDA (172 ms) 2023-01-11T23:18:46.8391595Z [ RUN ] NVFuserTest.FusionTraversalOrder1_CUDA 2023-01-11T23:18:47.0163554Z [ OK ] NVFuserTest.FusionTraversalOrder1_CUDA (177 ms) 2023-01-11T23:18:47.0163977Z [ RUN ] NVFuserTest.FusionTraversalOrder2_CUDA 2023-01-11T23:18:47.1924465Z [ OK ] NVFuserTest.FusionTraversalOrder2_CUDA (175 ms) 2023-01-11T23:18:47.1924901Z [ RUN ] NVFuserTest.FusionTraversalOrder3_CUDA 2023-01-11T23:18:47.7282917Z [ OK ] NVFuserTest.FusionTraversalOrder3_CUDA (535 ms) 
2023-01-11T23:18:47.7283382Z [ RUN ] NVFuserTest.FusionTraversalOrder4_CUDA 2023-01-11T23:18:47.9045386Z [ OK ] NVFuserTest.FusionTraversalOrder4_CUDA (176 ms) 2023-01-11T23:18:47.9045850Z [ RUN ] NVFuserTest.FusionTraversalOrder5_CUDA 2023-01-11T23:18:48.0805266Z [ OK ] NVFuserTest.FusionTraversalOrder5_CUDA (176 ms) 2023-01-11T23:18:48.0805690Z [ RUN ] NVFuserTest.FusionTraversalOrder6_CUDA 2023-01-11T23:18:48.3424198Z [ OK ] NVFuserTest.FusionTraversalOrder6_CUDA (261 ms) 2023-01-11T23:18:48.3424650Z [ RUN ] NVFuserTest.FusionTraversalOrder7_CUDA 2023-01-11T23:18:48.7228864Z [ OK ] NVFuserTest.FusionTraversalOrder7_CUDA (380 ms) 2023-01-11T23:18:48.7229687Z [ RUN ] NVFuserTest.FusionThreadPredicate_CUDA 2023-01-11T23:18:48.9446106Z [ OK ] NVFuserTest.FusionThreadPredicate_CUDA (221 ms) 2023-01-11T23:18:48.9446538Z [ RUN ] NVFuserTest.FusionLSTMCell_CUDA 2023-01-11T23:18:49.5344288Z [ OK ] NVFuserTest.FusionLSTMCell_CUDA (589 ms) 2023-01-11T23:18:49.5344706Z [ RUN ] NVFuserTest.FusionReductionHalf_CUDA 2023-01-11T23:18:49.7639027Z [ OK ] NVFuserTest.FusionReductionHalf_CUDA (229 ms) 2023-01-11T23:18:49.7639805Z [ RUN ] NVFuserTest.FusionReduceSingle_CUDA 2023-01-11T23:18:50.0720471Z [ OK ] NVFuserTest.FusionReduceSingle_CUDA (307 ms) 2023-01-11T23:18:50.0721333Z [ RUN ] NVFuserTest.FusionReduceImplicitBroadcast_CUDA 2023-01-11T23:18:50.2914387Z [ OK ] NVFuserTest.FusionReduceImplicitBroadcast_CUDA (219 ms) 2023-01-11T23:18:50.2915023Z [ RUN ] NVFuserTest.FusionReduceImplicitBroadcast2_CUDA 2023-01-11T23:18:50.5123824Z [ OK ] NVFuserTest.FusionReduceImplicitBroadcast2_CUDA (220 ms) 2023-01-11T23:18:50.5125068Z [ RUN ] NVFuserTest.FusionReduceImplicitBroadcast3_CUDA 2023-01-11T23:18:50.7325543Z [ OK ] NVFuserTest.FusionReduceImplicitBroadcast3_CUDA (220 ms) 2023-01-11T23:18:50.7326081Z [ RUN ] NVFuserTest.FusionTrivialReduction_CUDA 2023-01-11T23:18:51.2474858Z [ OK ] NVFuserTest.FusionTrivialReduction_CUDA (514 ms) 2023-01-11T23:18:51.2475298Z [ RUN ] NVFuserTest.FusionTrivialReduction2_CUDA 2023-01-11T23:18:51.4323924Z [ OK ] NVFuserTest.FusionTrivialReduction2_CUDA (184 ms) 2023-01-11T23:18:51.4324438Z [ RUN ] NVFuserTest.FusionTrivialReduction3_CUDA 2023-01-11T23:18:51.6157785Z [ OK ] NVFuserTest.FusionTrivialReduction3_CUDA (183 ms) 2023-01-11T23:18:51.6158609Z [ RUN ] NVFuserTest.FusionDetectTrivialReduction1_CUDA 2023-01-11T23:18:51.8109956Z [ OK ] NVFuserTest.FusionDetectTrivialReduction1_CUDA (194 ms) 2023-01-11T23:18:51.8111174Z [ RUN ] NVFuserTest.FusionDetectTrivialReduction2_CUDA 2023-01-11T23:18:51.8132102Z [ OK ] NVFuserTest.FusionDetectTrivialReduction2_CUDA (2 ms) 2023-01-11T23:18:51.8132683Z [ RUN ] NVFuserTest.FusionInputsIdLookup_CUDA 2023-01-11T23:18:51.8133154Z [ OK ] NVFuserTest.FusionInputsIdLookup_CUDA (0 ms) 2023-01-11T23:18:51.8133632Z [ RUN ] NVFuserTest.FusionGroupGuardSimpleTensor_CUDA 2023-01-11T23:18:51.8134293Z [ OK ] NVFuserTest.FusionGroupGuardSimpleTensor_CUDA (0 ms) 2023-01-11T23:18:51.8135096Z [ RUN ] NVFuserTest.FusionGroupGuardBroadcastTensor_CUDA 2023-01-11T23:18:51.8135623Z [ OK ] NVFuserTest.FusionGroupGuardBroadcastTensor_CUDA (0 ms) 2023-01-11T23:18:51.8136197Z [ RUN ] NVFuserTest.FusionGroupGuardPermutedTensor_CUDA 2023-01-11T23:18:51.8136663Z [ OK ] NVFuserTest.FusionGroupGuardPermutedTensor_CUDA (0 ms) 2023-01-11T23:18:51.8140940Z [ RUN ] NVFuserTest.FusionGroupGuardRelaxedCheck_CUDA 2023-01-11T23:18:51.8141553Z [ OK ] NVFuserTest.FusionGroupGuardRelaxedCheck_CUDA (0 ms) 2023-01-11T23:18:51.8142017Z [ RUN ] NVFuserTest.FusionDisjointSet_CUDA 
2023-01-11T23:18:51.8142627Z [ OK ] NVFuserTest.FusionDisjointSet_CUDA (0 ms) 2023-01-11T23:18:51.8143010Z [ RUN ] NVFuserTest.FusionNonUniqueBroadcastSize_CUDA 2023-01-11T23:18:51.8167773Z [ OK ] NVFuserTest.FusionNonUniqueBroadcastSize_CUDA (3 ms) 2023-01-11T23:18:51.8168185Z [ RUN ] NVFuserTest.FusionBiasGeluFwd_CUDA 2023-01-11T23:18:52.0920124Z [ OK ] NVFuserTest.FusionBiasGeluFwd_CUDA (274 ms) 2023-01-11T23:18:52.0920912Z [ RUN ] NVFuserTest.FusionBiasGeluBwd_CUDA 2023-01-11T23:18:52.4220230Z [ OK ] NVFuserTest.FusionBiasGeluBwd_CUDA (330 ms) 2023-01-11T23:18:52.4220985Z [ RUN ] NVFuserTest.FusionIssue459_CUDA 2023-01-11T23:18:52.6028123Z [ OK ] NVFuserTest.FusionIssue459_CUDA (180 ms) 2023-01-11T23:18:52.6028918Z [ RUN ] NVFuserTest.FusionSmemIndexingSimple_CUDA 2023-01-11T23:18:52.7737186Z [ OK ] NVFuserTest.FusionSmemIndexingSimple_CUDA (171 ms) 2023-01-11T23:18:52.7737612Z [ RUN ] NVFuserTest.FusionSmemIndexing_CUDA 2023-01-11T23:18:53.0096690Z [ OK ] NVFuserTest.FusionSmemIndexing_CUDA (235 ms) 2023-01-11T23:18:53.0097524Z [ RUN ] NVFuserTest.FusionCacheBeforeReduction_CUDA 2023-01-11T23:18:53.2121512Z [ OK ] NVFuserTest.FusionCacheBeforeReduction_CUDA (202 ms) 2023-01-11T23:18:53.2122434Z [ RUN ] NVFuserTest.FusionCacheBeforeReduction2_CUDA 2023-01-11T23:18:53.3869256Z [ OK ] NVFuserTest.FusionCacheBeforeReduction2_CUDA (174 ms) 2023-01-11T23:18:53.3870067Z [ RUN ] NVFuserTest.FusionIssue367_CUDA 2023-01-11T23:18:53.6461683Z [ OK ] NVFuserTest.FusionIssue367_CUDA (259 ms) 2023-01-11T23:18:53.6462808Z [ RUN ] NVFuserTest.FusionIssue468_CUDA 2023-01-11T23:18:53.8375648Z [ OK ] NVFuserTest.FusionIssue468_CUDA (191 ms) 2023-01-11T23:18:53.8376034Z [ RUN ] NVFuserTest.FusionIssue363_CUDA 2023-01-11T23:18:54.0510888Z [ OK ] NVFuserTest.FusionIssue363_CUDA (213 ms) 2023-01-11T23:18:54.0511652Z [ RUN ] NVFuserTest.FusionIssue484_CUDA 2023-01-11T23:18:54.2388292Z [ OK ] NVFuserTest.FusionIssue484_CUDA (187 ms) 2023-01-11T23:18:54.2389229Z [ RUN ] NVFuserTest.FusionIssue329_CUDA 2023-01-11T23:18:54.4152288Z [ OK ] NVFuserTest.FusionIssue329_CUDA (176 ms) 2023-01-11T23:18:54.4153024Z [ RUN ] NVFuserTest.FusionIssue382_CUDA 2023-01-11T23:18:54.5959659Z [ OK ] NVFuserTest.FusionIssue382_CUDA (181 ms) 2023-01-11T23:18:54.5960044Z [ RUN ] NVFuserTest.FusionIssue507_CUDA 2023-01-11T23:18:54.7655716Z [ OK ] NVFuserTest.FusionIssue507_CUDA (169 ms) 2023-01-11T23:18:54.7656533Z [ RUN ] NVFuserTest.FusionIssue532_CUDA 2023-01-11T23:18:55.0485066Z [ OK ] NVFuserTest.FusionIssue532_CUDA (282 ms) 2023-01-11T23:18:55.0485505Z [ RUN ] NVFuserTest.FusionLoopUnswitch_CUDA 2023-01-11T23:18:55.2276736Z [ OK ] NVFuserTest.FusionLoopUnswitch_CUDA (178 ms) 2023-01-11T23:18:55.2277126Z [ RUN ] NVFuserTest.FusionIssue549_CUDA 2023-01-11T23:18:55.4317511Z [ OK ] NVFuserTest.FusionIssue549_CUDA (204 ms) 2023-01-11T23:18:55.4318014Z [ RUN ] NVFuserTest.FusionSimpleCompileRtc_CUDA 2023-01-11T23:18:55.6092416Z [ OK ] NVFuserTest.FusionSimpleCompileRtc_CUDA (177 ms) 2023-01-11T23:18:55.6092861Z [ RUN ] NVFuserTest.FusionSerialWelford_CUDA 2023-01-11T23:18:55.8769532Z [ OK ] NVFuserTest.FusionSerialWelford_CUDA (267 ms) 2023-01-11T23:18:55.8770000Z [ RUN ] NVFuserTest.FusionBlockWelford_CUDA 2023-01-11T23:18:56.0833601Z [ OK ] NVFuserTest.FusionBlockWelford_CUDA (206 ms) 2023-01-11T23:18:56.0834661Z [ RUN ] NVFuserTest.FusionBlockWelfordNoInit_CUDA 2023-01-11T23:18:56.2864472Z [ OK ] NVFuserTest.FusionBlockWelfordNoInit_CUDA (203 ms) 2023-01-11T23:18:56.2864957Z [ RUN ] NVFuserTest.FusionGridWelfordNoInit_CUDA 
2023-01-11T23:18:56.5270157Z [ OK ] NVFuserTest.FusionGridWelfordNoInit_CUDA (240 ms) 2023-01-11T23:18:56.5271270Z [ RUN ] NVFuserTest.FusionWelfordOp_CUDA 2023-01-11T23:18:57.5592186Z [ OK ] NVFuserTest.FusionWelfordOp_CUDA (1032 ms) 2023-01-11T23:18:57.5593653Z [ RUN ] NVFuserTest.FusionBlockWelfordOp_CUDA 2023-01-11T23:18:57.7703999Z [ OK ] NVFuserTest.FusionBlockWelfordOp_CUDA (211 ms) 2023-01-11T23:18:57.7704810Z [ RUN ] NVFuserTest.FusionGridWelfordOp_CUDA 2023-01-11T23:18:58.0075063Z [ OK ] NVFuserTest.FusionGridWelfordOp_CUDA (236 ms) 2023-01-11T23:18:58.0075718Z [ RUN ] NVFuserTest.FusionRfactorWelfordOp_CUDA 2023-01-11T23:18:58.2157979Z [ OK ] NVFuserTest.FusionRfactorWelfordOp_CUDA (208 ms) 2023-01-11T23:18:58.2158502Z [ RUN ] NVFuserTest.FusionWelfordSchedule_CUDA 2023-01-11T23:18:58.4905580Z [ OK ] NVFuserTest.FusionWelfordSchedule_CUDA (274 ms) 2023-01-11T23:18:58.4906580Z [ RUN ] NVFuserTest.FusionWelfordShmoo_CUDA 2023-01-11T23:19:43.9584038Z [ OK ] NVFuserTest.FusionWelfordShmoo_CUDA (45467 ms) 2023-01-11T23:19:43.9584808Z [ RUN ] NVFuserTest.FusionVarMean_CUDA 2023-01-11T23:19:46.1965125Z [ OK ] NVFuserTest.FusionVarMean_CUDA (2238 ms) 2023-01-11T23:19:46.1965561Z [ RUN ] NVFuserTest.FusionSimpleGemmTransposed_CUDA 2023-01-11T23:19:46.4019419Z [ OK ] NVFuserTest.FusionSimpleGemmTransposed_CUDA (205 ms) 2023-01-11T23:19:46.4020328Z [ RUN ] NVFuserTest.FusionSoftmax3DTransposed_CUDA 2023-01-11T23:19:46.6015994Z [ OK ] NVFuserTest.FusionSoftmax3DTransposed_CUDA (199 ms) 2023-01-11T23:19:46.6016901Z [ RUN ] NVFuserTest.FusionAdvancedComputeAtTransposed1_CUDA 2023-01-11T23:19:46.8189681Z [ OK ] NVFuserTest.FusionAdvancedComputeAtTransposed1_CUDA (217 ms) 2023-01-11T23:19:46.8190234Z [ RUN ] NVFuserTest.FusionAdvancedComputeAtTransposed2_CUDA 2023-01-11T23:19:47.0363171Z [ OK ] NVFuserTest.FusionAdvancedComputeAtTransposed2_CUDA (217 ms) 2023-01-11T23:19:47.0363679Z [ RUN ] NVFuserTest.FusionAdvancedComputeAtTransposed3_CUDA 2023-01-11T23:19:47.3277644Z [ OK ] NVFuserTest.FusionAdvancedComputeAtTransposed3_CUDA (291 ms) 2023-01-11T23:19:47.3278593Z [ RUN ] NVFuserTest.FusionAdvancedComputeAtTransposed4_CUDA 2023-01-11T23:19:47.7201773Z [ OK ] NVFuserTest.FusionAdvancedComputeAtTransposed4_CUDA (392 ms) 2023-01-11T23:19:47.7202322Z [ RUN ] NVFuserTest.FusionAdvancedComputeAtTransposed5_CUDA 2023-01-11T23:19:47.9308892Z [ OK ] NVFuserTest.FusionAdvancedComputeAtTransposed5_CUDA (210 ms) 2023-01-11T23:19:47.9309800Z [ RUN ] NVFuserTest.FusionAdvancedComputeAtTransposed6_CUDA 2023-01-11T23:19:48.1402285Z [ OK ] NVFuserTest.FusionAdvancedComputeAtTransposed6_CUDA (209 ms) 2023-01-11T23:19:48.1402771Z [ RUN ] NVFuserTest.FusionSegmentReducePointwise_CUDA 2023-01-11T23:19:48.5594730Z [ OK ] NVFuserTest.FusionSegmentReducePointwise_CUDA (418 ms) 2023-01-11T23:19:48.5595614Z [ RUN ] NVFuserTest.FusionMultipleVectorize_CUDA 2023-01-11T23:19:48.9346776Z [ OK ] NVFuserTest.FusionMultipleVectorize_CUDA (375 ms) 2023-01-11T23:19:48.9347583Z [ RUN ] NVFuserTest.FusionVectorizeSimple_CUDA 2023-01-11T23:19:49.1549003Z [ OK ] NVFuserTest.FusionVectorizeSimple_CUDA (220 ms) 2023-01-11T23:19:49.1549544Z [ RUN ] NVFuserTest.FusionSimpleVectorizeUnroll_CUDA 2023-01-11T23:19:49.3753554Z [ OK ] NVFuserTest.FusionSimpleVectorizeUnroll_CUDA (220 ms) 2023-01-11T23:19:49.3754823Z [ RUN ] NVFuserTest.FusionSegmentReduceSoftmax_CUDA 2023-01-11T23:19:49.9001229Z [ OK ] NVFuserTest.FusionSegmentReduceSoftmax_CUDA (524 ms) 2023-01-11T23:19:49.9001659Z [ RUN ] NVFuserTest.FusionSwizzle1_CUDA 
2023-01-11T23:19:50.0742051Z [ OK ] NVFuserTest.FusionSwizzle1_CUDA (173 ms) 2023-01-11T23:19:50.0742481Z [ RUN ] NVFuserTest.FusionSwizzle2_CUDA 2023-01-11T23:19:50.2489118Z [ OK ] NVFuserTest.FusionSwizzle2_CUDA (174 ms) 2023-01-11T23:19:50.2489532Z [ RUN ] NVFuserTest.FusionGridPersistence_CUDA 2023-01-11T23:19:50.4784870Z [ OK ] NVFuserTest.FusionGridPersistence_CUDA (229 ms) 2023-01-11T23:19:50.4785728Z [ RUN ] NVFuserTest.FusionGridPersistence2_CUDA 2023-01-11T23:19:50.7138676Z [ OK ] NVFuserTest.FusionGridPersistence2_CUDA (235 ms) 2023-01-11T23:19:50.7139499Z [ RUN ] NVFuserTest.FusionWelfordPersistence_CUDA 2023-01-11T23:19:50.9855300Z [ OK ] NVFuserTest.FusionWelfordPersistence_CUDA (271 ms) 2023-01-11T23:19:50.9856167Z [ RUN ] NVFuserTest.FusionWelfordPersistence2_CUDA 2023-01-11T23:19:51.2707036Z [ OK ] NVFuserTest.FusionWelfordPersistence2_CUDA (285 ms) 2023-01-11T23:19:51.2707866Z [ RUN ] NVFuserTest.FusionIssue633_CUDA 2023-01-11T23:19:51.4438124Z [ OK ] NVFuserTest.FusionIssue633_CUDA (173 ms) 2023-01-11T23:19:51.4438972Z [ RUN ] NVFuserTest.FusionBroadcastAcrossComputeAt_CUDA 2023-01-11T23:19:51.6215697Z [ OK ] NVFuserTest.FusionBroadcastAcrossComputeAt_CUDA (177 ms) 2023-01-11T23:19:51.6216235Z [ RUN ] NVFuserTest.FusionVectorizeMisalignedPointwise_CUDA 2023-01-11T23:19:51.8445535Z [ OK ] NVFuserTest.FusionVectorizeMisalignedPointwise_CUDA (223 ms) 2023-01-11T23:19:51.8446062Z [ RUN ] NVFuserTest.FusionVectorizeMisalignedPointwiseMergeContig_CUDA 2023-01-11T23:19:52.0704178Z [ OK ] NVFuserTest.FusionVectorizeMisalignedPointwiseMergeContig_CUDA (225 ms) 2023-01-11T23:19:52.0705299Z [ RUN ] NVFuserTest.FusionVectorizeMisalignedPointwiseMergeSymbolicPass_CUDA 2023-01-11T23:19:52.2954896Z [ OK ] NVFuserTest.FusionVectorizeMisalignedPointwiseMergeSymbolicPass_CUDA (224 ms) 2023-01-11T23:19:52.2956039Z [ RUN ] NVFuserTest.FusionVectorizeMisalignedPointwiseMergeSymbolicFail_CUDA 2023-01-11T23:19:52.3038417Z [ OK ] NVFuserTest.FusionVectorizeMisalignedPointwiseMergeSymbolicFail_CUDA (8 ms) 2023-01-11T23:19:52.3039442Z [ RUN ] NVFuserTest.FusionVectorizeMisalignedRFactor_CUDA 2023-01-11T23:19:52.5359941Z [ OK ] NVFuserTest.FusionVectorizeMisalignedRFactor_CUDA (231 ms) 2023-01-11T23:19:52.5360843Z [ RUN ] NVFuserTest.FusionVectorizeMisalignedWrongDimFail_CUDA 2023-01-11T23:19:52.5410701Z [ OK ] NVFuserTest.FusionVectorizeMisalignedWrongDimFail_CUDA (5 ms) 2023-01-11T23:19:52.5411684Z [ RUN ] NVFuserTest.FusionVectorizeMisalignedStride_CUDA 2023-01-11T23:19:52.7635510Z [ OK ] NVFuserTest.FusionVectorizeMisalignedStride_CUDA (221 ms) 2023-01-11T23:19:52.7636456Z [ RUN ] NVFuserTest.FusionVectorizeMisalignedStrideFail_CUDA 2023-01-11T23:19:52.9920421Z [ OK ] NVFuserTest.FusionVectorizeMisalignedStrideFail_CUDA (228 ms) 2023-01-11T23:19:52.9921111Z [ RUN ] NVFuserTest.FusionVectorization1_CUDA 2023-01-11T23:19:53.2052154Z [ OK ] NVFuserTest.FusionVectorization1_CUDA (213 ms) 2023-01-11T23:19:53.2052568Z [ RUN ] NVFuserTest.FusionVectorization2_CUDA 2023-01-11T23:19:53.2108799Z [ OK ] NVFuserTest.FusionVectorization2_CUDA (5 ms) 2023-01-11T23:19:53.2109600Z [ RUN ] NVFuserTest.FusionVectorization3_CUDA 2023-01-11T23:19:53.4280210Z [ OK ] NVFuserTest.FusionVectorization3_CUDA (216 ms) 2023-01-11T23:19:53.4281398Z [ RUN ] NVFuserTest.FusionVectorizationRFactor_CUDA 2023-01-11T23:19:53.6333745Z [ OK ] NVFuserTest.FusionVectorizationRFactor_CUDA (205 ms) 2023-01-11T23:19:53.6335192Z [ RUN ] NVFuserTest.FusionSizeOneLoop1_CUDA 2023-01-11T23:19:53.8210088Z [ OK ] 
NVFuserTest.FusionSizeOneLoop1_CUDA (187 ms) 2023-01-11T23:19:53.8210511Z [ RUN ] NVFuserTest.FusionSizeOneLoop2_CUDA 2023-01-11T23:19:54.0130692Z [ OK ] NVFuserTest.FusionSizeOneLoop2_CUDA (191 ms) 2023-01-11T23:19:54.0131242Z [ RUN ] NVFuserTest.FusionValidateParallelize1_CUDA 2023-01-11T23:19:54.0162204Z [ OK ] NVFuserTest.FusionValidateParallelize1_CUDA (3 ms) 2023-01-11T23:19:54.0162781Z [ RUN ] NVFuserTest.FusionValidateParallelize2_CUDA 2023-01-11T23:19:54.1878959Z [ OK ] NVFuserTest.FusionValidateParallelize2_CUDA (171 ms) 2023-01-11T23:19:54.1880068Z [ RUN ] NVFuserTest.FusionValidateParallelize3_CUDA 2023-01-11T23:19:54.3640662Z [ OK ] NVFuserTest.FusionValidateParallelize3_CUDA (176 ms) 2023-01-11T23:19:54.3641697Z [ RUN ] NVFuserTest.FusionValidateParallelize4_CUDA 2023-01-11T23:19:54.5408008Z [ OK ] NVFuserTest.FusionValidateParallelize4_CUDA (177 ms) 2023-01-11T23:19:54.5408540Z [ RUN ] NVFuserTest.FusionValidateParallelize5_CUDA 2023-01-11T23:19:54.7184625Z [ OK ] NVFuserTest.FusionValidateParallelize5_CUDA (177 ms) 2023-01-11T23:19:54.7185153Z [ RUN ] NVFuserTest.FusionValidateParallelize6_CUDA 2023-01-11T23:19:54.7249386Z [ OK ] NVFuserTest.FusionValidateParallelize6_CUDA (6 ms) 2023-01-11T23:19:54.7249813Z [ RUN ] NVFuserTest.FusionValidateParallelize7_CUDA 2023-01-11T23:19:54.7262156Z [ OK ] NVFuserTest.FusionValidateParallelize7_CUDA (1 ms) 2023-01-11T23:19:54.7262574Z [ RUN ] NVFuserTest.FusionDAGMerging_CUDA 2023-01-11T23:19:54.7433856Z [ OK ] NVFuserTest.FusionDAGMerging_CUDA (16 ms) 2023-01-11T23:19:54.7434662Z [ RUN ] NVFuserTest.FusionDAGScalarMerging_CUDA 2023-01-11T23:19:55.1639087Z [ OK ] NVFuserTest.FusionDAGScalarMerging_CUDA (420 ms) 2023-01-11T23:19:55.1639962Z [ RUN ] NVFuserTest.FusionBlockReduceInSerialLoop_CUDA 2023-01-11T23:19:55.3482840Z [ OK ] NVFuserTest.FusionBlockReduceInSerialLoop_CUDA (184 ms) 2023-01-11T23:19:55.3483317Z [ RUN ] NVFuserTest.FusionBlockWelfordInSerialLoop_CUDA 2023-01-11T23:19:55.5533818Z [ OK ] NVFuserTest.FusionBlockWelfordInSerialLoop_CUDA (205 ms) 2023-01-11T23:19:55.5535272Z [ RUN ] NVFuserTest.FusionIOTensorTrivialReductionRepro_CUDA 2023-01-11T23:19:55.7235136Z [ OK ] NVFuserTest.FusionIOTensorTrivialReductionRepro_CUDA (170 ms) 2023-01-11T23:19:55.7235634Z [ RUN ] NVFuserTest.FusionReductionPredicate_CUDA 2023-01-11T23:19:56.0479731Z [ OK ] NVFuserTest.FusionReductionPredicate_CUDA (324 ms) 2023-01-11T23:19:56.0480726Z [ RUN ] NVFuserTest.FusionIssue728_CUDA 2023-01-11T23:19:56.0481419Z [ OK ] NVFuserTest.FusionIssue728_CUDA (0 ms) 2023-01-11T23:19:56.0481912Z [ RUN ] NVFuserTest.FusionIssue757_CUDA 2023-01-11T23:19:56.2388274Z [ OK ] NVFuserTest.FusionIssue757_CUDA (190 ms) 2023-01-11T23:19:56.2389323Z [ RUN ] NVFuserTest.FusionPredicatedBlockBroadcast_CUDA 2023-01-11T23:19:56.4315718Z [ OK ] NVFuserTest.FusionPredicatedBlockBroadcast_CUDA (192 ms) 2023-01-11T23:19:56.4316785Z [ RUN ] NVFuserTest.FusionSegmentVerticalMerge_CUDA 2023-01-11T23:19:56.4586180Z [ OK ] NVFuserTest.FusionSegmentVerticalMerge_CUDA (27 ms) 2023-01-11T23:19:56.4587054Z [ RUN ] NVFuserTest.FusionSegmentHorizontalMerge_CUDA 2023-01-11T23:19:56.4646287Z [ OK ] NVFuserTest.FusionSegmentHorizontalMerge_CUDA (6 ms) 2023-01-11T23:19:56.4647116Z [ RUN ] NVFuserTest.FusionSegmentMixReduction_CUDA 2023-01-11T23:19:56.4769251Z [ OK ] NVFuserTest.FusionSegmentMixReduction_CUDA (12 ms) 2023-01-11T23:19:56.4769634Z [ RUN ] NVFuserTest.FusionSBAR_CUDA 2023-01-11T23:19:56.7109465Z [ OK ] NVFuserTest.FusionSBAR_CUDA (233 ms) 2023-01-11T23:19:56.7109904Z [ RUN ] 
NVFuserTest.FusionSingleElement_CUDA 2023-01-11T23:19:56.8788686Z [ OK ] NVFuserTest.FusionSingleElement_CUDA (167 ms) 2023-01-11T23:19:56.8789504Z [ RUN ] NVFuserTest.FusionBNBackwardRepro_CUDA 2023-01-11T23:19:57.4563712Z [ OK ] NVFuserTest.FusionBNBackwardRepro_CUDA (577 ms) 2023-01-11T23:19:57.4564233Z [ RUN ] NVFuserTest.FusionBNBackwardRepro2_CUDA 2023-01-11T23:19:58.1802141Z [ OK ] NVFuserTest.FusionBNBackwardRepro2_CUDA (723 ms) 2023-01-11T23:19:58.1811115Z [ RUN ] NVFuserTest.FusionBNRepro_CUDA 2023-01-11T23:19:58.6199018Z [ OK ] NVFuserTest.FusionBNRepro_CUDA (439 ms) 2023-01-11T23:19:58.6199774Z [ RUN ] NVFuserTest.FusionBNRepro2_CUDA 2023-01-11T23:19:59.0030428Z [ OK ] NVFuserTest.FusionBNRepro2_CUDA (383 ms) 2023-01-11T23:19:59.0031441Z [ RUN ] NVFuserTest.FusionZeroSizeTensorPW_CUDA 2023-01-11T23:19:59.0051442Z [ OK ] NVFuserTest.FusionZeroSizeTensorPW_CUDA (2 ms) 2023-01-11T23:19:59.0051924Z [ RUN ] NVFuserTest.FusionZeroSizeTensorReduction_CUDA 2023-01-11T23:19:59.2246155Z [ OK ] NVFuserTest.FusionZeroSizeTensorReduction_CUDA (219 ms) 2023-01-11T23:19:59.2246645Z [ RUN ] NVFuserTest.FusionZeroSizeTensorNormalization_CUDA 2023-01-11T23:19:59.4450150Z [ OK ] NVFuserTest.FusionZeroSizeTensorNormalization_CUDA (220 ms) 2023-01-11T23:19:59.4450612Z [ RUN ] NVFuserTest.FusionSegmentIoAlias_CUDA 2023-01-11T23:19:59.8713460Z [ OK ] NVFuserTest.FusionSegmentIoAlias_CUDA (426 ms) 2023-01-11T23:19:59.8713892Z [ RUN ] NVFuserTest.FusionWelford1Output_CUDA 2023-01-11T23:20:00.1419731Z [ OK ] NVFuserTest.FusionWelford1Output_CUDA (270 ms) 2023-01-11T23:20:00.1420559Z [ RUN ] NVFuserTest.FusionTranslate1Welford_CUDA 2023-01-11T23:20:00.9497475Z [ OK ] NVFuserTest.FusionTranslate1Welford_CUDA (807 ms) 2023-01-11T23:20:00.9498288Z [ RUN ] NVFuserTest.FusionTranslate2Welford_CUDA 2023-01-11T23:20:02.2006705Z [ OK ] NVFuserTest.FusionTranslate2Welford_CUDA (1250 ms) 2023-01-11T23:20:02.2007170Z [ RUN ] NVFuserTest.FusionLargeWelfordNormalization_CUDA 2023-01-11T23:20:02.5163135Z [ OK ] NVFuserTest.FusionLargeWelfordNormalization_CUDA (315 ms) 2023-01-11T23:20:02.5163712Z [ RUN ] NVFuserTest.FusionWelfordOuterPersistence_CUDA 2023-01-11T23:20:04.6091037Z [ OK ] NVFuserTest.FusionWelfordOuterPersistence_CUDA (2093 ms) 2023-01-11T23:20:04.6091505Z [ RUN ] NVFuserTest.FusionSegmentIslands_CUDA 2023-01-11T23:20:05.0194767Z [ OK ] NVFuserTest.FusionSegmentIslands_CUDA (409 ms) 2023-01-11T23:20:05.0195251Z [ RUN ] NVFuserTest.FusionBackOffInnerBroadcast_CUDA 2023-01-11T23:20:05.0224213Z [ OK ] NVFuserTest.FusionBackOffInnerBroadcast_CUDA (3 ms) 2023-01-11T23:20:05.0224786Z [ RUN ] NVFuserTest.FusionBackOffInnerBroadcast2_CUDA 2023-01-11T23:20:05.0230454Z [ OK ] NVFuserTest.FusionBackOffInnerBroadcast2_CUDA (0 ms) 2023-01-11T23:20:05.0231092Z [ RUN ] NVFuserTest.FusionBackOffInnerBroadcast3_CUDA 2023-01-11T23:20:05.0237599Z [ OK ] NVFuserTest.FusionBackOffInnerBroadcast3_CUDA (0 ms) 2023-01-11T23:20:05.0238162Z [ RUN ] NVFuserTest.FusionSimpleWarp_CUDA 2023-01-11T23:20:05.2114334Z [ OK ] NVFuserTest.FusionSimpleWarp_CUDA (186 ms) 2023-01-11T23:20:05.2115229Z [ RUN ] NVFuserTest.FusionSimpleWarpPad_CUDA 2023-01-11T23:20:05.4124701Z [ OK ] NVFuserTest.FusionSimpleWarpPad_CUDA (201 ms) 2023-01-11T23:20:05.4125244Z [ RUN ] NVFuserTest.FusionWarpPadMergeSplit_CUDA 2023-01-11T23:20:05.6366850Z [ OK ] NVFuserTest.FusionWarpPadMergeSplit_CUDA (224 ms) 2023-01-11T23:20:05.6367296Z [ RUN ] NVFuserTest.FusionSerialWarpReduction_CUDA 2023-01-11T23:20:05.9149370Z [ OK ] NVFuserTest.FusionSerialWarpReduction_CUDA 
(278 ms) 2023-01-11T23:20:05.9150221Z [ RUN ] NVFuserTest.FusionTrivialWarpReduction_CUDA 2023-01-11T23:20:06.1255135Z [ OK ] NVFuserTest.FusionTrivialWarpReduction_CUDA (210 ms) 2023-01-11T23:20:06.1255597Z [ RUN ] NVFuserTest.FusionMultipleDimBinding_CUDA 2023-01-11T23:20:06.3528159Z [ OK ] NVFuserTest.FusionMultipleDimBinding_CUDA (227 ms) 2023-01-11T23:20:06.3528597Z [ RUN ] NVFuserTest.FusionPadNoWarpReduce_CUDA 2023-01-11T23:20:06.5394767Z [ OK ] NVFuserTest.FusionPadNoWarpReduce_CUDA (186 ms) 2023-01-11T23:20:06.5395264Z [ RUN ] NVFuserTest.FusionWarpMutipleThreadDim_CUDA 2023-01-11T23:20:06.7389833Z [ OK ] NVFuserTest.FusionWarpMutipleThreadDim_CUDA (199 ms) 2023-01-11T23:20:06.7390404Z [ RUN ] NVFuserTest.FusionWarpReduceUnrollOuterLoop_CUDA 2023-01-11T23:20:07.1016361Z [ OK ] NVFuserTest.FusionWarpReduceUnrollOuterLoop_CUDA (362 ms) 2023-01-11T23:20:07.1017390Z [ RUN ] NVFuserTest.FusionWarpReducePredication_CUDA 2023-01-11T23:20:07.3840946Z [ OK ] NVFuserTest.FusionWarpReducePredication_CUDA (282 ms) 2023-01-11T23:20:07.3841777Z [ RUN ] NVFuserTest.FusionSegfaultReduction_CUDA 2023-01-11T23:20:07.6038297Z [ OK ] NVFuserTest.FusionSegfaultReduction_CUDA (219 ms) 2023-01-11T23:20:07.6038867Z [ RUN ] NVFuserTest.FusionPredicateElimination1_CUDA 2023-01-11T23:20:07.6082038Z [ OK ] NVFuserTest.FusionPredicateElimination1_CUDA (4 ms) 2023-01-11T23:20:07.6082619Z [ RUN ] NVFuserTest.FusionPredicateElimination2_CUDA 2023-01-11T23:20:07.9126311Z [ OK ] NVFuserTest.FusionPredicateElimination2_CUDA (304 ms) 2023-01-11T23:20:07.9126858Z [ RUN ] NVFuserTest.FusionPredicateElimination3_CUDA 2023-01-11T23:20:09.2922166Z [ OK ] NVFuserTest.FusionPredicateElimination3_CUDA (1379 ms) 2023-01-11T23:20:09.2922684Z [ RUN ] NVFuserTest.FusionPredicateElimination4_CUDA 2023-01-11T23:20:40.0803713Z [ OK ] NVFuserTest.FusionPredicateElimination4_CUDA (30787 ms) 2023-01-11T23:20:40.0804167Z [ RUN ] NVFuserTest.FusionPredicateElimination5_CUDA 2023-01-11T23:20:41.3785530Z [ OK ] NVFuserTest.FusionPredicateElimination5_CUDA (1298 ms) 2023-01-11T23:20:41.3785993Z [ RUN ] NVFuserTest.FusionPredicateElimination6_CUDA 2023-01-11T23:20:41.5761160Z [ OK ] NVFuserTest.FusionPredicateElimination6_CUDA (197 ms) 2023-01-11T23:20:41.5762166Z [ RUN ] NVFuserTest.FusionPredicateElimination7_CUDA 2023-01-11T23:20:41.7904674Z [ OK ] NVFuserTest.FusionPredicateElimination7_CUDA (214 ms) 2023-01-11T23:20:41.7905127Z [ RUN ] NVFuserTest.FusionForceFp16Simple_CUDA 2023-01-11T23:20:42.2089510Z [ OK ] NVFuserTest.FusionForceFp16Simple_CUDA (418 ms) 2023-01-11T23:20:42.2089982Z [ RUN ] NVFuserTest.FusionForceBf16Simple_CUDA 2023-01-11T23:20:42.6288585Z [ OK ] NVFuserTest.FusionForceBf16Simple_CUDA (419 ms) 2023-01-11T23:20:42.6289034Z [ RUN ] NVFuserTest.FusionForceFp16NotAllCast_CUDA 2023-01-11T23:20:43.3020978Z [ OK ] NVFuserTest.FusionForceFp16NotAllCast_CUDA (673 ms) 2023-01-11T23:20:43.3021960Z [ RUN ] NVFuserTest.FusionForceBf16NotAllCast_CUDA 2023-01-11T23:20:43.9762426Z [ OK ] NVFuserTest.FusionForceBf16NotAllCast_CUDA (674 ms) 2023-01-11T23:20:43.9762918Z [ RUN ] NVFuserTest.FusionBufferReuseBroadCastMultiVisit_CUDA 2023-01-11T23:20:44.1598794Z [ OK ] NVFuserTest.FusionBufferReuseBroadCastMultiVisit_CUDA (183 ms) 2023-01-11T23:20:44.1599270Z [ RUN ] NVFuserTest.FusionBufferReuseStressTest_CUDA 2023-01-11T23:20:44.3560706Z [ OK ] NVFuserTest.FusionBufferReuseStressTest_CUDA (196 ms) 2023-01-11T23:20:44.3561196Z [ RUN ] NVFuserTest.FusionBufferReuseLargeBuffer_CUDA 2023-01-11T23:20:47.3522891Z [ OK ] 
NVFuserTest.FusionBufferReuseLargeBuffer_CUDA (2996 ms) 2023-01-11T23:20:47.3523556Z [ RUN ] NVFuserTest.FusionBufferReuseNo2hop_CUDA 2023-01-11T23:20:47.5377657Z [ OK ] NVFuserTest.FusionBufferReuseNo2hop_CUDA (185 ms) 2023-01-11T23:20:47.5378340Z [ RUN ] NVFuserTest.FusionBufferReuseAllocationOrder_CUDA 2023-01-11T23:20:47.7197372Z [ OK ] NVFuserTest.FusionBufferReuseAllocationOrder_CUDA (181 ms) 2023-01-11T23:20:47.7197862Z [ RUN ] NVFuserTest.FusionBufferReuseLiveInterval_CUDA 2023-01-11T23:20:48.4541034Z [ OK ] NVFuserTest.FusionBufferReuseLiveInterval_CUDA (734 ms) 2023-01-11T23:20:48.4541970Z [ RUN ] NVFuserTest.FusionBufferReuseNoAcrossBroadcast_CUDA 2023-01-11T23:20:48.6392207Z [ OK ] NVFuserTest.FusionBufferReuseNoAcrossBroadcast_CUDA (185 ms) 2023-01-11T23:20:48.6393053Z [ RUN ] NVFuserTest.FusionIssue970_CUDA 2023-01-11T23:20:49.0423855Z [ OK ] NVFuserTest.FusionIssue970_CUDA (403 ms) 2023-01-11T23:20:49.0424608Z [ RUN ] NVFuserTest.FusionIssue1016_CUDA 2023-01-11T23:20:49.2323962Z [ OK ] NVFuserTest.FusionIssue1016_CUDA (190 ms) 2023-01-11T23:20:49.2324365Z [ RUN ] NVFuserTest.FusionIssue1021_CUDA 2023-01-11T23:20:49.4102758Z [ OK ] NVFuserTest.FusionIssue1021_CUDA (177 ms) 2023-01-11T23:20:49.4103699Z [ RUN ] NVFuserTest.FusionNonUniqueThreadDim_CUDA 2023-01-11T23:20:49.5991292Z [ OK ] NVFuserTest.FusionNonUniqueThreadDim_CUDA (188 ms) 2023-01-11T23:20:49.5992389Z [ RUN ] NVFuserTest.FusionParallelDimensionMap1_CUDA 2023-01-11T23:20:49.7879127Z [ OK ] NVFuserTest.FusionParallelDimensionMap1_CUDA (188 ms) 2023-01-11T23:20:49.7879629Z [ RUN ] NVFuserTest.FusionParallelDimensionMap2_CUDA 2023-01-11T23:20:49.9762394Z [ OK ] NVFuserTest.FusionParallelDimensionMap2_CUDA (188 ms) 2023-01-11T23:20:49.9762862Z [ RUN ] NVFuserTest.FusionParallelDimensionMap3_CUDA 2023-01-11T23:20:50.1645028Z [ OK ] NVFuserTest.FusionParallelDimensionMap3_CUDA (188 ms) 2023-01-11T23:20:50.1645510Z [ RUN ] NVFuserTest.FusionParallelDimensionMap4_CUDA 2023-01-11T23:20:50.3524390Z [ OK ] NVFuserTest.FusionParallelDimensionMap4_CUDA (187 ms) 2023-01-11T23:20:50.3524930Z [ RUN ] NVFuserTest.FusionParallelDimensionMap5_CUDA 2023-01-11T23:20:50.5283827Z [ OK ] NVFuserTest.FusionParallelDimensionMap5_CUDA (175 ms) 2023-01-11T23:20:50.5284611Z [ RUN ] NVFuserTest.FusionSegmenterCombineReductionsCycleRepro_CUDA 2023-01-11T23:20:52.6284064Z [ OK ] NVFuserTest.FusionSegmenterCombineReductionsCycleRepro_CUDA (2100 ms) 2023-01-11T23:20:52.6284850Z [ RUN ] NVFuserTest.FusionSerialAndParallelIndexing_CUDA 2023-01-11T23:20:52.8322809Z [ OK ] NVFuserTest.FusionSerialAndParallelIndexing_CUDA (203 ms) 2023-01-11T23:20:52.8323262Z [ RUN ] NVFuserTest.FusionWARSyncAliasedSmem_CUDA 2023-01-11T23:20:53.0130551Z [ OK ] NVFuserTest.FusionWARSyncAliasedSmem_CUDA (180 ms) 2023-01-11T23:20:53.0131340Z [ RUN ] NVFuserTest.FusionIssue1099_CUDA 2023-01-11T23:20:53.2040208Z [ OK ] NVFuserTest.FusionIssue1099_CUDA (190 ms) 2023-01-11T23:20:53.2041010Z [ RUN ] NVFuserTest.FusionUnswitchPredicate_CUDA 2023-01-11T23:20:53.3919761Z [ OK ] NVFuserTest.FusionUnswitchPredicate_CUDA (187 ms) 2023-01-11T23:20:53.3920165Z [ RUN ] NVFuserTest.FusionIssue1189_CUDA 2023-01-11T23:20:53.5989427Z [ OK ] NVFuserTest.FusionIssue1189_CUDA (206 ms) 2023-01-11T23:20:53.5989835Z [ RUN ] NVFuserTest.FusionIssue1052_CUDA 2023-01-11T23:20:53.7700778Z [ OK ] NVFuserTest.FusionIssue1052_CUDA (171 ms) 2023-01-11T23:20:53.7701346Z [ RUN ] NVFuserTest.FusionPointwiseBroadcast_CUDA 2023-01-11T23:20:53.9800544Z [ OK ] NVFuserTest.FusionPointwiseBroadcast_CUDA (210 ms) 
2023-01-11T23:20:53.9801005Z [ RUN ] NVFuserTest.FusionPointwiseVectorize_CUDA 2023-01-11T23:20:53.9824042Z [ OK ] NVFuserTest.FusionPointwiseVectorize_CUDA (2 ms) 2023-01-11T23:20:53.9825006Z [ RUN ] NVFuserTest.FusionSmemAliasSerial_CUDA 2023-01-11T23:20:54.1617482Z [ OK ] NVFuserTest.FusionSmemAliasSerial_CUDA (179 ms) 2023-01-11T23:20:54.1618105Z [ RUN ] NVFuserTest.FusionGridReductionWithNonExactParallelDimensions_CUDA 2023-01-11T23:20:54.3706124Z [ OK ] NVFuserTest.FusionGridReductionWithNonExactParallelDimensions_CUDA (208 ms) 2023-01-11T23:20:54.3707253Z [ RUN ] NVFuserTest.FusionGridWelfordWithNonExactParallelDimensions_CUDA 2023-01-11T23:20:54.6016663Z [ OK ] NVFuserTest.FusionGridWelfordWithNonExactParallelDimensions_CUDA (230 ms) 2023-01-11T23:20:54.6017778Z [ RUN ] NVFuserTest.FusionGridReductionWithNonExactParallelDimensions2_CUDA 2023-01-11T23:20:54.6018386Z [ OK ] NVFuserTest.FusionGridReductionWithNonExactParallelDimensions2_CUDA (0 ms) 2023-01-11T23:20:54.6019237Z [ RUN ] NVFuserTest.FusionGridWelfordWithNonExactParallelDimensions2_CUDA 2023-01-11T23:20:54.6019782Z [ OK ] NVFuserTest.FusionGridWelfordWithNonExactParallelDimensions2_CUDA (0 ms) 2023-01-11T23:20:54.6020257Z [ RUN ] NVFuserTest.FusionPredicateParallelizedDomains_CUDA 2023-01-11T23:20:54.8407214Z [ OK ] NVFuserTest.FusionPredicateParallelizedDomains_CUDA (239 ms) 2023-01-11T23:20:54.8407686Z [ RUN ] NVFuserTest.FusionSmemPredicateUnswitch_CUDA 2023-01-11T23:20:55.0504030Z [ OK ] NVFuserTest.FusionSmemPredicateUnswitch_CUDA (209 ms) 2023-01-11T23:20:55.0504827Z [ RUN ] NVFuserTest.FusionFloatPow_CUDA 2023-01-11T23:20:55.2428591Z [ OK ] NVFuserTest.FusionFloatPow_CUDA (192 ms) 2023-01-11T23:20:55.2429350Z [ RUN ] NVFuserTest.FusionIssue1127_CUDA 2023-01-11T23:20:55.2459797Z [ OK ] NVFuserTest.FusionIssue1127_CUDA (3 ms) 2023-01-11T23:20:55.2460194Z [ RUN ] NVFuserTest.FusionChannelsLastParser_CUDA 2023-01-11T23:20:55.2736027Z [ OK ] NVFuserTest.FusionChannelsLastParser_CUDA (27 ms) 2023-01-11T23:20:55.2736893Z [ RUN ] NVFuserTest.FusionThreadPredicateUnswitch_CUDA 2023-01-11T23:20:55.5038235Z [ OK ] NVFuserTest.FusionThreadPredicateUnswitch_CUDA (229 ms) 2023-01-11T23:20:55.5038690Z [ RUN ] NVFuserTest.FusionNonContigOutputs_CUDA 2023-01-11T23:20:55.6746162Z [ OK ] NVFuserTest.FusionNonContigOutputs_CUDA (170 ms) 2023-01-11T23:20:55.6746975Z [ RUN ] NVFuserTest.FusionTestWarpSoftMax_CUDA 2023-01-11T23:20:55.9783994Z [ OK ] NVFuserTest.FusionTestWarpSoftMax_CUDA (303 ms) 2023-01-11T23:20:55.9784778Z [ RUN ] NVFuserTest.FusionIssue1133_CUDA 2023-01-11T23:20:56.1907568Z [ OK ] NVFuserTest.FusionIssue1133_CUDA (212 ms) 2023-01-11T23:20:56.1908304Z [ RUN ] NVFuserTest.FusionRfactorContigIDs_CUDA 2023-01-11T23:20:56.3852285Z [ OK ] NVFuserTest.FusionRfactorContigIDs_CUDA (194 ms) 2023-01-11T23:20:56.3853912Z [ RUN ] NVFuserTest.FusionPersistentBufferCalculation1_CUDA 2023-01-11T23:20:56.3855378Z [ OK ] NVFuserTest.FusionPersistentBufferCalculation1_CUDA (0 ms) 2023-01-11T23:20:56.3856412Z [ RUN ] NVFuserTest.FusionPersistentBufferCalculation2_CUDA 2023-01-11T23:20:56.3859881Z [ OK ] NVFuserTest.FusionPersistentBufferCalculation2_CUDA (0 ms) 2023-01-11T23:20:56.3860331Z [ RUN ] NVFuserTest.FusionPersistentBufferCalculation3_CUDA 2023-01-11T23:20:56.3869654Z [ OK ] NVFuserTest.FusionPersistentBufferCalculation3_CUDA (0 ms) 2023-01-11T23:20:56.3870144Z [ RUN ] NVFuserTest.FusionPersistentBufferCalculation4_CUDA 2023-01-11T23:20:56.3876148Z [ OK ] NVFuserTest.FusionPersistentBufferCalculation4_CUDA (0 ms) 
2023-01-11T23:20:56.3876636Z [ RUN ] NVFuserTest.FusionPersistentBufferProjection_CUDA 2023-01-11T23:20:56.6586074Z [ OK ] NVFuserTest.FusionPersistentBufferProjection_CUDA (270 ms) 2023-01-11T23:20:56.6586852Z [ RUN ] NVFuserTest.FusionIssue1223_CUDA 2023-01-11T23:20:56.8718724Z [ OK ] NVFuserTest.FusionIssue1223_CUDA (213 ms) 2023-01-11T23:20:56.8719189Z [ RUN ] NVFuserTest.FusionRfactorPredication1_CUDA 2023-01-11T23:20:57.0724661Z [ OK ] NVFuserTest.FusionRfactorPredication1_CUDA (200 ms) 2023-01-11T23:20:57.0725164Z [ RUN ] NVFuserTest.FusionRfactorPredication2_CUDA 2023-01-11T23:20:57.2961208Z [ OK ] NVFuserTest.FusionRfactorPredication2_CUDA (223 ms) 2023-01-11T23:20:57.2961650Z [ RUN ] NVFuserTest.FusionRfactorIndirectRoot_CUDA 2023-01-11T23:20:57.4861657Z [ OK ] NVFuserTest.FusionRfactorIndirectRoot_CUDA (189 ms) 2023-01-11T23:20:57.4862989Z [ RUN ] NVFuserTest.FusionNonDivisibleSplit1_CUDA 2023-01-11T23:20:57.6758373Z [ OK ] NVFuserTest.FusionNonDivisibleSplit1_CUDA (189 ms) 2023-01-11T23:20:57.6759043Z [ RUN ] NVFuserTest.FusionNonDivisibleSplit2_CUDA 2023-01-11T23:20:57.8823071Z [ OK ] NVFuserTest.FusionNonDivisibleSplit2_CUDA (206 ms) 2023-01-11T23:20:57.8823923Z [ RUN ] NVFuserTest.FusionNonDivisibleSplit3_CUDA 2023-01-11T23:20:58.0699573Z [ OK ] NVFuserTest.FusionNonDivisibleSplit3_CUDA (187 ms) 2023-01-11T23:20:58.0700055Z [ RUN ] NVFuserTest.FusionNonDivisibleSplit4_CUDA 2023-01-11T23:20:58.2624685Z [ OK ] NVFuserTest.FusionNonDivisibleSplit4_CUDA (192 ms) 2023-01-11T23:20:58.2625526Z [ RUN ] NVFuserTest.FusionNonDivisibleSplit5_CUDA 2023-01-11T23:20:58.4568200Z [ OK ] NVFuserTest.FusionNonDivisibleSplit5_CUDA (194 ms) 2023-01-11T23:20:58.4568703Z [ RUN ] NVFuserTest.FusionNonDivisibleSplitVectorize1_CUDA 2023-01-11T23:20:58.6578589Z [ OK ] NVFuserTest.FusionNonDivisibleSplitVectorize1_CUDA (200 ms) 2023-01-11T23:20:58.6579276Z [ RUN ] NVFuserTest.FusionNonDivisibleSplitVectorize2_CUDA 2023-01-11T23:20:58.9985899Z [ OK ] NVFuserTest.FusionNonDivisibleSplitVectorize2_CUDA (340 ms) 2023-01-11T23:20:58.9986929Z [ RUN ] NVFuserTest.FusionIssue1284Repro_CUDA 2023-01-11T23:20:59.3491099Z [ OK ] NVFuserTest.FusionIssue1284Repro_CUDA (350 ms) 2023-01-11T23:20:59.3491883Z [ RUN ] NVFuserTest.FusionIssue1284Repro2_CUDA 2023-01-11T23:20:59.7436331Z [ OK ] NVFuserTest.FusionIssue1284Repro2_CUDA (394 ms) 2023-01-11T23:20:59.7437265Z [ RUN ] NVFuserTest.FusionIssue1305Repro_CUDA 2023-01-11T23:20:59.7440661Z [ OK ] NVFuserTest.FusionIssue1305Repro_CUDA (0 ms) 2023-01-11T23:20:59.7441119Z [ RUN ] NVFuserTest.FusionDoubleBuffering1_CUDA 2023-01-11T23:20:59.9256988Z [ OK ] NVFuserTest.FusionDoubleBuffering1_CUDA (181 ms) 2023-01-11T23:20:59.9257961Z [ RUN ] NVFuserTest.FusionDoubleBuffering2_CUDA 2023-01-11T23:21:00.1060248Z [ OK ] NVFuserTest.FusionDoubleBuffering2_CUDA (180 ms) 2023-01-11T23:21:00.1060701Z [ RUN ] NVFuserTest.FusionDoubleBuffering3_CUDA 2023-01-11T23:21:00.2966301Z [ OK ] NVFuserTest.FusionDoubleBuffering3_CUDA (190 ms) 2023-01-11T23:21:00.2966775Z [ RUN ] NVFuserTest.FusionDoubleBuffering4_CUDA 2023-01-11T23:21:00.5075995Z [ OK ] NVFuserTest.FusionDoubleBuffering4_CUDA (210 ms) 2023-01-11T23:21:00.5076971Z [ RUN ] NVFuserTest.FusionDoubleBuffering5_CUDA 2023-01-11T23:21:00.7113893Z [ OK ] NVFuserTest.FusionDoubleBuffering5_CUDA (203 ms) 2023-01-11T23:21:00.7114890Z [ RUN ] NVFuserTest.FusionDoubleBuffering6_CUDA 2023-01-11T23:21:01.0722189Z [ OK ] NVFuserTest.FusionDoubleBuffering6_CUDA (361 ms) 2023-01-11T23:21:01.0722646Z [ RUN ] NVFuserTest.FusionDoubleBuffering7_CUDA 
2023-01-11T23:21:01.2602869Z [ OK ] NVFuserTest.FusionDoubleBuffering7_CUDA (188 ms) 2023-01-11T23:21:01.2603349Z [ RUN ] NVFuserTest.FusionDoubleBuffering8_CUDA 2023-01-11T23:21:01.4656427Z [ OK ] NVFuserTest.FusionDoubleBuffering8_CUDA (205 ms) 2023-01-11T23:21:01.4657378Z [ RUN ] NVFuserTest.FusionDoubleBuffering9_CUDA 2023-01-11T23:21:01.6641413Z [ OK ] NVFuserTest.FusionDoubleBuffering9_CUDA (198 ms) 2023-01-11T23:21:01.6641919Z [ RUN ] NVFuserTest.FusionSmemBlockGemmCacheDoubleBuffer_CUDA 2023-01-11T23:21:04.1540820Z [ OK ] NVFuserTest.FusionSmemBlockGemmCacheDoubleBuffer_CUDA (2489 ms) 2023-01-11T23:21:04.1541438Z [ RUN ] NVFuserTest.FusionIntermediateTensorVectorize_CUDA 2023-01-11T23:21:04.5258798Z [ OK ] NVFuserTest.FusionIntermediateTensorVectorize_CUDA (371 ms) 2023-01-11T23:21:04.5259777Z [ RUN ] NVFuserTest.FusionBroadcastConcretization1_CUDA 2023-01-11T23:21:04.8476275Z [ OK ] NVFuserTest.FusionBroadcastConcretization1_CUDA (321 ms) 2023-01-11T23:21:04.8476780Z [ RUN ] NVFuserTest.FusionBroadcastConcretization2_CUDA 2023-01-11T23:21:05.0316669Z [ OK ] NVFuserTest.FusionBroadcastConcretization2_CUDA (183 ms) 2023-01-11T23:21:05.0317602Z [ RUN ] NVFuserTest.FusionBroadcastConcretization3_CUDA 2023-01-11T23:21:05.3933581Z [ OK ] NVFuserTest.FusionBroadcastConcretization3_CUDA (361 ms) 2023-01-11T23:21:05.3935321Z [ RUN ] NVFuserTest.FusionBroadcastConcretization5_CUDA 2023-01-11T23:21:05.3936739Z [ OK ] NVFuserTest.FusionBroadcastConcretization5_CUDA (0 ms) 2023-01-11T23:21:05.3937549Z [ RUN ] NVFuserTest.FusionIssue1430_CUDA 2023-01-11T23:21:06.0404717Z [ OK ] NVFuserTest.FusionIssue1430_CUDA (646 ms) 2023-01-11T23:21:06.0405156Z [ RUN ] NVFuserTest.FusionCodegenAllocatedScalars_CUDA 2023-01-11T23:21:06.0409000Z [ OK ] NVFuserTest.FusionCodegenAllocatedScalars_CUDA (0 ms) 2023-01-11T23:21:06.0409422Z [ RUN ] NVFuserTest.FusionIndexHoist1_CUDA 2023-01-11T23:21:06.2549106Z [ OK ] NVFuserTest.FusionIndexHoist1_CUDA (213 ms) 2023-01-11T23:21:06.2549862Z [ RUN ] NVFuserTest.FusionIndexHoist2_CUDA 2023-01-11T23:21:06.4425919Z [ OK ] NVFuserTest.FusionIndexHoist2_CUDA (187 ms) 2023-01-11T23:21:06.4426713Z [ RUN ] NVFuserTest.FusionTestGridComm_CUDA 2023-01-11T23:21:06.6447877Z [ OK ] NVFuserTest.FusionTestGridComm_CUDA (202 ms) 2023-01-11T23:21:06.6448312Z [ RUN ] NVFuserTest.FusionTestGridComm2_CUDA 2023-01-11T23:21:06.8298453Z [ OK ] NVFuserTest.FusionTestGridComm2_CUDA (184 ms) 2023-01-11T23:21:06.8299289Z [ RUN ] NVFuserTest.FusionDoubleBufferVector_CUDA 2023-01-11T23:21:07.0537881Z [ OK ] NVFuserTest.FusionDoubleBufferVector_CUDA (223 ms) 2023-01-11T23:21:07.0538453Z [ RUN ] NVFuserTest.FusionLargeSmem_CUDA 2023-01-11T23:21:07.4832353Z [ OK ] NVFuserTest.FusionLargeSmem_CUDA (429 ms) 2023-01-11T23:21:07.4832842Z [ RUN ] NVFuserTest.FusionTooLargeSmem_CUDA 2023-01-11T23:21:08.4867954Z [ OK ] NVFuserTest.FusionTooLargeSmem_CUDA (1003 ms) 2023-01-11T23:21:08.4868894Z [ RUN ] NVFuserTest.FusionSmemAlignment_CUDA 2023-01-11T23:21:13.6333436Z [ OK ] NVFuserTest.FusionSmemAlignment_CUDA (5146 ms) 2023-01-11T23:21:13.6333921Z [ RUN ] NVFuserTest.FusionImmediateValueAsInput_CUDA 2023-01-11T23:21:13.8047761Z [ OK ] NVFuserTest.FusionImmediateValueAsInput_CUDA (171 ms) 2023-01-11T23:21:13.8048248Z [ RUN ] NVFuserTest.FusionVectorizeContigIndex_CUDA 2023-01-11T23:21:13.9789433Z [ OK ] NVFuserTest.FusionVectorizeContigIndex_CUDA (173 ms) 2023-01-11T23:21:13.9790503Z [ RUN ] NVFuserTest.FusionVectorizeContigIndexFail_CUDA 2023-01-11T23:21:14.1563421Z [ OK ] 
NVFuserTest.FusionVectorizeContigIndexFail_CUDA (177 ms) 2023-01-11T23:21:14.1564059Z [ RUN ] NVFuserTest.FusionVectorizeInputToOutput_CUDA 2023-01-11T23:21:14.3360997Z [ OK ] NVFuserTest.FusionVectorizeInputToOutput_CUDA (179 ms) 2023-01-11T23:21:14.3362098Z [ RUN ] NVFuserTest.FusionVectorizeContigIndexValidationFail_CUDA 2023-01-11T23:21:14.5144874Z [ OK ] NVFuserTest.FusionVectorizeContigIndexValidationFail_CUDA (178 ms) 2023-01-11T23:21:14.5145372Z [ RUN ] NVFuserTest.FusionContigIndexingWithBroadcast_CUDA 2023-01-11T23:21:14.8910050Z [ OK ] NVFuserTest.FusionContigIndexingWithBroadcast_CUDA (376 ms) 2023-01-11T23:21:14.8911351Z [ RUN ] NVFuserTest.FusionVectorizeContigIndexValidationFail2_CUDA 2023-01-11T23:21:15.0852314Z [ OK ] NVFuserTest.FusionVectorizeContigIndexValidationFail2_CUDA (194 ms) 2023-01-11T23:21:15.0852862Z [ RUN ] NVFuserTest.FusionVectorizeContigIndexWithBroadcast_CUDA 2023-01-11T23:21:15.2682411Z [ OK ] NVFuserTest.FusionVectorizeContigIndexWithBroadcast_CUDA (182 ms) 2023-01-11T23:21:15.2683461Z [ RUN ] NVFuserTest.FusionVectorizeContigIndexPointwiseSchedule_CUDA 2023-01-11T23:21:15.4925574Z [ OK ] NVFuserTest.FusionVectorizeContigIndexPointwiseSchedule_CUDA (224 ms) 2023-01-11T23:21:15.4926089Z [ RUN ] NVFuserTest.FusionTrivialReductionForwarding1_CUDA 2023-01-11T23:21:15.4933228Z [ OK ] NVFuserTest.FusionTrivialReductionForwarding1_CUDA (0 ms) 2023-01-11T23:21:15.4933918Z [ RUN ] NVFuserTest.FusionTrivialReductionForwarding2_CUDA 2023-01-11T23:21:15.4934984Z [ OK ] NVFuserTest.FusionTrivialReductionForwarding2_CUDA (0 ms) 2023-01-11T23:21:15.4935876Z [ RUN ] NVFuserTest.FusionTrivialReductionForwarding3_CUDA 2023-01-11T23:21:15.4936676Z [ OK ] NVFuserTest.FusionTrivialReductionForwarding3_CUDA (0 ms) 2023-01-11T23:21:15.4937139Z [ RUN ] NVFuserTest.FusionTrivialReductionForwarding4_CUDA 2023-01-11T23:21:15.6723928Z [ OK ] NVFuserTest.FusionTrivialReductionForwarding4_CUDA (178 ms) 2023-01-11T23:21:15.6724413Z [ RUN ] NVFuserTest.FusionRAWSyncInsertionPlace1_CUDA 2023-01-11T23:21:15.8606791Z [ OK ] NVFuserTest.FusionRAWSyncInsertionPlace1_CUDA (188 ms) 2023-01-11T23:21:15.8607249Z [ RUN ] NVFuserTest.FusionRAWSyncInsertionPlace2_CUDA 2023-01-11T23:21:16.1478122Z [ OK ] NVFuserTest.FusionRAWSyncInsertionPlace2_CUDA (287 ms) 2023-01-11T23:21:16.1478644Z [ RUN ] NVFuserTest.FusionRAWSyncInsertionPlace3_CUDA 2023-01-11T23:21:16.3797097Z [ OK ] NVFuserTest.FusionRAWSyncInsertionPlace3_CUDA (231 ms) 2023-01-11T23:21:16.3797675Z [ RUN ] NVFuserTest.FusionRAWSyncInsertionPlace4_CUDA 2023-01-11T23:21:16.3836182Z [ OK ] NVFuserTest.FusionRAWSyncInsertionPlace4_CUDA (4 ms) 2023-01-11T23:21:16.3837609Z [ RUN ] NVFuserTest.FusionSerialSmemWriteParallelRead1_CUDA 2023-01-11T23:21:17.4333278Z [ OK ] NVFuserTest.FusionSerialSmemWriteParallelRead1_CUDA (1049 ms) 2023-01-11T23:21:17.4334075Z [ RUN ] NVFuserTest.FusionSerialSmemWriteParallelRead2_CUDA 2023-01-11T23:21:18.4980429Z [ OK ] NVFuserTest.FusionSerialSmemWriteParallelRead2_CUDA (1064 ms) 2023-01-11T23:21:18.4981436Z [ RUN ] NVFuserTest.FusionSimpleCpAsync_CUDA 2023-01-11T23:21:18.7520520Z [ OK ] NVFuserTest.FusionSimpleCpAsync_CUDA (254 ms) 2023-01-11T23:21:18.7520967Z [ RUN ] NVFuserTest.FusionDoubleBufferCpAsync1_CUDA 2023-01-11T23:21:19.1060138Z [ OK ] NVFuserTest.FusionDoubleBufferCpAsync1_CUDA (353 ms) 2023-01-11T23:21:19.1061112Z [ RUN ] NVFuserTest.FusionDoubleBufferCpAsync2_CUDA 2023-01-11T23:21:19.6731677Z [ OK ] NVFuserTest.FusionDoubleBufferCpAsync2_CUDA (567 ms) 2023-01-11T23:21:19.6732435Z [ RUN ] 
NVFuserTest.FusionDoubleBufferNoSync_CUDA
2023-01-11T23:21:20.1594440Z [ OK ] NVFuserTest.FusionDoubleBufferNoSync_CUDA (486 ms)
2023-01-11T23:21:20.1595094Z [ RUN ] NVFuserTest.FusionCpAsyncPredicate_CUDA
2023-01-11T23:21:22.3711439Z [ OK ] NVFuserTest.FusionCpAsyncPredicate_CUDA (2211 ms)
2023-01-11T23:21:22.3712305Z [ RUN ] NVFuserTest.FusionPredRemovalCheck_CUDA
2023-01-11T23:21:22.3744834Z [ OK ] NVFuserTest.FusionPredRemovalCheck_CUDA (3 ms)
2023-01-11T23:21:22.3745558Z [ RUN ] NVFuserTest.FusionPropagateParallelTypesToSiblings_CUDA
2023-01-11T23:21:22.6241469Z [ OK ] NVFuserTest.FusionPropagateParallelTypesToSiblings_CUDA (249 ms)
2023-01-11T23:21:22.6242282Z [ RUN ] NVFuserTest.FusionExactRootDomainMap_CUDA
2023-01-11T23:21:22.6242680Z [ OK ] NVFuserTest.FusionExactRootDomainMap_CUDA (0 ms)
2023-01-11T23:21:22.6243082Z [ RUN ] NVFuserTest.FusionIncompleteConcreteID_CUDA
2023-01-11T23:21:22.6300677Z [ OK ] NVFuserTest.FusionIncompleteConcreteID_CUDA (5 ms)
2023-01-11T23:21:22.6301541Z [ RUN ] NVFuserTest.FusionTestReEntrantGridWelford_CUDA
2023-01-11T23:21:23.8598034Z [ OK ] NVFuserTest.FusionTestReEntrantGridWelford_CUDA (1229 ms)
2023-01-11T23:21:23.8598499Z [ RUN ] NVFuserTest.FusionRedundantPredSync_CUDA
2023-01-11T23:21:24.0357429Z [ OK ] NVFuserTest.FusionRedundantPredSync_CUDA (175 ms)
2023-01-11T23:21:24.0357855Z [ RUN ] NVFuserTest.FusionRedundantPredSync2_CUDA
2023-01-11T23:21:24.2096082Z [ OK ] NVFuserTest.FusionRedundantPredSync2_CUDA (173 ms)
2023-01-11T23:21:24.2096594Z [ RUN ] NVFuserTest.FusionRedundantPredSync3_CUDA
2023-01-11T23:21:35.9968772Z [ OK ] NVFuserTest.FusionRedundantPredSync3_CUDA (11787 ms)
2023-01-11T23:21:35.9969256Z [ RUN ] NVFuserTest.FusionRedundantUseCheck_CUDA
2023-01-11T23:21:35.9997297Z [ OK ] NVFuserTest.FusionRedundantUseCheck_CUDA (2 ms)
2023-01-11T23:21:35.9998090Z [ RUN ] NVFuserTest.FusionSimpleSwizzle0_CUDA
2023-01-11T23:21:36.2909487Z [ OK ] NVFuserTest.FusionSimpleSwizzle0_CUDA (290 ms)
2023-01-11T23:21:36.2909921Z [ RUN ] NVFuserTest.FusionSimpleSwizzle1_CUDA
2023-01-11T23:21:36.6012300Z [ OK ] NVFuserTest.FusionSimpleSwizzle1_CUDA (310 ms)
2023-01-11T23:21:36.6013204Z [ RUN ] NVFuserTest.FusionSimpleSwizzle2_CUDA
2023-01-11T23:21:36.9478454Z [ OK ] NVFuserTest.FusionSimpleSwizzle2_CUDA (346 ms)
2023-01-11T23:21:36.9479258Z [ RUN ] NVFuserTest.FusionSwizzleMapping_CUDA
2023-01-11T23:21:36.9487339Z [ OK ] NVFuserTest.FusionSwizzleMapping_CUDA (1 ms)
2023-01-11T23:21:36.9487726Z [ RUN ] NVFuserTest.FusionLoopSwizzle0_CUDA
2023-01-11T23:21:37.2099017Z [ OK ] NVFuserTest.FusionLoopSwizzle0_CUDA (260 ms)
2023-01-11T23:21:37.2099599Z [ RUN ] NVFuserTest.FusionLoopSwizzle1_CUDA
2023-01-11T23:21:37.4731272Z [ OK ] NVFuserTest.FusionLoopSwizzle1_CUDA (263 ms)
2023-01-11T23:21:37.4731757Z [ RUN ] NVFuserTest.FusionLoopSwizzleCheck0_CUDA
2023-01-11T23:21:37.4777549Z [ OK ] NVFuserTest.FusionLoopSwizzleCheck0_CUDA (4 ms)
2023-01-11T23:21:37.4778358Z [ RUN ] NVFuserTest.FusionLoopSwizzleCheck1_CUDA
2023-01-11T23:21:37.4794067Z [W lower_validation.cpp:1081] Warning: Ignored loop swizzle :ZShape(2D): iS10{( ceilDiv(16, 4) )} , iS11{4} -> iS16{( ceilDiv(16, 4) )} , iS17{4}
2023-01-11T23:21:37.4794414Z (function operator())
2023-01-11T23:21:37.4822743Z [ OK ] NVFuserTest.FusionLoopSwizzleCheck1_CUDA (4 ms)
2023-01-11T23:21:37.4823684Z [ RUN ] NVFuserTest.FusionUnsqueeze1_CUDA
2023-01-11T23:21:37.8197987Z [ OK ] NVFuserTest.FusionUnsqueeze1_CUDA (337 ms)
2023-01-11T23:21:37.8198931Z [ RUN ] NVFuserTest.FusionSqueeze1_CUDA
2023-01-11T23:21:38.1569303Z [ OK ] NVFuserTest.FusionSqueeze1_CUDA (337 ms)
2023-01-11T23:21:38.1569767Z [ RUN ] NVFuserTest.FusionContigPredicate_CUDA
2023-01-11T23:21:38.6067429Z [ OK ] NVFuserTest.FusionContigPredicate_CUDA (449 ms)
2023-01-11T23:21:38.6068274Z [ RUN ] NVFuserTest.FusionDivScalarLhs_CUDA
2023-01-11T23:21:38.7842202Z [ OK ] NVFuserTest.FusionDivScalarLhs_CUDA (177 ms)
2023-01-11T23:21:38.7842627Z [ RUN ] NVFuserTest.FusionRepro1713_CUDA
2023-01-11T23:21:39.4258313Z [ OK ] NVFuserTest.FusionRepro1713_CUDA (641 ms)
2023-01-11T23:21:39.4259462Z [ RUN ] NVFuserTest.FusionExpand_CUDA
2023-01-11T23:21:39.8244993Z [ OK ] NVFuserTest.FusionExpand_CUDA (398 ms)
2023-01-11T23:21:39.8245415Z [ RUN ] NVFuserTest.FusionExpandIssue1751_CUDA
2023-01-11T23:21:40.0158475Z [ OK ] NVFuserTest.FusionExpandIssue1751_CUDA (190 ms)
2023-01-11T23:21:40.0159462Z [ RUN ] NVFuserTest.FusionExpandToConcrete_CUDA
2023-01-11T23:21:40.1939153Z [ OK ] NVFuserTest.FusionExpandToConcrete_CUDA (178 ms)
2023-01-11T23:21:40.1940165Z [ RUN ] NVFuserTest.FusionReproNoncontigBroadcast_CUDA
2023-01-11T23:21:40.4404810Z [ OK ] NVFuserTest.FusionReproNoncontigBroadcast_CUDA (246 ms)
2023-01-11T23:21:40.4405288Z [ RUN ] NVFuserTest.FusionTransformPropagateSibling_CUDA
2023-01-11T23:21:40.4416051Z [ OK ] NVFuserTest.FusionTransformPropagateSibling_CUDA (1 ms)
2023-01-11T23:21:40.4416691Z [ RUN ] NVFuserTest.FusionTransformPropagateSelectorSibling_CUDA
2023-01-11T23:21:40.4442306Z [ OK ] NVFuserTest.FusionTransformPropagateSelectorSibling_CUDA (2 ms)
2023-01-11T23:21:40.4442905Z [ RUN ] NVFuserTest.FusionTransformPropagatePosition_CUDA
2023-01-11T23:21:40.4447360Z [ OK ] NVFuserTest.FusionTransformPropagatePosition_CUDA (0 ms)
2023-01-11T23:21:40.4447939Z [ RUN ] NVFuserTest.FusionIgnoreZeroDimReduction_CUDA
2023-01-11T23:21:40.6636808Z [ OK ] NVFuserTest.FusionIgnoreZeroDimReduction_CUDA (218 ms)
2023-01-11T23:21:40.6637795Z [ RUN ] NVFuserTest.FusionIssue1770Repro_CUDA
2023-01-11T23:21:40.8438048Z [ OK ] NVFuserTest.FusionIssue1770Repro_CUDA (180 ms)
2023-01-11T23:21:40.8439149Z [ RUN ] NVFuserTest.FusionTransformPropagatorSelector_CUDA
2023-01-11T23:21:40.8440298Z [ OK ] NVFuserTest.FusionTransformPropagatorSelector_CUDA (0 ms)
2023-01-11T23:21:40.8440972Z [ RUN ] NVFuserTest.FusionTransformPropagatorPos_CUDA
2023-01-11T23:21:40.8441564Z [ OK ] NVFuserTest.FusionTransformPropagatorPos_CUDA (0 ms)
2023-01-11T23:21:40.8442265Z [ RUN ] NVFuserTest.FusionMaxRootDomainInfoSpanningTreePrintTwice_CUDA
2023-01-11T23:21:40.8443168Z [ OK ] NVFuserTest.FusionMaxRootDomainInfoSpanningTreePrintTwice_CUDA (0 ms)
2023-01-11T23:21:40.8443722Z [ RUN ] NVFuserTest.FusionTransformPropagatorNoOverwrite_CUDA
2023-01-11T23:21:40.8444184Z [ OK ] NVFuserTest.FusionTransformPropagatorNoOverwrite_CUDA (0 ms)
2023-01-11T23:21:40.8444595Z [ RUN ] NVFuserTest.FusionIssue1785Repro_CUDA
2023-01-11T23:21:41.0456452Z [ OK ] NVFuserTest.FusionIssue1785Repro_CUDA (200 ms)
2023-01-11T23:21:41.0457323Z [ RUN ] NVFuserTest.FusionSkipReplay_CUDA
2023-01-11T23:21:41.0458569Z [ OK ] NVFuserTest.FusionSkipReplay_CUDA (0 ms)
2023-01-11T23:21:41.0459430Z [ RUN ] NVFuserTest.FusionInlineRepro1803_CUDA
2023-01-11T23:21:41.0468318Z [ OK ] NVFuserTest.FusionInlineRepro1803_CUDA (1 ms)
2023-01-11T23:21:41.0468939Z [ RUN ] NVFuserTest.FusionBoundedDirectionSelection1_CUDA
2023-01-11T23:21:41.0470254Z [ OK ] NVFuserTest.FusionBoundedDirectionSelection1_CUDA (0 ms)
2023-01-11T23:21:41.0470847Z [ RUN ] NVFuserTest.FusionIssueRepro1844_CUDA
2023-01-11T23:21:41.2835431Z [ OK ] NVFuserTest.FusionIssueRepro1844_CUDA (235 ms)
2023-01-11T23:21:41.2836449Z [ RUN ] NVFuserTest.FusionInsertMagicZero1_CUDA
2023-01-11T23:21:41.2866023Z [ OK ] NVFuserTest.FusionInsertMagicZero1_CUDA (3 ms)
2023-01-11T23:21:41.2866953Z [ RUN ] NVFuserTest.FusionRepro1860_CUDA
2023-01-11T23:21:41.5110893Z [ OK ] NVFuserTest.FusionRepro1860_CUDA (224 ms)
2023-01-11T23:21:41.5111359Z [ RUN ] NVFuserTest.FusionExpandReduce_CUDA
2023-01-11T23:21:41.7150698Z [ OK ] NVFuserTest.FusionExpandReduce_CUDA (203 ms)
2023-01-11T23:21:41.7151124Z [ RUN ] NVFuserTest.FusionExpandReduce2_CUDA
2023-01-11T23:21:41.9358501Z [ OK ] NVFuserTest.FusionExpandReduce2_CUDA (220 ms)
2023-01-11T23:21:41.9359325Z [ RUN ] NVFuserTest.FusionExpandBadShapeTest_CUDA
2023-01-11T23:21:41.9394432Z [ OK ] NVFuserTest.FusionExpandBadShapeTest_CUDA (4 ms)
2023-01-11T23:21:41.9394953Z [ RUN ] NVFuserTest.FusionPointwiseScheduleWithBroadcastAndTrivialReduction_CUDA
2023-01-11T23:21:42.1966242Z [ OK ] NVFuserTest.FusionPointwiseScheduleWithBroadcastAndTrivialReduction_CUDA (257 ms)
2023-01-11T23:21:42.1966805Z [ RUN ] NVFuserTest.FusionInliningMismatchedDims1_CUDA
2023-01-11T23:21:43.0544072Z [ OK ] NVFuserTest.FusionInliningMismatchedDims1_CUDA (857 ms)
2023-01-11T23:21:43.0545003Z [ RUN ] NVFuserTest.FusionInliningMismatchedDims2_CUDA
2023-01-11T23:21:43.9150537Z [ OK ] NVFuserTest.FusionInliningMismatchedDims2_CUDA (860 ms)
2023-01-11T23:21:43.9151629Z [ RUN ] NVFuserTest.FusionInliningMismatchedDims3_CUDA
2023-01-11T23:21:44.8634079Z [ OK ] NVFuserTest.FusionInliningMismatchedDims3_CUDA (948 ms)
2023-01-11T23:21:44.8634612Z [ RUN ] NVFuserTest.FusionInliningMismatchedDims4_CUDA
2023-01-11T23:21:45.7278230Z [ OK ] NVFuserTest.FusionInliningMismatchedDims4_CUDA (864 ms)
2023-01-11T23:21:45.7279376Z [ RUN ] NVFuserTest.FusionInliningBroadcast_CUDA
2023-01-11T23:21:46.5611453Z [ OK ] NVFuserTest.FusionInliningBroadcast_CUDA (833 ms)
2023-01-11T23:21:46.5612141Z [ RUN ] NVFuserTest.FusionInliningBroadcastTrivialReduction_CUDA
2023-01-11T23:21:47.4949209Z [ OK ] NVFuserTest.FusionInliningBroadcastTrivialReduction_CUDA (933 ms)
2023-01-11T23:21:47.4950606Z [ RUN ] NVFuserTest.FusionMatchedLeafPosWithoutReplayTrivialReduction_CUDA
2023-01-11T23:21:47.4951908Z [ OK ] NVFuserTest.FusionMatchedLeafPosWithoutReplayTrivialReduction_CUDA (0 ms)
2023-01-11T23:21:47.4952648Z [ RUN ] NVFuserTest.FusionMatchedLeafPosWithoutReplayBroadcast_CUDA
2023-01-11T23:21:47.4953488Z [ OK ] NVFuserTest.FusionMatchedLeafPosWithoutReplayBroadcast_CUDA (0 ms)
2023-01-11T23:21:47.4953952Z [ RUN ] NVFuserTest.FusionIdGraphTrivialReduction_CUDA
2023-01-11T23:21:47.4961161Z [ OK ] NVFuserTest.FusionIdGraphTrivialReduction_CUDA (1 ms)
2023-01-11T23:21:47.4961558Z [ RUN ] NVFuserTest.FusionPrint_CUDA
2023-01-11T23:21:47.6887600Z T3[0] = 0.000000 @ threadIdx=(0,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:47.8913703Z T3[0] = 1.000000 @ threadIdx=(1,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:47.8914113Z T3[0] = 0.000000 @ threadIdx=(0,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:47.8914411Z T3[0] = 1.000000 @ threadIdx=(1,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.0867408Z T4[0] = 0.000000 @ threadIdx=(0,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.0868127Z T4[0] = 1.000000 @ threadIdx=(1,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.2809869Z T4[0] = 0.000000 @ threadIdx=(0,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.2810269Z T4[0] = 1.000000 @ threadIdx=(1,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.4754803Z T4[0] = 0 @ threadIdx=(0,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.4755172Z T4[0] = 1 @ threadIdx=(1,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.6696779Z T4[0] = 0 @ threadIdx=(0,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.6697447Z T4[0] = 1 @ threadIdx=(1,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.8641628Z T4[0] = false @ threadIdx=(0,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.8642177Z T4[0] = true @ threadIdx=(1,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.8642820Z [ OK ] NVFuserTest.FusionPrint_CUDA (1367 ms)
2023-01-11T23:21:48.8643188Z [ RUN ] NVFuserTest.FusionCheckedSymbolicShape_CUDA
2023-01-11T23:21:49.0751146Z [ OK ] NVFuserTest.FusionCheckedSymbolicShape_CUDA (210 ms)
2023-01-11T23:21:49.0752030Z [ RUN ] NVFuserTest.FusionSizeDependentData_CUDA
2023-01-11T23:21:49.2493044Z [ OK ] NVFuserTest.FusionSizeDependentData_CUDA (174 ms)
2023-01-11T23:21:49.2493627Z [ RUN ] NVFuserTest.FusionDependencyCheck_CUDA
2023-01-11T23:21:49.2494008Z [ OK ] NVFuserTest.FusionDependencyCheck_CUDA (0 ms)
2023-01-11T23:21:49.2494400Z [ RUN ] NVFuserTest.FusionScheduleTransposeRepro1_CUDA
2023-01-11T23:21:49.5647994Z [ OK ] NVFuserTest.FusionScheduleTransposeRepro1_CUDA (315 ms)
2023-01-11T23:21:49.5648475Z [ RUN ] NVFuserTest.FusionInlineBroadcastIndexing0_CUDA
2023-01-11T23:21:49.8601936Z [ OK ] NVFuserTest.FusionInlineBroadcastIndexing0_CUDA (295 ms)
2023-01-11T23:21:49.8602490Z [ RUN ] NVFuserTest.FusionPredicateUnshare_CUDA
2023-01-11T23:21:50.0431040Z [ OK ] NVFuserTest.FusionPredicateUnshare_CUDA (182 ms)
2023-01-11T23:21:50.0431986Z [ RUN ] NVFuserTest.AsyncCompilation_CUDA
2023-01-11T23:21:50.4804803Z .....................[ OK ] NVFuserTest.AsyncCompilation_CUDA (437 ms)
2023-01-11T23:21:50.4805626Z [ RUN ] NVFuserTest.FusionMergeBroadcastingTrivialReduction1_CUDA
2023-01-11T23:21:50.6528343Z [ OK ] NVFuserTest.FusionMergeBroadcastingTrivialReduction1_CUDA (172 ms)
2023-01-11T23:21:50.6528942Z [ RUN ] NVFuserTest.FusionMergeBroadcastingTrivialReduction2_CUDA
2023-01-11T23:21:50.8279437Z [ OK ] NVFuserTest.FusionMergeBroadcastingTrivialReduction2_CUDA (174 ms)
2023-01-11T23:21:50.8279919Z [ RUN ] NVFuserTest.FusionNullScheduler_CUDA
2023-01-11T23:21:50.9952371Z [ OK ] NVFuserTest.FusionNullScheduler_CUDA (167 ms)
2023-01-11T23:21:50.9952991Z [ RUN ] NVFuserTest.FusionNullScheduler2_CUDA
2023-01-11T23:21:51.1647089Z [ OK ] NVFuserTest.FusionNullScheduler2_CUDA (169 ms)
2023-01-11T23:21:51.1647503Z [ RUN ] NVFuserTest.FusionNullScheduler3_CUDA
2023-01-11T23:21:51.3321090Z [ OK ] NVFuserTest.FusionNullScheduler3_CUDA (167 ms)
2023-01-11T23:21:51.3322546Z [ RUN ] NVFuserTest.FusionEmpty_CUDA
2023-01-11T23:21:51.3325050Z [ OK ] NVFuserTest.FusionEmpty_CUDA (0 ms)
2023-01-11T23:21:51.3325403Z [ RUN ] NVFuserTest.FusionMappingRelation_CUDA
2023-01-11T23:21:51.5068961Z [ OK ] NVFuserTest.FusionMappingRelation_CUDA (174 ms)
2023-01-11T23:21:51.5069409Z [ RUN ] NVFuserTest.FusionInlineAt_CUDA
2023-01-11T23:21:51.7064669Z [ OK ] NVFuserTest.FusionInlineAt_CUDA (199 ms)
2023-01-11T23:21:51.7065131Z [ RUN ] NVFuserTest.FusionTrivialInputForwarding_CUDA
2023-01-11T23:21:51.7073372Z [ OK ] NVFuserTest.FusionTrivialInputForwarding_CUDA (0 ms)
2023-01-11T23:21:51.7073867Z [ RUN ] NVFuserTest.FusionTrivialInputForwarding2_CUDA
2023-01-11T23:21:51.8733900Z [ OK ] NVFuserTest.FusionTrivialInputForwarding2_CUDA (166 ms)
2023-01-11T23:21:51.8735258Z [ RUN ] NVFuserTest.FusionReplayTrivialReductionAndBroadcast2_CUDA
2023-01-11T23:21:52.0638920Z [ OK ] NVFuserTest.FusionReplayTrivialReductionAndBroadcast2_CUDA (190 ms)
2023-01-11T23:21:52.0639894Z [ RUN ] NVFuserTest.FusionVectorizeStrideContiguity2D_CUDA
2023-01-11T23:21:52.6404875Z [ OK ] NVFuserTest.FusionVectorizeStrideContiguity2D_CUDA (576 ms)
2023-01-11T23:21:52.6405349Z [ RUN ] NVFuserTest.FusionVectorizeStrideContiguity3D_CUDA
2023-01-11T23:21:53.2327077Z [ OK ] NVFuserTest.FusionVectorizeStrideContiguity3D_CUDA (592 ms)
2023-01-11T23:21:53.2327540Z [ RUN ] NVFuserTest.FusionVectorizeStrideContiguity5D_CUDA
2023-01-11T23:21:53.8572535Z [ OK ] NVFuserTest.FusionVectorizeStrideContiguity5D_CUDA (624 ms)
2023-01-11T23:21:53.8573269Z [ RUN ] NVFuserTest.FusionVectorizeStrideContiguitySelfOverlapping_CUDA
2023-01-11T23:21:54.4827267Z [ OK ] NVFuserTest.FusionVectorizeStrideContiguitySelfOverlapping_CUDA (625 ms)
2023-01-11T23:21:54.4828179Z [ RUN ] NVFuserTest.FusionSimpleAmperePipeline_CUDA
2023-01-11T23:21:54.7708560Z [ OK ] NVFuserTest.FusionSimpleAmperePipeline_CUDA (287 ms)
2023-01-11T23:21:54.7709386Z [ RUN ] NVFuserTest.FusionStandaloneFull_CUDA
2023-01-11T23:21:59.3800056Z [ OK ] NVFuserTest.FusionStandaloneFull_CUDA (4609 ms)
2023-01-11T23:21:59.3800457Z [ RUN ] NVFuserTest.FusionStandaloneZeros_CUDA
2023-01-11T23:22:03.9906654Z [ OK ] NVFuserTest.FusionStandaloneZeros_CUDA (4610 ms)
2023-01-11T23:22:03.9907391Z [ RUN ] NVFuserTest.FusionStandaloneOnes_CUDA
2023-01-11T23:22:08.6056109Z [ OK ] NVFuserTest.FusionStandaloneOnes_CUDA (4614 ms)
2023-01-11T23:22:08.6056908Z [ RUN ] NVFuserTest.FusionStandaloneARange_CUDA
2023-01-11T23:22:10.8239509Z [ OK ] NVFuserTest.FusionStandaloneARange_CUDA (2218 ms)
2023-01-11T23:22:10.8239946Z [ RUN ] NVFuserTest.FusionStandaloneEye_CUDA
2023-01-11T23:22:14.0315100Z [ OK ] NVFuserTest.FusionStandaloneEye_CUDA (3207 ms)
2023-01-11T23:22:14.0315503Z [ RUN ] NVFuserTest.FusionGridAllreduce1_CUDA
2023-01-11T23:22:14.3045924Z [ OK ] NVFuserTest.FusionGridAllreduce1_CUDA (273 ms)
2023-01-11T23:22:14.3046606Z [ RUN ] NVFuserTest.FusionGridAllreduce2_CUDA
2023-01-11T23:22:14.5400114Z [ OK ] NVFuserTest.FusionGridAllreduce2_CUDA (235 ms)
2023-01-11T23:22:14.5401572Z [ RUN ] NVFuserTest.FusionGridAllreduce3_CUDA
2023-01-11T23:22:14.7806327Z [ OK ] NVFuserTest.FusionGridAllreduce3_CUDA (240 ms)
2023-01-11T23:22:14.7806734Z [ RUN ] NVFuserTest.FusionGridAllreduce4_CUDA
2023-01-11T23:22:15.0137826Z [ OK ] NVFuserTest.FusionGridAllreduce4_CUDA (232 ms)
2023-01-11T23:22:15.0138613Z [ RUN ] NVFuserTest.FusionGridAllreduce5_CUDA
2023-01-11T23:22:15.2785663Z [ OK ] NVFuserTest.FusionGridAllreduce5_CUDA (264 ms)
2023-01-11T23:22:15.2786479Z [ RUN ] NVFuserTest.FusionGridAllreduce6_CUDA
2023-01-11T23:22:15.5727470Z [ OK ] NVFuserTest.FusionGridAllreduce6_CUDA (294 ms)
2023-01-11T23:22:15.5727984Z [ RUN ] NVFuserTest.FusionGridAllreduceWelford1_CUDA
2023-01-11T23:22:15.8581358Z [ OK ] NVFuserTest.FusionGridAllreduceWelford1_CUDA (285 ms)
2023-01-11T23:22:15.8581811Z [ RUN ] NVFuserTest.FusionGridAllreduceWelford2_CUDA
2023-01-11T23:22:16.1559456Z [ OK ] NVFuserTest.FusionGridAllreduceWelford2_CUDA (297 ms)
2023-01-11T23:22:16.1560406Z [ RUN ] NVFuserTest.FusionFusedReductionBatchnorm_CUDA
2023-01-11T23:22:22.3870336Z [ OK ] NVFuserTest.FusionFusedReductionBatchnorm_CUDA (6231 ms)
2023-01-11T23:22:22.3871377Z [ RUN ] NVFuserTest.FusionGroupedReduction1_CUDA
2023-01-11T23:22:22.5801141Z [ OK ] NVFuserTest.FusionGroupedReduction1_CUDA (193 ms)
2023-01-11T23:22:22.5801987Z [ RUN ] NVFuserTest.FusionGroupedReduction2_CUDA
2023-01-11T23:22:22.8262557Z [ OK ] NVFuserTest.FusionGroupedReduction2_CUDA (246 ms)
2023-01-11T23:22:22.8263061Z [ RUN ] NVFuserTest.FusionGroupedReduction3_CUDA
2023-01-11T23:22:23.0668374Z [ OK ] NVFuserTest.FusionGroupedReduction3_CUDA (240 ms)
2023-01-11T23:22:23.0669240Z [ RUN ] NVFuserTest.FusionGroupedReduction4_CUDA
2023-01-11T23:22:23.0685131Z [ OK ] NVFuserTest.FusionGroupedReduction4_CUDA (2 ms)
2023-01-11T23:22:23.0685556Z [ RUN ] NVFuserTest.FusionGroupedReduction5_CUDA
2023-01-11T23:22:23.0705981Z [ OK ] NVFuserTest.FusionGroupedReduction5_CUDA (1 ms)
2023-01-11T23:22:23.0707522Z [ RUN ] NVFuserTest.FusionGroupedReduction6_CUDA
2023-01-11T23:22:23.2700702Z [ OK ] NVFuserTest.FusionGroupedReduction6_CUDA (199 ms)
2023-01-11T23:22:23.2701731Z [ RUN ] NVFuserTest.FusionGroupedReduction7_CUDA
2023-01-11T23:22:23.2717967Z [ OK ] NVFuserTest.FusionGroupedReduction7_CUDA (2 ms)
2023-01-11T23:22:23.2718412Z [ RUN ] NVFuserTest.FusionGroupedReductionRfactor1_CUDA
2023-01-11T23:22:23.5125845Z [ OK ] NVFuserTest.FusionGroupedReductionRfactor1_CUDA (240 ms)
2023-01-11T23:22:23.5126317Z [ RUN ] NVFuserTest.FusionGroupedReductionRfactor2_CUDA
2023-01-11T23:22:23.7542600Z [ OK ] NVFuserTest.FusionGroupedReductionRfactor2_CUDA (241 ms)
2023-01-11T23:22:23.7543143Z [ RUN ] NVFuserTest.FusionGroupedReductionAfterComputeAt_CUDA
2023-01-11T23:22:23.9573926Z [ OK ] NVFuserTest.FusionGroupedReductionAfterComputeAt_CUDA (203 ms)
2023-01-11T23:22:23.9574399Z [ RUN ] NVFuserTest.FusionGroupAllreduce1_CUDA
2023-01-11T23:22:24.2438410Z [ OK ] NVFuserTest.FusionGroupAllreduce1_CUDA (286 ms)
2023-01-11T23:22:24.2439236Z [ RUN ] NVFuserTest.FusionGroupAllreduce2_CUDA
2023-01-11T23:22:24.5446394Z [ OK ] NVFuserTest.FusionGroupAllreduce2_CUDA (301 ms)
2023-01-11T23:22:24.5446805Z [ RUN ] NVFuserTest.FusionGroupAllreduce3_CUDA
2023-01-11T23:22:24.8907788Z [ OK ] NVFuserTest.FusionGroupAllreduce3_CUDA (345 ms)
2023-01-11T23:22:24.8908586Z [ RUN ] NVFuserTest.FusionGroupAllreduce4_CUDA
2023-01-11T23:22:25.4693988Z [ OK ] NVFuserTest.FusionGroupAllreduce4_CUDA (578 ms)
2023-01-11T23:22:25.4694900Z [ RUN ] NVFuserTest.FusionGroupAllreduce5_CUDA
2023-01-11T23:22:25.8278854Z [ OK ] NVFuserTest.FusionGroupAllreduce5_CUDA (358 ms)
2023-01-11T23:22:25.8279956Z [ RUN ] NVFuserTest.FusionPersistentBNBackwardAllreduce_CUDA
2023-01-11T23:22:26.2747532Z [ OK ] NVFuserTest.FusionPersistentBNBackwardAllreduce_CUDA (446 ms)
2023-01-11T23:22:26.2748616Z [ RUN ] NVFuserTest.FusionGroupedReductionReEntrant1_CUDA
2023-01-11T23:22:26.5572250Z [ OK ] NVFuserTest.FusionGroupedReductionReEntrant1_CUDA (282 ms)
2023-01-11T23:22:26.5572806Z [ RUN ] NVFuserTest.FusionGroupedReductionChannelsLastBatchNormLike_CUDA
2023-01-11T23:22:26.9072349Z [ OK ] NVFuserTest.FusionGroupedReductionChannelsLastBatchNormLike_CUDA (349 ms)
2023-01-11T23:22:26.9073698Z [ RUN ] NVFuserTest.FusionGroupedReductionPersistentChannelsLastBatchNormLike_CUDA
2023-01-11T23:22:27.4449924Z [ OK ] NVFuserTest.FusionGroupedReductionPersistentChannelsLastBatchNormLike_CUDA (537 ms)
2023-01-11T23:22:27.4450555Z [ RUN ] NVFuserTest.FusionCrossIterationGroupedGridAllreduce1_CUDA
2023-01-11T23:22:27.7518673Z [ OK ] NVFuserTest.FusionCrossIterationGroupedGridAllreduce1_CUDA (306 ms)
2023-01-11T23:22:27.7519921Z [ RUN ] NVFuserTest.FusionCrossIterationGroupedGridAllreduce2_CUDA
2023-01-11T23:22:28.2360249Z [ OK ] NVFuserTest.FusionCrossIterationGroupedGridAllreduce2_CUDA (484 ms)
2023-01-11T23:22:28.2361404Z [ RUN ] NVFuserTest.FusionCrossIterationGroupedGridAllreduce3_CUDA
2023-01-11T23:22:28.6357350Z [ OK ] NVFuserTest.FusionCrossIterationGroupedGridAllreduce3_CUDA (399 ms)
2023-01-11T23:22:28.6357946Z [ RUN ] NVFuserTest.FusionCrossIterationGroupedGridAllreduce4_CUDA
2023-01-11T23:22:28.9431109Z [ OK ] NVFuserTest.FusionCrossIterationGroupedGridAllreduce4_CUDA (307 ms)
2023-01-11T23:22:28.9433362Z [ RUN ] NVFuserTest.FusionCrossIterationGroupedGridAllreduceWelford1_CUDA
2023-01-11T23:22:29.3792714Z [ OK ] NVFuserTest.FusionCrossIterationGroupedGridAllreduceWelford1_CUDA (436 ms)
2023-01-11T23:22:29.3794019Z [ RUN ] NVFuserTest.FusionCrossIterationGroupedGridAllreduceWelford2_CUDA
2023-01-11T23:22:30.2701468Z [ OK ] NVFuserTest.FusionCrossIterationGroupedGridAllreduceWelford2_CUDA (890 ms)
2023-01-11T23:22:30.2702834Z [ RUN ] NVFuserTest.FusionCrossIterationGroupedGridAllreduceWelfordShmoo_CUDA
2023-01-11T23:23:02.5874991Z [ OK ] NVFuserTest.FusionCrossIterationGroupedGridAllreduceWelfordShmoo_CUDA (32317 ms)
2023-01-11T23:23:02.5875503Z [ RUN ] NVFuserTest.FusionShift1_CUDA
2023-01-11T23:23:02.7809665Z [ OK ] NVFuserTest.FusionShift1_CUDA (193 ms)
2023-01-11T23:23:02.7810107Z [ RUN ] NVFuserTest.FusionShift2_CUDA
2023-01-11T23:23:03.0094414Z [ OK ] NVFuserTest.FusionShift2_CUDA (228 ms)
2023-01-11T23:23:03.0096296Z [ RUN ] NVFuserTest.FusionShiftRightOfCA_CUDA
2023-01-11T23:23:03.1914005Z [ OK ] NVFuserTest.FusionShiftRightOfCA_CUDA (181 ms)
2023-01-11T23:23:03.1914440Z [ RUN ] NVFuserTest.FusionShiftLeftOfCA_CUDA
2023-01-11T23:23:03.1953889Z [ OK ] NVFuserTest.FusionShiftLeftOfCA_CUDA (4 ms)
2023-01-11T23:23:03.1954349Z [ RUN ] NVFuserTest.FusionShiftSplit1_CUDA
2023-01-11T23:23:03.4001930Z [ OK ] NVFuserTest.FusionShiftSplit1_CUDA (204 ms)
2023-01-11T23:23:03.4002703Z [ RUN ] NVFuserTest.FusionShiftSplit2_CUDA
2023-01-11T23:23:03.6314327Z [ OK ] NVFuserTest.FusionShiftSplit2_CUDA (231 ms)
2023-01-11T23:23:03.6314745Z [ RUN ] NVFuserTest.FusionShiftDoubleSplit_CUDA
2023-01-11T23:23:03.8444755Z [ OK ] NVFuserTest.FusionShiftDoubleSplit_CUDA (213 ms)
2023-01-11T23:23:03.8445198Z [ RUN ] NVFuserTest.FusionShift3ptStencil_CUDA
2023-01-11T23:23:04.0716664Z [ OK ] NVFuserTest.FusionShift3ptStencil_CUDA (227 ms)
2023-01-11T23:23:04.0717136Z [ RUN ] NVFuserTest.FusionShift5ptStencil_CUDA
2023-01-11T23:23:04.9088184Z [ OK ] NVFuserTest.FusionShift5ptStencil_CUDA (837 ms)
2023-01-11T23:23:04.9088613Z [ RUN ] NVFuserTest.FusionShift9ptStencil_CUDA
2023-01-11T23:23:05.9640969Z [ OK ] NVFuserTest.FusionShift9ptStencil_CUDA (1054 ms)
2023-01-11T23:23:05.9641853Z [ RUN ] NVFuserTest.FusionShiftSmemBlocking_CUDA
2023-01-11T23:23:06.1486178Z [ OK ] NVFuserTest.FusionShiftSmemBlocking_CUDA (184 ms)
2023-01-11T23:23:06.1486659Z [ RUN ] NVFuserTest.FusionShift3ptStencilParallel_CUDA
2023-01-11T23:23:06.3476149Z [ OK ] NVFuserTest.FusionShift3ptStencilParallel_CUDA (198 ms)
2023-01-11T23:23:06.3476617Z [ RUN ] NVFuserTest.FusionShift5ptStencilParallel_CUDA
2023-01-11T23:23:06.5567158Z [ OK ] NVFuserTest.FusionShift5ptStencilParallel_CUDA (209 ms)
2023-01-11T23:23:06.5567637Z [ RUN ] NVFuserTest.FusionShiftMerge1_CUDA
2023-01-11T23:23:06.8006602Z [ OK ] NVFuserTest.FusionShiftMerge1_CUDA (243 ms)
2023-01-11T23:23:06.8007018Z [ RUN ] NVFuserTest.FusionShiftMerge2_CUDA
2023-01-11T23:23:07.1162948Z [ OK ] NVFuserTest.FusionShiftMerge2_CUDA (315 ms)
2023-01-11T23:23:07.1163877Z [ RUN ] NVFuserTest.FusionShiftGlobal_CUDA
2023-01-11T23:23:07.3389080Z [ OK ] NVFuserTest.FusionShiftGlobal_CUDA (222 ms)
2023-01-11T23:23:07.3389725Z [ RUN ] NVFuserTest.FusionShiftDoubleSplitMerge1_CUDA
2023-01-11T23:23:07.5583055Z [ OK ] NVFuserTest.FusionShiftDoubleSplitMerge1_CUDA (219 ms)
2023-01-11T23:23:07.5584055Z [ RUN ] NVFuserTest.FusionShiftDoubleSplitMerge2_CUDA
2023-01-11T23:23:07.7655684Z [ OK ] NVFuserTest.FusionShiftDoubleSplitMerge2_CUDA (207 ms)
2023-01-11T23:23:07.7656232Z [ RUN ] NVFuserTest.FusionShift5ptStencilParallel1DThreadBlock_CUDA
2023-01-11T23:23:11.6979363Z [ OK ] NVFuserTest.FusionShift5ptStencilParallel1DThreadBlock_CUDA (3932 ms)
2023-01-11T23:23:11.6980712Z [ RUN ] NVFuserTest.FusionShiftChain1_CUDA
2023-01-11T23:23:11.8921037Z [ OK ] NVFuserTest.FusionShiftChain1_CUDA (194 ms)
2023-01-11T23:23:11.8921443Z [ RUN ] NVFuserTest.FusionShiftChain2_CUDA
2023-01-11T23:23:12.0850932Z [ OK ] NVFuserTest.FusionShiftChain2_CUDA (192 ms)
2023-01-11T23:23:12.0851330Z [ RUN ] NVFuserTest.FusionShiftChain3_CUDA
2023-01-11T23:23:12.2847509Z [ OK ] NVFuserTest.FusionShiftChain3_CUDA (199 ms)
2023-01-11T23:23:12.2848284Z [ RUN ] NVFuserTest.FusionShiftChain4_CUDA
2023-01-11T23:23:13.0901967Z [ OK ] NVFuserTest.FusionShiftChain4_CUDA (805 ms)
2023-01-11T23:23:13.0902785Z [ RUN ] NVFuserTest.FusionShift5ptStencilChain_CUDA
2023-01-11T23:23:13.3659316Z [ OK ] NVFuserTest.FusionShift5ptStencilChain_CUDA (275 ms)
2023-01-11T23:23:13.3660172Z [ RUN ] NVFuserTest.FusionShiftReduction1_CUDA
2023-01-11T23:23:13.5679270Z [ OK ] NVFuserTest.FusionShiftReduction1_CUDA (202 ms)
2023-01-11T23:23:13.5679742Z [ RUN ] NVFuserTest.FusionShiftReduction2_CUDA
2023-01-11T23:23:13.8046259Z [ OK ] NVFuserTest.FusionShiftReduction2_CUDA (236 ms)
2023-01-11T23:23:13.8046757Z [ RUN ] NVFuserTest.FusionShiftRfactor1_CUDA
2023-01-11T23:23:14.0450536Z [ OK ] NVFuserTest.FusionShiftRfactor1_CUDA (240 ms)
2023-01-11T23:23:14.0451507Z [ RUN ] NVFuserTest.FusionShiftBcast1_CUDA
2023-01-11T23:23:14.2225570Z [ OK ] NVFuserTest.FusionShiftBcast1_CUDA (177 ms)
2023-01-11T23:23:14.2226280Z [ RUN ] NVFuserTest.FusionShiftBcast2_CUDA
2023-01-11T23:23:14.4189971Z [ OK ] NVFuserTest.FusionShiftBcast2_CUDA (196 ms)
2023-01-11T23:23:14.4190979Z [ RUN ] NVFuserTest.FusionShiftBcast3_CUDA
2023-01-11T23:23:14.6122932Z [ OK ] NVFuserTest.FusionShiftBcast3_CUDA (193 ms)
2023-01-11T23:23:14.6123387Z [ RUN ] NVFuserTest.FusionShiftSyncPlacement1_CUDA
2023-01-11T23:23:14.8045274Z [ OK ] NVFuserTest.FusionShiftSyncPlacement1_CUDA (192 ms)
2023-01-11T23:23:14.8046074Z [ RUN ] NVFuserTest.FusionShiftSyncPlacement2_CUDA
2023-01-11T23:23:14.9844786Z [ OK ] NVFuserTest.FusionShiftSyncPlacement2_CUDA (180 ms)
2023-01-11T23:23:14.9845194Z [ RUN ] NVFuserTest.FusionHdiff_CUDA
2023-01-11T23:23:15.3442103Z [ OK ] NVFuserTest.FusionHdiff_CUDA (359 ms)
2023-01-11T23:23:15.3442578Z [ RUN ] NVFuserTest.FusionHdiffPartialSplitUnswitch_CUDA
2023-01-11T23:23:15.8072935Z [ OK ] NVFuserTest.FusionHdiffPartialSplitUnswitch_CUDA (462 ms)
2023-01-11T23:23:15.8074040Z [ RUN ] NVFuserTest.FusionMaxPooling_CUDA
2023-01-11T23:23:16.0781301Z [ OK ] NVFuserTest.FusionMaxPooling_CUDA (270 ms)
2023-01-11T23:23:16.0782164Z [ RUN ] NVFuserTest.FusionGather1_CUDA
2023-01-11T23:23:16.2615724Z [ OK ] NVFuserTest.FusionGather1_CUDA (183 ms)
2023-01-11T23:23:16.2616579Z [ RUN ] NVFuserTest.FusionGather2_CUDA
2023-01-11T23:23:16.4495606Z [ OK ] NVFuserTest.FusionGather2_CUDA (187 ms)
2023-01-11T23:23:16.4496380Z [ RUN ] NVFuserTest.FusionGather3_CUDA
2023-01-11T23:23:16.6298711Z [ OK ] NVFuserTest.FusionGather3_CUDA (180 ms)
2023-01-11T23:23:16.6299131Z [ RUN ] NVFuserTest.FusionGather4_CUDA
2023-01-11T23:23:16.8188402Z [ OK ] NVFuserTest.FusionGather4_CUDA (188 ms)
2023-01-11T23:23:16.8188827Z [ RUN ] NVFuserTest.FusionGather5_CUDA
2023-01-11T23:23:17.0221597Z [ OK ] NVFuserTest.FusionGather5_CUDA (203 ms)
2023-01-11T23:23:17.0222416Z [ RUN ] NVFuserTest.FusionGather6_CUDA
2023-01-11T23:23:17.2178643Z [ OK ] NVFuserTest.FusionGather6_CUDA (195 ms)
2023-01-11T23:23:17.2179398Z [ RUN ] NVFuserTest.FusionGather7_CUDA
2023-01-11T23:23:17.4211367Z [ OK ] NVFuserTest.FusionGather7_CUDA (203 ms)
2023-01-11T23:23:17.4212109Z [ RUN ] NVFuserTest.FusionGather8_CUDA
2023-01-11T23:23:17.6108327Z [ OK ] NVFuserTest.FusionGather8_CUDA (189 ms)
2023-01-11T23:23:17.6108706Z [ RUN ] NVFuserTest.FusionGather9_CUDA
2023-01-11T23:23:17.8263812Z [ OK ] NVFuserTest.FusionGather9_CUDA (215 ms)
2023-01-11T23:23:17.8264532Z [ RUN ] NVFuserTest.FusionConv2D_CUDA
2023-01-11T23:23:19.3064346Z [ OK ] NVFuserTest.FusionConv2D_CUDA (1480 ms)
2023-01-11T23:23:19.3065106Z [ RUN ] NVFuserTest.FusionConv2DNoPadding_CUDA
2023-01-11T23:23:19.5934005Z [ OK ] NVFuserTest.FusionConv2DNoPadding_CUDA (286 ms)
2023-01-11T23:23:19.5935375Z [ RUN ] NVFuserTest.FusionConv2DNoPaddingStrided_CUDA
2023-01-11T23:23:19.8666686Z [ OK ] NVFuserTest.FusionConv2DNoPaddingStrided_CUDA (273 ms)
2023-01-11T23:23:19.8667165Z [ RUN ] NVFuserTest.FusionConv2DChain_CUDA
2023-01-11T23:23:20.2391554Z [ OK ] NVFuserTest.FusionConv2DChain_CUDA (372 ms)
2023-01-11T23:23:20.2392527Z [ RUN ] NVFuserTest.FusionConv2DStaticEvenSizedWindow_CUDA
2023-01-11T23:23:20.4912270Z [ OK ] NVFuserTest.FusionConv2DStaticEvenSizedWindow_CUDA (252 ms)
2023-01-11T23:23:20.4913707Z [ RUN ] NVFuserTest.FusionConv4x4Pad1x1_CUDA
2023-01-11T23:23:20.8196777Z [ OK ] NVFuserTest.FusionConv4x4Pad1x1_CUDA (328 ms)
2023-01-11T23:23:20.8197594Z [ RUN ] NVFuserTest.FusionConv4x5Pad1x2_CUDA
2023-01-11T23:23:21.1680539Z [ OK ] NVFuserTest.FusionConv4x5Pad1x2_CUDA (348 ms)
2023-01-11T23:23:21.1680960Z [ RUN ] NVFuserTest.FusionConv4x4Pad1x1Stride4_CUDA
2023-01-11T23:23:21.7198673Z [ OK ] NVFuserTest.FusionConv4x4Pad1x1Stride4_CUDA (551 ms)
2023-01-11T23:23:21.7199078Z [ RUN ] NVFuserTest.FusionIm2Col_CUDA
2023-01-11T23:23:21.9282132Z [ OK ] NVFuserTest.FusionIm2Col_CUDA (208 ms)
2023-01-11T23:23:21.9282974Z [ RUN ] NVFuserTest.FusionShiftNoPadding1_CUDA
2023-01-11T23:23:22.1385590Z [ OK ] NVFuserTest.FusionShiftNoPadding1_CUDA (210 ms)
2023-01-11T23:23:22.1386476Z [ RUN ] NVFuserTest.FusionShiftNoPadding2_CUDA
2023-01-11T23:23:22.3481735Z [ OK ] NVFuserTest.FusionShiftNoPadding2_CUDA (209 ms)
2023-01-11T23:23:22.3482235Z [ RUN ] NVFuserTest.FusionShiftNoPadding3_CUDA
2023-01-11T23:23:22.5831279Z [ OK ] NVFuserTest.FusionShiftNoPadding3_CUDA (234 ms)
2023-01-11T23:23:22.5832161Z [ RUN ] NVFuserTest.FusionShiftNoPaddingContigMerge_CUDA
2023-01-11T23:23:22.7796587Z [ OK ] NVFuserTest.FusionShiftNoPaddingContigMerge_CUDA (196 ms)
2023-01-11T23:23:22.7797693Z [ RUN ] NVFuserTest.FusionShiftNoPaddingChain_CUDA
2023-01-11T23:23:23.0120046Z [ OK ] NVFuserTest.FusionShiftNoPaddingChain_CUDA (232 ms)
2023-01-11T23:23:23.0120550Z [ RUN ] NVFuserTest.FusionShiftNoPaddingRfactor_CUDA
2023-01-11T23:23:23.0142228Z [ OK ] NVFuserTest.FusionShiftNoPaddingRfactor_CUDA (2 ms)
2023-01-11T23:23:23.0142680Z [ RUN ] NVFuserTest.FusionShiftPadding1_CUDA
2023-01-11T23:23:23.2274640Z [ OK ] NVFuserTest.FusionShiftPadding1_CUDA (212 ms)
2023-01-11T23:23:23.2275458Z [ RUN ] NVFuserTest.FusionPartialSplit1_CUDA
2023-01-11T23:23:23.4047710Z [ OK ] NVFuserTest.FusionPartialSplit1_CUDA (177 ms)
2023-01-11T23:23:23.4048131Z [ RUN ] NVFuserTest.FusionPartialSplit2_CUDA
2023-01-11T23:23:23.4102883Z [ OK ] NVFuserTest.FusionPartialSplit2_CUDA (5 ms)
2023-01-11T23:23:23.4103680Z [ RUN ] NVFuserTest.FusionPartialSplit3_CUDA
2023-01-11T23:23:23.5942149Z [ OK ] NVFuserTest.FusionPartialSplit3_CUDA (183 ms)
2023-01-11T23:23:23.5942951Z [ RUN ] NVFuserTest.FusionPartialSplit4_CUDA
2023-01-11T23:23:23.8778305Z [ OK ] NVFuserTest.FusionPartialSplit4_CUDA (283 ms)
2023-01-11T23:23:23.8778940Z [ RUN ] NVFuserTest.FusionPartialSplit5_CUDA
2023-01-11T23:23:24.1031827Z [ OK ] NVFuserTest.FusionPartialSplit5_CUDA (225 ms)
2023-01-11T23:23:24.1033145Z [ RUN ] NVFuserTest.FusionPartialSplit6_CUDA
2023-01-11T23:23:24.3019668Z [ OK ] NVFuserTest.FusionPartialSplit6_CUDA (198 ms)
2023-01-11T23:23:24.3020089Z [ RUN ] NVFuserTest.FusionShiftUnswitch1_CUDA
2023-01-11T23:23:24.5127551Z [ OK ] NVFuserTest.FusionShiftUnswitch1_CUDA (210 ms)
2023-01-11T23:23:24.5127970Z [ RUN ] NVFuserTest.FusionGatherUnswitch1_CUDA
2023-01-11T23:23:25.7845950Z [ OK ] NVFuserTest.FusionGatherUnswitch1_CUDA (1271 ms)
2023-01-11T23:23:25.7846600Z [ RUN ] NVFuserTest.FusionGatherStrided1_CUDA
2023-01-11T23:23:25.9714959Z [ OK ] NVFuserTest.FusionGatherStrided1_CUDA (186 ms)
2023-01-11T23:23:25.9715635Z [ RUN ] NVFuserTest.FusionGatherStrided2_CUDA
2023-01-11T23:23:26.1588228Z [ OK ] NVFuserTest.FusionGatherStrided2_CUDA (187 ms)
2023-01-11T23:23:26.1588885Z [ RUN ] NVFuserTest.FusionGatherStrided3_CUDA
2023-01-11T23:23:26.3499777Z [ OK ] NVFuserTest.FusionGatherStrided3_CUDA (191 ms)
2023-01-11T23:23:26.3500419Z [ RUN ] NVFuserTest.FusionGatherStrided4_CUDA
2023-01-11T23:23:26.5704229Z [ OK ] NVFuserTest.FusionGatherStrided4_CUDA (220 ms)
2023-01-11T23:23:26.5704814Z [ RUN ] NVFuserTest.FusionGatherStrided5_CUDA
2023-01-11T23:23:26.7566814Z [ OK ] NVFuserTest.FusionGatherStrided5_CUDA (186 ms)
2023-01-11T23:23:26.7567443Z [ RUN ] NVFuserTest.FusionGatherStrided6_CUDA
2023-01-11T23:23:26.9442045Z [ OK ] NVFuserTest.FusionGatherStrided6_CUDA (187 ms)
2023-01-11T23:23:26.9442613Z [ RUN ] NVFuserTest.FusionGatherStrided7_CUDA
2023-01-11T23:23:26.9491829Z [ OK ] NVFuserTest.FusionGatherStrided7_CUDA (4 ms)
2023-01-11T23:23:26.9492474Z [ RUN ] NVFuserTest.FusionGatherStrided8_CUDA
2023-01-11T23:23:27.1476825Z [ OK ] NVFuserTest.FusionGatherStrided8_CUDA (199 ms)
2023-01-11T23:23:27.1477455Z [ RUN ] NVFuserTest.FusionGatherStridedChain_CUDA
2023-01-11T23:23:27.1513827Z [ OK ] NVFuserTest.FusionGatherStridedChain_CUDA (3 ms)
2023-01-11T23:23:27.1514430Z [ RUN ] NVFuserTest.FusionMaxPoolingStrided_CUDA
2023-01-11T23:23:27.4789261Z [ OK ] NVFuserTest.FusionMaxPoolingStrided_CUDA (327 ms)
2023-01-11T23:23:27.4789919Z [ RUN ] NVFuserTest.FusionConv2DStaticStrided_CUDA
2023-01-11T23:23:27.9185083Z [ OK ] NVFuserTest.FusionConv2DStaticStrided_CUDA (439 ms)
2023-01-11T23:23:27.9185711Z [ RUN ] NVFuserTest.FusionNonDivisibleHalo1_CUDA
2023-01-11T23:23:28.1160128Z [ OK ] NVFuserTest.FusionNonDivisibleHalo1_CUDA (197 ms)
2023-01-11T23:23:28.1160846Z [ RUN ] NVFuserTest.FusionNonDivisibleHalo2_CUDA
2023-01-11T23:23:29.0930667Z [ OK ] NVFuserTest.FusionNonDivisibleHalo2_CUDA (977 ms)
2023-01-11T23:23:29.0931498Z [ RUN ] NVFuserTest.FusionGather9ptStencilDoubleBuffering_CUDA
2023-01-11T23:23:29.3122181Z [ OK ] NVFuserTest.FusionGather9ptStencilDoubleBuffering_CUDA (218 ms)
2023-01-11T23:23:29.3122950Z [ RUN ] NVFuserTest.FusionValidateParallelizeShift_CUDA
2023-01-11T23:23:29.5092129Z [ OK ] NVFuserTest.FusionValidateParallelizeShift_CUDA (197 ms)
2023-01-11T23:23:29.5092876Z [ RUN ] NVFuserTest.FusionGatherIterTypePromotion_CUDA
2023-01-11T23:23:29.7325874Z [ OK ] NVFuserTest.FusionGatherIterTypePromotion_CUDA (223 ms)
2023-01-11T23:23:29.7326564Z [ RUN ] NVFuserTest.FusionContigPredicateShift_CUDA
2023-01-11T23:23:29.9215238Z [ OK ] NVFuserTest.FusionContigPredicateShift_CUDA (188 ms)
2023-01-11T23:23:29.9215876Z [ RUN ] NVFuserTest.FusionVoltaMMATT_CUDA
2023-01-11T23:23:30.2762209Z [ OK ] NVFuserTest.FusionVoltaMMATT_CUDA (354 ms)
2023-01-11T23:23:30.2762651Z [ RUN ] NVFuserTest.FusionVoltaMMATN_CUDA
2023-01-11T23:23:30.6435551Z [ OK ] NVFuserTest.FusionVoltaMMATN_CUDA (367 ms)
2023-01-11T23:23:30.6436542Z [ RUN ] NVFuserTest.FusionVoltaMMANT_CUDA
2023-01-11T23:23:30.9841339Z [ OK ] NVFuserTest.FusionVoltaMMANT_CUDA (340 ms)
2023-01-11T23:23:30.9841753Z [ RUN ] NVFuserTest.FusionVoltaMatmul_CUDA
2023-01-11T23:24:54.7088126Z [ OK ] NVFuserTest.FusionVoltaMatmul_CUDA (83724 ms)
2023-01-11T23:24:54.7088589Z [ RUN ] NVFuserTest.FusionVoltaMatmulRegDoubleBuffer_CUDA
2023-01-11T23:26:19.9360640Z [ OK ] NVFuserTest.FusionVoltaMatmulRegDoubleBuffer_CUDA (85227 ms)
2023-01-11T23:26:19.9361089Z [ RUN ] NVFuserTest.FusionAmpereMMATN_CUDA
2023-01-11T23:26:20.8708707Z [ OK ] NVFuserTest.FusionAmpereMMATN_CUDA (934 ms)
2023-01-11T23:26:20.8709637Z [ RUN ] NVFuserTest.FusionAmpereMMATT_CUDA
2023-01-11T23:26:21.8750973Z [ OK ] NVFuserTest.FusionAmpereMMATT_CUDA (1004 ms)
2023-01-11T23:26:21.8752240Z [ RUN ] NVFuserTest.FusionAmpereMMANT_CUDA
2023-01-11T23:26:22.8820168Z [ OK ] NVFuserTest.FusionAmpereMMANT_CUDA (1006 ms)
2023-01-11T23:26:22.8820876Z [ RUN ] NVFuserTest.FusionAmpereMatmul_CUDA
2023-01-11T23:26:25.7296383Z [ OK ] NVFuserTest.FusionAmpereMatmul_CUDA (2847 ms)
2023-01-11T23:26:25.7297163Z [ RUN ] NVFuserTest.FusionAmpereMatmulPipelineGmem_CUDA
2023-01-11T23:26:31.2761813Z [ OK ] NVFuserTest.FusionAmpereMatmulPipelineGmem_CUDA (5546 ms)
2023-01-11T23:26:31.2762285Z [ RUN ] NVFuserTest.FusionAmpereMatmulRegDbouleBuffer_CUDA
2023-01-11T23:26:36.9900815Z [ OK ] NVFuserTest.FusionAmpereMatmulRegDbouleBuffer_CUDA (5713 ms)
2023-01-11T23:26:36.9901662Z [ RUN ] NVFuserTest.FusionMatmulMatmulAmpere_CUDA
2023-01-11T23:26:38.3271730Z [ OK ] NVFuserTest.FusionMatmulMatmulAmpere_CUDA (1337 ms)
2023-01-11T23:26:38.3272792Z [ RUN ] NVFuserTest.FusionMatmulSoftmaxMatmulAmpere_CUDA
2023-01-11T23:26:40.9613157Z [ OK ] NVFuserTest.FusionMatmulSoftmaxMatmulAmpere_CUDA (2634 ms)
2023-01-11T23:26:40.9614050Z [ RUN ] NVFuserTest.FusionTuringMMATN_CUDA
2023-01-11T23:26:41.8960895Z [ OK ] NVFuserTest.FusionTuringMMATN_CUDA (934 ms)
2023-01-11T23:26:41.8961661Z [ RUN ] NVFuserTest.FusionTuringMMATT_CUDA
2023-01-11T23:26:42.9028416Z [ OK ] NVFuserTest.FusionTuringMMATT_CUDA (1006 ms)
2023-01-11T23:26:42.9029179Z [ RUN ] NVFuserTest.FusionTuringMMANT_CUDA
2023-01-11T23:26:43.9108509Z [ OK ] NVFuserTest.FusionTuringMMANT_CUDA (1008 ms)
2023-01-11T23:26:43.9109543Z [ RUN ] NVFuserTest.FusionTuringMatmul_CUDA
2023-01-11T23:26:46.7068631Z [ OK ] NVFuserTest.FusionTuringMatmul_CUDA (2795 ms)
2023-01-11T23:26:46.7069461Z [ RUN ] NVFuserTest.FusionAmpereMatmulTNcpAsync_CUDA
2023-01-11T23:26:47.2297141Z [ OK ] NVFuserTest.FusionAmpereMatmulTNcpAsync_CUDA (523 ms)
2023-01-11T23:26:47.2298150Z [ RUN ] NVFuserTest.FusionAmpereStridedBatchedMatmulTN_CUDA
2023-01-11T23:26:48.0353237Z [ OK ] NVFuserTest.FusionAmpereStridedBatchedMatmulTN_CUDA (805 ms)
2023-01-11T23:26:48.0354154Z [ RUN ] NVFuserTest.FusionAmpereViewMatmulTN_CUDA
2023-01-11T23:26:48.6133573Z [ OK ] NVFuserTest.FusionAmpereViewMatmulTN_CUDA (578 ms)
2023-01-11T23:26:48.6134142Z [ RUN ] NVFuserTest.FusionVoltaMatMulTNCrossWarp_CUDA
2023-01-11T23:26:53.2347609Z [ OK ] NVFuserTest.FusionVoltaMatMulTNCrossWarp_CUDA (4621 ms)
2023-01-11T23:26:53.2348442Z [ RUN ] NVFuserTest.FusionVoltaMatMulTNCrossCTA_CUDA
2023-01-11T23:27:35.6292914Z [ OK ] NVFuserTest.FusionVoltaMatMulTNCrossCTA_CUDA (42394 ms)
2023-01-11T23:27:35.6294042Z [ RUN ] NVFuserTest.FusionAmpereMatmulTNSwizzled_CUDA
2023-01-11T23:27:37.0102020Z [ OK ] NVFuserTest.FusionAmpereMatmulTNSwizzled_CUDA (1380 ms)
2023-01-11T23:27:37.0103183Z [ RUN ] NVFuserTest.FusionAmpereMatmulLargeLoad_CUDA
2023-01-11T23:27:39.8418597Z [ OK ] NVFuserTest.FusionAmpereMatmulLargeLoad_CUDA (2831 ms)
2023-01-11T23:27:39.8419368Z [ RUN ] NVFuserTest.FusionTuringMatmulLargeLoad_CUDA
2023-01-11T23:27:42.6118277Z [ OK ] NVFuserTest.FusionTuringMatmulLargeLoad_CUDA (2769 ms)
2023-01-11T23:27:42.6119422Z [ RUN ] NVFuserTest.FusionViewDtypeSameSizeOutput_CUDA
2023-01-11T23:27:42.8008929Z [ OK ] NVFuserTest.FusionViewDtypeSameSizeOutput_CUDA (189 ms)
2023-01-11T23:27:42.8009945Z [ RUN ] NVFuserTest.FusionViewDtypeFailMismatchSize_CUDA
2023-01-11T23:27:42.8040316Z [ OK ] NVFuserTest.FusionViewDtypeFailMismatchSize_CUDA (3 ms)
2023-01-11T23:27:42.8040938Z [ RUN ] NVFuserTest.FusionViewAsRealOutput_CUDA
2023-01-11T23:27:42.9842097Z [ OK ] NVFuserTest.FusionViewAsRealOutput_CUDA (180 ms)
2023-01-11T23:27:42.9842684Z [ RUN ] NVFuserTest.FusionViewRfactorExtentReplacement_CUDA
2023-01-11T23:27:43.5888767Z [ OK ] NVFuserTest.FusionViewRfactorExtentReplacement_CUDA (604 ms)
2023-01-11T23:27:43.5890411Z [ RUN ] NVFuserTest.FusionViewOutput_CUDA
2023-01-11T23:27:43.7837411Z [ OK ] NVFuserTest.FusionViewOutput_CUDA (194 ms)
2023-01-11T23:27:43.7838229Z [ RUN ] NVFuserTest.FusionViewFailMismatchSize_CUDA
2023-01-11T23:27:43.7855980Z [ OK ] NVFuserTest.FusionViewFailMismatchSize_CUDA (2 ms)
2023-01-11T23:27:43.7856754Z [ RUN ] NVFuserTest.FusionViewFailMulitDimInference_CUDA
2023-01-11T23:27:43.7876782Z [ OK ] NVFuserTest.FusionViewFailMulitDimInference_CUDA (2 ms)
2023-01-11T23:27:43.7877193Z [ RUN ] NVFuserTest.FusionViewReductionShmoo_CUDA
2023-01-11T23:28:21.8919232Z [ OK ] NVFuserTest.FusionViewReductionShmoo_CUDA (38103 ms)
2023-01-11T23:28:21.8920071Z [ RUN ] NVFuserTest.FusionViewPersistentShmoo_CUDA
2023-01-11T23:29:32.6822915Z [ OK ] NVFuserTest.FusionViewPersistentShmoo_CUDA (70790 ms)
2023-01-11T23:29:32.6823785Z [ RUN ] NVFuserTest.FusionViewSplit_CUDA
2023-01-11T23:29:33.0993193Z [ OK ] NVFuserTest.FusionViewSplit_CUDA (417 ms)
2023-01-11T23:29:33.0994744Z [ RUN ] NVFuserTest.FusionViewBroadcast_CUDA
2023-01-11T23:29:33.4943493Z [ OK ] NVFuserTest.FusionViewBroadcast_CUDA (394 ms)
2023-01-11T23:29:33.4944185Z [ RUN ] NVFuserTest.FusionViewMerge_CUDA
2023-01-11T23:29:33.8960689Z [ OK ] NVFuserTest.FusionViewMerge_CUDA (401 ms)
2023-01-11T23:29:33.8961065Z [ RUN ] NVFuserTest.FusionViewAllShmoo_CUDA
2023-01-11T23:29:49.6735408Z [ OK ] NVFuserTest.FusionViewAllShmoo_CUDA (15777 ms)
2023-01-11T23:29:49.6735794Z [ RUN ] NVFuserTest.FusionViewStride_CUDA
2023-01-11T23:30:05.7922591Z [ OK ] NVFuserTest.FusionViewStride_CUDA (16118 ms)
2023-01-11T23:30:05.7923012Z [ RUN ] NVFuserTest.FusionViewBinary_CUDA
2023-01-11T23:30:06.3081621Z [ OK ] NVFuserTest.FusionViewBinary_CUDA (515 ms)
2023-01-11T23:30:06.3082310Z [ RUN ] NVFuserTest.FusionViewConcreteDomain_CUDA
2023-01-11T23:30:06.4898785Z [ OK ] NVFuserTest.FusionViewConcreteDomain_CUDA (181 ms)
2023-01-11T23:30:06.4899576Z [ RUN ] NVFuserTest.FusionViewConcreteDomain2_CUDA
2023-01-11T23:30:07.2470015Z [ OK ] NVFuserTest.FusionViewConcreteDomain2_CUDA (757 ms)
2023-01-11T23:30:07.2470470Z [ RUN ] NVFuserTest.FusionViewConcreteDomain3_CUDA
2023-01-11T23:30:07.5920216Z [ OK ] NVFuserTest.FusionViewConcreteDomain3_CUDA (345 ms)
2023-01-11T23:30:07.5920679Z [ RUN ] NVFuserTest.FusionViewConcreteDomain4_CUDA
2023-01-11T23:30:07.5935763Z [ OK ] NVFuserTest.FusionViewConcreteDomain4_CUDA (1 ms)
2023-01-11T23:30:07.5936152Z [ RUN ] NVFuserTest.FusionViewConcreteDomain5_CUDA
2023-01-11T23:30:07.5993675Z [ OK ] NVFuserTest.FusionViewConcreteDomain5_CUDA (5 ms)
2023-01-11T23:30:07.5994559Z [ RUN ] NVFuserTest.FusionFlattenAfterUnsqueezeOutput_CUDA
2023-01-11T23:30:07.7848967Z [ OK ] NVFuserTest.FusionFlattenAfterUnsqueezeOutput_CUDA (185 ms)
2023-01-11T23:30:07.7849628Z [ RUN ] NVFuserTest.FusionComputeAtRootDomainMapWithView_CUDA
2023-01-11T23:30:07.7850265Z [ OK ] NVFuserTest.FusionComputeAtRootDomainMapWithView_CUDA (0 ms)
2023-01-11T23:30:07.7850728Z [ RUN ] NVFuserTest.FusionExpandRepro_CUDA
2023-01-11T23:30:07.9592275Z [ OK ] NVFuserTest.FusionExpandRepro_CUDA (173 ms)
2023-01-11T23:30:07.9593017Z [ RUN ] NVFuserTest.FusionExpandView1_CUDA
2023-01-11T23:30:08.1503469Z [ OK ] NVFuserTest.FusionExpandView1_CUDA (191 ms)
2023-01-11T23:30:08.1504165Z [ RUN ] NVFuserTest.FusionExpandView2_CUDA
2023-01-11T23:30:08.3434337Z [ OK ] NVFuserTest.FusionExpandView2_CUDA (193 ms)
2023-01-11T23:30:08.3435371Z [ RUN ] NVFuserTest.FusionViewTransformCache_CUDA
2023-01-11T23:30:08.3436381Z [ OK ] NVFuserTest.FusionViewTransformCache_CUDA (0 ms)
2023-01-11T23:30:08.3437124Z [ RUN ] NVFuserTest.FusionViewIdGraph_CUDA
2023-01-11T23:30:08.3447687Z [ OK ] NVFuserTest.FusionViewIdGraph_CUDA (1 ms)
2023-01-11T23:30:08.3448350Z [ RUN ] NVFuserTest.FusionViewVectorize_CUDA
2023-01-11T23:30:08.7482209Z [ OK ] NVFuserTest.FusionViewVectorize_CUDA (403 ms)
2023-01-11T23:30:08.7482786Z [ RUN ] NVFuserTest.FusionExpandFlatten_CUDA
2023-01-11T23:30:09.1622221Z [ OK ] NVFuserTest.FusionExpandFlatten_CUDA (414 ms)
2023-01-11T23:30:09.1622792Z [ RUN ] NVFuserTest.FusionIllegalReductionFlatten_CUDA
2023-01-11T23:30:09.1642346Z [ OK ] NVFuserTest.FusionIllegalReductionFlatten_CUDA (1 ms)
2023-01-11T23:30:09.1642924Z [ RUN ] NVFuserTest.FusionReductionFlatten1_CUDA
2023-01-11T23:30:09.5864400Z [ OK ] NVFuserTest.FusionReductionFlatten1_CUDA (421 ms)
2023-01-11T23:30:09.5865430Z [ RUN ] NVFuserTest.FusionPwiseViewSchedule_CUDA
2023-01-11T23:30:09.9040169Z [ OK ] NVFuserTest.FusionPwiseViewSchedule_CUDA (317 ms)
2023-01-11T23:30:09.9040712Z [ RUN ] NVFuserTest.FusionSumViewSchedule_CUDA
2023-01-11T23:30:12.3139628Z [ OK ] NVFuserTest.FusionSumViewSchedule_CUDA (2409 ms)
2023-01-11T23:30:12.3140654Z [ RUN ] NVFuserTest.FusionViewMagicSchedule1_CUDA
2023-01-11T23:30:12.6559171Z [ OK ] NVFuserTest.FusionViewMagicSchedule1_CUDA (341 ms)
2023-01-11T23:30:12.6560162Z [ RUN ] NVFuserTest.FusionViewMagicSchedule2_CUDA
2023-01-11T23:30:13.5389255Z [ OK ] NVFuserTest.FusionViewMagicSchedule2_CUDA (882 ms)
2023-01-11T23:30:13.5390360Z [ RUN ] NVFuserTest.FusionViewMagicSchedule3_CUDA
2023-01-11T23:30:13.8678728Z [ OK ] NVFuserTest.FusionViewMagicSchedule3_CUDA (329 ms)
2023-01-11T23:30:13.8679788Z [ RUN ] NVFuserTest.FusionViewMagicSchedule4_CUDA
2023-01-11T23:30:14.2166461Z [ OK ] NVFuserTest.FusionViewMagicSchedule4_CUDA (348 ms)
2023-01-11T23:30:14.2166995Z [ RUN ] NVFuserTest.FusionViewMagicSchedule5_CUDA
2023-01-11T23:30:14.6455954Z [ OK ] NVFuserTest.FusionViewMagicSchedule5_CUDA (428 ms)
2023-01-11T23:30:14.6456983Z [ RUN ] NVFuserTest.FusionViewMapping_CUDA
2023-01-11T23:30:15.0560739Z [ OK ] NVFuserTest.FusionViewMapping_CUDA (410 ms)
2023-01-11T23:30:15.0561606Z [ RUN ] NVFuserTest.FusionLowerDivisibleSplits_CUDA
2023-01-11T23:30:15.0592090Z [ OK ] NVFuserTest.FusionLowerDivisibleSplits_CUDA (3 ms)
2023-01-11T23:30:15.0592621Z [ RUN ] NVFuserTest.FusionTranspose1_CUDA
2023-01-11T23:30:15.2301800Z [ OK ] NVFuserTest.FusionTranspose1_CUDA (170 ms)
2023-01-11T23:30:15.2302275Z [ RUN ] NVFuserTest.FusionTranspose2_CUDA
2023-01-11T23:30:15.4039871Z [ OK ] NVFuserTest.FusionTranspose2_CUDA (173 ms)
2023-01-11T23:30:15.4040852Z [ RUN ] NVFuserTest.FusionTransposeWithSwizzle_CUDA
2023-01-11T23:30:15.5812919Z [ OK ] NVFuserTest.FusionTransposeWithSwizzle_CUDA (177 ms)
2023-01-11T23:30:15.5814052Z [ RUN ] NVFuserTest.FusionTransposeWithSwizzle1DThreadBlock_CUDA
2023-01-11T23:30:15.7780691Z [ OK ] NVFuserTest.FusionTransposeWithSwizzle1DThreadBlock_CUDA (196 ms)
2023-01-11T23:30:15.7781735Z [ RUN ] NVFuserTest.FusionScheduleTransposeSimple_CUDA
2023-01-11T23:30:16.4472624Z [ OK ] NVFuserTest.FusionScheduleTransposeSimple_CUDA (669 ms)
2023-01-11T23:30:16.4473283Z [ RUN ] NVFuserTest.FusionScheduleTransposeSinTransposeCos_CUDA
2023-01-11T23:30:17.1689992Z [ OK ] NVFuserTest.FusionScheduleTransposeSinTransposeCos_CUDA (721 ms)
2023-01-11T23:30:17.1690591Z [ RUN ] NVFuserTest.FusionScheduleTransposeMultipleInput_CUDA
2023-01-11T23:30:17.8590808Z [ OK ] NVFuserTest.FusionScheduleTransposeMultipleInput_CUDA (689 ms)
2023-01-11T23:30:17.8591898Z [ RUN ] NVFuserTest.FusionScheduleTransposeMultipleOutput_CUDA
2023-01-11T23:30:18.8833261Z [ OK ] NVFuserTest.FusionScheduleTransposeMultipleOutput_CUDA (1024 ms)
2023-01-11T23:30:18.8833924Z [ RUN ] NVFuserTest.FusionScheduleTransposeMultipleInputOutput_CUDA
2023-01-11T23:30:19.8207664Z [ OK ] NVFuserTest.FusionScheduleTransposeMultipleInputOutput_CUDA (937 ms)
2023-01-11T23:30:19.8208223Z [ RUN ] NVFuserTest.FusionScheduleTransposeMatchingSkipConnection_CUDA
2023-01-11T23:30:20.5547039Z [ OK ] NVFuserTest.FusionScheduleTransposeMatchingSkipConnection_CUDA (733 ms)
2023-01-11T23:30:20.5547955Z [ RUN ] NVFuserTest.FusionScheduleTransposeBroadcast_CUDA
2023-01-11T23:30:20.9181647Z [ OK ] NVFuserTest.FusionScheduleTransposeBroadcast_CUDA (363 ms)
2023-01-11T23:30:20.9182754Z [ RUN ] NVFuserTest.FusionScheduleTransposeNoReference_CUDA
2023-01-11T23:30:20.9213568Z [ OK ] NVFuserTest.FusionScheduleTransposeNoReference_CUDA (3 ms)
2023-01-11T23:30:20.9214150Z [ RUN ] NVFuserTest.FusionScheduleBroadcastOnly_CUDA
2023-01-11T23:30:22.2719541Z [ OK ] NVFuserTest.FusionScheduleBroadcastOnly_CUDA (1350 ms)
2023-01-11T23:30:22.2720666Z [ RUN ] NVFuserTest.FusionScheduleTransposeComplexDAG1_CUDA
2023-01-11T23:30:23.1530448Z [ OK ] NVFuserTest.FusionScheduleTransposeComplexDAG1_CUDA (881 ms)
2023-01-11T23:30:23.1531056Z [ RUN ] NVFuserTest.FusionManualScheduleTransposeComplexDAG1_CUDA
2023-01-11T23:30:24.0074375Z [ OK ] NVFuserTest.FusionManualScheduleTransposeComplexDAG1_CUDA (854 ms)
2023-01-11T23:30:24.0074957Z [ RUN ] NVFuserTest.FusionViewNoTranspose_CUDA
2023-01-11T23:30:24.0075752Z [ OK ] NVFuserTest.FusionViewNoTranspose_CUDA (0 ms)
2023-01-11T23:30:24.0076376Z [ RUN ] NVFuserTest.FusionTransposeSelfMapping_CUDA
2023-01-11T23:30:24.3715061Z [ OK ] NVFuserTest.FusionTransposeSelfMapping_CUDA (363 ms)
2023-01-11T23:30:24.3715749Z [ RUN ] NVFuserTest.FusionScheduleTransposeMissingDim_CUDA
2023-01-11T23:30:24.7850875Z [ OK ] NVFuserTest.FusionScheduleTransposeMissingDim_CUDA (413 ms)
2023-01-11T23:30:24.7851349Z [ RUN ] NVFuserTest.FusionScheduleTransposeSmall_CUDA
2023-01-11T23:30:25.5209506Z [ OK ] NVFuserTest.FusionScheduleTransposeSmall_CUDA (735 ms)
2023-01-11T23:30:25.5210045Z [ RUN ] NVFuserTest.FusionScheduleTransposeSmallInnerSize1_CUDA
2023-01-11T23:30:26.2922853Z [ OK ] NVFuserTest.FusionScheduleTransposeSmallInnerSize1_CUDA (771 ms)
2023-01-11T23:30:26.2923372Z [ RUN ] NVFuserTest.FusionScheduleTransposeSmallInnerSize2_CUDA
2023-01-11T23:30:27.0790826Z [ OK ] NVFuserTest.FusionScheduleTransposeSmallInnerSize2_CUDA (786 ms)
2023-01-11T23:30:27.0791791Z [ RUN ] NVFuserTest.FusionScheduleTransposeSmallInnerSize3_CUDA
2023-01-11T23:30:28.1598076Z [ OK ] NVFuserTest.FusionScheduleTransposeSmallInnerSize3_CUDA (1081 ms)
2023-01-11T23:30:28.1598616Z [ RUN ] NVFuserTest.FusionScheduleTranspose2DSmallInnerSize_CUDA
2023-01-11T23:30:29.5287948Z [ OK ] NVFuserTest.FusionScheduleTranspose2DSmallInnerSize_CUDA (1368 ms)
2023-01-11T23:30:29.5288836Z [ RUN ] NVFuserTest.FusionTransposeBankConflict1_CUDA
2023-01-11T23:30:29.5308963Z [ OK ] NVFuserTest.FusionTransposeBankConflict1_CUDA (2 ms)
2023-01-11T23:30:29.5310105Z [ RUN ] NVFuserTest.FusionTransposeBankConflict2_CUDA
2023-01-11T23:30:29.5321110Z [ OK ] NVFuserTest.FusionTransposeBankConflict2_CUDA (1 ms)
2023-01-11T23:30:29.5321569Z [ RUN ] NVFuserTest.FusionTransposeBankConflict3_CUDA
2023-01-11T23:30:29.5335635Z [ OK ] NVFuserTest.FusionTransposeBankConflict3_CUDA (1 ms)
2023-01-11T23:30:29.5336067Z [ RUN ] NVFuserTest.FusionTransposeBankConflict4_CUDA
2023-01-11T23:30:29.5376594Z [ OK ] NVFuserTest.FusionTransposeBankConflict4_CUDA (3 ms)
2023-01-11T23:30:29.5377224Z [ RUN ] NVFuserTest.FusionTransposeBankConflict5_CUDA
2023-01-11T23:30:29.5393514Z [ OK ] NVFuserTest.FusionTransposeBankConflict5_CUDA (1 ms)
2023-01-11T23:30:29.5393943Z [ RUN ] NVFuserTest.FusionTransposeBankConflict6_CUDA
2023-01-11T23:30:29.5409954Z [ OK ] NVFuserTest.FusionTransposeBankConflict6_CUDA (1 ms)
2023-01-11T23:30:29.5410396Z [ RUN ] NVFuserTest.FusionTransposeBankConflict7_CUDA
2023-01-11T23:30:29.5428211Z [ OK ] NVFuserTest.FusionTransposeBankConflict7_CUDA (1 ms)
2023-01-11T23:30:29.5428994Z [ RUN ] NVFuserTest.FusionTransposeBankConflict8_CUDA
2023-01-11T23:30:29.5443819Z [ OK ] NVFuserTest.FusionTransposeBankConflict8_CUDA (1 ms)
2023-01-11T23:30:29.5444253Z [ RUN ] NVFuserTest.FusionRNGValidateWithCURand_CUDA
2023-01-11T23:30:29.9306960Z [ OK ] NVFuserTest.FusionRNGValidateWithCURand_CUDA (385 ms)
2023-01-11T23:30:29.9307441Z [ RUN ] NVFuserTest.FusionRNGManualScheduleValidateWithCURand_CUDA
2023-01-11T23:30:30.2319284Z [ OK ] NVFuserTest.FusionRNGManualScheduleValidateWithCURand_CUDA (301 ms)
2023-01-11T23:30:30.2319805Z [ RUN ] NVFuserTest.FusionRNGManualScheduleValidateWithCURand2_CUDA
2023-01-11T23:30:30.4429286Z [ OK ] NVFuserTest.FusionRNGManualScheduleValidateWithCURand2_CUDA (210 ms)
2023-01-11T23:30:30.4430353Z [ RUN ] NVFuserTest.FusionBroadcastingRNG_CUDA
2023-01-11T23:30:30.8479402Z [ OK ] NVFuserTest.FusionBroadcastingRNG_CUDA (405 ms)
2023-01-11T23:30:30.8479806Z [ RUN ] NVFuserTest.FusionBroadcastingRNG2_CUDA
2023-01-11T23:30:33.5678722Z [ OK ] NVFuserTest.FusionBroadcastingRNG2_CUDA (2719 ms)
2023-01-11T23:30:33.5679130Z [ RUN ] NVFuserTest.FusionBroadcastingRNGSmem_CUDA
2023-01-11T23:30:34.7446120Z [ OK ] NVFuserTest.FusionBroadcastingRNGSmem_CUDA (1176 ms)
2023-01-11T23:30:34.7446610Z [ RUN ] NVFuserTest.FusionBroadcastingRNGSmemNonSquareTile_CUDA
2023-01-11T23:30:34.9995220Z [ OK ] NVFuserTest.FusionBroadcastingRNGSmemNonSquareTile_CUDA (254 ms)
2023-01-11T23:30:34.9996084Z [ RUN ] NVFuserTest.FusionUniform_CUDA
2023-01-11T23:30:35.3880307Z [ OK ] NVFuserTest.FusionUniform_CUDA (388 ms)
2023-01-11T23:30:35.3880758Z [ RUN ] NVFuserTest.FusionRandLikeReduction_CUDA
2023-01-11T23:30:35.5988278Z [ OK ] NVFuserTest.FusionRandLikeReduction_CUDA (210 ms)
2023-01-11T23:30:35.5988801Z [ RUN ] NVFuserTest.FusionSplitDims_CUDA
2023-01-11T23:30:35.5994450Z [ OK ] NVFuserTest.FusionSplitDims_CUDA (0 ms)
2023-01-11T23:30:35.5994974Z [ RUN ] NVFuserTest.FusionMergeDims_CUDA
2023-01-11T23:30:35.5995920Z [ OK ] NVFuserTest.FusionMergeDims_CUDA (0 ms)
2023-01-11T23:30:35.5996964Z [ RUN ] NVFuserTest.FusionReorderAsRFactor_CUDA
2023-01-11T23:30:35.5997519Z [ OK ] NVFuserTest.FusionReorderAsRFactor_CUDA (0 ms)
2023-01-11T23:30:35.5997931Z [ RUN ] NVFuserTest.FusionDisjointViewSet_CUDA
2023-01-11T23:30:35.5998305Z [ OK ] NVFuserTest.FusionDisjointViewSet_CUDA (0 ms)
2023-01-11T23:30:35.5998677Z [ RUN ] NVFuserTest.FusionMatchingViews_CUDA
2023-01-11T23:30:35.5999115Z [ OK ] NVFuserTest.FusionMatchingViews_CUDA (0 ms)
2023-01-11T23:30:35.5999612Z [ RUN ] NVFuserTest.FusionBroadcastViewMultiples_CUDA
2023-01-11T23:30:35.6013114Z [ OK ] NVFuserTest.FusionBroadcastViewMultiples_CUDA (1 ms)
2023-01-11T23:30:35.6013689Z [ RUN ] NVFuserTest.FusionTVDomainGuard_CUDA
2023-01-11T23:30:35.6014175Z [ OK ] NVFuserTest.FusionTVDomainGuard_CUDA (0 ms)
2023-01-11T23:30:35.6014868Z [----------] 759 tests from NVFuserTest (841041 ms total)
2023-01-11T23:30:35.6015091Z
2023-01-11T23:30:35.6015587Z [----------] 2 tests from NVFuserMultithreadedTest
2023-01-11T23:30:35.6016150Z [ RUN ] NVFuserMultithreadedTest.SingleFunction_CUDA
2023-01-11T23:30:35.8954138Z [ OK ] NVFuserMultithreadedTest.SingleFunction_CUDA (293 ms)
2023-01-11T23:30:35.8955046Z [ RUN ] NVFuserMultithreadedTest.MultipleFunctions_CUDA
2023-01-11T23:30:35.9088787Z [ OK ] NVFuserMultithreadedTest.MultipleFunctions_CUDA (13 ms)
2023-01-11T23:30:35.9090041Z [----------] 2 tests from NVFuserMultithreadedTest (307 ms total)
2023-01-11T23:30:35.9090493Z
2023-01-11T23:30:35.9091072Z [----------] 12 tests from AliasAnalysisTest/BatchAndInstanceNormFixture
2023-01-11T23:30:35.9092364Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNorm/0
2023-01-11T23:30:35.9093737Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNorm/0 (0 ms)
2023-01-11T23:30:35.9095352Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNorm/1
2023-01-11T23:30:35.9096476Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNorm/1 (0 ms)
2023-01-11T23:30:35.9097262Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNorm/2
2023-01-11T23:30:35.9097796Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNorm/2 (0 ms)
2023-01-11T23:30:35.9098282Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNorm/3
2023-01-11T23:30:35.9098772Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNorm/3 (0 ms)
2023-01-11T23:30:35.9099579Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNormTrainingUnknown/0
2023-01-11T23:30:35.9100171Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNormTrainingUnknown/0 (0 ms)
2023-01-11T23:30:35.9100768Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNormTrainingUnknown/1
2023-01-11T23:30:35.9101363Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNormTrainingUnknown/1 (0 ms)
2023-01-11T23:30:35.9101946Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNormTrainingUnknown/2
2023-01-11T23:30:35.9102542Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNormTrainingUnknown/2 (0 ms)
2023-01-11T23:30:35.9103131Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNormTrainingUnknown/3
2023-01-11T23:30:35.9103726Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNormTrainingUnknown/3 (0 ms)
2023-01-11T23:30:35.9104301Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchNormTrainingWithNoMeanOrVar/0
2023-01-11T23:30:35.9104877Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchNormTrainingWithNoMeanOrVar/0 (0 ms)
2023-01-11T23:30:35.9105447Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchNormTrainingWithNoMeanOrVar/1
2023-01-11T23:30:35.9106020Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchNormTrainingWithNoMeanOrVar/1 (0 ms)
2023-01-11T23:30:35.9106577Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchNormTrainingWithNoMeanOrVar/2
2023-01-11T23:30:35.9107155Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchNormTrainingWithNoMeanOrVar/2 (0 ms)
2023-01-11T23:30:35.9107762Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchNormTrainingWithNoMeanOrVar/3
2023-01-11T23:30:35.9108338Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchNormTrainingWithNoMeanOrVar/3 (0 ms)
2023-01-11T23:30:35.9108841Z [----------] 12 tests from AliasAnalysisTest/BatchAndInstanceNormFixture (0 ms total)
2023-01-11T23:30:35.9109109Z
2023-01-11T23:30:35.9109357Z [----------] 10 tests from PyTorch/LiteInterpreterDynamicTypeTestFixture
2023-01-11T23:30:35.9109816Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/0
2023-01-11T23:30:36.8078668Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/0 (897 ms)
2023-01-11T23:30:36.8079370Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/1
2023-01-11T23:30:38.0435500Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/1 (1235 ms)
2023-01-11T23:30:38.0436502Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/2
2023-01-11T23:30:39.5062956Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/2 (1462 ms)
2023-01-11T23:30:39.5063451Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/3
2023-01-11T23:30:40.9623399Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/3 (1456 ms)
2023-01-11T23:30:40.9624627Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/4
2023-01-11T23:30:42.3999487Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/4 (1437 ms)
2023-01-11T23:30:42.4000118Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/5
2023-01-11T23:30:43.8749728Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/5 (1474 ms)
2023-01-11T23:30:43.8750420Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/6
2023-01-11T23:30:45.3362963Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/6 (1461 ms)
2023-01-11T23:30:45.3363804Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/7
2023-01-11T23:30:46.7752326Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/7 (1438 ms)
2023-01-11T23:30:46.7753370Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/8
2023-01-11T23:30:48.1138870Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/8 (1338 ms)
2023-01-11T23:30:48.1139876Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/9
2023-01-11T23:30:49.4522468Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/9 (1338 ms)
2023-01-11T23:30:49.4523141Z [----------] 10 tests from PyTorch/LiteInterpreterDynamicTypeTestFixture (13542 ms total)
2023-01-11T23:30:49.4523373Z
2023-01-11T23:30:49.4523547Z [----------] Global test environment tear-down
2023-01-11T23:30:49.4698239Z [==========] 1340 tests from 122 test suites ran. (858993 ms total)
2023-01-11T23:30:49.4698861Z [ PASSED ] 1340 tests.
2023-01-11T23:30:50.0729308Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *cuda* ]]
2023-01-11T23:30:50.0729703Z + [[ default != *nogpu* ]]
2023-01-11T23:30:50.0729970Z + LTC_TS_CUDA=1
2023-01-11T23:30:50.0730439Z + /opt/conda/lib/python3.10/site-packages/torch/bin/test_lazy --gtest_output=xml:test/test-reports/cpp-unittest/test_libtorch/test_lazy.xml
2023-01-11T23:30:50.4864408Z Only one CUDA device detected. Disabling MultiCUDA tests
2023-01-11T23:30:50.4866447Z Note: Google Test filter = *-*_MultiCUDA
2023-01-11T23:30:50.4866809Z [==========] Running 611 tests from 10 test suites.
2023-01-11T23:30:50.4867248Z [----------] Global test environment set-up.
2023-01-11T23:30:50.4867634Z [----------] 11 tests from BackendDeviceTest
2023-01-11T23:30:50.4868085Z [ RUN ] BackendDeviceTest.BackendDeviceType
2023-01-11T23:30:50.4868511Z [ OK ] BackendDeviceTest.BackendDeviceType (0 ms)
2023-01-11T23:30:50.4868945Z [ RUN ] BackendDeviceTest.Basic1
2023-01-11T23:30:50.4869405Z [ OK ] BackendDeviceTest.Basic1 (0 ms)
2023-01-11T23:30:50.4869799Z [ RUN ] BackendDeviceTest.Basic2
2023-01-11T23:30:50.4870358Z [ OK ] BackendDeviceTest.Basic2 (0 ms)
2023-01-11T23:30:50.4870716Z [ RUN ] BackendDeviceTest.Basic3
2023-01-11T23:30:50.4871061Z [ OK ] BackendDeviceTest.Basic3 (0 ms)
2023-01-11T23:30:50.4871555Z [ RUN ] BackendDeviceTest.Basic4
2023-01-11T23:30:50.4871895Z [ OK ] BackendDeviceTest.Basic4 (0 ms)
2023-01-11T23:30:50.4872341Z [ RUN ] BackendDeviceTest.Compare
2023-01-11T23:30:50.4872760Z [ OK ] BackendDeviceTest.Compare (0 ms)
2023-01-11T23:30:50.4873069Z [ RUN ] BackendDeviceTest.Ostream
2023-01-11T23:30:50.4873387Z [ OK ] BackendDeviceTest.Ostream (0 ms)
2023-01-11T23:30:50.4873713Z [ RUN ] BackendDeviceTest.FromAten
2023-01-11T23:30:50.4879405Z [ OK ] BackendDeviceTest.FromAten (1 ms)
2023-01-11T23:30:50.4880092Z [ RUN ] BackendDeviceTest.ToAten
2023-01-11T23:30:50.4880531Z [ OK ] BackendDeviceTest.ToAten (0 ms)
2023-01-11T23:30:50.4880902Z [ RUN ] BackendDeviceTest.GetBackendDevice1
2023-01-11T23:30:50.4886405Z [ OK ] BackendDeviceTest.GetBackendDevice1 (0 ms)
2023-01-11T23:30:50.4886930Z [ RUN ] BackendDeviceTest.GetBackendDevice2
2023-01-11T23:30:50.4887438Z [ OK ] BackendDeviceTest.GetBackendDevice2 (0 ms)
2023-01-11T23:30:50.4887932Z [----------] 11 tests from BackendDeviceTest (1 ms total)
2023-01-11T23:30:50.4888157Z
2023-01-11T23:30:50.4888309Z [----------] 2 tests from CacheTest
2023-01-11T23:30:50.4888582Z [ RUN ] CacheTest.BasicTest
2023-01-11T23:30:50.4888875Z [ OK ] CacheTest.BasicTest (0 ms)
2023-01-11T23:30:50.4889356Z [ RUN ] CacheTest.ShapeCacheTestForDynamicShape
2023-01-11T23:30:50.4889744Z [ OK ] CacheTest.ShapeCacheTestForDynamicShape (0 ms)
2023-01-11T23:30:50.4890108Z [----------] 2 tests from CacheTest (0 ms total)
2023-01-11T23:30:50.4890267Z
2023-01-11T23:30:50.4890407Z [----------] 5 tests from IrTest
2023-01-11T23:30:50.4890673Z [ RUN ] IrTest.BasicTest
2023-01-11T23:30:50.4890943Z [ OK ] IrTest.BasicTest (0 ms)
2023-01-11T23:30:50.4891223Z [ RUN ] IrTest.MetaDataTest
2023-01-11T23:30:50.4891513Z [ OK ] IrTest.MetaDataTest (0 ms)
2023-01-11T23:30:50.4891795Z [ RUN ] IrTest.TsNodeTest
2023-01-11T23:30:50.4892078Z [ OK ] IrTest.TsNodeTest (0 ms)
2023-01-11T23:30:50.4892370Z [ RUN ] IrTest.DimensionNodeTest
2023-01-11T23:30:50.4892671Z [ OK ] IrTest.DimensionNodeTest (0 ms)
2023-01-11T23:30:50.4892993Z [ RUN ] IrTest.DimensionIsDynamicTest
2023-01-11T23:30:50.4893329Z [ OK ] IrTest.DimensionIsDynamicTest (0 ms)
2023-01-11T23:30:50.4893645Z [----------] 5 tests from IrTest (0 ms total)
2023-01-11T23:30:50.4893795Z
2023-01-11T23:30:50.4893945Z [----------] 2 tests from IrUtilTest
2023-01-11T23:30:50.4894225Z [ RUN ] IrUtilTest.BasicTest
2023-01-11T23:30:50.4894776Z [ OK ] IrUtilTest.BasicTest (0 ms)
2023-01-11T23:30:50.4895202Z [ RUN ] IrUtilTest.TestCircle
2023-01-11T23:30:50.4909580Z [ OK ] IrUtilTest.TestCircle (2 ms)
2023-01-11T23:30:50.4910046Z [----------] 2 tests from IrUtilTest (2 ms total)
2023-01-11T23:30:50.4910211Z
2023-01-11T23:30:50.4912102Z [----------] 2 tests from HashTest
2023-01-11T23:30:50.4912552Z [ RUN ] HashTest.Scalar
2023-01-11T23:30:50.4912951Z [ OK ] HashTest.Scalar (0 ms)
2023-01-11T23:30:50.4913323Z [ RUN ] HashTest.Sanity
2023-01-11T23:30:50.4913668Z [ OK ] HashTest.Sanity (0 ms)
2023-01-11T23:30:50.4913977Z [----------] 2 tests from HashTest (0 ms total)
2023-01-11T23:30:50.4914134Z
2023-01-11T23:30:50.4914308Z [----------] 3 tests from PermutationUtilTest
2023-01-11T23:30:50.4914758Z [ RUN ] PermutationUtilTest.TestInversePermutation
2023-01-11T23:30:50.4931149Z [ OK ] PermutationUtilTest.TestInversePermutation (2 ms)
2023-01-11T23:30:50.4931986Z [ RUN ] PermutationUtilTest.TestIsPermutation
2023-01-11T23:30:50.4932744Z [ OK ] PermutationUtilTest.TestIsPermutation (0 ms)
2023-01-11T23:30:50.4933449Z [ RUN ] PermutationUtilTest.TestPermute
2023-01-11T23:30:50.4949545Z [ OK ] PermutationUtilTest.TestPermute (1 ms)
2023-01-11T23:30:50.4950123Z [----------] 3 tests from PermutationUtilTest (3 ms total)
2023-01-11T23:30:50.4951908Z
2023-01-11T23:30:50.4952525Z [----------] 7 tests from ShapeTest
2023-01-11T23:30:50.4952915Z [ RUN ] ShapeTest.Basic1
2023-01-11T23:30:50.4953315Z [ OK ] ShapeTest.Basic1 (0 ms)
2023-01-11T23:30:50.4953694Z [ RUN ] ShapeTest.Basic2
2023-01-11T23:30:50.4954070Z [ OK ] ShapeTest.Basic2 (0 ms)
2023-01-11T23:30:50.4954437Z [ RUN ] ShapeTest.Basic3
2023-01-11T23:30:50.4954813Z [ OK ] ShapeTest.Basic3 (0 ms)
2023-01-11T23:30:50.4955197Z [ RUN ] ShapeTest.SetScalarType
2023-01-11T23:30:50.4955607Z [ OK ] ShapeTest.SetScalarType (0 ms)
2023-01-11T23:30:50.4956007Z [ RUN ] ShapeTest.SetSize
2023-01-11T23:30:50.4956335Z [ OK ] ShapeTest.SetSize (0 ms)
2023-01-11T23:30:50.4956600Z [ RUN ] ShapeTest.Equal
2023-01-11T23:30:50.4956879Z [ OK ] ShapeTest.Equal (0 ms)
2023-01-11T23:30:50.4957155Z [ RUN ] ShapeTest.Ostream
2023-01-11T23:30:50.4957432Z [ OK ] ShapeTest.Ostream (0 ms)
2023-01-11T23:30:50.4957850Z [----------] 7 tests from ShapeTest (0 ms total)
2023-01-11T23:30:50.4958011Z
2023-01-11T23:30:50.4958168Z [----------] 2 tests from TrieCacheTest
2023-01-11T23:30:50.4958485Z [ RUN ] TrieCacheTest.TestSinglePath
2023-01-11T23:30:50.4958819Z [ OK ] TrieCacheTest.TestSinglePath (0 ms)
2023-01-11T23:30:50.4959146Z [ RUN ] TrieCacheTest.TestTwoPaths
2023-01-11T23:30:50.4959472Z [ OK ] TrieCacheTest.TestTwoPaths (0 ms)
2023-01-11T23:30:50.4959808Z [----------] 2 tests from TrieCacheTest (0 ms total)
2023-01-11T23:30:50.4959976Z
2023-01-11T23:30:50.4960122Z [----------] 3 tests from UtilTest
2023-01-11T23:30:50.4960421Z [ RUN ] UtilTest.ExceptionCleanup
2023-01-11T23:30:50.4960736Z [ OK ] UtilTest.ExceptionCleanup (0 ms)
2023-01-11T23:30:50.4961037Z [ RUN ] UtilTest.MaybeRef
2023-01-11T23:30:50.4961367Z [ OK ] UtilTest.MaybeRef (0 ms)
2023-01-11T23:30:50.4961637Z [ RUN ] UtilTest.Iota
2023-01-11T23:30:50.4961900Z [ OK ] UtilTest.Iota (0 ms)
2023-01-11T23:30:50.4962204Z [----------] 3 tests from UtilTest (0 ms total)
2023-01-11T23:30:50.4962362Z
2023-01-11T23:30:50.4962518Z [----------] 574 tests from LazyOpsTest
2023-01-11T23:30:50.4962819Z [ RUN ] LazyOpsTest.TestScalarTensor
2023-01-11T23:30:51.3512088Z [ OK ] LazyOpsTest.TestScalarTensor (855 ms)
2023-01-11T23:30:51.3512543Z [ RUN ] LazyOpsTest.TestClone
2023-01-11T23:30:51.3523253Z [ OK ] LazyOpsTest.TestClone (1 ms)
2023-01-11T23:30:51.3523570Z [ RUN ] LazyOpsTest.TestTo
2023-01-11T23:30:51.3523938Z [ OK ] LazyOpsTest.TestTo (0 ms)
2023-01-11T23:30:51.3524406Z [ RUN ] LazyOpsTest.TestIsFloatingPoint
2023-01-11T23:30:51.3524871Z [ OK ] LazyOpsTest.TestIsFloatingPoint (0 ms)
2023-01-11T23:30:51.3525214Z [ RUN ] LazyOpsTest.TestIsSigned
2023-01-11T23:30:51.3525532Z [ OK ] LazyOpsTest.TestIsSigned (0 ms)
2023-01-11T23:30:51.3525843Z [ RUN ] LazyOpsTest.TestCastByte
2023-01-11T23:30:51.3557914Z [ OK ] LazyOpsTest.TestCastByte (3 ms)
2023-01-11T23:30:51.3558703Z [ RUN ] LazyOpsTest.TestCastChar
2023-01-11T23:30:51.3560723Z [ OK ] LazyOpsTest.TestCastChar (0 ms)
2023-01-11T23:30:51.3561387Z [ RUN ] LazyOpsTest.TestCastShort
2023-01-11T23:30:51.3563580Z [ OK ] LazyOpsTest.TestCastShort (0 ms)
2023-01-11T23:30:51.3563904Z [ RUN ] LazyOpsTest.TestCastInt
2023-01-11T23:30:51.3567787Z [ OK ] LazyOpsTest.TestCastInt (0 ms)
2023-01-11T23:30:51.3568115Z [ RUN ] LazyOpsTest.TestCastLong
2023-01-11T23:30:51.3570948Z [ OK ] LazyOpsTest.TestCastLong (0 ms)
2023-01-11T23:30:51.3571441Z [ RUN ] LazyOpsTest.TestCastFloat
2023-01-11T23:30:51.3571885Z [ OK ] LazyOpsTest.TestCastFloat (0 ms)
2023-01-11T23:30:51.3572204Z [ RUN ] LazyOpsTest.TestRetainType
2023-01-11T23:30:51.3574164Z [ OK ] LazyOpsTest.TestRetainType (0 ms)
2023-01-11T23:30:51.3576116Z [ RUN ] LazyOpsTest.TestLogicalTypeWithInterop
2023-01-11T23:30:51.3580533Z [ OK ] LazyOpsTest.TestLogicalTypeWithInterop (0 ms)
2023-01-11T23:30:51.3592304Z [ RUN ] LazyOpsTest.TestAdd
2023-01-11T23:30:51.3592750Z [ OK ] LazyOpsTest.TestAdd (0 ms)
2023-01-11T23:30:51.3593150Z [ RUN ] LazyOpsTest.TestAddHalf
2023-01-11T23:30:51.3593551Z [ OK ] LazyOpsTest.TestAddHalf (0 ms)
2023-01-11T23:30:51.3593998Z [ RUN ] LazyOpsTest.TestAddMixedPrecision
2023-01-11T23:30:51.3594851Z [ OK ] LazyOpsTest.TestAddMixedPrecision (0 ms)
2023-01-11T23:30:51.3595297Z [ RUN ] LazyOpsTest.TestAddInPlace
2023-01-11T23:30:51.3600779Z [ OK ] LazyOpsTest.TestAddInPlace (0 ms)
2023-01-11T23:30:51.3601193Z [ RUN ] LazyOpsTest.TestAddScalar
2023-01-11T23:30:51.3604677Z [ OK ] LazyOpsTest.TestAddScalar (0 ms)
2023-01-11T23:30:51.3605066Z [ RUN ] LazyOpsTest.TestAddScalarInPlace
2023-01-11T23:30:51.3610474Z [ OK ] LazyOpsTest.TestAddScalarInPlace (0 ms)
2023-01-11T23:30:51.3611047Z [ RUN ] LazyOpsTest.TestAddZeroSizeDim
2023-01-11T23:30:51.3613600Z [ OK ] LazyOpsTest.TestAddZeroSizeDim (0 ms)
2023-01-11T23:30:51.3613929Z [ RUN ] LazyOpsTest.TestSub
2023-01-11T23:30:51.3618256Z [ OK ] LazyOpsTest.TestSub (0 ms)
2023-01-11T23:30:51.3618568Z [ RUN ] LazyOpsTest.TestSubInPlace
2023-01-11T23:30:51.3624201Z [ OK ]
LazyOpsTest.TestSubInPlace (0 ms) 2023-01-11T23:30:51.3624535Z [ RUN ] LazyOpsTest.TestSubScalar 2023-01-11T23:30:51.3628154Z [ OK ] LazyOpsTest.TestSubScalar (0 ms) 2023-01-11T23:30:51.3628508Z [ RUN ] LazyOpsTest.TestSubScalarInPlace 2023-01-11T23:30:51.3634028Z [ OK ] LazyOpsTest.TestSubScalarInPlace (0 ms) 2023-01-11T23:30:51.3634361Z [ RUN ] LazyOpsTest.TestMul 2023-01-11T23:30:51.3638245Z [ OK ] LazyOpsTest.TestMul (0 ms) 2023-01-11T23:30:51.3638573Z [ RUN ] LazyOpsTest.TestMulInPlace 2023-01-11T23:30:51.3644199Z [ OK ] LazyOpsTest.TestMulInPlace (0 ms) 2023-01-11T23:30:51.3644528Z [ RUN ] LazyOpsTest.TestMulScalar 2023-01-11T23:30:51.3648192Z [ OK ] LazyOpsTest.TestMulScalar (0 ms) 2023-01-11T23:30:51.3648532Z [ RUN ] LazyOpsTest.TestMulScalarInPlace 2023-01-11T23:30:51.3653899Z [ OK ] LazyOpsTest.TestMulScalarInPlace (0 ms) 2023-01-11T23:30:51.3654244Z [ RUN ] LazyOpsTest.TestDiv 2023-01-11T23:30:51.3778463Z [ OK ] LazyOpsTest.TestDiv (12 ms) 2023-01-11T23:30:51.3779738Z [ RUN ] LazyOpsTest.TestDivWithRoundingMode 2023-01-11T23:30:51.4181553Z [ OK ] LazyOpsTest.TestDivWithRoundingMode (40 ms) 2023-01-11T23:30:51.4181930Z [ RUN ] LazyOpsTest.TestDivInPlace 2023-01-11T23:30:51.4184297Z [ OK ] LazyOpsTest.TestDivInPlace (0 ms) 2023-01-11T23:30:51.4184892Z [ RUN ] LazyOpsTest.TestDivInPlaceWithRoundingMode 2023-01-11T23:30:51.4197815Z [ OK ] LazyOpsTest.TestDivInPlaceWithRoundingMode (1 ms) 2023-01-11T23:30:51.4198371Z [ RUN ] LazyOpsTest.TestDivScalar 2023-01-11T23:30:51.4244116Z [ OK ] LazyOpsTest.TestDivScalar (4 ms) 2023-01-11T23:30:51.4244636Z [ RUN ] LazyOpsTest.TestDivScalarInPlace 2023-01-11T23:30:51.4252350Z [ OK ] LazyOpsTest.TestDivScalarInPlace (0 ms) 2023-01-11T23:30:51.4252829Z [ RUN ] LazyOpsTest.TestDivOut 2023-01-11T23:30:51.4258734Z [ OK ] LazyOpsTest.TestDivOut (0 ms) 2023-01-11T23:30:51.4259178Z [ RUN ] LazyOpsTest.TestRsubScalar 2023-01-11T23:30:51.4263109Z [ OK ] LazyOpsTest.TestRsubScalar (0 ms) 2023-01-11T23:30:51.4263522Z [ RUN ] LazyOpsTest.TestNe 2023-01-11T23:30:51.4267144Z [ OK ] LazyOpsTest.TestNe (0 ms) 2023-01-11T23:30:51.4267561Z [ RUN ] LazyOpsTest.TestNeInplace 2023-01-11T23:30:51.4274828Z [ OK ] LazyOpsTest.TestNeInplace (0 ms) 2023-01-11T23:30:51.4275491Z [ RUN ] LazyOpsTest.TestEq 2023-01-11T23:30:51.4277102Z [ OK ] LazyOpsTest.TestEq (0 ms) 2023-01-11T23:30:51.4277853Z [ RUN ] LazyOpsTest.TestEqInplace 2023-01-11T23:30:51.4283264Z [ OK ] LazyOpsTest.TestEqInplace (0 ms) 2023-01-11T23:30:51.4283684Z [ RUN ] LazyOpsTest.TestGe 2023-01-11T23:30:51.4286572Z [ OK ] LazyOpsTest.TestGe (0 ms) 2023-01-11T23:30:51.4286917Z [ RUN ] LazyOpsTest.TestGeInplace 2023-01-11T23:30:51.4293045Z [ OK ] LazyOpsTest.TestGeInplace (0 ms) 2023-01-11T23:30:51.4293363Z [ RUN ] LazyOpsTest.TestLe 2023-01-11T23:30:51.4296374Z [ OK ] LazyOpsTest.TestLe (0 ms) 2023-01-11T23:30:51.4296702Z [ RUN ] LazyOpsTest.TestLeInplace 2023-01-11T23:30:51.4302009Z [ OK ] LazyOpsTest.TestLeInplace (0 ms) 2023-01-11T23:30:51.4302405Z [ RUN ] LazyOpsTest.TestGt 2023-01-11T23:30:51.4306275Z [ OK ] LazyOpsTest.TestGt (0 ms) 2023-01-11T23:30:51.4306651Z [ RUN ] LazyOpsTest.TestGtInplace 2023-01-11T23:30:51.4312432Z [ OK ] LazyOpsTest.TestGtInplace (0 ms) 2023-01-11T23:30:51.4312850Z [ RUN ] LazyOpsTest.TestLt 2023-01-11T23:30:51.4315851Z [ OK ] LazyOpsTest.TestLt (0 ms) 2023-01-11T23:30:51.4316208Z [ RUN ] LazyOpsTest.TestLtInplace 2023-01-11T23:30:51.4321928Z [ OK ] LazyOpsTest.TestLtInplace (0 ms) 2023-01-11T23:30:51.4322282Z [ RUN ] LazyOpsTest.TestNeScalar 2023-01-11T23:30:51.4325249Z [ OK ] 
LazyOpsTest.TestNeScalar (0 ms) 2023-01-11T23:30:51.4325575Z [ RUN ] LazyOpsTest.TestEqScalar 2023-01-11T23:30:51.4328392Z [ OK ] LazyOpsTest.TestEqScalar (0 ms) 2023-01-11T23:30:51.4328717Z [ RUN ] LazyOpsTest.TestGeScalar 2023-01-11T23:30:51.4331683Z [ OK ] LazyOpsTest.TestGeScalar (0 ms) 2023-01-11T23:30:51.4332053Z [ RUN ] LazyOpsTest.TestGeScalarInplace 2023-01-11T23:30:51.4337191Z [ OK ] LazyOpsTest.TestGeScalarInplace (0 ms) 2023-01-11T23:30:51.4337520Z [ RUN ] LazyOpsTest.TestLeScalar 2023-01-11T23:30:51.4340337Z [ OK ] LazyOpsTest.TestLeScalar (0 ms) 2023-01-11T23:30:51.4340684Z [ RUN ] LazyOpsTest.TestLeScalarInplace 2023-01-11T23:30:51.4345573Z [ OK ] LazyOpsTest.TestLeScalarInplace (0 ms) 2023-01-11T23:30:51.4345939Z [ RUN ] LazyOpsTest.TestGtScalar 2023-01-11T23:30:51.4348819Z [ OK ] LazyOpsTest.TestGtScalar (0 ms) 2023-01-11T23:30:51.4349166Z [ RUN ] LazyOpsTest.TestGtScalarInplace 2023-01-11T23:30:51.4354382Z [ OK ] LazyOpsTest.TestGtScalarInplace (0 ms) 2023-01-11T23:30:51.4354740Z [ RUN ] LazyOpsTest.TestLtScalar 2023-01-11T23:30:51.4357436Z [ OK ] LazyOpsTest.TestLtScalar (0 ms) 2023-01-11T23:30:51.4357783Z [ RUN ] LazyOpsTest.TestLtScalarInplace 2023-01-11T23:30:51.4362307Z [ OK ] LazyOpsTest.TestLtScalarInplace (0 ms) 2023-01-11T23:30:51.4362664Z [ RUN ] LazyOpsTest.TestIntegerAdd 2023-01-11T23:30:51.4379042Z [ OK ] LazyOpsTest.TestIntegerAdd (1 ms) 2023-01-11T23:30:51.4379393Z [ RUN ] LazyOpsTest.TestSVD 2023-01-11T23:30:52.2476706Z [ OK ] LazyOpsTest.TestSVD (809 ms) 2023-01-11T23:30:52.2477507Z [ RUN ] LazyOpsTest.TestQR 2023-01-11T23:30:52.2478385Z [W BatchLinearAlgebra.cpp:2459] Warning: torch.qr is deprecated in favor of torch.linalg.qr and will be removed in a future PyTorch release. 2023-01-11T23:30:52.2479258Z The boolean parameter 'some' has been replaced with a string parameter 'mode'. 2023-01-11T23:30:52.2479957Z Q, R = torch.qr(A, some) 2023-01-11T23:30:52.2480344Z should be replaced with 2023-01-11T23:30:52.2480976Z Q, R = torch.linalg.qr(A, 'reduced' if some else 'complete') (function operator()) 2023-01-11T23:30:52.2509088Z [ OK ] LazyOpsTest.TestQR (3 ms) 2023-01-11T23:30:52.2509810Z [ RUN ] LazyOpsTest.TestSymEig 2023-01-11T23:30:52.2512306Z [W BatchLinearAlgebra.cpp:2910] Warning: torch.symeig is deprecated in favor of torch.linalg.eigh and will be removed in a future PyTorch release. 2023-01-11T23:30:52.2512877Z The default behavior has changed from using the upper triangular portion of the matrix by default to using the lower triangular portion. 2023-01-11T23:30:52.2513417Z L, _ = torch.symeig(A, upper=upper) 2023-01-11T23:30:52.2513730Z should be replaced with 2023-01-11T23:30:52.2514175Z L = torch.linalg.eigvalsh(A, UPLO='U' if upper else 'L') 2023-01-11T23:30:52.2514485Z and 2023-01-11T23:30:52.2514703Z L, V = torch.symeig(A, eigenvectors=True) 2023-01-11T23:30:52.2514939Z should be replaced with 2023-01-11T23:30:52.2515274Z L, V = torch.linalg.eigh(A, UPLO='U' if upper else 'L') (function operator()) 2023-01-11T23:30:52.4682361Z [ OK ] LazyOpsTest.TestSymEig (217 ms) 2023-01-11T23:30:52.4682875Z [ RUN ] LazyOpsTest.TestCholesky 2023-01-11T23:30:52.4683467Z [W BatchLinearAlgebra.cpp:1730] Warning: torch.cholesky is deprecated in favor of torch.linalg.cholesky and will be removed in a future PyTorch release. 
2023-01-11T23:30:52.4683958Z L = torch.cholesky(A) 2023-01-11T23:30:52.4684279Z should be replaced with 2023-01-11T23:30:52.4684603Z L = torch.linalg.cholesky(A) 2023-01-11T23:30:52.4684877Z and 2023-01-11T23:30:52.4685162Z U = torch.cholesky(A, upper=True) 2023-01-11T23:30:52.4685484Z should be replaced with 2023-01-11T23:30:52.4685798Z U = torch.linalg.cholesky(A).mH(). 2023-01-11T23:30:52.4686302Z This transform will produce equivalent results for all valid (symmetric positive definite) inputs. (function operator()) 2023-01-11T23:30:52.4698763Z [ OK ] LazyOpsTest.TestCholesky (1 ms) 2023-01-11T23:30:52.4699186Z [ RUN ] LazyOpsTest.TestLogDet 2023-01-11T23:30:52.4749285Z [ OK ] LazyOpsTest.TestLogDet (4 ms) 2023-01-11T23:30:52.4750363Z [ RUN ] LazyOpsTest.TestTriangularSolve 2023-01-11T23:30:52.4751265Z [W BatchLinearAlgebra.cpp:2225] Warning: torch.triangular_solve is deprecated in favor of torch.linalg.solve_triangular and will be removed in a future PyTorch release. 2023-01-11T23:30:52.4752068Z torch.linalg.solve_triangular has its arguments reversed and does not return a copy of one of the inputs. 2023-01-11T23:30:52.4752422Z X = torch.triangular_solve(B, A).solution 2023-01-11T23:30:52.4752653Z should be replaced with 2023-01-11T23:30:52.4752917Z X = torch.linalg.solve_triangular(A, B). (function operator()) 2023-01-11T23:30:52.5260888Z [ OK ] LazyOpsTest.TestTriangularSolve (51 ms) 2023-01-11T23:30:52.5261667Z [ RUN ] LazyOpsTest.TestKthValue 2023-01-11T23:30:52.5343745Z [ OK ] LazyOpsTest.TestKthValue (8 ms) 2023-01-11T23:30:52.5344403Z [ RUN ] LazyOpsTest.TestTopK 2023-01-11T23:30:52.5601981Z [ OK ] LazyOpsTest.TestTopK (25 ms) 2023-01-11T23:30:52.5602342Z [ RUN ] LazyOpsTest.TestSort 2023-01-11T23:30:52.5720786Z [ OK ] LazyOpsTest.TestSort (11 ms) 2023-01-11T23:30:52.5721498Z [ RUN ] LazyOpsTest.TestSortDescWithMinValue 2023-01-11T23:30:52.5726498Z [ OK ] LazyOpsTest.TestSortDescWithMinValue (0 ms) 2023-01-11T23:30:52.5726860Z [ RUN ] LazyOpsTest.TestArgSort 2023-01-11T23:30:52.5763404Z [ OK ] LazyOpsTest.TestArgSort (3 ms) 2023-01-11T23:30:52.5763727Z [ RUN ] LazyOpsTest.TestMin 2023-01-11T23:30:52.5766694Z [ OK ] LazyOpsTest.TestMin (0 ms) 2023-01-11T23:30:52.5767501Z [ RUN ] LazyOpsTest.TestMax 2023-01-11T23:30:52.5770446Z [ OK ] LazyOpsTest.TestMax (0 ms) 2023-01-11T23:30:52.5771025Z [ RUN ] LazyOpsTest.TestUnaryMin 2023-01-11T23:30:52.5774414Z [ OK ] LazyOpsTest.TestUnaryMin (0 ms) 2023-01-11T23:30:52.5775116Z [ RUN ] LazyOpsTest.TestUnaryMax 2023-01-11T23:30:52.5778234Z [ OK ] LazyOpsTest.TestUnaryMax (0 ms) 2023-01-11T23:30:52.5778648Z [ RUN ] LazyOpsTest.TestAll 2023-01-11T23:30:52.5796057Z [ OK ] LazyOpsTest.TestAll (1 ms) 2023-01-11T23:30:52.5796508Z [ RUN ] LazyOpsTest.TestAllDim 2023-01-11T23:30:52.5803645Z [ OK ] LazyOpsTest.TestAllDim (0 ms) 2023-01-11T23:30:52.5804115Z [ RUN ] LazyOpsTest.TestAllDimKeep 2023-01-11T23:30:52.5810686Z [ OK ] LazyOpsTest.TestAllDimKeep (0 ms) 2023-01-11T23:30:52.5811144Z [ RUN ] LazyOpsTest.TestAmax 2023-01-11T23:30:52.5914057Z [ OK ] LazyOpsTest.TestAmax (10 ms) 2023-01-11T23:30:52.5914909Z [ RUN ] LazyOpsTest.TestAmin 2023-01-11T23:30:52.6010967Z [ OK ] LazyOpsTest.TestAmin (9 ms) 2023-01-11T23:30:52.6011409Z [ RUN ] LazyOpsTest.TestAny 2023-01-11T23:30:52.6028417Z [ OK ] LazyOpsTest.TestAny (1 ms) 2023-01-11T23:30:52.6028852Z [ RUN ] LazyOpsTest.TestAnyDim 2023-01-11T23:30:52.6037072Z [ OK ] LazyOpsTest.TestAnyDim (0 ms) 2023-01-11T23:30:52.6037528Z [ RUN ] LazyOpsTest.TestAnyDimKeep 2023-01-11T23:30:52.6044571Z [ OK ]
LazyOpsTest.TestAnyDimKeep (0 ms) 2023-01-11T23:30:52.6045001Z [ RUN ] LazyOpsTest.TestMean 2023-01-11T23:30:52.6048373Z [ OK ] LazyOpsTest.TestMean (0 ms) 2023-01-11T23:30:52.6048809Z [ RUN ] LazyOpsTest.TestMeanCast 2023-01-11T23:30:52.6052236Z [ OK ] LazyOpsTest.TestMeanCast (0 ms) 2023-01-11T23:30:52.6052575Z [ RUN ] LazyOpsTest.TestMeanInDim 2023-01-11T23:30:52.6079257Z [ OK ] LazyOpsTest.TestMeanInDim (2 ms) 2023-01-11T23:30:52.6079959Z [ RUN ] LazyOpsTest.TestMeanInDims 2023-01-11T23:30:52.6084432Z [ OK ] LazyOpsTest.TestMeanInDims (0 ms) 2023-01-11T23:30:52.6084797Z [ RUN ] LazyOpsTest.TestMeanInDimsKeepCast 2023-01-11T23:30:52.6092939Z [ OK ] LazyOpsTest.TestMeanInDimsKeepCast (0 ms) 2023-01-11T23:30:52.6093292Z [ RUN ] LazyOpsTest.TestMeanInDimOut 2023-01-11T23:30:52.6117555Z [ OK ] LazyOpsTest.TestMeanInDimOut (2 ms) 2023-01-11T23:30:52.6117883Z [ RUN ] LazyOpsTest.TestStd 2023-01-11T23:30:52.6125437Z [ OK ] LazyOpsTest.TestStd (0 ms) 2023-01-11T23:30:52.6125749Z [ RUN ] LazyOpsTest.TestStdInDim 2023-01-11T23:30:52.6224568Z [ OK ] LazyOpsTest.TestStdInDim (9 ms) 2023-01-11T23:30:52.6224960Z [ RUN ] LazyOpsTest.TestStdWithCorrection 2023-01-11T23:30:52.6275134Z [ OK ] LazyOpsTest.TestStdWithCorrection (5 ms) 2023-01-11T23:30:52.6275528Z [ RUN ] LazyOpsTest.TestStdMeanWithCorrection 2023-01-11T23:30:52.6308805Z [ OK ] LazyOpsTest.TestStdMeanWithCorrection (3 ms) 2023-01-11T23:30:52.6309142Z [ RUN ] LazyOpsTest.TestSum 2023-01-11T23:30:52.6313676Z [ OK ] LazyOpsTest.TestSum (0 ms) 2023-01-11T23:30:52.6314054Z [ RUN ] LazyOpsTest.TestSumCast 2023-01-11T23:30:52.6316168Z [ OK ] LazyOpsTest.TestSumCast (0 ms) 2023-01-11T23:30:52.6316517Z [ RUN ] LazyOpsTest.TestSumU8 2023-01-11T23:30:52.6319686Z [ OK ] LazyOpsTest.TestSumU8 (0 ms) 2023-01-11T23:30:52.6319997Z [ RUN ] LazyOpsTest.TestSumInDim 2023-01-11T23:30:52.6342787Z [ OK ] LazyOpsTest.TestSumInDim (2 ms) 2023-01-11T23:30:52.6343114Z [ RUN ] LazyOpsTest.TestSumInDims 2023-01-11T23:30:52.6350655Z [ OK ] LazyOpsTest.TestSumInDims (0 ms) 2023-01-11T23:30:52.6351723Z [ RUN ] LazyOpsTest.TestSumInDimsKeep 2023-01-11T23:30:52.6358906Z [ OK ] LazyOpsTest.TestSumInDimsKeep (0 ms) 2023-01-11T23:30:52.6359302Z [ RUN ] LazyOpsTest.TestSumInDimsKeepCast 2023-01-11T23:30:52.6367199Z [ OK ] LazyOpsTest.TestSumInDimsKeepCast (0 ms) 2023-01-11T23:30:52.6367535Z [ RUN ] LazyOpsTest.TestVar 2023-01-11T23:30:52.6370999Z [ OK ] LazyOpsTest.TestVar (0 ms) 2023-01-11T23:30:52.6371316Z [ RUN ] LazyOpsTest.TestVarWithDim 2023-01-11T23:30:52.6385550Z [ OK ] LazyOpsTest.TestVarWithDim (1 ms) 2023-01-11T23:30:52.6385905Z [ RUN ] LazyOpsTest.TestVarWithCorrection 2023-01-11T23:30:52.6406467Z [ OK ] LazyOpsTest.TestVarWithCorrection (2 ms) 2023-01-11T23:30:52.6406845Z [ RUN ] LazyOpsTest.TestVarMeanWithCorrection 2023-01-11T23:30:52.6439458Z [ OK ] LazyOpsTest.TestVarMeanWithCorrection (3 ms) 2023-01-11T23:30:52.6439810Z [ RUN ] LazyOpsTest.TestMaxInDim 2023-01-11T23:30:52.6521258Z [ OK ] LazyOpsTest.TestMaxInDim (8 ms) 2023-01-11T23:30:52.6521571Z [ RUN ] LazyOpsTest.TestMinInDim 2023-01-11T23:30:52.6550377Z [ OK ] LazyOpsTest.TestMinInDim (2 ms) 2023-01-11T23:30:52.6552220Z [ RUN ] LazyOpsTest.TestNorm 2023-01-11T23:30:52.6554680Z [ OK ] LazyOpsTest.TestNorm (0 ms) 2023-01-11T23:30:52.6555018Z [ RUN ] LazyOpsTest.TestNormInDim 2023-01-11T23:30:52.6563308Z [ OK ] LazyOpsTest.TestNormInDim (0 ms) 2023-01-11T23:30:52.6563643Z [ RUN ] LazyOpsTest.TestNormInDims 2023-01-11T23:30:52.6571281Z [ OK ] LazyOpsTest.TestNormInDims (0 ms) 2023-01-11T23:30:52.6571630Z [ 
RUN ] LazyOpsTest.TestNormInDimsKeep 2023-01-11T23:30:52.6579505Z [ OK ] LazyOpsTest.TestNormInDimsKeep (0 ms) 2023-01-11T23:30:52.6579865Z [ RUN ] LazyOpsTest.TestNormalTwoTensor 2023-01-11T23:30:52.6587311Z [ OK ] LazyOpsTest.TestNormalTwoTensor (0 ms) 2023-01-11T23:30:52.6587661Z [ RUN ] LazyOpsTest.TestNormalDoubleMean 2023-01-11T23:30:52.6596136Z [ OK ] LazyOpsTest.TestNormalDoubleMean (0 ms) 2023-01-11T23:30:52.6596494Z [ RUN ] LazyOpsTest.TestNormalDoubleStd 2023-01-11T23:30:52.6598765Z [ OK ] LazyOpsTest.TestNormalDoubleStd (0 ms) 2023-01-11T23:30:52.6599122Z [ RUN ] LazyOpsTest.TestNormalInPlace 2023-01-11T23:30:52.6602589Z [ OK ] LazyOpsTest.TestNormalInPlace (0 ms) 2023-01-11T23:30:52.6602994Z [ RUN ] LazyOpsTest.TestUniformInPlace 2023-01-11T23:30:52.6605693Z [ OK ] LazyOpsTest.TestUniformInPlace (0 ms) 2023-01-11T23:30:52.6606048Z [ RUN ] LazyOpsTest.TestRandomInPlace 2023-01-11T23:30:52.6677267Z [ OK ] LazyOpsTest.TestRandomInPlace (7 ms) 2023-01-11T23:30:52.6677770Z [ RUN ] LazyOpsTest.TestRandomInPlaceDefaultFrom 2023-01-11T23:30:52.6746736Z [ OK ] LazyOpsTest.TestRandomInPlaceDefaultFrom (6 ms) 2023-01-11T23:30:52.6747134Z [ RUN ] LazyOpsTest.TestRandomInPlaceDefault 2023-01-11T23:30:52.6766050Z [ OK ] LazyOpsTest.TestRandomInPlaceDefault (1 ms) 2023-01-11T23:30:52.6766411Z [ RUN ] LazyOpsTest.TestNormGeneral 2023-01-11T23:30:52.6770503Z [ OK ] LazyOpsTest.TestNormGeneral (0 ms) 2023-01-11T23:30:52.6770842Z [ RUN ] LazyOpsTest.TestNormNuclear 2023-01-11T23:30:52.6775319Z [ OK ] LazyOpsTest.TestNormNuclear (0 ms) 2023-01-11T23:30:52.6775664Z [ RUN ] LazyOpsTest.TestFrobeniusNormInDim 2023-01-11T23:30:52.6783696Z [ OK ] LazyOpsTest.TestFrobeniusNormInDim (0 ms) 2023-01-11T23:30:52.6784072Z [ RUN ] LazyOpsTest.TestFrobeniusNormInDims 2023-01-11T23:30:52.6792605Z [ OK ] LazyOpsTest.TestFrobeniusNormInDims (0 ms) 2023-01-11T23:30:52.6793035Z [ RUN ] LazyOpsTest.TestGroupNorm 2023-01-11T23:30:52.6828615Z [ OK ] LazyOpsTest.TestGroupNorm (3 ms) 2023-01-11T23:30:52.6828977Z [ RUN ] LazyOpsTest.TestGroupNormBackward 2023-01-11T23:30:52.7435220Z [ OK ] LazyOpsTest.TestGroupNormBackward (60 ms) 2023-01-11T23:30:52.7435643Z [ RUN ] LazyOpsTest.TestInstanceNorm 2023-01-11T23:30:52.7464596Z [ OK ] LazyOpsTest.TestInstanceNorm (2 ms) 2023-01-11T23:30:52.7465052Z [ RUN ] LazyOpsTest.TestLayerNorm 2023-01-11T23:30:52.7494897Z [ OK ] LazyOpsTest.TestLayerNorm (3 ms) 2023-01-11T23:30:52.7495367Z [ RUN ] LazyOpsTest.TestLayerNormBackward 2023-01-11T23:30:52.7833135Z [ OK ] LazyOpsTest.TestLayerNormBackward (33 ms) 2023-01-11T23:30:52.7833605Z [ RUN ] LazyOpsTest.TestNuclearNorm 2023-01-11T23:30:52.7842841Z [ OK ] LazyOpsTest.TestNuclearNorm (1 ms) 2023-01-11T23:30:52.7843352Z [ RUN ] LazyOpsTest.TestPairwiseDistance 2023-01-11T23:30:53.1055248Z [ OK ] LazyOpsTest.TestPairwiseDistance (320 ms) 2023-01-11T23:30:53.1056228Z [ RUN ] LazyOpsTest.TestCosineSimilarity 2023-01-11T23:30:53.1113139Z [ OK ] LazyOpsTest.TestCosineSimilarity (5 ms) 2023-01-11T23:30:53.1113576Z [ RUN ] LazyOpsTest.TestCosineEmbeddingLoss 2023-01-11T23:30:54.4101390Z [ OK ] LazyOpsTest.TestCosineEmbeddingLoss (1298 ms) 2023-01-11T23:30:54.4102093Z [ RUN ] LazyOpsTest.TestHingeEmbeddingLoss 2023-01-11T23:30:54.4156987Z [ OK ] LazyOpsTest.TestHingeEmbeddingLoss (5 ms) 2023-01-11T23:30:54.4158165Z [ RUN ] LazyOpsTest.TestTripletMarginLoss 2023-01-11T23:30:55.1078729Z [ OK ] LazyOpsTest.TestTripletMarginLoss (691 ms) 2023-01-11T23:30:55.1079563Z [ RUN ] LazyOpsTest.TestBinaryCrossEntropy 2023-01-11T23:30:55.1102335Z [ OK ] 
LazyOpsTest.TestBinaryCrossEntropy (2 ms) 2023-01-11T23:30:55.1103346Z [ RUN ] LazyOpsTest.TestMarginRankingLoss 2023-01-11T23:30:55.3220131Z [ OK ] LazyOpsTest.TestMarginRankingLoss (211 ms) 2023-01-11T23:30:55.3220804Z [ RUN ] LazyOpsTest.TestBCEWithLogits 2023-01-11T23:30:55.3241299Z [ OK ] LazyOpsTest.TestBCEWithLogits (2 ms) 2023-01-11T23:30:55.3241649Z [ RUN ] LazyOpsTest.TestKlDiv 2023-01-11T23:30:55.3270228Z [ OK ] LazyOpsTest.TestKlDiv (2 ms) 2023-01-11T23:30:55.3270888Z [ RUN ] LazyOpsTest.TestProd 2023-01-11T23:30:55.3271655Z [ OK ] LazyOpsTest.TestProd (0 ms) 2023-01-11T23:30:55.3272321Z [ RUN ] LazyOpsTest.TestProdCast 2023-01-11T23:30:55.3281615Z [ OK ] LazyOpsTest.TestProdCast (1 ms) 2023-01-11T23:30:55.3282093Z [ RUN ] LazyOpsTest.TestProdInDim 2023-01-11T23:30:55.3301878Z [ OK ] LazyOpsTest.TestProdInDim (2 ms) 2023-01-11T23:30:55.3302423Z [ RUN ] LazyOpsTest.TestProdInDimKeepCast 2023-01-11T23:30:55.3322584Z [ OK ] LazyOpsTest.TestProdInDimKeepCast (2 ms) 2023-01-11T23:30:55.3323119Z [ RUN ] LazyOpsTest.TestProdInDimKeep 2023-01-11T23:30:55.3332401Z [ OK ] LazyOpsTest.TestProdInDimKeep (0 ms) 2023-01-11T23:30:55.3332882Z [ RUN ] LazyOpsTest.TestCumSum 2023-01-11T23:30:55.3355734Z [ OK ] LazyOpsTest.TestCumSum (2 ms) 2023-01-11T23:30:55.3356187Z [ RUN ] LazyOpsTest.TestCumSumCast 2023-01-11T23:30:55.3377805Z [ OK ] LazyOpsTest.TestCumSumCast (2 ms) 2023-01-11T23:30:55.3378703Z [ RUN ] LazyOpsTest.TestCumSumLong 2023-01-11T23:30:55.3395712Z [ OK ] LazyOpsTest.TestCumSumLong (1 ms) 2023-01-11T23:30:55.3396192Z [ RUN ] LazyOpsTest.TestCumSumCastLong 2023-01-11T23:30:55.3414276Z [ OK ] LazyOpsTest.TestCumSumCastLong (1 ms) 2023-01-11T23:30:55.3415884Z [ RUN ] LazyOpsTest.TestCumProd 2023-01-11T23:30:55.3424766Z [ OK ] LazyOpsTest.TestCumProd (1 ms) 2023-01-11T23:30:55.3425231Z [ RUN ] LazyOpsTest.TestCumProdCast 2023-01-11T23:30:55.3434066Z [ OK ] LazyOpsTest.TestCumProdCast (1 ms) 2023-01-11T23:30:55.3434537Z [ RUN ] LazyOpsTest.TestCumProdLong 2023-01-11T23:30:55.3446863Z [ OK ] LazyOpsTest.TestCumProdLong (1 ms) 2023-01-11T23:30:55.3447343Z [ RUN ] LazyOpsTest.TestCumProdCastLong 2023-01-11T23:30:55.3459721Z [ OK ] LazyOpsTest.TestCumProdCastLong (1 ms) 2023-01-11T23:30:55.3460211Z [ RUN ] LazyOpsTest.TestArgMin 2023-01-11T23:30:55.3460881Z [ OK ] LazyOpsTest.TestArgMin (0 ms) 2023-01-11T23:30:55.3461301Z [ RUN ] LazyOpsTest.TestArgMinDim 2023-01-11T23:30:55.3464245Z [ OK ] LazyOpsTest.TestArgMinDim (0 ms) 2023-01-11T23:30:55.3464730Z [ RUN ] LazyOpsTest.TestArgMinDimKeep 2023-01-11T23:30:55.3466833Z [ OK ] LazyOpsTest.TestArgMinDimKeep (0 ms) 2023-01-11T23:30:55.3467278Z [ RUN ] LazyOpsTest.TestArgMinSameValue 2023-01-11T23:30:55.3468712Z [ OK ] LazyOpsTest.TestArgMinSameValue (0 ms) 2023-01-11T23:30:55.3469169Z [ RUN ] LazyOpsTest.TestArgMinWrapper 2023-01-11T23:30:55.3472428Z [ OK ] LazyOpsTest.TestArgMinWrapper (0 ms) 2023-01-11T23:30:55.3472938Z [ RUN ] LazyOpsTest.TestArgMax 2023-01-11T23:30:55.3474122Z [ OK ] LazyOpsTest.TestArgMax (0 ms) 2023-01-11T23:30:55.3474529Z [ RUN ] LazyOpsTest.TestArgMaxDim 2023-01-11T23:30:55.3476979Z [ OK ] LazyOpsTest.TestArgMaxDim (0 ms) 2023-01-11T23:30:55.3477400Z [ RUN ] LazyOpsTest.TestArgMaxDimKeep 2023-01-11T23:30:55.3479268Z [ OK ] LazyOpsTest.TestArgMaxDimKeep (0 ms) 2023-01-11T23:30:55.3479743Z [ RUN ] LazyOpsTest.TestArgMaxSameValue 2023-01-11T23:30:55.3480650Z [ OK ] LazyOpsTest.TestArgMaxSameValue (0 ms) 2023-01-11T23:30:55.3481058Z [ RUN ] LazyOpsTest.TestArgMaxWrapper 2023-01-11T23:30:55.3483775Z [ OK ] 
LazyOpsTest.TestArgMaxWrapper (0 ms) 2023-01-11T23:30:55.3484191Z [ RUN ] LazyOpsTest.TestAsin 2023-01-11T23:30:55.3485780Z [ OK ] LazyOpsTest.TestAsin (0 ms) 2023-01-11T23:30:55.3486149Z [ RUN ] LazyOpsTest.TestAsinh 2023-01-11T23:30:55.3487944Z [ OK ] LazyOpsTest.TestAsinh (0 ms) 2023-01-11T23:30:55.3488291Z [ RUN ] LazyOpsTest.TestAsinhInPlace 2023-01-11T23:30:55.3490851Z [ OK ] LazyOpsTest.TestAsinhInPlace (0 ms) 2023-01-11T23:30:55.3491216Z [ RUN ] LazyOpsTest.TestSin 2023-01-11T23:30:55.3493063Z [ OK ] LazyOpsTest.TestSin (0 ms) 2023-01-11T23:30:55.3493368Z [ RUN ] LazyOpsTest.TestSinh 2023-01-11T23:30:55.3495492Z [ OK ] LazyOpsTest.TestSinh (0 ms) 2023-01-11T23:30:55.3495918Z [ RUN ] LazyOpsTest.TestAcos 2023-01-11T23:30:55.3498023Z [ OK ] LazyOpsTest.TestAcos (0 ms) 2023-01-11T23:30:55.3498425Z [ RUN ] LazyOpsTest.TestAcosh 2023-01-11T23:30:55.3499892Z [ OK ] LazyOpsTest.TestAcosh (0 ms) 2023-01-11T23:30:55.3500219Z [ RUN ] LazyOpsTest.TestAcoshInPlace 2023-01-11T23:30:55.3502673Z [ OK ] LazyOpsTest.TestAcoshInPlace (0 ms) 2023-01-11T23:30:55.3502981Z [ RUN ] LazyOpsTest.TestCos 2023-01-11T23:30:55.3506377Z [ OK ] LazyOpsTest.TestCos (0 ms) 2023-01-11T23:30:55.3506743Z [ RUN ] LazyOpsTest.TestCosh 2023-01-11T23:30:55.3508604Z [ OK ] LazyOpsTest.TestCosh (0 ms) 2023-01-11T23:30:55.3508938Z [ RUN ] LazyOpsTest.TestAtan 2023-01-11T23:30:55.3512104Z [ OK ] LazyOpsTest.TestAtan (0 ms) 2023-01-11T23:30:55.3512695Z [ RUN ] LazyOpsTest.TestAtanh 2023-01-11T23:30:55.3514098Z [ OK ] LazyOpsTest.TestAtanh (0 ms) 2023-01-11T23:30:55.3514546Z [ RUN ] LazyOpsTest.TestAtanhInPlace 2023-01-11T23:30:55.3516220Z [ OK ] LazyOpsTest.TestAtanhInPlace (0 ms) 2023-01-11T23:30:55.3516658Z [ RUN ] LazyOpsTest.TestAtan2 2023-01-11T23:30:55.3518858Z [ OK ] LazyOpsTest.TestAtan2 (0 ms) 2023-01-11T23:30:55.3519270Z [ RUN ] LazyOpsTest.TestTan 2023-01-11T23:30:55.3521217Z [ OK ] LazyOpsTest.TestTan (0 ms) 2023-01-11T23:30:55.3521622Z [ RUN ] LazyOpsTest.TestTanh 2023-01-11T23:30:55.3524704Z [ OK ] LazyOpsTest.TestTanh (0 ms) 2023-01-11T23:30:55.3525134Z [ RUN ] LazyOpsTest.TestClampMinMax 2023-01-11T23:30:55.3528719Z [ OK ] LazyOpsTest.TestClampMinMax (0 ms) 2023-01-11T23:30:55.3529036Z [ RUN ] LazyOpsTest.TestClampMin 2023-01-11T23:30:55.3532916Z [ OK ] LazyOpsTest.TestClampMin (0 ms) 2023-01-11T23:30:55.3533252Z [ RUN ] LazyOpsTest.TestClampMax 2023-01-11T23:30:55.3536526Z [ OK ] LazyOpsTest.TestClampMax (0 ms) 2023-01-11T23:30:55.3536861Z [ RUN ] LazyOpsTest.TestClampMinExplicit 2023-01-11T23:30:55.3540278Z [ OK ] LazyOpsTest.TestClampMinExplicit (0 ms) 2023-01-11T23:30:55.3540747Z [ RUN ] LazyOpsTest.TestClampMaxExplicit 2023-01-11T23:30:55.3542705Z [ OK ] LazyOpsTest.TestClampMaxExplicit (0 ms) 2023-01-11T23:30:55.3543074Z [ RUN ] LazyOpsTest.TestClampMinExplicitInPlace 2023-01-11T23:30:55.3547204Z [ OK ] LazyOpsTest.TestClampMinExplicitInPlace (0 ms) 2023-01-11T23:30:55.3547621Z [ RUN ] LazyOpsTest.TestClampMaxExplicitInPlace 2023-01-11T23:30:55.3550312Z [ OK ] LazyOpsTest.TestClampMaxExplicitInPlace (0 ms) 2023-01-11T23:30:55.3552295Z [ RUN ] LazyOpsTest.TestCeil 2023-01-11T23:30:55.3553932Z [ OK ] LazyOpsTest.TestCeil (0 ms) 2023-01-11T23:30:55.3554337Z [ RUN ] LazyOpsTest.TestFloor 2023-01-11T23:30:55.3556616Z [ OK ] LazyOpsTest.TestFloor (0 ms) 2023-01-11T23:30:55.3556949Z [ RUN ] LazyOpsTest.TestRound 2023-01-11T23:30:55.3559418Z [ OK ] LazyOpsTest.TestRound (0 ms) 2023-01-11T23:30:55.3559755Z [ RUN ] LazyOpsTest.TestTrunc 2023-01-11T23:30:55.3563173Z [ OK ] LazyOpsTest.TestTrunc (0 ms) 
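[editor's note] The deprecation warnings printed above while TestQR, TestSymEig, TestCholesky, and TestTriangularSolve ran all point at the same torch.linalg migration. A minimal sketch of the suggested replacements, runnable outside the test suite (the tensors A and B and the mode/UPLO choices here are illustrative assumptions, not values taken from the tests):

    import torch

    A = torch.randn(4, 4)
    A = A @ A.mT + 4 * torch.eye(4)  # symmetric positive definite, valid for cholesky/eigh
    B = torch.randn(4, 2)

    # torch.qr(A, some) -> torch.linalg.qr(A, mode='reduced' if some else 'complete')
    Q, R = torch.linalg.qr(A, mode="reduced")

    # torch.symeig(A, ...) -> torch.linalg.eigvalsh / torch.linalg.eigh
    L = torch.linalg.eigvalsh(A, UPLO="L")   # eigenvalues only
    L, V = torch.linalg.eigh(A, UPLO="L")    # eigenvalues and eigenvectors

    # torch.cholesky(A) -> torch.linalg.cholesky(A); upper=True becomes the .mH adjoint
    C = torch.linalg.cholesky(A)
    U = torch.linalg.cholesky(A).mH

    # torch.triangular_solve(B, A).solution -> torch.linalg.solve_triangular(A, B, upper=...)
    X = torch.linalg.solve_triangular(C, B, upper=False)

Note the argument-order flip in solve_triangular, which the warning calls out: the triangular matrix comes first, and upper is a required keyword.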
2023-01-11T23:30:55.3563595Z [ RUN ] LazyOpsTest.TestFrac 2023-01-11T23:30:55.3566905Z [ OK ] LazyOpsTest.TestFrac (0 ms) 2023-01-11T23:30:55.3567254Z [ RUN ] LazyOpsTest.TestNeg 2023-01-11T23:30:55.3570397Z [ OK ] LazyOpsTest.TestNeg (0 ms) 2023-01-11T23:30:55.3570753Z [ RUN ] LazyOpsTest.TestBitwiseNot 2023-01-11T23:30:55.3576052Z [ OK ] LazyOpsTest.TestBitwiseNot (0 ms) 2023-01-11T23:30:55.3576420Z [ RUN ] LazyOpsTest.TestBitwiseNotInPlace 2023-01-11T23:30:55.3581026Z [ OK ] LazyOpsTest.TestBitwiseNotInPlace (0 ms) 2023-01-11T23:30:55.3581378Z [ RUN ] LazyOpsTest.TestSign 2023-01-11T23:30:55.3583254Z [ OK ] LazyOpsTest.TestSign (0 ms) 2023-01-11T23:30:55.3583673Z [ RUN ] LazyOpsTest.TestSignByte 2023-01-11T23:30:55.3584837Z [ OK ] LazyOpsTest.TestSignByte (0 ms) 2023-01-11T23:30:55.3585180Z [ RUN ] LazyOpsTest.TestAbs 2023-01-11T23:30:55.3588079Z [ OK ] LazyOpsTest.TestAbs (0 ms) 2023-01-11T23:30:55.3588388Z [ RUN ] LazyOpsTest.TestAbsByte 2023-01-11T23:30:55.3592882Z [ OK ] LazyOpsTest.TestAbsByte (0 ms) 2023-01-11T23:30:55.3593344Z [ RUN ] LazyOpsTest.TestEmptyLike 2023-01-11T23:30:55.3593696Z [ OK ] LazyOpsTest.TestEmptyLike (0 ms) 2023-01-11T23:30:55.3594029Z [ RUN ] LazyOpsTest.TestEmptyLikeOptions 2023-01-11T23:30:55.3594480Z [ OK ] LazyOpsTest.TestEmptyLikeOptions (0 ms) 2023-01-11T23:30:55.3594805Z [ RUN ] LazyOpsTest.TestEmpty 2023-01-11T23:30:55.3595104Z [ OK ] LazyOpsTest.TestEmpty (0 ms) 2023-01-11T23:30:55.3595502Z [ RUN ] LazyOpsTest.TestZeroInPlace 2023-01-11T23:30:55.3596111Z [ OK ] LazyOpsTest.TestZeroInPlace (0 ms) 2023-01-11T23:30:55.3596427Z [ RUN ] LazyOpsTest.TestZerosLike 2023-01-11T23:30:55.3598960Z [ OK ] LazyOpsTest.TestZerosLike (0 ms) 2023-01-11T23:30:55.3599478Z [ RUN ] LazyOpsTest.TestZerosLikeOptions 2023-01-11T23:30:55.3599865Z [ OK ] LazyOpsTest.TestZerosLikeOptions (0 ms) 2023-01-11T23:30:55.3600179Z [ RUN ] LazyOpsTest.TestZeros 2023-01-11T23:30:55.3603544Z [ OK ] LazyOpsTest.TestZeros (0 ms) 2023-01-11T23:30:55.3603856Z [ RUN ] LazyOpsTest.TestOnes 2023-01-11T23:30:55.3607988Z [ OK ] LazyOpsTest.TestOnes (0 ms) 2023-01-11T23:30:55.3608296Z [ RUN ] LazyOpsTest.TestOnesLike 2023-01-11T23:30:55.3610050Z [ OK ] LazyOpsTest.TestOnesLike (0 ms) 2023-01-11T23:30:55.3610542Z [ RUN ] LazyOpsTest.TestOnesLikeOptions 2023-01-11T23:30:55.3611376Z [ OK ] LazyOpsTest.TestOnesLikeOptions (0 ms) 2023-01-11T23:30:55.3611703Z [ RUN ] LazyOpsTest.TestFull 2023-01-11T23:30:55.3617795Z [ OK ] LazyOpsTest.TestFull (0 ms) 2023-01-11T23:30:55.3618151Z [ RUN ] LazyOpsTest.TestFullLike 2023-01-11T23:30:55.3619326Z [ OK ] LazyOpsTest.TestFullLike (0 ms) 2023-01-11T23:30:55.3619666Z [ RUN ] LazyOpsTest.TestFullLikeOptions 2023-01-11T23:30:55.3621260Z [ OK ] LazyOpsTest.TestFullLikeOptions (0 ms) 2023-01-11T23:30:55.3621592Z [ RUN ] LazyOpsTest.TestARange 2023-01-11T23:30:55.3629079Z [ OK ] LazyOpsTest.TestARange (0 ms) 2023-01-11T23:30:55.3629488Z [ RUN ] LazyOpsTest.TestARangeOut 2023-01-11T23:30:55.3630058Z [W RangeFactories.cu:270] Warning: The number of elements in the out tensor of shape [4] is 4 which does not match the computed number of elements 200. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (200,). 
(function operator()) 2023-01-11T23:30:55.3634060Z [ OK ] LazyOpsTest.TestARangeOut (0 ms) 2023-01-11T23:30:55.3634381Z [ RUN ] LazyOpsTest.TestDimARange 2023-01-11T23:30:55.3638625Z [ OK ] LazyOpsTest.TestDimARange (0 ms) 2023-01-11T23:30:55.3638966Z [ RUN ] LazyOpsTest.TestBartlettWindow 2023-01-11T23:30:55.3676696Z [ OK ] LazyOpsTest.TestBartlettWindow (3 ms) 2023-01-11T23:30:55.3677202Z [ RUN ] LazyOpsTest.TestBlackmanWindow 2023-01-11T23:30:55.3701709Z [ OK ] LazyOpsTest.TestBlackmanWindow (2 ms) 2023-01-11T23:30:55.3702057Z [ RUN ] LazyOpsTest.TestHammingWindow 2023-01-11T23:30:55.3719646Z [ OK ] LazyOpsTest.TestHammingWindow (1 ms) 2023-01-11T23:30:55.3719971Z [ RUN ] LazyOpsTest.TestHannWindow 2023-01-11T23:30:55.7837702Z [ OK ] LazyOpsTest.TestHannWindow (411 ms) 2023-01-11T23:30:55.7838469Z [ RUN ] LazyOpsTest.TestLogSigmoid 2023-01-11T23:30:55.7843234Z [ OK ] LazyOpsTest.TestLogSigmoid (0 ms) 2023-01-11T23:30:55.7843599Z [ RUN ] LazyOpsTest.TestLogSigmoidForward 2023-01-11T23:30:55.7846906Z [ OK ] LazyOpsTest.TestLogSigmoidForward (0 ms) 2023-01-11T23:30:55.7847271Z [ RUN ] LazyOpsTest.TestLogsumexp 2023-01-11T23:30:55.7898911Z [ OK ] LazyOpsTest.TestLogsumexp (4 ms) 2023-01-11T23:30:55.7899693Z [ RUN ] LazyOpsTest.TestSiLU 2023-01-11T23:30:55.7902328Z [ OK ] LazyOpsTest.TestSiLU (0 ms) 2023-01-11T23:30:55.7902759Z [ RUN ] LazyOpsTest.TestSigmoid 2023-01-11T23:30:55.7905290Z [ OK ] LazyOpsTest.TestSigmoid (0 ms) 2023-01-11T23:30:55.7905758Z [ RUN ] LazyOpsTest.TestMatmul_1x1 2023-01-11T23:30:55.7907841Z [ OK ] LazyOpsTest.TestMatmul_1x1 (0 ms) 2023-01-11T23:30:55.7908161Z [ RUN ] LazyOpsTest.TestMatmul_2x1 2023-01-11T23:30:55.7914596Z [ OK ] LazyOpsTest.TestMatmul_2x1 (0 ms) 2023-01-11T23:30:55.7914944Z [ RUN ] LazyOpsTest.TestMatmul_1x2 2023-01-11T23:30:55.7920563Z [ OK ] LazyOpsTest.TestMatmul_1x2 (0 ms) 2023-01-11T23:30:55.7920904Z [ RUN ] LazyOpsTest.TestMatmul_2x2 2023-01-11T23:30:55.7924532Z [ OK ] LazyOpsTest.TestMatmul_2x2 (0 ms) 2023-01-11T23:30:55.7924880Z [ RUN ] LazyOpsTest.TestMatmulBcast 2023-01-11T23:30:55.7933668Z [ OK ] LazyOpsTest.TestMatmulBcast (0 ms) 2023-01-11T23:30:55.7934139Z [ RUN ] LazyOpsTest.TestDot 2023-01-11T23:30:55.7936296Z [ OK ] LazyOpsTest.TestDot (0 ms) 2023-01-11T23:30:55.7936632Z [ RUN ] LazyOpsTest.TestTensorDot 2023-01-11T23:30:55.7944936Z [ OK ] LazyOpsTest.TestTensorDot (0 ms) 2023-01-11T23:30:55.7945277Z [ RUN ] LazyOpsTest.TestGer 2023-01-11T23:30:55.7949840Z [ OK ] LazyOpsTest.TestGer (0 ms) 2023-01-11T23:30:55.7950243Z [ RUN ] LazyOpsTest.TestMv 2023-01-11T23:30:55.7954690Z [ OK ] LazyOpsTest.TestMv (0 ms) 2023-01-11T23:30:55.7955009Z [ RUN ] LazyOpsTest.TestMvOut 2023-01-11T23:30:55.7959696Z [ OK ] LazyOpsTest.TestMvOut (0 ms) 2023-01-11T23:30:55.7960039Z [ RUN ] LazyOpsTest.TestBatchAddBatchMatMul 2023-01-11T23:30:55.7964446Z [ OK ] LazyOpsTest.TestBatchAddBatchMatMul (0 ms) 2023-01-11T23:30:55.7964843Z [ RUN ] LazyOpsTest.TestBatchAddBatchMatMulInPlace 2023-01-11T23:30:55.7970747Z [ OK ] LazyOpsTest.TestBatchAddBatchMatMulInPlace (0 ms) 2023-01-11T23:30:55.7971120Z [ RUN ] LazyOpsTest.TestBatchMatMul 2023-01-11T23:30:55.7975033Z [ OK ] LazyOpsTest.TestBatchMatMul (0 ms) 2023-01-11T23:30:55.7975373Z [ RUN ] LazyOpsTest.TestChainMatMul 2023-01-11T23:30:55.7975871Z [W LinearAlgebra.cpp:1077] Warning: torch.chain_matmul is deprecated and will be removed in a future PyTorch release. Use torch.linalg.multi_dot instead, which accepts a list of two or more tensors rather than multiple parameters. 
(function operator()) 2023-01-11T23:30:55.7982949Z [ OK ] LazyOpsTest.TestChainMatMul (0 ms) 2023-01-11T23:30:55.7983316Z [ RUN ] LazyOpsTest.TestLinear 2023-01-11T23:30:55.7996024Z [ OK ] LazyOpsTest.TestLinear (1 ms) 2023-01-11T23:30:55.7996349Z [ RUN ] LazyOpsTest.TestPinverse 2023-01-11T23:30:55.8023975Z [ OK ] LazyOpsTest.TestPinverse (2 ms) 2023-01-11T23:30:55.8024430Z [ RUN ] LazyOpsTest.TestEinsumOuter 2023-01-11T23:30:55.8032087Z [ OK ] LazyOpsTest.TestEinsumOuter (0 ms) 2023-01-11T23:30:55.8032464Z [ RUN ] LazyOpsTest.TestEinsumOuterBackward 2023-01-11T23:30:55.8060409Z [ OK ] LazyOpsTest.TestEinsumOuterBackward (3 ms) 2023-01-11T23:30:55.8060803Z [ RUN ] LazyOpsTest.TestEinsumBatchMatMul 2023-01-11T23:30:55.8074041Z [ OK ] LazyOpsTest.TestEinsumBatchMatMul (1 ms) 2023-01-11T23:30:55.8074437Z [ RUN ] LazyOpsTest.TestEinsumPyTorchLowerBilinear 2023-01-11T23:30:55.8095066Z [ OK ] LazyOpsTest.TestEinsumPyTorchLowerBilinear (2 ms) 2023-01-11T23:30:55.8095491Z [ RUN ] LazyOpsTest.TestEinsumPyTorchLowerDiagonal 2023-01-11T23:30:55.8101288Z [ OK ] LazyOpsTest.TestEinsumPyTorchLowerDiagonal (0 ms) 2023-01-11T23:30:55.8101751Z [ RUN ] LazyOpsTest.TestEinsumPyTorchLowerBatchDiagonal 2023-01-11T23:30:55.8107006Z [ OK ] LazyOpsTest.TestEinsumPyTorchLowerBatchDiagonal (0 ms) 2023-01-11T23:30:55.8107573Z [ RUN ] LazyOpsTest.TestEinsumPyTorchLowerBatchPermute 2023-01-11T23:30:55.8112402Z [ OK ] LazyOpsTest.TestEinsumPyTorchLowerBatchPermute (0 ms) 2023-01-11T23:30:55.8112941Z [ RUN ] LazyOpsTest.TestEinsumPyTorchLowerRepeatedAxis 2023-01-11T23:30:55.8123128Z [ OK ] LazyOpsTest.TestEinsumPyTorchLowerRepeatedAxis (1 ms) 2023-01-11T23:30:55.8123534Z [ RUN ] LazyOpsTest.TestBilinear 2023-01-11T23:30:55.8300681Z [ OK ] LazyOpsTest.TestBilinear (17 ms) 2023-01-11T23:30:55.8301034Z [ RUN ] LazyOpsTest.TestUpsampleNearest2D 2023-01-11T23:30:55.8305299Z [ OK ] LazyOpsTest.TestUpsampleNearest2D (0 ms) 2023-01-11T23:30:55.8305715Z [ RUN ] LazyOpsTest.TestUpsampleNearest2DBackward 2023-01-11T23:30:55.8318987Z [ OK ] LazyOpsTest.TestUpsampleNearest2DBackward (1 ms) 2023-01-11T23:30:55.8319399Z [ RUN ] LazyOpsTest.TestUpsampleNearest2DWithScale 2023-01-11T23:30:55.8323615Z [ OK ] LazyOpsTest.TestUpsampleNearest2DWithScale (0 ms) 2023-01-11T23:30:55.8324070Z [ RUN ] LazyOpsTest.TestUpsampleNearest2DBackwardWithScale 2023-01-11T23:30:55.8336756Z [ OK ] LazyOpsTest.TestUpsampleNearest2DBackwardWithScale (1 ms) 2023-01-11T23:30:55.8337168Z [ RUN ] LazyOpsTest.TestUpsampleBilinear2D 2023-01-11T23:30:55.8345577Z [ OK ] LazyOpsTest.TestUpsampleBilinear2D (0 ms) 2023-01-11T23:30:55.8345993Z [ RUN ] LazyOpsTest.TestUpsampleBilinear2DBackward 2023-01-11T23:30:55.8372951Z [ OK ] LazyOpsTest.TestUpsampleBilinear2DBackward (2 ms) 2023-01-11T23:30:55.8373316Z [ RUN ] LazyOpsTest.TestAddCMul 2023-01-11T23:30:55.8377759Z [ OK ] LazyOpsTest.TestAddCMul (0 ms) 2023-01-11T23:30:55.8378111Z [ RUN ] LazyOpsTest.TestAddCDiv 2023-01-11T23:30:55.8382702Z [ OK ] LazyOpsTest.TestAddCDiv (0 ms) 2023-01-11T23:30:55.8383102Z [ RUN ] LazyOpsTest.TestAddCDivWithBroadcast 2023-01-11T23:30:55.8387446Z [ OK ] LazyOpsTest.TestAddCDivWithBroadcast (0 ms) 2023-01-11T23:30:55.8387815Z [ RUN ] LazyOpsTest.TestSize 2023-01-11T23:30:55.8388102Z [ OK ] LazyOpsTest.TestSize (0 ms) 2023-01-11T23:30:55.8388403Z [ RUN ] LazyOpsTest.TestSelect 2023-01-11T23:30:55.8484709Z [ OK ] LazyOpsTest.TestSelect (9 ms) 2023-01-11T23:30:55.8485098Z [ RUN ] LazyOpsTest.TestBernoulliScalarProb 2023-01-11T23:30:55.8488976Z [ OK ] LazyOpsTest.TestBernoulliScalarProb (0 
ms) 2023-01-11T23:30:55.8489366Z [ RUN ] LazyOpsTest.TestBernoulliTensorProb 2023-01-11T23:30:55.8492815Z [ OK ] LazyOpsTest.TestBernoulliTensorProb (0 ms) 2023-01-11T23:30:55.8493317Z [ RUN ] LazyOpsTest.TestBernoulliScalarProbInPlace 2023-01-11T23:30:55.8497668Z [ OK ] LazyOpsTest.TestBernoulliScalarProbInPlace (0 ms) 2023-01-11T23:30:55.8498094Z [ RUN ] LazyOpsTest.TestBernoulliTensorProbInPlace 2023-01-11T23:30:55.8502535Z [ OK ] LazyOpsTest.TestBernoulliTensorProbInPlace (0 ms) 2023-01-11T23:30:55.8502948Z [ RUN ] LazyOpsTest.TestDropout 2023-01-11T23:30:55.8506061Z [ OK ] LazyOpsTest.TestDropout (0 ms) 2023-01-11T23:30:55.8506435Z [ RUN ] LazyOpsTest.TestDropoutInPlace 2023-01-11T23:30:55.8513002Z [ OK ] LazyOpsTest.TestDropoutInPlace (0 ms) 2023-01-11T23:30:55.8513469Z [ RUN ] LazyOpsTest.TestRandperm 2023-01-11T23:30:55.8514438Z [ OK ] LazyOpsTest.TestRandperm (0 ms) 2023-01-11T23:30:55.8514813Z [ RUN ] LazyOpsTest.TestSlice 2023-01-11T23:30:55.8521714Z [ OK ] LazyOpsTest.TestSlice (0 ms) 2023-01-11T23:30:55.8522042Z [ RUN ] LazyOpsTest.TestTake 2023-01-11T23:30:55.8524470Z [ OK ] LazyOpsTest.TestTake (0 ms) 2023-01-11T23:30:55.8524820Z [ RUN ] LazyOpsTest.TestTakeBackward 2023-01-11T23:30:55.8540363Z [ OK ] LazyOpsTest.TestTakeBackward (1 ms) 2023-01-11T23:30:55.8540973Z [ RUN ] LazyOpsTest.TestStack 2023-01-11T23:30:55.8577560Z [ OK ] LazyOpsTest.TestStack (3 ms) 2023-01-11T23:30:55.8577879Z [ RUN ] LazyOpsTest.TestCat 2023-01-11T23:30:55.8587083Z [ OK ] LazyOpsTest.TestCat (0 ms) 2023-01-11T23:30:55.8587457Z [ RUN ] LazyOpsTest.TestUnbind 2023-01-11T23:30:55.8616775Z [ OK ] LazyOpsTest.TestUnbind (2 ms) 2023-01-11T23:30:55.8617094Z [ RUN ] LazyOpsTest.TestRepeat 2023-01-11T23:30:55.8632703Z [ OK ] LazyOpsTest.TestRepeat (1 ms) 2023-01-11T23:30:55.8633025Z [ RUN ] LazyOpsTest.TestGather 2023-01-11T23:30:55.8641124Z [ OK ] LazyOpsTest.TestGather (0 ms) 2023-01-11T23:30:55.8641445Z [ RUN ] LazyOpsTest.TestScatter 2023-01-11T23:30:55.8649001Z [ OK ] LazyOpsTest.TestScatter (0 ms) 2023-01-11T23:30:55.8649327Z [ RUN ] LazyOpsTest.TestScatterR1 2023-01-11T23:30:55.8652111Z [ OK ] LazyOpsTest.TestScatterR1 (0 ms) 2023-01-11T23:30:55.8652464Z [ RUN ] LazyOpsTest.TestScatterR3 2023-01-11T23:30:55.8657286Z [ OK ] LazyOpsTest.TestScatterR3 (0 ms) 2023-01-11T23:30:55.8657639Z [ RUN ] LazyOpsTest.TestScatterBiggerSource 2023-01-11T23:30:55.8663828Z [ OK ] LazyOpsTest.TestScatterBiggerSource (0 ms) 2023-01-11T23:30:55.8664178Z [ RUN ] LazyOpsTest.TestScatterScalar 2023-01-11T23:30:55.8669705Z [ OK ] LazyOpsTest.TestScatterScalar (0 ms) 2023-01-11T23:30:55.8670153Z [ RUN ] LazyOpsTest.TestScatterReduceAdd 2023-01-11T23:30:55.8677721Z [ OK ] LazyOpsTest.TestScatterReduceAdd (0 ms) 2023-01-11T23:30:55.8678057Z [ RUN ] LazyOpsTest.TestScatterAdd 2023-01-11T23:30:55.8688973Z [ OK ] LazyOpsTest.TestScatterAdd (1 ms) 2023-01-11T23:30:55.8689319Z [ RUN ] LazyOpsTest.TestScatterAddInPlace 2023-01-11T23:30:55.8700840Z [ OK ] LazyOpsTest.TestScatterAddInPlace (1 ms) 2023-01-11T23:30:55.8701180Z [ RUN ] LazyOpsTest.TestIndexSelect 2023-01-11T23:30:55.8843321Z [ OK ] LazyOpsTest.TestIndexSelect (14 ms) 2023-01-11T23:30:55.8843738Z [ RUN ] LazyOpsTest.TestIndexSelectRank0 2023-01-11T23:30:55.8881467Z [ OK ] LazyOpsTest.TestIndexSelectRank0 (3 ms) 2023-01-11T23:30:55.8881959Z [ RUN ] LazyOpsTest.TestInverse 2023-01-11T23:30:55.8882260Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:4728: Skipped 2023-01-11T23:30:55.8882519Z 2023-01-11T23:30:55.8883214Z [ SKIPPED ] LazyOpsTest.TestInverse (0 ms) 
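[editor's note] Two warnings a little earlier in this shard call for call-site updates rather than signalling failures: TestARangeOut trips the out= auto-resize warning from RangeFactories.cu, and TestChainMatMul hits the torch.chain_matmul deprecation. A hedged sketch of both fixes (the arange arguments below are an assumption chosen only to reproduce the 200-element result the warning mentions; the test's actual arguments are not shown in the log):

    import torch

    # torch.chain_matmul(a, b, c) -> torch.linalg.multi_dot([a, b, c])
    a, b, c = torch.randn(3, 4), torch.randn(4, 5), torch.randn(5, 6)
    prod = torch.linalg.multi_dot([a, b, c])

    # The out= warning: a result tensor whose element count does not match the
    # computed result is resized (with a warning), which is what the test exercises.
    out = torch.empty(4)                 # deliberately the wrong shape
    torch.arange(0, 100, 0.5, out=out)   # warns, then resizes out to shape (200,)
    assert out.shape == (200,)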
2023-01-11T23:30:55.8883803Z [ RUN ] LazyOpsTest.TestIsnan 2023-01-11T23:30:55.8884213Z [ OK ] LazyOpsTest.TestIsnan (0 ms) 2023-01-11T23:30:55.8884519Z [ RUN ] LazyOpsTest.TestExpand 2023-01-11T23:30:55.8887624Z [ OK ] LazyOpsTest.TestExpand (0 ms) 2023-01-11T23:30:55.8888082Z [ RUN ] LazyOpsTest.TestExpandBack 2023-01-11T23:30:55.8891809Z [ OK ] LazyOpsTest.TestExpandBack (0 ms) 2023-01-11T23:30:55.8892276Z [ RUN ] LazyOpsTest.TestExpandAs 2023-01-11T23:30:55.8897041Z [ OK ] LazyOpsTest.TestExpandAs (0 ms) 2023-01-11T23:30:55.8897430Z [ RUN ] LazyOpsTest.TestEye 2023-01-11T23:30:55.8899004Z [ OK ] LazyOpsTest.TestEye (0 ms) 2023-01-11T23:30:55.8899408Z [ RUN ] LazyOpsTest.TestEyeWide 2023-01-11T23:30:55.8900368Z [ OK ] LazyOpsTest.TestEyeWide (0 ms) 2023-01-11T23:30:55.8900785Z [ RUN ] LazyOpsTest.TestEyeNarrow 2023-01-11T23:30:55.8902425Z [ OK ] LazyOpsTest.TestEyeNarrow (0 ms) 2023-01-11T23:30:55.8903058Z [ RUN ] LazyOpsTest.TestBroadcastTensors 2023-01-11T23:30:55.8910247Z [ OK ] LazyOpsTest.TestBroadcastTensors (0 ms) 2023-01-11T23:30:55.8910720Z [ RUN ] LazyOpsTest.TestOneIndex 2023-01-11T23:30:55.8926221Z [ OK ] LazyOpsTest.TestOneIndex (1 ms) 2023-01-11T23:30:55.8926691Z [ RUN ] LazyOpsTest.TestOneIndexTransfer 2023-01-11T23:30:55.8941801Z [ OK ] LazyOpsTest.TestOneIndexTransfer (1 ms) 2023-01-11T23:30:55.8942244Z [ RUN ] LazyOpsTest.TestNonzero 2023-01-11T23:30:55.8945085Z [ OK ] LazyOpsTest.TestNonzero (0 ms) 2023-01-11T23:30:55.8945516Z [ RUN ] LazyOpsTest.TestMaskedSelect 2023-01-11T23:30:55.8949356Z [ OK ] LazyOpsTest.TestMaskedSelect (0 ms) 2023-01-11T23:30:55.8949804Z [ RUN ] LazyOpsTest.TestMaskedScatter 2023-01-11T23:30:55.8954280Z [ OK ] LazyOpsTest.TestMaskedScatter (0 ms) 2023-01-11T23:30:55.8954760Z [ RUN ] LazyOpsTest.TestMultiIndexHeadNull 2023-01-11T23:30:55.8970449Z [ OK ] LazyOpsTest.TestMultiIndexHeadNull (1 ms) 2023-01-11T23:30:55.8970990Z [ RUN ] LazyOpsTest.TestMultiIndexMiddleNull 2023-01-11T23:30:55.8986610Z [ OK ] LazyOpsTest.TestMultiIndexMiddleNull (1 ms) 2023-01-11T23:30:55.8987127Z [ RUN ] LazyOpsTest.TestMultiIndexTailNull 2023-01-11T23:30:55.9002826Z [ OK ] LazyOpsTest.TestMultiIndexTailNull (1 ms) 2023-01-11T23:30:55.9003386Z [ RUN ] LazyOpsTest.TestMultiIndexMiddleBroadcast 2023-01-11T23:30:55.9020710Z [ OK ] LazyOpsTest.TestMultiIndexMiddleBroadcast (1 ms) 2023-01-11T23:30:55.9021248Z [ RUN ] LazyOpsTest.TestMultiIndexTailBroadcast 2023-01-11T23:30:55.9038908Z [ OK ] LazyOpsTest.TestMultiIndexTailBroadcast (1 ms) 2023-01-11T23:30:55.9039393Z [ RUN ] LazyOpsTest.TestMaskIndex 2023-01-11T23:30:55.9055740Z [ OK ] LazyOpsTest.TestMaskIndex (1 ms) 2023-01-11T23:30:55.9056179Z [ RUN ] LazyOpsTest.TestOneIndexPut 2023-01-11T23:30:55.9060007Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5230: Skipped 2023-01-11T23:30:55.9060264Z 2023-01-11T23:30:55.9060552Z [ SKIPPED ] LazyOpsTest.TestOneIndexPut (0 ms) 2023-01-11T23:30:55.9060994Z [ RUN ] LazyOpsTest.TestOneIndexPutInPlace 2023-01-11T23:30:55.9065364Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5264: Skipped 2023-01-11T23:30:55.9065618Z 2023-01-11T23:30:55.9065922Z [ SKIPPED ] LazyOpsTest.TestOneIndexPutInPlace (0 ms) 2023-01-11T23:30:55.9066349Z [ RUN ] LazyOpsTest.TestOneIndexPutTransfer 2023-01-11T23:30:55.9068265Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5315: Skipped 2023-01-11T23:30:55.9068523Z 2023-01-11T23:30:55.9068934Z [ SKIPPED ] LazyOpsTest.TestOneIndexPutTransfer (0 ms) 2023-01-11T23:30:55.9069398Z [ RUN ] LazyOpsTest.TestMultiIndexPut 
2023-01-11T23:30:55.9073402Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5360: Skipped 2023-01-11T23:30:55.9073649Z 2023-01-11T23:30:55.9073921Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPut (0 ms) 2023-01-11T23:30:55.9074381Z [ RUN ] LazyOpsTest.TestMultiIndexPutHeadNull 2023-01-11T23:30:55.9076571Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5411: Skipped 2023-01-11T23:30:55.9076813Z 2023-01-11T23:30:55.9077115Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutHeadNull (0 ms) 2023-01-11T23:30:55.9077535Z [ RUN ] LazyOpsTest.TestMultiIndexPutMiddleNull 2023-01-11T23:30:55.9081672Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5462: Skipped 2023-01-11T23:30:55.9081915Z 2023-01-11T23:30:55.9082231Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutMiddleNull (0 ms) 2023-01-11T23:30:55.9082727Z [ RUN ] LazyOpsTest.TestMultiIndexPutTailNull 2023-01-11T23:30:55.9085334Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5513: Skipped 2023-01-11T23:30:55.9085579Z 2023-01-11T23:30:55.9085885Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutTailNull (0 ms) 2023-01-11T23:30:55.9086339Z [ RUN ] LazyOpsTest.TestMultiIndexPutMiddleBroadcast 2023-01-11T23:30:55.9089437Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5563: Skipped 2023-01-11T23:30:55.9089682Z 2023-01-11T23:30:55.9090008Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutMiddleBroadcast (0 ms) 2023-01-11T23:30:55.9090500Z [ RUN ] LazyOpsTest.TestMultiIndexPutTailBroadcast 2023-01-11T23:30:55.9093399Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5613: Skipped 2023-01-11T23:30:55.9093639Z 2023-01-11T23:30:55.9093957Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutTailBroadcast (0 ms) 2023-01-11T23:30:55.9094361Z [ RUN ] LazyOpsTest.TestMaskIndexPut 2023-01-11T23:30:55.9156733Z [ OK ] LazyOpsTest.TestMaskIndexPut (6 ms) 2023-01-11T23:30:55.9157209Z [ RUN ] LazyOpsTest.TestIndexPutImpl 2023-01-11T23:30:55.9161359Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5687: Skipped 2023-01-11T23:30:55.9161628Z 2023-01-11T23:30:55.9161921Z [ SKIPPED ] LazyOpsTest.TestIndexPutImpl (0 ms) 2023-01-11T23:30:55.9162367Z [ RUN ] LazyOpsTest.TestIndexFillWithScalar 2023-01-11T23:30:55.9241269Z [ OK ] LazyOpsTest.TestIndexFillWithScalar (7 ms) 2023-01-11T23:30:55.9241714Z [ RUN ] LazyOpsTest.TestIndexFillWithScalarInPlace 2023-01-11T23:30:55.9330823Z [ OK ] LazyOpsTest.TestIndexFillWithScalarInPlace (8 ms) 2023-01-11T23:30:55.9331273Z [ RUN ] LazyOpsTest.TestIndexFillWithTensor 2023-01-11T23:30:55.9418967Z [ OK ] LazyOpsTest.TestIndexFillWithTensor (8 ms) 2023-01-11T23:30:55.9419361Z [ RUN ] LazyOpsTest.TestIndexFillWithTensorInPlace 2023-01-11T23:30:55.9527331Z [ OK ] LazyOpsTest.TestIndexFillWithTensorInPlace (10 ms) 2023-01-11T23:30:55.9527720Z [ RUN ] LazyOpsTest.TestIndexFillRank0 2023-01-11T23:30:55.9614063Z [ OK ] LazyOpsTest.TestIndexFillRank0 (8 ms) 2023-01-11T23:30:55.9614401Z [ RUN ] LazyOpsTest.TestIndexAdd 2023-01-11T23:30:55.9805888Z [ OK ] LazyOpsTest.TestIndexAdd (19 ms) 2023-01-11T23:30:55.9806237Z [ RUN ] LazyOpsTest.TestIndexAddInPlace 2023-01-11T23:30:55.9940473Z [ OK ] LazyOpsTest.TestIndexAddInPlace (13 ms) 2023-01-11T23:30:55.9940882Z [ RUN ] LazyOpsTest.TestIndexAddRank0 2023-01-11T23:30:56.0019282Z [ OK ] LazyOpsTest.TestIndexAddRank0 (7 ms) 2023-01-11T23:30:56.0019628Z [ RUN ] LazyOpsTest.TestIndexCopy 2023-01-11T23:30:56.0113531Z [ OK ] LazyOpsTest.TestIndexCopy (9 ms) 2023-01-11T23:30:56.0114067Z [ RUN ] LazyOpsTest.TestIndexCopyInPlace 2023-01-11T23:30:56.0114388Z 
/var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:6082: Skipped 2023-01-11T23:30:56.0114570Z 2023-01-11T23:30:56.0114772Z [ SKIPPED ] LazyOpsTest.TestIndexCopyInPlace (0 ms) 2023-01-11T23:30:56.0115118Z [ RUN ] LazyOpsTest.TestIndexCopyRank0 2023-01-11T23:30:56.0194395Z [ OK ] LazyOpsTest.TestIndexCopyRank0 (8 ms) 2023-01-11T23:30:56.0194724Z [ RUN ] LazyOpsTest.TestRelu 2023-01-11T23:30:56.0198113Z [ OK ] LazyOpsTest.TestRelu (0 ms) 2023-01-11T23:30:56.0198433Z [ RUN ] LazyOpsTest.TestReluInPlace 2023-01-11T23:30:56.0203317Z [ OK ] LazyOpsTest.TestReluInPlace (0 ms) 2023-01-11T23:30:56.0203655Z [ RUN ] LazyOpsTest.TestHardshrink 2023-01-11T23:30:56.0205438Z [ OK ] LazyOpsTest.TestHardshrink (0 ms) 2023-01-11T23:30:56.0205836Z [ RUN ] LazyOpsTest.TestHardSigmoid 2023-01-11T23:30:56.0209469Z [ OK ] LazyOpsTest.TestHardSigmoid (0 ms) 2023-01-11T23:30:56.0209932Z [ RUN ] LazyOpsTest.TestHardSigmoidInPlace 2023-01-11T23:30:56.0214410Z [ OK ] LazyOpsTest.TestHardSigmoidInPlace (0 ms) 2023-01-11T23:30:56.0214937Z [ RUN ] LazyOpsTest.TestHardSigmoidBackward 2023-01-11T23:30:56.0228416Z [ OK ] LazyOpsTest.TestHardSigmoidBackward (1 ms) 2023-01-11T23:30:56.0228900Z [ RUN ] LazyOpsTest.TestSoftshrink 2023-01-11T23:30:56.0230672Z [ OK ] LazyOpsTest.TestSoftshrink (0 ms) 2023-01-11T23:30:56.0232335Z [ RUN ] LazyOpsTest.TestHardtanh 2023-01-11T23:30:56.0233168Z [ OK ] LazyOpsTest.TestHardtanh (0 ms) 2023-01-11T23:30:56.0233648Z [ RUN ] LazyOpsTest.TestHardtanhInPlace 2023-01-11T23:30:56.0235800Z [ OK ] LazyOpsTest.TestHardtanhInPlace (0 ms) 2023-01-11T23:30:56.0236284Z [ RUN ] LazyOpsTest.TestLeakyRelu 2023-01-11T23:30:56.0239726Z [ OK ] LazyOpsTest.TestLeakyRelu (0 ms) 2023-01-11T23:30:56.0240220Z [ RUN ] LazyOpsTest.TestLeakyReluInPlace 2023-01-11T23:30:56.0244910Z [ OK ] LazyOpsTest.TestLeakyReluInPlace (0 ms) 2023-01-11T23:30:56.0245354Z [ RUN ] LazyOpsTest.TestExp 2023-01-11T23:30:56.0248400Z [ OK ] LazyOpsTest.TestExp (0 ms) 2023-01-11T23:30:56.0248805Z [ RUN ] LazyOpsTest.TestExpm1 2023-01-11T23:30:56.0250299Z [ OK ] LazyOpsTest.TestExpm1 (0 ms) 2023-01-11T23:30:56.0250707Z [ RUN ] LazyOpsTest.TestLog 2023-01-11T23:30:56.0253886Z [ OK ] LazyOpsTest.TestLog (0 ms) 2023-01-11T23:30:56.0254306Z [ RUN ] LazyOpsTest.TestLog2 2023-01-11T23:30:56.0257758Z [ OK ] LazyOpsTest.TestLog2 (0 ms) 2023-01-11T23:30:56.0258196Z [ RUN ] LazyOpsTest.TestLog10 2023-01-11T23:30:56.0259942Z [ OK ] LazyOpsTest.TestLog10 (0 ms) 2023-01-11T23:30:56.0260378Z [ RUN ] LazyOpsTest.TestLog1p 2023-01-11T23:30:56.0262128Z [ OK ] LazyOpsTest.TestLog1p (0 ms) 2023-01-11T23:30:56.0262568Z [ RUN ] LazyOpsTest.TestErf 2023-01-11T23:30:56.0264673Z [ OK ] LazyOpsTest.TestErf (0 ms) 2023-01-11T23:30:56.0265098Z [ RUN ] LazyOpsTest.TestErfc 2023-01-11T23:30:56.0273141Z [ OK ] LazyOpsTest.TestErfc (0 ms) 2023-01-11T23:30:56.0273569Z [ RUN ] LazyOpsTest.TestErfinv 2023-01-11T23:30:56.0278228Z [ OK ] LazyOpsTest.TestErfinv (0 ms) 2023-01-11T23:30:56.0278654Z [ RUN ] LazyOpsTest.TestSqrt 2023-01-11T23:30:56.0281951Z [ OK ] LazyOpsTest.TestSqrt (0 ms) 2023-01-11T23:30:56.0282274Z [ RUN ] LazyOpsTest.TestRsqrt 2023-01-11T23:30:56.0285526Z [ OK ] LazyOpsTest.TestRsqrt (0 ms) 2023-01-11T23:30:56.0285853Z [ RUN ] LazyOpsTest.TestReciprocal 2023-01-11T23:30:56.0289097Z [ OK ] LazyOpsTest.TestReciprocal (0 ms) 2023-01-11T23:30:56.0289448Z [ RUN ] LazyOpsTest.TestPowTensorScalar 2023-01-11T23:30:56.0292952Z [ OK ] LazyOpsTest.TestPowTensorScalar (0 ms) 2023-01-11T23:30:56.0293326Z [ RUN ] LazyOpsTest.TestPowTensorScalarInPlace 
2023-01-11T23:30:56.0298502Z [ OK ] LazyOpsTest.TestPowTensorScalarInPlace (0 ms) 2023-01-11T23:30:56.0298865Z [ RUN ] LazyOpsTest.TestPowTensorTensor 2023-01-11T23:30:56.0302955Z [ OK ] LazyOpsTest.TestPowTensorTensor (0 ms) 2023-01-11T23:30:56.0303324Z [ RUN ] LazyOpsTest.TestPowTensorTensorInPlace 2023-01-11T23:30:56.0308767Z [ OK ] LazyOpsTest.TestPowTensorTensorInPlace (0 ms) 2023-01-11T23:30:56.0309156Z [ RUN ] LazyOpsTest.TestPowTensorTensorBroadcast 2023-01-11T23:30:56.0313959Z [ OK ] LazyOpsTest.TestPowTensorTensorBroadcast (0 ms) 2023-01-11T23:30:56.0314454Z [ RUN ] LazyOpsTest.TestPowScalarTensor 2023-01-11T23:30:56.0315062Z [ OK ] LazyOpsTest.TestPowScalarTensor (0 ms) 2023-01-11T23:30:56.0315402Z [ RUN ] LazyOpsTest.TestPowIntExponent 2023-01-11T23:30:56.0319197Z [ OK ] LazyOpsTest.TestPowIntExponent (0 ms) 2023-01-11T23:30:56.0319535Z [ RUN ] LazyOpsTest.TestFmodScalar 2023-01-11T23:30:56.0321797Z [ OK ] LazyOpsTest.TestFmodScalar (0 ms) 2023-01-11T23:30:56.0322151Z [ RUN ] LazyOpsTest.TestFmodScalarInPlace 2023-01-11T23:30:56.0325279Z [ OK ] LazyOpsTest.TestFmodScalarInPlace (0 ms) 2023-01-11T23:30:56.0325626Z [ RUN ] LazyOpsTest.TestFmodTensor 2023-01-11T23:30:56.0327612Z [ OK ] LazyOpsTest.TestFmodTensor (0 ms) 2023-01-11T23:30:56.0327962Z [ RUN ] LazyOpsTest.TestFmodTensorInPlace 2023-01-11T23:30:56.0331272Z [ OK ] LazyOpsTest.TestFmodTensorInPlace (0 ms) 2023-01-11T23:30:56.0331874Z [ RUN ] LazyOpsTest.TestRemainderScalar 2023-01-11T23:30:56.0335588Z [ OK ] LazyOpsTest.TestRemainderScalar (0 ms) 2023-01-11T23:30:56.0335959Z [ RUN ] LazyOpsTest.TestRemainderScalarInPlace 2023-01-11T23:30:56.0341544Z [ OK ] LazyOpsTest.TestRemainderScalarInPlace (0 ms) 2023-01-11T23:30:56.0341918Z [ RUN ] LazyOpsTest.TestRemainderTensor 2023-01-11T23:30:56.0345883Z [ OK ] LazyOpsTest.TestRemainderTensor (0 ms) 2023-01-11T23:30:56.0346249Z [ RUN ] LazyOpsTest.TestRemainderTensorInPlace 2023-01-11T23:30:56.0353158Z [ OK ] LazyOpsTest.TestRemainderTensorInPlace (0 ms) 2023-01-11T23:30:56.0353639Z [ RUN ] LazyOpsTest.TestWhere 2023-01-11T23:30:56.0354130Z [W TensorCompare.cpp:493] Warning: where received a uint8 condition tensor. This behavior is deprecated and will be removed in a future version of PyTorch. Use a boolean condition instead. 
(function operator()) 2023-01-11T23:30:56.0356229Z [ OK ] LazyOpsTest.TestWhere (0 ms) 2023-01-11T23:30:56.0356582Z [ RUN ] LazyOpsTest.TestWhereBroadcast 2023-01-11T23:30:56.0359831Z [ OK ] LazyOpsTest.TestWhereBroadcast (0 ms) 2023-01-11T23:30:56.0360193Z [ RUN ] LazyOpsTest.TestThreshold 2023-01-11T23:30:56.0363839Z [ OK ] LazyOpsTest.TestThreshold (0 ms) 2023-01-11T23:30:56.0364211Z [ RUN ] LazyOpsTest.TestThresholdBackward 2023-01-11T23:30:56.0376663Z [ OK ] LazyOpsTest.TestThresholdBackward (1 ms) 2023-01-11T23:30:56.0377038Z [ RUN ] LazyOpsTest.TestThresholdInPlace 2023-01-11T23:30:56.0381168Z [ OK ] LazyOpsTest.TestThresholdInPlace (0 ms) 2023-01-11T23:30:56.0381564Z [ RUN ] LazyOpsTest.TestElu 2023-01-11T23:30:56.0385332Z [ OK ] LazyOpsTest.TestElu (0 ms) 2023-01-11T23:30:56.0385673Z [ RUN ] LazyOpsTest.TestEluInPlace 2023-01-11T23:30:56.0390611Z [ OK ] LazyOpsTest.TestEluInPlace (0 ms) 2023-01-11T23:30:56.0391534Z [ RUN ] LazyOpsTest.TestSelu 2023-01-11T23:30:56.0394508Z [ OK ] LazyOpsTest.TestSelu (0 ms) 2023-01-11T23:30:56.0394851Z [ RUN ] LazyOpsTest.TestSeluInPlace 2023-01-11T23:30:56.0399627Z [ OK ] LazyOpsTest.TestSeluInPlace (0 ms) 2023-01-11T23:30:56.0399957Z [ RUN ] LazyOpsTest.TestCelu 2023-01-11T23:30:56.0402021Z [ OK ] LazyOpsTest.TestCelu (0 ms) 2023-01-11T23:30:56.0402382Z [ RUN ] LazyOpsTest.TestCeluInPlace 2023-01-11T23:30:56.0405102Z [ OK ] LazyOpsTest.TestCeluInPlace (0 ms) 2023-01-11T23:30:56.0405465Z [ RUN ] LazyOpsTest.TestGelu 2023-01-11T23:30:56.0408933Z [ OK ] LazyOpsTest.TestGelu (0 ms) 2023-01-11T23:30:56.0409307Z [ RUN ] LazyOpsTest.TestAddMatMul 2023-01-11T23:30:56.0420744Z [ OK ] LazyOpsTest.TestAddMatMul (1 ms) 2023-01-11T23:30:56.0421143Z [ RUN ] LazyOpsTest.TestEmbedding 2023-01-11T23:30:56.0425670Z [ OK ] LazyOpsTest.TestEmbedding (0 ms) 2023-01-11T23:30:56.0426067Z [ RUN ] LazyOpsTest.TestOneHot 2023-01-11T23:30:56.0436243Z [ OK ] LazyOpsTest.TestOneHot (1 ms) 2023-01-11T23:30:56.0436644Z [ RUN ] LazyOpsTest.TestTranspose 2023-01-11T23:30:56.0440045Z [ OK ] LazyOpsTest.TestTranspose (0 ms) 2023-01-11T23:30:56.0440448Z [ RUN ] LazyOpsTest.TestTransposeInPlace 2023-01-11T23:30:56.0444914Z [ OK ] LazyOpsTest.TestTransposeInPlace (0 ms) 2023-01-11T23:30:56.0445340Z [ RUN ] LazyOpsTest.TestReshape 2023-01-11T23:30:56.0450139Z [ OK ] LazyOpsTest.TestReshape (0 ms) 2023-01-11T23:30:56.0450540Z [ RUN ] LazyOpsTest.TestResize 2023-01-11T23:30:56.0456301Z [ OK ] LazyOpsTest.TestResize (0 ms) 2023-01-11T23:30:56.0456677Z [ RUN ] LazyOpsTest.TestViewResize 2023-01-11T23:30:56.0464664Z [ OK ] LazyOpsTest.TestViewResize (0 ms) 2023-01-11T23:30:56.0465039Z [ RUN ] LazyOpsTest.TestView 2023-01-11T23:30:56.0469844Z [ OK ] LazyOpsTest.TestView (0 ms) 2023-01-11T23:30:56.0470295Z [ RUN ] LazyOpsTest.TestViewMod 2023-01-11T23:30:56.0485442Z [ OK ] LazyOpsTest.TestViewMod (1 ms) 2023-01-11T23:30:56.0485846Z [ RUN ] LazyOpsTest.TestViewModComplex 2023-01-11T23:30:56.0503975Z [ OK ] LazyOpsTest.TestViewModComplex (1 ms) 2023-01-11T23:30:56.0504384Z [ RUN ] LazyOpsTest.TestViewOfViewMod 2023-01-11T23:30:56.0525271Z [ OK ] LazyOpsTest.TestViewOfViewMod (2 ms) 2023-01-11T23:30:56.0525642Z [ RUN ] LazyOpsTest.TestViewSqueezeAddInPlace 2023-01-11T23:30:56.0540458Z [ OK ] LazyOpsTest.TestViewSqueezeAddInPlace (1 ms) 2023-01-11T23:30:56.0540860Z [ RUN ] LazyOpsTest.TestUnsafeView 2023-01-11T23:30:56.0546001Z [ OK ] LazyOpsTest.TestUnsafeView (0 ms) 2023-01-11T23:30:56.0546374Z [ RUN ] LazyOpsTest.TestNarrow 2023-01-11T23:30:56.0567776Z [ OK ] LazyOpsTest.TestNarrow (2 ms) 
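The TestWhere run above logs a deprecation warning because the test hands where() a uint8 condition tensor. A minimal Python sketch of the migration the warning asks for (shapes and values are invented for illustration, not taken from the test):

    import torch

    a = torch.randn(3, 3)
    b = torch.randn(3, 3)

    # Deprecated: a uint8 (byte) condition triggers the TensorCompare.cpp warning.
    cond_u8 = (a > 0).to(torch.uint8)
    out_old = torch.where(cond_u8, a, b)

    # Recommended: pass a boolean condition instead.
    out_new = torch.where(a > 0, a, b)

    assert torch.equal(out_old, out_new)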
2023-01-11T23:30:56.0568128Z [ RUN ] LazyOpsTest.TestNarrowUpdate 2023-01-11T23:30:56.0608354Z [ OK ] LazyOpsTest.TestNarrowUpdate (4 ms) 2023-01-11T23:30:56.0608773Z [ RUN ] LazyOpsTest.TestNarrowUpdateBaseCheck 2023-01-11T23:30:56.0642653Z [ OK ] LazyOpsTest.TestNarrowUpdateBaseCheck (3 ms) 2023-01-11T23:30:56.0643112Z [ RUN ] LazyOpsTest.TestNarrowUpdateTwoSlices 2023-01-11T23:30:56.0827822Z [ OK ] LazyOpsTest.TestNarrowUpdateTwoSlices (18 ms) 2023-01-11T23:30:56.0828212Z [ RUN ] LazyOpsTest.TestNarrowUpdateView 2023-01-11T23:30:56.0877299Z [ OK ] LazyOpsTest.TestNarrowUpdateView (4 ms) 2023-01-11T23:30:56.0877691Z [ RUN ] LazyOpsTest.TestNarrowInNarrowUpdate 2023-01-11T23:30:56.0943817Z [ OK ] LazyOpsTest.TestNarrowInNarrowUpdate (6 ms) 2023-01-11T23:30:56.0944357Z [ RUN ] LazyOpsTest.TestNarrowCopy 2023-01-11T23:30:56.0954464Z [ OK ] LazyOpsTest.TestNarrowCopy (1 ms) 2023-01-11T23:30:56.0954846Z [ RUN ] LazyOpsTest.TestViewAs 2023-01-11T23:30:56.0960391Z [ OK ] LazyOpsTest.TestViewAs (0 ms) 2023-01-11T23:30:56.0960795Z [ RUN ] LazyOpsTest.TestLogSoftmax 2023-01-11T23:30:56.0986832Z [ OK ] LazyOpsTest.TestLogSoftmax (2 ms) 2023-01-11T23:30:56.0987240Z [ RUN ] LazyOpsTest.TestLogSoftmaxCast 2023-01-11T23:30:56.1022565Z [ OK ] LazyOpsTest.TestLogSoftmaxCast (3 ms) 2023-01-11T23:30:56.1023016Z [ RUN ] LazyOpsTest.TestLogSoftmaxWrapper 2023-01-11T23:30:56.1048568Z [ OK ] LazyOpsTest.TestLogSoftmaxWrapper (2 ms) 2023-01-11T23:30:56.1048973Z [ RUN ] LazyOpsTest.TestSoftmax 2023-01-11T23:30:56.1075612Z [ OK ] LazyOpsTest.TestSoftmax (2 ms) 2023-01-11T23:30:56.1076127Z [ RUN ] LazyOpsTest.TestSoftmaxCast 2023-01-11T23:30:56.1112326Z [ OK ] LazyOpsTest.TestSoftmaxCast (3 ms) 2023-01-11T23:30:56.1112739Z [ RUN ] LazyOpsTest.TestSoftmaxWrapper 2023-01-11T23:30:56.1145062Z [ OK ] LazyOpsTest.TestSoftmaxWrapper (3 ms) 2023-01-11T23:30:56.1145404Z [ RUN ] LazyOpsTest.TestSoftplus 2023-01-11T23:30:56.1148881Z [ OK ] LazyOpsTest.TestSoftplus (0 ms) 2023-01-11T23:30:56.1149271Z [ RUN ] LazyOpsTest.TestMaxPool1D 2023-01-11T23:30:56.1249836Z [ OK ] LazyOpsTest.TestMaxPool1D (10 ms) 2023-01-11T23:30:56.1250222Z [ RUN ] LazyOpsTest.TestMaxPool2D 2023-01-11T23:30:56.1322629Z [ OK ] LazyOpsTest.TestMaxPool2D (7 ms) 2023-01-11T23:30:56.1323067Z [ RUN ] LazyOpsTest.TestMaxPool2DWithIndices 2023-01-11T23:30:56.1478637Z [ OK ] LazyOpsTest.TestMaxPool2DWithIndices (15 ms) 2023-01-11T23:30:56.1479215Z [ RUN ] LazyOpsTest.TestMaxPool2DNonSquare 2023-01-11T23:30:56.1550588Z [ OK ] LazyOpsTest.TestMaxPool2DNonSquare (7 ms) 2023-01-11T23:30:56.1551855Z [ RUN ] LazyOpsTest.TestMaxPool3D 2023-01-11T23:30:56.1581934Z [ OK ] LazyOpsTest.TestMaxPool3D (3 ms) 2023-01-11T23:30:56.1582355Z [ RUN ] LazyOpsTest.TestMaxPool3DWithIndices 2023-01-11T23:30:56.1626836Z [ OK ] LazyOpsTest.TestMaxPool3DWithIndices (4 ms) 2023-01-11T23:30:56.1627274Z [ RUN ] LazyOpsTest.TestMaxPool3DIncompleteAttributes 2023-01-11T23:30:56.1658202Z [ OK ] LazyOpsTest.TestMaxPool3DIncompleteAttributes (3 ms) 2023-01-11T23:30:56.1658625Z [ RUN ] LazyOpsTest.TestMaxPool3DNonSquare 2023-01-11T23:30:56.1689109Z [ OK ] LazyOpsTest.TestMaxPool3DNonSquare (3 ms) 2023-01-11T23:30:56.1689505Z [ RUN ] LazyOpsTest.TestMaxPool2DNoBatch 2023-01-11T23:30:56.1760305Z [ OK ] LazyOpsTest.TestMaxPool2DNoBatch (7 ms) 2023-01-11T23:30:56.1760689Z [ RUN ] LazyOpsTest.TestMaxPool3DNoBatch 2023-01-11T23:30:56.1794379Z [ OK ] LazyOpsTest.TestMaxPool3DNoBatch (3 ms) 2023-01-11T23:30:56.1794749Z [ RUN ] LazyOpsTest.TestAvgPool1D 2023-01-11T23:30:56.1887047Z [ OK ] 
LazyOpsTest.TestAvgPool1D (9 ms) 2023-01-11T23:30:56.1887385Z [ RUN ] LazyOpsTest.TestAvgPool2D 2023-01-11T23:30:56.1954565Z [ OK ] LazyOpsTest.TestAvgPool2D (6 ms) 2023-01-11T23:30:56.1954939Z [ RUN ] LazyOpsTest.TestAvgPool2DNonSquare 2023-01-11T23:30:56.2023183Z [ OK ] LazyOpsTest.TestAvgPool2DNonSquare (6 ms) 2023-01-11T23:30:56.2023538Z [ RUN ] LazyOpsTest.TestAvgPool3D 2023-01-11T23:30:56.2052315Z [ OK ] LazyOpsTest.TestAvgPool3D (2 ms) 2023-01-11T23:30:56.2052829Z [ RUN ] LazyOpsTest.TestAvgPool3DIncompleteAttributes 2023-01-11T23:30:56.2081330Z [ OK ] LazyOpsTest.TestAvgPool3DIncompleteAttributes (2 ms) 2023-01-11T23:30:56.2081746Z [ RUN ] LazyOpsTest.TestAvgPool3DNonSquare 2023-01-11T23:30:56.2109996Z [ OK ] LazyOpsTest.TestAvgPool3DNonSquare (2 ms) 2023-01-11T23:30:56.2110420Z [ RUN ] LazyOpsTest.TestAvgPool2DNoBatch 2023-01-11T23:30:56.2179283Z [ OK ] LazyOpsTest.TestAvgPool2DNoBatch (6 ms) 2023-01-11T23:30:56.2179651Z [ RUN ] LazyOpsTest.TestAvgPool3DNoBatch 2023-01-11T23:30:56.2206561Z [ OK ] LazyOpsTest.TestAvgPool3DNoBatch (2 ms) 2023-01-11T23:30:56.2207033Z [ RUN ] LazyOpsTest.TestAdaptiveAvgPool2D 2023-01-11T23:30:56.2214432Z [ OK ] LazyOpsTest.TestAdaptiveAvgPool2D (0 ms) 2023-01-11T23:30:56.2214997Z [ RUN ] LazyOpsTest.TestAdaptiveAvgPool3D 2023-01-11T23:30:56.2225829Z [ OK ] LazyOpsTest.TestAdaptiveAvgPool3D (1 ms) 2023-01-11T23:30:56.2226342Z [ RUN ] LazyOpsTest.TestAdaptiveAvgPool3DNoBatch 2023-01-11T23:30:56.2232555Z [ OK ] LazyOpsTest.TestAdaptiveAvgPool3DNoBatch (0 ms) 2023-01-11T23:30:56.2233032Z [ RUN ] LazyOpsTest.TestAdaptiveAvgPool2DNoBatch 2023-01-11T23:30:56.2238767Z [ OK ] LazyOpsTest.TestAdaptiveAvgPool2DNoBatch (0 ms) 2023-01-11T23:30:56.2239193Z [ RUN ] LazyOpsTest.TestMaxUnpool2D 2023-01-11T23:30:56.2277252Z [ OK ] LazyOpsTest.TestMaxUnpool2D (3 ms) 2023-01-11T23:30:56.2277617Z [ RUN ] LazyOpsTest.TestMaxUnpool3D 2023-01-11T23:30:56.2316136Z [ OK ] LazyOpsTest.TestMaxUnpool3D (3 ms) 2023-01-11T23:30:56.2316477Z [ RUN ] LazyOpsTest.TestNllLoss 2023-01-11T23:30:56.2316848Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:8173: Skipped 2023-01-11T23:30:56.2317065Z 2023-01-11T23:30:56.2317251Z [ SKIPPED ] LazyOpsTest.TestNllLoss (0 ms) 2023-01-11T23:30:56.2317587Z [ RUN ] LazyOpsTest.TestNllLoss2d 2023-01-11T23:30:56.2430164Z [ OK ] LazyOpsTest.TestNllLoss2d (11 ms) 2023-01-11T23:30:56.2430490Z [ RUN ] LazyOpsTest.TestSmoothL1Loss 2023-01-11T23:30:56.2454728Z [ OK ] LazyOpsTest.TestSmoothL1Loss (2 ms) 2023-01-11T23:30:56.2455059Z [ RUN ] LazyOpsTest.TestL1Loss 2023-01-11T23:30:56.2470119Z [ OK ] LazyOpsTest.TestL1Loss (1 ms) 2023-01-11T23:30:56.2470487Z [ RUN ] LazyOpsTest.TestL1LossBackward 2023-01-11T23:30:56.2523072Z [ OK ] LazyOpsTest.TestL1LossBackward (5 ms) 2023-01-11T23:30:56.2523434Z [ RUN ] LazyOpsTest.TestMseLoss 2023-01-11T23:30:56.2529714Z [ OK ] LazyOpsTest.TestMseLoss (0 ms) 2023-01-11T23:30:56.2530118Z [ RUN ] LazyOpsTest.TestMseLossBackward 2023-01-11T23:30:56.2568649Z [ OK ] LazyOpsTest.TestMseLossBackward (3 ms) 2023-01-11T23:30:56.2569011Z [ RUN ] LazyOpsTest.TestBatchNorm1D 2023-01-11T23:30:56.2594071Z [ OK ] LazyOpsTest.TestBatchNorm1D (2 ms) 2023-01-11T23:30:56.2594415Z [ RUN ] LazyOpsTest.TestBatchNorm2D 2023-01-11T23:30:56.2617682Z [ OK ] LazyOpsTest.TestBatchNorm2D (2 ms) 2023-01-11T23:30:56.2618001Z [ RUN ] LazyOpsTest.TestDim 2023-01-11T23:30:56.2618363Z [ OK ] LazyOpsTest.TestDim (0 ms) 2023-01-11T23:30:56.2618727Z [ RUN ] LazyOpsTest.TestContiguous 2023-01-11T23:30:56.2619258Z [ OK ] LazyOpsTest.TestContiguous (0 ms) 
2023-01-11T23:30:56.2619648Z [ RUN ] LazyOpsTest.TestSqueezeAll 2023-01-11T23:30:56.2623225Z [ OK ] LazyOpsTest.TestSqueezeAll (0 ms) 2023-01-11T23:30:56.2623582Z [ RUN ] LazyOpsTest.TestSqueezeAllInPlace 2023-01-11T23:30:56.2628005Z [ OK ] LazyOpsTest.TestSqueezeAllInPlace (0 ms) 2023-01-11T23:30:56.2628441Z [ RUN ] LazyOpsTest.TestSqueezeOne 2023-01-11T23:30:56.2656716Z [ OK ] LazyOpsTest.TestSqueezeOne (2 ms) 2023-01-11T23:30:56.2657080Z [ RUN ] LazyOpsTest.TestSqueezeOneInPlace 2023-01-11T23:30:56.2695067Z [ OK ] LazyOpsTest.TestSqueezeOneInPlace (3 ms) 2023-01-11T23:30:56.2695419Z [ RUN ] LazyOpsTest.TestUnsqueeze 2023-01-11T23:30:56.2716097Z [ OK ] LazyOpsTest.TestUnsqueeze (2 ms) 2023-01-11T23:30:56.2716498Z [ RUN ] LazyOpsTest.TestUnsqueezeInPlace 2023-01-11T23:30:56.2742924Z [ OK ] LazyOpsTest.TestUnsqueezeInPlace (2 ms) 2023-01-11T23:30:56.2743294Z [ RUN ] LazyOpsTest.TestMaskedFill 2023-01-11T23:30:56.2747214Z [ OK ] LazyOpsTest.TestMaskedFill (0 ms) 2023-01-11T23:30:56.2747616Z [ RUN ] LazyOpsTest.TestMaskedFillInPlace 2023-01-11T23:30:56.2753571Z [ OK ] LazyOpsTest.TestMaskedFillInPlace (0 ms) 2023-01-11T23:30:56.2753993Z [ RUN ] LazyOpsTest.TestMaskedFillBroadcast 2023-01-11T23:30:56.2757261Z [ OK ] LazyOpsTest.TestMaskedFillBroadcast (0 ms) 2023-01-11T23:30:56.2757647Z [ RUN ] LazyOpsTest.TestFill 2023-01-11T23:30:56.2762778Z [ OK ] LazyOpsTest.TestFill (0 ms) 2023-01-11T23:30:56.2763164Z [ RUN ] LazyOpsTest.TestFillWithRank0 2023-01-11T23:30:56.2766118Z [ OK ] LazyOpsTest.TestFillWithRank0 (0 ms) 2023-01-11T23:30:56.2766490Z [ RUN ] LazyOpsTest.TestPermute 2023-01-11T23:30:56.2810846Z [ OK ] LazyOpsTest.TestPermute (4 ms) 2023-01-11T23:30:56.2811217Z [ RUN ] LazyOpsTest.TestPermuteMod 2023-01-11T23:30:56.2971169Z [ OK ] LazyOpsTest.TestPermuteMod (16 ms) 2023-01-11T23:30:56.2971536Z [ RUN ] LazyOpsTest.TestFlip 2023-01-11T23:30:56.3020939Z [ OK ] LazyOpsTest.TestFlip (4 ms) 2023-01-11T23:30:56.3021313Z [ RUN ] LazyOpsTest.TestPixelShuffle 2023-01-11T23:30:56.3027301Z [ OK ] LazyOpsTest.TestPixelShuffle (0 ms) 2023-01-11T23:30:56.3027678Z [ RUN ] LazyOpsTest.TestSumToSize 2023-01-11T23:30:56.3032744Z [ OK ] LazyOpsTest.TestSumToSize (0 ms) 2023-01-11T23:30:56.3033137Z [ RUN ] LazyOpsTest.TestTransposeDims 2023-01-11T23:30:56.3036200Z [ OK ] LazyOpsTest.TestTransposeDims (0 ms) 2023-01-11T23:30:56.3036552Z [ RUN ] LazyOpsTest.TestTransposeDimsMod 2023-01-11T23:30:56.3049534Z [ OK ] LazyOpsTest.TestTransposeDimsMod (1 ms) 2023-01-11T23:30:56.3049901Z [ RUN ] LazyOpsTest.TestTransposeDimsInPlace 2023-01-11T23:30:56.3055089Z [ OK ] LazyOpsTest.TestTransposeDimsInPlace (0 ms) 2023-01-11T23:30:56.3055435Z [ RUN ] LazyOpsTest.TestSplit 2023-01-11T23:30:56.3106791Z [ OK ] LazyOpsTest.TestSplit (5 ms) 2023-01-11T23:30:56.3107118Z [ RUN ] LazyOpsTest.TestSplitEmpty 2023-01-11T23:30:56.3107743Z [ OK ] LazyOpsTest.TestSplitEmpty (0 ms) 2023-01-11T23:30:56.3108117Z [ RUN ] LazyOpsTest.TestSplitWithSizes 2023-01-11T23:30:56.3131922Z [ OK ] LazyOpsTest.TestSplitWithSizes (2 ms) 2023-01-11T23:30:56.3132273Z [ RUN ] LazyOpsTest.TestCrossImplicitDim 2023-01-11T23:30:56.3139232Z [ OK ] LazyOpsTest.TestCrossImplicitDim (0 ms) 2023-01-11T23:30:56.3139589Z [ RUN ] LazyOpsTest.TestCrossExplicitDim 2023-01-11T23:30:56.3147786Z [ OK ] LazyOpsTest.TestCrossExplicitDim (0 ms) 2023-01-11T23:30:56.3148195Z [ RUN ] LazyOpsTest.TestCrossZeroDim 2023-01-11T23:30:56.3148618Z [ OK ] LazyOpsTest.TestCrossZeroDim (0 ms) 2023-01-11T23:30:56.3148940Z [ RUN ] LazyOpsTest.TestTriu 2023-01-11T23:30:56.3186659Z [ OK 
] LazyOpsTest.TestTriu (3 ms) 2023-01-11T23:30:56.3186987Z [ RUN ] LazyOpsTest.TestTriuNonSquare 2023-01-11T23:30:56.3223573Z [ OK ] LazyOpsTest.TestTriuNonSquare (3 ms) 2023-01-11T23:30:56.3223921Z [ RUN ] LazyOpsTest.TestTriuBatch 2023-01-11T23:30:56.3261076Z [ OK ] LazyOpsTest.TestTriuBatch (3 ms) 2023-01-11T23:30:56.3261389Z [ RUN ] LazyOpsTest.TestTril 2023-01-11T23:30:56.3298849Z [ OK ] LazyOpsTest.TestTril (3 ms) 2023-01-11T23:30:56.3299171Z [ RUN ] LazyOpsTest.TestTrilNonSquare 2023-01-11T23:30:56.3336000Z [ OK ] LazyOpsTest.TestTrilNonSquare (3 ms) 2023-01-11T23:30:56.3336334Z [ RUN ] LazyOpsTest.TestTrilBatch 2023-01-11T23:30:56.3376166Z [ OK ] LazyOpsTest.TestTrilBatch (3 ms) 2023-01-11T23:30:56.3376487Z [ RUN ] LazyOpsTest.TestTriuInPlace 2023-01-11T23:30:56.3427835Z [ OK ] LazyOpsTest.TestTriuInPlace (5 ms) 2023-01-11T23:30:56.3428169Z [ RUN ] LazyOpsTest.TestTrilInPlace 2023-01-11T23:30:56.3479820Z [ OK ] LazyOpsTest.TestTrilInPlace (5 ms) 2023-01-11T23:30:56.3480211Z [ RUN ] LazyOpsTest.TestTrace 2023-01-11T23:30:56.3483556Z [ OK ] LazyOpsTest.TestTrace (0 ms) 2023-01-11T23:30:56.3483875Z [ RUN ] LazyOpsTest.TestTraceWide 2023-01-11T23:30:56.3487072Z [ OK ] LazyOpsTest.TestTraceWide (0 ms) 2023-01-11T23:30:56.3487408Z [ RUN ] LazyOpsTest.TestTraceNarrow 2023-01-11T23:30:56.3490757Z [ OK ] LazyOpsTest.TestTraceNarrow (0 ms) 2023-01-11T23:30:56.3491086Z [ RUN ] LazyOpsTest.TestDiagRank1 2023-01-11T23:30:56.3735005Z [ OK ] LazyOpsTest.TestDiagRank1 (24 ms) 2023-01-11T23:30:56.3735360Z [ RUN ] LazyOpsTest.TestDiagRank2 2023-01-11T23:30:56.3789659Z [ OK ] LazyOpsTest.TestDiagRank2 (5 ms) 2023-01-11T23:30:56.3790080Z [ RUN ] LazyOpsTest.TestDiagFlat 2023-01-11T23:30:56.4223054Z [ OK ] LazyOpsTest.TestDiagFlat (43 ms) 2023-01-11T23:30:56.4223478Z [ RUN ] LazyOpsTest.TestDiagonal 2023-01-11T23:30:56.4261921Z [ OK ] LazyOpsTest.TestDiagonal (4 ms) 2023-01-11T23:30:56.4262298Z [ RUN ] LazyOpsTest.TestDiagonalUpdate 2023-01-11T23:30:56.4388622Z [ OK ] LazyOpsTest.TestDiagonalUpdate (12 ms) 2023-01-11T23:30:56.4389006Z [ RUN ] LazyOpsTest.TestDiagonalNonSquare 2023-01-11T23:30:56.4428705Z [ OK ] LazyOpsTest.TestDiagonalNonSquare (4 ms) 2023-01-11T23:30:56.4429106Z [ RUN ] LazyOpsTest.TestDiagonalBatch 2023-01-11T23:30:56.4469435Z [ OK ] LazyOpsTest.TestDiagonalBatch (4 ms) 2023-01-11T23:30:56.4469767Z [ RUN ] LazyOpsTest.TestFlatten 2023-01-11T23:30:56.4549047Z [ OK ] LazyOpsTest.TestFlatten (7 ms) 2023-01-11T23:30:56.4549378Z [ RUN ] LazyOpsTest.TestLogicalAnd 2023-01-11T23:30:56.4593982Z [ OK ] LazyOpsTest.TestLogicalAnd (4 ms) 2023-01-11T23:30:56.4594379Z [ RUN ] LazyOpsTest.TestBitwiseAnd 2023-01-11T23:30:56.4596564Z [ OK ] LazyOpsTest.TestBitwiseAnd (0 ms) 2023-01-11T23:30:56.4596976Z [ RUN ] LazyOpsTest.TestBitwiseAndInPlace 2023-01-11T23:30:56.4601189Z [ OK ] LazyOpsTest.TestBitwiseAndInPlace (0 ms) 2023-01-11T23:30:56.4601599Z [ RUN ] LazyOpsTest.TestBitwiseAndScalar 2023-01-11T23:30:56.4604582Z [ OK ] LazyOpsTest.TestBitwiseAndScalar (0 ms) 2023-01-11T23:30:56.4605039Z [ RUN ] LazyOpsTest.TestBitwiseAndScalarInPlace 2023-01-11T23:30:56.4609226Z [ OK ] LazyOpsTest.TestBitwiseAndScalarInPlace (0 ms) 2023-01-11T23:30:56.4609671Z [ RUN ] LazyOpsTest.TestBitwiseAndPromotion 2023-01-11T23:30:56.4615412Z [ OK ] LazyOpsTest.TestBitwiseAndPromotion (0 ms) 2023-01-11T23:30:56.4615759Z [ RUN ] LazyOpsTest.TestBitwiseOr 2023-01-11T23:30:56.4618822Z [ OK ] LazyOpsTest.TestBitwiseOr (0 ms) 2023-01-11T23:30:56.4619539Z [ RUN ] LazyOpsTest.TestBitwiseOrInPlace 2023-01-11T23:30:56.4623397Z [ OK 
] LazyOpsTest.TestBitwiseOrInPlace (0 ms) 2023-01-11T23:30:56.4623862Z [ RUN ] LazyOpsTest.TestBitwiseOrScalar 2023-01-11T23:30:56.4626759Z [ OK ] LazyOpsTest.TestBitwiseOrScalar (0 ms) 2023-01-11T23:30:56.4627222Z [ RUN ] LazyOpsTest.TestBitwiseOrScalarInPlace 2023-01-11T23:30:56.4632562Z [ OK ] LazyOpsTest.TestBitwiseOrScalarInPlace (0 ms) 2023-01-11T23:30:56.4633059Z [ RUN ] LazyOpsTest.TestBitwiseXor 2023-01-11T23:30:56.4634073Z [ OK ] LazyOpsTest.TestBitwiseXor (0 ms) 2023-01-11T23:30:56.4634500Z [ RUN ] LazyOpsTest.TestBitwiseXorInPlace 2023-01-11T23:30:56.4635894Z [ OK ] LazyOpsTest.TestBitwiseXorInPlace (0 ms) 2023-01-11T23:30:56.4636351Z [ RUN ] LazyOpsTest.TestBitwiseXorScalar 2023-01-11T23:30:56.4637013Z [ OK ] LazyOpsTest.TestBitwiseXorScalar (0 ms) 2023-01-11T23:30:56.4637604Z [ RUN ] LazyOpsTest.TestBitwiseXorScalarInPlace 2023-01-11T23:30:56.4638882Z [ OK ] LazyOpsTest.TestBitwiseXorScalarInPlace (0 ms) 2023-01-11T23:30:56.4639238Z [ RUN ] LazyOpsTest.TestLshift 2023-01-11T23:30:56.4641565Z [ OK ] LazyOpsTest.TestLshift (0 ms) 2023-01-11T23:30:56.4641961Z [ RUN ] LazyOpsTest.TestLshiftInPlace 2023-01-11T23:30:56.4645567Z [ OK ] LazyOpsTest.TestLshiftInPlace (0 ms) 2023-01-11T23:30:56.4645965Z [ RUN ] LazyOpsTest.TestLshiftScalar 2023-01-11T23:30:56.4647376Z [ OK ] LazyOpsTest.TestLshiftScalar (0 ms) 2023-01-11T23:30:56.4647763Z [ RUN ] LazyOpsTest.TestLshiftScalarInPlace 2023-01-11T23:30:56.4650463Z [ OK ] LazyOpsTest.TestLshiftScalarInPlace (0 ms) 2023-01-11T23:30:56.4650807Z [ RUN ] LazyOpsTest.TestRshift 2023-01-11T23:30:56.4652735Z [ OK ] LazyOpsTest.TestRshift (0 ms) 2023-01-11T23:30:56.4653079Z [ RUN ] LazyOpsTest.TestRshiftInPlace 2023-01-11T23:30:56.4656563Z [ OK ] LazyOpsTest.TestRshiftInPlace (0 ms) 2023-01-11T23:30:56.4656905Z [ RUN ] LazyOpsTest.TestRshiftScalar 2023-01-11T23:30:56.4658599Z [ OK ] LazyOpsTest.TestRshiftScalar (0 ms) 2023-01-11T23:30:56.4658962Z [ RUN ] LazyOpsTest.TestRshiftScalarInPlace 2023-01-11T23:30:56.4661745Z [ OK ] LazyOpsTest.TestRshiftScalarInPlace (0 ms) 2023-01-11T23:30:56.4662088Z [ RUN ] LazyOpsTest.TestMeshgrid 2023-01-11T23:30:56.4662461Z [W TensorShape.cpp:3452] Warning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. 
(function operator()) 2023-01-11T23:30:56.4676421Z [ OK ] LazyOpsTest.TestMeshgrid (1 ms) 2023-01-11T23:30:56.4676757Z [ RUN ] LazyOpsTest.TestConstantPad 2023-01-11T23:30:56.4681686Z [ OK ] LazyOpsTest.TestConstantPad (0 ms) 2023-01-11T23:30:56.4682045Z [ RUN ] LazyOpsTest.TestConstantPadIncomplete 2023-01-11T23:30:56.4686156Z [ OK ] LazyOpsTest.TestConstantPadIncomplete (0 ms) 2023-01-11T23:30:56.4686590Z [ RUN ] LazyOpsTest.TestReflectionPad2dRank3 2023-01-11T23:30:56.4688308Z [ OK ] LazyOpsTest.TestReflectionPad2dRank3 (0 ms) 2023-01-11T23:30:56.4688688Z [ RUN ] LazyOpsTest.TestReflectionPad2dRank4 2023-01-11T23:30:56.4690687Z [ OK ] LazyOpsTest.TestReflectionPad2dRank4 (0 ms) 2023-01-11T23:30:56.4691084Z [ RUN ] LazyOpsTest.TestReflectionPad2dBackward 2023-01-11T23:30:56.4704576Z [ OK ] LazyOpsTest.TestReflectionPad2dBackward (1 ms) 2023-01-11T23:30:56.4704951Z [ RUN ] LazyOpsTest.TestReplicationPad1d 2023-01-11T23:30:56.4706982Z [ OK ] LazyOpsTest.TestReplicationPad1d (0 ms) 2023-01-11T23:30:56.4707548Z [ RUN ] LazyOpsTest.TestReplicationPad1dZeroPad 2023-01-11T23:30:56.4709519Z [ OK ] LazyOpsTest.TestReplicationPad1dZeroPad (0 ms) 2023-01-11T23:30:56.4710009Z [ RUN ] LazyOpsTest.TestReplicationPad1dBackward 2023-01-11T23:30:56.4720296Z [ OK ] LazyOpsTest.TestReplicationPad1dBackward (1 ms) 2023-01-11T23:30:56.4720741Z [ RUN ] LazyOpsTest.TestReplicationPad2d 2023-01-11T23:30:56.4722484Z [ OK ] LazyOpsTest.TestReplicationPad2d (0 ms) 2023-01-11T23:30:56.4723048Z [ RUN ] LazyOpsTest.TestReplicationPad2dZeroPad 2023-01-11T23:30:56.4724698Z [ OK ] LazyOpsTest.TestReplicationPad2dZeroPad (0 ms) 2023-01-11T23:30:56.4725155Z [ RUN ] LazyOpsTest.TestReplicationPad2dBackward 2023-01-11T23:30:56.4736825Z [ OK ] LazyOpsTest.TestReplicationPad2dBackward (1 ms) 2023-01-11T23:30:56.4737192Z [ RUN ] LazyOpsTest.TestAsStrided 2023-01-11T23:30:56.4745494Z [ OK ] LazyOpsTest.TestAsStrided (0 ms) 2023-01-11T23:30:56.4746007Z [ RUN ] LazyOpsTest.TestAsStridedInPlace 2023-01-11T23:30:56.4775218Z [ OK ] LazyOpsTest.TestAsStridedInPlace (1 ms) 2023-01-11T23:30:56.4775811Z [ RUN ] LazyOpsTest.TestAsStridedWithOffset 2023-01-11T23:30:56.4776230Z [ OK ] LazyOpsTest.TestAsStridedWithOffset (0 ms) 2023-01-11T23:30:56.4776640Z [ RUN ] LazyOpsTest.TestAsStridedWithInplaceCopy 2023-01-11T23:30:56.4777081Z [ OK ] LazyOpsTest.TestAsStridedWithInplaceCopy (0 ms) 2023-01-11T23:30:56.4777480Z [ RUN ] LazyOpsTest.TestEmptyStrided 2023-01-11T23:30:56.4777846Z [ OK ] LazyOpsTest.TestEmptyStrided (0 ms) 2023-01-11T23:30:56.4778235Z [ RUN ] LazyOpsTest.TestAvgPool2DBackward 2023-01-11T23:30:56.5001793Z [ OK ] LazyOpsTest.TestAvgPool2DBackward (22 ms) 2023-01-11T23:30:56.5002167Z [ RUN ] LazyOpsTest.TestAvgPool3DBackward 2023-01-11T23:30:56.5154783Z [ OK ] LazyOpsTest.TestAvgPool3DBackward (15 ms) 2023-01-11T23:30:56.5155200Z [ RUN ] LazyOpsTest.TestAvgPool2DNoBatchBackward 2023-01-11T23:30:56.5383169Z [ OK ] LazyOpsTest.TestAvgPool2DNoBatchBackward (22 ms) 2023-01-11T23:30:56.5383605Z [ RUN ] LazyOpsTest.TestAvgPool3DNoBatchBackward 2023-01-11T23:30:56.5538040Z [ OK ] LazyOpsTest.TestAvgPool3DNoBatchBackward (15 ms) 2023-01-11T23:30:56.5538533Z [ RUN ] LazyOpsTest.TestAdaptiveAvgPool3DNoBatchBackward 2023-01-11T23:30:56.5538984Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:10048: Skipped 2023-01-11T23:30:56.5539226Z 2023-01-11T23:30:56.5539489Z [ SKIPPED ] LazyOpsTest.TestAdaptiveAvgPool3DNoBatchBackward (0 ms) 2023-01-11T23:30:56.5539926Z [ RUN ] LazyOpsTest.TestAdaptiveAvgPool3DBackward 
2023-01-11T23:30:56.5540254Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:10071: Skipped 2023-01-11T23:30:56.5540430Z 2023-01-11T23:30:56.5540660Z [ SKIPPED ] LazyOpsTest.TestAdaptiveAvgPool3DBackward (0 ms) 2023-01-11T23:30:56.5541063Z [ RUN ] LazyOpsTest.TestAdaptiveAvgPool2DBackward 2023-01-11T23:30:56.5569482Z [ OK ] LazyOpsTest.TestAdaptiveAvgPool2DBackward (3 ms) 2023-01-11T23:30:56.5569903Z [ RUN ] LazyOpsTest.TestAdaptiveAvgPool2DNoBatchBackward 2023-01-11T23:30:56.5596859Z [ OK ] LazyOpsTest.TestAdaptiveAvgPool2DNoBatchBackward (2 ms) 2023-01-11T23:30:56.5597237Z [ RUN ] LazyOpsTest.TestConv2D 2023-01-11T23:30:57.6765825Z [ OK ] LazyOpsTest.TestConv2D (1116 ms) 2023-01-11T23:30:57.6766278Z [ RUN ] LazyOpsTest.TestConv2DBackward 2023-01-11T23:30:58.2265676Z [ OK ] LazyOpsTest.TestConv2DBackward (549 ms) 2023-01-11T23:30:58.2266471Z [ RUN ] LazyOpsTest.TestTransposedConv2DBackward 2023-01-11T23:30:58.5929696Z [ OK ] LazyOpsTest.TestTransposedConv2DBackward (366 ms) 2023-01-11T23:30:58.5930194Z [ RUN ] LazyOpsTest.TestConv3DBackward 2023-01-11T23:30:58.8327166Z [ OK ] LazyOpsTest.TestConv3DBackward (239 ms) 2023-01-11T23:30:58.8327561Z [ RUN ] LazyOpsTest.TestTransposedConv3DBackward 2023-01-11T23:30:59.2616517Z [ OK ] LazyOpsTest.TestTransposedConv3DBackward (428 ms) 2023-01-11T23:30:59.2616967Z [ RUN ] LazyOpsTest.TestMaxPool2DBackward 2023-01-11T23:30:59.2746560Z [ OK ] LazyOpsTest.TestMaxPool2DBackward (13 ms) 2023-01-11T23:30:59.2746941Z [ RUN ] LazyOpsTest.TestMaxPool3DBackward 2023-01-11T23:30:59.2836087Z [ OK ] LazyOpsTest.TestMaxPool3DBackward (8 ms) 2023-01-11T23:30:59.2836473Z [ RUN ] LazyOpsTest.TestMaxPool2DNoBatchBackward 2023-01-11T23:30:59.2965010Z [ OK ] LazyOpsTest.TestMaxPool2DNoBatchBackward (12 ms) 2023-01-11T23:30:59.2965475Z [ RUN ] LazyOpsTest.TestMaxPool3DNoBatchBackward 2023-01-11T23:30:59.3052022Z [ OK ] LazyOpsTest.TestMaxPool3DNoBatchBackward (8 ms) 2023-01-11T23:30:59.3052414Z [ RUN ] LazyOpsTest.TestMaxUnpool2DBackward 2023-01-11T23:30:59.3312798Z [ OK ] LazyOpsTest.TestMaxUnpool2DBackward (25 ms) 2023-01-11T23:30:59.3313193Z [ RUN ] LazyOpsTest.TestMaxUnpool3DBackward 2023-01-11T23:30:59.3536531Z [ OK ] LazyOpsTest.TestMaxUnpool3DBackward (22 ms) 2023-01-11T23:30:59.3536924Z [ RUN ] LazyOpsTest.TestTanhBackward 2023-01-11T23:30:59.3550063Z [ OK ] LazyOpsTest.TestTanhBackward (1 ms) 2023-01-11T23:30:59.3550466Z [ RUN ] LazyOpsTest.TestSigmoidBackward 2023-01-11T23:30:59.3563625Z [ OK ] LazyOpsTest.TestSigmoidBackward (1 ms) 2023-01-11T23:30:59.3564092Z [ RUN ] LazyOpsTest.TestLogSigmoidBackward 2023-01-11T23:30:59.3577295Z [ OK ] LazyOpsTest.TestLogSigmoidBackward (1 ms) 2023-01-11T23:30:59.3577716Z [ RUN ] LazyOpsTest.TestLogSoftmaxBackward 2023-01-11T23:30:59.3683535Z [ OK ] LazyOpsTest.TestLogSoftmaxBackward (10 ms) 2023-01-11T23:30:59.3683949Z [ RUN ] LazyOpsTest.TestSoftmaxBackward 2023-01-11T23:30:59.3791390Z [ OK ] LazyOpsTest.TestSoftmaxBackward (10 ms) 2023-01-11T23:30:59.3792077Z [ RUN ] LazyOpsTest.TestSoftplusBackward 2023-01-11T23:30:59.3804632Z [ OK ] LazyOpsTest.TestSoftplusBackward (1 ms) 2023-01-11T23:30:59.3804996Z [ RUN ] LazyOpsTest.TestReluBackward 2023-01-11T23:30:59.3818233Z [ OK ] LazyOpsTest.TestReluBackward (1 ms) 2023-01-11T23:30:59.3818583Z [ RUN ] LazyOpsTest.TestRreluBackward 2023-01-11T23:30:59.3832728Z [ OK ] LazyOpsTest.TestRreluBackward (1 ms) 2023-01-11T23:30:59.3833102Z [ RUN ] LazyOpsTest.TestHardshrinkBackward 2023-01-11T23:30:59.3844436Z [ OK ] LazyOpsTest.TestHardshrinkBackward (1 ms) 
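The TestMeshgrid entry earlier in this block logs that torch.meshgrid will soon require an explicit indexing argument. A minimal sketch of the forward-compatible call (sizes invented for illustration):

    import torch

    x = torch.arange(3)
    y = torch.arange(4)

    # Calling meshgrid without indexing emits the TensorShape.cpp warning;
    # the implicit default matches "ij".
    gx_old, gy_old = torch.meshgrid(x, y)

    # Forward-compatible: pass indexing explicitly.
    gx, gy = torch.meshgrid(x, y, indexing="ij")

    assert torch.equal(gx, gx_old) and torch.equal(gy, gy_old)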
2023-01-11T23:30:59.3844822Z [ RUN ] LazyOpsTest.TestSoftshrinkBackward 2023-01-11T23:30:59.3857044Z [ OK ] LazyOpsTest.TestSoftshrinkBackward (1 ms) 2023-01-11T23:30:59.3857419Z [ RUN ] LazyOpsTest.TestHardtanhBackward 2023-01-11T23:30:59.3865668Z [ OK ] LazyOpsTest.TestHardtanhBackward (0 ms) 2023-01-11T23:30:59.3866011Z [ RUN ] LazyOpsTest.TestEluBackward 2023-01-11T23:30:59.3880322Z [ OK ] LazyOpsTest.TestEluBackward (1 ms) 2023-01-11T23:30:59.3880661Z [ RUN ] LazyOpsTest.TestGeluBackward 2023-01-11T23:30:59.3893742Z [ OK ] LazyOpsTest.TestGeluBackward (1 ms) 2023-01-11T23:30:59.3894093Z [ RUN ] LazyOpsTest.TestLeakyReluBackward 2023-01-11T23:30:59.3907597Z [ OK ] LazyOpsTest.TestLeakyReluBackward (1 ms) 2023-01-11T23:30:59.3907970Z [ RUN ] LazyOpsTest.TestTransposeBackward 2023-01-11T23:30:59.3920333Z [ OK ] LazyOpsTest.TestTransposeBackward (1 ms) 2023-01-11T23:30:59.3920707Z [ RUN ] LazyOpsTest.TestAddMatMulBackward 2023-01-11T23:30:59.3994469Z [ OK ] LazyOpsTest.TestAddMatMulBackward (7 ms) 2023-01-11T23:30:59.3994869Z [ RUN ] LazyOpsTest.TestBinaryCrossEntropyBackward 2023-01-11T23:30:59.4074640Z [ OK ] LazyOpsTest.TestBinaryCrossEntropyBackward (8 ms) 2023-01-11T23:30:59.4075083Z [ RUN ] LazyOpsTest.TestNllLossBackward 2023-01-11T23:30:59.4075498Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:10954: Skipped 2023-01-11T23:30:59.4075738Z 2023-01-11T23:30:59.4075948Z [ SKIPPED ] LazyOpsTest.TestNllLossBackward (0 ms) 2023-01-11T23:30:59.4076294Z [ RUN ] LazyOpsTest.TestNllLoss2dBackward 2023-01-11T23:30:59.4478136Z [ OK ] LazyOpsTest.TestNllLoss2dBackward (40 ms) 2023-01-11T23:30:59.4478662Z [ RUN ] LazyOpsTest.TestSmoothL1LossBackward 2023-01-11T23:30:59.4565467Z [ OK ] LazyOpsTest.TestSmoothL1LossBackward (8 ms) 2023-01-11T23:30:59.4566057Z [ RUN ] LazyOpsTest.TestViewBackward 2023-01-11T23:30:59.4584397Z [ OK ] LazyOpsTest.TestViewBackward (1 ms) 2023-01-11T23:30:59.4584764Z [ RUN ] LazyOpsTest.TestBatchNorm2DBackward 2023-01-11T23:30:59.4651264Z [ OK ] LazyOpsTest.TestBatchNorm2DBackward (6 ms) 2023-01-11T23:30:59.4651641Z [ RUN ] LazyOpsTest.TestBatchNorm3DBackward 2023-01-11T23:30:59.4717170Z [ OK ] LazyOpsTest.TestBatchNorm3DBackward (6 ms) 2023-01-11T23:30:59.4717558Z [ RUN ] LazyOpsTest.TestBCEWithLogitsBackward 2023-01-11T23:30:59.5181059Z [ OK ] LazyOpsTest.TestBCEWithLogitsBackward (46 ms) 2023-01-11T23:30:59.5181432Z [ RUN ] LazyOpsTest.TestKlDivBackward 2023-01-11T23:30:59.5291887Z [ OK ] LazyOpsTest.TestKlDivBackward (11 ms) 2023-01-11T23:30:59.5292264Z [ RUN ] LazyOpsTest.TestEmbeddingBackward 2023-01-11T23:30:59.6342171Z [ OK ] LazyOpsTest.TestEmbeddingBackward (104 ms) 2023-01-11T23:30:59.6343189Z [ RUN ] LazyOpsTest.TestAmpForeachNonFiniteCheckAndUnscale 2023-01-11T23:30:59.6343905Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:11331: Skipped 2023-01-11T23:30:59.6344229Z 2023-01-11T23:30:59.6344526Z [ SKIPPED ] LazyOpsTest.TestAmpForeachNonFiniteCheckAndUnscale (0 ms) 2023-01-11T23:30:59.6344940Z [ RUN ] LazyOpsTest.TestAmpUpdateScale 2023-01-11T23:30:59.6358951Z [ OK ] LazyOpsTest.TestAmpUpdateScale (1 ms) 2023-01-11T23:30:59.6359339Z [ RUN ] LazyOpsTest.TestEarlySyncLiveTensors 2023-01-11T23:30:59.6359738Z [ OK ] LazyOpsTest.TestEarlySyncLiveTensors (0 ms) 2023-01-11T23:30:59.6360096Z [ RUN ] LazyOpsTest.TestLerp 2023-01-11T23:30:59.6363294Z [ OK ] LazyOpsTest.TestLerp (0 ms) 2023-01-11T23:30:59.6363675Z [ RUN ] LazyOpsTest.TestLerpScalar 2023-01-11T23:30:59.6365795Z [ OK ] LazyOpsTest.TestLerpScalar (0 ms) 2023-01-11T23:30:59.6366172Z 
[ RUN ] LazyOpsTest.TestLerpInplace 2023-01-11T23:30:59.6368626Z [ OK ] LazyOpsTest.TestLerpInplace (0 ms) 2023-01-11T23:30:59.6369047Z [ RUN ] LazyOpsTest.TestLerpScalarInplace 2023-01-11T23:30:59.6371770Z [ OK ] LazyOpsTest.TestLerpScalarInplace (0 ms) 2023-01-11T23:30:59.6372216Z [ RUN ] LazyOpsTest.TestLerpOut 2023-01-11T23:30:59.6375052Z [ OK ] LazyOpsTest.TestLerpOut (0 ms) 2023-01-11T23:30:59.6375454Z [ RUN ] LazyOpsTest.TestLerpScalarOut 2023-01-11T23:30:59.6378084Z [ OK ] LazyOpsTest.TestLerpScalarOut (0 ms) 2023-01-11T23:30:59.6378563Z [ RUN ] LazyOpsTest.IsAliasOf 2023-01-11T23:30:59.6378975Z [ OK ] LazyOpsTest.IsAliasOf (0 ms) 2023-01-11T23:30:59.6379513Z [----------] 574 tests from LazyOpsTest (9142 ms total) 2023-01-11T23:30:59.6379690Z 2023-01-11T23:30:59.6379867Z [----------] Global test environment tear-down 2023-01-11T23:30:59.6456031Z [==========] 611 tests from 10 test suites ran. (9151 ms total) 2023-01-11T23:30:59.6456833Z [ PASSED ] 594 tests. 2023-01-11T23:30:59.6457569Z [ SKIPPED ] 17 tests, listed below: 2023-01-11T23:30:59.6458339Z [ SKIPPED ] LazyOpsTest.TestInverse 2023-01-11T23:30:59.6459074Z [ SKIPPED ] LazyOpsTest.TestOneIndexPut 2023-01-11T23:30:59.6459751Z [ SKIPPED ] LazyOpsTest.TestOneIndexPutInPlace 2023-01-11T23:30:59.6460490Z [ SKIPPED ] LazyOpsTest.TestOneIndexPutTransfer 2023-01-11T23:30:59.6461177Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPut 2023-01-11T23:30:59.6461885Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutHeadNull 2023-01-11T23:30:59.6462626Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutMiddleNull 2023-01-11T23:30:59.6463391Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutTailNull 2023-01-11T23:30:59.6464134Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutMiddleBroadcast 2023-01-11T23:30:59.6464555Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutTailBroadcast 2023-01-11T23:30:59.6464914Z [ SKIPPED ] LazyOpsTest.TestIndexPutImpl 2023-01-11T23:30:59.6465255Z [ SKIPPED ] LazyOpsTest.TestIndexCopyInPlace 2023-01-11T23:30:59.6465567Z [ SKIPPED ] LazyOpsTest.TestNllLoss 2023-01-11T23:30:59.6465945Z [ SKIPPED ] LazyOpsTest.TestAdaptiveAvgPool3DNoBatchBackward 2023-01-11T23:30:59.6466363Z [ SKIPPED ] LazyOpsTest.TestAdaptiveAvgPool3DBackward 2023-01-11T23:30:59.6466727Z [ SKIPPED ] LazyOpsTest.TestNllLossBackward 2023-01-11T23:30:59.6467119Z [ SKIPPED ] LazyOpsTest.TestAmpForeachNonFiniteCheckAndUnscale 2023-01-11T23:31:00.2592945Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *-tsan* ]] 2023-01-11T23:31:00.2593720Z + python test/cpp/jit/tests_setup.py shutdown 2023-01-11T23:31:01.6112935Z + wait 2023-01-11T23:31:01.6113191Z + OMP_NUM_THREADS=2 2023-01-11T23:31:01.6113443Z + TORCH_CPP_TEST_MNIST_PATH=test/cpp/api/mnist 2023-01-11T23:31:01.6114134Z + /opt/conda/lib/python3.10/site-packages/torch/bin/test_api '--gtest_filter=-IMethodTest.*' --gtest_output=xml:test/test-reports/cpp-unittest/test_libtorch/test_api.xml 2023-01-11T23:31:01.9924415Z Only one CUDA device detected. Disabling MultiCUDA tests 2023-01-11T23:31:01.9930805Z Note: Google Test filter = -IMethodTest.*:*_MultiCUDA 2023-01-11T23:31:01.9931241Z [==========] Running 1035 tests from 49 test suites. 2023-01-11T23:31:01.9931557Z [----------] Global test environment set-up. 
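The test_api binary above runs with the negative filter -IMethodTest.*, and because only one CUDA device is visible, gtest additionally excludes *_MultiCUDA. A rough Python analogue of that device-count gating, offered only to illustrate the check involved (not how the C++ harness is implemented):

    import torch

    # With fewer than two visible GPUs, multi-device cases are skipped
    # (the harness above achieves this by appending :*_MultiCUDA to the
    # negative --gtest_filter).
    if torch.cuda.device_count() < 2:
        print("Only one CUDA device detected. Disabling MultiCUDA tests")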
2023-01-11T23:31:01.9931880Z [----------] 9 tests from AutogradAPITests 2023-01-11T23:31:01.9932219Z [ RUN ] AutogradAPITests.BackwardSimpleTest 2023-01-11T23:31:01.9947043Z [ OK ] AutogradAPITests.BackwardSimpleTest (1 ms) 2023-01-11T23:31:01.9947421Z [ RUN ] AutogradAPITests.BackwardTest 2023-01-11T23:31:01.9948056Z [W engine.cpp:1134] Warning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. (function operator()) 2023-01-11T23:31:01.9952414Z [ OK ] AutogradAPITests.BackwardTest (0 ms) 2023-01-11T23:31:01.9952786Z [ RUN ] AutogradAPITests.GradSimpleTest 2023-01-11T23:31:01.9953239Z [ OK ] AutogradAPITests.GradSimpleTest (0 ms) 2023-01-11T23:31:01.9953581Z [ RUN ] AutogradAPITests.GradTest 2023-01-11T23:31:01.9956943Z [ OK ] AutogradAPITests.GradTest (0 ms) 2023-01-11T23:31:01.9957413Z [ RUN ] AutogradAPITests.GradNonLeafTest 2023-01-11T23:31:01.9959522Z [W TensorBody.h:485] Warning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (function grad) [two more identical TensorBody.h:485 warnings at 23:31:01.9960932Z and 23:31:01.9962328Z omitted] 2023-01-11T23:31:01.9963829Z [W TensorBody.h:485] Warning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
(function grad) 2023-01-11T23:31:01.9965206Z [ OK ] AutogradAPITests.GradNonLeafTest (0 ms) 2023-01-11T23:31:01.9965580Z [ RUN ] AutogradAPITests.GradUnreachableTest 2023-01-11T23:31:01.9993561Z [ OK ] AutogradAPITests.GradUnreachableTest (2 ms) 2023-01-11T23:31:01.9993941Z [ RUN ] AutogradAPITests.EmptyInput 2023-01-11T23:31:02.0021880Z [ OK ] AutogradAPITests.EmptyInput (2 ms) 2023-01-11T23:31:02.0022737Z [ RUN ] AutogradAPITests.RetainGrad 2023-01-11T23:31:02.0024202Z [ OK ] AutogradAPITests.RetainGrad (0 ms) 2023-01-11T23:31:02.0024847Z [ RUN ] AutogradAPITests.AnomalyMode 2023-01-11T23:31:02.0025364Z [W anomaly_mode.cpp:27] Warning: This mode should be enabled only for debugging as the different tests will slow down your program execution. (function operator()) 2023-01-11T23:31:02.1628487Z [ OK ] AutogradAPITests.AnomalyMode (160 ms) 2023-01-11T23:31:02.1629022Z [----------] 9 tests from AutogradAPITests (169 ms total) 2023-01-11T23:31:02.1629217Z 2023-01-11T23:31:02.1629395Z [----------] 33 tests from CustomAutogradTest 2023-01-11T23:31:02.1629785Z [ RUN ] CustomAutogradTest.GradUnreachableDiscoveryTest 2023-01-11T23:31:02.1630374Z [ OK ] CustomAutogradTest.GradUnreachableDiscoveryTest (0 ms) 2023-01-11T23:31:02.1630808Z [ RUN ] CustomAutogradTest.CustomFunction 2023-01-11T23:31:02.1631214Z [ OK ] CustomAutogradTest.CustomFunction (0 ms) 2023-01-11T23:31:02.1631803Z [ RUN ] CustomAutogradTest.CustomFunctionWithTensorList 2023-01-11T23:31:02.1632379Z [ OK ] CustomAutogradTest.CustomFunctionWithTensorList (0 ms) 2023-01-11T23:31:02.1632892Z [ RUN ] CustomAutogradTest.GraphTaskTrimEdges 2023-01-11T23:31:02.1636381Z [ OK ] CustomAutogradTest.GraphTaskTrimEdges (0 ms) 2023-01-11T23:31:02.1637037Z [ RUN ] CustomAutogradTest.FunctionReturnsInput 2023-01-11T23:31:02.1637596Z [ OK ] CustomAutogradTest.FunctionReturnsInput (0 ms) 2023-01-11T23:31:02.1638273Z [ RUN ] CustomAutogradTest.FunctionReturnsUndefined 2023-01-11T23:31:02.1638701Z [ OK ] CustomAutogradTest.FunctionReturnsUndefined (0 ms) 2023-01-11T23:31:02.1639107Z [ RUN ] CustomAutogradTest.MaterializeGrads 2023-01-11T23:31:02.1639470Z [ OK ] CustomAutogradTest.MaterializeGrads (0 ms) 2023-01-11T23:31:02.1639967Z [ RUN ] CustomAutogradTest.DontMaterializeGrads 2023-01-11T23:31:02.1640594Z [ OK ] CustomAutogradTest.DontMaterializeGrads (0 ms) 2023-01-11T23:31:02.1641088Z [ RUN ] CustomAutogradTest.NoGradCustomFunction 2023-01-11T23:31:02.1641475Z [ OK ] CustomAutogradTest.NoGradCustomFunction (0 ms) 2023-01-11T23:31:02.1641841Z [ RUN ] CustomAutogradTest.MarkDirty 2023-01-11T23:31:02.1642352Z [ OK ] CustomAutogradTest.MarkDirty (0 ms) 2023-01-11T23:31:02.1642890Z [ RUN ] CustomAutogradTest.MarkNonDifferentiable 2023-01-11T23:31:02.1643403Z [ OK ] CustomAutogradTest.MarkNonDifferentiable (0 ms) 2023-01-11T23:31:02.1643825Z [ RUN ] CustomAutogradTest.MarkNonDifferentiableMixed 2023-01-11T23:31:02.1644264Z [ OK ] CustomAutogradTest.MarkNonDifferentiableMixed (0 ms) 2023-01-11T23:31:02.1644690Z [ RUN ] CustomAutogradTest.MarkNonDifferentiableNone 2023-01-11T23:31:02.1645115Z [ OK ] CustomAutogradTest.MarkNonDifferentiableNone (0 ms) 2023-01-11T23:31:02.1645516Z [ RUN ] CustomAutogradTest.ReturnLeafInplace 2023-01-11T23:31:02.1645891Z [ OK ] CustomAutogradTest.ReturnLeafInplace (0 ms) 2023-01-11T23:31:02.1646286Z [ RUN ] CustomAutogradTest.ReturnDuplicateInplace 2023-01-11T23:31:02.1667868Z [ OK ] CustomAutogradTest.ReturnDuplicateInplace (2 ms) 2023-01-11T23:31:02.1668962Z [ RUN ] CustomAutogradTest.ReturnDuplicate 
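The GradNonLeafTest and BackwardTest warnings above recommend two patterns: retain_grad() when a non-leaf tensor's .grad is genuinely needed, and autograd.grad instead of backward(create_graph=True) for higher-order gradients. A minimal Python sketch of both (names invented for illustration):

    import torch

    x = torch.randn(3, requires_grad=True)  # leaf tensor
    y = x * 2                               # non-leaf: produced by an op

    # Without retain_grad(), y.grad stays None and merely reading it
    # emits the TensorBody.h:485 warning.
    y.retain_grad()
    y.sum().backward()
    print(y.grad)  # populated: all ones

    # For higher-order gradients, autograd.grad avoids the reference
    # cycle that backward(create_graph=True) creates (engine.cpp:1134).
    z = (x ** 2).sum()
    (g,) = torch.autograd.grad(z, x, create_graph=True)
    (h,) = torch.autograd.grad(g.sum(), x)
    print(h)  # second derivative: 2 everywhere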
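AnomalyMode, the slowest test in the AutogradAPITests suite above (160 ms), exercises the debugging mode that the anomaly_mode.cpp warning says not to leave enabled. A small sketch of the Python-side equivalent, assuming the goal is simply to trace a NaN gradient back to the op that produced it:

    import torch

    # Anomaly detection adds NaN checks and traceback bookkeeping to every
    # backward op, which is why the warning restricts it to debugging.
    with torch.autograd.detect_anomaly():
        x = torch.tensor([-1.0, 2.0], requires_grad=True)
        y = torch.sqrt(x)       # NaN for the negative entry
        try:
            y.sum().backward()  # raises once a backward op returns NaN
        except RuntimeError as err:
            print(err)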
2023-01-11T23:31:02.1669704Z [ OK ] CustomAutogradTest.ReturnDuplicate (0 ms) 2023-01-11T23:31:02.1670663Z [ RUN ] CustomAutogradTest.SaveEmptyForBackward 2023-01-11T23:31:02.1671443Z [ OK ] CustomAutogradTest.SaveEmptyForBackward (0 ms) 2023-01-11T23:31:02.1672164Z [ RUN ] CustomAutogradTest.InvalidGradients 2023-01-11T23:31:02.1728591Z [ OK ] CustomAutogradTest.InvalidGradients (5 ms) 2023-01-11T23:31:02.1728999Z [ RUN ] CustomAutogradTest.NoGradInput 2023-01-11T23:31:02.1729592Z [ OK ] CustomAutogradTest.NoGradInput (0 ms) 2023-01-11T23:31:02.1730056Z [ RUN ] CustomAutogradTest.TooManyGrads 2023-01-11T23:31:02.1730405Z [ OK ] CustomAutogradTest.TooManyGrads (0 ms) 2023-01-11T23:31:02.1730736Z [ RUN ] CustomAutogradTest.DepNoGrad 2023-01-11T23:31:02.1731064Z [ OK ] CustomAutogradTest.DepNoGrad (0 ms) 2023-01-11T23:31:02.1731388Z [ RUN ] CustomAutogradTest.Reentrant 2023-01-11T23:31:02.1731738Z [ OK ] CustomAutogradTest.Reentrant (0 ms) 2023-01-11T23:31:02.1732071Z [ RUN ] CustomAutogradTest.DeepReentrant 2023-01-11T23:31:02.6057955Z [ OK ] CustomAutogradTest.DeepReentrant (432 ms) 2023-01-11T23:31:02.6058377Z [ RUN ] CustomAutogradTest.ReentrantPriority 2023-01-11T23:31:02.6063071Z [ OK ] CustomAutogradTest.ReentrantPriority (0 ms) 2023-01-11T23:31:02.6063429Z [ RUN ] CustomAutogradTest.Hooks 2023-01-11T23:31:02.6084874Z [ OK ] CustomAutogradTest.Hooks (2 ms) 2023-01-11T23:31:02.6085208Z [ RUN ] CustomAutogradTest.HooksInplace 2023-01-11T23:31:02.6087138Z [ OK ] CustomAutogradTest.HooksInplace (0 ms) 2023-01-11T23:31:02.6087551Z [ RUN ] CustomAutogradTest.HooksInplaceWithRetainsGrad 2023-01-11T23:31:02.6089298Z [ OK ] CustomAutogradTest.HooksInplaceWithRetainsGrad (0 ms) 2023-01-11T23:31:02.6090121Z [ RUN ] CustomAutogradTest.HooksInplaceTwiceWithRetainsGrad 2023-01-11T23:31:02.6092372Z [ OK ] CustomAutogradTest.HooksInplaceTwiceWithRetainsGrad (0 ms) 2023-01-11T23:31:02.6092770Z [ RUN ] CustomAutogradTest.HookNone 2023-01-11T23:31:02.6093234Z [ OK ] CustomAutogradTest.HookNone (0 ms) 2023-01-11T23:31:02.6093580Z [ RUN ] CustomAutogradTest.BackwardWithInputs 2023-01-11T23:31:02.6095437Z [ OK ] CustomAutogradTest.BackwardWithInputs (0 ms) 2023-01-11T23:31:02.6095842Z [ RUN ] CustomAutogradTest.BackwardWithEmptyInputs 2023-01-11T23:31:02.6108008Z [ OK ] CustomAutogradTest.BackwardWithEmptyInputs (1 ms) 2023-01-11T23:31:02.6108420Z [ RUN ] CustomAutogradTest.BackwardWithNonLeafInputs 2023-01-11T23:31:02.6109877Z [ OK ] CustomAutogradTest.BackwardWithNonLeafInputs (0 ms) 2023-01-11T23:31:02.6110417Z [ RUN ] CustomAutogradTest.BackwardWithCreateGraphWarns 2023-01-11T23:31:02.6110958Z [ OK ] CustomAutogradTest.BackwardWithCreateGraphWarns (0 ms) 2023-01-11T23:31:02.6111374Z [----------] 33 tests from CustomAutogradTest (448 ms total) 2023-01-11T23:31:02.6111554Z 2023-01-11T23:31:02.6111777Z [----------] 13 tests from TestAutogradNotImplementedFallback 2023-01-11T23:31:02.6112219Z [ RUN ] TestAutogradNotImplementedFallback.RetSingleNonTensor 2023-01-11T23:31:02.6114540Z [ OK ] TestAutogradNotImplementedFallback.RetSingleNonTensor (0 ms) 2023-01-11T23:31:02.6115021Z [ RUN ] TestAutogradNotImplementedFallback.InplaceOp 2023-01-11T23:31:02.6158742Z [ OK ] TestAutogradNotImplementedFallback.InplaceOp (4 ms) 2023-01-11T23:31:02.6159194Z [ RUN ] TestAutogradNotImplementedFallback.DoubleInplaceOp 2023-01-11T23:31:02.6195735Z [ OK ] TestAutogradNotImplementedFallback.DoubleInplaceOp (3 ms) 2023-01-11T23:31:02.6196177Z [ RUN ] TestAutogradNotImplementedFallback.OptOp 2023-01-11T23:31:02.6198283Z [ OK ] 
TestAutogradNotImplementedFallback.OptOp (0 ms) 2023-01-11T23:31:02.6198738Z [ RUN ] TestAutogradNotImplementedFallback.OutOfPlaceAddition 2023-01-11T23:31:02.6234561Z [ OK ] TestAutogradNotImplementedFallback.OutOfPlaceAddition (3 ms) 2023-01-11T23:31:02.6235055Z [ RUN ] TestAutogradNotImplementedFallback.RetTupleNonTensor 2023-01-11T23:31:02.6266758Z [ OK ] TestAutogradNotImplementedFallback.RetTupleNonTensor (3 ms) 2023-01-11T23:31:02.6267197Z [ RUN ] TestAutogradNotImplementedFallback.ViewOp 2023-01-11T23:31:02.6332258Z [ OK ] TestAutogradNotImplementedFallback.ViewOp (6 ms) 2023-01-11T23:31:02.6332711Z [ RUN ] TestAutogradNotImplementedFallback.ViewOpWithExtraArg 2023-01-11T23:31:02.6366710Z [ OK ] TestAutogradNotImplementedFallback.ViewOpWithExtraArg (3 ms) 2023-01-11T23:31:02.6367207Z [ RUN ] TestAutogradNotImplementedFallback.RetTensorVectorView 2023-01-11T23:31:02.6367699Z [ OK ] TestAutogradNotImplementedFallback.RetTensorVectorView (0 ms) 2023-01-11T23:31:02.6368160Z [ RUN ] TestAutogradNotImplementedFallback.DoubleViewOP 2023-01-11T23:31:02.6387569Z [ OK ] TestAutogradNotImplementedFallback.DoubleViewOP (1 ms) 2023-01-11T23:31:02.6388027Z [ RUN ] TestAutogradNotImplementedFallback.NonFirstViewOP 2023-01-11T23:31:02.6418006Z [ OK ] TestAutogradNotImplementedFallback.NonFirstViewOP (3 ms) 2023-01-11T23:31:02.6418471Z [ RUN ] TestAutogradNotImplementedFallback.RetTensorVector 2023-01-11T23:31:02.6451806Z [ OK ] TestAutogradNotImplementedFallback.RetTensorVector (3 ms) 2023-01-11T23:31:02.6452261Z [ RUN ] TestAutogradNotImplementedFallback.TensorlistOp 2023-01-11T23:31:02.6476212Z [ OK ] TestAutogradNotImplementedFallback.TensorlistOp (2 ms) 2023-01-11T23:31:02.6476762Z [----------] 13 tests from TestAutogradNotImplementedFallback (36 ms total) 2023-01-11T23:31:02.6476974Z 2023-01-11T23:31:02.6477133Z [----------] 18 tests from AnyModuleTest 2023-01-11T23:31:02.6477447Z [ RUN ] AnyModuleTest.SimpleReturnType 2023-01-11T23:31:02.6477807Z [ OK ] AnyModuleTest.SimpleReturnType (0 ms) 2023-01-11T23:31:02.6478227Z [ RUN ] AnyModuleTest.SimpleReturnTypeAndSingleArgument 2023-01-11T23:31:02.6478668Z [ OK ] AnyModuleTest.SimpleReturnTypeAndSingleArgument (0 ms) 2023-01-11T23:31:02.6479096Z [ RUN ] AnyModuleTest.StringLiteralReturnTypeAndArgument 2023-01-11T23:31:02.6479538Z [ OK ] AnyModuleTest.StringLiteralReturnTypeAndArgument (0 ms) 2023-01-11T23:31:02.6479988Z [ RUN ] AnyModuleTest.StringReturnTypeWithConstArgument 2023-01-11T23:31:02.6480414Z [ OK ] AnyModuleTest.StringReturnTypeWithConstArgument (0 ms) 2023-01-11T23:31:02.6480986Z [ RUN ] AnyModuleTest.TensorReturnTypeAndStringArgumentsWithFunkyQualifications 2023-01-11T23:31:02.6481586Z [ OK ] AnyModuleTest.TensorReturnTypeAndStringArgumentsWithFunkyQualifications (0 ms) 2023-01-11T23:31:02.6482046Z [ RUN ] AnyModuleTest.WrongArgumentType 2023-01-11T23:31:02.6491743Z [ OK ] AnyModuleTest.WrongArgumentType (1 ms) 2023-01-11T23:31:02.6492112Z [ RUN ] AnyModuleTest.WrongNumberOfArguments 2023-01-11T23:31:02.6528620Z [ OK ] AnyModuleTest.WrongNumberOfArguments (3 ms) 2023-01-11T23:31:02.6529138Z [ RUN ] AnyModuleTest.PassingArgumentsToModuleWithDefaultArgumentsInForwardMethod 2023-01-11T23:31:02.6602464Z [ OK ] AnyModuleTest.PassingArgumentsToModuleWithDefaultArgumentsInForwardMethod (7 ms) 2023-01-11T23:31:02.6602983Z [ RUN ] AnyModuleTest.GetWithCorrectTypeSucceeds 2023-01-11T23:31:02.6603386Z [ OK ] AnyModuleTest.GetWithCorrectTypeSucceeds (0 ms) 2023-01-11T23:31:02.6603787Z [ RUN ] AnyModuleTest.GetWithIncorrectTypeThrows 
2023-01-11T23:31:02.6613070Z [ OK ] AnyModuleTest.GetWithIncorrectTypeThrows (1 ms) 2023-01-11T23:31:02.6613471Z [ RUN ] AnyModuleTest.PtrWithBaseClassSucceeds 2023-01-11T23:31:02.6613860Z [ OK ] AnyModuleTest.PtrWithBaseClassSucceeds (0 ms) 2023-01-11T23:31:02.6614245Z [ RUN ] AnyModuleTest.PtrWithGoodDowncastSuccceeds 2023-01-11T23:31:02.6614789Z [ OK ] AnyModuleTest.PtrWithGoodDowncastSuccceeds (0 ms) 2023-01-11T23:31:02.6615182Z [ RUN ] AnyModuleTest.PtrWithBadDowncastThrows 2023-01-11T23:31:02.6624264Z [ OK ] AnyModuleTest.PtrWithBadDowncastThrows (1 ms) 2023-01-11T23:31:02.6624638Z [ RUN ] AnyModuleTest.DefaultStateIsEmpty 2023-01-11T23:31:02.6625043Z [ OK ] AnyModuleTest.DefaultStateIsEmpty (0 ms) 2023-01-11T23:31:02.6625448Z [ RUN ] AnyModuleTest.AllMethodsThrowForEmptyAnyModule 2023-01-11T23:31:02.6677115Z [ OK ] AnyModuleTest.AllMethodsThrowForEmptyAnyModule (5 ms) 2023-01-11T23:31:02.6677637Z [ RUN ] AnyModuleTest.CanMoveAssignDifferentModules 2023-01-11T23:31:02.6678208Z [ OK ] AnyModuleTest.CanMoveAssignDifferentModules (0 ms) 2023-01-11T23:31:02.6678706Z [ RUN ] AnyModuleTest.ConstructsFromModuleHolder 2023-01-11T23:31:02.6679095Z [ OK ] AnyModuleTest.ConstructsFromModuleHolder (0 ms) 2023-01-11T23:31:02.6679506Z [ RUN ] AnyModuleTest.ConvertsVariableToTensorCorrectly 2023-01-11T23:31:02.6679943Z [ OK ] AnyModuleTest.ConvertsVariableToTensorCorrectly (0 ms) 2023-01-11T23:31:02.6680353Z [----------] 18 tests from AnyModuleTest (20 ms total) 2023-01-11T23:31:02.6680511Z 2023-01-11T23:31:02.6680666Z [----------] 12 tests from AnyValueTest 2023-01-11T23:31:02.6681127Z [ RUN ] AnyValueTest.CorrectlyAccessesIntWhenCorrectType 2023-01-11T23:31:02.6681581Z [ OK ] AnyValueTest.CorrectlyAccessesIntWhenCorrectType (0 ms) 2023-01-11T23:31:02.6682044Z [ RUN ] AnyValueTest.CorrectlyAccessesStringLiteralWhenCorrectType 2023-01-11T23:31:02.6682557Z [ OK ] AnyValueTest.CorrectlyAccessesStringLiteralWhenCorrectType (0 ms) 2023-01-11T23:31:02.6683031Z [ RUN ] AnyValueTest.CorrectlyAccessesStringWhenCorrectType 2023-01-11T23:31:02.6683493Z [ OK ] AnyValueTest.CorrectlyAccessesStringWhenCorrectType (0 ms) 2023-01-11T23:31:02.6683945Z [ RUN ] AnyValueTest.CorrectlyAccessesPointersWhenCorrectType 2023-01-11T23:31:02.6684421Z [ OK ] AnyValueTest.CorrectlyAccessesPointersWhenCorrectType (0 ms) 2023-01-11T23:31:02.6684894Z [ RUN ] AnyValueTest.CorrectlyAccessesReferencesWhenCorrectType 2023-01-11T23:31:02.6685376Z [ OK ] AnyValueTest.CorrectlyAccessesReferencesWhenCorrectType (0 ms) 2023-01-11T23:31:02.6685877Z [ RUN ] AnyValueTest.TryGetReturnsNullptrForTheWrongType 2023-01-11T23:31:02.6686317Z [ OK ] AnyValueTest.TryGetReturnsNullptrForTheWrongType (0 ms) 2023-01-11T23:31:02.6686724Z [ RUN ] AnyValueTest.GetThrowsForTheWrongType 2023-01-11T23:31:02.6701469Z [ OK ] AnyValueTest.GetThrowsForTheWrongType (2 ms) 2023-01-11T23:31:02.6701987Z [ RUN ] AnyValueTest.MoveConstructionIsAllowed 2023-01-11T23:31:02.6702488Z [ OK ] AnyValueTest.MoveConstructionIsAllowed (0 ms) 2023-01-11T23:31:02.6702959Z [ RUN ] AnyValueTest.MoveAssignmentIsAllowed 2023-01-11T23:31:02.6703335Z [ OK ] AnyValueTest.MoveAssignmentIsAllowed (0 ms) 2023-01-11T23:31:02.6703719Z [ RUN ] AnyValueTest.TypeInfoIsCorrectForInt 2023-01-11T23:31:02.6704092Z [ OK ] AnyValueTest.TypeInfoIsCorrectForInt (0 ms) 2023-01-11T23:31:02.6704488Z [ RUN ] AnyValueTest.TypeInfoIsCorrectForStringLiteral 2023-01-11T23:31:02.6704942Z [ OK ] AnyValueTest.TypeInfoIsCorrectForStringLiteral (0 ms) 2023-01-11T23:31:02.6705396Z [ RUN ] 
AnyValueTest.TypeInfoIsCorrectForString 2023-01-11T23:31:02.6705776Z [ OK ] AnyValueTest.TypeInfoIsCorrectForString (0 ms) 2023-01-11T23:31:02.6706148Z [----------] 12 tests from AnyValueTest (2 ms total) 2023-01-11T23:31:02.6706307Z 2023-01-11T23:31:02.6706457Z [----------] 50 tests from DataTest 2023-01-11T23:31:02.6706773Z [ RUN ] DataTest.DatasetCallsGetCorrectly 2023-01-11T23:31:02.6707123Z [ OK ] DataTest.DatasetCallsGetCorrectly (0 ms) 2023-01-11T23:31:02.6707496Z [ RUN ] DataTest.TransformCallsGetApplyCorrectly 2023-01-11T23:31:02.6707894Z [ OK ] DataTest.TransformCallsGetApplyCorrectly (0 ms) 2023-01-11T23:31:02.6708294Z [ RUN ] DataTest.ChunkDataSetWithInvalidInitParameter 2023-01-11T23:31:02.6762780Z [ OK ] DataTest.ChunkDataSetWithInvalidInitParameter (6 ms) 2023-01-11T23:31:02.6763194Z [ RUN ] DataTest.InfiniteStreamDataset 2023-01-11T23:31:02.6763625Z [ OK ] DataTest.InfiniteStreamDataset (0 ms) 2023-01-11T23:31:02.6764042Z [ RUN ] DataTest.NoSequencerIsIdentity 2023-01-11T23:31:02.6764509Z [ OK ] DataTest.NoSequencerIsIdentity (0 ms) 2023-01-11T23:31:02.6764902Z [ RUN ] DataTest.OrderedSequencerIsSetUpWell 2023-01-11T23:31:02.6765314Z [ OK ] DataTest.OrderedSequencerIsSetUpWell (0 ms) 2023-01-11T23:31:02.6765696Z [ RUN ] DataTest.OrderedSequencerReOrdersValues 2023-01-11T23:31:02.6766084Z [ OK ] DataTest.OrderedSequencerReOrdersValues (0 ms) 2023-01-11T23:31:02.6766479Z [ RUN ] DataTest.BatchLambdaAppliesFunctionToBatch 2023-01-11T23:31:02.6766880Z [ OK ] DataTest.BatchLambdaAppliesFunctionToBatch (0 ms) 2023-01-11T23:31:02.6767362Z [ RUN ] DataTest.LambdaAppliesFunctionToExample 2023-01-11T23:31:02.6767762Z [ OK ] DataTest.LambdaAppliesFunctionToExample (0 ms) 2023-01-11T23:31:02.6768110Z [ RUN ] DataTest.CollateReducesBatch 2023-01-11T23:31:02.6768445Z [ OK ] DataTest.CollateReducesBatch (0 ms) 2023-01-11T23:31:02.6768774Z [ RUN ] DataTest.CollationReducesBatch 2023-01-11T23:31:02.6769106Z [ OK ] DataTest.CollationReducesBatch (0 ms) 2023-01-11T23:31:02.6769494Z [ RUN ] DataTest.SequentialSamplerReturnsIndicesInOrder 2023-01-11T23:31:02.6769930Z [ OK ] DataTest.SequentialSamplerReturnsIndicesInOrder (0 ms) 2023-01-11T23:31:02.6770390Z [ RUN ] DataTest.SequentialSamplerReturnsLessValuesForLastBatch 2023-01-11T23:31:02.6770867Z [ OK ] DataTest.SequentialSamplerReturnsLessValuesForLastBatch (0 ms) 2023-01-11T23:31:02.6771292Z [ RUN ] DataTest.SequentialSamplerResetsWell 2023-01-11T23:31:02.6771711Z [ OK ] DataTest.SequentialSamplerResetsWell (0 ms) 2023-01-11T23:31:02.6772109Z [ RUN ] DataTest.SequentialSamplerResetsWithNewSizeWell 2023-01-11T23:31:02.6772549Z [ OK ] DataTest.SequentialSamplerResetsWithNewSizeWell (0 ms) 2023-01-11T23:31:02.6772962Z [ RUN ] DataTest.CanSaveAndLoadSequentialSampler 2023-01-11T23:31:02.6904231Z [ OK ] DataTest.CanSaveAndLoadSequentialSampler (13 ms) 2023-01-11T23:31:02.6904779Z [ RUN ] DataTest.RandomSamplerReturnsIndicesInCorrectRange 2023-01-11T23:31:02.6905716Z [ OK ] DataTest.RandomSamplerReturnsIndicesInCorrectRange (0 ms) 2023-01-11T23:31:02.6906786Z [ RUN ] DataTest.RandomSamplerReturnsLessValuesForLastBatch 2023-01-11T23:31:02.6907679Z [ OK ] DataTest.RandomSamplerReturnsLessValuesForLastBatch (0 ms) 2023-01-11T23:31:02.6908498Z [ RUN ] DataTest.RandomSamplerResetsWell 2023-01-11T23:31:02.6909373Z [ OK ] DataTest.RandomSamplerResetsWell (0 ms) 2023-01-11T23:31:02.6910285Z [ RUN ] DataTest.RandomSamplerResetsWithNewSizeWell 2023-01-11T23:31:02.6911162Z [ OK ] DataTest.RandomSamplerResetsWithNewSizeWell (0 ms) 2023-01-11T23:31:02.6912029Z 
[ RUN ] DataTest.SavingAndLoadingRandomSamplerYieldsSameSequence 2023-01-11T23:31:02.6912925Z [ OK ] DataTest.SavingAndLoadingRandomSamplerYieldsSameSequence (0 ms) 2023-01-11T23:31:02.6913818Z [ RUN ] DataTest.StreamSamplerReturnsTheBatchSizeAndThenRemainder 2023-01-11T23:31:02.6914690Z [ OK ] DataTest.StreamSamplerReturnsTheBatchSizeAndThenRemainder (0 ms) 2023-01-11T23:31:02.6915132Z [ RUN ] DataTest.StreamSamplerResetsWell 2023-01-11T23:31:02.6915651Z [ OK ] DataTest.StreamSamplerResetsWell (0 ms) 2023-01-11T23:31:02.6916142Z [ RUN ] DataTest.StreamSamplerResetsWithNewSizeWell 2023-01-11T23:31:02.6916565Z [ OK ] DataTest.StreamSamplerResetsWithNewSizeWell (0 ms) 2023-01-11T23:31:02.6916988Z [ RUN ] DataTest.TensorDatasetConstructsFromSingleTensor 2023-01-11T23:31:02.6917421Z [ OK ] DataTest.TensorDatasetConstructsFromSingleTensor (0 ms) 2023-01-11T23:31:02.6917899Z [ RUN ] DataTest.TensorDatasetConstructsFromInitializerListOfTensors 2023-01-11T23:31:02.6918430Z [ OK ] DataTest.TensorDatasetConstructsFromInitializerListOfTensors (0 ms) 2023-01-11T23:31:02.6918871Z [ RUN ] DataTest.StackTransformWorksForExample 2023-01-11T23:31:02.6919249Z [ OK ] DataTest.StackTransformWorksForExample (0 ms) 2023-01-11T23:31:02.6919646Z [ RUN ] DataTest.StackTransformWorksForTensorExample 2023-01-11T23:31:02.6920063Z [ OK ] DataTest.StackTransformWorksForTensorExample (0 ms) 2023-01-11T23:31:02.6920480Z [ RUN ] DataTest.TensorTransformWorksForAnyTargetType 2023-01-11T23:31:02.6920990Z [ OK ] DataTest.TensorTransformWorksForAnyTargetType (0 ms) 2023-01-11T23:31:02.6921404Z [ RUN ] DataTest.TensorLambdaWorksforAnyTargetType 2023-01-11T23:31:02.6921810Z [ OK ] DataTest.TensorLambdaWorksforAnyTargetType (0 ms) 2023-01-11T23:31:02.6922162Z [ RUN ] DataTest.NormalizeTransform 2023-01-11T23:31:02.6922550Z [ OK ] DataTest.NormalizeTransform (0 ms) 2023-01-11T23:31:02.6922891Z [ RUN ] DataTest.MapDoesNotCopy 2023-01-11T23:31:02.6923197Z [ OK ] DataTest.MapDoesNotCopy (0 ms) 2023-01-11T23:31:02.6923547Z [ RUN ] DataTest.QueuePushAndPopFromSameThread 2023-01-11T23:31:02.6923936Z [ OK ] DataTest.QueuePushAndPopFromSameThread (0 ms) 2023-01-11T23:31:02.6924342Z [ RUN ] DataTest.QueuePopWithTimeoutThrowsUponTimeout 2023-01-11T23:31:02.7035180Z [ OK ] DataTest.QueuePopWithTimeoutThrowsUponTimeout (11 ms) 2023-01-11T23:31:02.7035686Z [ RUN ] DataTest.QueuePushAndPopFromDifferentThreads 2023-01-11T23:31:02.7240897Z [ OK ] DataTest.QueuePushAndPopFromDifferentThreads (20 ms) 2023-01-11T23:31:02.7241295Z [ RUN ] DataTest.QueueClearEmptiesTheQueue 2023-01-11T23:31:02.7263599Z [ OK ] DataTest.QueueClearEmptiesTheQueue (2 ms) 2023-01-11T23:31:02.7263986Z [ RUN ] DataTest.DataShuttleCanPushAndPopJob 2023-01-11T23:31:02.7264384Z [ OK ] DataTest.DataShuttleCanPushAndPopJob (0 ms) 2023-01-11T23:31:02.7264830Z [ RUN ] DataTest.DataShuttleCanPushAndPopResult 2023-01-11T23:31:02.7265221Z [ OK ] DataTest.DataShuttleCanPushAndPopResult (0 ms) 2023-01-11T23:31:02.7265695Z [ RUN ] DataTest.DataShuttlePopResultReturnsNulloptWhenNoJobsInFlight 2023-01-11T23:31:02.7266232Z [ OK ] DataTest.DataShuttlePopResultReturnsNulloptWhenNoJobsInFlight (0 ms) 2023-01-11T23:31:02.7266727Z [ RUN ] DataTest.DataShuttleDrainMeansPopResultReturnsNullopt 2023-01-11T23:31:02.7267208Z [ OK ] DataTest.DataShuttleDrainMeansPopResultReturnsNullopt (0 ms) 2023-01-11T23:31:02.7267628Z [ RUN ] DataTest.DataShuttlePopResultTimesOut 2023-01-11T23:31:02.7376037Z [ OK ] DataTest.DataShuttlePopResultTimesOut (11 ms) 2023-01-11T23:31:02.7376448Z [ RUN ] 
DataTest.SharedBatchDatasetReallyIsShared 2023-01-11T23:31:02.7402950Z [ OK ] DataTest.SharedBatchDatasetReallyIsShared (2 ms) 2023-01-11T23:31:02.7403500Z [ RUN ] DataTest.SharedBatchDatasetDoesNotIncurCopyWhenPassedDatasetObject 2023-01-11T23:31:02.7404135Z [ OK ] DataTest.SharedBatchDatasetDoesNotIncurCopyWhenPassedDatasetObject (0 ms) 2023-01-11T23:31:02.7404601Z [ RUN ] DataTest.CanUseCustomTypeAsIndexType 2023-01-11T23:31:02.7404974Z [ OK ] DataTest.CanUseCustomTypeAsIndexType (0 ms) 2023-01-11T23:31:02.7405465Z [ RUN ] DataTest.DistributedRandomSamplerSingleReplicaProduceCorrectSamples 2023-01-11T23:31:02.7406035Z [ OK ] DataTest.DistributedRandomSamplerSingleReplicaProduceCorrectSamples (0 ms) 2023-01-11T23:31:02.7406596Z [ RUN ] DataTest.DistributedRandomSamplerMultiReplicaProduceCorrectSamples 2023-01-11T23:31:02.7407160Z [ OK ] DataTest.DistributedRandomSamplerMultiReplicaProduceCorrectSamples (0 ms) 2023-01-11T23:31:02.7407644Z [ RUN ] DataTest.CanSaveAndLoadDistributedRandomSampler 2023-01-11T23:31:02.7410242Z [ OK ] DataTest.CanSaveAndLoadDistributedRandomSampler (0 ms) 2023-01-11T23:31:02.7410791Z [ RUN ] DataTest.DistributedSequentialSamplerSingleReplicaProduceCorrectSamples 2023-01-11T23:31:02.7411475Z [ OK ] DataTest.DistributedSequentialSamplerSingleReplicaProduceCorrectSamples (0 ms) 2023-01-11T23:31:02.7412144Z [ RUN ] DataTest.DistributedSequentialSamplerMultiReplicaProduceCorrectSamples 2023-01-11T23:31:02.7412744Z [ OK ] DataTest.DistributedSequentialSamplerMultiReplicaProduceCorrectSamples (0 ms) 2023-01-11T23:31:02.7413263Z [ RUN ] DataTest.CanSaveAndLoadDistributedSequentialSampler 2023-01-11T23:31:02.7413892Z [ OK ] DataTest.CanSaveAndLoadDistributedSequentialSampler (0 ms) 2023-01-11T23:31:02.7414375Z [----------] 50 tests from DataTest (71 ms total) 2023-01-11T23:31:02.7414655Z 2023-01-11T23:31:02.7414819Z [----------] 37 tests from DataLoaderTest 2023-01-11T23:31:02.7415198Z [ RUN ] DataLoaderTest.DataLoaderOptionsDefaultAsExpected 2023-01-11T23:31:02.7415654Z [ OK ] DataLoaderTest.DataLoaderOptionsDefaultAsExpected (0 ms) 2023-01-11T23:31:02.7416112Z [ RUN ] DataLoaderTest.DataLoaderOptionsCoalesceOptionalValues 2023-01-11T23:31:02.7416599Z [ OK ] DataLoaderTest.DataLoaderOptionsCoalesceOptionalValues (0 ms) 2023-01-11T23:31:02.7417059Z [ RUN ] DataLoaderTest.MakeDataLoaderDefaultsAsExpected 2023-01-11T23:31:02.7417550Z [ OK ] DataLoaderTest.MakeDataLoaderDefaultsAsExpected (0 ms) 2023-01-11T23:31:02.7418098Z [ RUN ] DataLoaderTest.MakeDataLoaderThrowsWhenConstructingSamplerWithUnsizedDataset 2023-01-11T23:31:02.7425293Z [ OK ] DataLoaderTest.MakeDataLoaderThrowsWhenConstructingSamplerWithUnsizedDataset (1 ms) 2023-01-11T23:31:02.7426016Z [ RUN ] DataLoaderTest.IteratorsCompareEqualToThemselves 2023-01-11T23:31:02.7426577Z [ OK ] DataLoaderTest.IteratorsCompareEqualToThemselves (0 ms) 2023-01-11T23:31:02.7427104Z [ RUN ] DataLoaderTest.ValidIteratorsCompareUnequalToEachOther 2023-01-11T23:31:02.7427585Z [ OK ] DataLoaderTest.ValidIteratorsCompareUnequalToEachOther (0 ms) 2023-01-11T23:31:02.7428066Z [ RUN ] DataLoaderTest.SentinelIteratorsCompareEqualToEachOther 2023-01-11T23:31:02.7428552Z [ OK ] DataLoaderTest.SentinelIteratorsCompareEqualToEachOther (0 ms) 2023-01-11T23:31:02.7429055Z [ RUN ] DataLoaderTest.IteratorsCompareEqualToSentinelWhenExhausted 2023-01-11T23:31:02.7429580Z [ OK ] DataLoaderTest.IteratorsCompareEqualToSentinelWhenExhausted (0 ms) 2023-01-11T23:31:02.7430074Z [ RUN ] DataLoaderTest.IteratorsShareState 2023-01-11T23:31:02.7430439Z [ 
OK ] DataLoaderTest.IteratorsShareState (0 ms) 2023-01-11T23:31:02.7430848Z [ RUN ] DataLoaderTest.CanDereferenceIteratorMultipleTimes 2023-01-11T23:31:02.7431306Z [ OK ] DataLoaderTest.CanDereferenceIteratorMultipleTimes (0 ms) 2023-01-11T23:31:02.7431720Z [ RUN ] DataLoaderTest.CanUseIteratorAlgorithms 2023-01-11T23:31:02.7432109Z [ OK ] DataLoaderTest.CanUseIteratorAlgorithms (0 ms) 2023-01-11T23:31:02.7432571Z [ RUN ] DataLoaderTest.CallingBeginWhileOtherIteratorIsInFlightThrows 2023-01-11T23:31:02.7440690Z [ OK ] DataLoaderTest.CallingBeginWhileOtherIteratorIsInFlightThrows (1 ms) 2023-01-11T23:31:02.7441204Z [ RUN ] DataLoaderTest.IncrementingExhaustedValidIteratorThrows 2023-01-11T23:31:02.7452419Z [ OK ] DataLoaderTest.IncrementingExhaustedValidIteratorThrows (1 ms) 2023-01-11T23:31:02.7452912Z [ RUN ] DataLoaderTest.DereferencingExhaustedValidIteratorThrows 2023-01-11T23:31:02.7464055Z [ OK ] DataLoaderTest.DereferencingExhaustedValidIteratorThrows (1 ms) 2023-01-11T23:31:02.7464529Z [ RUN ] DataLoaderTest.IncrementingSentinelIteratorThrows 2023-01-11T23:31:02.7475541Z [ OK ] DataLoaderTest.IncrementingSentinelIteratorThrows (1 ms) 2023-01-11T23:31:02.7475999Z [ RUN ] DataLoaderTest.DereferencingSentinelIteratorThrows 2023-01-11T23:31:02.7486994Z [ OK ] DataLoaderTest.DereferencingSentinelIteratorThrows (1 ms) 2023-01-11T23:31:02.7487506Z [ RUN ] DataLoaderTest.YieldsCorrectBatchSize 2023-01-11T23:31:02.7487909Z [ OK ] DataLoaderTest.YieldsCorrectBatchSize (0 ms) 2023-01-11T23:31:02.7488414Z [ RUN ] DataLoaderTest.ReturnsLastBatchWhenSmallerThanBatchSizeWhenDropLastIsFalse 2023-01-11T23:31:02.7489029Z [ OK ] DataLoaderTest.ReturnsLastBatchWhenSmallerThanBatchSizeWhenDropLastIsFalse (0 ms) 2023-01-11T23:31:02.7489664Z [ RUN ] DataLoaderTest.DoesNotReturnLastBatchWhenSmallerThanBatchSizeWhenDropLastIsTrue 2023-01-11T23:31:02.7490325Z [ OK ] DataLoaderTest.DoesNotReturnLastBatchWhenSmallerThanBatchSizeWhenDropLastIsTrue (0 ms) 2023-01-11T23:31:02.7490809Z [ RUN ] DataLoaderTest.RespectsTimeout 2023-01-11T23:31:02.7604634Z [ OK ] DataLoaderTest.RespectsTimeout (11 ms) 2023-01-11T23:31:02.7605154Z [ RUN ] DataLoaderTest.EnforcesOrderingAmongThreadsWhenConfigured 2023-01-11T23:31:02.7628844Z [ OK ] DataLoaderTest.EnforcesOrderingAmongThreadsWhenConfigured (2 ms) 2023-01-11T23:31:02.7629328Z [ RUN ] DataLoaderTest.Reset 2023-01-11T23:31:02.7629637Z [ OK ] DataLoaderTest.Reset (0 ms) 2023-01-11T23:31:02.7630113Z [ RUN ] DataLoaderTest.TestExceptionsArePropagatedFromWorkers 2023-01-11T23:31:02.7635116Z [ OK ] DataLoaderTest.TestExceptionsArePropagatedFromWorkers (0 ms) 2023-01-11T23:31:02.7635576Z [ RUN ] DataLoaderTest.StatefulDatasetWithNoWorkers 2023-01-11T23:31:02.7635992Z [ OK ] DataLoaderTest.StatefulDatasetWithNoWorkers (0 ms) 2023-01-11T23:31:02.7636407Z [ RUN ] DataLoaderTest.StatefulDatasetWithManyWorkers 2023-01-11T23:31:02.7669066Z [ OK ] DataLoaderTest.StatefulDatasetWithManyWorkers (3 ms) 2023-01-11T23:31:02.7669483Z [ RUN ] DataLoaderTest.StatefulDatasetWithMap 2023-01-11T23:31:02.7669880Z [ OK ] DataLoaderTest.StatefulDatasetWithMap (0 ms) 2023-01-11T23:31:02.7670466Z [ RUN ] DataLoaderTest.StatefulDatasetWithCollate 2023-01-11T23:31:02.7673049Z [ OK ] DataLoaderTest.StatefulDatasetWithCollate (0 ms) 2023-01-11T23:31:02.7673438Z [ RUN ] DataLoaderTest.ChunkDataSetGetBatch 2023-01-11T23:31:02.7805012Z [ OK ] DataLoaderTest.ChunkDataSetGetBatch (13 ms) 2023-01-11T23:31:02.7805449Z [ RUN ] DataLoaderTest.ChunkDataSetWithBatchSizeMismatch 2023-01-11T23:31:02.7818709Z [ OK ] 
DataLoaderTest.ChunkDataSetWithBatchSizeMismatch (1 ms) 2023-01-11T23:31:02.7819154Z [ RUN ] DataLoaderTest.ChunkDataSetWithEmptyBatch 2023-01-11T23:31:02.7821677Z [ OK ] DataLoaderTest.ChunkDataSetWithEmptyBatch (0 ms) 2023-01-11T23:31:02.7822126Z [ RUN ] DataLoaderTest.ChunkDataSetGetBatchWithUnevenBatchSize 2023-01-11T23:31:02.7827630Z [ OK ] DataLoaderTest.ChunkDataSetGetBatchWithUnevenBatchSize (0 ms) 2023-01-11T23:31:02.7828227Z [ RUN ] DataLoaderTest.CanAccessChunkSamplerWithChunkDataSet 2023-01-11T23:31:02.7833209Z [ OK ] DataLoaderTest.CanAccessChunkSamplerWithChunkDataSet (0 ms) 2023-01-11T23:31:02.7833633Z [ RUN ] DataLoaderTest.ChunkDatasetDoesNotHang 2023-01-11T23:31:02.7835079Z [ OK ] DataLoaderTest.ChunkDatasetDoesNotHang (0 ms) 2023-01-11T23:31:02.7835450Z [ RUN ] DataLoaderTest.ChunkDatasetSave 2023-01-11T23:31:02.7985217Z [ OK ] DataLoaderTest.ChunkDatasetSave (14 ms) 2023-01-11T23:31:02.7985567Z [ RUN ] DataLoaderTest.ChunkDatasetLoad 2023-01-11T23:31:02.7990524Z [ OK ] DataLoaderTest.ChunkDatasetLoad (0 ms) 2023-01-11T23:31:02.7991263Z [ RUN ] DataLoaderTest.ChunkDatasetCrossChunkShuffle 2023-01-11T23:31:02.8000424Z [ OK ] DataLoaderTest.ChunkDatasetCrossChunkShuffle (0 ms) 2023-01-11T23:31:02.8000825Z [ RUN ] DataLoaderTest.CustomPreprocessPolicy 2023-01-11T23:31:02.8005723Z [ OK ] DataLoaderTest.CustomPreprocessPolicy (0 ms) 2023-01-11T23:31:02.8006137Z [----------] 37 tests from DataLoaderTest (59 ms total) 2023-01-11T23:31:02.8006377Z 2023-01-11T23:31:02.8006588Z [----------] 1 test from EnumTest 2023-01-11T23:31:02.8006915Z [ RUN ] EnumTest.AllEnums 2023-01-11T23:31:02.8007235Z [ OK ] EnumTest.AllEnums (0 ms) 2023-01-11T23:31:02.8007548Z [----------] 1 test from EnumTest (0 ms total) 2023-01-11T23:31:02.8007694Z 2023-01-11T23:31:02.8007881Z [----------] 6 tests from ExpandingArrayTest 2023-01-11T23:31:02.8008270Z [ RUN ] ExpandingArrayTest.CanConstructFromInitializerList 2023-01-11T23:31:02.8008735Z [ OK ] ExpandingArrayTest.CanConstructFromInitializerList (0 ms) 2023-01-11T23:31:02.8009163Z [ RUN ] ExpandingArrayTest.CanConstructFromVector 2023-01-11T23:31:02.8009566Z [ OK ] ExpandingArrayTest.CanConstructFromVector (0 ms) 2023-01-11T23:31:02.8010035Z [ RUN ] ExpandingArrayTest.CanConstructFromArray 2023-01-11T23:31:02.8010436Z [ OK ] ExpandingArrayTest.CanConstructFromArray (0 ms) 2023-01-11T23:31:02.8010845Z [ RUN ] ExpandingArrayTest.CanConstructFromSingleValue 2023-01-11T23:31:02.8011287Z [ OK ] ExpandingArrayTest.CanConstructFromSingleValue (0 ms) 2023-01-11T23:31:02.8011881Z [ RUN ] ExpandingArrayTest.ThrowsWhenConstructedWithIncorrectNumberOfArgumentsInInitializerList 2023-01-11T23:31:02.8018291Z [ OK ] ExpandingArrayTest.ThrowsWhenConstructedWithIncorrectNumberOfArgumentsInInitializerList (1 ms) 2023-01-11T23:31:02.8018991Z [ RUN ] ExpandingArrayTest.ThrowsWhenConstructedWithIncorrectNumberOfArgumentsInVector 2023-01-11T23:31:02.8028580Z [ OK ] ExpandingArrayTest.ThrowsWhenConstructedWithIncorrectNumberOfArgumentsInVector (1 ms) 2023-01-11T23:31:02.8029193Z [----------] 6 tests from ExpandingArrayTest (2 ms total) 2023-01-11T23:31:02.8029376Z 2023-01-11T23:31:02.8029528Z [----------] 10 tests from FFTTest 2023-01-11T23:31:02.8029785Z [ RUN ] FFTTest.fft 2023-01-11T23:31:02.8033831Z [ OK ] FFTTest.fft (0 ms) 2023-01-11T23:31:02.8034112Z [ RUN ] FFTTest.fft_real 2023-01-11T23:31:02.8034982Z [ OK ] FFTTest.fft_real (0 ms) 2023-01-11T23:31:02.8035972Z [ RUN ] FFTTest.fft_pad 2023-01-11T23:31:02.8037492Z [ OK ] FFTTest.fft_pad (0 ms) 
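The FFTTest cases running here exercise the torch::fft namespace of the C++ API. A minimal sketch of the round-trip property that FFTTest.fft_ifft and FFTTest.rfft_irfft check (illustrative only, under assumed shapes; this is not the test source):

  #include <torch/torch.h>

  int main() {
    // A complex signal survives fft followed by ifft up to numerical error.
    auto x = torch::randn({8}, torch::kComplexDouble);
    auto y = torch::fft::ifft(torch::fft::fft(x));
    TORCH_CHECK(torch::allclose(x, y, /*rtol=*/1e-5, /*atol=*/1e-8));

    // For real input, rfft keeps only the non-redundant half of the
    // spectrum, which is why the rfft/irfft pair is cheaper than fft/ifft.
    auto r = torch::randn({8});
    TORCH_CHECK(torch::fft::rfft(r).size(0) == 8 / 2 + 1);
  }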
2023-01-11T23:31:02.8037783Z [ RUN ] FFTTest.fft_norm 2023-01-11T23:31:02.8039564Z [ OK ] FFTTest.fft_norm (0 ms) 2023-01-11T23:31:02.8039831Z [ RUN ] FFTTest.ifft 2023-01-11T23:31:02.8042059Z [ OK ] FFTTest.ifft (0 ms) 2023-01-11T23:31:02.8042444Z [ RUN ] FFTTest.fft_ifft 2023-01-11T23:31:02.8043099Z [ OK ] FFTTest.fft_ifft (0 ms) 2023-01-11T23:31:02.8043406Z [ RUN ] FFTTest.rfft 2023-01-11T23:31:02.8043954Z [ OK ] FFTTest.rfft (0 ms) 2023-01-11T23:31:02.8044436Z [ RUN ] FFTTest.rfft_irfft 2023-01-11T23:31:02.8045329Z [ OK ] FFTTest.rfft_irfft (0 ms) 2023-01-11T23:31:02.8045734Z [ RUN ] FFTTest.ihfft 2023-01-11T23:31:02.8046997Z [ OK ] FFTTest.ihfft (0 ms) 2023-01-11T23:31:02.8047275Z [ RUN ] FFTTest.hfft_ihfft 2023-01-11T23:31:02.8048912Z [ OK ] FFTTest.hfft_ihfft (0 ms) 2023-01-11T23:31:02.8049243Z [----------] 10 tests from FFTTest (2 ms total) 2023-01-11T23:31:02.8049406Z 2023-01-11T23:31:02.8049573Z [----------] 135 tests from FunctionalTest 2023-01-11T23:31:02.8049868Z [ RUN ] FunctionalTest.Conv1d 2023-01-11T23:31:02.8058992Z [ OK ] FunctionalTest.Conv1d (0 ms) 2023-01-11T23:31:02.8059307Z [ RUN ] FunctionalTest.Conv2dEven 2023-01-11T23:31:02.8063221Z [ OK ] FunctionalTest.Conv2dEven (0 ms) 2023-01-11T23:31:02.8063560Z [ RUN ] FunctionalTest.Conv2dUneven 2023-01-11T23:31:02.8064992Z [ OK ] FunctionalTest.Conv2dUneven (0 ms) 2023-01-11T23:31:02.8065313Z [ RUN ] FunctionalTest.Conv3d 2023-01-11T23:31:02.8069168Z [ OK ] FunctionalTest.Conv3d (0 ms) 2023-01-11T23:31:02.8069505Z [ RUN ] FunctionalTest.MaxPool1d 2023-01-11T23:31:02.8069982Z [ OK ] FunctionalTest.MaxPool1d (0 ms) 2023-01-11T23:31:02.8070322Z [ RUN ] FunctionalTest.MaxPool2d 2023-01-11T23:31:02.8070859Z [ OK ] FunctionalTest.MaxPool2d (0 ms) 2023-01-11T23:31:02.8072089Z [ RUN ] FunctionalTest.MaxPool2dBackward 2023-01-11T23:31:02.8073994Z [ OK ] FunctionalTest.MaxPool2dBackward (0 ms) 2023-01-11T23:31:02.8074436Z [ RUN ] FunctionalTest.MaxPool3d 2023-01-11T23:31:02.8074867Z [ OK ] FunctionalTest.MaxPool3d (0 ms) 2023-01-11T23:31:02.8075309Z [ RUN ] FunctionalTest.AvgPool1d 2023-01-11T23:31:02.8075681Z [ OK ] FunctionalTest.AvgPool1d (0 ms) 2023-01-11T23:31:02.8076019Z [ RUN ] FunctionalTest.AvgPool2d 2023-01-11T23:31:02.8076320Z [ OK ] FunctionalTest.AvgPool2d (0 ms) 2023-01-11T23:31:02.8076628Z [ RUN ] FunctionalTest.AvgPool3d 2023-01-11T23:31:02.8076944Z [ OK ] FunctionalTest.AvgPool3d (0 ms) 2023-01-11T23:31:02.8077270Z [ RUN ] FunctionalTest.FractionalMaxPool2d 2023-01-11T23:31:02.8079576Z [ OK ] FunctionalTest.FractionalMaxPool2d (0 ms) 2023-01-11T23:31:02.8079942Z [ RUN ] FunctionalTest.FractionalMaxPool3d 2023-01-11T23:31:02.8081488Z [ OK ] FunctionalTest.FractionalMaxPool3d (0 ms) 2023-01-11T23:31:02.8081820Z [ RUN ] FunctionalTest.LPPool1d 2023-01-11T23:31:02.8083425Z [ OK ] FunctionalTest.LPPool1d (0 ms) 2023-01-11T23:31:02.8083773Z [ RUN ] FunctionalTest.LPPool2d 2023-01-11T23:31:02.8084151Z [ OK ] FunctionalTest.LPPool2d (0 ms) 2023-01-11T23:31:02.8084472Z [ RUN ] FunctionalTest.CosineSimilarity 2023-01-11T23:31:02.8085925Z [ OK ] FunctionalTest.CosineSimilarity (0 ms) 2023-01-11T23:31:02.8086306Z [ RUN ] FunctionalTest.SmoothL1LossDefaultOptions 2023-01-11T23:31:02.8088049Z [ OK ] FunctionalTest.SmoothL1LossDefaultOptions (0 ms) 2023-01-11T23:31:02.8088520Z [ RUN ] FunctionalTest.SmoothL1LossBeta 2023-01-11T23:31:02.8089216Z [ OK ] FunctionalTest.SmoothL1LossBeta (0 ms) 2023-01-11T23:31:02.8089622Z [ RUN ] FunctionalTest.SmoothL1LossNoReduction 2023-01-11T23:31:02.8090562Z [ OK ] 
FunctionalTest.SmoothL1LossNoReduction (0 ms) 2023-01-11T23:31:02.8091025Z [ RUN ] FunctionalTest.HuberLossDefaultOptions 2023-01-11T23:31:02.8092122Z [ OK ] FunctionalTest.HuberLossDefaultOptions (0 ms) 2023-01-11T23:31:02.8092566Z [ RUN ] FunctionalTest.HuberLossDelta 2023-01-11T23:31:02.8092964Z [ OK ] FunctionalTest.HuberLossDelta (0 ms) 2023-01-11T23:31:02.8093369Z [ RUN ] FunctionalTest.HuberLossNoReduction 2023-01-11T23:31:02.8094168Z [ OK ] FunctionalTest.HuberLossNoReduction (0 ms) 2023-01-11T23:31:02.8094917Z [ RUN ] FunctionalTest.SoftMarginLossDefaultOptions 2023-01-11T23:31:02.8097170Z [ OK ] FunctionalTest.SoftMarginLossDefaultOptions (0 ms) 2023-01-11T23:31:02.8097624Z [ RUN ] FunctionalTest.MultiLabelSoftMarginLossDefaultOptions 2023-01-11T23:31:02.8100449Z [ OK ] FunctionalTest.MultiLabelSoftMarginLossDefaultOptions (0 ms) 2023-01-11T23:31:02.8100895Z [ RUN ] FunctionalTest.SoftMarginLossNoReduction 2023-01-11T23:31:02.8101850Z [ OK ] FunctionalTest.SoftMarginLossNoReduction (0 ms) 2023-01-11T23:31:02.8102529Z [ RUN ] FunctionalTest.MultiLabelSoftMarginLossWeightedNoReduction 2023-01-11T23:31:02.8104545Z [ OK ] FunctionalTest.MultiLabelSoftMarginLossWeightedNoReduction (0 ms) 2023-01-11T23:31:02.8105060Z [ RUN ] FunctionalTest.PairwiseDistance 2023-01-11T23:31:02.8105503Z [ OK ] FunctionalTest.PairwiseDistance (0 ms) 2023-01-11T23:31:02.8105856Z [ RUN ] FunctionalTest.PDist 2023-01-11T23:31:02.8106885Z [ OK ] FunctionalTest.PDist (0 ms) 2023-01-11T23:31:02.8107219Z [ RUN ] FunctionalTest.AdaptiveMaxPool1d 2023-01-11T23:31:02.8107667Z [ OK ] FunctionalTest.AdaptiveMaxPool1d (0 ms) 2023-01-11T23:31:02.8108016Z [ RUN ] FunctionalTest.AdaptiveMaxPool2d 2023-01-11T23:31:02.8108422Z [ OK ] FunctionalTest.AdaptiveMaxPool2d (0 ms) 2023-01-11T23:31:02.8108786Z [ RUN ] FunctionalTest.AdaptiveMaxPool3d 2023-01-11T23:31:02.8109457Z [ OK ] FunctionalTest.AdaptiveMaxPool3d (0 ms) 2023-01-11T23:31:02.8110058Z [ RUN ] FunctionalTest.AdaptiveAvgPool1d 2023-01-11T23:31:02.8110460Z [ OK ] FunctionalTest.AdaptiveAvgPool1d (0 ms) 2023-01-11T23:31:02.8110809Z [ RUN ] FunctionalTest.AdaptiveAvgPool2d 2023-01-11T23:31:02.8111460Z [ OK ] FunctionalTest.AdaptiveAvgPool2d (0 ms) 2023-01-11T23:31:02.8112160Z [ RUN ] FunctionalTest.AdaptiveAvgPool3d 2023-01-11T23:31:02.8112777Z [ OK ] FunctionalTest.AdaptiveAvgPool3d (0 ms) 2023-01-11T23:31:02.8113190Z [ RUN ] FunctionalTest.L1Loss 2023-01-11T23:31:02.8113696Z [ OK ] FunctionalTest.L1Loss (0 ms) 2023-01-11T23:31:02.8113988Z [ RUN ] FunctionalTest.MSELoss 2023-01-11T23:31:02.8115123Z [ OK ] FunctionalTest.MSELoss (0 ms) 2023-01-11T23:31:02.8115460Z [ RUN ] FunctionalTest.BCELoss 2023-01-11T23:31:02.8116787Z [ OK ] FunctionalTest.BCELoss (0 ms) 2023-01-11T23:31:02.8117101Z [ RUN ] FunctionalTest.KLDivLoss 2023-01-11T23:31:02.8117772Z [W loss.h:57] Warning: reduction: 'mean' divides the total loss by both the batch size and the support size.'batchmean' divides only by the batch size, and aligns with the KL div math definition.'mean' will be changed to behave the same as 'batchmean' in the next major release. (function kl_div) 2023-01-11T23:31:02.8118415Z [ OK ] FunctionalTest.KLDivLoss (0 ms) 2023-01-11T23:31:02.8118758Z [ RUN ] FunctionalTest.HingeEmbeddingLoss 2023-01-11T23:31:02.8120038Z [ OK ] FunctionalTest.HingeEmbeddingLoss (0 ms) 2023-01-11T23:31:02.8120396Z [ RUN ] FunctionalTest.GridSample 2023-01-11T23:31:02.8122874Z [W vision.h:87] Warning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. 
Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details. (function grid_sample) 2023-01-11T23:31:02.8125128Z [ OK ] FunctionalTest.GridSample (0 ms) 2023-01-11T23:31:02.8125454Z [ RUN ] FunctionalTest.AffineGrid 2023-01-11T23:31:02.8268921Z [ OK ] FunctionalTest.AffineGrid (14 ms) 2023-01-11T23:31:02.8269359Z [ RUN ] FunctionalTest.MultiMarginLoss 2023-01-11T23:31:02.8269711Z [ OK ] FunctionalTest.MultiMarginLoss (0 ms) 2023-01-11T23:31:02.8270207Z [ RUN ] FunctionalTest.CosineEmbeddingLoss 2023-01-11T23:31:02.8273585Z [ OK ] FunctionalTest.CosineEmbeddingLoss (0 ms) 2023-01-11T23:31:02.8274004Z [ RUN ] FunctionalTest.MultiLabelMarginLossDefaultOptions 2023-01-11T23:31:02.8274577Z [ OK ] FunctionalTest.MultiLabelMarginLossDefaultOptions (0 ms) 2023-01-11T23:31:02.8275104Z [ RUN ] FunctionalTest.MultiLabelMarginLossNoReduction 2023-01-11T23:31:02.8276197Z [ OK ] FunctionalTest.MultiLabelMarginLossNoReduction (0 ms) 2023-01-11T23:31:02.8276815Z [ RUN ] FunctionalTest.TripletMarginLoss 2023-01-11T23:31:02.8277434Z [ OK ] FunctionalTest.TripletMarginLoss (0 ms) 2023-01-11T23:31:02.8277875Z [ RUN ] FunctionalTest.TripletMarginWithDistanceLossDefaultParity 2023-01-11T23:31:02.8395044Z [ OK ] FunctionalTest.TripletMarginWithDistanceLossDefaultParity (11 ms) 2023-01-11T23:31:02.8395482Z [ RUN ] FunctionalTest.NLLLoss 2023-01-11T23:31:02.8395849Z [ OK ] FunctionalTest.NLLLoss (0 ms) 2023-01-11T23:31:02.8396163Z [ RUN ] FunctionalTest.CrossEntropy 2023-01-11T23:31:02.8399956Z [ OK ] FunctionalTest.CrossEntropy (0 ms) 2023-01-11T23:31:02.8400295Z [ RUN ] FunctionalTest.MaxUnpool1d 2023-01-11T23:31:02.8403549Z [ OK ] FunctionalTest.MaxUnpool1d (0 ms) 2023-01-11T23:31:02.8403977Z [ RUN ] FunctionalTest.MaxUnpool2d 2023-01-11T23:31:02.8406286Z [ OK ] FunctionalTest.MaxUnpool2d (0 ms) 2023-01-11T23:31:02.8406817Z [ RUN ] FunctionalTest.MaxUnpool3d 2023-01-11T23:31:02.8407552Z [ OK ] FunctionalTest.MaxUnpool3d (0 ms) 2023-01-11T23:31:02.8407897Z [ RUN ] FunctionalTest.ELU 2023-01-11T23:31:02.8421892Z [ OK ] FunctionalTest.ELU (1 ms) 2023-01-11T23:31:02.8422189Z [ RUN ] FunctionalTest.SELU 2023-01-11T23:31:02.8425534Z [ OK ] FunctionalTest.SELU (0 ms) 2023-01-11T23:31:02.8425871Z [ RUN ] FunctionalTest.GLU 2023-01-11T23:31:02.8426783Z [ OK ] FunctionalTest.GLU (0 ms) 2023-01-11T23:31:02.8427077Z [ RUN ] FunctionalTest.GELU 2023-01-11T23:31:02.8431306Z [ OK ] FunctionalTest.GELU (0 ms) 2023-01-11T23:31:02.8431970Z [ RUN ] FunctionalTest.TanhGELU 2023-01-11T23:31:02.8433384Z [ OK ] FunctionalTest.TanhGELU (0 ms) 2023-01-11T23:31:02.8433711Z [ RUN ] FunctionalTest.Hardshrink 2023-01-11T23:31:02.8440329Z [ OK ] FunctionalTest.Hardshrink (0 ms) 2023-01-11T23:31:02.8440650Z [ RUN ] FunctionalTest.OneHot 2023-01-11T23:31:02.8444022Z [ OK ] FunctionalTest.OneHot (0 ms) 2023-01-11T23:31:02.8444334Z [ RUN ] FunctionalTest.Hardtanh 2023-01-11T23:31:02.8473323Z [ OK ] FunctionalTest.Hardtanh (2 ms) 2023-01-11T23:31:02.8473646Z [ RUN ] FunctionalTest.LeakyReLU 2023-01-11T23:31:02.8481591Z [ OK ] FunctionalTest.LeakyReLU (0 ms) 2023-01-11T23:31:02.8481918Z [ RUN ] FunctionalTest.LogSigmoid 2023-01-11T23:31:02.8482436Z [ OK ] FunctionalTest.LogSigmoid (0 ms) 2023-01-11T23:31:02.8482779Z [ RUN ] FunctionalTest.GumbelSoftmax 2023-01-11T23:31:02.8514344Z [ OK ] FunctionalTest.GumbelSoftmax (3 ms) 2023-01-11T23:31:02.8514685Z [ RUN ] FunctionalTest.Softmax 2023-01-11T23:31:02.8515113Z [ OK ] FunctionalTest.Softmax (0 ms) 2023-01-11T23:31:02.8515427Z [ RUN ] 
FunctionalTest.Softmin 2023-01-11T23:31:02.8516059Z [ OK ] FunctionalTest.Softmin (0 ms) 2023-01-11T23:31:02.8516435Z [ RUN ] FunctionalTest.LogSoftmax 2023-01-11T23:31:02.8517370Z [ OK ] FunctionalTest.LogSoftmax (0 ms) 2023-01-11T23:31:02.8517698Z [ RUN ] FunctionalTest.PReLU 2023-01-11T23:31:02.8520818Z [ OK ] FunctionalTest.PReLU (0 ms) 2023-01-11T23:31:02.8521128Z [ RUN ] FunctionalTest.LayerNorm 2023-01-11T23:31:02.8521472Z [ OK ] FunctionalTest.LayerNorm (0 ms) 2023-01-11T23:31:02.8521851Z [ RUN ] FunctionalTest.GroupNorm 2023-01-11T23:31:02.8522164Z [ OK ] FunctionalTest.GroupNorm (0 ms) 2023-01-11T23:31:02.8522500Z [ RUN ] FunctionalTest.LocalResponseNorm 2023-01-11T23:31:02.8524144Z [ OK ] FunctionalTest.LocalResponseNorm (0 ms) 2023-01-11T23:31:02.8524485Z [ RUN ] FunctionalTest.Linear 2023-01-11T23:31:02.8527136Z [ OK ] FunctionalTest.Linear (0 ms) 2023-01-11T23:31:02.8527455Z [ RUN ] FunctionalTest.Embedding 2023-01-11T23:31:02.8527870Z [ OK ] FunctionalTest.Embedding (0 ms) 2023-01-11T23:31:02.8528189Z [ RUN ] FunctionalTest.EmbeddingBag 2023-01-11T23:31:02.8532577Z [ OK ] FunctionalTest.EmbeddingBag (0 ms) 2023-01-11T23:31:02.8532899Z [ RUN ] FunctionalTest.Bilinear 2023-01-11T23:31:02.8535406Z [ OK ] FunctionalTest.Bilinear (0 ms) 2023-01-11T23:31:02.8535716Z [ RUN ] FunctionalTest.Normalize 2023-01-11T23:31:02.8539915Z [ OK ] FunctionalTest.Normalize (0 ms) 2023-01-11T23:31:02.8540221Z [ RUN ] FunctionalTest.ReLU 2023-01-11T23:31:02.8542761Z [ OK ] FunctionalTest.ReLU (0 ms) 2023-01-11T23:31:02.8543098Z [ RUN ] FunctionalTest.ReLUDefaultOptions 2023-01-11T23:31:02.8543544Z [ OK ] FunctionalTest.ReLUDefaultOptions (0 ms) 2023-01-11T23:31:02.8543871Z [ RUN ] FunctionalTest.ReLU6 2023-01-11T23:31:02.8547282Z [ OK ] FunctionalTest.ReLU6 (0 ms) 2023-01-11T23:31:02.8547684Z [ RUN ] FunctionalTest.ReLU6DefaultOptions 2023-01-11T23:31:02.8548094Z [ OK ] FunctionalTest.ReLU6DefaultOptions (0 ms) 2023-01-11T23:31:02.8548415Z [ RUN ] FunctionalTest.RReLU 2023-01-11T23:31:02.8584301Z [ OK ] FunctionalTest.RReLU (3 ms) 2023-01-11T23:31:02.8585252Z [ RUN ] FunctionalTest.RReLUDefaultOptions 2023-01-11T23:31:02.8586256Z [ OK ] FunctionalTest.RReLUDefaultOptions (0 ms) 2023-01-11T23:31:02.8586763Z [ RUN ] FunctionalTest.CELU 2023-01-11T23:31:02.8599203Z [ OK ] FunctionalTest.CELU (1 ms) 2023-01-11T23:31:02.8599698Z [ RUN ] FunctionalTest.CELUDefaultOptions 2023-01-11T23:31:02.8600271Z [ OK ] FunctionalTest.CELUDefaultOptions (0 ms) 2023-01-11T23:31:02.8600782Z [ RUN ] FunctionalTest.PixelShuffle 2023-01-11T23:31:02.8602663Z [ OK ] FunctionalTest.PixelShuffle (0 ms) 2023-01-11T23:31:02.8603186Z [ RUN ] FunctionalTest.PixelUnshuffle 2023-01-11T23:31:02.8604361Z [ OK ] FunctionalTest.PixelUnshuffle (0 ms) 2023-01-11T23:31:02.8604870Z [ RUN ] FunctionalTest.Softplus 2023-01-11T23:31:02.8612493Z [ OK ] FunctionalTest.Softplus (0 ms) 2023-01-11T23:31:02.8613036Z [ RUN ] FunctionalTest.SoftplusDefaultOptions 2023-01-11T23:31:02.8613591Z [ OK ] FunctionalTest.SoftplusDefaultOptions (0 ms) 2023-01-11T23:31:02.8614080Z [ RUN ] FunctionalTest.Fold 2023-01-11T23:31:02.8614867Z [ OK ] FunctionalTest.Fold (0 ms) 2023-01-11T23:31:02.8615260Z [ RUN ] FunctionalTest.Unfold 2023-01-11T23:31:02.8616446Z [ OK ] FunctionalTest.Unfold (0 ms) 2023-01-11T23:31:02.8616760Z [ RUN ] FunctionalTest.Softshrink 2023-01-11T23:31:02.8623523Z [ OK ] FunctionalTest.Softshrink (0 ms) 2023-01-11T23:31:02.8623992Z [ RUN ] FunctionalTest.SoftshrinkDefaultOptions 2023-01-11T23:31:02.8624400Z [ OK ] 
FunctionalTest.SoftshrinkDefaultOptions (0 ms) 2023-01-11T23:31:02.8624868Z [ RUN ] FunctionalTest.Softsign 2023-01-11T23:31:02.8625231Z [ OK ] FunctionalTest.Softsign (0 ms) 2023-01-11T23:31:02.8625543Z [ RUN ] FunctionalTest.Mish 2023-01-11T23:31:02.8626287Z [ OK ] FunctionalTest.Mish (0 ms) 2023-01-11T23:31:02.8626603Z [ RUN ] FunctionalTest.Tanhshrink 2023-01-11T23:31:02.8626961Z [ OK ] FunctionalTest.Tanhshrink (0 ms) 2023-01-11T23:31:02.8627326Z [ RUN ] FunctionalTest.Threshold 2023-01-11T23:31:02.8640658Z [ OK ] FunctionalTest.Threshold (1 ms) 2023-01-11T23:31:02.8640997Z [ RUN ] FunctionalTest.BatchNorm1d 2023-01-11T23:31:02.8641425Z [ OK ] FunctionalTest.BatchNorm1d (0 ms) 2023-01-11T23:31:02.8641784Z [ RUN ] FunctionalTest.BatchNorm1dDefaultOptions 2023-01-11T23:31:02.8642553Z [ OK ] FunctionalTest.BatchNorm1dDefaultOptions (0 ms) 2023-01-11T23:31:02.8643110Z [ RUN ] FunctionalTest.BatchNorm2d 2023-01-11T23:31:02.8644029Z [ OK ] FunctionalTest.BatchNorm2d (0 ms) 2023-01-11T23:31:02.8644583Z [ RUN ] FunctionalTest.BatchNorm2dDefaultOptions 2023-01-11T23:31:02.8645480Z [ OK ] FunctionalTest.BatchNorm2dDefaultOptions (0 ms) 2023-01-11T23:31:02.8646017Z [ RUN ] FunctionalTest.BatchNorm3d 2023-01-11T23:31:02.8647004Z [ OK ] FunctionalTest.BatchNorm3d (0 ms) 2023-01-11T23:31:02.8647544Z [ RUN ] FunctionalTest.BatchNorm3dDefaultOptions 2023-01-11T23:31:02.8648516Z [ OK ] FunctionalTest.BatchNorm3dDefaultOptions (0 ms) 2023-01-11T23:31:02.8649212Z [ RUN ] FunctionalTest.InstanceNorm1d 2023-01-11T23:31:02.8652111Z [ OK ] FunctionalTest.InstanceNorm1d (0 ms) 2023-01-11T23:31:02.8652701Z [ RUN ] FunctionalTest.InstanceNorm1dDefaultOptions 2023-01-11T23:31:02.8653675Z [ OK ] FunctionalTest.InstanceNorm1dDefaultOptions (0 ms) 2023-01-11T23:31:02.8654218Z [ RUN ] FunctionalTest.InstanceNorm2d 2023-01-11T23:31:02.8657545Z [ OK ] FunctionalTest.InstanceNorm2d (0 ms) 2023-01-11T23:31:02.8658101Z [ RUN ] FunctionalTest.InstanceNorm2dDefaultOptions 2023-01-11T23:31:02.8659503Z [ OK ] FunctionalTest.InstanceNorm2dDefaultOptions (0 ms) 2023-01-11T23:31:02.8660046Z [ RUN ] FunctionalTest.InstanceNorm3d 2023-01-11T23:31:02.8664404Z [ OK ] FunctionalTest.InstanceNorm3d (0 ms) 2023-01-11T23:31:02.8664987Z [ RUN ] FunctionalTest.InstanceNorm3dDefaultOptions 2023-01-11T23:31:02.8667646Z [ OK ] FunctionalTest.InstanceNorm3dDefaultOptions (0 ms) 2023-01-11T23:31:02.8668223Z [ RUN ] FunctionalTest.Interpolate 2023-01-11T23:31:02.8669133Z [W upsampling.h:66] Warning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and uses scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
(function _interp_output_size) 2023-01-11T23:31:02.8756429Z [ OK ] FunctionalTest.Interpolate (8 ms) 2023-01-11T23:31:02.8756931Z [ RUN ] FunctionalTest.Pad1 2023-01-11T23:31:02.8757359Z [ OK ] FunctionalTest.Pad1 (0 ms) 2023-01-11T23:31:02.8757803Z [ RUN ] FunctionalTest.Pad2 2023-01-11T23:31:02.8759521Z [ OK ] FunctionalTest.Pad2 (0 ms) 2023-01-11T23:31:02.8759959Z [ RUN ] FunctionalTest.Pad3 2023-01-11T23:31:02.8766241Z [ OK ] FunctionalTest.Pad3 (0 ms) 2023-01-11T23:31:02.8766682Z [ RUN ] FunctionalTest.Pad4 2023-01-11T23:31:02.8768508Z [ OK ] FunctionalTest.Pad4 (0 ms) 2023-01-11T23:31:02.8768939Z [ RUN ] FunctionalTest.Pad5 2023-01-11T23:31:02.8772999Z [ OK ] FunctionalTest.Pad5 (0 ms) 2023-01-11T23:31:02.8773442Z [ RUN ] FunctionalTest.Pad6 2023-01-11T23:31:02.8775513Z [ OK ] FunctionalTest.Pad6 (0 ms) 2023-01-11T23:31:02.8775920Z [ RUN ] FunctionalTest.Pad7 2023-01-11T23:31:02.8776332Z [ OK ] FunctionalTest.Pad7 (0 ms) 2023-01-11T23:31:02.8776729Z [ RUN ] FunctionalTest.Pad8 2023-01-11T23:31:02.8777120Z [ OK ] FunctionalTest.Pad8 (0 ms) 2023-01-11T23:31:02.8777531Z [ RUN ] FunctionalTest.CTCLoss 2023-01-11T23:31:02.8888008Z [ OK ] FunctionalTest.CTCLoss (11 ms) 2023-01-11T23:31:02.8888545Z [ RUN ] FunctionalTest.PoissonNLLLoss 2023-01-11T23:31:02.8889560Z [ OK ] FunctionalTest.PoissonNLLLoss (0 ms) 2023-01-11T23:31:02.8890100Z [ RUN ] FunctionalTest.MarginRankingLoss 2023-01-11T23:31:02.8894131Z [ OK ] FunctionalTest.MarginRankingLoss (0 ms) 2023-01-11T23:31:02.8896577Z [ RUN ] FunctionalTest.ConvTranspose1d 2023-01-11T23:31:02.8897156Z [ OK ] FunctionalTest.ConvTranspose1d (0 ms) 2023-01-11T23:31:02.8897672Z [ RUN ] FunctionalTest.ConvTranspose2dEven 2023-01-11T23:31:02.8902308Z [ OK ] FunctionalTest.ConvTranspose2dEven (0 ms) 2023-01-11T23:31:02.8902889Z [ RUN ] FunctionalTest.ConvTranspose2dUneven 2023-01-11T23:31:02.8905760Z [ OK ] FunctionalTest.ConvTranspose2dUneven (0 ms) 2023-01-11T23:31:02.8906333Z [ RUN ] FunctionalTest.ConvTranspose3d 2023-01-11T23:31:02.8909064Z [ OK ] FunctionalTest.ConvTranspose3d (0 ms) 2023-01-11T23:31:02.8909606Z [ RUN ] FunctionalTest.AlphaDropout 2023-01-11T23:31:02.8918849Z [ OK ] FunctionalTest.AlphaDropout (0 ms) 2023-01-11T23:31:02.8919403Z [ RUN ] FunctionalTest.FeatureAlphaDropout 2023-01-11T23:31:02.8928000Z [ OK ] FunctionalTest.FeatureAlphaDropout (0 ms) 2023-01-11T23:31:02.8928552Z [ RUN ] FunctionalTest.Dropout 2023-01-11T23:31:02.8933094Z [ OK ] FunctionalTest.Dropout (0 ms) 2023-01-11T23:31:02.8933594Z [ RUN ] FunctionalTest.Dropout2d 2023-01-11T23:31:02.8938589Z [ OK ] FunctionalTest.Dropout2d (0 ms) 2023-01-11T23:31:02.8939069Z [ RUN ] FunctionalTest.Dropout3d 2023-01-11T23:31:02.8943946Z [ OK ] FunctionalTest.Dropout3d (0 ms) 2023-01-11T23:31:02.8944438Z [ RUN ] FunctionalTest.isfinite 2023-01-11T23:31:02.8950146Z [ OK ] FunctionalTest.isfinite (0 ms) 2023-01-11T23:31:02.8950590Z [ RUN ] FunctionalTest.isfinite_CUDA 2023-01-11T23:31:03.7402210Z [ OK ] FunctionalTest.isfinite_CUDA (844 ms) 2023-01-11T23:31:03.7402546Z [ RUN ] FunctionalTest.isinf 2023-01-11T23:31:03.7406634Z [ OK ] FunctionalTest.isinf (0 ms) 2023-01-11T23:31:03.7406958Z [ RUN ] FunctionalTest.isinf_CUDA 2023-01-11T23:31:03.7436609Z [ OK ] FunctionalTest.isinf_CUDA (2 ms) 2023-01-11T23:31:03.7436950Z [ RUN ] FunctionalTest.AllClose 2023-01-11T23:31:03.7510913Z [ OK ] FunctionalTest.AllClose (7 ms) 2023-01-11T23:31:03.7511480Z [ RUN ] FunctionalTest.AllClose_CUDA 2023-01-11T23:31:03.7836886Z [ OK ] FunctionalTest.AllClose_CUDA (32 ms) 2023-01-11T23:31:03.7837255Z [ 
RUN ] FunctionalTest.BCEWithLogitsLoss 2023-01-11T23:31:03.7875095Z [ OK ] FunctionalTest.BCEWithLogitsLoss (3 ms) 2023-01-11T23:31:03.7875755Z [----------] 135 tests from FunctionalTest (982 ms total) 2023-01-11T23:31:03.7875939Z 2023-01-11T23:31:03.7876110Z [----------] 3 tests from IntegrationTest 2023-01-11T23:31:03.7876425Z [ RUN ] IntegrationTest.CartPole 2023-01-11T23:31:15.8182055Z [ OK ] IntegrationTest.CartPole (12030 ms) 2023-01-11T23:31:15.8182711Z [ RUN ] IntegrationTest.MNIST_CUDA 2023-01-11T23:31:25.2905575Z [ OK ] IntegrationTest.MNIST_CUDA (9472 ms) 2023-01-11T23:31:25.2905990Z [ RUN ] IntegrationTest.MNISTBatchNorm_CUDA 2023-01-11T23:31:34.0412462Z [ OK ] IntegrationTest.MNISTBatchNorm_CUDA (8750 ms) 2023-01-11T23:31:34.0413088Z [----------] 3 tests from IntegrationTest (30253 ms total) 2023-01-11T23:31:34.0413269Z 2023-01-11T23:31:34.0413419Z [----------] 9 tests from InitTest 2023-01-11T23:31:34.0413812Z [ RUN ] InitTest.ProducesPyTorchValues_XavierUniform 2023-01-11T23:31:34.0427265Z [ OK ] InitTest.ProducesPyTorchValues_XavierUniform (1 ms) 2023-01-11T23:31:34.0427687Z [ RUN ] InitTest.ProducesPyTorchValues_XavierNormal 2023-01-11T23:31:34.0441345Z [ OK ] InitTest.ProducesPyTorchValues_XavierNormal (1 ms) 2023-01-11T23:31:34.0441904Z [ RUN ] InitTest.ProducesPyTorchValues_KaimingNormal 2023-01-11T23:31:34.0456364Z [ OK ] InitTest.ProducesPyTorchValues_KaimingNormal (1 ms) 2023-01-11T23:31:34.0456780Z [ RUN ] InitTest.ProducesPyTorchValues_KaimingUniform 2023-01-11T23:31:34.0471467Z [ OK ] InitTest.ProducesPyTorchValues_KaimingUniform (1 ms) 2023-01-11T23:31:34.0472272Z [ RUN ] InitTest.CanInitializeTensorThatRequiresGrad 2023-01-11T23:31:34.0499219Z [ OK ] InitTest.CanInitializeTensorThatRequiresGrad (2 ms) 2023-01-11T23:31:34.0499628Z [ RUN ] InitTest.CalculateGainWithTanh 2023-01-11T23:31:34.0500029Z [ OK ] InitTest.CalculateGainWithTanh (0 ms) 2023-01-11T23:31:34.0500442Z [ RUN ] InitTest.CalculateGainWithRelu 2023-01-11T23:31:34.0500876Z [ OK ] InitTest.CalculateGainWithRelu (0 ms) 2023-01-11T23:31:34.0501371Z [ RUN ] InitTest.CalculateGainWithLeakyRelu 2023-01-11T23:31:34.0503453Z [ OK ] InitTest.CalculateGainWithLeakyRelu (0 ms) 2023-01-11T23:31:34.0504025Z [ RUN ] InitTest.CanInitializeCnnWithOrthogonal 2023-01-11T23:31:34.0504594Z [ OK ] InitTest.CanInitializeCnnWithOrthogonal (0 ms) 2023-01-11T23:31:34.0504970Z [----------] 9 tests from InitTest (9 ms total) 2023-01-11T23:31:34.0505130Z 2023-01-11T23:31:34.0505297Z [----------] 6 tests from TorchScriptTest 2023-01-11T23:31:34.0505653Z [ RUN ] TorchScriptTest.CanCompileMultipleFunctions 2023-01-11T23:31:34.0917602Z [ OK ] TorchScriptTest.CanCompileMultipleFunctions (41 ms) 2023-01-11T23:31:34.0918713Z [ RUN ] TorchScriptTest.TestNestedIValueModuleArgMatching 2023-01-11T23:31:34.0939205Z [ OK ] TorchScriptTest.TestNestedIValueModuleArgMatching (2 ms) 2023-01-11T23:31:34.0939756Z [ RUN ] TorchScriptTest.TestDictArgMatching 2023-01-11T23:31:34.0942651Z [ OK ] TorchScriptTest.TestDictArgMatching (0 ms) 2023-01-11T23:31:34.0943204Z [ RUN ] TorchScriptTest.TestTupleArgMatching 2023-01-11T23:31:34.0943695Z [ OK ] TorchScriptTest.TestTupleArgMatching (0 ms) 2023-01-11T23:31:34.0944105Z [ RUN ] TorchScriptTest.TestOptionalArgMatching 2023-01-11T23:31:34.0950028Z [ OK ] TorchScriptTest.TestOptionalArgMatching (0 ms) 2023-01-11T23:31:34.0950559Z [ RUN ] TorchScriptTest.TestPickle 2023-01-11T23:31:34.0951013Z [ OK ] TorchScriptTest.TestPickle (0 ms) 2023-01-11T23:31:34.0952543Z [----------] 6 tests from TorchScriptTest 
(44 ms total) 2023-01-11T23:31:34.0952786Z 2023-01-11T23:31:34.0953033Z [----------] 3 tests from MakeUniqueTest 2023-01-11T23:31:34.0953532Z [ RUN ] MakeUniqueTest.ForwardRvaluesCorrectly 2023-01-11T23:31:34.0954059Z [ OK ] MakeUniqueTest.ForwardRvaluesCorrectly (0 ms) 2023-01-11T23:31:34.0954590Z [ RUN ] MakeUniqueTest.ForwardLvaluesCorrectly 2023-01-11T23:31:34.0955092Z [ OK ] MakeUniqueTest.ForwardLvaluesCorrectly (0 ms) 2023-01-11T23:31:34.0955578Z [ RUN ] MakeUniqueTest.CanConstructUniquePtrOfArray 2023-01-11T23:31:34.0956190Z [ OK ] MakeUniqueTest.CanConstructUniquePtrOfArray (0 ms) 2023-01-11T23:31:34.0956682Z [----------] 3 tests from MakeUniqueTest (0 ms total) 2023-01-11T23:31:34.0956855Z 2023-01-11T23:31:34.0957067Z [----------] 2 tests from MetaTensorTest 2023-01-11T23:31:34.0957502Z [ RUN ] MetaTensorTest.MetaDeviceApi 2023-01-11T23:31:34.0957965Z [ OK ] MetaTensorTest.MetaDeviceApi (0 ms) 2023-01-11T23:31:34.0958361Z [ RUN ] MetaTensorTest.MetaNamespaceApi 2023-01-11T23:31:34.0958717Z [ OK ] MetaTensorTest.MetaNamespaceApi (0 ms) 2023-01-11T23:31:34.0959067Z [----------] 2 tests from MetaTensorTest (0 ms total) 2023-01-11T23:31:34.0959230Z 2023-01-11T23:31:34.0959376Z [----------] 2 tests from UtilsTest 2023-01-11T23:31:34.0959787Z [ RUN ] UtilsTest.WarnOnce 2023-01-11T23:31:34.0960075Z [ OK ] UtilsTest.WarnOnce (0 ms) 2023-01-11T23:31:34.0960407Z [ RUN ] UtilsTest.AmbiguousOperatorDefaults 2023-01-11T23:31:34.0960775Z [ OK ] UtilsTest.AmbiguousOperatorDefaults (0 ms) 2023-01-11T23:31:34.0961127Z [----------] 2 tests from UtilsTest (0 ms total) 2023-01-11T23:31:34.0961274Z 2023-01-11T23:31:34.0961419Z [----------] 1 test from NoGradTest 2023-01-11T23:31:34.0961731Z [ RUN ] NoGradTest.SetsGradModeCorrectly 2023-01-11T23:31:34.0983322Z [ OK ] NoGradTest.SetsGradModeCorrectly (3 ms) 2023-01-11T23:31:34.0983890Z [----------] 1 test from NoGradTest (3 ms total) 2023-01-11T23:31:34.0984117Z 2023-01-11T23:31:34.0984344Z [----------] 3 tests from AutogradTest 2023-01-11T23:31:34.0984672Z [ RUN ] AutogradTest.CanTakeDerivatives 2023-01-11T23:31:34.0985103Z [ OK ] AutogradTest.CanTakeDerivatives (0 ms) 2023-01-11T23:31:34.0985518Z [ RUN ] AutogradTest.CanTakeDerivativesOfZeroDimTensors 2023-01-11T23:31:34.0985977Z [ OK ] AutogradTest.CanTakeDerivativesOfZeroDimTensors (0 ms) 2023-01-11T23:31:34.0986394Z [ RUN ] AutogradTest.CanPassCustomGradientInputs 2023-01-11T23:31:34.0987977Z [ OK ] AutogradTest.CanPassCustomGradientInputs (0 ms) 2023-01-11T23:31:34.0988491Z [----------] 3 tests from AutogradTest (0 ms total) 2023-01-11T23:31:34.0988703Z 2023-01-11T23:31:34.0988908Z [----------] 1 test from OptionalArrayRefTest 2023-01-11T23:31:34.0989333Z [ RUN ] OptionalArrayRefTest.DanglingPointerFix 2023-01-11T23:31:34.0989857Z [ OK ] OptionalArrayRefTest.DanglingPointerFix (0 ms) 2023-01-11T23:31:34.0990472Z [----------] 1 test from OptionalArrayRefTest (0 ms total) 2023-01-11T23:31:34.0990659Z 2023-01-11T23:31:34.0990816Z [----------] 55 tests from ModuleTest 2023-01-11T23:31:34.0991167Z [ RUN ] ModuleTest.CanEnableAndDisableTrainingMode 2023-01-11T23:31:34.0991568Z [ OK ] ModuleTest.CanEnableAndDisableTrainingMode (0 ms) 2023-01-11T23:31:34.0991911Z [ RUN ] ModuleTest.ZeroGrad 2023-01-11T23:31:34.0992213Z [ OK ] ModuleTest.ZeroGrad (0 ms) 2023-01-11T23:31:34.0992584Z [ RUN ] ModuleTest.ZeroGradWithUndefined 2023-01-11T23:31:34.0992956Z [ OK ] ModuleTest.ZeroGradWithUndefined (0 ms) 2023-01-11T23:31:34.0993365Z [ RUN ] ModuleTest.RegisterModuleThrowsForEmptyOrDottedName 
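The [W loss.h:57] and [W upsampling.h:66] warnings captured earlier in this run describe defaults that changed (interpolate, in 1.6.0) or are scheduled to change (kl_div's 'mean' reduction); both go quiet once the option is passed explicitly. A minimal C++ sketch of doing that through the functional API (illustrative only, with made-up tensors; not part of this test run):

  #include <torch/torch.h>
  namespace F = torch::nn::functional;

  int main() {
    // kl_div: choose the reduction explicitly instead of the 'mean' default.
    auto input  = torch::log_softmax(torch::randn({4, 10}), /*dim=*/1);
    auto target = torch::softmax(torch::randn({4, 10}), /*dim=*/1);
    auto loss = F::kl_div(
        input, target, F::KLDivFuncOptions().reduction(torch::kBatchMean));

    // interpolate: state recompute_scale_factor rather than relying on it.
    auto x = torch::randn({1, 1, 8});
    auto y = F::interpolate(
        x, F::InterpolateFuncOptions()
               .scale_factor(std::vector<double>{2.0})
               .mode(torch::kLinear)
               .recompute_scale_factor(false));
    TORCH_CHECK(y.size(2) == 16);
  }

The grid_sample/affine_grid warning seen under FunctionalTest.GridSample follows the same pattern: passing GridSampleFuncOptions().align_corners(...) makes the choice explicit.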
2023-01-11T23:31:34.1016032Z [ OK ] ModuleTest.RegisterModuleThrowsForEmptyOrDottedName (2 ms) 2023-01-11T23:31:34.1017018Z [ RUN ] ModuleTest.RegisterModuleThrowsForDuplicateModuleName 2023-01-11T23:31:34.1026322Z [ OK ] ModuleTest.RegisterModuleThrowsForDuplicateModuleName (1 ms) 2023-01-11T23:31:34.1026807Z [ RUN ] ModuleTest.ReplaceModuleThrowsForUnknownModuleName 2023-01-11T23:31:34.1038449Z [ OK ] ModuleTest.ReplaceModuleThrowsForUnknownModuleName (1 ms) 2023-01-11T23:31:34.1038922Z [ RUN ] ModuleTest.ReplaceModule 2023-01-11T23:31:34.1039242Z [ OK ] ModuleTest.ReplaceModule (0 ms) 2023-01-11T23:31:34.1039559Z [ RUN ] ModuleTest.UnregisterModule 2023-01-11T23:31:34.1055022Z [ OK ] ModuleTest.UnregisterModule (1 ms) 2023-01-11T23:31:34.1055606Z [ RUN ] ModuleTest.RegisterParameterThrowsForEmptyOrDottedName 2023-01-11T23:31:34.1088021Z [ OK ] ModuleTest.RegisterParameterThrowsForEmptyOrDottedName (3 ms) 2023-01-11T23:31:34.1088656Z [ RUN ] ModuleTest.RegisterParameterThrowsForDuplicateModuleName 2023-01-11T23:31:34.1110124Z [ OK ] ModuleTest.RegisterParameterThrowsForDuplicateModuleName (2 ms) 2023-01-11T23:31:34.1110766Z [ RUN ] ModuleTest.RegisterParameterUndefinedTensor 2023-01-11T23:31:34.1111302Z [ OK ] ModuleTest.RegisterParameterUndefinedTensor (0 ms) 2023-01-11T23:31:34.1111746Z [ RUN ] ModuleTest.RegisterBufferThrowsForEmptyOrDottedName 2023-01-11T23:31:34.1142908Z [ OK ] ModuleTest.RegisterBufferThrowsForEmptyOrDottedName (3 ms) 2023-01-11T23:31:34.1143473Z [ RUN ] ModuleTest.RegisterBufferThrowsForDuplicateModuleName 2023-01-11T23:31:34.1164226Z [ OK ] ModuleTest.RegisterBufferThrowsForDuplicateModuleName (2 ms) 2023-01-11T23:31:34.1164856Z [ RUN ] ModuleTest.CanGetName 2023-01-11T23:31:34.1165242Z [ OK ] ModuleTest.CanGetName (0 ms) 2023-01-11T23:31:34.1165578Z [ RUN ] ModuleTest.AsCastsModulesCorrectly 2023-01-11T23:31:34.1165941Z [ OK ] ModuleTest.AsCastsModulesCorrectly (0 ms) 2023-01-11T23:31:34.1166425Z [ RUN ] ModuleTest.DeviceOrDtypeConversionSkipsUndefinedTensor 2023-01-11T23:31:34.1167052Z [ OK ] ModuleTest.DeviceOrDtypeConversionSkipsUndefinedTensor (0 ms) 2023-01-11T23:31:34.1167630Z [ RUN ] ModuleTest.DeviceOrDtypeConversionSkipsUndefinedTensor_CUDA 2023-01-11T23:31:34.1168123Z [ OK ] ModuleTest.DeviceOrDtypeConversionSkipsUndefinedTensor_CUDA (0 ms) 2023-01-11T23:31:34.1168619Z [ RUN ] ModuleTest.ParametersAndBuffersAccessorSkipsUndefinedTensor 2023-01-11T23:31:34.1169124Z [ OK ] ModuleTest.ParametersAndBuffersAccessorSkipsUndefinedTensor (0 ms) 2023-01-11T23:31:34.1169637Z [ RUN ] ModuleTest.CallingCloneOnModuleThatDoesNotOverrideCloneThrows 2023-01-11T23:31:34.1184596Z [ OK ] ModuleTest.CallingCloneOnModuleThatDoesNotOverrideCloneThrows (1 ms) 2023-01-11T23:31:34.1185142Z [ RUN ] ModuleTest.CallingCloneOnModuleThatDoesOverrideCloneDoesNotThrow 2023-01-11T23:31:34.1185682Z [ OK ] ModuleTest.CallingCloneOnModuleThatDoesOverrideCloneDoesNotThrow (0 ms) 2023-01-11T23:31:34.1186148Z [ RUN ] ModuleTest.CloneCreatesDistinctParameters 2023-01-11T23:31:34.1192678Z [ OK ] ModuleTest.CloneCreatesDistinctParameters (0 ms) 2023-01-11T23:31:34.1193140Z [ RUN ] ModuleTest.CloneCreatesDistinctParametersExplicitDevice_CUDA 2023-01-11T23:31:34.1209299Z [ OK ] ModuleTest.CloneCreatesDistinctParametersExplicitDevice_CUDA (1 ms) 2023-01-11T23:31:34.1209747Z [ RUN ] ModuleTest.ClonePreservesExternalReferences 2023-01-11T23:31:34.1210683Z [ OK ] ModuleTest.ClonePreservesExternalReferences (0 ms) 2023-01-11T23:31:34.1211230Z [ RUN ] 
ModuleTest.CloneCopiesTheValuesOfVariablesOfSubmodules 2023-01-11T23:31:34.1211912Z [ OK ] ModuleTest.CloneCopiesTheValuesOfVariablesOfSubmodules (0 ms) 2023-01-11T23:31:34.1212402Z [ RUN ] ModuleTest.CloneToDevicePreservesTheDeviceOfParameters_CUDA 2023-01-11T23:31:34.1216484Z [ OK ] ModuleTest.CloneToDevicePreservesTheDeviceOfParameters_CUDA (0 ms) 2023-01-11T23:31:34.1217059Z [ RUN ] ModuleTest.HasCorrectNumberOfParameters 2023-01-11T23:31:34.1217602Z [ OK ] ModuleTest.HasCorrectNumberOfParameters (0 ms) 2023-01-11T23:31:34.1218154Z [ RUN ] ModuleTest.ContainsParametersWithTheCorrectName 2023-01-11T23:31:34.1218612Z [ OK ] ModuleTest.ContainsParametersWithTheCorrectName (0 ms) 2023-01-11T23:31:34.1219016Z [ RUN ] ModuleTest.HasCorrectNumberOfBuffers 2023-01-11T23:31:34.1219400Z [ OK ] ModuleTest.HasCorrectNumberOfBuffers (0 ms) 2023-01-11T23:31:34.1219785Z [ RUN ] ModuleTest.ContainsBuffersWithTheCorrectName 2023-01-11T23:31:34.1220209Z [ OK ] ModuleTest.ContainsBuffersWithTheCorrectName (0 ms) 2023-01-11T23:31:34.1220723Z [ RUN ] ModuleTest.DefaultConstructorOfModuleHolderCallsDefaultConstructorOfImpl 2023-01-11T23:31:34.1221404Z [ OK ] ModuleTest.DefaultConstructorOfModuleHolderCallsDefaultConstructorOfImpl (0 ms) 2023-01-11T23:31:34.1221999Z [ RUN ] ModuleTest.ValueConstructorOfModuleHolderCallsCorrectConstructorInImpl 2023-01-11T23:31:34.1222588Z [ OK ] ModuleTest.ValueConstructorOfModuleHolderCallsCorrectConstructorInImpl (0 ms) 2023-01-11T23:31:34.1223149Z [ RUN ] ModuleTest.NullptrConstructorLeavesTheModuleHolderInEmptyState 2023-01-11T23:31:34.1229616Z [ OK ] ModuleTest.NullptrConstructorLeavesTheModuleHolderInEmptyState (1 ms) 2023-01-11T23:31:34.1230361Z [ RUN ] ModuleTest.ModulesReturnsExpectedSubmodulesForFlatModel 2023-01-11T23:31:34.1230903Z [ OK ] ModuleTest.ModulesReturnsExpectedSubmodulesForFlatModel (0 ms) 2023-01-11T23:31:34.1231516Z [ RUN ] ModuleTest.ModulesExcludesSelfWhenIncludeSelfSetToFalse 2023-01-11T23:31:34.1232020Z [ OK ] ModuleTest.ModulesExcludesSelfWhenIncludeSelfSetToFalse (0 ms) 2023-01-11T23:31:34.1232585Z [ RUN ] ModuleTest.NamedModulesReturnsExpectedNamedSubmodulesForFlatModel 2023-01-11T23:31:34.1233302Z [ OK ] ModuleTest.NamedModulesReturnsExpectedNamedSubmodulesForFlatModel (0 ms) 2023-01-11T23:31:34.1234034Z [ RUN ] ModuleTest.NamedModulesExcludesSelfWhenIncludeSelfSetToFalse 2023-01-11T23:31:34.1234738Z [ OK ] ModuleTest.NamedModulesExcludesSelfWhenIncludeSelfSetToFalse (0 ms) 2023-01-11T23:31:34.1235269Z [ RUN ] ModuleTest.ChildrenReturnsExpectedSubmodulesForFlatModel 2023-01-11T23:31:34.1235926Z [ OK ] ModuleTest.ChildrenReturnsExpectedSubmodulesForFlatModel (0 ms) 2023-01-11T23:31:34.1236573Z [ RUN ] ModuleTest.NamedChildrenReturnsExpectedNamedSubmodulesForFlatModel 2023-01-11T23:31:34.1237201Z [ OK ] ModuleTest.NamedChildrenReturnsExpectedNamedSubmodulesForFlatModel (0 ms) 2023-01-11T23:31:34.1237717Z [ RUN ] ModuleTest.ParametersReturnsExpectedTensorsForFlatModel 2023-01-11T23:31:34.1238208Z [ OK ] ModuleTest.ParametersReturnsExpectedTensorsForFlatModel (0 ms) 2023-01-11T23:31:34.1238699Z [ RUN ] ModuleTest.NamedParametersReturnsExpectedTensorsForFlatModel 2023-01-11T23:31:34.1239216Z [ OK ] ModuleTest.NamedParametersReturnsExpectedTensorsForFlatModel (0 ms) 2023-01-11T23:31:34.1239701Z [ RUN ] ModuleTest.BuffersReturnsExpectedTensorsForFlatModel 2023-01-11T23:31:34.1240168Z [ OK ] ModuleTest.BuffersReturnsExpectedTensorsForFlatModel (0 ms) 2023-01-11T23:31:34.1240641Z [ RUN ] ModuleTest.NamedBuffersReturnsExpectedTensorsForFlatModel 
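The block of ModuleTest cases above pins down the registration contract: anything attached through register_module, register_parameter or register_buffer becomes visible, under a dotted path, to the parameters()/buffers()/modules() family of accessors. A minimal sketch of a module exercising all three (illustrative only, not the test source):

  #include <torch/torch.h>
  #include <iostream>

  struct Net : torch::nn::Module {
    Net() {
      // Each register_* call takes a name and hands back the stored handle.
      fc = register_module("fc", torch::nn::Linear(4, 3));
      scale = register_parameter("scale", torch::ones({3}));
      running = register_buffer("running", torch::zeros({3}));
    }
    torch::nn::Linear fc{nullptr};
    torch::Tensor scale, running;
  };

  int main() {
    Net net;
    for (const auto& p : net.named_parameters())
      std::cout << p.key() << '\n';  // e.g. scale, fc.weight, fc.bias
    std::cout << net.named_buffers().size() << '\n';  // 1 (just "running")
  }

Empty and dotted names are rejected at registration time, which is what the RegisterModuleThrowsForEmptyOrDottedName and RegisterParameterThrowsForEmptyOrDottedName cases assert.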
2023-01-11T23:31:34.1241142Z [ OK ] ModuleTest.NamedBuffersReturnsExpectedTensorsForFlatModel (0 ms) 2023-01-11T23:31:34.1241625Z [ RUN ] ModuleTest.ModulesReturnsExpectedSubmodulesForDeepModel 2023-01-11T23:31:34.1242103Z [ OK ] ModuleTest.ModulesReturnsExpectedSubmodulesForDeepModel (0 ms) 2023-01-11T23:31:34.1242621Z [ RUN ] ModuleTest.NamedModulesReturnsExpectedNamedSubmodulesForDeepModel 2023-01-11T23:31:34.1243171Z [ OK ] ModuleTest.NamedModulesReturnsExpectedNamedSubmodulesForDeepModel (0 ms) 2023-01-11T23:31:34.1243690Z [ RUN ] ModuleTest.ChildrensReturnsExpectedSubmodulesForDeepModel 2023-01-11T23:31:34.1244180Z [ OK ] ModuleTest.ChildrensReturnsExpectedSubmodulesForDeepModel (0 ms) 2023-01-11T23:31:34.1244710Z [ RUN ] ModuleTest.NamedChildrensReturnsExpectedNamedSubmodulesForDeepModel 2023-01-11T23:31:34.1245276Z [ OK ] ModuleTest.NamedChildrensReturnsExpectedNamedSubmodulesForDeepModel (0 ms) 2023-01-11T23:31:34.1245741Z [ RUN ] ModuleTest.ModuleApplyIteratesCorreclty 2023-01-11T23:31:34.1246132Z [ OK ] ModuleTest.ModuleApplyIteratesCorreclty (0 ms) 2023-01-11T23:31:34.1246533Z [ RUN ] ModuleTest.ConstModuleApplyIteratesCorreclty 2023-01-11T23:31:34.1246996Z [ OK ] ModuleTest.ConstModuleApplyIteratesCorreclty (0 ms) 2023-01-11T23:31:34.1247406Z [ RUN ] ModuleTest.NamedModuleApplyIteratesCorreclty 2023-01-11T23:31:34.1247824Z [ OK ] ModuleTest.NamedModuleApplyIteratesCorreclty (0 ms) 2023-01-11T23:31:34.1248255Z [ RUN ] ModuleTest.ConstNamedModuleApplyIteratesCorreclty 2023-01-11T23:31:34.1248698Z [ OK ] ModuleTest.ConstNamedModuleApplyIteratesCorreclty (0 ms) 2023-01-11T23:31:34.1249135Z [ RUN ] ModuleTest.ModulePointerApplyIteratesCorreclty 2023-01-11T23:31:34.1249566Z [ OK ] ModuleTest.ModulePointerApplyIteratesCorreclty (0 ms) 2023-01-11T23:31:34.1250011Z [ RUN ] ModuleTest.NamedModulePointerApplyIteratesCorreclty 2023-01-11T23:31:34.1250463Z [ OK ] ModuleTest.NamedModulePointerApplyIteratesCorreclty (0 ms) 2023-01-11T23:31:34.1250955Z [ RUN ] ModuleTest.ThrowsWhenAttemptingtoGetTopLevelModuleAsSharedPtr 2023-01-11T23:31:34.1266834Z [ OK ] ModuleTest.ThrowsWhenAttemptingtoGetTopLevelModuleAsSharedPtr (2 ms) 2023-01-11T23:31:34.1267279Z [ RUN ] ModuleTest.PrettyPrint 2023-01-11T23:31:34.1267650Z [ OK ] ModuleTest.PrettyPrint (0 ms) 2023-01-11T23:31:34.1268059Z [ RUN ] ModuleTest.CanCallForwardOnNonTensorForwardThroughPimpl 2023-01-11T23:31:34.1268545Z [ OK ] ModuleTest.CanCallForwardOnNonTensorForwardThroughPimpl (0 ms) 2023-01-11T23:31:34.1268953Z [----------] 55 tests from ModuleTest (27 ms total) 2023-01-11T23:31:34.1269113Z 2023-01-11T23:31:34.1269273Z [----------] 13 tests from ModuleDictTest 2023-01-11T23:31:34.1269678Z [ RUN ] ModuleDictTest.ConstructsFromList 2023-01-11T23:31:34.1270102Z [ OK ] ModuleDictTest.ConstructsFromList (0 ms) 2023-01-11T23:31:34.1270474Z [ RUN ] ModuleDictTest.ConstructsFromordereddict 2023-01-11T23:31:34.1270869Z [ OK ] ModuleDictTest.ConstructsFromordereddict (0 ms) 2023-01-11T23:31:34.1271259Z [ RUN ] ModuleDictTest.UpdatePopClearContains 2023-01-11T23:31:34.1280693Z [ OK ] ModuleDictTest.UpdatePopClearContains (1 ms) 2023-01-11T23:31:34.1281052Z [ RUN ] ModuleDictTest.UpdateExist 2023-01-11T23:31:34.1281382Z [ OK ] ModuleDictTest.UpdateExist (0 ms) 2023-01-11T23:31:34.1281674Z [ RUN ] ModuleDictTest.Keys 2023-01-11T23:31:34.1293381Z [ OK ] ModuleDictTest.Keys (1 ms) 2023-01-11T23:31:34.1294063Z [ RUN ] ModuleDictTest.Values 2023-01-11T23:31:34.1294619Z [ OK ] ModuleDictTest.Values (0 ms) 2023-01-11T23:31:34.1295046Z [ RUN ] 
ModuleDictTest.SanityCheckForHoldingStandardModules 2023-01-11T23:31:34.1295810Z [ OK ] ModuleDictTest.SanityCheckForHoldingStandardModules (0 ms) 2023-01-11T23:31:34.1296332Z [ RUN ] ModuleDictTest.HasReferenceSemantics 2023-01-11T23:31:34.1296720Z [ OK ] ModuleDictTest.HasReferenceSemantics (0 ms) 2023-01-11T23:31:34.1297063Z [ RUN ] ModuleDictTest.IsCloneable 2023-01-11T23:31:34.1300229Z [ OK ] ModuleDictTest.IsCloneable (0 ms) 2023-01-11T23:31:34.1300671Z [ RUN ] ModuleDictTest.IsCloneable_CUDA 2023-01-11T23:31:34.1312614Z [ OK ] ModuleDictTest.IsCloneable_CUDA (1 ms) 2023-01-11T23:31:34.1313156Z [ RUN ] ModuleDictTest.RegistersElementsAsSubmodules 2023-01-11T23:31:34.1313933Z [ OK ] ModuleDictTest.RegistersElementsAsSubmodules (0 ms) 2023-01-11T23:31:34.1314461Z [ RUN ] ModuleDictTest.CloneToDevice_CUDA 2023-01-11T23:31:34.1314951Z [ OK ] ModuleDictTest.CloneToDevice_CUDA (0 ms) 2023-01-11T23:31:34.1315322Z [ RUN ] ModuleDictTest.PrettyPrintModuleDict 2023-01-11T23:31:34.1316980Z [ OK ] ModuleDictTest.PrettyPrintModuleDict (0 ms) 2023-01-11T23:31:34.1327999Z [----------] 13 tests from ModuleDictTest (5 ms total) 2023-01-11T23:31:34.1328492Z 2023-01-11T23:31:34.1328972Z [----------] 16 tests from ModuleListTest 2023-01-11T23:31:34.1329483Z [ RUN ] ModuleListTest.ConstructsFromSharedPointer 2023-01-11T23:31:34.1330040Z [ OK ] ModuleListTest.ConstructsFromSharedPointer (0 ms) 2023-01-11T23:31:34.1330536Z [ RUN ] ModuleListTest.ConstructsFromConcreteType 2023-01-11T23:31:34.1330935Z [ OK ] ModuleListTest.ConstructsFromConcreteType (0 ms) 2023-01-11T23:31:34.1331329Z [ RUN ] ModuleListTest.ConstructsFromModuleHolder 2023-01-11T23:31:34.1331735Z [ OK ] ModuleListTest.ConstructsFromModuleHolder (0 ms) 2023-01-11T23:31:34.1332108Z [ RUN ] ModuleListTest.PushBackAddsAnElement 2023-01-11T23:31:34.1332484Z [ OK ] ModuleListTest.PushBackAddsAnElement (0 ms) 2023-01-11T23:31:34.1332823Z [ RUN ] ModuleListTest.Insertion 2023-01-11T23:31:34.1333142Z [ OK ] ModuleListTest.Insertion (0 ms) 2023-01-11T23:31:34.1333453Z [ RUN ] ModuleListTest.AccessWithAt 2023-01-11T23:31:34.1341387Z [ OK ] ModuleListTest.AccessWithAt (2 ms) 2023-01-11T23:31:34.1341835Z [ RUN ] ModuleListTest.AccessWithPtr 2023-01-11T23:31:34.1364387Z [ OK ] ModuleListTest.AccessWithPtr (2 ms) 2023-01-11T23:31:34.1365022Z [ RUN ] ModuleListTest.SanityCheckForHoldingStandardModules 2023-01-11T23:31:34.1365768Z [ OK ] ModuleListTest.SanityCheckForHoldingStandardModules (0 ms) 2023-01-11T23:31:34.1366422Z [ RUN ] ModuleListTest.ExtendPushesModulesFromOtherModuleList 2023-01-11T23:31:34.1367002Z [ OK ] ModuleListTest.ExtendPushesModulesFromOtherModuleList (0 ms) 2023-01-11T23:31:34.1367418Z [ RUN ] ModuleListTest.HasReferenceSemantics 2023-01-11T23:31:34.1367798Z [ OK ] ModuleListTest.HasReferenceSemantics (0 ms) 2023-01-11T23:31:34.1368133Z [ RUN ] ModuleListTest.IsCloneable 2023-01-11T23:31:34.1370071Z [ OK ] ModuleListTest.IsCloneable (0 ms) 2023-01-11T23:31:34.1370669Z [ RUN ] ModuleListTest.RegistersElementsAsSubmodules 2023-01-11T23:31:34.1371181Z [ OK ] ModuleListTest.RegistersElementsAsSubmodules (0 ms) 2023-01-11T23:31:34.1371554Z [ RUN ] ModuleListTest.NestingIsPossible 2023-01-11T23:31:34.1371908Z [ OK ] ModuleListTest.NestingIsPossible (0 ms) 2023-01-11T23:31:34.1372255Z [ RUN ] ModuleListTest.CloneToDevice_CUDA 2023-01-11T23:31:34.1372904Z [ OK ] ModuleListTest.CloneToDevice_CUDA (0 ms) 2023-01-11T23:31:34.1373479Z [ RUN ] ModuleListTest.PrettyPrintModuleList 2023-01-11T23:31:34.1374924Z [ OK ] 
ModuleListTest.PrettyPrintModuleList (0 ms) 2023-01-11T23:31:34.1375429Z [ RUN ] ModuleListTest.RangeBasedForLoop 2023-01-11T23:31:34.1375795Z [ OK ] ModuleListTest.RangeBasedForLoop (0 ms) 2023-01-11T23:31:34.1376158Z [----------] 16 tests from ModuleListTest (5 ms total) 2023-01-11T23:31:34.1376326Z 2023-01-11T23:31:34.1376479Z [----------] 258 tests from ModulesTest 2023-01-11T23:31:34.1376762Z [ RUN ] ModulesTest.Conv1d 2023-01-11T23:31:34.1385348Z [ OK ] ModulesTest.Conv1d (0 ms) 2023-01-11T23:31:34.1385673Z [ RUN ] ModulesTest.Conv1dSameStrided 2023-01-11T23:31:34.1407527Z [ OK ] ModulesTest.Conv1dSameStrided (2 ms) 2023-01-11T23:31:34.1407849Z [ RUN ] ModulesTest.Conv2dEven 2023-01-11T23:31:34.1410625Z [ OK ] ModulesTest.Conv2dEven (0 ms) 2023-01-11T23:31:34.1410956Z [ RUN ] ModulesTest.Conv2dUneven 2023-01-11T23:31:34.1413250Z [ OK ] ModulesTest.Conv2dUneven (0 ms) 2023-01-11T23:31:34.1413607Z [ RUN ] ModulesTest.Conv2dSameStrided 2023-01-11T23:31:34.1459666Z [ OK ] ModulesTest.Conv2dSameStrided (4 ms) 2023-01-11T23:31:34.1459986Z [ RUN ] ModulesTest.Conv3d 2023-01-11T23:31:34.1463696Z [ OK ] ModulesTest.Conv3d (0 ms) 2023-01-11T23:31:34.1464007Z [ RUN ] ModulesTest.Conv3dSameStrided 2023-01-11T23:31:34.1508889Z [ OK ] ModulesTest.Conv3dSameStrided (4 ms) 2023-01-11T23:31:34.1509229Z [ RUN ] ModulesTest.ConvTranspose1d 2023-01-11T23:31:34.1512814Z [ OK ] ModulesTest.ConvTranspose1d (0 ms) 2023-01-11T23:31:34.1513197Z [ RUN ] ModulesTest.ConvTranspose2dEven 2023-01-11T23:31:34.1516215Z [ OK ] ModulesTest.ConvTranspose2dEven (0 ms) 2023-01-11T23:31:34.1516583Z [ RUN ] ModulesTest.ConvTranspose2dUneven 2023-01-11T23:31:34.1519924Z [ OK ] ModulesTest.ConvTranspose2dUneven (0 ms) 2023-01-11T23:31:34.1520274Z [ RUN ] ModulesTest.ConvTranspose3d 2023-01-11T23:31:34.1523038Z [ OK ] ModulesTest.ConvTranspose3d (0 ms) 2023-01-11T23:31:34.1523535Z [ RUN ] ModulesTest.MaxPool1d 2023-01-11T23:31:34.1524890Z [ OK ] ModulesTest.MaxPool1d (0 ms) 2023-01-11T23:31:34.1525231Z [ RUN ] ModulesTest.MaxPool1dReturnIndices 2023-01-11T23:31:34.1526058Z [ OK ] ModulesTest.MaxPool1dReturnIndices (0 ms) 2023-01-11T23:31:34.1526401Z [ RUN ] ModulesTest.MaxPool2dEven 2023-01-11T23:31:34.1527756Z [ OK ] ModulesTest.MaxPool2dEven (0 ms) 2023-01-11T23:31:34.1528087Z [ RUN ] ModulesTest.MaxPool2dUneven 2023-01-11T23:31:34.1528643Z [ OK ] ModulesTest.MaxPool2dUneven (0 ms) 2023-01-11T23:31:34.1529191Z [ RUN ] ModulesTest.MaxPool2dReturnIndices 2023-01-11T23:31:34.1530766Z [ OK ] ModulesTest.MaxPool2dReturnIndices (0 ms) 2023-01-11T23:31:34.1531154Z [ RUN ] ModulesTest.MaxPool3d 2023-01-11T23:31:34.1532508Z [ OK ] ModulesTest.MaxPool3d (0 ms) 2023-01-11T23:31:34.1532905Z [ RUN ] ModulesTest.MaxPool3dReturnIndices 2023-01-11T23:31:34.1534763Z [ OK ] ModulesTest.MaxPool3dReturnIndices (0 ms) 2023-01-11T23:31:34.1535140Z [ RUN ] ModulesTest.AvgPool1d 2023-01-11T23:31:34.1535924Z [ OK ] ModulesTest.AvgPool1d (0 ms) 2023-01-11T23:31:34.1536283Z [ RUN ] ModulesTest.AvgPool2dEven 2023-01-11T23:31:34.1537841Z [ OK ] ModulesTest.AvgPool2dEven (0 ms) 2023-01-11T23:31:34.1538197Z [ RUN ] ModulesTest.AvgPool2dUneven 2023-01-11T23:31:34.1539413Z [ OK ] ModulesTest.AvgPool2dUneven (0 ms) 2023-01-11T23:31:34.1539777Z [ RUN ] ModulesTest.AvgPool3d 2023-01-11T23:31:34.1540264Z [ OK ] ModulesTest.AvgPool3d (0 ms) 2023-01-11T23:31:34.1540627Z [ RUN ] ModulesTest.FractionalMaxPool2d 2023-01-11T23:31:34.1542407Z [ OK ] ModulesTest.FractionalMaxPool2d (0 ms) 2023-01-11T23:31:34.1542850Z [ RUN ] 
ModulesTest.FractionalMaxPool2dReturnIndices 2023-01-11T23:31:34.1543776Z [ OK ] ModulesTest.FractionalMaxPool2dReturnIndices (0 ms) 2023-01-11T23:31:34.1544204Z [ RUN ] ModulesTest.FractionalMaxPool3d 2023-01-11T23:31:34.1545780Z [ OK ] ModulesTest.FractionalMaxPool3d (0 ms) 2023-01-11T23:31:34.1546208Z [ RUN ] ModulesTest.FractionalMaxPool3dReturnIndices 2023-01-11T23:31:34.1547578Z [ OK ] ModulesTest.FractionalMaxPool3dReturnIndices (0 ms) 2023-01-11T23:31:34.1547978Z [ RUN ] ModulesTest.LPPool1d 2023-01-11T23:31:34.1548470Z [ OK ] ModulesTest.LPPool1d (0 ms) 2023-01-11T23:31:34.1548805Z [ RUN ] ModulesTest.LPPool2d 2023-01-11T23:31:34.1551688Z [ OK ] ModulesTest.LPPool2d (0 ms) 2023-01-11T23:31:34.1552366Z [ RUN ] ModulesTest.Identity 2023-01-11T23:31:34.1552782Z [ OK ] ModulesTest.Identity (0 ms) 2023-01-11T23:31:34.1553236Z [ RUN ] ModulesTest.Flatten 2023-01-11T23:31:34.1553961Z [ OK ] ModulesTest.Flatten (0 ms) 2023-01-11T23:31:34.1554374Z [ RUN ] ModulesTest.Unflatten 2023-01-11T23:31:34.1555005Z [W TensorImpl.h:1816] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator()) 2023-01-11T23:31:34.1555572Z [ OK ] ModulesTest.Unflatten (0 ms) 2023-01-11T23:31:34.1555896Z [ RUN ] ModulesTest.AdaptiveMaxPool1d 2023-01-11T23:31:34.1556562Z [ OK ] ModulesTest.AdaptiveMaxPool1d (0 ms) 2023-01-11T23:31:34.1557046Z [ RUN ] ModulesTest.AdaptiveMaxPool1dReturnIndices 2023-01-11T23:31:34.1558612Z [ OK ] ModulesTest.AdaptiveMaxPool1dReturnIndices (0 ms) 2023-01-11T23:31:34.1559119Z [ RUN ] ModulesTest.AdaptiveMaxPool2dEven 2023-01-11T23:31:34.1560571Z [ OK ] ModulesTest.AdaptiveMaxPool2dEven (0 ms) 2023-01-11T23:31:34.1561072Z [ RUN ] ModulesTest.AdaptiveMaxPool2dUneven 2023-01-11T23:31:34.1561754Z [ OK ] ModulesTest.AdaptiveMaxPool2dUneven (0 ms) 2023-01-11T23:31:34.1562251Z [ RUN ] ModulesTest.AdaptiveMaxPool2dReturnIndicesEven 2023-01-11T23:31:34.1564785Z [ OK ] ModulesTest.AdaptiveMaxPool2dReturnIndicesEven (0 ms) 2023-01-11T23:31:34.1565369Z [ RUN ] ModulesTest.AdaptiveMaxPool2dReturnIndicesUneven 2023-01-11T23:31:34.1567273Z [ OK ] ModulesTest.AdaptiveMaxPool2dReturnIndicesUneven (0 ms) 2023-01-11T23:31:34.1567883Z [ RUN ] ModulesTest.AdaptiveMaxPool3d 2023-01-11T23:31:34.1569219Z [ OK ] ModulesTest.AdaptiveMaxPool3d (0 ms) 2023-01-11T23:31:34.1569718Z [ RUN ] ModulesTest.AdaptiveMaxPool3dReturnIndices 2023-01-11T23:31:34.1572331Z [ OK ] ModulesTest.AdaptiveMaxPool3dReturnIndices (0 ms) 2023-01-11T23:31:34.1572826Z [ RUN ] ModulesTest.AdaptiveAvgPool1d 2023-01-11T23:31:34.1573770Z [ OK ] ModulesTest.AdaptiveAvgPool1d (0 ms) 2023-01-11T23:31:34.1574233Z [ RUN ] ModulesTest.AdaptiveAvgPool2dEven 2023-01-11T23:31:34.1576066Z [ OK ] ModulesTest.AdaptiveAvgPool2dEven (0 ms) 2023-01-11T23:31:34.1576568Z [ RUN ] ModulesTest.AdaptiveAvgPool2dUneven 2023-01-11T23:31:34.1577456Z [ OK ] ModulesTest.AdaptiveAvgPool2dUneven (0 ms) 2023-01-11T23:31:34.1577927Z [ RUN ] ModulesTest.AdaptiveAvgPool3d 2023-01-11T23:31:34.1579778Z [ OK ] ModulesTest.AdaptiveAvgPool3d (0 ms) 2023-01-11T23:31:34.1580235Z [ RUN ] ModulesTest.MaxUnpool1d 2023-01-11T23:31:34.1581993Z [ OK ] ModulesTest.MaxUnpool1d (0 ms) 2023-01-11T23:31:34.1582442Z [ RUN ] ModulesTest.MaxPool1d_MaxUnpool1d 2023-01-11T23:31:34.1585102Z [ OK ] ModulesTest.MaxPool1d_MaxUnpool1d (0 ms) 2023-01-11T23:31:34.1585548Z [ RUN ] ModulesTest.MaxUnpool2d 2023-01-11T23:31:34.1587519Z [ OK ] 
ModulesTest.MaxUnpool2d (0 ms) 2023-01-11T23:31:34.1587971Z [ RUN ] ModulesTest.MaxPool2d_MaxUnpool2d 2023-01-11T23:31:34.1590198Z [ OK ] ModulesTest.MaxPool2d_MaxUnpool2d (0 ms) 2023-01-11T23:31:34.1590637Z [ RUN ] ModulesTest.MaxUnpool3d 2023-01-11T23:31:34.1592602Z [ OK ] ModulesTest.MaxUnpool3d (0 ms) 2023-01-11T23:31:34.1593012Z [ RUN ] ModulesTest.MaxUnpool3dOutputSize 2023-01-11T23:31:34.1594404Z [ OK ] ModulesTest.MaxUnpool3dOutputSize (0 ms) 2023-01-11T23:31:34.1594856Z [ RUN ] ModulesTest.MaxPool3d_MaxUnpool3d 2023-01-11T23:31:34.2473561Z [ OK ] ModulesTest.MaxPool3d_MaxUnpool3d (87 ms) 2023-01-11T23:31:34.2473975Z [ RUN ] ModulesTest.Linear 2023-01-11T23:31:34.2475880Z [ OK ] ModulesTest.Linear (0 ms) 2023-01-11T23:31:34.2476521Z [ RUN ] ModulesTest.LocalResponseNorm 2023-01-11T23:31:34.2480563Z [ OK ] ModulesTest.LocalResponseNorm (0 ms) 2023-01-11T23:31:34.2480928Z [ RUN ] ModulesTest.LayerNorm 2023-01-11T23:31:34.2481889Z [ OK ] ModulesTest.LayerNorm (0 ms) 2023-01-11T23:31:34.2482200Z [ RUN ] ModulesTest.GroupNorm 2023-01-11T23:31:34.2484599Z [ OK ] ModulesTest.GroupNorm (0 ms) 2023-01-11T23:31:34.2484891Z [ RUN ] ModulesTest.Bilinear 2023-01-11T23:31:34.2488500Z [ OK ] ModulesTest.Bilinear (0 ms) 2023-01-11T23:31:34.2488795Z [ RUN ] ModulesTest.Fold 2023-01-11T23:31:34.2514806Z [ OK ] ModulesTest.Fold (2 ms) 2023-01-11T23:31:34.2515455Z [ RUN ] ModulesTest.Unfold 2023-01-11T23:31:34.2580153Z [ OK ] ModulesTest.Unfold (6 ms) 2023-01-11T23:31:34.2580577Z [ RUN ] ModulesTest.SimpleContainer 2023-01-11T23:31:34.2592381Z [ OK ] ModulesTest.SimpleContainer (1 ms) 2023-01-11T23:31:34.2592786Z [ RUN ] ModulesTest.EmbeddingBasic 2023-01-11T23:31:34.2593869Z [ OK ] ModulesTest.EmbeddingBasic (0 ms) 2023-01-11T23:31:34.2594299Z [ RUN ] ModulesTest.EmbeddingList 2023-01-11T23:31:34.2594713Z [ OK ] ModulesTest.EmbeddingList (0 ms) 2023-01-11T23:31:34.2595059Z [ RUN ] ModulesTest.EmbeddingFromPretrained 2023-01-11T23:31:34.2595420Z [ OK ] ModulesTest.EmbeddingFromPretrained (0 ms) 2023-01-11T23:31:34.2595791Z [ RUN ] ModulesTest.EmbeddingBagFromPretrained 2023-01-11T23:31:34.2596719Z [ OK ] ModulesTest.EmbeddingBagFromPretrained (0 ms) 2023-01-11T23:31:34.2597073Z [ RUN ] ModulesTest.AlphaDropout 2023-01-11T23:31:34.2597521Z [ OK ] ModulesTest.AlphaDropout (0 ms) 2023-01-11T23:31:34.2597862Z [ RUN ] ModulesTest.FeatureAlphaDropout 2023-01-11T23:31:34.2599085Z [ OK ] ModulesTest.FeatureAlphaDropout (0 ms) 2023-01-11T23:31:34.2599409Z [ RUN ] ModulesTest.Dropout 2023-01-11T23:31:34.2600963Z [ OK ] ModulesTest.Dropout (0 ms) 2023-01-11T23:31:34.2601265Z [ RUN ] ModulesTest.Dropout2d 2023-01-11T23:31:34.2605743Z [ OK ] ModulesTest.Dropout2d (0 ms) 2023-01-11T23:31:34.2606041Z [ RUN ] ModulesTest.Dropout3d 2023-01-11T23:31:34.2611468Z [ OK ] ModulesTest.Dropout3d (0 ms) 2023-01-11T23:31:34.2611796Z [ RUN ] ModulesTest.Parameters 2023-01-11T23:31:34.2612203Z [ OK ] ModulesTest.Parameters (0 ms) 2023-01-11T23:31:34.2612636Z [ RUN ] ModulesTest.FunctionalCallsSuppliedFunction 2023-01-11T23:31:34.2613068Z [ OK ] ModulesTest.FunctionalCallsSuppliedFunction (0 ms) 2023-01-11T23:31:34.2613525Z [ RUN ] ModulesTest.FunctionalWithTorchFunction 2023-01-11T23:31:34.2614004Z [ OK ] ModulesTest.FunctionalWithTorchFunction (0 ms) 2023-01-11T23:31:34.2614629Z [ RUN ] ModulesTest.FunctionalArgumentBinding 2023-01-11T23:31:34.2615014Z [ OK ] ModulesTest.FunctionalArgumentBinding (0 ms) 2023-01-11T23:31:34.2615378Z [ RUN ] ModulesTest.BatchNorm1dStateful 2023-01-11T23:31:34.2615715Z [ OK ] 
ModulesTest.BatchNorm1dStateful (0 ms) 2023-01-11T23:31:34.2616063Z [ RUN ] ModulesTest.BatchNorm1dStateless 2023-01-11T23:31:34.2616434Z [ OK ] ModulesTest.BatchNorm1dStateless (0 ms) 2023-01-11T23:31:34.2616863Z [ RUN ] ModulesTest.BatchNorm1d 2023-01-11T23:31:34.2617295Z [ OK ] ModulesTest.BatchNorm1d (0 ms) 2023-01-11T23:31:34.2617736Z [ RUN ] ModulesTest.BatchNorm2dStateful 2023-01-11T23:31:34.2618178Z [ OK ] ModulesTest.BatchNorm2dStateful (0 ms) 2023-01-11T23:31:34.2618517Z [ RUN ] ModulesTest.BatchNorm2dStateless 2023-01-11T23:31:34.2618956Z [ OK ] ModulesTest.BatchNorm2dStateless (0 ms) 2023-01-11T23:31:34.2619283Z [ RUN ] ModulesTest.BatchNorm2d 2023-01-11T23:31:34.2620991Z [ OK ] ModulesTest.BatchNorm2d (0 ms) 2023-01-11T23:31:34.2621447Z [ RUN ] ModulesTest.BatchNorm3dStateful 2023-01-11T23:31:34.2621914Z [ OK ] ModulesTest.BatchNorm3dStateful (0 ms) 2023-01-11T23:31:34.2622373Z [ RUN ] ModulesTest.BatchNorm3dStateless 2023-01-11T23:31:34.2622833Z [ OK ] ModulesTest.BatchNorm3dStateless (0 ms) 2023-01-11T23:31:34.2623192Z [ RUN ] ModulesTest.BatchNorm3d 2023-01-11T23:31:34.2626116Z [ OK ] ModulesTest.BatchNorm3d (0 ms) 2023-01-11T23:31:34.2627271Z [ RUN ] ModulesTest.InstanceNorm1dStateful 2023-01-11T23:31:34.2627865Z [ OK ] ModulesTest.InstanceNorm1dStateful (0 ms) 2023-01-11T23:31:34.2628396Z [ RUN ] ModulesTest.InstanceNorm1dStateless 2023-01-11T23:31:34.2628946Z [ OK ] ModulesTest.InstanceNorm1dStateless (0 ms) 2023-01-11T23:31:34.2629442Z [ RUN ] ModulesTest.InstanceNorm1d 2023-01-11T23:31:34.2629920Z [ OK ] ModulesTest.InstanceNorm1d (0 ms) 2023-01-11T23:31:34.2630500Z [ RUN ] ModulesTest.InstanceNorm2dStateful 2023-01-11T23:31:34.2631032Z [ OK ] ModulesTest.InstanceNorm2dStateful (0 ms) 2023-01-11T23:31:34.2631561Z [ RUN ] ModulesTest.InstanceNorm2dStateless 2023-01-11T23:31:34.2632096Z [ OK ] ModulesTest.InstanceNorm2dStateless (0 ms) 2023-01-11T23:31:34.2632915Z [ RUN ] ModulesTest.InstanceNorm2d 2023-01-11T23:31:34.2634028Z [ OK ] ModulesTest.InstanceNorm2d (0 ms) 2023-01-11T23:31:34.2634539Z [ RUN ] ModulesTest.InstanceNorm3dStateful 2023-01-11T23:31:34.2635084Z [ OK ] ModulesTest.InstanceNorm3dStateful (0 ms) 2023-01-11T23:31:34.2635612Z [ RUN ] ModulesTest.InstanceNorm3dStateless 2023-01-11T23:31:34.2636143Z [ OK ] ModulesTest.InstanceNorm3dStateless (0 ms) 2023-01-11T23:31:34.2636647Z [ RUN ] ModulesTest.InstanceNorm3d 2023-01-11T23:31:34.2640144Z [ OK ] ModulesTest.InstanceNorm3d (0 ms) 2023-01-11T23:31:34.2640613Z [ RUN ] ModulesTest.Linear_CUDA 2023-01-11T23:31:34.2647081Z [ OK ] ModulesTest.Linear_CUDA (0 ms) 2023-01-11T23:31:34.2647523Z [ RUN ] ModulesTest.Linear2_CUDA 2023-01-11T23:31:34.2649575Z [ OK ] ModulesTest.Linear2_CUDA (0 ms) 2023-01-11T23:31:34.2649893Z [ RUN ] ModulesTest.L1Loss 2023-01-11T23:31:34.2650894Z [ OK ] ModulesTest.L1Loss (0 ms) 2023-01-11T23:31:34.2651189Z [ RUN ] ModulesTest.MSELoss 2023-01-11T23:31:34.2652901Z [ OK ] ModulesTest.MSELoss (0 ms) 2023-01-11T23:31:34.2653297Z [ RUN ] ModulesTest.BCELoss 2023-01-11T23:31:34.2653595Z [ OK ] ModulesTest.BCELoss (0 ms) 2023-01-11T23:31:34.2653883Z [ RUN ] ModulesTest.KLDivLoss 2023-01-11T23:31:34.2654794Z [W loss.h:57] Warning: reduction: 'mean' divides the total loss by both the batch size and the support size.'batchmean' divides only by the batch size, and aligns with the KL div math definition.'mean' will be changed to behave the same as 'batchmean' in the next major release. 
(function kl_div) 2023-01-11T23:31:34.2655656Z [ OK ] ModulesTest.KLDivLoss (0 ms) 2023-01-11T23:31:34.2655991Z [ RUN ] ModulesTest.HingeEmbeddingLoss 2023-01-11T23:31:34.2658518Z [ OK ] ModulesTest.HingeEmbeddingLoss (0 ms) 2023-01-11T23:31:34.2658869Z [ RUN ] ModulesTest.MultiMarginLoss 2023-01-11T23:31:34.2660380Z [ OK ] ModulesTest.MultiMarginLoss (0 ms) 2023-01-11T23:31:34.2660732Z [ RUN ] ModulesTest.CosineEmbeddingLoss 2023-01-11T23:31:34.2664728Z [ OK ] ModulesTest.CosineEmbeddingLoss (0 ms) 2023-01-11T23:31:34.2665309Z [ RUN ] ModulesTest.SmoothL1LossDefaultOptions 2023-01-11T23:31:34.2666224Z [ OK ] ModulesTest.SmoothL1LossDefaultOptions (0 ms) 2023-01-11T23:31:34.2666619Z [ RUN ] ModulesTest.HuberLossDefaultOptions 2023-01-11T23:31:34.2667474Z [ OK ] ModulesTest.HuberLossDefaultOptions (0 ms) 2023-01-11T23:31:34.2667899Z [ RUN ] ModulesTest.MultiLabelMarginLossDefaultOptions 2023-01-11T23:31:34.2669343Z [ OK ] ModulesTest.MultiLabelMarginLossDefaultOptions (0 ms) 2023-01-11T23:31:34.2669771Z [ RUN ] ModulesTest.SmoothL1LossNoReduction 2023-01-11T23:31:34.2670507Z [ OK ] ModulesTest.SmoothL1LossNoReduction (0 ms) 2023-01-11T23:31:34.2671170Z [ RUN ] ModulesTest.HuberLossNoReduction 2023-01-11T23:31:34.2673440Z [ OK ] ModulesTest.HuberLossNoReduction (0 ms) 2023-01-11T23:31:34.2673909Z [ RUN ] ModulesTest.MultiLabelMarginLossNoReduction 2023-01-11T23:31:34.2674327Z [ OK ] ModulesTest.MultiLabelMarginLossNoReduction (0 ms) 2023-01-11T23:31:34.2674703Z [ RUN ] ModulesTest.SmoothL1LossBeta 2023-01-11T23:31:34.2675084Z [ OK ] ModulesTest.SmoothL1LossBeta (0 ms) 2023-01-11T23:31:34.2675407Z [ RUN ] ModulesTest.HuberLossDelta 2023-01-11T23:31:34.2676133Z [ OK ] ModulesTest.HuberLossDelta (0 ms) 2023-01-11T23:31:34.2676476Z [ RUN ] ModulesTest.TripletMarginLoss 2023-01-11T23:31:34.2679534Z [ OK ] ModulesTest.TripletMarginLoss (0 ms) 2023-01-11T23:31:34.2680056Z [ RUN ] ModulesTest.TripletMarginWithDistanceLossDefaultParity 2023-01-11T23:31:34.2819330Z [ OK ] ModulesTest.TripletMarginWithDistanceLossDefaultParity (13 ms) 2023-01-11T23:31:34.2819826Z [ RUN ] ModulesTest.TripletMarginWithDistanceLossFunctionalParity 2023-01-11T23:31:34.3061864Z [ OK ] ModulesTest.TripletMarginWithDistanceLossFunctionalParity (24 ms) 2023-01-11T23:31:34.3062271Z [ RUN ] ModulesTest.NLLLoss 2023-01-11T23:31:34.3064094Z [ OK ] ModulesTest.NLLLoss (0 ms) 2023-01-11T23:31:34.3064643Z [ RUN ] ModulesTest.CrossEntropyLoss 2023-01-11T23:31:34.3070961Z [ OK ] ModulesTest.CrossEntropyLoss (0 ms) 2023-01-11T23:31:34.3072234Z [ RUN ] ModulesTest.CosineSimilarity 2023-01-11T23:31:34.3074146Z [ OK ] ModulesTest.CosineSimilarity (0 ms) 2023-01-11T23:31:34.3074633Z [ RUN ] ModulesTest.SoftMarginLossDefaultOptions 2023-01-11T23:31:34.3075686Z [ OK ] ModulesTest.SoftMarginLossDefaultOptions (0 ms) 2023-01-11T23:31:34.3076126Z [ RUN ] ModulesTest.MultiLabelSoftMarginLossDefaultOptions 2023-01-11T23:31:34.3078759Z [ OK ] ModulesTest.MultiLabelSoftMarginLossDefaultOptions (0 ms) 2023-01-11T23:31:34.3079177Z [ RUN ] ModulesTest.SoftMarginLossNoReduction 2023-01-11T23:31:34.3080636Z [ OK ] ModulesTest.SoftMarginLossNoReduction (0 ms) 2023-01-11T23:31:34.3081081Z [ RUN ] ModulesTest.MultiLabelSoftMarginLossWeightedNoReduction 2023-01-11T23:31:34.3083661Z [ OK ] ModulesTest.MultiLabelSoftMarginLossWeightedNoReduction (0 ms) 2023-01-11T23:31:34.3084061Z [ RUN ] ModulesTest.PairwiseDistance 2023-01-11T23:31:34.3085499Z [ OK ] ModulesTest.PairwiseDistance (0 ms) 2023-01-11T23:31:34.3085987Z [ RUN ] ModulesTest.ELU 
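The KLDivLoss run above emits a warning from loss.h: reduction 'mean' divides the total loss by both the batch size and the support size, while 'batchmean' divides only by the batch size and matches the mathematical definition of KL divergence. A minimal C++-frontend sketch of opting into 'batchmean' explicitly; the shapes and values are illustrative, not taken from the test:

#include <torch/torch.h>
#include <iostream>

int main() {
  // kl_div expects log-probabilities as input and probabilities as target.
  auto input  = torch::log_softmax(torch::randn({4, 5}), /*dim=*/1);
  auto target = torch::softmax(torch::randn({4, 5}), /*dim=*/1);

  // 'batchmean' divides only by the batch size (4 here), matching the KL
  // divergence definition; the default 'mean' also divides by the support
  // size (5 here) and, per the warning, is slated to change.
  torch::nn::KLDivLoss loss(
      torch::nn::KLDivLossOptions().reduction(torch::kBatchMean));
  std::cout << loss(input, target) << std::endl;
}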
2023-01-11T23:31:34.3096974Z [ OK ] ModulesTest.ELU (1 ms) 2023-01-11T23:31:34.3097276Z [ RUN ] ModulesTest.SELU 2023-01-11T23:31:34.3099760Z [ OK ] ModulesTest.SELU (0 ms) 2023-01-11T23:31:34.3100060Z [ RUN ] ModulesTest.Hardshrink 2023-01-11T23:31:34.3107363Z [ OK ] ModulesTest.Hardshrink (0 ms) 2023-01-11T23:31:34.3107671Z [ RUN ] ModulesTest.Hardtanh 2023-01-11T23:31:34.3136114Z [ OK ] ModulesTest.Hardtanh (2 ms) 2023-01-11T23:31:34.3136455Z [ RUN ] ModulesTest.HardtanhMinValGEMaxVal 2023-01-11T23:31:34.3214200Z [ OK ] ModulesTest.HardtanhMinValGEMaxVal (7 ms) 2023-01-11T23:31:34.3214774Z [ RUN ] ModulesTest.LeakyReLU 2023-01-11T23:31:34.3227426Z [ OK ] ModulesTest.LeakyReLU (1 ms) 2023-01-11T23:31:34.3227734Z [ RUN ] ModulesTest.LogSigmoid 2023-01-11T23:31:34.3229458Z [ OK ] ModulesTest.LogSigmoid (0 ms) 2023-01-11T23:31:34.3229822Z [ RUN ] ModulesTest.Softmax 2023-01-11T23:31:34.3230679Z [ OK ] ModulesTest.Softmax (0 ms) 2023-01-11T23:31:34.3231161Z [ RUN ] ModulesTest.Softmin 2023-01-11T23:31:34.3233715Z [ OK ] ModulesTest.Softmin (0 ms) 2023-01-11T23:31:34.3234032Z [ RUN ] ModulesTest.LogSoftmax 2023-01-11T23:31:34.3234341Z [ OK ] ModulesTest.LogSoftmax (0 ms) 2023-01-11T23:31:34.3234694Z [ RUN ] ModulesTest.AdaptiveLogSoftmaxWithLoss 2023-01-11T23:31:34.3257190Z [ OK ] ModulesTest.AdaptiveLogSoftmaxWithLoss (2 ms) 2023-01-11T23:31:34.3257538Z [ RUN ] ModulesTest.Softmax2d 2023-01-11T23:31:34.3268099Z [ OK ] ModulesTest.Softmax2d (1 ms) 2023-01-11T23:31:34.3268631Z [ RUN ] ModulesTest.PReLU 2023-01-11T23:31:34.3273211Z [ OK ] ModulesTest.PReLU (0 ms) 2023-01-11T23:31:34.3273580Z [ RUN ] ModulesTest.ReLU 2023-01-11T23:31:34.3274808Z [ OK ] ModulesTest.ReLU (0 ms) 2023-01-11T23:31:34.3275211Z [ RUN ] ModulesTest.ReLU6 2023-01-11T23:31:34.3278040Z [ OK ] ModulesTest.ReLU6 (0 ms) 2023-01-11T23:31:34.3278330Z [ RUN ] ModulesTest.RReLU 2023-01-11T23:31:34.3320954Z [ OK ] ModulesTest.RReLU (4 ms) 2023-01-11T23:31:34.3321296Z [ RUN ] ModulesTest.CELU 2023-01-11T23:31:34.3330759Z [ OK ] ModulesTest.CELU (0 ms) 2023-01-11T23:31:34.3331041Z [ RUN ] ModulesTest.GLU 2023-01-11T23:31:34.3332923Z [ OK ] ModulesTest.GLU (0 ms) 2023-01-11T23:31:34.3333292Z [ RUN ] ModulesTest.GELU 2023-01-11T23:31:34.3335047Z [ OK ] ModulesTest.GELU (0 ms) 2023-01-11T23:31:34.3335347Z [ RUN ] ModulesTest.TanhGELU 2023-01-11T23:31:34.3335859Z [ OK ] ModulesTest.TanhGELU (0 ms) 2023-01-11T23:31:34.3336187Z [ RUN ] ModulesTest.Mish 2023-01-11T23:31:34.3337337Z [ OK ] ModulesTest.Mish (0 ms) 2023-01-11T23:31:34.3337627Z [ RUN ] ModulesTest.Sigmoid 2023-01-11T23:31:34.3337924Z [ OK ] ModulesTest.Sigmoid (0 ms) 2023-01-11T23:31:34.3338239Z [ RUN ] ModulesTest.PixelShuffle 2023-01-11T23:31:34.3340385Z [ OK ] ModulesTest.PixelShuffle (0 ms) 2023-01-11T23:31:34.3340712Z [ RUN ] ModulesTest.PixelUnshuffle 2023-01-11T23:31:34.3341398Z [ OK ] ModulesTest.PixelUnshuffle (0 ms) 2023-01-11T23:31:34.3341712Z [ RUN ] ModulesTest.Softplus 2023-01-11T23:31:34.3348616Z [ OK ] ModulesTest.Softplus (0 ms) 2023-01-11T23:31:34.3348922Z [ RUN ] ModulesTest.Softshrink 2023-01-11T23:31:34.3355355Z [ OK ] ModulesTest.Softshrink (0 ms) 2023-01-11T23:31:34.3355717Z [ RUN ] ModulesTest.Softsign 2023-01-11T23:31:34.3356073Z [ OK ] ModulesTest.Softsign (0 ms) 2023-01-11T23:31:34.3356360Z [ RUN ] ModulesTest.Tanh 2023-01-11T23:31:34.3356653Z [ OK ] ModulesTest.Tanh (0 ms) 2023-01-11T23:31:34.3357012Z [ RUN ] ModulesTest.Tanhshrink 2023-01-11T23:31:34.3357325Z [ OK ] ModulesTest.Tanhshrink (0 ms) 2023-01-11T23:31:34.3357625Z [ RUN ] 
ModulesTest.Threshold 2023-01-11T23:31:34.3370644Z [ OK ] ModulesTest.Threshold (1 ms) 2023-01-11T23:31:34.3371075Z [ RUN ] ModulesTest.Upsampling1D 2023-01-11T23:31:34.3372029Z [W upsampling.h:66] Warning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and uses scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. (function _interp_output_size) 2023-01-11T23:31:34.3380983Z [ OK ] ModulesTest.Upsampling1D (1 ms) 2023-01-11T23:31:34.3381333Z [ RUN ] ModulesTest.Upsampling2D 2023-01-11T23:31:34.3382084Z [W upsampling.h:66] Warning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and uses scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. (function _interp_output_size) 2023-01-11T23:31:34.3396282Z [ OK ] ModulesTest.Upsampling2D (1 ms) 2023-01-11T23:31:34.3396719Z [ RUN ] ModulesTest.Upsampling3D 2023-01-11T23:31:34.3403276Z [W upsampling.h:66] Warning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and uses scale_factor directly, instead of relying on the computed output size. 
If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. (function _interp_output_size) 2023-01-11T23:31:34.3405760Z [ OK ] ModulesTest.Upsampling3D (0 ms) 2023-01-11T23:31:34.3406080Z [ RUN ] ModulesTest.CTCLoss 2023-01-11T23:31:34.3408897Z [ OK ] ModulesTest.CTCLoss (0 ms) 2023-01-11T23:31:34.3409309Z [ RUN ] ModulesTest.PoissonNLLLoss 2023-01-11T23:31:34.3410665Z [ OK ] ModulesTest.PoissonNLLLoss (0 ms) 2023-01-11T23:31:34.3411089Z [ RUN ] ModulesTest.MarginRankingLoss 2023-01-11T23:31:34.3414203Z [ OK ] ModulesTest.MarginRankingLoss (0 ms) 2023-01-11T23:31:34.3414780Z [ RUN ] ModulesTest.BCEWithLogitsLoss 2023-01-11T23:31:34.3472502Z [ OK ] ModulesTest.BCEWithLogitsLoss (5 ms) 2023-01-11T23:31:34.3472936Z [ RUN ] ModulesTest.MultiheadAttention 2023-01-11T23:31:43.6439465Z [ OK ] ModulesTest.MultiheadAttention (9296 ms) 2023-01-11T23:31:43.6440173Z [ RUN ] ModulesTest.PrettyPrintIdentity 2023-01-11T23:31:43.6440632Z [ OK ] ModulesTest.PrettyPrintIdentity (0 ms) 2023-01-11T23:31:43.6441043Z [ RUN ] ModulesTest.PrettyPrintFlatten 2023-01-11T23:31:43.6441508Z [ OK ] ModulesTest.PrettyPrintFlatten (0 ms) 2023-01-11T23:31:43.6441934Z [ RUN ] ModulesTest.PrettyPrintUnflatten 2023-01-11T23:31:43.6442296Z [ OK ] ModulesTest.PrettyPrintUnflatten (0 ms) 2023-01-11T23:31:43.6442632Z [ RUN ] ModulesTest.ReflectionPad1d 2023-01-11T23:31:43.6442970Z [ OK ] ModulesTest.ReflectionPad1d (0 ms) 2023-01-11T23:31:43.6443400Z [ RUN ] ModulesTest.ReflectionPad2d 2023-01-11T23:31:43.6443732Z [ OK ] ModulesTest.ReflectionPad2d (0 ms) 2023-01-11T23:31:43.6444050Z [ RUN ] ModulesTest.ReflectionPad3d 2023-01-11T23:31:43.6446478Z [ OK ] ModulesTest.ReflectionPad3d (0 ms) 2023-01-11T23:31:43.6446940Z [ RUN ] ModulesTest.ReplicationPad1d 2023-01-11T23:31:43.6447609Z [ OK ] ModulesTest.ReplicationPad1d (0 ms) 2023-01-11T23:31:43.6447934Z [ RUN ] ModulesTest.ReplicationPad2d 2023-01-11T23:31:43.6453419Z [ OK ] ModulesTest.ReplicationPad2d (0 ms) 2023-01-11T23:31:43.6453779Z [ RUN ] ModulesTest.ReplicationPad3d 2023-01-11T23:31:43.6456979Z [ OK ] ModulesTest.ReplicationPad3d (0 ms) 2023-01-11T23:31:43.6457305Z [ RUN ] ModulesTest.ZeroPad2d 2023-01-11T23:31:43.6460857Z [ OK ] ModulesTest.ZeroPad2d (0 ms) 2023-01-11T23:31:43.6461329Z [ RUN ] ModulesTest.ConstantPad1d 2023-01-11T23:31:43.6462453Z [ OK ] ModulesTest.ConstantPad1d (0 ms) 2023-01-11T23:31:43.6462787Z [ RUN ] ModulesTest.ConstantPad2d 2023-01-11T23:31:43.6465743Z [ OK ] ModulesTest.ConstantPad2d (0 ms) 2023-01-11T23:31:43.6466120Z [ RUN ] ModulesTest.ConstantPad3d 2023-01-11T23:31:43.6472221Z [ OK ] ModulesTest.ConstantPad3d (0 ms) 2023-01-11T23:31:43.6472567Z [ RUN ] ModulesTest.CrossMapLRN2d 2023-01-11T23:31:43.6478728Z [ OK ] ModulesTest.CrossMapLRN2d (0 ms) 2023-01-11T23:31:43.6479031Z [ RUN ] ModulesTest.RNNCell 2023-01-11T23:31:43.6481943Z [ OK ] ModulesTest.RNNCell (0 ms) 2023-01-11T23:31:43.6482249Z [ RUN ] ModulesTest.LSTMCell 2023-01-11T23:31:43.6486504Z [ OK ] ModulesTest.LSTMCell (0 ms) 2023-01-11T23:31:43.6486797Z [ RUN ] ModulesTest.GRUCell 2023-01-11T23:31:43.6490136Z [ OK ] ModulesTest.GRUCell (0 ms) 2023-01-11T23:31:43.6490583Z [ RUN ] ModulesTest.PrettyPrintLinear 2023-01-11T23:31:43.6490947Z [ OK ] ModulesTest.PrettyPrintLinear (0 ms) 2023-01-11T23:31:43.6491283Z [ RUN ] ModulesTest.PrettyPrintBilinear 2023-01-11T23:31:43.6491640Z [ OK ] ModulesTest.PrettyPrintBilinear (0 ms) 2023-01-11T23:31:43.6492076Z [ RUN ] ModulesTest.PrettyPrintConv 2023-01-11T23:31:43.6492795Z [ 
OK ] ModulesTest.PrettyPrintConv (0 ms) 2023-01-11T23:31:43.6493914Z [ RUN ] ModulesTest.PrettyPrintConvTranspose 2023-01-11T23:31:43.6494686Z [ OK ] ModulesTest.PrettyPrintConvTranspose (0 ms) 2023-01-11T23:31:43.6495157Z [ RUN ] ModulesTest.PrettyPrintUpsample 2023-01-11T23:31:43.6495581Z [ OK ] ModulesTest.PrettyPrintUpsample (0 ms) 2023-01-11T23:31:43.6496064Z [ RUN ] ModulesTest.PrettyPrintFold 2023-01-11T23:31:43.6496536Z [ OK ] ModulesTest.PrettyPrintFold (0 ms) 2023-01-11T23:31:43.6496979Z [ RUN ] ModulesTest.PrettyPrintUnfold 2023-01-11T23:31:43.6497417Z [ OK ] ModulesTest.PrettyPrintUnfold (0 ms) 2023-01-11T23:31:43.6497887Z [ RUN ] ModulesTest.PrettyPrintMaxPool 2023-01-11T23:31:43.6498673Z [ OK ] ModulesTest.PrettyPrintMaxPool (0 ms) 2023-01-11T23:31:43.6499064Z [ RUN ] ModulesTest.PrettyPrintAvgPool 2023-01-11T23:31:43.6499556Z [ OK ] ModulesTest.PrettyPrintAvgPool (0 ms) 2023-01-11T23:31:43.6500059Z [ RUN ] ModulesTest.PrettyPrinFractionalMaxPool 2023-01-11T23:31:43.6500600Z [ OK ] ModulesTest.PrettyPrinFractionalMaxPool (0 ms) 2023-01-11T23:31:43.6501145Z [ RUN ] ModulesTest.PrettyPrintLPPool 2023-01-11T23:31:43.6501611Z [ OK ] ModulesTest.PrettyPrintLPPool (0 ms) 2023-01-11T23:31:43.6502108Z [ RUN ] ModulesTest.PrettyPrintAdaptiveMaxPool 2023-01-11T23:31:43.6502635Z [ OK ] ModulesTest.PrettyPrintAdaptiveMaxPool (0 ms) 2023-01-11T23:31:43.6503151Z [ RUN ] ModulesTest.PrettyPrintAdaptiveAvgPool 2023-01-11T23:31:43.6503671Z [ OK ] ModulesTest.PrettyPrintAdaptiveAvgPool (0 ms) 2023-01-11T23:31:43.6504150Z [ RUN ] ModulesTest.PrettyPrintMaxUnpool 2023-01-11T23:31:43.6504597Z [ OK ] ModulesTest.PrettyPrintMaxUnpool (0 ms) 2023-01-11T23:31:43.6505011Z [ RUN ] ModulesTest.PrettyPrintDropout 2023-01-11T23:31:43.6505481Z [ OK ] ModulesTest.PrettyPrintDropout (0 ms) 2023-01-11T23:31:43.6505910Z [ RUN ] ModulesTest.PrettyPrintDropout2d 2023-01-11T23:31:43.6506337Z [ OK ] ModulesTest.PrettyPrintDropout2d (0 ms) 2023-01-11T23:31:43.6506826Z [ RUN ] ModulesTest.PrettyPrintDropout3d 2023-01-11T23:31:43.6507270Z [ OK ] ModulesTest.PrettyPrintDropout3d (0 ms) 2023-01-11T23:31:43.6507806Z [ RUN ] ModulesTest.PrettyPrintFunctional 2023-01-11T23:31:43.6508282Z [ OK ] ModulesTest.PrettyPrintFunctional (0 ms) 2023-01-11T23:31:43.6508747Z [ RUN ] ModulesTest.PrettyPrintBatchNorm1d 2023-01-11T23:31:43.6509217Z [ OK ] ModulesTest.PrettyPrintBatchNorm1d (0 ms) 2023-01-11T23:31:43.6509615Z [ RUN ] ModulesTest.PrettyPrintBatchNorm2d 2023-01-11T23:31:43.6510088Z [ OK ] ModulesTest.PrettyPrintBatchNorm2d (0 ms) 2023-01-11T23:31:43.6510499Z [ RUN ] ModulesTest.PrettyPrintBatchNorm3d 2023-01-11T23:31:43.6510982Z [ OK ] ModulesTest.PrettyPrintBatchNorm3d (0 ms) 2023-01-11T23:31:43.6511411Z [ RUN ] ModulesTest.PrettyPrintInstanceNorm1d 2023-01-11T23:31:43.6511790Z [ OK ] ModulesTest.PrettyPrintInstanceNorm1d (0 ms) 2023-01-11T23:31:43.6512252Z [ RUN ] ModulesTest.PrettyPrintInstanceNorm2d 2023-01-11T23:31:43.6512753Z [ OK ] ModulesTest.PrettyPrintInstanceNorm2d (0 ms) 2023-01-11T23:31:43.6513180Z [ RUN ] ModulesTest.PrettyPrintInstanceNorm3d 2023-01-11T23:31:43.6513672Z [ OK ] ModulesTest.PrettyPrintInstanceNorm3d (0 ms) 2023-01-11T23:31:43.6514163Z [ RUN ] ModulesTest.PrettyPrintLayerNorm 2023-01-11T23:31:43.6514636Z [ OK ] ModulesTest.PrettyPrintLayerNorm (0 ms) 2023-01-11T23:31:43.6515080Z [ RUN ] ModulesTest.PrettyPrintGroupNorm 2023-01-11T23:31:43.6515522Z [ OK ] ModulesTest.PrettyPrintGroupNorm (0 ms) 2023-01-11T23:31:43.6515914Z [ RUN ] ModulesTest.PrettyPrintLocalResponseNorm 
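The upsampling warnings a few entries above say that, since 1.6.0, interpolate/upsample uses a float scale_factor directly instead of recomputing it from the output size, and that recompute_scale_factor=True restores the old behavior. A short sketch of silencing the warning through the C++ functional API, assuming InterpolateFuncOptions mirrors the Python argument named in the warning:

#include <torch/torch.h>
#include <iostream>
namespace F = torch::nn::functional;

int main() {
  auto x = torch::arange(8, torch::kFloat).reshape({1, 1, 8});
  // Passing recompute_scale_factor explicitly opts into the pre-1.6
  // behavior and avoids the _interp_output_size warning seen in this log.
  auto y = F::interpolate(
      x,
      F::InterpolateFuncOptions()
          .scale_factor(std::vector<double>{2.0})
          .mode(torch::kNearest)
          .recompute_scale_factor(true));
  std::cout << y.sizes() << std::endl;  // [1, 1, 16]
}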
2023-01-11T23:31:43.6516315Z [ OK ] ModulesTest.PrettyPrintLocalResponseNorm (0 ms) 2023-01-11T23:31:43.6516758Z [ RUN ] ModulesTest.PrettyPrintEmbedding 2023-01-11T23:31:43.6517133Z [ OK ] ModulesTest.PrettyPrintEmbedding (0 ms) 2023-01-11T23:31:43.6517509Z [ RUN ] ModulesTest.PrettyPrintEmbeddingBag 2023-01-11T23:31:43.6518008Z [ OK ] ModulesTest.PrettyPrintEmbeddingBag (0 ms) 2023-01-11T23:31:43.6518492Z [ RUN ] ModulesTest.PrettyPrintL1Loss 2023-01-11T23:31:43.6518840Z [ OK ] ModulesTest.PrettyPrintL1Loss (0 ms) 2023-01-11T23:31:43.6519176Z [ RUN ] ModulesTest.PrettyPrintKLDivLoss 2023-01-11T23:31:43.6519702Z [ OK ] ModulesTest.PrettyPrintKLDivLoss (0 ms) 2023-01-11T23:31:43.6520200Z [ RUN ] ModulesTest.PrettyPrintMSELoss 2023-01-11T23:31:43.6520651Z [ OK ] ModulesTest.PrettyPrintMSELoss (0 ms) 2023-01-11T23:31:43.6521074Z [ RUN ] ModulesTest.PrettyPrintBCELoss 2023-01-11T23:31:43.6521459Z [ OK ] ModulesTest.PrettyPrintBCELoss (0 ms) 2023-01-11T23:31:43.6521830Z [ RUN ] ModulesTest.PrettyPrintHingeEmbeddingLoss 2023-01-11T23:31:43.6522224Z [ OK ] ModulesTest.PrettyPrintHingeEmbeddingLoss (0 ms) 2023-01-11T23:31:43.6522630Z [ RUN ] ModulesTest.PrettyPrintCosineEmbeddingLoss 2023-01-11T23:31:43.6523039Z [ OK ] ModulesTest.PrettyPrintCosineEmbeddingLoss (0 ms) 2023-01-11T23:31:43.6523437Z [ RUN ] ModulesTest.PrettyPrintTripletMarginLoss 2023-01-11T23:31:43.6523826Z [ OK ] ModulesTest.PrettyPrintTripletMarginLoss (0 ms) 2023-01-11T23:31:43.6524257Z [ RUN ] ModulesTest.PrettyPrintTripletMarginWithDistanceLoss 2023-01-11T23:31:43.6524724Z [ OK ] ModulesTest.PrettyPrintTripletMarginWithDistanceLoss (0 ms) 2023-01-11T23:31:43.6525110Z [ RUN ] ModulesTest.PrettyPrintNLLLoss 2023-01-11T23:31:43.6525453Z [ OK ] ModulesTest.PrettyPrintNLLLoss (0 ms) 2023-01-11T23:31:43.6525816Z [ RUN ] ModulesTest.PrettyPrinCrossEntropyLoss 2023-01-11T23:31:43.6526251Z [ OK ] ModulesTest.PrettyPrinCrossEntropyLoss (0 ms) 2023-01-11T23:31:43.6526638Z [ RUN ] ModulesTest.PrettyPrintMultiLabelMarginLoss 2023-01-11T23:31:43.6527108Z [ OK ] ModulesTest.PrettyPrintMultiLabelMarginLoss (0 ms) 2023-01-11T23:31:43.6527537Z [ RUN ] ModulesTest.PrettyPrintMultiLabelSoftMarginLoss 2023-01-11T23:31:43.6527971Z [ OK ] ModulesTest.PrettyPrintMultiLabelSoftMarginLoss (0 ms) 2023-01-11T23:31:43.6528378Z [ RUN ] ModulesTest.PrettyPrintSoftMarginLoss 2023-01-11T23:31:43.6528761Z [ OK ] ModulesTest.PrettyPrintSoftMarginLoss (0 ms) 2023-01-11T23:31:43.6529137Z [ RUN ] ModulesTest.PrettyPrintCosineSimilarity 2023-01-11T23:31:43.6529531Z [ OK ] ModulesTest.PrettyPrintCosineSimilarity (0 ms) 2023-01-11T23:31:43.6529921Z [ RUN ] ModulesTest.PrettyPrintPairwiseDistance 2023-01-11T23:31:43.6530312Z [ OK ] ModulesTest.PrettyPrintPairwiseDistance (0 ms) 2023-01-11T23:31:43.6530691Z [ RUN ] ModulesTest.PrettyPrintReflectionPad 2023-01-11T23:31:43.6531077Z [ OK ] ModulesTest.PrettyPrintReflectionPad (0 ms) 2023-01-11T23:31:43.6531457Z [ RUN ] ModulesTest.PrettyPrintReplicationPad 2023-01-11T23:31:43.6531833Z [ OK ] ModulesTest.PrettyPrintReplicationPad (0 ms) 2023-01-11T23:31:43.6532202Z [ RUN ] ModulesTest.PrettyPrintZeroPad2d 2023-01-11T23:31:43.6532566Z [ OK ] ModulesTest.PrettyPrintZeroPad2d (0 ms) 2023-01-11T23:31:43.6532919Z [ RUN ] ModulesTest.PrettyPrintConstantPad 2023-01-11T23:31:43.6533293Z [ OK ] ModulesTest.PrettyPrintConstantPad (0 ms) 2023-01-11T23:31:43.6533658Z [ RUN ] ModulesTest.PrettyPrintNestedModel 2023-01-11T23:31:43.6534021Z [ OK ] ModulesTest.PrettyPrintNestedModel (0 ms) 2023-01-11T23:31:43.6534356Z [ RUN ] 
ModulesTest.PrettyPrintELU 2023-01-11T23:31:43.6534928Z [ OK ] ModulesTest.PrettyPrintELU (0 ms) 2023-01-11T23:31:43.6535307Z [ RUN ] ModulesTest.PrettyPrintSELU 2023-01-11T23:31:43.6535638Z [ OK ] ModulesTest.PrettyPrintSELU (0 ms) 2023-01-11T23:31:43.6535956Z [ RUN ] ModulesTest.PrettyPrintGLU 2023-01-11T23:31:43.6536281Z [ OK ] ModulesTest.PrettyPrintGLU (0 ms) 2023-01-11T23:31:43.6536610Z [ RUN ] ModulesTest.PrettyPrintHardshrink 2023-01-11T23:31:43.6537030Z [ OK ] ModulesTest.PrettyPrintHardshrink (0 ms) 2023-01-11T23:31:43.6537387Z [ RUN ] ModulesTest.PrettyPrintHardtanh 2023-01-11T23:31:43.6537744Z [ OK ] ModulesTest.PrettyPrintHardtanh (0 ms) 2023-01-11T23:31:43.6538087Z [ RUN ] ModulesTest.PrettyPrintLeakyReLU 2023-01-11T23:31:43.6538447Z [ OK ] ModulesTest.PrettyPrintLeakyReLU (0 ms) 2023-01-11T23:31:43.6538805Z [ RUN ] ModulesTest.PrettyPrintLogSigmoid 2023-01-11T23:31:43.6539158Z [ OK ] ModulesTest.PrettyPrintLogSigmoid (0 ms) 2023-01-11T23:31:43.6539507Z [ RUN ] ModulesTest.PrettyPrintSoftmax 2023-01-11T23:31:43.6539857Z [ OK ] ModulesTest.PrettyPrintSoftmax (0 ms) 2023-01-11T23:31:43.6540188Z [ RUN ] ModulesTest.PrettyPrintSoftmin 2023-01-11T23:31:43.6540535Z [ OK ] ModulesTest.PrettyPrintSoftmin (0 ms) 2023-01-11T23:31:43.6540884Z [ RUN ] ModulesTest.PrettyPrintLogSoftmax 2023-01-11T23:31:43.6541247Z [ OK ] ModulesTest.PrettyPrintLogSoftmax (0 ms) 2023-01-11T23:31:43.6541592Z [ RUN ] ModulesTest.PrettyPrintSoftmax2d 2023-01-11T23:31:43.6541950Z [ OK ] ModulesTest.PrettyPrintSoftmax2d (0 ms) 2023-01-11T23:31:43.6542288Z [ RUN ] ModulesTest.PrettyPrintPReLU 2023-01-11T23:31:43.6542622Z [ OK ] ModulesTest.PrettyPrintPReLU (0 ms) 2023-01-11T23:31:43.6542948Z [ RUN ] ModulesTest.PrettyPrintReLU 2023-01-11T23:31:43.6543280Z [ OK ] ModulesTest.PrettyPrintReLU (0 ms) 2023-01-11T23:31:43.6543643Z [ RUN ] ModulesTest.PrettyPrintReLU6 2023-01-11T23:31:43.6543986Z [ OK ] ModulesTest.PrettyPrintReLU6 (0 ms) 2023-01-11T23:31:43.6544316Z [ RUN ] ModulesTest.PrettyPrintRReLU 2023-01-11T23:31:43.6544716Z [ OK ] ModulesTest.PrettyPrintRReLU (0 ms) 2023-01-11T23:31:43.6545072Z [ RUN ] ModulesTest.PrettyPrintCELU 2023-01-11T23:31:43.6545441Z [ OK ] ModulesTest.PrettyPrintCELU (0 ms) 2023-01-11T23:31:43.6545808Z [ RUN ] ModulesTest.PrettyPrintSigmoid 2023-01-11T23:31:43.6546194Z [ OK ] ModulesTest.PrettyPrintSigmoid (0 ms) 2023-01-11T23:31:43.6546582Z [ RUN ] ModulesTest.PrettyPrintPixelShuffle 2023-01-11T23:31:43.6546995Z [ OK ] ModulesTest.PrettyPrintPixelShuffle (0 ms) 2023-01-11T23:31:43.6547441Z [ RUN ] ModulesTest.PrettyPrintPixelUnshuffle 2023-01-11T23:31:43.6547925Z [ OK ] ModulesTest.PrettyPrintPixelUnshuffle (0 ms) 2023-01-11T23:31:43.6548372Z [ RUN ] ModulesTest.PrettyPrintSoftplus 2023-01-11T23:31:43.6548855Z [ OK ] ModulesTest.PrettyPrintSoftplus (0 ms) 2023-01-11T23:31:43.6549299Z [ RUN ] ModulesTest.PrettyPrintSoftshrink 2023-01-11T23:31:43.6549679Z [ OK ] ModulesTest.PrettyPrintSoftshrink (0 ms) 2023-01-11T23:31:43.6550119Z [ RUN ] ModulesTest.PrettyPrintSoftsign 2023-01-11T23:31:43.6550471Z [ OK ] ModulesTest.PrettyPrintSoftsign (0 ms) 2023-01-11T23:31:43.6550796Z [ RUN ] ModulesTest.PrettyPrintTanh 2023-01-11T23:31:43.6551128Z [ OK ] ModulesTest.PrettyPrintTanh (0 ms) 2023-01-11T23:31:43.6551467Z [ RUN ] ModulesTest.PrettyPrintTanhshrink 2023-01-11T23:31:43.6551884Z [ OK ] ModulesTest.PrettyPrintTanhshrink (0 ms) 2023-01-11T23:31:43.6552344Z [ RUN ] ModulesTest.PrettyPrintThreshold 2023-01-11T23:31:43.6552746Z [ OK ] ModulesTest.PrettyPrintThreshold (0 ms) 
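The long run of PrettyPrint* tests here checks each module's printed representation; in the C++ frontend a module holder can be streamed to an ostream directly. A minimal sketch, with the exact output format shown only as an example:

#include <torch/torch.h>
#include <iostream>

int main() {
  torch::nn::Linear linear(3, 4);
  torch::nn::Sequential net(torch::nn::Linear(3, 4), torch::nn::ReLU());
  // Prints a Python-style summary, e.g.
  //   torch::nn::Linear(in_features=3, out_features=4, bias=true)
  std::cout << linear << "\n" << net << std::endl;
}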
2023-01-11T23:31:43.6553091Z [ RUN ] ModulesTest.PrettyPrintCTCLoss 2023-01-11T23:31:43.6553435Z [ OK ] ModulesTest.PrettyPrintCTCLoss (0 ms) 2023-01-11T23:31:43.6553791Z [ RUN ] ModulesTest.PrettyPrintPoissonNLLLoss 2023-01-11T23:31:43.6554165Z [ OK ] ModulesTest.PrettyPrintPoissonNLLLoss (0 ms) 2023-01-11T23:31:43.6554617Z [ RUN ] ModulesTest.PrettyPrintMarginRankingLoss 2023-01-11T23:31:43.6555016Z [ OK ] ModulesTest.PrettyPrintMarginRankingLoss (0 ms) 2023-01-11T23:31:43.6555394Z [ RUN ] ModulesTest.PrettyPrintCrossMapLRN2d 2023-01-11T23:31:43.6555764Z [ OK ] ModulesTest.PrettyPrintCrossMapLRN2d (0 ms) 2023-01-11T23:31:43.6556178Z [ RUN ] ModulesTest.PrettyPrintAlphaDropout 2023-01-11T23:31:43.6556542Z [ OK ] ModulesTest.PrettyPrintAlphaDropout (0 ms) 2023-01-11T23:31:43.6556916Z [ RUN ] ModulesTest.PrettyPrintFeatureAlphaDropout 2023-01-11T23:31:43.6557328Z [ OK ] ModulesTest.PrettyPrintFeatureAlphaDropout (0 ms) 2023-01-11T23:31:43.6557723Z [ RUN ] ModulesTest.PrettyPrintBCEWithLogitsLoss 2023-01-11T23:31:43.6558115Z [ OK ] ModulesTest.PrettyPrintBCEWithLogitsLoss (0 ms) 2023-01-11T23:31:43.6558502Z [ RUN ] ModulesTest.PrettyPrintMultiheadAttention 2023-01-11T23:31:43.6558901Z [ OK ] ModulesTest.PrettyPrintMultiheadAttention (0 ms) 2023-01-11T23:31:43.6559263Z [ RUN ] ModulesTest.PrettyPrintRNNCell 2023-01-11T23:31:43.6559595Z [ OK ] ModulesTest.PrettyPrintRNNCell (0 ms) 2023-01-11T23:31:43.6559931Z [ RUN ] ModulesTest.PrettyPrintLSTMCell 2023-01-11T23:31:43.6560277Z [ OK ] ModulesTest.PrettyPrintLSTMCell (0 ms) 2023-01-11T23:31:43.6560609Z [ RUN ] ModulesTest.PrettyPrintGRUCell 2023-01-11T23:31:43.6560946Z [ OK ] ModulesTest.PrettyPrintGRUCell (0 ms) 2023-01-11T23:31:43.6561370Z [ RUN ] ModulesTest.PrettyPrintAdaptiveLogSoftmaxWithLoss 2023-01-11T23:31:43.6561820Z [ OK ] ModulesTest.PrettyPrintAdaptiveLogSoftmaxWithLoss (0 ms) 2023-01-11T23:31:43.6562216Z [----------] 258 tests from ModulesTest (9514 ms total) 2023-01-11T23:31:43.6562381Z 2023-01-11T23:31:43.6562531Z [----------] 1 test from NestedTest 2023-01-11T23:31:43.6562802Z [ RUN ] NestedTest.Nested 2023-01-11T23:31:43.6563165Z [W NestedTensorImpl.cpp:179] Warning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. 
(function operator()) 2023-01-11T23:31:43.6563569Z [ OK ] NestedTest.Nested (0 ms) 2023-01-11T23:31:43.6563877Z [----------] 1 test from NestedTest (0 ms total) 2023-01-11T23:31:43.6564030Z 2023-01-11T23:31:43.6564194Z [----------] 10 tests from ParameterDictTest 2023-01-11T23:31:43.6564534Z [ RUN ] ParameterDictTest.ConstructFromTensor 2023-01-11T23:31:43.6564915Z [ OK ] ParameterDictTest.ConstructFromTensor (0 ms) 2023-01-11T23:31:43.6565298Z [ RUN ] ParameterDictTest.ConstructFromOrderedDict 2023-01-11T23:31:43.6565697Z [ OK ] ParameterDictTest.ConstructFromOrderedDict (0 ms) 2023-01-11T23:31:43.6566077Z [ RUN ] ParameterDictTest.InsertAndContains 2023-01-11T23:31:43.6566439Z [ OK ] ParameterDictTest.InsertAndContains (0 ms) 2023-01-11T23:31:43.6566780Z [ RUN ] ParameterDictTest.InsertAndClear 2023-01-11T23:31:43.6567134Z [ OK ] ParameterDictTest.InsertAndClear (0 ms) 2023-01-11T23:31:43.6567471Z [ RUN ] ParameterDictTest.InsertAndPop 2023-01-11T23:31:43.6567811Z [ OK ] ParameterDictTest.InsertAndPop (1 ms) 2023-01-11T23:31:43.6568137Z [ RUN ] ParameterDictTest.SimpleUpdate 2023-01-11T23:31:43.6568496Z [ OK ] ParameterDictTest.SimpleUpdate (1 ms) 2023-01-11T23:31:43.6568816Z [ RUN ] ParameterDictTest.Keys 2023-01-11T23:31:43.6569110Z [ OK ] ParameterDictTest.Keys (0 ms) 2023-01-11T23:31:43.6569414Z [ RUN ] ParameterDictTest.Values 2023-01-11T23:31:43.6569725Z [ OK ] ParameterDictTest.Values (0 ms) 2023-01-11T23:31:43.6570024Z [ RUN ] ParameterDictTest.Get 2023-01-11T23:31:43.6570354Z [ OK ] ParameterDictTest.Get (0 ms) 2023-01-11T23:31:43.6570708Z [ RUN ] ParameterDictTest.PrettyPrintParameterDict 2023-01-11T23:31:43.6571112Z [ OK ] ParameterDictTest.PrettyPrintParameterDict (0 ms) 2023-01-11T23:31:43.6571490Z [----------] 10 tests from ParameterDictTest (2 ms total) 2023-01-11T23:31:43.6571659Z 2023-01-11T23:31:43.6571825Z [----------] 8 tests from ParameterListTest 2023-01-11T23:31:43.6572194Z [ RUN ] ParameterListTest.ConstructsFromSharedPointer 2023-01-11T23:31:43.6572609Z [ OK ] ParameterListTest.ConstructsFromSharedPointer (0 ms) 2023-01-11T23:31:43.6572972Z [ RUN ] ParameterListTest.isEmpty 2023-01-11T23:31:43.6573296Z [ OK ] ParameterListTest.isEmpty (0 ms) 2023-01-11T23:31:43.6573648Z [ RUN ] ParameterListTest.PushBackAddsAnElement 2023-01-11T23:31:43.6574031Z [ OK ] ParameterListTest.PushBackAddsAnElement (0 ms) 2023-01-11T23:31:43.6574386Z [ RUN ] ParameterListTest.ForEachLoop 2023-01-11T23:31:43.6575061Z [ OK ] ParameterListTest.ForEachLoop (0 ms) 2023-01-11T23:31:43.6575397Z [ RUN ] ParameterListTest.AccessWithAt 2023-01-11T23:31:43.6587866Z [ OK ] ParameterListTest.AccessWithAt (3 ms) 2023-01-11T23:31:43.6588486Z [ RUN ] ParameterListTest.ExtendPushesParametersFromOtherParameterList 2023-01-11T23:31:43.6589112Z [ OK ] ParameterListTest.ExtendPushesParametersFromOtherParameterList (0 ms) 2023-01-11T23:31:43.6589655Z [ RUN ] ParameterListTest.PrettyPrintParameterList 2023-01-11T23:31:43.6590393Z [ OK ] ParameterListTest.PrettyPrintParameterList (0 ms) 2023-01-11T23:31:43.6590776Z [ RUN ] ParameterListTest.IncrementAdd 2023-01-11T23:31:43.6591125Z [ OK ] ParameterListTest.IncrementAdd (0 ms) 2023-01-11T23:31:43.6591486Z [----------] 8 tests from ParameterListTest (4 ms total) 2023-01-11T23:31:43.6591658Z 2023-01-11T23:31:43.6591817Z [----------] 1 test from NamespaceTests 2023-01-11T23:31:43.6592223Z [ RUN ] NamespaceTests.NotLeakingSymbolsFromTorchAutogradNamespace 2023-01-11T23:31:43.6592726Z [ OK ] NamespaceTests.NotLeakingSymbolsFromTorchAutogradNamespace (0 ms) 
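Immediately below, the NNUtilsTest cases (ClipGradNorm, ClipGradNormErrorIfNonfinite, ClipGradValue) exercise the gradient-clipping helpers. A minimal sketch of calling them from the C++ frontend; the model and the threshold values are illustrative only:

#include <torch/torch.h>

int main() {
  torch::nn::Linear linear(10, 2);
  auto loss = linear(torch::randn({4, 10})).sum();
  loss.backward();

  // Rescale the gradients so their total 2-norm is at most 1.0, then cap
  // individual gradient entries at +/- 0.5.
  torch::nn::utils::clip_grad_norm_(linear->parameters(), /*max_norm=*/1.0);
  torch::nn::utils::clip_grad_value_(linear->parameters(), /*clip_value=*/0.5);
}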
2023-01-11T23:31:43.6593150Z [----------] 1 test from NamespaceTests (0 ms total) 2023-01-11T23:31:43.6593312Z 2023-01-11T23:31:43.6593462Z [----------] 7 tests from NNUtilsTest 2023-01-11T23:31:43.6593749Z [ RUN ] NNUtilsTest.ClipGradNorm 2023-01-11T23:31:43.6605565Z [ OK ] NNUtilsTest.ClipGradNorm (1 ms) 2023-01-11T23:31:43.6605981Z [ RUN ] NNUtilsTest.ClipGradNormErrorIfNonfinite 2023-01-11T23:31:44.1766817Z [ OK ] NNUtilsTest.ClipGradNormErrorIfNonfinite (516 ms) 2023-01-11T23:31:44.1767282Z [ RUN ] NNUtilsTest.ClipGradValue 2023-01-11T23:31:44.1767674Z [ OK ] NNUtilsTest.ClipGradValue (0 ms) 2023-01-11T23:31:44.1768006Z [ RUN ] NNUtilsTest.ConvertParameters 2023-01-11T23:31:44.1773296Z [ OK ] NNUtilsTest.ConvertParameters (0 ms) 2023-01-11T23:31:44.1773655Z [ RUN ] NNUtilsTest.PackSequence 2023-01-11T23:31:44.2198239Z [ OK ] NNUtilsTest.PackSequence (42 ms) 2023-01-11T23:31:44.2198617Z [ RUN ] NNUtilsTest.PackPaddedSequence 2023-01-11T23:31:44.2363819Z [ OK ] NNUtilsTest.PackPaddedSequence (16 ms) 2023-01-11T23:31:44.2364180Z [ RUN ] NNUtilsTest.PadSequence 2023-01-11T23:31:44.2433074Z [ OK ] NNUtilsTest.PadSequence (6 ms) 2023-01-11T23:31:44.2433800Z [----------] 7 tests from NNUtilsTest (583 ms total) 2023-01-11T23:31:44.2434242Z 2023-01-11T23:31:44.2434615Z [----------] 3 tests from PackedSequenceTest 2023-01-11T23:31:44.2435254Z [ RUN ] PackedSequenceTest.WrongOrder 2023-01-11T23:31:44.2472791Z [ OK ] PackedSequenceTest.WrongOrder (4 ms) 2023-01-11T23:31:44.2473548Z [ RUN ] PackedSequenceTest.TotalLength 2023-01-11T23:31:44.2549554Z [ OK ] PackedSequenceTest.TotalLength (7 ms) 2023-01-11T23:31:44.2550476Z [ RUN ] PackedSequenceTest.To 2023-01-11T23:31:44.2563053Z [ OK ] PackedSequenceTest.To (1 ms) 2023-01-11T23:31:44.2563594Z [----------] 3 tests from PackedSequenceTest (13 ms total) 2023-01-11T23:31:44.2563851Z 2023-01-11T23:31:44.2564073Z [----------] 34 tests from OptimTest 2023-01-11T23:31:44.2564395Z [ RUN ] OptimTest.OptimizerAccessors 2023-01-11T23:31:44.2581736Z [ OK ] OptimTest.OptimizerAccessors (1 ms) 2023-01-11T23:31:44.2582196Z [ RUN ] OptimTest.OldInterface 2023-01-11T23:31:44.2582697Z [ OK ] OptimTest.OldInterface (0 ms) 2023-01-11T23:31:44.2583144Z [ RUN ] OptimTest.XORConvergence_SGD 2023-01-11T23:31:45.7378963Z [ OK ] OptimTest.XORConvergence_SGD (1479 ms) 2023-01-11T23:31:45.7379912Z [ RUN ] OptimTest.XORConvergence_LBFGS 2023-01-11T23:31:46.7543165Z [ OK ] OptimTest.XORConvergence_LBFGS (1016 ms) 2023-01-11T23:31:46.7543907Z [ RUN ] OptimTest.XORConvergence_Adagrad 2023-01-11T23:31:47.3218432Z [ OK ] OptimTest.XORConvergence_Adagrad (567 ms) 2023-01-11T23:31:47.3219406Z [ RUN ] OptimTest.XORConvergence_RMSprop 2023-01-11T23:31:47.8874723Z [ OK ] OptimTest.XORConvergence_RMSprop (565 ms) 2023-01-11T23:31:47.8875775Z [ RUN ] OptimTest.XORConvergence_RMSpropWithMomentum 2023-01-11T23:31:49.4775519Z [ OK ] OptimTest.XORConvergence_RMSpropWithMomentum (1590 ms) 2023-01-11T23:31:49.4776283Z [ RUN ] OptimTest.XORConvergence_Adam 2023-01-11T23:31:50.0958545Z [ OK ] OptimTest.XORConvergence_Adam (618 ms) 2023-01-11T23:31:50.0959083Z [ RUN ] OptimTest.XORConvergence_AdamWithAmsgrad 2023-01-11T23:31:50.7185203Z [ OK ] OptimTest.XORConvergence_AdamWithAmsgrad (622 ms) 2023-01-11T23:31:50.7186304Z [ RUN ] OptimTest.ProducesPyTorchValues_Adam 2023-01-11T23:31:50.9174428Z [ OK ] OptimTest.ProducesPyTorchValues_Adam (198 ms) 2023-01-11T23:31:50.9176058Z [ RUN ] OptimTest.ProducesPyTorchValues_AdamWithWeightDecay 2023-01-11T23:31:51.1227299Z [ OK ] 
OptimTest.ProducesPyTorchValues_AdamWithWeightDecay (205 ms) 2023-01-11T23:31:51.1228272Z [ RUN ] OptimTest.ProducesPyTorchValues_AdamWithWeightDecayAndAMSGrad 2023-01-11T23:31:51.3335677Z [ OK ] OptimTest.ProducesPyTorchValues_AdamWithWeightDecayAndAMSGrad (210 ms) 2023-01-11T23:31:51.3336682Z [ RUN ] OptimTest.XORConvergence_AdamW 2023-01-11T23:31:51.9663484Z [ OK ] OptimTest.XORConvergence_AdamW (632 ms) 2023-01-11T23:31:51.9664021Z [ RUN ] OptimTest.XORConvergence_AdamWWithAmsgrad 2023-01-11T23:31:52.6005764Z [ OK ] OptimTest.XORConvergence_AdamWWithAmsgrad (634 ms) 2023-01-11T23:31:52.6006341Z [ RUN ] OptimTest.ProducesPyTorchValues_AdamW 2023-01-11T23:31:52.8059696Z [ OK ] OptimTest.ProducesPyTorchValues_AdamW (205 ms) 2023-01-11T23:31:52.8060263Z [ RUN ] OptimTest.ProducesPyTorchValues_AdamWWithoutWeightDecay 2023-01-11T23:31:53.0033276Z [ OK ] OptimTest.ProducesPyTorchValues_AdamWWithoutWeightDecay (197 ms) 2023-01-11T23:31:53.0034402Z [ RUN ] OptimTest.ProducesPyTorchValues_AdamWWithAMSGrad 2023-01-11T23:31:53.2150041Z [ OK ] OptimTest.ProducesPyTorchValues_AdamWWithAMSGrad (211 ms) 2023-01-11T23:31:53.2150582Z [ RUN ] OptimTest.ProducesPyTorchValues_Adagrad 2023-01-11T23:31:53.3810999Z [ OK ] OptimTest.ProducesPyTorchValues_Adagrad (166 ms) 2023-01-11T23:31:53.3812347Z [ RUN ] OptimTest.ProducesPyTorchValues_AdagradWithWeightDecay 2023-01-11T23:31:53.5543662Z [ OK ] OptimTest.ProducesPyTorchValues_AdagradWithWeightDecay (173 ms) 2023-01-11T23:31:53.5544882Z [ RUN ] OptimTest.ProducesPyTorchValues_AdagradWithWeightDecayAndLRDecay 2023-01-11T23:31:53.7273102Z [ OK ] OptimTest.ProducesPyTorchValues_AdagradWithWeightDecayAndLRDecay (173 ms) 2023-01-11T23:31:53.7273703Z [ RUN ] OptimTest.ProducesPyTorchValues_RMSprop 2023-01-11T23:31:53.9033412Z [ OK ] OptimTest.ProducesPyTorchValues_RMSprop (175 ms) 2023-01-11T23:31:53.9034621Z [ RUN ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecay 2023-01-11T23:31:54.0849033Z [ OK ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecay (181 ms) 2023-01-11T23:31:54.0849691Z [ RUN ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecayAndCentered 2023-01-11T23:31:54.2843575Z [ OK ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecayAndCentered (199 ms) 2023-01-11T23:31:54.2844324Z [ RUN ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecayAndCenteredAndMomentum 2023-01-11T23:31:54.4937341Z [ OK ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecayAndCenteredAndMomentum (208 ms) 2023-01-11T23:31:54.4938283Z [ RUN ] OptimTest.ProducesPyTorchValues_SGD 2023-01-11T23:31:54.6270469Z [ OK ] OptimTest.ProducesPyTorchValues_SGD (133 ms) 2023-01-11T23:31:54.6271337Z [ RUN ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecay 2023-01-11T23:31:54.7695433Z [ OK ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecay (142 ms) 2023-01-11T23:31:54.7696366Z [ RUN ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecayAndMomentum 2023-01-11T23:31:54.9303594Z [ OK ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecayAndMomentum (160 ms) 2023-01-11T23:31:54.9304629Z [ RUN ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecayAndNesterovMomentum 2023-01-11T23:31:55.0974847Z [ OK ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecayAndNesterovMomentum (167 ms) 2023-01-11T23:31:55.0975423Z [ RUN ] OptimTest.ProducesPyTorchValues_LBFGS 2023-01-11T23:31:55.2478640Z [ OK ] OptimTest.ProducesPyTorchValues_LBFGS (150 ms) 2023-01-11T23:31:55.2479500Z [ RUN ] OptimTest.ProducesPyTorchValues_LBFGS_with_line_search 2023-01-11T23:31:55.9391407Z [ OK ] 
OptimTest.ProducesPyTorchValues_LBFGS_with_line_search (691 ms) 2023-01-11T23:31:55.9392386Z [ RUN ] OptimTest.ZeroGrad 2023-01-11T23:31:55.9393175Z [ OK ] OptimTest.ZeroGrad (0 ms) 2023-01-11T23:31:55.9394085Z [ RUN ] OptimTest.ExternalVectorOfParameters 2023-01-11T23:31:55.9395120Z [ OK ] OptimTest.ExternalVectorOfParameters (0 ms) 2023-01-11T23:31:55.9395957Z [ RUN ] OptimTest.AddParameter_LBFGS 2023-01-11T23:31:55.9396613Z [ OK ] OptimTest.AddParameter_LBFGS (0 ms) 2023-01-11T23:31:55.9397266Z [ RUN ] OptimTest.CheckLRChange_StepLR_Adam 2023-01-11T23:31:55.9397925Z [ OK ] OptimTest.CheckLRChange_StepLR_Adam (0 ms) 2023-01-11T23:31:55.9398615Z [----------] 34 tests from OptimTest (11682 ms total) 2023-01-11T23:31:55.9398937Z 2023-01-11T23:31:55.9399258Z [----------] 29 tests from OrderedDictTest 2023-01-11T23:31:55.9399667Z [ RUN ] OrderedDictTest.IsEmptyAfterDefaultConstruction 2023-01-11T23:31:55.9400106Z [ OK ] OrderedDictTest.IsEmptyAfterDefaultConstruction (0 ms) 2023-01-11T23:31:55.9400582Z [ RUN ] OrderedDictTest.InsertAddsElementsWhenTheyAreYetNotPresent 2023-01-11T23:31:55.9401087Z [ OK ] OrderedDictTest.InsertAddsElementsWhenTheyAreYetNotPresent (0 ms) 2023-01-11T23:31:55.9401557Z [ RUN ] OrderedDictTest.GetReturnsValuesWhenTheyArePresent 2023-01-11T23:31:55.9402174Z [ OK ] OrderedDictTest.GetReturnsValuesWhenTheyArePresent (0 ms) 2023-01-11T23:31:55.9402646Z [ RUN ] OrderedDictTest.GetThrowsWhenPassedKeysThatAreNotPresent 2023-01-11T23:31:55.9421925Z [ OK ] OrderedDictTest.GetThrowsWhenPassedKeysThatAreNotPresent (2 ms) 2023-01-11T23:31:55.9422536Z [ RUN ] OrderedDictTest.CanInitializeFromList 2023-01-11T23:31:55.9423008Z [ OK ] OrderedDictTest.CanInitializeFromList (0 ms) 2023-01-11T23:31:55.9423512Z [ RUN ] OrderedDictTest.InsertThrowsWhenPassedElementsThatArePresent 2023-01-11T23:31:55.9445498Z [ OK ] OrderedDictTest.InsertThrowsWhenPassedElementsThatArePresent (2 ms) 2023-01-11T23:31:55.9446036Z [ RUN ] OrderedDictTest.FrontReturnsTheFirstItem 2023-01-11T23:31:55.9446437Z [ OK ] OrderedDictTest.FrontReturnsTheFirstItem (0 ms) 2023-01-11T23:31:55.9446821Z [ RUN ] OrderedDictTest.FrontThrowsWhenEmpty 2023-01-11T23:31:55.9456403Z [ OK ] OrderedDictTest.FrontThrowsWhenEmpty (1 ms) 2023-01-11T23:31:55.9456799Z [ RUN ] OrderedDictTest.BackReturnsTheLastItem 2023-01-11T23:31:55.9457265Z [ OK ] OrderedDictTest.BackReturnsTheLastItem (0 ms) 2023-01-11T23:31:55.9457637Z [ RUN ] OrderedDictTest.BackThrowsWhenEmpty 2023-01-11T23:31:55.9467297Z [ OK ] OrderedDictTest.BackThrowsWhenEmpty (1 ms) 2023-01-11T23:31:55.9467862Z [ RUN ] OrderedDictTest.FindReturnsPointersToValuesWhenPresent 2023-01-11T23:31:55.9468476Z [ OK ] OrderedDictTest.FindReturnsPointersToValuesWhenPresent (0 ms) 2023-01-11T23:31:55.9469146Z [ RUN ] OrderedDictTest.FindReturnsNullPointersWhenPasesdKeysThatAreNotPresent 2023-01-11T23:31:55.9469747Z [ OK ] OrderedDictTest.FindReturnsNullPointersWhenPasesdKeysThatAreNotPresent (0 ms) 2023-01-11T23:31:55.9470430Z [ RUN ] OrderedDictTest.SubscriptOperatorThrowsWhenPassedKeysThatAreNotPresent 2023-01-11T23:31:55.9471028Z [ OK ] OrderedDictTest.SubscriptOperatorThrowsWhenPassedKeysThatAreNotPresent (0 ms) 2023-01-11T23:31:55.9471624Z [ RUN ] OrderedDictTest.SubscriptOperatorReturnsItemsPositionallyWhenPassedIntegers 2023-01-11T23:31:55.9472254Z [ OK ] OrderedDictTest.SubscriptOperatorReturnsItemsPositionallyWhenPassedIntegers (0 ms) 2023-01-11T23:31:55.9472861Z [ RUN ] OrderedDictTest.SubscriptOperatorsThrowswhenPassedKeysThatAreNotPresent 2023-01-11T23:31:55.9491859Z [ OK ] 
OrderedDictTest.SubscriptOperatorsThrowswhenPassedKeysThatAreNotPresent (2 ms) 2023-01-11T23:31:55.9492638Z [ RUN ] OrderedDictTest.UpdateInsertsAllItemsFromAnotherOrderedDict 2023-01-11T23:31:55.9493193Z [ OK ] OrderedDictTest.UpdateInsertsAllItemsFromAnotherOrderedDict (0 ms) 2023-01-11T23:31:55.9493666Z [ RUN ] OrderedDictTest.UpdateAlsoChecksForDuplicates 2023-01-11T23:31:55.9503224Z [ OK ] OrderedDictTest.UpdateAlsoChecksForDuplicates (1 ms) 2023-01-11T23:31:55.9503705Z [ RUN ] OrderedDictTest.CanIterateItems 2023-01-11T23:31:55.9504178Z [ OK ] OrderedDictTest.CanIterateItems (0 ms) 2023-01-11T23:31:55.9504591Z [ RUN ] OrderedDictTest.EraseWorks 2023-01-11T23:31:55.9505104Z [ OK ] OrderedDictTest.EraseWorks (0 ms) 2023-01-11T23:31:55.9505577Z [ RUN ] OrderedDictTest.ClearMakesTheDictEmpty 2023-01-11T23:31:55.9505973Z [ OK ] OrderedDictTest.ClearMakesTheDictEmpty (0 ms) 2023-01-11T23:31:55.9506359Z [ RUN ] OrderedDictTest.CanCopyConstruct 2023-01-11T23:31:55.9506705Z [ OK ] OrderedDictTest.CanCopyConstruct (0 ms) 2023-01-11T23:31:55.9507044Z [ RUN ] OrderedDictTest.CanCopyAssign 2023-01-11T23:31:55.9507382Z [ OK ] OrderedDictTest.CanCopyAssign (0 ms) 2023-01-11T23:31:55.9507800Z [ RUN ] OrderedDictTest.CanMoveConstruct 2023-01-11T23:31:55.9508154Z [ OK ] OrderedDictTest.CanMoveConstruct (0 ms) 2023-01-11T23:31:55.9508495Z [ RUN ] OrderedDictTest.CanMoveAssign 2023-01-11T23:31:55.9508826Z [ OK ] OrderedDictTest.CanMoveAssign (0 ms) 2023-01-11T23:31:55.9509198Z [ RUN ] OrderedDictTest.CanInsertWithBraces 2023-01-11T23:31:55.9509601Z [ OK ] OrderedDictTest.CanInsertWithBraces (0 ms) 2023-01-11T23:31:55.9510134Z [ RUN ] OrderedDictTest.ErrorMessagesIncludeTheKeyDescription 2023-01-11T23:31:55.9526907Z [ OK ] OrderedDictTest.ErrorMessagesIncludeTheKeyDescription (2 ms) 2023-01-11T23:31:55.9527413Z [ RUN ] OrderedDictTest.KeysReturnsAllKeys 2023-01-11T23:31:55.9527912Z [ OK ] OrderedDictTest.KeysReturnsAllKeys (0 ms) 2023-01-11T23:31:55.9528312Z [ RUN ] OrderedDictTest.ValuesReturnsAllValues 2023-01-11T23:31:55.9528711Z [ OK ] OrderedDictTest.ValuesReturnsAllValues (0 ms) 2023-01-11T23:31:55.9529084Z [ RUN ] OrderedDictTest.ItemsReturnsAllItems 2023-01-11T23:31:55.9529450Z [ OK ] OrderedDictTest.ItemsReturnsAllItems (0 ms) 2023-01-11T23:31:55.9529813Z [----------] 29 tests from OrderedDictTest (13 ms total) 2023-01-11T23:31:55.9529981Z 2023-01-11T23:31:55.9530128Z [----------] 25 tests from RNNTest 2023-01-11T23:31:55.9530417Z [ RUN ] RNNTest.CheckOutputSizes 2023-01-11T23:31:55.9597441Z [ OK ] RNNTest.CheckOutputSizes (6 ms) 2023-01-11T23:31:55.9598152Z [ RUN ] RNNTest.CheckOutputSizesProj 2023-01-11T23:31:55.9670350Z [ OK ] RNNTest.CheckOutputSizesProj (7 ms) 2023-01-11T23:31:55.9671153Z [ RUN ] RNNTest.CheckOutputValuesMatchPyTorch 2023-01-11T23:31:55.9675337Z [ OK ] RNNTest.CheckOutputValuesMatchPyTorch (0 ms) 2023-01-11T23:31:55.9676027Z [ RUN ] RNNTest.EndToEndLSTM 2023-01-11T23:31:57.5905261Z [ OK ] RNNTest.EndToEndLSTM (1622 ms) 2023-01-11T23:31:57.5906110Z [ RUN ] RNNTest.EndToEndLSTMProj 2023-01-11T23:31:59.2580022Z [ OK ] RNNTest.EndToEndLSTMProj (1667 ms) 2023-01-11T23:31:59.2580730Z [ RUN ] RNNTest.EndToEndGRU 2023-01-11T23:32:00.6770577Z [ OK ] RNNTest.EndToEndGRU (1419 ms) 2023-01-11T23:32:00.6770931Z [ RUN ] RNNTest.EndToEndRNNRelu 2023-01-11T23:32:01.4737435Z [ OK ] RNNTest.EndToEndRNNRelu (796 ms) 2023-01-11T23:32:01.4738143Z [ RUN ] RNNTest.EndToEndRNNTanh 2023-01-11T23:32:02.3808416Z [ OK ] RNNTest.EndToEndRNNTanh (907 ms) 2023-01-11T23:32:02.3808762Z [ RUN ] 
RNNTest.Sizes_CUDA 2023-01-11T23:32:02.4830202Z [ OK ] RNNTest.Sizes_CUDA (101 ms) 2023-01-11T23:32:02.4830992Z [ RUN ] RNNTest.SizesProj_CUDA 2023-01-11T23:32:02.4857150Z [ OK ] RNNTest.SizesProj_CUDA (2 ms) 2023-01-11T23:32:02.4857816Z [ RUN ] RNNTest.EndToEndLSTM_CUDA 2023-01-11T23:32:03.5556834Z [ OK ] RNNTest.EndToEndLSTM_CUDA (1069 ms) 2023-01-11T23:32:03.5557603Z [ RUN ] RNNTest.EndToEndLSTMProj_CUDA 2023-01-11T23:32:04.7278438Z [ OK ] RNNTest.EndToEndLSTMProj_CUDA (1172 ms) 2023-01-11T23:32:04.7279491Z [ RUN ] RNNTest.EndToEndGRU_CUDA 2023-01-11T23:32:05.5923773Z [ OK ] RNNTest.EndToEndGRU_CUDA (864 ms) 2023-01-11T23:32:05.5924130Z [ RUN ] RNNTest.EndToEndRNNRelu_CUDA 2023-01-11T23:32:06.4318599Z [ OK ] RNNTest.EndToEndRNNRelu_CUDA (839 ms) 2023-01-11T23:32:06.4319320Z [ RUN ] RNNTest.EndToEndRNNTanh_CUDA 2023-01-11T23:32:07.3722961Z [ OK ] RNNTest.EndToEndRNNTanh_CUDA (940 ms) 2023-01-11T23:32:07.3723622Z [ RUN ] RNNTest.PrettyPrintRNNs 2023-01-11T23:32:07.3740175Z [ OK ] RNNTest.PrettyPrintRNNs (2 ms) 2023-01-11T23:32:07.3740934Z [ RUN ] RNNTest.BidirectionalFlattenParameters 2023-01-11T23:32:07.3827756Z [ OK ] RNNTest.BidirectionalFlattenParameters (8 ms) 2023-01-11T23:32:07.3828819Z [ RUN ] RNNTest.BidirectionalGRUReverseForward 2023-01-11T23:32:07.3839948Z [ OK ] RNNTest.BidirectionalGRUReverseForward (1 ms) 2023-01-11T23:32:07.3840472Z [ RUN ] RNNTest.BidirectionalGRUReverseForward_CUDA 2023-01-11T23:32:07.3851310Z [ OK ] RNNTest.BidirectionalGRUReverseForward_CUDA (1 ms) 2023-01-11T23:32:07.3851919Z [ RUN ] RNNTest.BidirectionalLSTMReverseForward 2023-01-11T23:32:07.3862903Z [ OK ] RNNTest.BidirectionalLSTMReverseForward (1 ms) 2023-01-11T23:32:07.3863430Z [ RUN ] RNNTest.BidirectionalLSTMReverseForward_CUDA 2023-01-11T23:32:07.3875099Z [ OK ] RNNTest.BidirectionalLSTMReverseForward_CUDA (1 ms) 2023-01-11T23:32:07.3875600Z [ RUN ] RNNTest.BidirectionalMultilayerGRU_CPU_vs_CUDA 2023-01-11T23:32:07.3915336Z [ OK ] RNNTest.BidirectionalMultilayerGRU_CPU_vs_CUDA (3 ms) 2023-01-11T23:32:07.3915836Z [ RUN ] RNNTest.BidirectionalMultilayerLSTM_CPU_vs_CUDA 2023-01-11T23:32:07.3953348Z [ OK ] RNNTest.BidirectionalMultilayerLSTM_CPU_vs_CUDA (3 ms) 2023-01-11T23:32:07.3953946Z [ RUN ] RNNTest.BidirectionalMultilayerLSTMProj_CPU_vs_CUDA 2023-01-11T23:32:07.3999482Z [ OK ] RNNTest.BidirectionalMultilayerLSTMProj_CPU_vs_CUDA (4 ms) 2023-01-11T23:32:07.4000451Z [ RUN ] RNNTest.UsePackedSequenceAsInput 2023-01-11T23:32:07.4014894Z [ OK ] RNNTest.UsePackedSequenceAsInput (1 ms) 2023-01-11T23:32:07.4015331Z [----------] 25 tests from RNNTest (11448 ms total) 2023-01-11T23:32:07.4015490Z 2023-01-11T23:32:07.4015646Z [----------] 20 tests from SequentialTest 2023-01-11T23:32:07.4015972Z [ RUN ] SequentialTest.CanContainThings 2023-01-11T23:32:07.4016320Z [ OK ] SequentialTest.CanContainThings (0 ms) 2023-01-11T23:32:07.4016692Z [ RUN ] SequentialTest.ConstructsFromSharedPointer 2023-01-11T23:32:07.4017105Z [ OK ] SequentialTest.ConstructsFromSharedPointer (0 ms) 2023-01-11T23:32:07.4017507Z [ RUN ] SequentialTest.ConstructsFromConcreteType 2023-01-11T23:32:07.4017906Z [ OK ] SequentialTest.ConstructsFromConcreteType (0 ms) 2023-01-11T23:32:07.4018289Z [ RUN ] SequentialTest.ConstructsFromModuleHolder 2023-01-11T23:32:07.4018684Z [ OK ] SequentialTest.ConstructsFromModuleHolder (0 ms) 2023-01-11T23:32:07.4019069Z [ RUN ] SequentialTest.PushBackAddsAnElement 2023-01-11T23:32:07.4019440Z [ OK ] SequentialTest.PushBackAddsAnElement (0 ms) 2023-01-11T23:32:07.4019775Z [ RUN ] 
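The RNNTest suite above checks the recurrent layers of the C++ frontend (LSTM/GRU/RNN, with and without projections, on CPU and CUDA). A minimal sketch of the shape contract the CheckOutputSizes tests assert, with illustrative sizes:

    #include <torch/torch.h>

    int main() {
      // Two-layer LSTM: input_size=10, hidden_size=20.
      torch::nn::LSTM lstm(
          torch::nn::LSTMOptions(/*input_size=*/10, /*hidden_size=*/20).num_layers(2));
      torch::Tensor input = torch::randn({5, 3, 10});  // (seq_len, batch, input_size)
      auto [output, state] = lstm->forward(input);     // output: (5, 3, 20)
      auto [h_n, c_n] = state;                         // each: (num_layers, batch, 20)
    }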
SequentialTest.AccessWithAt 2023-01-11T23:32:07.4042348Z [ OK ] SequentialTest.AccessWithAt (2 ms) 2023-01-11T23:32:07.4042793Z [ RUN ] SequentialTest.AccessWithPtr 2023-01-11T23:32:07.4065707Z [ OK ] SequentialTest.AccessWithPtr (2 ms) 2023-01-11T23:32:07.4066220Z [ RUN ] SequentialTest.CallingForwardOnEmptySequentialIsDisallowed 2023-01-11T23:32:07.4076930Z [ OK ] SequentialTest.CallingForwardOnEmptySequentialIsDisallowed (1 ms) 2023-01-11T23:32:07.4077528Z [ RUN ] SequentialTest.CallingForwardChainsCorrectly 2023-01-11T23:32:07.4077941Z [ OK ] SequentialTest.CallingForwardChainsCorrectly (0 ms) 2023-01-11T23:32:07.4078399Z [ RUN ] SequentialTest.CallingForwardWithTheWrongReturnTypeThrows 2023-01-11T23:32:07.4088067Z [ OK ] SequentialTest.CallingForwardWithTheWrongReturnTypeThrows (1 ms) 2023-01-11T23:32:07.4088729Z [ RUN ] SequentialTest.TheReturnTypeOfForwardDefaultsToTensor 2023-01-11T23:32:07.4089384Z [ OK ] SequentialTest.TheReturnTypeOfForwardDefaultsToTensor (0 ms) 2023-01-11T23:32:07.4089822Z [ RUN ] SequentialTest.ForwardReturnsTheLastValue 2023-01-11T23:32:07.4091431Z [ OK ] SequentialTest.ForwardReturnsTheLastValue (0 ms) 2023-01-11T23:32:07.4091976Z [ RUN ] SequentialTest.SanityCheckForHoldingStandardModules 2023-01-11T23:32:07.4092600Z [ OK ] SequentialTest.SanityCheckForHoldingStandardModules (0 ms) 2023-01-11T23:32:07.4093135Z [ RUN ] SequentialTest.ExtendPushesModulesFromOtherSequential 2023-01-11T23:32:07.4093672Z [ OK ] SequentialTest.ExtendPushesModulesFromOtherSequential (0 ms) 2023-01-11T23:32:07.4094134Z [ RUN ] SequentialTest.HasReferenceSemantics 2023-01-11T23:32:07.4094796Z [ OK ] SequentialTest.HasReferenceSemantics (0 ms) 2023-01-11T23:32:07.4095141Z [ RUN ] SequentialTest.IsCloneable 2023-01-11T23:32:07.4098117Z [ OK ] SequentialTest.IsCloneable (0 ms) 2023-01-11T23:32:07.4098583Z [ RUN ] SequentialTest.RegistersElementsAsSubmodules 2023-01-11T23:32:07.4099083Z [ OK ] SequentialTest.RegistersElementsAsSubmodules (0 ms) 2023-01-11T23:32:07.4099466Z [ RUN ] SequentialTest.CloneToDevice_CUDA 2023-01-11T23:32:07.4101379Z [ OK ] SequentialTest.CloneToDevice_CUDA (0 ms) 2023-01-11T23:32:07.4101826Z [ RUN ] SequentialTest.PrettyPrintSequential 2023-01-11T23:32:07.4104875Z [ OK ] SequentialTest.PrettyPrintSequential (0 ms) 2023-01-11T23:32:07.4105370Z [ RUN ] SequentialTest.ModuleForwardMethodOptionalArg 2023-01-11T23:32:07.4137734Z [ OK ] SequentialTest.ModuleForwardMethodOptionalArg (3 ms) 2023-01-11T23:32:07.4138317Z [----------] 20 tests from SequentialTest (12 ms total) 2023-01-11T23:32:07.4138535Z 2023-01-11T23:32:07.4138706Z [----------] 17 tests from TransformerTest 2023-01-11T23:32:07.4139063Z [ RUN ] TransformerTest.TransformerEncoderLayer 2023-01-11T23:32:07.4214157Z [ OK ] TransformerTest.TransformerEncoderLayer (7 ms) 2023-01-11T23:32:07.4215544Z [ RUN ] TransformerTest.TransformerEncoderLayer_CUDA 2023-01-11T23:32:07.4376053Z [ OK ] TransformerTest.TransformerEncoderLayer_CUDA (16 ms) 2023-01-11T23:32:07.4376874Z [ RUN ] TransformerTest.TransformerDecoderLayer 2023-01-11T23:32:07.4452835Z [ OK ] TransformerTest.TransformerDecoderLayer (7 ms) 2023-01-11T23:32:07.4453737Z [ RUN ] TransformerTest.TransformerDecoderLayer_CUDA 2023-01-11T23:32:07.4627738Z [ OK ] TransformerTest.TransformerDecoderLayer_CUDA (17 ms) 2023-01-11T23:32:07.4628584Z [ RUN ] TransformerTest.TransformerDecoderLayer_gelu 2023-01-11T23:32:07.4670148Z [ OK ] TransformerTest.TransformerDecoderLayer_gelu (4 ms) 2023-01-11T23:32:07.4671399Z [ RUN ] 
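SequentialTest above exercises torch::nn::Sequential, whose forward() threads the input through each contained module and defaults to returning a Tensor. A brief sketch:

    #include <torch/torch.h>

    int main() {
      torch::nn::Sequential seq(
          torch::nn::Linear(10, 5),
          torch::nn::ReLU(),
          torch::nn::Linear(5, 1));
      torch::Tensor y = seq->forward(torch::randn({4, 10}));  // shape (4, 1)
      seq->push_back(torch::nn::Sigmoid());  // cf. SequentialTest.PushBackAddsAnElement
    }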
TransformerTest.TransformerDecoderLayer_gelu_CUDA 2023-01-11T23:32:07.4763119Z [ OK ] TransformerTest.TransformerDecoderLayer_gelu_CUDA (9 ms) 2023-01-11T23:32:07.4763530Z [ RUN ] TransformerTest.TransformerEncoder 2023-01-11T23:32:07.4910929Z [ OK ] TransformerTest.TransformerEncoder (14 ms) 2023-01-11T23:32:07.4912102Z [ RUN ] TransformerTest.TransformerEncoder_CUDA 2023-01-11T23:32:07.5205763Z [ OK ] TransformerTest.TransformerEncoder_CUDA (29 ms) 2023-01-11T23:32:07.5206207Z [ RUN ] TransformerTest.PrettyPrintTransformerEncoderLayer 2023-01-11T23:32:07.5206664Z [ OK ] TransformerTest.PrettyPrintTransformerEncoderLayer (0 ms) 2023-01-11T23:32:07.5207116Z [ RUN ] TransformerTest.PrettyPrintTransformerEncoder 2023-01-11T23:32:07.5215225Z [ OK ] TransformerTest.PrettyPrintTransformerEncoder (0 ms) 2023-01-11T23:32:07.5215696Z [ RUN ] TransformerTest.PrettyPrintTransformerDecoderLayer 2023-01-11T23:32:07.5216282Z [ OK ] TransformerTest.PrettyPrintTransformerDecoderLayer (0 ms) 2023-01-11T23:32:07.5216688Z [ RUN ] TransformerTest.TransformerDecoder 2023-01-11T23:32:07.5644816Z [ OK ] TransformerTest.TransformerDecoder (42 ms) 2023-01-11T23:32:07.5645203Z [ RUN ] TransformerTest.TransformerDecoder_CUDA 2023-01-11T23:32:07.6513734Z [ OK ] TransformerTest.TransformerDecoder_CUDA (86 ms) 2023-01-11T23:32:07.6514583Z [ RUN ] TransformerTest.PrettyPrintTransformerDecoder 2023-01-11T23:32:07.6521103Z [ OK ] TransformerTest.PrettyPrintTransformerDecoder (0 ms) 2023-01-11T23:32:07.6521882Z [ RUN ] TransformerTest.Transformer 2023-01-11T23:32:07.6675519Z [ OK ] TransformerTest.Transformer (15 ms) 2023-01-11T23:32:07.6676220Z [ RUN ] TransformerTest.Transformer_CUDA 2023-01-11T23:32:07.7005886Z [ OK ] TransformerTest.Transformer_CUDA (33 ms) 2023-01-11T23:32:07.7006286Z [ RUN ] TransformerTest.TransformerArgsCorrectness 2023-01-11T23:32:07.7066266Z [ OK ] TransformerTest.TransformerArgsCorrectness (5 ms) 2023-01-11T23:32:07.7067020Z [----------] 17 tests from TransformerTest (292 ms total) 2023-01-11T23:32:07.7067331Z 2023-01-11T23:32:07.7067621Z [----------] 24 tests from SerializeTest 2023-01-11T23:32:07.7068157Z [ RUN ] SerializeTest.KeysFunc 2023-01-11T23:32:07.7069078Z [ OK ] SerializeTest.KeysFunc (0 ms) 2023-01-11T23:32:07.7069645Z [ RUN ] SerializeTest.TryReadFunc 2023-01-11T23:32:07.7072956Z [ OK ] SerializeTest.TryReadFunc (0 ms) 2023-01-11T23:32:07.7073275Z [ RUN ] SerializeTest.Basic 2023-01-11T23:32:07.7073912Z [ OK ] SerializeTest.Basic (0 ms) 2023-01-11T23:32:07.7074362Z [ RUN ] SerializeTest.MathBits 2023-01-11T23:32:07.7155161Z [ OK ] SerializeTest.MathBits (7 ms) 2023-01-11T23:32:07.7156119Z [ RUN ] SerializeTest.BasicToFile 2023-01-11T23:32:07.7156991Z [ OK ] SerializeTest.BasicToFile (0 ms) 2023-01-11T23:32:07.7157624Z [ RUN ] SerializeTest.BasicViaFunc 2023-01-11T23:32:07.7159120Z [ OK ] SerializeTest.BasicViaFunc (0 ms) 2023-01-11T23:32:07.7159879Z [ RUN ] SerializeTest.Resized 2023-01-11T23:32:07.7160969Z [ OK ] SerializeTest.Resized (0 ms) 2023-01-11T23:32:07.7161749Z [ RUN ] SerializeTest.Sliced 2023-01-11T23:32:07.7162784Z [ OK ] SerializeTest.Sliced (0 ms) 2023-01-11T23:32:07.7163189Z [ RUN ] SerializeTest.NonContiguous 2023-01-11T23:32:07.7165800Z [ OK ] SerializeTest.NonContiguous (0 ms) 2023-01-11T23:32:07.7166273Z [ RUN ] SerializeTest.ErrorOnMissingKey 2023-01-11T23:32:07.7245757Z [ OK ] SerializeTest.ErrorOnMissingKey (7 ms) 2023-01-11T23:32:07.7246212Z [ RUN ] SerializeTest.XOR 2023-01-11T23:32:07.9141809Z [ OK ] SerializeTest.XOR (189 ms) 2023-01-11T23:32:07.9142470Z 
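TransformerTest above covers the C++ transformer modules. A minimal encoder-layer sketch; the d_model/nhead values are illustrative, not the ones the tests use:

    #include <torch/torch.h>

    int main() {
      // d_model=32 with 4 attention heads.
      torch::nn::TransformerEncoderLayer layer(
          torch::nn::TransformerEncoderLayerOptions(/*d_model=*/32, /*nhead=*/4));
      torch::Tensor src = torch::randn({10, 2, 32});  // (seq, batch, d_model)
      torch::Tensor out = layer->forward(src);        // same shape as src
    }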
[ RUN ] SerializeTest.Optim 2023-01-11T23:32:07.9158375Z [ OK ] SerializeTest.Optim (2 ms) 2023-01-11T23:32:07.9158724Z [ RUN ] SerializeTest.Optim_Adagrad 2023-01-11T23:32:07.9188732Z [ OK ] SerializeTest.Optim_Adagrad (3 ms) 2023-01-11T23:32:07.9189059Z [ RUN ] SerializeTest.Optim_SGD 2023-01-11T23:32:07.9220104Z [ OK ] SerializeTest.Optim_SGD (2 ms) 2023-01-11T23:32:07.9220758Z [ RUN ] SerializeTest.Optim_Adam 2023-01-11T23:32:07.9249867Z [ OK ] SerializeTest.Optim_Adam (3 ms) 2023-01-11T23:32:07.9250201Z [ RUN ] SerializeTest.Optim_AdamW 2023-01-11T23:32:07.9282198Z [ OK ] SerializeTest.Optim_AdamW (3 ms) 2023-01-11T23:32:07.9282539Z [ RUN ] SerializeTest.Optim_RMSprop 2023-01-11T23:32:07.9315072Z [ OK ] SerializeTest.Optim_RMSprop (3 ms) 2023-01-11T23:32:07.9315407Z [ RUN ] SerializeTest.Optim_LBFGS 2023-01-11T23:32:07.9344573Z [ OK ] SerializeTest.Optim_LBFGS (3 ms) 2023-01-11T23:32:07.9344903Z [ RUN ] SerializeTest.XOR_CUDA 2023-01-11T23:32:08.1537372Z [ OK ] SerializeTest.XOR_CUDA (218 ms) 2023-01-11T23:32:08.1538345Z [ RUN ] SerializeTest.CanSerializeModulesWithIntermediateModulesWithoutParametersOrBuffers 2023-01-11T23:32:08.1540389Z [ OK ] SerializeTest.CanSerializeModulesWithIntermediateModulesWithoutParametersOrBuffers (0 ms) 2023-01-11T23:32:08.1541293Z [ RUN ] SerializeTest.VectorOfTensors 2023-01-11T23:32:08.1543681Z [ OK ] SerializeTest.VectorOfTensors (0 ms) 2023-01-11T23:32:08.1544289Z [ RUN ] SerializeTest.IValue 2023-01-11T23:32:08.1544610Z [ OK ] SerializeTest.IValue (0 ms) 2023-01-11T23:32:08.1545048Z [ RUN ] SerializeTest.UnserializableSubmoduleIsSkippedWhenSavingModule 2023-01-11T23:32:08.1545637Z [ OK ] SerializeTest.UnserializableSubmoduleIsSkippedWhenSavingModule (0 ms) 2023-01-11T23:32:08.1546171Z [ RUN ] SerializeTest.UnserializableSubmoduleIsIgnoredWhenLoadingModule 2023-01-11T23:32:08.1553245Z [ OK ] SerializeTest.UnserializableSubmoduleIsIgnoredWhenLoadingModule (0 ms) 2023-01-11T23:32:08.1553720Z [----------] 24 tests from SerializeTest (448 ms total) 2023-01-11T23:32:08.1553888Z 2023-01-11T23:32:08.1554044Z [----------] 1 test from SpecialTest 2023-01-11T23:32:08.1554480Z [ RUN ] SpecialTest.special 2023-01-11T23:32:08.1554768Z [ OK ] SpecialTest.special (0 ms) 2023-01-11T23:32:08.1555089Z [----------] 1 test from SpecialTest (0 ms total) 2023-01-11T23:32:08.1555251Z 2023-01-11T23:32:08.1555400Z [----------] 5 tests from TestStatic 2023-01-11T23:32:08.1555678Z [ RUN ] TestStatic.AllOf 2023-01-11T23:32:08.1555963Z [ OK ] TestStatic.AllOf (0 ms) 2023-01-11T23:32:08.1556236Z [ RUN ] TestStatic.AnyOf 2023-01-11T23:32:08.1556504Z [ OK ] TestStatic.AnyOf (0 ms) 2023-01-11T23:32:08.1556797Z [ RUN ] TestStatic.EnableIfModule 2023-01-11T23:32:08.1557118Z [ OK ] TestStatic.EnableIfModule (0 ms) 2023-01-11T23:32:08.1557446Z [ RUN ] TestStatic.ReturnTypeOfForward 2023-01-11T23:32:08.1557782Z [ OK ] TestStatic.ReturnTypeOfForward (0 ms) 2023-01-11T23:32:08.1558080Z [ RUN ] TestStatic.Apply 2023-01-11T23:32:08.1558365Z [ OK ] TestStatic.Apply (0 ms) 2023-01-11T23:32:08.1558670Z [----------] 5 tests from TestStatic (0 ms total) 2023-01-11T23:32:08.1558825Z 2023-01-11T23:32:08.1558974Z [----------] 49 tests from TensorTest 2023-01-11T23:32:08.1559351Z [ RUN ] TensorTest.ToDoesNotCopyWhenOptionsAreAllTheSame_CUDA 2023-01-11T23:32:08.1559810Z [ OK ] TensorTest.ToDoesNotCopyWhenOptionsAreAllTheSame_CUDA (0 ms) 2023-01-11T23:32:08.1560228Z [ RUN ] TensorTest.MagmaInitializesCorrectly_CUDA 2023-01-11T23:32:08.2986685Z [ OK ] TensorTest.MagmaInitializesCorrectly_CUDA (142 ms) 
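The SerializeTest suite above round-trips modules, plain tensors, and optimizer state through torch::save/torch::load. A sketch of those three paths (the .pt file names are illustrative):

    #include <torch/torch.h>

    int main() {
      torch::nn::Linear model(3, 1);
      torch::save(model, "model.pt");   // serialize module parameters/buffers
      torch::load(model, "model.pt");

      torch::Tensor t = torch::rand({2, 2});
      torch::save(t, "tensor.pt");      // plain tensors work too
      torch::Tensor t2;
      torch::load(t2, "tensor.pt");

      torch::optim::SGD opt(model->parameters(), /*lr=*/0.01);
      torch::save(opt, "opt.pt");       // cf. SerializeTest.Optim_SGD
    }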
2023-01-11T23:32:08.2987338Z [ RUN ] TensorTest.ToDtype 2023-01-11T23:32:08.2987872Z [ OK ] TensorTest.ToDtype (0 ms) 2023-01-11T23:32:08.2988475Z [ RUN ] TensorTest.ToTensorAndTensorAttributes 2023-01-11T23:32:08.2989174Z [ OK ] TensorTest.ToTensorAndTensorAttributes (0 ms) 2023-01-11T23:32:08.2989871Z [ RUN ] TensorTest.ToOptionsWithRequiresGrad 2023-01-11T23:32:08.3007254Z [ OK ] TensorTest.ToOptionsWithRequiresGrad (2 ms) 2023-01-11T23:32:08.3008575Z [ RUN ] TensorTest.ToDoesNotCopyWhenOptionsAreAllTheSame 2023-01-11T23:32:08.3009098Z [ OK ] TensorTest.ToDoesNotCopyWhenOptionsAreAllTheSame (0 ms) 2023-01-11T23:32:08.3009791Z [ RUN ] TensorTest.AtTensorCtorScalar 2023-01-11T23:32:08.3010145Z [ OK ] TensorTest.AtTensorCtorScalar (0 ms) 2023-01-11T23:32:08.3010476Z [ RUN ] TensorTest.AtTensorCtorSingleDim 2023-01-11T23:32:08.3010830Z [ OK ] TensorTest.AtTensorCtorSingleDim (0 ms) 2023-01-11T23:32:08.3011187Z [ RUN ] TensorTest.AtTensorCastRealToComplex 2023-01-11T23:32:08.3011560Z [ OK ] TensorTest.AtTensorCastRealToComplex (0 ms) 2023-01-11T23:32:08.3011956Z [ RUN ] TensorTest.AtTensorCastComplexToRealErrorChecks 2023-01-11T23:32:08.3064537Z [ OK ] TensorTest.AtTensorCastComplexToRealErrorChecks (5 ms) 2023-01-11T23:32:08.3065200Z [ RUN ] TensorTest.TorchTensorCtorScalarIntegralType 2023-01-11T23:32:08.3065863Z [ OK ] TensorTest.TorchTensorCtorScalarIntegralType (0 ms) 2023-01-11T23:32:08.3066553Z [ RUN ] TensorTest.TorchTensorCtorScalarFloatingType 2023-01-11T23:32:08.3067231Z [ OK ] TensorTest.TorchTensorCtorScalarFloatingType (0 ms) 2023-01-11T23:32:08.3067869Z [ RUN ] TensorTest.TorchTensorCtorScalarBoolType 2023-01-11T23:32:08.3068474Z [ OK ] TensorTest.TorchTensorCtorScalarBoolType (0 ms) 2023-01-11T23:32:08.3069102Z [ RUN ] TensorTest.TorchTensorCtorSingleDimIntegralType 2023-01-11T23:32:08.3069748Z [ OK ] TensorTest.TorchTensorCtorSingleDimIntegralType (0 ms) 2023-01-11T23:32:08.3070485Z [ RUN ] TensorTest.TorchTensorCtorSingleDimFloatingType 2023-01-11T23:32:08.3071344Z [ OK ] TensorTest.TorchTensorCtorSingleDimFloatingType (0 ms) 2023-01-11T23:32:08.3071979Z [ RUN ] TensorTest.TorchTensorCtorSingleDimBoolType 2023-01-11T23:32:08.3072634Z [ OK ] TensorTest.TorchTensorCtorSingleDimBoolType (0 ms) 2023-01-11T23:32:08.3073307Z [ RUN ] TensorTest.TorchTensorCtorMultiDimIntegralType 2023-01-11T23:32:08.3074395Z [ OK ] TensorTest.TorchTensorCtorMultiDimIntegralType (0 ms) 2023-01-11T23:32:08.3075062Z [ RUN ] TensorTest.TorchTensorCtorMultiDimFloatingType 2023-01-11T23:32:08.3075953Z [ OK ] TensorTest.TorchTensorCtorMultiDimFloatingType (0 ms) 2023-01-11T23:32:08.3076565Z [ RUN ] TensorTest.TorchTensorCtorMultiDimBoolType 2023-01-11T23:32:08.3077135Z [ OK ] TensorTest.TorchTensorCtorMultiDimBoolType (0 ms) 2023-01-11T23:32:08.3077719Z [ RUN ] TensorTest.TorchTensorCtorMultiDimWithOptions 2023-01-11T23:32:08.3078642Z [ OK ] TensorTest.TorchTensorCtorMultiDimWithOptions (0 ms) 2023-01-11T23:32:08.3079228Z [ RUN ] TensorTest.TorchTensorCtorMultiDimErrorChecks 2023-01-11T23:32:08.3139448Z [ OK ] TensorTest.TorchTensorCtorMultiDimErrorChecks (5 ms) 2023-01-11T23:32:08.3140816Z [ RUN ] TensorTest.TorchTensorCastRealToComplex 2023-01-11T23:32:08.3141892Z [ OK ] TensorTest.TorchTensorCastRealToComplex (0 ms) 2023-01-11T23:32:08.3142568Z [ RUN ] TensorTest.TorchTensorCastComplexToRealErrorChecks 2023-01-11T23:32:08.3142963Z [W Copy.cpp:276] Warning: Casting complex values to real discards the imaginary part (function operator()) 2023-01-11T23:32:08.3151916Z [ OK ] 
TensorTest.TorchTensorCastComplexToRealErrorChecks (1 ms) 2023-01-11T23:32:08.3152565Z [ RUN ] TensorTest.TorchTensorCtorMultiDim_CUDA 2023-01-11T23:32:08.3156983Z [ OK ] TensorTest.TorchTensorCtorMultiDim_CUDA (0 ms) 2023-01-11T23:32:08.3157547Z [ RUN ] TensorTest.TorchTensorCtorZeroSizedDim 2023-01-11T23:32:08.3158093Z [ OK ] TensorTest.TorchTensorCtorZeroSizedDim (0 ms) 2023-01-11T23:32:08.3158583Z [ RUN ] TensorTest.TorchTensorCtorWithoutSpecifyingDtype 2023-01-11T23:32:08.3159041Z [ OK ] TensorTest.TorchTensorCtorWithoutSpecifyingDtype (0 ms) 2023-01-11T23:32:08.3159768Z [ RUN ] TensorTest.TorchTensorCtorWithNonDtypeOptions 2023-01-11T23:32:08.3160402Z [ OK ] TensorTest.TorchTensorCtorWithNonDtypeOptions (0 ms) 2023-01-11T23:32:08.3160800Z [ RUN ] TensorTest.Arange 2023-01-11T23:32:08.3161080Z [ OK ] TensorTest.Arange (0 ms) 2023-01-11T23:32:08.3161438Z [ RUN ] TensorTest.PrettyPrintTensorDataContainer 2023-01-11T23:32:08.3161844Z [ OK ] TensorTest.PrettyPrintTensorDataContainer (0 ms) 2023-01-11T23:32:08.3162300Z [ RUN ] TensorTest.TensorDataContainerCallingAccessorOfWrongType 2023-01-11T23:32:08.3225780Z [ OK ] TensorTest.TensorDataContainerCallingAccessorOfWrongType (6 ms) 2023-01-11T23:32:08.3226315Z [ RUN ] TensorTest.FromBlob 2023-01-11T23:32:08.3226703Z [ OK ] TensorTest.FromBlob (0 ms) 2023-01-11T23:32:08.3227087Z [ RUN ] TensorTest.FromBlobUsesDeleter 2023-01-11T23:32:08.3227446Z [ OK ] TensorTest.FromBlobUsesDeleter (0 ms) 2023-01-11T23:32:08.3227785Z [ RUN ] TensorTest.FromBlobWithStrides 2023-01-11T23:32:08.3228127Z [ OK ] TensorTest.FromBlobWithStrides (0 ms) 2023-01-11T23:32:08.3228418Z [ RUN ] TensorTest.Item 2023-01-11T23:32:08.3228700Z [ OK ] TensorTest.Item (0 ms) 2023-01-11T23:32:08.3228981Z [ RUN ] TensorTest.Item_CUDA 2023-01-11T23:32:08.3229264Z [ OK ] TensorTest.Item_CUDA (0 ms) 2023-01-11T23:32:08.3234423Z [ RUN ] TensorTest.DataPtr 2023-01-11T23:32:08.3234859Z [ OK ] TensorTest.DataPtr (0 ms) 2023-01-11T23:32:08.3235149Z [ RUN ] TensorTest.Data 2023-01-11T23:32:08.3235443Z [ OK ] TensorTest.Data (0 ms) 2023-01-11T23:32:08.3235736Z [ RUN ] TensorTest.BackwardAndGrad 2023-01-11T23:32:08.3236058Z [ OK ] TensorTest.BackwardAndGrad (0 ms) 2023-01-11T23:32:08.3236407Z [ RUN ] TensorTest.BackwardCreatesOnesGrad 2023-01-11T23:32:08.3236768Z [ OK ] TensorTest.BackwardCreatesOnesGrad (0 ms) 2023-01-11T23:32:08.3237123Z [ RUN ] TensorTest.BackwardNonScalarOutputs 2023-01-11T23:32:08.3262106Z [ OK ] TensorTest.BackwardNonScalarOutputs (3 ms) 2023-01-11T23:32:08.3263323Z [ RUN ] TensorTest.IsLeaf 2023-01-11T23:32:08.3263783Z [ OK ] TensorTest.IsLeaf (0 ms) 2023-01-11T23:32:08.3264204Z [ RUN ] TensorTest.OutputNr 2023-01-11T23:32:08.3264602Z [ OK ] TensorTest.OutputNr (0 ms) 2023-01-11T23:32:08.3274531Z [ RUN ] TensorTest.Version 2023-01-11T23:32:08.3274822Z [ OK ] TensorTest.Version (0 ms) 2023-01-11T23:32:08.3275100Z [ RUN ] TensorTest.Detach 2023-01-11T23:32:08.3275379Z [ OK ] TensorTest.Detach (0 ms) 2023-01-11T23:32:08.3275663Z [ RUN ] TensorTest.DetachInplace 2023-01-11T23:32:08.3275975Z [ OK ] TensorTest.DetachInplace (0 ms) 2023-01-11T23:32:08.3276271Z [ RUN ] TensorTest.SetData 2023-01-11T23:32:08.3276550Z [ OK ] TensorTest.SetData (0 ms) 2023-01-11T23:32:08.3276862Z [ RUN ] TensorTest.RequiresGradInplace 2023-01-11T23:32:08.3287847Z [ OK ] TensorTest.RequiresGradInplace (2 ms) 2023-01-11T23:32:08.3288225Z [ RUN ] TensorTest.StdDimension 2023-01-11T23:32:08.3288585Z [ OK ] TensorTest.StdDimension (0 ms) 2023-01-11T23:32:08.3288902Z [ RUN ] TensorTest.ReshapeAlias 
2023-01-11T23:32:08.3291226Z [ OK ] TensorTest.ReshapeAlias (0 ms) 2023-01-11T23:32:08.3291921Z [----------] 49 tests from TensorTest (173 ms total) 2023-01-11T23:32:08.3292109Z 2023-01-11T23:32:08.3292547Z [----------] 40 tests from TensorIndexingTest 2023-01-11T23:32:08.3292863Z [ RUN ] TensorIndexingTest.Slice 2023-01-11T23:32:08.3293296Z [ OK ] TensorIndexingTest.Slice (0 ms) 2023-01-11T23:32:08.3293653Z [ RUN ] TensorIndexingTest.TensorIndex 2023-01-11T23:32:08.3304668Z [ OK ] TensorIndexingTest.TensorIndex (1 ms) 2023-01-11T23:32:08.3305026Z [ RUN ] TensorIndexingTest.TestNoIndices 2023-01-11T23:32:08.3394733Z [ OK ] TensorIndexingTest.TestNoIndices (8 ms) 2023-01-11T23:32:08.3395682Z [ RUN ] TensorIndexingTest.TestAdvancedIndexingWithListOfTensor 2023-01-11T23:32:08.3396698Z [ OK ] TensorIndexingTest.TestAdvancedIndexingWithListOfTensor (0 ms) 2023-01-11T23:32:08.3397523Z [ RUN ] TensorIndexingTest.TestSingleInt 2023-01-11T23:32:08.3398211Z [ OK ] TensorIndexingTest.TestSingleInt (0 ms) 2023-01-11T23:32:08.3398927Z [ RUN ] TensorIndexingTest.TestMultipleInt 2023-01-11T23:32:08.3399645Z [ OK ] TensorIndexingTest.TestMultipleInt (0 ms) 2023-01-11T23:32:08.3400319Z [ RUN ] TensorIndexingTest.TestNone 2023-01-11T23:32:08.3400951Z [ OK ] TensorIndexingTest.TestNone (0 ms) 2023-01-11T23:32:08.3401582Z [ RUN ] TensorIndexingTest.TestStep 2023-01-11T23:32:08.3402351Z [ OK ] TensorIndexingTest.TestStep (0 ms) 2023-01-11T23:32:08.3402862Z [ RUN ] TensorIndexingTest.TestStepAssignment 2023-01-11T23:32:08.3403372Z [ OK ] TensorIndexingTest.TestStepAssignment (0 ms) 2023-01-11T23:32:08.3403835Z [ RUN ] TensorIndexingTest.TestBoolIndices 2023-01-11T23:32:08.3404187Z [ OK ] TensorIndexingTest.TestBoolIndices (0 ms) 2023-01-11T23:32:08.3404684Z [ RUN ] TensorIndexingTest.TestBoolIndicesAccumulate 2023-01-11T23:32:08.3405252Z [ OK ] TensorIndexingTest.TestBoolIndicesAccumulate (0 ms) 2023-01-11T23:32:08.3405768Z [ RUN ] TensorIndexingTest.TestMultipleBoolIndices 2023-01-11T23:32:08.3406251Z [ OK ] TensorIndexingTest.TestMultipleBoolIndices (0 ms) 2023-01-11T23:32:08.3406764Z [ RUN ] TensorIndexingTest.TestByteMask 2023-01-11T23:32:08.3407136Z [ OK ] TensorIndexingTest.TestByteMask (0 ms) 2023-01-11T23:32:08.3407493Z [ RUN ] TensorIndexingTest.TestByteMaskAccumulate 2023-01-11T23:32:08.3407888Z [ OK ] TensorIndexingTest.TestByteMaskAccumulate (0 ms) 2023-01-11T23:32:08.3408275Z [ RUN ] TensorIndexingTest.TestMultipleByteMask 2023-01-11T23:32:08.3408658Z [ OK ] TensorIndexingTest.TestMultipleByteMask (0 ms) 2023-01-11T23:32:08.3409020Z [ RUN ] TensorIndexingTest.TestByteMask2d 2023-01-11T23:32:08.3409383Z [ OK ] TensorIndexingTest.TestByteMask2d (0 ms) 2023-01-11T23:32:08.3409727Z [ RUN ] TensorIndexingTest.TestIntIndices 2023-01-11T23:32:08.3410069Z [ OK ] TensorIndexingTest.TestIntIndices (0 ms) 2023-01-11T23:32:08.3410426Z [ RUN ] TensorIndexingTest.TestIntIndices2d 2023-01-11T23:32:08.3410786Z [ OK ] TensorIndexingTest.TestIntIndices2d (0 ms) 2023-01-11T23:32:08.3411155Z [ RUN ] TensorIndexingTest.TestIntIndicesBroadcast 2023-01-11T23:32:08.3411557Z [ OK ] TensorIndexingTest.TestIntIndicesBroadcast (0 ms) 2023-01-11T23:32:08.3411930Z [ RUN ] TensorIndexingTest.TestEmptyIndex 2023-01-11T23:32:08.3412282Z [ OK ] TensorIndexingTest.TestEmptyIndex (0 ms) 2023-01-11T23:32:08.3412659Z [ RUN ] TensorIndexingTest.TestEmptyNdimIndex 2023-01-11T23:32:08.3466725Z [ OK ] TensorIndexingTest.TestEmptyNdimIndex (5 ms) 2023-01-11T23:32:08.3467317Z [ RUN ] TensorIndexingTest.TestEmptyNdimIndex_CUDA 
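The TensorTest suite summarized above covers core tensor construction and conversion in the C++ API. A small sketch of the ctor/to/from_blob/item paths it asserts:

    #include <torch/torch.h>

    int main() {
      torch::Tensor a = torch::tensor({{1, 2}, {3, 4}});  // 2x2, int64
      torch::Tensor b = a.to(torch::kFloat32);            // cf. TensorTest.ToDtype
      float data[] = {1.f, 2.f, 3.f};
      torch::Tensor c = torch::from_blob(data, {3});      // wraps memory, no copy
      float first = c[0].item<float>();                   // cf. TensorTest.Item
    }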
2023-01-11T23:32:08.3467752Z [ OK ] TensorIndexingTest.TestEmptyNdimIndex_CUDA (0 ms) 2023-01-11T23:32:08.3468150Z [ RUN ] TensorIndexingTest.TestEmptyNdimIndexBool 2023-01-11T23:32:08.3487653Z [ OK ] TensorIndexingTest.TestEmptyNdimIndexBool (2 ms) 2023-01-11T23:32:08.3488228Z [ RUN ] TensorIndexingTest.TestEmptyNdimIndexBool_CUDA 2023-01-11T23:32:08.3509176Z [ OK ] TensorIndexingTest.TestEmptyNdimIndexBool_CUDA (2 ms) 2023-01-11T23:32:08.3510402Z [ RUN ] TensorIndexingTest.TestEmptySlice 2023-01-11T23:32:08.3511151Z [ OK ] TensorIndexingTest.TestEmptySlice (0 ms) 2023-01-11T23:32:08.3511866Z [ RUN ] TensorIndexingTest.TestEmptySlice_CUDA 2023-01-11T23:32:08.3512584Z [ OK ] TensorIndexingTest.TestEmptySlice_CUDA (0 ms) 2023-01-11T23:32:08.3513124Z [ RUN ] TensorIndexingTest.TestIndexGetitemCopyBoolsSlices 2023-01-11T23:32:08.3513574Z [ OK ] TensorIndexingTest.TestIndexGetitemCopyBoolsSlices (0 ms) 2023-01-11T23:32:08.3514009Z [ RUN ] TensorIndexingTest.TestIndexSetitemBoolsSlices 2023-01-11T23:32:08.3601685Z [ OK ] TensorIndexingTest.TestIndexSetitemBoolsSlices (9 ms) 2023-01-11T23:32:08.3602798Z [ RUN ] TensorIndexingTest.TestIndexScalarWithBoolMask 2023-01-11T23:32:08.3603286Z [ OK ] TensorIndexingTest.TestIndexScalarWithBoolMask (0 ms) 2023-01-11T23:32:08.3603735Z [ RUN ] TensorIndexingTest.TestIndexScalarWithBoolMask_CUDA 2023-01-11T23:32:08.3607547Z [ OK ] TensorIndexingTest.TestIndexScalarWithBoolMask_CUDA (0 ms) 2023-01-11T23:32:08.3607997Z [ RUN ] TensorIndexingTest.TestSetitemExpansionError 2023-01-11T23:32:08.3727703Z [ OK ] TensorIndexingTest.TestSetitemExpansionError (11 ms) 2023-01-11T23:32:08.3728283Z [ RUN ] TensorIndexingTest.TestGetitemScalars 2023-01-11T23:32:08.3839039Z [ OK ] TensorIndexingTest.TestGetitemScalars (10 ms) 2023-01-11T23:32:08.3839816Z [ RUN ] TensorIndexingTest.TestSetitemScalars 2023-01-11T23:32:08.3946927Z [ OK ] TensorIndexingTest.TestSetitemScalars (10 ms) 2023-01-11T23:32:08.3947344Z [ RUN ] TensorIndexingTest.TestBasicAdvancedCombined 2023-01-11T23:32:08.3947759Z [ OK ] TensorIndexingTest.TestBasicAdvancedCombined (0 ms) 2023-01-11T23:32:08.3948155Z [ RUN ] TensorIndexingTest.TestIntAssignment 2023-01-11T23:32:08.3948532Z [ OK ] TensorIndexingTest.TestIntAssignment (0 ms) 2023-01-11T23:32:08.3948918Z [ RUN ] TensorIndexingTest.TestByteTensorAssignment 2023-01-11T23:32:08.3949531Z [ OK ] TensorIndexingTest.TestByteTensorAssignment (0 ms) 2023-01-11T23:32:08.3949983Z [ RUN ] TensorIndexingTest.TestVariableSlicing 2023-01-11T23:32:08.3950386Z [ OK ] TensorIndexingTest.TestVariableSlicing (0 ms) 2023-01-11T23:32:08.3950764Z [ RUN ] TensorIndexingTest.TestEllipsisTensor 2023-01-11T23:32:08.3952979Z [ OK ] TensorIndexingTest.TestEllipsisTensor (0 ms) 2023-01-11T23:32:08.3953363Z [ RUN ] TensorIndexingTest.TestOutOfBoundIndex 2023-01-11T23:32:08.4051402Z [ OK ] TensorIndexingTest.TestOutOfBoundIndex (9 ms) 2023-01-11T23:32:08.4051793Z [ RUN ] TensorIndexingTest.TestZeroDimIndex 2023-01-11T23:32:08.4073394Z [ OK ] TensorIndexingTest.TestZeroDimIndex (2 ms) 2023-01-11T23:32:08.4074213Z [----------] 40 tests from TensorIndexingTest (78 ms total) 2023-01-11T23:32:08.4074727Z 2023-01-11T23:32:08.4075125Z [----------] 18 tests from NumpyTests 2023-01-11T23:32:08.4075714Z [ RUN ] NumpyTests.TestNoneIndex 2023-01-11T23:32:08.4076346Z [ OK ] NumpyTests.TestNoneIndex (0 ms) 2023-01-11T23:32:08.4077006Z [ RUN ] NumpyTests.TestEmptyFancyIndex 2023-01-11T23:32:08.4126491Z [ OK ] NumpyTests.TestEmptyFancyIndex (5 ms) 2023-01-11T23:32:08.4126987Z [ RUN ] 
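TensorIndexingTest above checks the C++ equivalents of Python's __getitem__/__setitem__. A sketch of the Slice/Ellipsis API the suite drives:

    #include <torch/torch.h>

    int main() {
      using namespace torch::indexing;
      torch::Tensor t = torch::arange(12).reshape({3, 4});
      // Python: t[0:2, ::2]
      torch::Tensor s = t.index({Slice(0, 2), Slice(None, None, 2)});
      // Python: t[..., -1] = 0
      t.index_put_({Ellipsis, -1}, 0);
    }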
NumpyTests.TestEllipsisIndex 2023-01-11T23:32:08.4127474Z [ OK ] NumpyTests.TestEllipsisIndex (0 ms) 2023-01-11T23:32:08.4127988Z [ RUN ] NumpyTests.TestSingleIntIndex 2023-01-11T23:32:08.4152736Z [ OK ] NumpyTests.TestSingleIntIndex (2 ms) 2023-01-11T23:32:08.4153766Z [ RUN ] NumpyTests.TestSingleBoolIndex 2023-01-11T23:32:08.4154718Z [ OK ] NumpyTests.TestSingleBoolIndex (0 ms) 2023-01-11T23:32:08.4155447Z [ RUN ] NumpyTests.TestBooleanShapeMismatch 2023-01-11T23:32:08.4364317Z [ OK ] NumpyTests.TestBooleanShapeMismatch (21 ms) 2023-01-11T23:32:08.4364864Z [ RUN ] NumpyTests.TestBooleanIndexingOnedim 2023-01-11T23:32:08.4365377Z [ OK ] NumpyTests.TestBooleanIndexingOnedim (0 ms) 2023-01-11T23:32:08.4365788Z [ RUN ] NumpyTests.TestBooleanAssignmentValueMismatch 2023-01-11T23:32:08.4518214Z [ OK ] NumpyTests.TestBooleanAssignmentValueMismatch (15 ms) 2023-01-11T23:32:08.4518779Z [ RUN ] NumpyTests.TestBooleanIndexingTwodim 2023-01-11T23:32:08.4519265Z [ OK ] NumpyTests.TestBooleanIndexingTwodim (0 ms) 2023-01-11T23:32:08.4519644Z [ RUN ] NumpyTests.TestBooleanIndexingWeirdness 2023-01-11T23:32:08.4631597Z [ OK ] NumpyTests.TestBooleanIndexingWeirdness (11 ms) 2023-01-11T23:32:08.4633143Z [ RUN ] NumpyTests.TestBooleanIndexingWeirdnessTensors 2023-01-11T23:32:08.4743918Z [ OK ] NumpyTests.TestBooleanIndexingWeirdnessTensors (11 ms) 2023-01-11T23:32:08.4744566Z [ RUN ] NumpyTests.TestBooleanIndexingAlldims 2023-01-11T23:32:08.4745243Z [ OK ] NumpyTests.TestBooleanIndexingAlldims (0 ms) 2023-01-11T23:32:08.4745892Z [ RUN ] NumpyTests.TestBooleanListIndexing 2023-01-11T23:32:08.4746276Z [ OK ] NumpyTests.TestBooleanListIndexing (0 ms) 2023-01-11T23:32:08.4746653Z [ RUN ] NumpyTests.TestEverythingReturnsViews 2023-01-11T23:32:08.4747028Z [ OK ] NumpyTests.TestEverythingReturnsViews (0 ms) 2023-01-11T23:32:08.4747396Z [ RUN ] NumpyTests.TestBroaderrorsIndexing 2023-01-11T23:32:08.4964507Z [ OK ] NumpyTests.TestBroaderrorsIndexing (21 ms) 2023-01-11T23:32:08.4964904Z [ RUN ] NumpyTests.TestTrivialFancyOutOfBounds 2023-01-11T23:32:08.5238842Z [ OK ] NumpyTests.TestTrivialFancyOutOfBounds (27 ms) 2023-01-11T23:32:08.5239637Z [ RUN ] NumpyTests.TestIndexIsLarger 2023-01-11T23:32:08.5240416Z [ OK ] NumpyTests.TestIndexIsLarger (0 ms) 2023-01-11T23:32:08.5241221Z [ RUN ] NumpyTests.TestBroadcastSubspace 2023-01-11T23:32:08.5242087Z [ OK ] NumpyTests.TestBroadcastSubspace (0 ms) 2023-01-11T23:32:08.5242827Z [----------] 18 tests from NumpyTests (116 ms total) 2023-01-11T23:32:08.5243159Z 2023-01-11T23:32:08.5243431Z [----------] 6 tests from TensorOptionsTest 2023-01-11T23:32:08.5243851Z [ RUN ] TensorOptionsTest.ConstructsWellFromCUDATypes_CUDA 2023-01-11T23:32:08.5244358Z [ OK ] TensorOptionsTest.ConstructsWellFromCUDATypes_CUDA (0 ms) 2023-01-11T23:32:08.5244832Z [ RUN ] TensorOptionsTest.DefaultsToTheRightValues 2023-01-11T23:32:08.5245298Z [ OK ] TensorOptionsTest.DefaultsToTheRightValues (0 ms) 2023-01-11T23:32:08.5245819Z [ RUN ] TensorOptionsTest.UtilityFunctionsReturnTheRightTensorOptions 2023-01-11T23:32:08.5246408Z [ OK ] TensorOptionsTest.UtilityFunctionsReturnTheRightTensorOptions (0 ms) 2023-01-11T23:32:08.5246930Z [ RUN ] TensorOptionsTest.ConstructsWellFromCPUTypes 2023-01-11T23:32:08.5247398Z [ OK ] TensorOptionsTest.ConstructsWellFromCPUTypes (0 ms) 2023-01-11T23:32:08.5247876Z [ RUN ] TensorOptionsTest.ConstructsWellFromCPUTensors 2023-01-11T23:32:08.5248360Z [ OK ] TensorOptionsTest.ConstructsWellFromCPUTensors (0 ms) 2023-01-11T23:32:08.5248825Z [ RUN ] 
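The NumpyTests above port NumPy's advanced-indexing test cases to the C++ API; boolean masks are the recurring theme. A sketch:

    #include <torch/torch.h>

    int main() {
      torch::Tensor x = torch::tensor({1.0, -2.0, 3.0, -4.0});
      torch::Tensor mask = x < 0;
      torch::Tensor negatives = x.index({mask});  // gathers -2, -4
      x.index_put_({mask}, 0.0);                  // zeroes them in place
    }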
TensorOptionsTest.ConstructsWellFromVariables 2023-01-11T23:32:08.5249407Z [ OK ] TensorOptionsTest.ConstructsWellFromVariables (0 ms) 2023-01-11T23:32:08.5249861Z [----------] 6 tests from TensorOptionsTest (0 ms total) 2023-01-11T23:32:08.5250051Z 2023-01-11T23:32:08.5250222Z [----------] 1 test from DeviceTest 2023-01-11T23:32:08.5250588Z [ RUN ] DeviceTest.ParsesCorrectlyFromString 2023-01-11T23:32:08.5326407Z [ OK ] DeviceTest.ParsesCorrectlyFromString (8 ms) 2023-01-11T23:32:08.5326870Z [----------] 1 test from DeviceTest (8 ms total) 2023-01-11T23:32:08.5327050Z 2023-01-11T23:32:08.5327244Z [----------] 3 tests from DefaultDtypeTest 2023-01-11T23:32:08.5327641Z [ RUN ] DefaultDtypeTest.CanSetAndGetDefaultDtype 2023-01-11T23:32:08.5328111Z [ OK ] DefaultDtypeTest.CanSetAndGetDefaultDtype (0 ms) 2023-01-11T23:32:08.5328588Z [ RUN ] DefaultDtypeTest.NewTensorOptionsHasCorrectDefault 2023-01-11T23:32:08.5329092Z [ OK ] DefaultDtypeTest.NewTensorOptionsHasCorrectDefault (0 ms) 2023-01-11T23:32:08.5329595Z [ RUN ] DefaultDtypeTest.NewTensorsHaveCorrectDefaultDtype 2023-01-11T23:32:08.5330101Z [ OK ] DefaultDtypeTest.NewTensorsHaveCorrectDefaultDtype (0 ms) 2023-01-11T23:32:08.5330553Z [----------] 3 tests from DefaultDtypeTest (0 ms total) 2023-01-11T23:32:08.5330733Z 2023-01-11T23:32:08.5330924Z [----------] 1 test from TorchIncludeTest 2023-01-11T23:32:08.5331292Z [ RUN ] TorchIncludeTest.GetSetNumThreads 2023-01-11T23:32:08.5539938Z [ OK ] TorchIncludeTest.GetSetNumThreads (21 ms) 2023-01-11T23:32:08.5540424Z [----------] 1 test from TorchIncludeTest (21 ms total) 2023-01-11T23:32:08.5540587Z 2023-01-11T23:32:08.5540758Z [----------] 28 tests from InferenceModeTest 2023-01-11T23:32:08.5541085Z [ RUN ] InferenceModeTest.TestTLSState 2023-01-11T23:32:08.5541436Z [ OK ] InferenceModeTest.TestTLSState (0 ms) 2023-01-11T23:32:08.5541819Z [ RUN ] InferenceModeTest.TestInferenceTensorCreation 2023-01-11T23:32:08.5542247Z [ OK ] InferenceModeTest.TestInferenceTensorCreation (0 ms) 2023-01-11T23:32:08.5542676Z [ RUN ] InferenceModeTest.TestExistingAutogradSession 2023-01-11T23:32:08.5612080Z [ OK ] InferenceModeTest.TestExistingAutogradSession (7 ms) 2023-01-11T23:32:08.5612579Z [ RUN ] InferenceModeTest.TestInferenceTensorInInferenceModeFunctionalOp 2023-01-11T23:32:08.5613141Z [ OK ] InferenceModeTest.TestInferenceTensorInInferenceModeFunctionalOp (0 ms) 2023-01-11T23:32:08.5613674Z [ RUN ] InferenceModeTest.TestInferenceTensorInInferenceModeInplaceOp 2023-01-11T23:32:08.5614198Z [ OK ] InferenceModeTest.TestInferenceTensorInInferenceModeInplaceOp (0 ms) 2023-01-11T23:32:08.5614966Z [ RUN ] InferenceModeTest.TestInferenceTensorInInferenceModeViewOp 2023-01-11T23:32:08.5615489Z [ OK ] InferenceModeTest.TestInferenceTensorInInferenceModeViewOp (0 ms) 2023-01-11T23:32:08.5616006Z [ RUN ] InferenceModeTest.TestInferenceTensorInNormalModeFunctionalOp 2023-01-11T23:32:08.5616526Z [ OK ] InferenceModeTest.TestInferenceTensorInNormalModeFunctionalOp (0 ms) 2023-01-11T23:32:08.5617043Z [ RUN ] InferenceModeTest.TestInferenceTensorInNormalModeInplaceOp 2023-01-11T23:32:08.5667552Z [ OK ] InferenceModeTest.TestInferenceTensorInNormalModeInplaceOp (5 ms) 2023-01-11T23:32:08.5668058Z [ RUN ] InferenceModeTest.TestInferenceTensorInNormalModeViewOp 2023-01-11T23:32:08.5668544Z [ OK ] InferenceModeTest.TestInferenceTensorInNormalModeViewOp (0 ms) 2023-01-11T23:32:08.5669051Z [ RUN ] InferenceModeTest.TestNormalTensorInplaceOutputInInferenceMode 2023-01-11T23:32:08.5669663Z [ OK ] 
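The TensorOptionsTest and DeviceTest suites above cover option bundling and device-string parsing. A sketch:

    #include <torch/torch.h>

    int main() {
      torch::Device device("cuda:0");   // cf. DeviceTest.ParsesCorrectlyFromString
      auto opts = torch::TensorOptions()
                      .dtype(torch::kFloat64)
                      .device(device)
                      .requires_grad(true);
      torch::Tensor t = torch::zeros({2, 3}, opts);
    }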
InferenceModeTest.TestNormalTensorInplaceOutputInInferenceMode (0 ms) 2023-01-11T23:32:08.5670254Z [ RUN ] InferenceModeTest.TestNormalTensorInplaceOutputInNormalMode 2023-01-11T23:32:08.5670756Z [ OK ] InferenceModeTest.TestNormalTensorInplaceOutputInNormalMode (0 ms) 2023-01-11T23:32:08.5671257Z [ RUN ] InferenceModeTest.TestNormalTensorViewOutputInInferenceMode 2023-01-11T23:32:08.5671768Z [ OK ] InferenceModeTest.TestNormalTensorViewOutputInInferenceMode (0 ms) 2023-01-11T23:32:08.5672255Z [ RUN ] InferenceModeTest.TestNormalTensorViewOutputInNormalMode 2023-01-11T23:32:08.5702183Z [ OK ] InferenceModeTest.TestNormalTensorViewOutputInNormalMode (3 ms) 2023-01-11T23:32:08.5702731Z [ RUN ] InferenceModeTest.TestMixInferenceAndNormalTensorFunctionalOp 2023-01-11T23:32:08.5733874Z [ OK ] InferenceModeTest.TestMixInferenceAndNormalTensorFunctionalOp (3 ms) 2023-01-11T23:32:08.5734392Z [ RUN ] InferenceModeTest.TestMixInferenceAndNormalTensorInplaceOp 2023-01-11T23:32:08.5817646Z [ OK ] InferenceModeTest.TestMixInferenceAndNormalTensorInplaceOp (8 ms) 2023-01-11T23:32:08.5818155Z [ RUN ] InferenceModeTest.TestMixInferenceAndNormalTensorViewOp 2023-01-11T23:32:08.5818637Z [ OK ] InferenceModeTest.TestMixInferenceAndNormalTensorViewOp (0 ms) 2023-01-11T23:32:08.5819078Z [ RUN ] InferenceModeTest.TestHandleDirectViewOnRebase 2023-01-11T23:32:08.5852067Z [ OK ] InferenceModeTest.TestHandleDirectViewOnRebase (3 ms) 2023-01-11T23:32:08.5852507Z [ RUN ] InferenceModeTest.TestHandleInDirectViewOnRebase 2023-01-11T23:32:08.5874364Z [ OK ] InferenceModeTest.TestHandleInDirectViewOnRebase (2 ms) 2023-01-11T23:32:08.5874800Z [ RUN ] InferenceModeTest.TestCreationMetaPropagation 2023-01-11T23:32:08.5936936Z [ OK ] InferenceModeTest.TestCreationMetaPropagation (6 ms) 2023-01-11T23:32:08.5937392Z [ RUN ] InferenceModeTest.TestCreationMetaPropagationInput 2023-01-11T23:32:08.6061709Z [ OK ] InferenceModeTest.TestCreationMetaPropagationInput (12 ms) 2023-01-11T23:32:08.6062172Z [ RUN ] InferenceModeTest.TestInplaceCopyOnInferenceTensor 2023-01-11T23:32:08.6134869Z [ OK ] InferenceModeTest.TestInplaceCopyOnInferenceTensor (7 ms) 2023-01-11T23:32:08.6135325Z [ RUN ] InferenceModeTest.TestSetRequiresGradInNormalMode 2023-01-11T23:32:08.6145524Z [ OK ] InferenceModeTest.TestSetRequiresGradInNormalMode (1 ms) 2023-01-11T23:32:08.6146364Z [ RUN ] InferenceModeTest.TestAccessVersionCounter 2023-01-11T23:32:08.6179828Z [ OK ] InferenceModeTest.TestAccessVersionCounter (3 ms) 2023-01-11T23:32:08.6180350Z [ RUN ] InferenceModeTest.TestInplaceUpdateInferenceTensorWithNormalTensor 2023-01-11T23:32:08.6252291Z [ OK ] InferenceModeTest.TestInplaceUpdateInferenceTensorWithNormalTensor (7 ms) 2023-01-11T23:32:08.6252975Z [ RUN ] InferenceModeTest.TestComplexViewInInferenceMode 2023-01-11T23:32:08.6253548Z [ OK ] InferenceModeTest.TestComplexViewInInferenceMode (0 ms) 2023-01-11T23:32:08.6254003Z [ RUN ] InferenceModeTest.TestComplexViewInNormalMode 2023-01-11T23:32:08.6254685Z [ OK ] InferenceModeTest.TestComplexViewInNormalMode (0 ms) 2023-01-11T23:32:08.6255095Z [ RUN ] InferenceModeTest.TestCustomFunction 2023-01-11T23:32:08.6255464Z [ OK ] InferenceModeTest.TestCustomFunction (0 ms) 2023-01-11T23:32:08.6255897Z [ RUN ] InferenceModeTest.TestLegacyAutoNonVariableTypeModeWarning 2023-01-11T23:32:08.6256401Z [ OK ] InferenceModeTest.TestLegacyAutoNonVariableTypeModeWarning (0 ms) 2023-01-11T23:32:08.6256846Z [----------] 28 tests from InferenceModeTest (71 ms total) 2023-01-11T23:32:08.6257020Z 2023-01-11T23:32:08.6257396Z 
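InferenceModeTest above exercises c10::InferenceMode, which disables autograd bookkeeping more aggressively than NoGradGuard (no version counters, no view tracking), hence the suite's many inplace/view edge cases. A sketch:

    #include <torch/torch.h>

    int main() {
      torch::Tensor x = torch::ones({2, 2}, torch::requires_grad());
      {
        c10::InferenceMode guard;
        torch::Tensor y = x * 2;  // y is an "inference tensor"
        TORCH_CHECK(!y.requires_grad());
      }
    }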
[----------] 4 tests from GradModeTest 2023-01-11T23:32:08.6257739Z [ RUN ] GradModeTest.TestRequiresGradFunctionalOp 2023-01-11T23:32:08.6258138Z [ OK ] GradModeTest.TestRequiresGradFunctionalOp (0 ms) 2023-01-11T23:32:08.6258527Z [ RUN ] GradModeTest.TestRequiresGradInplaceOp 2023-01-11T23:32:08.6258901Z [ OK ] GradModeTest.TestRequiresGradInplaceOp (0 ms) 2023-01-11T23:32:08.6259271Z [ RUN ] GradModeTest.TestRequiresGradViewOp 2023-01-11T23:32:08.6259637Z [ OK ] GradModeTest.TestRequiresGradViewOp (0 ms) 2023-01-11T23:32:08.6260017Z [ RUN ] GradModeTest.TestRequiresGradViewOpExiting 2023-01-11T23:32:08.6286930Z [ OK ] GradModeTest.TestRequiresGradViewOpExiting (3 ms) 2023-01-11T23:32:08.6287342Z [----------] 4 tests from GradModeTest (3 ms total) 2023-01-11T23:32:08.6287510Z 2023-01-11T23:32:08.6287671Z [----------] 3 tests from OperationTest 2023-01-11T23:32:08.6287964Z [ RUN ] OperationTest.Lerp 2023-01-11T23:32:08.6295953Z [ OK ] OperationTest.Lerp (0 ms) 2023-01-11T23:32:08.6296321Z [ RUN ] OperationTest.Cross 2023-01-11T23:32:08.6335885Z [ OK ] OperationTest.Cross (3 ms) 2023-01-11T23:32:08.6336190Z [ RUN ] OperationTest.Linear_out 2023-01-11T23:32:08.6339557Z [ OK ] OperationTest.Linear_out (0 ms) 2023-01-11T23:32:08.6339930Z [----------] 3 tests from OperationTest (5 ms total) 2023-01-11T23:32:08.6340114Z 2023-01-11T23:32:08.6340334Z [----------] 1 test from ParallelTest 2023-01-11T23:32:08.6340877Z [ RUN ] ParallelTest.DataParallelUsesAllAvailableCUDADevices_CUDA 2023-01-11T23:32:08.6341369Z [ OK ] ParallelTest.DataParallelUsesAllAvailableCUDADevices_CUDA (0 ms) 2023-01-11T23:32:08.6341783Z [----------] 1 test from ParallelTest (0 ms total) 2023-01-11T23:32:08.6341948Z 2023-01-11T23:32:08.6342118Z [----------] Global test environment tear-down 2023-01-11T23:32:08.6472821Z [==========] 1035 tests from 49 test suites ran. (66641 ms total) 2023-01-11T23:32:08.6473122Z [ PASSED ] 1035 tests. 2023-01-11T23:32:09.3135735Z + /opt/conda/lib/python3.10/site-packages/torch/bin/test_tensorexpr --gtest_output=xml:test/test-reports/cpp-unittest/test_libtorch/test_tensorexpr.xml 2023-01-11T23:32:09.7148816Z Only one CUDA device detected. Disabling MultiCUDA tests 2023-01-11T23:32:09.7152503Z Note: Google Test filter = *-*_MultiCUDA 2023-01-11T23:32:09.7152892Z [==========] Running 829 tests from 26 test suites. 2023-01-11T23:32:09.7153255Z [----------] Global test environment set-up. 
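The GradModeTest suite above (closing out the first binary's 1035 tests) checks thread-local grad-mode toggling across functional, inplace, and view ops. A sketch of the guard being flipped:

    #include <torch/torch.h>

    int main() {
      torch::Tensor w = torch::ones({2}, torch::requires_grad());
      {
        torch::NoGradGuard no_grad;   // grad mode off in this scope
        torch::Tensor y = w * 2;
        TORCH_CHECK(!y.requires_grad());
      }
      TORCH_CHECK(at::GradMode::is_enabled());  // restored on scope exit
    }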
2023-01-11T23:32:09.7153580Z [----------] 1 test from Approx 2023-01-11T23:32:09.7153849Z [ RUN ] Approx.log_vml 2023-01-11T23:32:10.5153636Z [ OK ] Approx.log_vml (799 ms) 2023-01-11T23:32:10.5154085Z [----------] 1 test from Approx (799 ms total) 2023-01-11T23:32:10.5154310Z 2023-01-11T23:32:10.5154479Z [----------] 34 tests from ATen 2023-01-11T23:32:10.5154785Z [ RUN ] ATen._cast_Float 2023-01-11T23:32:10.5155088Z [ OK ] ATen._cast_Float (0 ms) 2023-01-11T23:32:10.5155385Z [ RUN ] ATen.negInt 2023-01-11T23:32:10.5158140Z [ OK ] ATen.negInt (0 ms) 2023-01-11T23:32:10.5158436Z [ RUN ] ATen.negFloat 2023-01-11T23:32:10.5162217Z [ OK ] ATen.negFloat (0 ms) 2023-01-11T23:32:10.5162521Z [ RUN ] ATen.addInt 2023-01-11T23:32:10.5168655Z [ OK ] ATen.addInt (0 ms) 2023-01-11T23:32:10.5168953Z [ RUN ] ATen.addFloat 2023-01-11T23:32:10.5176037Z [ OK ] ATen.addFloat (0 ms) 2023-01-11T23:32:10.5176343Z [ RUN ] ATen.subInt 2023-01-11T23:32:10.5182891Z [ OK ] ATen.subInt (0 ms) 2023-01-11T23:32:10.5183522Z [ RUN ] ATen.subFloat 2023-01-11T23:32:10.5190920Z [ OK ] ATen.subFloat (0 ms) 2023-01-11T23:32:10.5191345Z [ RUN ] ATen.lerp 2023-01-11T23:32:10.5199349Z [ OK ] ATen.lerp (0 ms) 2023-01-11T23:32:10.5199766Z [ RUN ] ATen.addcmulInt 2023-01-11T23:32:10.5208129Z [ OK ] ATen.addcmulInt (0 ms) 2023-01-11T23:32:10.5208577Z [ RUN ] ATen.addcmulFloat 2023-01-11T23:32:10.5217291Z [ OK ] ATen.addcmulFloat (0 ms) 2023-01-11T23:32:10.5217713Z [ RUN ] ATen.mulInt 2023-01-11T23:32:10.5222577Z [ OK ] ATen.mulInt (0 ms) 2023-01-11T23:32:10.5222991Z [ RUN ] ATen.mulFloat 2023-01-11T23:32:10.5227954Z [ OK ] ATen.mulFloat (0 ms) 2023-01-11T23:32:10.5228350Z [ RUN ] ATen.divInt 2023-01-11T23:32:10.5233994Z [ OK ] ATen.divInt (0 ms) 2023-01-11T23:32:10.5234408Z [ RUN ] ATen.divFloat 2023-01-11T23:32:10.5238811Z [ OK ] ATen.divFloat (0 ms) 2023-01-11T23:32:10.5239215Z [ RUN ] ATen.maxInt 2023-01-11T23:32:10.5244295Z [ OK ] ATen.maxInt (0 ms) 2023-01-11T23:32:10.5244702Z [ RUN ] ATen.maxFloat 2023-01-11T23:32:10.5249565Z [ OK ] ATen.maxFloat (0 ms) 2023-01-11T23:32:10.5249977Z [ RUN ] ATen.minInt 2023-01-11T23:32:10.5255325Z [ OK ] ATen.minInt (0 ms) 2023-01-11T23:32:10.5255732Z [ RUN ] ATen.minFloat 2023-01-11T23:32:10.5260548Z [ OK ] ATen.minFloat (0 ms) 2023-01-11T23:32:10.5260968Z [ RUN ] ATen.reluInt 2023-01-11T23:32:10.5264835Z [ OK ] ATen.reluInt (0 ms) 2023-01-11T23:32:10.5265245Z [ RUN ] ATen.reluFloat 2023-01-11T23:32:10.5269183Z [ OK ] ATen.reluFloat (0 ms) 2023-01-11T23:32:10.5269600Z [ RUN ] ATen.logFloat 2023-01-11T23:32:10.5274241Z [ OK ] ATen.logFloat (0 ms) 2023-01-11T23:32:10.5274651Z [ RUN ] ATen.fastLogFloat 2023-01-11T23:32:10.5437138Z [ OK ] ATen.fastLogFloat (16 ms) 2023-01-11T23:32:10.5437588Z [ RUN ] ATen.fastTanhFloat 2023-01-11T23:32:10.5501541Z [ OK ] ATen.fastTanhFloat (6 ms) 2023-01-11T23:32:10.5501999Z [ RUN ] ATen.fastSigmoidFloat 2023-01-11T23:32:10.5583257Z [ OK ] ATen.fastSigmoidFloat (8 ms) 2023-01-11T23:32:10.5583705Z [ RUN ] ATen.log10Float 2023-01-11T23:32:10.5587450Z [ OK ] ATen.log10Float (0 ms) 2023-01-11T23:32:10.5587881Z [ RUN ] ATen.log2Float 2023-01-11T23:32:10.5592731Z [ OK ] ATen.log2Float (0 ms) 2023-01-11T23:32:10.5593156Z [ RUN ] ATen.expFloat 2023-01-11T23:32:10.5595279Z [ OK ] ATen.expFloat (0 ms) 2023-01-11T23:32:10.5595693Z [ RUN ] ATen.erfFloat 2023-01-11T23:32:10.5599731Z [ OK ] ATen.erfFloat (0 ms) 2023-01-11T23:32:10.5600137Z [ RUN ] ATen.cosFloat 2023-01-11T23:32:10.5604055Z [ OK ] ATen.cosFloat (0 ms) 2023-01-11T23:32:10.5604455Z [ RUN ] ATen.eqInt 
2023-01-11T23:32:10.5609821Z [ OK ] ATen.eqInt (0 ms) 2023-01-11T23:32:10.5610208Z [ RUN ] ATen.geInt 2023-01-11T23:32:10.5615796Z [ OK ] ATen.geInt (0 ms) 2023-01-11T23:32:10.5616194Z [ RUN ] ATen.gtInt 2023-01-11T23:32:10.5621478Z [ OK ] ATen.gtInt (0 ms) 2023-01-11T23:32:10.5621863Z [ RUN ] ATen.leInt 2023-01-11T23:32:10.5627204Z [ OK ] ATen.leInt (0 ms) 2023-01-11T23:32:10.5627603Z [ RUN ] ATen.ltInt 2023-01-11T23:32:10.5633794Z [ OK ] ATen.ltInt (0 ms) 2023-01-11T23:32:10.5635033Z [----------] 34 tests from ATen (48 ms total) 2023-01-11T23:32:10.5635236Z 2023-01-11T23:32:10.5635415Z [----------] 26 tests from BoundsInference 2023-01-11T23:32:10.5635727Z [ RUN ] BoundsInference._1 2023-01-11T23:32:10.5636318Z [ OK ] BoundsInference._1 (0 ms) 2023-01-11T23:32:10.5636615Z [ RUN ] BoundsInference._2 2023-01-11T23:32:10.5638555Z [ OK ] BoundsInference._2 (0 ms) 2023-01-11T23:32:10.5638913Z [ RUN ] BoundsInference._3 2023-01-11T23:32:10.5641006Z [ OK ] BoundsInference._3 (0 ms) 2023-01-11T23:32:10.5641362Z [ RUN ] BoundsInference._4 2023-01-11T23:32:10.5646601Z [ OK ] BoundsInference._4 (0 ms) 2023-01-11T23:32:10.5646925Z [ RUN ] BoundsInference._5 2023-01-11T23:32:10.5659767Z [ OK ] BoundsInference._5 (1 ms) 2023-01-11T23:32:10.5660123Z [ RUN ] BoundsInference._6 2023-01-11T23:32:10.5668892Z [ OK ] BoundsInference._6 (0 ms) 2023-01-11T23:32:10.5669225Z [ RUN ] BoundsInference.Adjacent 2023-01-11T23:32:10.5674816Z [ OK ] BoundsInference.Adjacent (0 ms) 2023-01-11T23:32:10.5675231Z [ RUN ] BoundsInference.MultipleTopLoopLoad 2023-01-11T23:32:10.5679913Z [ OK ] BoundsInference.MultipleTopLoopLoad (0 ms) 2023-01-11T23:32:10.5680389Z [ RUN ] BoundsInference.MultipleTopLoopStore 2023-01-11T23:32:10.5684930Z [ OK ] BoundsInference.MultipleTopLoopStore (0 ms) 2023-01-11T23:32:10.5685308Z [ RUN ] BoundsInference.CacheReads 2023-01-11T23:32:10.5712873Z [ OK ] BoundsInference.CacheReads (2 ms) 2023-01-11T23:32:10.5713317Z [ RUN ] BoundsInference.Flattened 2023-01-11T23:32:10.5725096Z [ OK ] BoundsInference.Flattened (1 ms) 2023-01-11T23:32:10.5725688Z [ RUN ] BoundsInference.GetPotentialHazards 2023-01-11T23:32:10.5726106Z [ OK ] BoundsInference.GetPotentialHazards (0 ms) 2023-01-11T23:32:10.5726508Z [ RUN ] BoundsInference.GetPotentialHazardsLoopNoHazard 2023-01-11T23:32:10.5730304Z [ OK ] BoundsInference.GetPotentialHazardsLoopNoHazard (0 ms) 2023-01-11T23:32:10.5730773Z [ RUN ] BoundsInference.GetPotentialHazardsLoopCall 2023-01-11T23:32:10.5735139Z [ OK ] BoundsInference.GetPotentialHazardsLoopCall (0 ms) 2023-01-11T23:32:10.5735603Z [ RUN ] BoundsInference.GetPotentialHazardsLoopSplit 2023-01-11T23:32:10.5747083Z [ OK ] BoundsInference.GetPotentialHazardsLoopSplit (1 ms) 2023-01-11T23:32:10.5747624Z [ RUN ] BoundsInference.HasConflictingOverlapSameBufferWithPartialOverlap 2023-01-11T23:32:10.5752938Z [ OK ] BoundsInference.HasConflictingOverlapSameBufferWithPartialOverlap (0 ms) 2023-01-11T23:32:10.5753504Z [ RUN ] BoundsInference.HasConflictingOverlapSameBufferWithFullOverlap 2023-01-11T23:32:10.5754575Z [ OK ] BoundsInference.HasConflictingOverlapSameBufferWithFullOverlap (0 ms) 2023-01-11T23:32:10.5755429Z [ RUN ] BoundsInference.HasConflictingOverlapSameBufferWithFullOverlapRAW 2023-01-11T23:32:10.5757333Z [ OK ] BoundsInference.HasConflictingOverlapSameBufferWithFullOverlapRAW (0 ms) 2023-01-11T23:32:10.5758157Z [ RUN ] BoundsInference.HasConflictingOverlapSameBufferNotOverlapping 2023-01-11T23:32:10.5761434Z [ OK ] BoundsInference.HasConflictingOverlapSameBufferNotOverlapping (0 ms) 
2023-01-11T23:32:10.5762219Z [ RUN ] BoundsInference.HasConflictingOverlap2DBufferWithOverlap 2023-01-11T23:32:10.5776580Z [ OK ] BoundsInference.HasConflictingOverlap2DBufferWithOverlap (1 ms) 2023-01-11T23:32:10.5777342Z [ RUN ] BoundsInference.HasConflictingOverlap2DBufferWithNoOverlap 2023-01-11T23:32:10.5787366Z [ OK ] BoundsInference.HasConflictingOverlap2DBufferWithNoOverlap (1 ms) 2023-01-11T23:32:10.5788102Z [ RUN ] BoundsInference.HasConflictingOverlapDifferentBuffers 2023-01-11T23:32:10.5790386Z [ OK ] BoundsInference.HasConflictingOverlapDifferentBuffers (0 ms) 2023-01-11T23:32:10.5791250Z [ RUN ] BoundsInference.HasConflictingOverlapDueToRAWDependence 2023-01-11T23:32:10.5794398Z [ OK ] BoundsInference.HasConflictingOverlapDueToRAWDependence (0 ms) 2023-01-11T23:32:10.5795131Z [ RUN ] BoundsInference.HasConflictingOverlapDueToWARDependence 2023-01-11T23:32:10.5797715Z [ OK ] BoundsInference.HasConflictingOverlapDueToWARDependence (0 ms) 2023-01-11T23:32:10.5798391Z [ RUN ] BoundsInference.HasConflictingOverlapWithLoads 2023-01-11T23:32:10.5801029Z [ OK ] BoundsInference.HasConflictingOverlapWithLoads (0 ms) 2023-01-11T23:32:10.5801631Z [ RUN ] BoundsInference.IsOverlapping 2023-01-11T23:32:10.5824408Z [ OK ] BoundsInference.IsOverlapping (2 ms) 2023-01-11T23:32:10.5824958Z [----------] 26 tests from BoundsInference (19 ms total) 2023-01-11T23:32:10.5825223Z 2023-01-11T23:32:10.5825447Z [----------] 4 tests from Conv 2023-01-11T23:32:10.5825887Z [ RUN ] Conv.DepthwiseConv2D 2023-01-11T23:32:10.9417688Z [ OK ] Conv.DepthwiseConv2D (359 ms) 2023-01-11T23:32:10.9418250Z [ RUN ] Conv.DepthwiseConv2DNoBias 2023-01-11T23:32:11.2952381Z [ OK ] Conv.DepthwiseConv2DNoBias (353 ms) 2023-01-11T23:32:11.2952953Z [ RUN ] Conv.DepthwiseConv2DDynamicShapes 2023-01-11T23:32:11.4631809Z [ OK ] Conv.DepthwiseConv2DDynamicShapes (167 ms) 2023-01-11T23:32:11.4632219Z [ RUN ] Conv.Conv2D 2023-01-11T23:32:12.4843025Z [ OK ] Conv.Conv2D (1020 ms) 2023-01-11T23:32:12.4843754Z [----------] 4 tests from Conv (1901 ms total) 2023-01-11T23:32:12.4843917Z 2023-01-11T23:32:12.4844071Z [----------] 28 tests from CppPrinter 2023-01-11T23:32:12.4844360Z [ RUN ] CppPrinter.IntImm 2023-01-11T23:32:12.4844690Z [ OK ] CppPrinter.IntImm (0 ms) 2023-01-11T23:32:12.4845096Z [ RUN ] CppPrinter.FloatImm 2023-01-11T23:32:12.4845523Z [ OK ] CppPrinter.FloatImm (0 ms) 2023-01-11T23:32:12.4845904Z [ RUN ] CppPrinter.FloatImm1 2023-01-11T23:32:12.4846206Z [ OK ] CppPrinter.FloatImm1 (0 ms) 2023-01-11T23:32:12.4846492Z [ RUN ] CppPrinter.DoubleImm 2023-01-11T23:32:12.4846788Z [ OK ] CppPrinter.DoubleImm (0 ms) 2023-01-11T23:32:12.4847110Z [ RUN ] CppPrinter.DoubleImm1 2023-01-11T23:32:12.4847470Z [ OK ] CppPrinter.DoubleImm1 (0 ms) 2023-01-11T23:32:12.4847815Z [ RUN ] CppPrinter.HalfImm 2023-01-11T23:32:12.4848177Z [ OK ] CppPrinter.HalfImm (0 ms) 2023-01-11T23:32:12.4848456Z [ RUN ] CppPrinter.Add 2023-01-11T23:32:12.4848730Z [ OK ] CppPrinter.Add (0 ms) 2023-01-11T23:32:12.4849018Z [ RUN ] CppPrinter.AddExpr1 2023-01-11T23:32:12.4849317Z [ OK ] CppPrinter.AddExpr1 (0 ms) 2023-01-11T23:32:12.4849604Z [ RUN ] CppPrinter.AddExpr2 2023-01-11T23:32:12.4849901Z [ OK ] CppPrinter.AddExpr2 (0 ms) 2023-01-11T23:32:12.4850193Z [ RUN ] CppPrinter.AddExpr3 2023-01-11T23:32:12.4850481Z [ OK ] CppPrinter.AddExpr3 (0 ms) 2023-01-11T23:32:12.4850755Z [ RUN ] CppPrinter.Mod 2023-01-11T23:32:12.4851033Z [ OK ] CppPrinter.Mod (0 ms) 2023-01-11T23:32:12.4851310Z [ RUN ] CppPrinter.ModFloat 2023-01-11T23:32:12.4851608Z [ OK ] 
CppPrinter.ModFloat (0 ms) 2023-01-11T23:32:12.4851886Z [ RUN ] CppPrinter.Max 2023-01-11T23:32:12.4852167Z [ OK ] CppPrinter.Max (0 ms) 2023-01-11T23:32:12.4852442Z [ RUN ] CppPrinter.MaxFloat 2023-01-11T23:32:12.4852741Z [ OK ] CppPrinter.MaxFloat (0 ms) 2023-01-11T23:32:12.4853025Z [ RUN ] CppPrinter.MaxHalf 2023-01-11T23:32:12.4853309Z [ OK ] CppPrinter.MaxHalf (0 ms) 2023-01-11T23:32:12.4853682Z [ RUN ] CppPrinter.And 2023-01-11T23:32:12.4854018Z [ OK ] CppPrinter.And (0 ms) 2023-01-11T23:32:12.4854306Z [ RUN ] CppPrinter.CompareSelect 2023-01-11T23:32:12.4854871Z [ OK ] CppPrinter.CompareSelect (0 ms) 2023-01-11T23:32:12.4855179Z [ RUN ] CppPrinter.IfThenElse 2023-01-11T23:32:12.4855473Z [ OK ] CppPrinter.IfThenElse (0 ms) 2023-01-11T23:32:12.4855780Z [ RUN ] CppPrinter.AllocateFree 2023-01-11T23:32:12.4856093Z [ OK ] CppPrinter.AllocateFree (0 ms) 2023-01-11T23:32:12.4856389Z [ RUN ] CppPrinter.LoadStore 2023-01-11T23:32:12.4856691Z [ OK ] CppPrinter.LoadStore (0 ms) 2023-01-11T23:32:12.4856974Z [ RUN ] CppPrinter.Var 2023-01-11T23:32:12.4857248Z [ OK ] CppPrinter.Var (0 ms) 2023-01-11T23:32:12.4857511Z [ RUN ] CppPrinter.Cast 2023-01-11T23:32:12.4857794Z [ OK ] CppPrinter.Cast (0 ms) 2023-01-11T23:32:12.4858080Z [ RUN ] CppPrinter.BitCast 2023-01-11T23:32:12.4858363Z [ OK ] CppPrinter.BitCast (0 ms) 2023-01-11T23:32:12.4858636Z [ RUN ] CppPrinter.Let 2023-01-11T23:32:12.4858910Z [ OK ] CppPrinter.Let (0 ms) 2023-01-11T23:32:12.4859172Z [ RUN ] CppPrinter.For 2023-01-11T23:32:12.4859447Z [ OK ] CppPrinter.For (0 ms) 2023-01-11T23:32:12.4859717Z [ RUN ] CppPrinter.Cond 2023-01-11T23:32:12.4859988Z [ OK ] CppPrinter.Cond (0 ms) 2023-01-11T23:32:12.4860271Z [ RUN ] CppPrinter.Intrinsics 2023-01-11T23:32:12.4860627Z [ OK ] CppPrinter.Intrinsics (0 ms) 2023-01-11T23:32:12.4860938Z [ RUN ] CppPrinter.ExternalCall 2023-01-11T23:32:12.4861244Z [ OK ] CppPrinter.ExternalCall (0 ms) 2023-01-11T23:32:12.4861577Z [----------] 28 tests from CppPrinter (0 ms total) 2023-01-11T23:32:12.4861738Z 2023-01-11T23:32:12.4861907Z [----------] 8 tests from DynamicShapes 2023-01-11T23:32:12.4862229Z [ RUN ] DynamicShapes.SimpleGraph 2023-01-11T23:32:12.5877756Z [ OK ] DynamicShapes.SimpleGraph (103 ms) 2023-01-11T23:32:12.5878986Z [ RUN ] DynamicShapes.GraphWith2InputsSameDims 2023-01-11T23:32:12.6797638Z [ OK ] DynamicShapes.GraphWith2InputsSameDims (91 ms) 2023-01-11T23:32:12.6798138Z [ RUN ] DynamicShapes.GraphWith2InputsAndBroadcast 2023-01-11T23:32:12.7672539Z [ OK ] DynamicShapes.GraphWith2InputsAndBroadcast (87 ms) 2023-01-11T23:32:12.7673060Z [ RUN ] DynamicShapes.GraphWithPartiallySymbolicOutput 2023-01-11T23:32:12.8124408Z [ OK ] DynamicShapes.GraphWithPartiallySymbolicOutput (45 ms) 2023-01-11T23:32:12.8124829Z [ RUN ] DynamicShapes.GraphWithSymbolicStrides 2023-01-11T23:32:13.0074808Z [ OK ] DynamicShapes.GraphWithSymbolicStrides (194 ms) 2023-01-11T23:32:13.0075417Z [ RUN ] DynamicShapes.GraphWithCatAndBroadcast 2023-01-11T23:32:13.4194432Z [ OK ] DynamicShapes.GraphWithCatAndBroadcast (411 ms) 2023-01-11T23:32:13.4194983Z [ RUN ] DynamicShapes.GraphFromModel 2023-01-11T23:32:13.7936132Z [ OK ] DynamicShapes.GraphFromModel (374 ms) 2023-01-11T23:32:13.7937095Z [ RUN ] DynamicShapes.MultiThreadedExecution 2023-01-11T23:32:14.9972127Z [ OK ] DynamicShapes.MultiThreadedExecution (1203 ms) 2023-01-11T23:32:14.9972706Z [----------] 8 tests from DynamicShapes (2512 ms total) 2023-01-11T23:32:14.9972911Z 2023-01-11T23:32:14.9973075Z [----------] 30 tests from Expr 2023-01-11T23:32:14.9973415Z [ RUN ] 
Expr.BasicValueTest 2023-01-11T23:32:14.9973703Z [ OK ] Expr.BasicValueTest (0 ms) 2023-01-11T23:32:14.9974064Z [ RUN ] Expr.BasicValueTest02 2023-01-11T23:32:14.9974965Z [ OK ] Expr.BasicValueTest02 (0 ms) 2023-01-11T23:32:14.9975338Z [ RUN ] Expr.IsChannelsLastContiguous 2023-01-11T23:32:14.9975687Z [ OK ] Expr.IsChannelsLastContiguous (0 ms) 2023-01-11T23:32:14.9976074Z [ RUN ] Expr.LetTest01 2023-01-11T23:32:14.9976345Z [ OK ] Expr.LetTest01 (0 ms) 2023-01-11T23:32:14.9976624Z [ RUN ] Expr.LetTest02 2023-01-11T23:32:14.9976954Z [ OK ] Expr.LetTest02 (0 ms) 2023-01-11T23:32:14.9977222Z [ RUN ] Expr.LetStmtTest01 2023-01-11T23:32:14.9977538Z [ OK ] Expr.LetStmtTest01 (0 ms) 2023-01-11T23:32:14.9977869Z [ RUN ] Expr.IntTest 2023-01-11T23:32:14.9978141Z [ OK ] Expr.IntTest (0 ms) 2023-01-11T23:32:14.9978425Z [ RUN ] Expr.FloatTest 2023-01-11T23:32:14.9978746Z [ OK ] Expr.FloatTest (0 ms) 2023-01-11T23:32:14.9979013Z [ RUN ] Expr.ByteTest 2023-01-11T23:32:14.9979319Z [ OK ] Expr.ByteTest (0 ms) 2023-01-11T23:32:14.9979636Z [ RUN ] Expr.CharTest 2023-01-11T23:32:14.9979912Z [ OK ] Expr.CharTest (0 ms) 2023-01-11T23:32:14.9980199Z [ RUN ] Expr.ShortTest 2023-01-11T23:32:14.9980518Z [ OK ] Expr.ShortTest (0 ms) 2023-01-11T23:32:14.9980785Z [ RUN ] Expr.LongTest 2023-01-11T23:32:14.9981078Z [ OK ] Expr.LongTest (0 ms) 2023-01-11T23:32:14.9981371Z [ RUN ] Expr.HalfTest 2023-01-11T23:32:14.9981637Z [ OK ] Expr.HalfTest (0 ms) 2023-01-11T23:32:14.9981918Z [ RUN ] Expr.DoubleTest 2023-01-11T23:32:14.9982313Z [ OK ] Expr.DoubleTest (0 ms) 2023-01-11T23:32:14.9982584Z [ RUN ] Expr.VectorAdd01 2023-01-11T23:32:14.9989587Z [ OK ] Expr.VectorAdd01 (1 ms) 2023-01-11T23:32:14.9990727Z [ RUN ] Expr.CompareSelectEQ 2023-01-11T23:32:15.0027937Z [ OK ] Expr.CompareSelectEQ (3 ms) 2023-01-11T23:32:15.0028799Z [ RUN ] Expr.CompareSelectDtypes 2023-01-11T23:32:15.0065958Z [ OK ] Expr.CompareSelectDtypes (3 ms) 2023-01-11T23:32:15.0066812Z [ RUN ] Expr.IntrinsicsDtypes 2023-01-11T23:32:15.0072472Z [ OK ] Expr.IntrinsicsDtypes (0 ms) 2023-01-11T23:32:15.0073335Z [ RUN ] Expr.Substitute01 2023-01-11T23:32:15.0074116Z [ OK ] Expr.Substitute01 (0 ms) 2023-01-11T23:32:15.0074801Z [ RUN ] Expr.Math01 2023-01-11T23:32:15.0075124Z [ OK ] Expr.Math01 (0 ms) 2023-01-11T23:32:15.0075530Z [ RUN ] Expr.UnaryMath01 2023-01-11T23:32:15.0075947Z [ OK ] Expr.UnaryMath01 (0 ms) 2023-01-11T23:32:15.0076228Z [ RUN ] Expr.BinaryMath01 2023-01-11T23:32:15.0076511Z [ OK ] Expr.BinaryMath01 (0 ms) 2023-01-11T23:32:15.0076783Z [ RUN ] Expr.LogicalOps01 2023-01-11T23:32:15.0077073Z [ OK ] Expr.LogicalOps01 (0 ms) 2023-01-11T23:32:15.0077356Z [ RUN ] Expr.LogicalOps02 2023-01-11T23:32:15.0077631Z [ OK ] Expr.LogicalOps02 (0 ms) 2023-01-11T23:32:15.0077907Z [ RUN ] Expr.LogicalOps03 2023-01-11T23:32:15.0078206Z [ OK ] Expr.LogicalOps03 (0 ms) 2023-01-11T23:32:15.0078585Z [ RUN ] Expr.BitwiseOps 2023-01-11T23:32:15.0079000Z [ OK ] Expr.BitwiseOps (0 ms) 2023-01-11T23:32:15.0079418Z [ RUN ] Expr.DynamicShapeAdd 2023-01-11T23:32:15.0079865Z [ OK ] Expr.DynamicShapeAdd (0 ms) 2023-01-11T23:32:15.0080283Z [ RUN ] Expr.OutOfBounds 2023-01-11T23:32:15.0080623Z [ OK ] Expr.OutOfBounds (0 ms) 2023-01-11T23:32:15.0080905Z [ RUN ] Expr.OutOfBounds2d 2023-01-11T23:32:15.0088117Z [ OK ] Expr.OutOfBounds2d (0 ms) 2023-01-11T23:32:15.0088576Z [ RUN ] Expr.OutOfBounds2dFlattenedIndex 2023-01-11T23:32:15.0092744Z [ OK ] Expr.OutOfBounds2dFlattenedIndex (0 ms) 2023-01-11T23:32:15.0093222Z [----------] 30 tests from Expr (12 ms total) 
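The Expr, BoundsInference, and Kernel suites in this binary test NNC, the tensor-expression compiler behind the TensorExpr fuser. These are internal APIs (headers under torch/csrc/jit/tensorexpr, not the public libtorch surface) and signatures have shifted between releases, so the following is only a rough sketch of the Compute/LoopNest/SimpleIREvaluator pattern the tests use, not a supported example:

    #include <torch/csrc/jit/tensorexpr/eval.h>
    #include <torch/csrc/jit/tensorexpr/ir_simplifier.h>
    #include <torch/csrc/jit/tensorexpr/loopnest.h>
    #include <torch/csrc/jit/tensorexpr/tensor.h>
    #include <vector>

    using namespace torch::jit::tensorexpr;

    int main() {
      // B[i] = A[i] * 2 as a tensor expression, lowered and interpreted.
      BufHandle a("A", {64}, kFloat);
      Tensor b = Compute("B", {64},
                         [&](const VarHandle& i) { return a.load(i) * 2.0f; });
      LoopNest nest({b});
      nest.prepareForCodegen();
      StmtPtr stmt = IRSimplifier::simplify(nest.root_stmt());
      SimpleIREvaluator eval(stmt, {a, b});
      std::vector<float> in(64, 3.0f), out(64, 0.0f);
      eval(in, out);  // expect out[i] == 6.0f
    }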
2023-01-11T23:32:15.0093400Z
2023-01-11T23:32:15.0093560Z [----------] 16 tests from ExternalCall
2023-01-11T23:32:15.0093859Z [ RUN ] ExternalCall.Conv1d_float
2023-01-11T23:32:15.0411982Z [ OK ] ExternalCall.Conv1d_float (31 ms)
2023-01-11T23:32:15.0412429Z [ RUN ] ExternalCall.Conv1d_int
2023-01-11T23:32:15.0749393Z [ OK ] ExternalCall.Conv1d_int (33 ms)
2023-01-11T23:32:15.0750446Z [ RUN ] ExternalCall.Conv1d_nobias_noargs
2023-01-11T23:32:15.1023747Z [ OK ] ExternalCall.Conv1d_nobias_noargs (27 ms)
2023-01-11T23:32:15.1024705Z [ RUN ] ExternalCall.Conv2d_float
2023-01-11T23:32:15.1373649Z [ OK ] ExternalCall.Conv2d_float (35 ms)
2023-01-11T23:32:15.1374145Z [ RUN ] ExternalCall.Conv2d_int
2023-01-11T23:32:15.1795597Z [ OK ] ExternalCall.Conv2d_int (42 ms)
2023-01-11T23:32:15.1796088Z [ RUN ] ExternalCall.Conv2d_nobias_noargs
2023-01-11T23:32:15.2102947Z [ OK ] ExternalCall.Conv2d_nobias_noargs (30 ms)
2023-01-11T23:32:15.2103439Z [ RUN ] ExternalCall.Addmm_float
2023-01-11T23:32:15.2396843Z [ OK ] ExternalCall.Addmm_float (29 ms)
2023-01-11T23:32:15.2397299Z [ RUN ] ExternalCall.Embedding
2023-01-11T23:32:15.2674027Z [ OK ] ExternalCall.Embedding (27 ms)
2023-01-11T23:32:15.2674522Z [ RUN ] ExternalCall.MaxReduction
2023-01-11T23:32:15.2922604Z [ OK ] ExternalCall.MaxReduction (24 ms)
2023-01-11T23:32:15.2923159Z [ RUN ] ExternalCall.Prepacked_Linear_float
2023-01-11T23:32:15.3704351Z [ OK ] ExternalCall.Prepacked_Linear_float (78 ms)
2023-01-11T23:32:15.3704756Z [ RUN ] ExternalCall.Prepacked_Conv2d_float
2023-01-11T23:32:15.4426745Z [ OK ] ExternalCall.Prepacked_Conv2d_float (72 ms)
2023-01-11T23:32:15.4428053Z [ RUN ] ExternalCall.BinaryFloat
2023-01-11T23:32:15.5228783Z [ OK ] ExternalCall.BinaryFloat (80 ms)
2023-01-11T23:32:15.5229231Z [ RUN ] ExternalCall.UnaryFloat
2023-01-11T23:32:15.5729734Z [ OK ] ExternalCall.UnaryFloat (50 ms)
2023-01-11T23:32:15.5730225Z [ RUN ] ExternalCall.ComputeInterop
2023-01-11T23:32:16.7200607Z [ OK ] ExternalCall.ComputeInterop (1146 ms)
2023-01-11T23:32:16.7201001Z [ RUN ] ExternalCall.Inlining
2023-01-11T23:32:16.8142112Z [ OK ] ExternalCall.Inlining (93 ms)
2023-01-11T23:32:16.8142844Z [ RUN ] ExternalCall.JitCustomFusionOp
2023-01-11T23:32:16.9347823Z [ OK ] ExternalCall.JitCustomFusionOp (120 ms)
2023-01-11T23:32:16.9348578Z [----------] 16 tests from ExternalCall (1925 ms total)
2023-01-11T23:32:16.9348913Z
2023-01-11T23:32:16.9349212Z [----------] 8 tests from GraphOpt
2023-01-11T23:32:16.9349773Z [ RUN ] GraphOpt.OptimizeCat
2023-01-11T23:32:16.9713726Z [ OK ] GraphOpt.OptimizeCat (36 ms)
2023-01-11T23:32:16.9714388Z [ RUN ] GraphOpt.OptimizeCat2
2023-01-11T23:32:17.0117083Z [ OK ] GraphOpt.OptimizeCat2 (40 ms)
2023-01-11T23:32:17.0117438Z [ RUN ] GraphOpt.OptimizeCat3
2023-01-11T23:32:17.0595876Z [ OK ] GraphOpt.OptimizeCat3 (47 ms)
2023-01-11T23:32:17.0596325Z [ RUN ] GraphOpt.OptimizeCatWithTypePromotionInUser
2023-01-11T23:32:17.0957109Z [ OK ] GraphOpt.OptimizeCatWithTypePromotionInUser (36 ms)
2023-01-11T23:32:17.0957573Z [ RUN ] GraphOpt.OptimizeCatWithTypePromotionInCat
2023-01-11T23:32:17.1739150Z [ OK ] GraphOpt.OptimizeCatWithTypePromotionInCat (78 ms)
2023-01-11T23:32:17.1740053Z [ RUN ] GraphOpt.OptimizeCatNoSingleTensorElementwiseOp
2023-01-11T23:32:17.2172332Z [ OK ] GraphOpt.OptimizeCatNoSingleTensorElementwiseOp (43 ms)
2023-01-11T23:32:17.2173279Z [ RUN ] GraphOpt.OptimizeCatNoSingleTensorElementwiseOp2
2023-01-11T23:32:17.2638214Z [ OK ] GraphOpt.OptimizeCatNoSingleTensorElementwiseOp2 (46 ms)
2023-01-11T23:32:17.2638688Z [ RUN ] GraphOpt.AOTGraphPrepPasses
2023-01-11T23:32:17.2639182Z [ OK ] GraphOpt.AOTGraphPrepPasses (0 ms)
2023-01-11T23:32:17.2639684Z [----------] 8 tests from GraphOpt (329 ms total)
2023-01-11T23:32:17.2639847Z
2023-01-11T23:32:17.2639999Z [----------] 4 tests from IRPrinter
2023-01-11T23:32:17.2640350Z [ RUN ] IRPrinter.BasicValueTest
2023-01-11T23:32:17.2640828Z [ OK ] IRPrinter.BasicValueTest (0 ms)
2023-01-11T23:32:17.2641318Z [ RUN ] IRPrinter.BasicValueTest02
2023-01-11T23:32:17.2641769Z [ OK ] IRPrinter.BasicValueTest02 (0 ms)
2023-01-11T23:32:17.2642229Z [ RUN ] IRPrinter.CastTest
2023-01-11T23:32:17.2642564Z [ OK ] IRPrinter.CastTest (0 ms)
2023-01-11T23:32:17.2642921Z [ RUN ] IRPrinter.FunctionName
2023-01-11T23:32:17.2643317Z [ OK ] IRPrinter.FunctionName (0 ms)
2023-01-11T23:32:17.2643647Z [----------] 4 tests from IRPrinter (0 ms total)
2023-01-11T23:32:17.2643806Z
2023-01-11T23:32:17.2643954Z [----------] 8 tests from IRVerifier
2023-01-11T23:32:17.2644235Z [ RUN ] IRVerifier.BitwiseOps
2023-01-11T23:32:17.2644538Z [ OK ] IRVerifier.BitwiseOps (0 ms)
2023-01-11T23:32:17.2645040Z [ RUN ] IRVerifier.CompareSelect
2023-01-11T23:32:17.2645347Z [ OK ] IRVerifier.CompareSelect (0 ms)
2023-01-11T23:32:17.2645637Z [ RUN ] IRVerifier.Ramp
2023-01-11T23:32:17.2645919Z [ OK ] IRVerifier.Ramp (0 ms)
2023-01-11T23:32:17.2646188Z [ RUN ] IRVerifier.Load
2023-01-11T23:32:17.2646462Z [ OK ] IRVerifier.Load (0 ms)
2023-01-11T23:32:17.2646752Z [ RUN ] IRVerifier.IfThenElse
2023-01-11T23:32:17.2647054Z [ OK ] IRVerifier.IfThenElse (0 ms)
2023-01-11T23:32:17.2647321Z [ RUN ] IRVerifier.For
2023-01-11T23:32:17.2647593Z [ OK ] IRVerifier.For (0 ms)
2023-01-11T23:32:17.2647862Z [ RUN ] IRVerifier.Block
2023-01-11T23:32:17.2648136Z [ OK ] IRVerifier.Block (0 ms)
2023-01-11T23:32:17.2648411Z [ RUN ] IRVerifier.Store
2023-01-11T23:32:17.2648690Z [ OK ] IRVerifier.Store (0 ms)
2023-01-11T23:32:17.2648997Z [----------] 8 tests from IRVerifier (0 ms total)
2023-01-11T23:32:17.2649157Z
2023-01-11T23:32:17.2649300Z [----------] 39 tests from Kernel
2023-01-11T23:32:17.2649609Z [ RUN ] Kernel.ParallelExternalCallBuf
2023-01-11T23:32:17.3177736Z [ OK ] Kernel.ParallelExternalCallBuf (53 ms)
2023-01-11T23:32:17.3178655Z [ RUN ] Kernel.InliningIntermediates
2023-01-11T23:32:17.6404407Z [ OK ] Kernel.InliningIntermediates (322 ms)
2023-01-11T23:32:17.6404896Z [ RUN ] Kernel.PreAllocIntermediateBufs
2023-01-11T23:32:17.8270282Z [ OK ] Kernel.PreAllocIntermediateBufs (186 ms)
2023-01-11T23:32:17.8271039Z [ RUN ] Kernel._1
2023-01-11T23:32:17.8598481Z [ OK ] Kernel._1 (32 ms)
2023-01-11T23:32:17.8598864Z [ RUN ] Kernel._2
2023-01-11T23:32:17.8950985Z [ OK ] Kernel._2 (35 ms)
2023-01-11T23:32:17.8951870Z [ RUN ] Kernel._3
2023-01-11T23:32:17.9309167Z [ OK ] Kernel._3 (35 ms)
2023-01-11T23:32:17.9309713Z [ RUN ] Kernel.Huge
2023-01-11T23:32:17.9569482Z [ OK ] Kernel.Huge (26 ms)
2023-01-11T23:32:17.9570150Z [ RUN ] Kernel.ParallelStrided
2023-01-11T23:32:18.0462879Z [ OK ] Kernel.ParallelStrided (89 ms)
2023-01-11T23:32:18.0463557Z [ RUN ] Kernel.CatInputTypesPromotion
2023-01-11T23:32:18.1690513Z [ OK ] Kernel.CatInputTypesPromotion (122 ms)
2023-01-11T23:32:18.1691137Z [ RUN ] Kernel.ToDType
2023-01-11T23:32:18.1984237Z [ OK ] Kernel.ToDType (29 ms)
2023-01-11T23:32:18.1984949Z [ RUN ] Kernel.CatAndInlineWithAConstantDim
2023-01-11T23:32:18.2286505Z [ OK ] Kernel.CatAndInlineWithAConstantDim (30 ms)
2023-01-11T23:32:18.2287176Z [ RUN ] Kernel.CatWithEmptyInputs
2023-01-11T23:32:18.3088681Z [ OK ] Kernel.CatWithEmptyInputs (80 ms)
2023-01-11T23:32:18.3089403Z [ RUN ] Kernel.CatWoConditionals
2023-01-11T23:32:18.4017842Z [ OK ] Kernel.CatWoConditionals (92 ms)
2023-01-11T23:32:18.4018559Z [ RUN ] Kernel.OptimizeConditionals
2023-01-11T23:32:18.5149674Z [ OK ] Kernel.OptimizeConditionals (113 ms)
2023-01-11T23:32:18.5150650Z [ RUN ] Kernel.SumAllAxes
2023-01-11T23:32:18.5685393Z [ OK ] Kernel.SumAllAxes (53 ms)
2023-01-11T23:32:18.5685723Z [ RUN ] Kernel.SumOneAxis
2023-01-11T23:32:19.0117392Z [ OK ] Kernel.SumOneAxis (442 ms)
2023-01-11T23:32:19.0117721Z [ RUN ] Kernel.SumMultipleAxes
2023-01-11T23:32:19.5131765Z [ OK ] Kernel.SumMultipleAxes (501 ms)
2023-01-11T23:32:19.5132588Z [ RUN ] Kernel.Softmax2D
2023-01-11T23:32:20.0512323Z [ OK ] Kernel.Softmax2D (537 ms)
2023-01-11T23:32:20.0512878Z [ RUN ] Kernel.Softmax3D
2023-01-11T23:32:21.1867283Z [ OK ] Kernel.Softmax3D (1135 ms)
2023-01-11T23:32:21.1867905Z [ RUN ] Kernel.Softmax4D
2023-01-11T23:32:22.5412910Z [ OK ] Kernel.Softmax4D (1354 ms)
2023-01-11T23:32:22.5413503Z [ RUN ] Kernel.SignTest
2023-01-11T23:32:22.6390544Z [ OK ] Kernel.SignTest (97 ms)
2023-01-11T23:32:22.6391310Z [ RUN ] Kernel.InlineProducerIntoReduction
2023-01-11T23:32:22.6707598Z [ OK ] Kernel.InlineProducerIntoReduction (31 ms)
2023-01-11T23:32:22.6708350Z [ RUN ] Kernel.InlineReductionIntoConsumer
2023-01-11T23:32:22.7113341Z [ OK ] Kernel.InlineReductionIntoConsumer (40 ms)
2023-01-11T23:32:22.7114034Z [ RUN ] Kernel.SanitizeNames_CUDA
2023-01-11T23:32:22.9483238Z [ OK ] Kernel.SanitizeNames_CUDA (237 ms)
2023-01-11T23:32:22.9483584Z [ RUN ] Kernel.SanitizeConstants_CUDA
2023-01-11T23:32:23.1844808Z [ OK ] Kernel.SanitizeConstants_CUDA (236 ms)
2023-01-11T23:32:23.1845156Z [ RUN ] Kernel.ConstantTensors
2023-01-11T23:32:23.2341748Z [ OK ] Kernel.ConstantTensors (49 ms)
2023-01-11T23:32:23.2342460Z [ RUN ] Kernel.ConstantTensorsNonContiguous
2023-01-11T23:32:23.2825797Z [ OK ] Kernel.ConstantTensorsNonContiguous (48 ms)
2023-01-11T23:32:23.2826477Z [ RUN ] Kernel.RunFast
2023-01-11T23:32:23.3179796Z [ OK ] Kernel.RunFast (35 ms)
2023-01-11T23:32:23.3180448Z [ RUN ] Kernel.RunWithAllocatedOutputs
2023-01-11T23:32:23.3529808Z [ OK ] Kernel.RunWithAllocatedOutputs (35 ms)
2023-01-11T23:32:23.3530197Z [ RUN ] Kernel.CodegenInspection
2023-01-11T23:32:23.4015453Z [ OK ] Kernel.CodegenInspection (48 ms)
2023-01-11T23:32:23.4016138Z [ RUN ] Kernel.CustomLowering
2023-01-11T23:32:23.4267129Z [ OK ] Kernel.CustomLowering (25 ms)
2023-01-11T23:32:23.4267763Z [ RUN ] Kernel.Vectorize
2023-01-11T23:32:23.4572623Z [ OK ] Kernel.Vectorize (30 ms)
2023-01-11T23:32:23.4573257Z [ RUN ] Kernel.Strided1dWithinBounds
2023-01-11T23:32:23.4807352Z [ OK ] Kernel.Strided1dWithinBounds (23 ms)
2023-01-11T23:32:23.4808002Z [ RUN ] Kernel.InputAsOutput
2023-01-11T23:32:23.5188083Z [ OK ] Kernel.InputAsOutput (37 ms)
2023-01-11T23:32:23.5189004Z [ RUN ] Kernel.ScalarOut
2023-01-11T23:32:23.5397841Z [ OK ] Kernel.ScalarOut (21 ms)
2023-01-11T23:32:23.5398682Z [ RUN ] Kernel.ScalarTensorOut
2023-01-11T23:32:23.5676069Z [ OK ] Kernel.ScalarTensorOut (27 ms)
2023-01-11T23:32:23.5677155Z [ RUN ] Kernel.FuseLoopsWithVariableBounds
2023-01-11T23:32:24.1421650Z [ OK ] Kernel.FuseLoopsWithVariableBounds (574 ms)
2023-01-11T23:32:24.1422441Z [ RUN ] Kernel.FuseLoopsWithVariableConcatDim
2023-01-11T23:32:24.4207439Z [ OK ] Kernel.FuseLoopsWithVariableConcatDim (278 ms)
2023-01-11T23:32:24.4208332Z [ RUN ] Kernel.DoNotFuseLoopsWithMismatchingVariableDims
2023-01-11T23:32:24.6066791Z [ OK ] Kernel.DoNotFuseLoopsWithMismatchingVariableDims (185 ms)
2023-01-11T23:32:24.6067476Z [----------] 39 tests from Kernel (7341 ms total)
2023-01-11T23:32:24.6067641Z
2023-01-11T23:32:24.6067786Z [----------] 174 tests from LoopNest
2023-01-11T23:32:24.6068156Z [ RUN ] LoopNest.ExprSimple01
2023-01-11T23:32:24.6068470Z [ OK ] LoopNest.ExprSimple01 (0 ms)
2023-01-11T23:32:24.6068760Z [ RUN ] LoopNest.ExprLower01
2023-01-11T23:32:24.6069129Z [ OK ] LoopNest.ExprLower01 (0 ms)
2023-01-11T23:32:24.6069426Z [ RUN ] LoopNest.ExprSimple02
2023-01-11T23:32:24.6089154Z [ OK ] LoopNest.ExprSimple02 (2 ms)
2023-01-11T23:32:24.6090098Z [ RUN ] LoopNest.ExprSliceHeadWithLoopOptions
2023-01-11T23:32:24.6091561Z [ OK ] LoopNest.ExprSliceHeadWithLoopOptions (0 ms)
2023-01-11T23:32:24.6092529Z [ RUN ] LoopNest.ExprSliceTailWithLoopOptions
2023-01-11T23:32:24.6093498Z [ OK ] LoopNest.ExprSliceTailWithLoopOptions (0 ms)
2023-01-11T23:32:24.6095134Z [ RUN ] LoopNest.ExprSliceHeadWhenFactorEqualsSize
2023-01-11T23:32:24.6096290Z [ OK ] LoopNest.ExprSliceHeadWhenFactorEqualsSize (0 ms)
2023-01-11T23:32:24.6097088Z [ RUN ] LoopNest.ExprSliceHeadWhenFactorLargerThanSize
2023-01-11T23:32:24.6097649Z [ OK ] LoopNest.ExprSliceHeadWhenFactorLargerThanSize (0 ms)
2023-01-11T23:32:24.6098030Z [ RUN ] LoopNest.ExprSliceHead
2023-01-11T23:32:24.6098452Z [ OK ] LoopNest.ExprSliceHead (0 ms)
2023-01-11T23:32:24.6098816Z [ RUN ] LoopNest.ExprSliceHeadWithNonZeroStart
2023-01-11T23:32:24.6099209Z [ OK ] LoopNest.ExprSliceHeadWithNonZeroStart (0 ms)
2023-01-11T23:32:24.6099606Z [ RUN ] LoopNest.ExprSliceTailWhenFactorEqualsSize
2023-01-11T23:32:24.6100004Z [ OK ] LoopNest.ExprSliceTailWhenFactorEqualsSize (0 ms)
2023-01-11T23:32:24.6100409Z [ RUN ] LoopNest.ExprSliceTailWhenFactorLargerThanSize
2023-01-11T23:32:24.6100835Z [ OK ] LoopNest.ExprSliceTailWhenFactorLargerThanSize (0 ms)
2023-01-11T23:32:24.6101187Z [ RUN ] LoopNest.ExprSliceTail
2023-01-11T23:32:24.6101481Z [ OK ] LoopNest.ExprSliceTail (0 ms)
2023-01-11T23:32:24.6101788Z [ RUN ] LoopNest.ExprSplitAndSlice
2023-01-11T23:32:24.6105518Z [ OK ] LoopNest.ExprSplitAndSlice (0 ms)
2023-01-11T23:32:24.6105933Z [ RUN ] LoopNest.ExprSliceAndNormalize
2023-01-11T23:32:24.6106572Z [ OK ] LoopNest.ExprSliceAndNormalize (0 ms)
2023-01-11T23:32:24.6106946Z [ RUN ] LoopNest.ExprSliceWithVariableDimension
2023-01-11T23:32:24.6123265Z [ OK ] LoopNest.ExprSliceWithVariableDimension (1 ms)
2023-01-11T23:32:24.6123648Z [ RUN ] LoopNest.ExprSplitWithTail
2023-01-11T23:32:24.6129794Z [ OK ] LoopNest.ExprSplitWithTail (0 ms)
2023-01-11T23:32:24.6130173Z [ RUN ] LoopNest.ExprSplitWithTailNone
2023-01-11T23:32:24.6143557Z [ OK ] LoopNest.ExprSplitWithTailNone (1 ms)
2023-01-11T23:32:24.6143964Z [ RUN ] LoopNest.ExprSplitWithMask01
2023-01-11T23:32:24.6172268Z [ OK ] LoopNest.ExprSplitWithMask01 (2 ms)
2023-01-11T23:32:24.6172712Z [ RUN ] LoopNest.ExprSplitWithMaskRepeatedNoMask
2023-01-11T23:32:24.6177245Z [ OK ] LoopNest.ExprSplitWithMaskRepeatedNoMask (0 ms)
2023-01-11T23:32:24.6177684Z [ RUN ] LoopNest.getLoopAt
2023-01-11T23:32:24.6178005Z [ OK ] LoopNest.getLoopAt (0 ms)
2023-01-11T23:32:24.6178317Z [ RUN ] LoopNest.TileSimple
2023-01-11T23:32:24.7113624Z [ OK ] LoopNest.TileSimple (93 ms)
2023-01-11T23:32:24.7114326Z [ RUN ] LoopNest.TileWithTails
2023-01-11T23:32:24.8047256Z [ OK ] LoopNest.TileWithTails (93 ms)
2023-01-11T23:32:24.8047587Z [ RUN ] LoopNest.TileInMiddle
2023-01-11T23:32:24.9545698Z [ OK ] LoopNest.TileInMiddle (149 ms)
2023-01-11T23:32:24.9546384Z [ RUN ] LoopNest.SplitWithTailWithLoopOptions
2023-01-11T23:32:24.9547081Z [ OK ] LoopNest.SplitWithTailWithLoopOptions (0 ms)
2023-01-11T23:32:24.9547691Z [ RUN ] LoopNest.SplitWithMaskWithLoopOptions
2023-01-11T23:32:24.9548068Z [ OK ] LoopNest.SplitWithMaskWithLoopOptions (0 ms)
2023-01-11T23:32:24.9548433Z [ RUN ] LoopNest.ScheduleBroadcastAddBuffer
2023-01-11T23:32:24.9563578Z [ OK ] LoopNest.ScheduleBroadcastAddBuffer (1 ms)
2023-01-11T23:32:24.9563971Z [ RUN ] LoopNest.ScheduleFunctionCall01
2023-01-11T23:32:24.9642705Z [ OK ] LoopNest.ScheduleFunctionCall01 (7 ms)
2023-01-11T23:32:24.9643084Z [ RUN ] LoopNest.ScheduleInlineSimple
2023-01-11T23:32:24.9768857Z [ OK ] LoopNest.ScheduleInlineSimple (12 ms)
2023-01-11T23:32:24.9769591Z [ RUN ] LoopNest.ScheduleInlineFunc01
2023-01-11T23:32:25.0353322Z [ OK ] LoopNest.ScheduleInlineFunc01 (58 ms)
2023-01-11T23:32:25.0353818Z [ RUN ] LoopNest.ScheduleInlineRandom
2023-01-11T23:32:25.0354643Z [ OK ] LoopNest.ScheduleInlineRandom (0 ms)
2023-01-11T23:32:25.0355017Z [ RUN ] LoopNest.ScheduleInlineRandomUnrelated
2023-01-11T23:32:25.0359767Z [ OK ] LoopNest.ScheduleInlineRandomUnrelated (0 ms)
2023-01-11T23:32:25.0360184Z [ RUN ] LoopNest.ScheduleInlineRandomLowerDimensions
2023-01-11T23:32:25.0362126Z [ OK ] LoopNest.ScheduleInlineRandomLowerDimensions (0 ms)
2023-01-11T23:32:25.0362530Z [ RUN ] LoopNest.ScheduleInlineIntrinsics
2023-01-11T23:32:25.0455189Z [ OK ] LoopNest.ScheduleInlineIntrinsics (9 ms)
2023-01-11T23:32:25.0455658Z [ RUN ] LoopNest.ScheduleInlineRandWithIntrinsics
2023-01-11T23:32:25.0457984Z [ OK ] LoopNest.ScheduleInlineRandWithIntrinsics (0 ms)
2023-01-11T23:32:25.0458381Z [ RUN ] LoopNest.ScheduleSplitAThenInline
2023-01-11T23:32:25.0458745Z [ OK ] LoopNest.ScheduleSplitAThenInline (0 ms)
2023-01-11T23:32:25.0459090Z [ RUN ] LoopNest.ScheduleSplitBThenInline
2023-01-11T23:32:25.0462959Z [ OK ] LoopNest.ScheduleSplitBThenInline (0 ms)
2023-01-11T23:32:25.0464572Z [ RUN ] LoopNest.ScheduleSplitTwiceThenInline
2023-01-11T23:32:25.0465400Z [ OK ] LoopNest.ScheduleSplitTwiceThenInline (0 ms)
2023-01-11T23:32:25.0466123Z [ RUN ] LoopNest.ScheduleInlineThenSplit
2023-01-11T23:32:25.0467838Z [ OK ] LoopNest.ScheduleInlineThenSplit (0 ms)
2023-01-11T23:32:25.0468213Z [ RUN ] LoopNest.ScheduleSplitInlineThenSplit
2023-01-11T23:32:25.0476031Z [ OK ] LoopNest.ScheduleSplitInlineThenSplit (0 ms)
2023-01-11T23:32:25.0476439Z [ RUN ] LoopNest.ScheduleSplitInlineSimplify
2023-01-11T23:32:25.0477222Z [ OK ] LoopNest.ScheduleSplitInlineSimplify (0 ms)
2023-01-11T23:32:25.0477616Z [ RUN ] LoopNest.ScheduleInlineThreeMixedOnce
2023-01-11T23:32:25.0481970Z [ OK ] LoopNest.ScheduleInlineThreeMixedOnce (0 ms)
2023-01-11T23:32:25.0482354Z [ RUN ] LoopNest.ScheduleInlineThreeMixedTwice
2023-01-11T23:32:25.0485990Z [ OK ] LoopNest.ScheduleInlineThreeMixedTwice (0 ms)
2023-01-11T23:32:25.0486379Z [ RUN ] LoopNest.ScheduleInlineThreeMixedInner
2023-01-11T23:32:25.0490805Z [ OK ] LoopNest.ScheduleInlineThreeMixedInner (0 ms)
2023-01-11T23:32:25.0491199Z [ RUN ] LoopNest.ScheduleInlineThreeMixedSplit
2023-01-11T23:32:25.0492789Z [ OK ] LoopNest.ScheduleInlineThreeMixedSplit (0 ms)
2023-01-11T23:32:25.0493251Z [ RUN ] LoopNest.ScheduleInlineOutputTensors
2023-01-11T23:32:25.0497176Z [ OK ] LoopNest.ScheduleInlineOutputTensors (0 ms)
2023-01-11T23:32:25.0497745Z [ RUN ] LoopNest.ScheduleInlineWithCompoundIndices
2023-01-11T23:32:25.0498287Z [ OK ] LoopNest.ScheduleInlineWithCompoundIndices (0 ms)
2023-01-11T23:32:25.0498712Z [ RUN ] LoopNest.ScheduleInlineConsumerIndicesWithCast
2023-01-11T23:32:25.0499245Z [ OK ] LoopNest.ScheduleInlineConsumerIndicesWithCast (0 ms)
2023-01-11T23:32:25.0499823Z [ RUN ] LoopNest.ScheduleInlineProducerIndicesWithCast
2023-01-11T23:32:25.0500261Z [ OK ] LoopNest.ScheduleInlineProducerIndicesWithCast (0 ms)
2023-01-11T23:32:25.0500634Z [ RUN ] LoopNest.ScheduleFuserStyle
2023-01-11T23:32:25.0552504Z [ OK ] LoopNest.ScheduleFuserStyle (5 ms)
2023-01-11T23:32:25.0553472Z [ RUN ] LoopNest.ScheduleFuserThreeArg
2023-01-11T23:32:25.0612540Z [ OK ] LoopNest.ScheduleFuserThreeArg (6 ms)
2023-01-11T23:32:25.0613267Z [ RUN ] LoopNest.ScheduleDynamicShape2D
2023-01-11T23:32:25.0718766Z [ OK ] LoopNest.ScheduleDynamicShape2D (10 ms)
2023-01-11T23:32:25.0719131Z [ RUN ] LoopNest.LoopNestComputeAt_1
2023-01-11T23:32:25.0726459Z [ OK ] LoopNest.LoopNestComputeAt_1 (0 ms)
2023-01-11T23:32:25.0726823Z [ RUN ] LoopNest.LoopNestComputeAt_2
2023-01-11T23:32:25.1245549Z [ OK ] LoopNest.LoopNestComputeAt_2 (51 ms)
2023-01-11T23:32:25.1245919Z [ RUN ] LoopNest.LoopNestComputeAt_3
2023-01-11T23:32:25.1759490Z [ OK ] LoopNest.LoopNestComputeAt_3 (51 ms)
2023-01-11T23:32:25.1759843Z [ RUN ] LoopNest.Reduce2dComputeAt
2023-01-11T23:32:25.2620343Z [ OK ] LoopNest.Reduce2dComputeAt (85 ms)
2023-01-11T23:32:25.2621144Z [ RUN ] LoopNest.LoopNestReorderAxis1
2023-01-11T23:32:25.2622013Z [ OK ] LoopNest.LoopNestReorderAxis1 (0 ms)
2023-01-11T23:32:25.2622709Z [ RUN ] LoopNest.LoopNestReorderPartialAxes
2023-01-11T23:32:25.2630358Z [ OK ] LoopNest.LoopNestReorderPartialAxes (0 ms)
2023-01-11T23:32:25.2631680Z [ RUN ] LoopNest.LoopNestReorderInternalAxis
2023-01-11T23:32:25.2638172Z [ OK ] LoopNest.LoopNestReorderInternalAxis (0 ms)
2023-01-11T23:32:25.2638560Z [ RUN ] LoopNest.LoopNestReorderEnclosingAxis
2023-01-11T23:32:25.2647257Z [ OK ] LoopNest.LoopNestReorderEnclosingAxis (0 ms)
2023-01-11T23:32:25.2647771Z [ RUN ] LoopNest.LoopNestReorderSameAxis
2023-01-11T23:32:25.2648142Z [ OK ] LoopNest.LoopNestReorderSameAxis (0 ms)
2023-01-11T23:32:25.2648507Z [ RUN ] LoopNest.LoopNestReorderExtraStatements
2023-01-11T23:32:25.2662594Z [ OK ] LoopNest.LoopNestReorderExtraStatements (1 ms)
2023-01-11T23:32:25.2663032Z [ RUN ] LoopNest.LoopNestReorderLongStringOfPreOrphans
2023-01-11T23:32:25.3227922Z [ OK ] LoopNest.LoopNestReorderLongStringOfPreOrphans (56 ms)
2023-01-11T23:32:25.3228544Z [ RUN ] LoopNest.LoopNestReorderLongStringOfPostOrphans
2023-01-11T23:32:25.3788065Z [ OK ] LoopNest.LoopNestReorderLongStringOfPostOrphans (56 ms)
2023-01-11T23:32:25.3789279Z [ RUN ] LoopNest.LoopNestReorderLongStringFull
2023-01-11T23:32:25.4496353Z [ OK ] LoopNest.LoopNestReorderLongStringFull (70 ms)
2023-01-11T23:32:25.4497163Z [ RUN ] LoopNest.LoopNestReorderInternalLoopNest
2023-01-11T23:32:25.4617441Z [ OK ] LoopNest.LoopNestReorderInternalLoopNest (12 ms)
2023-01-11T23:32:25.4618204Z [ RUN ] LoopNest.OuterLoopVectorization
2023-01-11T23:32:25.4619096Z [ OK ] LoopNest.OuterLoopVectorization (0 ms)
2023-01-11T23:32:25.4619812Z [ RUN ] LoopNest.VectorizeLoopNotNormalized
2023-01-11T23:32:25.4620587Z [ OK ] LoopNest.VectorizeLoopNotNormalized (0 ms)
2023-01-11T23:32:25.4621368Z [ RUN ] LoopNest.Unroll
2023-01-11T23:32:25.4621932Z [ OK ] LoopNest.Unroll (0 ms)
2023-01-11T23:32:25.4622490Z [ RUN ] LoopNest.UnrollOuter
2023-01-11T23:32:25.4623066Z [ OK ] LoopNest.UnrollOuter (0 ms)
2023-01-11T23:32:25.4623641Z [ RUN ] LoopNest.UnrollInner
2023-01-11T23:32:25.4624222Z [ OK ] LoopNest.UnrollInner (0 ms)
2023-01-11T23:32:25.4624855Z [ RUN ] LoopNest.UnrollMultipleStatements
2023-01-11T23:32:25.4625577Z [ OK ] LoopNest.UnrollMultipleStatements (0 ms)
2023-01-11T23:32:25.4626309Z [ RUN ] LoopNest.UnrollNonLiteralConstantBounds
2023-01-11T23:32:25.4627322Z [ OK ] LoopNest.UnrollNonLiteralConstantBounds (0 ms)
2023-01-11T23:32:25.4627767Z [ RUN ] LoopNest.UnrollNonConstantBounds
2023-01-11T23:32:25.4638896Z [ OK ] LoopNest.UnrollNonConstantBounds (1 ms)
2023-01-11T23:32:25.4639276Z [ RUN ] LoopNest.UnrollByFactorsLessThan2
2023-01-11T23:32:25.4639647Z [ OK ] LoopNest.UnrollByFactorsLessThan2 (0 ms)
2023-01-11T23:32:25.4639997Z [ RUN ] LoopNest.UnrollByFactorEqualToIters
2023-01-11T23:32:25.4640768Z [ OK ] LoopNest.UnrollByFactorEqualToIters (0 ms)
2023-01-11T23:32:25.4641096Z [ RUN ] LoopNest.UnrollEmpty
2023-01-11T23:32:25.4641392Z [ OK ] LoopNest.UnrollEmpty (0 ms)
2023-01-11T23:32:25.4641731Z [ RUN ] LoopNest.NoUnroll
2023-01-11T23:32:25.4642077Z [ OK ] LoopNest.NoUnroll (0 ms)
2023-01-11T23:32:25.4642369Z [ RUN ] LoopNest.UnrollWithLet
2023-01-11T23:32:25.4643550Z [ OK ] LoopNest.UnrollWithLet (0 ms)
2023-01-11T23:32:25.4643891Z [ RUN ] LoopNest.IsNormalized
2023-01-11T23:32:25.4644268Z [ OK ] LoopNest.IsNormalized (0 ms)
2023-01-11T23:32:25.4644589Z [ RUN ] LoopNest.NormalizeStartPositive
2023-01-11T23:32:25.4645451Z [ OK ] LoopNest.NormalizeStartPositive (0 ms)
2023-01-11T23:32:25.4645797Z [ RUN ] LoopNest.NormalizeStartNegative
2023-01-11T23:32:25.4648152Z [ OK ] LoopNest.NormalizeStartNegative (0 ms)
2023-01-11T23:32:25.4648807Z [ RUN ] LoopNest.NormalizeStartZero
2023-01-11T23:32:25.4649217Z [ OK ] LoopNest.NormalizeStartZero (0 ms)
2023-01-11T23:32:25.4649546Z [ RUN ] LoopNest.NormalizeStartVariable
2023-01-11T23:32:25.4651957Z [ OK ] LoopNest.NormalizeStartVariable (0 ms)
2023-01-11T23:32:25.4652342Z [ RUN ] LoopNest.NormalizeOnNestedOuterLoop
2023-01-11T23:32:25.4653131Z [ OK ] LoopNest.NormalizeOnNestedOuterLoop (0 ms)
2023-01-11T23:32:25.4653572Z [ RUN ] LoopNest.NormalizeOnNestedInnerLoop
2023-01-11T23:32:25.4656100Z [ OK ] LoopNest.NormalizeOnNestedInnerLoop (0 ms)
2023-01-11T23:32:25.4656491Z [ RUN ] LoopNest.NormalizeAndSplitWithTail
2023-01-11T23:32:25.4660096Z [ OK ] LoopNest.NormalizeAndSplitWithTail (0 ms)
2023-01-11T23:32:25.4660496Z [ RUN ] LoopNest.NotNormalizeAndSplitWithTail
2023-01-11T23:32:25.4664508Z [ OK ] LoopNest.NotNormalizeAndSplitWithTail (0 ms)
2023-01-11T23:32:25.4664976Z [ RUN ] LoopNest.FlattenSimpleLoopNest2D
2023-01-11T23:32:25.4674233Z [ OK ] LoopNest.FlattenSimpleLoopNest2D (0 ms)
2023-01-11T23:32:25.4674611Z [ RUN ] LoopNest.FlattenSimpleLoopNest3D
2023-01-11T23:32:25.4764054Z [ OK ] LoopNest.FlattenSimpleLoopNest3D (8 ms)
2023-01-11T23:32:25.4764525Z [ RUN ] LoopNest.FlattenLoopNestAfterNormalize
2023-01-11T23:32:25.4794026Z [ OK ] LoopNest.FlattenLoopNestAfterNormalize (2 ms)
2023-01-11T23:32:25.4795120Z [ RUN ] LoopNest.FlattenLoopNestWithNonLiteralConstantBounds
2023-01-11T23:32:25.4801594Z [ OK ] LoopNest.FlattenLoopNestWithNonLiteralConstantBounds (1 ms)
2023-01-11T23:32:25.4802207Z [ RUN ] LoopNest.FlattenImperfectLoopNest
2023-01-11T23:32:25.4802688Z [ OK ] LoopNest.FlattenImperfectLoopNest (0 ms)
2023-01-11T23:32:25.4803138Z [ RUN ] LoopNest.FlattenReductionLoopNest
2023-01-11T23:32:25.4803497Z [ OK ] LoopNest.FlattenReductionLoopNest (0 ms)
2023-01-11T23:32:25.4803881Z [ RUN ] LoopNest.FlattenReductionLoopNestFromTensor
2023-01-11T23:32:25.4804294Z [ OK ] LoopNest.FlattenReductionLoopNestFromTensor (0 ms)
2023-01-11T23:32:25.4804685Z [ RUN ] LoopNest.FlattenIncorrectLoopsAsInput
2023-01-11T23:32:25.4805167Z [ OK ] LoopNest.FlattenIncorrectLoopsAsInput (0 ms)
2023-01-11T23:32:25.4805523Z [ RUN ] LoopNest.DetectInlineRankMismatch
2023-01-11T23:32:25.4805884Z [ OK ] LoopNest.DetectInlineRankMismatch (0 ms)
2023-01-11T23:32:25.4806217Z [ RUN ] LoopNest.CacheReadsSimple
2023-01-11T23:32:25.5184137Z [ OK ] LoopNest.CacheReadsSimple (37 ms)
2023-01-11T23:32:25.5185051Z [ RUN ] LoopNest.CacheReadsOuter
2023-01-11T23:32:25.5589365Z [ OK ] LoopNest.CacheReadsOuter (40 ms)
2023-01-11T23:32:25.5590412Z [ RUN ] LoopNest.CacheReadsInternal
2023-01-11T23:32:25.6025976Z [ OK ] LoopNest.CacheReadsInternal (43 ms)
2023-01-11T23:32:25.6026909Z [ RUN ] LoopNest.CacheReadsInner
2023-01-11T23:32:25.6711426Z [ OK ] LoopNest.CacheReadsInner (68 ms)
2023-01-11T23:32:25.6712715Z [ RUN ] LoopNest.CacheWritesSimple
2023-01-11T23:32:25.7463915Z [ OK ] LoopNest.CacheWritesSimple (75 ms)
2023-01-11T23:32:25.7464855Z [ RUN ] LoopNest.DeadStoreElimination
2023-01-11T23:32:25.7474994Z [ OK ] LoopNest.DeadStoreElimination (1 ms)
2023-01-11T23:32:25.7475549Z [ RUN ] LoopNest.DeadStoreEliminationWithIntermediates
2023-01-11T23:32:25.7485878Z [ OK ] LoopNest.DeadStoreEliminationWithIntermediates (1 ms)
2023-01-11T23:32:25.7486415Z [ RUN ] LoopNest.CompoundTensorSimple
2023-01-11T23:32:25.7499377Z [ OK ] LoopNest.CompoundTensorSimple (1 ms)
2023-01-11T23:32:25.7499835Z [ RUN ] LoopNest.InlineConstantIndex
2023-01-11T23:32:25.7502106Z [ OK ] LoopNest.InlineConstantIndex (0 ms)
2023-01-11T23:32:25.7502587Z [ RUN ] LoopNest.CompoundTensorUsed
2023-01-11T23:32:25.7524406Z [ OK ] LoopNest.CompoundTensorUsed (2 ms)
2023-01-11T23:32:25.7524873Z [ RUN ] LoopNest.InlineFromLoad
2023-01-11T23:32:25.7525307Z [ OK ] LoopNest.InlineFromLoad (0 ms)
2023-01-11T23:32:25.7525692Z [ RUN ] LoopNest.OptimizeConditionalsSimple
2023-01-11T23:32:25.7526445Z [ OK ] LoopNest.OptimizeConditionalsSimple (0 ms)
2023-01-11T23:32:25.7526974Z [ RUN ] LoopNest.OptimizeConditionalsNestedConditions
2023-01-11T23:32:25.7528045Z [ OK ] LoopNest.OptimizeConditionalsNestedConditions (0 ms)
2023-01-11T23:32:25.7528540Z [ RUN ] LoopNest.OptimizeConditionalsMultipleStores
2023-01-11T23:32:25.7530874Z [ OK ] LoopNest.OptimizeConditionalsMultipleStores (0 ms)
2023-01-11T23:32:25.7531481Z [ RUN ] LoopNest.OptimizeConditionalsMultipleStoresInOneLoop
2023-01-11T23:32:25.7535569Z [ OK ] LoopNest.OptimizeConditionalsMultipleStoresInOneLoop (0 ms)
2023-01-11T23:32:25.7536164Z [ RUN ] LoopNest.OptimizeConditionalsOuterLoopVar
2023-01-11T23:32:25.7538492Z [ OK ] LoopNest.OptimizeConditionalsOuterLoopVar (0 ms)
2023-01-11T23:32:25.7539101Z [ RUN ] LoopNest.OptimizeConditionalsCompValuesNotOrdered
2023-01-11T23:32:25.7541707Z [ OK ] LoopNest.OptimizeConditionalsCompValuesNotOrdered (0 ms)
2023-01-11T23:32:25.7542327Z [ RUN ] LoopNest.OptimizeConditionalsCompValuesNotConstants
2023-01-11T23:32:25.7544723Z [ OK ] LoopNest.OptimizeConditionalsCompValuesNotConstants (0 ms)
2023-01-11T23:32:25.7545336Z [ RUN ] LoopNest.OptimizeConditionalsInvalidCondition
2023-01-11T23:32:25.7547820Z [ OK ] LoopNest.OptimizeConditionalsInvalidCondition (0 ms)
2023-01-11T23:32:25.7548415Z [ RUN ] LoopNest.OptimizeConditionalsInvalidCondition2
2023-01-11T23:32:25.7551656Z [ OK ] LoopNest.OptimizeConditionalsInvalidCondition2 (0 ms)
2023-01-11T23:32:25.7552303Z [ RUN ] LoopNest.OptimizeConditionalsInvalidCondition3
2023-01-11T23:32:25.7552877Z [ OK ] LoopNest.OptimizeConditionalsInvalidCondition3 (0 ms)
2023-01-11T23:32:25.7553437Z [ RUN ] LoopNest.OptimizeConditionalsInvalidCondition4
2023-01-11T23:32:25.7555995Z [ OK ] LoopNest.OptimizeConditionalsInvalidCondition4 (0 ms)
2023-01-11T23:32:25.7556587Z [ RUN ] LoopNest.OptimizeConditionalsNotNormalized
2023-01-11T23:32:25.7557099Z [ OK ] LoopNest.OptimizeConditionalsNotNormalized (0 ms)
2023-01-11T23:32:25.7557496Z [ RUN ] LoopNest.ColReduceSplitTailEvenReorder
2023-01-11T23:32:25.9127696Z [ OK ] LoopNest.ColReduceSplitTailEvenReorder (157 ms)
2023-01-11T23:32:25.9128211Z [ RUN ] LoopNest.ColReduceSplitTailUnevenReorder
2023-01-11T23:32:26.0367991Z [ OK ] LoopNest.ColReduceSplitTailUnevenReorder (124 ms)
2023-01-11T23:32:26.0368946Z [ RUN ] LoopNest.ColReduceSplitMaskEvenReorder
2023-01-11T23:32:26.1928787Z [ OK ] LoopNest.ColReduceSplitMaskEvenReorder (155 ms)
2023-01-11T23:32:26.1929575Z [ RUN ] LoopNest.ColReduceSplitMaskUnevenReorder
2023-01-11T23:32:26.3335632Z [ OK ] LoopNest.ColReduceSplitMaskUnevenReorder (140 ms)
2023-01-11T23:32:26.3336589Z [ RUN ] LoopNest.ReorderAxisWithMultipleConds
2023-01-11T23:32:26.3337319Z [ OK ] LoopNest.ReorderAxisWithMultipleConds (0 ms)
2023-01-11T23:32:26.3337935Z [ RUN ] LoopNest.VectorizeUse
2023-01-11T23:32:26.3341019Z [ OK ] LoopNest.VectorizeUse (0 ms)
2023-01-11T23:32:26.3341674Z [ RUN ] LoopNest.Int64Direct
2023-01-11T23:32:26.3342317Z [ OK ] LoopNest.Int64Direct (0 ms)
2023-01-11T23:32:26.3343107Z [ RUN ] LoopNest.Int64Compute
2023-01-11T23:32:26.3343903Z [ OK ] LoopNest.Int64Compute (0 ms)
2023-01-11T23:32:26.3344713Z [ RUN ] LoopNest.DistributeLoopWithAllStmtsAsPivots
2023-01-11T23:32:26.3345536Z [ OK ] LoopNest.DistributeLoopWithAllStmtsAsPivots (0 ms)
2023-01-11T23:32:26.3346349Z [ RUN ] LoopNest.DistributeLoopWithOneStmtAsPivot
2023-01-11T23:32:26.3347137Z [ OK ] LoopNest.DistributeLoopWithOneStmtAsPivot (0 ms)
2023-01-11T23:32:26.3347766Z [ RUN ] LoopNest.DistributeLoopWithoutAnyPivot
2023-01-11T23:32:26.3348437Z [ OK ] LoopNest.DistributeLoopWithoutAnyPivot (0 ms)
2023-01-11T23:32:26.3348960Z [ RUN ] LoopNest.DistributeLoopOverInnerLoops
2023-01-11T23:32:26.3349381Z [ OK ] LoopNest.DistributeLoopOverInnerLoops (0 ms)
2023-01-11T23:32:26.3349790Z [ RUN ] LoopNest.DistributeLoopAndParentsWithoutAnyPivot
2023-01-11T23:32:26.3350320Z [ OK ] LoopNest.DistributeLoopAndParentsWithoutAnyPivot (0 ms)
2023-01-11T23:32:26.3350691Z [ RUN ] LoopNest.fuseLoopsSimple
2023-01-11T23:32:26.3351005Z [ OK ] LoopNest.fuseLoopsSimple (0 ms)
2023-01-11T23:32:26.3351313Z [ RUN ] LoopNest.fuseLoopsMultiple
2023-01-11T23:32:26.3351649Z [ OK ] LoopNest.fuseLoopsMultiple (0 ms)
2023-01-11T23:32:26.3351964Z [ RUN ] LoopNest.fuseLoopsNested
2023-01-11T23:32:26.3354995Z [ OK ] LoopNest.fuseLoopsNested (0 ms)
2023-01-11T23:32:26.3355455Z [ RUN ] LoopNest.fuseLoopsNested2D
2023-01-11T23:32:26.3357164Z [ OK ] LoopNest.fuseLoopsNested2D (0 ms)
2023-01-11T23:32:26.3357646Z [ RUN ] LoopNest.fuseLoopsNested2DInner
2023-01-11T23:32:26.3359877Z [ OK ] LoopNest.fuseLoopsNested2DInner (0 ms)
2023-01-11T23:32:26.3360466Z [ RUN ] LoopNest.fuseLoopsDifferentStopBounds
2023-01-11T23:32:26.3361007Z [ OK ] LoopNest.fuseLoopsDifferentStopBounds (0 ms)
2023-01-11T23:32:26.3361551Z [ RUN ] LoopNest.fuseLoopsDifferentStartBounds
2023-01-11T23:32:26.3362087Z [ OK ] LoopNest.fuseLoopsDifferentStartBounds (0 ms)
2023-01-11T23:32:26.3362687Z [ RUN ] LoopNest.fuseLoopsNotContiguous
2023-01-11T23:32:26.3363034Z [ OK ] LoopNest.fuseLoopsNotContiguous (0 ms)
2023-01-11T23:32:26.3363390Z [ RUN ] LoopNest.fuseLoopsWithDifferentParents
2023-01-11T23:32:26.3363777Z [ OK ] LoopNest.fuseLoopsWithDifferentParents (0 ms)
2023-01-11T23:32:26.3364155Z [ RUN ] LoopNest.fuseLoopsWithVariableBounds
2023-01-11T23:32:26.3365180Z [ OK ] LoopNest.fuseLoopsWithVariableBounds (0 ms)
2023-01-11T23:32:26.3365657Z [ RUN ] LoopNest.fuseLoopsWithExprBounds
2023-01-11T23:32:26.3369763Z [ OK ] LoopNest.fuseLoopsWithExprBounds (0 ms)
2023-01-11T23:32:26.3370292Z [ RUN ] LoopNest.fuseLoopsWithDifferentExprBounds
2023-01-11T23:32:26.3373605Z [ OK ] LoopNest.fuseLoopsWithDifferentExprBounds (0 ms)
2023-01-11T23:32:26.3374196Z [ RUN ] LoopNest.fuseLoopsWithNonOverlappingBufferAccesses
2023-01-11T23:32:26.3377747Z [ OK ] LoopNest.fuseLoopsWithNonOverlappingBufferAccesses (0 ms)
2023-01-11T23:32:26.3378395Z [ RUN ] LoopNest.fuseLoopsWithNonOverlapping2DBufferAccesses
2023-01-11T23:32:26.3385131Z [ OK ] LoopNest.fuseLoopsWithNonOverlapping2DBufferAccesses (0 ms)
2023-01-11T23:32:26.3385678Z [ RUN ] LoopNest.fuseLoopsWithReductions
2023-01-11T23:32:26.3389704Z [ OK ] LoopNest.fuseLoopsWithReductions (0 ms)
2023-01-11T23:32:26.3390265Z [ RUN ] LoopNest.fuseLoopsWith2DReductions
2023-01-11T23:32:26.3398465Z [ OK ] LoopNest.fuseLoopsWith2DReductions (0 ms)
2023-01-11T23:32:26.3398966Z [ RUN ] LoopNest.fuseLoopsWithComplexIndices
2023-01-11T23:32:26.3406039Z [ OK ] LoopNest.fuseLoopsWithComplexIndices (0 ms)
2023-01-11T23:32:26.3406576Z [ RUN ] LoopNest.fuseLoopsWithMixedLoopVarsAsIndices
2023-01-11T23:32:26.3416328Z [ OK ] LoopNest.fuseLoopsWithMixedLoopVarsAsIndices (0 ms)
2023-01-11T23:32:26.3416880Z [ RUN ] LoopNest.fuseLoopsWithTranspose
2023-01-11T23:32:26.3421527Z [ OK ] LoopNest.fuseLoopsWithTranspose (0 ms)
2023-01-11T23:32:26.3422056Z [ RUN ] LoopNest.fuseLoopsThatViolateDependencies1
2023-01-11T23:32:26.3425917Z [ OK ] LoopNest.fuseLoopsThatViolateDependencies1 (0 ms)
2023-01-11T23:32:26.3426483Z [ RUN ] LoopNest.fuseLoopsThatViolateDependencies2
2023-01-11T23:32:26.3429882Z [ OK ] LoopNest.fuseLoopsThatViolateDependencies2 (0 ms)
2023-01-11T23:32:26.3430555Z [ RUN ] LoopNest.fuseLoopsThatViolateDependencies3
2023-01-11T23:32:26.3437364Z [ OK ] LoopNest.fuseLoopsThatViolateDependencies3 (0 ms)
2023-01-11T23:32:26.3437948Z [ RUN ] LoopNest.fuseLoopsThatViolateDependencies4
2023-01-11T23:32:26.3446659Z [ OK ] LoopNest.fuseLoopsThatViolateDependencies4 (0 ms)
2023-01-11T23:32:26.3447230Z [ RUN ] LoopNest.fuseLoopsThatViolateDependencies5
2023-01-11T23:32:26.3451319Z [ OK ] LoopNest.fuseLoopsThatViolateDependencies5 (0 ms)
2023-01-11T23:32:26.3451878Z [ RUN ] LoopNest.fuseLoopsThatViolateDependencies6
2023-01-11T23:32:26.3456097Z [ OK ] LoopNest.fuseLoopsThatViolateDependencies6 (0 ms)
2023-01-11T23:32:26.3456649Z [ RUN ] LoopNest.fuseLoopsThatViolateDependencies7
2023-01-11T23:32:26.3460948Z [ OK ] LoopNest.fuseLoopsThatViolateDependencies7 (0 ms)
2023-01-11T23:32:26.3461477Z [ RUN ] LoopNest.areLoopsPerfectlyNested
2023-01-11T23:32:26.3461965Z [ OK ] LoopNest.areLoopsPerfectlyNested (0 ms)
2023-01-11T23:32:26.3462319Z [ RUN ] LoopNest.reorderNestedLoops2D
2023-01-11T23:32:26.3462784Z [ OK ] LoopNest.reorderNestedLoops2D (0 ms)
2023-01-11T23:32:26.3463232Z [ RUN ] LoopNest.reorderNestedLoops3D
2023-01-11T23:32:26.3463627Z [ OK ] LoopNest.reorderNestedLoops3D (0 ms)
2023-01-11T23:32:26.3464049Z [ RUN ] LoopNest.reorderNestedLoops4D
2023-01-11T23:32:26.3464383Z [ OK ] LoopNest.reorderNestedLoops4D (0 ms)
2023-01-11T23:32:26.3464735Z [ RUN ] LoopNest.reorderTrivialPermutation
2023-01-11T23:32:26.3465137Z [ OK ] LoopNest.reorderTrivialPermutation (0 ms)
2023-01-11T23:32:26.3465522Z [ RUN ] LoopNest.reorderInvalidPermutations
2023-01-11T23:32:26.3465893Z [ OK ] LoopNest.reorderInvalidPermutations (0 ms)
2023-01-11T23:32:26.3466233Z [ RUN ] LoopNest.reorderInvalidLoopNest
2023-01-11T23:32:26.3466627Z [ OK ] LoopNest.reorderInvalidLoopNest (0 ms)
2023-01-11T23:32:26.3467081Z [ RUN ] LoopNest.compressBufferSimple
2023-01-11T23:32:26.3467478Z [ OK ] LoopNest.compressBufferSimple (0 ms)
2023-01-11T23:32:26.3467819Z [ RUN ] LoopNest.compressBufferMultipleDims
2023-01-11T23:32:26.3468190Z [ OK ] LoopNest.compressBufferMultipleDims (0 ms)
2023-01-11T23:32:26.3468556Z [ RUN ] LoopNest.compressBufferMultipleDims2
2023-01-11T23:32:26.3468921Z [ OK ] LoopNest.compressBufferMultipleDims2 (0 ms)
2023-01-11T23:32:26.3469317Z [ RUN ] LoopNest.compressBufferDifferentOrderIndices
2023-01-11T23:32:26.3469740Z [ OK ] LoopNest.compressBufferDifferentOrderIndices (0 ms)
2023-01-11T23:32:26.3470236Z [ RUN ] LoopNest.compressBufferVariableBounds
2023-01-11T23:32:26.3470605Z [ OK ] LoopNest.compressBufferVariableBounds (0 ms)
2023-01-11T23:32:26.3470995Z [ RUN ] LoopNest.compressBufferNoCommonParentLoops
2023-01-11T23:32:26.3471400Z [ OK ] LoopNest.compressBufferNoCommonParentLoops (0 ms)
2023-01-11T23:32:26.3471779Z [ RUN ] LoopNest.compressBufferIndicesMixed
2023-01-11T23:32:26.3472140Z [ OK ] LoopNest.compressBufferIndicesMixed (0 ms)
2023-01-11T23:32:26.3472494Z [ RUN ] LoopNest.compressMultipleBuffers
2023-01-11T23:32:26.3472841Z [ OK ] LoopNest.compressMultipleBuffers (0 ms)
2023-01-11T23:32:26.3473166Z [ RUN ] LoopNest.sanitizeNames
2023-01-11T23:32:26.3483761Z [ OK ] LoopNest.sanitizeNames (1 ms)
2023-01-11T23:32:26.3484167Z [----------] 174 tests from LoopNest (1742 ms total)
2023-01-11T23:32:26.3484348Z
2023-01-11T23:32:26.3484557Z [----------] 31 tests from MemDependency
2023-01-11T23:32:26.3484890Z [ RUN ] MemDependency.BoundOverlap
2023-01-11T23:32:26.3494837Z [ OK ] MemDependency.BoundOverlap (1 ms)
2023-01-11T23:32:26.3495178Z [ RUN ] MemDependency.BoundComparison
2023-01-11T23:32:26.3503488Z [ OK ] MemDependency.BoundComparison (0 ms)
2023-01-11T23:32:26.3504165Z [ RUN ] MemDependency.BoundOverlapSymbolic
2023-01-11T23:32:26.3512466Z [ OK ] MemDependency.BoundOverlapSymbolic (0 ms)
2023-01-11T23:32:26.3512849Z [ RUN ] MemDependency.BoundOverlapMultiDim
2023-01-11T23:32:26.3520000Z [ OK ] MemDependency.BoundOverlapMultiDim (0 ms)
2023-01-11T23:32:26.3520347Z [ RUN ] MemDependency.BoundSubtract
2023-01-11T23:32:26.3526914Z [ OK ] MemDependency.BoundSubtract (0 ms)
2023-01-11T23:32:26.3527265Z [ RUN ] MemDependency.BoundSubtractSymbolic
2023-01-11T23:32:26.3548479Z [ OK ] MemDependency.BoundSubtractSymbolic (2 ms)
2023-01-11T23:32:26.3549256Z [ RUN ] MemDependency.BoundSubtractMultiDim
2023-01-11T23:32:26.3572010Z [ OK ] MemDependency.BoundSubtractMultiDim (2 ms)
2023-01-11T23:32:26.3572817Z [ RUN ] MemDependency.BoundSubtractMultiDimSymbolic
2023-01-11T23:32:26.3603403Z [ OK ] MemDependency.BoundSubtractMultiDimSymbolic (3 ms)
2023-01-11T23:32:26.3603989Z [ RUN ] MemDependency.MemDependencyCheckerSimple
2023-01-11T23:32:26.3604687Z [ OK ] MemDependency.MemDependencyCheckerSimple (0 ms)
2023-01-11T23:32:26.3605083Z [ RUN ] MemDependency.MemDependencyCheckerMultiStmt
2023-01-11T23:32:26.3605497Z [ OK ] MemDependency.MemDependencyCheckerMultiStmt (0 ms)
2023-01-11T23:32:26.3606009Z [ RUN ] MemDependency.MemDependencyCheckerOverlap
2023-01-11T23:32:26.3606587Z [ OK ] MemDependency.MemDependencyCheckerOverlap (0 ms)
2023-01-11T23:32:26.3606997Z [ RUN ] MemDependency.MemDependencyCheckerLoop
2023-01-11T23:32:26.3607385Z [ OK ] MemDependency.MemDependencyCheckerLoop (0 ms)
2023-01-11T23:32:26.3607784Z [ RUN ] MemDependency.MemDependencyCheckerLoopReduce
2023-01-11T23:32:26.3608982Z [ OK ] MemDependency.MemDependencyCheckerLoopReduce (0 ms)
2023-01-11T23:32:26.3609497Z [ RUN ] MemDependency.MemDependencyCheckerLoopReduceExpanded
2023-01-11T23:32:26.3612176Z [ OK ] MemDependency.MemDependencyCheckerLoopReduceExpanded (0 ms)
2023-01-11T23:32:26.3612656Z [ RUN ] MemDependency.MemDependencyCheckerInputsOutputs
2023-01-11T23:32:26.3613256Z [ OK ] MemDependency.MemDependencyCheckerInputsOutputs (0 ms)
2023-01-11T23:32:26.3613791Z [ RUN ] MemDependency.MemDependencyCheckerOutputDoesntDepend
2023-01-11T23:32:26.3614461Z [ OK ] MemDependency.MemDependencyCheckerOutputDoesntDepend (0 ms)
2023-01-11T23:32:26.3615125Z [ RUN ] MemDependency.MemDependencyCheckerLoopBounds
2023-01-11T23:32:26.3626214Z [ OK ] MemDependency.MemDependencyCheckerLoopBounds (1 ms)
2023-01-11T23:32:26.3626692Z [ RUN ] MemDependency.MemDependencyCheckerLoopBoundsIndexShift
2023-01-11T23:32:26.3653127Z [ OK ] MemDependency.MemDependencyCheckerLoopBoundsIndexShift (2 ms)
2023-01-11T23:32:26.3654116Z [ RUN ] MemDependency.MemDependencyCheckerLoopSelfDependency
2023-01-11T23:32:26.3784695Z [ OK ] MemDependency.MemDependencyCheckerLoopSelfDependency (13 ms)
2023-01-11T23:32:26.3785677Z [ RUN ] MemDependency.MemDependencyCheckerLoopDistinctStrides
2023-01-11T23:32:26.3792326Z [ OK ] MemDependency.MemDependencyCheckerLoopDistinctStrides (0 ms)
2023-01-11T23:32:26.3792901Z [ RUN ] MemDependency.MemDependencyCheckerLoopBoundsCond
2023-01-11T23:32:26.3809944Z [ OK ] MemDependency.MemDependencyCheckerLoopBoundsCond (1 ms)
2023-01-11T23:32:26.3824810Z [ RUN ] MemDependency.MemDependencyCheckerIfThenElse
2023-01-11T23:32:26.3825344Z [ OK ] MemDependency.MemDependencyCheckerIfThenElse (1 ms)
2023-01-11T23:32:26.3825770Z [ RUN ] MemDependency.MemDependencyCheckerCutLoop
2023-01-11T23:32:26.3838597Z [ OK ] MemDependency.MemDependencyCheckerCutLoop (1 ms)
2023-01-11T23:32:26.3839053Z [ RUN ] MemDependency.MemDependencyCheckerDynamicShapes
2023-01-11T23:32:26.3864308Z [ OK ] MemDependency.MemDependencyCheckerDynamicShapes (2 ms)
2023-01-11T23:32:26.3864731Z [ RUN ] MemDependency.MemDependencyCheckerMultiDim
2023-01-11T23:32:26.3897134Z [ OK ] MemDependency.MemDependencyCheckerMultiDim (3 ms)
2023-01-11T23:32:26.3897559Z [ RUN ] MemDependency.MemDependencyCheckerComputeAPI
2023-01-11T23:32:26.3907560Z [ OK ] MemDependency.MemDependencyCheckerComputeAPI (1 ms)
2023-01-11T23:32:26.3907985Z [ RUN ] MemDependency.MemDependencyCheckerComputeInline
2023-01-11T23:32:26.3915183Z [ OK ] MemDependency.MemDependencyCheckerComputeInline (0 ms)
2023-01-11T23:32:26.3915624Z [ RUN ] MemDependency.MemDependencyCheckerComputeSplit
2023-01-11T23:32:26.3936137Z [ OK ] MemDependency.MemDependencyCheckerComputeSplit (2 ms)
2023-01-11T23:32:26.3936577Z [ RUN ] MemDependency.MemDependencyCheckerComputeReorder
2023-01-11T23:32:26.3949489Z [ OK ] MemDependency.MemDependencyCheckerComputeReorder (1 ms)
2023-01-11T23:32:26.3949984Z [ RUN ] MemDependency.MemDependencyCheckerComputeReduce
2023-01-11T23:32:26.3960377Z [ OK ] MemDependency.MemDependencyCheckerComputeReduce (1 ms)
2023-01-11T23:32:26.3960808Z [ RUN ] MemDependency.MemDependencyCheckerComputeGEMM
2023-01-11T23:32:26.4116357Z [ OK ] MemDependency.MemDependencyCheckerComputeGEMM (15 ms)
2023-01-11T23:32:26.4116769Z [----------] 31 tests from MemDependency (63 ms total)
2023-01-11T23:32:26.4116982Z
2023-01-11T23:32:26.4117174Z [----------] 2 tests from Ops
2023-01-11T23:32:26.4117441Z [ RUN ] Ops.Sum
2023-01-11T23:32:26.4157960Z [ OK ] Ops.Sum (4 ms)
2023-01-11T23:32:26.4158265Z [ RUN ] Ops.ChannelsLastSum
2023-01-11T23:32:26.5320722Z [ OK ] Ops.ChannelsLastSum (116 ms)
2023-01-11T23:32:26.5321259Z [----------] 2 tests from Ops (120 ms total)
2023-01-11T23:32:26.5321496Z
2023-01-11T23:32:26.5321747Z [----------] 10 tests from Quantization
2023-01-11T23:32:26.5322208Z [ RUN ] Quantization.QuantDequantInt8
2023-01-11T23:32:26.5600267Z [ OK ] Quantization.QuantDequantInt8 (27 ms)
2023-01-11T23:32:26.5600812Z [ RUN ] Quantization.QuantDequantUInt8
2023-01-11T23:32:26.5869295Z [ OK ] Quantization.QuantDequantUInt8 (26 ms)
2023-01-11T23:32:26.5870397Z [ RUN ] Quantization.QuantDequantUInt8_NLC
2023-01-11T23:32:26.6148655Z [ OK ] Quantization.QuantDequantUInt8_NLC (28 ms)
2023-01-11T23:32:26.6149085Z [ RUN ] Quantization.QuantAddDequantInt8
2023-01-11T23:32:26.6475466Z [ OK ] Quantization.QuantAddDequantInt8 (32 ms)
2023-01-11T23:32:26.6475884Z [ RUN ] Quantization.QuantAddDequantUInt8
2023-01-11T23:32:26.6799785Z [ OK ] Quantization.QuantAddDequantUInt8 (32 ms)
2023-01-11T23:32:26.6800312Z [ RUN ] Quantization.QuantSigmoidDequantUInt8
2023-01-11T23:32:26.7182769Z [ OK ] Quantization.QuantSigmoidDequantUInt8 (38 ms)
2023-01-11T23:32:26.7183888Z [ RUN ] Quantization.QuantMulDequantUInt8
2023-01-11T23:32:26.7648364Z [ OK ] Quantization.QuantMulDequantUInt8 (46 ms)
2023-01-11T23:32:26.7648899Z [ RUN ] Quantization.QuantUpsampleNearst2dDequantUInt8
2023-01-11T23:32:26.8212248Z [ OK ] Quantization.QuantUpsampleNearst2dDequantUInt8 (56 ms)
2023-01-11T23:32:26.8213240Z [ RUN ] Quantization.UpsampleNearst2d
2023-01-11T23:32:26.8549136Z [ OK ] Quantization.UpsampleNearst2d (33 ms)
2023-01-11T23:32:26.8550280Z [ RUN ] Quantization.QuantCatDequantUInt8
2023-01-11T23:32:26.9269989Z [ OK ] Quantization.QuantCatDequantUInt8 (72 ms)
2023-01-11T23:32:26.9270690Z [----------] 10 tests from Quantization (395 ms total)
2023-01-11T23:32:26.9270918Z
2023-01-11T23:32:26.9271135Z [----------] 2 tests from BufLiveRange
2023-01-11T23:32:26.9271543Z [ RUN ] BufLiveRange.SingleRangeLine
2023-01-11T23:32:26.9271989Z [ OK ] BufLiveRange.SingleRangeLine (0 ms)
2023-01-11T23:32:26.9272318Z [ RUN ] BufLiveRange.MulRangeLine
2023-01-11T23:32:26.9272646Z [ OK ] BufLiveRange.MulRangeLine (0 ms)
2023-01-11T23:32:26.9272981Z [----------] 2 tests from BufLiveRange (0 ms total)
2023-01-11T23:32:26.9273141Z
2023-01-11T23:32:26.9273292Z [----------] 6 tests from MemPlanning
2023-01-11T23:32:26.9273610Z [ RUN ] MemPlanning.MemReuseWithTypeCast
2023-01-11T23:32:27.0038612Z [ OK ] MemPlanning.MemReuseWithTypeCast (76 ms)
2023-01-11T23:32:27.0039006Z [ RUN ] MemPlanning.NoMemReuseForLargerType
2023-01-11T23:32:27.0982153Z [ OK ] MemPlanning.NoMemReuseForLargerType (94 ms)
2023-01-11T23:32:27.0982707Z [ RUN ] MemPlanning.SameBufSizeMemReuse
2023-01-11T23:32:27.1826535Z [ OK ] MemPlanning.SameBufSizeMemReuse (84 ms)
2023-01-11T23:32:27.1827319Z [ RUN ] MemPlanning.SameBufSizeMultiMemReuses
2023-01-11T23:32:27.2685979Z [ OK ] MemPlanning.SameBufSizeMultiMemReuses (86 ms)
2023-01-11T23:32:27.2686412Z [ RUN ] MemPlanning.SameBufSizeMultiMemReusesOfOneBuf
2023-01-11T23:32:27.3701784Z [ OK ] MemPlanning.SameBufSizeMultiMemReusesOfOneBuf (101 ms)
2023-01-11T23:32:27.3702657Z [ RUN ] MemPlanning.SmallerBufSizeNonMemReuse
2023-01-11T23:32:27.4502178Z [ OK ] MemPlanning.SmallerBufSizeNonMemReuse (80 ms)
2023-01-11T23:32:27.4502958Z [----------] 6 tests from MemPlanning (522 ms total)
2023-01-11T23:32:27.4503299Z
2023-01-11T23:32:27.4503641Z [----------] 45 tests from Reductions
2023-01-11T23:32:27.4504434Z [ RUN ] Reductions.ReduceSum0D_1
2023-01-11T23:32:27.4505057Z [ OK ] Reductions.ReduceSum0D_1 (0 ms)
2023-01-11T23:32:27.4505661Z [ RUN ] Reductions.ReduceSum0D_2
2023-01-11T23:32:27.4506259Z [ OK ] Reductions.ReduceSum0D_2 (0 ms)
2023-01-11T23:32:27.4506848Z [ RUN ] Reductions.ReduceSum1D
2023-01-11T23:32:27.4507450Z [ OK ] Reductions.ReduceSum1D (0 ms)
2023-01-11T23:32:27.4508022Z [ RUN ] Reductions.ReduceSum2D
2023-01-11T23:32:27.4508618Z [ OK ] Reductions.ReduceSum2D (0 ms)
2023-01-11T23:32:27.4509011Z [ RUN ] Reductions.ReduceSum3D
2023-01-11T23:32:27.4536473Z [ OK ] Reductions.ReduceSum3D (3 ms)
2023-01-11T23:32:27.4537016Z [ RUN ] Reductions.ReduceSum10D
2023-01-11T23:32:28.1548566Z [ OK ] Reductions.ReduceSum10D (700 ms)
2023-01-11T23:32:28.1549057Z [ RUN ] Reductions.ReduceProduct
2023-01-11T23:32:28.1549593Z [ OK ] Reductions.ReduceProduct (0 ms)
2023-01-11T23:32:28.1550052Z [ RUN ] Reductions.ReduceMax
2023-01-11T23:32:28.1552910Z [ OK ] Reductions.ReduceMax (0 ms)
2023-01-11T23:32:28.1553296Z [ RUN ] Reductions.ReduceMinCustomInitializer
2023-01-11T23:32:28.1555257Z [ OK ] Reductions.ReduceMinCustomInitializer (0 ms)
2023-01-11T23:32:28.1555706Z [ RUN ] Reductions.ReduceAnyAll
2023-01-11T23:32:28.1576389Z [ OK ] Reductions.ReduceAnyAll (2 ms)
2023-01-11T23:32:28.1576811Z [ RUN ] Reductions.ReduceMatmul2D
2023-01-11T23:32:28.1585464Z [ OK ] Reductions.ReduceMatmul2D (0 ms)
2023-01-11T23:32:28.1585915Z [ RUN ] Reductions.ReduceRfactorLike
2023-01-11T23:32:28.1595684Z [ OK ] Reductions.ReduceRfactorLike (1 ms)
2023-01-11T23:32:28.1596043Z [ RUN ] Reductions.ReduceAsProducer
2023-01-11T23:32:28.1618719Z [ OK ] Reductions.ReduceAsProducer (2 ms)
2023-01-11T23:32:28.1619505Z [ RUN ] Reductions.ReduceAsConsumer
2023-01-11T23:32:28.1656536Z [ OK ] Reductions.ReduceAsConsumer (3 ms)
2023-01-11T23:32:28.1657395Z [ RUN ] Reductions.SplitReduceAxis
2023-01-11T23:32:28.1670548Z [ OK ] Reductions.SplitReduceAxis (1 ms)
2023-01-11T23:32:28.1671542Z [ RUN ] Reductions.SplitNonReduceAxis
2023-01-11T23:32:28.1707068Z [ OK ] Reductions.SplitNonReduceAxis (3 ms)
2023-01-11T23:32:28.1707998Z [ RUN ] Reductions.ReorderedReductionInitializer
2023-01-11T23:32:28.1746187Z [ OK ] Reductions.ReorderedReductionInitializer (3 ms)
2023-01-11T23:32:28.1747191Z [ RUN ] Reductions.ReduceRfactor
2023-01-11T23:32:28.1755141Z [ OK ] Reductions.ReduceRfactor (1 ms)
2023-01-11T23:32:28.1755613Z [ RUN ] Reductions.Reduce3DRfactorInner
2023-01-11T23:32:28.1887323Z [ OK ] Reductions.Reduce3DRfactorInner (13 ms)
2023-01-11T23:32:28.1888019Z [ RUN ] Reductions.Reduce3DRfactorOuter
2023-01-11T23:32:28.2024104Z [ OK ] Reductions.Reduce3DRfactorOuter (13 ms)
2023-01-11T23:32:28.2024656Z [ RUN ] Reductions.ReduceRepeatedInternalRfactor
2023-01-11T23:32:28.3828385Z [ OK ] Reductions.ReduceRepeatedInternalRfactor (180 ms)
2023-01-11T23:32:28.3828905Z [ RUN ] Reductions.ReduceSplitTail
2023-01-11T23:32:28.4239889Z [ OK ] Reductions.ReduceSplitTail (41 ms)
2023-01-11T23:32:28.4240247Z [ RUN ] Reductions.ReduceSplitNoTail
2023-01-11T23:32:28.4714932Z [ OK ] Reductions.ReduceSplitNoTail (47 ms)
2023-01-11T23:32:28.4715830Z [ RUN ] Reductions.ReduceOverSplitTail
2023-01-11T23:32:28.5116111Z [ OK ] Reductions.ReduceOverSplitTail (39 ms)
2023-01-11T23:32:28.5116751Z [ RUN ] Reductions.ReduceSplitMask
2023-01-11T23:32:28.5632957Z [ OK ] Reductions.ReduceSplitMask (51 ms)
2023-01-11T23:32:28.5633690Z [ RUN ] Reductions.ReduceSplitNoMask
2023-01-11T23:32:28.6108813Z [ OK ] Reductions.ReduceSplitNoMask (47 ms)
2023-01-11T23:32:28.6109172Z [ RUN ] Reductions.ReduceOverSplitMask
2023-01-11T23:32:28.6526186Z [ OK ] Reductions.ReduceOverSplitMask (41 ms)
2023-01-11T23:32:28.6526550Z [ RUN ] Reductions.ReduceSplitRfactor
2023-01-11T23:32:28.6568880Z [ OK ] Reductions.ReduceSplitRfactor (4 ms)
2023-01-11T23:32:28.6569625Z [ RUN ] Reductions.ReduceOverSplitRfactor
2023-01-11T23:32:28.6581988Z [ OK ] Reductions.ReduceOverSplitRfactor (1 ms)
2023-01-11T23:32:28.6582370Z [ RUN ] Reductions.ReduceInlineReduction
2023-01-11T23:32:28.6582723Z [ OK ] Reductions.ReduceInlineReduction (0 ms)
2023-01-11T23:32:28.6583069Z [ RUN ] Reductions.ReduceInlineConsumer
2023-01-11T23:32:28.6676783Z [ OK ] Reductions.ReduceInlineConsumer (9 ms)
2023-01-11T23:32:28.6677169Z [ RUN ] Reductions.ReduceInlineReducerInternal
2023-01-11T23:32:28.6768972Z [ OK ] Reductions.ReduceInlineReducerInternal (9 ms)
2023-01-11T23:32:28.6769784Z [ RUN ] Reductions.ReductionCacheAccessesOperatorAxis
2023-01-11T23:32:28.6818150Z [ OK ] Reductions.ReductionCacheAccessesOperatorAxis (4 ms)
2023-01-11T23:32:28.6818951Z [ RUN ] Reductions.ReductionCacheAccessesOuterReduceAxis
2023-01-11T23:32:28.6860120Z [ OK ] Reductions.ReductionCacheAccessesOuterReduceAxis (4 ms)
2023-01-11T23:32:28.6861176Z [ RUN ] Reductions.ReductionCacheAccessesInnerReduceAxis
2023-01-11T23:32:28.6902586Z [ OK ] Reductions.ReductionCacheAccessesInnerReduceAxis (4 ms)
2023-01-11T23:32:28.6903419Z [ RUN ] Reductions.ReductionCacheBodyAccess
2023-01-11T23:32:28.6913971Z [ OK ] Reductions.ReductionCacheBodyAccess (1 ms)
2023-01-11T23:32:28.6914379Z [ RUN ] Reductions.ReductionCacheConsumerAccess
2023-01-11T23:32:28.6930133Z [ OK ] Reductions.ReductionCacheConsumerAccess (1 ms)
2023-01-11T23:32:28.6930572Z [ RUN ] Reductions.ReductionSplitCacheConsumerAccess
2023-01-11T23:32:28.6949076Z [ OK ] Reductions.ReductionSplitCacheConsumerAccess (1 ms)
2023-01-11T23:32:28.6950098Z [ RUN ] Reductions.ReductionReorderCacheConsumerAccess
2023-01-11T23:32:28.6965351Z [ OK ] Reductions.ReductionReorderCacheConsumerAccess (1 ms)
2023-01-11T23:32:28.6965805Z [ RUN ] Reductions.ReductionRfactorCacheTempOuter
2023-01-11T23:32:28.7153853Z [ OK ] Reductions.ReductionRfactorCacheTempOuter (18 ms)
2023-01-11T23:32:28.7154622Z [ RUN ] Reductions.ReductionRfactorCacheTempInner
2023-01-11T23:32:28.7299601Z [ OK ] Reductions.ReductionRfactorCacheTempInner (14 ms)
2023-01-11T23:32:28.7300114Z [ RUN ] Reductions.ReductionVectorize
2023-01-11T23:32:28.7308770Z [ OK ] Reductions.ReductionVectorize (1 ms)
2023-01-11T23:32:28.7309239Z [ RUN ] Reductions.ReductionVectorizeInner
2023-01-11T23:32:28.7309600Z [ OK ] Reductions.ReductionVectorizeInner (0 ms)
2023-01-11T23:32:28.7310048Z [ RUN ] Reductions.ReductionVectorizeRfactor
2023-01-11T23:32:28.7322376Z [ OK ] Reductions.ReductionVectorizeRfactor (1 ms)
2023-01-11T23:32:28.7322872Z [ RUN ] Reductions.InitFunction
2023-01-11T23:32:28.7323218Z [ OK ] Reductions.InitFunction (0 ms)
2023-01-11T23:32:28.7323559Z [----------] 45 tests from Reductions (1282 ms total)
2023-01-11T23:32:28.7323725Z
2023-01-11T23:32:28.7323881Z [----------] 69 tests from Registerizer
2023-01-11T23:32:28.7324265Z [ RUN ] Registerizer.RegisterizerSimple
2023-01-11T23:32:28.7325059Z [ OK ] Registerizer.RegisterizerSimple (0 ms)
2023-01-11T23:32:28.7325538Z [ RUN ] Registerizer.RegisterizerLoop
2023-01-11T23:32:28.7325931Z [ OK ] Registerizer.RegisterizerLoop (0 ms)
2023-01-11T23:32:28.7326287Z [ RUN ] Registerizer.RegisterizerLoopFixedLoad
2023-01-11T23:32:28.7327826Z [ OK ] Registerizer.RegisterizerLoopFixedLoad (0 ms)
2023-01-11T23:32:28.7328296Z [ RUN ] Registerizer.RegisterizerLoopInternal
2023-01-11T23:32:28.7328947Z [ OK ] Registerizer.RegisterizerLoopInternal (0 ms)
2023-01-11T23:32:28.7329368Z [ RUN ] Registerizer.RegisterizerLoopInternalLoadOverlap
2023-01-11T23:32:28.7331228Z [ OK ] Registerizer.RegisterizerLoopInternalLoadOverlap (0 ms)
2023-01-11T23:32:28.7331687Z [ RUN ] Registerizer.RegisterizerLoopInternalRepeated
2023-01-11T23:32:28.7335373Z [ OK ] Registerizer.RegisterizerLoopInternalRepeated (0 ms)
2023-01-11T23:32:28.7335902Z [ RUN ] Registerizer.RegisterizerLoopInternalRepeatedOverlapLoopVar
2023-01-11T23:32:28.7339035Z [ OK ] Registerizer.RegisterizerLoopInternalRepeatedOverlapLoopVar (0 ms)
2023-01-11T23:32:28.7339576Z [ RUN ] Registerizer.RegisterizerLoopInternalRepeatedOverlapOther
2023-01-11T23:32:28.7342285Z [ OK ] Registerizer.RegisterizerLoopInternalRepeatedOverlapOther (0 ms)
2023-01-11T23:32:28.7342761Z [ RUN ] Registerizer.RegisterizerMultiVar
2023-01-11T23:32:28.7344302Z [ OK ] Registerizer.RegisterizerMultiVar (0 ms)
2023-01-11T23:32:28.7344708Z [ RUN ] Registerizer.RegisterizerVariableLoad
2023-01-11T23:32:28.7346535Z [ OK ] Registerizer.RegisterizerVariableLoad (0 ms)
2023-01-11T23:32:28.7346940Z [ RUN ] Registerizer.RegisterizerSymbolicIndices
2023-01-11T23:32:28.7347779Z [ OK ] Registerizer.RegisterizerSymbolicIndices (0 ms)
2023-01-11T23:32:28.7348257Z [ RUN ] Registerizer.RegisterizerMultiLoop
2023-01-11T23:32:28.7350609Z [ OK ] Registerizer.RegisterizerMultiLoop (0 ms)
2023-01-11T23:32:28.7351785Z [ RUN ] Registerizer.RegisterizerRepeated
2023-01-11T23:32:28.7354382Z [ OK ] Registerizer.RegisterizerRepeated (0 ms)
2023-01-11T23:32:28.7354898Z [ RUN ] Registerizer.RegisterizerNoLoads
2023-01-11T23:32:28.7355312Z [ OK ] Registerizer.RegisterizerNoLoads (0 ms)
2023-01-11T23:32:28.7355730Z [ RUN ] Registerizer.RegisterizerNoRepeatedStores
2023-01-11T23:32:28.7356710Z [ OK ] Registerizer.RegisterizerNoRepeatedStores (0 ms)
2023-01-11T23:32:28.7357107Z [ RUN ] Registerizer.RegisterizerMultiVarOverlap
2023-01-11T23:32:28.7359855Z [ OK ] Registerizer.RegisterizerMultiVarOverlap (0 ms)
2023-01-11T23:32:28.7360243Z [ RUN ] Registerizer.RegisterizerAllocs
2023-01-11T23:32:28.7362656Z [ OK ] Registerizer.RegisterizerAllocs (0 ms)
2023-01-11T23:32:28.7363133Z [ RUN ] Registerizer.RegisterizerNoInitializer
2023-01-11T23:32:28.7363818Z [ OK ] Registerizer.RegisterizerNoInitializer (0 ms)
2023-01-11T23:32:28.7364234Z [ RUN ] Registerizer.RegisterizerNoInitializerLoopVar
2023-01-11T23:32:28.7364944Z [ OK ] Registerizer.RegisterizerNoInitializerLoopVar (0 ms)
2023-01-11T23:32:28.7365416Z [ RUN ] Registerizer.RegisterizerLoadThenStore
2023-01-11T23:32:28.7366891Z [ OK ] Registerizer.RegisterizerLoadThenStore (0 ms)
2023-01-11T23:32:28.7367313Z [ RUN ] Registerizer.RegisterizerParallelized
2023-01-11T23:32:28.7367776Z [ OK ] Registerizer.RegisterizerParallelized (0 ms)
2023-01-11T23:32:28.7368221Z [ RUN ] Registerizer.RegisterizerConditionAfter
2023-01-11T23:32:28.7369364Z [ OK ] Registerizer.RegisterizerConditionAfter (0 ms)
2023-01-11T23:32:28.7369836Z [ RUN ] Registerizer.RegisterizerConditionBefore
2023-01-11T23:32:28.7371280Z [ OK ] Registerizer.RegisterizerConditionBefore (0 ms)
2023-01-11T23:32:28.7371748Z [ RUN ] Registerizer.RegisterizerConditionInside
2023-01-11T23:32:28.7373155Z [ OK ] Registerizer.RegisterizerConditionInside (0 ms)
2023-01-11T23:32:28.7373683Z [ RUN ] Registerizer.RegisterizerConditionInsideOverlap1
2023-01-11T23:32:28.7375629Z [ OK ] Registerizer.RegisterizerConditionInsideOverlap1 (0 ms)
2023-01-11T23:32:28.7376079Z [ RUN ] Registerizer.RegisterizerConditionInsideOverlap2
2023-01-11T23:32:28.7379034Z [ OK ] Registerizer.RegisterizerConditionInsideOverlap2 (0 ms)
2023-01-11T23:32:28.7379454Z [ RUN ] Registerizer.RegisterizerConditionHidden
2023-01-11T23:32:28.7380119Z [ OK ] Registerizer.RegisterizerConditionHidden (0 ms)
2023-01-11T23:32:28.7380548Z [ RUN ] Registerizer.RegisterizerConditionUnhidden
2023-01-11T23:32:28.7382436Z [ OK ] Registerizer.RegisterizerConditionUnhidden (0 ms)
2023-01-11T23:32:28.7382849Z [ RUN ] Registerizer.RegisterizerCondCondition
2023-01-11T23:32:28.7384175Z [ OK ] Registerizer.RegisterizerCondCondition (0 ms)
2023-01-11T23:32:28.7384759Z [ RUN ] Registerizer.RegisterizerCondConditionUnhidden
2023-01-11T23:32:28.7385386Z [ OK ] Registerizer.RegisterizerCondConditionUnhidden (0 ms)
2023-01-11T23:32:28.7385877Z [ RUN ] Registerizer.RegisterizerIfThenElseHidden
2023-01-11T23:32:28.7389168Z [ OK ] Registerizer.RegisterizerIfThenElseHidden (0 ms)
2023-01-11T23:32:28.7389607Z [ RUN ] Registerizer.RegisterizerIfThenElseUnhidden
2023-01-11T23:32:28.7393752Z [ OK ] Registerizer.RegisterizerIfThenElseUnhidden (0 ms)
2023-01-11T23:32:28.7394165Z [ RUN ] Registerizer.RegisterizerIfThenElseNested
2023-01-11T23:32:28.7394677Z [ OK ] Registerizer.RegisterizerIfThenElseNested (0 ms)
2023-01-11T23:32:28.7395167Z [ RUN ] Registerizer.RegisterizerIfThenElseInternal
2023-01-11T23:32:28.7397325Z [ OK ] Registerizer.RegisterizerIfThenElseInternal (0 ms)
2023-01-11T23:32:28.7397739Z [ RUN ] Registerizer.RegisterizerIfThenElseCondition
2023-01-11T23:32:28.7398540Z [ OK ] Registerizer.RegisterizerIfThenElseCondition (0 ms)
2023-01-11T23:32:28.7399049Z [ RUN ] Registerizer.RegisterizerIfThenElseConditionUnhidden
2023-01-11T23:32:28.7399999Z [ OK ] Registerizer.RegisterizerIfThenElseConditionUnhidden (0 ms)
2023-01-11T23:32:28.7400430Z [ RUN ] Registerizer.RegisterizerConditionBranchOnly
2023-01-11T23:32:28.7410766Z [ OK ] Registerizer.RegisterizerConditionBranchOnly (1 ms)
2023-01-11T23:32:28.7411234Z [ RUN ] Registerizer.RegisterizerCondIfThenElse
2023-01-11T23:32:28.7412766Z [ OK ] Registerizer.RegisterizerCondIfThenElse (0 ms)
2023-01-11T23:32:28.7413165Z [ RUN ] Registerizer.RegisterizerIfThenElseLoop
2023-01-11T23:32:28.7414894Z [ OK ] Registerizer.RegisterizerIfThenElseLoop (0 ms)
2023-01-11T23:32:28.7415412Z [ RUN ] Registerizer.RegisterizerIfThenElseLoopCut
2023-01-11T23:32:28.7415960Z [ OK ] Registerizer.RegisterizerIfThenElseLoopCut (0 ms)
2023-01-11T23:32:28.7416378Z [ RUN ] Registerizer.RegisterizerPartialAfter
2023-01-11T23:32:28.7418336Z [ OK ] Registerizer.RegisterizerPartialAfter (0 ms)
2023-01-11T23:32:28.7418786Z [ RUN ] Registerizer.RegisterizerPartialBefore
2023-01-11T23:32:28.7421010Z [ OK ] Registerizer.RegisterizerPartialBefore (0 ms)
2023-01-11T23:32:28.7421407Z [ RUN ] Registerizer.RegisterizerPartialInside
2023-01-11T23:32:28.7424326Z [ OK ] Registerizer.RegisterizerPartialInside (0 ms)
2023-01-11T23:32:28.7424743Z [ RUN ] Registerizer.RegisterizerPartialCondition
2023-01-11T23:32:28.7428159Z [ OK ] Registerizer.RegisterizerPartialCondition (0 ms)
2023-01-11T23:32:28.7428605Z [ RUN ] Registerizer.RegisterizerPartialConditionInternalCut
2023-01-11T23:32:28.7429359Z [ OK ] Registerizer.RegisterizerPartialConditionInternalCut (0 ms) 2023-01-11T23:32:28.7429839Z [ RUN ] Registerizer.RegisterizerPartialConditionInternalStart 2023-01-11T23:32:28.7432479Z [ OK ] Registerizer.RegisterizerPartialConditionInternalStart (0 ms) 2023-01-11T23:32:28.7432963Z [ RUN ] Registerizer.RegisterizerPartialOverlapsTwo 2023-01-11T23:32:28.7433804Z [ OK ] Registerizer.RegisterizerPartialOverlapsTwo (0 ms) 2023-01-11T23:32:28.7434212Z [ RUN ] Registerizer.RegisterizerNestedBlocks 2023-01-11T23:32:28.7435665Z [ OK ] Registerizer.RegisterizerNestedBlocks (0 ms) 2023-01-11T23:32:28.7436079Z [ RUN ] Registerizer.RegisterizerNestedConditions 2023-01-11T23:32:28.7436877Z [ OK ] Registerizer.RegisterizerNestedConditions (0 ms) 2023-01-11T23:32:28.7437310Z [ RUN ] Registerizer.RegisterizerNestedConditionsUnhidden 2023-01-11T23:32:28.7438997Z [ OK ] Registerizer.RegisterizerNestedConditionsUnhidden (0 ms) 2023-01-11T23:32:28.7439479Z [ RUN ] Registerizer.RegisterizerNestedConditionsHiddenFirst 2023-01-11T23:32:28.7441411Z [ OK ] Registerizer.RegisterizerNestedConditionsHiddenFirst (0 ms) 2023-01-11T23:32:28.7441900Z [ RUN ] Registerizer.RegisterizerNestedConditionsHiddenSecond 2023-01-11T23:32:28.7443855Z [ OK ] Registerizer.RegisterizerNestedConditionsHiddenSecond (0 ms) 2023-01-11T23:32:28.7444365Z [ RUN ] Registerizer.RegisterizerNestedConditionsCut 2023-01-11T23:32:28.7444971Z [ OK ] Registerizer.RegisterizerNestedConditionsCut (0 ms) 2023-01-11T23:32:28.7445502Z [ RUN ] Registerizer.RegisterizerNestedConditionLoopHidden 2023-01-11T23:32:28.7447400Z [ OK ] Registerizer.RegisterizerNestedConditionLoopHidden (0 ms) 2023-01-11T23:32:28.7447889Z [ RUN ] Registerizer.RegisterizerNestedConditionThreeDeep 2023-01-11T23:32:28.7452375Z [ OK ] Registerizer.RegisterizerNestedConditionThreeDeep (0 ms) 2023-01-11T23:32:28.7452848Z [ RUN ] Registerizer.RegisterizerNestedLoopSimple 2023-01-11T23:32:28.7453670Z [ OK ] Registerizer.RegisterizerNestedLoopSimple (0 ms) 2023-01-11T23:32:28.7454138Z [ RUN ] Registerizer.RegisterizerHiddenAccessYes 2023-01-11T23:32:28.7456741Z [ OK ] Registerizer.RegisterizerHiddenAccessYes (0 ms) 2023-01-11T23:32:28.7457186Z [ RUN ] Registerizer.RegisterizerHiddenAccessNo 2023-01-11T23:32:28.7459448Z [ OK ] Registerizer.RegisterizerHiddenAccessNo (0 ms) 2023-01-11T23:32:28.7459976Z [ RUN ] Registerizer.RegisterizerHiddenAccessMultiLoop 2023-01-11T23:32:28.7462639Z [ OK ] Registerizer.RegisterizerHiddenAccessMultiLoop (0 ms) 2023-01-11T23:32:28.7463106Z [ RUN ] Registerizer.RegisterizerTwoConditionalLoops 2023-01-11T23:32:28.7465034Z [ OK ] Registerizer.RegisterizerTwoConditionalLoops (0 ms) 2023-01-11T23:32:28.7465486Z [ RUN ] Registerizer.RegisterizerTwoConditionalLoopsCut 2023-01-11T23:32:28.7467705Z [ OK ] Registerizer.RegisterizerTwoConditionalLoopsCut (0 ms) 2023-01-11T23:32:28.7468151Z [ RUN ] Registerizer.RegisterizerLoopLetVar 2023-01-11T23:32:28.7468957Z [ OK ] Registerizer.RegisterizerLoopLetVar (0 ms) 2023-01-11T23:32:28.7469443Z [ RUN ] Registerizer.RegisterizerLoopLetVarOuter 2023-01-11T23:32:28.7470042Z [ OK ] Registerizer.RegisterizerLoopLetVarOuter (0 ms) 2023-01-11T23:32:28.7470452Z [ RUN ] Registerizer.RegisterizerMultiDim 2023-01-11T23:32:28.7473205Z [ OK ] Registerizer.RegisterizerMultiDim (0 ms) 2023-01-11T23:32:28.7473585Z [ RUN ] Registerizer.RegisterizerMultiDimPartial 2023-01-11T23:32:28.7473979Z [ OK ] Registerizer.RegisterizerMultiDimPartial (0 ms) 2023-01-11T23:32:28.7474366Z [ RUN ] 
Registerizer.RegisterizerMultiDimOverlap 2023-01-11T23:32:28.7476921Z [ OK ] Registerizer.RegisterizerMultiDimOverlap (0 ms) 2023-01-11T23:32:28.7477451Z [ RUN ] Registerizer.RegisterizerMultiDimPartialOverlap 2023-01-11T23:32:28.7479416Z [ OK ] Registerizer.RegisterizerMultiDimPartialOverlap (0 ms) 2023-01-11T23:32:28.7479937Z [ RUN ] Registerizer.RegisterizerMultiDim3DReduction1 2023-01-11T23:32:28.7481932Z [ OK ] Registerizer.RegisterizerMultiDim3DReduction1 (0 ms) 2023-01-11T23:32:28.7482365Z [ RUN ] Registerizer.RegisterizerMultiDim3DReduction2 2023-01-11T23:32:28.7484744Z [ OK ] Registerizer.RegisterizerMultiDim3DReduction2 (0 ms) 2023-01-11T23:32:28.7485287Z [----------] 69 tests from Registerizer (16 ms total) 2023-01-11T23:32:28.7485509Z 2023-01-11T23:32:28.7485722Z [----------] 92 tests from Simplify 2023-01-11T23:32:28.7486158Z [ RUN ] Simplify.ConstantFoldSimple 2023-01-11T23:32:28.7486611Z [ OK ] Simplify.ConstantFoldSimple (0 ms) 2023-01-11T23:32:28.7487041Z [ RUN ] Simplify.ConstantFoldTwoLayer 2023-01-11T23:32:28.7487466Z [ OK ] Simplify.ConstantFoldTwoLayer (0 ms) 2023-01-11T23:32:28.7487903Z [ RUN ] Simplify.ConstantFoldShifts 2023-01-11T23:32:28.7488369Z [ OK ] Simplify.ConstantFoldShifts (0 ms) 2023-01-11T23:32:28.7488737Z [ RUN ] Simplify.ConstantFoldBitwise 2023-01-11T23:32:28.7489076Z [ OK ] Simplify.ConstantFoldBitwise (0 ms) 2023-01-11T23:32:28.7489402Z [ RUN ] Simplify.ConstantFoldMultiOp 2023-01-11T23:32:28.7489738Z [ OK ] Simplify.ConstantFoldMultiOp (0 ms) 2023-01-11T23:32:28.7490141Z [ RUN ] Simplify.ConstantFoldMinMax 2023-01-11T23:32:28.7490508Z [ OK ] Simplify.ConstantFoldMinMax (0 ms) 2023-01-11T23:32:28.7490848Z [ RUN ] Simplify.ConstantFoldIntrinsics 2023-01-11T23:32:28.7491219Z [ OK ] Simplify.ConstantFoldIntrinsics (0 ms) 2023-01-11T23:32:28.7491671Z [ RUN ] Simplify.ConstantFoldCastToBool 2023-01-11T23:32:28.7492120Z [ OK ] Simplify.ConstantFoldCastToBool (0 ms) 2023-01-11T23:32:28.7492519Z [ RUN ] Simplify.ConstantFoldWithVar 2023-01-11T23:32:28.7492922Z [ OK ] Simplify.ConstantFoldWithVar (0 ms) 2023-01-11T23:32:28.7493382Z [ RUN ] Simplify.ConditionalSelectFoldSimple 2023-01-11T23:32:28.7493812Z [ OK ] Simplify.ConditionalSelectFoldSimple (0 ms) 2023-01-11T23:32:28.7494274Z [ RUN ] Simplify.ConditionalSelectFoldTwoLayer 2023-01-11T23:32:28.7494925Z [ OK ] Simplify.ConditionalSelectFoldTwoLayer (0 ms) 2023-01-11T23:32:28.7495440Z [ RUN ] Simplify.ConditionalSelectFoldWithVar 2023-01-11T23:32:28.7495867Z [ OK ] Simplify.ConditionalSelectFoldWithVar (0 ms) 2023-01-11T23:32:28.7496270Z [ RUN ] Simplify.UnFoldableExpr 2023-01-11T23:32:28.7496628Z [ OK ] Simplify.UnFoldableExpr (0 ms) 2023-01-11T23:32:28.7497025Z [ RUN ] Simplify.HashSimple 2023-01-11T23:32:28.7497390Z [ OK ] Simplify.HashSimple (0 ms) 2023-01-11T23:32:28.7497790Z [ RUN ] Simplify.HashEquivalence 2023-01-11T23:32:28.7498121Z [ OK ] Simplify.HashEquivalence (0 ms) 2023-01-11T23:32:28.7498443Z [ RUN ] Simplify.HashEquivalenceRand 2023-01-11T23:32:28.7498771Z [ OK ] Simplify.HashEquivalenceRand (0 ms) 2023-01-11T23:32:28.7499130Z [ RUN ] Simplify.HashEquivalenceAfterFolding 2023-01-11T23:32:28.7499506Z [ OK ] Simplify.HashEquivalenceAfterFolding (0 ms) 2023-01-11T23:32:28.7499853Z [ RUN ] Simplify.HashDifferenceTypes 2023-01-11T23:32:28.7500259Z [ OK ] Simplify.HashDifferenceTypes (0 ms) 2023-01-11T23:32:28.7500694Z [ RUN ] Simplify.HashLargeExpression 2023-01-11T23:32:28.7501029Z [ OK ] Simplify.HashLargeExpression (0 ms) 2023-01-11T23:32:28.7501348Z [ RUN ] Simplify.HashForLoopOptions 
2023-01-11T23:32:28.7501680Z [ OK ] Simplify.HashForLoopOptions (0 ms) 2023-01-11T23:32:28.7502009Z [ RUN ] Simplify.SimplifyAdd 2023-01-11T23:32:28.7502308Z [ OK ] Simplify.SimplifyAdd (0 ms) 2023-01-11T23:32:28.7502594Z [ RUN ] Simplify.SimplifySub 2023-01-11T23:32:28.7502891Z [ OK ] Simplify.SimplifySub (0 ms) 2023-01-11T23:32:28.7503207Z [ RUN ] Simplify.SimplifyMultiLayer 2023-01-11T23:32:28.7503534Z [ OK ] Simplify.SimplifyMultiLayer (0 ms) 2023-01-11T23:32:28.7503957Z [ RUN ] Simplify.SimplifyMultiTerm 2023-01-11T23:32:28.7504358Z [ OK ] Simplify.SimplifyMultiTerm (0 ms) 2023-01-11T23:32:28.7504741Z [ RUN ] Simplify.SimplifyCasts 2023-01-11T23:32:28.7505058Z [ OK ] Simplify.SimplifyCasts (0 ms) 2023-01-11T23:32:28.7505389Z [ RUN ] Simplify.SimplifyEliminatesNoOps 2023-01-11T23:32:28.7505742Z [ OK ] Simplify.SimplifyEliminatesNoOps (0 ms) 2023-01-11T23:32:28.7506076Z [ RUN ] Simplify.SimplifyMultiVar 2023-01-11T23:32:28.7506399Z [ OK ] Simplify.SimplifyMultiVar (0 ms) 2023-01-11T23:32:28.7506728Z [ RUN ] Simplify.SimplifyEliminatesVar 2023-01-11T23:32:28.7507068Z [ OK ] Simplify.SimplifyEliminatesVar (0 ms) 2023-01-11T23:32:28.7507389Z [ RUN ] Simplify.SimplifyAdds 2023-01-11T23:32:28.7507696Z [ OK ] Simplify.SimplifyAdds (0 ms) 2023-01-11T23:32:28.7507987Z [ RUN ] Simplify.SimplifyMuls 2023-01-11T23:32:28.7508296Z [ OK ] Simplify.SimplifyMuls (0 ms) 2023-01-11T23:32:28.7508621Z [ RUN ] Simplify.SimplifySubs 2023-01-11T23:32:28.7512408Z [ OK ] Simplify.SimplifySubs (0 ms) 2023-01-11T23:32:28.7512845Z [ RUN ] Simplify.SimplifyDiv 2023-01-11T23:32:28.7513203Z [ OK ] Simplify.SimplifyDiv (0 ms) 2023-01-11T23:32:28.7513551Z [ RUN ] Simplify.SimplifyDivWithLoopContext0 2023-01-11T23:32:28.7513924Z [ OK ] Simplify.SimplifyDivWithLoopContext0 (0 ms) 2023-01-11T23:32:28.7514337Z [ RUN ] Simplify.SimplifyDivWithLoopContext1 2023-01-11T23:32:28.7514837Z [ OK ] Simplify.SimplifyDivWithLoopContext1 (0 ms) 2023-01-11T23:32:28.7515234Z [ RUN ] Simplify.SimplifyDivWithLoopContext2 2023-01-11T23:32:28.7515955Z [ OK ] Simplify.SimplifyDivWithLoopContext2 (0 ms) 2023-01-11T23:32:28.7516330Z [ RUN ] Simplify.SimplifyDivWithLoopContext3 2023-01-11T23:32:28.7516698Z [ OK ] Simplify.SimplifyDivWithLoopContext3 (0 ms) 2023-01-11T23:32:28.7517060Z [ RUN ] Simplify.SimplifyDivWithLoopContext4 2023-01-11T23:32:28.7519259Z [ OK ] Simplify.SimplifyDivWithLoopContext4 (0 ms) 2023-01-11T23:32:28.7519638Z [ RUN ] Simplify.SimplifyDivWithLoopContext5 2023-01-11T23:32:28.7521496Z [ OK ] Simplify.SimplifyDivWithLoopContext5 (0 ms) 2023-01-11T23:32:28.7521875Z [ RUN ] Simplify.SimplifyDivWithLoopContext6 2023-01-11T23:32:28.7524724Z [ OK ] Simplify.SimplifyDivWithLoopContext6 (0 ms) 2023-01-11T23:32:28.7525252Z [ RUN ] Simplify.SimplifyDivWithLoopContext7 2023-01-11T23:32:28.7525736Z [ OK ] Simplify.SimplifyDivWithLoopContext7 (0 ms) 2023-01-11T23:32:28.7526175Z [ RUN ] Simplify.SimplifyModWithLoopContext0 2023-01-11T23:32:28.7526554Z [ OK ] Simplify.SimplifyModWithLoopContext0 (0 ms) 2023-01-11T23:32:28.7526992Z [ RUN ] Simplify.SimplifyModWithLoopContext1 2023-01-11T23:32:28.7527987Z [ OK ] Simplify.SimplifyModWithLoopContext1 (0 ms) 2023-01-11T23:32:28.7528374Z [ RUN ] Simplify.SimplifyModWithLoopContext2 2023-01-11T23:32:28.7529995Z [ OK ] Simplify.SimplifyModWithLoopContext2 (0 ms) 2023-01-11T23:32:28.7530369Z [ RUN ] Simplify.SimplifyModWithLoopContext3 2023-01-11T23:32:28.7530740Z [ OK ] Simplify.SimplifyModWithLoopContext3 (0 ms) 2023-01-11T23:32:28.7531110Z [ RUN ] Simplify.SimplifyModWithLoopContext4 
2023-01-11T23:32:28.7533151Z [ OK ] Simplify.SimplifyModWithLoopContext4 (0 ms) 2023-01-11T23:32:28.7533517Z [ RUN ] Simplify.SimplifyModWithLoopContext5 2023-01-11T23:32:28.7535740Z [ OK ] Simplify.SimplifyModWithLoopContext5 (0 ms) 2023-01-11T23:32:28.7536114Z [ RUN ] Simplify.SimplifyModWithLoopContext6 2023-01-11T23:32:28.7539157Z [ OK ] Simplify.SimplifyModWithLoopContext6 (0 ms) 2023-01-11T23:32:28.7539533Z [ RUN ] Simplify.SimplifyModWithLoopContext7 2023-01-11T23:32:28.7540018Z [ OK ] Simplify.SimplifyModWithLoopContext7 (0 ms) 2023-01-11T23:32:28.7540451Z [ RUN ] Simplify.SimplifyMod 2023-01-11T23:32:28.7542095Z [ OK ] Simplify.SimplifyMod (0 ms) 2023-01-11T23:32:28.7542467Z [ RUN ] Simplify.SimplifyMultiOp 2023-01-11T23:32:28.7544327Z [ OK ] Simplify.SimplifyMultiOp (0 ms) 2023-01-11T23:32:28.7544647Z [ RUN ] Simplify.SimplifyManyOps 2023-01-11T23:32:28.7547719Z [ OK ] Simplify.SimplifyManyOps (0 ms) 2023-01-11T23:32:28.7548059Z [ RUN ] Simplify.SimplifyFactorization 2023-01-11T23:32:28.7554067Z [ OK ] Simplify.SimplifyFactorization (0 ms) 2023-01-11T23:32:28.7554528Z [ RUN ] Simplify.SimplifyFactorizeUneven 2023-01-11T23:32:28.7554921Z [ OK ] Simplify.SimplifyFactorizeUneven (0 ms) 2023-01-11T23:32:28.7555264Z [ RUN ] Simplify.SimplifyDeeperTerms 2023-01-11T23:32:28.7555609Z [ OK ] Simplify.SimplifyDeeperTerms (0 ms) 2023-01-11T23:32:28.7555953Z [ RUN ] Simplify.SimplifyDeeperDifference 2023-01-11T23:32:28.7556383Z [ OK ] Simplify.SimplifyDeeperDifference (0 ms) 2023-01-11T23:32:28.7556912Z [ RUN ] Simplify.SimplifyFoldComplexDifference 2023-01-11T23:32:28.7557483Z [ OK ] Simplify.SimplifyFoldComplexDifference (0 ms) 2023-01-11T23:32:28.7557838Z [ RUN ] Simplify.SimplifyIfComponents 2023-01-11T23:32:28.7558184Z [ OK ] Simplify.SimplifyIfComponents (0 ms) 2023-01-11T23:32:28.7558603Z [ RUN ] Simplify.SimplifyOpaqueTerms 2023-01-11T23:32:28.7559064Z [ OK ] Simplify.SimplifyOpaqueTerms (0 ms) 2023-01-11T23:32:28.7559402Z [ RUN ] Simplify.SimplifySymbolicMinMax 2023-01-11T23:32:28.7560553Z [ OK ] Simplify.SimplifySymbolicMinMax (0 ms) 2023-01-11T23:32:28.7561149Z [ RUN ] Simplify.SimplifyNestedMax 2023-01-11T23:32:28.7573791Z [ OK ] Simplify.SimplifyNestedMax (1 ms) 2023-01-11T23:32:28.7574318Z [ RUN ] Simplify.SimplifyNestedMin 2023-01-11T23:32:28.7587359Z [ OK ] Simplify.SimplifyNestedMin (1 ms) 2023-01-11T23:32:28.7587909Z [ RUN ] Simplify.SimplifyWontReorderFloat 2023-01-11T23:32:28.7588880Z [ OK ] Simplify.SimplifyWontReorderFloat (0 ms) 2023-01-11T23:32:28.7589432Z [ RUN ] Simplify.SimplifyRoundModPattern 2023-01-11T23:32:28.7597832Z [ OK ] Simplify.SimplifyRoundModPattern (0 ms) 2023-01-11T23:32:28.7598474Z [ RUN ] Simplify.SimplifyRoundModPatternFactorization 2023-01-11T23:32:28.7602324Z [ OK ] Simplify.SimplifyRoundModPatternFactorization (0 ms) 2023-01-11T23:32:28.7602960Z [ RUN ] Simplify.SimplifyRoundModPatternMultivar 2023-01-11T23:32:28.7606941Z [ OK ] Simplify.SimplifyRoundModPatternMultivar (0 ms) 2023-01-11T23:32:28.7607558Z [ RUN ] Simplify.SimplifyModRoundModPattern 2023-01-11T23:32:28.7612272Z [ OK ] Simplify.SimplifyModRoundModPattern (0 ms) 2023-01-11T23:32:28.7612907Z [ RUN ] Simplify.SimplifyModRoundModPatternFactorization 2023-01-11T23:32:28.7620564Z [ OK ] Simplify.SimplifyModRoundModPatternFactorization (0 ms) 2023-01-11T23:32:28.7621242Z [ RUN ] Simplify.SimplifyModRoundModPatternMultivar 2023-01-11T23:32:28.7634474Z [ OK ] Simplify.SimplifyModRoundModPatternMultivar (1 ms) 2023-01-11T23:32:28.7635030Z [ RUN ] Simplify.SimplifyDivisionScalarFactorization 
2023-01-11T23:32:28.7636093Z [ OK ] Simplify.SimplifyDivisionScalarFactorization (0 ms) 2023-01-11T23:32:28.7636630Z [ RUN ] Simplify.SimplifyConstantBranches 2023-01-11T23:32:28.7637056Z [ OK ] Simplify.SimplifyConstantBranches (0 ms) 2023-01-11T23:32:28.7637497Z [ RUN ] Simplify.SimplifyConstantCond 2023-01-11T23:32:28.7638015Z [ OK ] Simplify.SimplifyConstantCond (0 ms) 2023-01-11T23:32:28.7638370Z [ RUN ] Simplify.SimplifyEliminateEmptyCond 2023-01-11T23:32:28.7638749Z [ OK ] Simplify.SimplifyEliminateEmptyCond (0 ms) 2023-01-11T23:32:28.7639117Z [ RUN ] Simplify.SimplifyConstantComparisons 2023-01-11T23:32:28.7645412Z [ OK ] Simplify.SimplifyConstantComparisons (0 ms) 2023-01-11T23:32:28.7646008Z [ RUN ] Simplify.SimplifySymbolicComparisons 2023-01-11T23:32:28.7652737Z [ OK ] Simplify.SimplifySymbolicComparisons (0 ms) 2023-01-11T23:32:28.7653342Z [ RUN ] Simplify.SimplifyEliminateZeroLengthFor 2023-01-11T23:32:28.7653943Z [ OK ] Simplify.SimplifyEliminateZeroLengthFor (0 ms) 2023-01-11T23:32:28.7654649Z [ RUN ] Simplify.SimplifyOneLoopFor 2023-01-11T23:32:28.7655184Z [ OK ] Simplify.SimplifyOneLoopFor (0 ms) 2023-01-11T23:32:28.7655728Z [ RUN ] Simplify.SimplifyForWontLoseLoopOptions 2023-01-11T23:32:28.7656307Z [ OK ] Simplify.SimplifyForWontLoseLoopOptions (0 ms) 2023-01-11T23:32:28.7656859Z [ RUN ] Simplify.SimplifyMultilevelFor 2023-01-11T23:32:28.7657379Z [ OK ] Simplify.SimplifyMultilevelFor (0 ms) 2023-01-11T23:32:28.7657873Z [ RUN ] Simplify.SimplifyForCleansUp 2023-01-11T23:32:28.7660234Z [ OK ] Simplify.SimplifyForCleansUp (0 ms) 2023-01-11T23:32:28.7660770Z [ RUN ] Simplify.SimplifyEliminateEmptyFor 2023-01-11T23:32:28.7661708Z [ OK ] Simplify.SimplifyEliminateEmptyFor (0 ms) 2023-01-11T23:32:28.7662095Z [ RUN ] Simplify.SimplifyFlattenBlock 2023-01-11T23:32:28.7662461Z [ OK ] Simplify.SimplifyFlattenBlock (0 ms) 2023-01-11T23:32:28.7662831Z [ RUN ] Simplify.SimplifyEliminateZeroLengthAlloc 2023-01-11T23:32:28.7664279Z [ OK ] Simplify.SimplifyEliminateZeroLengthAlloc (0 ms) 2023-01-11T23:32:28.7664651Z [ RUN ] Simplify.DontSimplifyRand 2023-01-11T23:32:28.7664975Z [ OK ] Simplify.DontSimplifyRand (0 ms) 2023-01-11T23:32:28.7665311Z [ RUN ] Simplify.SimplifyReorderForCond 2023-01-11T23:32:28.7669095Z [ OK ] Simplify.SimplifyReorderForCond (0 ms) 2023-01-11T23:32:28.7669449Z [ RUN ] Simplify.SimplifyFuseConditions 2023-01-11T23:32:28.7674866Z [ OK ] Simplify.SimplifyFuseConditions (0 ms) 2023-01-11T23:32:28.7675217Z [ RUN ] Simplify.SimplifySyncThreads 2023-01-11T23:32:28.7675596Z [ OK ] Simplify.SimplifySyncThreads (0 ms) 2023-01-11T23:32:28.7676009Z [ RUN ] Simplify.SimplifyRampSubBroadcast 2023-01-11T23:32:28.7676428Z [ OK ] Simplify.SimplifyRampSubBroadcast (0 ms) 2023-01-11T23:32:28.7676831Z [ RUN ] Simplify.SimplifyBroadcastTermExpander 2023-01-11T23:32:28.7677223Z [ OK ] Simplify.SimplifyBroadcastTermExpander (0 ms) 2023-01-11T23:32:28.7677584Z [ RUN ] Simplify.CompareSelectLoopBounds 2023-01-11T23:32:28.7760074Z [ OK ] Simplify.CompareSelectLoopBounds (8 ms) 2023-01-11T23:32:28.7760469Z [ RUN ] Simplify.CompareSelectCondAlwaysInLoopBounds 2023-01-11T23:32:28.7760951Z [ OK ] Simplify.CompareSelectCondAlwaysInLoopBounds (0 ms) 2023-01-11T23:32:28.7761383Z [ RUN ] Simplify.IfThenCondAlwaysInLoopBounds 2023-01-11T23:32:28.7761766Z [ OK ] Simplify.IfThenCondAlwaysInLoopBounds (0 ms) 2023-01-11T23:32:28.7762166Z [ RUN ] Simplify.MultiClauseCondAlwaysInLoopBounds 2023-01-11T23:32:28.7763915Z [ OK ] Simplify.MultiClauseCondAlwaysInLoopBounds (0 ms) 
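The Simplify.SimplifyRoundModPattern* cases above exercise the NNC simplifier's round/mod rewrite, which presumably rests on the identity (x / y) * y + x % y == x for integer x and nonzero y. A minimal Python sanity check of that identity follows (loop bounds are arbitrary; this is an illustration, not the NNC code under test):

# Sanity check of the identity behind the round/mod rewrite:
# (x / y) * y + x % y == x for integer x and nonzero y. Python uses
# floor division/modulo while C++ uses truncating division/modulo,
# but the combined identity holds under either convention.
for x in range(-20, 21):
    for y in range(-5, 6):
        if y == 0:
            continue
        assert (x // y) * y + x % y == x
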
2023-01-11T23:32:28.7764309Z [----------] 92 tests from Simplify (27 ms total) 2023-01-11T23:32:28.7764503Z 2023-01-11T23:32:28.7764725Z [----------] 12 tests from TEFuserPass 2023-01-11T23:32:28.7765153Z [ RUN ] TEFuserPass.FuserPass_1 2023-01-11T23:32:28.7770435Z [ OK ] TEFuserPass.FuserPass_1 (0 ms) 2023-01-11T23:32:28.7770751Z [ RUN ] TEFuserPass.FuserPass_2 2023-01-11T23:32:28.7774161Z [ OK ] TEFuserPass.FuserPass_2 (0 ms) 2023-01-11T23:32:28.7774584Z [ RUN ] TEFuserPass.FuserPass_3 2023-01-11T23:32:28.7777112Z [ OK ] TEFuserPass.FuserPass_3 (0 ms) 2023-01-11T23:32:28.7777467Z [ RUN ] TEFuserPass.FuserPass_0DimInput 2023-01-11T23:32:28.7779748Z [ OK ] TEFuserPass.FuserPass_0DimInput (0 ms) 2023-01-11T23:32:28.7780100Z [ RUN ] TEFuserPass.FuserPass_UnfusibleDevice 2023-01-11T23:32:28.7780461Z [ OK ] TEFuserPass.FuserPass_UnfusibleDevice (0 ms) 2023-01-11T23:32:28.7780808Z [ RUN ] TEFuserPass.FuserPass_UnknownShapes 2023-01-11T23:32:28.7781694Z [ OK ] TEFuserPass.FuserPass_UnknownShapes (0 ms) 2023-01-11T23:32:28.7782253Z [ RUN ] TEFuserPass.FuserPass_Multidevice 2023-01-11T23:32:28.7790718Z [ OK ] TEFuserPass.FuserPass_Multidevice (0 ms) 2023-01-11T23:32:28.7791249Z [ RUN ] TEFuserPass.FuserPass_MergeGroups 2023-01-11T23:32:28.7794568Z [ OK ] TEFuserPass.FuserPass_MergeGroups (0 ms) 2023-01-11T23:32:28.7795157Z [ RUN ] TEFuserPass.FuserPass_IgnoreUnknownShapeAtStart 2023-01-11T23:32:28.7795754Z [ OK ] TEFuserPass.FuserPass_IgnoreUnknownShapeAtStart (0 ms) 2023-01-11T23:32:28.7796289Z [ RUN ] TEFuserPass.FuserPass_Where 2023-01-11T23:32:28.7797997Z [ OK ] TEFuserPass.FuserPass_Where (0 ms) 2023-01-11T23:32:28.7798523Z [ RUN ] TEFuserPass.FuserPass_WhereList 2023-01-11T23:32:28.7799486Z [ OK ] TEFuserPass.FuserPass_WhereList (0 ms) 2023-01-11T23:32:28.7799979Z [ RUN ] TEFuserPass.DynamicShapeFusion 2023-01-11T23:32:29.0239213Z [ OK ] TEFuserPass.DynamicShapeFusion (243 ms) 2023-01-11T23:32:29.0239602Z [----------] 12 tests from TEFuserPass (247 ms total) 2023-01-11T23:32:29.0239782Z 2023-01-11T23:32:29.0239926Z [----------] 3 tests from Type 2023-01-11T23:32:29.0240210Z [ RUN ] Type.Test01 2023-01-11T23:32:29.0240481Z [ OK ] Type.Test01 (0 ms) 2023-01-11T23:32:29.0240749Z [ RUN ] Type.BitCasting 2023-01-11T23:32:29.0241025Z [ OK ] Type.BitCasting (0 ms) 2023-01-11T23:32:29.0241306Z [ RUN ] Type.Propagation 2023-01-11T23:32:29.0241599Z [ OK ] Type.Propagation (0 ms) 2023-01-11T23:32:29.0241911Z [----------] 3 tests from Type (0 ms total) 2023-01-11T23:32:29.0242064Z 2023-01-11T23:32:29.0242265Z [----------] 1 test from SpecializationsInCustomPasses 2023-01-11T23:32:29.0242634Z [ RUN ] SpecializationsInCustomPasses.Basic 2023-01-11T23:32:29.0257096Z [ OK ] SpecializationsInCustomPasses.Basic (1 ms) 2023-01-11T23:32:29.0257509Z [----------] 1 test from SpecializationsInCustomPasses (2 ms total) 2023-01-11T23:32:29.0257711Z 2023-01-11T23:32:29.0257854Z [----------] 26 tests from Cuda 2023-01-11T23:32:29.0258124Z [ RUN ] Cuda.Sigmoid_CUDA 2023-01-11T23:32:29.2705668Z [ OK ] Cuda.Sigmoid_CUDA (244 ms) 2023-01-11T23:32:29.2706301Z [ RUN ] Cuda.TestVectorAdd01_CUDA 2023-01-11T23:32:31.1580072Z [ OK ] Cuda.TestVectorAdd01_CUDA (1887 ms) 2023-01-11T23:32:31.1580735Z [ RUN ] Cuda.TestVectorAdd02_CUDA 2023-01-11T23:32:31.6259428Z [ OK ] Cuda.TestVectorAdd02_CUDA (467 ms) 2023-01-11T23:32:31.6259739Z [ RUN ] Cuda.HalfCast_CUDA 2023-01-11T23:32:31.8601078Z [ OK ] Cuda.HalfCast_CUDA (234 ms) 2023-01-11T23:32:31.8601485Z [ RUN ] Cuda.DynamicShape2D_CUDA 2023-01-11T23:32:32.6085869Z [ OK ] 
Cuda.DynamicShape2D_CUDA (748 ms) 2023-01-11T23:32:32.6086223Z [ RUN ] Cuda.TestRand01_CUDA 2023-01-11T23:32:32.8567663Z [ OK ] Cuda.TestRand01_CUDA (248 ms) 2023-01-11T23:32:32.8567996Z [ RUN ] Cuda.DynamicShapeSplit_CUDA 2023-01-11T23:32:33.0940256Z [ OK ] Cuda.DynamicShapeSplit_CUDA (236 ms) 2023-01-11T23:32:33.0940647Z [ RUN ] Cuda.OneBlockOneThreadGlobalReduce1_CUDA 2023-01-11T23:32:33.3361548Z [ OK ] Cuda.OneBlockOneThreadGlobalReduce1_CUDA (242 ms) 2023-01-11T23:32:33.3361983Z [ RUN ] Cuda.OneBlockMultiThreadGlobalReduce1_CUDA 2023-01-11T23:32:33.5695921Z [ OK ] Cuda.OneBlockMultiThreadGlobalReduce1_CUDA (233 ms) 2023-01-11T23:32:33.5696668Z [ RUN ] Cuda.NoThreadIdxWrite_1_CUDA 2023-01-11T23:32:33.8070326Z [ OK ] Cuda.NoThreadIdxWrite_1_CUDA (237 ms) 2023-01-11T23:32:33.8070722Z [ RUN ] Cuda.SharedMemReduce_1_CUDA 2023-01-11T23:32:34.0568643Z [ OK ] Cuda.SharedMemReduce_1_CUDA (249 ms) 2023-01-11T23:32:34.0569535Z [ RUN ] Cuda.LocalMemReduce_1_CUDA 2023-01-11T23:32:34.3070884Z [ OK ] Cuda.LocalMemReduce_1_CUDA (249 ms) 2023-01-11T23:32:34.3071803Z [ RUN ] Cuda.HalfSupport_CUDA 2023-01-11T23:32:34.5446407Z [ OK ] Cuda.HalfSupport_CUDA (237 ms) 2023-01-11T23:32:34.5446881Z [ RUN ] Cuda.HalfPropagation_CUDA 2023-01-11T23:32:34.7825093Z [ OK ] Cuda.HalfPropagation_CUDA (237 ms) 2023-01-11T23:32:34.7825560Z [ RUN ] Cuda.UnusedHalfArgument_CUDA 2023-01-11T23:32:35.0182582Z [ OK ] Cuda.UnusedHalfArgument_CUDA (235 ms) 2023-01-11T23:32:35.0183275Z [ RUN ] Cuda.PrioritizeDependents_CUDA 2023-01-11T23:32:35.2516881Z [ OK ] Cuda.PrioritizeDependents_CUDA (233 ms) 2023-01-11T23:32:35.2517559Z [ RUN ] Cuda.MaskBlockDim_CUDA 2023-01-11T23:32:35.4852878Z [ OK ] Cuda.MaskBlockDim_CUDA (233 ms) 2023-01-11T23:32:35.4853543Z [ RUN ] Cuda.MaskThreadDim_CUDA 2023-01-11T23:32:35.7197326Z [ OK ] Cuda.MaskThreadDim_CUDA (234 ms) 2023-01-11T23:32:35.7198051Z [ RUN ] Cuda.MaskMultiBlockDim_CUDA 2023-01-11T23:32:35.9537216Z [ OK ] Cuda.MaskMultiBlockDim_CUDA (234 ms) 2023-01-11T23:32:35.9538401Z [ RUN ] Cuda.MaskBlockAndThreadDim_CUDA 2023-01-11T23:32:36.1882440Z [ OK ] Cuda.MaskBlockAndThreadDim_CUDA (234 ms) 2023-01-11T23:32:36.1882843Z [ RUN ] Cuda.MaskMultiDim_CUDA 2023-01-11T23:32:36.4248977Z [ OK ] Cuda.MaskMultiDim_CUDA (236 ms) 2023-01-11T23:32:36.4249484Z [ RUN ] Cuda.MaskMultiDimSymbolic_CUDA 2023-01-11T23:32:36.6648696Z [ OK ] Cuda.MaskMultiDimSymbolic_CUDA (240 ms) 2023-01-11T23:32:36.6649065Z [ RUN ] Cuda.MaskCompoundInnerLoop_CUDA 2023-01-11T23:32:36.9017545Z [ OK ] Cuda.MaskCompoundInnerLoop_CUDA (236 ms) 2023-01-11T23:32:36.9018001Z [ RUN ] Cuda.MaskInnerLoopOneBlock_CUDA 2023-01-11T23:32:37.1497561Z [ OK ] Cuda.MaskInnerLoopOneBlock_CUDA (248 ms) 2023-01-11T23:32:37.1498285Z [ RUN ] Cuda.MaskMultiDimMultiAxis_CUDA 2023-01-11T23:32:37.3868214Z [ OK ] Cuda.MaskMultiDimMultiAxis_CUDA (237 ms) 2023-01-11T23:32:37.3868576Z [ RUN ] Cuda.MaskMultiDimMultiLevel_CUDA 2023-01-11T23:32:37.6233448Z [ OK ] Cuda.MaskMultiDimMultiLevel_CUDA (236 ms) 2023-01-11T23:32:37.6234286Z [----------] 26 tests from Cuda (8597 ms total) 2023-01-11T23:32:37.6234620Z 2023-01-11T23:32:37.6234908Z [----------] 150 tests from LLVM 2023-01-11T23:32:37.6235436Z [ RUN ] LLVM.ByteImmTest 2023-01-11T23:32:37.6420277Z [ OK ] LLVM.ByteImmTest (18 ms) 2023-01-11T23:32:37.6420841Z [ RUN ] LLVM.CharImmTest 2023-01-11T23:32:37.6600436Z [ OK ] LLVM.CharImmTest (18 ms) 2023-01-11T23:32:37.6601040Z [ RUN ] LLVM.ShortImmTest 2023-01-11T23:32:37.6781035Z [ OK ] LLVM.ShortImmTest (18 ms) 2023-01-11T23:32:37.6781358Z [ RUN ] LLVM.IntImmTest 
2023-01-11T23:32:37.6960605Z [ OK ] LLVM.IntImmTest (17 ms) 2023-01-11T23:32:37.6960905Z [ RUN ] LLVM.LongImmTest 2023-01-11T23:32:37.7142840Z [ OK ] LLVM.LongImmTest (18 ms) 2023-01-11T23:32:37.7143563Z [ RUN ] LLVM.FloatImmTest 2023-01-11T23:32:37.7321662Z [ OK ] LLVM.FloatImmTest (18 ms) 2023-01-11T23:32:37.7321979Z [ RUN ] LLVM.DoubleImmTest 2023-01-11T23:32:37.7504606Z [ OK ] LLVM.DoubleImmTest (18 ms) 2023-01-11T23:32:37.7505202Z [ RUN ] LLVM.HalfImmTest 2023-01-11T23:32:37.7683969Z [ OK ] LLVM.HalfImmTest (18 ms) 2023-01-11T23:32:37.7684286Z [ RUN ] LLVM.ByteAddTest 2023-01-11T23:32:37.7867883Z [ OK ] LLVM.ByteAddTest (18 ms) 2023-01-11T23:32:37.7868282Z [ RUN ] LLVM.CharAddTest 2023-01-11T23:32:37.8047526Z [ OK ] LLVM.CharAddTest (18 ms) 2023-01-11T23:32:37.8047835Z [ RUN ] LLVM.ShortAddTest 2023-01-11T23:32:37.8230191Z [ OK ] LLVM.ShortAddTest (17 ms) 2023-01-11T23:32:37.8230705Z [ RUN ] LLVM.IntAddTest 2023-01-11T23:32:37.8408246Z [ OK ] LLVM.IntAddTest (17 ms) 2023-01-11T23:32:37.8408565Z [ RUN ] LLVM.LongAddTest 2023-01-11T23:32:37.8593047Z [ OK ] LLVM.LongAddTest (18 ms) 2023-01-11T23:32:37.8593642Z [ RUN ] LLVM.FloatAddTest 2023-01-11T23:32:37.8770045Z [ OK ] LLVM.FloatAddTest (18 ms) 2023-01-11T23:32:37.8770351Z [ RUN ] LLVM.DoubleAddTest 2023-01-11T23:32:37.8955222Z [ OK ] LLVM.DoubleAddTest (18 ms) 2023-01-11T23:32:37.8955822Z [ RUN ] LLVM.HalfAddTest 2023-01-11T23:32:37.9133149Z [ OK ] LLVM.HalfAddTest (18 ms) 2023-01-11T23:32:37.9133763Z [ RUN ] LLVM.ByteSubTest 2023-01-11T23:32:37.9316368Z [ OK ] LLVM.ByteSubTest (18 ms) 2023-01-11T23:32:37.9316956Z [ RUN ] LLVM.CharSubTest 2023-01-11T23:32:37.9492966Z [ OK ] LLVM.CharSubTest (17 ms) 2023-01-11T23:32:37.9493421Z [ RUN ] LLVM.ShortSubTest 2023-01-11T23:32:37.9677141Z [ OK ] LLVM.ShortSubTest (18 ms) 2023-01-11T23:32:37.9677753Z [ RUN ] LLVM.IntSubTest 2023-01-11T23:32:37.9855483Z [ OK ] LLVM.IntSubTest (17 ms) 2023-01-11T23:32:37.9856070Z [ RUN ] LLVM.LongSubTest 2023-01-11T23:32:38.0037289Z [ OK ] LLVM.LongSubTest (18 ms) 2023-01-11T23:32:38.0037882Z [ RUN ] LLVM.FloatSubTest 2023-01-11T23:32:38.0217729Z [ OK ] LLVM.FloatSubTest (18 ms) 2023-01-11T23:32:38.0218373Z [ RUN ] LLVM.DoubleSubTest 2023-01-11T23:32:38.0399469Z [ OK ] LLVM.DoubleSubTest (18 ms) 2023-01-11T23:32:38.0400067Z [ RUN ] LLVM.HalfSubTest 2023-01-11T23:32:38.0578796Z [ OK ] LLVM.HalfSubTest (18 ms) 2023-01-11T23:32:38.0579377Z [ RUN ] LLVM.ByteMulTest 2023-01-11T23:32:38.0759748Z [ OK ] LLVM.ByteMulTest (17 ms) 2023-01-11T23:32:38.0760337Z [ RUN ] LLVM.CharMulTest 2023-01-11T23:32:38.0940292Z [ OK ] LLVM.CharMulTest (17 ms) 2023-01-11T23:32:38.0940882Z [ RUN ] LLVM.ShortMulTest 2023-01-11T23:32:38.1120098Z [ OK ] LLVM.ShortMulTest (18 ms) 2023-01-11T23:32:38.1120682Z [ RUN ] LLVM.IntMulTest 2023-01-11T23:32:38.1302225Z [ OK ] LLVM.IntMulTest (18 ms) 2023-01-11T23:32:38.1302827Z [ RUN ] LLVM.LongMulTest 2023-01-11T23:32:38.1482633Z [ OK ] LLVM.LongMulTest (18 ms) 2023-01-11T23:32:38.1483211Z [ RUN ] LLVM.FloatMulTest 2023-01-11T23:32:38.1664811Z [ OK ] LLVM.FloatMulTest (18 ms) 2023-01-11T23:32:38.1665632Z [ RUN ] LLVM.DoubleMulTest 2023-01-11T23:32:38.1843309Z [ OK ] LLVM.DoubleMulTest (18 ms) 2023-01-11T23:32:38.1843702Z [ RUN ] LLVM.HalfMulTest 2023-01-11T23:32:38.2026170Z [ OK ] LLVM.HalfMulTest (18 ms) 2023-01-11T23:32:38.2026823Z [ RUN ] LLVM.ByteDivTest 2023-01-11T23:32:38.2204997Z [ OK ] LLVM.ByteDivTest (18 ms) 2023-01-11T23:32:38.2205372Z [ RUN ] LLVM.CharDivTest 2023-01-11T23:32:38.2387153Z [ OK ] LLVM.CharDivTest (17 ms) 
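The *ImmTest/*AddTest/*SubTest/*MulTest/*DivTest cases in this block sweep one scalar op at a time across the full dtype ladder -- Byte/Char/Short/Int/Long/Half/Float/Double, which correspond to uint8/int8/int16/int64/float16/float32/float64 and friends in PyTorch. A rough eager-mode analogue of such a sweep, using public torch ops rather than the LLVM codegen actually under test (tensor values are arbitrary; a sketch, not the test's real harness):

import torch

# Dtype ladder matching the Byte/Char/Short/Int/Long/Half/Float/Double
# suffixes in the gtest names above.
dtypes = [torch.uint8, torch.int8, torch.int16, torch.int32,
          torch.int64, torch.float16, torch.float32, torch.float64]

for dt in dtypes:
    a = torch.tensor([2], dtype=dt)
    b = torch.tensor([3], dtype=dt)
    # Elementwise add/mul stay in-dtype and produce the expected values,
    # mirroring what the *AddTest / *MulTest cases assert for the kernels.
    assert (a + b).dtype == dt and (a + b).item() == 5
    assert (a * b).dtype == dt and (a * b).item() == 6
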
2023-01-11T23:32:38.2388530Z [ RUN ] LLVM.ShortDivTest 2023-01-11T23:32:38.2564899Z [ OK ] LLVM.ShortDivTest (17 ms) 2023-01-11T23:32:38.2565200Z [ RUN ] LLVM.IntDivTest 2023-01-11T23:32:38.2749046Z [ OK ] LLVM.IntDivTest (18 ms) 2023-01-11T23:32:38.2749612Z [ RUN ] LLVM.LongDivTest 2023-01-11T23:32:38.2927059Z [ OK ] LLVM.LongDivTest (18 ms) 2023-01-11T23:32:38.2927366Z [ RUN ] LLVM.FloatDivTest 2023-01-11T23:32:38.3110371Z [ OK ] LLVM.FloatDivTest (18 ms) 2023-01-11T23:32:38.3110999Z [ RUN ] LLVM.DoubleDivTest 2023-01-11T23:32:38.3289308Z [ OK ] LLVM.DoubleDivTest (18 ms) 2023-01-11T23:32:38.3289618Z [ RUN ] LLVM.HalfDivTest 2023-01-11T23:32:38.3472834Z [ OK ] LLVM.HalfDivTest (18 ms) 2023-01-11T23:32:38.3473157Z [ RUN ] LLVM.IntToFloatCastTest 2023-01-11T23:32:38.3651721Z [ OK ] LLVM.IntToFloatCastTest (18 ms) 2023-01-11T23:32:38.3652544Z [ RUN ] LLVM.FloatToIntCastTest 2023-01-11T23:32:38.3835029Z [ OK ] LLVM.FloatToIntCastTest (18 ms) 2023-01-11T23:32:38.3835630Z [ RUN ] LLVM.IntToLongCastTest 2023-01-11T23:32:38.4012529Z [ OK ] LLVM.IntToLongCastTest (17 ms) 2023-01-11T23:32:38.4013122Z [ RUN ] LLVM.ByteToCharCastTest 2023-01-11T23:32:38.4193972Z [ OK ] LLVM.ByteToCharCastTest (18 ms) 2023-01-11T23:32:38.4194309Z [ RUN ] LLVM.HalfToLongCastTest 2023-01-11T23:32:38.4376440Z [ OK ] LLVM.HalfToLongCastTest (18 ms) 2023-01-11T23:32:38.4377089Z [ RUN ] LLVM.ByteToDoubleCastTest 2023-01-11T23:32:38.4560416Z [ OK ] LLVM.ByteToDoubleCastTest (18 ms) 2023-01-11T23:32:38.4561030Z [ RUN ] LLVM.FloatToByteCastTest 2023-01-11T23:32:38.4738292Z [ OK ] LLVM.FloatToByteCastTest (18 ms) 2023-01-11T23:32:38.4738980Z [ RUN ] LLVM.FloatToCharCastTest 2023-01-11T23:32:38.4920080Z [ OK ] LLVM.FloatToCharCastTest (17 ms) 2023-01-11T23:32:38.4920692Z [ RUN ] LLVM.ByteToFloatCastTest 2023-01-11T23:32:38.5100640Z [ OK ] LLVM.ByteToFloatCastTest (18 ms) 2023-01-11T23:32:38.5101173Z [ RUN ] LLVM.CharToFloatCastTest 2023-01-11T23:32:38.5278655Z [ OK ] LLVM.CharToFloatCastTest (17 ms) 2023-01-11T23:32:38.5278964Z [ RUN ] LLVM.BitCast 2023-01-11T23:32:38.6001665Z [ OK ] LLVM.BitCast (72 ms) 2023-01-11T23:32:38.6001963Z [ RUN ] LLVM.fastLogFloat 2023-01-11T23:32:38.6710144Z [ OK ] LLVM.fastLogFloat (70 ms) 2023-01-11T23:32:38.6710780Z [ RUN ] LLVM.LetTest01 2023-01-11T23:32:38.6890416Z [ OK ] LLVM.LetTest01 (18 ms) 2023-01-11T23:32:38.6890708Z [ RUN ] LLVM.LetTest02 2023-01-11T23:32:38.7074701Z [ OK ] LLVM.LetTest02 (18 ms) 2023-01-11T23:32:38.7075258Z [ RUN ] LLVM.LetTestMultitype 2023-01-11T23:32:38.7253539Z [ OK ] LLVM.LetTestMultitype (18 ms) 2023-01-11T23:32:38.7254864Z [ RUN ] LLVM.BufferTest 2023-01-11T23:32:38.7438798Z [ OK ] LLVM.BufferTest (18 ms) 2023-01-11T23:32:38.7439354Z [ RUN ] LLVM.BlockTest 2023-01-11T23:32:38.7622227Z [ OK ] LLVM.BlockTest (18 ms) 2023-01-11T23:32:38.7622537Z [ RUN ] LLVM.LoadStoreTest 2023-01-11T23:32:38.7805407Z [ OK ] LLVM.LoadStoreTest (18 ms) 2023-01-11T23:32:38.7805722Z [ RUN ] LLVM.IfThenElseTest 2023-01-11T23:32:38.8010250Z [ OK ] LLVM.IfThenElseTest (20 ms) 2023-01-11T23:32:38.8011316Z [ RUN ] LLVM.CondNoFalseBlockTest 2023-01-11T23:32:38.8594140Z [ OK ] LLVM.CondNoFalseBlockTest (58 ms) 2023-01-11T23:32:38.8594747Z [ RUN ] LLVM.CondTest 2023-01-11T23:32:38.9183775Z [ OK ] LLVM.CondTest (59 ms) 2023-01-11T23:32:38.9184101Z [ RUN ] LLVM.CondNestedTest 2023-01-11T23:32:39.0026531Z [ OK ] LLVM.CondNestedTest (84 ms) 2023-01-11T23:32:39.0027282Z [ RUN ] LLVM.DirectVectorization 2023-01-11T23:32:39.0464591Z [ OK ] LLVM.DirectVectorization (43 ms) 2023-01-11T23:32:39.0465239Z 
[ RUN ] LLVM.VecLoadStoreTest 2023-01-11T23:32:39.0646610Z [ OK ] LLVM.VecLoadStoreTest (18 ms) 2023-01-11T23:32:39.0646932Z [ RUN ] LLVM.VecFloat_erfLane4Test 2023-01-11T23:32:39.0840851Z [ OK ] LLVM.VecFloat_erfLane4Test (19 ms) 2023-01-11T23:32:39.0841505Z [ RUN ] LLVM.VecFloat_erfcLane4Test 2023-01-11T23:32:39.1031028Z [ OK ] LLVM.VecFloat_erfcLane4Test (18 ms) 2023-01-11T23:32:39.1031685Z [ RUN ] LLVM.VecFloat_acosLane4Test 2023-01-11T23:32:39.1220695Z [ OK ] LLVM.VecFloat_acosLane4Test (19 ms) 2023-01-11T23:32:39.1221667Z [ RUN ] LLVM.VecFloat_asinLane4Test 2023-01-11T23:32:39.1409720Z [ OK ] LLVM.VecFloat_asinLane4Test (19 ms) 2023-01-11T23:32:39.1410050Z [ RUN ] LLVM.VecFloat_atanLane4Test 2023-01-11T23:32:39.1602638Z [ OK ] LLVM.VecFloat_atanLane4Test (19 ms) 2023-01-11T23:32:39.1603180Z [ RUN ] LLVM.VecFloat_coshLane4Test 2023-01-11T23:32:39.1794751Z [ OK ] LLVM.VecFloat_coshLane4Test (19 ms) 2023-01-11T23:32:39.1795873Z [ RUN ] LLVM.VecFloat_sinhLane4Test 2023-01-11T23:32:39.1987011Z [ OK ] LLVM.VecFloat_sinhLane4Test (19 ms) 2023-01-11T23:32:39.1987663Z [ RUN ] LLVM.VecFloat_tanhLane4Test 2023-01-11T23:32:39.2177452Z [ OK ] LLVM.VecFloat_tanhLane4Test (19 ms) 2023-01-11T23:32:39.2177811Z [ RUN ] LLVM.VecFloat_expm1Lane4Test 2023-01-11T23:32:39.2366943Z [ OK ] LLVM.VecFloat_expm1Lane4Test (19 ms) 2023-01-11T23:32:39.2367287Z [ RUN ] LLVM.VecFloat_lgammaLane4Test 2023-01-11T23:32:39.2560652Z [ OK ] LLVM.VecFloat_lgammaLane4Test (19 ms) 2023-01-11T23:32:39.2561331Z [ RUN ] LLVM.VecFloat_erfLane8Test 2023-01-11T23:32:39.2754328Z [ OK ] LLVM.VecFloat_erfLane8Test (19 ms) 2023-01-11T23:32:39.2755497Z [ RUN ] LLVM.VecFloat_erfcLane8Test 2023-01-11T23:32:39.2946046Z [ OK ] LLVM.VecFloat_erfcLane8Test (19 ms) 2023-01-11T23:32:39.2946678Z [ RUN ] LLVM.VecFloat_acosLane8Test 2023-01-11T23:32:39.3135803Z [ OK ] LLVM.VecFloat_acosLane8Test (19 ms) 2023-01-11T23:32:39.3136838Z [ RUN ] LLVM.VecFloat_asinLane8Test 2023-01-11T23:32:39.3325069Z [ OK ] LLVM.VecFloat_asinLane8Test (19 ms) 2023-01-11T23:32:39.3325507Z [ RUN ] LLVM.VecFloat_atanLane8Test 2023-01-11T23:32:39.3518807Z [ OK ] LLVM.VecFloat_atanLane8Test (19 ms) 2023-01-11T23:32:39.3519481Z [ RUN ] LLVM.VecFloat_coshLane8Test 2023-01-11T23:32:39.3709340Z [ OK ] LLVM.VecFloat_coshLane8Test (18 ms) 2023-01-11T23:32:39.3710129Z [ RUN ] LLVM.VecFloat_sinhLane8Test 2023-01-11T23:32:39.3900754Z [ OK ] LLVM.VecFloat_sinhLane8Test (19 ms) 2023-01-11T23:32:39.3901401Z [ RUN ] LLVM.VecFloat_tanhLane8Test 2023-01-11T23:32:39.4090120Z [ OK ] LLVM.VecFloat_tanhLane8Test (19 ms) 2023-01-11T23:32:39.4090519Z [ RUN ] LLVM.VecFloat_expm1Lane8Test 2023-01-11T23:32:39.4282669Z [ OK ] LLVM.VecFloat_expm1Lane8Test (19 ms) 2023-01-11T23:32:39.4283035Z [ RUN ] LLVM.VecFloat_lgammaLane8Test 2023-01-11T23:32:39.4474567Z [ OK ] LLVM.VecFloat_lgammaLane8Test (19 ms) 2023-01-11T23:32:39.4475221Z [ RUN ] LLVM.VecDouble_erfLane2Test 2023-01-11T23:32:39.4665878Z [ OK ] LLVM.VecDouble_erfLane2Test (19 ms) 2023-01-11T23:32:39.4666524Z [ RUN ] LLVM.VecDouble_erfcLane2Test 2023-01-11T23:32:39.4855118Z [ OK ] LLVM.VecDouble_erfcLane2Test (19 ms) 2023-01-11T23:32:39.4855778Z [ RUN ] LLVM.VecDouble_acosLane2Test 2023-01-11T23:32:39.5044714Z [ OK ] LLVM.VecDouble_acosLane2Test (19 ms) 2023-01-11T23:32:39.5045050Z [ RUN ] LLVM.VecDouble_asinLane2Test 2023-01-11T23:32:39.5239218Z [ OK ] LLVM.VecDouble_asinLane2Test (19 ms) 2023-01-11T23:32:39.5239863Z [ RUN ] LLVM.VecDouble_atanLane2Test 2023-01-11T23:32:39.5428758Z [ OK ] LLVM.VecDouble_atanLane2Test (19 ms) 
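The VecFloat_*LaneN / VecDouble_*LaneN cases above presumably check that the vectorized codegen of each transcendental (erf, erfc, acos, asin, atan, cosh, sinh, tanh, expm1, lgamma) agrees lane-by-lane with the scalar result. The same comparison can be sketched against Python's math module (input values and tolerance are ours, not the test's):

import math
import torch

# Four of the intrinsics from the lane tests, compared lane-by-lane
# against the scalar math-module reference.
x = torch.tensor([0.1, 0.5, 0.9, 1.3])  # one 4-lane float32 vector
for torch_fn, ref_fn in [(torch.erf, math.erf),
                         (torch.tanh, math.tanh),
                         (torch.expm1, math.expm1),
                         (torch.lgamma, math.lgamma)]:
    out = torch_fn(x)
    for lane in range(4):
        assert abs(out[lane].item() - ref_fn(x[lane].item())) < 1e-5
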
2023-01-11T23:32:39.5429411Z [ RUN ] LLVM.VecDouble_coshLane2Test 2023-01-11T23:32:39.5620725Z [ OK ] LLVM.VecDouble_coshLane2Test (19 ms) 2023-01-11T23:32:39.5621374Z [ RUN ] LLVM.VecDouble_sinhLane2Test 2023-01-11T23:32:39.5809499Z [ OK ] LLVM.VecDouble_sinhLane2Test (19 ms) 2023-01-11T23:32:39.5809836Z [ RUN ] LLVM.VecDouble_tanhLane2Test 2023-01-11T23:32:39.6002486Z [ OK ] LLVM.VecDouble_tanhLane2Test (19 ms) 2023-01-11T23:32:39.6002823Z [ RUN ] LLVM.VecDouble_expm1Lane2Test 2023-01-11T23:32:39.6194453Z [ OK ] LLVM.VecDouble_expm1Lane2Test (19 ms) 2023-01-11T23:32:39.6195118Z [ RUN ] LLVM.VecDouble_lgammaLane2Test 2023-01-11T23:32:39.6384987Z [ OK ] LLVM.VecDouble_lgammaLane2Test (19 ms) 2023-01-11T23:32:39.6385671Z [ RUN ] LLVM.VecDouble_erfLane4Test 2023-01-11T23:32:39.6576177Z [ OK ] LLVM.VecDouble_erfLane4Test (19 ms) 2023-01-11T23:32:39.6576821Z [ RUN ] LLVM.VecDouble_erfcLane4Test 2023-01-11T23:32:39.6765426Z [ OK ] LLVM.VecDouble_erfcLane4Test (19 ms) 2023-01-11T23:32:39.6766015Z [ RUN ] LLVM.VecDouble_acosLane4Test 2023-01-11T23:32:39.6960292Z [ OK ] LLVM.VecDouble_acosLane4Test (19 ms) 2023-01-11T23:32:39.6960950Z [ RUN ] LLVM.VecDouble_asinLane4Test 2023-01-11T23:32:39.7151352Z [ OK ] LLVM.VecDouble_asinLane4Test (19 ms) 2023-01-11T23:32:39.7152706Z [ RUN ] LLVM.VecDouble_atanLane4Test 2023-01-11T23:32:39.7340775Z [ OK ] LLVM.VecDouble_atanLane4Test (19 ms) 2023-01-11T23:32:39.7341429Z [ RUN ] LLVM.VecDouble_coshLane4Test 2023-01-11T23:32:39.7530118Z [ OK ] LLVM.VecDouble_coshLane4Test (19 ms) 2023-01-11T23:32:39.7530456Z [ RUN ] LLVM.VecDouble_sinhLane4Test 2023-01-11T23:32:39.7723020Z [ OK ] LLVM.VecDouble_sinhLane4Test (19 ms) 2023-01-11T23:32:39.7723488Z [ RUN ] LLVM.VecDouble_tanhLane4Test 2023-01-11T23:32:39.7916115Z [ OK ] LLVM.VecDouble_tanhLane4Test (19 ms) 2023-01-11T23:32:39.7916798Z [ RUN ] LLVM.VecDouble_expm1Lane4Test 2023-01-11T23:32:39.8110067Z [ OK ] LLVM.VecDouble_expm1Lane4Test (19 ms) 2023-01-11T23:32:39.8110437Z [ RUN ] LLVM.VecDouble_lgammaLane4Test 2023-01-11T23:32:39.8299117Z [ OK ] LLVM.VecDouble_lgammaLane4Test (19 ms) 2023-01-11T23:32:39.8299805Z [ RUN ] LLVM.VectorizerLoadStoreTest 2023-01-11T23:32:39.8485602Z [ OK ] LLVM.VectorizerLoadStoreTest (18 ms) 2023-01-11T23:32:39.8485937Z [ RUN ] LLVM.VectorizeBitCast 2023-01-11T23:32:39.8724402Z [ OK ] LLVM.VectorizeBitCast (23 ms) 2023-01-11T23:32:39.8725009Z [ RUN ] LLVM.MemcpyTest 2023-01-11T23:32:39.8931526Z [ OK ] LLVM.MemcpyTest (20 ms) 2023-01-11T23:32:39.8932470Z [ RUN ] LLVM.BzeroTest 2023-01-11T23:32:39.9130366Z [ OK ] LLVM.BzeroTest (19 ms) 2023-01-11T23:32:39.9130666Z [ RUN ] LLVM.ElemwiseAdd 2023-01-11T23:32:39.9431209Z [ OK ] LLVM.ElemwiseAdd (29 ms) 2023-01-11T23:32:39.9432394Z [ RUN ] LLVM.ElemwiseAddFloat 2023-01-11T23:32:39.9723687Z [ OK ] LLVM.ElemwiseAddFloat (29 ms) 2023-01-11T23:32:39.9724019Z [ RUN ] LLVM.ElemwiseLog10Float 2023-01-11T23:32:39.9961571Z [ OK ] LLVM.ElemwiseLog10Float (23 ms) 2023-01-11T23:32:39.9962066Z [ RUN ] LLVM.ElemwiseLog1pFloat 2023-01-11T23:32:40.0200107Z [ OK ] LLVM.ElemwiseLog1pFloat (23 ms) 2023-01-11T23:32:40.0200719Z [ RUN ] LLVM.ElemwiseMaxInt 2023-01-11T23:32:40.0494144Z [ OK ] LLVM.ElemwiseMaxInt (29 ms) 2023-01-11T23:32:40.0495093Z [ RUN ] LLVM.ElemwiseMinInt 2023-01-11T23:32:40.0793024Z [ OK ] LLVM.ElemwiseMinInt (29 ms) 2023-01-11T23:32:40.0794303Z [ RUN ] LLVM.ElemwiseMaxFloat 2023-01-11T23:32:40.1113685Z [ OK ] LLVM.ElemwiseMaxFloat (31 ms) 2023-01-11T23:32:40.1114368Z [ RUN ] LLVM.ElemwiseMaxNaNFloat 2023-01-11T23:32:40.1432662Z [ OK ] 
LLVM.ElemwiseMaxNaNFloat (31 ms) 2023-01-11T23:32:40.1433311Z [ RUN ] LLVM.ElemwiseMinFloat 2023-01-11T23:32:40.1753274Z [ OK ] LLVM.ElemwiseMinFloat (32 ms) 2023-01-11T23:32:40.1754307Z [ RUN ] LLVM.ElemwiseMinNaNFloat 2023-01-11T23:32:40.2073271Z [ OK ] LLVM.ElemwiseMinNaNFloat (31 ms) 2023-01-11T23:32:40.2073898Z [ RUN ] LLVM.ElemwiseMod 2023-01-11T23:32:40.2314069Z [ OK ] LLVM.ElemwiseMod (24 ms) 2023-01-11T23:32:40.2314726Z [ RUN ] LLVM.CompareSelectIntEQ 2023-01-11T23:32:40.2618513Z [ OK ] LLVM.CompareSelectIntEQ (30 ms) 2023-01-11T23:32:40.2619176Z [ RUN ] LLVM.CompareSelectFloatEQ 2023-01-11T23:32:40.2923767Z [ OK ] LLVM.CompareSelectFloatEQ (30 ms) 2023-01-11T23:32:40.2924110Z [ RUN ] LLVM.CompareSelectByteGT 2023-01-11T23:32:40.3249677Z [ OK ] LLVM.CompareSelectByteGT (32 ms) 2023-01-11T23:32:40.3250021Z [ RUN ] LLVM.CompareSelectByteGE 2023-01-11T23:32:40.3582626Z [ OK ] LLVM.CompareSelectByteGE (32 ms) 2023-01-11T23:32:40.3583285Z [ RUN ] LLVM.CompareSelectByteLT 2023-01-11T23:32:40.3910333Z [ OK ] LLVM.CompareSelectByteLT (32 ms) 2023-01-11T23:32:40.3911048Z [ RUN ] LLVM.CompareSelectByteLE 2023-01-11T23:32:40.4240362Z [ OK ] LLVM.CompareSelectByteLE (33 ms) 2023-01-11T23:32:40.4240975Z [ RUN ] LLVM.StoreFloat 2023-01-11T23:32:40.4421941Z [ OK ] LLVM.StoreFloat (18 ms) 2023-01-11T23:32:40.4422543Z [ RUN ] LLVM.SimpleMath01 2023-01-11T23:32:40.4740334Z [ OK ] LLVM.SimpleMath01 (31 ms) 2023-01-11T23:32:40.4740917Z [ RUN ] LLVM.ComputeMul 2023-01-11T23:32:40.5035733Z [ OK ] LLVM.ComputeMul (29 ms) 2023-01-11T23:32:40.5036328Z [ RUN ] LLVM.BroadcastAdd 2023-01-11T23:32:40.5395137Z [ OK ] LLVM.BroadcastAdd (35 ms) 2023-01-11T23:32:40.5395757Z [ RUN ] LLVM.BitwiseOps 2023-01-11T23:32:40.5575069Z [ OK ] LLVM.BitwiseOps (18 ms) 2023-01-11T23:32:40.5575748Z [ RUN ] LLVM.ArithmeticRightShift 2023-01-11T23:32:40.5757251Z [ OK ] LLVM.ArithmeticRightShift (18 ms) 2023-01-11T23:32:40.5757928Z [ RUN ] LLVM.LogicalRightShift 2023-01-11T23:32:40.5936216Z [ OK ] LLVM.LogicalRightShift (17 ms) 2023-01-11T23:32:40.5936856Z [ RUN ] LLVM.DynamicShapeAdd 2023-01-11T23:32:40.7020371Z [ OK ] LLVM.DynamicShapeAdd (108 ms) 2023-01-11T23:32:40.7021097Z [ RUN ] LLVM.BindDynamicShapeAdd 2023-01-11T23:32:40.8105046Z [ OK ] LLVM.BindDynamicShapeAdd (108 ms) 2023-01-11T23:32:40.8105510Z [ RUN ] LLVM.TensorDynamicShapeAdd 2023-01-11T23:32:40.9197032Z [ OK ] LLVM.TensorDynamicShapeAdd (108 ms) 2023-01-11T23:32:40.9198674Z [ RUN ] LLVM.DynamicShape2D 2023-01-11T23:32:41.0707106Z [ OK ] LLVM.DynamicShape2D (151 ms) 2023-01-11T23:32:41.0707642Z [ RUN ] LLVM.EmptyStmt 2023-01-11T23:32:41.0877456Z [ OK ] LLVM.EmptyStmt (17 ms) 2023-01-11T23:32:41.0878531Z [ RUN ] LLVM.EliminatedStmt 2023-01-11T23:32:41.1050566Z [ OK ] LLVM.EliminatedStmt (17 ms) 2023-01-11T23:32:41.1050895Z [ RUN ] LLVM.SimpleReduction 2023-01-11T23:32:41.2095438Z [ OK ] LLVM.SimpleReduction (104 ms) 2023-01-11T23:32:41.2096085Z [ RUN ] LLVM.RFactorReduction 2023-01-11T23:32:41.2388531Z [ OK ] LLVM.RFactorReduction (29 ms) 2023-01-11T23:32:41.2389171Z [ RUN ] LLVM.RFactorVectorizedReduction 2023-01-11T23:32:41.2916967Z [ OK ] LLVM.RFactorVectorizedReduction (52 ms) 2023-01-11T23:32:41.2917658Z [ RUN ] LLVM.SimpleParallelSS 2023-01-11T23:32:41.3168306Z [ OK ] LLVM.SimpleParallelSS (25 ms) 2023-01-11T23:32:41.3168630Z [ RUN ] LLVM.SimpleParallelSP 2023-01-11T23:32:41.3445023Z [ OK ] LLVM.SimpleParallelSP (27 ms) 2023-01-11T23:32:41.3445517Z [ RUN ] LLVM.SimpleParallelPS 2023-01-11T23:32:41.3738898Z [ OK ] LLVM.SimpleParallelPS (29 ms) 
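ElemwiseMaxNaNFloat and ElemwiseMinNaNFloat near the top of this block presumably pin down NaN propagation in the generated max/min kernels. Eager PyTorch documents the same behavior for torch.maximum/torch.minimum: if either compared element is NaN, NaN is returned. A quick illustration (values are ours):

import torch

# torch.maximum/torch.minimum return NaN whenever either compared
# element is NaN -- the behavior the NaN test variants presumably
# lock in for the generated kernels.
a = torch.tensor([1.0, float("nan")])
b = torch.tensor([float("nan"), 2.0])
assert torch.maximum(a, b).isnan().all()
assert torch.minimum(a, b).isnan().all()
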
2023-01-11T23:32:41.3739589Z [ RUN ] LLVM.SimpleParallelPP 2023-01-11T23:32:41.4017224Z [ OK ] LLVM.SimpleParallelPP (28 ms) 2023-01-11T23:32:41.4017586Z [ RUN ] LLVM.CompositeParallel 2023-01-11T23:32:44.9376231Z [ OK ] LLVM.CompositeParallel (3535 ms) 2023-01-11T23:32:44.9376616Z [ RUN ] LLVM.VectorizedGEMM 2023-01-11T23:32:45.0583698Z [ OK ] LLVM.VectorizedGEMM (120 ms) 2023-01-11T23:32:45.0583996Z [ RUN ] LLVM.CallRaw 2023-01-11T23:32:45.7203922Z [ OK ] LLVM.CallRaw (662 ms) 2023-01-11T23:32:45.7204222Z [ RUN ] LLVM.CustomTarget 2023-01-11T23:32:45.7465450Z [ OK ] LLVM.CustomTarget (26 ms) 2023-01-11T23:32:45.7465769Z [ RUN ] LLVM.CodeGenKernelFuncName 2023-01-11T23:32:45.7837552Z [ OK ] LLVM.CodeGenKernelFuncName (36 ms) 2023-01-11T23:32:45.7838230Z [----------] 150 tests from LLVM (8160 ms total) 2023-01-11T23:32:45.7838515Z 2023-01-11T23:32:45.7838820Z [----------] Global test environment tear-down 2023-01-11T23:32:45.7937314Z [==========] 829 tests from 26 test suites ran. (36068 ms total) 2023-01-11T23:32:45.7941609Z [ PASSED ] 829 tests. 2023-01-11T23:32:45.7941777Z 2023-01-11T23:32:45.7942503Z  YOU HAVE 5 DISABLED TESTS 2023-01-11T23:32:45.7942825Z 2023-01-11T23:32:45.9664381Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *android* ]] 2023-01-11T23:32:45.9664822Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *cuda* ]] 2023-01-11T23:32:45.9665072Z + assert_git_not_dirty 2023-01-11T23:32:45.9665375Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *rocm* ]] 2023-01-11T23:32:45.9665725Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *xla* ]] 2023-01-11T23:32:45.9667198Z ++ git status --porcelain 2023-01-11T23:32:46.0390733Z + git_status= 2023-01-11T23:32:46.0391602Z + [[ -n '' ]] 2023-01-11T23:32:46.0392025Z + test_aot_compilation 2023-01-11T23:32:46.0392310Z + echo 'Testing Ahead of Time compilation' 2023-01-11T23:32:46.0392565Z Testing Ahead of Time compilation 2023-01-11T23:32:46.0393417Z + ln -sf /opt/conda/lib/python3.10/site-packages/torch/lib/libc10.so /opt/conda/lib/python3.10/site-packages/torch/lib/libc10_cuda.so /opt/conda/lib/python3.10/site-packages/torch/lib/libc10d_cuda_test.so /opt/conda/lib/python3.10/site-packages/torch/bin 2023-01-11T23:32:46.0404870Z + ln -sf /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda_linalg.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_global_deps.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_python.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorchbind_test.so /opt/conda/lib/python3.10/site-packages/torch/bin 2023-01-11T23:32:46.0414640Z + TEST_REPORTS_DIR=test/test-reports/cpp-unittest/test_aot_compilation 2023-01-11T23:32:46.0415126Z + mkdir -p test/test-reports/cpp-unittest/test_aot_compilation 2023-01-11T23:32:46.0423760Z + '[' -f /opt/conda/lib/python3.10/site-packages/torch/bin/test_mobile_nnc ']' 2023-01-11T23:32:46.0424293Z + /opt/conda/lib/python3.10/site-packages/torch/bin/test_mobile_nnc --gtest_output=xml:test/test-reports/cpp-unittest/test_aot_compilation/test_mobile_nnc.xml 2023-01-11T23:32:46.4149334Z Note: Google Test filter = *-*_CUDA:*_MultiCUDA 2023-01-11T23:32:46.4150339Z [==========] Running 6 tests from 2 test suites. 2023-01-11T23:32:46.4151189Z [----------] Global test environment set-up. 
2023-01-11T23:32:46.4151908Z [----------] 4 tests from Function 2023-01-11T23:32:46.4152481Z [ RUN ] Function.ExecuteSlowMul 2023-01-11T23:32:46.4159060Z [ OK ] Function.ExecuteSlowMul (1 ms) 2023-01-11T23:32:46.4160154Z [ RUN ] Function.Serialization 2023-01-11T23:32:46.4160502Z [ OK ] Function.Serialization (0 ms) 2023-01-11T23:32:46.4160890Z [ RUN ] Function.ValidInput 2023-01-11T23:32:46.4161201Z [ OK ] Function.ValidInput (0 ms) 2023-01-11T23:32:46.4161506Z [ RUN ] Function.InvalidInput 2023-01-11T23:32:46.4167213Z [ OK ] Function.InvalidInput (0 ms) 2023-01-11T23:32:46.4167969Z [----------] 4 tests from Function (2 ms total) 2023-01-11T23:32:46.4168408Z 2023-01-11T23:32:46.4168723Z [----------] 2 tests from MobileNNCRegistryTest 2023-01-11T23:32:46.4169203Z [ RUN ] MobileNNCRegistryTest.FindAndRun 2023-01-11T23:32:46.4169670Z [ OK ] MobileNNCRegistryTest.FindAndRun (0 ms) 2023-01-11T23:32:46.4170100Z [ RUN ] MobileNNCRegistryTest.NoKernel 2023-01-11T23:32:46.4170593Z [ OK ] MobileNNCRegistryTest.NoKernel (0 ms) 2023-01-11T23:32:46.4170998Z [----------] 2 tests from MobileNNCRegistryTest (0 ms total) 2023-01-11T23:32:46.4171178Z 2023-01-11T23:32:46.4171343Z [----------] Global test environment tear-down 2023-01-11T23:32:46.4171668Z [==========] 6 tests from 2 test suites ran. (2 ms total) 2023-01-11T23:32:46.4171946Z [ PASSED ] 6 tests. 2023-01-11T23:32:46.4172076Z 2023-01-11T23:32:46.4172203Z  YOU HAVE 1 DISABLED TEST 2023-01-11T23:32:46.4172325Z 2023-01-11T23:32:46.4906295Z + '[' -f /opt/conda/lib/python3.10/site-packages/torch/bin/aot_model_compiler_test ']' 2023-01-11T23:32:46.4906720Z + source test/mobile/nnc/test_aot_compile.sh 2023-01-11T23:32:46.4907001Z ++ set -e -o pipefail 2023-01-11T23:32:46.4910659Z +++ python -c 'import site; print(site.getsitepackages()[0])' 2023-01-11T23:32:46.5057267Z ++ TORCH_INSTALL_DIR=/opt/conda/lib/python3.10/site-packages/torch 2023-01-11T23:32:46.5058114Z ++ TORCH_BIN_DIR=/opt/conda/lib/python3.10/site-packages/torch/bin 2023-01-11T23:32:46.5059113Z +++ dirname test/mobile/nnc/test_aot_compile.sh 2023-01-11T23:32:46.5068026Z ++ CURRENT_DIR=test/mobile/nnc 2023-01-11T23:32:46.5068579Z ++ MODEL=aot_test_model.pt 2023-01-11T23:32:46.5068894Z ++ COMPILED_MODEL=aot_test_model.compiled.pt 2023-01-11T23:32:46.5069237Z ++ COMPILED_CODE=aot_test_model.compiled.ll 2023-01-11T23:32:46.5071055Z +++ mktemp -d -t build_XXX 2023-01-11T23:32:46.5081771Z ++ TMP_DIR=/tmp/build_TDr 2023-01-11T23:32:46.5082137Z + test_custom_script_ops 2023-01-11T23:32:46.5082606Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *asan* ]] 2023-01-11T23:32:46.5083014Z + echo 'Testing custom script operators' 2023-01-11T23:32:46.5083330Z Testing custom script operators 2023-01-11T23:32:46.5083693Z + CUSTOM_OP_BUILD=/var/lib/jenkins/workspace/build/custom_test_artifacts/custom-op-build 2023-01-11T23:32:46.5083978Z + pushd test/custom_operator 2023-01-11T23:32:46.5084220Z ~/workspace/test/custom_operator ~/workspace 2023-01-11T23:32:46.5084587Z + cp -a /var/lib/jenkins/workspace/build/custom_test_artifacts/custom-op-build build 2023-01-11T23:32:46.5153037Z + python test_custom_ops.py -v 2023-01-11T23:32:47.8307936Z Test results will be stored in test-reports/python-unittest/test_custom_ops 2023-01-11T23:32:47.8315924Z 2023-01-11T23:32:47.8316119Z Running tests... 2023-01-11T23:32:47.8316484Z ---------------------------------------------------------------------- 2023-01-11T23:32:47.8359872Z test_calling_custom_op (__main__.TestCustomOperators) ... 
ok (0.004s) 2023-01-11T23:32:47.8836787Z test_calling_custom_op_inside_script_module (__main__.TestCustomOperators) ... ok (0.047s) 2023-01-11T23:32:47.8840079Z test_calling_custom_op_string (__main__.TestCustomOperators) ... ok (0.001s) 2023-01-11T23:32:47.8858235Z test_calling_custom_op_with_autograd (__main__.TestCustomOperators) ... /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/engine.cpp:1134.) 2023-01-11T23:32:47.8859491Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2023-01-11T23:32:47.8870551Z ok (0.003s) 2023-01-11T23:32:47.8878095Z test_calling_custom_op_with_autograd_in_nograd_mode (__main__.TestCustomOperators) ... ok (0.001s) 2023-01-11T23:32:47.8881337Z test_custom_library_is_loaded (__main__.TestCustomOperators) ... ok (0.000s) 2023-01-11T23:32:47.8967666Z test_saving_and_loading_script_module_with_custom_op (__main__.TestCustomOperators) ... ok (0.008s) 2023-01-11T23:32:47.8967938Z 2023-01-11T23:32:47.8968275Z ---------------------------------------------------------------------- 2023-01-11T23:32:47.8968643Z Ran 7 tests in 0.065s 2023-01-11T23:32:47.8968803Z 2023-01-11T23:32:47.8968898Z OK 2023-01-11T23:32:47.8969029Z 2023-01-11T23:32:47.8969591Z Generating XML reports... 2023-01-11T23:32:47.8996322Z Generated XML report: test-reports/python-unittest/test_custom_ops/TEST-TestCustomOperators-20230111233247.xml 2023-01-11T23:32:48.1999940Z + python model.py --export-script-module=model.pt 2023-01-11T23:32:49.5480986Z + build/test_custom_ops ./model.pt 2023-01-11T23:32:49.9356450Z [W engine.cpp:1134] Warning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. 
(function operator()) 2023-01-11T23:32:50.9038423Z ok 2023-01-11T23:32:51.0714975Z + popd 2023-01-11T23:32:51.0715236Z ~/workspace 2023-01-11T23:32:51.0715429Z + assert_git_not_dirty 2023-01-11T23:32:51.0715890Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *rocm* ]] 2023-01-11T23:32:51.0716256Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *xla* ]] 2023-01-11T23:32:51.0716535Z ++ git status --porcelain 2023-01-11T23:32:51.1435016Z + git_status= 2023-01-11T23:32:51.1435432Z + [[ -n '' ]] 2023-01-11T23:32:51.1435848Z + test_custom_backend 2023-01-11T23:32:51.1436205Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *asan* ]] 2023-01-11T23:32:51.1436506Z + echo 'Testing custom backends' 2023-01-11T23:32:51.1436722Z Testing custom backends 2023-01-11T23:32:51.1437088Z + CUSTOM_BACKEND_BUILD=/var/lib/jenkins/workspace/build/custom_test_artifacts/custom-backend-build 2023-01-11T23:32:51.1437399Z + pushd test/custom_backend 2023-01-11T23:32:51.1437639Z ~/workspace/test/custom_backend ~/workspace 2023-01-11T23:32:51.1438010Z + cp -a /var/lib/jenkins/workspace/build/custom_test_artifacts/custom-backend-build build 2023-01-11T23:32:51.1501785Z + python test_custom_backend.py -v 2023-01-11T23:32:52.4760010Z Test results will be stored in test-reports/python-unittest/test_custom_backend 2023-01-11T23:32:52.4771305Z 2023-01-11T23:32:52.4771603Z Running tests... 2023-01-11T23:32:52.4771952Z ---------------------------------------------------------------------- 2023-01-11T23:32:52.4778235Z test_execute (__main__.TestCustomBackend) 2023-01-11T23:32:52.5327813Z Test execution using the custom backend. ... ok (0.055s) 2023-01-11T23:32:52.5333282Z test_save_load (__main__.TestCustomBackend) 2023-01-11T23:32:52.5512280Z Test that a lowered module can be executed correctly ... ok (0.018s) 2023-01-11T23:32:52.5513595Z 2023-01-11T23:32:52.5514540Z ---------------------------------------------------------------------- 2023-01-11T23:32:52.5514842Z Ran 2 tests in 0.074s 2023-01-11T23:32:52.5515000Z 2023-01-11T23:32:52.5515295Z OK 2023-01-11T23:32:52.5515398Z 2023-01-11T23:32:52.5515501Z Generating XML reports... 2023-01-11T23:32:52.5538283Z Generated XML report: test-reports/python-unittest/test_custom_backend/TEST-TestCustomBackend-20230111233252.xml 2023-01-11T23:32:52.8517037Z + python backend.py --export-module-to=model.pt 2023-01-11T23:32:54.3198902Z + build/test_custom_backend ./model.pt 2023-01-11T23:32:54.7374453Z Testing custom_backend 2023-01-11T23:32:54.8858753Z OK 2023-01-11T23:32:54.9797877Z + rm -f ./model.pt 2023-01-11T23:32:54.9807049Z + popd 2023-01-11T23:32:54.9807258Z ~/workspace 2023-01-11T23:32:54.9807511Z + assert_git_not_dirty 2023-01-11T23:32:54.9807874Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *rocm* ]] 2023-01-11T23:32:54.9808294Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *xla* ]] 2023-01-11T23:32:54.9810171Z ++ git status --porcelain 2023-01-11T23:32:55.0541105Z + git_status= 2023-01-11T23:32:55.0541609Z + [[ -n '' ]] 2023-01-11T23:32:55.0541821Z + test_torch_function_benchmark 2023-01-11T23:32:55.0542107Z + echo 'Testing __torch_function__ benchmarks' 2023-01-11T23:32:55.0542356Z Testing __torch_function__ benchmarks 2023-01-11T23:32:55.0542626Z + pushd benchmarks/overrides_benchmark 2023-01-11T23:32:55.0542904Z ~/workspace/benchmarks/overrides_benchmark ~/workspace 2023-01-11T23:32:55.0543190Z + python bench.py -n 1 -m 2 2023-01-11T23:32:56.1616159Z Type tensor had a minimum time of 0.0064373016357421875 us and a standard deviation of 0.5222837207838893 us. 
2023-01-11T23:32:56.1617167Z Type SubTensor had a minimum time of 0.013589859008789062 us and a standard deviation of 0.028491269404185005 us. 2023-01-11T23:32:56.1617738Z Type WithTorchFunction had a minimum time of 0.00667572021484375 us and a standard deviation of 0.01281264212593669 us. 2023-01-11T23:32:56.1618321Z Type SubWithTorchFunction had a minimum time of 0.010967254638671875 us and a standard deviation of 0.006237733487068908 us. 2023-01-11T23:32:56.4001123Z + python pyspybench.py Tensor -n 1 2023-01-11T23:32:57.7378046Z + python pyspybench.py SubTensor -n 1 2023-01-11T23:32:59.0754885Z + python pyspybench.py WithTorchFunction -n 1 2023-01-11T23:33:00.4216272Z + python pyspybench.py SubWithTorchFunction -n 1 2023-01-11T23:33:01.7617636Z + popd 2023-01-11T23:33:01.7618049Z ~/workspace 2023-01-11T23:33:01.7618342Z + assert_git_not_dirty 2023-01-11T23:33:01.7618823Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *rocm* ]] 2023-01-11T23:33:01.7619181Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *xla* ]] 2023-01-11T23:33:01.7621354Z ++ git status --porcelain 2023-01-11T23:33:01.8340162Z + git_status= 2023-01-11T23:33:01.8340543Z + [[ -n '' ]] 2023-01-11T23:33:01.8407470Z Prepare all required actions 2023-01-11T23:33:01.8407780Z Getting action download info 2023-01-11T23:33:02.0479868Z ##[group]Run ./.github/actions/get-workflow-job-id 2023-01-11T23:33:02.0480096Z with: 2023-01-11T23:33:02.0480449Z github-token: *** 2023-01-11T23:33:02.0480633Z env: 2023-01-11T23:33:02.0480819Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:02.0481022Z GPU_FLAG: --gpus all 2023-01-11T23:33:02.0481298Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:02.0481570Z ##[endgroup] 2023-01-11T23:33:02.0505661Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 2023-01-11T23:33:02.0505916Z with: 2023-01-11T23:33:02.0506091Z shell: bash 2023-01-11T23:33:02.0506275Z timeout_minutes: 10 2023-01-11T23:33:02.0506470Z max_attempts: 5 2023-01-11T23:33:02.0506670Z retry_wait_seconds: 30 2023-01-11T23:33:02.0507089Z command: set -eux python3 -m pip install requests==2.26.0 GHA_WORKFLOW_JOB_ID=$(python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}") echo "job-id=${GHA_WORKFLOW_JOB_ID}" >> "${GITHUB_OUTPUT}" 2023-01-11T23:33:02.0507641Z polling_interval_seconds: 1 2023-01-11T23:33:02.0507852Z warning_on_retry: true 2023-01-11T23:33:02.0508063Z continue_on_error: false 2023-01-11T23:33:02.0508270Z env: 2023-01-11T23:33:02.0508479Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:02.0508697Z GPU_FLAG: --gpus all 2023-01-11T23:33:02.0508980Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:02.0509423Z GITHUB_TOKEN: *** 2023-01-11T23:33:02.0509608Z ##[endgroup] 2023-01-11T23:33:02.0984898Z + python3 -m pip install requests==2.26.0 2023-01-11T23:33:02.3057212Z Defaulting to user installation because normal site-packages is not writeable 2023-01-11T23:33:02.3232542Z Requirement already satisfied: requests==2.26.0 in /home/ec2-user/.local/lib/python3.7/site-packages (2.26.0) 2023-01-11T23:33:02.3371012Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (1.26.14) 2023-01-11T23:33:02.3541458Z Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (2022.12.7) 2023-01-11T23:33:02.3549201Z Requirement already satisfied: 
charset-normalizer~=2.0.0; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (2.0.12) 2023-01-11T23:33:02.3567353Z Requirement already satisfied: idna<4,>=2.5; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (3.4) 2023-01-11T23:33:02.5478841Z ++ python3 .github/scripts/get_workflow_job_id.py 3896346758 i-016718a172a944ca0 2023-01-11T23:33:05.4456608Z + GHA_WORKFLOW_JOB_ID=10589556206 2023-01-11T23:33:05.4457680Z + echo job-id=10589556206 2023-01-11T23:33:06.0997391Z Command completed after 1 attempt(s). 2023-01-11T23:33:06.1099778Z ##[group]Run kill "$MONITOR_SCRIPT_PID" 2023-01-11T23:33:06.1100035Z kill "$MONITOR_SCRIPT_PID" 2023-01-11T23:33:06.1110805Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T23:33:06.1111045Z env: 2023-01-11T23:33:06.1111234Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:06.1111439Z GPU_FLAG: --gpus all 2023-01-11T23:33:06.1111720Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:06.1111995Z MONITOR_SCRIPT_PID: 5027 2023-01-11T23:33:06.1112185Z ##[endgroup] 2023-01-11T23:33:06.1192379Z Prepare all required actions 2023-01-11T23:33:06.1192644Z Getting action download info 2023-01-11T23:33:06.3101065Z Download action repository 'actions/upload-artifact@v3' (SHA:0b7f8abb1508181956e8e162db84b466c27e18ce) 2023-01-11T23:33:06.4596484Z ##[group]Run ./.github/actions/upload-test-artifacts 2023-01-11T23:33:06.4596724Z with: 2023-01-11T23:33:06.4597001Z file-suffix: test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206 2023-01-11T23:33:06.4597271Z env: 2023-01-11T23:33:06.4597454Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:06.4597668Z GPU_FLAG: --gpus all 2023-01-11T23:33:06.4597951Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:06.4598214Z ##[endgroup] 2023-01-11T23:33:06.4620557Z ##[group]Run # Remove any previous test jsons if they exist 2023-01-11T23:33:06.4620844Z # Remove any previous test jsons if they exist 2023-01-11T23:33:06.4621093Z rm -f test-jsons-*.zip 2023-01-11T23:33:06.4621392Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' 2023-01-11T23:33:06.4632126Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T23:33:06.4632360Z env: 2023-01-11T23:33:06.4632553Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:06.4632760Z GPU_FLAG: --gpus all 2023-01-11T23:33:06.4633041Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:06.4633397Z FILE_SUFFIX: test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206 2023-01-11T23:33:06.4633661Z ##[endgroup] 2023-01-11T23:33:06.4717385Z adding: test/allowlist_for_publicAPI.json (deflated 78%) 2023-01-11T23:33:06.4746882Z adding: test/benchmark_utils/callgrind_artifacts.json (deflated 92%) 2023-01-11T23:33:06.4753168Z adding: test/profiler/profiler_utils_mock_events.json (deflated 87%) 2023-01-11T23:33:06.4754857Z adding: test/.pytorch-slow-tests.json (deflated 77%) 2023-01-11T23:33:06.4759584Z adding: test/.pytorch-disabled-tests.json (deflated 84%) 2023-01-11T23:33:06.4782809Z ##[group]Run # Remove any previous test reports if they exist 2023-01-11T23:33:06.4783184Z # Remove any previous test reports if they exist 2023-01-11T23:33:06.4783509Z rm -f test-reports-*.zip 2023-01-11T23:33:06.4783858Z zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' -i '*.csv' 2023-01-11T23:33:06.4796261Z shell: /usr/bin/bash --noprofile 
--norc -e -o pipefail {0} 2023-01-11T23:33:06.4796553Z env: 2023-01-11T23:33:06.4796794Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:06.4797046Z GPU_FLAG: --gpus all 2023-01-11T23:33:06.4797386Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:06.4797822Z FILE_SUFFIX: test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206 2023-01-11T23:33:06.4798148Z ##[endgroup] 2023-01-11T23:33:06.4873511Z adding: test/custom_backend/test-reports/python-unittest/test_custom_backend/TEST-TestCustomBackend-20230111233252.xml (deflated 56%) 2023-01-11T23:33:06.4874257Z adding: test/custom_operator/test-reports/python-unittest/test_custom_ops/TEST-TestCustomOperators-20230111233247.xml (deflated 65%) 2023-01-11T23:33:06.4874942Z adding: test/test-reports/python-unittest/dynamo.test_optimizations/TEST-NormalizeIRTests-20230111212247.xml (deflated 41%) 2023-01-11T23:33:06.4875604Z adding: test/test-reports/python-unittest/dynamo.test_optimizations/TEST-TestOptimizations-20230111212247.xml (deflated 78%) 2023-01-11T23:33:06.4881934Z adding: test/test-reports/python-unittest/dynamo.test_misc/TEST-MiscTests-20230111212247.xml (deflated 90%) 2023-01-11T23:33:06.4882528Z adding: test/test-reports/python-unittest/dynamo.test_misc/TEST-TestTracer-20230111212247.xml (deflated 39%) 2023-01-11T23:33:06.4883201Z adding: test/test-reports/python-unittest/dynamo.test_torchxla_integration/TEST-TorchXLAReuseGraphTest-20230111212252.xml (deflated 76%) 2023-01-11T23:33:06.4884485Z adding: test/test-reports/python-unittest/test_python_dispatch/TEST-TestPythonDispatch-20230111212256.xml (deflated 87%) 2023-01-11T23:33:06.4885166Z adding: test/test-reports/python-unittest/test_python_dispatch/TEST-TestPythonDispatcher-20230111212256.xml (deflated 55%) 2023-01-11T23:33:06.4886383Z adding: test/test-reports/python-unittest/test_python_dispatch/TEST-TestPythonRegistration-20230111212256.xml (deflated 82%) 2023-01-11T23:33:06.4887697Z adding: test/test-reports/python-unittest/test_scatter_gather_ops/TEST-TestScatterGatherCUDA-20230111212303.xml (deflated 92%) 2023-01-11T23:33:06.4888775Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertClose-20230111212313.xml (deflated 84%) 2023-01-11T23:33:06.4889580Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseContainer-20230111212313.xml (deflated 70%) 2023-01-11T23:33:06.4890412Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseErrorMessage-20230111212313.xml (deflated 84%) 2023-01-11T23:33:06.4891248Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseMultiDeviceCUDA-20230111212313.xml (deflated 56%) 2023-01-11T23:33:06.4892128Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseQuantized-20230111212313.xml (deflated 69%) 2023-01-11T23:33:06.4892950Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseBSC-20230111212313.xml (deflated 70%) 2023-01-11T23:33:06.4893778Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseBSR-20230111212313.xml (deflated 70%) 2023-01-11T23:33:06.4894713Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseCOO-20230111212313.xml (deflated 76%) 2023-01-11T23:33:06.4895519Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseCSC-20230111212313.xml (deflated 70%) 2023-01-11T23:33:06.4896297Z adding: 
test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseCSR-20230111212313.xml (deflated 63%) 2023-01-11T23:33:06.4897255Z adding: test/test-reports/python-unittest/test_testing/TEST-TestFrameworkUtils-20230111212313.xml (deflated 41%) 2023-01-11T23:33:06.4898031Z adding: test/test-reports/python-unittest/test_testing/TEST-TestImports-20230111212313.xml (deflated 59%) 2023-01-11T23:33:06.4910153Z adding: test/test-reports/python-unittest/test_testing/TEST-TestOpInfoSampleFunctionsCUDA-20230111212313.xml (deflated 97%) 2023-01-11T23:33:06.4910951Z adding: test/test-reports/python-unittest/test_testing/TEST-TestOpInfos-20230111212313.xml (deflated 54%) 2023-01-11T23:33:06.4912294Z adding: test/test-reports/python-unittest/test_testing/TEST-TestTestParametrization-20230111212313.xml (deflated 88%) 2023-01-11T23:33:06.4913181Z adding: test/test-reports/python-unittest/test_testing/TEST-TestTestParametrizationDeviceTypeCUDA-20230111212313.xml (deflated 88%) 2023-01-11T23:33:06.4914257Z adding: test/test-reports/python-unittest/test_testing/TEST-TestTestingCUDA-20230111212313.xml (deflated 91%) 2023-01-11T23:33:06.4922278Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-CPUReproTests-20230111212336.xml (deflated 90%) 2023-01-11T23:33:06.5082378Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-CpuTests-20230111212336.xml (deflated 94%) 2023-01-11T23:33:06.5090065Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-CudaReproTests-20230111212336.xml (deflated 88%) 2023-01-11T23:33:06.5344870Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-CudaTests-20230111212336.xml (deflated 95%) 2023-01-11T23:33:06.5345649Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-ExprPrinterTests-20230111212336.xml (deflated 41%) 2023-01-11T23:33:06.5357864Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-SweepInputsCpuTest-20230111212336.xml (deflated 97%) 2023-01-11T23:33:06.5371341Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-SweepInputsCudaTest-20230111212336.xml (deflated 97%) 2023-01-11T23:33:06.5372216Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-TestIndexingSimplification-20230111212336.xml (deflated 57%) 2023-01-11T23:33:06.5385410Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-TritonCodeGenTests-20230111212336.xml (deflated 91%) 2023-01-11T23:33:06.5386002Z adding: test/test-reports/python-unittest/benchmark_utils.test_benchmark_utils/TEST-TestBenchmarkUtils-20230111212350.xml (deflated 77%) 2023-01-11T23:33:06.5386686Z adding: test/test-reports/python-unittest/dynamo.test_comptime/TEST-ComptimeTests-20230111212357.xml (deflated 77%) 2023-01-11T23:33:06.5389728Z adding: test/test-reports/python-unittest/dynamo.test_functions/TEST-FunctionTests-20230111212402.xml (deflated 95%) 2023-01-11T23:33:06.5390386Z adding: test/test-reports/python-unittest/dynamo.test_replay_record/TEST-ReplayRecordTests-20230111212408.xml (deflated 81%) 2023-01-11T23:33:06.5390970Z adding: test/test-reports/python-unittest/dynamo.test_verify_correctness/TEST-TestVerifyCorrectness-20230111212411.xml (deflated 69%) 2023-01-11T23:33:06.5391964Z adding: test/test-reports/python-unittest/lazy.test_ts_opinfo/TEST-TestLazyDynamicOps-20230111212421.xml (deflated 40%) 2023-01-11T23:33:06.5392506Z adding: 
test/test-reports/python-unittest/lazy.test_ts_opinfo/TEST-TestLazyTensor-20230111212421.xml (deflated 58%) 2023-01-11T23:33:06.5393046Z adding: test/test-reports/python-unittest/nn.test_packed_sequence/TEST-PackedSequenceTest-20230111212429.xml (deflated 81%) 2023-01-11T23:33:06.5393589Z adding: test/test-reports/python-unittest/profiler.test_memory_profiler/TEST-TestDataFlow-20230111212434.xml (deflated 73%) 2023-01-11T23:33:06.5394161Z adding: test/test-reports/python-unittest/profiler.test_memory_profiler/TEST-TestIdentifyGradients-20230111212434.xml (deflated 71%) 2023-01-11T23:33:06.5394739Z adding: test/test-reports/python-unittest/profiler.test_memory_profiler/TEST-TestMemoryProfiler-20230111212434.xml (deflated 42%) 2023-01-11T23:33:06.5395394Z adding: test/test-reports/python-unittest/profiler.test_memory_profiler/TEST-TestMemoryProfilerE2E-20230111212434.xml (deflated 84%) 2023-01-11T23:33:06.5395942Z adding: test/test-reports/python-unittest/test_autocast/TEST-TestAutocastCPU-20230111212442.xml (deflated 77%) 2023-01-11T23:33:06.5396442Z adding: test/test-reports/python-unittest/test_autocast/TEST-TestAutocastGPU-20230111212442.xml (deflated 45%) 2023-01-11T23:33:06.5396950Z adding: test/test-reports/python-unittest/test_autocast/TEST-TestTorchAutocast-20230111212442.xml (deflated 42%) 2023-01-11T23:33:06.5397474Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestConcatDataset-20230111212451.xml (deflated 74%) 2023-01-11T23:33:06.5397998Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestCustomPinFn-20230111212451.xml (deflated 53%) 2023-01-11T23:33:06.5398511Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestDataLoader-20230111212451.xml (deflated 82%) 2023-01-11T23:33:06.5399411Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestDataLoaderPersistentWorkers-20230111212451.xml (deflated 85%) 2023-01-11T23:33:06.5400003Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestDatasetRandomSplit-20230111212451.xml (deflated 76%) 2023-01-11T23:33:06.5400553Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestDictDataLoader-20230111212451.xml (deflated 61%) 2023-01-11T23:33:06.5417683Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestNamedTupleDataLoader-20230111212451.xml (deflated 43%) 2023-01-11T23:33:06.5418378Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestSetAffinity-20230111212451.xml (deflated 40%) 2023-01-11T23:33:06.5419140Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestStringDataLoader-20230111212451.xml (deflated 40%) 2023-01-11T23:33:06.5419864Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestTensorDataset-20230111212451.xml (deflated 72%) 2023-01-11T23:33:06.5420473Z adding: test/test-reports/python-unittest/test_dataloader/TEST-IntegrationTestDataLoaderDataPipe-20230111212451.xml (deflated 43%) 2023-01-11T23:33:06.5421095Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestConvAfterFork-20230111212451.xml (deflated 41%) 2023-01-11T23:33:06.5421658Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestIndividualWorkerQueue-20230111212451.xml (deflated 42%) 2023-01-11T23:33:06.5422293Z adding: test/test-reports/python-unittest/test_functional_optim/TEST-TestFunctionalOptimParity-20230111212706.xml (deflated 71%) 2023-01-11T23:33:06.5422881Z adding: test/test-reports/python-unittest/test_fx_experimental/TEST-TestFXExperimental-20230111212713.xml (deflated 79%) 
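Each "adding:" line above and below comes from the single zip invocation shown before this list, which recursively archives only the XML and CSV report files under test/ into an archive keyed by FILE_SUFFIX. The step shells out to zip, as the trace shows; purely as a sketch, the same include-pattern logic in Python (the suffix value is copied from this log's env block):

    import zipfile
    from pathlib import Path

    # Counterpart of `rm -f test-reports-*.zip` followed by
    # `zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' -i '*.csv'`
    # (the workflow's rm -f clears every suffix; this clears just one).
    file_suffix = "test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206"
    archive = Path(f"test-reports-{file_suffix}.zip")
    archive.unlink(missing_ok=True)

    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for pattern in ("*.xml", "*.csv"):
            for report in sorted(Path("test").rglob(pattern)):
                zf.write(report)  # kept under its test/... relative path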
2023-01-11T23:33:06.5423498Z adding: test/test-reports/python-unittest/test_fx_experimental/TEST-TestNormalizeOperatorsCUDA-20230111212713.xml (deflated 97%) 2023-01-11T23:33:06.5424084Z adding: test/test-reports/python-unittest/test_import_stats/TEST-TestImportTime-20230111212730.xml (deflated 53%) 2023-01-11T23:33:06.5424609Z adding: test/test-reports/python-unittest/test_jit_autocast/TEST-TestAutocast-20230111212740.xml (deflated 86%) 2023-01-11T23:33:06.5425151Z adding: test/test-reports/python-unittest/test_jit_autocast/TEST-TestJitTraceAutocast-20230111212740.xml (deflated 76%) 2023-01-11T23:33:06.5425713Z adding: test/test-reports/python-unittest/test_jit_llga_fuser/TEST-TestEnableDisableLlgaFuser-20230111212850.xml (deflated 41%) 2023-01-11T23:33:06.5426262Z adding: test/test-reports/python-unittest/test_jit_llga_fuser/TEST-TestDynamoAOT-20230111212850.xml (deflated 41%) 2023-01-11T23:33:06.5426806Z adding: test/test-reports/python-unittest/test_jit_llga_fuser/TEST-TestFusionPatternCUDA-20230111212850.xml (deflated 92%) 2023-01-11T23:33:06.5427334Z adding: test/test-reports/python-unittest/test_jit_llga_fuser/TEST-TestModel-20230111212850.xml (deflated 96%) 2023-01-11T23:33:06.5427895Z adding: test/test-reports/python-unittest/test_jit_llga_fuser/TEST-TestOpCUDA-20230111212850.xml (deflated 93%) 2023-01-11T23:33:06.5428436Z adding: test/test-reports/python-unittest/test_matmul_cuda/TEST-TestMatmulCudaCUDA-20230111212855.xml (deflated 89%) 2023-01-11T23:33:06.5429101Z adding: test/test-reports/python-unittest/test_mkldnn_fusion/TEST-TestMkldnnFusion-20230111213029.xml (deflated 71%) 2023-01-11T23:33:06.5429702Z adding: test/test-reports/python-unittest/test_module_init/TEST-TestModuleInitCUDA-20230111213112.xml (deflated 96%) 2023-01-11T23:33:06.5430483Z adding: test/test-reports/python-unittest/test_native_mha/TEST-TestMHADeviceTypeCUDA-20230111213119.xml (deflated 95%) 2023-01-11T23:33:06.5431725Z adding: test/test-reports/python-unittest/test_numpy_interop/TEST-TestNumPyInteropCUDA-20230111213125.xml (deflated 86%) 2023-01-11T23:33:06.5432695Z adding: test/test-reports/python-unittest/test_optim/TEST-TestDifferentiableOptimizer-20230111213129.xml (deflated 82%) 2023-01-11T23:33:06.5434561Z adding: test/test-reports/python-unittest/test_optim/TEST-TestLRScheduler-20230111213129.xml (deflated 91%) 2023-01-11T23:33:06.5435211Z adding: test/test-reports/python-unittest/test_optim/TEST-TestOptim-20230111213129.xml (deflated 81%) 2023-01-11T23:33:06.5435710Z adding: test/test-reports/python-unittest/test_optim/TEST-TestSWAUtils-20230111213129.xml (deflated 75%) 2023-01-11T23:33:06.5437236Z adding: test/test-reports/python-unittest/test_shape_ops/TEST-TestShapeOpsCUDA-20230111213312.xml (deflated 93%) 2023-01-11T23:33:06.5437890Z adding: test/test-reports/python-unittest/test_type_info/TEST-TestDTypeInfo-20230111213317.xml (deflated 62%) 2023-01-11T23:33:06.5439961Z adding: test/test-reports/python-unittest/test_view_ops/TEST-TestOldViewOpsCUDA-20230111213323.xml (deflated 90%) 2023-01-11T23:33:06.5442699Z adding: test/test-reports/python-unittest/test_view_ops/TEST-TestViewOpsCUDA-20230111213323.xml (deflated 94%) 2023-01-11T23:33:06.5443504Z adding: test/test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorConstHandling-20230111213807.xml (deflated 76%) 2023-01-11T23:33:06.5444211Z adding: test/test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorConverterTest-20230111213807.xml (deflated 73%) 2023-01-11T23:33:06.5444807Z adding: 
test/test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorOperatorInvariants-20230111213807.xml (deflated 60%) 2023-01-11T23:33:06.5445482Z adding: test/test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorPropTest-20230111213807.xml (deflated 40%) 2023-01-11T23:33:06.5446233Z adding: test/test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorTest-20230111213807.xml (deflated 82%) 2023-01-11T23:33:06.5472309Z adding: test/test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCSRCUDA-20230111213819.xml (deflated 96%) 2023-01-11T23:33:06.5472999Z adding: test/test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCSRSampler-20230111213819.xml (deflated 40%) 2023-01-11T23:33:06.5506641Z adding: test/test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCompressedCUDA-20230111213819.xml (deflated 97%) 2023-01-11T23:33:06.5507467Z adding: test/test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestCppExtensionAOT-20230111214707.xml (deflated 78%) 2023-01-11T23:33:06.5508245Z adding: test/test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestORTTensor-20230111214707.xml (deflated 68%) 2023-01-11T23:33:06.5509040Z adding: test/test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestPybindTypeCasters-20230111214707.xml (deflated 41%) 2023-01-11T23:33:06.5509836Z adding: test/test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestRNGExtension-20230111214707.xml (deflated 40%) 2023-01-11T23:33:06.5510517Z adding: test/test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestTorchLibrary-20230111214707.xml (deflated 40%) 2023-01-11T23:33:06.5511101Z adding: test/test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214711.xml (deflated 42%) 2023-01-11T23:33:06.5511900Z adding: test/test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214712.xml (deflated 42%) 2023-01-11T23:33:06.5512620Z adding: test/test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214714.xml (deflated 41%) 2023-01-11T23:33:06.5513200Z adding: test/test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214715.xml (deflated 42%) 2023-01-11T23:33:06.5513770Z adding: test/test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214717.xml (deflated 42%) 2023-01-11T23:33:06.5514341Z adding: test/test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214719.xml (deflated 41%) 2023-01-11T23:33:06.5514861Z adding: test/test-reports/python-unittest/test_dispatch/TEST-TestDispatch-20230111214721.xml (deflated 81%) 2023-01-11T23:33:06.5515389Z adding: test/test-reports/python-unittest/test_dispatch/TEST-TestPythonDispatcher-20230111214721.xml (deflated 75%) 2023-01-11T23:33:06.5526691Z adding: test/test-reports/python-unittest/test_linalg/TEST-TestLinalgCUDA-20230111214810.xml (deflated 93%) 2023-01-11T23:33:06.5527387Z adding: test/test-reports/python-unittest/test_multiprocessing_spawn/TEST-ErrorTest-20230111215317.xml (deflated 39%) 2023-01-11T23:33:06.5528052Z adding: test/test-reports/python-unittest/test_multiprocessing_spawn/TEST-ForkTest-20230111215317.xml (deflated 76%) 2023-01-11T23:33:06.5528632Z adding: test/test-reports/python-unittest/test_multiprocessing_spawn/TEST-SpawnTest-20230111215317.xml (deflated 77%) 2023-01-11T23:33:06.5529153Z adding: 
test/test-reports/python-unittest/test_prims/TEST-TestDecompCUDA-20230111231405.xml (deflated 74%) 2023-01-11T23:33:06.5529786Z adding: test/test-reports/python-unittest/test_prims/TEST-TestPrimsBasic-20230111231405.xml (deflated 52%) 2023-01-11T23:33:06.5530515Z adding: test/test-reports/python-unittest/test_prims/TEST-TestPrimsCUDA-20230111231405.xml (deflated 84%) 2023-01-11T23:33:06.5531021Z adding: test/test-reports/python-unittest/test_prims/TEST-TestRefsCUDA-20230111231405.xml (deflated 38%) 2023-01-11T23:33:06.5532667Z adding: test/test-reports/python-unittest/test_tensor_creation_ops/TEST-TestAsArrayCUDA-20230111231451.xml (deflated 95%) 2023-01-11T23:33:06.5533446Z adding: test/test-reports/python-unittest/test_tensor_creation_ops/TEST-TestLikeTensorCreationCUDA-20230111231451.xml (deflated 76%) 2023-01-11T23:33:06.5534266Z adding: test/test-reports/python-unittest/test_tensor_creation_ops/TEST-TestRandomTensorCreationCUDA-20230111231451.xml (deflated 86%) 2023-01-11T23:33:06.5540555Z adding: test/test-reports/python-unittest/test_tensor_creation_ops/TEST-TestTensorCreationCUDA-20230111231451.xml (deflated 93%) 2023-01-11T23:33:06.5777379Z adding: test/test-reports/python-pytest/test_ops/test_ops-8308d40cbcb1066e.xml (deflated 95%) 2023-01-11T23:33:06.6008621Z adding: test/test-reports/python-pytest/test_ops/test_ops-621a128d9f5db79e.xml (deflated 95%) 2023-01-11T23:33:06.6009337Z adding: test/test-reports/python-pytest/test_ops/test_ops-001e362a3915126e.xml (deflated 86%) 2023-01-11T23:33:06.6037460Z adding: test/test-reports/cpp-unittest/test_libtorch/test_jit.xml (deflated 91%) 2023-01-11T23:33:06.6046306Z adding: test/test-reports/cpp-unittest/test_libtorch/test_lazy.xml (deflated 93%) 2023-01-11T23:33:06.6065225Z adding: test/test-reports/cpp-unittest/test_libtorch/test_api.xml (deflated 92%) 2023-01-11T23:33:06.6080509Z adding: test/test-reports/cpp-unittest/test_libtorch/test_tensorexpr.xml (deflated 92%) 2023-01-11T23:33:06.6081325Z adding: test/test-reports/cpp-unittest/test_aot_compilation/test_mobile_nnc.xml (deflated 75%) 2023-01-11T23:33:06.6102402Z ##[group]Run # Remove any previous test reports if they exist 2023-01-11T23:33:06.6102716Z # Remove any previous test reports if they exist 2023-01-11T23:33:06.6103044Z rm -f usage-log-*.zip 2023-01-11T23:33:06.6103340Z # this workflow is also run in bazel build test, but we dont generate usage reports for it 2023-01-11T23:33:06.6103655Z # so check to see if the file exists first 2023-01-11T23:33:06.6103899Z if [ -f 'usage_log.txt' ]; then 2023-01-11T23:33:06.6104167Z  zip "usage-log-${FILE_SUFFIX}.zip" 'usage_log.txt' 2023-01-11T23:33:06.6104400Z fi 2023-01-11T23:33:06.6114605Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T23:33:06.6114838Z env: 2023-01-11T23:33:06.6115027Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:06.6115229Z GPU_FLAG: --gpus all 2023-01-11T23:33:06.6115508Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:06.6115862Z FILE_SUFFIX: test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206 2023-01-11T23:33:06.6116126Z ##[endgroup] 2023-01-11T23:33:06.6965601Z adding: usage_log.txt (deflated 98%) 2023-01-11T23:33:06.7001574Z ##[group]Run seemethere/upload-artifact-s3@v5 2023-01-11T23:33:06.7001808Z with: 2023-01-11T23:33:06.7002023Z s3-prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T23:33:06.7002247Z retention-days: 14 2023-01-11T23:33:06.7002452Z if-no-files-found: warn 2023-01-11T23:33:06.7002667Z path: 
test-jsons-*.zip 2023-01-11T23:33:06.7002862Z name: artifact 2023-01-11T23:33:06.7003060Z s3-bucket: gha-artifacts 2023-01-11T23:33:06.7003266Z region: us-east-1 2023-01-11T23:33:06.7003441Z env: 2023-01-11T23:33:06.7003630Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:06.7003849Z GPU_FLAG: --gpus all 2023-01-11T23:33:06.7004120Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:06.7004383Z ##[endgroup] 2023-01-11T23:33:07.0103152Z NOTE: s3-prefix specified, ignoring name parameter 2023-01-11T23:33:07.0104179Z With the provided path, there will be 1 file uploaded 2023-01-11T23:33:07.0105127Z Uploading to s3 prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T23:33:07.0112439Z Starting upload of test-jsons-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206.zip 2023-01-11T23:33:07.1415676Z Finished upload of test-jsons-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206.zip 2023-01-11T23:33:07.1542695Z ##[group]Run seemethere/upload-artifact-s3@v5 2023-01-11T23:33:07.1542939Z with: 2023-01-11T23:33:07.1543153Z s3-prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T23:33:07.1543389Z retention-days: 14 2023-01-11T23:33:07.1543605Z if-no-files-found: error 2023-01-11T23:33:07.1543820Z path: test-reports-*.zip 2023-01-11T23:33:07.1544024Z name: artifact 2023-01-11T23:33:07.1544228Z s3-bucket: gha-artifacts 2023-01-11T23:33:07.1544428Z region: us-east-1 2023-01-11T23:33:07.1544611Z env: 2023-01-11T23:33:07.1544801Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:07.1545006Z GPU_FLAG: --gpus all 2023-01-11T23:33:07.1545297Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:07.1545564Z ##[endgroup] 2023-01-11T23:33:07.4619471Z NOTE: s3-prefix specified, ignoring name parameter 2023-01-11T23:33:07.4620077Z With the provided path, there will be 1 file uploaded 2023-01-11T23:33:07.4620640Z Uploading to s3 prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T23:33:07.4627429Z Starting upload of test-reports-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206.zip 2023-01-11T23:33:07.6623228Z Finished upload of test-reports-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206.zip 2023-01-11T23:33:07.6759150Z ##[group]Run seemethere/upload-artifact-s3@v5 2023-01-11T23:33:07.6759383Z with: 2023-01-11T23:33:07.6759593Z s3-prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T23:33:07.6759827Z retention-days: 14 2023-01-11T23:33:07.6760041Z if-no-files-found: ignore 2023-01-11T23:33:07.6760479Z path: usage-log-*.zip 2023-01-11T23:33:07.6760685Z name: artifact 2023-01-11T23:33:07.6760893Z s3-bucket: gha-artifacts 2023-01-11T23:33:07.6761167Z region: us-east-1 2023-01-11T23:33:07.6761348Z env: 2023-01-11T23:33:07.6761535Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:07.6761740Z GPU_FLAG: --gpus all 2023-01-11T23:33:07.6762016Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:07.6762277Z ##[endgroup] 2023-01-11T23:33:07.9844380Z NOTE: s3-prefix specified, ignoring name parameter 2023-01-11T23:33:07.9844698Z With the provided path, there will be 1 file uploaded 2023-01-11T23:33:07.9845047Z Uploading to s3 prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T23:33:07.9852332Z Starting upload of usage-log-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206.zip 2023-01-11T23:33:08.1166451Z Finished upload of usage-log-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206.zip 2023-01-11T23:33:08.1295891Z ##[group]Run 
# shellcheck disable=SC2156 2023-01-11T23:33:08.1296153Z # shellcheck disable=SC2156 2023-01-11T23:33:08.1296485Z find . -iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2023-01-11T23:33:08.1307020Z shell: /usr/bin/bash -e {0} 2023-01-11T23:33:08.1307211Z env: 2023-01-11T23:33:08.1307399Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:08.1307610Z GPU_FLAG: --gpus all 2023-01-11T23:33:08.1307882Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:08.1308144Z ##[endgroup] 2023-01-11T23:33:08.3410594Z GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1 2023-01-11T23:33:08.3411051Z Copyright (C) 2018 Free Software Foundation, Inc. 2023-01-11T23:33:08.3411501Z License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> 2023-01-11T23:33:08.3411961Z This is free software: you are free to change and redistribute it. 2023-01-11T23:33:08.3412447Z There is NO WARRANTY, to the extent permitted by law. Type "show copying" 2023-01-11T23:33:08.3412824Z and "show warranty" for details. 2023-01-11T23:33:08.3413247Z This GDB was configured as "x86_64-linux-gnu". 2023-01-11T23:33:08.3413631Z Type "show configuration" for configuration details. 2023-01-11T23:33:08.3413944Z For bug reporting instructions, please see: 2023-01-11T23:33:08.3414208Z <http://www.gnu.org/software/gdb/bugs/>. 2023-01-11T23:33:08.3414742Z Find the GDB manual and other documentation resources online at: 2023-01-11T23:33:08.3415042Z <http://www.gnu.org/software/gdb/documentation/>. 2023-01-11T23:33:08.3415278Z For help, type "help". 2023-01-11T23:33:08.3415538Z Type "apropos word" to search for commands related to "word"... 2023-01-11T23:33:08.4120446Z Reading symbols from python...done. 2023-01-11T23:33:08.9405386Z 2023-01-11T23:33:08.9405825Z warning: core file may not match specified executable file. 2023-01-11T23:33:08.9432762Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:08.9433757Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:08.9434741Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:08.9435700Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:08.9436615Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:08.9437330Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:08.9437869Z [New LWP 6354] 2023-01-11T23:33:08.9438256Z [New LWP 6362] 2023-01-11T23:33:08.9438611Z [New LWP 6360] 2023-01-11T23:33:08.9438981Z [New LWP 6369] 2023-01-11T23:33:08.9439477Z [New LWP 6373] 2023-01-11T23:33:08.9439920Z [New LWP 6388] 2023-01-11T23:33:08.9440105Z [New LWP 6387] 2023-01-11T23:33:08.9440340Z [New LWP 6371] 2023-01-11T23:33:08.9440519Z [New LWP 6372] 2023-01-11T23:33:08.9440696Z [New LWP 6375] 2023-01-11T23:33:08.9440876Z [New LWP 6374] 2023-01-11T23:33:08.9441401Z [New LWP 6376] 2023-01-11T23:33:08.9441674Z [New LWP 6364] 2023-01-11T23:33:08.9441949Z [New LWP 6370] 2023-01-11T23:33:08.9442124Z [New LWP 6377] 2023-01-11T23:33:08.9442304Z [New LWP 6389] 2023-01-11T23:33:08.9442486Z [New LWP 6392] 2023-01-11T23:33:08.9443075Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:08.9443625Z BFD: warning:
/opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:08.9444174Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:08.9444824Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:08.9616738Z [Thread debugging using libthread_db enabled] 2023-01-11T23:33:08.9617783Z Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 2023-01-11T23:33:14.4700612Z Core was generated by `/opt/conda/bin/python -bb -c from multiprocessing.spawn import spawn_main; spaw'. 2023-01-11T23:33:14.4701542Z Program terminated with signal SIGSEGV, Segmentation fault. 2023-01-11T23:33:14.4702018Z #0 raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:51 2023-01-11T23:33:14.4702419Z [Current thread is 1 (Thread 0x7fea17da6080 (LWP 6354))] 2023-01-11T23:33:14.4724648Z 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. 2023-01-11T23:33:14.4725929Z warning: File "/var/lib/jenkins/workspace/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load". 2023-01-11T23:33:14.4732344Z To enable execution of this file add 2023-01-11T23:33:14.4732890Z add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit 2023-01-11T23:33:14.4733287Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:14.4733687Z To completely disable this security protection add 2023-01-11T23:33:14.4734076Z set auto-load safe-path / 2023-01-11T23:33:14.4734400Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:14.4734956Z For more information about this security protection see the 2023-01-11T23:33:14.4735337Z "Auto-loading safe path" section in the GDB manual. 
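The gdb session above and the backtrace that follows come from the find/docker step at the top of this group: every core.[1-9]* file left in the workspace is handed to gdb inside the test container for a backtrace. A rough Python rendering of that one-liner, as a sketch only (the helper name and the check=False choice are ours; the container ID is the value from this log's env block):

    import subprocess
    from pathlib import Path

    # Copied from the DOCKER_CONTAINER_ID env entry in this log.
    CONTAINER = "b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6"

    def backtrace_core_dumps(root="."):
        # Counterpart of: find . -iname "core.[1-9]*" -exec docker exec
        #   "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \;
        for core in sorted(Path(root).rglob("core.[1-9]*")):
            subprocess.run(
                ["docker", "exec", CONTAINER, "sh", "-c",
                 f"gdb python {core} -ex 'bt' -ex 'q'"],
                check=False,  # keep scanning even if one gdb run fails
            )

    backtrace_core_dumps()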
E.g., run from the shell: 2023-01-11T23:33:14.4735664Z info "(gdb)Auto-loading safe path" 2023-01-11T23:33:14.4775719Z #0 raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:51 2023-01-11T23:33:14.4776873Z #1 0x00007fe9dedc265b in handler_SIGSEGV(int, siginfo_t*, void*) () 2023-01-11T23:33:14.4777975Z from /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_python.so 2023-01-11T23:33:14.4785918Z #2 <signal handler called> 2023-01-11T23:33:14.4786273Z #3 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 2023-01-11T23:33:14.4922693Z #4 0x00007fea1686403b in string_at (ptr=0x0, size=-1) at :5564 2023-01-11T23:33:14.4923122Z #5 0x00007fea17c5b052 in ffi_call_unix64 () 2023-01-11T23:33:14.4923526Z from /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8 2023-01-11T23:33:14.4928371Z #6 0x00007fea17c598cd in ffi_call_int () 2023-01-11T23:33:14.4928880Z from /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8 2023-01-11T23:33:14.4933989Z #7 0x00007fea1686c879 in _call_function_pointer (argtypecount=2, argcount=2, 2023-01-11T23:33:14.4934443Z resmem=0x7ffca89e55f0, restype=, atypes=, 2023-01-11T23:33:14.4935158Z avalues=, pProc=0x7fea16864002 <string_at>, flags=4357) 2023-01-11T23:33:14.4935684Z at /usr/local/src/conda/python-3.10.8/build-static/stgdict.c:916 2023-01-11T23:33:14.4935994Z #8 _ctypes_callproc () 2023-01-11T23:33:14.4936333Z at /usr/local/src/conda/python-3.10.8/build-static/stgdict.c:1259 2023-01-11T23:33:14.4936644Z #9 0x00007fea1686c3fe in PyCFuncPtr_call () at :4201 2023-01-11T23:33:14.4937196Z #10 0x00000000004f7b8b in _PyObject_MakeTpCall.localalias () 2023-01-11T23:33:14.4937537Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224 2023-01-11T23:33:14.4941531Z #11 0x00000000004f37ae in _PyObject_VectorcallTstate ( 2023-01-11T23:33:14.4941994Z kwnames=, 2023-01-11T23:33:14.4942527Z nargsf=, args=0x7fe9262b7418, 2023-01-11T23:33:14.4942968Z callable=, 2023-01-11T23:33:14.4943356Z tstate=) 2023-01-11T23:33:14.4944075Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112 2023-01-11T23:33:14.4944517Z #12 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, 2023-01-11T23:33:14.4944855Z args=0x7fe9262b7418, callable=0x7fea16b20880, tstate=) 2023-01-11T23:33:14.4945227Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99 2023-01-11T23:33:14.4947377Z #13 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:14.4947973Z args=0x7fe9262b7418, callable=0x7fea16b20880) 2023-01-11T23:33:14.4948587Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.4949672Z #14 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:14.4950218Z pp_stack=, trace_info=0x7ffca89e5900, 2023-01-11T23:33:14.4950593Z tstate=) 2023-01-11T23:33:14.4951082Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:14.4951487Z #15 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.4951986Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:14.4952936Z #16 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:14.4954053Z throwflag=, 2023-01-11T23:33:14.4954476Z f=, 2023-01-11T23:33:14.4954849Z tstate=) 2023-01-11T23:33:14.4955381Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.4958621Z #17 _PyEval_Vector (kwnames=, 2023-01-11T23:33:14.4959022Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:14.4959491Z args@entry=, locals=0x0, 2023-01-11T23:33:14.4959989Z locals@entry=, con=0x7fea16adf890, 2023-01-11T23:33:14.4960460Z con@entry=, tstate=0x18f3bf0,
2023-01-11T23:33:14.4960864Z tstate@entry=) 2023-01-11T23:33:14.4961295Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.4961556Z #18 _PyFunction_Vectorcall () 2023-01-11T23:33:14.4961872Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.4964524Z #19 0x00000000004f351e in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:14.4964928Z nargsf=, args=0x7fe9262b7280, callable=0x7fea16adf880, 2023-01-11T23:33:14.4965246Z tstate=0x18f3bf0) 2023-01-11T23:33:14.4965635Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.4966267Z #20 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:14.4966785Z args=0x7fe9262b7280, callable=0x7fea16adf880) 2023-01-11T23:33:14.4967204Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.4968562Z #21 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:14.4968935Z pp_stack=, trace_info=0x7ffca89e5ac0, 2023-01-11T23:33:14.4969200Z tstate=) 2023-01-11T23:33:14.4969602Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:14.4969924Z #22 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.4970245Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4181 2023-01-11T23:33:14.4970845Z #23 0x0000000000543a33 in _PyEval_EvalFrame ( 2023-01-11T23:33:14.4971813Z throwflag=, 2023-01-11T23:33:14.4972416Z f=, 2023-01-11T23:33:14.4972988Z tstate=) 2023-01-11T23:33:14.4973628Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.4976078Z #24 _PyEval_Vector (kwnames=0x0, 2023-01-11T23:33:14.4976580Z argcount=, 2023-01-11T23:33:14.4977210Z args=, locals=0x0, con=0x7fe9260c1ac0, tstate=0x18f3bf0) 2023-01-11T23:33:14.4977884Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.4978336Z #25 _PyFunction_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:14.4978747Z stack=, func=0x7fe9260c1ab0) 2023-01-11T23:33:14.4979262Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.4982419Z #26 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, 2023-01-11T23:33:14.4982893Z args=, callable=0x7fe9260c1ab0, tstate=0x18f3bf0) 2023-01-11T23:33:14.4983488Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114 2023-01-11T23:33:14.4983874Z #27 vectorcall_unbound ( 2023-01-11T23:33:14.4984348Z nargs=, args=, 2023-01-11T23:33:14.4984939Z func=, 2023-01-11T23:33:14.4985498Z unbound=, 2023-01-11T23:33:14.4986052Z tstate=) 2023-01-11T23:33:14.4986661Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:1629 2023-01-11T23:33:14.4987032Z #28 vectorcall_method () 2023-01-11T23:33:14.4987497Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:1661 2023-01-11T23:33:14.4987935Z #29 0x0000000000543898 in slot_mp_subscript (self=, 2023-01-11T23:33:14.4988289Z arg1=) 2023-01-11T23:33:14.4988755Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7258 2023-01-11T23:33:14.4989166Z #30 0x00000000004ef56e in _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.4989674Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:2109 2023-01-11T23:33:14.4992379Z #31 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:14.4992881Z throwflag=, 2023-01-11T23:33:14.4993441Z f=, 2023-01-11T23:33:14.4993998Z tstate=) 2023-01-11T23:33:14.4994602Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.4996683Z #32 _PyEval_Vector (kwnames=, 2023-01-11T23:33:14.4997441Z kwnames@entry=, argcount=, args=, 
2023-01-11T23:33:14.4998107Z args@entry=, locals=0x0, 2023-01-11T23:33:14.4998728Z locals@entry=, con=0x7fea16add5b0, 2023-01-11T23:33:14.4999346Z con@entry=, tstate=0x18f3bf0, 2023-01-11T23:33:14.4999942Z tstate@entry=) 2023-01-11T23:33:14.5000670Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.5001054Z #33 _PyFunction_Vectorcall () 2023-01-11T23:33:14.5001533Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.5002866Z #34 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:14.5003326Z nargsf=, args=0x7fe9262bebf8, callable=0x7fea16add5a0, 2023-01-11T23:33:14.5003707Z tstate=0x18f3bf0) 2023-01-11T23:33:14.5004179Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5006244Z #35 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:14.5006665Z args=0x7fe9262bebf8, callable=0x7fea16add5a0) 2023-01-11T23:33:14.5007183Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5008012Z #36 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:14.5008445Z pp_stack=, trace_info=0x7ffca89e5ec0, 2023-01-11T23:33:14.5008822Z tstate=) 2023-01-11T23:33:14.5009282Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:14.5009679Z #37 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.5010163Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:14.5010825Z #38 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:14.5011462Z throwflag=, 2023-01-11T23:33:14.5012055Z f=, 2023-01-11T23:33:14.5012602Z tstate=) 2023-01-11T23:33:14.5013213Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.5016658Z #39 _PyEval_Vector (kwnames=, 2023-01-11T23:33:14.5017269Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:14.5017926Z args@entry=, locals=0x0, 2023-01-11T23:33:14.5018543Z locals@entry=, con=0x7fe92f0e1370, 2023-01-11T23:33:14.5019180Z con@entry=, tstate=0x18f3bf0, 2023-01-11T23:33:14.5019769Z tstate@entry=) 2023-01-11T23:33:14.5020392Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.5020766Z #40 _PyFunction_Vectorcall () 2023-01-11T23:33:14.5021235Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.5022339Z #41 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:14.5022797Z nargsf=, args=0x690b718, callable=0x7fe92f0e1360, 2023-01-11T23:33:14.5023163Z tstate=0x18f3bf0) 2023-01-11T23:33:14.5023633Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5025991Z #42 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x690b718, 2023-01-11T23:33:14.5026394Z callable=0x7fe92f0e1360) 2023-01-11T23:33:14.5026885Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5027649Z #43 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:14.5028105Z pp_stack=, trace_info=0x7ffca89e6080, 2023-01-11T23:33:14.5028471Z tstate=) 2023-01-11T23:33:14.5028943Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:14.5029320Z #44 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.5029809Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:14.5030734Z #45 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:14.5032826Z throwflag=, 2023-01-11T23:33:14.5033410Z f=, 2023-01-11T23:33:14.5033951Z tstate=) 2023-01-11T23:33:14.5034548Z at 
/usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.5035991Z #46 _PyEval_Vector (kwnames=, 2023-01-11T23:33:14.5036569Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:14.5037212Z args@entry=, locals=0x0, 2023-01-11T23:33:14.5037834Z locals@entry=, con=0x7fe92f0e0680, 2023-01-11T23:33:14.5038457Z con@entry=, tstate=0x18f3bf0, 2023-01-11T23:33:14.5039062Z tstate@entry=) 2023-01-11T23:33:14.5039675Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.5040056Z #47 _PyFunction_Vectorcall () 2023-01-11T23:33:14.5040533Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.5042802Z #48 0x00000000004f141c in do_call_core (kwdict=0x7fea16c2a700, 2023-01-11T23:33:14.5043239Z callargs=0x7fe926269e40, func=0x7fe92f0e0670, trace_info=0x7ffca89e6240, 2023-01-11T23:33:14.5043627Z tstate=) 2023-01-11T23:33:14.5044128Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943 2023-01-11T23:33:14.5044524Z #49 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.5045005Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277 2023-01-11T23:33:14.5045426Z #50 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:14.5045902Z throwflag=, 2023-01-11T23:33:14.5046436Z f=, 2023-01-11T23:33:14.5046992Z tstate=) 2023-01-11T23:33:14.5047580Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.5050690Z #51 _PyEval_Vector (kwnames=, 2023-01-11T23:33:14.5051262Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:14.5051927Z args@entry=, locals=0x0, 2023-01-11T23:33:14.5052556Z locals@entry=, con=0x7fea16c555b0, 2023-01-11T23:33:14.5053270Z con@entry=, tstate=0x18f3bf0, 2023-01-11T23:33:14.5053854Z tstate@entry=) 2023-01-11T23:33:14.5054692Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.5055071Z #52 _PyFunction_Vectorcall () 2023-01-11T23:33:14.5055500Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.5056342Z #53 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:14.5057014Z nargsf=, args=0x694cd88, callable=0x7fea16c555a0, 2023-01-11T23:33:14.5057286Z tstate=0x18f3bf0) 2023-01-11T23:33:14.5057994Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5060584Z #54 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x694cd88, 2023-01-11T23:33:14.5060868Z callable=0x7fea16c555a0) 2023-01-11T23:33:14.5061221Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5062100Z #55 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:14.5062408Z pp_stack=, trace_info=0x7ffca89e6400, 2023-01-11T23:33:14.5062750Z tstate=) 2023-01-11T23:33:14.5063072Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:14.5063338Z #56 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.5063655Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:14.5064179Z #57 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:14.5064951Z throwflag=, 2023-01-11T23:33:14.5065532Z f=, 2023-01-11T23:33:14.5066091Z tstate=) 2023-01-11T23:33:14.5066717Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.5069562Z #58 _PyEval_Vector (kwnames=, 2023-01-11T23:33:14.5070142Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:14.5070618Z args@entry=, locals=0x0, 2023-01-11T23:33:14.5071063Z locals@entry=, con=0x7fea16c55f40, 2023-01-11T23:33:14.5071547Z con@entry=, tstate=0x18f3bf0, 
2023-01-11T23:33:14.5072008Z tstate@entry=) 2023-01-11T23:33:14.5072445Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.5072712Z #59 _PyFunction_Vectorcall () 2023-01-11T23:33:14.5073032Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.5075165Z #60 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:14.5075467Z nargsf=, args=0x1970510, callable=0x7fea16c55f30, 2023-01-11T23:33:14.5075720Z tstate=0x18f3bf0) 2023-01-11T23:33:14.5076135Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5077745Z #61 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x1970510, 2023-01-11T23:33:14.5078020Z callable=0x7fea16c55f30) 2023-01-11T23:33:14.5078700Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5079247Z #62 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:14.5079552Z pp_stack=, trace_info=0x7ffca89e65c0, 2023-01-11T23:33:14.5079809Z tstate=) 2023-01-11T23:33:14.5080494Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:14.5080789Z #63 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.5081118Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:14.5081391Z #64 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:14.5081716Z throwflag=, 2023-01-11T23:33:14.5082093Z f=, 2023-01-11T23:33:14.5082470Z tstate=) 2023-01-11T23:33:14.5082854Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.5085414Z #65 _PyEval_Vector (kwnames=, 2023-01-11T23:33:14.5085818Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:14.5086263Z args@entry=, locals=0x0, 2023-01-11T23:33:14.5086673Z locals@entry=, con=0x7fea16ad6de0, 2023-01-11T23:33:14.5087090Z con@entry=, tstate=0x18f3bf0, 2023-01-11T23:33:14.5087489Z tstate@entry=) 2023-01-11T23:33:14.5087892Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.5088152Z #66 _PyFunction_Vectorcall () 2023-01-11T23:33:14.5088473Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.5090271Z #67 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:14.5090585Z nargsf=, args=0x7fea16d9c200, callable=0x7fea16ad6dd0, 2023-01-11T23:33:14.5090843Z tstate=0x18f3bf0) 2023-01-11T23:33:14.5091226Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5092716Z #68 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:14.5092992Z args=0x7fea16d9c200, callable=0x7fea16ad6dd0) 2023-01-11T23:33:14.5093351Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5094239Z #69 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:14.5094696Z pp_stack=, trace_info=0x7ffca89e6780, 2023-01-11T23:33:14.5095033Z tstate=) 2023-01-11T23:33:14.5095388Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:14.5095741Z #70 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.5096117Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:14.5096391Z #71 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:14.5096721Z throwflag=, 2023-01-11T23:33:14.5097092Z f=, 2023-01-11T23:33:14.5097465Z tstate=) 2023-01-11T23:33:14.5097855Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.5100305Z #72 _PyEval_Vector (kwnames=, 2023-01-11T23:33:14.5100682Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:14.5101130Z 
args@entry=, locals=0x0, 2023-01-11T23:33:14.5101560Z locals@entry=, con=0x7fea16ad6d50, 2023-01-11T23:33:14.5102067Z con@entry=, tstate=0x18f3bf0, 2023-01-11T23:33:14.5102458Z tstate@entry=) 2023-01-11T23:33:14.5102867Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.5103124Z #73 _PyFunction_Vectorcall () 2023-01-11T23:33:14.5103455Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.5104301Z #74 0x00000000004efd83 in _PyObject_VectorcallTstate (kwnames=0x7fea16d75e00, 2023-01-11T23:33:14.5104629Z nargsf=, args=, callable=0x7fea16ad6d40, 2023-01-11T23:33:14.5104887Z tstate=0x18f3bf0) 2023-01-11T23:33:14.5105319Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5106001Z #75 PyObject_Vectorcall (kwnames=0x7fea16d75e00, nargsf=, 2023-01-11T23:33:14.5106306Z args=, callable=0x7fea16ad6d40) 2023-01-11T23:33:14.5106669Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5108241Z #76 call_function (kwnames=0x7fea16d75e00, oparg=, 2023-01-11T23:33:14.5108550Z pp_stack=, trace_info=0x7ffca89e6940, 2023-01-11T23:33:14.5108800Z tstate=) 2023-01-11T23:33:14.5109118Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:14.5109380Z #77 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.5109787Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231 2023-01-11T23:33:14.5111995Z #78 0x0000000000594b72 in _PyEval_EvalFrame ( 2023-01-11T23:33:14.5112585Z throwflag=, 2023-01-11T23:33:14.5113102Z f=, 2023-01-11T23:33:14.5113596Z tstate=) 2023-01-11T23:33:14.5114091Z at /croot/python-split_1669298683653/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/call.c:46 2023-01-11T23:33:14.5114383Z #79 _PyEval_Vector () 2023-01-11T23:33:14.5114697Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.5116214Z #80 0x0000000000594ab7 in PyEval_EvalCode (co=co@entry=0x7fea16d38920, 2023-01-11T23:33:14.5116634Z globals=globals@entry=0x7fea16d2a600, locals=locals@entry=0x7fea16d2a600) 2023-01-11T23:33:14.5117115Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:1134 2023-01-11T23:33:14.5117660Z #81 0x00000000005c6e57 in run_eval_code_obj () 2023-01-11T23:33:14.5118142Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1291 2023-01-11T23:33:14.5119942Z #82 0x00000000005c1d40 in run_mod () 2023-01-11T23:33:14.5120648Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1312 2023-01-11T23:33:14.5121940Z #83 0x00000000005b9ebb in PyRun_StringFlags.localalias () 2023-01-11T23:33:14.5122454Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1183 2023-01-11T23:33:14.5124015Z #84 0x00000000005b9cfb in PyRun_SimpleStringFlags.localalias () 2023-01-11T23:33:14.5124534Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:503 2023-01-11T23:33:14.5126065Z #85 0x00000000005b8d5c in pymain_run_command ( 2023-01-11T23:33:14.5126495Z command=) 2023-01-11T23:33:14.5126987Z at /croot/python-split_1669298683653/work/build-static/python.c:252 2023-01-11T23:33:14.5127554Z #86 pymain_run_python (exitcode=0x7ffca89e6ba0) 2023-01-11T23:33:14.5128008Z at /croot/python-split_1669298683653/work/build-static/python.c:582 2023-01-11T23:33:14.5128350Z #87 Py_RunMain.localalias () 2023-01-11T23:33:14.5128710Z at /croot/python-split_1669298683653/work/build-static/python.c:670 2023-01-11T23:33:14.5144161Z #88 0x0000000000587c29 in Py_BytesMain (argc=, 2023-01-11T23:33:14.5144496Z 
argv=) 2023-01-11T23:33:14.5145251Z at /croot/python-split_1669298683653/work/build-static/python.c:1090 2023-01-11T23:33:14.5148169Z #89 0x00007fea16e0ac87 in __libc_start_main (main=0x587be0
, argc=5, 2023-01-11T23:33:14.5148578Z argv=0x7ffca89e6da8, init=, fini=, 2023-01-11T23:33:14.5148958Z rtld_fini=, stack_end=0x7ffca89e6d98) 2023-01-11T23:33:14.5149345Z at ../csu/libc-start.c:310 2023-01-11T23:33:14.5149625Z #90 0x0000000000587ade in _start () 2023-01-11T23:33:14.5150175Z at /usr/local/src/conda/python-3.10.8/Modules/_io/clinic/peg_api.c:880 2023-01-11T23:33:14.9774815Z GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1 2023-01-11T23:33:14.9775627Z Copyright (C) 2018 Free Software Foundation, Inc. 2023-01-11T23:33:14.9776378Z License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> 2023-01-11T23:33:14.9777030Z This is free software: you are free to change and redistribute it. 2023-01-11T23:33:14.9777654Z There is NO WARRANTY, to the extent permitted by law. Type "show copying" 2023-01-11T23:33:14.9778182Z and "show warranty" for details. 2023-01-11T23:33:14.9778769Z This GDB was configured as "x86_64-linux-gnu". 2023-01-11T23:33:14.9779298Z Type "show configuration" for configuration details. 2023-01-11T23:33:14.9779823Z For bug reporting instructions, please see: 2023-01-11T23:33:14.9780331Z <http://www.gnu.org/software/gdb/bugs/>. 2023-01-11T23:33:14.9780889Z Find the GDB manual and other documentation resources online at: 2023-01-11T23:33:14.9781234Z <http://www.gnu.org/software/gdb/documentation/>. 2023-01-11T23:33:14.9781499Z For help, type "help". 2023-01-11T23:33:14.9781757Z Type "apropos word" to search for commands related to "word"... 2023-01-11T23:33:15.0488655Z Reading symbols from python...done. 2023-01-11T23:33:15.5762877Z 2023-01-11T23:33:15.5763582Z warning: core file may not match specified executable file. 2023-01-11T23:33:15.5784967Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:15.5785337Z [New LWP 6353] 2023-01-11T23:33:15.5785544Z [New LWP 6381] 2023-01-11T23:33:15.5785789Z [New LWP 6368] 2023-01-11T23:33:15.5786184Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:15.5786664Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:15.5787033Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:15.5787421Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:15.5787787Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:15.5788058Z [New LWP 6383] 2023-01-11T23:33:15.5788236Z [New LWP 6361] 2023-01-11T23:33:15.5788421Z [New LWP 6366] 2023-01-11T23:33:15.5788605Z [New LWP 6382] 2023-01-11T23:33:15.5788782Z [New LWP 6379] 2023-01-11T23:33:15.5788964Z [New LWP 6378] 2023-01-11T23:33:15.5789157Z [New LWP 6365] 2023-01-11T23:33:15.5789332Z [New LWP 6380] 2023-01-11T23:33:15.5789512Z [New LWP 6363] 2023-01-11T23:33:15.5789693Z [New LWP 6367] 2023-01-11T23:33:15.5789945Z [New LWP 6393] 2023-01-11T23:33:15.5790168Z [New LWP 6386] 2023-01-11T23:33:15.5790349Z [New LWP 6385] 2023-01-11T23:33:15.5790524Z [New LWP 6384] 2023-01-11T23:33:15.5796706Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:15.5797350Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:15.5797898Z BFD: warning:
/opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:15.5798728Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:15.5967327Z [Thread debugging using libthread_db enabled] 2023-01-11T23:33:15.5967852Z Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 2023-01-11T23:33:21.0667881Z 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. 2023-01-11T23:33:21.0669222Z Core was generated by `/opt/conda/bin/python -bb -c from multiprocessing.spawn import spawn_main; spaw'. 2023-01-11T23:33:21.0670097Z Program terminated with signal SIGSEGV, Segmentation fault. 2023-01-11T23:33:21.0671525Z warning: File "/var/lib/jenkins/workspace/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load". 2023-01-11T23:33:21.0672336Z #0 raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:51 2023-01-11T23:33:21.0672929Z [Current thread is 1 (Thread 0x7f974070a080 (LWP 6353))] 2023-01-11T23:33:21.0698176Z To enable execution of this file add 2023-01-11T23:33:21.0699202Z add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit 2023-01-11T23:33:21.0700038Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:21.0700826Z To completely disable this security protection add 2023-01-11T23:33:21.0701577Z set auto-load safe-path / 2023-01-11T23:33:21.0702120Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:21.0702704Z For more information about this security protection see the 2023-01-11T23:33:21.0703337Z "Auto-loading safe path" section in the GDB manual. 
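(Aside: the two configuration lines GDB prints above are complete in themselves. Assembled into the file it names, the stricter fix is a one-line /var/lib/jenkins/.gdbinit on the runner — a sketch built only from the paths printed in the message above, not something this job actually did:

    # /var/lib/jenkins/.gdbinit -- whitelist the workspace .gdbinit so GDB auto-loads it
    add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit

The alternative, set auto-load safe-path /, trusts every directory, which is why GDB describes it as completely disabling the security protection.)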
E.g., run from the shell: 2023-01-11T23:33:21.0703690Z info "(gdb)Auto-loading safe path" 2023-01-11T23:33:21.0730024Z #0 raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:51 2023-01-11T23:33:21.0731473Z #1 0x00007f970772665b in handler_SIGSEGV(int, siginfo_t*, void*) () 2023-01-11T23:33:21.0731994Z from /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_python.so 2023-01-11T23:33:21.0741960Z #2 2023-01-11T23:33:21.0742295Z #3 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 2023-01-11T23:33:21.0880137Z #4 0x00007f973f1c803b in string_at (ptr=0x0, size=-1) at :5564 2023-01-11T23:33:21.0880515Z #5 0x00007f97405bf052 in ffi_call_unix64 () 2023-01-11T23:33:21.0880878Z from /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8 2023-01-11T23:33:21.0884517Z #6 0x00007f97405bd8cd in ffi_call_int () 2023-01-11T23:33:21.0884957Z from /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8 2023-01-11T23:33:21.0890514Z #7 0x00007f973f1d0879 in _call_function_pointer (argtypecount=2, argcount=2, 2023-01-11T23:33:21.0890974Z resmem=0x7fff19e0f940, restype=, atypes=, 2023-01-11T23:33:21.0891296Z avalues=, pProc=0x7f973f1c8002 , flags=4357) 2023-01-11T23:33:21.0891709Z at /usr/local/src/conda/python-3.10.8/build-static/stgdict.c:916 2023-01-11T23:33:21.0891959Z #8 _ctypes_callproc () 2023-01-11T23:33:21.0892344Z at /usr/local/src/conda/python-3.10.8/build-static/stgdict.c:1259 2023-01-11T23:33:21.0892660Z #9 0x00007f973f1d03fe in PyCFuncPtr_call () at :4201 2023-01-11T23:33:21.0893244Z #10 0x00000000004f7b8b in _PyObject_MakeTpCall.localalias () 2023-01-11T23:33:21.0893716Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224 2023-01-11T23:33:21.0898179Z #11 0x00000000004f37ae in _PyObject_VectorcallTstate ( 2023-01-11T23:33:21.0898638Z kwnames=, 2023-01-11T23:33:21.0899172Z nargsf=, args=0x7f964ec1b418, 2023-01-11T23:33:21.0899584Z callable=, 2023-01-11T23:33:21.0900261Z tstate=) 2023-01-11T23:33:21.0900717Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112 2023-01-11T23:33:21.0901046Z #12 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, 2023-01-11T23:33:21.0901365Z args=0x7f964ec1b418, callable=0x7f973f484880, tstate=) 2023-01-11T23:33:21.0901724Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99 2023-01-11T23:33:21.0903878Z #13 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:21.0904200Z args=0x7f964ec1b418, callable=0x7f973f484880) 2023-01-11T23:33:21.0904671Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0906730Z #14 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:21.0922187Z pp_stack=, trace_info=0x7fff19e0fc50, 2023-01-11T23:33:21.0922514Z tstate=) 2023-01-11T23:33:21.0922932Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:21.0923677Z #15 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.0924121Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:21.0924487Z #16 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:21.0924932Z throwflag=, 2023-01-11T23:33:21.0925421Z f=, 2023-01-11T23:33:21.0925857Z tstate=) 2023-01-11T23:33:21.0926270Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.0926542Z #17 _PyEval_Vector (kwnames=, 2023-01-11T23:33:21.0926916Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:21.0927370Z args@entry=, locals=0x0, 2023-01-11T23:33:21.0927828Z locals@entry=, con=0x7f973f443890, 2023-01-11T23:33:21.0928257Z con@entry=, tstate=0x1f34bf0, 
2023-01-11T23:33:21.0928663Z tstate@entry=) 2023-01-11T23:33:21.0929062Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.0929312Z #18 _PyFunction_Vectorcall () 2023-01-11T23:33:21.0929627Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.0929921Z #19 0x00000000004f351e in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:21.0930217Z nargsf=, args=0x7f964ec1b280, callable=0x7f973f443880, 2023-01-11T23:33:21.0930457Z tstate=0x1f34bf0) 2023-01-11T23:33:21.0930776Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0931082Z #20 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:21.0931345Z args=0x7f964ec1b280, callable=0x7f973f443880) 2023-01-11T23:33:21.0931704Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0931992Z #21 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:21.0932267Z pp_stack=, trace_info=0x7fff19e0fe10, 2023-01-11T23:33:21.0932510Z tstate=) 2023-01-11T23:33:21.0932822Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:21.0933082Z #22 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.0933390Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4181 2023-01-11T23:33:21.0933652Z #23 0x0000000000543a33 in _PyEval_EvalFrame ( 2023-01-11T23:33:21.0934097Z throwflag=, 2023-01-11T23:33:21.0935017Z f=, 2023-01-11T23:33:21.0935503Z tstate=) 2023-01-11T23:33:21.0935965Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.0936218Z #24 _PyEval_Vector (kwnames=0x0, 2023-01-11T23:33:21.0936521Z argcount=, 2023-01-11T23:33:21.0937049Z args=, locals=0x0, con=0x7f964ea29ac0, tstate=0x1f34bf0) 2023-01-11T23:33:21.0937517Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.0937894Z #25 _PyFunction_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:21.0938249Z stack=, func=0x7f964ea29ab0) 2023-01-11T23:33:21.0938597Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.0938910Z #26 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, 2023-01-11T23:33:21.0939206Z args=, callable=0x7f964ea29ab0, tstate=0x1f34bf0) 2023-01-11T23:33:21.0939568Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114 2023-01-11T23:33:21.0939815Z #27 vectorcall_unbound ( 2023-01-11T23:33:21.0940122Z nargs=, args=, 2023-01-11T23:33:21.0940511Z func=, 2023-01-11T23:33:21.0940883Z unbound=, 2023-01-11T23:33:21.0941255Z tstate=) 2023-01-11T23:33:21.0941709Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:1629 2023-01-11T23:33:21.0941961Z #28 vectorcall_method () 2023-01-11T23:33:21.0942279Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:1661 2023-01-11T23:33:21.0942561Z #29 0x0000000000543898 in slot_mp_subscript (self=, 2023-01-11T23:33:21.0942803Z arg1=) 2023-01-11T23:33:21.0943107Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7258 2023-01-11T23:33:21.0943373Z #30 0x00000000004ef56e in _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.0943705Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:2109 2023-01-11T23:33:21.0943976Z #31 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:21.0944299Z throwflag=, 2023-01-11T23:33:21.0944664Z f=, 2023-01-11T23:33:21.0945029Z tstate=) 2023-01-11T23:33:21.0945414Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.0945684Z #32 _PyEval_Vector (kwnames=, 2023-01-11T23:33:21.0946060Z kwnames@entry=, argcount=, args=, 
2023-01-11T23:33:21.0946500Z args@entry=, locals=0x0, 2023-01-11T23:33:21.0946912Z locals@entry=, con=0x7f973f4415b0, 2023-01-11T23:33:21.0947318Z con@entry=, tstate=0x1f34bf0, 2023-01-11T23:33:21.0947819Z tstate@entry=) 2023-01-11T23:33:21.0948214Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.0948474Z #33 _PyFunction_Vectorcall () 2023-01-11T23:33:21.0948779Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.0950193Z #34 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:21.0950746Z nargsf=, args=0x7f964ec22a48, callable=0x7f973f4415a0, 2023-01-11T23:33:21.0951000Z tstate=0x1f34bf0) 2023-01-11T23:33:21.0952004Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0953518Z #35 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:21.0954044Z args=0x7f964ec22a48, callable=0x7f973f4415a0) 2023-01-11T23:33:21.0954499Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0954794Z #36 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:21.0955140Z pp_stack=, trace_info=0x7fff19e10210, 2023-01-11T23:33:21.0955399Z tstate=) 2023-01-11T23:33:21.0955712Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:21.0956044Z #37 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.0956372Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:21.0956639Z #38 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:21.0956969Z throwflag=, 2023-01-11T23:33:21.0957348Z f=, 2023-01-11T23:33:21.0957730Z tstate=) 2023-01-11T23:33:21.0958111Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.0960589Z #39 _PyEval_Vector (kwnames=, 2023-01-11T23:33:21.0960977Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:21.0961429Z args@entry=, locals=0x0, 2023-01-11T23:33:21.0961834Z locals@entry=, con=0x7f9657a45370, 2023-01-11T23:33:21.0962247Z con@entry=, tstate=0x1f34bf0, 2023-01-11T23:33:21.0962644Z tstate@entry=) 2023-01-11T23:33:21.0963065Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.0963316Z #40 _PyFunction_Vectorcall () 2023-01-11T23:33:21.0963633Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.0965547Z #41 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:21.0965850Z nargsf=, args=0x20cef18, callable=0x7f9657a45360, 2023-01-11T23:33:21.0966097Z tstate=0x1f34bf0) 2023-01-11T23:33:21.0966503Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0968017Z #42 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x20cef18, 2023-01-11T23:33:21.0968330Z callable=0x7f9657a45360) 2023-01-11T23:33:21.0968673Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0969261Z #43 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:21.0969634Z pp_stack=, trace_info=0x7fff19e103d0, 2023-01-11T23:33:21.0969946Z tstate=) 2023-01-11T23:33:21.0970334Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:21.0970688Z #44 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.0971001Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:21.0971277Z #45 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:21.0971634Z throwflag=, 2023-01-11T23:33:21.0972012Z f=, 2023-01-11T23:33:21.0972377Z tstate=) 2023-01-11T23:33:21.0972768Z at 
/usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.0975593Z #46 _PyEval_Vector (kwnames=, 2023-01-11T23:33:21.0976456Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:21.0977036Z args@entry=, locals=0x0, 2023-01-11T23:33:21.0977535Z locals@entry=, con=0x7f9657a44680, 2023-01-11T23:33:21.0977956Z con@entry=, tstate=0x1f34bf0, 2023-01-11T23:33:21.0978351Z tstate@entry=) 2023-01-11T23:33:21.0978799Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.0979063Z #47 _PyFunction_Vectorcall () 2023-01-11T23:33:21.0979382Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.0980895Z #48 0x00000000004f141c in do_call_core (kwdict=0x7f973f58e700, 2023-01-11T23:33:21.0981464Z callargs=0x7f964ebcde40, func=0x7f9657a44670, trace_info=0x7fff19e10590, 2023-01-11T23:33:21.0981754Z tstate=) 2023-01-11T23:33:21.0982116Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943 2023-01-11T23:33:21.0982450Z #49 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.0982823Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277 2023-01-11T23:33:21.0983152Z #50 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:21.0983476Z throwflag=, 2023-01-11T23:33:21.0983843Z f=, 2023-01-11T23:33:21.0984214Z tstate=) 2023-01-11T23:33:21.0984603Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.0986639Z #51 _PyEval_Vector (kwnames=, 2023-01-11T23:33:21.0987281Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:21.0987804Z args@entry=, locals=0x0, 2023-01-11T23:33:21.0988268Z locals@entry=, con=0x7f973f5b95b0, 2023-01-11T23:33:21.0988680Z con@entry=, tstate=0x1f34bf0, 2023-01-11T23:33:21.0989079Z tstate@entry=) 2023-01-11T23:33:21.0989517Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.0989781Z #52 _PyFunction_Vectorcall () 2023-01-11T23:33:21.0990199Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.0993477Z #53 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:21.0993806Z nargsf=, args=0x6f8f008, callable=0x7f973f5b95a0, 2023-01-11T23:33:21.0994167Z tstate=0x1f34bf0) 2023-01-11T23:33:21.0994529Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0994858Z #54 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x6f8f008, 2023-01-11T23:33:21.0995124Z callable=0x7f973f5b95a0) 2023-01-11T23:33:21.0995455Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0996358Z #55 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:21.0996967Z pp_stack=, trace_info=0x7fff19e10750, 2023-01-11T23:33:21.0997384Z tstate=) 2023-01-11T23:33:21.0997901Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:21.0998291Z #56 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.0998896Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:21.0999324Z #57 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:21.0999802Z throwflag=, 2023-01-11T23:33:21.1000365Z f=, 2023-01-11T23:33:21.1000918Z tstate=) 2023-01-11T23:33:21.1001518Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.1003821Z #58 _PyEval_Vector (kwnames=, 2023-01-11T23:33:21.1004385Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:21.1005067Z args@entry=, locals=0x0, 2023-01-11T23:33:21.1005677Z locals@entry=, con=0x7f973f5b9f40, 2023-01-11T23:33:21.1006319Z con@entry=, tstate=0x1f34bf0, 
2023-01-11T23:33:21.1006907Z tstate@entry=) 2023-01-11T23:33:21.1007529Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.1007925Z #59 _PyFunction_Vectorcall () 2023-01-11T23:33:21.1008386Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.1010165Z #60 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:21.1010616Z nargsf=, args=0x1f727b0, callable=0x7f973f5b9f30, 2023-01-11T23:33:21.1010985Z tstate=0x1f34bf0) 2023-01-11T23:33:21.1011491Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.1013282Z #61 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x1f727b0, 2023-01-11T23:33:21.1013679Z callable=0x7f973f5b9f30) 2023-01-11T23:33:21.1014194Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.1015074Z #62 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:21.1015496Z pp_stack=, trace_info=0x7fff19e10910, 2023-01-11T23:33:21.1015878Z tstate=) 2023-01-11T23:33:21.1016348Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:21.1016734Z #63 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.1017229Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:21.1018073Z #64 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:21.1018569Z throwflag=, 2023-01-11T23:33:21.1019144Z f=, 2023-01-11T23:33:21.1019689Z tstate=) 2023-01-11T23:33:21.1020443Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.1023804Z #65 _PyEval_Vector (kwnames=, 2023-01-11T23:33:21.1024393Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:21.1025044Z args@entry=, locals=0x0, 2023-01-11T23:33:21.1025653Z locals@entry=, con=0x7f973f43ade0, 2023-01-11T23:33:21.1026287Z con@entry=, tstate=0x1f34bf0, 2023-01-11T23:33:21.1027015Z tstate@entry=) 2023-01-11T23:33:21.1027643Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.1028021Z #66 _PyFunction_Vectorcall () 2023-01-11T23:33:21.1028509Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.1029373Z #67 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:21.1029820Z nargsf=, args=0x7f973f700200, callable=0x7f973f43add0, 2023-01-11T23:33:21.1030281Z tstate=0x1f34bf0) 2023-01-11T23:33:21.1030779Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.1033758Z #68 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:21.1034164Z args=0x7f973f700200, callable=0x7f973f43add0) 2023-01-11T23:33:21.1034707Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.1035163Z #69 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:21.1035559Z pp_stack=, trace_info=0x7fff19e10ad0, 2023-01-11T23:33:21.1035931Z tstate=) 2023-01-11T23:33:21.1036417Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:21.1036807Z #70 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.1037293Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:21.1037702Z #71 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:21.1038191Z throwflag=, 2023-01-11T23:33:21.1038725Z f=, 2023-01-11T23:33:21.1039282Z tstate=) 2023-01-11T23:33:21.1039873Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.1042675Z #72 _PyEval_Vector (kwnames=, 2023-01-11T23:33:21.1043248Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:21.1043922Z 
args@entry=, locals=0x0, 2023-01-11T23:33:21.1044570Z locals@entry=, con=0x7f973f43ad50, 2023-01-11T23:33:21.1045200Z con@entry=, tstate=0x1f34bf0, 2023-01-11T23:33:21.1045775Z tstate@entry=) 2023-01-11T23:33:21.1046408Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.1046819Z #73 _PyFunction_Vectorcall () 2023-01-11T23:33:21.1047295Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.1047818Z #74 0x00000000004efd83 in _PyObject_VectorcallTstate (kwnames=0x7f973f6d9e00, 2023-01-11T23:33:21.1048470Z nargsf=, args=, callable=0x7f973f43ad40, 2023-01-11T23:33:21.1048856Z tstate=0x1f34bf0) 2023-01-11T23:33:21.1049339Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.1049842Z #75 PyObject_Vectorcall (kwnames=0x7f973f6d9e00, nargsf=, 2023-01-11T23:33:21.1050276Z args=, callable=0x7f973f43ad40) 2023-01-11T23:33:21.1050798Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.1052263Z #76 call_function (kwnames=0x7f973f6d9e00, oparg=, 2023-01-11T23:33:21.1052700Z pp_stack=, trace_info=0x7fff19e10c90, 2023-01-11T23:33:21.1053064Z tstate=) 2023-01-11T23:33:21.1053649Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:21.1054035Z #77 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.1054801Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231 2023-01-11T23:33:21.1055567Z #78 0x0000000000594b72 in _PyEval_EvalFrame ( 2023-01-11T23:33:21.1056042Z throwflag=, 2023-01-11T23:33:21.1056623Z f=, 2023-01-11T23:33:21.1057177Z tstate=) 2023-01-11T23:33:21.1057872Z at /croot/python-split_1669298683653/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/call.c:46 2023-01-11T23:33:21.1058294Z #79 _PyEval_Vector () 2023-01-11T23:33:21.1058754Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.1063596Z #80 0x0000000000594ab7 in PyEval_EvalCode (co=co@entry=0x7f973f69c920, 2023-01-11T23:33:21.1064057Z globals=globals@entry=0x7f973f68e600, locals=locals@entry=0x7f973f68e600) 2023-01-11T23:33:21.1064617Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:1134 2023-01-11T23:33:21.1065016Z #81 0x00000000005c6e57 in run_eval_code_obj () 2023-01-11T23:33:21.1065509Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1291 2023-01-11T23:33:21.1068547Z #82 0x00000000005c1d40 in run_mod () 2023-01-11T23:33:21.1069057Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1312 2023-01-11T23:33:21.1072391Z #83 0x00000000005b9ebb in PyRun_StringFlags.localalias () 2023-01-11T23:33:21.1072970Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1183 2023-01-11T23:33:21.1073602Z #84 0x00000000005b9cfb in PyRun_SimpleStringFlags.localalias () 2023-01-11T23:33:21.1074040Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:503 2023-01-11T23:33:21.1076826Z #85 0x00000000005b8d5c in pymain_run_command ( 2023-01-11T23:33:21.1077492Z command=) 2023-01-11T23:33:21.1078133Z at /croot/python-split_1669298683653/work/build-static/python.c:252 2023-01-11T23:33:21.1078568Z #86 pymain_run_python (exitcode=0x7fff19e10ef0) 2023-01-11T23:33:21.1079079Z at /croot/python-split_1669298683653/work/build-static/python.c:582 2023-01-11T23:33:21.1079461Z #87 Py_RunMain.localalias () 2023-01-11T23:33:21.1079931Z at /croot/python-split_1669298683653/work/build-static/python.c:670 2023-01-11T23:33:21.1096650Z #88 0x0000000000587c29 in Py_BytesMain (argc=, 2023-01-11T23:33:21.1097045Z 
argv=<optimized out>) 2023-01-11T23:33:21.1097569Z at /croot/python-split_1669298683653/work/build-static/python.c:1090 2023-01-11T23:33:21.1101609Z #89 0x00007f973f76ec87 in __libc_start_main (main=0x587be0 <main>
, argc=5, 2023-01-11T23:33:21.1101990Z argv=0x7fff19e110f8, init=<optimized out>, fini=<optimized out>, 2023-01-11T23:33:21.1102374Z rtld_fini=<optimized out>, stack_end=0x7fff19e110e8) 2023-01-11T23:33:21.1102793Z at ../csu/libc-start.c:310 2023-01-11T23:33:21.1103273Z #90 0x0000000000587ade in _start () 2023-01-11T23:33:21.1103750Z at /usr/local/src/conda/python-3.10.8/Modules/_io/clinic/peg_api.c:880 2023-01-11T23:33:21.2770024Z GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1 2023-01-11T23:33:21.2770459Z Copyright (C) 2018 Free Software Foundation, Inc. 2023-01-11T23:33:21.2770896Z License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> 2023-01-11T23:33:21.2771341Z This is free software: you are free to change and redistribute it. 2023-01-11T23:33:21.2771732Z There is NO WARRANTY, to the extent permitted by law. Type "show copying" 2023-01-11T23:33:21.2772020Z and "show warranty" for details. 2023-01-11T23:33:21.2772317Z This GDB was configured as "x86_64-linux-gnu". 2023-01-11T23:33:21.2772593Z Type "show configuration" for configuration details. 2023-01-11T23:33:21.2773128Z For bug reporting instructions, please see: 2023-01-11T23:33:21.2773398Z <http://www.gnu.org/software/gdb/bugs/>. 2023-01-11T23:33:21.2773687Z Find the GDB manual and other documentation resources online at: 2023-01-11T23:33:21.2773989Z <http://www.gnu.org/software/gdb/documentation/>. 2023-01-11T23:33:21.2774225Z For help, type "help". 2023-01-11T23:33:21.2774764Z Type "apropos word" to search for commands related to "word"... 2023-01-11T23:33:21.3484884Z Reading symbols from python...done. 2023-01-11T23:33:21.8763435Z 2023-01-11T23:33:21.8764499Z warning: core file may not match specified executable file. 2023-01-11T23:33:21.8783793Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:21.8784250Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:21.8784695Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:21.8785221Z [New LWP 8833] 2023-01-11T23:33:21.8785547Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:21.8785922Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:21.8786303Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:21.8786570Z [New LWP 8850] 2023-01-11T23:33:21.8786753Z [New LWP 8842] 2023-01-11T23:33:21.8786935Z [New LWP 8848] 2023-01-11T23:33:21.8787118Z [New LWP 8860] 2023-01-11T23:33:21.8787293Z [New LWP 8840] 2023-01-11T23:33:21.8787476Z [New LWP 8859] 2023-01-11T23:33:21.8787657Z [New LWP 8858] 2023-01-11T23:33:21.8787830Z [New LWP 8857] 2023-01-11T23:33:21.8788010Z [New LWP 8846] 2023-01-11T23:33:21.8788191Z [New LWP 8864] 2023-01-11T23:33:21.8788363Z [New LWP 8844] 2023-01-11T23:33:21.8788542Z [New LWP 8862] 2023-01-11T23:33:21.8788722Z [New LWP 8863] 2023-01-11T23:33:21.8788895Z [New LWP 8865] 2023-01-11T23:33:21.8789079Z [New LWP 8861] 2023-01-11T23:33:21.8789258Z [New LWP 8872] 2023-01-11T23:33:21.8794638Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:21.8795255Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:21.8795805Z BFD: warning: 
/opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:21.8796346Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:21.8967287Z [Thread debugging using libthread_db enabled] 2023-01-11T23:33:21.8968047Z Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 2023-01-11T23:33:27.3697658Z 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. 2023-01-11T23:33:27.3699085Z warning: File "/var/lib/jenkins/workspace/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load". 2023-01-11T23:33:27.3700625Z Core was generated by `/opt/conda/bin/python -bb -c from multiprocessing.spawn import spawn_main; spaw'. 2023-01-11T23:33:27.3701378Z Program terminated with signal SIGSEGV, Segmentation fault. 2023-01-11T23:33:27.3701986Z #0 raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:51 2023-01-11T23:33:27.3702566Z [Current thread is 1 (Thread 0x7f494eef1080 (LWP 8833))] 2023-01-11T23:33:27.3726613Z To enable execution of this file add 2023-01-11T23:33:27.3727124Z add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit 2023-01-11T23:33:27.3727534Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:27.3727864Z To completely disable this security protection add 2023-01-11T23:33:27.3728157Z set auto-load safe-path / 2023-01-11T23:33:27.3728524Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:27.3728825Z For more information about this security protection see the 2023-01-11T23:33:27.3729195Z "Auto-loading safe path" section in the GDB manual. 
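(Aside: the backtraces above and below all bottom out the same way: frame #4 is ctypes' string_at (ptr=0x0, size=-1), which runs strlen on a NULL pointer (frame #3, __strlen_sse2, reached through ffi_call_unix64 in frame #5), and PyTorch's handler_SIGSEGV in libtorch_python.so (frame #1) then re-raises the signal so the worker dumps core. A minimal sketch of that failure mode — an illustration only, not necessarily the code the test actually ran:

    # crash_sketch.py -- reproduces frames #3-#5: ffi_call_unix64 -> string_at(0x0, -1) -> strlen(NULL)
    import ctypes
    ctypes.string_at(0)  # ptr=0x0 with the default size=-1 => SIGSEGV inside strlen

Executed inside a worker process, this produces exactly the kind of core file GDB is dissecting in this log.)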
E.g., run from the shell: 2023-01-11T23:33:27.3729517Z info "(gdb)Auto-loading safe path" 2023-01-11T23:33:27.3760834Z #0 raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:51 2023-01-11T23:33:27.3762105Z #1 0x00007f4915f0d65b in handler_SIGSEGV(int, siginfo_t*, void*) () 2023-01-11T23:33:27.3762669Z from /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_python.so 2023-01-11T23:33:27.3771599Z #2 2023-01-11T23:33:27.3771964Z #3 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 2023-01-11T23:33:27.3908293Z #4 0x00007f494d9af03b in string_at (ptr=0x0, size=-1) at :5564 2023-01-11T23:33:27.3908677Z #5 0x00007f494eda6052 in ffi_call_unix64 () 2023-01-11T23:33:27.3909065Z from /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8 2023-01-11T23:33:27.3915308Z #6 0x00007f494eda48cd in ffi_call_int () 2023-01-11T23:33:27.3915799Z from /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8 2023-01-11T23:33:27.3921301Z #7 0x00007f494d9b7879 in _call_function_pointer (argtypecount=2, argcount=2, 2023-01-11T23:33:27.3921774Z resmem=0x7ffd99b0c510, restype=, atypes=, 2023-01-11T23:33:27.3922200Z avalues=, pProc=0x7f494d9af002 , flags=4357) 2023-01-11T23:33:27.3922634Z at /usr/local/src/conda/python-3.10.8/build-static/stgdict.c:916 2023-01-11T23:33:27.3922987Z #8 _ctypes_callproc () 2023-01-11T23:33:27.3923418Z at /usr/local/src/conda/python-3.10.8/build-static/stgdict.c:1259 2023-01-11T23:33:27.3923722Z #9 0x00007f494d9b73fe in PyCFuncPtr_call () at :4201 2023-01-11T23:33:27.3924340Z #10 0x00000000004f7b8b in _PyObject_MakeTpCall.localalias () 2023-01-11T23:33:27.3925654Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224 2023-01-11T23:33:27.3928775Z #11 0x00000000004f37ae in _PyObject_VectorcallTstate ( 2023-01-11T23:33:27.3929362Z kwnames=, 2023-01-11T23:33:27.3929934Z nargsf=, args=0x7f485d457418, 2023-01-11T23:33:27.3930349Z callable=, 2023-01-11T23:33:27.3930723Z tstate=) 2023-01-11T23:33:27.3931180Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112 2023-01-11T23:33:27.3931774Z #12 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, 2023-01-11T23:33:27.3932175Z args=0x7f485d457418, callable=0x7f494dc68880, tstate=) 2023-01-11T23:33:27.3932609Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99 2023-01-11T23:33:27.3935373Z #13 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:27.3935764Z args=0x7f485d457418, callable=0x7f494dc68880) 2023-01-11T23:33:27.3936458Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.3938748Z #14 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:27.3939161Z pp_stack=, trace_info=0x7ffd99b0c820, 2023-01-11T23:33:27.3939488Z tstate=) 2023-01-11T23:33:27.3939934Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:27.3940295Z #15 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.3940713Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:27.3941005Z #16 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:27.3941326Z throwflag=, 2023-01-11T23:33:27.3941809Z f=, 2023-01-11T23:33:27.3942178Z tstate=) 2023-01-11T23:33:27.3942573Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.3946246Z #17 _PyEval_Vector (kwnames=, 2023-01-11T23:33:27.3946778Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:27.3947216Z args@entry=, locals=0x0, 2023-01-11T23:33:27.3947636Z locals@entry=, con=0x7f494dc27890, 2023-01-11T23:33:27.3948054Z con@entry=, tstate=0x135abf0, 
2023-01-11T23:33:27.3948455Z tstate@entry=) 2023-01-11T23:33:27.3948889Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.3949153Z #18 _PyFunction_Vectorcall () 2023-01-11T23:33:27.3949469Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.3952173Z #19 0x00000000004f351e in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:27.3952596Z nargsf=, args=0x7f485d457280, callable=0x7f494dc27880, 2023-01-11T23:33:27.3952861Z tstate=0x135abf0) 2023-01-11T23:33:27.3953289Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.3953839Z #20 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:27.3954168Z args=0x7f485d457280, callable=0x7f494dc27880) 2023-01-11T23:33:27.3954601Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.3956329Z #21 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:27.3956695Z pp_stack=, trace_info=0x7ffd99b0c9e0, 2023-01-11T23:33:27.3956998Z tstate=) 2023-01-11T23:33:27.3957334Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:27.3957628Z #22 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.3958042Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4181 2023-01-11T23:33:27.3958489Z #23 0x0000000000543a33 in _PyEval_EvalFrame ( 2023-01-11T23:33:27.3958888Z throwflag=, 2023-01-11T23:33:27.3959295Z f=, 2023-01-11T23:33:27.3959695Z tstate=) 2023-01-11T23:33:27.3960240Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.3963118Z #24 _PyEval_Vector (kwnames=0x0, 2023-01-11T23:33:27.3963637Z argcount=, 2023-01-11T23:33:27.3964256Z args=, locals=0x0, con=0x7f485d269ac0, tstate=0x135abf0) 2023-01-11T23:33:27.3965002Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.3965574Z #25 _PyFunction_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:27.3965866Z stack=, func=0x7f485d269ab0) 2023-01-11T23:33:27.3966522Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.3970146Z #26 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, 2023-01-11T23:33:27.3970580Z args=, callable=0x7f485d269ab0, tstate=0x135abf0) 2023-01-11T23:33:27.3971179Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114 2023-01-11T23:33:27.3971442Z #27 vectorcall_unbound ( 2023-01-11T23:33:27.3971770Z nargs=, args=, 2023-01-11T23:33:27.3972168Z func=, 2023-01-11T23:33:27.3972601Z unbound=, 2023-01-11T23:33:27.3973124Z tstate=) 2023-01-11T23:33:27.3973664Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:1629 2023-01-11T23:33:27.3974010Z #28 vectorcall_method () 2023-01-11T23:33:27.3974354Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:1661 2023-01-11T23:33:27.3974921Z #29 0x0000000000543898 in slot_mp_subscript (self=, 2023-01-11T23:33:27.3975178Z arg1=) 2023-01-11T23:33:27.3975546Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7258 2023-01-11T23:33:27.3975831Z #30 0x00000000004ef56e in _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.3976280Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:2109 2023-01-11T23:33:27.3978309Z #31 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:27.3978636Z throwflag=, 2023-01-11T23:33:27.3979014Z f=, 2023-01-11T23:33:27.3979386Z tstate=) 2023-01-11T23:33:27.3979791Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.3982859Z #32 _PyEval_Vector (kwnames=, 2023-01-11T23:33:27.3983472Z kwnames@entry=, argcount=, args=, 
2023-01-11T23:33:27.3983918Z args@entry=, locals=0x0, 2023-01-11T23:33:27.3984339Z locals@entry=, con=0x7f494dc255b0, 2023-01-11T23:33:27.3984751Z con@entry=, tstate=0x135abf0, 2023-01-11T23:33:27.3985150Z tstate@entry=) 2023-01-11T23:33:27.3985570Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.3985831Z #33 _PyFunction_Vectorcall () 2023-01-11T23:33:27.3986138Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.3988355Z #34 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:27.3988860Z nargsf=, args=0x7f485d45ea48, callable=0x7f494dc255a0, 2023-01-11T23:33:27.3989128Z tstate=0x135abf0) 2023-01-11T23:33:27.3989474Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.3992631Z #35 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:27.3993022Z args=0x7f485d45ea48, callable=0x7f494dc255a0) 2023-01-11T23:33:27.3993486Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.3993790Z #36 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:27.3994084Z pp_stack=, trace_info=0x7ffd99b0cde0, 2023-01-11T23:33:27.3994336Z tstate=) 2023-01-11T23:33:27.3994649Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:27.3994915Z #37 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.3995235Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:27.3996402Z #38 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:27.3996881Z throwflag=, 2023-01-11T23:33:27.3997292Z f=, 2023-01-11T23:33:27.3997677Z tstate=) 2023-01-11T23:33:27.3998190Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.4000106Z #39 _PyEval_Vector (kwnames=, 2023-01-11T23:33:27.4000726Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:27.4001346Z args@entry=, locals=0x0, 2023-01-11T23:33:27.4001906Z locals@entry=, con=0x7f4866255370, 2023-01-11T23:33:27.4002361Z con@entry=, tstate=0x135abf0, 2023-01-11T23:33:27.4002765Z tstate@entry=) 2023-01-11T23:33:27.4003226Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.4003527Z #40 _PyFunction_Vectorcall () 2023-01-11T23:33:27.4003977Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.4004982Z #41 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:27.4005375Z nargsf=, args=0x6372c38, callable=0x7f4866255360, 2023-01-11T23:33:27.4005680Z tstate=0x135abf0) 2023-01-11T23:33:27.4006052Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4007672Z #42 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x6372c38, 2023-01-11T23:33:27.4008041Z callable=0x7f4866255360) 2023-01-11T23:33:27.4008487Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4009210Z #43 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:27.4009603Z pp_stack=, trace_info=0x7ffd99b0cfa0, 2023-01-11T23:33:27.4009931Z tstate=) 2023-01-11T23:33:27.4010313Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:27.4010575Z #44 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.4010893Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:27.4011362Z #45 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:27.4011774Z throwflag=, 2023-01-11T23:33:27.4012267Z f=, 2023-01-11T23:33:27.4012728Z tstate=) 2023-01-11T23:33:27.4013138Z at 
/usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.4016701Z #46 _PyEval_Vector (kwnames=, 2023-01-11T23:33:27.4017202Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:27.4017771Z args@entry=, locals=0x0, 2023-01-11T23:33:27.4018215Z locals@entry=, con=0x7f4866254680, 2023-01-11T23:33:27.4018618Z con@entry=, tstate=0x135abf0, 2023-01-11T23:33:27.4019078Z tstate@entry=) 2023-01-11T23:33:27.4019596Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.4019852Z #47 _PyFunction_Vectorcall () 2023-01-11T23:33:27.4020175Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.4022112Z #48 0x00000000004f141c in do_call_core (kwdict=0x7f494dd72780, 2023-01-11T23:33:27.4022752Z callargs=0x7f485d211e40, func=0x7f4866254670, trace_info=0x7ffd99b0d160, 2023-01-11T23:33:27.4023235Z tstate=) 2023-01-11T23:33:27.4023748Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943 2023-01-11T23:33:27.4024100Z #49 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.4024476Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277 2023-01-11T23:33:27.4024772Z #50 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:27.4025095Z throwflag=, 2023-01-11T23:33:27.4025471Z f=, 2023-01-11T23:33:27.4025852Z tstate=) 2023-01-11T23:33:27.4026250Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.4029279Z #51 _PyEval_Vector (kwnames=, 2023-01-11T23:33:27.4030125Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:27.4030830Z args@entry=, locals=0x0, 2023-01-11T23:33:27.4031452Z locals@entry=, con=0x7f494dd9d5b0, 2023-01-11T23:33:27.4032081Z con@entry=, tstate=0x135abf0, 2023-01-11T23:33:27.4032687Z tstate@entry=) 2023-01-11T23:33:27.4033349Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.4033735Z #52 _PyFunction_Vectorcall () 2023-01-11T23:33:27.4034222Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.4034927Z #53 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:27.4035374Z nargsf=, args=0x63b42a8, callable=0x7f494dd9d5a0, 2023-01-11T23:33:27.4035634Z tstate=0x135abf0) 2023-01-11T23:33:27.4035987Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4038193Z #54 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x63b42a8, 2023-01-11T23:33:27.4038568Z callable=0x7f494dd9d5a0) 2023-01-11T23:33:27.4038995Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4039970Z #55 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:27.4040275Z pp_stack=, trace_info=0x7ffd99b0d320, 2023-01-11T23:33:27.4040597Z tstate=) 2023-01-11T23:33:27.4041178Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:27.4041539Z #56 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.4041998Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:27.4042368Z #57 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:27.4042692Z throwflag=, 2023-01-11T23:33:27.4043074Z f=, 2023-01-11T23:33:27.4043447Z tstate=) 2023-01-11T23:33:27.4043842Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.4047136Z #58 _PyEval_Vector (kwnames=, 2023-01-11T23:33:27.4047585Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:27.4048166Z args@entry=, locals=0x0, 2023-01-11T23:33:27.4048602Z locals@entry=, con=0x7f494dd9df40, 2023-01-11T23:33:27.4049127Z con@entry=, tstate=0x135abf0, 
2023-01-11T23:33:27.4049602Z tstate@entry=) 2023-01-11T23:33:27.4050027Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.4050276Z #59 _PyFunction_Vectorcall () 2023-01-11T23:33:27.4050596Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.4051928Z #60 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:27.4052318Z nargsf=, args=0x13d7510, callable=0x7f494dd9df30, 2023-01-11T23:33:27.4052638Z tstate=0x135abf0) 2023-01-11T23:33:27.4053005Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4055047Z #61 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x13d7510, 2023-01-11T23:33:27.4055408Z callable=0x7f494dd9df30) 2023-01-11T23:33:27.4055831Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4056920Z #62 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:27.4057312Z pp_stack=, trace_info=0x7ffd99b0d4e0, 2023-01-11T23:33:27.4057627Z tstate=) 2023-01-11T23:33:27.4058046Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:27.4058387Z #63 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.4058818Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:27.4059183Z #64 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:27.4059595Z throwflag=, 2023-01-11T23:33:27.4059973Z f=, 2023-01-11T23:33:27.4060334Z tstate=) 2023-01-11T23:33:27.4060734Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.4063511Z #65 _PyEval_Vector (kwnames=, 2023-01-11T23:33:27.4063922Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:27.4064449Z args@entry=, locals=0x0, 2023-01-11T23:33:27.4064875Z locals@entry=, con=0x7f494dc1ede0, 2023-01-11T23:33:27.4065414Z con@entry=, tstate=0x135abf0, 2023-01-11T23:33:27.4065814Z tstate@entry=) 2023-01-11T23:33:27.4066245Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.4066510Z #66 _PyFunction_Vectorcall () 2023-01-11T23:33:27.4066833Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.4069032Z #67 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:27.4069362Z nargsf=, args=0x7f494dee8200, callable=0x7f494dc1edd0, 2023-01-11T23:33:27.4069697Z tstate=0x135abf0) 2023-01-11T23:33:27.4070185Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4072716Z #68 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:27.4073075Z args=0x7f494dee8200, callable=0x7f494dc1edd0) 2023-01-11T23:33:27.4073495Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4073783Z #69 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:27.4074282Z pp_stack=, trace_info=0x7ffd99b0d6a0, 2023-01-11T23:33:27.4074655Z tstate=) 2023-01-11T23:33:27.4075055Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:27.4075314Z #70 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.4075666Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:27.4076043Z #71 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:27.4076482Z throwflag=, 2023-01-11T23:33:27.4076938Z f=, 2023-01-11T23:33:27.4077312Z tstate=) 2023-01-11T23:33:27.4077724Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.4080254Z #72 _PyEval_Vector (kwnames=, 2023-01-11T23:33:27.4080806Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:27.4081369Z 
args@entry=, locals=0x0, 2023-01-11T23:33:27.4081791Z locals@entry=, con=0x7f494dc1ed50, 2023-01-11T23:33:27.4082200Z con@entry=, tstate=0x135abf0, 2023-01-11T23:33:27.4082603Z tstate@entry=) 2023-01-11T23:33:27.4083015Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.4083279Z #73 _PyFunction_Vectorcall () 2023-01-11T23:33:27.4083589Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.4085272Z #74 0x00000000004efd83 in _PyObject_VectorcallTstate (kwnames=0x7f494dec1e00, 2023-01-11T23:33:27.4085604Z nargsf=, args=, callable=0x7f494dc1ed40, 2023-01-11T23:33:27.4085924Z tstate=0x135abf0) 2023-01-11T23:33:27.4086246Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4087404Z #75 PyObject_Vectorcall (kwnames=0x7f494dec1e00, nargsf=, 2023-01-11T23:33:27.4087759Z args=, callable=0x7f494dc1ed40) 2023-01-11T23:33:27.4088154Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4089318Z #76 call_function (kwnames=0x7f494dec1e00, oparg=, 2023-01-11T23:33:27.4089696Z pp_stack=, trace_info=0x7ffd99b0d860, 2023-01-11T23:33:27.4090138Z tstate=) 2023-01-11T23:33:27.4090471Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:27.4090739Z #77 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.4091061Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231 2023-01-11T23:33:27.4091802Z #78 0x0000000000594b72 in _PyEval_EvalFrame ( 2023-01-11T23:33:27.4092260Z throwflag=, 2023-01-11T23:33:27.4092664Z f=, 2023-01-11T23:33:27.4093044Z tstate=) 2023-01-11T23:33:27.4093623Z at /croot/python-split_1669298683653/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/call.c:46 2023-01-11T23:33:27.4093920Z #79 _PyEval_Vector () 2023-01-11T23:33:27.4094268Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.4098499Z #80 0x0000000000594ab7 in PyEval_EvalCode (co=co@entry=0x7f494de84920, 2023-01-11T23:33:27.4098814Z globals=globals@entry=0x7f494de76600, locals=locals@entry=0x7f494de76600) 2023-01-11T23:33:27.4099283Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:1134 2023-01-11T23:33:27.4099988Z #81 0x00000000005c6e57 in run_eval_code_obj () 2023-01-11T23:33:27.4100369Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1291 2023-01-11T23:33:27.4102675Z #82 0x00000000005c1d40 in run_mod () 2023-01-11T23:33:27.4103027Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1312 2023-01-11T23:33:27.4105087Z #83 0x00000000005b9ebb in PyRun_StringFlags.localalias () 2023-01-11T23:33:27.4105465Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1183 2023-01-11T23:33:27.4107620Z #84 0x00000000005b9cfb in PyRun_SimpleStringFlags.localalias () 2023-01-11T23:33:27.4108021Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:503 2023-01-11T23:33:27.4110822Z #85 0x00000000005b8d5c in pymain_run_command ( 2023-01-11T23:33:27.4112314Z command=) 2023-01-11T23:33:27.4112873Z at /croot/python-split_1669298683653/work/build-static/python.c:252 2023-01-11T23:33:27.4113212Z #86 pymain_run_python (exitcode=0x7ffd99b0dac0) 2023-01-11T23:33:27.4113561Z at /croot/python-split_1669298683653/work/build-static/python.c:582 2023-01-11T23:33:27.4113820Z #87 Py_RunMain.localalias () 2023-01-11T23:33:27.4114164Z at /croot/python-split_1669298683653/work/build-static/python.c:670 2023-01-11T23:33:27.4129408Z #88 0x0000000000587c29 in Py_BytesMain (argc=, 2023-01-11T23:33:27.4129685Z 
argv=<optimized out>) 2023-01-11T23:33:27.4130027Z at /croot/python-split_1669298683653/work/build-static/python.c:1090 2023-01-11T23:33:27.4134205Z #89 0x00007f494df55c87 in __libc_start_main (main=0x587be0 <main>
, argc=5, 2023-01-11T23:33:27.4135449Z argv=0x7ffd99b0dcc8, init=<optimized out>, fini=<optimized out>, 2023-01-11T23:33:27.4136163Z rtld_fini=<optimized out>, stack_end=0x7ffd99b0dcb8) 2023-01-11T23:33:27.4136605Z at ../csu/libc-start.c:310 2023-01-11T23:33:27.4136905Z #90 0x0000000000587ade in _start () 2023-01-11T23:33:27.4137335Z at /usr/local/src/conda/python-3.10.8/Modules/_io/clinic/peg_api.c:880 2023-01-11T23:33:27.5657320Z GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1 2023-01-11T23:33:27.5657758Z Copyright (C) 2018 Free Software Foundation, Inc. 2023-01-11T23:33:27.5658111Z License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> 2023-01-11T23:33:27.5658429Z This is free software: you are free to change and redistribute it. 2023-01-11T23:33:27.5658763Z There is NO WARRANTY, to the extent permitted by law. Type "show copying" 2023-01-11T23:33:27.5659036Z and "show warranty" for details. 2023-01-11T23:33:27.5659563Z This GDB was configured as "x86_64-linux-gnu". 2023-01-11T23:33:27.5659831Z Type "show configuration" for configuration details. 2023-01-11T23:33:27.5660101Z For bug reporting instructions, please see: 2023-01-11T23:33:27.5660352Z <http://www.gnu.org/software/gdb/bugs/>. 2023-01-11T23:33:27.5660634Z Find the GDB manual and other documentation resources online at: 2023-01-11T23:33:27.5660926Z <http://www.gnu.org/software/gdb/documentation/>. 2023-01-11T23:33:27.5661162Z For help, type "help". 2023-01-11T23:33:27.5661417Z Type "apropos word" to search for commands related to "word"... 2023-01-11T23:33:27.6368492Z Reading symbols from python...done. 2023-01-11T23:33:28.1652825Z 2023-01-11T23:33:28.1653312Z warning: core file may not match specified executable file. 2023-01-11T23:33:28.1675785Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:28.1676162Z [New LWP 8834] 2023-01-11T23:33:28.1676430Z [New LWP 8852] 2023-01-11T23:33:28.1676672Z [New LWP 8843] 2023-01-11T23:33:28.1676906Z [New LWP 8853] 2023-01-11T23:33:28.1677205Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:28.1677669Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:28.1678247Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:28.1678706Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:28.1679073Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:28.1679330Z [New LWP 8849] 2023-01-11T23:33:28.1679518Z [New LWP 8855] 2023-01-11T23:33:28.1679706Z [New LWP 8845] 2023-01-11T23:33:28.1679888Z [New LWP 8851] 2023-01-11T23:33:28.1680072Z [New LWP 8856] 2023-01-11T23:33:28.1680255Z [New LWP 8854] 2023-01-11T23:33:28.1680428Z [New LWP 8841] 2023-01-11T23:33:28.1680611Z [New LWP 8847] 2023-01-11T23:33:28.1680795Z [New LWP 8868] 2023-01-11T23:33:28.1680968Z [New LWP 8873] 2023-01-11T23:33:28.1681148Z [New LWP 8866] 2023-01-11T23:33:28.1681331Z [New LWP 8869] 2023-01-11T23:33:28.1681503Z [New LWP 8867] 2023-01-11T23:33:28.1687936Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:28.1688673Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:28.1689233Z BFD: warning: 
/opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:28.1689774Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:28.1857092Z [Thread debugging using libthread_db enabled] 2023-01-11T23:33:28.1858109Z Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 2023-01-11T23:33:33.6833990Z 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. 2023-01-11T23:33:33.6835141Z Core was generated by `/opt/conda/bin/python -bb -c from multiprocessing.spawn import spawn_main; spaw'. 2023-01-11T23:33:33.6836264Z warning: File "/var/lib/jenkins/workspace/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load". 2023-01-11T23:33:33.6836945Z Program terminated with signal SIGSEGV, Segmentation fault. 2023-01-11T23:33:33.6837274Z #0 raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:51 2023-01-11T23:33:33.6837563Z [Current thread is 1 (Thread 0x7fb3f30a2080 (LWP 8834))] 2023-01-11T23:33:33.6864579Z To enable execution of this file add 2023-01-11T23:33:33.6865563Z add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit 2023-01-11T23:33:33.6866359Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:33.6867217Z To completely disable this security protection add 2023-01-11T23:33:33.6867506Z set auto-load safe-path / 2023-01-11T23:33:33.6867761Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:33.6868055Z For more information about this security protection see the 2023-01-11T23:33:33.6868423Z "Auto-loading safe path" section in the GDB manual. 
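(Aside: every "Core was generated by" line in this log shows the same truncated command, python -bb -c "from multiprocessing.spawn import spawn_main; spaw…". That is the standard command line CPython's "spawn" start method uses to boot a worker, so each core belongs to a child test worker rather than the main test runner. A sketch of how such workers come to exist — standard-library behaviour, not PyTorch code taken from this log:

    # spawn_sketch.py -- the "spawn" context launches each child as a fresh
    # python -c "from multiprocessing.spawn import spawn_main; spawn_main(...)",
    # matching the truncated command GDB reports for every core here.
    import multiprocessing as mp

    def child():
        pass  # the real workers crashed inside a ctypes call (see the backtraces)

    if __name__ == "__main__":
        ctx = mp.get_context("spawn")
        p = ctx.Process(target=child)
        p.start()
        p.join()

)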
E.g., run from the shell: 2023-01-11T23:33:33.6868739Z info "(gdb)Auto-loading safe path" 2023-01-11T23:33:33.6895504Z #0 raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:51 2023-01-11T23:33:33.6896310Z #1 0x00007fb3ba0be65b in handler_SIGSEGV(int, siginfo_t*, void*) () 2023-01-11T23:33:33.6897125Z from /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_python.so 2023-01-11T23:33:33.6906155Z #2 2023-01-11T23:33:33.6906498Z #3 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 2023-01-11T23:33:33.7040943Z #4 0x00007fb3f1b6003b in string_at (ptr=0x0, size=-1) at :5564 2023-01-11T23:33:33.7041252Z #5 0x00007fb3f2f57052 in ffi_call_unix64 () 2023-01-11T23:33:33.7041625Z from /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8 2023-01-11T23:33:33.7047230Z #6 0x00007fb3f2f558cd in ffi_call_int () 2023-01-11T23:33:33.7048332Z from /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8 2023-01-11T23:33:33.7052052Z #7 0x00007fb3f1b68879 in _call_function_pointer (argtypecount=2, argcount=2, 2023-01-11T23:33:33.7052611Z resmem=0x7ffc09d87bc0, restype=, atypes=, 2023-01-11T23:33:33.7053523Z avalues=, pProc=0x7fb3f1b60002 , flags=4357) 2023-01-11T23:33:33.7054360Z at /usr/local/src/conda/python-3.10.8/build-static/stgdict.c:916 2023-01-11T23:33:33.7055352Z #8 _ctypes_callproc () 2023-01-11T23:33:33.7056013Z at /usr/local/src/conda/python-3.10.8/build-static/stgdict.c:1259 2023-01-11T23:33:33.7056319Z #9 0x00007fb3f1b683fe in PyCFuncPtr_call () at :4201 2023-01-11T23:33:33.7056626Z #10 0x00000000004f7b8b in _PyObject_MakeTpCall.localalias () 2023-01-11T23:33:33.7056972Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224 2023-01-11T23:33:33.7060315Z #11 0x00000000004f37ae in _PyObject_VectorcallTstate ( 2023-01-11T23:33:33.7060782Z kwnames=, 2023-01-11T23:33:33.7061308Z nargsf=, args=0x7fb3015af418, 2023-01-11T23:33:33.7061825Z callable=, 2023-01-11T23:33:33.7062319Z tstate=) 2023-01-11T23:33:33.7062795Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112 2023-01-11T23:33:33.7063116Z #12 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, 2023-01-11T23:33:33.7063430Z args=0x7fb3015af418, callable=0x7fb3f1e1c880, tstate=) 2023-01-11T23:33:33.7063786Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99 2023-01-11T23:33:33.7064787Z #13 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:33.7065157Z args=0x7fb3015af418, callable=0x7fb3f1e1c880) 2023-01-11T23:33:33.7065524Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:33.7068199Z #14 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:33.7068519Z pp_stack=, trace_info=0x7ffc09d87ed0, 2023-01-11T23:33:33.7068830Z tstate=) 2023-01-11T23:33:33.7069300Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:33.7069614Z #15 _PyEval_EvalFrameDefault () 2023-01-11T23:33:33.7070005Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:33.7070274Z #16 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:33.7070869Z throwflag=, 2023-01-11T23:33:33.7071249Z f=, 2023-01-11T23:33:33.7071618Z tstate=) 2023-01-11T23:33:33.7072006Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:33.7074812Z #17 _PyEval_Vector (kwnames=, 2023-01-11T23:33:33.7075213Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:33.7075848Z args@entry=, locals=0x0, 2023-01-11T23:33:33.7076418Z locals@entry=, con=0x7fb3f1ddb890, 2023-01-11T23:33:33.7076845Z con@entry=, tstate=0x1fb3bf0, 
2023-01-11T23:33:33.7077249Z tstate@entry=<optimized out>)
2023-01-11T23:33:33.7077684Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7077938Z #18 _PyFunction_Vectorcall ()
2023-01-11T23:33:33.7078259Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7080328Z #19 0x00000000004f351e in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:33.7080726Z nargsf=<optimized out>, args=0x7fb3015af280, callable=0x7fb3f1ddb880,
2023-01-11T23:33:33.7081046Z tstate=0x1fb3bf0)
2023-01-11T23:33:33.7081454Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7082061Z #20 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:33.7082445Z args=0x7fb3015af280, callable=0x7fb3f1ddb880)
2023-01-11T23:33:33.7082906Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7083543Z #21 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:33.7083929Z pp_stack=<optimized out>, trace_info=0x7ffc09d88090,
2023-01-11T23:33:33.7084258Z tstate=<optimized out>)
2023-01-11T23:33:33.7084680Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:33.7084945Z #22 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7085264Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4181
2023-01-11T23:33:33.7085759Z #23 0x0000000000543a33 in _PyEval_EvalFrame (
2023-01-11T23:33:33.7086172Z throwflag=<optimized out>,
2023-01-11T23:33:33.7086625Z f=<optimized out>,
2023-01-11T23:33:33.7086990Z tstate=<optimized out>)
2023-01-11T23:33:33.7087390Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:33.7090041Z #24 _PyEval_Vector (kwnames=0x0,
2023-01-11T23:33:33.7090482Z argcount=<optimized out>,
2023-01-11T23:33:33.7091030Z args=<optimized out>, locals=0x0, con=0x7fb3013c1ac0, tstate=0x1fb3bf0)
2023-01-11T23:33:33.7091575Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7091880Z #25 _PyFunction_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:33.7092155Z stack=<optimized out>, func=0x7fb3013c1ab0)
2023-01-11T23:33:33.7092492Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7095244Z #26 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:33.7095814Z args=<optimized out>, callable=0x7fb3013c1ab0, tstate=0x1fb3bf0)
2023-01-11T23:33:33.7096334Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114
2023-01-11T23:33:33.7096676Z #27 vectorcall_unbound (
2023-01-11T23:33:33.7097093Z nargs=<optimized out>, args=<optimized out>,
2023-01-11T23:33:33.7097607Z func=<optimized out>,
2023-01-11T23:33:33.7098035Z unbound=<optimized out>,
2023-01-11T23:33:33.7098452Z tstate=<optimized out>)
2023-01-11T23:33:33.7099077Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:1629
2023-01-11T23:33:33.7099419Z #28 vectorcall_method ()
2023-01-11T23:33:33.7099811Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:1661
2023-01-11T23:33:33.7100116Z #29 0x0000000000543898 in slot_mp_subscript (self=<optimized out>,
2023-01-11T23:33:33.7100359Z arg1=<optimized out>)
2023-01-11T23:33:33.7100664Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7258
2023-01-11T23:33:33.7100946Z #30 0x00000000004ef56e in _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7101280Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:2109
2023-01-11T23:33:33.7101931Z #31 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:33.7102370Z throwflag=<optimized out>,
2023-01-11T23:33:33.7102866Z f=<optimized out>,
2023-01-11T23:33:33.7103293Z tstate=<optimized out>)
2023-01-11T23:33:33.7103685Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:33.7106136Z #32 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:33.7106680Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:33.7107278Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:33.7107736Z locals@entry=<optimized out>, con=0x7fb3f1dd95b0,
2023-01-11T23:33:33.7108149Z con@entry=<optimized out>, tstate=0x1fb3bf0,
2023-01-11T23:33:33.7108542Z tstate@entry=<optimized out>)
2023-01-11T23:33:33.7108973Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7109223Z #33 _PyFunction_Vectorcall ()
2023-01-11T23:33:33.7109535Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7112055Z #34 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:33.7112471Z nargsf=<optimized out>, args=0x7fb3015b6a48, callable=0x7fb3f1dd95a0,
2023-01-11T23:33:33.7112790Z tstate=0x1fb3bf0)
2023-01-11T23:33:33.7113231Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7113644Z #35 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:33.7113986Z args=0x7fb3015b6a48, callable=0x7fb3f1dd95a0)
2023-01-11T23:33:33.7114351Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7115067Z #36 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:33.7115456Z pp_stack=<optimized out>, trace_info=0x7ffc09d88490,
2023-01-11T23:33:33.7115773Z tstate=<optimized out>)
2023-01-11T23:33:33.7116178Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:33.7116444Z #37 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7116884Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213
2023-01-11T23:33:33.7117516Z #38 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:33.7117947Z throwflag=<optimized out>,
2023-01-11T23:33:33.7118469Z f=<optimized out>,
2023-01-11T23:33:33.7118922Z tstate=<optimized out>)
2023-01-11T23:33:33.7119354Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:33.7121702Z #39 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:33.7122349Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:33.7122976Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:33.7123499Z locals@entry=<optimized out>, con=0x7fb30a3dd370,
2023-01-11T23:33:33.7123908Z con@entry=<optimized out>, tstate=0x1fb3bf0,
2023-01-11T23:33:33.7124307Z tstate@entry=<optimized out>)
2023-01-11T23:33:33.7124739Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7124999Z #40 _PyFunction_Vectorcall ()
2023-01-11T23:33:33.7125306Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7126601Z #41 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:33.7127020Z nargsf=<optimized out>, args=0x2009fa8, callable=0x7fb30a3dd360,
2023-01-11T23:33:33.7127350Z tstate=0x1fb3bf0)
2023-01-11T23:33:33.7127768Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7128559Z #42 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x2009fa8,
2023-01-11T23:33:33.7128919Z callable=0x7fb30a3dd360)
2023-01-11T23:33:33.7129337Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7130822Z #43 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:33.7131203Z pp_stack=<optimized out>, trace_info=0x7ffc09d88650,
2023-01-11T23:33:33.7131532Z tstate=<optimized out>)
2023-01-11T23:33:33.7131956Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:33.7132302Z #44 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7132727Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198
2023-01-11T23:33:33.7133008Z #45 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:33.7133334Z throwflag=<optimized out>,
2023-01-11T23:33:33.7133712Z f=<optimized out>,
2023-01-11T23:33:33.7134084Z tstate=<optimized out>)
2023-01-11T23:33:33.7134699Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:33.7137152Z #46 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:33.7137666Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:33.7138254Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:33.7138741Z locals@entry=<optimized out>, con=0x7fb30a3dc680,
2023-01-11T23:33:33.7139161Z con@entry=<optimized out>, tstate=0x1fb3bf0,
2023-01-11T23:33:33.7139676Z tstate@entry=<optimized out>)
2023-01-11T23:33:33.7140109Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7140362Z #47 _PyFunction_Vectorcall ()
2023-01-11T23:33:33.7140674Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7141716Z #48 0x00000000004f141c in do_call_core (kwdict=0x7fb3f1f26780,
2023-01-11T23:33:33.7142017Z callargs=0x7fb301569e40, func=0x7fb30a3dc670, trace_info=0x7ffc09d88810,
2023-01-11T23:33:33.7142330Z tstate=<optimized out>)
2023-01-11T23:33:33.7142667Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943
2023-01-11T23:33:33.7142921Z #49 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7143308Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277
2023-01-11T23:33:33.7143612Z #50 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:33.7144025Z throwflag=<optimized out>,
2023-01-11T23:33:33.7144416Z f=<optimized out>,
2023-01-11T23:33:33.7144783Z tstate=<optimized out>)
2023-01-11T23:33:33.7145175Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:33.7147862Z #51 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:33.7148255Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:33.7148826Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:33.7149329Z locals@entry=<optimized out>, con=0x7fb3f1f515b0,
2023-01-11T23:33:33.7149747Z con@entry=<optimized out>, tstate=0x1fb3bf0,
2023-01-11T23:33:33.7150232Z tstate@entry=<optimized out>)
2023-01-11T23:33:33.7150659Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7150915Z #52 _PyFunction_Vectorcall ()
2023-01-11T23:33:33.7151221Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7153071Z #53 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:33.7153468Z nargsf=<optimized out>, args=0x700c638, callable=0x7fb3f1f515a0,
2023-01-11T23:33:33.7153784Z tstate=0x1fb3bf0)
2023-01-11T23:33:33.7154175Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7154943Z #54 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x700c638,
2023-01-11T23:33:33.7155302Z callable=0x7fb3f1f515a0)
2023-01-11T23:33:33.7155749Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7156630Z #55 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:33.7157009Z pp_stack=<optimized out>, trace_info=0x7ffc09d889d0,
2023-01-11T23:33:33.7157350Z tstate=<optimized out>)
2023-01-11T23:33:33.7157748Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:33.7158011Z #56 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7158343Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198
2023-01-11T23:33:33.7158701Z #57 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:33.7159098Z throwflag=<optimized out>,
2023-01-11T23:33:33.7159479Z f=<optimized out>,
2023-01-11T23:33:33.7159848Z tstate=<optimized out>)
2023-01-11T23:33:33.7160311Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:33.7162580Z #58 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:33.7163045Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:33.7163626Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:33.7164087Z locals@entry=<optimized out>, con=0x7fb3f1f51f40,
2023-01-11T23:33:33.7164506Z con@entry=<optimized out>, tstate=0x1fb3bf0,
2023-01-11T23:33:33.7164954Z tstate@entry=<optimized out>)
2023-01-11T23:33:33.7165385Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7165655Z #59 _PyFunction_Vectorcall ()
2023-01-11T23:33:33.7166000Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7167351Z #60 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:33.7167718Z nargsf=<optimized out>, args=0x2030510, callable=0x7fb3f1f51f30,
2023-01-11T23:33:33.7168030Z tstate=0x1fb3bf0)
2023-01-11T23:33:33.7168423Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7169650Z #61 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x2030510,
2023-01-11T23:33:33.7169973Z callable=0x7fb3f1f51f30)
2023-01-11T23:33:33.7170309Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7171710Z #62 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:33.7172044Z pp_stack=<optimized out>, trace_info=0x7ffc09d88b90,
2023-01-11T23:33:33.7172328Z tstate=<optimized out>)
2023-01-11T23:33:33.7172665Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:33.7172923Z #63 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7173232Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198
2023-01-11T23:33:33.7173510Z #64 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:33.7173886Z throwflag=<optimized out>,
2023-01-11T23:33:33.7174260Z f=<optimized out>,
2023-01-11T23:33:33.7174902Z tstate=<optimized out>)
2023-01-11T23:33:33.7175310Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:33.7177835Z #65 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:33.7178219Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:33.7178742Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:33.7179248Z locals@entry=<optimized out>, con=0x7fb3f1dd2de0,
2023-01-11T23:33:33.7179663Z con@entry=<optimized out>, tstate=0x1fb3bf0,
2023-01-11T23:33:33.7180058Z tstate@entry=<optimized out>)
2023-01-11T23:33:33.7180454Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7180709Z #66 _PyFunction_Vectorcall ()
2023-01-11T23:33:33.7181029Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7183257Z #67 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:33.7183932Z nargsf=<optimized out>, args=0x7fb3f2098200, callable=0x7fb3f1dd2dd0,
2023-01-11T23:33:33.7184495Z tstate=0x1fb3bf0)
2023-01-11T23:33:33.7185040Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7185914Z #68 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:33.7186329Z args=0x7fb3f2098200, callable=0x7fb3f1dd2dd0)
2023-01-11T23:33:33.7186859Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7187990Z #69 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:33.7188434Z pp_stack=<optimized out>, trace_info=0x7ffc09d88d50,
2023-01-11T23:33:33.7188792Z tstate=<optimized out>)
2023-01-11T23:33:33.7189291Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:33.7189803Z #70 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7190369Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213
2023-01-11T23:33:33.7191215Z #71 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:33.7192282Z throwflag=<optimized out>,
2023-01-11T23:33:33.7192859Z f=<optimized out>,
2023-01-11T23:33:33.7193444Z tstate=<optimized out>)
2023-01-11T23:33:33.7194041Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:33.7196634Z #72 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:33.7197224Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:33.7197888Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:33.7198521Z locals@entry=<optimized out>, con=0x7fb3f1dd2d50,
2023-01-11T23:33:33.7199141Z con@entry=<optimized out>, tstate=0x1fb3bf0,
2023-01-11T23:33:33.7199741Z tstate@entry=<optimized out>)
2023-01-11T23:33:33.7200368Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7200748Z #73 _PyFunction_Vectorcall ()
2023-01-11T23:33:33.7201219Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7201901Z #74 0x00000000004efd83 in _PyObject_VectorcallTstate (kwnames=0x7fb3f2071e00,
2023-01-11T23:33:33.7202240Z nargsf=<optimized out>, args=<optimized out>, callable=0x7fb3f1dd2d40,
2023-01-11T23:33:33.7202501Z tstate=0x1fb3bf0)
2023-01-11T23:33:33.7202862Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7203579Z #75 PyObject_Vectorcall (kwnames=0x7fb3f2071e00, nargsf=<optimized out>,
2023-01-11T23:33:33.7203873Z args=<optimized out>, callable=0x7fb3f1dd2d40)
2023-01-11T23:33:33.7204416Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7206272Z #76 call_function (kwnames=0x7fb3f2071e00, oparg=<optimized out>,
2023-01-11T23:33:33.7206714Z pp_stack=<optimized out>, trace_info=0x7ffc09d88f10,
2023-01-11T23:33:33.7207094Z tstate=<optimized out>)
2023-01-11T23:33:33.7207570Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:33.7207936Z #77 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7208430Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231
2023-01-11T23:33:33.7209335Z #78 0x0000000000594b72 in _PyEval_EvalFrame (
2023-01-11T23:33:33.7209818Z throwflag=<optimized out>,
2023-01-11T23:33:33.7210390Z f=<optimized out>,
2023-01-11T23:33:33.7211040Z tstate=<optimized out>)
2023-01-11T23:33:33.7211736Z at /croot/python-split_1669298683653/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/call.c:46
2023-01-11T23:33:33.7212152Z #79 _PyEval_Vector ()
2023-01-11T23:33:33.7212625Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7217419Z #80 0x0000000000594ab7 in PyEval_EvalCode (co=co@entry=0x7fb3f2034920,
2023-01-11T23:33:33.7217894Z globals=globals@entry=0x7fb3f2026600, locals=locals@entry=0x7fb3f2026600)
2023-01-11T23:33:33.7218441Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:1134
2023-01-11T23:33:33.7218833Z #81 0x00000000005c6e57 in run_eval_code_obj ()
2023-01-11T23:33:33.7219447Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1291
2023-01-11T23:33:33.7222360Z #82 0x00000000005c1d40 in run_mod ()
2023-01-11T23:33:33.7222874Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1312
2023-01-11T23:33:33.7224833Z #83 0x00000000005b9ebb in PyRun_StringFlags.localalias ()
2023-01-11T23:33:33.7225401Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1183
2023-01-11T23:33:33.7227880Z #84 0x00000000005b9cfb in PyRun_SimpleStringFlags.localalias ()
2023-01-11T23:33:33.7228466Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:503
2023-01-11T23:33:33.7232489Z #85 0x00000000005b8d5c in pymain_run_command (
2023-01-11T23:33:33.7232992Z command=<optimized out>)
2023-01-11T23:33:33.7233601Z at /croot/python-split_1669298683653/work/build-static/python.c:252
2023-01-11T23:33:33.7234008Z #86 pymain_run_python (exitcode=0x7ffc09d89170)
2023-01-11T23:33:33.7234525Z at /croot/python-split_1669298683653/work/build-static/python.c:582
2023-01-11T23:33:33.7234903Z #87 Py_RunMain.localalias ()
2023-01-11T23:33:33.7235404Z at /croot/python-split_1669298683653/work/build-static/python.c:670
2023-01-11T23:33:33.7250882Z #88 0x0000000000587c29 in Py_BytesMain (argc=<optimized out>,
2023-01-11T23:33:33.7251267Z argv=<optimized out>)
2023-01-11T23:33:33.7251779Z at /croot/python-split_1669298683653/work/build-static/python.c:1090
2023-01-11T23:33:33.7255882Z #89 0x00007fb3f2106c87 in __libc_start_main (main=0x587be0 <main>, argc=5,
2023-01-11T23:33:33.7256272Z argv=0x7ffc09d89378, init=<optimized out>, fini=<optimized out>,
2023-01-11T23:33:33.7256630Z rtld_fini=<optimized out>, stack_end=0x7ffc09d89368)
2023-01-11T23:33:33.7257063Z at ../csu/libc-start.c:310
2023-01-11T23:33:33.7257378Z #90 0x0000000000587ade in _start ()
2023-01-11T23:33:33.7257855Z at /usr/local/src/conda/python-3.10.8/Modules/_io/clinic/peg_api.c:880
2023-01-11T23:33:33.8771122Z GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
2023-01-11T23:33:33.8771726Z Copyright (C) 2018 Free Software Foundation, Inc.
2023-01-11T23:33:33.8772177Z License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
2023-01-11T23:33:33.8772662Z This is free software: you are free to change and redistribute it.
2023-01-11T23:33:33.8772990Z There is NO WARRANTY, to the extent permitted by law. Type "show copying"
2023-01-11T23:33:33.8773260Z and "show warranty" for details.
2023-01-11T23:33:33.8773577Z This GDB was configured as "x86_64-linux-gnu".
2023-01-11T23:33:33.8773853Z Type "show configuration" for configuration details.
2023-01-11T23:33:33.8774129Z For bug reporting instructions, please see:
2023-01-11T23:33:33.8774397Z <http://www.gnu.org/software/gdb/bugs/>.
2023-01-11T23:33:33.8774934Z Find the GDB manual and other documentation resources online at:
2023-01-11T23:33:33.8775221Z <http://www.gnu.org/software/gdb/documentation/>.
2023-01-11T23:33:33.8775466Z For help, type "help".
2023-01-11T23:33:33.8775731Z Type "apropos word" to search for commands related to "word"...
2023-01-11T23:33:33.9484277Z Reading symbols from python...done.
2023-01-11T23:33:34.4779141Z
2023-01-11T23:33:34.4779877Z warning: core file may not match specified executable file.
2023-01-11T23:33:34.4798979Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
2023-01-11T23:33:34.4799532Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
2023-01-11T23:33:34.4799908Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
2023-01-11T23:33:34.4800186Z [New LWP 19077]
2023-01-11T23:33:34.4800484Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
2023-01-11T23:33:34.4800850Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
2023-01-11T23:33:34.4801502Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
2023-01-11T23:33:34.4809496Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
2023-01-11T23:33:34.4810185Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
2023-01-11T23:33:34.4810751Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
2023-01-11T23:33:34.4811293Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
2023-01-11T23:33:34.4991938Z [Thread debugging using libthread_db enabled]
2023-01-11T23:33:34.4993032Z Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
2023-01-11T23:33:40.4032594Z 78 ../sysdeps/unix/syscall-template.S: No such file or directory.
2023-01-11T23:33:40.4033133Z Core was generated by `/opt/conda/bin/python -bb test_multiprocessing_spawn.py -v --import-slow-tests'.
2023-01-11T23:33:40.4033716Z warning: File "/var/lib/jenkins/workspace/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
2023-01-11T23:33:40.4034098Z Program terminated with signal SIGABRT, Aborted.
2023-01-11T23:33:40.4034464Z #0 0x00007f1c269ce177 in kill () at ../sysdeps/unix/syscall-template.S:78
2023-01-11T23:33:40.4063665Z To enable execution of this file add
2023-01-11T23:33:40.4064840Z add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit
2023-01-11T23:33:40.4065630Z line to your configuration file "/var/lib/jenkins/.gdbinit".
2023-01-11T23:33:40.4066227Z To completely disable this security protection add
2023-01-11T23:33:40.4066751Z set auto-load safe-path /
2023-01-11T23:33:40.4067215Z line to your configuration file "/var/lib/jenkins/.gdbinit".
2023-01-11T23:33:40.4067758Z For more information about this security protection see the
2023-01-11T23:33:40.4068419Z "Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
2023-01-11T23:33:40.4068779Z info "(gdb)Auto-loading safe path"
2023-01-11T23:33:40.4097364Z #0 0x00007f1c269ce177 in kill () at ../sysdeps/unix/syscall-template.S:78
2023-01-11T23:33:40.4098738Z #1 0x00000000004cb0d3 in os_kill_impl (
2023-01-11T23:33:40.4099559Z module=<optimized out>,
2023-01-11T23:33:40.4100419Z signal=<optimized out>,
2023-01-11T23:33:40.4101156Z pid=<optimized out>)
2023-01-11T23:33:40.4102115Z at /croot/python-split_1669298683653/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/sys/_iomodule.c:7929
2023-01-11T23:33:40.4106398Z #2 os_kill (module=<optimized out>, args=args@entry=0x7f1b327c5058,
2023-01-11T23:33:40.4106954Z nargs=<optimized out>)
2023-01-11T23:33:40.4107670Z at /usr/local/src/conda/python-3.10.8/Modules/codecs.c:3581
2023-01-11T23:33:40.4109453Z #3 0x00000000004fe7d4 in cfunction_vectorcall_FASTCALL (func=0x7f1c27911a30,
2023-01-11T23:33:40.4110610Z args=0x7f1b327c5058, nargsf=<optimized out>, kwnames=<optimized out>)
2023-01-11T23:33:40.4111605Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_bitutils.h:430
2023-01-11T23:33:40.4115046Z #4 0x00000000004f351e in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:40.4115505Z nargsf=<optimized out>, args=0x7f1b327c5058, callable=0x7f1c27911a30,
2023-01-11T23:33:40.4115856Z tstate=0xb34b80)
2023-01-11T23:33:40.4116345Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4117056Z #5 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4117441Z args=0x7f1b327c5058, callable=0x7f1c27911a30)
2023-01-11T23:33:40.4118239Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4119779Z #6 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4120517Z pp_stack=<optimized out>, trace_info=0x7ffc614eb7c0,
2023-01-11T23:33:40.4120911Z tstate=<optimized out>)
2023-01-11T23:33:40.4121763Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4122360Z #7 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4122882Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4181
2023-01-11T23:33:40.4123308Z #8 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:40.4123839Z throwflag=<optimized out>,
2023-01-11T23:33:40.4124437Z f=<optimized out>,
2023-01-11T23:33:40.4124808Z tstate=<optimized out>)
2023-01-11T23:33:40.4125213Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4125586Z #9 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:40.4126064Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4126780Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:40.4127339Z locals@entry=<optimized out>, con=0x7f1b35dacd40,
2023-01-11T23:33:40.4127751Z con@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4128178Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4128596Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4128859Z #10 _PyFunction_Vectorcall ()
2023-01-11T23:33:40.4129174Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4131067Z #11 0x00000000004f141c in do_call_core (kwdict=0x0, callargs=0x7f1b35d70c10,
2023-01-11T23:33:40.4131621Z func=0x7f1b35dacd30, trace_info=0x7ffc614eb980, tstate=<optimized out>)
2023-01-11T23:33:40.4132044Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943
2023-01-11T23:33:40.4132372Z #12 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4132786Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277
2023-01-11T23:33:40.4133058Z #13 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:40.4133382Z throwflag=<optimized out>,
2023-01-11T23:33:40.4133748Z f=<optimized out>,
2023-01-11T23:33:40.4134122Z tstate=<optimized out>)
2023-01-11T23:33:40.4134929Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4137034Z #14 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:40.4137585Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4148776Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:40.4149330Z locals@entry=<optimized out>, con=0x7f1b3f41b1d0,
2023-01-11T23:33:40.4149963Z con@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4150427Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4151025Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4151286Z #15 _PyFunction_Vectorcall ()
2023-01-11T23:33:40.4151607Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4151972Z #16 0x00000000004f141c in do_call_core (kwdict=0x7f1b3f3de800,
2023-01-11T23:33:40.4152300Z callargs=0x7f1b327ae2a0, func=0x7f1b3f41b1c0, trace_info=0x7ffc614ebb40,
2023-01-11T23:33:40.4152549Z tstate=<optimized out>)
2023-01-11T23:33:40.4152916Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943
2023-01-11T23:33:40.4153206Z #17 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4153519Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277
2023-01-11T23:33:40.4153787Z #18 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:40.4154106Z throwflag=<optimized out>,
2023-01-11T23:33:40.4154577Z f=<optimized out>,
2023-01-11T23:33:40.4154944Z tstate=<optimized out>)
2023-01-11T23:33:40.4155427Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4155757Z #19 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:40.4156245Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4156763Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:40.4157175Z locals@entry=<optimized out>, con=0x7f1b3f3c7650,
2023-01-11T23:33:40.4157667Z con@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4158188Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4158723Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4158983Z #20 _PyFunction_Vectorcall ()
2023-01-11T23:33:40.4159290Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4159600Z #21 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:40.4159900Z nargsf=<optimized out>, args=0x39ca14b8, callable=0x7f1b3f3c7640,
2023-01-11T23:33:40.4160187Z tstate=0xb34b80)
2023-01-11T23:33:40.4160585Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4160992Z #22 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4161305Z args=0x39ca14b8, callable=0x7f1b3f3c7640)
2023-01-11T23:33:40.4161643Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4161934Z #23 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4162215Z pp_stack=<optimized out>, trace_info=0x7ffc614ebd00,
2023-01-11T23:33:40.4162462Z tstate=<optimized out>)
2023-01-11T23:33:40.4162764Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4163097Z #24 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4163414Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198
2023-01-11T23:33:40.4163685Z #25 0x0000000000509dbe in _PyEval_EvalFrame (
2023-01-11T23:33:40.4164003Z throwflag=<optimized out>,
2023-01-11T23:33:40.4164375Z f=<optimized out>,
2023-01-11T23:33:40.4164746Z tstate=<optimized out>)
2023-01-11T23:33:40.4165125Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4165500Z #26 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>,
2023-01-11T23:33:40.4165864Z args=0x7f1b3325f9d8, locals=0x0, con=0x7f1b3f3f0050, tstate=0xb34b80)
2023-01-11T23:33:40.4166206Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4166936Z #27 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>,
2023-01-11T23:33:40.4167300Z stack=0x7f1b3325f9d8, func=0x7f1b3f3f0040)
2023-01-11T23:33:40.4167636Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4170714Z #28 _PyObject_VectorcallTstate (kwnames=<optimized out>,
2023-01-11T23:33:40.4171024Z nargsf=<optimized out>, args=0x7f1b3325f9d8, callable=0x7f1b3f3f0040,
2023-01-11T23:33:40.4171311Z tstate=0xb34b80)
2023-01-11T23:33:40.4171691Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114
2023-01-11T23:33:40.4171957Z #29 method_vectorcall ()
2023-01-11T23:33:40.4172282Z at /usr/local/src/conda/python-3.10.8/Programs/_functoolsmodule.c:53
2023-01-11T23:33:40.4174956Z #30 0x00000000004efd83 in _PyObject_VectorcallTstate (kwnames=0x7f1b328f77f0,
2023-01-11T23:33:40.4175392Z nargsf=<optimized out>, args=<optimized out>, callable=0x7f1b35a1f400,
2023-01-11T23:33:40.4175686Z tstate=0xb34b80)
2023-01-11T23:33:40.4176019Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4176537Z #31 PyObject_Vectorcall (kwnames=0x7f1b328f77f0, nargsf=<optimized out>,
2023-01-11T23:33:40.4176863Z args=<optimized out>, callable=0x7f1b35a1f400)
2023-01-11T23:33:40.4177408Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4178080Z #32 call_function (kwnames=0x7f1b328f77f0, oparg=<optimized out>,
2023-01-11T23:33:40.4178381Z pp_stack=<optimized out>, trace_info=0x7ffc614ebf10,
2023-01-11T23:33:40.4178655Z tstate=<optimized out>)
2023-01-11T23:33:40.4179043Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4179306Z #33 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4179685Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231
2023-01-11T23:33:40.4180111Z #34 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:40.4180483Z throwflag=<optimized out>,
2023-01-11T23:33:40.4180902Z f=<optimized out>,
2023-01-11T23:33:40.4181271Z tstate=<optimized out>)
2023-01-11T23:33:40.4181671Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4184928Z #35 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:40.4185454Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4185911Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:40.4186332Z locals@entry=<optimized out>, con=0x7f1b327b7410,
2023-01-11T23:33:40.4186865Z con@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4187258Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4187669Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4187930Z #36 _PyFunction_Vectorcall ()
2023-01-11T23:33:40.4188239Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4189996Z #37 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:40.4190362Z nargsf=<optimized out>, args=0x7f1b328cb278, callable=0x7f1b327b7400,
2023-01-11T23:33:40.4190619Z tstate=0xb34b80)
2023-01-11T23:33:40.4191024Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4193390Z #38 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4193745Z args=0x7f1b328cb278, callable=0x7f1b327b7400)
2023-01-11T23:33:40.4194286Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4194669Z #39 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4194954Z pp_stack=<optimized out>, trace_info=0x7ffc614ec0d0,
2023-01-11T23:33:40.4195208Z tstate=<optimized out>)
2023-01-11T23:33:40.4195528Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4195784Z #40 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4196186Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198
2023-01-11T23:33:40.4196505Z #41 0x00000000004f706d in _PyEval_EvalFrame (
2023-01-11T23:33:40.4196924Z throwflag=<optimized out>,
2023-01-11T23:33:40.4197447Z f=<optimized out>,
2023-01-11T23:33:40.4197849Z tstate=<optimized out>)
2023-01-11T23:33:40.4198253Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4200755Z #42 _PyEval_Vector (kwnames=0x0,
2023-01-11T23:33:40.4201513Z argcount=<optimized out>, args=<optimized out>, locals=0x0, con=0x7f1b327b7020,
2023-01-11T23:33:40.4202066Z tstate=0xb34b80)
2023-01-11T23:33:40.4202531Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4202969Z #43 _PyFunction_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4203274Z stack=<optimized out>, func=0x7f1b327b7010)
2023-01-11T23:33:40.4203607Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4203904Z #44 _PyObject_FastCallDictTstate.localalias ()
2023-01-11T23:33:40.4204241Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:142
2023-01-11T23:33:40.4208746Z #45 0x0000000000507af8 in _PyObject_Call_Prepend (kwargs=0x0,
2023-01-11T23:33:40.4209424Z kwargs@entry=<optimized out>, args=0x7f1b35d91270,
2023-01-11T23:33:40.4210014Z args@entry=<optimized out>, obj=<optimized out>,
2023-01-11T23:33:40.4210596Z obj@entry=<optimized out>, callable=0x7f1b327b7010,
2023-01-11T23:33:40.4211060Z callable@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4211462Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4211880Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:431
2023-01-11T23:33:40.4212119Z #46 slot_tp_init ()
2023-01-11T23:33:40.4212431Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7734
2023-01-11T23:33:40.4212900Z #47 0x00000000004f7bdb in type_call (kwds=0x0, args=0x7f1b35d91270,
2023-01-11T23:33:40.4213208Z type=<optimized out>)
2023-01-11T23:33:40.4213627Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224
2023-01-11T23:33:40.4213901Z #48 _PyObject_MakeTpCall.localalias ()
2023-01-11T23:33:40.4214241Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:215
2023-01-11T23:33:40.4215225Z #49 0x00000000004f37ae in _PyObject_VectorcallTstate (
2023-01-11T23:33:40.4215647Z kwnames=<optimized out>,
2023-01-11T23:33:40.4216053Z nargsf=<optimized out>, args=0x7f1b328cb0d8,
2023-01-11T23:33:40.4216549Z callable=<optimized out>,
2023-01-11T23:33:40.4216973Z tstate=<optimized out>)
2023-01-11T23:33:40.4217409Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112
2023-01-11T23:33:40.4218010Z #50 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4218328Z args=0x7f1b328cb0d8, callable=0x39dcd400, tstate=<optimized out>)
2023-01-11T23:33:40.4218696Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99
2023-01-11T23:33:40.4220853Z #51 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4221134Z args=0x7f1b328cb0d8, callable=0x39dcd400)
2023-01-11T23:33:40.4221684Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4223247Z #52 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4223693Z pp_stack=<optimized out>, trace_info=0x7ffc614ec3b0,
2023-01-11T23:33:40.4224205Z tstate=<optimized out>)
2023-01-11T23:33:40.4224713Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4225074Z #53 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4225403Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213
2023-01-11T23:33:40.4225739Z #54 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:40.4226072Z throwflag=<optimized out>,
2023-01-11T23:33:40.4226441Z f=<optimized out>,
2023-01-11T23:33:40.4226809Z tstate=<optimized out>)
2023-01-11T23:33:40.4227199Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4229356Z #55 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:40.4229987Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4230648Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:40.4231159Z locals@entry=<optimized out>, con=0x7f1b3f418200,
2023-01-11T23:33:40.4231575Z con@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4231968Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4232383Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4232639Z #56 _PyFunction_Vectorcall ()
2023-01-11T23:33:40.4232948Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4234665Z #57 0x00000000004f351e in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:40.4235136Z nargsf=<optimized out>, args=0x7f1b328cad98, callable=0x7f1b3f4181f0,
2023-01-11T23:33:40.4235496Z tstate=0xb34b80)
2023-01-11T23:33:40.4236022Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4236735Z #58 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4237344Z args=0x7f1b328cad98, callable=0x7f1b3f4181f0)
2023-01-11T23:33:40.4237946Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4239092Z #59 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4239522Z pp_stack=<optimized out>, trace_info=0x7ffc614ec570,
2023-01-11T23:33:40.4239875Z tstate=<optimized out>)
2023-01-11T23:33:40.4240339Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4240653Z #60 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4241216Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4181
2023-01-11T23:33:40.4241508Z #61 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:40.4241835Z throwflag=<optimized out>,
2023-01-11T23:33:40.4242219Z f=<optimized out>,
2023-01-11T23:33:40.4242580Z tstate=<optimized out>)
2023-01-11T23:33:40.4242972Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4245285Z #62 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:40.4245681Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4246305Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:40.4246799Z locals@entry=<optimized out>, con=0x7f1b3f3c76e0,
2023-01-11T23:33:40.4247214Z con@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4247610Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4248237Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4248496Z #63 _PyFunction_Vectorcall ()
2023-01-11T23:33:40.4248811Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4250247Z #64 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:40.4250671Z nargsf=<optimized out>, args=0x398f0b98, callable=0x7f1b3f3c76d0,
2023-01-11T23:33:40.4250929Z tstate=0xb34b80)
2023-01-11T23:33:40.4251578Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4252952Z #65 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4253348Z args=0x398f0b98, callable=0x7f1b3f3c76d0)
2023-01-11T23:33:40.4253871Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4254298Z #66 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4254882Z pp_stack=<optimized out>, trace_info=0x7ffc614ec730,
2023-01-11T23:33:40.4255182Z tstate=<optimized out>)
2023-01-11T23:33:40.4255499Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4255764Z #67 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4256082Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198
2023-01-11T23:33:40.4256357Z #68 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:40.4257092Z throwflag=<optimized out>,
2023-01-11T23:33:40.4257545Z f=<optimized out>,
2023-01-11T23:33:40.4257942Z tstate=<optimized out>)
2023-01-11T23:33:40.4258459Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4260639Z #69 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:40.4261090Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4261603Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:40.4262103Z locals@entry=<optimized out>, con=0x7f1b3f435b50,
2023-01-11T23:33:40.4262592Z con@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4263075Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4263491Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4263744Z #70 _PyFunction_Vectorcall ()
2023-01-11T23:33:40.4264066Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4264820Z #71 0x00000000004efd83 in _PyObject_VectorcallTstate (kwnames=0x7f1c267c5a40,
2023-01-11T23:33:40.4265246Z nargsf=<optimized out>, args=<optimized out>, callable=0x7f1b3f435b40,
2023-01-11T23:33:40.4265573Z tstate=0xb34b80)
2023-01-11T23:33:40.4265913Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4266718Z #72 PyObject_Vectorcall (kwnames=0x7f1c267c5a40, nargsf=<optimized out>,
2023-01-11T23:33:40.4267110Z args=<optimized out>, callable=0x7f1b3f435b40)
2023-01-11T23:33:40.4267536Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4269154Z #73 call_function (kwnames=0x7f1c267c5a40, oparg=<optimized out>,
2023-01-11T23:33:40.4269574Z pp_stack=<optimized out>, trace_info=0x7ffc614ec8f0,
2023-01-11T23:33:40.4269964Z tstate=<optimized out>)
2023-01-11T23:33:40.4270390Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4270651Z #74 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4270962Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231
2023-01-11T23:33:40.4272327Z #75 0x0000000000509dbe in _PyEval_EvalFrame (
2023-01-11T23:33:40.4272994Z throwflag=<optimized out>,
2023-01-11T23:33:40.4273640Z f=<optimized out>,
2023-01-11T23:33:40.4274315Z tstate=<optimized out>)
2023-01-11T23:33:40.4274970Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4275578Z #76 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>,
2023-01-11T23:33:40.4276090Z args=0x7f1b330bddd0, locals=0x0, con=0x7f1b35dad490, tstate=0xb34b80)
2023-01-11T23:33:40.4276472Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4277636Z #77 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>,
2023-01-11T23:33:40.4278105Z stack=0x7f1b330bddd0, func=0x7f1b35dad480)
2023-01-11T23:33:40.4278561Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4280263Z #78 _PyObject_VectorcallTstate (kwnames=<optimized out>,
2023-01-11T23:33:40.4280668Z nargsf=<optimized out>, args=0x7f1b330bddd0, callable=0x7f1b35dad480,
2023-01-11T23:33:40.4281185Z tstate=0xb34b80)
2023-01-11T23:33:40.4281864Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114
2023-01-11T23:33:40.4282220Z #79 method_vectorcall ()
2023-01-11T23:33:40.4282724Z at /usr/local/src/conda/python-3.10.8/Programs/_functoolsmodule.c:53
2023-01-11T23:33:40.4285092Z #80 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:40.4285549Z nargsf=<optimized out>, args=0x7f1b330bddd8, callable=0x7f1b35bf7340,
2023-01-11T23:33:40.4285980Z tstate=0xb34b80)
2023-01-11T23:33:40.4286320Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4287641Z #81 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4288121Z args=0x7f1b330bddd8, callable=0x7f1b35bf7340)
2023-01-11T23:33:40.4288548Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4289095Z #82 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4289479Z pp_stack=<optimized out>, trace_info=0x7ffc614ecb00,
2023-01-11T23:33:40.4289794Z tstate=<optimized out>)
2023-01-11T23:33:40.4290131Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4290452Z #83 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4290812Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213
2023-01-11T23:33:40.4291175Z #84 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:40.4291899Z throwflag=<optimized out>,
2023-01-11T23:33:40.4292415Z f=<optimized out>,
2023-01-11T23:33:40.4292918Z tstate=<optimized out>)
2023-01-11T23:33:40.4293386Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4295676Z #85 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:40.4296304Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4296929Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:40.4297442Z locals@entry=<optimized out>, con=0x7f1c26720d40,
2023-01-11T23:33:40.4297896Z con@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4298316Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4298739Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4298994Z #86 _PyFunction_Vectorcall ()
2023-01-11T23:33:40.4299310Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4301106Z #87 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:40.4301544Z nargsf=<optimized out>, args=0x5b1cef0, callable=0x7f1c26720d30,
2023-01-11T23:33:40.4301787Z tstate=0xb34b80)
2023-01-11T23:33:40.4302116Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4303288Z #88 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x5b1cef0,
2023-01-11T23:33:40.4303679Z callable=0x7f1c26720d30)
2023-01-11T23:33:40.4304112Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4305323Z #89 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4305726Z pp_stack=<optimized out>, trace_info=0x7ffc614eccc0,
2023-01-11T23:33:40.4305993Z tstate=<optimized out>)
2023-01-11T23:33:40.4306344Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4306620Z #90 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4307070Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198
2023-01-11T23:33:40.4307347Z #91 0x0000000000509dbe in _PyEval_EvalFrame (
2023-01-11T23:33:40.4307671Z throwflag=<optimized out>,
2023-01-11T23:33:40.4308092Z f=<optimized out>,
2023-01-11T23:33:40.4308474Z tstate=<optimized out>)
2023-01-11T23:33:40.4309038Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4312551Z #92 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>,
2023-01-11T23:33:40.4313316Z args=0x5ccbda8, locals=0x0, con=0x7f1c26720ef0, tstate=0xb34b80)
2023-01-11T23:33:40.4314236Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4314773Z #93 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>,
2023-01-11T23:33:40.4315294Z stack=0x5ccbda8, func=0x7f1c26720ee0)
2023-01-11T23:33:40.4316028Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4317588Z #94 _PyObject_VectorcallTstate (kwnames=<optimized out>,
2023-01-11T23:33:40.4318265Z nargsf=<optimized out>, args=0x5ccbda8, callable=0x7f1c26720ee0,
2023-01-11T23:33:40.4318513Z tstate=0xb34b80)
2023-01-11T23:33:40.4318944Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114
2023-01-11T23:33:40.4319207Z #95 method_vectorcall ()
2023-01-11T23:33:40.4319532Z at /usr/local/src/conda/python-3.10.8/Programs/_functoolsmodule.c:53
2023-01-11T23:33:40.4323094Z #96 0x00000000004efd83 in _PyObject_VectorcallTstate (kwnames=0x7f1b3d104a30,
2023-01-11T23:33:40.4323425Z nargsf=<optimized out>, args=<optimized out>, callable=0x7f1b35bf7200,
2023-01-11T23:33:40.4323682Z tstate=0xb34b80)
2023-01-11T23:33:40.4324001Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4324660Z #97 PyObject_Vectorcall (kwnames=0x7f1b3d104a30, nargsf=<optimized out>,
2023-01-11T23:33:40.4324954Z args=<optimized out>, callable=0x7f1b35bf7200)
2023-01-11T23:33:40.4325545Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4327325Z #98 call_function (kwnames=0x7f1b3d104a30, oparg=<optimized out>,
2023-01-11T23:33:40.4327782Z pp_stack=<optimized out>, trace_info=0x7ffc614eced0,
2023-01-11T23:33:40.4328188Z tstate=<optimized out>)
2023-01-11T23:33:40.4328682Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4329075Z #99 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4329548Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231
2023-01-11T23:33:40.4329945Z #100 0x0000000000509dbe in _PyEval_EvalFrame (
2023-01-11T23:33:40.4330438Z throwflag=<optimized out>,
2023-01-11T23:33:40.4330992Z f=<optimized out>,
2023-01-11T23:33:40.4331538Z tstate=<optimized out>)
2023-01-11T23:33:40.4332131Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4335977Z #101 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>,
2023-01-11T23:33:40.4336522Z args=0x7f1b35f46928, locals=0x0, con=0x7f1b35da5910, tstate=0xb34b80)
2023-01-11T23:33:40.4336957Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4337543Z #102 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>,
2023-01-11T23:33:40.4337844Z stack=0x7f1b35f46928, func=0x7f1b35da5900)
2023-01-11T23:33:40.4338168Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4341388Z #103 _PyObject_VectorcallTstate (kwnames=<optimized out>,
2023-01-11T23:33:40.4341954Z nargsf=<optimized out>, args=0x7f1b35f46928, callable=0x7f1b35da5900,
2023-01-11T23:33:40.4342395Z tstate=0xb34b80)
2023-01-11T23:33:40.4342898Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114
2023-01-11T23:33:40.4343295Z #104 method_vectorcall ()
2023-01-11T23:33:40.4343785Z at /usr/local/src/conda/python-3.10.8/Programs/_functoolsmodule.c:53
2023-01-11T23:33:40.4346828Z #105 0x00000000004efd83 in _PyObject_VectorcallTstate (
2023-01-11T23:33:40.4347469Z kwnames=0x7f1b3d102700, nargsf=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4348120Z callable=0x7f1b3d4148c0, tstate=0xb34b80)
2023-01-11T23:33:40.4348704Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4349170Z #106 PyObject_Vectorcall (kwnames=0x7f1b3d102700, nargsf=<optimized out>,
2023-01-11T23:33:40.4349590Z args=<optimized out>, callable=0x7f1b3d4148c0)
2023-01-11T23:33:40.4350176Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4353150Z #107 call_function (kwnames=0x7f1b3d102700, oparg=<optimized out>,
2023-01-11T23:33:40.4353939Z pp_stack=<optimized out>, trace_info=0x7ffc614ed0e0,
2023-01-11T23:33:40.4354628Z tstate=<optimized out>)
2023-01-11T23:33:40.4355546Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4355965Z #108 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4356448Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231
2023-01-11T23:33:40.4356852Z #109 0x0000000000509f16 in _PyEval_EvalFrame (
2023-01-11T23:33:40.4357304Z throwflag=<optimized out>,
2023-01-11T23:33:40.4357858Z f=<optimized out>,
2023-01-11T23:33:40.4358408Z tstate=<optimized out>)
2023-01-11T23:33:40.4359005Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4359360Z #110 _PyEval_Vector (
2023-01-11T23:33:40.4359787Z kwnames=<optimized out>,
2023-01-11T23:33:40.4360421Z argcount=<optimized out>, args=0x7ffc614ed1d0, locals=0x0, con=0x7f1b35da59a0,
2023-01-11T23:33:40.4360898Z tstate=0xb34b80,
2023-01-11T23:33:40.4361337Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4361944Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4362320Z #111 _PyFunction_Vectorcall (
2023-01-11T23:33:40.4362875Z kwnames=<optimized out>,
2023-01-11T23:33:40.4363314Z nargsf=<optimized out>, stack=0x7ffc614ed1d0, func=0x7f1b35da5990)
2023-01-11T23:33:40.4363827Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4365211Z #112 _PyObject_VectorcallTstate (
2023-01-11T23:33:40.4365913Z kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4366465Z callable=0x7f1b35da5990, tstate=0xb34b80)
2023-01-11T23:33:40.4367011Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114
2023-01-11T23:33:40.4367391Z #113 method_vectorcall ()
2023-01-11T23:33:40.4367876Z at /usr/local/src/conda/python-3.10.8/Programs/_functoolsmodule.c:83
2023-01-11T23:33:40.4371804Z #114 0x00000000004f141c in do_call_core (kwdict=0x7f1b35dc7080,
2023-01-11T23:33:40.4372304Z callargs=0x7f1b35d91180, func=0x7f1b3eaf3980, trace_info=0x7ffc614ed2f0,
2023-01-11T23:33:40.4372668Z tstate=<optimized out>)
2023-01-11T23:33:40.4373170Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943
2023-01-11T23:33:40.4373449Z #115 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4373775Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277
2023-01-11T23:33:40.4374044Z #116 0x00000000004f706d in _PyEval_EvalFrame (
2023-01-11T23:33:40.4374372Z throwflag=<optimized out>,
2023-01-11T23:33:40.4375072Z f=<optimized out>,
2023-01-11T23:33:40.4375605Z tstate=<optimized out>)
2023-01-11T23:33:40.4375999Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4378196Z #117 _PyEval_Vector (kwnames=0x0,
2023-01-11T23:33:40.4378846Z argcount=<optimized out>, args=<optimized out>, locals=0x0, con=0x7f1c267210a0,
2023-01-11T23:33:40.4379323Z tstate=0xb34b80)
2023-01-11T23:33:40.4379806Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4380464Z #118 _PyFunction_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4380856Z stack=<optimized out>, func=0x7f1c26721090)
2023-01-11T23:33:40.4381486Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4381924Z #119 _PyObject_FastCallDictTstate.localalias ()
2023-01-11T23:33:40.4382442Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:142
2023-01-11T23:33:40.4382825Z #120 0x00000000005084a6 in _PyObject_Call_Prepend ()
2023-01-11T23:33:40.4383305Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:431
2023-01-11T23:33:40.4386544Z #121 0x00000000005d04d3 in slot_tp_call ()
2023-01-11T23:33:40.4387303Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7494
2023-01-11T23:33:40.4387908Z #122 0x00000000004f7b8b in _PyObject_MakeTpCall.localalias ()
2023-01-11T23:33:40.4388495Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224
2023-01-11T23:33:40.4393253Z #123 0x00000000004f37ae in _PyObject_VectorcallTstate (
2023-01-11T23:33:40.4393891Z kwnames=<optimized out>,
2023-01-11T23:33:40.4394468Z nargsf=<optimized out>, args=0x5bd2650,
2023-01-11T23:33:40.4394991Z callable=<optimized out>,
2023-01-11T23:33:40.4395404Z tstate=<optimized out>)
2023-01-11T23:33:40.4395834Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112
2023-01-11T23:33:40.4396166Z #124 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4396472Z args=0x5bd2650, callable=0x7f1b35d82ad0, tstate=<optimized out>)
2023-01-11T23:33:40.4396827Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99
2023-01-11T23:33:40.4397418Z #125 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4397807Z args=0x5bd2650, callable=0x7f1b35d82ad0)
2023-01-11T23:33:40.4398329Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4400687Z #126 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4401115Z pp_stack=<optimized out>, trace_info=0x7ffc614ed5f0,
2023-01-11T23:33:40.4401490Z tstate=<optimized out>)
2023-01-11T23:33:40.4401966Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4402349Z #127 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4402842Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213
2023-01-11T23:33:40.4403227Z #128 0x0000000000509f16 in _PyEval_EvalFrame (
2023-01-11T23:33:40.4403545Z throwflag=<optimized out>,
2023-01-11T23:33:40.4403915Z f=<optimized out>,
2023-01-11T23:33:40.4404289Z tstate=<optimized out>)
2023-01-11T23:33:40.4404675Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4406638Z #129 _PyEval_Vector (
2023-01-11T23:33:40.4407341Z kwnames=<optimized out>,
2023-01-11T23:33:40.4408067Z argcount=<optimized out>, args=0x7ffc614ed6e0, locals=0x0, con=0x7f1c2672c200,
2023-01-11T23:33:40.4408566Z tstate=0xb34b80,
2023-01-11T23:33:40.4408888Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4409328Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4409672Z #130 _PyFunction_Vectorcall (
2023-01-11T23:33:40.4410149Z kwnames=<optimized out>,
2023-01-11T23:33:40.4410741Z nargsf=<optimized out>, stack=0x7ffc614ed6e0, func=0x7f1c2672c1f0)
2023-01-11T23:33:40.4411174Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4411828Z #131 _PyObject_VectorcallTstate (
2023-01-11T23:33:40.4412541Z kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4413014Z callable=0x7f1c2672c1f0, tstate=0xb34b80)
2023-01-11T23:33:40.4413488Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114
2023-01-11T23:33:40.4413791Z #132 method_vectorcall ()
2023-01-11T23:33:40.4414130Z at /usr/local/src/conda/python-3.10.8/Programs/_functoolsmodule.c:83
2023-01-11T23:33:40.4418624Z #133 0x00000000004f141c in do_call_core (kwdict=0x7f1b3601a300,
2023-01-11T23:33:40.4419061Z callargs=0x7f1b35d67130, func=0x7f1b3592a040, trace_info=0x7ffc614ed800,
2023-01-11T23:33:40.4419387Z tstate=<optimized out>)
2023-01-11T23:33:40.4419732Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943
2023-01-11T23:33:40.4419990Z #134 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4420310Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277
2023-01-11T23:33:40.4420587Z #135 0x00000000004f706d in _PyEval_EvalFrame (
2023-01-11T23:33:40.4421244Z throwflag=<optimized out>,
2023-01-11T23:33:40.4421831Z f=<optimized out>,
2023-01-11T23:33:40.4422405Z tstate=<optimized out>)
2023-01-11T23:33:40.4423019Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4425990Z #136 _PyEval_Vector (kwnames=0x0,
2023-01-11T23:33:40.4426777Z argcount=<optimized out>, args=<optimized out>, locals=0x0, con=0x7f1c2672c0e0,
2023-01-11T23:33:40.4427275Z tstate=0xb34b80)
2023-01-11T23:33:40.4427764Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4428199Z #137 _PyFunction_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4428616Z stack=<optimized out>, func=0x7f1c2672c0d0)
2023-01-11T23:33:40.4429108Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4429524Z #138 _PyObject_FastCallDictTstate.localalias ()
2023-01-11T23:33:40.4430124Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:142
2023-01-11T23:33:40.4430516Z #139 0x00000000005084a6 in _PyObject_Call_Prepend ()
2023-01-11T23:33:40.4430993Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:431
2023-01-11T23:33:40.4433784Z #140 0x00000000005d04d3 in slot_tp_call ()
2023-01-11T23:33:40.4434653Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7494
2023-01-11T23:33:40.4435118Z #141 0x00000000004f7b8b in _PyObject_MakeTpCall.localalias ()
2023-01-11T23:33:40.4435476Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224
2023-01-11T23:33:40.4438887Z #142 0x00000000004f37ae in _PyObject_VectorcallTstate (
2023-01-11T23:33:40.4439669Z kwnames=<optimized out>,
2023-01-11T23:33:40.4440474Z nargsf=<optimized out>, args=0x5bd1850,
2023-01-11T23:33:40.4441069Z callable=<optimized out>,
2023-01-11T23:33:40.4441632Z tstate=<optimized out>)
2023-01-11T23:33:40.4442297Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112
2023-01-11T23:33:40.4442782Z #143 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4443261Z args=0x5bd1850, callable=0x7f1b35d83880, tstate=<optimized out>)
2023-01-11T23:33:40.4443917Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99
2023-01-11T23:33:40.4445821Z #144 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4446395Z args=0x5bd1850, callable=0x7f1b35d83880)
2023-01-11T23:33:40.4446996Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4449265Z #145 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4449864Z pp_stack=<optimized out>, trace_info=0x7ffc614edb00,
2023-01-11T23:33:40.4450253Z tstate=<optimized out>)
2023-01-11T23:33:40.4450883Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4451277Z #146 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4451781Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213
2023-01-11T23:33:40.4452165Z #147 0x0000000000509f16 in _PyEval_EvalFrame (
2023-01-11T23:33:40.4452648Z throwflag=<optimized out>,
2023-01-11T23:33:40.4453220Z f=<optimized out>,
2023-01-11T23:33:40.4453778Z tstate=<optimized out>)
2023-01-11T23:33:40.4454356Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4456297Z #148 _PyEval_Vector (
2023-01-11T23:33:40.4456765Z kwnames=<optimized out>,
2023-01-11T23:33:40.4457395Z argcount=<optimized out>, args=0x7ffc614edbf0, locals=0x0, con=0x7f1c2672c200,
2023-01-11T23:33:40.4457894Z tstate=0xb34b80,
2023-01-11T23:33:40.4458378Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4458985Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4460696Z #149 _PyFunction_Vectorcall (
2023-01-11T23:33:40.4461316Z kwnames=<optimized out>,
2023-01-11T23:33:40.4461867Z nargsf=<optimized out>, stack=0x7ffc614edbf0, func=0x7f1c2672c1f0)
2023-01-11T23:33:40.4462416Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4463599Z #150 _PyObject_VectorcallTstate (
2023-01-11T23:33:40.4464309Z kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4464797Z callable=0x7f1c2672c1f0, tstate=0xb34b80)
2023-01-11T23:33:40.4465298Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114
2023-01-11T23:33:40.4465587Z #151 method_vectorcall ()
2023-01-11T23:33:40.4465910Z at /usr/local/src/conda/python-3.10.8/Programs/_functoolsmodule.c:83
2023-01-11T23:33:40.4469336Z #152 0x00000000004f141c in do_call_core (kwdict=0x7f1b35e39740,
2023-01-11T23:33:40.4469737Z callargs=0x7f1b35d675e0, func=0x7f1b35ef4bc0, trace_info=0x7ffc614edd10,
2023-01-11T23:33:40.4470310Z tstate=<optimized out>)
2023-01-11T23:33:40.4472353Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943
2023-01-11T23:33:40.4472724Z #153 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4473192Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277
2023-01-11T23:33:40.4473557Z #154 0x00000000004f706d in _PyEval_EvalFrame (
2023-01-11T23:33:40.4473900Z throwflag=<optimized out>,
2023-01-11T23:33:40.4474273Z f=<optimized out>,
2023-01-11T23:33:40.4474644Z tstate=<optimized out>)
2023-01-11T23:33:40.4475032Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4475589Z #155 _PyEval_Vector (kwnames=0x0,
2023-01-11T23:33:40.4476099Z argcount=<optimized out>, args=<optimized out>, locals=0x0, con=0x7f1c2672c0e0,
2023-01-11T23:33:40.4476501Z tstate=0xb34b80)
2023-01-11T23:33:40.4476816Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4478479Z #156 _PyFunction_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4479151Z stack=<optimized out>, func=0x7f1c2672c0d0)
2023-01-11T23:33:40.4479955Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4480671Z #157 _PyObject_FastCallDictTstate.localalias ()
2023-01-11T23:33:40.4481484Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:142
2023-01-11T23:33:40.4482151Z #158 0x00000000005084a6 in _PyObject_Call_Prepend ()
2023-01-11T23:33:40.4482729Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:431
2023-01-11T23:33:40.4483080Z #159 0x00000000005d04d3 in slot_tp_call ()
2023-01-11T23:33:40.4483469Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7494
2023-01-11T23:33:40.4485165Z #160 0x00000000004f7b8b in _PyObject_MakeTpCall.localalias ()
2023-01-11T23:33:40.4485919Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224
2023-01-11T23:33:40.4489105Z #161 0x00000000004f37ae in _PyObject_VectorcallTstate (
2023-01-11T23:33:40.4489633Z kwnames=<optimized out>,
2023-01-11T23:33:40.4490201Z nargsf=<optimized out>, args=0x5b54c20,
2023-01-11T23:33:40.4490804Z callable=<optimized out>,
2023-01-11T23:33:40.4491357Z tstate=<optimized out>)
2023-01-11T23:33:40.4491998Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112
2023-01-11T23:33:40.4492481Z #162 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4492938Z args=0x5b54c20, callable=0x7f1b35d72410, tstate=<optimized out>)
2023-01-11T23:33:40.4493483Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99
2023-01-11T23:33:40.4495127Z #163 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4495534Z args=0x5b54c20, callable=0x7f1b35d72410)
2023-01-11T23:33:40.4496076Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4498811Z #164 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4499246Z pp_stack=<optimized out>, trace_info=0x7ffc614ee010,
2023-01-11T23:33:40.4499607Z tstate=<optimized out>)
2023-01-11T23:33:40.4500118Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4500497Z #165 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4500987Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:40.4501393Z #166 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:40.4501852Z throwflag=, 2023-01-11T23:33:40.4502563Z f=, 2023-01-11T23:33:40.4503111Z tstate=) 2023-01-11T23:33:40.4503708Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:40.4506123Z #167 _PyEval_Vector (kwnames=, 2023-01-11T23:33:40.4506703Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:40.4507335Z args@entry=, locals=0x0, 2023-01-11T23:33:40.4507913Z locals@entry=, con=0x7f1b35dc1d90, 2023-01-11T23:33:40.4508338Z con@entry=, tstate=0xb34b80, 2023-01-11T23:33:40.4508747Z tstate@entry=) 2023-01-11T23:33:40.4509159Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:40.4509431Z #168 _PyFunction_Vectorcall () 2023-01-11T23:33:40.4509747Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:40.4512772Z #169 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:40.4513200Z nargsf=, args=0x5c960b8, callable=0x7f1b35dc1d80, 2023-01-11T23:33:40.4513546Z tstate=0xb34b80) 2023-01-11T23:33:40.4513981Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:40.4514419Z #170 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:40.4514805Z args=0x5c960b8, callable=0x7f1b35dc1d80) 2023-01-11T23:33:40.4515253Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:40.4516794Z #171 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:40.4517376Z pp_stack=, trace_info=0x7ffc614ee1d0, 2023-01-11T23:33:40.4517741Z tstate=) 2023-01-11T23:33:40.4518211Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:40.4518572Z #172 _PyEval_EvalFrameDefault () 2023-01-11T23:33:40.4518912Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:40.4519196Z #173 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:40.4519519Z throwflag=, 2023-01-11T23:33:40.4519894Z f=, 2023-01-11T23:33:40.4520262Z tstate=) 2023-01-11T23:33:40.4520655Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:40.4523047Z #174 _PyEval_Vector (kwnames=, 2023-01-11T23:33:40.4523978Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:40.4524655Z args@entry=, locals=0x0, 2023-01-11T23:33:40.4525293Z locals@entry=, con=0x7f1c2657a3c0, 2023-01-11T23:33:40.4525899Z con@entry=, tstate=0xb34b80, 2023-01-11T23:33:40.4526483Z tstate@entry=) 2023-01-11T23:33:40.4527096Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:40.4527588Z #175 _PyFunction_Vectorcall () 2023-01-11T23:33:40.4528063Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:40.4528843Z #176 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:40.4529298Z nargsf=, args=0x5b7f2b8, callable=0x7f1c2657a3b0, 2023-01-11T23:33:40.4529664Z tstate=0xb34b80) 2023-01-11T23:33:40.4530145Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:40.4532503Z #177 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:40.4533046Z args=0x5b7f2b8, callable=0x7f1c2657a3b0) 2023-01-11T23:33:40.4533727Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:40.4534249Z #178 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:40.4535010Z pp_stack=, trace_info=0x7ffc614ee390, 2023-01-11T23:33:40.4535476Z tstate=) 
2023-01-11T23:33:40.4535961Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:40.4536225Z #179 _PyEval_EvalFrameDefault () 2023-01-11T23:33:40.4536547Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:40.4536950Z #180 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:40.4537395Z throwflag=, 2023-01-11T23:33:40.4537797Z f=, 2023-01-11T23:33:40.4538190Z tstate=) 2023-01-11T23:33:40.4538585Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:40.4542175Z #181 _PyEval_Vector (kwnames=, 2023-01-11T23:33:40.4542683Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:40.4543310Z args@entry=, locals=0x0, 2023-01-11T23:33:40.4543730Z locals@entry=, con=0x7f1c26579e20, 2023-01-11T23:33:40.4544145Z con@entry=, tstate=0xb34b80, 2023-01-11T23:33:40.4544534Z tstate@entry=) 2023-01-11T23:33:40.4544943Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:40.4545198Z #182 _PyFunction_Vectorcall () 2023-01-11T23:33:40.4545506Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:40.4545818Z #183 0x00000000004f711d in _PyObject_FastCallDictTstate.localalias () 2023-01-11T23:33:40.4546180Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:153 2023-01-11T23:33:40.4550125Z #184 0x0000000000507af8 in _PyObject_Call_Prepend (kwargs=0x7f1b35db6a00, 2023-01-11T23:33:40.4550720Z kwargs@entry=, args=0x7f1c27820070, 2023-01-11T23:33:40.4551368Z args@entry=, obj=, 2023-01-11T23:33:40.4551795Z obj@entry=, callable=0x7f1c26579e10, 2023-01-11T23:33:40.4552218Z callable@entry=, tstate=0xb34b80, 2023-01-11T23:33:40.4552616Z tstate@entry=) 2023-01-11T23:33:40.4553038Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:431 2023-01-11T23:33:40.4553424Z #185 slot_tp_init () 2023-01-11T23:33:40.4553726Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7734 2023-01-11T23:33:40.4554201Z #186 0x00000000004f7bdb in type_call (kwds=0x7f1b35db6a00, 2023-01-11T23:33:40.4554467Z args=0x7f1c27820070, type=) 2023-01-11T23:33:40.4554804Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224 2023-01-11T23:33:40.4555068Z #187 _PyObject_MakeTpCall.localalias () 2023-01-11T23:33:40.4555393Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:215 2023-01-11T23:33:40.4557563Z #188 0x00000000004f428d in _PyObject_VectorcallTstate ( 2023-01-11T23:33:40.4558033Z kwnames=0x7f1b3d0f5740, 2023-01-11T23:33:40.4558542Z nargsf=, args=, 2023-01-11T23:33:40.4559067Z callable=, 2023-01-11T23:33:40.4559454Z tstate=) 2023-01-11T23:33:40.4559925Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112 2023-01-11T23:33:40.4560366Z #189 _PyObject_VectorcallTstate (kwnames=0x7f1b3d0f5740, 2023-01-11T23:33:40.4561006Z nargsf=, args=, callable=0xcd1c30, 2023-01-11T23:33:40.4561376Z tstate=0xb34b80) 2023-01-11T23:33:40.4562000Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99 2023-01-11T23:33:40.4562464Z #190 PyObject_Vectorcall (kwnames=0x7f1b3d0f5740, nargsf=, 2023-01-11T23:33:40.4562870Z args=, callable=0xcd1c30) 2023-01-11T23:33:40.4563380Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:40.4564134Z #191 call_function (kwnames=0x7f1b3d0f5740, oparg=, 2023-01-11T23:33:40.4564805Z pp_stack=, trace_info=0x7ffc614ee6b0, 2023-01-11T23:33:40.4565437Z tstate=) 2023-01-11T23:33:40.4566181Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:40.4566570Z #192 _PyEval_EvalFrameDefault () 
2023-01-11T23:33:40.4567052Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231 2023-01-11T23:33:40.4567437Z #193 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:40.4567812Z throwflag=, 2023-01-11T23:33:40.4568193Z f=, 2023-01-11T23:33:40.4568560Z tstate=) 2023-01-11T23:33:40.4568945Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:40.4569542Z #194 _PyEval_Vector (kwnames=, 2023-01-11T23:33:40.4570050Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:40.4570573Z args@entry=, locals=0x0, 2023-01-11T23:33:40.4570992Z locals@entry=, con=0x7f1b3ced2960, 2023-01-11T23:33:40.4571407Z con@entry=, tstate=0xb34b80, 2023-01-11T23:33:40.4571808Z tstate@entry=) 2023-01-11T23:33:40.4572233Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:40.4572511Z #195 _PyFunction_Vectorcall () 2023-01-11T23:33:40.4572860Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:40.4574877Z #196 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:40.4575416Z nargsf=, args=0x7f1c27845ba8, callable=0x7f1b3ced2950, 2023-01-11T23:33:40.4575710Z tstate=0xb34b80) 2023-01-11T23:33:40.4576042Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:40.4577954Z #197 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:40.4578360Z args=0x7f1c27845ba8, callable=0x7f1b3ced2950) 2023-01-11T23:33:40.4579094Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:40.4579760Z #198 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:40.4580373Z pp_stack=, trace_info=0x7ffc614ee870, 2023-01-11T23:33:40.4580708Z tstate=) 2023-01-11T23:33:40.4581095Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:40.4581453Z #199 _PyEval_EvalFrameDefault () 2023-01-11T23:33:40.4581885Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:40.4582257Z #200 0x0000000000594b72 in _PyEval_EvalFrame ( 2023-01-11T23:33:40.4582685Z throwflag=, 2023-01-11T23:33:40.4583138Z f=, 2023-01-11T23:33:40.4583509Z tstate=) 2023-01-11T23:33:40.4583972Z at /croot/python-split_1669298683653/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/call.c:46 2023-01-11T23:33:40.4584255Z #201 _PyEval_Vector () 2023-01-11T23:33:40.4584554Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:40.4587693Z #202 0x0000000000594ab7 in PyEval_EvalCode (co=co@entry=0x7f1c26797730, 2023-01-11T23:33:40.4588017Z globals=globals@entry=0x7f1c268d2480, locals=locals@entry=0x7f1c268d2480) 2023-01-11T23:33:40.4588395Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:1134 2023-01-11T23:33:40.4588787Z #203 0x00000000005c6e57 in run_eval_code_obj () 2023-01-11T23:33:40.4589220Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1291 2023-01-11T23:33:40.4592299Z #204 0x00000000005c1d40 in run_mod () 2023-01-11T23:33:40.4592682Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1312 2023-01-11T23:33:40.4594846Z #205 0x000000000045adf2 in pyrun_file ( 2023-01-11T23:33:40.4595403Z fp=, 2023-01-11T23:33:40.4595914Z filename=, 2023-01-11T23:33:40.4596411Z start=, 2023-01-11T23:33:40.4596937Z globals=, 2023-01-11T23:33:40.4597324Z locals=, 2023-01-11T23:33:40.4597719Z closeit=, flags=0x7ffc614eeb68) 2023-01-11T23:33:40.4598160Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1208 2023-01-11T23:33:40.4598476Z #206 0x00000000005bc25f in 
_PyRun_SimpleFileObject.localalias () 2023-01-11T23:33:40.4598836Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:456 2023-01-11T23:33:40.4599151Z #207 0x00000000005bc063 in _PyRun_AnyFileObject.localalias () 2023-01-11T23:33:40.4599509Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:90 2023-01-11T23:33:40.4603409Z #208 0x00000000005b8e7d in pymain_run_file_obj (skip_source_first_line=0, 2023-01-11T23:33:40.4603705Z filename=0x7f1c2694d840, program_name=0x7f1c268bfa00) 2023-01-11T23:33:40.4604067Z at /croot/python-split_1669298683653/work/build-static/python.c:357 2023-01-11T23:33:40.4604346Z #209 pymain_run_file (config=0xb18e50) 2023-01-11T23:33:40.4604770Z at /croot/python-split_1669298683653/work/build-static/python.c:376 2023-01-11T23:33:40.4605310Z #210 pymain_run_python (exitcode=0x7ffc614eeb60) 2023-01-11T23:33:40.4605822Z at /croot/python-split_1669298683653/work/build-static/python.c:591 2023-01-11T23:33:40.4606097Z #211 Py_RunMain.localalias () 2023-01-11T23:33:40.4606419Z at /croot/python-split_1669298683653/work/build-static/python.c:670 2023-01-11T23:33:40.4620764Z #212 0x0000000000587c29 in Py_BytesMain (argc=, 2023-01-11T23:33:40.4621318Z argv=) 2023-01-11T23:33:40.4621808Z at /croot/python-split_1669298683653/work/build-static/python.c:1090 2023-01-11T23:33:40.4625046Z #213 0x00007f1c269b0c87 in __libc_start_main (main=0x587be0
, argc=6, 2023-01-11T23:33:40.4625507Z argv=0x7ffc614eed68, init=, fini=, 2023-01-11T23:33:40.4625812Z rtld_fini=, stack_end=0x7ffc614eed58) 2023-01-11T23:33:40.4626201Z at ../csu/libc-start.c:310 2023-01-11T23:33:40.4626487Z #214 0x0000000000587ade in _start () 2023-01-11T23:33:40.4626880Z at /usr/local/src/conda/python-3.10.8/Modules/_io/clinic/peg_api.c:880 2023-01-11T23:33:40.6170420Z GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1 2023-01-11T23:33:40.6170848Z Copyright (C) 2018 Free Software Foundation, Inc. 2023-01-11T23:33:40.6171241Z License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> 2023-01-11T23:33:40.6171592Z This is free software: you are free to change and redistribute it. 2023-01-11T23:33:40.6171911Z There is NO WARRANTY, to the extent permitted by law. Type "show copying" 2023-01-11T23:33:40.6172175Z and "show warranty" for details. 2023-01-11T23:33:40.6172474Z This GDB was configured as "x86_64-linux-gnu". 2023-01-11T23:33:40.6172757Z Type "show configuration" for configuration details. 2023-01-11T23:33:40.6173026Z For bug reporting instructions, please see: 2023-01-11T23:33:40.6173277Z <http://www.gnu.org/software/gdb/bugs/>. 2023-01-11T23:33:40.6173566Z Find the GDB manual and other documentation resources online at: 2023-01-11T23:33:40.6173853Z <http://www.gnu.org/software/gdb/documentation/>. 2023-01-11T23:33:40.6174085Z For help, type "help". 2023-01-11T23:33:40.6174336Z Type "apropos word" to search for commands related to "word"... 2023-01-11T23:33:40.6878127Z Reading symbols from python...done. 2023-01-11T23:33:41.2175363Z 2023-01-11T23:33:41.2175859Z warning: core file may not match specified executable file. 2023-01-11T23:33:41.2197453Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:41.2197840Z [New LWP 19370] 2023-01-11T23:33:41.2198115Z [New LWP 19380] 2023-01-11T23:33:41.2198409Z [New LWP 19381] 2023-01-11T23:33:41.2198756Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:41.2199124Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:41.2199501Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:41.2199862Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:41.2200214Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:41.2200483Z [New LWP 19374] 2023-01-11T23:33:41.2200666Z [New LWP 19379] 2023-01-11T23:33:41.2200847Z [New LWP 19373] 2023-01-11T23:33:41.2201026Z [New LWP 19376] 2023-01-11T23:33:41.2201212Z [New LWP 19375] 2023-01-11T23:33:41.2201388Z [New LWP 19378] 2023-01-11T23:33:41.2201565Z [New LWP 19372] 2023-01-11T23:33:41.2201750Z [New LWP 19385] 2023-01-11T23:33:41.2201923Z [New LWP 19382] 2023-01-11T23:33:41.2202099Z [New LWP 19384] 2023-01-11T23:33:41.2202281Z [New LWP 19383] 2023-01-11T23:33:41.2202450Z [New LWP 19377] 2023-01-11T23:33:41.2202628Z [New LWP 19386] 2023-01-11T23:33:41.2202806Z [New LWP 19403] 2023-01-11T23:33:41.2207106Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:41.2207664Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:41.2208219Z BFD: warning:
/opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:41.2208756Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:41.2378669Z [Thread debugging using libthread_db enabled] 2023-01-11T23:33:41.2380027Z Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 2023-01-11T23:33:46.6756923Z 78 ../sysdeps/unix/syscall-template.S: No such file or directory. 2023-01-11T23:33:46.6757865Z Core was generated by `/opt/conda/bin/python -bb -c from multiprocessing.spawn import spawn_main; spaw'. 2023-01-11T23:33:46.6758987Z warning: File "/var/lib/jenkins/workspace/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load". 2023-01-11T23:33:46.6759691Z Program terminated with signal SIGABRT, Aborted. 2023-01-11T23:33:46.6760113Z #0 0x00007f7a9a4be177 in kill () at ../sysdeps/unix/syscall-template.S:78 2023-01-11T23:33:46.6760403Z [Current thread is 1 (Thread 0x7f7a9b43c080 (LWP 19370))] 2023-01-11T23:33:46.6782759Z To enable execution of this file add 2023-01-11T23:33:46.6783114Z add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit 2023-01-11T23:33:46.6783440Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:46.6783717Z To completely disable this security protection add 2023-01-11T23:33:46.6784012Z set auto-load safe-path / 2023-01-11T23:33:46.6784278Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:46.6784564Z For more information about this security protection see the 2023-01-11T23:33:46.6784937Z "Auto-loading safe path" section in the GDB manual. 
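The auto-load warning above is benign: GDB found the repository's /var/lib/jenkins/workspace/.gdbinit but declined to source it because the workspace is not on its auto-load safe path, so the pretty-printers that file would register are simply skipped. A minimal sketch of how the suggested fix could be applied when re-running this core-dump triage by hand -- the executable and workspace paths are taken from the log above, while the core file name "core" and the -batch invocation are assumptions, not something the job actually ran:

    # Trust the repository-provided .gdbinit (one-time, in the user's global config)
    echo 'add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit' >> ~/.gdbinit

    # Or scope the trust to a single run and dump the same backtrace non-interactively;
    # "core" is a placeholder for the actual core-dump file collected by the job
    gdb -iex 'add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit' \
        -batch -ex 'bt' /opt/conda/bin/python core

Either route silences the warning; the backtrace that follows was produced without it and does not depend on the auto-loaded file.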
E.g., run from the shell: 2023-01-11T23:33:46.6785258Z info "(gdb)Auto-loading safe path" 2023-01-11T23:33:46.6813661Z #0 0x00007f7a9a4be177 in kill () at ../sysdeps/unix/syscall-template.S:78 2023-01-11T23:33:46.6815743Z #1 0x00000000004cb0d3 in os_kill_impl ( 2023-01-11T23:33:46.6816075Z module=, 2023-01-11T23:33:46.6816461Z signal=, 2023-01-11T23:33:46.6816837Z pid=) 2023-01-11T23:33:46.6817327Z at /croot/python-split_1669298683653/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/sys/_iomodule.c:7929 2023-01-11T23:33:46.6822144Z #2 os_kill (module=, args=args@entry=0x7f79a9a92578, 2023-01-11T23:33:46.6822753Z nargs=) 2023-01-11T23:33:46.6823123Z at /usr/local/src/conda/python-3.10.8/Modules/codecs.c:3581 2023-01-11T23:33:46.6827099Z #3 0x00000000004fe7d4 in cfunction_vectorcall_FASTCALL (func=0x7f7a9b4018f0, 2023-01-11T23:33:46.6827700Z args=0x7f79a9a92578, nargsf=, kwnames=) 2023-01-11T23:33:46.6828131Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_bitutils.h:430 2023-01-11T23:33:46.6834122Z #4 0x00000000004f351e in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:46.6834747Z nargsf=, args=0x7f79a9a92578, callable=0x7f7a9b4018f0, 2023-01-11T23:33:46.6835011Z tstate=0x1894bf0) 2023-01-11T23:33:46.6835500Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6836092Z #5 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:46.6836523Z args=0x7f79a9a92578, callable=0x7f7a9b4018f0) 2023-01-11T23:33:46.6836961Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6838538Z #6 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:46.6838981Z pp_stack=, trace_info=0x7fff600c2e20, 2023-01-11T23:33:46.6839372Z tstate=) 2023-01-11T23:33:46.6839858Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:46.6840254Z #7 _PyEval_EvalFrameDefault () 2023-01-11T23:33:46.6840794Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4181 2023-01-11T23:33:46.6841155Z #8 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:46.6841475Z throwflag=, 2023-01-11T23:33:46.6841936Z f=, 2023-01-11T23:33:46.6842315Z tstate=) 2023-01-11T23:33:46.6842708Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:46.6845206Z #9 _PyEval_Vector (kwnames=, 2023-01-11T23:33:46.6845836Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:46.6846307Z args@entry=, locals=0x0, 2023-01-11T23:33:46.6846876Z locals@entry=, con=0x7f79a9850ef0, 2023-01-11T23:33:46.6847559Z con@entry=, tstate=0x1894bf0, 2023-01-11T23:33:46.6848206Z tstate@entry=) 2023-01-11T23:33:46.6848789Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:46.6849059Z #10 _PyFunction_Vectorcall () 2023-01-11T23:33:46.6849376Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:46.6850932Z #11 0x00000000004f141c in do_call_core (kwdict=0x0, callargs=0x7f7a9a1b3c10, 2023-01-11T23:33:46.6851405Z func=0x7f79a9850ee0, trace_info=0x7fff600c2fe0, tstate=) 2023-01-11T23:33:46.6851963Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943 2023-01-11T23:33:46.6852223Z #12 _PyEval_EvalFrameDefault () 2023-01-11T23:33:46.6852569Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277 2023-01-11T23:33:46.6852873Z #13 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:46.6853232Z throwflag=, 2023-01-11T23:33:46.6853655Z f=, 2023-01-11T23:33:46.6854076Z tstate=) 2023-01-11T23:33:46.6854760Z at 
/usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:46.6856939Z #14 _PyEval_Vector (kwnames=, 2023-01-11T23:33:46.6857626Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:46.6858328Z args@entry=, locals=0x0, 2023-01-11T23:33:46.6858923Z locals@entry=, con=0x7f79b2eea9f0, 2023-01-11T23:33:46.6859347Z con@entry=, tstate=0x1894bf0, 2023-01-11T23:33:46.6859736Z tstate@entry=) 2023-01-11T23:33:46.6860330Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:46.6860588Z #15 _PyFunction_Vectorcall () 2023-01-11T23:33:46.6860910Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:46.6862774Z #16 0x00000000004f141c in do_call_core (kwdict=0x7f7a9a2c2800, 2023-01-11T23:33:46.6863416Z callargs=0x7f79a984f240, func=0x7f79b2eea9e0, trace_info=0x7fff600c31a0, 2023-01-11T23:33:46.6863809Z tstate=) 2023-01-11T23:33:46.6864286Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943 2023-01-11T23:33:46.6864688Z #17 _PyEval_EvalFrameDefault () 2023-01-11T23:33:46.6865179Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277 2023-01-11T23:33:46.6865548Z #18 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:46.6866005Z throwflag=, 2023-01-11T23:33:46.6866398Z f=, 2023-01-11T23:33:46.6866776Z tstate=) 2023-01-11T23:33:46.6867169Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:46.6869776Z #19 _PyEval_Vector (kwnames=, 2023-01-11T23:33:46.6870443Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:46.6870956Z args@entry=, locals=0x0, 2023-01-11T23:33:46.6871457Z locals@entry=, con=0x7f7a9a2ed5b0, 2023-01-11T23:33:46.6871976Z con@entry=, tstate=0x1894bf0, 2023-01-11T23:33:46.6872372Z tstate@entry=) 2023-01-11T23:33:46.6872808Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:46.6873067Z #20 _PyFunction_Vectorcall () 2023-01-11T23:33:46.6873375Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:46.6874647Z #21 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:46.6875051Z nargsf=, args=0x6833738, callable=0x7f7a9a2ed5a0, 2023-01-11T23:33:46.6875381Z tstate=0x1894bf0) 2023-01-11T23:33:46.6875807Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6877134Z #22 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x6833738, 2023-01-11T23:33:46.6877511Z callable=0x7f7a9a2ed5a0) 2023-01-11T23:33:46.6877981Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6879335Z #23 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:46.6879764Z pp_stack=, trace_info=0x7fff600c3360, 2023-01-11T23:33:46.6880131Z tstate=) 2023-01-11T23:33:46.6880555Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:46.6880863Z #24 _PyEval_EvalFrameDefault () 2023-01-11T23:33:46.6881305Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:46.6881663Z #25 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:46.6882098Z throwflag=, 2023-01-11T23:33:46.6882503Z f=, 2023-01-11T23:33:46.6882892Z tstate=) 2023-01-11T23:33:46.6883281Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:46.6886089Z #26 _PyEval_Vector (kwnames=, 2023-01-11T23:33:46.6886768Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:46.6887350Z args@entry=, locals=0x0, 2023-01-11T23:33:46.6887901Z locals@entry=, con=0x7f7a9a2edf40, 2023-01-11T23:33:46.6888371Z con@entry=, tstate=0x1894bf0, 
2023-01-11T23:33:46.6888774Z tstate@entry=) 2023-01-11T23:33:46.6889261Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:46.6889516Z #27 _PyFunction_Vectorcall () 2023-01-11T23:33:46.6889833Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:46.6891341Z #28 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:46.6891858Z nargsf=, args=0x1911510, callable=0x7f7a9a2edf30, 2023-01-11T23:33:46.6892172Z tstate=0x1894bf0) 2023-01-11T23:33:46.6892586Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6893866Z #29 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x1911510, 2023-01-11T23:33:46.6894333Z callable=0x7f7a9a2edf30) 2023-01-11T23:33:46.6895074Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6895834Z #30 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:46.6896229Z pp_stack=, trace_info=0x7fff600c3520, 2023-01-11T23:33:46.6896551Z tstate=) 2023-01-11T23:33:46.6896994Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:46.6897323Z #31 _PyEval_EvalFrameDefault () 2023-01-11T23:33:46.6897746Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:46.6898672Z #32 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:46.6899252Z throwflag=, 2023-01-11T23:33:46.6899656Z f=, 2023-01-11T23:33:46.6900127Z tstate=) 2023-01-11T23:33:46.6900554Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:46.6903678Z #33 _PyEval_Vector (kwnames=, 2023-01-11T23:33:46.6904085Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:46.6904649Z args@entry=, locals=0x0, 2023-01-11T23:33:46.6905246Z locals@entry=, con=0x7f7a9a172de0, 2023-01-11T23:33:46.6905797Z con@entry=, tstate=0x1894bf0, 2023-01-11T23:33:46.6906311Z tstate@entry=) 2023-01-11T23:33:46.6906840Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:46.6907159Z #34 _PyFunction_Vectorcall () 2023-01-11T23:33:46.6907567Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:46.6908205Z #35 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:46.6908585Z nargsf=, args=0x7f7a9a438200, callable=0x7f7a9a172dd0, 2023-01-11T23:33:46.6908905Z tstate=0x1894bf0) 2023-01-11T23:33:46.6909511Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6913040Z #36 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:46.6913341Z args=0x7f7a9a438200, callable=0x7f7a9a172dd0) 2023-01-11T23:33:46.6913844Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6914233Z #37 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:46.6914581Z pp_stack=, trace_info=0x7fff600c36e0, 2023-01-11T23:33:46.6914895Z tstate=) 2023-01-11T23:33:46.6915298Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:46.6915613Z #38 _PyEval_EvalFrameDefault () 2023-01-11T23:33:46.6916023Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:46.6916499Z #39 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:46.6916905Z throwflag=, 2023-01-11T23:33:46.6917373Z f=, 2023-01-11T23:33:46.6917832Z tstate=) 2023-01-11T23:33:46.6918327Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:46.6920103Z #40 _PyEval_Vector (kwnames=, 2023-01-11T23:33:46.6920730Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:46.6921285Z 
args@entry=, locals=0x0, 2023-01-11T23:33:46.6921827Z locals@entry=, con=0x7f7a9a172d50, 2023-01-11T23:33:46.6922351Z con@entry=, tstate=0x1894bf0, 2023-01-11T23:33:46.6922848Z tstate@entry=) 2023-01-11T23:33:46.6923360Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:46.6923688Z #41 _PyFunction_Vectorcall () 2023-01-11T23:33:46.6924085Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:46.6924478Z #42 0x00000000004efd83 in _PyObject_VectorcallTstate (kwnames=0x7f7a9a40de40, 2023-01-11T23:33:46.6924885Z nargsf=, args=, callable=0x7f7a9a172d40, 2023-01-11T23:33:46.6925205Z tstate=0x1894bf0) 2023-01-11T23:33:46.6925595Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6926170Z #43 PyObject_Vectorcall (kwnames=0x7f7a9a40de40, nargsf=, 2023-01-11T23:33:46.6926537Z args=, callable=0x7f7a9a172d40) 2023-01-11T23:33:46.6926979Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6928587Z #44 call_function (kwnames=0x7f7a9a40de40, oparg=, 2023-01-11T23:33:46.6928968Z pp_stack=, trace_info=0x7fff600c38a0, 2023-01-11T23:33:46.6929290Z tstate=) 2023-01-11T23:33:46.6929685Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:46.6930011Z #45 _PyEval_EvalFrameDefault () 2023-01-11T23:33:46.6930414Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231 2023-01-11T23:33:46.6931099Z #46 0x0000000000594b72 in _PyEval_EvalFrame ( 2023-01-11T23:33:46.6931515Z throwflag=, 2023-01-11T23:33:46.6931991Z f=, 2023-01-11T23:33:46.6932456Z tstate=) 2023-01-11T23:33:46.6933032Z at /croot/python-split_1669298683653/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/call.c:46 2023-01-11T23:33:46.6933454Z #47 _PyEval_Vector () 2023-01-11T23:33:46.6933843Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:46.6937727Z #48 0x0000000000594ab7 in PyEval_EvalCode (co=co@entry=0x7f7a9a3d0920, 2023-01-11T23:33:46.6938111Z globals=globals@entry=0x7f7a9a3c2600, locals=locals@entry=0x7f7a9a3c2600) 2023-01-11T23:33:46.6938586Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:1134 2023-01-11T23:33:46.6939114Z #49 0x00000000005c6e57 in run_eval_code_obj () 2023-01-11T23:33:46.6939553Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1291 2023-01-11T23:33:46.6942201Z #50 0x00000000005c1d40 in run_mod () 2023-01-11T23:33:46.6942739Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1312 2023-01-11T23:33:46.6944541Z #51 0x00000000005b9ebb in PyRun_StringFlags.localalias () 2023-01-11T23:33:46.6945002Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1183 2023-01-11T23:33:46.6946963Z #52 0x00000000005b9cfb in PyRun_SimpleStringFlags.localalias () 2023-01-11T23:33:46.6947694Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:503 2023-01-11T23:33:46.6949779Z #53 0x00000000005b8d5c in pymain_run_command ( 2023-01-11T23:33:46.6950370Z command=) 2023-01-11T23:33:46.6950892Z at /croot/python-split_1669298683653/work/build-static/python.c:252 2023-01-11T23:33:46.6953197Z #54 pymain_run_python (exitcode=0x7fff600c3b00) 2023-01-11T23:33:46.6953618Z at /croot/python-split_1669298683653/work/build-static/python.c:582 2023-01-11T23:33:46.6953894Z #55 Py_RunMain.localalias () 2023-01-11T23:33:46.6954238Z at /croot/python-split_1669298683653/work/build-static/python.c:670 2023-01-11T23:33:46.6967955Z #56 0x0000000000587c29 in Py_BytesMain (argc=, 2023-01-11T23:33:46.6968202Z 
argv=) 2023-01-11T23:33:46.6968548Z at /croot/python-split_1669298683653/work/build-static/python.c:1090 2023-01-11T23:33:46.6972616Z #57 0x00007f7a9a4a0c87 in __libc_start_main (main=0x587be0
, argc=5, 2023-01-11T23:33:46.6972932Z argv=0x7fff600c3d08, init=, fini=, 2023-01-11T23:33:46.6973226Z rtld_fini=, stack_end=0x7fff600c3cf8) 2023-01-11T23:33:46.6973511Z at ../csu/libc-start.c:310 2023-01-11T23:33:46.6973860Z #58 0x0000000000587ade in _start () 2023-01-11T23:33:46.6974299Z at /usr/local/src/conda/python-3.10.8/Modules/_io/clinic/peg_api.c:880 2023-01-11T23:33:46.8102125Z ##[group]Run set -x 2023-01-11T23:33:46.8102349Z set -x 2023-01-11T23:33:46.8102584Z python3 -m pip install -r requirements.txt 2023-01-11T23:33:46.8102844Z python3 -m pip install boto3==1.19.12 2023-01-11T23:33:46.8103160Z python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test 2023-01-11T23:33:46.8113626Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T23:33:46.8113857Z env: 2023-01-11T23:33:46.8114053Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:46.8114275Z GPU_FLAG: --gpus all 2023-01-11T23:33:46.8114548Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:46.8114830Z AWS_DEFAULT_REGION: us-east-1 2023-01-11T23:33:46.8115032Z BRANCH: 2023-01-11T23:33:46.8115220Z TEST_CONFIG: default 2023-01-11T23:33:46.8115407Z SHARD_NUMBER: 2 2023-01-11T23:33:46.8115666Z BUILD_ENVIRONMENT: linux-bionic-cuda11.6-py3.10-gcc7-sm86 2023-01-11T23:33:46.8115939Z PR_NUMBER: 2023-01-11T23:33:46.8116144Z PYTORCH_RETRY_TEST_CASES: 1 2023-01-11T23:33:46.8116368Z PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1 2023-01-11T23:33:46.8116619Z SHA1: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T23:33:46.8116849Z TAG: ciflow/trunk/91627 2023-01-11T23:33:46.8117051Z WORKFLOW_ID: 3896346758 2023-01-11T23:33:46.8117385Z GITHUB_TOKEN: *** 2023-01-11T23:33:46.8117670Z GHA_WORKFLOW_JOB_ID: 10589556206 2023-01-11T23:33:46.8117878Z ##[endgroup] 2023-01-11T23:33:46.8140523Z + python3 -m pip install -r requirements.txt 2023-01-11T23:33:47.0256876Z Defaulting to user installation because normal site-packages is not writeable 2023-01-11T23:33:47.0533798Z Requirement already satisfied: astunparse in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 2)) (1.6.3) 2023-01-11T23:33:47.0562625Z Requirement already satisfied: expecttest in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 3)) (0.1.4) 2023-01-11T23:33:47.0570797Z Requirement already satisfied: future in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 4)) (0.18.2) 2023-01-11T23:33:47.0579537Z Requirement already satisfied: hypothesis in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 5)) (6.62.0) 2023-01-11T23:33:47.1003775Z Requirement already satisfied: numpy in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 6)) (1.21.6) 2023-01-11T23:33:47.1013650Z Requirement already satisfied: psutil in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 7)) (5.9.1) 2023-01-11T23:33:47.1101690Z Requirement already satisfied: pyyaml in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 8)) (6.0) 2023-01-11T23:33:47.1109476Z Requirement already satisfied: requests in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 9)) (2.26.0) 2023-01-11T23:33:47.1318745Z Requirement already satisfied: setuptools in /usr/lib/python3.7/site-packages (from -r requirements.txt (line 10)) (49.1.3) 2023-01-11T23:33:47.1508577Z Requirement already satisfied: six in 
/home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 11)) (1.16.0) 2023-01-11T23:33:47.1517855Z Requirement already satisfied: types-dataclasses in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 12)) (0.6.6) 2023-01-11T23:33:47.1521976Z Requirement already satisfied: typing_extensions in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 13)) (4.4.0) 2023-01-11T23:33:47.1532338Z Requirement already satisfied: sympy in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 14)) (1.10.1) 2023-01-11T23:33:47.1552739Z Requirement already satisfied: filelock in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 15)) (3.9.0) 2023-01-11T23:33:47.1631949Z Requirement already satisfied: networkx in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 16)) (2.6.3) 2023-01-11T23:33:47.1808491Z Requirement already satisfied: jinja2 in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 17)) (3.1.2) 2023-01-11T23:33:47.1835422Z Requirement already satisfied: wheel<1.0,>=0.23.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from astunparse->-r requirements.txt (line 2)) (0.38.4) 2023-01-11T23:33:47.1851892Z Requirement already satisfied: attrs>=19.2.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from hypothesis->-r requirements.txt (line 5)) (22.2.0) 2023-01-11T23:33:47.2143450Z Requirement already satisfied: sortedcontainers<3.0.0,>=2.1.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from hypothesis->-r requirements.txt (line 5)) (2.4.0) 2023-01-11T23:33:47.2153385Z Requirement already satisfied: exceptiongroup>=1.0.0; python_version < "3.11" in /home/ec2-user/.local/lib/python3.7/site-packages (from hypothesis->-r requirements.txt (line 5)) (1.1.0) 2023-01-11T23:33:47.2170959Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (1.26.14) 2023-01-11T23:33:47.2347872Z Requirement already satisfied: charset-normalizer~=2.0.0; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (2.0.12) 2023-01-11T23:33:47.2366825Z Requirement already satisfied: idna<4,>=2.5; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (3.4) 2023-01-11T23:33:47.2378411Z Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (2022.12.7) 2023-01-11T23:33:47.2386929Z Requirement already satisfied: mpmath>=0.19 in /home/ec2-user/.local/lib/python3.7/site-packages (from sympy->-r requirements.txt (line 14)) (1.2.1) 2023-01-11T23:33:47.2448346Z Requirement already satisfied: MarkupSafe>=2.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from jinja2->-r requirements.txt (line 17)) (2.1.1) 2023-01-11T23:33:47.3010740Z + python3 -m pip install boto3==1.19.12 2023-01-11T23:33:47.5142360Z Defaulting to user installation because normal site-packages is not writeable 2023-01-11T23:33:47.5313314Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12) 2023-01-11T23:33:47.5364218Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12) 
2023-01-11T23:33:47.5409801Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0) 2023-01-11T23:33:47.5429445Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2) 2023-01-11T23:33:47.5457702Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2) 2023-01-11T23:33:47.5477157Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.14) 2023-01-11T23:33:47.5649560Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0) 2023-01-11T23:33:47.7594337Z + python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test 2023-01-11T23:35:15.6278301Z [scribe] Scribe access token not provided, sending report via boto3... 2023-01-11T23:35:15.6279035Z ERROR ENCOUNTERED WHEN UPLOADING TO SCRIBE: {"errorMessage":"2023-01-11T23:34:56.027Z 0a32609c-a387-4d95-91aa-b330e1781bb5 Task timed out after 60.02 seconds"} 2023-01-11T23:35:15.6279296Z 2023-01-11T23:35:15.6279779Z ----- Historic stats comparison result ------ 2023-01-11T23:35:15.6279945Z 2023-01-11T23:35:15.6282560Z job: linux-bionic-cuda11.6-py3.10-gcc7-sm86 2023-01-11T23:35:15.6282884Z commit: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T23:35:15.6283057Z 2023-01-11T23:35:15.6283212Z Commit graph (base is most recent master ancestor with at least one S3 report): 2023-01-11T23:35:15.6283421Z 2023-01-11T23:35:15.6283496Z : (master) 2023-01-11T23:35:15.6283666Z | 2023-01-11T23:35:15.6283877Z | * 8419ddda87 (HEAD) total time 12646.49s 2023-01-11T23:35:15.6287630Z | | 2023-01-11T23:35:15.6287892Z | : (2 commits) 2023-01-11T23:35:15.6288130Z |/ 2023-01-11T23:35:15.6288651Z * db2a237763 (base) 9 reports, total time 5112.18s ± 3497.27s 2023-01-11T23:35:15.6289113Z * 2b0abd4ce3 9 reports, total time 5114.49s ± 3505.80s 2023-01-11T23:35:15.6289518Z * f7939b21e1 22 reports, total time 4164.47s ± 3805.54s 2023-01-11T23:35:15.6290000Z * cb3204823e 9 reports, total time 5113.06s ± 3503.00s 2023-01-11T23:35:15.6290369Z * 6e236553f5 9 reports, total time 5103.01s ± 3496.18s 2023-01-11T23:35:15.6290846Z * cce577b391 9 reports, total time 5169.75s ± 3599.85s 2023-01-11T23:35:15.6291266Z * fae821c2f1 9 reports, total time 4979.02s ± 3495.76s 2023-01-11T23:35:15.6291590Z * 0c3659586d 9 reports, total time 4993.42s ± 3527.06s 2023-01-11T23:35:15.6292235Z * 122245985a 9 reports, total time 4994.41s ± 3491.69s 2023-01-11T23:35:15.6292555Z * b797a24259 9 reports, total time 4950.22s ± 3490.58s 2023-01-11T23:35:15.6292768Z | 2023-01-11T23:35:15.6292924Z : 2023-01-11T23:35:15.6293025Z 2023-01-11T23:35:15.6293156Z Removed (across 942 suites) 0 tests, totaling 0.00s 2023-01-11T23:35:15.6293426Z Modified (across 0 suites) 0 tests, totaling 0.00s 2023-01-11T23:35:15.6293689Z Added (across 330 suites) 45187 tests, totaling +12646.49s 2023-01-11T23:35:15.7358804Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main 2023-01-11T23:35:15.7359064Z with: 2023-01-11T23:35:15.7359235Z env: 2023-01-11T23:35:15.7359432Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:35:15.7359642Z GPU_FLAG: --gpus all 2023-01-11T23:35:15.7359926Z DOCKER_CONTAINER_ID: 
b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:35:15.7360227Z ##[endgroup] 2023-01-11T23:35:15.7373153Z ##[group]Run set -eou pipefail 2023-01-11T23:35:15.7373391Z set -eou pipefail 2023-01-11T23:35:15.7373593Z  2023-01-11T23:35:15.7373845Z echo "Holding runner for 2 hours until all ssh sessions have logged out" 2023-01-11T23:35:15.7374107Z for _ in $(seq 1440); do 2023-01-11T23:35:15.7374348Z  # Break if no ssh session exists anymore 2023-01-11T23:35:15.7374816Z  if [ "$(who)" = "" ]; then 2023-01-11T23:35:15.7375023Z  break 2023-01-11T23:35:15.7375231Z  fi 2023-01-11T23:35:15.7375423Z  echo "." 2023-01-11T23:35:15.7375615Z  sleep 5 2023-01-11T23:35:15.7375798Z done 2023-01-11T23:35:15.7386339Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T23:35:15.7386572Z env: 2023-01-11T23:35:15.7386757Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:35:15.7386969Z GPU_FLAG: --gpus all 2023-01-11T23:35:15.7387249Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:35:15.7387503Z ##[endgroup] 2023-01-11T23:35:15.7408696Z Holding runner for 2 hours until all ssh sessions have logged out 2023-01-11T23:35:15.7441058Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2023-01-11T23:35:15.7441384Z # ignore expansion of "docker ps -q" since it could be empty 2023-01-11T23:35:15.7441650Z # shellcheck disable=SC2046 2023-01-11T23:35:15.7441898Z docker stop $(docker ps -q) || true 2023-01-11T23:35:15.7442141Z # Prune all of the docker images 2023-01-11T23:35:15.7442376Z docker system prune -af 2023-01-11T23:35:15.7450928Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T23:35:15.7451160Z env: 2023-01-11T23:35:15.7451354Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:35:15.7451572Z GPU_FLAG: --gpus all 2023-01-11T23:35:15.7451855Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:35:15.7452115Z ##[endgroup] 2023-01-11T23:35:17.5323094Z b465a1e11c77 2023-01-11T23:35:18.8763787Z Deleted Containers: 2023-01-11T23:35:18.8764495Z b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:35:18.8764833Z 2023-01-11T23:35:22.0474449Z Deleted Images: 2023-01-11T23:35:22.0476189Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T23:35:22.0477329Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7@sha256:866df6c1171dbe014496717cf2080d6cc72ca611a4e8146525c9ef09640c8ba4 2023-01-11T23:35:22.0477827Z deleted: sha256:09e297797cd8c095524ba49e041c45e57bf05ba16719f01e7240e8549da5beba 2023-01-11T23:35:22.0478162Z deleted: sha256:6d5f0082fbf8c3b01e49961283f44105a5bb12616f0073762021db97f20a16a5 2023-01-11T23:35:22.0478502Z deleted: sha256:19c574c96e47e3d16cc51cc088fed6cef16eaa2174e667ba0b395ac0e3b989bb 2023-01-11T23:35:22.0479202Z deleted: sha256:4fa7af758c23581dbcc2bd92defdc5fea97c8671fa67bbc30888ffbbf96c49a6 2023-01-11T23:35:22.0479629Z deleted: sha256:e68331e1b0f863bbdbd445ed8475d59d2234c9659264a4e49b7b096311445aee 2023-01-11T23:35:22.0480053Z deleted: sha256:69d886418998cd8758c555ed219fca3a457539e9d6f62c41c9664a80c82c4036 2023-01-11T23:35:22.0480489Z deleted: sha256:2368b1fdf0235d155eaa47e75ff379cff06cc82d63e03886a57363b8092d3c83 2023-01-11T23:35:22.0480961Z deleted: sha256:651c1e1b625aeaf8fa65e96ef11487e650a7a6d00ba8ec9fdfe7e89e762dc5c4 
2023-01-11T23:35:22.0481526Z deleted: sha256:7b73f298df08c4b3aa849b55bb5e73ded619cb5b786fddcc21ece3c0b3887038 2023-01-11T23:35:22.0481958Z deleted: sha256:716f28b4b433958e5d1b9839a20dd22f9986f9c1b42fb95552f13d6bfd291efe 2023-01-11T23:35:22.0482263Z deleted: sha256:e4495993276176c504228e66b2fd6f348523c5af66f9292b2f7ea12acefcd606 2023-01-11T23:35:22.0482578Z deleted: sha256:d6f5fbd8782783697c73f4bcbce91a05e126575f72f0f58e1e9465aa57640a92 2023-01-11T23:35:22.0482945Z deleted: sha256:0b7725c897ee2681e3b1ff00aea6c14805c8050758cfa1010f561c9713934014 2023-01-11T23:35:22.0483287Z deleted: sha256:fdd864c6750bdd24f8cfb131673c6f04087e3cbecac2c1a9b3c30fefbd6d3070 2023-01-11T23:35:22.0483646Z deleted: sha256:bfaeaf77f180f62f3994ccdc2be80dd2ef7f4d25ffc8c9497dc51e6cff69711e 2023-01-11T23:35:22.0484012Z deleted: sha256:02f5a9d8be5a1bdd5a350d4c47147fd3dd46bfcedc7637f53a8a692720381fc2 2023-01-11T23:35:22.0484367Z deleted: sha256:e22cc66e4fd2e491fb4ae8194c35d6b1789f9f5d01e1dfbedf1c266c3a1537de 2023-01-11T23:35:22.0484694Z deleted: sha256:1536d02ae84ab410916541408cf2935f122735cc5d128324f6f82fbbeb913e80 2023-01-11T23:35:22.0485023Z deleted: sha256:a8374bc83a4bf3a838aaf8ee71b4a8281ac4eef801473b007f88b2f0efadc6c6 2023-01-11T23:35:22.0485349Z deleted: sha256:9381921b1e612b2d23517d24662a39f00be43efe2412e440ef41b487a48cb389 2023-01-11T23:35:22.0485678Z deleted: sha256:ae063d6cbeae3688a5a7bb8694d431dbb9792bfb7ff2908d3b25842f9586fc8e 2023-01-11T23:35:22.0486017Z deleted: sha256:595397ba048a351c7b09c25b5eba4cbb916c2db40dd80bde4d95c2b51b766045 2023-01-11T23:35:22.0486366Z deleted: sha256:7a2da4ffb8ea2b858fcdfc92f6e640dbb6a083b39dfa5e7aa87cd1296f9314e1 2023-01-11T23:35:22.0486729Z deleted: sha256:eddd66bfdbb5913f133ffda8d967ee0235f9f121434112ea8da4cfdf5f9ebbf4 2023-01-11T23:35:22.0487062Z deleted: sha256:c9392d92ee837d52b35b41e4de67d213886d844cecd9e769d9284dc21070aee8 2023-01-11T23:35:22.0487407Z deleted: sha256:fba16b8beafe9efa854d93f0e92718750ec97ced755ace0f6f51bbd5d1964f91 2023-01-11T23:35:22.0487742Z deleted: sha256:a09592e27d2e6896b9029f31e269787c92761fd19a48528851c48d85221cd4bc 2023-01-11T23:35:22.0488054Z deleted: sha256:8b2e3a8416af60625ccb9a8562891c1b9e85c2ee05f103b190ad5040313bf1f1 2023-01-11T23:35:22.0488379Z deleted: sha256:135caddb443044601f433d421e5c0f5d8ab02dff69cf2df9024a8ecb97c8948c 2023-01-11T23:35:22.0488697Z deleted: sha256:112b60db47585e101175390e102a66908f0f1175510c5e5d5f10b7e4e0c9769b 2023-01-11T23:35:22.0489063Z deleted: sha256:d9e5dd4e760b68190c010a9042c842afb0bcf3d4477a334d0e9c2d9c302ecb3c 2023-01-11T23:35:22.0489439Z deleted: sha256:7d37673dfb91518db6e68e9637f3db142a4eaccb9f548ab99d14ac52b3672325 2023-01-11T23:35:22.0489804Z deleted: sha256:12784d32c23b63941ebc8adf713d7167158c38c771d3b5f94506e036b6273dbd 2023-01-11T23:35:22.0490183Z deleted: sha256:395db4c8bde7e390accfa81ff84376004b78566423bd0a3bd7559e12e66759bc 2023-01-11T23:35:22.0490519Z deleted: sha256:a7aa04ee64333d427277f47b8e07dee6cc566b132a18ee166c305c890d4fd3bb 2023-01-11T23:35:22.0490838Z deleted: sha256:10a0e8d138614f29a13aecd15175039c7f86ca04fc588b439f56284b3c0e292a 2023-01-11T23:35:22.0491173Z deleted: sha256:68d97ecc8d2ec5f756b4a1b7e7451a413ed19f3d7c3ce52c81b30dc00fd96185 2023-01-11T23:35:22.0491506Z deleted: sha256:e833f7b95e0efaaf775293760365243991395e796d587deffe588e50fe7a9f1e 2023-01-11T23:35:22.0491821Z deleted: sha256:cbcd1e7502e949614d05d5d0a34316fd62665c3e030b3656d9c488cfee1eca34 2023-01-11T23:35:22.0492141Z deleted: sha256:a90ce4a75d9d408df6349607788566125c2662c603dc8f84c767b2256273ff12 2023-01-11T23:35:22.0492525Z 
deleted: sha256:bdee06da46fd67ff14bbc7286b40ef15174551b6452ca633be8576376a3dbddb 2023-01-11T23:35:22.0492864Z deleted: sha256:2cafe83ace87d1a83f30f8d458001f3a315a606d5251c20543cbd6604499aa73 2023-01-11T23:35:22.0493196Z deleted: sha256:cc1f7fb1208e7b05b48d3b3b2648946adaa723b10df2574c7db87ed9112b1510 2023-01-11T23:35:22.0493530Z deleted: sha256:9027ec66ecfd6490ecf157cdf665428dad438f54565d9d85263211da30e6684d 2023-01-11T23:35:22.0493873Z deleted: sha256:a27541cda46a9ede5932b1a1807360e2f6ada5bb0c30bfa3aa953d59dcc1bc5c 2023-01-11T23:35:22.0494252Z deleted: sha256:7b927de6f9fdeb74acee54e7654d04e2614a112cafb477b2433db34dcc7ebe28 2023-01-11T23:35:22.0495002Z deleted: sha256:8298c8753925ad5124b0611c4101e92ec1f877252f8320f5503ebf3e4e7e1314 2023-01-11T23:35:22.0495325Z deleted: sha256:0b741a2d83533cb3b47b912506d486fd477d8c8a1c520a0f9d7d62edfc55487d 2023-01-11T23:35:22.0495656Z deleted: sha256:9412a0a6fd6057a9939fed5beadf64568de1d230c0325a628e396f5b76444bbb 2023-01-11T23:35:22.0496047Z deleted: sha256:0a294dad6a5df3fab407cd28b31eadaf9f65a5bdbf8c7a0db349d3401b537cef 2023-01-11T23:35:22.0496426Z deleted: sha256:b42ee7d4f5715886c70d5cfef4724e889f773a20741ebc4ed9c1771eb09ad634 2023-01-11T23:35:22.0496758Z deleted: sha256:a68106c7b0e03f1e0fa9ad405ce403e2931174825bdcd8e259522ef40ec8a617 2023-01-11T23:35:22.0497076Z deleted: sha256:ad403ee05a64f55c765e665bd3f25a71650857123282dc1bad2af81f5665cda5 2023-01-11T23:35:22.0497402Z deleted: sha256:eb731dc1382c6cc80193d0454f740fc55f441c32f08cd0ce1b8784e7840e53df 2023-01-11T23:35:22.0497725Z deleted: sha256:45bbe3d22998589317c7f6c4dd591475423bb37ca9b922529c5878653483b18d 2023-01-11T23:35:22.0497898Z 2023-01-11T23:35:22.0498010Z Total reclaimed space: 22.44GB 2023-01-11T23:35:22.0538912Z Post job cleanup. 2023-01-11T23:35:22.0566355Z Post job cleanup. 
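The post-job cleanup that follows scrubs the credential header the checkout step injected: it walks the superproject and every submodule, listing and then unsetting http.https://github.com/.extraheader so the auth token does not persist on the reused runner. A condensed sketch of the same idea, assuming a standard git checkout -- this is illustrative, not the checkout action's exact code; the trailing "|| :" mirrors the log and keeps the step green when the key is already absent:

    # Remove the injected GitHub auth header from the main checkout...
    git config --local --unset-all 'http.https://github.com/.extraheader' || :

    # ...and recursively from every nested submodule as well
    git submodule foreach --recursive \
        "git config --local --unset-all 'http.https://github.com/.extraheader' || :"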
2023-01-11T23:35:22.1544233Z [command]/usr/bin/git version
2023-01-11T23:35:22.1580646Z git version 2.38.1
2023-01-11T23:35:22.1618445Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/f4d42d21-0616-47c6-8f2e-23c5a01efe0a' before making global git config changes
2023-01-11T23:35:22.1619272Z Adding repository directory to the temporary git global config as a safe directory
2023-01-11T23:35:22.1623372Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2023-01-11T23:35:22.1647221Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2023-01-11T23:35:22.1677288Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :
2023-01-11T23:35:22.1900338Z Entering 'android/libs/fbjni'
2023-01-11T23:35:22.1929763Z Entering 'third_party/FP16'
2023-01-11T23:35:22.1957689Z Entering 'third_party/FXdiv'
2023-01-11T23:35:22.1989227Z Entering 'third_party/NNPACK'
2023-01-11T23:35:22.2019674Z Entering 'third_party/QNNPACK'
2023-01-11T23:35:22.2047574Z Entering 'third_party/VulkanMemoryAllocator'
2023-01-11T23:35:22.2075354Z Entering 'third_party/XNNPACK'
2023-01-11T23:35:22.2112155Z Entering 'third_party/benchmark'
2023-01-11T23:35:22.2139977Z Entering 'third_party/cpuinfo'
2023-01-11T23:35:22.2167795Z Entering 'third_party/cub'
2023-01-11T23:35:22.2196150Z Entering 'third_party/cudnn_frontend'
2023-01-11T23:35:22.2232425Z Entering 'third_party/cutlass'
2023-01-11T23:35:22.2264304Z Entering 'third_party/eigen'
2023-01-11T23:35:22.2293874Z Entering 'third_party/fbgemm'
2023-01-11T23:35:22.2324600Z Entering 'third_party/fbgemm/third_party/asmjit'
2023-01-11T23:35:22.2352974Z Entering 'third_party/fbgemm/third_party/cpuinfo'
2023-01-11T23:35:22.2382009Z Entering 'third_party/fbgemm/third_party/googletest'
2023-01-11T23:35:22.2410820Z Entering 'third_party/fbgemm/third_party/hipify_torch'
2023-01-11T23:35:22.2440543Z Entering 'third_party/flatbuffers'
2023-01-11T23:35:22.2470017Z Entering 'third_party/fmt'
2023-01-11T23:35:22.2498445Z Entering 'third_party/foxi'
2023-01-11T23:35:22.2525355Z Entering 'third_party/gemmlowp/gemmlowp'
2023-01-11T23:35:22.2554083Z Entering 'third_party/gloo'
2023-01-11T23:35:22.2583712Z Entering 'third_party/googletest'
2023-01-11T23:35:22.2612987Z Entering 'third_party/ideep'
2023-01-11T23:35:22.2643027Z Entering 'third_party/ideep/mkl-dnn'
2023-01-11T23:35:22.2672605Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN'
2023-01-11T23:35:22.2705325Z Entering 'third_party/ios-cmake'
2023-01-11T23:35:22.2732596Z Entering 'third_party/ittapi'
2023-01-11T23:35:22.2762781Z Entering 'third_party/kineto'
2023-01-11T23:35:22.2789271Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2023-01-11T23:35:22.2818370Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2023-01-11T23:35:22.2847983Z Entering 'third_party/nccl/nccl'
2023-01-11T23:35:22.2878417Z Entering 'third_party/neon2sse'
2023-01-11T23:35:22.2908057Z Entering 'third_party/nlohmann'
2023-01-11T23:35:22.2935965Z Entering 'third_party/onnx'
2023-01-11T23:35:22.2973879Z Entering 'third_party/onnx/third_party/benchmark'
2023-01-11T23:35:22.2999574Z Entering 'third_party/onnx/third_party/pybind11'
2023-01-11T23:35:22.3029247Z Entering 'third_party/onnx-tensorrt'
2023-01-11T23:35:22.3058015Z Entering 'third_party/onnx-tensorrt/third_party/onnx'
2023-01-11T23:35:22.3090150Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'
2023-01-11T23:35:22.3118400Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'
2023-01-11T23:35:22.3144247Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'
2023-01-11T23:35:22.3178200Z Entering 'third_party/pocketfft'
2023-01-11T23:35:22.3204082Z Entering 'third_party/protobuf'
2023-01-11T23:35:22.3235300Z Entering 'third_party/protobuf/third_party/benchmark'
2023-01-11T23:35:22.3263297Z Entering 'third_party/protobuf/third_party/googletest'
2023-01-11T23:35:22.3294404Z Entering 'third_party/psimd'
2023-01-11T23:35:22.3324804Z Entering 'third_party/pthreadpool'
2023-01-11T23:35:22.3352785Z Entering 'third_party/pybind11'
2023-01-11T23:35:22.3379158Z Entering 'third_party/python-enum'
2023-01-11T23:35:22.3408587Z Entering 'third_party/python-peachpy'
2023-01-11T23:35:22.3436689Z Entering 'third_party/python-six'
2023-01-11T23:35:22.3463646Z Entering 'third_party/sleef'
2023-01-11T23:35:22.3492505Z Entering 'third_party/tbb'
2023-01-11T23:35:22.3523496Z Entering 'third_party/tensorpipe'
2023-01-11T23:35:22.3552354Z Entering 'third_party/tensorpipe/third_party/googletest'
2023-01-11T23:35:22.3580576Z Entering 'third_party/tensorpipe/third_party/libnop'
2023-01-11T23:35:22.3608667Z Entering 'third_party/tensorpipe/third_party/libuv'
2023-01-11T23:35:22.3638010Z Entering 'third_party/tensorpipe/third_party/pybind11'
2023-01-11T23:35:22.3663706Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2023-01-11T23:35:22.3691539Z Entering 'third_party/zstd'
2023-01-11T23:35:22.3731154Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2023-01-11T23:35:22.3756378Z http.https://github.com/.extraheader
2023-01-11T23:35:22.3763626Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader
2023-01-11T23:35:22.3792637Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :
2023-01-11T23:35:22.4001500Z Entering 'android/libs/fbjni'
2023-01-11T23:35:22.4018364Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4040121Z Entering 'third_party/FP16'
2023-01-11T23:35:22.4056624Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4078494Z Entering 'third_party/FXdiv'
2023-01-11T23:35:22.4093915Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4115313Z Entering 'third_party/NNPACK'
2023-01-11T23:35:22.4131727Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4155077Z Entering 'third_party/QNNPACK'
2023-01-11T23:35:22.4171485Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4194096Z Entering 'third_party/VulkanMemoryAllocator'
2023-01-11T23:35:22.4209440Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4232321Z Entering 'third_party/XNNPACK'
2023-01-11T23:35:22.4247395Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4277139Z Entering 'third_party/benchmark'
2023-01-11T23:35:22.4292772Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4316503Z Entering 'third_party/cpuinfo'
2023-01-11T23:35:22.4332087Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4353645Z Entering 'third_party/cub'
2023-01-11T23:35:22.4370263Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4393148Z Entering 'third_party/cudnn_frontend'
2023-01-11T23:35:22.4409405Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4436394Z Entering 'third_party/cutlass'
2023-01-11T23:35:22.4452920Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4480023Z Entering 'third_party/eigen'
2023-01-11T23:35:22.4496918Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4520556Z Entering 'third_party/fbgemm'
2023-01-11T23:35:22.4536742Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4558784Z Entering 'third_party/fbgemm/third_party/asmjit'
2023-01-11T23:35:22.4575615Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4597710Z Entering 'third_party/fbgemm/third_party/cpuinfo'
2023-01-11T23:35:22.4613041Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4634049Z Entering 'third_party/fbgemm/third_party/googletest'
2023-01-11T23:35:22.4650505Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4673187Z Entering 'third_party/fbgemm/third_party/hipify_torch'
2023-01-11T23:35:22.4689346Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4711973Z Entering 'third_party/flatbuffers'
2023-01-11T23:35:22.4727961Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4753232Z Entering 'third_party/fmt'
2023-01-11T23:35:22.4768610Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4791181Z Entering 'third_party/foxi'
2023-01-11T23:35:22.4806989Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4829730Z Entering 'third_party/gemmlowp/gemmlowp'
2023-01-11T23:35:22.4846502Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4869488Z Entering 'third_party/gloo'
2023-01-11T23:35:22.4886222Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4909708Z Entering 'third_party/googletest'
2023-01-11T23:35:22.4926020Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4949099Z Entering 'third_party/ideep'
2023-01-11T23:35:22.4965941Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4986233Z Entering 'third_party/ideep/mkl-dnn'
2023-01-11T23:35:22.5000948Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5023246Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN'
2023-01-11T23:35:22.5038817Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5065789Z Entering 'third_party/ios-cmake'
2023-01-11T23:35:22.5082282Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5103493Z Entering 'third_party/ittapi'
2023-01-11T23:35:22.5119768Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5142306Z Entering 'third_party/kineto'
2023-01-11T23:35:22.5158614Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5179796Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2023-01-11T23:35:22.5195993Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5216974Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2023-01-11T23:35:22.5233282Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5258814Z Entering 'third_party/nccl/nccl'
2023-01-11T23:35:22.5275282Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5296478Z Entering 'third_party/neon2sse'
2023-01-11T23:35:22.5313001Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5335702Z Entering 'third_party/nlohmann'
2023-01-11T23:35:22.5352427Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5374938Z Entering 'third_party/onnx'
2023-01-11T23:35:22.5392247Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5421856Z Entering 'third_party/onnx/third_party/benchmark'
2023-01-11T23:35:22.5439193Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5460893Z Entering 'third_party/onnx/third_party/pybind11'
2023-01-11T23:35:22.5477336Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5502910Z Entering 'third_party/onnx-tensorrt'
2023-01-11T23:35:22.5518862Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5541323Z Entering 'third_party/onnx-tensorrt/third_party/onnx'
2023-01-11T23:35:22.5557467Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5583944Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'
2023-01-11T23:35:22.5600407Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5622668Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'
2023-01-11T23:35:22.5639806Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5660807Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'
2023-01-11T23:35:22.5677297Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5703567Z Entering 'third_party/pocketfft'
2023-01-11T23:35:22.5720470Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5743000Z Entering 'third_party/protobuf'
2023-01-11T23:35:22.5759455Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5784689Z Entering 'third_party/protobuf/third_party/benchmark'
2023-01-11T23:35:22.5801221Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5823512Z Entering 'third_party/protobuf/third_party/googletest'
2023-01-11T23:35:22.5840192Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5864534Z Entering 'third_party/psimd'
2023-01-11T23:35:22.5881355Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5901958Z Entering 'third_party/pthreadpool'
2023-01-11T23:35:22.5920187Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5941535Z Entering 'third_party/pybind11'
2023-01-11T23:35:22.5957842Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5980060Z Entering 'third_party/python-enum'
2023-01-11T23:35:22.5996660Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6017291Z Entering 'third_party/python-peachpy'
2023-01-11T23:35:22.6034276Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6056332Z Entering 'third_party/python-six'
2023-01-11T23:35:22.6072853Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6094949Z Entering 'third_party/sleef'
2023-01-11T23:35:22.6112240Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6133582Z Entering 'third_party/tbb'
2023-01-11T23:35:22.6150255Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6174443Z Entering 'third_party/tensorpipe'
2023-01-11T23:35:22.6190347Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6212289Z Entering 'third_party/tensorpipe/third_party/googletest'
2023-01-11T23:35:22.6229450Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6251955Z Entering 'third_party/tensorpipe/third_party/libnop'
2023-01-11T23:35:22.6269251Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6291553Z Entering 'third_party/tensorpipe/third_party/libuv'
2023-01-11T23:35:22.6308500Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6330859Z Entering 'third_party/tensorpipe/third_party/pybind11'
2023-01-11T23:35:22.6347482Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6368701Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2023-01-11T23:35:22.6385944Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6410861Z Entering 'third_party/zstd'
2023-01-11T23:35:22.6427608Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6639952Z Cleaning up orphan processes
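The long run of "Entering …" / "http.https://github.com/.extraheader" pairs above is the checkout step's post-job credential scrub: the temporary auth header injected into the git config at checkout time is unset in the top-level repository and then in every submodule, recursively (the same sweep is applied to core.sshCommand just before it). A minimal sketch of that pattern, reconstructed from the commands echoed in this log:

# List any GitHub auth header configured in the main repo, then unset it.
git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader'
git config --local --unset-all 'http.https://github.com/.extraheader'
# Repeat in every submodule; the trailing `|| :` keeps the foreach loop
# from aborting on submodules where the header was never set.
git submodule foreach --recursive \
  "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"

The foreach body is quoted here so the && / || chain runs inside each submodule rather than in the outer shell; the runner's log echoes the command unquoted, but the effect is the same: no SSH override or HTTP auth token survives the job on a reusable runner.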